Does Typesense consider Korean logographic? I ask because I'm a fluent speaker, and Korean uses a syllabic alphabet (Hangul) that is quite different from Chinese, but I'm hearing conflicting information about how Typesense classifies it.
Jason Bosco
11/09/2022, 9:14 PM
Maybe we butchered the exact terminology for “logographic”, but what we really meant to convey is that Typesense works for any language that uses spaces between words. Korean doesn’t use spaces between words (so we classified it as logographic), and we’ve had to add specialized support for it in recent versions.
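To make the "uses spaces between words" distinction concrete, here is a deliberately simplified model of whitespace tokenization. It is a sketch, not Typesense's actual tokenizer: `whitespace_tokenize` and `matches` are hypothetical helpers that split on whitespace and lowercase. Under this model, a language written without spaces (Chinese is used as the example) collapses into a single token, so a query for one word inside the sentence finds nothing.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Split on runs of whitespace and lowercase each token (simplified model)."""
    return [tok.lower() for tok in text.split()]


def matches(query: str, document: str) -> bool:
    """True if every query token appears as a whole token in the document."""
    doc_tokens = set(whitespace_tokenize(document))
    return all(tok in doc_tokens for tok in whitespace_tokenize(query))


if __name__ == "__main__":
    print(whitespace_tokenize("I like cats"))  # ['i', 'like', 'cats']
    print(whitespace_tokenize("我喜欢猫"))      # ['我喜欢猫'] : one token, no spaces to split on
    print(matches("cats", "I like cats"))      # True
    print(matches("猫", "我喜欢猫"))            # False : '猫' is not a whole token
```

This is why languages that need segmentation beyond plain spaces get a dedicated, locale-aware tokenizer rather than the default splitting.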
Jason Bosco
11/09/2022, 9:15 PM
I’d recommend using the latest RC build 0.24.0.rcn30 and setting the `locale` for each field to use the improved Korean tokenizer.
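A minimal sketch of what "setting the `locale` for each field" might look like, assuming `ko` is the locale code for Korean and that the build you are running accepts `locale` on string fields (check the docs for your version). The collection name `articles` and the fields `title`/`body` are hypothetical. Instead of the official client, it posts the schema to the REST `/collections` endpoint with the `X-TYPESENSE-API-KEY` header. The host and key shown are placeholders.

```python
import json
import urllib.request


def build_collection_schema(name: str, string_fields: list[str], locale: str = "ko") -> dict:
    """Build a collection schema that sets `locale` on every string field."""
    return {
        "name": name,
        "fields": [
            {"name": field, "type": "string", "locale": locale}
            for field in string_fields
        ],
    }


def create_collection(host: str, api_key: str, schema: dict) -> dict:
    """POST the schema to a Typesense server's /collections endpoint and return the JSON reply."""
    request = urllib.request.Request(
        url=f"{host}/collections",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-TYPESENSE-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    schema = build_collection_schema("articles", ["title", "body"])
    print(json.dumps(schema, ensure_ascii=False, indent=2))
    # Requires a running server; host and API key below are placeholders:
    # create_collection("http://localhost:8108", "YOUR_API_KEY", schema)
```

Setting the locale per field (rather than globally) lets a single collection mix, say, a Korean `title` with fields tokenized differently.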
Pete
11/09/2022, 9:17 PM
Cool, I'll try that. Thanks.
There are definitely spaces between words. I can see how it could seem different from English and the Romance languages, though.
Jason Bosco
11/09/2022, 9:19 PM
I see! Would love to get your feedback on how it works with your Korean dataset. We don’t have native Korean speakers on the core team, so we rely entirely on community feedback to improve support.