Hi Anyone know what characters are tokenized, when...
# community-help
j
Hi. Does anyone know what characters are tokenized when setting locale to `danish`?
k
We use the ICU library's tokenizer. I don't have a definite answer on the characters it tokenizes.
👍 1
j
It actually seems to work when not setting locale, but when locale is set to `danish`, the characters `æ, ø, å` are tokenized. So searching for something in the default locale yields a result, whereas the same query (containing æ, ø or å) does not produce anything if locale is set to Danish. This seems weird?
f
Was this working on earlier versions of Typesense? Or is this a new issue that arose in `v28.0` and later?
j
Not sure - we just encountered it now in v28.0
f
Did you set `locale: da` or `locale: danish`?
j
da
f
Hm, it seems that it is tied to the tokenizer's way of interpreting Danish. Could you try changing it to `da_DK`?
j
We'll try that
Thanks
Hey. This doesn't work. The locale is not valid, we get this message: "The `locale` value of the field `name` is not valid."
k
We have to check what's happening. Can you please provide a small reproducible example in this format: https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
👍 1
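For anyone following along, a minimal sketch of what such a reproduction could contain is below. It only builds the request payloads and does not talk to a server. The collection names, sample documents and queries are made up for illustration; the field `name` and the locale value `da` come from the messages above. Treat it as a starting point under those assumptions, not as the format the linked gist requires.

```python
import json

# Hypothetical collection names, chosen only for this sketch.
COLLECTION_DEFAULT = "repro_default_locale"
COLLECTION_DANISH = "repro_danish_locale"


def build_schema(collection_name, locale=None):
    """Return a collection schema with a single string field called `name`."""
    field = {"name": "name", "type": "string"}
    if locale is not None:
        # "da" is the value reported in the thread.
        field["locale"] = locale
    return {"name": collection_name, "fields": [field]}


def build_search_params(query):
    """Return search parameters that query the `name` field."""
    return {"q": query, "query_by": "name"}


# Made-up documents that contain the three characters in question.
DOCUMENTS = [
    {"id": "1", "name": "Rødgrød med fløde"},
    {"id": "2", "name": "Æblegrød"},
    {"id": "3", "name": "Århus"},
]

# Made-up queries, one per character.
QUERIES = ["rødgrød", "æble", "århus"]

if __name__ == "__main__":
    # Print the payloads so they can be pasted into curl commands or a client.
    schemas = (
        build_schema(COLLECTION_DEFAULT),
        build_schema(COLLECTION_DANISH, "da"),
    )
    for schema in schemas:
        print(json.dumps(schema, ensure_ascii=False))
    for query in QUERIES:
        print(json.dumps(build_search_params(query), ensure_ascii=False))
```

Indexing the same documents into both collections and running the same queries against each should show whether the results differ only when `locale` is set.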