Hi, we are searching without specifying with which...
# community-help
j
Hi, we are searching without specifying which language we are searching in. Does Typesense work out the language by itself? E.g. Dutch and German share many words, as do some Latin, Norse, and Balkan languages. I assume that if I search in Dutch using such shared words, some German results will also be shown. Is that correct? If I add a custom field like “language_tag” and filter by e.g. “language_tag”=“DE”, will that decrease performance drastically, or is that fine? Thanks a lot!!!
k
Without a locale specified, Typesense just splits the text on spaces, removes accents etc., and indexes the result. So it can search such languages without any configuration. But some, like Cyrillic languages, require a locale.
❤️ 1
🔥 1
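The default behavior described above (split on whitespace, remove accents) can be illustrated with a rough Python sketch. This is not Typesense's actual tokenizer; the function name `simple_tokens` is hypothetical, and the lowercasing step is an assumption added for illustration.

```python
import unicodedata


def simple_tokens(text: str) -> list[str]:
    """Illustrative only: approximate locale-less tokenization."""
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, which leaves the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: lowercase, then split on any whitespace.
    return stripped.lower().split()


print(simple_tokens("Crème Brûlée"))
```

Because nothing here depends on the language, Dutch and German text would be tokenized identically, which is consistent with shared words matching documents in either language.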
j
Thanks a lot @Kishore Nallan! Where are locales specified? I've been browsing the API docs https://typesense.org/docs/0.21.0/api/documents.html#index-a-document but I can’t find how to specify a locale.
k
We've done a lot of work on improving support for Cyrillic languages in 0.23 RC builds. See an example here: https://github.com/typesense/typesense/issues/438#issuecomment-992127176 Use a recent RC build like typesense/typesense:0.23.0.rc30. For other languages, the locale might not add much value, but I haven't specifically tested that yet. Supported Cyrillic locales are:
locale == "el" ||
locale == "ru" || locale == "sr" || locale == "uk" || locale == "be";
j
Ahhhh so the ‘locale’ thing is not a feature, but just a field like
'filter_by' : 'num_employees:>100'
'filter_by' : 'food_name:vegetable'
so it is basically just a field of its own:
'filter_by' : 'locale:russian_language'
[dunno why the color is orange now] So the locales are not specified by some ISO standard, I can call them whatever I want, right?
k
Locale is a field's property.
❤️ 1
You could have 3 fields in your collection, each containing a different language. You use the locale property to tell Typesense which language each field belongs to.
❤️ 1
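A minimal sketch of what "locale is a field's property" could look like in a collection schema, written as a plain Python dict. The collection name `articles` and the field names `title_en`, `title_ru`, and `title_uk` are hypothetical; the locale codes are taken from the supported list quoted earlier in the thread.

```python
# Hypothetical collection: one field per language, locale set per field.
schema = {
    "name": "articles",
    "fields": [
        # No locale: the default whitespace-separated handling applies.
        {"name": "title_en", "type": "string"},
        # Cyrillic fields declare their locale explicitly.
        {"name": "title_ru", "type": "string", "locale": "ru"},
        {"name": "title_uk", "type": "string", "locale": "uk"},
    ],
}

# With the official Python client and a running server, this dict would
# typically be passed to client.collections.create(schema).
for field in schema["fields"]:
    print(field["name"], field.get("locale", "(default)"))
```

The locale is part of the schema and is not something to filter on, so a separate field such as `language_tag` would still be needed to restrict results to one language.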
j
So, is the ‘locale’ property a predefined feature now, or just a field? 😄 So instead of calling it ‘locale’ I could also call it ‘name_of_language’, correct?
k
Locale is the standard convention for defining language codes as per ISO convention.
🎉 1
❤️ 1
If you don't give a locale, a whitespace-separated ASCII language is assumed by default.
🙌 1
❤️ 1
j
AHHHH THANKS!!!! So if I don't specify anything, Typesense will just guess, but I can also specify it. Thanks a lot man!!
k
For Cyrillic languages some additional pre-processing is needed, so for those you have to specify the locale.
🙏 1
❤️ 1
j
thanks a lot you guys are awesome i love you!!!!!!
k
🙌
1
🙌 1
❤️ 1
🙏 1
j
@Kishore Nallan “you could have 3 fields in your collection”: is 3 fields some kind of limit? Or can I also have 15 fields?
k
You can have as many fields as you like.