#community-help

Understanding Typesense Language Search and Locale

TLDR Juri asked how Typesense handles search across different languages. Kishore Nallan explained that text is tokenized by whitespace by default, and that a locale, set as a property on a field, is required to index certain languages such as Cyrillic ones. The discussion also covered how locales are specified and confirmed that a collection can have as many fields as needed.

Solved
Feb 24, 2022 (20 months ago)
Juri
05:18 AM
Hi, we are searching without specifying which language we are searching in. Does Typesense figure out the language by itself? E.g. Dutch and German share many words, as do some Latin, Norse, and Balkan languages. I assume that if I search in Dutch using such shared words, some German results will also be shown. Is that correct?

If I add a custom field like “language_tag” and filter by e.g. “language_tag”=“DE”, will that decrease performance drastically, or is that fine?

thanks a lot!!!
Kishore Nallan
05:38 AM
Without a locale specified, Typesense just splits the text by space and removes accents etc. and indexes. So it can search such languages without any configuration. But some like Cyrillic require a locale.
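
A rough sketch of the default behaviour described above (split on spaces, remove accents). This is only an approximation for illustration and not Typesense's actual tokenizer; the function name `default_tokens` is hypothetical, and the lowercasing step is an assumption of this sketch.

```python
import unicodedata


def default_tokens(text):
    """Approximate locale-less tokenization: strip accents, split on whitespace."""
    # NFKD splits an accented character into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so that "é" becomes "e".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercasing is an assumption of this sketch, not confirmed in the thread.
    return stripped.lower().split()


print(default_tokens("Über Café señor"))
```

Because nothing here depends on the language, Dutch and German text end up tokenized the same way, so a word shared by both can match documents in either language.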

Juri
08:42 AM
Thanks a lot Kishore Nallan! Where are locales specified? I've been browsing the API docs https://typesense.org/docs/0.21.0/api/documents.html#index-a-document but I can’t find how to specify a locale.
Kishore Nallan
08:48 AM
We've done a lot of work on improving support for Cyrillic languages in 0.23 RC builds. See an example here: https://github.com/typesense/typesense/issues/438#issuecomment-992127176

Use a recent RC build like typesense/typesense:0.23.0.rc30. For other languages, setting a locale might not add much value, but I haven't specifically tested that yet.

Supported Cyrillic locales are:

locale == "el" ||
locale == "ru" || locale == "sr" || locale == "uk" || locale == "be";
Juri
09:05 AM
Ahhhh so the ‘locale’ thing is not a feature, but just a field like
'filter_by' : 'num_employees:>100'
'filter_by' : 'food_name:vegetable'

so it is basically just an own field:
'filter_by' : 'locale:russian_language'
[dunno why the color is orange now]
so the locales are not specified by some ISO standard, I can call them whatever I want, right?
Kishore Nallan
09:05 AM
Locale is a field's property.

Kishore Nallan
09:06 AM
You could have 3 fields in your collection, each containing a different language. You use the locale property to tell Typesense which language each field belongs to.
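
As an illustration of locale being a per-field property, here is a minimal sketch of a collection schema with one field per language. The collection name `articles` and the field names `title_en`, `title_ru`, and `title_uk` are hypothetical; the locale codes come from the list of supported locales given earlier in the thread.

```python
import json


def build_schema():
    """Return a collection schema with one text field per language."""
    return {
        "name": "articles",  # hypothetical collection name
        "fields": [
            # No locale: the default whitespace-based handling applies.
            {"name": "title_en", "type": "string"},
            # Locale set per field, using language codes from the thread.
            {"name": "title_ru", "type": "string", "locale": "ru"},
            {"name": "title_uk", "type": "string", "locale": "uk"},
        ],
    }


schema = build_schema()
print(json.dumps(schema, indent=2))
```

With the official Python client, a dict like this would typically be passed to `client.collections.create(schema)`; that call is left out here so the sketch runs without a server.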

Juri
09:07 AM
so, is the ‘locale’ property a predefined feature now, or just a field? 😄
so instead of calling it ‘locale’ i could also call it ‘name_of_language’, correct?
Kishore Nallan
09:09 AM
Locale is the standard way of identifying a language, using language codes as per the ISO convention.

Kishore Nallan
09:10 AM
If you don't give a locale, a whitespace-separated ASCII language is assumed by default.

Juri
09:10 AM
AHHHH THANKS!!!! So if i dont specify anything, typesense will just guess, but I can also specify it - thanks a lot man!!
Kishore Nallan
09:11 AM
For Cyrillic languages some additional pre-processing is needed, so for them you have to specify the locale.

Juri
09:11 AM
thanks a lot you guys are awesome i love you!!!!!!
Kishore Nallan
09:12 AM
🙌

Mar 03, 2022 (20 months ago)
Juri
09:31 AM
Kishore Nallan “you could have 3 fields in your collection”: is 3 fields some kind of limit? Or can I also have 15 fields?
Mar 07, 2022 (20 months ago)
Kishore Nallan
02:09 AM
You can have as many fields as you like.
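
To tie the thread together, a small sketch of how search parameters over several per-language fields might be assembled, optionally narrowed with the custom `language_tag` field from the original question. The helper `build_search_params` and the field names are hypothetical; `q`, `query_by`, and `filter_by` are the standard Typesense search parameters, and the filter uses the same `field:value` form as the examples earlier in the thread.

```python
def build_search_params(query, fields, language_tag=None):
    """Build Typesense search parameters for one or more language fields."""
    params = {
        "q": query,
        # query_by takes a comma-separated list of field names.
        "query_by": ",".join(fields),
    }
    if language_tag is not None:
        # language_tag is a custom field, as proposed in the question above.
        params["filter_by"] = f"language_tag:{language_tag}"
    return params


print(build_search_params("haus", ["title_de", "title_nl"]))
print(build_search_params("haus", ["title_de", "title_nl"], language_tag="DE"))
```

With the official Python client, such a dict would typically be passed to `client.collections["articles"].documents.search(params)`, where `articles` is again a made-up collection name.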