#random

Typesense Multilingual Document Search

TLDR Mridul needed to search across source and translated documents. Jason and Sergio suggested putting translations in the same document, using regex with specific fields for different locales, and rebuilding the collection when adding fields.

May 18, 2023 (6 months ago)
Mridul
01:26 AM
Hi all, our documents are structured as source and translated documents. When someone searches, we need to search across the source documents and their translations.
However, when the result is returned, we need it to return both the source and all its translated documents (even the ones not in the match), and count the whole set as 1 document.
Is this possible within Typesense?
All the related documents have a relationId field which is the same
Jason
02:05 AM
I’d recommend putting all translations inside the same document in Typesense, when indexing
Eg: { fieldA_en, fieldA_fr, fieldA_de, fieldB_en, fieldB_fr, fieldB_de }
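A minimal sketch of the merge Jason describes, assuming each database row carries the shared relationId plus a hypothetical `locale` column and `title`/`body` text columns (only relationId comes from the thread; the other names are illustrative):

```python
from collections import defaultdict


def merge_translations(rows, fields=("title", "body")):
    """Fold rows that share a relationId into one document with per-locale fields."""
    merged = defaultdict(dict)
    for row in rows:
        doc = merged[row["relationId"]]
        # Use the shared relationId as the document id (stored as a string).
        doc["id"] = str(row["relationId"])
        for field in fields:
            if field in row:
                # e.g. title + "fr" -> title_fr
                doc[f"{field}_{row['locale']}"] = row[field]
    return list(merged.values())


# Hypothetical rows: one source document and its French translation, plus another document.
rows = [
    {"relationId": 42, "locale": "en", "title": "Hello", "body": "Welcome"},
    {"relationId": 42, "locale": "fr", "title": "Bonjour", "body": "Bienvenue"},
    {"relationId": 7, "locale": "en", "title": "Goodbye", "body": "See you"},
]
docs = merge_translations(rows)
```

Because every translation lives in one document, a hit on any locale field returns the whole set, and it counts as a single result.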
Mridul
03:55 AM
Thanks Jason… I was trying to avoid it because the data sync from the DB would no longer be just a single-row update… but maybe that’s the only way
May 19, 2023 (6 months ago)
Sergio
09:08 AM
We rolled out this implementation too, with the field defined as title_.*
By using wildcards we could expand to future locales without major collection changes 🙂
Mridul
09:09 AM
Oh this is great… makes things so much better… can I index it as such?
Sergio
09:11 AM
You then index title_en or title_es and a new field is generated.
And we have some logic to query Typesense by the locale the user requires
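Sergio does not show that logic, so the following is only a sketch of one way it might look. The helper name and the `title` base field are assumptions; `q` and `query_by` are standard Typesense search parameters.

```python
def search_params(query, locales, base_fields=("title",)):
    """Build search parameters that target the per-locale fields, e.g. title_fr."""
    query_by = [f"{field}_{locale}" for field in base_fields for locale in locales]
    return {"q": query, "query_by": ",".join(query_by)}


# Search only the user's locale...
params_fr = search_params("bonjour", ["fr"])
# ...or list several locales to search across the source and its translations.
params_all = search_params("bonjour", ["fr", "en"], base_fields=("title", "body"))
```

The resulting dictionary would be passed to the search call of whichever Typesense client is in use. Each field listed in `query_by` has to exist in the collection already.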
Mridul
09:15 AM
How can I pass different locales when I define fields with regex? Eg: text_chinese needs the zh tokenizer, but text_en needs a different one
We can always define a new collection and start using that when we add a new language; however, that would mean a complete reindexing
Sergio
09:17 AM
For those you would need to define them specifically
Order matters, so you could do:
title_ja -> ja
title_zh -> zh
title_.* -> generic
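The ordering above could be written as a collection schema roughly like this. It is a sketch: the collection name and the `optional` flag are assumptions, while `name`, `type` and `locale` are regular Typesense field properties.

```python
def build_schema(name, base_field="title", locale_specific=("ja", "zh")):
    """Locale-specific fields first, then the catch-all regex field."""
    fields = [
        {
            "name": f"{base_field}_{locale}",
            "type": "string",
            "locale": locale,  # tokenizer for this language
            "optional": True,  # assumption: not every document has every translation
        }
        for locale in locale_specific
    ]
    # The regex field goes last so the specific definitions above take precedence,
    # following Sergio's note that order matters.
    fields.append({"name": f"{base_field}_.*", "type": "string"})
    return {"name": name, "fields": fields}


schema = build_schema("documents")  # "documents" is a hypothetical collection name
```

A new locale that works with the default tokenizer (say title_pt) then needs no schema change. One that needs its own tokenizer means adding another explicit entry ahead of the regex field.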

Mridul
09:18 AM
Okay…seeing that there are only a handful of tokenizers currently, we can do a comprehensive one without much overhead
Thanks a tonne @Sergio
Sergio
09:21 AM
Currently we rebuild the whole collection when adding fields, and reindex the whole database.
There is an option to add a field to an existing collection, but it still requires indexing all the data.
Since there is no "collection migration management", we just avoid conflicts by recreating everything and then moving the alias.
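A rough sketch of that rebuild-then-switch flow, assuming the official `typesense` Python client is passed in as `client`. The function name and the timestamped naming scheme are made up for illustration; this is not Sergio's actual code.

```python
import time


def rebuild_and_swap(client, alias, schema, documents):
    """Create a fresh collection, index everything into it, then repoint the alias."""
    # Hypothetical naming scheme: alias plus a timestamp, so each rebuild is unique.
    new_name = f"{alias}_{int(time.time())}"

    # 1. Create the new collection with the updated schema.
    client.collections.create({**schema, "name": new_name})

    # 2. Reindex the whole dataset into it.
    client.collections[new_name].documents.import_(documents, {"action": "upsert"})

    # 3. Move the alias; searches against the alias now hit the new collection.
    client.aliases.upsert(alias, {"collection_name": new_name})
    return new_name
```

The application always queries the alias, so the switch is a single call. In practice the import results should be checked before moving the alias, and the old collection can be dropped afterwards.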
