Hi all our documents are structured as source and translated typesense #random

Mridul Khanal

05/18/2023, 1:26 AM

Hi all, our documents are structured as source and translated documents. When someone searches, we need to search across the source and its translations. However, when the result is returned, we need it to return both the source and all its translated documents (even the ones not in the match), and count the set as 1 document. Is this possible within Typesense? All the related documents have a relationId field with the same value.

Jason Bosco

05/18/2023, 2:05 AM

I’d recommend putting all translations inside the same document in Typesense, when indexing

Jason Bosco

05/18/2023, 2:05 AM

Eg:

{ fieldA_en, fieldA_fr, fieldA_de, fieldB_en, fieldB_fr, fieldB_de }
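A minimal sketch of how per-language rows could be folded into one document per relationId before indexing. The row shape (`lang`, `fields`) and the sample data are assumptions made for illustration; only `relationId` and the `field_locale` naming come from the thread.

```python
from collections import defaultdict


def merge_translations(rows):
    """Group per-language rows by relationId into one document per group."""
    merged = defaultdict(dict)
    for row in rows:
        doc = merged[row["relationId"]]
        # Use the shared relationId as the document id (a string).
        doc["id"] = row["relationId"]
        # Suffix each translatable field with the row's language code.
        for field, value in row["fields"].items():
            doc[f"{field}_{row['lang']}"] = value
    return list(merged.values())


# Hypothetical rows as they might come out of the database.
rows = [
    {"relationId": "42", "lang": "en", "fields": {"title": "Hello", "body": "Good morning"}},
    {"relationId": "42", "lang": "fr", "fields": {"title": "Bonjour", "body": "Bon matin"}},
    {"relationId": "43", "lang": "en", "fields": {"title": "Goodbye", "body": "See you"}},
]
documents = merge_translations(rows)
```

With this shape, a match in any language returns the whole set as a single hit, so the result count is per relationId rather than per translation.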

Mridul Khanal

05/18/2023, 3:55 AM

Thanks Jason… I was trying to avoid it because the data sync from the DB would no longer be just single-row updates… but maybe that’s the only way

Sergio Behrends

05/19/2023, 9:08 AM

We rolled out this implementation too

title_.*

By using wildcards we could expand to future locales without major collection changes 🙂

Mridul Khanal

05/19/2023, 9:09 AM

Oh this is great… makes things so much better… can I index it as such?

Sergio Behrends

05/19/2023, 9:11 AM

You then index

title_en

title_es

and a new field is generated.

Sergio Behrends

05/19/2023, 9:12 AM

https://typesense.org/docs/0.24.1/api/collections.html#with-auto-schema-detection
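A rough sketch of what such a schema could look like, with a regex field name so that any `title_<locale>` key in a document is picked up. The collection name `articles` and the `matching_fields` helper are hypothetical; the helper only approximates the name matching locally and is not Typesense's own implementation.

```python
import re

# Schema with a regex field name: any document key matching title_.*
# is indexed as a string field.
schema = {
    "name": "articles",  # hypothetical collection name
    "fields": [
        {"name": "title_.*", "type": "string"},
    ],
}


def matching_fields(document, schema):
    """Local approximation: list document keys covered by a schema field pattern."""
    patterns = [field["name"] for field in schema["fields"]]
    return sorted(
        key for key in document
        if any(re.fullmatch(pattern, key) for pattern in patterns)
    )


doc = {"id": "42", "title_en": "Hello", "title_es": "Hola"}
covered = matching_fields(doc, schema)

# With the official Python client, the schema would be created with something like:
#   client.collections.create(schema)
```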

Sergio Behrends

05/19/2023, 9:12 AM

And we have some logic to query Typesense by the locale the user requires
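The thread does not show that logic, but one way it might look is a small helper that picks the `query_by` field from the user's locale. The function name, the supported locale list, and the fallback are all assumptions; `q` and `query_by` are standard Typesense search parameters.

```python
def build_search_params(query, locale, supported=("en", "es"), fallback="en"):
    """Build search parameters that target the title field for one locale."""
    # Fall back to a default language when the requested locale is not indexed.
    lang = locale if locale in supported else fallback
    return {"q": query, "query_by": f"title_{lang}"}


params_es = build_search_params("hola", "es")
params_de = build_search_params("hallo", "de")  # "de" is not in the supported list
```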

Mridul Khanal

05/19/2023, 9:15 AM

How can I pass different locales when I define fields with regex? E.g. text_chinese needs the zh tokenizer, but text_en needs a different one

Mridul Khanal

05/19/2023, 9:17 AM

We can always define a new collection and start using that when we add a new language; however, that would mean a complete reindexing

Sergio Behrends

05/19/2023, 9:17 AM

For those you would need to define them specifically

Sergio Behrends

05/19/2023, 9:17 AM

Order matters, so you could do

title_ja -> ja
title_zh -> zh
title_.* -> generic

👍 1
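Spelled out as schema fields, the ordering above might look like the sketch below, using the `locale` field parameter for the language-specific entries. The `resolve_locale` helper is hypothetical and only illustrates the first-match-wins behaviour described in the message; it is an assumption about how ordering is applied, not Typesense's own code.

```python
import re

# Specific locale fields first, generic wildcard last.
fields = [
    {"name": "title_ja", "type": "string", "locale": "ja"},
    {"name": "title_zh", "type": "string", "locale": "zh"},
    {"name": "title_.*", "type": "string"},  # no locale: default tokenizer
]


def resolve_locale(field_name, fields):
    """Return the locale of the first schema field whose name matches."""
    for field in fields:
        if re.fullmatch(field["name"], field_name):
            return field.get("locale", "")
    return None  # no schema field covers this name


zh_locale = resolve_locale("title_zh", fields)
es_locale = resolve_locale("title_es", fields)
```

If the wildcard entry were listed first, `title_zh` would match it before reaching the `zh` entry in this model, which is why the specific fields go on top.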

Mridul Khanal

05/19/2023, 9:18 AM

Okay…seeing that there are only a handful of tokenizers currently, we can do a comprehensive one without much overhead

Mridul Khanal

05/19/2023, 9:19 AM

Thanks a tonne @Sergio

Sergio Behrends

05/19/2023, 9:21 AM

Currently we rebuild the whole collection when adding fields, and reindex the whole database. There is an option to add a field to an existing collection, but that still requires indexing all the data. Since there is no "collection migration management", we just avoid conflicts by recreating everything and then moving the alias.
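A minimal sketch of that rebuild-then-swap flow, assuming a client object shaped like the official Typesense Python client (`collections.create`, `documents.import_`, `aliases.upsert`). The function name and the timestamped naming scheme are assumptions for illustration.

```python
import time


def rebuild_and_swap(client, schema, documents, alias):
    """Create a fresh collection, index everything, then repoint the alias."""
    # Timestamped name so the new collection never clashes with the live one.
    new_name = f"{alias}_{int(time.time())}"
    client.collections.create({**schema, "name": new_name})
    client.collections[new_name].documents.import_(documents, {"action": "create"})
    # Searches that go through the alias switch to the new collection here.
    client.aliases.upsert(alias, {"collection_name": new_name})
    return new_name
```

Queries keep addressing the alias throughout, so the old collection can be dropped once the swap has been verified.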
