Hi my company is hosting Typesense search solutions for a po typesense #community-help


Peter Thramkrongart

07/25/2025, 7:38 AM

Hi, my company is hosting Typesense search solutions for a portfolio of e-commerce companies. We are considering getting into vector search using text embeddings. One thing is unclear to me. As far as I can gather, most models are meant for words or single sentences. The documents in our indexes usually have 10-20 fields with product name, category paths, description, brand and so on. Can the models handle whole paragraphs? How much text can the models handle if we use auto-embeddings?

Harisaran

07/25/2025, 7:56 AM

Can the models handle whole paragraphs?

It all boils down to the max_token of the embedding model. If you are planning to use auto-embedding, ts/nomic-embed-text-v1.5 can handle around 8192 tokens. (This is the highest token limit available with auto-embedding.)
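For multi-field products like the ones described in the question, a minimal sketch of an auto-embedding schema might look like this. The collection name and field names (`products`, `product_name`, `category_path`, `description`, `brand`, `embedding`) are hypothetical. The `embed.from` and `model_config.model_name` keys follow the shape Typesense documents for auto-embedding fields. As far as I understand, the fields listed in `from` are combined into one input, so the token limit would apply to the combined text rather than to each field.

```python
def build_products_schema(model_name="ts/nomic-embed-text-v1.5"):
    """Return a collection schema dict with one auto-embedding field.

    Field names and the collection name are illustrative assumptions.
    """
    # Text fields that should contribute to the embedding.
    source_fields = ["product_name", "category_path", "description", "brand"]

    # Plain string fields for the product attributes.
    fields = [{"name": name, "type": "string"} for name in source_fields]

    # Vector field that Typesense fills in from the source fields.
    fields.append({
        "name": "embedding",
        "type": "float[]",
        "embed": {
            "from": source_fields,
            "model_config": {"model_name": model_name},
        },
    })
    return {"name": "products", "fields": fields}


schema = build_products_schema()
# With the official Python client this dict would be passed to
# client.collections.create(schema); no request is sent here.
```

Keeping the list of source fields short (name, category, brand, a trimmed description) is one way to stay well under the model's token limit.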

Peter Thramkrongart

07/25/2025, 7:57 AM

Is it multilingual?

Harisaran

07/25/2025, 8:02 AM

No, it is not a multilingual model.

Harisaran

07/25/2025, 8:02 AM

If you want a multilingual model, you can convert the model into an ONNX model and then use it with Typesense auto-embedding.

Peter Thramkrongart

07/25/2025, 8:03 AM

I see Typesense supports this: https://huggingface.co/intfloat/multilingual-e5-large. Does max_length correspond to max tokens?

Harisaran

07/25/2025, 8:06 AM

Yep, correct. With e5 models the max size is around 512 tokens.
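To get a rough feel for whether a product's combined fields fit in a 512-token limit, a quick estimate along these lines may help. This is only a sketch: the 4-characters-per-token ratio is a rule-of-thumb assumption for English text and will be off for other languages, and the helper names and sample product are hypothetical. An exact count needs the model's own tokenizer.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Crude token estimate; the chars-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def fits_token_limit(document, fields, max_tokens=512):
    """Check whether the space-joined field values stay within max_tokens."""
    combined = " ".join(str(document.get(field, "")) for field in fields)
    return estimate_tokens(combined) <= max_tokens


# Hypothetical product document.
product = {
    "product_name": "Trail running shoe",
    "category_path": "Shoes > Running > Trail",
    "brand": "ExampleBrand",
    "description": "Lightweight shoe with a grippy outsole for wet terrain.",
}
embed_fields = ["product_name", "category_path", "brand", "description"]
print(fits_token_limit(product, embed_fields))
```

Running a check like this across an index shows how many documents would come close to the limit before picking a model.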

Peter Thramkrongart

07/25/2025, 8:08 AM

Okay, thanks. Now I have something more concrete to look for!

🙌 1
