Hi, my company is hosting Typesense search solutio...
# community-help
p
Hi, my company is hosting Typesense search solutions for a portfolio of e-commerce companies. We are considering getting into vector search using text embeddings. One thing is unclear to me. As far as I can gather, most models are meant for words or single sentences. The documents in our indexes usually have 10-20 fields with product name, category paths, description, brand and so on. Can the models handle whole paragraphs? How much text can the models handle if we use auto-embeddings?
h
Can the models handle whole paragraphs?
It all boils down to the `max_token` limit of the embedding model. If you are planning to use auto-embedding, `ts/nomic-embed-text-v1.5` can handle around 8192 tokens, which is the highest token limit among the models available for auto-embedding.
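For a multi-field product document, a minimal sketch of a collection schema that embeds several fields with this model could look like the following. The collection name and field names (`products`, `product_name`, `brand`, `category_path`, `description`) are hypothetical; only the `embed` / `model_config` structure and the model name come from Typesense's auto-embedding setup. The script only builds and prints the schema, it does not contact a server.

```python
import json

# Hypothetical field names for an e-commerce product collection.
EMBED_SOURCE_FIELDS = ["product_name", "brand", "category_path", "description"]


def build_schema(collection_name, model_name="ts/nomic-embed-text-v1.5"):
    """Build a collection schema with one auto-embedding field."""
    # Plain string fields that hold the product text.
    fields = [{"name": name, "type": "string"} for name in EMBED_SOURCE_FIELDS]
    # The vector field: Typesense generates it from the fields listed in "from".
    fields.append({
        "name": "embedding",
        "type": "float[]",
        "embed": {
            "from": EMBED_SOURCE_FIELDS,
            "model_config": {"model_name": model_name},
        },
    })
    return {"name": collection_name, "fields": fields}


schema = build_schema("products")
print(json.dumps(schema, indent=2))
```

The resulting dictionary is the kind of body you would send when creating the collection. All the fields in `from` share one token budget, so with 10-20 fields it is worth listing only the ones that carry meaning for search.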
p
Is it multilingual?
h
No, it is not a multilingual model.
If you want a multilingual model, you can convert one into an ONNX model and then use it with Typesense auto-embedding.
p
I see Typesense supports this: https://huggingface.co/intfloat/multilingual-e5-large. Does max_length correspond to max tokens?
h
Yep, correct. With e5 models the max size is around 512 tokens.
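To get a feel for whether a product document fits in a 512-token budget, here is a rough sketch. It assumes about 4 characters per token, which is only a rule of thumb for English text; real counts depend on the model's tokenizer and can be noticeably higher for other languages. The helper names and the sample product are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; the real count depends on the tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_token_limit(document, fields, max_tokens):
    """Join the chosen fields and compare the estimate with the limit."""
    combined = " ".join(str(document.get(name, "")) for name in fields)
    estimate = estimate_tokens(combined)
    return estimate, estimate <= max_tokens


# Hypothetical product document.
product = {
    "product_name": "Trail running shoe",
    "brand": "ExampleBrand",
    "category_path": "Shoes > Running > Trail",
    "description": "Lightweight trail shoe with a grippy outsole. " * 20,
}

estimate, ok = fits_token_limit(
    product,
    ["product_name", "brand", "category_path", "description"],
    max_tokens=512,
)
print(f"estimated tokens: {estimate}, within limit: {ok}")
```

If the estimate is close to or above the limit, text at the end may not be reflected in the embedding, so it is safer to put the most important fields first or trim long descriptions.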
p
Okay, thanks. Now I have something more concrete to look for!