Daniel Martel
02/20/2025, 2:19 AM
… text-embedding-005 model, but the docs aren't super clear on how Typesense handles embedding documents (or I missed it):
• I want to create 256-dimension embeddings... is setting num_dim enough?
• Google has this concept of Task Types - is that used? Using RETRIEVAL_DOCUMENT and RETRIEVAL_QUERY might be optimal... not 100% sure.
• I write to this collection daily, but the fields I want to embed don't change that often. Does Typesense only update the embedding if an embed.from field changes? Or does a write event trigger a re-computation regardless?
• I have a collection with ~800k documents... is it worth trying a batch size above 200? Not sure if I'll hit rate limits or anything.
• One of the fields I want to embed can be very long - do you truncate it to a certain max length?
• How do you preprocess and format/order the embedding if there are multiple embed.from fields (and arrays etc.)?
Jason Bosco
02/20/2025, 2:42 AM
Ozan Armağan
02/20/2025, 5:43 PM
1. … num_dim according to the response.
2. We don't use this currently.
3. Yes, the embeddings will only be updated if any of the fields in embed.from is updated.
4. You will probably hit the rate limits; I think you should leave it at 200.
5. For local embedding models we truncate the inputs to 512 tokens, but for the remote embedder services (Google, OpenAI etc.) we don't do any truncation as they handle this internally.
6. We join all fields with a space, in the same order as embed.from.
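Answers 3 and 6 above can be pictured with a small Python sketch. This is not Typesense's actual code: the function names `build_embedding_input` and `needs_reembedding` are hypothetical, and both the skipping of missing fields and the flattening of array fields element by element are assumptions that the thread does not confirm.

```python
from typing import Any, Mapping, Sequence


def build_embedding_input(doc: Mapping[str, Any], embed_from: Sequence[str]) -> str:
    """Join the embed.from fields with single spaces, in embed.from order."""
    parts = []
    for field in embed_from:
        value = doc.get(field)
        if value is None:
            continue  # assumption: missing fields are skipped
        if isinstance(value, (list, tuple)):
            # assumption: array fields are flattened element by element
            parts.extend(str(item) for item in value)
        else:
            parts.append(str(value))
    return " ".join(parts)


def needs_reembedding(
    old_doc: Mapping[str, Any],
    new_doc: Mapping[str, Any],
    embed_from: Sequence[str],
) -> bool:
    """Return True only if at least one embed.from field differs."""
    return any(old_doc.get(f) != new_doc.get(f) for f in embed_from)


# Example: a daily write that only touches a field outside embed.from.
embed_from = ["title", "tags"]
old = {"title": "Red shoes", "tags": ["sale", "new"], "price": 10}
new = {**old, "price": 12}

text_to_embed = build_embedding_input(old, embed_from)
recompute = needs_reembedding(old, new, embed_from)
```

Under these assumptions, a write that changes only `price` leaves the text to embed unchanged, so no new embedding call would be needed.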
Ozan Armağan
02/20/2025, 5:44 PM
Daniel Martel
02/20/2025, 11:40 PM
Ozan Armağan
02/21/2025, 10:33 AM
… num_dim manually, but not yet for Google models. Could you also open an issue for that?
Matheus Bombonato
02/21/2025, 12:18 PM
"Yes, the embeddings will only be updated if any of the fields in embed.from is updated."
Matheus Bombonato
02/21/2025, 12:18 PM
Matheus Bombonato
02/21/2025, 12:20 PM
Matheus Bombonato
02/21/2025, 12:20 PM
Fanis Tharropoulos
02/21/2025, 12:57 PM
"please, add that to the docs. Couldn't find this information there"
Will do
Fanis Tharropoulos
02/21/2025, 1:48 PM
Daniel Martel
02/21/2025, 4:45 PM
Ozan Armağan
02/24/2025, 10:04 AM
"Do you include facets in the embedding for a search query?"
No, we only embed the query.
"Can I store the embedding for a query so it doesn't get recalculated every time?"
We already do this automatically: we cache the results of embedding calls and reuse them.
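The caching idea described above can be sketched in a few lines of Python. This only illustrates reusing results for repeated query strings; it is not how Typesense implements its cache, and `fake_remote_embed` is a hypothetical stand-in for a call to a remote embedding service. The cache key is the query text alone, in line with the statement that only the query is embedded.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the (fake) remote service is hit


def fake_remote_embed(text: str) -> tuple:
    """Hypothetical stand-in for a remote embedding API call."""
    calls["count"] += 1
    # Toy deterministic "embedding"; not a real model output.
    return (float(len(text)), float(sum(map(ord, text)) % 97))


@lru_cache(maxsize=1024)
def embed_query(query: str) -> tuple:
    """Return the embedding for a query, computing it at most once per string."""
    return fake_remote_embed(query)


first = embed_query("running shoes")
second = embed_query("running shoes")  # served from the cache, no second call
```

A bounded `lru_cache` is used here only to keep the sketch short; the size limit and eviction policy of the real cache are not stated in the thread.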
Daniel Martel
02/24/2025, 6:11 PM
Daniel Martel
02/24/2025, 11:26 PM
Daniel Martel
02/25/2025, 9:44 PM
Ozan Armağan
02/26/2025, 10:04 PM