# community-help
d
Hi, I want to use Google's `text-embedding-005` model, but the docs aren't super clear on how Typesense handles embedding documents (or I missed it):
• I want to create 256-dimension embeddings... is setting `num_dim` enough?
• Google has this concept of Task Types - is that used? Using `RETRIEVAL_DOCUMENT` and `RETRIEVAL_QUERY` might be optimal... not 100% sure.
• I write to this collection daily, but the fields I want to embed don't change that often. Does Typesense only update the embedding if an `embed.from` field changes, or does a write event trigger a re-computation regardless?
• I have a collection with ~800k documents... is it worth trying a batch size above 200? Not sure if I'll hit rate limits or anything.
• One of the fields I want to embed can be very long - do you truncate it to a certain max length?
• How do you preprocess and format/order the embedding input if there are multiple `embed.from` fields (and arrays etc.)?
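For context, an auto-embedding field in a Typesense collection schema looks roughly like the sketch below. The exact `model_config` keys for a Google/GCP model (credentials, project ID) are assumptions here, not confirmed in this thread - check the Typesense docs for your version:

```python
# Sketch of a Typesense collection schema with an auto-embedding field.
# The GCP credential keys below are placeholders/assumptions; verify the
# exact model_config shape against the Typesense docs for your version.
schema = {
    "name": "articles",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "body", "type": "string"},
        {
            "name": "embedding",
            "type": "float[]",
            "embed": {
                # The text to embed is built from these fields,
                # space-joined in this order.
                "from": ["title", "body"],
                "model_config": {
                    "model_name": "gcp/text-embedding-005",
                    # Placeholder credentials; supply real GCP values.
                    "access_token": "...",
                    "refresh_token": "...",
                    "client_id": "...",
                    "client_secret": "...",
                    "project_id": "my-gcp-project",
                },
            },
        },
    ],
}

# With the official Python client this would then be created via
# client.collections.create(schema).
print(schema["fields"][2]["embed"]["from"])  # ['title', 'body']
```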
j
CC: @Ozan Armağan
o
Hi Daniel,
1. You won't need to set the dimensions manually; Typesense makes an API call to the model with a dummy text and sets `num_dim` according to the response.
2. We don't use this currently.
3. Yes, the embeddings are only updated if any of the fields in `embed.from` is updated.
4. You would probably hit rate limits; I think you should leave it at 200.
5. For local embedding models we truncate the inputs to 512 tokens, but for remote embedder services (Google, OpenAI, etc.) we don't do any truncation, as they handle this internally.
6. We join all fields with a space, in the same order as `embed.from`.
You can open a GitHub issue for point #2.
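Point 6 ("join all fields with a space, in the same order as `embed.from`") can be sketched like this. This is a simplified illustration, not Typesense's actual code, and the handling of array fields (flattened in place) is an assumption:

```python
def build_embedding_input(document, embed_from):
    """Join the embed.from fields with spaces, in order.

    Flattening array-valued fields in place is an assumption about
    Typesense's behavior, not something confirmed in this thread.
    """
    parts = []
    for field in embed_from:
        value = document.get(field, "")
        if isinstance(value, list):
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    return " ".join(parts)

doc = {"title": "Hello", "tags": ["a", "b"], "body": "World"}
print(build_embedding_input(doc, ["title", "tags", "body"]))
# "Hello a b World"
```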
d
For #1 - the default dimension size is 768, but they allow you to set it between 1-768. I would like to set it to 256. So are you saying I can't customize it, and it will just default to 768?
o
We support that for OpenAI’s text-embedding-3-* models by setting `num_dim` manually, but not yet for Google models. Could you also open an issue for that?
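For comparison, the OpenAI case mentioned above would look roughly like this (a sketch following the field conventions of Typesense auto-embedding schemas; the exact placement of `num_dim` and the `model_config` keys are assumptions to verify against the docs):

```python
# Sketch: requesting 256 dimensions from an OpenAI text-embedding-3-*
# model by setting num_dim on the embedding field. Field/key names are
# assumptions based on Typesense's auto-embedding schema conventions.
embedding_field = {
    "name": "embedding",
    "type": "float[]",
    "num_dim": 256,  # honored for OpenAI text-embedding-3-* models
    "embed": {
        "from": ["title"],
        "model_config": {
            "model_name": "openai/text-embedding-3-small",
            "api_key": "sk-...",  # placeholder
        },
    },
}
print(embedding_field["num_dim"])  # 256
```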
m
> Yes, the embeddings are only updated if any of the fields in `embed.from` is updated.
This is very important information.
I was about to create a whole data pipeline to check if my "embed.from" fields changed to avoid unnecessary embedding recreation
please, add that to the docs. Couldn't find this information there
f
> please, add that to the docs. Couldn't find this information there
Will do
d
I can definitely open those issues. I'm also curious:
• Do you include facets in the embedding for a search query?
• Can I store the embedding for a query so it doesn't get recalculated every time?
o
@Daniel Martel
> Do you include facets in the embedding for a search query?
No, we only embed the query.
> Can I store the embedding for a query so it doesn't get recalculated every time?
We already do this automatically: we cache the results of embedding calls and reuse them.
👍 1
d
Sometimes users may have no query but only facets selected... in that case, for hybrid search, are embeddings just not used? Is it worth opening an issue to include facets in the embedding? Especially if we embed facet fields in the document embedding.
^ nvm looking back on this that wouldn't really make sense lol. I'm assuming it just becomes a keyword search at that point? I think embedding facets when there's a query makes sense though.
o
Thanks Daniel, we will add those to our roadmap.
🙌 1