# community-help
r
Holy i didn't realize you guys were working on allowing for vector storage as well. This is hype team. Can you give any guidance on what to expect with the various typesense cluster sizes & how much memory the vectors take up in terms of space?
🙌 1
Also can we combine both keyword (q: 'keyword') and vector search together?
j
It would depend on the number of dimensions in your embeddings. These are essentially floating point arrays, so if you have X dimensions per embedding and Y documents, you'd need at least X * Y * 4 * 2 bytes of RAM to index the embeddings (4 bytes per float32 value, with roughly a 2x indexing overhead)
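As a sketch of that estimate (the ×4 is bytes per float32 and the ×2 is the assumed indexing overhead; the 384-dim / 1M-document figures below are just an illustration, not from the thread):

```python
# Back-of-envelope RAM estimate for indexed embeddings,
# per the X * Y * 4 * 2 rule above.
def embedding_ram_bytes(num_dims: int, num_docs: int) -> int:
    bytes_per_float = 4   # embeddings stored as float32
    overhead_factor = 2   # assumed ~2x indexing overhead
    return num_dims * num_docs * bytes_per_float * overhead_factor

# Example: 384-dim embeddings for 1M documents
gb = embedding_ram_bytes(384, 1_000_000) / 1e9
print(f"{gb:.2f} GB")  # → 3.07 GB
```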
👍 1
Also can we combine both keyword (q: ‘keyword’) and vector search together?
This is not supported yet.
r
filter_by still works tho?
j
Yup!
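For anyone reading along, a filtered vector search request might look like the sketch below. The collection name, field names, and filter value are hypothetical, and the `vector_query` string follows the syntax in Typesense's vector search docs; no live client call is made here:

```python
# Hypothetical request combining vector search with filter_by.
# Assumes a collection "docs" with an "embedding" float[] field
# (num_dim matching your model) and an indexed "category" field.
query_vector = [0.12, 0.45, 0.83]  # in practice, from your embedding model

search_params = {
    "q": "*",  # no keyword query; keyword + vector isn't combined here
    "vector_query": f"embedding:({query_vector}, k:100)",  # 100 nearest neighbors
    "filter_by": "category:=articles",  # metadata filter on the matches
}

# With the typesense-python client this would be sent as:
#   client.collections["docs"].documents.search(search_params)
print(search_params["vector_query"])  # → embedding:([0.12, 0.45, 0.83], k:100)
```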
r
Damn this is wild. We're gonna have to move our vectors off pinecone and onto typesense now too.
😄 1
j
Out of curiosity, does Pinecone support text keyword queries with vector search?
r
Nope. Just vector search. We filter by the metadata after matching K closest vectors.
j
I see
r
Hence why keyword + vector search would be a game changer.
👍 1
filter_by with the vectors is already powerful enough with the way we index our documents
🙌 1
j
Btw, does pinecone support filter_by with vector search?
r
Yeah it does Jason. You can filter by "metadata", which is indexed. I can store roughly 500k 1536-dim floating point vectors (with only 1kb of metadata associated w/ each vector) per "pod", as they call it, which is ~$0.10/hr on Pinecone. So the comparable Typesense cluster at 8gb at $0.17/hr can store roughly the same number of vectors. Based on your math, I can store 500k vectors * 1536 * 8 bytes = 6.14gb, with another 2gb to spare for metadata (roughly 4kb per vector)
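The arithmetic in that comparison checks out; as a quick sanity check of the figures quoted above:

```python
# Sanity-check of the figures quoted above: 500k vectors of
# 1536 dims at 8 bytes each (4-byte float32 * 2x overhead).
vectors = 500_000
dims = 1536
embedding_gb = vectors * dims * 8 / 1e9
metadata_gb = 8.0 - embedding_gb               # headroom in an 8 GB cluster
per_vector_kb = metadata_gb * 1e9 / vectors / 1e3

print(f"{embedding_gb:.2f} GB for embeddings")         # → 6.14 GB for embeddings
print(f"~{per_vector_kb:.0f} KB metadata per vector")  # → ~4 KB metadata per vector
```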
The difference of course being I can have collections of different types (vector based data & keyword based data) all within one cluster. And I believe y'all are uniquely positioned to capitalize on hybrid search (keyword + vector)
❤️ 1
j
Thanks Robert! That’s really good context to have.
Btw, the estimate formula I shared is based on my intuition. We don't yet have enough benchmarks to see how it holds up across different types of datasets. So when you do get a chance to index your data in a Typesense cluster, could you let me know what the memory usage looks like?