# community-help
r
Holy i didn't realize you guys were working on allowing for vector storage as well. This is hype team. Can you give any guidance on what to expect with the various typesense cluster sizes & how much memory the vectors take up in terms of space?
🙌 1
Also can we combine both keyword (q: 'keyword') and vector search together?
j
It would depend on the number of dimensions in your embeddings. These are essentially floating point arrays, so if you have X dimensions per embedding and Y documents, you'd need at least X * Y * 4 * 2 bytes of RAM to index the embeddings (4 bytes per float32 value, with roughly a 2x indexing overhead)
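As a sketch of that estimate (the ×4 is bytes per float32 and the ×2 is the assumed indexing overhead; the 384-dim / 1M-document figures below are just an illustration, not from the thread):

```python
# Back-of-envelope RAM estimate for indexed embeddings,
# per the X * Y * 4 * 2 rule above.
def embedding_ram_bytes(num_dims: int, num_docs: int) -> int:
    bytes_per_float = 4   # embeddings stored as float32
    overhead_factor = 2   # assumed ~2x indexing overhead
    return num_dims * num_docs * bytes_per_float * overhead_factor

# Example: 384-dim embeddings for 1M documents
gb = embedding_ram_bytes(384, 1_000_000) / 1e9
print(f"{gb:.2f} GB")  # → 3.07 GB
```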
👍 1
Also can we combine both keyword (q: ‘keyword’) and vector search together?
This is not supported yet.
r
filter_by still works tho?
j
Yup!
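For anyone reading along, a filtered vector search request might look like the sketch below. The collection name, field names, and filter value are hypothetical, and the `vector_query` string follows the syntax in Typesense's vector search docs; no live client call is made here:

```python
# Hypothetical request combining vector search with filter_by.
# Assumes a collection "docs" with an "embedding" float[] field
# (num_dim matching your model) and an indexed "category" field.
query_vector = [0.12, 0.45, 0.83]  # in practice, from your embedding model

search_params = {
    "q": "*",  # no keyword query; keyword + vector isn't combined here
    "vector_query": f"embedding:({query_vector}, k:100)",  # 100 nearest neighbors
    "filter_by": "category:=articles",  # metadata filter on the matches
}

# With the typesense-python client this would be sent as:
#   client.collections["docs"].documents.search(search_params)
print(search_params["vector_query"])  # → embedding:([0.12, 0.45, 0.83], k:100)
```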
r
Damn this is wild. We're gonna have to move our vectors off pinecone and onto typesense now too.
😄 1
j
Out of curiosity, does Pinecone support text keyword queries with vector search?
r
Nope. Just vector search. We filter by the metadata after matching K closest vectors.
j
I see
r
Hence why keyword + vector search would be a game changer.
👍 1
filter_by with the vectors is already powerful enough with the way we index our documents
🙌 1
j
Btw, does pinecone support filter_by with vector search?
r
Yeah it does Jason. You can filter by "metadata", which is indexed. I can store roughly 500k 1536-dim floating point vectors (with only 1kb of metadata associated w/ each vector) per "pod", as they call it, which is ~$0.10/hr on Pinecone. So the comparable Typesense cluster at 8gb at $0.17/hr can store roughly the same number of vectors. Based on your math, I can store 500k vectors * 1536 * 8 bytes = 6.14gb, with another 2gb to spare for metadata (roughly 4kb per vector)
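The arithmetic in that comparison checks out; as a quick sanity check of the figures quoted above:

```python
# Sanity-check of the figures quoted above: 500k vectors of
# 1536 dims at 8 bytes each (4-byte float32 * 2x overhead).
vectors = 500_000
dims = 1536
embedding_gb = vectors * dims * 8 / 1e9
metadata_gb = 8.0 - embedding_gb               # headroom in an 8 GB cluster
per_vector_kb = metadata_gb * 1e9 / vectors / 1e3

print(f"{embedding_gb:.2f} GB for embeddings")         # → 6.14 GB for embeddings
print(f"~{per_vector_kb:.0f} KB metadata per vector")  # → ~4 KB metadata per vector
```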
The difference of course being I can have collections of different types (vector based data & keyword based data) all within one cluster. And I believe y'all are uniquely positioned to capitalize on hybrid search (keyword + vector)
❤️ 1
j
Thanks Robert! That’s really good context to have.
Btw, the estimate formula I shared is based on my intuition. We don't yet have enough benchmarks to see how it holds up across different types of datasets. So when you do get a chance to index your data in a Typesense cluster, could you let me know what the memory usage looks like?