Jason Bosco
05/30/2023, 11:26 PM
Will it be possible to combine this with highlights? I know that’s tricky, but if anyone can figure it out, it’s you guys 😉
Haha! If you do a hybrid search (keyword + semantic search combined), then we will highlight the keywords.
What are the options for embedding models? Is the embedding interface generic enough to use external tools like GCP APIs? Or does it depend on what you guys ship with it?
We’ll be shipping with support for API-based models like OpenAI’s embedding model and Google’s PaLM and Vertex APIs. We’ll also have these built-in models: S-BERT and E5.
How would we tackle large document embeddings? I’m guessing out of the box you’ll just create vectors for the entire document, but I’m trying to think through how we can segment large documents (some of our documents represent large PDF reports).
We don’t handle this chunking at the moment, so you would have to handle this outside of Typesense.
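Since chunking has to happen outside of Typesense for now, here is a minimal sketch of one common approach: split the extracted text into overlapping word windows and index each window as its own document. The function names, the word-based window sizes, and the field names (`parent_id`, `chunk_index`, `text`) are all assumptions made for illustration, not anything Typesense prescribes.

```python
def chunk_text(text, max_words=200, overlap=40):
    """Split text into word windows of up to max_words, sharing `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def chunks_to_documents(parent_id, text, max_words=200, overlap=40):
    """Turn one large document into several smaller ones (hypothetical field names)."""
    return [
        {
            "id": f"{parent_id}-{i}",
            "parent_id": parent_id,  # lets you group hits back to the source report
            "chunk_index": i,
            "text": chunk,
        }
        for i, chunk in enumerate(chunk_text(text, max_words, overlap))
    ]
```

Each resulting document can then be embedded and indexed on its own, and hits can be grouped back to the source report through `parent_id`. Splitting on sentence or section boundaries would likely work better for real PDF reports; the fixed word window just keeps the sketch short.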
How does using embeddings affect Typesense Cloud pricing?
Every vector dimension takes up 6-7 bytes in the index. So if you use a 1536-dimension embedding model, each document will require 9.2 KB to 10.8 KB of additional RAM, on top of the keyword-based index. We’re about to start working on reducing this by a factor of almost 60x in the next month or so.
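To make the RAM arithmetic concrete, here is a rough estimator based only on the 6-7 bytes per dimension figure quoted above. The function name and parameters are made up for illustration, and it ignores the keyword index and any other overhead.

```python
def embedding_ram_bytes(dimensions, num_documents=1, bytes_per_dimension=(6, 7)):
    """Return a (low, high) estimate of extra RAM in bytes for the vector index."""
    low, high = bytes_per_dimension
    return (
        dimensions * low * num_documents,
        dimensions * high * num_documents,
    )


# One document with a 1536-dimension model: 9,216 to 10,752 bytes.
per_doc = embedding_ram_bytes(1536)

# One million such documents: roughly 9.2 GB to 10.8 GB.
per_million = embedding_ram_bytes(1536, num_documents=1_000_000)
```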
How do synonyms interact with the embeddings?
At the moment, synonyms only affect keyword-based search. For semantic search, we let the embedding model handle it natively.