Vector Search Filter and Cosine Similarity in Typesense

TLDR LT asked about vector search result filtering and cosine similarity. Kishore Nallan explained how cosine distance is related to similarity and shared plans to add a threshold restriction option.

Mar 25, 2023
10:09 PM
Hey Kishore Nallan, Krish asked 2 months ago about vector search, whether the results can be filtered with "filter_by=vector_distance:>0.25".

You answered then "Nope that's not possible but these distances don't carry any semantic absolute meaning. They are only useful as relative values."

But just for my understanding: the vector distance is the cosine similarity between the requested embedding and the one in Typesense, right? Because if I want to store face embeddings and the cosine similarity indicates how similar they are, the distance is very meaningful.
Are there plans to add filter_by functionality for vector distance, or is the "go-to way" for the next years to postprocess the results?
10:51 PM
I also noticed that the vector distance can be greater than 1. How do you calculate the cosine similarity? It usually lies in the interval from -1 (opposite direction), through 0 (orthogonal), to 1 (similar).
Mar 26, 2023
Kishore Nallan
01:49 PM
Cosine similarity ranges from -1 to 1.

cosine_distance = 1 - cosine_similarity

When 2 vectors are exactly the same, the cosine similarity will be 1, so the cosine distance will be 0.
Likewise, when 2 vectors are very different (pointing in opposite directions), the cosine similarity will be -1, so the cosine distance will be 2.
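
To make the relation above concrete, here is a minimal Python sketch of cosine distance computed as 1 minus cosine similarity. The function names are illustrative only and are not part of Typesense; the sketch assumes both vectors are non-zero and of equal length.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    # Assumes a and b are non-zero and have the same dimension.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    # cosine_distance = 1 - cosine_similarity, so the range is 0 to 2.
    return 1.0 - cosine_similarity(a, b)

same = cosine_distance([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])  # unrelated direction
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite direction
```

This is why a distance greater than 1 can show up: it corresponds to a negative cosine similarity.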

We plan to add a way to restrict results based on a threshold. When I said "don't carry any semantic absolute meaning" I meant generically across datasets. It's still useful to have a cutoff threshold for some datasets, so we will be adding an option for that.
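
Until such an option exists, the cutoff can be applied on the client side by postprocessing the results. The sketch below is only an illustration: it assumes each hit is a dict carrying a "vector_distance" value, and the helper name, sample hits, and threshold are hypothetical.

```python
def filter_by_distance(hits, max_distance):
    # Keep only hits whose vector distance is at or below the cutoff.
    # Hits without a "vector_distance" value are dropped (assumption).
    return [
        hit for hit in hits
        if hit.get("vector_distance", float("inf")) <= max_distance
    ]

# Hypothetical hits, shaped like a list of search result entries.
sample_hits = [
    {"document": {"id": "a"}, "vector_distance": 0.12},
    {"document": {"id": "b"}, "vector_distance": 0.31},
    {"document": {"id": "c"}, "vector_distance": 1.40},
]

close_hits = filter_by_distance(sample_hits, max_distance=0.25)
close_ids = [hit["document"]["id"] for hit in close_hits]
```

A suitable cutoff depends on the dataset and the embedding model, so it would need to be tuned per use case.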