Hybrid Search Distance Threshold Issue
TLDR Anish found that hybrid search results were not respecting the vector distance threshold. Jason explains that when additional fields are queried, the threshold applies only to results from vector search, and that keyword-matched results are returned with vector_distance set to 2. He suggests opening a feature request on GitHub.
Sep 12, 2023
Anish
03:47 PM
I'm setting 'vector_query' : 'embedding:([], distance_threshold:0.30)' but it doesn't seem to affect the results of a hybrid search. Some of the results returned have a vector distance of 2. How would I set a threshold?
Anish
03:48 PM
'query_by': 'embedding,description'
Jason
03:59 PM
➜ ~ curl -s '' \
-X 'POST' \
--data-binary '
{
"searches": [
{
"query_by": "embedding",
"vector_query": "embedding:([], distance_threshold:0.50, k:10)",
"collection": "hn-comments",
"q": "cinema"
}
]
}
' | jq '.results[0].hits[].vector_distance'
0.35301458835601807
0.3530757427215576
0.37170201539993286
0.37984395027160645
0.39115262031555176
0.4030781388282776
0.4054117798805237
0.410835325717926
0.41370344161987305
0.4149748682975769
Anish
04:06 PM
That works with "query_by": "embedding", but as soon as I add another field there, the vector distance isn't respected.
Jason
04:07 PM
➜ ~ curl -s '' \
-X 'POST' \
--data-binary '
{
"searches": [
{
"query_by": "text,embedding",
"vector_query": "embedding:([], distance_threshold:0.50, k:10)",
"collection": "hn-comments",
"q": "cinema"
}
]
}
' | jq '.results[0].hits[].vector_distance'
2
2
0.35301458835601807
2
2
0.3530757427215576
2
0.37170201539993286
2
2
Jason
04:08 PM
The vector_distance field is being set to 2 for all the results that come from keyword search.
Jason
04:09 PM
vector_distance is only calculated for results that were pulled up from a vector search, and the distance threshold only applies to vector search.
Jason
04:10 PM
So we return vector_distance: 2 for keyword-matched results, when technically we should not be returning that field at all.
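Until the threshold is applied to hybrid results on the server side, one possible workaround is to filter the hits on the client. The following is only a minimal sketch based on the behaviour described in this thread: the function name `filter_hybrid_hits` and the `keep_keyword_hits` flag are hypothetical, and it assumes that a `vector_distance` of exactly 2 (or a missing field) marks a keyword-only hit and that the threshold is inclusive.

```python
# Placeholder reported for keyword-matched hits, per the thread above (assumption).
KEYWORD_ONLY_DISTANCE = 2


def filter_hybrid_hits(hits, distance_threshold, keep_keyword_hits=True):
    """Apply a vector distance threshold client-side to hybrid search hits.

    `hits` is a list of dicts shaped like `.results[0].hits[]` in the
    responses above. This is a hypothetical helper, not a Typesense API.
    """
    kept = []
    for hit in hits:
        distance = hit.get("vector_distance")
        # Keyword-only hit: field missing or set to the placeholder value.
        if distance is None or distance == KEYWORD_ONLY_DISTANCE:
            if keep_keyword_hits:
                kept.append(hit)
            continue
        # Vector hit: keep it only if it is within the threshold.
        if distance <= distance_threshold:
            kept.append(hit)
    return kept


# Made-up sample hits for illustration only.
sample_hits = [
    {"vector_distance": 2},
    {"vector_distance": 0.353},
    {"vector_distance": 0.62},
]

print([h["vector_distance"] for h in filter_hybrid_hits(sample_hits, 0.50)])
# → [2, 0.353]
print([h["vector_distance"]
       for h in filter_hybrid_hits(sample_hits, 0.50, keep_keyword_hits=False)])
# → [0.353]
```

Setting `keep_keyword_hits=False` drops every keyword-only result, which may or may not be what a hybrid search should return, so treat it as a choice to make per use case.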