Joel Ödlund
01/30/2025, 8:40 AM
K = rank of document in keyword search
S = rank of document in semantic search
rank_fusion_score = 0.7 * K + 0.3 * S
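The message does not say how a rank is turned into a score. Below is a minimal sketch that assumes the common reciprocal-rank convention: each 1-based rank contributes weight / rank, so a higher fused score is better. That convention is an assumption for illustration, not a statement about Typesense internals. The sketch shows how strongly the 0.7 keyword weight dominates.

```python
# Sketch of the quoted rank fusion formula.
# ASSUMPTION: each 1-based rank contributes weight / rank (reciprocal-rank
# style), so a higher fused score means a better hit. Not confirmed by the thread.

KEYWORD_WEIGHT = 0.7   # weight on K, the keyword search rank
SEMANTIC_WEIGHT = 0.3  # weight on S, the semantic search rank


def rank_fusion_score(k: int, s: int) -> float:
    """Fuse keyword rank k and semantic rank s (both 1-based)."""
    if k < 1 or s < 1:
        raise ValueError("ranks are 1-based")
    return KEYWORD_WEIGHT / k + SEMANTIC_WEIGHT / s


if __name__ == "__main__":
    # Top keyword hit that is only the third-best semantic match.
    print(rank_fusion_score(1, 3))
    # Third keyword hit that is the best semantic match.
    print(rank_fusion_score(3, 1))
```

Under this reading the first document scores about 0.8 and the second about 0.53, so two steps in keyword rank outweigh moving from third to first in semantic rank.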
It appears that if several hits have the same text match score, each of them gets an arbitrary keyword search rank.
This arbitrary rank is then weighted together with the semantic search rank.
The final ordering therefore ends up arbitrary, even when semantically closer documents clearly should be on top.
We experience this as random ordering for many searches, which is not great.
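The effect can be reproduced with a toy example. This sketch assumes each 1-based rank contributes weight / rank to the fused score (a reciprocal-rank reading of the formula, not confirmed here) and that hits with equal text match scores are keyword-ranked by document id. The shared_ties flag models letting equal text match scores share one keyword rank (standard competition ranking). The function names and the (doc_id, text_match, semantic_rank) tuple layout are hypothetical.

```python
# Toy reproduction of the tie problem.
# ASSUMPTIONS (not confirmed Typesense internals):
#   * each 1-based rank contributes weight / rank to the fused score;
#   * hits with equal text match scores are keyword-ranked by document id.

KEYWORD_WEIGHT = 0.7
SEMANTIC_WEIGHT = 0.3


def keyword_ranks(hits, shared_ties):
    """Map doc_id -> keyword rank; hits are (doc_id, text_match, sem_rank)."""
    ordered = sorted(hits, key=lambda h: (-h[1], h[0]))  # ties broken by id
    ranks = {}
    prev_score, prev_rank = None, 0
    for position, (doc_id, score, _) in enumerate(ordered, start=1):
        # Competition ranking ("1224"): equal scores reuse the previous rank.
        rank = prev_rank if shared_ties and score == prev_score else position
        ranks[doc_id] = rank
        prev_score, prev_rank = score, rank
    return ranks


def fused_order(hits, shared_ties=False):
    """Return doc ids ordered by fused score, best first."""
    ranks = keyword_ranks(hits, shared_ties)
    fused = {
        doc_id: KEYWORD_WEIGHT / ranks[doc_id] + SEMANTIC_WEIGHT / sem_rank
        for doc_id, _, sem_rank in hits
    }
    # Equal fused scores fall back to doc id so the output is deterministic.
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    # Three hits with identical text match; doc 3 is semantically closest.
    hits = [(1, 100, 3), (2, 100, 2), (3, 100, 1)]
    print(fused_order(hits))                    # id-based tie ranks
    print(fused_order(hits, shared_ties=True))  # shared keyword rank
```

With ids as the tie breaker, the semantically weakest hit (doc 1) comes first and the order is [1, 3, 2]; with a shared keyword rank, the semantic rank decides and the order is [3, 2, 1].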
Perhaps the algorithm could be adjusted to let several documents share the same keyword search rank, so that the semantic search rank becomes the tie breaker.
Kishore Nallan
01/30/2025, 8:43 AM
rerank_hybrid_matches: true search parameter.
When enabled, it'll compute text_match_score for records found only by vector search, and vice versa. This might improve the overall quality of hybrid search.
Kishore Nallan
01/30/2025, 8:44 AM
Joel Ödlund
01/30/2025, 8:47 AM
Joel Ödlund
01/30/2025, 8:48 AM
Joel Ödlund
01/30/2025, 8:48 AM
Kishore Nallan
01/30/2025, 9:49 AM
> this will essentially disable any keyword search functionality.
Why? Vector search will only be used to break ties in keyword search.
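The tie-breaking point can be illustrated with an ordinary two-key sort: the text match score decides the order, and the vector distance only matters among hits whose text match scores are equal. This is a local sketch of what a sort_by such as "_text_match:desc,_vector_query(embedding:([])):asc" asks for, not engine code; the Hit type and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: int
    text_match: int         # keyword relevance; higher is better
    vector_distance: float  # semantic distance; lower is closer


def sort_text_then_vector(hits):
    """Text match descending first, vector distance ascending as tie breaker."""
    return sorted(hits, key=lambda h: (-h.text_match, h.vector_distance))


if __name__ == "__main__":
    hits = [
        Hit(1, 100, 0.40),
        Hit(2, 100, 0.10),  # same text match as doc 1, but semantically closer
        Hit(3, 90, 0.01),   # closest vector, weaker text match
    ]
    print([h.doc_id for h in sort_text_then_vector(hits)])
```

The order is [2, 1, 3]: doc 3 stays last despite the smallest distance, because the vector component never overrides a difference in text match score. Keyword relevance still drives the ranking.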
> How can I understand this new rerank_hybrid_matches parameter?
There will be documents that appear in the top-K keyword hits but not in the top-K semantic search hits (and vice versa). This option makes the engine compute the missing complementary score so that there is always a complete picture.
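A minimal sketch of the idea described above: take the union of the top-K keyword hits and the top-K semantic hits, then fill in whichever score each document is missing. The scorer callbacks text_score_fn and distance_fn are hypothetical placeholders for the engine's own computations, not Typesense functions, and how the completed scores are then fused is not shown.

```python
# Sketch of "compute the missing complementary score" for hybrid hits.
# text_score_fn and distance_fn are HYPOTHETICAL stand-ins for the engine's
# own text-match and vector-distance computations; they are not Typesense APIs.

def complete_scores(keyword_hits, vector_hits, text_score_fn, distance_fn):
    """Return {doc_id: (text_match_score, vector_distance)} for the union.

    keyword_hits: {doc_id: text_match_score} from the top-K keyword search.
    vector_hits:  {doc_id: vector_distance} from the top-K semantic search.
    """
    combined = {}
    for doc_id in sorted(set(keyword_hits) | set(vector_hits)):
        if doc_id in keyword_hits:
            text = keyword_hits[doc_id]
        else:
            text = text_score_fn(doc_id)  # found by vector search only
        if doc_id in vector_hits:
            dist = vector_hits[doc_id]
        else:
            dist = distance_fn(doc_id)  # found by keyword search only
        combined[doc_id] = (text, dist)
    return combined


if __name__ == "__main__":
    keyword_hits = {1: 100, 2: 80}    # doc 3 missed the keyword top-K
    vector_hits = {2: 0.20, 3: 0.10}  # doc 1 missed the semantic top-K
    print(complete_scores(keyword_hits, vector_hits,
                          text_score_fn=lambda d: 10,
                          distance_fn=lambda d: 0.90))
```

After this step every candidate carries both a text match score and a vector distance, so the final ranking compares like with like.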
Joel Ödlund
01/30/2025, 9:55 AM
{
  "q": "shoes",
  "query_by": "title",
  "sort_by": "_text_match:desc,_vector_query(embedding:([])):asc"
}
Does this disable the hybrid search score and the K parameter?
Kishore Nallan
01/30/2025, 10:23 AM
Joel Ödlund
01/30/2025, 12:48 PM
Kishore Nallan
01/30/2025, 12:52 PM
Joel Ödlund
01/30/2025, 12:57 PM
Joel Ödlund
01/30/2025, 12:59 PM
Kishore Nallan
01/30/2025, 1:04 PM
alpha parameter for weighting both components. So even if several hits have the same text match score, when there is no secondary sorting condition they still have to be ordered somehow; in our case that is done by the ID of the record (document indexing order).
Kishore Nallan
01/30/2025, 1:06 PM
Joel Ödlund
01/30/2025, 1:51 PM
Kishore Nallan
01/30/2025, 1:54 PM
Joel Ödlund
01/30/2025, 1:54 PM
Joel Ödlund
01/30/2025, 2:47 PM