Hello! I am having trouble with hybrid search at the moment | Typesense #community-help


Joel Ödlund

01/30/2025, 8:40 AM

Hello! I am having trouble with hybrid search at the moment. It appears that I get incorrect ranking in certain circumstances. In particular, when there is a tie in the text matching, I expect to get the document with the lowest vector distance on top, but this is not happening. From the documentation we have:


K = rank of document in keyword search
S = rank of document in semantic search

rank_fusion_score = 0.7 * K + 0.3 * S

It appears that if we have several hits with the same text match, they will get an arbitrary keyword search rank. This arbitrary rank will then be weighted together with the semantic search rank. The result seems to be arbitrary in the end, even if semantically closer documents clearly should be on top. This is something that we experience as random ordering for many searches, which is not great. Perhaps this algorithm can be adjusted to allow several documents with the same keyword search rank, in order to make the semantic search rank the tie breaker.

Kishore Nallan

01/30/2025, 8:43 AM

In recent v28 RC builds, you can try setting the rerank_hybrid_matches: true search parameter. When enabled, it'll compute text_match_score for records found with vector search only and vice versa. This might improve overall quality of hybrid search.

Kishore Nallan

01/30/2025, 8:44 AM

You can also directly rank keyword search results with semantic search this way: https://typesense.org/docs/27.1/api/vector-search.html#rank-keyword-search-via-vector-search

Joel Ödlund

01/30/2025, 8:47 AM

I have considered ranking the results on vector distance, but as far as I understand, this will essentially disable any keyword search functionality. It will just boil down to a vector search, which is not great either.

Joel Ödlund

01/30/2025, 8:48 AM

Correct me if I am wrong here.

Joel Ödlund

01/30/2025, 8:48 AM

How can I understand this new rerank_hybrid_matches parameter?

Kishore Nallan

01/30/2025, 9:49 AM

> this will essentially disable any keyword search functionality.

Why? Vector search will only be used to break ties in keyword search.

> How can I understand this new rerank_hybrid_matches parameter?

There will be documents that appear in top-K keyword hits but not in top-K semantic search hits (and vice versa). This option will make the engine compute the missing complementary score so that there is always a complete picture.
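To make the "complete picture" idea concrete, here is a rough sketch of filling in the missing complementary score for the union of both hit lists. It is not Typesense code: `complete_scores`, the toy score tables, and the two scoring callables are all hypothetical stand-ins, and the engine's real logic may differ.

```python
# Hypothetical illustration only; none of these names come from Typesense.

def complete_scores(keyword_hits, vector_hits, text_score, vector_distance):
    """Return {doc_id: (text_score, vector_distance)} for the union of both hit lists.

    keyword_hits: {doc_id: text match score} for the top-K keyword hits
    vector_hits: {doc_id: vector distance} for the top-K semantic hits
    text_score, vector_distance: callables that score a single doc_id on demand
    """
    completed = {}
    for doc_id in keyword_hits.keys() | vector_hits.keys():
        # Reuse a score that already exists, otherwise compute the missing one.
        t = keyword_hits[doc_id] if doc_id in keyword_hits else text_score(doc_id)
        d = vector_hits[doc_id] if doc_id in vector_hits else vector_distance(doc_id)
        completed[doc_id] = (t, d)
    return completed


# Toy data: "a" was found only by keyword search, "c" only by vector search.
keyword_hits = {"a": 3, "b": 2}
vector_hits = {"b": 0.10, "c": 0.25}
all_text = {"a": 3, "b": 2, "c": 0}
all_dist = {"a": 0.80, "b": 0.10, "c": 0.25}

completed = complete_scores(
    keyword_hits, vector_hits, all_text.__getitem__, all_dist.__getitem__
)
```

After this step every document in the union carries both a text score and a vector distance, so either signal can be used for ranking.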


Joel Ödlund

01/30/2025, 9:55 AM

OK, rerank_hybrid_matches seems great, and something I have been struggling with as well. So if I provide an explicit sorting like in the docs:


{
  "q": "shoes",
  "query_by": "title",
  "sort_by": "_text_match:desc,_vector_query(embedding:([])):asc"
}

does this disable the hybrid search score and the K parameter?

Kishore Nallan

01/30/2025, 10:23 AM

Yes, with this we will sort first by text match score, and only if there is a tie is the vector query score (semantic search score) used to break it. Normal hybrid search works using the fusion formula, which will not apply here.
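The ordering this sort_by expresses can be pictured as a two-key sort. The snippet below is a plain-Python analogy with made-up hits and field names, not the engine's code: text match descending first, vector distance ascending only as the tie breaker.

```python
# Made-up hits; the field names are illustrative, not a Typesense response format.
hits = [
    {"id": "a", "text_match": 100, "vector_distance": 0.90},
    {"id": "b", "text_match": 100, "vector_distance": 0.10},
    {"id": "c", "text_match": 120, "vector_distance": 0.70},
]

# Primary key: higher text match first. Secondary key: smaller distance first.
ordered = sorted(hits, key=lambda h: (-h["text_match"], h["vector_distance"]))
order_ids = [h["id"] for h in ordered]
```

Here "c" comes first on text match alone, and "b" precedes "a" only because the two tie on text match and "b" has the smaller vector distance.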

Joel Ödlund

01/30/2025, 12:48 PM

I am looking for a way to continuously combine text match and vector contributions. It seems like the current rank fusion score is broken, in the above sense. Can you point me to the part of the code where the score is computed, so I can make my own version?

Kishore Nallan

01/30/2025, 12:52 PM

I don't follow, can you elaborate on what you mean by it's broken?

Joel Ödlund

01/30/2025, 12:57 PM

If we have several hits with the same text match score, I would expect them to be sorted by vector similarity. But with the rank fusion algorithm, the hits are ranked strictly, even if they have the same text match score. So you get some document A with a much higher rank fusion score than document B, even though document B has the same text match and a better vector distance.

Joel Ödlund

01/30/2025, 12:59 PM

(This is my interpretation of what is going on, I have not seen the code.)

Kishore Nallan

01/30/2025, 1:04 PM

> if we have several hits with the same text match score, I would expect them to be sorted by vector similarity

That's not how rank fusion works. Rank fusion uses the rank of the document in keyword search and combines that with rank of the document in semantic search and the alpha parameter for weighting both components. So even if several hits have the same text match score, if there is no secondary sorting condition, they have to be ordered somehow -- which in our case is done by the ID of the record (document indexing order).
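A toy reproduction of the behavior under discussion, taking the formula quoted earlier in the thread literally (0.7 * K + 0.3 * S). It assumes rank 1 is best and that a smaller fused value ranks higher; the real engine may transform ranks differently (for example with reciprocal ranks), so treat this as a sketch of the tie-breaking effect only. The documents and scores are invented.

```python
ALPHA = 0.3  # weight of the semantic rank, matching 0.7 * K + 0.3 * S

# Listed in indexing (ID) order; all three tie on text match.
docs = [
    {"id": "A", "text_match": 100, "vector_distance": 0.9},
    {"id": "B", "text_match": 100, "vector_distance": 0.1},
    {"id": "C", "text_match": 100, "vector_distance": 0.5},
]


def strict_ranks(docs, key):
    """Assign ranks 1..n; sorted() is stable, so ties keep indexing order."""
    ordered = sorted(docs, key=key)
    return {d["id"]: i + 1 for i, d in enumerate(ordered)}


keyword_rank = strict_ranks(docs, key=lambda d: -d["text_match"])
semantic_rank = strict_ranks(docs, key=lambda d: d["vector_distance"])

fused = {
    d["id"]: (1 - ALPHA) * keyword_rank[d["id"]] + ALPHA * semantic_rank[d["id"]]
    for d in docs
}
final_order = sorted(fused, key=fused.get)  # assumption: smaller fused value wins
```

With these numbers A ends up ahead of B even though both have the same text match and B is much closer in vector space: the ID-based keyword rank carries 70% of the weight.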

Kishore Nallan

01/30/2025, 1:06 PM

If you strictly need that type of behavior you have to rank keyword search by semantic search (link I shared earlier).

Joel Ödlund

01/30/2025, 1:51 PM

Yes, that is my understanding. I think the issue for me is that I really need this weighted combination of vector and text. But due to the implementation of the rank fusion, the ID of the document becomes more significant than the vector score in many cases. It leads to results that are unexpected, and not usable for us. I would argue that it is in fact broken, since it will order A before B even when B is strictly better than A. One could consider a slight modification, where the rank factor in the expression can be the same for equal documents. You could then have, say, 3 documents with rank 1, use the same formula for rank fusion, and get a correct ordering.
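One possible reading of this suggestion, sketched with dense ranking so that equal scores share a rank. This is an illustration of the proposal, not anything Typesense implements; the data is invented, and it again assumes rank 1 is best and a smaller fused value ranks higher.

```python
ALPHA = 0.3  # weight of the semantic rank, matching 0.7 * K + 0.3 * S

# Listed in indexing (ID) order; all three tie on text match.
docs = [
    {"id": "A", "text_match": 100, "vector_distance": 0.9},
    {"id": "B", "text_match": 100, "vector_distance": 0.1},
    {"id": "C", "text_match": 100, "vector_distance": 0.5},
]


def dense_ranks(docs, score, best_is_high):
    """Equal scores share a rank: the best distinct value gets 1, the next 2, ..."""
    distinct = sorted({score(d) for d in docs}, reverse=best_is_high)
    rank_of = {value: i + 1 for i, value in enumerate(distinct)}
    return {d["id"]: rank_of[score(d)] for d in docs}


keyword_rank = dense_ranks(docs, lambda d: d["text_match"], best_is_high=True)
semantic_rank = dense_ranks(docs, lambda d: d["vector_distance"], best_is_high=False)

fused = {
    d["id"]: (1 - ALPHA) * keyword_rank[d["id"]] + ALPHA * semantic_rank[d["id"]]
    for d in docs
}
final_order = sorted(fused, key=fused.get)  # assumption: smaller fused value wins
```

Because all three documents now share keyword rank 1, the semantic rank alone decides the order, giving B, C, A.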

Kishore Nallan

01/30/2025, 1:54 PM

This makes sense. Can you please create a GitHub issue? We will pick it up.

Joel Ödlund

01/30/2025, 1:54 PM

I will do that. Thank you!

Joel Ödlund

01/30/2025, 2:47 PM

https://github.com/typesense/typesense/issues/2163
