Hello there, I have a question about hybrid search... (Typesense #community-help)

Charley Carriero

11/18/2024, 2:46 PM

Hello there, I have a question about hybrid search. I've understood that there are both keyword search results and semantic search results, and that they are merged together using the formula found in your docs. I've noticed a funny situation where
• I have 2 documents with the same name (and therefore the same embedding),
• When I search the exact document name, I see the two results, but the rank_fusion_score's are different.

In the results I can see clearly one is a semantic search result and one is a keyword search result. I'm wondering, why aren't both semantic search results? I would have imagined them to have the same exact score. I'll post the response I'm getting in the thread. But basically, the first result has no vector_distance and the second one does. Curious why 🧐

Charley Carriero

11/18/2024, 2:47 PM

"hits": [
    {
      "document": {
        "id": "1003447",
        "name": "Financial services - 12.2",
        "source": "exiobase"
      },
      "highlight": {
        "name": {
          "matched_tokens": [
            "Financial",
            "services",
            "12.2"
          ],
          "snippet": "<mark>Financial</mark> <mark>services</mark> - <mark>12.2</mark>"
        }
      },
      "highlights": [
        {
          "field": "name",
          "matched_tokens": [
            "Financial",
            "services",
            "12.2"
          ],
          "snippet": "<mark>Financial</mark> <mark>services</mark> - <mark>12.2</mark>"
        }
      ],
      "hybrid_search_info": {
        "rank_fusion_score": 0.699999988079071
      },
      "text_match": 1736172819517538425,
      "text_match_info": {
        "best_field_score": "3315704398080",
        "best_field_weight": 15,
        "fields_matched": 1,
        "num_tokens_dropped": 0,
        "score": "1736172819517538425",
        "tokens_matched": 3,
        "typo_prefix_score": 0
      }
    },
    {
      "document": {
        "id": "530671",
        "name": "Financial services - 12.2",
        "source": "lune"
      },
      "highlight": {
        "name": {
          "matched_tokens": [
            "Financial",
            "services",
            "12.2"
          ],
          "snippet": "<mark>Financial</mark> <mark>services</mark> - <mark>12.2</mark>"
        }
      },
      "highlights": [
        {
          "field": "name",
          "matched_tokens": [
            "Financial",
            "services",
            "12.2"
          ],
          "snippet": "<mark>Financial</mark> <mark>services</mark> - <mark>12.2</mark>"
        }
      ],
      "hybrid_search_info": {
        "rank_fusion_score": 0.6499999761581421
      },
      "text_match": 1051931443,
      "text_match_info": {
        "best_field_score": "513638",
        "best_field_weight": 102,
        "fields_matched": 3,
        "num_tokens_dropped": 3,
        "score": "1051931443",
        "tokens_matched": 0,
        "typo_prefix_score": 255
      },
      "vector_distance": 4.76837158203125E-7
    }
]

Charley Carriero

11/19/2024, 3:06 PM

Odd, I've restarted the database and now both documents have a vector_distance, as I originally expected. Any ideas why that is the case?

Jason Bosco

11/19/2024, 11:28 PM

Could you try this on v28.0.rc20?

Kishore Nallan

11/20/2024, 3:07 AM

The rank fusion score will be different because it takes into account the position of a document in the ranking. Even if 2 documents have the same embedding, one will have to appear after the other, so their rank fusion scores will differ. However, I am not sure why only one would have a vector distance, and why it worked after a restart. Let me know if you are able to reproduce the issue again.
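
To make the point about ranking position concrete, here is a minimal sketch, not Typesense's actual implementation. It assumes the fused score is a weighted sum of reciprocal ranks, with a weight of 0.7 on the keyword rank and 0.3 on the semantic rank. The function name, the weights, and the rank positions passed in are all assumptions chosen for illustration. Under those assumptions, the two scores in the response posted earlier in the thread follow from list positions alone.

```python
from typing import Optional


def rank_fusion_score(keyword_rank: Optional[int],
                      semantic_rank: Optional[int],
                      alpha: float = 0.3) -> float:
    """Weighted reciprocal-rank fusion (illustrative, not Typesense source code).

    Ranks are 1-based; None means the document is absent from that result list.
    alpha is the assumed weight of the semantic list; (1 - alpha) goes to keyword.
    """
    keyword_part = 1.0 / keyword_rank if keyword_rank else 0.0
    semantic_part = 1.0 / semantic_rank if semantic_rank else 0.0
    return (1.0 - alpha) * keyword_part + alpha * semantic_part


# Document 1003447: assumed first in the keyword list, missing from the semantic list.
first = rank_fusion_score(keyword_rank=1, semantic_rank=None)

# Document 530671: assumed second in the keyword list, first in the semantic list.
second = rank_fusion_score(keyword_rank=2, semantic_rank=1)

print(round(first, 2), round(second, 2))  # 0.7 0.65

# Even when both documents appear in both lists (as after the restart),
# one has to sit below the other, so the fused scores still differ.
both_top = rank_fusion_score(keyword_rank=1, semantic_rank=1)
both_next = rank_fusion_score(keyword_rank=2, semantic_rank=2)
print(both_top > both_next)  # True
```

The rounded values line up with the 0.699999988079071 and 0.6499999761581421 reported above. That is consistent with the first document having been absent from the semantic result list, but it does not explain why it was absent.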

Charley Carriero

11/20/2024, 9:20 AM

Odd right? Yes, I'll let you know if I'm able to reproduce it, but at the moment everything is working as expected. Probably best to upgrade to the latest version as well, we're currently on v26.0

Kishore Nallan

11/20/2024, 9:20 AM

Oh yeah that's definitely a good idea. Several fixes since then.

Charley Carriero

11/20/2024, 9:26 AM

Thank you both! I'll let you know if we're ever able to reproduce.
