# community-help
c
Hello there, I have a question about hybrid search. I understand that there are both keyword search results and semantic search results, and that they are merged using the formula in your docs. I've noticed a funny situation:
• I have 2 documents with the same name (and therefore the same embedding).
• When I search for the exact document name, I see both results, but their `rank_fusion_score` values are different.
In the results I can clearly see that one is a semantic search result and one is a keyword search result. I'm wondering why both aren't semantic search results; I would have expected them to have exactly the same score. I'll post the response I'm getting in the thread, but basically the first result has no `vector_distance` and the second one does. Curious why 🧐
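A minimal sketch of the weighted reciprocal-rank fusion the question refers to, assuming the form `(1 - alpha) * K + alpha * S`, where `K = 1/keyword_rank` and `S = 1/semantic_rank` (1-based ranks, with a list the document is missing from contributing 0) and `alpha = 0.3`, which appears to be the default semantic weight. Check the hybrid search docs for your version before relying on it. Under those assumptions, one rank assignment reproduces both scores in the response posted in this thread (0.699999988079071 and 0.6499999761581421): the first hit is keyword-only at rank 1, and the second is keyword rank 2 plus semantic rank 1. The trailing digits look like a 32-bit float being printed as a double.

```python
import struct
from typing import Optional

ALPHA = 0.3  # assumed semantic weight; the keyword weight is 1 - ALPHA


def reciprocal(rank: Optional[int]) -> float:
    """1/rank for a 1-based rank; 0.0 if the document is absent from that list."""
    return 0.0 if rank is None else 1.0 / rank


def rank_fusion_score(keyword_rank: Optional[int], semantic_rank: Optional[int],
                      alpha: float = ALPHA) -> float:
    """Weighted reciprocal-rank fusion (sketch of the assumed formula)."""
    return (1 - alpha) * reciprocal(keyword_rank) + alpha * reciprocal(semantic_rank)


def as_float32(x: float) -> float:
    """Round-trip through a 32-bit float, which is how the response values appear to be stored."""
    return struct.unpack("f", struct.pack("f", x))[0]


# First hit: keyword result only (no vector_distance), keyword rank 1.
print(as_float32(rank_fusion_score(keyword_rank=1, semantic_rank=None)))  # prints 0.699999988079071
# Second hit: keyword rank 2 and semantic rank 1.
print(as_float32(rank_fusion_score(keyword_rank=2, semantic_rank=1)))  # prints 0.6499999761581421
```

If the first document had also come back from the semantic side (say at semantic rank 2), the same formula would give it 0.7 + 0.15 = 0.85, so the missing `vector_distance` and the score gap are two views of the same thing.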
```json
{
  "hits": [
    {
      "document": {
        "id": "1003447",
        "name": "Financial services - 12.2",
        "source": "exiobase"
      },
      "highlight": {
        "name": {
          "matched_tokens": [
            "Financial",
            "services",
            "12.2"
          ],
          "snippet": "<mark>Financial</mark> <mark>services</mark> - <mark>12.2</mark>"
        }
      },
      "highlights": [
        {
          "field": "name",
          "matched_tokens": [
            "Financial",
            "services",
            "12.2"
          ],
          "snippet": "<mark>Financial</mark> <mark>services</mark> - <mark>12.2</mark>"
        }
      ],
      "hybrid_search_info": {
        "rank_fusion_score": 0.699999988079071
      },
      "text_match": 1736172819517538425,
      "text_match_info": {
        "best_field_score": "3315704398080",
        "best_field_weight": 15,
        "fields_matched": 1,
        "num_tokens_dropped": 0,
        "score": "1736172819517538425",
        "tokens_matched": 3,
        "typo_prefix_score": 0
      }
    },
    {
      "document": {
        "id": "530671",
        "name": "Financial services - 12.2",
        "source": "lune"
      },
      "highlight": {
        "name": {
          "matched_tokens": [
            "Financial",
            "services",
            "12.2"
          ],
          "snippet": "<mark>Financial</mark> <mark>services</mark> - <mark>12.2</mark>"
        }
      },
      "highlights": [
        {
          "field": "name",
          "matched_tokens": [
            "Financial",
            "services",
            "12.2"
          ],
          "snippet": "<mark>Financial</mark> <mark>services</mark> - <mark>12.2</mark>"
        }
      ],
      "hybrid_search_info": {
        "rank_fusion_score": 0.6499999761581421
      },
      "text_match": 1051931443,
      "text_match_info": {
        "best_field_score": "513638",
        "best_field_weight": 102,
        "fields_matched": 3,
        "num_tokens_dropped": 3,
        "score": "1051931443",
        "tokens_matched": 0,
        "typo_prefix_score": 255
      },
      "vector_distance": 4.76837158203125E-7
    }
  ]
}
```
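To check a response like the one above for this situation, a small helper can list each hit's id, fused score, and whether a `vector_distance` was reported. This is a sketch that reads only the field names visible in the posted response; `summarize_hits` is a hypothetical name, and treating a missing `vector_distance` as "not returned by the semantic side" follows how the thread reads the response rather than any documented guarantee.

```python
import json
from typing import Any, Dict, List


def summarize_hits(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return id, fused score, and whether a vector_distance was reported for each hit."""
    rows = []
    for hit in response.get("hits", []):
        rows.append({
            "id": hit["document"]["id"],
            "rank_fusion_score": hit.get("hybrid_search_info", {}).get("rank_fusion_score"),
            # A missing vector_distance is read here as "this hit did not come back
            # from the semantic (vector) side of the hybrid query".
            "has_vector_distance": "vector_distance" in hit,
        })
    return rows


if __name__ == "__main__":
    # Trimmed copy of the posted response, keeping only the fields the helper reads.
    response = json.loads("""
    {"hits": [
      {"document": {"id": "1003447"},
       "hybrid_search_info": {"rank_fusion_score": 0.699999988079071}},
      {"document": {"id": "530671"},
       "hybrid_search_info": {"rank_fusion_score": 0.6499999761581421},
       "vector_distance": 4.76837158203125E-7}
    ]}
    """)
    for row in summarize_hits(response):
        print(row)
```

Running this after a restart (when both hits report a `vector_distance`) versus before would make the difference easy to spot if the issue comes back.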
Odd, I've restarted the database and now both documents have a `vector_distance`, as I originally expected. Any ideas why that is the case?
j
Could you try this on v28.0.rc20?
k
The rank fusion score will be different because it takes into account the position of a document in the ranking. Even if 2 documents have the same embedding, one has to appear after the other, so their rank fusion scores will differ. However, I am not sure why only one would have a vector distance, and why it worked after a restart. Let me know if you are able to reproduce the issue again.
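To illustrate why identical embeddings still produce different fused scores, here is a sketch that fuses two ranked id lists by position, assuming the same reciprocal-rank weighting as the docs' formula (`alpha = 0.3` on the semantic list). The document ids and distances are hypothetical. Both documents get the same distance, but sorting still has to place one before the other, so their reciprocal ranks (1/1 vs 1/2) and fused scores differ.

```python
from typing import Dict, List, Tuple

ALPHA = 0.3  # assumed weight on the semantic list; keyword list gets 1 - ALPHA


def ranks(ordered_ids: List[str]) -> Dict[str, int]:
    """Map each id to its 1-based position in a ranked list."""
    return {doc_id: pos for pos, doc_id in enumerate(ordered_ids, start=1)}


def fuse(keyword_ids: List[str], semantic_ids: List[str],
         alpha: float = ALPHA) -> List[Tuple[str, float]]:
    """Weighted reciprocal-rank fusion over two ranked id lists, best score first (sketch)."""
    kw, sem = ranks(keyword_ids), ranks(semantic_ids)
    scores = {}
    for doc_id in set(kw) | set(sem):
        k = 1.0 / kw[doc_id] if doc_id in kw else 0.0
        s = 1.0 / sem[doc_id] if doc_id in sem else 0.0
        scores[doc_id] = (1 - alpha) * k + alpha * s
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical vector hits: same embedding, so the same distance for both documents.
vector_hits: List[Tuple[str, float]] = [("doc_a", 0.0), ("doc_b", 0.0)]
# sorted() is stable, so equal distances keep their input order; one is still second.
semantic_ids = [doc_id for doc_id, _ in sorted(vector_hits, key=lambda hit: hit[1])]
# Hypothetical keyword ranking; even with tied text scores, one document is listed second.
keyword_ids = ["doc_a", "doc_b"]

print(fuse(keyword_ids, semantic_ids))  # doc_a ~1.0, doc_b ~0.5
```

Passing `["doc_b"]` as the semantic list instead (only one document returned by vector search) gives 0.7 and 0.65, the same pair of scores as in the posted response.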
c
Odd, right? Yes, I'll let you know if I'm able to reproduce it, but at the moment everything is working as expected. It's probably best to upgrade to the latest version as well; we're currently on v26.0.
k
Oh yeah, that's definitely a good idea. There have been several fixes since then.
c
Thank you both! I'll let you know if we're ever able to reproduce.