Discrepancy in Search Results Between Postman and Python Library

TLDR Md raised an issue about differing search results when using Postman and a Python library. Kishore Nallan suggested trying a multi_search request to compare values, and to set a distance threshold on the vector component. The issue was resolved.

Md
Tue, 22 Aug 2023 09:40:15 UTC

I have a problem: a hybrid search returns one set of results in Postman and a different set through the Python library. I am using typesense==0.16.0 (Python client).

Kishore Nallan
Tue, 22 Aug 2023 10:10:51 UTC

Can you please post a small reproducible example?

Md
Tue, 22 Aug 2023 10:17:35 UTC

I am just getting the IDs for sequentially fetching from the DB using a Django queryset:
```
search_parameters1 = {
    'q': remove_trailing_whitespace(search),
    'query_by': 'embedding,title,job_responsibility,skills,industry,location,type,division,company',
    'include_fields': 'id',
    'per_page': 250,
    'page': 1,
    'sort_by': '_vector_distance:desc'
}
res = client.collections['alljobsupdated'].documents.search(search_parameters1)
newlist = [x['document']['id'] for x in res['hits']]
print(newlist)
```

Md
Tue, 22 Aug 2023 10:19:11 UTC

Here the order and the contents of the list differ from what I get when fetching directly from Postman with a URL like {server}/collections/alljobsupdated/documents/search?q=laravel&query_by=embedding,title,job_responsibility,skills,industry,location,type,division,company&sort_by=_vector_distance:desc&include_fields=id&page=250
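
One way to check whether the two requests really send the same parameters is to parse the Postman URL and compare it against the dict given to the Python client. The following is a minimal sketch using only the standard library; the host `http://localhost:8108` standing in for `{server}`, the query value `laravel` for `q`, and the helper name `diff_params` are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Parameters as passed to the Python client (q is assumed to be "laravel" here).
python_params = {
    "q": "laravel",
    "query_by": "embedding,title,job_responsibility,skills,industry,location,type,division,company",
    "include_fields": "id",
    "per_page": 250,
    "page": 1,
    "sort_by": "_vector_distance:desc",
}

# URL as used in Postman; "http://localhost:8108" is a placeholder for {server}.
postman_url = (
    "http://localhost:8108/collections/alljobsupdated/documents/search"
    "?q=laravel"
    "&query_by=embedding,title,job_responsibility,skills,industry,location,type,division,company"
    "&sort_by=_vector_distance:desc&include_fields=id&page=250"
)


def diff_params(client_params, url):
    """Return {name: (client_value, url_value)} for every parameter that differs."""
    url_params = dict(parse_qsl(urlsplit(url).query))
    # Query-string values are always text, so compare the client values as text too.
    client_as_text = {name: str(value) for name, value in client_params.items()}
    names = sorted(set(client_as_text) | set(url_params))
    return {
        name: (client_as_text.get(name), url_params.get(name))
        for name in names
        if client_as_text.get(name) != url_params.get(name)
    }


differences = diff_params(python_params, postman_url)
for name, (client_value, url_value) in differences.items():
    print(f"{name}: python={client_value!r} postman={url_value!r}")
```

With the values quoted in this thread, the comparison reports that `page` and `per_page` differ between the two requests, which on its own could account for different hit lists.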

Kishore Nallan
Tue, 22 Aug 2023 10:27:54 UTC

Can you try a multi_search request with both the Python client and Postman and compare the results? I wonder if there is some URL-encoding issue because of the GET params.

Kishore Nallan
Tue, 22 Aug 2023 10:28:09 UTC

Multi search uses POST, so the parameters travel in the request body and we can rule that out.
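
As a rough sketch of the comparison suggested above, the same search can be written once as a multi_search JSON body and sent from both tools. The helper name `build_multi_search_body` and the query `laravel` are assumptions for illustration; the commented-out client call assumes the Python client exposes `client.multi_search.perform(...)` and that `client` is already configured, so check it against the installed version.

```python
import json


def build_multi_search_body(query, collection="alljobsupdated"):
    """Build a multi_search request body mirroring the single search in this thread."""
    return {
        "searches": [
            {
                "collection": collection,
                "q": query,
                "query_by": "embedding,title,job_responsibility,skills,industry,location,type,division,company",
                "include_fields": "id",
                "per_page": 250,
                "page": 1,
                "sort_by": "_vector_distance:desc",
            }
        ]
    }


body = build_multi_search_body("laravel")

# Paste this JSON into Postman as the raw body of a POST to {server}/multi_search.
print(json.dumps(body, indent=2))

# With the Python client, the same dict would be sent roughly like this
# (assumes an already configured `client`):
#   res = client.multi_search.perform(body, {})
#   ids = [hit["document"]["id"] for hit in res["results"][0]["hits"]]
```

Because both tools then send an identical JSON body, any remaining difference in the returned IDs cannot come from how GET parameters were typed or encoded.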

Md
Tue, 22 Aug 2023 10:32:40 UTC

Okay, thanks brother, I will try it.

Md
Tue, 22 Aug 2023 14:29:46 UTC

It is working now, but it sometimes returns irrelevant data. Do we need to remove stop words before inserting documents into Typesense?

Kishore Nallan
Tue, 22 Aug 2023 14:32:43 UTC

Irrelevant results can show up because of the semantic search component.

Md
Tue, 22 Aug 2023 14:36:43 UTC

Is there any way to tweak the fusion mechanism dynamically?

Kishore Nallan
Wed, 23 Aug 2023 07:32:29 UTC

Can you try setting a distance threshold on the vector component:
```
vector_query=embedding:([], distance_threshold:0.30)
```
In the snippet above, `embedding` is the name of the vector field. Try adding the above to your queries to see if it helps.
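
For the Python client, the suggestion above amounts to adding one more key to the search parameters. Below is a minimal sketch; the helper name `with_distance_threshold` and the query `laravel` are assumptions for illustration, and the threshold `0.30` is simply the value quoted above, not a tuned recommendation.

```python
def with_distance_threshold(params, field="embedding", threshold=0.30):
    """Return a copy of the search parameters with a vector distance threshold added."""
    updated = dict(params)  # leave the caller's dict untouched
    updated["vector_query"] = f"{field}:([], distance_threshold:{threshold:.2f})"
    return updated


base_params = {
    "q": "laravel",
    "query_by": "embedding,title,job_responsibility,skills,industry,location,type,division,company",
    "include_fields": "id",
    "per_page": 250,
    "page": 1,
    "sort_by": "_vector_distance:desc",
}

search_params = with_distance_threshold(base_params)
print(search_params["vector_query"])

# The resulting dict would then be passed to the client as in the earlier snippet:
#   res = client.collections['alljobsupdated'].documents.search(search_params)
```

Keeping the threshold as a function argument makes it easy to try a few values per query and see how many irrelevant hits each one filters out.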

Md
Wed, 23 Aug 2023 09:44:26 UTC

Thank you for the support.