Troubleshooting Search Results for Health Products

TL;DR: Tom gets poorly ranked search results when an extra word is added in the middle of a query. Jason suggests trying `max_candidates` and `exhaustive_search`, explains that tokens are only dropped from the ends of the query, and needs more time to find a proper solution.

Tom
Fri, 02 Jun 2023 16:34:10 UTC

Hi guys, loving Typesense and embedding it into our healthtech app. Unfortunately we've hit a bit of a snag. We are searching across about 60k health products by product name. When we search for 'Prosys sleeve small', the first result is 'Prosys leg bag sleeve small', which is perfect. However, if we put any word between 'Prosys' and 'sleeve small', it seems to just return any result with 'prosys' in the name, all with equal weighting. For example, if we search 'Prosys bob sleeve small', the first result is 'Prosys Flofit self adhesive sheath standard', which doesn't contain 'sleeve' or 'small'. We've tried looking through lots of different search parameters and turning them on and off, but we can't get 'Prosys bob sleeve small' to return the result containing 'sleeve' and 'small' :disappointed: Any suggestions very welcome!

Jason
Fri, 02 Jun 2023 16:38:02 UTC

Could you try setting `max_candidates: 100000` and see if that helps?

Jason
Fri, 02 Jun 2023 16:38:26 UTC

The next thing to try is setting `exhaustive_search: true`
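
For reference, both suggestions are ordinary search parameters. The sketch below only builds the search request URL and does not contact a server. It assumes a hypothetical Typesense node at `localhost:8108` and `query_by=product_name` (Tom says he searches on the product name); the collection name `drug_tariff` is taken from the `request_params` in Tom's output further down, and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

# Hypothetical local Typesense node; replace with your own host and port.
BASE_URL = "http://localhost:8108"


def build_search_url(collection: str, query: str, **extra) -> str:
    """Return a Typesense search URL for `query` against `collection`."""
    params = {
        "q": query,
        "query_by": "product_name",  # assumption: the field Tom searches on
        "per_page": 3,
    }
    for key, value in extra.items():
        # Send booleans as lowercase "true"/"false" rather than Python's "True".
        params[key] = str(value).lower() if isinstance(value, bool) else value
    # quote_via=quote encodes spaces as %20 instead of "+".
    query_string = urlencode(params, quote_via=quote)
    return f"{BASE_URL}/collections/{collection}/documents/search?{query_string}"


url = build_search_url(
    "drug_tariff",
    "prosys geoff sleeve small",
    max_candidates=100000,   # first suggestion
    exhaustive_search=True,  # second suggestion
)
print(url)
```

Trying one parameter at a time, as suggested, just means passing only one of the two keyword arguments.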

Jason
Fri, 02 Jun 2023 16:38:37 UTC

If that also doesn’t work, could you adapt this with your schema and a few sample records to replicate the issue?
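
A reproduction usually needs only a collection schema and a handful of documents. This is a minimal sketch under stated assumptions: only `product_name` is indexed (assumed to be enough to show the ranking problem), and the three records are copied from the search output Tom posts later in this thread. The documents are serialised as JSON Lines, one document per line, which is the format Typesense's bulk import endpoint accepts.

```python
import json

# Minimal collection schema for a reproduction.
# Assumption: product_name is the only field needed to show the issue.
schema = {
    "name": "drug_tariff",
    "fields": [
        {"name": "product_name", "type": "string"},
    ],
}

# A few sample records, copied from the search output in this thread.
records = [
    {"id": "20073011000001106", "product_name": "Skinnies Silk vest short sleeve small adult"},
    {"id": "15039111000001100", "product_name": "Skinnies Viscose stockinette vest long sleeve small adult"},
    {"id": "23589311000001100", "product_name": "Prosys leg bag sleeve small"},
]

# Bulk import expects JSON Lines: one JSON document per line.
jsonl = "\n".join(json.dumps(record) for record in records)

print(json.dumps(schema, indent=2))
print(jsonl)
```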

Tom
Fri, 02 Jun 2023 16:45:56 UTC

Thanks for the quick reply. I should clarify that 'Prosys leg bag sleeve small' does appear in the results, but about 25 results down, which would be very confusing for the user.

Tom
Fri, 02 Jun 2023 16:46:04 UTC

Result with exhaustive search:

```json
{
  "facet_counts": [],
  "found": 6763,
  "hits": [
    {
      "document": {
        "amp_drug_tariff_id": "20072911000001103",
        "category": "Silk vest small adult",
        "colour": "White",
        "drug_tariff_id": "20073011000001106",
        "flavour": "",
        "id": "20073011000001106",
        "inactive": false,
        "name": "Skinnies Silk vest short sleeve small adult White (Dermacea Ltd) 1 device",
        "product_name": "Skinnies Silk vest short sleeve small adult",
        "quantity": "1",
        "size_weight": "",
        "sub_pack_information": "",
        "supplier": "Dermacea Ltd",
        "tariffs": "Part IXa",
        "unit_of_measure": "device"
      },
      "highlights": [
        {
          "field": "product_name",
          "matched_tokens": [
            "sleeve",
            "small"
          ],
          "snippet": "Skinnies Silk vest short <mark>sleeve</mark> <mark>small</mark> adult"
        }
      ],
      "text_match": 144681433930137601
    },
    {
      "document": {
        "amp_drug_tariff_id": "15037711000001107",
        "category": "Elasticated viscose stockinette vest small adult",
        "colour": "Beige",
        "drug_tariff_id": "15039111000001100",
        "flavour": "",
        "id": "15039111000001100",
        "inactive": false,
        "name": "Skinnies Viscose stockinette vest long sleeve small adult Beige (Dermacea Ltd) 1 device",
        "product_name": "Skinnies Viscose stockinette vest long sleeve small adult",
        "quantity": "1",
        "size_weight": "",
        "sub_pack_information": "",
        "supplier": "Dermacea Ltd",
        "tariffs": "Part IXa",
        "unit_of_measure": "device"
      },
      "highlights": [
        {
          "field": "product_name",
          "matched_tokens": [
            "sleeve",
            "small"
          ],
          "snippet": "Skinnies Viscose stockinette vest long <mark>sleeve</mark> <mark>small</mark> adult"
        }
      ],
      "text_match": 144681433930137601
    },
    {
      "document": {
        "amp_drug_tariff_id": "23589211000001108",
        "category": "Tubing and accessories",
        "colour": "",
        "drug_tariff_id": "23589311000001100",
        "flavour": "",
        "id": "23589311000001100",
        "inactive": false,
        "name": "Prosys leg bag sleeve small PLS3881 24cm-40cm (CliniSupplies Ltd) 4 device",
        "product_name": "Prosys leg bag sleeve small",
        "quantity": "4",
        "size_weight": "24cm-40cm",
        "sub_pack_information": "",
        "supplier": "CliniSupplies Ltd",
        "tariffs": "Part IXb",
        "unit_of_measure": "device"
      },
      "highlights": [
        {
          "field": "product_name",
          "matched_tokens": [
            "Prosys",
            "sleeve",
            "small"
          ],
          "snippet": "<mark>Prosys</mark> leg bag <mark>sleeve</mark> <mark>small</mark>"
        }
      ],
      "text_match": 144681433930137601
    }
  ],
  "out_of": 165394,
  "page": 1,
  "request_params": {
    "collection_name": "drug_tariff",
    "per_page": 3,
    "q": "prosys geoff sleeve small"
  },
  "search_cutoff": false,
  "search_time_ms": 15
}
```

Tom
Fri, 02 Jun 2023 16:47:05 UTC

So the result we want first is the third one. Given that it matched more tokens, I'm surprised it has the same `text_match` score...

Tom
Fri, 02 Jun 2023 16:52:10 UTC

Actually, without exhaustive search the result is similar: lots of results with the same `text_match` score. But the result we want first has the most `matched_tokens`, so I would expect it to have the highest `text_match` score.
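
Tom's expectation can be expressed as a client-side tiebreak: among hits with equal `text_match`, prefer the one with more highlighted tokens. This is a hypothetical sketch, not a Typesense feature. The hit data is the three hits from the response above, trimmed to the fields used and with `matched_tokens` flattened out of `highlights`.

```python
# The three hits from the exhaustive-search response, trimmed and flattened.
hits = [
    {
        "product_name": "Skinnies Silk vest short sleeve small adult",
        "matched_tokens": ["sleeve", "small"],
        "text_match": 144681433930137601,
    },
    {
        "product_name": "Skinnies Viscose stockinette vest long sleeve small adult",
        "matched_tokens": ["sleeve", "small"],
        "text_match": 144681433930137601,
    },
    {
        "product_name": "Prosys leg bag sleeve small",
        "matched_tokens": ["Prosys", "sleeve", "small"],
        "text_match": 144681433930137601,
    },
]

# All three scores are identical, so text_match alone cannot order them.
assert len({hit["text_match"] for hit in hits}) == 1

# Hypothetical tiebreak: higher text_match first, then more matched tokens.
# sorted() is stable (also with reverse=True), so full ties keep server order.
reranked = sorted(
    hits,
    key=lambda hit: (hit["text_match"], len(hit["matched_tokens"])),
    reverse=True,
)
for hit in reranked:
    print(hit["product_name"])
```

Note that this only reorders the page the server returned. Since the wanted product can sit about 25 results down, a page would need to be at least that large for this tiebreak to bring it to the top, so it is a stopgap rather than a fix.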

Jason
Fri, 02 Jun 2023 16:55:55 UTC

What’s happening here is that if we don’t find an exact match, we drop words from left to right and then right to left until we find enough results (as defined by `drop_tokens_threshold`).
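
The dropping order described here can be modelled in a few lines. This is a simplified sketch, not Typesense's implementation: `edge_dropped_queries` is a hypothetical helper that follows the order stated in the message (left to right, then right to left) and lists every reduced query, whereas the real engine stops as soon as enough results are found according to `drop_tokens_threshold`.

```python
def edge_dropped_queries(query: str) -> list[str]:
    """List the reduced queries produced by dropping words from the edges.

    Drops words one at a time from the left, then starts again from the
    full query and drops words one at a time from the right.
    """
    tokens = query.split()
    left_to_right = [" ".join(tokens[i:]) for i in range(1, len(tokens))]
    right_to_left = [" ".join(tokens[:i]) for i in range(len(tokens) - 1, 0, -1)]
    return left_to_right + right_to_left


for candidate in edge_dropped_queries("prosys geoff sleeve small"):
    print(candidate)
# geoff sleeve small
# sleeve small
# small
# prosys geoff sleeve
# prosys geoff
# prosys
```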

Jason
Fri, 02 Jun 2023 16:57:35 UTC

In your case, the word that needs to be dropped is in the middle of the search query…
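
Why a middle word is a problem can be seen on a tiny sample. The sketch below uses a naive "every query word must appear in the name" matcher, which is an assumption for illustration and not Typesense's ranking. The product names are the three hits above plus Tom's Flofit example. Dropping from the edges can only remove "geoff" by also removing "prosys" (from the left) or both "sleeve" and "small" (from the right), so the reduced query that first finds matches is "sleeve small", which every sleeve/small product satisfies equally. That is consistent with the identical `text_match` scores Tom sees.

```python
# Product names from this thread: the three hits above plus Tom's Flofit example.
PRODUCTS = [
    "Skinnies Silk vest short sleeve small adult",
    "Skinnies Viscose stockinette vest long sleeve small adult",
    "Prosys leg bag sleeve small",
    "Prosys Flofit self adhesive sheath standard",
]


def matching_products(query: str) -> list[str]:
    """Return products whose names contain every query word (case-insensitive)."""
    wanted = set(query.lower().split())
    return [name for name in PRODUCTS if wanted <= set(name.lower().split())]


# First left-to-right reduction: still contains "geoff", so nothing matches.
print(matching_products("geoff sleeve small"))   # []
# Next reduction: three products, all matching both words equally.
print(matching_products("sleeve small"))
# Removing the middle word instead isolates the intended product.
print(matching_products("prosys sleeve small"))  # ['Prosys leg bag sleeve small']
```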

Jason
Fri, 02 Jun 2023 16:59:14 UTC

Need to think through how to solve this without a performance impact… Let me get back to you on this in a few days

Jason
Fri, 02 Jun 2023 16:59:41 UTC

This would still help to build a good test case: