#community-help

Phrase Search Relevancy and Weights Fix

TLDR Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.

Mar 08, 2023 (9 months ago)
Jason
05:02 PM
from your app, ok cool
05:03
Jason
05:03 PM
Could you also run the same search from Typesense Cloud search UI and paste the payload?
Jan
05:03 PM
With the same query_by and facet settings ?
Jason
05:04 PM
Yup
05:05
Jason
05:05 PM
Oh wait, I just noticed I misread your original question:

> When I do the same phrase search (using double quotes) in the cloud.typesense.org/clusters/ interface it’s also not returning records as expected. What’s your advice on this?
I misread this as it IS returning correct results in Typesense Cloud, but not in your app
05:05
Jason
05:05 PM
So that’s why I had asked you to send me the payload sent by your app vs Typesense Cloud
05:06
Jason
05:06 PM
Sorry about the confusion. Could you right click on the network request in the browser dev console, click on copy-as-curl and paste that curl command here?
Jan
05:07 PM
Yes the same results in the app as on cloud
05:07
Jan
05:07 PM
Yes I will grab that CURL
05:08
Jan
05:08 PM
Sorry just to be clear: the cloud and the app are showing the same results.
05:09
Jan
05:09 PM
It’s just that the relevance of the results is off when using "double quotes". But if you run the same query with a prepended space, like " double quotes", the results are relevant and as expected.
Jason
05:09 PM
Yup, understood
05:10
Jason
05:10 PM
If you can generate the curl command from the network request sent by your app, I can then take a closer look
Jan
05:10 PM
curl '' \
  -H 'authority: ' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: text/plain' \
  -H 'origin: https://testapp.digibeetle.eu' \
  -H 'pragma: no-cache' \
  -H 'referer: https://testapp.digibeetle.eu/' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: cross-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"query_by":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","query_by_weights":"200,500,100,50,250,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1","num_typos":0,"highlight_affix_num_tokens":"20","sort_by":"_text_match:desc,post_date:desc","highlight_full_fields":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","collection":"sources","q":"\"state of the art\"","facet_by":"key_topics,relevant_gdpr_articles,document_types,sectors,document_categories,document_status,type_of_bcr,competent_supervisory_authority_bcr_lead,case_law_case_status,case_law_case_stage,outcomes_of_the_procedure,type_of_procedure,advocate_general_name,judge_rapporteur,chamber,post_date,source_types,source_abbreviation","max_facet_values":10,"page":1,"per_page":10}]}' \
  --compressed
05:11
Jan
05:11 PM
And this is the query with a prepended space, like " state of the art":
05:11
Jan
05:11 PM
curl '' \
  -H 'authority: ' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: text/plain' \
  -H 'origin: https://testapp.digibeetle.eu' \
  -H 'pragma: no-cache' \
  -H 'referer: https://testapp.digibeetle.eu/' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: cross-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"query_by":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","query_by_weights":"200,500,100,50,250,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1","num_typos":0,"highlight_affix_num_tokens":"20","sort_by":"_text_match:desc,post_date:desc","highlight_full_fields":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","collection":"sources","q":"\" state of the art\"","facet_by":"key_topics,relevant_gdpr_articles,document_types,sectors,document_categories,document_status,type_of_bcr,competent_supervisory_authority_bcr_lead,case_law_case_status,case_law_case_stage,outcomes_of_the_procedure,type_of_procedure,advocate_general_name,judge_rapporteur,chamber,post_date,source_types,source_abbreviation","max_facet_values":10,"page":1,"per_page":10}]}' \
  --compressed

Jason
05:14 PM
Looks like you’re running 0.23.1. Can we try upgrading you to the latest version to see if some of the fixes we have there help with your dataset?
Jan
05:14 PM
yes please

05:20
Jan
05:20 PM
Do we need to re-sync the collection index?
Jason
05:20 PM
No, not necessary

Jan
05:22 PM
Should we test again?
Jason
05:25 PM
I was just testing after the upgrade… Looks like the issue still persists
05:25
Jason
05:25 PM
Taking a closer look
05:28
Jason
05:28 PM
Could you try setting the weights to this:

"query_by_weights": "127,127,100,50,127,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1",
05:29
Jason
05:29 PM
Weights can only go up to a max of 127, beyond that it causes overflow and I wonder if that’s causing issues
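For anyone hitting the same limit, here is a minimal sketch of capping a `query_by_weights` string client side before sending it. It assumes the ceiling of 127 mentioned above; `clampWeights` and `MAX_WEIGHT` are hypothetical names, not part of Typesense or its client libraries.

```javascript
// Hypothetical helper, not part of Typesense or its client libraries.
// MAX_WEIGHT reflects the ceiling of 127 mentioned in this thread.
const MAX_WEIGHT = 127;

// Takes a comma-separated weights string and caps every value at `max`.
function clampWeights(weightsCsv, max = MAX_WEIGHT) {
  return weightsCsv
    .split(',')
    .map((w) => Math.min(Number.parseInt(w.trim(), 10), max))
    .join(',');
}

// The weights from the request earlier in this thread.
const original = '200,500,100,50,250,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1';
console.log(clampWeights(original));
```

Capping flattens the difference between the fields that were weighted 200, 250 and 500, so they all end up at 127. If the relative order of those fields matters, rescaling all weights proportionally into the allowed range may be a better fit than capping.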
Jan
05:29 PM
ah alright! sorry about that
05:29
Jan
05:29 PM
checking it right now
05:31
Jan
05:31 PM
No effect unfortunately
05:32
Jan
05:32 PM
One sec I will double test this again
05:33
Jan
05:33 PM
curl '' \
  -H 'authority: ' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: text/plain' \
  -H 'origin: https://testapp.digibeetle.eu' \
  -H 'pragma: no-cache' \
  -H 'referer: https://testapp.digibeetle.eu/' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: cross-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"query_by":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","query_by_weights":"127,127,100,50,127,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1","num_typos":0,"highlight_affix_num_tokens":"20","sort_by":"_text_match:desc,post_date:desc","highlight_full_fields":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","collection":"sources","q":"\"state of the art\"","facet_by":"key_topics,relevant_gdpr_articles,document_types,sectors,document_categories,document_status,type_of_bcr,competent_supervisory_authority_bcr_lead,case_law_case_status,case_law_case_stage,outcomes_of_the_procedure,type_of_procedure,advocate_general_name,judge_rapporteur,chamber,post_date,source_types,source_abbreviation","max_facet_values":10,"page":1,"per_page":10}]}' \
  --compressed
05:34
Jan
05:34 PM
I get the same results
Jason
05:36 PM
Ok, thank you for checking. We’ll take a closer look later today and keep you posted.

Jan
05:39 PM
meanwhile I will double check the results in Typesense Cloud again

05:47
Jan
05:47 PM
I compared our app and the Cloud again using the same query_by and query_by_weights settings. The results in Cloud are (almost) the same as in our app when a space is prepended inside the double-quoted phrase, and their relevance is what we expect. Without the space in the phrase query, the results are different and not what we expect.

Mar 09, 2023 (9 months ago)
Jan
09:28 AM
Jason I presume there is no update on this yet ?
Kishore Nallan
12:30 PM
I will be looking into this issue today. Will update.
03:53
Kishore Nallan
03:53 PM
I've identified the issue and will work on a patch for this. I will keep you posted.
Mar 10, 2023 (9 months ago)
Kishore Nallan
11:47 AM
Jan I've a fix for this problem. Can we update your cluster to the version with the fix? Let me know if we can go ahead and do that (or you prefer a particular time to do that).
Mar 13, 2023 (9 months ago)
Jan
10:20 AM
Hi Kishore You can update the cluster, thanks!
Kishore Nallan
10:35 AM
Done, please check again
Jan
10:51 AM
No difference
Kishore Nallan
11:00 AM
Hmm, let me look. I did test locally on a similar document that reproduced the issue.
Jan
11:02 AM
Let me know if I can provide you with anything that might help, or maybe I need to refresh/make changes on our side. I’ll re-test this on the cloud as well now, but in our app we don’t see any changes in the results.
Kishore Nallan
11:04 AM
Ok see this query:
11:04
Kishore Nallan
11:04 AM
curl '' --data-raw '{"searches":[{"query_by":"case_law_documents","sort_by":"_text_match:desc,post_date:desc","collection":"sources","q":"\"state of the art\"","per_page":10, "highlight_fields": "case_law_documents", "include_fields": "id"}]}' | jq
11:04
Kishore Nallan
11:04 AM
Earlier, this was producing hits that did not have the full phrase.
Jan
11:07 AM
But that query is only querying 1 attribute
11:12
Jan
11:12 AM
Btw, does the text_match_info.score tell us anything about the relevance of the hits when using a phrase (double-quoted) query? In the cloud and in our app we get scores of 100, while the query with a prepended space gives us text_match_info.score values that are way higher (e.g. text_match_info.score: 2314894167593451644).
Kishore Nallan
11:16 AM
For phrase search since all documents have the exact match, it's just a constant. The match info is misleading there. We should fix it.
Jan
11:42 AM
Thanks for clearing that up, do you need more information from me to be able to reproduce the relevance problem we experience when querying with phrase search using this curl:
curl 'https://cloud.typesense.org/clusters/ocpdr54qif7a3tb0p/api/multi_search' \
  -H 'authority: ' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: text/plain' \
  -H 'cookie: _gcl_aw=GCL.1671042832.CjwKCAiAheacBhB8EiwAItVO23gnt1Leqk8-8BYwLubtO9k8e2FfBfyTot8gfc9tXYWXeegNH8Pf_RoCgaUQAvD_BwE; _gcl_au=1.1.2056632186.1671042832; _gac_UA-116415641-1=1.1671042832.CjwKCAiAheacBhB8EiwAItVO23gnt1Leqk8-8BYwLubtO9k8e2FfBfyTot8gfc9tXYWXeegNH8Pf_RoCgaUQAvD_BwE; __stripe_mid=67095200-c2fc-4b42-889f-8ac824a73822c95db9; _gac_UA-116415641-2=1.1671042832.CjwKCAiAheacBhB8EiwAItVO23gnt1Leqk8-8BYwLubtO9k8e2FfBfyTot8gfc9tXYWXeegNH8Pf_RoCgaUQAvD_BwE; _ga_XTFPJRM8H9=GS1.1.1673438604.3.0.1673438604.0.0.0; _ga=GA1.2.853002192.1671042832; _gid=GA1.2.634655676.1678705359; _dc_gtm_UA-116415641-1=1; __stripe_sid=268480af-be78-45ce-b9d4-c17b80f5cc1eea59c6; _typesense_cloud_app_session=ERv8bWisgxuGfRaMHP%2FIjcXxj753nfkzx%2F0gY3eRs5zbw3eZL8e8DEm%2BpR5pwRxAKxwNPg9611knkD5JAYOKfOcaWH6N5EwjELCFtmf2e1tWYz8goYHna%2FMA1to%2FgTaPnEbal%2BC40i81sGiiKiQorGiKGdUGB0C0sLfYEiI2w1HiCzuPJL4AWmZMYTTIv32pJlADdyu9OY0txz28jDUk41Ac2Z6GWrTj%2FHDy9jFpEPXeGz4x1uA6pfFwdrnJ4c0qNT3wH41%2B8%2FnQ%2BonBVDd%2FfF%2BjbySRHprCttBHh%2BMVMOk87EW7UouFRfkF9yK8nG8akO7wPlhiUSlhc8uyz8cPjEx24jVvQgvq%2B5YbEVyMR6VjfDrc6v8V2G9fpVhmUuOYvnvIww8FafGpBbkJbiQfjzE1CocLXx73wiP%2BPNfL%2Bw5s%2BqkeFrdOyxXd2RkPKDzQs%2F1nVO4y6AD1Ps47ZWTxqQ6IXQHB17tqtbKPFhBL254h%2BM%2Fpv8zKXkBu0LGShzYTxoFwnMOQLKIBptSFzVbYHyiTB8EhGkGjX0a2bpLkKq9m7qBLNRK4YOENw0EbaoVZMDBA4QlMm9I%3D--ndZv%2FxfNntvyU3R%2B--hm%2FIrIn7YLqDjuw58R0%2B3Q%3D%3D' \
  -H 'origin: https://cloud.typesense.org' \
  -H 'pragma: no-cache' \
  -H 'referer: https://cloud.typesense.org/clusters/ocpdr54qif7a3tb0p/collections/sources/documents/search' \
  -H 'sec-ch-ua: "Google Chrome";v="111", "Not(A:Brand";v="8", "Chromium";v="111"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36' \
  -H 'x-csrf-token: kAR1Yt8jPXq1PyOwGH6cPu7EL80Qpz7gUOKDHveJrWgtSy-gD1WMDuacdFI06KPdE7xgVq-PSAEzfw37vpHWQA' \
  -H 'x-typesense-api-key: iyplt0v67zms5uhdk3w8gxcae4oqbrfn' \
  --data-raw '{"searches":[{"preset":"Main preset like testapp","collection":"sources","q":"\"state of the art\"","page":1,"per_page":5}]}' \
  --compressed

Kishore Nallan
11:44 AM
Thanks, let me analyze and get back to you.
01:04
Kishore Nallan
01:04 PM
The output of that query contains the phrase matches correctly. Can you tell me what issue you are seeing, maybe I am missing something.
01:05
Kishore Nallan
01:05 PM
First result:

"general_documents": [
              {
                "matched_tokens": [
                  "state",
                  "of",
                  "the",
                  "art",
                  "of",
                  "the"
                ],
                "snippet": "based on the present <mark>state</mark> <mark>of</mark> <mark>the</mark> <mark>art</mark> <mark>of</mark> <mark>the</mark> Internet, which"
              }
            ],

We do highlight all tokens appearing in the query in the snippet, but the phrase state of the art does exist.
Jan
02:59 PM
When we prepend the phrase search with a space (" state of the art"), the results are more relevant (and completely different) compared to "state of the art".
03:12
Jan
03:12 PM
It’s not that we don’t see results that have matches, the problem is that the relevance order is not like we’d expect.
Kishore Nallan
03:17 PM
I think that's just a coincidence. I have to check how that additional space is being treated by the engine, maybe that influences a different ordering.
Jan
03:18 PM
The results we see in the cloud using the preset (like in my last curl code), are the least relevant results if we don’t use a prepended space in the phrase search. If we add the space, the results are perfect.
Kishore Nallan
03:18 PM
With the space, 750 results are fetched, but without it, only 75.
03:19
Kishore Nallan
03:19 PM
The earlier problem with non-phrase matches being returned is fixed: this seems different and relevancy related. Will look again and get back to you.
03:23
Kishore Nallan
03:23 PM
The first result which you deem more relevant (id: 4580) when space is used, also occurs in the query without prefix space, but it occurs much later.
03:25
Kishore Nallan
03:25 PM
Would you be able to describe to me why you find that more relevant? All results are documents that contain the exact phrase, so what makes some more relevant than others? I can use this as a guide to see how we can fix the ordering.
Jan
03:30 PM
I thought the order of the results should be dictated by query_by_weights in combination with the _text_match:desc sorting
03:30
Jan
03:30 PM
But I will try to explain why we expect these results to rank high and why the other should be ranked low.
Mar 14, 2023 (9 months ago)
Jan
08:49 AM
If the search queries contain more than one word between quotation marks, then we get unexpected results.
First example: There is a record (ID 4580) with an attribute post_title and the value EDPB Guidelines 4/2019. If you query this title without double quotes, we get good results, meaning this is the first hit we see. If you use phrase search, this record shows up as the third result, even though post_title has query_by_weights: 115, which is the third-highest weight that we’ve used. But if you phrase search with a space prepended, like " EDPB Guidelines 4/2019", the results are what we’d expect.

Second example: The same record (ID 4580) has an attribute key_topics with an (array) value containing "state of the art". This attribute has query_by_weights: 127 (the highest weight we’ve used, and we’ve used it only for this attribute). The same record also has an attribute called labels containing an array with the value "state of the art"; this labels attribute has query_by_weights: 120.
If you phrase search for "state of the art", this record ends up somewhere around ranking position 50. Again, if you phrase search with a space prepended, like " state of the art", the results are what we’d expect.
Kishore Nallan
08:52 AM
Thanks for the detailed examples. I've been looking into this behaviour myself and I see that there are cases where the weight is being ignored in phrase search that is having an impact here. With a padded space, the search query is no longer being searched via the phrase search code path so that's why the weights work. I'm working on a fix. I'll keep you posted.
Jan
08:55 AM
Thanks!
Mar 15, 2023 (9 months ago)
Jan
09:08 AM
Hi Kishore, not to rush you but to manage expectations here on my side. Do you have an ETA for the phrase search fix?
Kishore Nallan
10:38 AM
I'm working on incorporating the weights properly into phrase search. It will take a few days to implement and then thoroughly test. So I should be able to get you a patched build by early next week.
Jan
11:22 AM
👌
Mar 21, 2023 (9 months ago)
Kishore Nallan
06:46 AM
👋 I have a build with a patch. Do you want to test it out on a dev/staging environment first?
Mar 22, 2023 (9 months ago)
Jan
09:11 AM
Hi!
09:13
Jan
09:13 AM
You can roll it out to the current environment! Or was it already rolled out? We see that things have changed for the better 👌
Kishore Nallan
09:53 AM
Not yet, I can roll it out now. Since your instance is not a HA instance there will be a downtime. So let me know if you want to do it at a specific time.
Jan
11:13 AM
Hi, please roll out asap 👌
Kishore Nallan
11:37 AM
It's done.
Jan
12:24 PM
Yes this looks perfect, I’m going to do some more testing but it seems like it’s working!
Kishore Nallan
12:33 PM
Happy to hear!
Jan
02:11 PM
Hi
02:17
Jan
02:17 PM
Not sure if this is related, but if we type any symbols in the search bar, our document titles come back as ‘Undefined’.
02:19
Jan
02:19 PM
Like this, all results do show the correct date and they correctly link to the documents.
[Screenshot]
02:20
Jan
02:20 PM
If you start searching using double quote
[Screenshot]
Kishore Nallan
02:28 PM
Can you share the request being made for this query like before?
02:47
Kishore Nallan
02:47 PM
Ignore, I think we have to handle this. I will get back to you.
Jan
03:27 PM
Thanks
Kishore Nallan
04:19 PM
What's happening here is that the special characters are removed from the query string unless they are explicitly allowed via symbols_to_index configuration. This results in an empty query string which is treated as a * wildcard search. Those "undefined" values are showing up because we don't return highlights for wildcard searches since the query is essentially empty / catch-all.
Jason
04:41 PM
In general, when you process the response from Typesense to render the UI, you want to first check if a highlight exists for that field inside the highlight key, if it doesn’t then fallback to the field inside the document key…
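The fallback described above could look roughly like the sketch below. It assumes each hit carries a `document` object and a `highlight` object keyed by field name whose entries have a `snippet` string; `displayValue` is a hypothetical helper name. Array fields and hits already transformed by the InstantSearch adapter may be shaped differently, so treat this as a starting point only.

```javascript
// Hypothetical helper: prefer the highlighted snippet for a field when the
// response has one, otherwise fall back to the raw value in `document`.
// Assumed hit shape: { document: {...}, highlight: { field: { snippet } } }.
function displayValue(hit, field) {
  const highlighted = hit.highlight && hit.highlight[field];
  if (highlighted && typeof highlighted.snippet === 'string') {
    return highlighted.snippet;
  }
  // No highlight (for example on a wildcard search): use the stored value.
  const raw = hit.document ? hit.document[field] : undefined;
  return raw === undefined || raw === null ? '' : String(raw);
}
```

With a guard like this, a wildcard search that returns no highlights would render the stored title instead of an undefined value.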
Mar 23, 2023 (9 months ago)
Jan
10:23 AM
Thanks for your reply. Instead of showing all results on a wildcard search, how can I prevent a wildcard search all together? I’m using instantsearch.js
Mar 29, 2023 (8 months ago)
Jan
10:43 AM
Thanks Jason, but I don’t understand how this part (helper.state.query === '') is related to wildcard searches, it just hides something if the searchbox is empty. A user can still type in special characters and we get the ‘Undefined’ results. I have played around with this helper.state.query before, but it seems only useful in specific cases when you only want to listen to the state of the searchbox. This helper does not account for the use of facets/checkbox-filter states. Anyway, that is something not related to Typesense
Kishore Nallan
10:51 AM
Typesense treats an empty q as a wildcard query as well. If q contains only special characters, the tokenizer removes them and we again end up with an empty q string.
Jan
10:53 AM
Yes
11:00
Jan
11:00 AM
But how can I prevent ‘wildcard’ results? If I type any special character, I see that character as the value for q in the payload. So q is not empty when it’s sent to Typesense, right? Typesense will interpret this as an empty q string and return all results from the index. That last part is what I want to prevent.
Kishore Nallan
11:02 AM
I regret allowing empty q to be treated as a wildcard 😞 The workaround is to mimic Typesense behavior client side: in addition to checking whether q is empty, check whether it consists only of symbols.
Jan
11:04 AM
Yes, that’s what I’m looking for (a client-side solution). It seems that Jason’s example (the songs-search demo) is doing this, but I can’t figure out what code is filtering these special characters.
11:05
Jan
11:05 AM
If I search for & for example, nothing happens -> no payload… what part of the code is preventing this from happening?
Kishore Nallan
11:07 AM
How about just checking /^[^a-zA-Z0-9]+$/.test(helper.state.query) to detect a query string that contains no alphanumeric characters? That can be added along with the empty string check here: https://github.com/typesense/showcase-songs-search/blob/e7ad97ce4e09191743abd727c2dfc949811bbcd6/src/app.js#L176
Jan
11:09 AM
Thanks Kishore, going to give it a try!
Kishore Nallan
11:13 AM
Actually the above will not allow a string with both special characters and alpha numeric like foo? bar
11:17
Kishore Nallan
11:17 AM
This will work:

/[a-zA-Z0-9]/.test(helper.state.query)
11:18
Kishore Nallan
11:18 AM
This returns true if at least one alphanumeric character appears in the query string, which is what we want here.
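Put together, the client-side check might be wired up along these lines. This is a sketch under assumptions: `hasSearchableText` and `guardedSearch` are hypothetical names, and `helper` stands for the object passed to an instantsearch.js `searchFunction`, exposing `state.query` and `search()` as in the showcase code linked above.

```javascript
// Hypothetical helper: true when the query has at least one ASCII letter or
// digit, using the regex suggested in this thread.
function hasSearchableText(query) {
  return /[a-zA-Z0-9]/.test(query);
}

// Hypothetical guard in the style of an instantsearch.js `searchFunction`.
// It only forwards the search when the query contains something other than
// symbols, so an empty or symbol-only query never reaches Typesense.
function guardedSearch(helper) {
  if (hasSearchableText(helper.state.query)) {
    helper.search();
    return true;
  }
  return false; // request skipped; the caller can clear or hide the hits
}
```

Two caveats: the character class only covers ASCII, so a query made up entirely of non-ASCII letters (for example "é") would be treated as symbol-only, and a guard based on the query alone also blocks searches that rely only on facet or filter state.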
