Hi there! I’m trying to make the phrase match work...
# community-help
j
Hi there! I’m trying to make the phrase match work with double quotes. We are using the Typesense Instantsearch Adapter, but it doesn’t work as expected: it’s not showing records that actually DO contain the exact phrase, and it only works as expected when the phrase matches the complete value of an attribute. When I do the same phrase search (using double quotes) in the cloud.typesense.org/clusters/ interface it’s also not returning records as expected. What’s your advice on this? Update: if I prefix the phrase with a space, like " this is my phrase", then the results show up as expected.
j
Hmm, you can see the difference in queries sent to Typesense by looking at the network requests in your dev console in both Typesense Cloud and in your app…
Could you post screenshots of the search parameters being sent in the network requests, and I can help explain the difference
j
Extremely sorry for my late reply. I’m back on this project again and will reply right away on your follow ups! Do you mean the Network tab > multi_search request > response tab?
Query with “state of the art” between quotes
"request_params":{"collection_name":"sources","per_page":10,"q":"\"state of the art\""},
Query with a space prepended, like " state of the art"
"request_params":{"collection_name":"sources","per_page":10,"q":"\" state of the art\""}
j
Do you mean the Network tab > multi_search request > response tab?
No, Network tab > multi_search request > Payload tab
j
Query with “state of the art” between quotes
j
Actually, could you copy-paste that here? That’ll be easier
j
{
  "searches": [
    {
      "query_by": "post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents",
      "query_by_weights": "200,500,100,50,250,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1",
      "num_typos": 0,
      "highlight_affix_num_tokens": "20",
      "sort_by": "_text_match:desc,post_date:desc",
      "highlight_full_fields": "post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents",
      "collection": "sources",
      "q": "\"state of the art\"",
      "facet_by": "key_topics,relevant_gdpr_articles,document_types,sectors,document_categories,document_status,type_of_bcr,competent_supervisory_authority_bcr_lead,case_law_case_status,case_law_case_stage,outcomes_of_the_procedure,type_of_procedure,advocate_general_name,judge_rapporteur,chamber,post_date,source_types,source_abbreviation",
      "max_facet_values": 10,
      "page": 1,
      "per_page": 10
    }
  ]
}
j
And this is in Typesense Cloud?
j
No
j
from your app, ok cool
Could you also run the same search from Typesense Cloud search UI and paste the payload?
j
With the same query_by and facet settings ?
j
Yup
Oh wait, I just noticed I misread your original question:
When I do the same phrase search (using double quotes) in the cloud.typesense.org/clusters/ interface it’s also not returning records as expected. What’s your advice on this?
I misread this as it IS returning correct results in Typesense Cloud, but not in your app
So that’s why I had asked you to send me the payload sent by your app vs Typesense Cloud
Sorry about the confusion. Could you right click on the network request in the browser dev console, click on copy-as-curl and paste that curl command here?
j
Yes the same results in the app as on cloud
Yes I will grab that CURL
Sorry just to be clear: the cloud and the app are showing the same results.
It’s just that the relevance of the results is off when using "double quotes". But if you do the same query with a prepended space, like " double quotes", then we see results that are as expected/relevant
j
Yup, understood
If you can generate the curl command from the network request sent by your app, I can then take a closer look
j
curl 'https://ocpdr54qif7a3tb0p-1.a1.typesense.net/multi_search?x-typesense-api-key=meSE8pinMxwFJlECsmTRVMLCbrrzoL2R' \
  -H 'authority: ocpdr54qif7a3tb0p-1.a1.typesense.net' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: text/plain' \
  -H 'origin: https://testapp.digibeetle.eu' \
  -H 'pragma: no-cache' \
  -H 'referer: https://testapp.digibeetle.eu/' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: cross-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"query_by":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","query_by_weights":"200,500,100,50,250,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1","num_typos":0,"highlight_affix_num_tokens":"20","sort_by":"_text_match:desc,post_date:desc","highlight_full_fields":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","collection":"sources","q":"\"state of the art\"","facet_by":"key_topics,relevant_gdpr_articles,document_types,sectors,document_categories,document_status,type_of_bcr,competent_supervisory_authority_bcr_lead,case_law_case_status,case_law_case_stage,outcomes_of_the_procedure,type_of_procedure,advocate_general_name,judge_rapporteur,chamber,post_date,source_types,source_abbreviation","max_facet_values":10,"page":1,"per_page":10}]}' \
  --compressed
And this is the query with a prepended space, like " state of the art":
curl 'https://ocpdr54qif7a3tb0p-1.a1.typesense.net/multi_search?x-typesense-api-key=meSE8pinMxwFJlECsmTRVMLCbrrzoL2R' \
  -H 'authority: ocpdr54qif7a3tb0p-1.a1.typesense.net' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: text/plain' \
  -H 'origin: https://testapp.digibeetle.eu' \
  -H 'pragma: no-cache' \
  -H 'referer: https://testapp.digibeetle.eu/' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: cross-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"query_by":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","query_by_weights":"200,500,100,50,250,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1","num_typos":0,"highlight_affix_num_tokens":"20","sort_by":"_text_match:desc,post_date:desc","highlight_full_fields":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","collection":"sources","q":"\" state of the art\"","facet_by":"key_topics,relevant_gdpr_articles,document_types,sectors,document_categories,document_status,type_of_bcr,competent_supervisory_authority_bcr_lead,case_law_case_status,case_law_case_stage,outcomes_of_the_procedure,type_of_procedure,advocate_general_name,judge_rapporteur,chamber,post_date,source_types,source_abbreviation","max_facet_values":10,"page":1,"per_page":10}]}' \
  --compressed
👍 1
j
Looks like you’re running 0.23.1. Can we try upgrading you to the latest version to see if some of the fixes we have there help with your dataset?
j
yes please
👍 1
Do we need to re-sync the collection index?
j
No, not necessary
👍 1
j
Should we test again?
j
I was just testing after the upgrade… Looks like the issue still persists
Taking a closer look
Could you try setting the weights to this:
"query_by_weights": "127,127,100,50,127,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1",
Weights can only go up to a max of 127, beyond that it causes overflow and I wonder if that’s causing issues
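A small client-side guard for that limit could look like the sketch below. The names clampWeights and MAX_WEIGHT are made up for illustration, and the 127 ceiling is taken from the message above rather than verified independently.

```javascript
// Hypothetical helper: cap every entry of a query_by_weights string at 127,
// the maximum mentioned in this thread.
const MAX_WEIGHT = 127;

function clampWeights(weights) {
  return weights
    .split(",")
    .map((w) => Math.min(Number(w.trim()), MAX_WEIGHT))
    .join(",");
}

// First five weights from the original payload.
console.log(clampWeights("200,500,100,50,250"));
```

Note that clamping flattens the difference between the top weights (200, 500 and 250 all become 127), so the relative ordering of the most important fields may need to be re-scaled by hand instead.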
j
ah alright! sorry about that
checking it right now
No effect unfortunately
One sec I will double test this again
curl 'https://ocpdr54qif7a3tb0p-1.a1.typesense.net/multi_search?x-typesense-api-key=meSE8pinMxwFJlECsmTRVMLCbrrzoL2R' \
  -H 'authority: ocpdr54qif7a3tb0p-1.a1.typesense.net' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: text/plain' \
  -H 'origin: https://testapp.digibeetle.eu' \
  -H 'pragma: no-cache' \
  -H 'referer: https://testapp.digibeetle.eu/' \
  -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: cross-site' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"query_by":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","query_by_weights":"127,127,100,50,127,50,50,25,75,75,50,50,50,50,50,50,50,1,1,1","num_typos":0,"highlight_affix_num_tokens":"20","sort_by":"_text_match:desc,post_date:desc","highlight_full_fields":"post_title,key_topics,shorthand,full_source_title,labels,relevant_gdpr_recitals,directive_95_46_ec_equivalent,post_content,answered_questions,prelim_qs_referred_or_pleas_in_law,data_categories,data_subject_categories,organisation_focus,sectors,party_a,party_b,party_c,case_law_doc_celex_id,case_law_documents,general_documents","collection":"sources","q":"\"state of the art\"","facet_by":"key_topics,relevant_gdpr_articles,document_types,sectors,document_categories,document_status,type_of_bcr,competent_supervisory_authority_bcr_lead,case_law_case_status,case_law_case_stage,outcomes_of_the_procedure,type_of_procedure,advocate_general_name,judge_rapporteur,chamber,post_date,source_types,source_abbreviation","max_facet_values":10,"page":1,"per_page":10}]}' \
  --compressed
I get the same results
j
Ok, thank you for checking. We’ll take a closer look later today and keep you posted.
🙌 1
j
meanwhile I will double check the results in Typesense Cloud again
👍 1
I compared our app and the Cloud again using the same query_by and query_by_weights settings. The results in Cloud are (almost) the same when searching the phrase match with a space prepended in the double-quoted query. They are what we expect in terms of relevance. But the results are different (and not what we expect) without the space in the phrase query.
👍 1
@Jason Bosco I presume there is no update on this yet ?
k
I will be looking into this issue today. Will update.
I've identified the issue and will work on a patch for this. I will keep you posted.
@Jan Willem Hoogstraten I've a fix for this problem. Can we update your cluster to the version with the fix? Let me know if we can go ahead and do that (or you prefer a particular time to do that).
j
Hi @Kishore C You can update the cluster, thanks!
k
Done, please check again
j
No difference
k
Hmm, let me look. I did test locally on a similar document that reproduced the issue.
j
Let me know if I can provide you with anything that might help, or maybe I need to refresh/make changes on our side. I’ll re-test this on the cloud as well now, but in our app we don’t see any changes in the results.
k
Ok see this query:
curl 'https://ocpdr54qif7a3tb0p-1.a1.typesense.net/multi_search?x-typesense-api-key=meSE8pinMxwFJlECsmTRVMLCbrrzoL2R' --data-raw '{"searches":[{"query_by":"case_law_documents","sort_by":"_text_match:desc,post_date:desc","collection":"sources","q":"\"state of the art\"","per_page":10, "highlight_fields": "case_law_documents", "include_fields": "id"}]}' | jq
Earlier, this was producing hits that did not have the full phrase.
j
But that query is only querying 1 attribute
Btw, does text_match_info.score tell us anything about the relevance of the hits when using a phrase (double-quoted) query? In the cloud and our app we get scores of 100, while the query with a prepended space gives us scores that are way higher (e.g. text_match_info.score: 2314894167593451644).
k
For phrase search since all documents have the exact match, it's just a constant. The match info is misleading there. We should fix it.
j
Thanks for clearing that up, do you need more information from me to be able to reproduce the relevance problem we experience when querying with phrase search using this curl:
curl 'https://cloud.typesense.org/clusters/ocpdr54qif7a3tb0p/api/multi_search' \
  -H 'authority: cloud.typesense.org' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'cache-control: no-cache' \
  -H 'content-type: text/plain' \
  -H 'cookie: _gcl_aw=GCL.1671042832.CjwKCAiAheacBhB8EiwAItVO23gnt1Leqk8-8BYwLubtO9k8e2FfBfyTot8gfc9tXYWXeegNH8Pf_RoCgaUQAvD_BwE; _gcl_au=1.1.2056632186.1671042832; _gac_UA-116415641-1=1.1671042832.CjwKCAiAheacBhB8EiwAItVO23gnt1Leqk8-8BYwLubtO9k8e2FfBfyTot8gfc9tXYWXeegNH8Pf_RoCgaUQAvD_BwE; __stripe_mid=67095200-c2fc-4b42-889f-8ac824a73822c95db9; _gac_UA-116415641-2=1.1671042832.CjwKCAiAheacBhB8EiwAItVO23gnt1Leqk8-8BYwLubtO9k8e2FfBfyTot8gfc9tXYWXeegNH8Pf_RoCgaUQAvD_BwE; _ga_XTFPJRM8H9=GS1.1.1673438604.3.0.1673438604.0.0.0; _ga=GA1.2.853002192.1671042832; _gid=GA1.2.634655676.1678705359; _dc_gtm_UA-116415641-1=1; __stripe_sid=268480af-be78-45ce-b9d4-c17b80f5cc1eea59c6; _typesense_cloud_app_session=ERv8bWisgxuGfRaMHP%2FIjcXxj753nfkzx%2F0gY3eRs5zbw3eZL8e8DEm%2BpR5pwRxAKxwNPg9611knkD5JAYOKfOcaWH6N5EwjELCFtmf2e1tWYz8goYHna%2FMA1to%2FgTaPnEbal%2BC40i81sGiiKiQorGiKGdUGB0C0sLfYEiI2w1HiCzuPJL4AWmZMYTTIv32pJlADdyu9OY0txz28jDUk41Ac2Z6GWrTj%2FHDy9jFpEPXeGz4x1uA6pfFwdrnJ4c0qNT3wH41%2B8%2FnQ%2BonBVDd%2FfF%2BjbySRHprCttBHh%2BMVMOk87EW7UouFRfkF9yK8nG8akO7wPlhiUSlhc8uyz8cPjEx24jVvQgvq%2B5YbEVyMR6VjfDrc6v8V2G9fpVhmUuOYvnvIww8FafGpBbkJbiQfjzE1CocLXx73wiP%2BPNfL%2Bw5s%2BqkeFrdOyxXd2RkPKDzQs%2F1nVO4y6AD1Ps47ZWTxqQ6IXQHB17tqtbKPFhBL254h%2BM%2Fpv8zKXkBu0LGShzYTxoFwnMOQLKIBptSFzVbYHyiTB8EhGkGjX0a2bpLkKq9m7qBLNRK4YOENw0EbaoVZMDBA4QlMm9I%3D--ndZv%2FxfNntvyU3R%2B--hm%2FIrIn7YLqDjuw58R0%2B3Q%3D%3D' \
  -H 'origin: https://cloud.typesense.org' \
  -H 'pragma: no-cache' \
  -H 'referer: https://cloud.typesense.org/clusters/ocpdr54qif7a3tb0p/collections/sources/documents/search' \
  -H 'sec-ch-ua: "Google Chrome";v="111", "Not(A:Brand";v="8", "Chromium";v="111"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36' \
  -H 'x-csrf-token: kAR1Yt8jPXq1PyOwGH6cPu7EL80Qpz7gUOKDHveJrWgtSy-gD1WMDuacdFI06KPdE7xgVq-PSAEzfw37vpHWQA' \
  -H 'x-typesense-api-key: iyplt0v67zms5uhdk3w8gxcae4oqbrfn' \
  --data-raw '{"searches":[{"preset":"Main preset like testapp","collection":"sources","q":"\"state of the art\"","page":1,"per_page":5}]}' \
  --compressed
k
Thanks, let me analyze and get back to you.
The output of that query contains the phrase matches correctly. Can you tell me what issue you are seeing, maybe I am missing something.
First result:
"general_documents": [
              {
                "matched_tokens": [
                  "state",
                  "of",
                  "the",
                  "art",
                  "of",
                  "the"
                ],
                "snippet": "based on the present <mark>state</mark> <mark>of</mark> <mark>the</mark> <mark>art</mark> <mark>of</mark> <mark>the</mark> Internet, which"
              }
            ],
We do highlight all tokens appearing in the query in the snippet, but the phrase "state of the art" does exist.
j
When we prepend the phrase search with a space (" state of the art"), the results are more relevant (and completely different) compared to "state of the art"
It’s not that we don’t see results that have matches, the problem is that the relevance order is not like we’d expect.
k
I think that's just a coincidence. I have to check how that additional space is being treated by the engine, maybe that influences a different ordering.
j
The results we see in the cloud using the preset (like in my last curl code), are the least relevant results if we don’t use a prepended space in the phrase search. If we add the space, the results are perfect.
k
With the space, 750 results are fetched, but without it, only 75.
The earlier problem with non-phrase matches being returned is fixed: this seems different and relevancy related. Will look again and get back to you.
The first result which you deem more relevant (id: 4580) when space is used, also occurs in the query without prefix space, but it occurs much later.
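To compare where a given document lands in the two queries, a helper along these lines could be run over each response. This is only a sketch: rankOf is a made-up name, the sample object is invented, and it assumes the usual shape of one entry in the multi_search results array (a found count plus hits that each carry a document with a string id).

```javascript
// Hypothetical helper: 1-based position of a document id within the hits of
// one search result, or -1 if the document is not among the returned hits.
function rankOf(result, id) {
  const index = result.hits.findIndex((hit) => hit.document.id === String(id));
  return index === -1 ? -1 : index + 1;
}

// Made-up sample shaped like one entry of a multi_search "results" array.
const sample = {
  found: 75,
  hits: [{ document: { id: "12" } }, { document: { id: "4580" } }],
};

console.log(sample.found, rankOf(sample, 4580));
```

The rank is only meaningful within the hits that were actually fetched, so per_page would have to be large enough to include the document in both responses.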
Would you be able to describe to me why you find that more relevant? Because all results are documents that contain the exact phrase. So what makes some more relevant than others? I can use this as a guide to see how we can fix the ordering.
j
I thought the order of the results should be dictated by query_by_weights in combination with _text_match:desc sorting.
But I will try to explain why we expect these results to rank high and why the other should be ranked low.
If a search query contains more than one word between quotation marks, we get unexpected results.
First example: there is a record (ID 4580) with an attribute post_title and the value EDPB Guidelines 4/2019. If you query this title without double quotes, we get good results, meaning this record is the first hit. If you use phrase search, this record shows up as the third result, even though post_title has query_by_weights: 115, which is the third-highest weight we’ve used. But if you phrase search with a space prepended, like " EDPB Guidelines 4/2019", the results are as we’d expect.
Second example: the same record (ID 4580) has an attribute key_topics with an (array) value containing “state of the art”. This attribute has query_by_weights: 127 (the highest weight we’ve used, and only for this attribute). The same record also has an attribute called labels containing an array with the value “state of the art”; this attribute has query_by_weights: 120. If you phrase search for "state of the art", this record ends up somewhere around ranking position 50. Again, if you phrase search with a space prepended, like " state of the art", the results are as we’d expect.
k
Thanks for the detailed examples. I've been looking into this behaviour myself and I see that there are cases where the weight is being ignored in phrase search that is having an impact here. With a padded space, the search query is no longer being searched via the phrase search code path so that's why the weights work. I'm working on a fix. I'll keep you posted.
j
Thanks!
Hi Kishore, not to rush you but to manage expectations here on my side. Do you have an ETA for the phrase search fix?
k
I'm working on incorporating the weights properly into phrase search. It will take a few days to implement and then thoroughly test. So I should be able to get you a patched build by early next week.
j
👌
k
👋 I have a build with a patch. Do you want to test it out on a dev/staging environment first?
j
Hi!
You can roll it out to the current environment! Or was it already rolled out because we see that things changed for the better 👌🏽
k
Not yet, I can roll it out now. Since your instance is not a HA instance there will be a downtime. So let me know if you want to do it at a specific time.
j
Hi, please roll out asap 👌🏽
k
It's done.
j
Yes this looks perfect, I’m going to do some more testing but it seems like it’s working!
k
Happy to hear!
j
Hi
Not sure if this is related, but if we add any symbols in the searchbar then our document titles come back as ‘Undefined’.
Like this, all results do show the correct date and they correctly link to the documents.
If you start searching using double quote
k
Can you share the request being made for this query like before?
Ignore, I think we have to handle this. I will get back to you.
j
Thanks
k
What's happening here is that the special characters are removed from the query string unless they are explicitly allowed via the symbols_to_index configuration. This results in an empty query string, which is treated as a * wildcard search. Those "undefined" values are showing up because we don't return highlights for wildcard searches since the query is essentially empty / catch-all.
j
In general, when you process the response from Typesense to render the UI, you want to first check if a highlight exists for that field inside the highlight key; if it doesn’t, then fall back to the field inside the document key…
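A minimal sketch of that fallback, assuming each hit carries a highlight object keyed by field name with a snippet string, plus the raw document. displayValue is a made-up helper name and the sample hits are invented; array fields, whose highlights come back as a list, are not handled here and simply fall through to the document value.

```javascript
// Hypothetical helper: prefer the highlighted snippet for a field and fall
// back to the raw document value when no highlight was returned
// (for example on a wildcard search).
function displayValue(hit, field) {
  const highlighted = hit.highlight?.[field]?.snippet;
  return highlighted ?? hit.document[field];
}

// Made-up hits: one from a wildcard search, one with a highlight.
const wildcardHit = {
  document: { post_title: "EDPB Guidelines 4/2019" },
  highlight: {},
};
const highlightedHit = {
  document: { post_title: "EDPB Guidelines 4/2019" },
  highlight: { post_title: { snippet: "<mark>EDPB</mark> Guidelines 4/2019" } },
};

console.log(displayValue(wildcardHit, "post_title"));
console.log(displayValue(highlightedHit, "post_title"));
```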
j
Thanks for your reply. Instead of showing all results on a wildcard search, how can I prevent a wildcard search altogether? I’m using instantsearch.js
j
j
Thanks Jason, but I don’t understand how this part (helper.state.query === '') is related to wildcard searches; it just hides something if the searchbox is empty. A user can still type in special characters and we get the ‘Undefined’ results. I have played around with this helper.state.query before, but it seems only useful in specific cases when you only want to listen to the state of the searchbox. This helper does not account for the use of facets/checkbox-filter states. Anyway, that is something not related to Typesense
k
Typesense treats an empty q as a wildcard query as well. If there are only special characters in q then those are removed by tokenizer and we again end up with empty q string.
j
Yes
But how can I prevent ‘wildcard’ results? If I type any special character, I see that character as the value for q in the payload. So q is not empty when it’s sent to Typesense, right? Typesense will interpret this as an empty q string, but it will return all results from the index… I want to prevent this last thing from happening.
k
I regret allowing empty q to be treated as a wildcard 😞 The work around is to mimic Typesense behavior client side. Check if q contains all symbols in addition to checking if it's empty.
j
Yes, that’s what I’m looking for (a client-side solution). It seems that Jason’s example (the songs-search demo) is doing this, but I can’t figure out what code is filtering these special characters.
If I search for & for example, nothing happens -> no payload… what part of the code is preventing this from happening?
k
How about just checking /^[^a-zA-Z0-9]+$/.test(helper.state.query) to check whether the query string contains only non-alphanumeric characters? That can be added along with the empty string check here: https://github.com/typesense/showcase-songs-search/blob/e7ad97ce4e09191743abd727c2dfc949811bbcd6/src/app.js#L176
j
Thanks Kishore, going to give it a try!
k
Actually the above will not allow a string with both special characters and alphanumerics, like foo? bar. This will work:
/[a-zA-Z0-9]/.test(helper.state.query)
It will return true if at least one alphanumeric character appears in the query string, which is what we want here.
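Wired into an instantsearch.js searchFunction, the check might look roughly like this. It is a sketch under assumptions: hasSearchableText is a made-up name, and the mock helper below only imitates the two members used here (state.query and search()) so the snippet runs on its own.

```javascript
// Hypothetical helper: true when the query has at least one ASCII letter or
// digit, i.e. something would survive once the symbols are stripped.
function hasSearchableText(query) {
  return /[a-zA-Z0-9]/.test(query);
}

// Shaped like an instantsearch.js searchFunction: only trigger the search
// when the query would not collapse into a wildcard search.
function searchFunction(helper) {
  if (hasSearchableText(helper.state.query)) {
    helper.search();
  }
}

// Mock helper for demonstration only; the real one comes from instantsearch.js.
function mockHelper(query) {
  return { state: { query }, searches: 0, search() { this.searches += 1; } };
}

for (const q of ["", "&", "foo? bar"]) {
  const helper = mockHelper(q);
  searchFunction(helper);
  console.log(JSON.stringify(q), helper.searches);
}
```

Two caveats with this sketch: it also blocks searches that have facet refinements but an empty query, and the ASCII-only character class rejects queries made up purely of non-ASCII letters, so both may need adjusting for a real app.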