#community-help

Issues with Repeated Words and Hyphen Queries in Typesense API

TLDR JinW discusses issues with repeated word queries and hyphen-containing queries in Typesense. Kishore Nallan offers possible solutions. During the discussion, Mr seeks advice on token_separators and how to send custom headers. Issues remain with repeated word queries.

Nov 01, 2021 (23 months ago)
JinW
06:17 AM
I’m having an issue with exact match, mostly for repeated words, e.g. https://songs-search.typesense.org/?songs_1630520530850%5Bquery%5D=boom%20boom “Boom Boom” should be first imo. What’s the best way to deal with this?
Kishore Nallan
06:18 AM
Have you tried setting prioritize_exact_match=true?
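For reference, a minimal sketch of how this parameter might be passed on a search request. The host, the collection name `songs`, and the field name `title` are assumptions for illustration and do not come from the thread; the snippet only builds the URL and does not send it.

```python
from urllib.parse import urlencode

# Assumed values for illustration only: the host, collection name, and
# field name are hypothetical and not taken from the thread.
TYPESENSE_HOST = "http://localhost:8108"
COLLECTION = "songs"


def build_search_url(query: str, query_by: str = "title") -> str:
    """Build a Typesense search URL with prioritize_exact_match enabled."""
    params = {
        "q": query,
        "query_by": query_by,
        "prioritize_exact_match": "true",
    }
    return (
        f"{TYPESENSE_HOST}/collections/{COLLECTION}/documents/search"
        f"?{urlencode(params)}"
    )


url = build_search_url("boom boom")
print(url)
```

The request itself would also need the `X-TYPESENSE-API-KEY` header.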
JinW
06:19 AM
I thought prioritize_exact_match is true by default based on the doc.
Kishore Nallan
06:21 AM
For some demos we don't set it to true, as it makes the results very monotonous for simple queries like "pizza" if your data set has multiple titles with just that word. I am not sure what configuration songs uses. It might very well be an issue with repeated tokens, but wanted to mention that possibility.
JinW
06:28 AM
Yea it is set to true, it seems to ignore the repeated words
Kishore Nallan
06:29 AM
Got it, thanks for confirming. Can you please create a quick issue on GitHub for this? I am actually working on some changes in this area so I can fix it as part of that for the next release!
JinW
06:33 AM
Since I’ve got your attention here: how do I deal with a search query that contains a hyphen? “One Two-Three” should return “One Two Three” and vice versa. I could fix this by removing the hyphen on my end, but that’s not exactly ideal.
Kishore Nallan
06:34 AM
The latest 0.22 RC builds have a configuration to specify custom characters as separators, which in this case would be the hyphen.
JinW
06:34 AM
Thank you
Kishore Nallan
06:35 AM
Check token_separators
Nov 03, 2021 (23 months ago)
Mr
05:53 AM
Where do I have to add token_separators? When I create the collection, or somewhere else?
Kishore Nallan
06:12 AM
Yes, during collection creation.
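As an illustration, a collection schema with token_separators might look like the sketch below. The collection name `songs` and the `title` field are hypothetical; only token_separators and the hyphen come from the thread, and per the discussion this requires a 0.22 RC build or later.

```python
import json

# Hypothetical collection and field names; only token_separators and the
# hyphen come from the thread.
schema = {
    "name": "songs",
    "fields": [{"name": "title", "type": "string"}],
    # Treat "-" as a separator so "Two-Three" is split into "Two" and "Three".
    "token_separators": ["-"],
}

# JSON body for POST /collections (sent with the X-TYPESENSE-API-KEY header).
payload = json.dumps(schema)
print(payload)
```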
Mr
10:03 AM
Kishore Nallan when I add token_separators, I can't find it in the response type :(
Kishore Nallan
10:03 AM
What version of Typesense are you using? It's available only in recent 0.22 RC builds.
Mr
10:10 AM
I use version 0.21. Does that mean I have to update to 0.22?
Kishore Nallan
10:11 AM
Yes, correct. 0.22 is not out yet, but we publish pre-release builds. These are pretty stable now and can be used.
Mr
10:11 AM
OK, thanks a lot! 🙌
Mr
10:19 AM
Is there a Docker image available for v0.22.0? How can I test it?
Kishore Nallan
10:31 AM
Yes it is available.
Kishore Nallan
10:32 AM
Check the 0.22.0.rcs22 Docker image.
Mr
10:46 AM
OK, thanks
Mr
11:23 AM
Kishore Nallan I have one more question 🙃. Is it possible to send headers with a query? I need to send a {Bearer Token} in the headers.
Kishore Nallan
11:23 AM
I don't follow you. Custom headers to Typesense?
Mr
11:25 AM
Kishore Nallan yes :)
Kishore Nallan
11:26 AM
And what should Typesense do with that?
Mr
06:42 PM
Kishore Nallan Sorry for the confusion here. The question sounded a bit silly without the context.
We built a proxy backend (only for read operations), and this proxy behaves as if it were the Typesense API, so that we can use the Algolia InstantSearch UI lib with no customization.
The only problem is that we want to pass a bearer token in a special header so that our backend proxy can process it and, if the request is allowed, forward it to Typesense.

We couldn't find a way to send a custom header through the InstantSearch API. Maybe it's better to direct this question to the Algolia community, but we wanted to check with you guys first. Maybe you've already faced this one before.
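The proxy-side check described here could be sketched roughly as follows. This covers only the decision inside the proxy, not how to attach the header from InstantSearch. It assumes the token arrives as `Authorization: Bearer <token>`, and the `ALLOWED_TOKENS` set and both function names are hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical allow-list; a real proxy would validate tokens against its
# own auth system instead of a hard-coded set.
ALLOWED_TOKENS = {"example-token"}


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token from an Authorization header, or None."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "authorization":
            scheme, _, token = value.partition(" ")
            if scheme.lower() == "bearer" and token.strip():
                return token.strip()
    return None


def should_forward(headers: Mapping[str, str]) -> bool:
    """Decide whether the proxy should forward the request to Typesense."""
    token = extract_bearer_token(headers)
    return token is not None and token in ALLOWED_TOKENS


print(should_forward({"Authorization": "Bearer example-token"}))
```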

Nov 04, 2021 (22 months ago)
JinW
07:13 AM
Kishore Nallan I also confirmed that the issue with repeated tokens is present in 0.22.0.rcs22 as well.
Dec 08, 2021 (21 months ago)
JinW
07:31 PM
Jason Kishore Nallan Do you know if anyone is looking into this one? Thank you.
Jason
08:33 PM
JinW Looks like some of the fixes in v0.22 have addressed this issue:
JinW
09:11 PM
Oh that’s weird, I can’t seem to get it to work on my end. Let me confirm it again. Is that in the latest released build or an RC build?
Jason
09:13 PM
That's in version 0.22, released publicly yesterday.
JinW
09:37 PM
Yea, it didn’t really fix it. For the queries merry merry or pasta pasta, “Merry Merry Christmas” or “Pasta Pasta” should be first in the results. The boom boom results are still not in the correct order either.
JinW
09:39 PM
Jason
09:40 PM
Oh well... Will take a closer look in the coming weeks
Dec 09, 2021 (21 months ago)
Kishore Nallan
05:17 AM
Can you please confirm that you're setting ?prioritize_exact_match=true?
JinW
05:54 AM
Kishore Nallan Yes, it is set to true.
Kishore Nallan
05:55 AM
Ok got it. Do you have a small dataset on which this is trivially reproducible? Like maybe a test set with 4-5 documents.
JinW
06:17 AM
Here is an example dataset I created. The query is mong mong, and “Mong Mong” never shows up first.
Kishore Nallan
06:17 AM
Thanks I will check
JinW
06:19 AM
I think it should at least show up in this order, since tokens are dropped from right to left.

{"id":"9", "title": "Mong Mong"}
{"id":"26", "title": "Mong Mong Racoon"}
{"id":"4", "title": "Mong Mong SATELLITES JAPAN"}
{"id":"18", "title": "Mong Mong Zi Zung"}
{"id":"7", "title": "Mong Mong Tapo & Raya"}
Dec 16, 2021 (21 months ago)
JinW
07:25 PM
Update: We temporarily solved this issue by storing the string “Mong Mong” without the space and changing queries with repeated words to match, e.g. “MongMong”. (Not ideal, but it works for now.)
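One way this kind of workaround could be sketched is below. The helper name `collapse_repeated_words` and its exact behavior are assumptions, not the code actually used; the same function would be applied both to the stored title and to the incoming query.

```python
def collapse_repeated_words(text: str) -> str:
    """Join runs of the same word (compared case-insensitively) into one token."""
    out = []
    prev = None
    for word in text.split():
        if prev is not None and word.lower() == prev.lower():
            # Same word as the previous one: glue it onto the last token.
            out[-1] += word
        else:
            out.append(word)
        prev = word
    return " ".join(out)


# Applied to a stored title and to a query:
print(collapse_repeated_words("Mong Mong Racoon"))
print(collapse_repeated_words("mong mong"))
```

Because both sides are transformed the same way, a query with a repeated word becomes a single token that can match the stored single token.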