I’m having an issue with exact match, mostly for repeated words - Typesense #community-help

JinW

11/01/2021, 6:17 AM

I’m having an issue with exact match, mostly for repeated words, e.g. https://songs-search.typesense.org/?songs_1630520530850%5Bquery%5D=boom%20boom “Boom Boom” should be first, imo. What’s the best way to deal with this?

Kishore Nallan

11/01/2021, 6:18 AM

Have you tried setting prioritize_exact_match=true?
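
For reference, a minimal sketch of where this parameter sits in a search request. The "songs" collection and the "title" field are assumptions made for illustration; q, query_by, and prioritize_exact_match are Typesense search parameters.

```python
from urllib.parse import urlencode

# Assumed names for illustration: a "songs" collection searched on a "title" field.
COLLECTION = "songs"


def build_search_path(query, exact_first=True):
    """Return the path and query string for a Typesense search request."""
    params = {
        "q": query,
        "query_by": "title",
        # Ask Typesense to rank exact field matches above partial matches.
        "prioritize_exact_match": "true" if exact_first else "false",
    }
    return f"/collections/{COLLECTION}/documents/search?{urlencode(params)}"


search_path = build_search_path("boom boom")
print(search_path)
```

The path would be appended to the server's base URL and sent with the usual API key header.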

JinW

11/01/2021, 6:19 AM

I thought prioritize_exact_match is true by default based on the doc.

Kishore Nallan

11/01/2021, 6:21 AM

For some demos we don't set it to true, as it makes the results very monotonous for simple queries like "pizza" if your data set has multiple titles with just that word. I am not sure what configuration songs uses. It might very well be an issue with repeated tokens, but wanted to mention that possibility.

JinW

11/01/2021, 6:28 AM

Yea it is set to true, it seems to ignore the repeated words

Kishore Nallan

11/01/2021, 6:29 AM

Got it, thanks for confirming. Can you please create a quick issue on GitHub for this? I am actually working on some changes in this area so I can fix it as part of that for the next release!

👍 1

JinW

11/01/2021, 6:33 AM

Since I've got your attention here: how do I deal with a search query that contains a hyphen? “One Two-Three” should return “One Two Three” and vice versa. I could fix this by removing the hyphen on my end, but that’s not exactly ideal.

Kishore Nallan

11/01/2021, 6:34 AM

The latest 0.22 rc builds have a configuration to specify custom characters as separators, which in this case would be the hyphen.

JinW

11/01/2021, 6:34 AM

Thank you

Kishore Nallan

11/01/2021, 6:35 AM

https://github.com/typesense/typesense-website/blob/6c7ec794c816c0695879e2b7688dca04fa4fb0d4/docs-site/content/0.22.0/api/collections.md#sche[…]ents

Kishore Nallan

11/01/2021, 6:35 AM

Check token_separators.

🙌 3
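
To illustrate the idea behind separator characters, here is a small stand-in written in plain Python. It is not Typesense's actual tokenizer; the function name and the lowercasing step are assumptions. It only shows why treating the hyphen as a separator makes the two spellings produce the same tokens.

```python
import re


def split_tokens(text, token_separators):
    """Split text on whitespace and on each configured separator character."""
    # Build a character class from the separators, escaping regex metacharacters.
    separator_class = "[" + re.escape("".join(token_separators)) + r"\s]+"
    return [token.lower() for token in re.split(separator_class, text) if token]


hyphenated = split_tokens("One Two-Three", ["-"])
spaced = split_tokens("One Two Three", ["-"])
print(hyphenated == spaced)
```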

Mr Fun

11/03/2021, 5:53 AM

Where do I have to add token_separators? When I create the collection, or somewhere else?

Kishore Nallan

11/03/2021, 6:12 AM

Yes during collection creation.
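
A sketch of what such a schema might look like. The collection name and field are hypothetical; only token_separators comes from this thread, and it needs a 0.22 build.

```python
import json

# Hypothetical collection for illustration; only token_separators is from the thread.
schema = {
    "name": "songs",
    "fields": [{"name": "title", "type": "string"}],
    # Treat the hyphen as a token separator (requires a 0.22 build).
    "token_separators": ["-"],
}

# With the official Python client this dict would be passed to
# client.collections.create(schema); here it is only serialized for inspection.
payload = json.dumps(schema)
print(payload)
```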

Mr Fun

11/03/2021, 10:03 AM

@Kishore Nallan when I add token_separators, I can't find it in the response type (

Kishore Nallan

11/03/2021, 10:03 AM

What version of Typesense are you using? It's available only in recent 0.22 RC builds.

Mr Fun

11/03/2021, 10:10 AM

I use version 0.21, so that means I have to update to 0.22?

Kishore Nallan

11/03/2021, 10:11 AM

Yes, correct. 0.22 is not out yet, but we publish pre-release builds. These are pretty stable now and can be used.

🙌 1

Mr Fun

11/03/2021, 10:11 AM

Ok, thanks a lot! 🙌

Mr Fun

11/03/2021, 10:19 AM

Is there a Docker image available for v0.22.0? How can I test it?

Kishore Nallan

11/03/2021, 10:31 AM

Yes it is available.

Kishore Nallan

11/03/2021, 10:32 AM

Check the 0.22.0.rcs22 Docker image.

🙌 1
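
One way to try that build locally, sketched as a command assembled in Python. The typesense/typesense repository name, the host data directory, and the xyz API key are assumptions or placeholders; only the 0.22.0.rcs22 tag comes from the message above.

```python
import shlex

# Assumptions: the image is published as typesense/typesense, and the API key
# and host data directory below are placeholders to replace.
IMAGE = "typesense/typesense:0.22.0.rcs22"

docker_command = [
    "docker", "run",
    "-p", "8108:8108",                   # expose Typesense's default API port
    "-v", "/tmp/typesense-data:/data",   # keep data outside the container
    IMAGE,
    "--data-dir", "/data",
    "--api-key=xyz",
]

# Print a copy-pasteable shell command instead of running it.
print(shlex.join(docker_command))
```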

Mr Fun

11/03/2021, 10:46 AM

Ok, thanks

Mr Fun

11/03/2021, 11:23 AM

@Kishore Nallan I have one more question 🙃. Is it possible to send headers with a query? I need to send a {Bearer Token} in the headers.

Kishore Nallan

11/03/2021, 11:23 AM

I don't follow you. Custom headers to Typesense?

Mr Fun

11/03/2021, 11:25 AM

@Kishore Nallan yes )

Kishore Nallan

11/03/2021, 11:26 AM

And what should Typesense do with that?

Mr Fun

11/03/2021, 6:42 PM

@Kishore Nallan

Sorry for the confusion here. The question sounded a bit silly without the context.
We built a proxy backend (only for read operations), and this proxy behaves as if it were the Typesense API, so that we can use the Algolia InstantSearch UI lib with no customization.
The only problem is that we want to pass a bearer token in a special header so our backend proxy can process it and, if it's allowed, forward the request to Typesense.

We couldn't find a way to send a custom header through the InstantSearch API. Maybe it's better to direct this question to the Algolia community, but we wanted to check with you guys first. Maybe you've already faced this one before.
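
The proxy-side check described here could be sketched as below. It assumes the token arrives in a standard Authorization header with the Bearer scheme and is compared against a set of allowed tokens; the function names are hypothetical, and a "special" custom header would only change the lookup key.

```python
def extract_bearer_token(headers):
    """Return the bearer token from an Authorization header, or None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def should_forward(headers, allowed_tokens):
    """Decide whether the proxy may forward the request to Typesense."""
    token = extract_bearer_token(headers)
    return token is not None and token in allowed_tokens


print(should_forward({"Authorization": "Bearer abc"}, {"abc"}))
```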

JinW

11/04/2021, 7:13 AM

@Kishore Nallan I also confirmed that the repeated-token issue is still present in 0.22.0.rcs22.

JinW

12/08/2021, 7:31 PM

@Jason Bosco @Kishore Nallan Do you know if anyone is looking into this one? Thank you

JinW

12/08/2021, 7:32 PM

https://github.com/typesense/typesense/issues/427

Jason Bosco

12/08/2021, 8:33 PM

@JinW Looks like some of the fixes in v0.22 have addressed this issue:

JinW

12/08/2021, 9:11 PM

Oh, that’s weird. I can’t seem to get it to work on my end. Let me confirm it again. Is that in the latest released build or an rcs build?

Jason Bosco

12/08/2021, 9:13 PM

That's in 0.22, the version released publicly yesterday.

JinW

12/08/2021, 9:37 PM

Yea, it didn’t really fix it. For the queries merry merry and pasta pasta, “Merry Merry Christmas” or “Pasta Pasta” should be first in the results. The boom boom results are still not in the correct exact order.

Jason Bosco

12/08/2021, 9:40 PM

Oh well... Will take a closer look in the coming weeks

❤️ 1

Kishore Nallan

12/09/2021, 5:17 AM

Can you please confirm that you're setting the ?prioritize_exact_match=true query parameter?

JinW

12/09/2021, 5:54 AM

@Kishore Nallan Yes, it is set to true.

Kishore Nallan

12/09/2021, 5:55 AM

Ok, got it. Do you have a small dataset on which this is trivially reproducible? Like maybe a test set with 4-5 documents.

JinW

12/09/2021, 6:17 AM

Here is an example dataset I created. The query is mong mong, and the exact match never shows up first.

mong.jsonl

Kishore Nallan

12/09/2021, 6:17 AM

Thanks I will check

❤️ 1

JinW

12/09/2021, 6:19 AM

I think the results should at least show up in this order, since tokens are dropped from right to left.

{"id":"9", "title": "Mong Mong"}
{"id":"26", "title": "Mong Mong Racoon"}
{"id":"4", "title": "Mong Mong SATELLITES JAPAN"}
{"id":"18", "title": "Mong Mong Zi Zung"}
{"id":"7", "title": "Mong Mong Tapo & Raya"}
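
The expectation above can be written down as a small check to run against whatever titles a search returns. The helper name and the normalization (case and whitespace insensitive) are assumptions for illustration; the documents are the five quoted above.

```python
import json

# The five documents quoted above, in the order expected for the query "mong mong".
expected_jsonl = """\
{"id":"9", "title": "Mong Mong"}
{"id":"26", "title": "Mong Mong Racoon"}
{"id":"4", "title": "Mong Mong SATELLITES JAPAN"}
{"id":"18", "title": "Mong Mong Zi Zung"}
{"id":"7", "title": "Mong Mong Tapo & Raya"}
"""
expected_titles = [json.loads(line)["title"] for line in expected_jsonl.splitlines()]


def exact_match_is_first(result_titles, query):
    """True if the top result equals the query, ignoring case and extra spaces."""
    if not result_titles:
        return False

    def normalize(text):
        return " ".join(text.lower().split())

    return normalize(result_titles[0]) == normalize(query)


print(exact_match_is_first(expected_titles, "mong mong"))
```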

JinW

12/16/2021, 7:25 PM

Update: We temporarily solved this issue by storing the string “Mong Mong” without the space and changing queries with repeated words the same way, e.g. “MongMong”. (Not ideal, but it works for now.)
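
A rough sketch of that workaround, assuming it means joining adjacent repeated words (compared case-insensitively) in both the indexed title and the incoming query. The function name is hypothetical and this is one reading of the message, not the exact code used.

```python
def collapse_repeated_words(text):
    """Join runs of adjacent identical words, e.g. "Mong Mong" -> "MongMong"."""
    groups = []
    for word in text.split():
        # Extend the current run if this word repeats the previous one.
        if groups and groups[-1][-1].lower() == word.lower():
            groups[-1].append(word)
        else:
            groups.append([word])
    return " ".join("".join(group) for group in groups)


# Apply the same transformation when indexing and when querying.
indexed_title = collapse_repeated_words("Mong Mong")
search_query = collapse_repeated_words("mong mong")
print(indexed_title, search_query)
```

Titles without repeated words pass through unchanged, so the transformation can be applied to every document and query.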
