#community-help

Resolving Typesense Query Confusion

TLDR Nelson didn't understand why a query for "hong kong" returned "singapore". Jason recommended setting drop_tokens_threshold to 0, then explained how Typesense drops query tokens to find results when exact matches aren't available. Kishore Nallan later traced the match to typo tolerance, and both Jason and Kishore Nallan mentioned changes in the upcoming version to address it.

Sep 01, 2021 (29 months ago)
Nelson
07:51 PM
Hi, friends of Typesense. I need help. I sent this query to my Typesense index and I don't understand the result. The query is "hong kong" and the result contains "singapore,singapore". Please help me.
Jason
07:53 PM
That's quite a drastic edit distance to be considered a typo. Do you have synonyms set for hong kong?
Nelson
07:54 PM
No, I don't have any synonyms created.
Jason
07:55 PM
Could you show the contents of the search function?
Nelson
07:56 PM
Nelson
07:59 PM
data in index
Jason
07:59 PM
Could you add drop_tokens_threshold: 0 to the search parameters and see what the result is after that?
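For readers following along, here is a minimal sketch of what such a search parameters object might look like in Python. Only the `q` value and `drop_tokens_threshold` come from the thread; the `query_by` field name and the collection name in the comment are assumptions, since Nelson's actual search function was not preserved.

```python
# Sketch of search parameters with token dropping disabled.
# "q" and "drop_tokens_threshold" come from the thread; the "query_by"
# field name is a hypothetical placeholder.
search_parameters = {
    "q": "hong kong",
    "query_by": "city",          # assumed field name, not from the thread
    "drop_tokens_threshold": 0,  # 0 disables dropping of query tokens
}

# With the official Python client, these would typically be passed as
# (collection name "places" is hypothetical):
# client.collections["places"].documents.search(search_parameters)
```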
Nelson
08:00 PM
ok
Nelson
08:03 PM
ok it worked. Please, can you explain what happened?
Jason
08:04 PM
Let me know if this makes sense: if fewer than drop_tokens_threshold results are found for a specific query, Typesense will attempt to drop tokens (words) from the query until enough results are found. Tokens that have the fewest individual hits are dropped first. Set drop_tokens_threshold to 0 to disable dropping of tokens.
Jason
08:05 PM
So basically, in your example, "hong" and "kong" were individually dropped, and the query ended up matching a seemingly unrelated term.
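The token-dropping behavior Jason describes can be illustrated with a toy, in-memory simulation. This is only a sketch of the idea under simplifying assumptions (exact whole-word matching, no typo tolerance, no ranking); it is not Typesense's actual implementation, and the function names and sample documents are made up for illustration.

```python
def search_all_tokens(docs, tokens):
    """Return the documents that contain every token as a whole word."""
    return [d for d in docs if all(t in d.lower().split() for t in tokens)]


def search_with_token_dropping(docs, query, drop_tokens_threshold):
    """Toy model: drop the weakest token until enough results are found."""
    tokens = query.lower().split()
    results = search_all_tokens(docs, tokens)
    # A threshold of 0 disables dropping entirely.
    while (
        drop_tokens_threshold > 0
        and len(results) < drop_tokens_threshold
        and len(tokens) > 1
    ):
        # The token with the fewest individual hits is dropped first.
        hits = {t: len(search_all_tokens(docs, [t])) for t in tokens}
        weakest = min(tokens, key=lambda t: hits[t])
        tokens.remove(weakest)
        results = search_all_tokens(docs, tokens)
    return results


docs = ["new york city", "york minster", "new delhi"]  # made-up sample data
print(search_with_token_dropping(docs, "new york pizza", 1))
print(search_with_token_dropping(docs, "new york pizza", 0))
```

With a threshold of 1, "pizza" has no hits and is dropped, so the query falls back to "new york". With a threshold of 0, nothing is dropped and the query returns no results.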
Jason
08:05 PM
In a future version, we plan to reduce the sensitivity of this feature by default, so it doesn't pick up results like this.
Jason
08:06 PM
The original goal of this feature was to make sure that a search query always returns some results that are somewhat related or close to the search query when no exact matches are found, but over time we've seen this actually cause confusion.
Nelson
08:06 PM
aaaa ok ok
Nelson
08:06 PM
Thanks, Jason

Nelson
08:07 PM
please add this param to documentation
Jason
08:07 PM
It is documented here already (in the table): https://typesense.org/docs/0.21.0/api/documents.html#arguments
Nelson
08:08 PM
OK, thanks for your help

Nelson
08:08 PM
:thumbsup:
Sep 02, 2021 (29 months ago)
Kishore Nallan
09:06 AM
I just checked what's happening here. "sing" (the prefix of "singapore") is getting prefix-matched to "kong" because the two are within 2 typos of each other. In the next version of Typesense we've made the default typo correction less "eager" on such small terms to reduce false positives like this.
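The "2 typos" figure can be checked with a small edit-distance function. This sketch assumes plain Levenshtein distance (insertions, deletions, substitutions); the thread does not say which distance metric Typesense uses internally.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(edit_distance("sing", "kong"))  # 2: s -> k, i -> o
print(edit_distance("hong", "kong"))  # 1: h -> k
```

Two substitutions out of four characters is a large fraction of a short word, which is why the match looks so unrelated.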

bnfd
11:12 AM
Is there an ETA for next version?
Kishore Nallan
11:13 AM
We don't have a fixed release schedule. We release when we feel that the build is stable and we have covered enough ground. I think we are about 3 weeks away from the next release, given the current state of things.

Kishore Nallan
11:13 AM
We do have pre-release builds available for testing and they are generally stable as they are produced only after an exhaustive internal test suite approves them.
CaptainCodeman
08:30 PM
would it make sense for the number of typos to factor in the word length?
Jason
08:31 PM
Yup, that's exactly what we're doing in the next version to reduce "eagerness".
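One way to picture typo tolerance that factors in word length is sketched below. The function name and the length thresholds (4 and 7) are illustrative assumptions, not values confirmed in this thread.

```python
def max_typos_for(word, one_typo_min_len=4, two_typo_min_len=7):
    """Return how many typos to tolerate for a word of this length.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if len(word) >= two_typo_min_len:
        return 2
    if len(word) >= one_typo_min_len:
        return 1
    return 0


for word in ["hk", "kong", "singapore"]:
    print(word, max_typos_for(word))
```

Under these assumed thresholds, a four-letter term such as "kong" tolerates only one typo, so a candidate that is 2 typos away (as Kishore Nallan described for "sing") would no longer match.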
