#community-help

Issue with Typo Correction/Prefix Search and the Role of max_candidates

TLDR John noticed that search results changed depending on the max_candidates setting. Kishore Nallan clarified that for multi-word queries it also limits the number of token-variation combinations checked, so increasing max_candidates keeps the query from being restricted too early.

Solved
Jun 29, 2022 (18 months ago)
John
12:15 PM
I’m seeing some strange behaviour with typo correction/prefix search. I haven’t quite nailed it down yet, but max_candidates affects it and I don’t understand how, so I’d like some clarification on what it does. The documentation says
> Control the number of words that Typesense considers for typo and prefix searching.
and I interpret that to refer to words in the search query, but then what I’m seeing doesn’t make sense so maybe I misunderstood it!

Posting an example of what I mean in comments
John
12:16 PM
Basically I have documents like
{"title": "foobar", "description": "baz"}

in my collection, and I try to search for fooba baz. With the default max_candidates=4 I get no results; with max_candidates=1000 I get the correct results. In my mind this parameter shouldn’t have any effect, but I must be misunderstanding something.
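For reference, the two searches being compared might look roughly like the sketch below. It only builds the request URLs for Typesense's documents search endpoint. The host, the collection name docs, and the query_by value are assumptions made for illustration and are not taken from the thread.

```python
from urllib.parse import urlencode


def search_url(q, max_candidates, host="http://localhost:8108", collection="docs"):
    """Build a search URL; host and collection are hypothetical placeholders."""
    params = {
        "q": q,
        # Assumed: search both fields of the example document.
        "query_by": "title,description",
        "max_candidates": max_candidates,
    }
    return f"{host}/collections/{collection}/documents/search?{urlencode(params)}"


# Same query, differing only in max_candidates.
default_url = search_url("fooba baz", 4)
raised_url = search_url("fooba baz", 1000)
print(default_url)
print(raised_url)
```

Sending either URL to a real server would also need an API key header, which is left out here.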
Kishore Nallan
12:20 PM
What version are you trying this on?
John
12:23 PM
Seeing the same behaviour on both 0.23.0 and 0.24.0.rc12
Kishore Nallan
12:31 PM
What kind of results are you getting without setting a high max_candidates?
John
12:43 PM
I get no results
Kishore Nallan
12:44 PM
I'll be happy to debug this issue if you can make the dataset (or any subset that exhibits the problem) available to me. You can DM me or email me.
John
01:56 PM
Alright I tried to make something minimal but it’s inherently quite complicated…
https://pastebin.com/TvLRZJ5G

Some notes:
• It only seems to happen when the random strings contain special characters #{}|
• It only seems to happen when one of the query characters is a special character
• It only happens with drop_tokens_threshold=0
Kishore Nallan
03:07 PM
I will take a look and get back to you.

Jul 04, 2022 (18 months ago)
Kishore Nallan
01:56 PM
I looked into this. For the query foobar o} the document that's matching is {'description': 'o}', 'title': 'foobars'}

When you have 2 tokens and each token can have variations from prefix/typo combinations, the actual number of possible queries is the combination of each token's variations. The max_candidates parameter also governs the number of combinations checked in multi-word queries, so increasing max_candidates helps avoid restricting the query too early.
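As a rough illustration of that combinatorics, here is a toy model. It is not Typesense's actual search code: the variation lists, the enumeration order, and the helper candidate_queries are all hypothetical. It only shows how a cap on the number of combinations checked can cut the search off before the matching combination is reached.

```python
from itertools import islice, product


def candidate_queries(token_variations, max_candidates):
    """Return at most max_candidates combinations, one variation per token."""
    return list(islice(product(*token_variations), max_candidates))


# Hypothetical prefix/typo expansions for the two query tokens.
variations = [
    ["fooba", "foobaa", "foobab", "foobac", "foobar"],  # expansions of "fooba"
    ["baz", "bar", "bat"],                              # expansions of "baz"
]

# Every pairing of one variation per token is a possible query.
total = len(list(product(*variations)))

# The combination that would actually match the document.
wanted = ("foobar", "baz")

found_with_low_cap = wanted in candidate_queries(variations, 4)
found_with_high_cap = wanted in candidate_queries(variations, 1000)
print(total, found_with_low_cap, found_with_high_cap)
```

In this toy ordering the matching pair sits past the first 4 combinations, so a cap of 4 misses it while a cap of 1000 covers all 15.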
