Issue with Typo Correction/Prefix Search and the Role of max_candidates
TLDR John noticed that search results changed depending on the max_candidates setting, and Kishore Nallan clarified its role in multi-word queries: it also governs how many combinations of token variations are checked, so increasing max_candidates ensures the query isn't prematurely limited.
Jun 29, 2022 (18 months ago)
John
12:15 PM
`max_candidates` affects it, and I don't understand how, so I'd like some clarification on what it does. The documentation says:
> Control the number of words that Typesense considers for typo and prefix searching.
I interpret that as referring to words in the search query, but then what I'm seeing doesn't make sense, so maybe I misunderstood it!
Posting an example of what I mean in comments
John
12:16 PM
I have a document {"title": "foobar", "description": "baz"} in my collection, and I try to search for `fooba baz`. With the default `max_candidates=4` I get no results; with `max_candidates=1000` I get correct results. In my mind this parameter shouldn't have any effect, but I must be misunderstanding something.
Kishore Nallan
12:20 PM
John
12:23 PM
0.23.0 and 0.24.0.rc12
Kishore Nallan
12:31 PM
John
12:43 PM
Kishore Nallan
12:44 PM
John
01:56 PM
https://pastebin.com/TvLRZJ5G
Some notes:
• It only seems to happen when the random strings contain special characters (`#{}|`)
• It only seems to happen when one of the query characters is a special character
• It only happens with `drop_tokens_threshold=0`
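The comparison described in this thread can be sketched as two search requests that differ only in `max_candidates`. This is a minimal illustration, not the pastebin script: the collection name `items`, the `query_by` fields, the local URL, and the helper `build_search_url` are all assumptions. It only builds the URLs, so no server is needed to run it.

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, q, max_candidates, drop_tokens_threshold=0):
    """Build a Typesense search URL (hypothetical helper; sends nothing)."""
    params = {
        "q": q,
        "query_by": "title,description",  # assumed fields, taken from the example document
        "max_candidates": max_candidates,
        "drop_tokens_threshold": drop_tokens_threshold,
    }
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


# Same query, only max_candidates differs (4 is the default mentioned above).
default_url = build_search_url("http://localhost:8108", "items", "fooba baz", 4)
raised_url = build_search_url("http://localhost:8108", "items", "fooba baz", 1000)

print(default_url)
print(raised_url)
```

Keeping every other parameter fixed makes it easier to attribute any difference in results to `max_candidates` alone.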
Kishore Nallan
03:07 PM
Jul 04, 2022 (18 months ago)
Kishore Nallan
01:56 PM
For the query `foobar o}`, the document that's matching is {'description': 'o}', 'title': 'foobars'}.
When you have 2 tokens and each token can have several prefix/typo variations, the actual number of possible queries is the number of combinations of each token's variations. The `max_candidates` parameter also governs the number of combinations checked in multi-word queries, so increasing `max_candidates` helps avoid restricting the query too early.
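The effect described above can be illustrated with a toy model. This is not Typesense's actual implementation: the candidate lists, the matching rule, and the way the cap is applied are assumptions, chosen only to show how a small cap on combinations can stop the search before it reaches a combination that a larger cap would find.

```python
from itertools import islice, product


def search_combinations(token_candidates, index, max_combinations):
    """Try at most max_combinations combinations of per-token candidates.

    Toy model only: returns the first combination whose words all appear
    in some document of `index`, or None if the cap is reached first.
    """
    for combo in islice(product(*token_candidates), max_combinations):
        for doc_words in index:
            if all(word in doc_words for word in combo):
                return combo
    return None


# Hypothetical candidate expansions for the query "fooba baz".
token_candidates = [
    ["fooba", "foobaa", "foobab", "foobar"],  # assumed prefix/typo variations of "fooba"
    ["baz", "bar"],                            # assumed variations of "baz"
]
index = [{"foobar", "baz"}]  # words of the single example document

print(search_combinations(token_candidates, index, max_combinations=4))
print(search_combinations(token_candidates, index, max_combinations=1000))
```

With 4 variations for the first token and 2 for the second there are 8 combinations in total. In this sketch the matching pair is the seventh one tried, so a cap of 4 returns nothing while a cap of 1000 finds it.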