I’m seeing some strange behaviour with typo correc...
# community-help
j
I’m seeing some strange behaviour with typo correction/prefix search. Haven’t quite nailed it down yet, but `max_candidates` affects it and I don’t understand how, so I’d like to get some clarification on what it does. In the documentation it says "Control the number of words that Typesense considers for typo and prefix searching." and I interpret that to refer to words in the search query, but then what I’m seeing doesn’t make sense, so maybe I misunderstood it! Posting an example of what I mean in comments
Basically I have documents like
{
  "title": "foobar",
  "description": "baz"
}
in my collection, and I try to search for `fooba baz`. With default `max_candidates=4` I get no results, with `max_candidates=1000` I get correct results. In my mind this parameter shouldn’t have any effect, but I must be misunderstanding something.
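For anyone following along, here is a rough sketch of the two requests being compared. It only builds the search URLs and sends nothing. The collection name `docs`, the `localhost:8108` host, and the `query_by` field list are assumptions for illustration and are not taken from the thread.

```python
from urllib.parse import urlencode


def search_url(max_candidates, host="http://localhost:8108", collection="docs"):
    # host and collection are placeholders (assumptions), not from the thread
    params = {
        "q": "fooba baz",
        # assumed: both fields of the example document are searched
        "query_by": "title,description",
        "max_candidates": max_candidates,
    }
    return f"{host}/collections/{collection}/documents/search?{urlencode(params)}"


default_url = search_url(4)      # reported to return no results
high_url = search_url(1000)      # reported to return the expected document
```

The two URLs differ only in `max_candidates`, which is what makes the difference in results surprising.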
k
What version are you trying this on?
j
Seeing the same behaviour on both `0.23.0` and `0.24.0.rc12`
k
What kind of results are you getting without setting a high max_candidates?
j
I get no results
k
I'll be happy to debug this issue if you can make the dataset (or any subset that exhibits the problem) available to me. You can DM me or email me.
j
Alright I tried to make something minimal but it’s inherently quite complicated… https://pastebin.com/TvLRZJ5G Some notes:
• It only seems to happen when the random strings contain special characters `#{}|`
• It only seems to happen when one of the query characters is a special character
• It only happens with `drop_tokens_threshold=0`
k
I will take a look and get back to you.
I looked into this. For the query `foobar o}` the document that's matching is `{'description': 'o}', 'title': 'foobars'}`.
When you have 2 tokens and each token can have several prefix/typo variations, the actual number of possible queries is a combination of each token's variations. The max_candidates parameter also governs the number of combinations checked in multi-word queries. So increasing max_candidates helps by not restricting the query too early.
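To make the combinatorics concrete, here is a toy model, not Typesense's actual algorithm. It assumes each token expands to a list of variations and that only the first `max_candidates` combinations get checked. The variation lists and their ordering are invented for illustration.

```python
from itertools import islice, product


def candidate_queries(variations_per_token, max_candidates):
    """Return at most max_candidates combinations, one variation per token."""
    return list(islice(product(*variations_per_token), max_candidates))


# Hypothetical variations for the two tokens of the query "foobar o}"
first_token = ["foobar", "foobars", "foobar1", "fooba", "foobaz"]
second_token = ["o}", "o}a", "o|", "o#", "o{"]

total = len(first_token) * len(second_token)  # every possible combination
capped = candidate_queries([first_token, second_token], 4)
uncapped = candidate_queries([first_token, second_token], 1000)

# The combination that corresponds to the matching document
matching = ("foobars", "o}")
```

In this toy ordering the matching pair is not among the first 4 combinations, so a cap of 4 misses it while a cap of 1000 reaches it. The real ordering inside Typesense may differ.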