#community-help

Understanding Search Result Variations with Filtering Parameters

TLDR SamHendley saw inconsistent document counts when adding more filter parameters to a prefix search. Jason explained that, for performance reasons, Typesense only considers the top max_candidates candidate words for a short prefix. Increasing max_candidates or enabling exhaustive_search returns all matches.

Nov 17, 2022
SamHendley
09:05 PM
New question. I am trying to determine why adding more filter parameters can sometimes drastically change the number of documents I get back for a term. I had assumed it was likely ‘typo’ correction of some sort, but if so I can’t find the correct set of parameters to disable it. The words that trigger this are short (3-letter) prefixes that match lots of slightly longer words in my data set.
The way this shows up: if I search for ‘bad’ I get 30 results (the limit is 50, so this would appear to be an ‘exhaustive listing’). These documents are spread across 5 document types (reported as facets). If I then filter to any one of those document types, I will sometimes get a much larger document count. The correct count in this case would have been 42. It’s hard to analyze the extra entries, but they look like mostly cases of “harder to find” values in the middle of a string.
None of these parameters made a difference:
NumTypos:            operutil.NewLit(0),
DropTokensThreshold: operutil.NewLit(0),
SplitJoinTokens:     operutil.NewLit("off"),
TypoTokensThreshold: operutil.NewLit(0),
MaxCandidates:       operutil.NewLit(0),

Eventually I figured out that it is related to ‘Prefix’. If I disable prefix matching I get stable results. This isn’t a problem per se, but it’s not obvious why it is “giving up” so early and not finding all of the documents that match. Any thoughts? If nothing else, I’d recommend updating the documentation to indicate “prefix searching may return incomplete answers in X or Y cases”.
Jason
09:36 PM
For short prefixes, given the number of variations possible, Typesense only picks the top max_candidates unique candidates, for performance reasons. So if you set that value to, say, 10K, or set exhaustive_search: true, you should see all values.
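A minimal sketch of what Jason's suggestion could look like as a raw search request, assuming a local server at http://localhost:8108, a collection named docs, a title field, and a type facet (all hypothetical; none of these names come from the thread). It only builds the query string using the q, query_by, prefix, filter_by, max_candidates and exhaustive_search search parameters; it does not send the request.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// searchOptions holds the subset of search parameters discussed in this thread.
// The struct itself is a hypothetical helper, not part of any Typesense client.
type searchOptions struct {
	Query            string
	QueryBy          string
	FilterBy         string // optional, e.g. "type:=article" (hypothetical field)
	MaxCandidates    int    // 0 means "do not send; use the server default"
	ExhaustiveSearch bool
}

// buildSearchURL returns the search URL for one collection.
// A real request would also need the X-TYPESENSE-API-KEY header.
func buildSearchURL(baseURL, collection string, opts searchOptions) string {
	params := url.Values{}
	params.Set("q", opts.Query)
	params.Set("query_by", opts.QueryBy)
	params.Set("prefix", "true") // prefix matching is what triggers the candidate cap
	if opts.FilterBy != "" {
		params.Set("filter_by", opts.FilterBy)
	}
	if opts.MaxCandidates > 0 {
		params.Set("max_candidates", strconv.Itoa(opts.MaxCandidates))
	}
	if opts.ExhaustiveSearch {
		params.Set("exhaustive_search", "true")
	}
	// Encode sorts parameters by key, so the output is deterministic.
	return baseURL + "/collections/" + url.PathEscape(collection) +
		"/documents/search?" + params.Encode()
}

func main() {
	// Option 1: raise the candidate cap.
	fmt.Println(buildSearchURL("http://localhost:8108", "docs", searchOptions{
		Query: "bad", QueryBy: "title", MaxCandidates: 10000,
	}))
	// Option 2: ask for an exhaustive search, here combined with a filter.
	fmt.Println(buildSearchURL("http://localhost:8108", "docs", searchOptions{
		Query: "bad", QueryBy: "title", FilterBy: "type:=article", ExhaustiveSearch: true,
	}))
}
```

Comparing the found count for the unfiltered and filtered variants of the same query, with and without these two options, is one way to check whether the candidate cap explains the discrepancy.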
SamHendley
09:44 PM
Does max_candidates change based on other conditions?
Jason
09:45 PM
No, it’s currently fixed to 4 by default. In a future release, we’re planning to automatically increase it to 10 if the number of records is less than 100K, and use 4 otherwise.