Addressing `num_typos` Inconsistency in Document Search
TLDR John had an issue with `num_typos` inconsistency when using prefix search. Kishore Nallan clarified the technical aspects, adjusted the aggressiveness of the feature and resolved the issue. They also discussed a limit on
Jul 13, 2022 (18 months ago)
I don't think `num_typos` is consistently respected: in our production use case we get results with edit distance 4 even though we have `num_typos: 2`. It only happens with prefix search turned on.
If I have two documents with `storka` and `sparkling` and search for `starkbin` (edit distance 4 and 3 respectively) I get both results on
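The edit distances quoted above can be checked with a minimal Levenshtein sketch in Python. It uses the words that appear in this thread (`storka`, `sparkling`, `starkbin`); the helper name `edit_distance` is only illustrative and is not how Typesense computes matches internally.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


query = "starkbin"
for word in ("storka", "sparkling"):
    print(word, edit_distance(query, word))
# storka 4
# sparkling 3
```

Both distances are above `num_typos: 2`, which is why the results look surprising. As a side observation (not confirmed in the thread), the 8-character prefix `sparklin` is only 2 edits away from `starkbin`, which may be how prefix search comes into play for that word.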
Kishore Nallan 12:24 PM
Jul 14, 2022 (17 months ago)
Kishore Nallan 11:20 AM
We wanted `q=strawberries` to match against a word like `strawberry`. Since the query word is longer than the indexed word, this is not technically a prefix search, but it "looks" correct and needed to be matched, especially in English, where searching for the plural form of a singular word is common.
So we had to add a special condition here: https://github.com/typesense/typesense/blob/main/src/art.cpp#L1319
The condition allows matching if all these criteria are met:
a) The indexed word is longer than 5 chars (to reduce false positives)
b) The word in the query is longer than the indexed word (like the strawberry example above)
c) The difference in their lengths is within the maximum typos allowed (in this case, 2)
Unfortunately, when we make relaxations like that, some other non-obvious cases like `storka`/`starkbin` can match and look odd.
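To make the three criteria concrete, here is a rough Python sketch of the gating condition only. The function name `allows_longer_query_match` and its signature are hypothetical; the real check lives in the C++ file linked above and may differ in detail, and this sketch does not cover the fuzzy comparison that happens once the gate is passed.

```python
def allows_longer_query_match(indexed_word: str, query_word: str, max_typos: int = 2) -> bool:
    """Hypothetical restatement of criteria a), b) and c) from the thread."""
    longer_than_five = len(indexed_word) > 5               # a) reduce false positives
    query_is_longer = len(query_word) > len(indexed_word)  # b) e.g. strawberries vs strawberry
    within_typo_budget = len(query_word) - len(indexed_word) <= max_typos  # c)
    return longer_than_five and query_is_longer and within_typo_budget


print(allows_longer_query_match("strawberry", "strawberries"))  # True
print(allows_longer_query_match("storka", "starkbin"))          # True
print(allows_longer_query_match("sparkling", "starkbin"))       # False
```

Under these assumptions `storka`/`starkbin` passes the same gate as `strawberry`/`strawberries` (6 chars, query 2 chars longer), which is consistent with the odd-looking match described above. `sparkling` does not pass because the query is shorter than the indexed word.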
Kishore Nallan 11:27 AM
Jul 15, 2022 (17 months ago)
Kishore Nallan 06:04 AM
Jul 19, 2022 (17 months ago)
Kishore Nallan 11:33 AM
`num_typos=10`, hmm (that’s not a problem since we never want it to match, but I would expect it to!)
Kishore Nallan 11:42 AM
`num_typos` can only be 0, 1 or 2. It's too expensive to do fuzzy matching above that.
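Given that limit, a client could clamp the value before sending a search request, so that something like `num_typos=10` does not give the impression of a wider typo budget. The sketch below is a hypothetical client-side helper, not part of any Typesense client library. `q`, `query_by`, `num_typos` and `prefix` are existing search parameters, while the field name `name` is made up for illustration.

```python
MAX_NUM_TYPOS = 2  # upper bound stated in the thread


def build_search_params(q: str, query_by: str, num_typos: int = 2, prefix: bool = True) -> dict:
    """Build a search parameter dict, clamping num_typos to the 0..2 range."""
    clamped = max(0, min(int(num_typos), MAX_NUM_TYPOS))
    return {"q": q, "query_by": query_by, "num_typos": clamped, "prefix": prefix}


# "name" is a hypothetical field; prefix search is disabled to rule out prefix matches.
print(build_search_params("starkbin", "name", num_typos=10, prefix=False))
# {'q': 'starkbin', 'query_by': 'name', 'num_typos': 2, 'prefix': False}
```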
Understanding 'max_candidates' and 'num_typos' Parameters in Typesense
Narayan asked about the difference between the 'max_candidates' and 'num_typos' parameters in typo tolerance within Typesense. Jason referred them to the documentation. Kishore Nallan offered clarity and answered Narayan's follow-up questions, and also addressed Akash's query about case sensitivity in Typesense.
Issue with Typo Correction/Prefix Search and the Role of max_candidates
John noticed inconsistent search results based on max_candidates settings, and Kishore Nallan clarified its role for multi-word queries. They resolved that increasing max_candidates ensures the query isn't prematurely limited.
Typesense Search Solution Issues
Rolando faced incorrect search results using Typesense. Kishore Nallan suggested changing the typo parameters and upgrading the Typesense version. However, the undesired results persisted and need further investigation.
Phrase Search Relevancy and Weights Fix
Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.
Understanding Typo Tolerance in Search Queries
gab sought clarity on typo tolerance settings in search operations, specifically on the discrepancy in document returns when typos are involved. Kishore Nallan explained the "num_typos" and "typo_tokens_threshold" parameters within search queries, and how they dictate typo allowance during searches.