Addressing `num_typos` Inconsistency in Document Search
TLDR: John had an issue with `num_typos` inconsistency when using prefix search. Kishore Nallan clarified the technical aspects, adjusted the aggressiveness of the feature and resolved the issue. They also discussed a limit on the `num_typos` value.
Jul 13, 2022 (15 months ago)
John
12:22 PM
[...] `num_typos` is consistently respected; in our production use case we get results with edit distance 4 even though we have `num_typos: 2`. It only happens with prefix search turned on.
If I have two documents with `storka` and `sparkling` and search for `starkbin` (edit distance 4 and 3 respectively), I get both results on `0.24.0.rc16` and only `sparkling` on `0.23.0`.
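The edit distances quoted above can be checked with a plain Levenshtein computation. This is a minimal sketch for verification only; `edit_distance` is an illustrative helper, not a Typesense function, and it says nothing about how Typesense computes typo distance internally.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programming table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


for word in ("storka", "sparkling"):
    print(word, edit_distance(word, "starkbin"))
# storka 4
# sparkling 3
```

Both distances exceed `num_typos: 2`, which is why neither document would be expected to match on a whole-word basis.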
Kishore Nallan
12:24 PM
John
12:24 PM
Jul 14, 2022 (15 months ago)
Kishore Nallan
11:20 AM
[...] `q=strawberries` to match against a word like `strawberry`. Since the query word is longer than the indexed word, this is not technically a prefix search, but it "looks" correct and needed to be matched, especially in English, where searching for the plural form of a singular word is common.
So we had to add a special condition here: https://github.com/typesense/typesense/blob/main/src/art.cpp#L1319
The condition allows matching if all of these criteria are met:
a) The indexed word is longer than 5 characters (to reduce false positives)
b) The word in the query is longer than the indexed word (like the strawberry example above)
c) The difference in their lengths is within the maximum number of typos allowed (in this case, 2)
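The three criteria can be sketched as a small predicate. This is only a paraphrase of the list above, not the actual code in `art.cpp`; the function name `relaxed_match_allowed` and its parameters are hypothetical, and the check only decides whether the relaxation applies, not whether the words finally match.

```python
def relaxed_match_allowed(indexed_word: str, query_word: str, max_typos: int = 2) -> bool:
    """Return True when the 'query longer than indexed word' relaxation applies.

    Hypothetical restatement of criteria a), b) and c) from the thread.
    """
    long_enough = len(indexed_word) > 5                 # a) reduce false positives
    query_is_longer = len(query_word) > len(indexed_word)  # b) e.g. a plural form
    within_typos = len(query_word) - len(indexed_word) <= max_typos  # c)
    return long_enough and query_is_longer and within_typos


print(relaxed_match_allowed("strawberry", "strawberries"))  # True
print(relaxed_match_allowed("storka", "starkbin"))          # True
print(relaxed_match_allowed("sparkling", "starkbin"))       # False
```

Note that `storka`/`starkbin` passes all three checks (6 characters, query longer by 2), which is how it slips through the same door as `strawberry`/`strawberries`.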
Unfortunately, when we make relaxations like that, some other non-obvious cases like `storka`/`starkbin` can match and look odd.
John
11:26 AM
Kishore Nallan
11:27 AM
John
03:19 PM
Jul 15, 2022 (15 months ago)
Kishore Nallan
06:04 AM
Jul 19, 2022 (15 months ago)
Kishore Nallan
11:33 AM
[...] `0.24.0.rc20`
John
11:41 AM
[...] `num_typos=10`, hmm (that’s not a problem since we never want it to match, but I would expect it to!)
Kishore Nallan
11:42 AM
`num_typos` can only be 0, 1 or 2. Fuzzy matching above that is too expensive.
John
11:42 AM
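Given the 0 to 2 limit mentioned above, a client can validate `num_typos` before sending a search. The sketch below assumes a hypothetical helper, `build_search_params`, that only assembles a parameter dictionary; `q`, `query_by`, `prefix` and `num_typos` are Typesense search parameters, while the field name `title` is made up for illustration. How the server itself treats an out-of-range value is not stated in this thread.

```python
ALLOWED_NUM_TYPOS = (0, 1, 2)  # limit stated in the thread


def build_search_params(q: str, query_by: str, num_typos: int = 2, prefix: bool = True) -> dict:
    """Assemble search parameters, rejecting unsupported num_typos values.

    Hypothetical client-side helper; it does not contact a Typesense server.
    """
    if num_typos not in ALLOWED_NUM_TYPOS:
        raise ValueError(f"num_typos must be 0, 1 or 2, got {num_typos}")
    return {"q": q, "query_by": query_by, "num_typos": num_typos, "prefix": prefix}


params = build_search_params("starkbin", "title", num_typos=2)
print(params)
```

Raising early is one design choice; silently capping the value at 2 would be another, at the cost of hiding the mismatch from the caller.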