#community-help

Understanding Typesense's `drop_tokens_threshold` and `typo_tokens_threshold`

TLDR em1nos sought clarification on Typesense's drop_tokens_threshold and typo_tokens_threshold. Kishore Nallan defined them, emphasizing that they depend on the number of documents found, not tokens or typos; num_typos configures the typo allowance.

Aug 01, 2021 (30 months ago)
em1nos
03:06 PM
Kishore Nallan when you have some time to spare, could you elaborate a bit on how drop_tokens_threshold and typo_tokens_threshold work? I've read the docs about them, but I'm not sure I fully understand.
Kishore Nallan
03:13 PM
Let's say your query is "alpa beta gamma". There are 3 words/tokens in this query. Each of these tokens could contain a typo (in this case, "alpa" is wrong). When you set typo_tokens_threshold: X, you are telling Typesense to keep generating alternative tokens that are within an edit distance of num_typos from the query tokens until at least X results are found. You want to stop at some point, because otherwise you could keep modifying the query tokens and generate a huge number of alternative tokens.

Similarly, there might be no documents that contain all the tokens in the query. In that case, Typesense tries to drop tokens from the query, e.g. searching only for "beta gamma", to find relevant documents. When you set drop_tokens_threshold: X, you are telling Typesense to continue dropping tokens from the query until X results are found.
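As a rough illustration of where these settings go, here is a minimal sketch that builds a search request URL carrying the three parameters named in this thread. The host, the collection name `products`, the field `title`, and the threshold values are assumptions made up for the example; a real request would also need an API key header.

```python
from urllib.parse import urlencode

# Search parameters discussed in the thread. Both thresholds are
# document counts, not token or typo counts.
search_parameters = {
    "q": "alpa beta gamma",
    "query_by": "title",            # hypothetical field name
    "num_typos": 2,                 # maximum typos allowed per token
    "typo_tokens_threshold": 10,    # keep trying typo variants until 10 docs are found
    "drop_tokens_threshold": 5,     # keep dropping tokens until 5 docs are found
}

query_string = urlencode(search_parameters)

# Hypothetical local server and collection name.
url = f"http://localhost:8108/collections/products/documents/search?{query_string}"
print(url)
```

Setting either threshold to 0 should, under this reading, switch the corresponding fallback off, since 0 documents are always "found" already.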
em1nos
03:33 PM
ok, so the X in both cases is how many results/documents it needs to find at a minimum?
Kishore Nallan
03:34 PM
Yes, it's a threshold on the number of docs: until that many are found, Typesense keeps looking for tokens with more typos or dropping more tokens from the original query.
em1nos
03:34 PM
the X is not about how many tokens to drop, or how many typos to allow ...
03:34 PM
oh ok
Kishore Nallan
03:34 PM
Correct, X is number of docs.
03:35 PM
Should have been named maybe drop_tokens_num_docs or something.
em1nos
03:35 PM
I understand
03:36 PM
so how does num_typos play together with these previous settings?
Kishore Nallan
03:36 PM
num_typos is the maximum number of typos (0, 1, 2) allowed.
03:37 PM
First, tokens with typo = 0 are used to fetch results. If not enough results are found, it looks for results that contain tokens with typo = 1, and then does the same for typo = 2. If at any point the threshold is reached, it stops.
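The escalation described above can be sketched with a toy in-memory model. This is not Typesense's implementation: `search_with_typos`, the brute-force edit-distance matching, and the sample documents are all assumptions for illustration. It only shows the control flow of widening the typo budget from 0 up to num_typos and stopping once the document threshold is met.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the row-by-row dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def search_with_typos(docs, query, num_typos=2, typo_tokens_threshold=1):
    """Toy model: widen the allowed edit distance until enough docs match.

    Returns the matching docs and the typo budget that was in use when
    the search stopped.
    """
    tokens = query.split()
    found = []
    for typos in range(num_typos + 1):
        # A doc matches if every query token is within `typos` edits
        # of at least one word in the doc.
        found = [
            doc for doc in docs
            if all(any(edit_distance(tok, word) <= typos for word in doc.split())
                   for tok in tokens)
        ]
        if len(found) >= typo_tokens_threshold:
            return found, typos  # threshold reached: stop escalating
    return found, num_typos


docs = ["alpha beta gamma", "alpha beta delta", "alpine beta gamma"]
results, typos_used = search_with_typos(docs, "alpa beta gamma")
print(results, typos_used)
```

With the default threshold of 1 in this sketch, the search stops at a typo budget of 1, because "alpa" is one edit away from "alpha". Raising the threshold to 2 makes it go on to a budget of 2 even though no further documents match.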
em1nos
07:03 PM
ok understood. thank you for explaining.
07:04 PM
is it doing the drop tokens first, or typos first? how does that work?
Aug 02, 2021 (30 months ago)
Kishore Nallan
01:19 AM
Typos first and then dropping tokens.
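To make the ordering concrete, here is a small sketch that records each attempt in the order it is tried: all tokens with a growing typo budget first, then fewer tokens. It is a toy model under stated assumptions, not Typesense's actual algorithm: the helper names are hypothetical, tokens are dropped from the right, and dropped-token queries reuse the full num_typos budget.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def edit_distance(a: str, b: str) -> int:
    """Recursive Levenshtein distance; fine for short toy tokens."""
    if not a or not b:
        return len(a) + len(b)
    return min(
        edit_distance(a[1:], b) + 1,                   # delete from a
        edit_distance(a, b[1:]) + 1,                   # insert into a
        edit_distance(a[1:], b[1:]) + (a[0] != b[0]),  # substitute or match
    )


def fetch(docs, tokens, typos):
    """Docs in which every token is within `typos` edits of some word."""
    return [
        d for d in docs
        if all(any(edit_distance(t, w) <= typos for w in d.split()) for t in tokens)
    ]


def search(docs, query, num_typos=2, typo_tokens_threshold=1, drop_tokens_threshold=1):
    tokens = query.split()
    attempts = []  # (tokens tried, typo budget), in the order they were tried
    results = []

    # Phase 1: typos first. Keep all tokens and widen the typo budget.
    for typos in range(num_typos + 1):
        attempts.append((" ".join(tokens), typos))
        results = fetch(docs, tokens, typos)
        if len(results) >= typo_tokens_threshold:
            break

    # Phase 2: then drop tokens while too few docs were found.
    # Assumption: this toy drops from the right; the real strategy may differ.
    while len(results) < drop_tokens_threshold and len(tokens) > 1:
        tokens = tokens[:-1]
        attempts.append((" ".join(tokens), num_typos))
        results = fetch(docs, tokens, num_typos)

    return results, attempts


docs = ["alpha beta", "beta gamma delta"]
results, attempts = search(docs, "alpa beta gamma")
for tried, typos in attempts:
    print(f"tried {tried!r} with up to {typos} typo(s)")
print(results)
```

Here no document contains all three tokens at any typo budget, so the sketch falls back to "alpa beta" and finds the "alpha beta" document.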
