#community-help

Understanding Typesense's `drop_tokens_threshold` and `typo_tokens_threshold`

TLDR em1nos sought clarification on Typesense's drop_tokens_threshold and typo_tokens_threshold. Kishore Nallan defined them, emphasizing that they depend on the number of documents found, not tokens or typos; num_typos configures the typo allowance.

Aug 01, 2021 (25 months ago)
em1nos
03:06 PM
Kishore Nallan when you have some time, could you elaborate a bit on how drop_tokens_threshold and typo_tokens_threshold work? I've read the docs about them but I'm not sure I understand them fully.
Kishore Nallan
03:13 PM
Let's say your query is "alpa beta gamma". There are 3 words/tokens in this query. Each of these tokens could contain a typo (in this case, "alpa" is wrong). When you set typo_tokens_threshold: X, you are telling Typesense to keep generating alternative tokens, within an edit distance of num_typos from the tokens in the query, until it finds at least X results. You want to stop at some point, because otherwise you could keep modifying the query tokens and generate a lot of alternative tokens.

Similarly, there might be no documents that contain all the tokens in the query. In that case, Typesense tries dropping tokens from the query, e.g. searching only for "beta gamma", to find relevant documents. When you set drop_tokens_threshold: X, you are telling Typesense to continue dropping tokens from the query until X results are found.
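
For illustration, here is roughly how the three settings discussed in this thread might sit together in a set of search parameters. This is only a sketch: the `title` field is a hypothetical name, and the threshold values are arbitrary. The parameter names themselves (`q`, `query_by`, `num_typos`, `typo_tokens_threshold`, `drop_tokens_threshold`) are the ones the thread and the Typesense search API use.

```python
# Search parameters as a plain dict, of the kind passed to a Typesense search request.
# "title" is a hypothetical field name; the numeric values are arbitrary examples.
search_parameters = {
    "q": "alpa beta gamma",
    "query_by": "title",
    # Maximum number of typos allowed per token (0, 1 or 2).
    "num_typos": 2,
    # Keep trying tokens with more typos until at least this many docs are found.
    "typo_tokens_threshold": 5,
    # Keep dropping query tokens until at least this many docs are found.
    "drop_tokens_threshold": 5,
}
```

Both thresholds count documents, not tokens or typos.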
em1nos
03:33 PM
ok, so the X in both cases is how many results/documents it needs to find at a minimum?
Kishore Nallan
03:34 PM
Yes, it's a threshold on the number of docs: while fewer docs than that have been found, Typesense keeps either looking for tokens with more typos or dropping more tokens from the original query.
em1nos
03:34 PM
the X is not about how many tokens to drop, or how many typos to allow ...
em1nos
03:34 PM
oh ok
Kishore Nallan
03:34 PM
Correct, X is number of docs.
Kishore Nallan
03:35 PM
Should have been named maybe drop_tokens_num_docs or something.
em1nos
03:35 PM
I understand
em1nos
03:36 PM
so how does num_typos play together with these previous settings?
Kishore Nallan
03:36 PM
num_typos is the maximum number of typos (0, 1, 2) allowed.
Kishore Nallan
03:37 PM
First, tokens with typo = 0 are used to fetch results. If not enough results are found, it looks for results that contain tokens with typo = 1, and then does the same for typo = 2. If the threshold is reached at any point, it stops.
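
The escalation described above can be sketched as a small toy model. This is not Typesense's implementation: `search_with_typos` and `edit_distance` are hypothetical helpers, documents are plain strings, and a token "matches" a document when some word in it is within the allowed edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def search_with_typos(docs, query, num_typos, typo_tokens_threshold):
    """Return (indexes of matching docs, typo level at which the search stopped)."""
    tokens = query.split()
    found = []
    # Try typo = 0 first, then 1, then 2, never exceeding num_typos.
    for allowed in range(num_typos + 1):
        found = [
            i for i, doc in enumerate(docs)
            if all(
                any(edit_distance(t, w) <= allowed for w in doc.split())
                for t in tokens
            )
        ]
        # Stop as soon as enough documents have been found.
        if len(found) >= typo_tokens_threshold:
            break
    return found, allowed


docs = ["alpha beta gamma", "alpha beta delta", "beta gamma"]
hits, level = search_with_typos(docs, "alpa beta gamma", num_typos=2, typo_tokens_threshold=1)
```

In this toy run nothing matches at typo = 0, one document matches at typo = 1, and the loop stops there without trying typo = 2 because the threshold of 1 document has been met.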
em1nos
07:03 PM
ok understood. thank you for explaining.
em1nos
07:04 PM
is it doing the drop tokens first, or typos first? how does that work?
Aug 02, 2021 (25 months ago)
Kishore Nallan
01:19 AM
Typos first and then dropping tokens.
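
A rough sketch of that ordering, under several assumptions: `search` is a hypothetical function, the `variants` table is a hand-written stand-in for real typo generation, and tokens are dropped from the front of the query only to mirror the "beta gamma" example earlier in the thread. The thread does not say in which order Typesense actually drops tokens, or whether typo tolerance stays on while dropping.

```python
def search(docs, query, variants, typo_tokens_threshold, drop_tokens_threshold):
    """Return (indexes of matching docs, list of phases that were tried)."""

    def find(tokens, use_typos):
        hits = []
        for i, doc in enumerate(docs):
            words = doc.split()
            ok = all(
                t in words
                or (use_typos and any(v in words for v in variants.get(t, [])))
                for t in tokens
            )
            if ok:
                hits.append(i)
        return hits

    tokens = query.split()
    phases = ["exact"]
    hits = find(tokens, use_typos=False)

    # Phase 1: typo tolerance is tried first, on the full query.
    if len(hits) < typo_tokens_threshold:
        phases.append("typos")
        hits = find(tokens, use_typos=True)

    # Phase 2: only then are tokens dropped, one at a time.
    # Assumption: typo variants remain allowed during this phase.
    while len(hits) < drop_tokens_threshold and len(tokens) > 1:
        tokens = tokens[1:]
        phases.append("drop -> " + " ".join(tokens))
        hits = find(tokens, use_typos=True)

    return hits, phases


docs = ["alpha beta gamma", "beta gamma delta", "gamma"]
variants = {"alpa": ["alpha"]}  # hypothetical typo alternatives

typo_case = search(docs, "alpa beta gamma", variants, 1, 1)
drop_case = search(docs, "omega beta gamma", variants, 1, 1)
```

In the first query the typo phase alone finds enough documents, so no token is dropped. In the second, typos do not help with "omega", so the sketch falls through to dropping it and searching for "beta gamma".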