#community-help

Understanding Typo Tolerance in Search Queries

TLDR gab sought clarity on typo tolerance settings in search operations, specifically on the discrepancy in document returns when typos are involved. Kishore Nallan explained the "num_typos" and "typo_tokens_threshold" parameters within search queries, and how they dictate typo allowance during searches.

Solved
Mar 18, 2022 (20 months ago)
gab
09:54 AM
Hi,
It seems I misunderstand typo tolerance settings.

I have 2 documents with:
doc1 name: "Linder"
doc2 name: "Lindenhof"

Here is the search query I'm running:
"limit_hits": 6,
"num_typos": 1,
"per_page": 6,
"q": "Linder",
"query_by": "name",
"typo_tokens_threshold": 1

Only one document is returned: document 1. Document 2 is not returned.

If one typo is allowed with "num_typos", why am I not getting both documents returned?
Thanks
Kishore Nallan
09:59 AM
gab It's because of "typo_tokens_threshold": 1. This means that you want Typesense to continue searching with more and more typos until at least 1 document is found. In this case, since Lindenhof is a zero-typo prefix match, that requirement is satisfied and typo relaxation is not done. If you increase typo_tokens_threshold to 2, the other result will show up.
gab
10:01 AM
Wow, I'm so sorry. Because of your answer, I just noticed a mistake in my question. I just fixed the "q" param.
My issue is with the "q": "Linder" param.

gab
11:59 AM
Kishore Nallan I don't get it, isn't this an issue? Shouldn't "num_typos" cover this case?
Kishore Nallan
12:01 PM
num_typos is the maximum number of typos you want to account for, but it is subject to typo_tokens_threshold. Some users wanted no typo-matched results to be shown when documents that match without typos exist, so we had to introduce the threshold parameter to handle that.
Kishore Nallan
12:03 PM
By default, typo_tokens_threshold is 1, which means that if at least one document matches the query tokens without any typos, Typesense does not look for other documents with typo matches. If you want typo matches as well, just increase this number to 10 or so.
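
The relaxation described above can be pictured with a toy model. This is only a sketch for intuition, not Typesense's actual implementation: `toy_search`, `prefix_typos`, and `edit_distance` are hypothetical helpers, and counting typos as the Levenshtein distance to a prefix of the stored name is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def prefix_typos(query: str, word: str) -> int:
    """Fewest edits needed to turn `query` into some prefix of `word` (assumed model)."""
    q, w = query.lower(), word.lower()
    return min(edit_distance(q, w[:k]) for k in range(len(w) + 1))


def toy_search(query, names, num_typos, typo_tokens_threshold):
    """Relax typos one step at a time, stopping once enough results are found."""
    hits = []
    for typos in range(num_typos + 1):  # never goes beyond num_typos
        hits += [n for n in names if prefix_typos(query, n) == typos]
        if len(hits) >= typo_tokens_threshold:
            break  # threshold satisfied: no further typo relaxation
    return hits


names = ["Linder", "Lindenhof"]
print(toy_search("Linder", names, num_typos=1, typo_tokens_threshold=1))
print(toy_search("Linder", names, num_typos=1, typo_tokens_threshold=2))
```

In this model the first call stops after the zero-typo pass because one hit already satisfies the threshold, while the second call goes on to the one-typo pass and also picks up "Lindenhof".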
gab
12:07 PM
Ah ok, I get it. I thought num_typos was applied first and then the results were expanded using typo_tokens_threshold.

But now, how can I say that strictly one typo is allowed, and get both typo and no-typo matches?
Kishore Nallan
12:12 PM
Set num_typos: 1 and set typo tokens threshold to a large enough number that's likely to produce enough results.
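
As a rough illustration of that suggestion, the parameters could be assembled like this. The host `http://localhost:8108`, the collection name `places`, and the `build_search_url` helper are assumptions for the example, and the value 10 is just one "large enough" choice; a real request would also need an API key header.

```python
from urllib.parse import urlencode

# Search parameters from the thread, with the threshold raised from 1 to 10.
search_parameters = {
    "q": "Linder",
    "query_by": "name",
    "num_typos": 1,               # hard cap on typos per token
    "typo_tokens_threshold": 10,  # keep relaxing until 10 results, within the cap
    "per_page": 6,
}


def build_search_url(base_url: str, collection: str, params: dict) -> str:
    """Build the search URL for a collection (hypothetical helper, no request is sent)."""
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


# Assumed host and collection name, for illustration only.
url = build_search_url("http://localhost:8108", "places", search_parameters)
print(url)
```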
gab
12:13 PM
Will num_typos still be the maximum number of typos allowed?
Kishore Nallan
12:13 PM
Yes, that's what you want right?
gab
12:17 PM
Yep, exactly. I had this situation in mind:
num_typos: 1
typo_tokens_threshold: 10

I thought typo_tokens_threshold would keep allowing more typos until 10 results were reached.

So just to be sure: in the case above, if I have only one document with 0 or 1 typos and 100 other documents with 5 typos, I will get only one result?
Kishore Nallan
12:19 PM
Yes. num_typos is a maximum that will never be exceeded. typo_tokens_threshold will still respect that.
gab
12:22 PM
Ok, thanks for this clarification 🙂 !
