#community-help

Troubleshooting Typo Correction in Typesense Search

TLDR: John found that prefix searches in Typesense 0.22.2 returned matches beyond the allowed typo cost. Kishore Nallan reproduced and fixed the problem, then published build 0.24.0.rc2, which John confirmed resolves it.

May 27, 2022 (19 months ago)
John
12:22 PM
Is it expected that prefix search drops characters without counting them towards the typo cost? E.g. with num_typos=2, earrings matches arvin even though their edit distance is 4, whereas earrin and arvin have an edit distance of 2. I'm not sure it's actually dropping characters, that's just my best guess, but it seems like strange behaviour to me. It doesn't happen with prefix: false.
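For reference, the two edit distances quoted in this message can be checked with a plain Levenshtein implementation. This is a minimal sketch for verifying the numbers only; the function name `levenshtein` is illustrative and this is not Typesense's internal typo-cost logic.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        prev = curr
    return prev[-1]


print(levenshtein("earrings", "arvin"))  # 4
print(levenshtein("earrin", "arvin"))    # 2
```

With num_typos=2, only the second pair is within budget, which is consistent with the guess that trailing characters of the query are being dropped at no cost.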
Kishore Nallan
01:00 PM
May I know what version of Typesense you are using?
John
01:01 PM
Sure, 0.22.2
Kishore Nallan
01:10 PM
John, I'm not able to reproduce this. I indexed a single document with a title field containing the word "Earrings", and when I query with ?q=arvin&query_by=title I get no results. Can you provide a reproducible snippet?
John
01:17 PM
import typesense

COLLECTION = "example"
client = typesense.Client(
    {
        "api_key": "TYPESENSEDEV",
        "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
        "connection_timeout_seconds": 2,
    }
)

client.collections.create(
    {
        "name": COLLECTION,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "brand", "type": "string"},
        ],
    }
)

client.collections[COLLECTION].documents.create(
    {"id": "1", "title": "daylight earrings gold plated", "brand": "foo"}
)
client.collections[COLLECTION].documents.create(
    {"id": "2", "title": "something else", "brand": "arvin"}
)


result = client.collections[COLLECTION].documents.search(
    {
        "q": "earrings",
        "query_by": "title,brand",
        "use_cache": False,
        "num_typos": "2,2",
    }
)
print(result["hits"])
Kishore Nallan
01:47 PM
Thanks! I just verified that this is fixed in the 0.23 RC builds.
John
01:48 PM
That’s great! Do you know what the issue was? 🙂 Just curious
Kishore Nallan
01:49 PM
Over eager typo correction 🙂
May 30, 2022 (19 months ago)
John
08:02 AM
But now it doesn't seem to match earrings to earring, even though that's just 1 typo. Example:
import typesense

COLLECTION = "example"
client = typesense.Client(
    {
        "api_key": "TYPESENSEDEV",
        "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
        "connection_timeout_seconds": 2,
    }
)
client.collections[COLLECTION].delete()
client.collections.create(
    {
        "name": COLLECTION,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "brand", "type": "string"},
        ],
    }
)

client.collections[COLLECTION].documents.create(
    {"id": "1", "title": "daylight earrings gold plated", "brand": "foo"}
)
client.collections[COLLECTION].documents.create(
    {"id": "2", "title": "fancy earring", "brand": "foo"}
)
client.collections[COLLECTION].documents.create(
    {"id": "3", "title": "something else", "brand": "arvin"}
)


result = client.collections[COLLECTION].documents.search(
    {
        "q": "earrings",
        "query_by": "title,brand",
        "use_cache": False,
        "num_typos": "2,2",
    }
)
print(result["hits"])

This returns only the document with earrings.
Kishore Nallan
08:08 AM
So the num_typos parameter is basically the maximum number of typos allowed. Since there is already a record with an exact match, other typos are not considered. This behavior can be tweaked with the typo_tokens_threshold parameter, which controls the minimum number of results that should be fetched before typo relaxation stops. Since the default is 1, Typesense does not look for words with more typos once it finds at least one document with an exact match.
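As a rough illustration of the explanation above, the search parameters from John's snippet could be extended with typo_tokens_threshold. This is a sketch under assumptions: the helper name `typo_search_params` and the threshold value of 2 are illustrative choices, not taken from the thread.

```python
def typo_search_params(query: str, min_results: int = 2) -> dict:
    """Build search parameters that keep relaxing typos until
    at least `min_results` results have been found (hypothetical helper)."""
    return {
        "q": query,
        "query_by": "title,brand",
        "num_typos": "2,2",  # upper bound on typos per field, not a target
        "typo_tokens_threshold": min_results,
    }


params = typo_search_params("earrings")
print(params)

# With the client and COLLECTION from the snippet above:
# result = client.collections[COLLECTION].documents.search(params)
```

With a threshold above 1, a single exact match should no longer stop typo relaxation, so the "fancy earring" document would be expected to become eligible as a one-typo match.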
John
08:08 AM
I just realized that, makes sense, thank you for being so responsive! 🙂

John
08:26 AM
I think something’s still off with the cost calculation, unless I’m missing something.

With typo_tokens_threshold=50 and num_typos=2 I still get arvin as a result when querying for earrings. With num_typos=1 I don't get it. I think it should only show up with num_typos=4. It still only happens with prefix=True.

This is on 0.23.0.rc70
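The comparison described in this message could be scripted roughly as follows. This is a sketch, not the exact code John ran: the helper name `compare_prefix_modes` is hypothetical, and it assumes a `client` built as in the earlier snippet and the usual search response shape, where each hit carries a `document` object.

```python
def compare_prefix_modes(client, collection: str, query: str) -> dict:
    """Run the same typo-tolerant search with prefix matching on and off
    and collect the matching document ids for each mode."""
    ids_by_mode = {}
    for prefix in ("true", "false"):
        result = client.collections[collection].documents.search(
            {
                "q": query,
                "query_by": "title,brand",
                "num_typos": 2,
                "typo_tokens_threshold": 50,
                "prefix": prefix,
            }
        )
        ids_by_mode[f"prefix={prefix}"] = [
            hit["document"]["id"] for hit in result["hits"]
        ]
    return ids_by_mode


# Usage, assuming the client and COLLECTION from the earlier snippet:
# print(compare_prefix_modes(client, COLLECTION, "earrings"))
```

If the document whose brand is arvin shows up only under prefix=true, that matches the behaviour reported here.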
Kishore Nallan
08:58 AM
Reproducible. I will check and get back to you.

May 31, 2022 (19 months ago)
Kishore Nallan
05:32 AM
I've identified the issue. Will fix, test and have a build available for testing in the next few days.
John
07:27 AM
Thank you Kishore!
Jun 07, 2022 (19 months ago)
John
06:13 AM
Any progress on this?
Kishore Nallan
06:20 AM
Yes, I can share a build with you. Do you use Docker?
Kishore Nallan
06:28 AM
I've published typesense/typesense:0.24.0.rc2 to Docker Hub, which contains this fix.
John
06:49 AM
Awesome, we’ll take a look
John
07:40 AM
Seems to work, thanks!
Kishore Nallan
07:40 AM
Super, thanks for confirming!
