Is it expected to drop characters without counting it toward the typo cost? (Typesense #community-help)


John

05/27/2022, 12:22 PM

Is it expected to drop characters without counting them towards the typo cost when doing prefix search? E.g. with `num_typos=2` we get that `earrings` matches `arvin` even though the edit distance is 4, but `earrin` and `arvin` have an edit distance of 2. I’m not sure it’s dropping characters, that’s just my best guess, but it seems like strange behaviour to me. It doesn’t happen with `prefix: false`.
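The edit distances quoted in the question can be checked with a plain Levenshtein implementation. The sketch below is only an illustration of John's guess: `best_query_prefix` is a hypothetical helper that tries every prefix of the query against the indexed word, and it is not how Typesense computes typo cost internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_query_prefix(query: str, token: str) -> tuple[int, str]:
    """Hypothetical helper: the query prefix closest to `token`, with its distance."""
    candidates = (
        (levenshtein(query[:n], token), query[:n])
        for n in range(1, len(query) + 1)
    )
    return min(candidates)


print(levenshtein("earrings", "arvin"))        # 4
print(levenshtein("earrin", "arvin"))          # 2
print(best_query_prefix("earrings", "arvin"))  # (2, 'earrin')
```

If trailing query characters were dropped for free, `earrings` would land within `num_typos=2` of `arvin`, which is consistent with the behaviour described above.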

Kishore Nallan

05/27/2022, 1:00 PM

May I know what version of Typesense you are using?

John

05/27/2022, 1:01 PM

Sure, `0.22.2`

Kishore Nallan

05/27/2022, 1:10 PM

@John I'm not able to reproduce this. I indexed a single document with a title field containing the word "Earrings", and when I query with `?q=arvin&query_by=title` I get no results. Can you provide a reproducible snippet?

John

05/27/2022, 1:17 PM

```python
import typesense

COLLECTION = "example"
# Client for a local development server
client = typesense.Client(
    {
        "api_key": "TYPESENSEDEV",
        "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
        "connection_timeout_seconds": 2,
    }
)

# Two string fields, both searched below
client.collections.create(
    {
        "name": COLLECTION,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "brand", "type": "string"},
        ],
    }
)

# Document 1 has "earrings" in the title
client.collections[COLLECTION].documents.create(
    {"id": "1", "title": "daylight earrings gold plated", "brand": "foo"}
)
# Document 2 only has "arvin", in the brand field
client.collections[COLLECTION].documents.create(
    {"id": "2", "title": "something else", "brand": "arvin"}
)

# Search both fields, allowing up to 2 typos in each
result = client.collections[COLLECTION].documents.search(
    {
        "q": "earrings",
        "query_by": "title,brand",
        "use_cache": False,
        "num_typos": "2,2",
    }
)
print(result["hits"])
```

Kishore Nallan

05/27/2022, 1:47 PM

Thanks! I just verified that this is fixed in the 0.23 RC builds.

John

05/27/2022, 1:48 PM

That’s great! Do you know what the issue was? 🙂 Just curious

Kishore Nallan

05/27/2022, 1:49 PM

Over-eager typo correction 🙂

John

05/30/2022, 8:02 AM

But now it doesn’t seem to match `earrings` to `earring`, even though it’s just 1 typo. Example:

```python
import typesense

COLLECTION = "example"
client = typesense.Client(
    {
        "api_key": "TYPESENSEDEV",
        "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
        "connection_timeout_seconds": 2,
    }
)
# Start from a clean collection (raises if the collection does not exist yet)
client.collections[COLLECTION].delete()
client.collections.create(
    {
        "name": COLLECTION,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "brand", "type": "string"},
        ],
    }
)

# Document 1: exact match for the query "earrings"
client.collections[COLLECTION].documents.create(
    {"id": "1", "title": "daylight earrings gold plated", "brand": "foo"}
)
# Document 2: one typo away ("earring")
client.collections[COLLECTION].documents.create(
    {"id": "2", "title": "fancy earring", "brand": "foo"}
)
# Document 3: should not match at all
client.collections[COLLECTION].documents.create(
    {"id": "3", "title": "something else", "brand": "arvin"}
)

result = client.collections[COLLECTION].documents.search(
    {
        "q": "earrings",
        "query_by": "title,brand",
        "use_cache": False,
        "num_typos": "2,2",
    }
)
print(result["hits"])
```

This just gives the document with `earrings`.

Kishore Nallan

05/30/2022, 8:08 AM

So the `num_typos` parameter is basically a maximum number of typos allowed. Since there is already a record with an exact match, other typos are not considered. This behavior can be tweaked with the `typo_tokens_threshold` parameter. This parameter controls the minimum number of results that should be fetched before typo relaxation is stopped. Since the default is 1, Typesense does not look for words with more typos when it finds at least one document with an exact match.
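As a rough sketch of that suggestion, the search parameters from the snippet above could be extended with `typo_tokens_threshold`. The helper name `build_search_params` and the value 50 are illustrative assumptions, not something taken from the Typesense docs. The helper only builds the parameter dict, so it can be inspected without a running server.

```python
def build_search_params(q: str, typo_tokens_threshold: int = 1) -> dict:
    """Hypothetical helper: search parameters for the example collection.

    A threshold of 1 mirrors the default described above; a larger value asks
    Typesense to keep relaxing typos until that many results are found.
    """
    return {
        "q": q,
        "query_by": "title,brand",
        "use_cache": False,
        "num_typos": "2,2",
        "typo_tokens_threshold": typo_tokens_threshold,
    }


params = build_search_params("earrings", typo_tokens_threshold=50)
print(params["typo_tokens_threshold"])

# With a running server and the client from the snippet above:
# result = client.collections[COLLECTION].documents.search(params)
```

With a higher threshold, the `fancy earring` document would be expected to show up alongside the exact match, since one exact hit no longer satisfies the minimum result count.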

John

05/30/2022, 8:08 AM

I just realized that, makes sense, thank you for being so responsive! 🙂

🙌 1

John

05/30/2022, 8:26 AM

I think something’s still off with the cost calculation, unless I’m missing something. With `typo_tokens_threshold=50` and `num_typos=2` I still get `arvin` as a result when querying for `earrings`. With `num_typos=1` I don’t get it. I think that it should only show up if `num_typos=4`. It still only happens with `prefix=True`. This is on 0.23.0.rc70.

Kishore Nallan

05/30/2022, 8:58 AM

Reproducible, I will check and get back to you.

🙌 1

Kishore Nallan

05/31/2022, 5:32 AM

I've identified the issue. Will fix, test and have a build available for testing in the next few days.

John

05/31/2022, 7:27 AM

Thank you Kishore!

John

06/07/2022, 6:13 AM

Any progress on this?

Kishore Nallan

06/07/2022, 6:20 AM

Yes, I can share a build with you. Do you use Docker?

Kishore Nallan

06/07/2022, 6:28 AM

I've published `typesense/typesense:0.24.0.rc2` to Docker, which contains this fix.

John

06/07/2022, 6:49 AM

Awesome, we’ll take a look

John

06/07/2022, 7:40 AM

Seems to work, thanks!

Kishore Nallan

06/07/2022, 7:40 AM

Super, thanks for confirming!
