Is it expected to drop characters without counting...
# community-help
j
Is it expected to drop characters without counting it towards the typo cost when doing prefix search? E.g. with
num_typos=2
we get that
earrings
matches
arvin
even though the edit distance is 4, but
earrin
and
arvin
has an edit distance of 2. I'm not sure it's actually dropping characters; that's just my best guess, but it seems like strange behaviour to me. It doesn’t happen with
prefix: false
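To make the distances quoted above easy to check, here is a minimal sketch of a plain Levenshtein distance (insert, delete, substitute, each costing 1). The function name `levenshtein` is introduced here for illustration only; this is not Typesense's internal typo-cost code, just a way to verify the numbers 4 and 2.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    # prev[j] holds the distance between the processed part of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute (or keep) the character
                )
            )
        prev = curr
    return prev[-1]


print(levenshtein("earrings", "arvin"))  # 4: the full query vs. the brand
print(levenshtein("earrin", "arvin"))    # 2: the query with "gs" dropped
```

If the trailing characters of the query were dropped without cost, the second comparison would fit within `num_typos=2`, which is consistent with the guess above.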
k
May I know what version of Typesense you are using?
j
Sure,
0.22.2
k
@John I'm not able to reproduce this. I indexed a single document with a title field containing the word "Earrings". When I then query with
?q=arvin&query_by=title
I get no results. Can you provide a reproducible snippet?
j
import typesense

COLLECTION = "example"
client = typesense.Client(
    {
        "api_key": "TYPESENSEDEV",
        "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
        "connection_timeout_seconds": 2,
    }
)

client.collections.create(
    {
        "name": COLLECTION,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "brand", "type": "string"},
        ],
    }
)

client.collections[COLLECTION].documents.create(
    {"id": "1", "title": "daylight earrings gold plated", "brand": "foo"}
)
client.collections[COLLECTION].documents.create(
    {"id": "2", "title": "something else", "brand": "arvin"}
)


result = client.collections[COLLECTION].documents.search(
    {
        "q": "earrings",
        "query_by": "title,brand",
        "use_cache": False,
        "num_typos": "2,2",
    }
)
print(result["hits"])
k
Thanks! I just verified that this is fixed in the 0.23 RC builds.
j
That’s great! Do you know what the issue was? 🙂 Just curious
k
Over eager typo correction 🙂
j
But now it doesn’t seem to match
earrings
to
earring
even though it’s just 1 typo, example:
import typesense

COLLECTION = "example"
client = typesense.Client(
    {
        "api_key": "TYPESENSEDEV",
        "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
        "connection_timeout_seconds": 2,
    }
)
client.collections[COLLECTION].delete()
client.collections.create(
    {
        "name": COLLECTION,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "brand", "type": "string"},
        ],
    }
)

client.collections[COLLECTION].documents.create(
    {"id": "1", "title": "daylight earrings gold plated", "brand": "foo"}
)
client.collections[COLLECTION].documents.create(
    {"id": "2", "title": "fancy earring", "brand": "foo"}
)
client.collections[COLLECTION].documents.create(
    {"id": "3", "title": "something else", "brand": "arvin"}
)


result = client.collections[COLLECTION].documents.search(
    {
        "q": "earrings",
        "query_by": "title,brand",
        "use_cache": False,
        "num_typos": "2,2",
    }
)
print(result["hits"])
This returns only the document with
earrings
k
So the
num_typos
parameter is basically the maximum number of typos allowed. Since there is already a record with an exact match, typo variations are not considered. This behavior can be tweaked with the
typo_tokens_threshold
parameter. This parameter controls the minimum number of results that should be fetched before typo relaxation is stopped. Since the default is 1, Typesense does not look for words with more typos once it finds at least one document with an exact match.
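As a rough sketch of how this could be tried with the snippet above, the search parameters can be extended with `typo_tokens_threshold`. The helper `build_search_params` is hypothetical and only assembles the dictionary; the threshold value of 2 is an arbitrary choice for illustration, and whether "fancy earring" then appears depends on the server version.

```python
def build_search_params(query: str, typo_tokens_threshold: int = 1) -> dict:
    """Assemble search parameters for the example collection (hypothetical helper)."""
    return {
        "q": query,
        "query_by": "title,brand",
        "use_cache": False,
        # Upper bound on typos per field, as in the snippet above.
        "num_typos": "2,2",
        # Keep looking for typo variations until at least this many results exist.
        "typo_tokens_threshold": typo_tokens_threshold,
    }


params = build_search_params("earrings", typo_tokens_threshold=2)
print(params)

# With a running server and the client from the snippet above:
# result = client.collections[COLLECTION].documents.search(params)
```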
j
I just realized that, makes sense, thank you for being so responsive! 🙂
I think something’s still off with the cost calculation, unless I’m missing something. With
typo_tokens_threshold=50
and
num_typos=2
I still get
arvin
as a result when querying for
earrings
. With
num_typos=1
I don’t get it. I think that it should only show up if
num_typos=4
. It still only happens with
prefix=True
. This is on 0.23.0.rc70
k
Reproducible. I will check and get back to you.
I've identified the issue. Will fix, test and have a build available for testing in the next few days.
j
Thank you Kishore!
Any progress on this?
k
Yes, I can share a build with you. Do you use Docker?
I've published
typesense/typesense:0.24.0.rc2
to Docker that contains this fix.
j
Awesome, we’ll take a look
Seems to work, thanks!
k
Super, thanks for confirming!