Seeing some weird results when combining prefix se...
# community-help
j
Seeing some weird results when combining prefix search and typo correction. This is on `0.24.0.rc16`. There’s only one document, with a string field containing “Technique”. If I search for `techhn` I get 0 results, but if I add or remove a letter, i.e. `techh` or `techhni`, I get the correct result. Any clue what’s going on? Posting full example in comment.
import typesense


COLLECTION = "example"
client = typesense.Client(
    {
        "api_key": "TYPESENSEDEV",
        "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
        "connection_timeout_seconds": 2,
    }
)

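try:
    # Drop any leftover collection from a previous run.
    client.collections[COLLECTION].delete()
except typesense.exceptions.ObjectNotFound:
    # Nothing to delete on a fresh server.
    pass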

fields = [
    {
        "facet": False,
        "index": True,
        "infix": False,
        "locale": "",
        "name": "title",
        "optional": False,
        "sort": True,
        "type": "string",
    },
]
client.collections.create(
    {
        "name": COLLECTION,
        "fields": fields,
    }
)

products = [
    {
        "title": "Technique"
    }
]

client.collections[COLLECTION].documents.import_(products, {"action": "create"})


words = [
    "techh",
    "techhn",
    "techhni"
]

for word in words:
    hits = client.collections[COLLECTION].documents.search(
        {
            "q": word,
            "query_by": "title",
        }
    )["hits"]
    print(f"Got {len(hits)} hits for {word}")
k
We have some measures in place to prevent eager matching of query tokens, especially in smaller words. This is based on feedback from our customers so far. Specifically, there are two parameters, both of which can be customized:
min_len_1typo=4
min_len_2typo=7
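
As a rough sketch of how these two values could be customized per search, assuming they are passed alongside `q` and `query_by` in the same search call used in the example above: the helper name `build_search_params` and the override value 6 are hypothetical and only for illustration.

```python
# Minimal sketch: build a search request that overrides the typo-length
# thresholds. `build_search_params` is a hypothetical helper, not part of
# the typesense client; the defaults below are the ones quoted in this thread.
def build_search_params(q, query_by, min_len_1typo=4, min_len_2typo=7):
    return {
        "q": q,
        "query_by": query_by,
        "min_len_1typo": min_len_1typo,  # shortest query length allowed 1 typo
        "min_len_2typo": min_len_2typo,  # shortest query length allowed 2 typos
    }


# Hypothetical override: let 6-char queries be considered for 2 typos.
params = build_search_params("techhn", "title", min_len_2typo=6)
print(params)

# With the client from the example above, the request would then be sent as:
# hits = client.collections[COLLECTION].documents.search(params)["hits"]
```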
In this example, since the query `techhn` is 6 chars, it is constrained to only 1 typo. When you compare the first 6 chars of the word `technique`, it does not satisfy the single typo restriction, so it does not match:
techni
techhn
However, when you remove a letter, the first 5 chars are within 1 typo of each other:
techn
techh
Likewise, when you add a letter, e.g. `techhni`, it is now eligible for 2 typo consideration (since the default is `min_len_2typo=7`) and hence matches.
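
The rule described above can be approximated with a small standalone sketch. This is not Typesense's actual implementation: it assumes plain Levenshtein distance, a comparison against the same-length prefix of the indexed word, and thresholds that apply when the query length is at least the given value. The function names are made up for illustration.

```python
# Illustrative model of the typo budget described in this thread.
# Assumptions (not confirmed by the source): plain Levenshtein distance,
# comparison against the same-length prefix, thresholds applied with >=.

def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def allowed_typos(query: str, min_len_1typo: int = 4, min_len_2typo: int = 7) -> int:
    """Number of typos a query of this length is allowed."""
    if len(query) >= min_len_2typo:
        return 2
    if len(query) >= min_len_1typo:
        return 1
    return 0


def prefix_matches(query: str, word: str) -> bool:
    """Compare the query with the same-length prefix of the indexed word."""
    prefix = word.lower()[: len(query)]
    return levenshtein(query.lower(), prefix) <= allowed_typos(query)


for q in ["techh", "techhn", "techhni"]:
    print(q, allowed_typos(q), prefix_matches(q, "Technique"))
```

Under these assumptions the sketch reproduces the pattern from the thread: `techh` and `techhni` match, while `techhn` is two edits away from `techni` but only allowed one.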
j
Aha, that makes sense, thank you. The documentation says that the default value for `min_len_1typo` is 3 though, is that correct?
k
Sorry, the documentation is wrong there and must be fixed. The default is 4.