Typo Correction Issue in Typesense v0.24.1
TLDR Yoann encounters mysterious behavior in typo correction for certain query strings. Kishore Nallan will investigate the issue.
Jun 02, 2023 (6 months ago)
Yoann
08:05 AM
I have a document with a field name = "La Bouitte".
With typo params in the search set to default and exhaustive search set to true, the typo correction works differently depending on the position of the typo:
• q="bouite" (one t) --> doc is found
• q="bouittee" (extra e) --> doc is found
• q="boutte" (no i) --> doc is found
• q="boitte" (missing u) --> doc is found
• q="buitte" (missing o) --> doc is not found
• q="ouitte" (missing b) --> doc is not found
• q="ouittee" (missing b, extra e) --> doc is found (I guess because min_len_2typo defaults to 2)
So inserting a char earlier in the word seems to be harder to correct. Why is that so?
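To make the puzzle concrete, here is a minimal sketch that computes the plain Levenshtein distance between each query listed above and the indexed token "bouitte". Using Levenshtein distance is an assumption made for illustration; the thread does not state which metric Typesense applies internally, and the helper name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


TARGET = "bouitte"  # the token from the indexed field value
QUERIES = ["bouite", "bouittee", "boutte", "boitte", "buitte", "ouitte", "ouittee"]

# Distance of every query from the thread to the indexed token.
DISTANCES = {q: levenshtein(q, TARGET) for q in QUERIES}

for query, distance in DISTANCES.items():
    print(f"{query!r}: {distance}")
```

Under this metric the two failing queries ("buitte" and "ouitte") are each a single edit away from "bouitte", the same as the queries that succeed, so the edit count alone does not explain the difference.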
Kishore Nallan
08:11 AM
"doc is not found"
Do you mean to say that you get no results at all or that you get other results?
Jun 05, 2023 (6 months ago)
Yoann
08:05 AM
Kishore Nallan
01:45 PM
Jun 06, 2023 (6 months ago)
Yoann
12:09 PM
# Example doc
{
"id": "16421",
"name": ["la bouitte"],
"resort": [419,422]
}
Search Request
{"searches":
[{
"query_by":"name",
"collection":"prod__location",
"filter_by": "resort:=[419]",
"q":"buitte",
"exhaustive_search": true
}]
}
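To compare several query variants in one round trip, the search request above can be generated programmatically. The sketch below only builds the request body, reusing the collection, field, and filter values from the request above; the function name `build_multi_search_body` is hypothetical, and sending the body is left out because the endpoint URL is not shown in this thread.

```python
import json


def build_multi_search_body(queries, collection="prod__location", resort_id=419):
    """Build one search entry per query, mirroring the request in the thread."""
    return {
        "searches": [
            {
                "query_by": "name",
                "collection": collection,
                "filter_by": f"resort:=[{resort_id}]",
                "q": q,
                "exhaustive_search": True,
            }
            for q in queries
        ]
    }


# One working query and the two failing ones reported in the thread.
BODY = build_multi_search_body(["bouite", "buitte", "ouitte"])
print(json.dumps(BODY, indent=2))
```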
Yoann
12:20 PM
curl --location '' \
--header 'Content-Type: application/json' \
--header 'X-TYPESENSE-API-KEY: xyz' \
--data '{
"name": "test",
"fields": [
{"name": "name", "type": "string[]"},
{"name": "resort", "type": "int32[]", "facet": true}
]
}'
Insert
curl --location '' \
--header 'Content-Type: application/json' \
--header 'X-TYPESENSE-API-KEY: xyz' \
--data '{
"name": ["la bouitte"],
"resort": [419,422]
}'
Search
curl --location --globoff '' \
--header 'X-TYPESENSE-API-KEY: xyz'
Response:
{
"facet_counts": [],
"found": 0,
"hits": [],
"out_of": 1,
"page": 1,
"request_params": {
"collection_name": "test",
"per_page": 10,
"q": "buitte"
},
"search_cutoff": false,
"search_time_ms": 0
}
Yoann
12:23 PM
buitte l and for la buitte, but (as expected, since l has no exact match) not for l buitte
Kishore Nallan
12:23 PM