Prefix Matching Issues in Typesense
TLDR: Toby hit a prefix-matching issue where results with identical text_match scores come back in an unexpected order. Jason suggested opening a GitHub issue for this bug, which Toby did.
Aug 16, 2021 (29 months ago)
Toby
04:44 PM
I’m having some trouble getting prefix matches to work as I expect; with a small corpus of 10 documents with names starting with `John`, a prefix search for `John W` correctly shows `John Williams` with a `text_match` higher than the other Johns. But if you start the second word with the same letter that the first word starts with, i.e. `John J`, the `text_match` is the same for all results and therefore the order isn’t what you’d expect:
```
curl "" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "X-TYPESENSE-API-KEY: xyz" \
  -d '{
    "name": "johns",
    "fields": [
      {"name": "name", "type": "string" }
    ]
  }'

curl "" -X POST \
  -H "Content-Type: application/json" \
  -H "X-TYPESENSE-API-KEY: xyz" \
  -d '{ "id": "1", "name": "John Stark" }
{ "id": "1", "name": "John Atwood" }
{ "id": "2", "name": "John Smith" }
{ "id": "3", "name": "John Johnson" }
{ "id": "4", "name": "John Williams" }
{ "id": "5", "name": "John Brown" }
{ "id": "6", "name": "John Jones" }
{ "id": "7", "name": "John Garcia" }
{ "id": "8", "name": "John Miller" }
{ "id": "9", "name": "John Keller" }
{ "id": "10", "name": "John Davis" }'
```
```
curl -H "X-TYPESENSE-API-KEY: xyz" \
  "http://localhost:8108/collections/johns/documents/search\
?q=John%20W&query_by=name&per_page=3"
{"facet_counts":[],"found":11,"hits":[{"document":{"id":"4","name":"John Williams"},"highlights":[{"field":"name","matched_tokens":["John","Williams"],"snippet":"<mark>John</mark> <mark>Williams</mark>"}],"text_match":50225924},{"document":{"id":"10","name":"John Davis"},"highlights":[{"field":"name","matched_tokens":["John"],"snippet":"<mark>John</mark> Davis"}],"text_match":33514496},{"document":{"id":"9","name":"John Keller"},"highlights":[{"field":"name","matched_tokens":["John"],"snippet":"<mark>John</mark> Keller"}],"text_match":33514496}],"out_of":11,"page":1,"request_params":{"collection_name":"johns","per_page":3,"q":"John W"},"search_time_ms":8}

curl -H "X-TYPESENSE-API-KEY: xyz" \
  "http://localhost:8108/collections/johns/documents/search\
?q=John%20J&query_by=name&per_page=3"
{"facet_counts":[],"found":11,"hits":[{"document":{"id":"10","name":"John Davis"},"highlights":[{"field":"name","matched_tokens":["John"],"snippet":"<mark>John</mark> Davis"}],"text_match":50226176},{"document":{"id":"9","name":"John Keller"},"highlights":[{"field":"name","matched_tokens":["John"],"snippet":"<mark>John</mark> Keller"}],"text_match":50226176},{"document":{"id":"8","name":"John Miller"},"highlights":[{"field":"name","matched_tokens":["John"],"snippet":"<mark>John</mark> Miller"}],"text_match":50226176}],"out_of":11,"page":1,"request_params":{"collection_name":"johns","per_page":3,"q":"John J"},"search_time_ms":14}
```
Any ideas? Thanks!
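For anyone who wants to check for the same symptom programmatically, here is a minimal Python sketch, not part of the original message. It rebuilds the search URL from the curl commands above and reports which hits share the top `text_match` score. The helper names (`build_search_url`, `fetch_hits`, `top_score_ties`) are made up for illustration, and the sample hits are trimmed copies of the two responses above with only the fields the check needs.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8108"  # host and port from the search commands in the thread
API_KEY = "xyz"                     # placeholder key from the thread


def build_search_url(query, collection="johns", query_by="name", per_page=3):
    """Build the search URL used in the thread's curl commands."""
    params = urllib.parse.urlencode(
        {"q": query, "query_by": query_by, "per_page": per_page},
        quote_via=urllib.parse.quote,  # encode spaces as %20 rather than +
    )
    return f"{BASE_URL}/collections/{collection}/documents/search?{params}"


def fetch_hits(query):
    """Run the search against a live server and return its list of hits."""
    request = urllib.request.Request(
        build_search_url(query), headers={"X-TYPESENSE-API-KEY": API_KEY}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["hits"]


def top_score_ties(hits):
    """Return the names of all hits that share the highest text_match score."""
    if not hits:
        return []
    best = max(hit["text_match"] for hit in hits)
    return [hit["document"]["name"] for hit in hits if hit["text_match"] == best]


# Scores copied from the two responses in the thread (highlights omitted).
JOHN_W_HITS = [
    {"document": {"id": "4", "name": "John Williams"}, "text_match": 50225924},
    {"document": {"id": "10", "name": "John Davis"}, "text_match": 33514496},
    {"document": {"id": "9", "name": "John Keller"}, "text_match": 33514496},
]
JOHN_J_HITS = [
    {"document": {"id": "10", "name": "John Davis"}, "text_match": 50226176},
    {"document": {"id": "9", "name": "John Keller"}, "text_match": 50226176},
    {"document": {"id": "8", "name": "John Miller"}, "text_match": 50226176},
]

if __name__ == "__main__":
    print(build_search_url("John J"))
    print(top_score_ties(JOHN_W_HITS))  # a single clear winner
    print(top_score_ties(JOHN_J_HITS))  # every returned hit is tied
```

With a server running, `top_score_ties(fetch_hits("John J"))` would run the same check against live results. A tie across every returned hit is the symptom described above: the ranking then depends on something other than `text_match`.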
Jason
07:06 PM
Really appreciate the default minimal reproducible test case! 🙏
Toby
07:47 PM
Jason
08:27 PM
Similar Threads
Resolving Typesense Search Issues
Maximilian asked about Typesense search behavior. Kishore Nallan and Mike discussed the issue and suggested a workaround, and Kishore Nallan promised an official solution soon. No final resolution was confirmed.
Querying Typesense with Different Search String Orders
Jesper asked why Typesense returns different search results for "X Y" and "Y X". Kishore Nallan clarified that Typesense treats the last word as a prefix query, which explains the discrepancy.
Adjusting Text Match Score Calculation in Typesense
Johannes wanted to modify the text match score calculation in Typesense to improve search results. With advice from Jason and Kishore Nallan, several solutions were proposed, including creating a GitHub issue, trying different parameters, and updating Docker to a new version.