#community-help

Troubleshooting "drop_tokens_threshold" and Typo Tolerance in Typesense

TLDR: Joe found that setting "drop_tokens_threshold" to 0 also seemed to disable typo tolerance in Typesense. Kishore Nallan pointed to the 0.22.0 RC builds, which fixed this, and explained that the parameter works at a per-field level, suggesting a composite field for requiring all query tokens across fields.

Nov 07, 2021 (26 months ago)
Joe
12:51 AM
Question: is it possible to set "drop_tokens_threshold" = 0, while still allowing for typo tolerance? It seems that when it is set to 0, typo tolerance is disabled as well.
How can I limit results, such that the matched document must (fuzzy) contain all terms in the query (not necessarily in order, or on the same attributes)?
Joe
12:52 AM
Example query: davis lorum ipsum

With the example query, all documents that include "davis" are matched, despite having no mention of "lorum" or "ipsum" in any of the document attributes. No matter how many non-matching words I add at the end of the query, it still returns documents that only match the first word.

How can I limit results, such that the matched document must (fuzzy) contain all terms in the query (not necessarily in order, or on the same attributes)? e.g. {name: 'davis', class: 'lorum etc...', 'notes': 'call ipsum'} should match.
Kishore Nallan
03:04 AM
Joe, recent 0.22.0 RC builds should have addressed the issue you noticed. Can you please try the typesense/typesense:0.22.0.rcs25 Docker build?
Joe
03:28 AM
Kishore Nallan, I used the DEB package. Is the RC build available as a DEB package? I will test on Docker for now.
JinW
03:37 AM
Kishore Nallan, is there a difference between rcs22 and rcs25?
Kishore Nallan
03:38 AM
Yes:

https://dl.typesense.org/releases/0.22.0.rcs25/typesense-server-0.22.0.rcs25-amd64.deb

We keep fixing small edge cases that we encounter, along with last-mile performance regressions, as we head to the final GA build.

Joe
03:40 AM
Just tested on Docker, and it looks like it's working as desired!
Kishore Nallan
03:41 AM
Awesome! I have also posted the DEB build above.

Joe
03:43 AM
And one more thing, what is the process for upgrading the DEB package? I can't find any docs on the site.
Kishore Nallan
03:45 AM
Since 0.22 is not GA yet, the docs are on a branch and not published yet.
Kishore Nallan
03:46 AM
We already have some customers using 0.22 RC builds in production, so it's stable to use, and that's how we are addressing last-mile edge cases on some of the new features.
Joe
03:58 AM
Noticed an issue while setting "drop_tokens_threshold = 0". It will only search within one attribute. E.g. assuming {name: "jim", last: "baker"}, a search for "jim baker" returns 0 results (with query_by = "name,last").
Joe
04:00 AM
So while typo tolerance does work within a given attribute, it no longer matches on multiple attributes, even without typos.
Kishore Nallan
04:03 AM
Once you set drop_tokens_threshold: 0, you are telling Typesense not to drop any tokens from the query string. So Typesense will look for fields that contain both tokens, jim and baker.
Kishore Nallan
04:03 AM
The parameter works at a per-field level.
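To make the per-field behaviour concrete, here is a toy model in Python. It is only an illustration under simplifying assumptions (exact, case-insensitive token matching, no typo tolerance) and is not how Typesense is implemented; the function names are made up for this sketch.

```python
# Toy model only: exact token matching, no typo tolerance, not Typesense internals.
def tokens(text):
    """Split text into lowercase whitespace-separated tokens."""
    return text.lower().split()


def matches_per_field(doc, query, query_by):
    """Return True if any single field in query_by contains every query token."""
    wanted = set(tokens(query))
    return any(
        wanted <= set(tokens(str(doc.get(field, ""))))
        for field in query_by
    )


doc = {"name": "jim", "last": "baker"}

# No single field holds both "jim" and "baker", so nothing matches.
both = matches_per_field(doc, "jim baker", ["name", "last"])

# A one-token query is satisfied by the "name" field alone.
single = matches_per_field(doc, "jim", ["name", "last"])

print(both, single)
```

With no tokens dropped, the two-token query fails because the check runs against each field separately, which is the behaviour Joe observed.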
Joe
04:06 AM
I see. Is there any way to not drop tokens, while still searching across all attributes? (See my earlier message re Example query: davis lorum ipsum )
Kishore Nallan
04:08 AM
One way to make that happen is to have a composite field in which you concatenate all the text from the other fields, and then do a strict search match on that field.
Joe
04:08 AM
The idea being, retain all the normal search functionality, with the additional requirement, that all tokens must (fuzzy) exist (somewhere) on the document.
Kishore Nallan
04:09 AM
You can still search against the composite fields but have Typesense highlight other regular fields.
Kishore Nallan
04:09 AM
Using the highlight_fields parameter during search.
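A minimal sketch of the composite-field approach, in Python. The field name `all_text` and the helper `add_composite_field` are hypothetical names chosen for this example; `q`, `query_by`, `drop_tokens_threshold` and `highlight_fields` are the search parameters discussed in the thread. The parameters are shown as a plain dict, to be passed to whichever Typesense client is in use.

```python
def add_composite_field(doc, source_fields, target="all_text"):
    """Return a copy of doc with the source fields joined into one string field.

    "all_text" is a hypothetical field name, not something Typesense requires.
    """
    parts = [str(doc[field]) for field in source_fields if doc.get(field)]
    return {**doc, target: " ".join(parts)}


doc = {"name": "davis", "class": "lorum etc", "notes": "call ipsum"}

# Build the document to index: the original fields plus the composite field.
indexed = add_composite_field(doc, ["name", "class", "notes"])
print(indexed["all_text"])

# Search only the composite field, keep every token, highlight the regular fields.
search_parameters = {
    "q": "davis lorum ipsum",
    "query_by": "all_text",
    "drop_tokens_threshold": 0,
    "highlight_fields": "name,class,notes",
}
```

Because every token of the query now has to be found in a single field, the per-field limitation no longer gets in the way.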
Joe
04:09 AM
I suppose that could work, though it seems very inefficient, duplicating all the data.
Kishore Nallan
04:11 AM
The Typesense index works at a field level, and all fields are queried independently, so it has no way of knowing the global matching sequence.
Joe
04:13 AM
Gotcha. One hacky method I thought of is filtering results client-side, such that number of highlighted fields == query.words.length, but that would be tricky with pagination, and not very efficient.
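A rough sketch of that client-side filter, in Python. It assumes each hit carries a `highlights` list whose entries have a flat `matched_tokens` list of strings; treat that shape, and the helper names, as assumptions for this example. It counts distinct matched tokens instead of highlighted fields, so that two query words found in the same field still count twice.

```python
def count_matched_tokens(hit):
    """Count distinct tokens reported across all highlighted fields of one hit."""
    found = set()
    for highlight in hit.get("highlights", []):
        for token in highlight.get("matched_tokens", []):
            found.add(token.lower())
    return len(found)


def keep_full_matches(hits, query):
    """Keep only hits that matched at least as many tokens as the query has words."""
    needed = len(set(query.lower().split()))
    return [hit for hit in hits if count_matched_tokens(hit) >= needed]


# Made-up sample hits for illustration.
hits = [
    {"document": {"id": "1"}, "highlights": [
        {"field": "name", "matched_tokens": ["davis"]},
        {"field": "class", "matched_tokens": ["lorum"]},
        {"field": "notes", "matched_tokens": ["ipsum"]},
    ]},
    {"document": {"id": "2"}, "highlights": [
        {"field": "name", "matched_tokens": ["davis"]},
    ]},
]

kept = keep_full_matches(hits, "davis lorum ipsum")
print([hit["document"]["id"] for hit in kept])
```

Since the filtering happens after the server has paginated, pages can come back with fewer results than requested, which is the pagination problem mentioned above.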
Kishore Nallan
04:17 AM
Without an aggregated field index, the other problem is how to drop tokens from the query: left to right, right to left, from the middle, etc., because Typesense has no way to tell the semantic meaning of the words. So without an aggregated index, covering the word combinations from the query would mean doing many repeated searches with various combinations. A composite field works best to avoid this issue and gives that option to people who need it.
Joe
04:18 AM
Will consider that. Is it always the case that "_text_match" will be higher for a document that has more matching tokens? If so, I could just terminate the search as soon as the first result is encountered with insufficient tokens.
Kishore Nallan
04:19 AM
If you have no need to search individual fields, you don't even need them as part of the schema. Just have the composite field in the schema and Typesense will allow you to highlight ANY field even if it is not part of the schema, as all fields are stored on disk. There is no overhead with this approach.
Joe
04:19 AM
good idea. I do need to filter on some fields, but perhaps I can aggregate the others.
Kishore Nallan
04:20 AM
Yes, _text_match will be higher for documents with a better match, both in the number of tokens found and in how near they are to each other in terms of proximity.
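The early-termination idea could be sketched as follows, in Python. It assumes the hits arrive sorted by descending text match score and that a hit with fewer matched tokens never ranks above one with more; the `tokens_found` key and the sample scores are made up for this example.

```python
from itertools import takewhile


def take_until_insufficient(hits, has_all_tokens):
    """Return hits in rank order, stopping at the first one that fails the check."""
    return list(takewhile(has_all_tokens, hits))


# Made-up hits, already ordered by descending text match score.
# "tokens_found" is a hypothetical precomputed count of matched query tokens.
hits = [
    {"id": "1", "text_match": 300, "tokens_found": 3},
    {"id": "2", "text_match": 200, "tokens_found": 3},
    {"id": "3", "text_match": 100, "tokens_found": 1},
    {"id": "4", "text_match": 90, "tokens_found": 1},
]

query_token_count = 3
full = take_until_insufficient(
    hits, lambda hit: hit["tokens_found"] >= query_token_count
)
print([hit["id"] for hit in full])
```

Stopping at the first insufficient hit avoids scanning the rest of the results, but it is only safe while the ordering assumption above holds.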
