# community-help
j
Question: is it possible to set "drop_tokens_threshold" = 0, while still allowing for typo tolerance? It seems that when it is set to 0, typo tolerance is disabled as well. How can I limit results so that a matched document must (fuzzily) contain all terms in the query (not necessarily in order, or in the same attribute)?
Example query: 
davis lorum ipsum
With the example query, all documents that include "davis" are matched, despite having no mention of "lorum" or "ipsum" in any of the document attributes. No matter how many non-matching words I add at the end of the query, it still returns documents that only match the first word. How can I limit results so that a matched document must (fuzzily) contain all terms in the query (not necessarily in order, or in the same attribute)? e.g.
{name: 'davis', class: 'lorum etc...', 'notes': 'call ipsum'}
 should match.
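For concreteness, the failing search could be expressed with parameters like these (the collection and field names here are assumptions, not from the thread):

```python
# Hypothetical Typesense search parameters illustrating the question.
# Intent: with drop_tokens_threshold set to 0, no query token should
# ever be dropped, so a hit should have to contain all three terms.
search_params = {
    "q": "davis lorum ipsum",
    "query_by": "name,class,notes",   # assumed attribute names
    "drop_tokens_threshold": 0,       # never drop tokens from the query
}

# The document Joe expects to match:
doc = {"name": "davis", "class": "lorum etc...", "notes": "call ipsum"}
```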
k
@Joe Recent 0.22.0 RC builds should have addressed the issue you've noticed. Can you please try against the
typesense/typesense:0.22.0.rcs25
Docker build.
j
@Kishore Nallan I used the DEB package. Is the RC build available as a DEB package? I will test on Docker for now.
j
@Kishore Nallan Is there a difference between rcs22 and rcs25?
k
Yes: https://dl.typesense.org/releases/0.22.0.rcs25/typesense-server-0.22.0.rcs25-amd64.deb We keep fixing small edge cases that we encounter, and last-mile performance regressions, as we head to the final GA build.
👍 1
j
Just tested on Docker, and it looks like it's working as desired!
k
Awesome! I have also posted the DEB build above.
👍 1
j
And one more thing: what is the process for upgrading the DEB package? I can't find any docs on the site.
k
Since 0.22 is not GA yet, the docs are on a branch and not published yet.
We already have some customers using 0.22 rc builds on production, so it's stable to use and that's how we are addressing last mile edge cases on some of the new features.
j
Noticed an issue while setting "drop_tokens_threshold = 0": it will only search within one attribute. E.g., assuming {name: 'jim', last: 'baker'}, a search for "jim baker" returns 0 results (with query_by = "name,last").
So while typo tolerance does work within a given attribute, it no longer matches across multiple attributes, even without typos.
k
Once you set
drop_tokens_threshold: 0
you are telling Typesense not to drop any tokens from the query string. So Typesense will look for fields that contain both tokens
jim
and
baker
.
The parameter works at a per-field level.
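That per-field behavior can be sketched in a few lines of Python. This is a simplified simulation (exact token matching, no typo tolerance), not Typesense's actual matching code:

```python
def field_matches_all_tokens(field_value: str, query: str) -> bool:
    """Simplified model of drop_tokens_threshold=0: a single field
    must contain every query token for that field to match."""
    tokens = query.lower().split()
    words = set(field_value.lower().split())
    return all(t in words for t in tokens)

doc = {"name": "jim", "last": "baker"}

# No single field contains both "jim" and "baker",
# so the document as a whole produces no hit:
hit = any(field_matches_all_tokens(v, "jim baker") for v in doc.values())
# hit -> False
```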
j
I see. Is there any way to not drop tokens, while still searching across all attributes? (See my earlier message re the example query:
davis lorum ipsum
)
k
One way to make that happen is if you just have a composite field where you just concatenate all the text from other fields and then do a strict search match on that field.
j
The idea being: retain all the normal search functionality, with the additional requirement that all tokens must (fuzzily) exist somewhere in the document.
k
You can still search against the composite fields but have Typesense highlight other regular fields.
Using the
highlight_fields
parameter during search.
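Put together, the search request would query only the composite field but highlight the original fields. A sketch of the parameters (field names are assumptions):

```python
# Query the aggregated field strictly, but highlight the
# original fields so snippets still come from name/class/notes.
search_params = {
    "q": "davis lorum ipsum",
    "query_by": "composite",                 # strict all-tokens match here
    "drop_tokens_threshold": 0,
    "highlight_fields": "name,class,notes",  # display-side highlighting
}
```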
j
I suppose that could work, though it seems very inefficient, duplicating all the data.
k
The Typesense index works at a field level, and all fields are queried independently, so it has no way of knowing the global matching sequence.
j
Gotcha. One hacky method I thought of is filtering results client-side, such that the number of highlighted fields == query.words.length, but that would be tricky with pagination, and not very efficient.
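That client-side filter could be sketched like this. The hit shape loosely follows Typesense's search response (hits with a `highlights` list containing `matched_tokens`), but treat the details as assumptions:

```python
def keep_full_matches(hits: list, query: str) -> list:
    """Client-side post-filter: keep only hits whose highlights
    together cover every token of the query."""
    n_tokens = len(query.split())
    kept = []
    for hit in hits:
        matched = set()
        for h in hit.get("highlights", []):
            for tok in h.get("matched_tokens", []):
                matched.add(tok.lower())
        if len(matched) >= n_tokens:
            kept.append(hit)
    return kept

hits = [
    # matches only "davis" -> should be dropped
    {"highlights": [{"field": "name", "matched_tokens": ["davis"]}]},
    # matches all three tokens across fields -> should be kept
    {"highlights": [
        {"field": "name", "matched_tokens": ["davis"]},
        {"field": "class", "matched_tokens": ["lorum"]},
        {"field": "notes", "matched_tokens": ["ipsum"]},
    ]},
]
full = keep_full_matches(hits, "davis lorum ipsum")
# full -> only the second hit
```

As noted in the thread, this breaks down with pagination: a page of N results may shrink after filtering, forcing extra round trips.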
k
Without an aggregated field index, the other problem is how to drop tokens from the query: whether it should be left to right, right to left, or from the middle, because Typesense has no way to tell the semantic meaning of the words. So without an aggregated index, covering the word combinations from the query would mean doing many repeated searches with various combinations. A composite field avoids this issue and gives that option to people who need it.
j
Will consider that. Is it always the case that `_text_match` will be higher for a document that has more matching tokens? If so, I could just terminate the search as soon as the first result with insufficient tokens is encountered.
k
If you have no need to search individual fields, you don't even need them in the schema. Just have the composite field in the schema, and Typesense will allow you to highlight ANY field even if it is not part of the schema, since all fields are stored on disk. There is no overhead with this approach.
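So the schema can be kept minimal: the composite field for searching, plus only the fields still needed for filtering. A sketch (collection and field names are assumptions):

```python
# Minimal schema sketch: search goes through "composite";
# "class" is kept in the schema only because it is filtered on.
# Other fields stay out of the schema but remain highlightable,
# since Typesense stores the whole document on disk.
schema = {
    "name": "people",
    "fields": [
        {"name": "composite", "type": "string"},
        {"name": "class", "type": "string"},  # kept for filter_by
    ],
}

field_names = [f["name"] for f in schema["fields"]]
```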
j
Good idea. I do need to filter on some fields, but perhaps I can aggregate the others.
k
Yes
_text_match
will be higher for documents with a better match, both in the number of tokens found and in how near they are to each other (proximity).