# community-help
j
Question: is it possible to set "drop_tokens_threshold" = 0, while still allowing for typo tolerance? It seems that when it is set to 0, typo tolerance is disabled as well. How can I limit results so that a matched document must (fuzzily) contain all terms in the query (not necessarily in order, or in the same attribute)?
Example query: 
davis lorum ipsum
With the example query, all documents that include "davis" are matched, despite having no mention of "lorum" or "ipsum" in any of the document attributes. No matter how many non-matching words I add at the end of the query, it still returns documents that only match the first word. How can I limit results so that a matched document must (fuzzily) contain all terms in the query (not necessarily in order, or in the same attribute)? e.g.
{name: 'davis', class: 'lorum etc...', 'notes': 'call ipsum'}
 should match.
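For concreteness, the failing search could be expressed with parameters like these (the collection and field names here are assumptions, not from the thread):

```python
# Hypothetical Typesense search parameters illustrating the question.
# Intent: with drop_tokens_threshold set to 0, no query token should
# ever be dropped, so a hit should have to contain all three terms.
search_params = {
    "q": "davis lorum ipsum",
    "query_by": "name,class,notes",   # assumed attribute names
    "drop_tokens_threshold": 0,       # never drop tokens from the query
}

# The document Joe expects to match:
doc = {"name": "davis", "class": "lorum etc...", "notes": "call ipsum"}
```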
k
@Joe Recent 0.22.0 RC builds should have addressed the issue you've noticed. Can you please try against the
typesense/typesense:0.22.0.rcs25
Docker build.
j
@Kishore Nallan I used the DEB package. Is the RC build available as a DEB package? I will test on Docker for now.
j
@Kishore Nallan Is there a difference between rcs22 and rcs25?
k
Yes: https://dl.typesense.org/releases/0.22.0.rcs25/typesense-server-0.22.0.rcs25-amd64.deb We keep fixing small edge cases that we encounter, and last-mile performance regressions, as we head to the final GA build.
👍 1
j
Just tested on Docker, and it looks like it's working as desired!
k
Awesome! I have also posted the DEB build above.
👍 1
j
And one more thing: what is the process for upgrading the DEB package? I can't find any docs on the site.
k
Since 0.22 is not GA yet, the docs are on a branch and not published yet.
We already have some customers using 0.22 rc builds on production, so it's stable to use and that's how we are addressing last mile edge cases on some of the new features.
j
Noticed an issue while setting "drop_tokens_threshold = 0": it will only search within one attribute. E.g., assuming {name: 'jim', last: 'baker'}, a search for "jim baker" returns 0 results (with query_by = "name,last").
So while typo tolerance does work within a given attribute, it no longer matches across multiple attributes, even without typos.
k
Once you set
drop_tokens_threshold: 0
you are telling Typesense not to drop any tokens from the query string. So Typesense will look for fields that contain both tokens
jim
and
baker
.
The parameter works at a per-field level.
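That per-field behavior can be sketched in a few lines of Python. This is a simplified simulation (exact token matching, no typo tolerance), not Typesense's actual matching code:

```python
def field_matches_all_tokens(field_value: str, query: str) -> bool:
    """Simplified model of drop_tokens_threshold=0: a single field
    must contain every query token for that field to match."""
    tokens = query.lower().split()
    words = set(field_value.lower().split())
    return all(t in words for t in tokens)

doc = {"name": "jim", "last": "baker"}

# No single field contains both "jim" and "baker",
# so the document as a whole produces no hit:
hit = any(field_matches_all_tokens(v, "jim baker") for v in doc.values())
# hit -> False
```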
j
I see. Is there any way to not drop tokens, while still searching across all attributes? (See my earlier message re the example query:
davis lorum ipsum
)
k
One way to make that happen is if you just have a composite field where you just concatenate all the text from other fields and then do a strict search match on that field.
j
The idea being: retain all the normal search functionality, with the additional requirement that all tokens must (fuzzily) exist somewhere in the document.
k
You can still search against the composite fields but have Typesense highlight other regular fields.
Using the
highlight_fields
parameter during search.
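Put together, the search request would query only the composite field but highlight the original fields. A sketch of the parameters (field names are assumptions):

```python
# Query the aggregated field strictly, but highlight the
# original fields so snippets still come from name/class/notes.
search_params = {
    "q": "davis lorum ipsum",
    "query_by": "composite",                 # strict all-tokens match here
    "drop_tokens_threshold": 0,
    "highlight_fields": "name,class,notes",  # display-side highlighting
}
```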
j
I suppose that could work, though it seems very inefficient, duplicating all the data.
k
The Typesense index works at a field level, and all fields are queried independently, so it has no way of knowing the global matching sequence.
j
Gotcha. One hacky method I thought of is filtering results client-side, such that the number of highlighted fields == query.words.length, but that would be tricky with pagination, and not very efficient.
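That client-side filter could be sketched like this. The hit shape loosely follows Typesense's search response (hits with a `highlights` list containing `matched_tokens`), but treat the details as assumptions:

```python
def keep_full_matches(hits: list, query: str) -> list:
    """Client-side post-filter: keep only hits whose highlights
    together cover every token of the query."""
    n_tokens = len(query.split())
    kept = []
    for hit in hits:
        matched = set()
        for h in hit.get("highlights", []):
            for tok in h.get("matched_tokens", []):
                matched.add(tok.lower())
        if len(matched) >= n_tokens:
            kept.append(hit)
    return kept

hits = [
    # matches only "davis" -> should be dropped
    {"highlights": [{"field": "name", "matched_tokens": ["davis"]}]},
    # matches all three tokens across fields -> should be kept
    {"highlights": [
        {"field": "name", "matched_tokens": ["davis"]},
        {"field": "class", "matched_tokens": ["lorum"]},
        {"field": "notes", "matched_tokens": ["ipsum"]},
    ]},
]
full = keep_full_matches(hits, "davis lorum ipsum")
# full -> only the second hit
```

As noted in the thread, this breaks down with pagination: a page of N results may shrink after filtering, forcing extra round trips.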
k
Without an aggregated field index, the other problem is how to drop tokens from the query: whether it should be left to right, right to left, or from the middle, because Typesense has no way to tell the semantic meaning of the words. So without an aggregated index, covering the word combinations from the query would mean doing many repeated searches with various combinations. A composite field avoids this issue and gives that option to people who need it.
j
Will consider that. Is it always the case that `_text_match` will be higher for a document that has more matching tokens? If so, I could just terminate the search as soon as the first result with insufficient tokens is encountered.
k
If you have no need to search individual fields, you don't even need them in the schema. Just have the composite field in the schema, and Typesense will allow you to highlight ANY field even if it is not part of the schema, since all fields are stored on disk. There is no overhead with this approach.
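So the schema can be kept minimal: the composite field for searching, plus only the fields still needed for filtering. A sketch (collection and field names are assumptions):

```python
# Minimal schema sketch: search goes through "composite";
# "class" is kept in the schema only because it is filtered on.
# Other fields stay out of the schema but remain highlightable,
# since Typesense stores the whole document on disk.
schema = {
    "name": "people",
    "fields": [
        {"name": "composite", "type": "string"},
        {"name": "class", "type": "string"},  # kept for filter_by
    ],
}

field_names = [f["name"] for f in schema["fields"]]
```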
j
Good idea. I do need to filter on some fields, but perhaps I can aggregate the others.
k
Yes
_text_match
will be higher for documents with a better match, both in the number of tokens found and in how near they are to each other (proximity).