#community-help

Handling Two-Word Queries with Custom Separators

TLDR Dima proposes adding a parameter to the API for handling two-word queries. Jason suggests opening a GitHub issue for the feature request.

Jun 14, 2023
Dima
08:04 PM
Hi team! We found that for our dataset we don’t want to use drop_tokens_threshold > 0 when the query has only two words. Two-word queries are usually meaningful only if both words are present in the result. Right now we use something like drop_tokens_threshold = query.split(' ').length <= 2 ? 0 : 1, but I’m unsure about our simple tokenizer. Maybe this would be a good parameter to add to the API directly?
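A minimal TypeScript sketch of the heuristic described above, assuming words are separated by whitespace only. It splits on runs of whitespace so that repeated or trailing spaces don't inflate the word count. The helper names countWords and chooseDropTokensThreshold and the "title" field are hypothetical; drop_tokens_threshold and query_by are Typesense search parameters, and a drop_tokens_threshold of 0 disables token dropping.

```typescript
// Hypothetical helpers illustrating the heuristic from the message above.
// Assumption: words are separated by whitespace only (no custom separators).

function countWords(query: string): number {
  // Trim, then split on runs of whitespace so "a  b" counts as 2 words, not 3.
  return query
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0).length;
}

function chooseDropTokensThreshold(query: string): number {
  // Queries of up to two words must match every word, so disable token
  // dropping (0); longer queries keep the value 1 from the original snippet.
  return countWords(query) <= 2 ? 0 : 1;
}

const q = "running shoes";
const searchParams = {
  q,
  query_by: "title", // hypothetical field name
  drop_tokens_threshold: chooseDropTokensThreshold(q),
};

console.log(searchParams.drop_tokens_threshold); // prints 0
```

Note that the original comparison, length < 2, is only true for one-word queries, so it would still allow token dropping for two-word ones; comparing with <= 2 matches the stated intent.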
Jason
08:09 PM
The space-based approach should work for all languages except those that don’t use spaces between words.
Could you open a GitHub issue with this feature request?
Dima
08:10 PM
Unfortunately, we have custom separators for some collections.
Jason
08:12 PM
Ah hmm, good point. I guess you could split on any of those custom separators as well, in addition to space…
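A rough TypeScript sketch of that suggestion: treat whitespace plus a collection's custom separator characters as word boundaries before counting words. It assumes each separator is a single character (as with the token_separators option in a Typesense collection schema) and that the caller passes the same list the searched collection uses. The function names are hypothetical. It only approximates the server's tokenizer. It ignores other normalization rules and still counts a phrase written without spaces as a single word.

```typescript
// Hypothetical helper: split a query on whitespace and on custom separators.
// Assumption: each separator is a single character, matching the list
// configured on the collection being searched.
function tokenize(query: string, separators: string[] = []): string[] {
  const separatorSet = new Set(separators);
  const tokens: string[] = [];
  let current = "";

  for (const ch of query) {
    if (/\s/.test(ch) || separatorSet.has(ch)) {
      // Boundary reached: flush the current word, skipping empty ones
      // produced by consecutive separators.
      if (current.length > 0) tokens.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.length > 0) tokens.push(current);
  return tokens;
}

function chooseDropTokensThreshold(query: string, separators: string[] = []): number {
  // Disable token dropping (0) for queries of up to two words, as proposed
  // in the thread; otherwise keep the value 1.
  return tokenize(query, separators).length <= 2 ? 0 : 1;
}

console.log(tokenize("wi-fi router", ["-"]).join(" | ")); // wi | fi | router
console.log(chooseDropTokensThreshold("wi-fi router", ["-"])); // 1
console.log(chooseDropTokensThreshold("wi-fi router")); // 0
```

Passing the separator list explicitly keeps the helper independent of any client library. The same query can then be counted differently per collection. With ["-"], "wi-fi router" counts as three words, so token dropping stays enabled.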