# community-help
d
Hi team! We found that for our dataset we don’t want to use `drop_tokens_threshold > 0` when the query has only two words. Usually, two-word queries are meaningful only if both words are present in the result. Right now we use something like `drop_tokens_threshold = query.split(' ').length <= 2 ? 0 : 1`, but I’m unsure about our simple tokenizer. Maybe this would be a good parameter to add to the API directly?
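A slightly more robust client-side version of that check could look like the sketch below. Splitting on runs of whitespace, rather than on a single `' '`, avoids counting empty strings when a query contains repeated or leading/trailing spaces. The helper name `dropTokensThresholdFor` and the `title` field are hypothetical, and the sketch still assumes that words are separated by whitespace only.

```javascript
// Hypothetical helper: pick a drop_tokens_threshold value for a search request.
// Assumption: query words are separated by whitespace only.
function dropTokensThresholdFor(query) {
  // Split on runs of whitespace and drop empty strings left by extra spaces.
  const words = query.trim().split(/\s+/).filter((w) => w.length > 0);
  // Up to two words: 0 disables token dropping, so every word must match.
  // Longer queries keep the value 1 used in the snippet above.
  return words.length <= 2 ? 0 : 1;
}

// Example search parameters; "title" is a hypothetical field name.
const searchParameters = {
  q: "red shoes",
  query_by: "title",
  drop_tokens_threshold: dropTokensThresholdFor("red shoes"),
};
console.log(searchParameters.drop_tokens_threshold); // 0
```

With `"red  shoes"` (two spaces), `split(' ')` would report three parts, while this version still counts two words.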
j
The space-based approach should work for all languages except those that don’t use spaces between words.
Could you open a GitHub issue with this feature request?
d
Unfortunately we have custom separators for some collections
j
Ah hmm, good point. I guess you could split on any of those custom separators as well, in addition to space…
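Splitting on custom separators in addition to whitespace could be sketched as follows. This is a client-side approximation, not Typesense’s own tokenizer; it assumes the `separators` array mirrors the characters configured for the collection (for example via its `token_separators` setting). The helper names `countTokens` and `dropTokensThresholdFor` are hypothetical.

```javascript
// Escape characters that are special inside a RegExp character class.
function escapeForCharClass(ch) {
  return ch.replace(/[\\\]\[^-]/g, "\\$&");
}

// Count query tokens, treating whitespace plus each custom separator as a boundary.
// Assumption: `separators` lists the collection's custom separator characters.
function countTokens(query, separators = []) {
  const charClass = "\\s" + separators.map(escapeForCharClass).join("");
  const splitter = new RegExp(`[${charClass}]+`);
  return query.split(splitter).filter((t) => t.length > 0).length;
}

// Up to two tokens: disable token dropping (0); otherwise use 1.
function dropTokensThresholdFor(query, separators = []) {
  return countTokens(query, separators) <= 2 ? 0 : 1;
}

console.log(countTokens("wi-fi router", ["-"])); // 3
console.log(dropTokensThresholdFor("wi-fi router", ["-"])); // 1
console.log(dropTokensThresholdFor("wi-fi router")); // 0
```

The same query can therefore get a different threshold per collection, depending on which separators that collection uses.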