Handling Two-Word Queries with Custom Separators
TLDR: Dima proposes adding a parameter to the API for handling two-word queries. Jason suggests opening a GitHub issue for the feature request.
Powered by Struct AI


Solved
Jun 14, 2023 (3 months ago)
Dima
08:04 PM
Hi team! We found that for our dataset we don’t want to use drop_tokens_threshold > 0 when the query has only two words. Two-word queries are usually meaningful only if both words are present in the result. Right now we use something like drop_tokens_threshold = query.split(' ').length <= 2 ? 0 : 1, but I’m unsure about our simple tokenizer. Maybe this would be a good parameter to add to the API directly?
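A minimal TypeScript sketch of the heuristic Dima describes. It assumes the query is split on runs of whitespace instead of a single space, so double spaces or leading and trailing spaces do not inflate the word count. drop_tokens_threshold is a Typesense search parameter. The helper names and the query_by field "title" are hypothetical and used only for illustration.

```typescript
// Hypothetical helper: count the words in a query by splitting on runs of
// whitespace. Unlike query.split(' '), this ignores double spaces and
// leading/trailing spaces, which would otherwise produce empty "words".
function countWords(query: string): number {
  const trimmed = query.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Disable token dropping (0) for queries of up to two words, so every word
// must appear in a result. Longer queries keep a threshold of 1, meaning
// Typesense only drops tokens when the full query returns no results.
function dropTokensThresholdFor(query: string): number {
  return countWords(query) <= 2 ? 0 : 1;
}

// Assemble search parameters for a Typesense search request.
// The query_by field "title" is an assumption for illustration.
function buildSearchParams(query: string) {
  return {
    q: query,
    query_by: "title",
    drop_tokens_threshold: dropTokensThresholdFor(query),
  };
}

console.log(buildSearchParams("red shoes"));
```

With this sketch, "red shoes" gets drop_tokens_threshold 0, so both words must match. "red running shoes" keeps a threshold of 1. Whitespace splitting still treats a query in a language written without spaces as a single word.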
Jason
08:09 PM
The space-based approach should work for all languages that use spaces between words.
Jason
08:09 PM
Could you open a GitHub issue with this feature request?

Dima
08:10 PM
Unfortunately, we have custom separators for some collections.
Jason
08:12 PM
Ah hmm, good point. I guess you could split on any of those custom separators as well, in addition to spaces…
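Following Jason's suggestion, here is a TypeScript sketch that splits on each collection's custom separators in addition to whitespace. It assumes the separators are the characters configured through the collection schema's token_separators option. Each separator is escaped and combined into one regular expression. The function names are hypothetical. This only approximates Typesense's own tokenizer, which may normalize text in other ways.

```typescript
// Escape regular-expression metacharacters so separators such as "." or "+"
// are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical tokenizer: split on whitespace plus any custom separators.
// `separators` is assumed to mirror the collection's token_separators setting.
function tokenize(query: string, separators: string[] = []): string[] {
  const alternatives = [
    "\\s",
    ...separators.filter((sep) => sep.length > 0).map(escapeRegExp),
  ];
  // One or more consecutive separators count as a single boundary.
  const splitter = new RegExp(`(?:${alternatives.join("|")})+`);
  // Drop empty strings left by leading or trailing separators.
  return query.split(splitter).filter((token) => token.length > 0);
}

// Dima's rule applied to these tokens: no token dropping for queries of up
// to two tokens, otherwise a threshold of 1.
function dropTokensThresholdWithSeparators(
  query: string,
  separators: string[] = [],
): number {
  return tokenize(query, separators).length <= 2 ? 0 : 1;
}

console.log(tokenize("t-shirt red", ["-"]));
```

For example, with "-" as a separator, "t-shirt red" yields three tokens (t, shirt, red), so the two-word rule does not apply to it. "red/shoes" with "/" as a separator counts as two tokens.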
Typesense
Lightning-fast, open source search engine for everyone | Knowledge Base powered by Struct.AI
Similar Threads
Discussing Typesense's Tokenization Feature
Roshan seeks to understand Typesense's tokenization feature. Kishore Nallan explains that it tokenizes on spaces and suggests using a special character as a separator.
Solved
Resolving Typesense Search Issues
A conversation started by Maximilian about Typesense search behavior led users Kishore Nallan and Mike to discuss and suggest a workaround, with Kishore Nallan promising an official solution soon. No final confirmation of resolution was provided.
Improving Search Query to Match Multiple Words
Tim wanted to adjust search for multiple-word values. Jason suggested using double quotes or adjusting search settings.
Solved