Token Priorities and Infix Search in Typesense Multi-word Queries
TLDR Sidharth asked how to build a multi-word query with token priority in Typesense. Kishore Nallan explained why results were fetched only for the last word, which is treated as a prefix, and suggested infix search and data modelling as potential solutions. However, Kishore Nallan emphasized that infix doesn't support multiple words and is only recommended for small datasets.
Jan 03, 2023 (9 months ago)
Sidharth
06:52 AM
Is there a way to apply token priority to a multi-word query?
E.g., for a query like "rel ind", we currently get results only for the token "ind".
Please point us to a parameter that returns results matching the different keywords, with token priority: "rel" gets first priority, then "ind", and so on.
Kishore Nallan
06:55 AM
Does `rel` actually match a word in the dataset?
Kishore Nallan
06:55 AM
Sidharth
07:04 AM
`rel` is a prefix of a word in the database, e.g. reliance.
Kishore Nallan
07:22 AM
Sidharth
07:41 AM
Can you suggest parameters that do the following:
• Results matching more of the query words get priority.
◦ E.g., for the query `rel ind`, prioritize results where both "rel" and "ind" are matched.
• Further, `rel` gets priority over `ind`.
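To make the requested ordering concrete, here is a minimal client-side sketch in Python. It is not a Typesense feature or parameter; the helpers `token_priority_key` and `rank_by_token_priority` and the sample names are hypothetical. It ranks candidates first by how many query tokens prefix-match a word, then gives earlier query tokens priority over later ones.

```python
def token_priority_key(query: str, text: str) -> tuple:
    """Sort key: more matched tokens first, then earlier tokens win ties."""
    tokens = query.lower().split()
    words = text.lower().split()
    # One flag per query token: does it prefix-match any word in the text?
    flags = [any(w.startswith(t) for w in words) for t in tokens]
    matched = sum(flags)
    # Lists compare left to right, so a match on an earlier token sorts first.
    return (-matched, [not f for f in flags])


def rank_by_token_priority(query: str, candidates: list) -> list:
    return sorted(candidates, key=lambda text: token_priority_key(query, text))


# Hypothetical sample data, for illustration only.
names = ["INDIAN OIL CORP", "RELIANCE POWER", "RELIANCE INDUSTRIES LTD"]
ranked = rank_by_token_priority("rel ind", names)
print(ranked)
```

Here the name matching both tokens comes first, followed by the one matching only `rel`, then the one matching only `ind`.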
Sidharth
08:52 AM
Kishore Nallan
09:41 AM
`infix` search option, but that won't be very fast.
Kishore Nallan
09:42 AM
You probably need to think about how you can model your data to achieve what you want. It will be difficult for me to advise you on the modelling unless I understand your use case better.
Sidharth
09:49 AM
Sidharth
10:14 AM
Can we apply `infix` to multiple words in a query?
Currently, for the example query `rel ind`, we get the match and highlights below, in which infix is applied only to the first token:
'highlights': [{'field': 'tradingSymbol',
'matched_tokens': ['RELIANCE'],
'snippet': '<mark>RELIANCE</mark>'},
{'field': 'name',
'matched_tokens': ['RELIANCE'],
'snippet': '<mark>RELIANCE</mark> INDUSTRIES LTD'},
{'field': 'synonymField',
'matched_tokens': ['Reliance', 'Reliance', 'Reliance'],
As you can see, the infix operation is not applied to `ind` in the output.
We want infix applied to the subsequent words as well, as shown below:
'snippet': '<mark>RELIANCE</mark> <mark>INDUSTRIES</mark> LTD'}
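For reference, the request shape under discussion can be sketched roughly as below. The field names are taken from the highlight output quoted in the message above; the collection name `instruments` and the field types are assumptions. The sketch relies on the `infix` field property in the schema and the per-field `infix` search parameter (`off`, `always`, `fallback`) and only builds the payloads, without contacting a server.

```python
# Hypothetical schema: only the field names come from the thread.
schema = {
    "name": "instruments",  # assumed collection name
    "fields": [
        {"name": "tradingSymbol", "type": "string", "infix": True},
        {"name": "name", "type": "string", "infix": True},
        {"name": "synonymField", "type": "string[]", "infix": True},  # assumed type
    ],
}

query_by = [field["name"] for field in schema["fields"]]
search_params = {
    "q": "rel ind",
    "query_by": ",".join(query_by),
    # One infix mode per query_by field, in the same order.
    "infix": ",".join("always" for _ in query_by),
}
print(search_params)
```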
Kishore Nallan
10:22 AM
Sidharth
10:27 AM
Kishore Nallan
11:16 AM
The infix search is meant for searching identifiers such as model numbers, so it only searches on the first word in the query. This is why the highlight is not working as expected.
Taking a step back, I think the best way to achieve what you want is to generate the 2-3 character combinations of tokens yourself and index them in a separate array field, which you can then include in your `query_by` field list.
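A minimal sketch of that suggestion in Python, assuming "combinations" means contiguous 2-3 character substrings (n-grams) of each token. The helper `char_ngrams` and the field name `name_ngrams` are hypothetical and not part of Typesense.

```python
def char_ngrams(text: str, sizes=(2, 3)) -> list:
    """Return the unique lowercase character n-grams of every word in text."""
    grams = []
    seen = set()
    for word in text.lower().split():
        for n in sizes:
            # Slide a window of length n across the word.
            for i in range(len(word) - n + 1):
                gram = word[i:i + n]
                if gram not in seen:
                    seen.add(gram)
                    grams.append(gram)
    return grams


doc = {"name": "RELIANCE INDUSTRIES LTD"}
# Hypothetical extra array field to list in query_by alongside "name".
doc["name_ngrams"] = char_ngrams(doc["name"])
print(doc["name_ngrams"][:5])
```

With a field like this, each token of a query such as `rel ind` could match an indexed value directly, at the cost of a larger index.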
Kishore Nallan
11:17 AM
Sidharth
11:47 AM
So Typesense currently does not support infix matching on multiple words, right?
Kishore Nallan
12:12 PM
Similar Threads
Discussing Prefix-Match for Multiple Tokens
Sidharth asked if prefix matching for separate tokens was possible and Kishore Nallan explained why it would be computationally intensive. Kishore Nallan then suggested an ngram solution which seemed to satisfy Sidharth's need.
Phrase Search Relevancy and Weights Fix
Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.
Querying and Indexing Multiple Elements Issues
Krish asked about querying fields with multiple elements, and Kishore Nallan suggested checking `drop_tokens_threshold`. Krish wished to force OR mode for tokens, but Kishore Nallan admitted the feature was missing. Krish was able to resolve the issue with URL encoding.
Understanding Indexing and Search-As-You-Type In Typesense
Steven had queries about indexing and search-as-you-type in Typesense. Jason clarified that bulk updates are faster and search-as-you-type is resource intensive but worth it. The discussion also included querying benchmarks and Typesense's drop_tokens_threshold parameter, with participation from bnfd.
Issues with Repeated Words and Hyphen Queries in Typesense API
JinW discusses issues with repeated word queries and hyphen-containing queries in Typesense. Kishore Nallan offers possible solutions. During the discussion, Mr seeks advice on `token_separators` and how to send custom headers. Issues remain with repeated word queries.