#community-help

Token Priorities and Infix Search in Typesense Multi-word Queries

TLDR Sidharth sought guidance on building a multi-word query with token priority in Typesense. Kishore Nallan explained that only the last word of a query is prefix-searched, and suggested infix search and data modelling as potential solutions. However, Kishore Nallan emphasized that infix search doesn't support multiple words and is only recommended for small datasets.

Jan 03, 2023
Sidharth
06:52 AM
Hello folks,
Is there a way to apply token priority to a multi-word query?
E.g., for a query like "rel ind", we currently get results only for the token "ind".

Please suggest the parameter(s) that would give results
matching the different keywords,
and
with token priority: "rel" gets 1st priority, then "ind", and so on.
Kishore Nallan
06:55 AM
Does rel actually match a word in the dataset?
Kishore Nallan
06:55 AM
Typesense does a prefix search only on the last word in the query.
Sidharth
07:04 AM
"rel" is a prefix of a word in the database,
e.g. "reliance".
Kishore Nallan
07:22 AM
Yes, so that's why it's not matching. Only the last word is prefix-searched, since that's what is useful in a typeahead autocomplete use case.
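The rule above can be illustrated with a toy model. This is a hypothetical simplification, not Typesense's implementation: the helper names are made up, tokens are just lowercased and split on whitespace, and typo tolerance, token dropping and ranking are ignored. Every query word except the last must equal a document token exactly; only the last word may match as a prefix.

```python
def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a simplification of real tokenization)."""
    return text.lower().split()


def matches_last_word_prefix(query: str, document: str) -> dict[str, bool]:
    """Toy model (hypothetical): report, per query word, whether it matches.

    Non-final words must equal a document token; the final word may match
    any token it is a prefix of.
    """
    doc_tokens = tokenize(document)
    words = tokenize(query)
    result = {}
    for i, word in enumerate(words):
        if i == len(words) - 1:
            # Last word: prefix match, as in typeahead autocomplete.
            result[word] = any(tok.startswith(word) for tok in doc_tokens)
        else:
            # Earlier words: whole-token match only.
            result[word] = word in doc_tokens
    return result


print(matches_last_word_prefix("rel ind", "RELIANCE INDUSTRIES LTD"))
# {'rel': False, 'ind': True}
print(matches_last_word_prefix("ind rel", "RELIANCE INDUSTRIES LTD"))
# {'ind': False, 'rel': True}
```

Swapping the word order flips which word matches, which is why "rel" only finds "reliance" when it is the last word. In Typesense itself, this last-word behavior is governed by the `prefix` search parameter, which defaults to true.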
Sidharth
07:41 AM
For our use case we don't want the typeahead feature.

Can you suggest parameters that do the following:
• Results matching more of the query words get priority.
◦ E.g., for the query "rel ind", prioritize results where both "rel" and "ind" match.
• Further, "rel" gets priority over "ind".
Sidharth
08:52 AM
Kishore Nallan, could you please guide us on the above use case?
Kishore Nallan
09:41 AM
If you want all parts of a query to match, then you have to use the infix search option, but that won't be very fast.
Kishore Nallan
09:42 AM
There's no way to prioritize one word in a query over another.

You probably need to think about how you can model your data so that you can achieve what you want. It will be difficult for me to advise you on the modelling unless I understand your use case better.
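The infix search option mentioned above has to be enabled in two places: on the field in the collection schema, and per `query_by` field in the search request. Below is a minimal sketch of both pieces as plain Python dicts, assuming a Typesense version with infix support. The collection name `stocks` is hypothetical; the field names are borrowed from the highlight output later in this thread. The query is a single word because, as this thread goes on to establish, infix only applies to the first word.

```python
import json

# Hypothetical collection schema. Infix search only works on fields that
# are declared with "infix": True when the collection is created.
schema = {
    "name": "stocks",  # hypothetical collection name
    "fields": [
        {"name": "tradingSymbol", "type": "string", "infix": True},
        {"name": "name", "type": "string", "infix": True},
    ],
}

query_by = ["tradingSymbol", "name"]

search_parameters = {
    "q": "rel",
    "query_by": ",".join(query_by),
    # One infix mode per query_by field: "off" (default), "always" or "fallback".
    "infix": ",".join("always" for _ in query_by),
}

print(json.dumps(search_parameters))
# With the official Python client, these would be used roughly as:
#   client.collections.create(schema)
#   client.collections["stocks"].documents.search(search_parameters)
```

`fallback` is the cheaper middle ground: the infix pass only runs when the regular search returns no results.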
Sidharth
09:49 AM
OK, sure.
Sidharth
10:14 AM
Hello Kishore Nallan,
Can we apply infix search to multiple words in a query?
Currently, for the example "rel ind", we get a match with the highlight below, in which infix is applied only to the first token:
'highlights': [{'field': 'tradingSymbol',
     'matched_tokens': ['RELIANCE'],
     'snippet': '<mark>RELIANCE</mark>'},
    {'field': 'name',
     'matched_tokens': ['RELIANCE'],
     'snippet': '<mark>RELIANCE</mark> INDUSTRIES LTD'},
    {'field': 'synonymField',
     'matched_tokens': ['Reliance', 'Reliance', 'Reliance'],

But as you can see, infix is not applied to "ind" in the output.
We want infix to be applied to the subsequent words as well, as shown below:
'snippet': '<mark>RELIANCE</mark> <mark>INDUSTRIES</mark> LTD'}
Kishore Nallan
10:22 AM
I will have to check on that and get back to you.
Sidharth
10:27 AM
Sure, thanks
Kishore Nallan
11:16 AM
Sidharth

Infix search is meant for searching identifiers like model numbers, so it actually searches only on the first word in the query. This is why the highlight is not working as expected.

Taking a step back, I think the best way for you to achieve what you want is to generate those 2-3 char combinations of tokens yourself and index them in a separate array field which you can include as part of your query_by field list.
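One reading of this suggestion is to index the leading 2 and 3 characters of every token (edge n-grams) in an extra `string[]` field, so that a short query word like "rel" becomes a whole token to match. A minimal sketch, assuming that interpretation; the helper `edge_ngrams` and the field name `name_prefixes` are hypothetical.

```python
def edge_ngrams(text: str, min_len: int = 2, max_len: int = 3) -> list[str]:
    """Return the distinct leading substrings of each token, from min_len
    up to max_len characters, lowercased and in first-seen order."""
    grams: list[str] = []
    for token in text.lower().split():
        # Tokens shorter than min_len produce no prefixes.
        for n in range(min_len, min(max_len, len(token)) + 1):
            prefix = token[:n]
            if prefix not in grams:
                grams.append(prefix)
    return grams


# Extra schema field holding the generated prefixes (hypothetical name).
prefix_field = {"name": "name_prefixes", "type": "string[]"}

doc = {"name": "RELIANCE INDUSTRIES LTD"}
doc["name_prefixes"] = edge_ngrams(doc["name"])
print(doc["name_prefixes"])  # ['re', 'rel', 'in', 'ind', 'lt', 'ltd']

# Include the prefix field in query_by alongside the original field.
search_parameters = {"q": "rel ind", "query_by": "name_prefixes,name"}

# Every word of "rel ind" is now an exact token of the prefix field.
print(all(w in doc["name_prefixes"] for w in search_parameters["q"].split()))  # True
```

The trade-off is a larger index in exchange for avoiding the infix scan; query words longer than 3 characters still have to match the `name` field itself.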
Kishore Nallan
11:17 AM
The other option is for us to add a feature to use prefix search against all the tokens in the query. Happy to discuss the specifics of that on DM.
Sidharth
11:47 AM
Just to confirm my understanding:
Currently, Typesense does not support infix matching on multiple words, right?
Kishore Nallan
12:12 PM
Yes, correct. Infix search is an O(N) operation, so it's meant for very specific cases on small datasets. We don't recommend it for high-traffic or large-data use cases.
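To see why infix search scales linearly while prefix search does not, here is a toy comparison. It is not how Typesense stores its index: a sorted Python list searched with `bisect` stands in for a prefix-friendly index structure, and the vocabulary is hypothetical.

```python
import bisect

# Hypothetical vocabulary of indexed tokens, kept sorted for prefix lookups.
vocabulary = sorted(["industries", "infosys", "ltd", "reliance", "relaxo", "tata"])


def prefix_matches(prefix: str) -> list[str]:
    """Binary-search to the first token >= prefix, then walk forward while
    tokens still start with it: O(log N + k) for k matches."""
    i = bisect.bisect_left(vocabulary, prefix)
    matches = []
    while i < len(vocabulary) and vocabulary[i].startswith(prefix):
        matches.append(vocabulary[i])
        i += 1
    return matches


def infix_matches(fragment: str) -> list[str]:
    """A fragment can occur anywhere inside a token, so sorting does not
    help and every token must be checked: O(N) in the vocabulary size."""
    return [token for token in vocabulary if fragment in token]


print(prefix_matches("rel"))  # ['relaxo', 'reliance']
print(infix_matches("li"))    # ['reliance']
```

All tokens sharing a prefix sit next to each other in sorted order, so a prefix lookup only touches that run. A substring has no such locality, so the infix lookup must visit every token, and that cost grows with the dataset and with query traffic.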

Similar Threads

Discussing Prefix-Match for Multiple Tokens

Sidharth asked if prefix matching for separate tokens was possible and Kishore Nallan explained why it would be computationally intensive. Kishore Nallan then suggested an ngram solution which seemed to satisfy Sidharth's need.

Phrase Search Relevancy and Weights Fix

Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.

Querying and Indexing Multiple Elements Issues

Krish had issues querying fields with multiple elements, and Kishore Nallan suggested checking `drop_tokens_threshold`. Krish wanted to force OR mode for tokens, but Kishore Nallan admitted the feature was missing. Krish was able to resolve the issue with URL encoding.

Understanding Indexing and Search-As-You-Type In Typesense

Steven had queries about indexing and search-as-you-type in Typesense. Jason clarified that bulk updates are faster and search-as-you-type is resource intensive but worth it. The discussion also included querying benchmarks and Typesense's drop_tokens_threshold parameter, with participation from bnfd.

Issues with Repeated Words and Hyphen Queries in Typesense API

JinW discusses issues with repeated word queries and hyphen-containing queries in Typesense. Kishore Nallan offers possible solutions. During the discussion, Mr seeks advice on `token_separators` and how to send custom headers. Issues remain with repeated word queries.
