#community-help

Discussing Prefix-Match for Multiple Tokens

TLDR Sidharth asked if prefix matching for separate tokens was possible and Kishore Nallan explained why it would be computationally intensive. Kishore Nallan then suggested an ngram solution which seemed to satisfy Sidharth's need.

Aug 08, 2023 (1 month ago)
Sidharth
10:47 AM
Hello Kishore Nallan
Is there a way to apply prefix matching to each query token separately?
e.g. query -> rel fut
expected result: document -> Reliance Future
Kishore Nallan
10:56 AM
No, this is not possible, primarily because implementing it would be very computationally intensive. Each prefix could match tens of words. If two prefixes each match ten words, there are 10 x 10 = 100 combinations to evaluate.
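To make the arithmetic concrete, here is a minimal sketch of how the number of candidate combinations multiplies when every query token is expanded as a prefix. The vocabulary and the function names are hypothetical and only illustrate the growth; this is not how Typesense itself is implemented.

```python
from itertools import product

# Hypothetical vocabulary, for illustration only.
VOCABULARY = ["reliance", "reliable", "relay", "future", "futures", "futile", "fund"]


def expand_prefix(prefix, vocabulary):
    """Return every vocabulary word that starts with the given prefix."""
    return [word for word in vocabulary if word.startswith(prefix)]


def candidate_combinations(query, vocabulary):
    """Expand each query token as a prefix and pair up all the expansions."""
    expansions = [expand_prefix(token, vocabulary) for token in query.split()]
    return list(product(*expansions))


combos = candidate_combinations("rel fut", VOCABULARY)
print(len(combos))  # 3 expansions for "rel" x 3 for "fut" = 9
```

With a real index, where each prefix may expand to tens of words, every extra query token multiplies the count again.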
Sidharth
11:01 AM
Is there any way to solve this, in case someone has faced a similar issue?
Kishore Nallan
11:25 AM
Should they always be adjacent words?
Kishore Nallan
11:25 AM
Infix search will help, but again that's exhaustive, so it won't support high concurrency.
Kishore Nallan
11:54 AM
The only workaround I can think of is storing n-grams (the leading prefixes) of the words in an array. So for reliance industries you will store:

[r, re, rel, reli, relia, relian, relianc, reliance, i, in, ind, ...]
Kishore Nallan
11:55 AM
This way rel ind will produce results fast. This might work for you since you are perhaps indexing stock symbols? There are not many companies so this should not take too much memory.
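A minimal sketch of generating that array, assuming words are split on whitespace and lowercased. The function names are hypothetical, and the in-memory match only imitates what an exact (non-prefix) token lookup against the stored array would do.

```python
def prefix_ngrams(text):
    """Return every leading prefix of each whitespace-separated word, lowercased."""
    grams = []
    for word in text.lower().split():
        for end in range(1, len(word) + 1):
            grams.append(word[:end])
    return grams


def matches_all_tokens(query, grams):
    """True when every query token appears as a whole entry in the stored prefixes."""
    stored = set(grams)
    return all(token in stored for token in query.lower().split())


grams = prefix_ngrams("Reliance Industries")
print(grams[:8])
# ['r', 're', 'rel', 'reli', 'relia', 'relian', 'relianc', 'reliance']
print(matches_all_tokens("rel ind", grams))  # True
print(matches_all_tokens("rel fut", grams))  # False
```

Because each query token is looked up as a complete entry, no prefix expansion is needed at query time; the cost moves to index size instead.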
Sidharth
03:07 PM
Kishore Nallan
With the infix parameter, for
e.g. query -> rel fut
and document -> Reliance-Future

will it match both rel and fut in the document?
Sidharth
03:48 PM
Kishore Nallan
How many input tokens will be searched when infix is used?
e.g. query -> rel fut
Will both rel and fut be searched in the fields?
Kishore Nallan
03:52 PM
> will it match both rel & fut in the document?
Yes it will.
Kishore Nallan
03:53 PM
You can play around with it to get a feel. There are some additional details under the infix parameter here: https://typesense.org/docs/0.24.1/api/search.html#search-parameters
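For anyone trying this out, a rough sketch of what the schema and search parameters might look like with infix enabled. The collection name and field name are hypothetical; the field-level `infix` flag and the `infix` search parameter are the ones described in the linked docs, so check them against the version you run.

```python
# Hypothetical collection and field names, for illustration only.
schema = {
    "name": "companies",
    "fields": [
        # Infix search has to be enabled on the field in the schema.
        {"name": "company_name", "type": "string", "infix": True},
    ],
}

search_parameters = {
    "q": "rel fut",
    "query_by": "company_name",
    # "always" runs infix search for this field; the default is "off".
    "infix": "always",
}
```

These dictionaries would be passed to whichever Typesense client you use when creating the collection and running the search.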
Sidharth
03:54 PM
To how many tokens will infix be applied?
Sidharth
03:54 PM
Is it possible to connect over a short huddle?
Kishore Nallan
03:55 PM
All tokens
Sidharth
03:57 PM
In our scenario, we are not getting the top matches based on both keywords
Sidharth
03:58 PM
Ideally, for the example below,
e.g. query -> rel fut
and document -> Reliance-Future
Kishore Nallan
04:03 PM
Difficult to say without looking at the overall dataset, your schema, your query, etc. We do community support in a public Slack channel so that other users looking for similar information can find this conversation helpful. Here's more info if you need private/prioritized support when self-hosting: https://typesense.org/support/
Sidharth
04:07 PM
Sure, thanks Kishore Nallan
Sidharth
04:09 PM
One last query, Kishore Nallan
Is there any feature to apply prefix matching to all the query tokens?
Sidharth
05:03 PM
Kishore Nallan
Can you please guide me further on the ngram solution?
Aug 09, 2023 (1 month ago)
Kishore Nallan
05:33 AM
The ngram solution is pretty much what I've described above. You generate ngrams of the words in a field, store them as a string array field in Typesense, and then search on that field.
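A rough sketch of how the pieces could fit together, assuming a hypothetical `companies` collection with a `name_prefixes` string array field. The names and the helper functions are illustrative only; turning off query-time prefix matching is a tuning assumption, not something stated in this thread.

```python
def word_prefixes(text):
    """Leading prefixes of each whitespace-separated word, lowercased."""
    return [w[:i] for w in text.lower().split() for i in range(1, len(w) + 1)]


# Hypothetical collection and field names, for illustration only.
schema = {
    "name": "companies",
    "fields": [
        {"name": "company_name", "type": "string"},
        {"name": "name_prefixes", "type": "string[]"},
    ],
}


def build_document(doc_id, company_name):
    """Attach the generated prefixes to the document before indexing it."""
    return {
        "id": doc_id,
        "company_name": company_name,
        "name_prefixes": word_prefixes(company_name),
    }


search_parameters = {
    "q": "rel fut",
    "query_by": "name_prefixes",
    # Assumption: prefixes are already stored, so each token can be matched
    # as a whole word instead of being expanded again at query time.
    "prefix": "false",
}

doc = build_document("1", "Reliance Future")
print(doc["name_prefixes"])
```

The schema, document, and search parameters are plain dictionaries here, so they can be handed to whichever Typesense client you use. Note that a hyphenated name such as Reliance-Future would need to be split on the hyphen first for fut to end up in the array.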
Aug 10, 2023 (1 month ago)
Sidharth
05:53 AM
Kishore Nallan
Thanks a lot for suggesting a great solution. Most likely it will solve the problem in our use-case.