Hi! We're trying to migrate from elasticsearch to ...
# community-help
k
Hi! We're trying to migrate from Elasticsearch to Typesense. One problem I'm running into is this: we have a "company" schema with company_name and an array field called subsidiary_names holding the company's subsidiaries. We have an autocomplete/prefix query that searches on both company_name and subsidiary_names. It works, except that when searching on subsidiary_names, it seems to commingle the entries. For example, if there's a company with subsidiaries of ["Hawaii Electric", "Oahu Power"] and you search for "Hawaii Power", both entries get matched. Also, it seems like Typesense will match as long as there's a single matching token. For example, if the company name is "Hawaii Electric" but you search for "hawaii electric asdfasdf", it still considers it a match. Is there any way to change this behavior? Elasticsearch doesn't do this by default.
j
For example, if there’s a company with subsidiaries of [“Hawaii Electric”, “Oahu Power”] and you search for “Hawaii Power” both entries get matched
Could you expand on what you mean by “both entries” here? Do you mean it’s matching records with company_name as “Hawaii Power” and also records where the subsidiary field has “Hawaii Power” in it?
For example, if the company name is “Hawaii Electric” but you search for “hawaii electric asdfasdf” it still considers it a match. Any way to change this behavior?
This behavior is controlled by drop_tokens_threshold, which is set to 1 by default. If you set it to 0, it will give you the behavior you’re describing. It's documented in the table here: https://typesense.org/docs/0.24.0/api/search.html#typo-tolerance-parameters
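As a rough illustration of the suggestion above, here is a minimal sketch that builds a search request URL with drop_tokens_threshold set to 0. The host, the collection name "companies", and the helper name build_search_url are assumptions made for this example; only the field names and the drop_tokens_threshold parameter come from the thread. The request is constructed but not sent.

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, query):
    """Build a Typesense search URL that disables token dropping.

    base_url and collection are hypothetical values for illustration.
    """
    params = {
        "q": query,
        # Fields from the thread's "company" schema.
        "query_by": "company_name,subsidiary_names",
        # Autocomplete-style search: treat the last token as a prefix.
        "prefix": "true",
        # 0 means query tokens are not dropped to find more results.
        "drop_tokens_threshold": 0,
    }
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


url = build_search_url("http://localhost:8108", "companies", "hawaii electric asdfasdf")
print(url)
```

An actual request would also need the API key sent in the X-TYPESENSE-API-KEY header.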
k
oh thanks for drop_tokens_threshold
@Jason Bosco The first question is basically this: I want each entry in the array of subsidiaries to be totally independent instead of commingled
let's say the parent company is "Western Corp" and the two subsidiaries are ["Hawaii Electric", "Oahu Power"]
I want to match "Hawaii electric", but not "hawaii power"
right now a single query can match across both "Hawaii Electric" and "Oahu Power"
because it seems all the entries in subsidiaries are just concatenated
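To make the desired semantics concrete, here is a small sketch of per-element matching: every query token must be found within the same array element, with the last token treated as a prefix for autocomplete. This is only an illustration of the behavior being asked for, not how Typesense works; the function names are hypothetical.

```python
def tokens(text):
    """Lowercase and split on whitespace (a simplified tokenizer)."""
    return text.lower().split()


def element_matches(query, element):
    """True if all query tokens match inside this single element.

    All tokens except the last must match a word exactly; the last
    token only needs to be a prefix of some word (autocomplete style).
    """
    query_tokens = tokens(query)
    element_tokens = tokens(element)
    if not query_tokens:
        return False
    *full_tokens, last_token = query_tokens
    return all(t in element_tokens for t in full_tokens) and any(
        word.startswith(last_token) for word in element_tokens
    )


def matches_any_element(query, elements):
    """True if at least one array element matches the whole query."""
    return any(element_matches(query, element) for element in elements)


subsidiaries = ["Hawaii Electric", "Oahu Power"]
print(matches_any_element("Hawaii electric", subsidiaries))  # True
print(matches_any_element("hawaii power", subsidiaries))     # False
```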
j
Setting drop_tokens_threshold to 0 will also address this
k
Yeah, I thought it might do that, but it doesn't work
@Jason Bosco I searched for "saxon incorporated" and both "saxon business systems" and "palo alto incorporated" were matched
@Jason Bosco Sorry to bug you again but any thoughts on this? It's our last remaining blocker to potentially migrate from ES
j
Could you clone this code snippet, adapt it with some sample records from your dataset, then update the search query to replicate the issue: https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b I want to make sure I understand the issue fully.
k
j
CC: @Kishore Nallan ^
k
Ah, I see what's happening here. Our index for arrays is at an aggregate field level, so the tokens of a phrase like "foo bar" could end up being stored in 2 different elements of a single field, and currently we are unable to account for this case. Can you please create a GitHub issue here: https://github.com/typesense/typesense/issues? This is a bug and we need to fix it.
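The effect described above can be reproduced with a toy inverted index that records tokens per field rather than per array element. This is a simplified model written for illustration and makes no claim about Typesense's actual internals; the function names and the sample document ID are hypothetical.

```python
from collections import defaultdict


def build_field_level_index(docs, field):
    """Map each token to the IDs of documents containing it anywhere
    in the array field. Element boundaries are not recorded."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for element in doc[field]:
            for token in element.lower().split():
                index[token].add(doc_id)
    return index


def search_all_tokens(index, query):
    """Return IDs of documents that contain every query token."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*postings) if postings else set()


docs = {1: {"subsidiary_names": ["Hawaii Electric", "Oahu Power"]}}
index = build_field_level_index(docs, "subsidiary_names")

# "hawaii" and "power" live in different elements, but the index only
# knows they both occur in the field, so the document still matches.
print(search_all_tokens(index, "hawaii power"))  # {1}
```

Because the element boundary is lost at indexing time, requiring all tokens to match (for example with drop_tokens_threshold set to 0) is not enough in this model to keep array entries independent.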
k