#community-help

Array Field Autocomplete Issue in Typesense Migration

TLDR Kanwei encountered issues with autocomplete when migrating from Elasticsearch to Typesense. Jason and Kishore Nallan identified it as a bug and instructed Kanwei to create a GitHub issue.

Mar 17, 2023 (9 months ago)
Kanwei
02:03 PM
Hi! We're trying to migrate from Elasticsearch to Typesense. One problem I'm running into is this:

We have a "company" schema with company_name and an array field called subsidiary_names with the company's subsidiaries.

We have an autocomplete/prefix query searching on both company_name and subsidiary_names. It works, except that when searching on subsidiary_names, it seems to commingle the entries. For example, if there's a company with subsidiaries of ["Hawaii Electric", "Oahu Power"] and you search for "Hawaii Power", both entries get matched.

Also, it seems like Typesense will match as long as there's a single matching token. For example, if the company name is "Hawaii Electric" but you search for "hawaii electric asdfasdf", it still considers it a match. Is there any way to change this behavior? Elasticsearch doesn't do this by default.
Jason
04:35 PM
> For example, if there’s a company with subsidiaries of [“Hawaii Electric”, “Oahu Power”] and you search for “Hawaii Power” both entries get matched
Could you expand on what you mean by “both entries” here? Do you mean it’s matching records with company_name as Hawaii Power and also records where the subsidiary field has Hawaii Power in it?
Jason
04:36 PM
> For example, if the company name is “Hawaii Electric” but you search for “hawaii electric asdfasdf” it still considers it a match. Any way to change this behavior?
This behavior is controlled by drop_tokens_threshold, which is set to 1 by default. If you set it to 0, it will give you the behavior you’re describing. It's documented in the table here: https://typesense.org/docs/0.24.0/api/search.html#typo-tolerance-parameters
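A minimal sketch of what such a search request might look like, assuming a collection named `companies` (a hypothetical name, not given in the thread). It only builds the query string for Typesense's `GET /collections/{collection}/documents/search` endpoint and does not contact a server; `q`, `query_by`, `prefix`, and `drop_tokens_threshold` are the search parameters discussed above.

```python
from urllib.parse import urlencode


def build_search_params(query: str) -> dict:
    """Build search parameters for an autocomplete query on both fields."""
    return {
        "q": query,
        # The two fields from the "company" schema described in the thread.
        "query_by": "company_name,subsidiary_names",
        # Treat the last query token as a prefix (autocomplete behavior).
        "prefix": "true",
        # 0 disables dropping query tokens when few results are found.
        "drop_tokens_threshold": 0,
    }


params = build_search_params("hawaii electric asdfasdf")
query_string = urlencode(params)

# "companies" is an assumed collection name, used here for illustration only.
search_path = f"/collections/companies/documents/search?{query_string}"
print(search_path)
```

With `drop_tokens_threshold` at 0, a query such as "hawaii electric asdfasdf" should no longer fall back to matching on a subset of its tokens.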
Kanwei
05:29 PM
oh thanks for drop_tokens_threshold

Kanwei
05:53 PM
Jason, the first question is basically this: I want each entry in the array of subsidiaries to be totally independent instead of commingled
Kanwei
05:54 PM
> let's say the parent company is "Western Corp" and two subsidiaries are [“Hawaii Electric”, “Oahu Power”]
Kanwei
05:55 PM
I want to match "Hawaii electric", but not "hawaii power"
Kanwei
05:55 PM
right now it will match both hawaii electric and oahu power
Kanwei
05:55 PM
because it seems all the entries in subsidiaries are just concatenated
Jason
06:06 PM
Setting drop_tokens_threshold to 0 will also address this
Kanwei
06:30 PM
Yeah, I thought it might do that, but it doesn't work
Kanwei
06:46 PM
[Image 1: attachment not preserved]
Kanwei
06:47 PM
Jason I searched for "saxon incorporated" and both "saxon business systems" and "palo alto incorporated" were matched
Kanwei
06:47 PM
[Image 1: attachment not preserved]
Mar 20, 2023 (9 months ago)
Kanwei
03:27 PM
Jason Sorry to bug you again but any thoughts on this? It's our last remaining blocker to potentially migrate from ES
Jason
04:29 PM
Could you clone this code snippet, adapt it with some sample records from your dataset, then update the search query to replicate the issue: https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b

Want to make sure I understand the issue fully
Mar 21, 2023 (9 months ago)
Jason
03:47 AM
CC: Kishore Nallan ^
Kishore Nallan
11:15 AM
Ah, I see what's happening here. Our index for arrays is at an aggregate field level, so the words of a phrase like foo bar could end up being stored in 2 different elements of a single field, and currently we are unable to account for this case.

Can you please create a GitHub issue here: https://github.com/typesense/typesense/issues? This is a bug and we need to fix it.
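The behavior described above can be illustrated with a toy model. This is only a sketch of the idea, not Typesense's actual index: `field_level_match` pools the tokens of every array element into one set (the aggregate, field-level view), while `element_level_match` requires all query tokens to prefix-match within a single element. Both function names are hypothetical.

```python
def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def field_level_match(query: str, elements: list[str]) -> bool:
    """Toy aggregate index: tokens from all array elements are pooled together."""
    field_tokens = {tok for el in elements for tok in tokenize(el)}
    return all(
        any(ft.startswith(q) for ft in field_tokens) for q in tokenize(query)
    )


def element_level_match(query: str, elements: list[str]) -> bool:
    """Stricter check: every query token must prefix-match inside ONE element."""
    q_tokens = tokenize(query)
    return any(
        all(any(t.startswith(q) for t in tokenize(el)) for q in q_tokens)
        for el in elements
    )


subsidiaries = ["Hawaii Electric", "Oahu Power"]

# Pooled tokens let "hawaii" and "power" match across two different elements.
print(field_level_match("hawaii power", subsidiaries))       # True
# Requiring a single element to contain all tokens rejects the mixed query.
print(element_level_match("hawaii power", subsidiaries))     # False
print(element_level_match("hawaii electric", subsidiaries))  # True
```

Until the bug is fixed, a check like `element_level_match` could in principle be applied client-side to filter returned hits, though that is an assumption on our part and not something suggested in the thread.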
Kanwei
03:41 PM
