Array Field Autocomplete Issue in Typesense Migration
TLDR Kanwei encountered issues with autocomplete when migrating from Elasticsearch to Typesense. Jason and Kishore Nallan identified it as a bug and instructed Kanwei to create a GitHub issue.
Mar 17, 2023 (9 months ago)
We have a "company" schema with company_name and an array field called subsidiary_names with the company's subsidiaries.
We have an autocomplete/prefix query searching on both company_name and subsidiary_names. It works, except that when searching on subsidiary_names, it seems to commingle the entries. For example, if there's a company with subsidiaries of ["Hawaii Electric", "Oahu Power"] and you search for "Hawaii Power", both entries get matched.
Also, it seems like Typesense will match as long as there's a single match. For example, if the company name is "Hawaii Electric" but you search for "hawaii electric asdfasdf", it still considers it a match. Any way to change this behavior? Elasticsearch doesn't do this by default.
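For reference, the setup described above might look roughly like the following. This is a sketch only: the field types are assumed from the description (a string field plus a string array), and the document's company_name value is made up for illustration.

```python
# Assumed reconstruction of the "company" schema described in the message;
# the actual schema was not posted in the thread.
company_schema = {
    "name": "company",
    "fields": [
        {"name": "company_name", "type": "string"},
        {"name": "subsidiary_names", "type": "string[]"},
    ],
}

# Example document for the reported case. "Example Corp" is a placeholder.
example_doc = {
    "company_name": "Example Corp",
    "subsidiary_names": ["Hawaii Electric", "Oahu Power"],
}

# Prefix (autocomplete) search across both fields.
search_params = {
    "q": "Hawaii Power",
    "query_by": "company_name,subsidiary_names",
    "prefix": "true",
}
```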
Could you expand on what you mean by “both entries” here? Do you mean it’s matching records with company_name as `Hawaii Power` and also records where the subsidiary field has `Hawaii Power` in it?
This behavior is controlled by `drop_tokens_threshold`, which is set to `1` by default. If you set it to 0, it will give you the behavior you’re describing. Documented under this table here: https://typesense.org/docs/0.24.0/api/search.html#typo-tolerance-parameters
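As a rough sketch of that suggestion, the search request with `drop_tokens_threshold` set to 0 could be built like this. The helper name `build_search_path` is hypothetical, and the collection name is assumed to be "company" as in the original question; only the query parameters themselves come from the Typesense search API.

```python
from urllib.parse import urlencode


def build_search_path(collection, q, query_by, drop_tokens_threshold=0):
    """Build the path and query string for a Typesense search request.

    Hypothetical helper for illustration; it only assembles the URL and
    does not send anything to a server.
    """
    params = {
        "q": q,
        "query_by": query_by,
        "prefix": "true",
        # 0 disables dropping of query tokens when few results are found.
        "drop_tokens_threshold": drop_tokens_threshold,
    }
    return f"/collections/{collection}/documents/search?{urlencode(params)}"


path = build_search_path(
    "company",
    "hawaii electric asdfasdf",
    "company_name,subsidiary_names",
)
print(path)
```

With the threshold at 0, a query such as "hawaii electric asdfasdf" should no longer fall back to matching on a subset of its tokens.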
Mar 20, 2023 (9 months ago)
Want to make sure I understand the issue fully
Mar 21, 2023 (9 months ago)
Kishore Nallan 11:15 AM
`foo bar` could end up being stored in 2 different elements of a single field and currently we are unable to account for this case.
Can you please create a github issue here: https://github.com/typesense/typesense/issues? This is a bug and we need to fix it.
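To illustrate the difference being discussed, here is a simplified sketch that contrasts matching query tokens anywhere in an array field with matching them inside a single element. It is not Typesense's implementation; it uses whole-word, case-insensitive matching only and ignores prefix matching and typo tolerance.

```python
def tokens(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def matches_across_elements(query, values):
    """True if every query token appears somewhere in the field,
    regardless of which array element it came from."""
    field_tokens = {t for v in values for t in tokens(v)}
    return all(q in field_tokens for q in tokens(query))


def matches_within_one_element(query, values):
    """True only if a single array element contains every query token."""
    return any(
        all(q in tokens(v) for q in tokens(query))
        for v in values
    )


subsidiaries = ["Hawaii Electric", "Oahu Power"]

# The reported behavior: tokens from different elements combine into a match.
print(matches_across_elements("Hawaii Power", subsidiaries))     # True

# The expected behavior: no single subsidiary is named "Hawaii Power".
print(matches_within_one_element("Hawaii Power", subsidiaries))  # False
```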
Querying and Indexing Multiple Elements Issues
Krish queried fields with multiple elements, and Kishore Nallan suggested checking `drop_tokens_threshold`. Krish wanted to force OR mode for tokens, but Kishore Nallan said the feature was missing. Krish resolved the issue with URL encoding.
Troubleshooting Issues with DocSearch Hits and Scraper Configuration
Rubai encountered issues with search result priorities and ellipsis. Jason helped debug the issue and suggested using different versions of typesense-docsearch.js, updating initialization parameters, and running the scraper on a Linux-based environment. The issues related to hits structure and scraper configuration were resolved.
Phrase Match Problem in Typesense Version 0.24.0rcn25
Robert was unsure about correct phrase match usage in Typesense. After providing Kishore Nallan with necessary data, Kishore Nallan was able to replicate the issue. Robert shared a Github link for further tracking, where Kishore Nallan responded later.
Issues with Repeated Words and Hyphen Queries in Typesense API
JinW discusses issues with repeated word queries and hyphen-containing queries in Typesense. Kishore Nallan offers possible solutions. During the discussion, Mr seeks advice on `token_separators` and how to send custom headers. Issues remain with repeated word queries.
Issue with Search Duration on Typesense Database
Robert reported an issue about query time delay when adding a `filter_by` constraint in a large Typesense database. Kishore Nallan explained that this happens due to the order of operation and also promised to look into this issue further. Robert withdrew his interest in sponsoring the improvement due to moving from the project.