Troubleshooting Typo Tolerance Issue with Typesense for Korean
TLDR Minyong informed Kishore Nallan about a typo tolerance issue in Typesense with Korean text. Kishore Nallan suggested adjusting the byte difference limit for Korean, but warned this could slow down the search function. Minyong approved testing the solution.
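A minimal sketch of why a byte-based difference limit can treat Korean and English differently. It is not Typesense's implementation. It only assumes tokens are compared as UTF-8 bytes, as the mention of a "byte difference limit" suggests. The helper `length_gap` is hypothetical, and the Korean strings (a name with and without the honorific suffix 님) are stand-ins for the query and field tokens in the thread.

```python
def length_gap(query_token: str, field_token: str) -> tuple[int, int]:
    """Return (character gap, UTF-8 byte gap) between two tokens."""
    char_gap = abs(len(query_token) - len(field_token))
    byte_gap = abs(
        len(query_token.encode("utf-8")) - len(field_token.encode("utf-8"))
    )
    return char_gap, byte_gap


# English: two extra characters cost two extra bytes.
english = length_gap("Kevin", "Kev")

# Korean: one extra Hangul syllable costs three extra bytes in UTF-8.
korean = length_gap("김철수님", "김철수")

print(english)  # (2, 2)
print(korean)   # (1, 3)
```

Under that assumption, a single extra Hangul syllable already produces a larger byte gap than the two-character English difference, which is consistent with the suggestion to relax the limit for Korean.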
Oct 19, 2022 (14 months ago)
Minyong
08:36 AM Hello! I am using Typesense for a search app. Our records have fields that mix English and Korean text. I am trying to make the search as lenient as possible to increase recall; in that sense, typo tolerance is very important. However, typo tolerance doesn't seem to work well for Korean text. Could you take a look at this reproducible example?
https://gist.github.com/minyonglee/d0129025d04192d8f09f236f4d11165b
Kishore Nallan
08:49 AM
Minyong
08:52 AM
q: '김철수님 부트캠프'
field: '김철수 부트캠프'
result: matched_tokens: [ '부트캠프' ]
expected: matched_tokens: [ '김철수님', '부트캠프' ]
English works well:
q: 'Kevin Jordan'
field: 'Kev Jordan'
result: matched_tokens: [ 'Kev', 'Jordan' ]
expected: matched_tokens: [ 'Kev', 'Jordan' ]
Kishore Nallan
09:14 AM 김철수님 to be a prefix match against 김철수?
Kishore Nallan
09:16 AM
Minyong
09:28 AM
Kishore Nallan
09:31 AM
Kishore Nallan
09:32 AM
Minyong
09:52 AM