Arabic Search Highlighting Issue and Exhaustive Mode Timeout
TLDR: Kumail reports problems with highlighting Arabic words and with timeouts in exhaustive mode. Kishore Nallan provides a potential fix for the highlighting but suggests creating a GitHub issue for enabling full highlighting.
May 07, 2023
Kumail
02:00 AM
Kumail
02:16 AM
Kishore Nallan
03:18 AM Exhaustive search is not meant to be used for large datasets.
Kumail
04:08 AM query:
لة ثم دعا فلم يستجب له فأتى عيسى ابن مريم عليه السلام يشكو إل
snippet:
"رجلا منهم اجتهد اربعين ليله ثم دعا فلم يستجب له <mark>فاتي</mark> <mark>عيسي</mark> ابن <mark>مريم</mark> <mark>عليه</mark> السلام يشكو <mark>ال</mark>يه ما هو فيه ويساله الدعاء له فتطهر <mark>عيسي</mark> وصلي ثم"
Several words aren't getting highlighted here, for example:
ابن
Kumail
03:23 PM
Kishore Nallan
03:52 PM The problem was that we were only looking at 175 bytes of the Unicode text, which covers a much shorter string in Arabic because, unlike in English, each character takes more than a single byte. I have fixed this, so it will work in future RC builds. I can produce a fixed RC build in a couple of days.
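To illustrate the byte-versus-character point, here is a minimal Python sketch. It assumes the text is UTF-8 encoded and that the window is a plain 175-byte cut. The helper name `chars_within_byte_window` is hypothetical, and this is not Typesense's actual highlighting code.

```python
# Hypothetical illustration only; not Typesense's implementation.
WINDOW_BYTES = 175  # the byte count mentioned in the thread


def chars_within_byte_window(text: str, window: int = WINDOW_BYTES) -> int:
    """Count how many whole characters of `text` fit in `window` UTF-8 bytes."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode("utf-8"))  # 1 byte for ASCII, 2 for Arabic letters
        if used + size > window:
            break
        used += size
        count += 1
    return count


english = "a" * 300  # ASCII letters: 1 byte each in UTF-8
arabic = "ع" * 300   # Arabic letters: 2 bytes each in UTF-8

print(chars_within_byte_window(english))  # 175
print(chars_within_byte_window(arabic))   # 87
```

Under these assumptions, the same byte budget covers roughly half as many Arabic letters as English ones, which is consistent with words further along in the snippet being missed.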
Kumail
09:12 PM
May 08, 2023
Kishore Nallan
11:39 AM typesense/typesense:0.25.0.rc27
Kumail
06:54 PM 0.25.0.rc27
May 09, 2023
Kumail
01:32 AM
Kishore Nallan
02:01 AM
Kumail
04:33 AM
Kishore Nallan
05:10 AM
Kumail
05:57 PM
May 10, 2023
Kishore Nallan
03:53 PM
Kishore Nallan
03:54 PM
Kumail
06:53 PM
Kumail
06:53 PM
May 17, 2023
Kumail
05:28 PM
May 18, 2023
Kishore Nallan
09:45 AM