#community-help

Arabic Search Highlighting Issue and Exhaustive Mode Timeout

TLDR Kumail reports that Arabic searches do not highlight every word in the query and that exhaustive mode always times out. Kishore Nallan provides a fix for part of the highlighting problem but suggests creating a GitHub issue for a flag that enables full highlighting.

May 07, 2023 (4 months ago)
Kumail
02:00 AM
wanted to report 2 issues: searches in Arabic don’t highlight all the words in the search query & searches in Arabic with exhaustive mode always exceed timeout
Kumail
02:16 AM
e.g. you can see the words فلم & السلام are not highlighted even though they're present exactly in the search query
[Image 1]
Kishore Nallan
03:18 AM
Please share a text snippet of query and the text for the highlight issue.

Exhaustive search is not meant to be used for large datasets.
Kumail
04:08 AM
here’s the search query and the corresponding snippet for the first hit

query:
لة ثم دعا فلم يستجب له فأتى عيسى ابن مريم عليه السلام يشكو إل

snippet:
"رجلا منهم اجتهد اربعين ليله ثم دعا فلم يستجب له <mark>فاتي</mark> <mark>عيسي</mark> ابن <mark>مريم</mark> <mark>عليه</mark> السلام يشكو <mark>ال</mark>يه ما هو فيه ويساله الدعاء له فتطهر <mark>عيسي</mark> وصلي ثم"

multiple words aren’t getting highlighted here, for example ابن
Kumail
03:23 PM
Kishore Nallan
Kishore Nallan
03:52 PM
I have a fix for this issue. The actual cause is that when a field's string exceeds 175 characters and not all tokens in the query match a document (i.e. approximate matching), we don't highlight all keywords in the query, because that would involve exhaustively going through every word in the string, which would be slow.

The problem was that we were measuring 175 bytes rather than 175 Unicode characters, which means a much shorter cutoff for Arabic strings, because an Arabic character is not just a single byte like an English one. I have fixed this, so it will work in future RC builds. I can produce a fixed RC build in a couple of days.
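To illustrate the byte-versus-character difference described above, here is a minimal Python sketch. It is not Typesense's code: the helper names `utf8_len`, `exceeds_limit_bytes` and `exceeds_limit_chars` are hypothetical, UTF-8 encoding is an assumption, and the only value taken from the thread is the 175 threshold.

```python
# Illustrative only: shows why a 175-byte limit is hit much sooner by Arabic
# text than a 175-character limit. Helper names are hypothetical.
LIMIT = 175  # threshold mentioned in the thread


def utf8_len(text: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def exceeds_limit_bytes(text: str) -> bool:
    """The behaviour described as the bug: compare the byte length."""
    return utf8_len(text) > LIMIT


def exceeds_limit_chars(text: str) -> bool:
    """The behaviour described as the fix: compare the character count."""
    return len(text) > LIMIT


english = "a" * 100  # 100 characters, 100 bytes in UTF-8
arabic = "ب" * 100   # 100 characters, 200 bytes in UTF-8

for label, sample in (("english", english), ("arabic", arabic)):
    print(label, len(sample), utf8_len(sample),
          exceeds_limit_bytes(sample), exceeds_limit_chars(sample))
```

With a byte-based check, the 100-character Arabic string is already treated as a long field, while the character-based check treats both strings the same way.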
Kumail
09:12 PM
awesome, thank you! if you could let me know when the RC build image is pushed to Docker, that’d be great!
May 08, 2023 (4 months ago)
Kishore Nallan
11:39 AM
Please check against:

typesense/typesense:0.25.0.rc27
Kumail
06:54 PM
still running into this with 0.25.0.rc27
May 09, 2023 (4 months ago)
Kumail
01:32 AM
Kishore Nallan
Kishore Nallan
02:01 AM
Did you try with the example earlier?
Kumail
04:33 AM
yes the same example
Kishore Nallan
05:10 AM
Just tried, and it seems to work on this example at least, see: https://gist.github.com/kishorenc/0e9dc69733d099b3c8ee3d75e17bbefc
Kumail
05:57 PM
here's my query request, you can see some words still aren't being highlighted: https://gist.github.com/kumailn/d616b0242e01c890461cc6f65cc1b26a Kishore Nallan
May 10, 2023 (4 months ago)
Kishore Nallan
03:53 PM
This one exceeds 175 Arabic characters. As I mentioned earlier, for long strings we refrain from fully highlighting every keyword in the query on partial query matches.
Kishore Nallan
03:54 PM
We should probably at least allow a way to enable this, because I can see that being useful.
Kumail
06:53 PM
is there any way to get around this other than performing an exhaustive search?
Kumail
06:53 PM
or will I need to implement highlighting manually on my end after the response is received?
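For reference, the manual approach asked about here could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `highlight` is a hypothetical helper, not part of Typesense or its clients; it matches whole words exactly, and it does not normalize Arabic letter variants or diacritics, so a query word spelled differently from the stored text would not be marked.

```python
import html
import re


def highlight(text: str, query: str, tag: str = "mark") -> str:
    """Wrap each whole-word occurrence of a query token in <tag>.

    Hypothetical client-side helper: exact matching only, applied to a
    field value after the search response has been received.
    """
    tokens = set(query.split())
    if not tokens:
        return html.escape(text)

    # Longest tokens first so a longer token wins over a shorter one at
    # the same position.
    alternation = "|".join(
        re.escape(t) for t in sorted(tokens, key=len, reverse=True)
    )
    # \w is Unicode-aware for str patterns, so the lookarounds keep a token
    # from matching inside a longer Arabic (or Latin) word.
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")

    parts = []
    pos = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[pos:match.start()]))
        parts.append(f"<{tag}>{html.escape(match.group())}</{tag}>")
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


word = "السلام"
print(highlight("عليه " + word, word))
```

The text between matches is HTML-escaped so that the only markup in the result is the inserted tag. Whether this is fast enough for long fields depends on the field size and the number of hits being post-processed.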
May 17, 2023 (4 months ago)
Kumail
05:28 PM
Kishore Nallan any plan to add an option to enable full highlighting?
May 18, 2023 (4 months ago)
Kishore Nallan
09:45 AM
Can you please create a GitHub issue for this? We will prioritize it. We need to add a flag to make it happen.