wanted to report 2 issues: searches in Arabic don’...
# community-help
k
wanted to report 2 issues: searches in Arabic don’t highlight all the words in the search query & searches in Arabic with exhaustive mode always exceed timeout
e.g. you can see the words فلم & السلام are not highlighted even though they're present exactly in the search query
k
Please share the query and the text snippet for the highlight issue. Exhaustive search is not meant to be used for large datasets.
k
here’s the search query and the corresponding snippet for the first hit
query:
لة ثم دعا فلم يستجب له فأتى عيسى ابن مريم عليه السلام يشكو إل
snippet:
"رجلا منهم اجتهد اربعين ليله ثم دعا فلم يستجب له <mark>فاتي</mark> <mark>عيسي</mark> ابن <mark>مريم</mark> <mark>عليه</mark> السلام يشكو <mark>ال</mark>يه ما هو فيه ويساله الدعاء له فتطهر <mark>عيسي</mark> وصلي ثم"
multiple words aren’t getting highlighted here, for example
ابن
@Kishore Nallan
k
I have a fix for this issue. The actual issue is that when a field's string exceeds 175 characters and not all tokens in the query match a document (i.e. approximate matching), we don't highlight all keywords in the query, because that would involve exhaustively going through all the words in the string, which would be slow. The problem was that we were looking at 175 bytes rather than 175 Unicode characters, which means much shorter strings in Arabic, because an Arabic character is not just a single byte like an English one. I have fixed this so it will work in future RC builds. I can produce a fixed RC build in a couple of days.
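To illustrate the byte versus character difference described above, here is a minimal Python sketch. It assumes the text is encoded as UTF-8; the 175 limit is taken from the message, while the `lengths` helper and the sample strings are hypothetical and are not Typesense code.

```python
LIMIT = 175  # threshold mentioned in the thread


def lengths(text: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


# Four Arabic letters plus a space, repeated 20 times: 100 code points.
arabic = "\u0633\u0644\u0627\u0645 " * 20
# Five ASCII characters repeated 20 times: also 100 code points.
english = "salam" * 20

for label, sample in (("arabic", arabic), ("english", english)):
    chars, nbytes = lengths(sample)
    # A byte-based check trips on the Arabic sample even though it is
    # well under the limit when measured in characters.
    print(label, chars, nbytes, "over limit by bytes:", nbytes > LIMIT,
          "over limit by chars:", chars > LIMIT)
```

Each letter in the basic Arabic block takes two bytes in UTF-8, so a limit measured in bytes is reached after roughly half as many characters as for ASCII text.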
k
awesome thank you! if you could let me know when the RC build image is pushed to docker that’d be great!
k
Please check against:
typesense/typesense:0.25.0.rc27
k
still running into this with 0.25.0.rc27
@Kishore Nallan
k
Did you try with the example earlier?
k
yes the same example
k
Just tried, and it seems to work on this example at least, see: https://gist.github.com/kishorenc/0e9dc69733d099b3c8ee3d75e17bbefc
k
here's my query request, you can see some words still aren't being highlighted: https://gist.github.com/kumailn/d616b0242e01c890461cc6f65cc1b26a @Kishore Nallan
k
This one exceeds 175 Arabic characters. As I mentioned earlier, for long strings we refrain from fully highlighting all keywords in the query for partial query matches.
We should probably at least allow a way to enable this, because I can see that being useful.
k
is there any way to get around this other than perform exhaustive search?
or will I need to implement highlighting manually on my end after the response is received?
@Kishore Nallan any plan to add enabling full highlighting?
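For anyone considering the manual route mentioned above, here is a rough Python sketch of client-side highlighting applied to the returned text. The folding rules in `_FOLD` are an assumption inferred from the snippet earlier in this thread (for example فأتى appearing as فاتي) and are not Typesense's actual normalizer; the `normalize` and `highlight` helpers are hypothetical names.

```python
import re
import unicodedata

# Assumed letter folding, guessed from the snippet in this thread.
_FOLD = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda -> alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize(token: str) -> str:
    """Drop combining marks (diacritics) and fold common letter variants."""
    stripped = "".join(ch for ch in token if not unicodedata.combining(ch))
    return stripped.translate(_FOLD)


def highlight(text: str, query: str, tag: str = "mark") -> str:
    """Wrap each whitespace-delimited token of `text` that equals a
    query token after normalization."""
    wanted = {normalize(t) for t in query.split()}

    def wrap(match: re.Match) -> str:
        word = match.group(0)
        return f"<{tag}>{word}</{tag}>" if normalize(word) in wanted else word

    return re.sub(r"\S+", wrap, text)


# Example with words from this thread's query and snippet.
print(highlight("ثم دعا فلم يستجب له", "فلم السلام ابن"))
```

This only matches whole tokens. Prefix matches such as the partially highlighted last word in the snippet above, typo tolerance, and punctuation attached to a word would each need extra handling.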
k
Can you please create a GitHub issue for this? We will prioritize it. We need to add a flag to make it happen.