# community-help
p
Hi team! Last time, we set up a custom `stem_dictionary` for several fields, and we were happy with it. (We have benchmarks, i.e. records of which position we expect certain documents to occupy for certain search terms.) This week I found strange behavior: many of the benchmark records started failing. I began investigating and, as a test, removed the `stem_dictionary` property from my fields. That confirmed it: most of the unexpected changes disappeared. The worst thing is that when I redeployed it again, the initial problems did not reproduce, which means I can't give you a reproducible example. Any ideas are appreciated. Thank you.
j
Hard to debug this based on what you shared, unfortunately. If you can replicate it with a set of curl commands like this even one time, that would be helpful. We can then try running it a bunch of times to see if it's a race condition.
🙌 1
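For reference, a minimal setup for such a curl-based repro might look like the sketch below. This is only a sketch based on the Typesense docs, not the exact commands from this thread: the dictionary id `my-dict`, collection `articles`, field `title`, host, and API key are all placeholders.

# import a custom stemming dictionary (one JSONL object per line: {"word": ..., "root": ...})
curl "http://localhost:8108/stemming/dictionaries/import?id=my-dict" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  --data-binary '{"word": "talks", "root": "talk"}'

# create a collection whose field references the dictionary via stem_dictionary
curl "http://localhost:8108/collections" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"name": "articles", "fields": [{"name": "title", "type": "string", "stem_dictionary": "my-dict"}]}'

# index a document to search against later
curl "http://localhost:8108/collections/articles/documents" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"title": "Company talks"}'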
d
It is hard to build a reproducible example, unfortunately, but here is what we know now:
• The issue happens only after some time / multiple emplace operations.
• Exact matches stopped working: e.g. for the search query `user talks`, an article with an exact match in one of the fields sits somewhere at the bottom of the results, while the top is filled with hits for `user talk`. The same behavior applies to quoted queries, e.g. `"user talks"` returns empty results while we have multiple documents with the exact match (the queries are sketched just below).
• Restarting the instance OR recreating the field helps; I'm trying to understand which exact event helps.
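The two searches being compared, as a sketch with the same placeholder names (`articles`, `title`) as above, so the before/after behavior can be re-checked with identical commands:

# unquoted query: the document with the exact match should normally rank at the top
curl -G "http://localhost:8108/collections/articles/documents/search" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  --data-urlencode 'q=user talks' --data-urlencode 'query_by=title'

# quoted (exact phrase) query: reported above to return empty results after a restart
curl -G "http://localhost:8108/collections/articles/documents/search" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  --data-urlencode 'q="user talks"' --data-urlencode 'query_by=title'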
More info:
• It starts not just after some time, but after a restart of the instance. Emplace/update operations don't help to fix the issue.
• Only recreating the field with `stem_dictionary` helps (sketched below).
• In the case of an HA cluster, only the restarted instance is affected.
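A sketch of the "recreate the field" workaround, assuming the standard collection-alter endpoint (drop the field, then re-add it with the same `stem_dictionary`); the names are the same placeholders as above:

curl "http://localhost:8108/collections/articles" -X PATCH \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"fields": [
        {"name": "title", "drop": true},
        {"name": "title", "type": "string", "stem_dictionary": "my-dict"}
      ]}'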
And here is an example: https://gist.github.com/b0g3r/256eae056f8368d84472ed6082a6b579 (different results before and after restart):
"Company talks"
"Some talk in the company"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1458    0  1276  100   182   269k  39334 --:--:-- --:--:-- --:--:--  355k
"Company talks"
"Some talk in the company"
typesense-debug
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   915    0   732  100   183  51491  12872 --:--:-- --:--:-- --:--:-- 65357
"Company talks"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1456    0  1274  100   182   118k  17321 --:--:-- --:--:-- --:--:--  142k
"Some talk in the company"
"Company talks"
j
CC: @Kishore Nallan @Krunal Gandhi Any ideas?
k
Seems like the stemming dictionary is not getting loaded from disk after restart. Need to investigate further.
k
I found the issue. Will have a patched build shortly.
`29.0.rc11` contains the patch. We had to load the stemming dictionary before loading any of the documents, because the tokens have to be stemmed before indexing on restart. Having said that, a gotcha still remains:
1. The above fix works if you restart the server after a snapshot has run (post the indexing of the stemming dictionary).
2. However, if someone creates a new stemming dictionary, then creates a collection and imports documents, and then immediately restarts Typesense without a snapshot, there is a race condition during restart, when the raft logs are replayed.
3. Since we parallelize requests at the collection level, but send all other "meta" requests like the stemming dictionary creation to a separate thread, we could have a case where the collection finishes indexing before the stemming dictionary has finished loading. This can cause the indexed documents to not have stemming normalization.
It's not possible to solve this elegantly because the same code path is used for real-time writes as well, but since real-time writes always have a time component (one operation is done after the other), they are not affected by this race condition. This is why our upgrade guides always require running a snapshot before commencing an upgrade (see the snapshot sketch below). That said, this is super rare, and in fact I can't think of another feature that has this particular quirk.
👍 1
✍️ 1
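For anyone following along, triggering a snapshot before a restart or upgrade can be done via the cluster operations API; the snapshot path below is a placeholder:

# force a snapshot of the current state to disk before restarting
curl "http://localhost:8108/operations/snapshot?snapshot_path=/tmp/typesense-snapshot" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"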
On a separate note, recent 29.0 RC builds have a performance refactoring for group-by which you might need to be aware of. Since I know you rely on group-bys, I wanted to give you a heads-up about that. If you plan to use this version, be sure to test group-by queries on staging before upgrading production (a sample query is sketched below).
✍️ 1
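A sketch of the kind of group-by query worth re-running on staging against the new RC. The collection `articles` and facet field `company` are placeholders (`group_by` requires a field declared with "facet": true in the schema):

curl -G "http://localhost:8108/collections/articles/documents/search" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  --data-urlencode 'q=*' --data-urlencode 'query_by=title' \
  --data-urlencode 'group_by=company' \
  --data-urlencode 'group_limit=3'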