Pavel Koroteev
03/18/2025, 6:44 PM
stem_dictionary for several fields, and we were happy! (We have benchmarks, i.e. records of the position we expect a result at for given search terms.)
This week I found strange behavior: many of the benchmark records started failing. I investigated and, as a test, removed the stem_dictionary property from my fields. That confirmed it: many of the changes disappeared. The worst part is that when I redeployed it again, the initial problems did not reproduce, so I can't give you a reproducible example.
Any ideas are appreciated. Thank you.
Jason Bosco
03/18/2025, 7:23 PM
Dima
03/28/2025, 11:16 AM
user talks article with an exact match in one of the fields somewhere at the bottom of the results, while the top is filled with hits for user talk. The same happens for quoted queries, e.g. "user talks" returns empty results although we have multiple documents with the exact match.
• Restarting the instance OR recreating the field helps; I'm trying to understand which exact event helps
Dima
03/28/2025, 12:41 PM
Dima
03/28/2025, 1:14 PM
"Company talks"
"Some talk in the company"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1458 0 1276 100 182 269k 39334 --:--:-- --:--:-- --:--:-- 355k
"Company talks"
"Some talk in the company"
typesense-debug
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 915 0 732 100 183 51491 12872 --:--:-- --:--:-- --:--:-- 65357
"Company talks"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1456 0 1274 100 182 118k 17321 --:--:-- --:--:-- --:--:-- 142k
"Some talk in the company"
"Company talks"
Jason Bosco
03/28/2025, 5:24 PM
Krunal Gandhi
03/29/2025, 10:10 AM
Kishore Nallan
04/01/2025, 1:55 PM
Kishore Nallan
04/02/2025, 9:39 AM
29.0.rc11 contains the patch. We had to load the stemming dictionary before loading any of the documents because the tokens have to be stemmed before indexing on restart.
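To illustrate why the load order matters, here is a minimal toy model in Python. It is not Typesense code: the functions `index_documents` and `search` and the one-entry dictionary are hypothetical, and the only assumption is that tokens are normalized with whatever stemming dictionary is loaded at the moment a document is indexed.

```python
# Toy model (not Typesense internals): tokens are normalized with whatever
# stemming dictionary is loaded at the moment a document is indexed.

def index_documents(docs, stem_dictionary):
    """Build an inverted index: stemmed token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            root = stem_dictionary.get(token, token)
            index.setdefault(root, set()).add(doc_id)
    return index


def search(index, query, stem_dictionary):
    """Return ids of documents that contain every stemmed query token."""
    hits = [
        index.get(stem_dictionary.get(token, token), set())
        for token in query.lower().split()
    ]
    return set.intersection(*hits) if hits else set()


docs = {1: "Company talks", 2: "Some talk in the company"}
dictionary = {"talks": "talk"}  # hypothetical word -> root mapping

# Dictionary loaded before the documents are indexed.
good_index = index_documents(docs, dictionary)
# Documents indexed while the dictionary is still empty.
stale_index = index_documents(docs, {})

print(search(good_index, "talks", dictionary))   # {1, 2}
print(search(stale_index, "talks", dictionary))  # {2}
```

In the stale index the token "talks" was stored unstemmed, so a stemmed query for "talks" no longer finds the document that contains it verbatim.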
Having said that, a gotcha still remains:
1. The above fix works if you restart the server after a snapshot has run (that is, after the stemming dictionary has been indexed)
2. However, if someone creates a new stemming dictionary, then creates a collection and imports documents, and then immediately restarts Typesense without a snapshot, there is a race condition during restart, when the raft logs are replayed.
3. Since we parallelize the requests at a collection level, but send all other "meta" requests like the stemming dictionary creation into a separate thread, we could have a case where the collection is finished indexing before the stemming dictionary is finished loading. This can cause the indexed documents to not have stemming normalization.
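Given point 1, one way to stay clear of the replay race would be to trigger a snapshot before restarting. The sketch below builds a request for the snapshot operation in Typesense's cluster operations API (`POST /operations/snapshot` with a `snapshot_path` parameter), using only the Python standard library. The helper names, host, API key and path are placeholders chosen for illustration, not values from this thread.

```python
import urllib.parse
import urllib.request


def build_snapshot_request(base_url, api_key, snapshot_path):
    """Build (but do not send) a POST /operations/snapshot request."""
    query = urllib.parse.urlencode({"snapshot_path": snapshot_path})
    return urllib.request.Request(
        f"{base_url}/operations/snapshot?{query}",
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )


def take_snapshot(base_url, api_key, snapshot_path):
    """Send the snapshot request and return the raw response body."""
    request = build_snapshot_request(base_url, api_key, snapshot_path)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


# Example call against a running node (placeholder values):
# take_snapshot("http://localhost:8108", "xyz", "/tmp/typesense-data-snapshot")
```

Restart the server only once the call has returned successfully.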
It's not possible to solve this elegantly because the same code path is also used for real-time writes. Real-time writes always have a time component (one operation is done after the other), so they are not affected by this race condition. The replay race is why our upgrade guides always require running a snapshot before commencing an upgrade. However, this is super rare, and in fact, I can't think of another feature which has this particular quirk.
Kishore Nallan
04/02/2025, 9:41 AM