Kishore Nallan
04/02/2025, 9:39 AM
29.0.rc11 contains the patch. We had to load the stemming dictionary before loading any of the documents, because the tokens have to be stemmed before they are indexed on restart.
Having said that, a gotcha still remains:
1. The above fix works if you restart the server after a snapshot has run (that is, a snapshot taken after the stemming dictionary was indexed).
2. However, if someone creates a new stemming dictionary, then creates a collection and imports documents, and then immediately restarts Typesense without a snapshot, there is a race condition during restart, when the raft logs are replayed.
3. Since we parallelize the requests at a collection level, but send all other "meta" requests like the stemming dictionary creation into a separate thread, we could have a case where the collection finishes indexing before the stemming dictionary finishes loading. In that case the documents end up indexed without stemming normalization.
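To make the ordering problem concrete, here is a toy model of the replay. It is only a sketch: `STEMS`, `index_tokens` and `replay` are made-up names for illustration, not Typesense internals, and the real replay runs on separate threads rather than a sorted list.

```python
# Hypothetical toy model of log replay ordering; not Typesense code.
STEMS = {"running": "run", "ran": "run"}  # made-up dictionary contents


def index_tokens(tokens, dictionary):
    # Tokens are normalized only with whatever dictionary is loaded
    # at the moment of indexing; unknown tokens pass through unchanged.
    return [dictionary.get(token, token) for token in tokens]


def replay(log, dictionary_first):
    """Replay ('dict', mapping) and ('doc', tokens) log entries.

    dictionary_first=True models the dictionary load finishing first.
    dictionary_first=False models the collection thread winning the race.
    """
    if dictionary_first:
        # False sorts before True, so 'dict' entries are applied first.
        ordered = sorted(log, key=lambda entry: entry[0] != "dict")
    else:
        # Here 'dict' entries sort last, so documents are indexed first.
        ordered = sorted(log, key=lambda entry: entry[0] == "dict")

    loaded = {}
    indexed = []
    for kind, payload in ordered:
        if kind == "dict":
            loaded.update(payload)
        else:
            indexed.append(index_tokens(payload, loaded))
    return indexed


log = [("dict", STEMS), ("doc", ["running", "fast"])]
stemmed = replay(log, dictionary_first=True)
unstemmed = replay(log, dictionary_first=False)
```

With the dictionary applied first, the document's tokens are stemmed. With the document applied first, the same log produces unstemmed tokens, which is the outcome described in point 3.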
It's not possible to solve this elegantly because the same code path is also used for real-time writes. Real-time writes always have a time component (one operation is done after the other), so they are not affected by this race condition. This is why our upgrade guides always require running a snapshot before commencing an upgrade. However, this is super rare, and in fact, I can't think of another feature which has this particular quirk.
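For reference, a snapshot can be triggered through the `POST /operations/snapshot` endpoint before restarting. A minimal sketch of building that request follows; the host, port, API key and snapshot path are placeholder assumptions, and `build_snapshot_request` is a hypothetical helper, not part of any Typesense client.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_snapshot_request(base_url, api_key, snapshot_path):
    # Builds (but does not send) the snapshot request.
    query = urlencode({"snapshot_path": snapshot_path})
    return Request(
        f"{base_url}/operations/snapshot?{query}",
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )


# Placeholder values; substitute your own server address, key and path.
req = build_snapshot_request(
    "http://localhost:8108", "xyz", "/tmp/typesense-data-snapshot"
)
# To actually trigger the snapshot, pass req to urllib.request.urlopen.
```

Running this once the stemming dictionary and documents are in place, and before the restart, matches what the upgrade guides ask for.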