I still see the error in prod, but I'm not able to reproduce it with my example. I'm trying to isolate what is different.
So far only 1 pod out of 5 has rebooted (the full rolling restart takes 16-17h). Could that explain why I still have the issue? My initial guess is that the leader sends the document/update and each follower applies it locally, so I expect the upgraded node to have the correct data. But maybe I'm missing something here (for example, the leader decides which fields get updated).