Lane Goolsby
03/13/2023, 8:31 PM
Jason Bosco
03/13/2023, 8:53 PM
Kishore Nallan
03/14/2023, 8:55 AM
> When a node comes up leaderless it doesn't appear to check the config map again. Adding some logic to recheck the config map when leaderless should solve that.
We do this because resetting of peers could lead to data loss. If, for example, a current leader has buffered the write but has not sent it to the followers yet, and the peers are then force-reset, that buffered write could be lost. Let me see if I can find a safe way to handle this.
Sergio Behrends
03/14/2023, 10:59 AM
> except when all pods get recycled at once
If it's configured as a StatefulSet and has a readiness probe, this should never happen, unless there's a massive node failure, and that should be solved by having nodes in multiple AZs.
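(For reference, a minimal sketch of the kind of readiness probe being described here, assuming the StatefulSet runs Typesense with the API on its default port 8108; the timing values are illustrative, not from this thread.)

```
# Hypothetical fragment of the Typesense container spec in a StatefulSet.
readinessProbe:
  httpGet:
    path: /health        # Typesense's built-in health endpoint
    port: 8108           # default Typesense API port
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
```

With the StatefulSet's default RollingUpdate strategy, the controller waits for each pod to pass its readiness probe before replacing the next one, so only one member should be down at a time.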
Lane Goolsby
03/14/2023, 2:00 PM
> If it's configured as a StatefulSet and has a readiness probe, this should never happen, unless there's a massive node failure, and that should be solved by having nodes in multiple AZs.
Yeah, that's how we have our cluster set up.
Lane Goolsby
03/23/2023, 7:03 PM
Kishore Nallan
03/24/2023, 5:19 AM
Sergio Behrends
03/24/2023, 9:29 AM
Kishore Nallan
03/28/2023, 11:22 AM
typesense/typesense:0.25.0.rc18 Docker build. To enable this feature, you have to set the --reset-peers-on-error flag or set the TYPESENSE_RESET_PEERS_ON_ERROR=TRUE environment variable. Try it out and let me know if this works @Sergio Behrends @Lane Goolsby -- hopefully we can put this whole peer change issue behind us with this fix.
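(A minimal sketch of what enabling this could look like in a Kubernetes StatefulSet; the image tag, flag, and environment variable come from the message above, while the container layout and names are assumptions.)

```
# Hypothetical fragment of the Typesense container spec.
containers:
  - name: typesense
    image: typesense/typesense:0.25.0.rc18
    env:
      - name: TYPESENSE_RESET_PEERS_ON_ERROR   # equivalent to passing --reset-peers-on-error
        value: "TRUE"
```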
Lane Goolsby
03/29/2023, 1:44 AM
> Are you sure it's able to handle 3 pods getting rotated out at the same time?
Yes, on my local machine using Rancher Desktop with no disruption budget I can delete one to n nodes at a time and the cluster seems to handle it gracefully for the most part. Things can get out of whack if you try hard enough, but you have to be a little intentional. Currently the code just sleeps for a few seconds to give the cluster long enough to allocate all the pending pods and for each to get an IP. I could easily extend it to wait for some sort of minimum pod count instead, but for my purposes right now it's Good Enough™. I'm still letting things marinate in my environment. There's still an edge case I'm tracking down. It's not related to pods recycling; my current suspicion is it's network related. Our production environment has had the leaderless cluster issue 2, maybe 3, times in a year. However, our lower environment has it on a nearly daily basis. I'm 99.5% confident it's not because the pods are recycling; it's something more fundamental than that (or else one of our sysadmins has been playing a practical joke on me 😅).
> The new pod will get all IPs updated when it starts, but how does an old pod discover the new pod's IP?
The config map is updated instantly across all pods, so it's purely a matter of how long it takes for the old pods to realize a config map update was done. In my testing that's somewhere between 30 and 60 seconds (I haven't actually timed it, so that's a guesstimate). If there could be some sort of listener for when the config map gets updated, that would make this whole thing nearly instantaneous.
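(A listener like that is possible with a ConfigMap watch. Below is a minimal sketch using the official Python Kubernetes client, not the .NET code discussed later in the thread; the namespace, ConfigMap name, data key, and file path are hypothetical placeholders.)

```
# Sketch: react to ConfigMap changes immediately instead of waiting for the
# kubelet's periodic refresh of the mounted ConfigMap volume.
from kubernetes import client, config, watch

config.load_incluster_config()  # use load_kube_config() when running outside the cluster
v1 = client.CoreV1Api()

w = watch.Watch()
for event in w.stream(v1.list_namespaced_config_map,
                      namespace="typesense",                            # hypothetical namespace
                      field_selector="metadata.name=typesense-nodes"):  # hypothetical ConfigMap name
    cm = event["object"]
    nodes = (cm.data or {}).get("nodes", "")                            # hypothetical data key
    # Rewrite the nodes file Typesense was started with (--nodes=<path>) so
    # peers see the change without waiting for the volume to sync.
    with open("/usr/share/typesense/nodes", "w") as f:                  # hypothetical path
        f.write(nodes)
    print(f"{event['type']}: nodes list is now {nodes!r}")
```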
Lane Goolsby
03/29/2023, 1:53 AM
Sergio Behrends
03/29/2023, 10:38 AM
> The config map is updated instantly across all pods.
True that! Sounds like a more on-demand solution, which should work properly too! I'm curious, do you have that script to share? It uses the k8s API to update the config map, right?
Lane Goolsby
03/29/2023, 1:57 PM
> I'm curious, do you have that script to share? It uses the k8s API to update the config map, right?
Correct, the code uses the .NET library for k8s. I am working with my company to open-source the code. If there's interest, I can press a bit harder.
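(Not the .NET code mentioned above, but for anyone who wants to experiment, the same idea is only a few lines with the Python Kubernetes client. The namespace, ConfigMap name, and label selector are assumptions; the host:peering_port:api_port value format follows the Typesense --nodes convention, with 8107/8108 being the defaults.)

```
# Sketch: collect the current Typesense pod IPs and publish them to the
# ConfigMap that backs the cluster's nodes file.
from kubernetes import client, config

config.load_incluster_config()  # or load_kube_config() outside the cluster
v1 = client.CoreV1Api()

NAMESPACE = "typesense"               # hypothetical
CONFIGMAP = "typesense-nodes"         # hypothetical
PEERING_PORT, API_PORT = 8107, 8108   # Typesense defaults

def current_peer_ips():
    pods = v1.list_namespaced_pod(NAMESPACE, label_selector="app=typesense")  # hypothetical label
    return [p.status.pod_ip for p in pods.items if p.status.pod_ip]

def publish_peer_list(ips):
    nodes = ",".join(f"{ip}:{PEERING_PORT}:{API_PORT}" for ip in ips)
    v1.patch_namespaced_config_map(name=CONFIGMAP, namespace=NAMESPACE,
                                   body={"data": {"nodes": nodes}})

if __name__ == "__main__":
    publish_peer_list(current_peer_ips())
```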
Adrian Kager
03/29/2023, 3:00 PM
Adrian Kager
03/29/2023, 3:40 PM
> We do this because resetting of peers could lead to data loss. If, for example, a current leader has buffered the write but has not sent it to the followers yet, and the peers are then force-reset, that buffered write could be lost.
When you say "current leader has buffered the write but has not sent it to the followers yet", do you mean the write is not committed yet? In that case I think that's acceptable, since we should only expect writes to persist once committed, and any write API calls should not have returned a success yet. Is that thinking correct?
> auto reset the peer list on error during startup
Just want to make sure I know what you mean by this. Does this mean the node will pull the latest IP addresses from the node file (which it previously would only do if the cluster has quorum)? If so, I know you said here this was a potentially dangerous action that could lead to data loss. My thinking was that it should not be dangerous, since you could just start a new election and it should not be possible for committed data to get lost. Does that thinking make sense? Or if not, how did you get around the risk of data loss? Also, disclaimer: I'm still getting context on this problem and the existing workarounds, so apologies if these questions are at all naive or repetitive :)
Kishore Nallan
03/29/2023, 4:08 PM
> Does this mean the node will pull the latest IP addresses from the node file (which it previously would only do if the cluster has quorum)?
Correct.
Adrian Kager
03/29/2023, 9:50 PM
Lane Goolsby
03/30/2023, 3:04 AM
Kishore Nallan
03/30/2023, 3:05 AM
Dima
04/02/2023, 9:07 PM
Kishore Nallan
04/03/2023, 4:30 PM
<https://dl.typesense.org/releases/0.25.0.rc20/typesense-server-0.25.0.rc20-linux-amd64.tar.gz>