Solutions for HA Issues Running Typesense in Kubernetes
TLDR: Lane shared a solution for fixing HA issues in k8s that uses an InitContainer to update the config map. Kishore Nallan introduced a feature that automatically resets the cluster's peer list on error. Users discussed the approach's limitations and potential deployment considerations.
Mar 13, 2023
I started off with the sidecar idea but went a slightly different tack.
Instead of each node having its own sidecar that writes to a mapped volume, I have an app that monitors the TS namespace and updates the config map whenever a pod recycles.
This approach solves pretty much every scenario except when all pods get recycled at once. But I think we could fix that edge case as well with a minor tweak to the TS codebase. When a node comes up leaderless, it doesn't appear to check the config map again. Adding some logic to re-check the config map when leaderless should solve that.
If there's interest I have tentative approval from my company to open source it.
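For illustration only, here is a minimal Python sketch of the core step the watcher described above performs: turning the current pod IPs into the comma-separated ip:peering_port:api_port entries that Typesense reads from its nodes file. The function name render_nodes is hypothetical, the ports assume Typesense's defaults (8107 for peering, 8108 for the API), and fetching pod IPs and writing the config map are left out.

```python
import ipaddress
from typing import Iterable, Optional

# Assumed Typesense defaults: 8107 for Raft peering, 8108 for the HTTP API.
DEFAULT_PEERING_PORT = 8107
DEFAULT_API_PORT = 8108


def render_nodes(pod_ips: Iterable[Optional[str]],
                 peering_port: int = DEFAULT_PEERING_PORT,
                 api_port: int = DEFAULT_API_PORT) -> str:
    """Build a nodes-file value such as '10.0.0.1:8107:8108,10.0.0.2:8107:8108'.

    Pods without an IP yet (None or "") are skipped. Duplicates are removed and
    addresses are sorted numerically (all addresses are assumed to be the same
    IP family), so the same set of pods always yields the same string and a
    caller can compare strings to tell whether membership actually changed.
    """
    unique_ips = {ip for ip in pod_ips if ip}
    ordered = sorted(unique_ips, key=ipaddress.ip_address)
    return ",".join(f"{ip}:{peering_port}:{api_port}" for ip in ordered)
```

For example, render_nodes(["10.42.0.12", None, "10.42.0.9"]) returns "10.42.0.9:8107:8108,10.42.0.12:8107:8108". Sorting numerically rather than as strings keeps 10.42.0.9 ahead of 10.42.0.12.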
Mar 14, 2023
Kishore Nallan 08:55 AM
> When a node comes up leaderless it doesn't appear to check the config map again. Adding some logic to recheck the config map when leaderless should solve that.
We do this because resetting the peers could lead to data loss if, for example, the current leader has buffered the write but has not sent it to the followers yet. In this scenario, if the peers are force-reset, that buffered write could be lost.
Let me see if I can find a safe way to handle this.
If it's configured as a StatefulSet with a readiness probe, this should never happen, unless there is a massive node failure, and that should be solved by spreading the nodes across multiple AZs.
Yeah, that's how we have our cluster setup.
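Since the StatefulSet point above relies on a readiness probe, here is a hedged Python sketch of a readiness check. It assumes Typesense's /health endpoint on the default API port 8108 answers with a JSON body like {"ok": true} when the node is healthy; the function names health_ok and is_ready are hypothetical. A plain httpGet probe on /health may be enough if non-ready nodes answer with a non-2xx status; this sketch also checks the body in case they do not.

```python
import json
import urllib.request


def health_ok(body: bytes) -> bool:
    """True only if the body is a JSON object whose "ok" field is exactly true."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def is_ready(base_url: str = "http://localhost:8108", timeout_s: float = 2.0) -> bool:
    """Probe GET {base_url}/health; any HTTP error, timeout, or refusal means not ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return resp.status == 200 and health_ok(resp.read())
    except OSError:  # URLError, HTTPError (e.g. a 503), and socket timeouts are OSErrors
        return False
```

Used from an exec-style probe, a script would end with sys.exit(0 if is_ready() else 1), since Kubernetes treats exit code 0 as ready.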
Mar 23, 2023
Instead of a watcher, I moved the logic into an InitContainer. The InitContainer does what the code did before, but does it before the TS pod starts up.
This appears to handle when all the pods are restarted at once and when individual pods are recycled.
When a new pod comes up, it takes about 30 seconds for the other TS pods to pick up its new entry from the config map. Once that happens, the new pod gets brought into the cluster cleanly.
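The thread doesn't include the InitContainer's code (it uses the .NET Kubernetes client), so the following is only a rough Python equivalent under stated assumptions. It uses the official kubernetes Python client's list_namespaced_pod and patch_namespaced_config_map calls; the namespace, config map name, data key, and label selector are hypothetical parameters, the pod's service account needs RBAC permission to list pods and patch config maps, and skipping pods that are already terminating is an assumption about what should be advertised.

```python
from typing import Iterable, List


def collect_pod_ips(pods: Iterable) -> List[str]:
    """Return sorted, de-duplicated IPs of pods that have an IP and are not terminating."""
    ips = set()
    for pod in pods:
        if pod.metadata.deletion_timestamp is not None:
            continue  # assumption: don't advertise pods that are shutting down
        if pod.status is not None and pod.status.pod_ip:
            ips.add(pod.status.pod_ip)
    return sorted(ips)


def format_nodes(ips: Iterable[str], peering_port: int = 8107, api_port: int = 8108) -> str:
    """Typesense nodes format: ip:peering_port:api_port entries, comma-separated."""
    return ",".join(f"{ip}:{peering_port}:{api_port}" for ip in ips)


def update_nodes_config_map(namespace: str, config_map: str, key: str,
                            label_selector: str) -> str:
    """Run once from the InitContainer, before the Typesense container starts."""
    # Imported lazily so the helpers above work without the client installed.
    from kubernetes import client, config

    config.load_incluster_config()  # authenticate with the pod's service account
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector=label_selector).items
    nodes = format_nodes(collect_pod_ips(pods))
    if not nodes:
        raise RuntimeError("no pod IPs found; refusing to write an empty nodes list")
    v1.patch_namespaced_config_map(config_map, namespace, {"data": {key: nodes}})
    return nodes
```

Raising instead of writing an empty list means the InitContainer exits non-zero and Kubernetes retries it, rather than handing Typesense a nodes file with no peers.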
Mar 24, 2023
Kishore Nallan 05:19 AM
Mar 28, 2023
Kishore Nallan 11:22 AM
typesense/typesense:0.25.0.rc18 Docker build. To enable this feature, you have to set the --reset-peers-on-error flag or set the TYPESENSE_RESET_PEERS_ON_ERROR=TRUE environment variable. Try it out and let me know if this works, Sergio Lane. Hopefully we can put this whole peer change issue behind us with this fix.
Mar 29, 2023
Yes. On my local machine, using Rancher Desktop with no disruption budget, I can delete anywhere from one to n nodes at a time and the cluster seems to handle it gracefully for the most part. Things can get out of whack if you try hard enough, but you have to be a little intentional about it.
Currently the code just sleeps for a few seconds to give the cluster time to schedule all the pending pods and assign each one an IP. I could easily extend it to wait for some sort of minimum pod count instead, but for my purposes right now it's Good Enough™.
I'm still letting things marinate in my environment. There's still an edge case I'm tracking down. It's not related to pods recycling; my current suspicion is that it's network-related. Our production environment has had the leaderless cluster issue two, maybe three, times in a year, while our lower environment has it on a nearly daily basis. I'm 99.5% confident it's not because the pods are recycling; it's something more fundamental than that (or else one of our sysadmins has been playing a practical joke on me 😅).
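The minimum-pod-count extension mentioned above could look roughly like this hedged Python sketch: poll until enough pods report an IP, with a timeout, instead of sleeping for a fixed time. list_pod_ips is a hypothetical callable standing in for whatever queries the Kubernetes API, and clock and sleep are injectable only so the loop can be tested without real waiting.

```python
import time
from typing import Callable, List, Optional


def wait_for_pod_ips(list_pod_ips: Callable[[], List[Optional[str]]],
                     min_pods: int,
                     timeout_s: float = 120.0,
                     poll_interval_s: float = 2.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll until at least min_pods pods report an IP; raise TimeoutError otherwise.

    list_pod_ips is a hypothetical hook returning the current pods' IPs,
    with None (or "") for pods that have not been assigned one yet.
    """
    deadline = clock() + timeout_s
    while True:
        ips = [ip for ip in list_pod_ips() if ip]
        if len(ips) >= min_pods:
            return ips
        if clock() >= deadline:
            raise TimeoutError(
                f"only {len(ips)} of {min_pods} pods had an IP after {timeout_s}s")
        sleep(poll_interval_s)
```

min_pods would typically be the StatefulSet's replica count. One caveat worth checking: with the default OrderedReady pod management policy, a StatefulSet does not create the next pod until the previous one is Ready, so an InitContainer that waits for all replicas would block forever; this approach assumes podManagementPolicy: Parallel.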
> The new pod will get all IPs updated when it starts, but how does an old pod discover the new pod's IP?
The config map is updated instantly across all pods, so it's purely a matter of how long it takes for the old pods to realize a config map update was made. In my testing that's somewhere between 30 and 60 seconds (I haven't actually timed it, so that's a guesstimate).
If there could be some sort of listener for config map updates, that would make this whole thing nearly instantaneous.
True that! That sounds like a more on-demand solution, which should work properly too!
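One way to get such a listener without changing Typesense, sketched in Python under assumptions: the 30 to 60 second lag is plausibly the kubelet's periodic refresh of mounted ConfigMap volumes, so a sidecar could watch the ConfigMap through the Kubernetes API and mirror the nodes entry into a file on a shared emptyDir volume that Typesense uses as its nodes file. This is closer to the original sidecar idea than to the InitContainer. The function names and parameters are hypothetical; the watch call comes from the official kubernetes Python client, and the end-to-end delay still depends on how often Typesense re-reads its nodes file.

```python
import os
import tempfile


def write_nodes_file_atomically(path: str, nodes: str) -> None:
    """Replace `path` in one step so a reader never sees a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".nodes-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(nodes.strip())
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the Typesense container read it
        os.replace(tmp_path, path)  # atomic rename within one filesystem on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def mirror_config_map(namespace: str, name: str, key: str, nodes_path: str) -> None:
    """Watch one ConfigMap and copy data[key] to nodes_path on every change."""
    from kubernetes import client, config, watch  # lazy: only needed when running

    config.load_incluster_config()
    v1 = client.CoreV1Api()
    while True:  # the API server ends watches after a timeout, so re-establish it
        stream = watch.Watch().stream(
            v1.list_namespaced_config_map,
            namespace=namespace,
            field_selector=f"metadata.name={name}",
        )
        for event in stream:
            if event["type"] == "DELETED":
                continue  # keep the last known peer list rather than erasing it
            data = event["object"].data or {}
            if data.get(key):
                write_nodes_file_atomically(nodes_path, data[key])
```

Writing to a temporary file and renaming it matters because Typesense may read the nodes file at any moment; a rename swaps in the complete new content at once.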
I'm curious: do you have that script to share? It uses the k8s API to update the config map, right?
Correct, the code uses the .NET library for k8s.
I am working with my company to open source the code. If there's interest I can press a bit harder.
> We do this because resetting the peers could lead to data loss if, for example, the current leader has buffered the write but has not sent it to the followers yet. In this scenario, if the peers are force-reset, that buffered write could be lost.
When you say "current leader has buffered the write but has not sent it to the followers yet", do you mean the write is not committed yet? In that case I think that's acceptable, since we should only expect writes to persist once committed, and any write API calls should not have returned success yet. Is that thinking correct?
> auto reset the peer list on error during startup
Just want to make sure I know what you mean by this. Does this mean the node will pull the latest IP addresses from the node file (which it previously would only do if the cluster has quorum)?
If so, I know you said earlier that this was a potentially dangerous action that could lead to data loss. My thinking was that it should not be dangerous, since you could just start a new election and it should not be possible for committed data to get lost. Does that thinking make sense? If not, how did you get around the risk of data loss?
Also, a disclaimer: I'm still getting context on this problem and the existing workarounds, so apologies if these questions are at all naive or repetitive :)
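To make the question concrete, here is a toy Python model (not Typesense's or its Raft library's actual code) of the reasoning above: the leader appends a write to its log, the write counts as committed (and its client is told success) only once a majority of nodes hold it, so a forced reset that drops uncommitted entries only loses writes that were never acknowledged. It deliberately ignores terms, elections, and how a real reset chooses the new peer set, which is where the actual risk may lie, so it illustrates the argument rather than settling it. ToyLeader and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyLeader:
    """Toy Raft-style leader log. Not Typesense's implementation."""
    cluster_size: int
    log: List[str] = field(default_factory=list)
    copies: List[int] = field(default_factory=list)  # nodes holding each entry, leader included
    commit_index: int = 0  # entries before this index are committed

    def append(self, write: str) -> int:
        """Leader buffers a write locally; nothing is acknowledged yet."""
        self.log.append(write)
        self.copies.append(1)
        return len(self.log) - 1

    def follower_ack(self, index: int) -> None:
        """A follower stored entry `index`; advance the commit index over a majority prefix."""
        self.copies[index] += 1
        majority = self.cluster_size // 2 + 1
        while (self.commit_index < len(self.log)
               and self.copies[self.commit_index] >= majority):
            self.commit_index += 1

    def acknowledged(self) -> List[str]:
        """Writes whose API calls have returned success."""
        return self.log[:self.commit_index]

    def force_reset(self) -> List[str]:
        """Model a forced peer reset that discards uncommitted entries; return them."""
        lost = self.log[self.commit_index:]
        del self.log[self.commit_index:]
        del self.copies[self.commit_index:]
        return lost
```

In a three-node cluster, a write stored by one follower is committed and survives force_reset(), while a write the leader has only buffered comes back as lost, matching the scenario in the quoted message; no acknowledged write is in the lost list.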
Kishore Nallan 04:08 PM
> Does this mean the node will pull the latest IP addresses from the node file (which it previously would only do if the cluster has quorum)?
Mar 30, 2023
Kishore Nallan 03:05 AM
Apr 02, 2023
Apr 03, 2023
Kishore Nallan 04:30 PM
Troubleshooting IP Update on Kubernetes Typesense
Alessandro and Damien are having issues with old IP addresses in a Kubernetes Typesense cluster not being updated. Kishore Nallan provided possible troubleshooting solutions and mentioned the need for a fix for DNS retries. Aljosa shared a suggested update strategy.
Debugging and Recovery of a Stuck Typesense Cluster
Charlie had a wedged staging cluster. Jason provided debugging and recovery steps, and Adrian helped with more insights. It turns out the issue was insufficient disk space. Once Adrian increased the disk size, the cluster healed itself.
Testing High Availability with Raft Returns Crashes
pboros reports an issue with usual crashes when testing high availability with Raft. Kishore Nallan suggests checking the quorum recovery period and efficiently logging the crash on all nodes. The issue persists, with pboros suspecting it's due to the hostname no longer being resolvable once a container is killed.
Segfault in Typesense 0.25.0rc24 during Node Restart
Charlie reported a segfault while restarting a node in a k8s deployment using version 0.25.0.rc24. Kishore Nallan advised rolling rotations for nodes and confirmed that nodes will join as the cluster expands.
Troubleshooting Typesense Cluster Multi-node Leadership Error
Bill experienced a problem with a new Typesense cluster, receiving an error about no leader and health status issues. Jason and Kishore Nallan provided troubleshooting steps and determined it was likely due to a communication issue between nodes. Kishore Nallan identified a potential solution involving resetting the data directory. After doing this, Bill reported the error was resolved.