Solution for Lost Cluster Quorum Due to IP Rotation
TLDR Kishore Nallan helped Pavan understand that the Kubernetes clusters were losing quorum due to multiple node rotations and subsequent changes in their IPs.
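The majority arithmetic behind this kind of quorum loss can be sketched as follows. This is a minimal illustration, assuming a three-node Raft cluster whose peers are identified by IP address. The addresses and the helper names `quorum_size` and `has_quorum` are hypothetical and are not part of any Typesense or Kubernetes API.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest majority of a Raft cluster with `cluster_size` voting members."""
    return cluster_size // 2 + 1


def has_quorum(configured_peers: set[str], reachable_peers: set[str]) -> bool:
    """True if a majority of the configured peers still answer at their configured addresses."""
    alive = configured_peers & reachable_peers
    return len(alive) >= quorum_size(len(configured_peers))


# Hypothetical peer addresses, for illustration only.
configured = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}

# One node rotates to a new IP: two of the three configured addresses still answer.
after_one_rotation = {"10.0.0.1", "10.0.0.2", "10.0.0.9"}
print(has_quorum(configured, after_one_rotation))   # True

# Two nodes rotate: only one configured address still answers.
after_two_rotations = {"10.0.0.1", "10.0.0.8", "10.0.0.9"}
print(has_quorum(configured, after_two_rotations))  # False
```

With three peers the majority is two, so a single rotation is survivable, while two rotations before the peer list is refreshed leave the remaining node without a majority.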
Nov 16, 2022
Kishore Nallan 08:02 AM
Kishore Nallan 10:10 AM
Testing High Availability with Raft Returns Crashes
pboros reports recurring crashes when testing high availability with Raft. Kishore Nallan suggests checking the quorum recovery period and logging the crash on all nodes. The issue persists, and pboros suspects the cause is that a hostname is no longer resolvable once its container is killed.
Troubleshooting IP Update on Kubernetes Typesense
Alessandro and Damien are having issues with old IP addresses in a Kubernetes Typesense cluster not being updated. Kishore Nallan provides possible troubleshooting steps and mentions the need for a fix for DNS retries. Aljosa shares a suggested update strategy.
Debugging and Recovery of a Stuck Typesense Cluster
Charlie had a wedged staging cluster. Jason provided debugging and recovery steps, and Adrian added further insights. The cause turned out to be insufficient disk space. Once Adrian increased the disk size, the cluster healed itself.