#community-help

Solutions for HA Issues Running Typesense in Kubernetes

TLDR: Lane shared a solution for fixing HA issues in k8s, using an InitContainer to update the config map. Kishore Nallan introduced a cluster auto reset peers feature. Users discussed limitations of the approach and potential deployment considerations.

Mar 13, 2023 (7 months ago)
Lane
08:31 PM
Jason Kishore Nallan I think I have come up with a solution for fixing the HA issues when running TS in k8s.

https://github.com/typesense/typesense/issues/465

I started off with the sidecar idea but took a slightly different tack.

Instead of each node having its own sidecar that writes to a mapped volume, I have an app that monitors the TS namespace and, when a pod recycles, updates the config map instead.

This approach solves pretty much every scenario except when all pods get recycled at once. But I think we could fix that edge case as well with a minor tweak to the TS codebase. When a node comes up leaderless it doesn't appear to check the config map again. Adding some logic to recheck the config map when leaderless should solve that.
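
(For illustration only: a minimal sketch of that watcher pattern using the official Python kubernetes client. The namespace, label selector, ConfigMap name, and ports below are assumptions, not Lane's actual implementation.)

```python
# Sketch: watch the Typesense namespace and rewrite the nodes ConfigMap
# whenever a pod event fires. All names/ports are illustrative assumptions.
from kubernetes import client, config, watch

NAMESPACE = "typesense"           # assumed namespace
CONFIGMAP = "typesense-nodes"     # assumed ConfigMap backing the --nodes file
LABEL_SELECTOR = "app=typesense"  # assumed pod label
PEERING_PORT, API_PORT = 8107, 8108

def current_nodes(v1):
    """Build the Typesense nodes string (ip:peering_port:api_port,...) from live pod IPs."""
    pods = v1.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR)
    ips = sorted(p.status.pod_ip for p in pods.items if p.status.pod_ip)
    return ",".join(f"{ip}:{PEERING_PORT}:{API_PORT}" for ip in ips)

def main():
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    w = watch.Watch()
    # Re-list and patch on every ADDED/MODIFIED/DELETED pod event; crude but simple.
    for _event in w.stream(v1.list_namespaced_pod, NAMESPACE,
                           label_selector=LABEL_SELECTOR):
        v1.patch_namespaced_config_map(
            CONFIGMAP, NAMESPACE, {"data": {"nodes": current_nodes(v1)}})

if __name__ == "__main__":
    main()
```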

If there's interest I have tentative approval from my company to open source it.

Jason
08:53 PM
I’ll let Kishore speak to this
Mar 14, 2023 (7 months ago)
Kishore Nallan
08:55 AM
Lane

> When a node comes up leaderless it doesn't appear to check the config map again. Adding some logic to recheck the config map when leaderless should solve that.
We do this because resetting of peers could lead to data loss if, for example, a current leader has buffered the write but has not sent it to the followers yet. In this scenario, if the peers are force reset, that buffered write could be lost.

Let me see if I can find a safe way to handle this.
Sergio
10:59 AM
> except when all pods get recycled at once
If it's configured as a StatefulSet and has a readiness probe this should never happen, unless there's a massive node failure, but that should be solved by having nodes in multiple AZs.
Lane
02:00 PM
> If it's configured as a StatefulSet and has a readiness probe this should never happen, unless there's a massive node failure, but that should be solved by having nodes in multiple AZs.
Yeah, that's how we have our cluster set up.
Mar 23, 2023 (6 months ago)
Lane
07:03 PM
After playing around with the first attempt I mentioned, I came up with what I believe to be a better solution.

Instead of a Watcher, I moved the logic into an InitContainer. The InitContainer does what the code did before, but does it before the TS pod starts up.

This appears to handle when all the pods are restarted at once and when individual pods are recycled.

When a new pod comes up, it takes about 30 seconds for the other TS pods to pick up its new values from the config map. Once that happens the new pod gets brought into the cluster cleanly.
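
(Again purely illustrative: a rough Python sketch of that InitContainer variant, with the same assumed names as above. It runs once before the Typesense container starts, pauses briefly so sibling pods have IPs, and patches the ConfigMap.)

```python
# Sketch of the InitContainer variant: run once before the Typesense container
# starts. Namespace/ConfigMap/label/ports are illustrative assumptions.
import time
from kubernetes import client, config

NAMESPACE, CONFIGMAP = "typesense", "typesense-nodes"
LABEL_SELECTOR = "app=typesense"
PEERING_PORT, API_PORT = 8107, 8108

def main():
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    # Crude pause so pending sibling pods have been scheduled and assigned IPs.
    time.sleep(10)
    pods = v1.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR)
    ips = sorted(p.status.pod_ip for p in pods.items if p.status.pod_ip)
    nodes = ",".join(f"{ip}:{PEERING_PORT}:{API_PORT}" for ip in ips)
    v1.patch_namespaced_config_map(CONFIGMAP, NAMESPACE, {"data": {"nodes": nodes}})

if __name__ == "__main__":
    main()
```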
Mar 24, 2023 (6 months ago)
Kishore Nallan
05:19 AM
Are you sure it's able to handle 3 pods getting rotated out at the same time (i.e. all of their IPs changing)? One of the problems that people have faced is that the old pod IPs are persisted in an internal state and it's not possible to recover without a force reset because the cluster still expects the pods to join back with the old IPs.
Sergio
09:29 AM
Also, I have a question about how that implementation handles pod rotation. The new pod will get all the IPs updated when it starts, but how does an old pod discover the new pod's IP?
Mar 28, 2023 (6 months ago)
Kishore Nallan
11:22 AM
Ok, I've figured out a way to make the cluster auto reset the peer list on error during startup (in the case when all pods are rotated and come up with new IPs).

Check the typesense/typesense:0.25.0.rc18 Docker build. To enable this feature, set the --reset-peers-on-error flag or the TYPESENSE_RESET_PEERS_ON_ERROR=TRUE environment variable. Try it out and let me know if this works, Sergio Lane -- hopefully we can put this whole peer change issue behind us with this fix.

Mar 29, 2023 (6 months ago)
Lane
01:44 AM
> Are you sure it's able to handle 3 pods getting rotated out at the same time
Yes, on my local machine using Rancher Desktop with no disruption budget I can delete one-to-n nodes at a time and the cluster seems to handle it gracefully for the most part. Things can get out of whack if you try hard enough, but you have to be a little intentional.

Currently the code just sleeps for a few seconds to make sure the cluster has allocated all the pending pods long enough for each to get an IP. I could easily extend it to wait for some sort of minimum pod count instead, but for my purposes right now it's Good Enough™.
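
(A sketch of what that minimum-pod-count wait could look like, in the same illustrative Python style; the namespace, label selector, and count of 3 are assumptions.)

```python
# Sketch: poll until a minimum number of Typesense pods have IPs, instead of
# a fixed sleep. Names and the default count are illustrative assumptions.
import time
from kubernetes import client, config

def wait_for_pod_ips(v1, namespace="typesense", label_selector="app=typesense",
                     minimum=3, timeout=120):
    deadline = time.time() + timeout
    ips = []
    while time.time() < deadline:
        pods = v1.list_namespaced_pod(namespace, label_selector=label_selector)
        ips = [p.status.pod_ip for p in pods.items if p.status.pod_ip]
        if len(ips) >= minimum:
            return ips
        time.sleep(2)
    raise TimeoutError(f"only {len(ips)} pod IPs available after {timeout}s")

if __name__ == "__main__":
    config.load_incluster_config()
    print(wait_for_pod_ips(client.CoreV1Api()))
```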

I'm still letting things marinate in my environment. There's still an edge case I'm tracking down. It's not related to pods recycling; my current suspicion is it's network-related. Our production environment has had the leaderless cluster issue 2, maybe 3, times in a year. However, our lower environment has it on a nearly daily basis. I'm 99.5% confident it's not because the pods are recycling, it's something more fundamental than that (or else one of our sys admins has been playing a practical joke on me 😅).

> The new pod will get all the IPs updated when it starts, but how does an old pod discover the new pod's IP?
The config map is updated instantly across all pods. So it's purely a matter of how long it takes for the old pods to realize the config map was updated. In my testing that's somewhere between 30-60 seconds (I haven't actually timed it, so that's a guesstimate).

If there could be some sort of listener for when the config map gets updated, that would make this whole thing nearly instantaneous.
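
(One way such a listener could look, again as an illustrative Python sketch: watch the ConfigMap through the API and rewrite the nodes file directly, instead of waiting for the kubelet to sync the mounted volume. This assumes --nodes points at a writable path such as an emptyDir, not a read-only ConfigMap mount.)

```python
# Sketch: react to ConfigMap updates immediately by rewriting the nodes file
# that Typesense periodically re-reads. Names and paths are assumptions.
from kubernetes import client, config, watch

NAMESPACE, CONFIGMAP = "typesense", "typesense-nodes"
NODES_FILE = "/typesense/nodes"  # assumed writable path passed to --nodes

def main():
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    w = watch.Watch()
    for event in w.stream(v1.list_namespaced_config_map, NAMESPACE,
                          field_selector=f"metadata.name={CONFIGMAP}"):
        nodes = (event["object"].data or {}).get("nodes", "")
        if nodes:
            with open(NODES_FILE, "w") as f:
                f.write(nodes)

if __name__ == "__main__":
    main()
```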
Lane
01:53 AM
I should have prefaced that^ with the fact that I'm not testing writes in great depth. We may run +/-5 content crawls in a day. If we lose a little data because of a blip, I don't really care. We're just indexing documentation, so if we're slightly out of date on a couple of pages for a bit it's not a big deal.
Sergio
10:38 AM
> The config map is updated instantly across all pods.
True that! Sounds like a more on-demand solution, which should work properly too!
I'm curious, do you have that script to share? It uses the k8s API to update the config map, right?
Lane
01:57 PM
> I'm curious, do you have that script to share? It uses the k8s API to update the config map, right?
Correct, the code uses the .NET library for k8s.

I am working with my company to open source the code. If there's interest I can press a bit harder.

Adrian
03:00 PM
I would be interested as well! I'm evaluating Typesense for a search use case at my company, and k8s deployment is one of the open issues we need to figure out.
Adrian
03:40 PM
Kishore Nallan
> We do this because resetting of peers could lead to data loss if, for example, a current leader has buffered the write but has not sent it to the followers yet. In this scenario, if the peers are force reset, that buffered write could be lost.
When you say "current leader has buffered the write but has not sent it to the followers yet" -- do you mean the write is not committed yet? In that case I think that's acceptable, since we should only expect writes to persist once committed, and any write API calls should not have returned success yet. Is that thinking correct?
> auto reset the peer list on error during startup
Just want to make sure I know what you mean by this. Does this mean the node will pull the latest IP addresses from the node file (which it previously would only do if the cluster has quorum)?
If so, I know you said here this was a potentially dangerous action that could lead to data loss. My thinking was that it should not be dangerous, since you could just start a new election and it should not be possible for committed data to get lost. Does that thinking make sense? Or if not, how did you get around the risk of data loss?

Also, disclaimer: I'm still getting context on this problem/the existing workarounds. So apologies if these questions are at all naive or repetitive :)
Kishore Nallan
04:08 PM
There are a lot of nuances with Raft, some of which also come down to the specific implementation details of a given raft library. The warning about reset_peers is from the raft library we use (braft), but I think in a state where all nodes are restarting from scratch, calling this API should be safe because there are no ongoing writes in that state.

> Does this mean the node will pull the latest IP addresses from the node file (which it previously would only do if the cluster has quorum)?
Correct.

Adrian
09:50 PM
Kishore Nallan Curious for opinions on another deployment approach my team is considering. We are considering creating one Service per Typesense node. Services have a stable IP address, so this way we won't have to deal with changing IP addresses. Does this approach sound feasible, or are we missing a possible drawback? We realize it would make autoscaling difficult, but I think we're fine with that tradeoff.
Mar 30, 2023 (6 months ago)
Lane
03:04 AM
If your IPs are stable then there's no worry. You'll never have to deal with any of this. These problems only exist because we're trying to get TS to work in k8s, where dynamic IPs cause all sorts of merry chaos.

Kishore Nallan
03:05 AM
I agree with the above 👍

Apr 02, 2023 (6 months ago)
Dima
09:07 PM
Hi Kishore Nallan! Could you please share the tar.gz for 0.25.0.rc18?
Apr 03, 2023 (6 months ago)
Kishore Nallan
04:30 PM
tar.gz for the latest RC build:
