We have a 5 node typesense cluster running locally...
# community-help
a
We have a 5-node Typesense cluster running locally. After we restarted 3 of the nodes, the cluster is taking a long time to recover.
k
You cannot restart 3 of them at the same time. You have to maintain a majority of nodes at all times.
a
But there can be genuine outages. Like box failures.
Elasticsearch, for instance, recovers as soon as it is up (though it takes about 1 min to come up).
k
Those are different design decisions, so you can't compare the systems that way. The trade-offs each system chooses are different. Elasticsearch stores all indices on disk, which is why it's slow.
j
But there can be genuine outages. Like box failures.
This is a property of the Raft protocol we use for consensus: a 5-node cluster can only tolerate 2 simultaneous node failures. Beyond that, the cluster loses quorum and needs manual intervention to recover (the HA docs describe how to do this). So when you do upgrades, you have to rotate 1 node at a time in a 3-node cluster and at most 2 nodes at a time in a 5-node cluster, and wait until those nodes have fully re-indexed the data in memory before doing additional rotations.
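
To make the quorum numbers concrete, here is a rough sketch of the majority arithmetic behind Raft-style consensus. It is plain math, not Typesense code, and the function names are hypothetical, chosen only for illustration:

```python
# Majority (quorum) arithmetic for a Raft-style cluster.
# Illustrative only: these helpers are hypothetical, not part of Typesense.

def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def max_tolerated_failures(nodes: int) -> int:
    """How many nodes can be down while a majority is still up."""
    return nodes - quorum_size(nodes)


def has_quorum(nodes: int, down: int) -> bool:
    """True if the nodes still running form a majority."""
    return nodes - down >= quorum_size(nodes)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"tolerates {max_tolerated_failures(n)} down")
    # Restarting 3 of 5 nodes at once leaves only 2 running, below the quorum of 3.
    print("5 nodes, 3 down, quorum kept:", has_quorum(5, 3))
```

With 5 nodes the quorum is 3, so taking 3 nodes down at once leaves only 2 running and the cluster can no longer elect a leader, which matches the situation in the original question.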