Hello, my company is evaluating Typesense to replace Algolia for our search use cases. We will be self-hosting due to customer security requirements, and we have a high-availability SLA for this service. I have a few questions based on
this section of the Typesense documentation on HA. Cc
@Jason Bosco since we talked about adjacent things when we met last Friday. For context, I am generally familiar with the Raft algorithm (I implemented a simple version a few years ago), but I am not familiar with how it is used in Typesense or with the implementation details of Braft.
1. It makes sense to me that Typesense stops accepting writes when quorum is lost, but why does it also stop accepting reads, given that reads are "served by the node that receives it" during normal operation? I don't see why reads could not continue to be served by any nodes still running once quorum is lost. The reads should be no more out of date than is already possible in normal operation, unless I am missing something here.
2. Why can the cluster not recover without manual intervention once quorum is lost? If quorum is lost but the down nodes then come back online, I would expect a normal election to be possible and a new leader to be elected. The documentation cites the risk of a split brain, but afaik this is not possible in Raft: any write requires acks from a majority of nodes, so there can be only one active leader at a time.
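To make the reasoning in point 2 concrete, here is a minimal sketch of the majority arithmetic I have in mind. It is plain Raft-style counting, not Typesense or Braft code, and the function names are my own, purely for illustration.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, reachable: int) -> bool:
    """True if `reachable` nodes (including self) form a majority."""
    return reachable >= quorum_size(cluster_size)


def two_majorities_possible(cluster_size: int) -> bool:
    """Check every split of the cluster into two disjoint groups and
    report whether both groups could hold a majority at once."""
    for side_a in range(cluster_size + 1):
        side_b = cluster_size - side_a
        if has_quorum(cluster_size, side_a) and has_quorum(cluster_size, side_b):
            return True
    return False


if __name__ == "__main__":
    for n in (3, 5):
        print(n, quorum_size(n), two_majorities_possible(n))
```

Under this counting, a 3-node cluster that drops to 1 node has no quorum, but once a second node rejoins it has 2 of 3 again, which is why I would expect an election to succeed at that point.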
I appreciate any input on these points!