# community-help
p
Hi, when testing high availability with raft, when I kill a node, the new elected leader crashes intermittently.
Copy code
ts3     | I20220801 06:02:29.496749   199 raft_server.cpp:534] Term: 4, last_index index: 3, committed_index: 3, known_applied_index: 3, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 0
ts3     | I20220801 06:02:29.496763   199 node.cpp:3029] node default_group:172.18.0.4:8107:8108 change_peers from 172.18.0.3:8107:8108,172.18.0.4:8107:8108,172.18.0.5:8107:8108 to , begin removing.
ts3     | F20220801 06:02:29.496856   272 replicator.cpp:607] Check failed: !entry->peers->empty() log_index=4
ts3     | *** Check failure stack trace: ***
ts3     |     @          0x109b422  google::LogMessage::Fail()
ts3     |     @          0x109b380  google::LogMessage::SendToLog()
ts3     |     @          0x109acc2  google::LogMessage::Flush()
ts3     |     @          0x109e19c  google::LogMessageFatal::~LogMessageFatal()
ts3     |     @           0x7609ad  braft::Replicator::_prepare_entry()
ts3     |     @           0x769415  braft::Replicator::_send_entries()
ts3     |     @           0x769e68  braft::Replicator::_continue_sending()
ts3     |     @           0x7b274d  braft::LogManager::run_on_new_log()
ts3     |     @           0x8085af  bthread::TaskGroup::task_runner()
ts3     |     @           0x963441  bthread_make_fcontext
ts3     | E20220801 06:02:29.858803   272 backward.hpp:4199] Stack trace (most recent call last) in thread 272:
ts3     | E20220801 06:02:29.858835   272 backward.hpp:4199] #12   Object "/opt/typesense-server", at 0x963440, in bthread_make_fcontext
ts3     | E20220801 06:02:29.858839   272 backward.hpp:4199] #11   Object "/opt/typesense-server", at 0x8085ae, in bthread::TaskGroup::task_runner(long)
ts3     | E20220801 06:02:29.858846   272 backward.hpp:4199] #10   Source "/opt/braft-80d97b2475b3c0afca79c19b64d46bb665d704f4/src/braft/log_manager.cpp", line 831, in run_on_new_log [0x7b274c]
ts3     | E20220801 06:02:29.858848   272 backward.hpp:4199] #9    Source "/opt/braft-80d97b2475b3c0afca79c19b64d46bb665d704f4/src/braft/replicator.cpp", line 723, in _continue_sending [0x769e67]
ts3     | E20220801 06:02:29.858850   272 backward.hpp:4199] #8    Source "/opt/braft-80d97b2475b3c0afca79c19b64d46bb665d704f4/src/braft/replicator.cpp", line 649, in _send_entries [0x769414]
ts3     | E20220801 06:02:29.858851   272 backward.hpp:4199] #7    Source "/opt/braft-80d97b2475b3c0afca79c19b64d46bb665d704f4/src/braft/replicator.cpp", line 617, in _prepare_entry [0x7609ac]
ts3     | E20220801 06:02:29.858853   272 backward.hpp:4199] #6    Object "/opt/typesense-server", at 0x109e19b, in google::LogMessageFatal::~LogMessageFatal()
ts3     | E20220801 06:02:29.858855   272 backward.hpp:4199] #5    Object "/opt/typesense-server", at 0x109acc1, in google::LogMessage::Flush()
ts3     | E20220801 06:02:29.858856   272 backward.hpp:4199] #4    Object "/opt/typesense-server", at 0x109b37f, in google::LogMessage::SendToLog()
ts3     | E20220801 06:02:29.858858   272 backward.hpp:4199] #3    Object "/opt/typesense-server", at 0x109b421, in google::LogMessage::Fail()
ts3     | E20220801 06:02:29.858860   272 backward.hpp:4199] #2    Object "/opt/typesense-server", at 0x10a26b9, in google::DumpStackTraceAndExit()
ts3     | E20220801 06:02:29.858861   272 backward.hpp:4199] #1    Source "/build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c", line 79, in abort [0x7f72626e1858]
ts3     | E20220801 06:02:29.858863   272 backward.hpp:4199] #0    Source "../sysdeps/unix/sysv/linux/raise.c", line 51, in raise [0x7f726270200b]
ts3     | Aborted (Signal sent by tkill() 1 0)
Has anybody seen this so far?
k
Looks like the leader's peers are empty. I've seen these types of errors happen if you kill one too many nodes without giving the cluster a chance to regain quorum. Are you giving it sufficient time for the quorum to recover? We've also improved some edge cases in 0.23.1 (I saw you testing with 0.23.0 previously).
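One hedged way to confirm the quorum has recovered before killing the next node is to poll the /debug endpoint (the same one used later in this thread) until some node reports state 1, which corresponds to the leader; the ts1..ts3 hostnames and TYPESENSE_API_KEY variable are assumptions taken from this conversation:
# wait_for_leader: poll each node's /debug endpoint until one reports
# "state":1 (the leader, as observed in the output below), or give up
# after roughly 60 seconds.
wait_for_leader() {
  for attempt in $(seq 1 30); do
    for i in 1 2 3; do
      if curl -s "http://ts${i}:8108/debug" \
           -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" | grep -q '"state":1'; then
        echo "ts${i} is the leader"
        return 0
      fi
    done
    sleep 2
  done
  echo "no leader elected within timeout" >&2
  return 1
}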
p
I just killed one. I saw one of the nodes go into the leader state and then crash like this shortly after.
I will check 0.23.1
k
If it continues to happen, I'll need the full logs leading up to the crash on all nodes.
p
It happens with 0.23.1 as well. Not always, but about once every 2-3 attempts.
let me grab logs
I have 3 nodes, ts{1,2,3}
Copy code
root@1e235d517537:/# for i in $(seq 1 3) ; do echo -n "ts${i}: " ;  curl "http://ts${i}:8108/debug" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" ; echo ; done
ts1: {"state":1,"version":"0.23.1"}
ts2: {"state":4,"version":"0.23.1"}
ts3: {"state":4,"version":"0.23.1"}
root@1e235d517537:/# for i in $(seq 1 3) ; do echo -n "ts${i}: " ;  curl "http://ts${i}:8108/debug" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" ; echo ; done
ts1: curl: (6) Could not resolve host: ts1

ts2: {"state":4,"version":"0.23.1"}
ts3: {"state":1,"version":"0.23.1"}
root@1e235d517537:/#
root@1e235d517537:/# for i in $(seq 1 3) ; do echo -n "ts${i}: " ;  curl "http://ts${i}:8108/debug" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" ; echo ; done
ts1: curl: (6) Could not resolve host: ts1

ts2: {"state":4,"version":"0.23.1"}
ts3: curl: (6) Could not resolve host: ts3
I killed ts1; ts3 was elected leader and crashed shortly after.
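A hedged sketch of one way to capture the full logs requested above from all nodes while reproducing this under Docker Compose (the ts1/ts2/ts3 service names are an assumption based on this thread):
# Stream logs from all three services into one file, then kill a node and
# give the survivors time to elect a leader before checking for the crash.
docker compose logs -f ts1 ts2 ts3 > cluster.log 2>&1 &
LOGS_PID=$!
docker compose kill ts1
sleep 30
grep -n "Check failed" cluster.log || echo "no crash this round"
kill ${LOGS_PID}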
k
Were these brand new nodes?
p
Yes
With no data even
k
What does the nodes file contain? Also, are you running these on localhost or remotely?
p
I am running them in docker compose, so each node is a separate container
Copy code
ts1:8107:8108,ts2:8107:8108,ts3:8107:8108
This is the nodes file
k
Okay, that looks fine. Is it easy for you to just run them from the raw binary on separate ports? Our "chaos monkey" test for clustering runs that way, and I've never seen such a trivial issue via that or during any of our rotations (we've done thousands of live rotations on Typesense Cloud). Not ruling out a bug, but I'm a bit surprised to see the log line about
change_peers from 172.18.0.3:8107:8108,172.18.0.4:8107:8108,172.18.0.5:8107:8108 to , begin removing.
-- not sure why the peer list is becoming empty.
p
You mean in one OS image? Sure I can do that.
It could be some kind of race condition because it doesn't always happen
k
Yes, just start the executables one by one on different ports.
Copy code
./typesense-server --data-dir=/tmp/node-data-1 --api-key=abcd --nodes=/tmp/nodes --api-port=6108 --peering-port=6107 --peering-address=127.0.0.1
Then you can have a nodes file like this locally:
Copy code
127.0.0.1:6107:6108,127.0.0.1:7107:7108,127.0.0.1:8107:8108
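A minimal sketch of scripting that three-node local setup, reusing only the flags shown in the command above (the ports, data directories, and api key are placeholders):
# Write the local nodes file, then launch three instances of the same binary
# on distinct API/peering ports and data dirs.
echo "127.0.0.1:6107:6108,127.0.0.1:7107:7108,127.0.0.1:8107:8108" > /tmp/nodes
for n in 1 2 3; do
  mkdir -p /tmp/node-data-${n}
  ./typesense-server --data-dir=/tmp/node-data-${n} --api-key=abcd \
    --nodes=/tmp/nodes --api-port=$((5108 + n * 1000)) \
    --peering-port=$((5107 + n * 1000)) --peering-address=127.0.0.1 &
done
wait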
p
Ok, let me try that one
k
Since I'm not familiar with Docker Compose, it will be easier for me to debug if we can reproduce it this way.
p
We saw very similar symptoms when running in k8s.
Ok, let me try. I can use the macOS binary to see if it can be reproduced that way.
k
Speaking of Kubernetes: Typesense deployment on k8s has some quirks because the Raft library we use supports only raw IPs. See this comment for details: https://github.com/typesense/typesense/issues/465#issuecomment-1173536082
p
Yes, found that
I can't reproduce the issue locally, and I think that points to the cause.
Both in k8s and Docker Compose, when a container is killed its hostname can no longer be resolved.
If a local process dies, it's just a connection reset at the TCP level.
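A quick sketch of checking that difference (the hostnames, ports, and the abcd key reuse values from earlier in this thread):
# After "docker compose kill ts1": the name itself stops resolving.
getent hosts ts1 || echo "ts1 no longer resolves"
# After killing a bare local process: the address still resolves,
# the connection is simply refused.
curl -s --max-time 2 "http://127.0.0.1:6108/debug" \
  -H "X-TYPESENSE-API-KEY: abcd" || echo "local node down, address still resolvable"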
k
Typesense does resolve hostnames to IPs periodically, and on k8s, having a sidecar that directly generates the IPs works, unless k8s decides to relocate all the nodes.
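For reference, a hedged sketch of that sidecar idea: periodically resolve the peer hostnames and rewrite the nodes file with raw IPs. The ts1..ts3 hostnames, ports, and /tmp/nodes path are assumptions from this thread; see the linked issue comment for the actual approach.
# Resolve each peer hostname with getent and rewrite the nodes file as raw IPs.
while true; do
  entries=()
  for host in ts1 ts2 ts3; do
    ip=$(getent hosts "${host}" | awk '{print $1}')
    [ -n "${ip}" ] && entries+=("${ip}:8107:8108")
  done
  (IFS=,; printf '%s\n' "${entries[*]}") > /tmp/nodes
  sleep 30
done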