Segfault in Typesense 0.25.0.rc24 during Node Restart
TLDR Charlie reported a segfault while restarting a node in a k8s deployment running version 0.25.0.rc24. Kishore Nallan advised rotating nodes in a rolling fashion and confirmed that nodes will join as the cluster expands.
May 14, 2023
Charlie
08:58 PM
Charlie
08:58 PM
I0512 23:24:54.815885 566 external/com_github_brpc_braft/src/braft/node.cpp:2202] node default_group:192.168.131.159:8107:8108 received RequestVote from 192.168.128.63:8107:8108 in term 44 current_term 43 log_is_ok 1 votable_time 0
I20230512 23:24:54.815958 550 raft_server.h:287] Node stops following { leader_id=192.168.132.42:8107:8108, term=43, status=Raft node receives higher term request_vote_request.}
I0512 23:24:54.822806 566 external/com_github_brpc_braft/src/braft/raft_meta.cpp:546] Saved single stable meta, path /usr/share/typesense/data/state/meta term 44 votedfor 0.0.0.0:0:0 time: 6865
I0512 23:24:54.824909 566 external/com_github_brpc_braft/src/braft/raft_meta.cpp:546] Saved single stable meta, path /usr/share/typesense/data/state/meta term 44 votedfor 192.168.128.63:8107:8108 time: 2058
I20230512 23:24:54.827086 566 raft_server.h:283] Node starts following { leader_id=192.168.128.63:8107:8108, term=44, status=Follower receives message from new leader with the same term.}
I20230512 23:24:55.325982 552 raft_server.h:278] Configuration of this group is 192.168.132.42:8107:8108,192.168.128.63:8107:8108,192.168.131.159:8107:8108
F0512 23:25:00.412377 552 external/com_github_brpc_braft/src/braft/node.cpp:2515] Check failed: entry.type() != ENTRY_TYPE_CONFIGURATION (3 vs 3).
#0 0x0000016c7482 logging::DestroyLogStream()
#1 0x0000016c55ff logging::LogMessage::~LogMessage()
#2 0x0000013369fc braft::NodeImpl::handle_append_entries_request()
#3 0x0000013874d3 braft::RaftServiceImpl::append_entries()
#4 0x0000013e2580 braft::RaftService::CallMethod()
#5 0x000001550e22 brpc::policy::ProcessRpcRequest()
#6 0x000001560e8a brpc::ProcessInputMessage()
#7 0x000001560f5b brpc::InputMessenger::InputMessageClosure::~InputMessageClosure()
#8 0x000001561ee1 brpc::InputMessenger::OnNewMessages()
#9 0x00000143663d brpc::Socket::ProcessEvent()
#10 0x000001633c26 bthread::TaskGroup::task_runner()
#11 0x0000016595e1 bthread_make_fcontext
E0512 23:25:01.179824 552 include/backward.hpp:4200] Stack trace (most recent call last) in thread 552:
E0512 23:25:01.179858 552 include/backward.hpp:4200] #13 Object "/opt/typesense-server", at 0x16595e0, in bthread_make_fcontext
E0512 23:25:01.179862 552 include/backward.hpp:4200] #12 Object "/opt/typesense-server", at 0x1633c25, in bthread::TaskGroup::task_runner(long)
E0512 23:25:01.179864 552 include/backward.hpp:4200] #11 Object "/opt/typesense-server", at 0x143663c, in brpc::Socket::ProcessEvent(void*)
E0512 23:25:01.179867 552 include/backward.hpp:4200] #10 Object "/opt/typesense-server", at 0x1561ee0, in brpc::InputMessenger::OnNewMessages(brpc::Socket*)
E0512 23:25:01.179870 552 include/backward.hpp:4200] #9 Object "/opt/typesense-server", at 0x1560f5a, in brpc::InputMessenger::InputMessageClosure::~InputMessageClosure()
E0512 23:25:01.179872 552 include/backward.hpp:4200] #8 Object "/opt/typesense-server", at 0x1560e89, in brpc::ProcessInputMessage(void*)
E0512 23:25:01.179874 552 include/backward.hpp:4200] #7 Object "/opt/typesense-server", at 0x1550e21, in brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*)
E0512 23:25:01.179879 552 include/backward.hpp:4200] #6 Object "/opt/typesense-server", at 0x13e257f, in braft::RaftService::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*)
E0512 23:25:01.179883 552 include/backward.hpp:4200] #5 Object "/opt/typesense-server", at 0x13874d2, in braft::RaftServiceImpl::append_entries(google::protobuf::RpcController*, braft::AppendEntriesRequest const*, braft::AppendEntriesResponse*, google::protobuf::Closure*)
E0512 23:25:01.179886 552 include/backward.hpp:4200] #4 Object "/opt/typesense-server", at 0x1336b0c, in braft::NodeImpl::handle_append_entries_request(brpc::Controller*, braft::AppendEntriesRequest const*, braft::AppendEntriesResponse*, google::protobuf::Closure*, bool)
E0512 23:25:01.179890 552 include/backward.hpp:4200] #3 Object "/opt/typesense-server", at 0x13a4ff3, in braft::LogManager::append_entries(std::vector<braft::LogEntry*, std::allocator<braft::LogEntry*> >*, braft::LogManager::StableClosure*)
E0512 23:25:01.179894 552 include/backward.hpp:4200] #2 Object "/opt/typesense-server", at 0x13a84e4, in braft::ConfigurationEntry::ConfigurationEntry(braft::LogEntry const&)
E0512 23:25:01.179897 552 include/backward.hpp:4200] #1 Object "/opt/typesense-server", at 0x13a825a, in braft::Configuration::operator=(std::vector<braft::PeerId, std::allocator<braft::PeerId> > const&)
E0512 23:25:01.179900 552 include/backward.hpp:4200] #0 Object "/opt/typesense-server", at 0x130ea74, in std::vector<braft::PeerId, std::allocator<braft::PeerId> >::size() const
Segmentation fault (Address not mapped to object [0x8])
E0512 23:25:01.796072 552 src/main/typesense_server.cpp:107] Typesense 0.25.0.rc24 is terminating abruptly.
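When triaging a log like the one above, it can help to pull out only the fatal records, since the `F` line (the failed check in braft's `node.cpp`) is what precedes the stack trace. The following is a minimal sketch, assuming the glog-style prefix seen in this log (severity letter, date as `mmdd` or `yyyymmdd`, time, thread id, `file:line]`). The names `GLOG_LINE` and `fatal_lines` are hypothetical helpers, not part of Typesense or braft.

```python
import re

# Assumed glog-style prefix, matching both date forms that appear in the log:
#   F0512 23:25:00.412377 552 path/to/file.cpp:2515] message
#   I20230512 23:24:55.325982 552 raft_server.h:278] message
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{8}|\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^\]]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def fatal_lines(log_text):
    """Return (file, line, message) for every FATAL ('F') record in log_text."""
    hits = []
    for raw in log_text.splitlines():
        match = GLOG_LINE.match(raw.strip())
        if match and match.group("severity") == "F":
            hits.append(
                (match.group("file"), int(match.group("line")), match.group("message"))
            )
    return hits


# Two records copied from the log above, plus one stack frame that should be ignored.
sample = "\n".join([
    "I20230512 23:24:54.827086 566 raft_server.h:283] Node starts following "
    "{ leader_id=192.168.128.63:8107:8108, term=44, status=Follower receives "
    "message from new leader with the same term.}",
    "F0512 23:25:00.412377 552 external/com_github_brpc_braft/src/braft/node.cpp:2515] "
    "Check failed: entry.type() != ENTRY_TYPE_CONFIGURATION (3 vs 3).",
    "#0 0x0000016c7482 logging::DestroyLogStream()",
])

for source_file, line_no, message in fatal_lines(sample):
    print(f"{source_file}:{line_no}: {message}")
```

Stack-frame lines such as `#0 0x...` do not match the prefix pattern and are skipped, so only the failed check is reported.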
May 15, 2023
Kishore Nallan
03:21 AM
Adrian
02:56 PM
Kishore Nallan
02:59 PM
Charlie
03:13 PM
Kishore Nallan
03:14 PM
Charlie
03:15 PM
If we move to a 5-node cluster, is it OK to temporarily have 3 IPs?
Kishore Nallan
03:16 PM
Charlie
03:16 PM
Charlie
03:16 PM
Kishore Nallan
03:17 PM
Charlie
03:21 PM
Kishore Nallan
03:22 PM
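As background for the 5-node question above: Raft, which Typesense uses for clustering, needs a majority of the configured peers to elect a leader and commit writes. The sketch below only works through that majority arithmetic. It is not a restatement of the reply given in the thread, and `raft_quorum` and `tolerated_failures` are hypothetical helper names.

```python
def raft_quorum(cluster_size):
    """Smallest number of nodes that forms a majority of cluster_size peers."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many peers can be down while a majority is still reachable."""
    return cluster_size - raft_quorum(cluster_size)


for size in (3, 5):
    print(
        f"{size} nodes: quorum={raft_quorum(size)}, "
        f"tolerated failures={tolerated_failures(size)}"
    )
```

Under this arithmetic, three reachable peers are exactly a majority of five, so a 5-node configuration with only three nodes up has no remaining headroom for another failure.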