# community-help
c
Hello, we are using version 0.25.0rc24 and have received a segfault when restarting a different node. I do not see any changelog for RC versions, and am hoping to verify that the latest rc29 has this fixed (or submit a bug report 🙂). I am also curious if there is a timeline for a GA version of 0.25. Thank you!
```
I0512 23:24:54.815885   566 external/com_github_brpc_braft/src/braft/node.cpp:2202] node default_group:192.168.131.159:8107:8108 received RequestVote from 192.168.128.63:8107:8108 in term 44 current_term 43 log_is_ok 1 votable_time 0
I20230512 23:24:54.815958   550 raft_server.h:287] Node stops following { leader_id=192.168.132.42:8107:8108, term=43, status=Raft node receives higher term request_vote_request.}
I0512 23:24:54.822806   566 external/com_github_brpc_braft/src/braft/raft_meta.cpp:546] Saved single stable meta, path /usr/share/typesense/data/state/meta term 44 votedfor 0.0.0.0:0:0 time: 6865
I0512 23:24:54.824909   566 external/com_github_brpc_braft/src/braft/raft_meta.cpp:546] Saved single stable meta, path /usr/share/typesense/data/state/meta term 44 votedfor 192.168.128.63:8107:8108 time: 2058
I20230512 23:24:54.827086   566 raft_server.h:283] Node starts following { leader_id=192.168.128.63:8107:8108, term=44, status=Follower receives message from new leader with the same term.}
I20230512 23:24:55.325982   552 raft_server.h:278] Configuration of this group is 192.168.132.42:8107:8108,192.168.128.63:8107:8108,192.168.131.159:8107:8108
F0512 23:25:00.412377   552 external/com_github_brpc_braft/src/braft/node.cpp:2515] Check failed: entry.type() != ENTRY_TYPE_CONFIGURATION (3 vs 3).
#0 0x0000016c7482 logging::DestroyLogStream()
#1 0x0000016c55ff logging::LogMessage::~LogMessage()
#2 0x0000013369fc braft::NodeImpl::handle_append_entries_request()
#3 0x0000013874d3 braft::RaftServiceImpl::append_entries()
#4 0x0000013e2580 braft::RaftService::CallMethod()
#5 0x000001550e22 brpc::policy::ProcessRpcRequest()
#6 0x000001560e8a brpc::ProcessInputMessage()
#7 0x000001560f5b brpc::InputMessenger::InputMessageClosure::~InputMessageClosure()
#8 0x000001561ee1 brpc::InputMessenger::OnNewMessages()
#9 0x00000143663d brpc::Socket::ProcessEvent()
#10 0x000001633c26 bthread::TaskGroup::task_runner()
#11 0x0000016595e1 bthread_make_fcontext

E0512 23:25:01.179824   552 include/backward.hpp:4200] Stack trace (most recent call last) in thread 552:
E0512 23:25:01.179858   552 include/backward.hpp:4200] #13   Object "/opt/typesense-server", at 0x16595e0, in bthread_make_fcontext
E0512 23:25:01.179862   552 include/backward.hpp:4200] #12   Object "/opt/typesense-server", at 0x1633c25, in bthread::TaskGroup::task_runner(long)
E0512 23:25:01.179864   552 include/backward.hpp:4200] #11   Object "/opt/typesense-server", at 0x143663c, in brpc::Socket::ProcessEvent(void*)
E0512 23:25:01.179867   552 include/backward.hpp:4200] #10   Object "/opt/typesense-server", at 0x1561ee0, in brpc::InputMessenger::OnNewMessages(brpc::Socket*)
E0512 23:25:01.179870   552 include/backward.hpp:4200] #9    Object "/opt/typesense-server", at 0x1560f5a, in brpc::InputMessenger::InputMessageClosure::~InputMessageClosure()
E0512 23:25:01.179872   552 include/backward.hpp:4200] #8    Object "/opt/typesense-server", at 0x1560e89, in brpc::ProcessInputMessage(void*)
E0512 23:25:01.179874   552 include/backward.hpp:4200] #7    Object "/opt/typesense-server", at 0x1550e21, in brpc::policy::ProcessRpcRequest(brpc::InputMessageBase*)
E0512 23:25:01.179879   552 include/backward.hpp:4200] #6    Object "/opt/typesense-server", at 0x13e257f, in braft::RaftService::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*)
E0512 23:25:01.179883   552 include/backward.hpp:4200] #5    Object "/opt/typesense-server", at 0x13874d2, in braft::RaftServiceImpl::append_entries(google::protobuf::RpcController*, braft::AppendEntriesRequest const*, braft::AppendEntriesResponse*, google::protobuf::Closure*)
E0512 23:25:01.179886   552 include/backward.hpp:4200] #4    Object "/opt/typesense-server", at 0x1336b0c, in braft::NodeImpl::handle_append_entries_request(brpc::Controller*, braft::AppendEntriesRequest const*, braft::AppendEntriesResponse*, google::protobuf::Closure*, bool)
E0512 23:25:01.179890   552 include/backward.hpp:4200] #3    Object "/opt/typesense-server", at 0x13a4ff3, in braft::LogManager::append_entries(std::vector<braft::LogEntry*, std::allocator<braft::LogEntry*> >*, braft::LogManager::StableClosure*)
E0512 23:25:01.179894   552 include/backward.hpp:4200] #2    Object "/opt/typesense-server", at 0x13a84e4, in braft::ConfigurationEntry::ConfigurationEntry(braft::LogEntry const&)
E0512 23:25:01.179897   552 include/backward.hpp:4200] #1    Object "/opt/typesense-server", at 0x13a825a, in braft::Configuration::operator=(std::vector<braft::PeerId, std::allocator<braft::PeerId> > const&)
E0512 23:25:01.179900   552 include/backward.hpp:4200] #0    Object "/opt/typesense-server", at 0x130ea74, in std::vector<braft::PeerId, std::allocator<braft::PeerId> >::size() const
Segmentation fault (Address not mapped to object [0x8])
E0512 23:25:01.796072   552 src/main/typesense_server.cpp:107] Typesense 0.25.0.rc24 is terminating abruptly.
```
k
How are you deploying Typesense? Kubernetes?
a
Hey, Charlie is my coworker. Yes, we are deploying in k8s. We were rotating a single node in a 3-node cluster when this occurred.
k
These kinds of errors typically only happen if nodes are somehow not rotated carefully, one at a time. Also, do you use DNS or IP addresses for the pods?
c
We are using IP addresses. The IP address list is updated every 10 seconds, and when a node goes offline, its IP is replaced with a dummy IP in the list to keep the count of IPs at 3. Looking at the logs above the error, it appears that no dummy IPs were in place at the time of the error (or they were placed without being logged).
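For reference, the nodes list Typesense reads is a single line of comma-separated `peering_ip:peering_port:api_port` entries. Using the addresses from the logs above (peering port 8107, API port 8108), the full 3-node list would look like the first line below; simply dropping the node being rotated (here, for illustration, the third one) instead of substituting a dummy IP would leave the second:
```
192.168.132.42:8107:8108,192.168.128.63:8107:8108,192.168.131.159:8107:8108
192.168.132.42:8107:8108,192.168.128.63:8107:8108
```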
k
Can you not place a dummy IP? It's okay to have just 2 IPs temporarily.
c
Yes, I will make that change. Is it OK to have 1 IP temporarily? If we move to a 5-node cluster, is it OK to temporarily have 3 IPs?
k
Technically you should only ever do rolling rotations. Rotate one node and wait for it to become healthy before doing the next.
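A minimal sketch of that gate, assuming the standard Typesense `/health` endpoint on the API port (8108 in the logs above); the `restart_pod` helper in the usage comment is hypothetical:
```python
import json
import time
import urllib.request

def wait_until_healthy(host: str, api_port: int = 8108, timeout_s: int = 300) -> bool:
    """Poll GET /health on one node until it reports ok, or give up after timeout_s."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            url = f"http://{host}:{api_port}/health"
            with urllib.request.urlopen(url, timeout=5) as resp:
                if json.load(resp).get("ok"):
                    return True
        except OSError:
            pass  # pod not reachable yet; keep polling
        time.sleep(5)
    return False

# Usage sketch: rotate one pod, then block until it is healthy before the next.
# for host in ["192.168.132.42", "192.168.128.63", "192.168.131.159"]:
#     restart_pod(host)                  # hypothetical helper
#     assert wait_until_healthy(host)
```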
c
That makes sense. I'm thinking about the initialization (or recovery) process.
k
A 3-node Raft cluster can tolerate at most 1 node being unavailable; a 5-node cluster can tolerate 2 nodes being down. But I'd still play it safe.
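The quorum arithmetic behind those numbers, as a quick check:
```python
# A Raft group of n voters needs a majority (floor(n/2) + 1) reachable,
# so it can tolerate at most floor((n - 1) / 2) members being down.
def max_unavailable(n: int) -> int:
    return (n - 1) // 2

assert max_unavailable(3) == 1  # 3-node cluster: 1 node may be down
assert max_unavailable(5) == 2  # 5-node cluster: 2 nodes may be down
```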
c
Gotcha, thanks! Given how Kubernetes handles pod DNS resolution, when we first install the Typesense cluster the pods will come up one at a time. If the list starts with a single IP and soon after has 3 IPs, will Typesense boot in single-node mode and be unable to transition to a multi-node cluster?
k
Nope, that's perfectly fine. Nodes will join as the cluster expands.
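In other words, assuming whatever sidecar maintains the nodes list keeps rewriting it as pod IPs become resolvable, each successive state of the file during the initial install would look roughly like the lines below, and the running nodes pick up the new peers as the list grows:
```
192.168.132.42:8107:8108
192.168.132.42:8107:8108,192.168.128.63:8107:8108
192.168.132.42:8107:8108,192.168.128.63:8107:8108,192.168.131.159:8107:8108
```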