Troubleshooting Typesense Cluster Multi-node Leadership Error
TLDR: Bill's new Typesense cluster reported an error about having no leader, along with health status issues. Jason and Kishore Nallan provided troubleshooting steps and determined the cause was likely a communication issue between nodes. Kishore Nallan suggested resetting the data directory on all nodes, after which Bill reported the error resolved.
Jan 16, 2022 (24 months ago)
Bill
10:01 PM
Jan 17, 2022 (24 months ago)
Jason
01:56 AM
Bill
10:04 AM
Kishore Nallan
10:12 AM
Bill
11:14 AM
I20220117 11:12:54.934175 2240 node.cpp:1484] node default_group:10.114.0.4:8107:8108 term 2 start pre_vote
W20220117 11:12:54.934615 2240 node.cpp:1494] node default_group:10.114.0.4:8107:8108 can't do pre_vote as it is not in 10.19.0.7:8107:8108
I20220117 11:13:00.795843 2239 node.cpp:1484] node default_group:10.114.0.4:8107:8108 term 2 start pre_vote
W20220117 11:13:00.796419 2239 node.cpp:1494] node default_group:10.114.0.4:8107:8108 can't do pre_vote as it is not in 10.19.0.7:8107:8108
I20220117 11:13:03.173813 2229 raft_server.cpp:524] Term: 2, last_index index: 1, committed_index: 0, known_applied_index: 0, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 0
W20220117 11:13:03.173848 2229 raft_server.cpp:551] Multi-node with no leader: refusing to reset peers.
I20220117 11:13:06.508644 2237 node.cpp:1484] node default_group:10.114.0.4:8107:8108 term 2 start pre_vote
W20220117 11:13:06.508692 2237 node.cpp:1494] node default_group:10.114.0.4:8107:8108 can't do pre_vote as it is not in 10.19.0.7:8107:8108
Bill
11:14 AM
Kishore Nallan
11:16 AM
Looks like the node was part of another cluster earlier? I see two different IPs there from entirely different subnets as well.
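The mismatch Kishore points out can also be pulled out of the log mechanically. The following is only a minimal sketch that assumes the exact warning wording shown in Bill's log excerpt; the function name `find_foreign_peers` is hypothetical and not part of Typesense, and the comma-separated peer list is an assumption about how a multi-peer configuration would be printed.

```python
import re

# Matches warnings shaped like the ones in the log excerpt:
#   node default_group:10.114.0.4:8107:8108 can't do pre_vote as it is not in 10.19.0.7:8107:8108
PRE_VOTE_WARNING = re.compile(
    r"node \S+?:(?P<self>\d+\.\d+\.\d+\.\d+):\d+:\d+ "
    r"can't do pre_vote as it is not in (?P<conf>\S+)"
)


def find_foreign_peers(log_text):
    """Return sorted (own_ip, stored_peer_ip) pairs where the node is absent from its stored peer list."""
    pairs = set()
    for match in PRE_VOTE_WARNING.finditer(log_text):
        own_ip = match.group("self")
        # Assumption: several peers, if present, are separated by commas.
        for peer in match.group("conf").split(","):
            peer_ip = peer.split(":")[0]
            if peer_ip != own_ip:
                pairs.add((own_ip, peer_ip))
    return sorted(pairs)
```

Run against the excerpt above, this would pair 10.114.0.4 with 10.19.0.7, which is the address from a different subnet that Kishore flags.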
Bill
11:16 AM
Bill
11:16 AM
Bill
11:16 AM
Bill
11:16 AM
Kishore Nallan
11:16 AM
Bill
11:18 AM
Kishore Nallan
11:18 AM
Bill
11:18 AM
Bill
11:19 AM
api-port = 8108
data-dir = /var/lib/typesense
api-key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
log-dir = /var/log/typesense
peering-address = 10.114.0.4
peering-port = 8107
nodes = /etc/typesense/nodes.txt
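One quick sanity check on a configuration like this is whether the node's own peering entry appears in the nodes file. The sketch below assumes the nodes file holds a comma-separated list of `peering_address:peering_port:api_port` entries, as described in the Typesense clustering documentation; `parse_settings` and `node_is_listed` are hypothetical helper names, not Typesense APIs.

```python
def parse_settings(config_text):
    """Parse 'key = value' lines, ignoring blanks, comments and section headers."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def node_is_listed(config_text, nodes_text):
    """Return True if this node's own peering entry appears in the nodes file contents."""
    settings = parse_settings(config_text)
    own_entry = "{}:{}:{}".format(
        settings["peering-address"], settings["peering-port"], settings["api-port"]
    )
    # Assumption: entries are comma-separated, e.g. "ip:8107:8108,ip:8107:8108".
    listed = [entry.strip() for entry in nodes_text.split(",") if entry.strip()]
    return own_entry in listed
```

With the settings above, the entry to look for would be `10.114.0.4:8107:8108`. Note that this only checks the file on disk; it says nothing about peer state already persisted in the data directory.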
Kishore Nallan
11:20 AM
/etc/typesense/nodes.txt?
Bill
11:20 AM
Bill
11:20 AM
Kishore Nallan
11:20 AM
10.19.0.7 -> is it some other node on your infrastructure?
Bill
11:20 AM
Kishore Nallan
11:21 AM
Bill
11:21 AM
Bill
11:21 AM
Kishore Nallan
11:21 AM
1. Stop Typesense server on all nodes.
2. Do rm -rf /var/lib/typesense/* on all nodes.
3. Start any one node and check the logs. What do they say?
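For anyone scripting step 2 rather than typing it by hand, here is a rough Python equivalent of the `rm -rf` step with a small safety guard. It is a sketch only: `clear_data_dir` is a hypothetical helper, it assumes the Typesense server has already been stopped, and it permanently deletes all indexed data in the directory. Unlike the shell glob, it also removes hidden dotfiles.

```python
import shutil
from pathlib import Path


def clear_data_dir(data_dir):
    """Delete everything inside data_dir but keep the directory itself; return the removed names."""
    root = Path(data_dir).resolve()
    # Guard against wiping a filesystem root by mistake.
    if root == Path(root.anchor):
        raise ValueError(f"refusing to clear {root}")
    removed = []
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove recursively
        else:
            entry.unlink()  # regular file or symlink
        removed.append(entry.name)
    return removed
```

In this thread the directory would be `/var/lib/typesense`, the `data-dir` from Bill's configuration, and the call would need to run on every node before any of them is restarted.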
Bill
11:22 AM
Kishore Nallan
11:22 AM
Bill
11:22 AM
Bill
11:25 AM
Kishore Nallan
11:26 AM
Bill
11:26 AM
Bill
11:26 AM
Bill
11:27 AM
Kishore Nallan
11:27 AM
Bill
11:27 AM
Kishore Nallan
11:28 AM
Bill
11:30 AM
Bill
11:30 AM
Kishore Nallan
11:31 AM
Kishore Nallan
11:31 AM
Bill
11:32 AM
Kishore Nallan
11:33 AM
Kishore Nallan
11:33 AM
Bill
11:35 AM
Kishore Nallan
11:35 AM