# community-help
m
Hi, can you help me understand this error? While calling the debug API multiple times, I only receive state: 1.
```
W20241228 05:40:45.300213   603 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:45.800693   602 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:45.800693   595 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:46.301167   599 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:46.301167   603 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:46.801540   602 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:46.801571   603 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
```
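For reference, a minimal sketch of the debug call being discussed, assuming a node reachable on port 8108 and an admin key in a TYPESENSE_API_KEY environment variable (both placeholders); the state interpretation in the comments is my reading and should be checked against the Typesense docs for your version:

```python
import os
import requests

# Placeholders: adjust the host/port and supply your own admin API key.
TYPESENSE_HOST = os.environ.get("TYPESENSE_HOST", "http://localhost:8108")
API_KEY = os.environ["TYPESENSE_API_KEY"]

# GET /debug returns the node's Raft state and the server version,
# e.g. {"state": 1, "version": "27.1"}.
resp = requests.get(
    f"{TYPESENSE_HOST}/debug",
    headers={"X-TYPESENSE-API-KEY": API_KEY},
    timeout=5,
)
resp.raise_for_status()
info = resp.json()

# As I understand it, state 1 means the node considers itself the leader
# and 4 means it is a healthy follower; other values (such as the 5 seen
# later in this thread) indicate the node is not ready / not caught up.
print(info)
```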
k
The node has become unlinked from the cluster. You have to restart it.
m
I already did, but I'm still getting this error.
k
You have to check why it goes into that state. Check the logs to identify the underlying problem; for example, running out of disk space can cause this.
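One quick check along these lines is free space on the Typesense data directory; a minimal sketch, assuming the data dir is mounted at /data (adjust to your deployment):

```python
import shutil

# Assumed mount point of the Typesense data directory; adjust for your pods.
DATA_DIR = "/data"

usage = shutil.disk_usage(DATA_DIR)
free_pct = usage.free / usage.total * 100

print(f"{DATA_DIR}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB ({free_pct:.1f}%)")

# A nearly full disk is one of the things that can push a node into ERROR state.
if free_pct < 10:
    print("Warning: low free disk space on the data directory.")
```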
m
I did restart the pods, but I'm still seeing these kinds of logs:
```
W20241231 15:21:09.273634   454 node.cpp:843] [default_group:192.168.26.156:8107:8108 ] Refusing concurrent configuration changing
E20241231 15:21:09.273667   598 raft_server.h:62] Peer refresh failed, error: Doing another configuration change
W20241231 15:21:11.362025   593 replicator.cpp:397] Group default_group fail to issue RPC to 192.168.19.44:8107:8108 _consecutive_error_times=1041, [E2][192.168.19.44:8107][E2]peer_id not exist [R1][E2][192.168.19.44:8107][E2]peer_id not exist [R2][E2][192.168.19.44:8107][E2]peer_id not exist [R3][E2][192.168.19.44:8107][E2]peer_id not exist
E20241231 15:21:12.273871   454 raft_server.cpp:785] 52077 queued writes > healthy read lag of 1000
E20241231 15:21:12.273907   454 raft_server.cpp:797] 52077 queued writes > healthy write lag of 500
W20241231 15:21:13.867532   591 replicator.cpp:297] Group default_group fail to issue RPC to 192.168.19.44:8107:8108 _consecutive_error_times=1051, [E2][192.168.19.44:8107][E2]peer_id not exist [R1][E2][192.168.19.44:8107][E2]peer_id not exist [R2][E2][192.168.19.44:8107][E2]peer_id not exist [R3][E2][192.168.19.44:8107][E2]peer_id not exist
I20241231 15:21:15.868325   455 batched_indexer.cpp:428] Running GC for aborted requests, req map size: 5650
W20241231 15:21:16.372705   603 replicator.cpp:297] Group default_group fail to issue RPC to 192.168.19.44:8107:8108 _consecutive_error_times=1061, [E2][192.168.19.44:8107][E2]peer_id not exist [R1][E2][192.168.19.44:8107][E2]peer_id not exist [R2][E2][192.168.19.44:8107][E2]peer_id not exist [R3][E2][192.168.19.44:8107][E2]peer_id not exist
W20241231 15:21:18.878010   603 replicator.cpp:397] Group default_group fail to issue RPC to 192.168.19.44:8107:8108 _consecutive_error_times=1071, [E2][192.168.19.44:8107][E2]peer_id not exist [R1][E2][192.168.19.44:8107][E2]peer_id not exist [R2][E2][192.168.19.44:8107][E2]peer_id not exist [R3][E2][192.168.19.44:8107][E2]peer_id not exist
I20241231 15:21:19.275310   454 raft_server.cpp:706] Term: 62, pending_queue: 1, last_index: 1048973, committed: 0, known_applied: 1048970, applying: 0, pending_writes: 0, queued_writes: 52063, local_sequenc
```
```
E20241231 15:34:02.178666   454 raft_server.cpp:781] 53405 lagging entries > healthy read lag of 1000
E20241231 15:34:02.178712   454 raft_server.cpp:793] 53405 lagging entries > healthy write lag of 500
I20241231 15:34:07.181252   454 raft_server.cpp:706] Term: 62, pending_queue: 0, last_index: 1031276, committed: 0, known_applied: 977871, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 2235540238
E20241231 15:34:11.181556   454 raft_server.cpp:781] 53405 lagging entries > healthy read lag of 1000
E20241231 15:34:11.181602   454 raft_server.cpp:793] 53405 lagging entries > healthy write lag of 500
I20241231 15:34:17.184123   454 raft_server.cpp:706] Term: 62, pending_queue: 0, last_index: 1031276, committed: 0, known_applied: 977871, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 2235540238
E20241231 15:34:20.184360   454 raft_server.cpp:781] 53405 lagging entries > healthy read lag of 1000
E20241231 15:34:20.184481   454 raft_server.cpp:793] 53405 lagging entries > healthy write lag of 500
```
k
53405 lagging entries > healthy read lag of 1000
Is this number decreasing?
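One way to answer this is to scrape the number out of the log stream and watch the trend; a small sketch that reads the logs from stdin, e.g. piped from `kubectl logs -f` (the pod name below is hypothetical):

```python
import re
import sys

# Matches lines like: "53405 lagging entries > healthy read lag of 1000"
LAG_RE = re.compile(r"(\d+) lagging entries > healthy (read|write) lag")

last_seen = {}
for line in sys.stdin:
    m = LAG_RE.search(line)
    if not m:
        continue
    value, kind = int(m.group(1)), m.group(2)
    prev = last_seen.get(kind)
    trend = "" if prev is None else (" (decreasing)" if value < prev else " (NOT decreasing)")
    print(f"{kind} lag: {value}{trend}")
    last_seen[kind] = value
```

Usage would look something like `kubectl logs -f typesense-1 | python3 watch_lag.py`; if the number never trends down, the node is not catching up on its own.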
m
Nope
The typesense-0 node is still loading indexes, whereas the other nodes are showing "lagging entries" logs.
k
It's difficult to debug self-hosted clusters remotely. I suspect that the cluster somehow lost its quorum. If it's on k8s, that can happen when multiple pods restart at the same time.
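To sanity-check the quorum theory, one option is to count how many pods are actually Ready at the same time; a rough sketch using the official kubernetes Python client, assuming an `app=typesense` label and the `default` namespace (both assumptions, adjust to your chart):

```python
from kubernetes import client, config

# Assumed label selector and namespace; adjust to match your deployment.
NAMESPACE = "default"
LABEL_SELECTOR = "app=typesense"

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR)
ready = 0
for pod in pods.items:
    statuses = pod.status.container_statuses or []
    if statuses and all(cs.ready for cs in statuses):
        ready += 1

total = len(pods.items)
# A Raft cluster needs a majority of its nodes healthy: 2 of 3, 3 of 5, etc.
quorum = total // 2 + 1
print(f"{ready}/{total} pods ready; quorum needs at least {quorum}")
if ready < quorum:
    print("Quorum likely lost: too few nodes are up at the same time.")
```

This only counts pods, not the peers listed in the Typesense nodes configuration, so treat it as a rough indicator rather than a definitive check.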
m
```
{
  "state": 5,
  "version": "27.1"
}
```
I'm getting this on the debug call.