# community-help
m
Hi, can you help me understand this error? While calling the debug API multiple times, I only receive state: 1.
```
W20241228 05:40:45.300213   603 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:45.800693   602 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:45.800693   595 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:46.301167   599 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:46.301167   603 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:46.801540   602 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
W20241228 05:40:46.801571   603 node.cpp:2366] node default_group:192.168.22.146:8107:8108 is not in active state current_term 60 state ERROR
```
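For reference, a minimal sketch of the debug call being discussed, assuming a node reachable on port 8108 and an admin key in a TYPESENSE_API_KEY environment variable (both placeholders); the state interpretation in the comments is my reading and should be checked against the Typesense docs for your version:

```python
import os
import requests

# Placeholders: adjust the host/port and supply your own admin API key.
TYPESENSE_HOST = os.environ.get("TYPESENSE_HOST", "http://localhost:8108")
API_KEY = os.environ["TYPESENSE_API_KEY"]

# GET /debug returns the node's Raft state and the server version,
# e.g. {"state": 1, "version": "27.1"}.
resp = requests.get(
    f"{TYPESENSE_HOST}/debug",
    headers={"X-TYPESENSE-API-KEY": API_KEY},
    timeout=5,
)
resp.raise_for_status()
info = resp.json()

# As I understand it, state 1 means the node considers itself the leader
# and 4 means it is a healthy follower; other values (such as the 5 seen
# later in this thread) indicate the node is not ready / not caught up.
print(info)
```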
k
The node has become unlinked from the cluster. You have to restart it.
m
I already did, but I'm still getting this error.
k
You have to check why it goes into that state. Check the logs to identify the underlying problem; for example, running out of disk space can cause this.
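One quick check along these lines is free space on the Typesense data directory; a minimal sketch, assuming the data dir is mounted at /data (adjust to your deployment):

```python
import shutil

# Assumed mount point of the Typesense data directory; adjust for your pods.
DATA_DIR = "/data"

usage = shutil.disk_usage(DATA_DIR)
free_pct = usage.free / usage.total * 100

print(f"{DATA_DIR}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB ({free_pct:.1f}%)")

# A nearly full disk is one of the things that can push a node into ERROR state.
if free_pct < 10:
    print("Warning: low free disk space on the data directory.")
```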
m
I did restart the pods, but I'm still seeing these kinds of logs:
```
W20241231 15:21:09.273634   454 node.cpp:843] [default_group:192.168.26.156:8107:8108 ] Refusing concurrent configuration changing
E20241231 15:21:09.273667   598 raft_server.h:62] Peer refresh failed, error: Doing another configuration change
W20241231 15:21:11.362025   593 replicator.cpp:397] Group default_group fail to issue RPC to 192.168.19.44:8107:8108 _consecutive_error_times=1041, [E2][192.168.19.44:8107][E2]peer_id not exist [R1][E2][192.168.19.44:8107][E2]peer_id not exist [R2][E2][192.168.19.44:8107][E2]peer_id not exist [R3][E2][192.168.19.44:8107][E2]peer_id not exist
E20241231 15:21:12.273871   454 raft_server.cpp:785] 52077 queued writes > healthy read lag of 1000
E20241231 15:21:12.273907   454 raft_server.cpp:797] 52077 queued writes > healthy write lag of 500
W20241231 15:21:13.867532   591 replicator.cpp:297] Group default_group fail to issue RPC to 192.168.19.44:8107:8108 _consecutive_error_times=1051, [E2][192.168.19.44:8107][E2]peer_id not exist [R1][E2][192.168.19.44:8107][E2]peer_id not exist [R2][E2][192.168.19.44:8107][E2]peer_id not exist [R3][E2][192.168.19.44:8107][E2]peer_id not exist
I20241231 15:21:15.868325   455 batched_indexer.cpp:428] Running GC for aborted requests, req map size: 5650
W20241231 15:21:16.372705   603 replicator.cpp:297] Group default_group fail to issue RPC to 192.168.19.44:8107:8108 _consecutive_error_times=1061, [E2][192.168.19.44:8107][E2]peer_id not exist [R1][E2][192.168.19.44:8107][E2]peer_id not exist [R2][E2][192.168.19.44:8107][E2]peer_id not exist [R3][E2][192.168.19.44:8107][E2]peer_id not exist
W20241231 15:21:18.878010   603 replicator.cpp:397] Group default_group fail to issue RPC to 192.168.19.44:8107:8108 _consecutive_error_times=1071, [E2][192.168.19.44:8107][E2]peer_id not exist [R1][E2][192.168.19.44:8107][E2]peer_id not exist [R2][E2][192.168.19.44:8107][E2]peer_id not exist [R3][E2][192.168.19.44:8107][E2]peer_id not exist
I20241231 15:21:19.275310   454 raft_server.cpp:706] Term: 62, pending_queue: 1, last_index: 1048973, committed: 0, known_applied: 1048970, applying: 0, pending_writes: 0, queued_writes: 52063, local_sequenc
```
```
E20241231 15:34:02.178666   454 raft_server.cpp:781] 53405 lagging entries > healthy read lag of 1000
E20241231 15:34:02.178712   454 raft_server.cpp:793] 53405 lagging entries > healthy write lag of 500
I20241231 15:34:07.181252   454 raft_server.cpp:706] Term: 62, pending_queue: 0, last_index: 1031276, committed: 0, known_applied: 977871, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 2235540238
E20241231 15:34:11.181556   454 raft_server.cpp:781] 53405 lagging entries > healthy read lag of 1000
E20241231 15:34:11.181602   454 raft_server.cpp:793] 53405 lagging entries > healthy write lag of 500
I20241231 15:34:17.184123   454 raft_server.cpp:706] Term: 62, pending_queue: 0, last_index: 1031276, committed: 0, known_applied: 977871, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 2235540238
E20241231 15:34:20.184360   454 raft_server.cpp:781] 53405 lagging entries > healthy read lag of 1000
E20241231 15:34:20.184481   454 raft_server.cpp:793] 53405 lagging entries > healthy write lag of 500
```
k
53405 lagging entries > healthy read lag of 1000
Is this number decreasing?
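One way to answer this is to scrape the number out of the log stream and watch the trend; a small sketch that reads the logs from stdin, e.g. piped from `kubectl logs -f` (the pod name below is hypothetical):

```python
import re
import sys

# Matches lines like: "53405 lagging entries > healthy read lag of 1000"
LAG_RE = re.compile(r"(\d+) lagging entries > healthy (read|write) lag")

last_seen = {}
for line in sys.stdin:
    m = LAG_RE.search(line)
    if not m:
        continue
    value, kind = int(m.group(1)), m.group(2)
    prev = last_seen.get(kind)
    trend = "" if prev is None else (" (decreasing)" if value < prev else " (NOT decreasing)")
    print(f"{kind} lag: {value}{trend}")
    last_seen[kind] = value
```

Usage would look something like `kubectl logs -f typesense-1 | python3 watch_lag.py`; if the number never trends down, the node is not catching up on its own.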
m
Nope
The typesense-0 node is still loading indexes, whereas the other nodes are showing "lagging entries" logs.
k
It's difficult to debug self-hosted clusters remotely. I suspect that the cluster somehow lost its quorum. If it's on k8s, that can happen when multiple pods restart at the same time.
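To sanity-check the quorum theory, one option is to count how many pods are actually Ready at the same time; a rough sketch using the official kubernetes Python client, assuming an `app=typesense` label and the `default` namespace (both assumptions, adjust to your chart):

```python
from kubernetes import client, config

# Assumed label selector and namespace; adjust to match your deployment.
NAMESPACE = "default"
LABEL_SELECTOR = "app=typesense"

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR)
ready = 0
for pod in pods.items:
    statuses = pod.status.container_statuses or []
    if statuses and all(cs.ready for cs in statuses):
        ready += 1

total = len(pods.items)
# A Raft cluster needs a majority of its nodes healthy: 2 of 3, 3 of 5, etc.
quorum = total // 2 + 1
print(f"{ready}/{total} pods ready; quorum needs at least {quorum}")
if ready < quorum:
    print("Quorum likely lost: too few nodes are up at the same time.")
```

This only counts pods, not the peers listed in the Typesense nodes configuration, so treat it as a rough indicator rather than a definitive check.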
m
```
{
  "state": 5,
  "version": "27.1"
}
```
I'm getting this on the debug call.