#community-help

Resolving Server Stoppage Issues in Typesense Multi VM Cluster

TLDR gaurav faced issues with the Typesense server in a multi VM cluster, including automatic stoppage and 503 errors. Kishore Nallan pointed out that a two-node cluster cannot reliably hold a quorum and suggested using three nodes. When the stoppage persisted on three nodes, Kishore advised running Typesense via nohup or systemd so that closing the session does not stop the process.

Nov 17, 2022 (13 months ago)
gaurav
03:55 AM
Hi, I was trying to run Typesense in a multi VM cluster. It worked as expected for a while, but then it stops automatically.
I am using SSM to access the VMs. From what I have seen, Typesense on Docker works perfectly, but the typesense-server service crashes after a while. Any ideas?

Also, I am now trying to set up the cluster again and one VM is throwing a 503 error.
Kishore Nallan
03:56 AM
Please share all relevant logs
gaurav
04:18 AM
Sure,
It's a 2 VM configuration. On the health check, VM1 returns 503 {'ok': False} and VM2 returns 200 {'ok': True}.

To start the servers I used the commands below.
VM1:
```
sudo typesense-server \
  --data-dir /home/charlie/cluster/typesense \
  --api-key=xyz \
  --api-address 0.0.0.0 \
  --api-port 8108 \
  --peering-address 10.212.22.59 \
  --peering-port 8107 \
  --log-dir=/home/charlie/cluster/logs \
  --nodes=/home/charlie/cluster/nodes
Log directory is configured as: /home/charlie/cluster/logs
```
VM2:
```
sudo typesense-server \
  --data-dir /home/charlie/cluster/typesense \
  --api-key=xyz \
  --api-address 0.0.0.0 \
  --api-port 8108 \
  --peering-address 10.212.22.189 \
  --peering-port 8107 \
  --log-dir=/home/charlie/cluster/logs
Log directory is configured as: /home/charlie/cluster/logs
E20221117 09:35:39.196200 2809 raft_server.cpp:589] Node not ready yet (known_applied_index is 0).
E20221117 09:35:39.196218 2817 raft_server.h:62] Peer refresh failed, error: Doing another configuration change
```
When I debug the API I am getting `state 4` for VM1 and `state 1` for VM2.
Also attached logs

The problem is with VM1: I don't know what the issue is, but it is throwing a 503 error, although from debug I can see it is working as expected.
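The health and debug checks described in this message can be scripted. The following is a minimal sketch, assuming the API listens on port 8108 with the placeholder key `xyz` from the commands above. `raft_role` is a hypothetical helper name, and the mapping of state 1 to leader and state 4 to follower is taken from the Typesense documentation for the debug endpoint rather than from this thread.

```shell
# Hypothetical helper: translate the "state" value returned by GET /debug into a role.
raft_role() {
  case "$1" in
    1) echo "leader" ;;
    4) echo "follower" ;;
    *) echo "other (state $1)" ;;
  esac
}

# To collect the values, run on each VM (host, port and key are assumptions from the thread):
#   curl -s http://localhost:8108/health
#   curl -s -H "X-TYPESENSE-API-KEY: xyz" http://localhost:8108/debug

raft_role 4   # state reported for VM1 in this thread
raft_role 1   # state reported for VM2 in this thread
```

Read this way, VM1 was a follower and VM2 was the leader at the time of the 503.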
gaurav
06:13 AM
Restarting the VM, then uninstalling and reinstalling, works for me. I don't know what the issue was.
Kishore Nallan
06:17 AM
Typesense uses Raft which requires 3 nodes for quorum.
gaurav
06:17 AM
Okay cool, although it worked on 2 VMs too.
Kishore Nallan
06:18 AM
It will work initially but no guarantee and can lead to split brain. Without an odd number, you can't get quorum.
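The arithmetic behind the quorum advice can be sketched as follows, assuming the usual Raft majority rule of floor(n/2) + 1 votes. The function names are hypothetical and only illustrate the calculation.

```shell
# Majority quorum for an n-node cluster: floor(n/2) + 1.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# Number of nodes that can be lost while a majority is still reachable.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 2 3; do
  echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 2 nodes the quorum is 2, so losing either node stalls the cluster. With 3 nodes the quorum is still 2, so one node can fail.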
gaurav
06:19 AM
Okay, NP, I will set up a 3 VM cluster.
gaurav
06:20 AM
One more thing.
After 50-60 minutes, the Typesense server still stops automatically.
It works perfectly in Docker. Any ideas?
Kishore Nallan
06:21 AM
Does this happen on 2 node or 3 node configuration?
gaurav
06:21 AM
2 VM, let me check for 3 VM
Kishore Nallan
06:22 AM
Yeah, please do. And if it fails, check the logs at the time it does. Mostly with 3 nodes it just works. Unless all host IPs change at the same time (do rolling rotations if you want to change the host for some reason).
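For a 3 node setup, each VM needs the same nodes file passed via `--nodes`. Here is a minimal sketch, assuming the documented format of comma-separated `<peering address>:<peering port>:<API port>` entries on a single line. The first two addresses come from the thread, while the third address and the output file name are hypothetical.

```shell
# Staging file name is hypothetical; copy the result to the path given to --nodes on every VM.
NODES_FILE="${NODES_FILE:-./nodes.example}"

VM1="10.212.22.59:8107:8108"     # from the thread
VM2="10.212.22.189:8107:8108"    # from the thread
VM3="10.212.22.100:8107:8108"    # hypothetical third node

# One line, comma-separated, no trailing newline.
printf '%s,%s,%s' "$VM1" "$VM2" "$VM3" > "$NODES_FILE"
cat "$NODES_FILE"; echo
```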
gaurav
09:03 AM
So I am facing the issue again with 3 VMs. I don't think it's an issue with a particular VM anymore, as all 3 VMs stop after some time.

I am using AWS EC2 instances, accessed through SSM.
From the logs I can see my Typesense server runs from 13:56 until 14:16, and after that it stops automatically. You can see the logs below, but nothing in them points to the issue.
I20221117 14:14:45.019073 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:14:45.019165 21940 raft_server.h:60] Peer refresh succeeded!
I20221117 14:14:55.020509 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:14:55.020674 21942 raft_server.h:60] Peer refresh succeeded!
I20221117 14:15:05.022079 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:15:05.022176 21951 raft_server.h:60] Peer refresh succeeded!
I20221117 14:15:15.023499 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:15:15.023589 21940 raft_server.h:60] Peer refresh succeeded!
I20221117 14:15:25.025126 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:15:25.025230 21942 raft_server.h:60] Peer refresh succeeded!
I20221117 14:15:33.873915 21929 batched_indexer.cpp:250] Running GC for aborted requests, req map size: 0
I20221117 14:15:35.026558 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:15:35.026647 21951 raft_server.h:60] Peer refresh succeeded!
I20221117 14:15:45.028023 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:15:45.028113 21940 raft_server.h:60] Peer refresh succeeded!
I20221117 14:15:55.029551 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:15:55.029634 21942 raft_server.h:60] Peer refresh succeeded!
I20221117 14:16:05.030953 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:16:05.031044 21951 raft_server.h:60] Peer refresh succeeded!
I20221117 14:16:15.032476 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:16:15.032645 21940 raft_server.h:60] Peer refresh succeeded!
I20221117 14:16:25.034013 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:16:25.034106 21942 raft_server.h:60] Peer refresh succeeded!
I20221117 14:16:34.881245 21929 batched_indexer.cpp:250] Running GC for aborted requests, req map size: 0
I20221117 14:16:35.035606 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:16:35.035696 21951 raft_server.h:60] Peer refresh succeeded!
I20221117 14:16:45.037305 21928 raft_server.cpp:534] Term: 32, last_index index: 161, committed_index: 161, known_applied_index: 161, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 759502
I20221117 14:16:45.037391 21940 raft_server.h:60] Peer refresh succeeded!

Now when I check /var/log/messages, I can see some entries from the same time window as the Typesense server run.

Nov 17 13:56:14 ds-pro-search-0202 systemd-logind: New session c70 of user root.
Nov 17 14:16:53 ds-pro-search-0202 systemd-logind: Removed session c70.

Do you know anything about this?
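The two logs can be lined up with a few standard tools. This is a rough sketch that uses sample lines copied from this message. On the VM itself, the input would come from /var/log/messages and the Typesense log file instead. The variable names are hypothetical.

```shell
# Sample lines copied from the thread.
MESSAGES_SAMPLE='Nov 17 13:56:14 ds-pro-search-0202 systemd-logind: New session c70 of user root.
Nov 17 14:16:53 ds-pro-search-0202 systemd-logind: Removed session c70.'
LAST_TYPESENSE_LINE='I20221117 14:16:45.037391 21940 raft_server.h:60] Peer refresh succeeded!'

# Time at which the login session was removed (third field of the syslog line).
session_end=$(printf '%s\n' "$MESSAGES_SAMPLE" | grep 'Removed session' | tail -n 1 | awk '{print $3}')

# Time of the last Typesense log line, truncated to HH:MM:SS.
last_log=$(printf '%s\n' "$LAST_TYPESENSE_LINE" | awk '{print substr($2, 1, 8)}')

echo "last Typesense log line: $last_log, session removed: $session_end"
```

The last Typesense log line is written a few seconds before the session is removed, which suggests the process ends together with the session.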
Kishore Nallan
09:04 AM
When you say "it stops", does the process die?
gaurav
09:04 AM
yes
Kishore Nallan
09:05 AM
And the logs posted above were the last few lines? Nothing after that?
gaurav
09:05 AM
yes
Kishore Nallan
09:05 AM
And all 3 nodes shut down at the same time?
gaurav
09:06 AM
No, it depends on when I started the SSM session on each VM.
Kishore Nallan
09:07 AM
How are you running Typesense? If you are not running the server via nohup or screen, it will shut down when your SSH session expires.
Kishore Nallan
09:08 AM
The shell is the parent process for any program you start from the command line. When your shell connection closes, all child processes will be killed. The timestamp seems to match with /var/log/messages
Kishore Nallan
09:08 AM
Use the DEB/RPM to install Typesense: they use systemd for handling this issue. You can start/stop typesense service.
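A sketch of the systemd route, assuming the package reads its settings from /etc/typesense/typesense-server.ini and installs a unit named typesense-server.service, as the Typesense installation docs describe. It also assumes the config keys match the command-line flag names. The values mirror the VM1 command from this thread, and the draft file name is hypothetical.

```shell
# Write a draft config to a staging file (hypothetical name) for review.
CONF_DRAFT="${CONF_DRAFT:-./typesense-server.ini.draft}"
cat > "$CONF_DRAFT" <<'EOF'
[server]
api-address = 0.0.0.0
api-port = 8108
api-key = xyz
data-dir = /home/charlie/cluster/typesense
log-dir = /home/charlie/cluster/logs
peering-address = 10.212.22.59
peering-port = 8107
nodes = /home/charlie/cluster/nodes
EOF

# After reviewing the draft, install it and let systemd own the process:
#   sudo cp "$CONF_DRAFT" /etc/typesense/typesense-server.ini
#   sudo systemctl restart typesense-server.service
#   sudo systemctl enable typesense-server.service   # start on boot
#   sudo systemctl status typesense-server.service
```

Because systemd starts the process, it is not a child of the login shell and closing the SSM session should not affect it.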
gaurav
09:12 AM
Okay, sure, thanks. I will test that out and post an update.
gaurav
09:14 AM
I am actually using the RPM, as my VM is CentOS, and it doesn't work.
gaurav
09:14 AM
Looks like Docker is the only way.
Kishore Nallan
09:14 AM
RPM does not work?
gaurav
09:14 AM
yes
Kishore Nallan
09:14 AM
What error are you getting? And what CentOS version? We've other users who use RPM and CentOS.
gaurav
09:21 AM
Sorry, you misunderstood. The RPM works for installing Typesense. However, as we were discussing previously, I am having difficulty keeping the service running after my terminal/session closes.

Just for reference:
CentOS Linux release 7.9.2009
Kishore Nallan
09:24 AM
:thinking_face: RPM also uses systemd. That certainly works.

In any case, the quickest fix for you now is to change your command structure to run via nohup this way:

nohup ./typesense-server <arguments> &

This will 100% ensure that the process is not killed when the session closes.
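The nohup pattern above can be wrapped in a small helper that also captures output and reports the PID. This is a sketch only: `start_detached` is a hypothetical name, and `sleep` stands in for the server so the snippet can be run anywhere.

```shell
# Hypothetical helper: start a command that ignores hangups, log its output, print its PID.
start_detached() {
  log_file=$1; shift
  nohup "$@" > "$log_file" 2>&1 < /dev/null &
  echo $!
}

# For Typesense the call would look like:
#   start_detached typesense.out ./typesense-server <arguments>
# Demonstrated here with a stand-in command:
pid=$(start_detached ./nohup-demo.out sleep 30)
kill -0 "$pid" && echo "process $pid is running"
kill "$pid"   # clean up the stand-in process
```

Redirecting output to a file keeps the process from depending on the terminal, and the printed PID makes it easy to check on or stop the server later.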
gaurav
05:38 PM
Hi Kishore Nallan, it worked. For the best setup I can also recommend systemd.