# community-help
t
Hi everyone! We're running a self-hosted Typesense cluster with three nodes (0.25.1). Due to a hardware error we had to turn off two nodes; the remaining node `typesense-b` became the sole "cluster" (we removed the other two from its nodes file) and is the leader. After a couple of days the error is finally solved, but the two other nodes are now out of sync, so I wanted to spin them up again.

At first I wanted to start `typesense-c` (with an empty nodes file) -> I get an error:

`Error while refreshing peer configuration: File containing nodes configuration is empty.`

and `curl -H "X-TYPESENSE-API-KEY: xxx" "http://typesense-c:8108/status"` says `not ready`. When I add the node itself into the nodes file of `typesense-c` and the peer refresh is done -> `typesense-c` also becomes a leader.

Finally, my question: how can `typesense-b` remain the leader? My fear is that when I add both nodes to the nodes files of both servers, `typesense-c` will become the leader and I'll have a data consistency problem (because there is no data on `typesense-c`). Do I need to keep these things in an exact sequence (start typesense-c without a nodes file, add typesense-c to the nodes file of typesense-b, ...)?
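(For reference, the nodes file discussed throughout this thread is a single line of comma-separated `ip:peering_port:api_port` entries, passed to Typesense via `--nodes`. A minimal sketch of the single-node state described above; the IP is taken from the log lines later in the thread, the file path is an assumption:)

```bash
# Single-node "cluster": only typesense-b is listed.
# Format: ip:peering_port:api_port (peering on 8107, API on 8108).
echo '192.168.0.1:8107:8108' > /etc/typesense/nodes

# Status check from the thread, with the Slack link-wrapping removed:
curl -H "X-TYPESENSE-API-KEY: xxx" "http://typesense-c:8108/status"
```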
j
You want to first add typesense-c into the nodes file of typesense-b, while typesense-c is still off. Then update the nodes file in typesense-c to now be typesense-b and typesense-c and then start the process on typesense-c. (Make sure you clear the data dir on typesense-c, so it can resync the latest snapshot from typesense-b)
Then once typesense-c is fully synced and reindexed, you want to add typesense-a to the nodes file of all the nodes and then start the process on typesense-a back up
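A sketch of that sequence as shell steps. The paths (`/etc/typesense/nodes`, `/var/lib/typesense`) and the systemd unit name are assumptions from a typical package install; the IPs are the ones that show up in the log lines later in this thread:

```bash
# Step 1 -- on typesense-b, while typesense-c is still OFF:
# add typesense-c to the nodes file (single line, ip:peering_port:api_port).
echo '192.168.0.1:8107:8108,192.168.0.2:8107:8108' > /etc/typesense/nodes

# Step 2 -- on typesense-c: clear the stale data dir so it pulls a fresh
# snapshot from the leader, write the same two-node file, then start it.
rm -rf /var/lib/typesense/*
echo '192.168.0.1:8107:8108,192.168.0.2:8107:8108' > /etc/typesense/nodes
systemctl start typesense-server
```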
t
But then typesense-b would be blocking because that violates `(N-1)/2`. Maybe that's the missing link -> so I could not activate typesense-c without having a short outage of Typesense until both nodes find each other?
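For concreteness, the arithmetic behind this worry (this is the standard Raft majority rule; nothing here is Typesense-specific):

```latex
% A Raft cluster of N configured nodes needs a majority to elect/keep a leader:
\mathrm{quorum}(N) = \left\lfloor \frac{N}{2} \right\rfloor + 1,
\qquad
\text{tolerated failures} = \left\lfloor \frac{N-1}{2} \right\rfloor
% For N = 2: quorum = 2 and tolerated failures = 0, so with typesense-c listed
% in the nodes file but still down, typesense-b alone would seem unable to
% hold quorum. The next reply explains the exception Typesense makes here.
```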
j
> but then typesense-b would be blocking because that violates (N-1)/2

Great observation! The nuance here is that if you have a single-node leader (where the nodes file only contains that single node's IP) and that node is healthy and serving traffic, you can add a 2nd node to the mix and have it sync data from the leader without any issues. We detect this state and let the 2nd node sync data from the leader while the first node is still healthy. If we didn't have this, there would be no way to add new nodes into a cluster when enabling a clustered environment, for example!
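One way to watch this from the outside while the second node catches up (a sketch: `/health` is a documented Typesense endpoint, and `/debug` reports a `state` field, 1 for leader and 4 for follower as far as we recall; verify those values against your version):

```bash
# Poll the leader and the joining node while the snapshot sync runs.
# API key is redacted ("xxx") as elsewhere in this thread.
curl -H "X-TYPESENSE-API-KEY: xxx" "http://typesense-b:8108/debug"   # expect {"state": 1, ...} on the leader
curl -H "X-TYPESENSE-API-KEY: xxx" "http://typesense-c:8108/health"  # flips to {"ok": true} once caught up
```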
t
Okay, so summing up: (1) I need to start typesense-c with a nodes file which contains typesense-b & typesense-c, (2) let typesense-c sync the data, (3) when both are in sync, add typesense-c to the nodes file in typesense-b, and repeat the procedure for typesense-a
j
In 1) you also need to update the nodes file in typesense-b to include typesense-b & typesense-c. Only then will typesense-c sync data from typesense-b
And when repeating for typesense-a, you'd first need to add typesense-a to the nodes file of all the nodes, and then start up typesense-a
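And the typesense-a repetition as a sketch, under the same path assumptions as above (192.168.0.3 is a hypothetical IP for typesense-a; it never appears in the thread):

```bash
# Step 3 -- while typesense-a is still OFF: list all three nodes in the
# nodes file on BOTH typesense-b and typesense-c.
echo '192.168.0.1:8107:8108,192.168.0.2:8107:8108,192.168.0.3:8107:8108' > /etc/typesense/nodes

# Step 4 -- on typesense-a: clear the stale data dir, write the same
# three-node file, then start the process so it resyncs from the leader.
rm -rf /var/lib/typesense/*
echo '192.168.0.1:8107:8108,192.168.0.2:8107:8108,192.168.0.3:8107:8108' > /etc/typesense/nodes
systemctl start typesense-server
```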
t
Hmmm 🤔 - I will give it a try... But which node needs to get both nodes first? Do I need to start typesense-c with both nodes in its file, and then add typesense-c to the nodes file of typesense-b (once typesense-c is up & running)? Because typesense-c must be up, otherwise typesense-b will be blocking - but from a logical perspective, typesense-c must have both nodes and be up & running first, and only afterwards should typesense-b get the new node
j
I didn't fully understand your last message. But I just updated the docs with a more detailed set of steps:
These steps definitely work, and will let you bring the cluster back up in a multi-node setup without having to bring down the whole cluster. (You can ignore the quorum equation during this recovery, since we've accounted for this specifically)
Let me know if I can clarify anything in those steps
t
Okay, thanks for the documentation change, I tried it -> and it is currently catching up 💪
```
W20241213 07:24:18.176734 1702532 node.cpp:843] [default_group:192.168.0.1:8107:8108 ] Refusing concurrent configuration changing
E20241213 07:24:18.176801 1702583 raft_server.h:62] Peer refresh failed, error: Doing another configuration change
I20241213 07:24:23.172806 1702575 node.cpp:754] node default_group:192.168.0.1:8107:8108 waits peer 192.168.0.2:8107:8108 to catch up
```
and after 4-5 minutes it was fine and I got `Peer refresh succeeded!`
Thank you @Jason Bosco for your help!!