Error in restore node 2
# community-help
n
Error in restore node 2
Adding the log
@Jason Bosco @Kishore Nallan
It stopped, and it stops again the moment it tries to restore
j
Uh oh, that looks like a bug in how we handle potentially badly formatted data... Could you open a GitHub issue for this with the full log you posted above, so we can track this?
n
Ok
error in the Python client
j
Thank you, will take a closer look
n
Nodes 1 and 3 are running. I created a snapshot on node 3, deleted the data on node 2, copied the snapshot to node 2, and started it. It produces the same error while the data is indexing
I also created a snapshot on node 1, deleted the data on node 2, copied that snapshot to node 2, and started it. Again it produces the same error while the data is indexing
message has been deleted
j
In general, you do not want to copy data like that between nodes manually, since each node stores its own state information that's tied to its IP address internally
So copying the data directory between nodes in the same cluster will cause the cluster state to get corrupted
If you need to reset a node, you want to stop the Typesense process on that node, delete the data directory and then restart the Typesense process on that node. That node will then reach out to the other nodes in the cluster, get a snapshot via its own internal mechanism and recover by itself
n
delete only the db directory?
j
No, you'd need to delete all the data in the typesense dir
n
Which files or directories should be deleted? Or is there an endpoint we should use to do the deletion?
j
You'd have to essentially do
rm -rf /data/typesense/*
in your case
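Putting that exchange together, a minimal sketch of the reset procedure Jason describes, assuming the data directory from this thread (/data/typesense) and a systemd-managed install; the service name is illustrative:
# on the node being reset, stop the Typesense process first
sudo systemctl stop typesense-server
# wipe that node's data directory (path as used in this thread)
rm -rf /data/typesense/*
# start Typesense again; the node contacts the other cluster members,
# fetches a snapshot through the built-in mechanism, and recovers on its own
sudo systemctl start typesense-server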
n
does this endpoint create a snapshot of only one node?
j
Correct
If you use a snapshot generated from that API endpoint, you'd have to create a standalone 1-node cluster first with that snapshot, and then add nodes 2 and 3 after Node 1 is fully up
n
To restore the same node? Or does it also serve for the other cluster nodes?
mmmm ok
So the snapshot, despite coming from only one node, can be used to bring up the entire cluster. In theory, the first node of the cluster is configured with that snapshot and it then spreads the data to the other nodes?
j
Yup, exactly
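A rough sketch of that restore flow, assuming the default API port 8108 and illustrative paths; the snapshot endpoint and server flags are from Typesense's API, but the exact copy step used to seed the new node's data directory is an assumption:
# 1. create a snapshot on the source node; snapshot_path is written on that node's disk
curl -X POST "http://localhost:8108/operations/snapshot?snapshot_path=/tmp/typesense-snapshot" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
# 2. seed an empty data directory on the new first node with the snapshot contents (assumed copy step)
rm -rf /data/typesense/*
cp -r /tmp/typesense-snapshot/* /data/typesense/
# 3. bring that node up as a standalone 1-node cluster; only after it is fully up,
#    add nodes 2 and 3 to the nodes file so they replicate from it
typesense-server --data-dir=/data/typesense --api-key=${TYPESENSE_API_KEY} --nodes=/etc/typesense/nodes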
n
mmm ok
Let me explain what I did that worked. First, I created the snapshot with the endpoint. Then I removed all the data from the data folder, copied the state folder that is inside the snapshot folder, ran the container, and it configured itself.
And the data is the same as on the other nodes
@Jason Bosco that speeds up the time to bring the node up, compared to letting it replicate the data from scratch
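A sketch of the sequence Nelson describes, assuming a Docker-based node; the container name and paths are illustrative, and the state-folder copy is exactly what he reports rather than a documented restore path:
# 1. create the snapshot via the API endpoint (written to snapshot_path on the node's disk)
curl -X POST "http://localhost:8108/operations/snapshot?snapshot_path=/snapshots/node-snapshot" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
# 2. remove everything from the data folder, then copy the state folder from inside the snapshot back in
rm -rf /data/typesense/*
cp -r /snapshots/node-snapshot/state /data/typesense/state
# 3. start the container again; per Nelson, the node then configures itself
#    and ends up with the same data as the other nodes
docker start typesense-node-2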
j
I see, I haven't tried to do this myself, but good to know
👍 1
🙌 1
I usually just let new nodes catch up from the other cluster nodes
1
😃 1
👍 1
n
Because that takes a long time, and with faults of this kind I can't take down the whole cluster and then bring it back up
k
@Nelson Moncada is this still reproducible?
n
I deleted the Docker container, copied the snapshot into the data directory, and ran a new container. That resolved the problem.
k
Okay, if this happens again, I will be happy to look further. The proper way to start a new node is to start it with an empty data directory. The node will be able to pull everything from the current leader (one of the other 2 nodes will be the leader). You don't have to do any manual snapshot and data transfer.
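For completeness, a couple of read-only checks that can confirm a node restarted with an empty data directory has rejoined and caught up; the host name here is illustrative:
# /health returns {"ok": true} once the node is serving requests
curl "http://node-2:8108/health"
# /debug reports the node's version and Raft state (1 = leader, 4 = follower)
curl "http://node-2:8108/debug" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"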