Hi all - I’m trying to get a HA cluster running on...
# community-help
j
Hi all - I’m trying to get a HA cluster running on Fly, I see that the current workaround is to use
6tunnel
since Fly.io's private networking doesn't natively support IPv4, but I'm a bit confused about which IPs I need to use in the nodes file and for the peering addresses (since they can't be IPv4…right?). I posted a question on the Fly.io forum, but I think it's a bit too Typesense-specific, so I was wondering if we could work through it here? Please let me know what else I can provide. The current Raft error I see is:
typesense-server | W0519 20:38:53.494565   652 external/com_github_brpc_braft/src/braft/node.cpp:1559] node default_group:127.0.0.1:8107:8063 request PreVote from 127.0.0.1:8107:8062 error: [E2][172.19.128.34:8107][E2]peer_id not exist
It most likely has to do with my nodes file using localhost, since I thought that's how I could use 6tunnel to communicate from a local port to the other node 🤔 https://community.fly.io/t/how-to-run-a-typesense-ha-cluster-on-fly/12994
j
Hmmm! Could you share your current nodes file?
j
Yea! It’s linked in the post there for full context, but here ya go 🙂
localhost:8107:8062,localhost:8107:8063
Also, for context on multiple nodes: the below is hardcoded for now, but I thought I could do something like this for each region I deploy in. In this case, port 8062 is for the LAX region and 8063 is for the SJC region.
if typesenseVars.Region == "lax" {

		exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8062").Run()

		exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "localhost", "8062").Run()
		exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

		exec.Command("6tunnel", "-4", "-l", "localhost", "8063", "sjc.foundry-typesense-cluster.internal").Run()
		exec.Command("6tunnel", "-4", "-l", "localhost", "8107", "sjc.foundry-typesense-cluster.internal").Run()

		svisor.AddProcess(
			"typesense-server",
			"doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-address 127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
			supervisor.WithRestart(0, 1*time.Second),
		)

	}

	if typesenseVars.Region == "sjc" {

		exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8063").Run()

		exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8063", "localhost", "8063").Run()
		exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

		exec.Command("6tunnel", "-4", "-l", "localhost", "8062", "lax.foundry-typesense-cluster.internal").Run()
		exec.Command("6tunnel", "-4", "-l", "localhost", "8107", "lax.foundry-typesense-cluster.internal").Run()

		svisor.AddProcess(
			"typesense-server",
			"doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8063 --peering-address 127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
			supervisor.WithRestart(0, 1*time.Second),
		)

	}
Also more than happy to write some documentation for how to currently handle HA on Fly as I’m sure it’ll be fairly common!
j
Yeah, it's probably the repeated hostnames that are throwing things off. Are you able to set up different local IP addresses on each node via 6tunnel? For example, on node 1, set up
127.0.0.1
as the IP, on node 2, set up
127.0.0.2
, and on node 3, set up
127.0.0.3
j
Thanks for the help so far 🙂 Bear with me as I’m pretty unknowledgeable on network related topics! Is this kind of what you mean? I changed all
localhost
’s in the
6tunnel
to 127.0.0.1 for LAX and 127.0.0.2 for SJC
if typesenseVars.Region == "lax" {

		exec.Command("6tunnel", "-6", "-l", "::", "8108", "127.0.0.1", "8062").Run()

		exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "127.0.0.1", "8062").Run()
		exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "127.0.0.1", "8107").Run()

		exec.Command("6tunnel", "-4", "-l", "127.0.0.1", "8063", "sjc.foundry-typesense-cluster.internal").Run()
		exec.Command("6tunnel", "-4", "-l", "127.0.0.1", "8107", "sjc.foundry-typesense-cluster.internal").Run()

		svisor.AddProcess(
			"typesense-server",
			"doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-address 127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
			supervisor.WithRestart(0, 1*time.Second),
		)

	}

	if typesenseVars.Region == "sjc" {

		exec.Command("6tunnel", "-6", "-l", "::", "8108", "127.0.0.2", "8063").Run()

		exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8063", "127.0.0.2", "8063").Run()
		exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8107", "127.0.0.2", "8107").Run()

		exec.Command("6tunnel", "-4", "-l", "127.0.0.2", "8062", "lax.foundry-typesense-cluster.internal").Run()
		exec.Command("6tunnel", "-4", "-l", "127.0.0.2", "8107", "lax.foundry-typesense-cluster.internal").Run()

		svisor.AddProcess(
			"typesense-server",
			"doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8063 --peering-address 127.0.0.2 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
			supervisor.WithRestart(0, 1*time.Second),
		)

	}
And then I updated my nodes file like this for LAX and SJC respectively:
127.0.0.1:8107:8062,127.0.0.2:8107:8063
As a result, I get an error from LAX stating that it cannot
listen to 127.0.0.1:8107
which means that I haven't yet set up
6tunnel
correctly to listen on that address
j
This is probably a question for the Fly team…
j
roger that
j
Essentially from Typesense’s perspective, you’d need unique IP addresses for each node (it’s ok for the port numbers to be the same)
Oh wait
hang on, scratch that. Not sure what I was thinking!
You need a unique combination of
ip:peering_port:api_port
for each node
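A minimal Go sketch of that uniqueness rule, assuming the nodes file holds a single comma-separated line like the ones quoted in this thread. The function name `duplicateNodes` is made up for illustration and is not part of Typesense.

```go
package main

import (
	"fmt"
	"strings"
)

// duplicateNodes returns every entry that appears more than once in a
// comma-separated nodes string. Entries are compared as whole strings, so
// two nodes may share an IP and one port as long as the full entry differs.
// (Hypothetical helper, not a Typesense API.)
func duplicateNodes(nodes string) []string {
	seen := make(map[string]int)
	var dups []string
	for _, entry := range strings.Split(nodes, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		seen[entry]++
		if seen[entry] == 2 {
			dups = append(dups, entry)
		}
	}
	return dups
}

func main() {
	// Same IP and one shared port, but the full entries differ.
	fmt.Println(duplicateNodes("127.0.0.1:8107:8062,127.0.0.1:8107:8063"))
	// The same entry listed twice is reported as a duplicate.
	fmt.Println(duplicateNodes("127.0.0.1:8107:8062,127.0.0.1:8107:8062"))
}
```

This only checks the text of the file; it says nothing about whether each address is actually reachable from the other nodes.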
From your very first setup:
localhost:8107:8062,localhost:8107:8063
Could you try doing something like this instead:
127.0.0.1:8107:8062,127.0.0.1:8107:8063
Essentially use IP address instead of hostname
j
absolutely - one sec!
j
Also, two nodes will not form a cluster; you'd need at least 3 nodes running together (an odd number)
j
yea! since one can fail in a three node cluster, I figured just trying to get two running before pushing a third out was an OK approach, but I can push a third if that may be what’s causing issues
j
Once a 3 node cluster starts functioning (they’ve already established quorum about which node is the leader), then the cluster will tolerate 1 node failure. But if only two nodes come up afresh, then they won’t be able to elect a leader among themselves
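The arithmetic behind the "odd number" advice can be sketched in Go. This assumes the usual Raft-style majority rule (the log lines above come from braft); the thread itself does not spell out the formula, so treat it as an illustration rather than Typesense's exact behavior.

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n voting nodes,
// assuming a simple majority rule.
func quorum(n int) int { return n/2 + 1 }

// tolerated returns how many nodes can be down while a majority remains.
func tolerated(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{2, 3, 4, 5} {
		fmt.Printf("nodes=%d quorum=%d tolerated failures=%d\n", n, quorum(n), tolerated(n))
	}
}
```

Under this rule a 2-node cluster tolerates no failures and a 4-node cluster tolerates no more than a 3-node one, which is why odd sizes are the usual choice.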
j
ooh i see - thanks for that insight, will push out another. i feel like issues with nodes unable to listen on 8107 is going to be a problem regardless so let me see if that error goes away first (or maybe im off again hah)
j
Could you try using
localhost
in the 6tunnel config, and only use
127.0.0.1
in the Typesense nodes file?
j
Yeah, for sure. I think what's about to happen is that when I deploy to LAX, we'll see that it can listen on 8107 (but can't start the peer vote with the other server since it's not running yet), but when I push SJC, it will say that it cannot listen on 8107 (and won't even get as far as trying the peer vote). About to confirm.
here’s LAX confirming it can listen on 8107, about to spin up SJC
Yeah, get a load of this 🤯 For some reason LAX can listen on 8107 but SJC can't.
Pretty sure I've done it the other way too, where I deployed SJC first and it was able to listen on that port, but then the second server messes things up.
j
Could you post more of the log lines?
Could you also try adding
--api-address 127.0.0.1
as an additional command line arg to
typesense-server
?
j
I'm using Axiom for logs; I would export them if I knew how, so let me look into that. Here's a screenshot leading up to the SJC error for now. And yes, let me try adding that address.
j
Did you change this line in SJC back to 127.0.0.1?
exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8063", "127.0.0.2", "8063").Run()
exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8107", "127.0.0.2", "8107").Run()
That has to be 127.0.0.1
j
j
Ah right ok!
j
thanks for double checking - let me try that api address flag
pushing now - the fact that it works for one and not the other has me puzzled lol
j
Yeah, maybe the network layer is somehow shared across containers in Fly?
If that’s the case you might want to try different port numbers for each api-port and peering port on each node
j
OK, both are showing a failure to listen on 8107. Maybe that's what happens when both are running, and I've only seen it fail on one because that's when the second one started. Alright, let's see how I can change up those ports 😄 EDIT: which I already do for the API ports, I just need to do it for the peering ports.
OK, making headway I think: no more peer listening errors, now it just cannot seem to connect to the other server for the peer vote. This wouldn't be a problem with running 2 servers instead of 3… yet, right?
j
I don’t think so, this error seems like it’s not even able to open the tcp connection to the other node
Although, I do see the multi-node with no leader error message as well. That is definitely related to having only 2 nodes
Could you try with all 3 nodes, just to rule out that causing some side effects
j
yea! 😄 I’ve added a 3rd node for SEA to confirm things while debugging. I’ve sorta narrowed it down to 6tunnel not sending the peering request in the right ip format or the receiving side not set up correctly to listen to whatever ip format is being set
line 52 is where it should be picking up that request that you see in the error that is marked as connection refused
j
Don’t you need another line similar to line 46, for the other port (8107) as well?
j
That's what I was sort of thinking too; however, I can't set something like
exec.Command("6tunnel", "-6", "-l", "::", "8107", "localhost", "8107").Run()
since that’s the port I’m using in my
typesense-server
command, and it would throw a conflict, so I need to add an intermediary port:
exec.Command("6tunnel", "-6", "-l", "::", "8117", "localhost", "8107").Run()
^ Like that, and then my other nodes send to 8117... I think (sighs)
j
Right, that’s what I’m thinking too
Just like how you’ve mapped Typesense’s api port 8108 to another port
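The intermediary-port idea can be sketched in Go as below. This is only an illustration: `tunnelArgs` and `startTunnel` are hypothetical helpers, and the argument order simply mirrors the `6tunnel` calls already shown in this thread (family flag, `-l` listen host, listen port, remote host, remote port). Unlike the snippets above, the error from `Run()` is returned instead of being discarded, so a port conflict would at least be visible.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// tunnelArgs builds the argument list for one 6tunnel invocation in the
// order used in this thread. (Hypothetical helper.)
func tunnelArgs(family, listenHost string, listenPort int, remoteHost string, remotePort int) []string {
	return []string{
		family, "-l", listenHost,
		strconv.Itoa(listenPort),
		remoteHost,
		strconv.Itoa(remotePort),
	}
}

// startTunnel runs 6tunnel with the given arguments and returns any error
// instead of ignoring it. (Hypothetical helper.)
func startTunnel(args []string) error {
	if err := exec.Command("6tunnel", args...).Run(); err != nil {
		return fmt.Errorf("6tunnel %v: %w", args, err)
	}
	return nil
}

func main() {
	// Listen on the intermediary port 8117 over IPv6 and forward to the
	// local peering port 8107, so 6tunnel and typesense-server do not both
	// try to bind 8107.
	args := tunnelArgs("-6", "::", 8117, "localhost", 8107)
	fmt.Println(args)
	if err := startTunnel(args); err != nil {
		fmt.Println("tunnel failed:", err)
	}
}
```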
j
Alrighty glad we’re thinking the same, I’ll give this a go. About to catch a flight so may be a tad delayed. Really appreciate the help thinking this out so far Jason. Will report back
👍 1
Hey Jason - reporting back. A bit of a caveat with re-mapping the peering port: Typesense throws an error saying that the ip/port pair does not exist in the nodes file. Let me explain…
1. I've added a new IPv6 listener that routes requests from port 8117 to 8107 (my peering port, defined as a flag).
2. I route localhost 8119 requests from LAX to SJC (in the same vein, 8119 is the intermediary port that points to SJC's 8109 peering port).
Here's the LAX block, with the two new lines marked new!
if typesenseVars.Region == "lax" {

	exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8062").Run()
	exec.Command("6tunnel", "-6", "-l", "::", "8117", "localhost", "8107").Run() // new!

	exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "localhost", "8062").Run()
	exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

	exec.Command("6tunnel", "-4", "-l", "localhost", "8063", "sjc.foundry-typesense-cluster.internal").Run()
	exec.Command("6tunnel", "-4", "-l", "localhost", "8119", "sjc.foundry-typesense-cluster.internal").Run() // new!

	svisor.AddProcess(
		"typesense-server",
		"doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-address=127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
		supervisor.WithRestart(0, 1*time.Second),
	)

}
For typesense to actually choose who votes, we need to update the
nodes
file too, and since we need the requests to be sent to the intermediary ports we need to update our file like so:
localhost:8117:8062,localhost:8119:8063 (8117 instead of 8107 etc)
Which now means Typesense throws an error:
typesense-server | W0520 19:07:48.512210   652 external/com_github_brpc_braft/src/braft/node.cpp:1589] node default_group:127.0.0.1:8107:8062 can't do pre_vote as it is not in 127.0.0.1:8117:8062,127.0.0.1:8119:8063
Since the IP/port (127.0.0.1:8107) defined in the server flag is not in the nodes file…I have been stumped again.
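The check that trips here can be reproduced with a small Go sketch. It assumes, based only on the log line above, that the node identifies itself as peering address, peering port, API port, and looks for that exact entry in the nodes list. `selfID` and `inNodes` are hypothetical names, not Typesense or braft functions.

```go
package main

import (
	"fmt"
	"strings"
)

// selfID builds the identifier the log shows for the local node:
// peering address, peering port, API port. (Hypothetical helper.)
func selfID(peeringAddr string, peeringPort, apiPort int) string {
	return fmt.Sprintf("%s:%d:%d", peeringAddr, peeringPort, apiPort)
}

// inNodes reports whether id is one of the comma-separated entries.
func inNodes(id, nodes string) bool {
	for _, entry := range strings.Split(nodes, ",") {
		if strings.TrimSpace(entry) == id {
			return true
		}
	}
	return false
}

func main() {
	id := selfID("127.0.0.1", 8107, 8062)

	// Nodes list with the intermediary ports: the node cannot find itself.
	fmt.Println(inNodes(id, "127.0.0.1:8117:8062,127.0.0.1:8119:8063"))

	// Earlier nodes list using the real peering port: the node finds itself.
	fmt.Println(inNodes(id, "127.0.0.1:8107:8062,127.0.0.1:8107:8063"))
}
```

This only mirrors the membership check reported in the log; it does not show how to make the peers reachable through the intermediary ports.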
j
That’s great to hear! 🙌