#community-help

Setting Up HA Cluster on Fly with Typesense

TLDR Jordan needed help setting up an HA cluster on Fly for Typesense. Jason assisted in troubleshooting the configuration and 6tunnel usage. Eventually, a solution was found in the Fly community forum.

May 19, 2023
Jordan
08:59 PM
Hi all - I’m trying to get an HA cluster running on Fly. I see that the current workaround is to use 6tunnel, since Fly.io doesn’t natively support IPv4, but I’m a bit confused about which IPs I should use in the nodes file and for the peering addresses (since they can’t be IPv4…right?)

I posted a question on the Fly.io forum but I think it’s a bit too typesense specific so was wondering if we could work through it here? Please let me know what else I can provide

Current raft error I see is:

typesense-server | W0519 20:38:53.494565   652 external/com_github_brpc_braft/src/braft/node.cpp:1559] node default_group:127.0.0.1:8107:8063 request PreVote from 127.0.0.1:8107:8062 error: [E2][172.19.128.34:8107][E2]peer_id not exist

Most likely it has to do with my nodes file using localhost, since I thought that’s how I could use 6tunnel to communicate from a local port to the other node 🤔

https://community.fly.io/t/how-to-run-a-typesense-ha-cluster-on-fly/12994
Jason
10:31 PM
Hmmm! Could you share your current nodes file?
Jordan
10:37 PM
Yea! It’s linked in the post there for full context, but here ya go 🙂

localhost:8107:8062,localhost:8107:8063

Also, for context on multiple nodes: the below is hardcoded for now, but I thought I could do something like it for each region I deploy in.

In this case port 8062 is for the LAX region and port 8063 is for the SJC region.

if typesenseVars.Region == "lax" {
    exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8062").Run()

    exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "localhost", "8062").Run()
    exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

    exec.Command("6tunnel", "-4", "-l", "localhost", "8063", "sjc.foundry-typesense-cluster.internal").Run()
    exec.Command("6tunnel", "-4", "-l", "localhost", "8107", "sjc.foundry-typesense-cluster.internal").Run()

    svisor.AddProcess(
        "typesense-server",
        "doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-address 127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
        supervisor.WithRestart(0, 1*time.Second),
    )
}

if typesenseVars.Region == "sjc" {
    exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8063").Run()

    exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8063", "localhost", "8063").Run()
    exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

    exec.Command("6tunnel", "-4", "-l", "localhost", "8062", "lax.foundry-typesense-cluster.internal").Run()
    exec.Command("6tunnel", "-4", "-l", "localhost", "8107", "lax.foundry-typesense-cluster.internal").Run()

    svisor.AddProcess(
        "typesense-server",
        "doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8063 --peering-address 127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
        supervisor.WithRestart(0, 1*time.Second),
    )
}

Also more than happy to write some documentation for how to currently handle HA on Fly as I’m sure it’ll be fairly common!
May 20, 2023
Jason
12:32 AM
Yeah it’s probably the repeated hostnames that are throwing things off.

Are you able to set up different local IP addresses on each node via 6tunnel?

For example, on node 1 set up 127.0.0.1 as the IP, on node 2 set up 127.0.0.2, and on node 3 set up 127.0.0.3
Jordan
12:47 AM
Thanks for the help so far 🙂 Bear with me as I’m pretty unknowledgeable on network related topics! Is this kind of what you mean? I changed every localhost in the 6tunnel commands to 127.0.0.1 for LAX and 127.0.0.2 for SJC:

if typesenseVars.Region == "lax" {
    exec.Command("6tunnel", "-6", "-l", "::", "8108", "127.0.0.1", "8062").Run()

    exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "127.0.0.1", "8062").Run()
    exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "127.0.0.1", "8107").Run()

    exec.Command("6tunnel", "-4", "-l", "127.0.0.1", "8063", "sjc.foundry-typesense-cluster.internal").Run()
    exec.Command("6tunnel", "-4", "-l", "127.0.0.1", "8107", "sjc.foundry-typesense-cluster.internal").Run()

    svisor.AddProcess(
        "typesense-server",
        "doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-address 127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
        supervisor.WithRestart(0, 1*time.Second),
    )
}

if typesenseVars.Region == "sjc" {
    exec.Command("6tunnel", "-6", "-l", "::", "8108", "127.0.0.2", "8063").Run()

    exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8063", "127.0.0.2", "8063").Run()
    exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8107", "127.0.0.2", "8107").Run()

    exec.Command("6tunnel", "-4", "-l", "127.0.0.2", "8062", "lax.foundry-typesense-cluster.internal").Run()
    exec.Command("6tunnel", "-4", "-l", "127.0.0.2", "8107", "lax.foundry-typesense-cluster.internal").Run()

    svisor.AddProcess(
        "typesense-server",
        "doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8063 --peering-address 127.0.0.2 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
        supervisor.WithRestart(0, 1*time.Second),
    )
}

And then I updated my nodes file like this for LAX and SJC respectively:

127.0.0.1:8107:8062,127.0.0.2:8107:8063

As a result, I get an error from LAX stating that it cannot listen on 127.0.0.1:8107, which means that I haven’t yet set up 6tunnel correctly to listen on that address
Jason
12:51 AM
This is probably a question for the Fly team…
Jordan
12:51 AM
roger that
Jason
12:51 AM
Essentially from Typesense’s perspective, you’d need unique IP addresses for each node (it’s ok for the port numbers to be the same)
Jason
12:52 AM
Oh wait
Jason
12:52 AM
hang on, scratch that. Not sure what I was thinking!
Jason
12:52 AM
You need a unique combination of ip:peering_port:api_port for each node
Jason
12:54 AM
From your very first setup:

localhost:8107:8062,localhost:8107:8063

Could you try doing something like this instead:
127.0.0.1:8107:8062,127.0.0.1:8107:8063
Jason
12:54 AM
Essentially use IP address instead of hostname
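A minimal Go sketch of the uniqueness rule discussed above. It assumes each nodes-file entry has the form address:peering_port:api_port, which matches the flags in this thread (8107 is passed as --peering-port, 8062 and 8063 as --api-port). The names Node, ParseNodes, and DuplicateEntries are hypothetical helpers for illustration, not part of Typesense.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is one entry of a nodes file, assumed to be
// address:peering_port:api_port (IPv4 address or hostname only).
type Node struct {
	Address     string
	PeeringPort string
	APIPort     string
}

// ParseNodes splits a comma-separated nodes line into entries.
func ParseNodes(line string) ([]Node, error) {
	var nodes []Node
	for _, entry := range strings.Split(strings.TrimSpace(line), ",") {
		parts := strings.Split(strings.TrimSpace(entry), ":")
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed entry %q: want address:peering_port:api_port", entry)
		}
		nodes = append(nodes, Node{Address: parts[0], PeeringPort: parts[1], APIPort: parts[2]})
	}
	return nodes, nil
}

// DuplicateEntries returns every entry whose full triple was already seen.
func DuplicateEntries(nodes []Node) []Node {
	seen := make(map[Node]bool)
	var dups []Node
	for _, n := range nodes {
		if seen[n] {
			dups = append(dups, n)
		}
		seen[n] = true
	}
	return dups
}

func main() {
	nodes, err := ParseNodes("localhost:8107:8062,localhost:8107:8063")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, n := range nodes {
		fmt.Printf("address=%s peering_port=%s api_port=%s\n", n.Address, n.PeeringPort, n.APIPort)
	}
	fmt.Println("duplicate triples:", len(DuplicateEntries(nodes)))
}
```

By this check the original nodes file has no duplicate triples, since the two entries differ in their API port, so the check alone does not explain the peer_id error.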
Jordan
12:55 AM
absolutely - one sec!
Jason
12:56 AM
Also, two nodes will not form a cluster, you’d need at least 3 nodes running together (odd number)
Jordan
12:57 AM
yea! since one can fail in a three node cluster, I figured just trying to get two running before pushing a third out was an OK approach, but I can push a third if that may be what’s causing issues
Jason
12:59 AM
Once a 3 node cluster starts functioning (they’ve already established quorum about which node is the leader), then the cluster will tolerate 1 node failure. But if only two nodes come up afresh, then they won’t be able to elect a leader among themselves
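The fault-tolerance numbers above follow from general majority-quorum arithmetic. The sketch below shows only that arithmetic; it is not Typesense's or braft's actual election code, and the function names are made up for illustration.

```go
package main

import "fmt"

// Majority returns how many votes are needed for a majority among n voters.
func Majority(n int) int {
	return n/2 + 1
}

// TolerableFailures returns how many of n voters can be down
// while a majority is still reachable.
func TolerableFailures(n int) int {
	return n - Majority(n)
}

func main() {
	for _, n := range []int{2, 3, 5} {
		fmt.Printf("nodes=%d majority=%d tolerable_failures=%d\n",
			n, Majority(n), TolerableFailures(n))
	}
}
```

With three voters the majority is two, so one node can fail; with two voters the majority is also two, so nothing can fail.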
Jordan
01:01 AM
ooh i see - thanks for that insight, will push out another. i feel like the nodes being unable to listen on 8107 is going to be a problem regardless, so let me see if that error goes away first (or maybe I’m off again hah)
Jason
01:02 AM
Could you try using localhost in the 6tunnel config, and only use 127.0.0.1 in the Typesense nodes file?
Jordan
01:04 AM
yea for sure. I think what’s about to happen is that when I deploy to LAX, we’ll see that it can listen on 8107 (but can’t start the peer vote with the other server since it’s not running yet), but when I push another, SJC will say that it cannot listen on 8107 (and not even get to the point of trying the peer vote). About to confirm
Jordan
01:05 AM
here’s LAX confirming it can listen on 8107, about to spin up SJC
[Image attached]
Jordan
01:07 AM
yea get a load of this 🤯 for some reason LAX can listen on 8107 but SJC can’t
[Image attached]
Jordan
01:07 AM
pretty sure I’ve done it the other way too where I deployed SJC first and it was able to listen to that port but the second server messes things up
Jason
01:07 AM
Could you post more of the log lines?
Jason
01:08 AM
Could you also try adding --api-address 127.0.0.1 as an additional command line arg to typesense-server ?
Jordan
01:10 AM
I’m using Axiom for logs. I would export them if I knew how, let me look into that. Here’s a screenshot leading up to the SJC error for now. And yes, let me try adding that address
[Image attached]
Jason
01:11 AM
Did you change these lines in SJC back to 127.0.0.1?

exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8063", "127.0.0.2", "8063").Run()
exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8107", "127.0.0.2", "8107").Run()
Jason
01:11 AM
That has to be 127.0.0.1
Jordan
01:12 AM
Jason
01:12 AM
Ah right ok!
Jordan
01:12 AM
thanks for double checking - let me try that api address flag
Jordan
01:13 AM
pushing now - the fact that it works for one and not the other has me puzzled lol
Jason
01:15 AM
Yeah, maybe the network layer is somehow shared across containers in Fly?
Jason
01:15 AM
If that’s the case you might want to try different port numbers for each api-port and peering port on each node
Jordan
01:18 AM
ok both are showing a failure to listen on 8107, maybe that’s what happens when both are running and I’ve just seen it failing on one when both start running.

alright let’s see how i can change up those ports 😄 EDIT: which i already do for the api ports, just need to do it for the peering ports
Jordan
01:32 AM
ok making headway i think, no more peer listening errors, now it just cannot seem to connect to the other server for the peer vote. This wouldn’t be a problem with running 2 servers instead of 3….yet right?
[Image attached]
Jason
02:43 AM
I don’t think so, this error seems like it’s not even able to open the tcp connection to the other node
Jason
02:45 AM
Although, I do see the multi-node with no leader error message as well. That is definitely related to having only 2 nodes
Jason
02:45 AM
Could you try with all 3 nodes, just to rule out that causing some side effects
Jordan
02:46 AM
yea! 😄 I’ve added a 3rd node for SEA to confirm things while debugging. I’ve sorta narrowed it down to 6tunnel not sending the peering request in the right ip format or the receiving side not set up correctly to listen to whatever ip format is being set
Jordan
02:47 AM
line 52 is where it should be picking up that request that you see in the error that is marked as connection refused
Jason
02:48 AM
Don’t you need another line similar to line 46, for the other port (8107) as well?
Jordan
02:51 AM
that’s what i was sorta thinking too, however i can’t set something like

exec.Command("6tunnel", "-6", "-l", "::", "8107", "localhost", "8107").Run()

since that’s the port I’m using in my typesense-server command and it’ll throw a conflict so I need to add an intermediary port:

exec.Command("6tunnel", "-6", "-l", "::", "8117", "localhost", "8107").Run()

^ like that and then from my other nodes send to 8117

..i think

sighs
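The port conflict described above can be reproduced in isolation. This is a small Go sketch, not the thread's actual setup: it binds a listener to a free loopback port and then tries to bind the same address a second time, which is the general situation when a tunnel and typesense-server both want the same port. SecondBindFails is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net"
)

// SecondBindFails opens a TCP listener on a free loopback port, then tries
// to bind the same address again while the first listener is still open.
// It reports whether the second attempt was rejected.
func SecondBindFails() (bool, error) {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		// The address is already taken by the first listener.
		return true, nil
	}
	second.Close()
	return false, nil
}

func main() {
	failed, err := SecondBindFails()
	if err != nil {
		fmt.Println("could not open the first listener:", err)
		return
	}
	fmt.Println("second bind rejected:", failed)
}
```

Listening on a different intermediary port, as proposed with 8117, avoids this clash because only one process binds each port.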
Jason
02:52 AM
Right, that’s what I’m thinking too
Jason
02:52 AM
Just like how you’ve mapped Typesense’s api port 8108 to another port
Jordan
02:53 AM
Alrighty glad we’re thinking the same, I’ll give this a go. About to catch a flight so may be a tad delayed. Really appreciate the help thinking this out so far Jason. Will report back

Jordan
07:10 PM
Hey Jason - reporting back. A bit of a caveat with the re-mapping of the peering port is that Typesense throws an error that the ip/port pair does not exist in the nodes file. Let me explain…

1. I’ve added a new IPv6 listener that routes requests from Port 8117 to 8107 (my peering port defined as flag)
2. I route localhost 8119 requests from LAX to SJC (in the same vein, 8119 is the intermediary port that points to SJC’s 8109 peering port).
Here’s the LAX block; I added new! to these two updates:

if typesenseVars.Region == "lax" {

    exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8062").Run()
    exec.Command("6tunnel", "-6", "-l", "::", "8117", "localhost", "8107").Run() // new!

    exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "localhost", "8062").Run()
    exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

    exec.Command("6tunnel", "-4", "-l", "localhost", "8063", "sjc.foundry-typesense-cluster.internal").Run()
    exec.Command("6tunnel", "-4", "-l", "localhost", "8119", "sjc.foundry-typesense-cluster.internal").Run() // new!

    svisor.AddProcess(
        "typesense-server",
        "doppler run -- typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-address=127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
        supervisor.WithRestart(0, 1*time.Second),
    )

}

For typesense to actually choose who votes, we need to update the nodes file too, and since we need the requests to be sent to the intermediary ports we need to update our file like so:

localhost:8117:8062,localhost:8119:8063 (8117 instead of 8107 etc)

Which now means Typesense throws an error:

typesense-server | W0520 19:07:48.512210   652 external/com_github_brpc_braft/src/braft/node.cpp:1589] node default_group:127.0.0.1:8107:8062 can't do pre_vote as it is not in 127.0.0.1:8117:8062,127.0.0.1:8119:8063

Since the IP/port (127.0.0.1:8107) defined in the server flag is not in the nodes file…I have been stumped again.
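The mismatch in that log line can be checked before starting the server. The Go sketch below assumes a node's identity is the string address:peering_port:api_port built from its own flags, as the log line suggests; InNodes is a hypothetical helper, and the second nodes line in main is invented only to show a passing check, not a configuration confirmed to work in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// InNodes reports whether self (address:peering_port:api_port) appears
// verbatim as an entry in the comma-separated nodes line.
func InNodes(self, nodesLine string) bool {
	for _, entry := range strings.Split(nodesLine, ",") {
		if strings.TrimSpace(entry) == self {
			return true
		}
	}
	return false
}

func main() {
	// Identity taken from the log line above: --peering-address 127.0.0.1,
	// --peering-port 8107, --api-port 8062.
	self := "127.0.0.1:8107:8062"

	// Nodes list as reported in the log line above.
	fmt.Println(InNodes(self, "127.0.0.1:8117:8062,127.0.0.1:8119:8063"))

	// Hypothetical list that keeps the node's own entry on its real peering port.
	fmt.Println(InNodes(self, "127.0.0.1:8107:8062,127.0.0.1:8119:8063"))
}
```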
Jordan
10:04 PM
May 21, 2023
Jason
06:02 PM
That’s great to hear! 🙌
