# community-help
c
Hi, currently debugging our slow queries (it seems to only be happening on 1 node based on the dashboard, with some queries taking as long as 48 seconds) and found this online (screenshot):
• it says that the leader node is always the one used for write processes. If that's correct, am I right to assume that node 1 is always considered the primary node? Because if that's right, we plan to direct most of our read queries to nodes 2 and 3 only
f
Can you share your cluster's ID? Also, the client libraries already use round robin, so there's no need to create separate Client objects
c
@Fanis Tharropoulos 89xrjew4m6sk7lqpp.a1.typesense.net but from our dashboard, there's always 1 node at 100% CPU usage
f
Which client library are you using?
c
python
f
How have you set up your Client object?
c
{
    'nodes': [
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
f
https://typesense.org/docs/guide/high-availability.html#when-using-typesense-cloud-or-a-load-balancer You should also add the nearest node for the load balancing to take place
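A minimal sketch of that, assuming the load-balanced endpoint is just the cluster hostname without the -1/-2/-3 suffix (i.e. f'{host_unique_key}.{host_domain}'):
config_with_nearest_node = {
    # Load-balanced endpoint (assumption: hostname without a node suffix); requests go here first
    'nearest_node': {'host': f'{host_unique_key}.{host_domain}', 'port': '443', 'protocol': 'https'},
    # Individual nodes remain listed as fallbacks
    'nodes': [
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
typesense_client = typesense.Client(config_with_nearest_node)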
c
from our understanding, the client's round robin works like this: it will keep using a certain node until it reaches a certain level, and only then will it move on to the next node
we used nearest_node before, but it was explained to us that with that setting it will always use that node; what we want is to distribute all incoming queries equally across all nodes
like if node 1 is used now, the next query should use node 2 even if node 1 is not yet loaded
@Fanis Tharropoulos how about my question regarding the write process: does it always use the leader node? And is that always node 1?
f
The nearest node will load balance across all three nodes by default, overriding the client's round robin. It takes place at the infra level
Yes, the leader takes the writes and distributes them. You can re-elect a leader though, so it won't always be node 1
c
how do we determine which is the leader?
f
It doesn't matter which node the write request is sent to, Typesense will route the write to the leader internally
If you want to find the leader, you can call the /debug route. If the state key is 1, then it's the leader
The other nodes will have a state of 4, being follower nodes
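For example, a quick sketch with the requests library, reusing the host variables from your config:
import requests

# Ask each node who it thinks it is: state == 1 -> leader, state == 4 -> follower
for i in (1, 2, 3):
    host = f'{host_unique_key}-{i}.{host_domain}'
    resp = requests.get(
        f'https://{host}/debug',
        headers={'X-TYPESENSE-API-KEY': Bootstrapper.config.typesense_api_key},
    )
    print(host, resp.json().get('state'))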
c
would it help if, for example, the leader node is node 1, and on our Python client we intentionally use only node 2 and node 3 for some read queries?
@Fanis Tharropoulos are there cases where the leader node changes? Or is it fixed to a certain node?
f
Yes, if the node goes unhealthy, for example
c
this is the reason why we removed the nearest node before:
but the point was, because we only had a single worker, it didn't work like that
because it round robins at the DNS level
but the OS can cache the DNS response
so that works best when the Typesense queries are coming from the client side (e.g. browsers of users of a webshop)
that also played into removing it
@Fanis Tharropoulos because of that we created our own round robin: we have 3 clients, where the order of the nodes is different for each
typesense_client_a_config = {
    'nodes': [
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
typesense_client_a = typesense.Client(typesense_client_a_config)

typesense_client_b_config = {
    'nodes': [
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
typesense_client_b = typesense.Client(typesense_client_b_config)

typesense_client_c_config = {
    'nodes': [
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
typesense_client_c = typesense.Client(typesense_client_c_config)
with this, we use them in a round-robin fashion: if the first query uses client_a, the next query uses client_b, and so on, to evenly distribute the load
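roughly like this on our side (a simplified sketch; run_search and 'our_collection' are just placeholder names):
import itertools

# Cycle through the three pre-built clients, one per query
client_cycle = itertools.cycle([typesense_client_a, typesense_client_b, typesense_client_c])

def run_search(search_parameters):
    client = next(client_cycle)
    return client.collections['our_collection'].documents.search(search_parameters)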
@Fanis Tharropoulos but somehow, as the tasks are running, there are times when 1 node hits 100% CPU usage, causing queries of up to 48 seconds
any idea why that is? maybe something on your side?
f
With 3 clients, each with the same 3 nodes, if you're always going 1-2-3 you're essentially layering your own rotation on top of the round robin each client already does. I'll have to take a look into your logs as well, but that's not advised and may be the reason why.
c
@Fanis Tharropoulos sorry, but would it be ok if we do a scheduled call, or maybe with one of your colleagues if you're not free? My product manager and CTO are requesting a call regarding our issue since it's having a huge impact on our user base and is causing downtime. It's our highest priority right now, and we would really appreciate it if you could give us time for a call so we can explain everything clearly. Hoping it would be no later than this Thursday (September 25), Netherlands time zone; today or tomorrow would be better. Thank you, appreciate your help on this
f
Hey Cris, we can only offer support through phone and video meetings on our Business support tier and up. If you want, you can sign up and we'll send you a Calendly link to book a meeting time that works best for you.
c
@Fanis Tharropoulos wasn't aware of that, and we're on the free tier only, but let me just discuss our case here:
@Fanis Tharropoulos would you be able to check on your side if there's anything you find weird, or any idea of what might be causing the issue? Case:
• we have a worker with 4 concurrencies that processes a scheduled task, running 4 Typesense searches in parallel. At first we ran it against the demo Typesense (not multi node) and didn't encounter any performance issues. Then we deployed it to our prod Typesense (multi node), and somehow, in prod, one node is getting overloaded, causing query times to reach as high as 48 seconds. This is the first time we've encountered this, since before we only ran the scheduled task linearly (so 1 at a time)
• the difference between demo and prod is that, in prod, we have actual users accessing our website who perform other Typesense queries/writes; demo is more of a testing app
• also, this time the scheduled task runs 4 Typesense queries in parallel, whereas before it was only 1
• 1 node is reaching as high as 100% CPU
f
It seems you're using Burst Capacity CPU and had used up your credits, leading to a CPU throttle. Apart from that, it looks like every write request is being routed to that unhealthy node, rather than leaving it up to Typesense to route the request internally. We'd suggest upgrading to a non-burst capacity CPU setup and following our guide for load balancing on Typesense Cloud instead of manually routing requests to different nodes
c
@Fanis Tharropoulos will take note of this. Also, I just need to clarify a case: we have instantiated 3 Typesense clients via the Python library
client_a = {nodes: [host-1, host-2, host-3]}
client_b = {nodes: [host-2, host-3, host-1]}
client_c = {nodes: [host-3, host-1, host-2]}
Questions: 1. Since we are running 4 Typesense queries at the same time in our scheduled task, if these queries are set to use client_a (where host-1 is the first one), would they all use host-1? Or, if host-1 becomes unhealthy on the 2nd query, would Typesense automatically route the 3rd and 4th queries to the next node? 2. If other processes are running Typesense queries and we set them to use client_b (where host-2 is the first one), would they use host-2? And if it's unhealthy, would they be rerouted to the next node?
f
We wouldn't suggest using three clients at once. Having a single global one and letting it handle everything is the way to go.
c
@Fanis Tharropoulos that's how it was before, but to clarify, using 1 global client: 1. Since we are running 4 Typesense queries at the same time in our scheduled task, the first query will use the first node, right? And if node 1 becomes unhealthy or hits a set CPU limit on the 2nd query, would Typesense automatically route the 3rd and 4th queries to the next node?
a
@Cris, if you don't have SDN activated, the requests are distributed in a round-robin way. But still, you'll want to add the load-balanced endpoint to the client initialization (the load-balanced endpoint is the one without any '-'). If you are using the appropriate endpoint, yes, Typesense will redirect the request. Also, for write requests, it doesn't matter which node you send it to; it will be proxied to the leader node for processing.
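A sketch of what that could look like on the client side, building on the config sketched earlier with nearest_node (assuming your installed typesense-python version supports these retry/healthcheck options; verify the exact names and defaults against the client's README):
config_with_failover = {
    **config_with_nearest_node,           # the load-balanced endpoint plus the individual nodes, as sketched above
    'num_retries': 3,                     # try up to 3 nodes before giving up on a request
    'retry_interval_seconds': 1,          # pause between retries
    'healthcheck_interval_seconds': 60,   # how long an unhealthy node is set aside before being retried
}
typesense_client = typesense.Client(config_with_failover)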