# community-help
c
Hi, currently debugging our slow queries (it seems to only be happening on 1 node based on the dashboard, with some queries taking as long as 48 seconds) and found this online (screenshot):
• it says that the leader node is always the one used for write processes. If that's correct, am I right to assume that node 1 is always considered the primary node? Because if that's right, we plan to direct most of our read queries to nodes 2 and 3 only
f
Can you share your cluster's ID? Also, the client libraries already use round robin, so there's no need to create separate Client objects
c
@Fanis Tharropoulos 89xrjew4m6sk7lqpp.a1.typesense.net but from our dashboard, there's always 1 node at 100% CPU usage
f
Which client library are you using?
c
python
f
How have you set up your Client object?
c
{
    'nodes': [
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
f
https://typesense.org/docs/guide/high-availability.html#when-using-typesense-cloud-or-a-load-balancer You should also add the nearest node for the load balancing to take place
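A minimal sketch of that, assuming the load-balanced endpoint is just the cluster hostname without the -1/-2/-3 suffix (i.e. f'{host_unique_key}.{host_domain}'):
config_with_nearest_node = {
    # Load-balanced endpoint (assumption: hostname without a node suffix); requests go here first
    'nearest_node': {'host': f'{host_unique_key}.{host_domain}', 'port': '443', 'protocol': 'https'},
    # Individual nodes remain listed as fallbacks
    'nodes': [
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
typesense_client = typesense.Client(config_with_nearest_node)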
c
from our understanding, the client's round robin works like this: it will keep using a certain node until it reaches a certain level, and only then will it move on to the next node
we used nearest_node before, but it was explained to us that with that setting it will always use that node; what we want is to distribute all incoming queries equally across all nodes
like if node 1 is used now, the next query should use node 2 even if node 1 is not yet loaded
@Fanis Tharropoulos how about my question regarding the write process: does it always use the leader node? And is that always node 1?
f
The nearest node will load balance across all three nodes by default, overriding the client's round robin. It takes place at the infra level
Yes, the leader takes the writes and distributes them. You can re-elect a leader though, so it won't always be node 1
c
how do we determine which is the leader?
f
It doesn't matter which node the write request is sent to, Typesense will route the write to the leader internally
If you want to find the leader, you can call the /debug route. If the state key is 1, then it's the leader
The other nodes will have a state of 4, being follower nodes
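For example, a quick sketch with the requests library, reusing the host variables from your config:
import requests

# Ask each node who it thinks it is: state == 1 -> leader, state == 4 -> follower
for i in (1, 2, 3):
    host = f'{host_unique_key}-{i}.{host_domain}'
    resp = requests.get(
        f'https://{host}/debug',
        headers={'X-TYPESENSE-API-KEY': Bootstrapper.config.typesense_api_key},
    )
    print(host, resp.json().get('state'))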
c
would it help if, for example, the leader node is node 1, and on our Python client we intentionally use only node 2 and node 3 for some read queries?
@Fanis Tharropoulos are there cases where the leader node changes? Or is it fixed to a certain node?
f
Yes, if the node goes unhealthy, for example
c
this is the reason why we removed the nearest node before:
but the point was, because we only had a single worker, it didn't work like that
because it round robins at the DNS level
but the OS can cache the DNS response
so that works best when the Typesense queries are coming from the client side (e.g. browsers of users of a webshop)
that also played into removing it
@Fanis Tharropoulos because of that we created our own round robin: we have 3 clients, where the order of the nodes is different for each
typesense_client_a_config = {
    'nodes': [
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
typesense_client_a = typesense.Client(typesense_client_a_config)

typesense_client_b_config = {
    'nodes': [
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
typesense_client_b = typesense.Client(typesense_client_b_config)

typesense_client_c_config = {
    'nodes': [
        {'host': f'{host_unique_key}-3.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-1.{host_domain}', 'port': '443', 'protocol': 'https'},
        {'host': f'{host_unique_key}-2.{host_domain}', 'port': '443', 'protocol': 'https'},
    ],
    'api_key': Bootstrapper.config.typesense_api_key,
    'connection_timeout_seconds': typesense_connection_timeout
}
typesense_client_c = typesense.Client(typesense_client_c_config)
with this, we use them in a round-robin fashion: if the first query uses client_a, the next query uses client_b, and so on, to evenly distribute the load
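roughly like this on our side (a simplified sketch; run_search and 'our_collection' are just placeholder names):
import itertools

# Cycle through the three pre-built clients, one per query
client_cycle = itertools.cycle([typesense_client_a, typesense_client_b, typesense_client_c])

def run_search(search_parameters):
    client = next(client_cycle)
    return client.collections['our_collection'].documents.search(search_parameters)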
@Fanis Tharropoulos but somehow, as the tasks are running, there are times when 1 node hits 100% CPU usage, causing queries of up to 48 seconds
any idea why that is? maybe something on your side?
f
With 3 clients, each with the same 3 nodes, if you're always going 1-2-3 you're essentially layering your own rotation on top of the round robin each client already does. I'll have to take a look into your logs as well, but that's not advised and may be the reason why.
c
@Fanis Tharropoulos sorry, but would it be ok if we do a scheduled call, or maybe with one of your colleagues if you're not free? My product manager and CTO are requesting a call regarding our issue since it's having a huge impact on our user base and is causing downtime. It's our highest priority right now, and we would really appreciate it if you could give us time for a call so we can explain everything clearly. Hoping it would be no later than this Thursday (September 25), Netherlands time zone; today or tomorrow would be better. Thank you, appreciate your help on this
f
Hey Cris, we can only offer support through phone and video meetings on our Business support tier and up. If you want, you can sign up and we'll send you a Calendly link to book a meeting time that works best for you.
c
@Fanis Tharropoulos wasn't aware of that, and we're on the free tier only, but let me just discuss our case here:
@Fanis Tharropoulos would you be able to check on your side if there's anything you find weird, or any idea of what might be causing the issue? Case:
• we have a worker with 4 concurrencies that processes a scheduled task, running 4 Typesense searches in parallel. At first we ran it against the demo Typesense (not multi node) and didn't encounter any performance issues. Then we deployed it to our prod Typesense (multi node), and somehow, in prod, one node is getting overloaded, causing query times to reach as high as 48 seconds. This is the first time we've encountered this, since before we only ran the scheduled task linearly (so 1 at a time)
• the difference between demo and prod is that, in prod, we have actual users accessing our website who perform other Typesense queries/writes; demo is more of a testing app
• also, this time the scheduled task runs 4 Typesense queries in parallel, whereas before it was only 1
• 1 node is reaching as high as 100% CPU
f
It seems you're using Burst Capacity CPU and had used up your credits, leading to a CPU throttle. Apart from that, it looks like every write request is being routed to that unhealthy node, rather than leaving it up to Typesense to route the request internally. We'd suggest upgrading to a non-burst capacity CPU setup and following our guide for load balancing on Typesense Cloud instead of manually routing requests to different nodes
c
@Fanis Tharropoulos will take note of this. Also, I just need to clarify a case: we have instantiated 3 Typesense clients via the Python library
client_a = {nodes: [host-1, host-2, host-3]}
client_b = {nodes: [host-2, host-3, host-1]}
client_c = {nodes: [host-3, host-1, host-2]}
Questions: 1. Since we are running 4 Typesense queries at the same time in our scheduled task, if these queries are set to use client_a (where host-1 is the first one), would they all use host-1? Or, if host-1 becomes unhealthy on the 2nd query, would Typesense automatically route the 3rd and 4th queries to the next node? 2. If other processes are running Typesense queries and we set them to use client_b (where host-2 is the first one), would they use host-2? And if it's unhealthy, would they be rerouted to the next node?
f
We wouldn't suggest using three clients at once. Having a single global one and letting it handle everything is the way to go.
c
@Fanis Tharropoulos that's how it was before, but to clarify, using 1 global client: 1. Since we are running 4 Typesense queries at the same time in our scheduled task, the first query will use the first node, right? And if node 1 becomes unhealthy or hits a set CPU limit on the 2nd query, would Typesense automatically route the 3rd and 4th queries to the next node?
a
@Cris, if you don't have SDN activated, the requests are distributed in a round-robin way. But still, you'll want to add the load-balanced endpoint to the client initialization (the load-balanced endpoint is the one without any '-'). If you are using the appropriate endpoint, yes, Typesense will redirect the request. Also, for write requests, it doesn't matter which node you send it to; it will be proxied to the leader node for processing.
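A sketch of what that could look like on the client side, building on the config sketched earlier with nearest_node (assuming your installed typesense-python version supports these retry/healthcheck options; verify the exact names and defaults against the client's README):
config_with_failover = {
    **config_with_nearest_node,           # the load-balanced endpoint plus the individual nodes, as sketched above
    'num_retries': 3,                     # try up to 3 nodes before giving up on a request
    'retry_interval_seconds': 1,          # pause between retries
    'healthcheck_interval_seconds': 60,   # how long an unhealthy node is set aside before being retried
}
typesense_client = typesense.Client(config_with_failover)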