# community-help
c
Hi, any idea of the possible reason why there's quite a difference in the returned `search_time_ms` from 2 environments using exactly the same search parameters?
• Production env - we're looping over thousands of users and searching via Typesense, but some of the `search_time_ms` values return an average of 6000+ ms
• Local device - tried to replicate this while the for-loop in prod is running; the local app is pointed at the same Typesense collection with the same search parameters, and `search_time_ms` only returns less than 1000 ms
k
How many concurrent searches are happening on the production env?
c
@Kishore Nallan we have a Celery worker task that loops over roughly 5k users; for each user we're using the `multi_search` feature with at least 4 search parameters each. Then we have another web app where one endpoint searches the same collection; the number of requests depends on the logged-in users
since we implemented Typesense, these long queries have only started happening since last week
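For reference, a minimal sketch of what one iteration of such a Celery task might look like with the Python client — the hostnames, API key, collection names and filters below are purely hypothetical placeholders, not the actual production setup:

```python
import typesense

# Hypothetical client setup -- hosts, API key, collections and filters
# are placeholders, not the real production values.
client = typesense.Client({
    "api_key": "xyz",
    "nodes": [
        {"host": "node-1.example.net", "port": "443", "protocol": "https"},
        {"host": "node-2.example.net", "port": "443", "protocol": "https"},
        {"host": "node-3.example.net", "port": "443", "protocol": "https"},
    ],
    "connection_timeout_seconds": 2,
})

def search_for_user(user_id):
    # One multi_search call bundling ~4 searches per user, as described above.
    search_requests = {
        "searches": [
            {"collection": "items", "q": "*", "filter_by": f"owner_id:={user_id}"},
            {"collection": "items", "q": "*", "filter_by": f"shared_with:={user_id}"},
            {"collection": "orders", "q": "*", "filter_by": f"user_id:={user_id}"},
            {"collection": "messages", "q": "*", "filter_by": f"user_id:={user_id}"},
        ]
    }
    results = client.multi_search.perform(search_requests, {"per_page": 10})
    # search_time_ms is reported per individual search in the response.
    return [r.get("search_time_ms") for r in results["results"]]
```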
k
I wonder if these backend tasks are competing for CPU with the real-time searches happening on the same cluster.
c
@Kishore Nallan current CPU status from the Typesense dashboard
@Kishore Nallan is it possible that one node (the black one) is always getting all the load, even though we're using load balancing?
k
Ah yes. I suspect that the IP returned by the load-balanced DNS is being cached, so all workers are using the same underlying host. You can try configuring the Python client to use individual hosts by shuffling them.
c
@Kishore Nallan what do you mean by "the python client to use individual hosts by shuffling them"? This is our Typesense config; shouldn't this handle load balancing and avoid one node processing almost everything?
k
This will pick only the first host.
Only if that fails will another be used
are we missing the `nearest_node` key?
f
The nearest node is the one that will be used by default, if it exists
k
The nearest node will help if multiple instances of the client are used. If a single instance is used, the underlying resolution of the IP could still be cached.
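For context, `nearest_node` is a standard key in the client configuration dict. A hedged sketch with placeholder hosts and key:

```python
# Sketch of a config that includes the nearest_node key (placeholder hosts).
# The client prefers nearest_node while it is healthy and only falls back to
# the nodes list on failure -- so a single long-lived client instance will
# still keep sending its traffic to the same host.
config = {
    "api_key": "xyz",
    "nearest_node": {"host": "lb.example.net", "port": "443", "protocol": "https"},
    "nodes": [
        {"host": "node-1.example.net", "port": "443", "protocol": "https"},
        {"host": "node-2.example.net", "port": "443", "protocol": "https"},
        {"host": "node-3.example.net", "port": "443", "protocol": "https"},
    ],
    "connection_timeout_seconds": 2,
}
```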
c
@Fanis Tharropoulos yea, that's what I thought, we removed it before
@Kishore Nallan so, is there a Typesense config that handles this and prevents a certain node from processing everything? I mean one that also handles the cached IP, or do we need to handle it manually by using individual hosts and shuffling them, so the workers avoid all using the same node?
k
You have to use individual hosts and shuffle them for each client.
c
@Kishore Nallan sorry, just want to clarify: are you referring to something like this? A separate client config for each node?
k
No. Instead of always having the -1, -2, -3 order, shuffle this order so that each client instance uses a different order, because the first host is the one picked and used by the client.
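In other words, something along these lines — a minimal sketch, assuming each worker builds its own client at startup (hosts and API key are placeholders):

```python
import random
import typesense

NODES = [
    {"host": "node-1.example.net", "port": "443", "protocol": "https"},
    {"host": "node-2.example.net", "port": "443", "protocol": "https"},
    {"host": "node-3.example.net", "port": "443", "protocol": "https"},
]

def make_client():
    # Shuffle a copy of the node list so each client instance starts with a
    # different first host; the client sticks to that host while it is healthy.
    nodes = NODES[:]
    random.shuffle(nodes)
    return typesense.Client({
        "api_key": "xyz",
        "nodes": nodes,
        "connection_timeout_seconds": 2,
    })
```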
@Fanis Tharropoulos maybe we need to allow the client to round robin the hosts as an option.
c
@Kishore Nallan I see. The problem now is that we only instantiate 1 client, at the start of the app, so all Typesense queries use that one instantiated client
k
Let's see if we can add a round robin rotation feature to the python client.
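Until such an option exists in the client itself, one possible application-side workaround (a sketch only, not an official client feature) is to keep a small pool of clients, each pinned to a different first node, and rotate over them per request — again with placeholder hosts and key:

```python
import itertools
import threading
import typesense

NODES = [
    {"host": "node-1.example.net", "port": "443", "protocol": "https"},
    {"host": "node-2.example.net", "port": "443", "protocol": "https"},
    {"host": "node-3.example.net", "port": "443", "protocol": "https"},
]

# One client per node, each listing a different node first, so each client
# sticks to a different host while that host is healthy.
_clients = [
    typesense.Client({
        "api_key": "xyz",
        "nodes": NODES[i:] + NODES[:i],
        "connection_timeout_seconds": 2,
    })
    for i in range(len(NODES))
]
_cycle = itertools.cycle(_clients)
_lock = threading.Lock()

def get_client():
    # Round-robin across the pool; the lock keeps next() safe across threads.
    with _lock:
        return next(_cycle)
```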
c
@Kishore Nallan sorry, as I continued testing I was able to replicate the slow queries. We have 3 nodes; I tested every node, and the query is very slow on only 1 node. Does this confirm that that specific node might be overloaded during the query, while the other 2 nodes are not?
k
Yes, the query performance will depend on other searches that are happening on the node.
c
@Kishore Nallan sorry for all the questions. We just implemented a round robin across the 3 nodes, but we're still having slow queries of 6 seconds on average. From our logs it seems that only 1 node is slow (not yet 100% sure on this one). Would you be able to check on your side whether our burst allowance (2 vCPUs, 4 hr burst per day) is being used up and reset?
the slow queries only happen when we run a scheduled job that iterates over around 6k users and runs a search for each one. Their filters are almost the same, but only some queries are slow; we're thinking this happens when a query hits the slow node. Maybe you have a way on your side to check the status of the nodes (burst per day, or other things that might cause this)?
k
Please share your cluster ID
c
@Kishore Nallan sent cluster id via pm
k
Until about 10-15 mins ago, 1 of the 3 nodes had high latency and CPU usage. Is this what you are referring to?