# community-help
Could you help me understand how the load balancing works in HA? I ran a load test against our production HA cluster this morning. While the search load test was running (you can see the searches-per-second metric climb to 5 and 3 on two of the nodes), I also triggered some write updates, and you can see the pending write count climb. Some questions:

1. It seems like the load balancing doesn't distribute searches to other nodes until the primary reaches 100% CPU. That means search response times are 3-4x higher than under no load before new nodes start serving searches.
   a. Is that expected, and is it configurable?
   b. I'd like the load balancer to serve from the closest node under low load, but as search response times climb, we're not load balancing soon enough (see the client-config sketch after this list).
2. When there are pending writes for a node, and that node is 1 of 2 nodes serving searches, does that mean users are getting different search results depending on where they've been routed?
3. In my mind, I would want/expect the search latency chart to show overall performance slowing gradually across all nodes. Are there things we can adjust to drive that? This is my expectation simply because fast user search is the primary use case for implementing Typesense.
   a. Currently we get 1 node spiking and then another node kicking in. The CPU, requests-per-second, and pending-write graphs all point to the HA setup leaning heavily on just 1 node.
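To make 1b concrete: as I understand it, the official Typesense clients can be configured with the load-balanced hostname as a `nearest_node` plus the individual nodes as fallbacks, so searches prefer the closest/LB endpoint and only fail over when it's unhealthy. A minimal Python sketch, with placeholder hostnames, API key, and a hypothetical `products` collection:

```python
import typesense

# All hostnames, the API key, and the collection name below are placeholders.
client = typesense.Client({
    "api_key": "xyz",
    # The load-balanced / nearest-node endpoint is tried first for every request.
    "nearest_node": {"host": "xxx.a1.typesense.net", "port": "443", "protocol": "https"},
    # Individual nodes act as fallbacks if the nearest node is unhealthy.
    "nodes": [
        {"host": "xxx-1.a1.typesense.net", "port": "443", "protocol": "https"},
        {"host": "xxx-2.a1.typesense.net", "port": "443", "protocol": "https"},
        {"host": "xxx-3.a1.typesense.net", "port": "443", "protocol": "https"},
    ],
    "connection_timeout_seconds": 2,
})

results = client.collections["products"].documents.search({
    "q": "shoes",
    "query_by": "name",
})
```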
Our write updates are a series of individual Update or Create calls. We aren't currently using the import option with action=upsert because it doesn't let us protect against updates arriving out of order. Currently we check a "lastUpdated" field to make sure an update doesn't overwrite a newer document, but Import doesn't support filter_by yet.
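For illustration only (the actual production code isn't shown in the thread), here is a rough sketch of what a `lastUpdated` guard around single-document writes might look like with the Python client. The collection and field names are assumptions, and the read-then-write is not atomic, so concurrent writers can still race:

```python
import typesense
from typesense.exceptions import ObjectNotFound

client = typesense.Client({
    "api_key": "xyz",  # placeholder
    "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
    "connection_timeout_seconds": 2,
})

def guarded_upsert(doc: dict) -> None:
    """Upsert `doc` only if it is at least as new as what is already indexed."""
    collection = client.collections["products"]  # hypothetical collection
    try:
        existing = collection.documents[doc["id"]].retrieve()
    except ObjectNotFound:
        existing = None

    # Skip the write if the indexed copy already carries a newer lastUpdated.
    if existing is None or doc["lastUpdated"] >= existing["lastUpdated"]:
        collection.documents.upsert(doc)

guarded_upsert({"id": "42", "name": "Widget", "lastUpdated": 1717171717})
```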
Here is the same test again (same search load and same write load), but instead of hitting the load-balanced endpoint for searches, I round-robin the load test across the 3 nodes randomly. The time scale is different, but you can see our response times are much better, pending writes never spike, and while CPU briefly pegs, it's not like the first test where 1 node spent most of the test at 100%.
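For reference, the random-node variant of the search side of that test could look roughly like this, using plain HTTP instead of the official client. Node hostnames, the API key, and the `products` collection are placeholders:

```python
import random
import requests

NODES = [
    "https://xxx-1.a1.typesense.net",
    "https://xxx-2.a1.typesense.net",
    "https://xxx-3.a1.typesense.net",
]
API_KEY = "search-only-api-key"  # placeholder

def search_once(query: str) -> dict:
    # Each request targets a randomly chosen node rather than the LB hostname,
    # so no single node's IP gets pinned by client-side DNS caching.
    base = random.choice(NODES)
    resp = requests.get(
        f"{base}/collections/products/documents/search",
        params={"q": query, "query_by": "name"},
        headers={"X-TYPESENSE-API-KEY": API_KEY},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()
```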
> It seems like the load balancing doesn't distribute searches to other nodes until the primary reaches 100% CPU.

This is an artifact of the load testing harness. We publish load balancing decisions via DNS changes (so it's a DNS-based load balancer and not a TCP load balancer), so if the load testing framework is caching DNS answers without honoring the published TTL, then that's exactly the behavior you'd see.
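One way to check whether a load-test harness is honoring those DNS updates is to watch what the load-balanced hostname resolves to while the test runs. A small diagnostic sketch (the hostname is a placeholder, and OS-level resolver caching can still mask changes):

```python
import socket
import time

HOSTNAME = "xxx.a1.typesense.net"  # placeholder load-balanced endpoint

last_ips = None
for _ in range(60):  # poll for roughly 5 minutes
    # getaddrinfo asks the resolver each time; a harness that re-resolves like
    # this (instead of caching the first answer) will follow DNS-based rebalancing.
    infos = socket.getaddrinfo(HOSTNAME, 443, proto=socket.IPPROTO_TCP)
    ips = sorted({info[4][0] for info in infos})
    if ips != last_ips:
        print(f"{time.strftime('%H:%M:%S')} {HOSTNAME} now resolves to {ips}")
        last_ips = ips
    time.sleep(5)
```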
> When there are pending writes for a node, and that node is 1 of 2 nodes serving searches, does that mean users are getting different search results depending on where they've been routed?

That's correct - different versions of the documents, technically.
> Currently we get 1 node spiking then another node kicking in. And the CPU, requests per second, and pending write graphs all point to the HA setting leaning heavily on just 1 node.
This is the same issue as the first statement.
Basically, right now you're saturating one node by sending all requests from your load testing harness to a single node's client-side-cached IP, and the next node is only used after a fallback. You want to try running your load testing harness from, say, 3 different machines to simulate a real-life scenario where requests come from various browsers or various app server instances.
On a side note, doing single-document check-and-update calls is going to be your biggest bottleneck, like I mentioned in the other thread. You want to resolve the final state of a document in your application DB and then sync a copy of that final state into Typesense, rather than doing that resolution within Typesense using single-document writes.
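A minimal sketch of that pattern, assuming the Python client, a hypothetical `products` collection, and that the application DB has already resolved each document's final state (so no per-document `lastUpdated` check is needed at index time):

```python
import typesense

client = typesense.Client({
    "api_key": "xyz",  # placeholder
    "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
    "connection_timeout_seconds": 5,
})

def sync_batch(final_docs: list[dict]) -> None:
    """Bulk-upsert documents whose final state was already resolved in the app DB."""
    # action=upsert overwrites whatever is currently indexed; ordering conflicts
    # were settled upstream, so there is no read-modify-write per document.
    results = client.collections["products"].documents.import_(
        final_docs, {"action": "upsert"}
    )
    # Recent Python client versions return one result dict per document.
    failures = [r for r in results if not r.get("success")]
    if failures:
        raise RuntimeError(f"{len(failures)} documents failed to import: {failures[:3]}")
```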