Could you help me understand how the load balancing works in HA? I ran a load test against our production HA cluster this morning. While the search load test was running (you can see the searches per second metric climb to 5 and 3 for two of the nodes), I also triggered some write updates, and you can see the pending write count climb.
Some questions:
1. It seems like the load balancing doesn't distribute searches to other nodes until the primary reaches 100% CPU. This means search response times are 3-4 times higher than under no load before additional nodes start serving searches.
a. Is that expected, and is that configurable?
b. I'd like the load balancer to route to the closest node under low load, but as search response times climb, we're not load balancing soon enough.
2. When there are pending writes for a node, and that node is 1 of 2 nodes serving searches, does that mean users are getting different search results depending on where they've been routed?
3. In my mind, I would want/expect the search latency chart to show performance degrading gradually across all nodes. Are there things we can adjust to drive that? This is my expectation because fast user search is the primary use case for implementing TS.
a. Currently we get 1 node spiking, then another node kicking in. The CPU, requests per second, and pending write graphs all point to the HA setting leaning heavily on just 1 node.