#community-help

Addressing Raised CPU Usage and System Instability

TLDR Todd raised a concern about increased CPU usage and system instability. Jason traced the problem to a change in write patterns that spiked CPU and exhausted RAM, and suggested an upgrade. The upgrade was subsequently applied and resolved the issue.


Aug 04, 2023 (4 months ago)
Todd
03:42 PM
Hey all, we’ve started maxing out our CPU usage somewhat consistently, and I know the fix is to auto-scale, but we’d really like to figure out where this increased load is coming from. Is there any way to look at system usage per operation or keep a running monitor? Basically, are there any tools or methods anyone is using to find out what’s blowing swap disk usage through the roof, or what might be causing CPU to spike?
Todd
03:44 PM
Also, thanks as always for the great tool
Jason
03:54 PM
Looking at your cluster’s metrics, it looks like your write patterns changed in the last few hours… which is what is eating up CPU.

And the volume of writes that came in also briefly exhausted RAM and caused the OS to kill the Typesense process and restart.

If you plan to add more data, I would recommend upgrading to 32GB RAM and either 4 vCPUs or maybe 8 vCPUs if needed.
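
For anyone who wants to keep a running monitor of their own, here is a minimal sketch that polls a node's `/metrics.json` endpoint and flags high CPU, high memory, or any swap use. The field names are the ones I recall from the Typesense cluster metrics response, so treat them as assumptions and check them against your server version. The thresholds and the helper names (`fetch_metrics`, `summarize`) are made up for illustration. Note that this gives node-level numbers, not a per-operation breakdown.

```python
import json
import urllib.request


def fetch_metrics(base_url, api_key):
    """GET /metrics.json from a Typesense node and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/metrics.json",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def summarize(metrics, cpu_limit=90.0, mem_limit=0.9):
    """Reduce a metrics payload to CPU %, memory ratio, swap bytes and alerts.

    The endpoint reports values as strings, so each one is converted to float.
    Field names are assumed; missing fields are treated as 0.
    """
    cpu = float(metrics.get("system_cpu_active_percentage", 0))
    used = float(metrics.get("system_memory_used_bytes", 0))
    total = float(metrics.get("system_memory_total_bytes", 0))
    swap = float(metrics.get("system_memory_used_swap_bytes", 0))
    mem_ratio = used / total if total else 0.0

    alerts = []
    if cpu >= cpu_limit:
        alerts.append("cpu")
    if mem_ratio >= mem_limit:
        alerts.append("memory")
    if swap > 0:
        alerts.append("swap")
    return {"cpu": cpu, "mem_ratio": mem_ratio, "swap_bytes": swap, "alerts": alerts}
```

Calling `summarize(fetch_metrics(url, key))` on a timer and logging the result next to your own write-job timestamps is usually enough to see which job lines up with a spike.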
Todd
04:11 PM
Jason - You are the goat (compliment). Thank you sir

Todd
04:12 PM
We have a big batch write process, and we talked about moving it to really low volume time and seeing if that gets us by. Thank you
Todd
04:14 PM
Also, just to be clear, goat is an acronym for Greatest Of All Time. I am trying to stay hip with the kids.

Jason
04:15 PM
Hahaha! Thank you! I’ve been reading the term goat on twitter recently and was too lazy to go look up urban dictionary, so thank you for educating me! 😂

Jason
04:17 PM
> https://typesense-community.slack.com/archives/C01P749MET0/p1691165564344679?thread_ts=1691163759.067149&cid=C01P749MET0
I would also recommend maybe pacing out the writes into smaller batches of, say, 5K records over a longer period of time.
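
A rough sketch of what pacing writes in 5K-record batches could look like. The `send` callable is a hypothetical stand-in for whatever actually writes a batch to the cluster, and the one-second pause is an arbitrary starting point to tune against your own CPU and RAM graphs, not a recommended value.

```python
import time
from itertools import islice


def batched(records, size=5000):
    """Yield successive lists of at most `size` records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def paced_import(records, send, size=5000, pause_s=1.0, sleep=time.sleep):
    """Send records in fixed-size batches, pausing between batches.

    `send` is a hypothetical callable that writes one batch.
    `sleep` is injectable so the pacing can be tested without waiting.
    Returns the total number of records sent.
    """
    sent = 0
    for i, chunk in enumerate(batched(records, size)):
        if i:
            # Pause before every batch except the first.
            sleep(pause_s)
        send(chunk)
        sent += len(chunk)
    return sent
```

With the official Python client, `send` could be something like `lambda chunk: client.collections["products"].documents.import_(chunk, {"action": "upsert"})`, where `products` is a placeholder collection name.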
Todd
04:21 PM
Absolutely. I’m discussing concurrency controls with the team now. Thank you as always. I can’t express that enough.

Jason
09:12 PM
Todd, your cluster is going through a crash-restart loop because it is running out of RAM. Upgrading would be the only way to recover it.
Jason
09:13 PM
We’d have to do this upgrade from our side, since the node is unstable. Let me know if you’re ok with proceeding with this
Todd
09:14 PM
Let me reach out to my boss real quick

Todd
09:20 PM
She’s not around currently, but I think we need these features. Please upgrade our node at your earliest convenience. Thank you.
Jason
09:20 PM
Queuing it up now
Todd
09:22 PM
Thank you very much. I’ll test once it’s live. We’ve got significant fixes around this, but we’re going to hold off deploying until Monday.

Jason
09:22 PM
The upgrade is running now.

Viji
09:25 PM
thank you Jason and Todd for saving the day!

Jason
09:38 PM
The upgrade just completed

Todd
11:15 PM
Jason - Sorry, forgot to update here. Everything looks great. Thank you
Todd
11:15 PM
We are stable again 😅
