# community-help
a
Hi, we saw a spike in CPU usage today, about 1 hour 30 minutes ago (at around 4:30 PM EST). There were no pending writes and no significant spike in searches (from 3-4 searches per second to a bit more than 5, and 10 on a specific node), yet CPU usage spiked to almost 100% (screenshots attached).
I can share our cluster id if it helps.
FYI we're running on v28.0.rc27 with 16 GB RAM and 4 vCPU.
Is there a way to figure out what happened there? Usually we see a spike in CPU usage when we're writing to Typesense, but that wasn't the case this time.
j
Looking at the more granular metrics we have internally, it looks like traffic increased by about 2.5x. That exhausted all CPU capacity, which in turn slowed searches down.
(Side note: we usually don't share metrics publicly in the Slack community, but I only shared it here since you had already posted screenshots from your dashboard view. Let me know if you'd want to delete all the metric screenshots)
a
No, that is okay. And thank you for looking into it 🙌 Also, we're seeing high CPU usage throughout the day today, and searches per second are around 4-6. And we have 4 vCPU, as I mentioned. Not sure if it's expected? (IIRC we didn't have high CPU usage previously, but I may be missing something.) What are some ways we can find out more about this, i.e. the reason behind the high CPU usage? (I think we only have access to the metrics we see on the dashboard?)
j
The graph you see on the dashboard aggregates over larger time windows as you change the duration dropdown. So for example, if you set the duration to 24 hrs, each point on the x-axis covers a 1 hr window, and if you set p95 as the aggregation function, then even if CPU reaches, say, 70% for only a bit more than 5% of that hour, the whole window will register as 70% on the graph due to the aggregation. Looking at more granular metrics (see screenshot), CPU only spikes when the search queries also spike to, say, 12 searches per second. So the CPU usage corresponds strongly to your search volume and to the fact that you're using vector search. On that note, could you make sure that you're using exclude_fields: <your_embedding_field> so the raw floating point values are not returned from the Typesense server? Returning them sometimes causes high CPU usage as well due to the high IO.
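For reference, here is a minimal sketch in Python of search parameters that exclude the embedding field. The field name embedding, the helper build_vector_search_params, and k=10 are placeholders for illustration and not taken from this thread; swap in your own field name and query vector.

```python
import json


def build_vector_search_params(query_vector, embedding_field="embedding", k=10):
    """Build Typesense search parameters for a vector query that keeps
    the raw embedding values out of the returned hits.

    `embedding_field` and `k` are hypothetical defaults for this sketch.
    """
    # Typesense expects the vector inline, e.g. "embedding:([0.1, 0.2], k:10)".
    vector_literal = ", ".join(str(x) for x in query_vector)
    return {
        "q": "*",
        "vector_query": f"{embedding_field}:([{vector_literal}], k:{k})",
        # Omit the embedding field from each hit's document in the response.
        "exclude_fields": embedding_field,
    }


params = build_vector_search_params([0.12, 0.45, 0.78])
print(json.dumps(params, indent=2))
```

The resulting dict would be passed as the search parameters to whichever Typesense client you use. Since real embeddings have hundreds of dimensions, the query string gets long, so sending it through the multi_search endpoint (a POST body) is likely the safer choice.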
a
That makes sense, I'll make sure that we exclude the embedding field (in case there are places where we aren't doing this). Thank you so much for the detailed response. 🙌 🙏