#community-help

Resolving API Service Unavailability With Upgrades and Monitoring

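TLDR Keith reported “Service Unavailable or Lagging” errors when sending to the API. Jason traced the issue to an early build whose validation issues stalled the write queue, and resolved it by upgrading the cluster to 0.24.1. Jason also suggested monitoring the HTTP response codes from Typesense for anomalies.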

Aug 07, 2023 (1 month ago)
Keith
04:41 PM
When sending to the API, we’re getting “Service Unavailable or Lagging”
Jason
04:43 PM
Looking now
Jason
04:45 PM
Looks like you’re on an early build that had a couple of validation issues that stalled the write queue. We’ve since fixed these in recent RC builds…
Keith
04:45 PM
Thanks!
Keith
04:46 PM
Can we do an in-place upgrade?
Jason
04:46 PM
Yup, I can queue that up for you, once we stabilize this node with the stalled write queue
Jason
04:46 PM
Working on it
Keith
04:46 PM
You’re the best

Jason
05:07 PM
Keith, you’re all set. Your cluster is now running 0.24.1, and that version doesn’t have this particular issue.
Keith
05:07 PM
Awesome! Thank you so much 🙏
Jason
05:07 PM
Happy to help!
Keith
07:30 PM
Do you know of any way we could put alerts around those stats?
Keith
07:30 PM
It went a week without us noticing, to be honest
Jason
07:34 PM
For this particular case, where writes were stalled but searches were working fine, I would recommend monitoring the HTTP response codes from Typesense and watching for any anomalies there…
Jason
07:34 PM
If both reads and writes were stalled, then our monitoring would have flagged this…
Keith
07:35 PM
Do we get notifications on those?
Keith
07:35 PM
Rather, are we supposed to?
Jason
07:35 PM
No, we receive pages on our side for anything that affects both reads and writes, and in most cases we’re able to resolve them ourselves, for example by upgrading capacity (when auto capacity scaling is enabled).
Jason
07:36 PM
In other cases, if we need additional info to stabilize the cluster, we then email customers
Keith
07:36 PM
Ah, gotcha, no problem. I’m sure I could whip up a monitoring tool
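
For anyone landing on this thread later, here is a minimal sketch of the kind of client-side monitoring discussed above. It assumes your application records the HTTP status code of every response it gets from Typesense, tagged as either a write or a search, and that 5xx codes count as failures. The class name, the "write"/"search" labels, and the thresholds are all hypothetical and not part of any Typesense API; tune them to your own traffic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StatusCodeMonitor:
    """Keeps a sliding window of HTTP status codes per operation kind
    and reports the kinds whose failure rate looks anomalous."""

    window: int = 100             # assumed: recent responses kept per kind
    max_error_rate: float = 0.05  # assumed: alert above 5% failures
    min_samples: int = 20         # assumed: ignore kinds with too little data
    _recent: dict = field(default_factory=dict)

    def record(self, kind: str, status_code: int) -> None:
        """Store one response code under a label such as "write" or "search"."""
        codes = self._recent.setdefault(kind, deque(maxlen=self.window))
        codes.append(status_code)

    def error_rate(self, kind: str) -> float:
        """Fraction of recorded responses for this kind that were 5xx."""
        codes = self._recent.get(kind, ())
        if not codes:
            return 0.0
        return sum(1 for code in codes if code >= 500) / len(codes)

    def anomalies(self) -> list[str]:
        """Kinds with enough samples and a failure rate above the threshold."""
        return [
            kind
            for kind, codes in self._recent.items()
            if len(codes) >= self.min_samples
            and self.error_rate(kind) > self.max_error_rate
        ]
```

Tracking writes and searches separately is the point here: in this incident searches kept succeeding while writes stalled, so a single combined error rate could have hidden the problem. In practice you would call `record("write", response.status_code)` wherever your code sends documents, call `anomalies()` on a timer, and forward any non-empty result to whatever alerting channel you already use.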