At 9:17AM CDT my production cluster bue2jx8qic7kst...
# community-help
a
At 9:17AM CDT my production cluster bue2jx8qic7kstwrp unexpectedly stopped working and was listed as ‘unhealthy’ by Typesense
k
Looking into it @Andrew Denta
a
Like I just had to spin up a new cluster and reindex everything
Suuuper scary, glad my customers are mostly on the west coast
k
Sorry about that. I will share the findings with you shortly. We have to improve the cloud UX to expose logs and other resolution actions from the UI.
The issue happened because there was a null value in a field defined as a string[]. In v0.19, we validate only the first entry in an array for the type. This has since been fixed on master and I will be happy to migrate your cluster to a stable 0.20 RC build if you like. Please remove null values from arrays for now as a workaround.
It was the second value in this case @Andrew Denta and that caused an issue.
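For the workaround of removing null values from arrays, here is a minimal Python sketch that cleans documents before they are sent for indexing. It does not call the Typesense client at all. The function name strip_nulls_from_arrays and the sample document (with a tags field) are hypothetical and only for illustration.

```python
from typing import Any


def strip_nulls_from_arrays(document: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the document with None entries removed from list fields.

    Hypothetical helper: it only touches top-level fields whose value is a list,
    and leaves every other field unchanged.
    """
    cleaned: dict[str, Any] = {}
    for field, value in document.items():
        if isinstance(value, list):
            # Drop nulls anywhere in the array, not only in the first position.
            cleaned[field] = [item for item in value if item is not None]
        else:
            cleaned[field] = value
    return cleaned


# Illustrative document: the null is the second entry, as in the case above.
doc = {"id": "1", "title": "Example", "tags": ["red", None, "blue"]}
clean_doc = strip_nulls_from_arrays(doc)
```

Running each record through a helper like this before indexing means a null in any position of an array never reaches the server, regardless of which version validates which entries.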
a
if you migrate my cluster to 0.20, I’ll need to update my client library, yeah?
This is really stressful, stuff is broken
again
j
@Andrew Denta No, you can use previous versions of the library with v0.20
👍 1
a
@Kishore Nallan @Jason Bosco like I have to maintain 99.9% uptime, and the clock is ticking. I don’t want to have to spin up another cluster, it sounds like that won’t actually do any good? If I get this right, someone can upgrade my existing cluster dez7hbjn35u89a1mp to a higher version, and the issue will go away?
Sweet, so you upgrade the cluster, and everything will just work?
j
Yup, I can upgrade dez7hbjn35u89a1mp.
a
❤️ plz and thank you
j
Oh wait, you'd need to reindex the data once I upgrade to v0.20, to get the bad data out of the logs.
a
that’s fine
only like 2k records
j
@Andrew Denta Ok to wipe the existing data from the cluster, yeah?
a
yup
j
You'd need to generate a new API key btw
a
noooooooo
that’s a deploy which will take like 20 minutes
k
We can just drop the collections if there aren’t many.
Jason you don’t need to delete the data
@Jason Bosco Just upgrade, the v0.20 should handle the bad data.
j
Oh cool, ok
Ok, cluster is running v0.20
a
🎺
ight, data is showing up on my customers’ dashboard
🎉 1
thanks guys
k
I apologize once again for this: we are working on automatic recovery from bad data so that bad records can be skipped over.