Production Cluster Failure and Solution
TLDR Andrew experienced an unexpected production cluster failure. Kishore Nallan and Jason traced the problem to a `null` value in a `string[]` field (v0.19 only type-checked the first array entry), remediated it, and upgraded the cluster to prevent future issues.
Apr 21, 2021
Andrew (03:09 PM)
Kishore Nallan (03:20 PM)
Andrew (03:20 PM)
Andrew (03:20 PM)
Kishore Nallan (03:21 PM)
Kishore Nallan (03:47 PM)
`null` value in a field defined as a `string[]`. In v0.19, we validate only the first entry in an array for the type. This has since been fixed on master, and I will be happy to migrate your cluster to a stable 0.20 RC build if you like. Please remove `null` values from arrays for now as a workaround.
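Kishore's workaround is to drop `null` entries from array fields before indexing. Below is a minimal Python sketch of that pre-processing step. It assumes documents are plain dicts and that the schema is a dict with a `fields` list of `{"name": ..., "type": ...}` entries, shaped like a Typesense collection schema. All helper names are hypothetical. The two validator functions only illustrate the difference between checking the first entry (the v0.19 behavior described in the message) and checking every entry; they are not Typesense's own code.

```python
from typing import Any


def array_fields(schema: dict[str, Any]) -> set[str]:
    """Names of fields whose declared type is an array, e.g. "string[]"."""
    return {f["name"] for f in schema.get("fields", []) if f["type"].endswith("[]")}


def strip_null_array_entries(doc: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of doc with None removed from every array-typed field."""
    cleaned = dict(doc)
    for name in array_fields(schema):
        value = cleaned.get(name)
        if isinstance(value, list):
            # Build a new list so the caller's original document is not mutated.
            cleaned[name] = [v for v in value if v is not None]
    return cleaned


def first_entry_is_string(values: list[Any]) -> bool:
    # Illustrates first-entry-only validation: a None later in the list slips through.
    return not values or isinstance(values[0], str)


def all_entries_are_strings(values: list[Any]) -> bool:
    # Stricter check: every element must be a string.
    return all(isinstance(v, str) for v in values)


if __name__ == "__main__":
    schema = {"fields": [{"name": "title", "type": "string"},
                         {"name": "tags", "type": "string[]"}]}
    doc = {"id": "1", "title": "Shoe", "tags": ["red", None, "sale"]}
    print(first_entry_is_string(doc["tags"]))    # True: only "red" is inspected
    print(all_entries_are_strings(doc["tags"]))  # False: None is caught
    print(strip_null_array_entries(doc, schema)["tags"])  # ['red', 'sale']
```

Running the cleanup on the client side keeps the documents valid under both the lenient and the strict check, so it remains harmless after an upgrade.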
Kishore Nallan (03:48 PM)
Andrew (04:04 PM)
Andrew (04:04 PM)
Andrew (04:04 PM)
Jason (04:06 PM)
Andrew (04:06 PM)
If I get this right, someone can upgrade my existing cluster `dez7hbjn35u89a1mp` to a higher version, and the issue will go away?
Andrew (04:06 PM)
Jason (04:06 PM)
`dez7hbjn35u89a1mp`.
Andrew (04:07 PM)
Jason (04:07 PM)
Andrew (04:07 PM)
Andrew (04:07 PM)
Jason (04:09 PM)
Andrew (04:09 PM)
Jason (04:09 PM)
Andrew (04:10 PM)
Andrew (04:10 PM)
Kishore Nallan (04:10 PM)
Kishore Nallan (04:10 PM)
Kishore Nallan (04:10 PM)
Jason (04:11 PM)
Jason (04:11 PM)
Andrew (04:12 PM)
Andrew (04:12 PM)
Andrew (04:12 PM)
Kishore Nallan (04:13 PM)
Similar Threads
Addressing Cluster Issue due to Excessive Data
Andrew had trouble with cluster operations due to excessive data and collections. Jason advised flushing the data and stated that the upcoming update will remedy such issues. Both agreed to stick to v0.19 and not to fill the cluster excessively.
Troubleshooting Unhealthy Cluster Issue
Sruli was unable to use their cluster. Jason suggested an update, which didn't solve the issue, and then traced the crashes to a large string. The fix required resetting the cluster state.
Typesense Bug Fix with `canceled_at` Field and Upgrade Concerns
Mateo reported an issue with how Typesense treated an optional field, which Jason confirmed as a bug. An attempted upgrade then produced an error; Jason explained that the bug stemmed from a recent change and downgraded their version. They also discussed protocols for future upgrades.
Resolving Unhealthy Typesense Cluster and JSON Parsing Bug
Masahiro reported an unhealthy Typesense cluster. The cause was a parsing bug related to boolean values in JSON schemas. Jason resolved the issue by clearing the node data and upgrading the server to v0.20, after which Masahiro's team decided to use Typesense.
Typesense Cluster Upgrade Issues and Solutions
Ken reported a system outage caused by Typesense cluster upgrade issues. Jason recommended upgrading to the next RAM tier and explained when the auto-upgrade feature takes effect. After the issue recurred, Jason added handling upgrades when disk space runs out to their backlog.