#community-help

Resolving Unhealthy Typesense Cluster and JSON Parsing Bug

TLDR: Masahiro reported an unhealthy Typesense cluster. The cause was a JSON parsing bug: the collection schema set optional to the string "true" instead of a boolean, which crashed the server. Jason recovered the node by clearing its data and later upgraded the cluster to v0.20, the release expected to carry the fix. Masahiro's team decided to use Typesense.

Apr 23, 2021 (34 months ago)
Masahiro
01:33 AM
My cluster (gxu1iajypedl2z0tp-1.a1.typesense.net) turned unhealthy.
Is there any way to fix this issue? If there is documentation about this, it would be helpful 😄
Jason
01:33 AM
Looking
Masahiro
01:34 AM
thanks!
Jason
01:39 AM
Could you share the exact collection schema you used?
Masahiro
01:40 AM
{
  'name': 'users',
  'fields': [
    {'name': 'displayName', 'type': 'string'},
    {'name': 'userId', 'type': 'string'},
    {'name': 'favGameTitle1', 'optional': 'true', 'type': 'string'},
    {'name': 'gender', 'optional': 'true', 'type': 'int32'}
  ],
  'default_sorting_field': 'gender'
}
Masahiro
01:41 AM
The cluster may have stopped working after I added the optional parameter.
Jason
01:42 AM
Ah, so the issue is that Typesense crashed because there's an uncaught exception in the JSON parsing around boolean values, in Typesense core.

We currently expect optional to have values of true or false. Looks like a string value of "true" for optional crashes the server 😞
Jason
01:44 AM
We'll fix the bug shortly. In the meantime, could you use bool values instead of string values for true and false in the schema?
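For illustration, a minimal Python sketch of this workaround, assuming the schema is built as a Python dict before it is sent to Typesense. The helper name non_boolean_optionals is hypothetical and not part of any Typesense client; it only flags fields whose optional value is not a real boolean, so the mistake can be caught before the schema reaches the server.

```python
import json


def non_boolean_optionals(schema):
    """Return the names of fields whose `optional` value is not a real boolean."""
    return [
        field["name"]
        for field in schema.get("fields", [])
        if "optional" in field and not isinstance(field["optional"], bool)
    ]


# Schema as posted in the thread: `optional` is the string 'true'.
posted_schema = {
    "name": "users",
    "fields": [
        {"name": "displayName", "type": "string"},
        {"name": "userId", "type": "string"},
        {"name": "favGameTitle1", "optional": "true", "type": "string"},
        {"name": "gender", "optional": "true", "type": "int32"},
    ],
    "default_sorting_field": "gender",
}

# Same schema with boolean values, following Jason's suggestion.
fixed_schema = {
    "name": "users",
    "fields": [
        {"name": "displayName", "type": "string"},
        {"name": "userId", "type": "string"},
        {"name": "favGameTitle1", "optional": True, "type": "string"},
        {"name": "gender", "optional": True, "type": "int32"},
    ],
    "default_sorting_field": "gender",
}

if __name__ == "__main__":
    print(non_boolean_optionals(posted_schema))  # fields that need fixing
    print(non_boolean_optionals(fixed_schema))   # empty list when all are booleans
    # Python's True serializes to the JSON literal true, not the string "true".
    print(json.dumps(fixed_schema["fields"][2]))
```

Running a check like this before creating the collection is one possible safeguard; the server-side fix discussed in this thread is separate from it.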
Masahiro
01:44 AM
Of course, ok!
Masahiro
01:44 AM
Thanks 😸
Jason
01:45 AM
Now to recover the node, we'd unfortunately need to clear the bad data in the server... Would that be ok?

Jason
01:45 AM
(Separately, we're also working on a way to skip over bad data in v0.20, so having to clear data when bad data makes it in is only an issue with v0.19.)

Jason
01:48 AM
> Now to recover the node, we'd unfortunately need to clear the bad data in the server... Would that be ok?
Could you confirm that you're ok with this, Masahiro?
Masahiro
01:50 AM
Unfortunately no, still unhealthy...
Jason
01:50 AM
Sorry, I'd need explicit confirmation that you're ok with me clearing the data on the node and then restarting the server
Jason
01:50 AM
Only then will it turn healthy
Jason
01:51 AM
Could you reply with an explicit OK to clear the data?
Masahiro
01:51 AM
Oh sorry, ok!
Jason
01:52 AM
Ok, clearing data now
Jason
01:54 AM
Alright, node should be back up now
Masahiro
01:57 AM
healthy now, thank you!
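For readers who want to confirm a node's status themselves, here is a hedged Python sketch that queries Typesense's /health endpoint, which reports {"ok": true} for a healthy node. The function names parse_health and node_is_healthy are hypothetical, and the sketch assumes the node is reachable over HTTPS on the default port, as a Typesense Cloud hostname like the one in this thread would be.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True only if the body is a JSON object with "ok": true."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def node_is_healthy(host: str, timeout: float = 5.0) -> bool:
    """Query https://<host>/health; treat any network or HTTP error as unhealthy."""
    url = f"https://{host}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_health(response.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Hostname taken from the thread; replace it with your own cluster's host.
    print(node_is_healthy("gxu1iajypedl2z0tp-1.a1.typesense.net"))
```

Treating every error as unhealthy keeps the check simple; a monitoring script might want to distinguish an unreachable host from a node that answers but reports it is not ok.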
Jason
01:59 AM
Sorry about that! Bug fix for the original issue should be out in v0.20 soon

Apr 29, 2021 (33 months ago)
Jason
07:08 AM
Hi Masahiro, I've upgraded this cluster to v0.20, now that it's out. It should be largely transparent to you!

Masahiro
07:12 AM
Thank you so much😆😆🎉
Masahiro
07:14 AM
And finally our team has decided to use Typesense😊
Jason
07:59 AM
Yaaay! 🎉
