#community-help

Fixing Errors on Typesense Cluster

TLDR Tugay is having issues with their Typesense cluster, and Jason is trying to diagnose the problem. They have ruled out a wiped data dir and are considering whether the errors could be caused by high concurrent writes during bulk migrations. They plan to test with Typesense version 0.24.0.rcn56.

Jan 12, 2023 (9 months ago)
Tugay
03:26 PM
Hi everyone, I am receiving the following error logs on my Typesense cluster. Does anyone know the root cause of these errors or have any suggestions on how to fix them?
Jason
03:27 PM
Seems like the data dir might have been wiped… so Typesense isn't able to fetch the document from disk
Jason
03:28 PM
Could you make sure you’re not using the /tmp directory for data dir?
Jason
03:28 PM
Because that gets wiped by the OS randomly
Tugay
03:28 PM
Checking it..
Tugay
03:29 PM
sudo docker run -i -d --restart always --network host --name typesense-server -p 8107:8107 -p 8108:8108 --ulimit nofile=50000:50000  -v /var/lib/typesense:/var/lib/typesense -v /var/log/typesense:/var/log/typesense -v /etc/typesense/:/etc/typesense/ typesense/typesense:0.23.1 --config /etc/typesense/typesense-server.ini
Tugay
03:29 PM
I am using the Typesense Docker image with the command above
Jason
03:30 PM
Could you share the contents of the config file?
Tugay
03:30 PM
Of course
Jason
03:30 PM
Could you also check if the Docker mount didn’t get lost somehow?
Tugay
03:31 PM
; Typesense Configuration

[server]

api-address = 0.0.0.0
api-port = 8108
peering-port = 8107
data-dir = /var/lib/typesense
api-key = ----
log-dir = /var/log/typesense
nodes = /etc/typesense/nodes.txt
log-slow-requests-time-ms = 1000
thread-pool-size = 64
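Jason's earlier question about /tmp can be checked against a config like the one above. The following is only a minimal sketch: the helper names `read_data_dir` and `is_under_tmp` are hypothetical, and `SAMPLE` is an abbreviated copy of the posted config, not the full file.

```python
import configparser
from pathlib import PurePosixPath


def read_data_dir(ini_text: str) -> str:
    """Return the data-dir value from the [server] section of an ini string."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser["server"]["data-dir"]


def is_under_tmp(path: str) -> bool:
    """True if the absolute POSIX path is /tmp itself or lives below it."""
    parts = PurePosixPath(path).parts
    return len(parts) >= 2 and parts[0] == "/" and parts[1] == "tmp"


# Abbreviated copy of the config shared in the thread (assumption: only the
# keys needed for this check are reproduced here).
SAMPLE = """\
; Typesense Configuration

[server]

api-address = 0.0.0.0
data-dir = /var/lib/typesense
"""

if __name__ == "__main__":
    data_dir = read_data_dir(SAMPLE)
    print(data_dir, "under /tmp:", is_under_tmp(data_dir))
```

This only inspects the configured path; it does not confirm that the Docker volume is actually mounted, which is what the follow-up check inside the container is for.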
Jason
03:31 PM
Could you exec into the container and check if the data dir is still inside /var/lib?
Tugay
03:32 PM
Of course, checking
Tugay
03:36 PM
Before that, I have another question: we have about 5k collections stored on our Typesense cluster, and we ran a migration for all collections in parallel, approximately 70-100 collections at a time. Could this have any side effects, such as some documents being imported into a different collection than the one specified? 😄
Tugay
03:40 PM
$ docker exec -it typesense-server du -sh /var/lib/typesense/*
2.7G    /var/lib/typesense/db
273M    /var/lib/typesense/meta
1012M    /var/lib/typesense/state
Tugay
03:40 PM
Yes, all data dirs are available
Jason
03:40 PM
Ok we can rule that out… May I know what version of Typesense this is on?
Jason
03:41 PM
I vaguely remember an issue like this from about two years ago…
Tugay
03:41 PM
0.23.1

Jason
03:42 PM
When you say migration, did you alter the schema in place or did you create a new set of collections with the new schema?
Tugay
03:42 PM
Mostly dropped and re-created the collections, or just reimported the data
Jason
03:43 PM
Another scenario where I've seen this happen is when there are high concurrent writes. During that time, some documents returned by a query will be available in memory but not yet on disk… Are you still able to replicate this issue after the writes have completed?
Tugay
03:45 PM
Nope, this error occurs only after running bulk migration
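If concurrent writes do turn out to be a factor, one way to test that theory would be to re-run the migration with a lower cap on how many collections are processed at once. The sketch below is an assumption-laden illustration, not something proposed in the thread: `migrate_with_limit` and the `migrate_one` callable are hypothetical names, and `migrate_one` stands in for whatever drop/re-create/import logic the migration actually uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def migrate_with_limit(
    names: Iterable[str],
    migrate_one: Callable[[str], None],
    max_parallel: int = 10,
) -> Dict[str, BaseException]:
    """Run migrate_one(name) for every collection name, with at most
    max_parallel migrations in flight at any time.

    Returns a mapping of collection name to the exception it raised,
    so failures can be inspected or retried afterwards.
    """
    failures: Dict[str, BaseException] = {}
    # The pool size is the concurrency cap: extra tasks wait in the queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(migrate_one, name): name for name in names}
        for future in as_completed(futures):
            error = future.exception()
            if error is not None:
                failures[futures[future]] = error
    return failures


if __name__ == "__main__":
    # Hypothetical stand-in for the real per-collection migration.
    def fake_migration(name: str) -> None:
        print("migrating", name)

    failed = migrate_with_limit(
        [f"collection_{i}" for i in range(5)], fake_migration, max_parallel=2
    )
    print("failures:", failed)
```

Comparing a run at the original 70-100 parallelism against a much smaller `max_parallel` would show whether the errors track write concurrency.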
Jason
04:54 PM
Could you try this on 0.24.0.rcn56 and see if you can replicate it?
Tugay
06:37 PM
Yes, I can, but is v0.24.0 ready for production?
Jason
06:59 PM
0.24.0.rcn56 is ready for production. If we find no other issues, that’s the build we hope to release as the GA version next week