Issues with Typesense and k8s Snapshot Restoration
TLDR Arnob lost all collections and hit snapshot-restore errors after increasing the resources of a Typesense deployment in k8s. Kishore Nallan explained that the corruption could have been caused by the pod being terminated prematurely. As a fix, he suggested deleting the data directory on the malfunctioning pod so that it is restored automatically from the leader.
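The suggested fix can be sketched as a small dry-run helper. This is a sketch under stated assumptions, not a procedure confirmed in the thread: the pod name `typesense-1`, the namespace `search`, and the helper names are hypothetical; the `/data` path is taken from the paths in the log; the container is assumed to have `sh`; and a healthy leader is assumed to exist so the wiped node has something to restore from.

```python
import shlex
import subprocess


def build_reset_commands(pod, namespace, data_dir="/data"):
    """Return the kubectl commands that wipe data_dir on a pod and restart it.

    Pod, namespace and data_dir are illustrative values supplied by the caller.
    """
    trimmed = data_dir.rstrip("/")
    # Refuse relative paths and "/" so the wipe cannot hit the filesystem root.
    if not data_dir.startswith("/") or trimmed == "":
        raise ValueError(f"refusing to wipe unsafe path: {data_dir!r}")
    wipe = f"rm -rf -- {shlex.quote(trimmed)}/*"
    return [
        # Step 1: remove the contents of the data directory inside the pod.
        ["kubectl", "exec", "-n", namespace, pod, "--", "sh", "-c", wipe],
        # Step 2: delete the pod so its controller recreates it with an empty directory.
        ["kubectl", "delete", "pod", "-n", namespace, pod],
    ]


def reset_node(pod, namespace, data_dir="/data", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_reset_commands(pod, namespace, data_dir):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: prints the commands without touching the cluster.
    reset_node("typesense-1", "search")
```

The default is a dry run because the wipe is destructive. It is worth confirming first that another node really holds the data: the log later in this thread contains "Since no --nodes argument is provided, starting a single node Typesense cluster", and a node running in that mode would have no leader to copy from.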
Aug 06, 2023
Arnob
08:08 AM I'm using Typesense in k8s. When I increased the k8s resources, it deleted all the collections and did not restore the snapshot.
Here is the error log:
E20230806 07:06:25.193410 1 store.h:68] Error while initializing store: Corruption: file is too short (1158 bytes) to be an sstable/data/db/000320.sst in file /data/db/MANIFEST-000095
E20230806 07:06:25.288295 128 store.h:68] Error while initializing store: Corruption: file is too short (1158 bytes) to be an sstable/data/db/000320.sst in file /data/db/MANIFEST-000095
E20230806 07:06:25.289043 128 raft_server.h:279] Met peering error {type=StateMachineError, error_code=-1, error_text=`StateMachine on_snapshot_load failed'}
E20230806 07:06:25.289099 115 snapshot_executor.cpp:393] Fail to load snapshot from local:///data/state/snapshot
E20230806 07:06:25.289158 115 node.cpp:557] node default_group:10.4.11.74:8107:8108 init_snapshot_storage failed
E20230806 07:06:25.289209 115 raft_server.cpp:126] Fail to init peering node
E20230806 07:06:25.289883 115 typesense_server_utils.cpp:276] Failed to start peering state
E20230806 07:12:41.508708 1 auth_manager.cpp:263] Scoped API keys can only be used for searches.
E20230806 07:12:41.919620 1 auth_manager.cpp:263] Scoped API keys can only be used for searches.
E20230806 07:12:44.110736 1 auth_manager.cpp:263] Scoped API keys can only be used for searches.
I20230806 07:06:25.151355 1 typesense_server_utils.cpp:331] Starting Typesense 0.25.0.rc45
I20230806 07:06:25.151386 1 typesense_server_utils.cpp:334] Typesense is using jemalloc.
I20230806 07:06:25.151721 1 typesense_server_utils.cpp:384] Thread pool size: 16
I20230806 07:06:25.166436 1 store.h:64] Initializing DB by opening state dir: /data/db
I20230806 07:06:25.193804 1 store.h:64] Initializing DB by opening state dir: /data/meta
I20230806 07:06:25.228266 1 ratelimit_manager.cpp:546] Loaded 0 rate limit rules.
I20230806 07:06:25.228302 1 ratelimit_manager.cpp:547] Loaded 0 rate limit bans.
I20230806 07:06:25.228466 1 typesense_server_utils.cpp:495] Starting API service...
I20230806 07:06:25.228753 115 typesense_server_utils.cpp:232] Since no --nodes argument is provided, starting a single node Typesense cluster.
I20230806 07:06:25.229040 1 http_server.cpp:178] Typesense has started listening on port 8108
I20230806 07:06:25.229212 116 batched_indexer.cpp:124] Starting batch indexer with 16 threads.
I20230806 07:06:25.248844 115 server.cpp:1107] Server[braft::RaftStatImpl+braft::FileServiceImpl+braft::RaftServiceImpl+braft::CliServiceImpl] is serving on port=8107.
I20230806 07:06:25.248876 115 server.cpp:1110] Check out in web browser.
I20230806 07:06:25.249287 115 raft_server.cpp:68] Nodes configuration: 10.4.11.74:8107:8108
I20230806 07:06:25.256830 115 log.cpp:690] Use murmurhash32 as the checksum type of appending entries
I20230806 07:06:25.257787 116 batched_indexer.cpp:129] BatchedIndexer skip_index: -9999
I20230806 07:06:25.263262 115 log.cpp:1172] log load_meta /data/state/log/log_meta first_log_index: 1 time: 6369
I20230806 07:06:25.263456 115 log.cpp:1112] load open segment, path: /data/state/log first_index: 1
I20230806 07:06:25.281968 128 raft_server.cpp:529] on_snapshot_load
I20230806 07:06:25.282536 128 store.h:299] rm /data/db success
I20230806 07:06:25.282799 128 store.h:309] copy snapshot /data/state/snapshot/snapshot_00000000000000000173/db_snapshot to /data/db success
I20230806 07:06:25.282831 128 store.h:64] Initializing DB by opening state dir: /data/db
W20230806 07:06:25.288667 128 store.h:319] Open DB /data/db failed, msg: Corruption: file is too short (1158 bytes) to be an sstable/data/db/000320.sst in file /data/db/MANIFEST-000095
I20230806 07:06:25.288950 128 snapshot_executor.cpp:264] node default_group:10.4.11.74:8107:8108 snapshot_load_done, last_included_index: 173 last_included_term: 16 peers: "10.4.8.106:8107:8108"
I20230806 07:06:25.289275 115 node.cpp:961] node default_group:10.4.11.74:8107:8108 shutdown, current_term 0 state UNINITIALIZED
W20230806 07:06:25.289113 128 node.cpp:1311] node default_group:10.4.11.74:8107:8108 got error={type=StateMachineError, error_code=-1, error_text=`StateMachine on_snapshot_load failed'}
I20230806 07:06:25.289458 128 raft_server.h:275] This node is down
I20230806 07:07:26.264640 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:08:27.271401 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:09:28.278858 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:10:29.286155 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:11:30.293216 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:12:31.301210 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:13:32.308427 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:14:33.315384 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:15:34.322649 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
I20230806 07:16:35.329157 116 batched_indexer.cpp:284] Running GC for aborted requests, req map size: 0
Attn: Kishore Nallan, Sai
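The log above lists the E lines before the I and W lines, which hides the order of events. Below is a minimal sketch that re-sorts such lines by timestamp, assuming they follow the usual glog layout (severity letter, date, time, thread id, `source:line]`, message). The names `LogEntry`, `parse_line` and `chronological` are illustrative and not part of Typesense; the sample lines are copied from the log above.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# glog-style prefix: <severity><yyyymmdd> <hh:mm:ss.uuuuuu> <thread> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<src>[^\]\s]+:\d+)\] (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    severity: str
    timestamp: datetime
    thread: int
    source: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for lines that are not log output."""
    m = GLOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{m['date']} {m['time']}", "%Y%m%d %H:%M:%S.%f")
    return LogEntry(m["sev"], ts, int(m["thread"]), m["src"], m["msg"])


def chronological(lines):
    """Return the parsed entries ordered by timestamp, skipping unparsable lines."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    return sorted(entries, key=lambda e: e.timestamp)


SAMPLE = [
    "E20230806 07:06:25.289043 128 raft_server.h:279] Met peering error "
    "{type=StateMachineError, error_code=-1, error_text=`StateMachine on_snapshot_load failed'}",
    "I20230806 07:06:25.281968 128 raft_server.cpp:529] on_snapshot_load",
    "I20230806 07:06:25.282536 128 store.h:299] rm /data/db success",
    "I20230806 07:06:25.282799 128 store.h:309] copy snapshot "
    "/data/state/snapshot/snapshot_00000000000000000173/db_snapshot to /data/db success",
    "W20230806 07:06:25.288667 128 store.h:319] Open DB /data/db failed, msg: Corruption: "
    "file is too short (1158 bytes) to be an sstable/data/db/000320.sst in file /data/db/MANIFEST-000095",
]

if __name__ == "__main__":
    for entry in chronological(SAMPLE):
        print(entry.severity, entry.timestamp.time(), entry.source, entry.message)
```

Read in time order, the thread 128 lines show the snapshot load starting, `/data/db` being removed and replaced by the snapshot copy, and the reopen then failing with the corruption error. That ordering is harder to see in the severity-grouped paste.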
Kishore Nallan
02:09 PM
Sai
03:18 PM
Kishore Nallan
03:19 PM
Sai
03:29 PM
Kishore Nallan
03:29 PM
Sai
03:50 PM
Kishore Nallan
04:13 PM
Aug 08, 2023
Arnob
09:29 AM