# community-help
e
Hi all! I've finally managed to sit down and get Typesense 27.1 running in a Kubernetes environment... what a mission! The pod starts up and runs for about a minute, then somehow gets a shutdown command and spends the next few minutes shutting down and cleaning up. It's apparently complaining about replicas; I've tried scaling them down to 1, but it still seems to be struggling. Where do I start even trying to debug this? Logs below:
I20250218 01:10:21.937121     1 typesense_server_utils.cpp:346] Starting Typesense 27.1
I20250218 01:10:21.937201     1 typesense_server_utils.cpp:349] Typesense is using jemalloc.
I20250218 01:10:21.938023     1 typesense_server_utils.cpp:411] Thread pool size: 32
I20250218 01:10:21.955042     1 store.cpp:40] Initializing DB by opening state dir: /typesense-data/db
I20250218 01:10:21.980901     1 store.cpp:40] Initializing DB by opening state dir: /typesense-data/meta
I20250218 01:10:22.004446     1 store.cpp:40] Initializing DB by opening state dir: /typesense-data
I20250218 01:10:22.033447     1 ratelimit_manager.cpp:546] Loaded 0 rate limit rules.
I20250218 01:10:22.033502     1 ratelimit_manager.cpp:547] Loaded 0 rate limit bans.
I20250218 01:10:22.034224     1 typesense_server_utils.cpp:556] Starting API service...
I20250218 01:10:22.034504   185 batched_indexer.cpp:190] Starting batch indexer with 32 threads.
I20250218 01:10:22.034579   187 typesense_server_utils.cpp:499] Conversation garbage collector thread started.
I20250218 01:10:22.034513     1 http_server.cpp:180] Typesense has started listening on port 8108
I20250218 01:10:22.034595   184 typesense_server_utils.cpp:248] Since no --nodes argument is provided, starting a single node Typesense cluster.
I20250218 01:10:22.036999   185 batched_indexer.cpp:195] BatchedIndexer skip_index: -9999
I20250218 01:10:22.046913   184 server.cpp:1181] Server[braft::RaftStatImpl+braft::FileServiceImpl+braft::RaftServiceImpl+braft::CliServiceImpl] is serving on port=8107.
I20250218 01:10:22.046988   184 server.cpp:1184] Check out http://<redacted>:8107 in web browser.
I20250218 01:10:22.047415   184 raft_server.cpp:69] Nodes configuration: 172.16.238.48:8107:8108
I20250218 01:10:22.047550   184 raft_server.cpp:112] Snapshot does not exist. We will remove db dir and init db fresh.
I20250218 01:10:22.048545   184 store.cpp:246] rm /typesense-data/db success
I20250218 01:10:22.049427   184 store.cpp:40] Initializing DB by opening state dir: /typesense-data/db
I20250218 01:10:22.094455   184 store.cpp:270] DB open success!
I20250218 01:10:22.094499   184 raft_server.cpp:619] Loading collections from disk...
I20250218 01:10:22.094527   184 collection_manager.cpp:288] CollectionManager::load()
I20250218 01:10:22.094568   184 auth_manager.cpp:35] Indexing 0 API key(s) found on disk.
I20250218 01:10:22.094594   184 collection_manager.cpp:324] Loading upto 16 collections in parallel, 1000 documents at a time.
I20250218 01:10:22.094610   184 collection_manager.cpp:333] Found 0 collection(s) on disk.
I20250218 01:10:22.095547   184 collection_manager.cpp:464] Loaded 0 collection(s).
I20250218 01:10:22.096078   184 raft_server.cpp:626] Finished loading collections from disk.
I20250218 01:10:22.096128   184 raft_server.cpp:637] Loaded 0conversation model(s).
I20250218 01:10:22.096140   184 raft_server.cpp:641] Initializing batched indexer from snapshot state...
I20250218 01:10:22.096887   184 log.cpp:690] Use murmurhash32 as the checksum type of appending entries
I20250218 01:10:22.097003   184 log.cpp:1172] log load_meta /typesense-data/state/log/log_meta first_log_index: 1 time: 82
I20250218 01:10:22.097061   184 log.cpp:1112] load open segment, path: /typesense-data/state/log first_index: 1
I20250218 01:10:22.097400   184 raft_meta.cpp:521] Loaded single stable meta, path /typesense-data/state/meta term 7 votedfor 172.16.238.48:8107:8108 time: 28
I20250218 01:10:22.097436   184 node.cpp:608] node default_group:172.16.238.48:8107:8108 init, term: 7 last_log_id: (index=6,term=7) conf: 172.16.238.48:8107:8108 old_conf: 
I20250218 01:10:22.097483   184 node.cpp:1645] node default_group:172.16.238.48:8107:8108 term 7 start vote and grant vote self
I20250218 01:10:22.102871   184 raft_meta.cpp:546] Saved single stable meta, path /typesense-data/state/meta term 8 votedfor 172.16.238.48:8107:8108 time: 4467
I20250218 01:10:22.102936   184 node.cpp:1899] node default_group:172.16.238.48:8107:8108 term 8 become leader of group 172.16.238.48:8107:8108 
I20250218 01:10:22.103024   184 raft_server.cpp:135] Node last_index: 6
I20250218 01:10:22.103040   184 typesense_server_utils.cpp:296] Typesense peering service is running on 172.16.238.48:8107
I20250218 01:10:22.103049   184 typesense_server_utils.cpp:297] Snapshot interval configured as: 3600s
I20250218 01:10:22.103055   184 typesense_server_utils.cpp:298] Snapshot max byte count configured as: 4194304
W20250218 01:10:22.103065   184 controller.cpp:1550] SIGINT was installed with 1
I20250218 01:10:22.103088   184 raft_server.cpp:706] Term: 8, pending_queue: 1, last_index: 6, committed: 0, known_applied: 0, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 0
W20250218 01:10:22.103101   184 node.cpp:843] [default_group:172.16.238.48:8107:8108 ] Refusing concurrent configuration changing
E20250218 01:10:22.103132   184 raft_server.cpp:762] Node not ready yet (known_applied_index is 0).
E20250218 01:10:22.103217   219 raft_server.h:62] Peer refresh failed, error: Doing another configuration change
I20250218 01:10:22.106943   217 raft_server.h:293] Configuration of this group is 172.16.238.48:8107:8108
I20250218 01:10:22.107028   217 raft_server.h:293] Configuration of this group is 172.16.238.48:8107:8108
I20250218 01:10:22.107049   217 raft_server.h:293] Configuration of this group is 172.16.238.48:8107:8108
I20250218 01:10:22.107065   217 raft_server.h:293] Configuration of this group is 172.16.238.48:8107:8108
I20250218 01:10:22.107079   217 raft_server.h:293] Configuration of this group is 172.16.238.48:8107:8108
I20250218 01:10:22.107098   217 raft_server.h:293] Configuration of this group is 172.16.238.48:8107:8108
I20250218 01:10:22.107110   217 raft_server.h:293] Configuration of this group is 172.16.238.48:8107:8108
I20250218 01:10:22.107124   217 node.cpp:3298] node default_group:172.16.238.48:8107:8108 reset ConfigurationCtx, new_peers: 172.16.238.48:8107:8108, old_peers: 172.16.238.48:8107:8108
I20250218 01:10:22.107139   217 raft_server.h:276] Node becomes leader, term: 8
I20250218 01:10:32.104281   184 raft_server.cpp:706] Term: 8, pending_queue: 0, last_index: 7, committed: 7, known_applied: 7, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 0
I20250218 01:10:32.104416   217 raft_server.h:60] Peer refresh succeeded!
I20250218 01:10:42.105536   184 raft_server.cpp:706] Term: 8, pending_queue: 0, last_index: 7, committed: 7, known_applied: 7, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 0
I20250218 01:10:42.105744   217 raft_server.h:60] Peer refresh succeeded!
I20250218 01:10:52.106731   184 raft_server.cpp:706] Term: 8, pending_queue: 0, last_index: 7, committed: 7, known_applied: 7, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 0
I20250218 01:10:52.106881   217 raft_server.h:60] Peer refresh succeeded!
I20250218 01:11:02.108397   184 raft_server.cpp:706] Term: 8, pending_queue: 0, last_index: 7, committed: 7, known_applied: 7, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 0
I20250218 01:11:02.108783   217 raft_server.h:60] Peer refresh succeeded!
I20250218 01:11:10.724798     1 typesense_server_utils.cpp:56] Stopping Typesense server...
I20250218 01:11:11.110502   184 typesense_server_utils.cpp:329] Typesense peering service is going to quit.
I20250218 01:11:11.110580   184 raft_server.cpp:986] Set shutting_down = true
I20250218 01:11:11.110589   184 raft_server.cpp:990] Waiting for in-flight writes to finish...
I20250218 01:11:11.110596   184 raft_server.cpp:996] Replication state shutdown, store sequence: 0
I20250218 01:11:11.110616   184 raft_server.cpp:1000] node->shutdown
I20250218 01:11:11.110626   184 node.cpp:961] node default_group:172.16.238.48:8107:8108 shutdown, current_term 8 state LEADER
I20250218 01:11:11.110750   184 replicator.cpp:1499] Group default_group Fail to find the next candidate
I20250218 01:11:11.110786   184 raft_server.cpp:1004] node->join
I20250218 01:11:11.110805   217 raft_server.h:281] Node stepped down : Raft node is going to quit.
I20250218 01:11:11.110854   217 raft_server.h:285] This node is down
I20250218 01:11:11.110978   184 node.cpp:961] node default_group:172.16.238.48:8107:8108 shutdown, current_term 8 state SHUTDOWN
I20250218 01:11:11.111166   184 typesense_server_utils.cpp:334] raft_server.stop()
I20250218 01:11:11.111189   184 server.cpp:1241] Server[braft::RaftStatImpl+braft::FileServiceImpl+braft::RaftServiceImpl+braft::CliServiceImpl] is going to quit
I20250218 01:11:11.111701   184 typesense_server_utils.cpp:337] raft_server.join()
I20250218 01:11:11.111865   184 typesense_server_utils.cpp:340] Typesense peering service has quit.
I20250218 01:11:11.112567   184 typesense_server_utils.cpp:520] Shutting down batch indexer...
I20250218 01:11:11.112587   184 typesense_server_utils.cpp:523] Waiting for batch indexing thread to be done...
I20250218 01:11:12.046155   185 batched_indexer.cpp:481] Notifying batch indexer threads about shutdown...
I20250218 01:11:12.046342   185 batched_indexer.cpp:487] Notifying reference sequence thread about shutdown...
I20250218 01:11:12.046532   185 batched_indexer.cpp:491] Batched indexer threadpool shutdown...
I20250218 01:11:12.048100   184 typesense_server_utils.cpp:526] Shutting down event sink thread...
I20250218 01:11:12.048161   184 typesense_server_utils.cpp:529] Waiting for event sink thread to be done...
I20250218 01:11:12.048365   184 typesense_server_utils.cpp:532] Shutting down conversation garbage collector thread...
I20250218 01:11:12.050161   184 typesense_server_utils.cpp:535] Waiting for conversation garbage collector thread to be done...
I20250218 01:11:12.050217   184 typesense_server_utils.cpp:538] Waiting for housekeeping thread to be done...
I20250218 01:11:12.050282   184 typesense_server_utils.cpp:542] Shutting down server_thread_pool
I20250218 01:11:12.051800   184 typesense_server_utils.cpp:546] Shutting down app_thread_pool.
I20250218 01:11:12.053494   184 typesense_server_utils.cpp:550] Shutting down replication_thread_pool.
I20250218 01:11:12.056151     1 typesense_server_utils.cpp:563] Typesense API service has quit.
I20250218 01:11:12.056715     1 typesense_server_utils.cpp:567] Deleting batch indexer
I20250218 01:11:12.056771     1 typesense_server_utils.cpp:571] CURL clean up
I20250218 01:11:12.056785     1 typesense_server_utils.cpp:575] Deleting server
I20250218 01:11:12.057505     1 typesense_server_utils.cpp:579] CollectionManager dispose, this might take some time...
I20250218 01:11:12.058318     1 typesense_server_utils.cpp:588] Bye.
Thanks for any assistance! I've set up Typesense on traditional VMs many times, but this is my first time doing it on K8s 👍
I tried increasing CPU and memory, but no change. This is the resource spec we carried over from our old Solr container:
resources:
    requests:
      cpu: "50m"
      memory: "576Mi"
    limits:
      cpu: "1000m"
      memory: "768Mi"
Changed to this:
resources:
    requests:
      cpu: "500m"
      memory: "1076Mi"
    limits:
      cpu: "1000m"
      memory: "1568Mi"
While I'm here... is this the best way to check for liveness and readiness?
command: ["/bin/sh", "-c", "curl --silent --fail -H 'x-typesense-api-key: ${TYPESENSE_API_KEY}' http://localhost:8108 > /dev/null"]
a
There's a `/health` endpoint which might be more appropriate for a liveness/readiness check. The Postman collection is a nice resource I've found very useful.
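If you want to keep the curl-based check above, a minimal sketch of pointing it at `/health` instead of the root URL (assuming `/health` answers on port 8108 and doesn't need the API key; the timing values are illustrative):

```yaml
# Sketch only: /health is assumed to respond without the x-typesense-api-key header.
# curl --fail exits non-zero on an HTTP error, which marks the probe as failed.
readinessProbe:
  exec:
    command: ["/bin/sh", "-c", "curl --silent --fail http://localhost:8108/health > /dev/null"]
  periodSeconds: 10
```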
👍 1
e
In case anybody else stumbles upon this, my intuition was on the right track. It wasn't actually anything to do with CPU or memory limits, though boosting them did help. The liveness and readiness checks were just guesses and weren't correct at all. After a minute or so, because the pod was neither live nor ready, it would be restarted, and that restart was what sent the SIGINT to the Typesense daemon. The fix was adding httpGet checks against `/health` to the `livenessProbe` and `readinessProbe`.
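For anyone wiring this up later, a minimal sketch of what those probes might look like, assuming the Typesense API is served on port 8108 inside the container; the delays and thresholds are illustrative and should be tuned to how long your node takes to start:

```yaml
# Sketch of the fix described above: httpGet probes against /health on port 8108.
# initialDelaySeconds / periodSeconds / failureThreshold are placeholder values.
livenessProbe:
  httpGet:
    path: /health
    port: 8108
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: 8108
  initialDelaySeconds: 10
  periodSeconds: 5
```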