# community-help
j
Hi everyone, when I restart Typesense it takes well over 3-4 hours for all collections to load and for the /health endpoint to return {"ok": true}. Any tips or ideas on how to speed up the collection loading process? Context: 4 collections:
• Collection 1 has 50,861 documents
• Collection 2 has 1,564,924 documents
• Collection 3 has ~6.6M documents
• Collection 4 has ~8M documents
a
Hey @John Sokol, this is a considerable number of documents, so it is not at all unusual for loading to take 3-4 hours. On restart, Typesense loads your collections from disk into RAM. How long that takes depends mainly on:
1. Your CPU capacity.
2. The shape of your data (less complex datasets load faster).
3. Whether you're using embeddings (in that case, a GPU might speed it up).
4. How many fields of your documents are declared in the schema (any field present in a document but not declared in the schema is not loaded into RAM).
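To illustrate point 4, here is a minimal sketch of a collection schema that declares only the fields that are searched, faceted or sorted on, plus a small helper that lists which fields of a document fall outside the schema. The collection name, field names and the `fields_not_loaded` helper are hypothetical, and the helper assumes the schema has no wildcard (`".*"`) auto-detection field.

```python
# Hypothetical collection schema: field names are illustrative, not from the thread.
# Only the fields you search, filter, facet or sort on are declared.
SCHEMA = {
    "name": "products",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "brand", "type": "string", "facet": True},
        {"name": "price", "type": "float"},
    ],
}


def fields_not_loaded(document: dict, schema: dict) -> set:
    """Return the document keys that have no matching field in the schema.

    Assumes the schema has no wildcard (".*") field, and treats "id" as
    built in, since Typesense documents carry an id without declaring it.
    """
    declared = {field["name"] for field in schema["fields"]}
    return set(document) - declared - {"id"}


doc = {
    "id": "1",
    "title": "Desk lamp",
    "brand": "Acme",
    "price": 19.99,
    "long_description": "Long text that is only ever displayed, never searched.",
}
print(sorted(fields_not_loaded(doc, SCHEMA)))  # ['long_description']
```

Moving large display-only fields like this out of the schema is one way to reduce the work done at startup, at the cost of not being able to search or filter on them.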
You can also tweak the following server parameters to see which values work best for you:
--num-collections-parallel-load
--num-documents-parallel-load
Ref: https://typesense.org/docs/29.0/api/server-configuration.html#resource-usage
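When comparing values for these flags, it helps to time every restart the same way. Below is a minimal standard-library sketch that polls the /health endpoint until it reports {"ok": true} and returns the elapsed time. It assumes the server listens on localhost:8108 (Typesense's default API port; adjust if yours differs); the helper names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def http_health_probe(url: str = "http://localhost:8108/health") -> bool:
    """Return True once the /health endpoint answers with {"ok": true}."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.loads(resp.read()).get("ok") is True
    except (urllib.error.URLError, OSError, ValueError):
        # Server not listening yet, a non-2xx status while collections are
        # still loading (raised as HTTPError), or an unparsable body.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Call probe() every interval_s seconds until it returns True.

    Returns the elapsed time in seconds. clock and sleep are injectable
    so the loop can be exercised without a running server.
    """
    start = clock()
    while not probe():
        sleep(interval_s)
    return clock() - start


# Usage, right after (re)starting typesense-server with the flags under test:
#   minutes = wait_until_healthy(http_health_probe) / 60
```

Running this once per flag combination gives directly comparable wall-clock numbers; the server log remains the place to see per-collection progress.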
j
Got it! Thanks, Alan. The 8M-document collection is the one that takes a long time; the other collections take 15 minutes to load. For the 8M-document collection, the average document size is 4,247.61 bytes. So num-documents-parallel-load is the parameter I should tweak to see whether performance improves.
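For a sense of scale, the two figures given here multiply out to roughly 34 GB of document data that has to be read back and indexed for that one collection. This is only a rough size of the stored documents, not an estimate of RAM usage, which depends on which fields the schema declares; "~8M" is also approximate.

```python
# Figures from the thread; "~8M" is approximate, so the result is a rough size.
NUM_DOCS = 8_000_000
AVG_DOC_BYTES = 4247.61

raw_bytes = NUM_DOCS * AVG_DOC_BYTES
print(f"~{raw_bytes / 1e9:.1f} GB ({raw_bytes / 2**30:.1f} GiB) of document data")
# ~34.0 GB (31.6 GiB) of document data
```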
t
This is normal. Check /var/log/typesense for progress. I have 2M documents in one collection on a dedicated box with 40 cores and 125 GB of RAM, and loading takes around 10 minutes per million documents.
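As a rough sanity check, that rate can be extrapolated linearly to the collection sizes from the original question. This sketch assumes load time scales linearly with document count and reuses the 10 min per million rate measured on the 40-core box; since the thread doesn't describe the hardware behind the 3-4 hour figure, the comparison is only indicative.

```python
# Document counts from the original question; collections 3 and 4 are approximate.
COLLECTIONS = {
    "Collection 1": 50_861,
    "Collection 2": 1_564_924,
    "Collection 3": 6_600_000,
    "Collection 4": 8_000_000,
}


def estimated_load_minutes(num_docs: int, minutes_per_million: float = 10.0) -> float:
    """Linear extrapolation of load time from a per-million-document rate.

    The 10 min/million default is the rate reported for a 40-core, 125 GB box;
    it depends on hardware and data shape, so measure your own from the logs.
    """
    return num_docs / 1_000_000 * minutes_per_million


estimates = {name: estimated_load_minutes(n) for name, n in COLLECTIONS.items()}
for name, minutes in estimates.items():
    print(f"{name}: ~{minutes:.0f} min")

# If collections load one at a time, the total is roughly the sum; if they load
# in parallel, the largest collection roughly sets the floor on wall-clock time.
print(f"sequential total: ~{sum(estimates.values()):.0f} min")
print(f"largest collection: ~{max(estimates.values()):.0f} min")
```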