Any chance of server startup ever being multi-thre...
# community-help
r
Any chance of server startup ever being multi-threaded? It feels so painful to watch 250GB of data being loaded in with just a single core at 100% and taking 15+ hours.
Looks like the very long duration is because it hasn't had a chance to do a snapshot yet. The instance I have that has snapshotted before takes about 3 hours to load instead of 15+ hours.
Even the snapshot load is still single-threaded, would be amazing if it could use more threads, but I can understand how that might be... tricky.
j
You can configure how many documents and collections are loaded in parallel using:
--num-collections-parallel-load
and
--num-documents-parallel-load
Documented here: https://typesense.org/docs/0.23.1/api/server-configuration.html#using-command-line-arguments
You could also improve parallelization (for both indexing and searching) by sharding the data across multiple collections, using some attribute as the shard key, if that is possible for your data.
k
You can shard your data across N collections to parallelize things.
r
Only have a single collection. I'm running with
--num-documents-parallel-load=10000
Thanks for the tip about going across multiple collections, I'll look into that!
k
There are limits to how much can be parallelized within a single collection, because there are hot spots and locks of various kinds. There is certainly room for improvement here, but for large datasets like this, sharding is the most reliable way to scale out because you can index N collections independently, with no interdependency between them.
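To illustrate the sharding idea, here is a rough Python sketch of routing documents to N collections by a shard key. Everything named here is an assumption for illustration: the base name `products`, the `user_id` field, the shard count of 4, and both helper functions are hypothetical and not part of Typesense. Each resulting batch would then be imported into its own collection with whatever client you already use.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count; pick what suits your core count


def shard_collection_name(base, shard_key, num_shards=NUM_SHARDS):
    """Map a shard key to one of `num_shards` collection names.

    crc32 is used instead of the built-in hash() because it gives the same
    result across processes and restarts, so a key always maps to the
    same collection.
    """
    shard = zlib.crc32(str(shard_key).encode("utf-8")) % num_shards
    return f"{base}_{shard}"


def group_by_shard(docs, key_field, base, num_shards=NUM_SHARDS):
    """Group documents into per-collection batches based on `key_field`."""
    batches = defaultdict(list)
    for doc in docs:
        name = shard_collection_name(base, doc[key_field], num_shards)
        batches[name].append(doc)
    return dict(batches)


if __name__ == "__main__":
    # Hypothetical documents keyed by user_id.
    docs = [{"id": str(i), "user_id": i} for i in range(10)]
    for name, batch in sorted(group_by_shard(docs, "user_id", "products").items()):
        print(name, len(batch))
```

A search that filters on the shard key only needs to hit one collection. A search across all data would have to query every shard and merge the results, so this trade-off is worth weighing before splitting.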