#community-help

Increasing Server Startup Speed Through Parallelization

TL;DR: Robert was experiencing long server startup times because data loading was single-threaded. Jason and Kishore Nallan suggested increasing parallelization by tuning the parallel-load settings and sharding the data across multiple collections.

Oct 20, 2022
Robert
10:14 PM
Any chance of server startup ever being multi-threaded? It feels so painful to watch 250GB of data being loaded in with just a single core at 100% and taking 15+ hours.
Robert
10:34 PM
It looks like the very long duration is because it hasn't had a chance to take a snapshot yet. An instance I have that has snapshotted before takes about 3 hours to load instead of 15+ hours.
Robert
10:35 PM
Even the snapshot load is still single-threaded. It would be amazing if it could use more threads, but I can understand how that might be... tricky.
Oct 21, 2022
Jason
12:32 AM
You can configure how many collections and documents are loaded in parallel using `--num-collections-parallel-load` and `--num-documents-parallel-load`.

Documented here: https://typesense.org/docs/0.23.1/api/server-configuration.html#using-command-line-arguments
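As a rough illustration, both flags are passed on the `typesense-server` command line. The Python sketch below only assembles the argument list. The data directory, the API key, and the value 4 for collections are placeholder assumptions, not recommendations. The value 10000 for documents is the one Robert reports using. Check the linked documentation for your version's defaults.

```python
import shlex


def build_typesense_args(data_dir: str, api_key: str,
                         collections_parallel: int,
                         documents_parallel: int) -> list[str]:
    """Assemble a typesense-server command line with the parallel-load flags.

    data_dir and api_key are placeholders; the parallel-load values are
    illustrative, not tuned recommendations.
    """
    if collections_parallel < 1 or documents_parallel < 1:
        raise ValueError("parallel-load values must be positive")
    return [
        "typesense-server",
        f"--data-dir={data_dir}",
        f"--api-key={api_key}",
        # Number of collections loaded concurrently during startup.
        f"--num-collections-parallel-load={collections_parallel}",
        # Number of documents per collection indexed in parallel during startup.
        f"--num-documents-parallel-load={documents_parallel}",
    ]


if __name__ == "__main__":
    args = build_typesense_args("/var/lib/typesense", "xyz", 4, 10000)
    # Print a shell-ready command line for inspection.
    print(shlex.join(args))
```

Passing the same list to `subprocess.run(args, check=True)` would start the server. As the rest of this thread points out, raising these values only helps up to the limits of a single collection.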
Jason
12:33 AM
You could also improve parallelization (for both indexing and searching) by sharding the data across multiple collections, using some attribute as the shard key.
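Below is a minimal sketch of routing documents to shards, assuming a hypothetical `country` field as the shard key and hypothetical collection names `products_0` through `products_{N-1}`. It uses CRC32 rather than Python's built-in `hash()`, because `hash()` of a string is randomized per process and would send the same key to different shards after a restart.

```python
import zlib
from collections import defaultdict


def shard_collection(shard_key: str, num_shards: int,
                     prefix: str = "products") -> str:
    """Map a shard-key value to one of num_shards collection names.

    zlib.crc32 is stable across processes, unlike the built-in hash() for str,
    so a given key always lands in the same (hypothetical) collection.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    index = zlib.crc32(shard_key.encode("utf-8")) % num_shards
    return f"{prefix}_{index}"


def partition(documents: list[dict], key_field: str,
              num_shards: int) -> dict[str, list[dict]]:
    """Group documents into per-collection batches by the value of key_field."""
    batches: dict[str, list[dict]] = defaultdict(list)
    for doc in documents:
        batches[shard_collection(str(doc[key_field]), num_shards)].append(doc)
    return dict(batches)


if __name__ == "__main__":
    docs = [
        {"id": "1", "country": "DE", "name": "a"},
        {"id": "2", "country": "US", "name": "b"},
        {"id": "3", "country": "DE", "name": "c"},
    ]
    for collection, batch in sorted(partition(docs, "country", 4).items()):
        print(collection, [d["id"] for d in batch])
```

All documents with the same key value land in one collection, so a query filtered on that key only needs to hit one shard. Queries spanning every shard would have to fan out across the N collections (for example, with Typesense's multi-search endpoint) and merge the results on the client. A heavily skewed key, such as one country holding most documents, produces unbalanced shards.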
Kishore Nallan
12:37 AM
You can shard your data across N collections to parallelize things.
Robert
12:38 AM
I only have a single collection. I'm running with `--num-documents-parallel-load=10000`.
Robert
12:38 AM
Thanks for the tip about sharding across multiple collections. I'll look into that!
Kishore Nallan
12:40 AM
There are limits to how much work can be parallelized within a single collection, because there are hot spots and locks of various kinds. There is certainly room for improvement there, but for datasets this large, sharding is the most reliable way to scale out, because N collections can be indexed independently with no interdependency.
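This sketch illustrates the point that N collections can be indexed without interdependency. It imports each shard's batch concurrently on a thread pool. The `import_batch` callable is a hypothetical stand-in supplied by the caller. With the official Python client it might wrap something like `client.collections[name].documents.import_(docs, {"action": "upsert"})`, but treat that wiring as an assumption and check the client documentation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

# import_batch(collection_name, documents) -> number of documents imported.
# Hypothetical signature; adapt it to whatever client you actually use.
ImportFn = Callable[[str, list[dict]], int]


def import_shards_in_parallel(batches: dict[str, list[dict]],
                             import_batch: ImportFn,
                             max_workers: int = 4) -> dict[str, int]:
    """Import each collection's batch on its own worker thread.

    The batches target different collections, so there is no ordering
    dependency between them. An exception raised by any import is
    re-raised here via future.result().
    """
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(import_batch, name, docs): name
                   for name, docs in batches.items()}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    def fake_import(name: str, docs: list[dict]) -> int:
        # Stand-in for a real client call; it only reports the batch size.
        return len(docs)

    batches = {"products_0": [{"id": "1"}],
               "products_1": [{"id": "2"}, {"id": "3"}]}
    print(import_shards_in_parallel(batches, fake_import))
```

Threads are a reasonable fit here because each import mostly waits on HTTP I/O rather than doing Python CPU work. The actual indexing runs inside the Typesense server.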

Similar Threads

Speeding Up Typesense Startup Time

Jessica experienced lengthy Typesense startup times. Kishore Nallan offered strategies for reducing indexing time, advising an increase to `thread-pool-size` and an adjustment of `num-collections-parallel-load`. They also offered to review the schema for further optimization.

Loading Specific Collections and Deleting Part of Collection in Typesense

Chetan inquired about loading specific collections and deleting part of a collection in Typesense. Jason advised that this was not possible, but suggested trying the `--num-collections-parallel-load` and `--num-documents-parallel-load` options for future reference.

Optimizing Typesense Implementation for Large Collections

Oskar faced performance issues with his document collection in Typesense due to filter additions. Jason suggested trying a newer Typesense build and potentially partitioning the data into country-wise collections. They also discussed reducing network latency with CDN solutions.

Addressing Cluster Issue due to Excessive Data

Andrew had trouble with cluster operations due to excessive data and collections. Jason advised flushing the data and stated that the upcoming update would remedy such issues. Both agreed to stick to v0.19 and not to fill the cluster excessively.

Addressing Typesense Server Issues and Optimization Needs

Robert had an issue with a "stuck" Typesense server. Jason and Kishore Nallan gave advice on handling writes, configuring for high search volumes, and running multiple Typesense instances. They also recommended monitoring CPU usage and updating the server version for bug fixes.
