Jason Bosco
08/23/2024, 4:14 PM

> You wouldn't be able to hold that much data in memory unless you can split it in machines of 32 gb to 64 gb. That's the hard limit we reached

Quick note to clarify this - we have users running Typesense with several hundred GBs of RAM, so this is not a hard limit within Typesense. As long as you have sufficient RAM to hold your data in memory and enough CPU to handle the indexing and searching, Typesense can handle more data.
> The limit is that it takes an hour to rebuild the indexes and that affects also restarts, etc...

Typesense does rebuild indices on restart, and the amount of time that takes depends on the number of CPU cores you have and the configuration of num-collections-parallel-load and num-documents-parallel-load.
So for 100s of millions of rows, it could take a few hours to rebuild the indices.
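For reference, here's a minimal sketch of how those flags can be passed when starting typesense-server. The values below are illustrative, not recommendations - the right numbers depend on your core count and how your data is spread across collections:

```
# Start Typesense with explicit load-parallelism settings so index
# rebuilds on restart make use of the available CPU cores.
# (Illustrative values - tune to your own hardware.)
typesense-server \
  --data-dir /var/lib/typesense \
  --api-key=$TYPESENSE_API_KEY \
  --num-collections-parallel-load=4 \
  --num-documents-parallel-load=1000
```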
This is a conscious design decision we made in order to keep version upgrades seamless - an upgrade is just a restart of the process, and Typesense will reindex using any new data structures that might have changed internally.
> So if there's any issue, you risk having a very long downtime

To avoid this downtime in production, you'd want to run a clustered, Highly Available setup with multiple nodes, and rotate only one node at a time, waiting for it to come back before rotating the other nodes. This way the cluster can still accept reads / writes on the other two nodes while the 3rd node is being rotated and is rebuilding its indices.
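As a rough sketch of what that rotation can look like, assuming a 3-node cluster and Typesense's /health endpoint (which returns {"ok":true} only once the node has finished loading) - the hostnames and restart command here are placeholders:

```
# Rolling restart: rotate one node at a time, and wait for each node
# to finish rebuilding its indices before moving on to the next one.
# (Hostnames are placeholders - adapt to your own setup.)
for node in ts-node-1 ts-node-2 ts-node-3; do
  ssh "$node" 'sudo systemctl restart typesense-server'

  # /health returns a 503 while indices are still loading, so keep
  # polling until the node reports {"ok":true}.
  until curl -sf "http://$node:8108/health" | grep -q '"ok":true'; do
    sleep 10
  done
done
```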