Boubacar BARRY - 08/23/2024, 7:33 AM
Óscar Vicente - 08/23/2024, 7:51 AM
Boubacar BARRY - 08/23/2024, 11:24 AM
Óscar Vicente - 08/23/2024, 11:25 AM
Óscar Vicente - 08/23/2024, 11:25 AM
Óscar Vicente - 08/23/2024, 11:26 AM
Óscar Vicente - 08/23/2024, 11:26 AM
Boubacar BARRY - 08/23/2024, 11:27 AM
Óscar Vicente - 08/23/2024, 11:31 AM
Óscar Vicente - 08/23/2024, 11:31 AM
Boubacar BARRY - 08/23/2024, 11:35 AM
Óscar Vicente - 08/23/2024, 12:03 PM
Óscar Vicente - 08/23/2024, 12:03 PM
Boubacar BARRY - 08/23/2024, 1:07 PM
Óscar Vicente - 08/23/2024, 1:34 PM
Óscar Vicente - 08/23/2024, 1:39 PM
Boubacar BARRY - 08/23/2024, 2:26 PM
Boubacar BARRY - 08/23/2024, 2:31 PM
Óscar Vicente - 08/23/2024, 3:42 PM
Óscar Vicente - 08/23/2024, 3:43 PM
Jason Bosco
08/23/2024, 4:14 PM
> You wouldn't be able to hold that much data in memory unless you can split it in machines of 32 gb to 64 gb. That's the hard limit we reached
Quick note to clarify this - we have users running Typesense with several hundred GBs of RAM, so this is not a hard limit within Typesense. As long as you have enough RAM to hold your data in memory and enough CPU to handle the indices and searching, Typesense can handle more data.
> The limit is that it takes an hour to rebuild the indexes and that affects also restarts, etc...
Typesense does rebuild indices on restarts, and the amount of time that takes depends on the number of CPU cores you have and on the configuration of num-collections-parallel-load and num-documents-parallel-load.
So for 100s of millions of rows, it could take a few hours to rebuild the indices.
This is a conscious design decision we made in order to keep version upgrades seamless - an upgrade is just a restart of the process, and Typesense will reindex using any new data structures that might have changed internally.
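For context, those two load-parallelism settings are passed to typesense-server at startup. The sketch below is a config fragment with illustrative values, not recommendations; tune them to your core count and dataset, and note that the data-dir path and API-key variable here are placeholders:

```shell
# Illustrative startup invocation (values are examples, not recommendations):
# load up to 4 collections concurrently, and load documents in batches of
# 10,000 per collection while rebuilding indices on startup.
typesense-server \
  --data-dir /var/lib/typesense \
  --api-key "$TYPESENSE_API_KEY" \
  --num-collections-parallel-load 4 \
  --num-documents-parallel-load 10000
```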
> So if there's any issue, you risk having a very long downtime
To avoid this downtime in production, you'd want to run a clustered, Highly Available setup with multiple nodes and rotate only one node at a time, waiting for it to come back before rotating the next. That way the cluster can still accept reads and writes on the other two nodes while the third node is being rotated and is rebuilding its indices.
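The rotate-one-node-at-a-time procedure described above can be sketched as follows. This is a hypothetical helper, not part of any Typesense client library: it assumes each node exposes Typesense's /health endpoint on port 8108 (which returns JSON like {"ok": true}), and that upgrade_node is supplied by your own infrastructure tooling.

```python
import json
import time
import urllib.request


def is_healthy(body: str) -> bool:
    """Parse a Typesense /health response body: {"ok": true} means healthy."""
    try:
        return json.loads(body).get("ok") is True
    except json.JSONDecodeError:
        return False


def wait_until_healthy(node: str, timeout_s: int = 4 * 3600, poll_s: int = 30) -> None:
    """Poll http://<node>:8108/health until the node reports ok, or time out.

    Rebuilding indices for a large dataset can take hours, hence the
    generous default timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"http://{node}:8108/health", timeout=5) as resp:
                if is_healthy(resp.read().decode()):
                    return
        except OSError:
            pass  # node is restarting / rebuilding indices; keep waiting
        time.sleep(poll_s)
    raise TimeoutError(f"{node} did not become healthy in time")


def rolling_upgrade(nodes: list[str], upgrade_node) -> None:
    """Rotate one node at a time so the others keep serving traffic."""
    for node in nodes:
        upgrade_node(node)        # your own tooling: stop, upgrade, restart
        wait_until_healthy(node)  # wait for the index rebuild before moving on
```

Because only one node is ever down, the remaining nodes keep serving reads and writes for the whole rotation.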
Jason Bosco - 08/23/2024, 4:23 PM
Óscar Vicente - 08/23/2024, 4:53 PM
Óscar Vicente - 08/23/2024, 4:53 PM
Jason Bosco
08/27/2024, 2:21 AM
> The more, smaller collections the better
In general, yes - this is better.
> You won't have HA while upgrading... Or you'll have to pay for another node while the process happens
When you set up HA, you'll spin up 3 nodes and run them 24x7. Then, when you upgrade, you'll rotate one node at a time. So even if a single node takes a few hours to reindex a large dataset, the other two nodes will continue serving traffic while the third one reindexes, so there won't be any downtime when upgrading in an HA setup. Once all 3 nodes are stable again, you'd rotate the 2nd node, wait for it to reindex, and then rotate the 3rd node. During this whole operation the cluster will still be healthy and serving requests from the other two nodes. Besides that point, yes - larger collections take more time for schema changes, reindexing, and syncing between nodes during rotations. So it's best to keep collections smaller when possible.
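One way to keep collections small, as suggested above, is to partition a large dataset into several collections along a natural axis. Typesense doesn't do this for you; the per-month naming scheme below is purely illustrative, and the base name is a placeholder:

```python
from datetime import date


def collection_for(base: str, d: date) -> str:
    """Route a document to a per-month collection, e.g. orders_2024_08.

    The naming scheme is illustrative; pick whatever partitioning
    (per tenant, per month, per region) matches your query patterns.
    """
    return f"{base}_{d.year:04d}_{d.month:02d}"


# At index time, write each document into its partition's collection.
# At query time, Typesense's multi_search endpoint can fan a query out
# across just the partitions that query actually needs.
print(collection_for("orders", date(2024, 8, 23)))
```

Smaller partitions also shrink the blast radius of schema changes: altering one month's collection reindexes only that slice of the data.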
Óscar Vicente - 08/27/2024, 10:01 AM