Large Collections vs Smaller Collections Performance
TLDR Ricardo asked about the benefits of having either one large collection or multiple smaller collections. Jason recommended sharding data into multiple smaller collections for better performance.

Jun 22, 2023 (3 months ago)
Ricardo
05:15 AM
so 1 collection -> 500k entries
many collections -> 1k -> 20k documents
We mostly search through 1 collection at a time. If we were to split it up into many collections, would the search gains be worth the overhead of managing the collections? Re-indexing is easier if we have many collections.
Ricardo
05:16 AM
Jason
03:59 PM
In your case, I would recommend sharding the data into multiple collections (many collections -> 1k -> 20k documents), because in general, the smaller the collection, the better the performance you can expect.
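A minimal sketch of what splitting one large collection into many smaller ones could look like on the application side. Nothing here comes from the thread itself: the `customer_id` partition field, the `entries_` name prefix, and both helper functions are hypothetical, and the name sanitizing is an illustrative choice rather than a documented Typesense rule. The sketch only decides which collection each document belongs to; it does not call Typesense.

```python
from collections import defaultdict


def collection_name_for(partition_key: str, prefix: str = "entries") -> str:
    """Map a partition key to the name of its own smaller collection.

    The prefix and the sanitizing rule are assumptions made for this sketch.
    """
    # Replace anything that is not a letter or digit with an underscore.
    safe = "".join(ch if ch.isalnum() else "_" for ch in partition_key.lower())
    return f"{prefix}_{safe}"


def group_by_collection(documents, key_field="customer_id"):
    """Group documents by the collection they would be indexed into."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[collection_name_for(str(doc[key_field]))].append(doc)
    return dict(grouped)


# Hypothetical documents; "customer_id" is the assumed partition field.
docs = [
    {"id": "1", "customer_id": "acme", "title": "First"},
    {"id": "2", "customer_id": "globex", "title": "Second"},
    {"id": "3", "customer_id": "acme", "title": "Third"},
]

batches = group_by_collection(docs)
for name, batch in sorted(batches.items()):
    print(name, len(batch))
```

Each batch would then be imported into the collection of the same name, and a search would target only the one collection it needs. Re-indexing a single partition then means rebuilding one small collection instead of the full 500k entries.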
Jun 23, 2023 (3 months ago)
Ricardo
04:33 AM