Estimating RAM Requirements for Indexing Documents
TLDR Epi asked how index size relates to document size and how much RAM their dataset would need. Kishore Nallan suggested indexing a sample and extrapolating from the results, and confirmed that Typesense is suitable for indexing large documents such as Wikipedia articles.
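The sample-and-extrapolate approach from the summary can be sketched roughly as follows. This is a minimal sketch, not something posted in the thread: it assumes memory grows about linearly with document count, and the function name `extrapolate_ram_bytes`, its parameters, and the sample figures are all hypothetical placeholders rather than measurements.

```python
GIB = 1024 ** 3


def extrapolate_ram_bytes(sample_docs, sample_ram_bytes, target_docs,
                          baseline_ram_bytes=0):
    """Linearly scale RAM measured for an indexed sample up to target_docs.

    baseline_ram_bytes is the memory the server used before any documents
    were indexed (assumed constant, so it is not scaled).
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    index_ram = sample_ram_bytes - baseline_ram_bytes
    return baseline_ram_bytes + index_ram * target_docs / sample_docs


# Placeholder numbers for illustration only: replace them with the memory
# actually observed after indexing a representative sample.
sample_docs = 100_000
sample_ram = 2 * GIB

for target in (10_000_000, 100_000_000):
    estimate = extrapolate_ram_bytes(sample_docs, sample_ram, target)
    print(f"{target:,} docs: ~{estimate / GIB:,.0f} GiB")
```

The estimate is only as good as the sample: with documents ranging from 5k to 50k tokens, the sample should cover that same spread of lengths, and the linear-growth assumption is worth checking by indexing two samples of different sizes.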
Dec 04, 2022 (9 months ago)
I'm trying to figure out the RAM requirements for my dataset, which is, for example, 10M-100M documents ranging from 5k to 50k tokens per document.
Kishore Nallan 11:28 AM
Dec 05, 2022 (9 months ago)
Dec 06, 2022 (9 months ago)
Kishore Nallan 02:55 AM
Discussing Document Indexing Speeds and Typesense Features
Thomas asks about the speed of indexing and associated factors. The conversation reveals that larger batch sizes and NVMe disk usage can improve speed, but the index size is limited by RAM. Jason shares plans on supporting nested fields, and they explore a solution for products in multiple categories and catalogs.
Handling Large Document Indexing in Typesense
Anish asked about handling large documents in Typesense, then found their answer within a linked thread.
Optimizing Bulk Indexing and Reducing RAM Usage in Typesense
Timon experienced issues with Typesense becoming unresponsive during bulk indexing and sought advice. Jason recommended larger import requests and adjusting the client-side timeout allowance, revealing a need to increase RAM allocation for Docker. Kishore Nallan undertook to find ways to optimize memory usage, particularly for geopoint indexing.