Reducing Memory Usage in Large Dataset Indexing
TLDR: Alan asked for an article on reducing memory usage when indexing large datasets. Jason provided general tips without sharing an article.
May 11, 2023
Alan
03:57 PM
Jason
04:52 PM
• Faceting adds additional memory overhead, so be sure to only turn it on for fields you're actually using in facet_by
• Enabling string sorting on a field adds memory overhead
• Enabling infix on a field adds memory overhead
• Make sure you only specify the fields you're searching / filtering / faceting / grouping / sorting on in the schema. You don't have to list every field in your documents in the collection schema, even though you can send additional fields when indexing a document. Those additional fields will be stored on disk and returned when the document is a hit, but they won't count towards memory consumption (see the sketch below)
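Putting those tips together, here's a minimal sketch of a lean collection schema using the typesense Python client. The collection name, field names, and connection details are hypothetical; the point is that only the fields you actually search, filter, facet, or sort on are declared, and facet, sort, and infix are enabled selectively.

```python
import typesense

# Hypothetical connection details, shown only for illustration
client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': 'YOUR_API_KEY',
    'connection_timeout_seconds': 10,
})

# Only declare fields that are searched / filtered / faceted / grouped / sorted on.
# Any other fields sent with the documents are stored on disk and returned in hits,
# but are not indexed in memory.
client.collections.create({
    'name': 'products',  # hypothetical collection
    'fields': [
        {'name': 'title', 'type': 'string'},                  # searched only
        {'name': 'brand', 'type': 'string', 'facet': True},   # used in facet_by
        {'name': 'sku', 'type': 'string', 'infix': True},     # infix adds overhead; enable sparingly
        {'name': 'name_sort', 'type': 'string', 'sort': True},# string sorting adds overhead
        {'name': 'price', 'type': 'float'},                   # numeric fields are sortable by default
    ],
    'default_sorting_field': 'price',
})
```

With this schema, documents can still carry extra attributes (for example a long description or an image URL) that aren't declared above; they'll be persisted and returned in search hits without adding to the in-memory index.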
Similar Threads
Optimizing Bulk Indexing and Reducing RAM Usage in Typesense
Timon experienced issues with Typesense becoming unresponsive during bulk indexing and sought advice. Jason recommended sending larger import requests and increasing the client-side timeout allowance, which also revealed a need to increase the RAM allocated to Docker. Kishore Nallan undertook to find ways to optimize memory usage, particularly for geopoint indexing.
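For reference, a minimal sketch of that approach with the typesense Python client. The collection name, batch size, and connection settings are illustrative assumptions, not values from the thread.

```python
import typesense

# A generous client-side timeout keeps the client from giving up
# while the server is still processing a large import request.
client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': 'YOUR_API_KEY',
    'connection_timeout_seconds': 600,
})

def bulk_import(docs, batch_size=10_000):
    """Send documents in large import requests instead of one call per document."""
    collection = client.collections['products']  # hypothetical collection name
    for i in range(0, len(docs), batch_size):
        results = collection.documents.import_(docs[i:i + batch_size], {'action': 'upsert'})
        failures = [r for r in results if not r.get('success')]
        if failures:
            print(f'{len(failures)} documents failed in the batch starting at {i}')
```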


Understanding Typesense Indexing and Memory Usage
Ed inquired about the pros and cons of indexing in Typesense. Kishore Nallan and Jason explained the purpose and benefits of Typesense as a secondary data store and how to optimize memory usage.

Understanding Memory Usage Breakdown for Collections
Chetan asked for details on determining memory usage per collection. Kishore Nallan explained that a per-collection breakdown isn't feasible, since memory usage depends on the shape of the data and other factors; once around 100k documents are present, overall usage can be extrapolated.

