Typesense RAM Limitation and Memory-Mapped Files
TLDR: TJ asked whether Typesense would move to a model that uses memory-mapped files for datasets. Kishore Nallan said there are no concrete plans, but appreciated the idea and would consider community feedback.
Mar 11, 2023
Kishore Nallan  11:51 AM
Typesense is currently optimized for fast, real-time search, and that requires RAM. Memory mapping is not a magic bullet: it works well when searches mostly touch the same portion of the dataset, so those pages can stay "hot" in memory. That access pattern does not always hold for search engines, in which case you only get the performance of a disk-based search engine like Elasticsearch, because the kernel has to continuously page data in and out from disk.
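A toy illustration of the paging behavior described above (not Typesense code): with `mmap`, pages are faulted in from disk only when first touched, and the kernel may evict them again under memory pressure, which is exactly the cost a full-dataset scan would pay.

```python
import mmap
import os
import tempfile

# Write a small sample "dataset" to disk.
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
with open(path, "wb") as f:
    f.write(b"record-0\nrecord-1\nrecord-2\n")

# Map the file into the process's address space. No data is read yet;
# the kernel loads pages lazily on first access and may evict them
# later under memory pressure.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        first_line = mm.readline()   # touching the map faults in a page
        pos = mm.find(b"record-2")   # a scan touches every page it crosses
```

If the working set fits in RAM, repeated accesses hit the page cache and stay fast; if it does not, every cold access is a disk read.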
Perhaps you'll find that people can sacrifice a few extra milliseconds for the cost savings, depending on how much performance degrades, if you explore storing the data on disk. Or maybe provide the flexibility on a per-collection basis: a collection could be indexed either entirely in memory or via memory mapping. Just an idea.
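To make the per-collection idea concrete, here is a purely hypothetical sketch of what such a setting might look like in a collection schema. The `storage` field is invented for illustration; it is not part of the real Typesense API.

```python
# Hypothetical, not a real Typesense option: a per-collection storage
# mode, if the idea in the thread were ever adopted.
schema = {
    "name": "products",
    "fields": [{"name": "title", "type": "string"}],
    # "storage" is an invented field for illustration only:
    #   "memory" - index kept fully in RAM (current behavior)
    #   "mmap"   - index backed by a memory-mapped file
    "storage": "mmap",
}

assert schema["storage"] in {"memory", "mmap"}
```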
Kishore Nallan  12:11 PM
Understanding Dataset Sizes and Data Types for Typesense
Ethan asked about dataset size limits and supported data types in Typesense. Jason clarified that Typesense works as long as the dataset fits in RAM, and added that bulk imports accept only JSONL.
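JSONL means one self-contained JSON document per line, rather than a single JSON array. A minimal sketch of converting a list of documents into that shape:

```python
import json

# Build a JSONL payload: one JSON object per line, newline-separated.
docs = [
    {"id": "1", "title": "Memory-mapped files"},
    {"id": "2", "title": "RAM-resident indexes"},
]
jsonl = "\n".join(json.dumps(d) for d in docs)
```

Each line parses independently, which is what lets large imports be streamed without loading the whole payload at once.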
Troubleshooting Typesense Document Import Error
Christopher had trouble importing 2.1M documents into Typesense due to memory errors. Jason explained the system requirements, including how required RAM scales with dataset size, and suggested ways to tackle the issue. They also discussed database-like query options.
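A back-of-the-envelope capacity check along those lines can be sketched as below. The 3x multiplier is an assumption based on the common rule of thumb that an in-memory index needs a small multiple of the raw JSON size; measure against your own data before relying on it.

```python
def estimated_ram_gb(dataset_size_gb: float, multiplier: float = 3.0) -> float:
    """Rough RAM estimate for an in-memory index.

    The multiplier is an assumption (rule of thumb), not an official
    Typesense figure; actual usage depends on schema and field types.
    """
    return dataset_size_gb * multiplier

# e.g. 2.1M documents averaging 2 KB each is ~4.2 GB of raw JSON:
raw_gb = 2_100_000 * 2 / 1_000_000  # KB -> GB
needed = estimated_ram_gb(raw_gb)   # ~12.6 GB
```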
Understanding Data Storage in Typesense
Ethan wanted information on how to index large amounts of data. Jason explained that Typesense is meant to be a secondary data store, and that all data needed in search results must be indexed in Typesense.