Clarification on RAFT and Trie in GitHub
TL;DR: Prabhat asked Kishore Nallan about Trie maintenance, durability, and sharding in GitHub. Kishore Nallan explained the in-memory storage of indexing data and provided a relevant source code link.

Mar 17, 2022
Prabhat
07:43 AM
Attended your awesome talk today in GitHub. I asked a few questions related to RAFT, if you remember, but I have a bunch of other questions: Is the trie always maintained in memory? How do you ensure durability of the trie while indexing? Is sharding of tries also possible? Can you point me to a design doc I can read for more info, or to the relevant code folders where I can dig up the info myself?
Kishore Nallan
07:48 AM
1. All indexing data structures are stored in memory, including the trie. Here's the trie implementation, which is forked off a simpler library: https://github.com/typesense/typesense/blob/master/src/art.cpp
2. The trie is reconstructed on startup; only the raw documents are stored on disk. This lets us modify or introduce new data structures without the baggage of migrating on-disk structures, which can be cumbersome. The downside is some "bootstrapping" time, since the indexes are built from scratch from the raw documents. But this is again a trade-off chosen specifically for the kinds of use cases and datasets we've chosen to support.
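To make the trade-off described above concrete, here is a minimal Python sketch. It is not Typesense's code: the linked art.cpp is an adaptive radix tree written in C++, whereas this uses a plain character trie. The JSONL file and the names `persist_document` and `bootstrap_index` are hypothetical. The point it illustrates is that only raw documents are written to disk, and the in-memory trie is rebuilt from them every time the process starts.

```python
import json
import os
import tempfile


class TrieNode:
    """One node of a simple character trie (not an adaptive radix tree)."""
    __slots__ = ("children", "doc_ids")

    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.doc_ids = set()  # ids of documents containing the token ending here


class Trie:
    """In-memory prefix index mapping tokens to document ids."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, token, doc_id):
        node = self.root
        for ch in token:
            node = node.children.setdefault(ch, TrieNode())
        node.doc_ids.add(doc_id)

    def prefix_search(self, prefix):
        """Return ids of all documents having a token that starts with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        # Collect doc ids from the prefix node and all of its descendants.
        result, stack = set(), [node]
        while stack:
            current = stack.pop()
            result |= current.doc_ids
            stack.extend(current.children.values())
        return result


def persist_document(path, doc):
    """Append one raw document to a hypothetical on-disk JSONL store."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(doc) + "\n")


def bootstrap_index(path, field):
    """Rebuild the in-memory trie from the raw documents at startup."""
    trie = Trie()
    if not os.path.exists(path):
        return trie
    with open(path, encoding="utf-8") as f:
        for line in f:
            doc = json.loads(line)
            for token in doc[field].lower().split():
                trie.insert(token, doc["id"])
    return trie


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = os.path.join(tmp, "documents.jsonl")
        persist_document(store, {"id": "1", "title": "Raft consensus"})
        persist_document(store, {"id": "2", "title": "Radix trie"})
        # Simulate a restart: nothing in memory survives, only the file does.
        index = bootstrap_index(store, "title")
        print(sorted(index.prefix_search("ra")))  # ['1', '2']
```

Because the on-disk format here is just the raw documents, changing the in-memory structure (say, swapping the trie for a radix tree) needs no migration of stored data; the cost is that `bootstrap_index` must re-tokenize every document on each start, which is the "bootstrapping" time mentioned above.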
