#community-help

Offloading Idle Keys to Disk for Efficient Memory Usage in Typesense

TLDR: Thomas proposed offloading idle keys to disk in data-heavy Typesense deployments. Kishore Nallan explained that this conflicts with Typesense's in-memory design philosophy. Thomas then suggested deleting idle collections from RAM and re-indexing them on request, and shared that their company has built its own serverless solution along these lines.

Mar 09, 2022 (20 months ago)
Thomas
08:35 AM
Most servers have NVMe storage these days. Can't rarely accessed keys be offloaded to disk to save RAM? We're building a system that requires a LOT of data, but most of it sits idle. Being able to offload unused keys to disk would make a gigantic difference. Even if a search that isn't in RAM took 400 ms, it wouldn't matter, because only the first search pays that cost.
Kishore Nallan
09:30 AM
If performance is not critical, you can try using swap memory to see if OS-managed swapping works for you.
Thomas
09:33 AM
Sure, but that would affect the performance of the whole index, including the parts in active use.
Kishore Nallan
09:35 AM
Philosophically, we've chosen to be a fast, in-memory engine, and that is always going to be great for some use cases and bad for others. In my opinion, it's really difficult to do both (disk and memory), since the data structures you would choose are very different. Some just won't work for a disk-based design.

Thomas
09:41 AM
I'm aware. The problem with keeping everything in memory at all times is that as soon as a use case has idle data, the cost becomes infeasible.
Kishore Nallan
09:44 AM
Unlike a KV store it's not easy to move parts of the index onto disk.
Thomas
10:45 AM
No, that's not what I'm proposing. I'm proposing deleting unused (idle) collections from RAM and re-indexing them when they're requested again.
Thomas
10:45 AM
It's like swapping, but properly
Thomas
10:46 AM
Anyway, we solved it with a custom solution.
Thomas
10:46 AM
But if it could work in Typesense, then it would be better/more performant
Kishore Nallan
11:04 AM
I see, you mean something like archive + restore at a collection level? The only catch is that the restore part could take anywhere from seconds to minutes, depending on the size of the collection.
Thomas
11:20 AM
Which is fine of course
Thomas
11:21 AM
Each collection would hold fewer than 1,000 documents of 10 KB each, so it would index almost instantly.
Thomas
11:25 AM
But we could have an enormous number of idle collections without additional cost.
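To put rough numbers on the sizing described above (fewer than 1,000 documents of about 10 KB each per collection), here is a back-of-the-envelope sketch. The fleet size of one million collections and the `overhead` multiplier are assumptions chosen for illustration, not figures from this thread; real RAM use depends on which fields are indexed.

```python
def raw_collection_bytes(docs_per_collection: int, bytes_per_doc: int) -> int:
    """Raw (un-indexed) payload size of a single collection."""
    return docs_per_collection * bytes_per_doc


def fleet_bytes(
    collections: int,
    docs_per_collection: int = 1_000,   # upper bound mentioned in the thread
    bytes_per_doc: int = 10_000,        # "10kb", taken here as 10,000 bytes
    overhead: float = 1.0,              # assumed index overhead multiplier
) -> float:
    """Total size if every collection stays resident at the same time."""
    return collections * raw_collection_bytes(docs_per_collection, bytes_per_doc) * overhead


per_collection = raw_collection_bytes(1_000, 10_000)
print(f"one collection: {per_collection / 1e6:.0f} MB raw")

# Hypothetical fleet of one million mostly idle collections.
total = fleet_bytes(1_000_000)
print(f"one million collections: {total / 1e12:.0f} TB raw")
```

Under these assumptions a single collection is small enough to rebuild quickly, while keeping every collection resident at once adds up to terabytes, which is the cost gap being discussed here.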
Kishore Nallan
11:26 AM
Got it.
Robert
11:59 AM
RAM for dedicated servers is pretty cheap. I have 3TB of RAM in my server, all of it for Typesense, and it didn't cost much at all. Just how much data do you anticipate needing to index?
Thomas
01:42 PM
Billions, it's for site search
Thomas
01:42 PM
This wouldn't be a problem if we charged for the sites, but we only charge for other usage.
Thomas
01:43 PM
So the majority of customers will be idle accounts
Thomas
01:43 PM
We found a solution for this, though; we will use Typesense for premium search.
Thomas
01:44 PM
This feature would still be great to have though!
Aljosa
11:45 PM
We're going to be doing something similar to this soon: basically allowing client-side collection creation for a specific use case, but we don't want to keep everything in memory.

For us it's as simple as deleting the collection and recreating it from the database if it gets requested again, with a small delay until it's ready. The source of truth is already external, so it makes sense for us to handle it.
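The delete-and-recreate approach described above can be sketched as follows. This is a minimal illustration, not anyone's actual implementation: `InMemoryBackend` is a hypothetical stand-in for the search engine client, `OnDemandCollections` and `load_from_db` are invented names, and the substring match only exists so the example runs on its own.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]


class InMemoryBackend:
    """Hypothetical stand-in for a search engine client (illustration only)."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[Document]] = {}

    def exists(self, name: str) -> bool:
        return name in self.collections

    def create(self, name: str, docs: List[Document]) -> None:
        self.collections[name] = list(docs)

    def delete(self, name: str) -> None:
        self.collections.pop(name, None)

    def search(self, name: str, term: str) -> List[Document]:
        # Naive substring match on an assumed "text" field.
        needle = term.lower()
        return [d for d in self.collections[name]
                if needle in str(d.get("text", "")).lower()]


class OnDemandCollections:
    """Recreates a collection from the external source of truth when it is missing."""

    def __init__(self, backend: InMemoryBackend,
                 load_from_db: Callable[[str], List[Document]]) -> None:
        self.backend = backend
        self.load_from_db = load_from_db  # assumed loader for the external database

    def search(self, name: str, term: str) -> List[Document]:
        if not self.backend.exists(name):
            # First request after eviction: rebuild, then serve the query.
            self.backend.create(name, self.load_from_db(name))
        return self.backend.search(name, term)

    def evict(self, name: str) -> None:
        # Safe only because the database, not the index, is the source of truth.
        self.backend.delete(name)
```

The first search after an eviction pays the rebuild cost; later searches hit the resident collection directly. With a real engine, `create` would also need to wait until indexing has finished before the query is served.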

Mar 10, 2022 (20 months ago)
Thomas
07:17 AM
We created a serverless solution that can handle ~1,000 documents of 10 KB each within 200 ms on first load, then 20 ms for subsequent searches. That way we can offer Typesense to premium users and still have millions of "cheap" users.
Thomas
07:21 AM
We could code a similar delete and restore ourselves, but it would be difficult to know which collections are “active” or not.
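One possible way to approach the "which collections are active" problem is to record a last-access time per collection in whatever layer forwards search requests. The sketch below assumes such a layer exists and that every search passes through it; `IdleTracker`, `touch`, and the idle threshold are hypothetical names and values, not part of Typesense.

```python
import time
from typing import Callable, Dict, List


class IdleTracker:
    """Tracks last-access times so idle collections can be picked for eviction."""

    def __init__(self, idle_after_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_after_seconds = idle_after_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_access: Dict[str, float] = {}

    def touch(self, name: str) -> None:
        """Call on every search request for `name`."""
        self.last_access[name] = self.clock()

    def idle_collections(self) -> List[str]:
        """Collections not accessed within the idle window, in name order."""
        now = self.clock()
        return sorted(name for name, seen in self.last_access.items()
                      if now - seen >= self.idle_after_seconds)

    def forget(self, name: str) -> None:
        """Call after a collection has been evicted."""
        self.last_access.pop(name, None)


tracker = IdleTracker(idle_after_seconds=3600)  # assumed one-hour idle window
tracker.touch("site-a")
```

A periodic job could then delete whatever `idle_collections()` returns and call `forget` for each. The tracker only sees traffic that passes through it, so clients that query the engine directly would be invisible to it.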
bnfd
08:25 AM
Thomas Can you elaborate on the serverless solution you came up with?
Mar 11, 2022 (20 months ago)
bnfd
12:26 AM
Thomas Do you have a write-up for this somewhere? Sounds interesting
Thomas
07:01 AM
Unfortunately not, it’s part of our company’s IP. It was not an easy implementation 😄