#community-help

Trouble with Typesense Memory Usage when Restarting Docker Container

TLDR Blend reported runaway memory usage after restarting a Docker container running Typesense and shared diagnostics with Jason and Kishore Nallan. A data edge case involving nested fields was identified as the likely cause, though the issue remained unresolved.

Dec 20, 2022 (12 months ago)
Blend
04:29 PM
Hello,
I am running Typesense (typesense/typesense:0.24.0.rcn41) with Docker. I run the container, index about 2 GB of documents, and everything seems fine. Memory usage sits at around 1.7 GB.
If I stop the container and start it again (without destroying the data collection on disk), the collections on disk seem to be detected/loaded, but memory usage never stops increasing (it got up to 16 GB before I stopped the container).
Am I doing something wrong?
(90% of our data is in nested fields, which is why I am not using a stable release. Also, I tried indexing 5% of our data and this problem did not happen.)

Jason
05:43 PM
Could you share the exact commands you’re using to start the container, and then stop the container?
Blend
05:54 PM
I have this docker-compose.yml
version: '3.3'
services:
    typesense:
        ports:
            - '${TYPESENSE_PORT}:${TYPESENSE_PORT}'
        volumes:
            - ./data:/data
        command: '--data-dir /data --api-key=${TYPESENSE_API_KEY}'
        image: 'typesense/typesense:0.24.0.rcn41'

The first time I start the container, I use docker-compose up, then docker-compose stop and docker-compose start
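For reference, a minimal sketch of the restart cycle described above, assuming the docker-compose.yml shown earlier sits in the current directory and that TYPESENSE_PORT and TYPESENSE_API_KEY are set in the environment:

# first run: create the container, then index ~2 GB of documents via the API
docker-compose up

# later: restart without touching the ./data bind mount
docker-compose stop
docker-compose start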
Jason
05:56 PM
Could you post the output of GET /metrics.json, just after you start, and then again after you stop and restart?
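For anyone following along, metrics.json can be fetched with a plain GET; a minimal sketch, assuming the server is reachable on localhost at the port and API key from the compose file above:

curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  "http://localhost:${TYPESENSE_PORT}/metrics.json"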
Blend
06:00 PM
Yes, I can, though it will take a while for me to post those because the process of indexing the documents from my other database takes a while. I will post the results in about 3 hours. Thank you!

Dec 21, 2022 (12 months ago)
Kishore Nallan
01:43 AM
Blend, does the same issue happen when you restart the container immediately after indexing, vs. doing it after a few hours?

I wonder if there is an edge case with nested fields that manifests only on a restart. If you are able to share this dataset with us privately (we will promptly destroy it after debugging), we could also take a detailed look at what is happening. Once you have finished indexing, you can zip the data directory and share it with us.
Blend
11:40 AM
Good news (or maybe not?). I have tried to reproduce the issue three times since last night, and it has not happened again. It is the same data being indexed, the same version of Typesense, and the same Docker commands.
Before posting about it here, the issue happened 4/4 times that I restarted the container.
I will keep checking when I deploy Typesense in our prod environment, and I will also ask my team manager about sharing the data directory with you.
Kishore Nallan
11:41 AM
Ok, do you remember if the issue happened when you restarted the container immediately after indexing vs doing it after a few hours?
Blend
11:44 AM
I think it happened to me in both cases. First time I noticed it was when I stopped the container and re-ran it the next day, but then I tried to restart the container immediately after indexing, and ran into it again (and again..).
Kishore Nallan
11:45 AM
I see. If it happens again, regardless of whether you can share the dataset or not, it will be good to take a backup of the Typesense data directory (just copy the whole directory somewhere else).
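A one-line sketch of such a backup, assuming the bind-mounted ./data directory from the compose file above and that the container is stopped first:

cp -r ./data ./data-backup-$(date +%Y%m%d)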

Kishore Nallan
11:46 AM
Also when you tried reproducing again, did you reindex from scratch?
Blend
11:47 AM
Yes, I was removing the data directory completely.
Kishore Nallan
11:48 AM
Okay, if it's not a big hassle, can you try this:

a) Clear data directory
b) Reindex the data
c) Trigger a snapshot via the manual snapshot API (https://typesense.org/docs/0.23.1/api/cluster-operations.html#create-snapshot-for-backups)
d) Restart container
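Spelled out as shell commands, under the assumption that the server runs on localhost with the port and API key from the compose file above, and with /tmp/typesense-snapshot as an arbitrary example path inside the container:

# a) clear the data directory (with the container stopped)
docker-compose stop && rm -rf ./data/*

# b) start the container again and reindex the data
docker-compose start
# ... run the usual indexing job here ...

# c) trigger a manual snapshot via the documented cluster-operations API
curl -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  "http://localhost:${TYPESENSE_PORT}/operations/snapshot?snapshot_path=/tmp/typesense-snapshot"

# d) restart the container
docker-compose stop && docker-compose start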

Kishore Nallan
11:48 AM
If that also works, then we can assume that whatever happened earlier was a very rare edge case, and we might have to wait for it to happen again.
Blend
11:55 AM
BTW, this is the response of GET /metrics.json when starting the container for the first time (left column) and just after restarting it (right column).
{                                                     {
  "system_cpu1_active_percentage": "0.00",              "system_cpu1_active_percentage": "81.82",
  "system_cpu2_active_percentage": "0.00",              "system_cpu2_active_percentage": "100.00",
  "system_cpu3_active_percentage": "9.09",              "system_cpu3_active_percentage": "100.00",
  "system_cpu4_active_percentage": "0.00",              "system_cpu4_active_percentage": "90.00",
  "system_cpu5_active_percentage": "10.00",             "system_cpu5_active_percentage": "100.00",
  "system_cpu6_active_percentage": "0.00",              "system_cpu6_active_percentage": "100.00",
  "system_cpu7_active_percentage": "0.00",              "system_cpu7_active_percentage": "100.00",
  "system_cpu8_active_percentage": "0.00",              "system_cpu8_active_percentage": "75.00",
  "system_cpu_active_percentage": "1.28",               "system_cpu_active_percentage": "92.50",
  "system_disk_total_bytes": "123816591360",            "system_disk_total_bytes": "123816591360",
  "system_disk_used_bytes": "80937127936",              "system_disk_used_bytes": "83946774528",
  "system_memory_total_bytes": "16575455232",           "system_memory_total_bytes": "16575455232",
  "system_memory_used_bytes": "6468882432",             "system_memory_used_bytes": "7615643648",
  "system_network_received_bytes": "22211",             "system_network_received_bytes": "533235",
  "system_network_sent_bytes": "15146",                 "system_network_sent_bytes": "403554",
  "typesense_memory_active_bytes": "36155392",          "typesense_memory_active_bytes": "713400320",
  "typesense_memory_allocated_bytes": "33771824",       "typesense_memory_allocated_bytes": "696897696",
  "typesense_memory_fragmentation_ratio": "0.07",       "typesense_memory_fragmentation_ratio": "0.02",
  "typesense_memory_mapped_bytes": "116314112",         "typesense_memory_mapped_bytes": "966184960",
  "typesense_memory_metadata_bytes": "10287312",        "typesense_memory_metadata_bytes": "21854176",
  "typesense_memory_resident_bytes": "36155392",        "typesense_memory_resident_bytes": "713400320",
  "typesense_memory_retained_bytes": "57225216"         "typesense_memory_retained_bytes": "631844864"
}                                                      }
Kishore Nallan
12:00 PM
You mean the first time, after clearing the data?
Kishore Nallan
12:00 PM
Ignore, I saw the second column 😄

1

Blend
12:44 PM
So, I cleared the data directory, indexed the data, and ran
curl "" -X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

to create a snapshot.
The response was {"success":true}, but I do not see any file or directory being created at the specified path. Doing ls /tmp | grep typesense shows nothing. Is this a bad sign?
Kishore Nallan
12:58 PM
It would be created inside the Docker container, not on the host.
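A quick way to check, assuming the typesense service name from the compose file above and a snapshot_path somewhere under /tmp inside the container:

docker-compose exec typesense ls -la /tmp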

Kishore Nallan
12:59 PM
This Docker container has a mounted volume which is used as the data directory, right?
Blend
01:06 PM
No. I provide a relative path ('./data') for the source of the volume, so the folder is bind-mounted into the container. It is not a volume managed by Docker, afaik. My docker-compose.yml is up in this thread.
Kishore Nallan
01:07 PM
Got it. Yes, ./data (local data dir) is mounted as /data inside the container and this is the path used as data-dir. This is correct.
Blend
01:08 PM
BTW, I do see the snapshot created inside the container.
Kishore Nallan
01:08 PM
👍 if you restart now and everything behaves we are good
Blend
01:19 PM
The issue did happen this time. I let the container reach a memory usage of 10 GB (observed through GET /metrics.json) before I stopped it. I will attach the Docker container logs here.
Blend
01:19 PM
I also zipped the data directory before restarting, so I have that handy as well.
Blend
01:23 PM
This is the response of GET /metrics.json 😅
{
  "system_cpu1_active_percentage": "72.73",
  "system_cpu2_active_percentage": "40.00",
  "system_cpu3_active_percentage": "100.00",
  "system_cpu4_active_percentage": "40.00",
  "system_cpu5_active_percentage": "37.50",
  "system_cpu6_active_percentage": "100.00",
  "system_cpu7_active_percentage": "44.44",
  "system_cpu8_active_percentage": "60.00",
  "system_cpu_active_percentage": "62.03",
  "system_disk_total_bytes": "123816591360",
  "system_disk_used_bytes": "82421317632",
  "system_memory_total_bytes": "16575455232",
  "system_memory_used_bytes": "16246272000",
  "system_network_received_bytes": "4167076",
  "system_network_sent_bytes": "4260233",
  "typesense_memory_active_bytes": "10272575488",
  "typesense_memory_allocated_bytes": "10254819992",
  "typesense_memory_fragmentation_ratio": "0.00",
  "typesense_memory_mapped_bytes": "10580803584",
  "typesense_memory_metadata_bytes": "173259808",
  "typesense_memory_resident_bytes": "10272575488",
  "typesense_memory_retained_bytes": "1700118528"
}
Kishore Nallan
01:24 PM
This is great. I think we have a definite bug here that we need to look into.
Blend
01:27 PM
I see another weird thing. Upon the initial indexing of the data, my data directory was ~2.7 GB, but just before I restarted the container, it was ~560 MB. The zip file I created a few minutes before the restart is 810 MB.
Kishore Nallan
01:27 PM
I see On demand snapshot succeeded! 4 times in the log though. Did you trigger it multiple times?
Kishore Nallan
01:27 PM
The reduction is because of the snapshot truncating all the data that was buffered during indexing, that's fine.
Blend
01:30 PM
I did call the snapshot creation route exactly 4 times.
Oh interesting. So if I want to estimate the disk usage of our data, I should wait for a while after the indexing?
Kishore Nallan
01:30 PM
Also I see the shutdown happen at 13:09:

I20221221 13:09:06.579972     1 typesense_server_utils.cpp:524] Bye.

And the final stop just 2 mins later:

I20221221 13:11:02.677739     1 typesense_server_utils.cpp:48] Stopping Typesense server...

Did the memory grow to 10GB+ within that short time?
Blend
01:32 PM
The container was started about 15 seconds after that:
I20221221 13:09:22.098304     1 typesense_server_utils.cpp:355] Starting Typesense 0.24.0.rcn41

But yes, the memory increases very quickly
Kishore Nallan
01:33 PM
> So if I want to estimate the disk usage of our data, I should wait for a while after the indexing?
Reclaiming disk space is a bit of a complex topic. In a recent build (not available in the rcn41 build you are using here), I've introduced an option to "compact" the DB manually. That will be useful for heavy-write use cases.
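In later Typesense releases this is exposed as a cluster operation; a sketch of invoking it, assuming a build that ships the compaction endpoint and the same localhost/port/API-key setup as above:

curl -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  "http://localhost:${TYPESENSE_PORT}/operations/db/compact"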

Kishore Nallan
01:34 PM
It would be useful to have access to the zip file to debug what's happening. Only I will have access to it, and I will promptly destroy it once I'm done testing. Let me know if you can get permission for that.

Blend
01:36 PM
I will let you know in an hour or so (that is when the US-based team is online). BTW, the issue happens again if I try to start the container.
Kishore Nallan
01:36 PM
That's great. Consistent reproduction is half the battle won.
Kishore Nallan
01:37 PM
This is likely some data edge case, since it doesn't happen when you index 5% of your own data, and we haven't seen it on other datasets either.
Blend
01:40 PM
Ok, I think we can get away with it for now if we never restart the Docker container after indexing, and re-index the data if we do restart it. Do you know if there are cases where the container can stop or "fail" by itself and require a restart?
Kishore Nallan
01:51 PM
What you can do is set --snapshot-interval-seconds to a very high value so that snapshotting never happens. Since the issue only happens after snapshotting, this should ensure that things keep working even if Docker restarts. This is a theory, but I'm reasonably confident about it.

The container shouldn't fail by itself unless it crashes for some reason.
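As a sketch, the command line in the compose file shared earlier could pass that flag like this (the 90-day value is just an arbitrarily large example):

command: '--data-dir /data --api-key=${TYPESENSE_API_KEY} --snapshot-interval-seconds=7776000'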

Kishore Nallan
01:52 PM
The default snapshot interval is 3600s (1 hour).
Kishore Nallan
04:10 PM
Btw, we will be happy to sign an NDA if required as well.
Blend
04:41 PM
I talked to our CTO; he is discussing it as well, and I will have an answer by tomorrow. Thank you.
Dec 22, 2022 (12 months ago)
Kishore Nallan
08:40 AM
Thank you. I was going through an earlier email thread and I see that there could be some records with an array of "thousands of nested objects". This could be one area where the issue is happening, and it might also explain why 5% of the data worked fine. If there is a way to extract these records and try indexing just those, we might even be able to narrow it down to a small subset for sharing.
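As a sketch of that extraction, assuming the source documents are available as a JSONL export and the large nested array lives under a hypothetical field called nested_items:

# keep only records whose nested array is unusually large (field name is hypothetical)
jq -c 'select((.nested_items | length) > 1000)' documents.jsonl > large-nested-records.jsonl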
