#community-help

Systemic Deletion of Collection in Typesense on Amazon EC2 Instance

TLDR Tatu's Typesense collections were mysteriously disappearing. After investigating with Kishore Nallan, they discovered it was due to misuse of the PHP library which deleted the whole collection instead of a single record.

Solved
May 03, 2021 (28 months ago)
Tatu
11:04 AM
I have successfully set up Typesense running on an Amazon EC2 instance, except that one of my collections, containing ~140k records, keeps deleting itself around once a day. There is no process running on the application side that could explain this behaviour, and there are no helpful messages in the Typesense log file. I'm also using the PHP library with the Laravel adapter.

Are there any reasons why Typesense would drop a collection by itself? Any idea where I could start diagnosing something like this?
Kishore Nallan
11:06 AM
👋 Can you please clarify what you mean by deleting themselves? Is the collection empty, does it not exist, or does it contain fewer documents than previously indexed?

Have you checked that there is enough RAM and Typesense isn't restarting by checking the logs?
Tatu
11:11 AM
The collection does not exist, I have to recreate it from scratch. I also thought it could be a RAM issue, but the problem persists even after I upgraded to a server with 16 GB of RAM, of which 13.7 GB is still available.

I have one smaller collection that is also doing this but a bit less often, and four collections that are stable.

I don't see anything relevant in the log, no mentions of a restart. But a cold start from disk should anyways be possible?
Kishore Nallan
11:12 AM
Can you check if you are using ephemeral storage (if you are on EC2) and whether the host instance is getting restarted or something?
Kishore Nallan
11:13 AM
> But a cold start from disk should anyways be possible?
Yes, certainly. What are the contents of the Typesense data directory? Check the date time of the files.
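One way to run the check suggested above is to list every file under the data directory with its modification time, oldest first. This is a minimal Python sketch; the function name is made up for illustration, and the directory you pass in is whatever the server was started with via --data-dir.

```python
from datetime import datetime, timezone
from pathlib import Path


def list_modification_times(data_dir):
    """Return (relative path, mtime in UTC) for every file under data_dir, oldest first."""
    root = Path(data_dir)
    entries = []
    for path in root.rglob("*"):
        if path.is_file():
            mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            entries.append((str(path.relative_to(root)), mtime))
    # Sort by timestamp so the oldest surviving files come first.
    return sorted(entries, key=lambda entry: entry[1])
```

Calling list_modification_times("/var/lib/typesense") (an assumed path) and printing the pairs shows whether any files predate the last time a collection went missing. If everything is recent, the data directory was probably rewritten rather than left untouched.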
Tatu
11:14 AM
The host instance is running fine, the data is kept (EBS storage), no restarts. So I don't expect it to be an instance issue. Maybe something operating-system related.
Tatu
11:15 AM
Kishore Nallan
11:17 AM
That log snippet looks fine to me. Once a day sounds suspiciously like a cron job to me.
Tatu
11:19 AM
It's not exactly once a day, I'd say around 15-30 hours
Tatu
11:20 AM
And this server was purpose-built for Typesense, nothing else installed. Quite frustrating, I know there's a reason for this but can't think of anything that might cause it. Server uptime is 4 days, and the collection has been dropped multiple times since then.
Kishore Nallan
11:22 AM
Can you check the date time of the files in your data directory?
Kishore Nallan
11:22 AM
Also, do you have metrics that show how the free memory on the instance varies over time, and whether that correlates with the collections going missing?
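If no monitoring is in place, a rough record of available memory over time can be collected with a small script run from a scheduler. The Python sketch below assumes a Linux host that exposes /proc/meminfo; the function names and the CSV layout are made up for illustration.

```python
from datetime import datetime, timezone


def parse_mem_available_kib(meminfo_text):
    """Extract the MemAvailable value (reported in kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def format_sample(meminfo_text, when):
    """Build one CSV row: ISO timestamp, available memory."""
    return f"{when.isoformat()},{parse_mem_available_kib(meminfo_text)}"


def append_sample(csv_path, meminfo_path="/proc/meminfo"):
    """Append the current reading to csv_path; call this once per minute or so."""
    with open(meminfo_path, encoding="ascii") as source:
        row = format_sample(source.read(), datetime.now(timezone.utc))
    with open(csv_path, "a", encoding="ascii") as sink:
        sink.write(row + "\n")
```

Comparing the timestamps in the resulting file against the times a collection went missing would show whether the two correlate.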
Tatu
11:22 AM
They are all quite recent
Tatu
11:23 AM
No metrics yet, but the server is quite beefy for this dataset. I wouldn't expect all 16 GB of RAM to be eaten up, but I'll set up something if I can't think of anything else.
Kishore Nallan
11:24 AM
How many times does the phrase Starting Typesense occur in the logs? I presume the logs themselves are preserved for all the 4 days of uptime.
Kishore Nallan
11:25 AM
(edited last message to remove the version number, it must be just Starting Typesense)
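The count asked for above can be taken by scanning the log for the startup phrase. This is a minimal Python sketch; the function names are made up for illustration, and the log path is whatever your installation writes to.

```python
def count_startups(log_lines, marker="Starting Typesense"):
    """Count the lines that contain the startup marker."""
    return sum(1 for line in log_lines if marker in line)


def count_startups_in_file(log_path, marker="Starting Typesense"):
    """Same count, reading the log file line by line."""
    # errors="replace" keeps the scan going if the log holds stray bytes.
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return count_startups(handle, marker)
```

For example, count_startups_in_file("/var/log/typesense/typesense.log") (an assumed path). A result higher than the number of deliberate starts would point to unplanned restarts.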
Tatu
11:26 AM
Just twice, when I initially started it. So it seems like it has been running
Kishore Nallan
11:27 AM
So strange. Can you try upgrading to v0.20.0 -- that shouldn't really change much but it will be helpful to be on the latest version to compare logs etc.
Tatu
11:28 AM
Yep, I'll try that next.
Kishore Nallan
11:28 AM
One other experiment I would try is to also create other data on the server, like generate an API key and see whether that also goes missing when collections are missing.
Tatu
11:29 AM
Here are the six collections. The smallest three have survived, the others I've had to recreate
Tatu
11:29 AM
So at least it's keeping some of the collections. Don't know about API keys, but at least the initial ones I created are still working.
Kishore Nallan
11:31 AM
Do you have any cron or periodic jobs running? This certainly seems like an issue of some process dropping the collection during re-indexing. I might be wrong, but if only some collections go missing then that certainly is very strange.
Tatu
11:32 AM
I have cron jobs that are able to rebuild indexes, but none of them have been running. I only run those manually when I need to recreate a dropped collection.

But I wouldn't put it past me to find out something on the application side is actually doing the damage.
Tatu
11:33 AM
But as there's no clear reason for now, I'll just try to upgrade to 0.20 and add some metrics so I can get some more reliable data
Kishore Nallan
11:33 AM
One quick way to verify: just stop all inbound traffic to the instance by modifying the security group and see what happens.
Tatu
11:35 AM
Or actually, I can spin up an identical instance and leave that untouched traffic wise, and see if the collections are still dropped. At least then we'll know if it's caused by some traffic, or if it's happening without outside influence
Kishore Nallan
11:35 AM
👍
Tatu
11:36 AM
Could there be a situation where somehow corrupted or malformed data could cause Typesense to freak out and drop the collection? That's a possibility also.
Kishore Nallan
11:37 AM
Unlikely, because the lookup that checks whether a collection exists is done against an in-memory hash map, so it will never be wrong even if the disk becomes corrupted.
Kishore Nallan
11:37 AM
Also 0.19.0 has been successfully deployed and used by multiple customers for 2+ months now with no issues. This is such a serious issue, it should have surfaced by now.
Tatu
11:38 AM
Yep. I'm also expecting (and hoping) this to be something stupid created by myself, but we'll see.
Tatu
11:39 AM
All right, I'll keep investigating and keep you posted. Thanks so far.
Kishore Nallan
11:39 AM
Welcome.
May 14, 2021 (28 months ago)
Kishore Nallan
01:15 PM
Tatu Did you figure out what was happening here?
Jun 28, 2021 (27 months ago)
Kishore Nallan
01:20 PM
Tatu Sorry to follow up on this once again: did you get to the bottom of what was happening here? Since this seemed like a serious issue, I just want to make sure that there are no gotchas that we might have missed.
Sep 24, 2021 (24 months ago)
Tatu
12:04 PM
Kishore Nallan Sorry for being inactive here and not replying to this issue. The problem ended up being just me using the PHP library wrong. The correct way to delete a single record is $client->collections['products']->documents($id)->delete(), not $client->collections['products']->delete($id) which I was doing and which coincidentally deletes the whole collection. The syntax "looks" valid, which is why it took me a long time to figure out what's wrong.

Maybe a point of improvement to the PHP library would be to throw a warning if a parameter was used with collection deletion, but I can raise an issue about this in the typesense-php repo.
Kishore Nallan
12:05 PM
Oh that's interesting. Thanks for pointing this out. We will take a look 👍
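The mistake described in this thread can be modeled in a few lines. This is a Python sketch, not the typesense-php source: Collection, Documents, Document and StrictCollection are hypothetical stand-ins backed by a plain dict. The delete(*ignored) signature imitates the way PHP silently discards surplus arguments passed to a user-defined method. StrictCollection sketches the guard Tatu suggested; it is one possible design, not something the library is known to do.

```python
class Document:
    def __init__(self, store, doc_id):
        self._store = store
        self._doc_id = doc_id

    def delete(self):
        """Remove a single record and return it."""
        return self._store.pop(self._doc_id)


class Documents:
    def __init__(self, store):
        self._store = store

    def __getitem__(self, doc_id):
        return Document(self._store, doc_id)


class Collection:
    def __init__(self, server, name):
        self._server = server
        self._name = name
        self.documents = Documents(server[name])

    def delete(self, *ignored):
        """Drop the whole collection; extra arguments are silently ignored."""
        return self._server.pop(self._name)


class StrictCollection(Collection):
    def delete(self, *unexpected):
        # Hypothetical guard: refuse to run when called with an argument.
        if unexpected:
            raise TypeError(
                "delete() drops the whole collection and takes no arguments; "
                "use documents[id].delete() to remove one record"
            )
        return super().delete()


server = {"products": {"1": {"name": "a"}, "2": {"name": "b"}}}
products = Collection(server, "products")

products.documents["1"].delete()          # removes record "1" only
after_record_delete = sorted(server["products"])

products.delete("2")                      # meant to remove record "2"; drops the collection
collection_still_exists = "products" in server
```

With the permissive class the second call succeeds quietly and the collection is gone, which matches the symptom reported at the top of the thread. With StrictCollection the same call raises before anything is deleted.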