#community-help

Bulk Import 50MB JSON Files Error - Timeout and Solutions

TLDR madhweep hit an error while bulk importing 50MB JSON files with the JS client. Kishore Nallan suggested batching the import or using curl, but the curl import then timed out. Jason helped troubleshoot, and they found that the cluster had run out of memory, which was causing the timeouts. Importing on a cluster with sufficient memory resolved the problem. Daniel saw a similar timeout with the Python library and resolved it by increasing the client-side timeout.

Nov 19, 2021 (24 months ago)
madhweep
11:09 AM
Trying to bulk import 50MB JSON files and getting this error. Any suggestions on how to fix it?

Request #1637320016147: Request to Node 0 failed due to "undefined Too many properties to enumerate"
Kishore Nallan
11:11 AM
We are looking to address this. For now, if you batch the imports into smaller chunks, it will work. Or just use curl to import the full file. Typesense can import files of several GBs directly, so this is just a JS client limitation at the moment.
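
A minimal sketch of the batching workaround, assuming the data is already available as JSONL (one document per line). The helper name `iter_jsonl_batches` and the default batch size are hypothetical, not part of any Typesense client; each yielded payload would be sent as one import request.

```python
from typing import Iterable, Iterator, List


def iter_jsonl_batches(lines: Iterable[str], batch_size: int = 1000) -> Iterator[str]:
    """Yield JSONL payloads containing at most `batch_size` documents each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        batch.append(line)
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield "\n".join(batch)
```

Passing an open file object as `lines` reads the file lazily, so only one batch is held in memory at a time.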
madhweep
11:28 AM
Kishore Nallan how do I use curl to import the file? I’m using Typesense Cloud. Is there an example?
Kishore Nallan
11:41 AM
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -X POST --data-binary @/path/to/file.jsonl ""

The file needs to be in JSONL format, as described here: https://typesense.org/docs/0.21.0/api/documents.html#import-a-jsonl-file
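
For reference, JSONL means one JSON document per line rather than a single JSON array. If the source file happens to be a top-level JSON array (an assumption, since the thread does not show the file), a conversion along these lines may work. The function name `json_array_to_jsonl` is hypothetical, and the sketch loads the whole array into memory, which should be fine at 50MB but would call for a streaming parser on much larger files.

```python
import json


def json_array_to_jsonl(json_text: str) -> str:
    """Convert a JSON array of documents into JSONL (one document per line)."""
    docs = json.loads(json_text)
    if not isinstance(docs, list):
        raise ValueError("expected a top-level JSON array of documents")
    # json.dumps never emits raw newlines, so each document stays on one line.
    return "\n".join(json.dumps(doc, ensure_ascii=False) for doc in docs)
```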
madhweep
12:23 PM
curl -H "X-TYPESENSE-API-KEY: ${api key}" -X POST --data-binary @./path/to/file.jsonl "https://{host name}.a1.typesense.net/collections/products/documents/import"
madhweep
12:23 PM
Getting a "failed to connect to {host name}: operation timed out" error
Kishore Nallan
01:58 PM
As soon as you run it or after some time?
Jason
03:28 PM
madhweep could you try using v1.0.3-2 of typesense-js? I fixed a similar issue and I’m curious if that fix works in your case as well.
Jason
03:28 PM
Also, for large datasets, you want to make sure you set the connection timeout to a large value.
madhweep
05:05 PM
Kishore Nallan it runs for about 15-20 minutes and then times out. Jason, I just tried it and it didn’t fix it. Looks like this is a JS dependency issue, so I can use Python for the bulk upload instead.
Jason
05:34 PM
Hmm it times out for 50MB with curl? Could you share the exact curl command you’re using and the collection schema?
Daniel
05:49 PM
The same happened to me with the Python library, so I used curl and it got the job done
Jason
05:52 PM
madhweep - I wonder if you're running out of RAM and that's what's causing the timeout. Usually I've seen timeouts happen on the client side (by default it's set to just 2s IIRC), so you'd have to increase that to, say, 10 minutes...
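
A rough sketch of raising the client-side timeout, here using only Python's standard library rather than an official Typesense client. The endpoint path and the `X-TYPESENSE-API-KEY` header are taken from the curl command earlier in the thread; the function names and the 600 second default are illustrative assumptions.

```python
import urllib.request


def build_import_request(host: str, collection: str, api_key: str, jsonl: str) -> urllib.request.Request:
    """Build a POST request for the documents import endpoint."""
    url = f"https://{host}/collections/{collection}/documents/import"
    return urllib.request.Request(
        url,
        data=jsonl.encode("utf-8"),
        headers={"X-TYPESENSE-API-KEY": api_key},
        method="POST",
    )


def import_jsonl(host: str, collection: str, api_key: str, jsonl: str,
                 timeout_seconds: float = 600.0) -> str:
    """Send the import request, waiting up to `timeout_seconds` on each
    blocking socket operation (connecting, reading the response)."""
    req = build_import_request(host, collection, api_key, jsonl)
    with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
        return resp.read().decode("utf-8")
```

A longer timeout only helps when the server is still making progress; it would not fix a case where the cluster itself is out of memory.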
Jason
05:52 PM
Daniel - did it still time out after you increased the client-side timeout?
Daniel
06:04 PM
Yeah
Daniel
06:05 PM
It's not an issue though (at least for me), I don't think most people are going to add millions of docs
Jason
06:05 PM
If you have a script and a dataset you can share (you can email it to [email protected]) that replicates the issue, I can take a closer look

Jason
06:07 PM
I'm mainly curious because just yesterday I was able to index 1.1M docs with the script below, which reads from a 1GB JSONL file and inserts 500K docs at a time into Typesense: https://github.com/typesense/showcase-airbnb-geosearch/blob/bcf50950d6d74d5d62aac3ed139dc74981b9ff88/scripts/index_data.js

So I'm wondering what's happening...
Jason
11:15 PM
Summarizing what we found via DM - the cluster had run out of memory, which is what was causing the timeouts. Indexing the data on a cluster with sufficient memory solved the issue.

Nov 20, 2021 (24 months ago)
Daniel
12:36 PM
For me it was a low timeout. After I increased it, the Python script ran without issues.
