#community-help

Bulk Import 50MB JSON Files Error - Timeout and Solutions

TLDR: madhweep encountered an error while bulk importing JSON files. Kishore Nallan offered help, but the issue persisted. Jason stepped in and, after troubleshooting, they concluded the cluster had run out of memory, causing the timeouts. The problem was resolved by using a cluster with sufficient memory. Daniel experienced a similar issue, resolved by increasing the client-side timeout.

Solved
Nov 19, 2021 (26 months ago)
madhweep
11:09 AM
Trying to bulk import 50MB JSON files and I get this error. Any suggestions on how to fix it?

Request #1637320016147: Request to Node 0 failed due to "undefined Too many properties to enumerate"
Kishore Nallan
11:11 AM
We are looking to address this. For now, if you batch the imports into smaller chunks, it will work. Or just use curl to import the full file. Typesense can support direct imports of files several GB in size, so this is just a JS client limitation at the moment.
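The batching approach Kishore describes can be sketched as below, assuming the typesense-js client; the collection name, batch size, and file name are illustrative, not from the thread.

```javascript
// Split a JSONL string into batches of at most `batchSize` lines,
// so each import request stays small enough for the JS client.
function splitJsonl(jsonl, batchSize) {
  const lines = jsonl.split('\n').filter((line) => line.trim() !== '');
  const batches = [];
  for (let i = 0; i < lines.length; i += batchSize) {
    batches.push(lines.slice(i, i + batchSize).join('\n'));
  }
  return batches;
}

// Illustrative usage with typesense-js (client setup omitted):
// for (const batch of splitJsonl(fs.readFileSync('file.jsonl', 'utf8'), 10000)) {
//   await client.collections('products').documents().import(batch, { action: 'create' });
// }
```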
madhweep
11:28 AM
Kishore Nallan how do I use curl to import the file? I’m using Typesense Cloud. Is there an example?
Kishore Nallan
11:41 AM
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -X POST --data-binary @/path/to/file.jsonl "https://{host name}.a1.typesense.net/collections/{collection}/documents/import"

The file needs to be in JSONL format, described here: https://typesense.org/docs/0.21.0/api/documents.html#import-a-jsonl-file
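If the source data is a single JSON array rather than JSONL, converting it is straightforward; a minimal sketch (the file names in the commented usage are illustrative):

```javascript
// Convert a JSON array string into JSONL: one JSON object per line.
function jsonArrayToJsonl(jsonArrayString) {
  return JSON.parse(jsonArrayString)
    .map((doc) => JSON.stringify(doc))
    .join('\n');
}

// Illustrative usage:
// fs.writeFileSync('file.jsonl', jsonArrayToJsonl(fs.readFileSync('file.json', 'utf8')));
```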
madhweep
12:23 PM
curl -H "X-TYPESENSE-API-KEY: ${api key}" -X POST --data-binary @./path/to/file.jsonl "https://{host name}.a1.typesense.net/collections/products/documents/import"
madhweep
12:23 PM
Getting a "failed to connect to {host name}: operation timed out" error
Kishore Nallan
01:58 PM
As soon as you run it or after some time?
Jason
03:28 PM
madhweep could you try using v1.0.3-2 of typesense-js? I fixed a similar issue and I’m curious if that fix works in your case as well.
Jason
03:28 PM
Also, for large datasets you’ll want to make sure you set the connection timeout to a large value
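A raised connection timeout looks roughly like this in a typesense-js client configuration; the host, key, and the 10-minute value are illustrative placeholders, not from the thread.

```javascript
// Client configuration with a generous connection timeout for bulk imports.
// The default timeout is only a few seconds, which large imports can exceed.
const typesenseConfig = {
  nodes: [{ host: 'xxx.a1.typesense.net', port: 443, protocol: 'https' }],
  apiKey: 'YOUR_API_KEY',
  connectionTimeoutSeconds: 600, // 10 minutes, to accommodate large imports
};

// Illustrative usage:
// const client = new Typesense.Client(typesenseConfig);
```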
madhweep
05:05 PM
Kishore Nallan it runs for like 15-20 mins and then times out. Jason I just tried it and it didn’t fix it. Looks like this is a JS dependency issue; I can use Python for the bulk upload then
Jason
05:34 PM
Hmm it times out for 50MB with curl? Could you share the exact curl command you’re using and the collection schema?
Daniel
05:49 PM
The same happened to me with the Python library, so I used curl and it got the job done
Jason
05:52 PM
Madhweep - I wonder if running out of RAM is what’s causing the timeout. Usually I’ve seen timeouts happen on the client side (by default it’s set to just 2s IIRC), so you’d have to increase that to, say, 10 minutes...
Jason
05:52 PM
Daniel - did it still timeout after you increased the client-side timeout?
Daniel
06:04 PM
Yeah
Daniel
06:05 PM
It's not an issue though (at least for me), I don't think most people are going to add millions of docs
Jason
06:05 PM
If you have a script and a dataset you can share (you can email it to [email protected]) that replicates the issue, I can take a closer look

Jason
06:07 PM
I'm mainly curious because just yesterday I was able to index 1.1M docs with the script below, that reads from a 1GB JSONL file and inserts 500K docs at a time into Typesense: https://github.com/typesense/showcase-airbnb-geosearch/blob/bcf50950d6d74d5d62aac3ed139dc74981b9ff88/scripts/index_data.js

So I'm wondering what's happening...
Jason
11:15 PM
Summarizing what we found via DM: the cluster had run out of memory, which is what was causing the timeouts. Indexing on a cluster with sufficient memory solved the issue


Nov 20, 2021 (26 months ago)
Daniel
12:36 PM
For me it was a low timeout; after I increased it, the Python script ran without issues

