Resolving JSONL File Import Issues in Python
TLDR Jon struggles to import a large JSONL file using Python and runs into decode errors and size limits. Kishore Nallan advises using curl for imports under 10 GB and points to an update to the Python client that handles large imports better.
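The thread does not show the exact decode errors, so a reasonable first step is to find which lines of the JSONL file fail to parse. Below is a minimal Python sketch. It assumes each line should be a UTF-8 encoded JSON object, which is the one-document-per-line shape the import expects. `find_bad_jsonl_lines` is a hypothetical helper and is not part of the Typesense client.

```python
import json
from typing import Iterable, Iterator, Tuple


def find_bad_jsonl_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, problem) for each line that is not a UTF-8 JSON object.

    Takes raw byte lines (e.g. a file opened in "rb" mode) so that an
    encoding problem is reported per line instead of aborting the scan.
    Blank lines are skipped.
    """
    for line_no, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # ignore empty lines such as a trailing newline
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            yield line_no, f"not valid UTF-8: {exc}"
            continue
        try:
            doc = json.loads(text)
        except json.JSONDecodeError as exc:
            yield line_no, f"invalid JSON: {exc.msg}"
            continue
        if not isinstance(doc, dict):
            yield line_no, "valid JSON but not a JSON object"
```

Usage, with a placeholder file name: `with open("documents.jsonl", "rb") as fh:` then loop over `find_bad_jsonl_lines(fh)` and print each line number and problem. Because the file is iterated lazily, this works on files far larger than available memory.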
Jan 08, 2023
Jon (04:50 PM)

Jan 09, 2023
Kishore Nallan (01:06 AM)
Jon (01:54 AM)
Jon (01:55 AM)
Jon (02:10 AM)
Jon (02:10 AM)
Kishore Nallan (04:20 AM)
Kishore Nallan (04:32 AM)
Iterable here: https://github.com/typesense/typesense-python/pull/22
This is available in the 0.15.0 version of the Python client that I've just published.

Jan 16, 2023
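The linked PR is about passing an Iterable to the client's import. Below is a plain-Python sketch of that streaming idea, which is not the client's own implementation. It assumes the documents come from a JSONL file. Lines are parsed lazily and grouped into fixed-size batches, so the whole file is never held in memory at once. `iter_jsonl` and `in_batches` are hypothetical helpers. Check the client's documentation for exactly how its import call accepts iterables.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse JSONL lines one at a time, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def in_batches(docs: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into lists of at most `size` items, consuming the input lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))  # pull up to `size` items from the stream
        if not batch:
            return
        yield batch


# Hypothetical usage: `client` is an already configured typesense.Client, and
# the collection name "products" and batch size 1000 are placeholders.
# with open("documents.jsonl", encoding="utf-8") as fh:
#     for batch in in_batches(iter_jsonl(fh), 1000):
#         client.collections["products"].documents.import_(batch, {"action": "create"})
```

Memory use is bounded by one batch rather than the whole file. The batch size trades the number of HTTP requests against the size of each request.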
Jon (05:57 PM)

Jan 17, 2023
Jon (03:19 PM)
Kishore Nallan (03:24 PM)
Kishore Nallan (03:25 PM)
Jon (03:25 PM)
Jon (03:25 PM)
Kishore Nallan (03:29 PM)
Kishore Nallan (03:29 PM)
Kishore Nallan (03:32 PM)
Kishore Nallan (03:34 PM)
curl will work fine as long as the POST data is less than 10 GB. So if your total dataset is 28 GB, you will need to split it into 3 files.
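The figure of 3 files comes from rounding 28 / 10 up. Below is a minimal sketch for splitting a JSONL file on line boundaries so that each part stays under a byte limit. Because lines are never cut, each part remains valid JSONL. Greedy packing can occasionally need one more part than the rounded-up ratio. `assign_parts` and `split_jsonl_file` are hypothetical helpers, and the `.partN` file naming is an arbitrary choice.

```python
from typing import Iterable, Iterator, Optional, BinaryIO, Tuple


def assign_parts(lines: Iterable[bytes], max_bytes: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_index, line) so that each part's total size stays <= max_bytes.

    A single line longer than max_bytes cannot fit in any part and raises ValueError.
    """
    part, used = 0, 0
    for line in lines:
        if len(line) > max_bytes:
            raise ValueError(f"a single line of {len(line)} bytes exceeds max_bytes")
        if used + len(line) > max_bytes:
            part, used = part + 1, 0  # current part is full: start the next one
        used += len(line)
        yield part, line


def split_jsonl_file(path: str, max_bytes: int) -> int:
    """Stream `path` into path.part0, path.part1, ... and return the number of parts."""
    out: Optional[BinaryIO] = None
    current = -1
    with open(path, "rb") as src:
        try:
            for part, line in assign_parts(src, max_bytes):
                if part != current:
                    if out is not None:
                        out.close()
                    out = open(f"{path}.part{part}", "wb")
                    current = part
                out.write(line)
        finally:
            if out is not None:
                out.close()
    return current + 1
```

For example, `split_jsonl_file("documents.jsonl", 9 * 1024**3)` (placeholder file name) caps each part at 9 GiB, about 9.7 GB. That leaves some headroom under the 10 GB limit. Each part is then sent in its own curl import request.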
Similar Threads
Issues with Importing Typesense Collection to Different Server
Kevin had problems migrating a Typesense collection between Docusaurus sites on different machines. Jason advised them on the JSONL format, server hosting, and creating a collection schema before importing documents, which led to a successful import.
Troubleshooting Write Timeouts in Typesense with Large CSVs
Agustin was getting write timeouts from Typesense while loading large CSV files. Kishore Nallan suggested chunking the data or converting it to JSONL before loading. Through troubleshooting, they identified a possible network problem at AWS and found a workaround.
Bulk Import 50MB JSON Files Error - Timeout and Solutions
madhweep encountered an error while bulk importing JSON files. Kishore Nallan provided help, but the issue persisted. Jason stepped in, and after troubleshooting they concluded that the cluster had run out of memory. The problem was resolved by moving to a cluster with sufficient memory. Daniel experienced a similar issue, which was resolved by increasing the timeout.
Troubleshooting Typesense Document Import Error
Christopher had trouble importing 2.1M documents into Typesense due to memory errors. Jason clarified the system requirements, explaining the relationship between RAM and dataset size and ways to tackle the issue. They also discussed database-like query options.
Discussion on Document Inserting Speed and Process
David asked about document insertion speed, and Jason provided reference values and recommended sending more documents per API call. David and Chetan both acknowledged the suggestions, and David planned to report back on the results.