Troubleshooting Write Timeouts in Typesense with Large CSVs
TLDR Agustin was getting write timeouts from Typesense while loading large CSV files. Kishore Nallan suggested chunking the data or converting it to JSONL before loading. Through troubleshooting, they identified a possible network problem at AWS and found a workaround.
Jun 11, 2021
Agustin
01:51 AM
For this purpose, I created a Python script that loads a huge CSV file, cleans it, splits it into chunks, and imports the records in parallel through the Typesense API. The first few thousand records load correctly, but seconds later I start getting write timeouts from the Typesense library:
ConnectionError: ('Connection aborted.', timeout('The write operation timed out'))
I tried retrying the failing requests, but I can't even seem to catch the exceptions in the import_ function. My instance should have more than enough memory and CPU to handle everything (10 cores / 20 GB RAM for an 8 GB dataset). Any ideas?
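A sketch of one way to make those failures catchable, assuming the chunks are imported from a thread pool: concurrent.futures stores an exception raised inside a worker and re-raises it when future.result() is called, so that is where the main thread can catch and record it. import_chunk is a hypothetical callable standing in for whatever wraps the client's import_ call. The retry helper treats any OSError as retryable; the ConnectionError in the traceback above appears to come from the requests library, whose exceptions derive from OSError. The attempt count, backoff delays, and worker limit are illustrative assumptions, not values from this thread.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def with_retries(fn, chunk, attempts=5, base_delay=0.5):
    """Call fn(chunk), retrying on OSError with exponential backoff plus jitter.

    requests.exceptions.ConnectionError derives from OSError, so a
    "write operation timed out" error from the requests library is retried.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn(chunk)
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            # Waits base_delay, 2*base_delay, 4*base_delay, ... plus up to 100 ms
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))


def import_in_parallel(import_chunk, chunks, max_workers=4, attempts=5, base_delay=0.5):
    """Import chunks concurrently; return (results, failures) as (chunk index, value) pairs.

    An exception raised inside a worker thread is re-raised by future.result(),
    which is where it becomes catchable in the calling thread.
    """
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(with_retries, import_chunk, chunk, attempts, base_delay): index
            for index, chunk in enumerate(chunks)
        }
        for future in as_completed(futures):
            index = futures[future]
            try:
                results.append((index, future.result()))
            except OSError as exc:
                failures.append((index, exc))
    # as_completed yields in completion order; restore chunk order
    results.sort(key=lambda pair: pair[0])
    failures.sort(key=lambda pair: pair[0])
    return results, failures
```

Chunks that still fail after the last attempt end up in failures with their index, so they can be re-queued or written aside for a later pass instead of disappearing inside a worker.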
Kishore Nallan
02:23 AM
2. Are you using the import API? If so, you don't have to parallelize the writes: the API itself has a batching parameter that lets you send large volumes of data in.
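On the client side, following that advice only requires cutting the source into reasonably sized requests and sending them one after another; the server then indexes each request in internal batches, whose size the Typesense import documentation exposes as the batch_size query parameter. A minimal, dependency-free sketch of the chunking half; chunked is a hypothetical helper name and the chunk size is an assumption to tune against how long each request takes.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, consuming the input lazily."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))  # pulls at most `size` items
        if not chunk:
            return
        yield chunk
```

With the Python client, each chunk would then be one sequential call such as typesense_client.collections['collection'].documents.import_(batch, {'action': 'upsert', 'batch_size': 100}), assuming the params dict is forwarded as query parameters; 100 is an illustrative value.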
Agustin
03:07 AM
2. Because the dataset is bigger than local memory (> 8 GB), I can't load it all at once, so I use Dask DataFrames to load it out-of-core and operate on chunks.
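Agustin used Dask for the out-of-core reading. As a dependency-free alternative (a swapped-in technique, not what was used in the thread), Python's csv module can stream a file in fixed-size row chunks so that only one chunk is in memory at a time. csv returns every value as a string, so numeric fields still need converting before they are sent to Typesense. read_csv_in_chunks is a hypothetical name.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def read_csv_in_chunks(f: TextIO, rows_per_chunk: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of row dicts, holding at most one chunk in memory at a time."""
    reader = csv.DictReader(f)  # first row is treated as the header
    while True:
        chunk = list(islice(reader, rows_per_chunk))
        if not chunk:
            return
        yield chunk
```

Typical use would be opening the file with open(path, newline="", encoding="utf-8") and iterating for chunk in read_csv_in_chunks(f, 50_000), where the chunk size is an assumption.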
Kishore Nallan
03:21 AM
response_abort called in the logs though.
Kishore Nallan
03:24 AM
It will depend on the HTTP client used; it should be smart enough. Otherwise, just use curl.
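Converting the CSV to JSONL once, up front, is what makes the curl route practical, since the import endpoint expects one JSON document per line. A minimal streaming sketch, assuming the CSV header names match the collection's field names; converters is a hypothetical parameter for casting numeric columns, and id is left as a string because Typesense document ids are strings.

```python
import csv
import json
from typing import Callable, Dict, Optional, TextIO


def csv_to_jsonl(src: TextIO, dst: TextIO,
                 converters: Optional[Dict[str, Callable[[str], object]]] = None) -> int:
    """Stream CSV rows from src to dst as JSON Lines; return the number of rows written.

    converters maps column names to functions (e.g. int, float) so numeric
    fields are not emitted as strings; other columns are written unchanged.
    """
    converters = converters or {}
    count = 0
    for row in csv.DictReader(src):
        doc = {k: converters[k](v) if k in converters else v for k, v in row.items()}
        dst.write(json.dumps(doc, ensure_ascii=False) + "\n")
        count += 1
    return count
```

For example, csv_to_jsonl(src, dst, {"amount": float}), with a hypothetical amount column and both files opened in text mode (the CSV with newline=""), writes a file that curl's --data-binary @file can upload as-is.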
Ricardo
09:20 AM
typesense_client.collections['collection'].documents.import_(documents, {'action': 'upsert'})
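Because the import endpoint reports a result for each document instead of failing the whole request, checking those results is worthwhile with action upsert as well. A minimal sketch, assuming (as in the typesense-python client) that import_ given a list returns one dict per document with a success flag, and that the raw HTTP body returned to curl holds the same objects, one per line; summarize_import is a hypothetical helper.

```python
import json
from typing import Iterable, List, Tuple, Union


def summarize_import(response: Union[str, Iterable[dict]]) -> Tuple[int, List[dict]]:
    """Count successful documents and collect the failed entries of an import response.

    Accepts either the raw JSONL body from the HTTP endpoint or an
    already-parsed list of per-document dicts.
    """
    if isinstance(response, str):
        entries = [json.loads(line) for line in response.splitlines() if line.strip()]
    else:
        entries = list(response)
    failures = [e for e in entries if not e.get("success")]
    return len(entries) - len(failures), failures
```

The failed entries can be logged or re-imported separately, which keeps a few bad rows from being mistaken for a partially failed request.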
Jun 13, 2021
Kishore Nallan
02:21 AM
a) How many lines does the generated JSONL file contain?
b) What exact curl command are you using?
c) How long does the curl command run before getting this error?
d) After the curl command fails, how many records were imported successfully into Typesense?
Agustin
02:28 AM
b)
curl -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" -X POST --data-binary @$FILE \
"http://$ENDPOINT/collections/revenue_entry/documents/import?action=create"
c) 1m 15s
d) Around 70k
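Since the single curl request ran for about 1m 15s and stopped after roughly 70k records, one way to apply Kishore's chunking suggestion to the file route is to split the JSONL into smaller files and send one request per file. A minimal sketch; split_jsonl, the part-file naming, and the part size are assumptions, and this is not necessarily the workaround the thread eventually settled on.

```python
import os
from typing import List


def split_jsonl(path: str, out_dir: str, lines_per_part: int) -> List[str]:
    """Split a JSONL file into numbered part files of at most lines_per_part lines.

    Reads line by line, so the source file never has to fit in memory.
    Returns the paths of the part files written, in order.
    """
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    parts: List[str] = []
    out = None
    try:
        with open(path, encoding="utf-8") as src:
            for i, line in enumerate(src):
                if i % lines_per_part == 0:  # start a new part file
                    if out is not None:
                        out.close()
                    part_path = os.path.join(out_dir, f"part-{len(parts):05d}.jsonl")
                    out = open(part_path, "w", encoding="utf-8")
                    parts.append(part_path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

Each part can then be uploaded by pointing FILE in the curl command above at one part path at a time, so a timeout costs at most one part rather than the whole file.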