#community-help

Resolving Timeout Issues in Bulk Document Imports

TL;DR: Aljosa encountered timeouts when importing 32k documents in a single call. Kishore Nallan advised increasing the client-side timeout. Bruno suggested importing in chunks and retrying individual chunks on failure. Kishore Nallan mentioned that the next release of Typesense makes imports atomic, which should make them more reliable.

Sep 22, 2021 (28 months ago)
Aljosa
03:59 PM
Hey there 🙋‍♂️

I've implemented a rebuild process for all documents when I update a collection schema.

So in my backend, I call the updateSchema endpoint which drops the collection, creates a new one with the schema (I should use aliases but just found out about them) and then retrieves all items from the DB to reindex them with the new schema.

Currently the DB has 32k items but will have millions in production. When I send all 32k using documents().import(), it fails with timeouts. Sending batches of approx 10k to the same call works.

I'm not sure why it makes a difference since the import call already uses batching.

Is there an upper limit to how many documents you can send with import()?
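
Aljosa mentions aliases above as an alternative to dropping and recreating the collection. A rough sketch of that idea, assuming a typesense-js style client: build a new collection under a fresh name, import into it, then repoint the alias. The function name `rebuildWithAlias`, the timestamped naming scheme, and the 404 check on `httpStatus` are assumptions for illustration, not something stated in the thread.

```javascript
// Sketch: rebuild into a fresh collection, then repoint an alias at it.
// `client` is assumed to behave like a typesense-js client; `aliasName`,
// `schema` and `docs` are hypothetical inputs supplied by the caller.
async function rebuildWithAlias(client, aliasName, schema, docs) {
  // Find the collection the alias currently points to, if any.
  let previous = null;
  try {
    previous = (await client.aliases(aliasName).retrieve()).collection_name;
  } catch (err) {
    // Assumption: a missing alias surfaces as an error with httpStatus 404.
    if (err.httpStatus !== 404) throw err;
  }

  // Create a uniquely named collection for this rebuild.
  const newName = `${aliasName}_${Date.now()}`;
  await client.collections().create({ ...schema, name: newName });
  await client.collections(newName).documents().import(docs, { action: "create" });

  // Point the alias at the new collection, then drop the old one.
  await client.aliases().upsert(aliasName, { collection_name: newName });
  if (previous && previous !== newName) {
    await client.collections(previous).delete();
  }
  return newName;
}
```

Searches that go through the alias keep hitting the old collection until the upsert call, so a failed import leaves the live data untouched.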
Kishore Nallan
04:02 PM
👋 I think you are running into a client-side timeout during import. For example, if you use curl it should import the whole set without issues.
Kishore Nallan
04:02 PM
You can increase the timeout in the client configuration.
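
For illustration, the timeout lives in the options passed to the typesense-js client as `connectionTimeoutSeconds`. The sketch below only builds the options object; the host, port, API key, and the 300 second value are placeholders, and a suitable value depends on how large the import payload is.

```javascript
// Sketch of typesense-js client options with a longer timeout for bulk imports.
// Host, port, protocol and API key are placeholders.
const importClientOptions = {
  nodes: [{ host: "localhost", port: 8108, protocol: "http" }],
  apiKey: "xyz",
  // The docs examples use 2 seconds; a large import() call can take far longer.
  connectionTimeoutSeconds: 300,
};

// With the typesense package installed:
// const Typesense = require("typesense");
// const client = new Typesense.Client(importClientOptions);
```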
Bruno
04:35 PM
My two cents: it's probably good practice to send the items in chunks anyway, else you lose the whole import if there's any hiccup whatsoever, and you can implement a retry with upsert.
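
A minimal sketch of the chunk-and-retry approach Bruno describes. The helper names (`chunkDocs`, `importInChunks`), the default chunk size, and the linear backoff are assumptions for illustration. The actual import is passed in as a callback so the sketch does not depend on a particular client setup.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunkDocs(docs, size) {
  const chunks = [];
  for (let i = 0; i < docs.length; i += size) {
    chunks.push(docs.slice(i, i + size));
  }
  return chunks;
}

// Import chunk by chunk, retrying a failed chunk up to `maxAttempts` times.
// `importChunk` is a hypothetical callback, for example:
//   (batch) => client.collections("items").documents().import(batch, { action: "upsert" })
async function importInChunks(
  docs,
  importChunk,
  { chunkSize = 5000, maxAttempts = 3, delayMs = 1000 } = {}
) {
  let imported = 0;
  for (const batch of chunkDocs(docs, chunkSize)) {
    for (let attempt = 1; ; attempt++) {
      try {
        await importChunk(batch);
        imported += batch.length;
        break;
      } catch (err) {
        // Give up on this chunk once the attempts are used up.
        if (attempt >= maxAttempts) throw err;
        // Wait a little longer before each retry.
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  return imported;
}
```

Using the upsert action in the callback means a retried chunk overwrites any documents that were already written before the timeout instead of failing on duplicate ids.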
Aljosa
06:27 PM
Kishore Nallan, makes sense! I have the default 2 seconds from the docs, I'll try to bump that up. Thank you for your advice.

Bruno, understood. I think I'm gonna keep it chunked (I have clear sections in my data anyway that are anywhere from 2k to 20k rows) and add the retry for individual chunks if I ever run into a timeout.

Thank you both

Sep 23, 2021 (28 months ago)
Kishore Nallan
12:29 AM
In the next release of Typesense (which you can already preview via an RC build if you want), we have made imports atomic. So if a client times out or disconnects, partial updates don't creep in. This should make imports more reliable.