#community-help

Resolving Excessive Document Addition and Import Errors

TLDR Rushil saw duplicate documents being added during imports and, later, partial import failures. Kishore Nallan advised checking the client's retry and timeout configuration, reducing the batch size, and inspecting the import response for per-document errors. Rushil resolved the issues by following these suggestions.


Nov 01, 2022
Rushil
08:36 AM
I added 90k documents and it kept on adding until around 300k.
Kishore Nallan
08:36 AM
Check if your client has retry configured. If your timeout is too low and a retry kicks in because of that, the client will send the same data again.
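For context, a minimal sketch of that configuration with the typesense-js client (the host, API key, and option values here are placeholders, and exact option names can vary by client version):

```ts
import Typesense from 'typesense';

// A generous timeout plus retries disabled, so a slow bulk import
// can't trigger the client to re-send the same batch.
const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'xyz',                 // placeholder API key
  connectionTimeoutSeconds: 600, // long enough for large import batches
  numRetries: 0,                 // no retries, so a timeout can't duplicate writes
});
```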
Rushil
08:52 AM
Oh okay, sure. I adjusted the batch size to 1000 and it seems to be going better.
Kishore Nallan
09:04 AM
Yeah, that's because a smaller batch size doesn't time out.
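A sketch of that batching approach, reusing the client from the sketch above (the collection name `items` is a placeholder):

```ts
// Import documents in chunks of 1000 so each request finishes
// comfortably within the client timeout.
async function importInBatches(docs: object[], batchSize = 1000) {
  for (let i = 0; i < docs.length; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    await client.collections('items').documents().import(batch, { action: 'upsert' });
  }
}
```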
Rushil
09:12 AM
It still seems to be overdoing it, and when I try to delete it doesn't work.
09:13 AM
Any ideas what to do? It worked fine with another set where there were fewer facets.
Kishore Nallan
09:29 AM
Did you try increasing the client timeout first?
Rushil
11:38 AM
I resolved that problem by giving each item a unique id, but now the issue is that only 88k of the 94k documents are uploading.
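The fix Rushil describes amounts to setting an explicit `id` on every document before importing: Typesense treats the `id` field as the document's primary key, so a re-sent batch upserts the same documents instead of creating duplicates. A sketch, where `rows` and its `sku` field are hypothetical stand-ins for the source data:

```ts
// Derive a stable, unique id per document so a retried import
// overwrites rather than duplicates.
const docs = rows.map((row) => ({
  ...row,
  id: String(row.sku), // hypothetical unique field from the source data
}));
```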
Kishore Nallan
11:40 AM
Check the response of the import operation. Each line will have either {"success": true} or the actual error explaining why the document was not imported.
Rushil
11:50 AM
How would I do that in Node if it has to check 94k documents?
Usually it throws an error that it timed out, but the import still goes through.
Kishore Nallan
11:56 AM
We cannot fail an entire import if only a few documents are malformed or do not conform to the expected schema. So the import goes through, but in the result JSON response we highlight which documents failed to import.
11:57 AM
Each line in the response corresponds to the corresponding line in the import, in the same order.
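In other words, when calling the import endpoint directly, the JSONL response can be checked line by line. A sketch, assuming `rawResponse` holds the response body as a string:

```ts
// One JSON object per line, in the same order as the documents sent.
rawResponse
  .trim()
  .split('\n')
  .map((line) => JSON.parse(line))
  .forEach((result, i) => {
    if (!result.success) {
      console.error(`Document on line ${i + 1} failed: ${result.error}`);
    }
  });
```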
Rushil
12:09 PM
ImportError: 88673 documents imported successfully, 5496 documents failed during import. Use error.importResults from the raised exception to get a detailed error reason for each document
This is what I'm getting.
Kishore Nallan
12:16 PM
Yes, you have to loop through error.importResults to get the details. Doesn't that work?
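A sketch of that loop in Node, based on the ImportError message Rushil pasted above (the collection name and document array are placeholders from the earlier sketches):

```ts
try {
  await client.collections('items').documents().import(docs, { action: 'upsert' });
} catch (err: any) {
  // typesense-js raises an ImportError whose importResults array
  // holds one result object per input document, in order.
  const failed = (err.importResults ?? [])
    .map((result: any, index: number) => ({ index, ...result }))
    .filter((r: any) => !r.success);
  console.error(`${failed.length} documents failed to import`);
  for (const f of failed.slice(0, 10)) {
    console.error(`doc #${f.index}: ${f.error}`);
  }
}
```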
Rushil
12:32 PM
I resolved it, thanks! I looked back at a previous solution.

