Resolving Typesense Documents Import Error
TLDR Aljosa hit an error when passing a large array of documents to typesense-js documents().import(). Jason clarified that batch_size controls server-side batching, not client-side batching. He suggested splitting large arrays to work around the issue and committed to clarifying in the docs what the parameter does. Jason also asked Aljosa to open a PR adding batch_size to the TypeScript types for the import options.
Nov 08, 2021
Aljosa
10:46 PM
With typesense.js 0.14.0 I was using batch_size as an option, as described in the docs, but in 1.0.0, with the TypeScript typings, batch_size is not an accepted option. Anyway, reducing it to 1 or using the default of 40 didn't make a difference. The only way I was able to resolve it was by literally splitting the array in half and doing two imports, one after the other.
Aljosa
10:49 PM
Jason
11:04 PM
The solution for such large imports would be to convert the documents to JSONL yourself and then pass that JSONL string to the import method; that way you won't run into this issue.
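A minimal sketch of the JSONL conversion Jason describes, assuming the documents are plain JSON-serializable objects: each document is serialized on its own and the results are joined with newlines. The Doc type and the toJsonl helper are illustrative only, not part of typesense-js.

```typescript
// Illustrative document type; real collections define their own fields.
type Doc = Record<string, unknown>;

// Hypothetical helper (not a typesense-js API): serialize each document
// individually and join the lines with "\n" to form a JSONL string.
function toJsonl(docs: Doc[]): string {
  return docs.map((doc) => JSON.stringify(doc)).join("\n");
}

const jsonl = toJsonl([
  { id: "1", title: "First" },
  { id: "2", title: "Second" },
]);
console.log(jsonl);
```

The resulting string is what would be passed to the import method in place of the array.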
Now, the batch_size parameter you mention is actually a Typesense Server parameter that does something different: server-side batching. After every X documents imported, the server pauses, services any queued search requests, and then gets back to importing.
Aljosa
11:06 PM
I see that typesense-js actually converts it to JSONL anyway: https://github.com/typesense/typesense-js/blob/a21d4101bc21fe59e0e85b41e64ba14d6fe88667/src/Typesense/Documents.ts#L112
Aljosa
11:06 PM
Jason
11:07 PM
Aljosa
11:07 PM
Regarding the batch_size parameter, I believe this is a documentation bug then?
Jason
11:09 PM
Also notice how that method calls JSON.stringify on the entire array object. That's what causes the issue. One thing we could do is split large arrays into smaller ones, call JSON.stringify on them individually, and then concatenate the results, so that users of the client library don't have to do this themselves...
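Jason's suggestion, and Aljosa's split-in-half workaround, can be sketched as follows, assuming the documents are plain JSON-serializable objects. chunk, toJsonlChunks and importInChunks are hypothetical helpers, and importFn is an assumed stand-in for whatever call actually sends a JSONL string to Typesense (for example a wrapper around documents().import()).

```typescript
type Doc = Record<string, unknown>;

// Split an array into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Stringify each chunk separately, so no single JSON.stringify call
// ever sees the whole array; one JSONL string is produced per chunk.
function toJsonlChunks(docs: Doc[], chunkSize: number): string[] {
  return chunk(docs, chunkSize).map((part) =>
    part.map((doc) => JSON.stringify(doc)).join("\n"),
  );
}

// Send the chunks one after the other. `importFn` is an assumption
// standing in for the real client call.
async function importInChunks(
  docs: Doc[],
  chunkSize: number,
  importFn: (jsonl: string) => Promise<unknown>,
): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const jsonl of toJsonlChunks(docs, chunkSize)) {
    results.push(await importFn(jsonl)); // sequential: one chunk in flight at a time
  }
  return results;
}
```

Joining the per-chunk strings with "\n" gives the single concatenated payload Jason describes, while handing them to importFn one at a time mirrors the sequential imports Aljosa used. The chunk size here is a client-side choice and is unrelated to the server-side batch_size.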
Jason
11:10 PM
The parameter still works from Typesense Server's perspective; we need to add it to the TypeScript types and clarify in the docs what exactly it means. It doesn't control client-side batching, only server-side batching.
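The client-side versus server-side distinction can be pictured with a toy model. This is an assumption for illustration, not Typesense Server's code: the client sends the whole payload regardless, and batch_size only decides how often the server stops importing to service searches that queued up in the meantime. simulateServerImport and SearchArrival are hypothetical names.

```typescript
// Toy model only: mimics the behaviour described in this thread,
// not Typesense Server's actual implementation.
interface SearchArrival {
  atDoc: number; // index of the document being imported when the search arrives
  query: string;
}

function simulateServerImport(
  docCount: number,
  batchSize: number,
  arrivals: SearchArrival[],
): string[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const log: string[] = [];
  for (let start = 0; start < docCount; start += batchSize) {
    const end = Math.min(start + batchSize, docCount); // exclusive bound
    log.push(`import ${start}-${end - 1}`);
    // Searches that arrived during this batch have been waiting;
    // serve them before the next batch starts.
    for (const search of arrivals) {
      if (search.atDoc >= start && search.atDoc < end) {
        log.push(`search "${search.query}"`);
      }
    }
  }
  return log;
}

const searches = [{ atDoc: 0, query: "a" }, { atDoc: 3, query: "b" }];
console.log(simulateServerImport(5, 2, searches));
console.log(simulateServerImport(5, 5, searches));
```

In the model, with a batch size of 2 search "a" is served after two documents, while with 5 it waits until all five are imported. Nothing in it shrinks the payload the client sends, which is consistent with changing batch_size not helping with Aljosa's client-side error.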
Jason
11:13 PM
Aljosa
11:15 PM
I guess I must be close to the limit because of the additional manipulations I do on the raw JSON I post to my server, since I'm able to split the JSON using the same .map() used in typesense-js.
> The parameter still works from Typesense Server's perspective, we need to add it to Typescript types and clarify in the docs what it exactly means. It doesn't control client-side batching, only server-side batching
Understood 🙂, I know what you mean now: my initial question was about batching before sending.
And I appreciate the issue having been created! Will batch_size actually be sent correctly to the server if I add it to the options of the import() call?
Aljosa
11:15 PM
Jason
11:17 PM
Yup, it should be sent. Do you want to create a PR adding this to the types?
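Because batch_size is read by the server, it only has an effect if it reaches the server as part of the import request. The sketch below builds that request by hand, assuming Typesense's POST /collections/{collection}/documents/import endpoint with action and batch_size as query parameters, a JSONL body and the X-TYPESENSE-API-KEY header. ImportParams, ImportRequest and buildImportRequest are hypothetical; ImportParams only illustrates the kind of typing discussed here and is not the typesense-js definition.

```typescript
// Hypothetical typing of the import options (not the typesense-js definition).
interface ImportParams {
  action?: "create" | "upsert" | "update" | "emplace";
  batch_size?: number; // interpreted by the server, not by the client
}

interface ImportRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) an import request. batch_size only becomes a
// query parameter; the JSONL body is identical whatever its value.
function buildImportRequest(
  baseUrl: string,
  collection: string,
  apiKey: string,
  jsonl: string,
  params: ImportParams = {},
): ImportRequest {
  const query = Object.entries(params)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
  const path = `/collections/${encodeURIComponent(collection)}/documents/import`;
  return {
    method: "POST",
    url: `${baseUrl}${path}${query ? `?${query}` : ""}`,
    // Content-Type is an assumption: JSONL is sent as plain text.
    headers: { "X-TYPESENSE-API-KEY": apiKey, "Content-Type": "text/plain" },
    body: jsonl,
  };
}
```

The returned object could be handed to an HTTP client, for example fetch(req.url, req) where fetch is available.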
Jason
11:18 PM
DocumentWriteParameters, and add batch_size to that
Aljosa
11:20 PM
DocumentImportParameters?
Jason
11:24 PM
Similar Threads
Revisiting Typesense for Efficient DB Indexing and Querying
kopach experienced slow indexing and crashes with Typesense. The community suggested using batch import and checking the server's resources. Improvements were made, but additional support was needed for special characters and multi-search queries.
Troubleshooting Typesense Document Import Error
Christopher had trouble importing 2.1M documents into Typesense due to memory errors. Jason clarified the system requirements, explaining the correlation between RAM and dataset size, and ways to tackle the issue. They both also discussed database-like query options.
Troubleshooting Indexing Duration in Typesense Import
Alan asked about lengthy indexing times for importing documents to Typesense. Jason suggested various potential causes, including network connectivity and system resources. They later identified the problem to be an error in Alan's code.
Typesense Server Bulk Import/Upsert Issue Resolved
Adam was confused about the discrepancy between the successful responses and the actual indexed data while working with a custom WP plugin integrating with Typesense. The issue was a bug related to fetching documents in the wrong order, not a Typesense problem.
Errors in Batch Import with Typesense and OpenAI API
Gustavo encountered errors when importing documents into a collection. After discussion with Jason, it was concluded that the issue stemmed from OpenAI API's handling of batch requests with problematic documents, and improvements to Typesense's error messages and handling were suggested.