#community-help

Resolving Typesense Documents Import Error

TLDR Aljosa experienced an error while using typesense documents().import(), related to handling of large document arrays. Jason clarified that batch_size controls server-side batching, not client-side. He advised splitting arrays to address the issue and committed to elaborating its functionality in the docs. Aljosa proposed amending the TypeScript types to accommodate batch_size in the import options.

Solved
Nov 08, 2021 (26 months ago)
Aljosa
10:46 PM
Hey good evening 👋 - I was getting this exact error when using documents().import() with an array of documents https://typesense-community.slack.com/archives/C01P749MET0/p1626107101113000

With typesense.js 0.14.0 I was using batch_size as an option as described in the doc but in 1.0.0 with typings, batch_size is not an accepted option. Anyways, reducing it to 1 or using the default 40 didn't matter. The only way I was able to resolve it was by literally splitting the array in half and doing two imports one after the other
Aljosa
10:49 PM
This was an array of ~10k documents by the way. It feels like batches are maybe not handled correctly when using an array instead of JSONL... but no idea overall
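The workaround described above (splitting the array and running the imports one after the other) can be sketched roughly as follows. This is only an illustration: `splitIntoChunks`, `importSequentially`, and the `importFn` callback are hypothetical names, and `importFn` stands in for whatever call actually sends a batch, such as `documents().import(batch)`.

```typescript
type Doc = Record<string, unknown>;

// Hypothetical stand-in for the call that sends one batch to the server,
// e.g. a wrapper around documents().import(batch).
type ImportFn = (batch: Doc[]) => Promise<unknown>;

// Split an array into consecutive chunks of at most `chunkSize` items.
function splitIntoChunks<T>(items: readonly T[], chunkSize: number): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Import the chunks strictly one after the other and return how many
// batches were sent.
async function importSequentially(
  docs: readonly Doc[],
  chunkSize: number,
  importFn: ImportFn
): Promise<number> {
  let batches = 0;
  for (const chunk of splitIntoChunks(docs, chunkSize)) {
    await importFn(chunk); // wait for this batch before sending the next
    batches += 1;
  }
  return batches;
}
```

Splitting the array in half corresponds to a `chunkSize` of roughly half the array length; a smaller fixed size keeps each serialized payload smaller at the cost of more requests.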
Jason
11:04 PM
Aljosa That particular error actually comes from node, and not Typesense. It's essentially saying there are too many properties inside each object, for converting the entire array of objects into a JSON string.

The solution for such large imports would be to convert to JSONL, and then send that JSONL string into the import method and you won't run into this issue.

Now the batch_size parameter you mention is actually a Typesense server parameter which does something different: server-side batching. After every X documents imported, it will pause, look at the search request queue, service those requests, and then get back to importing.
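A minimal sketch of the array-to-JSONL conversion mentioned above, assuming plain JSON-serializable documents. `toJsonl` is a hypothetical helper name, not part of typesense-js; it serializes each document on its own so that no single `JSON.stringify` call has to handle the whole array.

```typescript
// Hypothetical helper (not part of typesense-js): build a JSONL string with
// one JSON document per line.
function toJsonl(documents: ReadonlyArray<Record<string, unknown>>): string {
  const lines: string[] = [];
  for (const doc of documents) {
    // JSON.stringify escapes newlines inside string values, so each
    // document always ends up on exactly one line.
    lines.push(JSON.stringify(doc));
  }
  return lines.join("\n");
}

const jsonl = toJsonl([
  { id: "1", title: "first" },
  { id: "2", title: "second" },
]);
```

The resulting string is what would then be passed to the import method in place of the array.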
Aljosa
11:06 PM
Hey Jason, thanks for the answer!

I see that actually in typesense js it's converted to JSONL anyways https://github.com/typesense/typesense-js/blob/a21d4101bc21fe59e0e85b41e64ba14d6fe88667/src/Typesense/Documents.ts#L112
Jason
11:07 PM
Yup, Typesense Server only accepts JSONL for import, so the client converts arrays to JSONL before making the API call
Aljosa
11:07 PM
As for the batch_size parameter, I believe this is a documentation bug then ?
Jason
11:09 PM
> Yup, Typesense Server only accepts JSONL for import, so the client converts arrays to JSONL before making the API call
Also notice how that method calls JSON.stringify on the entire array object. That's what causes the issue. One thing we could do is to split large arrays into smaller ones, then call JSON.stringify on them individually and then concat them together, so users of the client library don't have to do this themselves...
Jason
11:10 PM
> As for the batch_size parameter, I believe this is a documentation bug then ?
The parameter still works from Typesense Server's perspective, we need to add it to the TypeScript types and clarify in the docs what exactly it means. It doesn't control client-side batching, only server-side batching
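To illustrate what "server parameter" means here, the sketch below builds the request path for an import with batch_size passed as a query parameter. It assumes the import endpoint has the form `/collections/<name>/documents/import`; `buildImportPath` and the collection name `books` are hypothetical, and this is not how typesense-js itself builds requests.

```typescript
// Hypothetical helper: append import options as query parameters.
// batch_size is interpreted by the server; it does not split the payload
// on the client.
function buildImportPath(
  collection: string,
  params: Record<string, string | number>
): string {
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(params[key]))}`)
    .join("&");
  const base = `/collections/${encodeURIComponent(collection)}/documents/import`;
  return query ? `${base}?${query}` : base;
}

const importPath = buildImportPath("books", { action: "upsert", batch_size: 100 });
```

Whatever value batch_size has, the request body is still the full JSONL payload, which is why changing it did not affect the client-side error.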
Jason
11:13 PM
Aljosa
11:15 PM
> Also notice how that method calls JSON.stringify on the entire array object. That's what causes the issue. One thing we could do is to split large arrays into smaller ones, then call JSON.stringify on them individually and then concat them together, so users of the client library don't have to do this themselves...
I guess I must be close to the limit with the additional manipulations I do on the raw json I post to my server since I'm able to split the json using the same .map() used in typesense js

> The parameter still works from Typesense Server's perspective, we need to add it to the TypeScript types and clarify in the docs what exactly it means. It doesn't control client-side batching, only server-side batching
Understood 🙂 , I know what you mean now with regard to my initial question being about batching before sending.

And I appreciate the issue having been created! Will batch_size actually be sent correctly then to the server if I add it to the options of the import() call ?
Aljosa
11:15 PM
TypeScript won't like it but I'll modify the typings
Jason
11:17 PM
> Will batch_size actually be sent correctly then to the server if I add it to the options of the import() call ?
Yup, it should be sent. Do you want to create a PR adding this to the types?
Jason
11:18 PM
batch_size is only for the import method though, so you'd probably have to create a new interface that extends DocumentWriteParameters and add batch_size to that
Aljosa
11:20 PM
Sure thing, what would you call it ? DocumentImportParameters ?
Jason
11:24 PM
Yeah that sounds good
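The typing change discussed above might look roughly like the sketch below. The `DocumentWriteParameters` shown here is a simplified stand-in so the example compiles on its own; the real interface in typesense-js may have different fields, and the shape that eventually landed in the library may differ from this.

```typescript
// Simplified stand-in for the library's DocumentWriteParameters.
// Assumption: the real interface may contain other or different fields.
interface DocumentWriteParameters {
  action?: "create" | "update" | "upsert";
}

// Name agreed on in the thread. batch_size only applies to import, so it
// lives on a separate interface instead of the shared write parameters.
interface DocumentImportParameters extends DocumentWriteParameters {
  batch_size?: number;
}

const importOptions: DocumentImportParameters = {
  action: "upsert",
  batch_size: 100,
};
```

Keeping batch_size off the base interface means other write methods would still reject it at compile time.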

