# community-help
g
When importing documents into a collection, after about 2500 successful imports, I started to get an error saying `The server had an error while processing your request. Sorry about that!`. I suspect the error has some relation to the fact that the collection has a built-in embedding field. Cluster:
v601y2x3upjea4tip
j
Hmm, that doesn’t seem like an error from Typesense…
Are you using a remote embedding service?
g
I'm importing in batches of 100. Connection timeout is set to 3 minutes. Retry interval is set to 5 seconds.
Yes, OpenAI.
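The batch size, timeout, and retry settings described above map onto typesense-js client options roughly like this (a sketch; the host is a placeholder, and the option names come from the `typesense` npm package's client configuration):

```javascript
// Configuration matching the settings described above: a 3-minute
// connection timeout and a 5-second retry interval. This object would
// be passed to `new Typesense.Client(config)` from the `typesense`
// npm package; it's built as a plain object here so nothing needs to
// be installed to inspect it.
const config = {
  nodes: [{ host: 'xyz.a1.typesense.net', port: 443, protocol: 'https' }], // placeholder host
  apiKey: process.env.TYPESENSE_API_KEY || 'placeholder-key',
  connectionTimeoutSeconds: 180, // 3 minutes
  retryIntervalSeconds: 5,       // 5 seconds between retries
};

const BATCH_SIZE = 100; // documents per import call, as described above
```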
j
Could you share the full JSON message that Typesense returns?
I suspect it’s from OpenAI’s API that we just proxy through
g
```json
{
  "error": {
    "message": "The server had an error while processing your request. Sorry about that!",
    "type": "server_error",
    "param": null,
    "code": null
  }
}
```
Yeah, I think it's from OpenAI. Maybe exceeding the rate limit or something.
j
We should probably indicate where the error is originating from, in cases like this
g
And maybe also retry the request to OpenAI's API when it makes sense if it's not already retrying.
j
We didn’t add a retry built-in to Typesense for remote services, to prevent any potential (billing) surprises. So if you see an error message in the API response from Typesense, you want to retry the import on those docs
g
Makes sense
Although...
Importing with `action: upsert` doesn't work because of that error where Typesense sends an empty string to OpenAI. So I guess I'll have to retry with `action: create` and just ignore errors saying the document already exists.
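That retry-with-`create` idea can be sketched as a small helper that inspects a batch import response (the Typesense import API returns one result object per document, in order) and collects only the documents worth re-sending; the "already exists" matching below is a heuristic, not official error handling:

```javascript
// Pair each imported document with its per-document result from a
// Typesense batch import response and collect the ones worth retrying.
// Results look like { success: true } or { success: false, error: "..." },
// one entry per document, in the same order as the input batch.
// "already exists" failures are skipped: with action: create, those
// documents are effectively already done.
function collectFailedDocs(docs, results) {
  const toRetry = [];
  results.forEach((result, i) => {
    if (result.success) return;
    if (/already exists/i.test(result.error || '')) return;
    toRetry.push(docs[i]);
  });
  return toRetry;
}
```

The returned documents would then be re-sent with `action: 'create'` after a short delay, repeating until the list is empty.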
j
Yeah… For now 😞
We’re just about to start addressing all the reported bugs in the last week. Should have something for you to test next week, if no other surprises come up
🤞 1
g
I was trying to delete and recreate instead of upserting, but it puts too much pressure on the server and starts constantly giving me errors when dealing with thousands of documents.
j
Deleting one by one is not as performant as deleting in a batch by query
g
Can I delete using the IDs in a query? Like `id in [a, b, c, d, ...]`
In case it's not clear, I mean, I have the IDs of the documents I want to delete. So I'd need to make a query with those IDs.
j
Yup, `id:=[a, b, c, d, ...]`
g
Gonna try it
How many items can I send in one query to be safe?
j
Since the parameter is sent as a query parameter, it takes a max of 2K characters
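Given that ~2K-character cap, one way to split a large ID list into several delete-by-query calls is to pack IDs greedily until the `id:=[...]` filter string would exceed the budget (a sketch, assuming the limit and filter syntax described above):

```javascript
// Split a list of document IDs into chunks such that the resulting
// filter_by string `id:=[id1, id2, ...]` stays under a character
// budget (Typesense query parameters are capped at ~2K characters).
function chunkIdsForFilter(ids, maxChars = 2000) {
  const overhead = 'id:=[]'.length; // fixed characters around the ID list
  const chunks = [];
  let current = [];
  let length = overhead;
  for (const id of ids) {
    const cost = id.length + (current.length > 0 ? 2 : 0); // ", " separator
    if (current.length > 0 && length + cost > maxChars) {
      chunks.push(current);
      current = [id];
      length = overhead + id.length;
    } else {
      current.push(id);
      length += cost;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk then becomes one delete call, e.g. `client.collections('docs').documents().delete({ filter_by: 'id:=[' + chunk.join(', ') + ']' })`.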
g
Ok, I'll try here and let you know
👍 1
Weirdly, sometimes I get the error `'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.` even doing the workaround of deleting the document and recreating.
I was able to import 9900 documents (very fast using "delete by query") and the next batch gave me a lot of consecutive errors with that message.
It will probably work if I just restart my script skipping the successful batches, but still I'm intrigued by the error. It should only happen when updating a document, not when recreating.
And the error happened in the whole batch: `0 documents imported successfully, 100 documents failed during import.`
The code:
g
I assume it's on creation.
j
I wonder if there’s some API response from OpenAI API that we’re not handling properly
Or we may be passing blank strings somehow
Could you give me a script like this that replicates the $.input error message: https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
g
I'm afraid I won't be able to reproduce with a Bash script because the error only happens when dealing with thousands of documents, and I'm not very good at Bash.
Does a JS repro work for you?
j
Yeah JS works too
g
I'll try to reproduce, let's see
What I know is: when I had the first error, I didn't have the `delete` and the `retry` in this code. Other than that, the same code. https://typesense-community.slack.com/archives/C01P749MET0/p1686864054410659?thread_ts=1686862232.273089&cid=C01P749MET0
I'm trying to reproduce here without success.
Will try one more thing.
BTW, it would help a lot to identify the issue if the error from Typesense included the request that was sent to OpenAI's API.
So, here's what I found:
1. I couldn't reproduce the `server_error` from my first message.
2. I found that the `invalid_request_error` happening in the 100th batch is because Typesense is trying to generate the embedding from a field that's an array and is empty.
3. There's a single document like that, but the whole batch fails saying `0 documents imported successfully, 100 documents failed during import`.
Indeed, if Typesense's error included the request being sent to OpenAI, I'd probably immediately identify the invalid input being sent in that specific document.
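Until the offending input is surfaced in the error, a client-side pre-scan can find the problematic document before importing. A sketch, where `field` is whichever field the collection's embedding is derived from (the field names used in it are up to your schema):

```javascript
// Find documents whose embedding source field would produce an empty
// input for OpenAI: missing, a blank string, or an empty array (the
// empty-array case is the one identified in the findings above).
function findBlankEmbeddingInputs(docs, field) {
  return docs.filter((doc) => {
    const value = doc[field];
    if (value == null) return true;
    if (typeof value === 'string') return value.trim() === '';
    if (Array.isArray(value)) {
      return value.length === 0 || value.every((v) => String(v).trim() === '');
    }
    return false;
  });
}
```

Running this over each batch before the import call would have pinpointed the single bad document (the 75th) without reading through 100 per-document errors.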
j
Ah good idea
Added this to our todo list
👍 1
g
Or maybe not, because the problematic document was the 75th in the batch, so I wouldn't necessarily read all 100 requests/errors and notice that specific one. So one thing that could be improved would be to avoid failing the whole batch and report an error only for the actually problematic document.
j
Hmm, we shouldn’t be failing the whole batch on a single document failure already… Will look into this
👍 1
g
I guess it could be an unhandled error crashing the whole thing in your code.
It's probably sending an empty string or something like that to OpenAI's API, which is the cause of the error, so maybe just assign some sort of null embedding (`[0, 0, ...]`?) in that case instead of crashing.
I mean, assign a null embedding instead of making the request to OpenAI, so it doesn't crash.
But I'm not sure if failing silently is ideal. Just writing some ideas here without too much thought.
j
So it turns out that OpenAI’s API fails the whole API call, even if one of the strings in a batch embedding request has an issue
We make one batch embedding call to OpenAI’s API for all the documents in a Typesense import API call
g
Oh, got it. So there doesn't seem to be a lot you can do. Maybe simply document that behavior.
j
In the upcoming build, we’re going to filter out all blank strings before we send it to openai, so at least that error is avoided. But if there are any other errors, yeah we have to fail the full batch on our side
g
What will the embedding field look like in that case?
j
We’ll set it to `null`
👍 1