Errors in Batch Import with Typesense and OpenAI API
TL;DR: Gustavo hit errors when importing documents into a collection with a built-in embedding field. After discussion with Jason, they concluded that the issue stemmed from how the OpenAI API handles batch requests that contain a problematic document (here, one whose embedding source field was an empty array), and they suggested improvements to Typesense's error messages and error handling.
Jun 15, 2023
Gustavo (08:50 PM): "The server had an error while processing your request. Sorry about that!" I suspect the error is related to the fact that the collection has a built-in embedding field. Cluster: v601y2x3upjea4tip
Gustavo (08:53 PM):
{
  "error": {
    "message": "The server had an error while processing your request. Sorry about that!",
    "type": "server_error",
    "param": null,
    "code": null
  }
}
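The payload above reports "type": "server_error", which is commonly treated as a transient failure on OpenAI's side. A minimal client-side sketch, assuming the caller can get at the raw error body: retry only when the error type is server_error, and fail fast on anything else (such as the invalid_request_error that appears later in this thread). `EmbeddingError`, `is_retryable`, and `with_retries` are hypothetical names, not part of any Typesense or OpenAI client.

```python
import json
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error types treated as transient; "server_error" is the type in the payload above.
RETRYABLE_TYPES = {"server_error"}


class EmbeddingError(Exception):
    """Hypothetical exception carrying the raw JSON error body."""

    def __init__(self, body: str) -> None:
        super().__init__(body)
        self.body = body


def is_retryable(error_body: str) -> bool:
    """Return True if an OpenAI-style error body reports a transient error type."""
    try:
        payload = json.loads(error_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return isinstance(error, dict) and error.get("type") in RETRYABLE_TYPES


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Run call(), retrying with exponential backoff only on retryable errors."""
    for attempt in range(attempts):
        try:
            return call()
        except EmbeddingError as exc:
            # Give up on the last attempt or on errors that retrying won't fix.
            if attempt == attempts - 1 or not is_retryable(exc.body):
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```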
Gustavo (08:58 PM): "action: upsert" doesn't work because of that error where Typesense sends an empty string to OpenAI.
Gustavo (08:58 PM): … "action: create" and just ignore errors saying the document already exists.
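The create-and-ignore workaround could look like the following sketch. It assumes the import response has been split into one JSON object per document (Typesense import responses are JSON lines with a success flag and, on failure, an error message) and treats any error containing "already exists" as an expected duplicate. The function name and the substring match are illustrative assumptions, not Typesense's documented error contract.

```python
import json
from typing import Iterable, List


def unexpected_failures(result_lines: Iterable[str]) -> List[dict]:
    """Return failed import results, skipping duplicate-id errors.

    Assumes each line is one JSON object from a Typesense JSONL import response,
    such as {"success": false, "error": "..."}. Matching on "already exists" is a
    heuristic based on the error text described in this thread.
    """
    failures = []
    for line in result_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        result = json.loads(line)
        if result.get("success"):
            continue
        if "already exists" in str(result.get("error", "")):
            continue  # expected when re-running "action: create" over existing ids
        failures.append(result)
    return failures
```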
Jason (09:03 PM): … id:=[a, b, c, d, ...]
Gustavo (09:17 PM): … "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference." even with the workaround of deleting the document and recreating it.
Gustavo (09:19 PM): "0 documents imported successfully, 100 documents failed during import."
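Because one bad document rejects the whole batch of 100, one way to find the culprit without guessing is to bisect the batch. This is a generic technique, not something proposed in the thread, and the sketch assumes (as observed here) that a batch either imports completely or fails completely; `import_batch` is a hypothetical stand-in for whatever wrapper calls the import endpoint.

```python
from typing import Callable, List, Sequence, TypeVar

Doc = TypeVar("Doc")


def isolate_failures(
    docs: Sequence[Doc],
    import_batch: Callable[[Sequence[Doc]], bool],
) -> List[Doc]:
    """Split a rejected batch in halves until each failing document is alone.

    `import_batch` is a hypothetical callable that returns True if the whole
    batch was imported and False if it was rejected as a unit. Sub-batches that
    succeed are imported as a side effect and are not retried.
    """
    if not docs:
        return []
    if import_batch(docs):
        return []
    if len(docs) == 1:
        return [docs[0]]
    mid = len(docs) // 2
    return isolate_failures(docs[:mid], import_batch) + isolate_failures(docs[mid:], import_batch)
```

Sub-batches that succeed are imported once and never retried, so the good documents end up indexed; the number of extra import calls grows roughly with k·log2(n) for k bad documents in a batch of n.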
Gustavo (09:52 PM): … delete and the retry in this code. Other than that, it's the same code: https://typesense-community.slack.com/archives/C01P749MET0/p1686864054410659?thread_ts=1686862232.273089&cid=C01P749MET0
Gustavo (10:33 PM):
1. … "server_error" from my first message.
2. I found that the "invalid_request_error" happening in the 100th batch occurs because Typesense is trying to generate the embedding from a field that is an empty array.
3. There's only a single document like that, but the whole batch fails, saying "0 documents imported successfully, 100 documents failed during import".
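Given finding 2, a client could screen documents before import and set aside those with nothing to embed, so one empty array no longer sinks the other 99. A sketch, assuming a document is only problematic when every embedding source field is missing, blank, or an empty array (the thread only confirms the empty-array case); `split_embeddable` and the field names `tags` and `title` are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Document = Dict[str, Any]


def _has_text(value: Any) -> bool:
    """True for a non-blank string, or a list containing one."""
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, list):
        return any(_has_text(item) for item in value)
    return False


def split_embeddable(
    docs: Sequence[Document], source_fields: Sequence[str]
) -> Tuple[List[Document], List[Document]]:
    """Separate documents that have some text to embed from those that have none."""
    ready: List[Document] = []
    empty: List[Document] = []
    for doc in docs:
        # A document is importable if any source field yields non-blank text.
        target = ready if any(_has_text(doc.get(f)) for f in source_fields) else empty
        target.append(doc)
    return ready, empty
```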
Gustavo (10:40 PM): … ([0, 0, ...]?) in that case instead of crashing.
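Gustavo's suggestion, substituting a zero vector for an empty input instead of failing, can be sketched outside Typesense as a wrapper around any batch embedding function. `embed_with_zero_fallback` and the `embed` callable are hypothetical; this only illustrates the proposed behavior and is not how Typesense implements embedding.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_with_zero_fallback(
    texts: Sequence[str],
    embed: Callable[[List[str]], List[Vector]],
    dims: int,
) -> List[Vector]:
    """Embed non-blank texts; give blank ones a zero vector instead of failing."""
    keep = [i for i, text in enumerate(texts) if text.strip()]
    # Only non-blank inputs are sent to the embedding function.
    vectors = embed([texts[i] for i in keep]) if keep else []
    if len(vectors) != len(keep):
        raise ValueError("embed() returned the wrong number of vectors")
    result: List[Vector] = [[0.0] * dims for _ in texts]
    for i, vector in zip(keep, vectors):
        result[i] = vector
    return result
```

One trade-off: a zero vector has no direction, so cosine similarity against it is undefined (its norm is zero), which is an argument for leaving such documents without an embedding instead.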
Jun 16, 2023
Jason (12:06 AM): … null