Issue with Upsert Duplicating Documents Due to Nested ID
TLDR Ed was getting duplicate documents when importing with the upsert action. Jason explained that `id` must be a top-level key for de-duplication to work.
Aug 02, 2023 (4 months ago)
Ed
04:44 PM
# Import documents into the collection
upsert = client.collections[collection_name].documents.import_(
    jsonl_data.encode("utf-8"), {"action": "upsert"}
)
example data:
"data": {
    "idFS": "xx",
    "jobNumber": "xxx",
    "applicationUrl": "/xxx",
    "idClient": "870844",
    "id": "870844"
},
"content": {xxx}
Jason
04:46 PM
`id` needs to be a top-level key. De-duplication does not work when `id` is nested inside another field.
Ed
04:47 PM
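One way to apply Jason's advice is to copy the nested `id` to the top level of each document before building the JSONL payload. The following is a minimal sketch, not the approach Ed ended up using: the helper names `promote_id` and `to_jsonl` are hypothetical, the `data` parent key is taken from Ed's example, and the value is cast to a string on the assumption that Typesense expects `id` to be a string.

```python
import json


def promote_id(doc, parent_key="data"):
    """Return a shallow copy of doc with the nested id copied to the top level.

    Hypothetical helper: parent_key defaults to "data" to match Ed's example.
    """
    promoted = dict(doc)
    nested = doc.get(parent_key) or {}
    # Leave the document alone if it already has a top-level id,
    # or if there is no nested id to promote.
    if "id" not in promoted and "id" in nested:
        promoted["id"] = str(nested["id"])  # assumption: id should be a string
    return promoted


def to_jsonl(docs):
    """Serialize documents as one JSON object per line."""
    return "\n".join(json.dumps(promote_id(d)) for d in docs)


docs = [
    {"data": {"idClient": "870844", "id": "870844"}, "content": {}},
]
jsonl_data = to_jsonl(docs)

# The import call from the thread can then be used unchanged:
# client.collections[collection_name].documents.import_(
#     jsonl_data.encode("utf-8"), {"action": "upsert"}
# )
```

With `id` at the top level, re-importing the same payload with `{"action": "upsert"}` should update the existing documents instead of creating new ones.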
Similar Threads
Working with Typesense SDK and Addressing Import Issues
Jacob was having issues importing and indexing documents with the Typesense SDK. After discussing with Jason, Jacob found that the `emplace` action was ideal for their parallel processing requirement.
Handling Kinesis Stream Event Batching with Typesense
Dui had questions about how to handle Kinesis stream events with Typesense. Kishore Nallan suggested using upsert mode for creation and update, and handling deletes as logical deletions. After further discussion, including identifying and resolving a bug, they settled on introducing an `emplace` action in Typesense v0.23.
Threading Problem During Multiple Collection Creation and Batch Insertion in Typesense
Johan had a problem when creating multiple collections and batch-inserting documents into Typesense: it was returning results from different collections. Kishore Nallan helped troubleshoot the issue and suggested a potential local race condition, which was fixed in a later build.