Typesense Server Bulk Import/Upsert Issue Resolved
TL;DR: Adam was confused by the discrepancy between the successful import responses and the data actually indexed while working on a custom WP plugin that integrates with Typesense. The issue turned out to be a bug related to fetching documents in the wrong order, not a Typesense problem.
Apr 26, 2023
Adam
06:49 PM
Adam
07:02 PM
• a batch of ~1000 post IDs is identified (there are ~20k in my test db)
• they’re chunked with `array_chunk` into groups of 10
• I iterate over those chunks: querying data from the db, preparing the data to send, and POSTing the data to the Typesense server
• each time the data are sent, I count how many successes and failures there are and report those back as part of the REST response
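The loop described above can be sketched in Python. This is a hedged illustration, not Adam's PHP plugin: `chunked`, `to_jsonl`, `count_results` and `index_in_batches` are hypothetical helpers, and the DB query and HTTP POST are injected as the callables `load_docs` and `send`. One detail worth noting is that the Typesense import endpoint answers with one JSON object per input line, so counting successes and failures means inspecting every line rather than relying on the HTTP status alone.

```python
import json
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def chunked(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `items`, similar to PHP's array_chunk."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def to_jsonl(docs: Iterable[dict]) -> str:
    """Serialize documents as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(doc) for doc in docs)


def count_results(response_text: str) -> Tuple[int, int]:
    """Count success/failure lines in an import response body."""
    ok = failed = 0
    for line in response_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if json.loads(line).get("success") is True:
            ok += 1
        else:
            failed += 1
    return ok, failed


def index_in_batches(
    post_ids: Sequence[int],
    size: int,
    load_docs: Callable[[List[int]], List[dict]],  # hypothetical: DB query + data prep
    send: Callable[[str], str],  # hypothetical: POSTs JSONL, returns the response body
) -> Tuple[int, int]:
    """Index `post_ids` chunk by chunk and total the per-document results."""
    total_ok = total_failed = 0
    for chunk in chunked(post_ids, size):
        ok, failed = count_results(send(to_jsonl(load_docs(chunk))))
        total_ok += ok
        total_failed += failed
    return total_ok, total_failed
```

Because `send` is injected, the counting logic can be exercised without a running Typesense server.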
Here’s the strange part: in the browser network tab, I can see that the network response says the request of 1000 batched posts (10x100) went through. But when I query the Typesense server directly, it says that only 216 documents are in my collection.
So, is it possible that there’s a race condition between how fast Typesense can process the data I’m sending and how fast those data are returned by my plugin? Should I `sleep` part of the process to allow the Typesense server to catch up? FWIW, I’m batching things this way because otherwise my plugin kept hitting memory limits. Thanks for any help you can provide!
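Rather than adding `sleep` calls, one way to test the race-condition hypothesis is to compare the number of documents the plugin believes it sent with the collection's `num_documents`, which is part of the `GET /collections/{name}` response (it appears in the schema dump later in this thread). A minimal sketch; `fetch_collection` and `missing_documents` are hypothetical helpers, and the base URL and API key are placeholders.

```python
import json
import urllib.request


def fetch_collection(base_url: str, name: str, api_key: str) -> dict:
    """GET /collections/{name}; the JSON response includes `num_documents`."""
    req = urllib.request.Request(
        f"{base_url}/collections/{name}",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def missing_documents(collection: dict, expected: int) -> int:
    """How many expected documents are not reflected in `num_documents`."""
    return max(expected - int(collection.get("num_documents", 0)), 0)
```

With the numbers in this thread (1000 posts sent, 216 documents indexed), `missing_documents` would report 784; if that gap persists well after the imports return, waiting longer is unlikely to be the fix.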
Adam
07:25 PM
`sleep(5)` after each run of my loop and got the same result. No additional posts indexed.
Apr 27, 2023
Kishore Nallan
12:41 AM
Adam
11:28 AM
`/collections/{collection-name}/documents/import?action=upsert&return_id=true` / I’ve also tried `action=create`. All the responses claim to return successfully; there’s just this mismatch between the network response and the items actually indexed. I was wondering if it might be related to this issue, but I’m not sure how to interpret the `/metrics.json` response.
Kishore Nallan
11:29 AM
Adam
11:30 AM
Adam
11:34 AM
last_index index: 2371, committed_index: 2371, known_applied_index: 2371, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 21719
but I wasn’t sure how to interpret them
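The status line above packs several counters into one comma-separated string. A minimal sketch for pulling them out, assuming the `key: value` shape shown here holds (the format may differ across Typesense versions); `parse_status_line` and `writes_pending` are hypothetical helpers. In the reply that follows, Kishore notes that writes still being processed would show up as a non-zero `queued_writes`, which is what `writes_pending` checks.

```python
from typing import Dict


def parse_status_line(line: str) -> Dict[str, int]:
    """Turn 'key: value, key: value' pairs into a dict of ints.

    A label such as 'last_index index' is reduced to its first word.
    """
    stats: Dict[str, int] = {}
    for part in line.split(","):
        if ":" not in part:
            continue
        label, _, value = part.partition(":")
        words = label.split()
        if not words:
            continue
        try:
            stats[words[0]] = int(value.strip())
        except ValueError:
            continue  # skip non-numeric fields
    return stats


def writes_pending(stats: Dict[str, int]) -> bool:
    """True if the server still reports queued writes."""
    return stats.get("queued_writes", 0) > 0
```

For the line Adam pasted, `queued_writes` is 0, so by this check there were no writes left waiting to be applied.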
Kishore Nallan
11:41 AM
Kishore Nallan
11:41 AM
`queued_writes` will be non-zero. Have you tried printing the output of the import response?
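Printing the raw import response is the quickest way to see what Typesense did with each line. A standard-library sketch of the request Adam describes (JSONL body, `action` and `return_id` as query parameters, API key in the `X-TYPESENSE-API-KEY` header); `build_import_request` and `send_import` are hypothetical helpers, and the host, collection name and key are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable


def build_import_request(
    base_url: str,
    collection: str,
    docs: Iterable[dict],
    api_key: str,
    action: str = "upsert",
) -> urllib.request.Request:
    """Build a POST to /collections/{collection}/documents/import."""
    query = urllib.parse.urlencode({"action": action, "return_id": "true"})
    url = f"{base_url}/collections/{urllib.parse.quote(collection)}/documents/import?{query}"
    body = "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")  # JSONL
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
        method="POST",
    )


def send_import(req: urllib.request.Request) -> str:
    """Send the request and return the raw response body (one result per line)."""
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Against a running server, `print(send_import(req))` should show one JSON result per submitted document, including any per-document errors.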
Adam
01:05 PM
Kishore Nallan
01:07 PM
Adam
02:20 PM
`localhost-posts` collection before indexing documents
{
"facet_counts": [],
"found": 0,
"hits": [],
"out_of": 0,
"page": 1,
"request_params": {
"collection_name": "localhost-posts",
"per_page": 10,
"q": "post"
},
"search_cutoff": false,
"search_time_ms": 0
}
Then I’ve got two screenshots. One is the network response from the client once the POST request resolves; it shows the JSON response from the Typesense server. The second screenshot, though, shows that only 216 documents got indexed.
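When every line of the import response reports success but the collection holds fewer documents, one thing worth ruling out is that several imported documents share an `id`: an upsert with an existing `id` replaces that document instead of adding a new one, and its line still reports success. Assuming each success line includes an `id` field, as `return_id=true` is meant to provide, a sketch like this (a hypothetical helper, not code from the thread) compares the success count with the number of distinct ids.

```python
import json
from typing import Set, Tuple


def successes_vs_unique_ids(response_text: str) -> Tuple[int, int]:
    """Return (successful lines, distinct ids among those lines)."""
    successes = 0
    ids: Set[str] = set()
    for line in response_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        result = json.loads(line)
        if result.get("success") is True:
            successes += 1
            if "id" in result:
                ids.add(result["id"])
    return successes, len(ids)
```

If the second number is much smaller than the first, the same documents are being sent repeatedly, which points at how the batches are fetched rather than at the server.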
Kishore Nallan
02:22 PM
`data:` etc.
Adam
02:24 PM
Adam
02:25 PM
`"http://typesense:8108/collections/localhost-posts/documents/import?action=upsert&return_id=true"`
Adam
02:26 PM
Kishore Nallan
02:28 PM
1
Adam
02:30 PM
1
Adam
03:13 PM
Adam
03:43 PM
http://localhost:8108/collections/localhost-posts
{
"created_at": 1682609530,
"default_sorting_field": "",
"enable_nested_fields": true,
"fields": [
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": ".*",
"optional": true,
"sort": false,
"type": "auto"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "featured_image_url",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "permalink",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_author",
"optional": true,
"sort": false,
"type": "object"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_content",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_date",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_excerpt",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_id",
"optional": true,
"sort": true,
"type": "int64"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_sortby_date",
"optional": true,
"sort": true,
"type": "int64"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_title",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_type",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_author.user_name",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_author.link",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_author.last_name",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_author.full_name",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_author.image_url",
"optional": true,
"sort": false,
"type": "string"
},
{
"facet": false,
"index": true,
"infix": false,
"locale": "",
"name": "post_author.first_name",
"optional": true,
"sort": false,
"type": "string"
}
],
"name": "localhost-posts",
"num_documents": 216,
"symbols_to_index": [],
"token_separators": []
}
Apr 28, 2023
Adam
05:22 PM