Optimizing Document Re-ingestion in Typesense
TLDR Viktor and Elyes discuss ways to handle frequent doc updates in Typesense. Kishore Nallan recommends using the update/upsert mode, data sharding, and the emplace action for efficient re-ingestion.
Nov 15, 2022
Viktor
11:29 AM
Think of a CRM with content that updates often (including deletions). Our current thinking is to have a stateless process that runs according to the following pseudo-code:
function reindex(docs: Doc[]) {
  const now = Date.now()
  // Parentheses are needed so the arrow function returns an object literal
  const docsWithUpsertedAt = docs.map(doc => ({ ...doc, addedToTypesenseAt: now }))
  // Upsert docsWithUpsertedAt
  // Delete docs with filter_by=addedToTypesenseAt < $now
}
We had some concerns regarding the load on the Typesense service, especially as our documents would number in the tens of thousands. Our main worry is that there might be some inconsistency due to the asynchronous behaviour of upserting. What do you think about this? Are there alternative approaches worth considering?
cc Elyes
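The flow in the pseudo-code above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions: `SearchStore`, `stampDocs`, `staleFilter`, and the `now` parameter are hypothetical names introduced here, and the filter assumes `addedToTypesenseAt` is declared as a numeric field in the collection schema. The store is an interface so the logic can be exercised without a running server; a real adapter would presumably call Typesense's documents import endpoint with `action=upsert` and its delete-by-query endpoint with the `filter_by` string.

```typescript
// Hypothetical document shape: any JSON object with a stable string id.
type Doc = { id: string; [field: string]: unknown };

// Hypothetical adapter interface (an assumption of this sketch, not a Typesense API).
interface SearchStore {
  upsert(docs: Doc[]): Promise<void>;
  deleteByFilter(filterBy: string): Promise<void>;
}

// Stamp every document in the batch with the timestamp of this run.
function stampDocs(docs: Doc[], now: number): Doc[] {
  return docs.map(doc => ({ ...doc, addedToTypesenseAt: now }));
}

// Typesense's filter_by syntax for "field less than value" is `field:<value`.
function staleFilter(now: number): string {
  return `addedToTypesenseAt:<${now}`;
}

// Upsert the fresh batch first, then delete everything stamped by an earlier run.
async function reindex(
  store: SearchStore,
  docs: Doc[],
  now: number = Date.now(),
): Promise<void> {
  await store.upsert(stampDocs(docs, now));
  await store.deleteByFilter(staleFilter(now));
}
```

Because the delete only runs after the upsert resolves, a failed import leaves the previously indexed documents in place.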
Kishore Nallan
11:50 AM
[...] `// Upsert docsWithUpsertedAt` operation begins and the `// Delete docs` operation ends, some of the documents will be duplicated.
Kishore Nallan
11:52 AM
[...] `update` or `upsert` mode during import, which will ensure that only the parts of the document that have changed are updated. That reduces some load on the indexing (though the field-wise comparison between the old and new document does happen).
Kishore Nallan
11:53 AM
[...]
Elyes
12:40 PM
[...] `emplace` action mode instead (and we supply a stable `id` field in each document)?
Viktor
12:51 PM
[...]
Kishore Nallan
01:10 PM
Performance of large collections really depends on the shape of the data, whether the data is skewed or evenly distributed, etc. Up to 20M records, a single collection should be sufficient.
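The import modes mentioned in this thread (`upsert`, `update`, `emplace`) are selected through the `action` query parameter of the documents import endpoint, which takes one JSON document per line. The sketch below only builds the request, so it can be inspected or handed to any HTTP client; `buildImportRequest`, `IndexedDoc`, `ImportRequest`, and the example host and API key are assumptions made for illustration.

```typescript
// Import actions as listed in the Typesense documentation:
//   create  - insert only (the default)
//   upsert  - create, or replace an existing document with the same id
//   update  - modify an existing document; partial documents are allowed
//   emplace - create, or update an existing document; partial documents are allowed
type ImportAction = "create" | "upsert" | "update" | "emplace";

// Hypothetical document shape with the stable id the thread relies on.
type IndexedDoc = { id: string; [field: string]: unknown };

// Plain description of an HTTP request (hypothetical helper type).
interface ImportRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildImportRequest(
  baseUrl: string,
  apiKey: string,
  collection: string,
  docs: IndexedDoc[],
  action: ImportAction,
): ImportRequest {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = `/collections/${encodeURIComponent(collection)}/documents/import`;
  return {
    method: "POST",
    url: `${root}${path}?action=${action}`,
    headers: { "X-TYPESENSE-API-KEY": apiKey, "Content-Type": "text/plain" },
    // The import endpoint expects JSONL: one JSON document per line.
    body: docs.map(doc => JSON.stringify(doc)).join("\n"),
  };
}

// Example with placeholder values: emplace two documents, the second one partial.
const exampleRequest = buildImportRequest(
  "http://localhost:8108",
  "example-api-key",
  "crm_docs",
  [{ id: "1", name: "Acme", stage: "lead" }, { id: "2", stage: "won" }],
  "emplace",
);
```

The import response contains one result per line, so a caller would still need to check each line for failures even when the HTTP request itself succeeds.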