Handling Order of Firestore Events for Synchronization with Typesense.
TLDR Ross ran into an issue with Firestore events triggering out of order, causing synchronization inconsistencies between Firestore and Typesense. With advice and input from Jason and Kishore Nallan, they implemented a debouncing solution using Redis, ensuring that the latest Firestore data is synced to Typesense accurately.
Jun 12, 2022 (16 months ago)
Ross
03:23 PM I've been trying to solve this by using a Firestore transaction to create a "lock" at a given Firestore location and then comparing timestamps to ensure that events are only sent to Typesense if they are newer than what has already been sent, otherwise they are safe to discard -- i.e. the portion of the code that handles events is idempotent already -- it's simply the order of events that is the problem.
Example (not working) implementation I've tried. If we can get it working I'd be happy to open a PR for the Typesense Firebase extension so everyone can benefit:
.database.ref(`tasks/{itemId}`)
.onWrite(async (change, context) => {
  /**
   * We use a Firestore transaction to create a "lock" at a DB location
   * for a given `itemId`
   */
  const { timestamp, eventId } = context;
  const { itemId } = context.params;
  const timestampRef = firestore
    .collection(`typesenseLocks_tasks`)
    .doc(itemId);
  await admin.firestore().runTransaction(async transaction => {
    const dataChangedTimestamp = new Date(timestamp).getTime();
    const lastUpdatedTimestampDoc = await transaction.get(timestampRef);
    const lastUpdatedData = lastUpdatedTimestampDoc.data();
    /**
     * If this is the first time this document was changed (no previous locks),
     * or the last-stored lock timestamp is older than the current event's timestamp,
     * prepare a payload and send to Typesense.
     */
    if (
      (!lastUpdatedData?.timestamp ||
        dataChangedTimestamp > lastUpdatedData.timestamp) &&
      lastUpdatedData?.eventId !== eventId
    ) {
      // Send to Typesense
      await updateTypesense(change, indexer, itemId);
      // Finalize Transaction
      transaction.set(timestampRef, {
        timestamp: dataChangedTimestamp,
        eventId
      });
    } else {
      /**
       * Do nothing: the current event is older than the last-indexed event
       * already recorded, so it can be safely discarded
       */
    }
  });
});
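For readers following the snippet above, the discard rule it applies can be pulled out as a small pure function. This is only a sketch of that comparison, not a fix for the ordering problem: the name `shouldIndex` and the record shape `{ timestamp, eventId }` are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of the original code): decides whether an
// incoming event should be sent to the search index, given the lock record
// last stored for the same document (undefined if none exists yet).
function shouldIndex(lastRecorded, incoming) {
  // The same event delivered a second time is skipped.
  if (lastRecorded && lastRecorded.eventId === incoming.eventId) return false;
  // No previous lock record: this is the first event seen for the document.
  if (!lastRecorded || !lastRecorded.timestamp) return true;
  // Otherwise only strictly newer events are indexed; older ones are discarded.
  return incoming.timestamp > lastRecorded.timestamp;
}

console.log(shouldIndex(undefined, { timestamp: 100, eventId: "a" }));
console.log(shouldIndex({ timestamp: 200, eventId: "b" }, { timestamp: 100, eventId: "a" }));
```

Keeping the decision separate from the side effect also makes it easier to test without a database.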
Ross
09:57 PM
Jun 13, 2022 (16 months ago)
Kishore Nallan
04:44 AM
Jason
07:58 PM This is surprising to hear! Do you know if this is documented in the Firebase docs by any chance?
Ross
08:00 PM
Ross
08:01 PM
Jason
08:01 PM
Ross
08:01 PM
Ross
08:02 PM
• small latency penalty (whatever you set the debounce wait time to)
• an additional Firestore document retrieval at the end
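The two costs listed above (a wait, then one extra document read) can be seen in a minimal debounce sketch. The thread does not show the actual implementation, so everything here is an assumption: an in-memory `Map` stands in for a shared store such as Redis, and `debouncedSync`, `fetchLatest`, and `syncToSearch` are hypothetical names.

```javascript
// In-memory stand-in for a shared store such as Redis (assumption for illustration).
const latestToken = new Map();

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical handler: `fetchLatest` and `syncToSearch` are injected by the caller.
async function debouncedSync(docId, eventId, waitMs, fetchLatest, syncToSearch) {
  // Record this event as the most recent arrival for the document.
  latestToken.set(docId, eventId);
  // The latency penalty: wait out the debounce window.
  await sleep(waitMs);
  // If another event arrived during the wait, let that one do the sync.
  if (latestToken.get(docId) !== eventId) return false;
  // The additional retrieval: read the current document instead of trusting the event payload.
  const doc = await fetchLatest(docId);
  await syncToSearch(docId, doc);
  return true;
}
```

Because the last arrival reads the current document rather than its own payload, it does not matter which event happened to arrive last, only that exactly one sync runs after the burst settles.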
Ross
08:03 PM
Jason
08:03 PM
Jason
08:03 PM Ah
Ross
08:03 PM
Ross
08:04 PM
Ross
08:05 PM
Jason
08:07 PM
Ross
08:07 PM
for (let i = 0; i < 300; i++) {
  promises.push(ref.child("tasksOpen").child(taskId).child("name").set(`A ${i}`));
  console.log("Set name to", i);
}
await Promise.all(promises);
Ross
08:08 PM
Ross
08:08 PM
Ross
08:09 PM
Jun 14, 2022 (16 months ago)
Ross
12:27 AM
Jason
12:55 AM > https://typesense-community.slack.com/archives/C01P749MET0/p1655150586446689?thread_ts=1655047418.496499&cid=C01P749MET0
Following up on this idea ^
What if in the function:
await updateTypesense(change, indexer, itemId);
instead of using the `change` object provided by Firestore, we query Firestore for the latest version of that document by ID and insert that into Typesense?
Ross
12:56 AM
Ross
12:57 AM
Ross
12:57 AM
Jason
12:58 AM
Ross
12:58 AM
Ross
12:59 AM
Ross
12:59 AM
Jason
01:00 AM
Ross
01:01 AM
Ross
01:01 AM
Jason
01:03 AM
Ross
01:04 AM
Ross
01:04 AM
Jason
01:06 AM This would only be needed to avoid wasteful processing on the Typesense side, so technically just fetching the latest data from Firestore should be sufficient to keep the two stores in sync... So to keep the extension simple (and cost-effective, avoiding additional Firestore data storage costs), maybe the extension can just implement the data fetch from Firestore, and then provide instructions on how to set up a debounce mechanism that users could choose to implement...
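A minimal sketch of the "fetch the latest data" idea described above, assuming injected helpers rather than the real Firestore or Typesense clients. `syncLatest`, `fetchLatest`, `upsertToSearch`, and `deleteFromSearch` are hypothetical names, and treating a missing document as a deletion is an assumption, not something stated in the thread.

```javascript
// Hypothetical handler: ignores the event payload entirely and mirrors
// whatever the source database currently holds for `docId`.
async function syncLatest(docId, fetchLatest, upsertToSearch, deleteFromSearch) {
  const current = await fetchLatest(docId);
  // Assumption: a missing document means it was deleted at the source.
  if (current === null || current === undefined) {
    await deleteFromSearch(docId);
    return "deleted";
  }
  // Upsert the current state, so a late-arriving older event cannot overwrite newer data.
  await upsertToSearch({ id: docId, ...current });
  return "upserted";
}
```

With this shape, every event for a document writes the same current state no matter what order the events arrive in, at the cost of one extra read per event.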
Ross
01:07 AM I'd love to see Google's take on this as well. They promote Algolia / Typesense as a "need Search?" solution but kinda skirt around this one
Jason
01:09 AM
Jason
01:10 AM