#community-help

Handling Order of Firestore Events for Synchronization with Typesense.

TLDR Ross ran into an issue with Firestore events triggering out of order, causing synchronization inconsistency between Firestore and Typesense. With advice and input from Jason and Kishore Nallan, they implemented a debouncing solution using Redis, ensuring that the latest Firestore data is synced to Typesense accurately.

Jun 12, 2022 (16 months ago)
Ross
03:23 PM
Hi All. Looking into using Firebase + Typesense and running into an issue that I've noticed the official extension is also affected by. Firestore event triggers are not guaranteed to happen in the order the DB changes did, which means simply handling events in the order they are received will result in inconsistency between Firestore -> Typesense. This is quite easy to replicate by making some rapid-fire changes to Firestore documents and observing the final Firestore value does not end up in Typesense.

I've been trying to solve this by using a Firestore transaction to create a "lock" at a given Firestore location and then comparing timestamps to ensure that events are only sent to Typesense if they are newer than what has already been sent, otherwise they are safe to discard -- i.e. the portion of the code that handles events is idempotent already -- it's simply the order of events that is the problem.

Example (not working) implementation I've tried. If we can get it working I'd be happy to open a PR for the Typesense Firebase extension so everyone can benefit:

.database.ref(`tasks/{itemId}`)
.onWrite(async (change, context) => {
  /**
   * We use a Firestore transaction to create a "lock" at a DB location
   * for a given `itemId`
   */
    const { timestamp, eventId } = context;
    const { itemId } = context.params;
    const timestampRef = firestore
      .collection(`typesenseLocks_tasks`)
      .doc(itemId);
    await admin.firestore().runTransaction(async transaction => {
      const dataChangedTimestamp = new Date(timestamp).getTime();
      const lastUpdatedTimestampDoc = await transaction.get(timestampRef);
      const lastUpdatedData = lastUpdatedTimestampDoc.data();

      /**
       * If this is the first time this document was changed (no previous locks),
       * or the last-stored lock timestamp is older than the current event's timestamp,
       * prepare a payload and send to Typesense.
       */
      if (
        (!lastUpdatedData?.timestamp ||
          dataChangedTimestamp > lastUpdatedData.timestamp) &&
        lastUpdatedData?.eventId !== eventId
      ) {
        // Send to Typesense
        await updateTypesense(change, indexer, itemId);

        // Finalize Transaction
        transaction.set(timestampRef, {
          timestamp: dataChangedTimestamp,
          eventId
        });
      } else {
        /**
         * Do nothing, current event is older than last-indexed event already recorded, can be safely discarded
         */
      }
    });
  });
Ross
09:57 PM
After some more testing, the transaction locking is actually working -- the last eventId processed and recorded in the "lock" document is the correct event with the final value. But the last value that ends up in Typesense is for the event prior which leads me to believe indexing on the Typesense side is not processed in series, but somewhat parallelized as part of index queue processing, is that the case?
Jun 13, 2022 (16 months ago)
Kishore Nallan
04:44 AM
Typesense in v0.22+ strictly serializes writes in exact order in which they are received at a per-collection level. In 0.23 we have also made deletes strictly serial, even as it happens in batches, it's not allowed to intermix with other writes.
Jason
07:58 PM
> Firestore event triggers are not guaranteed to happen in the order the DB changes did
This is surprising to hear! Do you know if this is documented in the Firebase docs by any chance?
Ross
08:00 PM
Yessir one moment. Quite a few SO posts on it as well
Ross
08:01 PM
(etcetera) 🙂
Jason
08:01 PM
😱
Ross
08:01 PM
i ended up fixing it by debouncing the cloud functions using redis, and then when the debounce ends, fetch the data at the document one last time and send to typesense
Ross
08:02 PM
this sidesteps any ordering issues / race conditions between event handler instances, with the only downside being:
• small latency penalty (whatever you set the debounce wait time to)
• an additional firestore document retrieval at the end
Ross
08:03 PM
basically the firebase function becomes more of a signal to "send to typesense, but fetch the correct data yourself" instead of "send to typesense, and use the data from this event"
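A minimal sketch of the debounce-then-fetch idea described above. Ross's actual implementation is not shown in the thread, so everything here is an assumption: the in-memory `kv` map stands in for Redis, `sourceDocs` for the Firebase database, `searchIndex` for Typesense, and `onDocumentEvent` is a hypothetical handler name.

```javascript
// In-memory stand-ins (assumptions, not real clients):
// `kv` plays the role of Redis, `sourceDocs` the Firebase database,
// and `searchIndex` a Typesense collection.
const kv = new Map();
const sourceDocs = new Map();
const searchIndex = new Map();

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical handler, called once per trigger event, in any order.
// Resolves to true if this invocation performed the sync.
async function onDocumentEvent(docId, eventId, waitMs) {
  // Claim the debounce slot; a later event for the same document overwrites it.
  // With Redis this would be a key written with an expiry longer than waitMs,
  // so that old keys disappear on their own.
  kv.set(`debounce:${docId}`, eventId);

  await sleep(waitMs);

  // Another event arrived during the wait, so that invocation will sync instead.
  if (kv.get(`debounce:${docId}`) !== eventId) return false;

  // Ignore the event payload: read the current document and send that.
  const latest = sourceDocs.get(docId);
  if (latest === undefined) {
    searchIndex.delete(docId);
  } else {
    searchIndex.set(docId, { ...latest });
  }
  return true;
}
```

Calling `onDocumentEvent` several times in quick succession for the same `docId` should result in a single sync, performed by whichever invocation wrote its `eventId` last, using whatever `sourceDocs` holds once the wait has elapsed.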
Jason
08:03 PM
Your idea of storing the last synced timestamp and only triggering a write to Typesense if the current timestamp is later than the last synced timestamp essentially adds a debounce in a way right?
Jason
08:03 PM
> instead of "send to typesense, and use the data from this event"
Ah
Ross
08:03 PM
yes, and despite all of my testing (lots) indicating that the transaction was properly locking things, it was still ending up with incorrect data in typesense

Ross
08:04 PM
which is why i was wondering about if typesense serializes writes, which you do...in which case i'm stumped so went with the redis debounce route instead
Ross
08:05 PM
every "firebase -> algolia" and "firebase -> typesense" extension / tutorial i've ever found suffers from the same issue where it doesn't account for event ordering. i guess assumes low change frequency
Jason
08:07 PM
This is quite interesting. Would you be able to summarize your findings, what you tried and also the solution in a Github issue in this repo (just copy-pasting from this Slack thread is good)? I can then pass on this feedback to the Firestore team, and see if they have any additional thoughts... I'd imagine this is a common thing for any extensions that want to rely on triggers to sync data
Ross
08:07 PM
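  const promises = [];
  for (let i = 0; i < 300; i++) {
    // Fire 300 writes to the same field without awaiting each one
    promises.push(ref.child("tasksOpen").child(taskId).child("name").set(`A ${i}`));
    console.log("Set name to", i);
  }

  // Wait for every write to be acknowledged
  await Promise.all(promises);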
Ross
08:08 PM
^ that was just a super quick test script i whipped up (it hits the RTDB but firestore acts the same)
Ross
08:08 PM
and checking the function logs you can see they are triggered in a totally random order when writes happen quickly
Ross
08:09 PM
no problem, i'll write up an issue tonight (will try to find some time 🤞 )

Jun 14, 2022 (16 months ago)
Ross
Jason
12:55 AM
Thank you Ross!

> https://typesense-community.slack.com/archives/C01P749MET0/p1655150586446689?thread_ts=1655047418.496499&cid=C01P749MET0
Following up on this idea ^

What if in the function:

await updateTypesense(change, indexer, itemId);

instead of using the change object provided by Firestore, we query Firestore for the latest version of that document by ID and insert that into Typesense?
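A rough sketch of that suggestion, under stated assumptions: `firestoreDocs` and `typesenseDocs` are in-memory maps standing in for Firestore and a Typesense collection, and `syncLatest` is a hypothetical replacement for `updateTypesense` that takes only the document ID.

```javascript
// In-memory stand-ins (assumptions): `firestoreDocs` plays the role of
// Firestore and `typesenseDocs` the role of a Typesense collection.
const firestoreDocs = new Map();
const typesenseDocs = new Map();

// Hypothetical replacement for updateTypesense(change, indexer, itemId):
// the event only says *which* document changed, not what it contains.
function syncLatest(itemId) {
  const latest = firestoreDocs.get(itemId);
  if (latest === undefined) {
    typesenseDocs.delete(itemId); // document no longer exists
  } else {
    typesenseDocs.set(itemId, { id: itemId, ...latest }); // upsert current state
  }
}

// Three rapid writes, then their trigger events delivered out of order.
const events = [];
for (let i = 0; i < 3; i++) {
  firestoreDocs.set("task1", { name: `A ${i}` });
  events.push({ itemId: "task1", snapshot: { name: `A ${i}` } });
}
for (const event of [events[2], events[0], events[1]]) {
  syncLatest(event.itemId); // event.snapshot is deliberately ignored
}

console.log(typesenseDocs.get("task1").name); // "A 2"
```

The sketch runs handlers one at a time, so it only shows that delivery order stops mattering; it does not model two handler instances overlapping between the read and the write.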
Ross
12:56 AM
to be honest I never thought of doing the re-query for the freshest data until I had given up on the transaction approach and gone with the redis debounce instead :man-facepalming:
Ross
12:57 AM
I think that could work, since Firestore transactions require all reads to happen before writes
Ross
12:57 AM
so read timestamp data / eventId to avoid wasteful processing -> requery for freshest data -> send to typesense -> update timestamp / eventId
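The four steps above can be sketched as follows. This is an illustration only: `syncState`, `liveDocs` and `indexedDocs` are in-memory stand-ins for the lock collection, the source database and Typesense, and `isStale` and `handleEvent` are hypothetical names. In a real function the reads of the sync state would sit inside the transaction, before any writes.

```javascript
// In-memory stand-ins (assumptions) for the lock collection, the source
// database, and the search index.
const syncState = new Map();   // itemId -> { timestamp, eventId }
const liveDocs = new Map();    // current document data
const indexedDocs = new Map(); // what the search index holds

// Step 1: decide whether this event can still change anything.
// Mirrors the condition in the transaction example: skip events that are
// not newer than the last recorded one, or that were already handled.
function isStale(lastSynced, event) {
  if (!lastSynced) return false;
  return (
    event.timestamp <= lastSynced.timestamp ||
    event.eventId === lastSynced.eventId
  );
}

function handleEvent(event) {
  const { itemId, timestamp, eventId } = event;
  if (isStale(syncState.get(itemId), event)) return "skipped";

  const latest = liveDocs.get(itemId);           // step 2: requery
  if (latest === undefined) {
    indexedDocs.delete(itemId);                  // step 3: send (delete)
  } else {
    indexedDocs.set(itemId, { ...latest });      // step 3: send (upsert)
  }
  syncState.set(itemId, { timestamp, eventId }); // step 4: record
  return "synced";
}

// Events for three writes arrive newest-first; only the first does any work.
liveDocs.set("task1", { name: "A 2" });
const outcomes = [
  { itemId: "task1", timestamp: 3, eventId: "e3" },
  { itemId: "task1", timestamp: 1, eventId: "e1" },
  { itemId: "task1", timestamp: 2, eventId: "e2" },
].map((event) => handleEvent(event));

console.log(outcomes); // ["synced", "skipped", "skipped"]
```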
Jason
12:58 AM
Right
Ross
12:58 AM
although with that approach, really no processing except for the last event in a flurry of events even matters at that point
Ross
12:59 AM
but i think doing a "distributed debounce" without introducing another element like redis is tricky
Ross
12:59 AM
like...the snapshot provided by firestore is kinda meaningless, since we ignore it and fetch from DB. the firestore trigger is more of an indicator to sync to typesense :shrug:
Jason
01:00 AM
Yeah so instead of using redis to store state, you could use another Firestore collection as a simple kv pair to store sync state right? (Unless there are cost implications to do this)
Ross
01:01 AM
might work? i'm more familiar with redis and how it handles locking vs. firestore
Ross
01:01 AM
also redis has auto-expiring keys which makes writing a debouncer much easier, but i digress 😞
Jason
01:03 AM
Yeah, without that something has to manually cull expired keys lazily... Or maybe another cron function does the expiration separately
Ross
01:04 AM
if it's just about cleaning out old state for cost reasons, cron would be okay
Ross
01:04 AM
if it's about clearing out keys to act as a debounce delay, i don't think that would work since the min increment would be 1 minute
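The two expiry options mentioned above (culling lazily on read, or a separate scheduled cleanup) might look roughly like this. It is a sketch under assumptions: `createExpiringStore` and `sweep` are hypothetical names, the `Map` stands in for a Firestore collection used as a key-value store, and the clock is injected so the behaviour can be shown without waiting.

```javascript
// Hypothetical key-value helper with lazy expiry. The Map stands in for a
// Firestore collection used as a KV store; `now` is injectable for testing.
function createExpiringStore(now = Date.now) {
  const store = new Map(); // key -> { value, expiresAt }
  return {
    set(key, value, ttlMs) {
      store.set(key, { value, expiresAt: now() + ttlMs });
    },
    // Lazy expiry: an entry past its deadline is treated as absent on read,
    // so the debounce delay does not depend on how often cleanup runs.
    get(key) {
      const entry = store.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        store.delete(key);
        return undefined;
      }
      return entry.value;
    },
    // What a scheduled cleanup function could call to bound storage.
    // Returns the number of entries removed.
    sweep() {
      let removed = 0;
      for (const [key, entry] of store) {
        if (entry.expiresAt <= now()) {
          store.delete(key);
          removed++;
        }
      }
      return removed;
    },
  };
}

// Usage with a fake clock.
let fakeNow = 0;
const locks = createExpiringStore(() => fakeNow);
locks.set("debounce:task1", "evt-1", 1000);
fakeNow = 500;
console.log(locks.get("debounce:task1")); // "evt-1"
fakeNow = 1000;
console.log(locks.get("debounce:task1")); // undefined
```

With this split, the scheduled sweep only matters for storage cost, which matches the point that a cron job is too coarse to act as the debounce delay itself.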
Jason
01:06 AM
Ah ok.

This would only be needed to avoid wasteful processing on the Typesense side, so technically just fetching the latest data from Firestore should be sufficient to keep the two stores in sync... So to keep the extension simple (and cost effective to avoid additional Firestore data storage costs), maybe the extension can just implement the data fetch from Firestore, and then provide instructions on how to set up a debounce mechanism that users could choose to implement...
Ross
01:07 AM
nice 👌

i'd love to see Google's take on this as well. They promote Algolia / Typesense as a "need Search?" solution but kinda skirt around this one 🙂
Jason
01:09 AM
Haha! I'm actually surprised no one has brought this up before, not even with the Algolia extension, though it's clearly an issue with high-volume syncs
Jason
01:10 AM
But yeah, will still ask Firebase team for their thoughts