Hi All. Looking into using Firebase + Typesense an...
# community-help
r
Hi All. Looking into using Firebase + Typesense and running into an issue that I've noticed the official extension is also affected by. Firestore event triggers are not guaranteed to happen in the order the DB changes did, which means simply handling events in the order they are received will result in inconsistency between Firestore -> Typesense. This is quite easy to replicate by making some rapid-fire changes to Firestore documents and observing the final Firestore value does not end up in Typesense. I've been trying to solve this by using a Firestore transaction to create a "lock" at a given Firestore location and then comparing timestamps to ensure that events are only sent to Typesense if they are newer than what has already been sent, otherwise they are safe to discard -- i.e. the portion of the code that handles events is idempotent already -- it's simply the order of events that is the problem. Example (not working) implementation I've tried. If we can get it working I'd be happy to open a PR for the Typesense Firebase extension so everyone can benefit:
.database.ref(`tasks/{itemId}`)
.onWrite(async (change, context) => {
  /**
   * We use a Firestore transaction to create a "lock" at a DB location
   * for a given `itemId`
   */
    const { timestamp, eventId } = context;
    const { itemId } = context.params;
    const timestampRef = firestore
      .collection(`typesenseLocks_tasks`)
      .doc(itemId);
    await admin.firestore().runTransaction(async transaction => {
      const dataChangedTimestamp = new Date(timestamp).getTime();
      const lastUpdatedTimestampDoc = await transaction.get(timestampRef);
      const lastUpdatedData = lastUpdatedTimestampDoc.data();

      /**
       * If this is the first time this document was changed (no previous locks),
       * or the last-stored lock timestamp is older than the current event's timestamp,
       * prepare a payload and send to Typesense.
       */
      if (
        (!lastUpdatedData?.timestamp ||
          dataChangedTimestamp > lastUpdatedData.timestamp) &&
        lastUpdatedData?.eventId !== eventId
      ) {
        // Send to Typesense
        await updateTypesense(change, indexer, itemId);

        // Finalize Transaction
        transaction.set(timestampRef, {
          timestamp: dataChangedTimestamp,
          eventId
        });
      } else {
        /**
         * Do nothing, current event is older than last-indexed event already recorded, can be safely discarded
         */
      }
    });
});
After some more testing, the transaction locking is actually working -- the last eventId processed and recorded in the "lock" document is the correct event with the final value. But the last value that ends up in Typesense is from the prior event, which leads me to believe indexing on the Typesense side is not processed in series but is somewhat parallelized as part of index queue processing. Is that the case?
k
Typesense v0.22+ strictly serializes writes, in the exact order in which they are received, at a per-collection level. In 0.23 we have also made deletes strictly serial: even though they happen in batches, they are not allowed to intermix with other writes.
j
Firestore event triggers are not guaranteed to happen in the order the DB changes did
This is surprising to hear! Do you know if this is documented in the Firebase docs by any chance?
r
Yessir one moment. Quite a few SO posts on it as well
(etcetera) 🙂
j
😱
r
i ended up fixing it by debouncing the cloud functions using redis, and then when the debounce ends, fetch the data at the document one last time and send to typesense
this sidesteps any ordering issues / race conditions between event handler instances, with the only downsides being:
• small latency penalty (whatever you set the debounce wait time to)
• an additional firestore document retrieval at the end
basically the firebase function becomes more of a signal to "send to typesense, but fetch the correct data yourself" instead of "send to typesense, and use the data from this event"
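A minimal sketch of that debounce idea, not the actual implementation used here. The `store` object is a hypothetical stand-in for Redis (only `get` and `set` are assumed), and `fetchLatest` / `sendToTypesense` are placeholder callbacks rather than real Firebase or Typesense calls. Each invocation writes its own token, waits, and only the invocation whose token survived the wait does the final fetch and send:

```javascript
// Hypothetical in-memory stand-in for Redis; a real setup would use a Redis
// client (ideally with an expiring key) instead of this Map.
function createMemoryStore() {
  const data = new Map();
  return {
    async set(key, value) { data.set(key, value); },
    async get(key) { return data.get(key); },
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let tokenCounter = 0;

// Called once per trigger event. Resolves to true if this call did the sync.
async function debouncedSync({ store, docId, waitMs, fetchLatest, sendToTypesense }) {
  const key = `debounce:${docId}`;
  const token = `${Date.now()}-${++tokenCounter}`;
  await store.set(key, token);

  await sleep(waitMs);

  // A newer event overwrote the token while we waited: let that one handle it.
  if ((await store.get(key)) !== token) return false;

  // Ignore the event payload and read the document's current state instead.
  const latest = await fetchLatest(docId);
  await sendToTypesense(docId, latest);
  return true;
}
```

With a flurry of events for the same document, every call except the last one returns `false` without touching the index, at the cost of the `waitMs` delay and one extra read.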
j
Your idea of storing the last synced timestamp and only triggering a write to Typesense if the current timestamp is later than the last synced timestamp essentially adds a debounce in a way right?
instead of "send to typesense, and use the data from this event"
Ah
r
yes, and despite all of my testing (lots) indicating that the transaction was properly locking things, it was still ending up with incorrect data in typesense
🤔 1
which is why i was wondering about if typesense serializes writes, which you do...in which case i'm stumped so went with the redis debounce route instead
every "firebase -> algolia" and "firebase -> typesense" extension / tutorial i've ever found suffers from the same issue where it doesn't account for event ordering. i guess they assume low change frequency
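// Fire 300 rapid writes at the same field without awaiting each one
// (`ref` and `taskId` are defined elsewhere in the test script)
const promises = [];
for (let i = 0; i < 300; i++) {
  promises.push(ref.child("tasksOpen").child(taskId).child("name").set(`A ${i}`));
  console.log("Set name to", i);
}

await Promise.all(promises);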
j
This is quite interesting. Would you be able to summarize your findings, what you tried and also the solution in a Github issue in this repo (just copy-pasting from this Slack thread is good)? I can then pass on this feedback to the Firestore team, and see if they have any additional thoughts... I'd imagine this is a common thing for any extensions that want to rely on triggers to sync data
r
^ that was just a super quick test script i whipped up (it hits the RTDB but firestore acts the same)
and checking the function logs you can see they are triggered in a totally random order when writes happen quickly
no problem, i'll write up an issue tonight (will try to find some time 🤞 )
🙏 1
🙏 1
j
Thank you @Ross!
https://typesense-community.slack.com/archives/C01P749MET0/p1655150586446689?thread_ts=1655047418.496499&cid=C01P749MET0
Following up on this idea ^ What if in the function:
await updateTypesense(change, indexer, itemId);
instead of using the `change` object provided by Firestore, we query Firestore for the latest version of that document by ID and insert that into Typesense?
r
to be honest I never thought of doing the re-query for the freshest data until I had given up on the transaction approach and gone with the redis debounce instead 🤦‍♂️
I think that could work, since Firestore transactions require all reads to happen before writes
so read timestamp data / eventId to avoid wasteful processing -> requery for freshest data -> send to typesense -> update timestamp / eventId
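The four steps above could look roughly like this. It is only a sketch: `syncState`, `getDocument` and `search` are hypothetical injected helpers (not Firebase or Typesense APIs), and nothing here serializes two handlers that run at the same moment, so in practice the state read and write would still need to sit inside a transaction or behind a debounce.

```javascript
// All dependencies are hypothetical and injected so the flow can be read
// (and exercised) without Firebase or Typesense:
//   syncState.get/set    - last synced { timestamp, eventId } per document
//   getDocument(id)      - current document data, or null if it was deleted
//   search.upsert/remove - writes to the search index
async function syncLatest(event, { syncState, getDocument, search }) {
  const { itemId, eventId, timestamp } = event;
  const eventTime = new Date(timestamp).getTime();

  // 1. Read sync state; skip duplicates and events older than the last sync.
  const last = await syncState.get(itemId);
  if (last && (last.eventId === eventId || eventTime <= last.timestamp)) {
    return "skipped";
  }

  // 2. Re-query the freshest data instead of trusting the event snapshot.
  const latest = await getDocument(itemId);

  // 3. Send to the search index (a missing document means it was deleted).
  if (latest === null) {
    await search.remove(itemId);
  } else {
    await search.upsert({ id: itemId, ...latest });
  }

  // 4. Record what was synced.
  await syncState.set(itemId, { timestamp: eventTime, eventId });
  return latest === null ? "removed" : "upserted";
}
```

If a newer event is handled first, the older one that arrives afterwards is skipped, and the index already holds whatever the document contained at fetch time.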
j
Right
r
although with that approach, really none of the processing except for the last event in a flurry of events even matters at that point
but i think doing a "distributed debounce" without introducing another element like redis is tricky
like...the snapshot provided by firestore is kinda meaningless, since we ignore it and fetch from DB. the firestore trigger is more of an indicator to sync to typesense 🤷
j
Yeah so instead of using redis to store state, you could use another Firestore collection as a simple kv pair to store sync state right? (Unless there are cost implications to do this)
r
might work? i'm more familiar with redis and how it handles locking vs. firestore
also redis has auto-expiring keys which makes writing a debouncer much easier, but i digress 😞
j
Yeah, without that something has to manually cull expired keys lazily... Or maybe another cron function does the expiration separately
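To make the two culling options concrete, here is a small sketch of a key-value store with manual expiry. A `Map` stands in for the Firestore collection, and the names (`createExpiringStore`, `cullExpired`) are made up for illustration. `get` shows the lazy variant, `cullExpired` is what a scheduled function would call:

```javascript
// Hypothetical key-value store with manual expiry. `now` is passed in
// explicitly so the behaviour does not depend on the wall clock.
function createExpiringStore() {
  const entries = new Map(); // key -> { value, expiresAt }

  return {
    set(key, value, ttlMs, now = Date.now()) {
      entries.set(key, { value, expiresAt: now + ttlMs });
    },

    // Lazy culling: an expired entry is deleted the next time it is read.
    get(key, now = Date.now()) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now) {
        entries.delete(key);
        return undefined;
      }
      return entry.value;
    },

    // Bulk culling: what a scheduled (cron-style) function would call.
    cullExpired(now = Date.now()) {
      let removed = 0;
      for (const [key, entry] of entries) {
        if (entry.expiresAt <= now) {
          entries.delete(key);
          removed++;
        }
      }
      return removed;
    },

    size() {
      return entries.size;
    },
  };
}
```

Lazy culling keeps reads correct on its own; the bulk pass only matters for keeping stored state (and its cost) from growing.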
r
if it's just about cleaning out old state for cost reasons, cron would be okay
if it's about clearing out keys to act as a debounce delay, i don't think that would work since the min increment would be 1 minute
j
Ah ok. This would only be needed to avoid wasteful processing on the Typesense side, so technically just fetching the latest data from Firestore should be sufficient to keep the two stores in sync... So to keep the extension simple (and cost effective to avoid additional Firestore data storage costs), maybe the extension can just implement the data fetch from Firestore, and then provide instructions on how to set up a debounce mechanism that users could choose to implement...
r
nice 👌 i'd love to see Google's take on this as well. They promote Algolia / Typesense as a "need Search?" solution but kinda skirt around this one 🙂
j
Haha! I'm actually surprised no one has brought this up before, not even with the Algolia extension, though it's clearly an issue with high-volume syncs
But yeah, will still ask Firebase team for their thoughts