I’m getting occasional 409 errors with the `ext-fi...
# community-help
o
I’m getting occasional 409 errors with the `ext-firestore-typesense-search-indexToTypesenseOnFirestoreWrite` cloud function. I don’t understand why this would throw an error, as it shouldn’t matter whether a document already exists when updating it… any idea what I’m doing wrong here?
j
Could you post the full log message with a few lines before and after?
o
Sorry this is a few months behind... Here's a recent instance of the error. The log says "creating document" and not the usual "upserting", which I'm sure is part of this problem, but it's unclear how. This is the log from document creation:
{
  "textPayload": "Creating document {\"measure_text_length\":343,\"rank\":0.5,\"text\":\"Indexers certainly have a very strong want for data determinism, they are constantly monitoring and worried about their data veracity. it's very time consuming\\nConsumers are also feeling this too since we've seen cases with \\\"wrong\\\" data returned by queries\\nBut no we have not had an attack like that yet that we know of, but what's your point?\",\"title\":\"slack\",\"type\":\"message\",\"url\":\"<https://blockscienceteam.slack.com/archives/C036NAWP0CT/p1677261752395249?cid=C036NAWP0CT&thread_ts=1676059661.243089>\",\"id\":\"aHR0cHM6Ly9ibG9ja3NjaWVuY2V0ZWFtLnNsYWNrLmNvbS9hcmNoaXZlcy9DMDM2TkFXUDBDVC9wMTY3NzI2MTc1MjM5NTI0OT9jaWQ9QzAzNk5BV1AwQ1QmdGhyZWFkX3RzPTE2NzYwNTk2NjEuMjQzMDg5\"}",
  "insertId": "63f8fbbb0005cfd5c189c11f",
  "resource": {
    "type": "cloud_function",
    "labels": {
      "project_id": "knowledge-management-333914",
      "region": "us-central1",
      "function_name": "ext-firestore-typesense-search-indexToTypesenseOnFirestoreWrite"
    }
  },
  "timestamp": "2023-02-24T18:02:35.380885Z",
  "severity": "DEBUG",
  "labels": {
    "instance_id": "00c61b117c5a189878a95fae39681518ed338d4fa826a35986e68469d2ec5b3070fede155da6333715bb7dfc26d0c4c0f650d7a4a775945d9115",
    "execution_id": "qu4q7bkf0lq7"
  },
  "logName": "projects/knowledge-management-333914/logs/cloudfunctions.googleapis.com%2Fcloud-functions",
  "trace": "projects/knowledge-management-333914/traces/a9c2419e85dd694ae3fe11d14d31bebf",
  "receiveTimestamp": "2023-02-24T18:02:35.518729400Z"
}
And this is the error:
{
  "textPayload": "Error: Request failed with HTTP code 409 | Server said: A document with id aHR0cHM6Ly9ibG9ja3NjaWVuY2V0ZWFtLnNsYWNrLmNvbS9hcmNoaXZlcy9DMDM2TkFXUDBDVC9wMTY3NzI2MTc1MjM5NTI0OT9jaWQ9QzAzNk5BV1AwQ1QmdGhyZWFkX3RzPTE2NzYwNTk2NjEuMjQzMDg5 already exists.\n    at ApiCall.customErrorForResponse (/workspace/node_modules/typesense/lib/Typesense/ApiCall.js:229:21)\n    at ApiCall.performRequest (/workspace/node_modules/typesense/lib/Typesense/ApiCall.js:118:48)\n    at processTicksAndRejections (internal/process/task_queues.js:95:5)",
  "insertId": "63f8fbbb000bdb5feae60206",
  "resource": {
    "type": "cloud_function",
    "labels": {
      "project_id": "knowledge-management-333914",
      "region": "us-central1",
      "function_name": "ext-firestore-typesense-search-indexToTypesenseOnFirestoreWrite"
    }
  },
  "timestamp": "2023-02-24T18:02:35.777055Z",
  "severity": "ERROR",
  "labels": {
    "instance_id": "00c61b117c5a189878a95fae39681518ed338d4fa826a35986e68469d2ec5b3070fede155da6333715bb7dfc26d0c4c0f650d7a4a775945d9115",
    "execution_id": "qu4q7bkf0lq7"
  },
  "logName": "projects/knowledge-management-333914/logs/cloudfunctions.googleapis.com%2Fcloud-functions",
  "trace": "projects/knowledge-management-333914/traces/a9c2419e85dd694ae3fe11d14d31bebf",
  "receiveTimestamp": "2023-02-24T18:02:35.852702927Z"
}
j
Hmm, so it looks like this document already exists in Typesense, but Firestore is calling the extension as a new document…
Does this document get created and then updated shortly after in Firestore?
o
In the above case I see a log for upserting and then one for creating the doc. In other cases, where creation precedes updating, it seems to run fine.
{
  "textPayload": "Upserting document {\"measure_text_length\":343,\"platform\":\"<http://blockscienceteam.slack.com|blockscienceteam.slack.com>\",\"rank\":0.5,\"text\":\"Indexers certainly have a very strong want for data determinism, they are constantly monitoring and worried about their data veracity. it's very time consuming\\nConsumers are also feeling this too since we've seen cases with \\\"wrong\\\" data returned by queries\\nBut no we have not had an attack like that yet that we know of, but what's your point?\",\"title\":\"slack\",\"type\":\"message\",\"url\":\"<https://blockscienceteam.slack.com/archives/C036NAWP0CT/p1677261752395249?cid=C036NAWP0CT&thread_ts=1676059661.243089>\",\"id\":\"aHR0cHM6Ly9ibG9ja3NjaWVuY2V0ZWFtLnNsYWNrLmNvbS9hcmNoaXZlcy9DMDM2TkFXUDBDVC9wMTY3NzI2MTc1MjM5NTI0OT9jaWQ9QzAzNk5BV1AwQ1QmdGhyZWFkX3RzPTE2NzYwNTk2NjEuMjQzMDg5\"}",
  "insertId": "63f8fbba00080662c369637e",
  "resource": {
    "type": "cloud_function",
    "labels": {
      "region": "us-central1",
      "function_name": "ext-firestore-typesense-search-indexToTypesenseOnFirestoreWrite",
      "project_id": "knowledge-management-333914"
    }
  },
  "timestamp": "2023-02-24T18:02:34.525922Z",
  "severity": "DEBUG",
  "labels": {
    "execution_id": "qu4q00yzm4k2",
    "instance_id": "00c61b117c5a189878a95fae39681518ed338d4fa826a35986e68469d2ec5b3070fede155da6333715bb7dfc26d0c4c0f650d7a4a775945d9115"
  },
  "logName": "projects/knowledge-management-333914/logs/cloudfunctions.googleapis.com%2Fcloud-functions",
  "trace": "projects/knowledge-management-333914/traces/88dc0c0598a37d19e7d0901e25dfeb9c",
  "receiveTimestamp": "2023-02-24T18:02:34.855731622Z"
}
^ that's about 300ms before the crash
j
I see… This is most probably a result of Firestore not triggering change events in order unfortunately: https://github.com/typesense/firestore-typesense-search/issues/32
So in your case the update event shows up before the create event
o
Yeah I wondered about that from the seemingly random probability of this error
j
Question for you: I was thinking of this as a solution:
One potential solution to this could be to query Firestore on each change trigger and push the latest version of the Firestore document to Typesense, instead of using the snapshot document from the event
But from a cost perspective, do you think doing these reads for each Firestore write is reasonable? I guess it will essentially increase write costs by 33%?
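A rough sketch of the "read the latest document on each trigger" idea, assuming in-memory maps as stand-ins for both stores. The names `firestore`, `typesense`, and `syncLatest` are hypothetical and are not the extension's code or either client library's API.

```typescript
type Doc = { id: string; [field: string]: unknown };

// Hypothetical stand-ins: `firestore` plays the primary database and
// `typesense` plays the search index. Neither is the real client library.
const firestore = new Map<string, Doc>();
const typesense = new Map<string, Doc>();

// Handler for any change event on `id`. It ignores the event's snapshot and
// copies whatever the primary store holds right now (one extra read per trigger).
function syncLatest(id: string): void {
  const latest = firestore.get(id);
  if (latest === undefined) {
    typesense.delete(id); // document no longer exists in the primary store
  } else {
    typesense.set(id, { ...latest });
  }
}

// Two writes land in the primary store: a create, then an update adding `platform`.
firestore.set("doc1", { id: "doc1", title: "slack" });
firestore.set("doc1", { id: "doc1", title: "slack", platform: "slack" });

// The two triggers fire afterwards, in either order. Both read the same latest state.
syncLatest("doc1"); // "update" trigger, delivered first
syncLatest("doc1"); // "create" trigger, delivered late

const synced = typesense.get("doc1");
```

In this model the late "create" trigger can no longer overwrite newer data, because it never uses its own payload. With real asynchronous handlers two reads and writes could still interleave, so this probably narrows the window rather than closing it completely.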
o
Hmmmm, hard to say, it feels very much like a workaround. It would probably work for my case but only because it's not already a high-cost function.
Is there a need to separate upserting and creation on the firestore side? Would it be reasonable to have a single call for both? An upsert-or-create-if-it-doesnt-exist kinda thing...
j
Yeah, that would stop the 409 error from showing up… But then if the create shows up after the update, the document will be stale in Typesense, since the create event carries a stale version of the doc (as it was when created) in its payload
o
Wouldn't it just mean you'd have two update_or_create calls? So the order of events in my case would be a bit like:
1. document added to firestore
2. TS update_or_create triggered -> TS document doesn't exist, creates it
3. document updated with "platform" field
4. TS update_or_create triggered -> TS document exists, updates it
Ahhh... I see what you mean now...
Because you're using the snapshot from the event...
j
Right
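To make the two failure modes concrete, here is a small simulation, assuming an in-memory `FakeIndex` class as a hypothetical stand-in for a Typesense collection (it is not the real client). It replays the thread's scenario: the update event is delivered before the create event.

```typescript
type Doc = { id: string; [field: string]: unknown };

// Hypothetical in-memory stand-in for a search index.
class FakeIndex {
  private docs = new Map<string, Doc>();

  // Strict create: fails if the id is already present, like the 409 in the logs.
  create(doc: Doc): void {
    if (this.docs.has(doc.id)) {
      throw new Error(`409: A document with id ${doc.id} already exists.`);
    }
    this.docs.set(doc.id, doc);
  }

  // Upsert: replaces whatever is stored under the id.
  upsert(doc: Doc): void {
    this.docs.set(doc.id, doc);
  }

  get(id: string): Doc | undefined {
    return this.docs.get(id);
  }
}

// Snapshots carried by the two events, in the order the writes actually happened.
const createdSnapshot: Doc = { id: "doc1", title: "slack" };
const updatedSnapshot: Doc = { id: "doc1", title: "slack", platform: "slack" };

// Case A: out-of-order delivery with a strict create on the create event.
const strictIndex = new FakeIndex();
strictIndex.upsert(updatedSnapshot); // update event arrives first
let strictError = "";
try {
  strictIndex.create(createdSnapshot); // create event arrives second
} catch (e) {
  strictError = (e as Error).message;
}

// Case B: both events upsert, so there is no error, but the older snapshot wins.
const upsertIndex = new FakeIndex();
upsertIndex.upsert(updatedSnapshot);
upsertIndex.upsert(createdSnapshot);
const hasPlatform = "platform" in (upsertIndex.get("doc1") ?? {});
```

Case A ends with the correct document plus a loud error. Case B ends quietly with the `platform` field missing, which is the stale-data outcome described above.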
o
The other solution mentioned on the GH issue, does that sync state need to be stored in firestore? If write triggers have an accurate timestamp, can that be used to order TS updates?
j
We still need to store the state of what the last synced timestamp is for each document, to be able to know whether to discard an event
o
To avoid redundant calls to typesense?
j
To prevent processing stale events
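A minimal sketch of the "last synced timestamp per document" idea, assuming each event carries a trustworthy `updatedAt` value. The `ChangeEvent` shape and the `TimestampGuardedSync` class are hypothetical. The state lives in memory here only to keep the example short; function instances don't share memory, so a real version would have to persist it somewhere, which is the storage cost discussed in this thread.

```typescript
type ChangeEvent = { id: string; updatedAt: number; data: Record<string, unknown> };

class TimestampGuardedSync {
  // Last applied timestamp per document id (would need durable storage in practice).
  private lastSynced = new Map<string, number>();
  // Stand-in for the search index.
  readonly index = new Map<string, Record<string, unknown>>();

  // Returns true if the event was applied, false if it was discarded as stale.
  apply(event: ChangeEvent): boolean {
    const seen = this.lastSynced.get(event.id);
    if (seen !== undefined && event.updatedAt <= seen) {
      return false; // an equal or newer event was already applied
    }
    this.index.set(event.id, event.data);
    this.lastSynced.set(event.id, event.updatedAt);
    return true;
  }
}

const guarded = new TimestampGuardedSync();
// The update (t=2) is delivered before the create (t=1).
const appliedUpdate = guarded.apply({ id: "doc1", updatedAt: 2, data: { title: "slack", platform: "slack" } });
const appliedCreate = guarded.apply({ id: "doc1", updatedAt: 1, data: { title: "slack" } });
```

The late create event is dropped instead of overwriting the newer document, so the index keeps the `platform` field.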
o
Right
Hard to see a perfect solution here... Just different cost tradeoffs
j
Unfortunately yeah… That’s why I didn’t make this change. Figured if this is an issue for a particular use case, the 409 error will at least be a hint that this is happening
And then a good (maybe $$) solution could be to do something like this using a set of custom functions: https://typesense.org/docs/guide/syncing-data-into-typesense.html#polling-your-primary-database
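As a loose illustration of the polling approach, assuming every record has an `updatedAt` field and that a scheduler calls the function periodically: `Row` and `pollOnce` are hypothetical names, and the array and map stand in for the primary database and the search index.

```typescript
type Row = { id: string; updatedAt: number; data: Record<string, unknown> };

// Copies every row changed after `since` into the index and returns the new cursor.
function pollOnce(
  rows: Row[],
  index: Map<string, Record<string, unknown>>,
  since: number
): number {
  let cursor = since;
  for (const row of rows) {
    if (row.updatedAt > since) {
      index.set(row.id, row.data); // always the current state, never an event snapshot
      cursor = Math.max(cursor, row.updatedAt);
    }
  }
  return cursor;
}

const primary: Row[] = [
  { id: "doc1", updatedAt: 5, data: { title: "slack", platform: "slack" } },
  { id: "doc2", updatedAt: 2, data: { title: "older" } },
];
const searchIndex = new Map<string, Record<string, unknown>>();

const firstCursor = pollOnce(primary, searchIndex, 0); // picks up both rows
const secondCursor = pollOnce(primary, searchIndex, firstCursor); // nothing new to copy
```

Because each poll reads current state, event ordering stops mattering. The trade-off is that the query runs even when nothing has changed.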
o
So the options are:
• pull from firestore directly instead of using the snapshot -> increase write costs
• track sync state in a collection -> increased storage usage (and still need more than just the snapshot anyway? I guess this requires the first option too)
• periodically sync -> increased cost from queries and storage and unneeded periodic checks when no activity is happening?
For me personally pulling from firestore directly would be fine
I don't know that any solution is perfect but I wonder if there's a good option to expose in the extension config
j
I think #1 is the simplest to expose as a config option…
#2 also increases storage costs, so #1 might be better
#3 requires a completely different extension, since it doesn’t rely on triggers
o
Yeah and I suppose #3 can't be automatically configured as an extension afaik and would require extra action to set up periodic events/triggers in GCP?
#1 would be a nice and simple option for sure
So with these out-of-order issues does that mean that some of the documents in TS will be out-of-date? Perhaps not in my case as quick subsequent writes only happen after initial creation. But if you had an existing document with two quick writes you can end up with stale data on TS?
j
Correct
o
Gotcha, I would definitely turn on a config option for #1 if it was there. Not because it's vitally important to be in perfect sync for users but because it makes debugging so much easier when the Firestore-TS link is 100% predictable. Then I know a bug has to be my fault 🤣
j
Haha!
Could you summarize this config flag in that Github issue? I can take a stab at it in the coming weeks
o
Yeah sure, if I get a chance I can try and create a PR, seems like it should be a simple change looking at the src now
j
A PR would be awesome!
o
Got any links to help test firestore extensions? Or I can just PR and cross my fingers (or you could test it on your end)
j
There’s also a set of commands to enable extensions in the emulator. The CI config shows these: https://github.com/typesense/firestore-typesense-search/blob/4e46be1b35494d11dda7c5c1a799e9be227d35e6/.circleci/config.yml
o
Do you think the flag should be "read from firestore" (off by default) or the opposite, "read from event" (on by default)?
^ flag names tbd of course
@Jason Bosco I created a very quick PR that I haven't had a chance to test (also never touched firestore extensions, etc, etc) but figured it worth getting the ball rolling.
If no one picks it up I will get to testing, but I don't know when I'll have a sec to do that.
j
Thank you! I’ll take a look this coming week or early week after.