Is there any way in Typesense to help with race conditions? (Typesense #community-help)

micha

05/19/2022, 2:51 PM

is there any way in typesense to help with race conditions? for example, i insert documents based on an async queue processing. My documents will all have an “updated_at” field. I would like typesense to discard inserts if the updated_at field of the payload (that updates the document) is lower than or equal to that of the document that is already stored in the collection. This would help me make my implementation more robust (i.e. some job weirdly comes in late or is retried for whatever reason and tries to update the document in typesense with stale/outdated data)

Jason Bosco

05/19/2022, 2:55 PM

Typesense doesn't have a mechanism for this, but you could implement this on your side by maybe maintaining an enqueue timestamp in your async job, and then when executing the job, if the difference between the enqueue timestamp and current timestamp is over a threshold, discard it? Or you could fetch the document from Typesense before write, and only do an update if updated_at is over a threshold

micha

05/19/2022, 2:56 PM

i was thinking about fetching the document before insert as well but this does not really scale well and also it doesn’t prevent the problem from happening but only reduces the chances (between fetching and inserting it could have still been updated)
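The gap described here can be made concrete with a small simulation. This is only a sketch: `store`, `fetch`, `write`, and `is_newer` are hypothetical in-memory stand-ins, not Typesense client calls, and the interleaving is forced by hand to show one unlucky ordering of two jobs.

```python
# In-memory stand-in for a collection: id -> document (not a Typesense client).
store = {"doc1": {"id": "doc1", "title": "v1", "updated_at": 1}}


def fetch(doc_id):
    """Return a snapshot copy of the stored document."""
    return dict(store[doc_id])


def write(doc):
    """Overwrite the stored document unconditionally."""
    store[doc["id"]] = dict(doc)


def is_newer(incoming, current):
    """Client-side staleness check against a previously fetched snapshot."""
    return incoming["updated_at"] > current["updated_at"]


job_a = {"id": "doc1", "title": "v2", "updated_at": 2}  # late / retried job
job_b = {"id": "doc1", "title": "v3", "updated_at": 3}  # most recent job

# Unlucky interleaving: both jobs fetch before either one writes.
seen_by_a = fetch("doc1")
seen_by_b = fetch("doc1")

if is_newer(job_b, seen_by_b):
    write(job_b)  # store now holds updated_at=3
if is_newer(job_a, seen_by_a):
    write(job_a)  # check passed against the old snapshot, so stale data wins

print(store["doc1"])
```

Both checks pass because each job compared against the same old snapshot, so the store ends up holding the older payload.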

micha

05/19/2022, 2:57 PM

the thing with the enqueued timestamp is - if i understand you correctly, that relies on a lot of assumptions and just because the queue job is “old” doesn’t mean it’s wrong.
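A minimal sketch of the enqueue-timestamp idea and the objection to it. The threshold value, the `too_old` helper, and the job layout are all assumptions made up for illustration; nothing here is a Typesense or queue-library API.

```python
import time

MAX_JOB_AGE_SECONDS = 300.0  # hypothetical threshold; would need tuning


def too_old(enqueued_at, now=None, max_age=MAX_JOB_AGE_SECONDS):
    """Return True when the job has waited longer than max_age seconds."""
    if now is None:
        now = time.time()
    return (now - enqueued_at) > max_age


# A retried job can be old and still carry the newest data.
stored = {"id": "doc1", "updated_at": 10}
job = {"enqueued_at": 1000.0, "doc": {"id": "doc1", "updated_at": 20}}

discarded = too_old(job["enqueued_at"], now=2000.0)
is_actually_newer = job["doc"]["updated_at"] > stored["updated_at"]
print(discarded, is_actually_newer)
```

In this made-up case the age check discards a job whose payload is newer than what is stored, which is the kind of wrong assumption being pointed out.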

micha

05/19/2022, 2:57 PM

i think the only thing that truly could guarantee it would be typesense

micha

05/19/2022, 2:58 PM

i mean ofc i could start creating a locking mechanism in the queue between fetching the document and updating it and prevent other jobs until the lock is released but would love if i could offload this to typesense 😛

Jason Bosco

05/19/2022, 2:58 PM

Yeah, true. All other mechanisms have tiny gaps that could still lead to atomicity issues...

micha

05/19/2022, 2:59 PM

locking on the queue also leads to extra complexity in case some job crashes and can not release the lock anymore, i could easily end up deadlocking myself

Jason Bosco

05/19/2022, 3:00 PM

I think if we had a SQL-like UPDATE WHERE mechanism that would help

Jason Bosco

05/19/2022, 3:00 PM

So you could do UPDATE x WHERE updated_at > sometimestamp

micha

05/19/2022, 3:02 PM

but sometimestamp would still require me to fetch the current document's timestamp which introduces state on the queue-job which leads to all these atomicity issues again, no?

micha

05/19/2022, 3:03 PM

i think the better analogy would be a SELECT FOR UPDATE + UPDATE - which would create a lock on mysql/typesense side

micha

05/19/2022, 3:05 PM

but i think that would be way more complex and less performant than if i could define some “stale-indicator” field (int) in the collection and the update happens only if this value is greater than the existing one. Then the “lock”/async timing problem is at least reduced to typesense and its nodes and not involving external systems with uncertain timing/availability etc.

Jason Bosco

05/19/2022, 3:06 PM

Let's say the typesense doc has an updated_at of 1
And the doc on your side has an updated_at of 2
If you do UPDATE ... WHERE typesense_doc.updated_at < yourdoc.updated_at (You should have yourdoc.updated_at in your DB right), this should go through
---
Let's say the typesense doc has an updated_at of 5
And the doc on your side has an updated_at of 3
If you do UPDATE ... WHERE typesense_doc.updated_at < yourdoc.updated_at
The condition evaluates to false and so the update won't go through.
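The two cases above can be sketched as a conditional update where the comparison and the write happen under one lock. This is a toy in-memory model under stated assumptions: the `Collection` class and `update_if_newer` method are hypothetical names, and it says nothing about how Typesense would implement this internally.

```python
import threading


class Collection:
    """Toy in-memory store; a stand-in, not Typesense's implementation."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def update_if_newer(self, doc):
        """Write doc only if the stored updated_at is lower than doc's.

        The check and the write share one lock, so no other update can
        slip in between them.
        """
        with self._lock:
            current = self._docs.get(doc["id"])
            if current is not None and not (current["updated_at"] < doc["updated_at"]):
                return False  # WHERE condition is false: skip the update
            self._docs[doc["id"]] = dict(doc)
            return True

    def get(self, doc_id):
        with self._lock:
            return dict(self._docs[doc_id])


col = Collection()
col.update_if_newer({"id": "a", "updated_at": 1})
first = col.update_if_newer({"id": "a", "updated_at": 2})   # 1 < 2: applied
col.update_if_newer({"id": "a", "updated_at": 5})
second = col.update_if_newer({"id": "a", "updated_at": 3})  # 5 < 3 is false: rejected
print(first, second, col.get("a"))
```

Because the caller only sends its own `updated_at`, it never needs to fetch the stored value first.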

Jason Bosco

05/19/2022, 3:07 PM

When evaluating the WHERE condition, we're essentially doing an implied SELECT behind the scenes

micha

05/19/2022, 3:09 PM

and typesense would lock the document for other PATCH operations in between the check and the actual update?

Jason Bosco

05/19/2022, 3:09 PM

Correct, it would be atomic

💙 1

micha

05/19/2022, 3:10 PM

and the where statement would probably be a header field right?

micha

05/19/2022, 3:11 PM

i wonder if this increases complexity because then consumers want to have boolean operators and all kinds of other stuff for their where queries

Jason Bosco

05/19/2022, 3:11 PM

It would essentially be a filter_by query param on the update endpoint, very similar to the search and export endpoints

micha

05/19/2022, 3:11 PM

but it’s an interesting concept, that in theory would be more flexible for clients. i think i like it. so basically i would send my document with its updated_at timestamp and a filter_by that uses the same value as the document has. nice
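As a rough sketch of what such a request might look like, the snippet below only builds the URL for a PATCH on a single document with a filter_by query param. The endpoint accepting filter_by on update is the idea under discussion here, not a confirmed API, and the host, collection name, document values, and the `conditional_update_url` helper are made up for illustration.

```python
from urllib.parse import quote, urlencode


def conditional_update_url(base_url, collection, doc):
    """Build a hypothetical update URL guarded by the doc's own updated_at."""
    # Typesense filter syntax for "stored updated_at is lower than ours".
    filter_by = f"updated_at:<{doc['updated_at']}"
    query = urlencode({"filter_by": filter_by})
    path = f"/collections/{quote(collection)}/documents/{quote(str(doc['id']))}"
    return f"{base_url}{path}?{query}"


# Hypothetical document; the same updated_at goes into the body and the filter.
doc = {"id": "124", "title": "Shoe", "updated_at": 1652972400}
url = conditional_update_url("http://localhost:8108", "products", doc)
print(url)
```

The document itself would go in the PATCH body, so the timestamp is sent twice: once as data and once as the condition.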

micha

05/19/2022, 3:13 PM

and if someone PATCHes the document while typesense has it locked, the request will be declined instead of waiting, right?

Jason Bosco

05/19/2022, 3:13 PM

It would block (wait) that API request, until the lock is released

micha

05/19/2022, 3:14 PM

ok that will get more tricky i guess because timeouts and stuff but yeah.

micha

05/19/2022, 3:15 PM

thanks, that would be amazing. is there anything you need from me to make this a reality? a github issue?

Jason Bosco

05/19/2022, 3:16 PM

Yes! Could you summarize this use-case on this issue (specifically the atomicity part): https://github.com/typesense/typesense/issues/496

micha

05/19/2022, 3:19 PM

you sure we wanna recycle the issue? i understand it would use the same mechanism but i’m not sure the original requester here would need the atomic support. So do you think UPDATE WHERE should always be atomic or maybe for performance reasons it should be configurable so clients that don’t mind inconsistency (if that is ever the case) could disable the behavior?

Jason Bosco

05/19/2022, 3:20 PM

Yeah we'd have to think through the atomicity piece when we implement UPDATE WHERE, vs doing it separately and having to rework things later.

micha

05/19/2022, 3:20 PM

ok as you wish - will comment on the issue! thanks!

👍 1

micha

05/19/2022, 3:34 PM

for posterity: link to the github comment created

🙏 1

micha

05/19/2022, 3:35 PM

(i still think atomic updates and mass-updates are two separate concerns/issues that could be tackled independently of each other but i hope this will make it at some point and not go under the original idea 🙂)

Jason Bosco

05/19/2022, 3:39 PM

There would definitely be more work involved to handle atomicity. But if we can't tackle that in that issue when we implement it, I'll create a separate one to track atomicity. But I'd at least want to evaluate doing it together

micha

05/19/2022, 3:40 PM

awesome, thanks for the context!

👍 1
