is there any way in typesense to help with race co...
# community-help
m
is there any way in typesense to help with race conditions? for example, i insert documents based on an async queue processing. My documents will all have an “updated_at” field. I would like typesense to discard inserts if the updated_at field of the payload (that updates the document) is lower than or equal to that of the document that is already stored in the collection. This would help me make my implementation more robust (i.e. some job weirdly comes in late or is retried for whatever reason and tries to update the document in typesense with stale/outdated data)
j
Typesense doesn't have a mechanism for this, but you could implement this on your side by maybe maintaining an enqueue timestamp in your async job, and then when executing the job, if the difference between the enqueue timestamp and current timestamp is over a threshold, discard it? Or you could fetch the document from Typesense before write, and only do an update if updated_at is over a threshold
m
i was thinking about fetching the document before insert as well but this does not really scale well and also it doesn’t prevent the problem from happening but only reduces the chances (between fetching and inserting it could have still been updated)
the thing with the enqueued timestamp is - if i understand you correctly, that relies on a lot of assumptions and just because the queue job is “old” doesn’t mean it’s wrong.
i think the only thing that truly could guarantee it would be typesense
i mean ofc i could start creating a locking mechanism in the queue between fetching the document and updating it and prevent other jobs until the lock is released but would love if i could offload this to typesense 😛
j
Yeah, true. All other mechanisms have tiny gaps that could still lead to atomicity issues...
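To make the "tiny gap" concrete, here is a minimal sketch of the fetch-before-write approach using an in-memory dict as a stand-in for a collection. Everything in it (`store`, `fetch`, `write`, the two jobs) is hypothetical and only illustrates the interleaving; it does not call Typesense.

```python
# In-memory stand-in for a collection (hypothetical, no network calls).
store = {"doc1": {"id": "doc1", "title": "v1", "updated_at": 1}}

def fetch(doc_id):
    """Return a copy of the stored document."""
    return dict(store[doc_id])

def write(doc):
    """Overwrite the stored document unconditionally."""
    store[doc["id"]] = dict(doc)

def is_newer(payload, current):
    """Client-side staleness check done between fetch and write."""
    return payload["updated_at"] > current["updated_at"]

# Two queue jobs carrying different versions of the same document.
stale_job = {"id": "doc1", "title": "v2", "updated_at": 2}
fresh_job = {"id": "doc1", "title": "v3", "updated_at": 3}

# Deterministic interleaving: both jobs fetch before either one writes.
seen_by_stale = fetch("doc1")
seen_by_fresh = fetch("doc1")

if is_newer(fresh_job, seen_by_fresh):
    write(fresh_job)
# The stale job still compares against the snapshot with updated_at=1,
# so its check passes and it overwrites the fresher document.
if is_newer(stale_job, seen_by_stale):
    write(stale_job)

final_doc = store["doc1"]
print(final_doc)
```

Both checks pass because each job compares against its own earlier snapshot, so the older payload ends up stored last.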
m
locking on the queue also leads to extra complexity in case some job crashes and can not release the lock anymore, i could easily end up deadlocking myself
j
I think if we had a SQL-like UPDATE WHERE mechanism that would help
So you could do UPDATE x WHERE updated_at > sometimestamp
m
but sometimestamp would still require me to fetch the current document’s timestamp which introduces state on the queue-job which leads to all these atomicity issues again, no?
i think the better analogy would be a SELECT FOR UPDATE + UPDATE - which would create a lock on mysql/typesense side
but i think that would be way more complex and less performant than if i could define some “stale-indicator” field (int) in the collection and the update happens only if this value is greater than the existing one. Then the “lock”/async timing problem is at least reduced to typesense and its nodes and not involving external systems with uncertain timing/availability etc.
j
Let's say the typesense doc has an updated_at of 1 and the doc on your side has an updated_at of 2. If you do UPDATE ... WHERE typesense_doc.updated_at < yourdoc.updated_at (you should have yourdoc.updated_at in your DB, right?), this should go through.
---
Let's say the typesense doc has an updated_at of 5 and the doc on your side has an updated_at of 3. If you do UPDATE ... WHERE typesense_doc.updated_at < yourdoc.updated_at, the condition evaluates to false and so the update won't go through.
When evaluating the WHERE condition, we're essentially doing an implied SELECT behind the scenes
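The two cases above can be sketched as an in-memory conditional update where the comparison and the write happen under one lock. This is only an illustration of the proposed semantics; the `Collection` class and `update_if_newer` are hypothetical names, not Typesense APIs, and a missing document is treated as "no row matched" like a SQL UPDATE.

```python
import threading

class Collection:
    """Hypothetical in-memory model of UPDATE ... WHERE stored.updated_at < payload.updated_at."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc):
        with self._lock:
            self._docs[doc["id"]] = dict(doc)

    def update_if_newer(self, doc):
        """Return True if the update was applied, False if it was discarded."""
        with self._lock:  # check and write form one atomic step
            current = self._docs.get(doc["id"])
            if current is None:
                return False  # nothing matched the WHERE clause
            if not current["updated_at"] < doc["updated_at"]:
                return False  # payload is stale or equal: discard
            self._docs[doc["id"]] = dict(doc)
            return True

    def get(self, doc_id):
        with self._lock:
            return dict(self._docs[doc_id])

collection = Collection()

# Case 1: stored updated_at is 1, payload has 2 -> goes through.
collection.insert({"id": "a", "updated_at": 1})
first_applied = collection.update_if_newer({"id": "a", "updated_at": 2})

# Case 2: stored updated_at is 5, payload has 3 -> discarded.
collection.insert({"id": "b", "updated_at": 5})
second_applied = collection.update_if_newer({"id": "b", "updated_at": 3})

print(first_applied, collection.get("a"))
print(second_applied, collection.get("b"))
```

Because the caller never reads the stored timestamp itself, the queue job stays stateless; only the store compares the two values.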
m
and typesense would lock the document for other PATCH operations in between the check and the actual update?
j
Correct, it would be atomic
m
and the where statement would probably be a header field right?
i wonder if this increases complexity because then consumers want to have boolean operators and all kinds of other stuff for their where queries
j
It would essentially be a filter_by query param on the update endpoint, very similar to the search and export endpoints
m
but it’s an interesting concept, that in theory would be more flexible for clients. i think i like it. so basically i would send my document with its updated_at timestamp and a filter_by that uses the same value as the document has. nice
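As a rough sketch of what such a request could look like, the snippet below only builds the method, URL and body; it sends nothing. The `filter_by` parameter on a document update is the proposal from this thread, so treat it as an assumption. The host, collection name and document id are placeholders, and `build_conditional_update` is a hypothetical helper.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

def build_conditional_update(base_url, collection, doc):
    """Build (method, url, body) for an update that should only apply
    when the stored updated_at is lower than the payload's.

    Assumption: the update endpoint accepts a filter_by query param,
    as proposed in this conversation.
    """
    # Same value as the document's own updated_at, used as the guard.
    query = urlencode({"filter_by": f"updated_at:<{doc['updated_at']}"})
    path = f"/collections/{quote(collection)}/documents/{quote(str(doc['id']))}"
    return "PATCH", f"{base_url}{path}?{query}", json.dumps(doc)

# Placeholder values for illustration only.
doc = {"id": "124", "title": "new title", "updated_at": 1700000000}
method, url, body = build_conditional_update("http://localhost:8108", "products", doc)

print(method, url)
print(parse_qs(urlsplit(url).query))
```

The guard and the payload carry the same timestamp, so a retried or late job can be replayed safely without the job having to know what is currently stored.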
and if someone PATCHes the document while typesense has it locked, the request will be declined instead of waiting, right?
j
It would block (wait) that API request, until the lock is released
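The difference between declining and blocking can be illustrated with a plain Python lock. This says nothing about how Typesense would implement it; `doc_lock` is a hypothetical per-document lock and the 0.05 second timeout is an arbitrary value chosen for the demo.

```python
import threading

doc_lock = threading.Lock()

# Simulate another request currently holding the document lock.
doc_lock.acquire()

# "Decline" behaviour: a non-blocking attempt fails immediately.
declined = not doc_lock.acquire(blocking=False)

# "Block" behaviour with a deadline: wait, then give up if still locked.
timed_out = not doc_lock.acquire(timeout=0.05)

# Once the holder releases, a waiting request gets the lock.
doc_lock.release()
acquired_after_release = doc_lock.acquire(timeout=0.05)
doc_lock.release()

print(declined, timed_out, acquired_after_release)
```

A blocking design keeps clients simple, but the waiting request then needs some upper bound on how long it waits, which is where client and server timeouts come into play.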
m
ok that will get more tricky i guess because timeouts and stuff but yeah.
thanks, that would be amazing. is there anything you need from me to make this a reality? a github issue?
j
Yes! Could you summarize this use-case on this issue (specifically the atomicity part): https://github.com/typesense/typesense/issues/496
m
you sure we wanna recycle the issue? i understand it would use the same mechanism but i’m not sure the original requester here would need the atomic support. So do you think UPDATE WHERE should always be atomic or maybe for performance reasons it should be configurable so clients that don’t mind inconsistency (if that is ever the case) could disable the behavior?
j
Yeah we'd have to think through the atomicity piece when we implement UPDATE WHERE, vs doing it separately and having to rework things later.
m
ok as you wish - will comment on the issue! thanks!
(i still think atomic updates and mass-updates are two separate concerns/issues that could be tackled independently of each other but i hope this will make it at some point and not go under the original idea 🙂)
j
There would definitely be more work involved to handle atomicity. But if we can't tackle that in that issue when we implement it, I'll create a separate one to track atomicity. But I'd at least want to evaluate doing it together
m
awesome, thanks for the context!