#community-help

Handling Race Conditions in Typesense with Conditional Updates

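TLDR micha asked whether Typesense can help with race conditions when late or retried async jobs try to write stale data. Jason proposed a SQL-like UPDATE ... WHERE mechanism, exposed as a filter_by query param on the update endpoint, that would check and write atomically, which was well-received by micha. micha agreed to summarize the use case on the existing GitHub issue #496.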

May 19, 2022 (20 months ago)
micha
02:51 PM
is there any way in typesense to help with race conditions?

for example, i insert documents based on async queue processing. My documents will all have an “updated_at” field. I would like typesense to discard inserts if the updated_at field of the payload (that updates the document) is lower than or equal to that of the document that is already stored in the collection.

This would help me make my implementation more robust (i.e. some job weirdly comes in late or is retried for whatever reason and tries to update the document in typesense with stale/outdated data).
Jason
02:55 PM
Typesense doesn't have a mechanism for this, but you could implement this on your side by maybe maintaining an enqueue timestamp in your async job, and then when executing the job, if the difference between the enqueue timestamp and current timestamp is over a threshold, discard it?

Or you could fetch the document from Typesense before write, and only do an update if updated_at is over a threshold
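A minimal sketch of the enqueue-timestamp idea from the message above, assuming each job carries the time it was enqueued. The names `is_too_old` and `MAX_JOB_AGE_SECONDS` and the 300-second threshold are hypothetical, not part of Typesense or any queue library.

```python
import time

# Hypothetical threshold; the right value depends on your queue and workload.
MAX_JOB_AGE_SECONDS = 300


def is_too_old(enqueued_at, now=None, max_age=MAX_JOB_AGE_SECONDS):
    """Return True when the job has waited longer than max_age seconds."""
    if now is None:
        now = time.time()
    return (now - enqueued_at) > max_age


# A worker would skip the Typesense write when this returns True.
job = {"doc_id": "doc1", "enqueued_at": time.time() - 10}
should_skip = is_too_old(job["enqueued_at"])
```

This only measures how long a job waited, not whether its payload is actually older than what is stored.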
micha
02:56 PM
i was thinking about fetching the document before insert as well but this does not really scale well and also it doesn’t prevent the problem from happening but only reduces the chances (between fetching and inserting it could have still been updated)
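The gap described in this message can be sketched with a plain dictionary standing in for the collection. Everything here (`store`, `fetch`, `write`, the job payloads) is a hypothetical stand-in, and the interleaving is written out by hand rather than produced by real concurrency.

```python
# In-memory stand-in for a collection; not a Typesense client.
store = {"doc1": {"id": "doc1", "updated_at": 1, "title": "v1"}}


def fetch(doc_id):
    return dict(store[doc_id])


def write(doc):
    store[doc["id"]] = dict(doc)


def is_newer(incoming, current):
    return incoming["updated_at"] > current["updated_at"]


job_a = {"id": "doc1", "updated_at": 2, "title": "v2"}  # older payload, runs late
job_b = {"id": "doc1", "updated_at": 3, "title": "v3"}  # newest payload

# Both jobs fetch before either one writes, so both see updated_at == 1.
seen_by_a = fetch("doc1")
seen_by_b = fetch("doc1")

# Job B passes its check and writes the newest data.
if is_newer(job_b, seen_by_b):
    write(job_b)

# Job A's check still passes against its outdated snapshot,
# so it overwrites the newer document with older data.
if is_newer(job_a, seen_by_a):
    write(job_a)

final_doc = store["doc1"]
```

After this interleaving the stored document carries `updated_at` 2, even though a payload with `updated_at` 3 was written first.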
micha
02:57 PM
the thing with the enqueued timestamp is - if i understand you correctly, that relies on a lot of assumptions and just because the queue job is “old” doesn’t mean it’s wrong.
micha
02:57 PM
i think the only thing that truly could guarantee it would be typesense
micha
02:58 PM
i mean ofc i could start creating a locking mechanism in the queue between fetching the document and updating it and prevent other jobs until the lock is released but would love if i could offload this to typesense 😛
Jason
02:58 PM
Yeah, true. All other mechanisms have tiny gaps that could still lead to atomicity issues...
micha
02:59 PM
locking on the queue also leads to extra complexity in case some job crashes and can not release the lock anymore, i could easily end up deadlocking myself
Jason
03:00 PM
I think if we had a SQL-like UPDATE WHERE mechanism that would help
Jason
03:00 PM
So you could do UPDATE x WHERE updated_at > sometimestamp
micha
03:02 PM
but sometimestamp would still require me to fetch the current document's timestamp which introduces state on the queue-job which leads to all these atomicity issues again, no?
micha
03:03 PM
i think the better analogy would be a SELECT FOR UPDATE + UPDATE - which would create a lock on mysql/typesense side
micha
03:05 PM
but i think that would be way more complex and less performant than if i could define some “stale-indicator” field (int) in the collection and the update happens only if this value is greater than the existing one. Then the “lock”/async timing problem is at least reduced to typesense and its nodes and not involving external systems with uncertain timing/availability etc.
Jason
03:06 PM
Let's say the typesense doc has an updated_at of 1
And the doc on your side has an updated_at of 2

If you do UPDATE ... WHERE typesense_doc.updated_at < yourdoc.updated_at (You should have yourdoc.updated_at in your DB right), this should go through

---

Let's say the typesense doc has an updated_at of 5
And the doc on your side has an updated_at of 3

If you do UPDATE ... WHERE typesense_doc.updated_at < yourdoc.updated_at

The condition evaluates to false and so the update won't go through.
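The two cases above can be sketched as a compare-and-write that runs under a single lock, so the check and the write cannot be separated by another writer. This is an in-memory illustration only: `InMemoryCollection` and `update_if_newer` are hypothetical names, and nothing here reflects how Typesense would implement the feature.

```python
import threading


class InMemoryCollection:
    """Hypothetical stand-in for a collection with a conditional update."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def update_if_newer(self, doc):
        """Write doc only if stored.updated_at < doc.updated_at (or no doc is stored)."""
        with self._lock:  # check and write happen as one step
            current = self._docs.get(doc["id"])
            if current is not None and not (current["updated_at"] < doc["updated_at"]):
                return False  # condition is false, update does not go through
            self._docs[doc["id"]] = dict(doc)
            return True

    def get(self, doc_id):
        with self._lock:
            doc = self._docs.get(doc_id)
            return dict(doc) if doc is not None else None


col = InMemoryCollection()
col.update_if_newer({"id": "a", "updated_at": 1})

# Stored 1, incoming 2: the update goes through.
first_case = col.update_if_newer({"id": "a", "updated_at": 2})

# Stored 5, incoming 3: the update is rejected.
col.update_if_newer({"id": "a", "updated_at": 5})
second_case = col.update_if_newer({"id": "a", "updated_at": 3})
```

Because the caller only sends its own `updated_at`, it never needs to fetch the stored value first.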
Jason
03:07 PM
When evaluating the WHERE condition, we're essentially doing an implied SELECT behind the scenes
micha
03:09 PM
and typesense would lock the document for other PATCH operations in between the check and the actual update?
Jason
03:09 PM
Correct, it would be atomic
micha
03:10 PM
and the where statement would probably be a header field right?
micha
03:11 PM
i wonder if this increases complexity because then consumers want to have boolean operators and all kinds of other stuff for their where queries
Jason
03:11 PM
It would essentially be a filter_by query param on the update endpoint, very similar to the search and export endpoints
micha
03:11 PM
but it’s an interesting concept, that in theory would be more flexible for clients. i think i like it. so basically i would send my document with its updated_at timestamp and a filter_by that uses the same value as the document has. nice
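A rough sketch of what such a request could look like, assuming the proposal in this thread: the document's own `updated_at` is reused in a `filter_by` query param on the single-document update endpoint. The filter uses Typesense's existing numeric comparison syntax (`field:<value`), but a `filter_by` on this endpoint is only the idea being discussed here, and `conditional_update_path` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def conditional_update_path(collection, doc, timestamp_field="updated_at"):
    """Build the path and query string for a hypothetical conditional PATCH."""
    # Only update when the stored timestamp is lower than the one being sent.
    filter_by = f"{timestamp_field}:<{doc[timestamp_field]}"
    query = urlencode({"filter_by": filter_by})
    path = "/collections/{}/documents/{}".format(
        quote(collection, safe=""), quote(str(doc["id"]), safe="")
    )
    return f"{path}?{query}"


doc = {"id": "42", "updated_at": 1652972000, "title": "New title"}
request_path = conditional_update_path("products", doc)
```

The document body itself would be sent as the PATCH payload; only the condition travels in the query string.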
micha
03:13 PM
and if someone PATCHes the document while typesense has it locked, the request will be declined instead of waiting, right?
Jason
03:13 PM
It would block (wait) that API request, until the lock is released
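The difference between declining and waiting can be illustrated with Python's `threading.Lock`, which supports both a non-blocking attempt and a bounded wait. This is only an analogy for the behavior discussed here; `doc_lock` and `try_update` are hypothetical and say nothing about Typesense internals.

```python
import threading

# One lock standing in for the lock on a single document.
doc_lock = threading.Lock()


def try_update(apply_update, timeout=None):
    """Apply an update under the lock.

    timeout=None declines immediately if the document is locked;
    otherwise the call waits up to `timeout` seconds for the lock.
    """
    if timeout is None:
        acquired = doc_lock.acquire(blocking=False)
    else:
        acquired = doc_lock.acquire(timeout=timeout)
    if not acquired:
        return False
    try:
        apply_update()
        return True
    finally:
        doc_lock.release()


applied = []
ok_when_free = try_update(lambda: applied.append("first"))

doc_lock.acquire()  # simulate another request holding the document
declined = try_update(lambda: applied.append("second"))
timed_out = try_update(lambda: applied.append("third"), timeout=0.05)
doc_lock.release()

ok_after_release = try_update(lambda: applied.append("fourth"), timeout=0.05)
```

A bounded wait keeps the blocking behavior while still giving callers a way out when a lock is held for too long.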
micha
03:14 PM
ok that will get more tricky i guess because timeouts and stuff but yeah.
micha
03:15 PM
thanks, that would be amazing. is there anything you need from me to make this a reality? a github issue?
Jason
03:16 PM
Yes! Could you summarize this use-case on this issue (specifically the atomicity part): https://github.com/typesense/typesense/issues/496
micha
03:19 PM
you sure we wanna recycle the issue? i understand it would use the same mechanism but i’m not sure the original requester here would need the atomic support. So do you think UPDATE WHERE should always be atomic, or maybe for performance reasons it should be configurable so clients that don’t mind inconsistency (if that is ever the case) could disable the behavior?
Jason
03:20 PM
Yeah we'd have to think through the atomicity piece when we implement UPDATE WHERE, vs doing it separately and having to rework things later.
micha
03:20 PM
ok as you wish - will comment on the issue! thanks!
micha
03:35 PM
(i still think atomic updates and mass-updates are two separate concerns/issues that could be tackled independently of each other but i hope this will make it at some point and not go under the original idea 🙂)
Jason
03:39 PM
There would definitely be more work involved to handle atomicity. But if we can't tackle that in that issue when we implement it, I'll create a separate one to track atomicity. But I'd at least want to evaluate doing it together
micha
03:40 PM
awesome, thanks for the context!