Updating Typesense Indexes using Cron Jobs and MySql

TLDR sonu was unsure how to track deleted ids for batch updates to Typesense. Kishore Nallan proposed a detailed system involving MySql data and also referenced the Typesense API documentation. sonu understood the suggestion.

Powered by Struct AI
Join the chat
Jun 12, 2021 (30 months ago)
Photo of md5-6ade4a341436f96c87480052a1584bf3
07:01 AM
Kishore Nallan i was implementing the cron job for batch update to typesense but adding boolean  field of is_deleted in mysql won't do it,  as i won't change boolean value of it after deleting from typesense
should i use a file and save ids to track it or just alter the table to have something like is_deleted_typesense to track deleted ids from typesense.... any suggestions ?
Kishore Nallan
Photo of md5-4e872368b2b2668460205b409e95c2ea
Kishore Nallan
07:14 AM
Here is what I had in mind:

1. Each row in your mysql table has updated_at and is_deleted
2. When a record is deleted the is_deleted is set to true and the updated_at also should be updated.
3. When a record is updated, similarly, the updated_at field should also be updated.
4. When the Typesense sync cron job runs, it will query for the records that need to be synced with this query: SELECT * FROM table WHERE updated_at < NOW() AND id > $prev_id ORDER BY updated_at ASC, id ASC
5. The $prev_id value is the largest record ID processed in every cron -- this should be stored after every cron and is referenced for the next cron.
6. We use updated_at < NOW() to prevent picking up records currently being written to.
7. To prune your database of deleted records, you can run another weekly cron that just deletes all the records which have the is_deleted flag set but is older than a few days old (to prevent interfering with Typesense cron which runs more often)
Kishore Nallan
07:19 AM
8. When the records from the DB query are processed, you check the is_deleted flag to decide whether you should be deleting the record from Typesense or updating Typesense. The update and delete batch operations must be separated since upserts and deletes cannot be batched together in the same API call.
Kishore Nallan
07:20 AM
If your database is not too large, you don't need this complicated setup: you can just run hourly batch imports onto a new collection and use the Alias feature to swap over to the new collection without having to update your client code. See: https://typesense.org/docs/0.20.0/api/collection-alias.html#collection-alias
Jun 13, 2021 (30 months ago)
Photo of md5-6ade4a341436f96c87480052a1584bf3
04:27 PM
got it