Managing Typesense Across Large Datasets

TLDR Phillip inquired about best practices with Typesense for large datasets. Ross shared personal practices, and Jason confirmed Typesense isn't designed to be a primary datastore. Patrick shared a related discussion link. Viji mentioned using DynamoDB Streams as an option.

Phillip
Mon, 15 Aug 2022 16:22:41 UTC

Hi all. Two questions. 1. When people use Typesense for large datasets, do they usually keep the dataset in both Typesense and a relational database and somehow keep the two in sync? Is there a design pattern or best practice for this? Or is it more normal to keep everything in only one place? 2. If the data is only in Typesense, what is the best way to relate a thing in your database to a thing in Typesense in a many-to-many fashion? Normally a link/join table would be used. What is the best way to do this without one, or is a join table somehow still the correct way?

Ross
Mon, 15 Aug 2022 17:10:26 UTC

1. speaking for ourselves -- we maintain a sync between our DB (Firebase RTDB / Firestore) and Typesense. we handle this through Google Cloud Functions that keep Typesense up to date with changes in Firestore. 2. we don't do this ourselves, since i believe the recommended approach is _not_ to rely on Typesense as your primary datastore, but i have seen people use multiple Typesense queries to perform "join-like" operations between multiple collections if needed
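
A minimal sketch of the "join-like" approach mentioned above, assuming two hypothetical collections: `authors` (with a `name` field) and `books` (with an `author_ids` string array field). The `search` argument is a stand-in for whatever performs the query; with the official Python client it could be something like `lambda name, params: client.collections[name].documents.search(params)`. The collection and field names are illustrative assumptions, not taken from this thread.

```python
def build_id_filter(field, ids):
    """Build a filter_by clause that matches any of the given ids."""
    return f"{field}:[{','.join(str(i) for i in ids)}]"


def join_like_search(search, query):
    """Two-step lookup: find matching authors, then books by those authors.

    `search(collection_name, params)` is a caller-supplied function
    (hypothetical here) that returns a Typesense-style search response.
    """
    # Step 1: search the first collection and collect the matching ids.
    authors = search("authors", {"q": query, "query_by": "name"})
    author_ids = [hit["document"]["id"] for hit in authors["hits"]]
    if not author_ids:
        return []

    # Step 2: fetch documents in the second collection that reference
    # any of those ids through the array field.
    books = search(
        "books",
        {"q": "*", "filter_by": build_id_filter("author_ids", author_ids)},
    )
    return [hit["document"] for hit in books["hits"]]
```

Each "join" costs one extra round trip, and the id list in the filter grows with the number of matches from the first query, so this is probably best kept to small intermediate result sets.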

Viji
Mon, 15 Aug 2022 18:03:33 UTC

Can you please point to any documentation that says we should not use Typesense as the primary datastore?

Jason
Mon, 15 Aug 2022 18:18:07 UTC

I can confirm that Typesense is not designed to be used as a primary datastore. You only want to sync a _copy_ of your data, stored in another primary datastore, into Typesense to search through it.

Jason
Mon, 15 Aug 2022 18:19:30 UTC

I thought we had mentioned this in the docs, but it looks like we’ve only mentioned this in conversations with users and in some presentations. Will add a note to the docs about this.

Jason
Mon, 15 Aug 2022 18:22:24 UTC

re: best practice to sync data, I’d recommend maintaining an updated_at timestamp for each record in your primary data store, and then on a periodic basis run a scheduled job that checks for records that have an updated_at greater than the last sync time and batch insert them into Typesense using the documents/import endpoint.
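
The periodic sync described above could be sketched roughly as follows. This is only an illustration under assumptions: the primary store is a SQLite table named `products` with `id`, `title`, and an integer `updated_at` column, and the collection name, base URL, and API key passed to `post_import` are placeholders. The import endpoint accepts newline-delimited JSON, and `action=upsert` lets changed records overwrite their earlier copies.

```python
import json
import sqlite3
import urllib.request


def fetch_changed(conn, last_sync):
    """Return rows whose updated_at is newer than the last sync time."""
    cur = conn.execute(
        "SELECT id, title, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    )
    # Typesense document ids are strings, so convert the primary key.
    return [{"id": str(r[0]), "title": r[1], "updated_at": r[2]} for r in cur]


def to_jsonl(docs):
    """Serialize documents as JSONL, one JSON document per line."""
    return "\n".join(json.dumps(d) for d in docs)


def post_import(base_url, api_key, collection, body):
    """POST a JSONL batch to the documents/import endpoint (upsert mode)."""
    url = f"{base_url}/collections/{collection}/documents/import?action=upsert"
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def sync_once(conn, last_sync, send, batch_size=1000):
    """Send all records changed since last_sync; return the new sync time."""
    docs = fetch_changed(conn, last_sync)
    for i in range(0, len(docs), batch_size):
        send(to_jsonl(docs[i:i + batch_size]))
    # Advance the watermark only as far as the newest record actually sent.
    return max((d["updated_at"] for d in docs), default=last_sync)
```

A scheduled job would call `sync_once(conn, last_sync, lambda body: post_import(base_url, api_key, "products", body))` and persist the returned value for the next run. Note that rows hard-deleted from the primary store never get a newer `updated_at`, so deletions would need separate handling (for example a soft-delete flag).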

Patrick
Tue, 16 Aug 2022 15:28:08 UTC

This talk might be interesting to you:

Viji
Tue, 16 Aug 2022 20:55:52 UTC

Interesting talk, thanks for sharing, Patrick. Using DynamoDB Streams may also be an option for us.