#community-help

Managing Typesense Across Large Datasets

TLDR Phillip inquired about best practices with Typesense for large datasets. Ross shared personal practices, and Jason confirmed Typesense isn't designed to be a primary datastore. Patrick shared a related discussion link. Viji mentioned using DynamoDB Streams as an option.

Aug 15, 2022 (14 months ago)
Phillip
04:22 PM
Hi all. Two questions.

1. When people use typesense for large datasets, do they usually keep the dataset in typesense and in a relational database and somehow keep them in sync? Is there a design pattern or best practice for this? Or is it more normal to keep everything in only one place?
2. If the data is only in typesense, what is the best way to go about relating a thing in your database to a thing in typesense in a many to many fashion? Normally a link/join table would be used. What is the best way to do this without that/is that somehow still the correct way?
Ross
05:10 PM
1. speaking for ourselves -- we maintain a sync between our DB (firebase RTDB / firestore) and typesense. we handle this through google cloud functions that ensure typesense is kept up to date with changes in firestore.
2. while we don't do this since i believe the recommended approach is not to rely on Typesense as your primary datastore -- i have seen people use multiple Typesense queries to perform "join-like" operations between multiple collections if needed
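Ross's second point, the "join-like" pattern, can be sketched as two chained queries: search one collection for matching ids, then use those ids in a filter_by clause against the second collection. A minimal sketch, assuming hypothetical authors and books collections; search here is a stand-in for a real Typesense client's documents.search call:

```python
def join_books_by_author(search, author_name):
    """Two-step 'join': resolve author ids, then filter books on them."""
    # Step 1: find matching authors in the authors collection.
    authors = search("authors", {"q": author_name, "query_by": "name"})
    author_ids = [hit["document"]["id"] for hit in authors["hits"]]
    if not author_ids:
        return []
    # Step 2: filter the books collection on the collected ids
    # using Typesense's filter_by "field:[v1,v2]" syntax.
    id_filter = "author_id:[" + ",".join(author_ids) + "]"
    books = search("books", {"q": "*", "query_by": "title",
                             "filter_by": id_filter})
    return [hit["document"] for hit in books["hits"]]
```

Each extra "join" costs one more round trip, so this works best when the first query returns a small id set.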
Viji
06:03 PM
Can you please point to any documentation that says we should not use Typesense as the primary datastore?
Jason
06:18 PM
I can confirm that Typesense is not designed to be used as a primary datastore. You only want to sync a copy of your data, stored in another primary datastore, into Typesense to search through it.
Jason
06:19 PM
I thought we had mentioned this in the docs, but it looks like we’ve only mentioned this in conversations with users and in some presentations. Will add a note to the docs about this.
Jason
06:22 PM
re: best practice to sync data, I’d recommend maintaining an updated_at timestamp for each record in your primary datastore, then running a scheduled job on a periodic basis that finds records with an updated_at greater than the last sync time and batch-imports them into Typesense using the documents/import endpoint.
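The periodic job Jason describes can be sketched roughly as follows. fetch_updated_since and import_jsonl are hypothetical stand-ins: the first reads changed records from your primary datastore, the second POSTs a JSONL payload to Typesense's documents/import endpoint (with action=upsert); neither is a real API name.

```python
import json

def sync_changed_records(fetch_updated_since, import_jsonl, last_sync_ts,
                         batch_size=100):
    """Push records whose updated_at is newer than the last sync time."""
    records = fetch_updated_since(last_sync_ts)
    imported = 0
    batch = []
    for rec in records:
        # documents/import accepts newline-delimited JSON, one doc per line.
        batch.append(json.dumps(rec))
        if len(batch) == batch_size:
            import_jsonl("\n".join(batch))
            imported += len(batch)
            batch = []
    if batch:
        import_jsonl("\n".join(batch))
        imported += len(batch)
    return imported
```

Record the sync time before fetching, so records updated mid-run are picked up again on the next pass rather than missed.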

Aug 16, 2022 (14 months ago)
Patrick
03:28 PM


Viji
08:55 PM
interesting talk. thanks for sharing Patrick
Using DynamoDB Streams may also be an option for us
https://typesense.org/docs/guide/dynamodb-full-text-search.html#step-1-create-typesense-cluster
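The approach in that guide, processing DynamoDB Streams with a Lambda function, can be sketched as a handler that maps stream records onto search-index operations. upsert and delete below are hypothetical stand-ins for Typesense client calls, and the event shape follows DynamoDB Streams (typed attribute values like {"S": "..."}), assuming a simple string-keyed table with key field "id":

```python
def handle_stream_event(event, upsert, delete):
    """Apply each DynamoDB stream record to the search index."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            image = record["dynamodb"]["NewImage"]
            # Flatten DynamoDB's attribute-value map ({"S": "x"} -> "x").
            doc = {k: list(v.values())[0] for k, v in image.items()}
            upsert(doc)
        elif record["eventName"] == "REMOVE":
            delete(record["dynamodb"]["Keys"]["id"]["S"])
```

A real handler would also need number/list attribute types and error handling, but the insert/modify/remove mapping above is the core of the stream-based sync.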