# community-help
a
Hello, I was looking for help with setting up bulk imports/syncing in our codebase. We currently update single documents whenever the database is hit. We want to move to doing the updates periodically to reduce complexity. The examples and documentation I saw required an additional table to track the timestamp of the last sync. I was wondering if there are other recommended approaches I should consider, or if there are any links to example implementations of periodic bulk importing. Any help is appreciated.
The function containing the logic currently looks like this, but isn't yet plugged in due to the question about needing an additional table/how to track when we last synced.
export const pollDatabaseAndSyncTypesense = async () => {
  try {
    if (!lastSyncedAt) {
      lastSyncedAt = await getLastSyncedAt();
    }
    const models: Array<{ model: typeof Model; collectionName: string }> = [
      { model: User, collectionName: "users" },
      { model: Community, collectionName: "communities" },
      { model: Event, collectionName: "events" },
      { model: Message, collectionName: "messages" },
      { model: Group, collectionName: "groups" },
      { model: Project, collectionName: "projects" },
      { model: Membership, collectionName: "membership" },
      { model: MembershipRequest, collectionName: "membership_request" },
    ];

    // Process each model for updated and deleted records
    for (const { model, collectionName } of models) {
      const config = CollectionConfig[collectionName as CollectionKey];

      // Query updated records
      const updatedRecords = await model.query()
        .where("updatedAt", ">", lastSyncedAt)
        .select(...config.fields);

      const mappedRecords = updatedRecords.map(config.mapObject);

      if (mappedRecords.length > 0) {
        await typesense
          .collections(collectionName)
          .documents()
          .import(mappedRecords, { action: "upsert" });
        log.info(`Synced ${mappedRecords.length} updated records for ${collectionName}`);
      }
    }

    // Update lastSyncedAt
    await updateLastSyncedAt();

  } catch (error) {
    Sentry.captureException(error);
    log.error("Error syncing data to Typesense:", error);
  }
};
j
You'd need some place to store the last synced timestamp to use inside `getLastSyncedAt`.
Since we wrote the docs, one recent feature you can use is the collection metadata feature inside of Typesense to store this last synced timestamp: https://typesense.org/docs/27.1/api/collections.html#adding-metadata-to-schema
Let me know if you're able to do the sync this way without an extra table, and we can update the docs
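For reference, here is a minimal sketch of what the read side inside `getLastSyncedAt` could look like with collection metadata. It assumes the schema object you get back when retrieving a collection includes the `metadata` object. `last_synced` is just a key name picked for illustration, and `readLastSynced` and `SchemaWithMetadata` are hypothetical names, not part of Typesense.

```typescript
// Only the part of a collection schema this sketch cares about.
// `last_synced` is a hypothetical metadata key chosen for this example.
interface SchemaWithMetadata {
  name: string;
  metadata?: Record<string, unknown>;
}

// Returns the stored timestamp, or the Unix epoch when nothing usable is
// stored yet, so that the first run picks up every record.
export function readLastSynced(schema: SchemaWithMetadata): Date {
  const raw = schema.metadata?.["last_synced"];
  if (typeof raw !== "string") return new Date(0);
  const parsed = new Date(raw);
  return Number.isNaN(parsed.getTime()) ? new Date(0) : parsed;
}
```

With typesense-js, the schema could come from `await typesense.collections("users").retrieve()`. Falling back to the epoch means the first run re-imports everything, which is harmless when the import uses `upsert`.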
a
So, using this approach it would look something like this (as a high-level example):
1. Add `last_synced` metadata to the collections.
2. Periodically (let's say every 5 minutes), use code similar to the above to compare each db record's `updatedAt` against `last_synced` for the collection the record belongs to.
3. Update the collection's `last_synced` to the current time. (Question: does updating metadata work the same as updating other fields in the collection, or how would that look?)
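The steps above could be sketched roughly like this for a single collection. Everything here is an assumption for illustration: `WatermarkStore`, `SyncSource`, and `syncCollection` are hypothetical names, and the store could be backed by collection metadata or anything else. One detail worth calling out is that the new timestamp is captured before querying, so rows updated while the sync is running get picked up on the next run.

```typescript
// Hypothetical interfaces for this sketch; none of these names come from Typesense.
interface WatermarkStore {
  get(collection: string): Promise<Date>;
  set(collection: string, value: Date): Promise<void>;
}

interface SyncSource<T> {
  collection: string;
  // Returns records whose updatedAt is later than `since`.
  fetchUpdatedSince(since: Date): Promise<T[]>;
  // Pushes the records to the search index (for example a bulk upsert import).
  upsert(records: T[]): Promise<void>;
}

export async function syncCollection<T>(
  source: SyncSource<T>,
  store: WatermarkStore,
  now: () => Date = () => new Date(),
): Promise<number> {
  const since = await store.get(source.collection);
  // Capture the next watermark before querying, so rows updated during the
  // sync are fetched again next time instead of being skipped.
  const startedAt = now();
  const records = await source.fetchUpdatedSince(since);
  if (records.length > 0) {
    await source.upsert(records);
  }
  // Reached only if the upsert did not throw, so a failed run is retried
  // from the old watermark.
  await store.set(source.collection, startedAt);
  return records.length;
}
```

This would run once per collection on each tick (every 5 minutes in the example above). Because the import is an upsert, re-sending a few overlapping records is harmless.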
We were also considering storing the last-synced timestamp for each table in Redis. Is there anything to keep in mind if we did it this way?
j
Your approach sounds good. Metadata works the same as updating fields. Instead of sending `fields` in the PATCH endpoint, you'll send `metadata`.
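As a rough sketch of what that PATCH could look like, the helper below only builds the request so it is easy to inspect. `buildLastSyncedPatch` and `PatchRequest` are hypothetical names, `last_synced` is an illustrative key, and the base URL is whatever your Typesense node is reachable at.

```typescript
interface PatchRequest {
  method: "PATCH";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds a collection-update request that sends `metadata` where a schema
// change would send `fields`.
export function buildLastSyncedPatch(
  baseUrl: string, // e.g. "http://localhost:8108" (assumed local node)
  apiKey: string,
  collection: string,
  lastSynced: Date,
): PatchRequest {
  return {
    method: "PATCH",
    url: `${baseUrl}/collections/${encodeURIComponent(collection)}`,
    headers: {
      "Content-Type": "application/json",
      "X-TYPESENSE-API-KEY": apiKey,
    },
    body: JSON.stringify({
      metadata: { last_synced: lastSynced.toISOString() },
    }),
  };
}
```

The result can be handed to any HTTP client. If the collection carries other metadata keys, it is worth checking whether they need to be included in the same request, since this sketch only sends `last_synced`.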
You could use Redis to maintain this metadata as well. No other gotchas I can think of, other than making sure you've configured persistence in Redis.
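If you go the Redis route, a per-collection store might look something like the sketch below. It is written against a tiny structural interface, not a specific client, on the assumption that the `get`/`set` methods of common Node Redis clients fit this shape. `RedisLastSyncedStore`, `StringKeyValueClient`, and the key prefix are all hypothetical.

```typescript
// Minimal shape this sketch needs from a Redis client (assumption: your
// client's get/set methods are compatible with it).
interface StringKeyValueClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
}

export class RedisLastSyncedStore {
  private client: StringKeyValueClient;
  private prefix: string;

  constructor(client: StringKeyValueClient, prefix = "typesense:last_synced:") {
    this.client = client;
    this.prefix = prefix;
  }

  // A missing key (first run, or data lost on restart) maps to the epoch,
  // which triggers a full resync.
  async get(collection: string): Promise<Date> {
    const raw = await this.client.get(this.prefix + collection);
    return raw === null ? new Date(0) : new Date(raw);
  }

  async set(collection: string, value: Date): Promise<void> {
    await this.client.set(this.prefix + collection, value.toISOString());
  }
}
```

This is where the persistence point matters: if Redis restarts without persistence, the key is gone and the next run resyncs everything. That is safe with upserts, but can be expensive on large tables.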