#community-help

Understanding Firestore Extension's Backfill Functionality

TLDR: David asked how the Firestore extension's backfill behaves when multiple instances are installed for different collections. Jason explained the default backfill behavior and suggested running the backfill from a local computer for large collections to avoid function timeouts or memory issues.

Solved
Oct 21, 2023 (1 month ago)
David
09:20 AM
Question about the Firestore extension's backfill functionality. If I've installed multiple instances of the extension for different collections, would this trigger a backfill for all of them, including ones that have been backfilled previously and that might be up to date? @Jason
Jason
04:54 PM
By default, the backfill will indeed get triggered for all collections.

But the 3rd point here describes how to backfill a single collection: https://github.com/typesense/firestore-typesense-search#step-3%EF%B8%8F⃣--optional-backfill-existing-data
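As a rough illustration of the single-collection backfill Jason refers to: as best recalled from the linked README, the backfill is triggered by writing a document with ID `backfill` into a `typesense_sync` collection, and an optional `firestore_collections` list restricts it to specific collections. Treat those names as assumptions and verify them against the README before use. The `users` path is a hypothetical example.

```python
from typing import Iterable, Optional


def build_backfill_trigger(collections: Optional[Iterable[str]] = None) -> dict:
    """Build the contents of the backfill trigger document.

    Field names are assumptions based on the linked README; verify before use.
    """
    payload = {"trigger": True}
    if collections:
        # Restrict the backfill to these Firestore collection paths only.
        payload["firestore_collections"] = list(collections)
    return payload


# "users" is a hypothetical collection path, for illustration only.
payload = build_backfill_trigger(["users"])
print(payload)

# Writing the document with the Firebase Admin SDK needs credentials,
# so it is left commented out here:
# import firebase_admin
# from firebase_admin import firestore
# firebase_admin.initialize_app()
# firestore.client().collection("typesense_sync").document("backfill").set(payload)
```

Calling `build_backfill_trigger()` with no arguments gives the default payload, which per Jason's answer triggers the backfill for all installed instances.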
David
06:10 PM
Nice, I missed that! What's the complexity of the backfill? I assume it reads through the entire collection once? We had some issues with the Algolia extension when backfilling a collection of 2M documents where the complexity was not linear and costs skyrocketed
Oct 23, 2023 (1 month ago)
Jason
04:26 PM
> I assume it reads through the entire collection once?
Yeah
Jason
04:27 PM
For large collections, I would recommend running your own backfill from your local computer
David
06:14 PM
Why is that, would the cloud function time out otherwise?
Jason
06:15 PM
That and also, the function might run out of RAM to export the full dataset
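A minimal sketch of what a self-run backfill from a local machine might look like, assuming the goal is to read each document once and send it to Typesense in fixed-size batches so memory use stays bounded. This is not the extension's own code. `backfill` and `batched` are hypothetical helpers, and the batch size of 500 is an arbitrary choice.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` items without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def backfill(
    docs: Iterable[dict],
    import_batch: Callable[[List[dict]], object],
    batch_size: int = 500,
) -> int:
    """Send every document to `import_batch` exactly once; return the count."""
    total = 0
    for batch in batched(docs, batch_size):
        import_batch(batch)
        total += len(batch)
    return total


# Possible wiring with the Firebase Admin and Typesense Python clients
# (needs credentials and a running Typesense server, so left commented out;
# "users" is a hypothetical collection name):
# docs = ({**d.to_dict(), "id": d.id} for d in db.collection("users").stream())
# backfill(docs, lambda b: ts.collections["users"].documents.import_(b, {"action": "upsert"}))
```

Because the documents are consumed as a stream, only one batch is held in memory at a time, and a local script is not subject to a Cloud Function's timeout.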

David
06:17 PM
If it's a 2nd gen cloud function, the timeout can be raised by quite a bit, and I should be able to increase its memory from the GCP console too
Jason
07:26 PM
I see, good to know
