Issue Syncing Records from BigQuery to Typesense Using Airbyte
TLDR Jamshid had a problem syncing records from BigQuery to Typesense via Airbyte: only some of the records were synced. Jason suggested checking the Airbyte logs for Typesense API responses. They found that BigQuery's connector on Airbyte does not handle repeated fields well, and concluded that they might build their own sync script to work around the issue.
Sep 29, 2023
Jamshid
09:59 PM
"recordsSynced" : 50551,
but when we check Typesense Cloud, only 11,032 records are synced.
Jason
10:00 PM
Jamshid
10:03 PM
2023-09-29 21:56:16 replication-orchestrator > Schema validation errors found for stream xxxx. Error messages: [$.has_profile: null found, string expected, $.have_dependent_children: null found, string expected, $.country_of_residence: null found, string expected, $.meta_noc: null found, string expected, ........]
A bunch more of the exact same messages follow.
I wonder why that is. Why does it not accept `null` and expect a `string` instead? Any immediate thoughts?
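The log line reads like a JSON Schema type check: the stream's schema appears to declare `has_profile` as `string`, so a `null` value fails, whereas a type list such as `["null", "string"]` would accept it. The sketch below illustrates that kind of check on a flat record. It is a hypothetical stand-in written for illustration, not Airbyte's actual validator, and the `properties` layout and sample values are assumptions.

```python
def json_type(value):
    """Map a Python value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def validate(record, properties):
    """Return one error per top-level field whose value has a disallowed type.

    `properties` is a hypothetical flat stream schema: {field: {"type": ...}},
    where "type" is either a single type name or a list of type names.
    """
    errors = []
    for name, spec in properties.items():
        if name not in record:
            continue  # missing fields are not type errors in this sketch
        allowed = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        found = json_type(record[name])
        # JSON Schema treats every integer as a valid "number"
        if found in allowed or (found == "integer" and "number" in allowed):
            continue
        errors.append(f"$.{name}: {found} found, {', '.join(allowed)} expected")
    return errors


record = {"has_profile": None, "country_of_residence": "CA"}
strict = {"has_profile": {"type": "string"}, "country_of_residence": {"type": "string"}}
nullable = {"has_profile": {"type": ["null", "string"]}, "country_of_residence": {"type": "string"}}

print(validate(record, strict))    # ['$.has_profile: null found, string expected']
print(validate(record, nullable))  # []
```

Under this reading, the fix belongs in whatever schema the pipeline validates against before the data reaches Typesense.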
Jason
10:08 PM
`optional: true` in the collection schema
Jason
10:09 PM
Jamshid
10:09 PM
{
  "facet": false,
  "index": true,
  "infix": false,
  "locale": "",
  "name": "has_profile",
  "optional": true,
  "sort": false,
  "type": "string"
},
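On the Typesense side, Jason's suggestion amounts to marking each nullable field `optional` when the collection is created, which the schema Jamshid shared above already does. As Jason notes later in the thread, the validation error itself comes from Airbyte rather than Typesense, so this alone does not fix the sync. For reference, here is a minimal sketch of creating such a collection through Typesense's `POST /collections` endpoint using only the Python standard library. The host, the API key and the collection name `profiles` are placeholders, and the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical values: substitute your own cluster URL, API key and collection name.
TYPESENSE_HOST = "http://localhost:8108"
API_KEY = "YOUR_ADMIN_API_KEY"

# Columns reported as null in the Airbyte log above.
NULLABLE_STRING_FIELDS = [
    "has_profile",
    "have_dependent_children",
    "country_of_residence",
    "meta_noc",
]


def build_schema(collection_name):
    """Return a collection schema in which every nullable column is optional."""
    fields = [
        {"name": name, "type": "string", "optional": True}
        for name in NULLABLE_STRING_FIELDS
    ]
    return {"name": collection_name, "fields": fields}


def create_collection_request(schema):
    """Build (but do not send) a POST /collections request for the schema."""
    return urllib.request.Request(
        url=f"{TYPESENSE_HOST}/collections",
        data=json.dumps(schema).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-TYPESENSE-API-KEY": API_KEY,
        },
        method="POST",
    )


schema = build_schema("profiles")
request = create_collection_request(schema)
# To actually create the collection: urllib.request.urlopen(request)
```

Marking a field optional means documents without a value for it are accepted instead of being rejected.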
Jason
10:09 PM
Jamshid
10:10 PM
Jamshid
10:11 PM
Jason
10:12 PM
Jason
10:12 PM
Jamshid
10:13 PM
Jamshid
10:20 PM
`record` type fields, and now all 50K records are synced. Definitely something about them.
Jamshid
10:30 PM
Oct 02, 2023
Jason
01:00 AM
Jamshid
05:56 PM
Jason
07:48 PM
We do not change Typesense versions automatically, except in very rare cases to restore cluster stability.
Jason
07:49 PM
Jamshid
09:58 PM
Again, if I do not include the object fields, I get all the data on the Typesense side.
Object fields show up like this on the Typesense side:
{
  "created_at": 1696283162,
  "default_sorting_field": "",
  "enable_nested_fields": true,
  "fields": [
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "work",
      "optional": true,
      "sort": false,
      "type": "object[]"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "",
      "optional": true,
      "sort": false,
      "type": "string[]"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "work.job_title",
      "optional": true,
      "sort": false,
      "type": "string[]"
    },
    .....
    .....
}
which looks okay to me.
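With `enable_nested_fields: true`, the `object[]` field `work` comes with a derived `work.job_title` field of type `string[]`, as the schema above shows: the `job_title` values from every object in the array end up in one flat list. The hypothetical helper below only illustrates that shape for one level of nesting, for example to check what a given BigQuery row should produce. It is not Typesense's indexing code, and the `company` key and sample values are invented for the example.

```python
def flatten_object_array(doc, field):
    """Collect each key of the objects in an object[] field into a dotted list.

    Illustration only (one level of nesting): {"work": [{"job_title": "A"}]}
    becomes {"work.job_title": ["A"]}.
    """
    flattened = {}
    for item in doc.get(field) or []:
        for key, value in item.items():
            flattened.setdefault(f"{field}.{key}", []).append(value)
    return flattened


doc = {
    "id": "1",
    "work": [
        {"job_title": "Engineer", "company": "Acme"},
        {"job_title": "Analyst", "company": "Globex"},
    ],
}
print(flatten_object_array(doc, "work"))
# {'work.job_title': ['Engineer', 'Analyst'], 'work.company': ['Acme', 'Globex']}
```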
Jason
10:39 PM
Since the error message you shared is not from Typesense, it's hard to debug this further from our side. Maybe Airbyte can offer insight into what exactly that error is?
Oct 03, 2023
Jamshid
04:33 PM
1. https://github.com/airbytehq/airbyte/issues/30179
2. https://github.com/airbytehq/airbyte/issues/4487
It looks like the connector can’t handle repeated fields well. I can confirm that’s the case for me.
To avoid that error, we can save the array of objects as a string. Doing that, though, we lose the ability to access those objects, since our `"enable_nested_fields": true` setting becomes useless. I wonder if there is a way to change the data type from `string` to `object[]` after the sync to Typesense happens, or if there are any alternative solutions.
Jason
05:00 PM
"way to change the data type after the sync happens to typesense from string to object[]"
This is not possible to do in Typesense...
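Since the type cannot be changed inside Typesense after indexing, one alternative is to convert the data before it reaches Typesense, roughly the custom sync script mentioned in the TLDR. The sketch below makes several assumptions: the BigQuery export stringifies the repeated column (for example with BigQuery's `TO_JSON_STRING(work)`), and rows have already been fetched as Python dicts. It decodes `work` back into a list of objects, drops null columns so optional fields are simply absent, and builds, without sending, a request to Typesense's `documents/import` endpoint, which takes one JSON document per line. The host, API key, collection name and `JSON_STRING_FIELDS` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical settings: replace with your own cluster, key and columns.
TYPESENSE_HOST = "http://localhost:8108"
API_KEY = "YOUR_ADMIN_API_KEY"
JSON_STRING_FIELDS = ["work"]  # repeated RECORD columns exported as JSON strings


def to_document(row):
    """Turn one exported row into a Typesense document."""
    # Omit null columns; optional fields may simply be absent.
    doc = {key: value for key, value in row.items() if value is not None}
    for field in JSON_STRING_FIELDS:
        if isinstance(doc.get(field), str):
            doc[field] = json.loads(doc[field])  # string -> list of objects
    return doc


def import_request(collection, rows):
    """Build (but do not send) a bulk upsert request, one JSON document per line."""
    body = "\n".join(json.dumps(to_document(row)) for row in rows)
    path = f"/collections/{urllib.parse.quote(collection)}/documents/import"
    query = urllib.parse.urlencode({"action": "upsert"})
    return urllib.request.Request(
        url=f"{TYPESENSE_HOST}{path}?{query}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain", "X-TYPESENSE-API-KEY": API_KEY},
        method="POST",
    )


def count_failures(response_text):
    """Count documents an import response reports as failed (one JSON result per line)."""
    results = [json.loads(line) for line in response_text.splitlines() if line.strip()]
    return sum(1 for result in results if not result.get("success", False))


rows = [
    {"id": "1", "has_profile": "yes", "work": '[{"job_title": "Engineer"}]'},
    {"id": "2", "has_profile": None, "work": None},
]
request = import_request("profiles", rows)
# To send: urllib.request.urlopen(request).read().decode() returns the per-line results.
```

The import endpoint answers with one result per document, each carrying a `success` flag and, on failure, an `error` message. Counting those results is a direct way to see which records were rejected, the same information Jason suggested looking for in the Airbyte logs.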
Jason
05:02 PM
Jason
05:02 PM
Jason
05:03 PM
Jamshid
05:09 PM
Oct 04, 2023
Jason
04:13 PM
Jason
04:13 PM
Jason
04:13 PM