Hey guys, I wrote an Airbyte connector for Typesense...
# community-help
c
Hey guys, I wrote an Airbyte connector for Typesense. Hope more people can use Typesense through Airbyte. https://github.com/airbytehq/airbyte/pull/18349 It’s a no-code alternative for moving data from more than 100 sources to Typesense. In my use case I move data from BigQuery to Typesense. Maybe it deserves a walk-through guide on how to use it!
j
@Cirdes Henrique Thank you for building this 🙏 🙏 🎉 I’ve been meaning to look into an Airbyte integration for a while now, so I’m excited that it’s now available!
A frequent ask I’ve heard from users is to sync data from MySQL / Postgres into Typesense, so this integration will be very helpful
👍 1
👍🏽 1
I’ll play around with it and let you know how it goes!
c
@Jason Bosco great!!
j
@Cirdes Henrique I can’t seem to find the Typesense connector as an available destination in the UI… Do I need to do something else to enable it? I just did the following:
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up
and visited http://localhost:8000/. I do see the Typesense connector code in the master branch of the Airbyte repo, but it’s not showing up in the UI.
c
Hey @Jason Bosco, we have to wait for a platform release to be able to see Typesense in the destination catalog.
For now, you should go to Settings > Destinations > New connector
and enter: Typesense as the name, airbyte/destination-typesense as the Docker image, 0.1.0 as the tag, and any URL as the documentation.
@Jason Bosco, I need some help configuring Airbyte and Typesense. I was able to sync 1M records, but now I’m trying to sync 50M and it is failing. Let me explain how it works…
Airbyte reads from the source and I batch the records. For now I read 100000 records and then import them using the Python lib:
client.collections['companies'].documents.import_(data)
After around 6M records are ingested into Typesense, I’m getting:
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='*', port=443): Read timed out. (read timeout=2)
• Should I call import every 100000 records or change it?
• Should I change timeout threshold on python lib?
• Import method has a batch_size option, it should match the number of records in import_ data?
j
Should I call import every 100000 records or change it?
Yeah, that’s good.
Should I change timeout threshold on python lib?
Yes, definitely. Typesense does the indexing synchronously as part of the API call. So you want to set the timeout as high as 60 minutes to make sure the import API call never times out.
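For readers following along, here is a minimal sketch of what raising the timeout might look like with the typesense Python client. It assumes the client's `connection_timeout_seconds` config key is what governs the read timeout seen in the error above. The helper names `build_typesense_config` and `make_client`, and the host and API key values, are hypothetical placeholders and not taken from the connector's code.

```python
def build_typesense_config(host, api_key, timeout_seconds=3600):
    """Build a typesense-python client config with a long timeout.

    Hypothetical helper: host and api_key are placeholders supplied by the caller.
    """
    return {
        "nodes": [{"host": host, "port": "443", "protocol": "https"}],
        "api_key": api_key,
        # 3600 seconds = 60 minutes, so a long synchronous import call
        # is not cut off by the client before the server finishes indexing.
        "connection_timeout_seconds": timeout_seconds,
    }


def make_client(host, api_key, timeout_seconds=3600):
    """Create a typesense client from the config above."""
    # Imported here so the config helper works even where the
    # typesense package is not installed.
    import typesense

    return typesense.Client(build_typesense_config(host, api_key, timeout_seconds))
```

With this in place, `make_client("your-host", "your-api-key")` would return a client whose import calls wait up to an hour before timing out.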
Import method has a batch_size option, it should match the number of records in import_ data?
No, this parameter controls the server-side batching. For example, if you sent 100K records in a single import API call and set batch_size to, say, 5000, Typesense will import the first 5K records from the 100K docs you sent over in one execution cycle, then switch to processing any pending search requests in the queue, execute them, and then switch back to the import queue to import the next 5K records, and so on. So increasing this server-side batch_size could cause search requests to be deprioritized while an import is happening. I would recommend not changing this unless really required.
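To make the distinction concrete, here is a rough sketch of the client-side chunking discussed in this thread: one import call per 100000 records, with the server-side batch_size left untouched. The helpers `chunked` and `import_in_chunks` are hypothetical names, and `documents` is assumed to behave like `client.collections['companies'].documents`, returning one result dict per document with a `success` flag.

```python
from itertools import islice


def chunked(records, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_in_chunks(documents, records, chunk_size=100_000, params=None):
    """Send one import call per chunk and collect the failed results.

    `documents` is assumed to expose import_(docs, params) like the
    typesense Python client's documents endpoint.
    """
    failures = []
    for chunk in chunked(records, chunk_size):
        # params stays None by default, so the server-side batch_size
        # keeps its default value, as recommended above.
        results = documents.import_(chunk, params)
        # Keep every per-document result that did not report success.
        failures.extend(r for r in results if not r.get("success"))
    return failures
```

Collecting the failed results is a design choice for this sketch: a bulk import can succeed overall while individual documents are rejected, so the per-document results are worth checking. If the server-side value ever did need changing, it would presumably be passed through `params` (for example `{"batch_size": 5000}`), but that is an assumption to verify against the Typesense import API docs.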
🎉 1
c
@Jason Bosco, thanks for explaining it. I will open a PR on Airbyte to change the timeout threshold and test it!
👍 1
Worked like a charm! 🥳
🙌 1
🔥 1
🎉 1
j
Amazing! Great to see the connector working well with 54M docs
😊 1