#community-help

Airbyte Connector for Typesense Integration and Issue Resolution

TLDR Cirdes shares a newly built Airbyte connector for Typesense and later hits read timeouts when syncing tens of millions of records. Jason recommends raising the client timeout and leaving the server-side batch size alone, which resolves the issue.

Oct 28, 2022 (11 months ago)
Cirdes
04:53 PM
Hey guys, I wrote an Airbyte connector for Typesense. I hope more people can use Typesense through Airbyte.
https://github.com/airbytehq/airbyte/pull/18349
It’s a no-code alternative for moving data from more than 100 sources to Typesense. In my use case, I move data from BigQuery to Typesense.
Maybe it deserves a walk-through guide on how to use it!
Jason
04:55 PM
Cirdes Thank you for building this 🙏 🙏 🎉

I’ve been meaning to look into an Airbyte integration for a while now, so I’m excited that it’s now available!
Jason
04:56 PM
A frequent ask I’ve heard from users is to sync data from MySQL / Postgres into Typesense, so this integration will be very helpful
Jason
04:56 PM
I’ll play around with it and let you know how it goes!
Cirdes
04:57 PM
Jason great!!
Oct 29, 2022 (11 months ago)
Jason
12:41 AM
Cirdes I can’t seem to find the Typesense connector as an available destination in the UI… Do I need to do something else to enable it?

I just did the following:

git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up

and visited http://localhost:8000/

I do see the typesense connector code in the master branch of the airbyte repo, but it’s not showing up in the UI
Cirdes
11:34 AM
Hey Jason, we have to wait for a platform release before Typesense shows up in the destination catalog.
Cirdes
11:35 AM
For now, go to Settings > Destinations > New connector
Cirdes
11:36 AM
and type:
Typesense
airbyte/destination-typesense
0.1.0
any URL as the documentation URL
Nov 01, 2022 (11 months ago)
Cirdes
09:54 PM
Jason, I need some help configuring Airbyte and Typesense. I was able to sync 1M records, but now I’m trying to sync 50M and it is failing. Let me explain how it works…
Cirdes
09:59 PM
Airbyte reads from the source and I batch the records. For now I read 100000 records and then import them using the Python lib:
client.collections['companies'].documents.import_(data)
Cirdes
10:01 PM
After around 6M records have been ingested into Typesense, I get:
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='*', port=443): Read timed out. (read timeout=2) 
Cirdes
10:05 PM
• Should I call import every 100000 records, or change that?
• Should I change the timeout threshold in the Python lib?
• The import method has a batch_size option. Should it match the number of records in the import data?
Jason
10:08 PM
> Should I call import every 100000 records, or change that?
Yeah, that’s good.

> Should I change the timeout threshold in the Python lib?
Yes, definitely. Typesense does the indexing synchronously as part of the API call, so you want to set the timeout as high as 60 minutes to make sure the import API call never times out.

> The import method has a batch_size option. Should it match the number of records in the import data?
No, this parameter controls the server-side batching. For example, if you send 100K records in a single import API call and set batch_size to, say, 5000, Typesense will import the first 5K of those 100K docs in one execution cycle, then switch to processing any pending search requests in the queue, then switch back to the import queue to import the next 5K records, and so on.

So increasing this server-side batch_size could cause search requests to be deprioritized while import is happening. So I would recommend not changing this unless really required.
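
Editor's sketch of the advice above, in Python. It is only an illustration under stated assumptions, not the connector's actual code: the helper names (build_client, chunked, import_all) and the constants are hypothetical, the host and API key are placeholders you supply, and the 60-minute value simply follows the suggestion in this thread. It raises the client timeout through the typesense client's connection_timeout_seconds setting, sends records in client-side chunks of 100000, and leaves the server-side batch_size at its default.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Assumption: 60 minutes, following the suggestion in the thread; tune as needed.
IMPORT_TIMEOUT_SECONDS = 60 * 60
# Client-side chunk size: how many records go into each import API call.
CHUNK_SIZE = 100_000


def build_client(host: str, api_key: str):
    """Create a Typesense client with a long timeout (needs `pip install typesense`)."""
    import typesense  # imported lazily so the helpers below work without it

    return typesense.Client({
        "nodes": [{"host": host, "port": "443", "protocol": "https"}],
        "api_key": api_key,
        "connection_timeout_seconds": IMPORT_TIMEOUT_SECONDS,
    })


def chunked(records: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` records from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    chunk: List[Dict[str, Any]] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly smaller chunk
        yield chunk


def import_all(documents: Any, records: Iterable[Dict[str, Any]],
               chunk_size: int = CHUNK_SIZE) -> int:
    """Import records chunk by chunk and return how many were sent.

    `documents` is expected to be something like
    client.collections['companies'].documents.
    """
    sent = 0
    for chunk in chunked(records, chunk_size):
        # batch_size is deliberately not passed, so the server default applies.
        documents.import_(chunk)
        sent += len(chunk)
    return sent
```

Usage would be along the lines of import_all(build_client(host, api_key).collections['companies'].documents, records). Keeping the chunking separate from the client makes it easy to try the logic against a stub before pointing it at a real cluster.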
Cirdes
10:14 PM
Jason, thanks for explaining it. I will open a PR on Airbyte to change the timeout threshold and test it!
Nov 03, 2022 (11 months ago)
Cirdes
03:38 PM
worked like a charm! 🥳
Jason
06:02 PM
Amazing! Great to see the connector working well with 54M docs