Airbyte Connector for Typesense Integration and Issue Resolution
TLDR Cirdes shares a new Airbyte connector for Typesense, then runs into timeout issues when syncing data at scale. Jason explains how to raise the client-side timeout and how the server-side batch size works, which resolves the issue.






Oct 28, 2022 (11 months ago)
Cirdes
04:53 PM
https://github.com/airbytehq/airbyte/pull/18349
It’s a no-code alternative to move data from more than 100 sources to Typesense. In my use case I move data from BigQuery to Typesense.
Maybe it deserves a walk-through guide on how to use it!
Jason
04:55 PM
I’ve been meaning to look into an Airbyte integration for a while now, so I’m excited that it’s now available!
Jason
04:56 PM

Jason
04:56 PM
Cirdes
04:57 PM
Oct 29, 2022 (11 months ago)
Jason
12:41 AM
I just did the following:
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up
and visited http://localhost:8000/
I do see the typesense connector code in the master branch of the airbyte repo, but it’s not showing up in the UI
Cirdes
11:34 AM
Cirdes
11:35 AM
Cirdes
11:36 AM
Typesense
airbyte/destination-typesense
0.1.0
any URL as Documentation
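(These look like the fields Airbyte’s UI asks for when registering a custom destination connector: connector name, Docker repository, image tag, and a documentation URL; the exact labels may vary by Airbyte version. As a sanity check, the image can be pulled manually first, assuming the connector is published to Docker Hub under the repository shown above:)
docker pull airbyte/destination-typesense:0.1.0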
Nov 01, 2022 (11 months ago)
Cirdes
09:54 PM
Cirdes
09:59 PM
client.collections['companies'].documents.import_(data)
Cirdes
10:01 PM
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='*', port=443): Read timed out. (read timeout=2)
Cirdes
10:05 PM
• Should I change the timeout threshold on the Python lib?
• The import method has a batch_size option; should it match the number of records in the import data?
Jason
10:08 PM
Yeah, that’s good.
> Should I change the timeout threshold on the Python lib?
Yes, definitely. Typesense does the indexing synchronously as part of the API call, so you want to set the timeout as high as 60 minutes to make sure the import API call never times out.
> The import method has a batch_size option; should it match the number of records in the import data?
No, this parameter controls server-side batching. For example, if you send 100K records in a single import API call and set batch_size to, say, 5000, Typesense will import the first 5K of the 100K docs you sent over in one execution cycle, then switch to processing any pending search requests in the queue, execute them, and then switch back to the import queue for the next 5K records, and so on. So increasing this server-side batch_size could cause search requests to be deprioritized while the import is happening. I would recommend not changing it unless really required.
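(Putting both answers together, a rough sketch of the client-side fix with the typesense Python library. The API key, host, and JSONL file name are placeholders rather than values from this thread; the collection name 'companies' is taken from the import call above. The read timeout is raised so a long synchronous import isn’t cut off, and the server-side batch_size is left at its default.)
import typesense

client = typesense.Client({
    'api_key': 'YOUR_API_KEY',              # placeholder
    'nodes': [{
        'host': 'your-typesense-host',      # placeholder, e.g. a Typesense Cloud hostname
        'port': '443',
        'protocol': 'https',
    }],
    # Client-side read timeout: Jason suggests going as high as 60 minutes (3600 s)
    # so the synchronous import API call never times out.
    'connection_timeout_seconds': 3600,
})

# Hypothetical JSONL export of the BigQuery data; a list of dicts works as well.
with open('companies.jsonl') as f:
    data = f.read().encode('utf-8')

# No batch_size is passed, so the server-side default is kept and search requests
# aren't deprioritized while the import runs. The response contains per-document
# success/failure info worth checking.
client.collections['companies'].documents.import_(data)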

Cirdes
10:14 PM
Nov 03, 2022 (11 months ago)
Cirdes
03:38 PM


Jason
06:02 PM