Cirdes Henrique
10/28/2022, 4:53 PM
Jason Bosco
10/28/2022, 4:55 PM
Jason Bosco
10/28/2022, 4:56 PM
Jason Bosco
10/28/2022, 4:56 PM
Cirdes Henrique
10/28/2022, 4:57 PM
Jason Bosco
10/29/2022, 12:41 AM
git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up
and visited http://localhost:8000/
I do see the typesense connector code in the master branch of the airbyte repo, but it's not showing up in the UI.
Cirdes Henrique
10/29/2022, 11:34 AM
Cirdes Henrique
10/29/2022, 11:35 AM
Cirdes Henrique
10/29/2022, 11:36 AM
Cirdes Henrique
11/01/2022, 9:54 PM
Cirdes Henrique
11/01/2022, 9:59 PM
client.collections['companies'].documents.import_(data)
Cirdes Henrique
11/01/2022, 10:01 PM
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='*', port=443): Read timed out. (read timeout=2)
Cirdes Henrique
11/01/2022, 10:05 PM
Jason Bosco
11/01/2022, 10:08 PM
> Should I call import every 100000 records or change it?
Yeah, that's good.
> Should I change timeout threshold on python lib?
Yes, definitely. Typesense does the indexing synchronously as part of the API call, so you want to set the timeout to as high as 60 minutes to make sure the import API call never times out.
> Import method has a batch_size option, it should match the number of records in import_ data?
No, this parameter controls the server-side batching. For example, if you sent 100K records in a single import API call and set batch_size to say 5000, Typesense will import the first 5K records from the 100K docs you sent over in one execution cycle, then switch to processing any pending search requests in the queue, execute them, and then switch back to the import queue to import the next 5K records, and so on.
So increasing this server-side batch_size could cause search requests to be deprioritized while the import is happening. I would recommend not changing it unless really required.
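Putting that advice together, here is a minimal sketch of a chunked import with the typesense-python client, assuming the companies collection already exists and the documents are available as a list of dicts; the node details and API key are placeholders, and connection_timeout_seconds is raised so long-running import calls don't hit the read timeout shown above.

import typesense

# Sketch only: host, API key, and document source are placeholders.
client = typesense.Client({
    'nodes': [{'host': 'xxx.a1.typesense.net', 'port': '443', 'protocol': 'https'}],
    'api_key': 'YOUR_API_KEY',
    # Typesense indexes synchronously inside the import call, so raise the
    # client read timeout (the error above shows a 2-second timeout).
    'connection_timeout_seconds': 60 * 60,
})

CLIENT_BATCH = 100_000  # records sent per import API call (client-side batching)

def import_all(docs):
    for i in range(0, len(docs), CLIENT_BATCH):
        # Leave the server-side batch_size parameter at its default so search
        # requests are not deprioritized while the import runs.
        client.collections['companies'].documents.import_(docs[i:i + CLIENT_BATCH])

Checking the per-document results returned by import_ and retrying failures is left out of this sketch for brevity.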
Cirdes Henrique
11/01/2022, 10:14 PM
Cirdes Henrique
11/03/2022, 3:38 PM
Jason Bosco
11/03/2022, 6:02 PM