#community-help

Optimal batch_size for Importing Million Document Dataset

TLDR Ahmad asked for the best batch_size for inserting a million documents. Jason advised leaving batch_size at its default and using client-side batching instead, sending about 5000 documents per import API call.

Solved
Jan 21, 2022 (22 months ago)
Ahmad
10:30 PM
Hi everyone, what should the maximum batch_size value be for inserting a million documents? I need a value that will not affect performance.
Jason
10:31 PM
Ahmad, I'd recommend just leaving it at its default value and using client-side batching instead: send, say, 5000 documents per import API call.
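A minimal sketch of the client-side batching Jason describes, assuming the documents are available as a Python iterable of dicts. `send_batch` is a hypothetical placeholder for whatever function performs one import API call; it is not named in this thread, and the helper names are illustrative only.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(documents: Iterable[dict], size: int = 5000) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    it = iter(documents)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def import_in_batches(
    documents: Iterable[dict],
    send_batch: Callable[[List[dict]], object],  # hypothetical: one import API call
    size: int = 5000,
) -> int:
    """Send documents one batch at a time; return the number of calls made."""
    calls = 0
    for batch in batched(documents, size):
        send_batch(batch)
        calls += 1
    return calls
```

With one million documents and the suggested batch size of 5000, this would make 200 import calls, and the server-side batch_size stays at its default.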
Ahmad
10:32 PM
And what if I run multiple API calls, each inserting 5000 documents?
Jason
10:33 PM
Yeah, that should be fine. You want to keep an eye on CPU usage to make sure it's not saturated.
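One way to run several import calls at once while keeping the load bounded is a small thread pool. This is only a sketch: `send_batch` is again a hypothetical stand-in for a single import API call, and `max_workers=4` is an arbitrary assumption to tune while watching the server's CPU usage.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunks(documents: Iterable[dict], size: int = 5000) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    it = iter(documents)
    while batch := list(islice(it, size)):
        yield batch


def import_concurrently(
    documents: Iterable[dict],
    send_batch: Callable[[List[dict]], object],  # hypothetical: one import API call
    size: int = 5000,
    max_workers: int = 4,  # assumed cap on simultaneous calls
) -> int:
    """Send batches with at most `max_workers` calls in flight; return the call count."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map queues every batch up front, so all batches are held in
        # memory; any exception raised by send_batch is re-raised here.
        results = list(pool.map(send_batch, chunks(documents, size)))
    return len(results)
```

Lowering `max_workers` is the simplest lever if CPU usage on the server approaches saturation.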
Ahmad
10:37 PM
It's 48%.
