#community-help

Issue with Slow Bulk Indexing for 3 Million Documents

TLDR Digamber reports slow indexing performance for 3 million documents. Jason asks for the cluster ID and suggests confirming bulk import usage; Digamber will email a JSONL file and curl command for further investigation.



Mar 09, 2023 (7 months ago)
Digamber
05:42 PM
Hi guys, I am importing 3 million documents into a collection.
I’m bulk indexing in chunks - it’s taking me 7 secs per 40 records.
With some quick maths - it’s going to take me about 145 hours 😢
Is there a more performant way I can index these posts?
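
The estimate in the message above can be checked with a short calculation. This is only a sketch of the arithmetic, assuming every chunk takes the same 7 seconds and chunks are sent one after another; the function name `estimated_hours` is made up for illustration.

```python
def estimated_hours(total_docs: int, batch_size: int, seconds_per_batch: float) -> float:
    """Estimate total import time, assuming sequential batches of constant duration."""
    batches = -(-total_docs // batch_size)  # ceiling division: number of batches needed
    return batches * seconds_per_batch / 3600


# Figures from the message: 3 million documents, 40 per chunk, 7 seconds per chunk.
hours = estimated_hours(3_000_000, 40, 7)
print(round(hours, 1))  # 145.8
```

That is 75,000 chunks at 7 seconds each, or roughly 5.7 documents per second.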
Jason
06:14 PM
That sounds odd. Is this on Typesense Cloud? If so, can you share the cluster ID?
Digamber
06:21 PM
I have the node ID - is the cluster ID the same?
Jason
06:36 PM
Yup it’s the same
Digamber
06:37 PM
Here you go: bt0k9i561mfvgzp4p
Jason
07:42 PM
Hmm, metrics look fine on this cluster
Jason
07:42 PM
Could you confirm that you’re using the bulk import endpoint?
Jason
07:43 PM
If so, could you try replicating the issue with curl?
Digamber
07:55 PM
I can indeed confirm that it’s the bulk import endpoint
Jason
07:56 PM
OK, if you can send me a JSONL file (via email) and a curl command to replicate the issue, I can take a closer look
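
For readers unfamiliar with the request above, here is a rough sketch of what a JSONL payload and a bulk import request look like, written in Python rather than curl. It is not Digamber's actual code: the host, collection name `posts`, API key, and sample documents are all placeholders, and the helper names `to_jsonl` and `build_import_request` are made up for illustration. The endpoint path and the `X-TYPESENSE-API-KEY` header follow the Typesense import API.

```python
import json
import urllib.parse
import urllib.request


def to_jsonl(docs):
    """Serialize documents as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(doc) for doc in docs)


def build_import_request(base_url, collection, api_key, jsonl_body, action="create"):
    """Build (but do not send) a POST request to the bulk import endpoint."""
    url = (
        f"{base_url}/collections/{urllib.parse.quote(collection)}"
        f"/documents/import?action={action}"
    )
    return urllib.request.Request(
        url,
        data=jsonl_body.encode("utf-8"),
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
        method="POST",
    )


# Placeholder values for illustration only; none of these come from the thread.
docs = [{"id": "1", "title": "First post"}, {"id": "2", "title": "Second post"}]
req = build_import_request(
    "https://example-host.invalid", "posts", "PLACEHOLDER_KEY", to_jsonl(docs)
)
print(req.get_method(), req.full_url)

# To actually send it: urllib.request.urlopen(req)
# The response body is itself JSONL, with one result line per imported document.
```

Sending many documents in a single request body like this, and timing that one call, is the kind of minimal reproduction that separates server-side indexing time from client-side overhead.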
Digamber
07:57 PM
Ah, OK - it’s a bit late today. I will email it to you tomorrow, thanks for the help
