Issue with Slow Bulk Indexing for 3 Million Documents
TLDR Digamber reports slow indexing performance for 3 million documents. Jason asks for the cluster ID and suggests confirming bulk import usage; Digamber will email a JSONL file and curl command for further investigation.
Mar 09, 2023 (7 months ago)
Digamber
05:42 PM
I’m doing chunked bulk indexing, and it’s taking me 7 secs per 40 records.
With some quick maths, 3 million documents are going to take me about 145 hours 😢
Is there a more performant way I can index these posts?
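For reference, the estimate above and a batched JSONL import can be sketched roughly as follows. This is only a minimal sketch, not what was actually run in this thread: the base URL, the collection name `posts`, the API key, and all helper function names are hypothetical placeholders. It assumes the documented Typesense import endpoint (`POST /collections/<name>/documents/import` with a newline-delimited JSON body), and it only builds the request without sending it.

```python
import json
import urllib.request
from itertools import islice
from typing import Iterable, Iterator


def estimate_hours(total_docs: int, docs_per_batch: int, seconds_per_batch: float) -> float:
    """Projected wall-clock hours if every batch takes the same time."""
    batches = total_docs / docs_per_batch
    return batches * seconds_per_batch / 3600


def jsonl_batches(docs: Iterable[dict], batch_size: int) -> Iterator[str]:
    """Yield JSONL payloads (one JSON document per line), batch_size docs at most."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield "\n".join(json.dumps(doc) for doc in chunk)


def build_import_request(base_url: str, collection: str, api_key: str,
                         payload: str) -> urllib.request.Request:
    """Build (but do not send) a bulk import request for one JSONL payload."""
    url = f"{base_url}/collections/{collection}/documents/import?action=create"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
    )


if __name__ == "__main__":
    # Figures from the message above: 3 million docs, 40 per batch, 7 s per batch.
    print(f"{estimate_hours(3_000_000, 40, 7):.1f} hours")

    # Hypothetical documents, host, collection, and key, for illustration only.
    docs = ({"id": str(i), "title": f"post {i}"} for i in range(10_000))
    for payload in jsonl_batches(docs, batch_size=5_000):
        req = build_import_request("http://localhost:8108", "posts", "xyz", payload)
        # urllib.request.urlopen(req) would send it; left out of this sketch.
        print(req.get_method(), req.full_url, len(payload), "bytes")
```

The idea is simply to send a few large JSONL payloads rather than many 40-record calls. The batch size of 5,000 is an arbitrary starting point and would need tuning against the cluster's CPU and memory.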
Jason
06:14 PM
Digamber
06:21 PM
Jason
06:36 PM
Digamber
06:37 PM
Jason
07:42 PM
Jason
07:42 PM
Jason
07:43 PM
Digamber
07:55 PM
Jason
07:56 PM
Digamber
07:57 PM
Similar Threads
Revisiting Typesense for Efficient DB Indexing and Querying
kopach experienced slow indexing and crashes with Typesense. The community suggested using batch import and checking the server's resources. Improvements were made, but additional support was needed for special characters and multi-search queries.
Troubleshooting Indexing Duration in Typesense Import
Alan asked about lengthy indexing times for importing documents to Typesense. Jason suggested various potential causes, including network connectivity and system resources. They later identified the problem to be an error in Alan's code.
Typesense Import Issue with HTTP Code 503 Error
Tomas faced errors while importing to Typesense, including an HTTP code 503. Jason identified the issue as CPU exhaustion and recommended slowing down writes or upgrading to at least 4 vCPUs.
Implementing Typesense Updates with JSONL Import and Aliases
Ken is building a search solution for a website using Typesense. They consulted Kishore Nallan about the implementation of updates using JSONL import and aliases and how to know when the new collection is indexed and ready. Measures, such as dividing large imports into small batches, were suggested to address the issue.
Bulk Import 50MB JSON Files Error - Timeout and Solutions
madhweep encountered an error while bulk importing JSON files. Kishore Nallan provided help, but the issue persisted. Jason intervened, and after troubleshooting they concluded that the cluster had run out of memory. The problem was resolved by using a cluster with sufficient memory. Daniel experienced a similar issue, which was resolved by increasing the timeout.