# community-help
d
How fast should I expect the inserting process to be? I have around 10 million documents totaling ~2-3 GB.
j
It depends on the size of each document (specifically the indexed fields). As a point of reference, I’ve been able to index 2M documents with about 0.2KB per document in about 4 minutes
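The reference numbers above can be turned into a rough back-of-envelope estimate. This is only a sketch and rests on two assumptions: indexing throughput stays roughly constant as the dataset grows, and the documents are similar in size to the ~0.2 KB reference documents. Conveniently, 2-3 GB spread over 10M documents works out to about 0.2-0.3 KB per document, so the sizes are comparable. The helper names are illustrative only.

```python
# Back-of-envelope extrapolation from the reference run quoted above.
# Assumes constant throughput and similar document sizes; real results depend
# on hardware (vCPU count), schema, and which fields are indexed.

REF_DOCS = 2_000_000   # documents in the reference run
REF_MINUTES = 4        # wall-clock time of the reference run


def docs_per_second(docs: int, minutes: float) -> float:
    """Average indexing throughput in documents per second."""
    return docs / (minutes * 60)


def avg_doc_kb(total_gb: float, docs: int) -> float:
    """Average document size in KB, using 1 GB = 1,000,000 KB."""
    return total_gb * 1_000_000 / docs


def estimated_minutes(docs: int, rate: float) -> float:
    """Minutes needed to index `docs` documents at `rate` docs/s."""
    return docs / rate / 60


rate = docs_per_second(REF_DOCS, REF_MINUTES)
print(f"reference throughput: {rate:,.0f} docs/s")
for total_gb in (2, 3):
    print(f"{total_gb} GB over 10M docs ~ {avg_doc_kb(total_gb, 10_000_000):.2f} KB/doc")
print(f"10M docs at that rate ~ {estimated_minutes(10_000_000, rate):.0f} minutes")
```

Under these assumptions the estimate lands around 20 minutes for 10M documents; treat it as an order of magnitude, not a prediction.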
d
Awesome, that’s super fast
c
How many documents do you put into each API call for that example, @Jason Bosco? Did you have to change the default API timeout at all?
j
In that case, I put all 2M docs in a single API call.
There’s no timeout on the server-side API, but if you’re using a client library, you do have to increase its client-side timeout.
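Since the server never times out an import, the client-side timeout is the one that decides whether a large import gets cut off. The official clients expose it as a setting (for example `connectionTimeoutSeconds` in the JS client and `connection_timeout_seconds` in the Python client). As a library-agnostic sketch, here is the same idea with Python's standard library, posting JSONL to the `/collections/<collection>/documents/import` endpoint. The URL, API key, collection name, and helper functions are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical connection details; substitute your own.
TYPESENSE_URL = "http://localhost:8108"
API_KEY = "xyz"


def to_jsonl(docs):
    """Serialize a list of dicts as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(d) for d in docs).encode("utf-8")


def build_import_request(collection, docs, action="create"):
    """Build (but do not send) a bulk-import request for one collection."""
    query = urllib.parse.urlencode({"action": action})
    url = f"{TYPESENSE_URL}/collections/{urllib.parse.quote(collection)}/documents/import?{query}"
    return urllib.request.Request(
        url,
        data=to_jsonl(docs),
        method="POST",
        headers={"X-TYPESENSE-API-KEY": API_KEY, "Content-Type": "text/plain"},
    )


def send_import(req, timeout_seconds=None):
    """Send the request. timeout_seconds=None disables the client-side timeout;
    otherwise choose a value large enough for the whole batch to be indexed."""
    with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
        # The import endpoint answers with one JSON result per imported line.
        body = resp.read().decode("utf-8")
    return [json.loads(line) for line in body.splitlines() if line.strip()]
```

For example, `send_import(build_import_request("books", docs), timeout_seconds=600)` allows up to ten minutes. Note that urllib's timeout applies to each blocking socket operation (connecting, each read), not to the request as a whole.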
d
For an ETL workflow, do you generally put the docs onto the server where Typesense is hosted?
Also, is it fair to assume the server-side API is the same as the client’s, just called from a different location?
c
Interesting, good to know. I’d been batching 100 docs / 28 MB per API call for my ~1 GB dataset.
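A sketch of the batching approach described here, assuming the documents are plain Python dicts: it splits them into JSONL payloads capped both by document count and by byte size. The defaults (100 documents, 28 MB taken as 28 * 1024 * 1024 bytes) simply mirror the numbers in the message; they are not Typesense limits, and `jsonl_batches` is a hypothetical helper.

```python
import json


def jsonl_batches(docs, max_docs=100, max_bytes=28 * 1024 * 1024):
    """Yield JSONL payloads (bytes) of at most `max_docs` documents and, where
    possible, at most `max_bytes` bytes. A single document larger than
    `max_bytes` is still emitted on its own rather than dropped."""
    batch, size = [], 0
    for doc in docs:
        line = json.dumps(doc).encode("utf-8")
        extra = len(line) + (1 if batch else 0)  # +1 for the newline separator
        if batch and (len(batch) >= max_docs or size + extra > max_bytes):
            yield b"\n".join(batch)
            batch, size = [], 0
            extra = len(line)  # first line of a new batch has no separator
        batch.append(line)
        size += extra
    if batch:
        yield b"\n".join(batch)
```

Given that a single 2M-document call worked in the reference run, these caps can likely be raised considerably; fewer, larger calls mean less per-request overhead, as long as the client-side timeout is raised to match.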
j
https://typesense-community.slack.com/archives/C01P749MET0/p1688945588318639?thread_ts=1688928527.796179&channel=C01P749MET0&message_ts=1688945588.318639 It’s not necessary to do this. In most cases you’ll be running Typesense on its own server and sending the JSONL data via the API from a different server (e.g. the one where your app server is running). But in the example I mentioned above, I did have the JSONL file locally on the server running the Typesense process.
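In the common setup described here, the JSONL file lives on another machine (such as the app or ETL server) and can be streamed over the network instead of being copied to the Typesense host first. A minimal standard-library sketch, assuming the `/collections/<collection>/documents/import` endpoint and the `X-TYPESENSE-API-KEY` header; the URL, key, and `import_jsonl_file` helper are hypothetical placeholders.

```python
import json
import os
import urllib.request

# Hypothetical values: the Typesense node as seen from the machine holding the file.
TYPESENSE_URL = "http://typesense.example.internal:8108"
API_KEY = "xyz"


def import_jsonl_file(path, collection, timeout=None, urlopen=urllib.request.urlopen):
    """Stream a local JSONL file to a remote Typesense node's import endpoint.

    The open file object is passed as the request body, so http.client reads and
    sends it in blocks instead of loading the whole file into memory. `urlopen`
    is injectable only so the function can be tested offline.
    """
    with open(path, "rb") as body:
        req = urllib.request.Request(
            f"{TYPESENSE_URL}/collections/{collection}/documents/import?action=create",
            data=body,
            method="POST",
            headers={
                "X-TYPESENSE-API-KEY": API_KEY,
                # Without an explicit length, urllib falls back to chunked
                # transfer encoding for file objects; set it to avoid that.
                "Content-Length": str(os.path.getsize(path)),
            },
        )
        with urlopen(req, timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
    # One JSON result per imported document, e.g. {"success": true}.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```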
d
Ah I see what you mean
Got it
d
That’s cool
My company was using Pinecone, and it had an incredibly low upload rate limit.
j
Ah I see, good to know! I haven’t benchmarked how many vectors Typesense can index in a given amount of time yet, but in any case, there are no upload limits. For ideal indexing performance, you want at least 4 vCPU cores.
d
I think the bottleneck was the API/networking, not the software itself. I’ll report back on how the experience goes.
Is there a limit to how high the client timeout can be set?
j
No, you can increase it as high as needed.
In fact, in the JS client for example (in the next version), we’ve removed timeouts for import API calls, so they never time out. That’s what we want, so that partial uploads don’t end up clogging up server capacity.
d
Ah, makes sense
Appreciate the quick response!