# community-help
t
How can I know that indexing completed on a dataset?
k
When the endpoint response arrives, indexing is done. All endpoints are synchronous.
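The call only returns after indexing finishes, so a client just blocks on the HTTP response and then checks per-document status. Below is a minimal sketch. It assumes the JSON-lines body returned by the bulk import endpoint, with one object per document carrying a `success` flag and, on failure, an `error` string. `summarize_import` is a hypothetical helper and is not part of any client library:

```python
import json


def summarize_import(response_body: str) -> dict:
    """Count successes and failures in a JSON-lines import response.

    Assumes one JSON object per line with a boolean "success" key and,
    for failed documents, an "error" message.
    """
    ok, errors = 0, []
    for line in response_body.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank or trailing lines
        result = json.loads(line)
        if result.get("success"):
            ok += 1
        else:
            errors.append(result.get("error", "unknown error"))
    return {"imported": ok, "failed": len(errors), "errors": errors}


body = '{"success": true}\n{"success": false, "error": "Bad JSON."}\n'
print(summarize_import(body))
# {'imported': 1, 'failed': 1, 'errors': ['Bad JSON.']}
```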
t
If we restart the instance, it takes a long time before it's usable, around 30 minutes. Does it re-index on reboot?
k
Yes, only raw documents are stored on disk and indexing happens in memory on restart.
t
Alright, what's the bottleneck there? CPU or disk speed?
k
CPU. The latest 0.23 RC builds are faster in this respect.
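Restart time tracks CPU rather than disk because disk holds only raw documents, so every document has to be parsed and tokenized again on boot. The toy sketch below shows that rebuild step. `rebuild_index` and the JSON-lines input are assumptions made for illustration and do not reflect Typesense's actual on-disk format or index structures:

```python
import json
import re
from collections import defaultdict


def rebuild_index(jsonl_lines, text_fields):
    """Toy in-memory inverted index rebuilt from raw JSON-lines documents.

    Illustrative only: every document is re-parsed and re-tokenized,
    which is CPU work proportional to the total amount of text.
    """
    docs = {}
    index = defaultdict(set)  # token -> set of document ids
    for line in jsonl_lines:
        doc = json.loads(line)
        doc_id = doc["id"]
        docs[doc_id] = doc
        for field in text_fields:
            for token in re.findall(r"\w+", str(doc.get(field, "")).lower()):
                index[token].add(doc_id)
    return docs, index


raw = [
    '{"id": "1", "title": "Fast search engine"}',
    '{"id": "2", "title": "Search as you type"}',
]
docs, index = rebuild_index(raw, ["title"])
print(sorted(index["search"]))  # ['1', '2']
```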
t
How much faster is 0.23 RC?
k
It depends on the dataset. The main improvements are around numerical fields.
t
Our dataset is mostly text.
k
We recommend running a 3-node configuration so that rotations can be done without a single point of failure.
There might be a few other things we can still do to optimize text fields.
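Three nodes work where two would not because of majority-quorum arithmetic. Typesense's clustering is built on the Raft consensus protocol, in which progress requires a majority of nodes. The sketch below is plain arithmetic, not a Typesense API. It shows that two nodes tolerate no outage, while three tolerate one, which is what lets a single node be rotated out safely:

```python
def quorum_size(nodes: int) -> int:
    """Votes needed for a majority in a Raft-style cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can be down (e.g. mid-rotation) while a majority remains."""
    return nodes - quorum_size(nodes)


for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```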
t
Yeah, we're starting with a 3-node cluster.
What's the optimal size in terms of keys?
We have 60 data points that we need to filter on.
@Kishore Nallan How often does the API break? Could rolling upgrades on a cluster be a problem?
k
Every node stores all the data, so adding nodes increases throughput across many users.
We've successfully rolled out 5 versions so far on Typesense Cloud across hundreds of deployments, and we take care to maintain backward compatibility.
We store nothing but documents on disk, so upgrades cause few problems.
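A rolling upgrade keeps quorum by touching only one node at a time and waiting for it to recover before moving on. That wait matters here, since a restarted node re-indexes in memory before it can serve. Below is a minimal sketch. `upgrade` and `is_healthy` are hypothetical hooks: in practice `is_healthy` might poll the node's `/health` endpoint and `upgrade` might restart the node on the new version. The default timeout is an arbitrary assumption:

```python
import time
from typing import Callable, Iterable


def rolling_upgrade(
    nodes: Iterable[str],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    timeout_s: float = 3600.0,
    poll_s: float = 5.0,
) -> None:
    """Upgrade nodes one at a time, waiting for each to report healthy."""
    for node in nodes:
        upgrade(node)
        deadline = time.monotonic() + timeout_s
        while not is_healthy(node):
            if time.monotonic() >= deadline:
                # Stop before touching another node so quorum is never lost.
                raise TimeoutError(f"{node} did not become healthy; aborting")
            time.sleep(poll_s)


# Usage with fake hooks: a node counts as healthy once it has been upgraded.
upgraded = []
rolling_upgrade(["n1", "n2", "n3"], upgraded.append, lambda n: n in upgraded, poll_s=0)
print(upgraded)  # ['n1', 'n2', 'n3']
```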
t
Superb 🙂
Thank you
k
👍
t
@Kishore Nallan Isn't it possible to periodically dump the in-memory index to disk, so it doesn't need to be re-indexed on reboot?
k
You could try doing this via CRIU: https://criu.org/Main_Page
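Below is a minimal sketch of a CRIU checkpoint/restore cycle driven from Python. It only builds the command lines and executes nothing. The PID and images directory are placeholders. It assumes CRIU is installed and run as root, and a server holding open client TCP connections may additionally need `--tcp-established`:

```python
import shlex


def criu_dump_cmd(pid: int, images_dir: str) -> list[str]:
    # --leave-running keeps the process serving after the checkpoint is taken.
    return ["criu", "dump", "-t", str(pid), "-D", images_dir, "--leave-running"]


def criu_restore_cmd(images_dir: str) -> list[str]:
    # -d detaches criu once the restored process tree is running.
    return ["criu", "restore", "-D", images_dir, "-d"]


print(shlex.join(criu_dump_cmd(1234, "/var/lib/ts-checkpoint")))
# criu dump -t 1234 -D /var/lib/ts-checkpoint --leave-running
# To actually run it: subprocess.run(criu_dump_cmd(pid, images_dir), check=True)
```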
t
Yeah, that's how we currently do it with KVM, but it doesn't help if there's a hardware failure.