Aljosa Asanovic
10/11/2021, 9:25 PM
Would it be possible to run multiple 3-node Typesense clusters to achieve horizontal Kubernetes scaling, like you would by having replicas?

Aljosa Asanovic
10/11/2021, 9:29 PM
I imagine you would need to keep the attached volumes synchronized somehow, since you can't index to all locations at once.

Jason Bosco
10/11/2021, 10:19 PM
I haven't tried to do this, but I'd be surprised if this works, because each cluster maintains its own node state information.

Jason Bosco
10/11/2021, 10:20 PM
If the primary goal is to scale read throughput, then you could add an odd number of nodes to the Typesense cluster and load-balance requests between these nodes. However, the tradeoff is that each node you add to the cluster increases write latency, because more nodes need to ack the write before the write API call is deemed a success.

Jason Bosco
10/11/2021, 10:21 PM
Another way to horizontally scale would be to shard the data among multiple clusters. So they're standard independent Typesense clusters, but on the application side, you can write certain types of records or certain user records into particular clusters.

Aljosa Asanovic
10/11/2021, 10:45 PM
For our use case, write latency is not very important. We add anywhere from 5-15k documents in a single batch and can do 1 or more batches per day. Initially, documents are set to an "unpublished" state by using a scoped API key and a boolean "published" field. We then upsert the batch again to publish once we've previewed/reviewed the data for accuracy. (We're not actually doing keyword search, but rather using Typesense to build a search-powered experience with facets, breadcrumbs, infinite scroll, etc. in an easy manner with fast response times.) Is increased latency linear for each added node? I think we need to use aliases here to switch over to a fully indexed collection, but I think every collection would technically double the required memory for Typesense, at least temporarily. I have not yet had the chance to run full benchmarks; our production document count will start around 5 million.
> I haven't tried to do this, but I'd be surprised if this works, because each cluster maintains its own node state information
Understood, that makes sense.
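
The application-side sharding Jason describes (writing certain user records into particular clusters) could look roughly like the sketch below. It is only an illustration: the cluster URLs and the `cluster_for` helper are hypothetical names, not part of Typesense, and hashing a user ID is just one possible routing rule.

```python
import hashlib

# Hypothetical endpoints, one per independent Typesense cluster.
CLUSTERS = [
    "https://typesense-a.example.com",
    "https://typesense-b.example.com",
    "https://typesense-c.example.com",
]


def cluster_for(shard_key: str, clusters=CLUSTERS) -> str:
    """Deterministically map a shard key (e.g. a user ID) to one cluster."""
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


if __name__ == "__main__":
    # Both reads and writes for a given user must use the same routing rule.
    for user_id in ("user-1", "user-2", "user-3"):
        print(user_id, "->", cluster_for(user_id))
```

One caveat with simple modulo routing like this: changing the number of clusters remaps most keys, so growing the fleet would mean re-indexing the affected records.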
Jason Bosco
10/11/2021, 11:02 PM
> Is increased latency linear for each added node?
I haven't benchmarked the growth of latency vs. node count myself. Would be curious to know what you see!
> I think we need to use aliases here to switch over to a fully indexed collection but I think every collection would technically double the required memory for Typesense
Yeah, for 5M documents that might take enough RAM, especially across multiple nodes, to be expensive. So an in-place update might be better.

Aljosa Asanovic
10/12/2021, 3:09 PM
> I haven't benchmarked the growth of latency vs. node count myself. Would be curious to know what you see!
I'll keep you posted with what we see!
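
Since nobody in the thread has measured how write latency grows with node count, a small timing harness is one way to check. The sketch below is an assumption-laden illustration: `measure_latency` is a hypothetical helper, and the `operation` callable stands in for whatever batch write you run against a cluster of a given size.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(operation: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Time repeated calls to `operation` and summarize latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()  # e.g. one batch upsert against the cluster under test
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real write call per cluster size.
    print(measure_latency(lambda: time.sleep(0.001), runs=5))
```

Running the same workload against, say, a 3-node and then a 5-node cluster and comparing the medians would show whether the growth looks linear for a given setup.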