# community-help
s
Hey so I want to add read replicas, or some sort of global CDN, to my open source instance of Typesense to make performance better for my users globally. Basically I have a ton of data (around 60GB), so currently I have this on a Hetzner box for around 120 euro p/m, which is like 5-10x less than cloud providers. So this works for me financially, however I want to get latency really lowered down with read replicas or a CDN approach. My workflow is extremely read heavy, in fact I will only really write in batch jobs a couple of times a month. Any ideas of how I could go about this with the OSS route, or am I best just literally buying two boxes and applying writes to both boxes when they need them? Bear in mind this will multiply my cost per read replica, as I assume I'll need to have a VM with 64GB of RAM for each instance.
j
A multi-node clustered setup in Typesense requires a minimum of 3 nodes for quorum. So if cost is a constraint, and especially given your minimal writes, you could just spin up two standalone instances and do dual writes to each.
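As a rough sketch of what that dual-write batch job could look like, assuming the official `typesense` JS client; the hostnames, API key, and `foods` collection below are placeholders, not anything from the thread:

```ts
import Typesense from 'typesense';

// Two standalone instances, each holding a full copy of the data.
const replicaHosts = ['ts-london.example.com', 'ts-sf.example.com'];

const replicas = replicaHosts.map(
  (host) =>
    new Typesense.Client({
      nodes: [{ host, port: 443, protocol: 'https' }],
      apiKey: process.env.TYPESENSE_ADMIN_KEY ?? '',
      connectionTimeoutSeconds: 600, // batch imports can take a while
    })
);

// Apply the same batch of documents to every standalone instance.
// Since writes only happen a few times a month, a sequential loop is fine.
async function importBatch(documents: Record<string, unknown>[]) {
  for (const client of replicas) {
    await client
      .collections('foods')
      .documents()
      .import(documents, { action: 'upsert' });
  }
}
```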
> as I assume I'll need to have a VM with 64GB of RAM for each instance.
That's correct
You would then need a CDN on top that supports multiple origins and can route to those origins based on the request's geographic origin.
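Since the user is already on Cloudflare Workers (mentioned below), the geo-routing piece could also just be a small Worker in front of the two boxes. A rough sketch, assuming the Workers runtime populates `request.cf` and reusing the placeholder hostnames from the sketch above:

```ts
// Map Cloudflare's continent code to the nearest Typesense origin.
const ORIGINS: Record<string, string> = {
  NA: 'ts-sf.example.com',     // North America -> San Francisco box
  EU: 'ts-london.example.com', // Europe -> London box
};
const FALLBACK_ORIGIN = 'ts-london.example.com';

export default {
  async fetch(request: Request): Promise<Response> {
    // `request.cf.continent` is filled in by Cloudflare at the edge.
    const continent = (request as any).cf?.continent as string | undefined;

    const url = new URL(request.url);
    url.hostname = ORIGINS[continent ?? ''] ?? FALLBACK_ORIGIN;

    // Forward the request (method, headers, body) to the chosen origin.
    return fetch(new Request(url.toString(), request));
  },
};
```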
s
Thank you @Jason Bosco, just to follow up: I'm using Cloudflare Workers so all my API requests are distributed. Most of my customers are UK/US, so I was thinking of just spinning up a box in London and somewhere near San Francisco, for example. One other idea I had was just adding a Redis cache for queries to speed up common reads. Is there any way to reduce my RAM requirements per box? (That would make the decision a lot easier financially.) I have about 60GB of data and I believe you need to mirror that with RAM with Typesense.
j
With a 60GB dataset, I would imagine that your cache hit ratio would be low. But if that's not the case and you have a few queries that are commonly accessed, then even an HTTP-based CDN cache would be helpful, instead of having to introduce Redis as a cache.
I believe Cloudflare has a way to dynamically set Cache-Control headers for all requests, overriding any origin headers.
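One way that could look from a Worker, as a sketch only: force edge caching for the origin response and override whatever Cache-Control the Typesense origin sends. This assumes `@cloudflare/workers-types` for the `cf` fetch options, uses an arbitrary one-day TTL, and relies on the cacheable GET single-search endpoint (POST multi_search responses wouldn't be cached this way):

```ts
export default {
  async fetch(request: Request): Promise<Response> {
    const originResponse = await fetch(request, {
      cf: {
        cacheEverything: true, // cache even without origin cache headers
        cacheTtl: 86400,       // seconds to keep the response at the edge
      },
    });

    // Also set an explicit header so downstream caches agree with the edge.
    const response = new Response(originResponse.body, originResponse);
    response.headers.set('Cache-Control', 'public, max-age=86400');
    return response;
  },
};
```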
s
Hey @Jason Bosco, so my app is a diet and nutrition one, but with AI logging: essentially the AI takes a natural language description or photo of food ("a carbonara, two glasses of red wine, and a chocolate mousse") and turns it into three queries: "spaghetti", "red wine", "chocolate mousse". I'm currently using a federated search (where I query for all three at the same time in one request), but perhaps it would be better to do this as a Promise.all with the three separated queries, as I'm more likely to get cache hits that way, e.g. "coffee" vs "coffee and orange juice" (federated). Hope that makes sense. Do you think this approach makes the most sense, or am I going to lose a lot by spamming Typesense with additional requests vs batching in a federated search? And which would you recommend over the other, the Redis or the Cloudflare approach? (I'm using Cloudflare Workers already, but Upstash and the like are pretty easy and cheap to set up, so that wouldn't be a problem either.)
> With a 60GB dataset, I would imagine that your cache hit ratio would be low.

Honestly an absurd amount of the dataset will never be touched (as naturally there are like 100 different Coca-Colas in my DB, plus lots of uncommonly eaten foods, etc.), but because I'm in alpha/beta, and also because of the sheer volume of data, it's hard to reduce this down at the current time. Hopefully AI models and context windows will get to a place where I can get an AI to do the grunt work and go through the millions and millions of records and clean this up to reduce the size. That way it would be more affordable for me to use managed hosting etc., as a 64GB RAM box is basically a fortune without relying on cloud credits or Hetzner etc. I'd imagine things like "coffee" for example would get a lot of cache hits, but not "dragonfruit".
j
> but perhaps it would be better to do this as a Promise.all with the three separated queries, as I'm more likely to get cache hits that way
Yeah, from a caching perspective, you might want to separate the query for each ingredient into its own single-search request, instead of using multi-search.
If speed is your primary concern, then Typesense also has a built-in caching mechanism that you can enable: https://typesense.org/docs/26.0/api/search.html#caching-parameters. This would essentially be the equivalent of manually doing it with Redis as a cache, since the Typesense cache is also in-memory.
❤️ 1
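Putting those two suggestions together, a rough sketch of per-ingredient single-search GET requests with the built-in cache parameters from the docs link above. The hostname, collection, `query_by` field, and key are placeholders, not anything confirmed in the thread:

```ts
const TYPESENSE_HOST = 'https://ts-london.example.com';
const TYPESENSE_SEARCH_KEY = 'search-only-key';

async function searchIngredient(q: string) {
  const params = new URLSearchParams({
    q,
    query_by: 'name',
    use_cache: 'true',  // built-in in-memory cache on the Typesense server
    cache_ttl: '86400', // seconds; align with how rarely the batch writes run
  });
  const res = await fetch(
    `${TYPESENSE_HOST}/collections/foods/documents/search?${params}`,
    { headers: { 'X-TYPESENSE-API-KEY': TYPESENSE_SEARCH_KEY } }
  );
  return res.json();
}

// Each ingredient becomes its own GET request, so "coffee" hits the same
// cache entry whether it was logged alone or alongside "orange juice".
function searchIngredients(ingredients: string[]) {
  return Promise.all(ingredients.map(searchIngredient));
}

// e.g. searchIngredients(['spaghetti', 'red wine', 'chocolate mousse']);
```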
s
I guess I'm thinking that the Cloudflare Workers cache will be next to the user, and also that it will take a bit of load off Typesense (a minor benefit tbh, since my server will have a ton of spare compute for the amount of requests; my data size is the main issue, so I'll have a load of spare CPU). And really, Typesense is so fast in general (also in-memory) that, as you say, the main thing is the trip to the server.
Congrats on 5 billion searches per month btw @Jason Bosco 🙂 ❤️ !!! This is my third time using the product and it is just the best.
> Yeah, from a caching perspective, you might want to separate the query for each ingredient into its own single-search request, instead of using multi-search.
I guess the catch is, say I'm searching for ["red wine", "spaghetti", "coke"]: with a federated search I'm making one request vs three if I do them as single requests. But I guess it's still probably better if I gain better CDN caching and I can parallelize the requests to the Typesense server.
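For what it's worth, one way that trade-off could look in the Worker: a sketch using the Workers Cache API (`caches.default`) so a repeat query like "coffee" is answered from the colo nearest the user and never reaches the Typesense box. It assumes `@cloudflare/workers-types`; the hostname, collection, API key, and TTL are placeholders.

```ts
async function cachedSearch(q: string): Promise<Response> {
  const url =
    'https://ts-london.example.com/collections/foods/documents/search?' +
    new URLSearchParams({ q, query_by: 'name' });
  const cacheKey = new Request(url); // a GET request, so it is a valid cache key
  const cache = caches.default;

  let response = await cache.match(cacheKey);
  if (!response) {
    response = await fetch(url, {
      headers: { 'X-TYPESENSE-API-KEY': 'search-only-key' },
    });
    // Re-wrap the response so an explicit TTL can be attached before storing it.
    response = new Response(response.body, response);
    response.headers.set('Cache-Control', 'public, max-age=86400');
    await cache.put(cacheKey, response.clone());
  }
  return response;
}

// Federated vs parallel: three ingredients become three cached lookups.
// Promise.all([cachedSearch('red wine'), cachedSearch('spaghetti'), cachedSearch('coke')]);
```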
Thank you for all the advice btw! Appreciate it.
👍 1