I have a gigantic data set (50gb+ and growing) as...
# community-help
s
I have a gigantic data set (50 GB+ and growing), and as an indie developer, cloud VPS costs for RAM can get very expensive. Currently I'm dealing with this by eating through some cloud credits, but they will run out soon. I make very few requests to the actual Typesense instance, and many of the records are hit extremely infrequently. Do you have any strategies you recommend to reduce the RAM needed in these scenarios? Is the only real strategy probably something along the lines of syncing less JSON/data from your master database to Typesense, or are there any other ideas you can think of to bring my RAM requirements down with this growing data set?
j
Is the only real strategy probably something along the lines of syncing less JSON/data from your master database
That pretty much sums it up! Typesense is an in-memory data store, so you need sufficient RAM to hold the full dataset. One common way to blow up RAM usage is to index every single field in your dataset. So you want to be cognizant of which fields you use for search / filtering / sorting and which are display-only fields (e.g. image URLs) that are only needed to render the search results. Leave display-only fields out of the collection schema; you can still include them in the document when indexing it. Any fields not mentioned in the schema but present in documents are treated as unindexed fields: they are stored only on disk and won't take up RAM. So optimizing your collection schema is a good way to control RAM usage. Separately, enabling
facet: true
on a field increases RAM usage, so you want to be judicious about that as well.
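To make the schema advice above concrete, here is a minimal sketch in Python. The collection name, field names, and the `unindexed_fields` helper are hypothetical and chosen only for illustration; the schema dict follows the general shape of a Typesense collection schema, but check the field options against the docs for your version.

```python
# Hypothetical "products" collection: only fields that are searched,
# filtered, or sorted on are declared in the schema.
schema = {
    "name": "products",
    "fields": [
        {"name": "title", "type": "string"},
        # Faceting costs extra RAM, so enable it only where it is needed.
        {"name": "category", "type": "string", "facet": True},
        {"name": "price", "type": "float"},
    ],
}

# The document still carries display-only fields (image_url, description_html).
# They are not declared in the schema above.
document = {
    "id": "1",
    "title": "Trail running shoe",
    "category": "footwear",
    "price": 89.5,
    "image_url": "https://example.com/img/1.jpg",
    "description_html": "<p>Lightweight shoe</p>",
}


def unindexed_fields(schema, document):
    """Return the document keys that are not declared in the schema."""
    declared = {field["name"] for field in schema["fields"]}
    return sorted(k for k in document if k not in declared and k != "id")


print(unindexed_fields(schema, document))
```

A helper like this can be run over a sample of documents before syncing to see which fields would be stored on disk only and which would be indexed in RAM.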
s
By display-only, do you mean not searchable? If I want to display them to my end user in the results but not search across them, I don't put them in the schema. Is my interpretation correct? Roughly how much weight do indexes, facets, and non-searchable fields each add, so I can make a pro/con decision?
f
You can put them in the schema, just add
index: false
to the field definition
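As a rough sketch of that alternative, the display-only field is declared in the schema but flagged with `index: false`. The collection and field names and the `indexed_field_names` helper are hypothetical. Marking the field `optional` as well is an assumption here (some Typesense versions expect it for unindexed fields), so verify it against the docs for your version.

```python
# Hypothetical schema: image_url is declared but excluded from the in-memory index.
schema_with_flags = {
    "name": "products",
    "fields": [
        {"name": "title", "type": "string"},
        # Assumption: "optional": True accompanies "index": False.
        {"name": "image_url", "type": "string", "index": False, "optional": True},
    ],
}


def indexed_field_names(schema):
    """Return the names of fields that will be indexed (index defaults to True)."""
    return [f["name"] for f in schema["fields"] if f.get("index", True)]


print(indexed_field_names(schema_with_flags))
```

Declaring the field this way keeps the schema as a complete description of the document while still keeping that field out of RAM.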