# community-help
g
I’m really impressed with the Typesense search engine so far. Until now, I’ve been working with fewer than 100,000 items, but I’m planning to scale up to several million. I’d love to hear from anyone who has successfully indexed around 60,000,000 items; any insights or lessons learned would be greatly appreciated. Thanks a lot!
a
Hi Gabriel, glad to hear you're impressed! There’s virtually no limit to the number of items you can index, as long as your cluster has enough RAM to hold them. In general, though, the larger the collection, the more CPU cycles are required for search and indexing, so some queries might slow down by, say, a few hundred milliseconds once a single collection holds over 50M docs. At that stage you want to consider sharding your documents across multiple collections to maintain fast performance. A couple of things to keep in mind:
• If you have display-only fields (e.g. image URLs), you can further improve write performance by leaving those fields out of the collection schema and simply sending them in the documents when importing into Typesense. Any fields present in the documents but not mentioned in the collection schema are just stored on disk and won't take up RAM or CPU cycles building an index. When a document is a hit for a search query, we'll fetch its display-only fields from disk and include them in the API response.
• Use the bulk import API to efficiently load data into your newly created cluster. I'd recommend starting with a batch size of 1,000 documents per import API call and a concurrency of, say, N-1 parallel import API calls, where N is the number of CPU cores in your cluster (see the sketch after this message).
• You might see 503s when importing into Typesense, which is the built-in back-pressure mechanism. Make sure you handle those in your indexing pipeline as described here
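For reference, a minimal sketch of that batched import pattern using the typesense Python client. The collection name ("items"), host, API key, and retry/backoff numbers are placeholders, and the 503 exception name may differ between client versions, so treat this as an illustration rather than a drop-in script.
```python
import time
import typesense

# Placeholder connection details for a Typesense Cloud cluster
client = typesense.Client({
    "nodes": [{"host": "xxx.a1.typesense.net", "port": "443", "protocol": "https"}],
    "api_key": "YOUR_ADMIN_API_KEY",
    "connection_timeout_seconds": 60,
})

BATCH_SIZE = 1000  # documents per import API call, as suggested above


def import_batch(docs, max_retries=5):
    """Import one batch of documents, backing off when Typesense
    returns 503 (its built-in back-pressure signal)."""
    for attempt in range(max_retries):
        try:
            results = client.collections["items"].documents.import_(
                docs, {"action": "upsert"}
            )
            # import_ returns one result per document; surface any per-doc failures
            failed = [r for r in results if not r.get("success")]
            if failed:
                print(f"{len(failed)} documents failed in this batch")
            return
        except typesense.exceptions.ServiceUnavailable:
            time.sleep(2 ** attempt)  # wait and retry on back-pressure
    raise RuntimeError("Batch still failing after retries")

# Feed your data to import_batch() in chunks of BATCH_SIZE, and run N-1 such
# workers in parallel, where N is the number of CPU cores in the cluster.
```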
g
Hello! Thanks for all these details
s
@Gabriel Delattre Consider checking the limits; for instance, for Groups it is 250 and for Hits per Group it is 100. These are soft limits, though. Also, there are no aggregation operations like Sum, Average, etc. for Grouped Hits individually, which might require extra processing on the client side.
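To make those two limits concrete, here is a hedged sketch of a grouped, faceted search using the standard group_by / group_limit / facet_by search parameters. The collection name ("items") and field names ("name", "category", "product_id") are made up for illustration, and the client config mirrors the import sketch above.
```python
import typesense

# Placeholder client config (same shape as in the import sketch above)
client = typesense.Client({
    "nodes": [{"host": "xxx.a1.typesense.net", "port": "443", "protocol": "https"}],
    "api_key": "YOUR_SEARCH_API_KEY",
    "connection_timeout_seconds": 10,
})

# group_limit controls how many hits come back per group; the 100 hits-per-group
# and 250 groups figures mentioned above are the soft ceilings being discussed.
search_params = {
    "q": "*",
    "query_by": "name",
    "facet_by": "category",
    "group_by": "product_id",
    "group_limit": 3,
}
results = client.collections["items"].documents.search(search_params)

for group in results["grouped_hits"]:
    # group_key holds the value(s) of the group_by field(s) for this group
    print(group["group_key"], "->", len(group["hits"]), "hits returned")
```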
g
hmmm
I have a lot of groups
Hits per group?
Per instance or per collection?
s
Hits per Group, i.e. the “group_by” parameter.
g
Sorry, I don't get it.
Do you mean the returned results will be cut off?
s
Yes, they will be cut off
g
Why?
s
The founders can explain better.
It's a soft limit, though.
g
Hmmm, this worries me a bit. I need to facet and group by with multiple items that belong to a category.
For instance, I've got 100,000 items in this category; I will group_by id and then use the faceted results.
hmm
s
Yeah, we had a similar use case and are now stuck, not due to the limit, since it can be increased, but due to the overhead of doing a few things on the client side, like aggregating the results of a particular group, which we assumed would be trivial and part of the product, as it is in MongoDB.
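For what it's worth, that client-side aggregation is typically a short loop over the grouped_hits in the search response. A minimal sketch, assuming the results object from the grouped search above and a numeric "price" field on each document (both are assumptions for illustration):
```python
from statistics import mean

# Aggregate per group on the client, since Typesense returns grouped hits
# but no per-group Sum/Average. Note this only sees the hits actually
# returned (up to group_limit), so it's a partial aggregate; that is why
# the hits-per-group limit matters for this use case.
per_group_avg_price = {}
for group in results["grouped_hits"]:
    prices = [hit["document"]["price"] for hit in group["hits"]]
    # group_key is a list (one value per group_by field); use a tuple as the key
    per_group_avg_price[tuple(group["group_key"])] = mean(prices)

print(per_group_avg_price)
```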
g
That worries me even more 🙂
s
We asked for the feature too, but it doesn't seem to be on their priority list either!
k
> For instance, I've got 100,000 items in this category; I will group_by id and then use the faceted results
By id, I presume it's some type of product ID?
g
yes
k
Product ID tends to be high cardinality, right? Do you expect to have 100s of records in a group when grouped by this id?
g
yes
even more
k
You can then increase the soft limit. We parameterized it in the v29 RC build and will soon be adding it as a configuration option on the Typesense Cloud cluster configuration page as well.
We also heavily optimized group-by in the v29 RC build, so it's quite efficient now.
g
I’m running on your cloud :)
Could you tell me how to test?
f
You can update your cluster's Typesense version by going to the "Cluster Configuration" page and hitting the "Modify" button. There will be a select box for it. The latest one is v29.0.rc23.
k
We still need to support modifying this group limit on Typesense Cloud. We will be adding this support today.
g
Ok, great, will test.
Circling back on this project: we are trying to run the ingestion of our 17 million items, but one of our clusters is running out of space.
Should we increase the vCPUs temporarily to ingest and then scale down, since we have little traffic for now?
a
Hi Gabriel, You can temporarily scale up vCPUs to speed up ingestion, but that alone won’t solve the disk space issue you're hitting. Disk space in Typesense is tied to RAM — each cluster comes with disk equal to 5x the RAM. So if your current cluster doesn’t have enough disk to hold all 17 million items, you’ll need to scale up the RAM, which in turn increases the disk quota. After ingestion, if your traffic is low and you don’t need as much compute, you can scale the cluster back down.