# community-help
n
Hello! I'm using Typesense in an intensive write environment. I'm using client-side batching, and server-side batching at the default (40). I'm inserting/updating nearly 20 million documents continually within two hours, but I'm hitting a bottleneck. I'm using a 4 vCPU + 16 GB machine with v29. Are there any best practices or tips for these use cases?
a
@Nicolas, This is quite a substantial amount of data, and in this case the CPUs will play a much more significant role than the RAM. You'll want to check the import API and parallelize it accordingly. If you are using embeddings, enabling the GPU is highly recommended. Here are a few key points to keep in mind (see the sketch below):
- Use the bulk import API to efficiently load data into your newly created cluster. I recommend starting with a batch size of 1,000 documents per import API call, and setting the concurrency to N-1 parallel import API calls, where N is the number of CPU cores in your cluster.
- You may encounter 503 errors when importing into Typesense, as this is part of the built-in back-pressure mechanism. Be sure to handle these errors in your indexing pipeline, as described in the documentation: https://typesense.org/docs/guide/syncing-data-into-typesense.html#handling-http-503s
If you still face a significant bottleneck after making these adjustments, it may indicate that 4 vCPUs are insufficient for your use case.
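A minimal sketch of this import pattern in Python, assuming the typesense-python client; the host, API key, and collection name are placeholders:

import time
from concurrent.futures import ThreadPoolExecutor
import typesense
from typesense.exceptions import ServiceUnavailable

# Placeholder connection details -- substitute your own
client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': 'xyz',
    'connection_timeout_seconds': 300,  # bulk imports can run long
})

def import_batch(batch):
    # Retry on 503s raised by the built-in back-pressure mechanism
    for attempt in range(5):
        try:
            return client.collections['docs'].documents.import_(
                batch, {'action': 'upsert'})
        except ServiceUnavailable:
            time.sleep(2 ** attempt)  # exponential back-off
    raise RuntimeError('import kept returning 503')

docs = [...]  # your documents, streamed from your source
batches = [docs[i:i + 1000] for i in range(0, len(docs), 1000)]

# N-1 workers on an N-vCPU cluster, e.g. 3 on a 4 vCPU machine
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(import_batch, batches))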
n
Hi @Alan Martini, thanks for your help. I'm not using embeddings. I'm using bulk import. Do you recommend increasing this server-side batching from TS? The docs say "you rarely want to modify this parameter". Okay, so with 4 vCPUs, it would be 2 parallel imports for better performance? I never encountered 503 errors during these insertions. It's just taking too long to process them, and the requests time out.
@Alan Martini does the size of the collection influence indexing speed?
"Use the bulk import API to efficiently load data into your newly created cluster. I recommend starting with a batch size of 1,000 documents per import API call, and setting the concurrency to N-1 parallel import API calls, where N is the number of CPU cores in your cluster."
About this: I'm doing client-side batching at 5k. So I'm batching 5k and inserting with a batch_size of 1k on Typesense. Does that make sense? Or would it be better if both were 1k?
a
I'm using bulk import. Do you recommend increasing this server-side batching from TS? The docs say "you rarely want to modify this parameter".
IIRC they are the same thing: it's the batch size the client sends over to Typesense per import call. On the second point, most users won't have such an intensive write pattern, especially 10M/hour, so there's usually no need to tweak it.
Okay, so with 4 vCPUs, it would be 2 parallel imports for better performance?
Your cluster has 4 vCPUs? Then it's 3.
So I'm batching 5k and inserting with a batch_size of 1k on Typesense.
Does that make sense? Or would it be better if both were 1k?
1k is a good starting point, but you'll really want to measure what works best in your scenario. Try 1k, then 2k. If that yields good results, try increasing further; if not, try decreasing to 500, and so on.
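A quick way to compare batch sizes empirically; a rough sketch that reuses the client from the earlier snippet (the sizes and 'docs' collection name are illustrative):

import time

# Time one import call per candidate batch size and compare ms/doc
for size in (500, 1000, 2000, 5000):
    batch = docs[:size]
    start = time.perf_counter()
    client.collections['docs'].documents.import_(batch, {'action': 'upsert'})
    elapsed = time.perf_counter() - start
    print(f'{size} docs in {elapsed:.2f}s -> {1000 * elapsed / size:.2f} ms/doc')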
@Alan Martini does the size of the collection influence indexing speed?
Absolutely. The size of each document as well. The more documents and the bigger they are (and the more complex, with objects and so on), the longer it takes. Some datasets can take as long as 10 hours to index.
n
@Alan Martini great points, thank you very much. I saw that the batch_size parameter on bulk import is how TS queues the insertions. I'm wondering if the client-side batch should be the same size as the server-side batch?
For example: I'm inserting 100k documents in batches of 10k, so 10 batches of 10k documents. I call the bulk insert API with 1 batch of 10k documents, but my batch_size on TS is 1k. Is that okay? Or are there any problems/concerns with using different sizes at these two points, @Alan Martini?
a
I'm not sure, I'll have to check. Hey @Fanis Tharropoulos, do you have this info off the top of your head?
✅ 1
n
thanks anyway, @Alan Martini!
@Fanis Tharropoulos hi! Any thoughts?
@Alan Martini regarding my machine's performance: I'm wondering if using ARM would be better, or is there no performance difference?
a
@Nicolas which batch_size are you referring to? The parameter for the server? That one only takes effect when the server initializes or restarts. Otherwise, the batch size you set on your client side is the amount that's sent to and processed by Typesense.
@Kishore Nallan could you help on the ARM question?
✅ 1
k
ARM vs AMD/Intel is difficult to say: it depends a lot on the workload. But we do also use ARM on Typesense Cloud, so it's well supported.
✅ 1
n
Hi there @Alan Martini. A batch of 100k records, divided into 100 batches of 1,000 records, takes more or less 10 minutes to be inserted, with each batch of 1,000 records taking 3~4 seconds. I'm currently using the same 1k batch size on both the client and server side.
Is there any way to optimize it?
k
Can you share your schema? I can review it to see what could be causing bottlenecks.
n
yeah, sure @Kishore Nallan.
[{'async_reference': True,
'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'reference': 'collection2.id',
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'int64'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'int64'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'float'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'float'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'float'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'int32'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'int32'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'int32'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'bool'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'bool'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'int32'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'bool'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': True,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'int64'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'float'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': False,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'string'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': True,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'int64'},
{'facet': False,
'index': True,
'infix': False,
'locale': '',
'name': 'key',
'optional': False,
'sort': True,
'stem': False,
'stem_dictionary': '',
'store': True,
'type': 'float'}],
I'm also inserting this in 3 parallel insertions. The collection has nearly 20 million documents.
k
{'facet': False,
    'index': True,
    'infix': False,
    'locale': '',
    'name': 'key',
    'optional': False,
    'sort': False,
    'stem': False,
    'stem_dictionary': '',
    'store': True,
    'type': 'string'},
Enabling sorting on the string type can be a very CPU-intensive operation.
Scratch that.
I misread, the string fields don't have sort. But are you using a reference?
Async references can also introduce some delay in indexing because we have to wait for fields to line up.
n
Hm, that's an interesting observation. I can see a lot of sort: True, but today I'm not using sorting at all. So turning these off can improve insertion performance, right?
k
Yes, definitely
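For illustration, a sketch of how one of the int64 fields above would look with sorting disabled; note that in Typesense, changing sort on an existing field generally means dropping and re-adding the field via a schema update, or recreating the collection:

{'facet': False,
 'index': True,
 'infix': False,
 'locale': '',
 'name': 'key',
 'optional': False,
 'sort': False,  # was True; disable on fields you never sort by
 'stem': False,
 'stem_dictionary': '',
 'store': True,
 'type': 'int64'}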
n
Yeah, I have 1 field that uses a reference, using async_reference. And I really thought that this async reference would have the opposite effect on performance
like, it would defer the "sync" until the referenced document arrives
k
Are you planning to join this collection with another?
n
not now. All the things I need are "already joined"
between these two collections
k
Then you can remove the reference and async reference as well.
n
but this is the only one I need to join
this one with async
(maybe not async, but definitely the reference)
k
Try first without async_reference to see if that helps also.
✅ 1
n
thank you very much, @Kishore Nallan!
@Kishore Nallan about the batch_size on the import call, are there any considerations about what I'm doing?
k
We don't rely too much on the batch_size parameter in import calls anymore. We try to batch intelligently internally.
✅ 1
n
@Kishore Nallan I applied your advice. • Trimmed my collection to contain only the needed fields. • Removed the sort = True flags. • Changed to Graviton + CPU-optimized.
Got good results. Thank you very much!
🙌 1
@Kishore Nallan. Hi! I am trying to improve these insertions even further. 😆 I am inserting 250k documents in 7 parallel insertions on an 8 vCPU machine, and each is taking ~13 minutes, so it's roughly 780s to insert 250k. That works out to about 3.1ms per document. Is that okay? What else can I do?
k
You can try increasing some of these default rocksdb parameters also: https://typesense.org/docs/29.0/api/server-configuration.html#on-disk-db-fine-tuning
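As an illustrative sketch only: these RocksDB write-buffer flags are my reading of that docs page, so verify the exact names and defaults there for your version before using them; the values below are arbitrary:

typesense-server --data-dir=/var/lib/typesense --api-key=xyz \
  --db-write-buffer-size=134217728 \
  --db-max-write-buffer-number=4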
n
Thanks, @Kishore Nallan, I'll take a look! Is there anything else I can do?
k
I think that's all I can think of.
🙌 1
n
Thank you very much for your support!
šŸ‘ 1