How can I see the speed of indexing in Typesense? #community-help

# community-help

Thomas

02/23/2022, 4:11 PM

How can I see the speed of indexing?

Harrison Burt

02/23/2022, 4:12 PM

Normally, once the request has completed, the indexing is complete.
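Since the import request only returns once indexing has finished, one way to see indexing speed is to time the call from the client. This sketch assumes a hypothetical `send_import` callable that wraps whatever client or HTTP call you actually use. The measured time includes request overhead and any network latency, not just indexing.

```python
import time
from typing import Callable, Sequence


def timed_import(send_import: Callable[[Sequence[dict]], object], docs: Sequence[dict]):
    """Time one import call; `send_import` is a hypothetical stand-in for your client call."""
    start = time.perf_counter()
    result = send_import(docs)  # the import request returns after indexing is done
    elapsed = time.perf_counter() - start
    docs_per_sec = len(docs) / elapsed if elapsed > 0 else float("inf")
    return result, elapsed, docs_per_sec


if __name__ == "__main__":
    fake_docs = [{"id": str(i), "name": f"product {i}"} for i in range(3000)]
    # Stand-in for a real import: just sleeps briefly to simulate server work.
    _, secs, rate = timed_import(lambda d: time.sleep(0.05), fake_docs)
    print(f"{len(fake_docs)} docs in {secs:.3f}s ({rate:.0f} docs/s)")
```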

➕ 1

Thomas

02/23/2022, 4:13 PM

Ok, so indexing 3000 product documents (official sample) takes 1800ms on 3 cores?

Harrison Burt

02/23/2022, 4:14 PM

Are you inserting them one at a time or via line-separated JSON?

Thomas

02/23/2022, 4:14 PM

JSONL
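JSONL (JSON Lines) is simply one JSON object per line. A minimal sketch of producing it from a list of product dicts (the `to_jsonl` name and the sample products are hypothetical):

```python
import json
from typing import Iterable


def to_jsonl(docs: Iterable[dict]) -> str:
    """Serialize documents as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(doc, ensure_ascii=False) for doc in docs)


if __name__ == "__main__":
    products = [
        {"id": "1", "name": "Desk lamp", "price": 19.99},
        {"id": "2", "name": "Office chair", "price": 149.0},
    ]
    print(to_jsonl(products))
```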

Thomas

02/23/2022, 4:14 PM

2.6MB

Thomas

02/23/2022, 4:14 PM

from 1.8 sec to 2.5 sec.

Thomas

02/23/2022, 4:14 PM

Batch size is 100.

Jason Bosco

02/23/2022, 4:15 PM

1.8s sounds about right... But most of that is request-processing overhead. As another data point, I've indexed 2.2M docs in 3.6 minutes on a 4vCPU server.
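Turning the two data points from this thread into throughput makes the overhead point concrete. The machines differ (3 cores vs. a 4vCPU server), so treat this only as a rough comparison:

```python
def docs_per_second(num_docs: int, seconds: float) -> float:
    """Average indexing throughput for one measured run."""
    return num_docs / seconds


# Figures quoted in this thread (different machines, so only a rough comparison).
small_run = docs_per_second(3_000, 1.8)           # 3K sample docs, 3 cores
large_run = docs_per_second(2_200_000, 3.6 * 60)  # 2.2M docs in 3.6 minutes, 4vCPU

if __name__ == "__main__":
    print(f"3K docs:   {small_run:,.0f} docs/s")
    print(f"2.2M docs: {large_run:,.0f} docs/s")
```

The large import is several times faster per document, which is what you would expect if a fixed per-request cost dominates small imports.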

Thomas

02/23/2022, 4:15 PM

With what settings?

Thomas

02/23/2022, 4:15 PM

It's taking longer and longer to index too, which is weird

Jason Bosco

02/23/2022, 4:15 PM

I sent the entire 2.2M docs as JSONL in a single import API call, with the default (server-side) batch size
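A minimal sketch of one such import call using only the Python standard library. It assumes a server at `http://localhost:8108`, a collection named `products` and a placeholder API key. `action` and `batch_size` are query parameters of the `/collections/{collection}/documents/import` endpoint; leaving `batch_size` out uses the server-side default. The request is only built here; you would send it with `urllib.request.urlopen(req)`. The response has one JSON result per input line, which the hypothetical `count_failures` helper checks.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, Optional


def build_import_request(
    host: str,
    api_key: str,
    collection: str,
    docs: Iterable[dict],
    action: str = "create",
    batch_size: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) one POST to the import endpoint with a JSONL body."""
    params = {"action": action}
    if batch_size is not None:
        # Server-side batch size; omit it to use the server's default.
        params["batch_size"] = str(batch_size)
    url = (
        f"{host}/collections/{urllib.parse.quote(collection)}/documents/import?"
        + urllib.parse.urlencode(params)
    )
    body = "\n".join(json.dumps(d) for d in docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
    )


def count_failures(response_text: str) -> int:
    """Count result lines in the import response that do not report success."""
    return sum(
        1
        for line in response_text.splitlines()
        if line.strip() and not json.loads(line).get("success", False)
    )
```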

Thomas

02/23/2022, 4:16 PM

You can use batches that large?

✅ 1

Kishore Nallan

02/23/2022, 4:16 PM

Is this on your local machine or uploading to a remote server? Network latency also comes into the picture.

Harrison Burt

02/23/2022, 4:16 PM

Generally, you should make the batches as big as possible.

➕ 1

Thomas

02/23/2022, 4:17 PM

local

Harrison Burt

02/23/2022, 4:17 PM

I'd probably argue that anything under 100k docs should be done in one go

➕ 1

Thomas

02/23/2022, 4:17 PM

We get the same time no matter which batch size we use.

Jason Bosco

02/23/2022, 4:17 PM

With 3K docs you mean?

Thomas

02/23/2022, 4:17 PM

Yes

Jason Bosco

02/23/2022, 4:18 PM

Yeah that's the fixed overhead in request processing

Harrison Burt

02/23/2022, 4:18 PM

you'll only run into the network latency when doing lots of small round trips on a big index

Jason Bosco

02/23/2022, 4:18 PM

That overhead doesn't scale linearly, if you're extrapolating.

Thomas

02/23/2022, 4:18 PM

Ok, so larger batch sizes are always better

Jason Bosco

02/23/2022, 4:18 PM

For sure

Harrison Burt

02/23/2022, 4:18 PM

i.e. 100 docs at a time on a 1-million-doc index.
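If the documents come from a source you have to page through anyway, a hypothetical `chunked` helper like this lets you send large batches (for example 100K documents per import call, in line with the advice above) instead of 100 at a time:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents from any iterable."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Each batch would then go out as one JSONL import request, so the fixed per-request overhead is paid once per 100K documents rather than once per 100.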

Thomas

02/23/2022, 4:19 PM

Ok, noted

Thomas

02/23/2022, 4:19 PM

are there plans to support NVMe as storage?

Jason Bosco

02/23/2022, 4:19 PM

You can already use NVMe disks

Harrison Burt

02/23/2022, 4:19 PM

🤔 The type of storage shouldn't affect Typesense

Harrison Burt

02/23/2022, 4:19 PM

or any program for the most part

Jason Bosco

02/23/2022, 4:20 PM

From Typesense's perspective, it's just a file system.

Thomas

02/23/2022, 4:20 PM

so if it doesn't fit in RAM, it fetches from disk?

Harrison Burt

02/23/2022, 4:20 PM

the index is always stored in RAM iirc

Thomas

02/23/2022, 4:20 PM

so index size isn't limited by RAM then?

Jason Bosco

02/23/2022, 4:20 PM

Unless you mean storing indices on disk instead of RAM; we have no plans for that.

Thomas

02/23/2022, 4:21 PM

So index is limited by RAM size?

Jason Bosco

02/23/2022, 4:21 PM

Index size is indeed constrained by RAM. So you need to have sufficient RAM to hold the entire index in memory.

Thomas

02/23/2022, 4:21 PM

Aha, understood

Thomas

02/23/2022, 4:21 PM

any plans to support nested fields?

Harrison Burt

02/23/2022, 4:22 PM

Just as an FYI you may find https://cloud.typesense.org/pricing/calculator useful for working out roughly how much memory you want
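For a back-of-the-envelope figure before opening the calculator, you can measure how much data sits in the fields you plan to index and scale it up. Both helpers are hypothetical, and the default `multiplier` of 3.0 is an assumption, not an official Typesense figure; use the calculator or Typesense's own sizing guidance for real numbers.

```python
import json
from typing import Iterable, Sequence


def indexed_bytes(docs: Iterable[dict], indexed_fields: Sequence[str]) -> int:
    """Approximate raw size of the indexed fields only (UTF-8 JSON of each value)."""
    total = 0
    for doc in docs:
        for field in indexed_fields:
            if field in doc:
                total += len(json.dumps(doc[field], ensure_ascii=False).encode("utf-8"))
    return total


def estimate_ram_bytes(raw_indexed_bytes: int, multiplier: float = 3.0) -> int:
    """Rough RAM estimate: an assumed multiple of the raw indexed data size."""
    return int(raw_indexed_bytes * multiplier)
```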

Thomas

02/23/2022, 4:23 PM

I think I've seen it before

Jason Bosco

02/23/2022, 4:23 PM

The largest commercially available RAM today is 24TB, and RAM has only been getting cheaper. So we're hoping that RAM-based search, which lets you build search-as-you-type instant-search experiences, will work out well for most site/app search use cases.

Jason Bosco

02/23/2022, 4:24 PM

> any plans to support nested fields?

Yes for sure, probably in the next few releases. Until then, here's a workaround: https://typesense.org/docs/0.22.2/api/collections.html#indexing-nested-fields
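Until nested fields are supported, the idea behind the linked workaround is to flatten nested objects into top-level fields before indexing. A minimal sketch, assuming a `.` separator for the flattened keys (a hypothetical naming choice; check the linked docs for the convention they describe) and leaving lists untouched:

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into top-level keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}.

    Lists are kept as-is; the separator is a hypothetical choice.
    """
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path built so far as the new prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat
```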

Thomas

02/23/2022, 4:27 PM

any difference in using?

Thomas

02/23/2022, 4:27 PM

is it expected within the next two months?

Jason Bosco

02/23/2022, 4:28 PM

I'd say probably 3-4 months time frame

Thomas

02/23/2022, 4:31 PM

We have catalogs with categories that contain products, and we want them to be filterable. How do you suggest doing this with the current version?

Jason Bosco

02/23/2022, 4:31 PM

Have a look at the sample ecommerce dataset here: https://github.com/typesense/showcase-ecommerce-store/tree/master/scripts/data

Jason Bosco

02/23/2022, 4:32 PM

That's what powers this demo: https://ecommerce-store.typesense.org/

Thomas

02/23/2022, 4:32 PM

We did, but that's only categories, not multiple catalogs

Jason Bosco

02/23/2022, 4:32 PM

Didn't get you... Could you expand on what you mean by multiple catalogs with an example?

Thomas

02/23/2022, 4:33 PM

1 product can be in multiple catalogs and categories

Jason Bosco

02/23/2022, 4:34 PM

You could have a `catalog_ids: [1,4,6]` field in each product

Jason Bosco

02/23/2022, 4:34 PM

and then depending on which catalog you're rendering, filter by the catalog id?
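Jason's suggestion could look roughly like this. The schema, the `products` collection name and the extra `categories` field are hypothetical; the `filter_by` value `catalog_ids:=4` keeps only products whose `catalog_ids` array contains 4.

```python
import json
from typing import Dict

# Hypothetical products schema: `catalog_ids` is an int32 array,
# so one product can belong to several catalogs.
PRODUCTS_SCHEMA = {
    "name": "products",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "categories", "type": "string[]", "facet": True},
        {"name": "catalog_ids", "type": "int32[]"},
    ],
}


def catalog_search_params(query: str, catalog_id: int) -> Dict[str, str]:
    """Search parameters restricting results to products in one catalog."""
    return {
        "q": query,
        "query_by": "name",
        "filter_by": f"catalog_ids:={catalog_id}",
    }


if __name__ == "__main__":
    print(json.dumps(PRODUCTS_SCHEMA, indent=2))
    print(catalog_search_params("lamp", 4))
```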

Thomas

02/23/2022, 4:38 PM

Catalogs are multilingual and need labels.

Thomas

02/23/2022, 4:38 PM

our idea was two collections, one for catalogs and categories and one for products

Jason Bosco

02/23/2022, 4:40 PM

I'd need a little more context on how you plan to query the dataset and the search UI (if you have mockups), since that will dictate how you structure the collections

Thomas

02/23/2022, 4:42 PM

I can make some sample json for you tomorrow, clocking off for today

Thomas

02/23/2022, 4:43 PM

Thanks for the replies so far

Jason Bosco

02/23/2022, 4:43 PM

Sounds good

👍 1

Thomas

02/23/2022, 5:11 PM

is the full document stored in index/ram or can the document be fetched from disk?

Jason Bosco

02/23/2022, 5:24 PM

Only indexed fields are stored in RAM; the full doc is stored on disk, both as a backup and for unindexed fields not mentioned in the schema.
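To see which parts of a document would be indexed in RAM under that rule, compare the document's keys with the fields declared in the schema. A hypothetical helper, assuming a schema dict shaped like the collection-creation payload:

```python
from typing import Dict, List, Set, Tuple


def split_fields(schema: Dict, doc: Dict) -> Tuple[List[str], List[str]]:
    """Return (fields declared in the schema, fields only kept on disk) for one document."""
    declared: Set[str] = {f["name"] for f in schema["fields"]}
    indexed = [k for k in doc if k in declared]
    # `id` is handled specially by Typesense, so it is left out of both lists here.
    disk_only = [k for k in doc if k not in declared and k != "id"]
    return indexed, disk_only
```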