How can I see the speed of indexing?
# community-help
t
How can I see the speed of indexing?
h
Normally, once the request has completed, the indexing is complete.
t
Ok, so indexing 3000 product documents (official sample) takes 1800ms on 3 cores?
h
Are you inserting them one at a time or via line-separated JSON (JSONL)?
t
JSONL
2.6MB
from 1.8 sec to 2.5 sec.
batch is 100
j
1.8s, sounds about right... But most of it is overhead with the request processing. As another data point, I've indexed 2.2M docs in 3.6 minutes on a 4vCPU server
t
With what settings?
It's taking longer and longer to index too, which is weird
j
I sent the entire 2.2M docs as JSONL in a single import API call, with the default (server-side) batch size
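A minimal sketch of timing a single bulk import, for anyone who wants to measure this themselves. It assumes the documented import endpoint (`POST /collections/<name>/documents/import`) and the `X-TYPESENSE-API-KEY` header. The host, collection name, API key, and helper names are hypothetical placeholders.

```python
import json
import time
import urllib.request


def to_jsonl(docs):
    """Serialize documents as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(d) for d in docs)


def build_import_request(base_url, collection, api_key, docs):
    """Build (but do not send) one bulk-import request containing all docs."""
    url = f"{base_url}/collections/{collection}/documents/import?action=create"
    return urllib.request.Request(
        url,
        data=to_jsonl(docs).encode("utf-8"),
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
        method="POST",
    )


def timed_import(request):
    """Send the request; return (elapsed_seconds, per-document result dicts)."""
    start = time.perf_counter()
    with urllib.request.urlopen(request) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    elapsed = time.perf_counter() - start
    return elapsed, [json.loads(line) for line in lines if line]


# Hypothetical values: adjust the host, collection, and key to your setup.
sample_docs = [{"id": "1", "name": "A"}, {"id": "2", "name": "B"}]
request = build_import_request("http://localhost:8108", "products", "xyz", sample_docs)
```

Calling `timed_import(request)` against a running server returns the wall-clock time for the whole import, which includes the fixed request overhead discussed in this thread.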
t
You can use batches that large?
k
Is this on your local machine or uploading to a remote server? Network latency also comes into the picture.
h
Generally you should try and do them as big as possible
t
local
h
I'd probably argue that anything under 100k docs should be done in one go
t
we get the same time no matter which batch size
j
With 3K docs you mean?
t
Yes
j
Yeah that's the fixed overhead in request processing
h
you'll only run into the network latency when doing lots of small round trips on a big index
j
That overhead doesn't scale linearly, if you're extrapolating
t
Ok, so larger batch sizes are always better
j
For sure
h
i.e. 100 docs at a time on a 1 million doc index
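To put rough numbers on that, here is a small sketch that counts round trips for a given batch size. The 5 ms per-request overhead is a made-up figure for illustration only, not a measurement from this thread.

```python
import math


def round_trips(total_docs, batch_size):
    """Number of import requests needed to send all documents."""
    return math.ceil(total_docs / batch_size)


def total_overhead_ms(total_docs, batch_size, per_request_overhead_ms):
    """Fixed per-request overhead summed over every round trip."""
    return round_trips(total_docs, batch_size) * per_request_overhead_ms


# Assumed overhead of 5 ms per request (hypothetical).
small_batches = total_overhead_ms(1_000_000, 100, 5)
one_big_call = total_overhead_ms(1_000_000, 1_000_000, 5)
```

With those assumed numbers, 100-doc batches on a 1 million doc index mean 10,000 requests, so the fixed overhead is paid 10,000 times instead of once.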
t
Ok, noted
are there plans to support NVMe as storage?
j
You can already use NVMe disks
h
🤔 The type of storage shouldn't affect typesense
or any program for the most part
j
From Typesense's perspective it's just a file system
t
so if it doesn't fit in RAM, it fetches from disk?
h
the index is always stored in RAM iirc
t
so index size isn't limited by RAM then?
j
Unless you mean storing indices on disk instead of RAM? We have no plans for that.
t
So index is limited by RAM size?
j
Index size is indeed constrained by RAM. So you need to have sufficient RAM to hold the entire index in memory.
t
Aha, understood
any plans to support nested fields?
h
Just as an FYI you may find https://cloud.typesense.org/pricing/calculator useful for working out roughly how much memory you want
t
I think I've seen it before
j
Largest commercially available RAM today is 24TB, and RAM cost has only been getting cheaper. So we're hoping that for most site/app search use-cases RAM-based search that lets you build search-as-you-type instant-search experiences would work out well.
Re: "any plans to support nested fields?"
Yes, for sure, probably in the next few releases. Until then, here's a workaround: https://typesense.org/docs/0.22.2/api/collections.html#indexing-nested-fields
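A rough sketch of what such a workaround can look like, assuming it amounts to flattening nested objects into top-level fields before indexing. The `flatten` helper and the `_` separator are illustrative choices, not something the linked docs prescribe.

```python
def flatten(doc, sep="_", prefix=""):
    """Flatten nested dicts into top-level keys joined by `sep`."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


# Hypothetical product document with one nested object.
product = {"id": "1", "brand": {"name": "Acme", "country": "NO"}}
flat_product = flatten(product)
```

The flattened keys (`brand_name`, `brand_country`) would then be declared as ordinary fields in the collection schema.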
t
any difference in using?
is it expected within the next two months?
j
I'd say probably 3-4 months time frame
t
We have catalogs with categories that have products, and we want them filterable. How do you suggest doing this with the current one?
j
That's what powers this demo: https://ecommerce-store.typesense.org/
t
We did, but that's only categories, not multiple catalogs
j
Didn't get you... Could you expand on what you mean by multiple catalogs with an example?
t
1 product can be in multiple catalogs and categories
j
You could have a catalog_ids: [1,4,6] field in each product, and then, depending on which catalog you're rendering, filter by the catalog id?
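A sketch of that idea, assuming an `int32[]` field named `catalog_ids` and a `filter_by` clause in the `field:[values]` form. The schema, the field names other than `catalog_ids`, and the helper functions are hypothetical, and the exact filter syntax should be checked against the docs for your version. The local `in_catalog` function only mimics the intended semantics so the example runs without a server.

```python
# Hypothetical collection schema with a multi-valued integer field.
schema = {
    "name": "products",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "catalog_ids", "type": "int32[]", "facet": True},
    ],
}


def catalog_search_params(query, catalog_id):
    """Search parameters restricting results to one catalog (assumed syntax)."""
    return {
        "q": query,
        "query_by": "name",
        "filter_by": f"catalog_ids:[{catalog_id}]",
    }


def in_catalog(docs, catalog_id):
    """Local stand-in for the filter: keep docs whose catalog_ids contain the id."""
    return [d for d in docs if catalog_id in d["catalog_ids"]]


products = [
    {"id": "1", "name": "Desk", "catalog_ids": [1, 4, 6]},
    {"id": "2", "name": "Chair", "catalog_ids": [2]},
]
params = catalog_search_params("*", 4)
matching = in_catalog(products, 4)
```

A product listed in several catalogs carries all of their ids, so one document serves every catalog it belongs to.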
t
catalogs are multi language and need labels
our idea was two collections, one for catalogs and categories and one for products
j
I'd need a little more context on how you plan to query the dataset and the search UI (if you have mockups), since that will dictate how you structure the collections
t
I can make some sample json for you tomorrow, clocking off for today
Thanks for the replies so far
j
Sounds good
t
Is the full document stored in the index/RAM, or can the document be fetched from disk?
j
Only indexed fields are stored in RAM. The full doc is stored on disk, both as a backup and to hold unindexed fields that aren't mentioned in the schema.
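Since only indexed fields count toward RAM, a quick way to get a feel for sizing is to measure just those fields in your dataset. This is a rough sketch: the 2x to 3x multiplier is an assumed rule of thumb, not a figure from this thread, and the pricing calculator linked earlier is the better reference. The helper names are hypothetical.

```python
import json


def indexed_bytes(docs, indexed_fields):
    """Total JSON-encoded size of only the fields that would be indexed."""
    total = 0
    for doc in docs:
        subset = {k: v for k, v in doc.items() if k in indexed_fields}
        total += len(json.dumps(subset).encode("utf-8"))
    return total


def ram_estimate_range(indexed_size_bytes, low=2, high=3):
    """Assumed rule of thumb: RAM needed is roughly 2x to 3x the indexed data."""
    return indexed_size_bytes * low, indexed_size_bytes * high


# The large unindexed "blob" field lives on disk and is ignored here.
docs = [{"name": "A", "blob": "x" * 100}]
size = indexed_bytes(docs, {"name"})
low_estimate, high_estimate = ram_estimate_range(size)
```

Leaving large fields out of the schema keeps them on disk only, which is the main lever for shrinking the in-memory index.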