# community-help
t
👋 Hi everyone!
j
Hi Tugay! Welcome!
t
Hi @Jason Bosco, great to see that you're online! I have a couple of questions if you have time 😄
j
I'll be around for about 15-20 minutes! Happy to answer questions
t
We are planning to use Typesense for our multi-tenant SaaS platform, which is an e-commerce platform like Shopify. We designed our system with one collection per customer, but is there any limit on the number of collections? Every collection will have 5k documents on average, and in the long run there may be 10k-20k collections within a cluster.
And another question: are you planning to add a sharding mechanism to Typesense?
j
There are no technical limits in Typesense on the number of collections. That said, each collection spins up 4 threads to parallelize searches, so the upper limit really depends on how many CPU cores your cluster has
Any reason you want to store each customer's data in a separate index, vs using a scoped API key and storing everything in one index btw?
And another question: are you planning to add a sharding mechanism to Typesense?
We do replicate the data across multiple nodes for high availability. However, if you're talking about partitioning the data and storing a subset on different nodes, we don't have plans for that at the moment. But you can always do application-side sharding, by spinning up multiple clusters and then mapping certain user-id ranges to a particular cluster, for example. You can scale vertically up to 3TB of RAM (AWS offers this, for example), and we haven't had asks to scale beyond this dataset size yet, so we haven't prioritized horizontal scaling.
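A rough sketch of that application-side sharding idea, assuming the typesense-js client; the cluster hostnames, keys, and routing rule below are placeholders, not a prescribed setup:

```js
const Typesense = require('typesense');

// One client per cluster; hostnames and keys are placeholders.
const clusters = [
  new Typesense.Client({
    nodes: [{ host: 'cluster-a.example.com', port: 443, protocol: 'https' }],
    apiKey: 'CLUSTER_A_SEARCH_KEY',
  }),
  new Typesense.Client({
    nodes: [{ host: 'cluster-b.example.com', port: 443, protocol: 'https' }],
    apiKey: 'CLUSTER_B_SEARCH_KEY',
  }),
];

// Stable application-side routing: map a customer id to one cluster.
// Id ranges, hashing, or a lookup table all work; modulo keeps it short here.
function clientForCustomer(customerId) {
  return clusters[customerId % clusters.length];
}
```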
t
Because in our app users can add dynamic props to a product, so the collection must be dynamic for each customer, and the dynamic props will be used for filtering and faceting
👍 1
We have to use an alias to update an existing collection, right?
j
Alias is like a symlink, you could either use that or use the collection name directly to perform operations on the collection
t
You can scale vertically up to 3TB of RAM (AWS offers this, for example), and we haven't had asks to scale beyond this dataset size yet, so we haven't prioritized horizontal scaling.
👍
Can we update a collection?
j
Do you mean the schema?
t
Yes
j
Not at the moment unfortunately. Ah I now see what you meant earlier. You'd have to create a new collection, and then if you use an alias, update the alias to point to the new collection
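A minimal sketch of that create-then-repoint flow with the typesense-js client; the collection name `products_v2`, the alias name, and the fields are placeholders:

```js
const Typesense = require('typesense');

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'ADMIN_API_KEY', // placeholder
});

async function swapToNewSchema() {
  // Create a new collection with the updated schema.
  await client.collections().create({
    name: 'products_v2',
    fields: [
      { name: 'name', type: 'string' },
      { name: 'price', type: 'float' },
    ],
  });

  // ... reindex documents into products_v2 here ...

  // Repoint the alias atomically; anything searching the "products"
  // alias switches to the new collection with no code changes.
  await client.aliases().upsert('products', { collection_name: 'products_v2' });
}
```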
t
Yeah, we are planning to do it that way 👍
j
That said, v0.20 has a new auto-schema detection feature, where the first time a field is encountered in a record, it will automatically be indexed if you turn this mode on
So if you don't need to change the datatype of a field and only need to add new fields, then the auto-schema detection feature will be useful for you
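For context, a sketch of how auto-schema detection looks in the Typesense versions that were eventually released (at the time of this chat it was still a nightly build, so details below may differ slightly); the collection name and document are made up:

```js
const Typesense = require('typesense');

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'ADMIN_API_KEY', // placeholder
});

async function createAutoCollection() {
  // A wildcard ".*" field of type "auto" turns on auto-schema detection:
  // any field seen for the first time in a document gets indexed.
  await client.collections().create({
    name: 'products_customer_42', // placeholder per-customer collection
    fields: [{ name: '.*', type: 'auto' }],
  });

  // This document's custom prop is picked up without a schema change.
  await client.collections('products_customer_42').documents().create({
    name: 'Mechanical keyboard',
    price: 79.0,
    switch_type: 'brown',
  });
}
```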
t
Wow that would be great for us 🎉
j
I actually have a nightly build with the feature! Would you be interested in beta testing it if I give you a docker build?
t
There are no technical limits in Typesense on the number of collections. That said, each collection spins up 4 threads to parallelize searches, so the upper limit really depends on how many CPU cores your cluster has
Unfortunately this will be a big bottleneck for us 😞 We need to redesign our system for that
I actually have a nightly build with the feature! Would you be interested in beta testing it if I give you a docker build?
I would love to 🙂
j
Awesome! Let me put some instructions together for you; the docs are not yet written for it
t
One final question, sorry for taking up so much of your time 🙂 Is there any limit on the number of fields?
j
Unfortunately this will be a big bottleneck for us 😞 We need to redesign our system for that
If you allow your users to define custom fields on the product, then going down the path of one collection per user makes total sense, because the schema is different for each user. v0.20 also has some threading improvements where we'll be able to use a shared thread pool to process requests across multiple collections. So this should allow you to scale to an even higher number of collections
One final question, sorry for taking up so much of your time 🙂 Is there any limit on the number of fields?
Happy to answer! No, there are no limits on the number of fields. As long as you have sufficient RAM to hold the data, Typesense will happily chug along
Alright! Here are instructions to use the new auto-schema detection feature: https://gist.github.com/jasonbosco/c712b52a4b29e84ebce82c9a5ec82ffc I'd love to get your feedback on how it works out for your use-case.
t
Thank you so much for your help! We will try it as soon as possible 👍
👍 1
a
"Any reason you want to store each customer's data in a separate index, vs using a scoped API key and storing everything in one index btw?" Hi Jason. I had thought scoped API keys were always scoped to a whole collection. Just reread the documentation. This feature is WAY cooler!
j
Haha! Scoped API keys are a powerful feature! The "scoped" part means scoped to particular records, not just the collection. But I can see how it can be easily misunderstood as scoped to a collection
You can actually embed any of the search parameters inside a scoped API key, so it's not just for filters. If you need a particular search parameter to not be changeable by users, you can embed it in a scoped API key and do searches with that
Here's another interesting use case that came up recently: https://github.com/typesense/typesense/issues/193#issuecomment-765878863
🙌 1
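A small sketch of the scoped-key ideas above, assuming the typesense-js client; the key value and field names are made up. Any parameter embedded in the key is enforced server-side and cannot be overridden by the caller:

```js
const Typesense = require('typesense');

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'SEARCH_ONLY_PARENT_KEY', // placeholder search-only key
});

// Scoped key generation happens client-side (an HMAC over the embedded
// params), so no API call is needed to mint one key per customer.
const scopedKey = client.keys().generateScopedSearchKey(
  'SEARCH_ONLY_PARENT_KEY',
  {
    filter_by: 'customer_id:=42',                // pin searches to one tenant
    exclude_fields: 'cost_price,supplier_email', // hypothetical sensitive fields
  }
);

// Hand scopedKey to that customer's frontend; the embedded filter_by and
// exclude_fields always apply to searches made with it.
```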
t
Hi @Andrew Sittermann because every collection will have dynamic fields that are unique per customer, we could add all fields to a single collection and filter responses using `include_fields` and `facet_by`, but there may be 10k fields within a collection and I am not sure about the efficiency of that solution 😄
k
@Tugay Karaçay regarding the per-collection threading resources that Jason mentioned earlier, this is also addressed in the 0.20 RC build. A common shared thread pool is used, so it's no longer a constraining factor for having a large number of collections. In fact, I think having a per-customer collection is an easy way to scale, as it offers a lot of flexibility and is a logical way to shard your data for performance.
t
Hi @Kishore Nallan yes, the shared thread pool improvement would be perfect for our solution. We've just made a little POC with the RC build; adding new fields and filtering on them works well in our test cases, but we need to use `facet: true` on dynamic fields too, so it is not suitable for us yet. Also, are you considering adding `search: false` and `index: false` to the field definition? Since we enable auto-schema detection, we may want to prevent some fields from being indexed.
k
Implementing `facet: true` is easy; I held back from doing that only because facets can consume memory, and enabling it on every field (especially long text fields like description) would be a huge waste of resources. Thinking of how best to handle that. One way is to enable facets only on field names ending with a `_facet` suffix.
Also, are you considering adding `search: false` and `index: false` to the field definition?
Would you know upfront which fields will not need to be searched on?
t
Yes, for our e-commerce platform only the product name and some additional fields will be searchable; other fields will be used for filtering and facets.
Thinking of how best to handle that. One way is to enable facets only on field names ending with a `_facet` suffix.
This is a good solution but not a flexible one; using wildcards could be considered. For example, in the fields definition we could use the following syntax to dynamically match field definitions:
```js
[
  {
    name: 'created_at',
    type: 'int64'
  },
  {
    name: '*_auto',
    type: 'auto'
  },
  {
    name: '*_fct',
    type: 'auto',
    facet: true
  },
  // stringify rest
  {
    name: '*',
    type: 'stringify',
    facet: true
  }
]
```
k
Excellent. The `index: false` configuration can also be mentioned in the same way.
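Illustrative only: a schema in the spirit of the suffix convention being discussed here, assuming the typesense-js client; whether these exact wildcard rules shipped in this form is not confirmed in this thread:

```js
const Typesense = require('typesense');

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'ADMIN_API_KEY', // placeholder
});

async function createConventionBasedCollection() {
  await client.collections().create({
    name: 'products_customer_42',
    fields: [
      { name: 'created_at', type: 'int64' },
      { name: '.*_fct', type: 'auto', facet: true },      // facetable dynamic props
      { name: '.*_noindex', type: 'auto', index: false }, // stored but not indexed
      { name: '.*', type: 'auto' },                       // everything else
    ],
  });
}
```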
g
Now if you use `scopedApiKey` to do searches instead of the main search api key, the server will automatically enforce the embedded `exclude_fields` param and users can't override it.
I'm using exactly this, to protect sensitive data & prevent excess data from being transmitted over the wire.
j
@Gabe O'Leary That's great! Did you stumble on the GitHub issue first, or did you discover that you could do this yourself?
g
you told me 😁
j
Oh lol, I've got a bad memory!