# community-help
t
👋 Hi everyone!
j
Hi Tugay! Welcome!
t
Hi @Jason Bosco, great to see that you're online! I have a couple of questions if you have time 😄
j
I'll be around for about 15-20 minutes! Happy to answer questions
t
We are planning to use Typesense for our multi-tenant SaaS platform, which is an e-commerce platform like Shopify. We designed our system with one collection per customer, but is there any limit on the number of collections? Every collection will have 5k documents on average, and in the long run there may be 10k-20k collections within a cluster.
And another question: are you planning to add a sharding mechanism to Typesense?
j
There are no technical limits in Typesense on the number of collections. That said, each collection spins up 4 threads to parallelize searches, so the upper limit really depends on how many CPU cores your cluster has
Any reason you want to store each customer's data in a separate index, vs using a scoped API key and storing everything in one index btw?
And another question: are you planning to add a sharding mechanism to Typesense?
We do replicate the data across multiple nodes for high availability. However, if you're talking about partitioning the data and storing a subset on different nodes, we don't have plans for that at the moment. But you can always do application-side sharding, by spinning up multiple clusters and then mapping certain user-id ranges to a particular cluster, for example. You can scale vertically up to 3TB of RAM (AWS offers this, for example), and we haven't had asks to scale beyond this dataset size yet, so we haven't prioritized horizontal scaling.
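A rough sketch of that application-side sharding idea, assuming the typesense-js client; the cluster hostnames, keys, and routing rule below are placeholders, not a prescribed setup:

```js
const Typesense = require('typesense');

// One client per cluster; hostnames and keys are placeholders.
const clusters = [
  new Typesense.Client({
    nodes: [{ host: 'cluster-a.example.com', port: 443, protocol: 'https' }],
    apiKey: 'CLUSTER_A_SEARCH_KEY',
  }),
  new Typesense.Client({
    nodes: [{ host: 'cluster-b.example.com', port: 443, protocol: 'https' }],
    apiKey: 'CLUSTER_B_SEARCH_KEY',
  }),
];

// Stable application-side routing: map a customer id to one cluster.
// Id ranges, hashing, or a lookup table all work; modulo keeps it short here.
function clientForCustomer(customerId) {
  return clusters[customerId % clusters.length];
}
```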
t
Because in our app users can add dynamic props to a product, so the collection must be dynamic for each customer, and the dynamic props will be used for filtering and faceting
👍 1
We have to use an alias to update an existing collection, right?
j
Alias is like a symlink, you could either use that or use the collection name directly to perform operations on the collection
t
You can scale vertically up to 3TB of RAM (AWS offers this, for example), and we haven't had asks to scale beyond this dataset size yet, so we haven't prioritized horizontal scaling.
👍
Can we update a collection?
j
Do you mean the schema?
t
Yes
j
Not at the moment unfortunately. Ah I now see what you meant earlier. You'd have to create a new collection, and then if you use an alias, update the alias to point to the new collection
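A minimal sketch of that create-then-repoint flow with the typesense-js client; the collection name `products_v2`, the alias name, and the fields are placeholders:

```js
const Typesense = require('typesense');

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'ADMIN_API_KEY', // placeholder
});

async function swapToNewSchema() {
  // Create a new collection with the updated schema.
  await client.collections().create({
    name: 'products_v2',
    fields: [
      { name: 'name', type: 'string' },
      { name: 'price', type: 'float' },
    ],
  });

  // ... reindex documents into products_v2 here ...

  // Repoint the alias atomically; anything searching the "products"
  // alias switches to the new collection with no code changes.
  await client.aliases().upsert('products', { collection_name: 'products_v2' });
}
```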
t
Yeah, we are planning to do it that way 👍
j
That said, v0.20 has a new auto-schema detection feature, where the first time a field is encountered in a record, it will automatically be indexed if you turn this mode on
So if you don't need to change the datatype of a field and only need to add new fields, then the auto-schema detection feature will be useful for you
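For context, a sketch of how auto-schema detection looks in the Typesense versions that were eventually released (at the time of this chat it was still a nightly build, so details below may differ slightly); the collection name and document are made up:

```js
const Typesense = require('typesense');

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'ADMIN_API_KEY', // placeholder
});

async function createAutoCollection() {
  // A wildcard ".*" field of type "auto" turns on auto-schema detection:
  // any field seen for the first time in a document gets indexed.
  await client.collections().create({
    name: 'products_customer_42', // placeholder per-customer collection
    fields: [{ name: '.*', type: 'auto' }],
  });

  // This document's custom prop is picked up without a schema change.
  await client.collections('products_customer_42').documents().create({
    name: 'Mechanical keyboard',
    price: 79.0,
    switch_type: 'brown',
  });
}
```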
t
Wow that would be great for us 🎉
j
I actually have a nightly build with the feature! Would you be interested in beta testing it if I give you a docker build?
t
There are no technical limits in Typesense on the number of collections. That said, each collection spins up 4 threads to parallelize searches, so the upper limit really depends on how many CPU cores your cluster has
Unfortunately this will be a big bottleneck for us 😞 We need to redesign our system for that
I actually have a nightly build with the feature! Would you be interested in beta testing it if I give you a docker build?
I would love to 🙂
j
Awesome! Let me put some instructions together for you; the docs are not yet written for it
t
One final question, sorry for taking up so much of your time 🙂 Is there any limit on the number of fields?
j
Unfortunately this will be a big bottleneck for us 😞 We need to redesign our system for that
If you allow your users to define custom fields on the product, then going down the path of one collection per user makes total sense, because the schema is different for each user. v0.20 also has some threading improvements where we'll be able to use a shared thread pool to process requests across multiple collections. So this should allow you to scale to an even higher number of collections
One final question, sorry for taking up so much of your time 🙂 Is there any limit on the number of fields?
Happy to answer! No, there are no limits on the number of fields. As long as you have sufficient RAM to hold the data, Typesense will happily chug along
Alright! Here are instructions to use the new auto-schema detection feature: https://gist.github.com/jasonbosco/c712b52a4b29e84ebce82c9a5ec82ffc I'd love to get your feedback on how it works out for your use-case.
t
Thank you so much for your help! We will try it as soon as possible 👍
👍 1
a
"Any reason you want to store each customer's data in a separate index, vs using a scoped API key and storing everything in one index btw?" Hi Jason. I had thought scoped API keys were always scoped to a whole collection. Just reread the documentation. This feature is WAY cooler!
j
Haha! Scoped API keys are a powerful feature! The "scoped" part means scoped to particular records, not just the collection. But I can see how it can be easily misunderstood as scoped to a collection
You can actually embed any of the search parameters inside a scoped API key, so it's not just for filters. If you need a particular search parameter to not be changeable by users, you can embed it in a scoped API key and do searches with that
Here's another interesting use case that came up recently: https://github.com/typesense/typesense/issues/193#issuecomment-765878863
🙌 1
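A small sketch of the scoped-key ideas above, assuming the typesense-js client; the key value and field names are made up. Any parameter embedded in the key is enforced server-side and cannot be overridden by the caller:

```js
const Typesense = require('typesense');

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'SEARCH_ONLY_PARENT_KEY', // placeholder search-only key
});

// Scoped key generation happens client-side (an HMAC over the embedded
// params), so no API call is needed to mint one key per customer.
const scopedKey = client.keys().generateScopedSearchKey(
  'SEARCH_ONLY_PARENT_KEY',
  {
    filter_by: 'customer_id:=42',                // pin searches to one tenant
    exclude_fields: 'cost_price,supplier_email', // hypothetical sensitive fields
  }
);

// Hand scopedKey to that customer's frontend; the embedded filter_by and
// exclude_fields always apply to searches made with it.
```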
t
Hi @Andrew Sittermann because every collection will have dynamic fields that are unique per customer, we could add all fields to a single collection and filter responses using `include_fields` and `facet_by`, but there may be 10k fields within a collection and I am not sure about the efficiency of that solution 😄
k
@Tugay Karaçay regarding the per-collection threading resources that Jason mentioned earlier, this is also addressed in the 0.20 RC build. A common shared thread pool is used, so it's no longer a constraining factor for having a large number of collections. In fact, I think having a per-customer collection is an easy way to scale, as it offers a lot of flexibility and is a logical way to shard your data for performance.
t
Hi @Kishore Nallan yes, the shared thread pool improvement would be perfect for our solution. We've just made a little POC with the RC build; adding new fields and filtering on them works well in our test cases, but we need to use `facet: true` on dynamic fields too, so it is not suitable for us yet. Also, are you considering adding `search: false` and `index: false` to the field definition? Since we enable auto-schema detection, we may want to prevent some fields from being indexed.
k
Implementing `facet: true` is easy; I held back from doing that only because facets can consume memory, and enabling it on every field (especially long text fields like description) would be a huge waste of resources. Thinking of how best to handle that. One way is to enable facets only on field names ending with a `_facet` suffix.
Also, are you considering adding `search: false` and `index: false` to the field definition?
Would you know upfront which fields will not need to be searched on?
t
Yes, for our e-commerce platform only the product name and some additional fields will be searchable; other fields will be used for filtering and facets.
Thinking of how best to handle that. One way is to enable facets only on field names ending with a `_facet` suffix.
This is a good solution but not a flexible one; using wildcards could be considered. For example, in the fields definition we could use the following syntax to dynamically match field definitions:
```js
[
  {
    name: 'created_at',
    type: 'int64'
  },
  {
    name: '*_auto',
    type: 'auto'
  },
  {
    name: '*_fct',
    type: 'auto',
    facet: true
  },
  // stringify rest
  {
    name: '*',
    type: 'stringify',
    facet: true
  }
]
```
k
Excellent. The `index: false` configuration can also be mentioned in the same way.
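Illustrative only: a schema in the spirit of the suffix convention being discussed here, assuming the typesense-js client; whether these exact wildcard rules shipped in this form is not confirmed in this thread:

```js
const Typesense = require('typesense');

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'ADMIN_API_KEY', // placeholder
});

async function createConventionBasedCollection() {
  await client.collections().create({
    name: 'products_customer_42',
    fields: [
      { name: 'created_at', type: 'int64' },
      { name: '.*_fct', type: 'auto', facet: true },      // facetable dynamic props
      { name: '.*_noindex', type: 'auto', index: false }, // stored but not indexed
      { name: '.*', type: 'auto' },                       // everything else
    ],
  });
}
```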
g
Now if you use `scopedApiKey` to do searches instead of the main search api key, the server will automatically enforce the embedded `exclude_fields` param and users can't override it.
I'm using exactly this, to protect sensitive data & prevent excess data from being transmitted over the wire.
j
@Gabe O'Leary That's great! Did you stumble on the GitHub issue first, or did you discover that you could do this yourself?
g
you told me 😁
j
Oh lol, I've got a bad memory!