Discussions on Typesense, Collections, and Dynamic Fields
TLDR Tugay shares plans to use Typesense for their SaaS platform and asks about collection sizes and sharding. Jason clarifies Typesense's capabilities and shares a beta feature. They discuss using unique collections per customer and new improvements. Kishore Nallan and Gabe comment on threading and data protection respectively.
Mar 03, 2021 (33 months ago)
Tugay
06:57 AM
Jason
06:58 AM
Tugay
06:59 AM
Jason
07:00 AM
Tugay
07:02 AM
Tugay
07:04 AM
Jason
07:14 AM
Jason
07:14 AM
Jason
07:17 AM We do replicate the data across multiple nodes for high availability. However, if you're talking about partitioning the data and storing a subset on different nodes, we don't have plans for that at the moment. But you can always do application-side sharding by spinning up multiple clusters and then mapping certain user-id ranges to a particular cluster, for example.
You can scale vertically up to 3TB of RAM (AWS offers this, for example), and we haven't had asks to scale beyond this dataset size yet, so we haven't prioritized horizontal scaling.
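The application-side sharding described above could look roughly like the sketch below. The cluster hostnames and the user-id boundaries are hypothetical; only the idea of mapping user-id ranges to separate clusters comes from the thread.

```python
from bisect import bisect_right

# Hypothetical layout: each entry is (first user id served, cluster host).
# Entries must be sorted by the first user id.
CLUSTER_RANGES = [
    (0, "cluster-a.example.com"),
    (100_000, "cluster-b.example.com"),
    (200_000, "cluster-c.example.com"),
]

_RANGE_STARTS = [start for start, _ in CLUSTER_RANGES]


def cluster_for_user(user_id: int) -> str:
    """Return the host of the cluster assumed to hold this user's data."""
    if user_id < _RANGE_STARTS[0]:
        raise ValueError(f"user id {user_id} is below the first range")
    # bisect_right finds the first range start greater than user_id;
    # the range just before it is the one that contains user_id.
    index = bisect_right(_RANGE_STARTS, user_id) - 1
    return CLUSTER_RANGES[index][1]
```

The application would call `cluster_for_user` before every index or search request and send the request to the returned host. Moving a range to a new cluster means reindexing that range's data there and then updating the table.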
Tugay
07:18 AM
Tugay
07:19 AM
Jason
07:20 AM
Tugay
07:20 AM 👍
Tugay
07:21 AM
Jason
07:21 AM
Tugay
07:21 AM
Jason
07:22 AM
Tugay
07:23 AM
Jason
07:23 AM
Jason
07:23 AM
Tugay
07:24 AM
Jason
07:24 AM
Tugay
07:25 AM Unfortunately this will be a big bottleneck for us 😞 We need to redesign our system for that.
Tugay
07:25 AM I would love to 🙂
Jason
07:26 AM
Tugay
07:27 AM
Jason
07:28 AM If you allow your users to define custom fields on the product, then going down the path of one collection per user makes total sense, because the schema is different for each user. v0.20 also has some threading improvements where we'll be able to use a shared thread pool to process requests across multiple collections. So this should allow you to scale to an even higher number of collections.
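A minimal sketch of the one-collection-per-customer idea, assuming each customer declares their own custom fields. The `products_<customer_id>` naming scheme, the two base fields, and the helper name are hypothetical. The returned dictionary follows the general shape of a Typesense collection schema (a `name` plus a list of `fields`); any other keys a given Typesense version requires are left out.

```python
from typing import Dict, List


def build_customer_schema(customer_id: str, custom_fields: Dict[str, str]) -> dict:
    """Build a per-customer collection schema as a plain dictionary.

    custom_fields maps a field name to a Typesense type name,
    for example {"color": "string", "weight": "float"}.
    """
    # Fields assumed to be shared by every customer (hypothetical).
    fields: List[dict] = [
        {"name": "title", "type": "string"},
        {"name": "created_at", "type": "int64"},
    ]
    # Customer-defined fields, sorted so the output is deterministic.
    for name, field_type in sorted(custom_fields.items()):
        fields.append({"name": name, "type": field_type})
    return {"name": f"products_{customer_id}", "fields": fields}
```

The resulting dictionary would then be passed to whatever client call creates the collection, once per customer.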
Jason
07:29 AM Happy to answer! No, there are no limits on the number of fields. As long as you have sufficient RAM to hold the data, Typesense will happily chug along.
Jason
07:34 AM I'd love to get your feedback on how it works out for your use-case.
Tugay
07:45 AM
Andrew
07:57 AM Hi Jason. I had thought scoped API keys were always scoped to a whole collection. Just reread the documentation. This feature is WAY cooler!
Jason
07:59 AM
Jason
08:00 AM
Jason
08:01 AM
Tugay
08:09 AM `include_fields` and `facet_by`, but there may be 10k fields within a collection and I am not sure about the efficiency of this solution 😄
Kishore Nallan
09:55 AM
Tugay
11:26 AM `facet: true` on dynamic fields too, so it is not suitable for us now. And also, are you considering adding `search: false` and `index: false` to the field definition? Since we enable auto-schema detection, we may want to prevent some fields from being indexed.
Kishore Nallan
11:34 AM `facet: true` is easy, held back from doing that only because facets can consume memory, so enabling it on every field (especially long text fields like description) will be a huge waste of resources. Thinking of how best to handle that. One way of doing that is to enable facets only on field names ending with a `_facet` suffix.
> And also are you considering adding `search: false` and `index: false` to the field definition
Would you know upfront which fields will not need to be searched upon?
Tugay
11:54 AM
Tugay
12:01 PM
> `_facet` suffix
This is a good solution but not a flexible one; wildcards could be considered instead. For example, a fields definition could use the following syntax to match field names dynamically:
[
  {
    name: 'created_at',
    type: 'int64'
  },
  {
    name: '*_auto',
    type: 'auto'
  },
  {
    name: '*_fct',
    type: 'auto',
    facet: true
  },
  // stringify rest
  {
    name: '*',
    type: 'stringify',
    facet: true
  }
]
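To make the proposal above concrete, here is a small sketch of how such wildcard rules might be resolved for an incoming field name. It assumes the first matching rule wins, which the thread does not state, and it uses Python's `fnmatch` module for the `*` matching. The rule list mirrors the proposed syntax, including the `stringify` type, which is part of the proposal rather than a confirmed Typesense type.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Rules copied from the proposed syntax; order matters in this sketch.
FIELD_RULES = [
    {"name": "created_at", "type": "int64"},
    {"name": "*_auto", "type": "auto"},
    {"name": "*_fct", "type": "auto", "facet": True},
    # Catch-all: everything else is stringified.
    {"name": "*", "type": "stringify", "facet": True},
]


def resolve_field_rule(field_name: str, rules=FIELD_RULES) -> Optional[dict]:
    """Return the first rule whose name pattern matches field_name."""
    for rule in rules:
        # fnmatchcase does case-sensitive shell-style matching ('*' = any run).
        if fnmatchcase(field_name, rule["name"]):
            return rule
    return None
```

With these rules, `price_auto` resolves to the `*_auto` rule, `color_fct` to the faceted `*_fct` rule, and any other name falls through to the catch-all.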
Kishore Nallan
12:06 PM `index: false` configuration can also be mentioned in the same way.
Gabe
04:29 PM
> `scopedApiKey` to do searches instead of the main search api key, the server will automatically enforce the embedded `exclude_fields` param and users can't override it.
I'm using exactly this to protect sensitive data and prevent excess data from being transmitted over the wire.
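A sketch of how a scoped key with an embedded `exclude_fields` parameter could be derived from a parent search key. The layout used here (an HMAC-SHA256 digest of the JSON parameters, followed by the first four characters of the parent key and the parameters themselves, all base64-encoded) is an assumption based on how the official clients are understood to build scoped keys. In practice, prefer the helper that ships with the Typesense client library. The function name and the example key are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def make_scoped_search_key(parent_search_key: str, embedded_params: dict) -> str:
    """Derive a scoped key that carries embedded search parameters."""
    params_json = json.dumps(embedded_params)
    # Sign the embedded parameters with the parent key so they cannot be altered.
    digest = base64.b64encode(
        hmac.new(
            parent_search_key.encode("utf-8"),
            params_json.encode("utf-8"),
            hashlib.sha256,
        ).digest()
    ).decode("utf-8")
    # Assumed layout: digest + first 4 chars of the parent key + JSON params.
    key_prefix = parent_search_key[:4]
    raw_key = f"{digest}{key_prefix}{params_json}"
    return base64.b64encode(raw_key.encode("utf-8")).decode("utf-8")


# Example: hide a sensitive field from anyone searching with this key.
scoped_key = make_scoped_search_key("abcd1234", {"exclude_fields": "email"})
```

Because the parameters are signed, a user holding only the scoped key cannot remove or change `exclude_fields` without invalidating the signature.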
Jason
06:05 PM
Gabe
06:06 PM
Jason
06:13 PM