Hello ```{ "searches": [ { "collectio...
# community-help
t
Hello
Copy code
{
  "searches": [
    {
      "collection": "casemanagement_cases",
      "filter_by": "id:* || $casemanagement_case_category_relations(id:*)",
      "include_fields": "$casemanagement_users(Name, strategy: merge) as AgentName,$casemanagement_clients(Name, strategy: merge) as ClientName,$queues_queuegroups(Name, strategy: merge) as GroupName, $casemanagement_case_category_relations(CategoryId, strategy: nest_array) as Categories",
      "q": "",
      "query_by": "ClientID",
      "limit": 5,
      "enable_lazy_filter": true,
      "validate_field_names": true
    }
  ]
}
I am doing a pretty complicated join. (not sure if best practice) but this works fine on dataset approx 2 million.. But it is pretty slow about 2 seconds to do this query. Do i miss something? When i remove the filter_by it goes quick about 2ms. The category_relation can have multiple caseIds that i want to join. Any suggestions?
j
Just like in a relational database, doing joins at query time is computationally expensive, compared to pre-joining the data and storing them in a flattened way at indexing time. So there is a performance tradeoff involved when using JOINs.
t
When i remove this segment, it is quick again
Copy code
"filter_by": "id:* || $casemanagement_case_category_relations(id:*)",
@Jason Bosco
I understand that, i was hoping to not have to do it that's why i tried. But agree, will try to flatten it
bit dissapointed, i now normalized the data, so there's no joins. But some filters, and it is slower than mariadb with the same amount of data de-normalized...
Copy code
searchParameters {
  q: '*',
  query_by: 'ClientID',
  sort_by: 'Created:DESC',
  filter_by: 'InternalType:=[call,callback,chat,email,form,sms,social] && GroupId:=[135,162,383,445,584,585,615,616,0] && Status:=[Closed,Open]',
  page: 1,
  limit: 5
}
Do i do anything wrong or is this expected..? @Jason Bosco
Copy code
{
                name: schemaName,
                fields: [
                    { name: '.*', type: 'auto' },
                    { name: 'ID', type: 'int32' },
                    { name: 'CaseID', type: 'string', sort: true },
                    {
                        name: 'GroupName',
                        type: 'string',
                        optional: true,
                        sort: true,
                    },
                    { name: 'Status', type: 'string', sort: true },
                    { name: 'Created', type: 'string', sort: true },
                    { name: 'SLA', type: 'string', sort: true },
                ],
            }
j
In your filter_by, could you try changing all the
:=
to
:
t
Will try that, i'll let you know how it went
Ok that made it faster, thanks! Any other tips and trix to make it faster? 🙂 🙂 @Jason Bosco
h
I'm wondering about the Join's performance. @Taieb can you share the schemas of the collections you had prior to denormalization?
t
Sure 1 sec
Copy code
export function schema(schemaName: string): CollectionCreateSchema {
    switch (schemaName) {
        case 'casemanagement_cases':
            return {
                name: schemaName,
                fields: [
                    { name: '.*', type: 'auto' },
                    { name: 'ID', type: 'int32' },
                    {
                        name: 'UserId',
                        type: 'int32',
                        reference: 'casemanagement_users.UserId',
                        async_reference: true,
                    },
                    {
                        name: 'ClientID',
                        type: 'string',
                        reference: 'casemanagement_clients.ClientID',
                        async_reference: true,
                    },
                ],
                default_sorting_field: 'ID',
            };
        case 'casemanagement_clients':
            return {
                name: schemaName,
                fields: [
                    { name: '.*', type: 'auto' },
                    { name: 'ID', type: 'int32' },
                    {
                        name: 'ClientID',
                        type: 'string',
                    },
                ],
                default_sorting_field: 'ID',
            };
        case 'casemanagement_users':
            return {
                name: schemaName,
                fields: [
                    { name: '.*', type: 'auto' },

                    {
                        name: 'UserId',
                        type: 'int32',
                    },
                ],
            };

        default:
            return {
                name: schemaName,
                fields: [{ name: '.*', type: 'auto' }],
            };
    }
}
It was dynamically created, will this help you?
h
What was the definition of
casemanagement_case_category_relations
collection?
t
sorry i missed that one sec
I have removed the code
will try to recreate it
casemanagement_case_category_relations had a relation to casemanagement_cases CaseID ON CaseIDRef
I don't have the code
h
Okay, no issue. I'm just looking for reference fields.
t
Ok great!
any tips on how to get this quicker? @Harpreet Sangar
Copy code
searchParameters {
  q: '*',
  query_by: 'ClientID',
  sort_by: 'Created:DESC',
  filter_by: 'InternalType:=[call,callback,chat,email,form,sms,social] && GroupId:=[135,162,383,445,584,585,615,616,0] && Status:=[Closed,Open]',
  page: 1,
  limit: 5
}
Fields:
Copy code
[
                    { name: 'ID', type: 'int32', index: false, optional: true },
                    { name: 'Source', type: 'string', sort: true, index: false, optional: true },
                    { name: 'CaseID', type: 'string', sort: true, index: false, optional: true },
                    { name: 'ClientID', type: 'string', index: false, optional: true },
                    { name: 'Author', type: 'string', sort: true, index: true, optional: true },
                    { name: 'Parent', type: 'string', index: false, optional: true },
                    { name: 'Status', type: 'string', sort: true, index: true, optional: true },
                    { name: 'Topic', type: 'string', sort: true, index: true, optional: true },
                    { name: 'Description', type: 'string', index: false, optional: true },
                    { name: 'UserId', type: 'int32', index: false, optional: true },
                    { name: 'SLA', type: 'string', sort: true, index: false, optional: true },
                    { name: 'Answered', type: 'int32', optional: false },
                    { name: 'Responded', type: 'int32', index: false, optional: true },
                    { name: 'InternalCreated', type: 'int32', index: false, optional: true },
                    { name: 'MsgID', type: 'string', index: false, optional: true },
                    { name: 'CaseRespondant', type: 'string', index: false, optional: true },
                    { name: 'AssignedTo', type: 'string', index: false, optional: true },
                    // { name: 'HasAttachment', type: 'int32', index: false, optional: true },
                    { name: 'InternalType', type: 'string', index: true, optional: true },
                    { name: 'Language', type: 'string', index: false, optional: true },
                    { name: 'GroupId', type: 'int32', sort: false, index: true, optional: true },
                    { name: 'Group', type: 'string', index: false, optional: true },
                    { name: 'Result', type: 'string', index: false, optional: true },
                    { name: 'Tries', type: 'int32', index: false, optional: true },
                    { name: 'TextField', type: 'string', index: false, optional: true },
                    { name: 'Text', type: 'string', index: false, optional: true },
                    { name: 'UnreadMessages', type: 'int32', index: false, optional: true },
                    { name: 'DateTime', type: 'string', index: false, optional: true },
                    { name: 'Created', type: 'string', sort: true, index: true, optional: true },
                    { name: 'LastIncoming', type: 'string', index: false, optional: true },
                    { name: 'LastUpdated', type: 'string', index: false, optional: true },
                    { name: 'LastUpdatedBy', type: 'int32', index: false, optional: true },
                    { name: 'Resolved', type: 'int32', sort: true, index: false, optional: true },
                    { name: 'ResolvedAt', type: 'string', sort: true, index: false, optional: true },
                    { name: 'ResolvedBy', type: 'string', sort: true, index: false, optional: true },
                    
                    { name: 'AgentCommentTime', type: 'string', sort: true, index: false, optional: true },
                    { name: 'InternalSubType', type: 'string', sort: true, index: false, optional: true },
                    { name: 'Outgoing', type: 'bool', sort: true, index: true, optional: false },
                    { name: 'IsChild', type: 'int32', sort: true, index: false, optional: true },
                    { name: 'ToEmail', type: 'string', sort: true, index: false, optional: true },
                    { name: 'CopyToEmail', type: 'string', sort: true, index: false, optional: true },
                    { name: 'HiddenCopyToEmail', type: 'string', sort: true, index: false, optional: true },
                    { name: 'Signature', type: 'string', sort: true, index: false, optional: true },
                    { name: 'AgentName', type: 'string', sort: true, index: true, optional: true },
                    { name: 'GroupName', type: 'string', sort: true, index: false, optional: true },
                    { name: 'ClientName', type: 'string', sort: true, index: true, optional: true },
                    { name: 'ClientPersonNr', type: 'string', sort: true, index: false, optional: true },
                    { name: 'ContactName', type: 'string', sort: true, index: false, optional: true },
                    { name: 'BrandId', type: 'int32', sort: false, index: false, optional: true },
                    { name: 'QueueIcon', type: 'string', sort: false, index: false, optional: true },
                    { name: 'CategoryCount', type: 'int32', sort: false, index: false, optional: true },
                    { name: 'SubCasesCount', type: 'int32', sort: false, index: false, optional: true },
                    { name: 'GroupIdNotNull', type: 'int32', sort: false, index: false, optional: true }
                  ]
h
There's no option to make it quicker with wildcard search (q: *). When you're querying a field however, you can mention
enable_lazy_filtering: true
when your
filter_by
matches too many ids in comparison to the query.
> There's no option to make it quicker with wildcard search (q: *). (apart from changing
:=
to
:
as Jason mentioned)
What is the value of
search_time_ms
in the response?
t
Will check 1 sec
Copy code
search_time_ms: 114
Mariadb has around 30ms with the same amount but with complex joins
There's no option to make it quicker with wildcard search (q: *).
Any other way to solve it?
Copy code
result {
  facet_counts: [],
  found: 747037,
  hits: [
    { document: [Object], highlight: {}, highlights: [] },
    { document: [Object], highlight: {}, highlights: [] },
    { document: [Object], highlight: {}, highlights: [] },
    { document: [Object], highlight: {}, highlights: [] },
    { document: [Object], highlight: {}, highlights: [] }
  ],
  out_of: 2510122,
  page: 11,
  request_params: {
    collection_name: 'casemanagement_case_summary_view',
    first_q: '*',
    per_page: 5,
    q: '*'
  },
  search_cutoff: false,
  search_time_ms: 133
}
h
I'll need your dataset with just
InternalType
,
GroupId
,
Status
, and
Created
fields to see if there's room for improvement somewhere.
t
Copy code
# InternalType, GroupId, Status, Created
'chat', '135', 'Closed', '2025-02-15 17:42:48'
'email', '162', 'Closed', '2025-02-16 08:50:09'
'chat', '135', 'Closed', '2025-02-16 11:50:03'
'email', '135', 'Closed', '2025-02-17 15:42:02'
'email', '135', 'Open', '2025-02-17 15:45:01'
'email', '135', 'Closed', '2025-02-17 15:49:02'
'chat', '383', 'Closed', '2025-02-21 13:25:21'
'chat', '135', 'Closed', '2025-02-26 10:45:58'
'chat', '135', 'Closed', '2025-02-26 14:40:49'
'chat', '135', 'Closed', '2025-02-26 15:07:36'
GroupID is int32
The full dataset or is this snippet enough?
h
The full dataset. As a jsonl file, if possible.
t
hmm
it's quite big, but i will try
1
My workbench crashes when i try to dump the data
h
If you've indexed the data in Typesense, you can try the export endpoint.
t
ok trying that now
h
What Typesense version are you on?
t
Copy code
typesense/typesense:28.0
1
Fyi, the export is running
message has been deleted
Is that ok?
h
Yes
1
t
Any ideas? 🙂
h
I'm working on a bug right now. I'll let you know when I'm available to look into this.
t
ok thanks
@Harpreet Sangar -> How's it going with the bug? 🙂
h
Figured out the issue, just working on the best possible solution.
t
really? Cool!
@Harpreet Sangar i abandoned typesense for meilisearch you don't have to look at it, thanks anyways 🙂
v
I think I’ve got the same performance issue, as soon as I add an inner join like the one above (1-N relation), the query time goes from a few ms to at least 1 second, and adding another 1-1 join below that inner join makes the query ~3s. I’m running tests on an empty bare metal server with 128GB memory and data is on an NVMe, so plenty of resources for these tests - running typesense v28 or v29-rc23. The 1-1 relation that run in a few ms is between 2 collections of 18M and 6M documents so that’s nice. The 1-3 seconds 1-N query is between 18M and 2M documents. It’s actually much slower than running the join manually with multiple queries on mongodb on a small VM (total 100ms) I can write more details in a new thread after running more detailed tests. https://typesense-community.slack.com/archives/C01P749MET0/p1742327251756489?thread_ts=1742320842.862919&channel=C01P749MET0&message_ts=1742327251.756489
h
@Vincent Giersch Can you share the schema of your collections?
v
sure, I will try to make a minimal repro
🙌🏼 1