Hello everyone, i have a question regarding the me...
# community-help
p
Hello everyone, i have a question regarding the memory usage and the loading times on a paginated query. First of all lets start with the memory usage. I created a raw JSON file with data that is equal to the data i sent to the server, this means that the server should have about 3x the amount in memory usage. However the JSON file is about
770MB
and the memory usage of the server is about
5.5GB
. What am i missing here? The collection schema is appended at the end. Now onto the paginated query. I have a paginated filterable table in my application and if i just fetch the first page of the table, the query looks like this:
Copy code
{
  "q": "*",
  "page": 1,
  "per_page": 10,
  "limit_hits": 20,
  "include_fields": ", $in_use(), $manufacturers(), $distributors(), $product_bans(*)",
  "filter_by": "(id:* || $in_use(id:) || $tenant_product_distributors(id:) || $product_bans(id:*))",
  "sort_by": "eid:ASC"
}
This query takes about
1.4s
to finish and i was wondering if i am doing the left joins wrong, because in SQL this takes a few milliseconds. Now if i am trying to filter by a distributor like so:
Copy code
{
  "q": "*",
  "page": 1,
  "per_page": 10,
  "limit_hits": 20,
  "include_fields": ", $in_use(), $manufacturers(), $distributors(), $product_bans(*)",
  "filter_by": "$tenant_product_distributors(distributor_id:=2) && (id:* || $in_use(id:) || $product_bans(id:))",
  "sort_by": "eid:ASC"
}
The query now takes only
600ms
. How is a query faster with a filter on a joined collection than without any filters at all. I get that there will be less documents to join but does this really add up to cutting the the query time in half? Thank you in advance. Here is the collection schema i am using:
Copy code
[
  {
    "name": "products",
    "fields":
      [
        { "name": "eid", "type": "int32", "sort": true },
        { "name": "name", "type": "string", "sort": true },
        { "name": "number", "type": "string", "sort": true },
        {
          "name": "manufacturer_id",
          "type": "string",
          "reference": "manufacturers.id"
        },
        { "name": "gs1", "type": "string" },
        { "name": "hibc", "type": "string" },
        { "name": "ean8", "type": "string" },
        { "name": "ean13", "type": "string" }
      ],
    "default_sorting_field": "eid"
  },
  {
    "name": "tenant_product_distributors",
    "fields":
      [
        { "name": "tenant_id", "type": "int32", "index": false },
        {
          "name": "distributor_id",
          "type": "string",
          "reference": "distributors.id"
        },
        { "name": "product_id", "type": "string", "reference": "products.id" }
      ]
  },
  {
    "name": "in_use",
    "fields":
      [
        { "name": "tenant_id", "type": "int32", "index": false },
        { "name": "product_id", "type": "string", "reference": "products.id" },
        { "name": "in_use", "type": "bool", "sort": true }
      ],
    "default_sorting_field": "in_use"
  },
  {
    "name": "product_bans",
    "fields":
      [
        { "name": "tenant_id", "type": "int32", "index": false },
        { "name": "product_id", "type": "string", "reference": "products.id" },
        { "name": "ban", "type": "bool", "sort": true }
      ],
    "default_sorting_field": "ban"
  },
  { "name": "manufacturers", "fields": [{ "name": "name", "type": "string" }] },
  {
    "name": "manufacturer_prefixes",
    "fields":
      [
        {
          "name": "manufacturer_id",
          "type": "string",
          "reference": "manufacturers.id"
        },
        { "name": "prefix", "type": "string" },
        { "name": "prefix_type", "type": "int32" }
      ]
  },
  { "name": "distributors", "fields": [{ "name": "name", "type": "string" }] }
]
h
The 3x estimate of memory usage is only for key word search. It doesn't include the memory usage of the reference indexes. Which collection are you querying with
Copy code
{
  "q": "*",
  "page": 1,
  "per_page": 10,
  "limit_hits": 20,
  "include_fields": ", $in_use(), $manufacturers(), $distributors(), $product_bans(*)",
  "filter_by": "(id:* || $in_use(id:) || $tenant_product_distributors(id:) || $product_bans(id:*))",
  "sort_by": "eid:ASC"
}
maybe I can suggest a better query?
p
Thanks for your help Harpreet, i am querying through the products collection. Better suggestions are always welcome.
h
@Paul Wallnöfer
Copy code
{
  "q": "*",
  "page": 1,
  "per_page": 10,
  "limit_hits": 20,
  "include_fields": ", $in_use(*), $manufacturers(*), $distributors(*), $product_bans(*), $tenant_product_distributors(*)",
  "filter_by": "id:* || $tenant_product_distributors(id:)",
  "sort_by": "eid:ASC"
}
Can you check if this reduces the time from
1.4s
?
p
@Harpreet Sangar thanks for your suggestion. The query time is faster due to unjoined fields, but I need the information to show to the user, for example in_use is now empty. Probably because it is ManyToOne and not OneToMany from a Product standpoint.
h
Only the
in_use
is missing?
product_bans
is still returned?
p
Unfortunately neither
in_use
nor
product_bans
is returned
h
They should've since we allow including the related documents of a collection which has a direct reference to the collection you're querying. Can you provide a minimal example using only the reference fields that I can use to reproduce the issue in this format https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b?
p
I hope i am picking up what you are putting down. Here are the curl commands:
Copy code
curl -s "$TYPESENSE_HOST/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '
          {
            "name": "products",
            "fields":
              [
                { "name": "eid", "type": "int32", "sort": true },
                { "name": "name", "type": "string", "sort": true },
                { "name": "number", "type": "string", "sort": true },
                {
                  "name": "manufacturer_id",
                  "type": "string",
                  "reference": "manufacturers.id"
                },
                { "name": "gs1", "type": "string" },
                { "name": "hibc", "type": "string" },
                { "name": "ean8", "type": "string" },
                { "name": "ean13", "type": "string" }
              ],
            "default_sorting_field": "eid"
          }' | jq

curl -s "$TYPESENSE_HOST/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '
          {
            "name": "tenant_product_distributors",
            "fields":
              [
                { "name": "tenant_id", "type": "int32", "index": false },
                {
                  "name": "distributor_id",
                  "type": "string",
                  "reference": "distributors.id"
                },
                { "name": "product_id", "type": "string", "reference": "products.id" }
              ]
          }' | jq

curl -s "$TYPESENSE_HOST/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '
          {
            "name": "in_use",
            "fields":
              [
                { "name": "tenant_id", "type": "int32", "index": false },
                { "name": "product_id", "type": "string", "reference": "products.id" },
                { "name": "in_use", "type": "bool", "sort": true }
              ],
            "default_sorting_field": "in_use"
          }' | jq

curl -s "$TYPESENSE_HOST/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '
          {
            "name": "product_bans",
            "fields":
              [
                { "name": "tenant_id", "type": "int32", "index": false },
                { "name": "product_id", "type": "string", "reference": "products.id" },
                { "name": "ban", "type": "bool", "sort": true }
              ],
            "default_sorting_field": "ban"
          }' | jq

curl -s "$TYPESENSE_HOST/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '
          { "name": "manufacturers", "fields": [{ "name": "name", "type": "string" }] }' | jq

curl -s "$TYPESENSE_HOST/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '
          {
            "name": "manufacturer_prefixes",
            "fields":
              [
                {
                  "name": "manufacturer_id",
                  "type": "string",
                  "reference": "manufacturers.id"
                },
                { "name": "prefix", "type": "string" },
                { "name": "prefix_type", "type": "int32" }
              ]
          }' | jq

curl -s "$TYPESENSE_HOST/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '
          { "name": "distributors", "fields": [{ "name": "name", "type": "string" }] }' | jq

curl -s "$TYPESENSE_HOST/collections/products/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{
          "id": "3",
          "name": "SR Vivodent PE Set of 6 UA11 1A",
          "number": "504377",
          "ean13": "",
          "ean8": "",
          "eid": 3,
          "gs1": "",
          "hibc": "DIVO5043771",
          "manufacturer_id": "1"
        }'  | jq

curl -s "$TYPESENSE_HOST/collections/in_use/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{
          "id": "12",
          "in_use": false,
          "product_id": "3",
          "tenant_id": 1
        }'  | jq

curl -s "$TYPESENSE_HOST/collections/product_bans/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{
          "ban": false,
          "id": "8",
          "product_id": "3",
          "tenant_id": 0
        }'  | jq

curl -s "$TYPESENSE_HOST/collections/tenant_product_distributors/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{
          "distributor_id": "1",
          "id": "25",
          "product_id": "3",
          "tenant_id": 1
        }'  | jq

curl -s "$TYPESENSE_HOST/collections/distributors/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{
          "id": "1",
          "name": "Ivoclar Vivadent GmbH (DE)"
        }'  | jq

curl -s "$TYPESENSE_HOST/collections/manufacturers/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{
          "id": "1",
          "name": "Ivoclar"
        }'  | jq
h
Okay. Thanks for providing reproduction steps. I'll check and let you know.
p
Have you been able to reproduce this issue?
h
Hi @Paul Wallnöfer, I have been working on some high priority tasks since last week. Hopefully I'll complete them today. I'll let you know.
p
No worries you don't have to stress it. Thank you for your help.
👍🏼 1
@Harpreet Sangar I have once again stumbled opon this speed problem, have you gotten around to testing out different query structures?
h
Hi @Paul Wallnöfer I am working on reducing the time taken to compute the left joins: https://github.com/typesense/typesense/pull/2617
This will increase the memory usage but we should see significant improvement in the response time. I'll let you know when this feature is ready.
p
Great news! I will be waiting. 👍