Hi, I am running typesense on an EKS cluster with...
# community-help
m
Hi, I am running Typesense on an EKS cluster with 3 r7g.xlarge nodes (4 vCPUs, 32 GB RAM each). I have a total of 60 million records, with roughly 12 of the 14 fields indexed (occupying 28 GB out of 32 GB). When I run a query with a single filter on one field, it takes more than 5 seconds. How can I optimize this? Just so you know, there is no production traffic yet.
j
Could you share a curl request to Typesense, showing all the search parameters?
m
curl '{{URL}}/multi_search?x-typesense-api-key={{KEY}}' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'content-type: text/plain' \
  -H 'sec-ch-ua: "Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Linux"' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-site' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36' \
  --data-raw '{"searches":[{"exhaustive_search":true,"query_by":"country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state","highlight_full_fields":"country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state","collection":"people","q":"*","facet_by":"city,country,country_region,seniority,state,title","filter_by":"country:=[`India`]","max_facet_values":10,"page":1,"per_page":12},{"exhaustive_search":true,"query_by":"country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state","highlight_full_fields":"country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state","collection":"people","q":"*","facet_by":"country","max_facet_values":10,"page":1}]}'
[
  {
    "exhaustive_search": true,
    "query_by": "country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state",
    "highlight_full_fields": "country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state",
    "collection": "people",
    "q": "*",
    "facet_by": "city,country,country_region,seniority,state,title",
    "filter_by": "country:=[`India`]",
    "max_facet_values": 10,
    "page": 1,
    "per_page": 12
  },
  {
    "exhaustive_search": true,
    "query_by": "country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state",
    "highlight_full_fields": "country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state",
    "collection": "people",
    "q": "*",
    "facet_by": "country",
    "max_facet_values": 10,
    "page": 1
  }
]
j
It's most likely the exhaustive_search flag. Could you remove that? And also change
country:=[India]
to
country:[India]
(remove the equals sign; if you search the docs for exact vs non-exact match, you'll see what it does).
Could you also add
facet_sample_threshold: 10000
and
facet_sample_percent: 20
as additional parameters?
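The three changes suggested above can be applied programmatically to each entry of the `searches` array. The following is only a minimal sketch: `relax_search` is a hypothetical helper name, not part of any Typesense client, and the plain string replacement of `:=` assumes that no filter value itself contains that character sequence.

```python
import copy


def relax_search(search, sample_threshold=10000, sample_percent=20):
    """Return a copy of one multi_search entry with the suggested changes applied.

    Hypothetical helper for illustration; not a Typesense client function.
    """
    tuned = copy.deepcopy(search)

    # 1. Drop the exhaustive_search flag so the server default is used.
    tuned.pop("exhaustive_search", None)

    # 2. Turn exact-match filters (":=") into non-exact ones (":").
    #    Assumption: no filter value contains the literal text ":=".
    if "filter_by" in tuned:
        tuned["filter_by"] = tuned["filter_by"].replace(":=", ":")

    # 3. Add the facet sampling parameters to searches that facet.
    if "facet_by" in tuned:
        tuned["facet_sample_threshold"] = sample_threshold
        tuned["facet_sample_percent"] = sample_percent

    return tuned
```

Applying it to every entry, for example `[relax_search(s) for s in payload["searches"]]`, leaves the original payload untouched so the two variants can be timed side by side.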
m
okay let me try.
{
  "searches": [
    {
      "query_by": "country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state",
      "highlight_full_fields": "country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state",
      "collection": "people",
      "q": "*",
      "facet_by": "city,country,country_region,seniority,state,title",
      "filter_by": "country:[`India`]",
      "max_facet_values": 10,
      "facet_sample_threshold": 10000,
      "facet_sample_percent": 20,
      "page": 1,
      "per_page": 12
    },
    {
      "query_by": "country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state",
      "highlight_full_fields": "country,country_region,email,first_name,name,seniority,job_start_date,title,last_name,linkedin_url,organization_id,city,state",
      "collection": "people",
      "q": "*",
      "facet_by": "country",
      "facet_sample_threshold": 10000,
      "facet_sample_percent": 20,
      "max_facet_values": 10,
      "page": 1
    }
  ]
}
It still takes 4.3 seconds. Let me know if any of the values here are wrong.
Do I need more vCPUs for this query pattern?
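Since the request bundles two searches, it may help to see which one accounts for most of the 4.3 seconds before changing hardware. The sketch below assumes the parsed multi_search response is a dict with a `results` list whose entries carry a `search_time_ms` field; `slowest_searches` is a hypothetical helper name.

```python
def slowest_searches(multi_search_response):
    """Return (index, search_time_ms) pairs for each search, slowest first.

    Hypothetical helper; expects the JSON-decoded multi_search response.
    Entries without a search_time_ms field (e.g. errors) are skipped.
    """
    timings = [
        (index, result["search_time_ms"])
        for index, result in enumerate(multi_search_response.get("results", []))
        if result.get("search_time_ms") is not None
    ]
    # Sort by reported server-side time, largest first.
    return sorted(timings, key=lambda pair: pair[1], reverse=True)
```

The index in each pair matches the position of the search in the `searches` array, so the first pair points at the search worth tuning first. Note that `search_time_ms` covers server-side work only, so a large gap between it and the wall-clock time would point to network or client overhead instead.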
j
Yeah, more CPUs and CPUs with higher clock speeds will help. But besides that, it's hard to debug further without having access to the dataset. And we only offer this type of performance tuning help on Typesense Cloud.
m
I understand, but with no production traffic it should give better latency, right? And this is not even a complex query. If you could share some performance optimization tips or some documentation, that would be helpful.
j
The tips I shared above are the generic ones. Beyond that, optimizations become specific to the dataset, hardware configuration, etc. This is why we only offer this on Typesense Cloud, where we have complete visibility into runtime characteristics.