#community-help

Improving Record Retrieval Speed from Typesense

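TLDR Yoshi sought ways to speed up retrieving a few thousand records from Typesense. Jason recommended the documents/export endpoint, which supports filter_by and returns the full result set without pagination, and confirmed that enabling high availability for the first time causes downtime. He also noted that a constant stream of writes was consuming about 50-60% of the cluster's 2 vCPU capacity, which could be another reason searches were slow.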

Sep 08, 2023 (3 months ago)
Yoshi
04:52 PM
Hi everyone, I am trying to figure out the best way to return a few thousand records from Typesense.

I originally started with a for loop of 250 records, but this is very slow, as each request takes about 2-3 seconds.
I then tried parallelizing it, but any kind of parallelization (even 2 queries at a time) results in Node timing out.
Federated queries also time out when trying to return over 250 records.

Anyone have any thoughts on how this can be improved?
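
For reference, the sequential loop described above might look roughly like the sketch below. It is not Yoshi's actual code: `fetchAllPages` and the injected `searchPage` callback are hypothetical names, and the callback is assumed to resolve to a Typesense-style search response with a `hits` array whose items carry a `document`. Each iteration is one round trip, so at 2-3 seconds per request a few thousand records take many seconds in total.

```javascript
// Sequentially fetches pages until a short page is returned or maxRecords is reached.
// `searchPage` is a hypothetical callback: ({ page, per_page }) => Promise<{ hits: [{ document }] }>
async function fetchAllPages(searchPage, { perPage = 250, maxRecords = Infinity } = {}) {
  const records = [];
  let page = 1;
  while (records.length < maxRecords) {
    const result = await searchPage({ page, per_page: perPage });
    const hits = result.hits ?? [];
    for (const hit of hits) {
      records.push(hit.document);
    }
    if (hits.length < perPage) break; // a short page means there is nothing left
    page += 1;
  }
  // Trim any overshoot from the final page.
  return records.slice(0, maxRecords);
}
```

The callback keeps the sketch independent of any particular client library; in practice it would wrap whatever search call the application already makes.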
Jason
05:01 PM
May I know how many records the collection has, and the total number of records matched by the query (the found field in the API response)?
Jason
05:01 PM
Could you also share the search query with all the search parameters you're using?
Yoshi
05:08 PM
total found is ~539k, similar to how many total records there are in the collection.

Here is our query:
{
  q: '*',
  filter_by: 'address_state:!=[`AL`]',
  page: 1,
  per_page: 25,
  query_by: 'company_name, contact_names, phone_numbers, website, organization_company_statuses',
  sort_by: 'number_of_enriched_fields:desc,company_display_name:asc',
  typo_tokens_threshold: 0,
  drop_tokens_threshold: 0
}
Yoshi
05:19 PM
We're going to try upgrading to high availability and seeing if that helps (also think we need it in production anyways). If we do upgrade from single-node to multi-node, should we do that in off hours because it will bring our services down?
Jason
05:37 PM
Since you're only using filter_by (and not keyword search), you can use the documents/export endpoint which supports filter_by: https://typesense.org/docs/0.25.0/api/documents.html#export-documents
Jason
05:38 PM
This will return the full dataset without pagination
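
A minimal sketch of calling the export endpoint from Node 18+ (which ships a global `fetch`) is shown below. The host, collection name, and API key are placeholders, and the helper names `buildExportUrl`, `parseJsonl`, and `exportDocuments` are made up for illustration. The export response is JSON Lines, one document per line, so it is parsed line by line rather than with a single `JSON.parse`.

```javascript
// Builds the export URL for a collection; `params` holds query parameters such as filter_by.
function buildExportUrl(host, collection, params = {}) {
  const url = new URL(
    `/collections/${encodeURIComponent(collection)}/documents/export`,
    host
  );
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value); // values are URL-encoded automatically
  }
  return url.toString();
}

// Parses a JSON Lines payload into an array of objects, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Fetches every document matching `params.filter_by` in a single request.
// Note: this buffers the whole response in memory before parsing.
async function exportDocuments(host, apiKey, collection, params) {
  const response = await fetch(buildExportUrl(host, collection, params), {
    headers: { "X-TYPESENSE-API-KEY": apiKey },
  });
  if (!response.ok) {
    throw new Error(`Export failed with HTTP ${response.status}`);
  }
  return parseJsonl(await response.text());
}
```

A call might look like `exportDocuments("https://typesense.example.com", apiKey, "companies", { filter_by })`, passing the same filter_by string as in the search query above. With hundreds of thousands of matching documents, buffering the full response may use a lot of memory, so a streaming reader could be a better fit.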
Jason
05:39 PM
> if we do upgrade from single-node to multi-node, should we do that in off hours because it will bring our services down?
Correct. The first time you enable HA, you will experience downtime
Yoshi
05:48 PM
ok, thanks Jason. this is helpful.

It looks like export does not have any way to limit or paginate, is that correct? it will always just return the full dataset?
Jason
05:49 PM
That's correct
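
Since the endpoint itself offers no limit, one possible workaround is to cap the result on the client by reading the JSON Lines stream and stopping once enough documents have arrived. The sketch below is an illustration under assumptions, not something suggested in the thread: `linesFrom` and `takeDocuments` are hypothetical helpers, and `chunks` is assumed to be any async iterable of text chunks. It makes no attempt to reproduce the sort_by ordering of the original search.

```javascript
// Splits an async stream of text chunks into complete lines.
async function* linesFrom(chunks) {
  let buffer = "";
  for await (const chunk of chunks) {
    buffer += chunk;
    let newlineIndex;
    while ((newlineIndex = buffer.indexOf("\n")) !== -1) {
      yield buffer.slice(0, newlineIndex);
      buffer = buffer.slice(newlineIndex + 1);
    }
  }
  // Emit a final line that has no trailing newline.
  if (buffer.trim() !== "") yield buffer;
}

// Collects at most `limit` parsed documents, then stops consuming the stream.
async function takeDocuments(chunks, limit) {
  const documents = [];
  if (limit <= 0) return documents;
  for await (const line of linesFrom(chunks)) {
    if (line.trim() === "") continue;
    documents.push(JSON.parse(line));
    if (documents.length >= limit) break; // exiting the loop closes the generator
  }
  return documents;
}
```

In Node 18+, a fetch response body piped through `TextDecoderStream` should be usable as the `chunks` argument, though that wiring is an assumption worth verifying against the runtime in use.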
Jason
05:51 PM
On a side note, it looks like there's a constant stream of writes coming into your cluster, which is consuming about 50-60% of the available 2vCPU capacity. That could be another reason that searches are a bit slow.
Yoshi
06:17 PM
got it. Yea, we're constantly updating our data set... I'll play around with that as well. thanks for all of the help, we sincerely appreciate it
