#community-help

Issues and Improvements in Typesense with 14 Million Records

TLDR Miguel experienced performance issues when using Typesense with a large dataset (about 14 million records). Jason pointed to performance and memory improvements made to Typesense since then and to specific search parameters (server-side caching and a search cutoff threshold) for handling such workloads. Miguel agreed to try again.

Solved
Jun 07, 2022 (17 months ago)
Miguel
10:00 PM
A few months ago (about 8) I was trying Typesense
Jason
10:01 PM
Could you post in this thread?
Miguel
10:02 PM
But I faced a few setbacks with a large amount of data: about 14 million records with a size of 3.6GB
Miguel
10:02 PM
When I used the filters the response time increased from 100-300ms to 2-7 seconds
Miguel
10:03 PM
And what crashed the request was when I tried to search through the filters
Miguel
10:04 PM
Now I'm wondering if this is still the case, since the alternative, Meilisearch, didn't have such a thing
Miguel
10:04 PM
And I had to build my own in the frontend, kinda like a hack which ultimately didn't serve my purpose
Jason
10:05 PM
Interesting, I'm surprised to hear about the crash... But that could happen if there wasn't enough RAM and the OS killed the process.
Jason
10:06 PM
For example, here's a larger dataset (32M records, ~10GB on disk) with filters, search as you type, etc: https://songs-search.typesense.org/
Jason
10:06 PM
In any case, we have made some improvements to performance and memory consumption in the last several months. So I'd recommend giving v0.23.0 a shot
Miguel
10:07 PM
Yeah, I mean it wasn't like a crash, the request just timed out after a while (30-40 seconds). I tried with a great server (64 GB of RAM)
Jason
10:08 PM
Did you do a wild-card query (q=*) by any chance?
Miguel
10:08 PM
Okay, I will try again, although the process for indexing the data is long due to the size of it, so the feedback will come later :D
Miguel
10:08 PM
Yes, that was the problem now that I remember
Miguel
10:08 PM
You told me that haha

Miguel
10:09 PM
That's why the data took a long time to load the first time
Jason
10:09 PM
Ah yes, wildcard queries are very CPU intensive indeed. We made some minor improvements there, but you'll still see issues with tens of millions of records
Jason
10:10 PM
Two new features we added recently to help with this are server-side caching and a server-side search cutoff threshold. If a search query exceeds your defined threshold, it will return the results found so far
Miguel
10:11 PM
Nice, I will definitely try it again
Miguel
10:12 PM
Will report back with feedback soon

Jason
10:12 PM
The parameters are called search_cutoff_ms, use_cache and cache_ttl, and they are documented in the tables here: https://typesense.org/docs/0.23.0/api/documents.html#search-parameters
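
For anyone landing on this thread, here is a rough sketch of how those three parameters might be attached to a search request. It only builds the request against Typesense's `GET /collections/<name>/documents/search` endpoint and does not send it. The host, API key, collection name (`records`), and the `title` and `year` fields are made-up placeholders, and the cutoff and TTL values are arbitrary, so adjust them to your own setup.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, collection, api_key, params):
    """Build (but do not send) a GET request for the document search endpoint."""
    query = urlencode(params)
    url = f"{base_url}/collections/{collection}/documents/search?{query}"
    # Typesense reads the API key from the X-TYPESENSE-API-KEY header.
    return Request(url, headers={"X-TYPESENSE-API-KEY": api_key})


# Hypothetical values: collection "records" with fields "title" and "year".
params = {
    "q": "*",                   # wildcard query, the expensive case discussed above
    "query_by": "title",
    "filter_by": "year:>2000",
    "search_cutoff_ms": 200,    # return the results found so far after ~200 ms
    "use_cache": "true",        # turn on server-side caching for this query
    "cache_ttl": 60,            # keep the cached response for 60 seconds
}

request = build_search_request("http://localhost:8108", "records", "xyz", params)
print(request.full_url)
```

To actually run the search, pass `request` to `urllib.request.urlopen` against a running Typesense server, or hand the same `params` dictionary to an official Typesense client library.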