#community-help

Discussing Large JSON File's Latency in Typesense

TLDR Daniel asked about handling a large JSON file in Typesense. Kishore Nallan explained the slowdown is due to wildcard queries hitting and sorting entire records. He suggested enabling caching to improve performance.

Dec 01, 2021 (22 months ago)
Daniel
12:26 PM
The server has 32 vCPUs and 128 GB RAM; the JSONL file is 3.4 GB with almost 15 million records
Kishore Nallan
12:30 PM
Daniel, can you please elaborate on what you mean by "first time"?
Daniel
12:38 PM
When the query (q) parameter is *
Daniel
12:38 PM
Instead of having a specific search value
Kishore Nallan
12:40 PM
What version of Typesense are you using? We have made some improvements to the performance of wildcard queries (*) in the 0.22 RC builds (the 0.22 GA release is right around the corner)
Daniel
12:42 PM
The latest RC build that's on Docker (although I'm using it as a DEB package), 0.22.0.rcs41
Kishore Nallan
12:45 PM
Okay then you are using the latest version. On the wildcard query, Typesense is hitting and sorting on the entire 15M records, so that explains the higher latency. We have to do more work to make Typesense use all the cores of a beefier server like this.

In the meantime, on 0.22 we also have a way for you to enable caching, so you can use that to handle this. Set the use_cache=true parameter. You can also set a cache_ttl parameter in seconds (default is 60 seconds) as a scoped API key parameter if you want to cache for a longer duration.
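
For reference, here is a minimal sketch of what a cached wildcard search request could look like. It only builds the request URL for Typesense's search endpoint. The host, the collection name `records`, and the `title` field are hypothetical placeholders. The thread mentions setting cache_ttl through a scoped API key; passing it directly as a search parameter, as done here, is an assumption.

```python
from urllib.parse import urlencode


def build_cached_search_url(base_url, collection, query_by, cache_ttl=None):
    """Build a wildcard search URL with caching enabled.

    base_url, collection and query_by are illustrative values, not taken
    from the thread.
    """
    params = {
        "q": "*",              # wildcard query: matches every record
        "query_by": query_by,  # hypothetical field name
        "use_cache": "true",   # ask the server to cache the result
    }
    if cache_ttl is not None:
        # Cache lifetime in seconds (the thread gives 60 as the default).
        params["cache_ttl"] = str(cache_ttl)
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


url = build_cached_search_url("http://localhost:8108", "records", "title", cache_ttl=300)
print(url)
```

The request itself would still need the API key sent in the X-TYPESENSE-API-KEY header; it is left out here so the sketch runs without a server.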
Daniel
12:49 PM
Thanks Kishore!
Daniel
12:50 PM
The slowdown on filters is because of the same thing, right? Because the wildcard query is being used
Kishore Nallan
12:54 PM
Yes, it's a function of having to sort and rank millions of records. That's the trade-off for on-the-fly sorting. In the future we might have to introduce specific indices that help with sorting by using more specialized data structures.
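
To illustrate the point about on-the-fly sorting, here is a toy sketch of why a wildcard query costs more to rank than a selective one. It is not Typesense's implementation: the `top_k` helper, the random scores, and the document counts are all made up for illustration. The only idea it shows is that the ranking step has to look at every matched record, and a wildcard matches all of them.

```python
import heapq
import random


def top_k(matches, k=10):
    """Return the k highest-scoring documents among the matches.

    The work grows with len(matches), since every match must be examined.
    """
    return heapq.nlargest(k, matches, key=lambda doc: doc["score"])


random.seed(0)
# Hypothetical corpus; scores stand in for whatever the sort criterion is.
docs = [{"id": i, "score": random.random()} for i in range(100_000)]

wildcard_matches = docs                                 # q=* matches every record
narrow_matches = [d for d in docs if d["id"] % 1000 == 0]  # a selective query

top_wildcard = top_k(wildcard_matches)
top_narrow = top_k(narrow_matches)

print(len(wildcard_matches), "records ranked for the wildcard query")
print(len(narrow_matches), "records ranked for the selective query")
```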