#community-help

Discussing Large JSON File's Latency in Typesense

TLDR Daniel asked about handling a large JSON file in Typesense. Kishore Nallan explained the slowdown is due to wildcard queries hitting and sorting entire records. He suggested enabling caching to improve performance.


Dec 01, 2021 (26 months ago)
Daniel
12:26 PM
The server has 32 vCPUs and 128 GB of RAM; the JSONL file is 3.4 GB with almost 15 million records.
Kishore Nallan
12:30 PM
Daniel, can you please elaborate on what you mean by "first time"?
Daniel
12:38 PM
When the query (q) parameter is *
Daniel
12:38 PM
Instead of having a specific search value
Kishore Nallan
12:40 PM
What version of Typesense are you using? We have made some improvements to the performance of wildcard queries (*) in the 0.22 RC builds (the 0.22 GA release is right around the corner).
Daniel
12:42 PM
The latest RC build that's on Docker (although I'm using it as a DEB package), 0.22.0.rcs41
Kishore Nallan
12:45 PM
Okay then you are using the latest version. On the wildcard query, Typesense is hitting and sorting on the entire 15M records, so that explains the higher latency. We have to do more work to make Typesense use all the cores of a beefier server like this.

In the meantime, on 0.22 we also have a way for you to enable caching, so you can use that to handle this. Set the use_cache=true parameter. You can also set a cache_ttl parameter in seconds (default is 60 seconds) as a scoped API key parameter if you want to cache for a longer duration.
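
As a rough illustration of the caching suggestion above, here is a minimal sketch in Python that builds a wildcard search URL with use_cache and cache_ttl set. The host, the collection name (records), and the query_by field (title) are hypothetical placeholders, not values from this thread. Passing cache_ttl directly as a query parameter is also an assumption made for brevity; the message above describes setting it through a scoped API key.

```python
from urllib.parse import urlencode


def build_cached_search_url(base_url, collection, query_by, cache_ttl=None):
    """Build a Typesense search URL for a wildcard query with caching enabled.

    All argument values are supplied by the caller; nothing here is taken
    from the thread except the parameter names use_cache and cache_ttl.
    """
    params = {
        "q": "*",              # wildcard query: matches every record
        "query_by": query_by,  # hypothetical field name
        "use_cache": "true",   # ask the server to cache the result
    }
    if cache_ttl is not None:
        # Cache lifetime in seconds; the thread states the default is 60.
        params["cache_ttl"] = str(cache_ttl)
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
url = build_cached_search_url("http://localhost:8108", "records", "title", cache_ttl=300)
print(url)
```

The resulting URL would be sent as a GET request with the search key in the X-TYPESENSE-API-KEY header. Only repeated identical queries within the TTL window benefit from the cache; the first wildcard query still has to sort all records.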


Daniel
12:49 PM
Thanks Kishore!
Daniel
12:50 PM
The slowdown on filters is because of the same thing, right? Because the wildcard query is being used?
Kishore Nallan
12:54 PM
Yes, it's a function of having to sort and rank millions of records. That's the trade-off for on-the-fly sorting. In the future we might have to introduce specific indices that help with sorting by using more specialized data structures.
