#community-help

Issue with Search Duration on Typesense Database

TLDR Robert reported that queries slow down sharply when a filter_by constraint is added on a large Typesense database. Kishore Nallan explained that this is due to the order of operations and promised to look into the issue further. Robert later withdrew his offer to sponsor the improvement because he was moving on from the project.

Jan 17, 2023 (11 months ago)
Robert
02:33 AM
Got a DB with 251 million docs. The fields are metadata about files, including an int64 representing the size of a file in bytes. When doing a regular query with a ?q="some string" I get 6 documents back in under 100 ms. If I add filter_by : "size:[0..1048576]" to the query, it now takes 25 seconds to run and returns the same 6 docs (which is expected, all match). Based on this, it feels like Typesense might be performing the filter_by first, taking 25 seconds, and then querying the filtered docs. If that is the case, can I instruct Typesense to perform the query ?q first and then filter_by from those results? Seems like it would reduce the search time from 25 seconds to 100 ms or so. Any advice?
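For readers following along, the two requests being compared can be sketched as below. This is only an illustration: the collection name `files`, the `query_by` field `filename`, and the host and port are assumptions that do not appear in the thread; only the `q` value and the `filter_by` expression come from Robert's message. The code builds the URLs and does not contact a server.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8108"  # assumption: a local Typesense node
COLLECTION = "files"                # hypothetical collection name


def search_url(q, query_by, filter_by=None):
    """Build a Typesense search URL; filter_by is added only when given."""
    params = {"q": q, "query_by": query_by}
    if filter_by is not None:
        params["filter_by"] = filter_by
    return f"{BASE_URL}/collections/{COLLECTION}/documents/search?{urlencode(params)}"


# Query without the filter (reported as under 100 ms).
fast = search_url("some string", "filename")  # "filename" is a hypothetical field

# Same query plus the numeric range filter (reported as about 25 seconds).
slow = search_url("some string", "filename", filter_by="size:[0..1048576]")

print(fast)
print(slow)
```

Both requests return the same 6 documents in Robert's case; the only difference is the added `filter_by` parameter.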
Kishore Nallan
02:42 AM
Currently, filtering is done before querying: the set of documents matching the filter is computed first, and that set is then applied to the query results, which is why this particular inefficiency happens. We don't have a way to change the order of these operations, though we certainly have to see how to address this inefficiency.
Kishore Nallan
02:44 AM
Numerical filters are especially slow because the documents that match a given range must first be identified, and this itself can be expensive for a large, exhaustive range like this.
Robert
02:57 AM
Gotcha. I worry what the search duration for these sorts of queries will be as this DB grows to 500+ million docs in the future.
Kishore Nallan
02:58 AM
We will address this soon. Already on our list.
Robert
03:02 AM
Typesense is a really awesome search engine and has made the project I'm working on (http://discmaster.textfiles.com/search) become a reality. I greatly appreciate the work that you and your team put into this software!
Is there a place (GitHub roadmap or issue) where this performance improvement is tracked? I'd love to keep an eye on it so that when it makes it into an RC I can try it out.
Kishore Nallan
03:04 AM
Thank you!

We don't have an issue open for this on GH. It's part of our internal performance improvement backlog. If you can create an issue, I'll be sure to link the GH issue so we update it when we've a build to test.
Robert
03:13 AM
Cool. I created an issue for it: https://github.com/typesense/typesense/issues/858
Robert
03:14 AM
Lastly, if there was a way to help fund/pay for the development of this improvement, perhaps move it up the backlog a bit, I'd be willing to contribute in that way as well.
Kishore Nallan
10:55 AM
Thanks.

For sponsoring our work, we have a way to do that via GitHub. See the one-time tiers here: https://github.com/sponsors/typesense?frequency=one-time

However, for this particular issue, we have to first figure out how to approach it. When there are n tokens in the query, we AND the document IDs associated with each token in our inverted index, and when we do this operation we simultaneously apply the filter, which has been computed beforehand.

Until we finish the AND, we have no way of knowing the size of the results. Likewise for filtering: until we have identified the document IDs that match a large range filter (which itself is expensive), we have no way of knowing the size of the filtered set.

So the improvement of flipping the order of operation is not that straightforward to implement because we don't know the sizes upfront. We will have to develop some approximate heuristic to detect this scenario early and handle it. Let me get back to you after looking into this in more detail.
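The trade-off Kishore describes can be illustrated with a toy inverted index. The sketch below is not Typesense's implementation: the data, the function names, and the linear scan used to evaluate the range filter are all assumptions made for illustration (a real engine would use a numeric index rather than a scan). It only shows that both orders yield the same hits while producing very different intermediate set sizes.

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of document IDs with a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Toy data (hypothetical): doc ID -> file size in bytes, and token -> posting list.
sizes = {doc_id: doc_id * 1000 for doc_id in range(1, 10001)}
postings = {"some": [5, 42, 900, 7000], "string": [5, 42, 900, 8000]}


def filter_first(tokens, lo, hi):
    """Materialize the whole range-filter set, then AND the token postings into it."""
    filtered = [d for d in sorted(sizes) if lo <= sizes[d] <= hi]
    result = filtered
    for token in tokens:
        result = intersect_sorted(result, postings[token])
    return result, len(filtered)


def query_first(tokens, lo, hi):
    """AND the token postings first, then check the range on the few survivors."""
    result = postings[tokens[0]]
    for token in tokens[1:]:
        result = intersect_sorted(result, postings[token])
    return [d for d in result if lo <= sizes[d] <= hi], len(result)


hits_a, filtered_size = filter_first(["some", "string"], 0, 1048576)
hits_b, candidate_size = query_first(["some", "string"], 0, 1048576)
print(hits_a == hits_b, filtered_size, candidate_size)
```

In this toy case the filter set is far larger than the token intersection, so checking the range on the query results is cheaper. With a rare filter and very common tokens the opposite would hold, and neither size is known before the work is done, which is why a heuristic is needed.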
Robert
03:22 PM
Ahh, that makes sense. I've been a $5/month sponsor for a while, but hadn't noticed the one-time feature sponsorship.
Yes, let me know if you determine it can be done and I'd be glad to sponsor the $5,000 to have it be prioritized. Thanks!
Feb 11, 2023 (10 months ago)
Robert
04:28 AM
Just wanted to give you a heads up that I am moving on from the project that has the massive typesense DB and so I will no longer be looking to sponsor the typesense improvements we’ve chatted about here. Wanted to let you know that I’m no longer interested in doing that, but I greatly appreciate typesense and all the work you and your team have put into it!
Kishore Nallan
04:36 AM
Thank you Robert. What you reported is nevertheless on our roadmap and we will be focusing on that in the next few months. All the best and thanks for all the help!
