Issue with Search Duration on Typesense Database
TLDR Robert reported that adding a filter_by constraint to a query on a large Typesense database increased the query time from under 100 ms to about 25 seconds. Kishore Nallan explained that this is caused by the order of operations and promised to look into the issue further. Robert later withdrew his offer to sponsor the improvement because he was moving away from the project.
Jan 17, 2023 (11 months ago)
Robert
02:33 AM
… int64 representing the size of a file in bytes. When doing a regular query with ?q="some string", I get 6 documents back in under 100 ms. If I add filter_by: "size:[0..1048576]" to the query, it now takes 25 seconds to run and returns the same 6 docs (which is expected, since all of them match). Based on this, it feels like Typesense might be performing the filter_by first, taking 25 seconds, and then querying the filtered docs. If that is the case, can I instruct Typesense to perform the ?q query first and then apply filter_by to those results? It seems like that would reduce the search time from 25 seconds to 100 ms or so. Any advice?
Kishore Nallan
02:42 AM
Kishore Nallan
02:44 AM
Robert
02:57 AM
Kishore Nallan
02:58 AM
Robert
03:02 AM
Is there a place (GitHub roadmap or issue) where this performance improvement is tracked? I'd love to keep an eye on it so that when it makes it into an RC I can try it out.
Kishore Nallan
03:04 AM
We don't have an issue open for this on GH. It's part of our internal performance-improvement backlog. If you can create an issue, I'll be sure to link it so that we update it when we have a build to test.
Robert
03:13 AM
Robert
03:14 AM
Kishore Nallan
10:55 AM
For sponsoring our work, we have a way to do that via GitHub. See the one-time tiers here: https://github.com/sponsors/typesense?frequency=one-time
However, for this particular issue, we first have to figure out how to approach it. When there are n tokens in the query, we AND the document IDs associated with each token in our inverted index, and while doing this we simultaneously apply the filter, whose matching IDs are computed beforehand.
Until we finish the AND, we have no way of knowing the size of the results. Likewise, for filtering, until we have identified the document IDs that match a large range filter (which is itself expensive), we have no way of knowing the size of the filtered set.
So the improvement of flipping the order of operations is not straightforward to implement, because we don't know the sizes upfront. We will have to develop an approximate heuristic to detect this scenario early and handle it. Let me get back to you after looking into this in more detail.
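To make the order-of-operations problem concrete, here is a minimal Python sketch of a toy search engine, assuming a dict-based inverted index (token to sorted doc IDs) and a brute-force scan for the numeric range filter. It is not Typesense's implementation; every name, the data, and the QUERY_FIRST_MAX cut-off are hypothetical. filter_first mirrors the order described above (filter IDs built beforehand, then ANDed with the token lists), query_first is the flipped order Robert proposed, and choose_plan is one possible heuristic: an AND can never return more IDs than its shortest posting list, so that length is an upper bound available before any intersection runs.

```python
# Toy model of AND-ing posting lists with a numeric range filter.
# All data, names and the threshold are hypothetical; this is not Typesense code.


def intersect(a, b):
    """Merge-intersect two sorted lists of document IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_all(lists):
    """AND together any number of sorted ID lists (no lists -> no hits)."""
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


def range_filter_ids(sizes, lo, hi):
    """Materialize every doc ID with lo <= size <= hi. A wide range touches
    a large part of the collection, which is the expensive step."""
    return sorted(doc_id for doc_id, size in sizes.items() if lo <= size <= hi)


def filter_first(index, sizes, tokens, lo, hi):
    """Filter IDs are built beforehand, then ANDed with the token lists.
    Assumes at least one query token."""
    lists = [index.get(t, []) for t in tokens]
    lists.append(range_filter_ids(sizes, lo, hi))
    return and_all(lists)


def query_first(index, sizes, tokens, lo, hi):
    """Flipped order: AND the token lists first, then check the filter only
    on the surviving candidates. Assumes at least one query token."""
    candidates = and_all([index.get(t, []) for t in tokens])
    return [d for d in candidates if lo <= sizes[d] <= hi]


QUERY_FIRST_MAX = 1_000  # hypothetical cut-off for the heuristic


def choose_plan(index, tokens):
    """Hypothetical heuristic: the shortest posting list bounds the AND size,
    so a small bound means post-filtering a few candidates is cheap."""
    bound = min((len(index.get(t, [])) for t in tokens), default=0)
    return query_first if bound <= QUERY_FIRST_MAX else filter_first


if __name__ == "__main__":
    # Toy data: 100k files with synthetic sizes in bytes, two short posting lists.
    sizes = {d: (d * 37) % 2_000_000 for d in range(100_000)}
    index = {"quarterly": [10, 20, 30, 40], "report": [5, 10, 20, 30, 99]}
    tokens = ["quarterly", "report"]
    plan = choose_plan(index, tokens)
    print(plan.__name__, plan(index, sizes, tokens, 0, 1_048_576))  # query_first [10, 20, 30]
```

Both plans return the same IDs; they differ only in how much work happens before the AND. With a wide range such as 0..1048576, range_filter_ids walks every document, while query_first checks only the three surviving candidates. As Kishore points out, the hard part in a real engine is deciding when to flip, because the size of the filtered set is unknown until the filter has been evaluated.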
Robert
03:22 PM
Yes, let me know if you determine it can be done, and I'd be glad to sponsor the $5,000 to have it prioritized. Thanks!
Feb 11, 2023 (10 months ago)
Robert
04:28 AM
Kishore Nallan
04:36 AM
Similar Threads
Typesense Capabilities and Troubleshooting Queries
A had issues with refinement lists and analytics in Typesense. Jason provided a possible solution and recommended the analytics widget. They clarified import size limits and helped identify a filter issue in A's query. Upgrade options are in Typesense's roadmap.
Troubleshooting 400 Error When Upgrading Typesense Firestore Extension
Orion experienced a `400` error after updating the Typesense Firestore extension, causing issues with cloud functions. They traced the issue back to a data type conflict in their Typesense collection schema after updating. With help from Jason and Kishore Nallan, they resolved the issue by recreating the collection.
Resolving Typesense Result Issue in Document Collection Queries
Mike was encountering errors when searching for a specific query in their Typesense document collection. Jason suggested it may be due to the `drop_tokens_threshold` setting. There was a misunderstanding, but after further explanation from Jason, Mike understood and decided to continue the conversation via email.
Troubleshooting Typesense Document Import Error
Christopher had trouble importing 2.1M documents into Typesense due to memory errors. Jason clarified the system requirements, explaining the correlation between RAM and dataset size, and ways to tackle the issue. They both also discussed database-like query options.
Handling Kinesis Stream Event Batching with Typesense
Dui had questions about how to handle Kinesis stream events with Typesense. Kishore Nallan suggested using upsert mode for creation/update and differentiating with logical deletion. After various discussions, including identifying and resolving a bug, they agreed to introduce an `emplace` action in Typesense v0.23.