Issue with Search Duration on Typesense Database
TLDR Robert reported a large increase in query time after adding a `filter_by` constraint on a large Typesense database. Kishore Nallan explained that this happens because of the order of operations and promised to look into the issue further. Robert later withdrew his offer to sponsor the improvement because he was leaving the project.
Jan 17, 2023 (11 months ago)
`size` of a file in bytes. When doing a regular query with `?q="some string"` I get `6 documents` back in under `100 ms`. If I add `filter_by : "size:[0..1048576]"` to the query, it now takes `25 seconds` to run and returns the same 6 docs (which is expected, all match). Based on this, it feels like Typesense might be performing the `filter_by` first, taking 25 seconds, and then querying the filtered docs. If that is the case, can I instruct Typesense to perform the query `?q` first and then `filter_by` from those results? Seems like it would reduce the search time from 25 seconds to `100ms` or so. Any advice?
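For readers who want to reproduce the comparison, here is a minimal sketch of the two requests described above. It only builds the search URLs and does not send them. The host, the collection name `files`, and the `query_by` field `name` are assumptions for illustration and do not come from the thread; only `q` and the `filter_by` value are taken from the message.

```python
from urllib.parse import urlencode

# Assumptions (not from the thread): local server on Typesense's default port,
# a hypothetical collection called "files", and a hypothetical text field "name".
BASE_URL = "http://localhost:8108"
COLLECTION = "files"


def build_search_url(base_url: str, collection: str, params: dict) -> str:
    """Build a URL for Typesense's documents/search endpoint."""
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


# Plain keyword query (the fast case in the report).
plain = build_search_url(
    BASE_URL, COLLECTION, {"q": "some string", "query_by": "name"}
)

# Same query plus the numeric range filter (the slow case in the report).
filtered = build_search_url(
    BASE_URL,
    COLLECTION,
    {"q": "some string", "query_by": "name", "filter_by": "size:[0..1048576]"},
)

print(plain)
print(filtered)
```

To run the requests, send each URL with an HTTP client and pass the API key in the `X-TYPESENSE-API-KEY` header, then compare the response times.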
Kishore Nallan 02:42 AM
Kishore Nallan 02:44 AM
Kishore Nallan 02:58 AM
Is there a place (github roadmap or issue) that this performance improvement is tracked at? I'd love to keep an eye on it so that when it makes it into an RC I can try it out.
Kishore Nallan 03:04 AM
We don't have an issue open for this on GH. It's part of our internal performance improvement backlog. If you can create an issue, I'll be sure to link the GH issue so we update it when we've a build to test.
Kishore Nallan 10:55 AM
For sponsoring our work, we have a way to do that via Github. See the one-time tiers here: https://github.com/sponsors/typesense?frequency=one-time
However, for this particular issue, we have to first figure out how to approach it. When there are `n` tokens in the query, we `AND` the document IDs associated with each token in our inverted index, and when we do this operation we simultaneously apply the filter, which is done beforehand.
Until we finish the `AND` we have no way of knowing the size of the results. Likewise for filtering: until we have identified the document IDs that match a large range filter (which itself is expensive), we have no way of knowing the size of the filtered set.
So the improvement of flipping the order of operation is not that straightforward to implement because we don't know the sizes upfront. We will have to develop some approximate heuristic to detect this scenario early and handle it. Let me get back to you after looking into this in more detail.
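The trade-off described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Typesense's implementation: the in-memory `index` and `sizes` dictionaries, the function names, and the "checks" counter are all hypothetical, and the filter is evaluated by a naive scan. It only shows why evaluating a broad range filter up front can dominate the cost when the keyword match is tiny.

```python
from typing import Dict, List, Set, Tuple


def intersect_postings(index: Dict[str, List[int]], tokens: List[str]) -> Set[int]:
    """AND the document IDs of every query token, smallest posting list first."""
    postings = sorted((set(index.get(t, [])) for t in tokens), key=len)
    if not postings:
        return set()
    result = postings[0]
    for p in postings[1:]:
        result = result & p
    return result


def filter_first(index, tokens, sizes, lo, hi) -> Tuple[Set[int], int]:
    """Evaluate the range filter over every document, then AND with the tokens."""
    filtered = {doc_id for doc_id, size in sizes.items() if lo <= size <= hi}
    return intersect_postings(index, tokens) & filtered, len(sizes)


def query_first(index, tokens, sizes, lo, hi) -> Tuple[Set[int], int]:
    """AND the tokens first, then check the filter only on the candidates."""
    candidates = intersect_postings(index, tokens)
    matches = {d for d in candidates if lo <= sizes[d] <= hi}
    return matches, len(candidates)


# Toy data (hypothetical): 100,000 documents, all inside the filter range.
sizes = {doc_id: doc_id % 2048 for doc_id in range(100_000)}
index = {
    "some": list(range(0, 100_000, 10)),
    "string": [0, 10, 20, 30, 40, 50, 7, 13],
}

a, a_checks = filter_first(index, ["some", "string"], sizes, 0, 1_048_576)
b, b_checks = query_first(index, ["some", "string"], sizes, 0, 1_048_576)
print(len(a), a_checks)  # same matches, filter evaluated on every document
print(len(b), b_checks)  # same matches, filter evaluated on candidates only
```

In this toy model both strategies return the same documents, but the number of filter evaluations differs by orders of magnitude. The catch raised in the message still applies: the model can see which side is smaller only after doing the work, whereas a real engine would need a cheap estimate before choosing an order.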
Yes, let me know if you determine it can be done and I'd be glad to sponsor the $5,000 to have it be prioritized. Thanks!
Feb 11, 2023 (10 months ago)
Kishore Nallan 04:36 AM
Typesense Capabilities and Troubleshooting Queries
A had issues with refinement lists and analytics in Typesense. Jason provided a possible solution and recommended the analytics widget. They clarified import size limits and helped identify a filter issue in A's query. Upgrade options are in Typesense's roadmap.
Troubleshooting 400 Error When Upgrading Typesense Firestore Extension
Orion experienced a `400` error after updating the Typesense Firestore extension, causing issues with cloud functions. They traced the issue back to a data type conflict in their Typesense collection schema after updating. With help from Jason and Kishore Nallan, they resolved the issue by recreating the collection.
Resolving Typesense Result Issue in Document Collection Queries
Mike was encountering errors when searching for a specific query in their Typesense document collection. Jason suggested it might be due to the `drop_tokens_threshold` setting. There was a misunderstanding, but after further explanation from Jason, Mike understood and decided to continue the conversation via email.
Troubleshooting Typesense Document Import Error
Christopher had trouble importing 2.1M documents into Typesense due to memory errors. Jason clarified the system requirements, explaining the correlation between RAM and dataset size, and ways to tackle the issue. They both also discussed database-like query options.
Handling Kinesis Stream Event Batching with Typesense
Dui had questions about how to handle Kinesis stream events with Typesense. Kishore Nallan suggested using upsert mode for creation/update and differentiating with logical deletion. After further discussion, including identifying and resolving a bug, they decided to introduce an `emplace` action in Typesense v0.23.