We've implemented typesense and are building advanced search typesense #community-help

David Jones

10/17/2023, 3:35 PM

We've implemented Typesense and are building "advanced search" tooling. Our system returns the IDs of the documents that match the search criteria. To do advanced search (logic against multiple fields), we accept a number of fields (e.g. name=David, company=Typesense), issue a multi-search with one search per non-null field, and then take the intersection of the returned result sets. However, the result sets are capped at 250 items per page, so if one set has more than 250 results, a document can sit at result 251 or later in that set while appearing within the first 250 results of another set. For example, a document that matches strongly on name is #1 in the name search, but it is a weaker match on company and is result 251 there; we would exclude this document unless we paged through all the searches. Our collections are relatively large (60k+) and we're worried that, in effect, there would be hundreds of pages to scan. Is there any recommendation for handling this? Right now we just show a warning that "Too many results returned, try narrowing your search", but this is far from ideal.
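The client-side workflow described in this message can be sketched roughly as below. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable that wraps a single search for one field and returns the document IDs on the requested page. It is not part of the Typesense client API, and the function names are invented for this sketch.

```python
from typing import Callable, Dict, List, Optional, Set

PER_PAGE_MAX = 250  # per-page cap mentioned in the thread

# Hypothetical signature: (field, value, page, per_page) -> document IDs on that page.
FetchPage = Callable[[str, str, int, int], List[str]]


def all_ids(fetch_page: FetchPage, field: str, value: str,
            per_page: int = PER_PAGE_MAX) -> Set[str]:
    """Page through one field's search until a short page signals the end."""
    ids: Set[str] = set()
    page = 1
    while True:
        batch = fetch_page(field, value, page, per_page)
        ids.update(batch)
        if len(batch) < per_page:
            return ids
        page += 1


def advanced_search(fetch_page: FetchPage,
                    criteria: Dict[str, Optional[str]],
                    per_page: int = PER_PAGE_MAX) -> Set[str]:
    """Intersect the complete ID sets of every non-null field."""
    result: Optional[Set[str]] = None
    for field, value in criteria.items():
        if value is None:
            continue  # only non-null fields are searched
        ids = all_ids(fetch_page, field, value, per_page)
        result = ids if result is None else result & ids
        if not result:
            break  # an empty intersection cannot grow again
    return result or set()
```

Intersecting only the first page of each search is what loses the document ranked 251st or later; paging every search to the end avoids that, at the cost of the many requests the message is worried about.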

David Jones

10/17/2023, 3:52 PM

We're considering building from source, increasing the per_page max, and enforcing reasonable search queries on the application side. Is there any danger in this? I don't see exactly why 250 was chosen, other than that it's not too many and not too few.

Kishore Nallan

10/17/2023, 4:21 PM

There's a cost to be paid for sorting, so a larger per-page number is going to take longer. We've debated increasing this number, but it has a lot of potential for abuse. Perhaps we could bring in a command-line flag that people can enable to increase this limit.

David Jones

10/17/2023, 4:25 PM

That would be very helpful. In cases like ours, an advanced search operation is understood to take longer in exchange for a more "accurate" result, so I think the tradeoff is acceptable. The ideal solution would be to run the intersection server-side before sorting, since we are only interested in the intersection of the multi-search results and don't need to spend any computing time sorting results that will be filtered out client-side.
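The idea of intersecting before sorting can be illustrated with a small, purely conceptual sketch. It does not reflect how Typesense works internally. It assumes each field search yields a hypothetical mapping from document ID to a match score, and it ranks survivors by the sum of their scores, which is an arbitrary choice made only for this example.

```python
from typing import Dict, List


def intersect_then_sort(per_field_scores: List[Dict[str, float]]) -> List[str]:
    """Keep only IDs matched by every field, then rank just those survivors."""
    if not per_field_scores:
        return []
    # Start from the smallest match set so the intersection shrinks quickly.
    ordered = sorted(per_field_scores, key=len)
    survivors = set(ordered[0])
    for scores in ordered[1:]:
        survivors.intersection_update(scores)  # iterating a dict yields its keys
        if not survivors:
            return []
    # Sorting cost is paid only for documents that survived the intersection.
    # Ranking by summed score (ties broken by ID) is an assumption of this sketch.
    return sorted(
        survivors,
        key=lambda doc_id: (-sum(s[doc_id] for s in per_field_scores), doc_id),
    )
```

With this ordering of steps, documents that match only some of the fields are never sorted at all, which is the saving the message is asking for.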

David Jones

10/17/2023, 4:25 PM

If I were to open a GitHub issue for that feature, would it be considered if there was enough interest?

Kishore Nallan

10/17/2023, 4:35 PM

Yes, definitely. Please create one. Meanwhile, we can look into lifting the limits via a parameter.
