# community-help
a
Hi all, could I get some advice on how to optimize a collection/query? For context, the collection holds about 2.2M documents. As you can tell, there are a number of `facet_by` fields in use, which is one part of the issue, performance-wise. We've managed to overcome this by using the facet sampling parameters. However, some facets are of high cardinality, which means the sample may not include them, leaving users confused. The second problem is the `filter_by`, particularly the `(city:[...])` array filter. This is how we perform authorization at the moment, which we intend to replace with scoped API keys in the near future. Performance is quite bad for large arrays (~50 cities). Any advice on overcoming these issues? Sample query attached.
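Since the attached query isn't reproduced in the thread, here's a minimal sketch of what a query with this shape might look like via typesense-js; the collection name (`listings`), field names, city values, and sampling numbers are all hypothetical, while `facet_sample_percent`/`facet_sample_threshold` are the documented facet sampling parameters in recent Typesense versions:

```ts
import Typesense from 'typesense';

// Hypothetical client setup; host and key are placeholders.
const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'search-only-api-key',
});

async function search() {
  // Query shape described above: several facet_by fields, facet sampling
  // enabled, and an array filter over the allowed cities (~50 in practice).
  return client.collections('listings').documents().search({
    q: '*',
    query_by: 'title,description',
    facet_by: 'city,category,brand,price_range',
    // Compute facet counts from a 10% sample once more than 10k docs match.
    facet_sample_percent: 10,
    facet_sample_threshold: 10000,
    // Authorization-by-filter, abbreviated here.
    filter_by: '(city:[Austin, Boston, Chicago])',
  });
}

// The scoped-API-key replacement mentioned above would embed the filter
// into a derived key, so it can't be tampered with client-side.
const scopedKey = client.keys().generateScopedSearchKey(
  'search-only-api-key',
  { filter_by: 'city:[Austin, Boston, Chicago]' },
);
```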
k
How many cities are there in your dataset in total?
a
Hi Kishore! Around 50 at the moment and growing
k
One idea you can try is to do a `!=` query if the filter-by city list is super large.
Inverting the filter condition might help speed up the query.
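A sketch of that inversion, assuming the list of excluded cities is much shorter than the list of allowed ones (collection, field, and city values are hypothetical):

```ts
import Typesense from 'typesense';

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'search-only-api-key',
});

async function searchWithInvertedFilter() {
  // Instead of OR-ing ~50 allowed cities, exclude the handful that aren't
  // allowed; Typesense supports != against an array of values.
  return client.collections('listings').documents().search({
    q: '*',
    query_by: 'title,description',
    filter_by: 'city:!=[Denver, Miami]',
  });
}
```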
a
I’ll try that, thank you Kishore 🙏🏻
k
You can also try setting `enable_lazy_filter: true` or `enable_lazy_filter: false` to see if that helps.
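One way to A/B the flag is to run the same query both ways and compare `search_time_ms` from the response; again, collection and field names are hypothetical:

```ts
import Typesense from 'typesense';

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'search-only-api-key',
});

async function compareLazyFilter() {
  for (const enable_lazy_filter of [true, false]) {
    const res = await client.collections('listings').documents().search({
      q: '*',
      query_by: 'title,description',
      filter_by: '(city:[Austin, Boston, Chicago])',
      enable_lazy_filter,
    });
    console.log({ enable_lazy_filter, time_ms: res.search_time_ms });
  }
}
```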
a
What are your thoughts on the high-cardinality facets and sampling? We're using InstantSearch and are considering decoupling these filters from being faceted, but it feels like swimming against the current.
k
Filtering on so many values is a bit challenging, because we have to do a really big OR over the documents matched by each individual filter value.
With faceting, sampling is by nature approximate.
We might have to support enabling sampling at a per-field level, which I don't think we do today.
a
That could help; fields with different cardinalities may need different sampling.
And I believe 28.x should have some perf improvements with facets, right?
Any ETA on when it will be available?
k
We have some improvements in 28.x; you can try them out. It's due for release today.