# community-help
a
Hi all, could I get some advice on how to optimize a collection/query? For context, the collection holds about 2.2M documents. As you can tell, there are a number of `facet_by` fields in use, which is one part of the issue, performance-wise. We've managed to overcome this by using the facet sampling parameters. However, some facets are of high cardinality, which means the sample may not include them, leaving users confused. The second problem is the `filter_by`, particularly the `(city:[...])` array filter. This is how we perform authorization at the moment, which we intend to replace with scoped API keys in the near future. Performance is quite bad for large arrays (~50 cities). Any advice on overcoming these issues? Sample query attached.
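Since the attached query isn't reproduced in the thread, here's a minimal sketch of what a query with this shape might look like via typesense-js; the collection name (`listings`), field names, city values, and sampling numbers are all hypothetical, while `facet_sample_percent`/`facet_sample_threshold` are the documented facet sampling parameters in recent Typesense versions:

```ts
import Typesense from 'typesense';

// Hypothetical client setup; host and key are placeholders.
const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'search-only-api-key',
});

async function search() {
  // Query shape described above: several facet_by fields, facet sampling
  // enabled, and an array filter over the allowed cities (~50 in practice).
  return client.collections('listings').documents().search({
    q: '*',
    query_by: 'title,description',
    facet_by: 'city,category,brand,price_range',
    // Compute facet counts from a 10% sample once more than 10k docs match.
    facet_sample_percent: 10,
    facet_sample_threshold: 10000,
    // Authorization-by-filter, abbreviated here.
    filter_by: '(city:[Austin, Boston, Chicago])',
  });
}

// The scoped-API-key replacement mentioned above would embed the filter
// into a derived key, so it can't be tampered with client-side.
const scopedKey = client.keys().generateScopedSearchKey(
  'search-only-api-key',
  { filter_by: 'city:[Austin, Boston, Chicago]' },
);
```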
k
How many cities are there in your dataset in total?
a
Hi Kishore! Around 50 at the moment and growing
k
One idea you can try is to do a `!=` query if the filter-by city list is super large.
Inverting the filter condition might help speed up the query.
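A sketch of that inversion, assuming the list of excluded cities is much shorter than the list of allowed ones (collection, field, and city values are hypothetical):

```ts
import Typesense from 'typesense';

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'search-only-api-key',
});

async function searchWithInvertedFilter() {
  // Instead of OR-ing ~50 allowed cities, exclude the handful that aren't
  // allowed; Typesense supports != against an array of values.
  return client.collections('listings').documents().search({
    q: '*',
    query_by: 'title,description',
    filter_by: 'city:!=[Denver, Miami]',
  });
}
```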
a
I’ll try that, thank you Kishore 🙏🏻
k
You can also try setting `enable_lazy_filter: true` or `enable_lazy_filter: false` to see if that helps.
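One way to A/B the flag is to run the same query both ways and compare `search_time_ms` from the response; again, collection and field names are hypothetical:

```ts
import Typesense from 'typesense';

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'search-only-api-key',
});

async function compareLazyFilter() {
  for (const enable_lazy_filter of [true, false]) {
    const res = await client.collections('listings').documents().search({
      q: '*',
      query_by: 'title,description',
      filter_by: '(city:[Austin, Boston, Chicago])',
      enable_lazy_filter,
    });
    console.log({ enable_lazy_filter, time_ms: res.search_time_ms });
  }
}
```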
a
What are your thoughts on the high-cardinality facets and sampling? We're using InstantSearch and are considering decoupling these filters from being faceted, but it feels like swimming against the current.
k
Filtering on so many values is a bit challenging, because we have to do a really big OR over the documents matched by each individual filter value.
With faceting, sampling is by nature approximate.
We might have to support enabling sampling at a per-field level, which I don't think we do today.
a
That could help; fields with different cardinalities may need different sampling.
And I believe 28.x should have some perf improvements with facets, right?
Any ETA on when it will be available?
k
We have some improvements in 28.x; you can try them out. It's due for release today.