# community-help
j
Hi! I'm trying to limit the number of hits in a semantic search and I'm running into issues:
• If I set the `k` param and then refine the search results using a filter, the facet counts stop making sense (which I understand, since `k` is limiting the number of hits for the filtered query, not the original query). This behavior can be seen here: https://hn-comments-search.typesense.org. Just search for something and then apply a filter; you'll see that the facet count changes.
• If I don't set `k`, but set `distance_threshold` together with `per_page`, I only get results for the 1st page (never more than `per_page` hits), regardless of the `distance_threshold` value.
• The two approaches return different hits, so they seem to be ranking them differently for some reason. Why is that? 🤔
I'm on version `28.0.rc29`.
j
When `k` is not explicitly set, `k` defaults to `per_page`, which is why you only ever see `per_page` results.
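A toy model of that defaulting may help explain the second bullet. It assumes (for illustration only, not taken from Typesense's source) that the vector search first keeps the `k` nearest neighbours and only then drops those beyond `distance_threshold`; the function name `semantic_hits` is hypothetical.

```python
# Toy model (an assumption, not Typesense's actual code) of why
# distance_threshold alone never yields more than per_page hits:
# the vector search keeps only the k nearest neighbours, and k falls
# back to per_page when it is not set explicitly.
from typing import List, Optional


def semantic_hits(
    distances: List[float],
    per_page: int,
    distance_threshold: float,
    k: Optional[int] = None,
) -> List[float]:
    """Return the distances that survive top-k selection and the threshold."""
    effective_k = k if k is not None else per_page  # k defaults to per_page
    nearest = sorted(distances)[:effective_k]       # bounded neighbour set
    return [d for d in nearest if d <= distance_threshold]


# 50 hypothetical documents, all well within the threshold.
distances = [i / 100 for i in range(50)]

print(len(semantic_hits(distances, per_page=10, distance_threshold=0.9)))          # 10
print(len(semantic_hits(distances, per_page=10, distance_threshold=0.9, k=1000)))  # 50
```

However loose the threshold, the first call can never exceed `per_page` hits, because the threshold only filters an already bounded neighbour set.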
The behavior you're observing with filtering when `k` is set is unfortunately a limitation of how semantic search and filtering work together. You'll want to set `k` to a value large enough for your dataset that the effects are less pronounced.
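The facet-count mismatch can be reproduced with a few in-memory documents. This is a toy illustration under the assumption that the filter is applied before the `k` nearest neighbours are selected; the documents, `top_k`, and `is_bob` are hypothetical and not part of any Typesense API.

```python
# Toy illustration (not Typesense internals): when the k nearest neighbours
# are chosen *after* filtering, the filtered hits are not a subset of the
# unfiltered top-k, so the facet count shown before filtering can disagree
# with the number of hits after filtering.
from collections import Counter
from typing import Callable, Dict, List


def top_k(docs: List[Dict], k: int,
          keep: Callable[[Dict], bool] = lambda d: True) -> List[Dict]:
    """Apply the filter first, then keep the k documents closest to the query."""
    candidates = [d for d in docs if keep(d)]
    return sorted(candidates, key=lambda d: d["distance"])[:k]


def is_bob(doc: Dict) -> bool:
    return doc["author"] == "bob"


docs = [
    {"id": 1, "distance": 0.10, "author": "alice"},
    {"id": 2, "distance": 0.20, "author": "bob"},
    {"id": 3, "distance": 0.30, "author": "alice"},
    {"id": 4, "distance": 0.40, "author": "bob"},
    {"id": 5, "distance": 0.50, "author": "bob"},
]

# k smaller than the dataset: the facet says "bob (1)", but filtering returns 2 hits.
facets = Counter(d["author"] for d in top_k(docs, k=2))
print(facets["bob"], len(top_k(docs, k=2, keep=is_bob)))  # 1 2

# k covering the whole dataset: facet count and filtered hits agree.
facets = Counter(d["author"] for d in top_k(docs, k=5))
print(facets["bob"], len(top_k(docs, k=5, keep=is_bob)))  # 3 3
```

Raising `k` toward the size of the candidate set is what makes the two numbers converge, which is why a larger `k` makes the effect less pronounced.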
j
Right, but my question is: why is the total number of hits limited by `per_page` when `k` is not defined, instead of being limited by the `distance_threshold`?
Using `k` or `per_page` forces me to limit the results of a semantic search manually, unlike keyword search.
j
Because nearest neighbor search can become unbounded very quickly unless it's grounded in a number of results. We use HNSW under the hood, and it does not provide a way to search on a threshold alone. `k` is also needed for the approximate algorithm to prune results at each hop of the graph traversal, so this is inherent to how HNSW is designed.
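To see where the bound enters, here is a heavily simplified, single-layer sketch of the bounded best-first ("beam") search that HNSW performs on each layer of its graph. It is not Typesense's or any HNSW library's code: the 1-D points, the chain graph, and the names `beam_search` and `ef` (the result-list size, at least `k`) are assumptions for illustration.

```python
# Simplified single-layer sketch of HNSW-style bounded best-first search.
# The kept result list is capped at `ef` (>= k); the stop condition compares
# the next candidate with the *worst kept result*, which only makes sense
# because the list has a fixed size. A pure distance threshold gives no
# such size bound to prune against.
import heapq
from typing import Dict, List, Tuple


def beam_search(graph: Dict[int, List[int]], points: Dict[int, float],
                query: float, entry: int, k: int, ef: int) -> List[Tuple[float, int]]:
    """Return up to k (distance, node) pairs approximately nearest to `query`."""
    def dist(node: int) -> float:
        return abs(points[node] - query)  # 1-D distance for the toy example

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest unexplored node first
    best = [(-dist(entry), entry)]       # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # nothing left to explore can improve the bounded result list
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            nd = dist(nb)
            if len(best) < ef or nd < -best[0][0]:
                heapq.heappush(candidates, (nd, nb))
                heapq.heappush(best, (-nd, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the farthest to keep the list bounded
    return sorted((-d, n) for d, n in best)[:k]


# Ten points on a line, each linked to its immediate neighbours.
points = {i: float(i) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}

result = beam_search(graph, points, query=6.2, entry=0, k=3, ef=3)
print([node for _, node in result])  # [6, 7, 5]
```

Real HNSW adds multiple layers and a greedy descent from the top layer, but the per-layer pruning relies on the same kind of fixed-size result list.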
You can set `k` to a high number like `10000`, with performance being the tradeoff.
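A minimal sketch of search parameters that follow this advice, combining a large `k` with `distance_threshold` inside `vector_query` and paging with `page`/`per_page`. Assumptions: the auto-embedding field is named `embedding` and `0.30` is a placeholder threshold; check both against your schema and the vector search docs for your Typesense version. `semantic_search_params` is a hypothetical helper.

```python
# Build search parameters for a semantic search bounded by a large k and
# trimmed by distance_threshold. Field name "embedding" and the default
# threshold are assumptions; adjust them to your collection.
from typing import Dict, Union


def semantic_search_params(q: str, page: int, per_page: int = 20, k: int = 10000,
                           distance_threshold: float = 0.30) -> Dict[str, Union[str, int]]:
    return {
        "q": q,
        "query_by": "embedding",
        # k caps how many nearest neighbours the vector search may return;
        # distance_threshold then drops neighbours that are too far away.
        "vector_query": f"embedding:([], k: {k}, distance_threshold: {distance_threshold})",
        "page": page,
        "per_page": per_page,
    }


print(semantic_search_params("rust async", page=2)["vector_query"])
# embedding:([], k: 10000, distance_threshold: 0.3)
```

With the official Python client, a dict like this would be passed to `client.collections["<collection>"].documents.search(...)`; pages beyond the first then come from the same bounded neighbour set, at the cost of a more expensive search.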
j
thanks Jason! I'll try that