# community-help
l
Hey everyone. We've been doing a lot of work recently to try to improve the speed of our requests, as they are unfortunately a little on the slow side, and we've become a bit stuck with numerical filtering. We have a price filter which could be any value from 0 to 100_000_000, and in our UI we use a range slider to let the user select what prices they would like to see. We then form a filter like `filter_by: price:[100..500000]`, as an example. We have about 3.5 million records, and on average this search takes 1 second to complete. Our current ideas are to try all of the following:
• Adding `range_index` to the schema (see the sketch after this message)
  ◦ We've done this for part of our data set and so far we are only seeing a very small improvement
• Increase RAM
• Move to run in a VM
• Move to Typesense Cloud
If all of that doesn't help, then we have thought about turning it into a string-based filter with a set of ranges, so something priced at 10 would be assigned `priceFilter: '0'` (range 0 to 100) and something priced at 12000 would get `priceFilter: '10000'` (range 10_000 to 20_000). We would then turn this into a filter like so: `filter_by: (priceFilter:='0' || priceFilter:='100' || priceFilter:='1000' || priceFilter:='10000')`. We would like to avoid doing this, though, as it would be a huge undertaking to apply this new `priceFilter` to all 3.5 million records. If there is a better way, or something that we are missing, I would really appreciate some advice. Thanks in advance.
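(For reference, a rough sketch of what enabling `range_index` on the two price estimate fields could look like, using Typesense's alter-schema endpoint. The host, field types and facet flags below are assumptions and should be taken from the real schema; changing a field's properties means dropping and re-adding it, which re-indexes those fields across all 3.5 million documents.)

# Hypothetical sketch -- field types and flags are assumptions, match them to the existing schema
curl 'https://<typesense-host>:8108/collections/lots' \
  -X PATCH \
  -H 'x-typesense-api-key: ***' \
  --data-raw '{
   "fields":[
      {"name":"price.GBPLowEstimate","drop":true},
      {"name":"price.GBPLowEstimate","type":"float","facet":true,"range_index":true},
      {"name":"price.GBPHighEstimate","drop":true},
      {"name":"price.GBPHighEstimate","type":"float","facet":true,"range_index":true}
   ]
}'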
k
I'm surprised that adding `range_index` does not help. Could it be that there are other parameters in the search query which are actually slowing it down (or maybe `range_index` in combination with those becomes ineffective)?
Would you be able to share a sample query that's slow here?
l
We found about a 40ms improvement from `range_index`, although we're not completely finished with our testing of that yet.
As an example, this is a query without any price filter, and it took 97ms:
curl '***/search/multi_search?use_cache=true' \
  -H 'x-typesense-api-key: ***' \
  --data-raw '{
   "searches":[
      {
         "collection":"lots",
         "exclude_fields":"footnotes,catalogDesc",
         "filter_by":"(department.name:=Collector Cars) && (status:=NEW)",
         "facet_by":"brand,country.name,country.code,department.code,department.name,flags.isWithoutReserve,price.GBPHighEstimate,price.GBPLowEstimate,price.estimateHigh,price.estimateLow,status,groups",
         "query_by":"title",
         "sort_by":"price.GBPLowEstimate:desc",
         "page":1,
         "per_page":48,
         "max_facet_values":1000,
         "q":""
      },
      {
         "collection":"lots",
         "exclude_fields":"footnotes,catalogDesc",
         "filter_by":"(status:=NEW)",
         "facet_by":"department.name",
         "query_by":"title",
         "sort_by":"price.GBPLowEstimate:desc",
         "page":0,
         "per_page":0,
         "max_facet_values":1000,
         "q":""
      },
      {
         "collection":"lots",
         "exclude_fields":"footnotes,catalogDesc",
         "filter_by":"(department.name:=Collector Cars)",
         "facet_by":"status",
         "query_by":"title",
         "sort_by":"price.GBPLowEstimate:desc",
         "page":0,
         "per_page":0,
         "max_facet_values":1000,
         "q":""
      }
   ]
}'
And here is an example with a price filter, which took 1470ms:
curl '***/search/multi_search?use_cache=true' \
  -H 'x-typesense-api-key: ***' \
  --data-raw '{
   "searches":[
      {
         "collection":"lots",
         "exclude_fields":"footnotes,catalogDesc",
         "filter_by":"(price.GBPLowEstimate:[0..276797] || price.GBPHighEstimate:[0..276797]) && (department.name:=Collector Cars) && (status:=NEW)",
         "facet_by":"brand,country.name,country.code,department.code,department.name,flags.isWithoutReserve,price.GBPHighEstimate,price.GBPLowEstimate,price.estimateHigh,price.estimateLow,status,groups",
         "query_by":"title",
         "sort_by":"price.GBPLowEstimate:desc",
         "page":1,
         "per_page":48,
         "max_facet_values":1000,
         "q":""
      },
      {
         "collection":"lots",
         "exclude_fields":"footnotes,catalogDesc",
         "filter_by":"(price.GBPLowEstimate:[0..276797] || price.GBPHighEstimate:[0..276797]) && (status:=NEW)",
         "facet_by":"department.name",
         "query_by":"title",
         "sort_by":"price.GBPLowEstimate:desc",
         "page":0,
         "per_page":0,
         "max_facet_values":1000,
         "q":""
      },
      {
         "collection":"lots",
         "exclude_fields":"footnotes,catalogDesc",
         "filter_by":"(price.GBPLowEstimate:[0..276797] || price.GBPHighEstimate:[0..276797]) && (department.name:=Collector Cars)",
         "facet_by":"status",
         "query_by":"title",
         "sort_by":"price.GBPLowEstimate:desc",
         "page":0,
         "per_page":0,
         "max_facet_values":1000,
         "q":""
      }
   ]
}'
k
"max_facet_values":1000,
Is probably taking the most time here.
Also,
Copy code
(price.GBPLowEstimate:[0..276797] || price.GBPHighEstimate:[0..276797])
Is probably something that query optimizer is not equipped to handle. ORing of two range queries.
This can be a pretty intensive operation because prices can be so sparse.
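(One way to check whether ORing the two ranges is itself the expensive part: run the same first search with only one of the range clauses in `filter_by` and compare the timing against the 1470ms above. Everything other than `filter_by` below is copied from the earlier query.)

# Same as the first search above, but with a single range clause instead of the OR
curl '***/search/multi_search?use_cache=true' \
  -H 'x-typesense-api-key: ***' \
  --data-raw '{
   "searches":[
      {
         "collection":"lots",
         "exclude_fields":"footnotes,catalogDesc",
         "filter_by":"(price.GBPLowEstimate:[0..276797]) && (department.name:=Collector Cars) && (status:=NEW)",
         "facet_by":"brand,country.name,country.code,department.code,department.name,flags.isWithoutReserve,price.GBPHighEstimate,price.GBPLowEstimate,price.estimateHigh,price.estimateLow,status,groups",
         "query_by":"title",
         "sort_by":"price.GBPLowEstimate:desc",
         "page":1,
         "per_page":48,
         "max_facet_values":1000,
         "q":""
      }
   ]
}'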
l
Yeah, and unfortunately, as we are an auction house, we have a low estimate price and a high estimate price, which means we have to search on both to make sure we include the correct results for the customer.
Could you explain a little more about `max_facet_values`, please, and what we could do to improve there?
k
With `max_facet_values`, large values like 1000 can generally hurt if the facets are high-cardinality in nature. Try reducing it to, say, 10 (see the example after this message) and check what response time you get.
Also, these queries are running sequentially, i.e. every query in a multi_search is processed one after the other. Perhaps we can add an option to run them in parallel; however, you would need a lot of CPU cores if you already have a high concurrency of searches happening across users.
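(As a concrete example of that suggestion, here is the first search from above with the facet limit dropped from 1000 to 10; nothing else is changed.)

curl '***/search/multi_search?use_cache=true' \
  -H 'x-typesense-api-key: ***' \
  --data-raw '{
   "searches":[
      {
         "collection":"lots",
         "exclude_fields":"footnotes,catalogDesc",
         "filter_by":"(price.GBPLowEstimate:[0..276797] || price.GBPHighEstimate:[0..276797]) && (department.name:=Collector Cars) && (status:=NEW)",
         "facet_by":"brand,country.name,country.code,department.code,department.name,flags.isWithoutReserve,price.GBPHighEstimate,price.GBPLowEstimate,price.estimateHigh,price.estimateLow,status,groups",
         "query_by":"title",
         "sort_by":"price.GBPLowEstimate:desc",
         "page":1,
         "per_page":48,
         "max_facet_values":10,
         "q":""
      }
   ]
}'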
l
I just did a curl with it reduced to 10 and it's come down by approximately 200ms, but the request is still about 1200ms on average.
We have a very large user base; my team is looking to bump the RAM in each pod to 32GB.
I'm not sure about the CPU cores.
k
CPU cores are what you need, not RAM.
l
Is there a better way to structure the query at all? We have tried ordering things differently, but it doesn't seem to help.
k
I will run it by someone and see if we can organize this query more efficiently.
l
Thank you, we'd really appreciate that.