Lukas Matejka
02/20/2025, 8:01 PM
_text_match(bucket_size:3) then a bucket of size 3 is created and then sorted by the next criterion. I have two search configs that differ only in sort_by:
1) _text_match:desc,recent_popularity:desc
2) _text_match(bucket_size:3):desc,recent_popularity:desc
From 1) I receive more than 3 results with the same text_score, and yet in 2) I receive in first position an item with a lower text_score. For a better explanation I am also enclosing a picture. Do I understand the behaviour of the bucket_size param correctly?
Lukas Matejka
02/20/2025, 8:07 PM
Jason Bosco
02/21/2025, 3:43 AM
Lukas Matejka
02/21/2025, 8:57 AM
bucket_size is correct and I can dig deeper into this issue and try to synthesize an example
Krunal Gandhi
02/21/2025, 9:20 AM
With the bucket_size param, the text_match_score of all hits inside a bucket will be tied and the tie breaker will happen on the secondary sort param.
Kishore Nallan
02/21/2025, 12:49 PM
With bucket_size: 3 we divide the results into groups of 3 records, and all the items in a group are deemed to have the same text match score. The secondary sorting condition is then used to sort the items within the group.
So the behavior seen in the screenshot is correct, as you can see that the items are sorted on popularity.
Kishore Nallan
02/21/2025, 12:50 PM
Kishore Nallan
02/21/2025, 12:51 PM
Lukas Matejka
02/21/2025, 1:06 PM
Lukas Matejka
02/21/2025, 1:06 PM
bucket_size = 3
text_match_score = [10,10,10,10,5,5,5,5,1,1]
buckets = {[10,10,10],[10,5,5],[5,5,1],[1]} → within these buckets the sort is applied according to the second sort param
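The grouping in this message can be sketched in a few lines of Python. This only illustrates the consecutive-chunk model described here, not Typesense's implementation, and the helper name chunk is hypothetical.

```python
def chunk(scores, bucket_size):
    """Split an already-sorted score list into consecutive groups of bucket_size."""
    return [scores[i:i + bucket_size] for i in range(0, len(scores), bucket_size)]


# The ten text match scores from the message, already in descending order.
text_match_score = [10, 10, 10, 10, 5, 5, 5, 5, 1, 1]

buckets = chunk(text_match_score, 3)
print(buckets)  # [[10, 10, 10], [10, 5, 5], [5, 5, 1], [1]]
```

Note that a bucket can straddle a change in score (the second and third groups mix different values), which is where the secondary sort param starts to matter.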
Lukas Matejka
02/21/2025, 1:07 PM
Kishore Nallan
02/21/2025, 1:09 PM
Kishore Nallan
02/21/2025, 1:09 PM
Lukas Matejka
02/21/2025, 1:10 PM
Lukas Matejka
02/21/2025, 1:12 PM
Lukas Matejka
02/21/2025, 1:13 PM
Kishore Nallan
02/21/2025, 1:14 PM
bucket_size was introduced only in v28
Kishore Nallan
02/21/2025, 1:15 PM
Lukas Matejka
02/24/2025, 10:42 AM
Lukas Matejka
02/24/2025, 10:42 AM
schema = {
    "name": collection,  # Replace with your desired collection name
    "fields": [
        {"name": "short_description", "type": "string", "facet": False},
        {"name": "categories", "type": "string", "facet": False},
        {"name": "recent_popularity", "type": "int32", "facet": False},
    ],
    "default_sorting_field": "recent_popularity",
}
Lukas Matejka
02/24/2025, 10:43 AM
search_parameters = {
    "q": query,
    "query_by": "short_description,categories",
    "query_by_weights": "5,3",
    "prefix": "true",
    "sort_by": "_text_match(buckets:2):desc,recent_popularity:desc",
    "limit": "20",
}
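To reproduce the comparison from the start of the thread, the two configs can be derived from one shared base so that only sort_by differs. This is a minimal sketch that just builds the parameter dicts and sends no request; the helper with_sort and the query value are assumptions made for illustration.

```python
# Parameters shared by both searches (values taken from the snippet above,
# except "q", which is only an example query).
base_parameters = {
    "q": "shoe",
    "query_by": "short_description,categories",
    "query_by_weights": "5,3",
    "prefix": "true",
    "limit": "20",
}


def with_sort(parameters, text_match_clause):
    """Return a copy of parameters with a sort_by built from the given clause."""
    sort_by = f"{text_match_clause}:desc,recent_popularity:desc"
    return {**parameters, "sort_by": sort_by}


plain = with_sort(base_parameters, "_text_match")
bucketed = with_sort(base_parameters, "_text_match(bucket_size:3)")

# Confirm that sort_by is the only key whose value differs.
differing = {key for key in plain if plain[key] != bucketed[key]}
print(differing)  # {'sort_by'}
```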
Lukas Matejka
02/24/2025, 11:06 AM
bucket_size:3
Fanis Tharropoulos
02/24/2025, 11:09 AM
_text_match(buckets: 3):desc will net you the same results as I said about bucket_size: 2, since it's 6 / 3 = 2
Lukas Matejka
02/24/2025, 11:14 AM
ignoring the weights for the query_by parameters
so, a potential issue could be that text_match_score respects the query_by weights, but the sort does not?
Fanis Tharropoulos
02/24/2025, 11:42 AM
Lukas Matejka
02/24/2025, 1:30 PM
Kishore Nallan
02/24/2025, 3:31 PM
0 -> "score": "578730123365711913",
1 -> "score": "578730123365711913",
2 -> "score": "578730123365711913",
3 -> "score": "578730123365711913",
4 -> "score": "578730123365711897",
5 -> "score": "578730123365711897",
With bucket_size: 3, scores are grouped into indices [0, 1, 2] and [3, 4, 5]. Within each group, we pick the first record's text match score as the score for the entire group. Since index 0 and 3 have the same score of 578730123365711913, both buckets end up with the same text match score. After assigning the anchor score, we sort all the documents on this anchor text match score, so the documents get sorted by popularity.
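The steps just described can be sketched in Python as follows. This is an illustration of the description in this message, not Typesense's source code: the function name bucketed_sort is hypothetical, the popularity values are made up, and it assumes ties on the anchor score are broken by popularity in descending order.

```python
def bucketed_sort(hits, bucket_size):
    """Sketch of anchor-score bucketing followed by a popularity tie-break."""
    # 1. Order hits by raw text match score, best first (the sort is stable).
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    # 2. Each hit takes the score of the first hit in its group of bucket_size.
    anchored = []
    for i, hit in enumerate(ranked):
        anchor = ranked[(i // bucket_size) * bucket_size]["score"]
        anchored.append({**hit, "anchor": anchor})
    # 3. Sort all hits on the anchor score, then on popularity, both descending.
    return sorted(anchored, key=lambda h: (h["anchor"], h["popularity"]), reverse=True)


# The six scores listed above; the popularity values are hypothetical.
scores = [578730123365711913] * 4 + [578730123365711897] * 2
popularity = [10, 20, 30, 40, 90, 80]
hits = [
    {"id": i, "score": s, "popularity": p}
    for i, (s, p) in enumerate(zip(scores, popularity))
]

result = bucketed_sort(hits, 3)
print([h["id"] for h in result])  # [4, 5, 3, 2, 1, 0]
```

Because indices 0 and 3 anchor both groups to the same score, the two lower-scored hits (ids 4 and 5) rise to the top purely on popularity.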
This is how the bucketing logic is intended to work: the goal is to fuzz the text match scores so that there is a gradual transition in text match ranking. However, with a small result set like this one that has a sharp change in score, it can lead to this behavior.
Kishore Nallan
02/24/2025, 3:35 PM
Lukas Matejka
02/25/2025, 8:56 AM
query=shoe with the same _text_match_score, as the catalogue has many shoes (let's say X hundreds). After hundreds of results there are other products with a lower score, and they can effectively jump (if a secondary criterion like popularity is high enough) from 150th position to 4th position when setting bucket_size:3, as the anchor score of the 50th bucket (where the decline in score happened) will be the same as the anchor score of the 2nd bucket. I think this is not an edge case but a typical situation in our case. Maybe a quick solution would be to check whether all scores in a bucket are the same and, if they are not, to not allow items from this bucket to jump higher in the ranking.
Fanis Tharropoulos
02/25/2025, 9:00 AM
Lukas Matejka
02/25/2025, 9:04 AM
Fanis Tharropoulos
02/25/2025, 9:05 AM
Kishore Nallan
02/25/2025, 9:13 AM
Lukas Matejka
02/25/2025, 2:39 PM
Kishore Nallan
02/26/2025, 5:57 AM
29.0.rc3 that contains the fix.
Lukas Matejka
02/26/2025, 2:04 PM