#community-help

Document Weighting and Sorting Discussion

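TLDR SamHendley asked how to down-weight documents older than a given age and proposed sorting on a bucketed text match score followed by a date field. Jason explained what bucketing does and pointed to the latest RC. SamHendley agreed the approach looked right and planned to test it. Kishore Nallan later helped when an _eval sort_by clause returned a parsing error; the clause was missing its :desc or :asc suffix.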

Dec 02, 2022
SamHendley
08:16 PM
Any thoughts on how I might best down-weight a document when it is older than X?
SamHendley
08:18 PM
I was thinking I could use _text_match(buckets: 10):desc or similar to make sure I’m only looking at relevant docs first, then use a sort based on a date field
Jason
08:18 PM
That will just segment the results into 10 buckets and force the text match score to tie within each bucket
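To make the bucketing behaviour concrete, here is a minimal Python sketch. It assumes equal-sized buckets over the ranked results, so it is an illustration and not Typesense's actual implementation; the function name bucket_ranks and the sample documents are hypothetical.

```python
def bucket_ranks(scores, num_buckets):
    """Return a bucket index per score (0 = best bucket).

    Assumption: results are ranked by score and split into
    equal-sized buckets; everything in a bucket ties.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    size = max(1, -(-len(scores) // num_buckets))  # ceiling division
    buckets = [0] * len(scores)
    for rank, i in enumerate(order):
        buckets[i] = rank // size
    return buckets


# Hypothetical documents: a text match score plus a timestamp-like field.
docs = [
    {"id": "a", "score": 98, "recent_activity": 100},
    {"id": "b", "score": 97, "recent_activity": 300},
    {"id": "c", "score": 50, "recent_activity": 200},
    {"id": "d", "score": 49, "recent_activity": 400},
]

buckets = bucket_ranks([d["score"] for d in docs], num_buckets=2)

# Within a bucket the text match score ties, so the next sort field decides.
ranked = sorted(zip(buckets, docs), key=lambda p: (p[0], -p[1]["recent_activity"]))
ranked_ids = [d["id"] for _, d in ranked]
```

With two buckets, "a" and "b" tie on text match, so the more recent "b" comes first even though its raw score is slightly lower.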
SamHendley
08:19 PM
Whoa
Jason
08:20 PM
0.24.0.rc37 (corrected: 0.24.0.rcn37) is the latest RC btw
SamHendley
08:20 PM
yeah that looks like it would do it.
SamHendley
08:21 PM
awesome, I’ll give that a try. (though I’m on an M1 Mac and the Intel-only RCs are :sloth: )
Jason
08:21 PM
The Docker amd64 builds work on M1s too
SamHendley
08:22 PM
it works, just super slow compared to the native. Not a problem for this sort of testing.
SamHendley
08:24 PM
typo above: 0.24.0.rcn37
Jason
08:24 PM
Yeah I’ve noticed that too… Turning on their new beta filesystem helps a little bit
Jason
08:24 PM
oh yes typo
SamHendley
08:25 PM
is there still a ‘3 sort options’ limit?
Jason
08:25 PM
In a single search API call, yes
Jason
08:26 PM
You can have any number of sortable fields in the schema, but you can only pick 3 at a time in a given search API call
SamHendley
08:27 PM
ok. That starts to get a little limiting if you are taking advantage of bucketing/eval.
Jason
08:29 PM
The limit is because of a performance tradeoff and we were waiting for feedback to see if it was a real issue before attempting to optimize it… Could you elaborate on your use case for needing more than 2 static fields, plus the eval?
SamHendley
08:32 PM
I think this is my ideal sort:
_eval(recent_activity>now-90),_text_match(buckets: 10):desc,recent_activity(buckets:5):desc,_text_match:desc
• First things that are within last 90 days
• then “relevant” (but not strict)
• then “recent” (but not strict)
• then by best text match
SamHendley
08:34 PM
Also, I may have missed it, but is buckets the only way to group? I can’t pass in a constant divider? (i.e. in my case I might use 86400 seconds so that all documents on the same day have the same score)
Jason
08:36 PM
> I can’t pass in a constant divider?
No, you would have to do this at indexing time
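One way to apply the constant divider at indexing time is to store a coarser copy of the timestamp before sending the document to the index. This is a minimal sketch: the target field name recent_activity_day and the helper add_day_bucket are hypothetical, and it assumes recent_activity holds Unix seconds with day boundaries taken in UTC.

```python
SECONDS_PER_DAY = 86400


def add_day_bucket(doc, source_field="recent_activity",
                   target_field="recent_activity_day"):
    """Return a copy of doc with an integer day number derived from a
    Unix timestamp, so all documents from the same UTC day share a value."""
    out = dict(doc)
    out[target_field] = out[source_field] // SECONDS_PER_DAY
    return out


# Two timestamps on the same UTC day, and one from the following day.
same_day_a = add_day_bucket({"id": "1", "recent_activity": 1664582400})
same_day_b = add_day_bucket({"id": "2", "recent_activity": 1664634102})
next_day = add_day_bucket({"id": "3", "recent_activity": 1664668800})
```

The derived field would then need to be declared as a sortable integer in the schema and used in sort_by in place of the raw timestamp.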
SamHendley
08:36 PM
It’s still only ‘touching’ 2 fields, which keeps the perf issues to a minimum
Jason
08:37 PM
I know you didn’t want to open Github issues, so I’ve noted this down in our internal tracker
Jason
08:38 PM
For now, if you’re able to calculate a single score that combines the time components, you should then be able to sort on that
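One possible reading of this suggestion is to collapse the time-based parts of the sort into a single coarse field computed at indexing time. The sketch below is an assumption about how that might look, not something spelled out in the thread: recency_tier and its 90/365-day boundaries are hypothetical, and because the tier depends on the current time, documents would need periodic re-indexing to keep it fresh.

```python
SECONDS_PER_DAY = 86400


def recency_tier(activity_ts, now_ts, boundaries_days=(90, 365)):
    """Map a Unix timestamp to a coarse tier; a higher tier is more recent.

    With the default boundaries: 2 = within 90 days,
    1 = within 365 days, 0 = older than that.
    """
    age_days = (now_ts - activity_ts) / SECONDS_PER_DAY
    tier = len(boundaries_days)
    for limit in boundaries_days:
        if age_days <= limit:
            return tier
        tier -= 1
    return 0


# Hypothetical "now", fixed so the example is reproducible.
now = 1_700_000_000
tiers = [
    recency_tier(now - 10 * SECONDS_PER_DAY, now),
    recency_tier(now - 200 * SECONDS_PER_DAY, now),
    recency_tier(now - 400 * SECONDS_PER_DAY, now),
]

# Sorting on the stored tier leaves two more slots within the 3-field limit.
sort_by = "recency_tier:desc,_text_match(buckets: 10):desc,recent_activity:desc"
```

The trade-off is that the "within the last 90 days" cutoff becomes as stale as the last re-index, whereas an _eval clause is evaluated at query time.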
SamHendley
08:38 PM
coolio, thanks for the amazingly quick response and having the feature I need already available

Dec 30, 2022
SamHendley
02:25 PM
Any chance the new _eval feature fell out of the more recent RCs? I’m on rcn47 and making what I think is a valid sort_by setting and getting an error that feels like it isn’t parsing correctly.
Setting: "sort_by":"_eval(recent_activity_raw:1664634102),_text_match:desc,recent_activity_raw:desc"
Error: {"message": "Could not find a field named _eval(recent_activity_raw in the schema for sorting."}
Kishore Nallan
03:12 PM
I just tried this on rcn47 docker build and I don't get that error. Can you post the full client snippet?
SamHendley
03:17 PM
DMed you the full query
SamHendley
04:01 PM
> It should be _eval(recent_activity_raw:1664637399):desc or :asc
From Kishore Nallan in DM. My response :man-facepalming:
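A small client-side check could catch this kind of mistake before the request is sent. The helper below is a hypothetical sketch, not part of any Typesense client: it takes the clauses as a list (to avoid splitting on commas that may appear inside parentheses), enforces the 3-field limit mentioned earlier in the thread, and insists on an explicit :asc or :desc suffix, which may be stricter than the server requires for some fields.

```python
MAX_SORT_FIELDS = 3  # per-search limit, as stated earlier in the thread


def build_sort_by(clauses):
    """Join sort clauses into a sort_by string after basic validation."""
    if len(clauses) > MAX_SORT_FIELDS:
        raise ValueError(f"at most {MAX_SORT_FIELDS} sort clauses allowed")
    for clause in clauses:
        if not clause.endswith((":asc", ":desc")):
            raise ValueError(f"missing :asc/:desc on {clause!r}")
    return ",".join(clauses)


# The corrected setting from the thread, with the direction on _eval.
sort_by = build_sort_by([
    "_eval(recent_activity_raw:1664637399):desc",
    "_text_match:desc",
    "recent_activity_raw:desc",
])

# The original setting, which lacked a direction on the _eval clause.
try:
    build_sort_by([
        "_eval(recent_activity_raw:1664634102)",
        "_text_match:desc",
        "recent_activity_raw:desc",
    ])
    original_rejected = False
except ValueError:
    original_rejected = True
```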