Solving Conflicts in Searching and Ordering Data with Typesense
TLDR: SamHendley faced an issue with search result order in Typesense. Kishore Nallan explained two behaviors that affected the ranking and pledged to change these, while also considering an additional suggestion from SamHendley. These changes were implemented in version 0.24.0.rcn39.
Dec 03, 2022 (10 months ago)
SamHendley
12:40 AM I have two conflicting requirements. One is “Bucket things by relevancy and then sort by date”; the other is “An exact match in the title should always show first”. I don’t see an obvious way of doing this with the current bucketing primitive.
Imagine that I have 100 documents: 1 has the title “Potatoes”, and the 99 others have titles like “Report about Potatoes” and secondary text like “Potatoes are a type of food” or “Aloo is a South Asian term for Potatoes”. That first document is the food profile, and it is what I really want returned first for a search of “Potato”. The food profile is updated relatively rarely but is still the most important document; the reports are published as interesting things occur and may be spread over a long time range. I have arranged the data and `query_by` order so the titles are ranked higher than the secondary text, so the raw `_text_match` score for the food profile is higher than that of any of the secondary reports (let’s imagine it’s much higher).
If I use the simplest sort option, `_text_match:desc,recent_activity:desc`, I get the food profile first, but then I get the reports in strict order of their text match, which might mean some recent interesting reports fall off the top page because they have slightly worse text match scores.
So let’s take advantage of the bucketing feature. If I change my sort to `_text_match(buckets: 100):desc,recent_activity:desc`, my results are now pretty biased in favor of showing me recent things, which is what I wanted. The only problem is that I think this would push my food profile doc down the list, since it now has the same effective score as other documents (all those sharing the highest-ranked bucket).
I was going to ask if it would be possible to have the first few highest scores be kept out of the bucketing so the exact match keeps its very high score. Could be something like `_text_match(buckets: 10, excludeTop:1)`.
SamHendley
12:40 AM
SamHendley
12:41 AM
SamHendley
12:42 AM `_text_match` huzzah.
SamHendley
12:44 AM `prioritize_token_position`. It is sorted correctly so I’m guessing there is some precision that isn’t reported in the `_text_match` field returned to the client; that’s not a problem, just a surprise.
SamHendley
12:49 AM `_text_match` score but the sorting is not strictly based on `recent_activity` so something else must be changing the sort order. That makes me think the values are still there, they just aren’t being reported.
SamHendley
12:50 AM `buckets:8` gets me pretty close to the result I want: the 3 documents with “Potatoes” in the title are at the top, then sorted by `recent_activity`.
SamHendley
12:52 AM `buckets:100` which I would have thought would give me the most granularity but seems to do the opposite.
Kishore Nallan
10:34 AM `text_match_info` object in the response. This should clear up the confusion with the text match scores looking the same.
SamHendley
01:06 PM
SamHendley
01:06 PM
SamHendley
01:06 PM
Dec 05, 2022 (10 months ago)
Kishore Nallan
02:53 PM
Dec 06, 2022 (10 months ago)
Kishore Nallan
10:59 AM
1. We disable the prioritize exact match flag when bucketing is enabled. I'm not quite sure why we do this anymore. If there is a strong case for not doing it, I can remove this behavior.
2. When there are more buckets than the number of results (e.g. num_buckets: 100 but only 17 records are found), we put all the records into a single bucket. Instead, I wonder if we should not bucket at all, i.e. they retain their original match scores.
SamHendley
01:16 PM There was a lot in this thread; I had another suggestion you might have missed:
> I was going to ask if it would be possible to have the first few highest scores be kept out of the bucketing so the exact match keeps its very high score. Could be something like `_text_match(buckets: 10, exclude_top:1)`.
This would allow hitting both the “exact match first” and “show interesting results near top” requirements at the same time.
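The behaviors discussed in the messages above can be illustrated with a toy model. This is only a sketch and not Typesense's implementation: it assumes results are ranked by raw score, split into equal-sized buckets whose members tie, and then ordered by `recent_activity` within each bucket. The `exclude_top` argument mirrors the suggested (not confirmed) option, and `keep_scores_if_sparse` is a hypothetical switch for the alternative raised in point 2. All scores and timestamps are made up.

```python
import math
from typing import List, Tuple

# (title, raw text match score, recent_activity); all values are invented.
Doc = Tuple[str, int, int]

DOCS: List[Doc] = [
    ("Potatoes", 100, 1),
    ("Report A", 60, 5),
    ("Report B", 55, 9),
    ("Report C", 50, 7),
]


def bucketed_sort(
    docs: List[Doc],
    num_buckets: int,
    exclude_top: int = 0,                 # hypothetical, from the suggestion above
    keep_scores_if_sparse: bool = False,  # hypothetical switch for point 2
) -> List[Doc]:
    """Toy model: tie scores within buckets, break ties by recency."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    # The top `exclude_top` documents keep their rank and are not bucketed.
    head, tail = ranked[:exclude_top], ranked[exclude_top:]
    if not tail:
        return head
    if num_buckets > len(tail):
        if keep_scores_if_sparse:
            # Alternative from point 2: skip bucketing, keep original scores.
            return head + sorted(tail, key=lambda d: (-d[1], -d[2]))
        # Behavior described in point 2: everything lands in one bucket.
        bucket_size = len(tail)
    else:
        bucket_size = math.ceil(len(tail) / num_buckets)
    keyed = [(i // bucket_size, doc) for i, doc in enumerate(tail)]
    # Lower bucket index first, then most recent activity first.
    keyed.sort(key=lambda pair: (pair[0], -pair[1][2]))
    return head + [doc for _, doc in keyed]


def titles(docs: List[Doc]) -> List[str]:
    return [d[0] for d in docs]


print(titles(bucketed_sort(DOCS, 100)))
# ['Report B', 'Report C', 'Report A', 'Potatoes']
print(titles(bucketed_sort(DOCS, 100, keep_scores_if_sparse=True)))
# ['Potatoes', 'Report A', 'Report B', 'Report C']
print(titles(bucketed_sort(DOCS, 1, exclude_top=1)))
# ['Potatoes', 'Report B', 'Report C', 'Report A']
```

In this model, asking for 100 buckets with only 4 results pushes the exact match to the bottom, which matches the surprise reported earlier in the thread, while excluding the top result from bucketing keeps it first and still orders the rest by recency.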
Dec 07, 2022 (10 months ago)
Kishore Nallan
08:32 AM
Kishore Nallan
08:33 AM
Kishore Nallan
03:12 PM `0.24.0.rcn39`
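For reference, the two sort orders compared at the start of the thread can be written out as search parameters. The sketch below only builds the request URL and does not contact a server. The field names `title` and `secondary_text`, the collection name `docs`, and the host are assumptions for illustration; `recent_activity` and the `sort_by` expressions come from the thread.

```python
from typing import Dict
from urllib.parse import urlencode


def build_search_params(q: str, buckets: int = 0) -> Dict[str, str]:
    """Build query parameters; buckets=0 means plain text match sorting."""
    text_match = f"_text_match(buckets: {buckets})" if buckets > 0 else "_text_match"
    return {
        "q": q,
        # Hypothetical field names; title is listed first so it ranks higher.
        "query_by": "title,secondary_text",
        "sort_by": f"{text_match}:desc,recent_activity:desc",
    }


def build_search_url(base_url: str, collection: str, q: str, buckets: int = 0) -> str:
    """Return the documents search URL for a collection (nothing is sent)."""
    query = urlencode(build_search_params(q, buckets))
    return f"{base_url}/collections/{collection}/documents/search?{query}"


# Strict text match ordering, then recency.
print(build_search_params("Potato")["sort_by"])
# Bucketed text match, then recency within each bucket.
print(build_search_params("Potato", buckets=10)["sort_by"])
# Hypothetical host and collection name.
print(build_search_url("http://localhost:8108", "docs", "Potato", buckets=10))
```

An actual request would also need the API key sent in the `X-TYPESENSE-API-KEY` header.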