Solving Conflicts in Searching and Ordering Data with Typesense
TLDR SamHendley faced an issue with search result order in Typesense. Kishore Nallan explained two behaviors that affected the ranking and pledged to change these, while also considering an additional suggestion from SamHendley. These changes were implemented in version
Dec 03, 2022
I have two conflicting requirements. One is “bucket things by relevancy and then sort by date”; the other is “an exact match in the title should always show first”. I don't see an obvious way of doing this with the current bucketing primitive. Imagine that I have 100 documents: 1 has the title “Potatoes” and the 99 others have titles like “Report about Potatoes” and secondary text like “Potatoes are a type of food” or “Aloo is a South Asian term for Potatoes”. That first document, the food profile, is what I really want returned first for a search of “Potato”. The food profile is updated relatively rarely but is still the most important document; the reports are published as interesting things occur and may be spread over a long time range. I have arranged the data and the `query_by` order so the titles are ranked higher than the secondary text, so the raw `_text_match` score for the food profile is higher than that of any of the secondary reports (let's imagine it's much higher).
If I use the simplest sort option, `_text_match:desc,recent_activity:desc`, I get the food profile first, but then I get the reports in strict order of their text match score. That might mean some recent, interesting reports fall off the top page because they have slightly worse text match scores.
So let's take advantage of the bucketing feature. If I change my sort to `_text_match(buckets: 100):desc,recent_activity:desc`, my results are now pretty biased in favor of showing me recent things, which is what I wanted. The only problem is that I think this would push my food profile document down the list, since it now has the same effective score as the other documents sharing the highest-ranked bucket.
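To make the concern concrete, here is a toy Python model of the two sort orders. It is a sketch under stated assumptions, not Typesense's implementation: the `Doc` fields and scores are made-up stand-ins for `_text_match` and `recent_activity`, and bucketing is assumed to cut the relevance-ranked list into equal-sized groups whose members are all treated as equally relevant, with `recent_activity` breaking the tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    text_match: int       # hypothetical relevance score (higher = better match)
    recent_activity: int  # hypothetical timestamp (higher = more recent)


def plain_sort(docs: list[Doc]) -> list[Doc]:
    """Mimics `_text_match:desc,recent_activity:desc`: recency only breaks exact ties."""
    return sorted(docs, key=lambda d: (-d.text_match, -d.recent_activity))


def bucketed_sort(docs: list[Doc], buckets: int) -> list[Doc]:
    """Toy model of `_text_match(buckets: N):desc,recent_activity:desc`.

    ASSUMPTION: the relevance-ranked list is cut into `buckets` equal-sized
    groups, every document in a group counts as equally relevant, and
    recent_activity orders documents inside each group.
    """
    if not 0 < buckets <= len(docs):
        raise ValueError("this sketch only models 1 <= buckets <= len(docs)")
    ranked = plain_sort(docs)
    size = len(ranked) // buckets

    def bucket_of(position: int) -> int:
        # Leftover documents (when len(docs) isn't divisible) join the last bucket.
        return min(position // size, buckets - 1)

    positions = sorted(
        range(len(ranked)),
        key=lambda i: (bucket_of(i), -ranked[i].recent_activity),
    )
    return [ranked[i] for i in positions]


docs = [
    Doc("Potatoes", 100, 1),  # the rarely updated food profile
    Doc("Report about Potatoes (Jan)", 60, 2),
    Doc("Report about Potatoes (Mar)", 59, 3),
    Doc("Report about Potatoes (Jun)", 58, 6),
    Doc("Aloo market update", 30, 5),
    Doc("Potato harvest notes", 29, 4),
]
print([d.title for d in plain_sort(docs)][:3])
# ['Potatoes', 'Report about Potatoes (Jan)', 'Report about Potatoes (Mar)']
print([d.title for d in bucketed_sort(docs, buckets=2)][:3])
# ['Report about Potatoes (Mar)', 'Report about Potatoes (Jan)', 'Potatoes']
```

With two buckets over six documents, the three best matches share the top bucket, so the “Potatoes” profile drops to third behind two more recent reports.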
I was going to ask if it would be possible to keep the first few highest scores out of the bucketing so the exact match keeps its very high score. It could be something like `_text_match(buckets: 10, excludeTop:1)`.
`prioritize_token_position`. It is sorted correctly, so I'm guessing there is some precision that isn't reported in the `_text_match` field returned to the client. That's not a problem, just a surprise.
`_text_match` score, but the sorting is not strictly based on `recent_activity`, so something else must be changing the sort order. That makes me think the values are still there; they just aren't being reported.
`buckets:8` gets me pretty close to the result I want: the 3 documents with “Potatoes” in the title are at the top, then sorted by
`buckets:100`, which I would have thought would give me the most granularity, seems to do the opposite.
Kishore Nallan 10:34 AM
`text_match_info` object in the response. This should clear up the confusion about the text match scores looking the same.
Dec 05, 2022
Kishore Nallan 02:53 PM
Dec 06, 2022
Kishore Nallan 10:59 AM
1. We disable the `prioritize_exact_match` flag when bucketing is enabled. I'm not quite sure why we do this anymore. If there is a strong case for not doing it, I can remove this behavior.
2. When there are more buckets than the number of results (e.g. `num_buckets:` … 17 records are found), we put all the records into a single bucket. Instead, I wonder if we should not bucket at all, i.e. let the records retain their original match scores.
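The second point also plausibly accounts for the `buckets:100` surprise earlier in the thread. Below is a minimal sketch, assuming equal-sized buckets over a relevance-ranked list of `n_results` items; the function and its `skip_if_too_few` flag are hypothetical names, not Typesense code. With 17 results and 100 buckets, the described behavior collapses everything into one bucket, so relevance stops mattering and the secondary sort field alone decides the order, while the proposed alternative leaves the original ranking intact.

```python
def bucket_ids(n_results: int, num_buckets: int, skip_if_too_few: bool = False) -> list[int]:
    """Bucket id for each position in a relevance-ranked list (position 0 = best match).

    HYPOTHETICAL helper modeling the two behaviors described in the thread.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    if num_buckets > n_results:
        if skip_if_too_few:
            # Proposed alternative: no bucketing, every result keeps its own rank.
            return list(range(n_results))
        # Described current behavior: all results share a single bucket.
        return [0] * n_results
    size = n_results // num_buckets
    # Leftover results (when n_results isn't divisible) join the last bucket.
    return [min(i // size, num_buckets - 1) for i in range(n_results)]


print(len(set(bucket_ids(17, 100))))                        # 1
print(len(set(bucket_ids(17, 100, skip_if_too_few=True))))  # 17
print(len(set(bucket_ids(17, 8))))                          # 8
```

Under these assumptions, the same 17 results still fall into eight distinct buckets with `num_buckets=8`, which fits `buckets:8` feeling more granular than `buckets:100`.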
There was a lot in this thread; I had another suggestion you might have missed:
> I was going to ask if it would be possible to keep the first few highest scores out of the bucketing so the exact match keeps its very high score. It could be something like `_text_match(buckets: 10, exclude_top:1)`.
This would allow hitting both the “exact match first” and the “show interesting results near the top” requirements at the same time.
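A toy Python model of the proposed option shows how it would reconcile the two requirements. `exclude_top` is only a suggestion from this thread, so the sketch does not claim Typesense supports it; the `Doc` type, the scores, and the equal-sized-bucket rule are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    text_match: int       # hypothetical relevance score
    recent_activity: int  # hypothetical timestamp (higher = more recent)


def sort_with_exclude_top(docs: list[Doc], buckets: int, exclude_top: int) -> list[Doc]:
    """Toy model of the PROPOSED `_text_match(buckets: N, exclude_top: K)`.

    The K best text matches keep strict relevance order; the rest are split
    into equal-sized buckets and ordered by recent_activity inside each bucket.
    """
    if buckets < 1 or exclude_top < 0:
        raise ValueError("need buckets >= 1 and exclude_top >= 0")
    ranked = sorted(docs, key=lambda d: (-d.text_match, -d.recent_activity))
    head, tail = ranked[:exclude_top], ranked[exclude_top:]
    if not tail:
        return head
    buckets = min(buckets, len(tail))  # simplifying assumption for small result sets
    size = len(tail) // buckets
    order = sorted(
        range(len(tail)),
        # Leftover documents join the last bucket; recency breaks ties within a bucket.
        key=lambda i: (min(i // size, buckets - 1), -tail[i].recent_activity),
    )
    return head + [tail[i] for i in order]


docs = [
    Doc("Potatoes", 100, 1),  # the exact-match food profile
    Doc("Report about Potatoes (Jan)", 60, 2),
    Doc("Report about Potatoes (Mar)", 59, 3),
    Doc("Report about Potatoes (Jun)", 58, 6),
    Doc("Aloo market update", 30, 5),
    Doc("Potato harvest notes", 29, 4),
]
print([d.title for d in sort_with_exclude_top(docs, buckets=2, exclude_top=1)][:3])
# ['Potatoes', 'Report about Potatoes (Mar)', 'Report about Potatoes (Jan)']
```

Here the exact-match profile stays first while the remaining reports are still regrouped by recency, which is the combination the suggestion is after.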
Dec 07, 2022
Kishore Nallan 08:32 AM
Kishore Nallan 08:33 AM
Kishore Nallan 03:12 PM