#community-help

Issues With `text_match` Scoring for Search Queries in Typesense

TLDR Colin encountered issues with the text_match scoring on Typesense v0.23.1. Jason and Kishore Nallan identified a potential issue with numeric overflow in the text match score and applied an unverified patch. The final resolution is unclear.

Jul 18, 2022
Colin
05:50 PM
Hello 👋 I've got a question about how text_match is calculated on search 🧵
05:51 PM
If I'm searching "fox" with the options `{ query_by: 'fieldA, fieldB', query_by_weights: '20,1' }`, it looks like it's calculating the same text_match score for all the records returned here.
[screenshot]
05:51 PM
My expectation is that the second record we see here would have a higher text_match score, since we have more occurrences of the search term in fieldA. Is there a way to make frequency affect the text_match score?
Jason
05:56 PM
May I know which version of Typesense this is on?
Colin
05:57 PM
Typesense v0.23.1
Jul 19, 2022
Kishore Nallan
03:09 AM
The weights are not added across fields but are used for prioritizing. If 2 records already match fully against the query on the highest-weighted field, their text match score will be the same.

Prior to 0.23, we were summing up weights across fields, but that led to various edge cases and noisy results.
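
To make the difference concrete, here is a toy TypeScript model of the two behaviors described above. It is a sketch under stated assumptions, not Typesense's actual ranking code: `FieldMatch`, `summedScore` and `bestFieldScore` are hypothetical names, each field is reduced to a weight (from `query_by_weights`) plus a yes/no full-match flag, and the real engine combines many more signals.

```typescript
// Hypothetical, simplified view of how one record matched a query.
// This is NOT Typesense's internal representation.
interface FieldMatch {
  field: string;
  weight: number;   // taken from query_by_weights
  matched: boolean; // did the query fully match in this field?
}

// Pre-0.23 behavior as described in the thread:
// the weights of all matching fields are added together.
function summedScore(fields: FieldMatch[]): number {
  return fields
    .filter((f) => f.matched)
    .reduce((sum, f) => sum + f.weight, 0);
}

// 0.23 behavior as described in the thread: only the best (highest-weighted)
// matching field counts; other matching fields add nothing to the score.
function bestFieldScore(fields: FieldMatch[]): number {
  return fields
    .filter((f) => f.matched)
    .reduce((best, f) => Math.max(best, f.weight), 0);
}

// Illustrative records with query_by_weights = '20,1':
// record1 matches only fieldA; record2 matches fieldA and fieldB.
const record1: FieldMatch[] = [
  { field: "fieldA", weight: 20, matched: true },
  { field: "fieldB", weight: 1, matched: false },
];
const record2: FieldMatch[] = [
  { field: "fieldA", weight: 20, matched: true },
  { field: "fieldB", weight: 1, matched: true },
];

console.log(summedScore(record1), summedScore(record2));       // 20 21
console.log(bestFieldScore(record1), bestFieldScore(record2)); // 20 20
```

Under the summed model the record matching more fields pulls ahead; under the best-field model both records tie, which mirrors the identical scores reported in this thread.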
Colin
03:45 PM
Is there any chance of getting this re-enabled? The way we need to rank our content relies on the frequency of the search term factoring into the text_match score.
Kishore Nallan
04:03 PM
Can you post a real-world example that shows exactly the behavior you want? For example, do you just want repeated tokens across fields to be counted, or do you have any other requirements?
Colin
04:12 PM
sure so for example here:
04:13 PM
I would expect this search to rank the second result above the first, since more fields match and the term appears more often.
Kishore Nallan
04:17 PM
Got it, thanks. Let me look into what kind of effort we need to make this configurable and get back to you. Right now there is no way to have the old behavior.
Mark
05:04 PM
Kishore Nallan, is this current behavior really intended? How can they have the same text_match score when one of the results matches across more fields?

Here's another example that highlights the problem. The two results have the same text_match score despite "description" being designated to receive a higher weight.
Rebecca
05:21 PM
Do you mind setting up a Zoom call for us all to sync about this issue, Jason, Kishore Nallan? This is a pretty big blocker for us.
Jul 20, 2022
Kishore Nallan
09:36 AM
Jason will contact Rebecca, but in the meantime, I will summarize the 3 different issues under discussion here:

a) The very first example in the thread, involving the fox query: Typesense does not count individual occurrences of the tokens, since that caused relevancy issues due to keyword stuffing in real-world data sets, which can be noisy.

b) The second example, involving the javascript query: since Typesense derives the text match score from the best-matched field of a record, the scores are the same here. I agree this is not always ideal, so we have to see how we can support more fine-grained scoring that considers additional matching fields.

c) The java query: I have to check what's happening here, as the category field is weighted lower, so the record matching on the higher-weighted field should appear ahead.
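
One way to picture the finer-grained scoring discussed in a), b) and c) is to compare records on a tuple: the weight of the best matched field first (which handles c), then the number of matched fields (which breaks the ties in b), while counting a field only once no matter how often the tokens repeat in it (the keyword-stuffing guard from a). This is a hypothetical illustration of that idea only; the thread does not describe what the actual patch changed, and `FieldHit`, `RecordMatch`, `rankKey` and `compareRecords` are made-up names.

```typescript
// Hypothetical per-field summary for one record, in query_by order.
interface FieldHit {
  weight: number;      // from query_by_weights
  occurrences: number; // raw token count; only used to decide "matched or not"
}

interface RecordMatch {
  id: string;
  fields: FieldHit[];
}

// Ranking key: [best matched field weight, number of matched fields].
// A field counts once however often the tokens repeat in it,
// so stuffing one field with the keyword does not raise the key.
function rankKey(r: RecordMatch): [number, number] {
  const matched = r.fields.filter((f) => f.occurrences > 0);
  const best = matched.reduce((m, f) => Math.max(m, f.weight), 0);
  return [best, matched.length];
}

// Descending order: compare the best field first, then the matched-field count.
function compareRecords(a: RecordMatch, b: RecordMatch): number {
  const [aBest, aCount] = rankKey(a);
  const [bBest, bCount] = rankKey(b);
  return bBest - aBest || bCount - aCount;
}

const records: RecordMatch[] = [
  {
    // Matches only the top field, but repeats the token many times.
    id: "stuffed",
    fields: [
      { weight: 3, occurrences: 9 },
      { weight: 2, occurrences: 0 },
      { weight: 1, occurrences: 0 },
    ],
  },
  {
    // Matches all three fields once each.
    id: "broad",
    fields: [
      { weight: 3, occurrences: 1 },
      { weight: 2, occurrences: 1 },
      { weight: 1, occurrences: 1 },
    ],
  },
];

console.log(records.sort(compareRecords).map((r) => r.id)); // [ 'broad', 'stuffed' ]
```

Keeping the matched-field count as a secondary key means it can only reorder records that already tie on their best field, so a heavily weighted match still wins outright.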

Mark
01:17 PM
thanks for investigating :bow:
Jason
02:36 PM
We have a potential fix that addresses b) and c). Would you be ok with us applying the patch on your cluster once it’s ready in an RC build?
Rebecca
02:39 PM
Yep, we haven’t been able to launch this to production users yet, so a patch should be fine.
Jason
02:45 PM
Will keep you posted. The RC should be ready in 24 hours.
Jul 21, 2022
Jason
01:56 PM
We've applied the patch to your cluster. Could you try now?
Colin
02:34 PM
Jason, I'm not seeing a difference in behavior. Did you patch production?
Jason
02:35 PM
Yup production. Could you post screenshots of what results you now see for b and c? Along with the search params you're using?
Colin
02:43 PM
so this search for "javascript" with query_by=title,shortDescription,description
02:44 PM
we see that the text match score is exactly the same when the first record has all 3 fields matching and the second only has one
02:45 PM
would expect some difference in the score
Kishore Nallan
02:49 PM
Can you please copy + paste the content of the description field of the first result here?
Colin
02:52 PM
### Why learn JavaScript Errors and Debugging? This course will guide you through the basics of debugging and handling JavaScript errors to build a growth mindset approach to programming and prevent a crash in your applications! ### Outcomes: Learn how to debug your code and learn to predict and handle errors in your web applications. ### Note on Prerequisites: Intermediate JavaScript is a prerequisite, and you should be comfortable with arrays, objects, and looping through arrays.
Jason
03:22 PM
It looks like there might be some numeric overflow issues when assembling the text match score, which we’re looking into… But the results themselves should be sorted properly now.
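
The overflow Jason mentions is easiest to picture if you assume the score is a single integer built by packing several sub-scores into fixed-width bit slots. The thread does not describe the real layout, so the 8-bit slots and the `packUnchecked` / `packChecked` helpers below are hypothetical. If one component exceeds its slot, the excess carries into the slot above it and can reorder records; clamping each component to its slot keeps the ordering intact.

```typescript
// Requires an ES2020+ target for BigInt literals.
// Hypothetical layout: each sub-score gets an 8-bit slot, most significant first.
const SLOT_BITS = 8n;
const SLOT_MAX = (1n << SLOT_BITS) - 1n; // 255

// Packs components without range checks: a value above SLOT_MAX
// spills into the next-higher slot.
function packUnchecked(parts: bigint[]): bigint {
  return parts.reduce((acc, p) => (acc << SLOT_BITS) + p, 0n);
}

// Clamps each component into [0, SLOT_MAX] before packing, so a large
// value saturates its own slot instead of corrupting its neighbor.
function packChecked(parts: bigint[]): bigint {
  return parts.reduce((acc, p) => {
    const clamped = p < 0n ? 0n : p > SLOT_MAX ? SLOT_MAX : p;
    return (acc << SLOT_BITS) | clamped;
  }, 0n);
}

// Record b has the better top-level component (2 vs 1), so it should rank first.
const a: bigint[] = [1n, 300n, 0n]; // middle component overflows its 8-bit slot
const b: bigint[] = [2n, 0n, 0n];

console.log(packUnchecked(a) > packUnchecked(b)); // true (wrong order: overflow wins)
console.log(packChecked(b) > packChecked(a));     // true (correct order)
```

This also shows why the results can be sorted correctly while the displayed score still looks off: the two problems only coincide when a component actually exceeds its slot.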
03:23 PM
On a side note, the screenshot you posted above might be from 0.23.1. Could you make sure you’re running this search against your production cluster? On 0.24.0.rc22, the scores for your query should start with a 5
Colin
03:28 PM
This one is production, and the record where 3 fields match is ranked below one where 2 fields match.
Jason
03:33 PM
Could you open up the browser dev console, look at the network requests generated to the .../api/... endpoint as you search, copy as curl the last request and DM it to me?
03:47 PM
Looks like you have "query_by":"description,longDescription,organizationId" in the query
03:48 PM
Could you try changing it to query_by=title,shortDescription,description in the search parameters panel at the bottom of the page?

Similar Threads

Phrase Search Relevancy and Weights Fix

Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.

Solved

Adjusting Text Match Score Calculation in Typesense

Johannes wanted to modify the Text Match Score calculation in Typesense to improve the search results returned. With counsel from Jason and Kishore Nallan, various solutions were proposed, including creating a GitHub issue, trying different parameters, and updating Docker to a new version to resolve the matter.

Troubleshooting "drop_tokens_threshold" and Typo Tolerance in Typesense

Joe had issues with "drop_tokens_threshold" = 0 and typo tolerance in Typesense, after which Kishore Nallan provided solutions and clarifications on feature functionality. Their issues with the search result limit and tokens were resolved after discussion and testing.

Solved

Troubleshooting Issues with DocSearch Hits and Scraper Configuration

Rubai encountered issues with search result priorities and ellipsis. Jason helped debug the issue and suggested using different versions of typesense-docsearch.js, updating initialization parameters, and running the scraper on a Linux-based environment. The issues related to hits structure and scraper configuration were resolved.

Solved

Docker Upgrade and Indexing Data Issues for Travel App

The thread discussed upgrading docker while retaining indexing data and addressed search result ranking issues in an app with collections indexed by attractions, destinations, countries, and users. Kishore Nallan provided guidance on adjusting query parameters and weights to improve search outcomes.

Solved