Hello 👋 I've got a question about how `text_match` is calculated (typesense #community-help)

Colin Tatro

07/18/2022, 5:50 PM

Hello 👋 I've got a question about how `text_match` is calculated on search 🧵

Colin Tatro

07/18/2022, 5:51 PM

if I'm searching "fox" with the options

{ query_by: 'fieldA, fieldB', query_by_weights: '20,1' }

it looks like it's calculating the same `text_match` score for all the records returned here in the query.
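For reference, options like these end up in a search request roughly like the one below. This is a minimal TypeScript sketch, assuming Typesense's `GET /collections/<collection>/documents/search` endpoint with the `q`, `query_by` and `query_by_weights` parameters. The `buildSearchPath` helper and the `courses` collection name are hypothetical, and the weight-count check reflects our understanding that Typesense expects one weight per `query_by` field.

```typescript
// Shape of the search options from the thread (hypothetical type name).
interface SearchOptions {
  q: string;
  query_by: string; // comma-separated field names
  query_by_weights?: string; // comma-separated weights, one per query_by field
}

// Hypothetical helper: builds the path + query string for a Typesense search request.
function buildSearchPath(collection: string, opts: SearchOptions): string {
  // Trim whitespace around field names, e.g. 'fieldA, fieldB' -> fieldA,fieldB.
  const fields = opts.query_by.split(",").map((f) => f.trim());
  let weights: string[] | undefined;
  if (opts.query_by_weights !== undefined) {
    weights = opts.query_by_weights.split(",").map((w) => w.trim());
    // Assumption: Typesense expects exactly one weight per query_by field.
    if (weights.length !== fields.length) {
      throw new Error(
        `query_by has ${fields.length} fields but query_by_weights has ${weights.length} values`
      );
    }
  }
  const pairs: [string, string][] = [
    ["q", opts.q],
    ["query_by", fields.join(",")],
  ];
  if (weights !== undefined) {
    pairs.push(["query_by_weights", weights.join(",")]);
  }
  const query = pairs
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `/collections/${encodeURIComponent(collection)}/documents/search?${query}`;
}

// Usage with the options from the thread ("courses" is a made-up collection name).
console.log(
  buildSearchPath("courses", { q: "fox", query_by: "fieldA, fieldB", query_by_weights: "20,1" })
);
// prints /collections/courses/documents/search?q=fox&query_by=fieldA%2CfieldB&query_by_weights=20%2C1
```

Trimming the field names is a choice made in this sketch, not something the thread confirms Typesense requires.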

Colin Tatro

07/18/2022, 5:51 PM

my expectation here is that the second record we see here would have a higher `text_match` score, since we have more occurrences of the search term on `fieldA`. Is there a way to make frequency affect the `text_match` score?

Jason Bosco

07/18/2022, 5:56 PM

May I know which version of Typesense this is on?

Colin Tatro

07/18/2022, 5:57 PM

Typesense v0.23.1

Kishore Nallan

07/19/2022, 3:09 AM

The weights are not added across fields but are used for prioritizing. If 2 records already fully match the query on the highest-weighted field, their text match score will be the same. Prior to 0.23 we were summing up weights across fields, but that led to various edge cases that produced noisy results.
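To illustrate the difference described here, below is a toy TypeScript model contrasting the pre-0.23 idea of summing weights across matching fields with scoring by the best-matched field only. It is not Typesense's actual `text_match` formula: a field simply either fully matches or does not, and the function names and records are hypothetical.

```typescript
// Toy model only: this is NOT Typesense's actual text_match formula.
// For each queried field, record whether that field fully matches the query.
type FieldMatches = Record<string, boolean>;
type FieldWeights = Record<string, number>;

// Pre-0.23 idea, as described in the thread: add up the weights of all matching fields.
function summedScore(matches: FieldMatches, weights: FieldWeights): number {
  let total = 0;
  for (const field of Object.keys(weights)) {
    if (matches[field]) {
      total += weights[field];
    }
  }
  return total;
}

// 0.23 idea: the score comes from the best (highest-weighted) matching field only.
function bestFieldScore(matches: FieldMatches, weights: FieldWeights): number {
  let best = 0;
  for (const field of Object.keys(weights)) {
    if (matches[field] && weights[field] > best) {
      best = weights[field];
    }
  }
  return best;
}

// query_by: 'fieldA, fieldB' with query_by_weights: '20,1'.
// Hypothetical records: both fully match fieldA, only the second also matches fieldB.
const weights: FieldWeights = { fieldA: 20, fieldB: 1 };
const firstRecord: FieldMatches = { fieldA: true, fieldB: false };
const secondRecord: FieldMatches = { fieldA: true, fieldB: true };

console.log(summedScore(firstRecord, weights), summedScore(secondRecord, weights)); // 20 21
console.log(bestFieldScore(firstRecord, weights), bestFieldScore(secondRecord, weights)); // 20 20
```

Under summing, the record that also matches `fieldB` edges ahead (21 vs 20); under best-field scoring both records tie at 20, which is the kind of tie reported in this thread.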

Colin Tatro

07/19/2022, 3:45 PM

Is there any chance of getting this re-enabled? The way we need to rank our content relies on making the frequency of the search term factor into the `text_match` score.

Kishore Nallan

07/19/2022, 4:03 PM

Can you post a real-world example that shows exactly the behavior you want? For example, do you just want repeated tokens across fields to be counted, or do you have any other requirements?

Colin Tatro

07/19/2022, 4:12 PM

sure so for example here:

Colin Tatro

07/19/2022, 4:13 PM

I would expect this search to rank the second result above the first, since more fields match and the term occurs more frequently.

Kishore Nallan

07/19/2022, 4:17 PM

Got it, thanks. Let me look into what kind of effort we need to make this configurable and get back to you. Right now there is no way to get the old behavior.

Mark Hannallah

07/19/2022, 5:04 PM

@Kishore Nallan is this current behavior really intended? How can they have the same `text_match` score when one of the results matches across more fields? Here's another example that highlights the problem: the two results have the same `text_match` score even though "description" is designated to receive a higher weight.

Rebecca

07/19/2022, 5:21 PM

Do you mind setting up a Zoom call for us all to sync about this issue, @Jason Bosco @Kishore Nallan? This is a pretty big blocker for us.

Kishore Nallan

07/20/2022, 9:36 AM

Jason will contact Rebecca, but in the meantime, I will summarize the 3 different issues under discussion here:
a) The very first example in the thread, involving the `fox` query: Typesense does not count individual occurrences of the tokens, since that caused relevancy issues due to keyword stuffing in real-world data sets, which can be noisy.
b) The second example, involving the `javascript` query: since Typesense derives a text match score from the best-matched field of a record, the scores are the same here. I agree this is not always ideal, so we have to see how we can support finer-grained scoring that considers additional matching fields.
c) The `java` query: I have to check what's happening here, as the category field is weighted lower, so that record must appear ahead.
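Point a) can be illustrated with another toy TypeScript model, again not Typesense's real scoring: counting every occurrence of a query token rewards a keyword-stuffed record, while counting each distinct query token once does not. The tokenizer and function names are hypothetical simplifications.

```typescript
// Toy model only: NOT Typesense's real scoring. Lowercases text and splits on non-alphanumerics.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Frequency-based: every occurrence of a query token adds one point.
function frequencyScore(query: string, text: string): number {
  const queryTokens = tokenize(query);
  return tokenize(text).filter((token) => queryTokens.indexOf(token) !== -1).length;
}

// Presence-based: each distinct query token adds at most one point.
function presenceScore(query: string, text: string): number {
  const textTokens = tokenize(text);
  const distinctQueryTokens = tokenize(query).filter((token, i, all) => all.indexOf(token) === i);
  return distinctQueryTokens.filter((token) => textTokens.indexOf(token) !== -1).length;
}

// Hypothetical records: one mentions "fox" once, the other is keyword-stuffed.
const plain = "The quick brown fox jumps over the lazy dog";
const stuffed = "fox fox fox fox fox cheap fox";

console.log(frequencyScore("fox", plain), frequencyScore("fox", stuffed)); // 1 6
console.log(presenceScore("fox", plain), presenceScore("fox", stuffed)); // 1 1
```

With frequency counting the stuffed record wins 6 to 1; with presence counting both records score 1, so repetition alone cannot push a record up.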

Mark Hannallah

07/20/2022, 1:17 PM

thanks for investigating 🙇

Jason Bosco

07/20/2022, 2:36 PM

We have a potential fix that addresses b) and c). Would you be ok with us applying the patch on your cluster once it’s ready in an RC build?

Rebecca

07/20/2022, 2:39 PM

yep, we haven’t been able to launch this to production users yet so a patch should be fine

Jason Bosco

07/20/2022, 2:45 PM

Will keep you posted; the RC should be ready in 24 hours.

Jason Bosco

07/21/2022, 1:56 PM

We've applied the patch to your cluster. Could you try now?

Colin Tatro

07/21/2022, 2:34 PM

@Jason Bosco I'm not seeing a difference in behavior, did you patch production?

Jason Bosco

07/21/2022, 2:35 PM

Yup production. Could you post screenshots of what results you now see for b and c? Along with the search params you're using?

Colin Tatro

07/21/2022, 2:43 PM

so this search for "javascript" with `query_by=title,shortDescription,description`

Colin Tatro

07/21/2022, 2:44 PM

we see that the text match score is exactly the same when the first record has all 3 fields matching and the second only has one

Colin Tatro

07/21/2022, 2:45 PM

would expect some difference in the score

Kishore Nallan

07/21/2022, 2:49 PM

Can you please copy + paste the content of the description field of the first result here?

Colin Tatro

07/21/2022, 2:52 PM

### Why learn JavaScript Errors and Debugging? This course will guide you through the basics of debugging and handling JavaScript errors to build a growth mindset approach to programming and prevent a crash in your applications! ### Outcomes: Learn how to debug your code and learn to predict and handle errors in your web applications. ### Note on Prerequisites: Intermediate JavaScript is a prerequisite, and you should be comfortable with arrays, objects, and looping through arrays.

Jason Bosco

07/21/2022, 3:22 PM

It looks like there might be some numeric overflow issues when assembling the text match score, which we’re looking into… But the results themselves should be sorted properly now.

Jason Bosco

07/21/2022, 3:23 PM

On a side note, the screenshot you posted above might be from 0.23.1. Could you make sure you're running this search against your production cluster? On 0.24.0.rc22, the scores for your query should start with a 5.

Colin Tatro

07/21/2022, 3:28 PM

this one is production, and the record where 3 fields match is ranked below the one where 2 fields match

Jason Bosco

07/21/2022, 3:33 PM

Could you open up the browser dev console, look at the network requests made to the `.../api/...` endpoint as you search, copy the last request as cURL, and DM it to me?

Jason Bosco

07/21/2022, 3:47 PM

Looks like you have `"query_by":"description,longDescription,organizationId"` in the query.

Jason Bosco

07/21/2022, 3:48 PM

Could you try changing it to `query_by=title,shortDescription,description` in the search parameters panel at the bottom of the page?
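A mix-up like this can be caught by comparing the fields in a captured request against the intended ones. A minimal TypeScript sketch, assuming the copied request body is JSON with a top-level `query_by` string like the snippet Jason quoted; `diffQueryBy` is a hypothetical helper, and a request body with a different shape would need adapting.

```typescript
// Hypothetical debugging helper: compares the query_by fields in a captured
// request body against the fields you intended to search.
interface QueryByDiff {
  missing: string[]; // intended fields absent from the request
  unexpected: string[]; // fields in the request that were not intended
}

function diffQueryBy(requestBody: string, intended: string[]): QueryByDiff {
  // Assumption: the body is JSON with a top-level "query_by" string.
  const body = JSON.parse(requestBody) as { query_by?: unknown } | null;
  if (body === null || typeof body.query_by !== "string") {
    throw new Error("request body has no top-level query_by string");
  }
  const actual = body.query_by
    .split(",")
    .map((field) => field.trim())
    .filter((field) => field.length > 0);
  return {
    missing: intended.filter((field) => actual.indexOf(field) === -1),
    unexpected: actual.filter((field) => intended.indexOf(field) === -1),
  };
}

const diff = diffQueryBy('{"query_by":"description,longDescription,organizationId"}', [
  "title",
  "shortDescription",
  "description",
]);
console.log(diff.missing.join(", ")); // title, shortDescription
console.log(diff.unexpected.join(", ")); // longDescription, organizationId
```

For the `query_by` Jason found, the check flags `title` and `shortDescription` as missing and `longDescription` and `organizationId` as unexpected, which matches the fix he suggests.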
