# community-help
c
Hello 👋 I've got a question about how `text_match` is calculated on search 🧵
if I'm searching "fox" with the options `{ query_by: 'fieldA, fieldB',  query_by_weights: '20,1' }` it looks like it's calculating the same `text_match` score for all the records returned here in the query.
my expectation here is that the second record we see here would have a higher `text_match` score, since we have more occurrences of the search term in `fieldA`. Is there a way to make frequency affect the `text_match` score?
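For reference, the search described above can be written out as request parameters. This is a minimal sketch using only the Python standard library. The collection name `records` is a placeholder that does not appear in the thread, and the path assumes Typesense's documented `/collections/{collection}/documents/search` endpoint.

```python
from urllib.parse import urlencode

# Search parameters from the question above (whitespace in the field list dropped).
search_params = {
    "q": "fox",
    "query_by": "fieldA,fieldB",
    "query_by_weights": "20,1",
}


def build_search_path(collection: str, params: dict) -> str:
    """Return the path and query string for a documents search request."""
    return f"/collections/{collection}/documents/search?{urlencode(params)}"


# "records" is a hypothetical collection name used only for illustration.
print(build_search_path("records", search_params))
```

The weights are positional: `20` applies to `fieldA` and `1` to `fieldB`, in the same order as `query_by`.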
j
May I know which version of Typesense this is on?
c
Typesense v0.23.1
k
The weights are not added across fields but are used for prioritizing. If 2 records already match fully against the query on the highest weighted field, their text match score will be the same. Prior to 0.23 we were summing up weights across fields, but it led to various edge cases and noisy results.
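The difference between the two behaviors can be illustrated with a toy model. This is only a sketch of the idea described above, not Typesense's actual scoring formula: the function names, the whitespace tokenization, and the sample records are all assumptions made for illustration.

```python
# Toy model only: NOT Typesense's real text_match computation.

def field_match(query: str, text: str) -> int:
    """Return 1 if every query token occurs in the field text, else 0."""
    tokens = set(text.lower().split())
    return int(all(t in tokens for t in query.lower().split()))


def best_field_score(query: str, record: dict, weights: dict) -> int:
    """Score taken from the single best matching field (weights only prioritize)."""
    return max(w * field_match(query, record.get(f, "")) for f, w in weights.items())


def summed_score(query: str, record: dict, weights: dict) -> int:
    """Score that adds up the weight of every matching field."""
    return sum(w * field_match(query, record.get(f, "")) for f, w in weights.items())


weights = {"fieldA": 20, "fieldB": 1}
one_field = {"fieldA": "the quick fox", "fieldB": "a lazy dog"}
two_fields = {"fieldA": "fox and fox again", "fieldB": "another fox"}

# Best-field scoring: both records match on fieldA, so they tie.
print(best_field_score("fox", one_field, weights), best_field_score("fox", two_fields, weights))
# Summed scoring: the record matching in both fields pulls ahead.
print(summed_score("fox", one_field, weights), summed_score("fox", two_fields, weights))
```

Under the best-field model both records tie because each matches fully on `fieldA`; only the summed model separates them.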
c
Is there any chance of getting this re-enabled? The way we need to rank our content would rely on making frequency of search term factor into the `text_match` score.
k
Can you post a real-world example that shows exactly the behavior you want? For example, do you just want repeated tokens across fields to be counted, or do you have any other requirements?
c
sure so for example here:
I would expect this search to rank the second result above the first, since more fields match and the term occurs more often.
k
Got it, thanks. Let me look into what kind of effort we need to make this configurable and get back to you. Right now there is no way to have the old behavior.
👍 1
m
@Kishore Nallan is this current behavior really intended? How can they have the same text_match score when one of the results matches across more fields? Here's another example that highlights the problem: the two results have the same text_match score despite "description" being designated to receive higher weight.
r
do you mind setting up a zoom call for us all to sync about this issue? @Jason Bosco @Kishore Nallan? this is a pretty big blocker for us
k
Jason will contact Rebecca, but in the meantime, I will summarize the 3 different issues in discussion here:
a) The very first example in the thread involving the `fox` query: Typesense does not count individual occurrences of the tokens, since that caused relevancy issues due to keyword stuffing in real-world data sets that can be noisy.
b) The second example involving the `javascript` query: since Typesense derives a text match score from the best matched field of a record, the scores are the same here. This is something that I agree is not always ideal, so we have to see how we can support a more fine-grained scoring that considers additional fields that match.
c) The `java` query: I have to check what's happening here as the category field is weighted lower so that record must appear ahead.
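Point a) can be illustrated with a small sketch comparing presence-based and frequency-based matching. The function names and sample texts are hypothetical, and neither function is Typesense's actual implementation.

```python
# Illustration of why counting occurrences rewards keyword stuffing.
# Both functions are toy stand-ins, not Typesense internals.

def presence_score(token: str, text: str) -> int:
    """1 if the token appears at least once in the text, else 0."""
    return int(token in text.lower().split())


def frequency_score(token: str, text: str) -> int:
    """Number of times the token appears in the text."""
    return text.lower().split().count(token)


honest = "a short story about a fox"
stuffed = "fox fox fox fox buy now fox"

# Presence-based: both texts score the same.
print(presence_score("fox", honest), presence_score("fox", stuffed))
# Frequency-based: the stuffed text wins purely by repetition.
print(frequency_score("fox", honest), frequency_score("fox", stuffed))
```

With frequency counting, the stuffed text outranks the honest one by repetition alone, which is the kind of noise described in a).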
👍 1
m
thanks for investigating 🙇
j
We have a potential fix that addresses b) and c). Would you be ok with us applying the patch on your cluster once it’s ready in an RC build?
r
yep, we haven’t been able to launch this to production users yet so a patch should be fine
👍 1
j
Will keep you posted, RC should be ready in 24 hours
❤️ 2
We've applied the patch to your cluster. Could you try now?
👀 1
c
@Jason Bosco I'm not seeing a difference in behavior, did you patch production?
j
Yup production. Could you post screenshots of what results you now see for b and c? Along with the search params you're using?
c
so this search for "javascript" with `query_by=title,shortDescription,description`: we see that the text match score is exactly the same when the first record has all 3 fields matching and the second only has one
would expect some difference in the score
k
Can you please copy + paste the content of the description field of the first result here?
c
### Why learn JavaScript Errors and Debugging? This course will guide you through the basics of debugging and handling JavaScript errors to build a growth mindset approach to programming and prevent a crash in your applications! ### Outcomes: Learn how to debug your code and learn to predict and handle errors in your web applications. ### Note on Prerequisites: Intermediate JavaScript is a prerequisite, and you should be comfortable with arrays, objects, and looping through arrays.
j
It looks like there might be some numeric overflow issues when assembling the text match score, which we’re looking into… But the results themselves should be sorted properly now.
On a side note, the screenshot you posted above might be from 0.23.1. Could you make sure you’re running this search against your production cluster? On 0.24.0.rc22, the scores for your query should start with a 5
c
this one is production, and the record where 3 fields match is ranked below one where 2 fields match
j
Could you open up the browser dev console, look at the network requests generated to the `.../api/...` endpoint as you search, copy the last request as curl and DM it to me?
👍 1
Looks like you have `"query_by":"description,longDescription,organizationId"` in the query
Could you try changing it to query_by=title,shortDescription,description in the search parameters panel at the bottom of the page?
👍 1