#community-help

Adjusting Text Match Score Calculation in Typesense

TLDR Johannes wanted to modify the Text Match Score calculation in Typesense so that autocomplete suggestions matching near the start of a term rank first. Jason and Kishore Nallan suggested opening a GitHub issue, trying different field and parameter setups, and upgrading the Docker image to a newer build that adds position-based text match.

May 24, 2022 (19 months ago)
Johannes
11:45 AM
Hi #community, I was wondering if there is a possibility to modify the Text Match Score calculation. In Algolia I can do this here (see image). How does it work with TypeSense?
Jason
11:46 AM
Johannes
11:47 AM
Yes, I also found this. It explains it but how can I modify the ranking criteria?
Jason
11:51 AM
Using the sort_by parameter, along with these params:

https://typesense.org/docs/0.22.2/api/documents.html#ranking-parameters
Jason
11:51 AM
Could you give me an example of what you're trying to modify ranking-wise?
Johannes
11:59 AM
I want to show suggestions to the user in the form of an autocomplete. The suggestions are just simple documents with one attribute value. When I start typing, e.g. "a", I want terms starting with "a" to be shown first, even if other terms contain more occurrences of "a" that are not at the beginning.
Jason
12:03 PM
We don't account for word position in Typesense at the moment. Could you open a GitHub issue to track this, along with this example?
Johannes
12:04 PM
Oh, that's a surprise. I will create a GitHub issue.
Jason
12:27 PM
One workaround in the meantime would be to index the first word in a separate field, search on that for autocomplete, but at display time show the other field which has all the words.
Johannes
12:41 PM
Interesting suggestion, but I don't want to apply this only to the first word but to multiple words if they exist together.
Jason
01:03 PM
You could add both the first_word field and the full field in that order to query_by.

Could you try this on 0.23.0.rc69?
Johannes
01:18 PM
So the full_term field should contain the whole term or the whole term except the first word?
Jason
01:37 PM
Let's try whole term except the first word…
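A minimal sketch of the two-field workaround discussed above, assuming each suggestion is a single string. The `first_word` field name comes from Jason's message and `full_term` from Johannes's question; `display_term` and both function names are hypothetical, added only so the complete term can still be shown.

```javascript
// Split one suggestion term into the fields discussed in the thread.
function toTwoFieldDocument(term) {
  const words = term.trim().split(/\s+/);
  return {
    // Field searched first for autocomplete.
    first_word: words[0],
    // Per Jason's suggestion: the whole term except the first word.
    full_term: words.slice(1).join(" "),
    // Hypothetical extra field holding the complete term for display.
    display_term: term,
  };
}

// Search parameters with first_word listed before full_term in query_by.
function buildTwoFieldSearchParams(query) {
  return {
    q: query,
    query_by: "first_word,full_term",
  };
}

console.log(toTwoFieldDocument("Anterior part of the inferior surface of cerebrum"));
console.log(buildTwoFieldSearchParams("An"));
```

As the later messages show, this setup did not produce the ordering Johannes wanted, so treat it as an illustration of the attempt rather than a fix.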
May 25, 2022 (19 months ago)
Johannes
09:28 AM
Jason, I'm trying it out right now. Do I have to sort it in a specific way?
Jason
12:58 PM
Not sure yet, how does it look with the default sort?
Johannes
03:27 PM
With the suggested solution, I currently get these results, which still aren't correct.
Jason
03:59 PM
Do you have a code sandbox I can play around with?
Johannes
04:00 PM
Unfortunately, I don't have one for this example because I run Docker on my local machine.
Jason
04:03 PM
If you could maybe set up something like ngrok temporarily, that would work as well...
Johannes
04:04 PM
Thanks, I will have a look.
May 26, 2022 (19 months ago)
Johannes
09:42 AM
Jason Here it is. Using "An", for example, you will see that the results are not correct. I'm also not sure why it highlights the whole word instead of just the term I typed in. https://codesandbox.io/s/typesense-autocomplete-with-first-word-attribute-h7y1p1?file=/src/index.js
Jason
06:57 PM
Johannes Could you upgrade to 0.24.0.rc1 (this should be available on Docker Hub) and let me know? We made some changes to the relevance algorithms there, which I think will help with your use case. This build also has a change to highlighting, where single characters are highlighted instead of the whole word on match.
May 30, 2022 (19 months ago)
Johannes
11:54 AM
Jason I've now updated the Docker image to this new version. Unfortunately, the order of the results is still not correct.
Jason
07:10 PM
Johannes I set query_by: 'first_word' and these are the results I see. Does this line up with what you're looking to do?
May 31, 2022 (19 months ago)
Johannes
09:04 AM
Jason Yes, it is definitely an improvement but it would obviously fail when you continue writing a second word.
Jason
03:07 PM
Ah yes, my bad :man-facepalming:

Johannes Ok here's another way to do this: you want to index a new field called say "search_string" and then remove all spaces when you create this field, at indexing time.

So for eg, you would index "Anterior part of the inferior surface of cerebrum" as:

{
  search_string: "Anteriorpartoftheinferiorsurfaceofcerebrum",
  display_words: "Anterior part of the inferior surface of cerebrum"
}

And then set these search params:

{
  ...
  query_by: "search_string",
  highlight_full_fields: "display_words"
}
Jason
03:08 PM
We're essentially getting Typesense to treat the whole string as one word when searching, but then at display time we show a different field.
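A minimal sketch of this single-token workaround. The field names `search_string` and `display_words` and the `highlight_full_fields` parameter come from Jason's example; the function names are hypothetical. Stripping whitespace from the typed query as well is an assumption not stated in the thread: it is presumably needed so that multi-word input can still prefix-match the one long token.

```javascript
// Build the document shape from Jason's example: a space-free field to
// search on, plus the original text for display.
function toSingleTokenDocument(text) {
  return {
    search_string: text.replace(/\s+/g, ""),
    display_words: text,
  };
}

// Build search parameters for the space-free field.
// Assumption: the typed query is normalized the same way as the indexed field.
function buildSingleTokenSearchParams(typedQuery) {
  return {
    q: typedQuery.replace(/\s+/g, ""),
    query_by: "search_string",
    highlight_full_fields: "display_words",
  };
}

console.log(toSingleTokenDocument("Anterior part of the inferior surface of cerebrum"));
console.log(buildSingleTokenSearchParams("Anterior pa"));
```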
Johannes
03:13 PM
Ok, thanks I will test it. I just wonder why you don't add the word position to the search ranking?
Jason
03:13 PM
Just a matter of bandwidth (time / effort) 🙂
Jason
03:14 PM
We definitely want to support it
Johannes
03:17 PM
I understand. But I can't imagine that I'm the only one who would like to have this. It feels quite essential.
Jason
03:20 PM
We've had maybe 4 or 5 asks for it over the years, but it seemed like it wasn't important enough for anyone to document the ask in a GitHub issue (until you did recently). So I'd imagine not all use-cases require start-of-word prioritization specifically.
Johannes
03:25 PM
Hmm, I see. Just to clarify. It isn't only about the first word. It is more about the distance of the matching string to the beginning of the term. Anyway, you guys know better what is important for you.
Jason
03:26 PM
> It is more about the distance of the matching string to the beginning of the term.
Yup yup. That's the general use-case.
Jason
04:27 PM
Johannes Even if we had this feature, I'm wondering if that would help your specific use-case.

For eg, if there were two records with title:

"Function of the brain"
"Brain function"


and the search query is "Brai", this feature would rank the results as:

1. "Brain function"
2. "Function of the brain"
Since "Brain" appears earlier in the field in result #1.

Key thing is that word position is a ranking signal, and doesn't exclude any results. But in your use-case it sounds like you'd want to not show #2 at all, since it doesn't start with "Brai" in the first word right?
Jun 01, 2022 (19 months ago)
Johannes
07:41 AM
No, I also want to show #2. For example, when searching for "a", I would expect this order:

a b a b
a b b a
a b b b
b a b a
b a b b
b b a b
b b b a

Johannes
02:47 PM
Just another thought that I had today. Specifically for the suggestions feature I would expect this ranking as well

a b a
a b
b b a b a b a

Basically, the distance of the first matching word from the start is what matters most. Only when that distance is the same should the number, or even the distances, of the other matching words be taken into account.
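The ordering Johannes describes can be sketched as a client-side comparator. This only illustrates the expected ranking and is not how Typesense computes its text match score; all function names are hypothetical. Each term is reduced to the list of word positions that start with the query, and the lists are compared element by element, with a missing position counting as infinitely far away.

```javascript
// Positions (0-based word indexes) of the words that start with the query.
function matchPositions(term, query) {
  const prefix = query.toLowerCase();
  const positions = [];
  term.toLowerCase().split(/\s+/).forEach((word, index) => {
    if (word.startsWith(prefix)) positions.push(index);
  });
  return positions;
}

// Compare two position lists element by element. A missing position is
// treated as Infinity, so an earlier first match wins, and further matches
// only break ties between terms whose earlier positions are equal.
function comparePositions(a, b) {
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const left = i < a.length ? a[i] : Infinity;
    const right = i < b.length ? b[i] : Infinity;
    if (left !== right) return left < right ? -1 : 1;
  }
  return 0;
}

// Keep only terms with at least one match and order them as described.
function rankSuggestions(terms, query) {
  return terms
    .map((term) => ({ term, positions: matchPositions(term, query) }))
    .filter((entry) => entry.positions.length > 0)
    .sort((x, y) => comparePositions(x.positions, y.positions))
    .map((entry) => entry.term);
}

console.log(rankSuggestions(["b b a b a b a", "a b", "a b a"], "a"));
```

Applied to Jason's earlier example, the same comparator puts "Brain function" ahead of "Function of the brain" for the query "Brai" while still returning both.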
Jason
05:37 PM
I think this should already be covered in how we're thinking about this feature. I'll keep you posted.

Jun 07, 2022 (19 months ago)
Kishore Nallan
09:41 AM
I've a build available for testing position based text match. Do you have a local dev environment setup that you can test with or do you use Typesense Cloud?
Johannes
01:26 PM
Hi Kishore Nallan, yes I have a local docker instance running which I made publicly available with a service.
Jun 08, 2022 (19 months ago)
Kishore Nallan
01:02 PM
Johannes This is available in typesense/typesense:0.24.0.rc2 Docker build. You need to send a prioritize_token_position=true flag to the search query to enable this feature.
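A minimal sketch of a search request path with the flag enabled. The `prioritize_token_position` parameter is the one Kishore names; the collection name `suggestions`, the field name `term`, and the function name are hypothetical placeholders.

```javascript
// Build the path and query string for a search request that enables
// position-based text match.
function buildPositionAwareSearchPath(query) {
  const params = new URLSearchParams({
    q: query,
    query_by: "term", // hypothetical field name
    prioritize_token_position: "true",
  });
  // "suggestions" is a hypothetical collection name.
  return `/collections/suggestions/documents/search?${params.toString()}`;
}

console.log(buildPositionAwareSearchPath("ea"));
```

The resulting path would be appended to the server's base URL and sent with the usual API key header.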
Johannes
01:02 PM
Perfect, I will test it as soon as possible. Thanks!

Jun 09, 2022 (19 months ago)
Johannes
08:39 AM
Kishore Nallan It already looks very good. Well done! I just found one issue. Have a look at this picture. Shouldn't "Ear" be listed as the first result? Also, "Outer ear" should rank higher than "Uterus, early proliferative phase", no?
Kishore Nallan
09:17 AM
Yes, the change I made only takes positional information into account when that flag is set. To make "ear" rank first, we would also have to consider shorter text to be more relevant than longer text.
Johannes
09:18 AM
Ah, I see.
Kishore Nallan
09:39 AM
This is not easy to do at the moment because we don't store the length of all the fields for each document and the inverted index only contains positional information for each word in the field.
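To illustrate the idea of treating shorter text as more relevant, here is a purely client-side sketch that re-orders results that have already been fetched. It is not something Typesense does, per the message above, and the function name and word-splitting rule are assumptions. Results are ordered by the position of the first word starting with the query, and ties are broken by word count.

```javascript
// Re-rank fetched terms: earliest matching word first, then fewer words first.
function rankByPositionThenLength(terms, query) {
  const prefix = query.toLowerCase();
  return terms
    .map((term) => {
      // Split on anything that is not a letter or digit (assumed tokenization).
      const words = term.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
      return {
        term,
        first: words.findIndex((word) => word.startsWith(prefix)),
        length: words.length,
      };
    })
    .filter((entry) => entry.first >= 0)
    .sort((x, y) => x.first - y.first || x.length - y.length)
    .map((entry) => entry.term);
}

console.log(
  rankByPositionThenLength(["Uterus, early proliferative phase", "Outer ear", "Ear"], "ear")
);
```

With the three terms from Johannes's screenshot question, this yields "Ear", then "Outer ear", then "Uterus, early proliferative phase".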
