Issues with Text Match Info and Split Tokens
TLDR Dima reported weird ranking and confusion with `text_match_info`. Kishore Nallan clarified that split tokens don't do prefix searches but have a higher rank due to more matched words. Suggested creating a GitHub issue for further investigation.
Apr 17, 2023 (5 months ago)
Dima
02:15 PM `text_match_info`. Got weird ranking; I’m trying to research the problem, and this field looks like a good source of debug info, but I’m not sure how to read it.
Dima
02:17 PM `<mark>T</mark> arget <mark>esti</mark>mation` somewhere on the first page for the `test` keyword
Kishore Nallan
02:23 PM
Dima
02:23 PM
Kishore Nallan
03:02 PM
Dima
03:06 PM With `split_join_tokens: always` I got `T` + `esti` in first place for `q: test`, while I also have a full match in places 4-5 (`<mark>Test</mark> cards`).
Dima
03:08 PM
• I expect that `split_join_tokens` will not find `T` + `estimates`; maybe `T` + `est`, but without prefix search.
• I expect that a full match will get a higher score than a split one.
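For readers following along, the setup being debugged can be sketched roughly as below. This is a minimal illustration, not the exact request from the thread: `q` and `split_join_tokens` come from the messages above, while `query_by`, the field name `title`, the helper `summarize_hits`, and the `tokens_matched` values in the stand-in response are assumptions made for the example.

```python
# Parameters for the search discussed in the thread. `q` and
# `split_join_tokens` come from the messages above; `query_by` and the
# field name "title" are assumptions for illustration.
search_parameters = {
    "q": "test",
    "query_by": "title",
    "split_join_tokens": "always",
}


def summarize_hits(response):
    """Return (first highlight snippet, text_match_info) per hit, in ranked order."""
    rows = []
    for hit in response.get("hits", []):
        highlights = hit.get("highlights") or []
        snippet = highlights[0].get("snippet") if highlights else None
        rows.append((snippet, hit.get("text_match_info")))
    return rows


# Hand-written stand-in for a response, NOT real server output. The snippets
# are taken from the thread; the `tokens_matched` values are made-up placeholders.
sample_response = {
    "hits": [
        {
            "highlights": [
                {"field": "title", "snippet": "<mark>T</mark> arget <mark>esti</mark>mation"}
            ],
            "text_match_info": {"tokens_matched": 2},
        },
        {
            "highlights": [
                {"field": "title", "snippet": "<mark>Test</mark> cards"}
            ],
            "text_match_info": {"tokens_matched": 1},
        },
    ]
}

for snippet, info in summarize_hits(sample_response):
    print(snippet, info)
```

Printing the snippet next to `text_match_info` for each hit makes it easier to see which hits were ranked on a split match and which on a full match.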
Kishore Nallan
03:12 PM
Dima
05:02 PM `test` keyword 🐸 Will add it to the example
Dima
06:16 PM
Dima
08:05 PM `basketball` is one matched token, while `basket ball` is two matched tokens, so the second version will always have a higher match score if `split_join_tokens` is set to `always`: https://gist.github.com/b0g3r/69a2268cc0965ce706a06b8d7ae108e1
From my point of view, both `basketball` and `basket ball` should have the same weight, but that looks very hard to implement once we add something like he *basket* his *ball*.
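The counting argument in this message can be shown with a toy model. This is only a sketch of the reasoning, not Typesense's scoring code; the function `matched_token_count` and the sample texts are hypothetical.

```python
def matched_token_count(query_tokens, text):
    """Toy model: count how many query tokens appear as whole words in text."""
    words = text.lower().split()
    return sum(1 for token in query_tokens if token.lower() in words)


# The query as typed matches one token; the split form matches two.
joined = matched_token_count(["basketball"], "basketball shoes")
split = matched_token_count(["basket", "ball"], "basket ball shoes")

# The split form also matches two tokens when the words are not adjacent,
# which is why treating it as equal to the joined form is hard.
scattered = matched_token_count(["basket", "ball"], "he basket his ball")

print(joined, split, scattered)
```

Under this simplified count, any ranking that rewards the number of matched tokens would place the split match above the joined one.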
Apr 28, 2023 (5 months ago)
Kishore Nallan
11:14 AM Just got a chance to look into this in more detail. Split tokens don't do prefix searches. However, while highlighting, we end up highlighting the split word if it is present as a prefix in the text. For example, `test` could be split as `t` + `est`, and if there is a word like `estimation` in the text, then that gets highlighted as `<mark>est</mark>imation`. However, we don't "search" for `estimation`; we just highlight it if present.
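The behaviour described here, where the split parts are used for highlighting but not for prefix search, can be imitated with a small toy. This is a sketch under assumptions and not Typesense's highlighter; `split_candidates` and `highlight_prefixes` are hypothetical helpers.

```python
import re


def split_candidates(token):
    """All two-part splits of a token, e.g. 'test' -> ('t', 'est'), ('te', 'st'), ('tes', 't')."""
    return [(token[:i], token[i:]) for i in range(1, len(token))]


def highlight_prefixes(text, parts):
    """Wrap the start of each word in <mark> when the word begins with one of the parts."""
    def mark(match):
        word = match.group(0)
        # Try longer parts first so 'est' wins over 't' where both would apply.
        for part in sorted(parts, key=len, reverse=True):
            if word.lower().startswith(part.lower()):
                return f"<mark>{word[:len(part)]}</mark>{word[len(part):]}"
        return word

    return re.sub(r"\w+", mark, text)


print(split_candidates("test"))
print(highlight_prefixes("estimation", ("t", "est")))
```

The highlighter marks `est` inside `estimation` even though nothing searched for `estimation`, which matches the explanation that the highlight alone does not prove a prefix match took place.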
Kishore Nallan
11:15 AM