Issues with Text Match Info and Split Tokens
TLDR Dima reported weird ranking and confusion with `text_match_info`. Kishore Nallan clarified that split tokens don't do prefix searches but have higher rank due to more matched words. Suggested creating a GitHub issue for further investigation.
Apr 17, 2023 (5 months ago)
`text_match_info`. Got weird ranking; I'm trying to research a problem and this field looks like a good source of debug info, but I'm not sure how to read it
`<mark>T</mark>arget <mark>esti</mark>mation` somewhere on the first page for
Kishore Nallan 02:23 PM
Kishore Nallan 03:02 PM
`split_join_tokens: always` I got `esti` at the first place for `q: test` while I also have a full match in places 4-5 (
• I expect that `split_join_tokens` will not find `est` but without prefix search
• I expect that full match will get more score than split one
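For anyone reproducing the setup above, here is a minimal sketch of the search request. The collection name `products` and the `title` field are hypothetical; only `q: test` and `split_join_tokens: always` come from the thread. The path follows Typesense's documented search route, and the three accepted values are the ones the Typesense docs list for this parameter.

```python
from urllib.parse import urlencode

# Hypothetical names, not from the thread.
COLLECTION = "products"
QUERY_BY = "title"

# Values the Typesense docs list for split_join_tokens.
ALLOWED = ("off", "fallback", "always")


def build_search_path(query: str, split_join_tokens: str = "always") -> str:
    """Build the request path and query string for a search call."""
    if split_join_tokens not in ALLOWED:
        raise ValueError(f"split_join_tokens must be one of {ALLOWED}")
    params = {
        "q": query,
        "query_by": QUERY_BY,
        "split_join_tokens": split_join_tokens,
    }
    return f"/collections/{COLLECTION}/documents/search?{urlencode(params)}"


path = build_search_path("test")
```

Running the same query with `off` and then `always` and comparing `text_match_info` on the top hits is one way to see which results only appear because of token splitting.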
Kishore Nallan 03:12 PM
`test` keyword 🐸 Will add it to the example
`basketball` is one matched token, while `basket ball` are two matched tokens, so the second version will always have more match score if `split_join_tokens` is set to `always`.
From my point of view both `basketball` and `basket ball` should have the same weight, but it looks very hard to implement if we add something like `he *basket* his *ball*`
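The token-count effect described above can be illustrated with a toy counter. This is only a sketch of the idea that more matched tokens means a higher score; it is not Typesense's actual `text_match` formula, and the function name and sample texts are made up for illustration.

```python
def matched_token_count(query_tokens, text):
    """Count how many query tokens appear as whole words in the text."""
    words = text.lower().split()
    return sum(1 for token in query_tokens if token in words)


# Query as typed: a single token.
joined = matched_token_count(["basketball"], "indoor basketball court")

# Same query after splitting: two tokens, each matching a separate word.
split = matched_token_count(["basket", "ball"], "he put the basket near his ball")

print(joined, split)
```

Under this simplification the split form always counts one more match than the joined form, which is why a document containing unrelated `basket` and `ball` words can outrank an exact `basketball` hit.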
Apr 28, 2023 (5 months ago)
Kishore Nallan 11:14 AM
Just got a chance to look into this in more detail. Split tokens don't do prefix searches. However, while highlighting we end up highlighting the split word if it is present as a prefix in the text. For example, `test` could be split as `t + est`, and if there is a word like `estimation` in the text then that gets highlighted as `<mark>est</mark>imation`. However, we don't "search" for `estimation`; we just highlight it if present.
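A rough sketch of the behaviour described in this message: splitting a token into two parts, then marking any word in the text that starts with one of those parts. This is an illustration under assumptions, not Typesense's highlighter; both function names are hypothetical.

```python
import re


def candidate_splits(token):
    """Return every way to cut a token into two non-empty parts."""
    return [(token[:i], token[i:]) for i in range(1, len(token))]


def highlight_prefixes(text, parts):
    """Wrap the leading part of each word in <mark> if it starts with a split part."""
    def mark(match):
        word = match.group(0)
        # Try longer parts first so "est" wins over "t" where both apply.
        for part in sorted(parts, key=len, reverse=True):
            if word.lower().startswith(part.lower()):
                return f"<mark>{word[:len(part)]}</mark>{word[len(part):]}"
        return word
    return re.sub(r"\w+", mark, text)


splits = candidate_splits("test")
snippet = highlight_prefixes("Target estimation", ("t", "est"))
print(snippet)
```

The point of the sketch is that the marking step looks at prefixes of words already in the returned text, so a highlight on `estimation` does not by itself show that `estimation` was searched for.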
Kishore Nallan 11:15 AM
Phrase Search Relevancy and Weights Fix
Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.
Inconsistent Search Results in Typesense UI vs Dashboard
Abhishek reports inconsistent search results in the Typesense UI vs dashboard integration when using page rank with the Docusaurus plugin. Jason suggests creating a GitHub issue while Abhishek seeks clarification on prioritizing exact matches.
Query on "weighted_score" & Issue with Synonym Highlighting
Stefan asked about "weighted_score" field and reported a possible synonym highlighting issue. Kishore Nallan clarified the use of "weighted_score". The possible synonym issue is still being investigated.
Issues With `text_match` Scoring for Search Queries in Typesense
Colin encountered issues with the `text_match` scoring on Typesense v0.23.1. Jason and Kishore Nallan identified a potential issue with numeric overflow in the text match score and applied an unverified patch. The final resolution is unclear.
Issues with Repeated Words and Hyphen Queries in Typesense API
JinW discusses issues with repeated word queries and hyphen-containing queries in Typesense. Kishore Nallan offers possible solutions. During the discussion, Mr seeks advice on `token_separators` and how to send custom headers. Issues remain with repeated word queries.