Rajaie
12/17/2024, 3:25 AM
text_match_type.
Proximity: Whether the query tokens appear verbatim or interspersed with other tokens in the field. Documents in which the query tokens appear right next to each other will be ranked above documents where the query tokens exist but are far apart in a text field.
But when I am querying a string field (title) for "*Android Mobile Phone*", the top 2 results have a title with an exact match of android mobile phone. Both have the same best_field_score.
The next 8 results all have the same score, text_match, and best_field_score values, which isn't expected given their values:
1. best android mobile phone
2. top android mobile phone
3. top android mobile & smart phone
4. android mobile phone - best mobile
I would expect #4 to have the highest score since the word mobile is mentioned twice (frequency bullet point from the above link), and #3 to be last because the words aren't right next to each other. #1 and #2 should be in positions #2 and #3.
Fanis Tharropoulos
12/17/2024, 7:53 AM
prioritize_token_position parameter and setting it to true will boost results that have the queried tokens appear earlier in the document
Rajaie
12/17/2024, 8:11 AM
Kishore Nallan
12/18/2024, 1:37 AM
best_field_score with the default text match type will pick the best score for the query among all fields queried. Is there another field where a direct verbatim match of android mobile phone occurs?
Otherwise, please share the full JSON response returned by Typesense.
Rajaie
12/18/2024, 5:42 AM
Rajaie
12/18/2024, 5:42 AM
android mobile phone
android mobile phone
top android mobile phone - best android mobile phone
top android mobile phone - best android mobile phone
top android mobile phone
best android mobile phone
top android mobile phone - best android mobile phone
I would expect
top android mobile phone
best android mobile phone
to be in last place since the current last result has 2 repetitions of android mobile phone.
Rajaie
12/18/2024, 5:44 AM
Rajaie
12/18/2024, 5:46 AM
# Search window: the last 30 days, as Unix timestamps
end_time="$(date +%s)"
day=$((60*60*24))
start_time=$(($end_time - 30*$day))
country="CA"
show_highlights=""
include_fields="title"
query_by_weights="1"
query_by="title"
# URL-encoded query: "site reliability engineer"
query="site%20reliability%20engineer"
# filter_by decodes to posting_date:[start_time..end_time]
curl -H 'X-TYPESENSE-API-KEY: xyz' "http://localhost:8108/collections/products/documents/search?q=$query&per_page=100&page=1&query_by=$query_by&highlight_fields=$show_highlights&include_fields=$include_fields&query_by_weights=$query_by_weights&filter_by=posting_date%3A%5B$start_time..$end_time%5D&enable_highlight_v1=false" | jq .
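A variant of the request above that also sets prioritize_token_position=true, the parameter suggested earlier in the thread, might look like the sketch below. The host, collection name, and API key are carried over from the script above, and the query is an assumption taken from the android mobile phone example. The URL is only printed here; the commented curl line shows how it could be sent to a running server.

```shell
# Sketch only: host, collection, and API key are placeholders from the script above.
base_url="http://localhost:8108/collections/products/documents/search"
query="android%20mobile%20phone"   # URL-encoded example query (assumption)
query_by="title"

# prioritize_token_position=true asks Typesense to boost hits where the
# queried tokens appear earlier in the field.
search_url="${base_url}?q=${query}&query_by=${query_by}&prioritize_token_position=true"

echo "$search_url"

# Against a running server:
# curl -H 'X-TYPESENSE-API-KEY: xyz' "$search_url" | jq .
```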
Rajaie
12/18/2024, 5:47 AM
Principal Site Reliability & Cloud Engineer - Americas to be in last place
Kishore Nallan
12/18/2024, 3:46 PM
since the current last result has 2 repetitions of android mobile phone
We don't count repetitions of a token because in many practical cases we found that doing so surfaces bad records with too many repeated words.
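To compare how individual hits were scored, the text_match and text_match_info fields that Typesense returns with each hit can be printed next to the title. The following is a minimal sketch under stated assumptions: show_scores is a hypothetical helper name, the key and URL in the example call are placeholders, and jq is assumed to be installed as in the script earlier in the thread.

```shell
# Sketch only: prints the scoring fields of each hit side by side.
# jq_filter keeps the title plus the text_match and text_match_info fields
# found on every hit in the search response.
jq_filter='.hits[] | {title: .document.title, text_match, text_match_info}'

# show_scores is a hypothetical helper: $1 is the API key, $2 the full search URL.
show_scores() {
  curl -s -H "X-TYPESENSE-API-KEY: $1" "$2" | jq "$jq_filter"
}

# Example call against a running server (placeholder key and URL):
# show_scores xyz "http://localhost:8108/collections/products/documents/search?q=android%20mobile%20phone&query_by=title"
```

Hits whose text_match_info objects are identical are tied on text relevance, so their relative order is decided by whatever tie-breaking sort applies to the search.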
Kishore Nallan
12/18/2024, 4:07 PM
Principal Site Reliability & Cloud Engineer - Americas to be in last place
This one still looks weird. I would need access to your dataset, or at least a small portion of it that can reproduce this issue. Maybe just the title field, which should not contain anything too sensitive to share.
Rajaie
12/18/2024, 4:31 PM
Rajaie
12/18/2024, 5:09 PM
Kishore Nallan
12/19/2024, 5:52 AM