#community-help

Docker Upgrade and Indexing Data Issues for Travel App

TLDR The thread discussed upgrading Typesense (via Docker or a DEB package) while retaining indexed data, and addressed search result ranking issues in a travel app with collections for attractions, destinations, countries, and users. Kishore Nallan provided guidance on adjusting query parameters and weights to improve search outcomes.

Solved
Jun 17, 2021
Robert
09:50 AM
Is there any way to upgrade to an RC with Docker while keeping the indexed data?
Kishore Nallan
09:54 AM
Yes, simply point the new Docker image to the same data directory. Typesense will start, re-index the on-disk data and be ready. There will be a small downtime during this restart process.
Robert
09:55 AM
thanks!
Robert
09:56 AM
can we install the latest rc without docker too?
Robert
09:56 AM
on a linux server
Kishore Nallan
10:05 AM
Do you need an RPM or a DEB?
Robert
10:24 AM
DEB
Kishore Nallan
Robert
10:25 AM
thanks!
Kishore Nallan
10:25 AM
Welcome, let me know how it works.
Robert
10:26 AM
how should we install it and keep the index?
Kishore Nallan
10:30 AM
1. Take a backup of the data dir just in case.
2. Stop service.
3. Install deb with:
apt -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" -y install ./new-typesense.deb
Kishore Nallan
10:30 AM
It ensures that if a config file already exists on disk, it is reused.
Robert
10:31 AM
thanks!
Robert
11:54 AM
So we installed the latest RC. The issue that we have is this:
• we have a travel app with these collections indexed:
◦ attractions
◦ destinations
◦ countries
◦ users
Robert
11:55 AM
when I search for “Paris” I expect to see Paris (from the destinations collection) displayed first. However, somehow the response from Typesense puts these collections in the same order every time (attractions -> destinations -> countries -> users)
Robert
11:56 AM
Robert
11:56 AM
this is how we get the results
Kishore Nallan
11:56 AM
Are you querying across multiple collections using multi_search?
Robert
11:57 AM
Ioan-Andrei can answer this
Ioan-Andrei
11:57 AM
we don't, we query each collection and merge the results
Kishore Nallan
11:58 AM
When you merge the results, how do you sort them? Based on the text_match score value?
Ioan-Andrei
11:58 AM
yes
Ioan-Andrei
11:59 AM
desc by text_match, assuming the biggest score is the most accurate
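A minimal sketch of the merge step described here, assuming each per-collection search returns an array of hits that carry a numeric text_match. The helper name mergeHits and the input shape are hypothetical; the two scores are copied from the response pasted later in this thread.

```javascript
// Hypothetical helper: merge hits from several per-collection searches
// and sort them by text_match, highest score first.
function mergeHits(resultsByType) {
  const merged = [];
  for (const [type, hits] of Object.entries(resultsByType)) {
    for (const hit of hits) {
      // Tag each hit with the collection type it came from.
      merged.push({ ...hit, type });
    }
  }
  // Array.prototype.sort is stable, so ties keep their per-collection order.
  return merged.sort((a, b) => b.text_match - a.text_match);
}

const merged = mergeHits({
  attraction: [
    { document: { attraction_name: 'Mosquee de Paris' }, text_match: 2203368317191 },
  ],
  destination: [
    { document: { destination_name: 'Paris' }, text_match: 1103840043779 },
  ],
});

console.log(merged.map((hit) => hit.type)); // [ 'attraction', 'destination' ]
```

With these scores the attraction outranks the destination, which is the ordering the rest of the thread investigates.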
Kishore Nallan
12:00 PM
The issue here is that the query paris is not an exact match with paris, france. Typesense does not rank shorter strings ahead of longer ones, i.e. we only look at the number of tokens matched, whether there are typos, and the number of fields in a record that match the query.

Exact matching requires an exact match of the token, i.e. the query paris will match a field with the string Paris.
Ioan-Andrei
12:00 PM
would multi search be a better option?
Kishore Nallan
12:01 PM
No, multi search will just return independent per-collection results -- it just parallelizes the query.
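For reference, a multi_search request is a list of per-collection searches sent in one call. The sketch below only builds that request body. The helper buildMultiSearch and the collection names attractions and destinations are assumptions based on the collections listed earlier; the query_by fields are taken from later messages in the thread.

```javascript
// Hypothetical helper: build the body of a multi_search request.
// Each entry is an ordinary search scoped to one collection.
function buildMultiSearch(q) {
  return {
    searches: [
      { collection: 'attractions', q, query_by: 'attraction_name' },
      { collection: 'destinations', q, query_by: 'destination_name' },
      { collection: 'countries', q, query_by: 'name' },
    ],
  };
}

const body = buildMultiSearch('paris');
console.log(body.searches.length); // 3

// With the typesense-js client, this body would be passed to
// client.multiSearch.perform(body). The response holds one result set per
// entry, in the same order, so ranking across collections is still left
// to the caller.
```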
Ioan-Andrei
12:02 PM
just a heads up, the name of the destination is just "Paris"
Ioan-Andrei
12:02 PM
in the screenshot above, he just appended the country name as well
Ioan-Andrei
12:02 PM
the indexed value is "Paris"
Kishore Nallan
12:03 PM
I see. In that case, can you give me a sample data set where this problem can be illustrated?
Kishore Nallan
12:04 PM
We have other customers using this exact match feature in RC so it might be some other issue at play here.
Ioan-Andrei
12:04 PM
is giving you an exact response JSON ok?
Kishore Nallan
12:04 PM
Yes, exact JSON response of a search against a single collection is fine.
Ioan-Andrei
12:05 PM
[
    {
        "document": {
            "attraction_name": "Mosquee de Paris",
            "coordinates": "nan",
            "country_id": "82",
            "country_name": "France",
            "destination_name": "Paris",
            "id": "880",
            "parent_destination_name": "Lhasa"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Paris"
                ],
                "snippet": "<mark>Paris</mark>"
            },
            {
                "field": "attraction_name",
                "matched_tokens": [
                    "Paris"
                ],
                "snippet": "Mosquee de <mark>Paris</mark>"
            }
        ],
        "text_match": 2203368317191,
        "type": "attraction"
    },
    {
        "document": {
            "attraction_name": "The Paris Catacombs",
            "coordinates": "nan",
            "country_id": "82",
            "country_name": "France",
            "destination_name": "Paris",
            "id": "906",
            "parent_destination_name": "Lhasa"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Paris"
                ],
                "snippet": "<mark>Paris</mark>"
            },
            {
                "field": "attraction_name",
                "matched_tokens": [
                    "Paris"
                ],
                "snippet": "The <mark>Paris</mark> Catacombs"
            }
        ],
        "text_match": 2203368317191,
        "type": "attraction"
    },
    {
        "document": {
            "attraction_name": "Eglise Saint-Etienne-du-Mont de Paris",
            "coordinates": "nan",
            "country_id": "82",
            "country_name": "France",
            "destination_name": "Paris",
            "id": "831",
            "parent_destination_name": "Lhasa"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Paris"
                ],
                "snippet": "<mark>Paris</mark>"
            },
            {
                "field": "attraction_name",
                "matched_tokens": [
                    "Paris"
                ],
                "snippet": "Eglise Saint-Etienne-du-Mont de <mark>Paris</mark>"
            }
        ],
        "text_match": 2203368317191,
        "type": "attraction"
    },
    {
        "document": {
            "coordinates": "2.3522,48.8566",
            "country_id": "82",
            "country_name": "France",
            "destination_name": "Paris",
            "id": "42",
            "parent_destination_name": "Lhasa"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Paris"
                ],
                "snippet": "<mark>Paris</mark>"
            }
        ],
        "text_match": 1103840043779,
        "type": "destination"
    },
    {
        "document": {
            "coordinates": "25.160855,37.080582",
            "country_id": "92",
            "country_name": "Greece",
            "destination_name": "Paros",
            "id": "986"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Paros"
                ],
                "snippet": "<mark>Paros</mark>"
            }
        ],
        "text_match": 4328350465,
        "type": "destination"
    },
    {
        "document": {
            "coordinates": "10.3280833,44.8013678",
            "country_id": "114",
            "country_name": "Italy",
            "destination_name": "Parma",
            "id": "676"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Parma"
                ],
                "snippet": "<mark>Parma</mark>"
            }
        ],
        "text_match": 4328284929,
        "type": "destination"
    },
    {
        "document": {
            "id": "211",
            "name": "Sri Lanka"
        },
        "highlights": [
            {
                "field": "name",
                "matched_tokens": [
                    "Sri"
                ],
                "snippet": "<mark>Sri</mark> Lanka"
            }
        ],
        "text_match": 33317888,
        "type": "country"
    },
    {
        "document": {
            "id": "196",
            "name": "San Marino"
        },
        "highlights": [
            {
                "field": "name",
                "matched_tokens": [
                    "Marino"
                ],
                "snippet": "San <mark>Marino</mark>"
            }
        ],
        "text_match": 33317888,
        "type": "country"
    },
    {
        "document": {
            "full_name": "LarisaNegreanu",
            "id": "98",
            "username": "larisa.negreanu"
        },
        "highlights": [
            {
                "field": "username",
                "matched_tokens": [
                    "larisa.negreanu"
                ],
                "snippet": "<mark>larisa.negreanu</mark>"
            },
            {
                "field": "full_name",
                "matched_tokens": [
                    "LarisaNegreanu"
                ],
                "snippet": "<mark>LarisaNegreanu</mark>"
            }
        ],
        "text_match": 4344668419,
        "type": "user"
    },
    {
        "document": {
            "full_name": "LarisaNegreanu",
            "id": "110",
            "username": "larisanegreanu"
        },
        "highlights": [
            {
                "field": "username",
                "matched_tokens": [
                    "larisanegreanu"
                ],
                "snippet": "<mark>larisanegreanu</mark>"
            },
            {
                "field": "full_name",
                "matched_tokens": [
                    "LarisaNegreanu"
                ],
                "snippet": "<mark>LarisaNegreanu</mark>"
            }
        ],
        "text_match": 4344668419,
        "type": "user"
    },
    {
        "document": {
            "full_name": "MariusIonescu",
            "id": "285",
            "username": "marius.ionescu"
        },
        "highlights": [
            {
                "field": "username",
                "matched_tokens": [
                    "marius.ionescu"
                ],
                "snippet": "<mark>marius.ionescu</mark>"
            },
            {
                "field": "full_name",
                "matched_tokens": [
                    "MariusIonescu"
                ],
                "snippet": "<mark>MariusIonescu</mark>"
            }
        ],
        "text_match": 4344471811,
        "type": "user"
    }
]
Kishore Nallan
12:05 PM
If I can't find anything obvious, I might still need a representative dataset that reproduces the issue so I can debug further locally.
Ioan-Andrei
12:06 PM
here is our response after querying multiple collections, getting the top responses, then merging the results and sorting by text_match
Kishore Nallan
12:06 PM
Can you also please tell me your exact query?
Ioan-Andrei
12:06 PM
one sec
Ioan-Andrei
12:06 PM
from each collection?
Ioan-Andrei
12:07 PM
all of them are
Kishore Nallan
12:07 PM
Which record in that JSON would you like to appear first?
Ioan-Andrei
12:07 PM
```javascript
const searchParameters = {
  q: key,
  query_by: 'attraction_name, destination_name, parent_destination_name',
};
```
Ioan-Andrei
12:07 PM
just in different collections
Kishore Nallan
12:08 PM
I think the issue is with multi-field matching.
Ioan-Andrei
12:08 PM
for countries it is just
```javascript
const searchParameters = {
  q: key,
  query_by: 'name',
};
```
Ioan-Andrei
12:08 PM
in the countries collection
Ioan-Andrei
12:08 PM
```javascript
typeClient
  .collections('countries')
  .documents()
  .search(searchParameters);
```
Kishore Nallan
12:08 PM
Yes, the issue is that in the countries collection you can have a match on only a single field, but when you query the other collections there can be many fields that can match.
Kishore Nallan
12:09 PM
For example, the Mosquee de Paris record contains 2 fields which have the word paris
Kishore Nallan
12:10 PM
Which is why the match score is higher than the Paris, France record.
Kishore Nallan
12:11 PM
You can try using the query_by_weights parameter to set a much higher weight when querying countries collection.
Ioan-Andrei
12:14 PM
this would be just query_by_weights: 1, in this case where we are querying for only one field
Kishore Nallan
12:16 PM
It would be query_by_weights: 10 when querying a single field but something like query_by_weights: 4,3,2 when querying multiple fields. The values will depend on your exact use case. The basic gist is using weights to control relative popularity.
Kishore Nallan
12:16 PM
One can say that a match on a city or country is far more important than a match on an attraction name, hence the higher weight.
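A sketch of how these parameters might be assembled, assuming query_by_weights is a comma-separated list with one weight per query_by field. The helper withWeights is hypothetical, and the way the 4,3,2 weights are assigned to fields below is an illustrative guess, not something stated in the thread.

```javascript
// Hypothetical helper: pair each query_by field with a weight and
// serialize both lists as comma-separated strings.
function withWeights(q, fields, weights) {
  if (fields.length !== weights.length) {
    throw new Error('query_by and query_by_weights need the same number of entries');
  }
  return {
    q,
    query_by: fields.join(','),
    query_by_weights: weights.join(','),
  };
}

// Single-field search on the countries collection, weight 10 as suggested above.
const countryParams = withWeights('paris', ['name'], [10]);

// Multi-field search: the field order and weight assignment are assumptions.
const attractionParams = withWeights(
  'paris',
  ['destination_name', 'attraction_name', 'parent_destination_name'],
  [4, 3, 2]
);

console.log(countryParams.query_by_weights); // 10
console.log(attractionParams.query_by_weights); // 4,3,2
```

The actual values depend on the use case, as noted above; the length check only guards against a mismatched list.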
Ioan-Andrei
01:08 PM
we modified and got a better result, now we have users appearing higher in score because they are indexed by full name and username and they probably get 2 matches
Ioan-Andrei
01:09 PM
if we make the 2 fields into one field and reindex, would the score be lower?
Ioan-Andrei
01:09 PM
it will still hit 2 words, but would be one field not 2, would this be the case?
Kishore Nallan
01:35 PM
A single-field match will be given a lower score than a two-field match.
Ioan-Andrei
03:34 PM
We modified our queries to search a single field in each collection, since we search through all collections and then sort by text_match. Results are somewhat better, but there are some results we don't understand.
searching for romania we get
{
        "document": {
            "coordinates": "12.4964,41.9028",
            "country_id": "114",
            "country_name": "Italy",
            "destination_name": "Rome",
            "id": "44"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Rome"
                ],
                "snippet": "<mark>Rome</mark>"
            }
        ],
        "text_match": 4328219393,
        "type": "destination"
    },
    {
        "document": {
            "id": "187",
            "name": "Romania"
        },
        "highlights": [
            {
                "field": "name",
                "matched_tokens": [
                    "Romania"
                ],
                "snippet": "<mark>Romania</mark>"
            }
        ],
        "text_match": 33514498,
        "type": "country"
    },
    {
        "document": {
            "attraction_name": "Museum of the National Bank of Romania",
            "coordinates": "nan",
            "country_id": "187",
            "country_name": "Romania",
            "destination_name": "Bucharest",
            "id": "4690"
        },
        "highlights": [
            {
                "field": "attraction_name",
                "matched_tokens": [
                    "Romania"
                ],
                "snippet": "Museum of the National Bank of <mark>Romania</mark>"
            }
        ],
        "text_match": 33514496,
        "type": "attraction"
    },
Ioan-Andrei
03:35 PM
and another example: searching for a destination name called 'Peles Castle', we get it as the 4th result, with the first being Tel Aviv
Ioan-Andrei
03:35 PM
{
        "document": {
            "coordinates": "34.7818,32.0853",
            "country_id": "113",
            "country_name": "Israel",
            "destination_name": "Tel Aviv",
            "id": "117"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Tel"
                ],
                "snippet": "<mark>Tel</mark> Aviv"
            }
        ],
        "text_match": 4328219393,
        "type": "destination"
    },
    {
        "document": {
            "coordinates": "-15.435657,28.12295",
            "country_id": "210",
            "country_name": "Spain",
            "destination_name": "Las Palmas de Gran Canaria",
            "id": "587"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Las"
                ],
                "snippet": "<mark>Las</mark> Palmas de Gran Canaria"
            }
        ],
        "text_match": 4328219393,
        "type": "destination"
    },
    {
        "document": {
            "coordinates": "-115.1398,36.1699",
            "country_id": "236",
            "country_name": "United States",
            "destination_name": "Las Vegas",
            "id": "47"
        },
        "highlights": [
            {
                "field": "destination_name",
                "matched_tokens": [
                    "Las"
                ],
                "snippet": "<mark>Las</mark> Vegas"
            }
        ],
        "text_match": 4328219393,
        "type": "destination"
    },
    {
        "document": {
            "attraction_name": "Peleș Castle",
            "coordinates": "nan",
            "country_id": "187",
            "country_name": "Romania",
            "destination_name": "Sinaia",
            "id": "18878"
        },
        "highlights": [
            {
                "field": "attraction_name",
                "matched_tokens": [
                    "Peleș",
                    "Castle"
                ],
                "snippet": "<mark>Peleș</mark> <mark>Castle</mark>"
            }
        ],
        "text_match": 50291458,
        "type": "attraction"
    },
Ioan-Andrei
03:36 PM
could you offer some insight?
Kishore Nallan
03:41 PM
What's the weight you are using for the field destination_name ?
Kishore Nallan
03:42 PM
It seems like the weight of the destination_name field just overpowers everything else.
Ioan-Andrei
03:42 PM
in the destinations collection, there is no weight on it. The only weight we have is on the countries collection, with a weight of 10
Ioan-Andrei
03:43 PM
searching in the destination collection i mean, has no weight to it
Kishore Nallan
03:51 PM
Can you please paste your updated query parameters again for all the 3 collections?
Ioan-Andrei
03:52 PM
after considering what you said this morning, i just changed the destination search: instead of querying destination_name and parent_destination_name, just one field
Ioan-Andrei
03:52 PM
using just one field, produces more favorable results
Ioan-Andrei
03:52 PM
one field for all i mean
Kishore Nallan
03:52 PM
You mean combine the multiple fields to a single field now?
Kishore Nallan
03:54 PM
It would be good to have the queries again so I can relook at it.
Ioan-Andrei
04:26 PM
yeah, i modified the search in each collection to only use one field
Ioan-Andrei
04:26 PM
which is the name
Ioan-Andrei
04:26 PM
and the results improved greatly
Ioan-Andrei
04:27 PM
when i gave the last json, i searched in destinations using 2 fields, which spiked the text match a lot
Kishore Nallan
04:27 PM
Can you place the updated query params for each collection here? I can then see if I can explain the oddities.
Ioan-Andrei
04:27 PM
ok
Ioan-Andrei
04:27 PM
```javascript
const searchParameters = {
  q: key,
  query_by: 'attraction_name',
};
```
Ioan-Andrei
04:28 PM
```javascript
const searchParameters = {
  q: key,
  query_by: 'name',
  query_by_weights: 10,
};
```
Ioan-Andrei
04:28 PM
```javascript
const searchParameters = {
  q: key,
  query_by: 'destination_name',
};
```
Ioan-Andrei
04:28 PM
```javascript
const searchParameters = {
  q: key,
  query_by: 'collection_name',
};
```
Ioan-Andrei
04:29 PM
last one is in countries
Ioan-Andrei
04:29 PM
if you change to
```javascript
const searchParameters = {
  q: key,
  query_by: 'destination_name, parent_destination_name',
};
```
Ioan-Andrei
04:29 PM
the results are really bad, in which the text_match is huge compared to the other ones
Kishore Nallan
04:30 PM
Got it. And the Romania example above is with a single search param correct? It's late here so I might get back to you tomorrow to resume this convo.
Jun 18, 2021
Kishore Nallan
07:29 AM
Ioan-Andrei Can you post the JSON response value of the first result record when you query for romania on just the destination collection against destination_name field?
Kishore Nallan
07:36 AM
I was perplexed by why the "destination_name": "Rome" record had such a high match score value of 4328219393 in your JSON snippet above. So I indexed just that record in a new collection and queried for it. It is returning a different and lower match score. See here: https://gist.github.com/kishorenc/05789f80175135f7d7d5e9f0b944819f
Ioan-Andrei
09:54 AM
We found out why the text match was high. It is because we were searching on 2 fields. Querying by one field returned normal
Ioan-Andrei
09:54 AM
Results
Kishore Nallan
09:54 AM
Cool 👍