#community-help

Typesense Cloud Search Issue for Large Collections

TLDR Anh-Jo encountered missing exact-match results when searching a large collection. Jason identified the max_candidates parameter as the cause and noted that the higher default in 0.24.1 would help.

Solved
May 15, 2023 (4 months ago)
Anh-Jo
03:49 PM
Hey everyone! I set up Typesense Cloud for a professional project and I have an issue with search.
I have a collection user with some classic fields like first name, last name, email, etc. (I also set up a specific field in this collection, called db_id and equal to the id, to be able to search directly by id), and sometimes search doesn't show the expected result, that is, the exact match.
Does anyone have an idea?
My query runs across multiple fields (e.g. first_name, last_name, email, username); the only way to get my exact match is to remove the db_id field from my search.
Jason
03:50 PM
Could you share a set of curl commands like this that show the issue?
Anh-Jo
03:54 PM
It looks like this:
export TYPESENSE_API_KEY=xyz

curl "" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "user",
         "fields": [
           {"name": "username", "type": "string" },
           {"name": "db_id", "type": "string" }
         ],
         "default_sorting_field": ""
       }'
       
curl "" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"id": "1","username": "UserTest","db_id": "1"}
            {"id": "2","username": "UserTest1","db_id": "2"}'
            
curl "" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "user",
              "q": "UserTest",
              "query_by": "username,db_id"
            }
          ]
        }'

but with around 490k users, of which around 121 have names close to my expected result
Anh-Jo
03:55 PM
And in this case, the expected result should be UserTest, but it doesn't appear; only UserTest1 is returned
Jason
03:58 PM
I just ran that snippet and it seems to work for me:

➜  ~ curl "" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "user",
         "fields": [
           {"name": "username", "type": "string" },
           {"name": "db_id", "type": "string" }
         ],
         "default_sorting_field": ""
       }'
{"created_at":1684166226,"default_sorting_field":"","enable_nested_fields":false,"fields":[{"facet":false,"index":true,"infix":false,"locale":"","name":"username","optional":false,"sort":false,"type":"string"},{"facet":false,"index":true,"infix":false,"locale":"","name":"db_id","optional":false,"sort":false,"type":"string"}],"name":"user","num_documents":0,"symbols_to_index":[],"token_separators":[]}%
➜  ~ curl "" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"id": "1","username": "UserTest","db_id": "1"}
            {"id": "2","username": "UserTest1","db_id": "2"}'
{"success":true}
{"success":true}%
➜  ~ curl "" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "user",
              "q": "UserTest",
              "query_by": "username,db_id"
            }
          ]
        }' | jq
{
  "results": [
    {
      "facet_counts": [],
      "found": 2,
      "hits": [
        {
          "document": {
            "db_id": "1",
            "id": "1",
            "username": "UserTest"
          },
          "highlight": {
            "username": {
              "matched_tokens": [
                "UserTest"
              ],
              "snippet": "<mark>UserTest</mark>"
            }
          },
          "highlights": [
            {
              "field": "username",
              "matched_tokens": [
                "UserTest"
              ],
              "snippet": "<mark>UserTest</mark>"
            }
          ],
          "text_match": 578730123365712000,
          "text_match_info": {
            "best_field_score": "1108091339008",
            "best_field_weight": 15,
            "fields_matched": 1,
            "score": "578730123365711993",
            "tokens_matched": 1
          }
        },
        {
          "document": {
            "db_id": "2",
            "id": "2",
            "username": "UserTest1"
          },
          "highlight": {
            "username": {
              "matched_tokens": [
                "UserTest"
              ],
              "snippet": "<mark>UserTest</mark>1"
            }
          },
          "highlights": [
            {
              "field": "username",
              "matched_tokens": [
                "UserTest"
              ],
              "snippet": "<mark>UserTest</mark>1"
            }
          ],
          "text_match": 578730089005449300,
          "text_match_info": {
            "best_field_score": "1108074561536",
            "best_field_weight": 15,
            "fields_matched": 1,
            "score": "578730089005449337",
            "tokens_matched": 1
          }
        }
      ],
      "out_of": 2,
      "page": 1,
      "request_params": {
        "collection_name": "user",
        "per_page": 10,
        "q": "UserTest"
      },
      "search_cutoff": false,
      "search_time_ms": 2
    }
  ]
}
Anh-Jo
03:58 PM
Yeah, but as I said, it's on a collection with 490k entries, not only two 😛
Jason
03:59 PM
Are you able to replicate this in the Typesense Cloud UI?
Anh-Jo
03:59 PM
Yes!
Jason
04:00 PM
Ok, could you open your network tab in the browser dev console, then replicate it in the search UI in Typesense Cloud, look for the last request to the multi_search endpoint, then right-click that request, copy-as-curl and DM it to me?
Jason
05:11 PM
Summary of the issue: when there are a lot of matches for a common prefix, Typesense 0.23.1 takes only the top 4 prefixes and searches based on those, for performance reasons. This is controlled by the max_candidates parameter; increasing it returns this record.
Jason
05:11 PM
In 0.24.1, we’ve increased max_candidates to 1000 for collections with fewer than 500K documents, which will also help here.
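
For anyone hitting the same behavior on 0.23.1, a minimal sketch of the fix Jason describes is to pass max_candidates explicitly in the search request. The value 1000 below simply mirrors the 0.24.1 default he mentions and is not a tuned recommendation. TYPESENSE_HOST and the build_search_body helper are hypothetical names introduced for illustration; the thread does not show the cluster URL, so the request is only sent when that variable is set.

```shell
#!/usr/bin/env bash
# Sketch: repeat the multi_search query from the thread with a larger max_candidates.
# TYPESENSE_HOST is a hypothetical placeholder (e.g. your cluster's base URL);
# the original thread does not include the real URL.

# Build the multi_search request body; $1 is the max_candidates value to try.
build_search_body() {
  cat <<EOF
{
  "searches": [
    {
      "collection": "user",
      "q": "UserTest",
      "query_by": "username,db_id",
      "max_candidates": $1
    }
  ]
}
EOF
}

if [ -n "${TYPESENSE_HOST:-}" ]; then
  # Send the request only when a host has been configured.
  curl "${TYPESENSE_HOST}/multi_search" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -d "$(build_search_body 1000)"
else
  # Otherwise just print the body so it can be inspected or pasted elsewhere.
  build_search_body 1000
fi
```

If the exact match (UserTest in the example) appears with the larger value but not without it, that points to the candidate limit rather than the schema or the query_by list. A larger max_candidates makes Typesense consider more prefix candidates, so expect some extra search time on big collections.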