# community-help
t
Hi all, I'm stuck on the following issue. Given the following schema and dummy test data:
POST http://localhost:8108/collections
Content-Type: application/json
X-TYPESENSE-API-KEY: xyz

{
  "name": "my-index",
  "num_documents": 0,
  "fields": [
    {"name": "content", "type": "string", "optional": false, "index": true, "infix": true, "stem": true, "locale": "nl" }
  ]
}

###

POST http://localhost:8108/collections/my-index/documents/import?action=create
Content-Type: text/plain
X-TYPESENSE-API-KEY: xyz

{"content": "CAD-fiche"}
{"content": "oranje slachtofferfiches"}
{"content": "oranje tassen"}
{"content": "oranje wagens"}
{"content": "meesterlijke informatiefiches"}
{"content": "fache"}

###

POST http://localhost:8108/multi_search
Content-Type: application/json
X-TYPESENSE-API-KEY: xyz

{
    "searches": 
    [
        {
            "collection": "my-index",
            "q": "oranje fiche",
            "query_by": "content",
            "infix": "always"
        }
    ]
}
I expect "*oranje* slachtoffer*fiche*s" to appear as the best match. However, this specific text only appears as the fourth match with a very low text_match score. The matching only happens on "oranje", not on "fiche". If I do the same test on Algolia, "oranje slachtofferfiches" appears as the first and only match. It correctly matches on "oranje" and "fiche". Any ideas? Is this a limitation in Typesense? Am I doing something wrong here? Thanks!
j
I think I have a similar problem. If you search for "oranje fiche", does it return an empty array?
t
No, the result list is not empty. It returns 4 hits, just not in the order I would expect. Typesense only matches on "oranje", not on "fiche", or rather not on the combination of the two. That's why I expect "*oranje* slachtoffer*fiche*s" to appear as the best match, but it doesn't. Algolia does a much better job here and I wonder why.
FYI, this is the complete search response:
{
  "results": [
    {
      "facet_counts": [],
      "found": 4,
      "hits": [
        {
          "document": {
            "content": "CAD-fiche",
            "id": "0"
          },
          "highlight": {
            "content": {
              "matched_tokens": [
                "fiche"
              ],
              "snippet": "CAD-<mark>fiche</mark>"
            }
          },
          "highlights": [
            {
              "field": "content",
              "matched_tokens": [
                "fiche"
              ],
              "snippet": "CAD-<mark>fiche</mark>"
            }
          ],
          "text_match": 578730123365187700,
          "text_match_info": {
            "best_field_score": "1108091338752",
            "best_field_weight": 15,
            "fields_matched": 1,
            "num_tokens_dropped": 1,
            "score": "578730123365187705",
            "tokens_matched": 1,
            "typo_prefix_score": 0
          }
        },
        {
          "document": {
            "content": "oranje wagens",
            "id": "3"
          },
          "highlight": {
            "content": {
              "matched_tokens": [
                "oranje"
              ],
              "snippet": "<mark>oranje</mark> wagens"
            }
          },
          "highlights": [
            {
              "field": "content",
              "matched_tokens": [
                "oranje"
              ],
              "snippet": "<mark>oranje</mark> wagens"
            }
          ],
          "text_match": 100,
          "text_match_info": {
            "best_field_score": "0",
            "best_field_weight": 12,
            "fields_matched": 4,
            "num_tokens_dropped": 2,
            "score": "100",
            "tokens_matched": 0,
            "typo_prefix_score": 255
          }
        },
        {
          "document": {
            "content": "oranje tassen",
            "id": "2"
          },
          "highlight": {
            "content": {
              "matched_tokens": [
                "oranje"
              ],
              "snippet": "<mark>oranje</mark> tassen"
            }
          },
          "highlights": [
            {
              "field": "content",
              "matched_tokens": [
                "oranje"
              ],
              "snippet": "<mark>oranje</mark> tassen"
            }
          ],
          "text_match": 100,
          "text_match_info": {
            "best_field_score": "0",
            "best_field_weight": 12,
            "fields_matched": 4,
            "num_tokens_dropped": 2,
            "score": "100",
            "tokens_matched": 0,
            "typo_prefix_score": 255
          }
        },
        {
          "document": {
            "content": "oranje slachtofferfiches",
            "id": "1"
          },
          "highlight": {
            "content": {
              "matched_tokens": [
                "oranje"
              ],
              "snippet": "<mark>oranje</mark> slachtofferfiches"
            }
          },
          "highlights": [
            {
              "field": "content",
              "matched_tokens": [
                "oranje"
              ],
              "snippet": "<mark>oranje</mark> slachtofferfiches"
            }
          ],
          "text_match": 100,
          "text_match_info": {
            "best_field_score": "0",
            "best_field_weight": 12,
            "fields_matched": 4,
            "num_tokens_dropped": 2,
            "score": "100",
            "tokens_matched": 0,
            "typo_prefix_score": 255
          }
        }
      ],
      "out_of": 6,
      "page": 1,
      "request_params": {
        "collection_name": "my-index",
        "first_q": "oranje fiche",
        "per_page": 10,
        "q": "oranje fiche"
      },
      "search_cutoff": false,
      "search_time_ms": 1
    }
  ]
}
a
.
t
btw, this is algolia's response:
a
@Thomas De Craemer, to match the words in the exact order you typed them, you'll want to use exact phrase matching: add double quotes around the query. I see you are using infix search, which is the functionality needed for substring matching, so it should be working. Could you create a minimal example that reproduces this error, using the template below? https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
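For reference, the exact phrase suggestion amounts to wrapping the query in double quotes inside the `q` value. Below is a minimal Python sketch that only builds such a request body for the multi_search call from the original post; the helper name `phrase_search_body` is hypothetical and nothing is sent to the server.

```python
import json


def phrase_search_body(phrase, collection="my-index", query_by="content"):
    """Build a multi_search body whose q is wrapped in double quotes.

    phrase_search_body is a hypothetical helper for illustration only.
    """
    # Strip any quotes the caller already added, then add exactly one pair.
    quoted = '"' + phrase.strip('"') + '"'
    return {
        "searches": [
            {
                "collection": collection,
                "q": quoted,
                "query_by": query_by,
            }
        ]
    }


body = phrase_search_body("oranje fiche")
# Serialize to the JSON that would be POSTed to /multi_search.
print(json.dumps(body, indent=2))
```

A phrase query looks for the words as adjacent tokens in that order, so on its own it is probably not what brings a compound such as "slachtofferfiches" to the top.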
t
I suspect that this might be a limitation of the word splitting functionality for Dutch (locale 'nl') in Typesense. Or am I wrong?
a
Thanks Thomas, I'm having a look.
@Thomas De Craemer The infix feature only uses the first word in the query to perform the infixing. This means that searching for "oranje fiche" will not highlight "fiche" in "oranje slachtofferfiches", but searching for "fiche oranje" will. After discussing this internally with a colleague: this is actually expected. Infix is an expensive operation and is primarily used for examining identifiers like ID fields or emails/usernames.
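Given that only the first query word is infixed, one possible workaround is to send one search per rotation of the query, so that each word gets a turn in first position. The Python sketch below only builds the multi_search body from the original post. The helper name `rotated_infix_searches` is hypothetical, merging and de-duplicating the hits across the searches is left out, and the resulting ranking has not been verified against a Typesense server.

```python
import json


def rotated_infix_searches(query, collection="my-index", query_by="content"):
    """Build a multi_search body with one infix search per query rotation.

    rotated_infix_searches is a hypothetical helper for illustration only.
    """
    words = query.split()
    searches = []
    for i in range(len(words)):
        # Move word i to the front so it becomes the infixed token.
        rotated = words[i:] + words[:i]
        searches.append(
            {
                "collection": collection,
                "q": " ".join(rotated),
                "query_by": query_by,
                "infix": "always",
            }
        )
    return {"searches": searches}


body = rotated_infix_searches("oranje fiche")
# Serialize to the JSON that would be POSTed to /multi_search.
print(json.dumps(body, indent=2))
```

Each rotation is a separate search, so the cost grows with the number of query words. Since infix is described above as an expensive operation, this is only reasonable for short queries.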
t
@Alan Martini OK, thanks. Does Typesense have any special word splitting logic for Germanic languages? "slachtofferfiche" is a compound word in Dutch that can be split into "slachtoffer" and "fiche". That's why I expected it to match out of the box.