Hi We're trying to update our schema so that the following q typesense #community-help


James Kirkby

02/24/2025, 12:45 PM

Hi, we're trying to update our schema so that the following query will match the ® character with a space. We've tried following the docs here: https://typesense.org/docs/guide/tips-for-searching-common-types-of-data.html We've tried using the following schema when creating the collection:


token_separators: ['®']

However, the Typesense server (v28) gives us the following error:


Error creating collection_name RequestMalformed: Request failed with HTTP code 400 | Server said: `token_separators` should be an array of character symbols.

Example collection data:


[
  {
    "model": "Product 1 with something"
  },
  {
    "model": "Product®1 with something else"
  }
]

Example query:


model="Product 1"

Results:


- Product 1 with something

Expected results:


- Product 1 with something
- Product®1 with something else

Are we going about this the wrong way?

Kishore Nallan

02/24/2025, 12:47 PM

Error creating collection_name RequestMalformed: Request failed with HTTP code 400 | Server said: `token_separators` should be an array of character symbols.

Please post the code that creates the collection.

James Kirkby

02/24/2025, 12:49 PM

Using the JavaScript/TypeScript client:


await client.collections().create({
  name: 'products',
  fields: [
    { name: '.*', type: 'auto' as const },
    { name: 'position', type: 'int32', sort: true },
    { name: 'model', type: 'string', facet: true, infix: true, },
  ],
  default_sorting_field: "position",
  token_separators: ["®"],
});

`fields` are trimmed down for brevity

Kishore Nallan

02/24/2025, 12:58 PM

Ok, got it. Right now Typesense only supports single-byte ASCII characters as separators. The ® symbol is a multi-byte sequence.
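To make the constraint above concrete: ® is U+00AE, which lies outside the ASCII range (U+0000 to U+007F) and is encoded as two bytes (0xC2 0xAE) in UTF-8. The sketch below checks a list of candidate separators on the client before sending the schema. The helper names `isSingleByteAscii` and `partitionSeparators` are hypothetical, and the check only mirrors the rule as described in this thread; it is not Typesense's own validation code.

```typescript
// Hypothetical helper: true only for a single character in the ASCII range,
// i.e. one that occupies exactly one byte in UTF-8.
function isSingleByteAscii(s: string): boolean {
  return s.length === 1 && s.charCodeAt(0) < 0x80;
}

// Hypothetical helper: split candidate separators into those that satisfy the
// single-byte rule described in the thread and those that do not.
function partitionSeparators(candidates: string[]): {
  supported: string[];
  unsupported: string[];
} {
  const supported: string[] = [];
  const unsupported: string[] = [];
  for (const c of candidates) {
    (isSingleByteAscii(c) ? supported : unsupported).push(c);
  }
  return { supported, unsupported };
}

const result = partitionSeparators(["-", "/", "®"]);
console.log(result.supported);   // separators that could go into token_separators
console.log(result.unsupported); // characters that need pre-processing instead
```

Anything that ends up in `unsupported` would have to be handled in the text itself rather than in the schema.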

James Kirkby

02/24/2025, 1:01 PM

ah ok, thanks for looking into this! is there something else we can do to get around this, or is it not something which we could implement with Typesense? I think we could get around this using synonyms?

Kishore Nallan

02/24/2025, 1:03 PM

You have to pre-process the text before indexing.
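A minimal sketch of what such pre-processing could look like, assuming the goal is simply to treat ® as a word break. The function names, the `Product` shape, and the extra `model_search` field are all hypothetical and not something confirmed in this thread. Keeping the original `model` value untouched is a design choice so that faceting and display still show the real product name.

```typescript
// Characters to treat as word breaks (assumption: only ® matters here; extend as needed).
const EXTRA_SEPARATORS = /[®]/g;

// Replace each extra separator with a space, then collapse repeated whitespace.
function normalizeForSearch(text: string): string {
  return text.replace(EXTRA_SEPARATORS, " ").replace(/\s+/g, " ").trim();
}

// Hypothetical document shapes based on the example data in the thread.
interface Product {
  model: string;
}
interface IndexedProduct extends Product {
  model_search: string; // hypothetical extra field holding the normalized text
}

// Add the normalized copy to every document before it is sent to Typesense.
function prepareForIndexing(docs: Product[]): IndexedProduct[] {
  return docs.map((doc) => ({
    ...doc,
    model_search: normalizeForSearch(doc.model),
  }));
}

const prepared = prepareForIndexing([
  { model: "Product 1 with something" },
  { model: "Product®1 with something else" },
]);
console.log(prepared.map((d) => d.model_search));
```

With this approach the collection schema would need a `model_search` string field, and searches would have to query that field instead of (or in addition to) `model`. Running the same `normalizeForSearch` over the user's query keeps both sides consistent if someone types ® in the search box.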

James Kirkby

02/24/2025, 1:04 PM

great, we'll go down that route. thank you!

👍 1

Adam Al-dbhany

02/24/2025, 9:25 PM

Are there any plans to return more robust errors for situations like this?
