# community-help
j
Hi! We're trying to update our schema so that the following query will match the ® character with a space. We followed the docs here: https://typesense.org/docs/guide/tips-for-searching-common-types-of-data.html and tried using the following schema when creating the collection:
token_separators: ['®']
However, the Typesense server (v28) gives us the following error:
Error creating collection_name RequestMalformed: Request failed with HTTP code 400 | Server said: `token_separators` should be an array of character symbols.
Example collection data:
[
  {
    "model": "Product 1 with something"
  },
  {
    "model": "Product®1 with something else"
  }
]
Example query:
model="Product 1"
Results:
- Product 1 with something
Expected results:
- Product 1 with something
- Product®1 with something else
Are we going about this the wrong way?
k
Error creating collection_name RequestMalformed: Request failed with HTTP code 400 | Server said: `token_separators` should be an array of character symbols.
Please post the code that creates the collection.
j
using the javascript/typescript client:
await client.collections().create({
  name: 'products',
  fields: [
    { name: '.*', type: 'auto' as const },
    { name: 'position', type: 'int32', sort: true },
    { name: 'model', type: 'string', facet: true, infix: true, },
  ],
  default_sorting_field: "position",
  token_separators: ["®"],
});
`fields` are trimmed down for brevity
k
Ok, got it. Right now Typesense only supports single-byte ASCII characters as separators. The ® symbol is a multi-byte sequence.
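To see why ® is rejected, here is a small sketch that checks how many bytes a character takes in UTF-8. The helper name `utf8ByteLength` is made up for illustration; it is not part of Typesense or its client.

```typescript
// Hypothetical helper (not part of Typesense): number of bytes a string
// occupies when encoded as UTF-8.
const encoder = new TextEncoder();

function utf8ByteLength(text: string): number {
  return encoder.encode(text).length;
}

// "®" is U+00AE, which UTF-8 encodes as two bytes (0xC2 0xAE).
const registeredSignBytes = utf8ByteLength("®");

// "-" is plain ASCII and encodes as a single byte.
const hyphenBytes = utf8ByteLength("-");

console.log({ registeredSignBytes, hyphenBytes });
```

Any character for which this returns more than 1 would presumably hit the same 400 error when listed in `token_separators`.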
j
ah ok, thanks for looking into this! is there something else we can do to get around this, or is it not something which we could implement with Typesense? I think we could get around this using synonyms?
k
You have to pre-process the text before indexing.
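A minimal sketch of what that pre-processing could look like, assuming the goal is simply to turn every ® into a space before the documents are sent to Typesense. The `Product` type and the `normalizeForIndexing` helper are hypothetical names used for illustration, not anything provided by the Typesense client.

```typescript
// Hypothetical document shape, trimmed down to the field from the example.
interface Product {
  model: string;
}

// Hypothetical helper: replace each ® with a space so the default
// whitespace tokenization splits "Product®1" into "Product" and "1".
function normalizeForIndexing(text: string): string {
  return text.replace(/®/g, " ");
}

const docs: Product[] = [
  { model: "Product 1 with something" },
  { model: "Product®1 with something else" },
];

// Build the documents to index; the originals in `docs` are left untouched.
const prepared: Product[] = docs.map((doc) => ({
  ...doc,
  model: normalizeForIndexing(doc.model),
}));

console.log(prepared);
```

The `prepared` array is what would then be indexed, for example with the JavaScript client's `client.collections('products').documents().import(prepared)`. Since this changes the stored value, one option is to keep the original string in a separate, non-searched field if the ® needs to be displayed in results.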
j
great, we'll go down that route. thank you!
a
Are there any plans to return more robust errors for situations like this?