Hi I have created a schema describing once each of my fields typesense #community-help

gab gab

11/03/2021, 12:38 PM

Hi, I have created a schema describing each of my fields once. When I retrieve the collection schema using the API, I can see that one field is duplicated. Here is part of the schema:

{ "facet": true, "index": true, "name": "craftsman.production_labels.*.*", "optional": true, "type": "string[]" },
{ "facet": false, "index": true, "name": "date_updated", "optional": false, "type": "int64" },
{ "facet": true, "index": true, "name": "craftsman.production_labels.*.*", "optional": true, "type": "string[]" }

Also, I get an error like this when querying by facet: Could not find a facet field named `craftsman.production_labels.*.*` in the schema.
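A quick way to spot this kind of duplication is to count field names in the `fields` array of the retrieved schema. The sketch below is only an illustration: `findDuplicateFieldNames` and `retrievedFields` are hypothetical names, and the array is copied from the schema excerpt above rather than fetched from a server.

```javascript
// Hypothetical helper (not part of any Typesense client):
// returns the field names that occur more than once in a fields array.
function findDuplicateFieldNames(fields) {
  const counts = new Map();
  for (const { name } of fields) {
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return [...counts].filter(([, count]) => count > 1).map(([name]) => name);
}

// The three entries quoted in the message above.
const retrievedFields = [
  { facet: true, index: true, name: "craftsman.production_labels.*.*", optional: true, type: "string[]" },
  { facet: false, index: true, name: "date_updated", optional: false, type: "int64" },
  { facet: true, index: true, name: "craftsman.production_labels.*.*", optional: true, type: "string[]" },
];

console.log(findDuplicateFieldNames(retrievedFields));
```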

Kishore Nallan

11/03/2021, 12:42 PM

Hmm I wonder if there is a bug lurking here... Are you able to consistently recreate this issue?

gab gab

11/03/2021, 12:43 PM

First, I have the same bug in both the development and production environments. I'm not sure how to reproduce it yet.

Kishore Nallan

11/03/2021, 12:44 PM

I mean, if you create the same collection locally, do you have the same problem? By the way, Typesense does have a bug where we don't check for duplicate fields. But if there is only one field definition, we should not duplicate it internally any further.

gab gab

11/03/2021, 12:59 PM

OK, I see. I only ever create the collection with the same schema, where the field is described once. As for reproduction, I tried creating a new collection with only one field (using the definition of the duplicated one). This works fine: the field is not duplicated.

gab gab

11/03/2021, 1:19 PM

What kind of operations could I test that would mutate my schema?

gab gab

11/03/2021, 1:31 PM

Another useful piece of information: my collection is created programmatically with the same schema for dev and prod. In dev, I have a local Typesense running in Docker, and in prod, a Cloud one. Both environments are now in the same state, with the duplicated field, so both have reacted the same way.

Kishore Nallan

11/03/2021, 1:33 PM

Schemas are immutable at the moment in Typesense so I don't see how they can get duplicated this way. We can try restarting Typesense server to see what happens after that.

gab gab

11/03/2021, 1:50 PM

OK, I have something. I created the whole collection again and had no duplicated fields. I then triggered the indexing of a single document, and now I have the duplicated field. I will try to check exactly what is sent to the API when I index.

Kishore Nallan

11/03/2021, 1:54 PM

Ok that's great. If you can create a gist showing the exact sequence I can also debug and fix.

gab gab

11/03/2021, 2:10 PM

Here is a gist https://gist.github.com/gkielwasser/d2758d2186c7c2ea16a1fdbac273843f

gab gab

11/03/2021, 2:11 PM

The correct sequence is:
• create the collection
• create the alias
• index the document

Kishore Nallan

11/03/2021, 2:11 PM

Thanks I will take a look. What field gets duplicated here?

gab gab

11/03/2021, 2:13 PM

The duplicated field is "craftsman.production_labels.*.*"

Kishore Nallan

11/03/2021, 2:13 PM

👍

Kishore Nallan

11/03/2021, 2:26 PM

@gab gab Any reason why the indexing document also has wildcards in the field name:

'craftsman.production_labels.*.*': [ 'Natura-Veal' ],

Kishore Nallan

11/03/2021, 2:28 PM

In the schema, you have:

{
  name: 'craftsman.production_labels.*.*',
  type: 'string[]',
  optional: true,
  facet: true
},

This means: "Accept any field name that begins with `craftsman.production_labels.`". When Typesense sees an actual field matching that rule, it creates an entry in the schema with the actual field name and its type.

Since the document that is indexed repeats the `.*` stuff in the field name, you end up with a duplicate. Now, we should certainly account for this edge case and not accept a document that contains a field name that duplicates a regexp field definition.
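The behaviour described above can be imitated with a small toy model. This is a sketch under stated assumptions, not Typesense's actual implementation: `isWildcardRule`, `indexDocument`, and the `rejectRuleNames` option are hypothetical names, and the model simply treats any field name containing `.*` as a regular expression and appends a concrete entry for each matching document key. The `rejectRuleNames` guard illustrates the kind of check suggested in the message, again purely as an assumption.

```javascript
// Toy model only: NOT Typesense's actual implementation.
// A declared field whose name contains ".*" is treated as a regexp rule.
function isWildcardRule(field) {
  return field.name.includes(".*");
}

// Returns the field list as it would be listed after "indexing" one document.
// Each document key that matches a wildcard rule gets a concrete entry appended.
function indexDocument(fields, doc, { rejectRuleNames = false } = {}) {
  const rules = fields.filter(isWildcardRule);
  const concrete = new Set(
    fields.filter((f) => !isWildcardRule(f)).map((f) => f.name)
  );
  const result = [...fields];
  for (const key of Object.keys(doc)) {
    if (concrete.has(key)) continue; // already a concrete field
    const rule = rules.find((r) => new RegExp(`^${r.name}$`).test(key));
    if (!rule) continue; // no rule covers this key
    if (rejectRuleNames && key === rule.name) {
      // Hypothetical guard: refuse keys that repeat a rule's own name.
      throw new Error(`Field name "${key}" duplicates a wildcard field definition`);
    }
    result.push({ ...rule, name: key });
    concrete.add(key);
  }
  return result;
}

const schemaFields = [
  { name: "craftsman.production_labels.*.*", type: "string[]", optional: true, facet: true },
  { name: "date_updated", type: "int64" },
];

// The document key repeats the wildcard name, as in the gist.
const wildcardDoc = {
  "craftsman.production_labels.*.*": ["Natura-Veal"],
  date_updated: 1,
};

const after = indexDocument(schemaFields, wildcardDoc);
console.log(after.map((f) => f.name));
```

In this model the wildcard name matches its own pattern, so the listed fields end up containing it twice. A key that does not repeat the wildcards would instead get its own concrete entry.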

gab gab

11/04/2021, 7:30 AM

Ah, OK! I wasn't aware of that wildcard field name behaviour. I thought it was handled as a plain string. I understand now. I was using it because it was convenient for me: I use a framework that also uses that kind of syntax to control nesting depth access.

gab gab

11/04/2021, 7:33 AM

Thanks for the help.

Jms

06/28/2024, 12:28 PM

I also see a duplicate field entry in the schema, and I'm not sure if this is expected. @Kishore Nallan These are my fields:

const fields = [
  {
    name: `title_en`,
    type: 'string*',
    facet: false 
  },
  {
    name: `title_fr`,
    type: 'string*',
    facet: false 
  }
]

I create the schema through the API, and when I view it in the Typesense Cloud dashboard, it gives me back this:

[
  {
    "facet": false,
    "index": true,
    "infix": false,
    "locale": "",
    "name": "title_en",
    "optional": true,
    "sort": false,
    "stem": false,
    "type": "string*"
  },
  {
    "facet": false,
    "index": true,
    "infix": false,
    "locale": "",
    "name": "title_en",
    "optional": true,
    "sort": false,
    "stem": false,
    "type": "string"
  },
  {
    "facet": false,
    "index": true,
    "infix": false,
    "locale": "",
    "name": "title_fr",
    "optional": true,
    "sort": false,
    "stem": false,
    "type": "string*"
  },
  {
    "facet": false,
    "index": true,
    "infix": false,
    "locale": "",
    "name": "title_fr",
    "optional": true,
    "sort": false,
    "stem": false,
    "type": "string"
  }
]

Notice that the only difference between the duplicated entries is that one has a type of `string` and the other a type of `string*` (again, not sure if this is expected). Also, when I'm on the Typesense Cloud search page, I see that every document contains 2 title_fr properties and 2 title_en properties.

Kishore Nallan

06/28/2024, 12:34 PM

Can you please post in a new thread? This is a 3-year-old thread 🙂

Kishore Nallan

06/28/2024, 12:35 PM

But just to answer your question: this is expected. We have a `string*` which is the base schema and then the concrete type `string` which is detected based on the first document indexed. This is expected if you use `string*` as a type in your schema.
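As a rough illustration of that answer, the toy model below keeps each `string*` base entry and adds a concrete entry whose type is taken from the first document. It is a sketch, not Typesense's actual implementation: `detectConcreteType` and `fieldsAfterFirstDocument` are hypothetical names, and the rule that an array value resolves to `string[]` is an assumption made for the example.

```javascript
// Toy model only: NOT Typesense's actual implementation.
// Assumption: an array value resolves to "string[]", anything else to "string".
function detectConcreteType(value) {
  return Array.isArray(value) ? "string[]" : "string";
}

// Keeps every declared field and, for each "string*" field present in the
// first document, adds a second entry carrying the detected concrete type.
function fieldsAfterFirstDocument(fields, firstDoc) {
  const result = [];
  for (const field of fields) {
    result.push(field);
    if (field.type === "string*" && field.name in firstDoc) {
      result.push({ ...field, type: detectConcreteType(firstDoc[field.name]) });
    }
  }
  return result;
}

const autoFields = [
  { name: "title_en", type: "string*", facet: false },
  { name: "title_fr", type: "string*", facet: false },
];

const listed = fieldsAfterFirstDocument(autoFields, {
  title_en: "Hello",
  title_fr: "Bonjour",
});
console.log(listed.map((f) => `${f.name}: ${f.type}`));
```

Under these assumptions each title field is listed twice, once as `string*` and once as `string`, which mirrors the dashboard output quoted earlier in the thread.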

Jms

06/28/2024, 12:40 PM

My bad, thank you
