#community-help

Duplicate Field Issue in Schema Creation

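TLDR gab found a wildcard field duplicated in their collection schema after indexing a document. Kishore Nallan traced the duplicate to the indexed document, which used the literal wildcard name craftsman.production_labels.*.* as a field name, so Typesense added it to the schema as a concrete field. Kishore noted that Typesense should reject such documents.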

Solved
Nov 03, 2021 (24 months ago)
gab
12:38 PM
Hi,
I have created a schema that describes each of my fields once. When I retrieve the collection schema using the API, I can see that one field is duplicated. Here is part of the schema.

{
  "facet": true,
  "index": true,
  "name": "craftsman.production_labels.*.*",
  "optional": true,
  "type": "string[]"
},
{
  "facet": false,
  "index": true,
  "name": "date_updated",
  "optional": false,
  "type": "int64"
},
{
  "facet": true,
  "index": true,
  "name": "craftsman.production_labels.*.*",
  "optional": true,
  "type": "string[]"
}

Also I have an error like this when querying by facet:
Could not find a facet field named `craftsman.production_labels.*.*` in the schema.
Kishore Nallan
12:42 PM
Hmm I wonder if there is a bug lurking here... Are you able to consistently recreate this issue?
gab
12:43 PM
First, I have the same bug in both the development and production environments. I'm not sure how to reproduce it yet.
Kishore Nallan
12:44 PM
I mean, if you create the same collection locally, do you have the same problem? Btw, Typesense does have a bug where we don't check for duplicate fields. But if there is only 1 field definition, we should not duplicate it further internally.
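Since the server is described here as not checking for duplicate fields, a client-side check before creating a collection (or after retrieving its schema) can catch them. The following is a minimal sketch under that assumption; the helper name `duplicate_field_names` is hypothetical and not part of any Typesense client, and the sample fields are copied from the schema excerpt in this thread.

```python
from collections import Counter
from typing import Dict, List


def duplicate_field_names(fields: List[Dict]) -> List[str]:
    """Return the field names that occur more than once, in first-seen order."""
    counts = Counter(field["name"] for field in fields)
    return [name for name, count in counts.items() if count > 1]


# Field definitions as they appear in the retrieved schema excerpt.
fields = [
    {"name": "craftsman.production_labels.*.*", "type": "string[]",
     "facet": True, "index": True, "optional": True},
    {"name": "date_updated", "type": "int64",
     "facet": False, "index": True, "optional": False},
    {"name": "craftsman.production_labels.*.*", "type": "string[]",
     "facet": True, "index": True, "optional": True},
]

print(duplicate_field_names(fields))
```

Running the same check on the schema you send and on the schema you get back shows whether the duplicate was introduced on the client side or after creation.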
gab
12:59 PM
Ok, I see. I only create the collection once, with a schema where the field is described once.
To reproduce, I tried creating a new collection with only one field (using the definition of the duplicated one). This works fine: the field is not duplicated.
gab
01:19 PM
What kind of operations could I test that would mutate my schema?
gab
01:31 PM
Another useful piece of information: my collection is created programmatically with the same schema for dev and prod. In dev, I have a local Typesense running in Docker, and in prod, a Cloud one. Both envs are now in the same state, with the duplicated field.
So both envs have reacted the same way.
Kishore Nallan
01:33 PM
Schemas are immutable at the moment in Typesense so I don't see how they can get duplicated this way. We can try restarting Typesense server to see what happens after that.
gab
01:50 PM
Ok, I have something. I created the whole collection again and had no duplicated fields. I then triggered the indexing of one document, and now I have the duplicated field.
I will check exactly what is sent to the API when I index.
Kishore Nallan
01:54 PM
Ok that's great. If you can create a gist showing the exact sequence I can also debug and fix.
gab
02:11 PM
The correct sequence is:
• create the collection
• create the alias
• index the document
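The three steps above can be sketched as a small function, followed by a schema retrieval to check the field names afterwards. This is only a sketch under stated assumptions: `client` is expected to look like the official Typesense Python client (`collections.create`, `aliases.upsert`, `collections[name].documents.create`, `collections[name].retrieve`), the function name `reproduce` is hypothetical, and the thread does not say whether the document was indexed through the alias or the collection name, so the collection name is used here.

```python
from typing import Any, Dict, List


def reproduce(client: Any, schema: Dict, alias: str, document: Dict) -> List[str]:
    """Create the collection, create the alias, index one document,
    then return the field names of the stored schema."""
    name = schema["name"]

    # 1. Create the collection from the schema.
    client.collections.create(schema)

    # 2. Create an alias that points at the new collection.
    client.aliases.upsert(alias, {"collection_name": name})

    # 3. Index a single document (assumption: via the collection name).
    client.collections[name].documents.create(document)

    # Retrieve the stored schema so duplicates can be spotted by name.
    stored = client.collections[name].retrieve()
    return [field["name"] for field in stored["fields"]]
```

Comparing the returned names with the names in the schema that was sent shows which step introduced an extra entry.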
Kishore Nallan
02:11 PM
Thanks I will take a look. What field gets duplicated here?
gab
02:13 PM
The duplicated field is "craftsman.production_labels.*.*"
Kishore Nallan
02:13 PM
👍
Kishore Nallan
02:26 PM
gab, is there any reason why the document being indexed also has wildcards in the field name:

'craftsman.production_labels.*.*': [ 'Natura-Veal' ],
Kishore Nallan
02:28 PM
In the schema, you have:

{
      name: 'craftsman.production_labels.*.*',
      type: 'string[]',
      optional: true,
      facet: true
    },

This means: "accept any field name that begins with craftsman.production_labels.". When Typesense sees an actual field matching that rule, it creates an entry in the schema with the actual field name and its type.

Since the document being indexed repeats the .* part in its field name, that literal name matches the rule too, and you end up with a duplicate. We should certainly account for this edge case and reject a document that contains a field name duplicating a regexp field definition.
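The behavior described above can be illustrated with a toy model. This is not Typesense's implementation: it assumes that a wildcard definition is treated as a regular expression matched against the whole field name, and the names `expand_schema`, `literal_wildcard_keys`, and the sample key `craftsman.production_labels.fr.organic` are hypothetical.

```python
import re
from typing import Dict, List


def is_wildcard(name: str) -> bool:
    """A field definition counts as a wildcard if its name contains '.*'."""
    return ".*" in name


def expand_schema(schema_fields: List[Dict], document: Dict) -> List[Dict]:
    """Return the schema plus one concrete entry per document key that
    matches a wildcard definition (toy model of the described behavior)."""
    expanded = list(schema_fields)
    for key in document:
        for definition in schema_fields:
            if is_wildcard(definition["name"]) and re.fullmatch(definition["name"], key):
                # The concrete entry inherits the definition's settings.
                expanded.append({**definition, "name": key})
                break
    return expanded


def literal_wildcard_keys(schema_fields: List[Dict], document: Dict) -> List[str]:
    """Return document keys that exactly repeat a wildcard definition's name."""
    wildcard_names = {f["name"] for f in schema_fields if is_wildcard(f["name"])}
    return [key for key in document if key in wildcard_names]


schema = [{"name": "craftsman.production_labels.*.*", "type": "string[]",
           "optional": True, "facet": True}]

good_doc = {"craftsman.production_labels.fr.organic": ["Natura-Veal"]}
bad_doc = {"craftsman.production_labels.*.*": ["Natura-Veal"]}

# A concrete key adds a new, distinct schema entry.
print([f["name"] for f in expand_schema(schema, good_doc)])
# A key that repeats the wildcard name adds an entry with the same name.
print([f["name"] for f in expand_schema(schema, bad_doc)])
# Guard: keys to fix or reject before indexing.
print(literal_wildcard_keys(schema, bad_doc))
```

Running a guard like `literal_wildcard_keys` on the client before indexing avoids the duplicate without waiting for a server-side check.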
Nov 04, 2021 (24 months ago)
gab
07:30 AM
Ah ok! I wasn't aware of that wildcard field name behavior. I thought it was handled as a plain string. I understand now. I was using it because it was convenient for me: I use a framework that also uses that kind of syntax to control access depth.
gab
07:33 AM
Thanks for the help