#community-help

Duplicate Field Issue in Schema Creation

TLDR: gab found a duplicated field in their collection schema. Kishore Nallan traced it to indexed documents that used the schema's wildcard field name (craftsman.production_labels.*.*) as a literal key, which made Typesense add a second schema entry with the same name. Kishore noted that Typesense should reject such documents.

Nov 03, 2021 (27 months ago)
gab
12:38 PM
Hi,
I have created a schema that describes each of my fields once. When I retrieve the collection schema using the API, I can see that one field is duplicated. Here is part of the schema:

{ "facet": true, "index": true, "name": "craftsman.production_labels.*.*", "optional": true, "type": "string[]" },
{ "facet": false, "index": true, "name": "date_updated", "optional": false, "type": "int64" },
{ "facet": true, "index": true, "name": "craftsman.production_labels.*.*", "optional": true, "type": "string[]" }

Also, I get an error like this when querying by facet: Could not find a facet field named `craftsman.production_labels.*.*` in the schema.
Kishore Nallan
12:42 PM
Hmm I wonder if there is a bug lurking here... Are you able to consistently recreate this issue?
gab
12:43 PM
First, I have the same bug in both my development and production environments. I'm not sure how to reproduce it yet.
Kishore Nallan
12:44 PM
I mean, if you create the same collection locally, do you have the same problem? Btw, Typesense does have a bug where we don't check for duplicate fields. But if there is only 1 field definition, we should not duplicate it further internally.
gab
12:59 PM
Ok, I see. I always create the collection with the same schema, in which the field is described only once.
As for reproduction, I tried creating a new collection with only one field (using the definition of the duplicated one). This works fine: the field is not duplicated.
gab
01:19 PM
What kind of operations could I test that would mutate my schema?
gab
01:31 PM
Another useful piece of information: my collection is created programmatically with the same schema for dev and prod. In dev, I have a local Typesense running in Docker, and in prod, a Cloud one. Both envs are now in the same state, with the duplicated field.
So both envs have reacted the same way.
Kishore Nallan
01:33 PM
Schemas are immutable at the moment in Typesense so I don't see how they can get duplicated this way. We can try restarting Typesense server to see what happens after that.
gab
01:50 PM
Ok, I have something. I created the whole collection again and had no duplicated fields. I then triggered the indexing of a single document, and now I have the duplicated field.
I will try to check exactly what is sent to the API when I index.
Kishore Nallan
01:54 PM
Ok that's great. If you can create a gist showing the exact sequence I can also debug and fix.
gab
02:11 PM
The correct sequence is:
• create the collection
• create the alias
• index the document
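
As a rough sketch, the sequence above maps onto three Typesense REST requests. The collection name `products_v1`, the alias name `products`, and the timestamp value are placeholders that do not come from the thread; the field definitions and the document key are the ones quoted elsewhere in this thread. Whether gab indexed through the alias or the collection name is not stated, so this sketch assumes the collection name.

```javascript
// Placeholder names, not from the thread.
const COLLECTION = 'products_v1';
const ALIAS = 'products';

// Builds the three REST requests in the order gab lists them:
// create the collection, create the alias, index one document.
function buildReproRequests() {
  const schema = {
    name: COLLECTION,
    fields: [
      // Field definitions as quoted in the thread.
      { name: 'craftsman.production_labels.*.*', type: 'string[]', optional: true, facet: true },
      { name: 'date_updated', type: 'int64' },
    ],
  };
  const document = {
    // The document key repeats the wildcard field name literally.
    'craftsman.production_labels.*.*': ['Natura-Veal'],
    date_updated: 1635942000, // placeholder value
  };
  return [
    { method: 'POST', path: '/collections', body: schema },
    { method: 'PUT', path: `/aliases/${ALIAS}`, body: { collection_name: COLLECTION } },
    // Assumption: the document is indexed via the collection name.
    { method: 'POST', path: `/collections/${COLLECTION}/documents`, body: document },
  ];
}

for (const req of buildReproRequests()) {
  console.log(req.method, req.path);
}
```

Sending these requests in order to a test server and then fetching the collection should show whether the duplicated entry appears after the third request.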
Kishore Nallan
02:11 PM
Thanks I will take a look. What field gets duplicated here?
gab
02:13 PM
The duplicated field is "craftsman.production_labels.*.*"
Kishore Nallan
02:13 PM
👍
Kishore Nallan
02:26 PM
gab, is there any reason why the document being indexed also has wildcards in the field name?

'craftsman.production_labels.*.*': [ 'Natura-Veal' ],
Kishore Nallan
02:28 PM
In the schema, you have:

{
      name: 'craftsman.production_labels.*.*',
      type: 'string[]',
      optional: true,
      facet: true
    },

This means: "Accept any field name that begins with craftsman.production_labels." When Typesense sees an actual field matching that rule, it creates an entry in the schema with the actual field name and its type.

Since the document that is indexed repeats the .* stuff in the field name, you end up with a duplicate. Now, we should certainly account for this edge case and not accept a document that contains a field name that duplicates a regexp field definition.
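
The mechanism described above can be illustrated with a toy model. This is not Typesense's actual code: `schemaAfterIndexing`, `wildcardKeys`, and the rule "a field name containing `.*` is a pattern" are assumptions made for illustration only. The model treats the field name as a regular expression, appends a concrete schema entry for every document key that matches, and does no duplicate check.

```javascript
// Toy model only, NOT Typesense's implementation.
// Assumption: a field name containing ".*" is treated as a pattern.
const isPattern = (name) => name.includes('.*');

// Returns the schema as it would look after indexing `doc` under this model.
function schemaAfterIndexing(fields, doc) {
  const result = [...fields];
  for (const key of Object.keys(doc)) {
    const rule = fields.find(
      (f) => isPattern(f.name) && new RegExp(`^${f.name}$`).test(key)
    );
    if (rule) {
      // A concrete entry is appended under the document's key, with no duplicate check.
      result.push({ ...rule, name: key });
    }
  }
  return result;
}

// Hypothetical client-side guard: lists document keys that look like patterns.
function wildcardKeys(doc) {
  return Object.keys(doc).filter(isPattern);
}

const fields = [
  { name: 'craftsman.production_labels.*.*', type: 'string[]', optional: true, facet: true },
];
const badDoc = { 'craftsman.production_labels.*.*': ['Natura-Veal'] };

// The literal wildcard key matches its own pattern, so the same name appears twice.
console.log(schemaAfterIndexing(fields, badDoc).map((f) => f.name));
// A non-empty list here means the document should be fixed before indexing.
console.log(wildcardKeys(badDoc));
```

Under this model, a document that uses a concrete key (for example a made-up `craftsman.production_labels.fr.organic`) produces one pattern entry and one concrete entry, with no repeated name.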
Nov 04, 2021 (26 months ago)
gab
07:30 AM
Ah ok! I wasn't aware of how wildcard field names work. I thought the name was handled as a plain string. I understand now. I was using it because it was convenient for me: I use a framework that also uses that kind of syntax to control access depth.
gab
07:33 AM
Thanks for the help
