Hello, we are seeing a bit of a concerning issue. Basically, we... | Typesense #community-help

Todd Tarsi

05/20/2025, 5:04 PM

Hello, we are seeing a bit of a concerning issue. Basically, we have a dynamic facet field, `skill_*`, which is meant to track various skills that we add dynamically. However, we just launched a new skill with id 123, which started showing up in documents during a demo, and search went down mid-demo. It looks like the schema updated with the new field immediately. Is this a correct interpretation of how auto-faceting works? If so, that's a big problem for our schema design.

Todd Tarsi

05/20/2025, 5:24 PM

I was incorrect about this. It happened because a facet snuck in for a field that hasn't been populated in any document yet.

Todd Tarsi

05/20/2025, 5:36 PM

It's much less urgent, but is there any chance that, when one facet is invalid, it could be excluded from the facet result set instead of all the facets coming back as an empty object?

Jason Bosco

05/20/2025, 5:42 PM

That shouldn't happen. The behavior of regex-based field names is fairly straightforward: when a document arrives with a field name that doesn't already have an index, Typesense initializes a new field index for it, and from then on that field behaves exactly like a regular field for subsequent documents.
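To make that behavior concrete, here is a toy in-memory model. It is not Typesense's implementation: the `ToyCollection` class and `FieldPattern` type are hypothetical, and treating a pattern's `name` as a regular expression that must match the whole document key is an assumption of this sketch. It only illustrates that a concrete field such as `field1` exists once the first document carrying it has been indexed.

```typescript
// Toy model of regex-named schema fields (hypothetical; not Typesense internals).
// Assumption: a pattern's name must match the whole document key.
interface FieldPattern {
  name: string;
  type: "string" | "int32";
  facet: boolean;
}

class ToyCollection {
  private readonly patterns: { regex: RegExp; def: FieldPattern }[];
  // Concrete fields materialized so far, keyed by their real name.
  readonly concreteFields = new Map<string, FieldPattern>();

  constructor(patterns: FieldPattern[]) {
    this.patterns = patterns.map((def) => ({
      regex: new RegExp(`^(?:${def.name})$`),
      def,
    }));
  }

  // Index one document: a key seen for the first time that matches a pattern
  // gets its own field entry; later documents simply reuse that entry.
  index(doc: Record<string, unknown>): void {
    for (const key of Object.keys(doc)) {
      if (this.concreteFields.has(key)) continue;
      const match = this.patterns.find((p) => p.regex.test(key));
      if (match !== undefined) {
        this.concreteFields.set(key, { ...match.def, name: key });
      }
    }
  }

  hasField(name: string): boolean {
    return this.concreteFields.has(name);
  }
}

const coll = new ToyCollection([{ name: "field.*", type: "int32", facet: true }]);
coll.index({ field1: 123 });
console.log(coll.hasField("field1")); // true
console.log(coll.hasField("field2")); // false: no document has carried field2 yet
```

In this model, asking for `field2` before any document contains it is exactly the situation discussed later in the thread: the pattern allows it, but no concrete field exists yet.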

Todd Tarsi

05/20/2025, 5:44 PM

Yeah, I was definitely wrong there. It was just that we automatically generate facet_by clauses for some of the more complex fields. Basically, we auto-generated `skill_*, skill_coach_123(ALL:[1,])` for the `facet_by` field, and because `skill_coach_123` doesn't exist yet (no one has coached on this skill), the facets were coming back as `{}`.

Jason Bosco

05/20/2025, 10:10 PM

Ah yes, the field has to exist in at least one document before it shows up in the response.
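One client-side workaround for auto-generated facet_by strings is to drop clauses for fields that no document has populated yet. The sketch below assumes the application already knows which concrete fields exist (for example, by tracking them on its own side when documents are written); `splitFacetBy`, `facetFieldName`, and `pruneFacetBy` are hypothetical helpers, not part of any Typesense client. Splitting has to ignore commas inside parentheses and brackets, otherwise a range clause like `skill_coach_123(ALL:[1,])` gets cut in half.

```typescript
// Hypothetical helpers for pruning an auto-generated facet_by string.

// Split on top-level commas only, so "skill_coach_123(ALL:[1,])" stays whole.
function splitFacetBy(facetBy: string): string[] {
  const clauses: string[] = [];
  let depth = 0;
  let current = "";
  for (let i = 0; i < facetBy.length; i++) {
    const ch = facetBy.charAt(i);
    if (ch === "(" || ch === "[") depth++;
    else if (ch === ")" || ch === "]") depth--;
    if (ch === "," && depth === 0) {
      clauses.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  clauses.push(current.trim());
  return clauses.filter((c) => c.length > 0);
}

// The field name is whatever precedes an optional "(...)" range spec.
function facetFieldName(clause: string): string {
  const paren = clause.indexOf("(");
  return (paren === -1 ? clause : clause.slice(0, paren)).trim();
}

// Keep clauses whose field is known to exist. Clauses containing "*" are
// passed through unchanged because this sketch has no way to check them.
function pruneFacetBy(facetBy: string, knownFields: Set<string>): string {
  return splitFacetBy(facetBy)
    .filter((clause) => clause.includes("*") || knownFields.has(facetFieldName(clause)))
    .join(",");
}

const generated = "skill_*, field1, skill_coach_123(ALL:[1,])";
console.log(pruneFacetBy(generated, new Set(["field1"]))); // skill_*,field1
```

The dropped fields can then be rendered as empty facets by the application itself, which keeps the UI shape stable without relying on the server to tolerate unknown fields.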

Todd Tarsi

05/21/2025, 6:08 PM

Thank you for the help. I don't want to complain too much, but I do think there is something here that isn't great for stability. Basically, if I have a schema like this:

{
  fields: [
    {
      name: 'field.*',
      type: 'int32',
      facet: true,
    }
  ]
}

And this is my only document:

{
  field1: 123
}

And I make this kinda request:

{
  q: '*',
  query_by: 'field1',
  facet_by: 'field1,field2'
}

I get this back for my facets:

{
  facets: {}
}

I really think that, instead of refusing all facets in the response when one of them isn't valid (the current behavior), it should make a best effort to return all the good facets. Otherwise things get very brittle for someone who struggles with basic things like myself 😅

Jason Bosco

05/21/2025, 9:59 PM

Oh hmmm, yeah I can see that being confusing. Are you using the `validate_field_names` parameter in v28.0 by any chance? Wonder if there's a bug.

Jason Bosco

05/21/2025, 9:59 PM

Mind opening a GitHub issue for this, so we can track it?

Todd Tarsi

05/22/2025, 2:20 AM

Yessir! I'm not using that parameter, but I'll try it out. Also, I'll have a GitHub issue up shortly. Thanks as always for being extremely helpful, responsive, and professional. You are a gem 💎

Todd Tarsi

05/27/2025, 7:10 PM

@Jason Bosco - Sorry for the delay. I listed the ticket as a feature_request, but I'm not sure whether it's a bug or a feature request, so I'll leave it to you to adjust its label as you see fit. It's raised here: https://github.com/typesense/typesense/issues/2382

Jason Bosco

05/27/2025, 10:07 PM

Could you also mention which version of Typesense this is on?

Todd Tarsi

05/28/2025, 12:36 AM

done

Alan Martini

05/29/2025, 9:55 PM

Hi Todd, stepping in for Jason here! Could you share the exact query that's not returning the expected results? Feel free to DM me if it contains anything sensitive. I ran the code from the issue, and it worked as expected when I switched field1 from an int32 field to a string field. Let's confirm whether that's the case on your end too.

Alan Martini

05/30/2025, 3:36 PM

This is the script I used to reproduce your issue. Can you tweak it to fit your problem?

#!/bin/bash

### Run Typesense via Docker ########################################
export TYPESENSE_API_KEY=xyz
export TYPESENSE_HOST=http://localhost:8108

docker stop typesense-repro 2>/dev/null
docker rm typesense-repro 2>/dev/null
rm -rf "$(pwd)"/typesense-data-dir-repro
mkdir "$(pwd)"/typesense-data-dir-repro

# Start Typesense in a Docker container
docker run -d -p 8108:8108 --name typesense-repro \
            -v"$(pwd)"/typesense-data-dir-repro:/data \
            typesense/typesense:28.0 \
            --data-dir /data \
            --api-key=$TYPESENSE_API_KEY \
            --enable-cors

# Wait till typesense is ready
until curl -s -o /dev/null -w "%{http_code}" "$TYPESENSE_HOST/health" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" | grep -q "200"; do
  sleep 2
done

# Create collection with wildcard fields
curl "http://localhost:8108/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "wildcard_fields_test",
         "fields": [
           {"name": "field.*", "type": "string", "facet": true }
         ]
       }'

# Import a single document with only field1
curl "http://localhost:8108/collections/wildcard_fields_test/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"field1": 123}'

# Import a single document with only field2
curl "http://localhost:8108/collections/wildcard_fields_test/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"field2": 123}'

# Test faceting with both existing and non-existing fields
echo
echo "Testing faceting with wildcard fields:"
curl "http://localhost:8108/multi_search" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "wildcard_fields_test",
              "q": "*",
              "query_by": "field1",
              "facet_by": "field1,field2"
            }
          ]
        }' | jq .

docker stop typesense-repro
docker rm typesense-repro

Todd Tarsi

05/30/2025, 3:41 PM

I completely will. Thank you for reaching out so quickly and with excellent ways to test. Based on your granular testing, I suspect I was conflating another issue. I'll try combining my facets again soon and will post here if I run into trouble. I'm getting better at double-checking my queries via the Typesense dashboard someone made, but there's a high chance that I was misusing wildcard facets, numeric facet range labels, etc.

Alan Martini

05/30/2025, 4:25 PM

Nice, will be waiting for your answer then!

Todd Tarsi

06/12/2025, 7:34 PM

@Alan Martini - Hey, I'm back! Okay, this is a pretty big, awful query, but it uses strings and still hits this issue. I'm trying to search for calls matching a dynamic list that maps onto a wildcard facet. This is my wildcard facet field:

// skill fields - wildcarded
const fields = [{
  name: 'skill_class_.*',
  type: 'string',
  facet: true,
  optional: true,
}]

I just added a skill for id 19 and id 30 to my secondary database, and now my search returns a completely empty result set with this filter:

const search = '((skill_class_1:=[`APPLE`,`BANANA`,`NA`]) || (skill_class_2:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_3:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_6:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_7:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_8:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_9:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_10:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_15:=[`COMPLIMENT`,`CRITIQUE`,`COACHING_OPPORTUNITY`,`NA`]) || (skill_class_17:=[`NO_RED_FLAG`,`POTENTIAL_RED_FLAG`,`CRITIQUE`,`NA`]) || (skill_class_18:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_19:=[`COACHING_OPPORTUNITY`,`APPLE`,`BANANA`,`NA`]) || (skill_class_30:=[`COACHING_OPPORTUNITY`,`APPLE`,`BANANA`,`NA`]))';

This returns results as expected:

const search = '((skill_class_1:=[`APPLE`,`BANANA`,`NA`]) || (skill_class_2:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_3:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_6:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_7:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_8:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_9:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_10:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_15:=[`COMPLIMENT`,`CRITIQUE`,`COACHING_OPPORTUNITY`,`NA`]) || (skill_class_17:=[`NO_RED_FLAG`,`POTENTIAL_RED_FLAG`,`CRITIQUE`,`NA`]) || (skill_class_18:=[`CRITIQUE`,`COMPLIMENT`,`NA`]))';

Todd Tarsi

06/12/2025, 7:39 PM

This is the full query shape:

{
  q: '*',
  filter_by: '((skill_class_1:=[`APPLE`,`BANANA`,`NA`]) || (skill_class_2:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_3:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_6:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_7:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_8:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_9:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_10:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_15:=[`COMPLIMENT`,`CRITIQUE`,`COACHING_OPPORTUNITY`,`NA`]) || (skill_class_17:=[`NO_RED_FLAG`,`POTENTIAL_RED_FLAG`,`CRITIQUE`,`NA`]) || (skill_class_18:=[`CRITIQUE`,`COMPLIMENT`,`NA`]))',
  include_fields: 'id',
  per_page: 20,
  page: '1',
  sort_by: 'call_date:desc,call_length_seconds:desc',
  group_by: undefined,
  query_by: undefined,
  group_limit: undefined
}

Todd Tarsi

06/12/2025, 7:40 PM

I misspoke: this doesn't involve the search query. It involves the filter_by clause.
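Because the generated filter_by clause references skill_class_<id> fields that, per this thread, only exist once some document carries them, one option is to build it only from skill ids known to have at least one indexed document. A minimal sketch: the `SkillSelection` map (skill id to allowed values) and the `knownSkillIds` set are hypothetical inputs the application would have to maintain, and `buildSkillFilter` is not part of any Typesense client. The output mirrors the shape of the filter strings above.

```typescript
// Hypothetical: skill id -> values allowed for the skill_class_<id> field.
type SkillSelection = Map<number, string[]>;

// Build an OR'd filter_by string shaped like the ones above, skipping skill
// ids that no indexed document has populated yet (knownSkillIds is assumed
// to be maintained by the application).
function buildSkillFilter(selection: SkillSelection, knownSkillIds: Set<number>): string {
  const clauses: string[] = [];
  selection.forEach((values, id) => {
    if (!knownSkillIds.has(id) || values.length === 0) return;
    const quoted = values.map((v) => "`" + v + "`").join(",");
    clauses.push(`(skill_class_${id}:=[${quoted}])`);
  });
  return clauses.length === 0 ? "" : `(${clauses.join(" || ")})`;
}

const selection: SkillSelection = new Map<number, string[]>([
  [1, ["APPLE", "BANANA", "NA"]],
  [19, ["COACHING_OPPORTUNITY", "APPLE", "BANANA", "NA"]],
]);
console.log(buildSkillFilter(selection, new Set([1])));
// ((skill_class_1:=[`APPLE`,`BANANA`,`NA`]))
```

An empty return value means there is nothing left to filter on, so the caller would omit filter_by rather than send an empty clause.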

Todd Tarsi

06/12/2025, 7:41 PM

Okay, tweaking that script to match my issue now.

Todd Tarsi

06/16/2025, 2:03 PM

@Alan Martini - Okay, sorry I'm so slow here. I dropped the field2 section. Basically, I have the schema with field.*, I import a document with field1 but none with field2 yet, and as a result I hit an error because field2 doesn't exist, even though it matches a wildcard facet. Is there a way to have this still return, just with that facet empty, since it's technically valid against the schema?

#!/bin/bash

### Run Typesense via Docker ########################################
export TYPESENSE_API_KEY=xyz
export TYPESENSE_HOST=http://localhost:8108

docker stop typesense-repro 2>/dev/null
docker rm typesense-repro 2>/dev/null
rm -rf "$(pwd)"/typesense-data-dir-repro
mkdir "$(pwd)"/typesense-data-dir-repro

# Start Typesense in a Docker container
docker run -d -p 8108:8108 --name typesense-repro \
            -v"$(pwd)"/typesense-data-dir-repro:/data \
            typesense/typesense:28.0 \
            --data-dir /data \
            --api-key=$TYPESENSE_API_KEY \
            --enable-cors

# Wait till typesense is ready
until curl -s -o /dev/null -w "%{http_code}" "$TYPESENSE_HOST/health" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" | grep -q "200"; do
  sleep 2
done

# Create collection with wildcard fields
curl "http://localhost:8108/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "wildcard_fields_test",
         "fields": [
           {"name": "field.*", "type": "string", "facet": true }
         ]
       }'

# Import a single document with only field1
curl "http://localhost:8108/collections/wildcard_fields_test/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"field1": 123}'

# Test faceting with both existing and non-existing fields
echo
echo "Testing faceting with wildcard fields:"
curl "http://localhost:8108/multi_search" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "wildcard_fields_test",
              "q": "*",
              "query_by": "field1",
              "facet_by": "field1,field2"
            }
          ]
        }' | jq .

docker stop typesense-repro
docker rm typesense-repro

Alan Martini

06/16/2025, 4:16 PM

Ah, I see now, Todd. You don't want the whole query to error out just because one field matching a dynamic field pattern is missing, right?

Todd Tarsi

06/16/2025, 4:16 PM

Yes!

Todd Tarsi

06/16/2025, 4:17 PM

Thank you for your patience with me getting here 😆

Alan Martini

06/16/2025, 5:45 PM

Hey @Todd Tarsi, you can set validate_field_names to false! This makes Typesense skip checking whether a given field exists. The last curl would become:

curl "http://localhost:8108/multi_search" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "wildcard_fields_test",
              "q": "*",
              "query_by": "field1",
              "facet_by": "field1,field2",
              "validate_field_names": false
            }
          ]
        }' | jq .

You can read more about this parameter (and others) on the last row of this table: https://typesense.org/docs/28.0/api/search.html#query-parameters
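For callers working in JavaScript/TypeScript rather than curl, the same multi_search body can be built as a plain object. The `SearchParams` and `MultiSearchBody` interfaces and the `buildWildcardFacetSearch` function are hypothetical; only the parameter names (`collection`, `q`, `query_by`, `facet_by`, `validate_field_names`) come from the request above. The resulting body would be POSTed to /multi_search with the X-TYPESENSE-API-KEY header, as in the curl example.

```typescript
// Hypothetical shape of one search in a multi_search body; only the
// parameters used in the curl request above are modeled.
interface SearchParams {
  collection: string;
  q: string;
  query_by: string;
  facet_by: string;
  validate_field_names?: boolean;
}

interface MultiSearchBody {
  searches: SearchParams[];
}

// Build the same body as the curl example, with field-name validation turned
// off so that (per the reply above) a not-yet-populated facet field such as
// field2 does not fail the whole search.
function buildWildcardFacetSearch(facetFields: string[]): MultiSearchBody {
  return {
    searches: [
      {
        collection: "wildcard_fields_test",
        q: "*",
        query_by: "field1",
        facet_by: facetFields.join(","),
        validate_field_names: false,
      },
    ],
  };
}

const body = buildWildcardFacetSearch(["field1", "field2"]);
console.log(JSON.stringify(body));
```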

Todd Tarsi

06/16/2025, 5:46 PM

@Alan Martini - Holy shit!!! You are a rock star. Thank you so much!

Alan Martini

06/16/2025, 10:12 PM

Glad to help Todd!
