# community-help
t
Hello, we are seeing a bit of a concerning issue. Basically, we have a dynamic facet field, skill_*, which is meant to track various skills that we add dynamically. However, we just launched a new skill with id 123, which started showing up in documents during a demo, and we saw search go down mid-demo because it looks like the schema updates with the new field immediately. Is this a correct interpretation of how auto-faceting works? If so, that’s a big problem for our schema design.
I was incorrect about this. It came about because we had a facet sneak in for a field that hasn’t been populated in any document yet.
It’s much less urgent, but when one facet is invalid, is there any chance that facet could be excluded from the facet result set instead of the facets coming back as an empty object?
j
That shouldn't happen. The behavior of regex-based field names is fairly straightforward: when a document shows up with a field name that doesn't already have an index, Typesense just initializes a new field index for it, and after that everything works the same as with regular fields for subsequent documents.
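To illustrate the idea of a regex-named schema field claiming concrete document fields, here is a minimal sketch. It is not Typesense's implementation: `matchSchemaField` is a hypothetical helper, and anchoring the pattern to the whole field name is an assumption made for the illustration.

```javascript
// Hypothetical helper (not part of Typesense): return the schema field whose
// name, treated as a regular expression, matches a concrete document field.
// Assumption: the pattern must match the entire field name.
function matchSchemaField(schemaFields, fieldName) {
  const match = schemaFields.find((field) =>
    new RegExp(`^(?:${field.name})$`).test(fieldName)
  );
  return match ?? null;
}

const schemaFields = [{ name: 'skill_class_.*', type: 'string', facet: true }];

// A new skill id shows up in a document: it matches the existing pattern,
// so no schema change is needed for it to be indexed.
console.log(matchSchemaField(schemaFields, 'skill_class_19'));
// A field that matches no pattern returns null.
console.log(matchSchemaField(schemaFields, 'call_date'));
```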
t
Yeah I was definitely wrong there. It was just that we automatically generate facet_by clauses for some of the more complex fields. Basically, we auto-generated
skill_*, skill_coach_123(ALL:[1,])
for the facet_by field, and because skill_coach_123 doesn’t exist yet (no one has coached on this skill), the facets were coming back as {}
j
Ah yes, the field has to exist in at least one document before it shows up in the response
t
Thank you for the help. I don’t want to complain too much, but I do think there is a thing here that is not good for stability. Basically, if I have a schema like this:
{
  fields: [
    {
      name: 'field.*',
      type: 'int32',
      facet: true,
    }
  ]
}
And this is my only document:
{
  field1: 123
}
And I make this kinda request:
{
  q: '*',
  query_by: 'field1',
  facet_by: 'field1,field2'
}
I get this back for my facets:
{
  facets: {}
}
I really think instead of refusing all facets on the response if one isn’t valid currently, it should make a best effort to return all good facets. It can lead to things being very brittle for someone who struggles with basic things like myself 😅
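One client-side way to get the "best effort" behavior described above is to drop facets for fields that are not known to exist before sending the request. The sketch below assumes the caller already has a list of field names that currently exist (how that list is obtained is left open), and `buildFacetBy` is a hypothetical helper, not a Typesense API. It takes the facets as an array, since range facets such as `skill_coach_123(ALL:[1,])` contain commas of their own.

```javascript
// Hypothetical helper: keep only the facets whose base field is known to exist.
// Assumption: entries containing '*' are wildcard facets and are passed through.
function buildFacetBy(requestedFacets, knownFieldNames) {
  const known = new Set(knownFieldNames);
  return requestedFacets
    .filter((facet) => {
      // Strip any range suffix, e.g. "skill_coach_123(ALL:[1,])" -> "skill_coach_123".
      const baseName = facet.split('(')[0].trim();
      return baseName.includes('*') || known.has(baseName);
    })
    .join(',');
}

// field2 has not been indexed yet, so it is left out of facet_by.
console.log(buildFacetBy(['field1', 'field2'], ['field1']));
```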
j
Oh hmmm, yeah I can see that being confusing. Are you using the validate_field_names parameter in v28.0 by any chance? Wonder if there's a bug
Mind opening a GitHub issue for this, so we can track it?
t
Yessir! I am not using that field and I’ll try it out. Also, I’ll have a GitHub issue up shortly. Thanks as always for being extremely helpful, responsive, and professional. You are a gem 💎
@Jason Bosco - Sorry for the delay, the ticket is raised here: https://github.com/typesense/typesense/issues/2382 I listed it as a feature_request, but I’m not sure if it’s a bug or a feature request, so I’ll leave it to you to adjust the label as you see fit.
j
Could you also mention which version of Typesense this is on?
t
done
a
Hi Todd, Stepping in for Jason here! Could you share the exact query that’s not returning the expected results? Feel free to DM if it contains anything sensitive. I ran the code from the issue and it worked as expected when switching to a string field instead of int32 in field1. Let’s confirm if that’s the case on your end too.
This is the script I used to reproduce your issue. Can you tweak it to fit into your problem?
#!/bin/bash

### Run Typesense via Docker ########################################
export TYPESENSE_API_KEY=xyz
export TYPESENSE_HOST=http://localhost:8108

docker stop typesense-repro 2>/dev/null
docker rm typesense-repro 2>/dev/null
rm -rf "$(pwd)"/typesense-data-dir-repro
mkdir "$(pwd)"/typesense-data-dir-repro

# Start Typesense in the background
docker run -d -p 8108:8108 --name typesense-repro \
            -v"$(pwd)"/typesense-data-dir-repro:/data \
            typesense/typesense:28.0 \
            --data-dir /data \
            --api-key=$TYPESENSE_API_KEY \
            --enable-cors

# Wait till typesense is ready
until curl -s -o /dev/null -w "%{http_code}" "$TYPESENSE_HOST/health" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" | grep -q "200"; do
  sleep 2
done

# Create collection with wildcard fields
curl "http://localhost:8108/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "wildcard_fields_test",
         "fields": [
           {"name": "field.*", "type": "string", "facet": true }
         ]
       }'

# Import a single document with only field1
curl "http://localhost:8108/collections/wildcard_fields_test/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"field1": 123}'

# Import a second document with only field2
curl "http://localhost:8108/collections/wildcard_fields_test/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"field2": 123}'

# Test faceting with both existing and non-existing fields
echo
echo "Testing faceting with wildcard fields:"
curl "http://localhost:8108/multi_search" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "wildcard_fields_test",
              "q": "*",
              "query_by": "field1",
              "facet_by": "field1,field2"
            }
          ]
        }' | jq .

docker stop typesense-repro
docker rm typesense-repro
t
I completely will, thank you for reaching out so quickly and with excellent ways to test. Based on your granular testing, I suspect I was conflating this with another issue. I’ll try combining my facets again soon and will post here if I get into trouble. I’m starting to get better at double-checking my queries via the Typesense dashboard thing someone made, but there’s a high chance that I was misusing wildcard facets, numeric facet range labels, etc.
a
Nice, I'll be waiting for your answer then!
t
@Alan Martini - Hey, I’m back! Okay, this is a pretty big awful query, but it uses strings and still hits this issue. I am attempting to search for calls matching a dynamic list that matches up to a wildcard facet. This is my wildcard facet field:
// skill fields - wildcarded
const fields = [{
  name: 'skill_class_.*',
  type: 'string',
  facet: true,
  optional: true,
}];
I just added a skill for id 19 and id 30 to my secondary database, and now my search is returning a completely empty result set.
const search = '((skill_class_1:=[`APPLE`,`BANANA`,`NA`]) || (skill_class_2:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_3:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_6:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_7:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_8:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_9:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_10:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_15:=[`COMPLIMENT`,`CRITIQUE`,`COACHING_OPPORTUNITY`,`NA`]) || (skill_class_17:=[`NO_RED_FLAG`,`POTENTIAL_RED_FLAG`,`CRITIQUE`,`NA`]) || (skill_class_18:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_19:=[`COACHING_OPPORTUNITY`,`APPLE`,`BANANA`,`NA`]) || (skill_class_30:=[`COACHING_OPPORTUNITY`,`APPLE`,`BANANA`,`NA`]))';
This returns results as expected:
const search = '((skill_class_1:=[`APPLE`,`BANANA`,`NA`]) || (skill_class_2:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_3:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_6:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_7:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_8:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_9:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_10:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_15:=[`COMPLIMENT`,`CRITIQUE`,`COACHING_OPPORTUNITY`,`NA`]) || (skill_class_17:=[`NO_RED_FLAG`,`POTENTIAL_RED_FLAG`,`CRITIQUE`,`NA`]) || (skill_class_18:=[`CRITIQUE`,`COMPLIMENT`,`NA`]))';
This is the full query shape:
{
  q: '*',
  filter_by: '((skill_class_1:=[`APPLE`,`BANANA`,`NA`]) || (skill_class_2:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_3:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_6:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_7:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_8:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_9:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_10:=[`CRITIQUE`,`COMPLIMENT`,`NA`]) || (skill_class_15:=[`COMPLIMENT`,`CRITIQUE`,`COACHING_OPPORTUNITY`,`NA`]) || (skill_class_17:=[`NO_RED_FLAG`,`POTENTIAL_RED_FLAG`,`CRITIQUE`,`NA`]) || (skill_class_18:=[`CRITIQUE`,`COMPLIMENT`,`NA`]))',
  include_fields: 'id',
  per_page: 20,
  page: '1',
  sort_by: 'call_date:desc,call_length_seconds:desc',
  group_by: undefined,
  query_by: undefined,
  group_limit: undefined
}
I misspoke, this doesn’t involve the search. This involves the filter_by clause
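A filter_by string like the ones above can be generated rather than hand-written, which also makes it possible to leave out skills whose field has not been indexed in any document yet. This is only a sketch of one client-side option: `buildSkillFilter` is a hypothetical helper, and it assumes the caller tracks which `skill_class_*` fields currently exist.

```javascript
// Hypothetical helper: build an OR-ed filter_by string from a map of
// skill id -> allowed values, skipping skills whose field is not indexed yet.
function buildSkillFilter(skillValues, indexedFields) {
  const indexed = new Set(indexedFields);
  const clauses = Object.entries(skillValues)
    .map(([skillId, values]) => ['skill_class_' + skillId, values])
    .filter(([field]) => indexed.has(field))
    .map(([field, values]) => {
      // Wrap each value in backticks, as in the filter strings above.
      const quoted = values.map((value) => '`' + value + '`').join(',');
      return '(' + field + ':=[' + quoted + '])';
    });
  // Return an empty string when no clause survives, so the caller can omit filter_by.
  return clauses.length > 0 ? '(' + clauses.join(' || ') + ')' : '';
}

const filterBy = buildSkillFilter(
  { 1: ['APPLE', 'BANANA', 'NA'], 19: ['COACHING_OPPORTUNITY', 'NA'] },
  ['skill_class_1'] // skill_class_19 has no documents yet, so it is skipped
);
console.log(filterBy);
```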
Okay, tweaking that script to match your issue now
@Alan Martini - Okay, sorry I am so slow here. I dropped the field2 section. Basically, I have the schema with field.*, I import a document with field1 but none yet with field2, and as a result I hit an error since field2 doesn’t exist, even though it matches a wildcard facet. Is there a way to have this still return, just with that facet empty, since it’s technically valid against the schema?
#!/bin/bash

### Run Typesense via Docker ########################################
export TYPESENSE_API_KEY=xyz
export TYPESENSE_HOST=http://localhost:8108

docker stop typesense-repro 2>/dev/null
docker rm typesense-repro 2>/dev/null
rm -rf "$(pwd)"/typesense-data-dir-repro
mkdir "$(pwd)"/typesense-data-dir-repro

# Start Typesense in the background
docker run -d -p 8108:8108 --name typesense-repro \
            -v"$(pwd)"/typesense-data-dir-repro:/data \
            typesense/typesense:28.0 \
            --data-dir /data \
            --api-key=$TYPESENSE_API_KEY \
            --enable-cors

# Wait till typesense is ready
until curl -s -o /dev/null -w "%{http_code}" "$TYPESENSE_HOST/health" -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" | grep -q "200"; do
  sleep 2
done

# Create collection with wildcard fields
curl "http://localhost:8108/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "wildcard_fields_test",
         "fields": [
           {"name": "field.*", "type": "string", "facet": true }
         ]
       }'

# Import a single document with only field1
curl "http://localhost:8108/collections/wildcard_fields_test/documents/import?action=create" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"field1": 123}'

# Test faceting with both existing and non-existing fields
echo
echo "Testing faceting with wildcard fields:"
curl "http://localhost:8108/multi_search" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "wildcard_fields_test",
              "q": "*",
              "query_by": "field1",
              "facet_by": "field1,field2"
            }
          ]
        }' | jq .

docker stop typesense-repro
docker rm typesense-repro
a
Ah, I see now, Todd. You don't want the whole query to fail just because one field matching a dynamic field pattern is missing, right?
t
Yes!
Thank you for your patience with me getting here 😆
a
Hey @Todd Tarsi, you can set validate_field_names to false! This makes Typesense skip validating whether a given field exists. The last curl would look like this:
curl "http://localhost:8108/multi_search" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "wildcard_fields_test",
              "q": "*",
              "query_by": "field1",
              "facet_by": "field1,field2",
              "validate_field_names": false
            }
          ]
        }' | jq .
You can read more about this parameter (and others) on the last row of this table: https://typesense.org/docs/28.0/api/search.html#query-parameters
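For a JavaScript client, the same request body can be built in code. This is a minimal sketch: `buildMultiSearchBody` is a hypothetical helper name, and only the payload shape from the curl example above is reproduced. Sending it (with fetch or a client library) is left out.

```javascript
// Hypothetical helper: build a multi_search body that facets on several fields
// and turns off field-name validation, mirroring the curl example above.
function buildMultiSearchBody(collection, queryBy, facetFields) {
  return {
    searches: [
      {
        collection,
        q: '*',
        query_by: queryBy,
        facet_by: facetFields.join(','),
        // Fields missing from the index no longer fail the whole search.
        validate_field_names: false,
      },
    ],
  };
}

const body = buildMultiSearchBody('wildcard_fields_test', 'field1', ['field1', 'field2']);
console.log(JSON.stringify(body, null, 2));
```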
t
@Alan Martini - Holy shit!!! You are a rock star. Thank you so much!
a
Glad to help Todd!