#community-help

Cold Start Problem with Dynamic Collections

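TLDR: Adrian reported cold start issues with dynamic fields in per-tenant collections. Jason suggested using wildcard field names in query_by and upgrading to 0.25.0.rc34, and Kishore Nallan clarified the wildcard convention (* rather than .*). Adrian's happy path then worked, but he reported that a query_by pattern matching no field still returns an error and that a leading wildcard does not work; Kishore said they would fix it shortly.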

May 26, 2023
Adrian
09:46 PM
I am having a cold start problem with the way our cluster is set up.

Some context:
We have a variety of object types that we index in one collection. To keep things simple for now (and because we will have dynamic fields in the future), all non-shared fields are dynamically defined in the schema, i.e.:
{      
    Name:     ".*",
    Type:     "auto",
    Optional: &_true,
},

The problem:
At query time we are getting this error: status: 404 response: {"message": "Could not find a field named $FIELD_NAME in the schema."}. This happens when, in a given collection, we have not yet indexed an object of the type that has FIELD_NAME. This situation will happen often, since we create a new collection per customer tenant. Is there a way to have FIELD_NAME ignored in the query_by parameters if it does not exist, instead of throwing an error? Or is there a better approach here?
Jason
09:49 PM
This is by design, because we validate the query_by parameters against the schema…

When a collection is searched without any documents, how do you determine the field names to use in query_by?
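To make the failure mode concrete, here is a deliberately simplified Go model (an illustration only, not Typesense's implementation; Schema, Index and ValidateQueryBy are hypothetical names). Explicitly declared fields exist from the start, while fields covered only by the .* auto rule are added when a document carrying them is indexed, so validating query_by against a fresh collection fails for those names.

```go
package main

import "fmt"

// Schema is a toy model of a collection schema: a set of concrete field names.
// Fields matched only by a ".*" auto rule are absent until a document adds them.
type Schema struct {
	fields map[string]bool
}

// NewSchema returns a schema containing only the explicitly declared fields.
func NewSchema(declared ...string) *Schema {
	s := &Schema{fields: make(map[string]bool)}
	for _, f := range declared {
		s.fields[f] = true
	}
	return s
}

// Index models indexing a document under a ".*" auto rule: every key the
// document carries becomes a concrete schema field.
func (s *Schema) Index(doc map[string]any) {
	for k := range doc {
		s.fields[k] = true
	}
}

// ValidateQueryBy rejects the query if any query_by field is missing from the
// schema, mirroring the 404 message quoted in the question.
func (s *Schema) ValidateQueryBy(queryBy []string) error {
	for _, f := range queryBy {
		if !s.fields[f] {
			return fmt.Errorf("Could not find a field named %s in the schema.", f)
		}
	}
	return nil
}

func main() {
	s := NewSchema("company_name")
	q := []string{"company_name", "additional_data_1"}
	fmt.Println("before indexing:", s.ValidateQueryBy(q))
	s.Index(map[string]any{"company_name": "Stark Industries", "additional_data_1": "data"})
	fmt.Println("after indexing:", s.ValidateQueryBy(q))
}
```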
Adrian
09:50 PM
they are hard-coded currently
Jason
09:50 PM
If they’re known ahead of time, are you able to explicitly specify them in the collection schema, in addition to the .* field?
Adrian
09:52 PM
we could, but we'd prefer not to, so we can avoid a migration just for adding a single new field to a single object type
Jason
09:52 PM
You can still keep the dynamic fields
09:53 PM
{      
    Name:     "known_field_1",
    Type:     "string",
    Optional: &_true,
},
{      
    Name:     "known_field_2",
    Type:     "string",
    Optional: &_true,
},
{      
    Name:     ".*",
    Type:     "auto",
    Optional: &_true,
},
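The snippets in this thread read like Go struct literals. As a self-contained sketch of Jason's suggestion, the Field type below is a hypothetical stand-in for whatever client type those literals belong to; its JSON tags use the name, type and optional keys that appear in the curl example later in the thread. The known fields are declared explicitly and the .* auto catch-all is kept at the end.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Field is a hypothetical stand-in for a client's schema field type.
type Field struct {
	Name     string `json:"name"`
	Type     string `json:"type"`
	Optional *bool  `json:"optional,omitempty"`
}

// schemaFields declares the known fields explicitly (so they are part of the
// schema even before any document is indexed) and appends the ".*" auto
// catch-all for everything else.
func schemaFields(known ...string) []Field {
	optional := true // shared pointer target, like &_true in the snippets; never mutated
	fields := make([]Field, 0, len(known)+1)
	for _, name := range known {
		fields = append(fields, Field{Name: name, Type: "string", Optional: &optional})
	}
	return append(fields, Field{Name: ".*", Type: "auto", Optional: &optional})
}

func main() {
	out, err := json.Marshal(schemaFields("known_field_1", "known_field_2"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```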
Adrian
09:53 PM
but I think we would always run into this issue with dynamic fields
09:54 PM
unless we check ahead of time which object types are in the collection and set query_by accordingly
Jason
09:57 PM
Yeah this would be an issue for dynamic fields in general. I was trying to come up with a workaround…

But I completely forgot that we actually added support for wildcard field names in query_by in recent builds of 0.25!
Adrian
09:58 PM
ah ok sweet. I'll take a look. Sounds like that could do the trick
Jason
09:58 PM
Could you try upgrading to 0.25.0.rc32 and setting query_by: .*
09:58 PM
*Typo
Adrian
09:58 PM
Will do. I need to sign off for the day soon, but will test this out shortly
09:59 PM
also, when is 0.25 slated for release? We already rely on it, so we may have to pause our prod launch until it's out
Jason
10:01 PM
We’re probably a few weeks out; we’re still ironing out issues with the new vector search features. But the other features should be stable enough to use in production.
May 30, 2023
Adrian
03:08 PM
hmm okay, so one issue is that there is still an error if the regex does not have any matches: status: 404 response: {"message": "No string or string array field found matching the pattern document_names.* in the schema."}
03:09 PM
I can hack around this, but it would be great for this case to not be treated as an error. It does not benefit me at all
03:10 PM
does this also imply that a search on an empty collection will always return an error?
Jason
03:10 PM
Ah hmm, that’s good feedback.

Could you adapt this set of curl commands to replicate the issue and post it as a comment on this issue?
Adrian
03:25 PM
actually, I realized I was not running 0.25.0.rc32, but I also don't see that image on Docker Hub
03:25 PM
did you mean 0.25.0.rc30 ?
Jason
03:26 PM
Hmm, we didn’t push the latest RC to Docker Hub, hang on
03:27 PM
Could you try with 0.25.0.rc34?
Adrian
03:46 PM
hmm I can't even get this happy path example to work
export TYPESENSE_API_KEY=xyz

curl "" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
         "name": "companies",
         "fields": [
           {"name": "company_name", "type": "string" },
           {"name": "num_employees", "type": "int32" },
           {"name": "additional_data_1", "type": "string", "optional": true }
         ],
         "default_sorting_field": "num_employees"
       }'

curl "" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "Content-Type: text/plain" \
        -X POST \
        -d '{"id": "124","company_name": "Stark Industries","num_employees": 5215}
            {"id": "125","company_name": "Acme Corp","num_employees": 2133}
            {"id": "126","company_name": "Stark Industries","num_employees": 5215,"additional_data_1": "data"}'

curl "" \
        -X POST \
        -H "Content-Type: application/json" \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -d '{
          "searches": [
            {
              "collection": "companies",
              "q": "stark",
              "query_by": "company_name, additional_data_.*"
            }
          ]
        }'
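For comparison, the same multi-search request can be built in Go with only the standard library. This is a sketch: the base URL http://localhost:8108 is an assumption (the original URLs were not preserved), newMultiSearchRequest is a hypothetical helper, and the request is only constructed here, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// searchParams mirrors one entry of the "searches" array in the curl body.
type searchParams struct {
	Collection string `json:"collection"`
	Q          string `json:"q"`
	QueryBy    string `json:"query_by"`
}

type multiSearchBody struct {
	Searches []searchParams `json:"searches"`
}

// newMultiSearchRequest builds (but does not send) a POST to /multi_search.
// baseURL is an assumption; substitute your node's address.
func newMultiSearchRequest(baseURL, apiKey string, searches ...searchParams) (*http.Request, error) {
	payload, err := json.Marshal(multiSearchBody{Searches: searches})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/multi_search", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-TYPESENSE-API-KEY", apiKey)
	return req, nil
}

func main() {
	req, err := newMultiSearchRequest("http://localhost:8108", os.Getenv("TYPESENSE_API_KEY"),
		searchParams{Collection: "companies", Q: "stark", QueryBy: "company_name, additional_data_.*"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// Against a running node, send it with http.DefaultClient.Do(req).
}
```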
03:46 PM
{"results":[{"code":404,"error":"No string or string array field found matching the pattern additional_data_.* in the schema."}]}
03:46 PM
am I not using the new feature correctly?
Kishore Nallan
04:07 PM
Try additional_data_*
04:08 PM
.* is only meant for nested fields, e.g. person.name; we had to come up with this convention to differentiate between the two cases.
Adrian
05:15 PM
gotcha, will do. To clarify: is full regex syntax supported, or just wildcards?
Jason
05:17 PM
Just the wildcard *, for performance reasons
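A small Go sketch of wildcard-only matching, under our reading of the convention above: * is the only metacharacter and every other character, including '.', is literal. In this model additional_data_.* finds nothing among flat fields such as additional_data_1 (consistent with the 404 above), while additional_data_* and person.* (for person.name) do match. This is a generic glob matcher with a hypothetical name, wildcardMatch, not Typesense's code.

```go
package main

import "fmt"

// wildcardMatch reports whether name matches pattern, where '*' matches any
// run of characters (including none) and every other byte, '.' included, is
// literal. It scans greedily and backtracks only to the most recent '*'.
func wildcardMatch(pattern, name string) bool {
	p, n := 0, 0
	star, mark := -1, 0 // index of the last '*' in pattern, and where in name it began
	for n < len(name) {
		switch {
		case p < len(pattern) && pattern[p] == '*':
			star, mark = p, n
			p++
		case p < len(pattern) && pattern[p] == name[n]:
			p++
			n++
		case star >= 0:
			// Let the last '*' absorb one more character and retry from there.
			mark++
			p, n = star+1, mark
		default:
			return false
		}
	}
	// Any trailing '*'s can match the empty remainder.
	for p < len(pattern) && pattern[p] == '*' {
		p++
	}
	return p == len(pattern)
}

func main() {
	for _, c := range []struct{ pattern, name string }{
		{"additional_data_.*", "additional_data_1"},
		{"additional_data_*", "additional_data_1"},
		{"person.*", "person.name"},
		{"*_additional_data", "customer_additional_data"},
	} {
		fmt.Printf("%-20s %-26s %v\n", c.pattern, c.name, wildcardMatch(c.pattern, c.name))
	}
}
```

Matching like this needs no regex compilation, and the worst case is bounded by the product of the pattern and name lengths.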
05:18 PM
(I’ve updated the issue description)
Adrian
06:51 PM
I was able to get the happy path to work. Still running into the limitation where the query_by value must match at least one field name (which is problematic for a dynamic schema with optional fields, since some fields may not be initialized yet)
06:53 PM
is this something that could potentially be addressed quickly on your end? Just asking so I know if I should try to find a quick hacky workaround on my end or wait for a potential fix
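One possible client-side workaround in the meantime, sketched under stated assumptions: the application knows the collection's current string (or string[]) field names, since the error message says patterns are matched against those. The hypothetical pruneQueryBy helper drops query_by entries that match no field and reports when nothing is left, so the caller can return an empty result instead of searching. Only exact names and a trailing * prefix wildcard are modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// matches treats a trailing '*' as a prefix wildcard and anything else as an
// exact field name.
func matches(pattern, field string) bool {
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(field, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == field
}

// pruneQueryBy drops query_by entries that match none of schemaFields (assumed
// to list only string / string[] fields). It returns the comma-separated value
// to send, and false if nothing is left to search.
func pruneQueryBy(entries, schemaFields []string) (string, bool) {
	var kept []string
	for _, e := range entries {
		e = strings.TrimSpace(e)
		for _, f := range schemaFields {
			if matches(e, f) {
				kept = append(kept, e)
				break
			}
		}
	}
	return strings.Join(kept, ","), len(kept) > 0
}

func main() {
	fields := []string{"company_name"} // no additional_data_* field indexed yet
	queryBy, ok := pruneQueryBy(strings.Split("company_name, additional_data_*", ","), fields)
	if !ok {
		fmt.Println("nothing searchable yet: return an empty result instead of searching")
		return
	}
	fmt.Println("query_by:", queryBy)
}
```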
08:30 PM
the wildcard does not seem to work if it is the first character
08:31 PM
i.e. *_additional_data
May 31, 2023
Kishore Nallan
02:55 AM
Fair point, we will fix it shortly.