#community-help

Issue with Embedding Error in Version 0.25.0.rc63

TLDR Bill reported a bug in version 0.25.0.rc63: updating or emplacing a document failed with an embedding error. This was resolved in version 0.25.0.rc65. A follow-up discussion clarified why filter_by updates need the filtered field to have 'index' set to true.

Aug 10, 2023 (4 months ago)
Bill
12:45 PM
Hello, I've found a bug in version 0.25.0.rc63. When I try to Update or Emplace a document I receive this error:
POST ../documents?action=emplace -> "message": "No valid fields found to create embedding for embedding, please provide at least one valid field or make the embedding field optional." with payload:
{
  "id": "602",
  "about": "test"
}

this also happens with action=update
Kishore Nallan
12:47 PM
Hi Bill. Is the field already optional in the schema?
Bill
12:47 PM
No the embedding field is not optional
Kishore Nallan
12:48 PM
I actually fixed a bug which was not enforcing this.
Bill
12:48 PM
I didn't have this issue with previous versions
Kishore Nallan
12:48 PM
This is for auto embedding?
Bill
12:48 PM
Yes
Kishore Nallan
12:49 PM
Ok maybe that was not considered. I'll look shortly. Would you be able to post a small snippet that reproduces the issue?
Bill
12:50 PM
Ok
Bill
12:55 PM
In order to reproduce it,
Create the collection

curl "http://localhost:8108/collections" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -d '
{
  "name": "products",
  "fields": [
    {
      "name": "product_name",
      "type": "string"
    },
    {
      "name": "about",
      "type": "string"
    },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": [
          "product_name",
          "about"
        ],
        "model_config": {
          "model_name": "ts/paraphrase-multilingual-mpnet-base-v2"
        }
      }
    }
  ]
}
'
Bill
12:56 PM
Index a doc

curl "http://localhost:8108/collections/products/documents/import?action=create" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-H "Content-Type: text/plain" \
-X POST \
-d '
{"product_name": "ABCD","about": "This is some description text"}
'
Bill
12:58 PM
Update the doc

curl "http://localhost:8108/collections/products/documents?action=emplace" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-H "Content-Type: text/plain" \
-X POST \
-d '
{"id": "1","about": "Test"}
'
Kishore Nallan
12:59 PM
Thanks, will fix shortly. Regression introduced in rc63 or rc62
Bill
01:00 PM
Ok
Kishore Nallan
01:32 PM
Wait, the update doesn't have the field product_name
Bill
01:33 PM
I want to update only the about field; that's why I use emplace.
Kishore Nallan
01:34 PM
Ok got it.
Kishore Nallan
01:35 PM
> this also happens with action=update
But it works with action=update for me.
Kishore Nallan
01:35 PM
And your earlier error message at the start of the thread is different
Kishore Nallan
01:35 PM
No valid fields found to create embedding for `embedding`, please provide at least one valid field or make the embedding field optional.
Kishore Nallan
01:35 PM
Is that a different issue?
Bill
01:36 PM
I use emplace to update part of the doc, such as the about field. I tried update as well to check, but I got the same error message:
No valid fields found to create embedding for `embedding`, please provide at least one valid field or make the embedding field optional.

Bill
01:37 PM
The main method I use to update docs is Emplace
Kishore Nallan
01:37 PM
Ok. Let me investigate further.
Bill
01:38 PM
Ok
Bill
01:39 PM
In order to reproduce it, just use emplace with only "id" and "about" fields
Kishore Nallan
01:40 PM
What message do you get when you run the above curl but with action=update instead of emplace?
Bill
01:41 PM
If I use update with this payload (without product_name), I get this error:
{
  "message": "No valid fields found to create embedding for embedding, please provide at least one valid field or make the embedding field optional."
}
Bill
01:42 PM
I don't use Update method because it requires the full doc. I use only Emplace and update specific fields.
Kishore Nallan
02:07 PM
Bill, there is an issue with the example above. For emplace, "id": "1" is sent as the record ID, but the first record that gets indexed via action=create will actually get id: 0
Bill
02:09 PM
Yes, my mistake, it should be id: 0
Kishore Nallan
02:09 PM
With that change, it works for me :thinking_face:
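One way to avoid the id mismatch described above is to set the id explicitly at import time, so that a later emplace targets a known document. The sketch below assumes the same local server and products collection as the earlier curl commands; the function names are illustrative and not part of Typesense.

```shell
# Build one JSONL import line with an explicit id.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
import_line() {
  printf '{"id": "%s", "product_name": "%s", "about": "%s"}\n' "$1" "$2" "$3"
}

# Import the document with that id. Defined only; call it against a running server.
import_with_id() {
  curl -s "http://localhost:8108/collections/products/documents/import?action=create" \
    -X POST \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -H "Content-Type: text/plain" \
    -d "$(import_line "$1" "$2" "$3")"
}
```

For example, import_with_id 1 ABCD "This is some description text" would create the record that the emplace call with "id": "1" expects.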
Kishore Nallan
02:11 PM
Can you try the above?
Bill
02:11 PM
Ok, I'll try it
Bill
02:14 PM
I get the same error:
{
  "message": "No valid fields found to create embedding for embedding, please provide at least one valid field or make the embedding field optional."
}
Bill
02:15 PM
I think I found the bug
Bill
02:16 PM
If you create the collection with this payload:
{
  "name": "products",
  "fields": [
    {
      "name": "product_name",
      "type": "string"
    },
    {
      "name": "about",
      "type": "string"
    },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": [
          "product_name"
        ],
        "model_config": {
          "model_name": "ts/paraphrase-multilingual-mpnet-base-v2"
        }
      }
    }
  ]
}
Bill
02:16 PM
Index a doc
Kishore Nallan
02:16 PM
Hmm then how is it working for me :thinking_face: are you using the deb? Maybe I'm not reproducing it properly.
Bill
02:16 PM
and use emplace
Kishore Nallan
02:16 PM
Model name matters?
Bill
02:16 PM
I use rc63
Bill
02:16 PM
maybe
Bill
02:16 PM
Also, the "embed" field doesn't contain "about"
Bill
02:18 PM
Yes that's it, I tried with your payload also
Bill
02:18 PM
If you add "about" to the "embed" field, the emplace works
Bill
02:18 PM
If the "embed" field doesn't include "about", it doesn't
Kishore Nallan
02:18 PM
Yup I can reproduce now.
Kishore Nallan
02:18 PM
Will check, thanks for the help!
Bill
02:18 PM
No problem
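Pulling the steps from this exchange together, a minimal reproduction sketch might look like the following. It assumes a local server on port 8108, an exported TYPESENSE_API_KEY, and that the first imported document is auto-assigned id 0, as noted earlier in the thread. The TS_URL variable and the function names are illustrative and not part of Typesense. Per the thread, the final emplace call returns the embedding error on rc63 when "about" is absent from embed.from.

```shell
TS_URL="${TS_URL:-http://localhost:8108}"   # illustrative variable, not a Typesense setting

# Schema where the embedding is built from product_name only ("about" is not in embed.from).
schema_json() {
  cat <<'EOF'
{
  "name": "products",
  "fields": [
    {"name": "product_name", "type": "string"},
    {"name": "about", "type": "string"},
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": ["product_name"],
        "model_config": {"model_name": "ts/paraphrase-multilingual-mpnet-base-v2"}
      }
    }
  ]
}
EOF
}

# Partial document holding only id and about, as in the failing emplace call.
partial_doc_json() {
  printf '{"id": "0", "about": "%s"}\n' "$1"
}

# Runs the three steps against a live server. Defined only; nothing is sent until it is called.
run_repro() {
  # 1. Create the collection.
  curl -s "$TS_URL/collections" -X POST \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -d "$(schema_json)"
  # 2. Index one document (assumed to receive the auto-assigned id 0).
  curl -s "$TS_URL/collections/products/documents/import?action=create" -X POST \
    -H "Content-Type: text/plain" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -d '{"product_name": "ABCD","about": "This is some description text"}'
  # 3. Emplace with only id and about.
  curl -s "$TS_URL/collections/products/documents?action=emplace" -X POST \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -d "$(partial_doc_json "Test")"
}
```

Calling run_repro on a fresh server would exercise all three requests in order.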
Jason
04:37 PM
Bill We just published 0.25.0.rc65. Could you try replicating the issue?
Bill
04:44 PM
Ok I’ll test it in an hour

Bill
07:23 PM
Kishore Nallan Jason Perfect, it works now! 🙌

Bill
07:32 PM
Is filter_by available only with the update method?

http://localhost:8108/collections/docs/documents?filter_by=$FILTER_CLAUSE

I used this curl but I get num_updated: 0
Jason
07:38 PM
Could you give me a set of curl commands that create a collection, add a few docs and then update the docs by filter to replicate the issue? Here’s a template to use.
Bill
09:33 PM
I had a typo in the curl I sent. My mistake, it works great

Aug 11, 2023 (4 months ago)
Bill
10:59 AM
Jason, Kishore Nallan: I found the bug. filter_by doesn't work for multi-document updates when the field used in the filter is not set to "index": true.
For example, if the field has this structure:

{
  "facet": false,
  "index": false,
  "infix": false,
  "locale": "",
  "name": "productID",
  "optional": true,
  "sort": false,
  "type": "string"
}
The response using this curl:

curl "http://localhost:8108/collections/docs/documents?filter_by=productID:=Test" -X PATCH \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{"title": "Title with 1000 points."}'
is ->
{
  "num_updated": 0
}

If the field that I use in filter_by is set to index: true -> num_updated: 1
Kishore Nallan
11:00 AM
This is expected. index: false means no indices will be available to do any operation. This includes searching, filtering or sorting.
Kishore Nallan
11:01 AM
In other words, an index: false field is as good as a stored field that's not part of the schema.
Kishore Nallan
11:01 AM
The only reason we have it is for cases where you use regexp or nested fields and want some patterns to be indexed in memory and others not. In those cases, index: false is useful to indicate which ones should be excluded.
Bill
11:03 AM
I don't return this field in search results and it's like an id in the doc, so I thought it should not be indexed. That's why I set it to index: false
Bill
11:03 AM
Ok, I'll set it to index: true, thank you Kishore
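As a rough sketch of the outcome of this exchange: the filtered field is declared with index: true, and the update-by-filter request is then sent as before. The snippet assumes the docs collection and productID field from the messages above; TS_URL and the function names are illustrative and not part of Typesense. The filter is placed in the URL verbatim, so filters containing spaces or other special characters would need URL-encoding first.

```shell
TS_URL="${TS_URL:-http://localhost:8108}"   # illustrative variable, not a Typesense setting

# Field definition with indexing enabled, so the field can be used in filter_by.
indexed_field_json() {
  printf '{"name": "%s", "type": "string", "optional": true, "index": true}\n' "$1"
}

# Build the update-by-filter URL from a collection name and a filter clause.
update_by_filter_url() {
  printf '%s/collections/%s/documents?filter_by=%s\n' "$TS_URL" "$1" "$2"
}

# Send the partial update to every document matching the filter.
# Defined only; call it against a running server.
update_by_filter() {
  curl -s "$(update_by_filter_url "$1" "$2")" -X PATCH \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -d "$3"
}
```

For example, update_by_filter docs 'productID:=Test' '{"title": "Title with 1000 points."}' mirrors the request from the thread, which reported num_updated: 1 once the field was indexed.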
