#community-help

Issues with Cluster Upgrade and Embedding Field

TLDR: Gustavo had issues upgrading their cluster, and their embedding field wasn't being filled. Jason helped resolve the upgrade issue and advised re-indexing the documents to fix the missing embeddings. Both problems were successfully resolved.

Solved
Aug 02, 2023
Gustavo
07:14 PM
I initiated an upgrade on my cluster about 20 minutes ago. How long does it usually take?
Jason
07:23 PM
For your dataset it usually takes about 30 minutes from what I remember…
Gustavo
07:23 PM
I think something went wrong and it's rolling back
Gustavo
07:24 PM
[Screenshot]
Jason
07:24 PM
Yeah, it looks like the RC build you had chosen to upgrade to has a bug that made the process go into a crash-restart loop. We’re looking into it
Jason
07:24 PM
The rollback should be done in about 30 minutes
Gustavo
07:25 PM
I tried to upgrade because I noticed that my embedding field wasn't being filled. No idea why.
Jason
07:26 PM
For new records you mean?
Gustavo
07:26 PM
Yes, but I don't know when it started behaving like that.
Gustavo
07:28 PM
The embedding field is computed from the topics field, which is an array of strings. While looking for a bug in my search functionality, I noticed that a document in Typesense didn't have the embedding field when I retrieved it with the SDK.
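To find other affected documents, one option is to scan an export of the collection for records that have `topics` but no `embedding`. A minimal sketch, assuming the export is JSON Lines (one document per line, the format Typesense's export endpoint uses) and that the embedding field is included in the export; `find_missing_embeddings` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Iterable, List


def find_missing_embeddings(jsonl_lines: Iterable[str],
                            source_field: str = "topics",
                            embedding_field: str = "embedding") -> List[str]:
    """Return IDs of documents that have a non-empty source field but no embedding."""
    missing = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline in the export
        doc = json.loads(line)
        has_source = bool(doc.get(source_field))
        if has_source and not doc.get(embedding_field):
            missing.append(doc["id"])
    return missing


if __name__ == "__main__":
    export = "\n".join([
        json.dumps({"id": "a", "topics": ["x"], "embedding": [0.1, 0.2]}),
        json.dumps({"id": "b", "topics": ["y"]}),
    ])
    print(find_missing_embeddings(export.splitlines()))  # ['b']
```

Assuming the official Python client, the export string would come from something like `client.collections['posts-v1'].documents.export()`, split with `.splitlines()`.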
Gustavo
07:29 PM
I'm waiting for the cluster to be up so I can continue investigating.
Gustavo
07:33 PM
It finished rolling back, but it's unhealthy. Edit: it's healthy now. Edit: and unhealthy again. 😅
Jason
07:34 PM
It might be a bit flaky for a few minutes right after an upgrade completes, as DNS propagation happens
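If a client starts sending requests right after an upgrade, a short health poll with backoff can ride out this window instead of failing immediately. A minimal sketch, assuming the node answers `GET /health` with `{"ok": true}` when healthy; the helper names and retry limits are hypothetical choices. Polling only helps on the client side; it does not speed up DNS propagation.

```python
import json
import time
import urllib.request
from typing import Callable


def http_health_check(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with {"ok": true}."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # DNS failures, refused connections, timeouts, HTTP errors and bad JSON
        # all count as "not healthy yet".
        return False
    return isinstance(body, dict) and body.get("ok") is True


def wait_until_healthy(check: Callable[[], bool],
                       attempts: int = 10,
                       initial_delay: float = 2.0,
                       max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() until it returns True, doubling the pause between tries."""
    delay = initial_delay
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no need to sleep after the final try
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

Usage would look like `wait_until_healthy(lambda: http_health_check("https://<cluster-host>"))`, where the host is a placeholder.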
Gustavo
07:48 PM
Are you able to access my posts-v1 collection? There's a document there with ID z3xMqOyt8d0xdTeEaPyb that contains the topics field, but not the embedding.
Gustavo
07:48 PM
Cluster: v601y2x3upjea4tip
Jason
08:01 PM
I need to try a few things that involve restarting your node. Is it OK if there are sporadic 10-minute downtimes?
Gustavo
08:02 PM
Yes
Jason
08:13 PM
I tried a few things, but unfortunately they didn't work… Your cluster is back up, but I need to take a closer look in a few hours.
Aug 03, 2023
Jason
04:10 AM
It looks like this might have been an issue in early RC builds: if an OpenAI API call failed, we still indexed the document without the embeddings. In builds from after the end of June, we added validation that fails the document during import if auto-embedding generation fails.
Jason
04:10 AM
So it’s likely that these documents that don’t have embeddings were created in earlier RC builds without this validation
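Since newer builds reject a document at import time when embedding generation fails, it is worth checking the per-document results of an import rather than assuming the whole batch succeeded. A minimal sketch, assuming each result is a dict like `{"success": true}` or `{"success": false, "error": "..."}`; the helper names are hypothetical.

```python
from typing import Any, Dict, List


def failed_imports(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the per-document import results that did not succeed."""
    return [r for r in results if not r.get("success", False)]


def raise_on_failures(results: List[Dict[str, Any]]) -> None:
    """Raise with up to three error messages if any document failed to import."""
    failures = failed_imports(results)
    if failures:
        preview = "; ".join(str(f.get("error", "unknown error")) for f in failures[:3])
        raise RuntimeError(f"{len(failures)} document(s) failed to import: {preview}")
```

Assuming the official Python client, whose `import_` returns one such dict per document when given a list, usage would be something like `raise_on_failures(client.collections['posts-v1'].documents.import_(docs, {'action': 'upsert'}))`.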
Jason
04:11 AM
I'd recommend either deleting them and adding them back, or adding a space at the end of each source field used for embedding generation, which will cause the embedding generation to run again.
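The "add a space" workaround can be scripted as a partial update that changes each source field just enough to trigger re-embedding. A minimal sketch, assuming `topics` is the only source field (as in this thread) and that appending a space to the last array element counts as a change; `touch_source_fields` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def touch_source_fields(doc: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Build a partial-update payload with a trailing space added to each source field.

    String fields get a space appended; for string arrays (like `topics`),
    only the last element is changed. The input document is not modified.
    """
    update: Dict[str, Any] = {"id": doc["id"]}
    for field in fields:
        value = doc.get(field)
        if isinstance(value, str):
            update[field] = value + " "
        elif isinstance(value, list) and value and isinstance(value[-1], str):
            new_value = list(value)  # copy so the caller's document stays unchanged
            new_value[-1] = new_value[-1] + " "
            update[field] = new_value
    return update
```

The payload could then be sent with something like `client.collections['posts-v1'].documents[doc_id].update(payload)` in the official Python client. Deleting and re-adding the document, the other option mentioned, avoids leaving stray whitespace in the stored value.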
Gustavo
10:22 AM
I’m on RC53. Upgrading to RC58 failed, but I see there’s the option for RC55 in the list. Is the problem fixed in it?
Jason
02:32 PM
RC53 doesn’t have the missing validation issue.

However, builds after that have a different bug that prevents your particular dataset from loading, so I'd recommend not upgrading from RC53 for now. We're working on the fix.
Gustavo
04:20 PM
I was having the issue on RC53 🤔
Jason
04:21 PM
Right, but the issue started well before that, when you were running an older RC build. So those documents were probably carried over from the older RC builds with their embeddings still missing, even as you upgraded.
Jason
04:21 PM
Even once you upgrade to the upcoming RC build, the embeddings will still be missing, unless you do this: https://typesense-community.slack.com/archives/C01P749MET0/p1691035880028789?thread_ts=1691003672.362599&cid=C01P749MET0
Jason
04:24 PM
I just published the new RC build and scheduled the upgrade for your cluster.
Gustavo
04:26 PM
I thought I updated a document using RC53 and the embedding was still missing, but it's probably because I didn't actually change the source field
Jason
04:31 PM
Yeah, there needs to be some change to the field (e.g., adding a space or a period).
Gustavo
05:01 PM
It finished upgrading at 13:41 UTC-3. The dashboard consistently says it's healthy when I refresh, and I can search from there, but my requests are still failing (20 minutes later). Is that expected?
Jason
05:48 PM
I do see searches coming into your cluster… Are you still seeing issues?
Gustavo
06:11 PM
No, it's working now
Gustavo
08:00 PM
I've reindexed all docs and so far it seems the embedding field is working as expected. 👍
Aug 14, 2023
Gustavo
10:16 PM
Jason, I've caught more docs missing the embedding field since I sent the message above. The cluster has been on RC59 this whole time.
Jason
10:17 PM
Could you give me the ID of one such document?
Gustavo
10:19 PM
gZKGF5rItp8UWLdmKzIg: had the problem, but I deleted and recreated, so now it's OK
AbHg5oQacJ1Ufw9JjI9v: still missing the embedding field
Jason
10:24 PM
The document was created on August 1 (based on the createdAt timestamp), but the cluster was only upgraded to RC59 on August 3. So this is an old document, created on a prior RC build that didn't have the error handling for this particular case.

See this: https://typesense-community.slack.com/archives/C01P749MET0/p1691079700751459?thread_ts=1691003672.362599&cid=C01P749MET0
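Jason's check compares each document's `createdAt` with the date the cluster moved to a build that has the validation. The same check over a batch of documents might look like this, assuming `createdAt` is stored as epoch milliseconds (the thread doesn't say what format it uses) and using the 13:41 UTC-3 upgrade time on Aug 3 mentioned earlier (presumably the RC59 upgrade) as the cutoff; `created_before` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List

# Example cutoff taken from the thread: upgrade finished at 13:41 UTC-3 on Aug 3, 2023.
UPGRADE_FINISHED = datetime(2023, 8, 3, 13, 41, tzinfo=timezone(timedelta(hours=-3)))


def created_before(docs: Iterable[Dict[str, Any]], cutoff: datetime,
                   field: str = "createdAt") -> List[str]:
    """Return IDs of docs whose timestamp (assumed epoch milliseconds) precedes cutoff.

    Documents without the timestamp field are skipped.
    """
    cutoff_ms = cutoff.timestamp() * 1000
    return [d["id"] for d in docs if d.get(field) is not None and d[field] < cutoff_ms]
```

Documents flagged this way are candidates for the delete-and-re-add or "add a space" fix described above.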
Jason
10:26 PM
Unrelated side note: rc66 is now out which I’d recommend upgrading to
Gustavo
10:28 PM
Strange because the last thing I did that day was I deleted all docs and reindexed. 🤔
Gustavo
10:28 PM
I'll update to RC66, then delete all docs and reindex. Let's see.
Jason
10:29 PM
Could you make sure that the collection is fully empty before you re-index?

Best to just delete the whole collection and recreate it
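Dropping and recreating the collection, then re-importing, can be scripted in a few calls. A minimal sketch, assuming `client` is a `typesense.Client` from the official Python package (`collections.retrieve()`, `collections[name].delete()`, `collections.create(schema)`, `documents.import_(docs, params)`); `recreate_and_reindex` is a hypothetical name. Searches against the collection will fail between the delete and the end of the re-import.

```python
from typing import Any, Dict, List


def recreate_and_reindex(client: Any, schema: Dict[str, Any],
                         docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop the collection if it exists, create it from `schema`, and re-import `docs`.

    Returns the per-document import results (empty if there was nothing to import).
    """
    name = schema["name"]
    existing = {c["name"] for c in client.collections.retrieve()}
    if name in existing:
        # Deleting the whole collection, not just its documents, is what Jason suggests.
        client.collections[name].delete()
    client.collections.create(schema)
    if not docs:
        return []
    # "create" makes the import fail loudly if a document ID somehow already exists.
    return client.collections[name].documents.import_(docs, {"action": "create"})
```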
Gustavo
10:30 PM
I think that's what I did, but I'll make sure this time.
Gustavo
11:01 PM
Upgraded to RC66.
Gustavo
11:01 PM
Deleted all docs.
[Screenshot]
Gustavo
11:01 PM
Reindexing them...
Jason
11:01 PM
You mean you deleted the collection and recreated it, right?
Gustavo
11:01 PM
No, just deleted the docs.
Gustavo
11:02 PM
Do I have to delete the collection?
Jason
11:02 PM
Could you delete the full collection and create it again, just to be sure (that it’s not an issue with older collections)?
Gustavo
11:02 PM
Yes
Gustavo
11:04 PM
Deleted and recreated the collection.
[Screenshot]
Aug 16, 2023
Gustavo
04:46 PM
Jason, I'm having another problem. It doesn't seem to be the same thing, but it may be related. The embedding is generated from the topics array. I have a document with:
  id: 'WtSy5f9KgdvxGB0i6HGz',
  topics: [
    'Direito Trabalhista',
    'Direito Previdenciário',
    'Benefícios trabalhistas',
    'Auxílio maternidade',
    'Licença maternidade'
  ]

Weirdly, when I make a semantic search with "Auxílio maternidade", it doesn't appear.
Gustavo
04:47 PM
How can it be that searching for an exact match in topics doesn't include it? I wonder if there's something wrong with the embedding itself.
Gustavo
04:48 PM
Here's the definition of the embedding in the schema:
    {
      "embed": {
        "from": [
          "topics"
        ],
        "model_config": {
          "api_key": "...",
          "model_name": "openai/text-embedding-ada-002"
        }
      },
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "embedding",
      "num_dim": 1536,
      "optional": false,
      "sort": false,
      "type": "float[]"
    }
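When recreating the collection, as Jason suggested earlier in the thread, the retrieved field definition above corresponds to a create-time schema roughly like the one below. A sketch under stated assumptions: `topics` is declared as `string[]`, the OpenAI key is read from a hypothetical `OPENAI_API_KEY` environment variable, and `num_dim` is left out on the assumption that Typesense fills it in for auto-embedding fields (ada-002 vectors are 1536-dimensional, matching the retrieved value). `posts_schema` is a hypothetical helper.

```python
import os
from typing import Any, Dict


def posts_schema(name: str = "posts-v1") -> Dict[str, Any]:
    """Collection schema with an auto-embedding field generated from `topics`."""
    return {
        "name": name,
        "fields": [
            {"name": "topics", "type": "string[]"},
            {
                "name": "embedding",
                "type": "float[]",
                # Typesense computes this vector from `topics` when a document is written.
                "embed": {
                    "from": ["topics"],
                    "model_config": {
                        "model_name": "openai/text-embedding-ada-002",
                        # Assumption: key supplied via environment, never hard-coded.
                        "api_key": os.environ.get("OPENAI_API_KEY", ""),
                    },
                },
            },
        ],
    }
```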

Gustavo
04:48 PM
Could it be related to the locale because the topics include diacritics?
Gustavo
04:55 PM
Hmm, that doesn't seem to be the case. I updated the document to topics: ['Software development'] just to give it a topic that's different from every other document in the collection. I searched for "Software development" and it still doesn't appear in the first 50 results. No other document has anything even close to that topic.
Gustavo
04:56 PM
It's as if this particular document is just broken.
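When checking whether a specific document shows up, it helps to look up its position in the raw search response rather than scanning results by eye. A minimal sketch, assuming the standard Typesense response shape in which each entry of `hits` carries the matched `document`; `rank_of` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def rank_of(search_result: Dict[str, Any], doc_id: str) -> Optional[int]:
    """Return the 1-based position of doc_id in a search response's hits, or None."""
    for position, hit in enumerate(search_result.get("hits", []), start=1):
        if hit.get("document", {}).get("id") == doc_id:
            return position
    return None
```

With the official Python client this could be used as `rank_of(client.collections['posts-v1'].documents.search({'q': 'Software development', 'query_by': 'embedding', 'per_page': 50}), 'WtSy5f9KgdvxGB0i6HGz')`.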
Jason
04:57 PM
I’ve seen this happen with different embedding models. Somehow verbatim searches are not considered “close enough” by the model… This is where hybrid search helps. Typesense will do a keyword search and that will put exact matches on top, followed by semantic matches
Jason
04:57 PM
So you can do something like query_by: topics,embedding
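A hybrid query only needs both fields listed in `query_by`. The sketch below builds those parameters and, separately, illustrates the ordering Jason describes (exact keyword matches first, then semantic matches). The merge function is a simplified illustration, not how Typesense actually computes its combined ranking; both helper names are hypothetical.

```python
from typing import Any, Dict, List


def hybrid_search_params(q: str, per_page: int = 20) -> Dict[str, Any]:
    """Parameters for a hybrid query: keyword search on `topics` plus
    semantic search on the auto-generated `embedding` field."""
    return {"q": q, "query_by": "topics,embedding", "per_page": per_page}


def keyword_first_merge(keyword_ids: List[str], semantic_ids: List[str]) -> List[str]:
    """Illustration only: keyword hits first, then semantic hits not already listed."""
    seen = set()
    merged = []
    for doc_id in keyword_ids + semantic_ids:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged
```

The point of the illustration: a document whose topic matches the query verbatim is guaranteed a top spot from the keyword side, even if the embedding model doesn't rank it as "close enough".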
Gustavo
04:59 PM
Wait, I deleted the document, recreated and now it appears when I search.
Gustavo
05:00 PM
But it's not the same issue as before because before the embedding field was missing. This time, it was there.
Gustavo
05:01 PM
This is from my terminal, printed before I deleted and recreated the document, so you can see the embedding field was present.
[Screenshot]
Jason
05:02 PM
Hmmm
Gustavo
05:02 PM
This is the document now:
[Screenshot]
Gustavo
05:02 PM
The embedding is indeed not identical.
Jason
05:02 PM
Strange! The embedding values are different
Gustavo
05:03 PM
Oh, never mind. The embedding is different because the topics field is different.
Gustavo
05:05 PM
But you can see that the topics are the same as they were before I manually changed them to test.
[Screenshot]
Gustavo
05:06 PM
So here's what I'm sure about:
Same topics, same search.
Before: didn't appear in the results.
Now: appears normally.
Gustavo
05:07 PM
If this problem happens again, do you want me to do something specific? Maybe send you the ID and leave it untouched so you can inspect it?
Jason
05:11 PM
Yeah, if you can send me the search query (curl request with all the search params, minus the API key) and the document ID that you expect to be returned but not returned, we can take a closer look
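To share a reproducible request without leaking credentials, the search can be written out as a curl command against the search endpoint (`GET /collections/<name>/documents/search`) with the API key left as an environment-variable placeholder. A minimal sketch; `search_curl` and the example host are hypothetical.

```python
import shlex
from typing import Any, Dict
from urllib.parse import urlencode


def search_curl(host: str, collection: str, params: Dict[str, Any]) -> str:
    """Build a shareable curl command for a search; the API key is never embedded."""
    url = f"{host.rstrip('/')}/collections/{collection}/documents/search?{urlencode(params)}"
    return " ".join([
        "curl",
        shlex.quote(url),  # quoted so '&' and '?' survive the shell
        # Double quotes let the recipient's shell expand their own key.
        "-H", '"X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"',
    ])
```

For example, `search_curl('https://example.a1.typesense.net', 'posts-v1', {'q': 'Auxílio maternidade', 'query_by': 'embedding', 'per_page': 50})` produces a command that can be pasted into the thread together with the document ID that was expected but missing.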