Hi everyone, I'm looking to create a custom vector...
# community-help
j
Hi everyone, I'm looking to create a custom vector field using nomic-embed-text-v1.5: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5. I loaded the model into the datadrive/model directory along with the vocab.txt and config.json files. Here is the schema update (Python):
update_schema = {
  'fields': [   
    {
      "name" : "vec_nomic",
      "type" : "float[]",
      'optional': False,
      "embed": {
        "from": ['data_field1', 'data_field2', 'data_field3', 'data_field4', 'data_field5', 'data_field6', 'data_field7'],
        "model_config": {
          "model_name": "nomic",
          "indexing_prefix": "search_document:",
          "query_prefix": "search_query:"
        }
      }
    }
  ]
}

typesense_client.collections['collection1'].update(update_schema)
However, I get this error at runtime: RequestMalformed: [Errno 400] Invalid model: attention_mask tensor not found. Any ideas on how to fix this?
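For anyone hitting the same error: the message suggests the ONNX file does not expose an input named attention_mask. A minimal sketch for checking that locally is below. It assumes the onnxruntime package is installed, that the required input names are input_ids and attention_mask (an assumption based on the error text, not on Typesense's documentation), and the helper names are hypothetical.

```python
from typing import Iterable, List, Sequence

# Assumption: the inputs a BERT-style embedding model is expected to expose.
REQUIRED_INPUTS = ("input_ids", "attention_mask")


def missing_inputs(input_names: Iterable[str],
                   required: Sequence[str] = REQUIRED_INPUTS) -> List[str]:
    """Return the required input names that are absent from input_names."""
    present = set(input_names)
    return [name for name in required if name not in present]


def onnx_input_names(model_path: str) -> List[str]:
    """List the input tensor names declared by an ONNX model file."""
    # Imported lazily so the pure helper above works without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return [inp.name for inp in session.get_inputs()]
```

Calling missing_inputs(onnx_input_names(path_to_model)) on the exported file should return an empty list if both inputs are present; a non-empty result would point to a problem with the export itself.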
j
CC: @Ozan Armağan
j
Hi @Ozan Armağan, any update on this?
o
@John Sokol Hi, I fixed the problem with the ONNX model and uploaded it as a public model to our repo. Could you please try using ts/nomic-embed-text-v1.5 as the model name and let me know if it works for you?
j
Got it, thanks Ozan! I'm updating my collection now. It may take a few days; I will get back to you once the update is complete.
On a side note, is there a setting to enable dimension truncation for models trained with Matryoshka representation learning?
o
Currently we support dimension truncation only for OpenAI’s text-embedding-3 models, and a few days ago we merged a PR to support it for Google’s Vertex AI models. Local models are also on our roadmap.
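To illustrate what dimension truncation means for a Matryoshka-style embedding, here is a minimal client-side sketch: keep the first dim components and re-normalize to unit length. This is not a Typesense feature, the function name is hypothetical, and the model card for a given model should be checked for any model-specific preprocessing required before truncating.

```python
import math
from typing import List


def truncate_embedding(vector: List[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if dim <= 0 or dim > len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]
```

A vector truncated this way would have to be indexed as a plain float[] field with a matching num_dim, since the embedding is computed outside Typesense.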
j
Hi @Ozan Armağan, I want to update my vector field so that it embeds from a different set of fields. I tried to run this update:
update_schema =  {'fields': [{'embed': {'from': [
     'field1',
     'field2',
     'field3',
     'field4',
     'field5',
     'field6',
     'field7'],
    'model_config': {'model_name': 'ts/nomic-embed-text-v1.5'}},
   'facet': False,
   'index': True,
   'infix': False,
   'locale': 'en',
   'name': 'vec_nomic',
   'num_dim': 768,
   'optional': False,
   'sort': False,
   'stem': False,
   'stem_dictionary': '',
   'store': True,
   'type': 'float[]',
   'vec_dist': 'cosine'}]}

typesense_client.collections['collection1'].update(update_schema)
However, this error was returned:
RequestMalformed: [Errno 400] `model_config` should be an object containing `model_name` and `api_key` as string values.
When I add an empty API key (since this model is open source), like so:
'model_config': {'model_name': 'ts/nomic-embed-text-v1.5', 'api_key': ''}},
This error is returned:
RequestMalformed: [Errno 400] Invalid model for api_key updation.
How should I proceed?
o
@John Sokol which version of Typesense are you using currently?
j
28.0
o
You need to first drop the existing field. Could you try this:
update_schema =  {'fields': [
{'name': 'vec_nomic', 'drop': True},
{'embed': {'from': [
     'field1',
     'field2',
     'field3',
     'field4',
     'field5',
     'field6',
     'field7'],
    'model_config': {'model_name': 'ts/nomic-embed-text-v1.5'}},
   'facet': False,
   'index': True,
   'infix': False,
   'locale': 'en',
   'name': 'vec_nomic',
   'num_dim': 768,
   'optional': False,
   'sort': False,
   'stem': False,
   'stem_dictionary': '',
   'store': True,
   'type': 'float[]',
   'vec_dist': 'cosine'}]}
j
Works! Thank you sir!
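For reference, the drop-then-re-add pattern from this thread can be wrapped in a small helper. This is a sketch only: the function name build_reembed_update is hypothetical, and it emits just the keys shown in the thread that seem essential (name, type, embed, and optionally num_dim), leaving the rest to server defaults.

```python
from typing import Any, Dict, List, Optional


def build_reembed_update(name: str,
                         from_fields: List[str],
                         model_name: str,
                         num_dim: Optional[int] = None) -> Dict[str, Any]:
    """Build an update payload that drops an embedding field and re-adds it."""
    new_field: Dict[str, Any] = {
        "name": name,
        "type": "float[]",
        "embed": {
            "from": list(from_fields),
            "model_config": {"model_name": model_name},
        },
    }
    if num_dim is not None:
        new_field["num_dim"] = num_dim
    # The drop entry comes first, followed by the new definition.
    return {"fields": [{"name": name, "drop": True}, new_field]}


update_schema = build_reembed_update(
    "vec_nomic",
    ["field1", "field2", "field3"],
    "ts/nomic-embed-text-v1.5",
    num_dim=768,
)
# Then, as in the thread:
# typesense_client.collections['collection1'].update(update_schema)
```

Dropping and re-adding the field means the embeddings are regenerated for every document, so on a large collection the update may take a while.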