Hi all, I’m currently working on using the `embed`...
# community-help
d
Hi all, I’m currently working on using the `embed` feature in Typesense 0.25 for semantic search. Here’s a part of my schema:
{
  "name": "embedding_multi",
  "type": "float[]",
  "embed": {
    "from": [ "description", "item_details_en" ],
    "model_config": { "model_name": "ts/multilingual-e5-base" }
  }
},
In this setup:
• `description` is a regular string (e.g., product description)
• `item_details_en` is an array of objects like:
[ { "name": "Material", "title": "Stainless Steel" }, { "name": "Color", "title": "Black" } ]
I would like to ask:
• How does Typesense internally convert these values into a single string before sending them to the embedding model?
• Will the values in the object array be concatenated, and in what format (e.g., `"Material: Stainless Steel\nColor: Black"` or just raw values)?
• Do I need to pre-process `item_details_en` into a string manually, or is it handled internally?
f
I'd suggest using dot notation to specify the sub-field itself: `item_details_en.title`
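If you want the labelled format from the question (`Material: Stainless Steel`) rather than only the `title` values, one option is to flatten the array yourself before indexing. This is a minimal sketch, not something Typesense does for you: `flatten_details` and the extra string field `item_details_text` are hypothetical names, and the sample `description` value is made up for illustration.

```python
def flatten_details(items):
    """Turn [{"name": ..., "title": ...}, ...] into "name: title" lines."""
    lines = []
    for item in items:
        name = item.get("name", "").strip()
        title = item.get("title", "").strip()
        if name and title:
            lines.append(f"{name}: {title}")
        elif title:
            # No label available: keep the raw value only.
            lines.append(title)
    return "\n".join(lines)


doc = {
    "description": "Insulated travel mug",  # sample value, for illustration only
    "item_details_en": [
        {"name": "Material", "title": "Stainless Steel"},
        {"name": "Color", "title": "Black"},
    ],
}

# "item_details_text" is a hypothetical plain string field; it would have to be
# declared in the schema and listed in "from" in place of the object array.
doc["item_details_text"] = flatten_details(doc["item_details_en"])
print(doc["item_details_text"])
```

With this approach you control the exact text the model sees, at the cost of storing one extra field per document.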
d
Does it combine all the fields for the embedding using a space, or something else?
Also, can embeddings in Typesense use chunking?
f
Typesense takes all the fields specified in the `from` array, concatenates their contents with a space separator to form a single text string, and then sends this combined text to the embedding model for processing. For example, if you have:
"embedding": {
  "type": "float[]",
  "embed": {
    "from": ["product_description", "product_name"],
    "model_config": {"model_name": "ts/e5-small"}
  }
}
The values from both `product_description` and `product_name` would be joined with a space between them, and this combined text would generate a single embedding vector. It handles both string and string array types, and skips missing fields.
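To make the described behaviour concrete, here is a rough Python sketch of the joining step. It is only an illustration of the explanation above, not Typesense's actual code: `combine_for_embedding` is a hypothetical helper, and it assumes that the elements of a string array are also joined with a single space.

```python
def combine_for_embedding(doc, from_fields, sep=" "):
    """Approximate how the `from` fields are merged into one text string."""
    parts = []
    for field in from_fields:
        value = doc.get(field)
        if value is None:
            continue  # missing field: skipped
        if isinstance(value, str):
            parts.append(value)
        elif isinstance(value, list):
            # Assumption: string[] elements are joined with the same separator.
            parts.extend(v for v in value if isinstance(v, str))
    return sep.join(parts)


doc = {"product_name": "Mug", "product_description": "Steel travel mug"}
print(combine_for_embedding(doc, ["product_description", "product_name"]))
```

The order of the output follows the order of the `from` list, so here the description text comes before the name.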