Hi, does Typesense support vector search on nested...
# community-help
s
Hi, does Typesense support vector search on nested fields? I'm struggling to get a working solution and keep getting this error message: `{'results': [{'code': 400, 'error': 'Field chunks.DenseVec does not have a vector query index.'}]}` The current schema looks similar to this one:
EXAMPLE_SCHEMA = {
"fields": [
{"name": "id", "type": "string"},
{"name": "Date", "type": "string", "sort": True},
{"name": "Document_Summary", "type": "string", "locale": "de", "stem": True},
{"name": "Dense_Summary_Embedding", "type": "float[]","num_dim": 3072},
{
"name": "chunks",
"type": "object[]",
"optional": True,
"fields": [
{"name": "Chunk_ID", "type": "string"},
{"name": "DenseVec", "type": "float[]","num_dim": 3072},
{"name": "Chunk", "type": "string", "locale": "de", "stem": True}
]}],
"token_separators": [";", ",", ".", ":"],
"default_sorting_field": "Date",
"enable_nested_fields": True,
"symbols_to_index": ["+", "-", "@", "/"]
}
Searching on e.g. Dense_Summary_Embedding works fine, but I get the error when running the search on the chunk embeddings with this code:
typesense_results = self.ts_manager.client.multi_search.perform({
            "searches": [{
                "q": "*",
                "collection": self.collection_name,
                "vector_query": f"chunks.DenseVec:([{','.join(str(v) for v in vector_query)}], k:{max_candidates})",
                "exclude_fields": "Dense_Summary_Embedding, chunks.DenseVec"
            }]}, {})
Technically one could flatten the entire thing, but then we would have quite a lot of duplications. Any ideas?
k
Please post a fully reproducible example using curl using this template so that we can investigate: https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
s
Hi there, I just created a fairly simple example with the template, and we get the same error:
export TYPESENSE_API_KEY=xyz
mkdir -p "$(pwd)"/typesense-data

docker run -p 8108:8108 \
  -v "$(pwd)"/typesense-data:/data typesense/typesense:27.1 \
  --data-dir /data \
  --api-key=$TYPESENSE_API_KEY \
  --enable-cors
  
  
curl "http://localhost:8108/collections" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "nested_test",
    "fields": [
      {"name": "id", "type": "string"},
      {"name": "title", "type": "string"},
      {"name": "Summary", "type": "string"},
      {"name": "Dense_Summary_Embedding", "type": "float[]", "num_dim": 5},
      {
        "name": "chunks",
        "type": "object[]",
        "fields": [
          {"name": "text", "type": "string"},
          {"name": "vector", "type": "float[]", "num_dim": 5}
        ]
      }
    ],
    "enable_nested_fields": true
  }'
  
  
curl "http://localhost:8108/collections/nested_test/documents" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "1",
    "title": "Test Document",
    "Summary": "This is an example summary of document 1",
    "Dense_Summary_Embedding": [0.1, 0.2, 0.3, 0.4, 0.5],
    "chunks": [
      {"text": "First chunk", "vector": [0.1, 0.2, 0.3, 0.1, 0.2]},
      {"text": "Second chunk", "vector": [0.4, 0.5, 0.6, 0.4, 0.5]}
    ]
  }'
  
  
curl "http://localhost:8108/multi_search" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "searches": [
      {
        "collection": "nested_test",
        "q": "*",
        "vector_query": "chunks.vector:([0.1, 0.2, 0.3, 0.1, 0.2], k:10)"
      }
    ]
  }'
And the error message: `{"results":[{"code":400,"error":"Field chunks.vector does not have a vector query index."}]}`
It would be great to know why vector search does not work here, since text search does work on the same example, e.g.:
$ curl "http://localhost:8108/multi_search" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "searches": [
      {
        "collection": "nested_test",
        "q": "chunk",
        "query_by": "chunks.text"
      }
    ]
  }'
{"results":[{"facet_counts":[],"found":1,"hits":[{"document":{"Dense_Summary_Embedding":[0.1,0.2,0.3,0.4,0.5],"Summary":"This is an example summary of document 1","chunks":[{"text":"First chunk","vector":[0.1,0.2,0.3,0.1,0.2]},{"text":"Second chunk","vector":[0.4,0.5,0.6,0.4,0.5]}],"id":"1","title":"Test Document"},"highlight":{"chunks":[{"text":{"matched_tokens":["chunk"],"snippet":"First <mark>chunk</mark>"}},{"text":{"matched_tokens":["chunk"],"snippet":"Second <mark>chunk</mark>"}}]},"highlights":[],"text_match":578730123365187705,"text_match_info":{"best_field_score":"1108091338752","best_field_weight":15,"fields_matched":1,"num_tokens_dropped":0,"score":"578730123365187705","tokens_matched":1,"typo_prefix_score":0}}],"out_of":1,"page":1,"request_params":{"collection_name":"nested_test","first_q":"chunk","per_page":10,"q":"chunk"},"search_cutoff":false,"search_time_ms":0}]}
k
This is wrong:
{
  "name": "chunks",
  "type": "object[]",
  "fields": [
    {
      "name": "text",
      "type": "string"
    },
    {
      "name": "vector",
      "type": "float[]",
      "num_dim": 5
    }
  ]
}
You can't nest fields this way in the schema. You have to use dot notation to refer to nested fields, e.g. `chunks.vector`.
s
I don't quite understand. How would indexing multiple different chunks work then?
$ curl "http://localhost:8108/collections" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "nested_test_new2",
    "fields": [
      {"name": "id", "type": "string"},
      {"name": "title", "type": "string"},
      {"name": "Summary", "type": "string"},
      {"name": "Dense_Summary_Embedding", "type": "float[]", "num_dim": 5},
      {"name": "chunks.text", "type": "string"},
      {"name": "chunks.vector", "type": "float[]", "num_dim": 5}
    ],
    "enable_nested_fields": true
  }'
{"created_at":1737811871,"default_sorting_field":"","enable_nested_fields":true,"fields":[{"facet":false,"index":true,"infix":false,"locale":"","name":"title","optional":false,"sort":false,"stem":false,"store":true,"type":"string"},{"facet":false,"index":true,"infix":false,"locale":"","name":"Summary","optional":false,"sort":false,"stem":false,"store":true,"type":"string"},{"facet":false,"hnsw_params":{"M":16,"ef_construction":200},"index":true,"infix":false,"locale":"","name":"Dense_Summary_Embedding","num_dim":5,"optional":false,"sort":false,"stem":false,"store":true,"type":"float[]","vec_dist":"cosine"},{"facet":false,"index":true,"infix":false,"locale":"","name":"chunks.text","optional":false,"sort":false,"stem":false,"store":true,"type":"string"},{"facet":false,"hnsw_params":{"M":16,"ef_construction":200},"index":true,"infix":false,"locale":"","name":"chunks.vector","num_dim":5,"optional":false,"sort":false,"stem":false,"store":true,"type":"float[]","vec_dist":"cosine"}],"name":"nested_test_new2","num_documents":0,"symb
$ curl "http://localhost:8108/collections/nested_test/documents" -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "1",
    "title": "Test Document",
    "Summary": "This is an example summary of document 1",
    "Dense_Summary_Embedding": [0.1, 0.2, 0.3, 0.4, 0.5],
    "chunks.text": "First chunk", "chunks.vector": [0.1, 0.2, 0.3, 0.1, 0.2],
    "chunks.text": "Second chunk", "chunks.vector": [0.4, 0.5, 0.6, 0.4, 0.5]
  }'
{"message":"A document with id 1 already exists."}
Could you maybe send a working example?
If document A is indexed by Title and Summary, and I would also like it to hold its content in multiple chunks (chunk one with one embedding, chunk two with another), wouldn't this approach just overwrite the earlier chunk, as seen in the example above?
k
We don't have a way to index an array of vectors.
s
Thanks for your reply anyway, I thought as much. Until that is supported, I guess each chunk, together with the document-level information, will be indexed as its own document. After doing that, hybrid search now works, even though it duplicates the document-level data.
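For anyone landing here with the same problem, the one-document-per-chunk workaround described above could look roughly like the following. This is only a minimal sketch, not an official Typesense recipe: the `flatten_document` and `best_hit_per_parent` helpers, the `<id>_<position>` id scheme, and the `parent_id`, `chunk_text` and `chunk_vector` field names are all made up for illustration. It assumes the flat collection declares `chunk_vector` as a top-level `float[]` field with `num_dim`, the same way `Dense_Summary_Embedding` is declared in the thread.

```python
from typing import Any


def flatten_document(doc: dict[str, Any], chunks_key: str = "chunks") -> list[dict[str, Any]]:
    """Turn one parent document with N chunks into N flat documents.

    Every flat document repeats the parent-level fields and carries exactly
    one chunk, so its embedding is a plain top-level vector field.
    """
    # Parent-level fields, minus the nested chunks and the original id.
    parent = {k: v for k, v in doc.items() if k not in (chunks_key, "id")}
    flat_docs = []
    for position, chunk in enumerate(doc.get(chunks_key, [])):
        flat = dict(parent)  # shallow copy: parent values are shared, not cloned
        flat["id"] = f"{doc['id']}_{position}"  # hypothetical id scheme
        flat["parent_id"] = doc["id"]           # hypothetical field linking back to the parent
        flat["chunk_text"] = chunk["text"]
        flat["chunk_vector"] = chunk["vector"]
        flat_docs.append(flat)
    return flat_docs


def best_hit_per_parent(hits: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the first (best ranked) hit per parent_id, preserving order."""
    seen: set[str] = set()
    unique = []
    for hit in hits:
        parent_id = hit["document"]["parent_id"]
        if parent_id not in seen:
            seen.add(parent_id)
            unique.append(hit)
    return unique
```

The flat documents can then be imported as usual and searched with a `vector_query` on `chunk_vector`. Passing the `hits` list of a search result through `best_hit_per_parent` is one way to hide the document-level duplicates on the client side.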