Hi! We’re trying to evaluate using typesense in on...
# community-help
j
Hi! We’re evaluating Typesense for one of our applications. I’ve spent the last day working on batch upserting documents and I’ve run into a strange issue. When the application starts, we generate the schemas for multiple collections and create the collections if needed. Directly after that, if no documents have been created before, we batch ingest ~1000 documents across ~20 collections with different schemas. When I query one of the collections, I get results from different collections. The number of results seems correct, but the documents returned are incorrect. I’m running 0.23.0 locally on macOS and using the Node.js client API. Does anyone know if there’s a potential threading issue with creating multiple collections and batch inserting documents?
k
👋 When you say "documents returned are incorrect", do you mean that the documents are being returned from a different collection than the one being queried?
j
Exactly. Here’s an example response:
{
  "facet_counts": [],
  "found": 3,
  "hits": [
    {
      "document": {
        "_createdAt": 1643040319,
        "_publishedAt": 1654612352,
        "_updatedAt": 1654503876,
        "id": "8e0c59b8-5d01-5cce-b024-8648da3399d3",
        "type": "list"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1654079543,
        "_publishedAt": 1654612352,
        "_updatedAt": 1654503857,
        "id": "a0b26cc5-376a-57c2-b0ad-a3bb57ee59cd",
        "type": "list"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1643113413,
        "_publishedAt": 1654612352,
        "_updatedAt": 1654076197,
        "id": "fd7cbcf8-3b52-5567-829f-c0349b33c930",
        "type": "page"
      },
      "highlights": [],
      "text_match": 100
    }
  ],
  "out_of": 3,
  "page": 1,
  "request_params": {
    "collection_name": "list",
    "per_page": 10,
    "q": "*"
  },
  "search_cutoff": false,
  "search_time_ms": 0
}
I’ve added the type to the document which is the same as the collection name.
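A quick way to spot this kind of mix-up in a response is to compare each hit's `type` with the queried collection name. The sketch below assumes, as in the response above, that every document carries a `type` field equal to its collection name; the interface and function names are made up for illustration.

```typescript
// Only the parts of a search response that this check needs.
interface SearchHit {
  document: { id: string; type?: string };
}

interface SearchResponse {
  hits: SearchHit[];
  request_params: { collection_name: string };
}

// Returns the hits whose `type` differs from the collection that was queried.
// Assumption: each document's `type` field equals its collection name.
function findMisroutedHits(response: SearchResponse): SearchHit[] {
  const expected = response.request_params.collection_name;
  return response.hits.filter((hit) => hit.document.type !== expected);
}

// Trimmed copy of the response shown above.
const sample: SearchResponse = {
  hits: [
    { document: { id: "8e0c59b8-5d01-5cce-b024-8648da3399d3", type: "list" } },
    { document: { id: "a0b26cc5-376a-57c2-b0ad-a3bb57ee59cd", type: "list" } },
    { document: { id: "fd7cbcf8-3b52-5567-829f-c0349b33c930", type: "page" } },
  ],
  request_params: { collection_name: "list" },
};

const misrouted = findMisroutedHits(sample);
console.log(misrouted.map((hit) => hit.document.id));
```

For the sample above, the only hit flagged is the one with `"type": "page"`.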
k
Do the `page` and `list` collections have the same schema?
j
Here’s the schema for `list`:
{
  "created_at": 1657005437,
  "default_sorting_field": "_updatedAt",
  "fields": [
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_createdAt",
      "optional": false,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_updatedAt",
      "optional": false,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_publishedAt",
      "optional": true,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": true,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "dataset",
      "optional": false,
      "sort": false,
      "type": "string"
    }
  ],
  "name": "list",
  "num_documents": 3,
  "symbols_to_index": [],
  "token_separators": []
}
And this is for `page`:
{
  "created_at": 1657005437,
  "default_sorting_field": "_updatedAt",
  "fields": [
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "webContent.metaTitle",
      "optional": true,
      "sort": false,
      "type": "string"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_createdAt",
      "optional": false,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_updatedAt",
      "optional": false,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_publishedAt",
      "optional": true,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": true,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "dataset",
      "optional": false,
      "sort": false,
      "type": "string"
    }
  ],
  "name": "page",
  "num_documents": 23,
  "symbols_to_index": [],
  "token_separators": []
}
Page has one more field.
k
To figure out what's going wrong, I would add a non-optional field that's unique to the `list` collection and see what happens when you index.
If a page document is accidentally being indexed into the `list` collection, then an error will be thrown.
j
Ok. Give me a minute and I’ll try.
It did not throw an error the first time I ran the migration, but the second time it gave me this: ``RequestMalformed: Request failed with HTTP code 400 | Server said: Field `page` is not part of collection schema.``
k
You mean the schema migration?
j
I added a field with the same name as the collection and set its value to the collection name at ingestion time.
Sorry, the second time I did a batch upsert.
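The marker-field approach described above could look roughly like this. It is a sketch only: the helper names are hypothetical and the schema is a trimmed-down example, not the real one from the application.

```typescript
// Subset of the collection schema fields used in this thread.
interface FieldDef {
  name: string;
  type: string;
  optional?: boolean;
  facet?: boolean;
}

interface CollectionSchema {
  name: string;
  fields: FieldDef[];
  default_sorting_field?: string;
}

// Adds a required string field named after the collection itself, so that
// a document prepared for another collection is missing it.
function addMarkerField(schema: CollectionSchema): CollectionSchema {
  const marker: FieldDef = { name: schema.name, type: "string", optional: false };
  return { ...schema, fields: [...schema.fields, marker] };
}

// Sets the marker field on a document at ingestion time.
function tagDocument(
  collection: string,
  doc: Record<string, unknown>,
): Record<string, unknown> {
  return { ...doc, [collection]: collection };
}

// Illustrative values only.
const listSchema = addMarkerField({
  name: "list",
  default_sorting_field: "_updatedAt",
  fields: [
    { name: "_updatedAt", type: "int64" },
    { name: "dataset", type: "string", facet: true },
  ],
});

const listDoc = tagDocument("list", { id: "example-1", _updatedAt: 1654503876, dataset: "global" });
console.log(listSchema.fields.map((field) => field.name), listDoc);
```

A document tagged for `page` has no `list` field, so if it ended up in an import for the `list` collection, the expectation is that the server rejects it because a non-optional field is missing.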
k
The `Field X is not part of collection schema.` error message is returned as part of a schema change 🤔
j
Sorry, my bad. Disregard the above. Let me try again. My schema migration was the cause of that error.
k
👍
j
By adding a unique field to the collection it now seems to work. Is schema uniqueness a requirement?
k
No, it's not... What I am trying to figure out is what was happening earlier. If indeed the wrong document was being sent to the wrong collection, now with a unique field it should be throwing an error since we have not changed anything else apart from a unique constraint.
There are only two explanations for the earlier behavior: a) a client-side error where the code erroneously sent the wrong document type to the collection, or b) some race condition inside Typesense that sent the document to the wrong collection. In both cases, if a schema mismatch happens, an error should be thrown. So I'm surprised to see it getting indexed fine now.
j
I’ve tried running the upsert script a couple of times now and the responses seem correct…
k
Is it possible for you to extract the behavior (without unique field) into a standalone script that I can run to reproduce the issue?
j
Sure I can try.
I’ve managed to recreate the issue in isolation.
If you run `yarn run-test` multiple times it will start to mix collections in the response:
`curl -H "X-TYPESENSE-API-KEY: xyz" "http://localhost:8108/collections/product/documents/search?q=*&query_by=dataset" | jq`
{
  "facet_counts": [],
  "found": 23,
  "hits": [
    {
      "document": {
        "_createdAt": 1622619587,
        "dataset": "global",
        "id": "global:8fd389b3-d63a-5a63-b616-7b3320293100",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1654261822,
        "dataset": "global",
        "id": "global:6504ace7-7273-5590-8ee8-b263a302d365",
        "type": "product"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1654159883,
        "dataset": "global",
        "id": "global:403d678a-0b39-5caa-8aa2-e583b3737cdb",
        "type": "product"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1622619587,
        "dataset": "global",
        "id": "global:3b385fef-0611-5178-b300-006889e071bc",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1622619587,
        "dataset": "global",
        "id": "global:45fdca58-958c-578a-aafa-a09e110b0af4",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1654088646,
        "dataset": "global",
        "id": "global:2fc86262-f5b8-5fa7-8010-240f95dae313",
        "type": "product"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1622619587,
        "dataset": "global",
        "id": "global:c7471607-589b-5d20-90e6-92011d1eb194",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1628508407,
        "dataset": "dataset1",
        "id": "dataset1:f82356a0-ca90-5efa-9f4c-6b58d9e35a3f",
        "type": "author"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1622620250,
        "dataset": "global",
        "id": "global:72fa23bf-a5d1-5028-ba2e-801ce8841219",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1653301312,
        "dataset": "global",
        "id": "global:c44292ef-c1ae-58a9-8df7-af01956f6149",
        "type": "product"
      },
      "highlights": [],
      "text_match": 100
    }
  ],
  "out_of": 23,
  "page": 1,
  "request_params": {
    "collection_name": "product",
    "per_page": 10,
    "q": "*"
  },
  "search_cutoff": false,
  "search_time_ms": 0
}
It only seems to happen when not waiting for the response from the server (e.g. `collectionNames.forEach(async (name) => {})` instead of `for (const name of collectionNames)`). The `forEach` call spawns multiple promises and doesn't wait for the earlier ones to finish, whereas the `for...of` loop with `await` handles one collection at a time.
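To make the difference concrete, here is a minimal sketch that needs no Typesense server. `fakeUpsert` is a hypothetical stand-in for the real batch upsert and only records when each call starts and ends.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for a batch upsert: logs start and end around a short delay.
async function fakeUpsert(name: string, log: string[]): Promise<void> {
  log.push(`start ${name}`);
  await sleep(5);
  log.push(`end ${name}`);
}

// forEach discards the promise returned by the async callback, so every
// upsert is started before any of them has finished.
function upsertWithForEach(names: string[], log: string[]): void {
  names.forEach(async (name) => {
    await fakeUpsert(name, log);
  });
}

// for...of with await finishes one upsert before starting the next.
async function upsertSequentially(names: string[], log: string[]): Promise<void> {
  for (const name of names) {
    await fakeUpsert(name, log);
  }
}

async function demo(): Promise<void> {
  const sequential: string[] = [];
  await upsertSequentially(["list", "page"], sequential);
  console.log(sequential.join(", "));

  const overlapping: string[] = [];
  upsertWithForEach(["list", "page"], overlapping);
  // Both calls have started, and neither has finished yet.
  console.log(overlapping.join(", "));
}

void demo();
```

If concurrent requests are actually wanted, `await Promise.all(names.map(...))` at least lets the caller wait for all of them, which `forEach` cannot do.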
k
Thanks, we will look into this and keep you posted.
@Johan Stille I think this might be because of the `client` object being shared across all the async functions. Can you try instantiating the client object inside the async function?
@Johan Stille Were you able to figure this out?
I identified a potential race condition that could happen locally (but would be super rare when you connect to Typesense on another host), which I've fixed in the `0.24.0.rc20` build.