Hi We re trying to evaluate using typesense in one of our ap typesense #community-help


Johan Stille

07/05/2022, 7:28 AM

Hi! We’re trying to evaluate using Typesense in one of our applications. I’ve spent the last day working on batch upserting documents and I have a strange issue. When the application starts we generate the schema for multiple collections and create them if needed. Directly after, if no documents have been created before, we batch ingest ~1000 documents across ~20 collections with different schemas. When I try to query one of the collections I get results from different collections. The number of results seems correct, but the documents returned are incorrect. I’m running 0.23.0 locally on macOS and using the Node client API. Does anyone know if there’s a potential threading issue with creating multiple collections and batch inserting documents?

Kishore Nallan

07/05/2022, 7:30 AM

👋 When you say "documents returned are incorrect", do you mean that the documents are being returned from a different collection than the one being queried?

Johan Stille

07/05/2022, 7:31 AM

Exactly. Here’s an example response:


{
  "facet_counts": [],
  "found": 3,
  "hits": [
    {
      "document": {
        "_createdAt": 1643040319,
        "_publishedAt": 1654612352,
        "_updatedAt": 1654503876,
        "id": "8e0c59b8-5d01-5cce-b024-8648da3399d3",
        "type": "list"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1654079543,
        "_publishedAt": 1654612352,
        "_updatedAt": 1654503857,
        "id": "a0b26cc5-376a-57c2-b0ad-a3bb57ee59cd",
        "type": "list"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1643113413,
        "_publishedAt": 1654612352,
        "_updatedAt": 1654076197,
        "id": "fd7cbcf8-3b52-5567-829f-c0349b33c930",
        "type": "page"
      },
      "highlights": [],
      "text_match": 100
    }
  ],
  "out_of": 3,
  "page": 1,
  "request_params": {
    "collection_name": "list",
    "per_page": 10,
    "q": "*"
  },
  "search_cutoff": false,
  "search_time_ms": 0
}

Johan Stille

07/05/2022, 7:31 AM

I’ve added the type to the document which is the same as the collection name.
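Because each document carries a `type` equal to its collection name, the mix-up in a response like the one above can be checked mechanically. The following is a minimal sketch, not part of the original thread: `findMisplacedHits` is a hypothetical helper name, and it assumes only the response fields visible above (`hits`, `document.type`, `request_params.collection_name`).

```javascript
// Hypothetical helper: list the hits whose `type` field does not match
// the collection that was actually queried.
function findMisplacedHits(response) {
  const expected = response.request_params.collection_name;
  return response.hits
    .filter((hit) => hit.document.type !== expected)
    .map((hit) => ({ id: hit.document.id, type: hit.document.type }));
}

// Trimmed copy of the search response shown above.
const sampleResponse = {
  found: 3,
  hits: [
    { document: { id: "8e0c59b8-5d01-5cce-b024-8648da3399d3", type: "list" } },
    { document: { id: "a0b26cc5-376a-57c2-b0ad-a3bb57ee59cd", type: "list" } },
    { document: { id: "fd7cbcf8-3b52-5567-829f-c0349b33c930", type: "page" } },
  ],
  request_params: { collection_name: "list", per_page: 10, q: "*" },
};

// Reports the single `page` document returned from the `list` query.
console.log(findMisplacedHits(sampleResponse));
```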

Kishore Nallan

07/05/2022, 7:32 AM

Do the `page` and `list` collections have the same schema?

Johan Stille

07/05/2022, 7:35 AM

Here’s the schema for `list`:


{
  "created_at": 1657005437,
  "default_sorting_field": "_updatedAt",
  "fields": [
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_createdAt",
      "optional": false,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_updatedAt",
      "optional": false,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_publishedAt",
      "optional": true,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": true,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "dataset",
      "optional": false,
      "sort": false,
      "type": "string"
    }
  ],
  "name": "list",
  "num_documents": 3,
  "symbols_to_index": [],
  "token_separators": []
}

Johan Stille

07/05/2022, 7:35 AM

And this is for `page`:

Johan Stille

07/05/2022, 7:35 AM


{
  "created_at": 1657005437,
  "default_sorting_field": "_updatedAt",
  "fields": [
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "webContent.metaTitle",
      "optional": true,
      "sort": false,
      "type": "string"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_createdAt",
      "optional": false,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_updatedAt",
      "optional": false,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": false,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "_publishedAt",
      "optional": true,
      "sort": true,
      "type": "int64"
    },
    {
      "facet": true,
      "index": true,
      "infix": false,
      "locale": "",
      "name": "dataset",
      "optional": false,
      "sort": false,
      "type": "string"
    }
  ],
  "name": "page",
  "num_documents": 23,
  "symbols_to_index": [],
  "token_separators": []
}

Johan Stille

07/05/2022, 7:36 AM

Page has one more field.

Kishore Nallan

07/05/2022, 7:36 AM

To figure out what's going wrong, I would make a field that's non-optional and unique to the `list` collection and see what happens when you index.

Kishore Nallan

07/05/2022, 7:37 AM

If a page document is accidentally being indexed into the `list` collection, then an error will be thrown.

Johan Stille

07/05/2022, 7:37 AM

Ok. Give me a minute and I’ll try.

Johan Stille

07/05/2022, 7:41 AM

It did not throw an error the first time I ran the migration, but the second time it gave me this: `RequestMalformed: Request failed with HTTP code 400 | Server said: Field page is not part of collection schema.`

Kishore Nallan

07/05/2022, 7:41 AM

You mean the schema migration?

Johan Stille

07/05/2022, 7:41 AM

I added a field with the same name as the collection and set its value to the collection name at ingestion time.
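The diagnostic described here (a required field that exists only in one collection's schema, filled in at ingestion time) could look roughly like the sketch below. The helper names `withMarkerField` and `tagDocument` are hypothetical, and the `string` type for the marker field is an assumption; the thread only says that the field is named after the collection and holds the collection name.

```javascript
// Hypothetical helper: return a copy of the schema with one extra
// required field named after the collection itself. A document that
// lacks this field should then be rejected by that collection.
function withMarkerField(schema) {
  return {
    ...schema,
    fields: [
      ...schema.fields,
      // Assumption: a plain string field; the thread does not state the type.
      { name: schema.name, type: "string", optional: false },
    ],
  };
}

// Hypothetical helper: set the marker field on a document before ingestion.
function tagDocument(collectionName, doc) {
  return { ...doc, [collectionName]: collectionName };
}

const listSchema = withMarkerField({
  name: "list",
  default_sorting_field: "_updatedAt",
  fields: [
    { name: "_createdAt", type: "int64" },
    { name: "_updatedAt", type: "int64" },
  ],
});

const listDoc = tagDocument("list", { id: "1", _createdAt: 1, _updatedAt: 2 });
console.log(listSchema.fields.map((f) => f.name), listDoc);
```

With this in place, a `page` document (tagged with a `page` field but no `list` field) sent to the `list` collection would be missing a required field, which is the error Kishore expected to see if documents were being routed incorrectly.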

Johan Stille

07/05/2022, 7:42 AM

Sorry the second time I made a batch upsert.

Kishore Nallan

07/05/2022, 7:44 AM

The `Field X is not part of collection schema.` error message is returned as part of a schema change 🤔

Johan Stille

07/05/2022, 7:44 AM

Sorry my bad. Disregard the above. Let me try again. My schema migration was the cause of that error.

Kishore Nallan

07/05/2022, 7:45 AM

👍

Johan Stille

07/05/2022, 7:47 AM

By adding a unique field to the collection it now seems to work. Is schema uniqueness a requirement?

Kishore Nallan

07/05/2022, 7:48 AM

No, it's not... What I am trying to figure out is what was happening earlier. If indeed the wrong document was being sent to the wrong collection, now with a unique field it should be throwing an error since we have not changed anything else apart from a unique constraint.

Kishore Nallan

07/05/2022, 7:51 AM

There are only 2 explanations for the earlier behavior: a) Either a client side error where code erroneously sent the wrong document type to the collection. b) Some race condition inside Typesense that sent the document to the wrong collection. In both cases, if schema mismatch happens, an error should be thrown. So I'm surprised to see it getting indexed fine now.

Johan Stille

07/05/2022, 7:51 AM

I’ve tried running the upsert script a couple of times now and the responses seem correct…

Kishore Nallan

07/05/2022, 7:52 AM

Is it possible for you to extract the behavior (without unique field) into a standalone script that I can run to reproduce the issue?

Johan Stille

07/05/2022, 7:52 AM

Sure I can try.

Johan Stille

07/05/2022, 8:39 AM

https://github.com/withaspoon/typesense-upsert-issue

Johan Stille

07/05/2022, 8:40 AM

I’ve managed to recreate the issue in isolation.

Johan Stille

07/05/2022, 8:40 AM

If you run `yarn run-test` multiple times it will start to mix collections in the response:

Johan Stille

07/05/2022, 8:41 AM

curl -H "X-TYPESENSE-API-KEY: xyz" "http://localhost:8108/collections/product/documents/search?q=*&query_by=dataset" | jq

Johan Stille

07/05/2022, 8:41 AM


{
  "facet_counts": [],
  "found": 23,
  "hits": [
    {
      "document": {
        "_createdAt": 1622619587,
        "dataset": "global",
        "id": "global:8fd389b3-d63a-5a63-b616-7b3320293100",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1654261822,
        "dataset": "global",
        "id": "global:6504ace7-7273-5590-8ee8-b263a302d365",
        "type": "product"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1654159883,
        "dataset": "global",
        "id": "global:403d678a-0b39-5caa-8aa2-e583b3737cdb",
        "type": "product"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1622619587,
        "dataset": "global",
        "id": "global:3b385fef-0611-5178-b300-006889e071bc",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1622619587,
        "dataset": "global",
        "id": "global:45fdca58-958c-578a-aafa-a09e110b0af4",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1654088646,
        "dataset": "global",
        "id": "global:2fc86262-f5b8-5fa7-8010-240f95dae313",
        "type": "product"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1622619587,
        "dataset": "global",
        "id": "global:c7471607-589b-5d20-90e6-92011d1eb194",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1628508407,
        "dataset": "dataset1",
        "id": "dataset1:f82356a0-ca90-5efa-9f4c-6b58d9e35a3f",
        "type": "author"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1622620250,
        "dataset": "global",
        "id": "global:72fa23bf-a5d1-5028-ba2e-801ce8841219",
        "type": "productEntryCategory"
      },
      "highlights": [],
      "text_match": 100
    },
    {
      "document": {
        "_createdAt": 1653301312,
        "dataset": "global",
        "id": "global:c44292ef-c1ae-58a9-8df7-af01956f6149",
        "type": "product"
      },
      "highlights": [],
      "text_match": 100
    }
  ],
  "out_of": 23,
  "page": 1,
  "request_params": {
    "collection_name": "product",
    "per_page": 10,
    "q": "*"
  },
  "search_cutoff": false,
  "search_time_ms": 0
}

Johan Stille

07/05/2022, 8:44 AM

It only seems to happen when not waiting for the response from the server (e.g. `collectionNames.forEach(async (name) => {})` instead of `for (const name of collectionNames)`). The `forEach` call spawns multiple promises without waiting for earlier ones to finish, whereas the `for...of` loop, combined with `await`, finishes each request before starting the next.
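The difference between the two loop styles can be seen without a Typesense server. The sketch below is illustrative only: `makeFakeUpsert` is a hypothetical stand-in for a batch upsert request that just records when each call starts and ends.

```javascript
// Hypothetical stand-in for a network request: records start/end events
// and resolves after a short delay.
function makeFakeUpsert(events, delayMs = 10) {
  return (name) => {
    events.push(`start ${name}`);
    return new Promise((resolve) =>
      setTimeout(() => {
        events.push(`end ${name}`);
        resolve(name);
      }, delayMs)
    );
  };
}

// forEach discards the promise returned by the async callback, so every
// request is started before any has finished, and this function returns
// while the requests are still in flight.
function upsertWithForEach(names, upsert) {
  names.forEach(async (name) => {
    await upsert(name);
  });
}

// for...of inside an async function awaits each request before
// starting the next one.
async function upsertSequentially(names, upsert) {
  for (const name of names) {
    await upsert(name);
  }
}

// If concurrency is wanted, Promise.all at least lets the caller wait
// for all requests to finish.
async function upsertConcurrently(names, upsert) {
  await Promise.all(names.map((name) => upsert(name)));
}
```

Only the first variant gives the caller no way to know when the upserts are done; the other two differ in whether the requests overlap.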

Kishore Nallan

07/05/2022, 9:21 AM

Thanks, we will look into this and keep you posted.

Kishore Nallan

07/05/2022, 10:06 AM

@Johan Stille I think this might be because of the `client` object being shared across all the async functions. Can you try instantiating the client object inside the async function?
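One way to try this suggestion is to build a fresh client inside each async task instead of sharing one. This is a sketch under assumptions: `upsertWithOwnClients` and the `createClient` factory parameter are hypothetical names, and the shape of `batches` (collection name mapped to an array of documents) is assumed. The `collections(name).documents().import(docs, { action: "upsert" })` call chain is the Typesense JavaScript client's bulk import call.

```javascript
// createClient is a caller-supplied factory (hypothetical), for example:
//   () => new Typesense.Client({
//     nodes: [{ host: "localhost", port: 8108, protocol: "http" }],
//     apiKey: "xyz",
//   })
// batches maps a collection name to the documents destined for it.
async function upsertWithOwnClients(createClient, batches) {
  const tasks = Object.entries(batches).map(async ([name, docs]) => {
    // A new client per task, so no client state is shared between
    // concurrently running upserts.
    const client = createClient();
    return client
      .collections(name)
      .documents()
      .import(docs, { action: "upsert" });
  });
  // Wait for every collection's import to settle before returning.
  return Promise.all(tasks);
}
```

This only tests the shared-client hypothesis; it does not rule out a server-side cause.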

Kishore Nallan

07/19/2022, 3:12 AM

@Johan Stille Were you able to figure this out?

Kishore Nallan

07/19/2022, 11:18 AM

I identified a potential race condition that could happen locally (but would be super rare when you connect to Typesense on another host), which I've fixed in the `0.24.0.rc20` build.
