Hi all. We're still having trouble with loading o...
# community-help
s
Hi all. We're still having trouble loading our production collections. We had some issues last week that were related to excessive load on the Typesense server. We discovered a bug in our incremental loading that was artificially generating a huge amount of load. This has been resolved, and our admin dashboard shows CPU, RAM, and pending writes as stable. But we are still getting failures in our nightly bulk load. We have 2 collections (products and prices), where prices has a reference back to products. We upgraded to v28 to take advantage of the async_reference option, because it simplifies our loads. But for the second time we've gotten this error message when trying to update a product record:
{u'message': u'2499 documents imported successfully, 1 documents failed during import. Use `error.importResults` from the raised exception to get a detailed error reason for each document.', u'name': u'Error', u'stack': u'Error: 2499 documents imported successfully, 1 documents failed during import. Use `error.importResults` from the raised exception to get a detailed error reason for each document.\n    at TypesenseService.importDocumentsInJsonFormat (webpack-internal:///422:362:13)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:97:5)\n    at ProductSearchAreaService.updateDocuments (webpack-internal:///426:188:37)\n    at ProductSearchAreaService.updateAllCollections (webpack-internal:///221:103:11)\n    at ProductSearchAreaService.syncAll (webpack-internal:///221:41:5)\n    at ScheduledTaskService.processTask (webpack-internal:///540:310:22)'}
And using the 'error.importResults' we see this error detail:
{
  "code": 400,
  "error": "`productPrices_prod_1.0.productId` does not have a reference to `products_prod_1.0` collection.",
  "success": false
}
My understanding was that the async_reference: true setting would prevent this error. Also concerning is the fact that we can't query, update, delete, or create a record for that ID. See the attached screenshots. Is this a bug in the async_reference option just released?
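For anyone reading along, here is a minimal sketch of how the per-document results could be matched back to the documents that failed. It assumes the results are already parsed into a list of dicts shaped like the error detail above, and that they come back in the same order as the submitted documents. The function name `failed_imports` is hypothetical and not part of any Typesense client.

```python
def failed_imports(results, documents):
    """Pair each failed import result with the document that caused it.

    Assumes results[i] describes documents[i] (same order as submitted).
    """
    failures = []
    for doc, result in zip(documents, results):
        if not result.get("success", False):
            failures.append(
                {
                    "document": doc,
                    "code": result.get("code"),
                    "error": result.get("error"),
                }
            )
    return failures
```

With a batch of 2500 documents this narrows the summary error down to the one record that needs a closer look.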
f
My first assumption here is the dot notation on your collection's name. I suspect that Typesense is trying to find a different collection because of it. Otherwise, please share your collection schema for both the products and their prices
Could you try first changing the naming scheme of your collections to only use underscores (`_`) vs using dots for the semantic versioning? Since dots are used for denoting child fields on references, that may be at play here.
s
Yes, we can make that change. I suspect there is something else still going on though, because this usually does work. And after the upgrade I specifically tested adding prices when related products did and did not exist. But if avoiding dot notation is a best practice, we can do that. Thankfully we have an alias in place so it is a pretty straightforward change.
Is the async_reference verified to work with HA clusters where multiple nodes are involved?
j
CC: @Harpreet Sangar
s
We've recreated this issue on our non-prod cluster, which is not HA. I've also had some success in reproducing the issue:
1. Create the products collection. (products_prod_1.0)
2. Create the prices collection. (productPrices_prod_1.0)
   a. This collection has a reference back to products, with async_reference = true.
3. Trigger some incremental updates to the prices collection.
   a. This is while both collections are blank, so the async_reference option is needed.
   b. We use a POST to create new documents.
4. Do a bulk update of both collections.
   a. This uses the import option with action=upsert. We do groups of 2500 products to the products collection, then prices for those same products go to the prices collection.
5. During the bulk updates, querying the products collection with a left join to prices begins to fail with this message: "Failed to join on `productPrices_prod_1.0`: No reference field found."
   a. This doesn't happen immediately. It appears to happen when the bulk update process eventually comes to a batch of 2500 products that includes products I triggered price updates for at the very start.
   b. We haven't been able to reproduce the failure of the bulk load on this non-prod cluster, but instead get this error when querying the collection.

If I repeat the steps above with collection names that contain only underscores and no dots, things work as expected. Is it possible that when a POST creates price documents whose referenced product doesn't exist yet, the naming convention with dots isn't navigated well? It seems like the price is created with a phantom reference to a collection that will never exist, because the referenced collection name isn't handled well. But the search with GET and bulk import with PUT both seem to handle the referenced collection name with dots OK. So when the product document gets created, it is in the correct collection. But the prices still hold a reference to a non-existent document that now doesn't match the real reference?

Just trying to make sense of what could explain what we're seeing: bulk import not working in HA, and search breaking on single-node clusters, when we have dots in our referenced collection names. For now I'm going to proceed with collection names that don't use dots, and see if we can find any scenarios where this issue happens again.
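The load order described in step 4 can be sketched roughly as follows. This is only an illustration of the batching, not our actual loader: `paired_batches` and the `prices_by_product_id` lookup are hypothetical names, and each yielded pair would then be sent to the import endpoint with action=upsert, products first.

```python
from itertools import islice


def paired_batches(products, prices_by_product_id, batch_size=2500):
    """Yield (product_batch, price_batch) pairs so prices follow their products.

    products: iterable of product dicts, each with an "id" key.
    prices_by_product_id: dict mapping product id -> list of price dicts.
    """
    iterator = iter(products)
    while True:
        product_batch = list(islice(iterator, batch_size))
        if not product_batch:
            return
        # Collect only the prices that belong to this batch of products.
        price_batch = [
            price
            for product in product_batch
            for price in prices_by_product_id.get(product["id"], [])
        ]
        yield product_batch, price_batch
```

Keeping the two batches paired means a price is only upserted after its product was sent in the same round.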
Deduced some more information last night:
• If I create prices and use the collection alias in the reference, the bulk import fails like what we saw in our HA prod cluster.
• If I create prices and use the actual collection name in the reference but it has dots, the search fails once bulk import reaches related documents.
• If I create prices and use the actual collection name in the reference and it only has underscores, it works!

@TypesenseTeam, I'm not sure if you're able to verify any of this or explain this behavior based on what you see in the backend. But for the current release of async_reference, it seems very important to use the actual collection name (not an alias) and to not use dots in the naming.
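A minimal sketch of a prices schema that follows these findings, with a guard against dotted names. The helper `price_schema`, the `price` field, and the choice of `id` as the referenced field are assumptions for illustration; only `productId`, `reference`, and `async_reference` come from this thread. The helper cannot tell an alias from a real collection name, so passing the real name is up to the caller.

```python
def price_schema(prices_collection, products_collection):
    """Build a prices schema whose productId field references products.

    Rejects collection names containing dots, since those broke joins for us.
    """
    for name in (prices_collection, products_collection):
        if "." in name:
            raise ValueError(f"collection name {name!r} contains a dot")
    return {
        "name": prices_collection,
        "fields": [
            {
                "name": "productId",
                "type": "string",
                # Pass the real collection name here, not an alias.
                "reference": f"{products_collection}.id",
                "async_reference": True,
            },
            # Hypothetical payload field, only here to round out the schema.
            {"name": "price", "type": "float"},
        ],
    }
```

For example, `price_schema("productPrices_prod_1_0", "products_prod_1_0")` produces a reference of `products_prod_1_0.id`, while the dotted names from the original setup raise a `ValueError`.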
h
Hi @Scott Nei Thanks for the detailed report. I'll try to reproduce the issue.
@Scott Nei I hope this issue has been resolved with the latest RC build.
s
If you added a fix, I’m sure it is good. We just switched to using names without dots, and always define joined collection references with the actual name, not the alias. That point would be nice to change though, if you’ve fixed the issue of aliases not working in defining related fields in the schema.
h
The issue that we fixed was related to bulk import of `async_reference` being stuck due to a deadlock. I think what you're describing isn't related to that issue.
We just switched to using names without dots
Since we support referencing object fields that have a field name like `object_name.field_name`, dots in the collection name itself can interfere with the parsing logic. So if you mention `"reference": "productPrices_prod_1.0.productId"` it'll be parsed as `collection: productPrices_prod_1` and `field: 0.productId`.
Regarding using an alias name in the `reference` property, the join will only work as long as you don't update the aliased collection. The reason for this behaviour is that the references are static, to achieve better query response time. By static I mean that we store the internal id of the referenced document in the document that references it. So when the referenced collection is updated, the references might break. I'd suggest setting up aliases only for querying purposes and treating the referenced and referencing collections as a group. https://typesense.org/docs/28.0/api/joins.html#using-aliases-with-joins
So when the referenced collection is updated, the references might break.
To clarify, updating individual fields of referenced documents won't break the references, but deleting a referenced document and re-indexing it will.
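The parsing ambiguity described above can be illustrated with a toy splitter. This is not Typesense's actual parser, just a sketch that assumes the reference string is split at the first dot; `split_reference` is a hypothetical name.

```python
def split_reference(reference):
    """Split a 'collection.field' reference at the first dot (illustrative only)."""
    collection, _, field = reference.partition(".")
    return collection, field


# A dotted collection name is cut in the wrong place:
print(split_reference("productPrices_prod_1.0.productId"))
# ('productPrices_prod_1', '0.productId')

# An underscore-only name splits as intended:
print(split_reference("productPrices_prod_1_0.productId"))
# ('productPrices_prod_1_0', 'productId')
```

Under that assumption the only way to keep the collection name intact is to avoid dots in it, which matches what worked in practice in this thread.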