~We've recreated this issue on our non-prod cluster, which is not HA.
I've also had some success in reproducing the issue:~
1. Create products collection. (products_prod_1.0)
2. Create the prices collection. (productPrices_prod_1.0)
a. This collection has a reference back to products, with async_reference = true
3. Trigger some incremental updates to the prices collection.
a. This happens while both collections are still empty, which is why the async_reference option is needed.
b. We use a POST to create new documents.
4. Do a bulk update of both collections.
a. This uses the import option with action=upsert. We send batches of 2500 products to the products collection, then send the prices for those same products to the prices collection.
5. During the bulk updates, querying the products collection with a left join to prices begins to fail with this message: "Failed to join on `productPrices_prod_1.0`: No reference field found."
a. This doesn't happen immediately. It appears to happen when the bulk update process eventually comes to a batch of 2500 products that includes products I triggered price updates for at the very start.
b. We haven't been able to reproduce the failure of the bulk load on this non-prod cluster, but instead get this error when querying the collection.
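For anyone trying to reproduce this, here is a rough sketch of the request payloads behind the steps above. It only builds the schema, the JSONL batches, and the import path; it does not talk to a server. The field names `price` and `product_id`, the helper names, and the idea that the reference points at the products `id` field are assumptions for illustration, not our exact schema.

```python
import json

PRODUCTS = "products_prod_1.0"
PRICES = "productPrices_prod_1.0"
BATCH_SIZE = 2500  # batch size used in step 4


def prices_schema():
    """Schema for the prices collection (step 2).

    `price` and `product_id` are hypothetical field names. The parts taken
    from the steps above are the reference back to products and
    async_reference being true.
    """
    return {
        "name": PRICES,
        "fields": [
            {"name": "price", "type": "float"},
            {
                "name": "product_id",
                "type": "string",
                # Reference strings have the form "<collection>.<field>".
                "reference": f"{PRODUCTS}.id",
                "async_reference": True,
            },
        ],
    }


def batches(docs, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def to_jsonl(docs):
    """Serialize documents one JSON object per line, as the import endpoint expects."""
    return "\n".join(json.dumps(doc) for doc in docs)


def import_path(collection):
    """Path for the bulk import with action=upsert (step 4)."""
    return f"/collections/{collection}/documents/import?action=upsert"
```

Each batch from `batches(products)` would be sent to `import_path(PRODUCTS)` first, followed by the matching prices to `import_path(PRICES)`.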
~If I repeat the steps above with collection names that contain only underscores and no dots, things work as expected.
Is it possible that the dotted naming convention isn't handled well when a POST creates price documents whose referenced product doesn't exist in the products collection yet? It seems like the price is created with a phantom reference to a collection that will never exist, because the referenced collection name is parsed incorrectly.~
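To illustrate the suspicion above, here is a small sketch of how a reference string for a dotted collection name could be read two different ways. This is only a guess at the failure mode, not the server's actual parsing code; both helper functions are hypothetical.

```python
def split_first_dot(reference):
    """Split "<collection>.<field>" at the FIRST dot (breaks on dotted names)."""
    collection, _, field = reference.partition(".")
    return collection, field


def split_last_dot(reference):
    """Split "<collection>.<field>" at the LAST dot (keeps dotted names intact)."""
    collection, _, field = reference.rpartition(".")
    return collection, field


ref = "products_prod_1.0.id"
print(split_first_dot(ref))  # ('products_prod_1', '0.id')
print(split_last_dot(ref))   # ('products_prod_1.0', 'id')
```

If any code path splits at the first dot, the price document would point at `products_prod_1`, a collection that never exists, which would match the "No reference field found" symptom.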