Is it possible to do joins on collections articles and pagev typesense #community-help

Mikkel Birkegaard Andersen

02/25/2025, 10:05 AM

Is it possible to do joins on collections articles and pageviews with articles.id = pageviews.articleId, without explicitly setting references in the schema? We have collections that get updates from two streams, and the problem is that we aren’t sure of the order in which we receive updates: a pageview event may arrive before the article, which gives us 404s. Are we approaching this correctly? We could put articles and pageviews into the same collection using `emplace`, but we have a follow-up use case where pageviews can come from a different and dynamic number of sources, and that won’t work very nicely with the everything-in-one-collection approach.

Fanis Tharropoulos

02/25/2025, 10:33 AM

Why not use references?

Mikkel Birkegaard Andersen

02/25/2025, 10:35 AM

Because we may receive the pageview data before we receive the article data and then we get a 404 when inserting

Fanis Tharropoulos

02/25/2025, 10:36 AM

Try setting it as optional and updating the doc

Mikkel Birkegaard Andersen

02/25/2025, 10:56 AM

Would we then have to reinsert the pageviews when the article comes in?

Fanis Tharropoulos

02/25/2025, 11:31 AM

Yes

Mikkel Birkegaard Andersen

02/25/2025, 11:41 AM

That doesn’t work then, since that doesn’t let us process the incoming streams independently.

Fanis Tharropoulos

02/25/2025, 11:49 AM

Could you explain what the workflow looks like? You're getting pageview data before getting the article data. But the pageview data references an article? Shouldn't an article reference a pageview in that case?

Mikkel Birkegaard Andersen

02/25/2025, 11:58 AM

Generally yes, that’s what you would expect, but the data is coming from different systems and CMSes which are not under our control, and we have cases where the pageview data was made available to us months ago (we store it in a Kafka topic), while the articles are only slowly being backfilled on a case-by-case basis.

Fanis Tharropoulos

02/25/2025, 12:03 PM

Is the article being referenced by a pageview or a pageview by an article?

Mikkel Birkegaard Andersen

02/25/2025, 12:04 PM

pageviews has an articleId field which references article.id
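For readers following along, the join being discussed could be expressed with search parameters along these lines. This is a minimal sketch, assuming `pageviews.articleId` is declared in the schema as a reference to `articles.id` and that pageviews carry a numeric `views` field (a hypothetical name, not given in the thread). It only builds the parameter dictionary and does not contact a Typesense server.

```python
# Search parameters for querying the `articles` collection while joining
# in the pageviews documents that reference each article.
# Assumption: pageviews.articleId is a reference field pointing at articles.id.
# Assumption: `views` is a hypothetical numeric field on pageviews.
search_params = {
    "q": "*",
    # Keep only articles that have a referencing pageviews document with views > 0.
    "filter_by": "$pageviews(views:>0)",
    # Pull the joined pageviews field into each article hit.
    "include_fields": "$pageviews(views)",
}

print(search_params)
```

Note that a join filter like this returns only articles that currently have a matching pageviews document.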

Fanis Tharropoulos

02/25/2025, 12:05 PM

So an article can be on many pageviews but a pageview can have a single article? And a pageview can't exist without an article, but it's being filled before an article?

Mikkel Birkegaard Andersen

02/25/2025, 12:16 PM

An article only has one pageview object (the latest), but we will receive many pageview update events over the lifetime of an article. A pageview cannot logically exist without an article, but we may not be told about the article before we know about the pageview.

Mikkel Birkegaard Andersen

02/25/2025, 12:17 PM

The pageview event is an aggregate of views over the article’s lifetime. It is being computed by another team. Right now we get them daily, but we may get them at other frequencies in the future. Still other teams are responsible for providing their articles. When we get a pageview update we may have the article already, we might receive it in the future, or we might never receive it.

Fanis Tharropoulos

02/25/2025, 12:20 PM

For your issue with pageviews arriving before articles, I'd suggest:
1. Make your pageviews schema flexible enough to accept entries even when the referenced article doesn't exist yet
2. Use a background job to periodically "reconcile" these orphaned pageviews with articles as they arrive
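The reconcile step suggested above might look roughly like the following. This is an in-memory sketch only: the `reconcile` function, the `views` field, and the sample IDs are hypothetical, and the actual re-insert of linkable pageviews into Typesense is left out.

```python
from typing import Iterable, List, Set, Tuple


def reconcile(orphans: Iterable[dict], known_article_ids: Set[str]) -> Tuple[List[dict], List[dict]]:
    """Split orphaned pageviews into those whose article has arrived and those still waiting."""
    linkable: List[dict] = []
    waiting: List[dict] = []
    for pageview in orphans:
        if pageview["articleId"] in known_article_ids:
            linkable.append(pageview)  # article exists now; can be re-inserted with the reference set
        else:
            waiting.append(pageview)   # article not seen yet (and may never arrive)
    return linkable, waiting


# Hypothetical sample data: two pageviews stored before their articles arrived.
orphans = [
    {"id": "pv-1", "articleId": "a1", "views": 120},
    {"id": "pv-2", "articleId": "a2", "views": 7},
]

# Only article "a1" has been backfilled so far.
linkable, waiting = reconcile(orphans, {"a1"})
print([pv["id"] for pv in linkable])
print([pv["id"] for pv in waiting])
```

A periodic job would run this against whatever store holds the orphaned pageviews. As discussed earlier in the thread, this approach couples the two streams, because pageviews have to be reinserted once the article arrives.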

Kishore Nallan

02/25/2025, 3:50 PM

I just realized that we haven't properly documented the `async_reference` property that relaxes the ordering constraint for indexing on collections which refer to each other. Please see this: https://github.com/typesense/typesense/issues/1675#issuecomment-2337604801
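As a rough illustration of the property mentioned here, a pageviews schema using it could be written as below. This is a sketch under assumptions: the `title` and `views` fields are hypothetical, and the linked issue comment and the Typesense docs remain the authoritative description of `async_reference`. The code only builds and prints the schema. Creating the collection on a server is not shown.

```python
import json

# Referenced collection. `title` is a hypothetical field for illustration.
articles_schema = {
    "name": "articles",
    "fields": [
        {"name": "title", "type": "string"},
    ],
}

# Referencing collection. `articleId` points at articles.id, and
# `async_reference` is set so that a pageview can be indexed before
# the article it refers to has arrived.
pageviews_schema = {
    "name": "pageviews",
    "fields": [
        {
            "name": "articleId",
            "type": "string",
            "reference": "articles.id",
            "async_reference": True,
        },
        {"name": "views", "type": "int64"},  # hypothetical aggregate count
    ],
}

# Serialize to the JSON body that would be sent when creating the collection.
print(json.dumps(pageviews_schema, indent=2))
```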

Mikkel Birkegaard Andersen

02/26/2025, 1:43 PM

That’s awesome and exactly what we need!

Fanis Tharropoulos

02/26/2025, 2:24 PM

Added a section in the docs to document this: https://github.com/typesense/typesense-website/pull/298

Mikkel Birkegaard Andersen

02/26/2025, 2:46 PM

Thanks!

Mikkel Birkegaard Andersen

02/27/2025, 7:03 PM

Migrated to `async_reference` and it solves the use case perfectly

👍 2
