#community-help

Grouping and Faceting Denormalized Data

TLDR Phillip asked about grouping and faceting denormalized data. Viji provided a specific example for clarity. Jason confirmed Phillip's plan and suggested a different approach for consideration. Phillip acknowledged the advice.

Solved
Jun 08, 2022 (19 months ago)
Phillip
06:09 PM
Hi all. We currently have one denormalized collection X that we use for search. Each document holds data about X itself, plus denormalized data about the Y that the X document relates to.

We want to take the list of results from X and group them on our server into a list of Y objects which each have a list of n X documents.
The problem we are trying to solve is how to do the faceting here.
1. We want to search over collection X and group it into Y objects.
2. We want the facet counts to reflect the number of Y objects we end up with, rather than the number of X documents.
This is our plan right now.
1. Retrieve the full list of X documents and group them into Y type objects on our server.
2. Have another Typesense collection of Y type documents and do a second search restricted to the ids of those Y objects, which would return facet counts that reflect the Y objects we have on the server.
Is this the most reasonable way to do this? Is there something we are missing?
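
The two-step plan above could be sketched roughly as follows. This is a minimal illustration under assumptions, not a definitive implementation: the `y_id` field, the facet field names, and the `title` query field are hypothetical, and it assumes the Y collection can be restricted to a set of ids through `filter_by`. The resulting dictionary would be passed as the search parameters for the Y collection.

```python
from collections import defaultdict


def group_x_by_y(x_docs, y_key="y_id"):
    """Step 1: bucket X documents under the Y object they belong to.

    `y_key` is a hypothetical field holding the id of the related Y.
    """
    groups = defaultdict(list)
    for doc in x_docs:
        groups[doc[y_key]].append(doc)
    return dict(groups)


def build_y_facet_params(y_ids, facet_fields, query_by):
    """Step 2: search parameters for the second query on the Y collection.

    Restricting the search to the grouped Y ids means the facet counts
    are computed over Y documents rather than over X documents.
    """
    return {
        "q": "*",
        "query_by": query_by,
        "filter_by": "id: [" + ", ".join(sorted(y_ids)) + "]",
        "facet_by": ",".join(facet_fields),
    }


# Hypothetical X hits, as they might come back from the first search.
x_hits = [
    {"id": "x1", "y_id": "y1"},
    {"id": "x2", "y_id": "y1"},
    {"id": "x3", "y_id": "y2"},
]
grouped = group_x_by_y(x_hits)
params = build_y_facet_params(grouped.keys(), ["country", "producer"], "title")
print(params["filter_by"])  # id: [y1, y2]
```

One thing to watch with this approach is that the id list grows with the number of Y objects, so the filter string can become long for large result sets.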
Phillip
06:10 PM
Also pls lmk if I can clarify anything. This is a bit obtuse.
cc Viji Todd Willian
Jason
06:16 PM
Phillip, mind rephrasing this with specific example documents for each of X and Y?
Viji
08:06 PM
Let me try to explain this with an example. Let us say:
We have a Movies collection, which holds each Movie along with its metadata.
We have a Movie Content collection, which holds the content of each Movie broken up by dialogue (each dialogue is a separate document with its own metadata, such as the actor's gender). We thought we should denormalize this collection by also adding the Movie metadata to each Movie Content document.
We have complex searches that run against both the Movie metadata and the Movie Content, such as:
We want movies that are 30-60 minutes long, released in the last 90 days, with these producers, from these countries, where a female actor said word X or word Y but no male actor said word Z.
When we return these results, we want the counts for the facets to reflect the counts of movies that met the complex search requirements rather than the counts of dialogues that met those requirements.
Hope this helps!
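
To make the difference between the two kinds of counts concrete, here is a small sketch in plain Python. The documents and the `movie_id` and `country` fields are made up for illustration, and it assumes the faceted field is movie-level metadata that is identical across all dialogues of the same movie.

```python
from collections import Counter

# Hypothetical denormalized Movie Content hits: one document per dialogue,
# each carrying a copy of its movie's metadata.
hits = [
    {"id": "d1", "movie_id": "m1", "country": "US"},
    {"id": "d2", "movie_id": "m1", "country": "US"},
    {"id": "d3", "movie_id": "m1", "country": "US"},
    {"id": "d4", "movie_id": "m2", "country": "FR"},
]


def count_per_dialogue(hits, field):
    """Facet counts as a search over dialogue documents would see them."""
    return Counter(hit[field] for hit in hits)


def count_per_movie(hits, field, movie_key="movie_id"):
    """Facet counts where each movie contributes once, however many
    of its dialogues matched."""
    value_by_movie = {}
    for hit in hits:
        # Keep the first value seen for each movie; later dialogues
        # of the same movie are ignored.
        value_by_movie.setdefault(hit[movie_key], hit[field])
    return Counter(value_by_movie.values())


per_dialogue = count_per_dialogue(hits, "country")
per_movie = count_per_movie(hits, "country")
print(dict(per_dialogue))  # {'US': 3, 'FR': 1}
print(dict(per_movie))     # {'US': 1, 'FR': 1}
```

The second result is the kind of count described above: one US movie and one FR movie, even though three US dialogues matched.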
Jason
08:22 PM
Perfect, that helps!

The plan Phillip mentioned above should work.

Another thing I'd recommend trying, to see if it gives you what you're looking for, is using group_by, maybe grouping by the Movie ID, when searching through the Movie Content collection.
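
As a rough illustration of the `group_by` suggestion, a grouped search might look like the sketch below. The `dialogue` and `movie_id` field names are assumptions rather than details from the thread, the sample response is hand-written in the shape of a grouped result (`grouped_hits` entries with a `group_key` and `hits`), and the field used in `group_by` is expected to be declared as a facet in the collection schema. Whether the facet counts returned alongside grouped results match what is needed here would have to be checked against a real collection.

```python
def build_grouped_search(query, group_limit=3):
    """Search parameters that bucket dialogue hits by their movie."""
    return {
        "q": query,
        "query_by": "dialogue",      # hypothetical text field
        "group_by": "movie_id",      # hypothetical field, declared as a facet
        "group_limit": group_limit,  # max dialogue hits returned per movie
    }


def movies_from_grouped_response(response):
    """Map each movie id to the dialogue documents returned for it."""
    movies = {}
    for group in response.get("grouped_hits", []):
        movie_id = group["group_key"][0]  # one key per group_by field
        movies[movie_id] = [hit["document"] for hit in group["hits"]]
    return movies


# Hand-written sample in the shape of a grouped search response.
sample_response = {
    "grouped_hits": [
        {
            "group_key": ["m1"],
            "hits": [
                {"document": {"id": "d1", "movie_id": "m1"}},
                {"document": {"id": "d2", "movie_id": "m1"}},
            ],
        },
        {
            "group_key": ["m2"],
            "hits": [{"document": {"id": "d4", "movie_id": "m2"}}],
        },
    ]
}

search_params = build_grouped_search("hello")
movies = movies_from_grouped_response(sample_response)
print(sorted(movies))  # ['m1', 'm2']
```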
Phillip
Photo of md5-3d8346de287401da0aaa8b11cddb1db7
Phillip
08:25 PM
oh that is great to explore. not sure it will work because we do multiple initial paginated queries and the first page may not include all the same Movie IDs, but will def look into it
Phillip
08:25 PM
thank you!
