#community-help

Limiting Returned Array Size and Using Joins in Document Search

TLDR James asked how to limit the returned array size in a document search. Jason suggested breaking the document into multiple records. After discussing join options and providing his collections data, James decided to duplicate the parent information in each child. Harpreet confirmed this approach.

Sep 22, 2023 (2 months ago)
James
08:34 PM
Is there a way to limit the number of objects returned in an array of objects field? I have a collection with a field that is an array of objects. There could be between 1 and 20000 objects in this array, depending on the document (typically about 5-10). When I return documents for a search, I would like to limit this field to return at most 5 members, ideally sorted by search score as I'll be including subfields in the search. I can't seem to find this documented.
Jason
08:35 PM
It's not possible to do this in a nested array of objects.

If the array can get this large, you might want to consider breaking it out into multiple records (one per array member), and then using group_by to fetch one result from each group.
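To make the grouping idea concrete, here is a minimal Python sketch that mimics locally what grouping with a per-group cap does to a ranked list of hits. It does not call Typesense; the function name `group_hits` and the sample hit dictionaries are hypothetical and only illustrate the behaviour being suggested.

```python
def group_hits(hits, group_key, group_limit):
    """Group ranked hits by group_key, keeping at most group_limit per group.

    hits is assumed to be sorted best-first already. In this sketch, groups
    appear in the order of their first (best) hit.
    """
    groups = {}
    for hit in hits:
        bucket = groups.setdefault(hit[group_key], [])
        if len(bucket) < group_limit:
            bucket.append(hit)
    return groups


# Hypothetical ranked hits: one record per child, each tagged with its parent.
hits = [
    {"parent_id": "p1", "child_id": "c1"},
    {"parent_id": "p2", "child_id": "c2"},
    {"parent_id": "p1", "child_id": "c3"},
    {"parent_id": "p1", "child_id": "c4"},
]

grouped = group_hits(hits, "parent_id", group_limit=2)
for parent_id, members in grouped.items():
    print(parent_id, [m["child_id"] for m in members])
```

With a limit of 2, the third child of p1 is dropped, which is the effect James is after when capping the children returned per parent.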
James
08:52 PM
Thanks for the fast response. I think I follow what you are saying and see how that would work. Maybe I can explain the problem I'm trying to solve and see if you see another option. I was experimenting with joins today and couldn't see a way forward there.
James
09:02 PM
We want to be able to search across a parent/child relationship simultaneously, let's say fields 1-3 in the document below.
{
   "field1": "abc",
   "children": [
      { "field2": "def" },
      { "field3": "geh" }
   ]
}

I see how this can be done with nesting, but there is the issue of returning too many children, or of having to split the document as you suggest. If I move the children into a separate collection and join them to the parent, I don't see a way to search across both collections at once based on the current specs for joins. But maybe I'm wrong there. Do you see a solution with joining?
Sep 23, 2023 (2 months ago)
Kishore Nallan
01:19 AM
> I don't see a way to search across both collections at once based on the current specs for joins.

Can you please post the parent child collections that you have created and what query you are now trying to do?
James
01:32 PM
Sure, here is a simplified version of my actual collections:
{
  "name": "parent",
  "fields": [
    { "name": "parent_id", "type": "string" },
    { "name": "name", "type": "string" }
  ]
}

{
  "name": "children",
  "fields": [
    { "name": "child_id", "type": "string" },
    { "name": "name", "type": "string" },
    { "name": "parent_id", "type": "string", "reference": "parent.parent_id" }   
  ]
}

Based on the current join documentation, it doesn't seem like you can use query_by with a referenced collection at all (when I try, the request hangs), but I'd like to do something like:
{
   "q": "abc",
   "query_by": "name,$parent(name)",
   "collection": "children"
}

A simple solution in my case seems to be to add all the parent fields directly to the children and keep a single collection containing only the children. I can then query across all the fields I want, and use group_by on the parent ID together with group_limit to aggregate and cap the results per parent.
// collection
{
  "name": "children",
  "fields": [
    { "name": "child_id", "type": "string" },
    { "name": "name", "type": "string" },
    { "name": "parent_id", "type": "string" },
    { "name": "parent_name", "type": "string" }    
  ]
}

// query
{
   "q": "abc",
   "query_by": "name,parent_name",
   "collection": "children",
   "group_by": "parent_id",
   "group_limit": 5
}

The only downside I think is that I have to duplicate the parent information across many children, but the collection isn't that large so I don't think this is an issue.
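The duplication step described above could be sketched in Python as follows. This is only an illustration under assumptions: the helper name `denormalize` and the sample records are hypothetical, and the output field names simply follow the single-collection schema James posted (child_id, name, parent_id, parent_name).

```python
def denormalize(parent, children):
    """Return one flat record per child, copying the parent's fields onto it."""
    return [
        {
            "child_id": child["child_id"],
            "name": child["name"],
            "parent_id": parent["parent_id"],
            "parent_name": parent["name"],  # duplicated on every child
        }
        for child in children
    ]


# Hypothetical sample data shaped like the original parent/children collections.
parent = {"parent_id": "p1", "name": "abc"}
children = [
    {"child_id": "c1", "name": "def"},
    {"child_id": "c2", "name": "geh"},
]

records = denormalize(parent, children)
for record in records:
    print(record)
```

Each resulting record can be indexed as its own document, so a single query over name and parent_name covers both levels of the relationship.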
Kishore Nallan
01:52 PM
Thanks for the detailed breakdown. We will be looking into this next week. I'll post an update.
Sep 25, 2023 (2 months ago)
Harpreet
11:39 AM
Hi James, if you only want to query by the parent's name field along with the child's name, you can add just the parent name field to each child and do:
{
   "q": "abc",
   "query_by": "name,parent_name",
   "collection": "children",
   "group_by": "parent_id",
   "group_limit": 5
}

parent_id will be the reference field in the child collection. If you want the rest of the fields of the referenced parent to be included in the response, you can send
"include_fields": "$parent(*)"

If there are fields with common names in both collections, you can specify
"include_fields": "$parent(*) as parent"

so that every field of the parent has "parent." as a prefix.
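The parameter combinations described above can be assembled with a small Python helper. This is a sketch only: the function name `build_search_params` and its arguments are hypothetical, the parameter values are copied from the messages in this thread, and how the resulting dictionary is sent to the server is left out because it depends on your client setup.

```python
def build_search_params(q, include_parent=False, alias=None):
    """Build the grouped search request from the thread as a plain dict.

    include_parent adds include_fields for the referenced parent collection;
    alias (for example "parent") adds the "as <alias>" suffix for the case
    where both collections share field names.
    """
    params = {
        "q": q,
        "query_by": "name,parent_name",
        "collection": "children",
        "group_by": "parent_id",
        "group_limit": 5,
    }
    if include_parent:
        include = "$parent(*)"
        if alias:
            include += f" as {alias}"
        params["include_fields"] = include
    return params


print(build_search_params("abc"))
print(build_search_params("abc", include_parent=True))
print(build_search_params("abc", include_parent=True, alias="parent"))
```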
