Does the Union Search work with group_by parameter...
# community-help
m
Does the Union Search work with group_by parameter?
Also facet_by? I have a 10 million records collection and I'm getting about 600ms for a regular search. After split into several smaller collections and build a Multisearch/Union I've managed to bring down to 80ms. This is huge but I need group_by and facets to make the project as a whole work as expected.
k
Union does not work with group_by or facet_by yet. It's on our roadmap.
I have a 10 million records collection and I'm getting about 600ms for a regular search.
Can you post your search query parameters? Maybe I can suggest a way to optimize the query.
m
Hello @Kishore Nallan! My Scenario: Hardware: 3 Nodes - 32gb - 8 CPU cores Previously I had all data in 1 collection (12 million skus+prices). Each sku has many prices/sellers. This was a "fat" collection and it was taking about 6-7 seconds for similar query. Then I changed the data modeling to split skus and prices in 2 different collections. Using "joins" the timing is now around 500-700ms. 2 collections: skus (about 120k skus) and prices (about 12 million prices) This is a very simplified version of one of the most common query I have here:
Copy code
{ "searches": [{
"collection": "skus",
"query_by": "sku_name,product_name",
"q": "*",
"per_page": 10,
"include_fields": "product_id,sku_id,seller_id,sku_name,product_name",
"filter_by": "$offers(offer_active:=true && clusterId:=['null','abd-1234'] && paymentFormId:!='null') && seller_id:=['670972463814f3334d650c74','66a9981f9e6d8686ba64a3f3']"
}]}
"search_time_ms": 417
And I have this scenario which is very weird. This query should be very fast right? it's very simple and I just want to get 10 active offers.
Copy code
{ "searches": [{ "collection": "offers", "q": "*", "per_page": 10, "filter_by": "offer_active:true" }]}
"search_time_ms": 1974
k
This query should be very fast right? it's very simple and I just want to get 10 active offers.
This is a wildcard query so has to search through all the records if the filter clause matches most products.
👍 1
One tip regarding filtering is to use
foo: bar
syntax rather than
foo:= bar
syntax. The later syntax requires more cpu to compute exact match. If the IDs are fixed length, like SHAs, you can just drop the
=
👍 1
m
I've dropped the
=
from the first query. The search time is basically the same.
Without the join part (skus without price) the result is 5ms
Now I'm doing some tests and splitting the offers in many different collections (prices by sellers). Each one will have less than a million records each. Using the multisearch with UNION returns bellow 80ms, but I need the group_by functionality 😕
k
If you are able to split, adding a filter called
split_id: foo
doesn't help? Within the same collection.
m
Now I'm splitting offers by seller_id trying to make use of the UNION feature. But the original relationship key is sku_id + seller_id. So each sku and each offer has this key. I've tried to add sellers inside the $offers filter, but the result is very close:
Copy code
{ "searches": [{
"collection": "skus",
"query_by": "sku_name,product_name",
"q": "*",
"per_page": 10,
"include_fields": "product_id,sku_id,seller_id,sku_name,product_name",
"filter_by": "$offers(offer_active:=true && clusterId:=['null','abd-1234'] && paymentFormId:!='null' && seller_id:=['670972463814f3334d650c74','66a9981f9e6d8686ba64a3f3']) && seller_id:=['670972463814f3334d650c74','66a9981f9e6d8686ba64a3f3']"
}]}
"search_time_ms": 370