Anirudh Atodaria
12/16/2024, 10:22 PM
422 other operation in progress
(something similar). After this, all the nodes went down (it started working after 15 mins, with the added fields).
Was this because we tried to add a few fields? (I can imagine this being the case, as we have a large number of documents.)
In addition, we keep seeing high CPU usage (even after upgrading to 8vCPU). (While we do see a higher number of searches per second, I'm not sure it should affect the CPU this much.)
Is there a way to look at the logs to figure out what some of the requests are? (We aren't fetching the embedding field; however, some/many of our documents are quite large. Could that be a reason?)
prev convo: https://typesense-community.slack.com/archives/C01P749MET0/p1733959104496539
Jason Bosco
12/16/2024, 10:28 PM
Anirudh Atodaria
12/16/2024, 10:33 PM
Jason Bosco
12/16/2024, 10:35 PM
Anirudh Atodaria
12/16/2024, 10:35 PM
Jason Bosco
12/16/2024, 10:36 PM
Anirudh Atodaria
12/16/2024, 10:36 PM
"Searches (especially vector searches and auto-embedding) require additional RAM depending on the query. So you'd need more head-room than what's available to handle both the data and the searches"
Ah I see, we're seeing a somewhat higher search volume if I'm not wrong, and since we use vector search, it's using up more RAM. Thank you for clearing that up. 🙏
Jason Bosco
12/16/2024, 10:39 PM
Jason Bosco
12/16/2024, 10:39 PM
Anirudh Atodaria
12/16/2024, 10:42 PM
Anirudh Atodaria
12/17/2024, 6:35 PM
Anirudh Atodaria
12/17/2024, 7:06 PM
Anirudh Atodaria
12/17/2024, 10:24 PM
Jason Bosco
12/17/2024, 11:03 PM
Jason Bosco
12/17/2024, 11:03 PM
Anirudh Atodaria
12/17/2024, 11:23 PM
48vCPU
-- RN Typesense is not returning anything and the traffic is good for us.
While the search volume is higher and we're using vector search, is this expected (given the amount of traffic we have, with 16vCPU and GPU acceleration)?
(You probably know this already but we're on v28.0.rc27)
Sorry for the vague questions here, just want to make sure that there is not something unexpected.
Jason Bosco
12/17/2024, 11:29 PM
Anirudh Atodaria
12/17/2024, 11:30 PM
Jason Bosco
12/17/2024, 11:58 PM
{
  "searches": [
    {
      "filter_by": "productId: [about 200 IDs]",
      "per_page": 250,
      "q": "*",
      "group_by": "productId",
      "group_limit": 1,
      "query_by": "name",
      "query_by_weights": "1",
      "collection": "variants_v3"
    }
  ]
}
The search_time itself is about 12ms for this query.
BUT, each document is about 300KB (and embeddings are a tiny part of the docs). So fetching 250 documents results in a payload size of 75MB PER API call. Fetching this from disk and then compressing it to send over the wire is what is causing the high I/O and hence the high CPU usage.
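The payload arithmetic above can be checked with a short sketch. The 250 and 300KB figures come from the thread; the 5KB "trimmed" document size and the helper name `payload_mb` are purely hypothetical, used only to show how the total scales with document size.

```python
def payload_mb(per_page, avg_doc_kb):
    """Approximate uncompressed response size in MB (using 1 MB = 1000 KB)."""
    return per_page * avg_doc_kb / 1000


# Figures from the thread: 250 documents of roughly 300KB each.
full_mb = payload_mb(250, 300)

# Hypothetical trimmed document size of 5KB, for comparison only.
trimmed_mb = payload_mb(250, 5)

print(full_mb)     # 75.0
print(trimmed_mb)  # 1.25
```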
The way to solve this (besides adding a high number of CPU cores) is to reduce the payload that is sent over the wire to just the fields required to display the results, using the exclude_fields or include_fields parameter.
For example, you definitely want to exclude the embedding field. I also see a large array field called offers, which seems to exist at the top level and is also repeated inside the variants field (at least from what I can tell). There's also a priceHistory field, which seems to be super large. These seem to make up the bulk of the document.
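As a rough sketch of the suggestion above, the same multi-search body could carry an exclude_fields value listing the heavy fields. The field name `embedding` is an assumption (the thread does not give the actual name of the embedding field), `without_heavy_fields` is a hypothetical helper, and the ID list stays a placeholder as in the original request. Only the request body is built here; nothing is sent to a server.

```python
import json

# The request from the thread. The ID list is elided in the source,
# so the placeholder string is kept as-is.
search = {
    "collection": "variants_v3",
    "q": "*",
    "query_by": "name",
    "query_by_weights": "1",
    "filter_by": "productId: [about 200 IDs]",
    "group_by": "productId",
    "group_limit": 1,
    "per_page": 250,
}


def without_heavy_fields(search, heavy_fields):
    """Return a copy of the search with exclude_fields set as a comma-separated string."""
    trimmed = dict(search)
    trimmed["exclude_fields"] = ",".join(heavy_fields)
    return trimmed


# "embedding" is an assumed field name; offers and priceHistory are named in the thread.
body = {
    "searches": [
        without_heavy_fields(search, ["embedding", "offers", "priceHistory"])
    ]
}

print(json.dumps(body, indent=2))
```

This only drops the top-level fields listed. If offers is also repeated inside variants, that nested copy would still be returned, so an include_fields list naming just the display fields may end up being the simpler option.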
Do you need these for display purposes on the search results page?
Anirudh Atodaria
12/18/2024, 12:18 AM