# community-help
d
Hi everyone! I found some time to upgrade our instance from v28 to v29, but ran into a high-latency / performance-degradation issue for some of the searches. How can I debug / troubleshoot it? Are there any known performance issues / changes in v29?
p
Hi Dima. One primary area of change was around group_by. Can you see if there is a noticeable shift in group_by vs non-group_by queries?
d
We’re using group_by almost everywhere, so I don’t have a clear plan for how to check that 😢
However, I pinpointed one of the degraded search queries, and it is `a/b test` or `a b test` 🤔
p
What you can do is compare a query with/without group_by on v28 and v29 to see if it's only group_by that's affected.
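For example, a minimal comparison with the Python client might look like this (the collection name, field names, and query below are hypothetical placeholders):

```python
import typesense

# Hypothetical connection details.
client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': 'YOUR_API_KEY',
    'connection_timeout_seconds': 5,
})

base_params = {'q': 'a/b test', 'query_by': 'content'}

# Run the same query with and without group_by; repeat on v28 and v29
# and compare the reported search_time_ms.
for params in (base_params, {**base_params, 'group_by': 'doc_group'}):
    res = client.collections['docs'].documents.search(params)
    label = params.get('group_by', 'no group_by')
    print(label, res['search_time_ms'], 'ms,', res['found'], 'found')
```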
And how big is the degradation?
d
200ms -> 1800ms
Removing group_by makes it even on v28 and v29 (~200ms)
k
We will probably need some form of dataset, with maybe only non-identifiable fields, to investigate the regression.
d
Do you have any ideas on how to make a reproducible example on a smaller, anonymized dataset? We index internal documentation, so we cannot disclose the content.
My plan was:
• Make a search that has degraded performance without group_by
• Fetch all found documents
• Strip everything from each document except the grouping field, and add one indexed field with some predefined keyword like `reproducible example`
Will it be enough to reproduce the issue?
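A runnable sketch of that plan, reusing the hypothetical client and schema names from the earlier example (`docs` collection, `content` query field, `doc_group` grouping field):

```python
import json
import typesense

client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': 'YOUR_API_KEY',
    'connection_timeout_seconds': 5,
})

stripped = []
page = 1
while True:
    # The slow query, without group_by, paged through all hits.
    res = client.collections['docs'].documents.search({
        'q': 'a/b test',
        'query_by': 'content',
        'page': page,
        'per_page': 250,  # Typesense's per-page maximum
    })
    if not res['hits']:
        break
    for hit in res['hits']:
        doc = hit['document']
        stripped.append({
            'id': doc['id'],
            'doc_group': doc['doc_group'],      # keep only the grouping field
            'content': 'reproducible example',  # the predefined keyword
        })
    page += 1

# JSONL, ready to import into a fresh collection.
with open('anonymized.jsonl', 'w') as f:
    for doc in stripped:
        f.write(json.dumps(doc) + '\n')
```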
k
How many docs does the search query match? Without group by?
Does the slow query have a filter clause?
If you can just keep the query words and replace every other word in the documents with a random word, that will work.
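A quick sketch of that random replacement (the query words here are assumed from the thread):

```python
import random
import string

QUERY_WORDS = frozenset({'a/b', 'test'})  # assumed query words

def randomize(text: str) -> str:
    # Keep the query words so the slow query still matches;
    # swap every other word for a random same-length token.
    def rand_word(w: str) -> str:
        return ''.join(random.choices(string.ascii_lowercase, k=len(w)))
    return ' '.join(w if w.lower() in QUERY_WORDS else rand_word(w)
                    for w in text.split())
```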
d
What is the difference between `found` and `found_docs`?
Ah, got it, grouped vs non-grouped. Convenient
> How many docs does the search query match? Without group by?
W/o group_by: 1256; w/ group_by: 298
> Does the slow query have filter clause?
No, it is reproduced without it as well
Btw, group distribution:
k
And total docs in collection?
d
`out_of: 542180`
k
Ok, hopefully you can extract an anonymized set out of it.
d
On the smaller dataset (consisting of only the documents that were hit by the search) I see a difference of 2ms w/o group_by -> 10ms w/ group_by, which I suppose is not enough
k
Yes it might be a little too small for us to deterministically profile in a debug build.
d
I was able to achieve 5 -> 70ms on the anonymized dataset, but nothing close to the original number. The moment I replaced the non-relevant (out-of-hits) documents’ content with lorem ipsum, the search performance improved from 700ms -> 70ms 🤓
k
5 to 70 is a lot better. We can take a look. As for the non-relevant replacement, one important thing is to retain the word distribution. The ideal way to do this is to hash every word and then take the last N characters, maybe 6 chars. This anonymizes the data but keeps the original data distribution. E.g. "The lazy fox" could become "xtw2ga hegzgy vcshjw", and each time the same word occurs, the same hashed word suffix will appear, thereby maintaining word statistics.
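A minimal sketch of that scheme (SHA-1 is an arbitrary choice here, and keeping the query words intact follows the earlier suggestion):

```python
import hashlib

KEEP = frozenset({'a/b', 'test'})  # query words, assumed from the thread

def hash_word(word: str, n: int = 6) -> str:
    # Deterministic: the same word always maps to the same n-character
    # suffix, so word frequencies (and hence the distribution) survive.
    return hashlib.sha1(word.lower().encode('utf-8')).hexdigest()[-n:]

def anonymize(text: str) -> str:
    return ' '.join(w if w.lower() in KEEP else hash_word(w)
                    for w in text.split())
```

Note that a hex digest yields tokens over [0-9a-f] rather than the full alphabet of the example above, but the property that matters, identical words mapping to identical tokens, is preserved.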