# community-help
d
Hi everyone! I found some time to upgrade our instance from v28 to v29, but ran into a high-latency / performance-degradation issue for some of the searches. How can I debug / troubleshoot it? Are there any known performance issues / changes in v29?
p
Hi Dima. One primary area of change was around group_by. Can you see if there is a noticeable shift in group_by vs non-group_by queries?
d
We’re using group_by almost everywhere, so I don’t have a clear plan for how to check that 😢
However, I pinpointed one of the degraded search queries, and it is `a/b test` or `a b test` 🤔
p
What you can do is compare a query with/without group_by on v28 and v29 to see if it's only group_by that's affected.
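For example, a minimal comparison with the Python client might look like this (the collection name, field names, and query below are hypothetical placeholders):

```python
import typesense

# Hypothetical connection details.
client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': 'YOUR_API_KEY',
    'connection_timeout_seconds': 5,
})

base_params = {'q': 'a/b test', 'query_by': 'content'}

# Run the same query with and without group_by; repeat on v28 and v29
# and compare the reported search_time_ms.
for params in (base_params, {**base_params, 'group_by': 'doc_group'}):
    res = client.collections['docs'].documents.search(params)
    label = params.get('group_by', 'no group_by')
    print(label, res['search_time_ms'], 'ms,', res['found'], 'found')
```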
And how big is the degradation?
d
200ms -> 1800ms
Removing group_by makes it even on v28 and v29 (~200ms)
k
We will probably need some form of dataset, with maybe only non-identifiable fields, to investigate the regression.
d
Do you have any ideas on how to make a reproducible example on a smaller, anonymized dataset? We index internal documentation, so we cannot disclose the content.
My plan was:
• Make a search that has degraded performance without group_by
• Fetch all found documents
• Strip everything from each document except the grouping field, and add one indexed field with some predefined keyword like `reproducible example`
Will it be enough to reproduce the issue?
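A runnable sketch of that plan, reusing the hypothetical client and schema names from the earlier example (`docs` collection, `content` query field, `doc_group` grouping field):

```python
import json
import typesense

client = typesense.Client({
    'nodes': [{'host': 'localhost', 'port': '8108', 'protocol': 'http'}],
    'api_key': 'YOUR_API_KEY',
    'connection_timeout_seconds': 5,
})

stripped = []
page = 1
while True:
    # The slow query, without group_by, paged through all hits.
    res = client.collections['docs'].documents.search({
        'q': 'a/b test',
        'query_by': 'content',
        'page': page,
        'per_page': 250,  # Typesense's per-page maximum
    })
    if not res['hits']:
        break
    for hit in res['hits']:
        doc = hit['document']
        stripped.append({
            'id': doc['id'],
            'doc_group': doc['doc_group'],      # keep only the grouping field
            'content': 'reproducible example',  # the predefined keyword
        })
    page += 1

# JSONL, ready to import into a fresh collection.
with open('anonymized.jsonl', 'w') as f:
    for doc in stripped:
        f.write(json.dumps(doc) + '\n')
```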
k
How many docs does the search query match? Without group by?
Does the slow query have a filter clause?
If you can just keep the query words and replace every other word in the documents with a random word, that will work.
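A quick sketch of that random replacement (the query words here are assumed from the thread):

```python
import random
import string

QUERY_WORDS = frozenset({'a/b', 'test'})  # assumed query words

def randomize(text: str) -> str:
    # Keep the query words so the slow query still matches;
    # swap every other word for a random same-length token.
    def rand_word(w: str) -> str:
        return ''.join(random.choices(string.ascii_lowercase, k=len(w)))
    return ' '.join(w if w.lower() in QUERY_WORDS else rand_word(w)
                    for w in text.split())
```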
d
What is the difference between `found` and `found_docs`?
Ah, got it, grouped vs non-grouped. Convenient
> How many docs does the search query match? Without group by?
W/o group_by: 1256; w/ group_by: 298
> Does the slow query have filter clause?
No, it is reproduced without it as well
Btw, group distribution:
k
And total docs in collection?
d
`out_of: 542180`
k
Ok, hopefully you can extract an anonymized set out of it.
d
On the smaller dataset (consisting of only the documents that were hit by the search) I see a difference of 2ms w/o group_by -> 10ms w/ group_by, which I suppose is not enough
k
Yes it might be a little too small for us to deterministically profile in a debug build.
d
I was able to achieve 5 -> 70ms on the anonymized dataset, but nothing close to the original number. The moment I replaced the non-relevant (out-of-hits) documents’ content with lorem ipsum, the search performance improved from 700ms -> 70ms 🤓
k
5 to 70 is a lot better. We can take a look. As for the non-relevant replacement, one important thing is to retain the word distribution. The ideal way to do this is to hash every word and then take the last N characters, maybe 6 chars. This anonymizes the data but keeps the original data distribution. E.g. "The lazy fox" could become "xtw2ga hegzgy vcshjw", and each time the same word occurs, the same hashed word suffix will appear, thereby maintaining word statistics.
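A minimal sketch of that scheme (SHA-1 is an arbitrary choice here, and keeping the query words intact follows the earlier suggestion):

```python
import hashlib

KEEP = frozenset({'a/b', 'test'})  # query words, assumed from the thread

def hash_word(word: str, n: int = 6) -> str:
    # Deterministic: the same word always maps to the same n-character
    # suffix, so word frequencies (and hence the distribution) survive.
    return hashlib.sha1(word.lower().encode('utf-8')).hexdigest()[-n:]

def anonymize(text: str) -> str:
    return ' '.join(w if w.lower() in KEEP else hash_word(w)
                    for w in text.split())
```

Note that a hex digest yields tokens over [0-9a-f] rather than the full alphabet of the example above, but the property that matters, identical words mapping to identical tokens, is preserved.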