#community-help

Typesense Query Suggestions Throttling Mechanism Discussion

TLDR Arad asked whether Typesense's query suggestions feature could be abused by sending the same query repeatedly. Jason explained that searches are grouped per X-TYPESENSE-USER-ID using a fixed 4-second window, which is independent of analytics-flush-interval. Jason asked Arad to create two GitHub issues: one for limiting how often a query's count is incremented, and one for excluding specific searches from analytics.


Oct 09, 2023 (1 month ago)
Arad
04:33 PM
Does Typesense's query suggestions feature have some sort of throttling mechanism in place for incrementing the count of a query?
Imagine a user tried to abuse the system by sending requests for the same query 100 times in quick succession. Would that cause Typesense to increment the count by 100?
Jason
04:36 PM
Aggregation happens based on analytics-flush-interval and the aggregation key X-TYPESENSE-USER-ID.

We consider search terms unique after a 4-second delay following the last keypress.
Jason
04:38 PM
Oh wait, let me refine that


Arad
04:40 PM
Jason Ah, that's interesting, so all the queries with the same X-TYPESENSE-USER-ID will "collapse" into one, so to speak? And is that collapsing limited to the duration of analytics-flush-interval? Meaning that if analytics-flush-interval is, say, 5 seconds, and the same user (with the same X-TYPESENSE-USER-ID) sends the same query twice, once now and once 10 seconds from now, that will increase the count by 2?
Jason
04:44 PM
X-TYPESENSE-USER-ID is used to group keypresses in a search-as-you-type experience.

So, for example, if the user types in the word "term" one letter at a time, the search queries will show up to Typesense as: t, te, ter, term...

Typesense will wait for 4s after the last query term for a given X-TYPESENSE-USER-ID and collapse the previous searches for t, te and ter into term.
Jason
04:45 PM
analytics-flush-interval is actually independent of the 4s interval. The flush interval is when the collected analytics logs are analyzed and the aggregation I mentioned above is performed across all users and all search terms.
Jason
04:46 PM
> If the same user (with the same X-TYPESENSE-USER-ID) sends the same query twice, once now, and once 10 seconds from now, that will increase the count by 2?
That's correct.

But, this is actually not based on analytics-flush-interval like I mentioned earlier (my bad - I edited that out), but based on a fixed 4s window.
Jason
04:46 PM
So two search queries sent 4s apart will count as 2 searches
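
The grouping Jason describes can be illustrated with a small simulation. This is only a simplified model of the behavior as explained in this thread, not Typesense's actual implementation: the function name `collapse_queries`, the event tuple layout, and the choice to treat a gap of exactly 4 seconds as a new search are all assumptions made for the sketch.

```python
from collections import Counter

WINDOW_SECONDS = 4.0  # the fixed 4s window mentioned in the thread


def collapse_queries(events, window=WINDOW_SECONDS):
    """Count finalized queries from (timestamp_seconds, user_id, query) events.

    Hypothetical model: for each user ID, a query is treated as final once
    no further query from that user arrives within `window` seconds.
    """
    counts = Counter()
    last_seen = {}  # user_id -> (timestamp, query) of that user's latest query

    for ts, user_id, query in sorted(events):
        previous = last_seen.get(user_id)
        if previous is not None and ts - previous[0] >= window:
            # The user was quiet for a full window, so the earlier query is final.
            counts[previous[1]] += 1
        last_seen[user_id] = (ts, query)

    # Whatever each user typed last is final once the stream ends.
    for _, query in last_seen.values():
        counts[query] += 1
    return counts


events = [
    (0.0, "user-1", "t"),
    (0.3, "user-1", "te"),
    (0.6, "user-1", "ter"),
    (0.9, "user-1", "term"),   # keystrokes collapse into "term"
    (10.9, "user-1", "term"),  # same query again, well over 4s later
]
print(collapse_queries(events))
```

In this model the four keystroke queries collapse into a single "term", and the repeat sent ten seconds later is counted separately, matching the "count by 2" case confirmed above.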
Arad
04:57 PM
Jason Oh okay, got it now. Thank you.
So there isn't a throttling mechanism specifically for preventing abuse of this kind. The type of thing I was thinking about was more along the lines of having a limit for the amount of times the count of a query is incremented within N seconds/minutes/hours. So that, for example, even if there are 1,000 requests for the same query within the span of 3 hours, all of that increments the count by just 1.

So I'll probably have to use a custom collection for this that my app populates on its own, according to whatever custom heuristics it may have (given that all the requests to Typesense actually go through our backend, it shouldn't be too difficult to implement this).

One last question: Is there a way to tell Typesense (e.g. via a query string parameter when sending a request to the search endpoint) that it should ignore that particular search in terms of analytics and not store its query in the queries collection?
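
A backend-side limit like the one Arad describes could be sketched roughly as follows. Everything here is hypothetical and kept in memory for illustration: the class name `QueryCountThrottle`, the lowercase/strip normalization, and the injectable clock are assumptions, and writing the resulting counts into a custom Typesense collection is deliberately left out.

```python
import time


class QueryCountThrottle:
    """Allow at most one count increment per query within a time window.

    Hypothetical helper for an app backend; not part of Typesense.
    """

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock            # injectable so the window can be tested
        self.counts = {}              # normalized query -> popularity count
        self._last_increment = {}     # normalized query -> time of last increment

    def record(self, query):
        """Record a search; return True if the count was incremented."""
        now = self.clock()
        key = query.strip().lower()   # assumption: case/whitespace-insensitive
        last = self._last_increment.get(key)
        if last is not None and now - last < self.window_seconds:
            return False              # still inside the window, ignore
        self._last_increment[key] = now
        self.counts[key] = self.counts.get(key, 0) + 1
        return True


# Simulate 1,000 requests for the same query, one per second, with a 3-hour window.
fake_now = [0.0]
throttle = QueryCountThrottle(window_seconds=3 * 60 * 60, clock=lambda: fake_now[0])
for _ in range(1000):
    throttle.record("running shoes")
    fake_now[0] += 1.0
print(throttle.counts)
```

With a 3-hour window, the 1,000 simulated requests increment the count only once. A real deployment would likely need shared storage instead of an in-process dictionary if the backend runs more than one instance.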
Jason
07:42 PM
> having a limit for the amount of times the count of a query is incremented within N seconds/minutes/hours.
Typesense doesn't have this... But I think that will be a useful feature to support.

> Is there a way to tell Typesense (e.g. via a query string parameter when sending a request to the search endpoint) that it should ignore that particular search in terms of analytics
This is not possible at the moment, but I was thinking about this myself recently.

Could you create two GitHub issues, so we can track these?
