Typesense Query Suggestions Throttling Mechanism Discussion
TLDR Arad asked about potential abuse of Typesense's query suggestions feature. Jason explained how query uniqueness is determined based on `X-TYPESENSE-USER-ID` and `analytics-flush-interval`. Jason suggested creating two GitHub issues: one about implementing a request limit and one about ignoring certain searches in analytics.
Oct 09, 2023 (1 month ago)
Arad
04:33 PM
[...] `count` of a query? Imagine if a user tried to abuse the system by sending requests for the same query 100 times in quick succession, would that cause Typesense to increment the `count` by 100?
Jason
04:36 PM
[...] `analytics-flush-interval` and the aggregation key `X-TYPESENSE-USER-ID`. We consider unique search terms after a 4 second delay after the last keypress.
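For reference, a minimal sketch of attaching the aggregation key to a search request from a backend. It only builds the URL and headers and sends nothing. The collection name, field name, API key, and user ID below are hypothetical placeholders, and `build_search_request` is an illustrative helper, not part of any Typesense client.

```python
from urllib.parse import urlencode


def build_search_request(base_url, collection, api_key, user_id, query, query_by):
    """Return (url, headers) for a Typesense search request.

    The X-TYPESENSE-USER-ID header is the aggregation key discussed in
    this thread; every other value here is a placeholder.
    """
    params = urlencode({"q": query, "query_by": query_by})
    url = f"{base_url}/collections/{collection}/documents/search?{params}"
    headers = {
        "X-TYPESENSE-API-KEY": api_key,
        # Stable per-visitor identifier, so keypresses from one user group together.
        "X-TYPESENSE-USER-ID": user_id,
    }
    return url, headers


url, headers = build_search_request(
    "http://localhost:8108", "products", "xyz", "user-42", "running shoes", "name"
)
```

The returned pair can be passed to whichever HTTP client the backend already uses.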
Arad
04:40 PM
[...] `X-TYPESENSE-USER-ID` will "collapse" into one, so to speak? And is that collapsing limited to the duration of `analytics-flush-interval`? Meaning that if `analytics-flush-interval` is, like, 5 seconds, and the same user (with the same `X-TYPESENSE-USER-ID`) sends the same query twice, once now and once 10 seconds from now, that will increase the `count` by 2?
Jason
04:44 PM
`X-TYPESENSE-USER-ID` is used to group keypresses in a search-as-you-type experience.
So for e.g., if the user types in `term` one letter at a time, the search queries will show up to Typesense as: `t`, `te`, `ter`, `term`...
Typesense will wait for 4s after the last query `term` for a given `X-TYPESENSE-USER-ID` and collapse the previous searches for `t`, `te` and `ter` into `term`...
Jason
04:45 PM
`analytics-flush-interval` is actually independent from the 4s interval. The flush interval is when the collected analytics logs are analyzed and the aggregation I mentioned above is performed across all users, across all search terms.
Jason
04:46 PM
That's correct.
But this is actually not based on `analytics-flush-interval` like I mentioned earlier (my bad - I edited that out), but based on a fixed 4s window.
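A rough sketch of the collapsing behavior described above, as a simplified model only and not Typesense's actual implementation. It assumes a query counts as "final" when the same user ID sends nothing else within the 4s window; the function name and the event tuple layout are made up for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 4.0  # the fixed window mentioned in the thread


def collapse_queries(events, window=WINDOW_SECONDS):
    """Count only the final query of each typing burst per user.

    events: iterable of (timestamp_seconds, user_id, query) tuples.
    Returns a dict mapping query -> count.
    """
    by_user = defaultdict(list)
    for ts, user_id, query in events:
        by_user[user_id].append((ts, query))

    counts = defaultdict(int)
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e[0])
        for i, (ts, query) in enumerate(user_events):
            is_last = i == len(user_events) - 1
            # Keep a query only if this user's next query arrives
            # at least `window` seconds later (or never).
            if is_last or user_events[i + 1][0] - ts >= window:
                counts[query] += 1
    return dict(counts)


events = [
    (0.0, "u1", "t"), (0.3, "u1", "te"), (0.6, "u1", "ter"), (0.9, "u1", "term"),
    (10.0, "u1", "term"),  # same user, same query, 10 seconds later
    (1.0, "u2", "term"),   # a different user
]
result = collapse_queries(events)
```

Under this model the keypresses `t`, `te`, `ter` are dropped, while the repeat 10 seconds later still counts, which matches the "That's correct" above: repeated requests spaced more than 4s apart each increment the count.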
Arad
04:57 PM
So there isn't a throttling mechanism specifically for preventing abuse of this kind. The type of thing I was thinking about was more along the lines of having a limit on the number of times the `count` of a query is incremented within N seconds/minutes/hours. So that, for example, even if there are 1,000 requests for the same query within the span of 3 hours, all of that just increments `count` by 1.
So I'll probably have to use a custom collection for this that my app populates on its own, according to whatever custom heuristics it may have (given that all the requests to Typesense actually go through our backend, it shouldn't be too difficult to implement this).
One last question: Is there a way to tell Typesense (e.g. via a query string parameter when sending a request to the search endpoint) that it should ignore that particular search in terms of analytics and not store its query in the queries collection?
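One possible shape for the backend-side heuristic Arad describes, sketched under assumptions: the class name, the in-memory dicts, and the lowercase/strip normalization are all illustrative choices, and writing the resulting counts into a custom Typesense collection is left out.

```python
import time


class ThrottledQueryCounter:
    """Increment a query's count at most once per time window."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.counts = {}
        self._last_increment = {}

    def record(self, query):
        """Record one search; return True if the count was incremented."""
        key = query.strip().lower()  # assumed normalization, adjust as needed
        now = self.clock()
        last = self._last_increment.get(key)
        if last is None or now - last >= self.window_seconds:
            self.counts[key] = self.counts.get(key, 0) + 1
            self._last_increment[key] = now
            return True
        return False


# 1,000 requests for the same query spread over less than 3 hours
fake_now = [0.0]
counter = ThrottledQueryCounter(3 * 3600, clock=lambda: fake_now[0])
for i in range(1000):
    fake_now[0] = i * 10.0
    counter.record("shoes")
```

With a 3-hour window, the 1,000 requests above leave the count for `shoes` at 1; the next request at or after the 3-hour mark increments it again. A multi-process backend would need shared storage instead of the in-memory dicts.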
Jason
07:42 PM
Typesense doesn't have this... But I think that will be a useful feature to support.
> Is there a way to tell Typesense (e.g. via a query string parameter when sending a request to the search endpoint) that it should ignore that particular search in terms of analytics
This is not possible at the moment, but I was thinking about this myself recently.
Could you create two GitHub issues, so we can track these?