#community-help

Typesense Performance with Large Datasets & Custom Sort

TLDR krok inquires about Typesense's performance on large datasets and custom sorting. Kishore Nallan explains that Typesense is optimized for this scenario using pagination and text relevance.

Solved
Feb 05, 2023 (8 months ago)
krok
03:19 PM
Hello, can I ask you a few technical questions? I would like to understand how Typesense would work if my collection has more than 1 million documents and is set up to sort_by price. I heard about a hard limit of 1000 candidates on which Typesense does a sort to keep good performance. How does it work?
Kishore Nallan
04:18 PM
We've customers with much larger datasets doing the same. Typesense is optimized to do this fast because we only fetch top records based on pagination.
krok
04:49 PM
I am not really asking about speed but more about the technical part. What do you mean by « fetching the top records based on pagination »? Do you mean that Typesense will only sort the first page, for example?
Feb 06, 2023 (8 months ago)
Kishore Nallan
04:26 AM
Correct, first few hundred results.
krok
07:56 AM
Thank you very much. Does it work the same with a query = * too? Does Typesense always sort only the first 250 random documents?
Feb 07, 2023 (8 months ago)
krok
08:44 AM
Hey Kishore Nallan 👋 Do you have any idea about my previous message?
Kishore Nallan
08:46 AM
Yes, it is treated the same way. We limit to 250 results by default unless deep pagination is requested. This is what all search engines do, e.g. you can't see "all" results on Google even though a search query might produce 100K results; Google will show only the first 15-20 pages max.
krok
08:48 AM
Thank you very much. However, it means that when you do a query = *, the candidates are the whole set and therefore, if only the first 250 candidates are sorted, the lowest ones are not shown first, right?
Kishore Nallan
08:50 AM
I don't get your question.
krok
08:52 AM
On a query = *, all the documents must eventually be returned, right? They are stored internally in no specific order, right? But you told me that only the first 250 documents are sorted when returned. It means that if the lowest document (the one that must be returned first) is very deep, it will not be returned first, as it is not in the first 250 candidates, right?
Kishore Nallan
08:55 AM
No, we support sort on either asc/desc, doesn't matter.
krok
09:14 AM
I am not sure you understand my point. Are all my previous points correct? Please consider that sort_by = price asc in the settings.
Kishore Nallan
09:40 AM
I'm sorry, I still don't fully understand your question. Typesense does return correct results when sorted either way (asc or desc) for q=*.
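The concern raised above is that sorting "only the first 250 results" could miss the lowest-priced document if it sits deep in the collection. One common way a search engine can return a correctly sorted page without fully sorting the collection is top-k selection: every candidate is examined, but only the best k are kept. The following is a minimal Python sketch of that general technique, not Typesense's actual implementation; the `docs` list, the `price` field, and the `sorted_page` helper are made up for illustration.

```python
import heapq
import random


def sorted_page(docs, key, page=1, per_page=10, descending=False):
    """Return one page of the globally sorted order without sorting every document."""
    k = page * per_page  # only the first k ranked documents are needed
    pick = heapq.nlargest if descending else heapq.nsmallest
    top_k = pick(k, docs, key=key)  # scans ALL docs, keeps only the best k
    return top_k[(page - 1) * per_page:]


# Hypothetical collection: 100,000 documents stored in no particular order.
random.seed(0)
docs = [{"id": i, "price": random.randint(1, 1_000_000)} for i in range(100_000)]

first_page = sorted_page(docs, key=lambda d: d["price"], page=1, per_page=10)
print([d["price"] for d in first_page])
```

Under this approach the cheapest document appears first no matter where it is stored, because the scan covers the whole candidate set; only the amount of sorting work is bounded by the requested page depth.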
Feb 08, 2023 (8 months ago)
krok
07:49 AM
I just tried the songs demo and I am not sure I understand why results are not sorted by date. I am just searching for « oasis » here, with no filter/facet selected.
[Image 1: screenshot attached to the message above]
Kishore Nallan
08:03 AM
Because the query sorts by text relevance, not date.
Kishore Nallan
08:04 AM
And we also "bucket" text match:

"sort_by": "_text_match(buckets: 10):desc"
krok
08:05 AM
I understand that, but why is the second row not sorted? The matches look equally important: the title matches and the text is the same, « from oasis ». Am I missing something?
krok
08:05 AM
What does it mean to « bucket text match »?
Kishore Nallan
08:10 AM
You can check the match score info in the results to verify the exact scores. The search could be made on other fields which are not shown in the UI.
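To illustrate what "bucketing" the text match can mean: results are first ranked by text match score, the ranked list is split into a fixed number of roughly equal groups, and every hit in a group is treated as tied so that a secondary sort field decides the order inside the group. The Python sketch below shows that idea under stated assumptions; it is not Typesense's internal code, and the `bucketed_sort` helper, the `text_match` and `year` fields, and the sample hits are all hypothetical.

```python
import math


def bucketed_sort(hits, buckets, tie_breaker):
    """Rank by text match, split ranks into equal-sized buckets, then order
    by (bucket, tie_breaker descending) so near-equal matches count as ties."""
    ranked = sorted(hits, key=lambda h: h["text_match"], reverse=True)
    size = max(1, math.ceil(len(ranked) / buckets))  # hits per bucket
    with_bucket = [(rank // size, hit) for rank, hit in enumerate(ranked)]
    with_bucket.sort(key=lambda pair: (pair[0], -pair[1][tie_breaker]))
    return [hit for _, hit in with_bucket]


# Hypothetical hits: A and B match almost equally well, as do C and D.
hits = [
    {"title": "A", "text_match": 100, "year": 1995},
    {"title": "B", "text_match": 99, "year": 2008},
    {"title": "C", "text_match": 50, "year": 2002},
    {"title": "D", "text_match": 49, "year": 1994},
]

print([h["title"] for h in bucketed_sort(hits, buckets=2, tie_breaker="year")])
```

With two buckets, A and B are treated as a tie and the more recent B moves ahead of A, while C and D stay below them because they fall in the second bucket.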
krok
08:22 AM
Oh ok, I have read and understood the doc page you linked now, thank you! However, is there any way on the songs demo to filter by one artist (on the left) without inputting any query text?
Kishore Nallan
08:26 AM
Not on the demo
krok
08:29 AM
Is there a reason why? It seems that there is a front-end script (in JS) preventing me from inputting q = * and hiding the left bar (filters and facets).
Feb 09, 2023 (8 months ago)
krok
09:18 AM
Hey 👋, I found out how to search for q = * on the songs demo and it is very slow (7 seconds, or 4 seconds when filtering). I understand why there is a front-end script preventing people from using it that way. Can you tell me the machine spec that is used for this demo?
Kishore Nallan
12:47 PM
We have a number of demos, and this one shows a use case which comes up often in our calls: using instantsearch to build a search experience that shows results only when a query is typed.

Machine spec I think is 2 GB / 2 cpu.
Feb 13, 2023 (8 months ago)
krok
08:28 AM
Indeed, it is very cool! But why is it so slow when it comes to showing just 20 results on the first page without any query? What prevents an attacker from DDoSing the server by sending q = * requests to it?
Kishore Nallan
08:34 AM
There are ways to deal with that. Btw, what's your use case? Let's talk more about what you are building or planning to build.
krok
05:52 PM
I would like to replace the search engine of a music website; that's exactly the use case you show. However, I would like to let the user search for an artist, a genre or a date range on the left side and see the music listed from the most recent to the least recent.
krok
05:52 PM
How would you deal with that? Is it a resource issue here? Maybe more RAM could help?
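For reference, the browsing use case described here (no query text, filter by artist or date range, newest first) maps onto Typesense's standard search parameters `q`, `filter_by`, `sort_by`, `page` and `per_page`. The sketch below only builds the query string and sends nothing; the `songs` collection, the `artist_name` and `release_date` field names, and the `build_browse_query` helper are assumptions made for illustration, and `release_date` is assumed to be a numeric, sortable field.

```python
from urllib.parse import urlencode


def build_browse_query(artist=None, year_range=None, page=1, per_page=20):
    """Build search parameters for a wildcard query sorted newest first.

    Field names are hypothetical; adapt them to the real collection schema.
    """
    filters = []
    if artist is not None:
        filters.append(f"artist_name:={artist}")  # exact match on the artist
    if year_range is not None:
        start, end = year_range
        filters.append(f"release_date:[{start}..{end}]")  # inclusive range
    params = {
        "q": "*",  # match every document, then filter and sort
        "sort_by": "release_date:desc",
        "page": page,
        "per_page": per_page,
    }
    if filters:
        params["filter_by"] = " && ".join(filters)
    return params


params = build_browse_query(artist="Oasis", year_range=(1994, 2000))
print("/collections/songs/documents/search?" + urlencode(params))
```

Whether such a request is fast depends on the cluster and the collection size, as the rest of the thread discusses; the sketch only shows the shape of the request.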
Feb 14, 2023 (8 months ago)
Kishore Nallan
12:15 PM
Can you give me the link to your website? Happy to go over that to understand the UX and offer help.
krok
12:29 PM
My website is not public and most of the searches will be done on mobile, although a large part of the community is already on desktop. Can you help me understand why it takes a lot of time, please? Is it due to the fact that the sorting is done on the candidates from q = *, i.e. the whole dataset?
Feb 15, 2023 (8 months ago)
Kishore Nallan
11:16 AM
That demo uses a pretty underpowered cluster, so that might be it.
krok
11:18 AM
Which version of Typesense are you using with this demo, please? I will try to host it on my side.
Feb 17, 2023 (8 months ago)
krok
08:26 AM
Hey Kishore Nallan 👋 Can you answer my previous message, please?
krok
08:26 AM
I’ll also try on the v0.24 that has just been released ☺️