# community-help
k
Hello, can I ask you a few technical questions? I would like to understand how Typesense would work if my collection has more than 1 million documents and is set up to `sort_by` price. I heard about a hard limit of 1,000 candidates on which Typesense performs the sort in order to keep performance good. How does it work?
k
We have customers with much larger datasets doing the same. Typesense is optimized to do this fast because we only fetch the top records based on pagination.
k
I'm not really asking about speed, more about the technical details. What do you mean by "fetching the top records based on pagination"? Do you mean that Typesense will only sort the first page, for example?
k
Correct, first few hundred results.
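Kishore's answer can be read as a top-k selection: every matching candidate is considered, but only as many results as the requested pages need are kept in sorted order. That reading is consistent with his later statement that `q=*` returns correct results in either sort direction. The Python sketch below illustrates this generic technique with a bounded heap; it is not Typesense's actual code, and `page_of_results` is a hypothetical helper.

```python
import heapq
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def page_of_results(
    candidates: List[Doc],
    key: Callable[[Doc], Any],
    page: int,
    per_page: int,
) -> List[Doc]:
    """Return one page of results sorted ascending by `key`.

    Every candidate is examined, but only the best page * per_page
    documents are kept (in a bounded heap), so no full sort is needed.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    needed = page * per_page
    # heapq.nsmallest scans all candidates and keeps at most `needed` of them.
    top = heapq.nsmallest(needed, candidates, key=key)
    start = (page - 1) * per_page
    return top[start:start + per_page]


# Usage: the cheapest document is stored last, yet it still comes out first.
docs = [{"id": str(i), "price": 100.0 + i} for i in range(10_000)]
docs.append({"id": "cheapest", "price": 1.0})
first_page = page_of_results(docs, key=lambda d: d["price"], page=1, per_page=3)
print([d["id"] for d in first_page])  # ['cheapest', '0', '1']
```

For `desc` ordering, `heapq.nlargest` would play the same role. Memory stays proportional to `page * per_page` rather than to the number of candidates, while the scan over the candidates is still linear.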
k
Thank you very much. Does it work the same with `query = *` too? Does Typesense only ever sort the first 250 documents it happens to find?
Hey @Kishore Nallan 👋 Do you have any idea about my previous message?
k
Yes, it's treated the same way. We limit to 250 results by default unless deep pagination is requested. This is what all search engines do: e.g., you can't see "all" results on Google. Even though a search query might produce 100K results, Google will show only the first 15-20 pages at most.
k
Thank you very much. However, that means that when you do a `query = *`, the candidate set is the whole collection, and therefore, if only the first 250 candidates are sorted, the lowest ones are not shown first, right?
k
I don't get your question.
k
On a `query = *`, all the documents must eventually be returned, right? They are stored internally in no specific order, right? But you told me that only the first 250 documents are sorted when returned. That means that if the lowest document (the one that must be returned first) is stored very deep, it will not be returned first, since it is not among the first 250 candidates, right?
k
No, we support sorting in either asc or desc order; it doesn't matter.
k
I am not sure you understand my point. Are all my previous points correct? Please assume `sort_by = price asc` is set.
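For reference, in the Typesense search API `sort_by` is passed as a search parameter on each request, using `field:direction` syntax (for example `price:asc`) rather than `price asc`. The sketch below only builds the request URL for `GET /collections/<collection>/documents/search`; the host, the `products` collection, the `name` field and the `build_search_url` helper are hypothetical, and the `X-TYPESENSE-API-KEY` header that a real request needs is omitted.

```python
from urllib.parse import urlencode


def build_search_url(host: str, collection: str, params: dict) -> str:
    """Build a Typesense search URL; values are percent-encoded by urlencode."""
    return f"{host}/collections/{collection}/documents/search?{urlencode(params)}"


params = {
    "q": "*",                # match every document
    "query_by": "name",      # hypothetical text field; adjust to your schema
    "sort_by": "price:asc",  # field:direction, cheapest first
    "per_page": 20,
    "page": 1,
}
url = build_search_url("http://localhost:8108", "products", params)
print(url)
```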
k
I'm sorry, I still don't fully understand your question. Typesense does return correct results when sorted either way (asc or desc) for q=*.
k
I just tried the songs demo and I don't understand why the results are not sorted by date. I am just searching for "oasis" here, with no filter or facet selected.
k
Because the query sorts by text relevance, not by date.
And we also "bucket" text match:
```
"sort_by": "_text_match(buckets: 10):desc"
```
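As a rough mental model of bucketing (an approximation, not Typesense's exact algorithm), hits are ranked by text-match score, split into N roughly equal buckets, and then ordered within each bucket by the next `sort_by` field. The hypothetical `bucketed_sort` helper and the `text_match`/`year` fields below are illustrative only.

```python
from typing import Dict, List

Hit = Dict[str, int]


def bucketed_sort(hits: List[Hit], buckets: int, tie_break: str) -> List[Hit]:
    """Approximate `_text_match(buckets: N):desc` followed by a second sort field.

    Hits are ranked by text-match score, split into `buckets` roughly equal
    slices, and each slice is re-sorted by `tie_break` (descending).
    """
    if buckets < 1:
        raise ValueError("buckets must be >= 1")
    if not hits:
        return []
    ranked = sorted(hits, key=lambda h: h["text_match"], reverse=True)
    size = -(-len(ranked) // buckets)  # ceiling division: hits per bucket
    result: List[Hit] = []
    for start in range(0, len(ranked), size):
        bucket = ranked[start:start + size]
        result.extend(sorted(bucket, key=lambda h: h[tie_break], reverse=True))
    return result


hits = [
    {"id": 1, "text_match": 90, "year": 1995},
    {"id": 2, "text_match": 88, "year": 2008},
    {"id": 3, "text_match": 50, "year": 2020},
    {"id": 4, "text_match": 48, "year": 1994},
]
print([h["id"] for h in bucketed_sort(hits, buckets=2, tie_break="year")])  # [2, 1, 3, 4]
```

Results are ordered by year only within a bucket, so the list as a whole need not be in date order, which is one way two hits that look equally relevant in the UI can still appear unsorted.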
k
I understand that, but why is the second row not sorted? The matches look equally important: the title matches and the text is the same, "from oasis". Am I missing something?
What does it mean to "bucket" the text match?
You can check the match score info in the results to verify the exact scores. The search could also be matching on other fields that are not shown in the UI.
k
Oh OK, I've read and understood the doc page you linked, thank you! However, is there any way on the songs demo to filter by one artist (on the left) without entering any query text?
k
Not on the demo
k
Is there a reason why? It seems that a front-end script (in JS) prevents me from entering `q = *` and hides the left sidebar (filters and facets).
Hey 👋, I found out how to search for `q = *` on the songs demo, and it is very slow (7 seconds, or 4 seconds when filtering). I understand why there is a front-end script preventing people from using it that way. Can you tell me the machine spec used for this demo?
k
We have a number of demos, and this one shows a use case that comes up often in our calls: using InstantSearch to build a search experience that shows results only when a query is typed. The machine spec, I think, is 2 GB / 2 CPUs.
k
Indeed, it's very cool! But why is it so slow to show just 20 results on the first page without any query? What prevents an attacker from DDoSing the server by sending `q = *` requests to it?
k
There are ways to deal with that. Btw, what's your use case? Let's talk more about what you are building or planning to build.
k
I would like to replace the search engine of a music website; that's exactly the use case you show. However, I would also like to let users search for an artist, a genre or a date range in the left sidebar and see the music listed from most recent to least recent.
How would you deal with that? Is it a resource issue here? Would more RAM help?
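For the use case described above (browse by artist, genre or date range, newest first), a `q=*` search would combine `filter_by` with `sort_by`. The sketch below composes such a `filter_by` string with a hypothetical `build_filter_by` helper; the `artist`, `genre`, `release_date` and `title` field names are hypothetical, dates are assumed to be stored as int64 Unix timestamps, and the backtick quoting of string values is an assumption worth checking against the `filter_by` documentation for your Typesense version.

```python
from typing import List, Optional


def build_filter_by(
    artist: Optional[str] = None,
    genre: Optional[str] = None,
    released_from: Optional[int] = None,
    released_to: Optional[int] = None,
) -> str:
    """Compose a filter_by string from optional facet selections.

    Field names are hypothetical; release_date is assumed to be an int64
    Unix timestamp so it can be range-filtered and sorted.
    """
    clauses: List[str] = []
    if artist:
        clauses.append(f"artist:=`{artist}`")  # exact match; backticks assumed for quoting
    if genre:
        clauses.append(f"genre:=`{genre}`")
    if released_from is not None:
        clauses.append(f"release_date:>={released_from}")
    if released_to is not None:
        clauses.append(f"release_date:<={released_to}")
    return " && ".join(clauses)  # empty string means "no filter"


params = {
    "q": "*",
    "query_by": "title",             # hypothetical text field
    "filter_by": build_filter_by(artist="Oasis", released_from=788918400),  # 1995-01-01T00:00:00Z
    "sort_by": "release_date:desc",  # most recent first
    "per_page": 20,
}
print(params["filter_by"])  # artist:=`Oasis` && release_date:>=788918400
```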
k
Can you give me the link to your website? Happy to go over that to understand the UX and offer help.
k
My website is not public, and most searches will be done on mobile, but a large part of the community is already on desktop. Could you help me understand why it takes so long? Is it because the sorting is done on the candidates from `q = *`, i.e., the whole dataset?
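A back-of-the-envelope estimate helps frame the question, assuming a generic comparison-based top-k selection rather than Typesense's documented internals. With the $n = 10^6$ documents mentioned at the start of the thread and $k = 250$ kept results, a bounded heap does far fewer comparisons than a full sort, but every one of the $n$ candidates is still visited:

$$
\begin{aligned}
\text{full sort:}\quad & n \log_2 n \approx 10^6 \times 19.93 \approx 2.0 \times 10^7 \text{ comparisons} \\
\text{bounded top-}k\text{ heap:}\quad & n \log_2 k \approx 10^6 \times 7.97 \approx 8.0 \times 10^6 \text{ comparisons}
\end{aligned}
$$

Both counts grow with $n$, so a match-all query does work proportional to the whole collection; these are rough operation counts, not timings.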
k
The demos use pretty underpowered clusters, so that might be it.
k
Which version of Typesense are you using with this demo, please? I will try to host it on my side.
Hey @Kishore Nallan 👋 Can you answer my previous message, please? ☝️
I'll also try v0.24, which has just been released ☺️