How often can I insert/update? I know it depends o...
# community-help
s
How often can I insert/update? I know it depends on a lot of variables, but in general what is suggested here? Let's say I already have around 1 million records and hundreds are added/updated every few minutes. Is it going to be too heavy on indexing? My users can add items (title, description, tags) to the site and these must be indexed. Users can also like items already on the site, and it would be nice if this counter could be updated/indexed as well. Are a few hundred inserts/updates every few minutes too much? Do I need to rethink my insert/update strategy by using bulk every few hours? Thanks
j
Are a few hundred inserts/updates every few minutes too much?
This should be fine, but the complete answer depends on the number of CPU cores and their capacity. In general, though, bulk updates are faster than single updates, so I'd recommend leaning towards batch updates when possible.
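A minimal sketch of what batching could look like, assuming documents are collected in memory and sent to Typesense's bulk import endpoint with `action=upsert`. The host, API key, collection name, batch size, and the `text/plain` content type are illustrative assumptions, not values from this thread.

```python
import json
import urllib.request
from typing import Dict, Iterable, Iterator, List


def batched(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` documents, preserving order."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def to_jsonl(batch: List[Dict]) -> str:
    """Serialize a batch as JSON Lines: one JSON document per line."""
    return "\n".join(json.dumps(doc) for doc in batch)


def import_batch(host: str, api_key: str, collection: str, batch: List[Dict]) -> str:
    """POST one batch to the bulk import endpoint (upsert = insert or update by id)."""
    url = f"{host}/collections/{collection}/documents/import?action=upsert"
    request = urllib.request.Request(
        url,
        data=to_jsonl(batch).encode("utf-8"),
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With something like this, the few hundred pending inserts/updates would go out as `for batch in batched(pending_docs, 100): import_batch(host, api_key, "items", batch)`, where `pending_docs` and `"items"` are hypothetical names.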
s
Thanks for your answer Jason. Awesome work by the way! 👏 Now I'm just wondering one thing, search-as-you-type is really great user experience, but how heavy is it on the server? Is it heavier on the RAM or CPU? I see that in the benchmarks, Typesense managed to handle 104 concurrent searches per second on 2.2 million records with 4 vCPUs. How much RAM was on that server? It says it took up about 900MB of RAM when indexed, so 2-3x that amount of RAM? Let's say I have 104 users doing search-as-you-type, would that be more than 104 concurrent searches if they type more than 1 letter a second? Is it realistic to think of 1k-10k searches a second without breaking the bank server wise? Thanks again. Cheers!
j
@Steven Lacroix
Now I'm just wondering one thing, search-as-you-type is really great user experience, but how heavy is it on the server? Is it heavier on the RAM or CPU?
Many things in Typesense are heavily optimized specifically for search-as-you-type experiences. For example, we store all indices in memory precisely to enable this kind of performance for instant search. The amount of data you have dictates the amount of RAM you'd need. The number of concurrent searches you have, instant-search or not, dictates the amount of CPU you need. Instant-search experiences do generate more concurrent traffic in general, so CPU demand is relatively higher. But I wouldn't let that stop you from doing instant-search experiences - I'd recommend first benchmarking to see how much CPU you need.
It says it took up about 900MB of RAM when indexed, so 2-3x that amount of RAM?
That's correct. Here's more info on how to choose RAM: https://typesense.org/docs/0.21.0/guide/system-requirements.html#choosing-ram
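As a rough worked example of that rule of thumb, the sketch below applies the 2-3x multiplier to the 900MB figure from the question. The function name and defaults are hypothetical; treat the result as a starting point to confirm against the linked system requirements and your own measurements.

```python
from typing import Tuple


def estimate_ram_mb(indexed_size_mb: float, low: float = 2.0, high: float = 3.0) -> Tuple[float, float]:
    """Return a (low, high) RAM estimate in MB using the 2-3x rule of thumb."""
    return indexed_size_mb * low, indexed_size_mb * high


low_mb, high_mb = estimate_ram_mb(900)
print(f"Plan for roughly {low_mb:.0f}-{high_mb:.0f} MB of RAM")
```

For the 900MB dataset in the question, that works out to somewhere between about 1.8GB and 2.7GB.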
Let's say I have 104 users doing search-as-you-type, would that be more than 104 concurrent search if they type more than 1 letter a second?
If 104 users type one letter at the exact same second, that would count as 104 concurrent searches. If let's say 5 of those users type 2 letters in that same second, then you'd have 104 + 5 concurrent searches. From Typesense's perspective, there is no distinction between search-as-you-type or not - you send keywords and you get results. The difference is that the frontend that's triggering the searches sends requests to Typesense on every keypress in a search-as-you-type experience, whereas historically many search experiences would require the user to press enter to start searching, which is when the query gets sent to the search engine.
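The counting in that example can be sketched as a small calculation, assuming every keypress triggers exactly one search request (no debouncing on the frontend). The function name is hypothetical.

```python
from typing import List


def searches_in_one_second(keypresses_per_user: List[int]) -> int:
    """Each keypress sends one search request, so the total is simply the sum."""
    return sum(keypresses_per_user)


# 104 users: 99 type one letter in that second, 5 type two letters.
keypresses = [1] * 99 + [2] * 5
print(searches_in_one_second(keypresses))  # 104 + 5 = 109
```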
Is it realistic to think of 1k-10k search a second without breaking the bank server wise?
Yup, definitely realistic! It totally depends on your dataset size and the number of fields you search for within each document. Just the other day, we had one user who was able to get up to 2.5K searches per second when searching through thousands of records with one field per record, on a 512MB, 2vCPU 3-node cluster! My benchmarking server ran out of resources to be able to generate even higher load 😄
s
@Jason Bosco Wow. Thank you so much. You have answered all my questions in great detail. You guys are awesome. Really looking forward to implementing Typesense in my stack. Cheers! 🙌
j
Happy to help! 🙌
s
@Jason Bosco sorry, me again 🙃. I came across this benchmark and wanted to run it by you. https://medium.com/gigasearch/benchmarking-performance-elasticsearch-vs-competitors-d4778ef75639 I was wondering what you thought of it. Do you think it's a fair comparison? Thanks again.
j
@Steven Lacroix One thing that was surprising to me in that article was how multi-word search queries take unusually long only in Typesense. So I've been in touch with the author to get some example queries to understand what's happening there. One theory I have currently is that Typesense has a feature where words are dropped from the query if sufficient results are not found for the full multi-word query. This threshold for "sufficient results" is currently set to 100 results by default, so I suspect that for these queries Typesense is actually finding fewer than 100 results and so is doing a far more exhaustive search by dropping one or more words and repeating the search to find enough results. So it's doing more searching than the other search engines, which don't have this result expansion feature. I've asked the author to turn off this feature in Typesense to see if that helps. The other thing that stood out to me is how low Meilisearch's indexing time is. It turns out that unlike the other search engines, Meilisearch's indexing is async. So the HTTP endpoint just receives the uploaded data, creates an async job and returns that job ID. The actual indexing then happens in the background, and you have to poll for job status separately. This is the reason it seems like it's fast, when really the response time is just for uploading data, not for the actual indexing.
s
@Jason Bosco The threshold for "sufficient results" on the multi-word search makes perfect sense. It's a bummer that those results are out there, but I hope he comes back to you with the queries he used to clarify things. I was also wondering the same about Meilisearch, but your reasoning makes sense. Thanks again! 🙂
b
@Jason Bosco Something related to this thread, what tool do you use for benchmarking?
"This threshold for "sufficient results" is currently set to 100 results by default", which parameter turns this off?
j
@bnfd I use k6 as my benchmarking tool. The parameter is drop_tokens_threshold
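For illustration, here is a sketch of a search request URL that passes `drop_tokens_threshold` as a query parameter. It assumes that a value of 0 disables token dropping (my understanding of the Typesense docs, worth verifying for your version); the host, collection name, and `query_by` fields are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(host: str, collection: str, q: str, query_by: str,
                     drop_tokens_threshold: int = 0) -> str:
    """Build a search URL; threshold 0 is assumed to turn off token dropping."""
    params = {
        "q": q,
        "query_by": query_by,
        "drop_tokens_threshold": drop_tokens_threshold,
    }
    return f"{host}/collections/{collection}/documents/search?{urlencode(params)}"


print(build_search_url("http://localhost:8108", "items", "red shoes", "title,description"))
```

The request itself would still need the `X-TYPESENSE-API-KEY` header when sent.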
b
thanks!