#community-help

Understanding Indexing and Search-As-You-Type In Typesense

TLDR Steven had questions about indexing and search-as-you-type in Typesense. Jason clarified that bulk updates are faster and that search-as-you-type is resource-intensive but worth it. The discussion also covered third-party search-engine benchmarks and Typesense's drop_tokens_threshold parameter, with participation from bnfd.

Sep 07, 2021
Steven
08:01 PM
How often can I insert/update?
I know it depends on a lot of variables, but in general what is recommended here? Let's say I already have around 1 million records and hundreds are added/updated every few minutes. Is that going to be too heavy on indexing? My users can add items (title, description, tags) to the site, and these must be indexed. Users can also like items already on the site, and it would be nice if this like counter could be updated/indexed as well. Are a few hundred inserts/updates every few minutes too much? Do I need to rethink my strategy and switch to bulk updates every few hours instead? Thanks
Jason
08:25 PM
> Are a few hundred inserts/updates every few minutes too much?
This should be fine. But the complete answer depends on the number of CPU cores / capacity.

In general though, bulk updates are faster than single updates. I'd recommend leaning towards batch updates when possible.
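A minimal sketch of "leaning towards batch updates": buffer the individual item and like-count changes in the application, then periodically send them as one request to Typesense's bulk import endpoint (`POST /collections/<name>/documents/import`, newline-delimited JSON, `action=upsert`). The `BatchBuffer` class, the collection name `items`, the host and the flush size are hypothetical choices for illustration, not something prescribed in the thread; the request is only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


class BatchBuffer:
    """Hypothetical helper: collects individual inserts/updates for one bulk import."""

    def __init__(self, max_size=500):
        self.max_size = max_size
        self._pending = {}  # document id -> latest version of that document

    def add(self, doc):
        # A later change to the same id replaces the earlier one, so each
        # document is sent at most once per flush (e.g. several likes in a row).
        self._pending[doc["id"]] = doc
        # Returns True when the buffer is full and should be flushed.
        return len(self._pending) >= self.max_size

    def drain_jsonl(self):
        # The import endpoint expects newline-delimited JSON: one document per line.
        body = "\n".join(json.dumps(d) for d in self._pending.values())
        self._pending.clear()
        return body


def build_import_request(host, collection, api_key, jsonl_body):
    # action=upsert creates new documents and replaces existing ones with the same id.
    url = f"{host}/collections/{collection}/documents/import?" + urlencode({"action": "upsert"})
    return Request(
        url,
        data=jsonl_body.encode("utf-8"),
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. A timer (every few minutes, say) or the `add` return value can decide when to flush, which keeps the number of indexing requests low even when hundreds of changes arrive in that window.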
Sep 08, 2021
Steven
12:33 PM
Thanks for your answer Jason. Awesome work by the way! 👏

Now I'm just wondering one thing, search-as-you-type is really great user experience, but how heavy is it on the server? Is it heavier on the RAM or CPU?

I see that in the benchmarks, Typesense handled 104 concurrent searches per second on 2.2 million records with 4 vCPUs. How much RAM was on that server? It says the index took up about 900MB of RAM, so 2-3x that amount of RAM? Let's say I have 104 users doing search-as-you-type: would that be more than 104 concurrent searches if they type more than one letter a second?

Is it realistic to think of 1K-10K searches a second without breaking the bank server-wise? Thanks again. Cheers!
Jason
06:22 PM
Steven

> Now I'm just wondering one thing, search-as-you-type is really great user experience, but how heavy is it on the server? Is it heavier on the RAM or CPU?
Many things in Typesense are heavily optimized specifically for search-as-you-type experiences. For example, we store all indices in memory precisely to enable this level of performance for instant search. The amount of data you have dictates the amount of RAM you'd need. The number of concurrent searches you have, instant-search or not, dictates the amount of CPU you need. Instant-search experiences do generate more concurrent traffic in general, so CPU demand is relatively higher. But I wouldn't let that stop you from building instant-search experiences - I'd recommend benchmarking first to see how much CPU you need.
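To make "benchmark first" concrete, here is a minimal plain-Python throughput probe. It is a stand-in for illustration, not the k6 setup Jason mentions later in the thread. It assumes the standard `GET /collections/<name>/documents/search` endpoint with `q` and `query_by`; the host, collection, field and concurrency level are hypothetical, and `measure_throughput` accepts any search function so it can be dry-run without a server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def make_http_search(host, collection, api_key, query_by):
    """Returns a function that sends one search to Typesense and returns the raw body."""
    def search(q):
        params = urlencode({"q": q, "query_by": query_by})
        req = Request(
            f"{host}/collections/{collection}/documents/search?{params}",
            headers={"X-TYPESENSE-API-KEY": api_key},
        )
        with urlopen(req) as resp:
            return resp.read()
    return search


def measure_throughput(search_fn, queries, concurrency=8):
    """Runs all queries with `concurrency` parallel workers.

    Returns (searches per second, results in the same order as `queries`).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(search_fn, queries))
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against division by zero
    return len(results) / elapsed, results
```

Feeding it prefix queries such as `["s", "sh", "sho", "shoe"]` roughly mimics search-as-you-type traffic; raising `concurrency` until latency degrades gives a rough sense of how much CPU a node needs, though a single client machine may saturate before the server does.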

> It says it took up about 900MB of RAM when indexed, so 2-3x that amount of RAM?
That's correct. Here's more info on how to choose RAM: https://typesense.org/docs/0.21.0/guide/system-requirements.html#choosing-ram
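The 2-3x rule of thumb confirmed above is simple arithmetic; a tiny helper makes it explicit. The function name is hypothetical, and the factors come straight from this thread rather than from an official formula (the linked system-requirements page is the authoritative guide).

```python
def estimate_ram_mb(indexed_size_mb, low_factor=2, high_factor=3):
    """Rule of thumb from this thread: provision 2-3x the RAM the index occupies."""
    return indexed_size_mb * low_factor, indexed_size_mb * high_factor
```

For the benchmark dataset discussed here, `estimate_ram_mb(900)` gives `(1800, 2700)`, i.e. roughly 1.8-2.7GB of RAM for an index that takes about 900MB.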

> Let's say I have 104 users doing search-as-you-type, would that be more than 104 concurrent search if they type more than 1 letter a second?
If 104 users type one letter at the exact same second, that would count as 104 concurrent searches. If let's say 5 of those users type 2 letters in that same second, then you'd have 104 + 5 concurrent searches. From Typesense's perspective, there is no distinction between search-as-you-type or not - you send keywords and you get results. The difference is that the frontend that's triggering the searches sends requests to Typesense on every keypress in a search-as-you-type experience, whereas historically many search experiences would require the user to press enter to start searching, which is when the query gets sent to the search engine.
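The counting argument above can be written down directly: a search-as-you-type box sends one query per keypress, so the requests generated in a given second equal the total characters typed by all users in that second. Both helper names are hypothetical.

```python
def prefix_queries(text):
    """The successive queries a search-as-you-type box sends while `text` is typed."""
    return [text[:i] for i in range(1, len(text) + 1)]


def searches_triggered(typed_this_second):
    """One search per keypress: sum the characters each user typed in this second."""
    return sum(len(chars) for chars in typed_this_second)
```

With 99 users typing one letter and 5 users typing two letters in the same second, `searches_triggered(["a"] * 99 + ["ab"] * 5)` returns 109, matching the 104 + 5 figure; typing "shoe" produces the four queries `["s", "sh", "sho", "shoe"]`.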

> Is it realistic to think of 1k-10k search a second without breaking the bank server wise?
Yup, definitely realistic! It totally depends on your dataset size and the number of fields you search for within each document. Just the other day, we had one user who was able to get up to 2.5K searches per second when searching through thousands of records with one field per record, on a 512MB, 2vCPU 3-node cluster! My benchmarking server ran out of resources to be able to generate even higher load 😄
Steven
06:36 PM
Jason Wow. Thank you so much. You have answered all my questions in great detail. You guys are awesome. Really looking forward to implementing Typesense in my stack. Cheers! 🙌
Jason
07:40 PM
Happy to help! 🙌
Steven
08:45 PM
Jason sorry, me again 🙃. I came across this benchmark and wanted to run it by you. https://medium.com/gigasearch/benchmarking-performance-elasticsearch-vs-competitors-d4778ef75639
I was wondering what you thought of it. Do you think it's a fair comparison? Thanks again.
Sep 09, 2021
Jason
12:24 AM
Steven One thing that was surprising to me in that article was how multi-word search queries take unusually long just in Typesense. So I've been in touch with the author to get some example queries to understand what's happening there.

One theory I have currently is that Typesense has a feature where words are dropped from the query if sufficient results are not found for the full multi-word query. This threshold for "sufficient results" is currently set to 100 results by default, so I suspect that for these queries Typesense is actually finding fewer than 100 results, and so is doing a far more exhaustive search by dropping one or more words and repeating the search to find enough results. So it's doing more searching than the other search engines, which don't have this result-expansion feature.
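A toy model of the theory above, to show why a multi-word query that matches few documents costs extra search passes. This is a deliberately simplified sketch, not Typesense's implementation: it assumes plain whole-word matching over an in-memory list and always drops the right-most word, whereas the real engine's choice of which word to drop (and its matching rules) may differ. The function name and sample data are hypothetical.

```python
def search_with_token_dropping(docs, query, drop_tokens_threshold=100):
    """Toy model: if fewer than `drop_tokens_threshold` documents match every
    query word, drop the last word and search again, keeping earlier matches.

    Returns (matching document indexes, number of search passes performed).
    """
    tokens = query.lower().split()
    found, seen, passes = [], set(), 0
    while tokens:
        passes += 1
        for i, doc in enumerate(docs):
            words = set(doc.lower().split())
            if i not in seen and all(t in words for t in tokens):
                seen.add(i)
                found.append(i)
        if len(found) >= drop_tokens_threshold:
            break  # enough results: stop expanding the search
        tokens = tokens[:-1]  # drop the right-most word and search again
    return found, passes
```

On `["red running shoes", "blue running shoes", "red hat"]`, the query "red running shoes" with the default threshold of 100 never reaches enough results, so it runs three passes and returns `([0, 2], 3)`. With `drop_tokens_threshold=0` it stops after the first pass and returns `([0], 1)`, which is the behaviour the benchmark author was asked to try.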

I've asked the author to turn off this feature in Typesense to see if that helps.

The other thing that stood out to me is how low Meilisearch's indexing time is. It turns out that unlike the other search engines, Meilisearch's indexing is async. So the HTTP endpoint just receives the uploaded data, creates an async job and returns that job ID. The actual indexing then happens in the background, and you have to poll for the job status separately. This is why it seems fast, when really the measured response time covers only uploading the data, not the actual indexing.
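The measurement pitfall described above can be reproduced with a toy asynchronous indexer: timing only the submit call captures the upload, while a fair comparison keeps the clock running until polling reports the job as finished. `AsyncIndexer`, its status strings and the simulated work time are all hypothetical; this does not model Meilisearch's actual API.

```python
import threading
import time
import uuid


class AsyncIndexer:
    """Toy indexer: submit() returns a job id at once; work happens in a thread."""

    def __init__(self, work_seconds):
        self.work_seconds = work_seconds
        self._status = {}
        self._lock = threading.Lock()

    def submit(self, docs):
        job_id = str(uuid.uuid4())
        with self._lock:
            self._status[job_id] = "enqueued"
        threading.Thread(target=self._run, args=(job_id, docs), daemon=True).start()
        return job_id

    def _run(self, job_id, docs):
        time.sleep(self.work_seconds)  # stand-in for the real indexing work
        with self._lock:
            self._status[job_id] = "done"

    def status(self, job_id):
        with self._lock:
            return self._status[job_id]


def time_until_indexed(indexer, docs, poll_interval=0.01):
    """Returns (time for the submit call alone, time until indexing finished)."""
    start = time.perf_counter()
    job_id = indexer.submit(docs)
    submit_time = time.perf_counter() - start
    while indexer.status(job_id) != "done":
        time.sleep(poll_interval)
    return submit_time, time.perf_counter() - start
```

With `AsyncIndexer(work_seconds=0.2)`, the submit time is a tiny fraction of a second while the total time is at least 0.2 seconds, which is the gap between "response time" and "indexing time" described above.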
Steven
01:00 AM
Jason The threshold for "sufficient results" on the multi-word search makes perfect sense. It's a bummer that those results are out there, but I hope he comes back to you with the queries he used to clarify things. I was also wondering the same about Meilisearch, but your reasoning makes sense. Thanks again! 🙂

bnfd
11:18 AM
Jason Something related to this thread, what tool do you use for benchmarking?
11:26 AM
"This threshold for "sufficient results" is currently set to 100 results by default", which parameter turns this off?
Jason
02:53 PM
bnfd I use k6 for a benchmarking tool.

The parameter is drop_tokens_threshold.
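A minimal sketch of passing that parameter on a search request. It assumes the standard `GET /collections/<name>/documents/search` endpoint; the host, the collection `items` and the field `title` are hypothetical placeholders. Setting `drop_tokens_threshold` to 0 turns word dropping off, which is what Jason asked the benchmark author to try.

```python
from urllib.parse import urlencode


def build_search_url(host, collection, q, query_by, drop_tokens_threshold=None):
    """Builds a search URL; pass drop_tokens_threshold=0 to disable word dropping."""
    params = {"q": q, "query_by": query_by}
    if drop_tokens_threshold is not None:
        params["drop_tokens_threshold"] = drop_tokens_threshold
    return f"{host}/collections/{collection}/documents/search?{urlencode(params)}"
```

`build_search_url("http://localhost:8108", "items", "red shoes", "title", drop_tokens_threshold=0)` yields `http://localhost:8108/collections/items/documents/search?q=red+shoes&query_by=title&drop_tokens_threshold=0`; the request also needs the `X-TYPESENSE-API-KEY` header when sent.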
bnfd
02:54 PM
thanks!
