Understanding Indexing and Search-As-You-Type In Typesense
TLDR Steven had queries about indexing and search-as-you-type in Typesense. Jason clarified that bulk updates are faster and search-as-you-type is resource intensive but worth it. The discussion also included querying benchmarks and Typesense's drop_tokens_threshold parameter, with participation from bnfd.
Sep 07, 2021 (26 months ago)
I know it depends on a lot of variables, but in general what is suggested here? Let's say I already have around 1 million records and hundreds are added/updated every few minutes. Is it going to be too heavy on indexing? My users can add items (title, description, tags) to the site and these must be indexed. Users can also like items already on the site, and it would be nice if this counter could be updated/indexed as well. Are a few hundred inserts/updates every few minutes too much? Do I need to rethink my insert/update strategy by using bulk every few hours? Thanks
This should be fine. But the complete answer depends on the number of CPU cores and their capacity.
In general though, bulk updates are faster than single updates. I'd recommend leaning towards batch updates when possible.
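As a rough illustration of the batching advice above, here is a minimal sketch that collects pending changes and packs them into a single request for Typesense's bulk import endpoint (`/collections/<name>/documents/import`, which takes one JSON document per line). The collection name `items`, the field names, the host, and the API key are placeholders, not details from this thread, and the request is only built, not sent.

```python
import json
import urllib.request


def build_bulk_upsert_request(base_url, api_key, collection, documents):
    """Build (but do not send) one import request carrying many documents."""
    # The import endpoint expects JSON Lines: one JSON document per line.
    body = "\n".join(json.dumps(doc) for doc in documents).encode("utf-8")
    url = f"{base_url}/collections/{collection}/documents/import?action=upsert"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-TYPESENSE-API-KEY": api_key, "Content-Type": "text/plain"},
    )


# Hypothetical changes gathered over a few minutes (new items, updated like counts).
pending = [
    {"id": "1", "title": "First item", "likes": 12},
    {"id": "2", "title": "Second item", "likes": 3},
]
request = build_bulk_upsert_request("http://localhost:8108", "xyz", "items", pending)
# urllib.request.urlopen(request) would send the batch; it is left out of this sketch.
```

How often to flush the batch (every few seconds, minutes, or hours) is a trade-off between index freshness and write load, and is best decided by measuring on your own hardware.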
Sep 08, 2021 (26 months ago)
Now I'm just wondering one thing: search-as-you-type is a really great user experience, but how heavy is it on the server? Is it heavier on the RAM or CPU?
I see that in the benchmarks, Typesense managed to get 104 concurrent searches per second on 2.2 million records with 4 vCPUs. How much RAM was on that server? It says it took up about 900MB of RAM when indexed, so 2-3x that amount of RAM? Let's say I have 104 users doing search-as-you-type, would that be more than 104 concurrent searches if they type more than 1 letter a second?
Is it realistic to think of 1k-10k searches a second without breaking the bank server-wise? Thanks again. Cheers!
> Now I'm just wondering one thing: search-as-you-type is a really great user experience, but how heavy is it on the server? Is it heavier on the RAM or CPU?
Many things in Typesense are heavily optimized specifically for search-as-you-type experiences. For example, we store all indices in memory precisely to enable this kind of performance for instant search. The amount of data you have dictates the amount of RAM you need. The number of concurrent searches you have, instant search or not, dictates the amount of CPU you need. Instant-search experiences do generate more concurrent traffic in general, so CPU demand is relatively higher. But I wouldn't let that stop you from building instant-search experiences. I'd recommend benchmarking first to see how much CPU you need.
> It says it took up about 900MB of RAM when indexed, so 2-3x that amount of RAM?
That's correct. Here's more info on how to choose RAM: https://typesense.org/docs/0.21.0/guide/system-requirements.html#choosing-ram
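To make the 2-3x rule of thumb concrete, here is a small back-of-the-envelope sketch using the 900MB figure from the question. The function name and the default factors are illustrative only; the linked system requirements page remains the authoritative guide.

```python
def estimate_ram_mb(indexed_size_mb, low_factor=2.0, high_factor=3.0):
    """Return a (low, high) RAM estimate from the 2-3x rule of thumb."""
    return indexed_size_mb * low_factor, indexed_size_mb * high_factor


low, high = estimate_ram_mb(900)
print(f"{low:.0f}-{high:.0f} MB")  # 1800-2700 MB
```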
> Let's say I have 104 users doing search-as-you-type, would that be more than 104 concurrent searches if they type more than 1 letter a second?
If 104 users type one letter in the exact same second, that would count as 104 concurrent searches. If, say, 5 of those users type 2 letters in that same second, then you'd have 104 + 5 concurrent searches. From Typesense's perspective, there is no distinction between search-as-you-type and any other search: you send keywords and you get results. The difference is that in a search-as-you-type experience the frontend sends a request to Typesense on every keypress, whereas historically many search experiences required the user to press enter to start searching, and only then was the query sent to the search engine.
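The counting in this answer can be sketched as a one-line sum: with one request per keypress, the number of searches in a given second is the total number of keypresses across all users in that second. The function name is made up for illustration.

```python
def searches_in_one_second(keystrokes_per_user):
    """Each keypress triggers one search request, so the total is the sum of keypresses."""
    return sum(keystrokes_per_user)


# 104 users in total: 99 type one letter and 5 type two letters in the same second.
keystrokes = [1] * 99 + [2] * 5
print(searches_in_one_second(keystrokes))  # 109, i.e. 104 + 5
```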
> Is it realistic to think of 1k-10k searches a second without breaking the bank server-wise?
Yup, definitely realistic! It totally depends on your dataset size and the number of fields you search for within each document. Just the other day, we had one user who was able to get up to 2.5K searches per second when searching through thousands of records with one field per record, on a 512MB, 2vCPU 3-node cluster! My benchmarking server ran out of resources to be able to generate even higher load 😄
I was wondering what you thought of it. You think it's a fair comparison? Thanks again.
Sep 09, 2021 (26 months ago)
One theory I currently have is that Typesense has a feature where words are dropped from the query if sufficient results are not found for the full multi-word query. The threshold for "sufficient results" is currently set to 100 results by default, so I suspect that for these queries Typesense is finding fewer than 100 results and is therefore doing a far more exhaustive search: it drops one or more words and repeats the search to find enough results. So it's doing more searching than the other search engines, which don't have this result-expansion feature.
I've asked the author to turn off this feature in Typesense to see if that helps.
The other thing that stood out to me is how much lower Meilisearch's indexing time is. It turns out that, unlike the other search engines, Meilisearch's indexing is async. The HTTP endpoint just receives the uploaded data, creates an async job and returns that job ID. The actual indexing then happens in the background, and you have to poll for the job status separately. This is why it seems fast, when really the response time only covers uploading the data, not the actual indexing.
The parameter is `drop_tokens_threshold`.
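For reference, here is a minimal sketch of how the word-dropping behaviour discussed above could be switched off for a single search by passing `drop_tokens_threshold=0` as a query parameter on the search endpoint. The collection name `items`, the `title` field, and the host are placeholders, and the snippet only builds the URL; a real request would also need the API key header.

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, query, query_by, drop_tokens_threshold=None):
    """Build a search URL, optionally overriding drop_tokens_threshold."""
    params = {"q": query, "query_by": query_by}
    if drop_tokens_threshold is not None:
        # 0 disables dropping words from the query when too few results are found.
        params["drop_tokens_threshold"] = drop_tokens_threshold
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


url = build_search_url(
    "http://localhost:8108", "items", "red running shoes", "title",
    drop_tokens_threshold=0,
)
print(url)
```

Running the same benchmark query with and without the override is one way to check whether the extra passes explain the slower timings.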
Improving Typesense Query Performance
Jonathan asked about slower-than-expected Typesense query performance. Jason and Kishore Nallan offered solutions and explanations. After a series of tests, Jonathan found that other queries returned results quickly, indicating the issue was specific to the original query.
Discussing Typesense Search Request Performance
Al experienced longer-than-reported times for Typesense search requests, sparking a detailed examination of JSON parsing, response times and data transfer. Jason and Kishore Nallan helped solve the issue.
Addressing Typesense Server Issues and Optimization Needs
Robert had an issue with a 'stuck' Typesense server. Jason and Kishore Nallan gave advice on handling writes, configuration for high search volumes, and running multiple Typesense instances. They also recommended monitoring CPU usage and updating the server version for bug fixes.
Querying and Performance in Typesense
Chris had a problem with a Typesense query not returning a match. Jason solved the issue by suggesting the `exhaustive_search` feature. Further performance and features of Typesense were also discussed.
Phrase Search Relevancy and Weights Fix
Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.