#community-help

Enhancing Vector Search Performance and Response Time using Multi-Search Feature

TLDR: Bill faced performance issues with vector search using the multi_search feature. Jason and Kishore Nallan suggested running models on a GPU and excluding large fields from the search response. Through discussion, it was established that adding more CPUs and enabling server-side caching would improve performance. The thread concluded with Bill reaching a resolution.

Oct 23, 2023 (1 month ago)
Kishore Nallan
11:34 AM
What is the response time on a single search request?
Bill
11:34 AM
110ms
11:34
Bill
11:34 AM
with 50 -> 330 ms
11:35
Bill
11:35 AM
with 100 -> 1.2-1.3secs
Kishore Nallan
11:35 AM
That seems super high for just 1000 docs
11:35
Kishore Nallan
11:35 AM
What is the time with concurrency of 1?
Bill
11:35 AM
Yes, but I search for 5 terms; it's like 5 searches per request
11:36
Bill
11:36 AM
avg=110.48ms min=95.41ms med=108.91ms max=302.26ms
11:36
Bill
11:36 AM
test for 10s
Kishore Nallan
11:36 AM
Do you have exhaustive search enabled?
Bill
11:36 AM
no
11:37
Bill
11:37 AM
I have set prefix: false
11:38
Bill
11:38 AM
{
"filter_by": " + filterBy + ",
"collection": "test",
"q": "test",
"query_by": "title, about",
"prefix": false,
"per_page": "8`",
"sort_by": "published:desc",
"page": 1
}
11:40
Bill
11:40 AM
5 payloads like this in "searches": [] field
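For reference, the full request body would look roughly like this, with five such objects in the searches array (two shown; "term1" and "term2" are placeholder queries, and the other values follow the payload above):

```json
{
  "searches": [
    {
      "collection": "test",
      "q": "term1",
      "query_by": "title, about",
      "prefix": false,
      "per_page": "8",
      "sort_by": "published:desc",
      "page": 1
    },
    {
      "collection": "test",
      "q": "term2",
      "query_by": "title, about",
      "prefix": false,
      "per_page": "8",
      "sort_by": "published:desc",
      "page": 1
    }
  ]
}
```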
Kishore Nallan
11:41 AM
Would you be able to share the dataset and queries with me?
11:41
Kishore Nallan
11:41 AM
1000 docs should take like 2-3 ms or less per query.
Bill
11:41 AM
And I use Typesense v0.25.1
11:42
Bill
11:42 AM
I have deployed the server on a DigitalOcean droplet; the response time is not Typesense's response time
11:42
Bill
11:42 AM
it's the response time from the server to the client
Kishore Nallan
11:43 AM
Ah, then that includes network latency
11:43
Kishore Nallan
11:43 AM
So what you are essentially measuring is the network latency because each search request will take negligible time to be processed.
Bill
11:44 AM
Yes, but the results seem like there is a lot of delay when there are 100 users
11:44
Bill
11:44 AM
1.3 seconds
Kishore Nallan
11:44 AM
Unless I know the search_time_ms values it's difficult to really ascertain what's going on.
Bill
11:47 AM
"search_time_ms":2
11:47
Bill
11:47 AM
for 5 terms in multi_search
11:49
Bill
11:49 AM
wrong, many queries have "search_time_ms":25
11:49
Bill
11:49 AM
"search_time_ms":42
11:49
Bill
11:49 AM
"search_time_ms":109
Kishore Nallan
11:52 AM
This is on a single instance or on a cluster?
Bill
11:52 AM
Tested on a single
11:52
Bill
11:52 AM
with 20 concurrent
11:53
Bill
11:53 AM
I had indexed an embedding field in this collection. Does it add overhead to a normal search?
11:54
Bill
11:54 AM
because it has the vec in memory
Kishore Nallan
11:54 AM
Try excluding that field in the response, because those fields tend to be large.
Bill
11:55 AM
I have excluded it
Kishore Nallan
11:55 AM
Fetching the document with large fields will be slow as well.
11:55
Kishore Nallan
11:55 AM
Exclusion will only help with reducing data sent on wire.
Bill
11:55 AM
Yes I use only include_fields
11:56
Bill
11:56 AM
The documents don't have large fields
Kishore Nallan
11:58 AM
Embedding is a large field
11:59
Kishore Nallan
11:59 AM
Including or excluding fields is done only after the large doc is fetched from disk. That might add to the overhead even if you don't use the embedding field for anything.
Bill
11:59 AM
Yes, should I delete it and test again?
Kishore Nallan
11:59 AM
Yes
Bill
11:59 AM
Ok
12:07
Bill
12:07 PM
Deleted and now I have "search_time_ms":17
12:07
Bill
12:07 PM
"search_time_ms":112
12:07
Bill
12:07 PM
20 concurrent reqs
01:27
Bill
01:27 PM
Kishore Nallan I installed v0.25.1 in a fresh cluster but I get Multi-node with no leader: refusing to reset peers.. All nodes are state: 4
02:03
Bill
02:03 PM
solved it
02:35
Bill
02:35 PM
Kishore Nallan Tested on a fresh instance: 1 document in the collection, tested with k6, 20 concurrent users, multi_search with 5 terms. The average response time is avg=228.75ms, and search_time_ms is 18-83ms
02:35
Bill
02:35 PM
Is this normal?
02:35
Bill
02:35 PM
1 doc in the collection, 5 terms in multi_search, tested on a 1GB RAM / 1 vCPU instance, no embeddings
Jason
04:23 PM
I'm actually surprised to hear that Typesense even works on a 1vCPU server, we typically require at least 2vCPUs.
04:23
Jason
04:23 PM
Could you try repeating this with 2vCPUs?
Bill
07:32 PM
Jason I tested it on 1GB RAM / 1 vCPU: 220ms response, 20 concurrent, 5 terms in multi_search, and 1 doc total in the collection
07:32
Bill
07:32 PM
4GB RAM - 2 vCPU - 200ms
07:33
Bill
07:33 PM
"search_time_ms":29
07:33
Bill
07:33 PM
"search_time_ms":48
07:33
Bill
07:33 PM
"search_time_ms":53
Oct 24, 2023 (1 month ago)
Jason
12:24 AM
That seems completely off from what I'd expect, especially with just 1 record. For example, even with 2M records, this example has a search_time_ms of about 10ms:

https://recipe-search.typesense.org/?r%5Bquery%5D=Oregano
12:25
Jason
12:25 AM
With 28M books, this returns results in about 5ms: https://books-search.typesense.org/?b%5Bquery%5D=Devops
12:27
Jason
12:27 AM
Could you elaborate on how you're running Typesense?

What's the CPU architecture, clock speed of the CPU, disk type (is it an SSD or magnetic disks), are you using Docker, or running natively, are there any load balancers in front, etc?
12:27
Jason
12:27 AM
Also, are you running k6 on the same hardware as Typesense?
Bill
09:30 AM
Jason these examples don’t use multi_search. I use multi_search with 5 searches (objects)
09:32
Bill
09:32 AM
The disk is NVMe, the k6 test runs from my local PC, the CPU is a Premium Intel (Intel Xeon, I think), it runs on Ubuntu Linux (not Docker), and I tested without a load balancer. I host it on DigitalOcean
09:33
Bill
09:33 AM
Have you tested Typesense with multi_search, appending 5-10 search objects in the searches field?
04:18
Bill
04:18 PM
Jason Kishore Nallan any idea?
Jason
04:22 PM
Even with multiple search objects, I see response times of under 1ms to 16ms:

curl -s '' \
        -d '
{
  "searches": [
    {
      "query_by": "title",
      "collection": "r",
      "q": "Oregano"
    },
    {
      "query_by": "title",
      "collection": "r",
      "q": "Pizza"
    },
    {
      "query_by": "title",
      "collection": "r",
      "q": "Chilli"
    },
    {
      "query_by": "title",
      "collection": "r",
      "q": "Pineapple"
    },
    {
      "query_by": "title",
      "collection": "r",
      "q": "Artichoke"
    }
  ]
}' | jq '.results[].search_time_ms'
1
16
1
12
6
Bill
04:24 PM
How many concurrent requests and what are the specs of the server?
Jason
04:26 PM
This was just one request, this is a 4vCPU server
Bill
04:26 PM
Could you try it with 20-100 concurrent reqs?
04:27
Bill
04:27 PM
With 1 request I don’t have any issue
Jason
04:27 PM
That's what the first benchmark on this page did: https://typesense.org/docs/overview/benchmarks.html
04:27
Jason
04:27 PM
• A dataset containing 2.2 million recipes (recipe names and ingredients)
• Took up about 900MB of RAM when indexed in Typesense
• Took 3.6 mins to index all 2.2M records
• On a server with 4vCPUs, Typesense was able to handle a concurrency of 104 concurrent search queries per second, with an average search processing time of 11ms.
Bill
04:30 PM
One note: try the test with multi_search. That test ran single queries, not a multi_search of 5 searches
04:31
Bill
04:31 PM
In order to reproduce, run the multi search query that you sent me with 100 concurrent requests. What’s the search time?
Jason
04:32 PM
That's going to take a bit of time to setup the benchmarking harness, etc...
Bill
04:33 PM
Just download k6 and run the query with 100 VUs against the URL you sent me
07:07
Bill
07:07 PM
Jason Are there benchmark tests for multi_search?
Jason
07:52 PM
There is now!

• 2.2M recipes
• Running on a 4vCPU server (single node).
• 5 searches per multi_search request, similar to the curl request above.
Results:
• Up to around 84 multi_search requests per second (which translates to 84 * 5 = 420 searches per second), search_time_ms avg is 6ms, max is 34ms.
• After that, CPU is exhausted on the Typesense server (100% CPU usage across all cores), and only then search_time_ms spikes to about 55ms.
Adding more CPU will help increase concurrency beyond that if needed.
Bill
07:57 PM
Thank you for the fast response, I appreciate it. Yes, those are exactly the response times I get from my tests: 100 concurrent multi_searches with an average 50-70ms search time
07:58
Bill
07:58 PM
So is the solution to this issue, in order to handle more requests, only adding more CPUs, or is there any other tuning I can apply to the search query?
Jason
08:00 PM
Adding CPU would be the solution...

You could also enable server-side caching in Typesense (use_cache: true). The search_time_ms will be cached as well though, so if you use caching, you want to look at the full http response time
Bill
08:02 PM
Okay, right now I have tested it on a 3-node cluster (2GB RAM / 2 vCPU), and it can handle 200 concurrent requests with an average of 300ms and search times of 10-50ms. With a normal search query, it can handle 350 concurrent requests with an average 400ms response time
Jason
08:03 PM
You want to look at CPU usage, measure the response time just before CPU hits say 95%... because otherwise the average would include the high response times once CPU is exhausted
Bill
08:03 PM
What's the best point in order to upgrade my instance? 75% of CPU?
08:03
Bill
08:03 PM
I'll add an alert
Jason
08:04 PM
Yeah 70-75% is a good threshold to upgrade CPU
08:04
Jason
08:04 PM
On a side note, may I know how many concurrent users you're planning for?

Depending on the placement / usage of the search feature, if there are X users on a site / app, I've typically seen that translate to 20% of X searches per second, given that not all users are searching at the exact same second
08:05
Jason
08:05 PM
This is a super generalization, but that's the rough metric I've seen
Bill
08:05 PM
I built a suggestions query when a user opens my app
08:05
Bill
08:05 PM
So if I have 10K users that would convert to ~ 100 concurrent


08:06
Bill
08:06 PM
This multi_search will run exactly on the launch in order to fetch suggestions for the user
08:06
Bill
08:06 PM
I had implemented it with embeddings, etc., but as I informed you, the cost is too high for this phase
08:07
Bill
08:07 PM
Wouldn't it be more performant if the requests in a multi_search ran in parallel instead of in sequence?
Jason
08:13 PM
At high enough concurrency (when the number of concurrent requests is higher than the number of searches inside a single multi_search request), executing each search inside a multi_search serially actually reduces context-switching overhead and improves performance
08:15
Jason
08:15 PM
Only at low concurrency (when the concurrency is lower than the number of searches inside a multi_search request) would executing the searches inside a multi_search request in parallel be slightly more performant...
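Jason's point can be illustrated with a toy simulation (plain Python, not Typesense internals; fake_search and the pool size are made up for illustration): once request concurrency saturates a fixed worker pool, running each request's sub-searches serially keeps one task per request, instead of fanning each request out into five more tasks that would compete for the same workers.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_search(q):
    # Stand-in for one sub-search; pretend each costs ~1ms.
    time.sleep(0.001)
    return f"results:{q}"

def serve_request_serial(queries):
    # One worker executes all sub-searches of a request back to back,
    # so a request never spawns additional tasks.
    return [fake_search(q) for q in queries]

queries = ["Oregano", "Pizza", "Chilli", "Pineapple", "Artichoke"]

# 20 concurrent multi_search requests on a 4-worker pool: the pool stays
# saturated by whole requests rather than by 100 individual sub-searches.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(serve_request_serial, [queries] * 20))

assert len(results) == 20
assert results[0] == [f"results:{q}" for q in queries]
```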


Bill
08:16 PM
Ok, thank you for your time Jason
Jason
08:16 PM
Happy to help!
