Enhancing Vector Search Performance and Response Time Using the Multi-Search Feature
TLDR Bill faced performance issues with vector search using the multi_search feature. Jason and Kishore Nallan suggested running the embedding models on a GPU and excluding large fields from the search. Through the discussion, it was established that adding more CPUs and enabling server-side caching could improve performance. The thread concluded with Bill reaching a resolution.
Oct 23, 2023 (1 month ago)
Bill 11:38 AM
{
  "filter_by": filterBy,
  "collection": "test",
  "q": "test",
  "query_by": "title, about",
  "prefix": false,
  "per_page": 8,
  "sort_by": "published:desc",
  "page": 1
}
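A search-parameters object like the one above is typically sent as one entry in the `searches` array of a multi_search request. A minimal sketch in Python, assuming the payload is assembled client-side before being POSTed to the /multi_search endpoint (the `filter_by_value` argument is a hypothetical stand-in for the thread's `filterBy` variable):

```python
def build_multi_search(filter_by_value):
    """Wrap the thread's search parameters in a multi_search payload."""
    return {
        "searches": [
            {
                "collection": "test",
                "q": "test",
                "query_by": "title, about",
                "filter_by": filter_by_value,
                "prefix": False,
                "per_page": 8,
                "sort_by": "published:desc",
                "page": 1,
            }
        ]
    }

# The resulting payload would be POSTed to Typesense's /multi_search endpoint.
payload = build_multi_search("published:>0")
print(len(payload["searches"]))  # 1
```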
Kishore Nallan 11:44 AM
Without the search_time_ms values it's difficult to really ascertain what's going on.
07:33 PMOct 24, 2023 (1 month ago)
Jason 12:24 AM
https://recipe-search.typesense.org/?r%5Bquery%5D=Oregano
Jason 12:27 AM
What's the CPU architecture, clock speed of the CPU, disk type (is it an SSD or magnetic disks), are you using Docker or running natively, are there any load balancers in front, etc?
Jason 04:22 PM
curl -s '' \
-d '
{
"searches": [
{
"query_by": "title",
"collection": "r",
"q": "Oregano"
},
{
"query_by": "title",
"collection": "r",
"q": "Pizza"
},
{
"query_by": "title",
"collection": "r",
"q": "Chilli"
},
{
"query_by": "title",
"collection": "r",
"q": "Pineapple"
},
{
"query_by": "title",
"collection": "r",
"q": "Artichoke"
}
]
}' | jq '.results[].search_time_ms'
1
16
1
12
6
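The five numbers above are the per-search `search_time_ms` values that the `jq '.results[].search_time_ms'` filter extracts from the multi_search response. A quick way to summarize them, using the same values:

```python
# Per-search timings reported by the multi_search response above (ms).
times_ms = [1, 16, 1, 12, 6]

avg_ms = sum(times_ms) / len(times_ms)
print(f"avg={avg_ms:.1f}ms max={max(times_ms)}ms")  # avg=7.2ms max=16ms
```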
04:27 PMJason
04:27 PMJason
04:27 PMA dataset containing 2.2 Million recipes (recipe names and ingredients):
Took up about 900MB of RAM when indexed in Typesense
Took 3.6mins to index all 2.2M records
On a server with 4vCPUs, Typesense was able to handle a concurrency of 104 concurrent search queries per second, with an average search processing time of 11ms.
Jason 07:52 PM
• 2.2M recipes
• Running on a 4vCPU server (single node).
• 5 searches per multi_search request, similar to the curl request above.
Results:
• Up to around 84 multi_search requests per second (which translates to 84 * 5 = 420 searches per second); search_time_ms avg is 6ms, max is 34ms.
• After that, CPU is exhausted on the Typesense server (100% CPU usage across all cores), and only then does search_time_ms spike to about 55ms.
Adding more CPUs will help increase concurrency beyond that if needed.
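The arithmetic behind the saturation figure above, spelled out:

```python
# Back-of-the-envelope math from the benchmark: each multi_search
# request bundles 5 individual searches, so request throughput
# multiplies out to search throughput.
multi_search_rps = 84      # multi_search requests/sec at CPU saturation
searches_per_request = 5
print(multi_search_rps * searches_per_request)  # 420 searches/sec
```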
Jason 08:00 PM
You could also enable server-side caching in Typesense (use_cache: true). The search_time_ms will be cached as well though, so if you use caching, you want to look at the full HTTP response time.
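Since `search_time_ms` is itself cached, the full HTTP round-trip becomes the number to watch once caching is on. A small helper sketch; the `send_fn` callable is a placeholder (an assumption, not from the thread) standing in for whatever HTTP client actually posts the multi_search request:

```python
import time

def timed_request(send_fn, payload):
    """Call send_fn(payload) and return (response, elapsed_ms).

    With use_cache: true, search_time_ms reflects the cached value,
    so this wall-clock elapsed time is the more honest metric.
    """
    start = time.perf_counter()
    response = send_fn(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return response, elapsed_ms

# Example with a stand-in sender that returns a canned response:
resp, ms = timed_request(lambda p: {"results": []}, {"searches": []})
print(resp, ms >= 0.0)  # {'results': []} True
```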
Jason 08:04 PM
Depending on the placement / usage of the search feature, if there are X users on a site / app, I've typically seen that translate to 20% of X searches per second, given that not all users are searching at the exact same second.
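That rule of thumb can be written down directly; the 20% ratio is Jason's observed heuristic for capacity planning, not a guarantee:

```python
def expected_searches_per_sec(concurrent_users, ratio=0.20):
    """Rough estimate: ~20% of concurrent users issue a search each second."""
    return concurrent_users * ratio

print(expected_searches_per_sec(500))  # 100.0
```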
08:16 PMTypesense
Indexed 3015 threads (79% resolved)
Similar Threads
Discussion on Performance and Scalability for Multiple Term Search
Bill asked about the best way to run multi-term searches in a recommendation system he developed. Kishore Nallan suggested using embeddings with a remote embedder, or storing and averaging vectors. Despite testing several suggested solutions, Bill continued to face performance issues, leaving questions about scalability and recommendation-system performance unresolved.
Utilizing Vector Search and Word Embeddings for Comprehensive Search in Typesense
Bill sought clarification on using vector search with multiple word embeddings in Typesense and using them instead of OpenAI's embedding. Kishore Nallan and Jason informed him that their development version 0.25 supports open source embedding models. They also resolved Bill's concerns regarding search performance, language support, and limitations in the search parameters.
Integrating Semantic Search with Typesense
Krish wanted to integrate semantic search functionality with Typesense but struggled with its limitations. Kishore Nallan provided resources, clarifications, and workarounds for the issues raised.
Understanding Indexing and Search-As-You-Type In Typesense
Steven had queries about indexing and search-as-you-type in Typesense. Jason clarified that bulk updates are faster and search-as-you-type is resource intensive but worth it. The discussion also included querying benchmarks and Typesense's drop_tokens_threshold parameter, with participation from bnfd.
Discussing Typesense Search Request Performance
Al experienced longer-than-reported times for Typesense search requests, sparking a detailed examination of json parsing, response times and data transfer. Jason and Kishore Nallan helped solve the issue.