Utilizing Vector Search and Word Embeddings for Comprehensive Search in Typesense
TLDR Bill sought clarification on using vector search with multiple word embeddings in Typesense, and on using open-source embedding models instead of OpenAI's embeddings. Kishore Nallan and Jason told him that version 0.25, then in development, supports open-source embedding models. They also resolved Bill's concerns about search performance, language support, and limitations in the search parameters.
Jul 11, 2023
Bill
07:26 PM
Bill
07:26 PM
Jason
07:26 PM
Bill
07:27 PM
Jason
07:27 PM
per_page and page might work, but it's not going to go off of vector distance.
Bill
07:27 PM
Jason
07:28 PM
k parameter in vector_query, but that runs into the same issue: we need to add support for both.
Bill
07:28 PM
Bill
07:29 PM
Jason
07:30 PM
Jason
07:30 PM
Bill
07:31 PM
Bill
07:37 PM
Bill
08:45 PM
Jason
09:02 PM
filter_by instead of, or in combination with, vector search.
Bill
09:47 PM
Bill
09:48 PM
{
  "searches": [
    {
      "q": "internet",
      "collection": "products",
      "query_by": "embedding",
      "exclude_fields": "embedding",
      "prefix": false,
      "per_page": 250
    }
  ]
}
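The request Bill posts is a Typesense multi_search body that runs a semantic search by pointing query_by at the auto-embedding field. The sketch below builds the same body with only the Python standard library and wraps it in a POST to the /multi_search endpoint without sending it. It assumes a local node at http://localhost:8108 and a placeholder API key "xyz"; both are hypothetical. Note that prefix has to serialize as a plain JSON boolean, not as formatted text.

```python
import json
import urllib.request

# Hypothetical connection details; replace with your own node and key.
TYPESENSE_URL = "http://localhost:8108"
API_KEY = "xyz"


def build_multi_search_request(query: str, collection: str = "products") -> urllib.request.Request:
    """Build, but do not send, a POST /multi_search request for a semantic query."""
    body = {
        "searches": [
            {
                "q": query,
                "collection": collection,
                "query_by": "embedding",        # the auto-embedding field from the schema
                "exclude_fields": "embedding",  # keep the raw vectors out of the response
                "prefix": False,                # json.dumps writes this as false
                "per_page": 250,
            }
        ]
    }
    return urllib.request.Request(
        f"{TYPESENSE_URL}/multi_search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-TYPESENSE-API-KEY": API_KEY,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_multi_search_request("internet")
    print(req.get_method(), req.full_url)
    # Against a running server you would send it with:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it lets you inspect the exact JSON before it reaches the server.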
Jason
09:49 PM
q parameter is used for vector searches.
Bill
09:50 PM
{
  "searches": [
    {
      "q": "*",
      "collection": "products",
      "filter_by": "title:= internet || title:= coffee || title:= cars",
      "query_by": "embedding",
      "exclude_fields": "embedding",
      "prefix": false,
      "per_page": 250
    }
  ]
}
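Bill's second request uses a wildcard q ("*") and chains exact-match clauses with ||. The hypothetical helper below produces the equivalent filter_by string from a list of titles. It assumes Typesense's list form of the exact-match operator (field:=[a, b]), with backticks around each value so that commas inside a value are not read as separators.

```python
def build_exact_filter(field: str, values: list[str]) -> str:
    """Return a filter_by clause that matches any of `values` exactly on `field`.

    Assumes Typesense's list form of the exact-match operator (field:=[a, b]).
    Each value is wrapped in backticks so commas inside a value are not read as
    separators; values that themselves contain backticks are not handled here.
    """
    if not values:
        raise ValueError("at least one value is required")
    quoted = ", ".join(f"`{value}`" for value in values)
    return f"{field}:=[{quoted}]"


if __name__ == "__main__":
    filter_by = build_exact_filter("title", ["internet", "coffee", "cars"])
    print(filter_by)  # title:=[`internet`, `coffee`, `cars`]
```

The list form avoids repeating the field name for every value. The thread does not say whether it performs differently from the || chain.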
Bill
09:50 PM
Bill
09:50 PM
Bill
09:50 PM
Jason
09:51 PM
Bill
09:51 PM
Jason
09:52 PM
Bill
09:52 PM
Bill
09:53 PM
Jason
09:54 PM
Jason
09:54 PM
Bill
09:56 PM
Bill
10:53 PM
Jason
10:55 PM
Jason
10:57 PM
Bill
11:38 PM
Jul 12, 2023
Jason
01:57 AM
Bill
10:06 AM
Kishore Nallan
03:42 PM
I have a fix for the distance_threshold param in typesense/typesense:0.25.0.rc48. You can now pass it like this:
'vector_query': 'vec:([], distance_threshold: 0.25)'
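Kishore's fix passes distance_threshold inside the vector_query string, and Jason mentioned the k parameter in vector_query earlier in the thread. The hypothetical helper below only formats that string. It assumes the field:([values], param: value, ...) shape shown in the message. An empty vector reproduces the "[]" form, where the query vector is presumably derived from the text in q.

```python
def build_vector_query(field, vector=None, k=None, distance_threshold=None):
    """Format a vector_query string such as 'vec:([], distance_threshold: 0.25)'.

    An empty `vector` produces "[]", the form used in the thread when the query
    vector is presumably derived from the text in `q`.
    """
    values = ", ".join(repr(float(x)) for x in (vector or []))
    parts = [f"[{values}]"]
    if k is not None:
        parts.append(f"k: {int(k)}")
    if distance_threshold is not None:
        parts.append(f"distance_threshold: {distance_threshold}")
    return f"{field}:({', '.join(parts)})"


if __name__ == "__main__":
    print(build_vector_query("vec", distance_threshold=0.25))
    print(build_vector_query("embedding", [0.1, 0.2], k=100))
```

The resulting string would go under the "vector_query" key of a search request, next to q and query_by.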
Kishore Nallan
03:43 PM
Bill
07:26 PM
Jul 13, 2023
Kishore Nallan
01:23 AM
Bill
09:18 AM
Kishore Nallan
09:20 AM
Bill
10:16 AM
Kishore Nallan
10:55 AM
Bill
10:59 AM
Kishore Nallan
11:01 AM
Bill
11:07 AM
Bill
11:09 AM
Kishore Nallan
11:12 AM
Bill
11:12 AM
Bill
01:48 PM
Kishore Nallan
01:50 PM
Bill
01:50 PM
{
  "model_md5": "728d3db98e1b7a691a731644867382c5",
  "vocab_file_name": "sentencepiece.bpe.model",
  "vocab_md5": "bf25eb5120ad92ef5c7d8596b5dc4046",
  "model_type": "xlm_roberta"
}
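This config.json for the custom ONNX model lists model_md5 and vocab_md5 next to the vocabulary file name. The sketch below computes those digests before writing config.json. It rests on two assumptions not stated in the thread: that these fields are the hex MD5 digests of the model and vocabulary files, and that the model file is named model.onnx.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks so large models fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_model_config(model_dir: Path,
                       model_file: str = "model.onnx",  # assumed file name
                       vocab_file: str = "sentencepiece.bpe.model",
                       model_type: str = "xlm_roberta") -> dict:
    """Assemble a config.json dict in the shape shown in the thread."""
    return {
        "model_md5": file_md5(model_dir / model_file),
        "vocab_file_name": vocab_file,
        "vocab_md5": file_md5(model_dir / vocab_file),
        "model_type": model_type,
    }


if __name__ == "__main__":
    # Demo on throwaway files; point model_dir at your real model folder instead.
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "model.onnx").write_bytes(b"fake model bytes")
        (d / "sentencepiece.bpe.model").write_bytes(b"fake vocab bytes")
        print(json.dumps(build_model_config(d), indent=2))
```

Recomputing the digests whenever the model or vocabulary file changes keeps config.json consistent with the files next to it.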
Bill
01:50 PM
{
"_name_or_path": "intfloat/multilingual-e5-large",
"architectures": [
"XLMRobertaModel"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"classifier_dropout": null,
"eos_token_id": 2,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "xlm-roberta",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"output_past": true,
"pad_token_id": 1,
"position_embedding_type": "absolute",
"transformers_version": "4.30.2",
"type_vocab_size": 1,
"use_cache": true,
"vocab_size": 250002
}
Bill
01:51 PM
{
  "vocab_file_name": "sentencepiece.bpe.model",
  "model_type": "xlm_roberta"
}
But I get this error: "message": "Failed to download model file"
Kishore Nallan
01:52 PM
Bill
01:52 PM
{
"name": "productsNew",
"fields": [
{
"name": "product_name",
"type": "string"
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": [
"product_name"
],
"model_config": {
"model_name": "ts/multilingual-e5-large"
}
}
}
]
}
Bill
01:52 PM
Jul 14, 2023
Kishore Nallan
05:48 AM
Bill
09:54 AM
Bill
11:40 AM
Kishore Nallan
11:46 AM
Kishore Nallan
11:47 AM
Bill
11:47 AM
Bill
11:47 AM
Kishore Nallan
11:48 AM
Bill
11:48 AM
Bill
11:48 AM
Bill
11:48 AM
Kishore Nallan
11:49 AM
Bill
11:49 AM
Bill
11:51 AM
Bill
12:01 PM
Kishore Nallan
12:05 PM
Bill
12:05 PM
Kishore Nallan
12:31 PM
Bill
12:43 PM
Bill
12:44 PM
Bill
12:45 PM
Kishore Nallan
12:48 PM
Bill
12:48 PM
Bill
12:51 PM
Bill
12:51 PM
Kishore Nallan
12:53 PM
Bill
01:24 PM
Kishore Nallan
01:48 PM
Kishore Nallan
01:48 PM
Bill
02:01 PM
Bill
02:33 PM
vector_distance in the schema for sorting
Bill
02:34 PM
Kishore Nallan
02:37 PM
_vector_distance:desc
Jul 19, 2023
Bill
08:46 PM
{
  "results": [
    {
      "code": 500,
      "error": "Request timed out."
    }
  ]
}
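The timed-out request comes back as a multi_search response whose results array holds a per-search code and error instead of hits. The sketch below is a minimal client-side check for such entries. The helper name is hypothetical, and it assumes results appear in the same order as the searches that were sent.

```python
import json


def failed_searches(response: dict) -> list[tuple[int, int, str]]:
    """List (index, code, error) for each search in a multi_search response that failed.

    Entries in response["results"] are assumed to line up with the "searches"
    that were sent; an entry carrying an "error" key returned no hits.
    """
    failures = []
    for index, result in enumerate(response.get("results", [])):
        if "error" in result:
            failures.append((index, result.get("code", 0), result["error"]))
    return failures


if __name__ == "__main__":
    raw = '{"results": [{"code": 500, "error": "Request timed out."}]}'
    print(failed_searches(json.loads(raw)))  # [(0, 500, 'Request timed out.')]
```

A caller could use the returned indices to retry or log only the searches that failed. The thread does not say what ultimately resolved the timeout.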
Bill
08:52 PM
Jul 20, 2023
Bill
02:22 PM
Kishore Nallan
02:33 PM
Bill
02:43 PM
Kishore Nallan
02:46 PM
Bill
02:46 PM
Similar Threads
Discussion on Performance and Scalability for Multiple Term Search
Bill asks about the best way to run multi-term searches in a recommendation system they developed. Kishore Nallan suggested using embeddings with a remote embedder, or storing and averaging vectors. Despite testing several of the suggested solutions, Bill continued to face performance issues, and the discussion of scalability and recommendation-system performance remained unresolved.
Integrating Semantic Search with Typesense
Krish wants to integrate a semantic search functionality with typesense but struggles with the limitations. Kishore Nallan provides resources, clarifications and workarounds to the raised issues.
Announcement: General Availability of Typesense v0.25.0
Jason announces release of Typesense v0.25.0, listing new features. Users express excitement and ask pertinent questions. Gorkem, Manuel, and Daniel commend the team for the new functionalities. Manish and Tugay share their positive experiences with Typesense. Jason and Kishore Nallan answer questions and thank users for their feedback.
Phrase Search Relevancy and Weights Fix
Jan reported an issue with phrase search relevancy using Typesense Instantsearch Adapter. The problem occurred when searching phrases with double quotes. The team identified the issue to be related to weights and implemented a fix, improving the search results.
Resolving Multilingual Search Function in Typesense Software
Bill is having difficulty with multilingual search functionality in Typesense software. Developer Kishore Nallan suggested setting a language locale and provided a demo build. The build solution had some issues, and after multiple rounds of software updates and troubleshooting, the problem still persists.