Bill
07/10/2023, 1:24 PMKishore Nallan
07/10/2023, 1:25 PMBill
07/10/2023, 1:28 PMKishore Nallan
07/10/2023, 3:02 PMKishore Nallan
07/10/2023, 3:03 PMKishore Nallan
07/10/2023, 3:03 PMBill
07/10/2023, 3:22 PMJason Bosco
07/10/2023, 3:53 PM0.25.0.rc47
Bill
07/10/2023, 3:54 PMJason Bosco
07/10/2023, 4:20 PMBill
07/10/2023, 5:33 PMBill
07/10/2023, 5:39 PMJason Bosco
07/10/2023, 6:03 PM"Is there any other issue apart from OpenAI?" None have been reported. Otherwise, the build has been stable for the other users running it.
Jason Bosco
07/10/2023, 6:03 PM"In addition, will GPU support be added in v0.25 or in the next release?" It's already in 0.25.
Bill
07/10/2023, 6:04 PMJason Bosco
07/10/2023, 6:04 PMBill
07/10/2023, 6:05 PMJason Bosco
07/10/2023, 6:07 PMJason Bosco
07/10/2023, 6:10 PM.so
file in the same directory as the typesense binary)Bill
07/10/2023, 6:11 PMJason Bosco
07/10/2023, 6:13 PMJason Bosco
07/10/2023, 6:14 PMBill
07/10/2023, 6:15 PMJason Bosco
07/10/2023, 6:18 PMJason Bosco
07/10/2023, 6:18 PMBill
07/10/2023, 7:21 PMBill
07/10/2023, 7:22 PMJason Bosco
07/10/2023, 7:22 PMJason Bosco
07/10/2023, 7:22 PMJason Bosco
07/10/2023, 7:23 PMJason Bosco
07/10/2023, 7:25 PMBill
07/10/2023, 7:49 PMBill
07/10/2023, 8:04 PMJason Bosco
07/10/2023, 8:36 PMBill
07/10/2023, 8:38 PMJason Bosco
07/11/2023, 3:24 AMBill
07/11/2023, 11:00 AMBill
07/11/2023, 12:12 PMKishore Nallan
07/11/2023, 12:16 PMKishore Nallan
07/11/2023, 12:17 PMxlm_roberta
Bill
07/11/2023, 12:17 PMBill
07/11/2023, 12:17 PMKishore Nallan
07/11/2023, 12:18 PMBill
07/11/2023, 12:18 PMBill
07/11/2023, 12:19 PMKishore Nallan
07/11/2023, 12:19 PMparaphrase-multilingual-mpnet-base-v2, you just need to do this:
curl -k "http://localhost:8108/collections" -X POST -H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -d '{
"name": "titles",
"fields": [
{
"name": "title",
"type": "string"
},
{
"name": "points",
"type": "int32"
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": [
"title"
],
"model_config": {
"model_name": "ts/paraphrase-multilingual-mpnet-base-v2"
}
}
}
]
}'
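For anyone who would rather script this than use curl, the same schema can be assembled as a plain dictionary and serialized to JSON. This is only a minimal sketch: `build_auto_embedding_schema` is a hypothetical helper name, not part of any Typesense client, and it just builds the request body shown above. Sending it is left to whichever HTTP client you use.

```python
import json


def build_auto_embedding_schema(collection, source_fields, model_name):
    """Return a collection schema with one auto-embedding field.

    Hypothetical helper for illustration: it mirrors the JSON body of the
    curl command above and performs no network I/O.
    """
    return {
        "name": collection,
        "fields": [
            {"name": "title", "type": "string"},
            {"name": "points", "type": "int32"},
            {
                "name": "embedding",
                "type": "float[]",
                # "embed" names the source fields and the model to embed them with.
                "embed": {
                    "from": list(source_fields),
                    "model_config": {"model_name": model_name},
                },
            },
        ],
    }


schema = build_auto_embedding_schema(
    "titles", ["title"], "ts/paraphrase-multilingual-mpnet-base-v2"
)
# This string is what would go in the POST body to /collections.
body = json.dumps(schema, indent=2)
print(body)
```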
Kishore Nallan
07/11/2023, 12:20 PMxlm-roberta-base
is a plain masked language model. Such models must then be fine-tuned for a specific task, such as semantic search; they don't work well without that fine-tuning.
Kishore Nallan
07/11/2023, 12:21 PMKishore Nallan
07/11/2023, 12:21 PMdistilbert
Bill
07/11/2023, 12:23 PMKishore Nallan
07/11/2023, 12:25 PMKishore Nallan
07/11/2023, 12:26 PMparaphrase-multilingual-mpnet-base-v2
--> it might do what you want.Bill
07/11/2023, 12:26 PMBill
07/11/2023, 12:26 PMKishore Nallan
07/11/2023, 12:27 PMBill
07/11/2023, 12:29 PMKishore Nallan
07/11/2023, 12:29 PMBill
07/11/2023, 12:30 PMKishore Nallan
07/11/2023, 12:31 PMKishore Nallan
07/11/2023, 12:32 PMBill
07/11/2023, 12:34 PMKishore Nallan
07/11/2023, 12:34 PMBill
07/11/2023, 12:35 PMKishore Nallan
07/11/2023, 12:36 PMBill
07/11/2023, 12:40 PMKishore Nallan
07/11/2023, 12:42 PMBill
07/11/2023, 12:42 PMBill
07/11/2023, 12:43 PMKishore Nallan
07/11/2023, 12:47 PMvec:([0.3,0.4,0.5], distance_threshold:0.01)
Bill
07/11/2023, 12:49 PMKishore Nallan
07/11/2023, 12:49 PMdistance_threshold
we ignore records whose distance is greater than this threshold value.Bill
07/11/2023, 12:49 PMKishore Nallan
07/11/2023, 12:50 PMKishore Nallan
07/11/2023, 12:50 PMBill
07/11/2023, 12:51 PMvec
does not have a vector query index.Bill
07/11/2023, 12:52 PMBill
07/11/2023, 12:52 PMKishore Nallan
07/11/2023, 12:53 PMKishore Nallan
07/11/2023, 12:53 PMBill
07/11/2023, 1:05 PMBill
07/11/2023, 1:05 PMBill
07/11/2023, 1:05 PMBill
07/11/2023, 1:06 PMKishore Nallan
07/11/2023, 1:13 PMBill
07/11/2023, 1:14 PMKishore Nallan
07/11/2023, 1:38 PMBill
07/11/2023, 1:43 PMBill
07/11/2023, 1:44 PMBill
07/11/2023, 1:45 PMvec
does not have a vector query index." when I add the vector search in PayloadKishore Nallan
07/11/2023, 1:46 PMBill
07/11/2023, 1:46 PMKishore Nallan
07/11/2023, 1:47 PMBill
07/11/2023, 1:47 PM{
"name": "vec",
"type": "float[]",
"num_dim": 4
}
Bill
07/11/2023, 1:48 PMvec
has been declared in the schema, but is not found in the document." in collections/products/documents/import?action=createKishore Nallan
07/11/2023, 1:49 PMBill
07/11/2023, 1:49 PMBill
07/11/2023, 1:49 PM{
"name": "products",
"fields": [
{
"name": "title",
"type": "string",
"locale": "sr"
},
{
"name": "vec",
"type": "float[]",
"num_dim": 4
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": [
"title"
],
"model_config": {
"model_name": "ts/paraphrase-multilingual-mpnet-base-v2"
}
}
}
]
}
Kishore Nallan
07/11/2023, 1:49 PMKishore Nallan
07/11/2023, 1:50 PMBill
07/11/2023, 1:50 PMBill
07/11/2023, 1:51 PMJason Bosco
07/11/2023, 3:09 PMJason Bosco
07/11/2023, 3:10 PM{
"name": "products",
"fields": [
{
"name": "title",
"type": "string",
"locale": "sr"
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": [
"title"
],
"model_config": {
"model_name": "ts/paraphrase-multilingual-mpnet-base-v2"
}
}
}
]
}
Bill
07/11/2023, 3:11 PMJason Bosco
07/11/2023, 3:12 PMJason Bosco
07/11/2023, 3:12 PMembed
key in the field definition will generate and store the embeddingsBill
07/11/2023, 3:12 PMJason Bosco
07/11/2023, 3:12 PMname: embedding
to name: vec
if you needBill
07/11/2023, 3:13 PMJason Bosco
07/11/2023, 3:13 PMKishore Nallan
07/11/2023, 3:14 PMvector_query
at all.Kishore Nallan
07/11/2023, 3:15 PMquery_by=embedding
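Going by the fragments above (an auto-embedding field, no `vector_query`, and `query_by=embedding`), a semantic search request would carry only the query text and the name of the embedding field. A sketch of the resulting URL, assuming the `products` collection from the schema above and the localhost address used elsewhere in this thread; `build_search_url` is a hypothetical helper, and no request is actually sent:

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, q, query_by):
    """Build a search URL that relies on the auto-embedding field only."""
    # No vector_query here: the query text is passed in q and the
    # embedding field is named in query_by.
    params = {"q": q, "query_by": query_by}
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


url = build_search_url("http://localhost:8108", "products", "internet", "embedding")
print(url)
```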
Bill
07/11/2023, 3:15 PMBill
07/11/2023, 3:17 PMKishore Nallan
07/11/2023, 3:17 PMKishore Nallan
07/11/2023, 3:18 PMKishore Nallan
07/11/2023, 3:19 PMBill
07/11/2023, 3:19 PMBill
07/11/2023, 3:20 PMKishore Nallan
07/11/2023, 3:20 PM[ ]
works; otherwise, I will have to address this use case, which we might not have accounted for.
Bill
07/11/2023, 3:20 PMBill
07/11/2023, 7:06 PMembedding
must have 768 dimensions.". Model: paraphrase-multilingual-mpnet-base-v2Jason Bosco
07/11/2023, 7:23 PM"vector_query": "embedding:([ ], distance_threshold:0.01)"
Bill
07/11/2023, 7:24 PMid
parameter must be present."Jason Bosco
07/11/2023, 7:25 PMdistance_threshold
when used with auto-embedding.
Could you create a GitHub issue using this template with a set of curl commands that replicates the issue?Bill
07/11/2023, 7:26 PMBill
07/11/2023, 7:26 PMJason Bosco
07/11/2023, 7:26 PMBill
07/11/2023, 7:27 PMJason Bosco
07/11/2023, 7:27 PMper_page and page might work, but the result won't be based on vector distance.
Bill
07/11/2023, 7:27 PMJason Bosco
07/11/2023, 7:28 PMk
parameter in vector_query, but that runs into the same issue - we need to add support for bothBill
07/11/2023, 7:28 PMBill
07/11/2023, 7:29 PMJason Bosco
07/11/2023, 7:30 PMJason Bosco
07/11/2023, 7:30 PMBill
07/11/2023, 7:31 PMBill
07/11/2023, 7:37 PMJason Bosco
07/11/2023, 7:37 PMBill
07/11/2023, 7:50 PMBill
07/11/2023, 8:45 PMJason Bosco
07/11/2023, 9:02 PMfilter_by
instead of / or in combination with vector searchBill
07/11/2023, 9:47 PMBill
07/11/2023, 9:48 PM{
"searches": [
{
"q": "internet",
"collection": "products",
"query_by": "embedding",
"exclude_fields": "embedding",
"prefix": false,
"per_page": 250
}
]
}
Jason Bosco
07/11/2023, 9:49 PMq
parameter is used for vector searchesBill
07/11/2023, 9:50 PM{
"searches": [
{
"q": "*",
"collection": "products",
"filter_by": "title:= internet || title:= coffee || title:= cars",
"query_by": "embedding",
"exclude_fields": "embedding",
"prefix": false,
"per_page": 250
}
]
}
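The two payloads above differ only in `q` and `filter_by`. A sketch of how the second one could be assembled programmatically: `exact_title_filter` and `build_multi_search` are hypothetical helpers, and the `title:= ... || ...` filter syntax is copied from Bill's payload rather than offered as the only valid form. Note that `prefix` is a plain JSON boolean.

```python
import json


def exact_title_filter(titles):
    """Join exact-match title clauses with ||, as in the payload above."""
    return " || ".join(f"title:= {t}" for t in titles)


def build_multi_search(collection, q, titles=None, per_page=250):
    """Build a multi_search body with a single search entry."""
    search = {
        "q": q,
        "collection": collection,
        "query_by": "embedding",
        "exclude_fields": "embedding",
        "prefix": False,  # serialized as the JSON literal false
        "per_page": per_page,
    }
    if titles:
        search["filter_by"] = exact_title_filter(titles)
    return {"searches": [search]}


payload = build_multi_search("products", "*", ["internet", "coffee", "cars"])
print(json.dumps(payload, indent=2))
```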
Bill
07/11/2023, 9:50 PMBill
07/11/2023, 9:50 PMBill
07/11/2023, 9:50 PMJason Bosco
07/11/2023, 9:51 PMBill
07/11/2023, 9:51 PMJason Bosco
07/11/2023, 9:52 PMBill
07/11/2023, 9:52 PMBill
07/11/2023, 9:53 PMJason Bosco
07/11/2023, 9:54 PMJason Bosco
07/11/2023, 9:54 PMBill
07/11/2023, 9:56 PMBill
07/11/2023, 10:53 PMJason Bosco
07/11/2023, 10:55 PMJason Bosco
07/11/2023, 10:57 PMBill
07/11/2023, 11:38 PMJason Bosco
07/12/2023, 1:57 AMBill
07/12/2023, 10:06 AMKishore Nallan
07/12/2023, 3:42 PMtypesense/typesense:0.25.0.rc48.
You can now pass it like this:
'vector_query': 'vec:([], distance_threshold: 0.25)'
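To make the `distance_threshold` semantics concrete, here is a small sketch. `format_vector_query` builds the string form shown above (an empty vector list for an auto-embedding field), and `within_threshold` mimics locally the rule described earlier in the thread: records whose distance is greater than the threshold are ignored. Both function names and the sample distances are made up for illustration; the real filtering happens inside the server.

```python
def format_vector_query(field, distance_threshold, vector=()):
    """Format a vector_query string; an empty vector yields `[]`."""
    values = ",".join(str(v) for v in vector)
    return f"{field}:([{values}], distance_threshold:{distance_threshold})"


def within_threshold(hits, distance_threshold):
    """Keep only hits whose distance does not exceed the threshold."""
    return [h for h in hits if h["distance"] <= distance_threshold]


print(format_vector_query("vec", 0.25))
print(format_vector_query("vec", 0.01, [0.3, 0.4, 0.5]))

# Made-up distances, purely to illustrate the cut-off.
sample_hits = [
    {"id": "a", "distance": 0.10},
    {"id": "b", "distance": 0.25},
    {"id": "c", "distance": 0.40},
]
kept = within_threshold(sample_hits, 0.25)
print([h["id"] for h in kept])
```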
Kishore Nallan
07/12/2023, 3:43 PMBill
07/12/2023, 7:26 PMKishore Nallan
07/13/2023, 1:23 AMBill
07/13/2023, 9:18 AMKishore Nallan
07/13/2023, 9:20 AMBill
07/13/2023, 10:16 AMKishore Nallan
07/13/2023, 10:55 AMBill
07/13/2023, 10:59 AMKishore Nallan
07/13/2023, 11:01 AMBill
07/13/2023, 11:07 AMBill
07/13/2023, 11:09 AMKishore Nallan
07/13/2023, 11:12 AMBill
07/13/2023, 11:12 AMBill
07/13/2023, 1:48 PMKishore Nallan
07/13/2023, 1:50 PMBill
07/13/2023, 1:50 PMBill
07/13/2023, 1:50 PMBill
07/13/2023, 1:51 PMKishore Nallan
07/13/2023, 1:52 PMBill
07/13/2023, 1:52 PMBill
07/13/2023, 1:52 PMKishore Nallan
07/14/2023, 5:48 AMBill
07/14/2023, 9:54 AMBill
07/14/2023, 11:40 AMKishore Nallan
07/14/2023, 11:46 AMKishore Nallan
07/14/2023, 11:47 AMBill
07/14/2023, 11:47 AMBill
07/14/2023, 11:47 AMKishore Nallan
07/14/2023, 11:48 AMBill
07/14/2023, 11:48 AMBill
07/14/2023, 11:48 AMBill
07/14/2023, 11:48 AMKishore Nallan
07/14/2023, 11:49 AMBill
07/14/2023, 11:49 AMBill
07/14/2023, 11:51 AMBill
07/14/2023, 12:01 PMKishore Nallan
07/14/2023, 12:05 PMBill
07/14/2023, 12:05 PMKishore Nallan
07/14/2023, 12:31 PMBill
07/14/2023, 12:43 PMBill
07/14/2023, 12:44 PMBill
07/14/2023, 12:45 PMKishore Nallan
07/14/2023, 12:48 PMBill
07/14/2023, 12:48 PMBill
07/14/2023, 12:51 PMBill
07/14/2023, 12:51 PMKishore Nallan
07/14/2023, 12:53 PMBill
07/14/2023, 1:24 PMKishore Nallan
07/14/2023, 1:48 PMKishore Nallan
07/14/2023, 1:48 PMBill
07/14/2023, 2:01 PMBill
07/14/2023, 2:33 PMvector_distance
in the schema for sorting"Bill
07/14/2023, 2:34 PMKishore Nallan
07/14/2023, 2:37 PM_vector_distance:desc
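The error fragment above mentions `vector_distance`, while the reply uses `_vector_distance`, with a leading underscore. A sketch of composing a `sort_by` value that combines it with a regular field: `build_sort_by` is a hypothetical helper, and `points` is borrowed from the `titles` schema earlier in the thread as an example second sort key.

```python
def build_sort_by(*clauses):
    """Join (field, order) pairs into a comma-separated sort_by string."""
    parts = []
    for field, order in clauses:
        if order not in ("asc", "desc"):
            raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
        parts.append(f"{field}:{order}")
    return ",".join(parts)


sort_by = build_sort_by(("_vector_distance", "desc"), ("points", "desc"))
print(sort_by)
```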
Bill
07/19/2023, 8:46 PMBill
07/19/2023, 8:52 PMBill
07/20/2023, 2:22 PMKishore Nallan
07/20/2023, 2:33 PMBill
07/20/2023, 2:43 PMKishore Nallan
07/20/2023, 2:46 PMBill
07/20/2023, 2:46 PMKishore Nallan
07/21/2023, 10:09 AMe5-small
model and an auto-embedding field.
2. Indexed 100K documents into the collection
3. Once the import was done, I confirmed the collection document count and also noted the memory usage via the /metrics.json end-point
4. Stopped and started Typesense server
5. I'm hitting the metrics end-point and the documents are getting indexed and memory usage is increasing.Bill
07/21/2023, 11:03 AMKishore Nallan
07/21/2023, 11:09 AMKishore Nallan
07/21/2023, 11:09 AMBill
07/21/2023, 11:10 AMBill
07/21/2023, 11:12 AMBill
07/21/2023, 11:12 AMBill
07/21/2023, 11:13 AMKishore Nallan
07/21/2023, 11:16 AMBill
07/21/2023, 11:16 AMKishore Nallan
07/21/2023, 11:17 AMBill
07/21/2023, 11:17 AMKishore Nallan
07/21/2023, 2:28 PMKishore Nallan
07/21/2023, 2:58 PMKishore Nallan
07/21/2023, 3:00 PMBill
07/21/2023, 3:00 PMKishore Nallan
07/21/2023, 3:09 PMBill
07/21/2023, 3:10 PMKishore Nallan
07/21/2023, 3:10 PMBill
07/21/2023, 3:11 PMKishore Nallan
07/21/2023, 3:11 PMBill
07/21/2023, 3:11 PMBill
07/21/2023, 3:14 PMKishore Nallan
07/21/2023, 3:30 PMLoading model from disk: /tmp/data/models/paraphrase-multilingual-mpnet-base-v2/model.onnx
This is the step that increases the memory usage. Do you see this log line before making a search request?
Kishore Nallan
07/21/2023, 3:30 PMBill
07/21/2023, 3:30 PMKishore Nallan
07/21/2023, 3:31 PMBill
07/21/2023, 3:32 PMBill
07/21/2023, 3:32 PMKishore Nallan
07/21/2023, 3:33 PMI20230721 21:02:24.300542 130060 typesense_server_utils.cpp:331] Starting Typesense 0.25.0.rc53
...
I20230721 21:02:39.070822 130064 text_embedder.cpp:21] Loading model from disk: /tmp/data/models/paraphrase-multilingual-mpnet-base-v2/model.onnx
Kishore Nallan
07/21/2023, 3:33 PMKishore Nallan
07/21/2023, 3:34 PMKishore Nallan
07/21/2023, 3:38 PMBill
07/21/2023, 3:43 PMBill
07/21/2023, 3:43 PMBill
07/21/2023, 3:51 PMI20230721 15:36:01.550168 26330 typesense_server_utils.cpp:331] Starting Typesense 0.25.0.rc53
I20230721 15:36:01.550246 26330 typesense_server_utils.cpp:334] Typesense is using jemalloc.
I20230721 15:36:01.550557 26330 typesense_server_utils.cpp:384] Thread pool size: 16
I20230721 15:36:01.553516 26330 store.h:64] Initializing DB by opening state dir: /var/lib/typesense/db
I20230721 15:36:01.571373 26330 store.h:64] Initializing DB by opening state dir: /var/lib/typesense/meta
..........
I20230721 15:36:01.631971 26469 raft_server.cpp:508] Loading collections from disk...
.....
I20230721 15:36:01.913641 26469 collection_manager.cpp:301] Loaded 2 collection(s).
I20230721 15:36:01.913944 26469 collection_manager.cpp:305] Initializing batched indexer from snapshot state...
I20230721 15:36:01.913995 26469 batched_indexer.cpp:446] Restored 0 in-flight requests from snapshot.
I20230721 15:36:01.914005 26469 raft_server.cpp:515] Finished loading collections from disk.
W20230721 15:36:01.914573 26460 raft_server.cpp:591] Multi-node with no leader: refusing to reset peers.
I20230721 15:36:01.983656 26470 raft_server.h:288] Node starts following { leader_id=1.112.0.2:8107:8108, term=74, status=Follower receives message from new leader with the same term.}
I20230721 15:36:11.920372 26460 raft_server.cpp:564] Term: 74, last_index index: 42907, committed_index: 42907, known_applied_index: 42907, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 113332
Kishore Nallan
07/21/2023, 3:52 PMBill
07/21/2023, 3:52 PM{
"results": [
{
"code": 500,
"error": "Request timed out."
}
]
}
Kishore Nallan
07/21/2023, 3:53 PMBill
07/21/2023, 3:54 PMI20230721 15:52:01.187388 26367 text_embedder.cpp:21] Loading model from disk: /var/lib/typesense/models/paraphrase-multilingual-mpnet-base-v2/model.onnx
Bill
07/21/2023, 3:54 PMKishore Nallan
07/24/2023, 4:15 AMBill
07/24/2023, 9:31 AMKishore Nallan
07/24/2023, 1:04 PM0.25.0.rc54
Bill
07/24/2023, 1:07 PMBill
07/24/2023, 1:15 PMKishore Nallan
07/24/2023, 1:16 PMBill
07/24/2023, 1:17 PMBill
07/24/2023, 1:17 PMBill
07/24/2023, 1:18 PMBill
07/24/2023, 1:19 PMKishore Nallan
07/24/2023, 1:19 PMBill
07/24/2023, 1:20 PMBill
07/24/2023, 1:23 PMKishore Nallan
07/24/2023, 1:34 PMKishore Nallan
07/25/2023, 9:30 AMhttp://localhost:8108/collections/docs/documents/search?q=the&query_by=title,embedding&x-typesense-api-key=abcd&sort_by=points:desc&include_fields=points&vector_query=embedding:([], distance_threshold:0.30)
It returns hits sorted by points in descending order, as expected.
Kishore Nallan
07/25/2023, 9:46 AMBill
07/25/2023, 10:42 AMKishore Nallan
07/25/2023, 10:43 AMKishore Nallan
07/25/2023, 10:44 AMBill
07/25/2023, 10:44 AMBill
07/25/2023, 10:44 AMKishore Nallan
07/25/2023, 10:45 AMBill
07/25/2023, 10:46 AMBill
07/25/2023, 10:46 AMKishore Nallan
07/25/2023, 10:47 AMKishore Nallan
07/25/2023, 10:47 AMKishore Nallan
07/25/2023, 10:47 AMKishore Nallan
07/25/2023, 10:48 AMBill
07/25/2023, 10:48 AMBill
07/25/2023, 10:56 AMBill
07/25/2023, 10:57 AMKishore Nallan
07/25/2023, 12:01 PMKishore Nallan
07/25/2023, 4:13 PMBill
07/26/2023, 8:59 AMKishore Nallan
07/26/2023, 11:35 AM0.25.0.rc56
I could not reproduce the issue with the default sorting field, but I wonder if that's because it was fixed by the other change.
Bill
07/26/2023, 2:02 PMKishore Nallan
07/26/2023, 2:05 PMBill
07/26/2023, 6:09 PMBill
07/27/2023, 11:49 AMKishore Nallan
07/27/2023, 11:55 AMBill
07/27/2023, 12:16 PMKishore Nallan
07/27/2023, 12:19 PMKishore Nallan
07/27/2023, 12:19 PMBill
07/27/2023, 12:23 PMKishore Nallan
07/27/2023, 12:23 PM