Aadarsh
03/06/2024, 7:06 PM
id, name, category. Do I explicitly need to mention a field named embedding?
I have added this embedding field in the step while creating the collection:
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": ["name", "category"],
"model_config": {"model_name": "all-MiniLM-L12-v2"},
},
}
But running this with and without the embedding field in the JSONL documents gives me the error: [Errno 400] Model file not found
Should I have the model stored in some directory within my project so that Typesense can access it?
Jason Bosco
03/06/2024, 7:27 PM
all-MiniLM-L12-v2 should be ts/all-MiniLM-L12-v2
Here's a step-by-step guide you should be able to copy-paste: https://typesense.org/docs/guide/tips-for-searching-common-types-of-data.html#long-pieces-of-text
Jason Bosco
03/06/2024, 7:28 PM
"Do I explicitly need to mention a field named embedding"
You do need an explicit field, but it can be named anything
Jason Bosco
03/06/2024, 7:28 PM
"Should I have the model stored in some directory within my project so that Typesense can access it?"
No, Typesense will automatically download the model for you. The issue here is a syntax error, see my first message above
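A minimal sketch, in Python, of the corrected field definition described in the replies above. The helper name `build_embedding_field` is hypothetical; it only assembles the schema dictionary and does not contact a Typesense server. It assumes that a bare model name (one with no namespace prefix) refers to a built-in model and therefore needs the `ts/` prefix.

```python
import json


def build_embedding_field(field_name, source_fields, model_name):
    """Return an auto-embedding field definition for a collection schema.

    Hypothetical helper, not part of any Typesense client.
    """
    # Assumption: a name without any "/" is a built-in model, which the
    # thread says must be written as ts/<model name>.
    if "/" not in model_name:
        model_name = "ts/" + model_name
    return {
        "name": field_name,  # the field can be named anything
        "type": "float[]",
        "embed": {
            "from": list(source_fields),
            "model_config": {"model_name": model_name},
        },
    }


field = build_embedding_field("embedding", ["name", "category"], "all-MiniLM-L12-v2")
print(json.dumps(field, indent=2))
```

The resulting dictionary can be placed in the "fields" list of the collection schema in place of the original definition.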
Aadarsh
03/06/2024, 7:59 PM
ts/ didn't help, still the same error.
Jason Bosco
03/06/2024, 8:05 PM
Jason Bosco
03/06/2024, 8:05 PM
Aadarsh
03/06/2024, 8:12 PM
{
"name": "TEST",
"fields": [
{
"name": "testId",
"type": "string",
"facet": true
},
{
"name": "name",
"type": "string",
"facet": true
},
{
"name": "category",
"type": "string",
"facet": true
},
{
"name": "userId",
"type": "string",
"facet": true,
"optional": true
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": [
"name",
"category"
],
"model_config": {
"model_name": "ts/all-MiniLM-L12-v2"
}
}
}
]
}
Aadarsh
03/06/2024, 8:13 PM
{
"testId": "1",
"name": "Test String",
"userId": "10",
"category": "test-category",
"embedding": []
}
Jason Bosco
03/06/2024, 8:16 PM
Jason Bosco
03/06/2024, 8:17 PM
export TYPESENSE_API_KEY=xyz
mkdir $(pwd)/typesense-data
docker run -p 8108:8108 \
-v$(pwd)/typesense-data:/data typesense/typesense:0.25.2 \
--data-dir /data \
--api-key=$TYPESENSE_API_KEY \
--enable-cors
export TYPESENSE_API_KEY=xyz
curl "http://localhost:8108/debug" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
curl "http://localhost:8108/collections" \
-X POST \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '
{
"name": "TEST",
"fields": [
{
"name": "testId",
"type": "string",
"facet": true
},
{
"name": "name",
"type": "string",
"facet": true
},
{
"name": "category",
"type": "string",
"facet": true
},
{
"name": "userId",
"type": "string",
"facet": true,
"optional": true
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": [
"name",
"category"
],
"model_config": {
"model_name": "ts/all-MiniLM-L12-v2"
}
}
}
]
}
'
Jason Bosco
03/06/2024, 8:17 PM
Aadarsh
03/06/2024, 8:18 PM
Aadarsh
03/06/2024, 8:25 PM
curl "http://localhost:8108/debug" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
{"state":1,"version":"0.25.2"}
Response to the collection schema creation curl
{
"message": "Model not found"
}
Aadarsh
03/06/2024, 8:27 PM
typesense 0.25.2 as well?
Jason Bosco
03/06/2024, 8:31 PM
Jason Bosco
03/06/2024, 8:31 PM
Aadarsh
03/06/2024, 8:32 PM
Jason Bosco
03/06/2024, 8:43 PM
0.26.0.rc62 of Typesense Server?
Aadarsh
03/07/2024, 5:20 AM
0.26.0.rc62 but I received an empty response.
### The response:
{
"results": [
{
"facet_counts": [],
"found": 0,
"hits": [],
"out_of": 0,
"page": 1,
"request_params": {
"collection_name": "TEST",
"first_q": "test",
"per_page": 10,
"q": "test"
},
"search_cutoff": false,
"search_time_ms": 8
}
]
}
### The collection schema
{
"name": "TEST",
"fields": [
{
"name": "testId",
"type": "string",
"facet": true
},
{
"name": "name",
"type": "string",
"facet": true
},
{
"name": "category",
"type": "string",
"facet": true
},
{
"name": "userId",
"type": "string",
"facet": true,
"optional": true
},
{
"name": "embedding",
"type": "float[]",
"embed": {
"from": [
"name",
"category"
],
"model_config": {
"model_name": "ts/all-MiniLM-L12-v2"
}
}
}
]
}
### SEARCH REQUESTS
{
"searches": [
{
"collection": "TEST",
"q": "test",
"sort_by": "_vector_distance:asc,_text_match:desc",
"prioritize_token_position": "true"
}
]
}
### COMMON SEARCH PARAMS
{
"query_by": "embedding,name,category"
}
### Sample Document from JSONL File
{
"testId": "1",
"name": "Test String",
"userId": "10",
"category": "test-category",
"embedding": []
}
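A sketch of one way to prepare documents for import, assuming that the empty "embedding": [] placeholder in the sample document is what triggers the "invalid embedding" import error reported later in the thread. That cause is an inference, not something the thread confirms. The helper names `prepare_document` and `to_jsonl` are hypothetical; the idea is to omit the auto-embedded field so Typesense can generate it from name and category.

```python
import json


def prepare_document(doc, auto_embedded_fields=("embedding",)):
    """Return a copy of doc without empty placeholders for auto-embedded fields.

    Hypothetical helper: a non-empty, user-supplied vector is left untouched.
    """
    return {
        key: value
        for key, value in doc.items()
        if not (key in auto_embedded_fields and not value)
    }


def to_jsonl(docs):
    """Serialize documents to JSONL, one JSON object per line."""
    return "\n".join(json.dumps(prepare_document(d)) for d in docs)


sample = {
    "testId": "1",
    "name": "Test String",
    "userId": "10",
    "category": "test-category",
    "embedding": [],
}
print(to_jsonl([sample]))
```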
Aadarsh
03/07/2024, 5:21 AM
"out_of": 0
Kishore Nallan
03/07/2024, 5:22 AM
Aadarsh
03/07/2024, 5:25 AM
Aadarsh
03/07/2024, 5:25 AM
Kishore Nallan
03/07/2024, 5:26 AM
Aadarsh
03/07/2024, 5:27 AM
['Field `embedding` contains an invalid embedding.']
Kishore Nallan
03/07/2024, 5:27 AM
Aadarsh
03/07/2024, 5:28 AM
Kishore Nallan
03/07/2024, 5:28 AM
Kishore Nallan
03/07/2024, 5:29 AM
Aadarsh
03/07/2024, 5:33 AM
0.25.2
Aadarsh
03/07/2024, 5:46 AM
Aadarsh
03/07/2024, 7:47 AM
laptop then it should return only documents containing the keyword laptop, plus documents containing related keywords like computer, sorted by the vector distance and rank fusion score. But currently it is returning all documents from the collection (even absolutely unrelated ones with no text match), sorted by vector distance
Kishore Nallan
03/07/2024, 7:50 AM
Aadarsh
03/07/2024, 7:52 AM
Kishore Nallan
03/07/2024, 7:53 AM
Aadarsh
03/07/2024, 8:05 AM
Kishore Nallan
03/07/2024, 8:19 AM
_text_match is already rank fusion score in hybrid search so you can just sort on _text_match:desc
Aadarsh
03/07/2024, 8:23 AM
{
"results": [
{
"facet_counts": [],
"found": 2,
"hits": [
{
"document": {
"category": "test-category",
"id": "2",
"name": "Gift laptop to your friend",
"testId": "1011",
"userId": "15"
},
"highlight": {
"name": {
"matched_tokens": [
"laptop"
],
"snippet": "Gift <mark>laptop</mark> to your friend"
}
},
"highlights": [
{
"field": "name",
"matched_tokens": [
"laptop"
],
"snippet": "Gift <mark>laptop</mark> to your friend"
}
],
"hybrid_search_info": {
"rank_fusion_score": 1
},
"text_match": 1060320051,
"text_match_info": {
"best_field_score": "517734",
"best_field_weight": 102,
"fields_matched": 3,
"score": "1060320051",
"tokens_matched": 0
},
"vector_distance": 0.39046764373779297
},
{
"document": {
"category": "test-category",
"id": "3",
"name": "Gift a gaming computer to your friend",
"testId": "10111",
"userId": "15"
},
"highlight": {},
"highlights": [],
"hybrid_search_info": {
"rank_fusion_score": 0.15000000596046448
},
"text_match": 0,
"text_match_info": {
"best_field_score": "0",
"best_field_weight": 0,
"fields_matched": 0,
"score": "0",
"tokens_matched": 0
},
"vector_distance": 0.5736579895019531
}
],
"out_of": 15,
"page": 1,
"request_params": {
"collection_name": "TEST",
"per_page": 10,
"q": "laptop"
},
"search_cutoff": false,
"search_time_ms": 5
}
]
}
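A worked sketch of how the two rank_fusion_score values in the response above could arise. It assumes the score is a weighted sum of reciprocal ranks, with a weight of 0.3 on the vector rank and 0.7 on the keyword rank. That formula and the name `rank_fusion_score` as a Python function are assumptions made for illustration; the thread itself does not state the formula, although it does reproduce both values shown (1 and roughly 0.15).

```python
def rank_fusion_score(text_rank, vector_rank, alpha=0.3):
    """Combine keyword and vector ranks into one score.

    Ranks are 1-based; pass None when a document has no match of that kind.
    alpha is the assumed weight given to the vector (semantic) rank.
    """
    text_part = (1 - alpha) / text_rank if text_rank else 0.0
    vector_part = alpha / vector_rank if vector_rank else 0.0
    return text_part + vector_part


# "Gift laptop to your friend": best keyword match and nearest vector.
laptop_doc = rank_fusion_score(text_rank=1, vector_rank=1)

# "Gift a gaming computer to your friend": no keyword match, second-nearest vector.
computer_doc = rank_fusion_score(text_rank=None, vector_rank=2)

print(laptop_doc, computer_doc)
```

Under this assumption, a document with no keyword match still receives a non-zero fusion score from its vector rank alone, which would be consistent with the "gaming computer" document appearing in the results.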
Kishore Nallan
03/07/2024, 8:26 AM
_text_match you are actually sorting on fusion score.
Aadarsh
03/07/2024, 8:27 AM
Aadarsh
03/07/2024, 5:28 PM
[Errno 404] Model not found on my other laptop, which doesn't have a GPU. Using both versions, RC62 and 0.25.2.
Jason Bosco
03/07/2024, 6:26 PM
0.26.0.rc62 and share the output of each command and also the Typesense logs from the beginning of the process start?