#community-help

Using Multilingual-e5-Base Model in Huggingface

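TLDR Bill asked how to configure the multilingual-e5-base model in a Typesense collection and whether the "query:" or "passage:" prefix is required. Kishore Nallan advised that Typesense applies the prefixes automatically, so only model_name needs to be set. Bill then hit a "Bad request." error when creating the collection, which he later resolved, and Kishore Nallan mentioned that it was fixed in a recent 0.25.1 RC build.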

Solved
Sep 06, 2023 (3 weeks ago)
Bill
01:15 PM
Hi, is the multilingual-e5-base model that you added on Hugging Face better/faster than the paraphrase-multilingual-mpnet-base-v2 model?
Kishore Nallan
01:17 PM
Yes, generally the e5 family is close to state of the art since they are more recent.
Bill
01:18 PM
Okay, what's the structure in order to use it in the collection? Do I have to add "query:"?
Bill
01:19 PM
"model_config": {
  "model_name": "ts/multilingual-e5-base",
  "indexing_prefix": "query:",
  "query_prefix": "query:"
}
Bill
01:20 PM
or "passage:" ?
Kishore Nallan
01:26 PM
Yes the same format as all other e5 models that follow that convention.
Bill
01:27 PM
Should I use passage or query?
Kishore Nallan
01:34 PM
In Typesense this is already handled for you. You don't have to prefix anything.
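For context on what is being handled automatically: E5-family models expect a "query: " prefix on search text and a "passage: " prefix on indexed text. The sketch below only illustrates that convention in Python. The helper names and sample strings are hypothetical, and this is not Typesense's internal code, which the thread does not show.

```python
# Illustration of the E5 prefix convention. Helper names are hypothetical;
# this is not how Typesense implements it internally.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def as_passage(text: str) -> str:
    """Prefix text that would be embedded at indexing time."""
    return PASSAGE_PREFIX + text


def as_query(text: str) -> str:
    """Prefix text that would be embedded at search time."""
    return QUERY_PREFIX + text


print(as_passage("Wireless headphones"))
print(as_query("bluetooth headset"))
```

According to the reply above, none of this has to be done by hand when the model is used through Typesense.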
Bill
01:35 PM
Ok, so the field structure will be:
{
  "name": "embedding",
  "type": "float[]",
  "embed": {
    "from": ["product_name"],
    "model_config": {
      "model_name": "ts/multilingual-e5-base",
      "indexing_prefix": "passage:",
      "query_prefix": "query:"
    }
  }
}
Am I right?
Kishore Nallan
01:37 PM
You can leave out those prefix fields. They should default automatically.
Bill
01:38 PM
So I should only add the model_name?
Kishore Nallan
04:17 PM
Yes
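Putting the advice together, the embedding field would carry only model_name inside model_config. A minimal Python sketch that builds such a field definition follows. The function name embedding_field is hypothetical, and the field and source names ("embedding", "product_name") are taken from Bill's example.

```python
import json


def embedding_field(source_fields, model_name="ts/multilingual-e5-base"):
    """Build an auto-embedding field definition without explicit prefixes.

    Hypothetical helper: it mirrors the JSON discussed in the thread, with
    the indexing_prefix and query_prefix keys left out so that the defaults
    apply.
    """
    return {
        "name": "embedding",
        "type": "float[]",
        "embed": {
            "from": list(source_fields),
            "model_config": {"model_name": model_name},
        },
    }


field = embedding_field(["product_name"])
print(json.dumps(field, indent=2))
```

The printed JSON can be placed in the "fields" array of a collection schema.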
Sep 07, 2023 (2 weeks ago)
Bill
01:32 PM
I'm trying to create the following collection but I get an error:
{
  "name": "products",
  "fields": [
    {
      "name": "brand",
      "type": "string"
    },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": [
          "brand"
        ],
        "model_config": {
          "model_name": "ts/multilingual-e5-base"
        }
      }
    }
  ]
}
Bill
01:32 PM
{
  "message": "Bad request."
}
Bill
01:32 PM
On localhost, macOS with Docker Compose
Bill
01:33 PM
In the logs I get this: E20230907 13:33:24.950084 195 batched_indexer.cpp:229] Raw error: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
Jason
03:13 PM
Could you try replicating it with curl? If you can replicate it, could you share the full curl command?
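Jason asks for a curl reproduction. The same request can also be sketched in Python with the standard library, which makes it easy to inspect exactly what is sent. The assumptions here are not from the thread: a server on http://localhost:8108 (Typesense's default port) and a placeholder API key "xyz". The schema is the one Bill posted.

```python
import json
import urllib.error
import urllib.request

# Assumptions (not from the thread): local server on the default port and a
# placeholder API key. Replace both with your own values.
TYPESENSE_URL = "http://localhost:8108"
API_KEY = "xyz"

# The collection schema exactly as posted in the thread.
SCHEMA = {
    "name": "products",
    "fields": [
        {"name": "brand", "type": "string"},
        {
            "name": "embedding",
            "type": "float[]",
            "embed": {
                "from": ["brand"],
                "model_config": {"model_name": "ts/multilingual-e5-base"},
            },
        },
    ],
}


def build_create_request(schema, base_url=TYPESENSE_URL, api_key=API_KEY):
    """Build (but do not send) the POST /collections request."""
    return urllib.request.Request(
        url=f"{base_url}/collections",
        data=json.dumps(schema).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-TYPESENSE-API-KEY": api_key,
        },
    )


def send(request):
    """Send the request and return (status, body), including for HTTP errors."""
    try:
        with urllib.request.urlopen(request) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


request = build_create_request(SCHEMA)
print(request.get_method(), request.full_url)
# Against a running server: status, body = send(request)
```

Calling send(request) returns the status code and raw response body even when the server rejects the schema, so a response such as the "Bad request." message above can be captured and shared alongside the server logs.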
Sep 08, 2023 (2 weeks ago)
Bill
06:18 AM
Problem solved. It was an issue with the model.
Kishore Nallan
06:19 AM
We actually fixed this in a recent 0.25.1 RC build.