#community-help

Using Multilingual-e5-Base Model in Huggingface

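TLDR Bill asked how to use the multilingual-e5-base model and which prefixes the field structure requires. Kishore Nallan advised that Typesense handles the prefixes automatically, so only model_name is needed. Bill later hit a "Bad request" error when creating the collection, which turned out to be an issue with the model; Kishore Nallan noted it was fixed in a recent 0.25.1 RC build.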

Sep 06, 2023 (3 months ago)
Bill
01:15 PM
Hi, is the multilingual-e5-base model that you added on Hugging Face better/faster than the paraphrase-multilingual-mpnet-base-v2 model?
Kishore Nallan
01:17 PM
Yes, generally the e5 family is close to state of the art since they are more recent.
Bill
01:18 PM
Okay, what's the structure in order to use it in the collection? Do I have to add "query:"?
Bill
01:19 PM
"model_config": {
  "model_name": "ts/multilingual-e5-base",
  "indexing_prefix": "query:",
  "query_prefix": "query:"
}
Bill
01:20 PM
or "passage:" ?
Kishore Nallan
01:26 PM
Yes, the same format as all other e5 models that follow that convention.
Bill
01:27 PM
Should I use passage or query?
Kishore Nallan
01:34 PM
In Typesense this is already handled for you. You don't have to prefix anything.
Bill
01:35 PM
Ok, so the field structure will be:
{
  "name": "embedding",
  "type": "float[]",
  "embed": {
    "from": ["product_name"],
    "model_config": {
      "model_name": "ts/multilingual-e5-base",
      "indexing_prefix": "passage:",
      "query_prefix": "query:"
    }
  }
}
Am I right?
Kishore Nallan
01:37 PM
You can leave out those prefix fields. They should default automatically.
Bill
01:38 PM
So I should only add the model_name?
Kishore Nallan
04:17 PM
Yes
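
The field definition this exchange arrives at can be sketched as follows. This is a minimal illustration in Python, assuming the "product_name" source field from Bill's message above; the helper name embedding_field is hypothetical and not part of Typesense. Per Kishore's replies, only model_name is set and the prefix fields are left out so they default automatically.

```python
import json


def embedding_field(source_fields, model_name="ts/multilingual-e5-base"):
    """Build an auto-embedding field definition with only model_name set.

    Hypothetical helper for illustration. The prefix keys are deliberately
    omitted, since the thread says Typesense defaults them automatically.
    """
    return {
        "name": "embedding",
        "type": "float[]",
        "embed": {
            "from": list(source_fields),
            "model_config": {"model_name": model_name},
        },
    }


field = embedding_field(["product_name"])
print(json.dumps(field, indent=2))
```

The printed JSON is the same structure Bill posted, minus the indexing_prefix and query_prefix keys.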

Sep 07, 2023 (3 months ago)
Bill
01:32 PM
I'm trying to create the following collection but I get an error:
{
  "name": "products",
  "fields": [
    {
      "name": "brand",
      "type": "string"
    },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": [
          "brand"
        ],
        "model_config": {
          "model_name": "ts/multilingual-e5-base"
        }
      }
    }
  ]
}
Bill
01:32 PM
{
"message": "Bad request."
}
Bill
01:32 PM
On localhost, macOS with Docker Compose
Bill
01:33 PM
In logs I get this -> E20230907 13:33:24.950084 195 batched_indexer.cpp:229] Raw error: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
Jason
03:13 PM
Could you try replicating it with curl? If you can replicate it, could you share the full curl command?
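
As a rough stand-in for the curl replication Jason asks for, the same request can be assembled with Python's standard library. This is only a sketch under stated assumptions: the base URL http://localhost:8108 and the API key "xyz" are placeholders, not values from the thread, and the helper name build_create_collection_request is hypothetical. The request is built and printed but not sent, so the exact body can be inspected first.

```python
import json
import urllib.request


def build_create_collection_request(schema, base_url="http://localhost:8108", api_key="xyz"):
    """Assemble a POST /collections request without sending it.

    base_url and api_key are placeholder assumptions; replace them with the
    values of the local Typesense instance.
    """
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/collections",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-TYPESENSE-API-KEY": api_key,
        },
        method="POST",
    )


# The schema Bill posted in the thread.
schema = {
    "name": "products",
    "fields": [
        {"name": "brand", "type": "string"},
        {
            "name": "embedding",
            "type": "float[]",
            "embed": {
                "from": ["brand"],
                "model_config": {"model_name": "ts/multilingual-e5-base"},
            },
        },
    ],
}

request = build_create_collection_request(schema)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running server, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Printing the body before sending makes it easy to confirm that the payload reaching the server is the intended JSON.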
Sep 08, 2023 (3 months ago)
Bill
06:18 AM
Problem solved. It was an issue with the model.
Kishore Nallan
06:19 AM
We actually fixed this in a recent 0.25.1 RC build.
