Converting finbert to ONNX for Typesense Model Repository
TLDR: Walter asked Jason to convert finbert to ONNX. The conversation covered model differences and technical adjustments, and ended with a successful conversion and a plan for handling a known bug.
Sep 21, 2023 (2 months ago)
Walter
10:52 PM Also no big deal if you can't! I already appreciate everything you guys have done.
Jason
10:56 PM
Jason
10:56 PM
Walter
11:10 PM
Sep 22, 2023 (2 months ago)
Jason
06:42 PM ts/finbert?
Walter
09:19 PM I tried dropping our current embedding field and re-adding it with the model name replaced (swap e5-small for finbert). It says:
> Error: e: Request failed with HTTP code 400 | Server said: Schema change is incompatible with the type of documents already stored in this collection. error: Field `embedding` must have 768 dimensions.
My guess is that e5-small has fewer dimensions. That vector is still stored in the Typesense document, and if I want to use the finbert embeddings I need to use a different field?
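Editor's note: the mismatch Walter describes can be illustrated with a minimal sketch. It assumes e5-small produces 384-dimensional vectors and finbert (a BERT-base model) produces 768, which is consistent with the error above. The helper name `stale_embedding_ids` and the sample documents are hypothetical, and this is not part of the Typesense API.

```python
# Assumed output sizes (not stated in the thread): e5-small -> 384, finbert -> 768.
EXPECTED_DIMS = {"ts/e5-small": 384, "ts/finbert": 768}


def stale_embedding_ids(docs, model_name, field="embedding"):
    """Return the ids of docs whose stored vector length does not match the model."""
    expected = EXPECTED_DIMS[model_name]
    return [d["id"] for d in docs if len(d.get(field) or []) != expected]


# Hypothetical documents: one still holds an old e5-small vector.
docs = [
    {"id": "1", "embedding": [0.0] * 384},
    {"id": "2", "embedding": [0.0] * 768},
]
print(stale_embedding_ids(docs, "ts/finbert"))  # ['1']
```

Any document that still carries a 384-dimensional vector would conflict with a field that expects 768, which matches the 400 error quoted above.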
Jason
09:21 PM So until then you want to create a new collection and reindex your docs in it
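Editor's note: a rough sketch of what the schema for such a new collection might look like, built as a plain Python dict so it can be inspected before being sent to Typesense. The collection name `threads_finbert` and the source fields `title` and `body` are hypothetical; the `embed` / `model_config` layout follows Typesense's auto-embedding field format as I understand it, and `ts/finbert` is the model name mentioned in the thread.

```python
import json


def finbert_collection_schema(name, source_fields):
    """Build a collection schema whose `embedding` field is generated by ts/finbert."""
    return {
        "name": name,
        "fields": [
            # Plain string fields that the embedding is computed from.
            *({"name": f, "type": "string"} for f in source_fields),
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": list(source_fields),
                    "model_config": {"model_name": "ts/finbert"},
                },
            },
        ],
    }


# Hypothetical collection and field names, for illustration only.
schema = finbert_collection_schema("threads_finbert", ["title", "body"])
print(json.dumps(schema, indent=2))
```

The resulting payload would be used to create the new collection, after which the existing documents are re-imported so that their embeddings are generated fresh by the new model.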
Walter
09:24 PM Is that bug fix a few days or a few weeks away? If a few days, I'll wait; if a few weeks, I'll probably create new collections.
Again, thanks for being so responsive and adding finbert so quickly 🙏
Jason
09:33 PM If the field values change, then it will fix itself. Otherwise, if it's an upsert with the same data, then the embeddings won't be regenerated.
The bug fix is about 1-2 weeks away
Walter
09:34 PM