Finding Similar Documents Using JSON and Embeddings
TLDR Manish wants to find similar JSON documents and asks for advice. Jason suggests using Sentence-BERT with vector query and provides guidance on working with OpenAI embeddings and Typesense. They discuss upcoming Typesense features and alternative models.
May 09, 2023 (5 months ago)
Manish
06:03 PM
Manish
06:04 PM
Jason
06:08 PM
Jason
06:09 PM
Manish
06:09 PM
Manish
06:10 PM
Jason
06:11 PM
Jason
06:12 PM
Jason
06:12 PM
Manish
06:13 PM
Manish
06:14 PM
Jason
06:20 PM
Manish
06:21 PM
Jason
06:21 PM
Jason
06:24 PM
Jason
06:24 PM
Jason
06:25 PM
Jason
06:25 PM
Manish
06:26 PM
Jason
06:27 PM You could concatenate all messages from a single thread into one long string and generate embeddings for the full discussion thread…
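A minimal sketch of that suggestion, assuming each message is a dict with hypothetical "author" and "text" keys. The `embed` argument stands in for whichever embedding call you use (OpenAI, Sentence-BERT, or another model), and `fake_embed` is only a placeholder so the example runs offline.

```python
from typing import Callable, List, Sequence


def thread_to_text(messages: Sequence[dict]) -> str:
    """Join all non-empty messages of one thread into a single string."""
    parts = []
    for msg in messages:
        text = msg.get("text", "").strip()
        if text:
            # Prefixing the author is a design choice, not a requirement.
            parts.append(f'{msg.get("author", "unknown")}: {text}')
    return " ".join(parts)


def embed_thread(messages: Sequence[dict],
                 embed: Callable[[str], List[float]]) -> List[float]:
    """Generate one embedding for the whole thread."""
    return embed(thread_to_text(messages))


def fake_embed(text: str) -> List[float]:
    """Placeholder embedder (assumption): replace with a real model call."""
    return [float(len(text)), float(text.count(" "))]


thread = [
    {"author": "Manish", "text": "How do I find similar JSON documents?"},
    {"author": "Jason", "text": "Generate embeddings and use a vector query."},
    {"author": "Manish", "text": "   "},  # empty messages are skipped
]

thread_text = thread_to_text(thread)
thread_vector = embed_thread(thread, fake_embed)
```

Long threads may exceed the input limit of the embedding model, so in practice the concatenated string may need to be truncated or split.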
Manish
06:27 PM
Jason
06:28 PM
Manish
06:29 PM
Jason
06:30 PM
Manish
06:31 PM
Manish
06:31 PM
// have observed inferior results when newlines are present.
// E.g.
// "The food was delicious and the waiter..."
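The comment quoted above recommends removing newlines before generating embeddings. A small sketch of that preprocessing step follows; the function name is hypothetical, and it collapses every run of whitespace (not only newlines) into a single space, which is slightly broader than the quoted advice.

```python
import re


def normalize_for_embedding(text: str) -> str:
    """Replace newlines and other whitespace runs with single spaces."""
    return re.sub(r"\s+", " ", text).strip()


sample = "The food was delicious\nand the\r\n\r\nwaiter..."
cleaned = normalize_for_embedding(sample)
```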
Manish
06:31 PM
Jason
06:32 PM
Manish
06:43 PM
Manish
06:43 PM
Jason
06:44 PM Set the num_dim property in the field definition in the Typesense collection schema to the number of floats you see
Jason
06:44 PM 1536
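As a sketch of what Jason describes, a collection schema with a float[] field whose num_dim matches the embedding length could look like this. The collection name "threads" and the field names "text" and "embedding" are assumptions for illustration; `sample_vector` stands in for a real embedding.

```python
def build_schema(collection_name: str, num_dim: int) -> dict:
    """Build a Typesense collection schema with one vector field."""
    return {
        "name": collection_name,
        "fields": [
            {"name": "text", "type": "string"},
            # num_dim must equal the number of floats in each embedding.
            {"name": "embedding", "type": "float[]", "num_dim": num_dim},
        ],
    }


def validate_embedding(schema: dict, vector: list,
                       field_name: str = "embedding") -> None:
    """Raise ValueError if the vector length differs from num_dim."""
    field = next(f for f in schema["fields"] if f["name"] == field_name)
    if len(vector) != field["num_dim"]:
        raise ValueError(
            f"expected {field['num_dim']} floats, got {len(vector)}"
        )


sample_vector = [0.0] * 1536  # placeholder for a real embedding
schema = build_schema("threads", num_dim=len(sample_vector))
validate_embedding(schema, sample_vector)
```

With the Typesense Python client, a dict like this would typically be passed to `client.collections.create(schema)`.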
Manish
06:45 PM
Jason
06:45 PM
Manish
06:45 PM
Manish
06:46 PM
Manish
06:48 PM
Jason
06:48 PM vector_query parameter… which you’d use for similarity search and semantic search
Manish
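For reference, a sketch of how a vector_query value is usually assembled. The field name "embedding" and the document id "thread-42" are assumptions, and the id-based form follows the Typesense documentation for finding documents similar to an existing one, so check it against the version you run.

```python
from typing import Sequence


def nearest_to_vector(field: str, vector: Sequence[float], k: int = 10) -> str:
    """vector_query for the k nearest neighbours of an explicit vector."""
    values = ", ".join(str(float(v)) for v in vector)
    return f"{field}:([{values}], k:{k})"


def similar_to_document(field: str, doc_id: str) -> str:
    """vector_query for documents similar to an already indexed document."""
    return f"{field}:([], id: {doc_id})"


# Search parameters as they might be sent to Typesense.
semantic_search = {
    "q": "*",
    "vector_query": nearest_to_vector("embedding", [0.1, 0.2, 0.3], k=5),
}
similar_threads = {
    "q": "*",
    "vector_query": similar_to_document("embedding", "thread-42"),
}
```

The first form needs an embedding of the query text; the second reuses the embedding already stored with the document, so no extra embedding call is required.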
06:52 PM
Jason
06:52 PM
Manish
06:53 PM
Jason
06:53 PM
Manish
06:53 PM
Jason
06:54 PM
Manish
06:54 PM
Jason
06:54 PM
Manish
06:55 PM
Manish
06:55 PM
Jason
06:57 PM
Jason
06:57 PM
Manish
06:57 PM
Manish
06:57 PM
Jason
06:58 PM
Manish
07:09 PM How would that work? For each search request, you'd have to send a call to the OpenAI APIs?
Jason
07:10 PM
Jason
07:10 PM
Jason
07:11 PM
Manish
07:11 PM
Manish
07:12 PM
Jason
07:12 PM
Manish
07:13 PM
Similar Threads
Utilizing Vector Search and Word Embeddings for Comprehensive Search in Typesense
Bill sought clarification on using vector search with multiple word embeddings in Typesense and using them instead of OpenAI's embedding. Kishore Nallan and Jason informed him that their development version 0.25 supports open source embedding models. They also resolved Bill's concerns regarding search performance, language support, and limitations in the search parameters.
Integrating OpenAI Embeddings with DocSearch Scraper
Marcos was looking for how to use OpenAI embeddings with DocSearch. Jason guided with an update to the scraper config, and suggested the GTE built-in model for generic use.
Building a Recommendation System with Typesense
crapthings sought help setting up a recommendation system using Typesense for database-embedded questions. Jason mentioned using linguistic models for user interactions but recommended the StarSpace model for taking other users' interactions into account. OpenAI models can be used for linguistic similarity.
Announcement: General Availability of Typesense v0.25.0
Jason announces release of Typesense v0.25.0, listing new features. Users express excitement and ask pertinent questions. Gorkem, Manuel, and Daniel commend the team for the new functionalities. Manish and Tugay share their positive experiences with Typesense. Jason and Kishore Nallan answer questions and thank users for their feedback.
Optimum Cluster for 1M Documents with OpenAI Embedding
Denny inquired about the ideal cluster configuration for handling 1M documents with OpenAI embeddings. Jason recommended a specific configuration, explained how record size is calculated, and clarified the factors that affect embedding generation speed and the conditions that trigger calls to OpenAI.