Finding Similar Documents Using JSON and Embeddings
TLDR Manish wants to find similar JSON documents and asks for advice. Jason suggests using Sentence-BERT with vector query and provides guidance on working with OpenAI embeddings and Typesense. They discuss upcoming Typesense features and alternative models.
May 09, 2023 (5 months ago)
You could concatenate all messages from a single thread into one long string and generate embeddings for the full discussion thread…
// have observed inferior results when newlines are present.
// "The food was delicious and the waiter..."
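A minimal sketch of the thread-concatenation idea above, combined with the newline caveat from the code comment. The function name `thread_to_embedding_input` and the sample messages are hypothetical, and the result is only the text you would hand to an embedding model; no embedding API is called here.

```python
def thread_to_embedding_input(messages):
    """Join a thread's messages into one string with no newlines."""
    cleaned = []
    for message in messages:
        # split() with no arguments splits on any whitespace, including
        # newlines, so re-joining with a space collapses them all.
        text = " ".join(message.split())
        if text:  # skip messages that were empty or whitespace-only
            cleaned.append(text)
    return " ".join(cleaned)


# Hypothetical thread, used only for illustration.
thread = [
    "I want to find similar JSON documents.\nAny advice?",
    "You could generate embeddings\n\nfor the full discussion thread.",
]
embedding_input = thread_to_embedding_input(thread)
```

The single string in `embedding_input` is what would be sent for embedding, one request per thread rather than one per message.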
num_dim property in the field definition in the Typesense collection schema to the number of floats you see
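As a rough illustration of setting `num_dim` from the embedding itself, the sketch below builds a schema dict whose vector field uses the `float[]` type. The collection name `threads`, the field names `text` and `embedding`, and the four-float sample vector are placeholders, not values from the discussion.

```python
def build_collection_schema(name, sample_embedding, vector_field="embedding"):
    """Build a collection schema whose num_dim matches the embedding length."""
    return {
        "name": name,
        "fields": [
            {"name": "text", "type": "string"},
            {
                "name": vector_field,
                "type": "float[]",
                # num_dim is the number of floats in one embedding
                "num_dim": len(sample_embedding),
            },
        ],
    }


# Placeholder vector; a real embedding from a model is much longer.
sample_embedding = [0.12, -0.03, 0.44, 0.91]
schema = build_collection_schema("threads", sample_embedding)
```

Deriving `num_dim` from an actual embedding avoids hard-coding a size that would break if you switch to a model with a different output length.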
vector_query parameter… which you’d use for similarity search and semantic search
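The `vector_query` value is a string naming the vector field, the query vector, and how many neighbours to return. The helpers below are a sketch of building that string; the function names, the field name `embedding`, and the document id `thread-42` are hypothetical. The second helper uses the empty-vector-plus-`id` form, which asks Typesense to use an already indexed document's vector as the query.

```python
def vector_query_from_embedding(field, embedding, k=10):
    """Format a vector_query string for an explicit query vector."""
    values = ", ".join(str(x) for x in embedding)
    return f"{field}:([{values}], k:{k})"


def vector_query_from_document(field, doc_id):
    """Format a vector_query that reuses an indexed document's vector."""
    return f"{field}:([], id: {doc_id})"


# Search parameters for a pure nearest-neighbour search (q is a wildcard).
search_params = {
    "q": "*",
    "vector_query": vector_query_from_embedding("embedding", [0.1, 0.2, 0.3], k=5),
}

# Parameters for "documents similar to this one", with a placeholder id.
similar_params = {
    "q": "*",
    "vector_query": vector_query_from_document("embedding", "thread-42"),
}
```

Real embeddings make this string very long, so it is likely safer to send it in a POST body (for example via the multi-search endpoint) than in a GET query string.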
How would that work? For each search request, you'd have to send a call to the OpenAI APIs?
Utilizing Vector Search and Word Embeddings for Comprehensive Search in Typesense
Bill sought clarification on using vector search with multiple word embeddings in Typesense and using them instead of OpenAI's embedding. Kishore Nallan and Jason informed him that their development version 0.25 supports open source embedding models. They also resolved Bill's concerns regarding search performance, language support, and limitations in the search parameters.
Integrating OpenAI Embeddings with DocSearch Scraper
Marcos wanted to know how to use OpenAI embeddings with DocSearch. Jason guided him through an update to the scraper config and suggested the built-in GTE model for generic use.
Building a Recommendation System with Typesense
crapthings sought help setting up a recommendation system using Typesense for database-embedded questions. Jason mentioned using linguistic models for user interactions but recommended the Starspace model for taking other users' interactions into account. OpenAI models can be used for linguistic similarity.
Announcement: General Availability of Typesense v0.25.0
Jason announces release of Typesense v0.25.0, listing new features. Users express excitement and ask pertinent questions. Gorkem, Manuel, and Daniel commend the team for the new functionalities. Manish and Tugay share their positive experiences with Typesense. Jason and Kishore Nallan answer questions and thank users for their feedback.
Optimum Cluster for 1M Documents with OpenAI Embedding
Denny inquired about the ideal cluster configuration for handling 1M documents with OpenAI embeddings. Jason recommended a specific configuration, explained record size calculation, and clarified embedding generation speed factors and the conditions that trigger OpenAI.