Finding Similar Documents Using JSON and Embeddings
TLDR Manish wants to find similar JSON documents and asks for advice. Jason suggests using Sentence-BERT with vector query and provides guidance on working with OpenAI embeddings and Typesense. They discuss upcoming Typesense features and alternative models.
May 09, 2023 (7 months ago)
Jason
06:27 PM
You could concatenate all messages from a single thread into one long string and generate embeddings for the full discussion thread…
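A minimal sketch of the concatenation step Jason describes, assuming each message is a dict with hypothetical "user" and "text" keys (the real export format is not shown in the thread). It only builds the string; generating the embedding from it is a separate call to whichever model you pick.

```python
from typing import Iterable, Mapping


def thread_to_text(messages: Iterable[Mapping[str, str]]) -> str:
    """Join all messages of one thread into a single string, in the order given.

    The "user" and "text" keys are assumptions made for this example.
    """
    parts = []
    for msg in messages:
        text = msg["text"].strip()
        if text:  # skip empty messages
            parts.append(f'{msg["user"]}: {text}')
    return " ".join(parts)


thread = [
    {"user": "Manish", "text": "How do I find similar documents?"},
    {"user": "Jason", "text": "Generate embeddings and run a vector query."},
]
document_text = thread_to_text(thread)
print(document_text)
```

Prefixing each message with the speaker name is a design choice, not something the thread prescribes; drop it if the names add noise to the embedding.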
Manish
06:31 PM
// have observed inferior results when newlines are present.
// E.g.
// "The food was delicious and the waiter..."
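The comment Manish quotes suggests stripping newlines before sending text to the embedding model. One possible way to do that in Python is sketched below; the function name is hypothetical, and collapsing every run of whitespace (not only newlines) into a single space is an assumption that goes slightly beyond what the quoted comment says.

```python
def normalize_for_embedding(text: str) -> str:
    """Replace newlines (and any other whitespace runs) with single spaces."""
    # str.split() with no arguments splits on any whitespace, including "\n".
    return " ".join(text.split())


raw = "The food was\ndelicious\n\nand the waiter..."
print(normalize_for_embedding(raw))
```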
Jason
06:44 PM
…set the num_dim property in the field definition in the Typesense collection schema to the number of floats you see
Jason
06:44 PM
1536
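As an illustration of the num_dim advice, here is a sketch of a collection schema with a float[] field sized to the 1536 floats mentioned in the thread. The collection name, the field names, and the check_dimensions helper are all made up for this example; in practice the schema dict would be handed to your Typesense client when creating the collection.

```python
EMBEDDING_DIM = 1536  # the number of floats quoted in the thread

# Hypothetical collection and field names, chosen only for illustration.
schema = {
    "name": "threads",
    "fields": [
        {"name": "text", "type": "string"},
        {"name": "embedding", "type": "float[]", "num_dim": EMBEDDING_DIM},
    ],
}


def check_dimensions(vector, schema, field="embedding"):
    """Raise ValueError if the vector length differs from the field's num_dim."""
    expected = next(f["num_dim"] for f in schema["fields"] if f["name"] == field)
    if len(vector) != expected:
        raise ValueError(f"expected {expected} floats, got {len(vector)}")
    return True


print(check_dimensions([0.0] * EMBEDDING_DIM, schema))
```

Checking the length client-side before indexing is optional; it simply surfaces a mismatch between the model's output and the schema earlier.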
Jason
06:48 PM
…the vector_query parameter… which you’d use for similarity search and semantic search
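To make the vector_query parameter concrete, the sketch below builds search parameters for a nearest-neighbor query. The "field:([values], k:N)" string follows the vector search syntax as I understand it from the Typesense documentation; the helper name and the "embedding" field name are assumptions, and the three-element vector is only a stand-in for a real 1536-float embedding.

```python
def build_vector_search(vector, field="embedding", k=10):
    """Build search parameters that ask for the k nearest neighbors of `vector`."""
    values = ", ".join(repr(float(v)) for v in vector)
    return {
        "q": "*",  # match all documents; ranking comes from vector distance
        "vector_query": f"{field}:([{values}], k:{k})",
    }


params = build_vector_search([0.1, 0.25, 0.5], k=5)
print(params["vector_query"])
```

With full-size embeddings the resulting string gets long, so sending it in a request body rather than a URL query string is likely the safer choice.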
Manish
07:09 PM
How would that work? For each search request, you'd have to send a call to the OpenAI APIs?
Similar Threads
Utilizing Vector Search and Word Embeddings for Comprehensive Search in Typesense
Bill sought clarification on using vector search with multiple word embeddings in Typesense and using them instead of OpenAI's embedding. Kishore Nallan and Jason informed him that their development version 0.25 supports open source embedding models. They also resolved Bill's concerns regarding search performance, language support, and limitations in the search parameters.
Integrating OpenAI Embeddings with DocSearch Scraper
Marcos wanted to know how to use OpenAI embeddings with DocSearch. Jason walked him through an update to the scraper config and suggested the built-in GTE model for generic use.
Issues with Semantic Search in OpenAI Embeddings
Semyon was having issues creating a schema for semantic search with OpenAI embeddings. Jason gave multiple troubleshooting suggestions, but none of them worked. The error was narrowed down to possibly being related to Semyon's specific OpenAI account or OpenAI API key. The thread ended with Jason suggesting that Semyon check billing and make a direct API call to OpenAI.
Building a Recommendation System with Typesense
crapthings sought help setting up a recommendation system using Typesense for database-embedded questions. Jason mentioned using linguistic models for user interactions but recommended the Starspace model for taking other users' interactions into account. OpenAI models can be used for linguistic similarity.
Announcement: General Availability of Typesense v0.25.0
Jason announces release of Typesense v0.25.0, listing new features. Users express excitement and ask pertinent questions. Gorkem, Manuel, and Daniel commend the team for the new functionalities. Manish and Tugay share their positive experiences with Typesense. Jason and Kishore Nallan answer questions and thank users for their feedback.