Best Practices for Vector Search
TLDR Gio asked about best practices for vector search in custom search engines. Kishore Nallan suggested working on paragraphs instead of sentences for better context.
Feb 28, 2023
Gio
11:58 AM Like imagine I wanted to build a custom search engine for Facebook FAQs. Should I vectorize data sentence by sentence or paragraph by paragraph?
Kishore Nallan
12:02 PM > Should I vectorize data sentence by sentence or paragraph by paragraph?
All the demos I've seen on Wikipedia etc. work on paragraphs. Sentences are usually too small to have enough context.
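To make the paragraph-level suggestion concrete, here is a minimal sketch of how FAQ text might be split into paragraph chunks before being passed to an embedding model. The function names (`split_paragraphs`, `build_records`), the `min_chars` threshold, and the record layout are all hypothetical choices for illustration, not part of Typesense or of anything stated in the thread. The embedding step itself is left out, since the thread does not say which model to use.

```python
import re
from typing import Dict, List


def split_paragraphs(text: str, min_chars: int = 40) -> List[str]:
    """Split text on blank lines into paragraph chunks.

    Fragments shorter than min_chars (an assumed threshold) are merged into
    the previous chunk so that no chunk is too small to carry context.
    """
    raw = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    for para in raw:
        # Collapse internal line breaks and repeated whitespace.
        para = " ".join(para.split())
        if chunks and len(para) < min_chars:
            chunks[-1] = chunks[-1] + " " + para
        else:
            chunks.append(para)
    return chunks


def build_records(doc_id: str, text: str) -> List[Dict[str, str]]:
    """Build one record per paragraph; each record's text would be embedded later."""
    return [
        {"id": f"{doc_id}-{i}", "text": chunk}
        for i, chunk in enumerate(split_paragraphs(text))
    ]


faq = (
    "How do I reset my password?\n"
    "Go to Settings, then Security, and choose Change password.\n"
    "\n"
    "OK.\n"
    "\n"
    "How do I delete my account?\n"
    "Open Settings and select Account ownership and control."
)
records = build_records("faq", faq)
for record in records:
    print(record["id"], record["text"])
```

Each record keeps a question together with its answer, so the vector for that chunk reflects the whole paragraph instead of an isolated sentence.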