Best Practices for Multisearch Across Collections and Removing Non-important Words
TLDR: robert asked for best practices on multi-searching across collections and deduplicating the results. He later asked how to reduce the weight of trivial words in search results. Kishore Nallan suggested implementing stop words and using a proper Q&A model to handle semantic queries.
Nov 12, 2022 (11 months ago)
robert
01:51 PM
For example, I have the keywords "programming", "organization", and "files". I have two collections, and I want to search each of them for these keywords separately. Then I want to show the results by grouping the keywords together and deduping their results within a collection. I have two different UI views showing the results from the two collections.
Is this all client side manipulation?
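One possible split between server and client, sketched under assumptions: send one search per (collection, keyword) pair in a single multi-search request, then group and dedupe the hits on the client. The collection names, the `text` field, and both helper functions are hypothetical. The sketch also assumes the response lists results in the same order as the submitted searches and that each hit carries a `document` with an `id`.

```python
from collections import defaultdict


def build_searches(keywords, collections, query_by="text"):
    """Build a multi-search body with one search per (collection, keyword) pair."""
    return {
        "searches": [
            {"collection": name, "q": keyword, "query_by": query_by}
            for name in collections
            for keyword in keywords
        ]
    }


def dedupe_by_collection(searches, results):
    """Group hits per collection, keeping each document id once.

    Each kept entry also records which keywords matched the document.
    Assumes results[i] is the response for searches[i].
    """
    merged = defaultdict(dict)
    for search, result in zip(searches, results):
        for hit in result.get("hits", []):
            doc = hit["document"]
            entry = merged[search["collection"]].setdefault(
                doc["id"], {"document": doc, "keywords": []}
            )
            if search["q"] not in entry["keywords"]:
                entry["keywords"].append(search["q"])
    return {name: list(docs.values()) for name, docs in merged.items()}
```

With the official Python client, the body would typically be passed to `client.multi_search.perform(...)` and its `results` list handed to `dedupe_by_collection`; each UI view can then read its own collection's entry from the returned dict.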
Jason
01:58 PM
robert
01:58 PM
robert
01:59 PM
robert
02:02 PM
Documents in the collection contain paragraphs of text. I'm trying to search those paragraphs for keywords.
For example, a document might contain this paragraph: "My organization services the poor and unneeded. We do that by providing clothes & shelter. We also support programming efforts by xyz."
The user then has a question:
"What is your organization's mission?"
On the client side, I've explored breaking the question down into semantic keywords such as: organization, mission.
I then want to search for "organization, mission" and match on each keyword individually to retrieve the above paragraph.
There are multiple collections that contain different "paragraph-like" snippets like the one above but serve different UI purposes.
How do I best use Typesense to solve this particular use case?
robert
02:09 PM
Kishore Nallan
02:40 PM
robert
02:43 PM
1. The user has a question.
2. Use OpenAI to parse a question like "What is your mission statement" to get the output "mission statement".
3. Take the keyword and search Typesense across multiple collections.
4. Show the results.
The problem with the above arises when the question is something like "What is the programming of your organization? How do you ensure equal results? What is the real answer to god?"
We parse that into "programming, ensure equal results, answer to god", and we want to search all of those keywords equally against multiple collections.
My questions are:
1. How do I lessen the weight of unimportant words (the, and, your, etc.) in the results?
2. Is there a best practice for multisearch in the above scenario?
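For the first question, one client-side option is to drop stop words from each parsed phrase before it is sent to Typesense. This is a minimal sketch, not the method settled on in the thread: the stop-word list is a small illustrative assumption, and `strip_stop_words` and `to_queries` are hypothetical helper names.

```python
import re

# Illustrative stop-word list (an assumption); extend it for real use.
STOP_WORDS = {
    "a", "an", "and", "are", "do", "how", "is",
    "of", "the", "to", "what", "you", "your",
}


def strip_stop_words(phrase, stop_words=STOP_WORDS):
    """Lowercase a phrase and remove any token found in stop_words."""
    tokens = re.findall(r"[a-z0-9']+", phrase.lower())
    return " ".join(token for token in tokens if token not in stop_words)


def to_queries(parsed):
    """Split comma-separated keyword phrases, clean each, and drop empty ones."""
    cleaned = (strip_stop_words(part) for part in parsed.split(","))
    return [query for query in cleaned if query]
```

Each string returned by `to_queries` can then be used as the `q` value of its own search, so every phrase is searched on equal footing across the collections.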
robert
02:45 PM
Kishore Nallan
02:46 PM
robert
02:47 PM
robert
02:48 PM
Kishore Nallan
03:20 PM