#community-help

Reranking Search Results from Different Sources

TLDR Viktor seeks advice on reranking search results. Kishore Nallan suggests hybrid search and custom ranking algorithms. John recommends Metarank as a potential solution.

Powered by Struct AI
Apr 20, 2023 (7 months ago)
Viktor
08:57 AM
We show search results to our users from both Typesense and external search APIs. To do this, we need to rerank the results coming from the different sources into a single list sorted by relevance across all of them. Our current approach is to build a Lucene index on the fly containing all the results from the different sources (~100 results). This doesn’t give great results. Can Typesense itself be used for reranking like this? We’ve noticed that Typesense doesn’t do much sorting based on keyword relevance (e.g., words like “the” are ranked as highly as more discriminative words like “cardiovascular”). Any tips on how to deal with this?
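The on-the-fly approach described above (scoring a pooled set of ~100 results from all sources) can be sketched in a few lines without building a full Lucene index. This is a minimal sketch of Okapi BM25, assuming each result has already been reduced to a single text string; `tokenize` and `bm25_rerank` are hypothetical helpers, and the IDF formula mirrors the variant Lucene's BM25 uses. Because document frequencies come only from the pooled results, term weights are estimated from very few documents.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (deliberately simple).
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_rerank(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[tuple[int, float]]:
    """Score every pooled document against the query with Okapi BM25 and
    return (original_index, score) pairs sorted from most to least relevant."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many pooled documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style BM25 IDF; the +1 inside the log keeps it positive.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
            score += idf * norm
        scores.append((i, score))
    # sorted() is stable, so ties keep their original pooled order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

pooled = [
    "The history of the city",
    "Cardiovascular disease and the heart",
    "The the the",
]
ranking = bm25_rerank("the cardiovascular risk", pooled)
print(ranking)
```

Here the result mentioning “cardiovascular” comes first, ahead of results that only repeat “the”, because the rare term carries a much larger IDF.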
Kishore Nallan
10:07 AM
Typesense does not weight tokens in a query the way Lucene does with an algorithm like tf-idf. This is because Typesense is typically used for searching short strings, where the tf-idf score isn't as meaningful. Instead, we compute a text score that considers the proximity of the tokens, how many query tokens are present, the number of typos, etc.
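To make the difference concrete, here is a toy scorer built only from the kinds of signals listed above: how many query tokens match, how many typos were needed, and how close together the matches are. It is a hypothetical illustration, not Typesense's actual formula; `edit_distance`, `text_score`, and the lexicographic tuple ordering (matches first, then typos, then proximity) are assumptions. Note that no token carries more weight than another, which is why “the” and “cardiovascular” count equally.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance computed with a single rolling row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def text_score(query: str, doc: str, max_typos: int = 1) -> tuple[int, int, int]:
    """Toy score (matched tokens, -typos, -span); larger tuples rank higher.
    Every query token counts the same: there is no tf-idf style weighting."""
    doc_tokens = tokenize(doc)
    matched, typos, positions = 0, 0, []
    for q in tokenize(query):
        best = None  # (distance, position) of the closest document token
        for pos, d in enumerate(doc_tokens):
            dist = edit_distance(q, d)
            if dist <= max_typos and (best is None or dist < best[0]):
                best = (dist, pos)
        if best is not None:
            matched += 1
            typos += best[0]
            positions.append(best[1])
    # Proximity: distance between the first and last matched positions.
    span = max(positions) - min(positions) if len(positions) > 1 else 0
    return (matched, -typos, -span)

docs = ["hart disease", "risk of heart and lung disease", "heart disease risk factors"]
ranked = sorted(docs, key=lambda d: text_score("heart disease", d), reverse=True)
print(ranked)
```

For the query `heart disease`, this ranks the adjacent exact match first, the spread-out exact match second, and the typo match last.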
Kishore Nallan
10:10 AM
If you want this type of discrimination, I recommend generating word embeddings and using vector search to produce candidates. Recent RC builds also support hybrid search, so you can use both keyword and vector search to return results. In our benchmarks, this works very well for search use cases that require this type of word discrimination.
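Hybrid search needs some way to merge a keyword ranking with a vector ranking. One common, generic technique is reciprocal rank fusion (RRF), sketched below with toy 2-D embeddings. This is an assumption for illustration, not necessarily how Typesense combines the two result lists; the `cosine` and `reciprocal_rank_fusion` helpers, the document IDs, and `k=60` (a conventional RRF constant) are all hypothetical.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity between two vectors; 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs: each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Toy data: a keyword ranking plus embeddings for the same documents.
keyword_ranking = ["d1", "d2", "d3"]
embeddings = {"d1": [0.0, 1.0], "d2": [1.0, 0.0], "d3": [0.9, 0.1]}
query_vector = [1.0, 0.0]

# Vector ranking: order documents by cosine similarity to the query.
vector_ranking = sorted(embeddings, key=lambda d: cosine(query_vector, embeddings[d]), reverse=True)

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['d2', 'd1', 'd3']
```

`d2` ends up first because it ranks well in both lists, even though keyword search alone put `d1` on top.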
Viktor
12:24 PM
Thanks Kishore Nallan. We’ve seen that embedding-based search struggles with many types of text in our use case. We’re looking for a more stable method of reranking. Have you seen anything work well for this?
Kishore Nallan
12:29 PM
That's where hybrid search helps: you get the best of both worlds. However, it still won't solve your original problem if the goal is to rerank results from multiple systems in a unified way. That may require your own custom ranking algorithm that looks at the overlap of tokens, etc. It's not very straightforward.
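A custom unified ranking of the kind described above might start from query-token overlap, with each source's own ordering used only as a small tie-breaker. This is a minimal, hypothetical sketch: the `Result` record, the source labels, `unified_rank`, and the `rank_weight` value are assumptions rather than anything from Typesense or the thread.

```python
import re
from dataclasses import dataclass

@dataclass
class Result:
    source: str       # hypothetical label, e.g. "typesense" or "external"
    source_rank: int  # 1-based position in that source's own result list
    text: str

def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; duplicates don't matter for overlap.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def unified_rank(query: str, results: list[Result], rank_weight: float = 0.1) -> list[Result]:
    """Order results from all sources by the fraction of query tokens they contain,
    plus a small bonus for results that their own source ranked highly."""
    query_tokens = tokens(query)

    def score(r: Result) -> float:
        coverage = len(query_tokens & tokens(r.text)) / len(query_tokens) if query_tokens else 0.0
        return coverage + rank_weight / r.source_rank

    return sorted(results, key=score, reverse=True)

results = [
    Result("typesense", 1, "The risk of the flu"),
    Result("external", 1, "Cardiovascular risk factors explained"),
    Result("typesense", 2, "Cardiovascular health"),
    Result("external", 2, "Risk factors for cardiovascular disease"),
]
ranked = unified_rank("cardiovascular risk factors", results)
for r in ranked:
    print(r.source, r.text)
```

Because plain overlap treats every token equally, it inherits the “the” versus “cardiovascular” problem; weighting each token (for example, by IDF over the pooled results) is one possible refinement.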
John
06:43 PM
You might be interested in checking out Metarank; they have a blog post describing something similar.
Viktor
07:27 PM
Thanks John, this looks interesting!

Typesense

Lightning-fast, open source search engine for everyone | Knowledge Base powered by Struct.AI


Similar Threads

Improving Search Relevance with Typesense

Viktor asks how Typesense calculates relevance and Jason suggests using vector search, specifically S-BERT embeddings, to better match low information queries to relevant documents.

7
10mo
Solved

Optimizing Dataset of Podcast Feeds for a Searchable Database

Alexander seeks advice on optimizing a podcast database for search. Kishore Nallan explains that data size and stopwords affect RAM usage and that benchmarking on 1M records would be useful. satish raises the potential need for vector search. Both recommend feeding user activity data into ML models for relevance ranking. Collaboration was suggested.

26
20mo
Solved

Integrating Semantic Search with Typesense

Krish wants to integrate semantic search functionality with Typesense but struggles with its limitations. Kishore Nallan provides resources, clarifications, and workarounds for the issues raised.

6

75
11mo
Solved

Enhancing Typesense Search for Multiple Indexes in CRM Data

William was facing issues with Typesense's search performance on CRM data. JinW and Kishore Nallan suggested strategies such as adjusting Typesense tokens and creating a "concatenated" field for better search results.

5
26mo

Discussion on Performance and Scalability for Multiple Term Search

Bill asks about the best way to handle multi-term searches in a recommendation system they developed. Kishore Nallan suggested using embeddings with a remote embedder, or storing and averaging vectors. Despite testing several of the suggested solutions, Bill continued to face performance issues, leaving the discussion of scalability and recommendation-system performance unresolved.

3

105
1w