#community-help

Reranking Search Results from Different Sources

TLDR Viktor seeks advice on reranking search results. Kishore Nallan suggests hybrid search and custom ranking algorithms. John recommends Metarank as a potential solution.

Apr 20, 2023 (5 months ago)
Viktor
08:57 AM
We show search results to our users from both Typesense and external search APIs. For this we need to rerank the results that come from the different sources, so that we can provide one list sorted by relevance across all of them. Our current approach is to build a Lucene index on the fly with all the search results from the different sources (~100 search results). This doesn’t give great results. Can Typesense itself be used for reranking like this? We’ve seen that Typesense doesn’t do much sorting based on keyword relevance (e.g. words like “the” are ranked as highly as more discriminative words like “cardiovascular”). Any tips for how to deal with this?
Kishore Nallan
10:07 AM
Typesense does not weight the tokens in a query the way Lucene does with an algorithm like tf-idf. This is because Typesense is typically used for searching on short strings, where the tf-idf score is not as meaningful. We instead compute a text score that considers the proximity of tokens, how many query tokens are present, typos, etc.
Kishore Nallan
10:10 AM
If you want this type of discrimination, I recommend generating word embeddings and using vector search to produce candidates. Recent RC builds also support hybrid search, so you can use both keyword and vector search to return results. In our benchmarks this works very well for search use cases that require this type of word discrimination.
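One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion. The sketch below only illustrates that general idea; it is not taken from this thread and does not claim to show how Typesense's hybrid search is implemented. The function name, the constant `k=60`, and the document ids are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings for the same query from the two retrieval modes.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one mode still appear further down.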
Viktor
12:22 PM
Elyes
Viktor
12:24 PM
Thanks Kishore Nallan. We’ve seen that embedding-based search struggles with quite a few types of text in our use case. We’re looking for a reranking method that is more stable. Have you seen anything that works well for this?
Kishore Nallan
12:29 PM
That's where hybrid search helps: you get the best of both worlds. However, this will still not solve your original problem if the goal is to rerank results from multiple systems in some unified way. That will possibly require your own custom ranking algorithm that looks at the overlap of tokens, etc. Not very straightforward.
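As a rough illustration of what such a custom ranking algorithm could look like, here is a minimal sketch that scores each candidate by its token overlap with the query and weights rarer tokens more heavily, using inverse document frequency computed over the merged candidate set. Everything in it is an assumption for the example (the tokenizer, the smoothed IDF formula, the `rerank` function, the source labels, and the sample texts); it is not something proposed in this thread, and a real system would likely need more signals.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rerank(query, candidates):
    """Rerank (source, text) pairs by IDF-weighted token overlap with the query.

    IDF is computed over the candidate set itself, so a token that appears in
    most candidates (such as "the") contributes little to the score.
    """
    doc_tokens = [set(tokenize(text)) for _, text in candidates]
    n = len(candidates)
    # Document frequency: in how many candidates does each token occur?
    df = Counter(tok for toks in doc_tokens for tok in toks)
    query_tokens = set(tokenize(query))

    scored = []
    for (source, text), toks in zip(candidates, doc_tokens):
        # Smoothed IDF: 0 for a token present in every candidate, larger for rarer tokens.
        score = sum(math.log((n + 1) / (df[t] + 1)) for t in query_tokens & toks)
        scored.append((score, source, text))
    # sorted() is stable, so equally scored candidates keep their input order.
    return sorted(scored, key=lambda item: -item[0])


# Hypothetical merged results from two sources.
candidates = [
    ("typesense", "The history of the hospital"),
    ("external_api", "Cardiovascular risk factors"),
    ("external_api", "The cardiovascular system"),
    ("typesense", "The annual report"),
]
ranked = rerank("the cardiovascular system", candidates)
for score, source, text in ranked:
    print(f"{score:.3f}  {source:12s}  {text}")
```

In this toy data, a candidate that matches only “the” scores lower than one that matches only “cardiovascular”, because “the” occurs in most of the candidates.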
John
06:43 PM
You might be interested in checking out Metarank. They have a blog post describing something similar.
Viktor
07:27 PM
Thanks John, this looks interesting!