I'm using hybrid search and I found that I get duplicated el... | Typesense #community-help

Óscar Vicente

11/07/2024, 12:09 PM

I'm using hybrid search, and I found that I get duplicated elements as I page through the results. How can I get rid of them? It's annoying our customers

Óscar Vicente

11/07/2024, 12:10 PM

It happens when I go to the next page: once you go deep enough, the first element of the new page matches the last element of the previous page.

Kishore Nallan

11/07/2024, 12:18 PM

Set an explicit k value for the vector query. Otherwise the k value varies based on pagination, which can throw up duplicates.

Kishore Nallan

11/07/2024, 12:18 PM

Set a large k value like 200
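A minimal sketch of what pinning k could look like when building the search parameters in Python. The field names `title` and `embedding` and the helper name `hybrid_search_params` are assumptions for illustration only; the point is that the `vector_query` string stays identical for every page.

```python
def hybrid_search_params(q, page, per_page=20, k=200):
    """Build hybrid search parameters with a fixed k.

    The field names below are hypothetical; replace them with the
    fields from your own collection schema.
    """
    return {
        "q": q,
        "query_by": "title,embedding",  # keyword field + embedding field (assumed names)
        # k is pinned here, so it does not change as `page` grows.
        "vector_query": f"embedding:([], k: {k})",
        "page": page,
        "per_page": per_page,
    }
```

The resulting dictionary would then be passed to the search call of whichever client you use; only `page` differs between requests.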

Óscar Vicente

11/07/2024, 12:19 PM

The problem with that is that I will receive more results, but I'll still have the problem in the last pages. I have 3M+ documents, so it's possible to have many thousands per query.

Kishore Nallan

11/07/2024, 12:24 PM

Deep pagination does not work well with vector search without an upfront k value that pre-fetches the results. Because of the approximate nature of the search, as the search radius (k) expands, more relevant documents could be found, which affects the overall ranking and leads to a duplication effect.
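The effect described above can be reproduced with a toy model. This is only a sketch: the approximate index is replaced by hard-coded, hypothetical results per k, and the fusion is a simplified weighted reciprocal-rank scheme with illustrative weights, not necessarily what the engine does internally.

```python
# Hypothetical approximate-search results per k: the wider search (k=4)
# surfaces document "c", which the narrower search (k=2) missed.
ANN_RESULTS = {2: ["x", "y"], 4: ["c", "x", "y", "z"]}
KEYWORD_RESULTS = ["a", "b", "c", "d"]


def fuse(keyword_hits, vector_hits, keyword_weight=0.7, vector_weight=0.3):
    """Rank documents by a weighted sum of reciprocal ranks (illustrative weights)."""
    scores = {}
    for rank, doc in enumerate(keyword_hits, start=1):
        scores[doc] = scores.get(doc, 0.0) + keyword_weight / rank
    for rank, doc in enumerate(vector_hits, start=1):
        scores[doc] = scores.get(doc, 0.0) + vector_weight / rank
    return sorted(scores, key=lambda doc: -scores[doc])


def get_page(page, per_page, k):
    """Return one page of the fused ranking for a given k."""
    ranked = fuse(KEYWORD_RESULTS, ANN_RESULTS[k])
    start = (page - 1) * per_page
    return ranked[start:start + per_page]


# k grows with the page: "b" closes page 1 and reopens page 2.
varying = get_page(1, 2, k=2) + get_page(2, 2, k=4)
# k is fixed up front: both pages are slices of the same ranking.
fixed = get_page(1, 2, k=4) + get_page(2, 2, k=4)
```

With the varying k, the newly found "c" jumps ahead of "b" between requests, so "b" is pushed onto the next page and shows up twice. With a fixed k, the ranking is computed from the same candidates each time and the pages do not overlap.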

Kishore Nallan

11/07/2024, 12:25 PM

This is unlike keyword search where we already have documents that match keywords in an inverted index which we are able to paginate in a deterministic manner.

Óscar Vicente

11/07/2024, 12:25 PM

Is there any other way of mitigating the issue? What I understand is that it is what it is and there's no way of improving it.

Kishore Nallan

11/07/2024, 12:26 PM

Deep pagination is simply not possible without duplication

Óscar Vicente

11/07/2024, 12:29 PM

So if keyword search returns more elements than the vector search, once you go past k you will start to see these issues, even if you limit the pagination to found / pageSize pages, right? So without knowing the approximate number of results, you can't really guess a good k for the search. Can I mitigate it by tweaking the ef and M parameters?
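One conservative way to act on this is to cap pagination at the last page that still fits inside the first k results, instead of found / pageSize. The helper below is a hypothetical sketch of that arithmetic; it assumes that results stay stable only while the requested offset is within the fixed k, as described in this thread.

```python
import math


def last_stable_page(found, per_page, k):
    """Last page that lies entirely within the first k results.

    Assumption: paging past k is where duplicates start to appear.
    """
    if found <= k:
        # Fewer hits than k: every page is covered, including a partial last one.
        return max(1, math.ceil(found / per_page))
    # More hits than k: only count pages that fit completely inside k.
    return max(1, k // per_page)
```

For example, with 3M+ matches, 20 results per page and k set to 200, this would expose 10 pages rather than the full found / pageSize range.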

Óscar Vicente

11/07/2024, 12:39 PM

I guess this will also affect facets, even if you provide a distance_threshold, right?

Óscar Vicente

11/07/2024, 12:40 PM

There's no way to just search all of them

Kishore Nallan

11/07/2024, 12:40 PM

Right, there is no way to exhaustively search through vectors fast.

Óscar Vicente

11/07/2024, 12:42 PM

So if I tweak ef and M for the index, I can improve the quality of the index, avoiding some but not all of the duplicates, right? As a side note, if the index fits within GPU memory, could that speed up the operations so we can play with a higher k and mitigate this further? I mean, in the future.

Kishore Nallan

11/07/2024, 12:46 PM

Not for millions of hits. But ef and M will help at the cost of additional latency.
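For reference, a sketch of where these knobs might be set. It assumes the index-time parameters go in an `hnsw_params` object on the vector field and that `ef` can be passed at query time inside `vector_query`; check both against the documentation for your server version. The field name, dimension and numeric values are illustrative only.

```python
def vector_field_schema(name="embedding", num_dim=384, m=16, ef_construction=200):
    """Field definition with index-time HNSW settings (illustrative values)."""
    return {
        "name": name,
        "type": "float[]",
        "num_dim": num_dim,  # hypothetical embedding size
        # Higher M / ef_construction: better graph quality, slower indexing, more memory.
        "hnsw_params": {"M": m, "ef_construction": ef_construction},
    }


def vector_query(field="embedding", k=200, ef=None):
    """Build a vector_query string with a fixed k and an optional query-time ef."""
    parts = [f"k: {k}"]
    if ef is not None:
        # Higher ef: wider search per query, better recall, more latency.
        parts.append(f"ef: {ef}")
    return f"{field}:([], {', '.join(parts)})"
```

Changing the index-time values typically means re-creating the field, while the query-time `ef` can be tuned per request, so it is the cheaper one to experiment with first.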
