Hi, My company is using all three sorting criteria...
# community-help
p
Hi, my company is using all three sorting criteria. We would actually like to sort on a 4th criterion. Is there a reason that sorting criteria are capped at three?
k
It's a design decision due to performance considerations. If you want more sort fields, you can try combining the values into a single field value by summing / multiplying the components to form a large single score.
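A minimal sketch of the "combine into a single field" idea, assuming two non-negative integer signals that are known at indexing time. The names `merch_priority` and `custom_rank` and the 20-bit width are hypothetical. This only works for values stored on the document, so it cannot fold in query-time scores such as text match or vector distance.

```python
def combined_score(merch_priority: int, custom_rank: int, rank_bits: int = 20) -> int:
    """Pack two signals into one integer that sorts like (merch_priority, custom_rank).

    Shifting by rank_bits is the same as multiplying merch_priority by 2**rank_bits
    and then adding custom_rank, so the higher-priority signal always dominates.
    """
    if merch_priority < 0:
        raise ValueError("merch_priority must be non-negative")
    if not 0 <= custom_rank < (1 << rank_bits):
        raise ValueError("custom_rank does not fit in rank_bits")
    return (merch_priority << rank_bits) | custom_rank


# Hypothetical usage: store the result in one numeric field and sort on it descending.
doc = {"id": "sku-1", "merch_priority": 2, "custom_rank": 5}
doc["combined_score"] = combined_score(doc["merch_priority"], doc["custom_rank"])
```

The packed value frees up one of the three sort slots, at the cost of recomputing the field whenever either input changes.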
p
We are reranking with _vector_distance_ and a custom rank score. Would we be able to do calculations on the vector distance?
k
Those are 2 fields. What are the other 2 fields you want to add in the sorting condition? Generally, beyond 3 sorting conditions, tie-breaking becomes very rare (you require 3 fields to match exactly for the 4th field to be considered).
p
The main field is text match. We would like to implement merchandising as well.
We are also looking for a way to smooth out the continuous scores of text_match and vector_distance to make tie-breaking occur more often. Ideally we would like everything to be ranked together at once instead of tie-break sorting.
We are using the vectors for personalization, so hybrid search does not work in our case, because that would give a lot of results that are irrelevant to the search phrase.
k
We support bucketing of text match scores, and in recent RC builds, have made it even more flexible by allowing you to specify the number of docs that go into a bucket (instead of just number of buckets). See: https://github.com/typesense/typesense/pull/2120 But my earlier comment holds: from my experience with other customers, seldom do 4 independent ranking fields work well in practice. Either you should model it as some form of weighted score or fine-tune the model itself to incorporate those signals.
p
Okay, is there a way to make a weighted score in Typesense? We need to be able to include text_match and vector_distance.
k
That's what the hybrid search option does. It gives you the ability to control the alpha parameter that weights keyword search vs embedding searches.
p
Yes, we would love to do that, but we need a way to filter out documents that have no text_match score. Without a filter, users will see documents that they have not searched for, just because those documents are close to their user embedding.
k
Ok got it, then bucketing on text match scores will help you I think. It will introduce a certain amount of fuzziness into text match scores so that the vector distance is used for re-ranking.
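A rough sketch of how a bucketed sort string could be assembled within the three-field cap. The `_text_match(buckets: N)` clause follows the bucketing feature mentioned above; the tie-breaker clauses `_vector_distance:asc` and `custom_rank:desc` are assumptions about this particular setup, so check them against the Typesense version in use.

```python
MAX_SORT_FIELDS = 3  # the cap discussed in this thread


def bucketed_sort_by(buckets: int, *tie_breakers: str) -> str:
    """Build a sort_by string that buckets text match scores first.

    Documents that land in the same text match bucket tie on the first
    clause, so the tie-breakers decide their relative order.
    """
    if buckets < 1:
        raise ValueError("buckets must be a positive integer")
    clauses = [f"_text_match(buckets: {buckets}):desc", *tie_breakers]
    if len(clauses) > MAX_SORT_FIELDS:
        raise ValueError(f"at most {MAX_SORT_FIELDS} sort fields are allowed")
    return ",".join(clauses)


# Hypothetical tie-breakers: personalization distance, then a custom rank field.
sort_by = bucketed_sort_by(10, "_vector_distance:asc", "custom_rank:desc")
```

Fewer buckets mean more ties on text match and therefore more influence for the later clauses.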
p
Example with hybrid search: a user has looked at Nike shoes in your store. They then search for tennis rackets. If you don't have a lot of tennis rackets in your store, Nike shoes may pop up in the search result.
But can you perform calculations on text match score or the vector distance? Can we sum the text_match and custom rank score?
k
"If you don't have a lot of tennis rackets in your stores, Nike shoes may pop up in the search result."
This is because your embeddings are personalized on user's search history?
p
Yes, we create embeddings based on user history.
k
"Can we sum the text_match and custom rank score?"
No, that can't be done. You will have to do the re-ranking then on the client if you want a lot of custom logic like this. In any case, for vector search, pagination is not recommended because of the approximate nature of the search. When you paginate, the larger search radius could produce better results that didn't show up with the smaller k used for the first page, and this can cause the order of results to seemingly repeat. So I recommend just doing a single request with a large k like 100 and then apply your custom ranking on the client side.
p
Okay. Thanks. I think we will have to stick to sorting and bucketing for our case. We just needed to be clear on what tools are available to us.
Just to be clear on our use case for 4 sorting fields: We want to sort on 1) text match, then 2) pin products bought earlier with "sorting based on filter score", then 3) sort based on vector distance to the user embedding (for personalization based on user history), and lastly 4) sort on a custom rank score for products that are too far away from the user embedding. We are fairly certain that all four fields will be used in our case, because most users won't have strong relations to most products. We see your point regarding performance with 4 sorting fields.
k
I suspect that component 3 (sorting on vector distance) will not produce any ties so the 4th sorting field will never really work.
p
Well, not in itself, but it does when you have a distance filter on.
k
How would that help with tie-breaking? With a filter you will exclude some low-scoring docs. But the ones that do get matched will have float values for vector similarity.
p
Every time a user searches for something unrelated to their earlier history, all search results will tie on vector_distance, because of the distance_threshold. Then the custom rank score applies.
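A small sketch of the tie behavior described in this message, written as a plain Python sort rather than anything Typesense runs internally. It assumes that every document beyond the distance threshold is treated as equally far away; the field names `text_bucket`, `bought_before` and `custom_rank` are hypothetical.

```python
def four_key_sort(docs: list[dict], distance_threshold: float) -> list[dict]:
    """Sort by text bucket, purchase pin, thresholded distance, then custom rank."""

    def key(doc: dict) -> tuple:
        distance = doc["vector_distance"]
        # Assumption: beyond the threshold, all documents share one "far" value.
        personal = distance if distance <= distance_threshold else float("inf")
        return (
            -doc["text_bucket"],         # 1) higher text match bucket first
            -int(doc["bought_before"]),  # 2) previously bought products pinned
            personal,                    # 3) closer to the user embedding first
            -doc["custom_rank"],         # 4) used only when keys 1 to 3 tie
        )

    return sorted(docs, key=key)


# A search unrelated to the user's history: every distance exceeds the threshold.
docs = [
    {"id": "a", "text_bucket": 3, "bought_before": False, "vector_distance": 0.8, "custom_rank": 1},
    {"id": "b", "text_bucket": 3, "bought_before": False, "vector_distance": 0.9, "custom_rank": 7},
]
ordered = [d["id"] for d in four_key_sort(docs, distance_threshold=0.3)]
```

With the threshold applied, both documents tie on the third key and the custom rank decides; without it, the raw float distances would almost never be equal.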
k
Is there a feature we can add to hybrid search as it works today to account for this use case?
p
In our case, we need to be able to filter to only show documents that are found by the text search. Then we can use the alpha to dial in the influence of the personalization (vector search). Though I'm wondering whether this will render the custom rank score useless.
k
This can be done with sorting on vector query so the real limitation you are running into here is the hard limit on sorting fields.
p
Yeah, that is what we feel like. But we are also quite interested in the alpha parameter. In general, we miss a way to weight these different fields together.