Hi My company is using all three sorting criteria We would a typesense #community-help

Peter Thramkrongart

01/30/2025, 9:35 AM

Hi, my company is using all three sorting criteria. We would actually like to sort on a fourth criterion. Is there a reason that sorting criteria are capped at three?

Kishore Nallan

01/30/2025, 10:45 AM

It's a design decision due to performance considerations. If you want more sort fields, you can try combining the values into a single field value by summing / multiplying the components to form a large single score.
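A minimal sketch of the "combine into a single field" idea, assuming two made-up signals (`pinned` and `popularity`) and a hypothetical `rank_score` field computed at indexing time. The higher-priority signal is scaled by a factor larger than anything the lower-priority signal can reach, so one numeric sort reproduces a two-level sort:

```python
def composite_score(pinned: bool, popularity: int, max_popularity: int = 999_999) -> int:
    """Pack two tie-breaking signals into one integer sort key."""
    if not 0 <= popularity <= max_popularity:
        raise ValueError("popularity out of range")
    # Multiply the higher-priority signal by a factor that exceeds the largest
    # possible value of the lower-priority signal, so it always dominates.
    return int(pinned) * (max_popularity + 1) + popularity


# Hypothetical documents; the field names are illustrative only.
docs = [
    {"id": "a", "pinned": False, "popularity": 900},
    {"id": "b", "pinned": True, "popularity": 10},
    {"id": "c", "pinned": True, "popularity": 500},
]
for doc in docs:
    doc["rank_score"] = composite_score(doc["pinned"], doc["popularity"])

ranked = [d["id"] for d in sorted(docs, key=lambda d: d["rank_score"], reverse=True)]
print(ranked)  # ['c', 'b', 'a']
```

This only works for signals known at indexing time; query-time values such as the text match score or vector distance cannot be folded in this way.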

Peter Thramkrongart

01/30/2025, 11:10 AM

We are reranking with _vector_distance_ and a custom rank score. Would we be able to do calculations on the vector distance?

Kishore Nallan

01/30/2025, 11:20 AM

Those are 2 fields. What are the other 2 fields you want to add to the sorting condition? Generally, beyond 3 sorting conditions, tie-breaking becomes very rare (you require 3 fields to match exactly for the 4th field to be considered).

Peter Thramkrongart

01/30/2025, 11:21 AM

The main field is text match. We would like to implement merchandising as well.

Peter Thramkrongart

01/30/2025, 11:23 AM

We are also looking for a way to smooth out the continuous scores of text_match and vector_distance to make tie-breaking occur more often. Ideally, we would like everything to be ranked together at once instead of tie-break sorting.

Peter Thramkrongart

01/30/2025, 11:24 AM

We are using the vectors for personalization, so hybrid search does not work in our case, because it would return a lot of results that are irrelevant to the search phrase.

Kishore Nallan

01/30/2025, 11:27 AM

We support bucketing of text match scores, and in recent RC builds, have made it even more flexible by allowing you to specify the number of docs that go into a bucket (instead of just number of buckets). See: https://github.com/typesense/typesense/pull/2120 But my earlier comment holds: from my experience with other customers, seldom do 4 independent ranking fields work well in practice. Either you should model it as some form of weighted score or fine-tune the model itself to incorporate those signals.
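As a rough illustration of what bucketing does, the sketch below groups hits into equal-sized buckets by text match score and then re-ranks inside each bucket by a secondary key. It is a local simulation of the idea, not Typesense's internal implementation. The `sort_by` string uses the documented `_text_match(buckets: N)` form; `rank_score` is a hypothetical custom field.

```python
import math

# Search-side equivalent, assuming a custom numeric field named rank_score.
SORT_BY = "_text_match(buckets: 10):desc,_vector_distance:asc,rank_score:desc"


def bucket_then_rerank(hits, num_buckets, secondary_key):
    """Group hits into equal-sized buckets by text match, then re-rank inside each bucket."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    ordered = sorted(hits, key=lambda h: h["text_match"], reverse=True)
    if not ordered:
        return []
    bucket_size = math.ceil(len(ordered) / num_buckets)
    result = []
    for start in range(0, len(ordered), bucket_size):
        bucket = ordered[start:start + bucket_size]
        # Inside a bucket, text match scores are treated as tied.
        result.extend(sorted(bucket, key=secondary_key))
    return result


hits = [
    {"id": "a", "text_match": 98, "vector_distance": 0.40},
    {"id": "b", "text_match": 97, "vector_distance": 0.10},
    {"id": "c", "text_match": 60, "vector_distance": 0.05},
    {"id": "d", "text_match": 59, "vector_distance": 0.30},
]
reranked = bucket_then_rerank(hits, num_buckets=2, secondary_key=lambda h: h["vector_distance"])
print([h["id"] for h in reranked])  # ['b', 'a', 'c', 'd']
```

Document `c` has the smallest vector distance but stays behind `a` and `b`, because it falls into a lower text match bucket.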

Peter Thramkrongart

01/30/2025, 11:36 AM

Okay, is there a way to make a weighted score in Typesense? We need to be able to include text_match and vector_distance.

Kishore Nallan

01/30/2025, 11:53 AM

That's what the hybrid search option does. It gives you the ability to control the alpha parameter that weights keyword search vs. embedding searches.
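To make the effect of `alpha` concrete, here is a small sketch of rank fusion in which `alpha` is the weight on the vector ranking and `1 - alpha` the weight on the keyword ranking. The reciprocal-rank formula is an assumption made for illustration; the exact fusion Typesense performs may differ. The document IDs are made up.

```python
def fuse(keyword_ids, vector_ids, alpha):
    """Blend two ranked ID lists; alpha is the weight given to the vector ranking."""
    scores = {}
    for rank, doc_id in enumerate(keyword_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / rank
    for rank, doc_id in enumerate(vector_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / rank
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings: the vector ranking contains a document the keyword search never matched.
keyword_ids = ["racket_1", "racket_2"]
vector_ids = ["nike_shoe", "racket_2", "racket_1"]

low_alpha = fuse(keyword_ids, vector_ids, alpha=0.3)
high_alpha = fuse(keyword_ids, vector_ids, alpha=0.9)
print(low_alpha)   # ['racket_1', 'racket_2', 'nike_shoe']
print(high_alpha)  # ['nike_shoe', 'racket_2', 'racket_1']
```

Under this assumed formula, a document with no keyword match still receives a score from the vector side, so it can appear in the results at any `alpha` above zero.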

Peter Thramkrongart

01/30/2025, 11:56 AM

Yes, we would love to do that, but we need a way to filter out documents that have no text_match score. Without a filter, users will see documents that they have not searched for, simply because those documents are close to their user embedding.

Kishore Nallan

01/30/2025, 11:59 AM

Ok got it, then bucketing on text match scores will help you I think. It will introduce a certain amount of fuzziness into text match scores so that the vector distance is used for re-ranking.

Peter Thramkrongart

01/30/2025, 11:59 AM

Example with hybrid search: a user has looked at Nike shoes in your store. They then search for tennis rackets. If you don't have a lot of tennis rackets in your store, Nike shoes may pop up in the search results.

Peter Thramkrongart

01/30/2025, 12:01 PM

But can you perform calculations on text match score or the vector distance? Can we sum the text_match and custom rank score?

Kishore Nallan

01/30/2025, 12:02 PM

"If you don't have a lot of tennis rackets in your stores, Nike shoes may pop up in the search result."

Is this because your embeddings are personalized on the user's search history?

Peter Thramkrongart

01/30/2025, 12:02 PM

Yes, we create embeddings based on user history.

Kishore Nallan

01/30/2025, 12:04 PM

"Can we sum the text_match and custom rank score?"

No, that can't be done. You will have to do the re-ranking on the client if you want a lot of custom logic like this. In any case, pagination is not recommended for vector search because of the approximate nature of the search. When you paginate, the larger search radius could produce better results that didn't show up with the smaller radius used for the first page, and this can cause the order of results to seemingly repeat. So I recommend just doing a single large request and then applying your custom ranking on the client side.
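A minimal sketch of client-side re-ranking over a single large result set. It assumes each hit carries `text_match` and `vector_distance` values alongside the document, that `rank_score` is a custom field on the document (hypothetical name), and that distances fall roughly between 0 and 1. The weights are placeholders to tune, not recommendations.

```python
def rerank(hits, w_text=0.6, w_vector=0.3, w_rank=0.1):
    """Order hits by a weighted sum of normalized text match, vector similarity and custom rank."""
    if not hits:
        return []
    # Normalize by the maximum in this result set; fall back to 1 to avoid dividing by zero.
    max_text = max(h["text_match"] for h in hits) or 1
    max_rank = max(h["document"]["rank_score"] for h in hits) or 1

    def score(hit):
        text = hit["text_match"] / max_text
        # Smaller distance means closer, so turn it into a similarity.
        similarity = 1.0 - hit.get("vector_distance", 1.0)
        rank = hit["document"]["rank_score"] / max_rank
        return w_text * text + w_vector * similarity + w_rank * rank

    return sorted(hits, key=score, reverse=True)


# Hypothetical hits shaped like entries of a search response.
hits = [
    {"text_match": 100, "vector_distance": 0.80, "document": {"id": "a", "rank_score": 10}},
    {"text_match": 90, "vector_distance": 0.10, "document": {"id": "b", "rank_score": 50}},
    {"text_match": 50, "vector_distance": 0.05, "document": {"id": "c", "rank_score": 20}},
]
order = [h["document"]["id"] for h in rerank(hits)]
print(order)  # ['b', 'a', 'c']
```

Because every returned hit already matched the query text, this blends the signals without letting documents that lack a keyword match enter the list.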

Peter Thramkrongart

01/30/2025, 12:06 PM

Okay, thanks. I think we will have to stick to sorting and bucketing for our case. We just needed to be clear on what tools are available to us.

👍 1

Peter Thramkrongart

01/30/2025, 1:08 PM

Just to be clear on our use case for 4 sorting fields: we want to sort on 1) text match, then 2) pin products bought earlier with "sorting based on filter score", then 3) sort based on vector distance to the user embedding (for personalization based on user history), and lastly 4) sort on a custom rank score for products that are too far away from the user embedding. We are fairly certain that all four fields will be used in our case, because most users won't have strong relations to most products. We see your point regarding performance with 4 sorting fields.
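The four-level ordering described above can be sketched on the client as a tuple sort key. This assumes hypothetical field names (`text_bucket`, `bought_before`, `rank_score`) and a made-up distance cutoff; products beyond the cutoff are all given the same distance value, which is what lets the fourth key decide their order.

```python
DISTANCE_THRESHOLD = 0.5  # hypothetical cutoff for "too far from the user embedding"


def sort_key(hit):
    """Build a key for: text match bucket, bought-before pin, capped vector distance, custom rank."""
    distance = hit["vector_distance"]
    # Everything beyond the threshold shares one value, so those hits tie on this level.
    capped = distance if distance <= DISTANCE_THRESHOLD else float("inf")
    return (
        hit["text_bucket"],        # 1) lower bucket index means a better text match
        not hit["bought_before"],  # 2) previously bought products first
        capped,                    # 3) closer to the user embedding first
        -hit["rank_score"],        # 4) higher custom rank first
    )


hits = [
    {"id": "a", "text_bucket": 0, "bought_before": False, "vector_distance": 0.9, "rank_score": 5},
    {"id": "b", "text_bucket": 0, "bought_before": False, "vector_distance": 0.8, "rank_score": 9},
    {"id": "c", "text_bucket": 0, "bought_before": True, "vector_distance": 0.7, "rank_score": 1},
    {"id": "d", "text_bucket": 0, "bought_before": False, "vector_distance": 0.2, "rank_score": 2},
    {"id": "e", "text_bucket": 1, "bought_before": True, "vector_distance": 0.1, "rank_score": 99},
]
order = [h["id"] for h in sorted(hits, key=sort_key)]
print(order)  # ['c', 'd', 'b', 'a', 'e']
```

Here `a` and `b` are both beyond the cutoff, so they tie on the third key and `rank_score` decides between them.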

Kishore Nallan

01/30/2025, 1:21 PM

I suspect that component 3 (sorting on vector distance) will not produce any ties so the 4th sorting field will never really work.

Peter Thramkrongart

01/30/2025, 1:22 PM

Well, not in itself, but it does when you have a distance filter on.

Kishore Nallan

01/30/2025, 1:24 PM

How would that help with tie-breaking? With a filter, you will exclude some low-scoring docs. But the ones that do get matched will have float values for vector similarity.

Peter Thramkrongart

01/30/2025, 1:27 PM

Every time a user searches for something unrelated to their earlier history, all search results will tie on vector_distance because of the distance_threshold. Then the custom rank score applies.

👍 1

Kishore Nallan

01/30/2025, 1:39 PM

Is there a feature we can add to hybrid search as it works today to account for this use case?

Peter Thramkrongart

01/30/2025, 1:45 PM

In our case, we need to be able to filter so that only documents found by the text search are shown. Then we can use alpha to dial in the influence of the personalization (vector search). Though I'm wondering whether this will render the custom rank score useless.

Kishore Nallan

01/30/2025, 1:56 PM

This can be done by sorting on the vector query, so the real limitation you are running into here is the hard limit on sorting fields.

Peter Thramkrongart

01/30/2025, 1:58 PM

Yeah, that is what we feel as well. But we are also quite interested in the alpha parameter. In general, we miss a way to weight these different fields together.
