Hi there, I'm using CLIP for an image search engine and I want to tune a proper distance_threshold, but the vector distances are all very similar, in the 0.7 to 0.8 range. Do you have any advice on how to find the best distance threshold?
Kishore Nallan
01/02/2025, 12:59 PM
This is very difficult to call because it's domain specific. Generally, when a lot of distances fall in the 0.7-0.8 range, it means that many of the results are actually not that relevant, so the model is unable to clearly differentiate them and rank them more sharply.
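Since the right cutoff is domain specific, one practical approach is to label a small sample of query/result pairs as relevant or not, then sweep candidate thresholds and pick the one that best separates the two groups. A minimal sketch (the distance values and the `f1_at` helper are illustrative, not part of Typesense or CLIP):

```python
import numpy as np

# Hypothetical hand-labeled cosine distances between query and result
# embeddings; in practice these come from your own relevance judgments.
relevant_dists = np.array([0.70, 0.71, 0.72, 0.73, 0.74])
irrelevant_dists = np.array([0.75, 0.76, 0.77, 0.78, 0.80])

def f1_at(threshold):
    """F1 score when results with distance <= threshold count as matches."""
    tp = np.sum(relevant_dists <= threshold)   # relevant, kept
    fp = np.sum(irrelevant_dists <= threshold) # irrelevant, kept
    fn = np.sum(relevant_dists > threshold)    # relevant, dropped
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Sweep candidate thresholds across the observed range and keep the best.
candidates = np.arange(0.65, 0.85, 0.005)
best = max(candidates, key=f1_at)
```

Even with overlapping distributions this at least tells you where the least-bad cutoff sits, and how much precision you trade for recall around it.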
Mohammad Javad Alizadeh
01/02/2025, 1:17 PM
Thank you. While searching for a solution, I found that image preprocessing (resizing, normalizing, ...) and also normalizing the embeddings may have a positive effect. Is there any way to do image preprocessing and embedding normalization when using the Typesense built-in CLIP model?
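For the embedding-normalization part: if you generate embeddings yourself and index them as a plain float array instead of relying on the built-in model, you can L2-normalize them client-side before sending documents to Typesense. This is a generic sketch with NumPy (`l2_normalize` is an illustrative helper, not a Typesense API):

```python
import numpy as np

def l2_normalize(vec):
    """Scale a vector to unit length, so that cosine similarity
    between two normalized vectors is just their dot product."""
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

emb = np.array([3.0, 4.0])   # toy embedding; real CLIP vectors are 512-d
unit = l2_normalize(emb)     # [0.6, 0.8], with norm exactly 1
```

After normalization, cosine distances between embeddings are unaffected only by direction, not magnitude, which makes a single distance_threshold behave more consistently across queries.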