# community-help
j
Hello! I need to understand case sensitivity in Typesense with respect to vector search. I notice that I get different results depending on casing when semantic search is enabled. Some questions:
1. Can you confirm that Typesense is not case sensitive?
2. How does embedding work during indexing? Is the raw text embedded, or is it normalized somehow?
3. How is embedding done during search? Is there some normalization here?
4. In general, what is a good approach to achieving case insensitivity when using vector search?
k
Case sensitivity depends on the model used. For example, BERT-based models are uncased (i.e., they are not case sensitive). We do send normalized, lowercased text for embedding, because most embedding models aren't case sensitive and we have not found casing to make much of a difference even for the models that do support it.
j
Can you confirm that you normalize both at indexing and at search time?
k
Yes, correct.
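For anyone generating embeddings on the client side rather than letting Typesense do it, the same idea (normalize identically at indexing and at search time) can be sketched as below. This is only an illustration, not Typesense's internal code: `normalize_for_embedding`, `embed_normalized`, and the `embed_fn` callable are hypothetical names, and `embed_fn` stands in for whatever embedding client you actually use.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def normalize_for_embedding(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def embed_normalized(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],
) -> List[Vector]:
    """Normalize every text, then hand the batch to the embedding function.

    embed_fn is an assumption: any callable that maps a list of strings
    to a list of vectors (for example, a thin wrapper around an API client).
    """
    return embed_fn([normalize_for_embedding(t) for t in texts])
```

Calling `embed_normalized` for both the documents you index and the queries you search with means the model sees the same string for "Typesense" and "typesense".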
j
Are there any circumstances where capitalisation makes any difference to a search?
k
It's domain-specific, but given that BERT-based models dominate all benchmarks, I suspect that it doesn't make much of a difference.
j
My question is more about Typesense. As far as I understand, everything is normalized to lowercase in both lexical and semantic search. Still, I somehow get different results for different capitalization. We use OpenAI embeddings.
k
Maybe there is a place where we don't. I'll have to check and can get back to you after investigating.
j
Thank you!
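While that is being investigated, one way to find out whether a given embedding model reacts to casing at all is to embed the same text twice, once as written and once lowercased, and compare the vectors. The sketch below assumes a hypothetical `embed_fn` that maps one string to one vector; `casing_changes_embedding` and the `tolerance` default are illustrative choices, not part of any library. The tolerance is there in case the model does not return bit-identical vectors for identical input.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def casing_changes_embedding(
    text: str,
    embed_fn: Callable[[str], Sequence[float]],
    tolerance: float = 1e-6,
) -> bool:
    """Return True if lowercasing the text moves its embedding noticeably.

    embed_fn is an assumption: any callable returning a vector for a string.
    """
    original = embed_fn(text)
    lowered = embed_fn(text.lower())
    return (1.0 - cosine_similarity(original, lowered)) > tolerance
```

If this returns `True` for your model, any code path that skips lowercasing would be enough to explain casing-dependent results.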