Hello! I need to understand case sensitivity in Typesense with respect to vector search (typesense #community-help)

Joel Ödlund

12/20/2024, 3:51 PM

Hello! I need to understand case sensitivity in Typesense with respect to vector search. I notice that I get different results depending on casing when semantic search is enabled. Some questions:
1. Can you confirm that Typesense is not case sensitive?
2. How does embedding work during indexing: is the raw text embedded, or is it normalized somehow?
3. How is embedding done during search? Is there some normalization here?
4. In general, what is a good approach to achieve case insensitivity when using vector search?

Kishore Nallan

12/20/2024, 4:20 PM

Case sensitivity depends on the model used. For example, BERT-based models are uncased (i.e. they are not case sensitive). We do send normalized, lowercased text for embedding, because most embedding models aren't case sensitive and we have not found casing to make much of a difference even for the models that do support it.

👍 1

Joel Ödlund

12/20/2024, 4:26 PM

Can you confirm that you normalize both at indexing and at search?

Kishore Nallan

12/20/2024, 4:31 PM

Yes, correct.
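
For readers who generate embeddings outside Typesense, the idea confirmed above (the same lowercasing applied at indexing and at search) can be sketched as follows. This is only an illustration under stated assumptions: `normalize_for_embedding`, `embed_document`, and `embed_query` are hypothetical helper names, `embed` stands in for whatever embedding function you use, and the exact normalization Typesense performs internally is not described in this thread beyond "normalized, lowercased".

```python
import unicodedata
from typing import Callable, List


def normalize_for_embedding(text: str) -> str:
    """Hypothetical helper: Unicode-normalize, lowercase, and collapse whitespace.

    This is an assumed normalization, not Typesense's actual implementation.
    """
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())


def embed_document(text: str, embed: Callable[[str], List[float]]) -> List[float]:
    # Indexing path: normalize before calling the (caller-supplied) embedder.
    return embed(normalize_for_embedding(text))


def embed_query(text: str, embed: Callable[[str], List[float]]) -> List[float]:
    # Search path: apply exactly the same normalization as at indexing time.
    return embed(normalize_for_embedding(text))
```

Routing both paths through one normalization function is the point of the sketch: if only one side is lowercased, a case-sensitive model can place the query and the document in different regions of the vector space.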

Joel Ödlund

12/20/2024, 4:46 PM

Are there any circumstances where capitalisation makes any difference to a search?

Kishore Nallan

12/20/2024, 4:50 PM

It's domain specific, but given that BERT-based models dominate all benchmarks, I suspect that it doesn't make that much of a difference.

Joel Ödlund

12/20/2024, 5:03 PM

My question is more about Typesense. As far as I understand, everything is normalized to lowercase in both lexical and semantic search. Still, I somehow get different results for different capitalization. We use OpenAI embeddings.
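
One way to narrow down where casing leaks in is to embed several casings of the same query directly and compare the vectors. The sketch below is a hedged illustration: `casing_report` is a hypothetical helper, `embed` is any caller-supplied embedding function, and `toy_embed` is a deliberately case-sensitive stand-in so the example runs without calling a real model or any Typesense or OpenAI API.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def casing_report(query: str, embed: Callable[[str], List[float]]) -> Dict[str, float]:
    """Compare each casing variant's embedding against the lowercased baseline."""
    variants = {
        "original": query,
        "lower": query.lower(),
        "upper": query.upper(),
        "title": query.title(),
    }
    baseline = embed(query.lower())
    return {name: cosine_similarity(baseline, embed(v)) for name, v in variants.items()}


def toy_embed(text: str) -> List[float]:
    # Stand-in embedder (assumption): counts uppercase and lowercase letters,
    # so it reacts to casing the way a cased model might.
    upper = sum(c.isupper() for c in text)
    lower = sum(c.islower() for c in text)
    return [float(upper), float(lower), float(len(text))]


if __name__ == "__main__":
    for name, score in casing_report("Typesense", toy_embed).items():
        print(f"{name}: {score:.3f}")
```

If every variant scores close to 1.0 with the real embedder, the model is effectively case-insensitive and the difference in results must come from another stage; if the scores diverge, some text is reaching the model without being lowercased.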

Kishore Nallan

12/20/2024, 5:07 PM

Maybe there is a place where we don't. I'll have to check, and I can get back to you after investigating.

🙏 1

Joel Ödlund

12/20/2024, 9:39 PM

Thank you!
