# community-help
We are currently dealing with some user feedback on semantic search, and it has sparked a debate about how and where to handle the user's requirement.

Context: we use auto-embedding at indexing time, and therefore at query time as well. The UI is a React application using `instantsearch`, connected through the `typesense-instantsearch-adapter`.

Scenario: users run the same semantic search with different capitalization and compare the results. User feedback:
`scrum master` and `Scrum master` are semantically the same rather than different. Capitalization shouldn't matter for most searches (especially in English; other languages might differ, but that is probably a different story).

User expectation: both search queries should return the same semantic results.

Now to the controversy: the embeddings for these two search terms are apparently different, so the vector distances, and therefore the results, differ too. To fulfill the client's request, the question arose where to manipulate the query so that the same results are returned.

1. A simple, naive solution would be to manipulate the query in the UI. The drawback: we would be messing with `uiState` and also affecting routing (the user types in `?q=Scrum+master`, the URL will say `?q=scrum+master`, and as soon as the user reloads the page with that URL, the query will be lowercased as well). Additionally, this shifts responsibility for how the search engine should behave to the frontend rather than the engine itself.
2. A better solution might be modifying the query on the "server side". Is there any option to, e.g., configure or manage the tokenizer (or, better, a normalizer) used by the auto-embedding so that queries are lowercased first? This would also come in handy because trailing whitespace in the query produces different vector search results as well, which throws users off.
3. Overall, the engineering team understands how semantic search works and why the results differ. However, the user has a very specific expectation, which we'd like to get as close to as possible, if that is possible at all. Or maybe this is such an edge case that we just need to educate users.