# community-help
Hi team! I have a strange feature request, maybe you can help me with a workaround 🤔 In my text dataset, I have some terms that are very similar to English words but carry an additional meaning. They are usually odd names of products, services, or companies, similar to how `C++` and `C#` become the same token as `C`, and `.NET` becomes the same token as `NET`. Because the tokenizer legitimately strips punctuation marks from the text, users have a hard time finding exact matches for such search queries: they have to learn about quotes, realize this is what's happening, and use them only around the term. I could enable `symbols_to_index` and add `.`, `+`, `#`, and `!` to it (see the sketch below), but that would probably worsen the overall quality of search results (e.g. if an author of a text missed a space somewhere and a word got stuck to a punctuation mark). I have a list of such terms, so can I instruct the tokenizer to keep them as they are? Or build a workaround that disables typo tolerance and punctuation stripping for specific words in the search query?
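In case it helps to see what I mean, here's a minimal sketch of the `symbols_to_index` route I'd rather avoid (assuming the Python client; the collection name and fields are made up):

```python
import typesense

client = typesense.Client({
    "nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}],
    "api_key": "xyz",  # placeholder key
    "connection_timeout_seconds": 2,
})

# Hypothetical schema: indexing ., +, #, ! keeps "C++", "C#" and ".NET"
# as distinct tokens instead of collapsing them to "C" and "NET" --
# but it applies globally, to every document and every occurrence.
client.collections.create({
    "name": "articles",  # made-up collection name
    "fields": [{"name": "text", "type": "string"}],
    "symbols_to_index": [".", "+", "#", "!"],
})
```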