Dima
03/22/2024, 9:26 AM
C++ and C# are the same token as C, and .NET is the same token as NET. Because the tokenizer legitimately strips punctuation marks from the text, users have a hard time finding exact matches for such search queries: they have to learn about quotes, realize that quotes are needed here, and put them only around the term.
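To make the collision concrete, here is a minimal sketch of a punctuation-stripping tokenizer. It is a toy stand-in written for illustration, not the search engine's actual tokenizer; `naive_tokenize` is a hypothetical name.

```python
import re


def naive_tokenize(text: str) -> list[str]:
    """Toy tokenizer (an assumption, not the real engine):
    lowercase, split on whitespace, drop every non-alphanumeric character."""
    tokens = []
    for word in text.lower().split():
        stripped = re.sub(r"[^a-z0-9]", "", word)
        if stripped:  # skip words that consisted only of punctuation
            tokens.append(stripped)
    return tokens


# Distinct terms collapse to the same token once symbols are stripped.
for term in ["C++", "C#", "C", ".NET", "NET"]:
    print(term, "->", naive_tokenize(term))
```

Under this assumption, C++, C# and C all reduce to one token, and .NET and NET to another, which is why an unquoted query cannot tell them apart.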
I could enable symbols_to_index and add .+#! to it, but that would probably worsen the overall quality of search results (e.g. if the author of a text missed a space somewhere and a word got stuck to a punctuation mark). I have a list of such terms, so can I instruct the tokenizer to keep them as they are? Or is there a workaround to disable typo tolerance and punctuation stripping for specific words in the search query?
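One possible client-side workaround, sketched below, is to rewrite the query before sending it: any term from the known list is wrapped in quotes automatically, so users do not have to learn the quoting trick themselves. This assumes, as the message describes, that quoting a term yields an exact match. `PROTECTED_TERMS` and `quote_protected_terms` are hypothetical names, and nothing here calls the search engine's API.

```python
import re

# Hypothetical list of terms that must survive tokenization intact.
PROTECTED_TERMS = ["C++", "C#", ".NET"]


def quote_protected_terms(query: str, terms=PROTECTED_TERMS) -> str:
    """Wrap each protected term in double quotes, leaving the rest of the query alone."""
    # Longest terms first, so a longer term wins over a shorter one sharing a prefix.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    # Match a term only when it is delimited by whitespace or the string boundaries.
    # This also skips terms the user has already quoted.
    pattern = re.compile(rf"(?<![^\s])({alternatives})(?![^\s])", re.IGNORECASE)
    return pattern.sub(lambda m: f'"{m.group(1)}"', query)


print(quote_protected_terms("senior .net and c# jobs"))
```

The sketch deliberately ignores terms glued to other punctuation (for example a trailing comma), which matches the concern above about words stuck to punctuation marks; that rule could be loosened if needed.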