#community-help

Special Character Impact on Indexing and Querying

TLDR Sergio suspects a special character may be affecting Typesense indexing and querying. Kishore Nallan requests a code snippet and advises creating an issue on GitHub.

Mar 01, 2023
Sergio
01:47 PM
Hi Team, could it be that a "special character" is messing up indexing or querying?
Although the documentation mentions that "Typesense will remove special characters", could it be that this special character got through, so that queries returned empty results?
Sergio
01:50 PM
We indexed `Apple TV 128GB 4K` as a string which contained some odd, invisible separator between _Apple_ and _TV_ (not visible in Slack).
Later, when querying for `Apple TV`, the document was not returned.
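One way to confirm an invisible character like the one described above is to list every character outside printable ASCII together with its code point. The sketch below is an illustration only: the helper name `reveal_hidden_characters` and the sample string are hypothetical, and the sample assumes a no-break space (U+00A0), since the actual character in Sergio's data is not known.

```python
import unicodedata


def reveal_hidden_characters(text):
    """Return (index, code point, Unicode name) for every character that is
    not printable ASCII, so invisible separators become visible."""
    findings = []
    for index, char in enumerate(text):
        if not (" " <= char <= "~"):
            name = unicodedata.name(char, "UNNAMED")
            findings.append((index, f"U+{ord(char):04X}", name))
    return findings


# Hypothetical sample: assumes a no-break space between "Apple" and "TV".
sample = "Apple\u00a0TV 128GB 4K"
print(reveal_hidden_characters(sample))
# prints [(5, 'U+00A0', 'NO-BREAK SPACE')]
```

Running this on the raw string read from the data source, rather than on text copied through a chat client, avoids the clipboard silently replacing the character.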
Kishore Nallan
01:55 PM
Interesting, is it possible for you to post a code snippet that I can run to index that data?
Sergio
02:14 PM
[Two images attached]
Sergio
02:15 PM
Pasting the code here seems to clean it 😕
Sergio
02:16 PM
Maybe it's an edge case
Kishore Nallan
02:16 PM
OK, we might have to handle this explicitly. Can you please create an issue on GitHub?
Sergio
02:17 PM
Can you point me to where you sanitize the input? Maybe I can take a look.
Kishore Nallan
02:19 PM
I just checked: we explicitly split only on space and newline. Non-visible whitespace is not differentiated from regular Unicode characters, which we can't split on because they are valid text.
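A minimal sketch of why such a document might not match, assuming the invisible character is a Unicode whitespace character such as a no-break space. `split_on_space_and_newline` is a hypothetical stand-in for the behavior described above, not Typesense's actual tokenizer, and `normalize_whitespace` is one possible client-side cleanup to apply before indexing.

```python
import re


def split_on_space_and_newline(text):
    """Hypothetical stand-in: split only on ASCII space and newline."""
    return [token for token in re.split(r"[ \n]+", text) if token]


def normalize_whitespace(text):
    """Collapse every run of Unicode whitespace into one ASCII space.

    In Python 3, \\s on str patterns also matches characters such as
    U+00A0 (no-break space) and U+3000 (ideographic space).
    """
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical sample: assumes a no-break space between "Apple" and "TV".
raw = "Apple\u00a0TV 128GB 4K"

# Without cleanup, "Apple" and "TV" stay fused in a single token.
tokens_raw = split_on_space_and_newline(raw)
print(len(tokens_raw))  # prints 3

# After cleanup, the string splits into the four expected tokens.
tokens_clean = split_on_space_and_newline(normalize_whitespace(raw))
print(tokens_clean)  # prints ['Apple', 'TV', '128GB', '4K']
```

Zero-width characters such as U+200B are not matched by `\s`, so they would need separate handling if they turn out to be the cause.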