#community-help

Resolving Special Character Search Errors

TLDR suraj had issues searching for data containing special characters. Kishore Nallan resolved the issue by advising suraj to remove the parameters for 'preSegmentedQuery' and 'tokenSeparators'.

Jun 22, 2023 (3 months ago)
suraj
07:22 AM
Hello Team,
I am having trouble searching for data that contains special characters like &.

Below is a snapshot of my dummy JSON data:
{"company_id":"11609817","parent_id":"365","track_id":"1","security_id":"12",
"symbol":"Ram&Madhu","company_nm":"Mahindra & Mahindra","short_name":"M&M"

Whenever I search this data, it gives me a bad request error. Initially, when I created the JSON file, the special characters were encoded as UTF-8, but the record still did not come up in search.

Can anybody please suggest a way forward here?
Jason
04:29 PM
Could you share a set of curl commands like this that replicates the issue: https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
04:29
Jason
04:29 PM
I suspect it might be because of URL encoding
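The URL-encoding suspicion can be illustrated with a small Java sketch. It assumes the search term ends up in a URL path segment; the class name, method name, and base URL are hypothetical placeholders, and only the standard library is used. `URLEncoder` performs form encoding, so the `+` it emits for spaces is swapped for `%20`, which is what a path segment expects.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryEncoding {
    // Percent-encode a search term for use as a URL path segment.
    // URLEncoder targets form data and turns spaces into '+',
    // so the '+' is replaced with "%20" afterwards. A literal '+'
    // in the input is already encoded as %2B, so the swap is safe.
    static String encodePathSegment(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/search/"; // placeholder URL (assumption)
        System.out.println(base + encodePathSegment("M & M"));
        // prints http://localhost:8080/search/M%20%26%20M
    }
}
```

An unencoded `&` or space in the URL is a common cause of a 400 response from a web server, so encoding the term before building the URL is worth ruling out first.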
Jun 23, 2023 (3 months ago)
suraj
04:24 AM
So while creating the batch of documents I should use text/plain, is that right?
Kishore Nallan
06:22 AM
Hi Suraj, if you can just post a snippet that I can run to see the actual error, it would be easy for me to figure out what's going wrong.
suraj
11:49 AM
Hi Kishor..
class SearchParameters {
q: m&m
queryBy: security_nm,symbol,brand.brandName
queryByWeights: 3,2,1
prefix: true,true,true
infix: null
maxExtraPrefix: null
maxExtraSuffix: null
filterBy: null
sortBy: null
facetBy: null
maxFacetValues: null
facetQuery: null
numTypos: null
page: null
perPage: 100
groupBy: null
groupLimit: null
includeFields: null
excludeFields: null
highlightFullFields: null
highlightAffixNumTokens: null
highlightStartTag: null
highlightEndTag: null
snippetThreshold: null
dropTokensThreshold: null
typoTokensThreshold: null
pinnedHits: null
hiddenHits: null
highlightFields: null
splitJoinTokens: null
preSegmentedQuery: true
enableOverrides: null
prioritizeExactMatch: false
maxCandidates: null
prioritizeTokenPosition: null
exhaustiveSearch: null
searchCutoffMs: null
useCache: null
cacheTtl: null
minLen1typo: null
minLen2typo: null
}
11:50
suraj
11:50 AM
curl --location --request GET 'http://localhost:8080/search/M & M'
<!doctype html><html lang="en"><head><title>HTTP Status 400 – Bad Request</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 400 – Bad Request</h1></body></html>
Kishore Nallan FYR
Jun 26, 2023 (3 months ago)
suraj
04:14 AM
Jason Kishore Nallan Can you please suggest a way here
Kishore Nallan
10:51 AM
The HTML response above is from your web server? I need to see the actual response from the Typesense API.
10:51
Kishore Nallan
10:51 AM
If you are using the GET search method, you might also want to try using the POST /multi_search endpoint.
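A minimal sketch of what a POST /multi_search request could look like from Java, using only the standard library (Java 11+). The collection name `companies`, the API key, and the host and port are placeholder assumptions, and the `query_by` fields are taken from the sample data in this thread. The point is that the query travels in the JSON body, so a term like `m&m` needs no URL encoding. The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class MultiSearchRequest {
    // Minimal escaping of backslashes and double quotes for a JSON string.
    // A JSON library would be more robust (control characters are not handled).
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build the multi_search body; the query goes into JSON, not the URL.
    static String body(String collection, String q, String queryBy) {
        return "{\"searches\":[{\"collection\":\"" + jsonEscape(collection)
                + "\",\"q\":\"" + jsonEscape(q)
                + "\",\"query_by\":\"" + jsonEscape(queryBy) + "\"}]}";
    }

    public static void main(String[] args) {
        // "companies" is a hypothetical collection name.
        String json = body("companies", "m&m", "symbol,company_nm");
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:8108/multi_search")) // assumed host/port
                .header("Content-Type", "application/json")
                .header("X-TYPESENSE-API-KEY", "xyz") // placeholder key
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```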
suraj
12:07 PM
I am not getting any error now, but with & it does not give me any results; it returns an empty result
12:07
suraj
12:07 PM
for query=m&m
12:08
suraj
12:08 PM
for this JSON data: {"company_id":"11609817","parent_id":"365","track_id":"1","security_id":"12",
"symbol":"Ram&Madhu","company_nm":"Mahindra & Mahindra","short_name":"M&M"
12:08
suraj
12:08 PM
Kishore Nallan
03:46
suraj
03:46 PM
I have also used tokenSeparators and symbolsToIndex, providing the special character, but it still does not return results for a keyword like M&M
Kishore Nallan
03:50 PM
When I tried locally on a small sample set, querying with m&m does fetch that record for me.

Please share a sample dataset so I can try it, e.g. like this (as Jason shared earlier): https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
Jun 27, 2023 (3 months ago)
suraj
06:26 AM
This is my JSONL file; the data is below
06:26
suraj
06:26 AM
06:27
suraj
06:27 AM
This is how I am creating the collection in Java
06:29
suraj
06:29 AM
06:30
suraj
06:30 AM
So, when I search for M&M, it returns blank data
06:30
suraj
06:30 AM
Kishore Nallan
Kishore Nallan
06:34 AM
Please post the search Java snippet also. Again, you need to post a fully reproducible example so I can run and check it. Otherwise it's hard for me to help. Please post a full program using the Java client that creates the collection, indexes the documents, and runs the query that showcases the issue.
suraj
06:39 AM
06:40
suraj
06:40 AM
I have added the code for collection creation, document insertion, and search. Please check it, Kishore Nallan, and let me know in case anything else is required
Kishore Nallan
06:41 AM
Will check and get back to you in an hour or so.
suraj
06:42 AM
Thanks Kishore Nallan for your quick response, appreciate it 🙏
Kishore Nallan
11:55 AM
suraj That M&M document is not even getting indexed. If you check the response of the import call, you will see this error:

Field `security` has been declared in the schema, but is not found in the document.
11:56
Kishore Nallan
11:56 AM
The import function always returns true because only some of the documents could fail (like in this case), so you have to check the response.
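A minimal sketch of checking an import response in Java, assuming the response body is available as a string with one JSON result per imported document (one line each, with a `success` field). The class and method names are hypothetical, and the sample string is illustrative, built from the error quoted above. A plain regex is used here to stay within the standard library; a JSON parser would be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ImportResponseCheck {
    // Matches "success": false with or without whitespace around the colon.
    private static final Pattern FAILED =
            Pattern.compile("\"success\"\\s*:\\s*false");

    // Return every line of the import response that reports a failure.
    static List<String> failedLines(String importResponse) {
        List<String> failures = new ArrayList<>();
        for (String line : importResponse.split("\\R")) {
            if (FAILED.matcher(line).find()) {
                failures.add(line);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Illustrative response: first document indexed, second rejected.
        String sample = "{\"success\":true}\n"
                + "{\"success\":false,\"error\":\"Field `security` has been declared "
                + "in the schema, but is not found in the document.\"}";
        List<String> failures = failedLines(sample);
        System.out.println(failures.size() + " document(s) failed to import");
        failures.forEach(System.out::println);
    }
}
```

Logging or failing fast on any non-empty result makes schema mismatches like the missing `security` field visible at import time instead of surfacing later as empty search results.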
suraj
02:29 PM
Kishore Nallan, below is the correct JSONL file from which the collection is created:
{"company_id":"3005","parent_id":"16795","exchange_id":"2","security":"","symbol":"M&MFIN","company_nm":"Mahindra & Mahindra Financial Services Ltd","short_company_nm":"M & M Fin. Serv.","code":"INE774D0102","series":"","brand":[{"heading":"Financial Services","brandName":"MAHINDRA FINANCE FINSMART"}]}
{"company_id":"3004","parent_id":"27242","exchange_id":"2","security":"","symbol":"BLKASHYAP","company_nm":"B.L.Kashyap & Sons Ltd","short_company_nm":"B.L.Kashyap","code":"INE350H1032","series":"","brand":[{"heading":"Construction & Civil Engineering","brandName":"BLK"}]}
{"company_id":"4354","parent_id":"28","exchange_id":"2","security":"","symbol":"ARUNAHTEL","company_nm":"Aruna Hotels Ltd","short_company_nm":"Aruna Hotels","code":"INE95701019","series":"","brand":[{"heading":"HotelBusiness","brandName":"ARUNA"}]}

02:29
suraj
02:29 PM
I am able to search with other symbols like BLKASHYAP
Jun 28, 2023 (3 months ago)
suraj
09:03 AM
When I search for a keyword like Mahindra, I get the M&MFIN record as well, so it is stored in the collection properly. Only when I search for M&M does it not work, Kishore Nallan
Kishore Nallan
09:08 AM
I'm yet to look. Since I juggle supporting multiple customers, it's always useful if you can share a fully reproducible example that I can just run in one go. When you share snippets like the ones above, I still need to put them together into a Java program, run it, and test it. That takes a lot of my time.
suraj
09:09 AM
I understand, let me put the entire code base here then
Kishore Nallan
09:10 AM
Just share a single Java program in static void main that we both can run and get the exact results.
09:10
Kishore Nallan
09:10 AM
It will prevent issues like the earlier one, where the documents didn't even get indexed properly.
suraj
09:10 AM
Okay got it..
09:15
suraj
09:15 AM
java code and jsonl file
09:16
suraj
09:16 AM
09:17
suraj
09:17 AM
I have added the line query = URLEncoder.encode(query, StandardCharsets.UTF_8.toString()); to the code just to check whether the special character works with it
Kishore Nallan
09:20 AM
Again, you are adding several snippets instead of a single program which creates the collection, indexes the docs, queries, and prints the result. 🙂
suraj
09:38 AM
Whole code.. jsonl file already sent to you
09:39
suraj
09:39 AM
Kishore Nallan Have sent you consolidated code.. Please have a look.. jsonl file is already sent
Kishore Nallan
09:40 AM
Thanks will get back to you

10:45
Kishore Nallan
10:45 AM
Remove preSegmentedQuery param and it should work.
10:50
Kishore Nallan
10:50 AM
Also you probably don't need tokenSeparators during collection creation as well.
suraj
01:59 PM
Kishore Nallan After removing tokenSeparators and preSegmentedQuery, it is working fine now.
Thank you for your help. :pray:
Kishore Nallan
02:01 PM
Welcome!