Resolving Special Character Search Errors
TLDR: suraj had issues searching for data containing special characters such as &. Kishore Nallan resolved the issue by advising suraj to remove the 'preSegmentedQuery' search parameter and the 'tokenSeparators' collection setting.
Jun 22, 2023 (3 months ago)
suraj
07:22 AM
I have trouble while searching for data which contains a special character like &.
Below is a snapshot of my dummy JSON data:
{"company_id":"11609817","parent_id":"365","track_id":"1","security_id":"12",
"symbol":"Ram&Madhu","company_nm":"Mahindra & Mahindra","short_name":"M&M"
Whenever I search this data it gives me a bad request error. Initially, when I created the JSON file, the special characters were encoded into UTF-8, but they were still not coming up in search.
Can anybody please suggest a way here?
Jason
04:29 PM
Jun 23, 2023 (3 months ago)
suraj
04:24 AM
Kishore Nallan
06:22 AM
suraj
11:49 AM
class SearchParameters {
q: m&m
queryBy: security_nm,symbol,brand.brandName
queryByWeights: 3,2,1
prefix: true,true,true
infix: null
maxExtraPrefix: null
maxExtraSuffix: null
filterBy: null
sortBy: null
facetBy: null
maxFacetValues: null
facetQuery: null
numTypos: null
page: null
perPage: 100
groupBy: null
groupLimit: null
includeFields: null
excludeFields: null
highlightFullFields: null
highlightAffixNumTokens: null
highlightStartTag: null
highlightEndTag: null
snippetThreshold: null
dropTokensThreshold: null
typoTokensThreshold: null
pinnedHits: null
hiddenHits: null
highlightFields: null
splitJoinTokens: null
preSegmentedQuery: true
enableOverrides: null
prioritizeExactMatch: false
maxCandidates: null
prioritizeTokenPosition: null
exhaustiveSearch: null
searchCutoffMs: null
useCache: null
cacheTtl: null
minLen1typo: null
minLen2typo: null
}
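For reference, the parameter dump above maps roughly onto a typesense-java call like the sketch below. The collection name ("companies"), node settings, and API key are placeholders rather than values from the thread; the query fields come from the dump itself. Note the preSegmentedQuery(true) call, which the rest of the thread identifies as the trigger for the 400 response.

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import org.typesense.api.Client;
import org.typesense.api.Configuration;
import org.typesense.model.SearchParameters;
import org.typesense.model.SearchResult;
import org.typesense.resources.Node;

public class SearchSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust to your deployment.
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("http", "localhost", "8108"));
        Client client = new Client(new Configuration(nodes, Duration.ofSeconds(5), "xyz"));

        // Mirrors the parameter dump above ("companies" is an assumed collection name).
        SearchParameters params = new SearchParameters()
                .q("m&m")
                .queryBy("security_nm,symbol,brand.brandName")
                .queryByWeights("3,2,1")
                .prefix("true,true,true")
                .perPage(100)
                // The flag the thread later identifies as the problem: the query
                // is not actually pre-segmented, so this should not be set at all.
                .preSegmentedQuery(true);

        SearchResult result = client.collections("companies").documents().search(params);
        System.out.println(result.getFound());
    }
}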
suraj
11:50 AM
<!doctype html><html lang="en"><head><title>HTTP Status 400 – Bad Request</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 400 – Bad Request</h1></body></html>
Kishore Nallan FYR
Jun 26, 2023 (3 months ago)
suraj
04:14 AM
Kishore Nallan
10:51 AM
/multi_search end-point.
suraj
12:07 PM
suraj
12:08 PM
"symbol":"Ram&Madhu","company_nm":"Mahindra & Mahindra","short_name":"M&M"
suraj
12:08 PM
suraj
03:46 PM
Kishore Nallan
03:50 PM
m&m does fetch that record for me. Please share a sample dataset so I can try it, for e.g. like this (as Jason shared earlier): https://gist.github.com/jasonbosco/7c3432713216c378472f13e72246f46b
Jun 27, 2023 (3 months ago)
suraj
06:26 AM
suraj
06:27 AM
suraj
06:29 AM
suraj
06:30 AM
Kishore Nallan
06:34 AM
suraj
06:39 AM
suraj
06:40 AM
Kishore Nallan
06:41 AM
suraj
06:42 AM
Kishore Nallan
11:55 AM
Field `security` has been declared in the schema, but is not found in the document.
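That error means at least one document was sent without the security field even though the schema declares it. One way to handle it, sketched below under the assumption that the field can legitimately be empty or absent, is to declare security as optional when creating the collection. The optional(true) setter is assumed from the generated Field model, the collection name is a placeholder, and the field names follow the sample documents shared just below.

import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.typesense.api.Client;
import org.typesense.api.Configuration;
import org.typesense.model.CollectionSchema;
import org.typesense.model.Field;
import org.typesense.resources.Node;

public class CreateCollectionSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust to your deployment.
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("http", "localhost", "8108"));
        Client client = new Client(new Configuration(nodes, Duration.ofSeconds(5), "xyz"));

        // Assumed collection name; field names follow the sample documents below.
        CollectionSchema schema = new CollectionSchema();
        schema.name("companies").fields(Arrays.asList(
                new Field().name("symbol").type("string"),
                new Field().name("company_nm").type("string"),
                new Field().name("short_company_nm").type("string"),
                // Optional, so documents that omit the field are still accepted
                // instead of failing with the error quoted above.
                new Field().name("security").type("string").optional(true)
        ));
        client.collections().create(schema);
    }
}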
Kishore Nallan
11:56 AM
suraj
02:29 PM
02:29 PM{"company_id":"3005","parent_id":"16795","exchange_id":"2","security":"","symbol":"M&MFIN","company_nm":"Mahindra & Mahindra Financial Services Ltd","short_company_nm":"M & M Fin. Serv.","code":"INE774D0102","series":"","brand":[{"heading":"Financial Services","brandName":"MAHINDRA FINANCE FINSMART"}]}
{"company_id":"3004","parent_id":"27242","exchange_id":"2","security":"","symbol":"BLKASHYAP","company_nm":"B.L.Kashyap & Sons Ltd","short_company_nm":"B.L.Kashyap","code":"INE350H1032","series":"","brand":[{"heading":"Construction & Civil Engineering","brandName":"BLK"}]}
{"company_id":"4354","parent_id":"28","exchange_id":"2","security":"","symbol":"ARUNAHTEL","company_nm":"Aruna Hotels Ltd","short_company_nm":"Aruna Hotels","code":"INE95701019","series":"","brand":[{"heading":"HotelBusiness","brandName":"ARUNA"}]}
suraj
02:29 PM
Jun 28, 2023 (3 months ago)
suraj
09:03 AM
Kishore Nallan
09:08 AM
suraj
09:09 AM
Kishore Nallan
09:10 AM
suraj
09:10 AM
suraj
09:15 AM
suraj
09:16 AM
suraj
09:17 AM
Kishore Nallan
09:20 AM
suraj
09:38 AM
suraj
09:39 AM
Kishore Nallan
09:40 AM
Kishore Nallan
10:45 AM
Remove the preSegmentedQuery param and it should work.
Kishore Nallan
10:50 AM
Remove tokenSeparators during collection creation as well.
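Putting the two suggestions together, a minimal sketch of the working setup, assuming a "companies" collection and placeholder connection details: the schema is created without tokenSeparators, and the search request omits preSegmentedQuery, so the server's default tokenizer handles values like m&m.

import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.typesense.api.Client;
import org.typesense.api.Configuration;
import org.typesense.model.CollectionSchema;
import org.typesense.model.Field;
import org.typesense.model.SearchParameters;
import org.typesense.model.SearchResult;
import org.typesense.resources.Node;

public class FixedSetupSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust to your deployment.
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("http", "localhost", "8108"));
        Client client = new Client(new Configuration(nodes, Duration.ofSeconds(5), "xyz"));

        // Collection created without tokenSeparators: the default tokenizer
        // already splits on special characters such as '&'.
        CollectionSchema schema = new CollectionSchema();
        schema.name("companies").fields(Arrays.asList(
                new Field().name("symbol").type("string"),
                new Field().name("company_nm").type("string")
        ));
        client.collections().create(schema);

        // Search without preSegmentedQuery: the query "m&m" is tokenized
        // by the server itself, as confirmed working later in the thread.
        SearchParameters params = new SearchParameters()
                .q("m&m")
                .queryBy("symbol,company_nm")
                .prefix("true,true");
        SearchResult result = client.collections("companies").documents().search(params);
        System.out.println(result.getFound());
    }
}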
suraj
01:59 PM
After removing tokenSeparators and preSegmentedQuery, *it is working fine now.*
Thank you for your help. :pray:
Kishore Nallan
02:01 PM
Similar Threads
Cold Start Problem with Dynamic Collections
Adrian reported cold start issues with dynamic collections. Jason suggested using wildcard `*` for query_by parameters, upgrading to `0.25.0.rc34`, and clarified conventions. Adrian's issues were resolved but they reported a limitation that will potentially be addressed.
Large JSONL Documents Import Issue & Resolution
Suraj was having trouble loading large JSONL documents into Typesense server. After several discussions and attempts, it was discovered that the issue was due to data quality. Once the team extracted the data again, the upload process worked smoothly.
Troubleshooting Issues with DocSearch Hits and Scraper Configuration
Rubai encountered issues with search result priorities and ellipsis. Jason helped debug the issue and suggested using different versions of typesense-docsearch.js, updating initialization parameters, and running the scraper on a Linux-based environment. The issues related to hits structure and scraper configuration were resolved.
Utilizing Vector Search and Word Embeddings for Comprehensive Search in Typesense
Bill sought clarification on using vector search with multiple word embeddings in Typesense and using them instead of OpenAI's embedding. Kishore Nallan and Jason informed him that their development version 0.25 supports open source embedding models. They also resolved Bill's concerns regarding search performance, language support, and limitations in the search parameters.
Querying and Indexing Multiple Elements Issues
Krish had trouble querying fields with multiple elements, and Kishore Nallan suggested checking `drop_tokens_threshold`. Krish wanted to force OR mode for tokens, but Kishore Nallan admitted the feature was missing. Krish was able to resolve the issue with URL encoding.