#community-help

Upsert Script Issues and Missing Spaces in Typesense

TLDR Alex is seeing spaces randomly go missing from facet values in Typesense responses, more often after repeated upserts on the same collection. The missing spaces persisted after moving from v0.22.1 to v0.22.2 and after trying a workaround that swaps spaces for underscores. Detailed troubleshooting with Kishore Nallan did not resolve the issue, which appears random and hard to reproduce. Kishore asked for a reproducible dataset or a zipped data directory for further testing.

Mar 31, 2022 (21 months ago)
Alex
01:42 PM
when I run the same upsert script I get this: spaces missing but in different records:
Kishore Nallan
02:21 PM
You confirmed that this is not a user interface rendering issue right?
Alex
02:30 PM
It happens in several different fields and appears to be more frequent with upserts. And the values that have a space missing only change during indexing, not when the UI is being used. This is on Typesense v0.22.1. It could still be in the JS layer I suppose, but that seems unlikely.
Kishore Nallan
02:32 PM
Can you check the API response from the request tab in browser console?
Kishore Nallan
02:33 PM
Also can you try checking on 0.22.2 if the issue is reproducible locally?
Alex
02:33 PM
yup I'll do some more debugging, just wanted to know if it's something you might be aware of already.
Kishore Nallan
02:34 PM
We did fix one upsert related issue on 0.22.2, so it could be that.
Alex
02:34 PM
thx
Apr 06, 2022 (21 months ago)
Alex
08:53 PM
unfortunately the space removing issue is still present in 0.22.2. I think it's more frequent around special characters, but also seen it being removed between two words. let me know if u can think of anything that would help to narrow it down?
Alex
09:23 PM
I've tried replacing spaces with char(255) so non breaking space, but same issue.
Apr 07, 2022 (21 months ago)
Kishore Nallan
01:12 AM
Have you looked at the actual response from Typesense? Does that contain those spaces? If so, is it possible to share a dataset where this is reproducible?
Alex
11:31 AM
Hard to share a dataset because the missing space is random (re-indexing the same dataset changes where the space is missing). Also it appears it's not really just an indexing issue but more of a TS response issue combined with the index. I can't pin down an actual document. The space is missing based on how you query. And yes it's missing in the actual response from TS.
Alex
11:33 AM
{"count":91,"highlighted":"More than $1500","value":"More than $1500"},
{"count":1,"highlighted":"$500- $1500","value":"$500- $1500"}],"field_name":"price_str","stats":{"total_values":3}}]
Kishore Nallan
11:34 AM
Can you check if value of the price_str field in the document object also has this space missing?
Alex
11:35 AM
curl 'http://159.203.31.163/multi_search?x-typesense-api-key=xyz' \
-H 'Connection: keep-alive' \
-H 'Accept: application/json, text/plain, */*' \
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36' \
-H 'Content-Type: text/plain' \
-H 'Origin: http://alex.pc-america.com:3010' \
-H 'Referer: http://alex.pc-america.com:3010/' \
-H 'Accept-Language: pl-PL,pl;q=0.9' \
--data-raw '{"searches":[{"query_by":"concat","sort_by":"instock:desc,popularity:desc,_text_match:desc","prioritize_exact_match":false,"num_typos":1,"drop_tokens_threshold":0,"typo_tokens_threshold":2,"highlight_fields":"concat,description,name","highlight_full_fields":"description,name","collection":"products","q":"
","facet_by":"productType,price_str,brand,categories,lastDay,lastWeek,lastMonth,instock,free_shipping,price,_Application/Usage,_Form Factor,_Provided Support,_Service Type,_Service Availability Days,_Service Availability Hours,_Service Duration,_Service Location,_Product Family,_Processor Manufacturer,_Output Receptacles,_Operating System,_Processor Type,_Color,_Processor Model,_Standard Memory,_Operating System Platform,_Processor Speed,_Total Solid State Drive Capacity,_Media Size,_Ethernet Technology,_Graphics Controller Manufacturer,_Graphics Controller Model,_Processor Core,_Weight (Approximate),_Operating System Architecture,_Drive Type,_License Type,_Media Type Supported,_Network Technology,_Graphics Memory Accessibility,_HDMI,_Keyboard Localization,_Operating System Language,_License Quantity,_Input Voltage,_Rack Height,_Total Number of USB Ports,_Screen Size,_Maximum Power Supply Wattage,_Input Current,_Power Rating (VA),_Input Receptacles,_Placement,_Output Voltage,_Touchscreen,_Wireless LAN Standard,_Screen Resolution,_Wireless LAN,_USB Type-C,_Energy Star,_Environmentally Friendly,_Optical Drive Type,_Bluetooth,_Phase,_Device Supported,_Firewall Protection Supported,_Limited Warranty,_Media Type,_PDU Type,_Power Rating (Watt),_Number of Cells,_Mounting Orientation,_Finger Print Reader,_Length,_Keyboard Backlight,_License Validation Period,_Screen Mode,_Host Interface,_Interfaces/Ports Details,_Processor Generation,_Storage Capacity,_TAA Compliant,_Cable Length,_USB,_DisplayPort,_Platform Supported,_Drive Interface,_Memory Technology,_Number of Total Memory Slots,_Port/Expansion Slot Details,_Layer Supported,attrib.lvl0,categories.lvl0","filter_by":"price_str:=[$500 - $1500] && 
price:=[1490..1501]","max_facet_values":100,"page":1,"per_page":60},{"query_by":"concat","sort_by":"instock:desc,popularity:desc,_text_match:desc","prioritize_exact_match":false,"num_typos":1,"drop_tokens_threshold":0,"typo_tokens_threshold":2,"highlight_fields":"concat,description,name","highlight_full_fields":"description,name","collection":"products","q":"","facet_by":"price_str","filter_by":"price:=[1490..1501]","max_facet_values":100,"page":1,"per_page":1},{"query_by":"concat","sort_by":"instock:desc,popularity:desc,_text_match:desc","prioritize_exact_match":false,"num_typos":1,"drop_tokens_threshold":0,"typo_tokens_threshold":2,"highlight_fields":"concat,description,name","highlight_full_fields":"description,name","collection":"products","q":"","facet_by":"price","filter_by":"price_str:=[$500 - $1500]","max_facet_values":100,"page":1,"per_page":1}]}' \
--compressed \
--insecure
Alex
11:36 AM
if I try to narrow the price filter beyond price:=[1490..1501], to either 1491 or 1500, then the space is not missing anymore.
Kishore Nallan
11:38 AM
What I meant is: we see that the space looks weird in the facet field in the response -- what about the value of the price_str field in the actual document returned in the result in the hits array in the JSON response?
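The comparison being asked for here can be sketched in Python. This is only an illustration, assuming the parsed JSON of a single search result has the usual `facet_counts` and `hits` keys; the helper name `facet_values_missing_from_hits` and the sample data are hypothetical, with the values borrowed from the response pasted above.

```python
def facet_values_missing_from_hits(result, field):
    """Return facet values for `field` that none of the returned hit documents carry.

    `result` is one parsed search result (for multi_search, one entry of "results").
    """
    facet_values = {
        count["value"]
        for facet in result.get("facet_counts", [])
        if facet.get("field_name") == field
        for count in facet.get("counts", [])
    }

    doc_values = set()
    for hit in result.get("hits", []):
        value = hit.get("document", {}).get(field)
        if isinstance(value, list):      # string[] fields
            doc_values.update(value)
        elif value is not None:          # plain string fields
            doc_values.add(value)

    return sorted(facet_values - doc_values)


# Hypothetical sample shaped like the response in this thread.
sample_result = {
    "facet_counts": [{
        "field_name": "price_str",
        "counts": [
            {"count": 91, "highlighted": "More than $1500", "value": "More than $1500"},
            {"count": 1, "highlighted": "$500- $1500", "value": "$500- $1500"},
        ],
    }],
    "hits": [
        {"document": {"price_str": "$500 - $1500"}},
        {"document": {"price_str": "More than $1500"}},
    ],
}

print(facet_values_missing_from_hits(sample_result, "price_str"))
```

Because hits are paginated, a facet value that is absent from one page is not proof that no document carries it; the check is only conclusive when every matching document is included.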
Alex
11:42 AM
doesn't look like the actual field has a space missing. Searching the response for price_str":"$500- doesn't find anything, but there are multiple results for price_str":"$500 -
Kishore Nallan
11:46 AM
When Typesense sends the value as part of the facet response, what happens is that we pick that value from any document that matches the facet. If there is even a single document that has a missing space and it gets picked as the representative document, then this can happen.
Alex
11:46 AM
the 2nd widget is with space replaced by underscore and then replaced again with space in the widget, seems to work as a workaround in this case. but there are multiple widgets that can be randomly affected by this, very sporadically.
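A minimal sketch of the underscore workaround described here, in Python. The function names are hypothetical: one would be applied to the value before indexing and the other when rendering the widget label.

```python
def encode_facet_value(value: str) -> str:
    """Replace spaces with underscores before the value is indexed."""
    return value.replace(" ", "_")


def decode_facet_value(value: str) -> str:
    """Turn underscores back into spaces when rendering the facet label."""
    return value.replace("_", " ")


indexed = encode_facet_value("$500 - $1500")
print(indexed)
print(decode_facet_value(indexed))
```

The round trip is only lossless for values that contain no underscores of their own, so a field with genuine underscores would need a different placeholder.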
Alex
11:47 AM
There is no document that has a missing space tho.
Kishore Nallan
11:47 AM
You verified that with a grep?
Kishore Nallan
11:48 AM
Of the dataset that's indexed into Typesense?
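One way to run that check without grep is to tally every distinct value of the field across the indexed data. The Python sketch below assumes the documents are available as JSON Lines (one JSON document per line, as in an import file or a collection export); the function name and sample lines are hypothetical.

```python
import json
from collections import Counter


def count_field_values(jsonl_lines, field):
    """Count how often each distinct value of `field` occurs in JSONL documents."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = json.loads(line).get(field)
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v is not None:
                counts[v] += 1
    return counts


# Hypothetical sample; in practice iterate over the lines of the JSONL file.
sample_lines = [
    '{"id": "1", "price_str": "$500 - $1500"}',
    '{"id": "2", "price_str": "More than $1500"}',
    '{"id": "3", "price_str": "$500 - $1500"}',
]

for value, count in count_field_values(sample_lines, "price_str").items():
    print(repr(value), count)
```

If a value such as "$500- $1500" never shows up in the tally, the source data does not contain it, which also sidesteps the pagination limits of checking search results.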
Alex
11:49 AM
that string is generated programmatically. And like I said if I index the same data with spaces as _ they never get trimmed out. Only char(32) or char(255) that I noticed so far can go missing.
Kishore Nallan
11:49 AM
Is it possible for you to share a subset of the dataset that I can use to trigger the issue on my end?
Alex
11:51 AM
Thing is that it happens randomly, about once per 250K items. But when I added another field into the index with _s it happened once per 1 million items. Although sometimes it might have been like 3 times per 250K.
Kishore Nallan
11:52 AM
But is it easy to reproduce? Meaning if I index a 250K dataset and do a facet on a bunch of fields, at least one facet will show that issue?
Alex
12:12 PM
The chances of the missing space occurring increase with running upserts on the same collection. It's hard to say whether it's something in the data, or almost the time of day at the time of indexing, that determines how many records will return with a missing space in the widgets.
Alex
12:14 PM
for instance right now I ran the 250K upsert script 4 times after deleting the collection and there was not a single time I got the missing space. But when I ran the same script on 1 million records it produced 1 missing space the first time I ran it, and a few the 2nd time.
Alex
12:15 PM
bizarre issue 😉
Kishore Nallan
12:17 PM
And you are saying that "-" spacing looks fine on the actual document value in the Typesense response?
Kishore Nallan
12:17 PM
But hard to verify that since results are paginated.
Alex
12:19 PM
The strange thing is that why would the space be missing in the widget and not the actual data? Another thing I have is dynamic widgets. So based on the search results I have a list of widgets returned in a widget and then I create them. unfortunately sometimes a space is missing in those names of widgets, so when I go to dynamically create them, they can't reference the actual facet...
Alex
12:24 PM
I tried to find an actual document with a missing space but was unable to. It seems to depend on the amount returned. In the example I sent earlier it was about 1350 results. If I filtered to fewer, then the space would not be missing.
Kishore Nallan
12:30 PM
And it's also interesting that only a space is an issue and not when a _ is used, correct?
Alex
12:39 PM
Space or non breaking space, so char(32) or char(255). I'm thinking there is some trim() that is happening on a chunk of data?
Alex
01:07 PM
Do you have a link to ur current non docker RC builds? Wouldn't mind giving that a test with this issue.
Kishore Nallan
01:13 PM
Apr 08, 2022 (21 months ago)
Alex
12:12 PM
Didn't get .23 working with this yet due to the connection issues I mentioned in the other post. But it looks like u should be able to reproduce with a snapshot for 0.22.2
Kishore Nallan
12:20 PM
Yes, if you can just zip the data directory and share it, I can look. However I'm AFK at the moment, so please DM a link and I can download it and look later.
