#community-help

Upsert Script Issues and Missing Spaces in Typesense

TLDR Alex is experiencing issues with missing spaces in Typesense, specifically appearing more frequently during upserts. The missing spaces persist despite changing versions and implementing a workaround. Despite detailed troubleshooting with Kishore Nallan, the issue remains unresolved. The problem appears to be random and unpredictable. Further testing and dataset sharing were suggested for resolution.

Mar 31, 2022 (18 months ago)
Alex
01:42 PM
when I run the same upsert script I get this: spaces missing but in different records:
Kishore Nallan
02:21 PM
You confirmed that this is not a user interface rendering issue, right?
Alex
02:30 PM
It happens in several different fields and appears to be more frequent with upserts. The values that have a space missing only change during indexing, not when the UI is being used. This is Typesense v0.22.1. It could still be in the JS layer, I suppose, but that seems unlikely.
Kishore Nallan
02:32 PM
Can you check the API response from the request tab in browser console?
Kishore Nallan
02:33 PM
Also, can you check whether the issue is reproducible locally on 0.22.2?
Alex
02:33 PM
yup I'll do some more debugging, just wanted to know if it's something you might be aware of already.
Kishore Nallan
02:34 PM
We did fix one upsert related issue on 0.22.2, so it could be that.
Alex
02:34 PM
thx
Apr 06, 2022 (17 months ago)
Alex
08:53 PM
Unfortunately the space-removal issue is still present in 0.22.2. I think it's more frequent around special characters, but I've also seen a space removed between two words. Let me know if u can think of anything that would help narrow it down.
Alex
09:23 PM
I've tried replacing spaces with char(255) so non breaking space, but same issue.
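A side note on the code points mentioned here, offered as a sketch rather than a diagnosis: the thread does not say which language or encoding `char(255)` refers to. In Unicode and Latin-1 the non-breaking space is code point 160 (U+00A0) and code point 255 is `ÿ`, while in the old DOS code page 437 the byte 255 does decode to a non-breaking space. The Python check below only illustrates that difference; the `describe` helper is made up for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


space = chr(32)                            # ordinary space
latin1_255 = chr(255)                      # code point 255 in Unicode / Latin-1
cp437_255 = bytes([255]).decode("cp437")   # byte 255 read as DOS code page 437
nbsp = "\u00a0"                            # the Unicode non-breaking space

for ch in (space, latin1_255, cp437_255, nbsp):
    print(describe(ch))
```

If the indexing script runs in an environment where 255 is not a non-breaking space, the "same issue" result might be about a different character than intended, so it may be worth confirming which code point actually reaches Typesense.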
Apr 07, 2022 (17 months ago)
Kishore Nallan
01:12 AM
Have you looked at the actual response from Typesense? Does that contain those spaces? If so, is it possible to share a dataset where this is reproducible?
Alex
11:31 AM
Hard to share a dataset because the missing space is random (re-indexing the same dataset changes where the space is missing). It also appears that it's not purely an indexing issue but more of a TS response issue combined with the index. I can't pin down an actual document. Whether the space is missing depends on how you query. And yes, it's missing in the actual response from TS.
Alex
11:33 AM
{"count":91,"highlighted":"More than $1500","value":"More than $1500"},
{"count":1,"highlighted":"$500- $1500","value":"$500- $1500"}],"field_name":"price_str","stats":{"total_values":3}}]
Kishore Nallan
11:34 AM
Can you check if value of the price_str field in the document object also has this space missing?
Alex
11:35 AM
curl 'http://159.203.31.163/multi_search?x-typesense-api-key=xyz' \
-H 'Connection: keep-alive' \
-H 'Accept: application/json, text/plain, */*' \
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36' \
-H 'Content-Type: text/plain' \
-H 'Origin: http://alex.pc-america.com:3010' \
-H 'Referer: http://alex.pc-america.com:3010/' \
-H 'Accept-Language: pl-PL,pl;q=0.9' \
--data-raw '{"searches":[{"query_by":"concat","sort_by":"instock:desc,popularity:desc,_text_match:desc","prioritize_exact_match":false,"num_typos":1,"drop_tokens_threshold":0,"typo_tokens_threshold":2,"highlight_fields":"concat,description,name","highlight_full_fields":"description,name","collection":"products","q":"","facet_by":"productType,price_str,brand,categories,lastDay,lastWeek,lastMonth,instock,free_shipping,price,_Application/Usage,_Form Factor,_Provided Support,_Service Type,_Service Availability Days,_Service Availability Hours,_Service Duration,_Service Location,_Product Family,_Processor Manufacturer,_Output Receptacles,_Operating System,_Processor Type,_Color,_Processor Model,_Standard Memory,_Operating System Platform,_Processor Speed,_Total Solid State Drive Capacity,_Media Size,_Ethernet Technology,_Graphics Controller Manufacturer,_Graphics Controller Model,_Processor Core,_Weight (Approximate),_Operating System Architecture,_Drive Type,_License Type,_Media Type Supported,_Network Technology,_Graphics Memory Accessibility,_HDMI,_Keyboard Localization,_Operating System Language,_License Quantity,_Input Voltage,_Rack Height,_Total Number of USB Ports,_Screen Size,_Maximum Power Supply Wattage,_Input Current,_Power Rating (VA),_Input Receptacles,_Placement,_Output Voltage,_Touchscreen,_Wireless LAN Standard,_Screen Resolution,_Wireless LAN,_USB Type-C,_Energy Star,_Environmentally Friendly,_Optical Drive Type,_Bluetooth,_Phase,_Device Supported,_Firewall Protection Supported,_Limited Warranty,_Media Type,_PDU Type,_Power Rating (Watt),_Number of Cells,_Mounting Orientation,_Finger Print Reader,_Length,_Keyboard Backlight,_License Validation Period,_Screen Mode,_Host Interface,_Interfaces/Ports Details,_Processor Generation,_Storage Capacity,_TAA Compliant,_Cable Length,_USB,_DisplayPort,_Platform Supported,_Drive Interface,_Memory Technology,_Number of Total Memory Slots,_Port/Expansion Slot Details,_Layer Supported,attrib.lvl0,categories.lvl0","filter_by":"price_str:=[$500 - $1500] && price:=[1490..1501]","max_facet_values":100,"page":1,"per_page":60},{"query_by":"concat","sort_by":"instock:desc,popularity:desc,_text_match:desc","prioritize_exact_match":false,"num_typos":1,"drop_tokens_threshold":0,"typo_tokens_threshold":2,"highlight_fields":"concat,description,name","highlight_full_fields":"description,name","collection":"products","q":"","facet_by":"price_str","filter_by":"price:=[1490..1501]","max_facet_values":100,"page":1,"per_page":1},{"query_by":"concat","sort_by":"instock:desc,popularity:desc,_text_match:desc","prioritize_exact_match":false,"num_typos":1,"drop_tokens_threshold":0,"typo_tokens_threshold":2,"highlight_fields":"concat,description,name","highlight_full_fields":"description,name","collection":"products","q":"","facet_by":"price","filter_by":"price_str:=[$500 - $1500]","max_facet_values":100,"page":1,"per_page":1}]}' \
--compressed \
--insecure
Alex
11:36 AM
If I narrow the price filter price:=[1490..1501] further (to either 1491 or 1500), the space is no longer missing.
Kishore Nallan
11:38 AM
What I meant is: we see that the space looks weird in the facet field in the response -- what about the value of the price_str field in the actual document returned in the result in the hits array in the JSON response?
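The comparison being asked for here (facet value versus the field value in the returned documents) could be sketched roughly as below. This is a minimal sketch, not anything from the Typesense codebase: it assumes a single search result parsed into a dict, with `facet_counts` entries shaped like the fragment pasted earlier and documents under `hits[].document`. The function name and the sample data are made up for illustration.

```python
from typing import Any


def facet_values_missing_from_hits(result: dict[str, Any], field: str) -> list[str]:
    """Return facet values for `field` that no document in `hits` carries."""
    hit_values: set[str] = set()
    for hit in result.get("hits", []):
        value = hit.get("document", {}).get(field)
        # A faceted field may hold a single string or a list of strings.
        if isinstance(value, list):
            hit_values.update(str(v) for v in value)
        elif value is not None:
            hit_values.add(str(value))

    missing = []
    for facet in result.get("facet_counts", []):
        if facet.get("field_name") != field:
            continue
        for entry in facet.get("counts", []):
            if entry["value"] not in hit_values:
                missing.append(entry["value"])
    return missing


# Hand-made sample shaped like one search result; not real data from this thread.
sample = {
    "facet_counts": [
        {
            "field_name": "price_str",
            "counts": [
                {"count": 91, "value": "More than $1500"},
                {"count": 1, "value": "$500- $1500"},
            ],
        }
    ],
    "hits": [
        {"document": {"price_str": "$500 - $1500"}},
        {"document": {"price_str": "More than $1500"}},
    ],
}

print(facet_values_missing_from_hits(sample, "price_str"))
```

Note that this only inspects the page of hits that was returned, so a facet value can show up as "missing" simply because its documents are on another page.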
Alex
11:42 AM
It doesn't look like the actual field has a space missing. Searching the response for price_str":"$500- finds nothing, but price_str":"$500 - has multiple matches.
11:45
Kishore Nallan
11:46 AM
When Typesense sends the value as part of the facet response, we pick that value from any one document that matches the facet. If there is even a single document with the space missing and it gets picked as the representative document, this can happen.
Alex
11:46 AM
The 2nd widget uses a value indexed with spaces replaced by underscores, which are then replaced with spaces again in the widget. That seems to work as a workaround in this case, but there are multiple widgets that can be randomly affected by this, very sporadically.
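The underscore workaround described in this message could look something like the sketch below. The function names are hypothetical and this is not the actual indexing script from the thread; it assumes the facet values themselves never contain a literal underscore, since decoding would otherwise turn it into a space.

```python
def encode_facet_value(value: str) -> str:
    """Replace spaces and non-breaking spaces with underscores before indexing."""
    return value.replace("\u00a0", "_").replace(" ", "_")


def decode_facet_value(value: str) -> str:
    """Turn underscores back into spaces for display in a widget."""
    return value.replace("_", " ")


indexed = encode_facet_value("$500 - $1500")
print(indexed)
print(decode_facet_value(indexed))
```

Any `filter_by` built from a widget selection would then need to use the encoded form, because that is what the index holds.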
Alex
11:47 AM
There is no document that has a missing space tho.
Kishore Nallan
11:47 AM
You verified that with a grep?
Kishore Nallan
11:48 AM
Of the dataset that's indexed into Typesense?
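One way to do the grep-style check being asked about, sketched under the assumption that the indexed dataset is available as JSONL (one document per line). The file name in the usage note and the function name are hypothetical.

```python
import json
from typing import Iterable, Iterator


def find_documents_with_value(
    lines: Iterable[str], field: str, suspect: str
) -> Iterator[dict]:
    """Yield documents from JSONL lines whose `field` equals (or contains) `suspect`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between documents
        doc = json.loads(line)
        value = doc.get(field)
        # Treat a scalar field and a list field the same way.
        values = value if isinstance(value, list) else [value]
        if suspect in values:
            yield doc


# Hand-made lines standing in for a real JSONL file.
jsonl_lines = [
    '{"id": "1", "price_str": "$500 - $1500"}',
    '{"id": "2", "price_str": "More than $1500"}',
]
matches = list(find_documents_with_value(jsonl_lines, "price_str", "$500- $1500"))
print(len(matches))
```

With a real file this could be called as `find_documents_with_value(open("products.jsonl"), "price_str", "$500- $1500")`. An empty result would support the claim that no source document carries the value with the space missing.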
Alex
11:49 AM
that string is generated programmatically. And like I said if I index the same data with spaces as _ they never get trimmed out. Only char(32) or char(255) that I noticed so far can go missing.
Kishore Nallan
11:49 AM
Is it possible for you to share a subset of the dataset that I can use to trigger the issue on my end?
Alex
11:51 AM
Thing is that it happens randomly, about once per 250K items. But when I added another field into the index with _s, it happened once per 1 million items. Although sometimes it might have been like 3 times per 250K.
Kishore Nallan
11:52 AM
But is it easy to reproduce? Meaning, if I index a 250K dataset and facet on a bunch of fields, at least one facet will show the issue?
Alex
12:12 PM
The chances of a missing space increase with repeated upserts on the same collection. It's hard to say whether it's something in the data or something like the time of day at indexing that determines how many records come back with a missing space in the widgets.
Alex
12:14 PM
For instance, just now I ran the 250K upsert script 4 times after deleting the collection and didn't get the missing space a single time. But when I ran the same script on 1 million records, it produced 1 missing space the first time I ran it and a few the 2nd time.
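Counting occurrences after each run, as described in this message, could be automated along these lines. This is a rough sketch: it assumes the list of labels the script generates is known (`expected`) and flags any returned facet value that matches one of them only once all whitespace is stripped. The function name and sample values are illustrative, not taken from the real dataset.

```python
def whitespace_only_mismatches(
    expected: list[str], observed: list[str]
) -> list[tuple[str, str]]:
    """Pair each observed value with the expected value it differs from only in whitespace."""

    def squash(text: str) -> str:
        # str.split() with no arguments splits on any Unicode whitespace,
        # which includes the non-breaking space.
        return "".join(text.split())

    by_key = {squash(label): label for label in expected}
    mismatches = []
    for value in observed:
        original = by_key.get(squash(value))
        if original is not None and original != value:
            mismatches.append((original, value))
    return mismatches


expected_labels = ["$500 - $1500", "More than $1500"]
returned_facets = ["$500- $1500", "More than $1500"]
print(whitespace_only_mismatches(expected_labels, returned_facets))
```

Running this against the facet values after every upsert pass would give a per-run count, which might make the "random" pattern easier to describe.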
Alex
12:15 PM
bizarre issue 😉
Kishore Nallan
12:17 PM
And you are saying that the spacing around "-" looks fine in the actual document value in the Typesense response?
Kishore Nallan
12:17 PM
But hard to verify that since results are paginated.
Alex
12:19 PM
The strange thing is: why would the space be missing in the widget and not in the actual data? Another thing I have is dynamic widgets. Based on the search results I get a list of widgets returned in a widget, and then I create them. Unfortunately a space is sometimes missing in those widget names, so when I go to create them dynamically, they can't reference the actual facet...
Alex
12:24 PM
I tried to find an actual document with a missing space but was unable to. It seems to depend on the number of results returned. In the example I sent earlier it was about 1350 results. If I filtered to fewer than that, the space would not be missing.
Kishore Nallan
12:30 PM
And it's also interesting that only a space is an issue and not when a _ is used, correct?
Alex
12:39 PM
Space or non breaking space, so char(32) or char(255). I'm thinking there is some trim() that is happening on a chunk of data?
Alex
01:07 PM
Do you have a link to ur current non docker RC builds? Wouldn't mind giving that a test with this issue.
Kishore Nallan
01:13 PM
Apr 08, 2022 (17 months ago)
Alex
12:12 PM
Didn't get .23 working with this yet due to the connection issues I mentioned in the other post. But it looks like u should be able to reproduce with a snapshot for 0.22.2
Kishore Nallan
12:20 PM
Yes, if you can just zip the data directory and share it, I can look. However, I'm AFK at the moment, so please DM a link and I can download it and look later.