#community-help

Imputing Documents with Null Array Data Types in Typesense

TLDR Vishal hit errors when inserting documents with NaN/null values in string[] or object[] fields. Jason recommended version 0.25.0.rc35 and confirmed that a missing or null field should not cause an error. Vishal re-checked the JSON Lines conversion and marked the issue resolved.

Jun 09, 2023
Vishal
10:26 PM
I just searched the Slack for the "Null" data issue, where inserting a document with NaN/null values for certain keys causes an error. I am trying to impute the keys with df['xyz'].fillna(value="Null"), but it errors when the datatypes are string[] or object[]. Is there a best practice for imputing documents that contain NaNs in fields whose datatypes are arrays (string[] or object[])?
Jason
10:28 PM
It sounds very similar to this. Could you try this on 0.25.0.rc35?
Vishal
10:29 PM
ok, is this build stable?
Jason
10:30 PM
A few new features are still work in progress, but the existing features in 0.24.1 are stable
Vishal
10:31 PM
ok, what do you recommend? figuring out a valid imputation workaround or 0.24.1 or 0.25.0.rc35?
Jason
10:33 PM
I would recommend 0.25.0.rc35, since it has a couple more fixes
Vishal
10:34 PM
ok, testing
Vishal
10:47 PM
ok, pulled and running; it's still erroring on a key with datatype string[] or object[]
Vishal
10:48 PM
I set every single field to optional = True when I created the schema
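For reference, a schema of the kind described here might look like the sketch below. The collection and field names are hypothetical, and the `enable_nested_fields` flag is included on the assumption that `object[]` fields are in use, since Typesense requires it for nested object types.

```python
# Hypothetical collection schema: names are illustrative, not from the thread.
schema = {
    "name": "collection_name",
    # Needed for object / object[] fields (Typesense 0.24 and later).
    "enable_nested_fields": True,
    "fields": [
        {"name": "title", "type": "string", "optional": True},
        {"name": "tags", "type": "string[]", "optional": True},
        {"name": "authors", "type": "object[]", "optional": True},
    ],
}

# With a configured typesense.Client named `client` (assumed to exist),
# the collection would be created with:
# client.collections.create(schema)
```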
Jason
10:48 PM
I see. Could you clone this script and adapt it to reproduce the issue, and then open a GitHub issue with that snippet?
Vishal
10:49 PM
ok, just so we're clear: let's say I have added keys to the schema whose datatypes are string[] and object[]
Vishal
10:49 PM
assume all fields are optional
Vishal
10:50 PM
If I try to insert a document into the collection and the values for those particular keys in the document are missing or NaN, what is the expected behavior?
Jason
10:51 PM
If the field is missing or the value is null in the JSON document, it should not error out
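The distinction between NaN and null matters here because NaN is not a JSON value. As a small illustration (the record below is made up), Python's standard `json` module writes a float NaN as the bare token `NaN`, while `None` becomes a proper JSON `null`:

```python
import json
import math

# Made-up record with a NaN in an array-typed field.
record = {"id": "1", "tags": float("nan")}

# By default json.dumps emits the bare token NaN, which is not valid JSON.
as_nan = json.dumps(record)

# Replacing NaN with None yields a JSON null instead.
cleaned = {
    key: (None if isinstance(value, float) and math.isnan(value) else value)
    for key, value in record.items()
}
as_null = json.dumps(cleaned)

print(as_nan)   # {"id": "1", "tags": NaN}
print(as_null)  # {"id": "1", "tags": null}
```

Passing `allow_nan=False` to `json.dumps` makes it raise `ValueError` on NaN, which can help catch such values before they reach the server.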
Vishal
10:54 PM
hmmm ok if that is the case then let me re-check the jsonlines conversion, may be an issue there
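Since a missing optional field should not error, one possible alternative to imputing array fields is to omit the key entirely before serializing. The helper below is a hypothetical sketch, not something from the thread; it only handles `None` and float NaN, so other pandas missing markers such as `pd.NA` or `NaT` would need extra handling.

```python
import math


def drop_missing(record):
    """Return a copy of `record` without keys whose value is None or NaN."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {key: value for key, value in record.items() if not is_missing(value)}


# Made-up row, as it might come out of a DataFrame.
row = {"id": "1", "title": "Example", "tags": float("nan"), "authors": None}
print(drop_missing(row))  # {'id': '1', 'title': 'Example'}
```

With a DataFrame `df`, this could be applied as `[drop_missing(r) for r in df.to_dict(orient="records")]`.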
Jun 12, 2023
Vishal
02:23 AM
# Assumes `df` (a pandas DataFrame) and `client` (a typesense.Client) already exist.
import jsonlines

# Write the DataFrame to a local JSON Lines file; to_json serializes NaN as null.
with open('test.jsonl', 'w', encoding='utf8') as f:
    f.write(df.to_json(orient='records', lines=True))

# Read the JSON Lines file back and bulk import it into the Typesense server.
with open('test.jsonl', 'r', encoding='utf8') as file:
    documents = list(jsonlines.Reader(file))  # materialize the reader as a list of dicts
    print(client.collections['collection_name'].documents.import_(
        documents, {'batch_size': 1000, 'action': 'create', 'dirty_values': 'coerce_or_drop'}))
Vishal
02:23 AM
[Resolved]
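When debugging an import like the one above, it can help to look at the per-document results rather than just printing them. The sketch below assumes the import call returns one dict per document with a `success` key and, on failure, an `error` message, which is how the Typesense import API reports results; the helper name and the sample data are made up for illustration.

```python
def failed_imports(results):
    """Return (index, error) pairs for documents that were not imported."""
    return [
        (index, result.get("error", "unknown error"))
        for index, result in enumerate(results)
        if not result.get("success", False)
    ]


# Sample results, made up for illustration only.
sample = [
    {"success": True},
    {"success": False, "error": "Field `tags` must be an array.", "code": 400},
]
print(failed_imports(sample))
```

The index in each pair matches the position of the document in the list passed to the import call, which makes it easy to find the offending row.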