Large JSONL Documents Import Issue & Resolution
TLDR: Suraj was having trouble loading large JSONL documents into a Typesense server. After several rounds of discussion and troubleshooting, the cause turned out to be poor data quality. Once the team extracted the data again, the upload worked smoothly.
Mar 14, 2023 (9 months ago)
Suraj
02:48 PM
{
"message": "Not Ready or Lagging"
}
Can you please advise on the best way to load a large number of documents without having to break them down into a lot of smaller JSONL files? If I load files with about 20k documents, they load fast and smoothly.
Is there any server config or setting that I am missing? Below is the curl command I am sending:
curl "${TYPESENSE_HOST}/collections/AdditionalContacts/documents/import?batch_size=1000" -X POST -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -T additional_contacts_10l-1499999.jsonl | jq
Kishore Nallan
02:50 PM
Kishore Nallan
02:50 PM
Suraj
02:53 PM
Kishore Nallan
02:54 PM
Kishore Nallan
02:55 PM
Suraj
02:57 PM
Jason
03:21 PM
parallel --block -10 -a documents.jsonl --tmpdir /tmp --pipepart --cat 'curl -H "X-TYPESENSE-API-KEY: xyz" -X POST -T {} '
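One way to avoid splitting a large file by hand, in the same spirit as the parallel command above, is to stream the JSONL file in fixed-size chunks and POST each chunk to the import endpoint. The following is a minimal Python sketch, not the approach confirmed in the thread. The helper names `iter_chunks` and `import_chunk` are hypothetical; the endpoint path, `batch_size=1000`, and the API key header are taken from the curl command earlier in the thread; the default chunk size of 20,000 simply mirrors the file size Suraj reported as loading smoothly.

```python
import itertools
import urllib.request
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str], chunk_size: int = 20_000) -> Iterator[str]:
    """Yield newline-joined chunks of at most chunk_size non-empty JSONL lines."""
    cleaned = (line.rstrip("\r\n") for line in lines)
    non_empty = (line for line in cleaned if line.strip())
    while True:
        chunk = list(itertools.islice(non_empty, chunk_size))
        if not chunk:
            return
        yield "\n".join(chunk)


def import_chunk(host: str, api_key: str, collection: str, body: str) -> str:
    """POST one chunk to the import endpoint and return the raw response text.

    The URL shape and header are copied from the curl command in the thread.
    """
    request = urllib.request.Request(
        f"{host}/collections/{collection}/documents/import?batch_size=1000",
        data=body.encode("utf-8"),
        headers={"X-TYPESENSE-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Hypothetical usage (host, key, and file name are placeholders):
# with open("documents.jsonl", encoding="utf-8") as handle:
#     for chunk in iter_chunks(handle):
#         print(import_chunk("http://localhost:8108", "xyz", "AdditionalContacts", chunk))
```

Sending chunks one after another keeps each request small, so a failure can be traced to a narrow range of lines in the source file.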
Atul
05:03 PM
I uploaded around 200k records and it took around 20-30 minutes :face_palm:.
Jason
05:06 PM
Jason
05:07 PM
Suraj
05:53 PM
Mar 15, 2023 (9 months ago)
Suraj
07:38 AM
I have one more question. What should be the next action when I get "message": "Not Ready or Lagging" and the API stats show "pending_write_batches": 675? I left it as is for over 12 hours, but the status is still the same.
In such cases, what next step do you suggest? I have been restarting the Docker container. Is there any command or restart/reset that would let me load more data again?
Kishore Nallan
07:40 AM
Suraj
07:41 AM
Kishore Nallan
07:42 AM
Suraj
07:43 AM
Suraj
07:43 AM
Suraj
07:43 AM
"fields": [
{"name": "id", "type": "string"},
{"name": "kol_id", "type": "string"},
{"name": "master_customer_id", "type": "string"},
{"name": "master_customer_location_id", "type": "string"},
{"name": "title", "type": "string", "facet": true},
{"name": "first_name", "type": "string"},
{"name": "middle_name", "type": "string"},
{"name": "last_name", "type": "string"},
{"name": "full_name", "type": "string"},
{"name": "specialty", "type": "string", "facet": true},
{"name": "country_name", "type": "string", "facet": true},
{"name": "state_name", "type": "string", "facet": true},
{"name": "city_name", "type": "string", "facet": true},
{"name": "postal_code", "type": "string", "facet": true},
{"name": "address_line_1", "type": "string"},
{"name": "npi", "type": "string"},
{"name": "customer_type", "type": "int32"}
]
Suraj
07:44 AM
Kishore Nallan
07:49 AM
Kishore Nallan
07:49 AM
Suraj
07:50 AM
Kishore Nallan
07:50 AM
Suraj
07:50 AM
Suraj
07:51 AM
Kishore Nallan
07:51 AM
Suraj
07:51 AM
Kishore Nallan
07:52 AM
Kishore Nallan
07:52 AM
Suraj
07:52 AM
Suraj
07:55 AM
Kishore Nallan
07:56 AM
Suraj
07:58 AM
I20230314 14:44:14.819993 191 raft_server.h:60] Peer refresh succeeded!
E20230314 14:44:23.821457 167 raft_server.cpp:635] 675 queued writes > healthy write lag of 500
I20230314 14:44:24.821669 167 raft_server.cpp:545] Term: 2, last_index index: 888, committed_index: 888, known_applied_index: 888, applying_index: 0, queued_writes: 675, pending_queue_size: 0, local_sequence: 353275
I20230314 14:44:24.821827 191 raft_server.h:60] Peer refresh succeeded!
E20230314 14:44:32.823074 167 raft_server.cpp:635] 675 queued writes > healthy write lag of 500
I20230314 14:44:34.823442 167 raft_server.cpp:545] Term: 2, last_index index: 888, committed_index: 888, known_applied_index: 888, applying_index: 0, queued_writes: 675, pending_queue_size: 0, local_sequence: 353275
I20230314 14:44:34.823505 191 raft_server.h:60] Peer refresh succeeded!
Suraj
07:59 AM
Kishore Nallan
08:02 AM
Suraj
08:02 AM
Suraj
08:04 AM
I stopped the docker container using docker stop {containerid}
Suraj
08:04 AM
Suraj
08:05 AM
E20230315 08:02:44.906180 164 raft_server.cpp:635] 675 queued writes > healthy write lag of 500
I20230315 08:02:52.907020 164 raft_server.cpp:545] Term: 3, last_index index: 906, committed_index: 906, known_applied_index: 906, applying_index: 0, queued_writes: 675, pending_queue_size: 0, local_sequence: 353326
I20230315 08:02:52.907131 195 raft_server.h:60] Peer refresh succeeded!
E20230315 08:02:53.907241 164 raft_server.cpp:635] 675 queued writes > healthy write lag of 500
I20230315 08:03:02.908504 164 raft_server.cpp:545] Term: 3, last_index index: 906, committed_index: 906, known_applied_index: 906, applying_index: 0, queued_writes: 675, pending_queue_size: 0, local_sequence: 353326
E20230315 08:03:02.908579 164 raft_server.cpp:635] 675 queued writes > healthy write lag of 500
I20230315 08:03:02.908634 190 raft_server.h:60] Peer refresh succeeded!
E20230315 08:03:11.910018 164 raft_server.cpp:635] 675 queued writes > healthy write lag of 500
I20230315 08:03:12.910221 164 raft_server.cpp:545] Term: 3, last_index index: 906, committed_index: 906, known_applied_index: 906, applying_index: 0, queued_writes: 675, pending_queue_size: 0, local_sequence: 353326
I20230315 08:03:12.910256 196 raft_server.h:60] Peer refresh succeeded!
E20230315 08:03:20.911703 164 raft_server.cpp:635] 675 queued writes > healthy write lag of 500
I20230315 08:03:22.912124 164 raft_server.cpp:545] Term: 3, last_index index: 906, committed_index: 906, known_applied_index: 906, applying_index: 0, queued_writes: 675, pending_queue_size: 0, local_sequence: 353326
I20230315 08:03:22.912271 194 raft_server.h:60] Peer refresh succeeded!
I20230315 08:03:27.124313 165 batched_indexer.cpp:279] Running GC for aborted requests, req map size: 1
E20230315 08:03:29.913832 164 raft_server.cpp:635] 675 queued writes > healthy write lag of 500
I20230315 08:03:32.914208 164 raft_server.cpp:545] Term: 3, last_index index: 906, committed_index: 906, known_applied_index: 906, applying_index: 0, queued_writes: 675, pending_queue_size: 0, local_sequence: 353326
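In the log above, `queued_writes` stays at 675 across every status line while the server keeps reporting a healthy write lag of 500. A small Python sketch of how one might detect that pattern from the log text follows. The function names and the "stuck" heuristic (the same above-threshold value repeated in the last few status lines) are assumptions made for illustration, not a definition used by Typesense; the threshold of 500 is read from the error lines in the log.

```python
import re
from typing import Iterable, Optional

# Matches the status lines ("queued_writes: 675"), not the error lines
# ("675 queued writes > healthy write lag of 500").
QUEUED_WRITES = re.compile(r"queued_writes: (\d+)")


def queued_writes(log_line: str) -> Optional[int]:
    """Extract the queued_writes counter from one status line, if present."""
    match = QUEUED_WRITES.search(log_line)
    return int(match.group(1)) if match else None


def is_stuck(log_lines: Iterable[str], healthy_write_lag: int = 500,
             min_samples: int = 3) -> bool:
    """Return True when the last min_samples status lines all report the
    same queued_writes value and that value exceeds healthy_write_lag."""
    samples = [n for n in map(queued_writes, log_lines) if n is not None]
    recent = samples[-min_samples:]
    return (
        len(recent) == min_samples
        and len(set(recent)) == 1
        and recent[0] > healthy_write_lag
    )
```

A counter that is high but falling means the server is catching up; a counter that never changes, as here, suggests the queue is not being drained at all.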
Suraj
08:05 AM
Suraj
08:06 AM
Kishore Nallan
08:11 AM
Suraj
02:37 PM
Suraj
02:38 PM
Suraj
02:39 PM
/data/state/log/log_inprogress_00000000000000000265' to /data/state/log/log_00000000000000000265_00000000000000000271'
I20230315 14:20:03.943948 204 log.cpp:108] Created new segment `/data/state/log/log_inprogress_00000000000000000272' with fd=74
I20230315 14:20:04.604780 166 raft_server.cpp:545] Term: 2, last_index index: 274, committed_index: 273, known_applied_index: 273, applying_index: 0, queued_writes: 17, pending_queue_size: 1, local_sequence: 1554680
I20230315 14:20:04.604878 203 raft_server.h:60] Peer refresh succeeded!
I20230315 14:20:06.574550 206 log.cpp:523] close a full segment. Current first_index: 272 last_index: 278 raft_sync_segments: 0 will_sync: 1 path: /data/state/log/log_00000000000000000272_00000000000000000278
I20230315 14:20:06.574635 206 log.cpp:537] Renamed /data/state/log/log_inprogress_00000000000000000272' to /data/state/log/log_00000000000000000272_00000000000000000278'
I20230315 14:20:06.574759 206 log.cpp:108] Created new segment `/data/state/log/log_inprogress_00000000000000000279' with fd=38
I20230315 14:20:14.605857 166 raft_server.cpp:545] Term: 2, last_index index: 282, committed_index: 282, known_applied_index: 282, applying_index: 0, queued_writes: 18, pending_queue_size: 0, local_sequence: 1611001
I20230315 14:20:14.605947 204 raft_server.h:60] Peer refresh succeeded!
I20230315 14:20:24.607048 166 raft_server.cpp:545] Term: 2, last_index index: 282, committed_index: 282, known_applied_index: 282, applying_index: 0, queued_writes: 18, pending_queue_size: 0, local_sequence: 1611001
I20230315 14:20:24.607146 208 raft_server.h:60] Peer refresh succeeded!
I20230315 14:20:31.222904 167 batched_indexer.cpp:279] Running GC for aborted requests, req map size: 3
I20230315 14:20:34.608918 166 raft_server.cpp:545] Term: 2, last_index index: 282, committed_index: 282, known_applied_index: 282, applying_index: 0, queued_writes: 18, pending_queue_size: 0, local_sequence: 1611001
I20230315 14:20:34.609088 206 raft_server.h:60] Peer refresh succeeded!
I20230315 14:20:44.610810 166 raft_server.cpp:545] Term: 2, last_index index: 282, committed_index: 282, known_applied_index: 282, applying_index: 0, queued_writes: 18, pending_queue_size: 0, local_sequence: 1611001
I20230315 14:20:44.610908 203 raft_server.h:60] Peer refresh succeeded!
Suraj
02:41 PM
Suraj
02:42 PM
Kishore Nallan
02:44 PM
Mar 16, 2023 (9 months ago)
Suraj
02:05 PM
Suraj
02:06 PM
E20230316 09:34:50.361044 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 91: syntax error while parsing object key - invalid string: forbidden character after backslash; last read: '"master_custom\3'; expected string literal
E20230316 09:34:50.435761 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 229: syntax error while parsing value - invalid string: control character U+000D (CR) must be escaped to \u000D or \r; last read: '"PURNe|<U+000D>'
E20230316 09:34:50.513315 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 449: syntax error while parsing object key - invalid string: control character U+0010 (DLE) must be escaped to \u0010; last read: '"customer_s*V<U+0010>'; expected string literal
E20230316 09:34:50.571502 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 477: syntax error while parsing object separator - unexpected ','; expected ':'
E20230316 09:34:50.623791 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 156: syntax error while parsing object key - invalid string: control character U+0018 (CAN) must be escaped to \u0018; last read: '"first_namYs'w<U+0018>'; expected string literal
E20230316 09:34:51.012920 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 345: syntax error while parsing value - invalid literal; last read: '"city_name":>'
E20230316 09:34:51.444113 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 400: syntax error while parsing object key - invalid string: control character U+0013 (DC3) must be escaped to \u0013; last read: '"address_line<U+0013>'; expected string literal
I20230316 09:34:58.492647 1284 raft_server.cpp:545] Term: 5, last_index index: 348, committed_index: 348, known_applied_index: 348, applying_index: 0, queued_writes: 0, pending_queue_size: 0, local_sequence: 1745884
I20230316 09:34:58.492774 1320 raft_server.h:60] Peer refresh succeeded!
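The parse errors above point at raw control characters and broken separators inside individual lines of the JSONL file. One way to locate such lines before uploading is to parse every line locally. This is a minimal Python sketch under that assumption; `find_bad_lines` is a hypothetical helper, and Python's `json` module is used as a stand-in for the server's parser, so the error messages will not match the server log word for word.

```python
import json
from typing import Iterable, List, Tuple


def find_bad_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, error_message) for every non-empty line that
    does not parse as a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        try:
            document = json.loads(text)  # rejects raw control characters in strings
        except json.JSONDecodeError as error:
            problems.append((number, error.msg))
            continue
        if not isinstance(document, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Hypothetical usage. newline="\n" keeps a stray carriage return inside a
# value from being treated as a line break, so line numbers stay accurate.
# with open("documents.jsonl", encoding="utf-8", errors="replace", newline="\n") as handle:
#     for number, message in find_bad_lines(handle):
#         print(number, message)
```

Running a check like this on the extract would show whether the corruption is confined to a few records or spread through the whole file, which helps decide between patching lines and re-extracting the data.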
Suraj
02:09 PM
Suraj
02:10 PM
Kishore Nallan
02:28 PM
Kishore Nallan
02:28 PM
Suraj
02:28 PM
Kishore Nallan
02:29 PM
Suraj
02:30 PM
Suraj
02:30 PM
{"success":true}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7100k    0  244k  100 6855k  26994   739k  0:00:09  0:00:09 --:--:-- 61596
ETA: 160s Left: 3 AVG: 58.22s local:3/37/100%/58.3s
Suraj
02:30 PM
Suraj
02:41 PM
E20230316 09:34:34.579699 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 473: syntax error while parsing value - invalid string: control character U+0006 (ACK) must be escaped to \u0006; last read: '"N0222`*e<U+0006>'
Suraj
02:42 PM
Suraj
02:42 PM
Kishore Nallan
02:45 PM
Kishore Nallan
02:46 PM
Suraj
02:47 PM
Kishore Nallan
02:49 PM
Kishore Nallan
02:49 PM
Suraj
02:50 PM
Suraj
03:31 PM
Suraj
03:31 PM
Suraj
03:32 PM
Suraj
03:32 PM
E20230316 14:57:19.214854 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 224: syntax error while parsing object key - invalid string: control character U+0000 (NUL) must be escaped to \u0000; last read: '"full_nam<U+0000>'; expected string literal
I20230316 14:57:20.373448 1284 raft_server.cpp:545] Term: 5, last_index index: 647, committed_index: 647, known_applied_index: 647, applying_index: 0, queued_writes: 132, pending_queue_size: 0, local_sequence: 3796967
I20230316 14:57:20.373706 1320 raft_server.h:60] Peer refresh succeeded!
E20230316 14:57:22.755046 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 336: syntax error while parsing object separator - invalid literal; last read: '"state_name"='; expected ':'
E20230316 14:57:22.910374 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 310: syntax error while parsing object - invalid literal; last read: '"United States"<U+0017>'; expected '}'
E20230316 14:57:24.886736 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 100: syntax error while parsing object key - invalid string: control character U+001F (US) must be escaped to \u001F; last read: '"master_customer_locatio-<U+001F>'; expected string literal
E20230316 14:57:26.560950 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 446: syntax error while parsing object key - invalid string: control character U+001E (RS) must be escaped to \u001E; last read: '"npiw'<U+001E>'; expected string literal
E20230316 14:57:27.054530 1287 collection.cpp:74] JSON error: [json.exception.parse_error.101] parse error at line 1, column 99: syntax error while parsing object key - invalid string: control character U+0017 (ETB) must be escaped to \u0017; last read: '"master_customer_locatio<U+0017>'; expected string literal
Kishore Nallan
03:34 PM
Suraj
03:34 PM
Suraj
03:35 PM
Suraj
03:35 PM
Kishore Nallan
03:35 PM
?return_id=true parameter to import so that the success line also has the id of the document being imported -- this way we can see where it stops
Suraj
03:36 PM
Kishore Nallan
03:36 PM
3602 MATLOCKXVM
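With `return_id=true` suggested above, each success line in the import response also carries the document id, so the response can be scanned to see where the import stops. A minimal Python sketch of that scan follows. It assumes the response is one JSON object per line with a boolean `success` field and, on success, an `id` field, as described in the thread; the `error` field in the usage example and the helper name `summarize_import` are assumptions for illustration.

```python
import json
from typing import Iterable, List, Optional, Tuple


def summarize_import(response_lines: Iterable[str]) -> Tuple[Optional[str], List[dict]]:
    """Return (id of the last successfully imported document, failed results)."""
    last_ok_id = None
    failures = []
    for line in response_lines:
        if not line.strip():
            continue  # skip blank lines in the response
        result = json.loads(line)
        if result.get("success"):
            # Keep the previous id if a success line carries no id.
            last_ok_id = result.get("id", last_ok_id)
        else:
            failures.append(result)
    return last_ok_id, failures


# Hypothetical usage with a saved response body:
# last_id, failed = summarize_import(open("import_response.jsonl", encoding="utf-8"))
# print("last imported id:", last_id, "failures:", len(failed))
```

The last successful id marks how far the import got, so the next few records in the source file are the first place to look for bad data.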
Suraj
03:36 PM
Suraj
03:37 PM
Suraj
03:37 PM
Suraj
03:37 PM
Suraj
03:37 PM
Kishore Nallan
03:37 PM
Suraj
03:37 PM
Suraj
03:37 PM
Kishore Nallan
03:38 PM
Suraj
03:38 PM
Mar 17, 2023 (9 months ago)
Suraj
06:22 AM
Mar 19, 2023 (9 months ago)
Kishore Nallan
03:09 PM
Mar 21, 2023 (9 months ago)
Suraj
08:03 AM
Thank you again for all your help. This is the version of Typesense I installed on the Ubuntu server. I have yet to try loading the fresh data with the Docker version, but my current setup is good enough for me to continue more R&D before we plan to use it with a customer instance.
Kishore Nallan
08:04 AM