Hi, I find myself in an odd scenario , which I'm u...
# community-help
s
Hi, I find myself in an odd scenario that I'm unable to replicate on my local machine. System information: Codebase: Node.js server on a Linux machine, Typesense version v26.0. Description: For our site crawler, we've written code that crawls the website and stores SEO information for each URL as a JSON document on the filesystem. Once the crawling is done, we index that data in Typesense in batches of 40 using the following code (here docs is an array of documents):
await client.collections(collectionName).documents().import(docs, {
  action: 'emplace',
  return_id: true,
  dirty_values: 'coerce_or_reject',
});
When I run the code on our dev server (which has both the indexing Node.js server and the Typesense server running), indexing starts to fail after around 10-12 batches, and every subsequent batch fails with a RequestMalformed 400 error. It happens really fast, almost in the blink of an eye. However, when I run this on my local machine (connecting to the same Typesense server running on our dev server), indexing works absolutely fine. I can see that the batches are processed a little more slowly here (on the Node.js server running on my local machine). In both cases, the code and data are the same. I do not see this issue with our production Typesense cluster yet. Related logs: What I see in typesense.log (for the duration of indexing of all batches):
I20250127 10:20:01.153842 838508 raft_server.h:60] Peer refresh succeeded!
I20250127 10:20:10.584830 838502 collection.cpp:5039] Collection site_content_11 is being prepared for alter...
I20250127 10:20:10.585074 838502 collection.cpp:5067] Alter payload validation is successful...
I20250127 10:20:10.585126 838502 collection.cpp:4986] Finished altering 0 document(s).
I20250127 10:20:11.154877 838493 raft_server.cpp:693] Term: 53, pending_queue: 0, last_index: 2769909, committed: 2769909, known_applied: 2769909, applying: 0, pending_writes: 0, queued_writes: 428, local_sequence: 471962177
I20250127 10:20:11.155035 838508 raft_server.h:60] Peer refresh succeeded!
I20250127 10:20:11.597046 838502 collection.cpp:5039] Collection site_content_11 is being prepared for alter...
I20250127 10:20:11.597399 838502 collection.cpp:5067] Alter payload validation is successful...
I20250127 10:20:11.597494 838502 collection.cpp:4986] Finished altering 0 document(s).
I20250127 10:20:21.156010 838493 raft_server.cpp:693] Term: 53, pending_queue: 0, last_index: 2769922, committed: 2769922, known_applied: 2769922, applying: 0, pending_writes: 0, queued_writes: 428, local_sequence: 471963644
I20250127 10:20:21.156138 838508 raft_server.h:60] Peer refresh succeeded!
What I see in our application error logs for every batch after the first 10-12 batches (this is just me calling JSON.stringify on the error object in the catch block):
2025-01-27 10:03:14 -08:00: index-server:prod:indexer: Indexing 40 items to site_content_11 
2025-01-27 10:03:14 -08:00: index-server:prod:indexer: === error indexer.ts [38] === {
2025-01-27 10:03:14 -08:00: index-server:prod:indexer:   "name": "RequestMalformed",
2025-01-27 10:03:14 -08:00: index-server:prod:indexer:   "httpStatus": 400
2025-01-27 10:03:14 -08:00: index-server:prod:indexer: }
Question: Is this some sort of race condition that I need to account for in my code? I can't see any other issue with this.
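A note on why the application log above contains so little detail: `JSON.stringify` serializes only enumerable own properties, and the `message` and `stack` of a JavaScript `Error` are non-enumerable, so any server text carried in the message is dropped. Below is a minimal sketch of a logging helper that copies those fields explicitly. `serializeError` and `FakeRequestError` are hypothetical names for illustration and are not part of typesense-js.

```typescript
// Hypothetical helper: turn any thrown value into a plain object that
// JSON.stringify can print in full.
function serializeError(err: unknown): Record<string, unknown> {
  if (!(err instanceof Error)) {
    return { value: String(err) };
  }
  const source = err as unknown as Record<string, unknown>;
  const out: Record<string, unknown> = {};
  // Copy enumerable own properties first (for example httpStatus).
  for (const key of Object.keys(source)) {
    out[key] = source[key];
  }
  // name lives on the prototype by default, and message and stack are
  // non-enumerable, so they have to be copied by hand.
  out.name = err.name;
  out.message = err.message;
  out.stack = err.stack;
  return out;
}

// Stand-in for an SDK error; this is NOT a real typesense-js class.
class FakeRequestError extends Error {
  httpStatus = 400;
}

const example = new FakeRequestError("Server said: something went wrong");
console.log(JSON.stringify(example)); // the message text is absent
console.log(JSON.stringify(serializeError(example))); // the message text is present
```

Logging `serializeError(error)` instead of the raw error object in the catch block would have surfaced whatever text the error message carried.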
k
400 status code means that somehow the data being passed to the API is not correctly formatted. Whenever you get that error, can you try printing the actual payload?
s
In this case, you mean the "docs" array, right?
k
Whichever call fails with the error, print the POST body that the particular call sent.
s
We're calling the import function of the typesense-js SDK. Is the API call used by import logged somewhere specific? I don't see it in the application logs (in debug mode), and I don't see it in typesense.log either. Am I missing something here?
f
Hey there, regarding the 400 error we're talking about: is it part of the import error response, or is the import call itself failing entirely? Does the error contain the following string?
documents failed during import. Use `error.importResults` from the raised exception to get a detailed error reason for each document.
s
No, it doesn't contain this string. The error thrown by import takes us to the catch block. I have code that handles error.importResults for the case where the import response is 200 but some docs have failed. That code is not executed.
f
https://github.com/typesense/typesense-js/pull/259 I've posted a PR to the js-sdk to add the full payload sent to typesense in the error. So in the catch, you should have the ability to check the full payload sent to Typesense. From that we can check what's going on. In the meantime, if possible, try to debug the underlying Typesense code execution using the maps we produce during our build
👍 1
The PR is merged now. Try catching the error and accessing the httpBody attribute on any instance of a child of TypesenseError. The 400 Request Malformed error is one of them, so try accessing it on that.
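A minimal sketch of a catch-block helper that separates the two failure modes discussed above: a per-document failure, where the error carries `importResults`, and a request-level failure such as the 400 here, where `httpStatus` and `httpBody` are the useful fields. The property names come from this thread. `describeImportFailure`, the duck-typed checks, and the assumption that each import result has a boolean `success` field are illustrative choices made so the snippet runs without the SDK installed.

```typescript
type FailureReport =
  | { kind: "per-document"; failedCount: number }
  | { kind: "request"; httpStatus: number | undefined; httpBody: unknown; message: string }
  | { kind: "unknown"; message: string };

// Hypothetical helper: inspect a caught value without importing SDK classes.
function describeImportFailure(err: unknown): FailureReport {
  if (typeof err !== "object" || err === null) {
    return { kind: "unknown", message: String(err) };
  }
  const e = err as Record<string, unknown>;
  const message = typeof e.message === "string" ? e.message : "";

  // Case 1: the request went through, but some documents were rejected.
  if (Array.isArray(e.importResults)) {
    const failed = e.importResults.filter(
      (r: unknown) =>
        typeof r === "object" && r !== null && (r as Record<string, unknown>).success === false
    );
    return { kind: "per-document", failedCount: failed.length };
  }

  // Case 2: the whole request was rejected (for example with HTTP 400).
  if (typeof e.httpStatus === "number" || e.httpBody !== undefined) {
    return {
      kind: "request",
      httpStatus: typeof e.httpStatus === "number" ? e.httpStatus : undefined,
      httpBody: e.httpBody,
      message,
    };
  }

  return { kind: "unknown", message };
}
```

In the catch block, a `request` result is the one to log in full, since its `httpBody` is the payload that was actually sent.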
s
I took a slightly different approach. Let me know if this helps.
1. I used interceptors to log all HTTPS requests (this also logged the ones made by typesense-js). The POST body of the failing request contained 40 JSON docs in JSONL format. It had the same shape as the first few batches that were imported successfully. There were no errors here.
2. The request URL was https://[hostname:port]/collections/site_content_11/documents/import?action=emplace&return_id=true&dirty_values=coerce_or_reject
3. However, when I logged the 400 response text, it was "custom limit exceeded".
4. It seemed to me that I had hit a "docs per second" or "batches per second" limit. Hence, I added a 2-second timeout (I guess a 1-second timeout would also work) after each batch was sent for processing. This seemed to resolve the issue for me.
P.S. This error showed up all of a sudden. There were no deployments between the last successful run and the first failing run. I will try out the upgrade to v28rc35 as well. I just wanted to get the RCA for this before I do that.
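The pause between batches described in step 4 can be sketched as follows. This is a minimal illustration, not the poster's actual code: `chunk`, `sleep`, and `importWithPause` are hypothetical names, and the import call is passed in as a callback so the snippet runs without the SDK. The defaults of 40 documents and 2000 ms mirror the numbers in the message.

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) {
    throw new Error("size must be positive");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Resolve after roughly `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Send batches one at a time, waiting `pauseMs` between consecutive batches.
// Returns the number of batches sent.
async function importWithPause<T>(
  docs: T[],
  importBatch: (batch: T[]) => Promise<unknown>,
  batchSize = 40,
  pauseMs = 2000
): Promise<number> {
  let sent = 0;
  for (const batch of chunk(docs, batchSize)) {
    if (sent > 0) {
      await sleep(pauseMs); // no pause before the first batch
    }
    await importBatch(batch);
    sent++;
  }
  return sent;
}

// Usage (assumes `client`, `collectionName` and `docs` exist in the caller):
// await importWithPause(docs, (batch) =>
//   client.collections(collectionName).documents().import(batch, {
//     action: 'emplace',
//     return_id: true,
//     dirty_values: 'coerce_or_reject',
//   })
// );
```

A fixed pause like this works around a rate limit but does not explain where the limit comes from.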
f
So it was a 400 like "Request failed with HTTP code 400 | Server said: custom limit exceeded"?
s
Yes
f
This should mean that the request itself is passed to Typesense, something goes wrong, and we pass the message back. However, this message isn't part of any error thrown by the Typesense server. It could be something with Axios itself or Node. Are you running it in serverless or Node?
s
Running it in Node. This message is not part of the error thrown by the import call. I intercepted the request sent by Axios to the Typesense server for the import call, and subsequently the response as well.
f
With no interceptor, is there no message apart from the "Request failed with HTTP code 400 | Server said: " part?
s
Yes
f
This appears to be an injection of the error message at the network/middleware layer, rather than from Typesense or standard Axios error handling. Could you run a quick check to see if any Node.js agent settings are being configured in your app? Particularly around connection pooling or keepAlive settings? Also, if you happen to have node-http-proxy or similar middleware in your stack, that could explain where this message is getting injected. The fact that it's happening consistently after 10-12 batches but only on your dev server (and not local) suggests some environment-specific limit. By the way, are you using any process manager that might be enforcing its own limits? I'm also thinking this could be coming from an nginx reverse proxy sitting in front of your Node.js app. Nginx has configurable rate limiting that can return similar errors, and it's common to see this kind of behavior when hitting its limits.
s
That's interesting. Thanks Fanis. I'll definitely check this out.
🙌 1
Turns out you were right. We had both the IndexServer and the typesense-server running on the same instance. That instance was only available via VPN earlier. Around the time the infra team made it publicly available with a firewall and reverse proxy in front of it, they added the rate limit. I cross-checked, and the issue started then as well. I've updated the IndexServer to make requests to the running Typesense server at localhost:8108 instead. That fixed the problem.
🙌 2
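A sketch of the kind of configuration change described above, assuming the node list shape that the typesense-js client accepts (`nodes` with `host`, `port` and `protocol`, plus `apiKey` and `connectionTimeoutSeconds`). The `buildTypesenseConfig` helper, its `sameHost` flag, the timeout value, and the `search.example.com` hostname are hypothetical; the real hostnames and key are not given in the thread.

```typescript
interface NodeConfig {
  host: string;
  port: number;
  protocol: "http" | "https";
}

interface ClientConfig {
  nodes: NodeConfig[];
  apiKey: string;
  connectionTimeoutSeconds: number;
}

// Hypothetical helper: pick the Typesense node based on where the indexer runs.
function buildTypesenseConfig(sameHost: boolean, apiKey: string): ClientConfig {
  const node: NodeConfig = sameHost
    ? // Indexer and Typesense share an instance: talk to it directly and
      // bypass the public reverse proxy and its rate limit.
      { host: "localhost", port: 8108, protocol: "http" }
    : // Placeholder public endpoint behind the reverse proxy (assumption).
      { host: "search.example.com", port: 443, protocol: "https" };

  return { nodes: [node], apiKey, connectionTimeoutSeconds: 10 };
}

// Usage (assumes the typesense package is installed in the real project):
// const client = new Typesense.Client(buildTypesenseConfig(true, apiKey));
```

Clients that still reach the server through the public endpoint remain subject to the proxy's rate limit.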
f
Happy you found the solution!