Hi this might be the wrong place but I thought I should prob typesense #contributions

Harrison Burt

01/12/2022, 9:00 PM

Hi, this might be the wrong place but I thought I should probably mention some stuff before writing it all on a blog post without first putting some notes down here 😅 So basically, this was my first time using Typesense (I say using, it was more stress testing / comparison tests) but I found it really quite awkward to set up for the first time, for a few reasons:
• When you use the wrong method, Typesense returns a 404, not a 405 Method Not Allowed, which was a nightmare because it makes you think you've put the wrong URL in.
• Adding bulk documents is annoying. Although I can see why you do line-delimited JSON for bulk docs, I would love the ability to just send an array of objects, especially when you know it all fits into memory, and I believe (if I haven't misunderstood the docs) that Typesense will do it all in memory anyway. Personally I think the docs for the bulk import should be next to the single doc upload, because realistically when you first set the system up, you're probably going to be importing in bulk, no?
• The bulk imports return 200 OK and just ignore an invalid payload. Now this could fully be me doing something dumb, but that's what it seemed like, which was equally quite confusing when you go to upload 20k docs, wait, and then have it just... ignore everything. Only when adding them one at a time did it turn around and say "actually bro your doc doesn't match the schema".
I hope you don't take this as me being massively cynical but err, yeah, it was definitely quite confusing when I first read through everything and the docs.

Jason Bosco

01/13/2022, 12:53 AM

@Harrison Burt Thank you for the detailed feedback! I really appreciate it.

When you use the wrong method, Typesense returns a 404, not a 405 Method Not Allowed, which was a nightmare because it makes you think you've put the wrong URL in.

Interesting, I hadn't considered the HTTP code before, and I've certainly been surprised by some 404s I've seen when I used GET instead of POST or vice versa.

re: JSONL for bulk imports: we use that format primarily for performance, and also to reduce memory consumption during an import. If the input was an array of, say, 1M documents, we would have to first JSON parse the entire array before we can start indexing. Whereas if it's in JSONL format, we can JSON parse line-by-line and do a streaming import. JSON parsing is unfortunately a very resource-heavy operation, so the smaller the JSON string, the better the performance. Also, when we do a streaming import like this, we don't have to store the entire string in memory and then parse it all at once. Instead we can parse line-by-line, which avoids any big memory spikes during indexing. Now of course, we eventually index everything in memory, but having to hold the entire JSON-parsed dataset in memory and then loop through it to index it almost doubles memory requirements, which we don't want.
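For anyone who already has an array of objects in memory, here is a rough sketch of turning it into JSONL on the client side, plus the line-by-line parsing idea described above. The function names (to_jsonl, iter_jsonl) and the sample documents are made up for illustration; this is not Typesense's actual server code.

```python
import json
from typing import Iterable, Iterator


def to_jsonl(docs: Iterable[dict]) -> str:
    """Serialize documents as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(doc) for doc in docs)


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL one line at a time, yielding each document as it is parsed.

    Only a single line is parsed per step, so the whole dataset never has
    to be decoded in one go.
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical sample documents, purely for illustration.
docs = [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}]
payload = to_jsonl(docs)
parsed = list(iter_jsonl(payload.splitlines()))
```

The string in payload is what would go in the body of a bulk import request. Passing a file object to iter_jsonl instead of payload.splitlines() would read a large file lazily, one line at a time.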

Personally I think the docs for the bulk import should be next to the single doc upload, because realistically when you first set the system up, you're probably going to be importing in bulk no?

Great point. Will address this shortly.

The bulk imports return 200 OK and just ignore an invalid payload,

The HTTP response should contain
{success: true}
or
{success: false, error: X}
for every document that was sent in the import. The reason we respond with a 200 is because there might be some documents which were indexed successfully and others that error out, which is what is indicated in the response body. Returning some other error code when a subset of documents errored out and others succeeded felt off, which is why we just return a 200. The 200 is really to indicate that the server processed the whole import. Whether each record went through successfully or not is indicated in the response body.
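Since a 200 only means the import was processed, the client has to read the body to find failures. A minimal sketch of that check, assuming the body holds one JSON result object per line in the same order as the documents that were sent; summarize_import and the sample error text are hypothetical, not part of any Typesense client.

```python
import json


def summarize_import(body: str) -> tuple[int, list[tuple[int, str]]]:
    """Return (number of successes, [(line index, error message), ...])."""
    results = [json.loads(line) for line in body.splitlines() if line.strip()]
    failures = [
        (index, result.get("error", "unknown error"))
        for index, result in enumerate(results)
        if not result.get("success")
    ]
    return len(results) - len(failures), failures


# Hypothetical response body: the first document was indexed, the second rejected.
sample_body = '{"success": true}\n{"success": false, "error": "example schema error"}'
ok_count, failed = summarize_import(sample_body)
```

Checking failed after every import, rather than only the status code, would have surfaced the schema mismatch on the first 20k-document upload.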

Harrison Burt

01/13/2022, 8:55 AM

The HTTP response should contain
{success: true}
or
{success: false, error: X}
for every document that was sent in the import. The reason we respond with a 200 is because there might be some documents which were indexed successfully and others that error out, which is what is indicated in the response body. Returning some other error code when a subset of documents errored out and others succeeded felt off, which is why we just return a 200.

The 200 is really to indicate that the server processed the whole import. Whether each record went through successfully or not is indicated in the response body.

That does make sense, although I think that could do with being made a little more obvious in the docs 😅 It goes on about JSON import via the API, then some version using cat, then something about CSVs, and only then briefly mentions that behaviour, which I spotted when I read through it again just now 😅

👍 1
