# community-help
a
Hi all (also Kishore!), I've set up Typesense and already indexed a collection with about 20k objects; the schema is a single id plus a text field, which is the one indexed. Long thread incoming, but you'll probably appreciate my findings. I have noticed that, when running multiple search requests in a single one (i.e. multi_search), the time it takes to complete the requests does not match the time reported by the individual search_time_ms values.
Check out this pic:
1st col is the text query, 2nd col is the value reported by search_time_ms for each one, 3rd col is the actual time as measured in my code, and the final value is the time it took for everything to run. If you sum all the values in the 2nd column, you get something like 200 ms, but for whatever reason, the whole thing takes about twice that time to execute.
I am running Typesense on localhost, so it's not a network round-trip issue, BUT I still think it is related to how long it takes to transport all the data from TS to my client. Looking at a typical search response, it is quite verbose. Is there a way to disable some of the fields in the search result? Like the snippets, which I don't need.
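For reference, the comparison being described (sum of the reported per-query `search_time_ms` values versus the wall-clock time measured around the whole batch) can be sketched like this; the numbers below are illustrative, not the actual measurements from the screenshot:

```javascript
// Sum the server-side search times reported by Typesense. These exclude
// HTTP parsing, TCP overhead and data transfer.
function reportedTotalMs(results) {
  return results.reduce((sum, r) => sum + r.search_time_ms, 0);
}

// Whatever the server-side sum doesn't account for is client/transport
// overhead: wall-clock time minus the reported total.
function overheadMs(results, wallClockMs) {
  return wallClockMs - reportedTotalMs(results);
}

// Illustrative numbers only:
const results = [
  { search_time_ms: 12 },
  { search_time_ms: 8 },
  { search_time_ms: 15 },
];
console.log(reportedTotalMs(results)); // 35
console.log(overheadMs(results, 70)); // 35
```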
j
@Al Mo The difference between col 2 and col 3 seems to be around 3-6 ms per query. This additional time most likely comes from HTTP-layer parsing, a tiny bit of TCP overhead and the actual data transfer (like you mentioned); `search_time_ms` excludes these. You could stop some document fields from being returned using the `exclude_fields` search parameter, but there's no way to disable snippets at the moment.
There's this Github issue we're tracking to disable highlights: https://github.com/typesense/typesense/issues/260
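A minimal sketch of a multi_search body using `exclude_fields` to trim the response payload. The field names follow the schema mentioned in this thread (a single `id` plus an indexed `text` field); the collection name `documents` is an assumption, so adjust both for your own setup:

```javascript
// Build a multi_search request body where every search excludes the heavy
// `text` field from the returned documents. Note that snippets/highlights
// are still included; only stored document fields can be excluded this way.
function buildMultiSearchBody(queries) {
  return {
    searches: queries.map((q) => ({
      collection: "documents", // assumed collection name
      q,
      query_by: "text",
      exclude_fields: "text",
    })),
  };
}

const body = buildMultiSearchBody(["apple", "banana"]);
console.log(JSON.stringify(body, null, 2));
```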
👍 1
a
Following it, thanks!
@Jason Bosco Ok, FWIW: the same queries, same fields, same data, same server take about 150 ms total to run on Meilisearch. Each query takes about the same as in Typesense, but without the extra overhead. If you find a way to speed this up, it would be a nice performance improvement. I wish I could help more; I am liking TS a lot so far.
j
I see, I wonder if this is specific to multi-search, since there's some JSON parsing involved there on the server side. If it's not too much trouble, would you be able to repeat this test with the single search (documents/search) endpoint?
a
Yes, I just finished that about an hour ago 😄 One path uses multi_search, the other is just Promise.all(<with a lot of single searches>), so they run concurrently. Results are pretty much the same on both paths: same time per query, same overhead, same overall time (±20 ms or so).
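The Promise.all path above can be sketched roughly as follows. The `search` function here is a dummy stand-in for a real single-search client call (e.g. something wrapping the documents/search endpoint), used only so the sketch runs on its own:

```javascript
// Time an async batch end to end, the way the thread measures wall-clock
// time around the whole set of requests.
async function timeBatch(label, fn) {
  const start = Date.now();
  const result = await fn();
  console.log(`${label}: ${Date.now() - start} ms total`);
  return result;
}

// Dummy stand-in for one single-search request; a real version would hit
// the documents/search endpoint for one query.
const search = (q) =>
  new Promise((resolve) => setTimeout(() => resolve({ q, hits: [] }), 10));

(async () => {
  // Many single searches fired concurrently, mirroring the Promise.all path:
  const results = await timeBatch("Promise.all path", () =>
    Promise.all(["apple", "banana", "cherry"].map(search))
  );
  console.log(results.map((r) => r.q)); // results come back in query order
})();
```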
j
Got it, would you be able to share this dataset (say via email)? I used a 2M recipes dataset recently to benchmark Typesense and Meilisearch and that dataset showed faster search response times with Typesense consistently. So I wonder if this is something specific to this dataset...
a
Sure, is JSON fine with you? I can dump the documents array, it's exactly the same on both ms and ts.
j
Yup, JSON is perfect, thank you!
Oh and also the collection schema you used
a
Sure, what's your email?
j
a
Ok, great give me a few mins.
🙏 1
Email sent 👌
j
Thank you! Will take a look
a
Thank you!
k
I will be taking a look at this today. Can you please tell me what version of Typesense you are using locally?
I just tried it on the dataset you shared (with 42K records), and I am not able to reproduce the latency. When I use curl + timing, the entire query including the response finishes in about 74 ms. This is on the Typesense v0.20.0 Docker image. I will email you the query snippet I used so that we can compare results.
a
Hi again Kishore, ok. So you mean the whole multi_search 'query', right (all of them bundled)?
k
Yup, check your email for the exact snippet.
a
I left out a few queries in my pic, but you should still be able to see a big difference between the sum of all the search_time_ms values and the measured time for the whole thing (or not?).
k
I am measuring the end-to-end time taken by the curl request. The gist has the curl request I'm sending; if you can run the same on your localhost, we can compare the times.
a
Ok, let me check.
Hi Kishore, I got your email, but I'm really tired now (2 am). I'll sleep a bit and come back tomorrow with the results you asked for. Have a good day/night!
k
No problem, good night!
a
Hi Kishore, I sent you an email some time ago, but I just wanted to add something. I've also written some code in Node.js that makes an HTTP request to the multi_search endpoint and parses the result. Measured times are virtually identical to using the library for the same thing. So I guess the overhead I'm seeing is just data transfer + JSON parsing; not much that can be done about that. The response from the server for all my queries combined is about 2 MB, so that explains it. I'll wait until there's a way to disable fields in the response; that's the bottleneck in my case.
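To isolate the JSON-parsing share of that overhead, one rough approach is to time `JSON.parse` on a payload of about the same size as the real response (~2 MB here). The synthetic payload below only resembles search hits in shape; it is not the actual dataset:

```javascript
// Time a single JSON.parse call in milliseconds using Node's
// high-resolution clock.
function measureParseMs(jsonString) {
  const start = process.hrtime.bigint();
  JSON.parse(jsonString);
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6; // ns -> ms
}

// Build a roughly 2 MB synthetic payload shaped like search hits:
const hits = Array.from({ length: 20000 }, (_, i) => ({
  id: String(i),
  text: "x".repeat(80),
}));
const payload = JSON.stringify({ results: [{ hits }] });
console.log(`payload size: ~${(payload.length / 1e6).toFixed(1)} MB`);
console.log(`parse time: ${measureParseMs(payload).toFixed(1)} ms`);
```

If the parse time is a small fraction of the observed overhead, the remainder is mostly transfer and HTTP handling.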
k
Got it. Have you tried using `exclude_fields`?
a
Let me see what happens if I remove the 'text' field, which is the heaviest.
(Forgot to come back to this.) It doesn't make a big difference, apparently.
k
Has the size of response dropped when you excluded the field?
But it certainly seems like the issue is with the client: either JSON parsing or something else.
a
Yes, it dropped, but processing time didn't change significantly.
k
Got it. We will now look at it from a JSON-parsing angle. Will keep you posted.