# community-help
j
I've been experimenting with various search libs (sonic, meili, typesense, etc.). So far typesense is the easiest to get working with raw hex strings (20-byte blockchain addresses encoded as hex), but the query performance is slower than expected: I've indexed 50 million items as a test, and each one looks something like:
{"dbid": 1337, "address": "64a43130af34f9150030f2a2509a9efbd07fe372"}
Querying for "000000" returns 4 items in ~200ms (12 cores, 128 GB RAM, 4x2 TB RAID 0). 200ms is pretty decent, but not "amazing". An in-memory ART (adaptive radix trie, which I believe typesense also uses) can return this in a few ms. Does 200ms seem in line with your expectations?
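For context, a minimal sketch of how documents of that shape could have been bulk-loaded, assuming a JSONL file named addresses.jsonl and an API key exported as TYPESENSE_API_KEY (neither the filename nor the key appears in the thread):
# one {"dbid": ..., "address": "..."} object per line in addresses.jsonl
curl "http://localhost:8108/collections/addresses/documents/import?action=create" \
  -X POST \
  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  --data-binary @addresses.jsonl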
j
Could you share all the search query params you’re using?
And also the exact collection schema?
j
QUERY:
curl "<http://localhost:8108/collections/addresses/documents/search?q=000000&query_by=address>"
SCHEMA:
curl "<http://localhost:8108/collections>" \
  -X POST \
  -H "Content-Type: application/json" '{
    "name": "addresses",
    "fields": [
      {"name": "dbid", "type": "int64" },
      {"name": "address", "type": "string" }
    ],
    "default_sorting_field": "dbid"
  }'
(also running via the Docker image tagged 0.24.0.rcn28; I meant to try outside of Docker but haven't yet)
j
Could you try adding these additional search params:
num_typos=0 & typo_tokens_threshold=0 & drop_tokens_threshold=0 & prioritize_exact_match=false & highlight_fields=none
(space added for readability) and see if that makes a difference performance-wise
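For concreteness, the full search request with those params tacked on might look like this (the API key header is an assumption; it isn't shown anywhere in the thread):
curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
  "http://localhost:8108/collections/addresses/documents/search?q=000000&query_by=address&num_typos=0&typo_tokens_threshold=0&drop_tokens_threshold=0&prioritize_exact_match=false&highlight_fields=none"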
I’ve anecdotally seen slower performance when running via Docker… but could you make sure that the Docker runtime is allowed to use all the cores and memory on the host machine?
If that also doesn’t help, could you check if running natively on the host makes a difference?
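One way to start the container with explicit CPU and memory limits so it can use the whole machine (a sketch only: the host data path and API key are placeholders, and the image tag is the one mentioned earlier in the thread):
# let the container use all 12 cores and most of the 128 GB of RAM
docker run -d --name typesense \
  --cpus="12" \
  --memory="120g" \
  -p 8108:8108 \
  -v /path/to/typesense-data:/data \
  typesense/typesense:0.24.0.rcn28 \
  --data-dir /data \
  --api-key="${TYPESENSE_API_KEY}"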
j
Adding those params did not change performance at all; testing the Docker angle now.
From 200ms (Docker) to 280ms (running directly on the host with typesense-server-0.23.1-linux-amd64.tar.gz). That's with a fresh data directory, a new index, and a restart after creating the index. Surprising result.
j
Hmmm! That was unexpected
@Kishore Nallan Any idea what’s happening here ^
Btw, could you post the output of GET /metrics.json?
j
{
  "system_cpu10_active_percentage": "0.00",
  "system_cpu11_active_percentage": "9.09",
  "system_cpu12_active_percentage": "0.00",
  "system_cpu13_active_percentage": "9.09",
  "system_cpu14_active_percentage": "0.00",
  "system_cpu15_active_percentage": "9.09",
  "system_cpu16_active_percentage": "0.00",
  "system_cpu17_active_percentage": "10.00",
  "system_cpu18_active_percentage": "0.00",
  "system_cpu19_active_percentage": "9.09",
  "system_cpu1_active_percentage": "27.27",
  "system_cpu20_active_percentage": "0.00",
  "system_cpu21_active_percentage": "0.00",
  "system_cpu22_active_percentage": "0.00",
  "system_cpu23_active_percentage": "0.00",
  "system_cpu24_active_percentage": "0.00",
  "system_cpu2_active_percentage": "25.00",
  "system_cpu3_active_percentage": "10.00",
  "system_cpu4_active_percentage": "10.00",
  "system_cpu5_active_percentage": "0.00",
  "system_cpu6_active_percentage": "9.09",
  "system_cpu7_active_percentage": "9.09",
  "system_cpu8_active_percentage": "9.09",
  "system_cpu9_active_percentage": "0.00",
  "system_cpu_active_percentage": "6.10",
  "system_disk_total_bytes": "7610737090560",
  "system_disk_used_bytes": "3837115981824",
  "system_memory_total_bytes": "134997864448",
  "system_memory_used_bytes": "71718522880",
  "system_network_received_bytes": "0",
  "system_network_sent_bytes": "0",
  "typesense_memory_active_bytes": "11111964672",
  "typesense_memory_allocated_bytes": "11072338904",
  "typesense_memory_fragmentation_ratio": "0.00",
  "typesense_memory_mapped_bytes": "11397263360",
  "typesense_memory_metadata_bytes": "226870128",
  "typesense_memory_resident_bytes": "11111964672",
  "typesense_memory_retained_bytes": "1533775872"
}
j
Is the 200ms specific to the query 000000? Could you try a random set of other strings to see if it’s consistent?
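A quick way to compare, sketched as a shell loop (the prefixes are arbitrary picks and the API key header is assumed):
# time a handful of arbitrary hex prefixes against the same collection;
# the JSON response also reports search_time_ms if you want the server-side number
for q in 000000 64a431 f2a250 9a9efb abcdef; do
  printf '%s: ' "$q"
  curl -s -o /dev/null -w '%{time_total}s\n' \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    "http://localhost:8108/collections/addresses/documents/search?q=${q}&query_by=address"
done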
j
very good q
oh wow, that may be it
wtf
all other queries are coming back in 0ms, lightning fast
j
😅
j
wow
beautiful
you made my day
thank you
k
There's a bunch of stuff we do with prefix searching that is not as straightforward as simply using an ART index directly. For example, we also sort words that match a prefix based on their frequency/popularity of occurrence. So certain popular prefixes could be a bit slower.
j
(this particular prefix, 000000, only had 4 matches in 50 million documents, so it may not be due to that, but I acknowledge your point) I'm surprised and impressed that raw hex strings worked so well with typesense. Most other search libraries couldn't handle it.