#community-help

Improving Typesense Query Performance

TLDR: Jonathan asked about slower-than-expected Typesense query performance (about 200ms for a prefix search over 50 million documents). Jason and Kishore Nallan suggested search parameters to try and explained how prefix search works. After further testing, Jonathan found that other queries returned almost instantly, so the slowdown was specific to the original query.

Nov 08, 2022
Jonathan
07:05 PM
I've been experimenting with various search libs (sonic, meili, typesense, etc.), so far typesense is the easiest to get working with raw hex strings (20 byte blockchain addresses encoded as hex), but the query performance is slower than expected:

i've indexed 50 million items as a test, each one looks something like:

{"dbid": 1337, "address": "64a43130af34f9150030f2a2509a9efbd07fe372"}

querying for "000000" returns 4 items in ~200ms (12 cores, 128gb ram, 4x2TB RAID 0)

200ms is pretty decent, but not "amazing". an in-memory ART (adaptive radix trie, which i believe typesense also uses) can return this in a few ms. does 200ms seem in-line with your expectations?
Jason
07:07 PM
Could you share all the search query params you’re using?
Jason
07:07 PM
And also the exact collection schema?
Jonathan
07:08 PM
QUERY:

curl ""

SCHEMA:

curl "" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "name": "addresses",
    "fields": [
      {"name": "dbid", "type": "int64" },
      {"name": "address", "type": "string" }
    ],
    "default_sorting_field": "dbid"
  }'
Jonathan
07:11 PM
(also running via docker label 0.24.0.rcn28, i meant to try outside of docker but haven't yet)
Jason
07:12 PM
Could you try adding these additional search params:

num_typos=0 & typo_tokens_threshold=0 & drop_tokens_threshold=0 & prioritize_exact_match=false & highlight_fields=none (space added for readability)

and see if that makes a difference performance-wise
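For readers who want to try the same parameters, here is a minimal sketch of how the query string could be assembled. The `q` and `query_by` values are assumptions (the original query URL was not preserved in this thread); the five tuning parameters are copied from Jason's message. The resulting string would be appended to the collection's search endpoint.

```python
from urllib.parse import urlencode

# Assumed base parameters: the original query URL is missing from the thread,
# so "q" and "query_by" are guesses based on the schema and the prefix discussed.
base_params = {
    "q": "000000",
    "query_by": "address",
}

# Parameters suggested by Jason, copied from the message above.
tuning_params = {
    "num_typos": 0,
    "typo_tokens_threshold": 0,
    "drop_tokens_threshold": 0,
    "prioritize_exact_match": "false",
    "highlight_fields": "none",
}

def build_search_query(base, tuning):
    """Return a URL-encoded query string combining both parameter sets."""
    return urlencode({**base, **tuning})

query_string = build_search_query(base_params, tuning_params)
print(query_string)
```

Keeping the tuning parameters in their own dictionary makes it easy to run the same search with and without them and compare timings.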
Jason
07:13 PM
I’ve anecdotally seen slower performance when run via Docker… But could you make sure that the Docker runtime is allowed to use all the cores and memory on the host machine?
Jason
07:13 PM
If that also doesn’t help, could you check if running natively on the host makes a difference?
Jonathan
07:15 PM
adding those fields did not change performance at all, testing docker stuff now
Jonathan
07:52 PM
from 200ms (docker) to 280ms (running on host directly), typesense-server-0.23.1-linux-amd64.tar.gz

that's with a fresh data directory, new index, and restart after creating index. surprising result
Jason
07:53 PM
Hmmm! That was unexpected
Jason
07:54 PM
Kishore Nallan Any idea what’s happening here ^
Jason
07:55 PM
Btw, could you post the output of GET /metrics.json?
Jonathan
07:56 PM
{
  "system_cpu10_active_percentage": "0.00",
  "system_cpu11_active_percentage": "9.09",
  "system_cpu12_active_percentage": "0.00",
  "system_cpu13_active_percentage": "9.09",
  "system_cpu14_active_percentage": "0.00",
  "system_cpu15_active_percentage": "9.09",
  "system_cpu16_active_percentage": "0.00",
  "system_cpu17_active_percentage": "10.00",
  "system_cpu18_active_percentage": "0.00",
  "system_cpu19_active_percentage": "9.09",
  "system_cpu1_active_percentage": "27.27",
  "system_cpu20_active_percentage": "0.00",
  "system_cpu21_active_percentage": "0.00",
  "system_cpu22_active_percentage": "0.00",
  "system_cpu23_active_percentage": "0.00",
  "system_cpu24_active_percentage": "0.00",
  "system_cpu2_active_percentage": "25.00",
  "system_cpu3_active_percentage": "10.00",
  "system_cpu4_active_percentage": "10.00",
  "system_cpu5_active_percentage": "0.00",
  "system_cpu6_active_percentage": "9.09",
  "system_cpu7_active_percentage": "9.09",
  "system_cpu8_active_percentage": "9.09",
  "system_cpu9_active_percentage": "0.00",
  "system_cpu_active_percentage": "6.10",
  "system_disk_total_bytes": "7610737090560",
  "system_disk_used_bytes": "3837115981824",
  "system_memory_total_bytes": "134997864448",
  "system_memory_used_bytes": "71718522880",
  "system_network_received_bytes": "0",
  "system_network_sent_bytes": "0",
  "typesense_memory_active_bytes": "11111964672",
  "typesense_memory_allocated_bytes": "11072338904",
  "typesense_memory_fragmentation_ratio": "0.00",
  "typesense_memory_mapped_bytes": "11397263360",
  "typesense_memory_metadata_bytes": "226870128",
  "typesense_memory_resident_bytes": "11111964672",
  "typesense_memory_retained_bytes": "1533775872"
}
Jason
07:58 PM
Is the 200ms specific to the query 000000? Could you try a random set of other strings to see if it’s consistent?
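The check Jason describes can be sketched as a small timing harness. This is only an illustration: `fake_search` is a hypothetical stand-in for a real HTTP call to the search endpoint, and the prefix length of 6 is an assumption chosen to match the `000000` query.

```python
import random
import time

HEX_DIGITS = "0123456789abcdef"

def random_hex_prefixes(count, length=6, seed=None):
    """Generate `count` random lowercase hex strings of the given length."""
    rng = random.Random(seed)
    return ["".join(rng.choice(HEX_DIGITS) for _ in range(length))
            for _ in range(count)]

def time_queries(search_fn, queries):
    """Call search_fn once per query and return {query: elapsed milliseconds}."""
    timings = {}
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        timings[query] = (time.perf_counter() - start) * 1000.0
    return timings

def fake_search(query):
    # Hypothetical placeholder: replace with a real request to the search server.
    return []

# Compare the suspicious prefix against a handful of random ones.
prefixes = ["000000"] + random_hex_prefixes(5, seed=42)
for query, elapsed_ms in time_queries(fake_search, prefixes).items():
    print(f"{query}: {elapsed_ms:.2f} ms")
```

If one prefix is consistently slower than the random sample, the slowdown is specific to that query rather than to the server or hardware.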
Jonathan
07:58 PM
very good q
Jonathan
07:59 PM
oh wow, that may be it
Jonathan
07:59 PM
wtf
Jonathan
08:01 PM
all other queries are coming back in 0ms, lightning fast
Jason
08:01 PM
😅
Jonathan
08:01 PM
wow
Jonathan
08:01 PM
beautiful
Jonathan
08:01 PM
you made my day

Jonathan
08:01 PM
thank you

Nov 09, 2022
Kishore Nallan
01:38 AM
There's a bunch of stuff we do with prefix searching that is not as straightforward as simply using an ART index directly. For example, we also sort words that match a prefix based on their frequency/popularity of occurrence. So certain popular prefixes could be a bit slower.
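To illustrate the extra work Kishore mentions, here is a toy sketch of a prefix lookup followed by a frequency sort. It is not Typesense's implementation: it swaps the ART for a sorted list with binary search, and all function names are made up for this example.

```python
from bisect import bisect_left
from collections import Counter

def build_index(tokens):
    """Return (sorted vocabulary, token frequency counts)."""
    counts = Counter(tokens)
    return sorted(counts), counts

def prefix_matches(vocab, prefix):
    """Collect every word in the sorted vocabulary that starts with prefix."""
    matches = []
    for word in vocab[bisect_left(vocab, prefix):]:
        if not word.startswith(prefix):
            break  # sorted order: no later word can match
        matches.append(word)
    return matches

def ranked_prefix_matches(vocab, counts, prefix):
    """Sort prefix matches by descending frequency, then alphabetically."""
    return sorted(prefix_matches(vocab, prefix), key=lambda w: (-counts[w], w))

tokens = ["00ab", "00ab", "00cd", "00ab", "00cd", "00ef", "1234"]
vocab, counts = build_index(tokens)
print(ranked_prefix_matches(vocab, counts, "00"))  # ['00ab', '00cd', '00ef']
```

In this sketch the ranking step sorts all k matching words, so a prefix shared by many distinct words costs more than the raw lookup alone.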
Jonathan
01:50 AM
(this particular prefix, 000000, only had 4 matches in 50 million documents, so it may not be due to that, but i acknowledge your point)

i'm surprised and impressed that raw hex strings worked so well with typesense. most other search libraries couldn't handle it