W20250118 15:47:24.963318 27340 replicator.cpp:397 Group def... typesense #community-help

W20250118 15:47:24.963318 27340 replicator.cpp:397...

Ankith

01/18/2025, 5:48 PM

W20250118 15:47:24.963318 27340 replicator.cpp:397] Group default_group fail to issue RPC to 10.13.13.78:8107:8108 _consecutive_error_times=221, [E2][10.13.13.78:8107][E2]peer_id not exist [R1][E2][10.13.13.78:8107][E2]peer_id not exist [R2][E2][10.13.13.78:8107][E2]peer_id not exist [R3][E2][10.13.13.78:8107][E2]peer_id not exist

We are getting this message in the Typesense logs, and Typesense shuts down automatically in production during peak search traffic. Can anyone help me with this? What do I have to do so that the nodes do not go down? FYI: we are running a 6-node cluster in our production environment.

Jason Bosco

01/20/2025, 2:33 AM

Sounds like one or more of the nodes don't have sufficient CPU capacity, and so are not able to talk to the leader to establish quorum

Jason Bosco

01/20/2025, 2:33 AM

You want to add more CPU cores

Ankith

01/20/2025, 1:55 PM

@Jason Bosco Context: We are running a six-node Typesense cluster in our production environment for our search engine. Each node is an n2d-highcpu-48 (48 cores, 48 GB memory) hosted in GCP. This is the search request we are performing on Typesense:

curl --location 'http://localhost:8108/multi_search?q=hey' \
--header 'Content-Type: application/json' \
--header 'X-TYPESENSE-API-KEY: xyz' \
--data '{
    "searches": [
        {
            "collection": "contents",
            "drop_tokens_threshold": 2,
            "filter_by": "visible: true && contentType: [Movies] && vendor: ![vkjdsda] && ((playbackType: playback && provider: [netflix,disney,hotstar]) || (provider:[primeVideo,aha,sony,colors,TataNeu,netflix,disney,hotstar]) || (playbackType: deeplink && provider: [disney]))",
            "max_candidates": 3,
            "min_len_1typo": 2,
            "num_typos": "5,0,0,2",
            "per_page": 10,
            "query_by": "title, language, genres, starcast",
            "query_by_weights": "5, 3, 2, 2",
            "sort_by": "_text_match(buckets: 10):desc, score1(missing_values: last):desc, score2(missing_values: last):desc",
            "typo_tokens_threshold": 20,
            "use_cache": true,
            "page_no": 1
        }
    ]
}'
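
Editorial sketch (not part of the original thread): the provider part of `filter_by` above ORs three clauses, and the provider lists in the first and third clauses are subsets of the list in the second. The Python below brute-forces that the three-clause OR matches the same documents as the second clause alone. It models each filter as plain set membership, which is an assumption about matching semantics, and the extra values `"other"` and `"unlisted"` are made up for the check. Whether simplifying the filter changes CPU usage is not established anywhere in this thread.

```python
from itertools import product

# Provider lists copied from the three OR-ed clauses in the filter_by string.
PLAYBACK_PROVIDERS = {"netflix", "disney", "hotstar"}
ANY_PROVIDERS = {"primeVideo", "aha", "sony", "colors", "TataNeu",
                 "netflix", "disney", "hotstar"}
DEEPLINK_PROVIDERS = {"disney"}


def original_clause(playback_type, provider):
    """The three OR-ed clauses, modelled as exact set membership (assumption)."""
    return (
        (playback_type == "playback" and provider in PLAYBACK_PROVIDERS)
        or provider in ANY_PROVIDERS
        or (playback_type == "deeplink" and provider in DEEPLINK_PROVIDERS)
    )


def simplified_clause(playback_type, provider):
    """Only the middle clause; playback_type no longer matters."""
    return provider in ANY_PROVIDERS


# "other", "unlisted" and None are hypothetical values that stand in for
# anything not named in the filter, including a missing optional field.
playback_types = ["playback", "deeplink", "other", None]
providers = sorted(ANY_PROVIDERS) + ["unlisted", None]

clauses_equivalent = all(
    original_clause(pt, p) == simplified_clause(pt, p)
    for pt, p in product(playback_types, providers)
)
print(clauses_equivalent)
```

If the check holds for the real data, the provider part of the filter could be written as the single `provider:[...]` clause with the eight providers.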

Collection schema (there are 3 million records in the collection):

{
  "name": "contents",
  "fields": [
    {
      "name": "starcast",
      "type": "string[]",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "genres",
      "type": "string[]",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "title",
      "type": "string",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "editorial_index",
      "type": "float",
      "facet": false,
      "optional": true,
      "index": true,
      "sort": true,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "data_engg_index",
      "type": "float",
      "facet": false,
      "optional": true,
      "index": true,
      "sort": true,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "reco_index",
      "type": "float",
      "facet": false,
      "optional": true,
      "index": true,
      "sort": true,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "telecast_date_ts",
      "type": "int64",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": true,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "contentType",
      "type": "string",
      "facet": true,
      "optional": false,
      "index": true,
      "sort": true,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "language",
      "type": "string",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": true,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "maturity_rating",
      "type": "string",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "directors",
      "type": "string[]",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "searchTags",
      "type": "string[]",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "visible",
      "type": "bool",
      "facet": false,
      "optional": false,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "vendor",
      "type": "string",
      "facet": false,
      "optional": true,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "masterId",
      "type": "string",
      "facet": false,
      "optional": true,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "provider",
      "type": "string",
      "facet": true,
      "optional": true,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "playbackType",
      "type": "string",
      "facet": false,
      "optional": true,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    },
    {
      "name": "shareWithPartner",
      "type": "bool",
      "facet": false,
      "optional": true,
      "index": true,
      "sort": false,
      "infix": false,
      "locale": "",
      "stem": false
    }
  ],
  "default_sorting_field": "telecast_date_ts",
  "enable_nested_fields": true,
  "symbols_to_index": [],
  "token_separators": []
}

Despite using high-performance machines with 48 cores each, our production environment can currently handle only 300 requests per second (RPS), which averages about 60 RPS per node. This is notably low, considering we route all search requests exclusively to the follower nodes (5 nodes in total). At the same time, we limit writes to 100–150 operations per second, sent directly to the leader node.

Could you suggest ways to optimize this performance? Please suggest improvements to the query parameters or the schema. Data size: 3+ million records.

Most of the time when an incident occurs, it is because CPU utilization goes above 100%. If we get more search requests at peak time, nodes go down, we see the log messages below in the Typesense logs, and Typesense shuts down automatically.

Errors found:
• W20250119 13:58:03.441246 440569 node.cpp:1559] node default_group:10.130.133.79:8107:8108 request PreVote from 10.130.133.106:8107:8108 error: [E2][10.130.133.106:8107][E2]peer_id not exist
• W20250119 13:58:03.629504 440408 raft_server.cpp:721] Multi-node with no leader: refusing to reset peers.
• W20250119 13:53:03.197127 440408 controller.cpp:1550] SIGINT was installed with 1
• E20250119 13:52:34.970660 472737 raft_server.cpp:780] 6195 lagging entries > healthy write lag of 3000
• E20250119 13:52:34.970606 472737 raft_server.cpp:768] 6195 lagging entries > healthy read lag of 5000
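
Editorial sketch (not part of the original thread): the two `lagging entries` messages above each report a lag count and the threshold it exceeded. A small Python parser, assuming the message wording stays exactly as quoted in this thread, can pull those numbers out of a log file to see how far behind a node fell. The function name `parse_lag` and the `excess` field are made up for illustration.

```python
import re

# Matches the message text quoted in the thread, for example:
#   "6195 lagging entries > healthy write lag of 3000"
LAG_PATTERN = re.compile(r"(\d+) lagging entries > healthy (read|write) lag of (\d+)")


def parse_lag(line):
    """Return lag details for one log line, or None if it is not a lag message."""
    match = LAG_PATTERN.search(line)
    if match is None:
        return None
    lagging, kind, threshold = match.groups()
    return {
        "kind": kind,
        "lagging": int(lagging),
        "threshold": int(threshold),
        "excess": int(lagging) - int(threshold),
    }


# Message tails taken from the log lines quoted in the thread.
sample_lines = [
    "raft_server.cpp:780] 6195 lagging entries > healthy write lag of 3000",
    "raft_server.cpp:768] 6195 lagging entries > healthy read lag of 5000",
    "raft_server.cpp:721] Multi-node with no leader: refusing to reset peers.",
]

lag_events = [event for event in map(parse_lag, sample_lines) if event is not None]
for event in lag_events:
    print(f"{event['kind']}: {event['lagging']} entries, "
          f"{event['excess']} over the threshold of {event['threshold']}")
```

For the two lines quoted here, the node was 3195 entries over its write-lag threshold and 1195 entries over its read-lag threshold.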

Jason Bosco

01/20/2025, 9:23 PM

60 RPS on a 48-core machine with just 3M docs sounds very low. The query you shared doesn't look bad either. So there's some other infrastructure bottleneck at play. You might want to check whether there's a disk IOPS issue, maybe?

Jason Bosco

01/20/2025, 9:24 PM

Also, you want to make sure you're running the latest version of Typesense, v27.1, and if you are, try using the latest RC build: v28.0.rc35
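
Editorial sketch (not part of the original thread): one way to confirm which version each node runs is Typesense's `GET /debug` endpoint, which to my understanding returns a JSON object with `version` and `state` fields (`state` 1 for the leader, 4 for a follower). The Python below assumes that response shape. The helper names and the node addresses in the usage comment are hypothetical.

```python
import json
import urllib.request


def fetch_debug(base_url, api_key, timeout=5.0):
    """Call GET {base_url}/debug and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/debug",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(debug_by_node):
    """Summarize versions and leadership across nodes.

    debug_by_node maps a node name to its /debug response. The "state" and
    "version" keys are assumed; state == 1 is treated as the leader.
    """
    versions = sorted({str(info.get("version")) for info in debug_by_node.values()})
    leaders = sorted(
        node for node, info in debug_by_node.items() if info.get("state") == 1
    )
    return {
        "versions": versions,
        "same_version": len(versions) == 1,
        "leaders": leaders,
    }


# Usage, with hypothetical node addresses and API key:
#   nodes = ["http://10.0.0.1:8108", "http://10.0.0.2:8108"]
#   report = summarize({url: fetch_debug(url, "xyz") for url in nodes})
#   print(report)
```

A healthy cluster should report exactly one leader and a single version across all nodes.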

Ankith

01/21/2025, 6:35 AM

@Jason Bosco Below is a graph of disk I/O, which looks normal. But with the above query and collection schema, 60 RPS is using 100% CPU.

Ankith

01/21/2025, 6:50 AM

Also, we have provisioned 15,000 read and write IOPS on SSD.

Jason Bosco

01/22/2025, 2:38 AM

It's hard to debug this further without complete visibility into the infra. So we only offer this level of performance tuning support on Typesense Cloud with a Business Support plan or above.
