Commit b32597ef authored by Iulian Barbu, committed by GitHub

cumulus/minimal-node: added prometheus metrics for the RPC client (#5572)



# Description

When we start a node with connections to external RPC servers (as a
minimal node), we lack metrics around how many individual calls we're
doing to the remote RPC servers and their duration. This PR adds metrics
that measure durations of each RPC call made by the minimal nodes, and
implicitly how many calls there are.

Closes #5409 
Closes #5689

## Integration

Node operators should be able to track minimal node metrics and act on
what they show. The added metrics can be observed by curl'ing the
Prometheus metrics endpoint of the parachain node (changed from the
relay chain node during review). The metrics live under the
`relay_chain_rpc_interface` namespace (renamed from
`polkadot_parachain_relay_chain_rpc_interface`, since having both
`parachain` and `relay_chain` in the same metric name might be
confusing). Excerpt from the curl output:

```
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.001"} 15
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.004"} 23
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.016"} 23
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.064"} 23
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.256"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="1.024"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="4.096"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="16.384"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="65.536"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="+Inf"} 24
relay_chain_rpc_interface_sum{method="chain_getBlockHash",chain="rococo_local_testnet"} 0.11719075
relay_chain_rpc_interface_count{method="chain_getBlockHash",chain="rococo_local_testnet"} 24
```
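Prometheus histogram buckets are cumulative, so each `le` line counts every call that finished at or below that bound. As a rough reading aid, the sketch below turns the cumulative counts from the excerpt into per-bucket counts and derives the mean call duration from `_sum` and `_count`. The function names are hypothetical and not part of the PR, and the durations are assumed to be in seconds, as is conventional for Prometheus timing histograms.

```rust
/// Converts cumulative Prometheus bucket counts into per-bucket counts.
/// `cumulative` is expected to be ordered by increasing `le` bound.
fn per_bucket_counts(cumulative: &[u64]) -> Vec<u64> {
    let mut previous = 0u64;
    cumulative
        .iter()
        .map(|&current| {
            // saturating_sub guards against malformed (non-monotonic) input.
            let delta = current.saturating_sub(previous);
            previous = current;
            delta
        })
        .collect()
}

/// Mean observed duration, or `None` when nothing was observed yet.
fn mean_duration(sum: f64, count: u64) -> Option<f64> {
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}

fn main() {
    // Cumulative counts copied from the `chain_getBlockHash` excerpt,
    // from le="0.001" up to le="+Inf".
    let cumulative = [15, 23, 23, 23, 24, 24, 24, 24, 24, 24];
    println!("per bucket: {:?}", per_bucket_counts(&cumulative));

    // `_sum` and `_count` from the same excerpt.
    if let Some(mean) = mean_duration(0.11719075, 24) {
        println!("mean duration: {mean:.6}");
    }
}
```

Read this way, 15 of the 24 calls finished within the first bucket, 8 more within the second, and a single call landed in the `le="0.256"` bucket; the mean works out to a little under 0.005.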

## Review Notes

Durations and hit counts are collected with a `HistogramVec`, which
lets us record timings for each RPC client method called from the
minimal node. It can be extended to measure the RPCs against other
dimensions too (status codes, response sizes, etc.). The timing is
measured in `relay-chain-rpc-interface`, in the `request_tracing`
method of the `RelayChainRpcClient` struct, which is the single entry
point for all RPC requests made through the relay-chain-rpc-interface.
Request durations fall into exponential buckets defined by start
`0.001`, factor `4` and count `9`.
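To make the bucket layout concrete, here is a minimal standard-library sketch of how start `0.001`, factor `4` and count `9` expand into the nine finite `le` bounds seen in the metrics output, together with a timing wrapper around a single call site. It is not the PR's code: `exponential_buckets`, `bucket_index` and `timed` are hypothetical stand-ins for what the metrics library and `request_tracing` do, and the elapsed time is assumed to be recorded in seconds.

```rust
use std::time::Instant;

/// Hand-written stand-in for an exponential bucket helper:
/// returns `count` upper bounds, each `factor` times the previous one.
fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    let mut bounds = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        bounds.push(next);
        next *= factor;
    }
    bounds
}

/// Index of the first bucket whose upper bound covers `value`.
/// `None` means the value only falls into the implicit `+Inf` bucket.
fn bucket_index(bounds: &[f64], value: f64) -> Option<usize> {
    bounds.iter().position(|&le| value <= le)
}

/// Runs `call` and returns its result with the elapsed time in seconds.
/// A single wrapper like this is what lets one entry point time every request.
fn timed<T>(call: impl FnOnce() -> T) -> (T, f64) {
    let started = Instant::now();
    let result = call();
    (result, started.elapsed().as_secs_f64())
}

fn main() {
    let bounds = exponential_buckets(0.001, 4.0, 9);
    for le in &bounds {
        println!("le={le:.3}");
    }

    // Hypothetical stand-in for an RPC request.
    let (value, elapsed) = timed(|| 21 * 2);
    println!(
        "result={value}, elapsed={elapsed}, bucket={:?}",
        bucket_index(&bounds, elapsed)
    );
}
```

With a factor of 4, nine buckets span from 0.001 to 65.536, so both very fast and pathologically slow requests are covered without needing many series per method.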

---------

Signed-off-by: Iulian Barbu <[email protected]>
parent d31bb8ac