Alexandru Vasile authored
network/strategy: Backoff and ban overloaded peers to avoid submitting the same request multiple times (#5029)

This PR avoids submitting the same block or state request multiple times
to the same slow peer.

Previously, we submitted the same request to the same slow peer multiple
times, which resulted in reputation bans on the slow peer's side.
Furthermore, the strategy kept selecting the same slow peer for new
queries, even when a better candidate may have existed.

Instead, in this PR we:
- introduce `DisconnectedPeers`, an LRU cache with a capacity of 512 peers,
which tracks only the state of peers that disconnected with a request in flight
- when `DisconnectedPeers` detects that a peer disconnected with a
request in flight, the peer is backed off
  - on the first disconnection: 60 seconds
  - on the second disconnection: 120 seconds
  - on the third disconnection: the peer is banned, and it remains
banned until the peerstore decays its reputation
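
The backoff schedule above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the names `DisconnectedPeersSketch`, `Outcome`, `on_disconnect` and `is_available` are hypothetical, peer IDs are plain strings, and a `HashMap` that evicts the oldest entry stands in for a real LRU cache.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Values taken from the description above; the constant names are hypothetical.
const MAX_TRACKED_PEERS: usize = 512;
const BACKOFF_PER_DISCONNECT: Duration = Duration::from_secs(60);
const MAX_DISCONNECTS_BEFORE_BAN: u32 = 3;

#[derive(Debug, Clone, Copy)]
struct DisconnectedState {
    num_disconnects: u32,
    last_disconnect: Instant,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// The peer should not receive requests for this long.
    BackedOff(Duration),
    /// The peer reached the disconnect limit and should be banned.
    Banned,
}

struct DisconnectedPeersSketch {
    peers: HashMap<String, DisconnectedState>,
}

impl DisconnectedPeersSketch {
    fn new() -> Self {
        Self { peers: HashMap::new() }
    }

    /// Record a disconnect. Only disconnects with a request in flight are tracked.
    fn on_disconnect(&mut self, peer: &str, request_in_flight: bool, now: Instant) -> Option<Outcome> {
        if !request_in_flight {
            return None;
        }
        // Assumption: evicting the entry with the oldest disconnect approximates LRU.
        if !self.peers.contains_key(peer) && self.peers.len() >= MAX_TRACKED_PEERS {
            let oldest = self
                .peers
                .iter()
                .min_by_key(|(_, state)| state.last_disconnect)
                .map(|(id, _)| id.clone());
            if let Some(id) = oldest {
                self.peers.remove(&id);
            }
        }
        let state = self
            .peers
            .entry(peer.to_string())
            .or_insert(DisconnectedState { num_disconnects: 0, last_disconnect: now });
        state.num_disconnects += 1;
        state.last_disconnect = now;
        let count = state.num_disconnects;

        if count >= MAX_DISCONNECTS_BEFORE_BAN {
            // Stop tracking the peer; the caller is expected to report the ban.
            self.peers.remove(peer);
            Some(Outcome::Banned)
        } else {
            // 60 seconds after the first disconnect, 120 seconds after the second.
            Some(Outcome::BackedOff(BACKOFF_PER_DISCONNECT * count))
        }
    }

    /// A peer is available if it is not tracked or its backoff has elapsed.
    fn is_available(&self, peer: &str, now: Instant) -> bool {
        match self.peers.get(peer) {
            None => true,
            Some(state) => {
                now.saturating_duration_since(state.last_disconnect)
                    >= BACKOFF_PER_DISCONNECT * state.num_disconnects
            }
        }
    }
}
```

In this sketch the sync strategy would call `is_available` before picking a peer for a block or state request, and would translate `Outcome::Banned` into a reputation report.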
  
This PR lifts the pressure from overloaded nodes that cannot process
requests in due time.
And if a peer is detected to be slow after backoffs, the peer is banned.

Theoretically, submitting the same request multiple times can still
happen when:
- (a) we back off and ban the peer
- (b) the network does not discover other peers -- this may also be a
test net
- (c) the peer gets reconnected after the reputation decay and is still
slow to respond



Aims to improve:
- https://github.com/paritytech/polkadot-sdk/issues/4924
- https://github.com/paritytech/polkadot-sdk/issues/531

Next Steps:
- Investigate the network after this is deployed, possibly bumping the
keep-alive timeout or seeing if there's something else misbehaving




This PR builds on top of:
- https://github.com/paritytech/polkadot-sdk/pull/4987


### Testing Done
- Added a couple of unit tests where a test harness was already in place

- Local testnet

```bash
13:13:25.102 DEBUG tokio-runtime-worker sync::persistent_peer_state: Added first time peer 12D3KooWHdiAxVd8uMQR1hGWXccidmfCwLqcMpGwR6QcTP6QRMuD

13:14:39.102 DEBUG tokio-runtime-worker sync::persistent_peer_state: Remove known peer 12D3KooWHdiAxVd8uMQR1hGWXccidmfCwLqcMpGwR6QcTP6QRMuD state: DisconnectedPeerState { num_disconnects: 2, last_disconnect: Instant { tv_sec: 93355, tv_nsec: 942016062 } }, should ban: false

13:16:49.107 DEBUG tokio-runtime-worker sync::persistent_peer_state: Remove known peer 12D3KooWHdiAxVd8uMQR1hGWXccidmfCwLqcMpGwR6QcTP6QRMuD state: DisconnectedPeerState { num_disconnects: 3, last_disconnect: Instant { tv_sec: 93485, tv_nsec: 947551051 } }, should ban: true

13:16:49.108  WARN tokio-runtime-worker peerset: Report 12D3KooWHdiAxVd8uMQR1hGWXccidmfCwLqcMpGwR6QcTP6QRMuD: -2147483648 to -2147483648. Reason: Slow peer after backoffs. Banned, disconnecting.
```

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
6619277b


Polkadot SDK


The Polkadot SDK repository provides all the components needed to start building on the Polkadot network, a multi-chain blockchain platform that enables different blockchains to interoperate and share information in a secure and scalable way.

⚡ Quickstart

If you want to get an example node running quickly you can execute the following getting started script:

curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/paritytech/polkadot-sdk/master/scripts/getting-started.sh | bash

📚 Documentation

🚀 Releases

[!NOTE] Our release process is still Work-In-Progress and may not yet reflect the aspired outline here.

The Polkadot-SDK has two release channels: stable and nightly. Production software is advised to only use stable. nightly is meant for tinkerers to try out the latest features. The detailed release process is described in RELEASE.md.

You can use psvm to manage your Polkadot-SDK dependency versions in downstream projects.

😌 Stable

stable releases have a support duration of three months. In this period, the release will not have any breaking changes. It will receive bug fixes, security fixes, performance fixes and new non-breaking features on a two-week cadence.

🤠 Nightly

nightly releases are released every night from the master branch, potentially with breaking changes. They have pre-release version numbers in the format major.0.0-nightlyYYMMDD.
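
As an illustration of that version format, here is a small sketch that builds such a string from a major version and a date. The helper `nightly_version` is hypothetical and is not part of the release tooling.

```rust
/// Hypothetical helper: formats a version as `major.0.0-nightlyYYMMDD`.
fn nightly_version(major: u32, year: u32, month: u32, day: u32) -> String {
    // Keep only the last two digits of the year and zero-pad every date field.
    format!("{}.0.0-nightly{:02}{:02}{:02}", major, year % 100, month, day)
}
```

For example, under these assumptions a nightly with major version 1 built on 15 July 2024 would be named `1.0.0-nightly240715`.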

🛠️ Tooling

Polkadot SDK Version Manager: A simple tool to manage and update the Polkadot SDK dependencies in any Cargo.toml file. It will automatically update the Polkadot SDK dependencies to their correct crates.io version.

🔐 Security

The security policy and procedures can be found in docs/contributor/SECURITY.md.

🤍 Contributing & Code of Conduct

Ensure you follow our contribution guidelines. In every interaction and contribution, this project adheres to the Contributor Covenant Code of Conduct.

👾 Ready to Contribute?

Take a look at the issues labeled mentor (or alternatively this page, created by one of the maintainers) to get started! We always recognize valuable contributions by proposing an on-chain tip to the Polkadot network as a token of our appreciation.

Polkadot Fellowship

Development in this repo usually goes hand in hand with the fellowship organization. In short, this repository provides all the SDK pieces needed to build both Polkadot and its parachains. However, the actual Polkadot runtime lives in the fellowship/runtimes repository. Read more about the fellowship, this separation, and the RFC process here.

History

This repository is the amalgamation of 3 separate repositories that used to make up Polkadot SDK, namely Substrate, Polkadot and Cumulus. Read more about the merge and its history here.