Skip to content
Snippets Groups Projects
user avatar
Alexandru Gheorghe authored
After finality started lagging on kusama around `2025-11-25 15:55:40`
nodes started being overloaded with messages and some restarted with
```
Subsystem approval-distribution-subsystem appears unresponsive when sending a message of type polkadot_node_subsystem_types::messages::ApprovalDistributionMessage. origin=polkadot_service::relay_chain_selection::SelectRelayChainInner<sc_client_db::Backend<sp_runtime::generic::block::Block<sp_runtime::generic::header::Header<u32, sp_runtime::traits::BlakeTwo256>, sp_runtime::OpaqueExtrinsic>>, polkadot_overseer::Handle>
```

I think this happened because our aggression in the current form is way
too spammy and create problems in situation where we already constructed
blocks with a load of candidates to check which what happened around
`#25933682` before and after. However aggression, does help in the
nightmare scenario where the network is segmented and sparsely
connected, so I tend to think we shouldn't completely remove it.

The current configuration is:
```
l1_threshold: Some(16),
l2_threshold: Some(28),
resend_unfinalized_period: Some(8),
```
The way aggression works right now :
1. After L1 is triggered all nodes send all messages they created to all
the other nodes and all messages they would have they already send
according to the topology.
2. Because of resend_unfinalized_period for each block all messages at
step 1) are sent every 8 blocks, so for example let's say we have blocks
1 to 24 unfinalized, then at block 25, all messages for block 1, 9 will
be resent, and consequently at block 26, all messages for block 2, 10
will be resent, this becomes worse as more blocks are created if backing
backpressure did not kick in yet. In total this logic makes that each
node receive 3 * total_number_of messages_per_block
3. L2 aggression is way too spammy, when L2 aggression is enabled all
nodes sends all messages of a block on GridXY, that means that all
messages are received and sent by node at least 2*sqrt(num_validators),
so on kusama would be 66 * NUM_MESSAGES_AT_FIRST_UNFINALIZED_BLOCK, so
even with a reasonable number of messages like 10K, which you can have
if you escalated because of no shows, you end-up sending and receiving
~660k messages at once, I think that's what makes the
approval-distribution to appear unresponsive on some nodes.
4. Duplicate messages are received by the nodes which turn, mark the
node as banned, which may create more no-shows.

## Proposed improvements:
1. Make L2 trigger way later 28 blocks, instead of 64, this should
literally the last resort, until then we should try to let the
approval-voting escalation mechanism to do its things and cover the
no-shows.
2. On L1 aggression don't send messages for blocks too far from the
first_unfinalized there is no point in sending the messages for block
20, if block 1 is still unfinalized.
3. On L1 aggression, send messages then back-off for 3 *
resend_unfinalized_period to give time for everyone to clear up their
queues.
4. If aggression is enabled accept duplicate messages from validators
and don't punish them by reducting their reputation which, which may
create no-shows.

---------

Signed-off-by: default avatarAlexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: default avatarAndrei Sandu <54316454+sandreim@users.noreply.github.com>
85dd228d

SDK Logo SDK Logo

Polkadot SDK

GitHub stars  GitHub forks

StackExchange  GitHub contributors  GitHub commit activity  GitHub last commit

The Polkadot SDK repository provides all the components needed to start building on the Polkadot network, a multi-chain blockchain platform that enables different blockchains to interoperate and share information in a secure and scalable way.

⚡ Quickstart

If you want to get an example node running quickly you can execute the following getting started script:

curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/paritytech/polkadot-sdk/master/scripts/getting-started.sh | bash

📚 Documentation

🚀 Releases

Current Stable Release  Next Stable Release

The Polkadot SDK is released every three months as a stableYYMMDD release. They are supported for one year with patches. See the next upcoming versions in the Release Registry.

You can use psvm to update all dependencies to a specific version without needing to manually select the correct version for each crate.

🛠️ Tooling

Polkadot SDK Version Manager: A simple tool to manage and update the Polkadot SDK dependencies in any Cargo.toml file. It will automatically update the Polkadot SDK dependencies to their correct crates.io version.

🔐 Security

The security policy and procedures can be found in docs/contributor/SECURITY.md.

🤍 Contributing & Code of Conduct

Ensure you follow our contribution guidelines. In every interaction and contribution, this project adheres to the Contributor Covenant Code of Conduct.

👾 Ready to Contribute?

Take a look at the issues labeled with mentor (or alternatively this page, created by one of the maintainers) label to get started! We always recognize valuable contributions by proposing an on-chain tip to the Polkadot network as a token of our appreciation.

Polkadot Fellowship

Development in this repo usually goes hand in hand with the fellowship organization. In short, this repository provides all the SDK pieces needed to build both Polkadot and its parachains. But, the actual Polkadot runtime lives in the fellowship/runtimes repository. Read more about the fellowship, this separation, the RFC process here.

History

This repository is the amalgamation of 3 separate repositories that used to make up Polkadot SDK, namely Substrate, Polkadot and Cumulus. Read more about the merge and its history here.