- Mar 05, 2025
-
-
Dmitry Markin authored
Allow adding extra request-response protocols during polkadot service initialization. This is required to add a request-response protocol described in [RFC-0008](https://polkadot-fellows.github.io/RFCs/approved/0008-parachain-bootnodes-dht.html) to the relay chain side of the parachain node. ### Review notes The PR might look scary due to a lot of code being moved. It is easier to review it on a per-commit basis. The commits do not containing changes to the code logic are named accordingly. --------- Co-authored-by:
cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
-
- Feb 27, 2025
-
-
Egor_P authored
This PR backports version bumps and prdocs reorg from the latest stable branch back to master
-
- Feb 24, 2025
-
-
CrabGopher authored
`sc-cli` pulls the rocksdb dependency into frame-benchmarking-cli even when used with `default-features = false`. This PR makes the rocksdb dependencies that `sc-cli` brings into some of the crates optional. I think I covered all the crates that depend on `sc-cli`, but please let me know if I missed any. Fixes: https://github.com/paritytech/polkadot-sdk/issues/3793
-
Daniel Shiposha authored
# Description Fixes #7413 ## Integration This PR updates the `DryRunApi`. The signature of `dry_run_call` is changed, and the XCM version of the return values of `dry_run_xcm` now follows the version of the input XCM program. ## Review Notes * **The `DryRunApi` is modified** * **Added `Router::clear_messages` to the `dry_run_xcm` common implementation** * **Fixed the xcmp-queue Router's `clear_messages`: the channel details' `first_index` and `last_index` are now reset when clearing** * **Added `MIN_XCM_VERSION`** * The common implementation in `pallet-xcm` is modified accordingly * The `DryRunApi` tests are modified to account for testing old XCM versions * The implementation from `pallet-xcm` is used where it was not used before (including in the `DryRunApi` tests) * All the runtime implementations are modified according to the Runtime API change --------- Co-authored-by:
Adrian Catangiu <adrian@parity.io>
-
- Feb 22, 2025
-
-
Al authored
# Description Creating this PR to changed Rotko Networks bootnode addresses to the new structure. Rotko bootnode addresses tested with this results: ``` { "asset-hub-kusama": { "bootnode": "/dns/asset-hub-kusama.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWJUFnjR2PNbsJhudwPVaWCoZy1acPGKjM2cSuGj345BBu", "discovered_peers": 3, "error_details": null, "id": "rotko", "network": "asset-hub-kusama", "status": "success", "test_duration_ms": 19394, "valid": true }, "asset-hub-polkadot": { "bootnode": "/dns/asset-hub-polkadot.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWKkzLjYF6M5eEs7nYiqEtRqY8SGVouoCwo3nCWsRnThDW", "discovered_peers": 5, "error_details": null, "id": "rotko", "network": "asset-hub-polkadot", "status": "success", "test_duration_ms": 5024, "valid": true }, "asset-hub-westend": { "bootnode": "/dns/asset-hub-westend.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWE4UDXqgtTcMCyUQ8S4uvaT8VMzzTBA6NWmKuYwTacWuN", "discovered_peers": 3, "error_details": null, "id": "rotko", "network": "asset-hub-westend", "status": "success", "test_duration_ms": 5023, "valid": true }, "bridge-hub-kusama": { "bootnode": "/dns/bridge-hub-kusama.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWAmBp54mUEYtvsk2kxNEsDbAvdUMcaghxKXgUQxmPEQ66", "discovered_peers": 4, "error_details": null, "id": "rotko", "network": "bridge-hub-kusama", "status": "success", "test_duration_ms": 6049, "valid": true }, "bridge-hub-polkadot": { "bootnode": "/dns/bridge-hub-polkadot.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWMxZY7tDc2Rh454VaJJ7RexKAXVS6xSBEvTnXSGCnuGDw", "discovered_peers": 2, "error_details": null, "id": "rotko", "network": "bridge-hub-polkadot", "status": "success", "test_duration_ms": 9112, "valid": true }, "bridge-hub-westend": { "bootnode": "/dns/bridge-hub-westend.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWJyeRHpxZZbfBCNEgeUFzmRC5AMSAs2tJhjJS1k5hULkD", "discovered_peers": 2, "error_details": null, "id": "rotko", "network": "bridge-hub-westend", "status": "success", "test_duration_ms": 9106, "valid": true }, "collectives-polkadot": { "bootnode": "/dns/collectives-polkadot.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWKrm3XmuGzJH17Wcn4HRDGsEjLZGDgN77q3ZhwnnQP7y1", "discovered_peers": 4, "error_details": null, "id": "rotko", "network": "collectives-polkadot", "status": "success", "test_duration_ms": 6044, "valid": true }, "collectives-westend": { "bootnode": "/dns/collectives-westend.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWPG85zhuSRoyptjLkFD4iJFistjiBmc15JgQ96B4fdXYr", "discovered_peers": 2, "error_details": null, "id": "rotko", "network": "collectives-westend", "status": "success", "test_duration_ms": 6044, "valid": true }, "coretime-kusama": { "bootnode": "/dns/coretime-kusama.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWCyPSkk5cq2eEdw1qHizfa6UT4QggSarCEtcvNXpnod8B", "discovered_peers": 3, "error_details": null, "id": "rotko", "network": "coretime-kusama", "status": "success", "test_duration_ms": 6036, "valid": true }, "coretime-polkadot": { "bootnode": "/dns/coretime-polkadot.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWPk5pR5QxWGVJ1YVWnXd4rkVTZ194iay58rAfcSHDpky3", "discovered_peers": 4, "error_details": null, "id": "rotko", "network": "coretime-polkadot", "status": "success", "test_duration_ms": 6045, "valid": true }, "coretime-westend": { "bootnode": "/dns/coretime-westend.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWFmGg7EGzxGDawuJ9EfyEznCrZfMJgGa4eHpMWjcJmg85", "discovered_peers": 2, "error_details": null, "id": "rotko", "network": "coretime-westend", "status": "success", "test_duration_ms": 6050, "valid": true }, "kusama": { "bootnode": 
"/dns/kusama.boot.rotko.net/tcp/30335/wss/p2p/12D3KooWAa5THTw8HPfnhEei23HdL8P9McBXdozG2oTtMMksjZkK", "discovered_peers": 9, "error_details": null, "id": "rotko", "network": "kusama", "status": "success", "test_duration_ms": 5024, "valid": true }, "paseo": { "bootnode": "/dns/paseo.boot.rotko.net/tcp/30335/wss/p2p/12D3KooWRH8eBMhw8c7bucy6pJfy94q4dKpLkF3pmeGohHmemdRu", "discovered_peers": 2, "error_details": null, "id": "rotko", "network": "paseo", "status": "success", "test_duration_ms": 12939, "valid": true }, "people-kusama": { "bootnode": "/dns/people-kusama.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWSKQwgoydfbN6mNN2aNwdqfkR2ExAnTRs8mmdrPQTtDLo", "discovered_peers": 5, "error_details": null, "id": "rotko", "network": "people-kusama", "status": "success", "test_duration_ms": 6053, "valid": true }, "people-polkadot": { "bootnode": "/dns/people-polkadot.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWLg9NPeDFoL54A7WfuHSM3YNxPBGVRAd9ZY6rmVfdT6GJ", "discovered_peers": 3, "error_details": null, "id": "rotko", "network": "people-polkadot", "status": "success", "test_duration_ms": 12195, "valid": true }, "people-westend": { "bootnode": "/dns/people-westend.boot.rotko.net/tcp/30435/wss/p2p/12D3KooWHwUXBUo2WRMUBwPLC2ttVbnEk1KvDyESYAeKcNoCn7WS", "discovered_peers": 2, "error_details": null, "id": "rotko", "network": "people-westend", "status": "success", "test_duration_ms": 7059, "valid": true }, "polkadot": { "bootnode": "/dns/polkadot.boot.rotko.net/tcp/30335/wss/p2p/12D3KooWPyEvPEXghnMC67Gff6PuZiSvfx3fmziKiPZcGStZ5xff", "discovered_peers": 6, "error_details": null, "id": "rotko", "network": "polkadot", "status": "success", "test_duration_ms": 11147, "valid": true }, "westend": { "bootnode": "/dns/westend.boot.rotko.net/tcp/30335/wss/p2p/12D3KooWLK8Zj1uZ46phU3vQwiDVda8tB76S8J26rXZQLHpwWkDJ", "discovered_peers": 2, "error_details": null, "id": "rotko", "network": "westend", "status": "success", "test_duration_ms": 5021, "valid": true } } ```
-
- Feb 20, 2025
-
-
Alexander Theißen authored
Ref https://github.com/paritytech/ci_cd/issues/1107 We mainly need this so that we can finally compile the `pallet_revive` fixtures on stable. I did my best to keep the commits focused on one thing to make review easier. All the changes are needed because rustc introduced more warnings or is stricter about existing ones. Most of the stuff could just be fixed and the commits should be pretty self-explanatory. However, there are a few things that are notable: ## `non_local_definitions` A lot of runtimes write `impl` blocks inside functions. This makes sense to reduce the amount of conditional compilation. I guess I could have moved them into a module instead. But I think allowing it here makes sense to avoid the code churn. ## `unexpected_cfgs` The FRAME macros emit code that references various features like `std`, `runtime-benchmarks` or `try-runtime`. If a crate that uses those macros does not declare those features we get this warning. Those were mostly when defining a `mock` runtime. I opted for silencing the warning in this case rather than adding unneeded features (see the sketch below). For the benchmarking UI tests I opted for adding the `runtime-benchmarks` feature to the `Cargo.toml`. ## Failing UI test I am bumping the `trybuild` version and regenerating the UI tests. The old version seems to be incompatible. This requires us to pass `deny_warnings` in `CARGO_ENCODED_RUSTFLAGS` as `RUSTFLAGS` is ignored in the new version. ## Removing toolchain file from the pallet revive fixtures This is no longer needed since the latest stable will compile them fine using `RUSTC_BOOTSTRAP=1`. --------- Co-authored-by:
cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
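A minimal, self-contained sketch of the `unexpected_cfgs` situation described above, assuming a mock-runtime-style crate that does not declare the `runtime-benchmarks` feature; the code is illustrative, not taken from the PR:
```rust
// Silence the lint for this (mock) crate instead of declaring unused features.
#![allow(unexpected_cfgs)]

// Without `runtime-benchmarks` declared in Cargo.toml, the cfg check below
// would trigger the `unexpected_cfgs` warning; the allow above suppresses it.
#[cfg(feature = "runtime-benchmarks")]
pub fn benchmark_only_helper() -> u32 {
    42
}

fn main() {
    // Nothing to do at runtime; the point is the cfg handling above.
    println!("mock runtime compiled");
}
```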
-
- Feb 13, 2025
-
-
s0me0ne-unkn0wn authored
Closes #3270 --------- Co-authored-by: command-bot <>
-
- Feb 07, 2025
-
-
Alexandru Gheorghe authored
approval-voting-parallel has been running on all network types except polkadot since release `1.17.0`, so it has had around two months of baking without any reported issues; let's enable it on polkadot as well. After it has been running on polkadot for a while, the flag will be removed entirely. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
-
Alexandru Gheorghe authored
There is a small issue on restart: if finality is lagging across a session boundary and the validator restarts, the validator won't be able to contribute assignments/approvals and gossiping for the blocks from the previous session anymore. After a restart it builds the topology only for the new session, and without a topology it can't distribute assignments and approvals, because everything in `approval-distribution` is gated on having a topology for the block. The fix is to also keep track of the last finalized block and its session and, if it is different from the list of encountered sessions, build its topology and send it to the rest of the subsystems. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io> Co-authored-by:
ordian <write@reusable.software>
-
- Feb 03, 2025
-
-
Alin Dima authored
Part of https://github.com/paritytech/polkadot-sdk/issues/5079. Removes all usage of the static async backing params, replacing them with dynamically computed equivalent values (based on the claim queue and scheduling lookahead). Adds a new runtime API for querying the scheduling lookahead value. If it is not present, falls back to 3 (the default value that is backwards compatible with the values we have on production networks for `allowed_ancestry_len`); see the sketch below. Also resolves most of https://github.com/paritytech/polkadot-sdk/issues/4447, removing code that handles async backing not yet being enabled. While doing this, I removed the support for collation protocol version 1 on collators, as it only worked for leaves not supporting async backing (which are none). I also unhooked the legacy v1 statement-distribution (for the same reason as above). That subsystem is basically dead code now, so I had to remove some of its tests as they would no longer pass (since the subsystem no longer sen...
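A minimal sketch of the fallback behaviour described above; `fetch_scheduling_lookahead` and the boolean flag are made-up stand-ins for the new runtime API, not the actual polkadot-sdk interface:
```rust
/// Default used when the runtime does not expose the new API yet; per the PR
/// notes this matches the `allowed_ancestry_len` values on production networks.
const DEFAULT_SCHEDULING_LOOKAHEAD: u32 = 3;

/// Pretend runtime API call: `None` means the runtime does not implement the
/// scheduling lookahead API.
fn fetch_scheduling_lookahead(runtime_supports_api: bool) -> Option<u32> {
    if runtime_supports_api { Some(3) } else { None }
}

fn scheduling_lookahead(runtime_supports_api: bool) -> u32 {
    fetch_scheduling_lookahead(runtime_supports_api).unwrap_or(DEFAULT_SCHEDULING_LOOKAHEAD)
}

fn main() {
    // Old runtime without the API: fall back to the backwards-compatible default.
    assert_eq!(scheduling_lookahead(false), 3);
    assert_eq!(scheduling_lookahead(true), 3);
}
```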
-
- Jan 31, 2025
-
-
Egor_P authored
This PR backports regular version bumps and the prdoc reorganization from the stable release branch back to master
-
- Jan 30, 2025
-
-
Stephane Gurgenidze authored
malus-collator: implement malicious collator submitting same collation to all backing groups (#6924) ## Issues - [[#5049] Elastic scaling: zombienet tests](https://github.com/paritytech/polkadot-sdk/issues/5049) - [[#4526] Add zombienet tests for malicious collators](https://github.com/paritytech/polkadot-sdk/issues/4526) ## Description Modified the undying collator to include a malus mode, in which it submits the same collation to all assigned backing groups. ## TODO * [X] Implement malicious collator that submits the same collation to all backing groups; * [X] Avoid the core index check in the collation generation subsystem: https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/collation-generation/src/lib.rs#L552-L553; * [X] Resolve the mismatch between the descriptor and the commitments core index: https://github.com/paritytech/polkadot-sdk/pull/7104 * [X] Implement `duplicate_collations` test with zombienet-sdk; * [X] Add PRdoc.
-
- Jan 26, 2025
-
-
Branislav Kontur authored
Part of: https://github.com/paritytech/polkadot-sdk/issues/6906
-
- Jan 23, 2025
-
-
Andrei Sandu authored
Currently the `para_backing_state` API is used only by the prospective parachains subsystem and returns two things: the constraints for parachain blocks and the candidates pending availability. This PR deprecates `para_backing_state` and introduces a new `backing_constraints` API that can be used together with `candidates_pending_availability` to get the same information provided by `para_backing_state` (see the sketch below). TODO: - [x] PRDoc --------- Signed-off-by:
Andrei Sandu <andrei-mihail@parity.io> Co-authored-by: command-bot <>
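A purely conceptual sketch of how the information formerly returned by `para_backing_state` could be assembled from the two calls named above; all types and function signatures here are placeholders, not the real runtime API:
```rust
#[derive(Debug, Default, Clone)]
struct Constraints; // stand-in for backing constraints
#[derive(Debug, Default, Clone)]
struct PendingCandidate; // stand-in for a candidate pending availability

/// What `para_backing_state` used to return, conceptually.
struct BackingState {
    constraints: Constraints,
    pending_availability: Vec<PendingCandidate>,
}

// Hypothetical wrappers over the two runtime API calls named in the PR.
fn backing_constraints(_para_id: u32) -> Option<Constraints> { Some(Constraints) }
fn candidates_pending_availability(_para_id: u32) -> Vec<PendingCandidate> { Vec::new() }

// Same information as the deprecated call, assembled from the two new ones.
fn backing_state(para_id: u32) -> Option<BackingState> {
    Some(BackingState {
        constraints: backing_constraints(para_id)?,
        pending_availability: candidates_pending_availability(para_id),
    })
}

fn main() {
    let _ = backing_state(2000);
}
```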
-
- Jan 22, 2025
-
-
Stephane Gurgenidze authored
## Issue [[#7107] Core Index Mismatch in Commitments and Descriptor](https://github.com/paritytech/polkadot-sdk/issues/7107) ## Description This PR resolves a bug where normal (non-malus) undying collators failed to generate and submit collations, resulting in the following error: `ERROR tokio-runtime-worker parachain::collation-generation: Failed to construct and distribute collation: V2 core index check failed: The core index in commitments doesn't match the one in descriptor.` More details about the issue and reproduction steps are described in the [related issue](https://github.com/paritytech/polkadot-sdk/issues/7107). ## Summary of Fix - When core selectors are provided in the UMP signals, core indexes will be chosen using them; - The fix ensures that functionality remains unchanged for parachains not using UMP signals; - Added checks to stop processing if the same core is selected repeatedly. ## TODO - [X] Implement the fix; - [x] Add tests; - [x] Add PRdoc.
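A hedged sketch of the selection logic summarised above; the function shape, types and the exact guard are assumptions for illustration, not the collation-generation code:
```rust
/// Pick core indexes for the next collations using the core selector from the
/// UMP signal: index into the para's assigned cores, and stop if the same core
/// would be selected again (the repeated-selection guard mentioned in the PR).
fn select_cores(assigned_cores: &[u32], mut selector: u8, wanted: usize) -> Vec<u32> {
    let mut picked = Vec::new();
    if assigned_cores.is_empty() {
        return picked;
    }
    for _ in 0..wanted {
        let core = assigned_cores[selector as usize % assigned_cores.len()];
        if picked.contains(&core) {
            // Same core selected repeatedly: stop processing.
            break;
        }
        picked.push(core);
        selector = selector.wrapping_add(1);
    }
    picked
}

fn main() {
    // Two assigned cores, so at most two distinct selections before the guard trips.
    assert_eq!(select_cores(&[0, 3], 1, 4), vec![3, 0]);
}
```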
-
- Jan 21, 2025
-
-
Sebastian Kunert authored
The link-checker job is constantly failing because of these two links. In the browser there is a redirect; apparently our lychee checker can't handle it.
-
- Jan 20, 2025
-
-
Benjamin Gallois authored
## Description The `frame-benchmarking-cli` crate has not been buildable without the `rocksdb` feature since version 1.17.0. **Error:**
```rust
self.database()?.unwrap_or(Database::RocksDb),
                                     ^^^^^^^ variant or associated item not found in `Database`
```
This issue is also related to the `rocksdb` feature bleeding (#3793), where the `rocksdb` feature was always activated even when compiling this crate with `--no-default-features`. **Fix:** - Resolved the error by choosing `paritydb` as the default database when compiled without the `rocksdb` feature. - Fixed the issue where the `sc-cli` crate's `rocksdb` feature was always active, even when compiling `frame-benchmarking-cli` with `--no-default-features`. ## Review Notes This fixes building the crate without rocksdb; it is not intended to solve #3793. --------- Co-authored-by: command-bot <>
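A minimal sketch of the cfg-gated default described in the fix; `Database` here is a local stand-in enum rather than the actual sc-cli type, and the feature would normally be declared in `Cargo.toml`:
```rust
#[derive(Debug, Clone, Copy)]
enum Database {
    #[cfg(feature = "rocksdb")]
    RocksDb,
    ParityDb,
}

fn default_database() -> Database {
    // Use rocksdb only when the feature is enabled; otherwise fall back to paritydb.
    #[cfg(feature = "rocksdb")]
    let db = Database::RocksDb;
    #[cfg(not(feature = "rocksdb"))]
    let db = Database::ParityDb;
    db
}

fn main() {
    println!("default database: {:?}", default_database());
}
```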
-
Sebastian Kunert authored
Saw this test flake a few times, last time [here](https://github.com/paritytech/polkadot-sdk/actions/runs/12834432188/job/35791830215). We first fetch all processes in the test, then query `/proc/<pid>/stat` for every one of them. When the file was not found, we would error; now we tolerate not finding this file (see the sketch below). Ran 200 times locally without error; before, it would fail a few times, probably depending on process fluctuation (which I expect to be high on CI runners).
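A std-only sketch of the tolerance described above, treating a vanished `/proc/<pid>/stat` as "process gone" instead of an error; it is not the actual test code:
```rust
use std::io::{self, ErrorKind};

fn read_proc_stat(pid: u32) -> io::Result<Option<String>> {
    match std::fs::read_to_string(format!("/proc/{pid}/stat")) {
        Ok(contents) => Ok(Some(contents)),
        // Process vanished between enumeration and the read: not an error.
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // A pid that almost certainly does not exist; the call still succeeds.
    assert_eq!(read_proc_stat(u32::MAX)?, None);
    Ok(())
}
```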
-
- Jan 15, 2025
-
-
Alexandru Gheorghe authored
Normally, approval-voting wouldn't receive duplicate assignments because approval-distribution makes sure of it. However, after a restart we might receive the same assignment again, and since approval-voting already persisted it we would end up inserting it twice in `ApprovalEntry.tranches.assignments`, because that's an array. Fix this by making duplicate assignments a no-op if the validator already has an assignment imported at the same tranche (see the sketch below). --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io> Co-authored-by:
ordian <write@reusable.software>
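A self-contained sketch of the de-duplication, using simplified stand-in types rather than the real `ApprovalEntry`:
```rust
type ValidatorIndex = u32;
type Tick = u64;

#[derive(Default)]
struct TrancheEntry {
    // (validator, tick the assignment was received at)
    assignments: Vec<(ValidatorIndex, Tick)>,
}

impl TrancheEntry {
    fn import_assignment(&mut self, validator: ValidatorIndex, tick: Tick) {
        // After a restart the same persisted assignment can be re-imported;
        // only push it if this validator has no assignment in this tranche yet.
        if self.assignments.iter().any(|(v, _)| *v == validator) {
            return;
        }
        self.assignments.push((validator, tick));
    }
}

fn main() {
    let mut tranche = TrancheEntry::default();
    tranche.import_assignment(7, 100);
    tranche.import_assignment(7, 100); // duplicate after restart: no-op
    assert_eq!(tranche.assignments.len(), 1);
}
```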
-
- Jan 14, 2025
-
-
Alexandru Gheorghe authored
There is a problem on restart where nodes will not trigger their needed assignment if they were offline while the time of the assignment passed. That happens because after restart we will hit this condition https://github.com/paritytech/polkadot-sdk/blob/4e805ca0/polkadot/node/core/approval-voting/src/lib.rs#L2495 and the considered tick will be `tick_now`, which is already higher than the tick of our assignment. The fix is to schedule a wakeup for untriggered assignments at restart and let the wakeup-processing logic decide whether it needs to trigger the assignment or not. One thing we need to be careful about here is to make sure we don't schedule the wakeup immediately after restart, because the node would still be behind with all the assignments it should have received and might wrongfully decide it needs to trigger its assignment, so I added a `RESTART_WAKEUP_DELAY: Tick = 12` which should be more t...
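A tiny sketch of the restart scheduling idea; only `RESTART_WAKEUP_DELAY = 12` comes from the PR text, the names and the exact formula are placeholders:
```rust
type Tick = u64;

const RESTART_WAKEUP_DELAY: Tick = 12;

fn restart_wakeup_tick(tick_now: Tick, assignment_tick: Tick) -> Tick {
    // The exact formula is illustrative; the point is that the wakeup is
    // deferred instead of firing immediately at restart, so the node can catch
    // up on missed assignments before deciding whether to trigger its own.
    assignment_tick.max(tick_now + RESTART_WAKEUP_DELAY)
}

fn main() {
    // Assignment tick already in the past at restart: wake up 12 ticks from now.
    assert_eq!(restart_wakeup_tick(100, 40), 112);
    // Assignment still in the future: its own tick is late enough already.
    assert_eq!(restart_wakeup_tick(100, 150), 150);
}
```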
-
Alexandru Gheorghe authored
Recovering the POV can fail in situations where the node has just restarted and the DHT topology wasn't fully discovered yet, so the node can't connect to most of its peers. This is bad because gossiping the assignment only requires being connected to a few peers, so the assignment goes out, but since we can't recover the POV we can't approve the candidate and other nodes will see this as a no-show. This becomes worse in the scenario where a lot of nodes restart at the same time, so you end up with a lot of no-shows in the network that are never covered. In that case it makes sense for nodes to actually retry approving the candidate at a later point in time, and to retry several times if the block containing the candidate wasn't approved. ## TODO - [x] Add a subsystem test. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
-
Alin Dima authored
-
- Jan 13, 2025
-
-
Alexandru Gheorghe authored
Reference hardware requirements have been bumped to at least 8 cores, so we can now allocate 50% of that capacity to PVF execution. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
-
- Jan 09, 2025
-
-
wmjae authored
Co-authored-by:
Dónal Murray <donalm@seadanda.dev>
-
- Jan 05, 2025
-
-
thiolliere authored
Implement cumulus StorageWeightReclaim as wrapping transaction extension + frame system ReclaimWeight (#6140) (rebasing of https://github.com/paritytech/polkadot-sdk/pull/5234) ## Issues: * Transaction extensions have weights and refund weight, so the reclaiming of unused weight must happen last in the transaction extension pipeline. Currently it is inside `CheckWeight`. * The cumulus storage weight reclaim transaction extension misses the proof size of logic happening prior to itself. ## Done: * A new storage item `ExtrinsicWeightReclaimed` in frame-system. Any logic which attempts to do some reclaim must use this storage to avoid double reclaiming. * A new function `reclaim_weight` in the frame-system pallet: it takes info and post info as arguments, reads the already reclaimed weight, calculates the new unused weight from info and post info, and does the more accurate reclaim if it is higher (see the sketch below). * `CheckWeight` is unchanged and still reclaims the weight in post dispatch. * `ReclaimWeight` is a new transaction extension in frame system. For s...
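A self-contained sketch of the double-reclaim guard described above, with weights as plain integers and simplified names rather than the frame-system API:
```rust
#[derive(Default)]
struct System {
    /// Weight currently accounted in the block.
    block_weight: u64,
    /// `ExtrinsicWeightReclaimed`-like tracker for the current extrinsic.
    extrinsic_weight_reclaimed: u64,
}

impl System {
    /// `info_weight` is what was charged up front, `actual_weight` is what the
    /// dispatch really used. Reclaims only if the newly computed unused weight
    /// is higher than what was already reclaimed, so nothing is double-counted.
    fn reclaim_weight(&mut self, info_weight: u64, actual_weight: u64) {
        let unused = info_weight.saturating_sub(actual_weight);
        if unused > self.extrinsic_weight_reclaimed {
            let delta = unused - self.extrinsic_weight_reclaimed;
            self.block_weight = self.block_weight.saturating_sub(delta);
            self.extrinsic_weight_reclaimed = unused;
        }
    }
}

fn main() {
    let mut system = System { block_weight: 1_000, ..Default::default() };
    // CheckWeight-style reclaim first: 300 unused.
    system.reclaim_weight(500, 200);
    assert_eq!(system.block_weight, 700);
    // A later, more accurate measurement finds 400 unused: only 100 more is freed.
    system.reclaim_weight(500, 100);
    assert_eq!(system.block_weight, 600);
}
```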
-
- Dec 20, 2024
-
-
Xavier Lau authored
It doesn't make sense to only reorder the features array. For example, this makes it hard for me to compare the dependencies and features, especially since some crates have really long dependency lists:
```toml
[dependencies]
c = "*"
a = "*"
b = "*"

[features]
std = [
    "a",
    "b",
    "c",
]
```
This makes my life easier:
```toml
[dependencies]
a = "*"
b = "*"
c = "*"

[features]
std = [
    "a",
    "b",
    "c",
]
```
--------- Co-authored-by:
Bastian Köcher <git@kchr.de> Co-authored-by: command-bot <>
-
- Dec 19, 2024
-
-
Egor_P authored
This PR includes a backport of the regular version bumps and `prdocs` reordering from the `stable2412` branch back to master --------- Co-authored-by:
ParityReleases <release-team@parity.io> Co-authored-by: command-bot <>
-
- Dec 13, 2024
-
-
Alexandru Gheorghe authored
Approval-voting canonicalize is off by one, which means that if we are finalizing blocks one by one, approval-voting cleans up only every other block. For example: - With blocks 1, 2, 3, 4, 5, 6 created, the stored range would be StoredBlockRange(1,7) - When block 3 is finalized the canonicalize works and StoredBlockRange is (4,7) - When block 4 is finalized the canonicalize exits early because of the `if range.0 > canon_number` break clause, so blocks are not cleaned up. - When block 5 is finalized the canonicalize works, StoredBlockRange becomes (6,7) and both blocks 4 and 5 are cleaned up. The consequence of this is that sometimes we keep block entries around after they are finalized, so at restart we consider these blocks and send them to approval-distribution. In most cases this is not a problem, but in the case when finality is lagging on restart, approval-distribution will receive 4 as being the oldest block it needs to work on, and since BlockFinalize...
-
Tsvetomir Dimitrov authored
Related to https://github.com/paritytech/polkadot-sdk/issues/1797 # The problem When fetching collations on the collator protocol/validator side we need to ensure that each parachain gets a fair core time share depending on its assignments in the claim queue. This means that the number of collations fetched per parachain should ideally be equal to (but definitely not bigger than) the number of claims for the particular parachain in the claim queue. # Why the current implementation is not good enough The current implementation doesn't guarantee such fairness. For each relay parent there is a `waiting_queue` (PerRelayParent -> Collations -> waiting_queue) which holds any unfetched collations advertised to the validator. The collations are fetched on a first-in first-out principle, which means that if two parachains share a core and one of the parachains is more aggressive it might starve the second parachain. How? At each relay parent up to `max_candidate_depth` candidates are accepted (enforced in `fn is_seconded_limit_reached`), so if one of the parachains is quick enough to fill in the queue with its advertisements, the validator will never fetch anything from the rest of the parachains even though they are scheduled. This doesn't mean that the aggressive parachain will occupy all the core time (this is guaranteed by the runtime) but it will deny the rest of the parachains sharing the same core the chance to have collations backed. # How to fix it The solution I am proposing is to limit fetches and advertisements based on the state of the claim queue. At each relay parent the claim queue for the core assigned to the validator is fetched. For each parachain a fetch limit is calculated (equal to the number of entries in the claim queue; see the sketch after this entry). Advertisements are not fetched for a parachain which has exceeded its claims in the claim queue. This solves the problem of aggressive parachains advertising too many collations. The second part is in the collation fetching logic. The validator will keep track of which collations it has fetched so far. When a new collation needs to be fetched, instead of popping the first entry from the `waiting_queue` the validator examines the claim queue and looks for the earliest claim which hasn't got a corresponding fetch. This way the validator will always try to prioritise the most urgent entries. ## How is the 'fair share of coretime' for each parachain determined? Thanks to async backing we can accept more than one candidate per relay parent (with some constraints). We also have got the claim queue which gives us a hint which parachain will be scheduled next on each core. So thanks to the claim queue we can determine the maximum number of claims per parachain. For example, if the claim queue is [A A A] at relay parent X, we know that at relay parent X we can accept three candidates for parachain A. There are two things to consider though: 1. If we accept more than one candidate at relay parent X we are claiming the slot of a future relay parent. So accepting two candidates for relay parent X means that we are claiming the slot at rp X+1 or rp X+2. 2. At the same time the slot at relay parent X could have been claimed by a previous relay parent(s). This means that we need to accept fewer candidates at X or even no candidates. There are a few cases worth considering: 1. Slot claimed by a previous relay parent. CQ @ rp X: [A A A] Advertisements at X-1 for para A: 2 Advertisements at X-2 for para A: 2 Outcome - at rp X we can accept only 1 advertisement since our slots were already claimed. 2. Slot in our claim queue already claimed at a future relay parent. CQ @ rp X: [A A A] Advertisements at X+1 for para A: 1 Advertisements at X+2 for para A: 1 Outcome: at rp X we can accept only 1 advertisement since the slots in our relay parents were already claimed. The situation becomes more complicated with multiple leaves (forks). Imagine we have got a fork at rp X:
```
CQ @ rp X: [A A A]

(rp X) -> (rp X+1) -> rp(X+2)
       \-> (rp X+1')
```
Now when we examine the claim queue at rp X we need to consider both forks. This means that accepting a candidate at X means that we should have a slot for it in *BOTH* leaves. If for example there are three candidates accepted at rp X+1', we can't accept any candidates at rp X because there will be no slot for it in one of the leaves. ## How the claims are counted There are two solutions for counting the claims at relay parent X: 1. Keep a state for the claim queue (number of claims and which of them are claimed) and look it up when accepting a collation. With this approach we need to keep the state up to date with each new advertisement and each new leaf update. 2. Calculate the state of the claim queue on the fly at each advertisement. This way we rebuild the state of the claim queue at each advertisement. Solution 1 is hard to implement with forks. There are too many variants to keep track of (a different state for each leaf) and at the same time we might never need to use them. So I decided to go with option 2 - building the claim queue state on the fly. To achieve this I've extended `View` from backing_implicit_view to keep track of the outer leaves. I've also added a method which accepts a relay parent and returns all paths from an outer leaf to it. Let's call it `paths_to_relay_parent`. So how does the counting work for relay parent X? First we examine the number of seconded and pending advertisements (more on pending in a second) from relay parent X to relay parent X-N (inclusive), where N is the length of the claim queue. Then we use `paths_to_relay_parent` to obtain all paths from outer leaves to relay parent X. We calculate the claims at relay parents X+1 to X+N (inclusive) for each leaf and take the maximum value. This way we guarantee that the candidate at rp X can be included in each leaf. This is the state of the claim queue which we use to decide if we can fetch one more advertisement at rp X or not. ## What is a pending advertisement I mentioned that we count seconded and pending advertisements at relay parent X. A pending advertisement is: 1. An advertisement which is being fetched right now. 2. An advertisement pending validation at the backing subsystem. 3. An advertisement blocked for seconding by backing because we don't know one of its parent heads. Any of these is considered a 'pending fetch' and a slot for it is kept. All of them are already tracked in `State`. --------- Co-authored-by:
Maciej <maciej.zyszkiewicz@parity.io> Co-authored-by: command-bot <> Co-authored-by:
Alin Dima <alin@parity.io>
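A self-contained sketch of the per-parachain fetch limit (the first part of the fix), with simplified stand-in types rather than the collator-protocol code:
```rust
use std::collections::HashMap;

type ParaId = u32;

/// Number of claims per para in the claim queue for the assigned core,
/// e.g. [A, A, B] => A: 2, B: 1.
fn fetch_limits(claim_queue: &[ParaId]) -> HashMap<ParaId, usize> {
    let mut limits = HashMap::new();
    for para in claim_queue {
        *limits.entry(*para).or_insert(0) += 1;
    }
    limits
}

/// An advertisement is only fetched while the para still has unclaimed slots.
fn can_fetch(limits: &HashMap<ParaId, usize>, fetched: &HashMap<ParaId, usize>, para: ParaId) -> bool {
    fetched.get(&para).copied().unwrap_or(0) < limits.get(&para).copied().unwrap_or(0)
}

fn main() {
    let limits = fetch_limits(&[1, 1, 2]); // claim queue: [A, A, B]
    let mut fetched: HashMap<ParaId, usize> = HashMap::new();

    // Para 1 may be fetched twice, para 2 once; further advertisements are dropped.
    for para in [1, 1, 1, 2, 2] {
        if can_fetch(&limits, &fetched, para) {
            *fetched.entry(para).or_insert(0) += 1;
        }
    }
    assert_eq!(fetched[&1], 2);
    assert_eq!(fetched[&2], 1);
}
```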
-
- Dec 12, 2024
-
-
Bastian Köcher authored
Co-authored-by:
GitHub Action <action@github.com> Co-authored-by:
Branislav Kontur <bkontur@gmail.com> Co-authored-by: command-bot <>
-
Kazunobu Ndong authored
# Description Issue #6476 Collation-generation is not needed for validator nodes, and should be removed. ## Implementation Use a `DummySubsystem` for `collation_generation` --------- Co-authored-by:
Bastian Köcher <git@kchr.de> Co-authored-by: command-bot <> Co-authored-by:
Dmitry Markin <dmitry@markin.tech> Co-authored-by:
Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
-
- Dec 11, 2024
-
-
Alexandru Gheorghe authored
After finality started lagging on kusama around `2025-11-25 15:55:40`, nodes started being overloaded with messages and some restarted with
```
Subsystem approval-distribution-subsystem appears unresponsive when sending a message of type polkadot_node_subsystem_types::messages::ApprovalDistributionMessage. origin=polkadot_service::relay_chain_selection::SelectRelayChainInner<sc_client_db::Backend<sp_runtime::generic::block::Block<sp_runtime::generic::header::Header<u32, sp_runtime::traits::BlakeTwo256>, sp_runtime::OpaqueExtrinsic>>, polkadot_overseer::Handle>
```
I think this happened because our aggression in its current form is way too spammy and creates problems in situations where we have already constructed blocks with a load of candidates to check, which is what happened around `#25933682` before and after. However, aggression does help in the nightmare scenario where the network is segmented and sparsely connected, so I tend to think we shouldn't completely remove it. The current configuration is:
```
l1_threshold: Some(16),
l2_threshold: Some(28),
resend_unfinalized_period: Some(8),
```
The way aggression works right now: 1. After L1 is triggered, all nodes send all messages they created to all the other nodes, on top of the messages they already send according to the topology. 2. Because of resend_unfinalized_period, for each block all messages from step 1 are resent every 8 blocks. For example, if we have blocks 1 to 24 unfinalized, then at block 25 all messages for blocks 1 and 9 will be resent, and consequently at block 26 all messages for blocks 2 and 10 will be resent; this becomes worse as more blocks are created if backing backpressure did not kick in yet. In total this logic makes each node receive 3 * total_number_of_messages_per_block. 3. L2 aggression is way too spammy: when L2 aggression is enabled all nodes send all messages of a block on GridXY, which means that each message is received and sent by a node at least 2*sqrt(num_validators) times, so on kusama that would be 66 * NUM_MESSAGES_AT_FIRST_UNFINALIZED_BLOCK. Even with a reasonable number of messages like 10K, which you can have if you escalated because of no-shows, you end up sending and receiving ~660k messages at once; I think that's what makes approval-distribution appear unresponsive on some nodes. 4. Duplicate messages are received by the nodes, which in turn mark the sending node as banned, which may create more no-shows. ## Proposed improvements: 1. Make L2 trigger way later, 28 blocks instead of 64; this should literally be the last resort, and until then we should try to let the approval-voting escalation mechanism do its thing and cover the no-shows. 2. On L1 aggression, don't send messages for blocks too far from the first_unfinalized one; there is no point in sending the messages for block 20 if block 1 is still unfinalized (see the sketch below). 3. On L1 aggression, send messages then back off for 3 * resend_unfinalized_period to give time for everyone to clear up their queues. 4. If aggression is enabled, accept duplicate messages from validators and don't punish them by reducing their reputation, which may create no-shows. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io> Co-authored-by:
Andrei Sandu <54316454+sandreim@users.noreply.github.com>
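A sketch of proposed improvements 2 and 3; `resend_unfinalized_period = 8` and the back-off factor 3 come from the text, while the distance limit and function shape are illustrative assumptions:
```rust
const RESEND_UNFINALIZED_PERIOD: u32 = 8;
/// Don't resend for blocks far above the first unfinalized block (improvement 2);
/// the value is illustrative only.
const MAX_DISTANCE_FROM_FIRST_UNFINALIZED: u32 = 16;

fn should_resend(
    block_number: u32,
    first_unfinalized: u32,
    blocks_since_last_resend: u32,
) -> bool {
    let close_to_first_unfinalized =
        block_number.saturating_sub(first_unfinalized) <= MAX_DISTANCE_FROM_FIRST_UNFINALIZED;
    // Improvement 3: after resending, back off for 3 * resend_unfinalized_period.
    let backed_off = blocks_since_last_resend >= 3 * RESEND_UNFINALIZED_PERIOD;
    close_to_first_unfinalized && backed_off
}

fn main() {
    // Block 20 with block 1 still unfinalized: too far ahead, don't resend.
    assert!(!should_resend(20, 1, 100));
    // Block 5 is close enough, but we resent recently: keep backing off.
    assert!(!should_resend(5, 1, 10));
    // Block 5, and 24 blocks have passed since the last resend: resend.
    assert!(should_resend(5, 1, 24));
}
```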
-
- Dec 10, 2024
-
-
Alexandru Gheorghe authored
The way we build the messages we need to send to approval-distribution can result in a situation where, if we have multiple assignments covered by a coalesced approval, the messages are sent in this order: ASSIGNMENT1, APPROVAL, ASSIGNMENT2. This happens because we iterate over each candidate and add both the assignment and the approval for that candidate to the queue of messages, and when the approval reaches the approval-distribution subsystem it won't be imported and gossiped because one of the assignments for it is not known (see the sketch below). So in a network where a lot of nodes are restarting at the same time we could end up in a situation where one set of nodes correctly received the assignments and approvals before the restart, approve their blocks and don't trigger their assignments. The other set of nodes should receive the assignments and approvals after the restart, but because the approvals never get broadcast anymore because of this bug, the only way they could approve is if other nodes start broadcasting their assignments. I think this bug contributed to the reason the network did not recover on `2025-11-25 15:55:40` after the restarts. Tested this scenario with a `zombienet` where nodes are finalising blocks because of aggression and all nodes are restarted at once, and confirmed that the network lags and doesn't recover before the fix and does after it. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
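A self-contained sketch of the ordering fix implied above, with simplified message types: all assignments covered by a coalesced approval are queued before the approval itself:
```rust
#[derive(Debug, PartialEq)]
enum Message {
    Assignment { candidate: u32 },
    Approval { candidates: Vec<u32> },
}

/// Build the message queue for an approval that covers several candidates.
fn messages_for_coalesced_approval(candidates: Vec<u32>) -> Vec<Message> {
    let mut queue: Vec<Message> = candidates
        .iter()
        .map(|&candidate| Message::Assignment { candidate })
        .collect();
    // The approval goes last, after every assignment it depends on, so the
    // receiving side never sees an approval whose assignment is unknown.
    queue.push(Message::Approval { candidates });
    queue
}

fn main() {
    let queue = messages_for_coalesced_approval(vec![1, 2]);
    assert_eq!(
        queue,
        vec![
            Message::Assignment { candidate: 1 },
            Message::Assignment { candidate: 2 },
            Message::Approval { candidates: vec![1, 2] },
        ]
    );
}
```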
-
Joseph Zhao authored
Close: #5858 --------- Co-authored-by:
Bastian Köcher <git@kchr.de>
-
- Dec 09, 2024
-
-
Alexandru Gheorghe authored
After finality started lagging on kusama around 2025-11-25 15:55:40, validators occasionally started seeing this log when importing votes covering more than one assignment:
```
Possible bug: Vote import failed
```
That happens because the assumption that assignments from the same validator would have the same required routing doesn't hold after you enable aggression: you might receive the first assignment, then modify the routing for it in `enable_aggression`, then receive the second assignment and the vote covering both assignments, so the routing for the first and second assignment wouldn't match and we would fail to import the vote. From the logs I've seen, I don't think this is the reason the network didn't fully recover until the failsafe kicked in, because the votes had already been imported in approval-voting before this error. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
-
- Dec 03, 2024
-
-
Lulu authored
-
- Nov 29, 2024
-
-
eskimor authored
This might actually happen in non-malicious cases. Co-authored-by:
eskimor <eskimor@no-such-url.com>
-
- Nov 25, 2024
-
-
jpserrat authored
Closes #6415 # Description Remove the unused message `ReportCollator` and the test related to this message on the collator protocol validator side. cc: @tdimitrov --------- Co-authored-by:
Tsvetomir Dimitrov <tsvetomir@parity.io> Co-authored-by: command-bot <>
-
- Nov 19, 2024
-
-
Bastian Köcher authored
This pull request forwards all the logging directives given to the node via `RUST_LOG` or `-l` to the workers, instead of only forwarding `RUST_LOG`. --------- Co-authored-by:
GitHub Action <action@github.com>
-
- Nov 18, 2024
-
-
Tsvetomir Dimitrov authored
Since the async backing parameters runtime API is released on all networks, the code in the backing subsystem can be simplified by removing the usages of `ProspectiveParachainsMode` and keeping only the code branches under `ProspectiveParachainsMode::Enabled`. The PR does that and reworks the tests in mod.rs to use async backing. It's a preparation for https://github.com/paritytech/polkadot-sdk/issues/5079 --------- Co-authored-by:
Alin Dima <alin@parity.io> Co-authored-by: command-bot <>
-