Add availability-recovery from systematic chunks (#1644)
**Don't look at the commit history, it's confusing, as this branch is based on another branch that was merged**

Fixes #598

Also implements [RFC #47](https://github.com/polkadot-fellows/RFCs/pull/47)

## Description

- Availability-recovery now first attempts to request the systematic chunks for large POVs. These are the first ~n/3 chunks, which can recover the full data without the costly Reed-Solomon decoding process. If this fails for any reason, recovery falls back to using all chunks. Additionally, backers are used as a backup for requesting the systematic chunks if the assigned validator is not offering the chunk (each backer is only used for one systematic chunk, to not overload them).
- Recovering from systematic chunks is much faster than recovering from regular chunks (4000% faster as measured on my Apple M2 Pro).
- Introduces a `ValidatorIndex` -> `ChunkIndex` mapping which is different for every core, in order to avoid querying only the first n/3 validators over and over again in the same session. The mapping is the one described in RFC 47.
- The mapping is feature-gated by the [NodeFeatures runtime API](https://github.com/paritytech/polkadot-sdk/pull/2177) so that it can only be enabled via a governance call once a sufficient majority of validators have upgraded their client. If the feature is not enabled, the mapping is the identity mapping and backwards compatibility is preserved.
- Adds a new chunk request protocol version (v2), which adds the `ChunkIndex` to the response. The returned index may or may not be checked against the expected chunk index: it is checked for av-distribution and systematic recovery, but not for regular recovery. This is backwards compatible. First, a v2 request is attempted. If that fails during protocol negotiation, v1 is used.
- Systematic recovery is only attempted during approval-voting, where we have easy access to the `core_index`. For disputes and collator pov_recovery, regular chunk requests are used, just as before.

## Performance results

Some results from subsystem-bench:

- with regular chunk recovery: CPU usage per block 39.82s
- with recovery from backers: CPU usage per block 16.03s
- with systematic recovery: CPU usage per block 19.07s

End-to-end results here: https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099

#### TODO:

- [x] [RFC #47](https://github.com/polkadot-fellows/RFCs/pull/47)
- [x] merge https://github.com/paritytech/polkadot-sdk/pull/2177 and rebase on top of those changes
- [x] merge https://github.com/paritytech/polkadot-sdk/pull/2771 and rebase
- [x] add tests
- [x] preliminary performance measure on Versi: see https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099
- [x] Rewrite the implementer's guide documentation
- [x] https://github.com/paritytech/polkadot-sdk/pull/3065
- [x] https://github.com/paritytech/zombienet/issues/1705 and fix zombienet tests
- [x] security audit
- [x] final Versi test and performance measure

---------

Signed-off-by: alindima <[email protected]>
Co-authored-by: Javier Viola <[email protected]>
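
For readers unfamiliar with the per-core `ValidatorIndex` -> `ChunkIndex` mapping mentioned in the description, here is a minimal sketch of the idea. It is not the code from this change: the newtypes, the function names `systematic_threshold`, `chunk_index_for` and `systematic_chunk_holders`, the threshold formula, and the rotation by `core_index * threshold` are all illustrative assumptions. RFC 47 remains the authoritative definition of the mapping.

```rust
// Illustrative newtypes only; the real types live in the Polkadot primitives.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ValidatorIndex(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkIndex(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CoreIndex(pub u32);

/// Assumed number of systematic chunks: roughly the first n/3 chunks.
pub fn systematic_threshold(n_validators: u32) -> u32 {
    n_validators.saturating_sub(1) / 3 + 1
}

/// Maps a validator to the chunk it is expected to hold for a given core.
/// With the feature disabled this is the identity mapping, which keeps the
/// old behaviour. With it enabled, the assignment is rotated per core
/// (assumed rotation, see RFC 47 for the real definition).
pub fn chunk_index_for(
    mapping_enabled: bool,
    n_validators: u32,
    core: CoreIndex,
    validator: ValidatorIndex,
) -> ChunkIndex {
    if !mapping_enabled || n_validators == 0 {
        return ChunkIndex(validator.0);
    }
    let n = u64::from(n_validators);
    let offset = u64::from(core.0) * u64::from(systematic_threshold(n_validators));
    // The remainder is strictly below n, so it always fits back into a u32.
    ChunkIndex(((u64::from(validator.0) + offset) % n) as u32)
}

/// Lists the validators that hold a systematic chunk for the given core,
/// i.e. those whose mapped chunk index is below the systematic threshold.
pub fn systematic_chunk_holders(
    mapping_enabled: bool,
    n_validators: u32,
    core: CoreIndex,
) -> Vec<ValidatorIndex> {
    let threshold = systematic_threshold(n_validators);
    (0..n_validators)
        .map(ValidatorIndex)
        .filter(|v| chunk_index_for(mapping_enabled, n_validators, core, *v).0 < threshold)
        .collect()
}
```

Under these assumptions, with 10 validators the systematic chunks for core 0 sit with validators 0 to 3, while for core 1 they sit with validators 6 to 9, so different cores spread the systematic-chunk requests over different validators. With the feature disabled, every core queries validators 0 to 3.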