    Add availability-recovery from systematic chunks (#1644) · 523e6256
    Alin Dima authored
    
    **Don't look at the commit history, it's confusing, as this branch is
    based on another branch that was merged**
    
    Fixes #598 
    Also implements [RFC
    #47](https://github.com/polkadot-fellows/RFCs/pull/47)
    
    ## Description
    
    - Availability-recovery now first attempts to request the systematic
    chunks for large PoVs. These are the first ~n/3 chunks, which can
    recover the full data without the costly Reed-Solomon decoding
    process. If this fails for any reason, recovery falls back to
    requesting all chunks. Additionally, backers serve as a backup source
    for systematic chunks if the assigned validator is not offering the
    chunk (each backer is only used for one systematic chunk, so as not to
    overload them).
    - Recovering from systematic chunks is much faster than recovering
    from regular chunks (4000% faster as measured on my Apple M2 Pro).
    - Introduces a `ValidatorIndex` -> `ChunkIndex` mapping which is
    different for every core, in order to avoid only querying the first n/3
    validators over and over again in the same session. The mapping is the
    one described in RFC 47.
    - The mapping is feature-gated by the [NodeFeatures runtime
    API](https://github.com/paritytech/polkadot-sdk/pull/2177) so that it
    can only be enabled via a governance call once a sufficient majority of
    validators have upgraded their client. If the feature is not enabled,
    the mapping will be the identity mapping and backwards-compatibility
    will be preserved.
    - Adds a new chunk request protocol version (v2), which adds the
    ChunkIndex to the response. The returned index is checked against the
    expected chunk index for av-distribution and systematic recovery, but
    not for regular recovery. The change is backwards compatible: a v2
    request is attempted first and, if it fails during protocol
    negotiation, v1 is used.
    - Systematic recovery is only attempted during approval-voting, where we
    have easy access to the core_index. For disputes and collator
    pov_recovery, regular chunk requests are used, just as before.
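
The `ValidatorIndex` -> `ChunkIndex` mapping described in the list above can be sketched roughly as follows. This is a minimal illustration in the spirit of RFC 47, not the code added by this PR: plain `u32` values stand in for the SDK's index types, the function names are hypothetical, and the threshold formula `floor((n - 1) / 3) + 1` is an assumption about what "~n/3" means here.

```rust
/// Assumed systematic threshold (roughly n/3): floor((n - 1) / 3) + 1.
/// Hypothetical helper for illustration, not the SDK's API.
fn systematic_threshold(n_validators: u32) -> u32 {
    n_validators.saturating_sub(1) / 3 + 1
}

/// Maps a validator to the index of the chunk it holds for a given core.
/// With the feature disabled this is the identity mapping, which keeps
/// the old behaviour.
fn chunk_index(
    n_validators: u32,
    validator_index: u32,
    core_index: u32,
    mapping_enabled: bool,
) -> u32 {
    assert!(n_validators > 0, "validator set must not be empty");
    if !mapping_enabled {
        return validator_index;
    }
    // Widen to u64 so the multiplication cannot overflow.
    let n = u64::from(n_validators);
    // Each core shifts the assignment by one threshold's worth of chunks,
    // so different cores place the systematic chunks on different validators.
    let core_start = u64::from(core_index) * u64::from(systematic_threshold(n_validators));
    ((core_start + u64::from(validator_index)) % n) as u32
}
```

Under these assumptions, with 10 validators the threshold is 4. For core 0 the systematic chunks 0..=3 sit on validators 0..=3, while for core 1 they sit on validators 6..=9, so repeated recoveries in one session do not always hit the same third of the validator set.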
    
    ## Performance results
    
    Some results from subsystem-bench:
    
    - with regular chunk recovery: CPU usage per block 39.82s
    - with recovery from backers: CPU usage per block 16.03s
    - with systematic recovery: CPU usage per block 19.07s
    
    End-to-end results here:
    https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099
    
    #### TODO:
    
    - [x] [RFC #47](https://github.com/polkadot-fellows/RFCs/pull/47)
    - [x] merge https://github.com/paritytech/polkadot-sdk/pull/2177 and
    rebase on top of those changes
    - [x] merge https://github.com/paritytech/polkadot-sdk/pull/2771 and
    rebase
    - [x] add tests
    - [x] preliminary performance measure on Versi: see
    https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099
    - [x] Rewrite the implementer's guide documentation
    - [x] https://github.com/paritytech/polkadot-sdk/pull/3065 
    - [x] https://github.com/paritytech/zombienet/issues/1705 and fix
    zombienet tests
    - [x] security audit
    - [x] final versi test and performance measure
    
    ---------
    
    Signed-off-by: alindima <alin@parity.io>
    Co-authored-by: Javier Viola <javier@parity.io>