Commit 523e6256, authored by Alin Dima, committed by GitHub

Add availability-recovery from systematic chunks (#1644)



**Don't look at the commit history: it is confusing, because this branch
is based on another branch that has since been merged**

Fixes #598 
Also implements [RFC
#47](https://github.com/polkadot-fellows/RFCs/pull/47)

## Description

- Availability-recovery now first attempts to request the systematic
chunks for large PoVs. These are the first ~n/3 chunks, which are enough
to recover the full data without the costly Reed-Solomon decoding
process. If this fails for any reason, recovery falls back to using all
chunks. Additionally, backers are used as a backup source for the
systematic chunks if the assigned validator does not provide its chunk
(each backer is only used for one systematic chunk, so as not to
overload them).
- Recovering from systematic chunks is much faster than recovering from
regular chunks (4000% faster as measured on my Apple M2 Pro).
- Introduces a `ValidatorIndex` -> `ChunkIndex` mapping which is
different for every core, in order to avoid querying only the first n/3
validators over and over again in the same session. The mapping is the
one described in RFC 47.
- The mapping is feature-gated by the [NodeFeatures runtime
API](https://github.com/paritytech/polkadot-sdk/pull/2177) so that it
can only be enabled via a governance call once a sufficient majority of
validators have upgraded their client. If the feature is not enabled,
the mapping will be the identity mapping and backwards-compatibility
will be preserved.
- Adds a new chunk request protocol version (v2), which adds the
`ChunkIndex` to the response. Whether the returned index is checked
against the expected chunk index depends on the caller: it is checked
for av-distribution and systematic recovery, but not for regular
recovery. The change is backwards compatible: a v2 request is attempted
first, and if that fails during protocol negotiation, v1 is used.
- Systematic recovery is only attempted during approval-voting, where we
have easy access to the `core_index`. For disputes and collator
`pov_recovery`, regular chunk requests are used, just as before.
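
As an illustration of the per-core mapping described above, here is a minimal sketch, not the actual implementation. It assumes the mapping rotates the identity assignment by `core_index * systematic_threshold` modulo the number of validators, which is how I read RFC 47. The type wrappers, the function names, the `mapping_enabled` flag (standing in for the node feature) and the explicit `systematic_threshold` parameter are all hypothetical and chosen only for this example.

```rust
// Hypothetical stand-ins for the real index types; for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ValidatorIndex(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ChunkIndex(u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CoreIndex(u32);

/// Map a validator to the chunk it is assumed to hold for a given core.
///
/// Assumption: the mapping is a rotation by `core * systematic_threshold`.
/// With the feature disabled, this is the identity mapping.
fn chunk_index_for(
    n_validators: u32,
    systematic_threshold: u32,
    core: CoreIndex,
    validator: ValidatorIndex,
    mapping_enabled: bool,
) -> ChunkIndex {
    if !mapping_enabled || n_validators == 0 {
        return ChunkIndex(validator.0);
    }
    // Widen to u64 so the multiplication cannot overflow.
    let n = u64::from(n_validators);
    let core_start = (u64::from(core.0) * u64::from(systematic_threshold)) % n;
    ChunkIndex(((core_start + u64::from(validator.0)) % n) as u32)
}

/// List the validators that hold the systematic chunks
/// (chunk indices `0..systematic_threshold`) for a given core.
fn systematic_chunk_holders(
    n_validators: u32,
    systematic_threshold: u32,
    core: CoreIndex,
    mapping_enabled: bool,
) -> Vec<ValidatorIndex> {
    (0..n_validators)
        .map(ValidatorIndex)
        .filter(|&v| {
            let chunk =
                chunk_index_for(n_validators, systematic_threshold, core, v, mapping_enabled);
            chunk.0 < systematic_threshold
        })
        .collect()
}
```

With 10 validators and a systematic threshold of 3, `systematic_chunk_holders(10, 3, CoreIndex(1), true)` yields validators 7, 8 and 9, while core 2 yields validators 4, 5 and 6. With the mapping disabled, every core yields validators 0, 1 and 2, which is the repeated load on the same validators that the mapping is meant to avoid.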

## Performance results

Some results from subsystem-bench:

- with regular chunk recovery: CPU usage per block 39.82s
- with recovery from backers: CPU usage per block 16.03s
- with systematic recovery: CPU usage per block 19.07s

End-to-end results here:
https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099

#### TODO:

- [x] [RFC #47](https://github.com/polkadot-fellows/RFCs/pull/47)
- [x] merge https://github.com/paritytech/polkadot-sdk/pull/2177 and
rebase on top of those changes
- [x] merge https://github.com/paritytech/polkadot-sdk/pull/2771 and
rebase
- [x] add tests
- [x] preliminary performance measure on Versi: see
https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099
- [x] Rewrite the implementer's guide documentation
- [x] https://github.com/paritytech/polkadot-sdk/pull/3065 
- [x] https://github.com/paritytech/zombienet/issues/1705 and fix
zombienet tests
- [x] security audit
- [x] final Versi test and performance measure

---------

Signed-off-by: alindima <[email protected]>
Co-authored-by: Javier Viola <[email protected]>
parent 09f07d54