    [5 / 5] Introduce approval-voting-parallel (#4849) · b16237ad
    Alexandru Gheorghe authored
    This is the implementation of the approach described here:
    https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2150321612
    &
    https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2154357547
    &
    https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2154721395.
    
    ## Description of changes
    
    The end goal is an architecture with a single subsystem
    (`approval-voting-parallel`) and multiple worker types that fulfil the
    work currently done by the `approval-distribution` and
    `approval-voting` subsystems. The main loop of the new subsystem only
    distributes work to the workers.
    
    The new subsystem will have:
    - N approval-distribution workers: these do the work currently done by
    the approval-distribution subsystem and, in addition, perform the
    crypto checks that an assignment is valid and that a vote is correctly
    signed. Work is assigned via the following formula:
    `worker_index = msg.validator % WORKER_COUNT`, which guarantees that
    all assignments and approvals from the same validator reach the same
    worker.
    - 1 approval-voting worker: this receives an already validated message
    and does everything approval-voting currently does, except the crypto
    checks, which have already moved to the approval-distribution workers.
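
    The routing rule above can be sketched as follows. This is a minimal
    illustration, not the code from this PR: `ValidatorIndex` is a stand-in
    type and `WORKER_COUNT = 8` is an assumption taken from the eight
    workers visible in the benchmark output further down.

    ```rust
    /// Assumed worker count for illustration (the benchmark below shows 8 workers).
    pub const WORKER_COUNT: usize = 8;

    /// Illustrative stand-in for a validator index, not the real polkadot-sdk type.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct ValidatorIndex(pub u32);

    /// Implements `worker_index = msg.validator % WORKER_COUNT`.
    /// The mapping is a pure function of the validator, so every assignment
    /// and approval from one validator lands on the same worker.
    pub fn worker_index(validator: ValidatorIndex) -> usize {
        (validator.0 as usize) % WORKER_COUNT
    }
    ```

    Because the mapping depends only on the validator, no per-validator
    state has to be shared between workers.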
    
    On the hot path of processing messages **no** synchronisation and
    waiting is needed between approval-distribution and approval-voting
    workers.
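
    As a rough sketch of that hot path, assuming plain threads and
    `std::sync::mpsc` channels (the actual subsystem uses its own
    messaging, and `Msg` and `run_pipeline` are hypothetical names): the
    main loop only routes, each distribution worker forwards checked
    messages one way, and nobody waits for a reply.

    ```rust
    use std::sync::mpsc;
    use std::thread;

    /// Hypothetical message type: which validator sent it, plus a payload.
    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct Msg {
        pub validator: u32,
        pub payload: u32,
    }

    /// Routes `messages` to `worker_count` distribution workers, which forward
    /// them to a single voting worker. Returns what the voting worker received.
    pub fn run_pipeline(messages: Vec<Msg>, worker_count: usize) -> Vec<Msg> {
        assert!(worker_count > 0, "need at least one distribution worker");
        let (to_voting, voting_rx) = mpsc::channel::<Msg>();

        let mut worker_txs = Vec::with_capacity(worker_count);
        let mut handles = Vec::with_capacity(worker_count);
        for _ in 0..worker_count {
            let (tx, rx) = mpsc::channel::<Msg>();
            let to_voting = to_voting.clone();
            worker_txs.push(tx);
            handles.push(thread::spawn(move || {
                for msg in rx {
                    // A real worker would run the crypto checks here.
                    // The send is one-way: the worker never waits for an answer.
                    to_voting.send(msg).expect("voting worker is alive");
                }
            }));
        }
        // Drop the original sender so the voting worker stops once all workers finish.
        drop(to_voting);

        // Single voting worker: consumes already-checked messages.
        let voting = thread::spawn(move || voting_rx.into_iter().collect::<Vec<Msg>>());

        // Main loop: only distributes work, keyed by validator.
        for msg in messages {
            let idx = (msg.validator as usize) % worker_count;
            worker_txs[idx].send(msg).expect("distribution worker is alive");
        }
        drop(worker_txs);

        for handle in handles {
            handle.join().expect("distribution worker panicked");
        }
        voting.join().expect("voting worker panicked")
    }
    ```

    In this sketch, messages from one validator keep their relative order
    end to end, since they always go through the same worker and channel.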
    
    <img width="1431" alt="Screenshot 2024-06-07 at 11 28 08"
    src="https://github.com/paritytech/polkadot-sdk/assets/49718502/a196199b-b705-4140-87d4-c6900ba8595e">
    
    
    
    ## Guidelines for reading
    
    The full implementation is split into 5 PRs. All of them are
    self-contained and improve things incrementally even without the
    parallelisation being implemented/enabled. This approach was taken
    instead of a big-bang PR to make things easier to review and to
    reduce the risk of breaking these critical subsystems.
    
    After reading the full description of this PR, the changes should be
    read in the following order:
    1. https://github.com/paritytech/polkadot-sdk/pull/4848, some other
    micro-optimizations for networks with a high number of validators. This
    change gives us a speed up by itself without any other changes.
    2. https://github.com/paritytech/polkadot-sdk/pull/4845, this contains
    only interface changes to decouple the subsystem from the `Context`, so
    that multiple instances of the subsystem can run on different threads.
    **No functional changes**
    3. https://github.com/paritytech/polkadot-sdk/pull/4928, moves the
    crypto checks from approval-voting into approval-distribution, so that
    approval-distribution no longer has any reason to wait on
    approval-voting. This change gives us a speed up by itself without any
    other changes.
    4. https://github.com/paritytech/polkadot-sdk/pull/4846, interface
    changes to make approval-voting runnable on a separate thread. **No
    functional changes**
    5. This PR, where we instantiate an `approval-voting-parallel` subsystem
    that runs the logic currently in `approval-distribution` and
    `approval-voting` on different workers.
    6. The next step, after these changes get merged and deployed, would be
    to bring all the files from approval-distribution, approval-voting and
    approval-voting-parallel into a single Rust crate, to make the
    structure easier to maintain and understand.
    
    ## Results
    Running subsystem-benchmarks with 1000 validators and 100 fully occupied
    cores, triggering all assignments and approvals for all tranches.
    
    #### Approval does not lag behind
    Master
    ```
    Chain selection approved  after 72500 ms hash=0x0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a
    ```
    With this PoC
    ```
    Chain selection approved  after 3500 ms hash=0x0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a
    ```
    
    #### Gathering enough assignments
     
    Enough assignments are gathered in less than 500 ms, which guarantees
    that unnecessary work does not get triggered. On master, in the same
    benchmark, that number goes above 32 seconds because the subsystems
    fall behind on work.
     
    <img width="2240" alt="Screenshot 2024-06-20 at 15 48 22"
    src="https://github.com/paritytech/polkadot-sdk/assets/49718502/d2f2b29c-5ff6-44b4-a245-5b37ab8e58bc">
    
    
    #### CPU usage:
    Master
    ```
    CPU usage, seconds                     total   per block
    approval-distribution                96.9436      9.6944
    approval-voting                     117.4676     11.7468
    test-environment                     44.0092      4.4009
    ```
    With this PoC
    ```
    CPU usage, seconds                     total   per block
    approval-distribution                 0.0014      0.0001  --- unused
    approval-voting                       0.0437      0.0044  --- unused
    approval-voting-parallel              5.9560      0.5956
    approval-voting-parallel-0           22.9073      2.2907
    approval-voting-parallel-1           23.0417      2.3042
    approval-voting-parallel-2           22.0445      2.2045
    approval-voting-parallel-3           22.7234      2.2723
    approval-voting-parallel-4           21.9788      2.1979
    approval-voting-parallel-5           23.0601      2.3060
    approval-voting-parallel-6           22.4805      2.2481
    approval-voting-parallel-7           21.8330      2.1833
    approval-voting-parallel-db          37.1954      3.7195  --- the approval-voting thread
    ```
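
    Summing the rows gives a rough picture of where the CPU time goes. The
    sketch below is only arithmetic over the figures quoted in the two
    tables (the helper and constant names are made up for illustration, and
    the `test-environment` and `unused` rows are left out). Read that way,
    total approval CPU time is of the same order in both runs, roughly
    214 s on master versus roughly 223 s with this PoC, but with the PoC it
    is spread over ten threads instead of two.

    ```rust
    /// Sums the "total" CPU-seconds column of a benchmark table.
    pub fn total_cpu_seconds(rows: &[(&str, f64)]) -> f64 {
        rows.iter().map(|&(_, secs)| secs).sum()
    }

    /// Approval rows from the master run (test-environment excluded).
    pub const MASTER: [(&str, f64); 2] = [
        ("approval-distribution", 96.9436),
        ("approval-voting", 117.4676),
    ];

    /// approval-voting-parallel rows from the PoC run (unused subsystems excluded).
    pub const PARALLEL: [(&str, f64); 10] = [
        ("approval-voting-parallel", 5.9560),
        ("approval-voting-parallel-0", 22.9073),
        ("approval-voting-parallel-1", 23.0417),
        ("approval-voting-parallel-2", 22.0445),
        ("approval-voting-parallel-3", 22.7234),
        ("approval-voting-parallel-4", 21.9788),
        ("approval-voting-parallel-5", 23.0601),
        ("approval-voting-parallel-6", 22.4805),
        ("approval-voting-parallel-7", 21.8330),
        ("approval-voting-parallel-db", 37.1954),
    ];
    ```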
    
    # Enablement strategy
    
    Because only some trivial plumbing is needed in approval-distribution
    and approval-voting to run things in parallel, and because these
    subsystems play a critical part in the system, this PR proposes that we
    keep both ways of running the approval work: as separate subsystems, or
    as a single subsystem (`approval-voting-parallel`) which has multiple
    workers for the distribution work and one worker for the
    approval-voting work. We switch between them with a command-line flag.
    
    The benefits of this are twofold:
    1. With the same polkadot binary we can easily switch just a few
    validators to the parallel approach and gradually make it the default
    way of running, if no issues arise.
    2. In the worst-case scenario, where it becomes the default way of
    running things but we discover critical issues with it, we have a path
    to quickly disable it by asking validators to adjust their command-line
    flags.
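
    A minimal sketch of such a switch, assuming a boolean flag. The flag
    name `--enable-approval-voting-parallel`, the `ApprovalMode` enum and
    both helpers are hypothetical and only illustrate the idea of choosing
    between the two setups at startup:

    ```rust
    /// Which way of running the approval work is active (illustrative only).
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum ApprovalMode {
        /// `approval-distribution` and `approval-voting` as separate subsystems.
        Separate,
        /// The single `approval-voting-parallel` subsystem with its workers.
        Parallel,
    }

    /// Hypothetical flag name, used here only for illustration.
    pub const PARALLEL_FLAG: &str = "--enable-approval-voting-parallel";

    /// Picks the mode from command-line arguments; defaults to `Separate`.
    pub fn mode_from_args<I: IntoIterator<Item = String>>(args: I) -> ApprovalMode {
        if args.into_iter().any(|arg| arg == PARALLEL_FLAG) {
            ApprovalMode::Parallel
        } else {
            ApprovalMode::Separate
        }
    }

    /// Names of the subsystems that would do the approval work in each mode.
    pub fn active_subsystems(mode: ApprovalMode) -> &'static [&'static str] {
        match mode {
            ApprovalMode::Separate => &["approval-distribution", "approval-voting"],
            ApprovalMode::Parallel => &["approval-voting-parallel"],
        }
    }
    ```

    Defaulting to `Separate` mirrors the rollout idea above: a validator
    opts in explicitly, and removing the flag restores the old behaviour.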
    
    
    # Next steps
    - [x] Make sure through various testing we are not missing anything 
    - [x] Polish the implementations to make them production ready
    - [x] Add unit tests for approval-voting-parallel.
    - [x] Define and implement the strategy for rolling out this change, so
    that the blast radius is minimal (single validator) in case there are
    problems with the implementation.
    - [x] Versi long running tests.
    - [x] Add relevant metrics.
    
    @ordian @eskimor @sandreim @AndreiEres, let me know what you think.
    
    ---------
    
    Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>