1. Apr 08, 2024
• Integrate litep2p into Polkadot SDK (#2944) · 80616f6d
  Aaro Altonen authored
[litep2p](https://github.com/altonen/litep2p) is a libp2p-compatible P2P
networking library. It supports all of the `rust-libp2p` features that
are currently used by the Polkadot SDK.
      
Compared to `rust-libp2p`, `litep2p` has quite a different architecture,
which is why the new `litep2p` network backend reuses only a little of
the existing code in `sc-network`. The design has mainly been influenced
by how we'd wish to structure our networking-related code in Polkadot
SDK: independent higher-level protocols communicating directly with the
network over links that support bidirectional backpressure. A good
example is the `NotificationHandle`/`RequestResponseHandle` abstractions,
which allow, e.g., `SyncingEngine` to communicate directly with peers to
announce/request blocks.
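
As a rough illustration of that design, here is a minimal, self-contained
sketch: the type names mirror the description above, but the channel-based
shapes are invented for illustration and are not the actual `sc-network`
API. Bounded queues in both directions are what provide the bidirectional
backpressure.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Stand-in for a real network notification; `peer` replaces a real PeerId.
struct Notification {
    peer: u64,
    payload: Vec<u8>,
}

/// Bounded queues in both directions: a congested network blocks the
/// protocol, and a busy protocol blocks the network.
struct NotificationHandle {
    to_network: SyncSender<Notification>,
    from_network: Receiver<Notification>,
}

impl NotificationHandle {
    /// Queue an outbound notification; blocks if the network side is full.
    fn send(&self, n: Notification) {
        let _ = self.to_network.send(n);
    }
    /// Wait for the next inbound notification.
    fn next(&self) -> Option<Notification> {
        self.from_network.recv().ok()
    }
}

fn main() {
    let (to_network, network_rx) = sync_channel::<Notification>(64);
    let (network_tx, from_network) = sync_channel::<Notification>(64);
    let handle = NotificationHandle { to_network, from_network };

    // A protocol such as `SyncingEngine` would hold `handle` and use it
    // directly, while the network backend drains `network_rx` and feeds
    // `network_tx`.
    network_tx
        .send(Notification { peer: 1, payload: b"block-announce".to_vec() })
        .unwrap();
    let n = handle.next().unwrap();
    handle.send(Notification { peer: n.peer, payload: b"block-request".to_vec() });
    assert_eq!(network_rx.recv().unwrap().payload, b"block-request");
}
```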
      
      I've tried running `polkadot --network-backend litep2p` with a few
      different peer configurations and there is a noticeable reduction in
      networking CPU usage. For high load (`--out-peers 200`), networking CPU
      usage goes down from ~110% to ~30% (80 pp) and for normal load
      (`--out-peers 40`), the usage goes down from ~55% to ~18% (37 pp).
      
      These should not be taken as final numbers because:
      
a) there are still some low-hanging optimization fruits, such as enabling
[receive window auto-tuning](https://github.com/libp2p/rust-yamux/pull/176),
integrating `Peerset` more closely with `litep2p`, or improving memory
usage of the WebSocket transport
      b) fixing bugs/instabilities that incorrectly cause `litep2p` to do less
      work will increase the networking CPU usage
      c) verification in a more diverse set of tests/conditions is needed
      
      Nevertheless, these numbers should give an early estimate for CPU usage
      of the new networking backend.
      
This PR consists of three separate changes:
* introduce a generic `PeerId` (a wrapper around `Multihash`) so that we
don't have to use `NetworkService::PeerId` in every part of the code that
uses a `PeerId`
* introduce a `NetworkBackend` trait, implement it for the libp2p network
stack and make Polkadot SDK generic over `NetworkBackend` (a rough sketch
of both follows this list)
* implement `NetworkBackend` for litep2p
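
A rough sketch of the shape of the first two changes; the `PeerId` wrapper
and the trait follow the names in this list, but the method set shown here
is invented for illustration and is far smaller than the real trait:

```rust
/// A thin, generic peer identity, so code no longer needs to name
/// `NetworkService::PeerId` everywhere. The real type wraps a `Multihash`;
/// a byte vector stands in for it here.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct PeerId(pub Vec<u8>);

/// Abstraction over a networking stack. Each backend (libp2p, litep2p)
/// provides its own implementation, and the rest of the node is generic
/// over this trait.
pub trait NetworkBackend {
    /// Backend-specific configuration.
    type Config;
    /// Start the backend from its configuration.
    fn new(config: Self::Config) -> Self
    where
        Self: Sized;
    /// The local node's peer identity.
    fn local_peer_id(&self) -> PeerId;
}

struct Litep2pBackend {
    local: PeerId,
}

impl NetworkBackend for Litep2pBackend {
    type Config = ();
    fn new(_config: ()) -> Self {
        Litep2pBackend { local: PeerId(vec![0x12, 0x34]) }
    }
    fn local_peer_id(&self) -> PeerId {
        self.local.clone()
    }
}

fn main() {
    let backend = Litep2pBackend::new(());
    println!("local peer: {:?}", backend.local_peer_id());
}
```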
      
The new library should be considered experimental, which is why
`rust-libp2p` will remain the default option for the time being. This
PR currently depends on the master branch of `litep2p`, but I'll cut a
new release for the library once all review comments have been
addressed.
      
      ---------
      
Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>
2. Mar 26, 2024
• [subsystem-benchmarks] Save results to json (#3829) · fd79b3b0
  Andrei Eres authored
Here we add the ability to save subsystem benchmark results in JSON
format so that they can be displayed as graphs.
      
To draw the graphs, the CI team will use
[github-action-benchmark](https://github.com/benchmark-action/github-action-benchmark).
Since we are using custom benchmarks, we need to prepare [a specific
data
type](https://github.com/benchmark-action/github-action-benchmark?tab=readme-ov-file#examples):
      ```
      [
          {
              "name": "CPU Load",
              "unit": "Percent",
              "value": 50
          }
      ]
      ```
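
Producing that array from the collected results is straightforward with
`serde`; a minimal sketch, assuming the `serde` (with the derive feature)
and `serde_json` crates rather than the PR's actual code:

```rust
use serde::Serialize;

/// One entry in the format expected by github-action-benchmark.
#[derive(Serialize)]
struct BenchmarkEntry {
    name: String,
    unit: String,
    value: f64,
}

fn main() -> Result<(), serde_json::Error> {
    let entries = vec![BenchmarkEntry {
        name: "CPU Load".into(),
        unit: "Percent".into(),
        value: 50.0,
    }];
    // Prints the JSON array shown above.
    println!("{}", serde_json::to_string_pretty(&entries)?);
    Ok(())
}
```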
      
      Then we'll get graphs like this: 
      
      ![example](https://raw.githubusercontent.com/rhysd/ss/master/github-action-benchmark/main.png)
      
[A live page with
graphs](https://benchmark-action.github.io/github-action-benchmark/dev/bench/)
      
      ---------
      
Co-authored-by: ordian <[email protected]>
• Fix spelling mistakes across the whole repository (#3808) · 002d9260
  Dcompoze authored
      **Update:** Pushed additional changes based on the review comments.
      
      **This pull request fixes various spelling mistakes in this
      repository.**
      
      Most of the changes are contained in the first **3** commits:
      
      - `Fix spelling mistakes in comments and docs`
      
      - `Fix spelling mistakes in test names`
      
      - `Fix spelling mistakes in error messages, panic messages, logs and
      tracing`
      
      Other source code spelling mistakes are separated into individual
      commits for easier reviewing:
      
      - `Fix the spelling of 'authority'`
      
      - `Fix the spelling of 'REASONABLE_HEADERS_IN_JUSTIFICATION_ANCESTRY'`
      
      - `Fix the spelling of 'prev_enqueud_messages'`
      
      - `Fix the spelling of 'endpoint'`
      
      - `Fix the spelling of 'children'`
      
      - `Fix the spelling of 'PenpalSiblingSovereignAccount'`
      
      - `Fix the spelling of 'PenpalSudoAccount'`
      
      - `Fix the spelling of 'insufficient'`
      
      - `Fix the spelling of 'PalletXcmExtrinsicsBenchmark'`
      
      - `Fix the spelling of 'subtracted'`
      
      - `Fix the spelling of 'CandidatePendingAvailability'`
      
      - `Fix the spelling of 'exclusive'`
      
      - `Fix the spelling of 'until'`
      
      - `Fix the spelling of 'discriminator'`
      
      - `Fix the spelling of 'nonexistent'`
      
      - `Fix the spelling of 'subsystem'`
      
      - `Fix the spelling of 'indices'`
      
      - `Fix the spelling of 'committed'`
      
      - `Fix the spelling of 'topology'`
      
      - `Fix the spelling of 'response'`
      
      - `Fix the spelling of 'beneficiary'`
      
      - `Fix the spelling of 'formatted'`
      
      - `Fix the spelling of 'UNKNOWN_PROOF_REQUEST'`
      
      - `Fix the spelling of 'succeeded'`
      
      - `Fix the spelling of 'reopened'`
      
      - `Fix the spelling of 'proposer'`
      
      - `Fix the spelling of 'InstantiationNonce'`
      
      - `Fix the spelling of 'depositor'`
      
      - `Fix the spelling of 'expiration'`
      
      - `Fix the spelling of 'phantom'`
      
      - `Fix the spelling of 'AggregatedKeyValue'`
      
      - `Fix the spelling of 'randomness'`
      
      - `Fix the spelling of 'defendant'`
      
      - `Fix the spelling of 'AquaticMammal'`
      
      - `Fix the spelling of 'transactions'`
      
      - `Fix the spelling of 'PassingTracingSubscriber'`
      
      - `Fix the spelling of 'TxSignaturePayload'`
      
      - `Fix the spelling of 'versioning'`
      
      - `Fix the spelling of 'descendant'`
      
      - `Fix the spelling of 'overridden'`
      
      - `Fix the spelling of 'network'`
      
      Let me know if this structure is adequate.
      
**Note:** The usage of the words `Merkle`, `Merkelize`, `Merklization`,
`Merkelization`, `Merkleization` is somewhat inconsistent, but I left it
as it is.
      
      ~~**Note:** In some places the term `Receival` is used to refer to
      message reception, IMO `Reception` is the correct word here, but I left
      it as it is.~~
      
      ~~**Note:** In some places the term `Overlayed` is used instead of the
      more acceptable version `Overlaid` but I also left it as it is.~~
      
      ~~**Note:** In some places the term `Applyable` is used instead of the
      correct version `Applicable` but I also left it as it is.~~
      
**Note:** Both British and American English spellings, e.g. `judgement`
vs `judgment`, `initialise` vs `initialize`, `optimise` vs `optimize`
etc., are present in different places, but I suppose that's
understandable given the number of contributors.
      
      ~~**Note:** There is a spelling mistake in `.github/CODEOWNERS` but it
      triggers errors in CI when I make changes to it, so I left it as it
      is.~~
3. Mar 11, 2024
• subsystem-bench: adjust test config to Kusama (#3583) · 05381afc
  Andrei Eres authored
      Fixes https://github.com/paritytech/polkadot-sdk/issues/3528
      
```
      latency:
          mean_latency_ms = 30 // common sense
          std_dev = 2.0 // common sense
      n_validators = 300 // max number of validators, from chain config
      n_cores = 60 // 300/5
      max_validators_per_core = 5 // default
      min_pov_size = 5120 // max
      max_pov_size = 5120 // max
peer_bandwidth = 44040192 // from Parity's Kusama validators
bandwidth = 44040192 // from Parity's Kusama validators
connectivity = 90 // we need to be connected to 90-95% of peers
      ```
4. Mar 01, 2024
• subsystem-bench: add regression tests for availability read and write (#3311) · f0e589d7
  Andrei Eres authored
      ### What's been done
- `subsystem-bench` has been split into two parts: a CLI benchmark
runner and a library.
- The CLI runner is quite simple. It just allows us to run `.yaml`-based
test sequences. It should now only be used to run benchmarks during
development.
      - The library is used in the cli runner and in regression tests. Some
      code is changed to make the library independent of the runner.
- Added the first regression tests for availability read and write that
replicate existing test sequences.
      
      ### How we run regression tests
- Regression tests are simply Rust integration tests without the
harnesses.
- They should only be compiled under the `subsystem-benchmarks` feature
to prevent them from running with other tests.
- This doesn't work when running tests with `nextest` in CI, so
additional filters have been added to the `nextest` runs.
- Each benchmark run takes a different time in the beginning, so we
"warm up" the tests until their CPU usage differs by only 1%.
- After the warm-up, we run the benchmarks a few more times and compare
the average against the expected value, within a configured precision
(see the sketch below).
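
A sketch of that warm-up-then-compare procedure; the `run_benchmark`
closure is a stand-in for a real subsystem benchmark run, and the
constants are illustrative:

```rust
/// Warm up until two consecutive runs differ by at most 1%, then average
/// a few more runs; the caller compares the average against the expected
/// value within a configured precision.
fn measure(mut run_benchmark: impl FnMut() -> f64) -> f64 {
    let mut prev = run_benchmark();
    loop {
        let cur = run_benchmark();
        if (cur - prev).abs() / prev <= 0.01 {
            break; // CPU usage stabilized
        }
        prev = cur;
    }
    let runs: Vec<f64> = (0..5).map(|_| run_benchmark()).collect();
    runs.iter().sum::<f64>() / runs.len() as f64
}

fn main() {
    // Toy workload whose "CPU usage" converges, to exercise the loop.
    let mut x = 14.0;
    let avg = measure(move || {
        x = x * 0.5 + 5.0;
        x
    });
    assert!((avg - 10.0).abs() < 0.5);
}
```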
      
      ### What is still wrong?
      - I haven't managed to set up approval voting tests. The spread of their
      results is too large and can't be narrowed down in a reasonable amount
      of time in the warm-up phase.
- The tests start an unconfigurable Prometheus endpoint inside, which
causes errors because they all use the same port 9999. I disable it with
a flag, but I think it's better to move the endpoint launching outside
the tests, as we already do with `valgrind` and `pyroscope`. But we
still use `prometheus` inside the tests.
      
      ### Future work
      * https://github.com/paritytech/polkadot-sdk/issues/3528
      * https://github.com/paritytech/polkadot-sdk/issues/3529
      * https://github.com/paritytech/polkadot-sdk/issues/3530
      * https://github.com/paritytech/polkadot-sdk/issues/3531
      
      
      
      ---------
      
Co-authored-by: Alexander Samusev <[email protected]>
5. Feb 20, 2024
• Lift dependencies to the workspace (Part 2/x) (#3366) · e89d0fca
  Oliver Tale-Yazdi authored
      
      
Lifting some more dependencies to the workspace. Just using the
most-often updated ones for now.
The changes can be reproduced locally:
      
      ```sh
      # First you can check if there would be semver incompatible bumps (looks good in this case):
      $ zepter transpose dependency lift-to-workspace --ignore-errors syn quote thiserror "regex:^serde.*"
      
      # Then apply the changes:
      $ zepter transpose dependency lift-to-workspace --version-resolver=highest syn quote thiserror "regex:^serde.*" --fix
      
      # And format the changes:
      $ taplo format --config .config/taplo.toml
      ```
      
      ---------
      
Signed-off-by: Oliver Tale-Yazdi <[email protected]>
6. Feb 06, 2024
• subsystem-bench: Prepare CI output (#3158) · 9e6298e7
  Andrei Eres authored
      
      
1. Benchmark results are collected in a single struct.
2. The output of the results is prettified.
3. The result struct is used to save the output as YAML and store it in
artifacts in a CI job (a sketch of this follows).
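
A sketch of the "single struct, saved as YAML" idea; it assumes the
`serde` and `serde_yaml` crates, and the field names mirror the `--ci`
output below rather than the PR's real types:

```rust
use serde::Serialize;

#[derive(Serialize)]
struct ResourceUsage {
    resource: String,
    total: f64,
    per_block: f64,
}

#[derive(Serialize)]
struct BenchmarkOutput {
    benchmark_name: String,
    network: Vec<ResourceUsage>,
    cpu: Vec<ResourceUsage>,
}

fn main() -> Result<(), serde_yaml::Error> {
    let results = vec![BenchmarkOutput {
        benchmark_name: "availability_read.yaml #1".into(),
        network: vec![ResourceUsage {
            resource: "Received from peers".into(),
            total: 509011.0,
            per_block: 169670.3,
        }],
        cpu: vec![ResourceUsage {
            resource: "availability-recovery".into(),
            total: 31.8,
            per_block: 10.6,
        }],
    }];
    // With `--ci`, YAML like this is what gets stored as a job artifact.
    println!("{}", serde_yaml::to_string(&results)?);
    Ok(())
}
```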
      
      ```
      $ cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/availability_read.yaml | tee output.txt
      $ cat output.txt
      
      polkadot/node/subsystem-bench/examples/availability_read.yaml #1
      
      Network usage, KiB                     total   per block
      Received from peers               510796.000  170265.333
      Sent to peers                        221.000      73.667
      
      CPU usage, s                           total   per block
      availability-recovery                 38.671      12.890
      Test environment                       0.255       0.085
      
      
      polkadot/node/subsystem-bench/examples/availability_read.yaml #2
      
      Network usage, KiB                     total   per block
      Received from peers               413633.000  137877.667
      Sent to peers                        353.000     117.667
      
      CPU usage, s                           total   per block
      availability-recovery                 52.630      17.543
      Test environment                       0.271       0.090
      
      
      polkadot/node/subsystem-bench/examples/availability_read.yaml #3
      
      Network usage, KiB                     total   per block
      Received from peers               424379.000  141459.667
      Sent to peers                        703.000     234.333
      
      CPU usage, s                           total   per block
      availability-recovery                 51.128      17.043
      Test environment                       0.502       0.167
      
      ```
      
      ```
      $ cargo run -p polkadot-subsystem-bench --release -- --ci test-sequence --path polkadot/node/subsystem-bench/examples/availability_read.yaml | tee output.txt
      $ cat output.txt
      - benchmark_name: 'polkadot/node/subsystem-bench/examples/availability_read.yaml #1'
        network:
        - resource: Received from peers
          total: 509011.0
          per_block: 169670.33333333334
        - resource: Sent to peers
          total: 220.0
          per_block: 73.33333333333333
        cpu:
        - resource: availability-recovery
          total: 31.845848445
          per_block: 10.615282815
        - resource: Test environment
          total: 0.23582828799999941
          per_block: 0.07860942933333313
      
      - benchmark_name: 'polkadot/node/subsystem-bench/examples/availability_read.yaml #2'
        network:
        - resource: Received from peers
          total: 411738.0
          per_block: 137246.0
        - resource: Sent to peers
          total: 351.0
          per_block: 117.0
        cpu:
        - resource: availability-recovery
          total: 18.93596025099999
          per_block: 6.31198675033333
        - resource: Test environment
          total: 0.2541994199999979
          per_block: 0.0847331399999993
      
      - benchmark_name: 'polkadot/node/subsystem-bench/examples/availability_read.yaml #3'
        network:
        - resource: Received from peers
          total: 424548.0
          per_block: 141516.0
        - resource: Sent to peers
          total: 703.0
          per_block: 234.33333333333334
        cpu:
        - resource: availability-recovery
          total: 16.54178526900001
          per_block: 5.513928423000003
        - resource: Test environment
          total: 0.43960946299999537
          per_block: 0.14653648766666513
      ```
      
      ---------
      
Co-authored-by: Andrei Sandu <[email protected]>
7. Feb 05, 2024
• Introduce approval-voting/distribution benchmark (#2621) · f9f88688
  Alexandru Gheorghe authored
      ## Summary
      Built on top of the tooling and ideas introduced in
      https://github.com/paritytech/polkadot-sdk/pull/2528, this PR introduces
      a synthetic benchmark for measuring and assessing the performance
      characteristics of the approval-voting and approval-distribution
      subsystems.
      
Currently this allows us to simulate the behaviour of these subsystems
along the following dimensions:
      ```
      TestConfiguration:
      # Test 1
      - objective: !ApprovalsTest
          last_considered_tranche: 89
          min_coalesce: 1
          max_coalesce: 6
          enable_assignments_v2: true
          send_till_tranche: 60
          stop_when_approved: false
          coalesce_tranche_diff: 12
          workdir_prefix: "/tmp"
          num_no_shows_per_candidate: 0
          approval_distribution_expected_tof: 6.0
          approval_distribution_cpu_ms: 3.0
          approval_voting_cpu_ms: 4.30
        n_validators: 500
        n_cores: 100
        n_included_candidates: 100
        min_pov_size: 1120
        max_pov_size: 5120
        peer_bandwidth: 524288000000
        bandwidth: 524288000000
        latency:
          min_latency:
            secs: 0
            nanos: 1000000
          max_latency:
            secs: 0
            nanos: 100000000
        error: 0
        num_blocks: 10
      ```
      
      ## The approach
1. We build a real overseer with the real implementations of the
approval-voting and approval-distribution subsystems.
2. For a given network size, for each validator we pre-compute all
potential assignments and approvals it would send; because this is a
computation-heavy operation, the result is cached in a file on disk and
re-used if the generation parameters don't change (see the sketch after
this list).
3. The messages are then sent according to the configured parameters,
split across 3 main benchmarking scenarios.
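
A sketch of the caching idea from step 2: key a file on disk by a hash of
the generation parameters and regenerate only when they change. Paths,
names and the `generate` closure are all illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};

/// Load pre-computed data from disk if the generation parameters are
/// unchanged; otherwise run the computation-heavy generation and cache it.
fn load_or_generate<P: Hash>(params: &P, generate: impl FnOnce() -> Vec<u8>) -> Vec<u8> {
    let mut hasher = DefaultHasher::new();
    params.hash(&mut hasher);
    let path = std::env::temp_dir().join(format!("bench-cache-{:x}.bin", hasher.finish()));
    if let Ok(bytes) = fs::read(&path) {
        return bytes; // same parameters: reuse cached assignments/approvals
    }
    let bytes = generate(); // expensive: all potential assignments/approvals
    let _ = fs::write(&path, &bytes);
    bytes
}

fn main() {
    let params = (500u32, 100u32); // e.g. (n_validators, n_cores)
    let data = load_or_generate(&params, || vec![0u8; 1024]);
    assert_eq!(data.len(), 1024); // a rerun with the same params hits the cache
}
```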
      
      ## Benchmarking scenarios
      
      ### Best case scenario *approvals_throughput_best_case.yaml*
It sends to the approval-distribution only the minimum required tranches
to gather the needed_approvals, so that a candidate is approved.
      
      ### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
It sends the tranches needed to approve a candidate when we have a
maximum of *num_no_shows_per_candidate* tranches with no-shows for each
candidate.
      
      ### Maximum throughput *approvals_throughput.yaml*
It sends all the tranches for each block and measures the CPU usage and
network bandwidth needed by the approval-voting and
approval-distribution subsystems.
      
      ## How to run it
      ```
      cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
      ```
      
      ## Evaluating performance
      ### Use the real subsystems metrics
If you follow the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
for installing Prometheus and Grafana locally, all real metrics for
`approval-distribution`, `approval-voting` and the overseer are
available. E.g.:
      <img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">
      
      <img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">
      
      <img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">
      
      <img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">
      
      ### Profile with pyroscope
1. Set up pyroscope following the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
then run any of the benchmark scenarios with the `--profile` argument.
      2. Open the pyroscope dashboard in grafana, e.g:
      <img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">
      
      
      
### Useful logs
      1. Network bandwidth requirements:
      ```
      Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
      Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
      ```
      
2. CPU usage by the approval-distribution/approval-voting subsystems:
      ```
      approval-distribution CPU usage 84.061s
      approval-distribution CPU usage per block 8.406s
      approval-voting CPU usage 96.532s
      approval-voting CPU usage per block 9.653s
      ```
      
3. Time passed until a given block is approved:
      ```
       Chain selection approved  after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
      Chain selection approved  after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
      ```
      
### Using the benchmark to quantify improvements from
      https://github.com/paritytech/polkadot-sdk/pull/1178 +
      https://github.com/paritytech/polkadot-sdk/pull/1191
      
Using a versi node, we compare a scenario where all new optimisations
are disabled against a scenario where tranche0 assignments are sent in a
single message, together with a conservative simulation where the
coalescing of approvals gives us just a 50% reduction in the number of
messages we send.
      
      Overall, what we see is a speedup of around 30-40% in the time it takes
      to process the necessary messages and a 30-40% reduction in the
      necessary bandwidth.
      
#### Best case scenario comparison (minimum required tranches sent).
      Unoptimised
      ```
          Number of blocks: 10
          Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
          Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
          approval-distribution CPU usage 6.732s
          approval-distribution CPU usage per block 0.673s
          approval-voting CPU usage 9.523s
          approval-voting CPU usage per block 0.952s
      ```
      
      vs Optimisation enabled
      ```
         Number of blocks: 10
         Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
         Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
         approval-distribution CPU usage 4.658s
         approval-distribution CPU usage per block 0.466s
         approval-voting CPU usage 6.236s
         approval-voting CPU usage per block 0.624s
      ```
      
#### Worst case: all tranches sent; very unlikely, happens when sharding
breaks.
      
      Unoptimised
      ```
         Number of blocks: 10
         Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
         Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
         approval-distribution CPU usage 118.681s
         approval-distribution CPU usage per block 11.868s
         approval-voting CPU usage 124.118s
         approval-voting CPU usage per block 12.412s
      ```
      
      vs optimised
      ```
          Number of blocks: 10
          Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
          Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
          approval-distribution CPU usage 84.061s
          approval-distribution CPU usage per block 8.406s
          approval-voting CPU usage 96.532s
          approval-voting CPU usage per block 9.653s
      ```
      
      
## TODOs
- [x] Polish implementation.
- [x] Use what we have so far to evaluate
https://github.com/paritytech/polkadot-sdk/pull/1191 before merging.
- [x] List of features and additional dimensions we want to use for
benchmarking.
- [x] Run benchmark on hardware similar to versi and kusama nodes.
- [ ] Add benchmark to be run in CI for catching regressions in
performance.
- [ ] Rebase on latest changes for network emulation.
      
      ---------
      
Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>
8. Jan 16, 2024
• subsystem-bench: cache misses profiling (#2893) · ec7bfae0
  Andrei Eres authored
      ## Why we need it
To provide another level of understanding of why Polkadot's subsystems
may perform slower than expected. Cache misses occur when processing
large amounts of data, such as during availability recovery.
      
      ## Why Cachegrind
Cachegrind has many drawbacks: it is slow, and it uses its own, very
basic cache simulation. But unlike `perf`, which is a great tool,
Cachegrind can run in a virtual machine. This means we can easily run it
in remote installations and even use it in CI/CD to catch possible
regressions.
      
      Why Cachegrind and not Callgrind, another part of Valgrind? It is simply
      empirically proven that profiling runs faster with Cachegrind.
      
      ## First results
First results have been obtained while testing the approach. Here is
an example.
      
      ```
      $ target/testnet/subsystem-bench --n-cores 10 --cache-misses data-availability-read
      $ cat cachegrind_report.txt
      I refs:        64,622,081,485
      I1  misses:         3,018,168
      LLi misses:           437,654
      I1  miss rate:           0.00%
      LLi miss rate:           0.00%
      
      D refs:        12,161,833,115  (9,868,356,364 rd   + 2,293,476,751 wr)
      D1  misses:       167,940,701  (   71,060,073 rd   +    96,880,628 wr)
      LLd misses:        33,550,018  (   16,685,853 rd   +    16,864,165 wr)
      D1  miss rate:            1.4% (          0.7%     +           4.2%  )
      LLd miss rate:            0.3% (          0.2%     +           0.7%  )
      
      LL refs:          170,958,869  (   74,078,241 rd   +    96,880,628 wr)
      LL misses:         33,987,672  (   17,123,507 rd   +    16,864,165 wr)
      LL miss rate:             0.0% (          0.0%     +           0.7%  )
      ```
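
As a sanity check, the headline miss rates follow directly from the
counts above; a quick verification, not part of the tool:

```rust
fn main() {
    // Figures copied from the Cachegrind report above.
    let d_refs: f64 = 12_161_833_115.0;
    let d1_misses: f64 = 167_940_701.0;
    let lld_misses: f64 = 33_550_018.0;
    println!("D1  miss rate: {:.1}%", 100.0 * d1_misses / d_refs); // ~1.4%
    println!("LLd miss rate: {:.1}%", 100.0 * lld_misses / d_refs); // ~0.3%
}
```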
      
The CLI output shows that the L1 data cache has a 1.4% miss rate, which
is not so bad given that the last-level cache held that data most of the
time, missing only 0.3% of the time. The L1 instruction cache misses
0.00% of the time. Looking at the output file with `cg_annotate` shows
that most of the misses occur during reed-solomon coding, which is
expected.
9. Jan 10, 2024
• add fallback request for req-response protocols (#2771) · f2a750ee
  Alin Dima authored
Previously, it was only possible to retry the same request on a
different protocol name that used the exact same binary payload.

Introduce a way of trying a different request on a different protocol if
the first one fails with an "unsupported protocol" error.

This helps with adding new req-response versions in Polkadot while
preserving compatibility with unupgraded nodes.
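
Conceptually, the fallback flow looks like this; a sketch with stand-in
types and protocol names, not the actual network-bridge implementation:

```rust
#[derive(Debug)]
enum RequestError {
    UnsupportedProtocol,
    Other(String),
}

/// Stand-in for a real network request on a named protocol.
fn send_request(protocol: &str, payload: &[u8]) -> Result<Vec<u8>, RequestError> {
    if protocol == "/req/v2" {
        // Emulate an unupgraded peer that doesn't speak the new version.
        Err(RequestError::UnsupportedProtocol)
    } else {
        Ok(payload.to_vec())
    }
}

/// Try the main (newer) protocol first; only an "unsupported protocol"
/// failure triggers the fallback, which may carry a *different* payload.
fn request_with_fallback(
    main: (&str, &[u8]),
    fallback: (&str, &[u8]),
) -> Result<Vec<u8>, RequestError> {
    match send_request(main.0, main.1) {
        Err(RequestError::UnsupportedProtocol) => send_request(fallback.0, fallback.1),
        other => other,
    }
}

fn main() {
    let res = request_with_fallback(("/req/v2", b"new-encoding"), ("/req/v1", b"old-encoding"));
    assert_eq!(res.unwrap(), b"old-encoding".to_vec());
}
```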
      
Previously, req-response protocols were bumped by bundling them with
some other notifications protocol upgrade, as was done for async backing
(but that is more complicated, especially if the feature does not
require any changes to a notifications protocol). This will be needed
for implementing https://github.com/polkadot-fellows/RFCs/pull/47
      
      TODO:
      - [x]  add tests
      - [x] add guidance docs in polkadot about req-response protocol
      versioning
10. Dec 14, 2023
• Introduce subsystem benchmarking tool (#2528) · 8a6e9ef1
  Andrei Sandu authored
      This tool makes it easy to run parachain consensus stress/performance
      testing on your development machine or in CI.
      
      ## Motivation
      The parachain consensus node implementation spans across many modules
      which we call subsystems. Each subsystem is responsible for a small part
      of logic of the parachain consensus pipeline, but in general the most
      load and performance issues are localized in just a few core subsystems
      like `availability-recovery`, `approval-voting` or
      `dispute-coordinator`. In the absence of such a tool, we would run large
      test nets to load/stress test these parts of the system. Setting up and
      making sense of the amount of data produced by such a large test is very
      expensive, hard to orchestrate and is a huge development time sink.
      
      ## PR contents
      - CLI tool 
      - Data Availability Read test
      - reusable mockups and components needed so far
      - Documentation on how to get started
      
      ### Data Availability Read test
      
An overseer is built using a real `availability-recovery` subsystem
instance, while dependent subsystems like `av-store`, `network-bridge`
and `runtime-api` are mocked. The network bridge emulates all the
network peers and their answers to requests.

The test runs for a number of blocks. For each block it sends a
`RecoverAvailableData` request for an arbitrary number of candidates. We
wait for the subsystem to respond to all requests before moving to the
next block. At the same time we collect the usual subsystem metrics and
task CPU metrics, and show some nice progress reports while running
(sketched below).
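
Schematically, the per-block test loop does something like the following;
everything here is a stand-in for the tool's real, concurrent code:

```rust
/// Stand-in for sending `RecoverAvailableData` to the subsystem and
/// awaiting the recovered data from the emulated network.
fn recover_available_data(candidate: usize) -> Result<Vec<u8>, String> {
    Ok(vec![0u8; candidate + 1])
}

fn main() {
    let num_blocks = 3;
    let candidates_per_block = 20;
    for block in 1..=num_blocks {
        println!("Current block {block}/{num_blocks}");
        // Issue one recovery per candidate and wait for all of them
        // before moving to the next block, while the real tool also
        // collects subsystem and task CPU metrics.
        for candidate in 0..candidates_per_block {
            recover_available_data(candidate).expect("recovery failed");
        }
    }
    println!("All blocks processed");
}
```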
      
### Here is what the CLI output looks like:
      
      ```
      [2023-11-28T13:06:27Z INFO  subsystem_bench::core::display] n_validators = 1000, n_cores = 20, pov_size = 5120 - 5120, error = 3, latency = Some(PeerLatency { min_latency: 1ms, max_latency: 100ms })
      [2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Generating template candidate index=0 pov_size=5242880
      [2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Created test environment.
      [2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Pre-generating 60 candidates.
      [2023-11-28T13:06:30Z INFO  subsystem-bench::core] Initializing network emulation for 1000 peers.
      [2023-11-28T13:06:30Z INFO  subsystem-bench::availability] Current block 1/3
      [2023-11-28T13:06:30Z INFO  substrate_prometheus_endpoint] ️ Prometheus exporter started at 127.0.0.1:9999
      [2023-11-28T13:06:30Z INFO  subsystem_bench::availability] 20 recoveries pending
      [2023-11-28T13:06:37Z INFO  subsystem_bench::availability] Block time 6262ms
      [2023-11-28T13:06:37Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
      [2023-11-28T13:06:37Z INFO  subsystem-bench::availability] Current block 2/3
      [2023-11-28T13:06:37Z INFO  subsystem_bench::availability] 20 recoveries pending
      [2023-11-28T13:06:43Z INFO  subsystem_bench::availability] Block time 6369ms
      [2023-11-28T13:06:43Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
      [2023-11-28T13:06:43Z INFO  subsystem-bench::availability] Current block 3/3
      [2023-11-28T13:06:43Z INFO  subsystem_bench::availability] 20 recoveries pending
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Block time 6194ms
      [2023-11-28T13:06:49Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] All blocks processed in 18829ms
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Throughput: 102400 KiB/block
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Block time: 6276 ms
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] 
          
          Total received from network: 415 MiB
          Total sent to network: 724 KiB
          Total subsystem CPU usage 24.00s
          CPU usage per block 8.00s
          Total test environment CPU usage 0.15s
          CPU usage per block 0.05s
      ```
      
      ### Prometheus/Grafana stack in action
      <img width="1246" alt="Screenshot 2023-11-28 at 15 11 10"
      src="https://github.com/paritytech/polkadot-sdk/assets/54316454/eaa47422-4a5e-4a3a-aaef-14ca644c1574">
      <img width="1246" alt="Screenshot 2023-11-28 at 15 12 01"
      src="https://github.com/paritytech/polkadot-sdk/assets/54316454/237329d6-1710-4c27-8f67-5fb11d7f66ea">
      <img width="1246" alt="Screenshot 2023-11-28 at 15 12 38"
      src="https://github.com/paritytech/polkadot-sdk/assets/54316454/a07119e8-c9f1-4810-a1b3-f1b7b01cf357
      
      ">
      
      ---------
      
Signed-off-by: Andrei Sandu <[email protected]>