1. Apr 19, 2024
  2. Apr 18, 2024
  3. Apr 16, 2024
  4. Apr 12, 2024
  5. Apr 08, 2024
  6. Apr 05, 2024
  7. Apr 02, 2024
  8. Apr 01, 2024
  9. Mar 28, 2024
  10. Mar 26, 2024
    • ordian's avatar
      fix regression in approval-voting introduced in #3747 (#3831) · 3fc5b826
      ordian authored
      
      
      Fixes #3826.
      
      The docs on the `candidates` field of `BlockEntry` were incorrectly
      stating that they are sorted by core index. The (incorrect) optimization
      was introduced in #3747 based on this assumption. The actual ordering is
      based on `CandidateIncluded` events ordering in the runtime. We revert
      this optimization here.
      
      - [x] verify the underlying issue
      - [x] add a regression test
      
      ---------
      
      Co-authored-by: default avatarBastian Köcher <[email protected]>
      3fc5b826
    • Dcompoze's avatar
      Fix spelling mistakes across the whole repository (#3808) · 002d9260
      Dcompoze authored
      **Update:** Pushed additional changes based on the review comments.
      
      **This pull request fixes various spelling mistakes in this
      repository.**
      
      Most of the changes are contained in the first **3** commits:
      
      - `Fix spelling mistakes in comments and docs`
      
      - `Fix spelling mistakes in test names`
      
      - `Fix spelling mistakes in error messages, panic messages, logs and
      tracing`
      
      Other source code spelling mistakes are separated into individual
      commits for easier reviewing:
      
      - `Fix the spelling of 'authority'`
      
      - `Fix the spelling of 'REASONABLE_HEADERS_IN_JUSTIFICATION_ANCESTRY'`
      
      - `Fix the spelling of 'prev_enqueud_messages'`
      
      - `Fix the spelling of 'endpoint'`
      
      - `Fix the spelling of 'children'`
      
      - `Fix the spelling of 'PenpalSiblingSovereignAccount'`
      
      - `Fix the spelling of 'PenpalSudoAccount'`
      
      - `Fix the spelling of 'insufficient'`
      
      - `Fix the spelling of 'PalletXcmExtrinsicsBenchmark'`
      
      - `Fix the spelling of 'subtracted'`
      
      - `Fix the spelling of 'CandidatePendingAvailability'`
      
      - `Fix the spelling of 'exclusive'`
      
      - `Fix the spelling of 'until'`
      
      - `Fix the spelling of 'discriminator'`
      
      - `Fix the spelling of 'nonexistent'`
      
      - `Fix the spelling of 'subsystem'`
      
      - `Fix the spelling of 'indices'`
      
      - `Fix the spelling of 'committed'`
      
      - `Fix the spelling of 'topology'`
      
      - `Fix the spelling of 'response'`
      
      - `Fix the spelling of 'beneficiary'`
      
      - `Fix the spelling of 'formatted'`
      
      - `Fix the spelling of 'UNKNOWN_PROOF_REQUEST'`
      
      - `Fix the spelling of 'succeeded'`
      
      - `Fix the spelling of 'reopened'`
      
      - `Fix the spelling of 'proposer'`
      
      - `Fix the spelling of 'InstantiationNonce'`
      
      - `Fix the spelling of 'depositor'`
      
      - `Fix the spelling of 'expiration'`
      
      - `Fix the spelling of 'phantom'`
      
      - `Fix the spelling of 'AggregatedKeyValue'`
      
      - `Fix the spelling of 'randomness'`
      
      - `Fix the spelling of 'defendant'`
      
      - `Fix the spelling of 'AquaticMammal'`
      
      - `Fix the spelling of 'transactions'`
      
      - `Fix the spelling of 'PassingTracingSubscriber'`
      
      - `Fix the spelling of 'TxSignaturePayload'`
      
      - `Fix the spelling of 'versioning'`
      
      - `Fix the spelling of 'descendant'`
      
      - `Fix the spelling of 'overridden'`
      
      - `Fix the spelling of 'network'`
      
      Let me know if this structure is adequate.
      
      **Note:** The usage of the words `Merkle`, `Merkelize`, `Merklization`,
      `Merkelization`, `Merkleization`, is somewhat inconsistent but I left it
      as it is.
      
      ~~**Note:** In some places the term `Receival` is used to refer to
      message reception, IMO `Reception` is the correct word here, but I left
      it as it is.~~
      
      ~~**Note:** In some places the term `Overlayed` is used instead of the
      more acceptable version `Overlaid` but I also left it as it is.~~
      
      ~~**Note:** In some places the term `Applyable` is used instead of the
      correct version `Applicable` but I also left it as it is.~~
      
      **Note:** Some usage of British vs American english e.g. `judgement` vs
      `judgment`, `initialise` vs `initialize`, `optimise` vs `optimize` etc.
      are both present in different places, but I suppose that's
      understandable given the number of contributors.
      
      ~~**Note:** There is a spelling mistake in `.github/CODEOWNERS` but it
      triggers errors in CI when I make changes to it, so I left it as it
      is.~~
      002d9260
  11. Mar 25, 2024
  12. Mar 21, 2024
  13. Mar 20, 2024
  14. Mar 15, 2024
    • ordian's avatar
      collator protocol changes for elastic scaling (validator side) (#3302) · 02e1a7f4
      ordian authored
      Fixes #3128.
      
      This introduces a new variant for the collation response from the
      collator that includes the parent head data. For now, collators won't
      send this new variant. We'll need to change the collator side of the
      collator protocol to detect all the cores assigned to a para and send
      the parent head data in the case when it's more than 1 core.
      
      - [x] validate approach
      - [x] check head data hash
      02e1a7f4
  15. Mar 12, 2024
    • Alexandru Gheorghe's avatar
      Add api-name in `cannot query the runtime API version` warning (#3653) · 1ead5977
      Alexandru Gheorghe authored
      
      
      Sometimes we see nodes printing this warning:
      ```
      cannot query the runtime API version: Api called for an unknown Block: State already discarded for
      ```
      
      The log is harmless, but let's print the api we got this for, so that we
      can track its call site and truly confirm it is harmless or fix it.
      
      Signed-off-by: default avatarAlexandru Gheorghe <[email protected]>
      1ead5977
    • Koute's avatar
      Add a PolkaVM-based executor (#3458) · b0f34e4b
      Koute authored
      This PR adds a new PolkaVM-based executor to Substrate.
      
      - The executor can now be used to actually run a PolkaVM-based runtime,
      and successfully produces blocks.
      - The executor is always compiled-in, but is disabled by default.
      - The `SUBSTRATE_ENABLE_POLKAVM` environment variable must be set to `1`
      to enable the executor, in which case the node will accept both WASM and
      PolkaVM program blobs (otherwise it'll default to WASM-only). This is
      deliberately undocumented and not explicitly exposed anywhere (e.g. in
      the command line arguments, or in the API) to disincentivize anyone from
      enabling it in production. If/when we'll move this into production usage
      I'll remove the environment variable and do it "properly".
      - I did not use our legacy runtime allocator for the PolkaVM executor,
      so currently every allocation inside of the runtime will leak guest
      memory until that particular instance is destroyed. The idea here is
      that I will work on the https://github.com/polkadot-fellows/RFCs/pull/4
      which will remove the need for the legacy allocator under WASM, and that
      will also allow us to use a proper non-leaking allocator under PolkaVM.
      - I also did some minor cleanups of the WASM executor and deleted some
      dead code.
      
      No prdocs included since this is not intended to be an end-user feature,
      but an unofficial experiment, and shouldn't affect any current
      production user. Once this is production-ready a full Polkadot
      Fellowship RFC will be necessary anyway.
      b0f34e4b
  16. Mar 08, 2024
  17. Mar 07, 2024
  18. Mar 01, 2024
    • Alin Dima's avatar
      provisioner: allow multiple cores assigned to the same para (#3233) · 62b78a16
      Alin Dima authored
      https://github.com/paritytech/polkadot-sdk/issues/3130
      
      builds on top of https://github.com/paritytech/polkadot-sdk/pull/3160
      
      Processes the availability cores and builds a record of how many
      candidates it should request from prospective-parachains and their
      predecessors.
      Tries to supply as many candidates as the runtime can back. Note that
      the runtime changes to back multiple candidates per para are not yet
      done, but this paves the way for it.
      
      The following backing/inclusion policy is assumed:
      1. the runtime will never back candidates of the same para which don't
      form a chain with the already backed candidates. Even if the others are
      still pending availability. We're optimistic that they won't time out
      and we don't want to back parachain forks (as the complexity would be
      huge).
      2. if a candidate is timed out of the core before being included, all of
      its successors occupying a core will be evicted.
      3. only the candidates which are made available and form a chain
      starting from the on-chain para head may be included/enacted and cleared
      from the cores. In other words, if para head is at A and the cores are
      occupied by B->C->D, and B and D are made available, only B will be
      included and its core cleared. C and D will remain on the cores awaiting
      for C to be made available or timed out. As point (2) above already
      says, if C is timed out, D will also be dropped.
      4. The runtime will deduplicate candidates which form a cycle. For
      example if the provisioner supplies candidates A->B->A, the runtime will
      only back A (as the state output will be the same)
      
      Note that if a candidate is timed out, we don't guarantee that in the
      next relay chain block the block author will be able to fill all of the
      timed out cores of the para. That increases complexity by a lot.
      Instead, the provisioner will supply N candidates where N is the number
      of candidates timed out, but doesn't include their successors which will
      be also deleted by the runtime. This'll be backfilled in the next relay
      chain block.
      
      Adjacent changes:
      - Also fixes: https://github.com/paritytech/polkadot-sdk/issues/3141
      - For non prospective-parachains, don't supply multiple candidates per
      para (we can't have elastic scaling without prospective parachains
      enabled). paras_inherent should already sanitise this input but it's
      more efficient this way.
      
      Note: all of these changes are backwards-compatible with the
      non-elastic-scaling scenario (one core per para).
      62b78a16
  19. Feb 29, 2024
  20. Feb 28, 2024
  21. Feb 23, 2024
  22. Feb 22, 2024
  23. Feb 20, 2024
    • Oliver Tale-Yazdi's avatar
      Lift dependencies to the workspace (Part 2/x) (#3366) · e89d0fca
      Oliver Tale-Yazdi authored
      
      
      Lifting some more dependencies to the workspace. Just using the
      most-often updated ones for now.
      It can be reproduced locally.
      
      ```sh
      # First you can check if there would be semver incompatible bumps (looks good in this case):
      $ zepter transpose dependency lift-to-workspace --ignore-errors syn quote thiserror "regex:^serde.*"
      
      # Then apply the changes:
      $ zepter transpose dependency lift-to-workspace --version-resolver=highest syn quote thiserror "regex:^serde.*" --fix
      
      # And format the changes:
      $ taplo format --config .config/taplo.toml
      ```
      
      ---------
      
      Signed-off-by: default avatarOliver Tale-Yazdi <[email protected]>
      e89d0fca
    • Bastian Köcher's avatar
      Downgrade log message to `trace` (#3405) · ef6ac94f
      Bastian Köcher authored
      This spams logs in `Debug` with no useful information.
      ef6ac94f
  24. Feb 17, 2024
  25. Feb 12, 2024
  26. Feb 11, 2024
  27. Feb 06, 2024
  28. Feb 05, 2024
    • Alexandru Gheorghe's avatar
      Introduce approval-voting/distribution benchmark (#2621) · f9f88688
      Alexandru Gheorghe authored
      ## Summary
      Built on top of the tooling and ideas introduced in
      https://github.com/paritytech/polkadot-sdk/pull/2528, this PR introduces
      a synthetic benchmark for measuring and assessing the performance
      characteristics of the approval-voting and approval-distribution
      subsystems.
      
      Currently this allows, us to simulate the behaviours of these systems
      based on the following dimensions:
      ```
      TestConfiguration:
      # Test 1
      - objective: !ApprovalsTest
          last_considered_tranche: 89
          min_coalesce: 1
          max_coalesce: 6
          enable_assignments_v2: true
          send_till_tranche: 60
          stop_when_approved: false
          coalesce_tranche_diff: 12
          workdir_prefix: "/tmp"
          num_no_shows_per_candidate: 0
          approval_distribution_expected_tof: 6.0
          approval_distribution_cpu_ms: 3.0
          approval_voting_cpu_ms: 4.30
        n_validators: 500
        n_cores: 100
        n_included_candidates: 100
        min_pov_size: 1120
        max_pov_size: 5120
        peer_bandwidth: 524288000000
        bandwidth: 524288000000
        latency:
          min_latency:
            secs: 0
            nanos: 1000000
          max_latency:
            secs: 0
            nanos: 100000000
        error: 0
        num_blocks: 10
      ```
      
      ## The approach
      1. We build a real overseer with the real implementations for
      approval-voting and approval-distribution subsystems.
      2. For a given network size, for each validator we pre-computed all
      potential assignments and approvals it would send, because this a
      computation heavy operation this will be cached on a file on disk and be
      re-used if the generation parameters don't change.
      3. The messages will be sent accordingly to the configured parameters
      and those are split into 3 main benchmarking scenarios.
      
      ## Benchmarking scenarios
      
      ### Best case scenario *approvals_throughput_best_case.yaml*
      It send to the approval-distribution only the minimum required tranche
      to gathered the needed_approvals, so that a candidate is approved.
      
      ### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
      It sends the tranche needed to approve a candidate when we have a
      maximum of *num_no_shows_per_candidate* tranches with no-shows for each
      candidate.
      
      ### Maximum throughput *approvals_throughput.yaml*
      It sends all the tranches for each block and measures the used CPU and
      necessary network bandwidth. by the approval-voting and
      approval-distribution subsystem.
      
      ## How to run it
      ```
      cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
      ```
      
      ## Evaluating performance
      ### Use the real subsystems metrics
      If you follow the steps in
      https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
      for installing locally prometheus and grafana, all real metrics for the
      `approval-distribution`, `approval-voting` and overseer are available.
      E.g:
      <img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">
      
      <img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">
      
      <img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">
      
      <img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">
      
      ### Profile with pyroscope
      1. Setup pyroscope following the steps in
      https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
      then run any of the benchmark scenario with `--profile` as the
      arguments.
      2. Open the pyroscope dashboard in grafana, e.g:
      <img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
      src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">
      
      
      
      ### Useful  logs
      1. Network bandwidth requirements:
      ```
      Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
      Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
      ```
      
      2. Cpu usage by the approval-distribution/approval-voting subsystems.
      ```
      approval-distribution CPU usage 84.061s
      approval-distribution CPU usage per block 8.406s
      approval-voting CPU usage 96.532s
      approval-voting CPU usage per block 9.653s
      ```
      
      3. Time passed until a given block is approved
      ```
       Chain selection approved  after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
      Chain selection approved  after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
      ```
      
      ### Using benchmark to quantify improvements from
      https://github.com/paritytech/polkadot-sdk/pull/1178 +
      https://github.com/paritytech/polkadot-sdk/pull/1191
      
      Using a versi-node we compare the scenarios where all new optimisations
      are disabled with a scenarios where tranche0 assignments are sent in a
      single message and a conservative simulation where the coalescing of
      approvals gives us just 50% reduction in the number of messages we send.
      
      Overall, what we see is a speedup of around 30-40% in the time it takes
      to process the necessary messages and a 30-40% reduction in the
      necessary bandwidth.
      
      #### Best case scenario comparison(minimum required tranches sent).
      Unoptimised
      ```
          Number of blocks: 10
          Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
          Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
          approval-distribution CPU usage 6.732s
          approval-distribution CPU usage per block 0.673s
          approval-voting CPU usage 9.523s
          approval-voting CPU usage per block 0.952s
      ```
      
      vs Optimisation enabled
      ```
         Number of blocks: 10
         Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
         Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
         approval-distribution CPU usage 4.658s
         approval-distribution CPU usage per block 0.466s
         approval-voting CPU usage 6.236s
         approval-voting CPU usage per block 0.624s
      ```
      
      #### Worst case all tranches sent, very unlikely happens when sharding
      breaks.
      
      Unoptimised
      ```
         Number of blocks: 10
         Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
         Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
         approval-distribution CPU usage 118.681s
         approval-distribution CPU usage per block 11.868s
         approval-voting CPU usage 124.118s
         approval-voting CPU usage per block 12.412s
      ```
      
      vs optimised
      ```
          Number of blocks: 10
          Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
          Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
          approval-distribution CPU usage 84.061s
          approval-distribution CPU usage per block 8.406s
          approval-voting CPU usage 96.532s
          approval-voting CPU usage per block 9.653s
      ```
      
      
      ## TODOs
      [x] Polish implementation.
      [x] Use what we have so far to evaluate
      https://github.com/paritytech/polkadot-sdk/pull/1191
      
       before merging.
      [x] List of features and additional dimensions we want to use for
      benchmarking.
      [x] Run benchmark on hardware similar with versi and kusama nodes.
      [ ] Add benchmark to be run in CI for catching regression in
      performance.
      [ ] Rebase on latest changes for network emulation.
      
      ---------
      
      Signed-off-by: default avatarAndrei Sandu <[email protected]>
      Signed-off-by: default avatarAlexandru Gheorghe <[email protected]>
      Co-authored-by: default avatarAndrei Sandu <[email protected]>
      Co-authored-by: default avatarAndrei Sandu <[email protected]>
      f9f88688
  29. Feb 02, 2024
  30. Jan 29, 2024
    • s0me0ne-unkn0wn's avatar
      Do not run unneeded subsystems on collator and its alongside node (#3061) · 3e8139e7
      s0me0ne-unkn0wn authored
      Currently, collators and their alongside nodes spin up a full-scale
      overseer running a bunch of subsystems that are not needed if the node
      is not a validator. That was considered to be harmless; however, we've
      got problems with unused subsystems getting stalled for a reason not
      currently known, resulting in the overseer exiting and bringing down the
      whole node.
      
      This PR aims to only run needed subsystems on such nodes, replacing the
      rest with `DummySubsystem`.
      
      It also enables collator-optimized availability recovery subsystem
      implementation.
      
      Partially solves #1730.
      3e8139e7
  31. Jan 26, 2024