1. Jan 10, 2024
  2. Dec 19, 2023
  3. Dec 18, 2023
  4. Dec 14, 2023
    • Introduce subsystem benchmarking tool (#2528) · 8a6e9ef1
      Andrei Sandu authored
      This tool makes it easy to run parachain consensus stress/performance
      testing on your development machine or in CI.
      
      ## Motivation
      The parachain consensus node implementation spans across many modules
      which we call subsystems. Each subsystem is responsible for a small part
      of logic of the parachain consensus pipeline, but in general the most
      load and performance issues are localized in just a few core subsystems
      like `availability-recovery`, `approval-voting` or
      `dispute-coordinator`. In the absence of such a tool, we would run large
      test nets to load/stress test these parts of the system. Setting up and
      making sense of the amount of data produced by such a large test is very
      expensive, hard to orchestrate and is a huge development time sink.
      
      ## PR contents
      - CLI tool 
      - Data Availability Read test
      - reusable mockups and components needed so far
      - Documentation on how to get started
      
      ### Data Availability Read test
      
      An overseer is built using a real `availability-recovery` subsystem
      instance, while dependent subsystems like `av-store`, `network-bridge`
      and `runtime-api` are mocked. The network bridge emulates all the
      network peers and their responses to requests.
      
      The test runs for a number of blocks. For each block it generates and
      sends a “RecoverAvailableData” request for an arbitrary number of
      candidates. We wait for the subsystem to respond to all requests before
      moving to the next block.
      At the same time we collect the usual subsystem metrics and task CPU
      metrics and show progress reports while running.
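
      The per-block loop described above can be sketched as follows. This is
      a minimal illustration using only the standard library, not the tool's
      actual code: `RecoveryRequest`, `run_benchmark` and the thread that
      stands in for the subsystem are all hypothetical names.

      ```rust
      use std::sync::mpsc;
      use std::thread;
      use std::time::{Duration, Instant};

      /// Hypothetical stand-in for a "RecoverAvailableData" request.
      struct RecoveryRequest {
          candidate_index: usize,
          response: mpsc::Sender<usize>,
      }

      /// Runs `n_blocks` blocks with `n_candidates` requests each and returns
      /// the time spent waiting for the responses of every block.
      fn run_benchmark(n_blocks: usize, n_candidates: usize, block_time: Duration) -> Vec<Duration> {
          let (to_subsystem, from_test) = mpsc::channel::<RecoveryRequest>();

          // Stand-in for the subsystem under test: answers every request.
          let subsystem = thread::spawn(move || {
              for request in from_test {
                  // A real subsystem would fetch and reconstruct the data here.
                  let _ = request.response.send(request.candidate_index);
              }
          });

          let mut block_times = Vec::with_capacity(n_blocks);
          for _block in 0..n_blocks {
              let started = Instant::now();
              let (response_tx, response_rx) = mpsc::channel();

              for candidate_index in 0..n_candidates {
                  to_subsystem
                      .send(RecoveryRequest { candidate_index, response: response_tx.clone() })
                      .expect("subsystem is running");
              }
              drop(response_tx);

              // Wait for all responses before moving on to the next block.
              let received = response_rx.iter().count();
              assert_eq!(received, n_candidates);

              let elapsed = started.elapsed();
              block_times.push(elapsed);
              // Sleep until the end of the block if recovery finished early.
              thread::sleep(block_time.saturating_sub(elapsed));
          }

          drop(to_subsystem); // closes the channel so the subsystem thread exits
          subsystem.join().expect("subsystem thread panicked");
          block_times
      }

      fn main() {
          let times = run_benchmark(3, 20, Duration::from_millis(10));
          println!("processed {} blocks: {:?}", times.len(), times);
      }
      ```

      Waiting on the response channel until every sender is dropped is what
      makes a slow subsystem show up directly as a longer block time.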
      
      ### Here is what the CLI looks like:
      
      ```
      [2023-11-28T13:06:27Z INFO  subsystem_bench::core::display] n_validators = 1000, n_cores = 20, pov_size = 5120 - 5120, error = 3, latency = Some(PeerLatency { min_latency: 1ms, max_latency: 100ms })
      [2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Generating template candidate index=0 pov_size=5242880
      [2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Created test environment.
      [2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Pre-generating 60 candidates.
      [2023-11-28T13:06:30Z INFO  subsystem-bench::core] Initializing network emulation for 1000 peers.
      [2023-11-28T13:06:30Z INFO  subsystem-bench::availability] Current block 1/3
      [2023-11-28T13:06:30Z INFO  substrate_prometheus_endpoint] ️ Prometheus exporter started at 127.0.0.1:9999
      [2023-11-28T13:06:30Z INFO  subsystem_bench::availability] 20 recoveries pending
      [2023-11-28T13:06:37Z INFO  subsystem_bench::availability] Block time 6262ms
      [2023-11-28T13:06:37Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
      [2023-11-28T13:06:37Z INFO  subsystem-bench::availability] Current block 2/3
      [2023-11-28T13:06:37Z INFO  subsystem_bench::availability] 20 recoveries pending
      [2023-11-28T13:06:43Z INFO  subsystem_bench::availability] Block time 6369ms
      [2023-11-28T13:06:43Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
      [2023-11-28T13:06:43Z INFO  subsystem-bench::availability] Current block 3/3
      [2023-11-28T13:06:43Z INFO  subsystem_bench::availability] 20 recoveries pending
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Block time 6194ms
      [2023-11-28T13:06:49Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] All blocks processed in 18829ms
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Throughput: 102400 KiB/block
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Block time: 6276 ms
      [2023-11-28T13:06:49Z INFO  subsystem_bench::availability] 
          
          Total received from network: 415 MiB
          Total sent to network: 724 KiB
          Total subsystem CPU usage 24.00s
          CPU usage per block 8.00s
          Total test environment CPU usage 0.15s
          CPU usage per block 0.05s
      ```
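
      The summary figures in the output above are consistent with simple
      per-block arithmetic. The sketch below reproduces them under the
      assumption that throughput is one PoV per core per block and that
      averages are taken over the three blocks with integer division; the
      function names are illustrative and not taken from the tool.

      ```rust
      // Figures copied from the log output above.
      const N_CORES: u64 = 20; // 20 recoveries pending per block
      const POV_SIZE_KIB: u64 = 5120; // pov_size=5242880 bytes
      const N_BLOCKS: u64 = 3;
      const TOTAL_TIME_MS: u64 = 18829;
      const TOTAL_SUBSYSTEM_CPU_S: f64 = 24.0;

      /// Data recovered per block, assuming one PoV per core.
      fn throughput_kib_per_block() -> u64 {
          N_CORES * POV_SIZE_KIB
      }

      /// Average block time, assuming integer division of the total time.
      fn average_block_time_ms() -> u64 {
          TOTAL_TIME_MS / N_BLOCKS
      }

      /// Subsystem CPU time averaged over the blocks.
      fn cpu_per_block_s() -> f64 {
          TOTAL_SUBSYSTEM_CPU_S / N_BLOCKS as f64
      }

      fn main() {
          println!("Throughput: {} KiB/block", throughput_kib_per_block());
          println!("Block time: {} ms", average_block_time_ms());
          println!("CPU usage per block {:.2}s", cpu_per_block_s());
      }
      ```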
      
      ### Prometheus/Grafana stack in action
      <img width="1246" alt="Screenshot 2023-11-28 at 15 11 10"
      src="https://github.com/paritytech/polkadot-sdk/assets/54316454/eaa47422-4a5e-4a3a-aaef-14ca644c1574">
      <img width="1246" alt="Screenshot 2023-11-28 at 15 12 01"
      src="https://github.com/paritytech/polkadot-sdk/assets/54316454/237329d6-1710-4c27-8f67-5fb11d7f66ea">
      <img width="1246" alt="Screenshot 2023-11-28 at 15 12 38"
      src="https://github.com/paritytech/polkadot-sdk/assets/54316454/a07119e8-c9f1-4810-a1b3-f1b7b01cf357">
      
      ---------
      
      Signed-off-by: Andrei Sandu <[email protected]>
  5. Dec 13, 2023
    • Set clippy lints in workspace (requires rust 1.74) (#2390) · be8e6268
      Squirrel authored
      
      
      We currently use a bit of a hack in `.cargo/config` to make sure that
      clippy isn't too annoying by specifying the list of lints.
      
      There is now a stable way to define lints for a workspace. The only
      downside is that every crate seems to have to opt into this, so there
      are a *few* files modified in this PR.
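
      A minimal sketch of what the workspace-level configuration can look
      like with the `[workspace.lints]` and `[lints]` tables that Cargo
      stabilised in Rust 1.74. The lint names and levels below are
      illustrative assumptions, not the list used in this PR.

      ```toml
      # Root Cargo.toml: lints are defined once for the whole workspace.
      [workspace.lints.clippy]
      # Hypothetical entries; the real list is the one moved out of `.cargo/config`.
      correctness = "warn"
      complexity = "warn"

      # Cargo.toml of each member crate: opt in to the workspace lints.
      [lints]
      workspace = true
      ```

      The per-crate `[lints]` table is the opt-in that causes every crate
      manifest to be touched.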
      
      Dependencies:
      
      - [x] PR that upgrades CI to use rust 1.74 is merged.
      
      ---------
      
      Co-authored-by: joe petrowski <[email protected]>
      Co-authored-by: Branislav Kontur <[email protected]>
      Co-authored-by: Liam Aharon <[email protected]>
    • Approve multiple candidates with a single signature (#1191) · a84dd0db
      Alexandru Gheorghe authored
      Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
      Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
      v0: https://github.com/paritytech/polkadot/pull/7554
      
      ## Overall idea
      
      When approval-voting checks a candidate and is ready to advertise the
      approval, defer it per relay chain block until we either have
      MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
      MAX_APPROVALS_COALESCE_TICKS in the queue. In both cases we sign the
      candidates we have available.
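
      The deferral rule above can be sketched as follows. This is a
      simplified illustration, not the approval-voting implementation:
      `PendingApprovals`, its fields and its methods are hypothetical names,
      candidates are plain integers, and the no-show safeguard is left out.

      ```rust
      /// Hypothetical queue of approvals waiting to be signed for one relay
      /// chain block.
      struct PendingApprovals {
          /// (candidate, tick at which it was queued)
          queue: Vec<(u32, u64)>,
          max_coalesce_count: usize,
          max_coalesce_ticks: u64,
      }

      impl PendingApprovals {
          fn new(max_coalesce_count: usize, max_coalesce_ticks: u64) -> Self {
              Self { queue: Vec::new(), max_coalesce_count, max_coalesce_ticks }
          }

          /// Queues a checked candidate and returns the batch to sign if one
          /// of the two limits has been reached.
          fn defer(&mut self, candidate: u32, now: u64) -> Option<Vec<u32>> {
              self.queue.push((candidate, now));
              self.take_if_ready(now)
          }

          /// Returns every queued candidate once the queue is full or any
          /// candidate has waited for `max_coalesce_ticks`.
          fn take_if_ready(&mut self, now: u64) -> Option<Vec<u32>> {
              let full = self.queue.len() >= self.max_coalesce_count;
              let expired = self
                  .queue
                  .iter()
                  .any(|&(_, queued_at)| now.saturating_sub(queued_at) >= self.max_coalesce_ticks);
              if !self.queue.is_empty() && (full || expired) {
                  Some(self.queue.drain(..).map(|(candidate, _)| candidate).collect())
              } else {
                  None
              }
          }
      }

      fn main() {
          let mut pending = PendingApprovals::new(3, 10);
          println!("{:?}", pending.defer(1, 0));
          println!("{:?}", pending.defer(2, 1));
          println!("{:?}", pending.defer(3, 2));
      }
      ```

      With a count of 1 and 0 ticks every candidate is signed as soon as it
      is queued, which corresponds to the uncoalesced behaviour.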
      
      This should allow us to reduce the number of approval messages we have
      to create/send/verify. The parameters are configurable, so we should
      find values that balance:
      
      - Security of the network: Delaying the broadcast of an approval
      shouldn't put finality at risk. To make sure that never happens, we
      won't delay sending a vote if we are past 2/3 of the no-show time.
      - Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
      MAX_APPROVALS_COALESCE_TICKS = 0 is what we have now, and we know from
      the measurements we did on versi that it bottlenecks
      approval-distribution/approval-voting when the number of validators
      and parachains increases significantly.
      - Block storage: In case of disputes we have to import these votes on
      chain, which increases the necessary storage by
      MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
      disputes are not the normal mode of operation of the network and we
      will limit MAX_APPROVAL_COALESCE_COUNT to single digits, this
      should be good enough. Alternatively, we could try to create a better
      way to store this on-chain through indirection, if that's needed.
      
      ## Other fixes:
      - Stopped sending random assignments to non-validators. Doing so was
      wrong because they won't do anything with the assignments and won't
      gossip them either, since they do not have a grid topology set, so the
      random assignments were wasted.
      - Added metrics to be able to debug potential no-shows and
      mis-processing of approvals/assignments.
      
      ## TODO:
      - [x] Get feedback, that this is moving in the right direction. @ordian
      @sandreim @eskimor @burdges, let me know what you think.
      - [x] More and more testing.
      - [x]  Test in versi.
      - [x] Make MAX_APPROVAL_COALESCE_COUNT &
      MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
      - [x] Make sure the backwards compatibility works correctly
      - [x] Make sure this direction is compatible with other streams of work:
      https://github.com/paritytech/polkadot-sdk/issues/635 &
      https://github.com/paritytech/polkadot-sdk/issues/742
      
      
      - [x] Final versi burn-in before merging
      
      ---------
      
      Signed-off-by: Alexandru Gheorghe <[email protected]>
  6. Dec 11, 2023
  7. Dec 06, 2023
  8. Dec 05, 2023
  9. Dec 01, 2023
  10. Nov 30, 2023
  11. Nov 28, 2023
    • Rework the event system of `sc-network` (#1370) · e71c484d
      Aaro Altonen authored
      This commit introduces a new concept called `NotificationService` which
      allows Polkadot protocols to communicate with the underlying
      notification protocol implementation directly, without routing events
      through `NetworkWorker`. This implies that each protocol has its own
      service which it uses to communicate with remote peers and that each
      `NotificationService` is unique with respect to the underlying
      notification protocol, meaning `NotificationService` for the transaction
      protocol can only be used to send and receive transaction-related
      notifications.
      
      The `NotificationService` concept introduces two additional benefits:
        * allow protocols to start using custom handshakes
        * allow protocols to accept/reject inbound peers
      
      Previously the validation of inbound connections was solely the
      responsibility of `ProtocolController`. This caused issues with light
      peers and `SyncingEngine` as `ProtocolController` would accept more
      peers than `SyncingEngine` could accept which caused peers to have
      differing views of their own states. `SyncingEngine` would reject excess
      peers but these rejections were not properly communicated to those peers
      causing them to assume that they were accepted.
      
      With `NotificationService`, the local handshake is not sent to the
      remote peer if the peer is rejected, which allows the peer to detect
      that it was rejected.
      
      This commit also deprecates the use of `NetworkEventStream` for all
      notification-related events. Going forward, only DHT events are
      provided through `NetworkEventStream`. If protocols wish to follow each
      other's events, they must introduce additional abstractions, as is done
      for the GRANDPA and transactions protocols, which follow the syncing
      protocol through `SyncEventStream`.
      
      Fixes https://github.com/paritytech/polkadot-sdk/issues/512
      Fixes https://github.com/paritytech/polkadot-sdk/issues/514
      Fixes https://github.com/paritytech/polkadot-sdk/issues/515
      Fixes https://github.com/paritytech/polkadot-sdk/issues/554
      Fixes https://github.com/paritytech/polkadot-sdk/issues/556
      
      ---
      These changes are transferred from
      https://github.com/paritytech/substrate/pull/14197 but there are no
      functional changes compared to that PR
      
      ---------
      
      Co-authored-by: Dmitry Markin <[email protected]>
      Co-authored-by: Alexandru Vasile <[email protected]>
  12. Nov 14, 2023
  13. Nov 09, 2023
    • Add descriptions to all published crates (#2029) · 48ea86f0
      Oliver Tale-Yazdi authored
      
      
      Missing descriptions (47):  
      
      - [x] `cumulus/client/collator/Cargo.toml`
      - [x] `cumulus/client/relay-chain-inprocess-interface/Cargo.toml`
      - [x] `cumulus/client/cli/Cargo.toml`
      - [x] `cumulus/client/service/Cargo.toml`
      - [x] `cumulus/client/relay-chain-rpc-interface/Cargo.toml`
      - [x] `cumulus/client/relay-chain-interface/Cargo.toml`
      - [x] `cumulus/client/relay-chain-minimal-node/Cargo.toml`
      - [x] `cumulus/parachains/pallets/parachain-info/Cargo.toml`
      - [x] `cumulus/parachains/pallets/ping/Cargo.toml`
      - [x] `cumulus/primitives/utility/Cargo.toml`
      - [x] `cumulus/primitives/aura/Cargo.toml`
      - [x] `cumulus/primitives/core/Cargo.toml`
      - [x] `cumulus/primitives/parachain-inherent/Cargo.toml`
      - [x] `cumulus/test/relay-sproof-builder/Cargo.toml`
      - [x] `cumulus/pallets/xcmp-queue/Cargo.toml`
      - [x] `cumulus/pallets/dmp-queue/Cargo.toml`
      - [x] `cumulus/pallets/xcm/Cargo.toml`
      - [x] `polkadot/erasure-coding/Cargo.toml`
      - [x] `polkadot/statement-table/Cargo.toml`
      - [x] `polkadot/primitives/Cargo.toml`
      - [x] `polkadot/rpc/Cargo.toml`
      - [x] `polkadot/node/service/Cargo.toml`
      - [x] `polkadot/node/core/parachains-inherent/Cargo.toml`
      - [x] `polkadot/node/core/approval-voting/Cargo.toml`
      - [x] `polkadot/node/core/dispute-coordinator/Cargo.toml`
      - [x] `polkadot/node/core/av-store/Cargo.toml`
      - [x] `polkadot/node/core/chain-api/Cargo.toml`
      - [x] `polkadot/node/core/prospective-parachains/Cargo.toml`
      - [x] `polkadot/node/core/backing/Cargo.toml`
      - [x] `polkadot/node/core/provisioner/Cargo.toml`
      - [x] `polkadot/node/core/runtime-api/Cargo.toml`
      - [x] `polkadot/node/core/bitfield-signing/Cargo.toml`
      - [x] `polkadot/node/network/dispute-distribution/Cargo.toml`
      - [x] `polkadot/node/network/bridge/Cargo.toml`
      - [x] `polkadot/node/network/collator-protocol/Cargo.toml`
      - [x] `polkadot/node/network/approval-distribution/Cargo.toml`
      - [x] `polkadot/node/network/availability-distribution/Cargo.toml`
      - [x] `polkadot/node/network/bitfield-distribution/Cargo.toml`
      - [x] `polkadot/node/network/gossip-support/Cargo.toml`
      - [x] `polkadot/node/network/availability-recovery/Cargo.toml`
      - [x] `polkadot/node/collation-generation/Cargo.toml`
      - [x] `polkadot/node/overseer/Cargo.toml`
      - [x] `polkadot/runtime/parachains/Cargo.toml`
      - [x] `polkadot/runtime/common/slot_range_helper/Cargo.toml`
      - [x] `polkadot/runtime/metrics/Cargo.toml`
      - [x] `polkadot/xcm/pallet-xcm-benchmarks/Cargo.toml`
      - [x] `polkadot/utils/generate-bags/Cargo.toml`
      - [x] `substrate/bin/minimal/runtime/Cargo.toml`
      
      ---------
      
      Signed-off-by: Oliver Tale-Yazdi <[email protected]>
      Signed-off-by: alindima <[email protected]>
      Co-authored-by: ordian <[email protected]>
      Co-authored-by: Tsvetomir Dimitrov <[email protected]>
      Co-authored-by: Marcin S <[email protected]>
      Co-authored-by: alindima <[email protected]>
      Co-authored-by: Sebastian Kunert <[email protected]>
      Co-authored-by: Dmitry Markin <[email protected]>
      Co-authored-by: joe petrowski <[email protected]>
      Co-authored-by: Liam Aharon <[email protected]>
  14. Nov 06, 2023
  15. Oct 27, 2023
    • make polkadot die graciously (#2056) · 3069b0af
      Alexandru Gheorghe authored
      
      
      While investigating some db migrations that make the node startup fail,
      I noticed that the node wasn't exiting and that the log file kept
      growing until my whole system froze, which makes it really hard to
      find out why it was failing in the first place.
      
      E.g:
      ```
       ls -lh /tmp/zombie-01a04c2a2c0265d85f6440cf01c0f44a_-51319-uyggzuD4wEpV/bob.log
       32,6G oct 27 11:16 /tmp/zombie-01a04c2a2c0265d85f6440cf01c0f44a_-51319-uyggzuD4wEpV/bob.log
      ```
      
      This was happening because the following errors were being printed
      continuously without the subsystem main loop exiting:
      
      From dispute-coordinator:
      ```
      WARN tokio-runtime-worker parachain::dispute-coordinator: error=Subsystem(Generated(Context("Signal channel is terminated and empty.")))
      ```
      
      From availability recovery:
      ```
      Erasure task channel closed. Node shutting down ?
      ```
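
      The general shape of the fix can be sketched as a main loop that
      returns once its channel is closed instead of logging the same error
      forever. This is a standard-library illustration only; `run_subsystem`
      and the string signals are hypothetical and do not mirror the
      subsystem code.

      ```rust
      use std::sync::mpsc::{self, Receiver};

      /// Hypothetical subsystem main loop. Returns the number of signals
      /// handled before the channel was closed.
      fn run_subsystem(signals: Receiver<&'static str>) -> usize {
          let mut handled = 0;
          loop {
              match signals.recv() {
                  Ok(_signal) => handled += 1,
                  // All senders are gone, so no further signal can arrive.
                  // Log once and leave the loop.
                  Err(_) => {
                      eprintln!("signal channel is terminated and empty, exiting");
                      return handled;
                  }
              }
          }
      }

      fn main() {
          let (sender, receiver) = mpsc::channel();
          sender.send("active leaves update").unwrap();
          sender.send("block finalized").unwrap();
          drop(sender); // simulates the rest of the node shutting down
          println!("handled {} signals before exiting", run_subsystem(receiver));
      }
      ```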
      
      Signed-off-by: Alexandru Gheorghe <[email protected]>
  16. Oct 23, 2023
  17. Oct 21, 2023
  18. Oct 19, 2023
    • Do not force collators to update after enabling async backing (#1920) · b967ba53
      Bastian Köcher authored
      The validators check whether async backing is enabled by checking the
      version of the runtime api. If the runtime api is upgraded by a runtime
      upgrade, the validators also start to enable the async backing logic.
      However, just because async backing is enabled, it doesn't mean that all
      collators and parachain runtimes have upgraded. This pull request fixes
      an issue with advertising collations to the relay chain when it has
      async backing enabled but the collator is still using the old
      networking protocol. The implementation is actually backwards
      compatible, as we cannot expect everyone to upgrade right away.
      However, the collation advertisement logic was requiring V2 networking
      messages after async backing was enabled, which was wrong. This pull
      request fixes that.
      
      Closes: https://github.com/paritytech/polkadot-sdk/issues/1923
      
      
      
      ---------
      
      Co-authored-by: eskimor <[email protected]>
  19. Oct 12, 2023
    • Fix links to implementers' guide (#1865) · d2fc1d7c
      Anton Vilhelm Ásgeirsson authored
      # Description
      
      In a couple of cases, there were links pointing to the w3f github pages
      domain. In other instances, there were links pointing to the old
      polkadot repo's github pages. Both of these are now pointing to the
      relevant links in
      https://paritytech.github.io/polkadot-sdk/book/index.html.
      
      These changes were made specifically because the w3f github pages
      return a 404. While fixing the links, the old polkadot repo links
      were touched up as well, even though they do redirect properly.
      
      This shouldn't affect anything as these are documentation link changes
      only.
  20. Sep 27, 2023
  21. Sep 20, 2023
    • Refactor availability-recovery strategies (#1457) · 6f00edbc
      Alin Dima authored
      Refactors availability-recovery strategies to allow for easily adding
      new hotpaths and failover mechanisms.
      
      The new interface allows for chaining multiple `RecoveryStrategy`-es
      together, to cleanly express the relationship between them and share
      state and code where necessary/possible.
      
      This was done in order to aid in implementing new hotpaths like
      [systematic chunks
      recovery](https://github.com/paritytech/polkadot-sdk/issues/598) and
      [fetching from approval
      checkers](https://github.com/paritytech/polkadot-sdk/issues/575).
      
      Thanks to this design, intermediate state can be shared between the
      strategies. For example, if the systematic chunks recovery retrieved
      fewer chunks than needed, they are passed over to the next
      FetchChunks strategy, which only needs to recover the remaining
      chunks.
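
      The chaining idea can be sketched as below. The trait name comes from
      the description above, but its methods, the `State` type, the
      `FetchFrom` strategy and the `recover` driver are simplified
      assumptions and do not mirror the real subsystem types.

      ```rust
      /// Hypothetical state shared by all strategies in a chain.
      #[derive(Default)]
      struct State {
          received_chunks: Vec<u32>,
      }

      trait RecoveryStrategy {
          fn display_name(&self) -> &'static str;
          /// Tries to collect chunks; returns true once `needed` are available.
          fn run(&mut self, state: &mut State, needed: usize) -> bool;
      }

      /// Hypothetical strategy that can only reach a subset of the chunks.
      struct FetchFrom {
          name: &'static str,
          reachable: Vec<u32>,
      }

      impl RecoveryStrategy for FetchFrom {
          fn display_name(&self) -> &'static str {
              self.name
          }

          fn run(&mut self, state: &mut State, needed: usize) -> bool {
              for &chunk in &self.reachable {
                  if state.received_chunks.len() >= needed {
                      break;
                  }
                  // Chunks fetched by an earlier strategy are not fetched again.
                  if !state.received_chunks.contains(&chunk) {
                      state.received_chunks.push(chunk);
                  }
              }
              state.received_chunks.len() >= needed
          }
      }

      /// Runs the strategies in order and returns the name of the one that
      /// completed the recovery, if any.
      fn recover(strategies: Vec<Box<dyn RecoveryStrategy>>, needed: usize) -> Option<&'static str> {
          let mut state = State::default();
          for mut strategy in strategies {
              if strategy.run(&mut state, needed) {
                  return Some(strategy.display_name());
              }
          }
          None
      }

      fn main() {
          let chain: Vec<Box<dyn RecoveryStrategy>> = vec![
              Box::new(FetchFrom { name: "systematic chunks", reachable: vec![0, 1] }),
              Box::new(FetchFrom { name: "regular chunks", reachable: vec![1, 2, 3, 4] }),
          ];
          println!("{:?}", recover(chain, 4));
      }
      ```

      Because `State` outlives each strategy, the second strategy in the
      example only has to fetch the two chunks the first one could not reach.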
      
      Draft example of how a systematic chunk recovery strategy would look:
      https://github.com/paritytech/polkadot-sdk/commit/667d870bdf1470525d66c13929d5eac7249dd995
      (notice how easy it was to add and reuse code)
      
      Note that this PR doesn't itself add any new strategy. It should fully
      preserve backwards compatibility in terms of functionality. Follow-up
      PRs to add new strategies will come.
  22. Sep 18, 2023
    • Revert #1409 partially (#1603) · 122086d3
      Vsevolod Stakhov authored
      The futures channels that are used by default have a side effect:
      `Sender::clone` effectively increases the capacity of the bounded
      channel by one. This PR fixes the undesired backpressure removal that
      was caused by #1409. The issue was discovered by @sandreim during Versi
      testing and should be treated as critical: #1409 should not be
      included in any release without this reversion.
      
      This PR restores the original behaviour.
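
      The capacity effect can be stated as a one-line model. The `futures`
      documentation describes the capacity of a bounded
      `futures::channel::mpsc` channel as the buffer size plus the number of
      senders; the helper below is only an illustrative model of that rule
      (the function name is made up) and does not use the `futures` crate.

      ```rust
      /// Model of the capacity rule: the configured buffer plus one
      /// guaranteed slot for every live `Sender`.
      fn effective_capacity(buffer: usize, live_senders: usize) -> usize {
          buffer + live_senders
      }

      fn main() {
          let buffer = 8;
          // A single sender already adds one slot on top of the buffer.
          println!("1 sender:   {}", effective_capacity(buffer, 1));
          // Cloning the sender for 100 parallel tasks adds 100 more slots,
          // which weakens the backpressure the bound was meant to provide.
          println!("101 senders: {}", effective_capacity(buffer, 101));
      }
      ```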
  23. Sep 11, 2023
    • Allow to broadcast network messages in parallel (#1409) · 44dbb739
      Vsevolod Stakhov authored
      This PR addresses multiple issues pending:
      
      * [x] Update orchestra to the recent version and test how the node
      performs
      * [x] Add some useful metrics for outbound network bridge
      * [x] Try to send incoming network requests to all subsystems without
      blocking on some particular subsystem in that loop
      * [x] Fix all incompatibilities between orchestra and polkadot code
      (e.g. malus node)
  24. Sep 08, 2023
  25. Sep 07, 2023
    • polkadot: pin one block per session (#1220) · 15503883
      ordian authored
      * polkadot: propagate UnpinHandle to ActiveLeafUpdate
      
      Also extract the leaf creation for tests
      into a common function.
      
      * dispute-coordinator: try pinned blocks for slashing
      
      * apparently 1.72 is smarter than 1.70
      
      * address nits
      
      * rename fresh_leaf to new_leaf
  26. Sep 05, 2023
  27. Sep 01, 2023
  28. Aug 31, 2023
    • Rename `polkadot-parachain` to `polkadot-parachain-primitives` (#1334) · a33d7922
      Bastian Köcher authored
      * Rename `polkadot-parachain` to `polkadot-parachain-primitives`
      
      While doing this it also fixes some last `rustdoc` issues and fixes
      another Cargo warning related to `pallet-paged-list`.
      
      * Fix compilation
      
      * ".git/.scripts/commands/fmt/fmt.sh"
      
      * Fix XCM docs
      
      ---------
      
      Co-authored-by: command-bot <>
    • backing: move the min votes threshold to the runtime (#1200) · d6af073a
      Alin Dima authored
      
      
      * move min backing votes const to runtime
      
      also cache it per-session in the backing subsystem
      
      Signed-off-by: alindima <[email protected]>
      
      * add runtime migration
      
      * introduce api versioning for min_backing votes
      
      also enable it for rococo/versi for testing
      
      * also add min_backing_votes runtime calls to statement-distribution
      
      this dependency has been recently introduced by async backing
      
      * remove explicit version runtime API call
      
      this is not needed, as the RuntimeAPISubsystem already takes care
      of versioning and will return NotSupported if the version is not
      right.
      
      * address review comments
      
      - parametrise backing votes runtime API with session index
      - remove RuntimeInfo usage in backing subsystem, as runtime API
      caches the min backing votes by session index anyway.
      - move the logic for adjusting the configured needed backing votes with the size of the backing group
      to a primitives helper.
      - move the legacy min backing votes value to a primitives helper.
      - mark JoinMultiple error as fatal, since the Canceled (non-multiple) counterpart is also fatal.
      - make backing subsystem handle fatal errors for new leaves update.
      - add HostConfiguration consistency check for zeroed backing votes threshold
      - add cumulus accompanying change
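
      The adjustment of the configured threshold to the size of the backing
      group can be sketched as below. This assumes the adjustment is a
      simple cap at the group size; the function name and signature are
      assumptions for illustration and may differ from the primitives helper.

      ```rust
      /// Assumed rule: a group cannot be asked for more votes than it has
      /// members, so the configured minimum is capped at the group size.
      fn effective_minimum_backing_votes(group_len: usize, configured_minimum: u32) -> usize {
          std::cmp::min(group_len, configured_minimum as usize)
      }

      fn main() {
          // A group of 5 validators with a configured minimum of 2 needs 2 votes.
          println!("{}", effective_minimum_backing_votes(5, 2));
          // A group with a single validator can provide at most 1 vote.
          println!("{}", effective_minimum_backing_votes(1, 2));
      }
      ```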
      
      * fix cumulus test compilation
      
      * fix tests
      
      * more small fixes
      
      * fix merge
      
      * bump runtime api version for westend and rollback version for rococo
      
      ---------
      
      Signed-off-by: alindima <[email protected]>
      Co-authored-by: Javier Viola <[email protected]>
  29. Aug 30, 2023
  30. Aug 29, 2023
  31. Aug 28, 2023
  32. Aug 25, 2023