- Jan 13, 2025
Alexandru Gheorghe authored
Reference hardware requirements have been bumped to at least 8 cores so we can now allocate 50% of that capacity to PVF execution. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
- Nov 11, 2024
Nazar Mokrynskyi authored
# Description

This seems to be an old artifact of the long-closed https://github.com/paritytech/substrate/issues/6827 that I noticed when working on related code earlier.

## Integration

`NetworkStarter` was removed; simply remove its usage:
```diff
-let (network, system_rpc_tx, tx_handler_controller, start_network, sync_service) =
+let (network, system_rpc_tx, tx_handler_controller, sync_service) =
     build_network(BuildNetworkParams {
...
-start_network.start_network();
```

## Review Notes

The changes are trivial; the only reason for this not to be accepted is if it is desired to not start the network automatically for whatever reason, in which case the description of the network starter needs to change.

# Checklist

* [x] My PR includes a detailed description as outlined in the "Description" and its two subsections above.
* [ ] My PR follows the [labeling requirements](https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process) of this project (at minimum one label for `T` required)
  * External contributors: ask maintainers to put the right label on your PR.

--------- Co-authored-by:
GitHub Action <action@github.com> Co-authored-by:
Bastian Köcher <git@kchr.de>
- Nov 07, 2024
Nazar Mokrynskyi authored
# Description

This is a continuation of https://github.com/paritytech/polkadot-sdk/pull/5666 that finally fixes https://github.com/paritytech/polkadot-sdk/issues/5333. This should allow developers to create custom syncing strategies, or even the whole syncing engine if they so desire. It also moves syncing engine creation and the addition of the corresponding protocol outside the `build_network_advanced` method, which is something Bastian expressed as desired in https://github.com/paritytech/polkadot-sdk/issues/5#issuecomment-1700816458

Here I replaced strategy-specific types and methods in the `SyncingStrategy` trait with generic ones. Specifically, `SyncingAction` is now used by all strategies instead of strategy-specific types with conversions. `StrategyKey` was an enum with a fixed set of options and is now replaced with an opaque type that strategies create privately and send to upper layers as an opaque type. Requests and responses are now handled in a generic way regardl...
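An illustrative sketch of the opaque-key idea described above; the types here are stand-ins, not the actual `sc-network-sync` definitions. Each strategy mints its key privately, and upper layers only ever compare keys:
```rust
// Illustrative sketch: an opaque key that a strategy creates privately and
// upper layers treat purely as an identifier, replacing a fixed central enum.
use std::any::TypeId;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct StrategyKey(TypeId);

impl StrategyKey {
    /// Each strategy derives its key from its own concrete type, so keys are
    /// unique per strategy without a central enum listing every variant.
    pub fn of<S: 'static>() -> Self {
        Self(TypeId::of::<S>())
    }
}

struct WarpSyncStrategy; // stand-in strategy type

fn main() {
    let key = StrategyKey::of::<WarpSyncStrategy>();
    // Upper layers can route requests/responses by comparing opaque keys.
    assert_eq!(key, StrategyKey::of::<WarpSyncStrategy>());
    println!("{key:?}");
}
```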
- Oct 25, 2024
Shoyu Vanilla (Flint) authored
Closes #4896
- Oct 24, 2024
Alexandru Gheorghe authored
The approval-voting-parallel subsystem introduced with https://github.com/paritytech/polkadot-sdk/pull/4849 has been tested on `versi` and for approximately 3 weeks on Parity's existing Kusama nodes (https://github.com/paritytech/devops/issues/3583). Things worked as expected, so enable it by default on all Kusama nodes in the next release. The next step will be enabling it by default on Polkadot if no issues arise while running on Kusama. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
- Oct 15, 2024
Michal Kucharczyk authored
### Fork-Aware Transaction Pool Implementation

This PR introduces a fork-aware transaction pool (fatxpool), enhancing transaction management by maintaining a valid txpool state for different forks.

### High-level overview

The high-level overview was added to the [`sc_transaction_pool::fork_aware_txpool`](https://github.com/paritytech/polkadot-sdk/blob/3ad0a1b7/substrate/client/transaction-pool/src/fork_aware_txpool/mod.rs#L21) module. Use:
```
cargo doc --document-private-items -p sc-transaction-pool --open
```
to build the doc. It should give a good overview and a nice entry point into the new pool's mechanics.

<details> <summary>Quick overview (documentation excerpt)</summary>

#### View

For every fork, a view is created. The view is a persisted state of the transaction pool computed and updated at the tip of the fork. The view is built around the existing `ValidatedPool` structure. (A minimal sketch of the view idea follows this entry.)

A view is created on every new best block notification. To create a view, one of the existing views is chosen and cloned. When the chain progresses, the view is kept in the cache (`retracted_views`) to allow building blocks upon intermediary blocks in the fork. The views are deleted on finalization: views lower than the finalized block are removed.

The views are updated with the transactions from the mempool: all transactions are sent to the newly created views. A maintain process is also executed for the newly created views, basically resubmitting and pruning transactions from the appropriate tree route.

##### View store

The view store is the helper structure that acts as a container for all the views. It provides some convenient methods.

##### Submitting transactions

Every transaction is submitted to every view at the tips of the forks. Retracted views are not updated. Every transaction also goes into the mempool.

##### Internal mempool

In short, the main purpose of the internal mempool is to prevent a transaction from being lost, which could happen when a transaction is invalid on one fork and valid on another. It also allows the txpool to accept transactions when no blocks have been reported yet. The mempool removes its transactions when they get finalized. Transactions are also periodically verified on every finalized event and removed from the mempool if no longer valid.

#### Events

Transaction events from multiple views are merged and filtered to avoid duplicated events. `Ready` / `Future` / `InBlock` events originate in the views and are de-duplicated and forwarded to external listeners. `Finalized` events originate in the fork-aware-txpool logic. `Invalid` events require special care and can originate in both the view and the fork-aware-txpool logic.

#### Light maintain

Sometimes the transaction pool does not have enough time to prepare a fully maintained view with all retracted transactions revalidated. To avoid providing an empty ready-transaction set to the block builder (which would result in an empty block), light maintain was implemented. It simply removes the already-imported transactions from the ready iterator.

#### Revalidation

Revalidation is performed for every view. The revalidation process is started after a trigger is executed. The revalidation work is terminated just after a new best block / finalized event is notified to the transaction pool. The revalidation result is applied to the newly created view, which is built upon the revalidated view. Additionally, parts of the mempool are also revalidated to make sure that no transactions are stuck in the mempool.
#### Logs

The most important log line, allowing you to understand the state of the txpool, is:
```
maintain: txs:(0, 92) views:[2;[(327, 76, 0), (326, 68, 0)]] event:Finalized { hash: 0x8...f, tree_route: [] } took:3.463522ms
```
Reading the fields left to right:
- `txs:(0, 92)`: unwatched and watched txs in the mempool,
- `views:[2;...]`: number of views,
- `(327, 76, 0)`: 1st view's block #, number of ready txs, number of future txs,
- `(326, 68, 0)`: 2nd view's block #, number of ready txs, number of future txs,
- `event:...`: the event that triggered the maintenance,
- `took:...`: the duration of the maintenance.

It is logged after the maintenance is done. The `debug` level enables per-transaction logging, allowing you to keep track of all transaction-related actions that happened in the txpool.

</details>

### Integration notes

For teams having a custom node, the new txpool needs to be instantiated, typically in the `service.rs` file; here is an example: https://github.com/paritytech/polkadot-sdk/blob/9c547ff3/cumulus/polkadot-omni-node/lib/src/common/spec.rs#L152-L161

To enable the new transaction pool, the following CLI arg shall be specified: `--pool-type=fork-aware`. If it works, there shall be information printed in the log:
```
2024-09-20 21:28:17.528 INFO main txpool: [Parachain] creating ForkAware txpool.
```

For debugging, the following log targets shall be enabled:
```
"-lbasic-authorship=debug",
"-ltxpool=debug",
```
*note:* `trace` for txpool enables per-transaction logging.

### Future work

The current implementation seems to be stable; however, further improvements are required. Here is the umbrella issue for future work:
- https://github.com/paritytech/polkadot-sdk/issues/5472

Partially fixes: #1202

--------- Co-authored-by:
Bastian Köcher <git@kchr.de> Co-authored-by:
Sebastian Kunert <skunert49@gmail.com> Co-authored-by:
Iulian Barbu <14218860+iulianbarbu@users.noreply.github.com>
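A minimal sketch of the per-fork view idea from the overview above, with stand-in types (a real block hash is not ordered by height; the `u64` here doubles as a height purely for brevity):
```rust
// Illustrative sketch of the per-fork "view" idea (not the actual
// sc-transaction-pool types): one pool snapshot per fork tip, cloned from an
// existing view when a new best block is announced.
use std::collections::HashMap;

type BlockHash = u64; // stand-in: doubles as block height for simplicity
type Tx = String;     // stand-in for a real transaction

#[derive(Clone, Default)]
struct View {
    ready: Vec<Tx>, // transactions valid at this fork tip
}

#[derive(Default)]
struct ViewStore {
    views: HashMap<BlockHash, View>,
}

impl ViewStore {
    /// Create a view for a new best block by cloning the parent's view;
    /// fall back to an empty view if the parent is unknown.
    fn on_new_best_block(&mut self, parent: BlockHash, new_tip: BlockHash) {
        let view = self.views.get(&parent).cloned().unwrap_or_default();
        self.views.insert(new_tip, view);
    }

    /// Submit a transaction to every view (i.e. to every fork tip).
    fn submit(&mut self, tx: Tx) {
        for view in self.views.values_mut() {
            view.ready.push(tx.clone());
        }
    }

    /// On finalization, drop views at or below the finalized block.
    fn on_finalized(&mut self, finalized: BlockHash) {
        self.views.retain(|hash, _| *hash > finalized);
    }
}

fn main() {
    let mut store = ViewStore::default();
    store.on_new_best_block(0, 1); // new best block 1 on top of parent 0
    store.submit("xt1".into());
    store.on_finalized(0);
    println!("views: {}, ready at tip: {}", store.views.len(), store.views[&1].ready.len());
}
```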
- Oct 04, 2024
Alexandru Gheorghe authored
Jaeger tracing went mostly unused and created bigger problems, like wasted CPU and memory leaks, so remove it entirely. Fixes: https://github.com/paritytech/polkadot-sdk/issues/4995 --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
- Sep 26, 2024
Alexandru Gheorghe authored
This is the implementation of the approach described here: https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2150321612 & https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2154357547 & https://github.com/paritytech/polkadot-sdk/issues/1617#issuecomment-2154721395.

## Description of changes

The end goal is to have an architecture where we have a single subsystem (`approval-voting-parallel`) and multiple worker types that fulfill the work currently done by the `approval-distribution` and `approval-voting` subsystems. The main loop of the new subsystem just distributes work to the workers.

The new subsystem will have:
- N approval-distribution workers: these do the work currently done by the approval-distribution subsystem and, in addition, perform the crypto-checks that an assignment is valid and that a vote is correctly signed. Work is assigned via the following formula: `worker_index = msg.validator % WORKER_COUNT`, which guarantees that all assignments and approvals from the same validator reach the same worker. (A minimal sketch of this routing rule follows this entry.)
- 1 approval-voting worker: this receives an already-valid message and does everything the approval-voting subsystem currently does, except the crypto-checking, which has already been moved to the approval-distribution workers.

On the hot path of processing messages **no** synchronisation and waiting is needed between approval-distribution and approval-voting workers.

<img width="1431" alt="Screenshot 2024-06-07 at 11 28 08" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/a196199b-b705-4140-87d4-c6900ba8595e">

## Guidelines for reading

The full implementation is broken into 5 PRs, all of them self-contained, improving things incrementally even without the parallelisation being implemented/enabled. This approach was taken instead of a big-bang PR to make things easier to review and to reduce the risk of breaking these critical subsystems. After reading the full description of this PR, the changes should be read in the following order:
1. https://github.com/paritytech/polkadot-sdk/pull/4848, some micro-optimizations for networks with a high number of validators. This change gives us a speed-up by itself without any other changes.
2. https://github.com/paritytech/polkadot-sdk/pull/4845, containing only interface changes to decouple the subsystem from the `Context` and to be able to run multiple instances of the subsystem on different threads. **No functional changes**
3. https://github.com/paritytech/polkadot-sdk/pull/4928, moving the crypto checks from approval-voting into approval-distribution, so that approval-distribution no longer has any reason to wait on approval-voting. This change gives us a speed-up by itself without any other changes.
4. https://github.com/paritytech/polkadot-sdk/pull/4846, interface changes to make approval-voting runnable on a separate thread. **No functional changes**
5. This PR, where we instantiate an `approval-voting-parallel` subsystem that runs the logic currently in `approval-distribution` and `approval-voting` on different workers.
6. The next step after these changes get merged and deployed would be to bring all the files from approval-distribution, approval-voting, and approval-voting-parallel into a single Rust crate, to make the structure easier to maintain and understand.
## Results

Running subsystem-benchmarks with 1000 validators, 100 fully occupied cores, and triggering all assignments and approvals for all tranches.

#### Approval does not lag behind

Master
```
Chain selection approved after 72500 ms hash=0x0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a
```
With this PoC
```
Chain selection approved after 3500 ms hash=0x0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a
```

#### Gathering enough assignments

Enough assignments are gathered in less than 500 ms, which gives us a guarantee that unnecessary work does not get triggered; on master, on the same benchmark, that number goes above 32 seconds because the subsystems fall behind on work.

<img width="2240" alt="Screenshot 2024-06-20 at 15 48 22" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/d2f2b29c-5ff6-44b4-a245-5b37ab8e58bc">

#### CPU usage

Master
```
CPU usage, seconds              total     per block
approval-distribution         96.9436       9.6944
approval-voting              117.4676      11.7468
test-environment              44.0092       4.4009
```
With this PoC
```
CPU usage, seconds              total     per block
approval-distribution          0.0014       0.0001   --- unused
approval-voting                0.0437       0.0044   --- unused
approval-voting-parallel       5.9560       0.5956
approval-voting-parallel-0    22.9073       2.2907
approval-voting-parallel-1    23.0417       2.3042
approval-voting-parallel-2    22.0445       2.2045
approval-voting-parallel-3    22.7234       2.2723
approval-voting-parallel-4    21.9788       2.1979
approval-voting-parallel-5    23.0601       2.3060
approval-voting-parallel-6    22.4805       2.2481
approval-voting-parallel-7    21.8330       2.1833
approval-voting-parallel-db   37.1954       3.7195   --- the approval-voting thread
```

# Enablement strategy

Because only some trivial plumbing is needed in approval-distribution and approval-voting to be able to run things in parallel, and because these subsystems play a critical part in the system, this PR proposes that we keep both ways of running the approval work: as separate subsystems, and as a single subsystem (`approval-voting-parallel`) with multiple workers for the distribution work and one worker for the approval-voting work, switching between them with a command-line flag. The benefit of this is twofold:
1. With the same polkadot binary we can easily switch just a few validators to the parallel approach and gradually make it the default way of running, if no issues arise.
2. In the worst-case scenario, where it becomes the default way of running things but we discover critical issues with it, we have a path to quickly disable it by asking validators to adjust their command-line flags.

# Next steps
- [x] Make sure through various testing we are not missing anything
- [x] Polish the implementations to make them production ready
- [x] Add unit tests for approval-voting-parallel.
- [x] Define and implement the strategy for rolling out this change, so that the blast radius is minimal (single validator) in case there are problems with the implementation.
- [x] Versi long-running tests.
- [x] Add relevant metrics.

@ordian @eskimor @sandreim @AndreiEres, let me know what you think.

--------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
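A minimal sketch of the work-assignment rule quoted above (`worker_index = msg.validator % WORKER_COUNT`); the worker count here is an assumed example value:
```rust
// Routing by validator index guarantees all messages from one validator land
// on the same approval-distribution worker, so no cross-worker
// synchronisation is needed on the hot path.
const WORKER_COUNT: usize = 8; // assumed example value

fn worker_index(validator_index: u32) -> usize {
    validator_index as usize % WORKER_COUNT
}

fn main() {
    // Two messages from validator 42 always hit the same worker.
    assert_eq!(worker_index(42), worker_index(42));
    println!("validator 42 -> worker {}", worker_index(42));
}
```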
- Sep 22, 2024
Branislav Kontur authored
This is a first step toward switching CI to the `frame-omni-bencher`. This PR includes several changes related to generating chain specs, plus:
- [x] pallet `assigned_slots`: fix missing `#[serde(skip)]` for phantom
- [x] pallet `paras_inherent`: benchmark fix, cherry-picked from https://github.com/paritytech/polkadot-sdk/pull/5688
- [x] migrates `get_preset` to the relevant runtimes
- [x] fixes Rococo genesis presets (https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7317249 does not work)
- [x] fixes Rococo benchmarks for CI
- [x] migrate westend genesis
- [x] remove wococo stuff

Closes: https://github.com/paritytech/polkadot-sdk/issues/5680

## Follow-ups
- Fix for frame-omni-bencher: https://github.com/paritytech/polkadot-sdk/pull/5655
- Enable new short-benchmarking CI: https://github.com/paritytech/polkadot-sdk/pull/5706
- Remove gitlab pipelines for short benchmarking
- Refactor all Cumulus runtimes to use `get_preset`: https://github.com/paritytech/polkadot-sdk/issues/5704
- https://github.com/paritytech/polkadot-sdk/issues/5705
- https://github.com/paritytech/polkadot-sdk/issues/5700
- [ ] Backport to stable

--------- Co-authored-by: command-bot <> Co-authored-by:
ordian <noreply@reusable.software>
- Sep 17, 2024
Nazar Mokrynskyi authored
# Description

Follow-up to https://github.com/paritytech/polkadot-sdk/pull/5469, mostly covering https://github.com/paritytech/polkadot-sdk/issues/5333.

The primary change here is that the syncing strategy is no longer created inside the syncing engine; instead, the syncing strategy is an argument of the syncing engine, more specifically an argument to `build_network`, which most downstream users will use. This also extracts the addition of request-response protocols outside of network construction, making sure they are physically not present when they don't need to be (imagine a syncing strategy that uses none of Substrate's protocols in its implementation, for example).

This technically allows completely replacing the syncing strategy with whatever strategy a chain might need. There will be at least one follow-up PR that will simplify the `SyncingStrategy` trait and other public interfaces to remove mentions of block/state/warp sync requests, replacing them with generic APIs, such that strategies where warp sync is not applicable don't have to provide dummy method implementations, etc.

## Integration

Downstream projects will have to write a bit of boilerplate, calling the `build_polkadot_syncing_strategy` function to create the previously default syncing strategy. (An illustrative shape sketch follows this entry.)

## Review Notes

Please review the PR through individual commits rather than the final diff; it will be easier that way. The changes are mostly just moving code around one step at a time.

# Checklist

* [x] My PR includes a detailed description as outlined in the "Description" and its two subsections above.
* [x] My PR follows the [labeling requirements](https://github.com/paritytech/polkadot-sdk/blob/master/docs/contributor/CONTRIBUTING.md#Process) of this project (at minimum one label for `T` required)
  * External contributors: ask maintainers to put the right label on your PR.
* [x] I have made corresponding changes to the documentation (if applicable)
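An illustrative shape only; the names and signatures below are assumptions for the sketch, not the real `sc-service` API. The point is that the strategy is now constructed by the downstream node and handed to the network builder, instead of being created inside the syncing engine:
```rust
// Hypothetical, simplified shapes; not the actual polkadot-sdk signatures.
trait SyncingStrategy {
    fn on_tick(&mut self);
}

// Stand-in for the previously built-in default strategy.
struct PolkadotSyncingStrategy;
impl SyncingStrategy for PolkadotSyncingStrategy {
    fn on_tick(&mut self) {}
}

struct BuildNetworkParams {
    // ... other fields elided ...
    syncing_strategy: Box<dyn SyncingStrategy>,
}

fn build_network(params: BuildNetworkParams) {
    let mut strategy = params.syncing_strategy;
    strategy.on_tick(); // the engine drives whatever strategy it was given
}

fn main() {
    // Downstream boilerplate: build the default strategy explicitly...
    let strategy = Box::new(PolkadotSyncingStrategy);
    // ...or substitute a custom one without touching the engine.
    build_network(BuildNetworkParams { syncing_strategy: strategy });
}
```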
- Sep 12, 2024
Alexandru Gheorghe authored
This is part of the work to further optimize the approval subsystems. If you want to understand the full context, start with reading https://github.com/paritytech/polkadot-sdk/pull/4849#issue-2364261568.

# Description

This PR contains changes to make it possible to run a single approval-voting instance on a worker thread, so that it can be instantiated by the approval-voting-parallel subsystem. It does not contain any functional changes; it just decouples the subsystem from the subsystem `Context` and introduces more specific trait dependencies for each function instead of all of them requiring a context. This change can be merged independently of the follow-up PRs. --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io> Co-authored-by:
Andrei Sandu <54316454+sandreim@users.noreply.github.com>
- Sep 05, 2024
Alexandru Gheorghe authored
Fixes: https://github.com/paritytech/polkadot-sdk/issues/5122. This PR extends the existing single-core `benchmark_cpu` to also build a score for the entire processor by spawning `EXPECTED_NUM_CORES` (8) threads and averaging their throughput. This is better than simply checking the number of cores, because it also covers multi-tenant environments where the OS sees a high number of available CPUs but, because it has to share them with its neighbours, the total throughput does not satisfy the minimum requirements. (A sketch of this idea follows this entry.)

## TODO
- [x] Obtain reference values on the reference hardware.

--------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
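A minimal sketch of the scoring idea, not the real `sc-sysinfo` code: run the same single-core workload on `EXPECTED_NUM_CORES` threads at once and average the per-thread throughput, which penalises oversubscribed multi-tenant hosts:
```rust
use std::thread;
use std::time::Instant;

const EXPECTED_NUM_CORES: usize = 8;

/// Stand-in CPU workload; returns iterations completed per second.
fn benchmark_cpu_once() -> f64 {
    let start = Instant::now();
    let mut x: u64 = 0;
    let mut iters = 0u64;
    while start.elapsed().as_millis() < 200 {
        x = x.wrapping_mul(6364136223846793005).wrapping_add(1);
        iters += 1;
    }
    std::hint::black_box(x); // keep the loop from being optimized away
    iters as f64 / start.elapsed().as_secs_f64()
}

/// Run the workload on all expected cores at once and average throughput.
fn parallel_score() -> f64 {
    let handles: Vec<_> = (0..EXPECTED_NUM_CORES)
        .map(|_| thread::spawn(benchmark_cpu_once))
        .collect();
    let total: f64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    total / EXPECTED_NUM_CORES as f64
}

fn main() {
    println!("avg per-core throughput: {:.0} iters/s", parallel_score());
}
```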
- Sep 02, 2024
Nazar Mokrynskyi authored
This improves the `sc-service` API by not requiring the whole `&Configuration`, using specific configuration options instead. `RpcConfiguration` was also extracted from `Configuration` to group all RPC options together. (An illustrative sketch follows this entry.)

We don't use Substrate's CLI and would rather not use `Configuration` either, but some key public functions require it even though they ignore most of the fields anyway. `RpcConfiguration` is very helpful not just for consolidating the fields, but also for finally making RPC optional for our use case; Substrate still runs an RPC server on localhost even if the listening address is explicitly set to `None`, which is annoying (and I suspect there is a reason for it, so I didn't want to change the default just yet).

While this is a breaking change, most developers will not notice it if they use higher-level APIs. Fixes https://github.com/paritytech/polkadot-sdk/issues/2897 --------- Co-authored-by:
Niklas Adolfsson <niklasadolfsson1@gmail.com>
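A sketch of the consolidation idea; the field names below are illustrative assumptions, not the exact `sc-service` definitions:
```rust
use std::net::SocketAddr;

// Hypothetical, simplified grouping of RPC options into one struct.
struct RpcConfiguration {
    addr: Option<Vec<SocketAddr>>, // `None` can now genuinely mean "no RPC server"
    max_connections: u32,
}

fn spawn_rpc_server(rpc: &RpcConfiguration) {
    match &rpc.addr {
        // Previously a localhost server was started even for `None`.
        None => println!("RPC disabled, nothing to bind"),
        Some(addrs) => println!("binding RPC on {addrs:?}, max {} connections", rpc.max_connections),
    }
}

fn main() {
    let rpc = RpcConfiguration { addr: None, max_connections: 100 };
    spawn_rpc_server(&rpc);
}
```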
- Aug 28, 2024
Niklas Adolfsson authored
rpc server: listen to `ipv6 socket` if available and `--experimental-rpc-endpoint` CLI option (#4792)

Close https://github.com/paritytech/polkadot-sdk/issues/3488, https://github.com/paritytech/polkadot-sdk/issues/4331

This changes/adds the following:
1. By default, substrate starts an RPC server that listens on localhost on both IPv4 and IPv6 on the same port. IPv6 is allowed to fail, because some platforms may not support it.
2. A new RPC CLI option `--experimental-rpc-endpoint`, which allows configuring arbitrary listen addresses including the port; if this is enabled, no other interfaces are enabled.
3. If the local address is not found for any of the sockets, the server is not started and an error is thrown.
4. Removes `deny_unsafe` from the RPC implementations; instead, this is an extension, allowing different policies for different interfaces/sockets, such that one may enable unsafe methods on a local interface and safe-only on the external interface. So, for instance, in this PR it's now possible to start up three RPC...
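A minimal sketch of point 1 above (not the actual jsonrpsee-based server): bind localhost on both IPv4 and IPv6 on the same port, tolerating an IPv6 failure on platforms that don't support it:
```rust
use std::net::TcpListener;

fn bind_local_rpc(port: u16) -> std::io::Result<Vec<TcpListener>> {
    let mut listeners = Vec::new();
    // The IPv4 bind must succeed...
    listeners.push(TcpListener::bind(("127.0.0.1", port))?);
    // ...but IPv6 is allowed to fail.
    match TcpListener::bind(("::1", port)) {
        Ok(l) => listeners.push(l),
        Err(e) => eprintln!("IPv6 localhost bind failed, continuing: {e}"),
    }
    Ok(listeners)
}

fn main() -> std::io::Result<()> {
    for l in bind_local_rpc(9944)? {
        println!("listening on {}", l.local_addr()?);
    }
    Ok(())
}
```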
- Aug 23, 2024
Nazar Mokrynskyi authored
I'm not sure if this is exactly what https://github.com/paritytech/polkadot-sdk/issues/3537 meant, but I think it should be fine to wait for the relay chain before fully initializing the parachain node, which removes the need for a background task and extra hacks throughout the stack just to know where warp sync should start. Previously there were both `WarpSyncParams` and `WarpSyncConfig`, but there was no longer any point in having two data structures, so I simplified it to just `WarpSyncConfig`. Fixes https://github.com/paritytech/polkadot-sdk/issues/3537
- Jul 23, 2024
Alexandru Vasile authored
This PR extends the metrics exposed by the peerstore with the total number of banned peers. The new metric is exposed under `substrate_sub_libp2p_peerset_num_banned_peers`.

To easily extend metrics in the future, `fn num_known_peers` is removed in favor of `fn status`. While at it, enable the metrics for litep2p:
- total number of peers from peerstore (needed to debug memory consumption)
- total number of banned peers from peerstore (needed to debug reputation bans and disconnects)

Added a couple of tests to validate that the number of banned peers is exposed properly. Part of: https://github.com/paritytech/polkadot-sdk/issues/4681

### Testing Done

Using [subp2p-explorer](https://github.com/lexnv/subp2p-explorer), I submitted random data on the tx protocol. The peer gets banned, the number of banned peers is incremented, then the peer is disconnected. cc @paritytech/networking --------- Signed-off-by:
Alexandru Vasile <alexandru.vasile@parity.io> Co-authored-by:
Dmitry Markin <dmitry@markin.tech>
- Jun 05, 2024
Oliver Tale-Yazdi authored
Inherited workspace dependencies cannot be renamed by the crate using them (see [1](https://github.com/rust-lang/cargo/issues/12546), [2](https://stackoverflow.com/questions/76792343/can-inherited-dependencies-in-rust-be-aliased-in-the-cargo-toml-file)). Since we want to use inherited workspace dependencies everywhere, we first need to unify all aliases used for a dependency throughout the workspace. The umbrella crate is currently excluded from this procedure, since it should be able to export the crates by their original names without much hassle.

For example: one crate may alias `parity-scale-codec` to `codec`, while another crate does not alias it at all. After this change, all crates have to use `codec` as the name. The problematic combinations were:
- conflicting aliases: most crates alias a dep as `A` but some use `B`.
- missing alias: most of the crates alias a dep but some don't.
- superfluous alias: most crates don't alias a dep but some do.

The script that I used first determines whether most crates opted to alias a dependency or not. From that info it decides whether to use an alias or not. If it decided to use an alias, the most common one is used everywhere. To reproduce, I used [this](https://github.com/ggwpez/substrate-scripts/blob/master/uniform-crate-alias.py) Python script in combination with [this](https://github.com/ggwpez/zepter/blob/38ad10585fe98a5a86c1d2369738bc763a77057b/renames.json) error output from Zepter. --------- Signed-off-by:
Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io> Co-authored-by:
Bastian Köcher <git@kchr.de>
- May 30, 2024
drskalman authored
Revived version of https://github.com/paritytech/substrate/pull/13311, except `Signature` is not generic and is dictated by `AuthorityId`. --------- Co-authored-by:
Davide Galassi <davxy@datawok.net> Co-authored-by:
Robert Hambrock <roberthambrock@gmail.com> Co-authored-by:
Adrian Catangiu <adrian@parity.io>
- May 28, 2024
Alin Dima authored
**Don't look at the commit history, it's confusing, as this branch is based on another branch that was merged**

Fixes #598. Also implements [RFC #47](https://github.com/polkadot-fellows/RFCs/pull/47).

## Description

- Availability-recovery now first attempts to request the systematic chunks for large PoVs (which are the first ~n/3 chunks, and can recover the full data without the costly reed-solomon decoding process). This falls back to recovering from all chunks if for some reason the process fails. Additionally, backers are also used as a backup for requesting the systematic chunks if the assigned validator is not offering the chunk (each backer is only used for one systematic chunk, so as not to overload them).
- Quite obviously, recovering from systematic chunks is much faster than recovering from regular chunks (4000% faster, as measured on my Apple M2 Pro).
- Introduces a `ValidatorIndex` -> `ChunkIndex` mapping which is different for every core, in order to avoid querying only the first n/3 validators over and over again in the same session. The mapping is the one described in RFC 47. (A simplified illustration follows this entry.)
- The mapping is feature-gated by the [NodeFeatures runtime API](https://github.com/paritytech/polkadot-sdk/pull/2177) so that it can only be enabled via a governance call once a sufficient majority of validators have upgraded their client. If the feature is not enabled, the mapping will be the identity mapping and backwards compatibility will be preserved.
- Adds a new chunk request protocol version (v2), which adds the ChunkIndex to the response. This may or may not be checked against the expected chunk index: for av-distribution and systematic recovery it will be checked, but not for regular recovery. This is backwards compatible. First, a v2 request is attempted; if that fails during protocol negotiation, v1 is used.
- Systematic recovery is only attempted during approval-voting, where we have easy access to the core_index. For disputes and collator pov_recovery, regular chunk requests are used, just as before.

## Performance results

Some results from subsystem-bench:
- with regular chunk recovery: CPU usage per block 39.82s
- with recovery from backers: CPU usage per block 16.03s
- with systematic recovery: CPU usage per block 19.07s

End-to-end results here: https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099

#### TODO:
- [x] [RFC #47](https://github.com/polkadot-fellows/RFCs/pull/47)
- [x] merge https://github.com/paritytech/polkadot-sdk/pull/2177 and rebase on top of those changes
- [x] merge https://github.com/paritytech/polkadot-sdk/pull/2771 and rebase
- [x] add tests
- [x] preliminary performance measure on Versi: see https://github.com/paritytech/polkadot-sdk/issues/598#issuecomment-1792007099
- [x] Rewrite the implementer's guide documentation
- [x] https://github.com/paritytech/polkadot-sdk/pull/3065
- [x] https://github.com/paritytech/zombienet/issues/1705 and fix zombienet tests
- [x] security audit
- [x] final Versi test and performance measure

--------- Signed-off-by:
alindima <alin@parity.io> Co-authored-by:
Javier Viola <javier@parity.io>
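A simplified illustration only: the real mapping is the one specified in RFC 47 and feature-gated via `NodeFeatures`; this sketch merely shows the shape of a per-core rotation, so that different cores draw their systematic chunks from different validator subsets:
```rust
// NOT the RFC 47 formula; an illustrative per-core rotation only.
fn chunk_index(validator_index: u32, core_index: u32, n_validators: u32) -> u32 {
    // With the feature disabled, the mapping is the identity (core offset 0).
    (validator_index + core_index) % n_validators
}

fn main() {
    let n = 9; // 9 validators => the first ~n/3 chunks are the systematic ones
    // Systematic recovery fetches chunks 0..n/3; the per-core rotation means
    // those chunks live on different validators for different cores.
    for core in 0..3u32 {
        let holders: Vec<u32> =
            (0..n).filter(|v| chunk_index(*v, core, n) < n / 3).collect();
        println!("core {core}: systematic chunks held by validators {holders:?}");
    }
}
```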
- May 27, 2024
Michal Kucharczyk authored
This PR removes deprecated code:
- The `RuntimeGenesisConfig` generic type parameter in the `GenericChainSpec` struct.
- The `ChainSpec::from_genesis` method, which allowed creating a chain spec using a closure providing the runtime genesis struct.
- The `GenesisSource::Factory` variant, together with the no-longer-needed generic parameter `G` of `GenesisSource` (which was intended to be a runtime genesis struct).

https://github.com/paritytech/polkadot-sdk/blob/17b56fae/substrate/client/chain-spec/src/chain_spec.rs#L559-L563
- May 24, 2024
Andrei Sandu authored
availability-recovery: bump chunk fetch threshold to 1MB for Polkadot and 4MB for Kusama + testnets (#4399)

This change ensures that we minimize the CPU time spent in reed-solomon by only doing the re-encoding into chunks if the PoV size is less than 4MB (which currently means all PoVs). Based on subsystem benchmark results we concluded that it is safe to bump this number higher. (A sketch of the threshold decision follows this entry.)

In the worst-case scenario, the network pressure for a backing group of 5 is around 25% of the network bandwidth in the hardware specs. Assuming 6s block times (max_candidate_depth 3) and needed_approvals 30, the bandwidth usage of a backing group would hover above `30 * 4 * 3 = 360MB` per relay chain block. Given a backing group of 5, that gives 72MB per block per validator -> 12 MB/s.

<details> <summary>Reality check on Kusama PoV sizes (click for chart)</summary> <br> <img width="697" alt="Screenshot 2024-05-07 at 14 30 38" src="https://github.com/paritytech/polkadot-sdk/assets/54316454/bfed32d4-8623-48b0-9ec0-8b95dd2a9d8c"> </details>

--------- Signed-off-by:
Andrei Sandu <andrei-mihail@parity.io>
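A sketch of the threshold decision described above; the constants mirror the PR description (1MB Polkadot, 4MB Kusama + testnets), while the surrounding types are stand-ins:
```rust
const FETCH_FULL_THRESHOLD_POLKADOT: usize = 1024 * 1024;   // 1 MB
const FETCH_FULL_THRESHOLD_KUSAMA: usize = 4 * 1024 * 1024; // 4 MB

enum RecoveryStrategy {
    /// Fetch the whole PoV from backers, then re-encode into chunks to verify
    /// the erasure root (cheaper than a full reed-solomon decode).
    FullFromBackers,
    /// Fetch chunks from validators and reed-solomon decode.
    FromChunks,
}

fn pick_strategy(pov_size: usize, threshold: usize) -> RecoveryStrategy {
    if pov_size <= threshold {
        RecoveryStrategy::FullFromBackers
    } else {
        RecoveryStrategy::FromChunks
    }
}

fn main() {
    match pick_strategy(3 * 1024 * 1024, FETCH_FULL_THRESHOLD_KUSAMA) {
        RecoveryStrategy::FullFromBackers => println!("3 MB PoV on Kusama: fetch full PoV"),
        RecoveryStrategy::FromChunks => println!("3 MB PoV on Kusama: fetch chunks"),
    }
}
```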
- May 07, 2024
jimdssd authored
- Apr 24, 2024
Alexandru Gheorghe authored
Part of https://github.com/paritytech/polkadot-sdk/issues/4126. We want to safely increase `execute_workers_max_num` gradually from chain to chain and assess whether there are any negative impacts. This PR performs the necessary plumbing to be able to increase it based on the chain id. It increases the number of execution workers from 2 to 4 on test networks, but leaves Kusama and Polkadot unchanged until we gather more data. (A sketch follows below.) --------- Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
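A sketch of the per-chain plumbing described above; the worker counts follow the PR text (4 on test networks, 2 kept on Kusama/Polkadot), while the chain-id matching is illustrative:
```rust
fn execute_workers_max_num(chain_id: &str) -> usize {
    match chain_id {
        // Production networks stay unchanged until more data is gathered.
        "polkadot" | "kusama" => 2,
        // Test networks get the bumped value.
        _ => 4,
    }
}

fn main() {
    println!("westend: {} execution workers", execute_workers_max_num("westend"));
    println!("polkadot: {} execution workers", execute_workers_max_num("polkadot"));
}
```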
- Apr 08, 2024
Aaro Altonen authored
[litep2p](https://github.com/altonen/litep2p) is a libp2p-compatible P2P networking library. It supports all the features of `rust-libp2p` that are currently utilized by the Polkadot SDK. Compared to `rust-libp2p`, `litep2p` has a quite different architecture, which is why the new `litep2p` network backend is only able to reuse a little of the existing code in `sc-network`. The design has been mainly influenced by how we'd wish to structure our networking-related code in the Polkadot SDK: independent higher-level protocols directly communicating with the network over links that support bidirectional backpressure. A good example would be the `NotificationHandle`/`RequestResponseHandle` abstractions which allow, e.g., `SyncingEngine` to directly communicate with peers to announce/request blocks.

I've tried running `polkadot --network-backend litep2p` with a few different peer configurations and there is a noticeable reduction in networking CPU usage. For high load (`--out-peers 200`), networking CPU usage goes down from ~110% to ~30% (80 pp) and for normal load (`--out-peers 40`), the usage goes down from ~55% to ~18% (37 pp). These should not be taken as final numbers because:
a) there are still some low-hanging optimization fruits, such as enabling [receive window auto-tuning](https://github.com/libp2p/rust-yamux/pull/176), integrating `Peerset` more closely with `litep2p`, or improving memory usage of the WebSocket transport
b) fixing bugs/instabilities that incorrectly cause `litep2p` to do less work will increase the networking CPU usage
c) verification in a more diverse set of tests/conditions is needed
Nevertheless, these numbers should give an early estimate for the CPU usage of the new networking backend.

This PR consists of three separate changes:
* introduce a generic `PeerId` (wrapper around `Multihash`) so that we don't have to use `NetworkService::PeerId` in every part of the code that uses a `PeerId`
* introduce a `NetworkBackend` trait, implement it for the libp2p network stack and make the Polkadot SDK generic over `NetworkBackend`
* implement `NetworkBackend` for litep2p

The new library should be considered experimental, which is why `rust-libp2p` will remain the default option for the time being. This PR currently depends on the master branch of `litep2p` but I'll cut a new release for the library once all review comments have been addressed. --------- Signed-off-by:
Alexandru Vasile <alexandru.vasile@parity.io> Co-authored-by:
Dmitry Markin <dmitry@markin.tech> Co-authored-by:
Alexandru Vasile <60601340+lexnv@users.noreply.github.com> Co-authored-by:
Alexandru Vasile <alexandru.vasile@parity.io>
- Apr 02, 2024
Adrian Catangiu authored
This outputs the following error log entry once every session, for nodes running with `Role::Authority` that have no public BEEFY key in their keystore:
```
2024-04-02 14:36:02.135 ERROR tokio-runtime-worker beefy: 🥩 for session starting at block 21990151 no BEEFY authority key found in store, you must generate valid session keys (https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#generating-the-session-keys)
```
--------- Co-authored-by:
Bastian Köcher <git@kchr.de>
- Mar 22, 2024
Dmitry Markin authored
Make sure public addresses explicitly set by the operator go first in the authority discovery DHT records. Also update the `Discovery` behaviour to eliminate duplicates in the returned addresses. This PR should improve the situation with https://github.com/paritytech/polkadot-sdk/issues/3519. Obsoletes https://github.com/paritytech/polkadot-sdk/pull/3657.
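An illustrative sketch (not the real authority-discovery code) of both changes: operator-supplied addresses go first in the record, and duplicates are dropped while preserving order:
```rust
use std::collections::HashSet;

fn addresses_for_record(explicit: Vec<String>, discovered: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    explicit
        .into_iter()
        .chain(discovered) // explicit addresses keep priority by coming first
        .filter(|addr| seen.insert(addr.clone())) // drop duplicates, keep order
        .collect()
}

fn main() {
    let record = addresses_for_record(
        vec!["/dns/node.example/tcp/30333".into()],
        vec!["/ip4/10.0.0.1/tcp/30333".into(), "/dns/node.example/tcp/30333".into()],
    );
    assert_eq!(record.len(), 2);
    println!("{record:?}");
}
```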
- Feb 28, 2024
maksimryndin authored
Resolves https://github.com/paritytech/polkadot-sdk/issues/3116, a follow-up on https://github.com/paritytech/polkadot-sdk/pull/3061#pullrequestreview-1847530265:
- [x] reuse the collator overseer builder for polkadot-node and collator
- [x] run zombienet test (0001-parachains-smoke-test.toml)
- [x] make wasm build errors more user-friendly, for easier problem detection when using different Rust toolchains

--------- Co-authored-by:
ordian <write@reusable.software> Co-authored-by:
s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
- Jan 29, 2024
s0me0ne-unkn0wn authored
Currently, collators and their alongside nodes spin up a full-scale overseer running a bunch of subsystems that are not needed if the node is not a validator. That was considered to be harmless; however, we've got problems with unused subsystems getting stalled for a reason not currently known, resulting in the overseer exiting and bringing down the whole node. This PR aims to only run needed subsystems on such nodes, replacing the rest with `DummySubsystem`. It also enables collator-optimized availability recovery subsystem implementation. Partially solves #1730.
- Jan 12, 2024
Dmitry Markin authored
Extract `WarpSync` (and `StateSync` as part of warp sync) from `ChainSync` as an independent syncing strategy called by `SyncingEngine`. Introduce a `SyncingStrategy` enum as a proxy between `SyncingEngine` and specific syncing strategies. (A minimal sketch of this enum shape follows this entry.)

## Limitations

Gap sync is kept in `ChainSync` for now, because it shares the same set of peers as the block syncing implementation in `ChainSync`. Extraction of a common context responsible for peer management in syncing strategies able to run in parallel is planned for a follow-up PR.

## Further improvements

The possibility of converting `SyncingStrategy` into a trait should be evaluated. The main stopper for this is that different strategies need to communicate different actions to `SyncingEngine` and respond to different events / provide different APIs (e.g., requesting justifications is only possible via `ChainSync` and not through `WarpSync`; the `SendWarpProofRequest` action is only relevant to `WarpSync`, etc.)

--------- Co-authored-by:
Aaro Altonen <48052676+altonen@users.noreply.github.com>
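A minimal sketch of the proxy-enum shape described above; variant and method names are illustrative, not the actual `sc-network-sync` definitions:
```rust
struct ChainSync;
struct WarpSync;

// The proxy enum between SyncingEngine and the concrete strategies.
enum SyncingStrategy {
    ChainSync(ChainSync),
    WarpSync(WarpSync),
}

impl SyncingStrategy {
    // SyncingEngine calls one entry point; the enum dispatches to whichever
    // strategy is currently active.
    fn on_block_response(&mut self) {
        match self {
            SyncingStrategy::ChainSync(_) => println!("ChainSync handles block response"),
            SyncingStrategy::WarpSync(_) => println!("WarpSync handles block response"),
        }
    }
}

fn main() {
    for mut strategy in [SyncingStrategy::ChainSync(ChainSync), SyncingStrategy::WarpSync(WarpSync)] {
        strategy.on_block_response();
    }
}
```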
- Jan 04, 2024
Marcin S. authored
Please let me know if this would be a better UX. Should we have specific links in the error message?
- Dec 05, 2023
Marcin S. authored
Co-authored-by:
Javier Viola <javier@parity.io>
- Nov 28, 2023
Aaro Altonen authored
This commit introduces a new concept called `NotificationService`, which allows Polkadot protocols to communicate with the underlying notification protocol implementation directly, without routing events through `NetworkWorker`. This implies that each protocol has its own service which it uses to communicate with remote peers, and that each `NotificationService` is unique with respect to the underlying notification protocol, meaning the `NotificationService` for the transaction protocol can only be used to send and receive transaction-related notifications.

The `NotificationService` concept introduces two additional benefits:
* allow protocols to start using custom handshakes
* allow protocols to accept/reject inbound peers

Previously the validation of inbound connections was solely the responsibility of `ProtocolController`. This caused issues with light peers and `SyncingEngine`, as `ProtocolController` would accept more peers than `SyncingEngine` could accept, which caused peers to have differing views of their own states. `SyncingEngine` would reject excess peers, but these rejections were not properly communicated to those peers, causing them to assume they were accepted. With `NotificationService`, the local handshake is not sent to the remote peer if the peer is rejected, which allows it to detect that it was rejected.

This commit also deprecates the use of `NetworkEventStream` for all notification-related events; going forward, only DHT events are provided through `NetworkEventStream`. If protocols wish to follow each other's events, they must introduce additional abstractions, as is done for the GRANDPA and transactions protocols by following the syncing protocol through `SyncEventStream`.

Fixes https://github.com/paritytech/polkadot-sdk/issues/512
Fixes https://github.com/paritytech/polkadot-sdk/issues/514
Fixes https://github.com/paritytech/polkadot-sdk/issues/515
Fixes https://github.com/paritytech/polkadot-sdk/issues/554
Fixes https://github.com/paritytech/polkadot-sdk/issues/556

--- These changes are transferred from https://github.com/paritytech/substrate/pull/14197 but there are no functional changes compared to that PR --------- Co-authored-by:
Dmitry Markin <dmitry@markin.tech> Co-authored-by:
Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
-
André Silva authored
This was never used and we probably don't need it anyway.
-
André Silva authored
Currently the polkadot node will back off from block authoring if finality starts lagging. This PR disables this mechanism on production networks (Polkadot and Kusama) and adds a flag to optionally force-enable it.
- Nov 23, 2023
Bastian Köcher authored
This moves the macro-related re-exports to `__private` to make it more obvious to downstream users that they are using an internal API. --------- Co-authored-by: command-bot <>
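A sketch of the convention, with a stand-in item; in the real crates the `__private` module re-exports things like `codec` that macro expansions need:
```rust
pub mod __private {
    // Items a macro expansion needs but users should not import directly.
    // A stand-in here; real crates re-export whole dependencies instead.
    pub fn encode_len(len: usize) -> u32 {
        len as u32
    }
}

// Macros refer to these items through the crate's `__private` path, so any
// downstream code that spells it out sees the warning sign in the path itself.
#[macro_export]
macro_rules! encoded_len {
    ($e:expr) => {
        $crate::__private::encode_len($e.len())
    };
}

fn main() {
    println!("{}", encoded_len!([1u8, 2, 3]));
}
```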
- Nov 03, 2023
Alexandru Gheorghe authored
The check_hardware functions do not give us much information as to what is failing, so let's return the list of failed metrics, so that callers can print it. This makes debugging easier, rather than trying to guess which dimension is actually failing. (A sketch of the changed return type follows below.) Signed-off-by:
Alexandru Gheorghe <alexandru.gheorghe@parity.io>
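A sketch of the API change described above: instead of a bare pass/fail, return which checks failed so callers can print them (names and thresholds below are illustrative):
```rust
#[derive(Debug)]
struct FailedMetric {
    name: &'static str,
    expected: f64,
    found: f64,
}

fn check_hardware(cpu_score: f64, disk_seq_write: f64) -> Result<(), Vec<FailedMetric>> {
    let mut failed = Vec::new();
    if cpu_score < 1000.0 {
        failed.push(FailedMetric { name: "cpu-score", expected: 1000.0, found: cpu_score });
    }
    if disk_seq_write < 950.0 {
        failed.push(FailedMetric { name: "disk-seq-write (MB/s)", expected: 950.0, found: disk_seq_write });
    }
    if failed.is_empty() { Ok(()) } else { Err(failed) }
}

fn main() {
    if let Err(failed) = check_hardware(800.0, 1200.0) {
        // Callers can now say exactly which dimension is below spec.
        for f in &failed {
            eprintln!("{}: expected >= {}, found {}", f.name, f.expected, f.found);
        }
    }
}
```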
- Oct 18, 2023
Serban Iorga authored
Fellowship companion: https://github.com/polkadot-fellows/runtimes/pull/65

This starts the BEEFY client by default for Polkadot nodes. A governance/sudo call is later required to enable/start consensus. Part of https://github.com/paritytech/parity-bridges-common/issues/2420
- Sep 27, 2023
Chris Sosnin authored
- Async-backing related primitives are stable (`primitives::v6`)
- The async-backing API is now part of `api_version(7)`
- It's enabled on the Rococo and Westend runtimes

--------- Signed-off-by:
Andrei Sandu <andrei-mihail@parity.io> Co-authored-by:
Andrei Sandu <54316454+sandreim@users.noreply.github.com>
- Sep 19, 2023
Bastian Köcher authored
This pull request removes the Polkadot and Kusama native runtimes from the polkadot node. This brings some implications with it:
- There are no more kusama/polkadot-dev chain specs available. We will need to write some tooling in the fellowship repo to provide them easily.
- The try-runtime job for Polkadot & Kusama is not available anymore, as we don't have the dev chain specs anymore.
- Certain benchmarking commands will also not work until we migrate them to use a runtime API.
- Some crates in utils still depend on the Polkadot/Kusama native runtime and will also need to be fixed.

Port of: https://github.com/paritytech/polkadot/pull/7467
- Sep 15, 2023
Rahul Subramaniyam authored
Submit the outstanding PRs from the old repos (these were already reviewed and approved before the repo reorg, but not yet submitted). Main PR: https://github.com/paritytech/substrate/pull/14014. Companion PRs: https://github.com/paritytech/polkadot/pull/7134, https://github.com/paritytech/cumulus/pull/2489

The changes in the PR:
1. `ChainSync` currently calls into the block request handler directly. Instead, move the block request handler behind a trait. This allows new protocols to be plugged into `ChainSync`.
2. `BuildNetworkParams` is changed so that custom relay protocol implementations can be (optionally) passed in during network creation time. If a custom protocol is not specified, it defaults to the existing block handler.
3. `BlockServer` and `BlockDownloader` traits are introduced for the protocol implementation. The existing block handler has been changed to implement these traits.
4. Other changes:
   - [x] Make `TxHash` serializable. This is needed for exchanging the serialized hash in the relay protocol messages.
   - [x] Clean up types no longer used (`OpaqueBlockRequest`, `OpaqueBlockResponse`).

--------- Co-authored-by:
Dmitry Markin <dmitry@markin.tech> Co-authored-by: command-bot <>