- Apr 25, 2024
-
-
Alexandru Gheorghe authored
... a few sessions too late :(, this already happened on polkadot, so as of now there are no known relay-chains without async backing enabled in runtime, but let's fix it in case someone else wants to repeat our steps. Fixes: https://github.com/paritytech/polkadot-sdk/issues/4226 --------- Signed-off-by:
Alexandru Gheorghe <[email protected]>
-
- Apr 18, 2024
-
-
Alexandru Gheorghe authored
The `next_retry_time` gets populated when a request receives an error timeout or any other error, after thatn next_retry would check all requests in the queue returns the smallest one, which then gets used to move the main loop by creating a Delay ``` futures_timer::Delay::new(instant.saturating_duration_since(Instant::now())).await, ``` However when we retry a task for the first time we still keep it in the queue an mark it as in flight so its next_retry_time would be the oldest and it would be small than `now`, so the Delay will always triggers, so that would make the main loop essentially busy wait untill we received a response for the retry request. Fix this by excluding the tasks that are already in-flight. --------- Signed-off-by:
Alexandru Gheorghe <[email protected]> Co-authored-by:
Andrei Sandu <[email protected]>
-
- Apr 08, 2024
-
-
Aaro Altonen authored
[litep2p](https://github.com/altonen/litep2p) is a libp2p-compatible P2P networking library. It supports all of the features of `rust-libp2p` that are currently being utilized by Polkadot SDK. Compared to `rust-libp2p`, `litep2p` has a quite different architecture which is why the new `litep2p` network backend is only able to use a little of the existing code in `sc-network`. The design has been mainly influenced by how we'd wish to structure our networking-related code in Polkadot SDK: independent higher-levels protocols directly communicating with the network over links that support bidirectional backpressure. A good example would be `NotificationHandle`/`RequestResponseHandle` abstractions which allow, e.g., `SyncingEngine` to directly communicate with peers to announce/request blocks. I've tried running `polkadot --network-backend litep2p` with a few different peer configurations and there is a noticeable reduction in networking CPU usage. For high load (`--out-peers 200`), networking CPU usage goes down from ~110% to ~30% (80 pp) and for normal load (`--out-peers 40`), the usage goes down from ~55% to ~18% (37 pp). These should not be taken as final numbers because: a) there are still some low-hanging optimization fruits, such as enabling [receive window auto-tuning](https://github.com/libp2p/rust-yamux/pull/176 ), integrating `Peerset` more closely with `litep2p` or improving memory usage of the WebSocket transport b) fixing bugs/instabilities that incorrectly cause `litep2p` to do less work will increase the networking CPU usage c) verification in a more diverse set of tests/conditions is needed Nevertheless, these numbers should give an early estimate for CPU usage of the new networking backend. This PR consists of three separate changes: * introduce a generic `PeerId` (wrapper around `Multihash`) so that we don't have use `NetworkService::PeerId` in every part of the code that uses a `PeerId` * introduce `NetworkBackend` trait, implement it for the libp2p network stack and make Polkadot SDK generic over `NetworkBackend` * implement `NetworkBackend` for litep2p The new library should be considered experimental which is why `rust-libp2p` will remain as the default option for the time being. This PR currently depends on the master branch of `litep2p` but I'll cut a new release for the library once all review comments have been addresses. --------- Signed-off-by:
Alexandru Vasile <[email protected]> Co-authored-by:
Dmitry Markin <[email protected]> Co-authored-by:
Alexandru Vasile <[email protected]> Co-authored-by:
Alexandru Vasile <[email protected]>
-
- Apr 03, 2024
-
-
Andrei Sandu authored
fixes https://github.com/paritytech/polkadot-sdk/issues/3775 Additionally moves the claim queue fetch utilities into `subsystem-util`. TODO: - [x] fix tests - [x] add elastic scaling tests --------- Signed-off-by:
Andrei Sandu <[email protected]>
-
- Feb 12, 2024
-
-
Alexandru Gheorghe authored
On grid distribution messages have two paths of reaching a node, so there is the possiblity of a race when two peers send each other the same statement around the same time. Statement local_knowledge will tell us that the peer should have not send the statement because we sent it to it. Fix it by also keeping track only of the statement we received from a given peer and penalize it only if it sends it to us more than once. Fixes: https://github.com/paritytech/polkadot-sdk/issues/2346 Additionally, also use different Cost labels for different paths to make it easier to debug things. --------- Signed-off-by:
Alexandru Gheorghe <[email protected]>
-
- Jan 29, 2024
-
-
Alexandru Gheorghe authored
Topology is coming only at the beginning of each session, so we might lose it if prospective parachains was not enabled at the begining of the session, so cache it for later use. Fixes: https://github.com/paritytech/polkadot-sdk/issues/3058 --------- Signed-off-by:
Alexandru Gheorghe <[email protected]>
-
- Jan 10, 2024
-
-
Alin Dima authored
Previously, it was only possible to retry the same request on a different protocol name that had the exact same binary payloads. Introduce a way of trying a different request on a different protocol if the first one fails with Unsupported protocol. This helps with adding new req-response versions in polkadot while preserving compatibility with unupgraded nodes. The way req-response protocols were bumped previously was that they were bundled with some other notifications protocol upgrade, like for async backing (but that is more complicated, especially if the feature does not require any changes to a notifications protocol). Will be needed for implementing https://github.com/polkadot-fellows/RFCs/pull/47 TODO: - [x] add tests - [x] add guidance docs in polkadot about req-response protocol versioning
-
ordian authored
Closes #1591. The purpose of this PR is filter out backing statements from the network signed by disabled validators. This is just an optimization, since we will do filtering in the runtime in #1863 to avoid nodes to filter garbage out at block production time. - [x] Ensure it's ok to fiddle with the mask of manifests - [x] Write more unit tests - [x] Test locally - [x] simple zombienet test - [x] PRDoc --------- Co-authored-by:
Tsvetomir Dimitrov <[email protected]>
-
- Dec 13, 2023
-
-
Alexandru Gheorghe authored
Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701 Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178 v0: https://github.com/paritytech/polkadot/pull/7554, ## Overall idea When approval-voting checks a candidate and is ready to advertise the approval, defer it in a per-relay chain block until we either have MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what candidates we have available. This should allow us to reduce the number of approvals messages we have to create/send/verify. The parameters are configurable, so we should find some values that balance: - Security of the network: Delaying broadcasting of an approval shouldn't but the finality at risk and to make sure that never happens we won't delay sending a vote if we are past 2/3 from the no-show time. - Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 & MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from the measurements we did on versi, it bottlenecks approval-distribution/approval-voting when increase significantly the number of validators and parachains - Block storage: In case of disputes we have to import this votes on chain and that increase the necessary storage with MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that disputes are not the normal way of the network functioning and we will limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this should be good enough. Alternatively, we could try to create a better way to store this on-chain through indirection, if that's needed. ## Other fixes: - Fixed the fact that we were sending random assignments to non-validators, that was wrong because those won't do anything with it and they won't gossip it either because they do not have a grid topology set, so we would waste the random assignments. - Added metrics to be able to debug potential no-shows and mis-processing of approvals/assignments. ## TODO: - [x] Get feedback, that this is moving in the right direction. @ordian @sandreim @eskimor @burdges, let me know what you think. - [x] More and more testing. - [x] Test in versi. - [x] Make MAX_APPROVAL_COALESCE_COUNT & MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration. - [x] Make sure the backwards compatibility works correctly - [x] Make sure this direction is compatible with other streams of work: https://github.com/paritytech/polkadot-sdk/issues/635 & https://github.com/paritytech/polkadot-sdk/issues/742 - [x] Final versi burn-in before merging --------- Signed-off-by:
Alexandru Gheorghe <[email protected]>
-
- Dec 05, 2023
-
-
Marcin S. authored
-
- Nov 14, 2023
-
-
Kristian Sosnin authored
Fixes #1437 Co-authored-by:
Sophia Gold <[email protected]>
-
- Oct 21, 2023
-
-
asynchronous rob authored
in-progress PR adding new tests and solving bugs --------- Co-authored-by:
Bradley Olson <[email protected]> Co-authored-by:
eskimor <[email protected]> Co-authored-by:
eskimor <[email protected]> Co-authored-by:
Andrei Sandu <[email protected]>
-
- Sep 27, 2023
-
-
Chris Sosnin authored
- Async-backing related primitives are stable `primitives::v6` - Async-backing API is now part of `api_version(7)` - It's enabled on Rococo and Westend runtimes --------- Signed-off-by:
Andrei Sandu <[email protected]> Co-authored-by:
Andrei Sandu <[email protected]>
-