Commits · aceda4659509d426d364188fa72555de58b887ba · parity / Mirrored projects / polkadot-sdk

Apr 25, 2024

Fix polkadot parachains not producing blocks until next session (#4269) · 239a23d9

Alexandru Gheorghe authored Apr 25, 2024

... a few sessions too late :(, this already happened on polkadot, so as
of now there are no known relay-chains without async backing enabled in
runtime, but let's fix it in case someone else wants to repeat our
steps.

Fixes: https://github.com/paritytech/polkadot-sdk/issues/4226



---------

Signed-off-by: Alexandru Gheorghe <[email protected]>

239a23d9

Apr 18, 2024

Fix next_retry busy waiting on first retry (#4192) · 0e552893

Alexandru Gheorghe authored Apr 18, 2024



The `next_retry_time` gets populated when a request receives an error
timeout or any other error, after thatn next_retry would check all
requests in the queue returns the smallest one, which then gets used to
move the main loop by creating a Delay
```
futures_timer::Delay::new(instant.saturating_duration_since(Instant::now())).await,
```

However when we retry a task for the first time we still keep it in the
queue an mark it as in flight so its next_retry_time would be the oldest
and it would be small than `now`, so the Delay will always triggers, so
that would make the main loop essentially busy wait untill we received a
response for the retry request.

Fix this by excluding the tasks that are already in-flight.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>

0e552893

Apr 10, 2024

net/strategy: Log bad peerId from on_validated_block_announce (#4051) · cd010925

Alexandru Vasile authored Apr 10, 2024

This tiny PR extends the `on_validated_block_announce` log with the bad
PeerID.
Used to identify if the peerID is malicious by correlating with other
logs (ie peer-set).

While at it, have removed the `\n` from a multiline log, which did not
play well with
[sub-triage-logs](https://github.com/lexnv/sub-triage-logs/tree/master

).

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>

cd010925

Apr 08, 2024

Integrate litep2p into Polkadot SDK (#2944) · 80616f6d

Aaro Altonen authored Apr 08, 2024

[litep2p](https://github.com/altonen/litep2p) is a libp2p-compatible P2P
networking library. It supports all of the features of `rust-libp2p`
that are currently being utilized by Polkadot SDK.

Compared to `rust-libp2p`, `litep2p` has a quite different architecture
which is why the new `litep2p` network backend is only able to use a
little of the existing code in `sc-network`. The design has been mainly
influenced by how we'd wish to structure our networking-related code in
Polkadot SDK: independent higher-levels protocols directly communicating
with the network over links that support bidirectional backpressure. A
good example would be `NotificationHandle`/`RequestResponseHandle`
abstractions which allow, e.g., `SyncingEngine` to directly communicate
with peers to announce/request blocks.

I've tried running `polkadot --network-backend litep2p` with a few
different peer configurations and there is a noticeable reduction in
networking CPU usage. For high load (`--out-peers 200`), networking CPU
usage goes down from ~110% to ~30% (80 pp) and for normal load
(`--out-peers 40`), the usage goes down from ~55% to ~18% (37 pp).

These should not be taken as final numbers because:

a) there are still some low-hanging optimization fruits, such as
enabling [receive window
auto-tuning](https://github.com/libp2p/rust-yamux/pull/176

), integrating
`Peerset` more closely with `litep2p` or improving memory usage of the
WebSocket transport
b) fixing bugs/instabilities that incorrectly cause `litep2p` to do less
work will increase the networking CPU usage
c) verification in a more diverse set of tests/conditions is needed

Nevertheless, these numbers should give an early estimate for CPU usage
of the new networking backend.

This PR consists of three separate changes:
* introduce a generic `PeerId` (wrapper around `Multihash`) so that we
don't have use `NetworkService::PeerId` in every part of the code that
uses a `PeerId`
* introduce `NetworkBackend` trait, implement it for the libp2p network
stack and make Polkadot SDK generic over `NetworkBackend`
  * implement `NetworkBackend` for litep2p

The new library should be considered experimental which is why
`rust-libp2p` will remain as the default option for the time being. This
PR currently depends on the master branch of `litep2p` but I'll cut a
new release for the library once all review comments have been
addresses.

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>

80616f6d

Deprecate `para_id()` from `CoreState` in polkadot primitives (#3979) · 59f868d1

Tsvetomir Dimitrov authored Apr 08, 2024

With Coretime enabled we can no longer assume there is a static 1:1
mapping between core index and para id. This mapping should be obtained
from the scheduler/claimqueue on block by block basis.

This PR modifies `para_id()` (from `CoreState`) to return the scheduled
`ParaId` for occupied cores and removes its usages in the code.

Closes https://github.com/paritytech/polkadot-sdk/issues/3948



---------

Co-authored-by: Andrei Sandu <[email protected]>

59f868d1

Apr 03, 2024

Add ClaimQueue wrapper (#3950) · 0f4e849e

Andrei Sandu authored Apr 03, 2024



Remove `fetch_next_scheduled_on_core` in favor of new wrapper and
methods for accessing it.

---------

Signed-off-by: Andrei Sandu <[email protected]>

0f4e849e

statement-distribution: fix filtering of statements for elastic parachains (#3879) · e8e201f0

Andrei Sandu authored Apr 03, 2024

fixes https://github.com/paritytech/polkadot-sdk/issues/3775



Additionally moves the claim queue fetch utilities into
`subsystem-util`.

TODO:
- [x] fix tests
- [x] add elastic scaling tests

---------

Signed-off-by: Andrei Sandu <[email protected]>

e8e201f0

Apr 01, 2024

primitives: Move out of staging released APIs (#3925) · d6f68bb9

Alexandru Gheorghe authored Apr 01, 2024



Runtime release 1.2 includes bumping of the ParachainHost APIs up to
v10, so let's move all the released APIs out of vstaging folder, this PR
does not include any logic changes only renaming of the modules and some
moving around.

Signed-off-by: Alexandru Gheorghe <[email protected]>

d6f68bb9

Mar 26, 2024

Fix spelling mistakes across the whole repository (#3808) · 002d9260

Dcompoze authored Mar 26, 2024

**Update:** Pushed additional changes based on the review comments.

**This pull request fixes various spelling mistakes in this
repository.**

Most of the changes are contained in the first **3** commits:

- `Fix spelling mistakes in comments and docs`

- `Fix spelling mistakes in test names`

- `Fix spelling mistakes in error messages, panic messages, logs and
tracing`

Other source code spelling mistakes are separated into individual
commits for easier reviewing:

- `Fix the spelling of 'authority'`

- `Fix the spelling of 'REASONABLE_HEADERS_IN_JUSTIFICATION_ANCESTRY'`

- `Fix the spelling of 'prev_enqueud_messages'`

- `Fix the spelling of 'endpoint'`

- `Fix the spelling of 'children'`

- `Fix the spelling of 'PenpalSiblingSovereignAccount'`

- `Fix the spelling of 'PenpalSudoAccount'`

- `Fix the spelling of 'insufficient'`

- `Fix the spelling of 'PalletXcmExtrinsicsBenchmark'`

- `Fix the spelling of 'subtracted'`

- `Fix the spelling of 'CandidatePendingAvailability'`

- `Fix the spelling of 'exclusive'`

- `Fix the spelling of 'until'`

- `Fix the spelling of 'discriminator'`

- `Fix the spelling of 'nonexistent'`

- `Fix the spelling of 'subsystem'`

- `Fix the spelling of 'indices'`

- `Fix the spelling of 'committed'`

- `Fix the spelling of 'topology'`

- `Fix the spelling of 'response'`

- `Fix the spelling of 'beneficiary'`

- `Fix the spelling of 'formatted'`

- `Fix the spelling of 'UNKNOWN_PROOF_REQUEST'`

- `Fix the spelling of 'succeeded'`

- `Fix the spelling of 'reopened'`

- `Fix the spelling of 'proposer'`

- `Fix the spelling of 'InstantiationNonce'`

- `Fix the spelling of 'depositor'`

- `Fix the spelling of 'expiration'`

- `Fix the spelling of 'phantom'`

- `Fix the spelling of 'AggregatedKeyValue'`

- `Fix the spelling of 'randomness'`

- `Fix the spelling of 'defendant'`

- `Fix the spelling of 'AquaticMammal'`

- `Fix the spelling of 'transactions'`

- `Fix the spelling of 'PassingTracingSubscriber'`

- `Fix the spelling of 'TxSignaturePayload'`

- `Fix the spelling of 'versioning'`

- `Fix the spelling of 'descendant'`

- `Fix the spelling of 'overridden'`

- `Fix the spelling of 'network'`

Let me know if this structure is adequate.

**Note:** The usage of the words `Merkle`, `Merkelize`, `Merklization`,
`Merkelization`, `Merkleization`, is somewhat inconsistent but I left it
as it is.

~~**Note:** In some places the term `Receival` is used to refer to
message reception, IMO `Reception` is the correct word here, but I left
it as it is.~~

~~**Note:** In some places the term `Overlayed` is used instead of the
more acceptable version `Overlaid` but I also left it as it is.~~

~~**Note:** In some places the term `Applyable` is used instead of the
correct version `Applicable` but I also left it as it is.~~

**Note:** Some usage of British vs American english e.g. `judgement` vs
`judgment`, `initialise` vs `initialize`, `optimise` vs `optimize` etc.
are both present in different places, but I suppose that's
understandable given the number of contributors.

~~**Note:** There is a spelling mistake in `.github/CODEOWNERS` but it
triggers errors in CI when I make changes to it, so I left it as it
is.~~

002d9260

Feb 26, 2024

Add more debug logs to understand if statement-distribution misbehaving (#3419) · b9c792a4

Alexandru Gheorghe authored Feb 26, 2024

Add more debug logs to understand if statement-distribution is in a bad
state, should be useful for debugging
https://github.com/paritytech/polkadot-sdk/issues/3314

 on production
networks.

Additionally, increase the number of parallel requests should make,
since I notice that requests take around 100ms on kusama, and the 5
parallel request was picked mostly random, no reason why we can do more
than that.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: ordian <[email protected]>

b9c792a4

Feb 12, 2024

statement-distribution: Fix CostMinor("Unexpected Statement") (#3223) · 8362a681

Alexandru Gheorghe authored Feb 12, 2024

On grid distribution messages have two paths of reaching a node, so
there is the possiblity of a race when two peers send each other the
same statement around the same time. Statement local_knowledge will tell
us that the peer should have not send the statement because we sent it
to it.

Fix it by also keeping track only of the statement we received from a
given peer and penalize it only if it sends it to us more than once.

Fixes: https://github.com/paritytech/polkadot-sdk/issues/2346



Additionally, also use different Cost labels for different paths to make
it easier to debug things.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>

8362a681

Jan 29, 2024

statement-distribution: fix parachains stalling on async_backing enablement (#3063) · 987edd88

Alexandru Gheorghe authored Jan 29, 2024

Topology is coming only at the beginning of each session, so we might
lose it if prospective parachains was not enabled at the begining of the
session, so cache it for later use.

Fixes: https://github.com/paritytech/polkadot-sdk/issues/3058



---------

Signed-off-by: Alexandru Gheorghe <[email protected]>

987edd88

Jan 10, 2024

add fallback request for req-response protocols (#2771) · f2a750ee

Alin Dima authored Jan 10, 2024

Previously, it was only possible to retry the same request on a
different protocol name that had the exact same binary payloads.

Introduce a way of trying a different request on a different protocol if
the first one fails with Unsupported protocol.

This helps with adding new req-response versions in polkadot while
preserving compatibility with unupgraded nodes.

The way req-response protocols were bumped previously was that they were
bundled with some other notifications protocol upgrade, like for async
backing (but that is more complicated, especially if the feature does
not require any changes to a notifications protocol). Will be needed for
implementing https://github.com/polkadot-fellows/RFCs/pull/47

TODO:
- [x]  add tests
- [x] add guidance docs in polkadot about req-response protocol
versioning

f2a750ee

statement-distribution: validator disabling (#1841) · a4195326

ordian authored Jan 10, 2024



Closes #1591.

The purpose of this PR is filter out backing statements from the network
signed by disabled validators. This is just an optimization, since we
will do filtering in the runtime in #1863 to avoid nodes to filter
garbage out at block production time.

- [x] Ensure it's ok to fiddle with the mask of manifests
- [x] Write more unit tests
- [x] Test locally
- [x] simple zombienet test
- [x] PRDoc

---------

Co-authored-by: Tsvetomir Dimitrov <[email protected]>

a4195326

Dec 13, 2023

Approve multiple candidates with a single signature (#1191) · a84dd0db

Alexandru Gheorghe authored Dec 13, 2023

Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,

## Overall idea

When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.

This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:

- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.

## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.

## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x]  Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742


- [x] Final versi burn-in before merging

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>

a84dd0db

Dec 05, 2023
- statement-distribution: Add tests for incoming acknowledgements (#2498) · f240e025
  Marcin S. authored Dec 05, 2023
  
  f240e025
Nov 14, 2023
- statement-distribution: support inactive local validator in grid (#1571) · 31c38cea
  Kristian Sosnin authored Nov 14, 2023
```
Fixes #1437

Co-authored-by: Sophia Gold <[email protected]>
```
  31c38cea
Nov 06, 2023

approval-voting improvement: include all tranche0 assignments in one certificate (#1178) · 0570b6fa

Andrei Sandu authored Nov 06, 2023

**_PR migrated from https://github.com/paritytech/polkadot/pull/6782_** 

This PR will upgrade the network protocol to version 3 -> VStaging which
will later be renamed to V3. This version introduces a new kind of
assignment certificate that will be used for tranche0 assignments.
Instead of issuing/importing one tranche0 assignment per candidate,
there will be just one certificate per relay chain block per validator.
However, we will not be sending out the new assignment certificates,
yet. So everything should work exactly as before. Once the majority of
the validators have been upgraded to the new protocol version we will
enable the new certificates (starting at a specific relay chain block)
with a new client update.

There are still a few things that need to be done:

- [x] Use bitfield instead of Vec<CandidateIndex>:
https://github.com/paritytech/polkadot/pull/6802


  - [x] Fix existing approval-distribution and approval-voting tests
  - [x] Fix bitfield-distribution and statement-distribution tests
  - [x] Fix network bridge tests
  - [x] Implement todos in the code
  - [x] Add tests to cover new code
  - [x] Update metrics
  - [x] Remove the approval distribution aggression levels: TBD PR
  - [x] Parachains DB migration 
  - [x] Test network protocol upgrade on Versi
  - [x] Versi Load test
  - [x] Add Zombienet test
  - [x] Documentation updates
- [x] Fix for sending DistributeAssignment for each candidate claimed by
a v2 assignment (warning: Importing locally an already known assignment)
 - [x]  Fix AcceptedDuplicate
 - [x] Fix DB migration so that we can still keep old data.
 - [x] Final Versi burn in

---------

Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Alexandru Gheorghe <[email protected]>

0570b6fa

Oct 21, 2023

Vstaging statement distribution omnibus (#1436) · a46183c7

asynchronous rob authored Oct 21, 2023



in-progress PR adding new tests and solving bugs

---------

Co-authored-by: Bradley Olson <[email protected]>
Co-authored-by: eskimor <[email protected]>
Co-authored-by: eskimor <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>

a46183c7

Sep 27, 2023

Migrate polkadot-primitives to v6 (#1543) · 7cbe0c76

Chris Sosnin authored Sep 27, 2023



- Async-backing related primitives are stable `primitives::v6`
- Async-backing API is now part of `api_version(7)`
- It's enabled on Rococo and Westend runtimes

---------

Signed-off-by: Andrei Sandu <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>

7cbe0c76