Commit 0db3088d authored by Peter Goodspeed-Niklaus, committed by GitHub

runtime upgrade (#70)



* Initial commit

* Update to 3e65111

* Add cfg_attr ... no_std

* Fix version

* WIP: add really simple validate_block insert validity check

* WIP: create a parachain upgrade pallet

This pallet will eventually make life much easier for people attempting
to upgrade parachains on their validator nodes, but for the moment,
key sections remain unimplemented while dependency details are worked
out.

* Implement basic admin-auth pallet functionality.

This compiles, which means it's probably mostly correct. However,
it's pretty far from being finished. Work yet to come:

- Integrate with the democracy pallet somehow to eliminate the
  requirement for the root user to initiate this process.
- Figure out what to do in the event that the parachain blocks
  and relay chain blocks get out of sync / delayed.
- Add testing... somehow. (What's reasonable to test?)

Open questions:

- Is the block number parameter in `on_initialize` the parachain
  block number, or the relay chain block number? If, as I suspect,
  it's the parachain block number, how do we deal with the fact that
  the real upgrade should happen on a very specific parachain block
  number?
- In line 68, is it reasonable to use `if n >= apply_block`, or should
  that line require strict equality?
- Is it reasonable to store/retrieve `CurrentBlockNumber` on every block,
  or is there a more streamlined way to pass that data between functions?
  Maybe it can be inserted into `struct Module` somehow?
- Can we somehow parametrize ValidationUpgradeDelayBlocks by T in
  order to eliminate the `.into()` call?
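
To make the `n >= apply_block` question concrete, here is a minimal sketch of the two candidate checks. All names are hypothetical and this is not the pallet's code; it only illustrates how the two comparisons differ if the exact target block is never observed.

```rust
/// Hypothetical check: apply once the current block has reached or passed
/// the scheduled block.
fn should_apply_at_or_after(n: u64, apply_block: u64) -> bool {
    n >= apply_block
}

/// Hypothetical check: apply only on exactly the scheduled block.
fn should_apply_exactly(n: u64, apply_block: u64) -> bool {
    n == apply_block
}

/// Returns the first block number in `seen` for which `check` fires, if any.
/// `seen` stands in for the block numbers the pallet actually observes.
fn first_applied(
    seen: &[u64],
    apply_block: u64,
    check: fn(u64, u64) -> bool,
) -> Option<u64> {
    seen.iter().copied().find(|&n| check(n, apply_block))
}
```

With `>=`, an upgrade scheduled for a block that is skipped still fires at the next observed block; with strict equality it never fires in that case.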

* use a better storage name

* Add checks ensuring runtime versions increase

Largely cribbed from https://github.com/paritytech/substrate/blob/a439a7aa5a9a3df2a42d9b25ea04288d3a0866e8/frame/system/src/lib.rs#L467-L494
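
As a rough illustration of what "versions increase" can mean, the sketch below rejects an upgrade unless the spec name matches and the (spec version, impl version) pair strictly increases. The struct, the error names, and the exact rule are assumptions made for this example; they are a simplification and not a copy of the linked substrate code.

```rust
/// Hypothetical, simplified stand-in for a runtime version record.
struct RuntimeVersion {
    spec_name: &'static str,
    spec_version: u32,
    impl_version: u32,
}

/// Hypothetical error type for rejected upgrades.
#[derive(Debug, PartialEq)]
enum VersionError {
    SpecNameMismatch,
    VersionNotIncreased,
}

/// Accept `new` only if it names the same runtime and is strictly newer.
fn check_upgrade(current: &RuntimeVersion, new: &RuntimeVersion) -> Result<(), VersionError> {
    if current.spec_name != new.spec_name {
        return Err(VersionError::SpecNameMismatch);
    }
    // Tuples compare lexicographically: spec_version first, then impl_version.
    if (new.spec_version, new.impl_version) <= (current.spec_version, current.impl_version) {
        return Err(VersionError::VersionNotIncreased);
    }
    Ok(())
}
```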

* fix tests

* WIP: add tests from frame/system set_code

Currently doesn't build: line 230 is the problem. Removing or
commenting that line results in the new tests failing due to a
missing block number. Adding it, in an attempt to fix the problem,
fails to compile with this error:

```
   Compiling parachain-upgrade-pallet v2.0.0 (/home/coriolinus/Documents/Projects/coriolinus/parachain-upgrade-pallet)
error[E0599]: no function or associated item named `set_block_number` found for struct `Module<tests::Test>` in the current scope
   --> src/lib.rs:230:21
    |
47  | / decl_module! {
48  | |     pub struct Module<T: Trait> for enum Call where origin: T::Origin {
49  | |         // Initializing events
50  | |         // this is needed only if you are using events in your pallet
...   |
100 | |     }
101 | | }
    | |_- function or associated item `set_block_number` not found for this
...
230 |               System::set_block_number(123);
    |                       ^^^^^^^^^^^^^^^^
    |                       |
    |                       function or associated item not found in `Module<tests::Test>`
    |                       help: there is a method with a similar name: `current_block_number`
    |
    = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

error: aborting due to previous error

For more information about this error, try `rustc --explain E0599`.
error: could not compile `parachain-upgrade-pallet`.
```

That error is very weird, because the function does in fact exist:
https://github.com/paritytech/substrate/blob/a439a7aa5a9a3df2a42d9b25ea04288d3a0866e8/frame/system/src/lib.rs#L897

* cause tests to pass

Turns out that in fact there was some setup required in order to
get everything testing properly, but now we have a set of passing
unit tests which test some of the more common error cases.

* Add overlapping upgrades test

This currently fails, and I don't yet know why. TODO!

* Fix some logic errors

- In particular, only remove the pending validation function from
  storage when it's time to apply it.
- Don't store our own copy of the current block number.

* WIP: delegate most code upgrade permissions checks

They're defined in System::can_set_code, so may as well use them.

Unfortunately, the tests all fail for this right now, and I don't
yet understand why. Pushing to get immutable line number references.

* fix tests after delegating runtime checks to can_set_code

* WIP: events test

Right now, the events struct doesn't seem to contain enough information
to validate the particular events that we should have fired. Almost
certainly, this is a usage error on my part.

* fully initialize and finalize in event test

This doesn't change the results, though.

* fix events test

This was complicated to figure out. For the record, testing events
requires:

- a private module which publicly exports the crate's event type
- impl_outer_event macro to construct an enum wrapping the event
  types from the listed modules
- system::Trait and local Trait both declare `type Event = TestEvent;`
- (optional) group events with `System::<Test>::initialize(...)` and
  `System::<Test>::finalize()` calls.

It's not entirely clear why both events show up during the initialization
phase; my suspicion is that this is an artifact of not mocking a
particular extrinsic, such that they end up in initialization by default.

* cleanup and move crate to subdirectory

this prepares us to merge this pallet into the cumulus repo

* provisionally incorporate polkadot changes to hook everything together

This feels like the logical next step, and compiles, at least. Still,
there are some big TODOs remaining:

- merge the polkadot PR upstream and reset the polkadot branch in
  `runtime/Cargo.toml`
- in `runtime/src/validate_block/implementation.rs:116`, we should
  almost certainly return `Some(something)` sometime. When, precisely,
  and how to keep track of the appropriate data are all still open
  questions.

* WIP: further updates to work with the polkadot implementation

Hopefully we can upstream `ValidationFunctionParams` into the
polkadot trait defs so we can just copy the struct out of
`ValidationParams`, but no huge loss if not.

This should be more or less everything required at this level.
Next up: fix up `pallet-parachain-upgrade` so it reads from
`VALIDATION_FUNCTION_PARAMS` to determine upgrade legality and
upgrade block, and writes to `NEW_VALIDATION_CODE` when appropriate.

* update pallet-parachain-upgrade appropriately to handle new expectations

Implements the pallet side of the new flow. Basic tests appear to work.

Next up:

- make the "real blob" test work
- add a bunch of additional tests of all the corners

* remove test which set a real WASM blob

This test didn't directly test any of the code in this pallet;
it existed because we were just copying tests out of the substrate
implementation. Now that we have real code of our own to test,
(and because it's not compatible with the `BlockTests` abstraction,)
we don't need that test anymore.

Also added a `Drop` impl to `BlockTests` ensuring they get run at
least once.
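
A minimal sketch of the run-on-drop idea, assuming a much simpler `BlockTests` than the real one: the fields, the `run` method, and the shared counter are hypothetical and exist only so the behavior can be observed.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical, stripped-down stand-in for the `BlockTests` helper.
struct BlockTests {
    /// Shared counter recording how many times the tests executed.
    runs: Rc<Cell<u32>>,
    /// Whether `run` has already been called on this instance.
    ran: bool,
}

impl BlockTests {
    fn new(runs: Rc<Cell<u32>>) -> Self {
        BlockTests { runs, ran: false }
    }

    /// Execute the configured tests (here: just bump the counter).
    fn run(&mut self) {
        self.ran = true;
        self.runs.set(self.runs.get() + 1);
    }
}

impl Drop for BlockTests {
    /// If nobody called `run` explicitly, run once before being dropped.
    fn drop(&mut self) {
        if !self.ran {
            self.run();
        }
    }
}
```

The point of this shape is that a test which builds a `BlockTests` and forgets to call `run` still executes its checks instead of silently passing.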

* add test that storage gets manipulated as expected

* get validate_block tests compiling again

* Check validation function size against polkadot parameters

Generate a user-handlable error message if it's too big, so that
nothing panics later.
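
A sketch of the kind of check described here, returning an error the caller can handle instead of panicking. The function name, the error type, and the `max_code_size` parameter are assumptions for illustration, not the pallet's actual API.

```rust
/// Hypothetical error returned when a validation function is rejected.
#[derive(Debug, PartialEq)]
enum UpgradeError {
    /// The blob is larger than the relay chain allows.
    TooBig { size: usize, max: usize },
}

/// Reject a validation function blob that exceeds `max_code_size` bytes.
fn check_code_size(code: &[u8], max_code_size: u32) -> Result<(), UpgradeError> {
    let max = max_code_size as usize;
    if code.len() > max {
        Err(UpgradeError::TooBig { size: code.len(), max })
    } else {
        Ok(())
    }
}
```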

* demonstrate that block tests run

* don't actually store any magic values in parachain storage

We're allowed to use it as a transport layer between validate_block
and the parachain upgrade pallet, but actually editing it or, in
particular, attempting to persist data there which didn't originate
in the extrinsic, breaks things.

This means that we can't keep the :code insertion check, because
the validate_block layer doesn't know when it is legal to actually
upgrade the parachain. However, the rest of the features survive,
and all tests currently pass, so I'm counting it as a win.

Next up: look into adding an inherent which publishes the
ValidationFunctionParams struct to arbitrary pallets.

* Add reference to polkadot_client to Collator

This enables us to get the validation function parameters at
runtime, which unblocks creating an inherent containing them.

* remove unused imports

* Remove VFPX; build VFP from existing data structures

I almost don't want to know how long both global_validation
and local_validation have existed in the produce_candidate
function signature; they were precisely what I needed,
without needing to add anything to the Collator struct at all.

Oh well, at least I noticed it before putting the PR up for review.

Next up: create a proper inherent definition for the
ValidationFunctionParams.

* WIP: add cumulus-validation-function-params crate

Modeled on the substrate timestamp crate.
It's not currently obvious to me why it is desirable to publish
an entire crate for what amounts to a single const definition;
going to ask about that.

* refactor: get rid of validation-function-params crate

Everything about the VFPs has been moved into a module of runtime

* WIP: get VFP from inherent, when possible

Doesn't compile for weird trait errors; probable next steps: just
copy over the relevant code directly.

* ensure VFPs are available during block production and validation

* cleanup in preparation for review request

* Copy cumulus-primitives crate from bkchr-message-broker

That branch is visible as #80; this message copies the crate as of
d4b2045573796955de4e5bf8f74b6c48b44c3bee.

This isn't even a cherry-pick, because the commit which introduced
the primitives crate also did some work which from the perspective
of this PR is irrelevant. With any luck, by copying the crate directly,
there won't be too many merge conflicts when the second of these
open PRs is merged.

* move mod validation_function_params to cumulus_primitives

There is some very weird behavior going on with cargo check: every
individual crate checks out fine, as verified with this invocation:

for crate in $(fd Cargo.toml | xargs dirname); do
    if [ "$crate" == . ] || [[ "$crate" == *test* ]]; then continue; fi;
    name=$(toml2json "$crate/Cargo.toml" | jq -r '.package.name')
    if ! cargo check -p "$name" >/dev/null 2>/dev/null; then
        echo "failed to build $name"
    fi
done

However, `cargo check .` no longer works; it is suddenly demanding
clang in order to build an indirect dependency. I'm not going to
keep messing around with this anymore; it's more profitable for the
moment to knock out the rest of the requested changes. Still, this
behavior is weird enough that I really don't have any idea why
it might be happening.

* convert indentation to tabs

* rename parachain upgrade pallet per PR review

* use compact form for dependencies

* remove pallet readme

Move pertinent documentation into pallet's rustdoc.

* Add weight data in compliance with updated substrate requirements

The substrate API changed, so now we _have_ to invent some kind of
weight data for the dispatchables. This commit does so, with the
caveat that the numbers are pulled straight out of nowhere. Benchmarking
remains a TODO item.

* use anonymous fatal error type for brevity

* Create, use a Call for setting VFPs

Modeled on Timestamp; makes the ProvideInherent impl work much better.

* fix pallet tests

* Apply suggestions from code review

Co-Authored-By: Bastian Köcher <[email protected]>

* fix formatting

* add license header

* refactor primitive inherents / keys into appropriate modules

* impl From<(GlobalValidationSchedule, LocalValidationData)> for ValidationFunctionParams
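
A sketch of what such a conversion might look like. The fields on all three structs are assumptions chosen for illustration; the real polkadot and cumulus types may carry different or additional data.

```rust
/// Hypothetical subset of the relay chain's global schedule.
struct GlobalValidationSchedule {
    max_code_size: u32,
    block_number: u32,
}

/// Hypothetical subset of the parachain-specific validation data.
struct LocalValidationData {
    code_upgrade_allowed: Option<u32>,
}

/// Hypothetical shape of the parameters handed to the upgrade pallet.
#[derive(Debug, PartialEq)]
struct ValidationFunctionParams {
    max_code_size: u32,
    relay_chain_height: u32,
    code_upgrade_allowed: Option<u32>,
}

impl From<(GlobalValidationSchedule, LocalValidationData)> for ValidationFunctionParams {
    fn from((global, local): (GlobalValidationSchedule, LocalValidationData)) -> Self {
        ValidationFunctionParams {
            max_code_size: global.max_code_size,
            relay_chain_height: global.block_number,
            code_upgrade_allowed: local.code_upgrade_allowed,
        }
    }
}
```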

* extract inherent data vfp injection into a function

* collapse parachain dependency into compact form

* always store vfps under same storage key

* fix docs

* use minimum weight for VFP inherent

* rename module methods for clarity

* fix tests: set_code -> schedule_upgrade

* Apply pending validation function at inherent creation, not init

Initialization happens before inherent creation, which means that
at the time of `on_initialize`, the VFPs for the current block
have not yet been loaded. This is a problem, because it means that
updates would happen one block late, every time.

Moving that logic into inherent creation means that we always have
current information to work with.
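
The one-block delay can be reproduced with a toy model. Everything below is hypothetical: `Pallet`, `ApplyAt`, and `simulate` are invented for this sketch and only mimic the ordering of initialization and inherent processing within a block.

```rust
/// Where the "apply pending upgrade" logic runs within a block.
#[derive(Clone, Copy, PartialEq)]
enum ApplyAt {
    Initialize,
    Inherent,
}

/// Toy pallet state.
struct Pallet {
    /// Relay height from the most recently processed inherent.
    latest_relay_height: Option<u64>,
    /// Relay height at which the pending upgrade becomes legal.
    pending_apply_at: Option<u64>,
    /// Parachain block in which the upgrade was applied.
    applied_in_block: Option<u64>,
}

impl Pallet {
    fn try_apply(&mut self, block: u64) {
        if let (Some(height), Some(apply_at)) = (self.latest_relay_height, self.pending_apply_at) {
            if height >= apply_at {
                self.pending_apply_at = None;
                self.applied_in_block = Some(block);
            }
        }
    }
}

/// Process `blocks` given as (parachain block, relay height) pairs and
/// report the parachain block in which the upgrade was applied.
fn simulate(mode: ApplyAt, apply_at: u64, blocks: &[(u64, u64)]) -> Option<u64> {
    let mut pallet = Pallet {
        latest_relay_height: None,
        pending_apply_at: Some(apply_at),
        applied_in_block: None,
    };
    for &(block, relay_height) in blocks {
        // Initialization runs first and only sees the previous block's data.
        if mode == ApplyAt::Initialize {
            pallet.try_apply(block);
        }
        // The inherent then delivers this block's parameters.
        pallet.latest_relay_height = Some(relay_height);
        if mode == ApplyAt::Inherent {
            pallet.try_apply(block);
        }
    }
    pallet.applied_in_block
}
```

For blocks `[(1, 10), (2, 11), (3, 12)]` and an upgrade legal at relay height 11, the initialize variant applies in block 3 while the inherent variant applies in block 2.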

* typo: default_features -> default-features

* do not panic in create_inherent

* revert f741cf0f2bc; don't change behavior, but use correct spelling

* move block initialization logic from inherent creation into the inherent

* re-disable default features

It is very difficult to come up with a coherent theory of what's
going on with these default features. Builds were all broken as of
3eb1618. Renaming them in f741cf0 seemed to fix that behavior.
Then they broke again locally, prompting aaee1c0. This commit
restores the status quo as of f741cf0; with any luck, the build
will succeed both locally and in CI.

* regenerate Cargo.lock

This updates several packages, but by inspection, they are all published
crates from crates.io; by semver, this should not cause any behavioral
changes.

This also updates the lockfile format to the new format.

The point of this commit is to deal with the fact that `sc-client`
no longer exists.

* fix checks given new dependencies

Appropriate weight declarations have changed; this follows them,
still using timestamp examples.

Note that these weights are almost certainly wrong.

* fix tests given new dependencies

* add another check preventing block VFPs from contaminating validity checks

* Add OnValidationFunctionParams trait so other modules can callback

There isn't yet an obvious use case for other modules to get the
validation function params from this one, but we may as well support
it on the off chance.
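
A sketch of the callback shape, under the assumption that the trait has a single notification method. The method name, the `Params` struct, and the instance-based listener are hypothetical simplifications of whatever the real trait looks like.

```rust
/// Hypothetical, reduced validation function parameters.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Params {
    relay_chain_height: u32,
}

/// Callback invoked whenever fresh parameters arrive for the current block.
trait OnValidationFunctionParams {
    fn on_validation_function_params(&mut self, params: Params);
}

/// Example listener that simply records everything it is told.
struct Recorder {
    seen: Vec<Params>,
}

impl OnValidationFunctionParams for Recorder {
    fn on_validation_function_params(&mut self, params: Params) {
        self.seen.push(params);
    }
}

/// Stand-in for the pallet storing new parameters and notifying a listener.
fn set_params<L: OnValidationFunctionParams>(params: Params, listener: &mut L) {
    listener.on_validation_function_params(params);
}
```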

* Add get_validation_function_params

This getter allows other modules to simply get the validation
function parameters, if they have been updated for this block.
Otherwise, it returns None, and they can try again later.
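
One way to picture the getter's contract, as a sketch with invented state: the parameters are tagged with the block in which they were stored, and the getter only returns them while that block is still current. The `Pallet` struct and its helper methods are hypothetical.

```rust
/// Hypothetical, reduced validation function parameters.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Params {
    relay_chain_height: u32,
}

/// Toy pallet state for illustrating the getter.
#[derive(Default)]
struct Pallet {
    current_block: u64,
    /// Parameters together with the block in which they were stored.
    stored: Option<(u64, Params)>,
}

impl Pallet {
    fn start_block(&mut self, n: u64) {
        self.current_block = n;
    }

    fn set_params(&mut self, params: Params) {
        self.stored = Some((self.current_block, params));
    }

    /// Returns the parameters only if they were updated in the current block.
    fn get_validation_function_params(&self) -> Option<Params> {
        match self.stored {
            Some((block, params)) if block == self.current_block => Some(params),
            _ => None,
        }
    }
}
```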

* upgrade substrate: panic on div by 0

* Apply whitespace suggestions from code review

These suggestions should make no semantic difference.

Co-authored-by: Bastian Köcher <[email protected]>

* Apply semantic suggestions from code review

These changes affect the semantics of the code; I'll follow up by ensuring that everything still works.

Co-authored-by: Bastian Köcher <[email protected]>

* add documentation to ValidationFunction type

* remove panicking private fn validation_function_params()

* expect validation function params to be in inherent data

* move OnValidationFunctionParams to primitives

* resolve weird formatting

* move mod validation_function_params into its own file

* add license to new file

Co-authored-by: Ricardo Rius <[email protected]>
Co-authored-by: Joshy Orndorff <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
parent e04772dc