# Testing

Automated testing is an essential tool to assure correctness.

## Scopes

The testing strategy for Polkadot is four-fold:

### Unit testing (1)

Boring, small-scale correctness tests of individual functions.

### Integration tests

There are two variants of integration tests:

#### Subsystem tests (2)

One particular subsystem (the subsystem under test) interacts with a
mocked overseer that is made to assert incoming and outgoing messages
of the subsystem under test.
This is largely present today, but the evolved integration test
implementation has some fragmentation. A `proc-macro`/`macro_rules` would allow
for more consistent implementation and structure.
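
For illustration, here is a minimal, self-contained sketch of the mocked-overseer pattern using plain `std` channels in place of the real overseer plumbing; the message enums and the subsystem itself are hypothetical stand-ins, not the actual overseer API:

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
enum FromOverseer {
	Signal(u32),
}

#[derive(Debug, PartialEq)]
enum ToOverseer {
	Ack(u32),
}

// The subsystem under test: acknowledges every signal it receives.
fn subsystem_under_test(rx: mpsc::Receiver<FromOverseer>, tx: mpsc::Sender<ToOverseer>) {
	for msg in rx {
		match msg {
			FromOverseer::Signal(n) => tx.send(ToOverseer::Ack(n)).unwrap(),
		}
	}
}

#[test]
fn acks_every_signal() {
	let (overseer_tx, subsystem_rx) = mpsc::channel();
	let (subsystem_tx, overseer_rx) = mpsc::channel();
	let handle = thread::spawn(move || subsystem_under_test(subsystem_rx, subsystem_tx));

	// The mocked overseer drives the subsystem and asserts its responses.
	overseer_tx.send(FromOverseer::Signal(1)).unwrap();
	assert_eq!(overseer_rx.recv().unwrap(), ToOverseer::Ack(1));

	// Closing the channel terminates the subsystem loop.
	drop(overseer_tx);
	handle.join().unwrap();
}
```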

#### Behavior tests (3)

Launching small-scale networks with multiple adversarial nodes, without any further tooling required.
This should include tests around the thresholds, in order to evaluate the error handling once certain
assumed invariants fail.

For this purpose, the tests are based on the `AllSubsystems` type and the `AllSubsystemsGen` `proc-macro`.

This assumes a simplistic test runtime.

#### Testing at scale (4)

Launching many nodes with configurable network speed and node features in a cluster of nodes.
At this scale the [Simnet][simnet] comes into play, which launches a full cluster of nodes.
The scale is handled by spawning a Kubernetes cluster; the meta description
is covered by [Gurke][Gurke].
Asserts are made using Grafana rules, based on the existing Prometheus metrics. This can
be extended by adding an additional service that translates `jaeger` spans into additional
Prometheus metrics, avoiding further changes to the Polkadot source.

_Behavior tests_ and _testing at scale_ have a naturally soft boundary.
The most significant difference is the presence of a real network and
the number of nodes, since a single host is often not capable of running
multiple nodes at once.

---

## Coverage

Coverage gives a _hint_ of which source lines are actually covered by tests and test applications.

The state of the art is currently [tarpaulin][tarpaulin], which unfortunately yields a
lot of false negatives: lines that are in fact covered are marked as uncovered, since a mere
line break within a statement can cause these artifacts. This leads to
lower coverage percentages than there actually are.

Since late 2020 Rust has gained [MIR based coverage tooling](
https://blog.rust-lang.org/inside-rust/2020/11/12/source-based-code-coverage.html).

```sh
# setup
rustup component add llvm-tools-preview
cargo install grcov miniserve

export CARGO_INCREMENTAL=0
# wasm is not happy with the instrumentation
export SKIP_BUILD_WASM=true
export BUILD_DUMMY_WASM_BINARY=true
# the actually collected coverage data
export LLVM_PROFILE_FILE="llvmcoveragedata-%p-%m.profraw"
# build wasm without instrumentation
export WASM_TARGET_DIRECTORY=/tmp/wasm
cargo +nightly build
# required rust flags
export RUSTFLAGS="-Zinstrument-coverage"
# assure target dir is clean
rm -r target/{debug,tests}
# run tests to get coverage data
cargo +nightly test --all

# create the *html* report out of all the test binaries
# mostly useful for local inspection
grcov . --binary-path ./target/debug -s . -t html --branch --ignore-not-existing -o ./coverage/
miniserve -r ./coverage

# create a *codecov* compatible report
grcov . --binary-path ./target/debug/ -s . -t lcov --branch --ignore-not-existing --ignore "/*" -o lcov.info
```

The test coverage in `lcov` format can then be published to <https://codecov.io>.

```sh
bash <(curl -s https://codecov.io/bash) -f lcov.info
```

or just printed as part of the PR using a GitHub Action, e.g. [`jest-lcov-reporter`](https://github.com/marketplace/actions/jest-lcov-reporter).

For full examples of how to use `grcov` with Polkadot specifics, see [the GitHub repo](https://github.com/mozilla/grcov#coverallscodecov-output).

## Fuzzing

Fuzzing is an approach to verify correctness against arbitrary or partially structured inputs.

Currently implemented fuzzing targets:

* `erasure-coding`
* `bridges/storage-proof`

The tooling of choice here is `honggfuzz-rs`, as it achieves the _fastest_ coverage according to "some paper", which is a positive feature when run as part of PRs.
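
For reference, a `honggfuzz-rs` target takes roughly this shape (run via `cargo hfuzz run <target>`); `decode` here is a hypothetical stand-in for the API under test:

```rust
use honggfuzz::fuzz;

// Hypothetical stand-in for the decoder under test.
fn decode(data: &[u8]) -> Result<Vec<u8>, ()> {
	if data.first() == Some(&0x00) {
		Err(())
	} else {
		Ok(data.to_vec())
	}
}

fn main() {
	loop {
		fuzz!(|data: &[u8]| {
			// Whatever the input, the decoder must never panic;
			// errors are fine, crashes are findings.
			let _ = decode(data);
		});
	}
}
```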

Fuzzing is generally not applicable to data secured by cryptographic hashes or signatures; the input
has to be specifically crafted such that the percentage of discarded inputs stays in an acceptable range.
System-level fuzzing is hence simply not feasible due to the amount of state that is required.

Other candidate fuzzing targets are:

* `rpc`
* ...

## Performance metrics

There are various ways of gathering performance metrics.

* timing with `criterion`
* cache hits/misses w/ `iai` harness or `criterion-perf`
* `coz` a causal profiler

Most of them are standard tools to aid in the creation of statistical tests regarding changes in the execution time of certain unit tests.
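
As an example, a timing test with `criterion` takes roughly this shape; `reconstruct` is a hypothetical stand-in for the code under measurement:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical stand-in for the code whose timing we want to track.
fn reconstruct(chunks: &[u8]) -> usize {
	chunks.iter().map(|c| *c as usize).sum()
}

fn bench_reconstruct(c: &mut Criterion) {
	let chunks = vec![0xAB_u8; 1024 * 1024];
	c.bench_function("reconstruct 1 MiB", |b| {
		// `black_box` keeps the compiler from optimizing the call away.
		b.iter(|| reconstruct(black_box(&chunks)))
	});
}

criterion_group!(benches, bench_reconstruct);
criterion_main!(benches);
```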

`coz` is meant for runtime profiling. In our case, the system is far too large to yield a sufficient number of measurements in finite time.
An alternative approach could be to record incoming packet streams per subsystem and store dumps of them, which in turn could be replayed repeatedly at an
accelerated speed, in order to obtain enough metrics to yield
information on which areas would improve the metrics.
Unfortunately this would not yield much information: most if not all of the subsystem code is linear, consuming input messages to generate one or multiple output messages, so it is unlikely to obtain useful metrics without mocking a sufficiently large part of the other subsystems. That overlaps with [#Integration tests], which are unfortunately not repeatable as of now.
As such the expected gain for the effort seems low, and this is not pursued at the current time.

## Writing small scope integration tests with preconfigured workers

Requirements:

* spawn nodes with preconfigured behaviors
* allow multiple types of configuration to be specified
* allow extendability via external crates
* ...
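
Purely as an illustration of these requirements, a per-node configuration might take a shape like the following sketch; none of these names exist in the codebase:

```rust
/// Which behavior strain a spawned node runs.
enum Behavior {
	Default,
	DropAll,
	DuplicateSend,
}

/// Per-node test configuration; external crates could extend this by
/// providing their own behavior types behind a trait object instead
/// of this closed enum.
struct NodeConfig {
	name: String,
	behavior: Behavior,
	/// Artificial network latency applied to this node, in milliseconds.
	latency_ms: u64,
}
```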

---

## Implementation of different behavior strain nodes

### Goals

The main goal is to allow creating a test node which
exhibits a certain behavior by utilizing a subset of _wrapped_ or _replaced_ subsystems easily.
The runtime must not matter at all for these tests and should be simplistic.
The execution must be fast; this mostly means assuring close to zero network latency as
well as shortening the block time and epoch times down to a few `100ms` and a few dozen blocks per epoch.
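
In terms of the usual Substrate runtime parameters, such a tuned test runtime might look like this sketch; the constant names follow Substrate conventions, the values are illustrative only:

```rust
// Illustrative values only; a real test runtime would pick its own.
pub const MILLISECS_PER_BLOCK: u64 = 500;
pub const SLOT_DURATION: u64 = MILLISECS_PER_BLOCK;
// A few dozen slots per epoch instead of the production-scale epoch length.
pub const EPOCH_DURATION_IN_SLOTS: u32 = 24;
```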

### Approach

#### MVP

A simple, small-scale builder pattern would suffice for a stage-one implementation that allows replacing
individual subsystems.
An alternative would be to harness the existing `AllSubsystems` type
and replace the subsystems as needed.
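
The core of that alternative is a type-changing setter per subsystem, roughly as in this reduced sketch (one field instead of the full subsystem set):

```rust
// Reduced sketch: the real type has one generic parameter per subsystem.
struct AllSubsystems<CV> {
	candidate_validation: CV,
}

impl<CV> AllSubsystems<CV> {
	/// Swap out candidate validation, keeping all other subsystems as-is.
	fn replace_candidate_validation<NewCV>(self, replacement: NewCV) -> AllSubsystems<NewCV> {
		AllSubsystems { candidate_validation: replacement }
	}
}
```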

#### Full `proc-macro` implementation

`Overseer` is a common pattern.
It could be extracted as a generative `proc-macro`.
This would replace the `AllSubsystems` type as well as implicitly create
the `AllMessages` enum, as `AllSubsystemsGen` does today.

The implementation is yet to be completed, see the [implementation PR](https://github.com/paritytech/polkadot/pull/2962) for details.

##### Declare an overseer implementation

```rust
struct BehaveMaleficient;

impl OverseerGen for BehaveMaleficient {
	fn generate<'a, Spawner, RuntimeClient>(
		&self,
		args: OverseerGenArgs<'a, Spawner, RuntimeClient>,
	) -> Result<(Overseer<Spawner, Arc<RuntimeClient>>, OverseerHandler), Error>
	where
		RuntimeClient: 'static + ProvideRuntimeApi<Block> + HeaderBackend<Block> + AuxStore,
		RuntimeClient::Api: ParachainHost<Block> + BabeApi<Block> + AuthorityDiscoveryApi<Block>,
		Spawner: 'static + SpawnNamed + Clone + Unpin,
	{
		let spawner = args.spawner.clone();
		let leaves = args.leaves.clone();
		let runtime_client = args.runtime_client.clone();
		let registry = args.registry.clone();
		let candidate_validation_config = args.candidate_validation_config.clone();
		// modify the subsystem(s) as needed, or spawn an entirely new set:
		let all_subsystems = create_default_subsystems(args)?.replace_candidate_validation(
			// create the filtered subsystem
			FilteredSubsystem::new(
				CandidateValidationSubsystem::with_config(
					candidate_validation_config,
					Metrics::register(registry)?,
				),
				// an implementation of the message filtering strategy
				Skippy::default(),
			),
		);

		Overseer::new(leaves, all_subsystems, registry, runtime_client, spawner)
			.map_err(|e| e.into())

		// A builder pattern will simplify this further
		// WIP https://github.com/paritytech/polkadot/pull/2962
	}
}

fn main() -> eyre::Result<()> {
	color_eyre::install()?;
	let cli = Cli::from_args();
	assert_matches::assert_matches!(cli.subcommand, None);
	polkadot_cli::run_node(cli, BehaveMaleficient)?;
	Ok(())
}
```

[`variant-a`](../node/malus/src/variant-a.rs) is a fully working example.

#### Simnet

Spawn a Kubernetes cluster based on a meta description using [Gurke] with the
[Simnet] scripts.

Coordinated attacks by multiple nodes or subsystems must be made possible via
a side-channel, which is out of scope for this document.

The individual node configurations are done as targets with a particular
builder configuration.

#### Behavior tests w/o Simnet

Commonly this will require multiple nodes, and most machines are limited to
running two or three nodes concurrently.
Hence, this is not the common case and is just an implementation _idea_.

```rust
behavior_testcase!{
"TestRuntime" =>
"Alice": <AvailabilityDistribution=DropAll, .. >,
"Bob": <AvailabilityDistribution=DuplicateSend, .. >,
"Charles": Default,
"David": "Charles",
"Eve": "Bob",
}
```

[Gurke]: https://github.com/paritytech/gurke
[tarpaulin]: https://github.com/xd009642/tarpaulin
[simnet]: https://github.com/paritytech/simnet_scripts