Pipelines · Optimize finalization performance (#4922) (221eddc9) · Commits · parity / Mirrored projects / polkadot-sdk

Unverified Commit 221eddc9 authored Jul 05, 2024 by

Nazar Mokrynskyi Committed by GitHub Jul 05, 2024

Optimize finalization performance (#4922)

This PR largely fixes
https://github.com/paritytech/polkadot-sdk/issues/4903 by addressing it
from a few different directions.

The high-level observation is that complexity of finalization was
unfortunately roughly `O(n^3)`. Not only
`displaced_leaves_after_finalizing` was extremely inefficient on its
own, especially when large ranges of blocks were involved, it was called
once upfront and then on every single block that was finalized over and
over again.

The first commit refactores code adjacent to
`displaced_leaves_after_finalizing` to optimize memory allocations. For
example things like `BTreeMap<_, Vec<_>>` were very bad in terms of
number of allocations and after analyzing code paths was completely
unnecessary and replaced with `Vec<(_, _)>`. In other places allocations
of known size were not done upfront and some APIs required unnecessary
cloning of vectors.

I checked invariants and didn't find anything that was violated after
refactoring.

Second commit completely replaces `displaced_leaves_after_finalizing`
implementation with a much more efficient one. In my case with ~82k
blocks and ~13k leaves it takes ~5.4s to finish
`client.apply_finality()` now.

The idea is to avoid querying the same blocks over and over again as
well as introducing temporary local cache for blocks related to leaves
above block that is being finalized as well as local cache of the
finalized branch of the chain. I left some comments in the code and
wrote tests that I belive should check all code invariants for
correctness. `lowest_common_ancestor_multiblock` was removed as
unnecessary and not great in terms of performance API, domain-specific
code should be written instead like done in
`displaced_leaves_after_finalizing`.

After these changes I noticed finalization is still horribly slow,
turned out that even though `displaced_leaves_after_finalizing` was way
faster that before (probably order of magnitude), it was called for
every single of those 82k blocks 🤦



The quick hack I came up with in the third commit to handle this edge
case was to not call it when finalizing multiple blocks at once until
the very last moment. It works and allows to finish the whole
finalization in just 14 seconds (5.4+5.4 of which are two calls to
`displaced_leaves_after_finalizing`). I'm really not happy with the fact
that `displaced_leaves_after_finalizing` is called twice, but much
heavier refactoring would be necessary to get rid of second call.

---

Next steps:
* assuming the changes are acceptable I'll write prdoc
* https://github.com/paritytech/polkadot-sdk/pull/4920 or something
similar in spirit should be implemented to unleash efficient parallelsm
with rayon in `displaced_leaves_after_finalizing`, which will allow to
further (and significant!) scale its performance rather that being
CPU-bound on a single core, also reading database sequentially should
ideally be avoided
* someone should look into removal of the second
`displaced_leaves_after_finalizing` call
* further cleanups are possible if `undo_finalization` can be removed

---

Polkadot Address: 1vSxzbyz2cJREAuVWjhXUT1ds8vBzoxn2w4asNpusQKwjJd

---------

Co-authored-by: Sebastian Kunert <[email protected]>

parent 2b17cb8f

Pipeline #484231 waiting for manual action with stages

in 58 minutes and 45 seconds