Skip to content
  1. Apr 09, 2022
    • Sergei Shulepov's avatar
      Fixes the dead lock when any of the channels get at capacity. (#5297) · b89dc00e
      Sergei Shulepov authored
      The PVF host is designed to avoid spawning tasks to minimize knowledge
      of outer code. Using `async_std::task::spawn` (or Tokio's counterpart)
      deemed unacceptable, `SpawnNamed` undesirable. Instead there is only one
      task returned that is spawned by the candidate-validation subsystem.
      The tasks from the sub-components are polled by that root task.
      
      However, the way the tasks are bundled was incorrect. There was a giant
      select that was polling those tasks. Particularly, that implies that as soon as
      one of the arms of that select goes into await those sub-tasks stop
      getting polled. This is a recipe for a deadlock which indeed happened
      here.
      
      Specifically, the deadlock happened during sending messages to the
      execute queue by calling
      [`send_execute`](https://github.com/paritytech/polkadot/blob/a68d9be35656dcd96e378fd9dd3d613af754d48a/node/core/pvf/src/host.rs#L601).
      When the channel to the queue reaches the capacity, the control flow is
      suspended until the queue handles those messages. Since this code is
      essentially reached from [one of the select
      arms](https://github.com/paritytech/polkadot/blob/a68d9be35656dcd96e378fd9dd3d613af754d48a/node/core/pvf/src/host.rs#L371),
      the queue won't be given the control and thus no further progress can be
      made.
      
      This problem is solved by bundling the tasks one level higher instead,
      by `selecting` over those long-running tasks.
      
      We also stop treating returning from those long-running tasks as error
      conditions, since that can happen during legit shutdown.
      b89dc00e
  2. Apr 08, 2022
  3. Apr 07, 2022
  4. Apr 06, 2022
  5. Apr 04, 2022
  6. Apr 03, 2022
  7. Apr 02, 2022
  8. Apr 01, 2022