    net/libp2p: Enforce outbound request-response timeout limits (#7222) · fd64a1e7
    Alexandru Vasile authored
    This PR enforces that outbound requests are finished within the
    specified protocol timeout.
    
    The stable2412 version running libp2p 0.52.4 contains a bug which does
    not track request timeouts properly:
    - https://github.com/libp2p/rust-libp2p/pull/5429
    
    The issue was detected while submitting libp2p -> litep2p requests
    on Kusama. This PR checks that pending outbound requests have not
    timed out. Although the issue has been fixed in libp2p, there might be
    other cases where this may happen. For example:
    - https://github.com/libp2p/rust-libp2p/pull/5417
    
    For more context see:
    https://github.com/paritytech/polkadot-sdk/issues/7076#issuecomment-2596085096
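
As a rough illustration of the idea (not the code in this PR), a force-timeout check can be sketched as a map of pending requests that is swept periodically. All names here (`PendingRequests`, `force_timeouts`, the `u64` request id) are hypothetical and chosen for the example only; the actual implementation lives in `sc-network` and uses libp2p's own types.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical request identifier, used only for this sketch.
type RequestId = u64;

/// Bookkeeping for one outbound request that has not completed yet.
struct Pending {
    started: Instant,
    timeout: Duration,
}

/// Tracks outbound requests so they can be terminated even if the
/// underlying library never reports a response or an error.
#[derive(Default)]
struct PendingRequests {
    inner: HashMap<RequestId, Pending>,
}

impl PendingRequests {
    /// Record a request sent at `started` with the protocol's timeout.
    fn insert(&mut self, id: RequestId, started: Instant, timeout: Duration) {
        self.inner.insert(id, Pending { started, timeout });
    }

    /// Called when a response arrives. Returns `false` if the request was
    /// already removed (for example by a force timeout), in which case the
    /// caller would only log a warning and drop the late response.
    fn complete(&mut self, id: RequestId) -> bool {
        self.inner.remove(&id).is_some()
    }

    /// Remove and return every request whose timeout has elapsed at `now`.
    /// Intended to be called from a periodic tick.
    fn force_timeouts(&mut self, now: Instant) -> Vec<RequestId> {
        let mut expired: Vec<RequestId> = self
            .inner
            .iter()
            .filter(|(_, p)| now.saturating_duration_since(p.started) >= p.timeout)
            .map(|(id, _)| *id)
            .collect();
        expired.sort_unstable();
        for id in &expired {
            self.inner.remove(id);
        }
        expired
    }
}
```

In this sketch, the caller would answer each id returned by `force_timeouts` with a timeout error. A response that arrives afterwards makes `complete` return `false`, which corresponds to the second, optional warning described in the notes below.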
    
    
    1. Ideally, the force-timeout mechanism in this PR should never be
    triggered in production. However, origin/stable2412 occasionally
    encounters this issue. When this happens, 2 warnings may be generated:
    - one warning introduced by this PR wrt force timeout terminating the
    request
    - possibly one warning when libp2p decides (if at all) to provide
    the response back to substrate (as mentioned by @alexggh
    [here](https://github.com/paritytech/polkadot-sdk/pull/7222/files#diff-052aeaf79fef3d9a18c2cfd67006aa306b8d52e848509d9077a6a0f2eb856af7L769)
    and
    [here](https://github.com/paritytech/polkadot-sdk/pull/7222/files#diff-052aeaf79fef3d9a18c2cfd67006aa306b8d52e848509d9077a6a0f2eb856af7L842))
    
    2. This implementation does not propagate to the substrate service the
    `RequestFinished { error: .. }`. That event is only used internally by
    substrate to increment metrics. However, we don't have the peer
    information available to propagate the event properly when we
    force-timeout the request. Considering this should most likely not
    happen in production (origin/master) and that we'll be able to extract
    information by warnings, I would say this is a good tradeoff for code
    simplicity:
    
    
    https://github.com/paritytech/polkadot-sdk/blob/06e3b5c6/substrate/client/network/src/service.rs#L1543
    
    
    ### Testing
    
    Added a new test to ensure the timeout is reached properly, even if
    libp2p does not produce a response in due time.
    
    I've also transitioned the tests to using `tokio::test` due to a
    limitation of
    [CI](https://github.com/paritytech/polkadot-sdk/actions/runs/12832055737/job/35784043867)
    
    ```
    --- TRY 1 STDERR:        sc-network request_responses::tests::max_response_size_exceeded ---
    thread 'request_responses::tests::max_response_size_exceeded' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/time/interval.rs:139:26:
    there is no reactor running, must be called from the context of a Tokio 1.x runtime
    ```
    
    
    
    cc @paritytech/networking
    
    ---------
    
    Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
    Co-authored-by: Bastian Köcher <git@kchr.de>