Skip to content
Snippets Groups Projects
user avatar
Alexandru Vasile authored
This PR enforces that outbound requests are finished within the
specified protocol timeout.

The stable2412 version running libp2p 0.52.4 contains a bug which does
not track request timeouts properly:
- https://github.com/libp2p/rust-libp2p/pull/5429

The issue has been detected while submitting libp2p -> litep2p requests
in kusama. This aims to check that pending outbound requests have not
timedout. Although the issue has been fixed in libp2p, there might be
other cases where this may happen. For example:
- https://github.com/libp2p/rust-libp2p/pull/5417

For more context see:
https://github.com/paritytech/polkadot-sdk/issues/7076#issuecomment-2596085096


1. Ideally, the force-timeout mechanism in this PR should never be
triggered in production. However, origin/stable2412 occasionally
encounters this issue. When this happens, 2 warnings may be generated:
- one warning introduced by this PR wrt force timeout terminating the
request
- possible one warning when the libp2p decides (if at all) to provide
the response back to substrate (as mentioned by @alexggh
[here](https://github.com/paritytech/polkadot-sdk/pull/7222/files#diff-052aeaf79fef3d9a18c2cfd67006aa306b8d52e848509d9077a6a0f2eb856af7L769)
and
[here](https://github.com/paritytech/polkadot-sdk/pull/7222/files#diff-052aeaf79fef3d9a18c2cfd67006aa306b8d52e848509d9077a6a0f2eb856af7L842)

2. This implementation does not propagate to the substrate service the
`RequestFinished { error: .. }`. That event is only used internally by
substrate to increment metrics. However, we don't have the peer
information available to propagate the event properly when we
force-timeout the request. Considering this should most likely not
happen in production (origin/master) and that we'll be able to extract
information by warnings, I would say this is a good tradeoff for code
simplicity:


https://github.com/paritytech/polkadot-sdk/blob/06e3b5c6

/substrate/client/network/src/service.rs#L1543


### Testing

Added a new test to ensure the timeout is reached properly, even if
libp2p does not produce a response in due time.

I've also transitioned the tests to using `tokio::test` due to a
limitation of
[CI](https://github.com/paritytech/polkadot-sdk/actions/runs/12832055737/job/35784043867)

```
--- TRY 1 STDERR:        sc-network request_responses::tests::max_response_size_exceeded ---
thread 'request_responses::tests::max_response_size_exceeded' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/time/interval.rs:139:26:
there is no reactor running, must be called from the context of a Tokio 1.x runtime
```



cc @paritytech/networking

---------

Signed-off-by: default avatarAlexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: default avatarBastian Köcher <git@kchr.de>
fd64a1e7