collator-protocol: Always stay connected to validators in backing group (#3544)
Looking at rococo-asset-hub
https://github.com/paritytech/polkadot-sdk/issues/3519, there seem to be
many instances where collators did not advertise their collations.
There are multiple problems there, and one of them is that we connect to
and disconnect from our assigned validators every block: on
reconnect_timeout, every 4s, we call connect_to_validators, which
produces 0 validators when all went well, so set_reserved_peers, called
from validator discovery, disconnects all our peers.
More details here:
https://github.com/paritytech/polkadot-sdk/issues/3519#issuecomment-1972667343
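The disconnect described above can be illustrated with a toy model. This is only a sketch, not the collator-protocol code: `ReservedSet` and the integer `PeerId` are hypothetical stand-ins, and the one property taken from the description is that setting the reserved peers replaces the whole set, so an empty request drops every peer.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a network peer identity.
type PeerId = u32;

/// Toy model of a reserved peer set with replace semantics (an assumption
/// for illustration, not the real network API).
#[derive(Default)]
struct ReservedSet {
    peers: HashSet<PeerId>,
}

impl ReservedSet {
    /// Replaces the whole reserved set and returns the peers that were
    /// dropped as a result, sorted for stable output.
    fn set_reserved_peers(&mut self, new: &[PeerId]) -> Vec<PeerId> {
        let new_set: HashSet<PeerId> = new.iter().copied().collect();
        let mut dropped: Vec<PeerId> =
            self.peers.difference(&new_set).copied().collect();
        dropped.sort_unstable();
        self.peers = new_set;
        dropped
    }
}

fn main() {
    let mut reserved = ReservedSet::default();

    // First tick: we ask to connect to our backing group.
    reserved.set_reserved_peers(&[1, 2, 3]);

    // Next reconnect tick: everything went well, so the request is empty,
    // and replacing the set disconnects the whole backing group.
    let dropped = reserved.set_reserved_peers(&[]);
    println!("disconnected: {:?}", dropped);
}
```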
On its own this shouldn't be a problem, but it stacks with an existing
bug in our network stack: if we disconnect from a peer, the peer might
not notice it, so it won't detect the reconnect either and won't send us
the necessary view updates. As a result, we won't advertise the
collation to it. More details here:
https://github.com/paritytech/polkadot-sdk/issues/3519#issuecomment-1972958276
To avoid hitting this condition so often, let's keep the peers in the
reserved set for the entire duration we are allocated to a backing
group. Backing group sizes (1 on rococo, 3 on kusama, 5 on polkadot) are
small, so this shouldn't lead to many connections. Additionally,
the validators would disconnect us anyway if we don't advertise
anything for 4 blocks.
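The idea of the fix can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this change: `peers_to_reserve`, the integer `ValidatorId`, and the `pending` argument (any extra validators we still want to reach) are hypothetical names. The point is that the backing group is always part of the request, so the reserved set never becomes empty while we are assigned to the group.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a validator identity.
type ValidatorId = u32;

/// Peers to request on every reconnect tick: the whole backing group plus
/// any extra pending validators, deduplicated and in ascending order.
fn peers_to_reserve(
    backing_group: &[ValidatorId],
    pending: &[ValidatorId],
) -> Vec<ValidatorId> {
    let set: BTreeSet<ValidatorId> = backing_group
        .iter()
        .chain(pending.iter())
        .copied()
        .collect();
    set.into_iter().collect()
}

fn main() {
    // Even with nothing pending, the backing group stays in the request,
    // so replacing the reserved set no longer drops those peers.
    let request = peers_to_reserve(&[1, 2, 3], &[]);
    println!("reserved on this tick: {:?}", request);
}
```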
## TODO
- [x] More testing.
- [x] Confirm on rococo that this is improving the situation. (It
doesn't, but only because other things are going wrong there.)
---------
Signed-off-by: Alexandru Gheorghe <[email protected]>