collator-protocol: Always stay connected to validators in backing group (#3544)
Looking at rococo-asset-hub (https://github.com/paritytech/polkadot-sdk/issues/3519), there seem to be a lot of instances where collators did not advertise their collations. There are multiple problems there; one of them is that we connect to and disconnect from our assigned validators every block. On reconnect_timeout, every 4s, we call connect_to_validators, which produces 0 validators when all went well, so set_reserved_peers, called from validator discovery, disconnects all our peers. More details here: https://github.com/paritytech/polkadot-sdk/issues/3519#issuecomment-1972667343

Now, this shouldn't be a problem, but it stacks with an existing bug in our network stack: if we disconnect from a peer, the peer might not notice it, so it won't detect the reconnect either and won't send us the necessary view updates, so we won't advertise the collation to it. More details here: https://github.com/paritytech/polkadot-sdk/issues/3519#issuecomment-1972958276

To avoid hitting this condition that often, let's keep the peers in the reserved set for the entire duration we are allocated to a backing group. Backing group sizes (1 rococo, 3 kusama, 5 polkadot) are really small, so this shouldn't lead to that many connections. Additionally, the validators would disconnect us anyway if we don't advertise anything for 4 blocks.

## TODO
- [x] More testing.
- [x] Confirm on rococo that this is improving the situation. (It doesn't, but only because other things are going wrong there.)

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
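
For illustration only, the reserved-set change described above can be sketched as a toy model. This is not the collator-protocol code: `PeerId`, `reserved_set_before`, `reserved_set_after` and `disconnected` are hypothetical names, and the assumption that the old request contained only validators we had not yet served is a simplification of what the linked comments describe.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a validator's network identity.
type PeerId = u32;

/// Old behaviour (simplified assumption): on every reconnect timeout we only
/// request the validators we have not served yet, so the requested set is
/// empty when all went well.
fn reserved_set_before(group: &[PeerId], already_served: &BTreeSet<PeerId>) -> BTreeSet<PeerId> {
    group
        .iter()
        .copied()
        .filter(|peer| !already_served.contains(peer))
        .collect()
}

/// New behaviour: keep the whole backing group reserved for as long as we
/// are assigned to it, regardless of what was already advertised.
fn reserved_set_after(group: &[PeerId]) -> BTreeSet<PeerId> {
    group.iter().copied().collect()
}

/// Peers that get dropped when the reserved set changes from `old` to `new`.
fn disconnected(old: &BTreeSet<PeerId>, new: &BTreeSet<PeerId>) -> BTreeSet<PeerId> {
    old.difference(new).copied().collect()
}
```

In this model, a group of three validators that have all been served yields an empty set under the old behaviour, so all three peers are dropped on the next timeout; under the new behaviour the set is unchanged and nobody is disconnected.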