    authority-discovery: Make changing of peer-id while active a bit more robust (#3786) · 6720279f
    Alexandru Gheorghe authored
    
    When nodes do not persist their node-key, or generate a new one while in
    the active set, things go wrong because both the old addresses and the
    new ones remain present in the DHT. Because of the distributed nature of
    the DHT, both survive in the network until the old ones expire, which
    takes 36 hours. In the meantime, nodes in the network will randomly
    resolve the authority-id to either the old address or the new one.
    
    More details in: https://github.com/paritytech/polkadot-sdk/issues/3673
    
    This PR proposes to mitigate this problem by:
    
    1. Let the query for a DHT key retrieve more than one result (4), a
    number that is also bounded by the replication factor, which is 20.
    Currently we interrupt the query on the first result.
    ~2. Modify the authority-discovery service to keep all the discovered
    addresses around for 24h since it last saw an address.~
    ~3. Plumb through other subsystems where the assumption was that an
    authority-id will resolve only to one PeerId. Currently,
    authority-discovery keeps just the last record it received from the DHT
    and queries the DHT every 10 minutes. But it could always receive only
    the old address, only the new address, or a flip-flop between them,
    depending on which node wins the race to provide the record.~
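
    A minimal sketch of the idea in step 1, not the actual networking code:
    instead of stopping at the first record, keep collecting until the
    wanted number of results is reached, capped by the replication factor.
    The function name `collect_records` and both constants are hypothetical
    names chosen for illustration; only the values 4 and 20 come from the
    description above.

    ```rust
    /// Assumed name: how many results one lookup should gather (4 above).
    const WANTED_RESULTS: usize = 4;
    /// Assumed name: the DHT replication factor (20 above).
    const REPLICATION_FACTOR: usize = 20;

    /// Collects up to `wanted` records from a stream of query responses.
    /// The limit never exceeds the replication factor, since no more than
    /// that many nodes are expected to store a given record.
    fn collect_records<I>(responses: I, wanted: usize) -> Vec<Vec<u8>>
    where
        I: IntoIterator<Item = Vec<u8>>,
    {
        let limit = wanted.min(REPLICATION_FACTOR);
        // `take` stops pulling responses once the limit is reached, which
        // mirrors finishing the query early instead of at the first result.
        responses.into_iter().take(limit).collect()
    }
    ```

    Calling `collect_records(responses, WANTED_RESULTS)` would then yield at
    most four records per lookup.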
    
    2. Extend the `SignedAuthorityRecord` with a signed creation_time.
    3. Modify authority discovery to keep track of nodes that sent us an
    old record and, once we are made aware of a new record, update the
    nodes we know about with the new record.
    4. Update gossip-support to try to resolve authorities more often than
    once every session.
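
    Steps 2 and 3 can be sketched roughly as follows. This is an
    illustration under stated assumptions, not the code in this PR:
    `Record`, `Tracker`, `on_record`, and the use of plain strings for
    authority and peer ids are all hypothetical, and `creation_time` is
    assumed to be a timestamp already verified as part of the signed
    record.

    ```rust
    use std::collections::{HashMap, HashSet};

    /// Hypothetical stand-in for a verified authority record.
    #[derive(Clone, Debug, PartialEq)]
    struct Record {
        addresses: Vec<String>,
        /// Assumed to be covered by the record signature.
        creation_time: u64,
    }

    #[derive(Default)]
    struct AuthorityState {
        newest: Option<Record>,
        /// Peers that served us the record currently considered newest.
        holders: HashSet<String>,
    }

    #[derive(Default)]
    struct Tracker {
        by_authority: HashMap<String, AuthorityState>,
    }

    impl Tracker {
        /// Handles a record received from `from_peer` and returns the peers
        /// that hold an outdated record and should be sent the newest one.
        fn on_record(&mut self, authority: &str, from_peer: &str, record: Record) -> Vec<String> {
            let state = self.by_authority.entry(authority.to_string()).or_default();
            let known = state.newest.as_ref().map(|r| r.creation_time);
            match known {
                // Older than what we have: only the sender is out of date.
                Some(t) if record.creation_time < t => vec![from_peer.to_string()],
                // Same record again: remember one more up-to-date holder.
                Some(t) if record.creation_time == t => {
                    state.holders.insert(from_peer.to_string());
                    Vec::new()
                }
                // First or newer record: adopt it; previous holders are stale.
                _ => {
                    let mut stale: Vec<String> = state.holders.drain().collect();
                    stale.sort();
                    state.holders.insert(from_peer.to_string());
                    state.newest = Some(record);
                    stale
                }
            }
        }

        /// Returns the newest record known for `authority`, if any.
        fn newest(&self, authority: &str) -> Option<&Record> {
            self.by_authority.get(authority).and_then(|s| s.newest.as_ref())
        }
    }
    ```

    With a signed creation time, the old and new records can be ordered
    regardless of which node answers first, so the flip-flop between
    addresses described above no longer depends on who wins the race.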
    
    ~This would give us a lot more chances for the nodes in the network to
    discover not only the old address of the node but also the new one,
    and should improve the time it takes for a node to be properly connected
    in the network. The behaviour won't be deterministic because there is no
    guarantee that all nodes will see the new record at least once, since
    they could query only nodes that have the old one.~
    
    
    ## TODO
    - [x] Add unit tests for the new paths.
    - [x] Make sure the implementation is backwards compatible.
    - [x] Evaluate whether there are any bad consequences of letting the
    query continue rather than finishing it at the first record found.
    - [x] Bake the new changes in versi.
    
    ---------
    
    Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
    Co-authored-by: Dmitry Markin <dmitry@markin.tech>
    Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>