    Automatic Example Collator (#67) · 5678c8a1
    Peter Goodspeed-Niklaus authored
    
    * add polkadot build script
    
    * Add scripting to bring up a simple alice-bob example net
    
    Demonstrated to produce blocks, but as of right now there's still
    trouble getting it to respond to external queries on its ports.
    
    * enable external rpc access to the nodes
    
    Also shrink the build context by excluding some extraneous data.
    
    * Ensure external RPC access works
    
    Also set default branch appropriately, and have the stop command
    clean itself up more thoroughly.
    
    * Add multi-stage dockerfile for building the cumulus-test-parachain-collator
    
    - Exclude the docker/ directory from build context because we're
      never going to build recursively, and this prevents spurious
      cache misses
    - build the parachain collator in three stages. The build stage
      is discarded; the collator stage has a wrapper script to simplify
      generating the right bootnodes flags, and the default stage
      has just the binary in a small runtime.
    - build_collator.sh collects appropriate build flags for the dockerfile
    - inject_bootnodes.sh discovers the testnet node IDs and inserts them
      into the arguments list for cumulus-test-parachain-collator
    
    * Add services which generate genesis state, run the collator
    
    - Ignore the scripts directory to reduce spurious cache misses.
    - Move inject_bootnodes.sh from the scripts directory into the root:
      It can't stay in the scripts directory, because that's ignored;
      I didn't want to invent _another_ top-level subdirectory for it.
      That decision could certainly be appealed, though.
    - Move docker-compose.yml, add dc.sh, modify *_collator.sh: by
      taking docker-compose.yml out of the root directory, we can
      further reduce cache misses. However, docker-compose normally
      has a strong expectation that docker-compose.yml exist in the
      project root; it takes a moderately complicated invocation to
      override that expectation. That override is encoded in dc.sh;
      the updates to the other scripts are just to use the override.
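
    As a rough illustration of the override described above, here is a
    minimal sketch of a dc.sh-style wrapper. It is not the actual dc.sh:
    the docker/ location of the compose file and the DOCKER_COMPOSE
    variable are assumptions made for the example.

    ```sh
    # Hypothetical dc.sh-style helper, not the script from this commit.
    # Assumption: docker-compose.yml now lives in <repo root>/docker/.
    # DOCKER_COMPOSE is a made-up override hook, handy for dry runs.
    dc() {
        root="$1"
        shift
        # -f points at the relocated compose file; --project-directory keeps
        # relative paths in that file resolving against the repo root.
        ${DOCKER_COMPOSE:-docker-compose} \
            -f "$root/docker/docker-compose.yml" \
            --project-directory "$root" \
            "$@"
    }
    ```

    With this in place, `dc "$PWD" up -d` behaves like a plain
    `docker-compose up -d` run against a compose file in the root.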
    
    The expectation as of now is that scripts/run_collator.sh runs
    both chain nodes and the collator, generates the genesis state
    into a volume with a transient container, and runs the collator
    as specified in the repo README.
    
    Upcoming work: steps 5 and 6 from the README.
    
    * Launch the collator node
    
    The biggest change here is adding the testing_net network to the
    collator node's networks list. This lets it successfully connect
    to the alice and bob nodes, which in turn lets it get their node IDs,
    which was the blocker for a long time.
    
    Remove httpie in favor of curl: makes for a smaller docker image,
    and has fewer weird failure modes within docker.
    
    Unfortunately this doesn't yet actually connect to the relay chain
    nodes; that's the next area to figure out.
    
    * enable external websocket access to indexer nodes
    
    * Reorganize for improved caching, again
    
    - Manually enumerate the set of source directories to copy when building.
      This bloats the cache a bit, but means that rebuilds on script changes
      don't bust that cache, which saves a _lot_ of time.
    - Un-.dockerignore the scripts directory; it's small and will no longer
      trigger cache misses.
    - Move inject_bootnodes.sh back into scripts directory for better organization.
    - inject_bootnodes.sh: use rpc port for rpc call and p2p port for
      generating the bootnode string. I'm not 100% sure this is correct,
      but upwards of 80% at least.
    - docker-compose.yml: reorganize the launch commands such that alice
      and bob still present the same external port mapping to the world,
      but within the docker-compose network, they both use the same
      (standard) p2p, rpc, and websocket ports. This makes life easier
      for inject_bootnodes.sh
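
    The rpc/p2p port split described above might look roughly like the
    sketch below. This is an illustration, not the real
    inject_bootnodes.sh: the function names are hypothetical, the ports
    are assumed to be the usual Substrate defaults, and using the
    `system_localPeerId` RPC method is an assumption about how the node
    ID could be fetched.

    ```sh
    # Hypothetical sketch, not the inject_bootnodes.sh from this commit.
    RPC_PORT=9933   # assumed: default Substrate HTTP RPC port
    P2P_PORT=30333  # assumed: default Substrate p2p port

    # Pull the "result" string out of a JSON-RPC response read from stdin.
    extract_result() {
        sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
    }

    # Ask the node at host $1 for its peer ID over the RPC port.
    # Assumption: the node answers the system_localPeerId RPC method.
    peer_id() {
        curl -s -H 'Content-Type: application/json' \
            -d '{"id":1,"jsonrpc":"2.0","method":"system_localPeerId","params":[]}' \
            "http://$1:$RPC_PORT" | extract_result
    }

    # Build a bootnode multiaddr from host $1 and peer ID $2, using the
    # p2p port rather than the rpc port.
    bootnode() {
        echo "/ip4/$1/tcp/$P2P_PORT/p2p/$2"
    }
    ```

    Usage would be along the lines of
    `bootnode 172.28.1.1 "$(peer_id 172.28.1.1)"`, with the result passed
    to the collator as a bootnodes argument.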
    
    The collator node still doesn't actually connect, but I think this
    commit still represents real progress in that direction.
    
    * Get the collator talking to the indexer nodes
    
    In the end, it was four characters: -- and two = signs in the
    launch arguments. They turn out to be critical characters for
    correct operation, though!
    
    Next up: automating step 5.
    
    * Add runtime stage to collect runtime wasm blob into volume
    
    We can't just copy the blob in the builder stage because the volumes
    aren't available at that point.
    
    Rewrite build_collator.sh into build_docker.sh and update for generality.
    
    * WIP: add registrar service and partial work to actually register the collator
    
    This is likely to be discarded; the Python library in use is 3rd party
    and not well documented, while the official polkadot-js repo has a
    CLI tool: https://github.com/polkadot-js/tools/tree/master/packages/api-cli
    
    * Add a parachain registrar which should properly register the parachain
    
    Doesn't work at the moment because it depends on two api-cli features
    which I added today, which have not yet made it out into a published
    release.
    
    Next up: figure out how to add the `api-cli` at its `master` branch,
    then run tests to ensure the collator is producing blocks. Then,
    automate the block production tests.
    
    * BROKEN attempt to demo registrar communication with the blockchain
    
    This is a really weird bug. After running `scripts/run_collator.sh`,
    which brings everything up, it's perfectly possible to get into
    a state very much like what the registrar is in, and communicate
    with the blockchain without issue:
    
    ```sh
    $ docker run --rm --net cumulus_testing_net para-reg:latest polkadot-js-api --ws ws://172.28.1.1:9944 query.sudo.key
    Thu 20 Feb 2020 12:19:20 PM CET
    {
      "key": "5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY"
    }
    ```
    
    However, the registrar itself, doing the same thing from within
    `register_para.sh`, is failing to find the right place in the network:
    
    ```
    /runtime/cumulus_test_parachain_runtime.compact.wasm found after 0 seconds
    /genesis/genesis-state found after 0 seconds
    2020-02-20 10:43:22          API-WS: disconnected from ws://172.28.1.1:9944 code: '1006' reason: 'connection failed'
    _Event {
      type: 'error',
      isTrusted: false,
      _yaeti: true,
      target: W3CWebSocket {
        _listeners: {},
        addEventListener: [Function: _addEventListener],
        removeEventListener: [Function: _removeEventListener],
        dispatchEvent: [Function: _dispatchEvent],
        _url: 'ws://172.28.1.1:9944',
        _readyState: 3,
        _protocol: undefined,
        _extensions: '',
        _bufferedAmount: 0,
        _binaryType: 'arraybuffer',
        _connection: undefined,
        _client: WebSocketClient {
          _events: [Object: null prototype] {},
          _eventsCount: 0,
          _maxListeners: undefined,
          config: [Object],
          _req: null,
          protocols: [],
          origin: undefined,
          url: [Url],
          secure: false,
          base64nonce: 'aJ6J3pYDz8l5owVWHGbzHg==',
          [Symbol(kCapture)]: false
        },
        onclose: [Function (anonymous)],
        onerror: [Function (anonymous)],
        onmessage: [Function (anonymous)],
        onopen: [Function (anonymous)]
      },
      cancelable: true,
      stopImmediatePropagation: [Function (anonymous)]
    }
    ```
    
    They should be connected to the same network, running the same
    image, doing the same call. The only difference is the file
    existence checks, which really shouldn't be affecting the network
    state at all.
    
    Pushing this commit to ask for outside opinions on it, because
    this is very weird and I clearly don't understand some part of
    what's happening.
    
    * Fix broken parachain registrar
    
    The problem was that the registrar container was coming up too fast,
    so the Alice node wasn't yet ready to receive connections. Using
    a well-known wait script fixes the issue.
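
    The wait script itself is not reproduced here. As a sketch of the
    general technique only, a retry loop like the following would hold
    the registrar back until a readiness check passes. The `wait_for`
    name, the WAIT_TIMEOUT variable, and the `nc -z` check in the usage
    comment are all assumptions for illustration.

    ```sh
    # Hypothetical retry helper, not the wait script used in this commit.
    # wait_for CMD...: rerun CMD once per second until it succeeds, or give
    # up after WAIT_TIMEOUT seconds (default 30).
    wait_for() {
        timeout="${WAIT_TIMEOUT:-30}"
        elapsed=0
        until "$@" >/dev/null 2>&1; do
            if [ "$elapsed" -ge "$timeout" ]; then
                echo "timed out after ${elapsed}s waiting for: $*" >&2
                return 1
            fi
            sleep 1
            elapsed=$((elapsed + 1))
        done
        echo "ready after ${elapsed} seconds"
    }

    # Example (assumes a netcat that supports -z):
    #   wait_for nc -z 172.28.1.1 9944 &&
    #       polkadot-js-api --ws ws://172.28.1.1:9944 query.sudo.key
    ```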
    
    Next up: verify that the collator is in fact building blocks.
    
    * fixes which cause the collator to correctly produce new parachain blocks
    
    It didn't take much! The biggest issue was that the genesis state
    was previously being double-encoded.
    
    * add documentation for running the parachain automatically
    
    * Add health check to collator
    
    * minor scripting improvements
    
    * Apply suggestions from code review
    
    Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>
    
    * Docker: copy the whole workspace in one go
    
    Pro: future-proofing against the time we add or remove a directory
    Con: changing any file in the workspace busts Rust's build cache,
         which takes a long time.
    
    Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>