Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement lighthouse beacon service #394

Merged
merged 3 commits into from
Oct 12, 2023

Conversation

scottbot95
Copy link
Contributor

@scottbot95 scottbot95 commented Oct 7, 2023

Implement services for Lighthouse Beacon Node and Validator Client.

I feel like the only particularly controversial thing in this PR is making the network default to the name of the beacon/validator service.

Resolves #20

@scottbot95 scottbot95 changed the title WIP: Implement lighthouse beacon service feat: Implement lighthouse beacon service Oct 9, 2023
@scottbot95 scottbot95 marked this pull request as ready for review October 9, 2023 01:40
@aldoborrero
Copy link
Collaborator

@scottbot95 nice work! Thanks again for taking the time to contribute!

As one extra request, do you think can you add some light tests to check the module works properly (like for example we have done for prysm)?

@scottbot95
Copy link
Contributor Author

I am going on vacation soon so I might not get to it for a week or two, but I could certainly look into some tests

@aldoborrero
Copy link
Collaborator

@scottbot95 OK, we can merge this PR and add tests after

@scottbot95
Copy link
Contributor Author

I'm in no rush to merge this personally. Do whatever you feel is best for the project.

@aldoborrero
Copy link
Collaborator

This PR as it is provides value ;) (if there's any bug we can sort it out later)

@aldoborrero aldoborrero merged commit 9273546 into nix-community:main Oct 12, 2023
44 checks passed
@scottbot95
Copy link
Contributor Author

I started taking a look at writing some tests for this and it looks like the existing prysm tests basically just check that the nix module evaluates to a valid systemd config. The only assertion that we have is that the unit enters the active state, but since the service type is simple (the default) the unit actually ALWAYS enters the active state provided that ExecStart start points to a valid executable (even if the executable exits immediately with non-zero status code, it will still pass through active then failed).

I'm happy to add some tests, but I'd like to add something of a little more value if possible. I'm not really sure what the best thing to do here is though. Perhaps just call wait_for_unit twice with a short delay between the two to ensure that the service starts and doesn't exit immediately? It would be nice if we could test the validator is actually working somehow, but that seems like it's probably not really possible.

@aldoborrero
Copy link
Collaborator

@scottbot95 yeah, you're right about the ExecStart. I experienced it a couple of months ago with one client by not being aware of it.

When I was working on the problem, instead, I decided to implement multiple strategies to check the service was starting correctly, but also double-checking that the ports were open and checking if the logs contained anything related to an error message.

Below you can see a completely invented example of what I'm describing. We can take that approach and improve the testing based on that. As the code of the tests is pure Python we can do more clever checks.

What do you think?

testScript = {nodes, ...}: ''
      def wait_for_units(node, units):
        for unit in units:
          node.wait_for_unit(unit)

      def check_ports(node, ports):
        for port in ports:
          node.wait_for_open_port(port)

      start_all()

      # Service Status
      wait_for_units(portal, [ "bitcoind.service", "geth.service"])
      client.wait_for_unit("multi-user.target")

      # Port Accessibility
      check_ports(portal, [9001, 9002, 18500])

      # Data Persistence
      portal.succeed("test -f /var/lib/geth/chain/wallet.db")

      # Error Logs
      portal.fail("journalctl -u geth.service | grep -i 'error'")
    '';

@scottbot95
Copy link
Contributor Author

This approach seems pretty good. Checking ports/logs/files is probably the only things we can realistically check for. We could probably also come up with a message to expect from in logs. In the case of the Lighthouse Validator, we could probably search for INFO Connected to beacon node(s)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a service for lighthouse
2 participants