Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DinD runner Design is not compatible with Kubernetes/Karpenter & Needs root #3600

Open
4 tasks done
jaswanthikolla opened this issue Jun 14, 2024 · 2 comments
Open
4 tasks done
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers

Comments

@jaswanthikolla
Copy link

jaswanthikolla commented Jun 14, 2024

Checks

Controller Version

0.9.2

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

  1. Spin up DinD ARSS runner.
  2. Apply this PR patch on the runner. YOu can override the the file in Docker image using following step, And setup terminationGracePeriod & env variable RUNNER_GRACEFUL_STOP_TIMEOUT
    COPY --chown=runner:runner run.sh /home/runner/run.sh
  3. Write a workflow with a following step and with above DinD runner.
- name: Run test
        run: |
          sleep 300
          docker run hello-world
  1. Find the runner pod that's triggered for above, anddo kubectl terminate pod PODNAME -n NAMESPACE
  2. For Default/k8s runner, this will wait until the job is completed. But, for Dind It will crash, More on why in the next section

Describe the bug

This is a design bug of Scaled Set DinD runner. This Scaled Set DinD runner is clearly meant to run within the kubernetes, but it's not compatible with terminationGracePeriod & karpenter. Pod Movement is expected and application should respect SIGTERM.

While we can do fix the runner and make default/kuberentes type runners compatible with kubernetes/karpenter easily. DinD runner by it's design makes it harder to implement this. Basically, what happens when kubernetes sends a SIGTERM is:

  • There are 2 containers running on the pod - First one is runner and Second one is the DinD Containers.
  • k8s/karpenter sends SIGTERM ( When it decides to move the Pod ) to both the containers running on the pod.
  • Runner will recieve the SIGTERM and decides to wait until the completion of the job due to RUNNER_GRACEFUL_STOP_TIMEOUT
  • DinD container will receive the SIGTERM and will immediately exit the process.

A wrapper script that captures the SIGTERM can't properly fix it and need to think this through, see the next section ( additional Context) on how it's a design issue.

Describe the expected behavior

DinD container should wait until the Runner container is finished running. Capturing SIGTERM on DinD Container with wrapper script and waiting for completion of docker usage ( with run/build) will not work because there could be more workflow steps in the github action that need the DinD container.

Potential Solutions

Thoughts on this Design Bug:
Because DinD container should wait for Runner Container, following are some of the approaches.

  • On the SIGTERM Wrapper for DinD script polls for the runner status using Github API.
  • Create a volume and Mount the same volume on both the container.
    -- Use FileSystem as IPC mechanism, and watch that file in the DinD's SIGTERM trap.
    -- Use shareProcessNamespace and DinD's lifecycle preStop to watch the Runner process.
  • For Kubernetes 1.29+ , there is a new feature called Native sidecar containers, DinD can be moved there which keeps it for the lifetime of the main container runner
  • Combine DinD and runner into one container, I have explained it in the next section.

Combine DinD and runner & run it as rootless:

We can install Rootless Docker or daemonless PodMan into the runner itself and use that. There is also another problem of ScaledSet runs DinD as root user , So It's better to look into rootless Docker or Podman as well . Following are the benefits of this approach.

  • Solves the multi container orchestration problem with SIGTERM.
  • rootless docker-in-docker containers
  • Amortized kubernetes resources ( limits/resources) as opposed to dedicated runner/dind container. For most workloads, the resources are used at point of time for either runner or docker. For example, When you do git checkout on the runner, it uses runner resources but docker sits idle. But, when you do docker run, dind container is busy but runner sits idle.

Controller Logs

NA

Runner Pod Logs

NA
@jaswanthikolla jaswanthikolla added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Jun 14, 2024
@jaswanthikolla
Copy link
Author

jaswanthikolla commented Jun 16, 2024

This PR is a decent workaround for this issue, but I think we can do better than that.

@jaswanthikolla jaswanthikolla changed the title DinD runner Design is not compatible with Kubernetes & Karpenter DinD runner Design is not compatible with Kubernetes & Karpenter & Needs root Jun 16, 2024
@jaswanthikolla jaswanthikolla changed the title DinD runner Design is not compatible with Kubernetes & Karpenter & Needs root DinD runner Design is not compatible with Kubernetes/Karpenter & Needs root Jun 23, 2024
@gfrid
Copy link

gfrid commented Jun 24, 2024

Interesting...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers
Projects
None yet
Development

No branches or pull requests

2 participants