DinD runner Design is not compatible with Kubernetes/Karpenter & Needs root #3600

jaswanthikolla · 2024-06-14T17:26:27Z

Checks

I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
I am using charts that are officially provided

Controller Version

0.9.2

Deployment Method

Helm

Checks

This isn't a question or user support case (For Q&A and community support, go to Discussions).
I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Spin up DinD ARSS runner.
Apply this PR patch on the runner. YOu can override the the file in Docker image using following step, And setup terminationGracePeriod & env variable RUNNER_GRACEFUL_STOP_TIMEOUT
COPY --chown=runner:runner run.sh /home/runner/run.sh
Write a workflow with a following step and with above DinD runner.

- name: Run test
        run: |
          sleep 300
          docker run hello-world

Find the runner pod that's triggered for above, anddo kubectl terminate pod PODNAME -n NAMESPACE
For Default/k8s runner, this will wait until the job is completed. But, for Dind It will crash, More on why in the next section

Describe the bug

This is a design bug of Scaled Set DinD runner. This Scaled Set DinD runner is clearly meant to run within the kubernetes, but it's not compatible with terminationGracePeriod & karpenter. Pod Movement is expected and application should respect SIGTERM.

While we can do fix the runner and make default/kuberentes type runners compatible with kubernetes/karpenter easily. DinD runner by it's design makes it harder to implement this. Basically, what happens when kubernetes sends a SIGTERM is:

There are 2 containers running on the pod - First one is runner and Second one is the DinD Containers.
k8s/karpenter sends SIGTERM ( When it decides to move the Pod ) to both the containers running on the pod.
Runner will recieve the SIGTERM and decides to wait until the completion of the job due to RUNNER_GRACEFUL_STOP_TIMEOUT
DinD container will receive the SIGTERM and will immediately exit the process.

A wrapper script that captures the SIGTERM can't properly fix it and need to think this through, see the next section ( additional Context) on how it's a design issue.

Describe the expected behavior

DinD container should wait until the Runner container is finished running. Capturing SIGTERM on DinD Container with wrapper script and waiting for completion of docker usage ( with run/build) will not work because there could be more workflow steps in the github action that need the DinD container.

Potential Solutions

Thoughts on this Design Bug:
Because DinD container should wait for Runner Container, following are some of the approaches.

On the SIGTERM Wrapper for DinD script polls for the runner status using Github API.
Create a volume and Mount the same volume on both the container.
-- Use FileSystem as IPC mechanism, and watch that file in the DinD's SIGTERM trap.
-- Use shareProcessNamespace and DinD's lifecycle preStop to watch the Runner process.
For Kubernetes 1.29+ , there is a new feature called Native sidecar containers, DinD can be moved there which keeps it for the lifetime of the main container runner
Combine DinD and runner into one container, I have explained it in the next section.

Combine DinD and runner & run it as rootless:

We can install Rootless Docker or daemonless PodMan into the runner itself and use that. There is also another problem of ScaledSet runs DinD as root user , So It's better to look into rootless Docker or Podman as well . Following are the benefits of this approach.

Solves the multi container orchestration problem with SIGTERM.
rootless docker-in-docker containers
Amortized kubernetes resources ( limits/resources) as opposed to dedicated runner/dind container. For most workloads, the resources are used at point of time for either runner or docker. For example, When you do git checkout on the runner, it uses runner resources but docker sits idle. But, when you do docker run, dind container is busy but runner sits idle.

Controller Logs

NA

Runner Pod Logs

NA

The text was updated successfully, but these errors were encountered:

jaswanthikolla · 2024-06-16T15:32:21Z

This PR is a decent workaround for this issue, but I think we can do better than that.

gfrid · 2024-06-24T10:38:29Z

Interesting...

jaswanthikolla added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Jun 14, 2024

jaswanthikolla changed the title ~~DinD runner Design is not compatible with Kubernetes & Karpenter~~ DinD runner Design is not compatible with Kubernetes & Karpenter & Needs root Jun 16, 2024

jaswanthikolla mentioned this issue Jun 17, 2024

[JK] ARSS DinD Graceful termination - Wait for Runner #3601

Open

jaswanthikolla changed the title ~~DinD runner Design is not compatible with Kubernetes & Karpenter & Needs root~~ DinD runner Design is not compatible with Kubernetes/Karpenter & Needs root Jun 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DinD runner Design is not compatible with Kubernetes/Karpenter & Needs root #3600

DinD runner Design is not compatible with Kubernetes/Karpenter & Needs root #3600

jaswanthikolla commented Jun 14, 2024 •

edited

Loading

jaswanthikolla commented Jun 16, 2024 •

edited

Loading

gfrid commented Jun 24, 2024

DinD runner Design is not compatible with Kubernetes/Karpenter & Needs root #3600

DinD runner Design is not compatible with Kubernetes/Karpenter & Needs root #3600

Comments

jaswanthikolla commented Jun 14, 2024 • edited Loading

Checks

Controller Version

Deployment Method

Checks

To Reproduce

Describe the bug

Describe the expected behavior

Potential Solutions

Combine DinD and runner & run it as rootless:

Controller Logs

Runner Pod Logs

jaswanthikolla commented Jun 16, 2024 • edited Loading

gfrid commented Jun 24, 2024

jaswanthikolla commented Jun 14, 2024 •

edited

Loading

jaswanthikolla commented Jun 16, 2024 •

edited

Loading