Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Application stuck in deletion because of a postDelete hook #18270

Closed
3 tasks done
joffreychambrin opened this issue May 17, 2024 · 9 comments · Fixed by #18767
Closed
3 tasks done

Application stuck in deletion because of a postDelete hook #18270

joffreychambrin opened this issue May 17, 2024 · 9 comments · Fixed by #18767
Assignees
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@joffreychambrin
Copy link

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Since we've upgrade to to ArgoCD v2.10.0, we have a specific application that gets stuck in Deletion. The only solution I have found is to manually remove the finalizers from the App.

To Reproduce

  1. Install the latest ArgoCD version on a Kubernetes cluster,
  2. Then add a Helm app with:
    URL: https://charts.releases.teleport.dev
    Chart: teleport-kube-agent.
    Version: 15.0.2
  3. And delete the ArgoCD app.

Expected behavior

The post hook delete job will be launch, and once completed the app should be automatically deleted from ArgoCD.

Version

argocd: v2.11.0+d3f33c0
  BuildDate: 2024-05-07T16:01:41Z
  GitCommit: d3f33c00197e7f1d16f2a73ce1aeced464b07175
  GitTreeState: clean
  GoVersion: go1.21.9
  Compiler: gc
  Platform: linux/arm64

Logs

time="2024-05-17T09:44:28Z" level=error msg="Recovered from panic: runtime error: invalid memory address or nil pointer dereferencegoroutine 230 [running]:
runtime/debug.Stack()
    /usr/local/go/src/runtime/debug/stack.go:24 +0x64
github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).processAppOperationQueueItem.func1()
    /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:890 +0x50
panic({0x2fb9ae0?, 0x6a9bf20?})
    /usr/local/go/src/runtime/panic.go:914 +0x218
github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).executePostDeleteHooks(0x4000bee380, 0x401a639000, 0x40230d3200?, 0x4028b6af80?, 0x20?, 0x0?)
    /go/src/github.com/argoproj/argo-cd/controller/hook.go:101 +0x8f4
github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).finalizeApplicationDeletion(0x4000bee380, 0x401a638c00, 0x36b3559?)
    /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:1168 +0x560
github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).processAppOperationQueueItem(0x4000bee380)
    /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:926 +0x2c0
github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).Run.func4()
    /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:831 +0x2c
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xae6fb19b55bdc225?)
    /go/pkg/mod/k8s.io/apimachinery@v0.26.11/pkg/util/wait/wait.go:157 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x875c9889805965b5?, {0x49d7500, 0x4001575080}, 0x1, 0x40005b8240)
    /go/pkg/mod/k8s.io/apimachinery@v0.26.11/pkg/util/wait/wait.go:158 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x4f1857488fcceb0a?, 0x3b9aca00, 0x0, 0xcc?, 0x231652ab9f10a9f1?)
    /go/pkg/mod/k8s.io/apimachinery@v0.26.11/pkg/util/wait/wait.go:135 +0x80
k8s.io/apimachinery/pkg/util/wait.Until(0x2997fcc5e506dda9?, 0x7c42a7c42a95252d?, 0xd8649a624c79c08a?)
    /go/pkg/mod/k8s.io/apimachinery@v0.26.11/pkg/util/wait/wait.go:92 +0x28
created by github.com/argoproj/argo-cd/v2/controller.(*ApplicationController).Run in goroutine 70
    /go/src/github.com/argoproj/argo-cd/controller/appcontroller.go:830 +0x728
"
@joffreychambrin joffreychambrin added the bug Something isn't working label May 17, 2024
@ChristianCiach
Copy link
Contributor

What kind of hook are you using? A Job? Looking at the code, there is something wrong with the health check for the hook. Are you using custom health checks as Lua scripts?

@ChristianCiach
Copy link
Contributor

I think https://github.com/argoproj/argo-cd/blob/v2.11.0/controller/hook.go just misses necessary nil-checks after calling GetResourceHealth. Every other caller of GetResourceHealth (both in argo-cd and in gitops-engine) checks the returned health object for nil, but not the post-delete hook controller. The returned health object can indeed be nil.

@ChristianCiach
Copy link
Contributor

ChristianCiach commented May 17, 2024

@joffreychambrin You can probably workaround the bug by adding this to your argocd-cm configmap:

  resource.customizations: |
    batch/Job:
      health.lua: |
        hs = {}
        hs.status = "Progressing"
        hs.message = ""
        if obj.status ~= nil then
          if obj.status.conditions ~= nil then
            for i, condition in ipairs(obj.status.conditions) do
              if condition.type == "Complete" and condition.status == "True" then
                hs.status = "Healthy"
                return hs
              end
              if condition.type == "Failed" and condition.status == "True" then
                hs.status = "Degraded"
                return hs
              end
            end
          end
        end
        return hs

Completely untested, and I would advise against it. The idea here is to guarantee a non-nil hookHealth object in

if hookHealth.Status == health.HealthStatusProgressing {
to workaround the missing nil check there.

@ChristianCiach
Copy link
Contributor

@joffreychambrin The unit test introduced in #16595 only tests this feature by using a bare Pod. So it would probably also work when you replace your Job with a bare Pod.

@ChristianCiach
Copy link
Contributor

@alexmt Maybe you want to know about this :)

@joffreychambrin
Copy link
Author

joffreychambrin commented May 20, 2024

@ChristianCiach Yes the hook I am using is a job. It is the default one from Teleport here: https://github.com/gravitational/teleport/blob/70ba6be2eac4f8e5c275ffac6246197ef798392a/examples/chart/teleport-kube-agent/templates/delete_hook.yaml#L47

I think you are right in your analysis, because as you can see in the above link, there are multiple resources with the post-delete hook: a service-account, a role, a roleBinding, and a job. And not all of them have a healthCheck, therefore the necessity to have a nil check on the hookHealth

@agaudreault agaudreault added the good first issue Good for newcomers label Jun 21, 2024
@agaudreault
Copy link
Member

@Yuni-sa you recently made a pull request for another bug related to heath check, would you be interested in fixing this one?

@Yuni-sa
Copy link
Contributor

Yuni-sa commented Jun 21, 2024

@agaudreault Yeah, sure! I can help fix this. How should we handle cases where the health check is nil for some resources? I was thinking of logging a message and skipping those checks.
for example :

if hookHealth == nil {
  logCtx.Infof("No health check available for post-delete hook %s/%s", obj.GetNamespace(), obj.GetName())
  continue
}

@agaudreault
Copy link
Member

@Yuni-sa awesome! I think when there are no health check, a log is good. If possible it should include the group, version and kind of the object as log fields. Then I think the behavior should be the same one as we have during the sync. We consider resources healthy if there is no health check defined or built-in.

ishitasequeira pushed a commit that referenced this issue Jun 26, 2024
* fix: Handle nil health check in post-delete hooks

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>

* test: Validate deletion of RBAC resources for post-delete hook

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>

* Update appcontroller_test.go

Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>

---------

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>
Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>
gcp-cherry-pick-bot bot pushed a commit that referenced this issue Jun 26, 2024
* fix: Handle nil health check in post-delete hooks

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>

* test: Validate deletion of RBAC resources for post-delete hook

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>

* Update appcontroller_test.go

Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>

---------

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>
Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>
pasha-codefresh pushed a commit that referenced this issue Jun 26, 2024
…18828)

* fix: Handle nil health check in post-delete hooks



* test: Validate deletion of RBAC resources for post-delete hook



* Update appcontroller_test.go



---------

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>
Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>
Co-authored-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>
ggjulio pushed a commit to ggjulio/argo-cd that referenced this issue Jul 21, 2024
…rgoproj#18767)

* fix: Handle nil health check in post-delete hooks

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>

* test: Validate deletion of RBAC resources for post-delete hook

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>

* Update appcontroller_test.go

Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>

---------

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>
Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>
jsolana pushed a commit to jsolana/argo-cd that referenced this issue Jul 24, 2024
…rgoproj#18767)

* fix: Handle nil health check in post-delete hooks

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>

* test: Validate deletion of RBAC resources for post-delete hook

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>

* Update appcontroller_test.go

Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>

---------

Signed-off-by: Yonatan Sasson <yonatanxd7@gmail.com>
Signed-off-by: Yonatan Sasson <107778824+Yuni-sa@users.noreply.github.com>
Signed-off-by: Javier Solana <javier.solana@cabify.com>
Signed-off-by: Javier Solana <javier.solana@cabify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants