Updating and adding a timeout to the cleanup cluster step #7727
base: main
@@ -52,5 +52,14 @@ for ns in $namespaces; do
break
fi
echo "deleting namespaces: $ns"

# Remove finalizers from the namespace if present
kubectl get namespace $ns -o json | jq '.spec.finalizers = []' >temp-namespace.json
Interestingly, the UCP and Applications RP pods were not healthy, so the finalizers on some resources in some namespaces never completed. That in turn prevented those namespaces from being deleted.
UCP pod down -> finalizers stuck on some resources -> namespaces could not be deleted -> clean-up step got stuck -> long-running workflow failed back-to-back
Do you have logs from the failing pods, or from the Kubernetes controller-manager pod?
@@ -515,6 +515,7 @@ jobs:
--yes --verbose
- name: Clean up cluster
if: always()
timeout-minutes: 60
Added timeout to clean up step.
What's the goal of the timeout? That we'd be notified if we hit it?
Default is 6 hours, so fail faster?
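As a complement to the step-level timeout, each delete could also be bounded on its own, so a single stuck namespace is reported by name. A minimal sketch, assuming the `--timeout` flag of `kubectl delete`; the helper name `delete_namespace_bounded` and the 5m default are hypothetical and not part of this PR:

```shell
# Hypothetical helper: delete one namespace but stop waiting after a bounded time.
# The 5m default is an assumption, not a value taken from this PR.
delete_namespace_bounded() {
  local ns="$1" wait_for="${2:-5m}"
  if ! kubectl delete namespace "$ns" --ignore-not-found=true --timeout="$wait_for"; then
    # kubectl can fail for reasons other than the timeout, so keep the message general.
    echo "deleting namespace $ns failed or exceeded $wait_for" >&2
    return 1
  fi
}
```

Inside the existing loop this would replace the plain `kubectl delete namespace $ns --ignore-not-found=true` call, for example `delete_namespace_bounded "$ns" 10m`.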
Reformatted the files
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7727      +/-   ##
==========================================
- Coverage   61.00%   60.99%   -0.01%
==========================================
  Files         520      520
  Lines       27010    27010
==========================================
- Hits        16478    16476       -2
- Misses       9080     9081       +1
- Partials     1452     1453       +1
Signed-off-by: ytimocin <ytimocin@microsoft.com>
65113ae to d8a3fd0
kubectl delete namespace $ns --ignore-not-found=true
done

# Cleanup temporary file
rm -f temp-namespace.json
If an error occurs above and the script exits, is it possible that the file is not cleaned up?
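One way to address this is an `EXIT` trap, which runs whether the script finishes normally or stops early. A minimal sketch, assuming a scratch file from `mktemp` instead of a fixed `temp-namespace.json`; the helper name `with_scratch_file` is hypothetical:

```shell
# Hypothetical helper: run a command with a scratch file path appended as its
# last argument. The body runs in a subshell, and the EXIT trap removes the
# file when that subshell ends, whether the command succeeded or failed.
with_scratch_file() (
  scratch="$(mktemp)" || exit 1
  trap 'rm -f "$scratch"' EXIT
  "$@" "$scratch"
)
```

The finalizer step could then be wrapped in a function that receives the scratch path as its last argument, so no separate `rm -f` is needed at the end of the script.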

# Remove finalizers from the namespace if present
kubectl get namespace $ns -o json | jq '.spec.finalizers = []' >temp-namespace.json
kubectl replace --raw "/api/v1/namespaces/$ns/finalize" -f ./temp-namespace.json
Does --force help here? https://kubernetes.io/docs/reference/kubectl/generated/kubectl_delete/
Just thinking it might be easier if it works.
I'm also curious what the finalizers are. We have to be a little bit careful with this, because certain types in Kubernetes will manage cloud resources. If we delete the Kubernetes resource without running the finalizers, then the cloud resources leak.
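To see which finalizers are involved before bypassing them, the script could log them first. A minimal sketch, assuming the namespace-level `.spec.finalizers` field that the diff above clears; the helper names are hypothetical, and finalizers on resources inside the namespace (under `metadata.finalizers`) would need a separate query:

```shell
# Hypothetical helper: print the finalizers recorded on a namespace.
namespace_finalizers() {
  kubectl get namespace "$1" -o jsonpath='{.spec.finalizers[*]}'
}

# Hypothetical helper: report what is about to be bypassed for a namespace.
log_finalizers() {
  local ns="$1" found
  found="$(namespace_finalizers "$ns")" || return 1
  if [ -n "$found" ]; then
    echo "namespace $ns has finalizers: $found"
  else
    echo "namespace $ns has no finalizers"
  fi
}
```

Calling `log_finalizers "$ns"` just before the `kubectl replace --raw` line would leave a record in the workflow log of what was skipped, which helps when checking for leaked cloud resources afterwards.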
Description
Updating and adding a timeout to the cleanup cluster step.
Type of change