Add support for health check when draining nodes #1699
I was also aiming for the ability to specify some grace period between nodes... I ended up writing my own drain script (I can make it public if there's interest), but I'll be happy to use official eksctl code.
As a workaround I had to use this command. My suggestion here would be that eksctl drain would accept a switch.
Are you looking for a feature that adds a time gap between draining subsequent nodes of a nodegroup, or are you looking for a "grace-period" flag implemented on the eksctl side? I would like to work on this issue. Please clarify.
Sorry to reply to a question aimed at another user, but I'm running into a similar situation during the creation of a new nodegroup followed by deletion of the old one.
This way it should be safer (and even faster) to move, for example, apps that need to form a cluster among themselves (one app cluster node per pod).
Closing this due to lack of activity.
Since 2020.10.19 we have been using proper eviction and cordoning for each node, for as long as pods remain on it. We are also doing the "health check" part by filtering for pods that can be evicted or deleted. What I would like to understand here is whether that now works properly, or whether there are still issues around it.
Before creating a feature request, please search existing feature requests to see if you find a similar one. If there is a similar feature request please up-vote it and/or add your comments to it instead
Why do you want this feature?
When draining nodes on a production cluster, it might be safer to run a health check between node/nodegroup draining iterations, to ensure that things are going well so far. If the health check fails, the draining process stops. An example health check: the ratio of ready pods to all pods; if it is below a given threshold, wait a few seconds and check again before continuing.
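The described check could be sketched like this (a hypothetical illustration, not eksctl code; the function names, the stubbed pod counts, and the commented-out `kubectl` queries are assumptions for the sake of the example):

```shell
#!/bin/sh
# Hypothetical health check between node drains: succeed (exit 0) only when
# ready/total pods meet a threshold percentage.
check_ready_ratio() {
  ready=$1; total=$2; threshold_pct=$3
  [ "$total" -gt 0 ] || return 1
  [ $(( ready * 100 / total )) -ge "$threshold_pct" ]
}

# Between draining two nodes, poll until the cluster looks healthy again.
wait_until_healthy() {
  threshold_pct=$1; sleep_seconds=$2
  while :; do
    # In a real script these counts would come from the API server, e.g.:
    #   total=$(kubectl get pods --all-namespaces --no-headers | wc -l)
    #   ready=$(kubectl get pods --all-namespaces \
    #     --field-selector=status.phase=Running --no-headers | wc -l)
    total=10; ready=9   # stubbed values for illustration
    check_ready_ratio "$ready" "$total" "$threshold_pct" && break
    sleep "$sleep_seconds"
  done
}
```

With a 80% threshold, 9 ready pods out of 10 passes the check, while 7 out of 10 makes the loop sleep and retry.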
What feature/behavior/change do you want?
Allow specifying a health check command, or, as an MVP, a time to wait between nodes/nodegroups, maybe something like:
Or, with health check:
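The original command examples appear to have been lost when this thread was captured. A hypothetical sketch of the two proposed invocations (the `--node-drain-wait` and `--node-drain-health-check-cmd` flag names are invented here for illustration and are not confirmed eksctl flags):

```shell
# MVP: fixed wait between draining nodes (hypothetical flag)
eksctl drain nodegroup --cluster=my-cluster --name=ng-1 --node-drain-wait=30s

# With a health check command run between nodes (hypothetical flag)
eksctl drain nodegroup --cluster=my-cluster --name=ng-1 \
  --node-drain-health-check-cmd='./check-cluster-health.sh'
```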