
[feature request] Make daemonset controller stick to strategy.rollingUpdate.maxUnavailable #1323

Closed
miklezzzz opened this issue Jun 27, 2023 · 10 comments
Labels: kind/feature-request, wontfix



miklezzzz commented Jun 27, 2023

What would you like to be added:
At the moment, the DaemonSet controller is allowed to delete more pods than the maxUnavailable setting permits if those pods fail the podutil.IsPodAvailable check. As I understand it, this was done to decrease recovery time. At the same time, circumstances can arise that result in all the pods of a DaemonSet being deleted simultaneously. When a DaemonSet schedules critical workloads, such as an Ingress controller's pods, this becomes a problem, because downtime is involved.
We use the OpenKruise DaemonSet controller quite extensively and have already observed such problems.
The proposal is to have a way to make the DaemonSet controller strictly obey the maxUnavailable setting. We ended up applying the following changes in our environment: miklezzzz@ef27767.

One really simple example is a DaemonSet with minReadySeconds set. If we update the DaemonSet twice within a period shorter than minReadySeconds, on the second update the DaemonSet controller deletes all the pods at once.
There are more examples, but they are a bit more elaborate.
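
For illustration, here is a minimal, hypothetical Go sketch of the selection logic described above and of the proposed strict cap. This is not the actual OpenKruise code: the pod type, the availability flag (standing in for podutil.IsPodAvailable), and the strict switch are all stubs for the sake of the example.

```go
package main

import "fmt"

type pod struct {
	name      string
	available bool // stand-in for the podutil.IsPodAvailable check
}

// podsToDelete sketches how the rolling update picks old pods to delete.
// strict=false mirrors today's behavior; strict=true is the proposal.
func podsToDelete(oldPods []pod, maxUnavailable int, strict bool) []string {
	var replacements, candidates []string
	numUnavailable := 0
	for _, p := range oldPods {
		if !p.available {
			// Current behavior: an unavailable pod is always replaced,
			// on the theory that deleting it cannot reduce availability.
			replacements = append(replacements, p.name)
			numUnavailable++
		} else {
			candidates = append(candidates, p.name)
		}
	}

	if strict && len(replacements) > maxUnavailable {
		// Proposed behavior: unavailable pods also consume the budget,
		// so at most maxUnavailable pods are deleted per sync.
		replacements = replacements[:maxUnavailable]
	}

	// Only still-available candidates are capped by the leftover budget.
	remaining := maxUnavailable - numUnavailable
	if remaining < 0 {
		remaining = 0
	}
	if remaining > len(candidates) {
		remaining = len(candidates)
	}
	return append(replacements, candidates[:remaining]...)
}

func main() {
	// The minReadySeconds scenario: a second update within minReadySeconds
	// finds every pod "unavailable", so all of them are deleted at once.
	pods := []pod{{"a", false}, {"b", false}, {"c", false}}
	fmt.Println(podsToDelete(pods, 1, false)) // [a b c]: every pod at once
	fmt.Println(podsToDelete(pods, 1, true))  // [a]: capped at maxUnavailable
}
```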

Why is this needed:
It would allow a user to have finer control over the pace of a DaemonSet's updates.

zmberg (Member) commented Jun 28, 2023

@miklezzzz If remainingUnavailable is always 0 because some nodeToDaemonPod is never ready, then the DaemonSet will block the update operation.
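
Concretely, reusing the hypothetical podsToDelete sketch from the issue description (an illustration of the concern, not actual controller behavior):

```go
// Two pods are permanently NotReady, maxUnavailable is 1.
pods := []pod{{"stuck-1", false}, {"stuck-2", false}, {"healthy", true}}

// Current behavior: both broken pods are replaced immediately.
fmt.Println(podsToDelete(pods, 1, false)) // [stuck-1 stuck-2]

// Strict behavior: one deletion per sync. If the replacements also come
// up NotReady, the budget is consumed forever and "healthy" never updates.
fmt.Println(podsToDelete(pods, 1, true)) // [stuck-1]
```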

miklezzzz (Author) commented

@zmberg Hi! Thanks for the response!
I couldn't reproduce such an issue; could you please elaborate on that case?
As we know, remainingUnavailable is the result of maxUnavailable (1) minus numUnavailable, and the latter gets incremented in the outer switch section. Even if I intentionally prevent some pods of a DaemonSet from running (so that they are stuck in a crash loop), when I update the DaemonSet it still tries to update at least one of the crashlooping pods, and if that pod doesn't start up, the update process gets stuck, but in general that is how it should be.
P.S. I can provide a GIF recorded in a test cluster.
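
For readers following along, the step being discussed looks roughly like the function below. This is a paraphrase of the upstream rolling-update selection, not a verbatim quote; the parameter names follow the upstream code, where allowedReplacementPods collects pods that failed podutil.IsPodAvailable (each incrementing numUnavailable in the switch) and candidatePodsToDelete collects still-available old pods.

```go
// selectOldPodsToDelete paraphrases the budget computation: unavailable
// pods bypass the maxUnavailable cap entirely, while available candidates
// are limited to whatever budget is left over.
func selectOldPodsToDelete(allowedReplacementPods, candidatePodsToDelete []string,
	maxUnavailable, numUnavailable int) []string {
	remainingUnavailable := maxUnavailable - numUnavailable
	if remainingUnavailable < 0 {
		remainingUnavailable = 0
	}
	oldPodsToDelete := append([]string{}, allowedReplacementPods...) // uncapped
	if remainingUnavailable > len(candidatePodsToDelete) {
		remainingUnavailable = len(candidatePodsToDelete)
	}
	return append(oldPodsToDelete, candidatePodsToDelete[:remainingUnavailable]...)
}
```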


stale bot commented Sep 26, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Sep 26, 2023
nabokihms commented

@zmberg any news on this PR?

stale bot removed the wontfix label on Sep 26, 2023

stale bot commented Dec 27, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Dec 27, 2023
nabokihms commented

I'd still like to see an answer here.

stale bot removed the wontfix label on Dec 27, 2023

stale bot commented Apr 1, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Apr 1, 2024
nabokihms commented

/unstale

stale bot removed the wontfix label on Apr 1, 2024
zmberg (Member) commented Apr 7, 2024

@nabokihms @miklezzzz
The overall logic of Advanced DaemonSet is an enhancement on top of the native K8s DaemonSet, and at the moment this piece of logic simply follows the upstream community implementation.

If the Pod is Not Ready, I think this logic makes more sense, as it allows for quick recovery. Are you currently running into problems because of the scenario with minReadySeconds set?
[image attachment]


stale bot commented Jul 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Jul 7, 2024
stale bot closed this as completed on Jul 15, 2024