No hardlimit on rescheduleCriticalSeconds #1088

Open
martin2176 opened this issue Sep 23, 2022 · 1 comment
What would you like to be added:
Currently, rescheduleCriticalSeconds in the Adaptive schedule strategy of WorkloadSpread has a maximum of 295 seconds with 2 subsets.
Each additional subset lowers the allowed rescheduleCriticalSeconds further.
This feature request (RFE) is to remove the hard limit and let users choose the value they want.
If that is not architecturally possible, allow a limit higher than 295 seconds, at least 15 minutes.

scheduleStrategy:
  type: Adaptive
  adaptive:
    rescheduleCriticalSeconds: 180

Invalid value: 180: rescheduleCriticalSeconds < 0 or rescheduleCriticalSeconds > 98 is not permitted
I have 4 subsets.
Adding each subset lowers the allowed value of rescheduleCriticalSeconds.

Can we allow users to specify a reasonable time, such as 5 minutes? Cloud Kubernetes deployments take at least 2 minutes to spin up a node in response to an autoscaler event.

Why is this needed:
Scenario:
I have multiple node pools in Azure AKS. It takes 3 minutes for a node to be spun up and added to a node pool in response to an autoscaler event.
If I create a WorkloadSpread with 4 subsets, the maximum allowed value is 98 seconds, which is not enough time to bring a new Kubernetes node into the cluster.
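
For reference, the two limits quoted in this issue (295 seconds with 2 subsets, 98 seconds with 4) are both consistent with dividing 295 seconds by the number of subsets minus one. The sketch below is inferred only from those two numbers, not from the Kruise source: the helper `inferred_max_reschedule_seconds` and the 295-second budget are assumptions for illustration, and the real webhook logic may differ.

```python
# Hypothetical helper that reproduces the two limits reported in this issue.
# The 295-second budget and the (subsets - 1) divisor are inferred from those
# two data points only; they are not taken from the Kruise webhook code.
ASSUMED_BUDGET_SECONDS = 295

# About 3 minutes for AKS to add a node, per the scenario in this issue.
NODE_SPIN_UP_SECONDS = 180


def inferred_max_reschedule_seconds(subset_count: int) -> int:
    """Return the inferred upper bound for rescheduleCriticalSeconds."""
    if subset_count < 2:
        raise ValueError("the inference only covers 2 or more subsets")
    return ASSUMED_BUDGET_SECONDS // (subset_count - 1)


for subsets in (2, 4):
    limit = inferred_max_reschedule_seconds(subsets)
    enough = limit >= NODE_SPIN_UP_SECONDS
    print(f"{subsets} subsets: limit {limit}s, covers node spin-up: {enough}")
```

If this inference holds, any WorkloadSpread with 3 or more subsets would be capped below the 180 seconds a new node needs here.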

admission webhook "vworkloadspread.kb.io" denied the request: spec.scheduleStrategy.adaptive.rescheduleCriticalSeconds: Invalid value: 295: rescheduleCriticalSeconds < 0 or rescheduleCriticalSeconds > 98 is not permitted

apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: frontend-workloadspread
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  subsets:
  - name: spotvmmodel1
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: type
        operator: In
        values:
        - spotvmmodel1
    maxReplicas:
  - name: spotvmmodel2
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: type
        operator: In
        values:
        - spotvmmodel2
    maxReplicas:
  - name: spotvmmodel3
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: type
        operator: In
        values:
        - spotvmmodel3
    maxReplicas:
  - name: spotvmmodel4
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: type
        operator: In
        values:
        - spotvmmodel4
  scheduleStrategy:
    type: Adaptive
    adaptive:
      rescheduleCriticalSeconds: 295

veophi (Member) commented Oct 9, 2022

sounds reasonable @martin2176
