(Note: these prerequisites will be made less restrictive over time.)
- No user workload should be deployed other than system components.
- 1 worker pool for system components named `sys-comp`, with `systemComponents.allow` set to `true`, `min=1`, and `max>=2`.
- Required machine size -> 2 CPU cores, 8Gi memory. (Note: ensure that all worker groups created for these tests use the same machine type.)
- 2 worker pools are needed, named `one-zone` and `three-zones` respectively, with `systemComponents.allow` set to `false`. The 4 machine deployments/node groups in the cluster should be present with the following `min:max` limits:
  - machineDeployment1 (`0:2`) [worker: `one-zone`]
  - machineDeployment2 (`1:2`) [worker: `three-zones`]
  - machineDeployment3 (`0:1`) [worker: `three-zones`]
  - machineDeployment4 (`0:1`) [worker: `three-zones`]
- Make sure the worker pools are named exactly as mentioned above; this is essential for the integration tests to run properly.
- Make sure to disable the calico-typha pods, as they interfere with the integration tests (especially those related to scale-down due to under-utilization). Refer to this doc to disable it.
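Before running the suite, the setup above can be sanity-checked from the command line. This is a sketch, assuming `kubectl` access to both clusters; the kubeconfig paths and namespace in `<>` are placeholders:

```bash
# List the 4 machine deployments in the Shoot's control-plane namespace on the Seed
# and confirm their names and replica counts:
kubectl --kubeconfig <seed-kubeconfig> -n <shoot-namespace> get machinedeployments

# Confirm the worker-pool names on the Shoot nodes via the standard
# Gardener pool label (expect "sys-comp", "one-zone", "three-zones"):
kubectl --kubeconfig <shoot-kubeconfig> get nodes -L worker.gardener.cloud/pool
```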
The Cluster Autoscaler integration test suite runs a set of tests against an actual Shoot to verify the behaviour and report anomalies. Given all the configuration inputs, the test suite will:
- Reconfigure the node groups so that the test suite begins with only one node group, spanning 3 zones, with 1 node running.
- Run the CA with `leader-election=false`.
- Deploy and remove workloads and nodes according to each test scenario.
- Taint any nodes present before the start of the test suite with the taint key `testing.node.gardener.cloud/initial-node-blocked`. This taint is removed after the test suite is done (succeeded or failed).
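If a run is interrupted, the initial-node taint may be left behind. A hedged sketch for inspecting and clearing it manually (`<node-name>` is a placeholder):

```bash
# Show which nodes carry the blocking taint applied by the suite:
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key' \
  | grep testing.node.gardener.cloud/initial-node-blocked

# Remove the taint from a node (the trailing "-" deletes it):
kubectl taint node <node-name> testing.node.gardener.cloud/initial-node-blocked-
```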
- Clone the repository at `$GOPATH/k8s.io/autoscaler` and navigate to the `cluster-autoscaler` sub-directory:

  ```bash
  cd ./cluster-autoscaler
  ```
- Export the following environment variables:

  ```bash
  export CONTROL_NAMESPACE=<Shoot namespace in the Seed>
  export TARGET_KUBECONFIG=<Path to the kubeconfig file of the Shoot>
  export CONTROL_KUBECONFIG=<Path to the kubeconfig file of the Seed (or the control plane where the Cluster Autoscaler & MachineDeployment objects exist)>
  export KUBECONFIG=<Path to the kubeconfig file of the Shoot>
  export VOLUME_ZONE=<zone with zero nodes in the worker pool "three-zones" where the PV needs to be created to perform the test>
  export PROVIDER=<aws/gcp/azure/...>
  ```
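For illustration, a filled-in set of exports might look like the following. All values are hypothetical; substitute your own cluster details:

```bash
export CONTROL_NAMESPACE=shoot--my-project--my-shoot    # hypothetical Shoot namespace in the Seed
export TARGET_KUBECONFIG=$HOME/dev/shoot-kubeconfig.yaml
export CONTROL_KUBECONFIG=$HOME/dev/seed-kubeconfig.yaml
export KUBECONFIG=$HOME/dev/shoot-kubeconfig.yaml
export VOLUME_ZONE=eu-west-1c                           # a "three-zones" zone that has zero nodes
export PROVIDER=aws
```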
- Alternatively, you can run `make download-kubeconfigs` to download the kubeconfigs, place them in the `dev` folder, and then follow the steps printed by the command. (Only to be used when working with gardenctl v2 and both clusters are on Gardener.)
- Invoke the integration tests using the `make` command:

  ```bash
  make test-integration
  ```

  This executes the integration tests; the output will look something like this:
```bash
make test-integration
../.ci/local_integration_test
Starting integration tests...
Running Suite: Integration Suite
================================
Random Seed: 1642400803
Will run 1 of 1 specs

Scaling Cluster Autoscaler to 0 replicas
STEP: Starting Cluster Autoscaler....
Machine controllers test Trigger scale up by deploying new workload requesting more resources
  should not lead to any errors and add 1 more node in target cluster
  $GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:71
STEP: Checking autoscaler process is running
STEP: Adjusting the NodeGroups for the purpose of tests
STEP: Deploying workload...
STEP: Validating Scale up
• [SLOW TEST:130.726 seconds]
Machine controllers test
$GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:63
  Trigger scale up
  $GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:69
    by deploying new workload requesting more resources
    $GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:70
      should not lead to any errors and add 1 more node in target cluster
      $GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:71
------------------------------
STEP: Waiting for scale down of nodes to 1

Ran 1 of 1 Specs in 162.800 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS

Ginkgo ran 1 suite in 2m45.928186649s
Test Suite Passed
```
- Deploying a new workload that asks for more resources should create a new machine to accommodate it.
- Scaling the above workload to 3 instances should increase the number of machines to 3.
- Removing all of the above workload should remove all the newly added machines.
- Should not scale up above the max limit for the node group
- Should not scale down below the min limit for the node group
- Should scale up on the basis of taints and not only on workload size.
- Should respond by shifting load if some taint is removed from a workload/node.
- Autoscaler scales up a node in zone where PV already present and pod is requesting that PV. CSI PV is used.
- Should not scale down a node with annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true"
- Should scale down a node after the above annotation is removed from it.
- Should not scale if no machine in worker group can satisfy the requirements of the pod.
- shouldn't scale down with underutilized nodes due to host port conflicts
- CA ignores unschedulable pods while scheduling schedulable pods. (line 337; the reasoning for this still needs to be understood)
- shouldn't increase cluster size if pending pod is too large
- should increase cluster size if pending pods are small
- should increase cluster size if pending pods are small and one node is broken
- shouldn't trigger additional scale-ups during processing scale-up
- should increase cluster size if pending pods are small and there is another node pool that is not autoscaled
- should disable node pool autoscaling
- should increase cluster size if pods are pending due to host port conflict
- should increase cluster size if pod requesting EmptyDir volume is pending
- should scale up correct target pool
- should add node to the particular mig
- should correctly scale down after a node is not needed and one node is broken
- should correctly scale down after a node is not needed when there is non autoscaled pool
- should be able to scale down when rescheduling a pod is required and pdb allows for it
- shouldn't be able to scale down when rescheduling a pod is required, but pdb doesn't allow drain
- should be able to scale down by draining multiple pods one by one as dictated by pdb
- should be able to scale down by draining system pods with pdb
- Should be able to scale a node group up from 0
- Should be able to scale a node group down to 0
- Shouldn't perform scale up operation and should list unhealthy status if most of the cluster is broken
- shouldn't scale up when expendable pod is created
- should scale up when non-expendable pod is created
- shouldn't scale up when expendable pod is preempted
- should scale down when expendable pod is running
- shouldn't scale down when non-expendable pod is running
- Should scale up GPU pool from 0
- Should scale up GPU pool from 1
- Should not scale GPU pool up if pod does not require GPUs
- Should scale down GPU pool from 1
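Several of the scale-down cases above hinge on the standard `cluster-autoscaler.kubernetes.io/scale-down-disabled` annotation. Toggling it manually on a node looks like this (`<node-name>` is a placeholder):

```bash
# Block the Cluster Autoscaler from scaling this node down:
kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# Remove the annotation again (the trailing "-" deletes it) so scale-down can proceed:
kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled-
```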
For further details, refer to: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/autoscaling/cluster_size_autoscaling.go