
Prerequisites

(Note: these prerequisites will be made less restrictive over time.)

  1. No user workloads should be deployed; only system components may run on the cluster.
  2. One worker pool for system components, named sys-comp, with systemComponents.allow set to true and min=1, max>=2.
  3. Required machine size: 2 CPU cores, 8Gi memory. (Note: ensure that all worker pools created for these tests use the same machine type.)
  4. Two further worker pools named one-zone and three-zones, each with systemComponents.allow set to false. The cluster should then contain the following 4 machine deployments/node groups, with these min:max limits (a sanity-check command follows this list):
    • machineDeployment1 (0:2) [worker: one-zone]
    • machineDeployment2 (1:2) [worker: three-zones]
    • machineDeployment3 (0:1) [worker: three-zones]
    • machineDeployment4 (0:1) [worker: three-zones]
  5. The worker pools must be named exactly as above; the integration tests depend on these names.
  6. Disable the calico-typha pods, as they interfere with the integration tests (especially those covering scale-down of under-utilized nodes). Refer to this doc for how to disable them.
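
Before starting the suite, the layout can be double-checked by listing the machine deployments backing the pools in the Seed. This is a minimal sketch, assuming the Machine Controller Manager CRDs (API group machine.sapcloud.io) are present and the environment variables from the usage guide below are set:

```bash
# List the machine deployments backing the worker pools in the Seed and
# verify that exactly the 4 deployments listed above exist.
kubectl --kubeconfig "$CONTROL_KUBECONFIG" -n "$CONTROL_NAMESPACE" \
  get machinedeployments.machine.sapcloud.io
```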

Cluster Autoscaler integration test suite

The Cluster Autoscaler integration test suite runs a set of tests against an actual Shoot to verify its behaviour and report anomalies. Given all the configuration inputs, the test suite will:

  1. Reconfigure the node groups so that the test suite begins with only one node group, spanning 3 zones with 1 node running.
  2. Run the CA with leader-election=false.
  3. Deploy and remove workloads and nodes according to each test scenario.
  4. Taint any nodes present before the start of the test suite with the key testing.node.gardener.cloud/initial-node-blocked; this taint is removed once the test suite is done (whether it succeeds or fails). See the manual cleanup sketch below.
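
If a run aborts before cleanup, the taint may be left behind. It can be inspected and removed manually with standard kubectl; this is a sketch with a placeholder node name:

```bash
# Show which nodes still carry taints (blocked nodes will list the
# testing.node.gardener.cloud/initial-node-blocked key).
kubectl --kubeconfig "$TARGET_KUBECONFIG" get nodes \
  -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key

# Remove the taint from a node manually; the trailing "-" deletes the key.
kubectl --kubeconfig "$TARGET_KUBECONFIG" taint nodes <node-name> \
  testing.node.gardener.cloud/initial-node-blocked-
```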

Usage guide for running Cluster Autoscaler integration test suite

  1. Clone the repository to $GOPATH/k8s.io/autoscaler and navigate to the cluster-autoscaler subdirectory.

    cd ./cluster-autoscaler
    
  2. Export the following environment variables (a filled-in example is shown after step 4):

    export CONTROL_NAMESPACE=<Shoot namespace in the Seed>
    export TARGET_KUBECONFIG=<Path to the kubeconfig file of the Shoot>
    export CONTROL_KUBECONFIG=<Path to the kubeconfig file of the Seed (or the control plane where the Cluster Autoscaler & machine deployment objects exist)>
    export KUBECONFIG=<Path to the kubeconfig file of the Shoot>
    export VOLUME_ZONE=<zone with zero nodes in the worker pool "three-zones" where the PV needs to be created to perform test>
    export PROVIDER=<aws/gcp/azure/...>
    
  3. Alternatively, you can run make download-kubeconfigs to download the kubeconfigs into the dev folder, then follow the steps printed in the command's output. (Use this only when working with gardenctl V2 and both clusters are on gardener.)

  4. Invoke the integration tests using make:

    make test-integration
    
    This executes the integration tests; the output looks something like this:
     ```bash
     make test-integration
     ../.ci/local_integration_test
     Starting integration tests...
     Running Suite: Integration Suite
     ================================
     Random Seed: 1642400803
     Will run 1 of 1 specs
    
     Scaling Cluster Autoscaler to 0 replicas
     STEP: Starting Cluster Autoscaler....
     Machine controllers test Trigger scale up by deploying new workload requesting more resources 
     should not lead to any errors and add 1 more node in target cluster
     $GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:71
     STEP: Checking autoscaler process is running
     STEP: Adjusting the NodeGroups for the purpose of tests
     STEP: Deploying workload...
     STEP: Validating Scale up
    
     • [SLOW TEST:130.726 seconds]
     Machine controllers test
     $GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:63
     Trigger scale up
     $GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:69
     	by deploying new workload requesting more resources
     	$GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:70
     	should not lead to any errors and add 1 more node in target cluster
     	$GOPATH/src/github.com/gardener/autoscaler/cluster-autoscaler/test/integration/integration_test.go:71
     ------------------------------
     STEP: Waiting for scale down of nodes to 1
    
     Ran 1 of 1 Specs in 162.800 seconds
     SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped
     PASS
    
     Ginkgo ran 1 suite in 2m45.928186649s
     Test Suite Passed
     ```
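
For reference, a filled-in set of the environment variables from step 2 might look like the following; all values here are hypothetical placeholders:

```bash
export CONTROL_NAMESPACE=shoot--myproject--mycluster   # Shoot namespace in the Seed
export TARGET_KUBECONFIG=$HOME/dev/shoot-kubeconfig.yaml
export CONTROL_KUBECONFIG=$HOME/dev/seed-kubeconfig.yaml
export KUBECONFIG=$HOME/dev/shoot-kubeconfig.yaml
export VOLUME_ZONE=eu-west-1c   # a "three-zones" zone that currently has 0 nodes
export PROVIDER=aws
```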
    

Tests Covered

  1. Deploying a new workload that requests more resources should create a new machine to accommodate it.
  2. Scaling the above workload to 3 replicas should increase the number of machines to 3.
  3. Removing all of the above workload should remove all the newly added machines.
  4. Should not scale up above the max limit of the node group.
  5. Should not scale down below the min limit of the node group.
  6. Should scale up on the basis of taints, not only on workload size.
  7. Should respond by shifting load if a taint is removed from a workload/node.
  8. The autoscaler scales up a node in the zone where a PV is already present and a pod requests that PV (a CSI PV is used).
  9. Should not scale down a node with the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true" (see the sketch after this list).
  10. Should scale down a node after the above annotation is removed from it.
  11. Should not scale up if no machine in any worker group can satisfy the requirements of the pod.
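
Tests 9 and 10 exercise the standard upstream cluster-autoscaler annotation, and the same behaviour can be reproduced manually. This is a sketch with a placeholder node name:

```bash
# Block scale-down of a node; the autoscaler skips annotated nodes.
kubectl --kubeconfig "$TARGET_KUBECONFIG" annotate node <node-name> \
  cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# Remove the annotation again; the trailing "-" deletes the key.
kubectl --kubeconfig "$TARGET_KUBECONFIG" annotate node <node-name> \
  cluster-autoscaler.kubernetes.io/scale-down-disabled-
```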

Planned Tests

  1. shouldn't scale down with underutilized nodes due to host port conflicts
  2. CA ignores unschedulable pods while scheduling schedulable pods. (See line 337 of the upstream suite; the reasoning for it still needs to be understood.)
  3. shouldn't increase cluster size if pending pod is too large
  4. should increase cluster size if pending pods are small
  5. should increase cluster size if pending pods are small and one node is broken
  6. shouldn't trigger additional scale-ups during processing scale-up
  7. should increase cluster size if pending pods are small and there is another node pool that is not autoscaled
  8. should disable node pool autoscaling
  9. should increase cluster size if pods are pending due to host port conflict
  10. should increase cluster size if pod requesting EmptyDir volume is pending
  11. should scale up correct target pool
  12. should add node to the particular mig
  13. should correctly scale down after a node is not needed and one node is broken
  14. should correctly scale down after a node is not needed when there is non autoscaled pool
  15. should be able to scale down when rescheduling a pod is required and pdb allows for it
  16. shouldn't be able to scale down when rescheduling a pod is required, but pdb doesn't allow drain
  17. should be able to scale down by draining multiple pods one by one as dictated by pdb
  18. should be able to scale down by draining system pods with pdb
  19. Should be able to scale a node group up from 0
  20. Should be able to scale a node group down to 0
  21. Shouldn't perform scale up operation and should list unhealthy status if most of the cluster is broken
  22. shouldn't scale up when expendable pod is created
  23. should scale up when non-expendable pod is created
  24. shouldn't scale up when expendable pod is preempted
  25. should scale down when expendable pod is running
  26. shouldn't scale down when non-expendable pod is running
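
In tests 22-26, "expendable" refers to pods whose priority lies below the autoscaler's --expendable-pods-priority-cutoff flag (upstream default: -10). Such a pod can be created via a negative-priority PriorityClass; a minimal sketch, with a hypothetical class name:

```bash
kubectl --kubeconfig "$TARGET_KUBECONFIG" apply -f - <<EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: expendable            # hypothetical name
value: -100                   # below the default cutoff of -10, so pods using
                              # this class are "expendable" to the autoscaler
globalDefault: false
description: Pods the autoscaler may ignore on scale-up and evict on scale-down
EOF
```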

Tests for GPU Pool

  1. Should scale up GPU pool from 0
  2. Should scale up GPU pool from 1
  3. Should not scale GPU pool up if pod does not require GPUs
  4. Should scale down GPU pool from 1
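
These tests hinge on whether a pending pod requests the nvidia.com/gpu extended resource. A minimal sketch of such a request, with hypothetical pod name and image:

```bash
kubectl --kubeconfig "$TARGET_KUBECONFIG" apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-consumer          # hypothetical name
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:11.0.3-base-ubuntu20.04   # hypothetical image
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1     # this request is what makes the CA consider the GPU pool
EOF
```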

For further details, refer to https://github.com/kubernetes/kubernetes/blob/master/test/e2e/autoscaling/cluster_size_autoscaling.go