Troubleshooting

NAP Scale Up

kubectl get events --field-selector involvedObject.name=<pending-pod>
LAST SEEN   TYPE      REASON          OBJECT                          MESSAGE
12s         Warning   FailedScaleUp   pod/pod-test-5b97f7c978-h9lvl   Node scale up in zones associated with this pod failed: Internal error. Pod is at risk of not being scheduled

The root cause could be that the serial-port-logging-enable project metadata is set to false, or that auto_upgrade is disabled for auto-provisioned node pools. See: https://cloud.google.com/kubernetes-engine/docs/troubleshooting/troubleshooting-autopilot-clusters#scale-up-failed-serial-port-logging

Solution:

gcloud compute project-info add-metadata \
    --metadata serial-port-logging-enable=true
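
To confirm that the metadata change took effect, a check along these lines may help. It is a minimal sketch: the function name serial_port_logging_enabled is hypothetical, and it assumes gcloud prints pretty-printed JSON with each item's "key" on the line directly before its "value".

```shell
# Returns 0 when the project metadata has serial-port-logging-enable=true.
# Assumption: `--format=json` output is pretty-printed, with "key" on the
# line immediately before "value" for each metadata item.
serial_port_logging_enabled() {
  gcloud compute project-info describe --format=json \
    | grep -A1 '"key": "serial-port-logging-enable"' \
    | grep -qi '"value": "true"'
}
```

Calling serial_port_logging_enabled && echo "enabled" prints "enabled" only when the key is present and set to true.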

If the issue is related to auto_upgrade, ensure that the cluster's node auto-provisioning defaults enable auto-upgrade:

gcloud container clusters update CLUSTER_NAME \
    --enable-autoprovisioning --enable-autoprovisioning-autorepair \
    --enable-autoprovisioning-autoupgrade
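
The resulting defaults can be read back from the cluster description. The sketch below is illustrative only: the function name nap_auto_upgrade_enabled is hypothetical, it assumes the setting is exposed at autoscaling.autoprovisioningNodePoolDefaults.management.autoUpgrade, and it relies on the default location configured in gcloud.

```shell
# Returns 0 when node auto-provisioning defaults have auto-upgrade enabled.
# Assumption: the field path below matches the cluster description, and the
# cluster location comes from the active gcloud configuration.
nap_auto_upgrade_enabled() {
  cluster="$1"
  value="$(gcloud container clusters describe "$cluster" \
    --format='value(autoscaling.autoprovisioningNodePoolDefaults.management.autoUpgrade)')"
  case "$value" in
    [Tt]rue) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, nap_auto_upgrade_enabled CLUSTER_NAME || echo "auto-upgrade is off" reports when the defaults still need updating.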

If the failure persists, recreate the cluster. The pod events should then report a more specific error, for example:

kubectl describe pod <pending-pod>
  Normal   TriggeredScaleUp   35s   cluster-autoscaler  pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/cnp-demo-dev/zones/us-central1-a/instanceGroups/gke-substratus-nap-n1-standard-4-gpu1-dab8d858-grp 0->1 (max: 1000)}]
  Warning  FailedScaleUp      13s   cluster-autoscaler  Node scale up in zones us-central1-a associated with this pod failed: GCE out of resources. Pod is at risk of not being scheduled.
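
The two FailedScaleUp messages shown in this section point to different causes, so a small helper can route each one to a next step. This is a sketch: the function name scale_up_hint and the hint strings are this example's own wording, and reading "GCE out of resources" as a zone capacity shortage is an assumption based on the message text.

```shell
# Maps a FailedScaleUp event message to a suggested next step.
# The hint strings are illustrative, not output produced by GKE.
scale_up_hint() {
  case "$1" in
    *"GCE out of resources"*)
      echo "zone capacity: retry later or allow other zones or machine types" ;;
    *"Internal error"*)
      echo "check serial-port-logging-enable and autoprovisioning auto-upgrade" ;;
    *)
      echo "unrecognized: inspect the full event message" ;;
  esac
}

# Example call with a message copied from the events above:
# scale_up_hint "Node scale up in zones us-central1-a associated with this pod failed: GCE out of resources."
```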