fix(eks): can't update authMode with the same mode #31043

Merged on Aug 28, 2024 (5 commits)

Conversation

pahud
Contributor

@pahud pahud commented Aug 6, 2024

The cluster resource handler fails when asked to update the authMode to the mode the cluster already has. This can happen as described in #31032.

We need to check if the cluster is already at the desired authMode and gracefully ignore the update.
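A minimal sketch of that check, assuming the handler can read the cluster's current mode before it issues the update. The names `AuthMode` and `needsAuthModeUpdate` are hypothetical and are not taken from the handler's code; treating an undefined desired mode as "nothing to do" is also an assumption.

```typescript
// Hypothetical names for illustration; this is not the handler's actual code.
type AuthMode = 'CONFIG_MAP' | 'API_AND_CONFIG_MAP' | 'API';

/**
 * Returns true only when an update call is worth making, i.e. when a mode
 * was requested and it differs from the mode the cluster reports.
 */
function needsAuthModeUpdate(
  current: AuthMode | undefined,
  desired: AuthMode | undefined,
): boolean {
  if (desired === undefined) {
    return false; // assumption: no mode requested means nothing to update
  }
  return current !== desired;
}

// The out-of-sync case from the description: the cluster is already at the
// requested mode, so the update is skipped instead of failing.
console.log(needsAuthModeUpdate('API_AND_CONFIG_MAP', 'API_AND_CONFIG_MAP'));
```

Keeping the comparison in a small pure function like this would make the skip decision easy to unit test, even though the surrounding handler is hard to test end to end.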

Issue # (if applicable)

Closes #31032

Reason for this change

Description of changes

Description of how you validated changes

This PR addresses a very specific case described in #31032, for which a unit test or integ test is not easy to write. Instead, I validated it with a manual deployment.

step 1: Deploy a default EKS cluster with authenticationMode left undefined.
step 2: Update the cluster's auth mode and add an S3 bucket that fails to create, which triggers a rollback. At this point the EKS auth mode has been updated but cannot be rolled back, so the resource state is out of sync with CFN.
step 3: Redeploy the same stack without the S3 bucket but with the same auth mode as in step 2. Because the cluster has already changed its auth mode, this step should succeed gracefully.

import {
  App, Stack, StackProps,
  aws_ec2 as ec2,
  aws_s3 as s3,
} from 'aws-cdk-lib';
import * as eks from 'aws-cdk-lib/aws-eks';
import { getClusterVersionConfig } from './integ-tests-kubernetes-version';

interface EksClusterStackProps extends StackProps {
  authMode?: eks.AuthenticationMode;
  withFailedResource?: boolean;
}

class EksClusterStack extends Stack {
  constructor(scope: App, id: string, props?: EksClusterStackProps) {
    super(scope, id, {
      ...props,
      stackName: 'integ-eks-update-authmod',
    });

    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2, natGateways: 1, restrictDefaultSecurityGroup: false });

    const cluster = new eks.Cluster(this, 'Cluster', {
      vpc,
      ...getClusterVersionConfig(this, eks.KubernetesVersion.V1_30),
      defaultCapacity: 0,
      authenticationMode: props?.authMode,
    });

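    if (props?.withFailedResource) {
      // Intentionally failing resource: creating a bucket named 'aws' is expected
      // to fail, which triggers the stack rollback after the cluster update.
      const bucket = new s3.Bucket(this, 'Bucket', { bucketName: 'aws' });
      bucket.node.addDependency(cluster);
    }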
  }
}

const app = new App();

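// Enable exactly one of the three blocks below per deployment. They all target
// the same stackName, and the two update blocks share a construct id, so they
// cannot be instantiated in the same App.

// Step 1: create a simple EKS cluster for the initial deployment
// new EksClusterStack(app, 'create-stack');

// Step 2: first update attempt, with an intentional failure
new EksClusterStack(app, 'update-stack', {
  authMode: eks.AuthenticationMode.API_AND_CONFIG_MAP,
  withFailedResource: true,
});

// Step 3: second update attempt, using the same authMode
// new EksClusterStack(app, 'update-stack', {
//   authMode: eks.AuthenticationMode.API_AND_CONFIG_MAP,
//   withFailedResource: false,
// });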

This was validated in us-east-1.
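The drift produced by steps 1 to 3 can be modeled with a small in-memory toy. This is only a sketch: `ToyCluster`, `rawUpdate`, and `guardedUpdate` are hypothetical names, the starting mode of `CONFIG_MAP` for a cluster created without `authenticationMode` is an assumption, and nothing here calls AWS.

```typescript
// Toy in-memory model; every name here is hypothetical.
type Mode = 'CONFIG_MAP' | 'API_AND_CONFIG_MAP' | 'API';

class ToyCluster {
  // Assumption: a cluster created without authenticationMode behaves as CONFIG_MAP.
  actual: Mode = 'CONFIG_MAP';

  // Mimics an update call that is rejected when the cluster is already at the mode.
  rawUpdate(desired: Mode): void {
    if (desired === this.actual) {
      throw new Error(`cluster is already at ${desired}`);
    }
    this.actual = desired;
  }

  // Guarded variant: "already at the desired mode" is treated as a no-op.
  guardedUpdate(desired: Mode): 'updated' | 'skipped' {
    if (desired === this.actual) {
      return 'skipped';
    }
    this.rawUpdate(desired);
    return 'updated';
  }
}

const cluster = new ToyCluster();          // step 1: default cluster
cluster.rawUpdate('API_AND_CONFIG_MAP');   // step 2: mode changes, then the stack rolls back
// The rollback cannot revert the mode, so cluster.actual stays API_AND_CONFIG_MAP.

let rawFailed = false;
try {
  cluster.rawUpdate('API_AND_CONFIG_MAP'); // step 3 without the guard
} catch {
  rawFailed = true;
}
const guardedResult = cluster.guardedUpdate('API_AND_CONFIG_MAP'); // step 3 with the guard
console.log(rawFailed, guardedResult);
```

In this model the unguarded step 3 throws while the guarded one reports a skip, which is the behavior the manual deployment was meant to confirm.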

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added bug This issue is a bug. effort/medium Medium work item – several days of effort p2 labels Aug 6, 2024
@aws-cdk-automation aws-cdk-automation requested a review from a team August 6, 2024 19:36
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Aug 6, 2024
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.

A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed add Clarification Request to a comment.

@pahud pahud marked this pull request as ready for review August 23, 2024 18:12
@@ -247,6 +247,13 @@ export class ClusterResourceHandler extends ResourceHandler {
        this.newProps.accessConfig?.authenticationMode === 'API') {
        throw new Error('Cannot update from CONFIG_MAP to API');
      }
      // update-authmode will fail if we try to update to the same mode,
      // so skip in this case.
      const cluster = (await this.eks.describeCluster({ name: this.clusterName })).cluster;
Contributor

Should we wrap this API call in a try/catch, so that network issues or any other failure here don't break the customer's handler code?

Contributor Author

@GavinZZ just added the try/catch and manually re-validated it in us-east-1. It deployed with no error.
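A hedged sketch of what wrapping the describe call could look like. The thread does not show what the catch block does, so rethrowing with added context is an assumption, and `ClusterClient` and `isAlreadyAtAuthMode` are hypothetical stand-ins for the handler's own EKS client and logic.

```typescript
// Hypothetical, simplified client shape; the real handler uses its own EKS client.
interface DescribeClusterResult {
  cluster?: { accessConfig?: { authenticationMode?: string } };
}

interface ClusterClient {
  describeCluster(req: { name: string }): Promise<DescribeClusterResult>;
}

/**
 * Resolves to true when the cluster already reports the desired mode.
 * A failed describe call is rethrown with context, which is one possible
 * policy and not necessarily what the merged change does.
 */
async function isAlreadyAtAuthMode(
  client: ClusterClient,
  name: string,
  desired?: string,
): Promise<boolean> {
  try {
    const { cluster } = await client.describeCluster({ name });
    return desired !== undefined
      && cluster?.accessConfig?.authenticationMode === desired;
  } catch (e) {
    const reason = e instanceof Error ? e.message : String(e);
    throw new Error(`Unable to describe cluster ${name}: ${reason}`);
  }
}
```

A caller would skip the auth mode update when this resolves to `true` and continue with the update otherwise. Whether a describe failure should abort the update or fall through to it is a design choice the thread leaves open.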

@GavinZZ GavinZZ added the pr-linter/exempt-test The PR linter will not require test changes label Aug 26, 2024
@GavinZZ
Contributor

GavinZZ commented Aug 26, 2024

Discussed offline. Integration and unit tests are hard to write for this change. We tested it manually by deploying a stack and verifying the result.

@GavinZZ GavinZZ added the pr-linter/exempt-integ-test The PR linter will not require integ test changes label Aug 28, 2024
@GavinZZ
Contributor

GavinZZ commented Aug 28, 2024

@mergify update

Contributor

mergify bot commented Aug 28, 2024

update

❌ Mergify doesn't have permission to update

For security reasons, Mergify can't update this pull request. Try updating locally.
GitHub response: refusing to allow a GitHub App to create or update workflow .github/workflows/request-cli-integ-test.yml without workflows permission

@aws-cdk-automation aws-cdk-automation dismissed their stale review August 28, 2024 00:06

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

Contributor

mergify bot commented Aug 28, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 9efaa1e
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit 64df08b into aws:main Aug 28, 2024
9 checks passed

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2024
Labels
bug: This issue is a bug.
contribution/core: This is a PR that came from AWS.
effort/medium: Medium work item – several days of effort
p2
pr-linter/exempt-integ-test: The PR linter will not require integ test changes
pr-linter/exempt-test: The PR linter will not require test changes
Development

Successfully merging this pull request may close these issues.

eks: authentication mode failed to update
3 participants