
feat(ecr-assets): simplify docker asset publishing #5733

Merged
merged 5 commits into from
Jan 16, 2020

Conversation

eladb
Contributor

@eladb eladb commented Jan 9, 2020

As part of our work on continuous delivery for CDK apps (#3437), we decided to store all docker assets in a single ECR repository per environment. This is consistent with file assets, which are stored in a single S3 bucket that is part of the bootstrap stack. This environment-specific ECR repository uses the well-known physical name `aws-cdk/assets` and will be created automatically if needed. In the future, this repository will be created as part of the environment bootstrapping process.

The primary change is that the location of docker image assets is now fully determined by the framework. The ECR repository name is hard-coded to `aws-cdk/assets` and the image tag is based on the source hash of the docker asset. This means we could get rid of the CloudFormation parameter that the CLI used to assign the image name, which helps reduce the number of asset parameters (#3463).

Since the asset ECR repository now contains different images (and not versions of the same image), there is no longer a concept of a "latest" image, and the optimization triggered by the `--ci` flag in the CLI (pulling "latest" before building) is no longer relevant. Luckily, CodeBuild now supports docker image layer caching, so that should be the preferred way to optimize docker build times. The `--ci` flag of the CLI no longer does anything.

Furthermore, before this change, a custom resource called `AdoptedRepository` was automatically added to the stack for each asset in order to clean up ECR repositories. The purpose of this resource was to delete the asset's ECR repository if the asset was no longer referenced by the stack. To address this need with the centralized repository, we plan to introduce a garbage collection capability that users will be able to invoke to clean up unused assets from both ECR and S3.

We will introduce a way to customize asset repository names as part of the CI/CD project. In the meantime, if you need to override the default "aws-cdk/assets" name, you can specify a repository name through the context key `assets-ecr-repository-name` (`--context` in the CLI, `cdk.json`, `new App({ context })` or `stack.setContext`).
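
For example, a minimal sketch of setting this context at the app level (the context key comes from the paragraph above; the repository name and stack are illustrative):

    import { App, Stack } from '@aws-cdk/core';

    // Override the default "aws-cdk/assets" repository name for all docker
    // image assets synthesized by this app.
    const app = new App({
      context: {
        'assets-ecr-repository-name': 'my-org/cdk-assets',
      },
    });

    new Stack(app, 'MyStack');
    app.synth();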

BACKWARDS COMPATIBILITY

As per our guidelines for backwards compatibility, the CLI must remain backwards compatible with apps from before this change. However, apps that use CDK >= 1.21.0 will require an upgrade of the CLI. Therefore, to introduce this change, we have made the following non-breaking changes in `cx-api`:

  1. Make `imageNameParameter` optional. If it is specified, the CLI will continue to use the old behavior and assign the image name through this CloudFormation parameter.
  2. Add an optional `imageTag` which instructs the CLI what tag to use for the image. If omitted (by previous versions), `latest` will be used as before. (A simplified sketch of the resulting metadata shape follows this list.)
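
A simplified sketch of the resulting container image asset metadata shape in `cx-api` (only the fields discussed here are shown; the real interface carries more fields):

    // Simplified sketch of the cx-api container image asset metadata.
    interface ContainerImageAssetMetadataEntry {
      // Old code path: the CloudFormation parameter carrying the image name.
      // Now optional; omitted by apps built with CDK >= 1.21.0.
      imageNameParameter?: string;

      // New code path: the framework-determined repository name
      // (e.g. "aws-cdk/assets").
      repositoryName?: string;

      // New optional tag based on the source hash; older CLI versions that
      // do not know about it fall back to "latest".
      imageTag?: string;
    }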

To make it easy to reason about the behavior for old apps, the CLI now has a new implementation of `prepareContainerAsset` called `prepareContainerImageAssetNew`. This new code path is triggered when the asset metadata does not include `imageNameParameter`. The new implementation requires that both `repositoryName` and `imageTag` be defined. The old code path was only modified to support the new optional `imageTag` property (although it is unlikely to be exercised).
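
Conceptually, the dispatch between the two code paths looks something like this (a simplified sketch against the metadata shape above, not the actual CLI code):

    // Stubs for the two implementations described above (sketch only).
    declare function prepareContainerAsset(asset: ContainerImageAssetMetadataEntry, tag: string): Promise<void>;
    declare function prepareContainerImageAssetNew(asset: ContainerImageAssetMetadataEntry): Promise<void>;

    // Pick the code path based on the presence of "imageNameParameter".
    async function prepareAsset(asset: ContainerImageAssetMetadataEntry): Promise<void> {
      if (asset.imageNameParameter === undefined) {
        // New-style asset: the framework fully determined the location.
        if (!asset.repositoryName || !asset.imageTag) {
          throw new Error('"repositoryName" and "imageTag" are required when "imageNameParameter" is omitted');
        }
        return prepareContainerImageAssetNew(asset);
      }

      // Old-style asset: legacy behavior, defaulting the tag to "latest".
      return prepareContainerAsset(asset, asset.imageTag || 'latest');
    }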

Additional backwards compatibility concerns:

  • New ECR repositories created by the CLI will not have the lifecycle policy that retains only the last 5 docker images. This should not have a functional impact on users, and will be addressed by the upcoming garbage collection project.
  • The removal of the `AdoptedRepository` resource from all stacks will result in the deletion of all previously created ECR repositories (this is what the `AdoptedRepository` resource is designed to do). This can be harmful, since these repositories are still referenced by the stack. To address this, we invalidate the image ID by salting the source hash (see the sketch after this list). This means that after this change, all container images will have a new ID, which is no longer maintained by the removed `AdoptedRepository` resource.
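
Salting here just means mixing a fixed value into the directory fingerprint so that every asset gets a fresh identity; a minimal sketch of the idea (the actual salt value and hashing code in the framework may differ):

    import * as crypto from 'crypto';

    // Mixing a fixed salt into the fingerprint changes every source hash
    // once, detaching new images from the repositories that the removed
    // AdoptedRepository resources are about to delete.
    function saltedSourceHash(directoryFingerprint: string): string {
      const SALT = 'v2'; // hypothetical value; the real salt may differ
      return crypto.createHash('sha256')
        .update(SALT)
        .update(directoryFingerprint)
        .digest('hex');
    }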

TESTING

  • Unit tests for `prepareContainerImage` were duplicated and extended to exercise the new code path while preserving tests for the old path.
  • All CLI integration tests were executed successfully against the new version.
  • Manually tested that the new CLI works with old apps.

This change also fixes #5807 so that custom docker file names are treated as relative paths rather than absolute paths.

BREAKING CHANGE: all docker image assets are now pushed to a single ECR repository named `aws-cdk/assets` with an image tag based on the hash of the docker build source directory (the directory where your `Dockerfile` resides). See PR #5733 for details and discussion.


  • Update version to 1.22.0

@eladb eladb requested a review from iliapolo January 9, 2020 13:46
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Jan 9, 2020
@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jogold
Contributor

jogold commented Jan 9, 2020

and the image tag will be based on the source hash of the docker asset.

This means that it's only based on the "content" of the Dockerfile and not its "build" result. Let's say I have a Dockerfile starting with FROM node:12: when a patch/minor version of Node.js is released, when and how does my stack (task definition) get updated?

@iliapolo
Contributor

iliapolo commented Jan 9, 2020

@eladb Are we sure we want to give up the ability to use the latest tag?

I agree we shouldn't be using it for caching purposes, but using latest to pull the latest version is a very common practice in the docker world, especially during the development of images.

Actually, while the current model of multiple ECR repositories is more complex (for us) - it (*almost) accurately represents a docker registry as the world expects it. Usually, a repository only contains versioned artifacts of the same image, and not different images.

For example: https://hub.docker.com/r/nginx/unit/tags (the repository is nginx/unit)

It also makes looking at those image URLs very awkward, since now the URL will look like:

{accountId}.ecr.{region}.amazonaws.com/aws-cdk/assets:{nodeid?}.{hash} (Right?)

It means I have to look at the right-hand side of the colon (:) to understand what this image is, which is very unnatural.

I guess it boils down to how users are actually using this. If they expect ECR to correctly map to a regular docker registry, I think we should stick to the current modeling. Also, this feels like a one-way door, since there is no way of supporting latest with a single repo. I also want to make sure we aren't rearchitecting for simplicity at the expense of user experience.
I mean, how much is the user actually affected by having these multiple repositories?

I guess maybe this comment belongs in the initial CI/CD spec where we decided to do this change.

Thoughts?

*almost: Why? Because we currently only push the latest tag, without a strict version using a hash. In reality, the images in each repo should look like:

{accountId}.ecr.{region}.amazonaws.com/cdk/{nodeId}:latest
{accountId}.ecr.{region}.amazonaws.com/cdk/{nodeId}:{hash4}
{accountId}.ecr.{region}.amazonaws.com/cdk/{nodeId}:{hash3}
{accountId}.ecr.{region}.amazonaws.com/cdk/{nodeId}:{hash2}
{accountId}.ecr.{region}.amazonaws.com/cdk/{nodeId}:{hash1}

Where hash4 is the same as latest, assuming we want to keep only 5 images.

@eladb
Contributor Author

eladb commented Jan 12, 2020

@jogold wrote:

This means that it's only based on the "content" of the Dockerfile and not its "build" result. Let's say I have a Dockerfile starting with FROM node:12: when a patch/minor version of Node.js is released, when and how does my stack (task definition) get updated?

Yes, that's what it means, and it's intentional. In a CI/CD world, there's a foundational tenet that the only trigger for a change is a commit to the repo. To that end, if you want to update your image, you will need to make a modification to the Dockerfile. It's not an uncommon practice to include some label in the Dockerfile that gets updated manually (or automatically by some external process) and then triggers a rebuild of the layer.

If you want to automatically rebuild the docker image on every "cdk deploy", you could also salt the source hash by adding something like the current timestamp in `extraHash` when you define the image asset. Do you think we should support this as a first-class feature (`rebuildAlways: true`)?
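
For illustration, a minimal sketch of that workaround (the stack class, construct ID and directory are hypothetical):

    import * as path from 'path';
    import { Construct, Stack } from '@aws-cdk/core';
    import { DockerImageAsset } from '@aws-cdk/aws-ecr-assets';

    class AlwaysRebuiltStack extends Stack {
      constructor(scope: Construct, id: string) {
        super(scope, id);

        // Salting the fingerprint with the current time makes the source
        // hash unique per synthesis, forcing a rebuild and push on every
        // "cdk deploy".
        new DockerImageAsset(this, 'RebuiltOnEveryDeploy', {
          directory: path.join(__dirname, 'image'),
          extraHash: Date.now().toString(),
        });
      }
    }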

@eladb
Contributor Author

eladb commented Jan 12, 2020

@iliapolo asked

Are we sure we want to give up the ability to use the latest tag?

The CDK is not a docker image release system. I consider image assets in the CDK to be "implementation details" of constructs. They are not designed to be consumed as individual artifacts, only by the CDK constructs that defined them. To that end, the use case of consuming a "latest" CDK image asset is not really viable.

If people want to publish docker images, there are great tools for that, and they could 100% consume those images through the CDK, just not model them as docker image assets.

Hope that makes sense. I'll see if I can make this clearer in the CI/CD RFC.

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@hoegertn
Contributor

@eladb wrote:

if you want to update your image, you will need to make a modification to the Dockerfile

Hi, while I agree that only a repo change should trigger a build, that change does not need to be in the Dockerfile. If I change my application code that is copied into the image, that should be enough. I don't want to change the Dockerfile just for the sake of deployment.

@iliapolo
Contributor

@eladb wrote:

They are not designed to be consumed as individual artifacts, only by the CDK constructs that defined them

Yep, this makes sense, it's exactly what I was asking when wondering how users are actually using this. 👍

Peripheral thought

Users might be using asset.repository.repositoryUri + ":latest" to always pull the latest version. I agree this isn't a best practice, but it's still something we currently support and it needs to be addressed as a breaking change. You mentioned losing latest in the RFC, but only with regard to caching; perhaps add this clarification to the RFC as well?

@eladb
Contributor Author

eladb commented Jan 12, 2020

@iliapolo wrote:

but it's still something we currently support and it needs to be addressed as a breaking change. You mentioned losing latest in the RFC, but only with regard to caching; perhaps add this clarification to the RFC as well?

Users might rely on this behavior, but it's not something we officially support. It's not documented anywhere, nor is it part of any official API. Having said that, we should call it out in "BREAKING CHANGES" to raise awareness.

Added to PR description.

@eladb
Contributor Author

eladb commented Jan 12, 2020

@hoegertn wrote:

While I agree that only a repo change should trigger a build, that change does not need to be in the Dockerfile. If I change my application code that is copied into the image, that should be enough. I don't want to change the Dockerfile just for the sake of deployment.

Mmm... it's not just the Dockerfile, it's the entire docker context directory (usually the same directory as the Dockerfile). Usually, if your app changes, a rebuild will update some artifact within the Dockerfile directory, and that will invalidate the source hash. Does that make sense, or am I missing something in your workflow?
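
To illustrate with a hypothetical layout (all names made up): if the asset directory contains the built application artifact, any app change flows into the source hash even when the Dockerfile itself is untouched:

    import * as path from 'path';
    import { Construct } from '@aws-cdk/core';
    import { DockerImageAsset } from '@aws-cdk/aws-ecr-assets';

    // Hypothetical layout: ./app-image contains the Dockerfile plus the
    // artifact that the application build produces, e.g.
    //
    //   app-image/
    //     Dockerfile   (contains: COPY bundle.js /srv/)
    //     bundle.js    (rebuilt by the app's build before "cdk deploy")
    //
    // The source hash covers the entire directory, so a changed bundle.js
    // yields a new tag and a new image push.
    function addAppImage(scope: Construct): DockerImageAsset {
      return new DockerImageAsset(scope, 'AppImage', {
        directory: path.join(__dirname, 'app-image'),
      });
    }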

@hoegertn
Contributor

@eladb wrote:

it's not just the Dockerfile, it's the entire docker context directory (usually the same directory as the Dockerfile)

That sounds great. I had understood that only the hash of the Dockerfile would matter, which would have been a problem.

@jogold
Contributor

jogold commented Jan 12, 2020

Yes, that's what it means, and it's intentional. In a CI/CD world, there's a foundational tenet that the only trigger for a change is a commit to the repo.

Agreed, and this is an improvement compared to what we have right now, where deploys are not guaranteed to be "pure". Maybe the fact that images are no longer rebuilt on each deploy should be clearly documented, perhaps as a BREAKING CHANGE?

Also, by using source hashes we are still not 100% "pure", since when deploying to a new region the asset will always get built, potentially resulting in a different image in that new region. To be documented somewhere?

If you want to automatically rebuild the docker image on every "cdk deploy", you could also salt the source hash by adding something like the current timestamp in `extraHash` when you define the image asset. Do you think we should support this as a first-class feature (`rebuildAlways: true`)?

Yes, this would be a nice feature.

@eladb eladb force-pushed the benisrae/container-asset-changes-2 branch from fdabd45 to 209b1e4 on January 15, 2020 14:41
@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@eladb eladb added the pr/do-not-merge This PR should not be merged at this time. label Jan 16, 2020
Contributor

@rix0rrr rix0rrr left a comment


Approved modulo tiny changes

@eladb eladb removed the pr/do-not-merge This PR should not be merged at this time. label Jan 16, 2020
@mergify
Contributor

mergify bot commented Jan 16, 2020

Thank you for contributing! Your pull request is now being automatically merged.

@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Contributor

mergify bot commented Jan 16, 2020

Thank you for contributing! Your pull request is now being automatically merged.

@mergify mergify bot merged commit b52b43d into master Jan 16, 2020
@mergify mergify bot deleted the benisrae/container-asset-changes-2 branch January 16, 2020 12:49
@skorfmann
Contributor

skorfmann commented Jan 23, 2020

I'm wondering what this means for IAM permissions. Here's an example which we're using at the moment:

    import * as path from 'path';
    import * as ecs from '@aws-cdk/aws-ecs';
    import { DockerImageAsset } from '@aws-cdk/aws-ecr-assets';

    const taskDefinition = new ecs.FargateTaskDefinition(this, 'FooDefinition', {
      memoryLimitMiB: 2048,
      cpu: 1024,
    });

    const asset = new DockerImageAsset(this, 'FooEcrImage', {
      directory: path.join(__dirname, 'assets'),
    });

    // Grant the task role permission to pull the image from the asset repository.
    asset.repository.grantPull(taskDefinition.taskRole);

(snippet from here https://gist.github.com/skorfmann/8da4eb64845e10f5937655520d53ac14#file-docker-image-asset-ts-L13-L22)

If I understand this change correctly, each principal that is granted access to the central ECR repository (aws-cdk/assets) will be able to pull all images. How is that aligned with the principle of least privilege?

@caruso-billfire

caruso-billfire commented Jan 24, 2020

If I am understanding this correctly, if you have N Dockerfiles, they will all be pushed to the same ECR repo now?

I feel like service limits were not thought about in this decision. For organizations wanting to use the CDK that also have a large number of Docker images to store, this construct will lead to service-limit issues.

https://docs.amazonaws.cn/en_us/AmazonECR/latest/userguide/service-quotas.html

Also, how would you reference the image outside of the CDK? Is that use case even supposed to be supported? What would the best practices for that be?

@eladb
Contributor Author

eladb commented Jan 27, 2020

@caruso-billfire see #5970 and #5971
@skorfmann see #5972

@shellscape

IMO it was a mistake to remove this functionality without also providing backwards compatibility in the form of --cache-from and --cache-to support.

Labels
contribution/core This is a PR that came from AWS.
Development

Successfully merging this pull request may close these issues.

docker assets: cannot build docker images outside the source tree (i.e. against a cdk.out directory)
9 participants