
efs - ecs : Cannot re-mount an existing efs #26537

Open
ETisREAL opened this issue Jul 27, 2023 · 16 comments
Labels
@aws-cdk/aws-efs Related to Amazon Elastic File System bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@ETisREAL

ETisREAL commented Jul 27, 2023

Describe the bug

Hi, I hope this finds you well. I am trying to mount an existing EFS to a Redis ECS task. Everything works smoothly on the first creation, but there is no luck when trying to remount the same file system, which fails with a puzzling error.

Expected Behavior

I should be able to remount the EFS; after all, what is the point of RemovalPolicy.RETAIN otherwise?

Current Behavior

This is my code:

const qmmTasksEfsSecurityGroup = new ec2.SecurityGroup(this, `${props.STAGE}qmmTasksEfsSecurityGroup`, {
            vpc: props.vpc,
            securityGroupName: `${props.STAGE}qmmTasksEfsSecurityGroup`
        })

        let qmmTasksEfs: efs.IFileSystem
        let qmmRedisEfsAccessPoint: efs.IAccessPoint
        let qmmMongoEfsAccessPoint: efs.IAccessPoint

        // if (true) {
        if (props.recreated) {

            qmmTasksEfs = new efs.FileSystem(this, `${props.STAGE}qmmTasksEfs`, {
                fileSystemName: `${props.STAGE}qmmTasksEfs`,
                vpc: props.vpc,
                removalPolicy: cdk.RemovalPolicy.RETAIN,
                securityGroup: qmmTasksEfsSecurityGroup,
                encrypted: true,
                lifecyclePolicy: efs.LifecyclePolicy.AFTER_30_DAYS,
                enableAutomaticBackups: true
            })
            
            new cdk.CfnOutput(this, 'QlashMainClusterEFSID', {
                exportName: 'QlashMainClusterEFSID',
                value: qmmTasksEfs.fileSystemId
            })
    
            qmmRedisEfsAccessPoint = new efs.AccessPoint(this, `${props.STAGE}qmmRedisAccessPoint`, {
                fileSystem: qmmTasksEfs,
                path: '/redis',
                createAcl: {
                    ownerGid: '1001',
                    ownerUid: '1001',
                    permissions: '750'
                },
                posixUser: {
                    uid: '1001',
                    gid: '1001'
                }
            })

            qmmRedisEfsAccessPoint.applyRemovalPolicy(cdk.RemovalPolicy.RETAIN)
    
            new cdk.CfnOutput(this, 'QlashMainClusterRedisAccessPointID', {
                exportName: 'QlashMainClusterRedisAccessPointID',
                value: qmmRedisEfsAccessPoint.accessPointId
            })
    
            qmmMongoEfsAccessPoint = new efs.AccessPoint(this, `${props.STAGE}qmmMongoAccessPoint`, {
                fileSystem: qmmTasksEfs,
                path: '/mongodb',
                createAcl: {
                    ownerGid: '1002',
                    ownerUid: '1002',
                    permissions: '750'
                },
                posixUser: {
                    uid: '1002',
                    gid: '1002'
                }
            })

            qmmMongoEfsAccessPoint.applyRemovalPolicy(cdk.RemovalPolicy.RETAIN)

            new cdk.CfnOutput(this, 'QlashMainClusterMongoAccessPointID', {
                exportName: 'QlashMainClusterMongoAccessPointID',
                value: qmmMongoEfsAccessPoint.accessPointId
            })

        } else {

            qmmTasksEfs = efs.FileSystem.fromFileSystemAttributes(this, `${props.STAGE}qmmTasksEfs`, {
                securityGroup: qmmTasksEfsSecurityGroup,
                fileSystemId: config.QlashMainClusterEFSID
            })

            qmmRedisEfsAccessPoint = efs.AccessPoint.fromAccessPointId(this, `${props.STAGE}qmmRedisAccessPoint`, config.QlashMainClusterRedisAccessPointID)

            qmmMongoEfsAccessPoint = efs.AccessPoint.fromAccessPointId(this, `${props.STAGE}qmmMongoAccessPoint`, config.QlashMainClusterMongoAccessPointID)
        }



        // Redis

        const qmmRedisServiceSecurityGroup = new ec2.SecurityGroup(this, `${props.STAGE}qmmRedisSecurityGroup`, {
            vpc: props.vpc,
            securityGroupName: `${props.STAGE}qmmRedisSecurityGroup`
        })
        
        qmmTasksEfsSecurityGroup.addIngressRule(
            ec2.Peer.securityGroupId(qmmRedisServiceSecurityGroup.securityGroupId),
            ec2.Port.tcp(2049),
            'Allow inbound traffic from qmm_redis to qmmTasksEfs'
        )

        if (props.qlashMainInstanceSecurityGroup) {
            qmmRedisServiceSecurityGroup.addIngressRule(
                ec2.Peer.securityGroupId(props.qlashMainInstanceSecurityGroup.securityGroupId),
                ec2.Port.tcp(6379),
                'Allow inbound traffic to qmm_redis from qmmMain instance'
            )
        }

        qmmRedisServiceSecurityGroup.addIngressRule(
            ec2.Peer.ipv4(props.vpc.vpcCidrBlock),
            ec2.Port.tcp(6379),
            'Allow inbound traffic to qmm_redis from resources in qlashMainClusterVpc'
        )

        const qmmRedisTaskDefinition = new ecs.FargateTaskDefinition(this, `${props.STAGE.toLowerCase()}qmmRedisTask`, {
            cpu: 2048,
            memoryLimitMiB: 8192,
            volumes: [
                {
                    name: `${props.STAGE.toLowerCase()}_qmm_redis_volume`,
                    efsVolumeConfiguration: {
                        fileSystemId: qmmTasksEfs.fileSystemId,
                        transitEncryption: 'ENABLED',
                        authorizationConfig: {
                            accessPointId: qmmRedisEfsAccessPoint.accessPointId,
                            iam: 'ENABLED'
                        }
                    }
                }
            ]
        })

        qmmRedisTaskDefinition.addToTaskRolePolicy(
            new iam.PolicyStatement({
                actions: [
                    'elasticfilesystem:ClientWrite',
                    'elasticfilesystem:ClientMount',
                    'elasticfilesystem:ClientRootAccess',
                    'elasticfilesystem:DescribeMountTargets',
                    'elasticfilesystem:CreateAccessPoint',
                    'elasticfilesystem:DeleteAccessPoint'
                ],
                resources: [qmmTasksEfs.fileSystemArn],
            })
        )

        qmmRedisTaskDefinition.addToTaskRolePolicy(
            new iam.PolicyStatement({
                actions: [
                    'elasticfilesystem:DescribeAccessPoints',
                    'elasticfilesystem:DescribeFileSystems'
                ],
                resources: ["*"],
            })
        )

        qmmRedisTaskDefinition.addToTaskRolePolicy(
            new iam.PolicyStatement({
                actions: ['ec2:DescribeAvailabilityZones'],
                resources: ['*']
            })
        )

Reproduction Steps

When running the following code to remount the EFS, you get this error:

ResourceInitializationError: failed to invoke EFS utils commands to set up EFS volumes: stderr: Failed to resolve "fs-006afd6cee7891114.efs.eu-central-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail. Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first. : unsuccessful EFS utils command execution; code: 1

What really seems strange is this part:

Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first

Possible Solution

I don't even know whether this is something on the CDK side or an internal error from EFS itself.

Additional Information/Context

I've tried giving the task permissions on everything, just to check whether it was a permission issue, but to no avail.

CDK CLI Version

2.88

Framework Version

No response

Node.js Version

v18.15.0

OS

Linux - Ubuntu

Language

Typescript

Language Version

No response

Other information

No response

@ETisREAL ETisREAL added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jul 27, 2023
@github-actions github-actions bot added the @aws-cdk/aws-efs Related to Amazon Elastic File System label Jul 27, 2023
@peterwoodworth
Contributor

Have you tried researching the service errors? e.g. https://repost.aws/knowledge-center/fargate-unable-to-mount-efs

This doesn't look like a CDK issue at first glance, but rather either a configuration issue or possibly a service bug. We can't rule anything out yet, though; I'm just curious how much you've looked into it and double-checked the configuration.

@ETisREAL
Author

@peterwoodworth I will try the troubleshooting procedures indicated in the link. So far I've tried granting all IAM permissions to the task (just to see if the issue was there), and I've also retained the security group, which (as expected, honestly) didn't make a difference. As for the task, I am running the same container as in the first creation, a redis:alpine instance.

I doubt this issue is CDK related myself. Where should I bring it up, though? Is there a specific AWS forum for each service?

@peterwoodworth
Contributor

I doubt this issue is CDK related myself. Where should I bring it up, though? Is there a specific AWS forum for each service?

Well, it might be CDK related. I didn't look at this in-depth enough to rule out CDK. Though, I'm not super familiar with these services so I'm not sure without a deep dive.

Another place to get help is AWS Premium Support, or AWS re:Post.

Let me know if you are able to figure out where the error is coming from or if you've been able to unblock

@ETisREAL
Author

ETisREAL commented Aug 5, 2023

@peterwoodworth as of now, still stuck. Will let you know if I figure this out. Thank you Peter

@peterwoodworth
Contributor

Sorry, what exactly is it that you mean by "remount" the file system?

@ETisREAL
Author

@peterwoodworth yeah, sorry, it is vague. Basically I mean reusing, or reattaching, an existing EFS (one that had RemovalPolicy.RETAIN set, for instance) when launching a stack, so that the ECS tasks that were using said EFS can mount it again through the same access point and retrieve the data. Does that make sense?

@peterwoodworth
Contributor

Sorry for the delay @ETisREAL,

I'm not exactly sure what you mean. If you provide clear repro steps, including the code deployed at each step, it would be really helpful, especially if it's a full reproduction that's as minimized as possible.

@github-actions

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@ETisREAL
Author

ETisREAL commented Sep 10, 2023

No worries @peterwoodworth Thanks for the help either way.

In order to reproduce it:

  1. Launch this stack first:

cdk deploy "StackName"

const qmmTasksEfsSecurityGroup = new ec2.SecurityGroup(this, `qmmTasksEfsSecurityGroup`, {
            vpc: props.vpc,
            securityGroupName: `qmmTasksEfsSecurityGroup`
        })

        let qmmTasksEfs: efs.IFileSystem
        let qmmRedisEfsAccessPoint: efs.IAccessPoint

        if (true) {
            qmmTasksEfs = new efs.FileSystem(this, `qmmTasksEfs`, {
                fileSystemName: `qmmTasksEfs`,
                vpc: props.vpc,
                removalPolicy: cdk.RemovalPolicy.RETAIN,
                securityGroup: qmmTasksEfsSecurityGroup,
                encrypted: true,
                lifecyclePolicy: efs.LifecyclePolicy.AFTER_30_DAYS,
                enableAutomaticBackups: true
            })
            
            new cdk.CfnOutput(this, 'QlashMainClusterEFSID', {
                exportName: 'QlashMainClusterEFSID',
                value: qmmTasksEfs.fileSystemId
            })
    
            qmmRedisEfsAccessPoint = new efs.AccessPoint(this, `qmmRedisAccessPoint`, {
                fileSystem: qmmTasksEfs,
                path: '/redis',
                createAcl: {
                    ownerGid: '1001',
                    ownerUid: '1001',
                    permissions: '750'
                },
                posixUser: {
                    uid: '1001',
                    gid: '1001'
                }
            })

            qmmRedisEfsAccessPoint.applyRemovalPolicy(cdk.RemovalPolicy.RETAIN)
    
            new cdk.CfnOutput(this, 'QlashMainClusterRedisAccessPointID', {
                exportName: 'QlashMainClusterRedisAccessPointID',
                value: qmmRedisEfsAccessPoint.accessPointId
            })
  
        } else {

            qmmTasksEfs = efs.FileSystem.fromFileSystemAttributes(this, `qmmTasksEfs`, {
                securityGroup: qmmTasksEfsSecurityGroup,
                fileSystemId: config.QlashMainClusterEFSID
            })

            qmmRedisEfsAccessPoint = efs.AccessPoint.fromAccessPointId(this, `qmmRedisAccessPoint`, config.QlashMainClusterRedisAccessPointID)
        }


        // Redis

        const qmmRedisServiceSecurityGroup = new ec2.SecurityGroup(this, `qmmRedisSecurityGroup`, {
            vpc: props.vpc,
            securityGroupName: `qmmRedisSecurityGroup`
        })
        
        qmmTasksEfsSecurityGroup.addIngressRule(
            ec2.Peer.securityGroupId(qmmRedisServiceSecurityGroup.securityGroupId),
            ec2.Port.tcp(2049),
            'Allow inbound traffic from qmm_redis to qmmTasksEfs'
        )

        const qmmRedisTaskDefinition = new ecs.FargateTaskDefinition(this, `qmmRedisTask`, {
            cpu: 2048,
            memoryLimitMiB: 8192,
            volumes: [
                {
                    name: `qmm_redis_volume`,
                    efsVolumeConfiguration: {
                        fileSystemId: qmmTasksEfs.fileSystemId,
                        transitEncryption: 'ENABLED',
                        authorizationConfig: {
                            accessPointId: qmmRedisEfsAccessPoint.accessPointId,
                            iam: 'ENABLED'
                        }
                    }
                }
            ]
        })

        qmmRedisTaskDefinition.addToTaskRolePolicy(
            new iam.PolicyStatement({
                actions: [
                    'elasticfilesystem:ClientWrite',
                    'elasticfilesystem:ClientMount',
                    'elasticfilesystem:ClientRootAccess',
                    'elasticfilesystem:DescribeMountTargets',
                    'elasticfilesystem:CreateAccessPoint',
                    'elasticfilesystem:DeleteAccessPoint'
                ],
                resources: [qmmTasksEfs.fileSystemArn],
            })
        )

        qmmRedisTaskDefinition.addToTaskRolePolicy(
            new iam.PolicyStatement({
                actions: [
                    'elasticfilesystem:DescribeAccessPoints',
                    'elasticfilesystem:DescribeFileSystems'
                ],
                resources: ["*"],
            })
        )

        qmmRedisTaskDefinition.addToTaskRolePolicy(
            new iam.PolicyStatement({
                actions: ['ec2:DescribeAvailabilityZones'],
                resources: ['*']
            })
        )

const qmmRedisContainer = qmmRedisTaskDefinition.addContainer(`qmm_redis`, {
            image: ecs.ContainerImage.fromAsset('redis'),
            containerName: `qmm_redis`,
            portMappings: [{ containerPort: 6379, name: `qmm_redis` }],
            healthCheck: {
                command: ["CMD", "redis-cli", "-h", "localhost", "-p", "6379", "ping"],
                interval: cdk.Duration.seconds(20),
                timeout: cdk.Duration.seconds(20),
                retries: 5
            },
            logging: ecs.LogDriver.awsLogs({streamPrefix: `qmm_redis`, logRetention: RetentionDays.ONE_DAY}),
            command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
        })

        qmmRedisContainer.addMountPoints({
            sourceVolume: `qmm_redis_volume`,
            containerPath: '/redis/data',
            readOnly: false
        })

        const qmmRedisService = new ecs.FargateService(this, `qmmRedisService`, {
            serviceName: `qmmRedisService`,
            cluster: qlashMainCluster,
            desiredCount: 1,
            securityGroups: [qmmRedisServiceSecurityGroup],
            taskDefinition: qmmRedisTaskDefinition
        })
  2. Tear down the stack:

cdk destroy "StackName"

This will retain the EFS and the access point because of cdk.RemovalPolicy.RETAIN.

  3. Redeploy the stack:
    • change the if (true) { to if (false) { to make sure it is using this same EFS, and not creating a new one.
    • hardcode (or import from the cdk.json file, as you prefer) the values of the filesystem ID and access point ID here

cdk deploy "StackName"

qmmTasksEfs = efs.FileSystem.fromFileSystemAttributes(this, `qmmTasksEfs`, {
                securityGroup: qmmTasksEfsSecurityGroup,
                fileSystemId: config.QlashMainClusterEFSID
            })

            qmmRedisEfsAccessPoint = efs.AccessPoint.fromAccessPointId(this, `qmmRedisAccessPoint`, config.QlashMainClusterRedisAccessPointID)

@cosbor11

I'm having this same issue when importing an EFS file system.

@pahud pahud self-assigned this Nov 30, 2023
@pahud
Contributor

pahud commented Nov 30, 2023

Does it only happen when importing or re-using an existing EFS filesystem?

ResourceInitializationError: failed to invoke EFS utils commands to set up EFS volumes: stderr: Failed to resolve "fs-006afd6cee7891114.efs.eu-central-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID. See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail. Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first. : unsuccessful EFS utils command execution; code: 1

After you destroy the stack with the removal policy set to RETAIN, are you still able to see/list this filesystem ID in the EFS console? I'm not sure if this is a bug, but it looks like the filesystem ID becomes unresolvable after the resource is destroyed with the RETAIN removal policy?
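One way to narrow this down from the CLI (a diagnostic sketch, assuming the AWS CLI is configured for the account in question; the filesystem ID and region are taken from the error message above):

```shell
# Confirm the retained filesystem still exists after the stack is destroyed.
aws efs describe-file-systems \
    --file-system-id fs-006afd6cee7891114 \
    --region eu-central-1

# Then check whether it still has mount targets. If "MountTargets" comes back
# empty, the DNS name in the error above has nothing to resolve to, which
# would explain the ResourceInitializationError.
aws efs describe-mount-targets \
    --file-system-id fs-006afd6cee7891114 \
    --region eu-central-1
```

An empty mount-target list with an existing filesystem would point at the mount targets (not the filesystem) being deleted on stack teardown.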

@pahud pahud added p2 effort/medium Medium work item – several days of effort labels Nov 30, 2023
@pahud pahud removed their assignment Nov 30, 2023
@ETisREAL
Author

ETisREAL commented Dec 2, 2023

@pahud yes, I can still see the filesystem in the console and list it by its ID.

@vipulaSD

I'm having the same issue. One observation that hasn't been mentioned in the previous discussion:

When CDK destroys the current stack, it deletes the mount targets from the EFS.

@ETisREAL
Author

ETisREAL commented Jan 10, 2024

I'm having the same issue. One observation that hasn't been mentioned in the previous discussion:

When CDK destroys the current stack, it deletes the mount targets from the EFS.

Even when you explicitly set the removal policy to RETAIN?
ex.

qmmRedisEfsAccessPoint.applyRemovalPolicy(cdk.RemovalPolicy.RETAIN)

Because I haven't noticed that when I set the policy to RETAIN.

@vipulaSD

vipulaSD commented Jan 10, 2024

I'm having the same issue. One observation that hasn't been mentioned in the previous discussion:
When CDK destroys the current stack, it deletes the mount targets from the EFS.

Even when you explicitly set the removal policy to RETAIN? ex.

qmmRedisEfsAccessPoint.applyRemovalPolicy(cdk.RemovalPolicy.RETAIN)

Because I haven't noticed that when I set the policy to RETAIN.

Yes, I have set the removal policy for the access point; the access point is retained, but the entries in the "Network" tab get deleted.

UPDATE: I have manually created the entries in the Network tab. After that, the service starts as expected.

@srshi

srshi commented Jan 26, 2024

I had the same issue. In my code I don't retain the VPC, so it makes sense: mount targets have ENIs, and deleting the VPC would fail if those ENIs were still attached. So I solved this issue by creating new mount targets with CfnMountTarget when I reuse the file system. Almost the same method as @vipulaSD described.
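A minimal sketch of that workaround, assuming the imported filesystem ID (config.QlashMainClusterEFSID), the VPC, and the qmmTasksEfsSecurityGroup from the code earlier in this thread; the subnet selection and construct IDs are illustrative, and this would run inside the same stack scope:

```typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as efs from 'aws-cdk-lib/aws-efs';

// When reusing a retained filesystem, recreate one mount target per subnet
// the ECS tasks run in, since the originals were deleted with the old stack.
const { subnets } = props.vpc.selectSubnets({
    subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
});

subnets.forEach((subnet, i) => {
    new efs.CfnMountTarget(this, `qmmTasksEfsMountTarget${i}`, {
        fileSystemId: config.QlashMainClusterEFSID,
        subnetId: subnet.subnetId,
        securityGroups: [qmmTasksEfsSecurityGroup.securityGroupId],
    });
});
```

With the mount targets back in each AZ, the `fs-….efs.<region>.amazonaws.com` DNS name from the error should resolve again and the Fargate task should be able to mount the volume.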
