Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure removal of security group rules on deleting load balancers #752

Merged

Conversation

JoelSpeed
Copy link
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:

This PR updates the load balancer security group update logic for ELBs so that we can actually delete a security group when there is an untagged group present.

In the current flow, we can add rules to the security group if it is untagged, but we cannot remove them since the logic excludes untagged groups from the actualGroups list. When we are deleting the load balancer and deleting the security group, we need to make sure that we remove all rules that refer to the security group else the security group deletion will reach a dependency violation deadlock.

To ensure compatibility with BYO security groups, the code ensures that we only pass the isDeleting parameter as true when the existing logic determines that the load balancer should be removing the security group already. This should mean that we only do the full removal of all references when we are about to delete the security group, and if the security group is being left over, we won't remove any references - I don't think in the BYO security group case we have any way to track what we have added so I can't fix that bug here.

Which issue(s) this PR fixes:

Fixes #566

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

When removing a load balancer, the service controller will now remove all security group rules referencing the load balancer's security group, even when the security group containing the rule is unmanaged.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 23, 2023
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 23, 2023
@cartermckinnon
Copy link
Contributor

We're recommending that folks run the separate load balancer controller: https://github.com/kubernetes-sigs/aws-load-balancer-controller

Instead of using the legacy service controller here. Do you see this issue with that controller as well?

@JoelSpeed
Copy link
Contributor Author

As far as I know, this bug only exists in the classic load balancer code path, which has no parallel in the load balancer controller. Until recently, NLBs didn't even support security groups right?

While I appreciate that there's a new way to do things and you can recommend moving people across, the CLB has no equivalent support and as such, still needs to be supported here, and this is a fairly major bug that requires users to manually enter the AWS console/CLI to remove resources to be able to clean up their environment. Else they are leaked.

Note as well that the controllers logic today will happily remove the load balancer and service object, leaking the resources and not even giving the user any indication that resources were leaked, I think this is pretty bad behaviour for a controller like this.

@cartermckinnon
Copy link
Contributor

/assign @M00nF1sh @kishorj

can y'all take a look at this?

@cartermckinnon
Copy link
Contributor

/retest

CI should be fixed now

@JoelSpeed JoelSpeed force-pushed the remove-sg-rules-untagged-groups branch from 60fe9aa to 09471b9 Compare December 6, 2023 15:49
@JoelSpeed
Copy link
Contributor Author

Any of the maintainers able to give this bug a review please?

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 15, 2024
@JoelSpeed
Copy link
Contributor Author

Any indication that this might be an acceptable patch? The bug is still present

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 7, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Copy link
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@JoelSpeed
Copy link
Contributor Author

/reopen

Any AWS maintainers able to review this bug?

@k8s-ci-robot k8s-ci-robot reopened this Jul 8, 2024
@k8s-ci-robot
Copy link
Contributor

@JoelSpeed: Reopened this PR.

In response to this:

/reopen

Any AWS maintainers able to review this bug?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kmala
Copy link
Member

kmala commented Jul 31, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 31, 2024
@kmala
Copy link
Member

kmala commented Jul 31, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 31, 2024
@kmala
Copy link
Member

kmala commented Aug 5, 2024

changes mostly looks good, can we do a rebase?

@JoelSpeed JoelSpeed force-pushed the remove-sg-rules-untagged-groups branch from 09471b9 to 4ee1d55 Compare August 15, 2024 10:30
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2024
@JoelSpeed
Copy link
Contributor Author

The E2E failures here don't look to be related as far as I can tell, @kmala do you agree or do I need to look deeper?

@kmala
Copy link
Member

kmala commented Aug 15, 2024

The E2E failures here don't look to be related as far as I can tell, @kmala do you agree or do I need to look deeper?

they are happening for other PR's also #1016 (comment) . Looking at the reason for the issue.

@kmala
Copy link
Member

kmala commented Aug 19, 2024

@JoelSpeed can you rebase such that the e2e works as its been fixed

@JoelSpeed JoelSpeed force-pushed the remove-sg-rules-untagged-groups branch from 4ee1d55 to 912f047 Compare August 20, 2024 08:08
@JoelSpeed
Copy link
Contributor Author

I think Prow is supposed to handle that for you, but I've rebased anyway

@kmala
Copy link
Member

kmala commented Aug 20, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 20, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kmala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2024
@k8s-ci-robot k8s-ci-robot merged commit d7e05d5 into kubernetes:master Aug 20, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Could not delete security groups
7 participants