Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixup(sg-resolver): Allow multiple SGs with the same Name tag #3775

Merged
merged 2 commits into from
Aug 16, 2024

Conversation

alloveras
Copy link
Contributor

@alloveras alloveras commented Jul 22, 2024

Issue

I looked but couldn't find any open issues about the problem this pull request intends to solve.

Description

The current implementation of ResolveViaNameOrID incorrectly assumes that searching security groups by the Name tag will always yield a single resource per searched name. This is incorrect because AWS allows multiple security groups in the same VPC to share the same Name tag.

In practice, this issue manifests in the form of a rather cryptic runtime error:

Events:
  Type     Reason             Age                    From       Message
  ----     ------             ----                   ----       -------
  Warning  FailedBuildModel   9m (x207 over 2d2h)    ingress    Failed build model due to couldn't find all securityGroups, nameOrIDs: [some-sg-name], found: [sg-xxxxx sg-yyyyy sg-zzzzz]"

Our particular use case is the following:

We have a large set of IP ranges that we need to allow ingress from in our ALBs. Unfortunately, the number of CIDR ranges we need to allow lists is bigger than the default maximum quota of 60 ingress rules per security group. To remediate that, we created multiple security groups all with different IDs but with the same tag:Name=allow-ingress-from-xyz, and spread the CIDR ranges across them.

We have some automation in place to create/delete these security group resources when the number of allowed CIDR ranges changes. Hence, we cannot hard-code the security group IDs in the alb.ingress.kubernetes.io/security-groups annotation. Instead, we would like the controller to dynamically resolve the security group IDs to attach to the ALB from the specified security group names in the annotation.

In the current implementation, the approach described above works if and only if all the security group names specified in the annotation resolve to a single security group ID (1-to-1 mapping). Otherwise, the controller errors out with the error message shown above.

The approach I've taken to solve this problem is the following:

  • For ID-based lookups, it is correct to assume that the number of security group IDs requested needs to match with the number of security group IDs found. That is because, within a VPC, security group IDs are unique.

  • For name-based lookups, each given name needs to match one or more security groups.

To implement that, I moved the "response correctness" logic from the ResolveViaNameOrID to the respective resolveViaGroupID and resolveViaGroupName so that it can be slightly different on each search strategy.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the docs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

Copy link

linux-foundation-easycla bot commented Jul 22, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Copy link
Contributor

Welcome @alloveras!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 22, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @alloveras. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 22, 2024
@alloveras alloveras force-pushed the main branch 2 times, most recently from c6e87d1 to 451d9b9 Compare July 22, 2024 07:30
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 22, 2024
@shraddhabang
Copy link
Collaborator

Thank for your contribution. We will start reviewing this.
/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 24, 2024
@shraddhabang
Copy link
Collaborator

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2024
@shraddhabang
Copy link
Collaborator

/assign @oliviassss @wweiwei-li

sgs, err := r.ec2Client.DescribeSecurityGroupsAsList(ctx, req)
if err != nil {
return nil, err
}

resolvedSGNames := make([]string, 0, len(sgs))
for _, sg := range sgs {
Copy link
Collaborator

@oliviassss oliviassss Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add nil check for the sgs and sg.Tags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Golang specification tells us that these would be unnecessarily redundant.

A "for" statement with a "range" clause iterates through all entries of an array, slice, string or map, values received on a channel, or integer values from zero to an upper limit [Go 1.22]. For each entry it assigns iteration values to corresponding iteration variables if present and then executes the block.
...
For an array, pointer to array, or slice value a, the index iteration [...] For a nil slice, the number of iterations is 0.

Source: https://go.dev/ref/spec#For_range

Example: https://go.dev/play/p/vBe3S9lbhvh

That said, if you feel strongly about having them or is a common pattern in the codebase I am more than happy to add an extra commit with the additional nil checks 😉

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah for the sgs it was fine, but how about sg.Tags? I know in this function we are filtering by tags, so the returned sgs should have Tags field, but just in case if some code changes in the future and we miss this, it will error out because the controller does not check whether there's a Tag spec.

Copy link
Contributor Author

@alloveras alloveras Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but how about sg.Tags?

I think for sg.Tags the reasoning is the the same 🤔? Also, if the listing code changes and the returned SecurityGroup struct removes/renames the Tags field, that will produce a compiler error and, hence, we will detect it immediately. Am I missing something?

args: args{
nameOrIDs: []string{
"sg group one",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep a negative test case for the mix name&ids ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 12a7953 to add the negative test for mix name & ids as well as improve error reporting a little bit.

pkg/networking/security_group_resolver.go Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2024
@oliviassss
Copy link
Collaborator

/lgtm
/approve
this needs a minor release, and we need to call out in the release note too. Thanks. CC: @shraddhabang

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alloveras, oliviassss, shraddhabang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit e1d32f4 into kubernetes-sigs:main Aug 16, 2024
5 checks passed
@alloveras
Copy link
Contributor Author

this needs a minor release, and we need to call out in the release note too.

Hey @oliviassss,
I just noticed that a new patch release got recently issued and it does not include the fix in this PR. Do I need to do anything (Github comment, others) to ensure that this change will be included in future releases?

Thank you!

@shraddhabang
Copy link
Collaborator

Hey @alloveras , The patch release v2.8.3 was done just to address a CVE. We have planned to inclde these changes in v2.9.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants