-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not allow EC2 instance ID NotFound to succeed tagging #674
Conversation
|
This issue is currently awaiting triage. If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Welcome @ndbaker1! |
Hi @ndbaker1. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
if isAWSErrorInstanceNotFound(err) { | ||
klog.Infof("Couldn't find resource when trying to tag it hence skipping it, %v", err) | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we return this error instead of silencing it, the workItem will be re-queued and presumably we'll successfully tag the instance after the API becomes consistent?
How long did we observe that to take in this scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, that's the intention 👍
from the logs we have for this event, CCM executes the tagging work item almost exactly when the instance launches. given that when experimenting through cli
ID=$(aws ec2 run-instances --image-id ami-07d07d65c47e5aa90 --instance-type t2.micro --query Instances[0].InstanceId --output text)
aws ec2 create-tags --resources $ID --tags Key=test,Value=value
aws ec2 describe-tags --filters Name=resource-id,Values=$ID
you pretty much can't encounter the issue, i think any amount of retry in place would fix the issue
/ok-to-test |
Change looks fine to me. IIUC, silencing this error in #448 was just an optimization; if there is an errant This will show up in our error metrics, but I think that's appropriate. |
/retest |
/release-note-none |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good change for better reliability. Thanks @ndbaker1!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hakman The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…pstream-release-1.28 [release-1.28] Automated cherry pick of #674: do not allow ec2 instance ID not found in tagging path
…pstream-release-1.25 [release-1.25] Automated cherry pick of #674: do not allow ec2 instance ID not found in tagging path
…pstream-release-1.24 [release-1.24] Automated cherry pick of #674: do not allow ec2 instance ID not found in tagging path
…pstream-release-1.27 [release-1.27] Automated cherry pick of #674: do not allow ec2 instance ID not found in tagging path
…pstream-release-1.26 [release-1.26] Automated cherry pick of #674: do not allow ec2 instance ID not found in tagging path
…pstream-release-1.23 [release-1.23] Automated cherry pick of #674: do not allow ec2 instance ID not found in tagging path
What type of PR is this?
/kind bug
What this PR does / why we need it:
Removes the graceful handling of
InvalidInstanceID.NotFound
error when attempting totag
an ec2 instance that has not fully come up. This has caused an issue where we've seen the tagging controller misleadingly exit successfully, not actually tagging the instance, and does not re-queue the item to (ideally) execute again once the instance becomes visible.example log feed:
This behavior does satisfy the
untag
action, since removing the tag from a non-existing instance is a no-op, so no changes need to be made there.Its worth mentioning the initial PR to gracefully handle this (#448) aimed to fix all cases discussed in issue #444 where the untracked
InvalidInstanceID.NotFound
errors were valid failure modes in the context of instance termination.Which issue(s) this PR fixes: N/A
Special notes for your reviewer: N/A
Does this PR introduce a user-facing change?: