-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lb: endpointslice controller for externalTrafficPolicyLocal #291
base: main
Are you sure you want to change the base?
Conversation
Hi @BartVerc. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@BartVerc can you split the PR into a pair of commits vendoring and the rest ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really neat PR. I have some concerns about how consistently this behavior of correlating the endpointslices in the tenant with the VM pod in the infra will work though.
This PR is using the tenant endpoint's NodeName to map an endpoint to a VM (excerpt below).
for _, slice := range tenantSlices {
for _, endpoint := range slice.Endpoints {
// find all unique nodes that correspond to an endpoint in a tenant slice
nodeSet.Insert(*endpoint.NodeName)
}
}
The issue here is that the NodeName is an optional field. Is it possible to encounter an endpointslice without a NodeName? For example, a selectorless service with custom endpointslices?
Also, we recently added a new feature to cloud provider kubevirt that allows for the Service that cloud-provider-kubevirt creates in the underlying infra cluster to be "selectorless". This was added as an escapehatch mechanism that allows people to implement their own controllers to manage the mirrored infra services similar to what you've done in this PR. The reason I point that out is, if we find that we can't reliably support the logic you're proposing here for some edge cases, it would be possible for you all to integrate your controller with cloud-provider-kubevirt and use the selectorLess config option. That would let you manage the endpointslices however you see fit and still use cloud-provider-kubevirt.
That is a valid concern, however, a selectorless LoadBalancer service could point to any endpoint, also outside the cluster. If it's inside the cluster it is up to the user to add the NodeName in my opinion. So I don't think an endpointslice without a NodeName is a concern. |
75de3ba
to
0e00a40
Compare
0e00a40
to
f6379e6
Compare
Yeah, I think I get what you're saying.
|
Hey @BartVerc, you can reduce the code a lot if you use controller-runtime lib for this, since we are not really using anything specific from cloud-provider infra, what do you think ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for splitting the PR into multiple commits. That made it way easier to review.
I did a quick pass to try and grok the general idea in more detail. My primary concern from looking at the new controller is that I think it's possible for us to have stale EPS in the infra cluster that do not map to EPS in the tenant.
We have no guarantees that the delete callback will get called when we're watching tenant EPS. If we miss a delete (which definitely happens), then it's possible that we'll have an EPS on the infra that sticks around longer than it should.
any thoughts on how that might be improved?
Another high level comment. Before we consider merging something like this, i'd like to see a basic e2e test case that exercises this controller. something similar to what we have here test/e2e/cloud_provider_kubevirt/load_balancer_test.go
that exercises local traffic policy.
if c.tenantEPSTracker.contains(eps) { | ||
c.tenantEPSTracker.remove(eps) | ||
klog.Infof("get Infra Service for Tenant EndpointSlice: %v/%v", eps.Namespace, eps.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the informer callbacks shouldn't be used to track objects like this. the issue is here is that we're not guaranteed that the DeleteFunc
callback will be executed (the informer list/watch is not guaranteed to see the delete)
this would result in a memory leak.
for _, n := range t.register { | ||
if n == name { | ||
return true | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it looks like you're primarily using the t.register
slice as a way of looking up if an endpoint slice exists, a map would be a more efficient data structure. I bet looping through all these EPS could get kind crazy in a cluster with a lot of endpoints
if c.tenantEPSTracker.contains(eps) { | ||
klog.Infof("get Infra Service for Tenant EndpointSlice: %v/%v", eps.Namespace, eps.Name) | ||
infraSvc, err := c.getInfraServiceFromTenantEPS(context.TODO(), eps) | ||
if err != nil { | ||
klog.Errorf("Failed to get Service in Infra cluster for EndpointSlice %s/%s: %v", eps.Namespace, eps.Name, err) | ||
return | ||
} | ||
klog.Infof("EndpointSlice added: %v/%v", eps.Namespace, eps.Name) | ||
c.queue.Add(newRequest(AddReq, infraSvc, nil)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how accurate changes to tenant endpointSlices are being tracked here.
the c.tenantEPSTracker
object will only contain EPS from the last time the service was reconciled. If new tenant EPS are created, this will not trigger an update on the infra side because the new EPS won't exist in the c.tenantEPSTracker
yet.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /lifecycle rotten |
Signed-off-by: Bart Vercoulen <bartv@kumina.nl>
f6379e6
to
d292e75
Compare
quick update: I found some time to work on this again, but I am having some trouble rebasing it on the |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What this PR does / why we need it:
ExternalTrafficPolicy Local requires traffic to route to specific endpoint nodes instead of routing traffic to all existing nodes. The EndpointSlice Controller looks at the Tenant cluster and updates Endpointslices in the infra cluster accordingly.
Which issue(s) this PR fixes:
Fixes #41
Release note: