
lb: endpointslice controller for externalTrafficPolicyLocal #291

Open · wants to merge 1 commit into base: main

Conversation

BartVerc

What this PR does / why we need it:
ExternalTrafficPolicy Local requires traffic to be routed only to nodes that host an endpoint, instead of to all nodes in the cluster. The EndpointSlice controller watches the tenant cluster and updates the EndpointSlices in the infra cluster accordingly.

Which issue(s) this PR fixes:
Fixes #41

Release note:

Added the enableEPSController load balancer config flag to support ExternalTrafficPolicy Local.

@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Mar 12, 2024
@kubevirt-bot
Contributor

Hi @BartVerc. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign qinqon for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@qinqon
Contributor

qinqon commented Mar 12, 2024

@BartVerc can you split the PR into a pair of commits, vendoring and the rest?

Member

@davidvossel davidvossel left a comment


This is a really neat PR. I have some concerns, though, about how consistently this behavior of correlating the EndpointSlices in the tenant with the VM pod in the infra cluster will work.

This PR is using the tenant endpoint's NodeName to map an endpoint to a VM (excerpt below).

for _, slice := range tenantSlices {
	for _, endpoint := range slice.Endpoints {
		// find all unique nodes that correspond to an endpoint in a tenant slice
		nodeSet.Insert(*endpoint.NodeName)
	}
}

The issue here is that the NodeName is an optional field. Is it possible to encounter an endpointslice without a NodeName? For example, a selectorless service with custom endpointslices?
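For illustration only, a minimal guard (a hedged sketch, not code from this PR) could skip endpoints whose NodeName is unset rather than dereferencing a nil pointer:

for _, slice := range tenantSlices {
	for _, endpoint := range slice.Endpoints {
		// NodeName is optional; skip endpoints that do not carry one
		// instead of dereferencing a nil pointer.
		if endpoint.NodeName == nil || *endpoint.NodeName == "" {
			continue
		}
		nodeSet.Insert(*endpoint.NodeName)
	}
}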

Also, we recently added a new feature to cloud-provider-kubevirt that allows the Service that cloud-provider-kubevirt creates in the underlying infra cluster to be "selectorless". This was added as an escape-hatch mechanism that lets people implement their own controllers to manage the mirrored infra services, similar to what you've done in this PR. The reason I point that out is that, if we find we can't reliably support the logic you're proposing here for some edge cases, you could integrate your controller with cloud-provider-kubevirt and use the selectorless config option. That would let you manage the EndpointSlices however you see fit and still use cloud-provider-kubevirt.

@BartVerc
Author

The issue here is that the NodeName is an optional field. Is it possible to encounter an endpointslice without a NodeName? For example, a selectorless service with custom endpointslices?

That is a valid concern; however, a selectorless LoadBalancer Service could point to any endpoint, including endpoints outside the cluster. If the endpoint is inside the cluster, it is in my opinion up to the user to set the NodeName.
For endpoints created via a Service with a selector, the Kubernetes endpointslice controller derives the NodeName from pod.Spec.NodeName, which is set once the pod is running.
Also, when a tenant wants to create a selectorless LoadBalancer Service, it can always decide to set the ExternalTrafficPolicy to Cluster instead of Local.

So I don't think an endpointslice without a NodeName is a concern.

@kubevirt-bot kubevirt-bot added dco-signoff: no Indicates the PR's author has not DCO signed all their commits. and removed dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Mar 26, 2024
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Mar 26, 2024
@davidvossel
Member

That is a valid concern; however, a selectorless LoadBalancer Service could point to any endpoint, including endpoints outside the cluster. If the endpoint is inside the cluster, it is in my opinion up to the user to set the NodeName.
For endpoints created via a Service with a selector, the Kubernetes endpointslice controller derives the NodeName from pod.Spec.NodeName, which is set once the pod is running.
Also, when a tenant wants to create a selectorless LoadBalancer Service, it can always decide to set the ExternalTrafficPolicy to Cluster instead of Local.

Yeah, I think I get what you're saying.

NodeName is optional, but not setting NodeName on an endpointslice when externalTrafficPolicy is set to Local wouldn't make a lot of sense, so it doesn't seem to impact your logic in practice.

@qinqon
Contributor

qinqon commented Apr 4, 2024

Hey @BartVerc, you could reduce the code a lot by using the controller-runtime library for this, since we are not really using anything specific from the cloud-provider infra. What do you think?
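For reference, a minimal controller-runtime skeleton (a hedged sketch with hypothetical names; the real controller would also need separate clients for the tenant and infra clusters) could look like this:

package main

import (
	"context"

	discoveryv1 "k8s.io/api/discovery/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// EPSReconciler is a hypothetical reconciler for tenant EndpointSlices.
type EPSReconciler struct {
	client.Client
}

func (r *EPSReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var eps discoveryv1.EndpointSlice
	if err := r.Get(ctx, req.NamespacedName, &eps); err != nil {
		// The slice may already be gone; ignore not-found errors.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// Mirroring the slice into the infra cluster would happen here.
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&discoveryv1.EndpointSlice{}).
		Complete(&EPSReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}

With this shape, the work queue, informer wiring, and rate limiting from the PR would be handled by the manager, leaving only the Reconcile body as custom code.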

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 5, 2024
Member

@davidvossel davidvossel left a comment


Thanks for splitting the PR into multiple commits. That made it way easier to review.

I did a quick pass to try and grok the general idea in more detail. My primary concern from looking at the new controller is that I think it's possible for us to have stale EPS in the infra cluster that do not map to EPS in the tenant.

We have no guarantees that the delete callback will get called when we're watching tenant EPS. If we miss a delete (which definitely happens), then it's possible that we'll have an EPS on the infra that sticks around longer than it should.

any thoughts on how that might be improved?

Another high-level comment: before we consider merging something like this, I'd like to see a basic e2e test case that exercises this controller, something similar to what we have in test/e2e/cloud_provider_kubevirt/load_balancer_test.go, but exercising the local traffic policy.
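For illustration, one way to address the stale-slice concern above is a periodic resync that lists the slices mirrored into the infra cluster and prunes any whose tenant counterpart no longer exists. A hedged sketch, assuming a hypothetical ownership label and hypothetical annotations recording the tenant namespace and name of each mirrored slice:

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pruneStaleSlices deletes mirrored EndpointSlices in the infra cluster whose
// tenant counterpart has disappeared, so missed delete events are eventually
// cleaned up.
func pruneStaleSlices(ctx context.Context, infra, tenant kubernetes.Interface, infraNS string) error {
	mirrored, err := infra.DiscoveryV1().EndpointSlices(infraNS).List(ctx, metav1.ListOptions{
		// hypothetical label set by the controller on every slice it creates
		LabelSelector: "cloud-provider-kubevirt/managed-by=eps-controller",
	})
	if err != nil {
		return err
	}
	for _, slice := range mirrored.Items {
		// hypothetical annotations recording which tenant slice this mirrors
		tenantNS := slice.Annotations["tenant-namespace"]
		tenantName := slice.Annotations["tenant-name"]
		_, err := tenant.DiscoveryV1().EndpointSlices(tenantNS).Get(ctx, tenantName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			// The tenant slice is gone; remove the stale mirror.
			if err := infra.DiscoveryV1().EndpointSlices(infraNS).Delete(ctx, slice.Name, metav1.DeleteOptions{}); err != nil {
				return err
			}
			continue
		}
		if err != nil {
			return err
		}
	}
	return nil
}

Run on a timer, or as part of every Service reconcile, this bounds how long a stale slice can survive a missed delete event.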

Comment on lines +176 to +178
if c.tenantEPSTracker.contains(eps) {
c.tenantEPSTracker.remove(eps)
klog.Infof("get Infra Service for Tenant EndpointSlice: %v/%v", eps.Namespace, eps.Name)
Member


The informer callbacks shouldn't be used to track objects like this. The issue here is that we're not guaranteed that the DeleteFunc callback will be executed (the informer list/watch is not guaranteed to see the delete).

This would result in a memory leak.

Comment on lines +92 to +96
for _, n := range t.register {
if n == name {
return true
}
}
Member


Since it looks like you're primarily using the t.register slice as a way of looking up whether an endpoint slice exists, a map would be a more efficient data structure. I bet looping through all these EPS could get kind of crazy in a cluster with a lot of endpoints.
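For illustration, a hedged sketch of that change (hypothetical field and method shapes, modelled on the tracker in the PR), using a map as a set:

import "sync"

// tenantEPSTracker keeps tenant EndpointSlice names in a map used as a set,
// so membership checks are O(1) instead of a linear scan over a slice.
type tenantEPSTracker struct {
	mu       sync.Mutex
	register map[string]struct{}
}

func (t *tenantEPSTracker) add(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.register == nil {
		t.register = make(map[string]struct{})
	}
	t.register[name] = struct{}{}
}

func (t *tenantEPSTracker) contains(name string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	_, ok := t.register[name]
	return ok
}

func (t *tenantEPSTracker) remove(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.register, name)
}

k8s.io/apimachinery/pkg/util/sets offers the same behaviour out of the box, if pulling in that helper is acceptable.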

Comment on lines +150 to +159
if c.tenantEPSTracker.contains(eps) {
klog.Infof("get Infra Service for Tenant EndpointSlice: %v/%v", eps.Namespace, eps.Name)
infraSvc, err := c.getInfraServiceFromTenantEPS(context.TODO(), eps)
if err != nil {
klog.Errorf("Failed to get Service in Infra cluster for EndpointSlice %s/%s: %v", eps.Namespace, eps.Name, err)
return
}
klog.Infof("EndpointSlice added: %v/%v", eps.Namespace, eps.Name)
c.queue.Add(newRequest(AddReq, infraSvc, nil))
}
Member


I'm not sure how accurately changes to tenant EndpointSlices are being tracked here.

The c.tenantEPSTracker object will only contain EPS from the last time the Service was reconciled. If new tenant EPS are created, this will not trigger an update on the infra side, because the new EPS won't exist in c.tenantEPSTracker yet.
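For illustration, one way around a tracker populated on a previous reconcile is to derive the owning Service directly from the slice's standard kubernetes.io/service-name label on every event. A hedged sketch with a hypothetical handler name, reusing the PR's getInfraServiceFromTenantEPS helper and work queue (assumes the usual context, klog, and k8s.io/api/discovery/v1 imports):

// enqueueForTenantEPS maps any tenant EndpointSlice event to its owning
// Service via the standard kubernetes.io/service-name label, so newly
// created slices are picked up without a separately maintained tracker.
func (c *Controller) enqueueForTenantEPS(eps *discoveryv1.EndpointSlice) {
	svcName, ok := eps.Labels[discoveryv1.LabelServiceName]
	if !ok {
		// Not managed by a Service; nothing to mirror.
		return
	}
	infraSvc, err := c.getInfraServiceFromTenantEPS(context.TODO(), eps)
	if err != nil {
		klog.Errorf("Failed to get infra Service for tenant Service %s/%s: %v", eps.Namespace, svcName, err)
		return
	}
	c.queue.Add(newRequest(AddReq, infraSvc, nil))
}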

@kubevirt-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 9, 2024
@kubevirt-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubevirt-bot kubevirt-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 8, 2024
Signed-off-by: Bart Vercoulen <bartv@kumina.nl>
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 5, 2024
@BartVerc
Author

BartVerc commented Sep 6, 2024

Quick update: I found some time to work on this again, but I am having some trouble rebasing it on the main branch.
I started a new branch from main, which seems to work better. I'm not sure whether this will end up correctly in this PR, so I may need to open a new one.

@kvaps
Member

kvaps commented Sep 26, 2024

Hi @BartVerc, I refactored your PR and prepared #330 and #331.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2024
@kubevirt-bot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels: dco-signoff: yes · lifecycle/rotten · needs-rebase · size/XXL
Successfully merging this pull request may close these issues: Support Local ExternalTrafficPolicy (#41)
5 participants