Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ambient: support explicit EndpointSlice endpoints #51591

Merged
merged 2 commits into from
Jun 24, 2024

Conversation

howardjohn
Copy link
Member

@howardjohn howardjohn commented Jun 14, 2024

This adds support for reading EndpointSlice addresses in ambient. This works by getting all non-pod addresses and sending them over.

The #1 use case is the kubernetes.default.svc service. This has a test case, and it was intentionally broken in istio/ztunnel@636c69c. That is why this PR does not have any new e2e tests; they already exist and are broken with ztunnel from HEAD (ztunnel has not update as a result).

Fixes #51539

@howardjohn howardjohn added the release-notes-none Indicates a PR that does not require release notes. label Jun 14, 2024
@istio-testing
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 14, 2024
This is for direct endpoint-slice objects. Note this is required after istio/ztunnel#1137
otherwise kubernetes.default.svc is broken. However, this is useful
beyond that for correctness.
@howardjohn howardjohn changed the title ambient/endpoint slices ambient: support explicit EndpointSlice endpoints Jun 17, 2024
@howardjohn howardjohn marked this pull request as ready for review June 17, 2024 19:26
@howardjohn howardjohn requested a review from a team as a code owner June 17, 2024 19:26
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 17, 2024
@istio-policy-bot
Copy link

🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test?

Courtesy of your friendly test nag.

Copy link
Contributor

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is the security impact - left few cosmetic comments as well.

With pods - K8S has some restrictions on the CIDR range and manages the allocation (there are some issues there as well probably).

What is the impact of a malicious user creating a ServiceEntry with an IP of an external service or non-local K8S Service ( from other clsuters ) or even a cluster local service ?

Would be great if the code had more comments - in particular around sensitive areas like possible abuse and why it's safe against that.

Do we want to have some extra protections - like allowing user to turn off watching arbitrary SE, or restricting the IP range ?

WorkloadServices krt.Collection[model.ServiceInfo],
) krt.TransformationMulti[*discovery.EndpointSlice, model.WorkloadInfo] {
return func(ctx krt.HandlerContext, es *discovery.EndpointSlice) []model.WorkloadInfo {
// We only care about EndpointSlices that are for a Service.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ? I think ES can be used without Service to represent endpoints that may not have a Service associated ( but something else ?). Don't we want to apply the same rules as we do for pods ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can represent them, but there is no way for Istio to know what to do with them. They are just data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So from the perspective of 'workload to workload' communication - we treat them as plain text ?
Instead of for example checking if they have a HBONE port, generating some rule to check spiffe://.../ns/namespace/sa/*, etc ?

That's related to my security question on how we control malicious IPs - but on the opposite side, assuming ES is strictly locked with RBAC/OPA - and user is using them to represent non-Pod workloads that may or may not use Service but are clearly getting associated with a namespace and have port info ( which in turn may allow inferring if they use HBONE).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: not sure I fully understand all ambient code, but will this only add endpoints to Service - or will it have any impact on the workload-to-workload communication ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it even mean to have an EndpointSlice read by a proxy that does not have a service associated? There is no other proxies doing this (kube-proxy, istio sidecars, etc).

it doesn't have defined semantics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well - kube-proxy implement Service so I assume watches for ES associated with Service.

I don't know exactly if istio-sidecars don't 'accidentally' do some label selections or have other means to take EndpointSlices not associated with a Service but with a ServiceEntry or WorkloadGroup or something similar - after all we did in the past use selector-less Services and Endpoints to represent VMs.

And for 'others' - and semantics - I don't know. It looks like K8S went pretty far in making ES not require a Service - they could have easily added a field and make it required/validated. The semantics are 'a bag of IPs with labels and port info' - just like Gateway API can attach to many things except Service, no clue of ES is not associated with something else, like a ServiceImport or some VM-like resource.

Not saying I know a specific use - just that we should at least comment the code and make it clear that ES for other things may exist and we choose to ignore them - and perhaps warn if we find them.

Similar comment for the other things we silently ignore - IPs that are not valid ( hostnames - I have not seen the deprecation announce and what will happen with them), etc.

Compared with WorkloadEntry - SE provides about the same capabilities but with a more standard interface and some interoperability. Istio is pushing the boundaries with ServiceEntry and WE - we can't expect nobody else does.

That doesn't mean I'm against this change - just trying to understand the risks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selector-less Services and Endpoints to represent VMs.

Key word there is Service... these are ones associated with a service.

In theory we could do something with the arbitrary ones, but I have no clue what. They do not have useful information to us -- no service account, etc.

health = workloadapi.WorkloadStatus_HEALTHY
}
addresses, err := slices.MapErr(ep.Addresses, func(e string) ([]byte, error) {
n, err := netip.ParseAddr(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ES allows anything else ( UDS, etc ) ? The err seems to drop all endpoints even if just one fails, not sure that's what we want ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it does not. They must be IP addresses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK FQDN is also allowed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point. It is deprecated now I think though. Ill make sure to more explicitly filter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I commented on is the handling - if I read the code correctly, one bad entry will cause the entire set to be ignored. Maybe I missread.

Error or some varz may be better than silently ignoring.

@costinm
Copy link
Contributor

costinm commented Jun 18, 2024 via email

@howardjohn
Copy link
Member Author

added more comments

@howardjohn howardjohn requested a review from costinm June 18, 2024 16:13
@howardjohn howardjohn added the cherrypick/release-1.22 Set this label on a PR to auto-merge it to the release-1.22 branch label Jun 19, 2024
@howardjohn
Copy link
Member Author

blocking #51592

// no service found
return nil
}
// There SHOULD only be one. This is only Service which has unique hostnames.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: debug level warn when there are more than 1

@stevenctl
Copy link
Contributor

Could be nice to have a really basic unit test for the kubernetes.default.svc case

@istio-testing istio-testing merged commit 99d3d4c into istio:master Jun 24, 2024
28 checks passed
@istio-testing
Copy link
Collaborator

In response to a cherrypick label: #51591 failed to apply on top of branch "release-1.22":

Applying: ambient: support reading workloads from EndpointSlice
Using index info to reconstruct a base tree...
M	pilot/pkg/serviceregistry/kube/controller/ambient/workloads.go
Falling back to patching base and 3-way merge...
Auto-merging pilot/pkg/serviceregistry/kube/controller/ambient/workloads.go
CONFLICT (content): Merge conflict in pilot/pkg/serviceregistry/kube/controller/ambient/workloads.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 ambient: support reading workloads from EndpointSlice
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

@istio-testing
Copy link
Collaborator

In response to a cherrypick label: new issue created for failed cherrypick: #51690

howardjohn added a commit to howardjohn/istio that referenced this pull request Jun 24, 2024
* ambient: support reading workloads from EndpointSlice

This is for direct endpoint-slice objects. Note this is required after istio/ztunnel#1137
otherwise kubernetes.default.svc is broken. However, this is useful
beyond that for correctness.

* add comments

(cherry picked from commit 99d3d4c)
howardjohn added a commit to howardjohn/istio that referenced this pull request Jun 24, 2024
* ambient: support reading workloads from EndpointSlice

This is for direct endpoint-slice objects. Note this is required after istio/ztunnel#1137
otherwise kubernetes.default.svc is broken. However, this is useful
beyond that for correctness.

* add comments

(cherry picked from commit 99d3d4c)
istio-testing pushed a commit that referenced this pull request Jun 25, 2024
* ambient: support reading workloads from EndpointSlice

This is for direct endpoint-slice objects. Note this is required after istio/ztunnel#1137
otherwise kubernetes.default.svc is broken. However, this is useful
beyond that for correctness.

* add comments

(cherry picked from commit 99d3d4c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherrypick/release-1.22 Set this label on a PR to auto-merge it to the release-1.22 branch release-notes-none Indicates a PR that does not require release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test and support manual EndpointSlices in ambient
5 participants