Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support TLS Termination mode for TLSRoutes #2111

Open
Tracked by #3165
Rycieos opened this issue Jun 12, 2023 · 24 comments
Open
Tracked by #3165

Support TLS Termination mode for TLSRoutes #2111

Rycieos opened this issue Jun 12, 2023 · 24 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@Rycieos
Copy link

Rycieos commented Jun 12, 2023

What would you like to be added:

Currently, the spec for TLSRoute Listeners only supports GatewayTLSConfig.TLSModeType = "Passthrough". Similar to HTTPRoute Listeners, mode "Terminate" should also be supported. Specifically, a Gateway spec of:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
spec:
  listeners:
  - allowedRoutes:
      kinds:
      - group: gateway.networking.k8s.io
        kind: TLSRoute
    protocol: TLS
    tls:
      mode: Terminate

should be valid and work as expected: which is to strip the TLS layer and pass the TCP traffic to the backend specified by each TLSRoute.

Why this is needed:

I have an application stack that speaks both HTTP as well as a nonstandard application protocol over TCP to backend servers. I want all traffic wrapped in TLS. To allow my developers to iterate quickly, I allow them to create new backend environments at will. To greatly simplify environment creation, I want all HTTPS and TCP traffic to be handled on the same FQDN, meaning the same IP address, meaning the same Gateway object. And to make this whole setup simple, I have set a wildcard DNS record on the Gateway IP as well as a matching wildcard certificate.

For example, my Gateway has a DNS A record set to *.dev.example.com and a wildcard cert to match. Now a developer can create an HTTPRoute and a TLSRoute pointing to their application with the domain foobar.dev.example.com, and it just works. But without this suggested feature, the TLS traffic would not be terminated at the Gateway, meaning the application would need to accept it and present a valid certificate for that domain.

Other options:

As pointed out by @skriss in projectcontour/contour#5461 (comment), there is apparently a workaround in creating a Listener per domain and using TCPRoutes instead (though I have not tested this). But this negates the benefit of the Gateway API for my use case, which is that I can create a single Gateway with a single DNS record and a single TLS certificate, and allow a developer to route any TLS traffic to any service and at the same time handle the TLS termination so the application does not need to.

Other notes:

Traefik's Gateway API implimentation (while flawed and not spec compliant in many ways) does support TLSRoute TLS termination. I am using it currently for my above use case, and it works quite well.

@Rycieos Rycieos added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 12, 2023
@candita
Copy link
Contributor

candita commented Jun 12, 2023

@Rycieos
Copy link
Author

Rycieos commented Jun 12, 2023

is this the same as TLS Use Case 3 in @youngnick's document

It would appear so, yes. I am not sure why it would use a TCPRoute vs a TLSRoute, or what the difference would even be, but I guess that's implementation details that I do not care about.

@candita
Copy link
Contributor

candita commented Jun 12, 2023

is this the same as TLS Use Case 3 in @youngnick's document

It would appear so, yes. I am not sure why it would use a TCPRoute vs a TLSRoute, or what the difference would even be, but I guess that's implementation details that I do not care about.

If so, then maybe you can already do this with TCPRoute (https://gateway-api.sigs.k8s.io/concepts/api-overview/#tcproute-and-udproute) and it may be a matter of working on moving it out of the Experimental Channel.

@Rycieos
Copy link
Author

Rycieos commented Jun 12, 2023

maybe you can already do this with TCPRoute (https://gateway-api.sigs.k8s.io/concepts/api-overview/#tcproute-and-udproute)

That doc says:

each TCPRoute really needs a different port on the listener (in general, anyway).

Which makes my use case impossible. A TCPRoute does not have a hostnames field like a TLSRoute does which would allow for matching and routing based on SNI. So I guess that use case 3 in that document isn't exactly a match, as I need this to work with TLSRoutes.

@youngnick
Copy link
Contributor

Okay, thanks for the clarifications. Let me write out what seem to be the critical parts to me and see if we have this right:

  • There should be one Listener that accepts TLS connections on some port (let's say 443 for argument purposes)
  • That Listener will have both TLSRoutes and HTTPRoutes attached to it.
  • The TLSRoutes and HTTPRoutes will all have a hostname set, and the hostname will be used to choose where particular traffic will flow, based on the SNI of the TLS handshake.
  • For TLSRoutes, the TLS connection should be terminated, and the TCP traffic forwarded directly to the backend, unencrypted. (This is why we've talked about this as a TCPRoute previously, because it's forwarding a TCP stream).
  • For HTTPRoutes, the TLS connection should be terminated and the HTTP traffic forwarded as directed by the HTTPRoute spec.

The critical part here is that the routing discriminator that allows the implementation to decide which Route should get the traffic is the hostname field in the HTTPRoute or TLSRoute, which must be matched against the SNI (and which will need to match the usual Listener->Route hostname rules as well).

If that's correct, then this is a bit new, I don't think we've really specced out rules for what happens when different types of Route are attached to the same Listener.

I'll hold off on further comments until @Rycieos can confirm if I've got the bones correct.

@Rycieos
Copy link
Author

Rycieos commented Jun 13, 2023

@youngnick that is all correct, except for:

That Listener will have both TLSRoutes and HTTPRoutes attached to it.

That is one valid use case, but I can imagine two more:

  1. (restating your use case): One Listener that accepts both TLSRoutes and HTTPRoutes, on the same port, but different hostnames.
  2. Two Listeners, on different ports: one that accepts TLSRoutes and one that accepts HTTPRoutes, and can have the same hostnames.
  3. One Listener that accepts both TLSRoutes and HTTPRoutes, on the same port, and can have the same hostnames.

I think all three use cases are valid, but 3) would be hard to implement. You would need a heuristic to identify HTTP layer 5 traffic to know if the incoming traffic is HTTP or something else. I think leaving it out of the spec somehow is valid.

But both of the other two uses cases should be supported in my opinion. You are correct that the first use case is something new, and is probably not simple to implement. The second use case is much simpler, and why I opened this issue, as it is only asking for TLSRoutes to work exactly the same as HTTPRoutes in regards to Listeners, bindings, and TLS termination.

My specific user story is for the second use case. I'll mention again that Traefik already has support for this use case in their TLSRoute implementation, I would assume because, like me, they misunderstood the spec and thought it was already required.

@youngnick
Copy link
Contributor

Thanks for that @Rycieos, that's great clarification.

In use case 2, are there multiple TLSRoutes attached to the "forward TCP traffic" Listener? That would be the reason to use TLSRoute and not TCPRoute, I think.

If that's the case, that's definitely a reason to allow for this, but I think that we may need to add this as an Extended feature (meaning that not every implementation has to do it) - this would mean that Traefik (and any other implementations that want to) can optionally support the feature, and we'll have conformance tests that verify that things work like expected.

Assuming that we're in agreement about the above, I think we can keep the change tightly scoped to the second use case by saying something like "Using TLSRoutes with TLS mode Terminate is okay, as long as they're the only type of Route attached to that Listener" - that would allow us to rule out the mixed-Route use case, which seems like it would be hard.

That said, it's probably worth getting some feedback about that last point from other implementers - I think that @howardjohn is the most likely to have had to do something like this, but I'd like to hear from anyone about if the mixed-Route-types use case is something we should consider supporting.

@Rycieos
Copy link
Author

Rycieos commented Jun 13, 2023

In use case 2, are there multiple TLSRoutes attached to the "forward TCP traffic" Listener? That would be the reason to use TLSRoute and not TCPRoute, I think.

Correct. Here is an example:
DNS A *.example.com. <GATEWAY_IPV4_LB_IP>

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
spec:
  listeners:
  - allowedRoutes:
      kinds:
      - group: gateway.networking.k8s.io
        kind: TLSRoute
    protocol: TLS
    tls:
      mode: Terminate
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
spec:
  hostnames:
  - foo.example.com
  rules:
  - backendRefs:
    - kind: Service
      name: foo
      port: 3102
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TLSRoute
spec:
  hostnames:
  - bar.example.com
  rules:
  - backendRefs:
    - kind: Service
      name: bar
      port: 3102

I think that we may need to add this as an Extended feature (meaning that not every implementation has to do it)

That is fine with me, though I would argue that allowing support for TLSRoutes without this feature is pretty much worthless. If we could somehow make this feature a Core feature of the Extended TLSRoute resource, that would be my vote. Unless TLSRoutes were planned to be Core at some point, then it's probably fine.

I think we can keep the change tightly scoped to the second use case

Again fine with me, as my use case fits inside this simple solution.

@sunjayBhatia
Copy link
Member

/triage

@sunjayBhatia
Copy link
Member

/needs-triage

@sunjayBhatia
Copy link
Member

/label needs-triage

@k8s-ci-robot
Copy link
Contributor

@sunjayBhatia: The label(s) /label needs-triage cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label needs-triage

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sunjayBhatia
Copy link
Member

@shaneutt @robscott we're looking at this in Contour so would love if we got consensus upstream, could we add the needs-triage label etc. to get this on the triage radar?

@robscott
Copy link
Member

🤔 I'm not actually sure how to do that with prow. I guess in upstream most issues start with needs-triage, can't find an obvious way in https://prow.k8s.io/command-help, but maybe there's another bot we can enable to match upstream behavior here. In any case, adding the label manually for now.

@robscott robscott added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 20, 2023
@wuxingzhong
Copy link

I have the same problem and I support @Rycieos . Is there a plan to support it?

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 17, 2024
@Rycieos
Copy link
Author

Rycieos commented Apr 17, 2024

This enhancement is still very much something I need (and am currently using).

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 17, 2024
@shaneutt
Copy link
Member

Hi @Rycieos 👋

I see you changed the lifecycle on this one, it might be worth putting this on the agenda for an upcoming sync to discuss this one because I have yet to see anyone who's looking to step forward and implement this and maybe talking through it could help give it some gas 🤔

@Rycieos
Copy link
Author

Rycieos commented Apr 17, 2024

Thanks @shaneutt. I agree, this hasn't had much traction. I added it to the meeting agenda; hopefully I can make the next meeting to discuss it.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 16, 2024
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 15, 2024
@youngnick
Copy link
Contributor

/remove-lifecycle rotten

I think we'll still need to handle this type of usecase, even though we haven't touched it for a while.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 16, 2024
@shaneutt
Copy link
Member

/cc @mlavacca @candita

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
Projects
None yet
Development

No branches or pull requests

9 participants