Canary deployments with Gateway API between namespaces is not working #3806

Closed
arapulido opened this issue Jun 14, 2021 · 9 comments
Labels
area/gateway-api Issues or PRs related to the Gateway (Gateway API working group) API. kind/bug Categorizes issue or PR as related to a bug.

Comments

@arapulido

arapulido commented Jun 14, 2021

What steps did you take and what happened:

It is sometimes very useful to run canary deployments by deploying the new version in a different namespace, as this allows both deployments to use the same service names (the part of the DNS name before the first dot). Something like this:

image

This is possible with the Nginx ingress controller, using annotations. I created the following two Ingress objects, one in each of the namespaces:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  namespace: ns1
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: www.example.org
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: service1
            port:
              number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
  namespace: ns2
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "50"
spec:
  rules:
  - host: www.example.org
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: service1
            port:
              number: 80

This worked well, and 50% of the traffic was sent to the services in ns2.

I tried to do something similar with Contour and the new Gateway API, but this doesn't seem to work.

I tried two different things:

Two HTTPRoute objects, one in each of the namespaces

I created two HTTPRoute objects:

kind: HTTPRoute
apiVersion: networking.x-k8s.io/v1alpha1
metadata:
  name: httproute1
  namespace: ns1
  labels:
    app: ecommerce
spec:
  gateways:
    allow: "All"
  hostnames:
    - "www.example.org"
  rules:
    - matches:
        - path:
            type: Prefix
            value: /
      forwardTo:
        - serviceName: service1
          port: 80
          weight: 50
---
kind: HTTPRoute
apiVersion: networking.x-k8s.io/v1alpha1
metadata:
  name: httproute2
  namespace: ns2
  labels:
    app: ecommerce
spec:
  gateways:
    allow: "All"
  hostnames:
    - "www.example.org"
  rules:
    - matches:
        - path:
            type: Prefix
            value: /
      forwardTo:
        - serviceName: service1
          port: 80
          weight: 50

When I apply the first configuration, 100% of the traffic goes to the app running in ns1, but once I apply the second configuration, 100% of the traffic goes to the app running in ns2.

Creating an external service

The second thing I tried was creating an ExternalName service for the service in ns2, and a single HTTPRoute object in ns1 that points to this external service, something like:

image

I created the following HTTPRoute object:

kind: HTTPRoute
apiVersion: networking.x-k8s.io/v1alpha1
metadata:
  name: httproute-external
  namespace: ns1
  labels:
    app: ecommerce
spec:
  gateways:
    allow: "All"
  hostnames:
    - "www.example.org"
  rules:
    - matches:
        - path:
            type: Prefix
            value: /
      forwardTo:
        - serviceName: service1
          port: 80
          weight: 50
        - serviceName: service1-external
          port: 80
          weight: 50

My external service looks like this:

kind: Service
apiVersion: v1
metadata:
  name: service1-external
spec:
  type: ExternalName
  externalName: service1.ns2.svc.cluster.local

When I apply it, I get the following error:

  Gateways:
    Conditions:
      Last Transition Time:  2021-06-14T12:52:56Z
      Message:               service "service1-external" does not exist
      Observed Generation:   1
      Reason:                Degraded
      Status:                False
      Type:                  ResolvedRefs
      Last Transition Time:  2021-06-14T12:52:56Z
      Message:               Errors found, check other Conditions for details.
      Observed Generation:   1
      Reason:                ErrorsExist
      Status:                False
      Type:                  Admitted

But the service exists and it works correctly:

$ kubectl get svc service1-external

NAME                TYPE           CLUSTER-IP   EXTERNAL-IP                      PORT(S)   AGE
service1-external   ExternalName   <none>       service1.ns2.svc.cluster.local   <none>    19h

This doesn't work either.

What did you expect to happen:

To be able to split traffic between different namespaces, to ease canary deployments. Is there a way to do this? If not, should this be implemented?

Environment:

  • Contour version: 1.16
  • Kubernetes version: (use kubectl version): 1.20
  • Kubernetes installer & version: minikube
@arapulido arapulido added kind/bug Categorizes issue or PR as related to a bug. lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. labels Jun 14, 2021
@sunjayBhatia
Member

Thanks for the issue!

For your first attempt, I believe the behavior you are seeing makes sense for the current state of Contour: we haven't completed our Gateway API implementation, and HTTPRoute conflict resolution is not finished. Currently the most recently processed HTTPRoute will "win" if several try to program the same route match. With #3608 and #3636 we should start fully implementing what the spec defines, which would actually mean that the HTTPRoute created second would not be admitted because it has a route match conflict. https://gateway-api.sigs.k8s.io/references/spec/#networking.x-k8s.io/v1alpha1.HTTPRouteRule describes that merging route rules between HTTPRoute objects is not currently allowed in the spec and, as you found in your second attempt, service references are local to the namespace of the HTTPRoute.

I believe your second attempt should have worked. Just as a sanity check: is service1.ns2.svc.cluster.local the name of the service resource, rather than the DNS name to expose it?

@sunjayBhatia
Member

This is potentially an interesting use case to bring up upstream, since the spec forces you into operating within a single namespace. Maybe that is because it is built around the idea that a single team managing a route in an organization will do all their work within a single namespace, rather than several?

@sunjayBhatia sunjayBhatia added the area/gateway-api Issues or PRs related to the Gateway (Gateway API working group) API. label Jun 14, 2021
@youngnick
Member

Thanks for the issue @arapulido. There are a few things to talk about here, mainly to give you some background on why things are this way, and where they are going. The tl;dr is we can't help you... yet.

Firstly, the behavior you're using with Ingress is a bit of an accident on ingress-nginx's part that everyone ended up copying. There's nothing in the Ingress spec that says what should happen if two Ingress objects specify the same config. This is what the ingress-nginx team picked, and everyone just did the same, so it became a de facto standard.

However, the behavior is also a huge operational risk for a cluster, as anyone who has access to create an Ingress object in their own namespace can interfere with any domain name pointed at that cluster, causing the traffic to be routed onto their service. Most often, I've seen this accidentally create issues when people make mistakes rather than because they're malicious, but both are concerns. I understand that it can be used for good (like you are here), but it's very easy to accidentally break something.

Because of this, the Gateway API requires that you specifically associate Routes with a certain Gateway. There's some stuff that will get you part of the way:

In the Gateway object, by default, you can only choose Routes from the same namespace. You may instead set the spec.listeners[].routes.namespaces.from field to All, to choose Routes of the type you've chosen across all namespaces, or to Selector, to choose them from a label-selected set of namespaces (which you set in the spec.listeners[].routes.namespaces.selector field).
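
As a rough sketch of the field described above, a v1alpha1 Gateway that accepts HTTPRoutes from every namespace might look like the following. The Gateway name, its namespace, and the gatewayClassName are assumptions made for illustration; they do not come from this issue. The label selector reuses the app: ecommerce label from the HTTPRoutes in the issue description.

```yaml
# Illustrative only: "contour", "projectcontour" and "contour-class" are assumed names.
kind: Gateway
apiVersion: networking.x-k8s.io/v1alpha1
metadata:
  name: contour
  namespace: projectcontour
spec:
  gatewayClassName: contour-class
  listeners:
    - protocol: HTTP
      port: 80
      routes:
        kind: HTTPRoute
        # Accept Routes from any namespace instead of only the Gateway's own.
        namespaces:
          from: "All"
        # Only bind Routes carrying this label (as used in the issue's HTTPRoutes).
        selector:
          matchLabels:
            app: ecommerce
```

This only controls which Routes may bind to the Gateway; it does not merge rules from two HTTPRoutes that match the same request.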

But you'll still run up against what @sunjayBhatia mentioned about HTTPRoute conflict resolution behavior, since the Gateway API mandates that only a single HTTPRoute object can match a request.

I'm a part of the upstream working group, and we haven't really spent a lot of time yet covering traffic-splitting/weighted load balancing use cases (which I think this would come under, as a simple use case).

I really appreciate the detailed example though, and will make sure I take this back upstream.

I'm sorry I can't give you better news yet!

@arapulido
Author

I believe your second attempt should have worked. Just as a sanity check: is service1.ns2.svc.cluster.local the name of the service resource, rather than the DNS name to expose it?

Yes, my mistake: while copying the definitions (and making them generic), I copied the DNS name instead. The service is called service1-external and it points to that DNS name. The result is the same. I will edit the issue description with the correct name, to avoid confusion.

@arapulido
Author

This is potentially an interesting use case to bring up upstream, since the spec forces you into operating within a single namespace. Maybe that is because it is built around the idea that a single team managing a route in an organization will do all their work within a single namespace, rather than several?

Yes, it seems that the spec expects that, which may be a bit restrictive.

@arapulido
Author

I'm a part of the upstream working group, and we haven't really spent a lot of time yet covering traffic-splitting/weighted load balancing use cases (which I think this would come under, as a simple use case).

I really appreciate the detailed example though, and will make sure I take this back upstream.

Thanks for the detailed response! Are the upstream workgroup meetings public? I would be happy to attend and explain my use case. Or shall I file an issue upstream? Thanks again!

@youngnick
Member

Thanks for the detailed response! Are the upstream workgroup meetings public? I would be happy to attend and explain my use case. Or shall I file an issue upstream? Thanks again!

Absolutely. The community page for the Gateway API project is at https://gateway-api.sigs.k8s.io/contributing/community/, and you can file an issue at https://github.com/kubernetes-sigs/gateway-api/issues/new/choose . There's also #sig-network-gateway-api on Kubernetes Slack.

@arapulido
Author

Thanks for the detailed response! Are the upstream workgroup meetings public? I would be happy to attend and explain my use case. Or shall I file an issue upstream? Thanks again!

Absolutely. The community page for the Gateway API project is at https://gateway-api.sigs.k8s.io/contributing/community/, and you can file an issue at https://github.com/kubernetes-sigs/gateway-api/issues/new/choose . There's also #sig-network-gateway-api on Kubernetes Slack.

Thanks! I opened this issue upstream: kubernetes-sigs/gateway-api#695 and joined the Slack channel.

Thanks again!

@skriss
Member

skriss commented Nov 30, 2021

The upstream issue has been resolved and the v1alpha2 API now supports cross-namespace route->service refs (using a ReferencePolicy). We have added support for this in main, and it will be included in Contour 1.20. This enables you to have a single HTTPRoute, splitting traffic across multiple backend services, each potentially in a different namespace. So I think we can consider this resolved -- I'm going to close it out, but @arapulido feel free to reach out again/reopen if that's not the case.
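
For readers landing here, a minimal sketch of what that could look like for the scenario in this issue, assuming the v1alpha2 API: a single HTTPRoute in ns1 splits traffic 50/50 between service1 in ns1 and service1 in ns2, and a ReferencePolicy in ns2 permits the cross-namespace reference. The resource names canary-split and allow-ns1-routes, and the parent Gateway contour in projectcontour, are hypothetical and not taken from this thread. The Gateway itself would also need to allow Routes from ns1.

```yaml
# Illustrative only: route, policy and Gateway names are assumptions.
kind: HTTPRoute
apiVersion: gateway.networking.k8s.io/v1alpha2
metadata:
  name: canary-split
  namespace: ns1
spec:
  parentRefs:
    - name: contour              # assumed Gateway name
      namespace: projectcontour  # assumed Gateway namespace
  hostnames:
    - "www.example.org"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        # Stable version, in the same namespace as the route.
        - name: service1
          port: 80
          weight: 50
        # Canary version, in another namespace; needs the ReferencePolicy below.
        - name: service1
          namespace: ns2
          port: 80
          weight: 50
---
# Lives in the namespace being referenced (ns2) and grants access to routes from ns1.
kind: ReferencePolicy
apiVersion: gateway.networking.k8s.io/v1alpha2
metadata:
  name: allow-ns1-routes
  namespace: ns2
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      namespace: ns1
  to:
    - group: ""
      kind: Service
```

The policy sits in the target namespace, so the owners of ns2 decide who may route traffic to their services, which addresses the operational risk discussed earlier in the thread.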

@skriss skriss closed this as completed Nov 30, 2021
@skriss skriss removed the lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. label Nov 30, 2021

4 participants