Installation on GKE (standard & autopilot) fails with Dataplane V2 & kube-dns(Cilium) #3167

Closed
vizeit opened this issue Jul 21, 2023 · 26 comments · Fixed by #3179

@vizeit

vizeit commented Jul 21, 2023

Bug description

This is a continuation of issue 3163, which was closed with a suggestion to search https://discourse.jupyter.org for similar issues. The issue reported here does not match any existing Discourse discussion about error 599.
The issue is that when a GKE cluster, either Standard or Autopilot, is created with Dataplane V2, the hub fails to start with the following error,

[W 2023-07-21 17:35:47.524 JupyterHub proxy:899] api_request to the proxy failed with status code 599, retrying...
[E 2023-07-21 17:35:47.525 JupyterHub app:3382]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/app.py", line 3380, in launch_instance_async
        await self.start()
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/app.py", line 3146, in start
        await self.proxy.get_all_routes()
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/proxy.py", line 946, in get_all_routes
        resp = await self.api_request('', client=client)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/proxy.py", line 910, in api_request
        result = await exponential_backoff(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/jupyterhub/utils.py", line 237, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    TimeoutError: Repeated api_request to proxy path "" failed.

Expected behaviour

The installation should complete successfully

Actual behaviour

The installation fails with the hub pod in a CrashLoopBackOff state. The logs show the following error message,
api_request to the proxy failed with status code 599, retrying...

How to reproduce

GKE Autopilot:

  1. Create a GKE Autopilot cluster (Dataplane V2 & kube-dns are enabled by default)
  2. Install JupyterHub
helm upgrade --cleanup-on-fail   --install testjupyterhub jupyterhub/jupyterhub   --namespace testjupyterhubdev   --create-namespace   --version=3.0.0-beta.3   --values config.yaml

config.yaml content:

singleuser:
  cloudMetadata:
    blockWithIptables: false
hub:
  readinessProbe:
    initialDelaySeconds: 60

GKE standard:

  1. Create a GKE Standard cluster. Go to the Networking tab and select 'Enable Dataplane V2' & 'kube-dns'
  2. Install JupyterHub
    helm upgrade --cleanup-on-fail   --install testjupyterhub jupyterhub/jupyterhub   --namespace testjupyterhubdev   --create-namespace   --version=3.0.0-beta.3   --values config.yaml
    

config.yaml content:

singleuser:
  cloudMetadata:
    blockWithIptables: false
hub:
  readinessProbe:
    initialDelaySeconds: 60

@vizeit vizeit added the bug label Jul 21, 2023
@vizeit
Author

vizeit commented Jul 22, 2023

I investigated this further and found a work around. However, the core network policies may need to be reviewed for Dataplane V2. I have described the issue and troubleshooting steps with a solution at the link below,
https://www.vizeit.com/jupyterhub-on-gke-autopilot/

@consideRatio
Member

Excellent debugging and tracking down of the core issue, @vizeit!!

Okay, so with Dataplane V2 you needed to add egress rules allowing communication with the DNS server, because the default rules failed to do so.

Your rules looked like:

  networkPolicy:
    egress:
      - to:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: kube-system
        ports:
          - protocol: UDP
            port: 53
          - protocol: TCP
            port: 53

Nice!!!

I learned about the ability to target a namespace by name via the special label kubernetes.io/metadata.name. I think we can improve the defaults in this helm chart based on this, because allowing access to the DNS server by default has been a bit tricky.

Thank you for your investigative efforts and detailed writeup, I learned what GKE Dataplane V2 is and picked up a debugging trick for it!

@vizeit
Author

vizeit commented Jul 22, 2023

Happy to contribute to the community!

@consideRatio
Member

consideRatio commented Jul 22, 2023

This chart provides an egress rule by default, configurable via dnsPortsPrivateIPs, that is meant to allow the pods to establish a connection to a cluster-local DNS server. This is what seems to be failing, because otherwise, why would you need to add a rule explicitly for this?

The rule looks like below, and is defined here:

# Allow outbound connections to the DNS port in the private IP ranges
- ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  to:
    - ipBlock:
        cidr: 10.0.0.0/8
    - ipBlock:
        cidr: 172.16.0.0/12
    - ipBlock:
        cidr: 192.168.0.0/16

This rule was designed to reach a DNS server without knowing much about it, while not allowing more than needed. But for some reason it is not doing the job on Dataplane V2, as you had to add another rule targeting the pods in the kube-system namespace instead (something I didn't know was possible until now). It seems that since k8s 1.22, we can assume the new label is available, as documented here.

A key question in my mind is why the rules we had didn't work.

      "disposition": "deny",
      "src": {
        "workload_kind": "Deployment",
        "workload_name": "hub",
        "pod_name": "hub-74bb745848-vf8lh",
        "namespace": "testjupyterhubdev",
        "pod_namespace": "testjupyterhubdev"
      },
      "dest": {
        "namespace": "kube-system",
        "pod_namespace": "kube-system",
        "workload_kind": "DaemonSet",
        "pod_name": "node-local-dns-svv4s",
        "workload_name": "node-local-dns"
      },
      "connection": {
        "direction": "egress",
        "protocol": "udp",
        "src_ip": "10.114.1.80",
        "dest_ip": "10.114.1.72",
        "dest_port": 53,
        "src_port": 56625
      },

Is it because the src_port was 56625 rather than 53? Otherwise this part should cover it:

- ports:
    - protocol: UDP
      port: 53
  to:
    - ipBlock:
        cidr: 10.0.0.0/8 # this subnet range includes 10.114.1.72

I've tried to look into source / destination differences via docs and reference.

The reference says:

ports is a list of destination ports for outgoing traffic. Each item in this list is combined using a logical OR. If this field is empty or missing, this rule matches all ports (traffic not restricted by port). If this field is present and contains at least one item, then this rule allows traffic only if the traffic matches at least one port in the list.

Due to this, I'm confused. Why doesn't the default rule z2jh provides work? Is it rather a bug in Dataplane V2? If so, it's important to know that what GKE calls Dataplane V2 is really just Cilium, so it could be either GKE's Dataplane-V2-specific way of deploying Cilium, or Cilium itself.

@consideRatio
Member

consideRatio commented Jul 22, 2023

@vizeit I think the networking rule z2jh provides by default should work, and it's a bug in Cilium described here that causes the issue: cilium/cilium#25998. If you want to confirm this, I think you could use your test cluster and do kubectl edit on the networkpolicy resources, adjusting the DNS egress rule to list only one ipBlock, the one referencing the 10.0.0.0/8 CIDR.

This is because the bug describes Cilium failing specifically when multiple ipBlock.cidr entries are specified, as we have, and I think that is the key difference between the rule you added and the rule the z2jh chart provides.
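
A rough sketch of how that verification could go, assuming the default z2jh resource names (a NetworkPolicy named hub and a hub pod labeled component=hub) and the release namespace from the reproduction steps above:

# list the chart's network policies in the release namespace
kubectl get networkpolicy -n testjupyterhubdev

# edit the hub policy and trim the DNS egress rule down to the single
# ipBlock 10.0.0.0/8, then save and exit
kubectl edit networkpolicy hub -n testjupyterhubdev

# restart the hub pod and watch its logs to see if the 599 errors stop
kubectl delete pod -n testjupyterhubdev -l component=hub
kubectl logs -n testjupyterhubdev -l component=hub -f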

@consideRatio consideRatio changed the title Installation on GKE (standard & autopilot) fails with Dataplane V2 Installation on GKE (standard & autopilot) fails with Dataplane V2 (Cilium) Jul 22, 2023
@vizeit
Author

vizeit commented Jul 22, 2023

In my understanding, Dataplane V2 uses eBPF instead of iptables, and that may be the reason the existing rule does not work.

@vizeit
Author

vizeit commented Jul 22, 2023

@vizeit I think the networking rule z2jh provides by default should work, and it's a bug in Cilium described here that causes the issue: cilium/cilium#25998. If you want to confirm this, I think you could use your test cluster and do kubectl edit on the networkpolicy resources, adjusting the DNS egress rule to list only one ipBlock, the one referencing the 10.0.0.0/8 CIDR.

This is because the bug describes Cilium failing specifically when multiple ipBlock.cidr entries are specified, as we have, and I think that is the key difference between the rule you added and the rule the z2jh chart provides.

I will try

@consideRatio
Member

I think the Helm chart should add a new rule by default like the one you defined, because it's narrower, targeting pods in the kube-system namespace specifically instead of all pods. Also, that way we would have a workaround for the Cilium bug by default.

Follow-up actions

I think the following makes sense to follow up this issue with. I'd like to be involved in working the last three in some way; I want to ensure that these security-related and breaking changes are implemented quickly and communicated clearly for the 3.0.0 release. If you want to work on them as well @vizeit, you are most welcome to just go for it. I'm on vacation and will work more the week after next.

@vizeit
Author

vizeit commented Jul 24, 2023

@vizeit I think the networking rule z2jh provides by default should work, and it's a bug in Cilium described here that causes the issue: cilium/cilium#25998. If you want to confirm this, I think you could use your test cluster and do kubectl edit on the networkpolicy resources, adjusting the DNS egress rule to list only one ipBlock, the one referencing the 10.0.0.0/8 CIDR.
This is because the bug describes Cilium failing specifically when multiple ipBlock.cidr entries are specified, as we have, and I think that is the key difference between the rule you added and the rule the z2jh chart provides.

I will try

I tested as per your comment. The hub pod remained in the reported error condition; the issue did not resolve with the suggested change. Please let me know if you have any questions.

Steps followed:

  1. Install JupyterHub on GKE Autopilot with the following helm values
singleuser:
  cloudMetadata:
    blockWithIptables: false
hub:
  readinessProbe:
    initialDelaySeconds: 60
  2. Edit the hub network policy and change the core DNS egress rule to,
# Allow outbound connections to the DNS port in the private IP ranges
- ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  to:
    - ipBlock:
        cidr: 10.0.0.0/8

@consideRatio
Member

Did you delete the pod after changing the network policy, so we know it's not caused by a failure to re-apply the rule or similar?

@vizeit
Author

vizeit commented Jul 24, 2023

Did you delete the pod after changing the network policy, so we know it's not caused by a failure to re-apply the rule or similar?

I did check again by deleting the pod and the same error remains. Deleting the pod is not required to verify that the updated policy is applied, though; updated policies take effect right away. You can double-check if you want.

@consideRatio
Member

Hmmmm, so then maybe it isn't the same bug, but another bug? I think the network policy as defined by this chart isn't enforced correctly by Cilium.

@vizeit
Author

vizeit commented Jul 24, 2023

Hmmmm, so then maybe it isn't the same bug, but another bug? I think the network policy as defined by this chart isn't enforced correctly by Cilium.

In my understanding, it is not a Cilium bug. With Cilium, network policy for anything outside the cluster is enforced by IP. The Cilium bug you referred to here applies to ipBlock rules for cluster-external IPs. kube-system is within the cluster, so its pods cannot be matched by an IP block.

@consideRatio
Member

With Cilium, network policy for anything outside the cluster is enforced by IP.

I don't understand you well here. I'm thinking that what a NetworkPolicy specifies should be enforced the same way no matter which network policy controller does the job, according to Kubernetes' official definition of how it should be enforced.

I understand it as: a NetworkPolicy resource defining an egress rule like the one below should allow opening connections to port 53 on destinations with an IP like 10.x.y.z (because of 10.0.0.0/8).

- ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  to:
    - ipBlock:
        cidr: 10.0.0.0/8

I do believe that you can allow egress to a k8s service and the pod endpoints it exposes based on IPs like this; otherwise we would have seen this failure in other k8s clusters, where a pod called kube-dns-... in the kube-system namespace is reached successfully.

@consideRatio
Member

Looking at the Cilium docs, it seems there are a lot of notes about CIDR and IPAM (IP address management), so I suspect this is perhaps expected behaviour, with Cilium as deployed by GKE being limited to not support this or something similar: https://docs.cilium.io/en/latest/network/concepts/ipam/

@consideRatio
Member

Do you know what IPAM configuration Cilium has on GKE? I don't fully understand what I'm asking now, but I think this is the thread worth digging into to understand why what we currently have doesn't work. The path onwards remains the same I think: switch to a rule allowing UDP/TCP port 53 to the kube-system namespace, instead of the general "any non-public IP on port 53" rule the chart currently ships with by default.
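
If anyone wants to check, one way could be to inspect the Cilium agent configuration, assuming it is exposed through the usual cilium-config ConfigMap (GKE's Dataplane V2 packaging may name things differently):

# dump the Cilium configuration and look for the IPAM mode
kubectl -n kube-system get configmap cilium-config -o yaml | grep -i ipam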

@vizeit
Author

vizeit commented Jul 29, 2023

IPAM on GKE is "kubernetes". I have described detailed steps for finding the root cause of the issue in my post below. The issue is not limited to Dataplane V2; any Kubernetes cluster with Cilium will have it. This DNS behavior is well known and follows from the way Cilium is implemented, as I indicated in my previous comments. I hope this post helps build more understanding of the issue,

https://www.vizeit.com/troubleshooting-cilium-on-gke/

@consideRatio
Member

I tracked this down to be a known limitation with Cilium, and it has been one for a long time.

This documentation mentions it as a known missing feature, referencing cilium/cilium#9209 as the issue to track.

@vizeit
Author

vizeit commented Aug 1, 2023

Please refer to the Kubernetes documentation at https://kubernetes.io/docs/concepts/services-networking/network-policies/

ipBlock: This selects particular IP CIDR ranges to allow as ingress sources or egress destinations. These should be cluster-external IPs, since Pod IPs are ephemeral and unpredictable
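
In other words, the two egress rules discussed in this thread differ exactly on that point. A side-by-side sketch (same ports, only the destination selector changes):

# ipBlock-based rule: matches destination IPs directly and is intended for
# cluster-external destinations, so Cilium does not apply it to in-cluster pod IPs
- ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  to:
    - ipBlock:
        cidr: 10.0.0.0/8

# selector-based rule: matches pods in kube-system regardless of their IPs,
# which is why the workaround rule reaches kube-dns / node-local-dns
- ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system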

@vizeit vizeit changed the title Installation on GKE (standard & autopilot) fails with Dataplane V2 (Cilium) Installation on GKE (standard & autopilot) fails with Dataplane V2 & kube-dns(Cilium) Aug 1, 2023
@vizeit
Author

vizeit commented Aug 1, 2023

While working on my implementation, I have found that this issue is limited to GKE Autopilot & Standard clusters with Dataplane V2 & kube-dns, for Google-managed Kubernetes. Google is moving GKE to Cloud DNS, which is external to GKE clusters.
GKE Standard & Autopilot clusters with Cloud DNS will need the following,

  1. config values
singleuser:
  cloudMetadata:
    blockWithIptables: false
  2. The single-user network policy will need to allow egress to the GCP metadata server IP 169.254.169.254; the current core network policies only include this egress rule for the hub and proxy (see the sketch below)
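
A minimal sketch of such a rule, assuming it is added through the chart's singleuser.networkPolicy.egress values (Cloud DNS is served from the metadata server IP, so port 53 is what matters here):

singleuser:
  networkPolicy:
    egress:
      # allow DNS queries to GKE Cloud DNS, which answers on the cloud
      # metadata server IP rather than an in-cluster pod IP
      - to:
          - ipBlock:
              cidr: 169.254.169.254/32
        ports:
          - protocol: UDP
            port: 53
          - protocol: TCP
            port: 53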

@consideRatio not sure if you want to track this cloud DNS issue separately, I can open a new issue

@consideRatio
Member

I opened #3179 with a proposal on how to address various situations, including when Cloud DNS is reached at the non-private/non-pod-CIDR IP of the cloud metadata server.

Thank you so much for digging into the details about this, I think I've finally understood this well enough to document it.

@consideRatio
Member

Do you think #3179 does the trick? With it, I think the only thing needed for use with Cilium or GKE's Cloud DNS could be:

singleuser:
  cloudMetadata:
    blockWithIptables: false

@vizeit
Author

vizeit commented Aug 1, 2023

Do you think #3179 does the trick? With it, I think the only thing needed for use with Cilium or GKE's Cloud DNS could be:

singleuser:
  cloudMetadata:
    blockWithIptables: false

#3179 will partially address the Cloud DNS issue for GKE. The single-user core network policy will need to allow egress to the GCP metadata server 169.254.169.254; otherwise the single-user pod will fail to launch.

@vizeit
Author

vizeit commented Aug 1, 2023

I opened #3179 with a proposal on how to address various situations, including when Cloud DNS is reached at the non-private/non-pod-CIDR IP of the cloud metadata server.

Thank you so much for digging into the details about this, I think I've finally understood this well enough to document it.

Happy to help! Thank you for looking into the issue

@consideRatio
Member

I've also opened #3180, and if that is agreed on, I think it together with #3179 will make everything work out of the box without disabling singleuser.cloudMetadata.blockWithIptables.

@consideRatio
Member

#3179 will partially address the Cloud DNS issue for GKE. The single-user core network policy will need to allow egress to the GCP metadata server 169.254.169.254.

Yepp! This is actually part of #3179: all network policy resources defined by the jupyterhub chart get the same "egressAllowRules" configuration, so both singleuser.networkPolicy.egressAllowRules.dnsPortsKubeSystemNamespace and singleuser.networkPolicy.egressAllowRules.dnsPortsCloudMetadataServer are available and true by default with #3179.
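
For completeness, a sketch of what those defaults could look like as helm values once #3179 lands, using the option names from the comment above (the final structure in the released chart may differ):

singleuser:
  networkPolicy:
    egressAllowRules:
      # allow DNS to pods in the kube-system namespace (kube-dns / node-local-dns)
      dnsPortsKubeSystemNamespace: true
      # allow DNS to the cloud metadata server IP (GKE Cloud DNS)
      dnsPortsCloudMetadataServer: true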
