
DaskHub helm chart #91

Merged: 15 commits into dask:master from the pangeo branch, Aug 26, 2020

Conversation

TomAugspurger
Member

This implements a multi-user, dask-enabled JupyterHub helm chart based on the pangeo helm chart. I've called it "DaskHub" to reflect the multi-user component. The hope is that `helm install dask/dask` is for single users, and `helm install dask/daskhub` is for multiple users.

The tl;dr is "daskhub = jupyterhub + dask-gateway configured with jupyterhub auth".

The differences from the current pangeo helm chart are:

  • Updated to JupyterHub 0.9.1
  • Updated default singleuser image -> pangeo-notebook:2020.07.28
  • Fixed incorrect hub.services.dask-gateway.url
  • Removed prePuller and scheduling optimizations
  • Removed extraPodConfig (preemptible nodes)
  • Removed extraConfig (preemptible nodes)
  • Removed custom dask-gateway options

and, most importantly, it doesn't use JupyterHub to proxy the dask-gateway traefik service. Instead, we make it public. There are ongoing discussions about how pangeo wants to handle this. All in all, it's a bit more generic and less opinionated than the current pangeo helm chart.
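
For orientation, here's a minimal sketch of that combination. The field names follow the upstream jupyterhub and dask-gateway charts as discussed in this thread; the service URL is a placeholder, since its real value depends on the release name (see below).

```yaml
# Minimal sketch: dask-gateway registered as a JupyterHub service, with the
# gateway authenticating users against JupyterHub. The url is a placeholder.
jupyterhub:
  hub:
    services:
      dask-gateway:
        url: "http://traefik-<release-name>-dask-gateway"
dask-gateway:
  gateway:
    auth:
      type: jupyterhub
```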


The major usability issue right now is the requirement for users to know either the traefik external IP or the release name so that they can connect to the gateway. In the README I recommend that administrators set that manually; see the sketch below. But it'd be nice to do that automatically. There's some discussion at #68 (comment) on this point. (It's unclear how to set a singleuser extraEnv variable whose value needs to be templated to include the release name.)
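
For example, a minimal sketch of that manual setting, assuming the administrator has already looked up the traefik external IP (the address is illustrative):

```yaml
# Admin-supplied override so user pods can reach the gateway.
# The IP is whatever kubernetes reports for the traefik service.
jupyterhub:
  singleuser:
    extraEnv:
      DASK_GATEWAY__ADDRESS: "http://35.223.8.79"
```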

I gather that helm has tests. I should probably write some :)

Closes #68

@TomAugspurger
Member Author

I think the name "daskhub" needs to be discussed in particular. Does anyone else in the project have plans that this might clash with? (cc @mrocklin, @jcrist, and @quasiben in particular).

daskhub/README.md (outdated review thread)
@jacobtomlinson (Member) left a comment

Thanks for taking the time to move this over @TomAugspurger.

I've left a few general comments.

daskhub/Chart.yaml (outdated review thread)
@@ -0,0 +1,25 @@
apiVersion: v2
name: daskhub
version: 0.0.1-set.by.chartpress
Member

Does chartpress set this from the tag?

Currently with the dask chart we increment this each release. It is semi-automated today with this script.

https://github.com/dask/helm-chart/blob/master/ci/release.sh#L9

We should probably bring them in line. If chartpress can do this automatically from git tags I'd be in favour of that.

Member Author

This came from pangeo-data/helm-chart@66b7ccd (@consideRatio). But it looks like chartpress does indeed set it based on the tag + commit.

Collaborator

Chartpress indeed sets the version based on a tag, the number of commits since the tag, and the hash. But note that the tag needs to be on a commit in the branch; if the latest tag is on a commit not in the branch, chartpress will ignore it.

Member Author

Here's the output of chartpress --long

diff --git a/dask/Chart.yaml b/dask/Chart.yaml
index d79b1fd..ee832d7 100755
--- a/dask/Chart.yaml
+++ b/dask/Chart.yaml
@@ -1,7 +1,6 @@
----
 apiVersion: v1
 name: dask
-version: 4.3.1
+version: 4.3.1-n001.h7878309
 appVersion: 2.23.0
 description: Distributed computation in Python with task scheduling
 home: https://dask.org
diff --git a/daskhub/Chart.yaml b/daskhub/Chart.yaml
index 48e1b7d..ac7a094 100644
--- a/daskhub/Chart.yaml
+++ b/daskhub/Chart.yaml
@@ -1,7 +1,7 @@
 apiVersion: v2
 name: daskhub
 icon: https://avatars3.githubusercontent.com/u/17131925?v=3&s=200
-version: 0.0.1-set.by.chartpress
+version: 4.3.1-n008.hfc9e4ce
 description: Multi-user JupyterHub and Dask deployment.
 dependencies:
   - name: jupyterhub

So things seem to be working OK. I assume that when we tag a release this will be set to the actual value?

daskhub/README.md (outdated review thread)
```python
>>> from dask_gateway import Gateway
>>> gateway = Gateway(
...     address="http://35.223.8.79",  # traefik-us-central1b-dhub-dask-gateway
... )
```
Member

Are we not configuring this automatically?

Member Author

Unfortunately not :( In the "Configuring JupyterHub" section I give some details for how the administrator can set it for JupyterHub users.

If we can figure out how to set helm templated environment variables for the user then I think we can make this work.

Member

Ah right, this is the template problem from earlier.

@TomAugspurger
Member Author

Just some thinking out loud here.

The primary issue right now is that a successful helm install dask/daskhub
requires some care with the release names of certain objects. In particular, you
need to know and set the values of

  1. The domain / IP address of your JupyterHub (for the gateway address)
  2. The release name & namespace (for the proxy address and service)

ahead of time to get a usable deployment. If you don't know those, then you'll
need two helm installs:

  1. An initial helm install dask/daskhub. Get the hub address and / or the
    release name.
  2. A second helm install dask/daskhub with the address and
    release name set.

This isn't an ideal user experience, but maybe it's good enough? In particular,
Helm 3 pushes people to specify release names, so that won't really be a problem.
That just leaves the Hub IP address. I think in the typical case, users will
already be doing two helm installs: one to get the IP address from the cloud
provider, and a second to set it in the JupyterHub config. The only additional
work is that they need to specify it in the Dask config as well, which doesn't
sound too bad?


If people are broadly OK with that, I'll push up some documentation updates
explaining things.

@jacobtomlinson
Member

jacobtomlinson commented Aug 19, 2020

It would be great to try and find a solution to this. In the past, when helm charts were stored in a central monorepo, one of the criteria for getting something merged was that you could run `helm install <chartname>` once with the default config and get something usable. It feels like this should be a minimum bar for a helm chart.

I appreciate we may have to make some exceptions here, as z2jh requires manual generation of secrets and so does not meet this bar.

A couple of ideas:

  • Is it possible to use relative URLs for the gateway? That's how we configure the lab extension in the regular chart.
  • Could we fix the environment-variable-from-configmap issue upstream? Or perhaps add some custom config which reads the config from a mount instead?

@TomAugspurger
Member Author

TomAugspurger commented Aug 19, 2020

Is it possible to use relative URLs for the gateway?

I'll check. It's possible that this will make the dashboard_link inaccessible from outside the hub, but perhaps that's OK.

Could we fix the environment variable from configmap issue upstream?

I think that if jupyterhub called something like {{ tpl extraEnv }} instead of just including the literal extraEnv then we'd be OK. I'm not sure what that would break though. Perhaps jupyterhub could have an extra extraEnvTemplate field that's merged with extraEnv after templating? Any thoughts on that @consideRatio?

Or perhaps add some custom config which reads the config from a mount instead?

I might have an idea... Earlier, you suggested adding these templated environment variables from a configmap. That didn't work for the singleuser case, since jupyterhub just wants key: value pairs, not envVar objects. But we should be able to:

  1. Add the templated values to a daskhub configMap
  2. Mount the configMap to the hub (not the singleuser, which failed earlier)
  3. Use jupyterhub.hub.extraConfig to update c.Spawner.environment (roughly as sketched below)
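
A rough sketch of those three steps, assuming the ConfigMap is named daskhub-config and mounted at /etc/daskhub (both names illustrative, not a final implementation):

```yaml
jupyterhub:
  hub:
    extraVolumes:
      - name: daskhub-config
        configMap:
          name: daskhub-config  # step 1: ConfigMap with helm-templated values
    extraVolumeMounts:
      - name: daskhub-config
        mountPath: /etc/daskhub  # step 2: mounted on the hub pod only
    extraConfig:
      daskhub: |
        # step 3: copy the templated values into every spawned user's env
        for key in ["DASK_GATEWAY__ADDRESS", "DASK_GATEWAY__PROXY_ADDRESS"]:
            with open(f"/etc/daskhub/{key}") as f:
                c.Spawner.environment[key] = f.read().strip()
```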

@consideRatio
Collaborator

I think that if jupyterhub called something like {{ tpl extraEnv }} instead of just including the literal extraEnv then we'd be OK. I'm not sure what that would break though. Perhaps jupyterhub could have an extra extraEnvTemplate field that's merged with extraEnv after templating? Any thoughts on that @consideRatio?

Hmmm I'd like to clarify the issue/goal so I can think more clearly about it.

My understanding of the goal

Is it correct that you want to avoid end users needing to specify something like this?

gateway = dask_gateway.Gateway(
    address="http://35.223.8.79",  # traefik-us-central1b-dhub-dask-gateway
)

And preferably, you even hope to avoid asking admins to do something like the below (to spare their users from specifying something like the above), for example by embedding this logic directly into the DaskHub Helm chart?

# values.yaml
jupyterhub:
  singleuser:
    extraEnv:
      DASK_GATEWAY__ADDRESS: "http://traefik-dhub-dask-gateway"

Thoughts

Hmm... There are many ways to consider solving this, but I'm looking for a solution to be robust for a Helm chart user that wants to customize things further. Due to that, I want us to avoid claiming certain hooks etc in Z2JH / JupyterHub to be used solely by the DataHub helm chart developers and not by its users.

Instead of going down the rabbit hole further, I note that @TomAugspurger wrote this earlier:

I might have an idea... Earlier, you suggested adding these templated environment variables from a configmap. That didn't work for the singleuser case, since jupyterhub just wants a list of key: value, not the envVar objects.

The upstream goal would be to let the z2jh Helm chart accept environment variables that follow the k8s EnvVar specification. Only with that specification can one, for example, set an environment variable that reads its value from a secret or similar.

But, with this said, perhaps you could solve this even more quickly?

# values.yaml
jupyterhub:
  singleuser:
    extraEnv:
      DASK_GATEWAY__ADDRESS: "http://traefik-$(SOME_ENV_TO_FIND_THE_HELM_RELEASE_NAME)-dask-gateway"

Conclusion

The Z2JH helm chart doesn't mimic the k8s API specs for hub.extraEnv or singleuser.extraEnv. This causes issues that have already been reported, and solving that in a backwards-compatible way is worthwhile for Z2JH / KubeSpawner on its own, so if that would be sufficient to help you then I think I'd like to upstream the fix there.

There are also potential workarounds, but I don't want to spend much time on them, as the complexity rises and they will have side effects or add assumptions that can be broken by users of the Helm chart, for example.

A potential workaround without many side effects would be to use the current hub.extraEnv / singleuser.extraEnv to pass a k8s-specified environment variable reference (syntax: $(NAME)). Note that KubeSpawner logic also expands singleuser.extraEnv (syntax: {NAME}) with userid / username / servername, but that won't help us.

Final note: there is also the k8s specification called envFrom, which I don't think is directly exposed through z2jh. Hmmm...
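
For reference, envFrom is part of the standard k8s container spec; it imports every key of a ConfigMap as an environment variable (the ConfigMap name here is illustrative):

```yaml
# Plain k8s pod spec fragment, not z2jh config: envFrom pulls all keys of
# the named ConfigMap into the container's environment.
containers:
  - name: notebook
    envFrom:
      - configMapRef:
          name: daskhub-config
```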

@jacobtomlinson
Member

This cause issues already reported, and solving that in a backwards compatible way is worth it for the Z2JH / KubeSpawner by its own, so if that would be sufficient to help you then I think I'd like to upstream the fix there.

My preference would be fixing jupyterhub/zero-to-jupyterhub-k8s#1103 upstream. Then we can create a config map with the templated value and pass it through as an environment variable.

@consideRatio
Collaborator

Then we can create a config map with the templated value.

Could you clarify what the templated value is? Is it only the Helm release name?

@TomAugspurger
Member Author

I think these are the values we'll need

data:
  DASK_GATEWAY__ADDRESS: "traefik-{{ .Release.Name }}-dask-gateway"
  DASK_GATEWAY__PROXY_ADDRESS: "gateway://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}:8786"
  DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL: "http://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}"

So

  • release name
  • release namespace

@consideRatio
Collaborator

@TomAugspurger, and these are env vars to be set on the user pod rather than the hub pod, is that correct?

@TomAugspurger
Member Author

TomAugspurger commented Aug 19, 2020

In two places. The DASK_GATEWAY__ADDRESS and DASK_GATEWAY__PROXY_ADDRESS need to go in the singleuser pod. The DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL should go in jupyterhub.hub.services (but I'm still figuring out exactly what the value should be. I think ideally it is the actual hub URL that a user would visit, which may not be available yet).

@jacobtomlinson
Member

jacobtomlinson commented Aug 19, 2020

I think ideally it is the actual hub URL that a user would visit, which may not be available yet

Do you mean an external address like a load balancer address?

@consideRatio
Collaborator

In two places. The DASK_GATEWAY__ADDRESS and DASK_GATEWAY__PROXY_ADDRESS need to go in the singleuser pod. The DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL should go in jupyterhub.hub.services (but I'm still figuring out exactly what the value should be. I think ideally it is the actual hub URL that a user would visit, which may not be available yet).

DASK_GATEWAY__ADDRESS and DASK_GATEWAY__PROXY_ADDRESS are environment variables to be set on the user server pod, but DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL is an environment variable to be used by the hub pod?

@jacobtomlinson
Member

@consideRatio yes

@consideRatio
Collaborator

consideRatio commented Aug 19, 2020

These environment variables are available on the singleuser pods by default I think.

      JUPYTERHUB_API_TOKEN:           redacted
      JPY_API_TOKEN:                  redacted
      JUPYTERHUB_ADMIN_ACCESS:        1
      JUPYTERHUB_CLIENT_ID:           jupyterhub-user-consideratio
      JUPYTERHUB_HOST:                
      JUPYTERHUB_OAUTH_CALLBACK_URL:  /user/consideratio/oauth_callback
      JUPYTERHUB_USER:                consideratio
      JUPYTERHUB_SERVER_NAME:         
      JUPYTERHUB_API_URL:             http://10.60.53.247:8081/hub/api
      JUPYTERHUB_ACTIVITY_URL:        http://10.60.53.247:8081/hub/api/users/consideratio/activity
      JUPYTERHUB_BASE_URL:            /
      JUPYTERHUB_SERVICE_PREFIX:      /user/consideratio/
      MEM_LIMIT:                      2147483648
      MEM_GUARANTEE:                  107374182
      CPU_LIMIT:                      16.0
      CPU_GUARANTEE:                  0.36
      JUPYTER_IMAGE_SPEC:             us.gcr.io/neurohackademy/nh-2020-env:9e6db32
      JUPYTER_IMAGE:                  us.gcr.io/neurohackademy/nh-2020-env:9e6db32

And these are typically available on the hub pod, I think.

      PYTHONUNBUFFERED:        1
      HELM_RELEASE_NAME:       hub-neurohackademy-org-prod
      JPY_COOKIE_SECRET:       <set to the key 'hub.cookie-secret' in secret 'hub-secret'>  Optional: false
      POD_NAMESPACE:           default (v1:metadata.namespace)
      CONFIGPROXY_AUTH_TOKEN:  <set to the key 'proxy.token' in secret 'hub-secret'>  Optional: false

Due to this, I think on the hub pod, you could do...

# Default values.yaml of the DaskHub helm chart, with
# no risk of overriding a user's configuration.
jupyterhub:
  hub:
    extraEnv:
      # NOTE: we can remove the trailing .$(POD_NAMESPACE)
      # if this address will be used by a pod in the same namespace
      DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL: "http://traefik-$(HELM_RELEASE_NAME)-dask-gateway.$(POD_NAMESPACE)"

Or can we? These variables come from here, and if we add hub.extraEnv they will be appended, so can one rely on another? Hmmm... Yes, I think so, based on this example.

    env:
    - name: HELM_RELEASE_NAME
      value: hub-neurohackademy-org-prod
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace

If that works, then we would also need the HELM_RELEASE_NAME and POD_NAMESPACE variables on the user pod, which I think could make sense to add by default so user pods can be configured more easily in these situations.

@consideRatio
Collaborator

Action points

  • Try to resolve the long-standing issue of allowing k8s-native environment specification through the z2jh helm chart, so that instead of just NAME: VALUE an entry like the one below is somehow allowed.

        - name: MY_NAMESPACE_VAR
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
  • Consider injecting a HELM_RELEASE_NAME environment variable and a POD_NAMESPACE variable into the user pods by default, allowing other environment variables to reference them with $(POD_NAMESPACE) (not to be confused with typical shell syntax using ${}).

    I opened Inject HELM_RELEASE_NAME and POD_NAMESPACE into the user pods jupyterhub/zero-to-jupyterhub-k8s#1747 to represent this.

@TomAugspurger
Member Author

One other thing that would be nice is a definitive value for the JupyterHub URL that's available in jupyterhub.hub.extraConfig. We need that since users will be accessing the dashboard from their browsers (and I don't know if this can go through /services/dask-gateway/...).

I think I can get this from helm, but I'm probably missing cases since I need to check for things like

  • is https enabled?
  • if not, is a host specified (what if more than one is specified?)
  • if not, is a loadBalancerIP specified?

I'll push an update shortly that should demonstrate this.
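
As a rough illustration of those checks (field paths assumed from the z2jh values discussed in this thread; the actual update may differ):

```yaml
# Hypothetical helm-template sketch for guessing the public address.
{{- if .Values.jupyterhub.proxy.https.hosts }}
DASK_GATEWAY__PUBLIC_ADDRESS: "https://{{ first .Values.jupyterhub.proxy.https.hosts }}"
{{- else if .Values.jupyterhub.proxy.service.loadBalancerIP }}
DASK_GATEWAY__PUBLIC_ADDRESS: "http://{{ .Values.jupyterhub.proxy.service.loadBalancerIP }}"
{{- end }}
```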

@TomAugspurger
Member Author

OK, I'm reasonably happy with where things are now. Out of the box, helm install dask/daskhub would give you a JupyterHub + Dask where users can create / connect to clusters without having to specify anything. The only thing that doesn't work is dashboards. I think we need either the hostname or external IP address of the proxy-public service so that they're accessible from outside the k8s cluster.

If you already know these, then you can just helm install dask/daskhub and things will be fine. Otherwise you'll need to get the external IP from kubernetes and then re-deploy.

@minrk left a comment

Can I ask a bit about where/when/how the DASK_GATEWAY__ADDRESS env is used? Is it:

a) used mainly in the user pod, and
b) used to make pod->gateway requests or only browser->gateway?

and does the gateway service need to know its own public host?

There's a lot of complexity in trying to reliably retrieve the public hostname for the hub chart to get this to work by default more often than not, but part of the reason for that is that jupyterhub never needs to know its own name for external connections. Everything is handled via absolute paths and hosts derived from requests, either by link URLs and Location headers that are all href="/hub/..." or, on the rare occasion that the host is needed, by a tornado handler retrieving it from the Request object. JupyterHub's point-to-point communications all use service names/IPs which don't need to be publicly accessible. Would a similar approach work in this context, or does the public host need to be known in situations where request info is not available?

Thinking about the components involved, I would think that the only thing you need to store is the address for pod->gateway connections, which should be the knowable internal service name; public-facing URLs would be resolved at request time, and the only thing you need to set is the services path, /services/dask-gateway.

DASK_GATEWAY__PROXY_ADDRESS: "gateway://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}:80"
DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL: "http://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}"
# Try to detect the gateway address through a few means.
{{ if .Values.jupyterhub.proxy.https.hosts }}

Given the complexity of the Hub chart knowing its own public address (there are so many ways for people to set up ingresses, https, etc. for jupyterhub, some of which can be retrieved from the chart, some of which can't) and various sources it may come from, having a simple explicit override available in this chart may be worthwhile:

{{- if .Values.jupyterhubPublicHost }}
DASK_GATEWAY__ADDRESS: "{{ .Values.jupyterhubPublicHost }}/services/dask-gateway"
{{- else }}

It's manual, but at least it will always work and isn't sensitive to what's going on in the hub chart.

@TomAugspurger
Member Author

Thanks Min,

I think you've clarified the problem I've been having: separating singleuser-pod -> gateway-pod requests and browser -> gateway-pod requests. Fortunately the most recent dask-gateway (0.8.0) makes this easier.

Since DASK_GATEWAY__ADDRESS only needs to serve singleuser-pod -> gateway-pod requests, we can use the kubernetes-internal address http://proxy-public/services/dask-gateway.

Then the only thing that needs to worry about the external address is DASK_GATEWAY__PUBLIC_ADDRESS. I've adopted your suggestion of adding a toplevel jupyterhubPublicHost key to explicitly set this. If that's not set,
we'll try to guess it (based on jupyterhub.proxy.https.hosts, jupyterhub.proxy.service.loadBalancerIP, ...).

Hopefully that all makes sense (I'm somewhat out of my element here).

@jcrist
Member

jcrist commented Aug 24, 2020

Then the only thing that needs to worry about the external address is DASK_GATEWAY__PUBLIC_ADDRESS

Since this is just for repr'ing the dashboard link on the gateway client in the browser, this could also be a relative URL and things should work out. Something like "/services/dask-gateway/" might work.

@TomAugspurger
Member Author

TomAugspurger commented Aug 24, 2020

Dang, that's awesome. Just confirmed that the relative /services/dask-gateway/ does indeed work. Thanks Jim!

So that'll completely remove the need for the hackiest parts of detecting the hub URL. Now we'll just have a simple configmap that templates in the release name (roughly like the sketch below), and a bit of logic in values.yaml that inserts those values into the singleuser env. A simple helm install dask/daskhub will get you everything you need, including users' dashboards. No need to deploy a second time.
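
A rough sketch of that configmap; the key names come from the discussion above, but the resource name and exact values are illustrative rather than the PR's final template:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-daskhub-config
data:
  # internal, singleuser-pod -> gateway, via the JupyterHub proxy service
  DASK_GATEWAY__ADDRESS: "http://proxy-public/services/dask-gateway"
  # relative; resolved by the user's browser for dashboard links
  DASK_GATEWAY__PUBLIC_ADDRESS: "/services/dask-gateway/"
  # scheduler traffic goes straight to the gateway's traefik service
  DASK_GATEWAY__PROXY_ADDRESS: "gateway://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}:80"
```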

I'll push an update with that and some CI / frigate things shortly.

@TomAugspurger
Member Author

One question for the folks experienced with JupyterHub: is the following warning expected during a helm install?

coalesce.go:196: warning: cannot overwrite table with non table for extraEnv (map[])

I assume that's because jupyterhub.hub.extraEnv can be either a mapping of KEY: value or a list of entries like

jupyterhub:
  hub:
    extraEnv:
      - name: DASK_GATEWAY__ADDRESS
        valueFrom:
          configMapKeyRef:
            name: daskhub-config
            key: DASK_GATEWAY__ADDRESS

I can make an issue on z2jh if desired.
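
For context, here's a guess at the colliding map form; helm's coalesce can't merge this table shape with the list shape above, which would explain the warning:

```yaml
jupyterhub:
  hub:
    extraEnv:
      # mapping form: plain NAME: value pairs rather than k8s EnvVar entries
      DASK_GATEWAY__ADDRESS: "http://proxy-public/services/dask-gateway"
```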

@TomAugspurger
Member Author

OK @jacobtomlinson, things should be good here on my end.

@jacobtomlinson
Member

Thanks for the effort here @TomAugspurger.

I think I'm going to merge as-is, and if we need to make any further changes we can do them in follow-up PRs.

@jacobtomlinson jacobtomlinson merged commit a7bebcb into dask:master Aug 26, 2020
@TomAugspurger
Member Author

TomAugspurger commented Aug 26, 2020 via email

@TomAugspurger TomAugspurger deleted the pangeo branch August 26, 2020 11:23
@jacobtomlinson
Member

Should we add a deprecation notice to the old one and link folks over here?

@TomAugspurger
Member Author

Yep, I was just looking into that. I can take care of it.

Will you be able to help out with a release here @jacobtomlinson? Or do you know if there's a way to install a chart from master, or do we need a tag first?
