
Upstream Pangeo Helm Chart here #68

Closed
jacobtomlinson opened this issue Jun 8, 2020 · 23 comments · Fixed by #91
@jacobtomlinson (Member)

The Pangeo community have a Helm chart which installs both the Zero to JupyterHub and Dask Gateway Helm charts as dependencies and configures them to work together.

This enables folks to set up a multi-user deployment of Jupyter and Dask for their group or organisation. It is effectively the next step up from the single-user experience of the current chart in this repo.

There has been a discussion in that chart around consolidating it into the main Pangeo configuration to remove maintenance burden (pangeo-data/helm-chart#129).

My view is that the Pangeo chart as it stands does not have much, if any, Pangeo specific content in it. It is just a wrapper around Jupyter and Dask gateway's existing charts. Therefore I propose we upstream the chart here so that this repository contains both a single-user and multi-user chart.

Moving it here would remove the maintenance burden from the Pangeo folks and only add a small burden here as we already maintain this chart with automated PRs to update dependencies. It would also expose the chart to a wider audience.

I would be keen to get feedback from both the Dask and Pangeo communities on this proposal.

@jhamman (Member) commented Jun 11, 2020


Thanks @jacobtomlinson for opening this discussion. Just chiming in to say that I would support such a move if it finds support here, and that, if a Dask JupyterHub/Gateway chart existed, we would gladly focus our development and maintenance energies here.

@jacobtomlinson (Member, Author)

ping @dask/maintenance

@martindurant (Member)

I am generally in favour, but have to trust your assessment on how much of a maintenance burden this would be - I assume more than updating dask/distributed versions.

@jacobtomlinson (Member, Author)

@martindurant it is effectively a meta chart so will require updating the versions of the two charts it depends on, one of them being Dask Gateway so we maintain that anyway. I'm happy that this can also be automated.
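
A meta chart of this kind is mostly a `Chart.yaml` declaring its two dependencies, which an automated PR can bump. A rough sketch (the chart name, versions, and repository URLs here are illustrative placeholders, not taken from the actual chart):

```yaml
# Chart.yaml -- sketch of a meta chart wrapping the two upstream charts.
# Name, versions, and repository URLs are placeholders.
apiVersion: v2
name: daskhub
description: Multi-user JupyterHub plus Dask Gateway
version: 0.1.0
dependencies:
  - name: jupyterhub
    version: "x.y.z"
    repository: https://jupyterhub.github.io/helm-chart/
  - name: dask-gateway
    version: "x.y.z"
    repository: https://helm.dask.org/
```

Updating the chart then amounts to bumping the two `version` pins and re-running `helm dependency update`, which is the sort of change a bot can open PRs for.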

@martindurant (Member)

I am convinced

@TomAugspurger (Member)

I think I'm also on board with the general idea of us maintaining a multi-user Dask + Jupyter helm chart, powered by Dask-Gateway & JupyterHub.

The helm chart itself probably needs some updating. Over time it's become fairly opinionated and tailored to pangeo's needs & deployments. For example, I think the tolerations at https://github.com/pangeo-data/helm-chart/blob/b1a230a88d2587713eb440c9de402770f6ae32e6/pangeo/values.yaml#L66-L87 effectively require a kubernetes cluster with node pools that are able to handle those.

@scottyhq

Thanks for opening this @jacobtomlinson. Just want to say I'm in favor of this and would be happy to dedicate some time to generalizing and testing to give the chart a new home! For starters, would this essentially be defined under a subfolder in this repository and installed as `helm install --name my-release dask/dask-jupyterhub`?

@jhamman (Member) commented Jun 16, 2020


@TomAugspurger - small detail but this:

For example, I think the tolerations at https://github.com/pangeo-data/helm-chart/blob/b1a230a88d2587713eb440c9de402770f6ae32e6/pangeo/values.yaml#L66-L87 effectively require a kubernetes cluster with node pools that are able to handle those.

is not quite right. This just sets the worker/scheduler pod tolerations to match those of dask-kubernetes (ref). We may actually want to discuss this as a fixture in dask-gateway, but it doesn't require cluster nodes to have a matching taint.
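
For reference, the tolerations in question look roughly like this (reconstructed from memory of the dask-kubernetes defaults, so treat the exact keys as an assumption). A toleration only permits pods to schedule onto nodes carrying the matching taint; nodes without any taint still accept the pods, which is why no dedicated node pool is required:

```yaml
# Sketch of dask-kubernetes-style worker/scheduler tolerations
# (keys shown are assumed, not copied from the pangeo chart).
tolerations:
  - key: "k8s.dask.org/dedicated"
    operator: "Equal"
    value: "worker"
    effect: "NoSchedule"
  - key: "k8s.dask.org_dedicated"
    operator: "Equal"
    value: "worker"
    effect: "NoSchedule"
```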

@TomAugspurger (Member) commented Jun 16, 2020 via email


@TomAugspurger (Member)

I looked into this a bit yesterday. I think the biggest things to decide on are

  1. How to structure the repository (one chart, with flags to enable / disable jupyterhub / dask-gateway, or two separate charts)
  2. If we have two charts, what do we name them? helm install dask gives you the same thing as today (Dask with dask-kubernetes and a single jupyterlab / notebook server). helm install ... gives you JupyterHub + Dask Gateway for multiple users and clusters. Should we continue to call that "pangeo"? "dask-multiuser"? "daskhub"?

@TomAugspurger (Member) commented Aug 16, 2020


A couple of additional things I'm struggling with to make a `helm install dask-multiuser` work out of the box.

  1. Assuming we want to piggyback on JupyterHub auth, it's difficult to get this working out of the box. We need to set the value jupyterhub.hub.services.dask-gateway.url to the rendered value of http://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}. However, helm doesn't evaluate templates in the chart's values.yaml. Does anyone know if there's an easy way to set a rendered value in a subchart? Everything I've tried has come up empty.

Alternatively, we don't proxy through a JupyterHub service, and instead expose the gateway publicly? Pangeo might be moving to that to open up some non-JupyterHub-based workflows. Then things are fine.

  2. (Less critical than 1): it'd be nice to configure the singleuser environment to have the right address & proxy address. I've had trouble with that for the same reason: they depend on values that I don't think are known the first time we do a helm install.

@jacobtomlinson (Member, Author)

My preference would be to not call it pangeo. One of the reasons we are moving it here is because it's pretty generic and will be useful to other communities. I like dask-multiuser and daskhub or dask-hub.

I would aim for one chart with configurable options for gateway or no gateway.

I think one way to do things would be to create a config map which gets rendered with the same template and then mount the config map as environment variables in order to configure things. Happy to help/pair on it.
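
A minimal sketch of that idea (the file name and key names are hypothetical): templates in the parent chart are rendered at install time, so the release name and namespace are available there even though they cannot be interpolated inside values.yaml:

```yaml
# templates/gateway-configmap.yaml -- hypothetical sketch. Because this is
# a template (not values.yaml), Helm renders {{ .Release.* }} at install time.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-gateway-config
data:
  DASK_GATEWAY__AUTH__TYPE: "jupyterhub"
  DASK_GATEWAY__ADDRESS: "http://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}"
```

The singleuser pods would then pick these up as environment variables via a configMapRef, so the dask-gateway client is configured without the user doing anything.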

@TomAugspurger (Member)

create a config map which gets rendered with the same template and then mount the config map as environment variables in order to configure things.

I think I tried this earlier and failed. The rendering does work, but then I had trouble with mounting the configmap as a volume. I'll give it one more shot today and reach out if I have issues.

@jacobtomlinson (Member, Author)

You may have a better experience mounting the config options as environment variables rather than a volume.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#define-container-environment-variables-using-configmap-data

@TomAugspurger (Member) commented Aug 17, 2020


Indeed, I was just looking at that page and this looks much better. Trying it out now. JupyterHub didn't seem to like my first attempt with these values:

```yaml
jupyterhub:
  singleuser:
    extraEnv:
      - name: DASK_GATEWAY__AUTH__TYPE
        valueFrom:
          configMapKeyRef:
            name: daskhub-config
            key: DASK_GATEWAY__AUTH__TYPE
      - name: DASK_GATEWAY__ADDRESS
        valueFrom:
          configMapKeyRef:
            name: daskhub-config
            key: DASK_GATEWAY__ADDRESS
```

which fails with:

```
  File "/usr/local/lib/python3.6/dist-packages/traitlets/traitlets.py", line 1524, in error
    raise TraitError(e)
traitlets.traitlets.TraitError: The 'environment' trait of a PatchedKubeSpawner instance must be a dict, but a value of class 'list' (i.e. [{'name': 'DASK_GATEWAY__AUTH__TYPE', 'valueFrom': {'configMapKeyRef': {'key': 'DASK_GATEWAY__AUTH__TYPE', 'name': 'daskhub-config'}}}, {'name': 'DASK_GATEWAY__ADDRESS', 'valueFrom': {'configMapKeyRef': {'key': 'DASK_GATEWAY__ADDRESS', 'name': 'daskhub-config'}}}]) was specified.
```

@jacobtomlinson (Member, Author)

Hmm that may be a limitation of KubeSpawner.

@TomAugspurger (Member) commented Aug 17, 2020


hub.extraEnv does note that it can take a list of EnvVar objects https://zero-to-jupyterhub.readthedocs.io/en/latest/reference/reference.html#hub-extraenv. I think that's at https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/08c13609c1d0c6cb07d45d49d0a876100cf941eb/jupyterhub/templates/hub/deployment.yaml#L208-L210.

singleuser.extraEnv doesn't have any similar note, and I can't find where it would be handled.

@TomAugspurger (Member) commented Aug 17, 2020


@consideRatio does #68 (comment) sound right to you? That there isn't a simple way to pass singleuser environment variables as a list of EnvVar objects?

I think one possible solution would be for jupyterhub to run the user-provided extraEnv through the tpl function to render templates, as discussed in https://hackernoon.com/the-art-of-the-helm-chart-patterns-from-the-official-kubernetes-charts-8a7cafa86d12. I think that would allow us to provide a default values.yaml like

```yaml
jupyterhub:
  singleuser:
    extraEnv:
      DASK_GATEWAY__ADDRESS: "traefik-{{ .Release.Name }}-dask-gateway"
```

That would ensure that the singleuser pod gets the rendered form, rather than the template placeholders.
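
For illustration, the deployment template would pass each user-supplied value through `tpl` before emitting it (a sketch of the idea, not the actual z2jh template code):

```yaml
# Sketch: render user-provided extraEnv values through tpl so that
# {{ .Release.Name }} etc. are substituted before reaching the pod spec.
env:
  {{- range $name, $value := .Values.singleuser.extraEnv }}
  - name: {{ $name }}
    value: {{ tpl $value $ | quote }}
  {{- end }}
```

Because `tpl` evaluates its argument against the top-level context `$`, the string from values.yaml is treated as a template and the release name resolves correctly even though values.yaml itself is never templated.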

@consideRatio (Collaborator) commented Aug 26, 2020


@TomAugspurger ah, I see, nice that it was possible to use at all! There is an issue with using lists: they will get overridden rather than merged when you have two different configuration files specifying hub.extraEnv. And since this is a Helm chart to be distributed for use by others, it's very likely that users will end up thinking they can configure hub.extraEnv themselves, but end up overriding your configuration.

I suggest making an inline comment about that issue for now, and awaiting an improvement in z2jh that allows hub.extraEnv to be a dict that can hold k8s EnvVar objects.

@TomAugspurger (Member)

Ah, I didn't realize that it would be overwritten, but that does make sense.

I'll give an alternative method one more shot, and if that doesn't work then I'll add notes saying how to add additional hub.extraEnv values.

@consideRatio (Collaborator) commented Aug 26, 2020


I read through the article and considered various options, and I think the way to go for a chart to be distributed to other users is to rely on an extraSomething being an object whose keys represent the extraSomething elements to be added. While z2jh supports this, it assumes that you pass a string as a value rather than an entire EnvVar object, which should also be made supported.

To conclude, fixing jupyterhub/zero-to-jupyterhub-k8s#1103 is what I would recommend. Hmm.. I'll go ahead and spend time doing that now.

TomAugspurger added a commit to TomAugspurger/helm-chart-1 that referenced this issue Aug 26, 2020
As noted in
dask#68 (comment),
library charts should avoid setting `hub.extraEnv` as a list, since it's
liable to be overwritten by users.

This avoids that by doing a bit more work in Python in the
`hub.extraConfig`.
@TomAugspurger (Member)

Thanks for looking into jupyterhub/zero-to-jupyterhub-k8s#1103. I do think fixing that would generally be useful, but we may have a workaround in #94 in this case :)

TomAugspurger added a commit that referenced this issue Aug 26, 2020
* Fix setting environment variables

As noted in
#68 (comment),
library charts should avoid setting `hub.extraEnv` as a list, since it's
liable to be overwritten by users.

This avoids that by doing a bit more work in Python in the
`hub.extraConfig`.
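
The shape of that workaround, sketched in plain Python (the function and variable names are hypothetical, not the actual code from #94): the chart injects the rendered addresses into the hub pod's environment, and `hub.extraConfig` merges them into the spawner's environment dict, rather than setting `extraEnv` as a list that users could clobber.

```python
import os

# Simulate the value the chart would inject into the hub pod at install
# time (the variable name here is hypothetical).
os.environ.setdefault(
    "DASK_GATEWAY_ADDRESS", "http://traefik-daskhub-dask-gateway.default"
)


def singleuser_environment(existing=None):
    """Merge Dask Gateway settings into the spawner's environment dict,
    preserving any user-supplied entries already present."""
    env = dict(existing or {})
    env.setdefault("DASK_GATEWAY__ADDRESS", os.environ["DASK_GATEWAY_ADDRESS"])
    env.setdefault("DASK_GATEWAY__AUTH__TYPE", "jupyterhub")
    return env


# Inside hub.extraConfig this would be applied along the lines of:
#   c.KubeSpawner.environment = singleuser_environment(c.KubeSpawner.environment)
```

Because the merge happens in Python against a dict, a user who sets their own `singleuser.extraEnv` entries extends the environment instead of replacing the gateway configuration wholesale.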
@jhamman (Member) commented Aug 28, 2020


Thanks @TomAugspurger for pushing this forward!

6 participants