DaskHub helm chart #91
@@ -0,0 +1,2 @@

```text
*.lock
charts
```
@@ -0,0 +1,23 @@

```text
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
```
@@ -0,0 +1,25 @@

```yaml
apiVersion: v2
name: daskhub
version: 0.0.1-set.by.chartpress
description: Multi-user JupyterHub and Dask deployment.
dependencies:
  - name: jupyterhub
    version: "0.9.1"
    repository: 'https://jupyterhub.github.io/helm-chart/'
    import-values:
      - child: rbac
        parent: rbac
  - name: dask-gateway
    version: "0.8.0"
    repository: 'https://dask.org/dask-gateway-helm-repo/'
maintainers:
  - name: Jacob Tomlinson (Nvidia)
    email: jtomlinson@nvidia.com
  - name: Joe Hamman (NCAR)
    email: jhamman@ucar.edu
  - name: Guillaume Eynard-Bontemps (CNES)
    email: guillaume.eynard-bontemps@cnes.fr
  - name: Erik Sundell
    email: erik.i.sundell@gmail.com
  - name: Tom Augspurger
    email: tom.w.augspurger@gmail.com
```
@@ -0,0 +1,174 @@

# DaskHub

This chart provides a multi-user, Dask-Gateway enabled JupyterHub.
It combines the [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/)
and [Dask Gateway](https://gateway.dask.org/) helm charts.

For single users, a simpler setup is supported by the `dask` helm chart.

## Chart Details

This chart will deploy the following:

- A standard Dask Gateway deployment using the Dask Gateway helm chart,
  configured to use JupyterHub for authentication.
- A standard JupyterHub deployment using the JupyterHub helm chart,
  configured to proxy Dask Gateway requests and to set Dask Gateway-related
  environment variables.
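
Dask-side tools pick these variables up through Dask's configuration system. As a rough illustration of how a name like `DASK_GATEWAY__ADDRESS` becomes the nested config key `gateway.address`, here is a simplified, standard-library-only sketch modelled on `dask.config.collect_env`: drop the `DASK_` prefix, lower-case the rest, treat `__` as a nesting separator, and try to parse the value as a Python literal. The helper name `env_to_config` is hypothetical; real Dask also treats `_` and `-` in keys as equivalent.

```python
import ast


def env_to_config(env):
    """Hypothetical, simplified version of Dask's env-var convention.

    DASK_GATEWAY__ADDRESS=... ends up as {"gateway": {"address": ...}}.
    """
    config = {}
    for name, raw in env.items():
        if not name.startswith("DASK_"):
            continue  # only DASK_-prefixed variables are considered
        # "DASK_GATEWAY__PROXY_ADDRESS" -> ["gateway", "proxy_address"]
        keys = name[len("DASK_"):].lower().split("__")
        try:
            value = ast.literal_eval(raw)  # e.g. "False" -> False, "80" -> 80
        except (SyntaxError, ValueError):
            value = raw  # URLs and plain words stay strings
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


if __name__ == "__main__":
    print(env_to_config({
        "DASK_GATEWAY__ADDRESS": "http://proxy-public/services/dask-gateway",
    }))
```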

## Prepare Configuration File

In this step, we'll prepare a YAML configuration file with the fields
required by the DaskHub helm chart. It will contain some secret
keys, which should not be checked into version control in plaintext.

We need two random hex strings that will be used as keys, one for
JupyterHub and one for Dask Gateway.

Run the following command and copy the output. This is our `token-1`.

```console
openssl rand -hex 32  # generate token-1
```

Run the command again and copy the output. This is our `token-2`.

```console
openssl rand -hex 32  # generate token-2
```

Now substitute those two values for `<token-1>` and `<token-2>` below.
Note that `<token-2>` is used twice: once for `jupyterhub.hub.services.dask-gateway.apiToken`, and a second time for `dask-gateway.gateway.auth.jupyterhub.apiToken`.

```yaml
# file: secrets.yaml
jupyterhub:
  proxy:
    secretToken: "<token-1>"
  hub:
    services:
      dask-gateway:
        apiToken: "<token-2>"

dask-gateway:
  gateway:
    auth:
      jupyterhub:
        apiToken: "<token-2>"
```
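
If you prefer to script this step, the sketch below builds the same `secrets.yaml` structure using Python's `secrets.token_hex(32)`, which, like `openssl rand -hex 32`, yields 32 random bytes as 64 hex characters. The function name `make_secrets` is hypothetical. It prints JSON because JSON is also valid YAML, so the output should be usable as a Helm values file (e.g. `python make_secrets.py > secrets.yaml`).

```python
import json
import secrets


def make_secrets():
    """Build the secrets.yaml structure with two fresh 32-byte hex tokens."""
    token_1 = secrets.token_hex(32)  # proxy secret token (token-1)
    token_2 = secrets.token_hex(32)  # shared by JupyterHub and Dask Gateway (token-2)
    return {
        "jupyterhub": {
            "proxy": {"secretToken": token_1},
            "hub": {"services": {"dask-gateway": {"apiToken": token_2}}},
        },
        "dask-gateway": {
            # Must match the hub's dask-gateway service token exactly.
            "gateway": {"auth": {"jupyterhub": {"apiToken": token_2}}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_secrets(), indent=2))
```

Generating `token-2` once and reusing the variable guarantees that both `apiToken` fields match, which is easy to get wrong when pasting by hand.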

If your users wish to access Dask dashboards, you'll also need to specify the
public hostname or IP address of the hub.

```yaml
# file: config.yaml
jupyterhub:
  proxy:
    https:
      hosts:
        - "daskhub.example.com"
    service:
      loadBalancerIP: "35.202.158.223"
```

If you don't have an IP for your JupyterHub yet (if, say, you're letting
Kubernetes assign it for you), then you may need to leave this blank and
run `helm upgrade` a second time with the value set once it's known.

## Install DaskHub

This example installs into the namespace `dhub`. Make sure you're
in the same directory as the `secrets.yaml` file.

```console
$ helm upgrade --wait --install --render-subchart-notes \
    dhub dask/daskhub \
    --namespace=dhub \
    --version=0.0.1 \
    --values=secrets.yaml
```

If you also created a `config.yaml`, add `--values=config.yaml` after `--values=secrets.yaml`; Helm merges multiple values files in order.

The release notes printed by this command explain how to find the IPs for your JupyterHub and Dask Gateway. For example, `kubectl --namespace=dhub get svc proxy-public` shows:

```console
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
proxy-public   LoadBalancer   10.43.249.239   35.202.158.223   443:31587/TCP,80:30500/TCP   2m40s
```
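
To script this step, the `EXTERNAL-IP` column can be pulled out of the listing. The helper below (`external_ip`, a hypothetical name) is a minimal sketch that assumes whitespace-separated columns, as in the output above; it returns `None` while Kubernetes still reports `<pending>`.

```python
def external_ip(kubectl_output, service="proxy-public"):
    """Return the EXTERNAL-IP of `service` from `kubectl get svc` text, or None."""
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    column = header.index("EXTERNAL-IP")  # locate the column by its header
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == service:
            ip = fields[column]
            # LoadBalancer IPs show as <pending> until the cloud assigns one.
            return None if ip in ("<pending>", "<none>") else ip
    return None  # service not listed


if __name__ == "__main__":
    sample = (
        "NAME           TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE\n"
        "proxy-public   LoadBalancer   10.43.249.239   35.202.158.223   443:31587/TCP,80:30500/TCP   2m40s\n"
    )
    print(external_ip(sample))
```

Parsing the human-readable table is fragile across `kubectl` versions; asking for structured output (for example `-o jsonpath='{.status.loadBalancer.ingress[0].ip}'`) is the sturdier choice if you control the command.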

JupyterHub is available at the `proxy-public` external IP (35.202.158.223 in this example).
Note that this value needs to be set as `jupyterhub.proxy.service.loadBalancerIP`, so that the chart can point `DASK_GATEWAY__ADDRESS` at the public address.

```yaml
# file: config.yaml
jupyterhub:
  proxy:
    service:
      loadBalancerIP: "35.202.158.223"
```

Be sure to re-run `helm upgrade` with this value set to enable the Dask dashboard.

## Creating a Dask Cluster

To create a Dask cluster, first connect to the Dask Gateway:

```python
>>> from dask_gateway import Gateway
>>> gateway = Gateway()
>>> gateway.list_clusters()
[]
```

Since the chart configures JupyterHub to set the Dask Gateway-related environment variables, `Gateway()` can be called without an address.

Once connected to the gateway, create a cluster and connect a client.

```python
>>> cluster = gateway.new_cluster()
>>> client = cluster.get_client()
```

## Matching the user environment

The Dask client runs in the user's JupyterHub singleuser environment. To ensure
that the same environment is used for the scheduler and workers, you can provide
its image as a Gateway option.

```yaml
# file: config.yaml
dask-gateway:
  gateway:
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, String

        def option_handler(options):
            if ":" not in options.image:
                raise ValueError("When specifying an image you must also provide a tag")
            return {
                "image": options.image,
            }

        c.Backend.cluster_options = Options(
            String("image", default="pangeo/base-notebook:2020.07.28", label="Image"),
            handler=option_handler,
        )
```

The user environment (and therefore the image above) will need to include `dask-gateway`.
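
The handler's validation can be exercised outside the gateway. This sketch copies the `option_handler` logic from the config above into plain Python and uses `types.SimpleNamespace` as a stand-in for the options object Dask Gateway passes in; that stand-in is an assumption made only so the snippet runs without `dask_gateway_server` installed.

```python
from types import SimpleNamespace


def option_handler(options):
    # Same check as in the gateway config: the image needs an explicit tag.
    if ":" not in options.image:
        raise ValueError("When specifying an image you must also provide a tag")
    return {"image": options.image}


if __name__ == "__main__":
    # Stand-in for the options object (hypothetical, for local testing only).
    print(option_handler(SimpleNamespace(image="pangeo/base-notebook:2020.07.28")))
    # {'image': 'pangeo/base-notebook:2020.07.28'}
```

From a notebook, users would then choose an image when creating a cluster, e.g. `gateway.new_cluster(image="pangeo/base-notebook:2020.07.28")`. Note the check is deliberately simple: an untagged image from a registry with a port (`host:5000/image`) would still pass.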

## Using dask-kubernetes instead of Dask Gateway

Users who don't need Dask Gateway can use dask-kubernetes to create and manage Dask clusters. To use dask-kubernetes, set:

```yaml
# file: config.yaml
jupyterhub:
  singleuser:
    serviceAccountName: daskkubernetes

dask-gateway:
  enabled: false

dask-kubernetes:
  enabled: true
```

When deploying, helm will create a Kubernetes ServiceAccount, Role, and RoleBinding (only when both `dask-kubernetes.enabled` and `rbac.enabled` are true). This gives the pods running JupyterHub's single-user servers the elevated permissions needed to start and stop pods.
@@ -0,0 +1,22 @@

```text
DaskHub
-------

Thank you for installing DaskHub, a multiuser, Dask-enabled JupyterHub!

Your release is named {{.Release.Name}} and installed into the namespace {{.Release.Namespace}}.


JupyterHub
----------

You can check whether the hub and proxy are ready by running:

  kubectl --namespace={{.Release.Namespace}} get pod

and watching for both of those pods to reach status 'Running'.

You can find the public IP of the JupyterHub by running:

  kubectl --namespace={{.Release.Namespace}} get svc proxy-public

It might take a few minutes for it to appear!
```
@@ -0,0 +1,52 @@

```yaml
{{- if and (index .Values "dask-kubernetes" "enabled") .Values.rbac.enabled -}}
kind: ServiceAccount
apiVersion: v1
metadata:
  name: daskkubernetes
  namespace: {{ .Release.Namespace }}
  labels:
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    component: daskkubernetes
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}

---

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: daskkubernetes
  namespace: {{ .Release.Namespace }}
  labels:
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    component: daskkubernetes
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""] # "" indicates the core API group
    resources: ["pods/log"]
    verbs: ["get", "list"]

---

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: daskkubernetes
  namespace: {{ .Release.Namespace }}
  labels:
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    component: daskkubernetes
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}
subjects:
  - kind: ServiceAccount
    name: daskkubernetes
roleRef:
  kind: Role
  name: daskkubernetes
  apiGroup: rbac.authorization.k8s.io
{{- end }}
```
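
Kubernetes RBAC rules are purely additive: a request is allowed if any rule in the bound Role matches its API group, resource, and verb. The sketch below applies that matching to the rules of the Role above, which makes it easy to see that, for example, `pods/log` is readable but not deletable. `is_allowed` is a hypothetical helper and deliberately ignores wildcards (`"*"`), `resourceNames`, and `nonResourceURLs`.

```python
# Rules copied from the daskkubernetes Role above.
DASKKUBERNETES_RULES = [
    {"apiGroups": [""], "resources": ["pods", "services"],
     "verbs": ["get", "list", "watch", "create", "delete"]},
    {"apiGroups": [""], "resources": ["pods/log"],
     "verbs": ["get", "list"]},
]


def is_allowed(rules, verb, resource, api_group=""):
    """True if any rule grants `verb` on `resource` in `api_group` (simplified)."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]  # subresources like pods/log must be listed explicitly
        and verb in rule["verbs"]
        for rule in rules
    )


if __name__ == "__main__":
    for verb, resource in [("create", "pods"), ("get", "pods/log"), ("patch", "pods")]:
        print(verb, resource, is_allowed(DASKKUBERNETES_RULES, verb, resource))
```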
@@ -0,0 +1,15 @@

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: daskhub-config
data:
  DASK_GATEWAY__PROXY_ADDRESS: "gateway://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}:80"
  DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL: "http://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}"
  # Try to detect the gateway address through a few means.
  {{ if .Values.jupyterhub.proxy.https.hosts }}
  DASK_GATEWAY__ADDRESS: "https://{{ .Values.jupyterhub.proxy.https.hosts | first }}/services/dask-gateway"
  {{ else if .Values.jupyterhub.proxy.service.loadBalancerIP }}
  DASK_GATEWAY__ADDRESS: "http://{{ .Values.jupyterhub.proxy.service.loadBalancerIP }}/services/dask-gateway"
  {{ else }}
  DASK_GATEWAY__ADDRESS: "http://proxy-public/services/dask-gateway"
  {{ end }}
```

Review comment on the `{{ if .Values.jupyterhub.proxy.https.hosts }}` line:

Given the complexity of the Hub chart knowing its own public address (there are so many ways for people to set up ingresses, https, etc. for jupyterhub, some of which can be retrieved from the chart, some of which can't) and various sources it may come from, having a simple explicit override available in this chart may be worthwhile:

```yaml
{{- if .Values.jupyterhubPublicHost }}
DASK_GATEWAY__ADDRESS: "{{ .Values.jupyterhubPublicHost }}/services/dask-gateway"
{{- else }}
```

It's manual, but at least it will always work and isn't sensitive to what's going on in the hub chart.
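
The fallback order in the ConfigMap template above (first HTTPS host, then `loadBalancerIP`, then the in-cluster `proxy-public` service) can be mirrored in plain Python, which is handy for checking what a given values file will produce. This is a sketch: `gateway_address` and `proxy_address` are hypothetical helpers, and `values` is assumed to be the merged Helm values as a nested dict.

```python
def gateway_address(values):
    """Mirror the template's choice of DASK_GATEWAY__ADDRESS."""
    proxy = values.get("jupyterhub", {}).get("proxy", {})
    hosts = proxy.get("https", {}).get("hosts") or []  # empty list counts as unset
    ip = proxy.get("service", {}).get("loadBalancerIP")
    if hosts:
        return f"https://{hosts[0]}/services/dask-gateway"  # `first` host wins
    if ip:
        return f"http://{ip}/services/dask-gateway"
    return "http://proxy-public/services/dask-gateway"  # in-cluster fallback


def proxy_address(release, namespace):
    """DASK_GATEWAY__PROXY_ADDRESS as templated from the release name and namespace."""
    return f"gateway://traefik-{release}-dask-gateway.{namespace}:80"


if __name__ == "__main__":
    print(gateway_address({"jupyterhub": {"proxy": {"service": {"loadBalancerIP": "35.202.158.223"}}}}))
    print(proxy_address("dhub", "dhub"))
```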

Does chartpress set this from the tag?
Currently with the dask chart we increment this each release. It is semi-automated today with this script.
https://github.com/dask/helm-chart/blob/master/ci/release.sh#L9
We should probably bring them in line. If chartpress can do this automatically from git tags I'd be in favour of that.

This came from pangeo-data/helm-chart@66b7ccd (@consideRatio). But it looks like chartpress does indeed set it based on the tag + commit.

Chartpress indeed sets the version based on a tag, the number of commits since the tag, and the hash. But note that the tag needs to be on a commit in the branch; if the latest tag was on a commit not in the branch, chartpress would ignore it.

Here's the output of `chartpress --long`

So things seem to be working OK. I assume that when we tag a release this will be set to the actual value?