DaskHub helm chart #91

Merged
merged 15 commits into from
Aug 26, 2020
4 changes: 4 additions & 0 deletions chartpress.yaml
@@ -3,3 +3,7 @@ charts:
    repo:
      git: dask/helm-chart
      published: https://helm.dask.org
  - name: daskhub
    repo:
      git: dask/helm-chart
      published: https://helm.dask.org
2 changes: 2 additions & 0 deletions daskhub/.gitignore
@@ -0,0 +1,2 @@
*.lock
charts
23 changes: 23 additions & 0 deletions daskhub/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
25 changes: 25 additions & 0 deletions daskhub/Chart.yaml
@@ -0,0 +1,25 @@
apiVersion: v2
name: daskhub
version: 0.0.1-set.by.chartpress
Member

Does chartpress set this from the tag?

Currently with the dask chart we increment this each release. It is semi-automated today with this script.

https://github.com/dask/helm-chart/blob/master/ci/release.sh#L9

We should probably bring them in line. If chartpress can do this automatically from git tags I'd be in favour of that.

Member Author

This came from pangeo-data/helm-chart@66b7ccd (@consideRatio). But it looks like chartpress does indeed set it based on the tag + commit.

Collaborator

Chartpress indeed sets the version based on a tag, the number of commits since the tag, and the commit hash. But note that the tag must be on a commit in the branch: if the latest tag is on a commit that is not in the branch, chartpress will ignore it.
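To make the scheme concrete, here is a minimal Python sketch that formats a version the way the `chartpress --long` output in this thread does (e.g. `4.3.1-n008.hfc9e4ce`: the tag, a zero-padded commit count, then `h` plus a 7-character hash). It starts from `git describe --tags --long`, which only considers tags reachable from the current commit, matching the caveat about tags outside the branch. Both function names are hypothetical, this is not chartpress's own code, and returning the bare tag when there are zero commits since it is an assumption.

```python
def chartpress_style_version(tag: str, n_commits: int, commit: str) -> str:
    """Format tag, commit count and hash like the chartpress output in this PR."""
    if n_commits == 0:
        # Assumption: a commit sitting exactly on the tag gets the plain tag.
        return tag
    return f"{tag}-n{n_commits:03d}.h{commit[:7]}"


def version_from_describe(describe: str) -> str:
    """Convert `git describe --tags --long` output, e.g. "4.3.1-8-gfc9e4ce"."""
    # Split from the right: the last two fields are always the count and "g<hash>".
    tag, n_commits, g_hash = describe.rsplit("-", 2)
    return chartpress_style_version(tag, int(n_commits), g_hash[1:])  # strip "g"
```

Because `git describe` walks back from `HEAD`, a tag placed on a commit outside the branch is never seen, which is why the tag has to live on the branch being built.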

Member Author

Here's the output of chartpress --long

diff --git a/dask/Chart.yaml b/dask/Chart.yaml
index d79b1fd..ee832d7 100755
--- a/dask/Chart.yaml
+++ b/dask/Chart.yaml
@@ -1,7 +1,6 @@
----
 apiVersion: v1
 name: dask
-version: 4.3.1
+version: 4.3.1-n001.h7878309
 appVersion: 2.23.0
 description: Distributed computation in Python with task scheduling
 home: https://dask.org
diff --git a/daskhub/Chart.yaml b/daskhub/Chart.yaml
index 48e1b7d..ac7a094 100644
--- a/daskhub/Chart.yaml
+++ b/daskhub/Chart.yaml
@@ -1,7 +1,7 @@
 apiVersion: v2
 name: daskhub
 icon: https://avatars3.githubusercontent.com/u/17131925?v=3&s=200
-version: 0.0.1-set.by.chartpress
+version: 4.3.1-n008.hfc9e4ce
 description: Multi-user JupyterHub and Dask deployment.
 dependencies:
   - name: jupyterhub

So things seem to be working OK. I assume that when we tag a release, this will be set to the actual value?

description: Multi-user JupyterHub and Dask deployment.
dependencies:
  - name: jupyterhub
    version: "0.9.1"
    repository: 'https://jupyterhub.github.io/helm-chart/'
    import-values:
      - child: rbac
        parent: rbac
  - name: dask-gateway
    version: "0.8.0"
    repository: 'https://dask.org/dask-gateway-helm-repo/'
maintainers:
  - name: Jacob Tomlinson (Nvidia)
    email: jtomlinson@nvidia.com
  - name: Joe Hamman (NCAR)
    email: jhamman@ucar.edu
  - name: Guillaume Eynard-Bontemps (CNES)
    email: guillaume.eynard-bontemps@cnes.fr
  - name: Erik Sundell
    email: erik.i.sundell@gmail.com
  - name: Tom Augspurger
    email: tom.w.augspurger@gmail.com
174 changes: 174 additions & 0 deletions daskhub/README.md
@@ -0,0 +1,174 @@
# DaskHub

This chart provides a multi-user, Dask-Gateway enabled JupyterHub.
It combines the [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/)
and [Dask Gateway](https://gateway.dask.org/) helm charts.

For single users, a simpler setup is supported by the `dask` helm chart.

## Chart Details

This chart will deploy the following:

- A standard Dask Gateway deployment using the Dask Gateway helm chart,
configured to use JupyterHub for authentication.
- A standard JupyterHub deployment using the JupyterHub helm chart,
configured to proxy Dask Gateway requests and to set Dask Gateway-related
environment variables.

## Prepare Configuration File

In this step, we'll prepare a YAML configuration file with the fields
required by the DaskHub helm chart. It will contain some secret
keys, which should not be checked into version control in plaintext.

We need two random hex strings that will be used as keys, one for
JupyterHub and one for Dask Gateway.

Run the following command, and copy the output. This is our `token-1`.

```console
openssl rand -hex 32 # generate token-1
```

Run the command again and copy the output. This is our `token-2`.

```console
openssl rand -hex 32 # generate token-2
```

Now substitute those two values for `<token-1>` and `<token-2>` below.
Note that `<token-2>` is used twice, once for `jupyterhub.hub.services.dask-gateway.apiToken`, and a second time for `dask-gateway.gateway.auth.jupyterhub.apiToken`.


```yaml
# file: secrets.yaml
jupyterhub:
  proxy:
    secretToken: "<token-1>"
  hub:
    services:
      dask-gateway:
        apiToken: "<token-2>"

dask-gateway:
  gateway:
    auth:
      jupyterhub:
        apiToken: "<token-2>"
```
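If you would rather script this step, the following is a minimal Python sketch (the function name `make_secrets` and running it as a script are illustrative choices, not part of the chart). It uses `secrets.token_hex(32)`, which, like `openssl rand -hex 32`, yields 32 random bytes as 64 hex characters, and it reuses token-2 in both places. It prints JSON, which helm can read as a values file because JSON is valid YAML.

```python
import json
import secrets


def make_secrets() -> dict:
    """Build the secrets.yaml values with two freshly generated tokens."""
    # 32 random bytes rendered as 64 hex characters, like `openssl rand -hex 32`.
    token_1 = secrets.token_hex(32)
    token_2 = secrets.token_hex(32)
    return {
        "jupyterhub": {
            "proxy": {"secretToken": token_1},
            # token-2, JupyterHub side: the hub's registered dask-gateway service.
            "hub": {"services": {"dask-gateway": {"apiToken": token_2}}},
        },
        # token-2 again, Dask Gateway side: must match the hub's service token.
        "dask-gateway": {
            "gateway": {"auth": {"jupyterhub": {"apiToken": token_2}}},
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be passed to helm with --values.
    print(json.dumps(make_secrets(), indent=2))
```

Redirect the output into `secrets.yaml` (e.g. `python make_secrets.py > secrets.yaml`, with a hypothetical script name) and, as above, keep that file out of version control.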

If your users wish to access Dask dashboards, you'll also need to specify the
public hostname or IP address of the hub.

```yaml
# file: config.yaml
jupyterhub:
  proxy:
    https:
      hosts:
        - "daskhub.example.com"
    service:
      loadBalancerIP: "35.202.158.223"
```

If you don't have an IP for your JupyterHub yet (if, say, you're letting
Kubernetes assign it for you), then you may need to leave this blank and
run `helm upgrade` a second time with the value set once it's known.

## Install DaskHub

This example installs into the namespace `dhub` and assumes the Dask helm
repository has been added (`helm repo add dask https://helm.dask.org`).
Make sure you're in the same directory as the `secrets.yaml` file.

```console
$ helm upgrade --wait --install --render-subchart-notes \
    dhub dask/daskhub \
    --namespace=dhub \
    --version=0.0.1 \
    --values=secrets.yaml
```

The output explains how to find the IPs for your JupyterHub and Dask Gateway.

```console
$ kubectl --namespace=dhub get svc proxy-public
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
proxy-public   LoadBalancer   10.43.249.239   35.202.158.223   443:31587/TCP,80:30500/TCP   2m40s
```

JupyterHub is available at the `proxy-public` external IP (35.202.158.223 in this example).
Set this value as `jupyterhub.proxy.service.loadBalancerIP` in `config.yaml`:

```yaml
# file: config.yaml
jupyterhub:
  proxy:
    service:
      loadBalancerIP: "35.202.158.223"
```

Be sure to (re)deploy with this value set, adding `--values=config.yaml` to the `helm upgrade` command above, to enable the Dask dashboard.

## Creating a Dask Cluster

To create a Dask cluster, first connect to the Dask Gateway:

```python
>>> from dask_gateway import Gateway
>>> gateway = Gateway()
>>> gateway.list_clusters()
[]
```

Once connected to the gateway, create a cluster and connect a client.

```python
>>> cluster = gateway.new_cluster()
>>> client = cluster.get_client()
```

## Matching the user environment

Dask clients run in the JupyterHub single-user environment. To ensure
that the scheduler and workers use the same environment, you can provide
the image as a Dask Gateway cluster option:

```yaml
# config.yaml
dask-gateway:
  gateway:
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, String

        def option_handler(options):
            # Reject untagged images so every cluster pins a specific version.
            if ":" not in options.image:
                raise ValueError("When specifying an image you must also provide a tag")
            return {
                "image": options.image,
            }

        c.Backend.cluster_options = Options(
            String("image", default="pangeo/base-notebook:2020.07.28", label="Image"),
            handler=option_handler,
        )
```

The user environment will need to include `dask-gateway`.
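On the client side, a notebook can look up its own image and request it for the cluster. The sketch below assumes the single-user pod was started by KubeSpawner, which exposes the image spec as the `JUPYTER_IMAGE_SPEC` environment variable; the helper name `singleuser_image` is hypothetical. It repeats the server-side tag check so a bad image fails before any request is sent.

```python
import os
from typing import Mapping, Optional


def singleuser_image(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the notebook's own image spec, or None if it isn't exposed.

    Assumes KubeSpawner set JUPYTER_IMAGE_SPEC in the single-user pod.
    """
    env = os.environ if env is None else env
    image = env.get("JUPYTER_IMAGE_SPEC")
    if image is None:
        return None
    # Same check as the server-side option_handler, so failures surface early.
    if ":" not in image:
        raise ValueError(f"Image {image!r} has no tag; the gateway would reject it")
    return image
```

In a notebook, `gateway.new_cluster(image=singleuser_image())` would then request that image, since `Gateway.new_cluster` accepts cluster options as keyword arguments; if the helper returns `None`, calling `gateway.new_cluster()` without arguments keeps the default from `config.yaml`.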

## Using dask-kubernetes instead of Dask Gateway

Users who don't need Dask Gateway can use dask-kubernetes to create Dask clusters instead. To use dask-kubernetes, set:

```yaml
# config.yaml
jupyterhub:
  singleuser:
    serviceAccountName: daskkubernetes

dask-gateway:
  enabled: false

dask-kubernetes:
  enabled: true
```

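When deploying with `rbac.enabled` (imported from the JupyterHub chart's values), helm will create a Kubernetes ServiceAccount, Role, and RoleBinding. These give the pods serving JupyterHub single-user servers the elevated permissions they need to create, list, watch, and delete pods and services, and to read pod logs.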
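To make those permissions concrete, here is a simplified Python model of the Role in `templates/dask-kubernetes-rbac.yaml`. Kubernetes RBAC rules are additive: a request is allowed if any rule lists its API group, resource, and verb. This sketch ignores wildcards, `resourceNames`, and the RoleBinding itself, and `role_allows` is a hypothetical helper, not a Kubernetes API.

```python
# Rules copied from the daskkubernetes Role; "" is the core API group.
ROLE_RULES = [
    {"apiGroups": [""], "resources": ["pods", "services"],
     "verbs": ["get", "list", "watch", "create", "delete"]},
    {"apiGroups": [""], "resources": ["pods/log"],
     "verbs": ["get", "list"]},
]


def role_allows(resource: str, verb: str, api_group: str = "") -> bool:
    """Return True if any rule grants `verb` on `resource` (no wildcard support)."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in ROLE_RULES
    )
```

For example, the Role lets dask-kubernetes create and delete worker pods and read their logs, but not patch pods or read Secrets.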
22 changes: 22 additions & 0 deletions daskhub/templates/NOTES.txt
@@ -0,0 +1,22 @@
DaskHub
-------

Thank you for installing DaskHub, a multiuser, Dask-enabled JupyterHub!

Your release is named {{.Release.Name}} and installed into the namespace {{.Release.Namespace}}.


Jupyter Hub
-----------

You can check whether the hub and proxy are ready by running:

kubectl --namespace={{.Release.Namespace}} get pod

and watching for both those pods to be in status 'Ready'.

You can find the public IP of the JupyterHub by running:

kubectl --namespace={{.Release.Namespace}} get svc proxy-public

It might take a few minutes for it to appear!
52 changes: 52 additions & 0 deletions daskhub/templates/dask-kubernetes-rbac.yaml
@@ -0,0 +1,52 @@
{{- if and (index .Values "dask-kubernetes" "enabled") .Values.rbac.enabled -}}
kind: ServiceAccount
apiVersion: v1
metadata:
  name: daskkubernetes
  namespace: {{ .Release.Namespace }}
  labels:
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    component: daskkubernetes
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}

---

kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: daskkubernetes
  namespace: {{ .Release.Namespace }}
  labels:
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    component: daskkubernetes
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods", "services"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods/log"]
  verbs: ["get", "list"]

---

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: daskkubernetes
  namespace: {{ .Release.Namespace }}
  labels:
    chart: {{ .Chart.Name }}-{{ .Chart.Version }}
    component: daskkubernetes
    heritage: {{ .Release.Service }}
    release: {{ .Release.Name }}
subjects:
- kind: ServiceAccount
  name: daskkubernetes
roleRef:
  kind: Role
  name: daskkubernetes
  apiGroup: rbac.authorization.k8s.io
{{- end }}
15 changes: 15 additions & 0 deletions daskhub/templates/daskhub-configmap.yaml
@@ -0,0 +1,15 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: daskhub-config
data:
  DASK_GATEWAY__PROXY_ADDRESS: "gateway://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}:80"
  DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL: "http://traefik-{{ .Release.Name }}-dask-gateway.{{ .Release.Namespace }}"
  # Try to detect the gateway address through a few means.
{{ if .Values.jupyterhub.proxy.https.hosts }}

Because it's hard for the Hub chart to know its own public address (there are many ways to set up ingresses, HTTPS, etc. for JupyterHub; some can be retrieved from the chart and some can't), a simple explicit override in this chart may be worthwhile:

{{- if .Values.jupyterhubPublicHost }}
  DASK_GATEWAY__ADDRESS: "{{ .Values.jupyterhubPublicHost }}/services/dask-gateway"
{{- else }}

It's manual, but it will always work and doesn't depend on what's going on in the hub chart.

  DASK_GATEWAY__ADDRESS: "https://{{ .Values.jupyterhub.proxy.https.hosts | first }}/services/dask-gateway"
{{ else if .Values.jupyterhub.proxy.service.loadBalancerIP }}
  DASK_GATEWAY__ADDRESS: "http://{{ .Values.jupyterhub.proxy.service.loadBalancerIP }}/services/dask-gateway"
{{ else }}
  DASK_GATEWAY__ADDRESS: "http://proxy-public/services/dask-gateway"
{{ end }}
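The ConfigMap's values can be mirrored in a few lines of Python, which makes the precedence explicit: the first HTTPS host wins, then the load-balancer IP, then the in-cluster `proxy-public` service. This is a hypothetical sketch for reasoning about the template, not code shipped with the chart. Dask's configuration system maps `DASK_`-prefixed environment variables to config keys, with `__` separating nesting levels, so `DASK_GATEWAY__ADDRESS` becomes `gateway.address`, which a no-argument `Gateway()` reads.

```python
from typing import Any, Dict, Mapping


def daskhub_env(release: str, namespace: str, values: Mapping[str, Any]) -> Dict[str, str]:
    """Mirror the environment variables rendered into the daskhub-config ConfigMap."""
    gateway_svc = f"traefik-{release}-dask-gateway.{namespace}"
    proxy = values.get("jupyterhub", {}).get("proxy", {})
    hosts = proxy.get("https", {}).get("hosts") or []
    ip = proxy.get("service", {}).get("loadBalancerIP")
    if hosts:
        # `| first` in the template: only the first HTTPS host is used.
        address = f"https://{hosts[0]}/services/dask-gateway"
    elif ip:
        address = f"http://{ip}/services/dask-gateway"
    else:
        # In-cluster fallback: the proxy-public service name.
        address = "http://proxy-public/services/dask-gateway"
    return {
        "DASK_GATEWAY__PROXY_ADDRESS": f"gateway://{gateway_svc}:80",
        "DASKHUB_JUPYTERHUB_SERVICE_GATEWAY_URL": f"http://{gateway_svc}",
        "DASK_GATEWAY__ADDRESS": address,
    }
```

With the README's example values and a release named `dhub` in namespace `dhub`, the address resolves to `https://daskhub.example.com/services/dask-gateway`.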