Added instructions for adding GPUs and increasing shared memory
===============================================================
This would limit your users to a maximum of .5 of a CPU (so 1/2 of a CPU core).

Remember to :ref:`apply the change <apply-config-changes>` after changing your ``config.yaml`` file!

Set user GPU guarantees / limits
--------------------------------

It is possible to allocate GPUs to your user. This is useful for heavier
workloads, such as deep learning, that can take advantage of GPUs.

For example, to create a profile that allocates one NVIDIA GPU:

.. code-block:: yaml

   singleuser:
     profileList:
       - display_name: "GPU Server"
         description: "Spawns a notebook server with access to a GPU"
         kubespawner_override:
           extra_resource_limits:
             nvidia.com/gpu: "1"

This assumes that at least one of your Kubernetes nodes has a compatible
GPU attached. How to set this up depends on your infrastructure provider.
Here are a few links to help you get started; a GKE example follows the
list:

- `Google Kubernetes Engine (GKE) <https://cloud.google.com/kubernetes-engine/docs/how-to/gpus>`_
- `Amazon Elastic Kubernetes Service (EKS) <https://aws.amazon.com/blogs/compute/running-gpu-accelerated-kubernetes-workloads-on-p3-and-p2-ec2-instances-with-amazon-eks/>`_
- `Azure Kubernetes Service (AKS) <https://docs.microsoft.com/en-us/azure/aks/gpu-cluster>`_
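
As an illustration, on GKE a node pool with GPU-equipped nodes can be
created with a command along these lines (the pool name, accelerator
type, zone, and cluster name below are placeholders to adapt to your
setup; see the GKE documentation linked above for current options):

.. code-block:: none

   gcloud container node-pools create gpu-pool \
     --accelerator type=nvidia-tesla-k80,count=1 \
     --zone us-central1-a \
     --cluster my-cluster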

You will also need to deploy NVIDIA's k8s-device-plugin, following the instructions `here <https://github.com/NVIDIA/k8s-device-plugin#quick-start>`_.
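
At the time of writing, the quick-start amounts to applying a DaemonSet
manifest. The manifest URL and version change between releases, so treat
the following as a sketch and check the plugin's README for the current
command:

.. code-block:: none

   kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.12/nvidia-device-plugin.yml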

To check that your GPUs are schedulable by Kubernetes, you can run the following command:

.. code-block:: none

   kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
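
On a cluster with a single GPU node, the output would look something like
the following (the node names here are purely illustrative; nodes without
GPUs report :code:`<none>`):

.. code-block:: none

   NAME                  GPUs
   gke-pool-gpu-node-1   1
   gke-pool-cpu-node-1   <none>
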
Modifying user shared memory size
---------------------------------

It is also beneficial to increase the shared memory (SHM) allocation on
pods running workloads like deep learning. This is required for features
like PyTorch's :code:`DataLoader` with multiple worker processes to run
properly.

The following configuration will increase the SHM allocation by mounting a
:code:`tmpfs` (ramdisk) at :code:`/dev/shm`, replacing the default 64MB allocation.

.. code-block:: yaml

   singleuser:
     storage:
       extraVolumes:
         - name: shm-volume
           emptyDir:
             medium: Memory
       extraVolumeMounts:
         - name: shm-volume
           mountPath: /dev/shm

The :code:`shm-volume` volume is created when the user's pod is created,
and destroyed when the pod is destroyed.

Some important notes regarding SHM allocation:

- SHM usage by the pod counts towards its memory limit
- When the memory limit is exceeded, the pod will be evicted (see the
  sketch below for capping SHM usage)
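
If you want to cap SHM usage below the pod's memory limit, an
:code:`emptyDir` volume accepts an optional :code:`sizeLimit` field. A
minimal sketch, assuming a 2 GiB cap (adjust the size to your workload):

.. code-block:: yaml

   singleuser:
     storage:
       extraVolumes:
         - name: shm-volume
           emptyDir:
             medium: Memory
             sizeLimit: 2Gi

As before, the mount under :code:`extraVolumeMounts` stays the same.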

Modifying user storage type and size
------------------------------------
