Warning: these are probably completely out of date and unlikely to work.
These are a few scripts to take an Amazon EC2 instance from scratch to running nvidia-docker, so that you can easily run any deep learning library. The one exception is Tensorflow: I also have to provide a modified version of the Tensorflow devel-gpu Dockerfile, as the standard one is not configured for CUDA compute capability 3.0.
The installation procedure is pretty crude and involves running the three scripts provided in this repository in sequence, rebooting in between (but not after the third).
Start either a regular or spot instance. The image this is tested with is the 64-bit Ubuntu 14.04 image ami-8446ff93. If you would like to start an instance from the command line, fill in the KeyName and SecurityGroupIds in nvidocker-spec.json.template and save the result as nvidocker-spec.json, the file name the command below expects.
Then, with the AWS CLI tools installed, you can run the following command to start a spot instance with a max price of one dollar per hour:
aws ec2 request-spot-instances --spot-price 1.00 --instance-count 1 --type one-time --launch-specification file://nvidocker-spec.json
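A spot request is not always fulfilled straight away, so it can help to check on it before looking for the instance. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the function name spot_request_status is hypothetical and not part of this repository.

```shell
# Hypothetical helper: list each spot request with its state and status code,
# one request per line, tab-separated.
spot_request_status() {
  aws ec2 describe-spot-instance-requests \
    --query 'SpotInstanceRequests[*].[SpotInstanceRequestId,State,Status.Code]' \
    --output text
}

# Usage:
#   spot_request_status
```

Once the request has been fulfilled, the instance should show up in describe-instances.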
If you're not using AWS for anything else, you can also use the following command to get the public DNS name of the first instance in your account, which is probably the one you just started (give it a few minutes to start up):
aws ec2 describe-instances --query 'Reservations[0].Instances[0].PublicDnsName'
Later, I'm going to refer to the address of this instance as EC2ADDRESS.
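To avoid copying the address by hand, the lookup can be wrapped in a small shell function and stored in a variable. This is only a sketch: the function name get_ec2_address is hypothetical, and the running-state filter and `--output text` are additions that are not part of the original command.

```shell
# Hypothetical helper: print the public DNS name of the first running instance.
# Assumes the AWS CLI is installed and configured, and that the instance you
# just started is the only one running in your default region.
get_ec2_address() {
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[0].Instances[0].PublicDnsName' \
    --output text
}

# Usage (once the instance has had a few minutes to start):
#   EC2ADDRESS=$(get_ec2_address)
#   ssh -i <your-key>.pem ubuntu@$EC2ADDRESS
```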
First, copy all of the files in this repository to the new machine:
scp -i <your-key>.pem * ubuntu@$EC2ADDRESS:~/
Or, clone this repo while in a shell on the remote machine:
git clone https://github.com/gngdb/nvidia-docker-ec2.git
Make all the scripts executable:
chmod +x ec2-nvidocker-setup-*
Then run the three scripts in order. The first two will trigger a reboot upon finishing, so log back in after each reboot before running the next one:
./ec2-nvidocker-setup-1.sh
REBOOT
./ec2-nvidocker-setup-2.sh
REBOOT
./ec2-nvidocker-setup-3.sh
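If you are driving the scripts from your local machine, you need some way to tell when the instance is back up after each reboot. The sketch below polls SSH until a login succeeds. The function name wait_for_ssh and the KEYFILE variable are hypothetical, and it assumes the scripts can be run non-interactively over SSH, which I have not checked.

```shell
# Hypothetical helper: poll until an SSH login succeeds or the attempts run out.
# $1 = host, $2 = max attempts (default 30), $3 = seconds between attempts (default 10)
# Assumes KEYFILE holds the path to your .pem key.
wait_for_ssh() {
  host=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # BatchMode stops ssh from prompting; "true" just checks that login works.
    if ssh -i "$KEYFILE" -o ConnectTimeout=5 -o BatchMode=yes \
         "ubuntu@$host" true; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   ssh -i "$KEYFILE" ubuntu@$EC2ADDRESS ./ec2-nvidocker-setup-1.sh
#   sleep 30   # give the reboot time to actually begin
#   wait_for_ssh "$EC2ADDRESS" && ssh -i "$KEYFILE" ubuntu@$EC2ADDRESS ./ec2-nvidocker-setup-2.sh
```

The pause before polling matters: right after a script triggers the reboot, the machine may still accept SSH connections for a moment, so polling immediately could report success too early.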
After the third script, your user's membership of the docker group only takes effect in a new login session, so either log out and log back in, or just run:
newgrp docker
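A quick way to confirm that the group change has taken effect is to check the groups of the current shell before trying any docker commands. This is a minimal sketch; the function name in_docker_group is hypothetical.

```shell
# Hypothetical helper: succeed only if the current shell's groups include "docker".
in_docker_group() {
  # Put one group per line so grep -x matches the whole name exactly.
  id -nG | tr ' ' '\n' | grep -qx docker
}

# Usage:
#   in_docker_group && docker info
#   # GPU smoke test, assuming the nvidia/cuda image can still be pulled:
#   nvidia-docker run --rm nvidia/cuda nvidia-smi
```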
Warning: I haven't tested the current version of this Dockerfile, but I think it should work.
The devel-gpu docker image provided by Tensorflow has to be rebuilt with the TF_CUDA_COMPUTE_CAPABILITY environment variable set to 3.0. To do this, build the docker image using the Dockerfile in this repository:
nvidia-docker build -t gngdb/tensorflow:latest-devel-gpu .
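Once the build finishes, the image can be started like any other. The sketch below wraps the run command so that extra docker options can be passed through. The function name run_tf_container is hypothetical, and it assumes the image's default command is what you want to run.

```shell
# Hypothetical helper: start an interactive container from the image built above.
# Any arguments (for example -v or -p options) are passed through to
# "nvidia-docker run" ahead of the image name.
run_tf_container() {
  nvidia-docker run --rm -it "$@" gngdb/tensorflow:latest-devel-gpu
}

# Usage: mount the current directory at /work inside the container
#   run_tf_container -v "$PWD":/work
```

The `--rm` flag removes the container on exit, so anything you want to keep should live in a mounted directory.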
Every other major deep learning library can be pulled from Docker Hub thanks to kaixhin's great collection of builds for each of them. So you can have Caffe, Keras and Theano all running within 10 minutes, side by side on the same machine. And, if you accidentally break an install, you can just start a new container.