Nvidia Docker
Nvidia GPUs
Nvidia has unfortunately come to dominate the HPC (High Performance Computing) market, mainly through the early lead they gained with CUDA and its relative ease of use over alternatives like the more open, less proprietary OpenCL and AMD's perhaps too-little-too-late GPUOpen. AMD is certainly trying, but they just don't seem to have enough resources behind these efforts to keep up with the latest, and their install process is neither easy for the average user (requiring a fairly new kernel) nor forgiving of older GPUs. For myself, by the time I upgrade my GPU, the card I'm upgrading to already seems to have been dropped from support. Due to these troubles I eventually gave in and picked up two 1080 Ti cards from Nvidia. I'm not very happy with CUDA, and certainly not with Nvidia's attitude in the market, all proprietary, etc… but it is the only platform with reasonable support from the upstream developers of many machine learning frameworks.
Compatibility
Even so, compatibility with CUDA and Nvidia GPUs is somewhat limited. At least the driver is backwards compatible, but code built against a given CUDA toolkit is tied to exactly that version of the runtime, which makes supporting multiple toolkit versions at once fairly obnoxious. Luckily, Nvidia provides docker containers of Ubuntu 16.04 with the various runtime and development environments. We previously published a paper in which we used the CUDA 8.0 runtime and cuDNN 5.1; that code does not work with newer toolkits, so we are using these containers.
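As a quick aside, if you want to check which driver you have on the host (and therefore which toolkit versions its containers can support), nvidia-smi can report it directly, or you can read it out of /proc; something like either of these should do:

nvidia-smi --query-gpu=driver_version --format=csv,noheader
cat /proc/driver/nvidia/version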
Nvidia-Docker
In order for the container to have access to the GPU, we need a few things set up that the default docker does not provide. We can get these by using Nvidia-Docker 1.0, or, if you're doing a new install, probably the Nvidia-Docker 2.0. The syntax when you run it is a little different, but ultimately it does the same thing: exposes the GPU to the container. The installation instructions they give seem to be fine, so I won't elaborate on them here.
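Before starting a real working container it is worth confirming that the GPU is actually visible from inside one. Something like the following (using the same CUDA 8.0 image as below) should print the usual nvidia-smi table if the wrapper is doing its job:

sudo nvidia-docker run --rm nvidia/cuda:8.0-cudnn5-devel nvidia-smi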
An Example
I have the 1.0 version installed, so for me, I want to boot up a container to use with our older code. We noted that you may experience issues with other versions of CUDA and/or cuDNN; well, we have now tested that claim, and you will almost certainly face issues, from hanging, to segfaults, to threading errors. This is the command we can use to start a compatible docker container, so long as our Nvidia driver is compatible with the version of the toolkit we need (which it is here). Using these containers means the GPU hardware and the driver version are the only dependencies we need outside of the container, and each toolkit can be kept isolated.
sudo nvidia-docker run -it nvidia/cuda:8.0-cudnn5-devel
Pretty simple, right?
Now if I want another container, this time using the Julia installation from my main system and loading data from my home directory, and with a newer toolkit, I can do this:
sudo nvidia-docker run -it -v /opt/julia-1.1.0-linux-x86_64/:/opt/julia-1.1.0-linux-x86_64/ -v /home/kunji/:/home/kunji/ nvidia/cuda:9.0-cudnn7-devel
Here I have mounted the location of my Julia installation and my home folder via the '-v' option. The path on the left of each ':' is the location on the host system, and the path on the right is where it is mounted in the container.
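Docker also lets you mark a mount read-only by appending ':ro' to it, which can be handy for something like the Julia install that the container should never modify. The same example with a read-only Julia mount would look like this:

sudo nvidia-docker run -it -v /opt/julia-1.1.0-linux-x86_64/:/opt/julia-1.1.0-linux-x86_64/:ro -v /home/kunji/:/home/kunji/ nvidia/cuda:9.0-cudnn7-devel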
All the usual docker commands apply; just use 'nvidia-docker' in place of 'docker' if you are using the '1.0' version, or the '--runtime=nvidia' syntax specified on their site for the '2.0' version, e.g.
sudo docker run --runtime=nvidia -it -v /opt/julia-1.1.0-linux-x86_64/:/opt/julia-1.1.0-linux-x86_64/ -v /home/kunji/:/home/kunji/ nvidia/cuda:9.0-cudnn7-devel
would be the equivalent of the second example above. We can still list the containers with 'docker ps', commit the images, etc…
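For instance (with a made-up container ID and image name, just to illustrate), listing and committing look exactly the same as with plain docker:

sudo docker ps
sudo docker commit 3f4e9c21ab7d my-cuda8-env:latest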