Configuration of RHEL GPU Machine
(Note: This is a copy of /wiki/spaces/CHAL/pages/89194559, hastily created to provide a publicly visible version of that page.)
SoftLayer intends to provide GPU machines with Red Hat Enterprise Linux. The following steps install the NVIDIA CUDA drivers, then verify the installation by running the Google TensorFlow MNIST demo, which is pulled and run as a Docker container.
Install the CUDA NVIDIA Driver
# per http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#redhat-installation
# per http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#pre-installation-actions
sudo yum install pciutils

# 2.1. Verify You Have a CUDA-Capable GPU
lspci | grep -i nvidia

curl -o cuda-repo-rhel7-7-5-local-7.5-18.x86_64.rpm http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-rhel7-7-5-local-7.5-18.x86_64.rpm
sudo rpm -i cuda-repo-rhel7-7-5-local-7.5-18.x86_64.rpm
sudo yum clean all

# per http://www.linuxquestions.org/questions/linux-software-2/installing-dkms-on-rhel-7-0-a-4175510666/:
sudo yum update
sudo yum install gcc
sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
sudo yum install dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r)
sudo yum install cuda
sudo yum install nano

# Sect. 3.2.3 says to update xorg.conf. I appended the lines from /etc/X11/xorg.conf.d/99-nvidia.conf to /etc/X11/xorg.conf

# need to reboot to disable the other Nouveau driver prior to installing the NVIDIA driver
sudo reboot

curl -o NVIDIA-Linux-x86_64-352.63.run http://us.download.nvidia.com/XFree86/Linux-x86_64/352.63/NVIDIA-Linux-x86_64-352.63.run
chmod 774 NVIDIA-Linux-x86_64-352.63.run
sudo ./NVIDIA-Linux-x86_64-352.63.run --kernel-source-path /lib/modules/3.10.0-327.el7.x86_64/source
# When asked "Would you like to register the kernel module sources with DKMS?" I answered, NO.

# Now check the installation
# Per http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#post-installation-actions
cat /proc/driver/nvidia/version
# Looks good!

# Step 6.2.1:
mkdir cuda-samples
/usr/local/cuda-7.5/bin/cuda-install-samples-7.5.sh cuda-samples
cd cuda-samples/NVIDIA_CUDA-7.5_Samples/
make # this is the single slowest step
./bin/x86_64/linux/release/deviceQuery
# Looks good!
./bin/x86_64/linux/release/bandwidthTest
# It passed!

ls /dev/nvidia*
# should see:
# /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
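The final check above is done by eye. As a rough sketch, the same check could be scripted so that a missing device node is reported by name. The function name missing_nvidia_nodes and its optional directory argument are made up for this example; the three node names are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA tooling): print the name of each
# expected NVIDIA device node that is missing from the given directory
# (default: /dev). Prints nothing when all three are present.
missing_nvidia_nodes() {
    dev_dir="${1:-/dev}"
    for name in nvidia0 nvidiactl nvidia-uvm; do
        [ -e "$dev_dir/$name" ] || echo "$name"
    done
}

missing="$(missing_nvidia_nodes)"
if [ -z "$missing" ]; then
    echo "all expected NVIDIA device nodes are present"
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing device nodes:" $missing
fi
```

The directory argument exists only so the function can be tried against a scratch directory on a machine without a GPU.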
Install Docker
sudo yum update -y

# Per https://docs.docker.com/engine/installation/rhel/
sudo tee /etc/yum.repos.d/docker.repo <<-EOF
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF

sudo yum install docker-engine
sudo service docker start
sudo usermod -a -G docker ec2-user
# log out then log in
docker info
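The group change made by usermod only applies to new login sessions, which is why the steps say to log out and back in before running docker info. A minimal sketch of a check for that, assuming a helper named in_group (invented for this example) that compares against the output of id -nG:

```shell
#!/bin/sh
# Hypothetical helper: succeed if group $1 appears in the space-separated
# list $2. With no second argument the list comes from `id -nG`, i.e. the
# groups active in the current session.
in_group() {
    wanted="$1"
    group_list="${2-$(id -nG)}"
    for g in $group_list; do
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}

if in_group docker; then
    echo "docker group is active in this session"
else
    echo "docker group is not active yet: log out, then log in again"
fi
```

The optional second argument is there so the matching logic can be exercised with a fixed list instead of the live session.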
Install and run the Google TensorFlow Demo
# creating this link helps the Google TensorFlow demo find the libraries:
sudo ln -s /usr/lib64 /usr/lib/x86_64-linux-gnu
docker pull b.gcr.io/tensorflow/tensorflow:latest-devel-gpu

# need a script found on GitHub, so install git and clone the required repository
sudo yum install git
git clone https://github.com/tensorflow/tensorflow.git
tensorflow/tensorflow/tools/docker/docker_run_gpu.sh b.gcr.io/tensorflow/tensorflow:latest-gpu python -m tensorflow.models.image.mnist.convolutional
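For the container to use the GPU, the host's NVIDIA device nodes have to be passed through to it. The sketch below shows one way a wrapper could do that, by turning each /dev/nvidia* node into a docker run --device flag. It is an assumption for illustration, not a copy of docker_run_gpu.sh. The function name nvidia_device_flags is invented, and library mounts are left out, so the script only prints the resulting command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print one "--device NODE:NODE" flag for each NVIDIA
# device node found in the given directory (default: /dev).
nvidia_device_flags() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/nvidia*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$node" ] || continue
        printf '%s %s:%s ' --device "$node" "$node"
    done
}

# Print the command line only. The substitution is left unquoted on purpose
# so that each flag becomes a separate word.
echo docker run -it $(nvidia_device_flags) b.gcr.io/tensorflow/tensorflow:latest-gpu \
    python -m tensorflow.models.image.mnist.convolutional
```

On a machine with no NVIDIA device nodes the printed command simply contains no --device flags.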