Configuration of RHEL GPU Machine

(Note: This is a copy of /wiki/spaces/CHAL/pages/89194559, hastily created to provide a publicly visible version of that page.)

SoftLayer intends to provide GPU machines running Red Hat Enterprise Linux (RHEL). The following steps install the NVIDIA CUDA drivers, then verify the installation by running the Google TensorFlow MNIST demo, which is retrieved and run as a Docker container.


Install the NVIDIA CUDA Driver


# per http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#redhat-installation
# per http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#pre-installation-actions
sudo yum install pciutils
# 2.1. Verify You Have a CUDA-Capable GPU
lspci | grep -i nvidia
curl -o cuda-repo-rhel7-7-5-local-7.5-18.x86_64.rpm http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-rhel7-7-5-local-7.5-18.x86_64.rpm
sudo rpm -i cuda-repo-rhel7-7-5-local-7.5-18.x86_64.rpm
sudo yum clean all
# per http://www.linuxquestions.org/questions/linux-software-2/installing-dkms-on-rhel-7-0-a-4175510666/: 
sudo yum update
sudo yum install gcc
sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm;
sudo yum install dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r);
sudo yum install cuda
sudo yum install nano
# Sect. 3.2.3 says to update xorg.conf. I appended the lines from /etc/X11/xorg.conf.d/99-nvidia.conf to /etc/X11/xorg.conf
# Reboot so that the Nouveau driver is disabled before installing the NVIDIA driver
sudo reboot
curl -o NVIDIA-Linux-x86_64-352.63.run http://us.download.nvidia.com/XFree86/Linux-x86_64/352.63/NVIDIA-Linux-x86_64-352.63.run
chmod 774 NVIDIA-Linux-x86_64-352.63.run
sudo ./NVIDIA-Linux-x86_64-352.63.run --kernel-source-path /lib/modules/3.10.0-327.el7.x86_64/source
# When asked "Would you like to register the kernel module sources with DKMS?" I answered, NO.
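
Two things in the steps above are easy to get wrong: the installer expects the Nouveau driver to be unloaded, and the --kernel-source-path value is tied to one specific kernel version. The following is a minimal sketch of two helpers for checking both before running the installer. The function names nouveau_loaded and kernel_source_path are hypothetical and are not part of any NVIDIA tooling.

```shell
# Hypothetical helper: succeed if "nouveau" appears as a module name in
# lsmod-format text read from stdin.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Hypothetical helper: print the kernel source path for a kernel release.
# Defaults to the running kernel (uname -r) when no argument is given.
kernel_source_path() {
    printf '/lib/modules/%s/source\n' "${1:-$(uname -r)}"
}
```

For example, `lsmod | nouveau_loaded && echo "nouveau is still loaded"` warns before the installer is started, and `--kernel-source-path "$(kernel_source_path)"` could replace the hard-coded path, assuming the kernel-devel package for the running kernel is installed.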

# Now check the installation
# Per http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#post-installation-actions
cat /proc/driver/nvidia/version
# Looks good!
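# Step 6.2.1:
mkdir cuda-samples
/usr/local/cuda-7.5/bin/cuda-install-samples-7.5.sh cuda-samples
cd cuda-samples/NVIDIA_CUDA-7.5_Samples/
# Build the samples (the command is missing from this copy; per the guide above it is presumably "make").
# This is the single slowest step.
make
# Run the built samples to verify the installation, as the guide describes
# (presumably deviceQuery and bandwidthTest; the commands are missing from this copy).
# Looks good!
# It passed!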
ls /dev/nvidia*
# should see:
#/dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
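
The device-node check above can also be scripted so that a missing node is reported explicitly. This is a minimal sketch: the function name check_nvidia_nodes is hypothetical, and the optional directory argument exists only so the function can be tried on a machine without a GPU.

```shell
# Hypothetical helper: report which of the three expected NVIDIA device
# nodes exist under a directory (default: /dev).
# Returns 0 if all are present, 1 if any is missing.
check_nvidia_nodes() {
    dev_dir="${1:-/dev}"
    missing=0
    for node in nvidia0 nvidiactl nvidia-uvm; do
        if [ -e "$dev_dir/$node" ]; then
            echo "found   $dev_dir/$node"
        else
            echo "missing $dev_dir/$node"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_nvidia_nodes` with no argument checks /dev, and its exit status can be used to stop a provisioning script early.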

Install Docker


sudo yum update -y
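# Per https://docs.docker.com/engine/installation/rhel/
# The repository definition below is truncated in this copy: the section header and the
# remaining key=value lines were lost. Restore them from the page above before running.
sudo tee /etc/yum.repos.d/docker.repo <<-EOF
name=Docker Repository
EOF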
sudo yum install docker-engine
sudo service docker start
sudo usermod -a -G docker ec2-user
# log out then log in
docker info
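
The "log out then log in" step matters because group membership is only picked up by a new login session; until then, docker info fails with a permission error for a non-root user. The sketch below checks whether the current session already has the group. The function name has_group is hypothetical.

```shell
# Hypothetical helper: succeed if group $1 appears in the space-separated
# group list $2 (the format printed by `id -nG`).
has_group() {
    # $2 is deliberately unquoted so that it splits into one word per group.
    for g in $2; do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `has_group docker "$(id -nG)" || echo "log out and back in first"` can be run before docker info.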



Install and run the Google TensorFlow Demo

# creating this link helps the Google TensorFlow demo find the libraries:
sudo ln -s /usr/lib64 /usr/lib/x86_64-linux-gnu
docker pull b.gcr.io/tensorflow/tensorflow:latest-devel-gpu

# a script from the TensorFlow GitHub repository is needed, so install git and clone the repository
sudo yum install git
git clone https://github.com/tensorflow/tensorflow.git

tensorflow/tensorflow/tools/docker/docker_run_gpu.sh b.gcr.io/tensorflow/tensorflow:latest-gpu

# inside the running container:
python -m tensorflow.models.image.mnist.convolutional
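
A container only sees the GPU if the host's NVIDIA device nodes are passed to docker run, and the helper script used above presumably exists to do that. The sketch below shows how such --device flags could be built from a list of device paths. It is an illustration under that assumption, not a copy of docker_run_gpu.sh, and the function name device_flags is hypothetical.

```shell
# Hypothetical helper: print one "--device PATH:PATH" flag for each
# device path given as an argument, separated by spaces.
device_flags() {
    flags=""
    for dev in "$@"; do
        flags="${flags:+$flags }--device $dev:$dev"
    done
    printf '%s\n' "$flags"
}
```

For example, `device_flags /dev/nvidia*` prints the flags for every NVIDIA node on the host, which could then be placed on a docker run command line.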