feat: Add JAX Docker image and hello world example (#1)
* Add nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04 based pyhf + JAX
  Docker image with pyhf v0.7.0 and JAX v0.3.25.
   - This should eventually be ported to https://github.com/pyhf/cuda-images
     but this will require some reworking of the build procedure there.
* Add noxfile with build and test sessions for the Docker image.
* Add "chtc_hello_gpu" example for HTCondor submission to request GPUs.
   - This is a copy of the "hello_gpu" example from https://github.com/CHTC/templates-GPUs,
     modified for the pyhf image.
     c.f. https://github.com/CHTC/templates-GPUs/tree/a3f7357b633743c96817a92b9f096e2d5db37146/docker/hello_gpu
* Add summary information to README
matthewfeickert authored Dec 2, 2022
1 parent ba49b91 commit 0252615
Showing 6 changed files with 197 additions and 2 deletions.
37 changes: 35 additions & 2 deletions README.md
@@ -1,2 +1,35 @@
# htcondor-examples
Example configurations for using pyhf with HTCondor inspired by the Center for High Throughput Computing examples
# HTCondor examples for pyhf workflows

Example configurations for using pyhf with HTCondor inspired by the [Center for High Throughput Computing examples](https://github.com/CHTC/templates-GPUs).

## CUDA enabled Docker images

These examples assume that you want to use GPU resources for hardware acceleration, and so they focus on the [`pyhf`](https://pyhf.readthedocs.io/) Docker base images built on the [NVIDIA CUDA enabled images](https://github.com/NVIDIA/nvidia-docker) for runtime use with the NVIDIA Container Toolkit.

### Local installation

- Make sure that you have the [`nvidia-container-toolkit`](https://github.com/NVIDIA/nvidia-docker) installed on the host machine
- Check the [list of available tags on Docker Hub](https://hub.docker.com/r/pyhf/cuda/tags?page=1) to find the tag you want
- Use `docker pull` to pull down the image corresponding to the tag

Example:

```
docker pull pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8
```

### Local use

To check that NVIDIA GPUs are being properly detected, run

```
docker run --rm --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8 'nvidia-smi'
```

and check that the [`nvidia-smi`](https://developer.nvidia.com/nvidia-system-management-interface) output lists the expected GPUs.

To run (interactively) using GPUs on the host machine:

```
docker run --rm -ti --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8
```
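The image also bundles a JAX GPU detection script at `/docker/jax_detect_GPU.py` (copied in by `docker/Dockerfile` below). As an additional check you can run it directly; this is a usage sketch, assuming the image entrypoint passes the command through as in the `nvidia-smi` example above:

```
docker run --rm --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8 python /docker/jax_detect_GPU.py
```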
48 changes: 48 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,48 @@
ARG BASE_IMAGE=nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04
FROM ${BASE_IMAGE} as base

SHELL [ "/bin/bash", "-c" ]

WORKDIR /home/data

ARG PYHF_VERSION=0.7.0
ARG PYHF_BACKEND=jax
# Set PATH to pickup virtualenv when it is unpacked
ENV PATH=/usr/local/venv/bin:"${PATH}"
RUN apt-get -qq update && \
    apt-get -qq -y install --no-install-recommends \
        python3 \
        python3-dev \
        python3-venv \
        curl \
        git && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/* && \
    python3 -m venv /usr/local/venv && \
    . /usr/local/venv/bin/activate && \
    python -m pip --no-cache-dir install --upgrade pip setuptools wheel && \
    python -m pip --no-cache-dir install "pyhf[xmlio,contrib]==${PYHF_VERSION}" && \
    python -m pip --no-cache-dir install \
        --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
        "jax[cuda]==0.3.25" && \
    mkdir -p -v /docker && \
    curl -sL https://raw.githubusercontent.com/matthewfeickert/nvidia-gpu-ml-library-test/main/jax_detect_GPU.py \
        -o /docker/jax_detect_GPU.py

# CONTROL FOR MANUAL BUILD
# # N.B. variable CUDA_VERSION already exists in the image
# ARG CUDA_VERSION_MAJOR=cuda11
# # ARG CUDA_VERSION_MAJOR=cuda111
# # ARG CUDNN_VERSION=cudnn805
# ARG CUDNN_VERSION=cudnn82
# ARG JAX_VERSION=0.3.1
# ARG JAXLIB_VERSION=0.1.76

# RUN python -m pip --no-cache-dir install \
# --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
# "jax[${CUDA_VERSION_MAJOR}_${CUDNN_VERSION}]==0.3.25"
# RUN python -m pip --no-cache-dir install \
# --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
# "jax==${JAX_VERSION}" \
# "jaxlib==${JAXLIB_VERSION}+${CUDA_VERSION_MAJOR}.${CUDNN_VERSION}"
18 changes: 18 additions & 0 deletions htcondor_templates/chtc_hello_gpu/chtc_hello_gpu.sh
@@ -0,0 +1,18 @@
#!/bin/bash
echo "Hello CHTC from Job ${1} running on $(hostname)"
echo ""
echo "Trying to see if nvidia/cuda can access the GPU...."
echo ""
nvidia-smi

echo ""
echo "# Check if JAX can detect the GPU:"
echo ""
python /docker/jax_detect_GPU.py

echo ""
echo "# Check that pyhf is working as expected:"
echo ""
pyhf --version
pyhf --help
python -c 'import pyhf; pyhf.set_backend("jax"); print(pyhf.get_backend())'
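The `jax_detect_GPU.py` script invoked above is pulled into the image from https://github.com/matthewfeickert/nvidia-gpu-ml-library-test (see `docker/Dockerfile`); its exact contents are not part of this commit, but a minimal sketch of the same kind of check could look like:

```
# Minimal sketch of a JAX GPU check; not the actual jax_detect_GPU.py
import jax
import jax.numpy as jnp

print(f"JAX version: {jax.__version__}")
print(f"Default backend: {jax.default_backend()}")  # expect "gpu" on a CUDA-enabled host
print(f"Devices: {jax.devices()}")

# Run a small computation to confirm work actually executes on the device
x = jnp.ones((1000, 1000))
(x @ x).block_until_ready()
print("Matrix multiply completed on the default device")
```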
38 changes: 38 additions & 0 deletions htcondor_templates/chtc_hello_gpu/chtc_hello_gpu.sub
@@ -0,0 +1,38 @@
# chtc_hello_gpu.sub
# Submit file to access the GPU via docker

# Must set the universe to Docker
universe = docker
docker_image = pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8

# set the log, error and output files
log = chtc_hello_gpu.log.txt
error = chtc_hello_gpu.err.txt
output = chtc_hello_gpu.out.txt

# set the executable to run
executable = chtc_hello_gpu.sh
arguments = $(Process)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

# We require a machine with a modern version of the CUDA driver
Requirements = (Target.CUDADriverVersion >= 11.6)

# We must request 1 CPU in addition to 1 GPU
request_cpus = 1
request_gpus = 1

# select some memory and disk space
request_memory = 2GB
request_disk = 2GB

# Opt in to using CHTC GPU Lab resources
+WantGPULab = true
# Specify short job type to run more GPUs in parallel
# Can also request "medium" or "long"
+GPUJobLength = "short"

# Tell HTCondor to run 1 instance of our job:
queue 1
3 changes: 3 additions & 0 deletions htcondor_templates/chtc_hello_gpu/submit.sh
@@ -0,0 +1,3 @@
#!/bin/bash

condor_submit chtc_hello_gpu.sub
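After submission, the job can be monitored with standard HTCondor tooling and the results inspected from the files named in `chtc_hello_gpu.sub`; a usage sketch:

```
condor_q
# after the job finishes
cat chtc_hello_gpu.out.txt
```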
55 changes: 55 additions & 0 deletions noxfile.py
@@ -0,0 +1,55 @@
from datetime import datetime
from pathlib import Path

import nox

# Default sessions to run if no session handles are passed
nox.options.sessions = ["build"]


DIR = Path(__file__).parent.resolve()


@nox.session()
def build(session):
"""
Build image
"""
base_image = "nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04"
pyhf_version = "0.7.0"
pyhf_backend = "jax"
cuda_version = base_image.split(":")[-1].split("-devel")[0]

session.run("docker", "pull", base_image, external=True)
session.run(
"docker",
"build",
"--file",
"docker/Dockerfile",
"--build-arg",
f"BASE_IMAGE={base_image}",
"--build-arg",
f"PYHF_VERSION={pyhf_version}",
"--build-arg",
f"PYHF_BACKEND={pyhf_backend}",
"--tag",
f"pyhf/cuda:{pyhf_version}-{pyhf_backend}-cuda-{cuda_version}",
"--tag",
f"pyhf/cuda:latest-{pyhf_backend}",
".",
external=True,
)


@nox.session()
def test(session):
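    """
    Run the built image interactively with all GPUs attached
    """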
    session.run(
        "docker",
        "run",
        "--rm",
        "-ti",
        "--gpus",
        "all",
        "pyhf/cuda:latest-jax",
        external=True,
    )
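
With this `noxfile.py`, the image can be built and smoke-tested from the repository root; a usage sketch, assuming `nox` is installed and the Docker host has GPU support:

```
nox --session build
nox --session test
```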
