feat: Add JAX Docker image and hello world example (#1)
* Add nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04 based pyhf + JAX
  Docker image with pyhf v0.7.0 and JAX v0.3.25.
   - This should eventually be ported to https://github.com/pyhf/cuda-images
     but this will require some reworking of the build procedure there.
* Add noxfile with build and test sessions for the Docker image.
* Add "chtc_hello_gpu" example for HTCondor submission to request GPUs.
   - This is a copy of the "hello_gpu" example from https://github.com/CHTC/templates-GPUs,
     modified for the pyhf image.
     c.f. https://github.com/CHTC/templates-GPUs/tree/a3f7357b633743c96817a92b9f096e2d5db37146/docker/hello_gpu
* Add summary information to README
matthewfeickert authored Dec 2, 2022
1 parent ba49b91 commit 0252615
Showing 6 changed files with 197 additions and 2 deletions.
37 changes: 35 additions & 2 deletions README.md
@@ -1,2 +1,35 @@
# htcondor-examples
Example configurations for using pyhf with HTCondor inspired by the Center for High Throughput Computing examples
# HTCondor examples for pyhf workflows

Example configurations for using pyhf with HTCondor inspired by the [Center for High Throughput Computing examples](https://github.com/CHTC/templates-GPUs).

## CUDA enabled Docker images

These examples assume that you want to use GPU resources for hardware acceleration, and so they focus on the [`pyhf`](https://pyhf.readthedocs.io/) Docker base images built on the [NVIDIA CUDA enabled images](https://github.com/NVIDIA/nvidia-docker) for runtime use with the NVIDIA Container Toolkit.

### Local installation

- Make sure that you have the [`nvidia-container-toolkit`](https://github.com/NVIDIA/nvidia-docker) installed on the host machine
- Check the [list of available tags on Docker Hub](https://hub.docker.com/r/pyhf/cuda/tags?page=1) to find the tag you want
- Use `docker pull` to pull down the image corresponding to the tag

Example:

```
docker pull pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8
```

### Local use

To check that NVIDIA GPUs are being properly detected, run

```
docker run --rm --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8 'nvidia-smi'
```

and check that the [`nvidia-smi`](https://developer.nvidia.com/nvidia-system-management-interface) output lists the expected GPUs.

To run (interactively) using GPUs on the host machine:

```
docker run --rm -ti --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8
```
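The image also bundles a JAX GPU detection script at `/docker/jax_detect_GPU.py` (copied in by `docker/Dockerfile` below). As an additional check you can run it directly; this is a usage sketch, assuming the image entrypoint passes the command through as in the `nvidia-smi` example above:

```
docker run --rm --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8 python /docker/jax_detect_GPU.py
```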
48 changes: 48 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,48 @@
ARG BASE_IMAGE=nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04
FROM ${BASE_IMAGE} as base

SHELL [ "/bin/bash", "-c" ]

WORKDIR /home/data

ARG PYHF_VERSION=0.7.0
ARG PYHF_BACKEND=jax
# Set PATH to pickup virtualenv when it is unpacked
ENV PATH=/usr/local/venv/bin:"${PATH}"
RUN apt-get -qq update && \
    apt-get -qq -y install --no-install-recommends \
        python3 \
        python3-dev \
        python3-venv \
        curl \
        git && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/* && \
    python3 -m venv /usr/local/venv && \
    . /usr/local/venv/bin/activate && \
    python -m pip --no-cache-dir install --upgrade pip setuptools wheel && \
    python -m pip --no-cache-dir install "pyhf[xmlio,contrib]==${PYHF_VERSION}" && \
    python -m pip --no-cache-dir install \
        --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
        "jax[cuda]==0.3.25" && \
    mkdir -p -v /docker && \
    curl -sL https://raw.githubusercontent.com/matthewfeickert/nvidia-gpu-ml-library-test/main/jax_detect_GPU.py \
        -o /docker/jax_detect_GPU.py

# CONTROL FOR MANUAL BUILD
# # N.B. variable CUDA_VERSION already exists in the image
# ARG CUDA_VERSION_MAJOR=cuda11
# # ARG CUDA_VERSION_MAJOR=cuda111
# # ARG CUDNN_VERSION=cudnn805
# ARG CUDNN_VERSION=cudnn82
# ARG JAX_VERSION=0.3.1
# ARG JAXLIB_VERSION=0.1.76

# RUN python -m pip --no-cache-dir install \
# --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
# "jax[${CUDA_VERSION_MAJOR}_${CUDNN_VERSION}]==0.3.25"
# RUN python -m pip --no-cache-dir install \
# --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
# "jax==${JAX_VERSION}" \
# "jaxlib==${JAXLIB_VERSION}+${CUDA_VERSION_MAJOR}.${CUDNN_VERSION}"
18 changes: 18 additions & 0 deletions htcondor_templates/chtc_hello_gpu/chtc_hello_gpu.sh
@@ -0,0 +1,18 @@
#!/bin/bash
echo "Hello CHTC from Job ${1} running on $(hostname)"
echo ""
echo "Trying to see if nvidia/cuda can access the GPU...."
echo ""
nvidia-smi

echo ""
echo "# Check if JAX can detect the GPU:"
echo ""
python /docker/jax_detect_GPU.py

echo ""
echo "# Check that pyhf is working as expected:"
echo ""
pyhf --version
pyhf --help
python -c 'import pyhf; pyhf.set_backend("jax"); print(pyhf.get_backend())'
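The `jax_detect_GPU.py` script invoked above is pulled into the image from https://github.com/matthewfeickert/nvidia-gpu-ml-library-test (see `docker/Dockerfile`); its exact contents are not part of this commit, but a minimal sketch of the same kind of check could look like:

```
# Minimal sketch of a JAX GPU check; not the actual jax_detect_GPU.py
import jax
import jax.numpy as jnp

print(f"JAX version: {jax.__version__}")
print(f"Default backend: {jax.default_backend()}")  # expect "gpu" on a CUDA-enabled host
print(f"Devices: {jax.devices()}")

# Run a small computation to confirm work actually executes on the device
x = jnp.ones((1000, 1000))
(x @ x).block_until_ready()
print("Matrix multiply completed on the default device")
```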
38 changes: 38 additions & 0 deletions htcondor_templates/chtc_hello_gpu/chtc_hello_gpu.sub
@@ -0,0 +1,38 @@
# chtc_hello_gpu.sub
# Submit file to access the GPU via docker

# Must set the universe to Docker
universe = docker
docker_image = pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8

# set the log, error and output files
log = chtc_hello_gpu.log.txt
error = chtc_hello_gpu.err.txt
output = chtc_hello_gpu.out.txt

# set the executable to run
executable = chtc_hello_gpu.sh
arguments = $(Process)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

# We require a machine with a modern version of the CUDA driver
Requirements = (Target.CUDADriverVersion >= 11.6)

# We must request 1 CPU in addition to 1 GPU
request_cpus = 1
request_gpus = 1

# select some memory and disk space
request_memory = 2GB
request_disk = 2GB

# Opt in to using CHTC GPU Lab resources
+WantGPULab = true
# Specify short job type to run more GPUs in parallel
# Can also request "medium" or "long"
+GPUJobLength = "short"

# Tell HTCondor to run 1 instance of our job:
queue 1
3 changes: 3 additions & 0 deletions htcondor_templates/chtc_hello_gpu/submit.sh
@@ -0,0 +1,3 @@
#!/bin/bash

condor_submit chtc_hello_gpu.sub
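After submission, the job can be monitored with standard HTCondor tooling and the results inspected from the files named in `chtc_hello_gpu.sub`; a usage sketch:

```
condor_q
# after the job finishes
cat chtc_hello_gpu.out.txt
```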
55 changes: 55 additions & 0 deletions noxfile.py
@@ -0,0 +1,55 @@
from datetime import datetime
from pathlib import Path

import nox

# Default sessions to run if no session handles are passed
nox.options.sessions = ["build"]


DIR = Path(__file__).parent.resolve()


@nox.session()
def build(session):
"""
Build image
"""
base_image = "nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04"
pyhf_version = "0.7.0"
pyhf_backend = "jax"
cuda_version = base_image.split(":")[-1].split("-devel")[0]

session.run("docker", "pull", base_image, external=True)
session.run(
"docker",
"build",
"--file",
"docker/Dockerfile",
"--build-arg",
f"BASE_IMAGE={base_image}",
"--build-arg",
f"PYHF_VERSION={pyhf_version}",
"--build-arg",
f"PYHF_BACKEND={pyhf_backend}",
"--tag",
f"pyhf/cuda:{pyhf_version}-{pyhf_backend}-cuda-{cuda_version}",
"--tag",
f"pyhf/cuda:latest-{pyhf_backend}",
".",
external=True,
)


@nox.session()
def test(session):
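    """
    Run the built image interactively with all GPUs attached
    """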
    session.run(
        "docker",
        "run",
        "--rm",
        "-ti",
        "--gpus",
        "all",
        "pyhf/cuda:latest-jax",
        external=True,
    )
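
With this `noxfile.py`, the image can be built and smoke-tested from the repository root; a usage sketch, assuming `nox` is installed and the Docker host has GPU support:

```
nox --session build
nox --session test
```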
