feat: Add JAX Docker image and hello world example (#1)
* Add nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04 based pyhf + JAX Docker image with pyhf v0.7.0 and JAX v0.3.25.
  - This should eventually be ported to https://github.com/pyhf/cuda-images, but this will require some reworking of the build procedure there.
* Add noxfile with build and test sessions for the Docker image.
* Add "chtc_hello_gpu" example for HTCondor submission to request GPUs.
  - This is a copy of the "hello_gpu" example from https://github.com/CHTC/templates-GPUs, modified for the pyhf image.
    c.f. https://github.com/CHTC/templates-GPUs/tree/a3f7357b633743c96817a92b9f096e2d5db37146/docker/hello_gpu
* Add summary information to README.
1 parent ba49b91, commit 0252615
Showing 6 changed files with 197 additions and 2 deletions.
README.md

@@ -1,2 +1,35 @@
-# htcondor-examples
-Example configurations for using pyhf with HTCondor inspired by the Center for High Throughput Computing examples
# HTCondor examples for pyhf workflows

Example configurations for using pyhf with HTCondor inspired by the [Center for High Throughput Computing examples](https://github.com/CHTC/templates-GPUs).

## CUDA enabled Docker images

These examples assume that you want to use GPU resources to take advantage of hardware acceleration and so focus on using the [`pyhf`](https://pyhf.readthedocs.io/) Docker base images built on the [NVIDIA CUDA enabled images](https://github.com/NVIDIA/nvidia-docker) for runtime use with the NVIDIA Container Toolkit.

### Local installation

- Make sure that you have the [`nvidia-container-toolkit`](https://github.com/NVIDIA/nvidia-docker) installed on the host machine
- Check the [list of available tags on Docker Hub](https://hub.docker.com/r/pyhf/cuda/tags?page=1) to find the tag you want
- Use `docker pull` to pull down the image corresponding to the tag

Example:

```
docker pull pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8
```

### Local use

To check that NVIDIA GPUs are being properly detected, run

```
docker run --rm --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8 'nvidia-smi'
```

and check if the [`nvidia-smi`](https://developer.nvidia.com/nvidia-system-management-interface) output appears correctly.

To run (interactively) using GPUs on the host machine:

```
docker run --rm -ti --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8
```
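
Beyond `nvidia-smi`, it is also worth confirming that JAX inside the container can see the GPU. The image downloads a small `jax_detect_GPU.py` helper to `/docker/` (see the Dockerfile below), so a quick check, sketched here under the assumption that the image defines no entrypoint that would intercept the command, is:

```
# Sketch (not part of the committed README): run the JAX GPU detection helper
# that the Dockerfile places at /docker/jax_detect_GPU.py
docker run --rm --gpus all pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8 \
    python /docker/jax_detect_GPU.py
```

This is the same check the HTCondor job script in this commit runs on the execute node.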
docker/Dockerfile

@@ -0,0 +1,48 @@
ARG BASE_IMAGE=nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04
FROM ${BASE_IMAGE} as base

SHELL [ "/bin/bash", "-c" ]

WORKDIR /home/data

ARG PYHF_VERSION=0.7.0
ARG PYHF_BACKEND=jax
# Set PATH to pick up virtualenv when it is unpacked
ENV PATH=/usr/local/venv/bin:"${PATH}"
RUN apt-get -qq update && \
    apt-get -qq -y install --no-install-recommends \
      python3 \
      python3-dev \
      python3-venv \
      curl \
      git && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/* && \
    python3 -m venv /usr/local/venv && \
    . /usr/local/venv/bin/activate && \
    python -m pip --no-cache-dir install --upgrade pip setuptools wheel && \
    python -m pip --no-cache-dir install "pyhf[xmlio,contrib]==${PYHF_VERSION}" && \
    python -m pip --no-cache-dir install \
      --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
      "jax[cuda]==0.3.25" && \
    mkdir -p -v /docker && \
    curl -sL https://raw.githubusercontent.com/matthewfeickert/nvidia-gpu-ml-library-test/main/jax_detect_GPU.py \
      -o /docker/jax_detect_GPU.py

# CONTROL FOR MANUAL BUILD
# # N.B. variable CUDA_VERSION already exists in the image
# ARG CUDA_VERSION_MAJOR=cuda11
# # ARG CUDA_VERSION_MAJOR=cuda111
# # ARG CUDNN_VERSION=cudnn805
# ARG CUDNN_VERSION=cudnn82
# ARG JAX_VERSION=0.3.1
# ARG JAXLIB_VERSION=0.1.76

# RUN python -m pip --no-cache-dir install \
#     --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
#     "jax[${CUDA_VERSION_MAJOR}_${CUDNN_VERSION}]==0.3.25"
# RUN python -m pip --no-cache-dir install \
#     --find-links https://storage.googleapis.com/jax-releases/jax_cuda_releases.html \
#     "jax==${JAX_VERSION}" \
#     "jaxlib==${JAXLIB_VERSION}+${CUDA_VERSION_MAJOR}.${CUDNN_VERSION}"
chtc_hello_gpu.sh

@@ -0,0 +1,18 @@
#!/bin/bash
echo "Hello CHTC from Job ${1} running on $(hostname)"
echo ""
echo "Trying to see if nvidia/cuda can access the GPU...."
echo ""
nvidia-smi

echo ""
echo "# Check if JAX can detect the GPU:"
echo ""
python /docker/jax_detect_GPU.py

echo ""
echo "# Check that pyhf is working as expected:"
echo ""
pyhf --version
pyhf --help
python -c 'import pyhf; pyhf.set_backend("jax"); print(pyhf.get_backend())'
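
The `jax_detect_GPU.py` helper is downloaded by the Dockerfile rather than committed here, so its exact contents are not part of this diff. A minimal stand-in, assuming it does little more than report which backend and devices JAX registered, would be:

```
# Hypothetical stand-in for jax_detect_GPU.py: print the selected JAX backend
# and the devices it can see (expect a GPU device on a CUDA-enabled node)
python -c 'import jax; print(jax.default_backend()); print(jax.devices())'
```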
chtc_hello_gpu.sub

@@ -0,0 +1,38 @@
# chtc_hello_gpu.sub
# Submit file to access the GPU via docker

# Must set the universe to Docker
universe = docker
docker_image = pyhf/cuda:0.7.0-jax-cuda-11.6.0-cudnn8

# set the log, error and output files
log = chtc_hello_gpu.log.txt
error = chtc_hello_gpu.err.txt
output = chtc_hello_gpu.out.txt

# set the executable to run
executable = chtc_hello_gpu.sh
arguments = $(Process)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

# We require a machine with a modern version of the CUDA driver
Requirements = (Target.CUDADriverVersion >= 11.6)

# We must request 1 CPU in addition to 1 GPU
request_cpus = 1
request_gpus = 1

# select some memory and disk space
request_memory = 2GB
request_disk = 2GB

# Opt in to using CHTC GPU Lab resources
+WantGPULab = true
# Specify short job type to run more GPUs in parallel
# Can also request "medium" or "long"
+GPUJobLength = "short"

# Tell HTCondor to run 1 instance of our job:
queue 1
@@ -0,0 +1,3 @@
#!/bin/bash

condor_submit chtc_hello_gpu.sub
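
After submitting, the job can be followed with the standard HTCondor tools; a sketch, using the file names defined in `chtc_hello_gpu.sub` above:

```
# Watch the job in the queue after submission
condor_q

# Once it completes, inspect the transferred output and error logs
cat chtc_hello_gpu.out.txt
cat chtc_hello_gpu.err.txt
```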
noxfile.py

@@ -0,0 +1,55 @@
from datetime import datetime
from pathlib import Path

import nox

# Default sessions to run if no session handles are passed
nox.options.sessions = ["build"]


DIR = Path(__file__).parent.resolve()


@nox.session()
def build(session):
    """
    Build image
    """
    base_image = "nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04"
    pyhf_version = "0.7.0"
    pyhf_backend = "jax"
    cuda_version = base_image.split(":")[-1].split("-devel")[0]

    session.run("docker", "pull", base_image, external=True)
    session.run(
        "docker",
        "build",
        "--file",
        "docker/Dockerfile",
        "--build-arg",
        f"BASE_IMAGE={base_image}",
        "--build-arg",
        f"PYHF_VERSION={pyhf_version}",
        "--build-arg",
        f"PYHF_BACKEND={pyhf_backend}",
        "--tag",
        f"pyhf/cuda:{pyhf_version}-{pyhf_backend}-cuda-{cuda_version}",
        "--tag",
        f"pyhf/cuda:latest-{pyhf_backend}",
        ".",
        external=True,
    )


@nox.session()
def test(session):
    session.run(
        "docker",
        "run",
        "--rm",
        "-ti",
        "--gpus",
        "all",
        "pyhf/cuda:latest-jax",
        external=True,
    )
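
A typical local workflow with this noxfile, assuming Docker and the nvidia-container-toolkit are available on the host, is:

```
# Build the image (the "build" session is also the default when no session is named)
nox --session build

# Start the freshly built image interactively with GPU access to poke around
nox --session test
```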