[HuggingFace][Neuronx] Training - Optimum Neuron 0.0.25 - Neuron sdk 2.20.0 - Transformers to 4.43.2 (#4365)

* feat(neuronx): add 0.0.25 training DLC

* fix(neuronx): add mlflow vulnerabilities to allow-list

These vulnerabilities were already added for the pytorch training DLCs.

* REVERTME: activate neuronx train CI build

* fix(neuronx): apparmor and gevent vulnerabilities

* fix: add werkzeug exception (Windows vuln)

* fix: add another werkzeug exception

* fix: pin sagemaker version to stop importing errors

* fix: try to remove tensorboard 2.6 error

* fix: add mlflow and gunicorn exceptions

* fix: yet another mlflow vuln

* Revert "REVERTME: activate neuronx train CI build"

This reverts commit 7cff21f.

---------

Co-authored-by: Malav Shastri <[email protected]>
dacorvo and malav-shastri authored Nov 18, 2024
1 parent f3f70fa commit 36a6aab
Showing 4 changed files with 1,991 additions and 2 deletions.
huggingface/pytorch/training/buildspec-neuronx.yml (2 additions, 2 deletions)
@@ -34,9 +34,9 @@ images:
device_type: &DEVICE_TYPE neuronx
python_version: &DOCKER_PYTHON_VERSION py3
tag_python_version: &TAG_PYTHON_VERSION py310
- neuron_sdk_version: &NEURON_SDK_VERSION sdk2.19.1
+ neuron_sdk_version: &NEURON_SDK_VERSION sdk2.20.0
os_version: &OS_VERSION ubuntu20.04
- transformers_version: &TRANSFORMERS_VERSION 4.41.1
+ transformers_version: &TRANSFORMERS_VERSION 4.43.2
datasets_version: &DATASETS_VERSION 2.18.0
tag: !join [ *VERSION, '-', 'transformers', *TRANSFORMERS_VERSION, '-', *DEVICE_TYPE, '-', *TAG_PYTHON_VERSION,"-", *NEURON_SDK_VERSION, '-', *OS_VERSION ]
docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /, *NEURON_SDK_VERSION, /Dockerfile., *DEVICE_TYPE ]
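The `tag` and `docker_file` entries use the repository's custom `!join` YAML tag to build strings from the anchors above. The sketch below models that concatenation in Python, assuming `!join` joins its list items with no separator (the `-` and `/` separators are listed explicitly, which suggests this). `VERSION` and `SHORT_VERSION` are defined outside this hunk, so the values used here are hypothetical placeholders, and `join` is an illustrative helper, not the build tooling's own code.

```python
def join(parts):
    """Concatenate list items the way the buildspec's !join tag is assumed to."""
    return "".join(str(p) for p in parts)

# Values visible in the buildspec hunk.
DEVICE_TYPE = "neuronx"
DOCKER_PYTHON_VERSION = "py3"
TAG_PYTHON_VERSION = "py310"
NEURON_SDK_VERSION = "sdk2.20.0"
OS_VERSION = "ubuntu20.04"
TRANSFORMERS_VERSION = "4.43.2"

# Hypothetical placeholders: the real anchors are defined above line 34.
VERSION = "<VERSION>"
SHORT_VERSION = "<SHORT_VERSION>"

tag = join([VERSION, "-", "transformers", TRANSFORMERS_VERSION, "-", DEVICE_TYPE, "-",
            TAG_PYTHON_VERSION, "-", NEURON_SDK_VERSION, "-", OS_VERSION])
docker_file = join(["docker/", SHORT_VERSION, "/", DOCKER_PYTHON_VERSION, "/",
                    NEURON_SDK_VERSION, "/Dockerfile.", DEVICE_TYPE])

print(tag)          # <VERSION>-transformers4.43.2-neuronx-py310-sdk2.20.0-ubuntu20.04
print(docker_file)  # docker/<SHORT_VERSION>/py3/sdk2.20.0/Dockerfile.neuronx
```

Note that no separator is listed between `'transformers'` and the version anchor, so the tag contains `transformers4.43.2` rather than `transformers-4.43.2`.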
@@ -0,0 +1,57 @@
# https://github.com/aws/deep-learning-containers/blob/master/available_images.md
# refer to the above page to pull latest PyTorch Neuronx image

# docker image region us-west-2
FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.1.2-neuronx-py310-sdk2.20.0-ubuntu20.04


LABEL maintainer="Amazon AI"
LABEL dlc_major_version="1"

# version args
ARG OPTIMUM_NEURON_VERSION=0.0.25
ARG TRANSFORMERS_VERSION
ARG DATASETS_VERSION
ARG GEVENT_VERSION=24.10.3
ARG GAUTH_VERSION=1.35.0
ARG PYTHON=python3

# install Hugging Face libraries and their dependencies
RUN pip install --no-cache-dir \
"sagemaker==2.232.2" \
evaluate \
transformers[sklearn,sentencepiece,audio,vision]==${TRANSFORMERS_VERSION} \
datasets==${DATASETS_VERSION} \
optimum-neuron==${OPTIMUM_NEURON_VERSION} \
peft \
google-auth==${GAUTH_VERSION} \
gevent==${GEVENT_VERSION}

# Pin numpy to version required by neuronx-cc
# Update Pillow and urllib3 versions to fix high and critical vulnerabilities
RUN pip install -U \
"numpy>=1.24.3,<=1.25.2" \
"numba==0.58.1" \
"Pillow==10.3.0" \
"requests<2.32.0" \
"urllib3>=1.26.17,<1.27"

RUN apt-get update \
&& apt install -y --no-install-recommends \
git-lfs \
libgssapi-krb5-2 \
libexpat1 \
expat \
libarchive13 \
&& apt-get upgrade -y apparmor \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

RUN HOME_DIR=/root \
&& curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \
&& unzip ${HOME_DIR}/oss_compliance.zip -d ${HOME_DIR}/ \
&& cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance \
&& chmod +x /usr/local/bin/testOSSCompliance \
&& chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh \
&& ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
&& rm -rf ${HOME_DIR}/oss_compliance*
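In the Dockerfile's `ARG` block, `OPTIMUM_NEURON_VERSION`, `GEVENT_VERSION`, `GAUTH_VERSION` and `PYTHON` have defaults, but `TRANSFORMERS_VERSION` and `DATASETS_VERSION` do not, so they have to be passed as build arguments (presumably from the buildspec's `transformers_version` and `datasets_version` values). Docker expands an unset `ARG` to an empty string, which would turn the pin into `transformers[...]==` and make `pip install` fail. The sketch below models that resolution rule; `resolve_args` is a hypothetical helper, not part of the DLC build tooling.

```python
# ARG declarations from the Dockerfile: name -> default (None means no default).
DOCKERFILE_ARGS = {
    "OPTIMUM_NEURON_VERSION": "0.0.25",
    "TRANSFORMERS_VERSION": None,
    "DATASETS_VERSION": None,
    "GEVENT_VERSION": "24.10.3",
    "GAUTH_VERSION": "1.35.0",
    "PYTHON": "python3",
}

def resolve_args(declared, build_args):
    """Hypothetical helper: apply --build-arg overrides on top of ARG defaults.

    Returns (resolved, missing). As in Docker, an ARG with neither a default
    nor an override resolves to an empty string; such names go into `missing`.
    Build args that are not declared as ARGs are ignored.
    """
    resolved, missing = {}, []
    for name, default in declared.items():
        value = build_args.get(name, default)
        if value is None:
            missing.append(name)
            value = ""
        resolved[name] = value
    return resolved, missing

# Values taken from the buildspec in this commit.
resolved, missing = resolve_args(
    DOCKERFILE_ARGS,
    {"TRANSFORMERS_VERSION": "4.43.2", "DATASETS_VERSION": "2.18.0"},
)
print(resolved["TRANSFORMERS_VERSION"], missing)  # 4.43.2 []
```

Calling `resolve_args(DOCKERFILE_ARGS, {})` instead reports both version arguments as missing, which is the case that would break the `pip install` step.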
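After a build, one way to confirm that the exact `==` pins from this commit took effect is to read installed distribution metadata inside the container. The sketch below uses the standard-library `importlib.metadata`; `check_pins` is a hypothetical helper, and `EXPECTED` simply restates the versions from the buildspec and the Dockerfile's `ARG` defaults and `pip install` line.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins from this commit (buildspec values, ARG defaults, pip install line).
EXPECTED = {
    "transformers": "4.43.2",
    "datasets": "2.18.0",
    "optimum-neuron": "0.0.25",
    "sagemaker": "2.232.2",
    "google-auth": "1.35.0",
    "gevent": "24.10.3",
}

def check_pins(expected, lookup=version):
    """Return {name: (expected, installed_or_None)} for every pin that does not match."""
    mismatches = {}
    for name, want in expected.items():
        try:
            got = lookup(name)
        except PackageNotFoundError:
            got = None  # distribution not installed at all
        if got != want:
            mismatches[name] = (want, got)
    return mismatches

if __name__ == "__main__":
    problems = check_pins(EXPECTED)
    for name, (want, got) in problems.items():
        print(f"{name}: expected {want}, found {got}")
    print("all pins match" if not problems else f"{len(problems)} mismatch(es)")
```

The `lookup` parameter defaults to `importlib.metadata.version` but can be swapped for a stub, which keeps the check testable outside the image.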
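The second `pip install -U` step in the Dockerfile uses range specifiers (for example `numpy>=1.24.3,<=1.25.2` to satisfy neuronx-cc, and `urllib3>=1.26.17,<1.27`). The simplified sketch below shows how such a comma-separated specifier is evaluated: every clause must hold. It is a hypothetical `satisfies` helper that only understands purely numeric release versions and the operators used in this Dockerfile. pip itself applies full PEP 440 rules.

```python
import operator

# Operators used in the pins; two-character operators first so ">=" wins over ">".
_OPS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
        (">", operator.gt), ("<", operator.lt)]

def _parse(v):
    """Turn '1.24.3' into an int tuple, zero-padded so '1.27' compares like '1.27.0'."""
    parts = [int(p) for p in v.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))

def satisfies(installed, spec):
    """Simplified check of a specifier such as '>=1.24.3,<=1.25.2' (all clauses must hold)."""
    for clause in spec.split(","):
        clause = clause.strip()
        for sym, op in _OPS:
            if clause.startswith(sym):
                if not op(_parse(installed), _parse(clause[len(sym):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True

# Range and exact pins from the Dockerfile's second pip install step.
PINS = {
    "numpy": ">=1.24.3,<=1.25.2",
    "numba": "==0.58.1",
    "Pillow": "==10.3.0",
    "requests": "<2.32.0",
    "urllib3": ">=1.26.17,<1.27",
}

print(satisfies("1.25.2", PINS["numpy"]))  # True
print(satisfies("1.26.0", PINS["numpy"]))  # False
```

For real checks, `packaging.specifiers.SpecifierSet` (the library pip builds on) handles pre-releases, local versions and wildcards, e.g. `"1.25.2" in SpecifierSet(">=1.24.3,<=1.25.2")`.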