Neuron SDK 2.18.0 updates (#71)

Co-authored-by: Nathan Mailhot <[email protected]>
aws-rxgupta and natemail-aws authored Apr 1, 2024
1 parent 604b7c3 commit 0815eb2
Showing 47 changed files with 1,292 additions and 6,029 deletions.
32 changes: 32 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,32 @@

*Description:*

*Issue #, sim, or t.corp if available:*

* Link to RTD for my changes: https://github.com/aws-neuron/aws-neuron-samples-staging/YOUR_BRANCH_NAME/

* Submitter Checklist
  * Tested on: Neuron SDK <version>, release_version, Instance_type.
  * I've completely filled out the form above!
  * **(MANDATORY)** PR needs test run output:
    * I have provided the output with expected metrics in a metrics.json file (a minimal sketch follows this checklist)
    * I have attached metrics.json in the PR
    * I have attached golden_step_loss.txt
    * I have added a screenshot of the plotted loss curve
  * (If applicable) I've automated a test to safeguard my changes from regression.
  * (If applicable) I've posted test collateral to prove my change was effective and not harmful.
  * (If applicable) I've added someone from QA to the list of reviewers. Do this if you didn't make an automated test or if you feel it's appropriate for another reason.
  * (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the pre-approved Amazon license list.
* Reviewer Checklist
  * I've verified the changes render correctly on RTD (link above)
  * I've ensured the submitter completed the form
  * (If appropriate) I've verified the metrics.json file provided by the submitter
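
For illustration, a minimal sketch of a metrics.json a submitter might attach; the field names and values are hypothetical placeholders, not a schema required by the test tooling:

```bash
# Illustrative only -- field names and values are placeholders, not a required schema.
cat > metrics.json <<'EOF'
{
  "neuron_sdk_version": "2.18.0",
  "instance_type": "trn1.32xlarge",
  "metrics": {
    "throughput_seq_per_sec": 100.0,
    "final_step_loss": 2.35
  }
}
EOF
```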




60 changes: 44 additions & 16 deletions .github/workflows/aggregate-prs.yml
@@ -1,36 +1,64 @@
-name: Aggregate PRs into Staging Branch for Automated Testing
+name: Merge PR into Dynamic Branch on Label
 
-on:
-  pull_request:
-    types: [opened, reopened, synchronize, closed]
+on:
+  pull_request_target:
+    types: [labeled, synchronize]
+    branches:
+      - master
 
 jobs:
-  merge-to-target:
-    if: github.event.pull_request.state == 'open' && !github.event.pull_request.draft
+  merge-to-dynamic-branch:
+    if: github.event.label.name != 'do-not-merge' #Excludes those labeled with do-not-merge
     runs-on: ubuntu-latest
     steps:
       - name: Checkout Repository
         uses: actions/checkout@v2
         with:
           ref: ${{ github.event.pull_request.head.ref }}
           fetch-depth: 0
 
       - name: Configure Git
         run: |
           git config user.name "GitHub Actions"
           git config user.email "[email protected]"
 
-      - name: Merge PR into Testing Branch
+      - name: Check PR Labels and Merge for New Commit Events
+        if: github.event.action == 'synchronize'
         run: |
-          git fetch origin
-          git checkout -b testing origin/testing
-          git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
-          git commit -m "Merged PR #${{ github.event.pull_request.number }}"
-          git push origin testing
+          LABELS_JSON=$(gh pr view ${{ github.event.pull_request.number }} --json labels)
+          LABELS=$(echo "$LABELS_JSON" | jq -r '.labels[].name')
+          for LABEL_BRANCH in $LABELS; do
+            # Check if the branch exists
+            if git show-ref --verify --quiet refs/heads/$LABEL_BRANCH; then
+              echo "Branch $LABEL_BRANCH already exists."
+            else
+              echo "Branch $LABEL_BRANCH does not exist, creating it."
+              git branch $LABEL_BRANCH origin/master
+            fi
+            git checkout $LABEL_BRANCH
+            # Merge PR changes into dynamic branch
+            git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
+            git commit -m "Merged PR #${{ github.event.pull_request.number }} due to new commits on labeled PR"
+            git push origin $LABEL_BRANCH
+          done
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
 
-      - name: Cleanup if PR Closed
-        if: github.event.action == 'closed'
+      - name: Merge for Labeled Event
+        if: github.event.action == 'labeled'
         run: |
-          # Add commands to reset or clean up target branch
-          # Example: git reset --hard origin/master
+          LABEL_BRANCH=${{ github.event.label.name }}
+          # Check if the branch exists
+          if git show-ref --verify --quiet refs/heads/$LABEL_BRANCH; then
+            echo "Branch $LABEL_BRANCH already exists."
+          else
+            echo "Branch $LABEL_BRANCH does not exist, creating it."
+            git branch $LABEL_BRANCH origin/master
+          fi
+          git checkout $LABEL_BRANCH
+          # Merge PR changes into dynamic branch
+          git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
+          git commit -m "Merged PR #${{ github.event.pull_request.number }} due to label '$LABEL_BRANCH'"
+          git push origin $LABEL_BRANCH
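
In practice, a maintainer routes a PR into one of these dynamic test branches by labeling it with the target branch name. A hedged sketch using the GitHub CLI (the PR number and label name below are placeholders, not values from this commit):

```bash
# Label the PR; the workflow above then creates a branch with the same
# name (if it does not exist) and merges the PR's head commit into it.
# "123" and "neuron-sdk-2.18-testing" are placeholder values.
gh pr edit 123 --add-label "neuron-sdk-2.18-testing"

# Later pushes to the PR (synchronize events) are re-merged into every
# branch named after one of the PR's labels.
```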
8 changes: 4 additions & 4 deletions torch-neuronx/README.md
@@ -20,10 +20,10 @@ The following samples are available for training:
| [hf_bert_jp](training/hf_bert_jp) | Fine-tuning & Deployment Hugging Face BERT Japanese model | DataParallel |
| [hf_sentiment_analysis](training/hf_sentiment_analysis) | Examples of training Hugging Face bert-base-cased model for a text classification task with Trn1 Single Neuron and Distributed Training | DataParallel |
| [customop_mlp](training/customop_mlp) | Examples of training a multilayer perceptron model with a custom Relu operator on a single Trn1 | DataParallel |
-| [tp_dp_gpt_neox_20b_hf_pretrain](training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training GPT-NEOX 20B model using neuronx-distributed | Tensor Parallel & DataParallel |
-| [tp_dp_gpt_neox_6.9b_hf_pretrain](training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training GPT-NEOX 6.9B model using neuronx-distributed | Tensor Parallel & DataParallel |
-| [tp_zero1_llama2_7b_hf_pretrain](training/llama2/tp_zero1_llama2_7b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training Llama-2 7B model using neuronx-distributed | Tensor Parallel |
-| [tp_pp_llama2_70b_hf_pretrain](training/llama2/tp_pp_llama2_70b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training Llama-2 70B model using neuronx-distributed | Tensor Parallel & Pipeline Parallel |
+| [tp_dp_gpt_neox_20b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain) | Training GPT-NEOX 20B model using neuronx-distributed | Tensor Parallel & DataParallel |
+| [tp_dp_gpt_neox_6.9b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain) | Training GPT-NEOX 6.9B model using neuronx-distributed | Tensor Parallel & DataParallel |
+| [tp_zero1_llama2_7b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/llama2/tp_zero1_llama2_7b_hf_pretrain) | Training Llama-2 7B model using neuronx-distributed | Tensor Parallel |
+| [tp_pp_llama2_70b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/llama2/tp_pp_llama2_hf_pretrain) | Training Llama-2 70B model using neuronx-distributed | Tensor Parallel & Pipeline Parallel |

## Inference

@@ -54,7 +54,7 @@
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n",
 "# torchvision version pinned to avoid pulling in torch 2.0\n",
-"!pip install -U transformers torchvision==0.14.1 opencv-python Pillow"
+"!pip install -U transformers opencv-python Pillow"
 ]
 },
{
@@ -54,7 +54,7 @@
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n",
 "# torchvision version pinned to avoid pulling in torch 2.0\n",
-"!pip install -U transformers torchvision==0.14.1 opencv-python Pillow"
+"!pip install -U transformers opencv-python Pillow"
 ]
 },
{
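Both notebook hunks above drop the torchvision==0.14.1 pin from the install cell. An optional sanity check, not part of the notebooks themselves, to confirm the upgrade did not pull in a torch build other than the one the installed Neuron packages target:

```bash
# Optional check after running the install cell: list the torch- and
# neuron-related packages so an unexpected torch upgrade is easy to spot.
pip list | grep -iE "torch|neuron"
python -c "import torch; print('torch', torch.__version__)"
```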
@@ -41,6 +41,7 @@
 "- `opencv-python-headless`\n",
 "- `imageio`\n",
 "- `scipy`\n",
+"- `accelerate`\n",
 "Furthermore, it requires the `ffmpeg` video-audio converter which is used to extract audio from the input videos.\n",
 "\n",
 "`torch-neuronx` and `neuronx-cc` should be installed when you configure your environment following the Inf2 setup guide. The remaining dependencies can be installed below:"
@@ -53,7 +54,7 @@
 "outputs": [],
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n",
-"!pip install transformers==4.30.2 opencv-python-headless==4.8.0.74 imageio scipy opencv-python==4.8.0.74\n",
+"!pip install transformers==4.30.2 opencv-python-headless==4.8.0.74 imageio scipy accelerate opencv-python==4.8.0.74\n",
 "\n",
 "!wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz\n",
 "!tar xvf ffmpeg-git-amd64-static.tar.xz\n",
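The notebook above downloads a static ffmpeg build and uses it to pull the audio track out of the input videos. A minimal sketch of that extraction step (file names and sample rate are placeholders; the notebook's own cells define the actual values):

```bash
# Extract the audio track from a video into a mono 16 kHz WAV file.
# input.mp4 and audio.wav are placeholder names; the notebook may also
# invoke the extracted static ffmpeg binary by its full path.
ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 audio.wav
```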