Neuron SDK 2.18.0 updates #71

Merged: 2 commits, Apr 1, 2024
32 changes: 32 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,32 @@

*Description:*

*Issue #, sim, or t.corp if available:*

* Link to RTD for my changes: https://github.com/aws-neuron/aws-neuron-samples-staging/YOUR_BRANCH_NAME/

* Submitter Checklist
* Tested on : Neuron SDK <version>, release_version, Instance_type.
* I've completely filled out the form above!
**(MANDATORY) PR needs test run output**

* I have provided the output with expected metrics in a metrics.json file

* I have attached metrics.json in the PR

* I have attached golden_step_loss.txt

* I have added a screenshot of the plotted loss curve

* (If applicable) I've automated a test to safeguard my changes from regression.
* (If applicable) I've posted test collateral to prove my change was effective and not harmful.
* (If applicable) I've added someone from QA to the list of reviewers. Do this if you didn't make an automated test or feel it's appropriate for another reason.
* (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the pre-approved Amazon license list.
* Reviewer Checklist
* I've verified the changes render correctly on RTD (link above)
* I've ensured the submitter completed the form
* (If appropriate) I've verified the metrics.json file provided by the submitter




60 changes: 44 additions & 16 deletions .github/workflows/aggregate-prs.yml
@@ -1,36 +1,64 @@
-name: Aggregate PRs into Staging Branch for Automated Testing
+name: Merge PR into Dynamic Branch on Label

-on:
-  pull_request:
-    types: [opened, reopened, synchronize, closed]
+on:
+  pull_request_target:
+    types: [labeled, synchronize]
+    branches:
+      - master

 jobs:
-  merge-to-target:
-    if: github.event.pull_request.state == 'open' && !github.event.pull_request.draft
+  merge-to-dynamic-branch:
+    if: github.event.label.name != 'do-not-merge' #Excludes those labeled with do-not-merge
     runs-on: ubuntu-latest
     steps:
       - name: Checkout Repository
         uses: actions/checkout@v2
         with:
           ref: ${{ github.event.pull_request.head.ref }}
           fetch-depth: 0

       - name: Configure Git
         run: |
           git config user.name "GitHub Actions"
           git config user.email "[email protected]"

-      - name: Merge PR into Testing Branch
+      - name: Check PR Labels and Merge for New Commit Events
+        if: github.event.action == 'synchronize'
         run: |
-          git fetch origin
-          git checkout -b testing origin/testing
-          git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
-          git commit -m "Merged PR #${{ github.event.pull_request.number }}"
-          git push origin testing
+          LABELS_JSON=$(gh pr view ${{ github.event.pull_request.number }} --json labels)
+          LABELS=$(echo "$LABELS_JSON" | jq -r '.labels[].name')
+          for LABEL_BRANCH in $LABELS; do
+            # Check if the branch exists
+            if git show-ref --verify --quiet refs/heads/$LABEL_BRANCH; then
+              echo "Branch $LABEL_BRANCH already exists."
+            else
+              echo "Branch $LABEL_BRANCH does not exist, creating it."
+              git branch $LABEL_BRANCH origin/master
+            fi
+            git checkout $LABEL_BRANCH
+
+            # Merge PR changes into dynamic branch
+            git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
+            git commit -m "Merged PR #${{ github.event.pull_request.number }} due to new commits on labeled PR"
+            git push origin $LABEL_BRANCH
+          done
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

-      - name: Cleanup if PR Closed
-        if: github.event.action == 'closed'
+      - name: Merge for Labeled Event
+        if: github.event.action == 'labeled'
         run: |
-          # Add commands to reset or clean up target branch
-          # Example: git reset --hard origin/master
+          LABEL_BRANCH=${{ github.event.label.name }}
+          # Check if the branch exists
+          if git show-ref --verify --quiet refs/heads/$LABEL_BRANCH; then
+            echo "Branch $LABEL_BRANCH already exists."
+          else
+            echo "Branch $LABEL_BRANCH does not exist, creating it."
+            git branch $LABEL_BRANCH origin/master
+          fi
+          git checkout $LABEL_BRANCH
+
+          # Merge PR changes into dynamic branch
+          git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
+          git commit -m "Merged PR #${{ github.event.pull_request.number }} due to label '$LABEL_BRANCH'"
+          git push origin $LABEL_BRANCH
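
Review note: the workflow above treats each PR label name as a branch name. The `do-not-merge` check is a job-level condition on `github.event.label.name`, which is only populated for `labeled` events, and the `synchronize` step iterates over every label with an unquoted shell loop. The sketch below is not part of this PR. It is a minimal Python illustration, under stated assumptions, of how the label list returned by `gh pr view --json labels` could be filtered before any merge is attempted. The function name `target_branches`, the `EXCLUDED_LABELS` set, and the sample label `staging-2.18` are hypothetical.

```python
import json

# Assumption: the only label that must never become a branch is the one
# named in the workflow's job-level condition.
EXCLUDED_LABELS = {"do-not-merge"}


def target_branches(labels_json: str) -> list[str]:
    """Return label names usable as branch names, in order, without duplicates.

    `labels_json` is expected to have the shape the workflow reads with
    jq '.labels[].name', i.e. {"labels": [{"name": "..."}, ...]}.
    """
    payload = json.loads(labels_json)
    branches = []
    for label in payload.get("labels", []):
        name = label["name"]
        # Skip excluded labels, and skip names containing whitespace, which
        # an unquoted `for LABEL_BRANCH in $LABELS` loop would split apart.
        if name in EXCLUDED_LABELS or any(ch.isspace() for ch in name):
            continue
        if name not in branches:
            branches.append(name)
    return branches


if __name__ == "__main__":
    # Hypothetical label set for illustration only.
    sample = (
        '{"labels": [{"name": "staging-2.18"}, '
        '{"name": "do-not-merge"}, {"name": "needs review"}]}'
    )
    for branch in target_branches(sample):
        print(branch)
```

Filtering in one place keeps the `labeled` and `synchronize` paths consistent; whether labels with spaces should be skipped or rejected with an error is a design choice this sketch does not settle.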
8 changes: 4 additions & 4 deletions torch-neuronx/README.md
@@ -20,10 +20,10 @@ The following samples are available for training:
 | [hf_bert_jp](training/hf_bert_jp) | Fine-tuning & Deployment Hugging Face BERT Japanese model | DataParallel |
 | [hf_sentiment_analysis](training/hf_sentiment_analysis) | Examples of training Hugging Face bert-base-cased model for a text classification task with Trn1 Single Neuron and Distributed Training | DataParallel |
 | [customop_mlp](training/customop_mlp) | Examples of training a multilayer perceptron model with a custom Relu operator on a single Trn1 | DataParallel |
-| [tp_dp_gpt_neox_20b_hf_pretrain](training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training GPT-NEOX 20B model using neuronx-distributed | Tensor Parallel & DataParallel |
-| [tp_dp_gpt_neox_6.9b_hf_pretrain](training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training GPT-NEOX 6.9B model using neuronx-distributed | Tensor Parallel & DataParallel |
-| [tp_zero1_llama2_7b_hf_pretrain](training/llama2/tp_zero1_llama2_7b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training Llama-2 7B model using neuronx-distributed | Tensor Parallel |
-| [tp_pp_llama2_70b_hf_pretrain](training/llama2/tp_pp_llama2_70b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training Llama-2 70B model using neuronx-distributed | Tensor Parallel & Pipeline Parallel |
+| [tp_dp_gpt_neox_20b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain) | Training GPT-NEOX 20B model using neuronx-distributed | Tensor Parallel & DataParallel |
+| [tp_dp_gpt_neox_6.9b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain) | Training GPT-NEOX 6.9B model using neuronx-distributed | Tensor Parallel & DataParallel |
+| [tp_zero1_llama2_7b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/llama2/tp_zero1_llama2_7b_hf_pretrain) | Training Llama-2 7B model using neuronx-distributed | Tensor Parallel |
+| [tp_pp_llama2_70b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/llama2/tp_pp_llama2_hf_pretrain) | Training Llama-2 70B model using neuronx-distributed | Tensor Parallel & Pipeline Parallel |

## Inference

@@ -54,7 +54,7 @@
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Suppresses tokenizer warnings making errors easier to detect\n",
 "# torchvision version pinned to avoid pulling in torch 2.0\n",
-"!pip install -U transformers torchvision==0.14.1 opencv-python Pillow"
+"!pip install -U transformers opencv-python Pillow"
 ]
 },
 {
@@ -54,7 +54,7 @@
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Suppresses tokenizer warnings making errors easier to detect\n",
 "# torchvision version pinned to avoid pulling in torch 2.0\n",
-"!pip install -U transformers torchvision==0.14.1 opencv-python Pillow"
+"!pip install -U transformers opencv-python Pillow"
 ]
 },
 {
@@ -41,6 +41,7 @@
 "- `opencv-python-headless`\n",
 "- `imageio`\n",
 "- `scipy`\n",
+"- `accelerate`\n",
 "Furthermore, it requires the `ffmpeg` video-audio converter which is used to extract audio from the input videos.\n",
 "\n",
 "`torch-neuronx` and `neuronx-cc` should be installed when you configure your environment following the Inf2 setup guide. The remaining dependencies can be installed below:"
@@ -53,7 +54,7 @@
 "outputs": [],
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Suppresses tokenizer warnings making errors easier to detect\n",
-"!pip install transformers==4.30.2 opencv-python-headless==4.8.0.74 imageio scipy opencv-python==4.8.0.74\n",
+"!pip install transformers==4.30.2 opencv-python-headless==4.8.0.74 imageio scipy accelerate opencv-python==4.8.0.74\n",
 "\n",
 "!wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz\n",
 "!tar xvf ffmpeg-git-amd64-static.tar.xz\n",
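
Review note: the last notebook hunk adds `accelerate` to both the dependency list and the `pip install` command. The sketch below is not part of this PR. It is a minimal illustration, assuming the pins shown in that command, of how a notebook cell could confirm the environment matches before running the sample. It uses only the standard-library `importlib.metadata`. The names `REQUIRED`, `installed_version`, and `find_problems` are hypothetical.

```python
from importlib import metadata

# Pins copied from the notebook's pip command; unpinned packages map to None.
REQUIRED = {
    "transformers": "4.30.2",
    "opencv-python-headless": "4.8.0.74",
    "opencv-python": "4.8.0.74",
    "imageio": None,
    "scipy": None,
    "accelerate": None,
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_problems(required, lookup=installed_version):
    """List missing packages and pinned-version mismatches as readable strings."""
    problems = []
    for name, pinned in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif pinned is not None and found != pinned:
            problems.append(f"{name}: found {found}, expected {pinned}")
    return problems


if __name__ == "__main__":
    for line in find_problems(REQUIRED) or ["all dependencies satisfied"]:
        print(line)
```

The `lookup` parameter exists only so the check can be exercised without touching the real environment; in a notebook cell, calling `find_problems(REQUIRED)` is enough.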