Neuron SDK 2.18.0 updates (#71)

Co-authored-by: Nathan Mailhot <[email protected]>
aws-rxgupta and natemail-aws authored Apr 1, 2024
1 parent 604b7c3 commit 0815eb2
Showing 47 changed files with 1,292 additions and 6,029 deletions.
32 changes: 32 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,32 @@

*Description:*

*Issue #, sim, or t.corp if available:*

* Link to RTD for my changes: https://github.com/aws-neuron/aws-neuron-samples-staging/YOUR_BRANCH_NAME/

* Submitter Checklist
  * Tested on: Neuron SDK <version>, release_version, Instance_type.
  * I've completely filled out the form above!
  * **(MANDATORY)** PR needs test run output:
    * I have provided the output with expected metrics in a metrics.json file (a minimal sketch follows this checklist)
    * I have attached metrics.json in the PR
    * I have attached golden_step_loss.txt
    * I have added a screenshot of the plotted loss curve
  * (If applicable) I've automated a test to safeguard my changes from regression.
  * (If applicable) I've posted test collateral to prove my change was effective and not harmful.
  * (If applicable) I've added someone from QA to the list of reviewers. Do this if you didn't make an automated test or if you feel it's appropriate for another reason.
  * (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the pre-approved Amazon license list.
* Reviewer Checklist
  * I've verified the changes render correctly on RTD (link above)
  * I've ensured the submitter completed the form
  * (If appropriate) I've verified the metrics.json file provided by the submitter
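
For illustration, a minimal sketch of a metrics.json a submitter might attach; the field names and values are hypothetical placeholders, not a schema required by the test tooling:

```bash
# Illustrative only -- field names and values are placeholders, not a required schema.
cat > metrics.json <<'EOF'
{
  "neuron_sdk_version": "2.18.0",
  "instance_type": "trn1.32xlarge",
  "metrics": {
    "throughput_seq_per_sec": 100.0,
    "final_step_loss": 2.35
  }
}
EOF
```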




60 changes: 44 additions & 16 deletions .github/workflows/aggregate-prs.yml
@@ -1,36 +1,64 @@
-name: Aggregate PRs into Staging Branch for Automated Testing
+name: Merge PR into Dynamic Branch on Label
 
-on:
-  pull_request:
-    types: [opened, reopened, synchronize, closed]
+on:
+  pull_request_target:
+    types: [labeled, synchronize]
+    branches:
+      - master
 
 jobs:
-  merge-to-target:
-    if: github.event.pull_request.state == 'open' && !github.event.pull_request.draft
+  merge-to-dynamic-branch:
+    if: github.event.label.name != 'do-not-merge' #Excludes those labeled with do-not-merge
     runs-on: ubuntu-latest
     steps:
       - name: Checkout Repository
         uses: actions/checkout@v2
         with:
           ref: ${{ github.event.pull_request.head.ref }}
           fetch-depth: 0
 
       - name: Configure Git
         run: |
           git config user.name "GitHub Actions"
           git config user.email "[email protected]"
 
-      - name: Merge PR into Testing Branch
+      - name: Check PR Labels and Merge for New Commit Events
+        if: github.event.action == 'synchronize'
         run: |
-          git fetch origin
-          git checkout -b testing origin/testing
-          git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
-          git commit -m "Merged PR #${{ github.event.pull_request.number }}"
-          git push origin testing
+          LABELS_JSON=$(gh pr view ${{ github.event.pull_request.number }} --json labels)
+          LABELS=$(echo "$LABELS_JSON" | jq -r '.labels[].name')
+          for LABEL_BRANCH in $LABELS; do
+            # Check if the branch exists
+            if git show-ref --verify --quiet refs/heads/$LABEL_BRANCH; then
+              echo "Branch $LABEL_BRANCH already exists."
+            else
+              echo "Branch $LABEL_BRANCH does not exist, creating it."
+              git branch $LABEL_BRANCH origin/master
+            fi
+            git checkout $LABEL_BRANCH
+            # Merge PR changes into dynamic branch
+            git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
+            git commit -m "Merged PR #${{ github.event.pull_request.number }} due to new commits on labeled PR"
+            git push origin $LABEL_BRANCH
+          done
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
 
-      - name: Cleanup if PR Closed
-        if: github.event.action == 'closed'
+      - name: Merge for Labeled Event
+        if: github.event.action == 'labeled'
         run: |
-          # Add commands to reset or clean up target branch
-          # Example: git reset --hard origin/master
+          LABEL_BRANCH=${{ github.event.label.name }}
+          # Check if the branch exists
+          if git show-ref --verify --quiet refs/heads/$LABEL_BRANCH; then
+            echo "Branch $LABEL_BRANCH already exists."
+          else
+            echo "Branch $LABEL_BRANCH does not exist, creating it."
+            git branch $LABEL_BRANCH origin/master
+          fi
+          git checkout $LABEL_BRANCH
+          # Merge PR changes into dynamic branch
+          git merge ${{ github.event.pull_request.head.sha }} --no-ff --no-commit
+          git commit -m "Merged PR #${{ github.event.pull_request.number }} due to label '$LABEL_BRANCH'"
+          git push origin $LABEL_BRANCH
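
In practice, a maintainer routes a PR into one of these dynamic test branches by labeling it with the target branch name. A hedged sketch using the GitHub CLI (the PR number and label name below are placeholders, not values from this commit):

```bash
# Label the PR; the workflow above then creates a branch with the same
# name (if it does not exist) and merges the PR's head commit into it.
# "123" and "neuron-sdk-2.18-testing" are placeholder values.
gh pr edit 123 --add-label "neuron-sdk-2.18-testing"

# Later pushes to the PR (synchronize events) are re-merged into every
# branch named after one of the PR's labels.
```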
8 changes: 4 additions & 4 deletions torch-neuronx/README.md
@@ -20,10 +20,10 @@ The following samples are available for training:
| [hf_bert_jp](training/hf_bert_jp) | Fine-tuning & Deployment Hugging Face BERT Japanese model | DataParallel |
| [hf_sentiment_analysis](training/hf_sentiment_analysis) | Examples of training Hugging Face bert-base-cased model for a text classification task with Trn1 Single Neuron and Distributed Training | DataParallel |
| [customop_mlp](training/customop_mlp) | Examples of training a multilayer perceptron model with a custom Relu operator on a single Trn1 | DataParallel |
-| [tp_dp_gpt_neox_20b_hf_pretrain](training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training GPT-NEOX 20B model using neuronx-distributed | Tensor Parallel & DataParallel |
-| [tp_dp_gpt_neox_6.9b_hf_pretrain](training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training GPT-NEOX 6.9B model using neuronx-distributed | Tensor Parallel & DataParallel |
-| [tp_zero1_llama2_7b_hf_pretrain](training/llama2/tp_zero1_llama2_7b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training Llama-2 7B model using neuronx-distributed | Tensor Parallel |
-| [tp_pp_llama2_70b_hf_pretrain](training/llama2/tp_pp_llama2_70b_hf_pretrain) [Deprecated] | Please note the following sample location has changed to [NeuronX Distributed Repository](https://github.com/aws-neuron/neuronx-distributed). Training Llama-2 70B model using neuronx-distributed | Tensor Parallel & Pipeline Parallel |
+| [tp_dp_gpt_neox_20b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain) | Training GPT-NEOX 20B model using neuronx-distributed | Tensor Parallel & DataParallel |
+| [tp_dp_gpt_neox_6.9b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain) | Training GPT-NEOX 6.9B model using neuronx-distributed | Tensor Parallel & DataParallel |
+| [tp_zero1_llama2_7b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/llama2/tp_zero1_llama2_7b_hf_pretrain) | Training Llama-2 7B model using neuronx-distributed | Tensor Parallel |
+| [tp_pp_llama2_70b_hf_pretrain](https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/training/llama2/tp_pp_llama2_hf_pretrain) | Training Llama-2 70B model using neuronx-distributed | Tensor Parallel & Pipeline Parallel |

## Inference

@@ -54,7 +54,7 @@
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n",
 "# torchvision version pinned to avoid pulling in torch 2.0\n",
-"!pip install -U transformers torchvision==0.14.1 opencv-python Pillow"
+"!pip install -U transformers opencv-python Pillow"
 ]
 },
{
@@ -54,7 +54,7 @@
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n",
 "# torchvision version pinned to avoid pulling in torch 2.0\n",
-"!pip install -U transformers torchvision==0.14.1 opencv-python Pillow"
+"!pip install -U transformers opencv-python Pillow"
 ]
 },
{
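Both notebook hunks above drop the torchvision==0.14.1 pin from the install cell. An optional sanity check, not part of the notebooks themselves, to confirm the upgrade did not pull in a torch build other than the one the installed Neuron packages target:

```bash
# Optional check after running the install cell: list the torch- and
# neuron-related packages so an unexpected torch upgrade is easy to spot.
pip list | grep -iE "torch|neuron"
python -c "import torch; print('torch', torch.__version__)"
```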
@@ -41,6 +41,7 @@
 "- `opencv-python-headless`\n",
 "- `imageio`\n",
 "- `scipy`\n",
+"- `accelerate`\n",
 "Furthermore, it requires the `ffmpeg` video-audio converter which is used to extract audio from the input videos.\n",
 "\n",
 "`torch-neuronx` and `neuronx-cc` should be installed when you configure your environment following the Inf2 setup guide. The remaining dependencies can be installed below:"
@@ -53,7 +54,7 @@
 "outputs": [],
 "source": [
 "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n",
-"!pip install transformers==4.30.2 opencv-python-headless==4.8.0.74 imageio scipy opencv-python==4.8.0.74\n",
+"!pip install transformers==4.30.2 opencv-python-headless==4.8.0.74 imageio scipy accelerate opencv-python==4.8.0.74\n",
 "\n",
 "!wget https://johnvansickle.com/ffmpeg/builds/ffmpeg-git-amd64-static.tar.xz\n",
 "!tar xvf ffmpeg-git-amd64-static.tar.xz\n",
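The notebook above downloads a static ffmpeg build and uses it to pull the audio track out of the input videos. A minimal sketch of that extraction step (file names and sample rate are placeholders; the notebook's own cells define the actual values):

```bash
# Extract the audio track from a video into a mono 16 kHz WAV file.
# input.mp4 and audio.wav are placeholder names; the notebook may also
# invoke the extracted static ffmpeg binary by its full path.
ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 audio.wav
```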