remove megatron large sparsity data
because we captured megatron large dense data by mistake.
Signed-off-by: Vincent Huang <[email protected]>
ttyio authored and rajeevsrao committed Oct 28, 2022
1 parent 007c347 commit 3e345ef
Showing 1 changed file with 0 additions and 54 deletions.
54 changes: 0 additions & 54 deletions demo/BERT/README.md
@@ -476,33 +476,6 @@ Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` o
| 384 | 64 | 45.62 | 45.63 | 45.40 | 84.26 | 84.56 | 83.63 |
| 384 | 128 | 89.51 | 89.55 | 89.01 | 164.56 | 164.95 | 163.70 |

##### Megatron Large with Sparsity

| Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average |
| 128 | 1 | 1.17 | 1.18 | 1.14 |
| 128 | 2 | 1.43 | 1.82 | 1.43 |
| 128 | 4 | 1.90 | 1.90 | 1.90 |
| 128 | 8 | 3.08 | 3.08 | 3.05 |
| 128 | 12 | 3.36 | 3.36 | 3.36 |
| 128 | 16 | 4.42 | 4.42 | 4.42 |
| 128 | 24 | 6.01 | 6.01 | 6.00 |
| 128 | 32 | 7.75 | 7.76 | 7.75 |
| 128 | 64 | 13.91 | 14.04 | 13.81 |
| 128 | 128 | 27.11 | 27.12 | 26.85 |
| 384 | 1 | 1.71 | 1.71 | 1.71 |
| 384 | 2 | 2.37 | 2.37 | 2.37 |
| 384 | 4 | 3.92 | 3.92 | 3.92 |
| 384 | 8 | 6.80 | 6.80 | 6.80 |
| 384 | 12 | 9.02 | 9.03 | 9.02 |
| 384 | 16 | 12.15 | 12.16 | 12.15 |
| 384 | 24 | 17.54 | 17.55 | 17.41 |
| 384 | 32 | 22.94 | 22.96 | 22.71 |
| 384 | 64 | 43.88 | 43.90 | 43.61 |
| 384 | 128 | 85.42 | 85.45 | 84.89 |


#### Inference performance: NVIDIA A30

Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` on NVIDIA A30.
@@ -559,33 +532,6 @@ Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` o
| 384 | 64 | 92.04 | 92.37 | 91.21 | 174.21 | 174.91 | 173.29 |
| 384 | 128 | 180.77 | 181.11 | 179.78 | 343.25 | 343.80 | 342.30 |

##### Megatron Large with Sparsity

| Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average |
| 128 | 1 | 1.43 | 1.43 | 1.43 |
| 128 | 2 | 1.90 | 1.90 | 1.90 |
| 128 | 4 | 3.12 | 3.13 | 3.09 |
| 128 | 8 | 4.79 | 4.79 | 4.78 |
| 128 | 12 | 6.38 | 6.39 | 6.35 |
| 128 | 16 | 8.63 | 8.67 | 8.55 |
| 128 | 24 | 11.99 | 12.00 | 11.92 |
| 128 | 32 | 16.42 | 16.43 | 16.37 |
| 128 | 64 | 30.11 | 30.12 | 29.91 |
| 128 | 128 | 58.93 | 59.03 | 58.39 |
| 384 | 1 | 2.70 | 2.70 | 2.70 |
| 384 | 2 | 4.18 | 4.18 | 4.17 |
| 384 | 4 | 7.33 | 7.35 | 7.26 |
| 384 | 8 | 13.78 | 13.79 | 13.63 |
| 384 | 12 | 19.47 | 19.48 | 19.30 |
| 384 | 16 | 25.55 | 25.56 | 25.34 |
| 384 | 24 | 37.13 | 37.15 | 36.55 |
| 384 | 32 | 48.76 | 48.78 | 48.20 |
| 384 | 64 | 95.57 | 95.85 | 94.96 |
| 384 | 128 | 186.36 | 186.83 | 185.37 |


#### Inference performance: NVIDIA T4 (16GB)

Results were obtained by running `scripts/inference_benchmark.sh --gpu Turing` on NVIDIA T4 (16G).
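The removed tables summarize each sequence-length/batch-size configuration with 95th-percentile, 99th-percentile, and average latency in milliseconds. This commit does not show how `scripts/inference_benchmark.sh` computes those statistics. The sketch below is a minimal illustration that assumes the nearest-rank percentile definition. The helper names `nearest_rank_percentile` and `summarize_latencies` and the sample timings are hypothetical.

```python
import math
from statistics import mean


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile of samples (nearest-rank method; an assumption)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    # Multiply before dividing so integer pct and sample counts stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_latencies(latencies_ms):
    """Reduce per-iteration latencies (ms) to the three columns used in the tables."""
    return {
        "p95": nearest_rank_percentile(latencies_ms, 95),
        "p99": nearest_rank_percentile(latencies_ms, 99),
        "avg": mean(latencies_ms),
    }


if __name__ == "__main__":
    # Hypothetical timings for illustration only; these are not measured data.
    timings = [1.0] * 90 + [1.5] * 9 + [3.0]
    print(summarize_latencies(timings))
```

Other percentile definitions, such as linear interpolation between ranks, can differ slightly from nearest-rank at small sample counts. Numbers computed this way may therefore not match a given benchmark's reported values exactly.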
