diff --git a/demo/BERT/README.md b/demo/BERT/README.md
index 53b513ff..df5791bf 100755
--- a/demo/BERT/README.md
+++ b/demo/BERT/README.md
@@ -476,33 +476,6 @@ Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` o
 | 384 | 64 | 45.62 | 45.63 | 45.40 | 84.26 | 84.56 | 83.63 |
 | 384 | 128 | 89.51 | 89.55 | 89.01 | 164.56 | 164.95 | 163.70 |
 
-##### Megatron Large with Sparsity
-
-| Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
-|-----------------|------------|-----------------|-----------------|---------|
-| | | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 1.17 | 1.18 | 1.14 |
-| 128 | 2 | 1.43 | 1.82 | 1.43 |
-| 128 | 4 | 1.90 | 1.90 | 1.90 |
-| 128 | 8 | 3.08 | 3.08 | 3.05 |
-| 128 | 12 | 3.36 | 3.36 | 3.36 |
-| 128 | 16 | 4.42 | 4.42 | 4.42 |
-| 128 | 24 | 6.01 | 6.01 | 6.00 |
-| 128 | 32 | 7.75 | 7.76 | 7.75 |
-| 128 | 64 | 13.91 | 14.04 | 13.81 |
-| 128 | 128 | 27.11 | 27.12 | 26.85 |
-| 384 | 1 | 1.71 | 1.71 | 1.71 |
-| 384 | 2 | 2.37 | 2.37 | 2.37 |
-| 384 | 4 | 3.92 | 3.92 | 3.92 |
-| 384 | 8 | 6.80 | 6.80 | 6.80 |
-| 384 | 12 | 9.02 | 9.03 | 9.02 |
-| 384 | 16 | 12.15 | 12.16 | 12.15 |
-| 384 | 24 | 17.54 | 17.55 | 17.41 |
-| 384 | 32 | 22.94 | 22.96 | 22.71 |
-| 384 | 64 | 43.88 | 43.90 | 43.61 |
-| 384 | 128 | 85.42 | 85.45 | 84.89 |
-
-
 #### Inference performance: NVIDIA A30
 
 Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` on NVIDIA A30.
@@ -559,33 +532,6 @@ Results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` o
 | 384 | 64 | 92.04 | 92.37 | 91.21 | 174.21 | 174.91 | 173.29 |
 | 384 | 128 | 180.77 | 181.11 | 179.78 | 343.25 | 343.80 | 342.30 |
 
-##### Megatron Large with Sparsity
-
-| Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
-|-----------------|------------|-----------------|-----------------|---------|
-| | | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 1.43 | 1.43 | 1.43 |
-| 128 | 2 | 1.90 | 1.90 | 1.90 |
-| 128 | 4 | 3.12 | 3.13 | 3.09 |
-| 128 | 8 | 4.79 | 4.79 | 4.78 |
-| 128 | 12 | 6.38 | 6.39 | 6.35 |
-| 128 | 16 | 8.63 | 8.67 | 8.55 |
-| 128 | 24 | 11.99 | 12.00 | 11.92 |
-| 128 | 32 | 16.42 | 16.43 | 16.37 |
-| 128 | 64 | 30.11 | 30.12 | 29.91 |
-| 128 | 128 | 58.93 | 59.03 | 58.39 |
-| 384 | 1 | 2.70 | 2.70 | 2.70 |
-| 384 | 2 | 4.18 | 4.18 | 4.17 |
-| 384 | 4 | 7.33 | 7.35 | 7.26 |
-| 384 | 8 | 13.78 | 13.79 | 13.63 |
-| 384 | 12 | 19.47 | 19.48 | 19.30 |
-| 384 | 16 | 25.55 | 25.56 | 25.34 |
-| 384 | 24 | 37.13 | 37.15 | 36.55 |
-| 384 | 32 | 48.76 | 48.78 | 48.20 |
-| 384 | 64 | 95.57 | 95.85 | 94.96 |
-| 384 | 128 | 186.36 | 186.83 | 185.37 |
-
-
 #### Inference performance: NVIDIA T4 (16GB)
 
 Results were obtained by running `scripts/inference_benchmark.sh --gpu Turing` on NVIDIA T4 (16G).