- Overview
- Library Versions
- CNN Benchmarks
- RNN Benchmarks (Experimental)
- Setup
- How to Run CNN Benchmarks
- How to Run RNN Benchmarks
- References
This benchmark module provides easy-to-use scripts for benchmarking various Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) Keras models. You can use these scripts to generate benchmarks on CPU, single-GPU, or multi-GPU instances. Both the Apache MXNet and TensorFlow backends are supported.
CREDITS:
This benchmark module borrows and extends the benchmark utility from
TensorFlow Keras benchmarks.
NOTE:
The benchmarks below use the native pip packages provided by each framework, without any optimized custom builds.
Framework | Version | Installation |
---|---|---|
Keras | 2.1.6 | pip install keras-mxnet |
MXNet (CPU) | 1.2 | pip install mxnet-mkl |
MXNet (GPU) | 1.2 | pip install mxnet-cu90 |
TensorFlow (CPU) | 1.8 | pip install tensorflow |
TensorFlow (GPU) | 1.8 | pip install tensorflow-gpu |
CUDA | 9.0 | |
cuDNN | 7.0.5 | |
Currently, this utility helps in benchmarking the following CNN networks:
Currently, this utility helps in benchmarking on the following datasets:
NOTE:
1. For CIFAR10 and synthetic data, the benchmark scripts will download and generate the required data respectively.
2. For ImageNet data, you are expected to download the data - http://image-net.org/download
3. You can benchmark with a different number of layers in ResNet.
Instance Type | GPUs | Batch Size | Keras-MXNet (img/sec) | Keras-TensorFlow (img/sec) |
---|---|---|---|---|
P3.8X Large | 1 | 32 | 135 | 52 |
P3.8X Large | 4 | 128 | 536 | 162 |
P3.16X Large | 8 | 256 | 722 | 211 |
Instance Type | GPUs | Batch Size | Keras-MXNet (img/sec) | Keras-TensorFlow (img/sec) |
---|---|---|---|---|
C5.18X Large | 0 | 32 | 13 | 4 |
P3.8X Large | 1 | 32 | 194 | 184 |
P3.8X Large | 4 | 128 | 764 | 393 |
P3.16X Large | 8 | 256 | 1068 | 261 |
Instance Type | GPUs | Batch Size | Keras-MXNet (img/sec) | Keras-TensorFlow (img/sec) |
---|---|---|---|---|
C5.18X Large | 0 | 32 | 87 | 59 |
P3.8X Large | 1 | 32 | 831 | 509 |
P3.8X Large | 4 | 128 | 1783 | 699 |
P3.16X Large | 8 | 256 | 1680 | 435 |
You can see more benchmark experiments with different instance types, batch sizes, and other parameters in the detailed CNN results document.
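To make the multi-GPU numbers easier to compare, the short sketch below computes scaling efficiency (measured throughput divided by ideal linear scaling from the 1-GPU figure) and the Keras-MXNet to Keras-TensorFlow throughput ratio. The figures are copied from the first results table above; the function names are illustrative and not part of the benchmark scripts. Note that the 8-GPU row was measured on a different instance type than the 1-GPU and 4-GPU rows, so its efficiency is only a rough indication.

```python
# Throughput figures (img/sec) copied from the first results table above.
# Keys are GPU counts; values are (Keras-MXNet, Keras-TensorFlow).
RESULTS = {
    1: (135, 52),
    4: (536, 162),
    8: (722, 211),
}


def scaling_efficiency(throughput, gpus, single_gpu_throughput):
    """Fraction of ideal linear scaling achieved with `gpus` GPUs."""
    return throughput / (gpus * single_gpu_throughput)


def summarize(results):
    """Return one row per GPU count: (gpus, mxnet_eff, tf_eff, mxnet_over_tf)."""
    base_mx, base_tf = results[1]
    rows = []
    for gpus in sorted(results):
        mx, tf = results[gpus]
        rows.append((
            gpus,
            round(scaling_efficiency(mx, gpus, base_mx), 2),
            round(scaling_efficiency(tf, gpus, base_tf), 2),
            round(mx / tf, 2),
        ))
    return rows


if __name__ == "__main__":
    for gpus, mx_eff, tf_eff, ratio in summarize(RESULTS):
        print(f"{gpus} GPU(s): MXNet eff={mx_eff}, TF eff={tf_eff}, MXNet/TF={ratio}")
```

An efficiency of 1.0 means throughput grew in proportion to the number of GPUs; lower values indicate overhead from multi-GPU training.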
NOTE:
1. image_data_format for MXNet backend - 'channels_first'
2. image_data_format for TensorFlow backend - 'channels_last'
3. C5 instance details - https://aws.amazon.com/ec2/instance-types/c5/
4. P3 instance details (Volta GPU) - https://aws.amazon.com/ec2/instance-types/p3/
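As a small illustration of what the two image_data_format values mean for a single image, the sketch below maps each backend to the format listed in the note and returns the corresponding axis order. The helper name and the 224x224 RGB example are assumptions made for illustration; they are not taken from the benchmark scripts.

```python
# Data format used for each backend in these benchmarks (from the note above).
BACKEND_DATA_FORMAT = {
    "mxnet": "channels_first",
    "tensorflow": "channels_last",
}


def image_input_shape(data_format, height, width, channels):
    """Return the per-image shape (without the batch axis) for a data format."""
    if data_format == "channels_first":
        return (channels, height, width)
    if data_format == "channels_last":
        return (height, width, channels)
    raise ValueError(f"Unknown image_data_format: {data_format!r}")


if __name__ == "__main__":
    # Hypothetical 224x224 RGB input, chosen only as an example.
    for backend, data_format in BACKEND_DATA_FORMAT.items():
        print(backend, data_format, image_input_shape(data_format, 224, 224, 3))
```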
RNN support in Keras-MXNet is experimental, with a few rough edges in CPU training performance and no support for variable-length sequences.
RNN benchmark results will be added in a future release. In the meantime, you can use this benchmark utility to benchmark the following RNN networks:
Currently, this utility helps in benchmarking on the following datasets:
- Nietzsche
- WikiText-2
- Synthetic data
- Install Keras-MXNet following the installation guide.
- You need to install Apache MXNet and/or TensorFlow for running the benchmarks on the respective backend. Install the correct version of the backend based on your instance type (CPU/GPU).
- Download this awslabs/keras-apache-mxnet repository that contains all the benchmarking utilities and scripts.
# Install Keras-MXNet
$ pip install keras-mxnet
# Install the backend - MXNet and/or TensorFlow
# For MXNet
$ pip install mxnet-mkl # CPU
$ pip install mxnet-cu90 # GPU
# For TensorFlow
$ pip install tensorflow # CPU
$ pip install tensorflow-gpu # GPU
# Download the source code for benchmarking
$ git clone https://github.com/awslabs/keras-apache-mxnet
$ cd keras-apache-mxnet/benchmark/scripts
Update the ~/.keras/keras.json to set the backend and image_data_format.
For TensorFlow backend benchmarks, set backend: tensorflow and image_data_format: channels_last.
For MXNet backend benchmarks, set backend: mxnet and image_data_format: channels_first.
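If you switch backends often, editing the file by hand gets tedious. The following is a minimal sketch of a helper that updates the two keys described above while leaving any other settings in the file untouched. The function name `set_keras_backend` and its `config_path` parameter are hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Settings described above for each backend.
BACKEND_SETTINGS = {
    "tensorflow": {"backend": "tensorflow", "image_data_format": "channels_last"},
    "mxnet": {"backend": "mxnet", "image_data_format": "channels_first"},
}


def set_keras_backend(backend, config_path=None):
    """Write the backend and image_data_format keys into keras.json.

    `config_path` defaults to ~/.keras/keras.json; existing keys other than
    the two being set are preserved.
    """
    if backend not in BACKEND_SETTINGS:
        raise ValueError(f"Unsupported backend: {backend!r}")
    path = Path(config_path) if config_path else Path.home() / ".keras" / "keras.json"
    config = {}
    if path.exists():
        config = json.loads(path.read_text())
    config.update(BACKEND_SETTINGS[backend])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

Calling `set_keras_backend("mxnet")` before an MXNet run, or `set_keras_backend("tensorflow")` before a TensorFlow run, would apply the matching pair of settings.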
$ python benchmark_resnet.py --dataset imagenet --version 1 --layers 56 --gpus 4 --epoch 20 --train_mode train_on_batch --data_path home/ubuntu/imagenet/train/
- version: can be 1 or 2 for ResNetv1 and ResNetv2 respectively.
- layers: Number of layers in ResNet
- gpus: Number of GPUs to be used. 0 to run on CPU
- train_mode: Since ImageNet is a large dataset, you can choose 'train_on_batch' or 'fit_generator' to feed the data. We recommend 'train_on_batch'.
- data_path: Path where you have downloaded the ImageNet data.
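To summarize the types and allowed values of the flags listed above, here is a hypothetical argparse declaration. It is only a sketch built from the descriptions in this document; the actual parser inside benchmark_resnet.py may differ, and no defaults are assumed.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(
        description="Sketch of the ResNet benchmark flags (not the real script)."
    )
    parser.add_argument("--dataset", required=True,
                        help="Dataset name, e.g. imagenet or cifar10")
    parser.add_argument("--version", type=int, choices=[1, 2],
                        help="1 for ResNetv1, 2 for ResNetv2")
    parser.add_argument("--layers", type=int,
                        help="Number of layers in ResNet")
    parser.add_argument("--gpus", type=int,
                        help="Number of GPUs to use; 0 runs on CPU")
    parser.add_argument("--epoch", type=int,
                        help="Number of epochs")
    parser.add_argument("--train_mode",
                        choices=["train_on_batch", "fit_generator"],
                        help="How data is fed for large datasets")
    parser.add_argument("--data_path",
                        help="Path to the downloaded ImageNet data")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```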
Update the ~/.keras/keras.json to set the backend and image_data_format.
For TensorFlow backend benchmarks, set backend: tensorflow and image_data_format: channels_last.
For MXNet backend benchmarks, set backend: mxnet and image_data_format: channels_first.
$ python benchmark_resnet.py --dataset cifar10 --version 1 --layers 56 --gpus 4 --epoch 20
Set the number of GPUs and epochs to suit your experiment.
We have a utility shell script that you can run for benchmarking on the synthetic data.
For MXNet backend benchmarks:
$ sh run_mxnet_backend.sh cpu_config resnet50 False 20 # For CPU Benchmarks
$ sh run_mxnet_backend.sh gpu_config resnet50 False 20 # For 1 GPU Benchmarks
$ sh run_mxnet_backend.sh 4_gpu_config resnet50 False 20 # For 4 GPU Benchmarks
$ sh run_mxnet_backend.sh 8_gpu_config resnet50 False 20 # For 8 GPU Benchmarks
For TensorFlow backend benchmarks:
$ sh run_tf_backend.sh cpu_config resnet50 False 20 # For CPU Benchmarks
$ sh run_tf_backend.sh gpu_config resnet50 False 20 # For 1 GPU Benchmarks
$ sh run_tf_backend.sh 4_gpu_config resnet50 False 20 # For 4 GPU Benchmarks
$ sh run_tf_backend.sh 8_gpu_config resnet50 False 20 # For 8 GPU Benchmarks
The last parameter in each command, 20, is the number of epochs.
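When you want to run every configuration for both backends, it can help to generate the command lines programmatically. The sketch below only assembles the argument lists shown above; the helper names are hypothetical. This document does not describe the third positional argument (False), so it is passed through unchanged.

```python
# Script names and configuration names as they appear in the commands above.
BACKEND_SCRIPTS = {
    "mxnet": "run_mxnet_backend.sh",
    "tensorflow": "run_tf_backend.sh",
}
CONFIGS = ["cpu_config", "gpu_config", "4_gpu_config", "8_gpu_config"]


def build_command(backend, config, model, third_arg="False", epochs=20):
    """Return the argument list for one benchmark run.

    `third_arg` is forwarded as-is because its meaning is not documented here.
    """
    if config not in CONFIGS:
        raise ValueError(f"Unknown config: {config!r}")
    return ["sh", BACKEND_SCRIPTS[backend], config, model, str(third_arg), str(epochs)]


def all_commands(model, epochs, third_arg="False"):
    """Return commands for every backend and configuration combination."""
    return [
        build_command(backend, config, model, third_arg, epochs)
        for backend in BACKEND_SCRIPTS
        for config in CONFIGS
    ]


if __name__ == "__main__":
    for cmd in all_commands("resnet50", 20):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the benchmark/scripts directory once keras.json is set for the matching backend.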
You can use the utility shell script to run the RNN benchmark on the Nietzsche dataset.
For MXNet backend benchmarks:
$ sh run_mxnet_backend.sh cpu_config lstm_nietzsche False 10 # For CPU Benchmarks
$ sh run_mxnet_backend.sh gpu_config lstm_nietzsche False 10 # For 1 GPU Benchmarks
$ sh run_mxnet_backend.sh 4_gpu_config lstm_nietzsche False 10 # For 4 GPU Benchmarks
$ sh run_mxnet_backend.sh 8_gpu_config lstm_nietzsche False 10 # For 8 GPU Benchmarks
For TensorFlow backend benchmarks:
$ sh run_tf_backend.sh cpu_config lstm_nietzsche False 10 # For CPU Benchmarks
$ sh run_tf_backend.sh gpu_config lstm_nietzsche False 10 # For 1 GPU Benchmarks
$ sh run_tf_backend.sh 4_gpu_config lstm_nietzsche False 10 # For 4 GPU Benchmarks
$ sh run_tf_backend.sh 8_gpu_config lstm_nietzsche False 10 # For 8 GPU Benchmarks
You can use the utility shell script to run the RNN benchmark on the WikiText-2 dataset.
For MXNet backend benchmarks:
$ sh run_mxnet_backend.sh cpu_config lstm_wikitext2 False 10 # For CPU Benchmarks
$ sh run_mxnet_backend.sh gpu_config lstm_wikitext2 False 10 # For 1 GPU Benchmarks
$ sh run_mxnet_backend.sh 4_gpu_config lstm_wikitext2 False 10 # For 4 GPU Benchmarks
$ sh run_mxnet_backend.sh 8_gpu_config lstm_wikitext2 False 10 # For 8 GPU Benchmarks
For TensorFlow backend benchmarks:
$ sh run_tf_backend.sh cpu_config lstm_wikitext2 False 10 # For CPU Benchmarks
$ sh run_tf_backend.sh gpu_config lstm_wikitext2 False 10 # For 1 GPU Benchmarks
$ sh run_tf_backend.sh 4_gpu_config lstm_wikitext2 False 10 # For 4 GPU Benchmarks
$ sh run_tf_backend.sh 8_gpu_config lstm_wikitext2 False 10 # For 8 GPU Benchmarks
You can use the utility shell script to run the RNN benchmark on the synthetic dataset.
For MXNet backend benchmarks:
$ sh run_mxnet_backend.sh cpu_config lstm_synthetic False 10 # For CPU Benchmarks
$ sh run_mxnet_backend.sh gpu_config lstm_synthetic False 10 # For 1 GPU Benchmarks
$ sh run_mxnet_backend.sh 4_gpu_config lstm_synthetic False 10 # For 4 GPU Benchmarks
$ sh run_mxnet_backend.sh 8_gpu_config lstm_synthetic False 10 # For 8 GPU Benchmarks
For TensorFlow backend benchmarks:
$ sh run_tf_backend.sh cpu_config lstm_synthetic False 10 # For CPU Benchmarks
$ sh run_tf_backend.sh gpu_config lstm_synthetic False 10 # For 1 GPU Benchmarks
$ sh run_tf_backend.sh 4_gpu_config lstm_synthetic False 10 # For 4 GPU Benchmarks
$ sh run_tf_backend.sh 8_gpu_config lstm_synthetic False 10 # For 8 GPU Benchmarks