This is an attempt to replicate the following paper, since the hyperparameter link given in the paper is no longer working.

Maxout Networks, arXiv:1302.4389 [stat.ML]
The following diagram shows the maxout module with multilayer perceptrons.
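
In code, the maxout operation from the paper is simply a maximum over k affine pieces. Below is a minimal sketch of such a layer, assuming PyTorch purely for illustration; the layer sizes are hypothetical and not necessarily the settings used in `mnist.py`.

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """One maxout layer: k affine maps followed by an element-wise max over the pieces."""
    def __init__(self, in_features, out_features, num_pieces):
        super().__init__()
        self.out_features = out_features
        self.num_pieces = num_pieces
        self.linear = nn.Linear(in_features, out_features * num_pieces)

    def forward(self, x):
        z = self.linear(x)                                   # (batch, out_features * k)
        z = z.view(-1, self.out_features, self.num_pieces)   # (batch, out_features, k)
        return z.max(dim=2).values                           # max over the k pieces

# Hypothetical example: a 784 -> 2048 maxout block with 4 pieces on flattened MNIST images.
layer = Maxout(784, 2048, num_pieces=4)
out = layer(torch.randn(64, 784))   # out.shape == (64, 2048)
```
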
- Train (first 50,000 training samples): `python mnist.py --mlp 1 --train true`
- Validation (remaining 10,000 training samples): `python mnist.py --mlp 1 --valid true`
- Train continuation (whole training set, continued from the previous run): `python mnist.py --mlp 1 --train_cont true`
- Testing: `python mnist.py --mlp 1 --test true`
For the complete hyperparameter tuning, see the hyper-tuning.rst file.
Validation

| Epochs | Batch size | Layer1 layers | Layer1 neurons | Layer2 layers | Layer2 neurons | Accuracy (%) | Loss |
|---|---|---|---|---|---|---|---|
| 5 | 64 | 4 | 2048 | 2 | 10 | 97.79 | 1.5060 |
| 5 | 64 | 4 | 1024 | 2 | 10 | 97.44 | 1.5107 |
Training

| Epochs | Batch size | Layer1 layers | Layer1 neurons | Layer2 layers | Layer2 neurons | Accuracy (%) | Loss |
|---|---|---|---|---|---|---|---|
| 5 | 64 | 4 | 2048 | 2 | 10 | 96.94 | 1.5097 |
| 5 | 64 | 4 | 1024 | 2 | 10 | 96.83 | 1.5108 |
The model was then trained further on the whole training dataset, giving the following accuracy and loss.
Training with pretrained weights

| Epochs | Batch size | Layer1 layers | Layer1 neurons | Layer2 layers | Layer2 neurons | Accuracy (%) | Loss |
|---|---|---|---|---|---|---|---|
| 5 | 64 | 4 | 2048 | 2 | 10 | 99.02 | 1.4827 |
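
As a rough illustration of this continuation step (not the repository's actual code), the weights saved after validation can be reloaded and training resumed on the full training set. The checkpoint filename and learning rate below are assumptions, and PyTorch is used only for the sketch.

```python
import torch
import torch.nn as nn

# Stand-in for the maxout MLP sketched earlier; the real model lives in mnist.py.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

# After the validation run, the best weights would have been saved, e.g.:
#   torch.save(model.state_dict(), "mlp_best_valid.pt")   # hypothetical filename
# Train-continuation then reloads them and keeps optimising on the full
# 60,000-image training set (learning rate here is an assumption):
model.load_state_dict(torch.load("mlp_best_valid.pt"))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
# ... usual training loop over the whole training set ...
```
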
Testing

| Batch size | Layer1 layers | Layer1 neurons | Layer2 layers | Layer2 neurons | Accuracy (%) | Loss |
|---|---|---|---|---|---|---|
| 64 | 4 | 2048 | 2 | 10 | 97.17 | 1.5007 |
- Train (50,000 shuffled training samples): `python mnist.py --conv 1 --train true`
- Validation (remaining 10,000 training samples): `python mnist.py --conv 1 --valid true`
- Train continuation (whole training set, continued from the previous run): `python mnist.py --conv 1 --train_cont true`
- Testing: `python mnist.py --conv 1 --test true`
The learning rate is initially set to 0.01 and halved at epoch 5 when training on the 50,000 shuffled samples. The configuration with the lowest validation error is then retrained from the pretrained weights, this time starting from a learning rate of 0.001, again halved at epoch 5.
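
A compact way to express that schedule, assuming PyTorch and plain SGD (the optimizer actually used by `mnist.py` is an assumption here):

```python
import torch

def make_optimizer_and_scheduler(model, pretrained=False):
    # 0.01 for the first run on the 50,000 shuffled samples,
    # 0.001 when retraining from the pretrained weights.
    base_lr = 0.001 if pretrained else 0.01
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)
    # Halve the learning rate once, at epoch 5.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5], gamma=0.5)
    return optimizer, scheduler

# Example usage with a throwaway model; call scheduler.step() once per epoch.
optimizer, scheduler = make_optimizer_and_scheduler(torch.nn.Linear(784, 10))
```
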
The architecture presented in the paper is as follows:

conv -> maxpool -> conv -> maxpool -> conv -> maxpool -> MLP -> softmax

The output of the MLP is 10 (one unit per class), and its input is whatever size comes out of the third maxpool. The only things I had to adjust were the kernel sizes and paddings of the convolutional layers, since those are the only adjustable parameters of this architecture.
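
As a sanity check on the "MLP in" column in the tables below: with stride-1 convolutions and 2 x 2 max-pooling at stride 1 (the values listed in the tables), each convolution maps a spatial size n to n - k + 2p + 1 and each pooling step shrinks it by 1. Starting from 28 x 28 MNIST images and flattening a single feature map, this reproduces the tabulated input sizes:

```python
def mlp_input_size(kernels, pads, image_size=28):
    """Spatial arithmetic for three (conv -> 2x2 maxpool) stages, both at stride 1."""
    size = image_size
    for k, p in zip(kernels, pads):
        size = size - k + 2 * p + 1   # convolution, stride 1
        size = size - 2 + 1           # 2 x 2 max-pooling, stride 1
    return size * size                # flattened input to the MLP

print(mlp_input_size([7, 5, 5], [3, 2, 2]))  # 625
print(mlp_input_size([5, 5, 5], [3, 2, 2]))  # 729
print(mlp_input_size([5, 3, 3], [3, 2, 2]))  # 961
print(mlp_input_size([5, 3, 3], [2, 2, 2]))  # 841
```
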

| Epochs | Batch | Conv1 (kernel, pad) | Maxpool1 (pool, stride) | Conv2 (kernel, pad) | Maxpool2 (pool, stride) | Conv3 (kernel, pad) | Maxpool3 (pool, stride) | MLP (in, out) | Acc % | Loss |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 64 | 7 x 7, 3 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 625, 10 | 97.09 | 1.4921 |
| 10 | 64 | 5 x 5, 3 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 729, 10 | 87.62 | 1.5856 |
| 10 | 64 | 5 x 5, 3 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 961, 10 | 95.43 | 1.5088 |
| 10 | 64 | 5 x 5, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 841, 10 | 95.96 | 1.5037 |

| Batch | Conv1 (kernel, pad) | Maxpool1 (pool, stride) | Conv2 (kernel, pad) | Maxpool2 (pool, stride) | Conv3 (kernel, pad) | Maxpool3 (pool, stride) | MLP (in, out) | Acc % | Loss |
|---|---|---|---|---|---|---|---|---|---|
| 64 | 7 x 7, 3 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 625, 10 | 96.85 | 1.4928 |
| 64 | 5 x 5, 3 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 729, 10 | 87.76 | 1.5828 |
| 64 | 5 x 5, 3 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 961, 10 | 95.16 | 1.5828 |
| 64 | 5 x 5, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 841, 10 | 96.15 | 1.5012 |

| Epochs | Batch | Conv1 (kernel, pad) | Maxpool1 (pool, stride) | Conv2 (kernel, pad) | Maxpool2 (pool, stride) | Conv3 (kernel, pad) | Maxpool3 (pool, stride) | MLP (in, out) | Acc % | Loss |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 64 | 7 x 7, 3 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 625, 10 | 97.58 | 1.4874 |
| 10 | 64 | 5 x 5, 3 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 729, 10 | 88.04 | 1.5811 |
| 10 | 64 | 5 x 5, 3 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 961, 10 | 96.25 | 1.5011 |
| 10 | 64 | 5 x 5, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 841, 10 | 96.75 | 1.4960 |

| Batch | Conv1 (kernel, pad) | Maxpool1 (pool, stride) | Conv2 (kernel, pad) | Maxpool2 (pool, stride) | Conv3 (kernel, pad) | Maxpool3 (pool, stride) | MLP (in, out) | Acc % | Loss |
|---|---|---|---|---|---|---|---|---|---|
| 64 | 7 x 7, 3 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 625, 10 | 96.87 | 1.4929 |
| 64 | 5 x 5, 3 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 5 x 5, 2 | 2 x 2, 1 | 729, 10 | 87.39 | 1.5861 |
| 64 | 5 x 5, 3 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 961, 10 | 95.52 | 1.5070 |
| 64 | 5 x 5, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 3 x 3, 2 | 2 x 2, 1 | 841, 10 | 96.30 | 1.4989 |