# Padam vs ATMO

This repository is a fork of the official Padam code, so that the ATMO idea and Padam can be compared under identical conditions.
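
For context, ATMO combines two optimization methods into a single adaptive meta optimizer (see the cited paper). The snippet below is only a minimal sketch of that general idea, not the implementation used in this repository: the function name `combined_step`, the equal 0.5/0.5 weights, and the way optimizer state is handled are assumptions made purely for illustration.

```python
import torch

def combined_step(params, opt_a, opt_b, lam_a=0.5, lam_b=0.5):
    """Hypothetical sketch: blend the parameter updates of two optimizers.

    `params`, `opt_a`, `opt_b`, `lam_a`, `lam_b` are illustrative names,
    not part of this repository's API.
    """
    params = list(params)
    # Snapshot the parameters before any update.
    start = [p.detach().clone() for p in params]

    # Step with the first optimizer and record its update.
    opt_a.step()
    upd_a = [p.detach() - s for p, s in zip(params, start)]

    # Roll the parameters back and step with the second optimizer.
    with torch.no_grad():
        for p, s in zip(params, start):
            p.copy_(s)
    opt_b.step()

    # Blend the two updates with scalar weights.
    with torch.no_grad():
        for p, s, ua in zip(params, start, upd_a):
            ub = p.detach() - s
            p.copy_(s + lam_a * ua + lam_b * ub)
```

In such a sketch, `opt_a` and `opt_b` would be two `torch.optim` optimizers built over the same parameters (for example SGD with momentum and Adam), with gradients computed once per batch via `loss.backward()` and consumed by both.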

## Prerequisites

```bash
pip install -r requirements.txt
```

## Usage

Run `run_cnn_test_cifar10.py` for experiments on CIFAR-10 and `run_cnn_test_cifar100.py` for experiments on CIFAR-100.

## Command Line Arguments

* `--lr`: (starting) learning rate
* `--method`: optimization method, e.g. `"sgdm"`, `"adam"`, `"amsgrad"`, `"padam"`, `"mps"`, `"mas"`, `"map"`
* `--net`: network architecture, e.g. `"vggnet"`, `"resnet"`, `"wideresnet"`
* `--partial`: partially adaptive parameter for the Padam method
* `--wd`: weight decay
* `--Nepoch`: number of training epochs
* `--resume`: whether to resume from a previous training run

## Usage Examples

* Run experiments on CIFAR-10:

  ```bash
  python run_cnn_test_cifar10.py --lr 0.01 --method "mps" --net "resnet" --partial 0.125 --wd 2.5e-2 > logs/resnet/file.log
  ```

* Compute the max and mean accuracy over the log files (see the sketch after this list):

  ```bash
  python folder_mean_accuracy.py
  ```
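
The repository's `folder_mean_accuracy.py` performs this aggregation; the snippet below is only a rough sketch of that kind of processing. The `logs/resnet` path, the `*.log` glob, and the accuracy regular expression are assumptions about the log format made for illustration.

```python
import glob
import re
import statistics

# Hypothetical sketch: take the best accuracy reported in each log file
# under logs/resnet/ and print the max and mean over the runs.
best_per_run = []
for path in glob.glob("logs/resnet/*.log"):
    accs = []
    with open(path) as f:
        for line in f:
            # Assumed log format: lines containing "acc ..." or "accuracy ..."
            # followed by a decimal number.
            m = re.search(r"[Aa]cc(?:uracy)?[^\d]*(\d+\.\d+)", line)
            if m:
                accs.append(float(m.group(1)))
    if accs:
        best_per_run.append(max(accs))

if best_per_run:
    print("max :", max(best_per_run))
    print("mean:", statistics.mean(best_per_run))
```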

## Results

Accuracy (%):

| SGD-Momentum | ADAM  | Amsgrad | AdamW | Yogi  | AdaBound | Padam | Dynamic ATMO |
|--------------|-------|---------|-------|-------|----------|-------|--------------|
| 95.00        | 92.89 | 93.53   | 94.56 | 93.92 | 94.16    | 94.94 | 95.27        |

## Citation

Please check our paper for technical details and full results.

```bibtex
@article{landro2021combining,
  title     = {Combining Optimization Methods Using an Adaptive Meta Optimizer},
  author    = {Nicola Landro and Ignazio Gallo and Riccardo La Grassa},
  journal   = {Algorithms},
  publisher = {MDPI},
  year      = {2021},
}
```