Implementation and evaluation of Synchronous SGD with All-Reduce, a data-parallel algorithm for training CNNs on image classification datasets.
Implemented in Python 3 with PyTorch and mpi4py.
Created as part of the seminar Advanced Topics in Parallel Computing 2019 (http://www.scc.kit.edu/en/teaching/11673.php).
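The sketch below illustrates the idea behind Synchronous SGD with All-Reduce. It is a minimal single-process simulation and not the project's implementation. Assumptions: workers are simulated as list entries in one Python process rather than MPI ranks, the model is a toy linear regression rather than a CNN, and the helper names (allreduce_mean, local_gradient, sync_sgd) are hypothetical. In each step, every worker computes a gradient on its own data shard, an all-reduce averages the gradients, and every replica applies the same update, so all replicas stay identical.

```python
def allreduce_mean(local_grads):
    """Simulated all-reduce (hypothetical helper, no MPI involved):
    sum the gradients element-wise across workers, divide by the world
    size, and hand every worker its own copy of the same result."""
    world_size = len(local_grads)
    dim = len(local_grads[0])
    summed = [sum(g[i] for g in local_grads) for i in range(dim)]
    mean = [s / world_size for s in summed]
    return [list(mean) for _ in range(world_size)]


def local_gradient(w, shard):
    """Gradient of the mean squared error of the toy model y = w0 + w1 * x
    on one worker's shard (stand-in for a CNN's backward pass)."""
    g0 = g1 = 0.0
    for x, y in shard:
        err = (w[0] + w[1] * x) - y
        g0 += 2 * err
        g1 += 2 * err * x
    n = len(shard)
    return [g0 / n, g1 / n]


def sync_sgd(data, world_size=4, lr=0.3, steps=500):
    """Synchronous data-parallel SGD, simulated in a single process."""
    # Data parallelism: each worker gets a disjoint shard
    # (equal-sized when len(data) is divisible by world_size).
    shards = [data[r::world_size] for r in range(world_size)]
    # All replicas start from identical parameters.
    replicas = [[0.0, 0.0] for _ in range(world_size)]
    for _ in range(steps):
        # 1. Each worker computes a gradient on its own shard.
        grads = [local_gradient(replicas[r], shards[r]) for r in range(world_size)]
        # 2. All-reduce: every worker receives the averaged gradient.
        averaged = allreduce_mean(grads)
        # 3. Every worker applies the same update, keeping replicas in sync.
        for r in range(world_size):
            replicas[r] = [w - lr * g for w, g in zip(replicas[r], averaged[r])]
    return replicas


if __name__ == "__main__":
    # Noise-free toy data from y = 3 + 2x.
    data = [(i / 40, 3 + 2 * (i / 40)) for i in range(40)]
    replicas = sync_sgd(data, world_size=4)
    print(replicas[0])  # values close to [3.0, 2.0]
```

With real MPI ranks, the simulated allreduce_mean would typically be replaced by mpi4py's comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM) on each gradient buffer, followed by division by comm.Get_size(). Because the shards have equal size, the averaged gradient equals the gradient over the combined data of all workers, which is why synchronous data parallelism behaves like single-worker SGD with a proportionally larger batch.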
The "implementation" subdirectory contains a separate README explaining how to run the code.
The "evaluation" subdirectory contains several R scripts that may be useful for evaluating the outputs generated during training. Note that, for now, a "device" and a "machine" column must be added manually to the results__*__summary files.
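The manual step of adding the "device" and "machine" columns can be scripted. The sketch below is hypothetical and not part of the project: it assumes the summary files are comma-separated text with a header row, that they live in a single directory and match the glob pattern results__*__summary*, and that one constant device/machine value per run is enough. The helper names (add_constant_columns, annotate_all) and any values you pass in are placeholders, since this README does not specify what the R scripts expect in those columns.

```python
import csv
import glob
import os


def add_constant_columns(path, device, machine, delimiter=","):
    """Append 'device' and 'machine' columns holding constant values to one file.
    ASSUMPTION: delimited text whose first row is the header."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter=delimiter))
    if not rows:
        return  # empty file: nothing to annotate
    header = rows[0]
    if "device" in header or "machine" in header:
        return  # already (partially) annotated: skip to avoid duplicate columns
    rows[0] = header + ["device", "machine"]
    for row in rows[1:]:
        if row:  # leave blank lines untouched
            row.extend([device, machine])
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=delimiter, lineterminator="\n").writerows(rows)


def annotate_all(directory, device, machine, delimiter=","):
    """Annotate every file matching the ASSUMED pattern results__*__summary*
    in `directory`; returns the list of matched paths."""
    pattern = os.path.join(directory, "results__*__summary*")
    paths = sorted(glob.glob(pattern))
    for path in paths:
        add_constant_columns(path, device, machine, delimiter)
    return paths
```

For example, annotate_all("path/to/results", device="gpu", machine="node01") would annotate one run's files, where the path and both values are placeholders. Running it twice is harmless because already-annotated files are skipped; pass delimiter="\t" if the files turn out to be tab-separated.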