- Overview
- Deep Learning (2016) Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Deep Learning in Neural Networks: An Overview (2014) Jürgen Schmidhuber
- The perceptron: a probabilistic model for information storage and organization in the brain (1958) F. Rosenblatt
- Multilayer Feedforward Networks are Universal Approximators (1989) K. Hornik, M. Stinchcombe, H. White
- Deep Big Simple Neural Nets Excel on Hand-written Digit Recognition (2010) Dan Claudiu Cireşan, Ueli Meier, Luca Maria Gambardella, Jürgen Schmidhuber
- GMDH Group method of data handling (Website, Wiki)
- Polynomial Theory of Complex Systems (1971) Ivakhnenko A.G.
- The Review of Problems Solvable by Algorithms of the Group Method of Data Handling (1995) Ivakhnenko A.G., Ivakhnenko G.A.
- Binarized Neural Networks
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 (2016) Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio
- How to Train a Compact Binary Neural Network with High Accuracy? (2017) Wei Tang, Gang Hua, Liang Wang
- One of the papers on convolutional nets - [Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position](https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf) (1980) K. Fukushima
- A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects (2020) Zewen Li, Wenjie Yang, Shouheng Peng, Fan Liu
- Flexible, High Performance Convolutional Neural Networks for Image Classification (2011) Dan C. Cireşan, Ueli Meier, Jonathan Masci, Luca M. Gambardella, Jürgen Schmidhuber
- Boltzmann machines
- Learning and relearning in Boltzmann machines (1986) G. E. Hinton, T. J. Sejnowski
- LSTM
- Long Short-term Memory (1997) S. Hochreiter, J. Schmidhuber
- Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures (2005) Alex Graves, Jürgen Schmidhuber
- Competitive learning
- Feature Discovery by Competitive Learning (1985) David E. Rumelhart, David Zipser
- Autoencoders
- Modular learning in neural networks (1987) D.H. Ballard
- Extracting and composing robust features with denoising autoencoders (2008) P. Vincent, H. Larochelle, Y. Bengio, P.A. Manzagol
- From Deep Learning book - Autoencoders (ch. 14) (2016) Ian Goodfellow, Yoshua Bengio, Aaron Courville
- An Introduction to Variational Autoencoders (2019) Diederik P. Kingma, Max Welling
- Contractive Auto-Encoders: Explicit Invariance During Feature Extraction (2011) S. Rifai, P. Vincent, X. Muller, X. Glorot, Y. Bengio
- Deep AutoRegressive Networks (2014) Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra
- Denoising Autoencoders
- VAE Variational autoencoders
- Auto-Encoding Variational Bayes (2014) Diederik P Kingma, Max Welling
- Tutorial on Variational Autoencoders (2016) Carl Doersch
- Variational Autoencoder for Deep Learning of Images, Labels and Captions (2016) Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, Lawrence Carin
- SOM Self-organizing maps
- Cresceptron (Max-Pooling layers)
- Cresceptron: A Self-organizing Neural Network Which Grows Adaptively (1992) John (Juyang) Weng, Narendra Ahuja, Thomas S. Huang
- Generative Adversarial Networks (2014) Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
- Time-series Generative Adversarial Networks (2019) J. Yoon, D. Jarrett, M. van der Schaar
- Conditional GAN
- Probabilistic Forecasting of Sensory Data with Generative Adversarial Networks (2019) A. Koochali, P. Schichtel, S. Ahmed, A. Dengel
- A Practical Bayesian Framework for Backpropagation Networks (1992) David J. C. MacKay
- Bayesian Learning for Neural Networks (1995) R.M. Neal
- Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks (1995) David J. C. MacKay
- Practical Variational Inference for Neural Networks (2011) Alex Graves
- Weight Uncertainty in Neural Networks (2015) Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (2016) Y. Gal, Z. Ghahramani
- Stochastic Gradient Descent as Approximate Bayesian Inference (2017) S. Mandt, M.D. Hoffman, D.M. Blei
- Deep neural networks as Gaussian Processes (2018) Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein
- Noisy Natural Gradient as Variational Inference (2018) Guodong Zhang, Shengyang Sun, David Duvenaud, Roger Grosse
- Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam (2018) Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava
- Understanding Priors in Bayesian Neural Networks at the Unit Level (2019) Mariia Vladimirova, Jakob Verbeek, Pablo Mesejo, Julyan Arbel
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization (2020) Andrew Gordon Wilson, Pavel Izmailov
- Based on Random Access Memory (RAM) nodes
- Advances in Weightless Neural Systems (2014) F.M.G. França, M. De Gregorio, P.M.V. Lima, W.R. de Oliveira
- WiSARD
- PLN Probabilistic Logic Nodes
- GSN Goal Seeking Neurons
- GRAM
- Sigmoid
- HardSigmoid
- SiLU, dSiLU
- Tanh, HardTanh
- Softmax
- Softplus
- Softsign
- ReLU Rectified Linear Unit
- Rectified Linear Units Improve Restricted Boltzmann Machines (2010) V. Nair, G.E. Hinton
- Deep Sparse Rectifier Neural Networks (2011) X. Glorot, A. Bordes, Y. Bengio
- LReLU Leaky ReLU
- Rectifier Nonlinearities Improve Neural Network Acoustic Models (2013) A.L. Maas, A.Y. Hannun, A.Y. Ng
- PReLU Parametric ReLU
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (2015) Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- RReLU Randomized ReLU
- Empirical Evaluation of Rectified Activations in Convolutional Network (2015) Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li
- SReLU
- ELU
- Fast and Accurate Deep Network Learning by Exponential Linear Units (2015) Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter
- PELU
- Parametric Exponential Linear Unit for Deep Convolutional Neural Network (2016) L. Trottier, P. Giguère, B. Chaib-draa
- SELU
- Maxout
- Mish
- Mish: A Self Regularized Non-Monotonic Neural Activation Function (2019) Diganta Misra
- Swish
- ELiSH
- HardELiSH
- Weight guessing
- Vanishing gradient problem (Wiki)
- Double descent
- Deep Double Descent: Where Bigger Models and More Data Hurt (2019) Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever
- BP Back-propagation
- Learning representations by back-propagating errors (1986) David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
- Backpropagation Applied to Handwritten Zip Code Recognition (1989) Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel
- Pruning - reduces computational cost and can improve generalization
- Optimal Brain Damage (1990) Yann Le Cun, John S. Denker, Sara A. Solla
- Learning both Weights and Connections for Efficient Neural Networks (2015) Song Han, Jeff Pool, John Tran, William J. Dally
- Pruning Convolutional Neural Networks for Resource Efficient Inference (2017) Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz
- Learning Sparse Neural Networks through L0 Regularization (2018) Christos Louizos, Max Welling, Diederik P. Kingma
- Pretraining
- Why Does Unsupervised Pre-training Help Deep Learning? (2010) D. Erhan, Y. Bengio, A. Courville, P.A. Manzagol, P. Vincent, S. Bengio
- Dropout
- Improving neural networks by preventing co-adaptation of feature detectors (2012) G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov
- Adaptive dropout for training deep neural networks (2013) L.J. Ba, B. Frey
- The Dropout Learning Algorithm (2014) P. Baldi, P. Sadowski
- Fast dropout training (2013) S.I. Wang, C.D. Manning
- Knowledge Distillation
- A large neural network (the teacher) transfers its knowledge to a smaller network (the student)
- Neural Network Pruning
- Removing unimportant weights
- Quantization
- Reducing the number of bits used to store the weights
- Software
- KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization (2020) Het Shah, Avishree Khare, Neelay Shah, Khizir Siddiqui
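
The three compression ideas listed above (distillation, pruning, quantization) can each be illustrated in a few lines. The following is a minimal pure-Python sketch, not the KD-Lib API: all function names are hypothetical, the temperature-softened cross-entropy is one common way to write a distillation loss, and weight magnitude is assumed as the measure of "unimportant" for pruning.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: magnitude is used as the importance criterion.
    """
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_uniform(weights, bits=8):
    """Symmetric uniform quantization to signed integers plus one float scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -0.1, 0.03, -0.8]
    print(magnitude_prune(w, sparsity=0.5))
    codes, scale = quantize_uniform(w, bits=8)
    print(codes, dequantize(codes, scale))
    print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

In practice these steps are applied to whole tensors inside a training framework, and pruning or quantization is usually followed by fine-tuning to recover accuracy; the sketch only shows the arithmetic on a flat list of weights.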
- Neural Network Ensembles (1990) L. K. Hansen, P. Salamon
- When Networks Disagree: Ensemble Methods for Hybrid Neural Networks (1993) M.P. Perrone, L.N. Cooper
- Neural Network Ensembles, Cross Validation, and Active Learning (1995) A. Krogh, J. Vedelsby
- When Ensembling Smaller Models is More Efficient than Single Large Models (2020) D. Kondratyuk, M. Tan, M. Brown, B. Gong