This repo contains a portfolio of experiments and notes on NNs and DL. The aim is twofold:
- use visualisations to understand the latent space when a NN is training
- keep up to date with NN programming practices
The KL loss in a VAE encourages the encoder to map inputs to latents that are close to the prior distribution $\mathcal{N}(0, I)$. We can visualise the latent distribution of each class in the MNIST dataset by approximating it as a Mixture of Gaussians.
Figures: latent-space class distributions for the standard VAE (β = 1) and the β-VAE (β = 10).
Both VAEs have the same architecture with a 2D latent space, and were trained for a single epoch. In both cases, the model learns to separate the locations of the class distributions; however, there is significant overlap between the digits 4 and 9, which is to be expected. The shapes of the distributions are very similar in the β-VAE, which is due to the stronger KL loss.
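Concretely, the β-weighted objective behind both models is a reconstruction term plus β times the closed-form KL between the encoder's diagonal Gaussian and $\mathcal{N}(0, I)$. The sketch below is illustrative rather than the repo's exact code; the function name and the binary cross-entropy reconstruction term are assumptions.

```python
# Minimal sketch of a beta-VAE loss, assuming the encoder outputs (mu, logvar)
# for a diagonal Gaussian and the decoder outputs values in [0, 1].
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, beta=1.0):
    # Reconstruction term: summed binary cross-entropy over the batch
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # beta = 1 gives the standard VAE; beta = 10 gives the stronger KL penalty
    return recon + beta * kl
```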
Visualising a hidden layer with 3 nodes - each node has its own axis (NN layers: {784, 10, 3, 10}). As the epoch count increases, the learnt weights push each digit class towards a corner. Unsurprisingly, digits 4 and 9 have significant overlap! See this folder for the implementation.

In classification tasks, a NN learns weights that create simple decision boundaries to separate classes in the latent space.
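One way to collect the 3-node hidden activations for such a plot is a forward hook on that layer. The following is a minimal sketch, not the folder's actual implementation; the `nn.Sequential` model and the hook index are assumptions based on the stated layer sizes {784, 10, 3, 10}.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 10), nn.ReLU(),
    nn.Linear(10, 3),   nn.ReLU(),   # the 3-node layer we visualise
    nn.Linear(3, 10),
)

activations = []

def hook(module, inputs, output):
    # Store a detached copy of the 3-dimensional hidden representation
    activations.append(output.detach().cpu())

model[3].register_forward_hook(hook)  # hook the ReLU after the 3-unit layer

with torch.no_grad():
    model(torch.randn(64, 784))       # stand-in for a batch of MNIST images

hidden = torch.cat(activations)       # shape (64, 3): one axis per node
```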
Weights learnt for each digit in a NN with no hidden layer. This is equivalent to applying a mask / linear transformation. See this notebook for the implementation.

When there is no non-linearity in the NN, the weights are equivalent to a single linear transformation. In the case of classification, intuitively, we are applying a mask to the input.
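As a concrete illustration of the mask view, the sketch below (hypothetical, not the notebook's code) takes a single `nn.Linear(784, 10)` and reshapes each row of its weight matrix into a 28×28 image, one template per digit.

```python
import torch.nn as nn

linear = nn.Linear(784, 10)   # logits = W x + b, one row of W per digit class
# Each row of W, reshaped to 28x28, is the "mask" applied to the input image
masks = linear.weight.detach().reshape(10, 28, 28)
# e.g. plt.imshow(masks[3]) would show the template learnt for the digit 3
```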
- README.md
- experiments/ - NN PyTorch class, training experiments
- notes/ - markdown notes for experiments and theory
- resources/ - stores datasets, models and figures
The notes/ folder contains markdown notes on the relevant NN theory required for the experiments. It also contains notes and exercises from the book Neural Networks and Deep Learning.
For each chapter, I have written some notes and answers to most exercises / problems.
This is a WIP; I have yet to do the later chapters. I also aim to cover the following topics:
- Notes on Activation Functions
  - Swish, softplus (for the VAE to predict variance)
- Regularisation: L1, L2, Dropout, Continual Backprop
- Grid search over batch size and learning rate using hydra (a sketch follows this list)
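Purely as a placeholder for the planned hydra grid search, here is a hedged sketch of what the entry point might look like; the config file layout and the field names (`cfg.lr`, `cfg.batch_size`) are assumptions.

```python
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hypothetical training entry point driven entirely by the hydra config
    print(f"training with lr={cfg.lr}, batch_size={cfg.batch_size}")

if __name__ == "__main__":
    main()

# A grid search is then a multirun sweep over the config fields, e.g.:
#   python train.py -m lr=1e-2,1e-3,1e-4 batch_size=32,64,128
```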
This section documents the best practices I have learnt for programming NNs.
```python
def train_loop(self, dataloader, epochs, device):
    self.model.train()                 # set training mode once, up front
    for epoch in range(epochs):
        for batch in dataloader:
            # 1. Move data to device
            batch = batch.to(device)
            # 2. Zero gradients
            self.optimizer.zero_grad()
            # 3. Forward pass
            output = self.model(batch)
            # 4. Compute loss
            loss = self.loss_function(output, batch)
            # 5. Backward pass
            loss.backward()
            # 6. Update weights
            self.optimizer.step()
```
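The excerpt stops at the training loop; a matching evaluation step usually switches to eval mode and disables gradients. The sketch below is an assumption about how that could look in the same class (it reuses the `self.model`, `self.loss_function` and `device` names from above and assumes `torch` is imported).

```python
def eval_loop(self, val_loader, device):
    self.model.eval()                  # disable dropout / batch-norm updates
    total = 0.0
    with torch.no_grad():              # no gradients needed for validation
        for batch in val_loader:
            batch = batch.to(device)
            output = self.model(batch)
            total += self.loss_function(output, batch).item()
    return total / len(val_loader)     # mean validation loss per batch
```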
Configuring NNs and storing (hyper)parameters
```python
from dataclasses import dataclass
from pathlib import Path

from omegaconf import OmegaConf


@dataclass
class ModelConfig:
    # model parameters
    hidden_dim: int = 128
    # training parameters
    lr: float = 1e-3
    batch_size: int = 32

    def save_config(self, path: Path):
        # Build a structured config from the dataclass instance and save it as YAML
        OmegaConf.save(OmegaConf.structured(self), path)

    @staticmethod
    def load_config(path: Path) -> "ModelConfig":
        conf = OmegaConf.load(path)
        return ModelConfig(**conf)
```
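A possible round trip with this config class (the YAML path is illustrative):

```python
cfg = ModelConfig(hidden_dim=256)
cfg.save_config(Path("resources/model_config.yaml"))
loaded = ModelConfig.load_config(Path("resources/model_config.yaml"))
assert cfg == loaded   # dataclass equality compares all fields
```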
For logging, see base.py.
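base.py is not reproduced here; purely as an illustration, a minimal TensorBoard-style logging setup could look like the following (the log directory, tag and loss values are stand-ins).

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="resources/runs")       # illustrative directory
for step, loss_value in enumerate([0.9, 0.7, 0.5]):    # stand-in loss values
    writer.add_scalar("loss/train", loss_value, global_step=step)
writer.close()
```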
A list of potential avenues to explore:
- pytorch-lightning
- hydra
- fire