NanoFormer

NanoFormer is a lightweight transformer model implementation designed for efficient training and inference. It features grouped query attention (GQA) and various architectural optimizations.

Features

  • Configurable transformer architecture with GQA support (see the sketch after this list)
  • Dynamic batch size handling with efficient padding
  • Mixed precision training (bfloat16)
  • Gradient checkpointing for memory efficiency
  • Gradient accumulation support
  • Wandb integration for experiment tracking
  • Automatic model checkpointing
  • Custom training loop with validation
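
Grouped query attention shares a small number of key/value heads across a larger number of query heads, shrinking the KV cache roughly in proportion. The snippet below is a minimal, self-contained PyTorch sketch of that idea; the function name and tensor shapes are illustrative and not taken from this repo's code:

import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, num_q_heads, seq_len, head_dim)
    # k, v: (batch, num_kv_heads, seq_len, head_dim), with num_q_heads a multiple of num_kv_heads
    repeats = q.shape[1] // k.shape[1]
    # Share each KV head across a group of `repeats` query heads
    k = k.repeat_interleave(repeats, dim=1)
    v = v.repeat_interleave(repeats, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# 8 query heads attending over 2 shared KV heads
q = torch.randn(1, 8, 16, 32)
k = torch.randn(1, 2, 16, 32)
v = torch.randn(1, 2, 16, 32)
out = grouped_query_attention(q, k, v)  # shape (1, 8, 16, 32)

Setting the number of KV heads equal to the number of query heads recovers standard multi-head attention; setting it to 1 recovers multi-query attention.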

Installation

git clone https://github.com/Datta0/nanoformer.git
cd nanoformer

Usage

Training

To train the model with default parameters:

python train.py \
    --dataset "imdatta0/wikipedia_en_sample" \
    --batch_size 8 \
    --gradient_accumulation_steps 16 \
    --num_epochs 1 \
    --lr 5e-4 \
    --hidden_dim 256 \
    --num_hidden_layers 8
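
With the defaults above, each optimizer update accumulates batch_size * gradient_accumulation_steps = 8 * 16 = 128 sequences. The loop below is an illustrative, self-contained sketch of how gradient accumulation is typically wired up (stand-in model, optimizer, and data; NanoFormer's actual loop is driven by train.py):

import torch
from torch import nn

# Stand-in model, optimizer, and data for illustration only
model = nn.Linear(256, 256)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
batches = [torch.randn(8, 256) for _ in range(32)]  # 8 = per-step --batch_size

accumulation_steps = 16  # matches --gradient_accumulation_steps
optimizer.zero_grad()
for step, x in enumerate(batches):
    # Scale the loss so gradients average over the accumulated micro-batches
    loss = model(x).pow(2).mean() / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()   # one update per 8 * 16 = 128 sequences
        optimizer.zero_grad()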

To estimate the number of tokens in a dataset and the model's parameter count for a given config, add the --estimate flag (note: this currently instantiates the model just to count parameters and will need refactoring to avoid that):

python train.py \
    --dataset "imdatta0/wikipedia_en_sample" \
    --batch_size 8 \
    --gradient_accumulation_steps 16 \
    --num_epochs 1 \
    --lr 5e-4 \
    --hidden_dim 256 \
    --num_hidden_layers 8 \
    --estimate
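
For reference, the parameter count that --estimate reports is typically just a sum over the model's parameter tensors. The helper below is an illustrative stand-alone example, not code from this repo:

from torch import nn

def count_parameters(model: nn.Module) -> int:
    # Sum of element counts over all trainable parameter tensors
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Stand-in module for illustration; NanoFormer's own model classes live in this repo
print(count_parameters(nn.Linear(256, 256)))  # 65792 = 256*256 weights + 256 biases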

TODO

  • Implement Differential Transformer
  • Implement nGPT
  • Implement custom optimisers like Shampoo, SOAP and whatnot
  • Add support for Sliding Window Attention
  • Modify configs to be closer to Chinchilla Optimal Ratios
