# Simple DiLoCo

This repo contains a minimal, reproducible PyTorch example of the "DiLoCo: Distributed Low-Communication Training of Language Models" approach in 180 lines of code.
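For orientation, here is a minimal sketch of a DiLoCo-style inner/outer training loop. It is not the repo's exact code: `loss_fn`, the data loader, and the outer learning rate/momentum (0.7 / 0.9, the paper's defaults) are placeholders and assumptions. Each replica runs `local_steps` AdamW steps locally, then the replicas all-reduce their parameter deltas ("pseudo-gradients") and apply one Nesterov-momentum SGD outer step.

```python
import torch
import torch.distributed as dist


def diloco_train(model, data_loader, loss_fn, local_steps, outer_steps,
                 inner_lr=1e-3, outer_lr=0.7):
    """One-replica-per-process DiLoCo loop (the process group is assumed
    to have been initialized already, e.g. by torchrun + init_process_group)."""
    # Inner optimizer: per-replica AdamW, run with no communication.
    inner_opt = torch.optim.AdamW(model.parameters(), lr=inner_lr)

    # Outer optimizer: Nesterov momentum SGD applied to the averaged
    # pseudo-gradients; 0.7 / 0.9 are the paper's defaults, assumed here.
    global_params = [p.detach().clone() for p in model.parameters()]
    outer_opt = torch.optim.SGD(global_params, lr=outer_lr,
                                momentum=0.9, nesterov=True)

    data_iter = iter(data_loader)
    for _ in range(outer_steps):
        # Inner phase: local_steps optimizer steps on this replica's data.
        for _ in range(local_steps):
            batch = next(data_iter)
            loss = loss_fn(model, batch)  # placeholder loss computation
            loss.backward()
            inner_opt.step()
            inner_opt.zero_grad()

        # Outer phase: a single communication round every local_steps steps.
        for p, g in zip(model.parameters(), global_params):
            # Pseudo-gradient: how far this replica drifted from the global params.
            g.grad = g.data - p.data
            dist.all_reduce(g.grad, op=dist.ReduceOp.SUM)
            g.grad /= dist.get_world_size()
        outer_opt.step()
        outer_opt.zero_grad()

        # Reset every replica to the freshly updated global parameters.
        with torch.no_grad():
            for p, g in zip(model.parameters(), global_params):
                p.copy_(g)
```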

## How to run the code

First, install the dependencies:

```bash
pip install -r requirements.txt
```

### Start a run

**1 DiLoCo replica worker**

```bash
torchrun --nproc_per_node=1 pure_torch_diloco.py --per-device-train-batch-size 16 --batch-size 256 --lr 1e-3 --warmup-steps 50 --local-steps 10
```

**2 DiLoCo replica workers**

```bash
torchrun --nproc_per_node=2 pure_torch_diloco.py --per-device-train-batch-size 16 --batch-size 256 --lr 1e-3 --warmup-steps 50 --local-steps 10
```
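A note on the batch-size flags, under the common convention (an assumption about this script, not verified from its code) that `--batch-size` is the effective batch per inner step reached via gradient accumulation over micro-batches of `--per-device-train-batch-size`:

```python
# Hypothetical illustration of the usual relationship between the two flags
# (not taken from pure_torch_diloco.py itself).
per_device_train_batch_size = 16   # samples per forward/backward pass
batch_size = 256                   # effective batch size per inner step
grad_accumulation_steps = batch_size // per_device_train_batch_size  # = 16
```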