- 🔥 2025/03/10: We have released our fitting code and fitted model parameters! Check them in the `code` and `data` folders.
- 🔥 2025/03/09: We have released our training logs! Access them on Wandb.
- 🔥 2025/03/08: All smooth loss heatmaps have been released on our homepage.
- 🔥 2025/03/08: We have launched an optimal hyperparameter tool for the community on our homepage.
- 🔥 2025/03/07: We have released our paper on arXiv: Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining.
- Paper
- Smooth loss heatmaps
- Thousands of training logs
- Fitting code
- Checkpoints
We present the first unified optimal hyperparameter scaling law, termed Step Law, which generalizes across diverse model shapes, architectures, and data distributions.
Our findings demonstrate remarkable accuracy, with estimated values on test sets deviating by only 0.09% from the globally optimal LLM performance identified through exhaustive search.
This research entailed a significant computational investment: nearly one million NVIDIA H800 GPU hours were used to train 3,700 LLMs of varying sizes and hyperparameters from scratch, consuming approximately 100 trillion tokens in total. To support reproducibility and advance research on LLM pre-training, we will progressively release all loss measurements and model checkpoints through our designated repository. A universal, plug-and-play optimal hyperparameter tool is also provided for the community.
This repository provides tools and data for predicting the optimal learning rate and batch size for LLM pretraining:
The `data` folder contains:
- Smooth loss results for both dense and MoE models (two CSV files)
- Structure and training configurations for each model
`data/1004_fitted_lr_bs_scaling_model_parameters.csv`: contains fitted model parameters from 1000 bootstrap fits for robust prediction of the optimal learning rate and batch size, where N is the number of model parameters and D is the training data size in tokens. The fitted model follows the form (an illustrative sketch of applying it follows):
- lr = exp(intercept) * N^coefN * D^coefD
- bs = exp(intercept) * D^coefD
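As a rough illustration of how the fitted parameters can be applied, the sketch below loads the bootstrap CSV, evaluates the power-law forms above for each bootstrap fit, and takes the median prediction. The column names (`lr_intercept`, `lr_coef_N`, `lr_coef_D`, `bs_intercept`, `bs_coef_D`) are assumptions for illustration only; check the CSV header and prefer the provided `fit_tool.py` for actual predictions.

```python
# Illustrative sketch only: the column names below are assumed, not taken from the repo.
# For real predictions, use: python code/fit_tool.py pred-opt-lr-bs ...
import numpy as np
import pandas as pd

def predict_opt_lr_bs(csv_path, n_params, n_tokens):
    """Median optimal LR and batch size (in tokens) over bootstrap fits.

    Assumes each CSV row holds one bootstrap fit with columns
    lr_intercept, lr_coef_N, lr_coef_D, bs_intercept, bs_coef_D.
    """
    df = pd.read_csv(csv_path)
    # lr = exp(intercept) * N^coefN * D^coefD
    lr = np.exp(df["lr_intercept"]) * n_params ** df["lr_coef_N"] * n_tokens ** df["lr_coef_D"]
    # bs = exp(intercept) * D^coefD
    bs = np.exp(df["bs_intercept"]) * n_tokens ** df["bs_coef_D"]
    return float(np.median(lr)), float(np.median(bs))

if __name__ == "__main__":
    lr, bs = predict_opt_lr_bs(
        "data/1004_fitted_lr_bs_scaling_model_parameters.csv",
        n_params=7e9,     # model size N
        n_tokens=1.4e12,  # training data D in tokens
    )
    print(f"optimal lr ~ {lr:.2e}, optimal batch size ~ {bs:.2e} tokens")
```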
We provide a simple command line tool to predict optimal learning rate and batch size based on your model parameters:
python code/fit_tool.py pred-opt-lr-bs [model_params] [data_in_token] [seq_len]
Parameters:
- `model_params`: Number of model parameters
- `data_in_token`: Training data size in tokens
- `seq_len`: Sequence length
Example (a 7B-parameter model trained on 1.4T tokens with sequence length 2048):
python code/fit_tool.py pred-opt-lr-bs 7e9 1.4e12 2048
We also provide a log-analysis tool that parses training logs and extracts smoothed loss measurements:
python code/log_analysis.py quick-check [base_dir] [dir_pattern] [--target-iter] [--max-cnt] [--pretty]
Parameters:
- `base_dir`: Base directory containing training logs
- `dir_pattern`: Regex pattern to match experiment directories
- `--target-iter`: Target iteration to analyze (optional)
- `--max-cnt`: Maximum number of log entries to process (default: 32768)
- `--pretty`: Print results in a pretty table format (optional)
Example (scan experiment directories under ./logs whose names match exp_.* and print a table):
python code/log_analysis.py quick-check ./logs "exp_.*" --pretty
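For readers who only want the general idea of a "smoothed loss" readout, the snippet below shows one simple way to compute it: an exponential moving average over (iteration, loss) pairs parsed from a log file. This is an illustration only; it is not the smoothing procedure implemented in `log_analysis.py`, and the log-line format it parses (e.g. `iter 100 loss 2.345`) is a made-up example.

```python
# Illustrative only: a toy EMA smoothing over (iteration, loss) pairs.
# The regex/log format here is hypothetical and unrelated to log_analysis.py.
import re

LOG_LINE = re.compile(r"iter\s+(\d+)\s+loss\s+([\d.]+)")  # hypothetical format

def smoothed_loss_at(log_path, target_iter, alpha=0.05):
    """Return an exponential-moving-average loss at (or just before) target_iter."""
    ema = None
    for line in open(log_path):
        m = LOG_LINE.search(line)
        if not m:
            continue
        it, loss = int(m.group(1)), float(m.group(2))
        if it > target_iter:
            break
        ema = loss if ema is None else alpha * loss + (1 - alpha) * ema
    return ema

if __name__ == "__main__":
    print(smoothed_loss_at("./logs/exp_demo/train.log", target_iter=10000))
```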
For more training details and experimental results, please refer to our Wandb page.
If you find our work helpful, please consider citing us :-)
@misc{li2025predictablescalei,
title={Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining},
author={Houyi Li and Wenzheng Zheng and Jingcheng Hu and Qiufeng Wang and Hanshan Zhang and Zili Wang and Yangshijie Xu and Shuigeng Zhou and Xiangyu Zhang and Daxin Jiang},
year={2025},
eprint={2503.04715},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2503.04715},
}