INTELLECT-MATH: Frontier Mathematical Reasoning through Better Initializations for Reinforcement Learning
🔗 Blog Post • 🐦 X / Twitter • 🤗 Weights & Data
This repository contains code for reproducing INTELLECT-MATH, a frontier 7B parameter model for mathematical reasoning.
INTELLECT-MATH was trained in two stages: an initial supervised fine-tuning (SFT) stage and a second online reinforcement learning stage based on Process Reinforcement through Implicit Rewards (PRIME).
By generating our SFT dataset with a strong teacher model like QwQ-32B, we can provide a better policy initialization for online reinforcement learning. This way, we can match the performance of the existing SOTA model Eurus-2-7B-Prime with 10x less time spent on reinforcement learning, and outperform the model with further training.
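For context, PRIME derives token-level process rewards implicitly from a reward model trained only on outcome labels: the per-token reward is a scaled log-likelihood ratio between the implicit PRM $\pi_\phi$ and a reference model $\pi_{\text{ref}}$. The formula below follows the PRIME paper's notation and is a summary for the reader, not code from this repository:

```latex
r_\phi(y_t) \;=\; \beta \,\log \frac{\pi_\phi(y_t \mid \mathbf{y}_{<t})}{\pi_{\text{ref}}(y_t \mid \mathbf{y}_{<t})}
```

A stronger SFT initialization gives this online RL stage a better starting policy, which is why the table below compares both the SFT checkpoints and the RL-trained checkpoints.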
| Benchmark | Intellect-Math (Step 255) | Intellect-Math (Step 47) | Eurus-2-Prime (Step 592) | Intellect-Math-SFT | Eurus-2-SFT | Qwen-2.5-Math |
|---|---|---|---|---|---|---|
| MATH-500 | 82.0 | 81.6 | 79.2 | 72.8 | 65.1 | 79.8 |
| OlympiadBench | 49.5 | 46.7 | 42.1 | 39.1 | 29.8 | 40.7 |
| AIME 2024 | 26.7 | 26.7 | 26.7 | 16.6 | 3.3 | 13.3 |
| AMC | 60.2 | 57.8 | 57.8 | 45.8 | 30.1 | 50.6 |
| Minerva Math | 39.7 | 37.8 | 38.6 | 33.8 | 32.7 | 34.6 |
| AVG | 51.6 | 50.1 | 48.9 | 41.6 | 32.2 | 43.8 |
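The AVG row is the unweighted mean of the five benchmark scores, rounded to one decimal place. A quick sanity check (the score lists below are copied from the table, not read from repository code):

```python
# Per-benchmark scores copied from the table above, in order:
# MATH-500, OlympiadBench, AIME 2024, AMC, Minerva Math
scores = {
    "Intellect-Math (Step 255)": [82.0, 49.5, 26.7, 60.2, 39.7],
    "Eurus-2-Prime (Step 592)": [79.2, 42.1, 26.7, 57.8, 38.6],
}

for model, s in scores.items():
    avg = round(sum(s) / len(s), 1)
    print(f"{model}: {avg}")  # 51.6 and 48.9, matching the AVG row
```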
You can reproduce INTELLECT-MATH step by step using the following code:
To generate our SFT dataset, we used QwQ-32B to sample two responses for every question in the NuminaMath dataset. To achieve high throughput and large batch sizes, we use the SGLang inference engine. Keeping only the correct responses leaves us with PrimeIntellect/INTELLECT-MATH-7B-SFT-Data, a dataset of 733k questions and responses.
You can use the code in `synthetic-data` to generate an SFT dataset:

```bash
cd synthetic-data

# install requirements
pip install -r requirements.txt

# generate data, sampling two responses per question
python generate.py --num_responses_per_question 2
```
We used open-instruct to fine-tune Qwen/Qwen2.5-Math-7B into PrimeIntellect/INTELLECT-MATH-SFT. The code can be found in `sft`. Follow the README inside the folder to set up your environment; then you can reproduce our SFT model using the script `intellect-math-scripts/train_intellect_math_7b_sft.sh`.
Finally, you need to further train your SFT model with reinforcement learning using PRIME-RL. This last step was used to train PrimeIntellect/INTELLECT-MATH from PrimeIntellect/INTELLECT-MATH-SFT. The code for this, along with evals, can be found in `rl-and-evals`. To reproduce our model, you can use the script `rl-and-evals/training/intellect-math-scripts/train_intellect_math_7b.sh`.