MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks

• 🤗 Data • 🤗 Model • 🐱 Code • 📃 Paper

Large Language Models (LLMs) have showcased impressive capabilities in handling straightforward programming tasks, but their performance tends to falter on more challenging problems. We observe that conventional models often generate solutions as monolithic code blocks, which limits their effectiveness on intricate questions. To overcome this limitation, we present Modular-of-Thought Coder (MoTCoder), a framework for MoT instruction tuning that encourages the decomposition of tasks into logical sub-tasks and sub-modules. Our investigations reveal that, by cultivating and using such sub-modules, MoTCoder significantly improves both the modularity and the correctness of the generated solutions, leading to substantial relative pass@1 improvements of 12.9% on APPS and 9.43% on CodeContests.
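
For illustration, a modular-of-thought solution first defines small, named sub-modules and then composes them, instead of emitting one monolithic block. The toy problem and function names below are hypothetical and only sketch the output style MoTCoder is tuned toward:

```python
# Hypothetical task: count the distinct prime factors of each number in a list.

def prime_factors(n: int) -> set:
    """Sub-module 1: return the set of prime factors of n."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def count_distinct_prime_factors(numbers: list) -> list:
    """Sub-module 2: apply the helper to every input number."""
    return [len(prime_factors(n)) for n in numbers]

if __name__ == "__main__":
    # Main program: compose the sub-modules to solve the task.
    print(count_distinct_prime_factors([12, 7, 60]))  # -> [2, 1, 3]
```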

Performance

Performance on APPS

| Model | Size | Introductory | Interview | Competition | All |
|---|---|---|---|---|---|
| CodeT5 | 770M | 6.60 | 1.03 | 0.30 | 2.00 |
| CodeRL+CodeT5 | 770M | 7.08 | 1.86 | 0.75 | 2.69 |
| text-davinci-002 | - | - | - | - | 7.48 |
| Self-edit+text-davinci-002 | - | - | - | - | 7.94 |
| GPT-2 | 0.1B | 5.64 | 6.93 | 4.37 | 6.16 |
| GPT-2 | 1.5B | 7.40 | 9.11 | 5.05 | 7.96 |
| GPT-Neo | 2.7B | 14.68 | 9.85 | 6.54 | 10.15 |
| GPT-3 | 175B | 0.57 | 0.65 | 0.21 | 0.55 |
| StarCoder | 15B | 7.25 | 6.89 | 4.08 | 6.40 |
| WizardCoder | 15B | 26.04 | 4.21 | 0.81 | 7.90 |
| CodeChain+WizardCoder | 15B | 26.29 | 7.49 | 3.75 | 10.50 |
| Octocoder | 16B | 16.50 | 7.92 | 4.61 | 8.97 |
| Codellama | 7B | 14.15 | 6.63 | 4.00 | 7.61 |
| Codellama | 13B | 23.94 | 13.50 | 9.80 | 14.85 |
| Codellama | 34B | 32.01 | 18.61 | 10.19 | 19.61 |
| Codellama-Python | 7B | 18.83 | 8.62 | 4.47 | 9.83 |
| Codellama-Python | 13B | 26.40 | 13.44 | 6.86 | 14.72 |
| Codellama-Python | 34B | 26.45 | 16.61 | 8.77 | 17.01 |
| Codellama-Instruct | 7B | 14.20 | 6.63 | 4.43 | 7.70 |
| Codellama-Instruct | 13B | 22.41 | 14.34 | 6.62 | 15.21 |
| Codellama-Instruct | 34B | 28.64 | 16.80 | 10.51 | 17.91 |
| Deepseek-Coder-Base | 6.7B | 40.23 | 22.12 | 13.04 | 23.92 |
| Deepseek-Coder-Instruct | 6.7B | 44.65 | 23.86 | 12.89 | 25.83 |
| MoTCoder | 6.7B | 50.01 | 29.81 | 14.36 | 30.76 |
| GPT-4 | - | 34.97 | 13.75 | 14.52 | 18.15 |

Performance on CodeContests

| Model | Size | pass@1 | pass@5 |
|---|---|---|---|
| code-davinci-002 | - | 1.00 | - |
| code-davinci-002 + CodeT | - | 3.20 | - |
| WizardCoder | 15B | 1.98 | 3.27 |
| WizardCoder + CodeChain | 15B | 2.48 | 3.30 |
| Octocoder | 16B | 4.95 | 13.03 |
| Codellama | 7B | 0.30 | 1.11 |
| Codellama | 13B | 2.12 | 6.26 |
| Codellama | 34B | 5.35 | 12.02 |
| Codellama-Python | 7B | 4.75 | 10.30 |
| Codellama-Python | 13B | 4.75 | 12.32 |
| Codellama-Python | 34B | 5.86 | 14.85 |
| Codellama-Instruct | 7B | 2.12 | 6.26 |
| Codellama-Instruct | 13B | 5.96 | 12.02 |
| Codellama-Instruct | 34B | 6.46 | 14.24 |
| Deepseek-Coder-Base | 6.7B | 6.46 | 15.25 |
| Deepseek-Coder-Instruct | 6.7B | 6.87 | 8.18 |
| MoTCoder | 6.7B | 9.29 | 16.97 |
| GPT-4 | - | 16.36 | - |
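
For reference, pass@k is conventionally computed with the unbiased estimator used in HumanEval-style evaluation; the snippet below is a minimal sketch of that formula (assuming n sampled solutions per problem, of which c pass all tests) and may differ from the exact implementation in the evaluation scripts:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 20 samples per problem, 3 of them correct -> estimated pass@5
print(pass_at_k(n=20, c=3, k=5))
```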

Environment

Install the dependencies.

python -m pip install -e .

Evaluation Datasets

APPS Dataset

The APPS dataset [github] can be downloaded from huggingface.

CodeContests Dataset

The CodeContests dataset [github] can be downloaded from huggingface. For CodeContests, convert your dataset to the same format as APPS so that the APPS evaluation metrics can be used:

python src/convert_codecontest_dataset.py $SRC_DIR $DST_DIR
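
As a rough idea of what the conversion produces, the sketch below maps one CodeContests record onto APPS-style fields; the field names follow the public deepmind/code_contests and codeparrot/apps layouts and are assumptions about, not a copy of, the script's actual logic:

```python
import json

def to_apps_format(example: dict) -> dict:
    """Illustrative mapping of a CodeContests record to APPS-style fields."""
    tests = example["public_tests"]  # parallel lists of test inputs and outputs
    return {
        "question": example["description"],
        "solutions": json.dumps(list(example["solutions"]["solution"])),
        "input_output": json.dumps({
            "inputs": list(tests["input"]),
            "outputs": list(tests["output"]),
        }),
        "starter_code": "",
        "difficulty": "competition",  # placeholder label
    }
```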

Inference

You can download our MoTCoder model for evaluation from huggingface. We provide the inference commands to reproduce the results in our paper.

  • If you want to use the modular-of-thought inference prompt, set prompt_type=FORMAT_PROMPT.
  • If you want to use the normal inference prompt, set prompt_type=NORMAL_FORMAT_PROMPT.

First, generate solutions for your target evaluation dataset.

Choice 1: VLLM (Recommended)

To install the requirements:

pip install vllm

Inference:

python src/inference_vllm.py \
    --model_path $model_path \
    --data_path $data_path \
    --solution_path $solution_path \
    --prompt_type $prompt_type
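
For reference, the script essentially performs batched generation with vLLM; below is a minimal standalone sketch (the model path and prompt are placeholders, and the real script builds prompts from the dataset and $prompt_type):

```python
from vllm import LLM, SamplingParams

model_path = "path/to/MoTCoder"  # placeholder: local path or Hugging Face model id
prompts = ["Write a Python program that reads an integer n and prints n factorial."]

llm = LLM(model=model_path)
params = SamplingParams(temperature=0.0, max_tokens=1024)

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```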

Choice 2: transformers

Inference:

python src/inference.py \
    $model_path \
    $data_path \
    $solution_path \
    $prompt_type
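
An equivalent minimal sketch with plain transformers, again with a placeholder model path and prompt:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/MoTCoder"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a Python program that reads an integer n and prints n factorial."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```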

APPS Evaluation

For APPS evaluation, the choices of $level include introductory, interview, and competition.

python src/test_leetcode.py \
    --solutions_path $solution_path \
    --data_path $data_path \
    --save_path $result_path \
    --level $level

CodeContests Evaluation

python src/test_apps.py \
    --solutions_path $solution_path \
    --data_path $data_path \
    --save_path $result_path

Training

Modular-of-Thought Training Dataset

We provide an example Python script to evolve a MoT dataset. Run the following command:

python src/generate_MoT_dataset.py \
    --data_path $data_path \
    --save_path $MoT_data_path \
    --api_base $api_base \
    --api_key $api_key
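
Conceptually, the generation step asks an OpenAI-compatible endpoint to rewrite each flat solution into named sub-modules. The sketch below illustrates that idea; the client usage, model name, and prompt wording are assumptions rather than the script's exact code:

```python
import os
from openai import OpenAI

# Placeholders for the same endpoint and key passed as $api_base / $api_key above.
client = OpenAI(base_url=os.environ.get("API_BASE"), api_key=os.environ.get("API_KEY"))

def to_modular_solution(question: str, solution: str) -> str:
    """Ask the model to rewrite a monolithic solution into modular sub-functions."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: any chat model served by the endpoint
        temperature=0.0,
        messages=[{
            "role": "user",
            "content": (
                "Rewrite the solution to the problem below so that it is organized "
                "into small, well-named helper functions (sub-modules) plus a main "
                f"function.\n\nProblem:\n{question}\n\nSolution:\n{solution}"
            ),
        }],
    )
    return response.choices[0].message.content
```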

MoTCode Dataset

Alternatively, you can download our generated modular-of-thought code dataset:

from datasets import load_dataset
dataset = load_dataset("JingyaoLi/MoTCode-Data")

Modular-of-Thought Training

Run the following command to train the model:

deepspeed src/train.py \
    --model_name_or_path $model_path \
    --data_path $MoT_data_path \
    --output_dir $output_dir \
    --num_train_epochs 3 \
    --model_max_length 2048 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --warmup_steps 30 \
    --logging_steps 2 \
    --lr_scheduler_type "cosine" \
    --report_to "tensorboard" \
    --gradient_checkpointing True \
    --deepspeed configs/deepspeed_config.json \
    --fp16 True \
    --prompt_type FORMAT_PROMPT
