- Please download the training data (Table 1 of the paper) from the official website
- Run generate_testing_data.py to obtain the training data
- Run call_openai.py
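
A minimal sketch of this preparation stage; the absence of command-line arguments and the use of the OPENAI_API_KEY environment variable are assumptions, so check each script before running:

```bash
# Hedged sketch; both scripts may take arguments (paths, model names) not shown here.
python generate_testing_data.py   # format the downloaded data into training examples
export OPENAI_API_KEY=...         # assumed: call_openai.py reads the key from the environment
python call_openai.py             # query the OpenAI API as described in the paper
```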
- Go to the finetune folder and install the dependencies:

```bash
pip install -r requirements.txt
```
- Build the SFT model by running finetune.without.mpo.sh
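
The SFT step might look like the following; running the script from inside the finetune folder with no extra arguments is an assumption:

```bash
cd finetune
bash finetune.without.mpo.sh   # trains the SFT model with LoRA adapters
```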
- Split the training data into 5 folds
- Go to the finetune folder and run finetune.without.mpo.sh on each fold
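
A sketch of the 5-fold split and per-fold training, assuming the training data is one example per line in a file named train.jsonl (hypothetical name) and that finetune.without.mpo.sh can be pointed at a per-fold training file:

```bash
# Split into 5 folds (GNU coreutils); train.jsonl and fold_00 ... fold_04 are assumed names.
split -d -n l/5 train.jsonl fold_

cd finetune
for i in 00 01 02 03 04; do
  # concatenate the other 4 folds for training; fold_${i} is held out for this run
  ls ../fold_* | grep -v "fold_${i}" | xargs cat > "train_wo_${i}.jsonl"
  bash finetune.without.mpo.sh   # assumed: point the script at train_wo_${i}.jsonl
done
```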
- Run export_hf_checkpoint.py to convert the LoRA weights into Hugging Face format
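
The conversion step, assuming export_hf_checkpoint.py picks up the base-model and LoRA paths from its own configuration or arguments:

```bash
python export_hf_checkpoint.py   # merge each fold's LoRA weights into a full Hugging Face checkpoint
```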
- Go to the vllm_inference/inference_scripts folder
- Run prediction on the held-out set using prompt.wa.mix.sh
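
A hedged invocation for the held-out prediction; running the script in place with the merged checkpoint already configured is an assumption:

```bash
cd vllm_inference/inference_scripts
bash prompt.wa.mix.sh   # generate predictions on each fold's held-out set with vLLM
```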
- Calculate the score using the task-specific evaluation metrics
- Go to the trl/examples/scripts/scripts folder
- Build the reward model by running train.exp.sh
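
Reward-model training might look like this; the working directory and the absence of extra arguments are assumptions:

```bash
cd trl/examples/scripts/scripts
bash train.exp.sh   # train the reward model
```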
- Convert the SFT model's LoRA weights into Hugging Face format using export_hf_checkpoint.py
- Go to the trl/examples/scripts/scripts folder and run inference.sh to obtain the reward values and build the preference data
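
A sketch of building the preference data; the location of export_hf_checkpoint.py and the assumption that inference.sh itself writes out the preference data should be checked against the scripts:

```bash
python export_hf_checkpoint.py   # merge the SFT LoRA weights (run from wherever the script lives)
cd trl/examples/scripts/scripts
bash inference.sh                # score candidate outputs with the reward model and build the preference data
```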
- Go to the finetune folder and run finetune.with.mapo.sh to obtain the final WA model
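
The final alignment step, again assuming the script is run from the finetune folder with its data paths already configured:

```bash
cd finetune
bash finetune.with.mapo.sh   # uses the preference data to obtain the final WA model
```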
- Go to the vllm_inference/inference_scripts folder
- Run run_test.accept.input.exp.new.format.sh
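
Final inference might be invoked as follows; no additional arguments are assumed:

```bash
cd vllm_inference/inference_scripts
bash run_test.accept.input.exp.new.format.sh   # run the final WA model on the test set
```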
To be uploaded soon.
If you found our paper or code useful, please cite as:
```bibtex
@inproceedings{cao-etal-2021-grammatical-error,
    title = "Rationalize and Align: Enhancing Writing Assistance with Rationale via Self-Training for Improved Alignment",
    author = "Cao, Hannan and
      Ye, Hai and
      Ng, Hwee Tou",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
}
```