This repository provides code, state-of-the-art predictions, and links to pretrained Grammatical Error Correction models for the paper "Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models", accepted for publication at BEA-2024 (19th Workshop on Innovative Use of NLP for Building Educational Applications; co-located with NAACL 2024).
The `scripts` directory contains the code required to reproduce some of the baselines and to build ensembles.
The `data` directory contains the outputs of single systems and ensembles on the 3 main GEC benchmarks.
The table below contains single-system scores and links to the trained models available for download.
| Model name | CoNLL-2014 (test) |  |  | BEA-2019 (dev) |  |  | BEA-2019 (test) |  |  |
|---|---|---|---|---|---|---|---|---|---|
|  | Precision | Recall | F0.5 | Precision | Recall | F0.5 | Precision | Recall | F0.5 |
| CTC-copy [repo] | 72.6 | 47.0 | 65.5 | 58.2 | 38.0 | 52.7 | 71.7 | 59.9 | 69.0 |
| GECToR-2024 [link] | 75.0 | 44.7 | 66.0 | 64.6 | 37.2 | 56.3 | 77.7 | 59.0 | 73.1 |
| EditScorer [repo] | 78.5 | 39.4 | 65.5 | 67.3 | 36.1 | 57.4 | 81.0 | 56.1 | 74.4 |
| T5-11B [link] | 70.9 | 56.5 | 67.5 | 60.9 | 51.1 | 58.6 | 73.2 | 71.2 | 72.8 |
| UL2-20B [link] | 73.8 | 50.4 | 67.5 | 60.5 | 48.6 | 57.7 | 75.2 | 70.0 | 74.1 |
| Chat-LLaMa-2-7B-FT [link] | 75.5 | 46.8 | 67.2 | 58.3 | 46.0 | 55.3 | 72.3 | 67.4 | 71.2 |
| Chat-LLaMa-2-13B-FT [link] | 77.2 | 45.6 | 67.9 | 59.8 | 46.1 | 56.4 | 74.6 | 67.8 | 73.1 |
| Majority-voting ensemble (best 7) | 83.7 | 45.7 | 71.8 | 71.7 | 42.2 | 62.9 | 87.3 | 64.1 | 81.4 |
| MAJORITY-VOTING ✚ [majority-voting (best 7), GRECO-rank-w (best 7), GPT-4-rank-a (clust 3)] | 83.9 | 47.5 | 72.8 | 70.6 | 43.5 | 62.8 | 86.1 | 65.6 | 81.1 |
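The ensemble rows aggregate the single-system outputs; the simplest variant keeps only the edits that enough systems agree on. Below is a minimal sketch of that idea, assuming each system's output has already been aligned to the source as `(start, end, replacement)` edit spans; `extract_edits` and the vote threshold are illustrative, not the repository's actual implementation:

```python
from collections import Counter

def majority_vote(edit_sets, min_votes):
    """Keep an edit only if at least `min_votes` systems proposed it.

    edit_sets: list of sets of (start, end, replacement) tuples,
               one set per single system.
    """
    counts = Counter(edit for edits in edit_sets for edit in edits)
    return {edit for edit, n in counts.items() if n >= min_votes}

def apply_edits(tokens, edits):
    """Apply non-overlapping span edits right-to-left so that
    earlier offsets stay valid."""
    for start, end, replacement in sorted(edits, reverse=True):
        tokens[start:end] = replacement.split()
    return tokens

# Illustrative usage with 7 systems, requiring agreement of at least 4:
# systems = [extract_edits(source, hyp) for hyp in hypotheses]  # hypothetical helper
# kept = majority_vote(systems, min_votes=4)
# corrected = apply_edits(source.split(), kept)
```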
There are 3 evaluation sets that we use for GEC:
- CoNLL-2014 (`nucle14-2a`; m2 file is available; m2scorer is the official scorer)
- BEA-2019 dev (`bea-dev`; m2 file is available; ERRANT is the official scorer)
- BEA-2019 test (`bea-test`; m2 file is NOT available; scores can be obtained only through a Codalab submission)

Evaluation sets directory: `data/evaluation_sets`.
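For orientation, the `.m2` files follow the standard M2 format: each sentence block starts with an `S` line holding the tokenized source, followed by `A` lines holding span annotations (`start end|||type|||correction|||...|||annotator`), with blank lines between blocks. A minimal reader sketch (not one of the repository's scripts):

```python
def read_m2(path):
    """Yield (source_tokens, edits) pairs from an M2 file.

    Each edit is (start, end, error_type, correction, annotator_id).
    """
    with open(path, encoding="utf-8") as f:
        source, edits = None, []
        for line in f:
            line = line.rstrip("\n")
            if line.startswith("S "):
                source, edits = line[2:].split(), []
            elif line.startswith("A "):
                start_end, etype, corr, *_, annotator = line[2:].split("|||")
                start, end = map(int, start_end.split())
                edits.append((start, end, etype, corr, int(annotator)))
            elif not line and source is not None:
                yield source, edits
                source = None
        if source is not None:  # file may not end with a blank line
            yield source, edits
```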
- Example of evaluation with ERRANT:

  ```bash
  ERRANT_SCORER=path_to_errant_scorer_directory
  INPUT_FILE=data/evaluation_sets/bea-dev.txt
  M2_FILE=data/evaluation_sets/bea-dev.m2
  PRED_FILE=YOUR_PRED_FILE.txt
  TMP_FILE=YOUR_TMP_FILE.m2

  python $ERRANT_SCORER/parallel_to_m2.py -orig $INPUT_FILE -cor $PRED_FILE -out $TMP_FILE
  python $ERRANT_SCORER/compare_m2.py -hyp $TMP_FILE -ref $M2_FILE >> {{result}}
  ```
- Example of evaluation with m2scorer:

  ```bash
  M2_SCORER=path_to_m2scorer
  M2_FILE=data/evaluation_sets/nucle14-2a.m2
  PRED_FILE=YOUR_PRED_FILE.txt

  $M2_SCORER $PRED_FILE $M2_FILE >> {{result}}
  ```
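Both scorers report span-level precision, recall, and F0.5, the standard GEC metric, which weights precision twice as heavily as recall. As a quick sanity check on the numbers in the table above (this helper is illustrative, not part of the repository):

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta=0.5 favors precision, as is standard in GEC."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example, GECToR-2024 on BEA-2019 (test): f_beta(0.777, 0.590) ≈ 0.731
```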
Citation (to be updated once the proceedings are published):
```bibtex
@misc{omelianchuk2024pillars,
    title={Pillars of Grammatical Error Correction: Comprehensive Inspection Of Contemporary Approaches In The Era of Large Language Models},
    author={Kostiantyn Omelianchuk and Andrii Liubonko and Oleksandr Skurzhanskyi and Artem Chernodub and Oleksandr Korniienko and Igor Samokhin},
    year={2024},
    eprint={2404.14914},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```