This is an implementation of CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model. CommitBERT was accepted at the ACL workshop NLP4Prog. Have you ever hesitated to write a commit message? Now get a commit message from Artificial Intelligence!
CodeBERT: A Pre-Trained Model for Programming and Natural Languages introduces a model pre-trained on a combination of programming language and natural language (PL-NL). It also introduces the task of converting code into natural language (Code Documentation Generation).
diff --git a/test.py b/test.py
new file mode 100644
index 0000000..d13f441
--- /dev/null
+++ b/test.py
@@ -0,0 +1,6 @@
+
+import torch
+import argparse
+
+def add(a, b):
+ return a + b
Recommended Commit Message : Add two arguments .
We can use CodeBERT to build a model that generates a commit message when code is added. However, most code changes do not only add code; some parts of the code are also deleted.
diff --git a/test.py b/test.py
index d13f441..1b1b82a 100644
--- a/test.py
+++ b/test.py
@@ -1,6 +1,3 @@
-import torch
-import argparse
-
def add(a, b):
return a + b
Recommended Commit Message : Remove unused imports
To solve this problem, we use a new embedding called patch_type_embeddings that distinguishes added from deleted code, just as XLM (Lample et al., 2019) used language embeddings (1 for added, 2 for deleted).
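Below is a minimal PyTorch sketch of this idea, not the actual CommitBERT implementation; the class and variable names are made up for illustration, and the vocabulary/hidden sizes are RoBERTa-like defaults:

```python
import torch
import torch.nn as nn

class CodeEmbeddings(nn.Module):
    # Patch type ids: 0 = unchanged context, 1 = added line, 2 = deleted line
    def __init__(self, vocab_size=50265, hidden_size=768, max_position=512):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.position_embeddings = nn.Embedding(max_position, hidden_size)
        self.patch_type_embeddings = nn.Embedding(3, hidden_size)

    def forward(self, input_ids, patch_type_ids):
        seq_len = input_ids.size(1)
        position_ids = torch.arange(seq_len, device=input_ids.device)
        position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
        # Sum token, position, and patch type embeddings,
        # analogous to XLM's language embeddings.
        return (self.word_embeddings(input_ids)
                + self.position_embeddings(position_ids)
                + self.patch_type_embeddings(patch_type_ids))

embeddings = CodeEmbeddings()
input_ids = torch.tensor([[101, 2023, 2003, 102]])  # toy token ids
patch_type_ids = torch.tensor([[1, 1, 2, 2]])       # added, added, deleted, deleted
print(embeddings(input_ids, patch_type_ids).shape)  # torch.Size([1, 4, 768])
```

The patch type embedding is simply summed into each token's input representation, so the encoder can tell whether a token came from an added or a deleted line.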
Language | Added | Diff | Data (Only Diff) | Weights |
---|---|---|---|---|
Python | ✅ | ✅ | 423k | Link |
JavaScript | ✅ | ✅ | 514k | Link |
Go | ⬜ | ⬜ | ⬜ | ⬜ |
JAVA | ⬜ | ⬜ | ⬜ | ⬜ |
Ruby | ⬜ | ⬜ | ⬜ | ⬜ |
PHP | ⬜ | ⬜ | ⬜ | ⬜ |
- ✅ — Supported
- ⬜ — N/A
We plan to gradually support the languages that are not covered yet. However, training the above languages requires expensive GPU instances on AWS or GCP, so please consider sponsoring this project! The Added data comes from the CodeSearchNet dataset.
To run this project, you need a Flask-based inference server (GPU) and a client (the commit module). If you don't have a GPU, don't worry, you can run the server through Google Colab.
Prepare Docker and nvidia-docker before running the server.
Serve the Flask server with NVIDIA Docker. Check the Docker tag for each programming language in the table below:
Language | Tag |
---|---|
Python | py |
JavaScript | js |
Go | go |
JAVA | java |
Ruby | ruby |
PHP | php |
$ docker run -it -d --gpus 0 -p 5000:5000 graykode/commit-autosuggestions:{language}
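After the container starts, you can sanity-check that it is listening on port 5000; the exact response body depends on the server, so this is only a connectivity check:

$ docker ps
$ curl http://127.0.0.1:5000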
Even if you don't have a GPU, you can still serve the Flask server by using the ngrok setting in commit_autosuggestions.ipynb.
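Conceptually, the notebook opens a public tunnel to the local Flask port. A minimal sketch using pyngrok (an assumption for illustration; the notebook may use a different helper) looks like this:

```python
# Sketch: expose the local Flask server (port 5000) via an ngrok tunnel,
# e.g. from a Google Colab runtime. pyngrok is assumed here for illustration.
from pyngrok import ngrok

public_url = ngrok.connect(5000)  # public address tunneling to localhost:5000
print(public_url)  # pass this address to `commit configure --endpoint ...`
```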
First, install the package through pip.
$ pip install commit
Set the endpoint for the Flask server configured in step 1 through the commit configure command. For example, if the endpoint is http://127.0.0.1:5000, set it as follows: commit configure --endpoint http://127.0.0.1:5000
$ commit configure --help
Usage: commit configure [OPTIONS]
Options:
--profile TEXT unique name for managing each independent settings
--endpoint TEXT endpoint address accessible to the server (example :
http://127.0.0.1:5000/) [required]
--help Show this message and exit.
All setup is done! Now you can get a commit message from the AI with the commit command.
$ commit --help
Usage: commit [OPTIONS] COMMAND [ARGS]...
Options:
--profile TEXT unique name for managing each independent settings
-f, --file FILENAME patch file containing git diff (e.g. file created by
`git add` and `git diff --cached > test.diff`)
-v, --verbose print suggested commit message more detail.
-a, --autocommit automatically commit without asking if you want to
commit
--help Show this message and exit.
Commands:
configure
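For example, following the -f option's help text above, you can stage your changes, save the staged diff to a file, and request a suggestion:

$ git add .
$ git diff --cached > test.diff
$ commit -v -f test.diff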
Refer to How to train for your lint style. This allows you to fine-tune the model again to match your repository's commit lint style.
You can contribute anything, even a typo fix in the article or the code. Don't hesitate! Versions are managed in branches named after each version. After a release on PyPI, the branch is merged into the master branch and new development proceeds in the next version's branch.
@article{jung2021commitbert,
title={CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model},
author={Jung, Tae-Hwan},
journal={arXiv preprint arXiv:2105.14242},
year={2021}
}