Code-commenting

  • A design project on automatic code comment generation

Dataset

  • We train on the dataset from DeepCom. Training was done mostly on Google Colaboratory; we recommend trying the university's HPC cluster (if it is not too busy) or Google Cloud free-trial credits (if one of your cards can get through the sign-up without being charged).
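For reference, preprocessing for a seq2seq setup like this typically tokenizes each code/comment pair and builds integer vocabularies. A minimal sketch in plain Python (the toy pair and all function names here are illustrative, not the project's actual pipeline):

```python
from collections import Counter

def build_vocab(token_seqs, min_freq=2, specials=("<pad>", "<s>", "</s>", "<unk>")):
    """Map tokens to integer ids, keeping only tokens seen at least min_freq times."""
    counts = Counter(tok for seq in token_seqs for tok in seq)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, freq in counts.most_common():
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Convert a token sequence to ids, wrapping it in start/end markers."""
    unk = vocab["<unk>"]
    return [vocab["<s>"]] + [vocab.get(t, unk) for t in tokens] + [vocab["</s>"]]

# Toy code/comment pair in the DeepCom style (real pairs come from parsed Java methods).
pairs = [
    ("public int add ( int a , int b ) { return a + b ; }".split(),
     "adds two integers".split()),
    ("public int sub ( int a , int b ) { return a - b ; }".split(),
     "subtracts two integers".split()),
]
code_vocab = build_vocab([c for c, _ in pairs], min_freq=1)
ids = encode(pairs[0][0], code_vocab)
```

Rare tokens below `min_freq` collapse to `<unk>`, which is one reason rare identifiers are hard for the model (see the examples below).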

Results

  • Our model did not reach SOTA performance, which we expected. It nevertheless produced semantically correct comments, occasionally more informative than the original human-written comments.
  • Many of the comments generated in the first epoch were repetitive, but the number of meaningful comments increased significantly as training progressed.

NOTE: In each example below, the first line is the comment generated by the model and the second is the ground-truth human comment:
(image)

The model needs more training to handle rarer tokens:
(image)

Here the model fails to produce a grammatically correct word, but it still captures the underlying semantics of the code:
(image)

Teacher forcing confused the model here, yet it still tried to produce a meaningful comment even though the human-written comment was poor:
(image)
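Teacher forcing means the decoder is sometimes fed the ground-truth token, rather than its own previous prediction, as the next input during training; at inference time the ground truth is unavailable, which can throw the model off as in the example above. A minimal sketch, with `decode_step` as a toy stand-in for a real decoder cell:

```python
import random

def decode_step(prev_token, state):
    """Stand-in for one decoder step: returns (predicted_token, new_state).
    A real model would run an RNN/Transformer cell and a softmax here."""
    # Toy rule: always predict the token 'the' regardless of input.
    return "the", state

def generate(target_tokens, teacher_forcing_ratio=0.5, seed=0):
    """Decode a sequence, feeding back either the ground-truth token
    (teacher forcing) or the model's own prediction at each step."""
    rng = random.Random(seed)
    state = None
    inp = "<s>"
    outputs = []
    for truth in target_tokens:
        pred, state = decode_step(inp, state)
        outputs.append(pred)
        # With probability teacher_forcing_ratio, the next input is the
        # ground-truth token; otherwise the model consumes its own output.
        inp = truth if rng.random() < teacher_forcing_ratio else pred
    return outputs

out = generate("returns the sum".split(), teacher_forcing_ratio=1.0)
```

Lowering `teacher_forcing_ratio` over training is a common way to wean the model off the ground truth before inference.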

The model can also substitute words with similar meanings:
(image)

(image)

BLEU Scoring and Loss plots

(plots: METEOR score, BLEU score, training loss)
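BLEU scores a generated comment by its clipped n-gram overlap with the reference, scaled by a brevity penalty. A small self-contained sketch of smoothed sentence-level BLEU (the add-one smoothing is an illustrative choice, not necessarily the exact variant used for the plots above):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, hypothesis, max_n=4):
    """Smoothed sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp = Counter(ngrams(hypothesis, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref[g]) for g, c in hyp.items())
        total = max(sum(hyp.values()), 1)
        # Add-one smoothing keeps the score nonzero for short sentences.
        precisions.append((overlap + 1) / (total + 1))
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Penalize hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
    return bp * math.exp(log_avg)

score = sentence_bleu("returns the sum of two numbers".split(),
                      "returns the sum of two values".split())
```

METEOR additionally credits stem and synonym matches, which is why it tends to be kinder than BLEU to the synonym substitutions shown above.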

Box Plots

(box plots: BLEU, METEOR)

Contributing