The Python scripts in this directory are tested on Python 3.9. Due to extensive usage of Python's AST library, which changes from one Python version to another, these scripts might not work properly on other versions of Python.
The purpose of this phase is to produce four files:

- File subject_info.csv, which contains all the information about the benchmarks used in the experiments.
- File ground_truth_info.json, which contains information about the changes in the commits representing the bugs in BugsInPy.
- File size_counts.json, which contains the number of lines, functions, and modules in each buggy version.
- File predicate_bug_info.json, which indicates which bugs in BugsInPy are predicate-related bugs.
File subject_info.csv is used in the bash_script_generator phase to automatically generate the bash scripts in the bash_script_generator/scripts directory, and the three files ground_truth_info.json, size_counts.json, and predicate_bug_info.json are used in the metric_computation phase to compute the metrics.
We have already generated these files, and they exist in this directory, so you do not need to go through this process. But if you want to replicate it, you can follow the instructions below. Keep in mind that going through the whole process can take up to 10 days of continuous running.
At this step, we clone and compile all the buggy and fixed versions in BugsInPy. To do that, we run the command below, passing it one of the JSON files in the info and info_2 directories. For instance, [BugsInPyProjectName] can be keras. This script checks out all the buggy and fixed versions of a given BugsInPy project (e.g., keras), runs the target failing tests on both the buggy and the fixed versions, and saves the results. This script must be executed for every single one of the JSON files in the info and info_2 directories, which takes around a week (7 * 24 hours) to finish, depending on your machine and internet connection.
Before running any of the following steps, open the workspace.json file and set the workspace for the script (set the variable WORKSPACE_PATH). Make sure there is enough space on the path you provide as the workspace; it requires ~500 GB for all the projects. Also, add a file named github_token.txt to the current directory and put a GitHub token in it, as it is needed to fetch data from GitHub.
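The exact schema of workspace.json is defined by the scripts in this directory; as a rough sketch, it is expected to map WORKSPACE_PATH to the directory where the projects will be cloned and compiled (the path below is a placeholder):

```json
{
    "WORKSPACE_PATH": "/data/bugsinpy_workspace"
}
```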
./benchmark_opener.sh [info|info_2]/[BugsInPyProjectName].json
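For example, assuming keras.json lives in the info directory, processing keras looks like this:
./benchmark_opener.sh info/keras.json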
At this step, we check if all the buggy and fixed versions are compiled correctly. If any of them is not compiled, we must repeat the previous step for it, or remove it from the experiments. To perform this check, we run the following commands, which take no arguments. These scripts print on the screen which compilations were not successful, and they should finish quickly (probably in less than 1 minute).
pip install PyGithub
./check_compile_all.sh
./check_compile_all_2.sh
After running these commands, we realized tornado 16 cannot be compiled due to a bug in the BugsInPy framework (a missing requirements.txt file for bug 16). Thus, we removed tornado 16 from our experiments by setting BUG_NUMBER_END in info/tornado.json to 15 instead of 16. The standard output of this step can be found in the compile_log and compile_log_2 directories.
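After this change, the relevant part of info/tornado.json looks roughly as follows (other fields of the file are omitted here; only BUG_NUMBER_END matters for this fix):

```json
{
    "BUG_NUMBER_END": 15
}
```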
The BugsInPy framework contains 17 projects and a total of 501 bugs. For each bug, it provides a buggy and a fixed version, a test suite, and one or more tests that reveal the bug, which we refer to as the target failing tests. The idea is that the target failing tests of a bug must fail on the buggy version and pass on the fixed version of that bug.
However, for various reasons (such as dependency problems), this does not always hold. For instance, in some cases, the target failing tests pass or fail on both versions, or produce an error on the fixed version. Such bugs must be removed from our experiments, which is done at this step. To perform this check, run the following commands.
Before running ./check_target_tests_all_2.sh, fix the bugsinpy_run_test.sh file in the buggy and fixed versions of matplotlib 8 by putting the two tests on two separate lines and removing the semicolon between them; otherwise, ./check_target_tests_all_2.sh crashes for matplotlib due to an assertion failure.
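For illustration only, with hypothetical test commands (the real file lists matplotlib's actual target tests), the fix turns a single line such as

```bash
python -m pytest test/test_a.py::test_one; python -m pytest test/test_b.py::test_two
```

into two separate lines:

```bash
python -m pytest test/test_a.py::test_one
python -m pytest test/test_b.py::test_two
```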
./check_target_tests_all.sh
./check_target_tests_all_2.sh
When these scripts are finished running, they produce the correct and correct_2 directories, containing a JSON file for each project showing which bugs have been removed and kept according to the criteria mentioned above. The standard output of this step can be found in the target_tests_log and target_tests_log_2 directories.
At this step, we generate the file ground_truth_info.json, which contains information about the changes made to fix each bug in BugsInPy. This file is then used at the metric_computation phase to calculate the metrics we use in the paper. To generate ground_truth_info.json, run the following command:
python generate_ground_truth_info.py
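For intuition, computing the ground truth for a bug essentially means inspecting the diff of the commit that fixes it. The sketch below is not the logic of generate_ground_truth_info.py; it only illustrates fetching a fix commit's changed files with PyGithub, and the repository name and commit SHA are placeholders:

```python
# Illustrative sketch only: fetch the files touched by a fix commit via PyGithub.
from github import Github

# The scripts in this directory read the token from github_token.txt.
with open("github_token.txt") as f:
    token = f.read().strip()

gh = Github(token)
repo = gh.get_repo("keras-team/keras")            # placeholder repository
commit = repo.get_commit("<fix-commit-sha>")      # placeholder SHA

for changed_file in commit.files:
    # changed_file.patch holds the unified diff hunk(s) for this file.
    print(changed_file.filename)
    print(changed_file.patch)
```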
This script also generates two files, empty_ground_truth_info.json and empty_ground_truth_info_2.json, which contain those bugs in BugsInPy for which the computed ground truth is empty. These files are used at step 6 to exclude such cases from the experiments (we ignore empty_ground_truth_info_2.json because it is empty).
Another file generated in this phase is predicate_bug_info.json, which shows which bugs are predicate bugs. We also use this file at the metric_computation phase.
At this step, we count the number of lines in every buggy version, excluding empty lines and comment lines. We only consider lines from modules that are used in fault localization, which are those in target directories (not in test modules or Python virtual environments). We also count the number of functions, and the number of modules in each buggy version.
We need this information to compute the Exam Score at the metric_computation phase. Run the following command (which is slow) to generate the size_counts.json file:
python size_counter.py
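As a rough illustration of the counting (this is not the actual size_counter.py implementation), the non-blank, non-comment lines and the function definitions of a single module can be counted like this:

```python
# Illustrative sketch; size_counter.py applies this kind of counting only to the
# target modules of each buggy version (not tests or virtual environments).
import ast
import tokenize

def count_code_lines(path):
    """Count lines that are neither blank nor comment-only (docstrings count as code)."""
    code_lines = set()
    with tokenize.open(path) as source:  # honors the file's encoding declaration
        for tok in tokenize.generate_tokens(source.readline):
            if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER):
                continue
            code_lines.update(range(tok.start[0], tok.end[0] + 1))
    return len(code_lines)

def count_functions(path):
    """Count function and method definitions using the AST."""
    with open(path, encoding="utf-8") as source:
        tree = ast.parse(source.read())
    return sum(isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
               for node in ast.walk(tree))
```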
Based on a rough estimate of the amount of time each experiment requires and the processing resources we have (a cluster server with 15 available nodes for two weeks), we randomly select a subset of the bugs picked at the previous step. To perform this simulation, run the following command:
python estimate_time.py
When this script is finished running, it generates the time_selected_bugs.json file that contains the randomly selected bugs from each of the BugsInPy projects.
This is the final step, in which the subject_info.csv file for the bugs in time_selected_bugs.json is generated. To perform this step, we must run the following commands.
pip install python-scalpel
pip install packaging
python generate_subject_info.py
This step also performs a call-graph-based test case selection using Scalpel, a Python static analysis framework, so it is very slow.
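For intuition, call-graph-based test case selection keeps only the tests whose call graph can reach the code relevant to the bug. The sketch below is not the logic of generate_subject_info.py; it follows the call-graph example from Scalpel's documentation, and the CallGraphGenerator usage as well as all entry points, package names, and function names are assumptions for illustration:

```python
# Illustrative sketch only; API usage follows Scalpel's documented call-graph example
# and may need adjusting for your Scalpel version. All names below are placeholders.
from scalpel.call_graph.pycg import CallGraphGenerator

def reaches(call_graph, start, targets, seen=None):
    """DFS reachability: does `start` (transitively) call any function in `targets`?"""
    seen = set() if seen is None else seen
    if start in targets:
        return True
    seen.add(start)
    return any(reaches(call_graph, callee, targets, seen)
               for callee in call_graph.get(start, []) if callee not in seen)

# Build a call graph starting from a (placeholder) test module.
generator = CallGraphGenerator(["tests/test_example.py"], "project_under_test")
generator.analyze()
call_graph = generator.output()  # mapping: function name -> callees

# Keep only tests that can reach the (placeholder) functions of interest.
changed = {"project_under_test.module.buggy_function"}
selected = [name for name in call_graph
            if name.startswith("tests.") and reaches(call_graph, name, changed)]
print(selected)
```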