Quickstart Green Mode

Metadata
cEP	22
Version	1.0
Title	Quickstart Green Mode
Authors	Ishan Srivastava mailto:ishan.srt@gmail.com
Status	Implementation Due
Type	Feature

Abstract

This document describes what the Green Mode for coala-quickstart is and how it will be implemented.

This cEP describes a new type of config file that coala-quickstart will generate, how it will differ from the ones currently created, why it is helpful, and the action plan for adding this feature.

Requirements of "Green Mode"

Currently coala-quickstart generates config files that point out too many errors in the project's code base.
It asks too many questions in the "interactive mode" while generating the config file.
The config files generated in "non-interactive mode" are not very specific to the project, i.e. many optional bear settings are skipped, and even whole bears are skipped because no "guessing" feature is available for them.
The errors that coala points out with the generated config files are often unwanted by project owners, because many organizations follow their own custom code styles rather than the predefined standards implemented by most linters and bears.

Goals

Preparing "green" configuration files that agree with the project's code style and adapt to its standards with minimal user interaction, producing zero errors on the existing code base.
The configuration file should be as specific as possible, i.e. it should catch as many inconsistencies in future commits as possible while requiring as few manual edits as possible.
Storing metadata in the newly proposed types of files and classes, so that the generated config file can be enhanced in future runs.
Maintaining a GitHub repository that stores the combinations of bear setting values already in use by existing organisations, and using them as initial checks before resorting to Brute Force.
Automating the task "Adding .coafile and CI to enforce it" (example) up to the point of generating the config files, so that it is easier for newcomers to create easily mergeable Pull Requests in other communities, which leads to easier adoption of coala.

Implementation

This project will be implemented in five phases:

Classification of settings and Metadata Generation
The General Operations
Acting upon the newly available metadata
The Brute Force way and further optimization on the Brute Force
Learning from Real World Projects

Classification of settings and Metadata Generation

We categorize the settings into 4 different types:

The ones that take a bool value (from here onwards referred to as Type 1 or Type bool settings).
The ones that can take an infinite number of values (from here onwards referred to as Type 2 or Type infinite settings).
The ones that take values from a fixed, discrete set (from here onwards referred to as Type 3 or Type discrete settings).
The ones that require an additional config file to be parsed by coala-quickstart (from here onwards referred to as Type 4 or Type config settings).

Type 1 or Type bool settings

How will they be guessed?

Since these settings can only take one of two values, True or False, the Brute Force or the lite-mode will run coala over and over again until it finds a "green" value.
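This search can be sketched in Python, the language of coala's bears. The callback `count_errors` is hypothetical. It stands in for one coala run with the given setting values and returns the number of inconsistencies reported. The per-combination error counts are the weights that would go into the [severe] section if no combination turns out green. Trying True before False is an arbitrary choice in this sketch.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

Settings = Dict[str, bool]


def guess_bool_settings(
        names: Sequence[str],
        count_errors: Callable[[Settings], int],
) -> Tuple[Optional[Settings], Dict[Tuple[bool, ...], int]]:
    """Try True/False combinations of Type 1 settings until one is green.

    Returns the green combination (or None) and the weight of every
    combination tried, i.e. how many inconsistencies coala reported.
    """
    weights = {}
    for values in product((True, False), repeat=len(names)):
        settings = dict(zip(names, values))
        weights[values] = count_errors(settings)  # one (hypothetical) coala run
        if weights[values] == 0:
            return settings, weights
    return None, weights


def fake_coala_run(settings: Settings) -> int:
    # Hypothetical project indented with tabs: use_spaces=True flags 10 lines.
    return 10 if settings['use_spaces'] else 0


green, weights = guess_bool_settings(
    ['use_spaces', 'enforce_newline_at_EOF'], fake_coala_run)
print(green)  # {'use_spaces': False, 'enforce_newline_at_EOF': True}
```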

Metadata Generation

If no "green" value can be found for a setting, the .quickfile will store the setting along with a weight in the section [severe]. The weight records how many inconsistencies were found in the project when assuming that particular value of the setting.
When a combination of setting values is found to be "green", a QUICKINFO.yaml file will be generated, storing the combination of setting values along with the project's GitHub links.

Example

SpaceConsistencyBear:

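```python
from coalib.bearlib import deprecate_settings
from coalib.bearlib.spacing.SpacingHelper import SpacingHelper


# Signature of SpaceConsistencyBear.run(); the body is omitted here.
@deprecate_settings(indent_size='tab_width')
def run(self,
        filename,
        file,
        use_spaces: bool,
        allow_trailing_whitespace: bool = False,
        indent_size: int = SpacingHelper.DEFAULT_TAB_WIDTH,
        enforce_newline_at_EOF: bool = True,
        ):
    ...
```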

Here use_spaces, allow_trailing_whitespace and enforce_newline_at_EOF are Type 1 settings.

Type 2 or Type infinite settings

How will they be guessed?

Since these settings can take any integer (or a value of another unbounded type), they will be guessed with the help of the QuickstartBear.

QuickstartBear

This proposed bear does not generate any patches for the provided file_dict. Instead, it parses the file_dict and records in QUICKDATA.yaml every value that each setting takes on every line of the project (except ignored lines). Statistics such as the mean, median and standard deviation of the values are also calculated and stored in QUICKDATA.yaml for further analysis.

Most of these settings specify an upper limit for a value, so a "green" value can be decided immediately: any limit at or above the largest value observed in the project produces no errors.
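Here is a minimal sketch of the data QuickstartBear might collect for one Type 2 setting. Line length is used as the example, and `ignored` (0-based indices of ignored lines) is a hypothetical stand-in for coala's ignore handling. The green value for an upper-limit setting is simply the largest observed value. The sketch assumes at least one non-ignored line.

```python
import statistics
from typing import AbstractSet, Dict, List


def line_length_data(lines: List[str],
                     ignored: AbstractSet[int] = frozenset()) -> Dict:
    """Record the length of every non-ignored line, plus summary statistics."""
    values = [len(line.rstrip('\r\n'))
              for number, line in enumerate(lines) if number not in ignored]
    return {
        'values': values,
        'mean': statistics.mean(values),
        'median': statistics.median(values),
        'stdev': statistics.pstdev(values),
    }


def green_upper_limit(data: Dict) -> int:
    # Any limit >= the largest observed value reports no errors.
    return max(data['values'])


data = line_length_data(['import os\n', 'x = 1\n', 'print(os.getcwd())\n'])
print(green_upper_limit(data))  # 18
```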

Metadata Generation

Suspected errors and a more suitable value for the setting will be guessed by analysing the probability distribution of the setting values, and will be stored in the .quickfile under the section [amenable].
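The proposal leaves the exact analysis open. As one simple stand-in, which is not necessarily the method that will be used, the sketch below treats values more than `k` population standard deviations above the mean as suspected errors and suggests the largest remaining value. The threshold `k` is an assumption.

```python
import statistics
from typing import List, Tuple


def amenable_entry(values: List[float],
                   k: float = 3.0) -> Tuple[float, List[float]]:
    """Suggest a limit and list suspected errors for an upper-limit setting.

    Values more than k population standard deviations above the mean are
    treated as suspected errors (a simple z-score style heuristic).
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    cutoff = mean + k * stdev
    normal = [v for v in values if v <= cutoff]
    suspects = sorted(v for v in values if v > cutoff)
    return max(normal), suspects


print(amenable_entry([60, 65, 70, 75, 79, 79, 200], k=2))  # (79, [200])
```

With the default `k=3`, the single 200-character line in this small sample would not be flagged. The choice of threshold therefore matters and would need tuning on real projects.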

Example

PEP8Bear:

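```python
from coalib.bearlib import deprecate_settings
from coalib.bearlib.spacing.SpacingHelper import SpacingHelper
from coalib.settings.Setting import typed_list


# Signature of PEP8Bear.run(); the body is omitted here.
@deprecate_settings(indent_size='tab_width')
def run(self, filename, file,
        max_line_length: int = 79,
        indent_size: int = SpacingHelper.DEFAULT_TAB_WIDTH,
        pep_ignore: typed_list(str) = (),
        pep_select: typed_list(str) = (),
        local_pep8_config: bool = False,
        ):
    ...
```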

Here max_line_length is a Type 2 setting.

Type 3 or Type discrete settings

How will they be guessed?

These settings will be guessed using metadata obtained by applying type annotations to the setting values (possibly using the contracts package). The Brute Force or the lite-mode will then run coala over and over again until it finds a "green" value for the setting.
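As an illustration, this sketch uses `typing.Literal` in place of the contracts package, so it is a swapped-in technique rather than the proposal's. The annotation enumerates the allowed values, and `typing.get_args` recovers them for the search. The sketch assumes rulesets are independent, so a green list is just every ruleset that is green on its own. `count_errors` is a hypothetical hook for one coala run. Requires Python 3.8+.

```python
from typing import Callable, Dict, List, Literal, Tuple, get_args

# Hypothetical annotation listing the rulesets named in
# PHPMessDetectorBear's docstring.
PhpmdRuleset = Literal['cleancode', 'codesize', 'controversial', 'design',
                       'naming', 'unusedcode']


def guess_discrete_list(
        annotation,
        count_errors: Callable[[List[str]], int],
) -> Tuple[List[str], Dict[str, int]]:
    """Keep every allowed value that reports no errors on its own.

    The per-value error counts double as the weights for the .quickfile.
    """
    weights = {value: count_errors([value]) for value in get_args(annotation)}
    green = [value for value, errors in weights.items() if errors == 0]
    return green, weights


def fake_phpmd_run(rulesets: List[str]) -> int:
    # Hypothetical project that only violates the naming and codesize rules.
    violations = {'naming': 14, 'codesize': 3}
    return sum(violations.get(ruleset, 0) for ruleset in rulesets)


green, weights = guess_discrete_list(PhpmdRuleset, fake_phpmd_run)
print(green)  # ['cleancode', 'controversial', 'design', 'unusedcode']
```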

Metadata Generation

If no "green" value can be found for a setting, the .quickfile will store the setting along with a weight in the section [severe]. The weight records how many inconsistencies were found in the project when assuming that particular value of the setting.

Example

PHPMessDetectorBear:

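```python
from coalib.settings.Setting import typed_list


# Signature of PHPMessDetectorBear.create_arguments(); the body is omitted.
@staticmethod
def create_arguments(filename, file, config_file,
                     phpmd_rulesets: typed_list(str)):
    """
    :param phpmd_rulesets:
        A list of rulesets to use for analysis.
        Available rulesets: cleancode, codesize, controversial, design,
        naming, unusedcode.
    """
```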

Here phpmd_rulesets is a Type 3 setting.

Type 4 or Type config settings

How will they be guessed?

These settings will be guessed by parsing the config files of the specific linters, collecting metadata, and then applying the appropriate value to each setting, exactly as coala-quickstart already does for .editorconfig, Gemfile, Gruntfile and package.json files. The bear settings already detected using this method will be added to the Type 4 category.
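For instance, pydocstyle reads a `[pydocstyle]` section from files such as setup.cfg. The sketch below uses configparser to turn that section into PyDocStyleBear settings. The option-to-setting mapping is a hypothetical illustration, not an existing coala-quickstart table.

```python
import configparser
from typing import Dict, List

# Hypothetical mapping from pydocstyle's config options to the
# PyDocStyleBear settings shown in the example that follows.
OPTION_TO_SETTING = {
    'select': 'pydocstyle_select',
    'ignore': 'pydocstyle_ignore',
    'add-select': 'pydocstyle_add_select',
    'add-ignore': 'pydocstyle_add_ignore',
}


def settings_from_pydocstyle_config(text: str) -> Dict[str, List[str]]:
    """Turn a [pydocstyle] section (e.g. from setup.cfg) into bear settings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section('pydocstyle'):
        return {}
    section = parser['pydocstyle']
    settings = {}
    for option, setting in OPTION_TO_SETTING.items():
        if option in section:
            codes = [code.strip() for code in section[option].split(',')]
            settings[setting] = [code for code in codes if code]
    return settings


setup_cfg = """
[pydocstyle]
add-ignore = D203, D213
"""
print(settings_from_pydocstyle_config(setup_cfg))
# {'pydocstyle_add_ignore': ['D203', 'D213']}
```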

Example

PyDocStyleBear:

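```python
from coalib.settings.Setting import typed_list


# Signature of PyDocStyleBear.create_arguments(); the body is omitted here.
def create_arguments(self, filename, file, config_file,
                     pydocstyle_select: typed_list(str) = (),
                     pydocstyle_ignore: typed_list(str) = (),
                     pydocstyle_add_ignore: typed_list(str) = (),
                     pydocstyle_add_select: typed_list(str) = (),
                     ):
    ...
```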

Here pydocstyle_select, pydocstyle_ignore, pydocstyle_add_ignore and pydocstyle_add_select are Type 4 settings.

Determining the class of settings

Settings will be sorted into these types by inspecting the bears and checking the type annotations and default values of their parameters with the inspect module, inside the metaclass bearclass. The operations performed in bearclass will include populating a class (referred to as Quickclass from here onwards) that stores metadata about the types of the settings. Manual checks will be needed to determine which settings are Type 3 settings.
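Here is a minimal sketch of this classification. A hypothetical metaclass `QuickBearMeta` stands in for bearclass, and a plain dict `QUICKCLASS` stands in for Quickclass. Bool annotations map to Type 1 and numeric ones to Type 2. The manual table covers Type 3 and Type 4, which the signature alone cannot reveal. Everything not matched is left unclassified for manual review.

```python
import inspect
from collections import defaultdict

TYPE_BOOL, TYPE_INFINITE, TYPE_DISCRETE, TYPE_CONFIG = 1, 2, 3, 4

# Hypothetical manual overrides for settings the signature cannot classify.
MANUAL_TYPES = {
    ('PHPMessDetectorBear', 'phpmd_rulesets'): TYPE_DISCRETE,
    ('PyDocStyleBear', 'pydocstyle_ignore'): TYPE_CONFIG,
}
NON_SETTINGS = {'self', 'filename', 'file', 'config_file'}

# Stand-in for the proposed Quickclass: bear name -> {setting: type}
QUICKCLASS = defaultdict(dict)


def classify(bear, name, param):
    if (bear, name) in MANUAL_TYPES:
        return MANUAL_TYPES[(bear, name)]
    # Prefer the annotation; fall back to the type of the default value.
    kind = param.annotation
    if kind is inspect.Parameter.empty:
        kind = type(param.default)
    if kind is bool:
        return TYPE_BOOL
    if kind in (int, float):
        return TYPE_INFINITE
    return None  # unclassified: left for manual review


class QuickBearMeta(type):
    """Hypothetical metaclass playing the role of bearclass."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        run = namespace.get('run')
        if run is None:
            return
        for pname, param in inspect.signature(run).parameters.items():
            if pname not in NON_SETTINGS:
                QUICKCLASS[name][pname] = classify(name, pname, param)


class SpaceConsistencyBearStub(metaclass=QuickBearMeta):
    def run(self, filename, file, use_spaces: bool,
            allow_trailing_whitespace: bool = False,
            indent_size: int = 4,
            enforce_newline_at_EOF: bool = True):
        ...


print(dict(QUICKCLASS['SpaceConsistencyBearStub']))
# {'use_spaces': 1, 'allow_trailing_whitespace': 1, 'indent_size': 2,
#  'enforce_newline_at_EOF': 1}
```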

Post-Processing of settings

Many settings of different bears serve the same purpose. We will either pick one of them at random or bias towards a particular one when certain issues are known to arise; e.g. PEP8Bear failing to do line length checks in some cases is a known issue. The settings that are not required will be removed from Quickclass. Manual sorting needs to be performed to identify such settings.
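The biased variant of this clean-up could look like the sketch below. The equivalence group is hypothetical. Listing LineLengthBear's `max_line_length` ahead of PEP8Bear's reflects the known PEP8Bear issue, but the choice of LineLengthBear as the alternative is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical equivalence groups, most preferred setting first.
EQUIVALENT_SETTINGS: List[List[Tuple[str, str]]] = [
    [('LineLengthBear', 'max_line_length'), ('PEP8Bear', 'max_line_length')],
]


def drop_duplicates(quickclass: Dict[str, Dict[str, int]],
                    groups=EQUIVALENT_SETTINGS) -> Dict[str, Dict[str, int]]:
    """Keep only the preferred available setting from each group."""
    for group in groups:
        present = [(bear, setting) for bear, setting in group
                   if setting in quickclass.get(bear, {})]
        for bear, setting in present[1:]:
            del quickclass[bear][setting]
    return quickclass


quickclass = {'PEP8Bear': {'max_line_length': 2, 'indent_size': 2},
              'LineLengthBear': {'max_line_length': 2}}
drop_duplicates(quickclass)
print(quickclass['PEP8Bear'])  # {'indent_size': 2}
```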

In short, restrictions on bear settings are applied using type annotations, while the classification of bear settings is done in the metaclass.

General Operations

These consist of developing the classes and methods that will be used both by the Brute Force and by its later improvements.

coala will be run over and over again on the given file dict, using Quickclass to detect the type of each setting and then guessing its value in the appropriate way, as described above.
Undetermined values for Type 1, Type 2 and Type 3 settings go into the .quickfile, while successful combinations of Type 1 setting values go into QUICKINFO.yaml. The data generated by QuickstartBear goes into QUICKDATA.yaml.
"Green" values for settings are added to the .coafile.

Acting upon the newly available metadata

The metadata has been generated in the following files:

.quickfile
QUICKDATA.yaml
QUICKINFO.yaml

and the following class:

Quickclass

`.quickfile`

It will have three sections, in the same format as a .coafile:

[severe] The dropped Type 1 and Type 3 settings, along with their values and a weight for each value.
[amenable] The dropped Type 2 settings, along with their values and weights.
[permanent] Setting values obtained when the user answers the questions prepared by coala-quickstart. A special string may be used to mark confirmed settings that need to be dropped.

At the end of the run, coala-quickstart will always ask whether the user is interested in answering some questions that lead to more reliable config files. If the user chooses to answer them, coala-quickstart will present a set of questions drawn from the [severe] and [amenable] sections. Each question asks whether a detected inconsistency is a mistake in the code base or whether the project follows no style rule for that particular setting. The answered bear settings will be moved to the [permanent] section.
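The proposal does not fix the exact syntax of .quickfile entries. The sketch below assumes an INI-style layout like .coafile's and a hypothetical `value:weight, value:weight` encoding. Under those assumptions, it shows how Python's configparser could read the file and move an answered setting into [permanent].

```python
import configparser
from typing import Dict

SECTIONS = ('severe', 'amenable', 'permanent')


def load_quickfile(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(delimiters=('=',), interpolation=None)
    parser.optionxform = str  # keep bear and setting names case-sensitive
    parser.read_string(text)
    for section in SECTIONS:
        if not parser.has_section(section):
            parser.add_section(section)
    return parser


def parse_weights(raw: str) -> Dict[str, int]:
    """'True:12, False:40' -> {'True': 12, 'False': 40} (hypothetical syntax)."""
    weights = {}
    for item in raw.split(','):
        value, _, weight = item.strip().rpartition(':')
        weights[value] = int(weight)
    return weights


def record_answer(parser: configparser.ConfigParser, key: str,
                  answer: str) -> None:
    """Move an answered setting from [severe]/[amenable] into [permanent]."""
    for section in ('severe', 'amenable'):
        parser.remove_option(section, key)
    parser.set('permanent', key, answer)


quickfile = load_quickfile(
    "[severe]\nSpaceConsistencyBear.use_spaces = True:12, False:40\n")
print(parse_weights(quickfile['severe']['SpaceConsistencyBear.use_spaces']))
# {'True': 12, 'False': 40}
record_answer(quickfile, 'SpaceConsistencyBear.use_spaces', 'False')
```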

This set of questions can also be invoked directly without a --green-mode run, or suppressed entirely, by passing special tags. (Check out the tags section for details.)

The user may also be asked to provide a SEVERITY value, which will be mapped to the weights of the values stored in the .quickfile. This mapping will be decided after testing --green-mode on some projects. All inconsistencies with a lower SEVERITY value than the one entered will be displayed.

If the user wants to see exactly which lines produce the inconsistencies, coala may be run again with just that bear and setting value, showing the user the offending lines in the code base.

The initial question will provide all the necessary information regarding the SEVERITY value.

`QUICKDATA.yaml`

It will contain the data generated by QuickstartBear, as described earlier, which is used to guess Type 2 settings.

`QUICKINFO.yaml`

It will contain the accepted combinations of bear setting values along with the project URL. These files will be submitted as Pull Requests to a repository created specifically to store this data.

Encryption will be applied so that users cannot tamper with the data generated by coala-quickstart and open Pull Requests with junk data; the data will only be viewable once it has been merged into the repository.

As a stretch goal, tools such as gitmate plugins may be created to accept valid Pull Requests automatically, or even to upload data directly to the repository.

The Brute Force will always check these known combinations of setting values before resorting to checking all possible combinations.
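A sketch of that search order follows. The known combinations are assumed to be already loaded from QUICKINFO.yaml files into dicts; the YAML loading is not shown. `count_errors` is a hypothetical hook for one coala run. Known combinations are tried first, then every remaining combination, and no combination is run twice.

```python
from itertools import chain, product
from typing import Callable, Dict, Iterable, Iterator, Optional, Sequence

Settings = Dict[str, object]


def search_order(known: Iterable[Settings],
                 domains: Dict[str, Sequence]) -> Iterator[Settings]:
    """Yield known combinations first, then all remaining ones, no repeats."""
    names = list(domains)
    known_keys = (tuple(combo[n] for n in names) for combo in known
                  if all(n in combo for n in names))
    seen = set()
    for key in chain(known_keys, product(*(domains[n] for n in names))):
        if key not in seen:
            seen.add(key)
            yield dict(zip(names, key))


def first_green(known: Iterable[Settings], domains: Dict[str, Sequence],
                count_errors: Callable[[Settings], int]) -> Optional[Settings]:
    for settings in search_order(known, domains):
        if count_errors(settings) == 0:  # one (hypothetical) coala run
            return settings
    return None
```

If another project with the same style has already contributed its combination, the green value is found after a single coala run instead of a full sweep.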

`Quickclass`

It is created dynamically by the metaclass bearclass and groups the settings into Types 1 to 4.

The Brute Force way and further optimization on the Brute Force

The Brute Force Way

The Brute Force will perform the General Operations to generate the first complete "green" config file. coala will be run over and over again on the entire project, with all instances of bears and all combinations of settings running in parallel. As soon as all instances for a particular setting are done and a value is found for which no errors are generated, that value goes directly into the .coafile. If no value for a setting matches, the setting goes into the metadata and we drop either the optional setting or the whole bear (a bear has to be dropped when no "green" value can be found for one of its required settings).
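The per-setting part of this step can be sketched as follows. `run_coala` is a hypothetical hook that runs coala once for a candidate value and returns the error count. A thread pool is used because such a hook would most likely launch coala as a subprocess, not because the proposal prescribes threads. Passing candidate values in ascending order makes `greens[0]` the strictest green limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def resolve_setting(values: Sequence, run_coala: Callable[[object], int],
                    max_workers: int = 4) -> Tuple[List, Dict]:
    """Run coala once per candidate value, in parallel.

    Returns the green values (in the given order) and every value's weight.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        errors = list(pool.map(run_coala, values))
    weights = dict(zip(values, errors))
    greens = [value for value in values if weights[value] == 0]
    return greens, weights


def record_outcome(report: Dict, bear: str, setting: str, required: bool,
                   greens: List, weights: Dict) -> None:
    """Send a setting to the .coafile, or to the metadata and drop it."""
    if greens:
        report['coafile'].setdefault(bear, {})[setting] = greens[0]
        return
    # An optional setting is simply left out of the .coafile.
    report['quickfile'][f'{bear}.{setting}'] = weights
    if required:
        # No green value for a required setting: the whole bear is dropped.
        report['dropped_bears'].add(bear)
```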

To launch bears in this fashion, modified versions of some methods of Processing.py will have to be called instead, depending on whether the call stack includes methods from coala-quickstart.
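Python's inspect module can walk the call stack to make this check. The sketch below reports whether any caller belongs to a given package; passing `'coala_quickstart'` assumes that is the package name. How Processing.py would then switch to its modified methods is not shown.

```python
import inspect


def called_from(package: str) -> bool:
    """Return True if any frame up the call stack belongs to `package`."""
    frame = inspect.currentframe()
    try:
        frame = frame.f_back if frame is not None else None
        while frame is not None:
            module = frame.f_globals.get('__name__', '')
            if module == package or module.startswith(package + '.'):
                return True
            frame = frame.f_back
        return False
    finally:
        del frame  # avoid keeping frames alive through a reference cycle
```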

Improving upon the Brute Force

Premature optimization should be avoided, although some limitations of the brute-force method are clear from the beginning. We will implement the Brute Force as quickly as possible and, in the meantime, keep thinking of enhancements that detect more specific bear settings and reduce the total run time.

The "lite-mode"

The Brute Force is evidently going to take a very long time to run, especially on large projects. To address this, coala-quickstart will have a lite-mode. As a starting point, we assume that the finest level at which code style varies is the file type. Instead of guessing the settings from the entire project, for each file type we pick a set of files at random and run coala on them over and over again, looking for "green" values of the settings.
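The sampling step could look like the sketch below. It uses file extensions as a stand-in for "file type". The sample size `per_type` and the optional `seed` are hypothetical knobs; a fixed seed only makes runs reproducible. Files without an extension are grouped under the empty string.

```python
import os
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def sample_by_file_type(paths: Iterable[str], per_type: int = 5,
                        seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Group files by extension and pick up to per_type of each at random."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.splitext(path)[1].lower()].append(path)
    return {ext: rng.sample(files, min(per_type, len(files)))
            for ext, files in groups.items()}


print(sorted(sample_by_file_type(['a.py', 'b.py', 'c.js', 'Makefile'])))
# ['', '.js', '.py']
```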

If --lite-mode is unable to find a "green" value, it drops the setting or the bear and records the weights at the same time.

Successive runs of `--green-mode`

lite-mode should build upon the data from previous runs to generate a less error-prone .coafile (here, less error-prone means less likely to report errors in the project), so that running lite-mode several times is still faster than the Brute Force. We choose this approach because it is highly unlikely that files of the same type within a project follow different sets of code style rules.

If a .quickfile is present in the directory, successive runs will try to correct the configuration files (always preferring .coafile.new over .coafile), on the assumption that we want a config file that is even more specific than the one already present. If no config files are present, they will be built from scratch.

Successive runs of --lite-mode will check all settings again, on a new random selection of files. If a run finds a setting that conflicts with the config file, the setting is appended to the .quickfile. If the setting is already present in the .quickfile, the weights of its other values are added to the existing ones.
Successive runs of the Brute Force will assume that they are being run on the results of a few --lite-mode runs, and will only run for the settings listed in the .quickfile, adding to their weights.
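Adding weights across runs is a plain per-value sum. A `collections.Counter` per setting expresses this directly. The key format `Bear.setting` is a hypothetical choice.

```python
from collections import Counter
from typing import Dict


def merge_weights(quickfile: Dict[str, Counter],
                  run_result: Dict[str, Dict[str, int]]) -> Dict[str, Counter]:
    """Add the weights from one --lite-mode run to those already stored."""
    for key, weights in run_result.items():
        quickfile.setdefault(key, Counter()).update(weights)
    return quickfile


stored: Dict[str, Counter] = {}
merge_weights(stored, {'SpaceConsistencyBear.use_spaces': {'True': 3, 'False': 10}})
merge_weights(stored, {'SpaceConsistencyBear.use_spaces': {'True': 2}})
print(stored['SpaceConsistencyBear.use_spaces']['True'])  # 5
```

Different runs can sample some of the same files, so their inconsistencies are counted more than once. This is one reason the summed weights can run high.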

The `--smart-mode`

Successive runs of --lite-mode will clearly overestimate the weights in the .quickfile. It therefore seems advantageous to run --lite-mode several times and then run the Brute Force only for the settings dropped by --lite-mode (which are now in the .quickfile), to compute their correct weights. This combination of --lite-mode and Brute Force is performed by --smart-mode.
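The orchestration can be sketched as follows. Both hooks are hypothetical. `lite_run` returns the settings one lite run moved to the .quickfile. `brute_force_weights` runs the Brute Force on the whole project for a single setting and returns its exact weights, which replace the overestimated ones.

```python
from typing import Callable, Dict, Set


def smart_mode(lite_run: Callable[[], Set[str]],
               brute_force_weights: Callable[[str], Dict[str, int]],
               runs: int = 3) -> Dict[str, Dict[str, int]]:
    """Run --lite-mode `runs` times, then recompute exact weights."""
    dropped: Set[str] = set()
    for _ in range(runs):
        dropped |= lite_run()
    # Only the settings the lite runs could not settle pay the full cost.
    return {setting: brute_force_weights(setting) for setting in sorted(dropped)}
```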

Every run will end with the annoying question of whether the user wants to answer a few questions about the project to get better results, unless this is changed with the provided tags. (For more details check out the tags section.)

Learning from real world projects

Need to perform this task

There is an endless number of possibilities and assumptions:

Files with similar names may have different code styles.
Files in a given directory may follow different code styles.
Different parts of a file may even follow different code styles depending on the function name; this case could also be handled by placing ignores at appropriate places in the file dict.

Such functionality could only be added to the bears themselves, so we stop this train of thought here.

Making Quickstart recognize common patterns

Rather than building features that may hardly ever be used by any org, we stop assuming and start implementing, learning from the resources available to us.

We now choose a list of orgs on which we test the green mode ourselves. We check whether the Brute Force takes too much time and whether --lite-mode or --smart-mode produces correct (i.e. green) results. We choose orgs that are very well known or that act as upstream repositories for a large number of other orgs. Many orgs may mimic the code style of these upstream orgs, so if our optimizations solve the problem of generating green config files for them, we are solving it for those other orgs and communities at the same time.

We learn manually from these orgs which combinations of bear setting values they use, whether they use them throughout the project or only in certain scenarios, and whether different directories or different file naming patterns in their project require different sets of settings. We then give coala-quickstart the ability to recognize these scenarios in future runs.

Engaging newcomers to spread the word about coala

We can only look at a finite number of orgs within the coding period, so the last few weeks of the project should be spent writing docs and adding newcomer tasks, either for opening PRs with "green" config files in other orgs or for feeding our repository with QUICKINFO.yaml files.

List of tags introduced

--green-mode or --green or -gm: Invokes the "Green mode" for coala-quickstart
--lite-mode or --lite or -lm: To be used alongside --green-mode. Accepts an integer parameter indicating the number of times the lite-mode will be run on the project; falls back to a default value if the parameter is omitted.
--smart-mode or --smart or -sm: To be used alongside --green-mode. Accepts an integer parameter indicating the number of times the lite-mode will be run on the project before the Brute Force; falls back to a default value if the parameter is omitted.
-d: Don't ask the annoying question.
-a: Just ask the annoying question. Takes a SEVERITY value as a parameter.
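These tags map naturally onto Python's argparse. This sketch makes several assumptions. The default run count is hypothetical. SEVERITY is assumed to be an integer, since its mapping is still undecided. Making -d and -a mutually exclusive is a design choice of the sketch, not something the proposal specifies.

```python
import argparse
from typing import List, Optional

DEFAULT_RUNS = 3  # hypothetical default; the proposal leaves the value open


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='coala-quickstart')
    parser.add_argument('--green-mode', '--green', '-gm', action='store_true',
                        help='invoke the "Green mode"')
    parser.add_argument('--lite-mode', '--lite', '-lm', type=int, nargs='?',
                        const=DEFAULT_RUNS, metavar='RUNS',
                        help='number of lite-mode runs')
    parser.add_argument('--smart-mode', '--smart', '-sm', type=int, nargs='?',
                        const=DEFAULT_RUNS, metavar='RUNS',
                        help='number of lite-mode runs before the Brute Force')
    ask = parser.add_mutually_exclusive_group()
    ask.add_argument('-d', dest='dont_ask', action='store_true',
                     help="don't ask the annoying question")
    ask.add_argument('-a', dest='ask_severity', type=int, metavar='SEVERITY',
                     help='just ask the annoying question')
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if (args.lite_mode is not None or args.smart_mode is not None) \
            and not args.green_mode:
        parser.error('--lite-mode and --smart-mode need --green-mode')
    return args
```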
