Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 1554 generate progress output initial draft #10134

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

jzohrab
Copy link

@jzohrab jzohrab commented Dec 11, 2024

This is a DRAFT PR addressing issue #1554 ("generate progress").

It adds the following section to pylint --help :

Progress:
  Options related to progress reporting

  --show-progress       Show progress bar. (default: False)
  --show-current-file   Show name of current file being processed. (default: False)

And a sample output from the unit tests using both of these flags is as follows:

        Checking 2 modules.
        1 of 2
        tests/regrtest_data/wrong_import_position.py
        ************* Module wrong_import_position
        tests/regrtest_data/wrong_import_position.py:11:0: C0413: Import "import os" should be placed at the top of the module (wrong-import-position)
        2 of 2
        tests/regrtest_data/import_something.py

This is totally primitive, but maybe it's a useful starting point for concrete discussion.

The code has a few # NOTE: comments for discussion.

Cheers and thanks for the great project!

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool idea! I really don't want to sound too negative, but I don't thinks this will work.

Most importantly because _lint_files is too late into the process. The call to _get_asts just before the call to _lint_files already does a lot of the heavy lifting of processing a module. In fact, the simple fact that that method can call self.add_message shows that it is technically part of the linting process.

I do think you hint at an option that is worth exploring. What if we turn _iterate_file_descrs into not being an Iterator but just returning a Sequence? We already take a Sequence as input so it is not as if we would be turning something that is lazily evaluated into something that is not.
After that we can check whether verbose is set and just print something like Linting ... modules.

That would be a feasible step towards full progress printing I think: we'd start off by actually correctly identifying the amount of work and reporting this in verbose mode. After this we can worry about printing progress itself.

Since we would only be printing one line I think it should technically also be feasible to do this in a multiprocessing safe manner.

@jzohrab
Copy link
Author

jzohrab commented Dec 18, 2024

Hi @DanielNoord - thanks for taking the time to review the draft idea. It's tough to manage time for FOSS projects so I appreciate it.

I really don't want to sound too negative, but I don't thinks this will work.

No sweat about being negative, that's why this PR is draft. I was considering changing the _iterate_file_descrs to return a sequence but didn't want to be too invasive, but if you're ok with that I'll do that. I think it's fine: it's just a list of files, so having a full list vs an iterator shouldn't be a problem. I'll make that change, possibly with a few more tweaks, and will repush. Cheers!

@jzohrab
Copy link
Author

jzohrab commented Dec 18, 2024

Is there a better way, or a possible way, to do all of the CI checks locally? It's rather hit-and-miss to push something that I hope passes ci formatting/mypy checks. This is a note for developer docs. 👋

@jzohrab
Copy link
Author

jzohrab commented Dec 18, 2024

Feels like this is getting more reasonable as a discussion piece. As @DanielNoord mentioned above, linting is two parts, building the AST, and then actually linting. Rather than try to combine those, instead I've opted to report the progress of each of those. Here's what the test prints, when using --show-progress --show-current-file:

Get ASTs for 2 files.
tests/regrtest_data/wrong_import_position.py (1 of 2)
tests/regrtest_data/import_something.py (2 of 2)
Linting 2 modules.
tests/regrtest_data/wrong_import_position.py (1 of 2)
************* Module wrong_import_position
{module2}:11:0: C0413: Import "import os" should be placed at the top of the module (wrong-import-position)
tests/regrtest_data/import_something.py (2 of 2)

I haven't done the parallel stuff yet, b/c we're still sorting out the basics.

I did end up pulling out a simple "progress reporter" class, just to keep the main linting module clearer.

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'm not a big fan of the new options, is there a reason for not just enabling this through verbose?

As for getting this merged: do you mind splitting off some of the changes into separate PRs? The changes to _iterate_file_descrs seem sensible and can be done in a separate PR. That reduces the scope of this PR and makes it easier to get people to discuss the actual "meat" of the PR without getting distracted by simple refactorings.

@jzohrab
Copy link
Author

jzohrab commented Dec 18, 2024

Personally I'm not a big fan of the new options, is there a reason for not just enabling this through verbose?

No problem, and no real reason. I figured that this was a new thing and so should get its own settings, but I can see how others would feel it's clutter. I'll remove those new settings, if someone wants them back in that could be a later iteration.

do you mind splitting off some of the changes into separate PRs?

Makes sense. I think it's just a few commits, I'll get it all together and squash it to a single thing.

When the above is done, there will be two separate PRs - the progress reporting one will contain the commits of the iterate_file_descrs change, I'll note that in the PR. Then I'll close this one. Cheers!

@DanielNoord
Copy link
Collaborator

Personally I'm not a big fan of the new options, is there a reason for not just enabling this through verbose?

No problem, and no real reason. I figured that this was a new thing and so should get its own settings, but I can see how others would feel it's clutter. I'll remove those new settings, if someone wants them back in that could be a later iteration.

Perfect!

do you mind splitting off some of the changes into separate PRs?

Makes sense. I think it's just a few commits, I'll get it all together and squash it to a single thing.

When the above is done, there will be two separate PRs - the progress reporting one will contain the commits of the iterate_file_descrs change, I'll note that in the PR. Then I'll close this one. Cheers!

You can select the "base branch" in the Github UI. If you set it to the branch of the first PR in the second PR we will only see the commits that are introduced by the PR. Github should automatically reset the "base branch" after we merge the first PR.
This helps declutter the PR and focus attention.

Once again, I mostly request this because I have seen way too many PRs fail due to them containing too much changes at once. Review time for pylint is somewhat limited so it really helps to split off changes into easily reviewable chunks so we can at least start merging stuff. I hope you don't mind the additional work.

@jzohrab
Copy link
Author

jzohrab commented Dec 18, 2024

Yeah, understood about PRs that are too big, I see that a lot as well. This was a draft for discussion, never intended to be merged itself, so it's worked out well.

Thanks for the "base branch", will check it out.

Some help needed though: I'm currently having some trouble finding out how to the get whether or not "verbose" is set in pylinter. pylinter's self.config.verbose appears to always be None, no matter if I pass in --verbose or not in my test run. It appears that that flag is only used by the runner. pylinter only refers to verbose as an argument to generate_reports. Possible workarounds:

  • leave the "--report-progress" flag :-P
  • I could change the API of the actual check to take a verbose argument, but that seems like a bad idea.
  • change the definition of verbose in pylint/lint/base_options to be a store_true, so it works like the progress flag, but that also seems like a bad idea as I don't know what else is involved.

In the meantime, I'll split out the non-progress-code into a separate PR.

Returning a materialized list instead of an iterator lets us do
simple progress reporting in the future.  As the list of FileItems
is very lightweight, there shouldn't be a performance hit.
@jzohrab jzohrab force-pushed the issue_1554_generate_progress_output_initial_draft branch from 4d14fa5 to d6dff7d Compare December 18, 2024 22:05
@jzohrab
Copy link
Author

jzohrab commented Dec 18, 2024

Opened #10142 for the change of _iterate_file_descrs.

@jzohrab
Copy link
Author

jzohrab commented Dec 18, 2024

This PR now only contains 3 commits: the commits for #10142, and the single draft commit. When I get some guidance on how to handle the verbose question, I'll fix that last commit in a separate non-draft PR.

@DanielNoord
Copy link
Collaborator

Let me get back to you on that verbose question. Might not be within 24 hours, that does indeed seem like quite the rabbit hole 😢

@jzohrab
Copy link
Author

jzohrab commented Dec 19, 2024

Let me get back to you on that verbose question.

I'll try another quick hack. We can assign verbose = True in the run.py, it's not a terrible solution. Will do that and push it.

@jzohrab
Copy link
Author

jzohrab commented Dec 19, 2024

I added self.verbose = False to the linter itself, and in run.py added linter.verbose = self.verbose.

Now several tests fail due to how they check stdout. Examples

FAILED tests/lint/test_pylinter.py::test_crash_during_linting - AssertionError: assert not 'Linting 1 modules.\n/Users/jeff/Documents/Projects/pylint/tests/lint/test_pylinter.py (1 of 1)\n'

FAILED tests/test_self.py::TestRunTC::test_output_file_can_be_combined_with_output_format_option[text-{path}:4:4: W0612: Unused variable 'variable' (unused-variable)] - AssertionError: Unexpected output to stdout/stderr while output option was set

For the purposes of reducing test code thrash in this PR, I feel that keeping the --show-progress flag might be the best option, at least in the interim, as updating the tests to match the new verbose flag doesn't feel like it adds much value. Reporting progress is good, but for the tests, the important thing is verifying the actual linting work.

Some of these failures will have different resolutions:

  • AssertionError: Unexpected output to stdout/stderr while output option was set. For tests that call def _test_output_file(, I don't know if it's valid to check for empty command line output. I assume that progress reporting should not be directed to the file, it should go to stderr or stdout. Perhaps the current assert cmdline_output == "" in that test function could strip out progress reporting lines.
  • output length: some tests check the number of lines output -- maybe just showing the results would be best (?)

All my own opinion, of course 👋

@jzohrab jzohrab force-pushed the issue_1554_generate_progress_output_initial_draft branch from 816f103 to 4a75af2 Compare December 19, 2024 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants