Set the --testcasefilter directly through the command line #3091

saranyanandagopal · 2024-11-01T15:16:59Z

Is your feature request related to a problem? Please describe.
We can’t run Stryker .NET in pull requests due to the long execution time. Developers don’t get immediate feedback on unit test quality, and late feedback (after nightly runs) is often ignored.

Describe the solution you'd like
Allow setting --testcasefilter directly via the command line. This would let us use git diff to dynamically select test files and run Stryker, providing immediate feedback in pull request builds.

Describe alternatives you've considered
Using --since is impractical because it requires running all unit tests initially, which is too slow for pull request builds in a large project with ~4000 tests.

Additional context
Even though the concurrency is set, I don't think it is being used for the initial test run adding to the longer execution time

dupdob · 2024-11-02T21:29:34Z

Have you tried using the test-case-filter configuration option?

If yes, why do you think it should be also available via the command line ?

saranyanandagopal · 2024-11-04T14:40:50Z

Yes we use the test-case-filter option for local testing. To ensure that mutation testing provides timely feedback in Pull request builds, setting the test-case-filter option via the command line will be the most efficient approach. We will be able to dynamically read the test file names from the git commit and set the test-case-filter value in the pipeline for Stryker run.

This method will allow us to focus on specific tests that cover the code changes, reducing the total duration of the mutation testing process to approximately 5 minutes. This is significantly faster than using the --since option, which takes up to 2 hours.

By integrating mutation testing with the test-case-filter option into our CI/CD pipeline, we can provide developers with quick feedback, allowing them to address any issues promptly without disrupting their workflow.

candoumbe · 2024-11-04T15:26:39Z

@saranyanandagopal
I have a question regarding this feature request (that I would love to come to life) : how are you able to determine which mutation tests you should run given the git commit ?
I've only used --since option so far and have avoided running mutation tests on PR because of how long it did take to run to completion

dupdob · 2024-11-04T15:46:41Z

Yes we use the test-case-filter option for local testing. To ensure that mutation testing provides timely feedback in Pull request builds, setting the test-case-filter option via the command line will be the most efficient approach. We will be able to dynamically read the test file names from the git commit and set the test-case-filter value in the pipeline for Stryker run.

How do you identify which tests need to be run? Stryker runs all tests in the init phase for coverage analysis. That is the only sure way to identify which tests cover which mutants. Stryker would need to run tests that have not been changed.

That being said, you could request for another change: I think Stryker runs all tests twice: once to establish a baseline result, which tests fail and how long they take. And then to discover coverage. Both could be done in a single pass, but we since a typical Stryker run will requires running the tests several times, having one extra run for convenience and stability was ok.
But if your test run is long, we could remove one pass.

saranyanandagopal · 2024-11-04T15:49:58Z

@candoumbe We have a naming pattern for our unit test files where all filenames include the word "fixture". I have verified that the following method works on our Azure DevOps Windows machine to identify if any test fixtures are being checked in.

          git fetch origin +refs/heads/*:refs/remotes/origin/*
          $baseBranch = "origin/main"              
          $currentBranch = "$(Build.SourceBranchName)"
          git checkout $currentBranch
          $diffCommand = "git diff --name-only $baseBranch..$currentBranch"
          Write-Host "Executing command: $diffCommand"
      $files = Invoke-Expression $diffCommand | Select-String -Pattern "fixture"
          if ($files) {
            $filesString = $files -join "|"
            Write-Host "Fixture Files found: $filesString"                
          } else {
            $filesString = ""
            Write-Host "Fixture Files NOT found"  
          }
          Write-Host "##vso[task.setvariable variable=fixtureFiles]$filesString"

I can then build the $testCaseFilterOption (testing in progress) and plug it into the Stryker command.

script: |
$fixtureFiles = "$(filesString)"
if ($fixtureFiles -ne "") {
$testCaseFilters = $fixtureFiles -split " " | ForEach-Object { "FullyQualifiedName~$_" } -join " | "
$testCaseFilterOption = "--test-case-filter "$testCaseFilters""
} else {
$testCaseFilterOption = ""
}
$strykerCommand = "dotnet stryker " +
"--config-file ""../../stryker-config.json"" " + "--solution ../../1-backend.sln " +
$testCaseFilterOption + " " + "--reporter html " +
"--reporter dots " + "--output $(Agent.BuildDirectory)/stryker " +
"--concurrency 4 " + `
"--open-report"
Write-Host "Executing command: $strykerCommand"
Invoke-Expression $strykerCommand

saranyanandagopal · 2024-11-04T15:59:29Z

@dupdob I have seen Stryker running "all" unit tests in the init phase when --since is enabled. In our case it takes closer to 2 hours as we have around 5000 unit tests. But I don't think it does that when the test-case-filter value is passed. Because it finishes in ~5 minutes and we can see the mutation report with the score for that part of the code base.

Thanks for the suggestion on cutting down a pass. It will certainly improve the mutation testing duration (helpful in our nightly builds as we can trigger it for multiple projects), but it still won't help our Pull request builds. Anything that takes more than 10 minutes cannot be integrated in Pull Request builds as it will slow down the feature development.

My above answer has the command that I tested in the Azure Devops Pipeline to identify the test file names to pass to the test-case-filter command.

dupdob · 2024-11-04T16:54:06Z

If I understand your script correctly, you will only run modified tests. There is a significant risk that it will to incorrect result. Stryker needs to run all tests covering a mutation to certify its outcome.

The intent behind the --test-filter option is to focus on fast running tests (i.e. unit tests, usually). Not to adjust it dynamically.

Here is another possible optimization request: Stryker could skip non modified project, i.e. only test projects (incl test projects) that have at least one modified file.

saranyanandagopal · 2024-11-04T19:44:18Z

Thank you for your quicker responses!

We have around 4000 tests in one test project and about 1000 in the second. If only the first project has file changes, I assume Stryker would still need to run all 4000 tests. Is that correct?

I understand your point about the test-case-filter. Our unit tests are organized in such a way that each test fixture corresponds to a single source code file, keeping related tests together.

The risk of missing tests that cover all mutants is very low because of this organization. Running partial mutation tests in pull requests is better than not running them at all. We also have a nightly mutation testing pipeline that runs all tests, giving us a complete overview. However, the most critical time to catch issues is before merging pull requests. This approach helps us spot bugs early as we shift left. Stryker can add significant value to teams adopting a shift-left strategy by reducing the likelihood of bugs reaching integration testing environments. So far, the only faster option that has worked for us is the test-case-filter.

Sorry for being persistent :) but I’ve tested multiple options and haven’t found a faster solution.

dupdob · 2024-11-04T21:16:59Z

No worries for being persistent, I am too.
Yes, your assumption is right, but breaking test projects in smaller ones would reduce this issue.

But what I want you to understand, is that the mutation score will often be false, and probably worse that it should be using a normal run.

Test filter is supposed to be the same between baseline and diff, otherwise score and results are inconsistent.

saranyanandagopal · 2024-11-05T17:50:52Z

You are absolutely right; we need to break our test projects into smaller ones, and it is on the roadmap. It's a huge effort, but we don't want to delay our Stryker integration until then.

Stryker optimization that you are proposing will help! I will be happy to add another feature request for it.

However, I would like to show you some examples of our test-case-filter usage on a specific source code file.

First pass.
Second pass.
Third pass

I couldn't find the snapshot for our fourth pass, but we were able to incrementally improve the mutation score for the specific source code file. Additionally, the mutation run of everything without the test-case-filter shows results that are consistent with our independent test-case-filter runs.

Mutation run without test-case-filter:

I still feel this is a good option for Pull Request runs :)

dupdob · 2024-11-06T08:42:25Z

Thanks for taking the time for this. To be clear: I am not saying that you cannot see progress; I even think that if you see progress, it is actual progress.
I am saying that you may see false regression. I probably have misunderstood your logic, but what happens if a PR only touches some existing production code and do no touch any tests? for fixing an issue for example.

saranyanandagopal · 2024-11-06T15:51:15Z

Got it! When changes are made to the existing code, it’s expected that the corresponding tests are also updated in the same PR. If the tests aren’t updated, they might break during the CI build, requiring the author to review and fix them. Also, Our on-demand pipelines will be manually triggered for regression testing before production releases or hot fixes. This means we’ll have smaller CI builds throughout the day, with a longer regression run conducted separately.

dupdob · 2024-11-06T16:01:15Z

Code can be changed without breaking existing unit tests, otherwise you would not need mutation testing.
And your logic implies a 100% existing coverage, a target that is rarely met.

Again, I am just saying that however you look at this, using a different filter between full mutation testing and PRs means you may get misleading result for the PR.

In practice, if a PR author finds his/her result to be wrong, he/she would still be able to touch the proper test files to get a more accurate result.
By the same token, he/she will still be able to touch several test files to improve his/her score, but this risk is inherent with the PR logic, whatever the underlying algorithm.

That being, I am just a maintainer for this project, and I feel I cannot make this decision . @rouke-broersma and @richardwerkman : WDYT about exposing --testcasefilter as an option ?

This is not difficult to make, but users have to be warned that they may get weird result when using diff mode with varying filter.

saranyanandagopal · 2024-11-06T16:25:15Z

You are correct about the coverage assumption; we do not have 100% test coverage. Stryker has been instrumental in identifying areas with no coverage, and we’ve improved this over the past few weeks. I understand your point about the differences in results between the test-case-filter run and the full mutation run. With our nightly mutation run, our goal is to make any necessary adjustments after the PR but still get immediate feedback in the PRs. Thanks for your questions—they prompted me to consider some important points. I originally raised this in the Slack channel and logged it here at the request of @rouke-broersma

rouke-broersma · 2024-11-06T18:06:52Z

@dupdob I asked them to create the issue so we can have this discussion 👌. I share your concerns, however if we decide to add this in the command line I don't think a warning is necessary. A user could already achieve this by modifying the config file on the fly. The flag would imo not be advertised with the functionality that the OP has in mind, but if they use it like that it's at their own risk. The only big concern I have is the sheer number of commands line options. I feel like it has already grown past the amount we had back when we decided to remove the majority..

Ultimately I think the bigger concern (not sure if this is properly represented in this issue) is that our initial testrun is more than 4 times slower than when running with vstest. From what I understand from our conversation on slack the test case filter is a workaround for this problem. I don't remember if it's intentional but I think our initial testrun is not running with parallel execution enabled. If we can change that I think that could already alleviate some of the concerns.

saranyanandagopal · 2024-11-06T21:31:35Z

Parallelizing the initial test will accelerate Stryker testing, although it may not be faster for PR builds. vstest runs approximately 4,800 tests in parallel in about 20 minutes. Considering the time for mutant generation and the final test run, the total duration would be around 45-50 minutes, assuming no additional steps are involved. Does that seem accurate? This will still be an improvement for the regression runs though.

Apart from the test-case-filter, which has its own risks, I can’t think of any other solution to make this work for PR builds. I am open to testing other suggestions that you might have.

dupdob · 2024-11-06T21:51:15Z

Update

~~Initial testing is already running in //.~~
I just checked. Every test run is done with parallelization disabled. Motivations are: for initial test run, we need to have the total run time to establish a baseline for timeout computation.
For coverage analysis, it is disabled because is two tests, testA and testB, run in //, it will be difficult to attribute covered mutant to testA or testB (1).
For mutation test, it is disabled because there can be only one active mutation at any given time, and there is absolutely no way for Stryker to control which tests are run when during the session.

Axis of improvements

Actually, there are several options to improve performance both for baseline run and diff run. But most of them require significant effort. This is obvious, otherwise they would have already been done.

Safe ones

These improvements have little to no risk.

Skip initial test run

Initial test run could be merged with coverage analysis. Benefit: save the time of a full test run, have a better run time baseline (at it includes mutation switching related costs). Risks: Stryker run on project with no tests or too many failing tests will be aborted after mutation instead of before

Remove duplicate initial test runs

For solution mode, a test project will be run several times: one for each of the 'production' projects that it covers.
Stryker could use a caching logic to prevent rerunning these.
Benefit: could save several runs of each test project
Risks: none identified

Remove duplicate coverage analysis run

For solution mode, a test project will be run several times for coverage analysis: one for each of the 'production' projects that it covers.
Stryker could be modified to use a single coverage analysis for all project in a single run. This probably requires an important refactoring of the internal logic as it is per project based (as of today).
Benefit: could save several runs of each test project Risks: none identified, beside the complexity of the redesign

Diff mode for runs

Stryker could use diff results to skip test project that have not been modified and that do not cover any modified project.
This requires a significant design effort.
Benefits: would save a lot of test runs when diff is very small. Would save little in the general case.

rouke-broersma · 2024-11-07T08:08:36Z

@saranyanandagopal I assume you also run your unit tests separately for your pull request builds? We could add an option to skip the initial testrun, leaving the consequence of invalid results with the user. This would save you the additional time of the initial testrun however the coverage run still needs to be performed.

@dupdob I thought so too but it looks like we currently limit cpu to 1 during initial testrun, only test discovery is run with multiple cores:

These are the runsettings for the initial testrun:

<RunSettings>
<RunConfiguration>
  <CollectSourceInformation>false</CollectSourceInformation>

<DesignMode>false</DesignMode>
<MaxCpuCount>1</MaxCpuCount>
<TargetFrameworkVersion>net8.0</TargetFrameworkVersion>
  <InIsolation>true</InIsolation>
<DisableAppDomain>true</DisableAppDomain>
</RunConfiguration>
</RunSettings>

I think this is a bug right?

dupdob · 2024-11-07T08:23:24Z

No, it is not a bug. It is mandatory to get the proper default test run time.
I updated my previous comment to a deeper analysis

rouke-broersma · 2024-11-07T10:46:01Z

Of course, good points.

saranyanandagopal · 2024-11-07T20:24:35Z

Hi @rouke-broersma I understand the concern behind parallelization. We do run unit tests separately in pull request builds. I can test the time it takes when the duplicate initial test runs are removed.

saranyanandagopal added the 🚀 Feature request New feature or request label Nov 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set the --testcasefilter directly through the command line #3091

Set the --testcasefilter directly through the command line #3091

saranyanandagopal commented Nov 1, 2024

dupdob commented Nov 2, 2024 •

edited

Loading

saranyanandagopal commented Nov 4, 2024 •

edited

Loading

candoumbe commented Nov 4, 2024

dupdob commented Nov 4, 2024

saranyanandagopal commented Nov 4, 2024

saranyanandagopal commented Nov 4, 2024 •

edited

Loading

dupdob commented Nov 4, 2024

saranyanandagopal commented Nov 4, 2024

dupdob commented Nov 4, 2024

saranyanandagopal commented Nov 5, 2024

dupdob commented Nov 6, 2024

saranyanandagopal commented Nov 6, 2024

dupdob commented Nov 6, 2024

saranyanandagopal commented Nov 6, 2024

rouke-broersma commented Nov 6, 2024 •

edited

Loading

saranyanandagopal commented Nov 6, 2024

dupdob commented Nov 6, 2024 •

edited

Loading

rouke-broersma commented Nov 7, 2024 •

edited

Loading

dupdob commented Nov 7, 2024 •

edited

Loading

rouke-broersma commented Nov 7, 2024

saranyanandagopal commented Nov 7, 2024

Set the --testcasefilter directly through the command line #3091

Set the --testcasefilter directly through the command line #3091

Comments

saranyanandagopal commented Nov 1, 2024

dupdob commented Nov 2, 2024 • edited Loading

saranyanandagopal commented Nov 4, 2024 • edited Loading

candoumbe commented Nov 4, 2024

dupdob commented Nov 4, 2024

saranyanandagopal commented Nov 4, 2024

saranyanandagopal commented Nov 4, 2024 • edited Loading

dupdob commented Nov 4, 2024

saranyanandagopal commented Nov 4, 2024

dupdob commented Nov 4, 2024

saranyanandagopal commented Nov 5, 2024

dupdob commented Nov 6, 2024

saranyanandagopal commented Nov 6, 2024

dupdob commented Nov 6, 2024

saranyanandagopal commented Nov 6, 2024

rouke-broersma commented Nov 6, 2024 • edited Loading

saranyanandagopal commented Nov 6, 2024

dupdob commented Nov 6, 2024 • edited Loading

Update

Axis of improvements

Safe ones

Skip initial test run

Remove duplicate initial test runs

Remove duplicate coverage analysis run

Diff mode for runs

rouke-broersma commented Nov 7, 2024 • edited Loading

dupdob commented Nov 7, 2024 • edited Loading

rouke-broersma commented Nov 7, 2024

saranyanandagopal commented Nov 7, 2024

dupdob commented Nov 2, 2024 •

edited

Loading

saranyanandagopal commented Nov 4, 2024 •

edited

Loading

saranyanandagopal commented Nov 4, 2024 •

edited

Loading

rouke-broersma commented Nov 6, 2024 •

edited

Loading

dupdob commented Nov 6, 2024 •

edited

Loading

rouke-broersma commented Nov 7, 2024 •

edited

Loading

dupdob commented Nov 7, 2024 •

edited

Loading