Test Robustness Metric (Mutation Testing) #197

@gltanaka

Problem

Code coverage does not measure test quality. Coverage records which lines the tests execute, not whether the tests check the results, so a suite can reach 100% coverage and still catch few or no bugs.

From the Dec 13 Benchmarking Meeting:

"The coverage part. So coding coverage is not a very good metric because in a lot of cases, the coverage is very high. But the robustness of the test is very low. For example, if I made small mistake, whether the test cases can report this bug."

Proposed Solution

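Integrate mutation testing, which measures how many deliberately injected bugs the tests detect:

  • Make small changes to the code (mutations), such as swapping an operator
  • Run the tests against each mutated version
  • Measure the % of mutations caught (the mutation score)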
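The three steps above can be sketched with Python's standard `ast` module. This is a toy illustration under stated assumptions, not how mutmut or cosmic-ray work internally: it supports only two operator swaps, and `shipping_cost`, `weak_tests`, and `strong_tests` are hypothetical names invented for the example.

```python
import ast

# Hypothetical module under test, kept as source text so it can be mutated.
SOURCE = '''
def shipping_cost(weight, express):
    cost = weight * 2
    if express:
        cost = cost + 5
    return cost
'''

# Toy mutation operators: swap one arithmetic operator for another.
SWAPS = {ast.Mult: ast.Div, ast.Add: ast.Sub}


class SwapOneOperator(ast.NodeTransformer):
    """Mutate only the target-th swappable binary operator."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if type(node.op) in SWAPS:
            if self.seen == self.target:
                node.op = SWAPS[type(node.op)]()
            self.seen += 1
        return node


def count_sites(tree):
    """Count operators that the toy mutator knows how to change."""
    return sum(
        isinstance(n, ast.BinOp) and type(n.op) in SWAPS for n in ast.walk(tree)
    )


def load(tree):
    """Compile a (possibly mutated) tree and return shipping_cost."""
    namespace = {}
    code = compile(ast.fix_missing_locations(tree), "<mutant>", "exec")
    exec(code, namespace)
    return namespace["shipping_cost"]


def weak_tests(fn):
    assert fn(3, False) == 6


def strong_tests(fn):
    assert fn(3, False) == 6
    assert fn(3, True) == 11


def run_mutation_testing(source, tests):
    """Return (killed, survived) for one mutant per mutation site."""
    killed = 0
    sites = count_sites(ast.parse(source))
    for i in range(sites):
        mutant = load(SwapOneOperator(i).visit(ast.parse(source)))
        try:
            tests(mutant)
        except Exception:
            killed += 1  # a failing or crashing test means the mutant was caught
    return killed, sites - killed


print("weak tests:  ", run_mutation_testing(SOURCE, weak_tests))
print("strong tests:", run_mutation_testing(SOURCE, strong_tests))
```

With the weak tests, the mutant that changes `cost + 5` to `cost - 5` survives because no test exercises the express branch's result; adding one assertion kills it.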

Tools to Evaluate

  • mutmut (Python mutation testing)
  • cosmic-ray (Python)

Implementation

pdd test --robustness MODULE  # Run mutation testing

Example Output

Mutation Score: 78%
Survived Mutations: 12
Killed Mutations: 43
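For reference, the score in the example output is consistent with killed / (killed + survived). A small sketch of that arithmetic follows; the function name `mutation_score` is hypothetical, and it assumes every mutant is either killed or survived (real tools may also report timeouts or skipped mutants, which this ignores).

```python
def mutation_score(killed, survived):
    """Percentage of mutants caught by the tests."""
    total = killed + survived
    if total == 0:
        raise ValueError("no mutants were generated")
    return 100 * killed / total


# Counts taken from the example output above: 43 killed, 12 survived.
print(f"Mutation Score: {mutation_score(43, 12):.0f}%")
```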

Use Case

Before regenerating a module, verify that tests are robust enough to catch regressions. A high mutation score gives confidence that regenerated code will be properly validated.

Priority

Low - this is a nice-to-have after core benchmarking infrastructure is in place.

Metadata

Labels

enhancement (New feature or request)
