Feature/evaluation metrics #18

Merged: 31 commits into main from feature/evaluation_metrics, Dec 9, 2024

Conversation

@fluegelk (Contributor) commented Nov 26, 2024

Adds computation of evaluation metrics from confusion matrices. The following metrics are added; their interfaces are based on the corresponding sklearn.metrics functions:

  • Accuracy: the global accuracy
  • Balanced Accuracy: the accuracy computed as the average of the class-wise recalls
  • Precision, Recall, and Fβ-Score: with the following averaging options (average parameter)
    • None: no averaging, return class-wise values
    • "micro": compute metrics globally → equal importance on each sample
    • "macro": compute metrics class-wise, then average over classes → equal importance on each class, minority classes can outweigh majority classes
    • "weighted": compute metrics class-wise, then average over classes weighted by their support (#true samples)
    • As we are focusing on multi-class classification, the averaging options "binary" and "samples" are not included.
  • Cohen's Kappa: compares the classification to random guessing; values range from -1 to +1 (higher is better); robust to class imbalance but not originally intended for classification problems
  • Matthews Correlation Coefficient (MCC): see https://en.wikipedia.org/wiki/Phi_coefficient; values range from -1 to +1 (higher is better); robust to class imbalance

Solves Issue #16
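
For reference, here is a minimal sketch of how these metrics can be computed from a multi-class confusion matrix (rows: true classes, columns: predicted classes). The function names are illustrative and not taken from specialcouscous/evaluation_metrics.py, which follows the sklearn.metrics interfaces described above; edge cases such as classes without any predicted samples are not handled here.

```python
import numpy as np


def accuracy_from_confusion_matrix(cm: np.ndarray) -> float:
    """Global accuracy: fraction of correctly classified samples."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())


def balanced_accuracy_from_confusion_matrix(cm: np.ndarray) -> float:
    """Balanced accuracy: unweighted mean of the class-wise recalls."""
    cm = np.asarray(cm, dtype=float)
    return float((np.diag(cm) / cm.sum(axis=1)).mean())


def precision_recall_fbeta_from_confusion_matrix(cm, beta=1.0, average=None):
    """Precision, recall, and F-beta from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)               # true positives per class
    fn = cm.sum(axis=1) - tp       # false negatives: row sum minus diagonal
    fp = cm.sum(axis=0) - tp       # false positives: column sum minus diagonal
    support = cm.sum(axis=1)       # number of true samples per class

    if average == "micro":
        # Pool the counts over all classes -> every sample carries equal weight.
        tp, fn, fp = tp.sum(), fn.sum(), fp.sum()

    beta2 = beta**2
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-beta directly from counts instead of (1 + beta^2) * precision * recall / (beta^2 * precision + recall).
    fbeta = (1 + beta2) * tp / ((1 + beta2) * tp + beta2 * fn + fp)

    if average == "macro":
        # Unweighted mean over classes -> every class carries equal weight.
        return precision.mean(), recall.mean(), fbeta.mean()
    if average == "weighted":
        # Mean over classes, weighted by support (number of true samples per class).
        weights = support / support.sum()
        return (precision * weights).sum(), (recall * weights).sum(), (fbeta * weights).sum()
    return precision, recall, fbeta  # average=None: class-wise arrays; "micro": pooled scalars


# Toy example: 3 classes, rows are true labels, columns are predicted labels.
cm = np.array([[5, 1, 0], [2, 3, 1], [0, 0, 8]])
print(accuracy_from_confusion_matrix(cm))                                # 0.8
print(balanced_accuracy_from_confusion_matrix(cm))                       # ~0.78
print(precision_recall_fbeta_from_confusion_matrix(cm, average="macro"))
```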

fluegelk and others added 24 commits November 21, 2024 11:47
- Restructure precision_recall_fscore tests by data instead of averaging
- Add test for balanced data (true class distribution is balanced but predictions are not)
by using (1+β²) TP / ((1+β²) TP + β² FN + FP) instead of  (1+β²) precision • recall / (β² precision + recall)
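
As far as I can tell, the point of the count-based formula is that it stays well defined when a class has no true positives: precision and recall are then both zero and the precision/recall form degenerates to 0/0, while the two formulas agree otherwise. A small illustrative comparison (hypothetical helper names, not from this repository):

```python
import numpy as np


def fbeta_from_counts(tp, fp, fn, beta=1.0):
    """F-beta from raw confusion-matrix counts; returns 0 when there are no true positives."""
    beta2 = beta**2
    denominator = (1 + beta2) * tp + beta2 * fn + fp
    return (1 + beta2) * tp / denominator if denominator > 0 else 0.0


def fbeta_from_precision_recall(precision, recall, beta=1.0):
    """Textbook formula; undefined (0/0) when precision and recall are both zero."""
    beta2 = beta**2
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


# The two formulations agree whenever there is at least one true positive ...
tp, fp, fn = 8, 2, 4
precision, recall = tp / (tp + fp), tp / (tp + fn)
assert np.isclose(fbeta_from_counts(tp, fp, fn), fbeta_from_precision_recall(precision, recall))

# ... but only the count-based version yields a defined result for tp == 0:
print(fbeta_from_counts(0, 3, 5))  # 0.0 instead of a ZeroDivisionError / nan
```
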
@fluegelk requested a review from mcw92 November 26, 2024 15:50

github-actions bot commented Nov 26, 2024

| Name | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| specialcouscous/__init__.py | 0 | 0 | 100% | |
| specialcouscous/evaluation_metrics.py | 66 | 1 | 98% | 139 |
| specialcouscous/rf_parallel.py | 119 | 9 | 92% | 89-93, 194, 198, 271-273, 447 |
| specialcouscous/synthetic_classification_data.py | 206 | 49 | 76% | 88-90, 185, 304-324, 358, 469, 471, 561-567, 585, 869-883, 1079-1135, 1202-1224 |
| specialcouscous/train.py | 257 | 3 | 99% | 395-396, 522 |
| specialcouscous/utils/__init__.py | 59 | 31 | 47% | 31, 81-82, 106-274 |
| specialcouscous/utils/plot.py | 129 | 74 | 43% | 97, 222-247, 264-350, 366-492 |
| specialcouscous/utils/result_handling.py | 22 | 1 | 95% | 79 |
| specialcouscous/utils/slurm.py | 79 | 72 | 9% | 22-116, 133-149, 166-177 |
| specialcouscous/utils/timing.py | 35 | 0 | 100% | |
| **TOTAL** | 972 | 240 | 75% | |

@codecov-commenter commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 98.48485% with 1 line in your changes missing coverage. Please review.

Project coverage is 75.30%. Comparing base (bf7366a) to head (d3e30d6).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| specialcouscous/evaluation_metrics.py | 98.48% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
+ Coverage   73.62%   75.30%   +1.68%     
==========================================
  Files           8        9       +1     
  Lines         906      972      +66     
==========================================
+ Hits          667      732      +65     
- Misses        239      240       +1     


@mcw92 (Member) left a comment

Looks good to me -- I made some small changes to the docstrings to appease the MyPy pre-commit hook. In addition, I removed macOS from the test build matrix, as one of the macOS tests fails for dubious MPI reasons or simply times out; the Python version at which it fails differs between runs, and I have no idea why that is right now. As this is not super important and I want the tests to work in the main branch, I will merge your branch now and work on implementing the confusion matrix.

@mcw92 merged commit 5c7ce6a into main Dec 9, 2024
4 checks passed
@mcw92 deleted the feature/evaluation_metrics branch December 9, 2024 11:45