Feature/evaluation metrics #18

Merged: 31 commits into main from feature/evaluation_metrics, Dec 9, 2024

Conversation

@fluegelk (Contributor) commented Nov 26, 2024

Adds computation of evaluation metrics from confusion matrices. The following metrics are added; their interfaces are based on the corresponding sklearn.metrics functions:

  • Accuracy: the global accuracy
  • Balanced Accuracy: the accuracy computed as the average of the class-wise recalls
  • Precision, Recall, and Fβ-Score: with the following averaging options (average parameter)
    • None: no averaging, return class-wise values
    • "micro": compute metrics globally → equal importance on each sample
    • "macro": compute metrics class-wise, then average over classes → equal importance on each class, minority classes can outweigh majority classes
    • "weighted": compute metrics class-wise, then average over classes weighted by their support (#true samples)
    • As we are focusing on multi-class classification, the averaging options "binary" and "samples" are not included.
  • Cohen's Kappa: compares the classification to random guessing; values range from -1 to +1 (higher is better); robust to class imbalance but not originally intended for classification problems
  • Matthews Correlation Coefficient (MCC): see https://en.wikipedia.org/wiki/Phi_coefficient; values range from -1 to +1 (higher is better); robust to class imbalance

Solves Issue #16
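
For reference, here is a minimal sketch of how these metrics can be computed from a multi-class confusion matrix (rows: true classes, columns: predicted classes). The function names are illustrative and not taken from specialcouscous/evaluation_metrics.py, which follows the sklearn.metrics interfaces described above; edge cases such as classes without any predicted samples are not handled here.

```python
import numpy as np


def accuracy_from_confusion_matrix(cm: np.ndarray) -> float:
    """Global accuracy: fraction of correctly classified samples."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())


def balanced_accuracy_from_confusion_matrix(cm: np.ndarray) -> float:
    """Balanced accuracy: unweighted mean of the class-wise recalls."""
    cm = np.asarray(cm, dtype=float)
    return float((np.diag(cm) / cm.sum(axis=1)).mean())


def precision_recall_fbeta_from_confusion_matrix(cm, beta=1.0, average=None):
    """Precision, recall, and F-beta from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)               # true positives per class
    fn = cm.sum(axis=1) - tp       # false negatives: row sum minus diagonal
    fp = cm.sum(axis=0) - tp       # false positives: column sum minus diagonal
    support = cm.sum(axis=1)       # number of true samples per class

    if average == "micro":
        # Pool the counts over all classes -> every sample carries equal weight.
        tp, fn, fp = tp.sum(), fn.sum(), fp.sum()

    beta2 = beta**2
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-beta directly from counts instead of (1 + beta^2) * precision * recall / (beta^2 * precision + recall).
    fbeta = (1 + beta2) * tp / ((1 + beta2) * tp + beta2 * fn + fp)

    if average == "macro":
        # Unweighted mean over classes -> every class carries equal weight.
        return precision.mean(), recall.mean(), fbeta.mean()
    if average == "weighted":
        # Mean over classes, weighted by support (number of true samples per class).
        weights = support / support.sum()
        return (precision * weights).sum(), (recall * weights).sum(), (fbeta * weights).sum()
    return precision, recall, fbeta  # average=None: class-wise arrays; "micro": pooled scalars


# Toy example: 3 classes, rows are true labels, columns are predicted labels.
cm = np.array([[5, 1, 0], [2, 3, 1], [0, 0, 8]])
print(accuracy_from_confusion_matrix(cm))                                # 0.8
print(balanced_accuracy_from_confusion_matrix(cm))                       # ~0.78
print(precision_recall_fbeta_from_confusion_matrix(cm, average="macro"))
```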

fluegelk and others added 24 commits November 21, 2024 11:47
- Restructure precision_recall_fscore tests by data instead of averaging
- Add test for balanced data (true class distribution is balanced but predictions are not)
by using (1+β²) TP / ((1+β²) TP + β² FN + FP) instead of  (1+β²) precision • recall / (β² precision + recall)
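
As far as I can tell, the point of the count-based formula is that it stays well defined when a class has no true positives: precision and recall are then both zero and the precision/recall form degenerates to 0/0, while the two formulas agree otherwise. A small illustrative comparison (hypothetical helper names, not from this repository):

```python
import numpy as np


def fbeta_from_counts(tp, fp, fn, beta=1.0):
    """F-beta from raw confusion-matrix counts; returns 0 when there are no true positives."""
    beta2 = beta**2
    denominator = (1 + beta2) * tp + beta2 * fn + fp
    return (1 + beta2) * tp / denominator if denominator > 0 else 0.0


def fbeta_from_precision_recall(precision, recall, beta=1.0):
    """Textbook formula; undefined (0/0) when precision and recall are both zero."""
    beta2 = beta**2
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


# The two formulations agree whenever there is at least one true positive ...
tp, fp, fn = 8, 2, 4
precision, recall = tp / (tp + fp), tp / (tp + fn)
assert np.isclose(fbeta_from_counts(tp, fp, fn), fbeta_from_precision_recall(precision, recall))

# ... but only the count-based version yields a defined result for tp == 0:
print(fbeta_from_counts(0, 3, 5))  # 0.0 instead of a ZeroDivisionError / nan
```
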
@fluegelk requested a review from mcw92 November 26, 2024 15:50

github-actions bot commented Nov 26, 2024

| Name | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| specialcouscous/__init__.py | 0 | 0 | 100% | |
| specialcouscous/evaluation_metrics.py | 66 | 1 | 98% | 139 |
| specialcouscous/rf_parallel.py | 119 | 9 | 92% | 89-93, 194, 198, 271-273, 447 |
| specialcouscous/synthetic_classification_data.py | 206 | 49 | 76% | 88-90, 185, 304-324, 358, 469, 471, 561-567, 585, 869-883, 1079-1135, 1202-1224 |
| specialcouscous/train.py | 257 | 3 | 99% | 395-396, 522 |
| specialcouscous/utils/__init__.py | 59 | 31 | 47% | 31, 81-82, 106-274 |
| specialcouscous/utils/plot.py | 129 | 74 | 43% | 97, 222-247, 264-350, 366-492 |
| specialcouscous/utils/result_handling.py | 22 | 1 | 95% | 79 |
| specialcouscous/utils/slurm.py | 79 | 72 | 9% | 22-116, 133-149, 166-177 |
| specialcouscous/utils/timing.py | 35 | 0 | 100% | |
| **TOTAL** | 972 | 240 | 75% | |

@codecov-commenter commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 98.48485% with 1 line in your changes missing coverage. Please review.

Project coverage is 75.30%. Comparing base (bf7366a) to head (d3e30d6).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| specialcouscous/evaluation_metrics.py | 98.48% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
+ Coverage   73.62%   75.30%   +1.68%     
==========================================
  Files           8        9       +1     
  Lines         906      972      +66     
==========================================
+ Hits          667      732      +65     
- Misses        239      240       +1     


@mcw92 (Member) left a comment

Looks good to me -- I made some small changes to the docstrings to appease the MyPy pre-commit hook. In addition, I removed macOS from the test build matrix, as one of the macOS tests fails for dubious MPI reasons or simply times out; the Python version at which it fails differs between runs, and I have no idea why that is right now. As this is not super important and I want the tests to work in the main branch, I will merge your branch now and work on implementing the confusion matrix.

@mcw92 merged commit 5c7ce6a into main Dec 9, 2024
4 checks passed
@mcw92 deleted the feature/evaluation_metrics branch December 9, 2024 11:45