Feature/evaluation metrics #18
Conversation
- Restructure precision_recall_fscore tests by data instead of averaging
- Add test for balanced data: true class distribution is balanced but the predictions are not
- Compute Fβ as (1 + β²) · TP / ((1 + β²) · TP + β² · FN + FP) instead of (1 + β²) · precision · recall / (β² · precision + recall); see the sketch after this list
- for more information, see https://pre-commit.ci
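The count-based form of Fβ in the commit above is algebraically identical to the precision/recall form (substitute P = TP / (TP + FP) and R = TP / (TP + FN) and multiply through by (TP + FP)(TP + FN)), but it stays well-defined when a class has no true positives, no false positives, and no false negatives, where the precision/recall form divides zero by zero. A minimal NumPy sketch of the idea; the name fbeta_from_counts is hypothetical and not the PR's actual API:

```python
import numpy as np

def fbeta_from_counts(tp, fp, fn, beta=1.0):
    # F-beta directly from per-class confusion-matrix counts.
    # Algebraically equal to (1+b^2)*P*R / (b^2*P + R) with
    # P = TP/(TP+FP) and R = TP/(TP+FN), but yields 0 instead
    # of 0/0 when TP = FP = FN = 0 for a class.
    tp = np.asarray(tp, dtype=float)
    fp = np.asarray(fp, dtype=float)
    fn = np.asarray(fn, dtype=float)
    b2 = beta**2
    denom = (1 + b2) * tp + b2 * fn + fp
    return np.divide((1 + b2) * tp, denom,
                     out=np.zeros_like(denom), where=denom > 0)

# Per-class counts for three classes; class 1 never occurs and is
# never predicted, so its F-score comes out as 0 rather than NaN.
print(fbeta_from_counts(tp=[5, 0, 2], fp=[1, 0, 3], fn=[2, 0, 1]))
# [0.76923077 0.         0.5       ]
```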
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
+ Coverage   73.62%   75.30%   +1.68%
==========================================
  Files           8        9       +1
  Lines         906      972      +66
==========================================
+ Hits          667      732      +65
- Misses        239      240       +1

☔ View full report in Codecov by Sentry.
Looks good to me -- I made some small changes to the docstrings to appease the mypy pre-commit hook. In addition, I removed macOS from the test build matrix, as one of the macOS tests either fails for dubious MPI reasons or simply times out; the affected Python version differs between runs and I have no idea why that is right now. As this is not super important and I want the tests to pass on the main branch, I will merge your branch now and work on implementing the confusion matrix.
Adds computation of evaluation metrics from confusion matrices. The following metrics are added; their interfaces are based on the corresponding sklearn.metrics functions. Supported averaging options (average parameter):

- None: no averaging, return class-wise values
- "micro": compute metrics globally → equal importance on each sample
- "macro": compute metrics class-wise, then average over classes → equal importance on each class, so minority classes can outweigh majority classes
- "weighted": compute metrics class-wise, then average over classes weighted by their support (number of true samples)

"binary" and "samples" are not included. A sketch of how these averaging modes relate is given after this description.

Solves Issue #16