Use balanced accuracy with adult census dataset #785

Open
wants to merge 2 commits into
base: main
from

Conversation

ArturoAmorQ
Collaborator

Inspired by this question in the forum, this PR uses balanced accuracy instead of accuracy to score a logistic regression pipeline that includes an interaction feature engineering step.

Using balanced accuracy as the scoring metric feels more natural to me, as it has already been used for the adult census dataset in previous notebooks.

I also took the opportunity to tweak the general wording.

@glemaitre
Collaborator

I would not recommend using balanced accuracy because it is an arbitrary metric. I would no longer advise it in scikit-learn either.

Here, I don't think that the class ratio is that catastrophic. We could therefore compare against the accuracy of a DummyClassifier that predicts the majority class, and mention that any predictive model needs to perform above this accuracy level to show that it captures some signal.
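A minimal sketch of that comparison, for illustration only. It assumes a synthetic dataset from `make_classification` with roughly 75% of samples in the majority class as a stand-in for the adult census data, and a plain scaled logistic regression rather than the notebook's actual pipeline; the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic imbalanced data standing in for the adult census
# dataset (about 75% of the samples belong to the majority class).
X, y = make_classification(
    n_samples=2_000,
    n_features=10,
    n_informative=5,
    n_redundant=0,
    n_clusters_per_class=1,
    weights=[0.75, 0.25],
    class_sep=1.5,
    random_state=0,
)

# Baseline: always predict the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent")

# Candidate model: a scaled logistic regression (not the notebook's pipeline).
model = make_pipeline(StandardScaler(), LogisticRegression())

# Mean cross-validated accuracy of each estimator on the same data.
baseline_accuracy = cross_val_score(
    baseline, X, y, cv=5, scoring="accuracy"
).mean()
model_accuracy = cross_val_score(
    model, X, y, cv=5, scoring="accuracy"
).mean()

print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
print(f"Logistic regression accuracy:     {model_accuracy:.3f}")
```

The baseline accuracy lands close to the majority-class proportion, so a model is only picking up signal if its accuracy sits clearly above that level.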

2 participants