Use balanced accuracy with adult census dataset #785

ArturoAmorQ · 2024-12-09T15:19:46Z

Inspired by this question in the forum, this PR uses balanced accuracy instead of accuracy to score a logistic regression pipeline with an interaction feature engineering step.

It feels more natural to me to use balanced accuracy as the scoring metric, since it has already been used with the adult census dataset in previous notebooks.
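A minimal sketch of the kind of comparison this PR is about. It assumes a synthetic, imbalanced stand-in for the adult census data (`make_classification` with roughly a 76% / 24% class split) rather than the real dataset, and a hypothetical pipeline of scaling, interaction-only `PolynomialFeatures` and `LogisticRegression`. This is not the notebook's actual code. It scores the same pipeline with both `"accuracy"` and `"balanced_accuracy"`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: synthetic stand-in for the adult census target,
# with roughly 76% of samples in the majority class
X, y = make_classification(
    n_samples=2_000, n_features=5, n_informative=3, weights=[0.76], random_state=0
)

# Hypothetical pipeline: interaction-only feature engineering + logistic regression
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(max_iter=1_000),
)

# Same model, two scoring metrics
scores_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
scores_bal = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
print(f"accuracy:          {scores_acc.mean():.3f} +/- {scores_acc.std():.3f}")
print(f"balanced accuracy: {scores_bal.mean():.3f} +/- {scores_bal.std():.3f}")
```

Only the `scoring` string changes between the two calls, so switching metrics does not require touching the pipeline itself.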

I also took the opportunity to tweak the general wording.

glemaitre · 2024-12-13T22:11:03Z

I would not recommend using balanced accuracy because it is an arbitrary metric. I would no longer advise it in scikit-learn either.
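For reference when weighing this argument: with the default `adjusted=False`, scikit-learn's `balanced_accuracy_score` is the unweighted mean of the per-class recall, so every class counts equally regardless of its frequency. A small sketch on toy labels (made up for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Toy labels (hypothetical): 6 samples of class 0, 4 samples of class 1
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# Per-class recall: 5/6 for class 0 and 2/4 for class 1
per_class_recall = recall_score(y_true, y_pred, average=None)
balanced = balanced_accuracy_score(y_true, y_pred)

# Balanced accuracy equals the plain mean of per-class recall: (5/6 + 2/4) / 2 = 2/3,
# whereas plain accuracy is 7/10 because it weights each sample equally
print(per_class_recall, balanced, accuracy_score(y_true, y_pred))
```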

Here, I don't think the class imbalance is that catastrophic: we could therefore compare against the accuracy of a DummyClassifier that predicts the majority class, and mention that any predictive model needs to perform above this accuracy level to show that it captures some signal.
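A minimal sketch of the suggested baseline comparison. As before, it assumes a synthetic, imbalanced stand-in for the adult census data and a hypothetical logistic regression pipeline with interaction features; only `DummyClassifier(strategy="most_frequent")` and plain accuracy come from the suggestion itself:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: synthetic stand-in with roughly 76% of samples in the majority class
X, y = make_classification(
    n_samples=2_000, n_features=5, n_informative=3, weights=[0.76], random_state=0
)

# Baseline that always predicts the majority class: its accuracy is the class ratio
dummy = DummyClassifier(strategy="most_frequent")

# Hypothetical model under evaluation
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(max_iter=1_000),
)

dummy_acc = cross_val_score(dummy, X, y, cv=5, scoring="accuracy").mean()
model_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
print(f"majority-class baseline accuracy: {dummy_acc:.3f}")
print(f"model accuracy:                   {model_acc:.3f}")
```

Any accuracy at or below the baseline line means the model has not captured signal beyond the class ratio, which keeps the plain accuracy score interpretable without switching metrics.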

Use balanced accuracy with adult census dataset

fc5eb58

ArturoAmorQ mentioned this pull request Dec 11, 2024

Add notebooks to "Regularization in linear model" section #787

Open

Improve wording

239a7b9
