
WIP Use score in tree hyperparameter notebook #503

Open · wants to merge 1 commit into base: main
Conversation

@glemaitre (Collaborator) commented on Jan 6, 2022

This PR isolates the call to score in the hyperparameter notebook.
It is linked to this comment: #464 (comment)

In this PR, we should therefore address the concern of @ogrisel:

I don't see the point of measuring the scores only on the training set. Here we speak about hyper-parameter tuning, so it would be confusing to only display the training score. I think this notebook needs to be reworked to do a train-test split, and the plots should display both training and test errors, or neither.

Maybe the plots should be duplicated so that each shows 2 subplots: one with the prediction function displayed on top of a scatter plot of the training samples (with the training score in the title), and another with the same prediction function displayed on top of a scatter plot of the testing samples (with the testing score in the title).

And then we should comment on those scores to summarize the impact of the hyper-parameters in terms of the overfitting / underfitting trade-off.
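
To make the proposal above concrete, here is a minimal sketch of the two-subplot layout, assuming a single numerical feature and reusing the notebook's apparent names (data_reg, data_reg_columns, target_reg_column, tree_reg, max_depth); the synthetic data below is only a placeholder for the notebook's dataset and is not part of the PR.

```python
# Sketch of the train/test variant proposed above. Assumed names mirror the
# notebook; the synthetic data is a stand-in for the notebook's dataset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
data_reg_columns, target_reg_column = ["Feature"], "Target"
x = rng.uniform(-2, 2, size=200)
data_reg = pd.DataFrame(
    {"Feature": x, "Target": np.sin(x) + rng.normal(scale=0.3, size=x.size)}
)

data_train, data_test = train_test_split(data_reg, random_state=0)

max_depth = 2
tree_reg = DecisionTreeRegressor(max_depth=max_depth)
tree_reg.fit(data_train[data_reg_columns], data_train[target_reg_column])

# Prediction function evaluated on a fine grid, displayed on both subplots.
grid = pd.DataFrame({"Feature": np.linspace(x.min(), x.max(), 300)})
grid_predictions = tree_reg.predict(grid)

fig, axes = plt.subplots(ncols=2, figsize=(12, 4), sharey=True)
for ax, subset, name in zip(axes, (data_train, data_test), ("training", "testing")):
    # Training score on the left panel, testing score on the right panel.
    score = tree_reg.score(subset[data_reg_columns], subset[target_reg_column])
    ax.scatter(subset["Feature"], subset["Target"], color="black", alpha=0.5)
    ax.plot(grid["Feature"], grid_predictions, color="tab:orange", linewidth=2)
    ax.set_title(
        f"Tree with max-depth of {max_depth}"
        f"\nR$^2$ on the {name} set: {score:.2f}"
    )
    ax.set_xlabel("Feature")
axes[0].set_ylabel("Target")
plt.show()
```

Comparing the two titles for several values of max_depth is what would support the requested discussion of the overfitting / underfitting trade-off.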

@lesteve changed the title from "Use score in tree hyperpatameter notebook" to "Use score in tree hyperparameter notebook" on Jan 6, 2022
@ogrisel changed the title from "Use score in tree hyperparameter notebook" to "WIP Use score in tree hyperparameter notebook" on Jan 7, 2022
@ogrisel (Collaborator) left a comment

I started to review this PR but I noticed that it requires blackification first.

Comment on lines +128 to +133
accuracy = tree_reg.score(data_reg[data_reg_columns], data_reg[target_reg_column])

_ = plt.title(
    f"Shallow regression tree with max-depth of {max_depth}"
    f"\n R$^2$ of the fit: {accuracy:.2f}"
)

Suggested change
- accuracy = tree_reg.score(data_reg[data_reg_columns], data_reg[target_reg_column])
- _ = plt.title(
-     f"Shallow regression tree with max-depth of {max_depth}"
-     f"\n R$^2$ of the fit: {accuracy:.2f}"
- )
+ r2 = tree_reg.score(data_reg[data_reg_columns], data_reg[target_reg_column])
+ _ = plt.title(
+     f"Shallow regression tree with max-depth of {max_depth}"
+     f"\n R$^2$ of the fit: {r2:.2f}"
+ )
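
The suggested rename reflects that score for a DecisionTreeRegressor returns the coefficient of determination R², not a classification accuracy. A minimal check illustrating this (not part of the PR; toy data assumed):

```python
# Quick check, on assumed toy data, that a regressor's score() is R^2,
# which motivates naming the variable r2 rather than accuracy.
import numpy as np
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(100, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=100)

tree_reg = DecisionTreeRegressor(max_depth=2).fit(X, y)
assert np.isclose(tree_reg.score(X, y), r2_score(y, tree_reg.predict(X)))
```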

Comment on lines +159 to +164
accuracy = tree_reg.score(data_reg[data_reg_columns], data_reg[target_reg_column])

_ = plt.title(
    f"Shallow regression tree with max-depth of {max_depth}"
    f"\n R$^2$ of the fit: {accuracy:.2f}"
)

Suggested change
- accuracy = tree_reg.score(data_reg[data_reg_columns], data_reg[target_reg_column])
- _ = plt.title(
-     f"Shallow regression tree with max-depth of {max_depth}"
-     f"\n R$^2$ of the fit: {accuracy:.2f}"
- )
+ r2 = tree_reg.score(data_reg[data_reg_columns], data_reg[target_reg_column])
+ _ = plt.title(
+     f"Shallow regression tree with max-depth of {max_depth}"
+     f"\n R$^2$ of the fit: {r2:.2f}"
+ )

Labels: none yet
Projects: none yet
2 participants