- download `data4.csv` from https://upenn.box.com/s/kp43egvarlv6bsgbxuchnbc9fcj9iave and put it in the `data/` folder
- you can also generate the dataset from code:
    - enter the `data` folder
    - run `d1_data_acquisition.R` -> output `data_nflFastR_pbp_1999_2022.csv` (see the sketch after this list)
    - run `d2_data_acquisition.R` -> output `data2.csv`
    - run `d3a_data_TeamQualityMetrics_epa0HyperParamTuning.R` to tune the hyperparameters for the 8 hand-crafted team quality metrics
    - clear the environment workspace and run `d3b_data_TeamQualityMetrics_epa0.R` -> output `data3.csv`, which contains one initial train/test split column and 8 hand-crafted team quality metrics built from EPA0, fit on this initial training set
    - run `d4_data_drives.R` -> output `data4.csv`
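
For orientation, here is a minimal sketch of the acquisition and loading steps, assuming (as the script and output names suggest) that `d1_data_acquisition.R` pulls play-by-play data via the `nflfastR` package; the real logic lives in the scripts above:

```r
library(nflfastR)

# d1 (assumed): pull play-by-play data for the 1999-2022 seasons
pbp <- load_pbp(1999:2022)
write.csv(pbp, "data_nflFastR_pbp_1999_2022.csv", row.names = FALSE)

# after d2-d4 (or after downloading from Box), load the final
# drive-level dataset from the repo root
data4 <- read.csv("data/data4.csv")
```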
- enter the `model_comparison` folder
    - tune the XGB (XGBoost) params: run `param_tuning.R`, parallelized on a cluster via `run_param_tuning.sh`, then transfer the outputted `.yaml` files (which store the tuned params) from the `param_tuning_results` folder into the `param_tuning_results_FINAL` folder (see the sketch below)
        - the saved `.yaml` files that store the tuned XGB hyperparameters should already be in `param_tuning_results_FINAL`
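
      A minimal sketch of that transfer step, plus reading one tuned-parameter file back into R (assuming the `yaml` package is used to parse the files; the example file name is hypothetical):

      ```r
      # copy all tuned-hyperparameter .yaml files into the FINAL folder
      yaml_files <- list.files("param_tuning_results", pattern = "\\.yaml$",
                               full.names = TRUE)
      file.copy(yaml_files, "param_tuning_results_FINAL", overwrite = TRUE)

      # read one set of tuned XGB params back in as a named list
      # ("xgb_params_example.yaml" is a hypothetical file name)
      params <- yaml::read_yaml("param_tuning_results_FINAL/xgb_params_example.yaml")
      str(params)
      ```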
    - evaluate the EP models (prediction accuracy): run `eval_EP_models.R` (on a cluster via `run_eval_driveEP_models.sh`) -> output FIXME
    - train and save models on the full dataset: FIXME (see the sketch after this list)
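
A minimal sketch of what this training step might look like, assuming the R `xgboost` package and the tuned parameters from the `.yaml` files; the feature and label columns below are hypothetical stand-ins for whatever `model_comparison/train_full_models.R` actually uses:

```r
library(xgboost)

data4  <- read.csv("data/data4.csv")
params <- yaml::read_yaml("param_tuning_results_FINAL/xgb_params_example.yaml")

# hypothetical feature/label columns; the real ones are defined in
# model_comparison/train_full_models.R
X <- as.matrix(data4[, c("yardline_100", "down", "ydstogo")])
y <- data4$next_score  # hypothetical label column

dtrain <- xgb.DMatrix(data = X, label = y)
fit    <- xgb.train(params = params, data = dtrain, nrounds = 500)  # nrounds assumed
xgb.save(fit, "full_xgb_model.xgb")  # hypothetical output file name
```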
- enter the `plotting` folder
    - run `A_plot_EP.R` to visualize the EP models
        - before visualizing the XGBoost models, you need to train and save the full XGBoost models via `model_comparison/train_full_models.R`; some of these models should already be saved in the GitHub repo
    - run `A_plot_team_quality.R` to visualize our hand-crafted team quality metrics
    - run `A_plot_selection_bias.R` to visualize the selection bias induced by not adjusting for team quality
    - run `A_plot_summary_stats.R` to visualize some data summary statistics
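
To reproduce all of the figures in one go, a minimal sketch (assuming each plotting script is self-contained and writes its figures to disk when sourced):

```r
setwd("plotting")
for (script in c("A_plot_EP.R", "A_plot_team_quality.R",
                 "A_plot_selection_bias.R", "A_plot_summary_stats.R")) {
  # run each script in a fresh environment, mirroring the
  # "clear the environment workspace" advice above
  source(script, local = new.env())
}
```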