Cross-Validated Covariance Matrix Estimation
Authors: Philippe Boileau, Brian Collica, and Nima Hejazi
cvCovEst implements an efficient cross-validated procedure for
covariance matrix estimation, particularly useful in high-dimensional
settings. The methodology uses cross-validation to data-adaptively
identify the optimal estimator of the covariance matrix from a
prespecified set of candidate estimators. An overview of the framework
is provided in the package vignette. For a more detailed description,
see Boileau et al. (2021). A suite of plotting and diagnostic tools is
also included.
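The selection idea described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's implementation: the three candidate estimators, the helper `frob_loss`, and the toy data are all made up for the example, and the loss is a plain observation-level squared Frobenius distance.

```r
set.seed(1)
n <- 40
p <- 10
# toy data (assumption): independent standard normal columns
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# candidate estimators: each maps a data matrix to a p x p covariance estimate
candidates <- list(
  sample = function(d) stats::cov(d),
  shrink_half = function(d) 0.5 * stats::cov(d) + 0.5 * diag(ncol(d)),
  identity = function(d) diag(ncol(d))
)

# hypothetical loss: squared Frobenius distance between the estimate and the
# outer product of each validation observation, averaged over the fold
frob_loss <- function(est, valid) {
  mean(apply(valid, 1, function(obs) sum((tcrossprod(obs) - est)^2)))
}

# randomly assign each observation to one of v folds
v <- 5
fold_id <- sample(rep(seq_len(v), length.out = n))

# cross-validated risk: fit on the training folds, evaluate on the held-out fold
cv_risk <- sapply(candidates, function(estimator) {
  mean(sapply(seq_len(v), function(k) {
    train <- x[fold_id != k, , drop = FALSE]
    valid <- x[fold_id == k, , drop = FALSE]
    # center the validation fold with the training means
    valid <- sweep(valid, 2, colMeans(train))
    frob_loss(estimator(train), valid)
  }))
})

# the selected estimator is the candidate with the smallest estimated risk
selected <- names(which.min(cv_risk))
```

Here `cv_risk` is a named vector with one risk estimate per candidate, and `selected` holds the name of the candidate that minimizes it. The package automates this kind of comparison over its own estimators and loss functions.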
For standard use, install cvCovEst from CRAN:
install.packages("cvCovEst")
The development version of the package may be installed from GitHub
using remotes:
remotes::install_github("PhilBoileau/cvCovEst")
To illustrate how cvCovEst may be used to select an optimal covariance
matrix estimator via cross-validation, consider the following toy
example:
library(MASS)
library(cvCovEst)
set.seed(1584)
# generate a 50x50 covariance matrix with unit variances and off-diagonal
# elements equal to 0.5
Sigma <- matrix(0.5, nrow = 50, ncol = 50) + diag(0.5, nrow = 50)
# sample 50 observations from multivariate normal with mean = 0, var = Sigma
dat <- mvrnorm(n = 50, mu = rep(0, 50), Sigma = Sigma)
# run CV-selector
cv_cov_est_out <- cvCovEst(
  dat = dat,
  estimators = c(
    linearShrinkLWEst, denseLinearShrinkEst,
    thresholdingEst, poetEst, sampleCovEst
  ),
  estimator_params = list(
    thresholdingEst = list(gamma = c(0.2, 2)),
    poetEst = list(lambda = c(0.1, 0.2), k = c(1L, 2L))
  ),
  cv_loss = cvMatrixFrobeniusLoss,
  cv_scheme = "v_fold",
  v_folds = 5
)
# print the table of risk estimates
# NOTE: the estimated covariance matrix is accessible via the `$estimate` slot
cv_cov_est_out$risk_df
#> # A tibble: 9 × 3
#>   estimator            hyperparameters      cv_risk
#>   <chr>                <chr>                  <dbl>
#> 1 linearShrinkLWEst    hyperparameters = NA    357.
#> 2 poetEst              lambda = 0.2, k = 1     369.
#> 3 poetEst              lambda = 0.2, k = 2     372.
#> 4 poetEst              lambda = 0.1, k = 2     375.
#> 5 poetEst              lambda = 0.1, k = 1     376.
#> 6 denseLinearShrinkEst hyperparameters = NA    379.
#> 7 sampleCovEst         hyperparameters = NA    379.
#> 8 thresholdingEst      gamma = 0.2             384.
#> 9 thresholdingEst      gamma = 2               826.
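As a follow-up sketch, the selected estimate can be compared with the true covariance matrix used to simulate the data. This assumes the objects `cv_cov_est_out`, `dat`, and `Sigma` from the example above are still in the workspace; `frobenius_error` is a hypothetical helper written for this illustration, not a function provided by cvCovEst.

```r
# hypothetical helper: Frobenius norm of the difference between two matrices
frobenius_error <- function(est, truth) {
  sqrt(sum((est - truth)^2))
}

# only run the comparison when the objects from the example above exist
if (all(sapply(c("cv_cov_est_out", "dat", "Sigma"), exists))) {
  # row of the risk table with the smallest cross-validated risk
  best_row <- which.min(cv_cov_est_out$risk_df$cv_risk)
  print(cv_cov_est_out$risk_df[best_row, ])

  # error of the selected estimate versus the plain sample covariance matrix
  print(frobenius_error(cv_cov_est_out$estimate, Sigma))
  print(frobenius_error(stats::cov(dat), Sigma))
}
```

Comparing against the sample covariance matrix gives a rough sense of how much the cross-validated selection helps in this simulated setting, where the truth is known.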
If you encounter any bugs or have any specific feature requests, please file an issue.
Contributions are very welcome. Interested contributors should consult our contribution guidelines prior to submitting a pull request.
Please cite the following paper when using the cvCovEst R software
package.
@article{cvCovEst2021,
  doi = {10.21105/joss.03273},
  url = {https://doi.org/10.21105/joss.03273},
  year = {2021},
  publisher = {The Open Journal},
  volume = {6},
  number = {63},
  pages = {3273},
  author = {Philippe Boileau and Nima S. Hejazi and Brian Collica and Mark J. van der Laan and Sandrine Dudoit},
  title = {cvCovEst: Cross-validated covariance matrix estimator selection and evaluation in `R`},
  journal = {Journal of Open Source Software}
}
When describing or discussing the theory underlying the cvCovEst
method, or simply using the method, please cite the article below.
@article{boileau2022,
  author = {Philippe Boileau and Nima S. Hejazi and Mark J. van der Laan and Sandrine Dudoit},
  doi = {10.1080/10618600.2022.2110883},
  eprint = {https://doi.org/10.1080/10618600.2022.2110883},
  journal = {Journal of Computational and Graphical Statistics},
  number = {ja},
  pages = {1-28},
  publisher = {Taylor \& Francis},
  title = {Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions},
  url = {https://doi.org/10.1080/10618600.2022.2110883},
  volume = {0},
  year = {2022},
  bdsk-url-1 = {https://doi.org/10.1080/10618600.2022.2110883}
}
© 2020-2023 Philippe Boileau
The contents of this repository are distributed under the MIT license.
See file LICENSE.md for details.
Boileau, Philippe, Nima S. Hejazi, Mark J. van der Laan, and Sandrine Dudoit. 2021. “Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions.” https://arxiv.org/abs/2102.09715.