citation file added, description updated, nonprob function documentation updated
ncn-foreigners · Jan 30, 2025 · 1a40409
1 parent 418cdbd
commit 1a40409
Showing 8 changed files with 235 additions and 180 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -17,7 +17,7 @@ Authors@R:
              role = c("aut", "ctb"),
              email = "[email protected]",
              comment = c(ORCID = "0009-0006-4867-7434")))
-Description: Statistical inference with non-probability samples when auxiliary information from external sources such as probability samples or population totals or means is available. Details can be found in: Wu et al. (2020) <doi:10.1080/01621459.2019.1677241>, Kim et al. (2021) <doi:10.1111/rssa.12696>, Wu et al. (2023) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2022002/article/00002-eng.htm>, Kim et al. (2021) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2021001/article/00004-eng.htm>, Kim et al. (2020) <doi:10.1111/rssb.12354>.
+Description: Statistical inference with non-probability samples when auxiliary information from external sources such as probability samples or population totals or means is available. The package implements various methods such as inverse probability (propensity score) weighting, mass imputation and doubly robust approach. Details can be found in: Wu et al. (2020) <doi:10.1080/01621459.2019.1677241>, Kim et al. (2021) <doi:10.1111/rssa.12696>, Wu et al. (2023) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2022002/article/00002-eng.htm>, Kim et al. (2021) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2021001/article/00004-eng.htm>, Kim et al. (2020) <doi:10.1111/rssb.12354>.
 License: MIT + file LICENSE
 Encoding: UTF-8
 LazyData: true

diff --git a/NEWS.md b/NEWS.md
@@ -1,75 +1,129 @@
+# nonprobsvy News and Updates
+
 # nonprobsvy 0.2
 
 ------------------------------------------------------------------------
 
 ### Breaking changes
 
-- functions `pop.size`, `controlSel`, `controlOut` and `controlInf` were renamed to `pop_size`, `control_sel`, `control_out` and `control_inf` respectively.
-- function `genSimData` removed completely as it is not used anywhere in the package.
-- argument `maxLik_method` renamed to `maxlik_method` in the `control_sel` function.
+-   functions `pop.size`, `controlSel`, `controlOut` and `controlInf`
+    were renamed to `pop_size`, `control_sel`, `control_out` and
+    `control_inf` respectively.
+-   function `genSimData` removed completely as it is not used anywhere
+    in the package.
+-   argument `maxLik_method` renamed to `maxlik_method` in the
+    `control_sel` function.
 
 ### Features
 
-- two additional datasets have been included: `jvs` (Job Vacancy Survey; a probability sample survey) and `admin` (Central Job Offers Database; a non-probability sample survey). The units and auxiliary variables have been aligned in a way that allows the data to be integrated using the methods implemented in this package.
-- a `nonprobsvycheck` function was added to check the balance in the totals of the variables based on the weighted weights between the non-probability and probability samples.
-- citation file added.
+-   two additional datasets have been included: `jvs` (Job Vacancy
+    Survey; a probability sample survey) and `admin` (Central Job Offers
+    Database; a non-probability sample survey). The units and auxiliary
+    variables have been aligned in a way that allows the data to be
+    integrated using the methods implemented in this package.
+-   a `nonprobsvycheck` function was added to check the balance in the
+    totals of the variables based on the weighted weights between the
+    non-probability and probability samples.
+-   citation file added.
 
 ### Bugfixes
-- basic methods and functions related to variance estimation, weights and probability linking methods have been rewritten in a more optimal and readable way.
+
+-   basic methods and functions related to variance estimation, weights
+    and probability linking methods have been rewritten in a more
+    optimal and readable way.
 
 ### Other
-- more informative error messages added.
+
+-   more informative error messages added.
 
 ### Documentation
 
-- annotation has been added that arguments such as `strata`, `subset` and `na_action` are not supported for the time being.
+-   annotation has been added that arguments such as `strata`, `subset`
+    and `na_action` are not supported for the time being.
 
 # nonprobsvy 0.1.1
 
 ------------------------------------------------------------------------
 
 ### Bugfixes
-- bug Fix occurring when estimation was based on auxiliary variable, which led to compression of the data from the frame to the vector.
-- bug Fix related to not passing `maxit` argument from `controlSel` function to internally used `nleqslv` function
-- bug Fix related to storing `vector` in `model_frame` when predicting `y_hat` in mass imputation `glm` model when X is based in one auxiliary variable only - fix provided converting it to `data.frame` object.
-
+
+-   bug Fix occurring when estimation was based on auxiliary variable,
+    which led to compression of the data from the frame to the vector.
+-   bug Fix related to not passing `maxit` argument from `controlSel`
+    function to internally used `nleqslv` function
+-   bug Fix related to storing `vector` in `model_frame` when predicting
+    `y_hat` in mass imputation `glm` model when X is based in one
+    auxiliary variable only - fix provided converting it to `data.frame`
+    object.
+
 ### Features
-- added information to `summary` about quality of estimation basing on difference between estimated and known total values of auxiliary variables
-- added estimation of exact standard error for k-nearest neighbor estimator.
-- added breaking change to `controlOut` function by switching values for `predictive_match` argument. From now on, the `predictive_match = 1` means $\hat{y}-\hat{y}$ in predictive mean matching imputation and `predictive_match = 2` corresponds to $\hat{y}-y$ matching.
-- implemented `div` option when variable selection (more in documentation) for doubly robust estimation.
-- added more insights to `nonprob` output such as gradient, hessian and jacobian derived from IPW estimation for `mle` and `gee` methods when `IPW` or `DR` model executed.
-- added estimated inclusion probabilities and its derivatives for probability and non-probability samples to `nonprob` output when `IPW` or `DR` model executed.
-- added `model_frame` matrix data from probability sample used for mass imputation to `nonprob` when `MI` or `DR` model executed.
+
+-   added information to `summary` about quality of estimation basing on
+    difference between estimated and known total values of auxiliary
+    variables
+-   added estimation of exact standard error for k-nearest neighbor
+    estimator.
+-   added breaking change to `controlOut` function by switching values
+    for `predictive_match` argument. From now on, the
+    `predictive_match = 1` means $\hat{y}-\hat{y}$ in predictive mean
+    matching imputation and `predictive_match = 2` corresponds to
+    $\hat{y}-y$ matching.
+-   implemented `div` option when variable selection (more in
+    documentation) for doubly robust estimation.
+-   added more insights to `nonprob` output such as gradient, hessian
+    and jacobian derived from IPW estimation for `mle` and `gee` methods
+    when `IPW` or `DR` model executed.
+-   added estimated inclusion probabilities and its derivatives for
+    probability and non-probability samples to `nonprob` output when
+    `IPW` or `DR` model executed.
+-   added `model_frame` matrix data from probability sample used for
+    mass imputation to `nonprob` when `MI` or `DR` model executed.
 
 ### Unit tests
-- added unit tests for variable selection models and mi estimation with vector of population totals available
-
+
+-   added unit tests for variable selection models and mi estimation
+    with vector of population totals available
+
 # nonprobsvy 0.1.0
 
 ------------------------------------------------------------------------
 
 ### Features
 
--   implemented population mean estimation using doubly robust, inverse probability weighting and mass imputation methods
--   implemented inverse probability weighting models with Maximum Likelihood Estimation and Generalized Estimating Equations methods with `logit`, `complementary log-log` and `probit` link functions.
--   implemented `generalized linear models`, `nearest neighbours` and `predictive mean matching` methods for Mass Imputation
+-   implemented population mean estimation using doubly robust, inverse
+    probability weighting and mass imputation methods
+-   implemented inverse probability weighting models with Maximum
+    Likelihood Estimation and Generalized Estimating Equations methods
+    with `logit`, `complementary log-log` and `probit` link functions.
+-   implemented `generalized linear models`, `nearest neighbours` and
+    `predictive mean matching` methods for Mass Imputation
 -   implemented bias correction estimators for doubly-robust approach
--   implemented estimation methods when vector of population means/totals is available
--   implemented variables selection with `SCAD`, `LASSO` and `MCP` penalization equations
--   implemented `analytic` and `bootstrap` (with parallel computation - `doSNOW` package) variance for described estimators
+-   implemented estimation methods when vector of population
+    means/totals is available
+-   implemented variables selection with `SCAD`, `LASSO` and `MCP`
+    penalization equations
+-   implemented `analytic` and `bootstrap` (with parallel computation -
+    `doSNOW` package) variance for described estimators
 -   added control parameters for models
 -   added S3 methods for object of `nonprob` class such as
     -   `nobs` for samples size
     -   `pop.size` for population size estimation
-    -   `residuals` for residuals of the inverse probability weighting model
-    -   `cooks.distance` for identifying influential observations that have a significant impact on the parameter estimates
-    -   `hatvalues` for measuring the leverage of individual observations
+    -   `residuals` for residuals of the inverse probability weighting
+        model
+    -   `cooks.distance` for identifying influential observations that
+        have a significant impact on the parameter estimates
+    -   `hatvalues` for measuring the leverage of individual
+        observations
     -   `logLik` for computing the log-likelihood of the model,
-    -   `AIC` (Akaike Information Criterion) for evaluating the model based on the trade-off between goodness of fit and complexity, helping in model selection
-    -   `BIC` (Bayesian Information Criterion) for a similar purpose as AIC but with a stronger penalty for model complexity
-    -   `confint` for calculating confidence intervals around parameter estimates
-    -   `vcov` for obtaining the variance-covariance matrix of the parameter estimates
+    -   `AIC` (Akaike Information Criterion) for evaluating the model
+        based on the trade-off between goodness of fit and complexity,
+        helping in model selection
+    -   `BIC` (Bayesian Information Criterion) for a similar purpose as
+        AIC but with a stronger penalty for model complexity
+    -   `confint` for calculating confidence intervals around parameter
+        estimates
+    -   `vcov` for obtaining the variance-covariance matrix of the
+        parameter estimates
     -   `deviance` for assessing the goodness of fit of the model
 
 ### Unit tests