Commit

citation file added, description updated, nonprob function documentation updated
BERENZ committed Jan 30, 2025
1 parent 418cdbd commit 1a40409
Showing 8 changed files with 235 additions and 180 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -17,7 +17,7 @@ Authors@R:
role = c("aut", "ctb"),
email = "[email protected]",
comment = c(ORCID = "0009-0006-4867-7434")))
Description: Statistical inference with non-probability samples when auxiliary information from external sources such as probability samples or population totals or means is available. Details can be found in: Wu et al. (2020) <doi:10.1080/01621459.2019.1677241>, Kim et al. (2021) <doi:10.1111/rssa.12696>, Wu et al. (2023) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2022002/article/00002-eng.htm>, Kim et al. (2021) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2021001/article/00004-eng.htm>, Kim et al. (2020) <doi:10.1111/rssb.12354>.
Description: Statistical inference with non-probability samples when auxiliary information from external sources such as probability samples or population totals or means is available. The package implements various methods such as inverse probability (propensity score) weighting, mass imputation and doubly robust approach. Details can be found in: Wu et al. (2020) <doi:10.1080/01621459.2019.1677241>, Kim et al. (2021) <doi:10.1111/rssa.12696>, Wu et al. (2023) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2022002/article/00002-eng.htm>, Kim et al. (2021) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2021001/article/00004-eng.htm>, Kim et al. (2020) <doi:10.1111/rssb.12354>.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
124 changes: 89 additions & 35 deletions NEWS.md
@@ -1,75 +1,129 @@
# nonprobsvy News and Updates

# nonprobsvy 0.2

------------------------------------------------------------------------

### Breaking changes

- functions `pop.size`, `controlSel`, `controlOut` and `controlInf` were renamed to `pop_size`, `control_sel`, `control_out` and `control_inf` respectively.
- function `genSimData` removed completely as it is not used anywhere in the package.
- argument `maxLik_method` renamed to `maxlik_method` in the `control_sel` function.
- functions `pop.size`, `controlSel`, `controlOut` and `controlInf`
were renamed to `pop_size`, `control_sel`, `control_out` and
`control_inf` respectively.
- function `genSimData` removed completely as it is not used anywhere
in the package.
- argument `maxLik_method` renamed to `maxlik_method` in the
`control_sel` function.

### Features

- two additional datasets have been included: `jvs` (Job Vacancy Survey; a probability sample survey) and `admin` (Central Job Offers Database; a non-probability sample survey). The units and auxiliary variables have been aligned in a way that allows the data to be integrated using the methods implemented in this package.
- a `nonprobsvycheck` function was added to check the balance in the totals of the variables based on the weighted weights between the non-probability and probability samples.
- citation file added.
- two additional datasets have been included: `jvs` (Job Vacancy
Survey; a probability sample survey) and `admin` (Central Job Offers
Database; a non-probability sample survey). The units and auxiliary
variables have been aligned in a way that allows the data to be
integrated using the methods implemented in this package.
- a `nonprobsvycheck` function was added to check the balance in the
totals of the variables based on the weighted weights between the
non-probability and probability samples.
- citation file added.

### Bugfixes
- basic methods and functions related to variance estimation, weights and probability linking methods have been rewritten in a more optimal and readable way.

- basic methods and functions related to variance estimation, weights
and probability linking methods have been rewritten in a more
optimal and readable way.

### Other
- more informative error messages added.

- more informative error messages added.

### Documentation

- annotation has been added that arguments such as `strata`, `subset` and `na_action` are not supported for the time being.
- annotation has been added that arguments such as `strata`, `subset`
and `na_action` are not supported for the time being.

# nonprobsvy 0.1.1

------------------------------------------------------------------------

### Bugfixes
- fixed a bug that occurred when estimation was based on an auxiliary variable, which caused the data to be reduced from a data frame to a vector.
- fixed a bug related to the `maxit` argument not being passed from the `controlSel` function to the internally used `nleqslv` function.
- fixed a bug related to storing a `vector` in `model_frame` when predicting `y_hat` in the mass imputation `glm` model when X is based on one auxiliary variable only; the fix converts it to a `data.frame` object.


- fixed a bug that occurred when estimation was based on an auxiliary
variable, which caused the data to be reduced from a data frame to a
vector.
- fixed a bug related to the `maxit` argument not being passed from the
`controlSel` function to the internally used `nleqslv` function.
- fixed a bug related to storing a `vector` in `model_frame` when
predicting `y_hat` in the mass imputation `glm` model when X is based
on one auxiliary variable only; the fix converts it to a `data.frame`
object.

### Features
- added information to `summary` about the quality of estimation, based on the difference between estimated and known total values of auxiliary variables.
- added estimation of the exact standard error for the k-nearest neighbor estimator.
- breaking change in the `controlOut` function: the values of the `predictive_match` argument were switched. From now on, `predictive_match = 1` means $\hat{y}-\hat{y}$ matching in predictive mean matching imputation and `predictive_match = 2` corresponds to $\hat{y}-y$ matching.
- implemented the `div` option for variable selection (more in the documentation) in doubly robust estimation.
- added more insights to the `nonprob` output, such as the gradient, Hessian and Jacobian derived from IPW estimation for the `mle` and `gee` methods, when an `IPW` or `DR` model is executed.
- added estimated inclusion probabilities and their derivatives for probability and non-probability samples to the `nonprob` output when an `IPW` or `DR` model is executed.
- added the `model_frame` matrix data from the probability sample used for mass imputation to `nonprob` when an `MI` or `DR` model is executed.

- added information to `summary` about the quality of estimation, based
on the difference between estimated and known total values of
auxiliary variables.
- added estimation of the exact standard error for the k-nearest
neighbor estimator.
- breaking change in the `controlOut` function: the values of the
`predictive_match` argument were switched. From now on,
`predictive_match = 1` means $\hat{y}-\hat{y}$ matching in predictive
mean matching imputation and `predictive_match = 2` corresponds to
$\hat{y}-y$ matching.
- implemented the `div` option for variable selection (more in the
documentation) in doubly robust estimation.
- added more insights to the `nonprob` output, such as the gradient,
Hessian and Jacobian derived from IPW estimation for the `mle` and
`gee` methods, when an `IPW` or `DR` model is executed.
- added estimated inclusion probabilities and their derivatives for
probability and non-probability samples to the `nonprob` output when
an `IPW` or `DR` model is executed.
- added the `model_frame` matrix data from the probability sample used
for mass imputation to `nonprob` when an `MI` or `DR` model is
executed.
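
The two `predictive_match` variants mentioned in the list above can be sketched as follows. This is a generic illustration of predictive mean matching with a single nearest neighbour, not the package's exact implementation; the symbols $S_A$ (non-probability sample, where $y$ is observed), $S_B$ (probability sample), $\nu(i)$ (matched donor) and $\hat{y}_i$ (prediction from an outcome model fitted on $S_A$) are assumptions introduced here for illustration.

$$
\begin{aligned}
\hat{y}-\hat{y}\ \text{matching:}\quad & \nu(i) = \arg\min_{j \in S_A} \left| \hat{y}_i - \hat{y}_j \right|, \qquad i \in S_B, \\
\hat{y}-y\ \text{matching:}\quad & \nu(i) = \arg\min_{j \in S_A} \left| \hat{y}_i - y_j \right|, \qquad i \in S_B.
\end{aligned}
$$

In both cases the value imputed for unit $i \in S_B$ is the observed $y_{\nu(i)}$ of the matched donor; the variants differ only in whether the donor is compared through its predicted or its observed value.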

### Unit tests
- added unit tests for variable selection models and MI estimation with a vector of population totals available


- added unit tests for variable selection models and MI estimation
with a vector of population totals available

# nonprobsvy 0.1.0

------------------------------------------------------------------------

### Features

- implemented population mean estimation using doubly robust, inverse probability weighting and mass imputation methods
- implemented inverse probability weighting models with Maximum Likelihood Estimation and Generalized Estimating Equations methods with `logit`, `complementary log-log` and `probit` link functions.
- implemented `generalized linear models`, `nearest neighbours` and `predictive mean matching` methods for Mass Imputation
- implemented population mean estimation using doubly robust, inverse
probability weighting and mass imputation methods
- implemented inverse probability weighting models with Maximum
Likelihood Estimation and Generalized Estimating Equations methods
with `logit`, `complementary log-log` and `probit` link functions.
- implemented `generalized linear models`, `nearest neighbours` and
`predictive mean matching` methods for Mass Imputation
- implemented bias correction estimators for doubly-robust approach
- implemented estimation methods when vector of population means/totals is available
- implemented variables selection with `SCAD`, `LASSO` and `MCP` penalization equations
- implemented `analytic` and `bootstrap` (with parallel computation - `doSNOW` package) variance for described estimators
- implemented estimation methods when vector of population
means/totals is available
- implemented variables selection with `SCAD`, `LASSO` and `MCP`
penalization equations
- implemented `analytic` and `bootstrap` (with parallel computation -
`doSNOW` package) variance for described estimators
- added control parameters for models
- added S3 methods for object of `nonprob` class such as
- `nobs` for sample size
- `pop.size` for population size estimation
- `residuals` for residuals of the inverse probability weighting model
- `cooks.distance` for identifying influential observations that have a significant impact on the parameter estimates
- `hatvalues` for measuring the leverage of individual observations
- `residuals` for residuals of the inverse probability weighting
model
- `cooks.distance` for identifying influential observations that
have a significant impact on the parameter estimates
- `hatvalues` for measuring the leverage of individual
observations
- `logLik` for computing the log-likelihood of the model
- `AIC` (Akaike Information Criterion) for evaluating the model based on the trade-off between goodness of fit and complexity, helping in model selection
- `BIC` (Bayesian Information Criterion) for a similar purpose as AIC but with a stronger penalty for model complexity
- `confint` for calculating confidence intervals around parameter estimates
- `vcov` for obtaining the variance-covariance matrix of the parameter estimates
- `AIC` (Akaike Information Criterion) for evaluating the model
based on the trade-off between goodness of fit and complexity,
helping in model selection
- `BIC` (Bayesian Information Criterion) for a similar purpose as
AIC but with a stronger penalty for model complexity
- `confint` for calculating confidence intervals around parameter
estimates
- `vcov` for obtaining the variance-covariance matrix of the
parameter estimates
- `deviance` for assessing the goodness of fit of the model
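
For orientation, the three population mean estimators named in the feature list above (inverse probability weighting, mass imputation and doubly robust) are commonly written as below. This is a sketch in textbook notation, not a statement of the package's internals; the symbols are assumptions introduced here: $S_A$ is the non-probability sample, $S_B$ the probability sample with design weights $d_i^B$, $\hat{\pi}_i^A$ the estimated propensity score and $\hat{y}_i$ the prediction from the outcome model.

$$
\begin{aligned}
\hat{\mu}_{IPW} &= \frac{1}{\hat{N}_A} \sum_{i \in S_A} \frac{y_i}{\hat{\pi}_i^A}, 
\qquad \hat{N}_A = \sum_{i \in S_A} \frac{1}{\hat{\pi}_i^A}, \\
\hat{\mu}_{MI} &= \frac{1}{\hat{N}_B} \sum_{i \in S_B} d_i^B \, \hat{y}_i, 
\qquad \hat{N}_B = \sum_{i \in S_B} d_i^B, \\
\hat{\mu}_{DR} &= \frac{1}{\hat{N}_A} \sum_{i \in S_A} \frac{y_i - \hat{y}_i}{\hat{\pi}_i^A} 
+ \frac{1}{\hat{N}_B} \sum_{i \in S_B} d_i^B \, \hat{y}_i.
\end{aligned}
$$

The doubly robust form adds a propensity-weighted residual correction from $S_A$ to the mass imputation term, so it remains consistent if either the propensity model or the outcome model is correctly specified.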

### Unit tests