small changes in documentation before CRAN submission
LukaszChrostowski committed Nov 12, 2024
1 parent 6c64da2 commit 915c5f3
Showing 5 changed files with 83 additions and 62 deletions.
3 changes: 3 additions & 0 deletions NEWS.md
@@ -9,6 +9,9 @@
- add estimation of exact standard error for k-nearest neighbor estimator.
- breaking change: the values of the `predictive_match` argument of the `controlOut` function are switched. From now on, `predictive_match = 1` means $\hat{y}-\hat{y}$ matching in predictive mean matching imputation and `predictive_match = 2` means $\hat{y}-y$ matching.
- implement `div` option for variable selection (more in documentation) in doubly robust estimation.
- add more insights to `nonprob` output, such as the gradient, Hessian and Jacobian derived from IPW estimation for the `mle` and `gee` methods when an `IPW` or `DR` model is executed.
- add estimated inclusion probabilities and their derivatives for probability and non-probability samples to `nonprob` output when an `IPW` or `DR` model is executed.
- add `model_frame` matrix data from the probability sample used for mass imputation to `nonprob` output when an `MI` or `DR` model is executed.
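
The two `predictive_match` variants described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the simulated data, the `nearest_donor` helper, the single donor per unit and the equal weighting of probability-sample units are all assumptions made for illustration.

```r
# Sketch of the two predictive mean matching variants (one donor per unit).
set.seed(123)

# Hypothetical non-probability sample (outcome observed) and
# probability sample (covariate only).
nonprob_df <- data.frame(x = rnorm(200))
nonprob_df$y <- 1 + 2 * nonprob_df$x + rnorm(200)
prob_df <- data.frame(x = rnorm(50))

# Outcome model fitted on the non-probability sample.
fit <- lm(y ~ x, data = nonprob_df)
yhat_nonprob <- as.numeric(fitted(fit))
yhat_prob <- as.numeric(predict(fit, newdata = prob_df))

# Hypothetical helper: index of the donor value closest to each target value.
nearest_donor <- function(target, donor_values) {
  vapply(target, function(t) which.min(abs(donor_values - t)), integer(1))
}

# yhat - yhat matching: compare predictions in both samples.
donors_yhat_yhat <- nearest_donor(yhat_prob, yhat_nonprob)
# yhat - y matching: compare predictions with observed outcomes.
donors_yhat_y <- nearest_donor(yhat_prob, nonprob_df$y)

# Imputed values are the donors' observed outcomes; with equal weights
# (an assumption here) their mean is the mass imputation estimate.
y_imp_yhat_yhat <- nonprob_df$y[donors_yhat_yhat]
y_imp_yhat_y <- nonprob_df$y[donors_yhat_y]
c(yhat_yhat = mean(y_imp_yhat_yhat), yhat_y = mean(y_imp_yhat_y))
```

In both variants the imputed value is an observed outcome from the non-probability sample; only the distance used to pick the donor differs.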

## nonprobsvy 0.1.0

69 changes: 39 additions & 30 deletions R/main_function_documentation.R
@@ -8,9 +8,9 @@ NULL
#' The function allows you to estimate the population mean with access to a reference probability sample, as well as sums and means of covariates.
#'
#' The package implements state-of-the-art approaches recently proposed in the literature: Chen et al. (2020),
#' Yang et al. (2020), Wu (2022) and use the [Lumley 2004](https://CRAN.R-project.org/package=survey) `survey` package for inference.
#' Yang et al. (2020), Wu (2022) and uses the [Lumley 2004](https://CRAN.R-project.org/package=survey) `survey` package for inference.
#'
#' It provides propensity score weighting (e.g. with calibration constraints), mass imputation (e.g. nearest neighbor) and
#' It provides propensity score weighting (e.g. with calibration constraints), mass imputation (e.g. nearest neighbour) and
#' doubly robust estimators that take into account minimisation of the asymptotic bias of the population mean estimators or
#' variable selection.
#' The package uses `survey` package functionality when a probability sample is available.
@@ -24,19 +24,19 @@ NULL
#' @param pop_totals an optional `named vector` with population totals of the covariates.
#' @param pop_means an optional `named vector` with population means of the covariates.
#' @param pop_size an optional `double` with population size.
#' @param method_selection a `character` with method for propensity scores estimation
#' @param method_outcome a `character` with method for response variable estimation
#' @param method_selection a `character` with method for propensity scores estimation.
#' @param method_outcome a `character` with method for response variable estimation.
#' @param family_outcome a `character` string describing the error distribution and link function to be used in the model. Default is "gaussian". Currently supports: gaussian with identity link, poisson and binomial.
#' @param subset an optional `vector` specifying a subset of observations to be used in the fitting process.
#' @param strata an optional `vector` specifying strata.
#' @param weights an optional `vector` of prior weights to be used in the fitting process. Should be NULL or a numeric vector. It is assumed that this vector contains frequency or analytic weights
#' @param weights an optional `vector` of prior weights to be used in the fitting process. Should be NULL or a numeric vector. It is assumed that this vector contains frequency or analytic weights.
#' @param na_action a function which indicates what should happen when the data contain `NAs`.
#' @param control_selection a `list` indicating parameters to use in fitting selection model for propensity scores
#' @param control_outcome a `list` indicating parameters to use in fitting model for outcome variable
#' @param control_inference a `list` indicating parameters to use in inference based on probability and non-probability samples, contains parameters such as estimation method or variance method
#' @param start_selection an optional `vector` with starting values for the parameters of the selection equation
#' @param start_outcome an optional `vector` with starting values for the parameters of the outcome equation
#' @param verbose verbose, numeric
#' @param control_selection a `list` indicating parameters to use in fitting selection model for propensity scores.
#' @param control_outcome a `list` indicating parameters to use in fitting model for outcome variable.
#' @param control_inference a `list` indicating parameters to use in inference based on probability and non-probability samples, contains parameters such as estimation method or variance method.
#' @param start_selection an optional `vector` with starting values for the parameters of the selection equation.
#' @param start_outcome an optional `vector` with starting values for the parameters of the outcome equation.
#' @param verbose verbose, numeric.
#' @param x Logical value indicating whether to return model matrix of covariates as a part of output.
#' @param y Logical value indicating whether to return vector of outcome variable as a part of output.
#' @param se Logical value indicating whether to calculate and return standard error of estimated mean.
@@ -188,25 +188,26 @@ NULL
#' \item{\code{control} -- list of control functions.}
#' \item{\code{output} -- output of the model with information on the estimated population mean and standard errors.}
#' \item{\code{SE} -- standard error of the estimator of the population mean, divided into errors from probability and non-probability samples.}
#' \item{\code{confidence_interval} -- confidence interval of population mean estimator}
#' \item{\code{nonprob_size} -- size of non-probability sample}
#' \item{\code{prob_size} -- size of probability sample}
#' \item{\code{pop_size} -- estimated population size derived from estimated weights (non-probability sample) or known design weights (probability sample)}
#' \item{\code{confidence_interval} -- confidence interval of population mean estimator.}
#' \item{\code{nonprob_size} -- size of non-probability sample.}
#' \item{\code{prob_size} -- size of probability sample.}
#' \item{\code{pop_size} -- estimated population size derived from estimated weights (non-probability sample) or known design weights (probability sample).}
#' \item{\code{pop_totals} -- the total values of the auxiliary variables derived from a probability sample or vector of total/mean values.}
#' \item{\code{outcome} -- list containing information about the fitting of the mass imputation model, in the case of regression model the object containing the list returned by
#' [stats::glm()], in the case of the nearest neighbour imputation the object containing list returned by [RANN::nn2()]. If `bias_correction` in [controlInf()] is set to `TRUE`, the estimation is based on
#' the joint estimating equations for the `selection` and `outcome` model and therefore, the list is different from the one returned by the [stats::glm()] function and contains elements such as
#' \itemize{
#' \item{\code{coefficients} -- estimated coefficients of the regression model}
#' \item{\code{std_err} -- standard errors of the estimated coefficients}
#' \item{\code{residuals} -- The response residuals}
#' \item{\code{variance_covariance} -- The variance-covariance matrix of the coefficient estimates}
#' \item{\code{df_residual} -- The degrees of freedom for residuals}
#' \item{\code{family} -- specifies the error distribution and link function to be used in the model}
#' \item{\code{fitted.values} -- The predicted values of the response variable based on the fitted model}
#' \item{\code{linear.predictors} -- The linear fit on link scale}
#' \item{\code{X} -- The design matrix}
#' \item{\code{method} -- set on `glm`, since the regression method}
#' \item{\code{coefficients} -- estimated coefficients of the regression model.}
#' \item{\code{std_err} -- standard errors of the estimated coefficients.}
#' \item{\code{residuals} -- The response residuals.}
#' \item{\code{variance_covariance} -- The variance-covariance matrix of the coefficient estimates.}
#' \item{\code{df_residual} -- The degrees of freedom for residuals.}
#' \item{\code{family} -- specifies the error distribution and link function to be used in the model.}
#' \item{\code{fitted.values} -- The predicted values of the response variable based on the fitted model.}
#' \item{\code{linear.predictors} -- The linear fit on link scale.}
#' \item{\code{X} -- The design matrix.}
#' \item{\code{method} -- set on `glm`, since the regression method.}
#' \item{\code{model_frame} -- Matrix of data from probability sample used for mass imputation.}
#' }
#' }
#' In addition, if the variable selection model for the outcome variable is fitting, the list includes the
@@ -215,22 +216,30 @@ NULL
#' }
#' \item{\code{selection} -- list containing information about fitting of propensity score model, such as
#' \itemize{
#' \item{\code{coefficients} -- a named vector of coefficients}
#' \item{\code{std_err} -- standard errors of the estimated model coefficients}
#' \item{\code{residuals} -- the response residuals}
#' \item{\code{variance} -- the root mean square error}
#' \item{\code{coefficients} -- a named vector of coefficients.}
#' \item{\code{std_err} -- standard errors of the estimated model coefficients.}
#' \item{\code{residuals} -- the response residuals.}
#' \item{\code{variance} -- the root mean square error.}
#' \item{\code{fitted_values} -- the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.}
#' \item{\code{link} -- the `link` object used.}
#' \item{\code{linear_predictors} -- the linear fit on link scale.}
#' \item{\code{aic} -- A version of Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters.}
#' \item{\code{weights} -- vector of estimated weights for non-probability sample.}
#' \item{\code{prior.weights} -- the weights initially supplied, a vector of 1s if none were.}
#' \item{\code{est_totals} -- the estimated total values of auxiliary variables derived from a non-probability sample.}
#' \item{\code{est_totals} -- the estimated total values of auxiliary variables derived from a non-probability sample}.
#' \item{\code{formula} -- the formula supplied.}
#' \item{\code{df_residual} -- the residual degrees of freedom.}
#' \item{\code{log_likelihood} -- value of log-likelihood function if `mle` method, in the other case `NA`.}
#' \item{\code{cve} -- the error for each value of the `lambda`, averaged across the cross-validation folds for the variable selection model
#' when the propensity score model is fitting. Returned only if selection of variables for the model is used.}
#' \item{\code{method_selection} -- Link function, e.g. `logit`, `cloglog` or `probit`.}
#' \item{\code{hessian} -- Hessian of the log-likelihood function from `mle` method.}
#' \item{\code{gradient} -- Gradient of the log-likelihood function from `mle` method.}
#' \item{\code{method} -- An estimation method for selection model, e.g. `mle` or `gee`.}
#' \item{\code{prob_der} -- Derivative of the inclusion probability function for units in a non--probability sample.}
#' \item{\code{prob_rand} -- Inclusion probabilities for units from a probability sample from `svydesign` object.}
#' \item{\code{prob_rand_est} -- Inclusion probabilities to a non--probability sample for units from probability sample.}
#' \item{\code{prob_rand_est_der} -- Derivative of the inclusion probabilities to a non--probability sample for units from probability sample.}
#' }
#' }
#' \item{\code{stat} -- matrix of the estimated population means in each bootstrap iteration.
2 changes: 1 addition & 1 deletion R/summary.R
@@ -15,7 +15,7 @@
#' \item \code{call} -- A call which created \code{object}.
#' \item \code{pop_total} -- A list containing information about the estimated population mean, its standard error and confidence interval.
#' \item \code{sample_size} -- The size of the samples used in the model.
#' \item \code{population_size} -- The estimated size of the population from which the nonoprobability sample was drawn.
#' \item \code{population_size} -- The estimated size of the population from which the non--probability sample was drawn.
#' \item \code{test} -- Type of statistical test performed.
#' \item \code{control} -- A List of control parameters used in fitting the model.
#' \item \code{model} -- A descriptive name of the model used, e.g., "Doubly-Robust", "Inverse probability weighted", or "Mass Imputation".