small changes in documentation before CRAN submission

ncn-foreigners · Nov 12, 2024 · 915c5f3
1 parent 6c64da2
Showing 5 changed files with 83 additions and 62 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -9,6 +9,9 @@
     -   add estimation of exact standard error for k-nearest neighbor estimator.
     -   add breaking change to `controlOut` function by switching values for `predictive_match` argument. From now on, the `predictive_match = 1` means $\hat{y}-\hat{y}$ in predictive mean matching imputation and `predictive_match = 2` corresponds to $\hat{y}-y$ matching.
     - implement `div` option when variable selection (more in documentation) for doubly robust estimation.
+    - add more insights to `nonprob` output such as gradient, hessian and jacobian derived from IPW estimation for `mle` and `gee` methods when an `IPW` or `DR` model is executed.
+    - add estimated inclusion probabilities and their derivatives for probability and non-probability samples to `nonprob` output when an `IPW` or `DR` model is executed.
+    - add `model_frame` matrix data from the probability sample used for mass imputation to `nonprob` output when an `MI` or `DR` model is executed.
 
 ## nonprobsvy 0.1.0
 

diff --git a/R/main_function_documentation.R b/R/main_function_documentation.R
@@ -8,9 +8,9 @@ NULL
 #' The function allows you to estimate the population mean with access to a reference probability sample, as well as sums and means of covariates.
 #'
 #' The package implements state-of-the-art approaches recently proposed in the literature: Chen et al. (2020),
-#' Yang et al. (2020), Wu (2022) and use the [Lumley 2004](https://CRAN.R-project.org/package=survey) `survey` package for inference.
+#' Yang et al. (2020), Wu (2022) and uses the [Lumley 2004](https://CRAN.R-project.org/package=survey) `survey` package for inference.
 #'
-#' It provides propensity score weighting (e.g. with calibration constraints), mass imputation (e.g. nearest neighbor) and
+#' It provides propensity score weighting (e.g. with calibration constraints), mass imputation (e.g. nearest neighbour) and
 #' doubly robust estimators that take into account minimisation of the asymptotic bias of the population mean estimators or
 #' variable selection.
 #' The package uses `survey` package functionality when a probability sample is available.
@@ -24,19 +24,19 @@ NULL
 #' @param pop_totals an optional `named vector` with population totals of the covariates.
 #' @param pop_means an optional `named vector` with population means of the covariates.
 #' @param pop_size an optional `double` with population size.
-#' @param method_selection a `character` with method for propensity scores estimation
-#' @param method_outcome a `character` with method for response variable estimation
+#' @param method_selection a `character` with method for propensity scores estimation.
+#' @param method_outcome a `character` with method for response variable estimation.
 #' @param family_outcome a `character` string describing the error distribution and link function to be used in the model. Default is "gaussian". Currently supports: gaussian with identity link, poisson and binomial.
 #' @param subset an optional `vector` specifying a subset of observations to be used in the fitting process.
 #' @param strata an optional `vector` specifying strata.
-#' @param weights an optional `vector` of prior weights to be used in the fitting process. Should be NULL or a numeric vector. It is assumed that this vector contains frequency or analytic weights
+#' @param weights an optional `vector` of prior weights to be used in the fitting process. Should be NULL or a numeric vector. It is assumed that this vector contains frequency or analytic weights.
 #' @param na_action a function which indicates what should happen when the data contain `NAs`.
-#' @param control_selection a `list` indicating parameters to use in fitting selection model for propensity scores
-#' @param control_outcome a `list` indicating parameters to use in fitting model for outcome variable
-#' @param control_inference a `list` indicating parameters to use in inference based on probability and non-probability samples, contains parameters such as estimation method or variance method
-#' @param start_selection an optional `vector` with starting values for the parameters of the selection equation
-#' @param start_outcome an optional `vector` with starting values for the parameters of the outcome equation
-#' @param verbose verbose, numeric
+#' @param control_selection a `list` indicating parameters to use in fitting selection model for propensity scores.
+#' @param control_outcome a `list` indicating parameters to use in fitting model for outcome variable.
+#' @param control_inference a `list` indicating parameters to use in inference based on probability and non-probability samples, contains parameters such as estimation method or variance method.
+#' @param start_selection an optional `vector` with starting values for the parameters of the selection equation.
+#' @param start_outcome an optional `vector` with starting values for the parameters of the outcome equation.
+#' @param verbose verbose, numeric.
 #' @param x Logical value indicating whether to return model matrix of covariates as a part of output.
 #' @param y Logical value indicating whether to return vector of outcome variable as a part of output.
 #' @param se Logical value indicating whether to calculate and return standard error of estimated mean.
@@ -188,25 +188,26 @@ NULL
 #'  \item{\code{control} -- list of control functions.}
 #'  \item{\code{output} -- output of the model with information on the estimated population mean and standard errors.}
 #'  \item{\code{SE} -- standard error of the estimator of the population mean, divided into errors from probability and non-probability samples.}
-#'  \item{\code{confidence_interval} -- confidence interval of population mean estimator}
-#'  \item{\code{nonprob_size} -- size of non-probability sample}
-#'  \item{\code{prob_size} -- size of probability sample}
-#'  \item{\code{pop_size} -- estimated population size derived from estimated weights (non-probability sample) or known design weights (probability sample)}
+#'  \item{\code{confidence_interval} -- confidence interval of population mean estimator.}
+#'  \item{\code{nonprob_size} -- size of non-probability sample.}
+#'  \item{\code{prob_size} -- size of probability sample.}
+#'  \item{\code{pop_size} -- estimated population size derived from estimated weights (non-probability sample) or known design weights (probability sample).}
 #'  \item{\code{pop_totals} -- the total values of the auxiliary variables derived from a probability sample or vector of total/mean values.}
 #'  \item{\code{outcome} -- list containing information about the fitting of the mass imputation model, in the case of regression model the object containing the list returned by
 #'  [stats::glm()], in the case of the nearest neighbour imputation the object containing list returned by [RANN::nn2()]. If `bias_correction` in [controlInf()] is set to `TRUE`, the estimation is based on
 #'  the joint estimating equations for the `selection` and `outcome` model and therefore, the list is different from the one returned by the [stats::glm()] function and contains elements such as
 #'  \itemize{
-#'  \item{\code{coefficients} -- estimated coefficients of the regression model}
-#'  \item{\code{std_err} -- standard errors of the estimated coefficients}
-#'  \item{\code{residuals} -- The response residuals}
-#'  \item{\code{variance_covariance} -- The variance-covariance matrix of the coefficient estimates}
-#'  \item{\code{df_residual} -- The degrees of freedom for residuals}
-#'  \item{\code{family} -- specifies the error distribution and link function to be used in the model}
-#'  \item{\code{fitted.values} -- The predicted values of the response variable based on the fitted model}
-#'  \item{\code{linear.predictors} -- The linear fit on link scale}
-#'  \item{\code{X} -- The design matrix}
-#'  \item{\code{method} -- set on `glm`, since the regression method}
+#'  \item{\code{coefficients} -- estimated coefficients of the regression model.}
+#'  \item{\code{std_err} -- standard errors of the estimated coefficients.}
+#'  \item{\code{residuals} -- The response residuals.}
+#'  \item{\code{variance_covariance} -- The variance-covariance matrix of the coefficient estimates.}
+#'  \item{\code{df_residual} -- The degrees of freedom for residuals.}
+#'  \item{\code{family} -- specifies the error distribution and link function to be used in the model.}
+#'  \item{\code{fitted.values} -- The predicted values of the response variable based on the fitted model.}
+#'  \item{\code{linear.predictors} -- The linear fit on link scale.}
+#'  \item{\code{X} -- The design matrix.}
+#'  \item{\code{method} -- set to `glm`, since this is the regression method.}
+#'  \item{\code{model_frame} -- Matrix of data from probability sample used for mass imputation.}
 #'  }
 #'  }
 #'  In addition, if the variable selection model for the outcome variable is fitting, the list includes the
@@ -215,22 +216,30 @@ NULL
 #'  }
 #'  \item{\code{selection} -- list containing information about fitting of propensity score model, such as
 #'  \itemize{
-#'  \item{\code{coefficients} -- a named vector of coefficients}
-#'  \item{\code{std_err} -- standard errors of the estimated model coefficients}
-#'  \item{\code{residuals} -- the response residuals}
-#'  \item{\code{variance} -- the root mean square error}
+#'  \item{\code{coefficients} -- a named vector of coefficients.}
+#'  \item{\code{std_err} -- standard errors of the estimated model coefficients.}
+#'  \item{\code{residuals} -- the response residuals.}
+#'  \item{\code{variance} -- the root mean square error.}
 #'  \item{\code{fitted_values} -- the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.}
 #'  \item{\code{link} -- the `link` object used.}
 #'  \item{\code{linear_predictors} -- the linear fit on link scale.}
 #'  \item{\code{aic} --	A version of Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of parameters.}
 #'  \item{\code{weights} -- vector of estimated weights for non-probability sample.}
 #'  \item{\code{prior.weights} -- the weights initially supplied, a vector of 1s if none were.}
-#'  \item{\code{est_totals} -- the estimated total values of auxiliary variables derived from a non-probability sample.}
+#'  \item{\code{est_totals} -- the estimated total values of auxiliary variables derived from a non-probability sample.}
 #'  \item{\code{formula} -- the formula supplied.}
 #'  \item{\code{df_residual} -- the residual degrees of freedom.}
 #'  \item{\code{log_likelihood} -- value of log-likelihood function if `mle` method, in the other case `NA`.}
 #'  \item{\code{cve} -- the error for each value of the `lambda`, averaged across the cross-validation folds for the variable selection model
 #'  when the propensity score model is fitting. Returned only if selection of variables for the model is used.}
+#'  \item{\code{method_selection} -- Link function, e.g. `logit`, `cloglog` or `probit`.}
+#'  \item{\code{hessian} -- Hessian of the log-likelihood function from `mle` method.}
+#'  \item{\code{gradient} -- Gradient of the log-likelihood function from `mle` method.}
+#'  \item{\code{method} -- An estimation method for selection model, e.g. `mle` or `gee`.}
+#'  \item{\code{prob_der} -- Derivative of the inclusion probability function for units in a non-probability sample.}
+#'  \item{\code{prob_rand} -- Inclusion probabilities for units from a probability sample from `svydesign` object.}
+#'  \item{\code{prob_rand_est} -- Inclusion probabilities to a non-probability sample for units from probability sample.}
+#'  \item{\code{prob_rand_est_der} -- Derivative of the inclusion probabilities to a non-probability sample for units from probability sample.}
 #'   }
 #'  }
 #'  \item{\code{stat} -- matrix of the estimated population means in each bootstrap iteration.

diff --git a/R/summary.R b/R/summary.R
@@ -15,7 +15,7 @@
 #' \item \code{call} -- A call which created \code{object}.
 #' \item \code{pop_total} -- A list containing information about the estimated population mean, its standard error and confidence interval.
 #' \item \code{sample_size} -- The size of the samples used in the model.
-#' \item \code{population_size} -- The estimated size of the population from which the nonoprobability sample was drawn.
+#' \item \code{population_size} -- The estimated size of the population from which the non-probability sample was drawn.
 #' \item \code{test} -- Type of statistical test performed.
 #' \item \code{control} -- A List of control parameters used in fitting the model.
 #' \item \code{model} -- A descriptive name of the model used, e.g., "Doubly-Robust", "Inverse probability weighted", or "Mass Imputation".