From faebafd024c39e63efba436e0877bc61070ddca6 Mon Sep 17 00:00:00 2001
From: LukaszChrostowski
Date: Mon, 20 Jan 2025 11:49:39 +0100
Subject: [PATCH] typos correction

---
 R/control_inference.R           | 34 +++++++++++++++----------------
 R/main_function_documentation.R | 34 +++++++++++++++----------------
 man/control_inf.Rd              | 34 +++++++++++++++----------------
 man/control_sel.Rd              |  2 +-
 man/jvs.Rd                      |  2 +-
 man/nonprob.Rd                  | 36 ++++++++++++++++-----------------
 6 files changed, 71 insertions(+), 71 deletions(-)

diff --git a/R/control_inference.R b/R/control_inference.R
index 442561a..165dad0 100644
--- a/R/control_inference.R
+++ b/R/control_inference.R
@@ -3,31 +3,31 @@
 #' @description \code{control_inf} constructs a list with all necessary control parameters
 #' for statistical inference.
 #'
-#' @param vars_selection If `TRUE`, then variables selection model is used.
-#' @param var_method variance method.
-#' @param rep_type replication type for weights in the bootstrap method for variance estimation passed to [survey::as.svrepdesign()].
+#' @param vars_selection If `TRUE`, then the variable selection model is used.
+#' @param var_method the variance method.
+#' @param rep_type the replication type for weights in the bootstrap method for variance estimation passed to [survey::as.svrepdesign()].
 #' Default is `subbootstrap`.
-#' @param bias_inf inference method in the bias minimization.
+#' @param bias_inf the inference method in the bias minimization.
 #' \itemize{
-#' \item if \code{union} then final model is fitting on union of selected variables for selection and outcome models
-#' \item if \code{div} then final model is fitting separately on division of selected variables into relevant ones for
+#' \item if \code{union}, then the final model is fitted on the union of selected variables for selection and outcome models
+#' \item if \code{div}, then the final model is fitted separately on the division of selected variables into relevant ones for
 #' selection and outcome model.
 #' }
-#' @param bias_correction if `TRUE`, then bias minimization estimation used during fitting the model.
-#' @param num_boot number of iteration for bootstrap algorithms.
-#' @param alpha Significance level, Default is 0.05.
-#' @param cores Number of cores in parallel computing.
-#' @param keep_boot Logical indicating whether statistics from bootstrap should be kept.
+#' @param bias_correction if `TRUE`, then bias minimization estimation is used during model fitting.
+#' @param num_boot the number of iterations for bootstrap algorithms.
+#' @param alpha significance level, 0.05 by default.
+#' @param cores the number of cores in parallel computing.
+#' @param keep_boot a logical value indicating whether statistics from the bootstrap should be kept.
 #' By default set to \code{TRUE}
-#' @param nn_exact_se Logical value indicating whether to compute the exact
+#' @param nn_exact_se a logical value indicating whether to compute the exact
 #' standard error estimate for \code{nn} or \code{pmm} estimator. The variance estimator for
 #' estimation based on \code{nn} or \code{pmm} can be decomposed into three parts, with the
-#' third being computed using covariance between imputed values for units in
-#' probability sample using predictive matches from non-probability sample.
+#' third computed using the covariance between imputed values for units in
+#' the probability sample using predictive matches from the non-probability sample.
 #' In most situations this term is negligible and is very computationally
-#' expensive so by default this is set to \code{FALSE}, but it is recommended to
-#' set this value to \code{TRUE} before submitting final results.
-#' @param pi_ij TODO, either matrix or \code{ppsmat} class object.
+#' expensive, so by default it is set to \code{FALSE}, but the recommended option is to
+#' set this value to \code{TRUE} before submitting the final results.
+#' @param pi_ij TODO, either a matrix or a \code{ppsmat} class object.
 #'
 #'
 #' @return List with selected parameters.
diff --git a/R/main_function_documentation.R b/R/main_function_documentation.R
index 82a93ba..d4d63d5 100644
--- a/R/main_function_documentation.R
+++ b/R/main_function_documentation.R
@@ -1,10 +1,10 @@
 #' @import mathjaxr
 NULL
-#' @title Inference with the non-probability survey samples
+#' @title Inference with non-probability survey samples
 #' @author Łukasz Chrostowski, Maciej Beręsewicz
 #'
 #' \loadmathjax
-#' @description \code{nonprob} fits model for inference based on non-probability surveys (including big data) using various methods.
+#' @description \code{nonprob} fits a model for inference based on non-probability surveys (including big data) using various methods.
 #' The function allows you to estimate the population mean with access to a reference probability sample, as well as sums and means of covariates.
 #'
 #' The package implements state-of-the-art approaches recently proposed in the literature: Chen et al. (2020),
@@ -13,32 +13,32 @@ NULL
 #' It provides propensity score weighting (e.g. with calibration constraints), mass imputation (e.g. nearest neighbour) and
 #' doubly robust estimators that take into account minimisation of the asymptotic bias of the population mean estimators or
 #' variable selection.
-#' The package uses `survey` package functionality when a probability sample is available.
+#' The package uses functionality from the `survey` package when a probability sample is available.
 #'
 #'
-#' @param data `data.frame` with data from the non-probability sample.
-#' @param selection `formula`, the selection (propensity) equation.
-#' @param outcome `formula`, the outcome equation.
-#' @param target `formula` with target variables.
-#' @param svydesign an optional `svydesign` object (from the survey package) containing probability sample and design weights.
+#' @param data a `data.frame` with data from the non-probability sample.
+#' @param selection a `formula`, the selection (propensity) equation.
+#' @param outcome a `formula`, the outcome equation.
+#' @param target a `formula` with target variables.
+#' @param svydesign an optional `svydesign` object (from the survey package) containing a probability sample and design weights.
 #' @param pop_totals an optional `named vector` with population totals of the covariates.
 #' @param pop_means an optional `named vector` with population means of the covariates.
-#' @param pop_size an optional `double` with population size.
-#' @param method_selection a `character` with method for propensity scores estimation.
-#' @param method_outcome a `character` with method for response variable estimation.
-#' @param family_outcome a `character` string describing the error distribution and link function to be used in the model. Default is "gaussian". Currently supports: gaussian with identity link, poisson and binomial.
+#' @param pop_size an optional `double` value with the population size.
+#' @param method_selection a `character` indicating the method for propensity score estimation.
+#' @param method_outcome a `character` indicating the method for response variable estimation.
+#' @param family_outcome a `character` string describing the error distribution and the link function to be used in the model, set to `gaussian` by default. Currently supports: gaussian with identity link, poisson and binomial.
 #' @param subset an optional `vector` specifying a subset of observations to be used in the fitting process - not yet supported.
 #' @param strata an optional `vector` specifying strata - not yet supported.
 #' @param weights an optional `vector` of prior weights to be used in the fitting process. Should be NULL or a numeric vector. It is assumed that this vector contains frequency or analytic weights.
 #' @param na_action a function which indicates what should happen when the data contain `NAs` - not yet supported.
-#' @param control_selection a `list` indicating parameters to use in fitting selection model for propensity scores.
-#' @param control_outcome a `list` indicating parameters to use in fitting model for outcome variable.
-#' @param control_inference a `list` indicating parameters to use in inference based on probability and non-probability samples, contains parameters such as estimation method or variance method.
+#' @param control_selection a `list` indicating parameters to be used when fitting the selection model for propensity scores.
+#' @param control_outcome a `list` indicating parameters to be used when fitting the model for the outcome variable.
+#' @param control_inference a `list` indicating parameters to be used for inference based on probability and non-probability samples; it contains parameters such as the estimation method or the variance method.
 #' @param start_selection an optional `vector` with starting values for the parameters of the selection equation.
 #' @param start_outcome an optional `vector` with starting values for the parameters of the outcome equation.
 #' @param verbose verbose, numeric.
-#' @param x Logical value indicating whether to return model matrix of covariates as a part of output.
-#' @param y Logical value indicating whether to return vector of outcome variable as a part of output.
+#' @param x a logical value indicating whether to return the model matrix of covariates as a part of the output.
+#' @param y a logical value indicating whether to return the vector of the outcome variable as a part of the output.
 #' @param se Logical value indicating whether to calculate and return standard error of estimated mean.
 #' @param ... Additional, optional arguments.
 #'
diff --git a/man/control_inf.Rd b/man/control_inf.Rd
index c448188..d40f468 100644
--- a/man/control_inf.Rd
+++ b/man/control_inf.Rd
@@ -20,41 +20,41 @@ control_inf(
 )
 }
 \arguments{
-\item{vars_selection}{If \code{TRUE}, then variables selection model is used.}
+\item{vars_selection}{If \code{TRUE}, then the variable selection model is used.}
 
-\item{var_method}{variance method.}
+\item{var_method}{the variance method.}
 
-\item{rep_type}{replication type for weights in the bootstrap method for variance estimation passed to \code{\link[survey:as.svrepdesign]{survey::as.svrepdesign()}}.
+\item{rep_type}{the replication type for weights in the bootstrap method for variance estimation passed to \code{\link[survey:as.svrepdesign]{survey::as.svrepdesign()}}.
 Default is \code{subbootstrap}.}
 
-\item{bias_inf}{inference method in the bias minimization.
+\item{bias_inf}{the inference method in the bias minimization.
 \itemize{
-\item if \code{union} then final model is fitting on union of selected variables for selection and outcome models
-\item if \code{div} then final model is fitting separately on division of selected variables into relevant ones for
+\item if \code{union}, then the final model is fitted on the union of selected variables for selection and outcome models
+\item if \code{div}, then the final model is fitted separately on the division of selected variables into relevant ones for
 selection and outcome model.
 }}
 
-\item{num_boot}{number of iteration for bootstrap algorithms.}
+\item{num_boot}{the number of iterations for bootstrap algorithms.}
 
-\item{bias_correction}{if \code{TRUE}, then bias minimization estimation used during fitting the model.}
+\item{bias_correction}{if \code{TRUE}, then bias minimization estimation is used during model fitting.}
 
-\item{alpha}{Significance level, Default is 0.05.}
+\item{alpha}{significance level, 0.05 by default.}
 
-\item{cores}{Number of cores in parallel computing.}
+\item{cores}{the number of cores in parallel computing.}
 
-\item{keep_boot}{Logical indicating whether statistics from bootstrap should be kept.
+\item{keep_boot}{a logical value indicating whether statistics from the bootstrap should be kept.
 By default set to \code{TRUE}}
 
-\item{nn_exact_se}{Logical value indicating whether to compute the exact
+\item{nn_exact_se}{a logical value indicating whether to compute the exact
 standard error estimate for \code{nn} or \code{pmm} estimator. The variance estimator for
 estimation based on \code{nn} or \code{pmm} can be decomposed into three parts, with the
-third being computed using covariance between imputed values for units in
-probability sample using predictive matches from non-probability sample.
+third computed using the covariance between imputed values for units in
+the probability sample using predictive matches from the non-probability sample.
 In most situations this term is negligible and is very computationally
-expensive so by default this is set to \code{FALSE}, but it is recommended to
-set this value to \code{TRUE} before submitting final results.}
+expensive, so by default it is set to \code{FALSE}, but the recommended option is to
+set this value to \code{TRUE} before submitting the final results.}
 
-\item{pi_ij}{TODO, either matrix or \code{ppsmat} class object.}
+\item{pi_ij}{TODO, either a matrix or a \code{ppsmat} class object.}
 }
 \value{
 List with selected parameters.
diff --git a/man/control_sel.Rd b/man/control_sel.Rd
index 44f040d..8de7095 100644
--- a/man/control_sel.Rd
+++ b/man/control_sel.Rd
@@ -60,7 +60,7 @@ control_sel(
 \item if \code{2} then \mjseqn{ \mathbf{h}\left(\mathbf{x}, \boldsymbol{\theta}\right) = \mathbf{x}}
 }}
 
-\item{penalty}{The penanlization function used during variables selection.}
+\item{penalty}{The penalization function used during variables selection.}
 
 \item{a_SCAD}{The tuning parameter of the SCAD penalty for selection model. Default is 3.7.}
 
diff --git a/man/jvs.Rd b/man/jvs.Rd
index 8966df1..d3f4ebe 100644
--- a/man/jvs.Rd
+++ b/man/jvs.Rd
@@ -20,7 +20,7 @@ A single data.frame with 6,523 rows and 6 columns
 jvs
 }
 \description{
-This is a subset of the subset of the Job Vacancy Survey from Poland (for one quarter).
+This is a subset of the Job Vacancy Survey from Poland (for one quarter).
 The data has been subject to slight manipulation, but the relationships in the data have been preserved.
 For further details on the JVS, please refer to the following link:
 \url{https://stat.gov.pl/obszary-tematyczne/rynek-pracy/popyt-na-prace/zeszyt-metodologiczny-popyt-na-prace,3,1.html}.
diff --git a/man/nonprob.Rd b/man/nonprob.Rd
index 42bcc3e..8d75d51 100644
--- a/man/nonprob.Rd
+++ b/man/nonprob.Rd
@@ -2,7 +2,7 @@
 % Please edit documentation in R/main_function_documentation.R, R/nonprob.R
 \name{nonprob}
 \alias{nonprob}
-\title{Inference with the non-probability survey samples}
+\title{Inference with non-probability survey samples}
 \usage{
 nonprob(
   data,
@@ -33,27 +33,27 @@ nonprob(
 )
 }
 \arguments{
-\item{data}{\code{data.frame} with data from the non-probability sample.}
+\item{data}{a \code{data.frame} with data from the non-probability sample.}
 
-\item{selection}{\code{formula}, the selection (propensity) equation.}
+\item{selection}{a \code{formula}, the selection (propensity) equation.}
 
-\item{outcome}{\code{formula}, the outcome equation.}
+\item{outcome}{a \code{formula}, the outcome equation.}
 
-\item{target}{\code{formula} with target variables.}
+\item{target}{a \code{formula} with target variables.}
 
-\item{svydesign}{an optional \code{svydesign} object (from the survey package) containing probability sample and design weights.}
+\item{svydesign}{an optional \code{svydesign} object (from the survey package) containing a probability sample and design weights.}
 
 \item{pop_totals}{an optional \verb{named vector} with population totals of the covariates.}
 
 \item{pop_means}{an optional \verb{named vector} with population means of the covariates.}
 
-\item{pop_size}{an optional \code{double} with population size.}
+\item{pop_size}{an optional \code{double} value with the population size.}
 
-\item{method_selection}{a \code{character} with method for propensity scores estimation.}
+\item{method_selection}{a \code{character} indicating the method for propensity score estimation.}
 
-\item{method_outcome}{a \code{character} with method for response variable estimation.}
+\item{method_outcome}{a \code{character} indicating the method for response variable estimation.}
 
-\item{family_outcome}{a \code{character} string describing the error distribution and link function to be used in the model. Default is "gaussian". Currently supports: gaussian with identity link, poisson and binomial.}
+\item{family_outcome}{a \code{character} string describing the error distribution and the link function to be used in the model, set to \code{gaussian} by default. Currently supports: gaussian with identity link, poisson and binomial.}
 
 \item{subset}{an optional \code{vector} specifying a subset of observations to be used in the fitting process - not yet supported.}
 
@@ -63,11 +63,11 @@ nonprob(
 
 \item{na_action}{a function which indicates what should happen when the data contain \code{NAs} - not yet supported.}
 
-\item{control_selection}{a \code{list} indicating parameters to use in fitting selection model for propensity scores.}
+\item{control_selection}{a \code{list} indicating parameters to be used when fitting the selection model for propensity scores.}
 
-\item{control_outcome}{a \code{list} indicating parameters to use in fitting model for outcome variable.}
+\item{control_outcome}{a \code{list} indicating parameters to be used when fitting the model for the outcome variable.}
 
-\item{control_inference}{a \code{list} indicating parameters to use in inference based on probability and non-probability samples, contains parameters such as estimation method or variance method.}
+\item{control_inference}{a \code{list} indicating parameters to be used for inference based on probability and non-probability samples; it contains parameters such as the estimation method or the variance method.}
 
 \item{start_selection}{an optional \code{vector} with starting values for the parameters of the selection equation.}
 
@@ -75,9 +75,9 @@ nonprob(
 
 \item{verbose}{verbose, numeric.}
 
-\item{x}{Logical value indicating whether to return model matrix of covariates as a part of output.}
+\item{x}{a logical value indicating whether to return the model matrix of covariates as a part of the output.}
 
-\item{y}{Logical value indicating whether to return vector of outcome variable as a part of output.}
+\item{y}{a logical value indicating whether to return the vector of the outcome variable as a part of the output.}
 
 \item{se}{Logical value indicating whether to calculate and return standard error of estimated mean.}
 
@@ -157,7 +157,7 @@ Returned only if a bootstrap method is used to estimate the variance and \code{k
 }
 }
 \description{
-\code{nonprob} fits model for inference based on non-probability surveys (including big data) using various methods.
+\code{nonprob} fits a model for inference based on non-probability surveys (including big data) using various methods.
 The function allows you to estimate the population mean with access to a reference probability sample, as well as sums and means of covariates.
 
 The package implements state-of-the-art approaches recently proposed in the literature: Chen et al. (2020),
@@ -166,7 +166,7 @@ Yang et al. (2020), Wu (2022) and uses the \href{https://CRAN.R-project.org/pack
 It provides propensity score weighting (e.g. with calibration constraints), mass imputation (e.g. nearest neighbour) and
 doubly robust estimators that take into account minimisation of the asymptotic bias of the population mean estimators or
 variable selection.
-The package uses \code{survey} package functionality when a probability sample is available.
+The package uses functionality from the \code{survey} package when a probability sample is available.
 }
 \details{
 Let \mjseqn{y} be the response variable for which we want to estimate the population mean,
@@ -232,7 +232,7 @@ Notice that for \mjseqn{ \mathbf{h}\left(\mathbf{x}_i, \boldsymbol{\theta}\right
 Using the imputed values for the probability sample and the (known) design weights, we can build a population mean estimator of the form:
 \mjsdeqn{\hat{\mu}_{MI} = \frac{1}{N^B}\sum_{i \in S_{B}} d_{i}^{B} \hat{y}_i.}
-It opens the the door to a very flexible method for imputation models. The package uses generalized linear models from \code{\link[stats:glm]{stats::glm()}},
+It opens the door to a very flexible method for imputation models. The package uses generalized linear models from \code{\link[stats:glm]{stats::glm()}},
 the nearest neighbour algorithm using \code{\link[RANN:nn2]{RANN::nn2()}} and predictive mean matching.
 \item Doubly robust estimation -- The IPW and MI estimators are sensitive to misspecified models for the propensity score and outcome variable, respectively. To this end, so-called doubly robust methods are presented that take these problems into account.