CI.qmd

---
title: "Confidence intervals"
editor_options: 
  chunk_output_type: console
---

*Last modified on `r Sys.Date()`*

```{r, echo = FALSE, message = FALSE}
source("libs/Common.R")
```

```{r Load fonts, echo=FALSE, eval=TRUE, warning=FALSE, message=FALSE}
library(extrafont)
font_import(prompt=FALSE, pattern="[A/a]rchitects.*")
loadfonts(device="postscript")
# loadfonts(device="win") # To view in windows
```

# Introduction

Most data, or _batches_, are subsets of some underlying population. From these batches, we attempt to infer about the state of the population. For example, we may want to determine how many hours of TV are watched in each household each week. Since we seldom have the resources needed to collect the data for _all_ households, we opt to collect data for a small subset of the population. From this sampled survey, we compute a  summary statistic (such as the mean hours of TV watched each week). We then use this sample statistic as an estimate of the number of hours watched by _all_ households in the population. 

Now, if another investigator were to sample other households at random, she will probably come up with a slightly different summary statistic. Likewise, if a hundred other investigators were to sample households at random, they would come up with a hundred slightly different means of TV hours watched.  Let's explore this idea with a simulated data set:

```{r mean SE simulation, tidy=FALSE}
num.samp   <- 1000                      # Number of samples to collect
samp.size  <- 50                        # Size of each sample
samp.mean  <- vector(length = num.samp) # Create an empty vector array

# Sample the population 'num.samp' of times then compute the sample mean
for (i in 1:num.samp){
  samp.mean[i] <- mean(rbeta(samp.size,20,20,ncp=0)*60)
}
```

In the above code, the first two lines define the number of samples to collect, `r num.samp`, and the number of households to survey in _each_ sample, `r samp.size`. In other words, we have `r num.samp` investigators each sampling `r samp.size` households. The `for` loop collects a new sample (of `r samp.size` households) at each iteration `r num.samp` times. The function `rbeta(samp.size,20,20,ncp=0)*60` randomly generates `r samp.size` values from a predefined distribution (think of this as _simulating_ the number of hours of TV watched in each household `r samp.size` times). The **sample mean** of TV hours watched for each `r samp.size` household sample is then calculated using the `mean()` function. So each investigator computes a single value that represents the mean hours of TV watched in the  `r samp.size` households. Since we have `r num.samp` investigators, we have `r num.samp` _sample means_. We can plot the distribution of the _sample means_ using a histogram.

```{r plot1_not_executed, warning=FALSE,tidy=FALSE, eval=FALSE}
hist(samp.mean, xlab="Mean TV hours per week")
```

```{r plot1_executed, fig.width=5, fig.height=3, echo=FALSE}
op <- par("mar"=c(5,5,1,1))
hist(samp.mean, xlab="Mean TV hours per week (per 50 houshold)", main=NULL,col="grey",border="white",)
par(op)
```

Note that the distribution of the underlying population mean need not be normally distributed to produce a normally distributed sample mean distribution. That said, the histogram just presented does _not_ represent the distribution of mean hours of TV watched for the population, but the distribution of the _sample averages_.

# Estimate of a population statistic from many samples

From the histogram it seems that, out of the `r num.samp` samples, most sample means are centered around `r round(mean(samp.mean))`. In fact, we can  compute the arithmetic mean from the `r num.samp`  samples. This statistic is referred to as the **grand mean** (not to be confused with each individual _sample mean_ of which there are `r num.samp` in our example).
```{r example1_grand_mean}
grand.mean <- mean(samp.mean)
grand.mean
```

From the _grand mean_, we might be tempted to infer that the mean hours of TV watched by each household in the _entire_ population is `r round(grand.mean,2)` hours. But note from the histogram that there is some variability in the _means_ computed from each `r samp.size` household samples. It might behoove us to assess the chance that this estimate (the grand mean) is not representative of the whole _population_ mean. We can find out how big the chance error might be  by calculating the _standard error_.

The **standard error** is the _standard deviation_ of the sampling distribution of a statistic; it's the likely size of chance error when estimating a statistic. In our case the standard error of our sample means, $SE$, is computed by taking the standard deviation of the `r num.samp` sample means: 
```{r example1_se}
SE <- sd(samp.mean)
SE
```
We can now state that the average number of hours of TV watched per household per week is _`r round(grand.mean,2)`_ $\pm$  _`r round(SE,2)`_. It's important to note that $SE$ is _not_ an estimate of the population standard deviation but a measure of uncertainty in the population mean estimate.

In most cases, we do not have the luxury of collecting hundreds or thousands of samples. We usually only have a _single_ sample to work with (hence a single sample from which to make inferences about the whole population). Methods in _estimating_ the sample variability given a single sample is covered next.

# Estimate of a population statistic from a single sample

Let's assume that we only have a single sample of 50 households to work with. Let's store these values in a vector we'll call `x`.

```{r example2_single_sample, tidy=FALSE}
x <- c(25.7, 38.5, 29.3, 25.1, 30.6, 34.6, 30.0, 39.0, 33.7, 31.6, 
       25.9, 34.4, 26.9, 23.0, 31.1, 29.3, 34.5, 35.1, 31.2, 33.2, 
       30.2, 36.4, 37.5, 27.6, 24.6, 23.9, 27.0, 29.5, 30.1, 29.6, 
       27.3, 31.2, 32.5, 25.7, 30.1, 24.2, 24.1, 26.4, 31.0, 20.7, 
       33.5, 32.2, 34.7, 32.6, 33.5, 32.7, 25.6, 31.1, 32.9, 25.9)
```

The standard error of the mean from the _one_ sample, can be estimated using the following formula:

$$
SE_{mean} = \frac{SD}{\sqrt{sample\; size}}
$$

where $SD$ is the standard deviation of number of hours watched for the _entire population_ (i.e. _all_ households, not just those sampled). The standard error of the mean is sometimes represented as $\sigma_{\bar X}$. It's important to note two things here:
   *  this approximation applies only to the uncertainty associated with the estimate of the **population mean** (and not other statistics like the median, count or percentage--these are treated later),
   * this approximation holds for a _large_ sample sizes only (usually 30 or greater). 

However, there's a problem. We usually don't know the population's $SD$ (if we did, we would not need to bother with sampling in the first place!) so, for a *large* sample size, we can estimate $SD$ using the sample means' standard deviation, $SD_{sample}$ giving us
$$
SE_{mean} = \frac{SD_{sample}}{\sqrt{sample\; size}}.
$$

So following up with our single sample example, we can estimate the population mean and the standard error of the mean from our sample as:

```{r example2_SE, tidy=FALSE}
mean.x <- mean(x)
SE.x   <- sd(x) / sqrt(length(x))
```

where `length(x)` is an `R` function that gives us the number of values in the vector `x` (i.e. our sample size $n$). In this example, the population mean estimate is `r round(mean.x,2)` hours with a standard error, $\sigma_{\bar X}$, of `r round(SE.x,2)` hours.

So how exactly do we interpret $SE$ ? As noted earlier, $SE$ is _not_ an estimate of the population standard deviation. It's a measure of how likely the interval, defined by $SE$, encompasses the true (population) mean. In the following figure, the sample means for 30 samples are plotted along with error bars depicting their $\pm$ 1 $SE$ (another way to interpret a $SE$ is to say that we are ~68% confident that the $\pm$ 1 $SE$ interval encompasses the true population mean). The blue vertical line represents the true population mean of 30, the *expected value* (the unknown parameter we are usually seeking). Batches of data (samples) whose $\pm$ 1 $SE$ range does not contain the true population mean are plotted in red.

```{r echo=FALSE, message=FALSE, fig.width=5, fig.height=4, warning=FALSE}
library(gplots)
OP <- par("mar"=c(3,6,3,1))
x2.mean <- c(29.68,30.44,29.61,30.83,30.72,30.99,30.12,30.67,31.29,30.3,29.03,
        30.34,31.26,31.21,29.42,30.88,27.07,29.35,31.53,29.75,31.88,30.89,
        31.84,29.71,31.05,31.49,29.43,29.46,29.25,28.15)
x2.se   <- c(1.17,0.96,0.88,1.13,1,0.72,1.06,1.41,1.13,0.93,0.82,0.86,1.15,
             0.95,0.96,0.95,1.14,1.15,1.19,1.06,0.83,0.74,0.97,1.03,1.15,
             1.29,0.99,1.05,1.29,0.91)
plotCI(y=1:length(x2.mean), x=x2.mean, uiw=x2.se, err="x",
       ylab = "", xlab="", axes=F, pch=16,col="#777777",sfrac=0.000)
axis(side = 1, lwd = .5, line = 0, family= "Architects Daughter", las = 1, tck = -.025, cex.axis=1.0)
mtext(side=1, text="Sample means with +/- one SE", family= "Architects Daughter", 
      las=1,outer = FALSE , adj=0.5, line=2, cex=1.0)
abline(v=30, lty=1,lw=2, col="blue")
mtext(side=3, text="Population\n mean", family= "Architects Daughter", 
      las=1,outer = FALSE , adj=0.6, line=0.5, col="blue", cex=1.0)

x2.mean2 <- x2.mean
x2.se2   <- x2.se
x2.mean2[(x2.mean + x2.se) > 30 & (x2.mean - x2.se) < 30] <- NA
x2.se2[(x2.mean + x2.se) > 30 & (x2.mean - x2.se) < 30]   <- NA

plotCI(y=1:length(x2.mean2),
       x=x2.mean2,
       uiw=x2.se2, 
       err="x",col="red",pch=16,add=TRUE, sfrac=0.000)
par(OP)
```

It's clear from the figure that `r table(is.na(x2.mean2))[1]` samples have a 68% confidence interval that do not contain the true population mean. If we want to increase our confidence that the sample mean contain the true mean value, we can widen our confidence from 1 $SE$ to 2 $SE$ which covers about 95% of each sample distribution.


```{r echo=FALSE, message=FALSE, fig.width=5, fig.height=4, warning=FALSE}
library(gplots)
OP <- par("mar"=c(3,6,3,1))
x2.mean <- c(29.68,30.44,29.61,30.83,30.72,30.99,30.12,30.67,31.29,30.3,29.03,
        30.34,31.26,31.21,29.42,30.88,27.07,29.35,31.53,29.75,31.88,30.89,
        31.84,29.71,31.05,31.49,29.43,29.46,29.25,28.15)
x2.se   <- c(1.17,0.96,0.88,1.13,1,0.72,1.06,1.41,1.13,0.93,0.82,0.86,1.15,
             0.95,0.96,0.95,1.14,1.15,1.19,1.06,0.83,0.74,0.97,1.03,1.15,
             1.29,0.99,1.05,1.29,0.91)
plotCI(y=1:length(x2.mean), x=x2.mean, uiw=(2 * x2.se), err="x",
       ylab = "", xlab="", axes=F, pch=16,col="#777777", sfrac=0.000)
axis(side = 1, lwd = .5, line = 0, family= "Architects Daughter", las = 1, tck = -.025, cex.axis=1.0)
mtext(side=1, text="Sample means with +/- two SE", family= "Architects Daughter", 
      las=1,outer = FALSE , adj=0.5, line=2, cex=1.0)
abline(v=30, lty=1,lw=2, col="blue")
mtext(side=3, text="Population\n mean", family= "Architects Daughter", 
      las=1,outer = FALSE , adj=0.6, line=0.5, col="blue", cex=1.0)

x2.mean2 <- x2.mean
x2.se2   <- x2.se
x2.mean2[(x2.mean + 2 * x2.se) > 30 & (x2.mean - 2 * x2.se) < 30] <- NA
x2.se2[(x2.mean + 2 * x2.se) > 30 & (x2.mean - 2 * x2.se) < 30]   <- NA

plotCI(y=1:length(x2.mean2),
       x=x2.mean2,
       uiw=(2 * x2.se2), 
       err="x",col="red",pch=16,add=TRUE, sfrac=0.000)
par(OP)
```

Now, only three of the 30 samples have a confidence interval that does not contain the true population mean of 30 hours of TV watched per week.

One $SE$ on both sides of the sample mean encompasses a probability of about 68% (34% on either side of the mean). This is our 68% *confidence interval* (i.e. there is a 68% chance that this interval contains the true population mean). If we want to increase our confidence that the interval contains the population mean, we can widen it by say 2 $SE$ which provides us with about 95% confidence.

The following figure displays the histogram from `r num.samp` sample means superimposed with a Gaussian curve. The (red) shaded areas represent the fraction of the sample mean values that fall within each SE ranges. Note that fewer and fewer sample means fall within SE intervals as the sample means diverge from the _grand mean_.

```{r  echo=FALSE, message=FALSE, fig.width=7, fig.height=3, warning=FALSE}
OP <- par("mar"=c(4,1,1,1))
hist(samp.mean, ylab=NA, xlab=NA,axes=FALSE, col="#DDDDDD",border="white",
     breaks=seq(min(samp.mean),max(samp.mean), length.out=25), prob=TRUE,main=NA)
ncurve.x = seq(range(samp.mean)[1],range(samp.mean)[2],by=.01)
ncurve.y = dnorm(ncurve.x, mean=grand.mean,sd=sd(samp.mean))
lines(ncurve.x, ncurve.y,lw=2, col="#555555")

# Shade left areas
left.SE = c(grand.mean, grand.mean - SE, grand.mean - 2 * SE)
for (i in left.SE){
  ncurve.x.mSE <- seq(range(samp.mean)[1], i, by=.01)
  ncurve.y.mSE <- dnorm(ncurve.x.mSE, mean=grand.mean,sd=sd(samp.mean))
  ncurve.x.mSE <- c(ncurve.x.mSE, max(ncurve.x.mSE), min(ncurve.x.mSE))
  ncurve.y.mSE <- c(ncurve.y.mSE, 0,0)
  polygon( ncurve.x.mSE, ncurve.y.mSE, col=rgb(1, 0, 0,.3), border=NA)
}

# Shade rigth areas
right.SE = c(grand.mean, grand.mean + SE, grand.mean + 2 * SE)
for (i in right.SE){
  ncurve.x.mSE <- seq(i, range(samp.mean)[2], by=.01)
  ncurve.y.mSE <- dnorm(ncurve.x.mSE, mean=grand.mean,sd=sd(samp.mean))
  ncurve.x.mSE <- c(ncurve.x.mSE, max(ncurve.x.mSE), min(ncurve.x.mSE))
  ncurve.y.mSE <- c(ncurve.y.mSE, 0, 0)
  polygon( ncurve.x.mSE, ncurve.y.mSE, col=rgb(1, 0, 0,.3), border=NA)
}

inter = c(rev(left.SE[-1]), right.SE)

lines(c(grand.mean,grand.mean), c(0, max(ncurve.y)), col="#BB2222",lw=2)

axis(side=1, at= inter, padj=-.5, line=-0.2, lwd=1, tck = -.03,
     labels = round(c(rev(left.SE[-1]), right.SE), 2) , family= "Architects Daughter", cex.axis=1.0)

axis(side=1, at= inter, padj=-.5, line=1.5,  tck = -.03, family= "Architects Daughter",
     labels= c("-2 SE", "-1 SE", "Grand Mean", "+1 SE", "+2 SE "), 
     col.axis="#BB2222", col="#BB2222", cex.axis=1.0)


inter.mid <- (inter[-1] + inter[-length(inter)]) / 2
axis(side=1, at= inter.mid, padj=-1, line=-2, lwd=0, tck = -.03, col.axis="#992222",
     labels = c("13.6%", "34.1%", "34.1%", "13.6%"),  family= "Architects Daughter", cex.axis=1.0)
par(OP)
```


So, given our most recent example, we can state that there is a ~ 68% chance that the interval (`r round(mean.x,2)` -  `r round(SE.x,2)`) and (`r round(mean.x,2)` +  `r round(SE.x,2)`) contains the true population mean of hours of TV watched each week or, put more succinctly, there is a 68% chance that the mean hours of TV watched by the population is `r round(mean.x,2)` $\pm$ `r round(SE.x,2)`.

Note that we are making a statement about the probability that the interval _includes_ the population mean and **not** the probability that the population mean _falls_ between this interval. The latter would imply that the population mean is a random variable which it's not, it's static! 

The standard error equation used thus far only applies to the _mean_, if another population statistic is to be estimated, then other $SE$ formulas have to be used. Other types of $SE$'s include the standard error of **sum** of samples, and **fractions** (of a binary output) of samples (_Freedman et al., p.422_):

$$
SE_{sum} = \sqrt{sample\; size} \times SD
$$

$$
SE_{fraction} = \frac{\sqrt{(fraction\; of\; 1's)(fraction\; of\; 0's)}}{\sqrt{sample\; size}}
$$

# A universal approach to computing the standard error: the bootstrap
The preceding section highlighted a few formulae for SE and some restricting assumptions such as the sample size. What if we want to to estimate the population median, mode, etc.. for which a formula may be illusive or non-existent? The solution to this problem is a technique called _bootstrap_. A bootstrap is a computer simulation often used to compute measures of accuracy to statistical estimates. The idea is simple: re-sample from the sample many times, then calculate the statistic for each sub-sample. The re-sampling is usually done with replacement (i.e. a same value can be picked more than once). The bootstrap standard estimated error _is_ the standard deviation of the bootstrap samples.

Continuing with the 50 household sample, we can  apply the bootstrap technique to estimate the population _median_ in R as follows:
```{r Boot SE ex1, tidy=FALSE}
median.boot <- numeric()   # Create an empty vector
for (i in 1:10000){
  sample.boot    <- sample(x, size = length(x), replace=TRUE)
  median.boot[i] <- median(sample.boot)
}
SE.boot <- sd(median.boot)
SE.boot
```
In this example, the sample `x` is re-sampled 10,000 times (each sample having a size equal to the original sample, `length(x)`). The function `sample()` samples anew from the sample `x` _with_ replacement each time. The median of each re-sampled batch is stored in the variable `median.boot[i]`. The variable `SE.boot` stores the standard error for the bootstrap sample medians, or `r SE.boot` in this working example (note that your result may differ slightly given that the re-sampling process is random).

Another way to implement the bootstrap technique is to use the function `boot()` from the `boot` library. This reduces the code to:

```{r Boot SE ex2, tidy=FALSE, message=FALSE}
library(boot)
f.med       <- function(Y,i) median(Y[i])
median.boot <- boot(x, R = 10000, statistic = f.med)
SE.boot     <- sd( as.vector(median.boot$t))
SE.boot
```
In this example, we are defining a function called `f.med` where the function returns the _median_ value for an array. This is a simple example of a function, but this demonstrates how one can create a customized statistic for which $SE$ is sought. The function `boot` returns a _list_ and not a numeric vector. To view the contents of a list, you can invoke the structure function `str()`.

```{r list structure, tidy=FALSE}
str(median.boot)
```
The list contains close to a dozen components. The component of interest to us is `t` which stores the median values from all 10,000 sub-samples. This component is accessed by adding the `$` symbol to the end of the list name `median.boot` (i.e. `median.boot$t`). If you type `median.boot$t` at a command prompt, you will note that the data are stored as a matrix (and not a vector). Hence, if you want to compute the standard deviation of this matrix by invoking `sd(median.boot$t)`, you will be presented with a warning along the lines of `Warning: sd(<matrix>) is deprecated...`. To circumvent this warning, you can convert the matrix to a vector using the `as.vector()` function (i.e. `sd( as.vector(meadian.boot$t) )`.


# Confidence intervals

The idea of a **confidence interval**, CI, is a natural extension of the standard error. It allows us to define a level of confidence in our population parameter estimate gleaned from a sample. For example, if we wanted to be 95% confident that the range of mean TV hours per week computed from our sample encompasses the true mean value for _all_ households in the population we would compute this interval by adding and subtracting $1.96\; SE$ to/from the sample mean. But beware, people will sometimes state this as _"... there is a 95% chance that the population mean falls between such and such values..."_ which is problematic as noted earlier since it implies that the population mean is a random variable when in fact it's not. The confidence interval reminds us that the chances are in the sampling and not the population parameter. 

A confidence interval of 68% and 95% are easily estimated from $1 SE$ or $1.96 SE$ respectively, but what if we want to define some other confidence interval such as 85% or 90%? To estimate the confidence interval for any other value, simply invoke the Student's _t_ quantile function `qt()` in conjunction with $SE$. For example, to generate a 90% confidence interval for the _mean_ hours of TV watched per household:

```{r CI 1, tidy=FALSE}
mean.int.90 <- mean.x + qt( c(0.05, 0.95), length(x) - 1) * SE.x
mean.int.90
```

In this example, we can state that we are 90% confident that the range [`r round(mean.int.90,2)`] encompasses the true population _mean_. The function `qt` finds the two-tailed critical values from Student's _t_ distribution with `length(x) -1` degrees of freedom (or df = `r length(x) -1` in our working example). Note that using Student's _t_ value is recommended over the _normal distribution's_ _z_ value when sample size is small. If the sample size is large, Student's _t_ value and the _normal_'s _z_ values converge. 

In the following figure, the 90% confidence range is shaded in the light red color. Recall that the distribution is for the many different sample means that could be drawn from the population and _not_ the distribution of the raw (sampled) data.

```{r  CI_explained, echo=FALSE, message=FALSE, fig.width=7, fig.height=3, warning=FALSE}
OP <- par("mar"=c(4,1,1,1))
ncurve.x = seq(range(samp.mean)[1],range(samp.mean)[2],by=.01)
ncurve.y = dnorm(ncurve.x, mean=mean.x,sd=sd(samp.mean))
#lines(ncurve.x, ncurve.y,lw=2, col="#555555")
plot(ncurve.x, ncurve.y, col="#555555", type="l", axes=FALSE, xlab=NA)
# Shade left areas
left.SE = c(mean.x, mean.int.90[1])
for (i in left.SE){
  ncurve.x.mSE <- seq(range(samp.mean)[1], i, by=.01)
  ncurve.y.mSE <- dnorm(ncurve.x.mSE, mean=mean.x,sd=sd(samp.mean))
  ncurve.x.mSE <- c(ncurve.x.mSE, max(ncurve.x.mSE), min(ncurve.x.mSE))
  ncurve.y.mSE <- c(ncurve.y.mSE, 0,0)
  polygon( ncurve.x.mSE, ncurve.y.mSE, col=rgb(1, 0, 0,.3), border=NA)
}

# Shade rigth areas
right.SE = c(mean.x, mean.int.90[2])
for (i in right.SE){
  ncurve.x.mSE <- seq(i, range(samp.mean)[2], by=.01)
  ncurve.y.mSE <- dnorm(ncurve.x.mSE, mean=mean.x,sd=sd(samp.mean))
  ncurve.x.mSE <- c(ncurve.x.mSE, max(ncurve.x.mSE), min(ncurve.x.mSE))
  ncurve.y.mSE <- c(ncurve.y.mSE, 0, 0)
  polygon( ncurve.x.mSE, ncurve.y.mSE, col=rgb(1, 0, 0,.3), border=NA)
}

inter = c(rev(left.SE[-1]), right.SE)

lines(c(mean.x,mean.x), c(0, max(ncurve.y)), col="#BB2222",lw=2)

axis(side=1, at= inter, padj=-.5, line=-0.2, lwd=1, tck = -.03,
     labels = round(c(rev(left.SE[-1]), right.SE), 2) , family= "Architects Daughter")

lab.qt = qt( c(0.05, 0.95), length(x) - 1)
axis(side=1, at= inter, padj=-.5, line=1.6,  tck = -.03, family= "Architects Daughter",
     labels= c(round(lab.qt[1],2), "t-values for 90% C.I.", round(lab.qt[2],2)), 
     col.axis="#BB2222", col="#BB2222")


inter.mid <- (inter[-1] + inter[-length(inter)]) / 2
axis(side=1, at= inter.mid, padj=-1, line=-2, lwd=0, tck = -.03, col.axis="#992222",
     labels = c("45%", "45%"),  family= "Architects Daughter")
par(OP)
```


Likewise, we can compute the 90% confidence interval from the bootstrap $SE$ estimate of the _median_.
```{r CI 2, tidy=FALSE}
median.int.90 <- mean(median.boot$t) + qt( c(0.05, 0.95), length(x) - 1) * SE.boot
median.int.90
```

This example suggests that there is a 90% chance that the range of median values [`r round(median.int.90,2)`] encompasses the true population _median_. Note that using standard bootstrap techniques to estimate $CI$ requires that the bootstrap distribution of a statistic follow a normal distribution. If it does not, the computed $CI$ may over- or under-estimate the true confidence interval. A safer (and more robust) bootstrap based measure of $CI$ is the _bias corrected_ bootstrap method, **BCa**, which can be easily computed using the `boot.ci` function in the `boot` library. The parameter `conf` defines the confidence interval sought.

```{r CI BCa, tidy=FALSE}
median.int.90.BCa <- boot.ci(median.boot, conf = .90, type="bca")
median.int.90.BCa
```

The BCa interval values can be extracted from the variable by typing `median.int.90.BCa$bca[4:5]` at the prompt.

On a final note, the **margin of error (MoE)** can be used interchangeably with the confidence interval. The [American Statistical Association][1] (page 64) defines the MoE as a 95% confidence interval, but this definition is not consistent across the literature. So if given estimates with a measure of confidence defined as a MoE make sure to ask for the provider's definition of the MoE!

# References

Freedman D.A., Robert Pisani, Roger Purves. _Statistics_, 4th edition, 2007.  
McClave J.T., Dietrich F.H., _Statistics_, 4th edition, 1988.  

-----

**Session Info**:

```{r,results='asis', echo=FALSE}
detach("package:extrafont")
pander::pander(sessionInfo(), locale = FALSE)
```