# Inference for two-way tables {#sec-inference-tables}
```{r}
#| include: false
source("_common.R")
```
::: {.chapterintro data-latex=""}
In [Chapter -@sec-inference-two-props] our focus was on the difference in proportions, a statistic calculated from finding the success proportions (from the binary response variable) measured across two groups (the binary explanatory variable).
As we will see in the examples below, sometimes the explanatory or response variables have more than two possible options.
In that setting, a difference across two groups is not sufficient, and the proportion of "success" is not well defined if there are 3 or 4 or more possible response levels.
The primary way to summarize categorical data where the explanatory and response variables both have 2 or more levels is through a two-way table as in @tbl-ipod-ask-data-summary.
Note that with two-way tables, there is not an obvious single parameter of interest.
Instead, research questions usually focus on how the proportions of the response variable change (or not) across the different levels of the explanatory variable.
Because there is not a population parameter to estimate, bootstrapping to find the standard error of the estimate is not meaningful.
As such, for two-way tables, we will focus on the randomization test and corresponding mathematical approximation (and not bootstrapping).
:::
## Randomization test of independence
We all buy used products -- cars, computers, textbooks, and so on -- and we sometimes assume the sellers of those products will be forthright about any underlying problems with what they're selling.
This is not something we should take for granted.
Researchers recruited 219 participants in a study where they would sell a used iPod[^18-inference-tables-1] that was known to have frozen twice in the past.
The participants were incentivized to get as much money as they could for the iPod since they would receive a 5% cut of the sale on top of \$10 for participating.
The researchers wanted to understand what types of questions would elicit the seller to disclose the freezing issue.
[^18-inference-tables-1]: For readers not as old as the authors, an iPod is basically an iPhone without any cellular service, assuming it was one of the later generations.
Earlier generations were more basic.
\clearpage
Unbeknownst to the participants who were the sellers in the study, the buyers were collaborating with the researchers to evaluate the influence of different questions on the likelihood of getting the sellers to disclose the past issues with the iPod.
The scripted buyers started with "Okay, I guess I'm supposed to go first. So you've had the iPod for 2 years ..." and ended with one of three questions:
- General: What can you tell me about it?
- Positive Assumption: It does not have any problems, does it?
- Negative Assumption: What problems does it have?
The question is the treatment given to the sellers, and the response is whether the question prompted them to disclose the freezing issue with the iPod.
The results are shown in @tbl-ipod-ask-data-summary, and the data suggest that asking the *What problems does it have?* question was the most effective at getting the seller to disclose the past freezing issues.
However, you should also be asking yourself: could we see these results due to chance alone if there really is no difference in the question asked, or is this in fact evidence that some questions are more effective for getting at the truth?
```{r}
#| label: ask-data-prep
# Recode the response and question type with reader-friendly labels
ask <- ask |>
  mutate(
    response = if_else(response == "disclose", "Disclose problem", "Hide problem"),
    question_class = case_when(
      question_class == "general" ~ "General",
      question_class == "neg_assumption" ~ "Negative assumption",
      question_class == "pos_assumption" ~ "Positive assumption"
    ),
    question_class = fct_relevel(question_class, "General", "Positive assumption", "Negative assumption")
  )
```
```{r}
#| label: tbl-ipod-ask-data-summary
#| tbl-cap: |
#|   Summary of the iPod study, where a question was posed to the study
#|   participant who acted as the seller.
#| tbl-pos: H
ask |>
count(question_class, response) |>
pivot_wider(names_from = response, values_from = n) |>
adorn_totals(where = c("row", "col")) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("Question", "Disclose problem", "Hide problem", "Total")
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
)
```
::: {.data data-latex=""}
The [`ask`](http://openintrostat.github.io/openintro/reference/ask.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
:::
The hypothesis test for the iPod experiment is really about assessing whether there is convincing evidence that there was a difference in the success rates that each question had on getting the participant to disclose the problem with the iPod.
In other words, the goal is to check whether the buyer's question was independent\index{independence} of whether the seller disclosed a problem.
```{r}
#| include: false
terms_chp_18 <- c("independence")
```
### Expected counts in two-way tables
While we would not expect the number of disclosures to be exactly the same across the three question classes, the rate of disclosure seems substantially different across the three groups.
In order to investigate whether the differences in rates are due to natural variability in people's honesty or due to a treatment effect (i.e., the question causing the differences), we need to compute estimated counts for each cell in a two-way table.
::: {.workedexample data-latex=""}
From the experiment, we can compute the proportion of all sellers who disclosed the freezing problem as $61/219 = 0.2785.$ If there really is no difference among the questions and 27.85% of sellers were going to disclose the freezing problem no matter the question they were asked, how many of the 73 people in the `General` group would we have expected to disclose the freezing problem?
------------------------------------------------------------------------
We would predict that $0.2785 \times 73 = 20.33$ sellers would disclose the problem.
Obviously we observed fewer than this, though it is not yet clear if that is due to chance variation or whether that is because the questions vary in how effective they are at getting to the truth.
:::
::: {.guidedpractice data-latex=""}
If the questions were actually equally effective, meaning about 27.85% of respondents would disclose the freezing issue regardless of what question they were asked, about how many sellers would we expect to *hide* the freezing problem from the Positive Assumption group?[^18-inference-tables-2]
:::
[^18-inference-tables-2]: We would expect $(1 - 0.2785) \times 73 = 52.67.$ It is okay that this result, like the result from the Example above, is a fraction.
Using the same strategy employed in the previous Example and Guided Practice, we can compute the expected number of sellers in each group who would disclose or hide the freezing issue if the questions had no impact on what they disclosed.
These expected counts were used to construct @tbl-ipod-ask-data-summary-expected, which is the same as @tbl-ipod-ask-data-summary, except now the expected counts have been added in parentheses.
```{r}
#| label: tbl-ipod-ask-data-summary-expected
#| tbl-cap: The observed counts and the expected counts for the iPod experiment.
#| tbl-pos: H
ask_chi_sq <- chisq.test(ask$response, ask$question_class)
ask_chi_sq_obs <- ask_chi_sq$observed |>
as_tibble() |>
mutate(type = "observed")
ask_chi_sq_exp <- ask_chi_sq$expected |>
as.table() |>
as_tibble() |>
mutate(type = "expected")
ask_chi_sq_tabs <- bind_rows(ask_chi_sq_obs, ask_chi_sq_exp) |>
rename_with(.fn = str_remove, .cols = everything(), "ask\\$")
ask_chi_sq_tabs |>
mutate(response_type = paste0(response, "-", type)) |>
select(-response, -type) |>
pivot_wider(names_from = response_type, values_from = n) |>
relocate(
question_class,
contains("Disclose"),
contains("Hide")
) |>
mutate(across(contains("expected"), ~ paste0("(", round(.x, 2), ")"))) |>
rowwise() |>
mutate(Total = sum(c_across(contains("observed")))) |>
adorn_totals(where = "row") |>
mutate(
`Disclose problem-expected` = ifelse(`Disclose problem-expected` == "-", NA, `Disclose problem-expected`),
`Hide problem-expected` = ifelse(`Hide problem-expected` == "-", NA, `Hide problem-expected`)
) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("", "", "", "", "", "")
) |>
column_spec(1, width = "15em") |>
column_spec(3, color = IMSCOL["blue", "full"], italic = TRUE) |>
column_spec(5, color = IMSCOL["blue", "full"], italic = TRUE) |>
column_spec(6, width = "5em") |>
add_header_above(c(" ", "Disclose problem" = 2, "Hide problem" = 2, "Total")) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
)
```
The examples and exercises above provided some help in computing expected counts.
In general, expected counts for a two-way table may be computed using the row totals, column totals, and the table total.
For instance, if there was no difference between the groups, then about 27.85% of each row should be in the first column:
$$
\begin{aligned}
0.2785\times (\text{row 1 total}) &= 20.33 \\
0.2785\times (\text{row 2 total}) &= 20.33 \\
0.2785\times (\text{row 3 total}) &= 20.33
\end{aligned}
$$
Looking back to how 0.2785 was computed -- as the fraction of sellers who disclosed the freezing issue $(61/219)$ -- these three expected counts could have been computed as
$$
\begin{aligned}
\left(\frac{\text{row 1 total}}{\text{table total}}\right)
\text{(column 1 total)} &= 20.33 \\
\left(\frac{\text{row 2 total}}{\text{table total}}\right)
\text{(column 1 total)} &= 20.33 \\
\left(\frac{\text{row 3 total}}{\text{table total}}\right)
\text{(column 1 total)} &= 20.33
\end{aligned}
$$
This leads us to a general formula for computing expected counts in a two-way table when we would like to test whether there is strong evidence of an association between the column variable and row variable.
::: {.important data-latex=""}
**Computing expected counts in a two-way table.**
\index{expected counts}
To calculate the expected count for the $i^{th}$ row and $j^{th}$ column, compute
$$\text{Expected Count}_{\text{row }i,\text{ col }j} = \frac{(\text{row $i$ total}) \times (\text{column $j$ total})}{\text{table total}}$$
:::
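To see the formula in action, here is a minimal sketch in base R; the `observed` matrix below is typed in by hand from @tbl-ipod-ask-data-summary for illustration, and is not how the chunks in this chapter compute expected counts.

```{r}
#| echo: true
# Observed counts from the iPod table: rows are the three questions
# (General, Positive assumption, Negative assumption); columns are
# Disclose problem and Hide problem.
observed <- matrix(c(2, 71, 23, 50, 36, 37), nrow = 3, byrow = TRUE)

# Expected count for each cell: (row total * column total) / table total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected
# Every "Disclose" cell is 20.33 and every "Hide" cell is 52.67.
```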
```{r}
#| include: false
terms_chp_18 <- c(terms_chp_18, "expected counts")
```
### The observed chi-squared statistic
The chi-squared test statistic for a two-way table is found by computing, for every cell in the table, how far the observed count is from the expected count, relative to the size of the expected count.
For each table count, compute:
$$
\begin{aligned}
&\text{General formula} &&
\frac{(\text{observed count } - \text{expected count})^2}
{\text{expected count}} \\
&\text{Row 1, Col 1} &&
\frac{(2 - 20.33)^2}{20.33} = 16.53 \\
&\text{Row 2, Col 1} &&
\frac{(23 - 20.33)^2}{20.33} = 0.35 \\
& \hspace{9mm}\vdots &&
\hspace{13mm}\vdots \\
&\text{Row 3, Col 2} &&
\frac{(37 - 52.67)^2}{52.67} = 4.66
\end{aligned}
$$
Adding the computed value for each cell gives the chi-squared test statistic $X^2:$
$$X^2 = 16.53 + 0.35 + \dots + 4.66 = 40.13$$
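Continuing the hand-typed sketch from earlier (reusing the `observed` and `expected` matrices defined above), the same sum can be computed in one line:

```{r}
#| echo: true
# Sum the (observed - expected)^2 / expected contributions over all cells.
sum((observed - expected)^2 / expected)
# Approximately 40.13, matching the hand calculation above.
```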
Is 40.13 a big number?
That is, does it indicate that the observed and expected values are really different?
Or is 40.13 a value of the statistic that we would expect to see just due to natural variability?
Previously, we applied the randomization test to the setting where the research question investigated a difference in proportions.
The same idea of shuffling the data under the null hypothesis can be used in the setting of the two-way table.
### Variability of the statistic
Assuming that the individuals would disclose or hide the problems **regardless** of the question they are given (i.e., that the null hypothesis is true), we can randomize the data by reassigning the 61 disclosed problems and 158 hidden problems to the three groups at random.
@tbl-ipod-ask-data-summary-rand shows a possible randomization of the observed data under the condition that the null hypothesis is true (in contrast to the original observed data in @tbl-ipod-ask-data-summary).
```{r}
#| label: tbl-ipod-ask-data-summary-rand
#| tbl-cap: |
#| Summary of the iPod study.
#| tbl-pos: H
set.seed(4747)
# randomize
ask_rand <- ask |>
mutate(question_class = sample(question_class))
ask_rand |>
count(question_class, response) |>
pivot_wider(names_from = response, values_from = n) |>
adorn_totals(where = c("row", "col")) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("Question", "Disclose problem", "Hide problem", "Total")
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
)
```
As before, the randomized data are used to find a single value for the test statistic (here a chi-squared statistic).
The chi-squared statistic for the randomized two-way table is found by comparing the observed and expected counts for each cell in the *randomized* table.
For each cell, compute:
$$
\begin{aligned}
&\text{General formula} &&
\frac{(\text{observed count } - \text{expected count})^2}
{\text{expected count}} \\
&\text{Row 1, Col 1} &&
\frac{(29 - 20.33)^2}{20.33} = 3.7 \\
&\text{Row 2, Col 1} &&
\frac{(15 - 20.33)^2}{20.33} = 1.4 \\
& \hspace{9mm}\vdots &&
\hspace{13mm}\vdots \\
&\text{Row 3, Col 2} &&
\frac{(56 - 52.67)^2}{52.67} = 0.211
\end{aligned}
$$
Adding the computed value for each cell gives the chi-squared test statistic $X^2:$ \index{chi-squared statistic}
$$X^2 = 3.7 + 1.4 + \dots + 0.211 = 8$$
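If you would rather not sum the cells by hand, the statistic for the shuffled data can also be pulled from `chisq.test()`; a sketch reusing the `ask_rand` data frame created in the chunk above (no continuity correction is involved, since the table is larger than 2-by-2):

```{r}
#| echo: true
# Chi-squared statistic for the single randomized dataset.
chisq.test(ask_rand$question_class, ask_rand$response)$statistic
# Approximately 8 for the shuffle shown above.
```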
```{r}
#| include: false
terms_chp_18 <- c(terms_chp_18, "chi-squared statistic")
```
### Observed statistic vs. null chi-squared statistics
As before, one randomization will not be sufficient for understanding whether the observed statistic is unusual relative to the chi-squared statistics we would expect when $H_0$ is true.
To investigate whether 40.13 is large enough to indicate the observed and expected counts are substantially different, we need to understand the variability in the values of the chi-squared statistic we would expect to see if the null hypothesis was true.
@fig-ipodRandDotPlot plots 1,000 chi-squared statistics generated under the null hypothesis.
We can see that the observed value is so far from the null statistics that the simulated p-value is zero.
That is, the probability of seeing a statistic as large as the observed one when the null hypothesis is true is virtually zero.
In this case we can conclude that the decision of whether to disclose the iPod's problem is changed by the question asked.
We use the causal language of "changed" because the study was an experiment.
Note that with a chi-squared test, we only know that the two variables (`question_class` and `response`) are related (i.e., not independent).
We are not able to claim which type of question causes which type of response.
```{r}
#| label: fig-ipodRandDotPlot
#| fig-cap: |
#|   A histogram of chi-squared statistics from 1,000 simulations produced under
#|   the null hypothesis, $H_0,$ where the question is independent of the response. The
#|   observed statistic of 40.13 is marked by the red line. None of the 1,000 simulations
#|   had a chi-squared value of at least 40.13. In fact, none of the simulated chi-squared
#|   statistics came anywhere close to the observed statistic!
#| fig-alt: |
#|   A histogram of chi-squared statistics from 1,000 simulations produced under
#|   the null hypothesis, where the question is independent of the response. The
#|   observed statistic of 40.13 is marked by the red line. None of the 1,000 simulations
#|   had a chi-squared value of at least 40.13.
#| fig-asp: 0.6
set.seed(4747)
ask_rand_obs <- ask |>
specify(response ~ question_class) |>
calculate(stat = "Chisq") |>
pull()
ask_rand_dist <- ask |>
specify(response ~ question_class) |>
hypothesise(null = "independence") |>
generate(reps = 1000, type = "permute") |>
calculate(stat = "Chisq")
ggplot(ask_rand_dist, aes(x = stat)) +
geom_histogram(binwidth = 1) +
geom_vline(xintercept = ask_rand_obs, col = "red", lwd = 1.5) +
expand_limits(x = 40.13) +
labs(
x = "Chi-squared statistics assuming a true null hypothesis",
y = "Count"
)
```
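To turn the comparison in @fig-ipodRandDotPlot into a number, the **infer** pipeline from the chunk above can be extended with `get_p_value()`; a sketch reusing the `ask_rand_dist` and `ask_rand_obs` objects defined there:

```{r}
#| echo: true
# Proportion of the 1,000 null statistics at least as large as 40.13.
ask_rand_dist |>
  get_p_value(obs_stat = ask_rand_obs, direction = "greater")
# Returns 0 here: none of the simulated statistics reached the observed value.
```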
## Mathematical model for test of independence {#sec-mathchisq}
### The chi-squared test of independence
Previously, in @sec-math-2prop, we applied the Central Limit Theorem to the sampling variability of $\hat{p}_1 - \hat{p}_2.$ The result was that we could use the normal distribution (e.g., $z^*$ values (see @fig-choosingZForCI) and p-values from $Z$ scores) to complete the mathematical inferential procedure.
The chi-squared test statistic has a different mathematical distribution called the Chi-squared distribution.
The important specification to make in describing the chi-squared distribution is something called degrees of freedom.
The degrees of freedom change the shape of the chi-squared distribution to fit the problem at hand.
@fig-chisqDistDF visualizes different chi-squared distributions corresponding to different degrees of freedom.
```{r}
#| label: fig-chisqDistDF
#| fig-cap: |
#| The chi-squared distribution for differing degrees of freedom. The larger
#| the degrees of freedom, the longer the right tail extends. The smaller the degrees
#| of freedom, the more peaked the mode on the left becomes.
#| fig-alt: |
#| The chi-squared distribution for differing degrees of freedom. The larger
#| the degrees of freedom, the longer the right tail extends. The smaller the degrees
#| of freedom, the more peaked the mode on the left becomes.
#| fig-asp: 0.5
x <- c(0, seq(0.0000001, 40, 0.05))
DF <- c(2.0000001, 4, 9)
y <- list()
for (i in 1:length(DF)) {
y[[i]] <- dchisq(x, DF[i])
}
par(mar = c(2, 0, 0, 0))
plot(0, 0,
type = "n",
xlim = c(0, 25),
ylim = range(c(y, recursive = TRUE)),
axes = FALSE,
xlab = "",
ylab = ""
)
for (i in 1:length(DF)) {
lines(x, y[[i]],
lty = i,
col = IMSCOL[ifelse(i == 3, 4, i)],
lwd = 1.5 + i / 2
)
}
abline(h = 0)
axis(1)
legend("topright",
lwd = 0.3 + 1:4 / 1.25,
col = IMSCOL[c(1, 2, 4)],
lty = 1:4,
legend = paste(round(DF)),
title = "Degrees of Freedom",
cex = 1
)
```
### Variability of the chi-squared statistic
As it turns out, the chi-squared test statistic follows a **Chi-squared distribution**\index{Chi-squared distribution} when the null hypothesis is true.
For two-way tables, the degrees of freedom are equal to: $df = \text{(number of rows minus 1)}\times \text{(number of columns minus 1)}$.
In our example, the degrees of freedom parameter is $df = (2-1)\times (3-1) = 2$.
```{r}
#| include: false
terms_chp_18 <- c(terms_chp_18, "Chi-squared distribution")
```
### Observed statistic vs. null chi-squared statistics
::: {.important data-latex=""}
**The test statistic for assessing the independence between two categorical variables is the** $X^2$ **statistic.**
The $X^2$ statistic is a ratio of how the observed counts vary from the expected counts as compared to the expected counts (which are a measure of how large the sample size is).
$$X^2 = \sum_{i,j} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
When the null hypothesis is true and the conditions are met, $X^2$ has a Chi-squared distribution with $df = (r-1) \times (c-1).$
Conditions:
- Independent observations
- Large samples: at least 5 expected counts in each cell
:::
To bring it back to the example, we can safely assume that the observations are independent, as the question groups were randomly assigned.
Additionally, there are over 5 expected counts in each cell, so the conditions for using the chi-squared distribution are met.
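The expected counts used to check the large sample condition are available from the `ask_chi_sq` object created with `chisq.test()` in an earlier chunk; a quick check:

```{r}
#| echo: true
# Smallest expected count across the six cells; well above 5.
min(ask_chi_sq$expected)
```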
If the null hypothesis is true (i.e., the questions had no impact on the sellers in the experiment), then the test statistic $X^2 = 40.13$ is expected to follow a Chi-squared distribution with 2 degrees of freedom.
Using this information, we can compute the p-value for the test, which is depicted in @fig-iPodChiSqTail.
::: {.important data-latex=""}
**Computing degrees of freedom for a two-way table.**
\index{degrees of freedom!chi-squared test}
When applying the chi-squared test to a two-way table, we use $df = (R-1)\times (C-1)$ where $R$ is the number of rows in the table and $C$ is the number of columns.
:::
```{r}
#| label: fig-iPodChiSqTail
#| fig-cap: Visualization of the p-value for $X^2 = 40.13$ when $df = 2$.
#| fig-alt: |
#| Chi-square distribution (with df = 2) curve, shaded for p-value for
#| X2 = 40.13. The p-value is so small that it is not visible on the plot.
#| fig-asp: 0.5
par(mar = c(2, 0, 0, 0))
x <- 40.13
ChiSquareTail(
x, 2,
c(0, 50),
col = IMSCOL["blue", "full"]
)
text(x, 0, "Tail area (1 / 500 million)\nis too small to see", pos = 3)
lines(c(x, 1000 * x), rep(0, 2), col = IMSCOL["blue", "full"], lwd = 3)
```
The software R can be used to find the p-value with the function `pchisq()`.
Just like `pnorm()`, `pchisq()` always gives the area to the left of the cutoff value.
Because, in this example, the p-value is represented by the area to the right of 40.13, we subtract the output of `pchisq()` from 1.
```{r}
#| echo: true
1 - pchisq(40.13, df = 2)
```
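Equivalently, `pchisq()` can return the upper tail area directly via its `lower.tail` argument, which avoids the subtraction:

```{r}
#| echo: true
pchisq(40.13, df = 2, lower.tail = FALSE)
```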
::: {.workedexample data-latex=""}
Find the p-value and draw a conclusion about whether the question affects the seller's likelihood of reporting the freezing problem.
------------------------------------------------------------------------
Using a computer, we can compute a very precise value for the tail area above $X^2 = 40.13$ for a chi-squared distribution with 2 degrees of freedom: 0.000000002.
Using a discernibility level of $\alpha=0.05,$ the null hypothesis is rejected since the p-value is smaller.
That is, the data provide convincing evidence that the question asked did affect a seller's likelihood to tell the truth about problems with the iPod.
:::
::: {.workedexample data-latex=""}
@tbl-diabetes2ExpMetRosiLifestyleSummary summarizes the results of an experiment evaluating three treatments for Type 2 Diabetes in patients aged 10-17 who were being treated with metformin.
The three treatments considered were continued treatment with metformin (`met`), treatment with metformin combined with rosiglitazone (`rosi`), or a `lifestyle` intervention program.
Each patient had a primary outcome: either they lacked glycemic control (failure) or they did not lack that control (success).
What are appropriate hypotheses for this test?
------------------------------------------------------------------------
- $H_0:$ There is no difference in the effectiveness of the three treatments.
- $H_A:$ There is some difference in effectiveness between the three treatments, e.g., perhaps the `rosi` treatment performed better than `lifestyle`.
:::
```{r}
#| label: tbl-diabetes2ExpMetRosiLifestyleSummary
#| tbl-cap: Results for the Type 2 Diabetes study.
#| tbl-pos: H
diabetes2 |>
count(outcome, treatment) |>
pivot_wider(names_from = outcome, values_from = n) |>
adorn_totals(where = c("row", "col")) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("Treatment", "Failure", "Success", "Total")
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
) |>
column_spec(1:4, width = "5em")
```
::: {.data data-latex=""}
The [`diabetes2`](http://openintrostat.github.io/openintro/reference/diabetes2.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
:::
Typically we will use a computer to do the computational work of finding the chi-squared statistic.
However, it is always good to have a sense for what the computer is doing, and in particular, calculating the values which would be expected if the null hypothesis is true can help to understand the null hypothesis claim.
Additionally, comparing the expected and observed values by eye often gives the researcher some insight into why the null hypothesis for a given test is or is not rejected.
::: {.guidedpractice data-latex=""}
A chi-squared test for a two-way table may be used to test the hypotheses in the diabetes Example above.
To get a sense for the statistic used in the chi-squared test, first compute the expected values for each of the six table cells.[^18-inference-tables-3]
:::
[^18-inference-tables-3]: The expected count for row one / column one is found by multiplying the row one total (234) and column one total (319), then dividing by the table total (699): $\frac{234\times 319}{699} = 106.8.$ Similarly for the second column and the first row: $\frac{234\times 380}{699} = 127.2.$ Row 2: 105.9 and 126.1.
Row 3: 106.3 and 126.7.
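In practice, the expected counts (and the full test) can be obtained from `chisq.test()`; a sketch using the `diabetes2` data frame loaded above, with the same `treatment` and `outcome` columns as the table chunk:

```{r}
#| echo: true
diabetes2_chi_sq <- chisq.test(diabetes2$treatment, diabetes2$outcome)
# Expected counts under the null hypothesis, for comparison with the footnote.
diabetes2_chi_sq$expected
# Chi-squared statistic, degrees of freedom, and p-value.
diabetes2_chi_sq
```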
Note: when analyzing 2-by-2 contingency tables (that is, when both variables have only two possible options), one guideline is to use the two-proportion methods introduced in [Chapter -@sec-inference-two-props].
\clearpage
## Chapter review {#sec-chp18-review}
### Summary
In this chapter we extended the randomization / bootstrap / mathematical model paradigm to research questions involving categorical variables.
We continued working with one population proportion as well as the difference in population proportions, but the test of independence allowed for hypothesis testing on categorical variables with more than two levels.
We note that the normal model was an excellent mathematical approximation to the sampling distribution of sample proportions (or differences in sample proportions), but that questions involving categorical variables with more than two levels required a new mathematical model, the chi-squared distribution.
As seen in [Chapter -@sec-foundations-randomization], [Chapter -@sec-foundations-bootstrapping] and [Chapter -@sec-foundations-mathematical], almost all the research questions can be approached using computational methods (e.g., randomization tests or bootstrapping) or using mathematical models.
We continue to emphasize the importance of experimental design in making conclusions about research claims.
In particular, recall that variability can come from different sources (e.g., random sampling vs. random allocation, see @fig-randsampValloc).
### Terms
The terms introduced in this chapter are presented in @tbl-terms-chp-18.
If you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.
You should be able to easily spot them as **bolded text**.
```{r}
#| label: tbl-terms-chp-18
#| tbl-cap: Terms introduced in this chapter.
#| tbl-pos: H
make_terms_table(terms_chp_18)
```
\clearpage
## Exercises {#sec-chp18-exercises}
Answers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-18].
::: {.exercises data-latex=""}
{{< include exercises/_18-ex-inference-tables.qmd >}}
:::