# Inference for two-way tables {#sec-inference-tables}
```{r}
#| include: false
source("_common.R")
```
::: {.chapterintro data-latex=""}
In [Chapter -@sec-inference-two-props] our focus was on the difference in proportions, a statistic calculated from finding the success proportions (from the binary response variable) measured across two groups (the binary explanatory variable).
As we will see in the examples below, sometimes the explanatory or response variables have more than two possible options.
In that setting, a difference across two groups is not sufficient, and the proportion of "success" is not well defined if there are 3 or 4 or more possible response levels.
The primary way to summarize categorical data where the explanatory and response variables both have 2 or more levels is through a two-way table as in @tbl-ipod-ask-data-summary.
Note that with two-way tables, there is not an obvious single parameter of interest.
Instead, research questions usually focus on how the proportions of the response variable change (or not) across the different levels of the explanatory variable.
Because there is not a population parameter to estimate, bootstrapping to find the standard error of the estimate is not meaningful.
As such, for two-way tables, we will focus on the randomization test and corresponding mathematical approximation (and not bootstrapping).
:::
## Randomization test of independence
We all buy used products -- cars, computers, textbooks, and so on -- and we sometimes assume the sellers of those products will be forthright about any underlying problems with what they're selling.
This is not something we should take for granted.
Researchers recruited 219 participants in a study where they would sell a used iPod[^18-inference-tables-1] that was known to have frozen twice in the past.
The participants were incentivized to get as much money as they could for the iPod since they would receive a 5% cut of the sale on top of \$10 for participating.
The researchers wanted to understand what types of questions would elicit the seller to disclose the freezing issue.
[^18-inference-tables-1]: For readers not as old as the authors, an iPod is basically an iPhone without any cellular service, assuming it was one of the later generations.
Earlier generations were more basic.
\clearpage
Unbeknownst to the participants who were the sellers in the study, the buyers were collaborating with the researchers to evaluate the influence of different questions on the likelihood of getting the sellers to disclose the past issues with the iPod.
The scripted buyers started with "Okay, I guess I'm supposed to go first. So you've had the iPod for 2 years ..." and ended with one of three questions:
- General: What can you tell me about it?
- Positive Assumption: It does not have any problems, does it?
- Negative Assumption: What problems does it have?
The question is the treatment given to the sellers, and the response is whether the question prompted them to disclose the freezing issue with the iPod.
The results are shown in @tbl-ipod-ask-data-summary, and the data suggest that asking the *What problems does it have?* question was the most effective at getting the seller to disclose the past freezing issues.
However, you should also be asking yourself: could we see these results due to chance alone if there really is no difference in the question asked, or is this in fact evidence that some questions are more effective for getting at the truth?
```{r}
#| label: ask-data-prep
# Recode the response and question type with reader-friendly labels
ask <- ask |>
  mutate(
    response = if_else(response == "disclose", "Disclose problem", "Hide problem"),
    question_class = case_when(
      question_class == "general" ~ "General",
      question_class == "neg_assumption" ~ "Negative assumption",
      question_class == "pos_assumption" ~ "Positive assumption"
    ),
    question_class = fct_relevel(question_class, "General", "Positive assumption", "Negative assumption")
  )
```
```{r}
#| label: tbl-ipod-ask-data-summary
#| tbl-cap: |
#|   Summary of the iPod study, where a question was posed to the study
#|   participant who acted as the seller.
#| tbl-pos: H
ask |>
count(question_class, response) |>
pivot_wider(names_from = response, values_from = n) |>
adorn_totals(where = c("row", "col")) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("Question", "Disclose problem", "Hide problem", "Total")
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
)
```
::: {.data data-latex=""}
The [`ask`](http://openintrostat.github.io/openintro/reference/ask.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
:::
The hypothesis test for the iPod experiment is really about assessing whether there is convincing evidence that there was a difference in the success rates that each question had on getting the participant to disclose the problem with the iPod.
In other words, the goal is to check whether the buyer's question was independent\index{independence} of whether the seller disclosed a problem.
```{r}
#| include: false
terms_chp_18 <- c("independence")
```
### Expected counts in two-way tables
While we would not expect the number of disclosures to be exactly the same across the three question classes, the rate of disclosure seems substantially different across the three groups.
In order to investigate whether the differences in rates are due to natural variability in people's honesty or due to a treatment effect (i.e., the question causing the differences), we need to compute estimated counts for each cell in a two-way table.
::: {.workedexample data-latex=""}
From the experiment, we can compute the proportion of all sellers who disclosed the freezing problem as $61/219 = 0.2785.$ If there really is no difference among the questions and 27.85% of sellers were going to disclose the freezing problem no matter the question they were asked, how many of the 73 people in the `General` group would we have expected to disclose the freezing problem?
------------------------------------------------------------------------
We would predict that $0.2785 \times 73 = 20.33$ sellers would disclose the problem.
Obviously we observed fewer than this, though it is not yet clear if that is due to chance variation or whether that is because the questions vary in how effective they are at getting to the truth.
:::
::: {.guidedpractice data-latex=""}
If the questions were actually equally effective, meaning about 27.85% of respondents would disclose the freezing issue regardless of what question they were asked, about how many sellers would we expect to *hide* the freezing problem from the Positive Assumption group?[^18-inference-tables-2]
:::
[^18-inference-tables-2]: We would expect $(1 - 0.2785) \times 73 = 52.67.$ It is okay that this result, like the result from the Example above, is a fraction.
Using the same strategy employed in the previous Example and Guided Practice, we can compute the expected number of sellers in each group who would disclose or hide the freezing issue if the questions had no impact on what they disclosed.
These expected counts were used to construct @tbl-ipod-ask-data-summary-expected, which is the same as @tbl-ipod-ask-data-summary, except now the expected counts have been added in parentheses.
```{r}
#| label: tbl-ipod-ask-data-summary-expected
#| tbl-cap: The observed counts and the expected counts for the iPod experiment.
#| tbl-pos: H
ask_chi_sq <- chisq.test(ask$response, ask$question_class)
ask_chi_sq_obs <- ask_chi_sq$observed |>
as_tibble() |>
mutate(type = "observed")
ask_chi_sq_exp <- ask_chi_sq$expected |>
as.table() |>
as_tibble() |>
mutate(type = "expected")
ask_chi_sq_tabs <- bind_rows(ask_chi_sq_obs, ask_chi_sq_exp) |>
rename_with(.fn = str_remove, .cols = everything(), "ask\\$")
ask_chi_sq_tabs |>
mutate(response_type = paste0(response, "-", type)) |>
select(-response, -type) |>
pivot_wider(names_from = response_type, values_from = n) |>
relocate(
question_class,
contains("Disclose"),
contains("Hide")
) |>
mutate(across(contains("expected"), ~ paste0("(", round(.x, 2), ")"))) |>
rowwise() |>
mutate(Total = sum(c_across(contains("observed")))) |>
adorn_totals(where = "row") |>
mutate(
`Disclose problem-expected` = ifelse(`Disclose problem-expected` == "-", NA, `Disclose problem-expected`),
`Hide problem-expected` = ifelse(`Hide problem-expected` == "-", NA, `Hide problem-expected`)
) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("", "", "", "", "", "")
) |>
column_spec(1, width = "15em") |>
column_spec(3, color = IMSCOL["blue", "full"], italic = TRUE) |>
column_spec(5, color = IMSCOL["blue", "full"], italic = TRUE) |>
column_spec(6, width = "5em") |>
add_header_above(c(" ", "Disclose problem" = 2, "Hide problem" = 2, "Total")) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
)
```
The examples and exercises above provided some help in computing expected counts.
In general, expected counts for a two-way table may be computed using the row totals, column totals, and the table total.
For instance, if there was no difference between the groups, then about 27.85% of each row should be in the first column:
$$
\begin{aligned}
0.2785\times (\text{row 1 total}) &= 20.33 \\
0.2785\times (\text{row 2 total}) &= 20.33 \\
0.2785\times (\text{row 3 total}) &= 20.33
\end{aligned}
$$
Looking back to how 0.2785 was computed -- as the fraction of sellers who disclosed the freezing issue $(61/219)$ -- these three expected counts could have been computed as
$$
\begin{aligned}
\left(\frac{\text{row 1 total}}{\text{table total}}\right)
\text{(column 1 total)} &= 20.33 \\
\left(\frac{\text{row 2 total}}{\text{table total}}\right)
\text{(column 1 total)} &= 20.33 \\
\left(\frac{\text{row 3 total}}{\text{table total}}\right)
\text{(column 1 total)} &= 20.33
\end{aligned}
$$
This leads us to a general formula for computing expected counts in a two-way table when we would like to test whether there is strong evidence of an association between the column variable and row variable.
::: {.important data-latex=""}
**Computing expected counts in a two-way table.**
\index{expected counts}
To calculate the expected count for the $i^{th}$ row and $j^{th}$ column, compute
$$\text{Expected Count}_{\text{row }i,\text{ col }j} = \frac{(\text{row $i$ total}) \times (\text{column $j$ total})}{\text{table total}}$$
:::
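To see the formula in action, here is a minimal sketch in base R; the `observed` matrix below is typed in by hand from @tbl-ipod-ask-data-summary for illustration, and is not how the chunks in this chapter compute expected counts.

```{r}
#| echo: true
# Observed counts from the iPod table: rows are the three questions
# (General, Positive assumption, Negative assumption); columns are
# Disclose problem and Hide problem.
observed <- matrix(c(2, 71, 23, 50, 36, 37), nrow = 3, byrow = TRUE)

# Expected count for each cell: (row total * column total) / table total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected
# Every "Disclose" cell is 20.33 and every "Hide" cell is 52.67.
```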
```{r}
#| include: false
terms_chp_18 <- c(terms_chp_18, "expected counts")
```
### The observed chi-squared statistic
The chi-squared test statistic for a two-way table is found by computing, for every cell in the table, how far the observed count is from the expected count, relative to the size of the expected count.
For each table count, compute:
$$
\begin{aligned}
&\text{General formula} &&
\frac{(\text{observed count } - \text{expected count})^2}
{\text{expected count}} \\
&\text{Row 1, Col 1} &&
\frac{(2 - 20.33)^2}{20.33} = 16.53 \\
&\text{Row 2, Col 1} &&
\frac{(23 - 20.33)^2}{20.33} = 0.35 \\
& \hspace{9mm}\vdots &&
\hspace{13mm}\vdots \\
&\text{Row 3, Col 2} &&
\frac{(37 - 52.67)^2}{52.67} = 4.66
\end{aligned}
$$
Adding the computed value for each cell gives the chi-squared test statistic $X^2:$
$$X^2 = 16.53 + 0.35 + \dots + 4.66 = 40.13$$
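Continuing the hand-typed sketch from earlier (reusing the `observed` and `expected` matrices defined above), the same sum can be computed in one line:

```{r}
#| echo: true
# Sum the (observed - expected)^2 / expected contributions over all cells.
sum((observed - expected)^2 / expected)
# Approximately 40.13, matching the hand calculation above.
```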
Is 40.13 a big number?
That is, does it indicate that the observed and expected values are really different?
Or is 40.13 a value of the statistic that we would expect to see just due to natural variability?
Previously, we applied the randomization test to the setting where the research question investigated a difference in proportions.
The same idea of shuffling the data under the null hypothesis can be used in the setting of the two-way table.
### Variability of the statistic
Assuming that the individuals would disclose or hide the problems **regardless** of the question they are given (i.e., that the null hypothesis is true), we can randomize the data by reassigning the 61 disclosed problems and 158 hidden problems to the three groups at random.
@tbl-ipod-ask-data-summary-rand shows a possible randomization of the observed data under the condition that the null hypothesis is true (in contrast to the original observed data in @tbl-ipod-ask-data-summary).
```{r}
#| label: tbl-ipod-ask-data-summary-rand
#| tbl-cap: |
#| Summary of the iPod study.
#| tbl-pos: H
set.seed(4747)
# randomize
ask_rand <- ask |>
mutate(question_class = sample(question_class))
ask_rand |>
count(question_class, response) |>
pivot_wider(names_from = response, values_from = n) |>
adorn_totals(where = c("row", "col")) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("Question", "Disclose problem", "Hide problem", "Total")
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
)
```
As before, the randomized data are used to find a single value for the test statistic (here a chi-squared statistic).
The chi-squared statistic for the randomized two-way table is found by comparing the observed and expected counts for each cell in the *randomized* table.
For each cell, compute:
$$
\begin{aligned}
&\text{General formula} &&
\frac{(\text{observed count } - \text{expected count})^2}
{\text{expected count}} \\
&\text{Row 1, Col 1} &&
\frac{(29 - 20.33)^2}{20.33} = 3.7 \\
&\text{Row 2, Col 1} &&
\frac{(15 - 20.33)^2}{20.33} = 1.4 \\
& \hspace{9mm}\vdots &&
\hspace{13mm}\vdots \\
&\text{Row 3, Col 2} &&
\frac{(56 - 52.67)^2}{52.67} = 0.211
\end{aligned}
$$
Adding the computed value for each cell gives the chi-squared test statistic $X^2:$ \index{chi-squared statistic}
$$X^2 = 3.7 + 1.4 + \dots + 0.211 = 8$$
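If you would rather not sum the cells by hand, the statistic for the shuffled data can also be pulled from `chisq.test()`; a sketch reusing the `ask_rand` data frame created in the chunk above (no continuity correction is involved, since the table is larger than 2-by-2):

```{r}
#| echo: true
# Chi-squared statistic for the single randomized dataset.
chisq.test(ask_rand$question_class, ask_rand$response)$statistic
# Approximately 8 for the shuffle shown above.
```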
```{r}
#| include: false
terms_chp_18 <- c(terms_chp_18, "chi-squared statistic")
```
### Observed statistic vs. null chi-squared statistics
As before, one randomization will not be sufficient for understanding whether the observed statistic is unusual relative to the chi-squared statistics we would expect when $H_0$ is true.
To investigate whether 40.13 is large enough to indicate the observed and expected counts are substantially different, we need to understand the variability in the values of the chi-squared statistic we would expect to see if the null hypothesis was true.
@fig-ipodRandDotPlot plots 1,000 chi-squared statistics generated under the null hypothesis.
We can see that the observed value is so far from the null statistics that the simulated p-value is zero.
That is, the probability of seeing a statistic as large as the observed one when the null hypothesis is true is virtually zero.
In this case we can conclude that the decision of whether to disclose the iPod's problem is changed by the question asked.
We use the causal language of "changed" because the study was an experiment.
Note that with a chi-squared test, we only know that the two variables (`question_class` and `response`) are related (i.e., not independent).
We are not able to claim which type of question causes which type of response.
```{r}
#| label: fig-ipodRandDotPlot
#| fig-cap: |
#|   A histogram of chi-squared statistics from 1,000 simulations produced under
#|   the null hypothesis, $H_0,$ where the question is independent of the response. The
#|   observed statistic of 40.13 is marked by the red line. None of the 1,000 simulations
#|   had a chi-squared value of at least 40.13. In fact, none of the simulated chi-squared
#|   statistics came anywhere close to the observed statistic!
#| fig-alt: |
#|   A histogram of chi-squared statistics from 1,000 simulations produced under
#|   the null hypothesis, where the question is independent of the response. The
#|   observed statistic of 40.13 is marked by the red line. None of the 1,000 simulations
#|   had a chi-squared value of at least 40.13.
#| fig-asp: 0.6
set.seed(4747)
ask_rand_obs <- ask |>
specify(response ~ question_class) |>
calculate(stat = "Chisq") |>
pull()
ask_rand_dist <- ask |>
specify(response ~ question_class) |>
hypothesise(null = "independence") |>
generate(reps = 1000, type = "permute") |>
calculate(stat = "Chisq")
ggplot(ask_rand_dist, aes(x = stat)) +
geom_histogram(binwidth = 1) +
geom_vline(xintercept = ask_rand_obs, col = "red", lwd = 1.5) +
expand_limits(x = 40.13) +
labs(
x = "Chi-squared statistics assuming a true null hypothesis",
y = "Count"
)
```
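To turn the comparison in @fig-ipodRandDotPlot into a number, the **infer** pipeline from the chunk above can be extended with `get_p_value()`; a sketch reusing the `ask_rand_dist` and `ask_rand_obs` objects defined there:

```{r}
#| echo: true
# Proportion of the 1,000 null statistics at least as large as 40.13.
ask_rand_dist |>
  get_p_value(obs_stat = ask_rand_obs, direction = "greater")
# Returns 0 here: none of the simulated statistics reached the observed value.
```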
## Mathematical model for test of independence {#sec-mathchisq}
### The chi-squared test of independence
Previously, in @sec-math-2prop, we applied the Central Limit Theorem to the sampling variability of $\hat{p}_1 - \hat{p}_2.$ The result was that we could use the normal distribution (e.g., $z^*$ values (see @fig-choosingZForCI) and p-values from $Z$ scores) to complete the mathematical inferential procedure.
The chi-squared test statistic has a different mathematical distribution called the Chi-squared distribution.
The important specification to make in describing the chi-squared distribution is something called degrees of freedom.
The degrees of freedom change the shape of the chi-squared distribution to fit the problem at hand.
@fig-chisqDistDF visualizes different chi-squared distributions corresponding to different degrees of freedom.
```{r}
#| label: fig-chisqDistDF
#| fig-cap: |
#| The chi-squared distribution for differing degrees of freedom. The larger
#| the degrees of freedom, the longer the right tail extends. The smaller the degrees
#| of freedom, the more peaked the mode on the left becomes.
#| fig-alt: |
#| The chi-squared distribution for differing degrees of freedom. The larger
#| the degrees of freedom, the longer the right tail extends. The smaller the degrees
#| of freedom, the more peaked the mode on the left becomes.
#| fig-asp: 0.5
x <- c(0, seq(0.0000001, 40, 0.05))
DF <- c(2.0000001, 4, 9)
y <- list()
for (i in 1:length(DF)) {
y[[i]] <- dchisq(x, DF[i])
}
par(mar = c(2, 0, 0, 0))
plot(0, 0,
type = "n",
xlim = c(0, 25),
ylim = range(c(y, recursive = TRUE)),
axes = FALSE,
xlab = "",
ylab = ""
)
for (i in 1:length(DF)) {
lines(x, y[[i]],
lty = i,
col = IMSCOL[ifelse(i == 3, 4, i)],
lwd = 1.5 + i / 2
)
}
abline(h = 0)
axis(1)
legend("topright",
lwd = 0.3 + 1:4 / 1.25,
col = IMSCOL[c(1, 2, 4)],
lty = 1:4,
legend = paste(round(DF)),
title = "Degrees of Freedom",
cex = 1
)
```
### Variability of the chi-squared statistic
As it turns out, the chi-squared test statistic follows a **Chi-squared distribution**\index{Chi-squared distribution} when the null hypothesis is true.
For two-way tables, the degrees of freedom are equal to: $df = \text{(number of rows minus 1)}\times \text{(number of columns minus 1)}$.
In our example, the degrees of freedom parameter is $df = (2-1)\times (3-1) = 2$.
```{r}
#| include: false
terms_chp_18 <- c(terms_chp_18, "Chi-squared distribution")
```
### Observed statistic vs. null chi-squared statistics
::: {.important data-latex=""}
**The test statistic for assessing the independence between two categorical variables is the** $X^2$ **statistic.**
The $X^2$ statistic is a ratio of how the observed counts vary from the expected counts as compared to the expected counts (which are a measure of how large the sample size is).
$$X^2 = \sum_{i,j} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
When the null hypothesis is true and the conditions are met, $X^2$ has a Chi-squared distribution with $df = (r-1) \times (c-1).$
Conditions:
- Independent observations
- Large samples: at least 5 expected counts in each cell
:::
To bring it back to the example, we can safely assume that the observations are independent, as the question groups were randomly assigned.
Additionally, there are over 5 expected counts in each cell, so the conditions for using the chi-squared distribution are met.
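The expected counts used to check the large sample condition are available from the `ask_chi_sq` object created with `chisq.test()` in an earlier chunk; a quick check:

```{r}
#| echo: true
# Smallest expected count across the six cells; well above 5.
min(ask_chi_sq$expected)
```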
If the null hypothesis is true (i.e., the questions had no impact on the sellers in the experiment), then the test statistic $X^2 = 40.13$ is expected to follow a Chi-squared distribution with 2 degrees of freedom.
Using this information, we can compute the p-value for the test, which is depicted in @fig-iPodChiSqTail.
::: {.important data-latex=""}
**Computing degrees of freedom for a two-way table.**
\index{degrees of freedom!chi-squared test}
When applying the chi-squared test to a two-way table, we use $df = (R-1)\times (C-1)$ where $R$ is the number of rows in the table and $C$ is the number of columns.
:::
```{r}
#| label: fig-iPodChiSqTail
#| fig-cap: Visualization of the p-value for $X^2 = 40.13$ when $df = 2$.
#| fig-alt: |
#| Chi-square distribution (with df = 2) curve, shaded for p-value for
#| X2 = 40.13. The p-value is so small that it is not visible on the plot.
#| fig-asp: 0.5
par(mar = c(2, 0, 0, 0))
x <- 40.13
ChiSquareTail(
x, 2,
c(0, 50),
col = IMSCOL["blue", "full"]
)
text(x, 0, "Tail area (1 / 500 million)\nis too small to see", pos = 3)
lines(c(x, 1000 * x), rep(0, 2), col = IMSCOL["blue", "full"], lwd = 3)
```
The software R can be used to find the p-value with the function `pchisq()`.
Just like `pnorm()`, `pchisq()` always gives the area to the left of the cutoff value.
Because, in this example, the p-value is represented by the area to the right of 40.13, we subtract the output of `pchisq()` from 1.
```{r}
#| echo: true
1 - pchisq(40.13, df = 2)
```
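Equivalently, `pchisq()` can return the upper tail area directly via its `lower.tail` argument, which avoids the subtraction:

```{r}
#| echo: true
pchisq(40.13, df = 2, lower.tail = FALSE)
```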
::: {.workedexample data-latex=""}
Find the p-value and draw a conclusion about whether the question affects the seller's likelihood of reporting the freezing problem.
------------------------------------------------------------------------
Using a computer, we can compute a very precise value for the tail area above $X^2 = 40.13$ for a chi-squared distribution with 2 degrees of freedom: 0.000000002.
Using a discernibility level of $\alpha=0.05,$ the null hypothesis is rejected since the p-value is smaller.
That is, the data provide convincing evidence that the question asked did affect a seller's likelihood to tell the truth about problems with the iPod.
:::
::: {.workedexample data-latex=""}
@tbl-diabetes2ExpMetRosiLifestyleSummary summarizes the results of an experiment evaluating three treatments for Type 2 Diabetes in patients aged 10-17 who were being treated with metformin.
The three treatments considered were continued treatment with metformin (`met`), treatment with metformin combined with rosiglitazone (`rosi`), or a `lifestyle` intervention program.
Each patient had a primary outcome: either they lacked glycemic control (failure) or they did not lack that control (success).
What are appropriate hypotheses for this test?
------------------------------------------------------------------------
- $H_0:$ There is no difference in the effectiveness of the three treatments.
- $H_A:$ There is some difference in effectiveness between the three treatments, e.g., perhaps the `rosi` treatment performed better than `lifestyle`.
:::
```{r}
#| label: tbl-diabetes2ExpMetRosiLifestyleSummary
#| tbl-cap: Results for the Type 2 Diabetes study.
#| tbl-pos: H
diabetes2 |>
count(outcome, treatment) |>
pivot_wider(names_from = outcome, values_from = n) |>
adorn_totals(where = c("row", "col")) |>
kbl(
linesep = "", booktabs = TRUE,
col.names = c("Treatment", "Failure", "Success", "Total")
) |>
kable_styling(
bootstrap_options = c("striped", "condensed"),
latex_options = c("striped"), full_width = FALSE
) |>
column_spec(1:4, width = "5em")
```
::: {.data data-latex=""}
The [`diabetes2`](http://openintrostat.github.io/openintro/reference/diabetes2.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
:::
Typically we will use a computer to do the computational work of finding the chi-squared statistic.
However, it is always good to have a sense for what the computer is doing, and in particular, calculating the values which would be expected if the null hypothesis is true can help to understand the null hypothesis claim.
Additionally, comparing the expected and observed values by eye often gives the researcher some insight into why the null hypothesis for a given test is or is not rejected.
::: {.guidedpractice data-latex=""}
A chi-squared test for a two-way table may be used to test the hypotheses in the diabetes Example above.
To get a sense for the statistic used in the chi-squared test, first compute the expected values for each of the six table cells.[^18-inference-tables-3]
:::
[^18-inference-tables-3]: The expected count for row one / column one is found by multiplying the row one total (234) and column one total (319), then dividing by the table total (699): $\frac{234\times 319}{699} = 106.8.$ Similarly for the second column and the first row: $\frac{234\times 380}{699} = 127.2.$ Row 2: 105.9 and 126.1.
Row 3: 106.3 and 126.7.
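In practice, the expected counts (and the full test) can be obtained from `chisq.test()`; a sketch using the `diabetes2` data frame loaded above, with the same `treatment` and `outcome` columns as the table chunk:

```{r}
#| echo: true
diabetes2_chi_sq <- chisq.test(diabetes2$treatment, diabetes2$outcome)
# Expected counts under the null hypothesis, for comparison with the footnote.
diabetes2_chi_sq$expected
# Chi-squared statistic, degrees of freedom, and p-value.
diabetes2_chi_sq
```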
Note: when analyzing 2-by-2 contingency tables (that is, when both variables have only two possible options), one guideline is to use the two-proportion methods introduced in [Chapter -@sec-inference-two-props].
\clearpage
## Chapter review {#sec-chp18-review}
### Summary
In this chapter we extended the randomization / bootstrap / mathematical model paradigm to research questions involving categorical variables.
We continued working with one population proportion as well as the difference in population proportions, but the test of independence allowed for hypothesis testing on categorical variables with more than two levels.
We note that the normal model was an excellent mathematical approximation to the sampling distribution of sample proportions (or differences in sample proportions), but that questions involving categorical variables with more than two levels required a new mathematical model, the chi-squared distribution.
As seen in [Chapter -@sec-foundations-randomization], [Chapter -@sec-foundations-bootstrapping] and [Chapter -@sec-foundations-mathematical], almost all the research questions can be approached using computational methods (e.g., randomization tests or bootstrapping) or using mathematical models.
We continue to emphasize the importance of experimental design in making conclusions about research claims.
In particular, recall that variability can come from different sources (e.g., random sampling vs. random allocation, see @fig-randsampValloc).
### Terms
The terms introduced in this chapter are presented in @tbl-terms-chp-18.
If you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.
You should be able to easily spot them as **bolded text**.
```{r}
#| label: tbl-terms-chp-18
#| tbl-cap: Terms introduced in this chapter.
#| tbl-pos: H
make_terms_table(terms_chp_18)
```
\clearpage
## Exercises {#sec-chp18-exercises}
Answers to odd-numbered exercises can be found in [Appendix -@sec-exercise-solutions-18].
::: {.exercises data-latex=""}
{{< include exercises/_18-ex-inference-tables.qmd >}}
:::