# One-Way ANOVA
```{r include=FALSE}
load("data/oneway_data.RData")
```
## Introduction
Using the formula also used by @albers2018power, we can determine the means that should yield a specified effect size (expressed in Cohen's *f*). Eta-squared (identical to partial eta-squared for one-way ANOVAs) has benchmarks of .0099, .0588, and .1379 for small, medium, and large effect sizes [@cohen1988spa]. Although these benchmarks are quite arbitrary, and researchers should only use them for power analyses as a last resort, we will demonstrate an a priori power analysis for these values.
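As a quick reference (a small sketch, not part of the original analysis), these eta-squared benchmarks map onto Cohen's *f* benchmarks via *f* = sqrt(eta-squared / (1 - eta-squared)):
```{r}
# Cohen's eta-squared benchmarks converted to Cohen's f;
# these recover the familiar f benchmarks of .10, .25, and .40
eta2 <- c(small = .0099, medium = .0588, large = .1379)
round(sqrt(eta2 / (1 - eta2)), 2)
```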
### Two conditions
Imagine we aim to design a study to test the hypothesis that giving people a pet to take care of will increase their life satisfaction. We have a control condition and a condition where people get a pet, and we randomly assign participants to either condition. We can simulate a one-way ANOVA with a specified alpha level, sample size, and effect size to see the statistical power we would have for the ANOVA and the follow-up comparisons. We expect pets to increase life satisfaction compared to the control condition. Based on work by @pavot1993affective we believe that we can expect responses on the life-satisfaction scale to have a mean of approximately 24 in our population, with a standard deviation of 6.4. We expect that having a pet increases life satisfaction by approximately 2.2 scale points. We plan to collect data from 200 participants in total, with 100 participants in each condition. But before we proceed with the data collection, we examine the statistical power our design would have to detect the differences we predict.
```{r start_one_way}
string <- "2b"
n <- 100 # We are thinking of running 100 people in each condition
mu <- c(24, 26.2)
# Enter means in the order that matches the labels below:
# control, pet
sd <- 6.4
labelnames <- c("condition", "control", "pet")
# The label names should be in the order of the means specified above.

design_result <- ANOVA_design(design = string,
                              n = n,
                              mu = mu,
                              sd = sd,
                              labelnames = labelnames)

alpha_level <- 0.05
# You should think carefully about how to justify your alpha level.
# We will use 0.05 in most examples, but we do not endorse
# using a 0.05 alpha level for all analyses.
```
```{r eval=FALSE, echo=TRUE}
simulation_result <- ANOVA_power(design_result,
alpha_level = alpha_level,
nsims = nsims,
verbose = FALSE)
```
```{r echo = FALSE}
knitr::kable(simulation_result_2.1$main_results,
caption = "Simulated ANOVA Result") %>%
kable_styling(latex_options = "hold_position")
```
```{r}
exact_result <- ANOVA_exact(design_result,
alpha_level = alpha_level,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(exact_result$main_results,
caption = "Exact ANOVA Result") %>%
kable_styling(latex_options = "hold_position")
```
The result shows that we have exactly the same power for the ANOVA as for the *t*-test (note: in the simulation the values will differ slightly, but with a large number of simulations the power estimates converge). When there are only two groups these tests are mathematically identical: the ANOVA *F*-statistic is the square of the *t*-statistic. With 100 participants in each condition we would have quite low power (around 67.7%). For our example, Cohen's d (the standardized mean difference) is 2.2/6.4, or d = 0.34375, for the difference between the control and pet conditions, which we can use to easily compute the expected power for this simple comparison using the `pwr` package [@R-pwr].
```{r}
pwr.t.test(d = 2.2 / 6.4,
           n = 100,
           sig.level = 0.05,
           type = "two.sample",
           alternative = "two.sided")$power
```
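Incidentally, we can verify the equivalence of the two-group ANOVA and the *t*-test directly. The following is a minimal illustration with simulated (hypothetical) data, not the study data:
```{r}
# For two groups, the one-way ANOVA F statistic equals the squared
# t statistic (pooled-variance t-test)
set.seed(42)
group <- factor(rep(c("control", "pet"), each = 100))
dv <- rnorm(200, mean = ifelse(group == "control", 24, 26.2), sd = 6.4)
t_val <- t.test(dv ~ group, var.equal = TRUE)$statistic
F_val <- summary(aov(dv ~ group))[[1]][["F value"]][1]
c(t_squared = unname(t_val)^2, F = F_val)
```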
We can also directly compute Cohen's *f* from Cohen's d when there are two groups, as @cohen1988spa describes, because *f* = d/2 in that case. So *f* = 0.5 × 0.34375 = 0.171875. And indeed, a power analysis using the `pwr` package yields the same result with `pwr.anova.test` as with `pwr.t.test`.
```{r}
K <- 2
n <- 100
f <- 0.171875
pwr.anova.test(n = n,
               k = K,
               f = f,
               sig.level = alpha_level)$power
```
This analysis tells us that running the study with 100 participants in each condition leaves a substantial probability (around 32%) of *not* observing a significant test result, even if our expected pattern of differences is true. This is not optimal.
Let's mathematically explore which pattern of means we would need to expect to have 90% power for the ANOVA with 100 participants in each group. We can use the pwr package in R to compute a sensitivity analysis that tells us the effect size, in Cohen's *f*, that we are able to detect with 2 groups and 100 participants in each group, in order to achieve 90% power with an alpha level of 5%.
```{r}
K <- 2
n <- 100
sd <- 6.4

# Compute the effect size (Cohen's f) we can detect with 90% power
f <- pwr.anova.test(n = n,
                    k = K,
                    power = 0.9,
                    sig.level = alpha_level)$f
f
```
This sensitivity analysis shows we have 90% power in our planned design to detect effects of Cohen's *f* of 0.2303587. Benchmarks by @cohen1988spa for small, medium, and large Cohen's *f* values are 0.1, 0.25, and 0.4, which correspond to eta-squared values of small (.0099), medium (.0588), and large (.1379), in line with d = .2, .5, or .8. So, at least based on these benchmarks, we have 90% power to detect effects that are slightly below a medium effect benchmark.
```{r}
f2 <- f^2
ES <- f2 / (f2 + 1)
ES
```
Expressed in eta-squared, we can detect values of eta-squared = 0.05 or larger.
```{r}
mu <- mu_from_ES(K = K, ES = ES)
mu <- mu * sd
mu
```
We can compute a pattern of means, given a standard deviation of 6.4, that would give us an effect size of *f* = 0.23, or eta-squared of 0.05. We should be able to accomplish this if the means are -1.474295 and 1.474295.
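As a quick sanity check (a small sketch, not part of the original analysis): for two means c(-a, a), Cohen's *f* equals a divided by the common standard deviation, so the required spread is simply *f* × sd.
```{r}
# For two means c(-a, a), f = a / sd, so the spread that yields the
# target f is a = f * sd; this matches the mu_from_ES output above
f * sd
```
We can now use these means to confirm the ANOVA has 90% power.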
```{r}
design_result <- ANOVA_design(
design = string,
n = n,
mu = mu,
sd = sd,
labelnames = labelnames
)
```
```{r eval=FALSE, echo=TRUE}
simulation_result <- ANOVA_power(design_result,
alpha_level = alpha_level,
nsims = nsims,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(simulation_result_2.3$main_results,
caption = "Simulated ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
```{r}
exact_result <- ANOVA_exact(design_result,
alpha_level = alpha_level,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(exact_result$main_results,
caption = "Exact ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
The simulation confirms that for the *F*-test for the ANOVA we have 90% power. This is also what G\*Power tells us, based on a post-hoc power analysis with an *f* of 0.2303587, 2 groups, 200 participants in total (100 in each between-subjects condition), and an alpha of 5%.
![](screenshots/gpower_8.png)
If we return to our expected means, how many participants do we need for sufficient power? Given the expected difference and standard deviation, d = 0.34375 and *f* = 0.171875. An a priori power analysis for this simple case tells us we need 179 participants in each group (we cannot collect data from a fraction of a participant, so we always round the result of a power analysis upward), or 358 in total.
```{r}
K <- 2
power <- 0.9
f <- 0.171875
pwr.anova.test(power = power,
               k = K,
               f = f,
               sig.level = alpha_level)
```
If we re-run the simulation with this sample size, we indeed have 90% power.
```{r}
string <- "2b"
n <- 179
mu <- c(24, 26.2)
# Enter means in the order that matches the labels below:
# control, pet
sd <- 6.4
labelnames <- c("condition", "control", "pet")
# The label names should be in the order of the means specified above.

design_result <- ANOVA_design(design = string,
                              n = n,
                              mu = mu,
                              sd = sd,
                              labelnames = labelnames)

alpha_level <- 0.05
```
```{r eval=FALSE, echo=TRUE}
simulation_result <- ANOVA_power(design_result,
alpha_level = alpha_level,
nsims = nsims,
verbose = FALSE)
```
```{r echo = FALSE}
knitr::kable(simulation_result_2.5$main_results,
caption = "Simulated ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
```{r}
exact_result <- ANOVA_exact(design_result,
alpha_level = alpha_level,
verbose = FALSE)
```
```{r echo = FALSE}
knitr::kable(exact_result$main_results,
caption = "Exact ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
We stored the result from the power analysis in an object. This allows us to request plots (which are not printed automatically) showing the *p*-value distribution. If we request `simulation_result$plot1` we get the *p*-value distribution for the ANOVA:
```{r echo=FALSE}
simulation_result_2.5$plot1
```
If we request `simulation_result$plot2` we get the *p*-value distribution for the pairwise comparisons (in this case only one):
```{r echo=FALSE}
simulation_result_2.5$plot2
```
\newpage
## Part 2
### Three conditions
Imagine we aim to design a study to test the hypothesis that giving people a pet to take care of will increase their life satisfaction. We have a control condition, a 'cat' pet condition, and a 'dog' pet condition. We can simulate a one-way ANOVA with a specified alpha level, sample size, and effect size to see the statistical power we would have for the ANOVA and the follow-up comparisons. We expect all pets to increase life satisfaction compared to the control condition. Obviously, we also expect the people in the 'dog' pet condition to have even greater life satisfaction than people in the 'cat' pet condition. Based on work by @pavot1993affective we believe that we can expect responses on the life-satisfaction scale to have a mean of approximately 24 in our population, with a standard deviation of 6.4. We expect that having a pet increases life satisfaction by approximately 2.2 scale points for participants who get a cat, and by 2.6 scale points for participants who get a dog. We initially consider collecting data from 150 participants in total, with 50 participants in each condition. But before we proceed with the data collection, we examine the statistical power our design would have to detect the differences we predict.
```{r}
string <- "3b"
n <- 50 # We are thinking of running 50 people in each condition
mu <- c(24, 26.2, 26.6)
# Enter means in the order that matches the labels below:
# control, cat, dog
sd <- 6.4
labelnames <- c("condition", "control", "cat", "dog")
# The label names should be in the order of the means specified above.

design_result <- ANOVA_design(design = string,
                              n = n,
                              mu = mu,
                              sd = sd,
                              labelnames = labelnames)

alpha_level <- 0.05
```
```{r eval=FALSE, echo=TRUE}
# You should think carefully about how to justify your alpha level.
# We will give some examples later, but for now, use 0.05.
simulation_result <- ANOVA_power(design_result,
alpha_level = alpha_level,
nsims = nsims,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(simulation_result_2.7$main_results,
caption = "Simulated ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
```{r}
exact_result <- ANOVA_exact(design_result,
alpha_level = alpha_level,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(exact_result$main_results,
caption = "Exact ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
The result shows that we would have quite low power with 50 participants per condition: just around 50% power for the overall ANOVA, approximately 40% power for the control vs. cat comparison, around 50% for the control vs. dog comparison, and very low power (around 6%, just above the Type 1 error rate of 5%) for the expected difference between cats and dogs.
### Power for simple effects
We are typically not just interested in the ANOVA, but also in follow-up comparisons. In this case, we would compare the control condition against the cat and dog conditions, and the cat and dog conditions against each other, in independent *t*-tests.
For our example, Cohen's d (the standardized mean difference) is 2.2/6.4, or d = 0.34375, for the difference between the control condition and cats; 2.6/6.4, or d = 0.40625, for the difference between the control condition and dogs; and 0.4/6.4, or d = 0.0625, for the difference between cats and dogs as pets.
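As a compact check on this arithmetic (a small sketch), we can compute all three standardized differences at once:
```{r}
# Standardized mean differences for the three pairwise comparisons
c(control_vs_cat = 2.2 / 6.4,
  control_vs_dog = 2.6 / 6.4,
  cat_vs_dog     = 0.4 / 6.4)
```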
We can easily compute the expected power for these simple comparisons using the `pwr` package.
```{r}
pwr.t.test(
d = 2.2 / 6.4,
n = 50,
sig.level = 0.05,
type = "two.sample",
alternative = "two.sided"
)$power
pwr.t.test(
d = 2.6 / 6.4,
n = 50,
sig.level = 0.05,
type = "two.sample",
alternative = "two.sided"
)$power
pwr.t.test(
d = 0.4 / 6.4,
n = 50,
sig.level = 0.05,
type = "two.sample",
alternative = "two.sided"
)$power
```
This analysis tells us that running the study with 50 participants in each condition is more likely *not* to yield a significant test result than to yield one, even if our expected pattern of differences is true. This is not optimal.
Let's mathematically explore which pattern of means we would need to expect to have 90% power for the ANOVA with 50 participants in each group. We can use the `pwr` package in R to compute a sensitivity analysis that tells us the effect size, in Cohen's *f*, that we are able to detect with 3 groups and 50 participants in each group, in order to achieve 90% power with an alpha level of 5%.
```{r}
K <- 3
n <- 50
sd <- 6.4

# Compute the effect size (Cohen's f) we can detect with 90% power
f <- pwr.anova.test(n = n,
                    k = K,
                    power = 0.9,
                    sig.level = alpha_level)$f
f
```
This sensitivity analysis shows we have 90% power in our planned design to detect effects of Cohen's *f* of 0.2934417. Benchmarks by @cohen1988spa for small, medium, and large Cohen's *f* values are 0.1, 0.25, and 0.4, which correspond to eta-squared values of small (.0099), medium (.0588), and large (.1379), in line with d = .2, .5, or .8. So, at least based on these benchmarks, we have 90% power to detect effects that are somewhat sizeable.
```{r}
f2 <- f^2
ES <- f2 / (f2 + 1)
ES
```
Expressed in eta-squared, we can detect values of eta-squared = 0.0793 or larger.
```{r}
mu <- mu_from_ES(K = K, ES = ES)
mu <- mu * sd
mu
```
We can compute a pattern of means, given a standard deviation of 6.4, that would give us an effect size of *f* = 0.2934, or eta-squared of 0.0793. We should be able to accomplish this if the means are -2.300104, 0.000000, and 2.300104.
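Again as a sanity check (a small sketch, not part of the original analysis): for three equally spaced means c(-a, 0, a), Cohen's *f* reduces to a × sqrt(2/3) / sd, so the required spread is a = *f* × sd × sqrt(3/2).
```{r}
# For three equally spaced means c(-a, 0, a), f = a * sqrt(2/3) / sd;
# solving for a reproduces the mu_from_ES output above
f * sd * sqrt(3 / 2)
```
We can now use these means to confirm the ANOVA has 90% power.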
```{r}
design_result <- ANOVA_design(
design = string,
n = n,
mu = mu,
sd = sd,
labelnames = labelnames
)
```
```{r eval=FALSE, echo=TRUE}
simulation_result <- ANOVA_power(design_result,
alpha_level = alpha_level,
nsims = nsims,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(simulation_result_2.9$main_results,
caption = "Simulated ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
```{r}
exact_result <- ANOVA_exact(design_result,
alpha_level = alpha_level,
verbose = FALSE)
```
```{r echo = FALSE}
knitr::kable(exact_result$main_results,
caption = "Exact ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
The simulation confirms that for the *F*-test for the ANOVA we have 90% power. This is also what G\*Power tells us, based on a post-hoc power analysis with an *f* of 0.2934417, 3 groups, 150 participants in total (50 in each between-subjects condition), and an alpha of 5%.
![](screenshots/gpower_7.png)
We can also compute the power for the ANOVA and simple effects in `R` with the `pwr` package. The calculated effect sizes and power match those from the simulation.
```{r}
K <- 3
n <- 50
sd <- 6.4
f <- 0.2934417

pwr.anova.test(n = n,
               k = K,
               f = f,
               sig.level = alpha_level)$power

# Comparison of the middle group against either extreme group
d <- 2.300104 / 6.4
d
pwr.t.test(d = d,
           n = 50,
           sig.level = 0.05,
           type = "two.sample",
           alternative = "two.sided")$power

# Comparison of the two extreme groups
d <- 2 * 2.300104 / 6.4
d
pwr.t.test(d = d,
           n = 50,
           sig.level = 0.05,
           type = "two.sample",
           alternative = "two.sided")$power
```
We can also compare the results against the analytic solution by Aberson (2019), using the `anova1f_3` function from the `pwr2ppl` package, which computes power for a one-way ANOVA with three groups.
```{r}
library(pwr2ppl)

# Initial example, low power
anova1f_3(
m1 = 24,
m2 = 26.2,
m3 = 26.6,
s1 = 6.4,
s2 = 6.4,
s3 = 6.4,
n1 = 50,
n2 = 50,
n3 = 50,
alpha = .05
)
# From: Aberson, Christopher L.,
#   Applied Power Analysis for the Behavioral Sciences, 2nd Edition.
# $Power [1] 0.4769468

# Later example, based on the larger mean differences
anova1f_3(
m1 = -2.300104,
m2 = 0,
m3 = 2.300104,
s1 = 6.4,
s2 = 6.4,
s3 = 6.4,
n1 = 50,
n2 = 50,
n3 = 50,
alpha = .05
)
# $Power [1] 0.9000112
```
\newpage
## Effect Size Estimates for One-Way ANOVA
Using the formula from @albers2018power, we can determine the means that should yield a specified effect size (expressed in Cohen's *f*).
Eta-squared (identical to partial eta-squared for one-way ANOVAs) has benchmarks of .0099, .0588, and .1379 for small, medium, and large effect sizes [@cohen1988spa].
### Three conditions, small effect size
We can simulate a one-way ANOVA, setting the means to achieve a certain effect size. Eta-squared is biased: the eta-squared we calculate from the observed data overestimates the population effect size, and this bias is largest for small sample sizes. Thus, to test whether the simulation yields the expected effect size, we use extremely large sample sizes in each between-subjects condition (n = 5000). The simulation below should yield a small effect size (0.0099).
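To see the bias concretely before running the large simulation, here is a hypothetical demonstration (not part of the original analysis): even with no true group differences, observed eta-squared is on average well above zero in small samples.
```{r}
# With no true effect, eta-squared should be 0, but in small samples
# (3 groups of 10) the average observed eta-squared is clearly positive
set.seed(123)
eta2_null <- replicate(1000, {
  dv <- rnorm(30)
  g <- factor(rep(1:3, each = 10))
  ss <- summary(aov(dv ~ g))[[1]][["Sum Sq"]]
  ss[1] / sum(ss)
})
mean(eta2_null)
```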
```{r effect_size}
K <- 3
ES <- .0099
mu <- mu_from_ES(K = K, ES = ES)
n <- 5000
sd <- 1
r <- 0
string <- paste(K, "b", sep = "")
```
```{r, message=FALSE, warning=FALSE}
design_result <- ANOVA_design(
design = string,
n = n,
mu = mu,
sd = sd,
r = r,
labelnames = c("factor1", "level1", "level2", "level3")
)
```
```{r eval=FALSE, echo=TRUE}
simulation_result <- ANOVA_power(design_result,
alpha_level = alpha_level,
nsims = nsims,
verbose = FALSE)
```
```{r echo = FALSE}
knitr::kable(simulation_result_2.17$main_results,
caption = "Simulated ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
```{r}
exact_result <- ANOVA_exact(design_result,
alpha_level = alpha_level,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(exact_result$main_results,
caption = "Exact ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
The resulting effect size estimate from the simulation is very close to 0.0099.
### Four conditions, medium effect size
This simulation should yield a medium effect size (0.0588) across four independent conditions.
```{r}
K <- 4
ES <- .0588
mu <- mu_from_ES(K = K, ES = ES)
n <- 5000
sd <- 1
r <- 0
string <- paste(K, "b", sep = "")
```
```{r, message=FALSE, warning=FALSE}
design_result <- ANOVA_design(
design = string,
n = n,
mu = mu,
sd = sd,
r = r,
labelnames = c("factor1", "level1", "level2", "level3", "level4")
)
```
```{r eval=FALSE, echo=TRUE}
simulation_result <- ANOVA_power(design_result,
alpha_level = alpha_level,
nsims = nsims,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(simulation_result_2.19$main_results,
caption = "Simulated ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
```{r}
exact_result <- ANOVA_exact(design_result,
alpha_level = alpha_level,
verbose = FALSE)
```
```{r echo = FALSE}
knitr::kable(exact_result$main_results,
caption = "Exact ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
Results are very close to 0.0588.
### Two conditions, large effect size
We can simulate a one-way ANOVA that should yield a large effect size (0.1379) across two conditions.
```{r}
K <- 2
ES <- .1379
mu <- mu_from_ES(K = K, ES = ES)
n <- 5000
sd <- 1
r <- 0
string <- paste(K, "b", sep = "")
```
```{r, message=FALSE, warning=FALSE}
design_result <- ANOVA_design(design = string,
n = n,
mu = mu,
sd = sd,
r = r,
labelnames = c("factor1", "level1", "level2"))
```
```{r eval=FALSE, echo=TRUE}
simulation_result <- ANOVA_power(design_result,
alpha_level = alpha_level,
nsims = nsims,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(simulation_result_2.21$main_results,
caption = "Simulated ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
```{r}
exact_result <- ANOVA_exact(design_result,
alpha_level = alpha_level,
verbose = FALSE)
```
```{r echo=FALSE}
knitr::kable(exact_result$main_results,
caption = "Exact ANOVA Result")%>%
kable_styling(latex_options = "hold_position")
```
The results are very close to the expected large effect size (0.1379).