# Factor Analysis and Principal Component Analysis {#factor-analysis-PCA}
## Overview of Factor Analysis {#factorAnalysisOverview}
[Factor analysis](#factorAnalysis) is a class of latent variable models that is designed to identify the structure of a measure or set of measures and, ideally, a construct or set of constructs.\index{factor analysis}
It aims to identify the optimal latent structure for a group of variables.\index{factor analysis}
[Factor analysis](#factorAnalysis) encompasses two general types: [confirmatory factor analysis](#cfa) and [exploratory factor analysis](#efa).\index{factor analysis}\index{factor analysis!confirmatory}\index{factor analysis!exploratory}
[*Exploratory factor analysis*](#efa) (EFA) is a latent variable modeling approach that is used when the researcher has no a priori hypotheses about how a set of variables is structured.\index{factor analysis!exploratory}
[EFA](#efa) seeks to identify the empirically optimal-fitting model in ways that balance accuracy (i.e., variance accounted for) and parsimony (i.e., simplicity).\index{parsimony}\index{factor analysis!exploratory}\index{latent variable}
[*Confirmatory factor analysis*](#cfa) (CFA) is a latent variable modeling approach that is used when a researcher wants to evaluate how well a hypothesized model fits, and the model can be examined in comparison to alternative models.\index{factor analysis!confirmatory}\index{latent variable}
Using a [CFA](#cfa) approach, the researcher can pit models representing two theoretical frameworks against each other to see which better accounts for the observed data.\index{factor analysis!confirmatory}
[Factor analysis](#factorAnalysis) is considered to be a "pure" data-driven method for identifying the structure of the data, but the "truth" that we get depends heavily on the decisions we make regarding the parameters of our [factor analysis](#factorAnalysis) [@Floyd1995; @Sarstedt2024].\index{factor analysis!confirmatory}
The goal of [factor analysis](#factorAnalysis) is to identify simple, parsimonious factors that underlie the "junk" (i.e., scores filled with measurement error) that we observe.\index{parsimony}\index{factor analysis!confirmatory}
In the 1920s, Spearman developed [factor analysis](#factorAnalysis) to understand the factor structure of intelligence.\index{factor analysis!confirmatory}\index{intellectual assessment}
Because [factor analysis](#factorAnalysis) originally had to be computed by hand, it was a long process—it took Spearman around one year to calculate the first [factor analysis](#factorAnalysis)!\index{factor analysis!confirmatory}
Now, it is fast to compute a [factor analysis](#factorAnalysis) with computers (e.g., oftentimes less than 30 ms).\index{factor analysis!confirmatory}
[Factor analysis](#factorAnalysis) takes a high-dimensional data set and simplifies it into a smaller set of factors that are thought to reflect underlying constructs.\index{parsimony}\index{factor analysis!confirmatory}
If you believe that nature is simple underneath, [factor analysis](#factorAnalysis) gives nature a chance to display the simplicity that lives beneath the complexity on the surface.\index{parsimony}\index{factor analysis!confirmatory}
Spearman identified a single factor, *g*, that accounted for most of the covariation between the measures of intelligence.\index{factor analysis!confirmatory}\index{intelligence!\textit{g}}
[Factor analysis](#factorAnalysis) involves observed (manifest) variables and unobserved (latent) factors.\index{factor analysis}\index{latent variable}
In a [reflective model](#reflectiveConstruct), it is assumed that the latent factor influences the manifest variables, and the latent factor therefore reflects the common ([reliable](#reliability)) variance among the variables.\index{factor analysis}\index{latent variable}\index{construct!reflective}
A factor model potentially includes factor loadings, residuals (errors or disturbances), intercepts/means, covariances, and regression paths.\index{factor analysis}
A regression path indicates a hypothesis that one variable (or factor) influences another.\index{factor analysis}
The standardized regression coefficient represents the strength of association between the variables or factors.\index{factor analysis}\index{standardized regression coefficient}
A factor loading is a regression path from a latent factor to an observed (manifest) variable.\index{factor analysis}\index{structural equation modeling!factor loading}\index{latent variable}
The standardized factor loading represents the strength of association between the variable and the latent factor.\index{factor analysis}\index{structural equation modeling!factor loading}\index{latent variable}\index{standardized regression coefficient}
A residual is variance in a variable (or factor) that is unexplained by other variables or factors.\index{factor analysis}\index{structural equation modeling!residual}
An indicator's intercept is the expected value of the variable when the factor(s) (onto which it loads) is equal to zero.\index{factor analysis}\index{structural equation modeling!intercept}
Covariances are the associations between variables (or factors).\index{factor analysis}\index{structural equation modeling!covariance}
In factor analysis, the relation between an indicator ($\text{X}$) and its underlying latent factor(s) ($\text{F}$) can be represented with a regression formula as in Equation \@ref(eq:indicatorLatentAssociation):\index{factor analysis}\index{structural equation modeling!factor loading}\index{structural equation modeling!intercept}
\begin{equation}
\text{X} = \lambda \cdot \text{F} + \text{Item Intercept} + \text{Error Term}
(\#eq:indicatorLatentAssociation)
\end{equation}
where:
- $\text{X}$ is the observed value of the indicator
- $\lambda$ is the factor loading, indicating the strength of the association between the indicator and the latent factor(s)
- $\text{F}$ is the person's value on the latent factor(s)
- $\text{Item Intercept}$ represents the constant term that accounts for the expected value of the indicator when the latent factor(s) are zero
- $\text{Error Term}$ is the residual, indicating the extent of variance in the indicator that is not explained by the latent factor(s)
When the latent factors are uncorrelated, the (standardized) error term for an indicator is calculated as 1 minus the sum of squared standardized factor loadings for a given item (including cross-loadings).\index{factor analysis}\index{structural equation modeling!residual}\index{structural equation modeling!factor loading}\index{cross-loading}
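To make Equation \@ref(eq:indicatorLatentAssociation) concrete, below is a minimal sketch in R that simulates scores on a single (standardized) indicator from a latent factor; the loading and intercept values are arbitrary and chosen purely for illustration.\index{factor analysis}
```{r simulateIndicatorSketch, eval = FALSE}
set.seed(1234)

n <- 1000          # number of people
lambda <- .77      # factor loading (arbitrary illustrative value)
intercept <- 3     # item intercept (arbitrary illustrative value)

f <- rnorm(n, mean = 0, sd = 1)                       # latent factor scores
error <- rnorm(n, mean = 0, sd = sqrt(1 - lambda^2))  # residual for a standardized indicator

x <- lambda * f + intercept + error                   # observed indicator scores

cor(x, f)  # approximately equal to the standardized loading (.77)
```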
Another class of [factor analysis](#factorAnalysis) models are [higher-order](#higherOrderModel) (or hierarchical) factor models and [bifactor models](#bifactorModel).\index{factor analysis}\index{factor analysis!higher-order}\index{factor analysis!bifactor}
Guidelines in using [higher-order factor](#higherOrderModel) and [bifactor](#bifactorModel) models are discussed by @Markon2019.\index{factor analysis}\index{factor analysis!higher-order}\index{factor analysis!bifactor}
[Factor analysis](#factorAnalysis) is a powerful technique to help identify the factor structure that underlies a measure or construct.\index{factor analysis}
As discussed in Section \@ref(factorAnalysisDecisions), however, there are many decisions to make in [factor analysis](#factorAnalysis), in addition to questions about which variables to use, how to scale the variables, etc. [@Floyd1995; @Sarstedt2024].\index{factor analysis}
If the variables going into a [factor analysis](#factorAnalysis) are not well assessed, [factor analysis](#factorAnalysis) will not rescue the factor structure.\index{factor analysis}
In such situations, there is likely to be the problem of garbage in, garbage out.\index{factor analysis}
Factor analysis depends on the covariation among variables.\index{factor analysis}
Given the extensive [method variance](#methodBias) that measures have, [factor analysis](#factorAnalysis) (and [principal component analysis](#pca)) tends to extract method factors.\index{factor analysis}\index{principal component analysis}\index{method bias}\index{factor analysis!method factor}
Method factors are factors that are related to the methods being assessed rather than the construct of interest.\index{factor analysis}\index{method bias}\index{factor analysis!method factor}
However, [multitrait-multimethod](#MTMM) approaches to [factor analysis](#factorAnalysis) (such as in Section \@ref(mtmmCFA)) help better partition the variance in variables that reflects method variance versus construct variance, to get more accurate estimates of constructs.\index{factor analysis}\index{multitrait-multimethod matrix}
@Floyd1995 provide an overview of [factor analysis](#factorAnalysis) for the development of clinical assessments.\index{factor analysis}
### Example Factor Models from Correlation Matrices {#exampleFactorModelsFromCorrelationMatrices}
Below, I provide some example factor models from various correlation matrices.\index{factor analysis}
Analytical examples of [factor analysis](#factorAnalysis) are presented in Section \@ref(factorAnalysisExamples).\index{factor analysis}
Consider the example correlation matrix in Figure \@ref(fig:correlationMatrix1).\index{factor analysis}
Because all of the correlations are the same ($r = .60$), we expect there is approximately one factor for this pattern of data.\index{factor analysis}
```{r correlationMatrix1, out.width = "100%", fig.align = "center", fig.cap = "Example Correlation Matrix 1.", echo = FALSE}
knitr::include_graphics("./Images/correlationMatrix1.png")
```
In a single-factor model fit to these data, the factor loadings are .77 and the residual error terms are .40, as depicted in Figure \@ref(fig:factorAnalysis1).\index{factor analysis}
The amount of common variance ($R^2$) that is accounted for by an indicator is estimated as the square of the standardized loading: $.60 = .77 \times .77$.\index{factor analysis}\index{structural equation modeling!factor loading}
The amount of error for an indicator is estimated as: $\text{error} = 1 - \text{common variance}$, so in this case, the amount of error is: $.40 = 1 - .60$.\index{factor analysis}\index{structural equation modeling!residual}
The proportion of the total variance in indicators that is accounted for by the latent factor is the sum of the square of the standardized loadings divided by the number of indicators.\index{factor analysis}
That is, to calculate the proportion of the total variance in the variables that is accounted for by the latent factor, you would square the loadings, sum them up, and divide by the number of variables: $\frac{.77^2 + .77^2 + .77^2 + .77^2 + .77^2 + .77^2}{6} = \frac{.60 + .60 + .60 + .60 + .60 + .60}{6} = .60$.\index{factor analysis}
Thus, the latent factor accounts for 60% of the variance in the indicators.\index{factor analysis}
In this model, the latent factor explains the covariance among the variables.\index{factor analysis}
If the answer is simple, a small and parsimonious model should be able to obtain the answer.\index{parsimony}\index{factor analysis}
```{r factorAnalysis1, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Unidimensional Model.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-01.png")
```
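The variance calculations above can be reproduced with a few lines of R; this is a sketch that uses the standardized loading of .77 from Figure \@ref(fig:factorAnalysis1).\index{factor analysis}
```{r unidimensionalVarianceSketch, eval = FALSE}
loadings <- rep(.77, 6)   # standardized loadings of the six indicators

loadings^2                          # common variance (R^2) per indicator: ~.60
1 - loadings^2                      # residual (error) variance per indicator: ~.40
sum(loadings^2) / length(loadings)  # proportion of total variance explained: ~.60
```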
Consider a different correlation matrix in Figure \@ref(fig:correlationMatrix2).\index{factor analysis}
There is no common variance (correlations between the variables are zero), so there is no reason to believe there is a common factor that influences all of the variables.\index{factor analysis}
Variables that are not correlated cannot be related by a third variable, such as a common factor, so a common factor is not the right model.\index{factor analysis}
```{r correlationMatrix2, out.width = "100%", fig.align = "center", fig.cap = "Example Correlation Matrix 2.", echo = FALSE}
knitr::include_graphics("./Images/correlationMatrix2.png")
```
Consider another correlation matrix in Figure \@ref(fig:correlationMatrix3).\index{factor analysis}
```{r correlationMatrix3, out.width = "100%", fig.align = "center", fig.cap = "Example Correlation Matrix 3.", echo = FALSE}
knitr::include_graphics("./Images/correlationMatrix3.png")
```
If you try to fit a single factor to this correlation matrix, it generates a factor model depicted in Figure \@ref(fig:factorAnalysis2).\index{factor analysis}
In this model, the first three variables have a factor loading of .77, but the remaining three variables have a factor loading of zero.\index{factor analysis}
This indicates that the three remaining variables likely do not share a common factor with the first three variables.\index{factor analysis}
```{r factorAnalysis2, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Multidimensional Model.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-02.png")
```
Therefore, a one-factor model is probably not correct; instead, the structure of the data is probably best represented by a two-factor model, as depicted in Figure \@ref(fig:factorAnalysis3).\index{factor analysis}
In the model, Factor 1 explains why measures 1, 2, and 3 are correlated, whereas Factor 2 explains why measures 4, 5, and 6 are correlated.\index{factor analysis}
A two-factor model thus explains why measures 1, 2, and 3 are not correlated with measures 4, 5, and 6.\index{factor analysis}
In this model, each latent factor accounts for 60% of the variance in the indicators that load onto it: $\frac{.77^2 + .77^2 + .77^2}{3} = \frac{.60 + .60 + .60}{3} = .60$.\index{factor analysis}
Each latent factor accounts for 30% of the variance in all of the indicators: $\frac{.77^2 + .77^2 + .77^2 + 0^2 + 0^2 + 0^2}{6} = \frac{.60 + .60 + .60 + 0 + 0 + 0}{6} = .30$.\index{factor analysis}
```{r factorAnalysis3, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Two-Factor Model With Uncorrelated Factors.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-03.png")
```
Consider another correlation matrix in Figure \@ref(fig:correlationMatrix4).\index{factor analysis}
```{r correlationMatrix4, out.width = "100%", fig.align = "center", fig.cap = "Example Correlation Matrix 4.", echo = FALSE}
knitr::include_graphics("./Images/correlationMatrix4.png")
```
One way to model these data is depicted in Figure \@ref(fig:factorAnalysis4).\index{factor analysis}
In this model, the factor loadings are .77, the residual error terms are .40, and there is a covariance path of .50 for the association between Factor 1 and Factor 2.\index{factor analysis}
Going from the model to the correlation matrix is deterministic.\index{factor analysis}
If you know the model, you can calculate the correlation matrix.\index{factor analysis}
For instance, using path tracing rules (described in Section \@ref(ctt)), the correlation of measures within a factor in this model is calculated as: $0.60 = .77 \times .77$.\index{factor analysis}\index{path analysis!path tracing rules}
Using path tracing rules, the correlation of measures across factors in this model is calculated as: $.30 = .77 \times .50 \times .77$.\index{factor analysis}\index{path analysis!path tracing rules}
In this model, each latent factor accounts for 60% of the variance in the indicators that load onto it: $\frac{.77^2 + .77^2 + .77^2}{3} = \frac{.60 + .60 + .60}{3} = .60$.\index{factor analysis}
Each latent factor accounts for 37% of the variance in all of the indicators: $\frac{.77^2 + .77^2 + .77^2 + (.50^2 \times .77^2) + (.50^2 \times .77^2) + (.50^2 \times .77^2)}{6} = \frac{.60 + .60 + .60 + .15 + .15 + .15}{6} = .37$.\index{factor analysis}
```{r factorAnalysis4, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Two-Factor Model With Correlated Factors.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-04.png")
```
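As a sketch of how the model-implied correlation matrix follows deterministically from the model, the following R code reproduces the correlation matrix implied by the two-factor model with correlated factors in Figure \@ref(fig:factorAnalysis4), using the standardized loadings of .77 and the factor correlation of .50.\index{factor analysis}\index{path analysis!path tracing rules}
```{r impliedCorrelationSketch, eval = FALSE}
# Standardized factor loading matrix (6 indicators, 2 factors)
lambda <- matrix(c(.77,   0,
                   .77,   0,
                   .77,   0,
                     0, .77,
                     0, .77,
                     0, .77),
                 nrow = 6, byrow = TRUE)

# Factor correlation matrix (phi)
phi <- matrix(c( 1, .5,
                .5,  1),
              nrow = 2)

# Diagonal matrix of residual (unique) variances: 1 - communality per indicator
theta <- diag(1 - rowSums(lambda^2))

# Model-implied correlation matrix: Lambda Phi Lambda' + Theta
impliedCor <- lambda %*% phi %*% t(lambda) + theta

round(impliedCor, 2)  # ~.60 within factors, ~.30 across factors, 1s on the diagonal
```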
Although going from the model to the correlation matrix is deterministic, going from the correlation matrix to the model is not deterministic.\index{factor analysis}
If you know the correlation matrix, there may be many possible models.\index{factor analysis}
For instance, the model could also be the one depicted in Figure \@ref(fig:factorAnalysis5), with factor loadings of .77, residual error terms of .40, a regression path of .50, and a disturbance term of .75.\index{factor analysis}
The proportion of variance in Factor 2 that is explained by Factor 1 is calculated as: $.25 = .50 \times .50$.\index{factor analysis}
The disturbance term is calculated as $.75 = 1 - (.50 \times .50) = 1 - .25$.\index{factor analysis}
In this model, each latent factor accounts for 60% of the variance in the indicators that load onto it: $\frac{.77^2 + .77^2 + .77^2}{3} = \frac{.60 + .60 + .60}{3} = .60$.\index{factor analysis}
Factor 1 accounts for 15% of the variance in the indicators that load onto Factor 2: $\frac{(.50^2 \times .77^2) + (.50^2 \times .77^2) + (.50^2 \times .77^2)}{3} = \frac{.15 + .15 + .15}{3} = .15$.\index{factor analysis}
This model has the exact same fit as the previous model, but it has different implications.\index{factor analysis}
Unlike the previous model, in this model, there is a "causal" pathway from Factor 1 to Factor 2.\index{factor analysis}
However, the causal effect of Factor 1 does not account for all of the variance in Factor 2 because the correlation is only .50.\index{factor analysis}
```{r factorAnalysis5, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Two-Factor Model With Regression Path.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-05.png")
```
Alternatively, something else (e.g., another factor) could be explaining the data that we have not considered, as depicted in Figure \@ref(fig:factorAnalysis6).\index{factor analysis}
This is a [higher-order factor model](#higherOrderModel), in which there is a higher-order factor ($A_1$) that influences both lower-order factors, Factor 1 ($F_1$) and Factor 2 ($F_2$).\index{factor analysis}\index{factor analysis!higher-order}
The factor loadings from the lower order factors to the manifest variables are .77, the factor loading from the higher-order factor to the lower-order factors is .71, and the residual error terms are .40.\index{factor analysis}
This model has the exact same fit as the previous models.\index{factor analysis}
The proportion of variance in a lower-order factor ($F_1$ or $F_2$) that is explained by the higher-order factor ($A_1$) is calculated as: $.50 = .71 \times .71$.\index{factor analysis}
The disturbance term is calculated as $.50 = 1 - (.71 \times .71) = 1 - .50$.\index{factor analysis}
Using path tracing rules, the correlation of measures across factors in this model is calculated as: $.30 = .77 \times .71 \times .71 \times .77$.\index{factor analysis}\index{path analysis!path tracing rules}
In this model, the higher-order factor ($A_1$) accounts for 30% of the variance in the indicators: $\frac{(.77^2 \times .71^2) + (.77^2 \times .71^2) + (.77^2 \times .71^2) + (.77^2 \times .71^2) + (.77^2 \times .71^2) + (.77^2 \times .71^2)}{6} = \frac{.30 + .30 + .30 + .30 + .30 + .30}{6} = .30$.\index{factor analysis}
```{r factorAnalysis6, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Higher-Order Factor Model.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-06.png")
```
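Using path tracing, it can be verified in R that the higher-order model implies the same correlation matrix as the earlier models; this sketch uses the lower-order loadings of .77 and the higher-order loadings of .71 from Figure \@ref(fig:factorAnalysis6).\index{factor analysis}\index{factor analysis!higher-order}\index{path analysis!path tracing rules}
```{r higherOrderPathTracingSketch, eval = FALSE}
lambdaLower <- .77    # loading of each indicator on its lower-order factor
lambdaHigher <- .71   # loading of each lower-order factor on the higher-order factor

# Correlation between indicators loading on the same lower-order factor
lambdaLower * lambdaLower                                # ~.60

# Correlation between indicators loading on different lower-order factors
lambdaLower * lambdaHigher * lambdaHigher * lambdaLower  # ~.30

# Variance in each indicator explained by the higher-order factor
lambdaLower^2 * lambdaHigher^2                           # ~.30
```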
Alternatively, there could be a single factor that ties measures 1, 2, and 3 together and measures 4, 5, and 6 together, as depicted in Figure \@ref(fig:factorAnalysis7).\index{factor analysis}
In this model, the measures no longer have merely [random error](#randomError): measures 1, 2, and 3 have correlated residuals—that is, they share error variance (i.e., [systematic error](#systematicError)); likewise, measures 4, 5, and 6 have correlated residuals.\index{structural equation modeling!residual!correlated}\index{factor analysis}\index{measurement error!random error}\index{measurement error!systematic error}
This model has the exact same fit as the previous models.\index{factor analysis}
The amount of common variance ($R^2$) that is accounted for by an indicator is estimated as the square of the standardized loading: $.30 = .55 \times .55$.\index{factor analysis}\index{structural equation modeling!factor loading}
The amount of error for an indicator is estimated as: $\text{error} = 1 - \text{common variance}$, so in this case, the amount of error is: $.70 = 1 - .30$.\index{factor analysis}\index{structural equation modeling!residual}
Using path tracing rules, the correlation of measures within a factor in this model is calculated as: $.60 = (.55 \times .55) + (.70 \times .43 \times .70) + (.70 \times .43 \times .43 \times .70)$.\index{factor analysis}\index{path analysis!path tracing rules}
The correlation of measures across factors in this model is calculated as: $.30 = .55 \times .55$.\index{factor analysis}\index{path analysis!path tracing rules}
In this model, the latent factor accounts for 30% of the variance in the indicators: $\frac{.55^2 + .55^2 + .55^2 + .55^2 + .55^2 + .55^2}{6} = \frac{.30 + .30 + .30 + .30 + .30 + .30}{6} = .30$.\index{factor analysis}
```{r factorAnalysis7, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Unidimensional Model With Correlated Residuals.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-07.png")
```
Alternatively, there could be a single factor that influences measures 1, 2, 3, 4, 5, and 6 in addition to a [method bias](#methodBias) factor (e.g., a particular measurement method, item stem, reverse-worded item, or another [method bias](#methodBias)) that influences measures 4, 5, and 6 equally, as depicted in Figure \@ref(fig:factorAnalysis10).\index{factor analysis}\index{method bias}
In this model, measures 4, 5, and 6 have cross-loadings—that is, they load onto more than one latent factor.\index{factor analysis}\index{cross-loading}
This model has the exact same fit as the previous models.\index{factor analysis}
The amount of common variance ($R^2$) that is accounted for by an indicator is estimated as the sum of the squared standardized loadings: $.60 = .77 \times .77 = (.39 \times .39) + (.67 \times .67)$.\index{factor analysis}\index{structural equation modeling!factor loading}
The amount of error for an indicator is estimated as: $\text{error} = 1 - \text{common variance}$, so in this case, the amount of error is: $.40 = 1 - .60$.\index{factor analysis}\index{structural equation modeling!residual}
Using path tracing rules, the correlation of measures within a factor in this model is calculated as: $.60 = (.77 \times .77) = (.39 \times .39) + (.67 \times .67)$.\index{factor analysis}\index{path analysis!path tracing rules}
The correlation of measures across factors in this model is calculated as: $.30 = .77 \times .39$.\index{factor analysis}\index{path analysis!path tracing rules}
In this model, the first latent factor accounts for 37% of the variance in the indicators: $\frac{.77^2 + .77^2 + .77^2 + .39^2 + .39^2 + .39^2}{6} = \frac{.59 + .59 + .59 + .15 + .15 + .15}{6} = .37$.\index{factor analysis}
The second latent factor accounts for 45% of the variance in its indicators: $\frac{.67^2 + .67^2 + .67^2}{3} = \frac{.45 + .45 + .45}{3} = .45$.\index{factor analysis}
```{r factorAnalysis10, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Two-Factor Model With Cross-Loadings.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-10.png")
```
### Indeterminacy {#indeterminacy}
There could be many more models that have the same fit to the data.\index{factor analysis!indeterminacy}
Thus, [factor analysis](#factorAnalysis) has *indeterminacy* because all of these models can explain these same data equally well, with all having different theoretical meaning.\index{factor analysis!indeterminacy}
The goal of [factor analysis](#factorAnalysis) is to induce the model from the data.\index{factor analysis}
However, most data matrices in real life are very complicated—much more complicated than in these examples.\index{factor analysis}
This is why we do not calculate our own [factor analysis](#factorAnalysis) by hand; use a stats program!\index{factor analysis}
It is important to think about the possibility of other models to determine how confident you can be in your data model.\index{factor analysis}
For every fully specified factor model (i.e., one in which all of the relevant paths are defined), there is one and only one predicted data matrix (correlation matrix).\index{factor analysis}
However, each data matrix can produce many different factor models.\index{factor analysis!indeterminacy}
There is no way to distinguish which of these factor models is correct from the data matrix alone.\index{factor analysis!indeterminacy}
Any given data matrix is consistent with an infinite number of factor models that accurately represent the data structure [@Raykov2001a], so we make decisions that determine what type of factor solution our data will yield.\index{factor analysis!indeterminacy}
Many models could explain your data, and there are many more models that do not explain the data.\index{factor analysis!indeterminacy}
For equally good-fitting models, decide based on interpretability.\index{factor analysis}
If you have strong theory, decide based on theory and things outside of [factor analysis](#factorAnalysis)!\index{factor analysis}\index{theory}
### Practical Considerations {#practicalConsiderations-factorAnalysis}
There are important considerations for doing [factor analysis](#factorAnalysis) in real life with complex data.\index{factor analysis!practical considerations}
Traditionally, researchers had to consider what kind of data they have, and they often assumed interval-level data even though data in psychology are often not interval data.\index{factor analysis!practical considerations}\index{data!interval}
In the past, [factor analysis](#factorAnalysis) was not good with categorical and dichotomous (e.g., True/False) data because the variance then is largely determined by the mean.\index{factor analysis!practical considerations}\index{data!nominal}\index{data!dichotomous}
So, we need something more complicated for dichotomous data.\index{factor analysis!practical considerations}\index{data!nominal}\index{data!dichotomous}
More solutions are available now for [factor analysis](#factorAnalysis) with ordinal and dichotomous data, but it is generally best to have at least four ordered categories to perform [factor analysis](#factorAnalysis).\index{factor analysis!practical considerations}\index{data!nominal}\index{data!dichotomous}\index{data!ordinal}\index{data!polytomous}
The necessary sample size depends on the complexity of the true factor structure.\index{factor analysis!practical considerations}
If there is a strong single factor for 30 items, then $N = 50$ is plenty.\index{factor analysis!practical considerations}
But if there are five factors and some correlated errors, then the sample size will need to be closer to ~5,000.\index{structural equation modeling!residual!correlated}\index{factor analysis!practical considerations}
[Factor analysis](#factorAnalysis) can recover the truth when the world is simple.\index{parsimony}\index{factor analysis}
However, nature is often not simple, and it may end in the distortion of nature instead of nature itself.\index{factor analysis}
Recommendations for [factor analysis](#factorAnalysis) are described by @Sellbom2019a.\index{factor analysis}
### Decisions to Make in Factor Analysis {#factorAnalysisDecisions}
There are many decisions to make in [factor analysis](#factorAnalysis) [@Floyd1995; @Sarstedt2024].\index{factor analysis!decisions}
These decisions can have important impacts on the resulting solution.\index{factor analysis!decisions}
Decisions include things such as:\index{factor analysis!decisions}
1. What variables to include in the model and how to scale them\index{factor analysis!decisions}
1. Method of factor extraction: [factor analysis](#factorAnalysis) or [PCA](#pca)\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
1. If [factor analysis](#factorAnalysis), the kind of [factor analysis](#factorAnalysis): [EFA](#efa) or [CFA](#cfa)\index{factor analysis!decisions}\index{factor analysis!confirmatory}\index{factor analysis!exploratory}
1. How many factors to retain\index{factor analysis}
1. If [EFA](#efa) or [PCA](#pca), whether and how to rotate factors (factor rotation)\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{principal component analysis}
1. Model selection and interpretation\index{factor analysis!decisions}
#### 1. Variables to Include and their Scaling {#variablesToInclude-factorAnalysis}
The first decision when conducting a [factor analysis](#factorAnalysis) is which variables to include and the scaling of those variables.\index{factor analysis!decisions}
What factors (or components) you extract can differ widely depending on what variables you include in the analysis.\index{factor analysis!decisions}
For example, if you include many variables from the same source (e.g., self-report), it is possible that you will extract a factor that represents the common variance among the variables from that source (i.e., the self-reported variables).\index{factor analysis!decisions}\index{factor analysis!method factor}
This would be considered a method factor, which works against the goal of estimating latent factors that represent the constructs of interest (as opposed to the measurement methods used to estimate those constructs).\index{factor analysis!decisions}\index{factor analysis!method factor}
An additional consideration is the scaling of the variables—whether to use the raw scaling or whether to standardize them to be on a more common metric (e.g., z-score metric with a mean of zero and standard deviation of one).\index{factor analysis!decisions}
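For instance, if one chooses to standardize the variables before analysis, that can be done with `scale()` in R; `mydata` below is a hypothetical data frame containing the variables to be factor analyzed.\index{factor analysis!decisions}
```{r standardizeVariablesSketch, eval = FALSE}
# Standardize each variable to a z-score metric (mean = 0, SD = 1);
# "mydata" is a hypothetical data frame of the variables to include
mydataStandardized <- as.data.frame(scale(mydata))
```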
#### 2. Method of Factor Extraction {#methodOfFactorExtraction}
The second decision is to select the method of factor extraction.\index{factor analysis!decisions}
This is the algorithm that is going to try to identify factors.\index{factor analysis!decisions}
There are two main families of factor or component extraction: analytic approaches and principal components.\index{factor analysis!decisions}\index{principal component analysis}
The principal components approach is called [principal component analysis](#pca) (PCA).\index{factor analysis!decisions}\index{principal component analysis}
[PCA](#pca) is not really a form of [factor analysis](#factorAnalysis); rather, it is useful for data reduction [@Lilienfeld2015].\index{factor analysis!decisions}\index{principal component analysis}\index{data!reduction}
The analytic family includes [factor analysis](#factorAnalysis) approaches such as principal axis factoring and maximum likelihood [factor analysis](#factorAnalysis).\index{factor analysis!decisions}\index{factor analysis}
The distinction between [factor analysis](#factorAnalysis) and [PCA](#pca) is depicted in Figure \@ref(fig:factorAnalysisPCA).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
```{r factorAnalysisPCA, out.width = "100%", fig.align = "center", fig.cap = "Distinction Between Factor Analysis and Principal Component Analysis.", echo = FALSE}
knitr::include_graphics("./Images/factorAnalysisPCA.png")
```
##### Principal Component Analysis {#pca}
Principal component analysis (PCA) is used if you want to reduce your data matrix.\index{factor analysis!decisions}\index{principal component analysis}
PCA composites represent the variance of the observed measures in as economical a fashion as possible, without positing underlying latent variables.\index{factor analysis!decisions}\index{principal component analysis}
The goal of PCA is to identify a smaller number of components that explain as much variance in a set of variables as possible.\index{factor analysis!decisions}\index{principal component analysis}\index{data!reduction}
It is an atheoretical way to decompose a matrix.\index{factor analysis!decisions}\index{principal component analysis}
PCA involves decomposition of a data matrix into a set of eigenvectors, which are transformations of the old variables.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
The eigenvectors attempt to simplify the data in the matrix.\index{parsimony}\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
PCA takes the data matrix and identifies the weighted sum of all variables that does the best job at explaining variance: these are the principal components, also called eigenvectors.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
Principal components reflect optimally weighted sums.\index{factor analysis!decisions}\index{principal component analysis}
In this way, PCA is a [formative model](#formativeConstruct) (by contrast, [factor analysis](#factorAnalysis) applies a [reflective model](#reflectiveConstruct)).\index{factor analysis!decisions}\index{principal component analysis}\index{construct!formative}\index{factor analysis}\index{construct!reflective}
PCA decomposes the data matrix into any number of components—as many as the number of variables, which will always account for all variance.\index{factor analysis!decisions}\index{principal component analysis}
After the model is fit, you can look at the results and discard the components which likely reflect error variance.\index{factor analysis!decisions}\index{principal component analysis}
Judgments about which factors to retain are based on empirical criteria in conjunction with theory to select a parsimonious number of components that account for the majority of variance.\index{factor analysis!decisions}\index{principal component analysis}
The eigenvalue reflects the amount of variance explained by the component (eigenvector).\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvalue}
When using a varimax (orthogonal) rotation, an eigenvalue for a component is calculated as the sum of squared standardized component loadings on that component.\index{factor analysis!decisions}\index{principal component analysis}\index{orthogonal rotation}\index{eigenvalue}
When using oblique rotation, however, the items explain more variance than is attributable to their factor loadings because the factors are correlated.\index{factor analysis!decisions}\index{principal component analysis}\index{oblique rotation}
PCA pulls out the first principal component (i.e., the eigenvector that explains the most variance) and creates a new data matrix: i.e., a new (residual) correlation matrix.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
Then PCA pulls out the component that explains the next-most variance (i.e., the eigenvector with the next-largest eigenvalue), and it repeats this until the number of components equals the number of variables.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}\index{eigenvalue}
For instance, if there are six variables, it will iteratively extract additional components up to six components.\index{factor analysis!decisions}\index{principal component analysis}
You can extract as many eigenvectors as there are variables.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
If you extract all six components, the data matrix left over will be the same as the correlation matrix in Figure \@ref(fig:correlationMatrix2).\index{factor analysis!decisions}\index{principal component analysis}
That is, after all six components are extracted, the remaining (residual) correlations among the variables will be zero, because six components explain 100% of the variance in six variables.\index{factor analysis!decisions}\index{principal component analysis}
In other words, you can explain six variables with six new components!\index{factor analysis!decisions}\index{principal component analysis}
However, that does no good, because using all six components provides no data reduction from the original number of variables; the hope is that the first few components will explain most of the variance.\index{factor analysis!decisions}\index{principal component analysis}\index{data!reduction}
The sum of all eigenvalues is equal to the number of variables in the analysis.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvalue}
PCA does not have the same assumptions as [factor analysis](#factorAnalysis), which assumes that measures are partly from common variance and error.\index{factor analysis!decisions}\index{principal component analysis}\index{factor analysis}\index{measurement error}\index{structural equation modeling!common variance}
But if you estimate six eigenvectors and keep only two, the model is a two-component model, and whatever variance is left over becomes error.\index{factor analysis!decisions}\index{principal component analysis}\index{measurement error}\index{eigenvector}
Therefore, PCA does not have the same assumptions as [factor analysis](#factorAnalysis), but it often ends up in the same place.\index{factor analysis!decisions}\index{principal component analysis}\index{factor analysis}
Most people who want to conduct a [factor analysis](#factorAnalysis) use PCA, but PCA is not really [factor analysis](#factorAnalysis) [@Lilienfeld2015].\index{factor analysis!decisions}\index{principal component analysis}\index{factor analysis}
PCA is what SPSS can do quickly.\index{factor analysis!decisions}\index{principal component analysis}
But computers are so fast now—just do a real [factor analysis](#factorAnalysis)!\index{factor analysis!decisions}\index{factor analysis}
[Factor analysis](#factorAnalysis) better handles error than PCA—[factor analysis](#factorAnalysis) assumes that what is in the variable is the combination of common construct variance and error.\index{factor analysis!decisions}\index{principal component analysis}\index{factor analysis}
By contrast, PCA assumes that the measures have no measurement error.\index{factor analysis!decisions}\index{principal component analysis}\index{measurement error}
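As a sketch of the mechanics, the following R code performs an eigendecomposition of a hypothetical correlation matrix for six variables; the eigenvalues sum to the number of variables, and each component's proportion of variance is its eigenvalue divided by the number of variables.\index{principal component analysis}\index{eigenvalue}\index{eigenvector}
```{r pcaEigenSketch, eval = FALSE}
# Hypothetical correlation matrix: two blocks of three variables with r = .60
R <- diag(6)
R[1:3, 1:3] <- .60
R[4:6, 4:6] <- .60
diag(R) <- 1

pca <- eigen(R)

pca$values         # eigenvalues (variance explained by each component)
sum(pca$values)    # equals the number of variables (6)
pca$values / 6     # proportion of total variance explained by each component

# With raw data, a PCA could instead be run with prcomp() in base R
# or principal() from the "psych" package
```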
##### Factor Analysis {#factorAnalysis}
Factor analysis is an analytic approach to factor extraction.\index{factor analysis!decisions}\index{factor analysis}
Factor analysis is a special case of [structural equation modeling](#sem) (SEM).\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling}
Factor analysis is an analytic technique that is interested in the factor structure of a measure or set of measures.\index{factor analysis!decisions}\index{factor analysis}
Factor analysis is a theoretical approach that considers that there are latent theoretical constructs that influence the scores on particular variables.\index{factor analysis!decisions}\index{factor analysis}\index{latent variable}
It assumes that part of the explanation for each variable is shared between variables, and that part of it is unique variance.\index{factor analysis!decisions}\index{factor analysis}\index{measurement error}\index{structural equation modeling!common variance}
The unique variance is considered error.\index{factor analysis!decisions}\index{factor analysis}\index{measurement error}
The common variance is called the communality, which is the factor variance.\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling!common variance}\index{communality}
Communality of a factor is estimated using the [average variance extracted](#averageVarianceExtracted) (AVE).\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling!common variance}\index{communality}\index{reliability!internal consistency!average variance extracted}
The amount of variance due to error is: $1 - \text{communality}$.\index{factor analysis!decisions}\index{factor analysis}\index{measurement error}\index{structural equation modeling!common variance}\index{communality}
There are several types of factor analysis, including principal axis factoring and maximum likelihood factor analysis.\index{factor analysis!decisions}\index{factor analysis}
Factor analysis can be used to test [measurement/factorial invariance](#measurementInvariance) and for [multitrait-multimethod](#MTMM) designs.\index{factor analysis!decisions}\index{factor analysis}\index{multitrait-multimethod matrix}
One example of a [MTMM](#MTMM) model in factor analysis is the correlated traits correlated methods model [@Tackett2019b].\index{factor analysis!decisions}\index{factor analysis}\index{multitrait-multimethod matrix}
There are several differences between (real) factor analysis versus [PCA](#pca).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
Factor analysis has greater sophistication than [PCA](#pca), but greater sophistication often results in greater assumptions.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
Factor analysis does not always work; the data may not always fit a factor analysis model; therefore, [PCA](#pca) can be used as a second (or last) option.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
[PCA](#pca) can decompose any data matrix; it always works.\index{factor analysis!decisions}\index{principal component analysis}
[PCA](#pca) is okay if you are not interested in the factor structure.\index{factor analysis!decisions}\index{principal component analysis}
[PCA](#pca) uses all variance of variables and assumes variables have no error, so it does not account for measurement error.\index{factor analysis!decisions}\index{principal component analysis}\index{measurement error}
[PCA](#pca) is good if you just want to form a linear composite and if the causal structure is [formative](#formativeConstruct) (rather than [reflective](#reflectiveConstruct)).\index{factor analysis!decisions}\index{principal component analysis}\index{linear composite}\index{construct!formative}\index{construct!reflective}
However, if you are interested in the factor structure, use factor analysis, which estimates a latent variable that accounts for the common variance and discards error variance.\index{factor analysis!decisions}\index{factor analysis}
Factor analysis is useful for the identification of latent constructs—i.e., underlying dimensions or factors that explain (cause) scores on items.\index{factor analysis!decisions}\index{factor analysis}
#### 3. EFA or CFA {#efa-cfa}
A third decision is the kind of [factor analysis](#factorAnalysis) to use: [exploratory factor analysis](#efa) (EFA) or [confirmatory factor analysis](#cfa) (CFA).\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}
##### Exploratory Factor Analysis (EFA) {#efa}
Exploratory factor analysis (EFA) is used if you have no a priori hypotheses about the factor structure of the model, but you would like to understand the latent variables represented by your items.\index{factor analysis!decisions}\index{factor analysis!exploratory}
EFA is partly induced from the data.\index{factor analysis!decisions}\index{factor analysis!exploratory}
You feed in the data and let the program build the factor model.\index{factor analysis!decisions}\index{factor analysis!exploratory}
You can set some parameters going in, including how to extract or rotate the factors.\index{factor analysis!decisions}\index{factor analysis!exploratory}
The factors are extracted from the data without specifying the number and pattern of loadings between the items and the latent factors [@Bollen2002].\index{factor analysis!decisions}\index{factor analysis!exploratory}
All cross-loadings are freely estimated.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!exploratory}
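As a sketch, an EFA can be estimated in R with the `fa()` function from the `psych` package; `mydata` is a hypothetical data frame of item scores, and the number of factors, extraction method, and rotation below are illustrative choices.\index{factor analysis!exploratory}
```{r efaSketch, eval = FALSE}
library("psych")

# EFA with 2 factors, principal axis factoring, and oblique (oblimin) rotation;
# "mydata" is a hypothetical data frame of item scores
efaModel <- fa(mydata, nfactors = 2, fm = "pa", rotate = "oblimin")

efaModel$loadings  # pattern matrix of loadings, including cross-loadings
```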
##### Confirmatory Factor Analysis (CFA) {#cfa}
Confirmatory factor analysis (CFA) is used to confirm a priori hypotheses about the factor structure of the model.\index{factor analysis!decisions}\index{factor analysis!confirmatory}
CFA is a test of the hypothesis.\index{factor analysis!decisions}\index{factor analysis!confirmatory}
In CFA, you specify the model and ask how well this model represents the data.\index{factor analysis!decisions}\index{factor analysis!confirmatory}
The researcher specifies the number, meaning, associations, and pattern of free parameters in the factor loading matrix [@Bollen2002].\index{factor analysis!decisions}\index{factor analysis!confirmatory}
A key advantage of CFA is the ability to directly compare alternative models (i.e., factor structures), which is valuable for theory testing [@Strauss2009].\index{factor analysis!decisions}\index{factor analysis!confirmatory}
For instance, you could use [CFA](#cfa) to test whether the variance in several measures' scores is best explained with one factor or two factors.\index{factor analysis!confirmatory}
In CFA, cross-loadings are not estimated unless the researcher specifies them.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!confirmatory}
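As a sketch, a hypothesized CFA model can be specified and compared against an alternative model in R with the `lavaan` package; the data frame (`mydata`) and item names (`x1`–`x6`) are hypothetical.\index{factor analysis!confirmatory}
```{r cfaComparisonSketch, eval = FALSE}
library("lavaan")

# Hypothesized two-factor model
twoFactorModel <- '
  factor1 =~ x1 + x2 + x3
  factor2 =~ x4 + x5 + x6
'

# Alternative one-factor model
oneFactorModel <- '
  factor1 =~ x1 + x2 + x3 + x4 + x5 + x6
'

twoFactorFit <- cfa(twoFactorModel, data = mydata)
oneFactorFit <- cfa(oneFactorModel, data = mydata)

# Compare the competing factor structures
anova(oneFactorFit, twoFactorFit)  # chi-square difference test
fitMeasures(twoFactorFit, c("cfi", "tli", "rmsea", "srmr"))
```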
##### Exploratory Structural Equation Modeling (ESEM) {#efa-cfa-esem}
In real life, there is not a clear distinction between [EFA](#efa) and [CFA](#cfa).\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}
In [CFA](#cfa), researchers often specify only some of the constraints and let the data fill in the rest.\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}
In [EFA](#efa), researchers often set constraints and assumptions.
Thus, the line between [EFA](#efa) and [CFA](#cfa) is often blurred.\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}
[EFA](#efa) and [CFA](#cfa) can be considered special cases of exploratory structural equation modeling (ESEM), which combines features of [EFA](#efa), [CFA](#cfa), and [SEM](#sem) [@Marsh2014].\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}
ESEM can include any combination of exploratory (i.e., [EFA](#efa)) and confirmatory ([CFA](#cfa)) factors.\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}
ESEM, unlike traditional [CFA](#cfa) models, typically estimates all cross-loadings—at least for the exploratory factors.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}
If a [CFA](#cfa) model without cross-loadings and correlated residuals fits as well as an ESEM model with all cross-loadings, the [CFA](#cfa) model should be retained for its simplicity.\index{cross-loading}\index{structural equation modeling!residual!correlated}\index{parsimony}\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}
However, ESEM models often fit better than [CFA](#cfa) models because requiring no cross-loadings is an unrealistic expectation of items from many psychological instruments [@Marsh2014].\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}
The correlations between factors tend to be positively biased when fitting [CFA](#cfa) models without cross-loadings, which leads to challenges in using [CFA](#cfa) to establish [discriminant validity](#discriminantValidity) [@Marsh2014].\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}\index{validity!discriminant}
Thus, compared to [CFA](#cfa), ESEM has the potential to more accurately estimate factor correlations and establish [discriminant validity](#discriminantValidity) [@Marsh2014].\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}\index{validity!discriminant}
Moreover, ESEM can be useful in a [multitrait-multimethod](#MTMM) framework.\index{factor analysis!decisions}\index{structural equation modeling!exploratory}\index{multitrait-multimethod matrix}
We provide examples of ESEM in Section \@ref(esemModel).\index{factor analysis!decisions}\index{structural equation modeling!exploratory}
#### 4. How Many Factors to Retain {#factorsToRetain}
A goal of [factor analysis](#factorAnalysis) and [PCA](#pca) is simplification or parsimony, while still explaining as much variance as possible.\index{parsimony}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
The hope is that you can have fewer factors that explain the associations between the variables than the number of observed variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
But how do you decide on the number of factors (in [factor analysis](#factorAnalysis)) or components (in [PCA](#pca))?\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
There are a number of criteria that one can use to help determine how many factors/components to keep:\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
- Kaiser-Guttman criterion: in [PCA](#pca), components with eigenvalues greater than 1\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvalue}
- or, for [factor analysis](#factorAnalysis), factors with eigenvalues greater than zero\index{factor analysis!decisions}\index{factor analysis}\index{eigenvalue}
- Cattell's scree test: the "elbow" in a scree plot minus one; sometimes operationalized with optimal coordinates (OC) or the acceleration factor (AF)\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
- Parallel analysis: factors that explain more variance than randomly simulated data\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{parallel analysis}
- [Very simple structure (VSS)](#vssPlot) criterion: larger is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
- Velicer's minimum average partial (MAP) test: smaller is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
- Akaike information criterion (AIC): smaller is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
- Bayesian information criterion (BIC): smaller is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
- Sample size-adjusted BIC (SABIC): smaller is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
- Root mean square error of approximation (RMSEA): smaller is better\index{factor analysis!decisions}\index{factor analysis}
- Chi-square difference test: smaller is better; a significant test indicates that the more complex model is significantly better fitting than the less complex model\index{factor analysis!decisions}\index{factor analysis}
- Standardized root mean square residual (SRMR): smaller is better\index{factor analysis!decisions}\index{factor analysis}
- Comparative Fit Index (CFI): larger is better\index{factor analysis!decisions}\index{factor analysis}
- Tucker Lewis Index (TLI): larger is better\index{factor analysis!decisions}\index{factor analysis}
There is not necessarily a "correct" criterion to use in determining how many factors to keep, so it is generally recommended that researchers use multiple criteria in combination with theory and interpretability.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
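As a sketch, several of these criteria can be examined in R with the `psych` package; `mydata` is a hypothetical data frame of the variables to be analyzed.\index{factor analysis!decisions}\index{parallel analysis}\index{scree plot}
```{r factorRetentionSketch, eval = FALSE}
library("psych")

# Scree plot and parallel analysis for both factors and components;
# "mydata" is a hypothetical data frame of the variables
fa.parallel(mydata, fa = "both")

# Very simple structure (VSS) criterion and Velicer's MAP test
VSS(mydata)
```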
A scree plot from a [factor analysis](#factorAnalysis) or [PCA](#pca) provides lots of information.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
A scree plot has the factor number on the x-axis and the eigenvalue on the y-axis.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}\index{eigenvalue}
The eigenvalue is the variance accounted for by a factor; when using a varimax (orthogonal) rotation, an eigenvalue (or factor variance) is calculated as the sum of squared standardized factor (or component) loadings on that factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}\index{eigenvalue}
An example of a scree plot is in Figure \@ref(fig:screePlot).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
```{r screePlot, out.width = "100%", fig.align = "center", fig.cap = "Example of a Scree Plot.", echo = FALSE}
knitr::include_graphics("./Images/screePlot.png")
```
The total variance is equal to the number of variables you have, so an eigenvalue of 1 corresponds to approximately one variable's worth of variance.\index{eigenvalue}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
In a [factor analysis](#factorAnalysis) and [PCA](#pca), the first factor (or component) accounts for the most variance, the second factor accounts for the second-most variance, and so on.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
The more factors you add, the less variance is explained by the additional factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
One criterion for how many factors to keep is the Kaiser-Guttman criterion.
According to the Kaiser-Guttman criterion, you should keep any factors whose eigenvalue is greater than 1.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
That is, for the sake of simplicity, parsimony, and data reduction, you should take any factors that explain more than a single variable would explain.\index{parsimony}\index{data!reduction}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
According to the Kaiser-Guttman criterion, we would keep three factors from Figure \@ref(fig:screePlot) that have eigenvalues greater than 1.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}
The default in SPSS is to retain factors with eigenvalues greater than 1.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}
However, keeping factors whose eigenvalues are greater than 1 is not an ideal rule.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}
If you let SPSS do this, you may get many factors with eigenvalues barely above 1 (e.g., factors with an eigenvalue of ~1.0001) that do not add enough to be worth the added complexity.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}
The Kaiser-Guttman criterion usually results in keeping too many factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
Factors with small eigenvalues around 1 could reflect error shared across variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}
For instance, factors with small eigenvalues could reflect method variance (i.e., method factor), such as a self-report factor that turns up as a factor in [factor analysis](#factorAnalysis), but that may be useless to you as a conceptual factor of a construct of interest.\index{factor analysis!method factor}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}
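To make these ideas concrete, below is a minimal sketch, using simulated data (the variables and values are purely illustrative), of how the eigenvalues of a correlation matrix sum to the number of variables and how the Kaiser-Guttman criterion would be applied:\index{eigenvalue}\index{factor analysis!decisions}

```{r, eval = FALSE}
# Simulate nine variables that share one common factor (illustrative only)
set.seed(1)
commonFactor <- rnorm(300)
simulatedData <- as.data.frame(
  replicate(9, 0.6 * commonFactor + rnorm(300)))

# Eigenvalues of the correlation matrix
eigenvalues <- eigen(cor(simulatedData))$values

sum(eigenvalues)     # the total variance equals the number of variables (here, 9)
sum(eigenvalues > 1) # the number of components retained by the Kaiser-Guttman criterion
```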
Another criterion is Cattell's scree test, which involves selecting the number of factors from looking at the scree plot.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
"Scree" refers to the rubble of stones at the bottom of a mountain.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
According to Cattell's scree test, you should keep the factors before the last steep drop in eigenvalues—i.e., the factors before the rubble, where the slope approaches zero.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}\index{scree plot}
The beginning of the scree (or rubble), where the slope approaches zero, is called the "elbow" of a scree plot.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
Using Cattell's scree test, you retain the factors that explain the most variance prior to the drop-off in explained variance, because, ultimately, you want to include only as many factors as add substantial explanatory value.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
That is, you would keep the number of factors at the elbow of the scree plot minus one.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
If the last steep drop occurs from Factor 4 to Factor 5 and the elbow is at Factor 5, we would keep four factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
In Figure \@ref(fig:screePlot), the last steep drop in eigenvalues occurs from Factor 3 to Factor 4; the elbow of the scree plot occurs at Factor 4.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
We would keep the number of factors at the elbow minus one.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
Thus, using Cattell's scree test, we would keep three factors based on Figure \@ref(fig:screePlot).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
There are more sophisticated ways of using a scree plot, but they usually end up at a similar decision.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
Examples of more sophisticated tests include parallel analysis and [very simple structure (VSS) plots](#vssPlot).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{very simple structure plot}\index{parallel analysis}
In a parallel analysis, you examine where the eigenvalues from observed data and random data converge, so you do not retain a factor that explains less variance than would be expected by random chance.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{parallel analysis}\index{eigenvalue}
A parallel analysis can be helpful when you have many variables and one factor accounts for the majority of the variance such that the elbow is at Factor 2 (which would result in keeping one factor), but you have theoretical reasons to select more than one factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{parallel analysis}
An example in which parallel analysis may be helpful is with neurophysiological data.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{parallel analysis}\index{neurophysiological assessment}
For instance, parallel analysis can be helpful when conducting temporo-spatial [PCA](#pca) of event-related potential (ERP) data in which you want to separate multiple time windows and multiple spatial locations despite a predominant signal during a given time window and spatial location [@Dien2012].\index{factor analysis!decisions}\index{principal component analysis}\index{parallel analysis}\index{neurophysiological assessment}
In general, my recommendation is to use Cattell's scree test, and then test the factor solutions with plus or minus one factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
You should never accept [PCA](#pca) components with eigenvalues less than one (or factors with eigenvalues less than zero), because they are likely to be largely composed of error.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}
If you are using maximum likelihood [factor analysis](#factorAnalysis), you can compare the fit of various models with model fit criteria to see which model fits best for its parsimony.\index{parsimony}\index{factor analysis!decisions}\index{factor analysis}
A model will always fit better when you add additional parameters or factors, so you examine if there is *significant* improvement in model fit when adding the additional factor—that is, we keep adding complexity until additional complexity does not buy us much.\index{parsimony}\index{factor analysis!decisions}\index{factor analysis}
Always try factor solutions with one fewer and one more factor than suggested by Cattell's scree test to bracket your final solution, because the purpose of [factor analysis](#factorAnalysis) is to explain things and to have interpretability.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
Even if all rules or indicators suggest keeping X factors, maybe $\pm$ one factor helps clarify things.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
Even though [factor analysis](#factorAnalysis) is empirical, theory and interpretability should also inform decisions.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
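For example, below is a minimal sketch, assuming the `psych` package and a placeholder data frame of items called `mydata` (not an object defined in this chapter), of comparing maximum likelihood factor solutions with different numbers of factors using parsimony-adjusted fit criteria:\index{factor analysis!decisions}

```{r, eval = FALSE}
library("psych")

# mydata is a placeholder data frame of items (not an object defined in this chapter)
# Fit maximum likelihood EFA models with 1 to 4 factors
efaFits <- lapply(1:4, function(k) fa(r = mydata, nfactors = k, fm = "ml"))

# Compare fit; smaller BIC and RMSEA indicate better fit for the model's parsimony
data.frame(
  nfactors = 1:4,
  BIC = sapply(efaFits, function(x) x$BIC),
  RMSEA = sapply(efaFits, function(x) x$RMSEA["RMSEA"]))
```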
#### 5. Factor Rotation {#factorRotation}
The next step, if using [EFA](#efa) or [PCA](#pca), is (possibly) to rotate the factors to make them simpler and more interpretable, which is the whole goal.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
To interpret the results of a [factor analysis](#factorAnalysis), we examine the factor matrix.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
The columns refer to the different factors; the rows refer to the different observed variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
The cells in the table are the factor loadings—they are basically the correlation between the variable and the factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{structural equation modeling!factor loading}
Our goal is to achieve a model with simple structure because it is easily interpretable.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
*Simple structure* means that every variable loads perfectly on one and only one factor, as operationalized by a matrix of factor loadings with values of one and zero and nothing else.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
An example of a factor matrix that follows simple structure is depicted in Figure \@ref(fig:simpleStructure).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
```{r simpleStructure, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Matrix That Follows Simple Structure.", echo = FALSE}
knitr::include_graphics("./Images/simpleStructure.png")
```
An example of a measurement model that follows simple structure is depicted in Figure \@ref(fig:factorSolutionSimpleStructure).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
Each variable loads onto one and only one factor, which makes it easy to interpret the meaning of each factor, because a given factor represents the common variance among the items that load onto it.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
```{r factorSolutionSimpleStructure, out.width = "100%", fig.align = "center", fig.cap = "Example of a Measurement Model That Follows Simple Structure. 'INT' = internalizing problems; 'EXT' = externalizing problems; 'TD' = thought-disordered problems.", fig.scap = "Example of a Measurement Model That Follows Simple Structure.", echo = FALSE}
knitr::include_graphics("./Images/factorSolutionSimpleStructure.png")
```
However, pure simple structure only occurs in simulations, not in real-life data.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
In reality, our [measurement model](#measurementModel-sem) in an unrotated [factor analysis](#factorAnalysis) model might look like the model in Figure \@ref(fig:factorSolutionUnrotatedExample).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
In this example, the [measurement model](#measurementModel-sem) does not show simple structure because the items have cross-loadings—that is, the items load onto more than one factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-loading}\index{simple structure}\index{structural equation modeling!measurement model}
The cross-loadings make the factors difficult to interpret: because all of the items load onto all of the factors, the factors are not very distinct from each other, and it is hard to say what each factor means.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}\index{structural equation modeling!measurement model}
```{r factorSolutionUnrotatedExample, out.width = "100%", fig.align = "center", fig.cap = "Example of a Measurement Model That Does Not Follow Simple Structure. 'INT' = internalizing problems; 'EXT' = externalizing problems; 'TD' = thought-disordered problems.", fig.scap = "Example of a Measurement Model That Does Not Follow Simple Structure.", echo = FALSE}
knitr::include_graphics("./Images/factorSolutionUnrotatedExample.png")
```
As a result of the challenges of interpretability caused by cross-loadings, factor rotations are often performed.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
An example of an unrotated factor matrix is in Figure \@ref(fig:factorMatrix).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
```{r factorMatrix, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Matrix.", echo = FALSE}
knitr::include_graphics("./Images/factorMatrix.png")
```
In the example factor matrix in Figure \@ref(fig:factorMatrix), the [factor analysis](#factorAnalysis) is not very helpful—it tells us very little because it did not distinguish between the two factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
The variables have similar loadings on Factor 1 and Factor 2.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
An example of an unrotated factor solution is in Figure \@ref(fig:factorSolutionUnrotated).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
In the figure, all of the variables fall in the middle of the quadrants rather than along the factors' axes.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
Thus, the factors are not very informative.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
```{r factorSolutionUnrotated, out.width = "100%", fig.align = "center", fig.cap = "Example of an Unrotated Factor Solution.", echo = FALSE}
knitr::include_graphics("./Images/factorSolutionUnrotated.png")
```
As a result, to improve the interpretability of the [factor analysis](#factorAnalysis), we can do what is called rotation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
*Rotation* involves changing the orientation of the factors by changing the axes so that variables end up with very high (close to one or negative one) or very low (close to zero) loadings, so that it is clear which factors include which variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
That is, it tries to identify the ideal solution (factor) for each variable.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
The rotation algorithm searches for simple structure, iterating until its criterion converges.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}
After rotation, if the rotation was successful for imposing simple structure, each factor will have loadings close to one (or negative one) for some variables and close to zero for other variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}
The goal of factor rotation is to achieve simple structure, to help make it easier to interpret the meaning of the factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}
To perform factor rotation, orthogonal rotations are often used.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation}
Orthogonal rotations make the rotated factors uncorrelated.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation}
An example of a commonly used orthogonal rotation is varimax rotation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation}
An example of a factor matrix following an orthogonal rotation is depicted in Figure \@ref(fig:factorMatrixRotated).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation}
An example of a factor solution following an orthogonal rotation is depicted in Figure \@ref(fig:factorSolutionRotated).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation}
```{r factorMatrixRotated, out.width = "100%", fig.align = "center", fig.cap = "Example of a Rotated Factor Matrix.", echo = FALSE}
knitr::include_graphics("./Images/factorMatrixRotated.png")
```
```{r factorSolutionRotated, out.width = "100%", fig.align = "center", fig.cap = "Example of a Rotated Factor Solution.", echo = FALSE}
knitr::include_graphics("./Images/factorSolutionRotated.png")
```
An example of a factor matrix from SPSS following an orthogonal rotation is depicted in Figure \@ref(fig:rotatedFactorMatrix).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation}
```{r rotatedFactorMatrix, out.width = "100%", fig.align = "center", fig.cap = "Example of a Rotated Factor Matrix From SPSS.", echo = FALSE}
knitr::include_graphics("./Images/rotatedFactorMatrix.png")
```
An example of a factor structure from an orthogonal rotation is in Figure \@ref(fig:orthogonalRotation).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation}
```{r orthogonalRotation, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Structure From an Orthogonal Rotation.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-08.png")
```
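Before turning to oblique rotations, below is a minimal sketch, assuming the `psych` package and a placeholder data frame of items called `mydata` (not an object defined in this chapter), of extracting an unrotated two-factor solution, applying a varimax (orthogonal) rotation to its loadings, and verifying that each rotated factor's variance (eigenvalue) equals the sum of its squared loadings:\index{rotation}\index{orthogonal rotation}\index{eigenvalue}

```{r, eval = FALSE}
library("psych")

# mydata is a placeholder data frame of items (not an object defined in this chapter)
efaUnrotated <- fa(r = mydata, nfactors = 2, rotate = "none", fm = "ml")

# Apply a varimax (orthogonal) rotation to the unrotated loading matrix
varimaxLoadings <- varimax(efaUnrotated$loadings)$loadings

varimaxLoadings            # rotated loadings: ideally high on one factor, near zero on the other
colSums(varimaxLoadings^2) # each factor's variance (eigenvalue) = sum of its squared loadings
```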
Sometimes, however, the two factors and their constituent variables may be correlated.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}\index{oblique rotation}
Examples of two correlated factors may be depression and anxiety.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}\index{oblique rotation}
If the two factors are correlated in reality, forcing them to be uncorrelated would result in an inaccurate model.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}\index{oblique rotation}
Oblique rotation allows for factors to be correlated, but if the factors have low correlation (e.g., .2 or less), you can likely continue with orthogonal rotation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
An example of a factor structure from an oblique rotation is in Figure \@ref(fig:obliqueRotation).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
Results from an oblique rotation are more complicated than those from an orthogonal rotation—they provide lots of output and are harder to interpret.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
In addition, an oblique rotation might not yield a stable solution if you have a relatively small sample size.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
```{r obliqueRotation, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Structure From an Oblique Rotation.", echo = FALSE}
knitr::include_graphics("./Images/FactorAnalysis-09.png")
```
As an example of rotation based on interpretability, consider the Five-Factor Model of Personality (the Big Five), which goes by the acronym, OCEAN: **O**penness, **C**onscientiousness, **E**xtraversion, **A**greeableness, and **N**euroticism.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
Although the five factors of personality are somewhat correlated, we can use rotation to ensure they are maximally independent.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
Upon rotation, extraversion and neuroticism are essentially uncorrelated, as depicted in Figure \@ref(fig:factorRotation).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
The other pole of extraversion is introversion, and the other pole of neuroticism might be emotional stability or calmness.
```{r factorRotation, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Rotation of Neuroticism and Extraversion.", echo = FALSE}
knitr::include_graphics("./Images/factorRotation.png")
```
Simple structure is achieved when each variable loads highly onto as few factors as possible (i.e., each item has only one significant or primary loading).\index{simple structure}
Oftentimes this is not the case, so we choose our rotation method in order to decide if the factors can be correlated (an oblique rotation) or if the factors will be uncorrelated (an orthogonal rotation).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}\index{oblique rotation}
If the factors are not correlated with each other, use an orthogonal rotation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}
The correlation between an item and a factor is a factor loading, which is simply a way to ask how much a variable is correlated with the underlying factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{structural equation modeling!factor loading}
However, its interpretation is more complicated if there are correlated factors!\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
An orthogonal rotation (e.g., varimax) can help with simplicity of interpretation because it seeks to yield simple structure without cross-loadings.\index{simple structure}\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}
Cross-loadings are instances where a variable loads onto multiple factors.\index{cross-loading}
My recommendation would always be to use an orthogonal rotation if you have reason to believe that finding simple structure in your data is possible; otherwise, the factors are extremely difficult to interpret—what exactly does a cross-loading even mean?\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}
However, you should always try an oblique rotation, too, to see how strongly the factors are correlated.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
Examples of oblique rotations include oblimin and promax.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
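Below is a minimal sketch, assuming the `psych` and `GPArotation` packages and a placeholder data frame of items called `mydata` (not an object defined in this chapter), of fitting the same two-factor model with an orthogonal (varimax) and an oblique (oblimin) rotation and inspecting the estimated factor correlations from the oblique solution:\index{rotation}\index{orthogonal rotation}\index{oblique rotation}

```{r, eval = FALSE}
library("psych")
library("GPArotation") # the oblimin rotation in psych relies on GPArotation

# mydata is a placeholder data frame of items (not an object defined in this chapter)
efaVarimax <- fa(r = mydata, nfactors = 2, rotate = "varimax", fm = "ml")
efaOblimin <- fa(r = mydata, nfactors = 2, rotate = "oblimin", fm = "ml")

# Factor correlations from the oblique solution;
# if they are small (e.g., |r| <= .2), an orthogonal rotation may suffice
efaOblimin$Phi
```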
#### 6. Model Selection and Interpretation {#modelSelectionInterpretation}
The next step of [factor analysis](#factorAnalysis) is selecting and interpreting the model.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
One data matrix can lead to many different (correct) models—you must choose one based on the factor structure and theory.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
Use theory to interpret the model and label the factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
When interpreting others' findings, do not rely just on the factor labels—look at the actual items to determine what they assess.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
What they are called matters much less than what the actual items are!\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
### The Downfall of Factor Analysis {#downfallOfFactorAnalysis}
The downfall of [factor analysis](#factorAnalysis) is cross-validation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}
Cross-validating a factor structure would mean getting the same factor structure with a new sample.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}
We want factor structures to show good replicability across samples.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}\index{replication}
However, cross-validation often falls apart.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}
The way to attempt to replicate a factor structure in an independent sample is to use [CFA](#cfa) to set everything up and test the hypothesized factor structure in the independent sample.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}\index{replication}\index{factor analysis!confirmatory}
### What to Do with Factors {#whatToDoWithFactors}
What can you do with factors once you have them?\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
In [SEM](#sem), factors have meaning.\index{structural equation modeling}\index{latent variable}
You can use them as predictors, mediators, moderators, or outcomes.\index{structural equation modeling}\index{latent variable}
And, using latent factors in [SEM](#sem) helps disattenuate associations for measurement error, as described in Section \@ref(disattenuation).
People often want to use factors outside of [SEM](#sem), but there is confusion here: when researchers find that three variables load onto Factor A, they often combine those three variables using a sum or average—but this is not accurate.\index{structural equation modeling}\index{latent variable}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
If you just add or average them, this ignores the factor loadings and the error.\index{structural equation modeling!factor loading}\index{measurement error}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
A mean or sum score is a [measurement model](#measurementModel-sem) that assumes that all items have the same factor loading (i.e., a loading of 1) and no error (residual of 0), which is unrealistic.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{structural equation modeling!measurement model}
Another solution is to form a linear composite by adding and weighting the variables by the factor loadings, which retains the differences in correlations (i.e., a weighted sum), but this still ignores the estimated error, so it still may not be generalizable and meaningful.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{linear composite}\index{measurement error}
At the same time, weighted sums may be less generalizable than unit-weighted composites where each variable is given equal weight because some variability in factor loadings likely reflects sampling error.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{linear composite}\index{structural equation modeling!factor loading}
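Below is a minimal sketch, assuming the `psych` package and a placeholder data frame called `mydata` (not an object defined in this chapter) whose items load onto a single factor, of the three approaches: a unit-weighted mean score, a composite weighted by the factor loadings, and model-based factor scores that also account for the items' error:\index{linear composite}\index{structural equation modeling!factor loading}\index{measurement error}

```{r, eval = FALSE}
library("psych")

# mydata is a placeholder data frame of items (not an object defined in this chapter)
efa1factor <- fa(r = mydata, nfactors = 1, fm = "ml")

# Unit-weighted composite: assumes equal loadings (1) and no error (0)
meanScore <- rowMeans(mydata, na.rm = TRUE)

# Loading-weighted composite: allows loadings to differ but still ignores error
itemLoadings <- as.numeric(efa1factor$loadings)
weightedScore <- as.matrix(mydata) %*% itemLoadings

# Factor scores: estimated from the model, which also estimates each item's error
factorScores <- factor.scores(mydata, efa1factor)$scores
```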
A comparison of various measurement models is in Table \@ref(tab:measurementModelsComparison).\index{factor analysis}\index{principal component analysis}\index{structural equation modeling!measurement model}\index{linear composite}\index{measurement error}
Table: (\#tab:measurementModelsComparison) Comparison of Various Measurement Models. "PCA" = principal component analysis; "CFA" = confirmatory factor analysis; "EFA" = exploratory factor analysis; "SEM" = structural equation modeling; "IRT" = item response theory.
| Measurement Model | Items' Factor Loadings | Items' Residuals/Error |
|-----------------------------------------------|-------------------------------|-------------------------------------------------------|
| Mean/Sum Score | All are the same (fixed to 1) | Assumes items are measured without error (fixed to 0) |
| PCA or Weighted Mean/Sum Score | Allowed to differ | Assumes items are measured without error (fixed to 0) |
| Latent Variable Modeling (CFA, EFA, SEM, IRT) | Allowed to differ | Estimates error for each item |
### Missing Data Handling {#missingDataHandling-factorAnalysis}
The [PCA](#pca) default in SPSS is listwise deletion of missing data: if a participant is missing data on any variable, that participant is excluded from the analysis, so you might end up with too few participants.\index{factor analysis!decisions}\index{principal component analysis}
Instead, use a correlation matrix with pairwise deletion for [PCA](#pca) with missing data.\index{factor analysis!decisions}\index{principal component analysis}
Maximum likelihood [factor analysis](#factorAnalysis) can make use of all available data points for a participant, even if they are missing some data points.\index{factor analysis!decisions}\index{factor analysis}
Mplus, which is often used for [SEM](#sem) and [factor analysis](#factorAnalysis), will notify you if you are removing many participants in [CFA](#cfa)/[EFA](#efa).\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling}\index{factor analysis!confirmatory}\index{factor analysis!exploratory}
The `lavaan` package [@R-lavaan] in `R` also notifies you if you are removing participants in [CFA](#efa-cfa)/[SEM](#sem) models.\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling}\index{factor analysis!exploratory}
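Below is a minimal sketch, assuming the `psych` and `lavaan` packages, a placeholder data frame of items called `mydata` with missing values, and placeholder item names (`y1`–`y6`), of (a) factoring a pairwise-complete correlation matrix rather than relying on listwise deletion and (b) using full information maximum likelihood (FIML) in `lavaan` to use all available data points:\index{factor analysis!decisions}\index{principal component analysis}

```{r, eval = FALSE}
library("psych")
library("lavaan")

# mydata and y1-y6 are placeholders (not objects defined in this chapter)

# (a) PCA/EFA on a correlation matrix computed with pairwise deletion
pairwiseCor <- cor(mydata, use = "pairwise.complete.obs")
pcaPairwise <- principal(r = pairwiseCor, nfactors = 2, n.obs = nrow(mydata))

# (b) FIML estimation in a CFA uses all available data points for each participant
cfaModel <- '
 f1 =~ y1 + y2 + y3
 f2 =~ y4 + y5 + y6
'
cfaFit <- cfa(cfaModel, data = mydata, missing = "ml")
```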
### Troubleshooting Improper Solutions {#troubleshooting-factorAnalysis}
A [PCA](#pca) will always converge; however, a [factor analysis](#factorAnalysis) model may not converge.\index{principal component analysis}\index{factor analysis}
If a [factor analysis](#factorAnalysis) model does not converge, you will need to modify the model to get it to converge.\index{factor analysis}
Model identification is described [here](#modelIdentificationTypes-sem).\index{factor analysis}\index{factor analysis!identifiability}
In addition, a [factor analysis](#factorAnalysis) model may converge but be improper.\index{factor analysis}\index{factor analysis!improper solution}
For example, if the [factor analysis](#factorAnalysis) model has a negative residual variance or correlation above 1, the model is improper.\index{factor analysis}\index{factor analysis!improper solution}
A negative residual variance is also called a Heywood case.\index{factor analysis}\index{factor analysis!improper solution}
Problematic parameters such as a negative residual variance or correlation above 1 can lead to problems with other parameters in the model.\index{factor analysis}\index{factor analysis!improper solution}
It is thus preferable to avoid or address improper solutions.\index{factor analysis}\index{factor analysis!improper solution}
A negative residual variance or correlation above 1 could be caused by a variety of things such as a misspecified model, too many factors, too few indicators per factor, or too few participants (relative to the complexity of the model).\index{factor analysis}\index{factor analysis!improper solution}
Here are various ways you may troubleshoot improper solutions in [factor analysis](#factorAnalysis):\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Modify the model to better represent the data-generating process\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Simplify the model\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Estimate fewer paths/parameters\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Reduce the number of factors estimated/extracted\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Increase the sample size\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Increase the number of indicators per factor\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Examine the data for outliers\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Drop the offending item that has a negative residual variance (if theoretically justifiable because the item may be poorly performing)\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Use different starting values\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Test whether the negative residual variance is significantly different from zero; if not, you can consider fixing it to zero (though this is not an ideal solution)\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
- Use regularization with constraints so the residual variances do not become negative\index{factor analysis}\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}
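As a quick screen for improper solutions, below is a minimal sketch, assuming the `lavaan` package and a fitted model object called `cfaFit` (a placeholder name), that checks the estimated variances for Heywood cases (negative values) and the standardized solution for values above 1 in absolute value:\index{factor analysis!improper solution}\index{factor analysis!troubleshooting}

```{r, eval = FALSE}
library("lavaan")

# cfaFit is a placeholder for a fitted lavaan model (not an object defined in this chapter)
estimates <- parameterEstimates(cfaFit)

# Variance parameters (op == "~~" with lhs == rhs); a negative estimate
# (e.g., a negative residual variance) indicates an improper (Heywood) solution
varianceEstimates <- estimates[estimates$op == "~~" & estimates$lhs == estimates$rhs, ]
any(varianceEstimates$est < 0)

# Standardized estimates; correlations or loadings above 1 in absolute value are improper
standardizedEstimates <- standardizedSolution(cfaFit)
any(abs(standardizedEstimates$est.std) > 1)
```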
## Getting Started {#gettingStarted-factorAnalysis}
### Load Libraries {#loadLibraries-factorAnalysis}
```{r}
library("petersenlab") #to install: install.packages("remotes"); remotes::install_github("DevPsyLab/petersenlab")
library("lavaan")
library("psych")
library("corrplot")
library("nFactors")
library("semPlot")
library("lavaan")
library("semTools")
library("dagitty")
library("kableExtra")
library("MOTE")
library("tidyverse")
library("here")
library("tinytex")
library("knitr")
library("rmarkdown")
```
### Load Data {#loadData-factorAnalysis}
### Prepare Data {#prepareData-factorAnalysis}
#### Add Missing Data {#addMissingData-factorAnalysis}
Adding missing data to dataframes helps make examples more realistic (real-life data typically contain missing values) and helps you get in the habit of writing code that accounts for missing data.
For reproducibility, I set the seed below.
Using the same seed will yield the same answer every time.
There is nothing special about this particular seed.
`HolzingerSwineford1939` is a data set from the `lavaan` package [@R-lavaan] that contains mental ability test scores (`x1`–`x9`) for seventh- and eighth-grade children.
```{r}
set.seed(52242)
varNames <- names(HolzingerSwineford1939)
dimensionsDf <- dim(HolzingerSwineford1939)
unlistedDf <- unlist(HolzingerSwineford1939)
unlistedDf[sample(
1:length(unlistedDf),
size = .15 * length(unlistedDf))] <- NA
HolzingerSwineford1939 <- as.data.frame(matrix(
unlistedDf,
ncol = dimensionsDf[2]))
names(HolzingerSwineford1939) <- varNames
vars <- c("x1","x2","x3","x4","x5","x6","x7","x8","x9")
```
## Descriptive Statistics and Correlations {#descriptiveStatsCorrelations-factorAnalysis}
Before conducting a [factor analysis](#factorAnalysis), it is important to examine descriptive statistics and correlations among variables.\index{correlation}\index{descriptive statistics}
### Descriptive Statistics {#descriptiveStats-factorAnalysis}
Descriptive statistics are presented in Table \@ref(tab:descriptiveStats).\index{descriptive statistics}
```{r, eval = knitr::is_html_output(excludes = "epub"), echo = knitr::is_html_output(excludes = "epub")}
paged_table(psych::describe(HolzingerSwineford1939[,vars]))
```
```{r, eval = knitr::is_latex_output(), echo = knitr::is_latex_output()}
kable(psych::describe(HolzingerSwineford1939[,vars]),
format = "latex",
booktabs = TRUE) %>%
kable_styling(latex_options = "scale_down")
```
```{r, eval = knitr::is_html_output(excludes = c("markdown","html","html4","html5","revealjs","s5","slideous","slidy","gfm")), echo = knitr::is_html_output(excludes = c("markdown","html","html4","html5","revealjs","s5","slideous","slidy","gfm"))}
kable(psych::describe(HolzingerSwineford1939[,vars]),
booktabs = TRUE)
```
```{r descriptiveStats}
summaryTable <- HolzingerSwineford1939 %>%
dplyr::select(all_of(vars)) %>%
summarise(across(
everything(),
.fns = list(
n = ~ length(na.omit(.)),
missingness = ~ mean(is.na(.)) * 100,
M = ~ mean(., na.rm = TRUE),
SD = ~ sd(., na.rm = TRUE),
min = ~ min(., na.rm = TRUE),
max = ~ max(., na.rm = TRUE),
skewness = ~ psych::skew(., na.rm = TRUE),
kurtosis = ~ kurtosi(., na.rm = TRUE)),
.names = "{.col}.{.fn}")) %>%
pivot_longer(
cols = everything(),
names_to = c("variable","index"),
names_sep = "\\.") %>%
pivot_wider(
names_from = index,
values_from = value)
summaryTableTransposed <- summaryTable[-1] %>%
t() %>%
as.data.frame() %>%
setNames(summaryTable$variable) %>%
round(., digits = 2)
summaryTableTransposed
```
### Correlations {#correlations-factorAnalysis}
```{r}
cor(
HolzingerSwineford1939[,vars],
use = "pairwise.complete.obs")
```
Correlation matrices of various types using the `cor.table()` function from the [`petersenlab`](https://github.com/DevPsyLab/petersenlab) package [@R-petersenlab] are in Tables \@ref(tab:corTable1), \@ref(tab:corTable2), and \@ref(tab:corTable3).\index{petersenlab package}\index{correlation}
```{r, eval = FALSE}
cor.table(
HolzingerSwineford1939[,vars],
dig = 2)
cor.table(
HolzingerSwineford1939[,vars],
type = "manuscript",
dig = 2)
cor.table(
HolzingerSwineford1939[,vars],
type = "manuscriptBig",
dig = 2)
```
```{r, include = FALSE}
corTable1 <- cor.table(
HolzingerSwineford1939[,vars],
dig = 2)
corTable2 <- cor.table(
HolzingerSwineford1939[,vars],
type = "manuscript",
dig = 2)
corTable3 <- cor.table(
HolzingerSwineford1939[,vars],
type = "manuscriptBig",
dig = 2)
```
```{r corTable1, echo = FALSE}
corTable1 %>%
kable(.,
caption = "Correlation Matrix with *r*, *n*, and *p*-values.",
booktabs = TRUE,
linesep = c("", "", "\\addlinespace"),
escape = FALSE)
```
```{r corTable2, echo = FALSE}
corTable2 %>%
kable(.,
caption = "Correlation Matrix with Asterisks for Significant Associations.",
booktabs = TRUE,
linesep = "",
escape = FALSE)
```
```{r corTable3, echo = FALSE}
corTable3 %>%
kable(.,
caption = "Correlation Matrix.",
booktabs = TRUE,
linesep = "")
```
Pairs panel plots were generated using the `psych` package [@R-psych].\index{correlation}
Correlation plots were generated using the `corrplot` package [@R-corrplot].\index{correlation}
A pairs panel plot is in Figure \@ref(fig:pairsPanel).\index{correlation}
```{r pairsPanel, out.width = "100%", fig.align = "center", fig.cap = "Pairs Panel Plot."}
pairs.panels(HolzingerSwineford1939[,vars])
```
A correlation plot is in Figure \@ref(fig:correlationPlot).\index{correlation}
```{r correlationPlot, out.width = "100%", fig.align = "center", fig.cap = "Correlation Plot."}
corrplot(cor(
HolzingerSwineford1939[,vars],
use = "pairwise.complete.obs"))
```
## Factor Analysis {#factorAnalysisExamples}
### Exploratory Factor Analysis (EFA) {#efaExamples}
I introduced [exploratory factor analysis](#efa) (EFA) models in Section \@ref(efa).\index{factor analysis!exploratory}
#### Determine number of factors {#determineNumberOfFactors-efaExamples}
Determine the number of factors to retain using the [Scree plot](#screePlot) and [Very Simple Structure plot](#vssPlot).\index{scree plot}\index{very simple structure plot}
##### Scree Plot {#screePlot}
Scree plots were generated using the `psych` [@R-psych] and `nFactors` [@R-nFactors] packages.\index{scree plot}
The optimal coordinates and the acceleration factor attempt to operationalize the Cattell scree test: i.e., the "elbow" of the scree plot [@Ruscio2012].\index{scree plot}
The optimal coordinates are quantified using a series of linear equations to determine whether observed eigenvalues exceed the predicted values.\index{scree plot}
The acceleration factor is quantified using the acceleration of the curve, that is, the second derivative.\index{scree plot}
The Kaiser-Guttman rule states to keep principal components whose eigenvalues are greater than 1.\index{scree plot}\index{eigenvalue}
However, for [exploratory factor analysis](#efa) (as opposed to [PCA](#pca)), the criterion is to keep the factors whose eigenvalues are greater than zero (i.e., not the factors whose eigenvalues are greater than 1) [@Dinno2014].\index{scree plot}\index{factor analysis!exploratory}\index{principal component analysis}\index{eigenvalue}
The number of factors to keep would depend on which criteria one uses.\index{scree plot}
Based on the rule to keep factors whose eigenvalues are greater than zero and based on the parallel test, we would keep three factors.\index{scree plot}\index{eigenvalue}
However, based on the Cattell scree test (as operationalized by the optimal coordinates and acceleration factor), we would keep one factor.\index{scree plot}
Therefore, interpretability of the factors would be important for deciding whether to keep one, two, or three factors.\index{scree plot}
A scree plot from a parallel analysis is in Figure \@ref(fig:screePlotEFAparallel).\index{scree plot}\index{factor analysis!exploratory}
```{r screePlotEFAparallel, out.width = "100%", fig.align = "center", fig.cap = "Scree Plot from Parallel Analysis in Exploratory Factor Analysis."}
fa.parallel(
x = HolzingerSwineford1939[,vars],
fm = "ml",
fa = "fa")
```
A scree plot from EFA is in Figure \@ref(fig:screePlotEFA).\index{scree plot}\index{factor analysis!exploratory}
```{r screePlotEFA, out.width = "100%", fig.align = "center", fig.cap = "Scree Plot in Exploratory Factor Analysis."}
plot(
nScree(x = cor(
HolzingerSwineford1939[,vars],
use = "pairwise.complete.obs"),
model = "factors"))
```
##### Very Simple Structure (VSS) Plot {#vssPlot}
The very simple structure (VSS) is another criterion that can be used to determine the optimal number of factors or components to retain.\index{very simple structure plot}
Using the VSS criterion, the optimal number of factors to retain is the number of factors that maximizes the VSS criterion [@Revelle1979].\index{very simple structure plot}
The VSS criterion is evaluated with models in which factor loadings for a given item that are less than the maximum factor loading for that item are suppressed to zero, thus forcing simple structure (i.e., no cross-loadings).\index{cross-loading}\index{very simple structure plot}
The goal is finding a factor structure with interpretability so that factors are clearly distinguishable.\index{very simple structure plot}
Thus, we want to identify the number of factors with the highest VSS criterion (i.e., the highest line).\index{very simple structure plot}
Very simple structure (VSS) plots were generated using the `psych` package [@R-psych].\index{very simple structure plot}
The output also provides additional criteria by which to determine the optimal number of factors, for each of which lower values are better, including the Velicer minimum average partial (MAP) test, the Bayesian information criterion (BIC), the sample size-adjusted BIC (SABIC), and the root mean square error of approximation (RMSEA).\index{very simple structure plot}
###### Orthogonal (Varimax) rotation {#orthogonal-vss}
In the example with orthogonal rotation below, the VSS criterion is highest with 3 or 4 factors.\index{very simple structure plot}\index{orthogonal rotation}
A three-factor solution is supported by the lowest BIC, whereas a four-factor solution is supported by the lowest SABIC.\index{very simple structure plot}\index{orthogonal rotation}
A VSS plot is in Figure \@ref(fig:vssOrthogonal).\index{very simple structure plot}\index{orthogonal rotation}
```{r vssOrthogonal, out.width = "100%", fig.align = "center", fig.cap = "Very Simple Structure Plot With Orthogonal Rotation in Exploratory Factor Analysis."}
vss(
HolzingerSwineford1939[,vars],
rotate = "varimax",
fm = "ml")
```
Multiple VSS-related fit indices are in Figure \@ref(fig:vssComplexOrthogonal).\index{very simple structure plot}\index{orthogonal rotation}
```{r vssComplexOrthogonal, out.width = "100%", fig.align = "center", fig.cap = "Very Simple Structure-Related Indices With Orthogonal Rotation in Exploratory Factor Analysis."}
nfactors(
HolzingerSwineford1939[,vars],
rotate = "varimax",
fm = "ml")
```
###### Oblique (Oblimin) rotation {#oblique-vss}
In the example with oblique rotation below, the VSS criterion is highest with 3 factors.\index{very simple structure plot}\index{oblique rotation}
A three-factor solution is supported by the lowest BIC.\index{very simple structure plot}\index{oblique rotation}
A VSS plot is in Figure \@ref(fig:vssOblique).\index{very simple structure plot}\index{oblique rotation}
```{r vssOblique, out.width = "100%", fig.align = "center", fig.cap = "Very Simple Structure Plot With Oblique Rotation in Exploratory Factor Analysis."}
vss(
HolzingerSwineford1939[,vars],
rotate = "oblimin",
fm = "ml")
```
Multiple VSS-related fit indices are in Figure \@ref(fig:vssComplexOblique).\index{very simple structure plot}\index{oblique rotation}
```{r vssComplexOblique, out.width = "100%", fig.align = "center", fig.cap = "Very Simple Structure-Related Indices With Oblique Rotation in Exploratory Factor Analysis."}
nfactors(
HolzingerSwineford1939[,vars],
rotate = "oblimin",
fm = "ml")
```
###### No rotation {#noRotation-vss}
In the example with no rotation below, the VSS criterion is highest with 3 or 4 factors.\index{very simple structure plot}
A three-factor solution is supported by the lowest BIC, whereas a four-factor solution is supported by the lowest SABIC.\index{very simple structure plot}
A VSS plot is in Figure \@ref(fig:vssUnrotated).\index{very simple structure plot}
```{r vssUnrotated, out.width = "100%", fig.align = "center", fig.cap = "Very Simple Structure Plot With no Rotation in Exploratory Factor Analysis."}
vss(
HolzingerSwineford1939[,vars],
rotate = "none",
fm = "ml")
```
Multiple VSS-related fit indices are in Figure \@ref(fig:vssComplexUnrotated).\index{very simple structure plot}
```{r vssComplexUnrotated, out.width = "100%", fig.align = "center", fig.cap = "Very Simple Structure-Related Indices With no Rotation in Exploratory Factor Analysis."}
nfactors(
HolzingerSwineford1939[,vars],
rotate = "none",
fm = "ml")
```
#### Run factor analysis {#runFactorAnalysis-efa}
[Exploratory factor analysis](#efa) (EFA) models were fit using the `fa()` function of the `psych` package [@R-psych] and the `sem()` and `efa()` functions of the `lavaan` package [@R-lavaan].\index{factor analysis!exploratory}
##### Orthogonal (Varimax) rotation {#orthogonal-efa}
###### `psych` {#orthogonalPsych-efa}
Fit a different model with each number of possible factors:\index{factor analysis!exploratory}\index{orthogonal rotation}
```{r}
efa1factorOrthogonal <- fa(
r = HolzingerSwineford1939[,vars],
nfactors = 1,
rotate = "varimax",
fm = "ml")
efa2factorOrthogonal <- fa(
r = HolzingerSwineford1939[,vars],
nfactors = 2,
rotate = "varimax",
fm = "ml")
efa3factorOrthogonal <- fa(
r = HolzingerSwineford1939[,vars],
nfactors = 3,
rotate = "varimax",
fm = "ml")
efa4factorOrthogonal <- fa(
r = HolzingerSwineford1939[,vars],
nfactors = 4,
rotate = "varimax",
fm = "ml")
efa5factorOrthogonal <- fa(
r = HolzingerSwineford1939[,vars],
nfactors = 5,
rotate = "varimax",
fm = "ml")
efa6factorOrthogonal <- fa(
r = HolzingerSwineford1939[,vars],
nfactors = 6,
rotate = "varimax",
fm = "ml")
efa7factorOrthogonal <- fa(