From 2f3963dae3772b2c0d925c3639d22416f7a5ad3a Mon Sep 17 00:00:00 2001 From: Petersen Date: Wed, 2 Oct 2024 10:14:52 -0500 Subject: [PATCH] 20241002 - remove text --- 15-Factor-Analysis-PCA.Rmd | 595 ------------------------------------- 1 file changed, 595 deletions(-) diff --git a/15-Factor-Analysis-PCA.Rmd b/15-Factor-Analysis-PCA.Rmd index 9d6a7245..c05e61ec 100644 --- a/15-Factor-Analysis-PCA.Rmd +++ b/15-Factor-Analysis-PCA.Rmd @@ -1,600 +1,5 @@ # Factor Analysis and Principal Component Analysis {#factor-analysis-PCA} -## Overview of Factor Analysis {#factorAnalysisOverview} - -[Factor analysis](#factorAnalysis) is a class of latent variable models that is designated to identify the structure of a measure or set of measures, and ideally, a construct or set of constructs.\index{factor analysis} -It aims to identify the optimal latent structure for a group of variables.\index{factor analysis} -[Factor analysis](#factorAnalysis) encompasses two general types: [confirmatory factor analysis](#cfa) and [exploratory factor analysis](#efa).\index{factor analysis}\index{factor analysis!confirmatory}\index{factor analysis!exploratory} -[*Exploratory factor analysis*](#efa) (EFA) is a latent variable modeling approach that is used when the researcher has no a priori hypotheses about how a set of variables is structured.\index{factor analysis!exploratory} -[EFA](#efa) seeks to identify the empirically optimal-fitting model in ways that balance accuracy (i.e., variance accounted for) and parsimony (i.e., simplicity).\index{parsimony}\index{factor analysis!exploratory}\index{latent variable} -[*Confirmatory factor analysis*](#cfa) (CFA) is a latent variable modeling approach that is used when a researcher wants to evaluate how well a hypothesized model fits, and the model can be examined in comparison to alternative models.\index{factor analysis!confirmatory}\index{latent variable} -Using a [CFA](#cfa) approach, the researcher can pit models representing two theoretical frameworks against each other to see which better accounts for the observed data.\index{factor analysis!confirmatory} - -[Factor analysis](#factorAnalysis) is considered to be a "pure" data-driven method for identifying the structure of the data, but the "truth" that we get depends heavily on the decisions we make regarding the parameters of our [factor analysis](#factorAnalysis).\index{factor analysis!confirmatory} -The goal of [factor analysis](#factorAnalysis) is to identify simple, parsimonious factors that underlie the "junk" (i.e., scores filled with measurement error) that we observe.\index{parsimony}\index{factor analysis!confirmatory} - -It used to take a long time to calculate a [factor analysis](#factorAnalysis) because it was computed by hand.\index{factor analysis!confirmatory} -Now, it is fast to compute [factor analysis](#factorAnalysis) with computers (e.g., oftentimes less than 30 ms).\index{factor analysis!confirmatory} -In the 1920s, Spearman developed [factor analysis](#factorAnalysis) to understand the factor structure of intelligence.\index{factor analysis!confirmatory}\index{intellectual assessment} -It was a long process—it took Spearman around one year to calculate the first [factor analysis](#factorAnalysis)!\index{factor analysis!confirmatory} -[Factor analysis](#factorAnalysis) takes a large dimension data set and simplifies it into a smaller set of factors that are thought to reflect underlying constructs.\index{parsimony}\index{factor analysis!confirmatory} -If you believe that nature is simple 
underneath, [factor analysis](#factorAnalysis) gives nature a chance to display the simplicity that lives beneath the complexity on the surface.\index{parsimony}\index{factor analysis!confirmatory} -Spearman identified a single factor, *g*, that accounted for most of the covariation between the measures of intelligence.\index{factor analysis!confirmatory}\index{intelligence!\textit{g}} - -[Factor analysis](#factorAnalysis) involves observed (manifest) variables and unobserved (latent) factors.\index{factor analysis}\index{latent variable} -In a [reflective model](#reflectiveConstruct), it is assumed that the latent factor influences the manifest variables, and the latent factor therefore reflects the common ([reliable](#reliability)) variance among the variables.\index{factor analysis}\index{latent variable}\index{latent variable}\index{construct!reflective} -A factor model potentially includes factor loadings, residuals (errors or disturbances), intercepts/means, covariances, and regression paths.\index{factor analysis} -A regression path indicates a hypothesis that one variable (or factor) influences another.\index{factor analysis} -The standardized regression coefficient represents the strength of association between the variables or factors.\index{factor analysis}\index{standardized regression coefficient} -A factor loading is a regression path from a latent factor to an observed (manifest) variable.\index{factor analysis}\index{structural equation modeling!factor loading}\index{latent variable} -The standardized factor loading represents the strength of association between the variable and the latent factor.\index{factor analysis}\index{structural equation modeling!factor loading}\index{latent variable}\index{standardized regression coefficient} -A residual is variance in a variable (or factor) that is unexplained by other variables or factors.\index{factor analysis}\index{structural equation modeling!residual} -An indicator's intercept is the expected value of the variable when the factor(s) (onto which it loads) is equal to zero.\index{factor analysis}\index{structural equation modeling!intercept} -Covariances are the associations between variables (or factors).\index{factor analysis}\index{structural equation modeling!covariance} - -In factor analysis, the relation between an indicator ($\text{X}$) and its underlying latent factor(s) ($\text{F}$) can be represented with a regression formula as in Equation \@ref(eq:indicatorLatentAssociation):\index{factor analysis}\index{structural equation modeling!factor loading}\index{structural equation modeling!intercept} - -\begin{equation} -\text{X} = \lambda \cdot \text{F} + \text{Item Intercept} + \text{Error Term} -(\#eq:indicatorLatentAssociation) -\end{equation} - -where: - -- $\text{X}$ is the observed value of the indicator -- $\lambda$ is the factor loading, indicating the strength of the association between the indicator and the latent factor(s) -- $\text{F}$ is the person's value on the latent factor(s) -- $\text{Item Intercept}$ represents the constant term that accounts for the expected value of the indicator when the latent factor(s) are zero -- $\text{Error Term}$ is the residual, indicating the extent of variance in the indicator that is not explained by the latent factor(s) - -When the latent factors are uncorrelated, the (standardized) error term for an indicator is calculated as 1 minus the sum of squared standardized factor loadings for a given item (including cross-loadings).\index{factor analysis}\index{structural 
equation modeling!residual}\index{structural equation modeling!factor loading}\index{cross-loading} - -Another class of [factor analysis](#factorAnalysis) models are [higher-order](#higherOrderModel) (or hierarchical) factor models and [bifactor models](#bifactorModel).\index{factor analysis}\index{factor analysis!higher-order}\index{factor analysis!bifactor} -Guidelines in using [higher-order factor](#higherOrderModel) and [bifactor](#bifactorModel) models are discussed by @Markon2019.\index{factor analysis}\index{factor analysis!higher-order}\index{factor analysis!bifactor} - -[Factor analysis](#factorAnalysis) is a powerful technique to help identify the factor structure that underlies a measure or construct.\index{factor analysis} -As discussed in Section \@ref(factorAnalysisDecisions), however, there are many decisions to make in [factor analysis](#factorAnalysis), in addition to questions about which variables to use, how to scale the variables, etc.\index{factor analysis} -If the variables going into a [factor analysis](#factorAnalysis) are not well assessed, [factor analysis](#factorAnalysis) will not rescue the factor structure.\index{factor analysis} -In such situations, there is likely to be the problem of garbage in, garbage out.\index{factor analysis} -Factor analysis depends on the covariation among variables.\index{factor analysis} -Given the extensive [method variance](#methodBias) that measures have, [factor analysis](#factorAnalysis) (and [principal component analysis](#pca)) tends to extract method factors.\index{factor analysis}\index{principal component analysis}\index{method bias}\index{factor analysis!method factor} -Method factors are factors that are related to the methods being assessed rather than the construct of interest.\index{factor analysis}\index{method bias}\index{factor analysis!method factor} -However, [multitrait-multimethod](#MTMM) approaches to [factor analysis](#factorAnalysis) (such as in Section \@ref(mtmmCFA)) help better partition the variance in variables that reflects method variance versus construct variance, to get more accurate estimates of constructs.\index{factor analysis}\index{multitrait-multimethod matrix} - -@Floyd1995 provide an overview of [factor analysis](#factorAnalysis) for the development of clinical assessments.\index{factor analysis} - -### Example Factor Models from Correlation Matrices {#exampleFactorModelsFromCorrelationMatrices} - -Below, I provide some example factor models from various correlation matrices.\index{factor analysis} -Analytical examples of [factor analysis](#factorAnalysis) are presented in Section \@ref(factorAnalysisExamples).\index{factor analysis} - -Consider the example correlation matrix in Figure \@ref(fig:correlationMatrix1).\index{factor analysis} -Because all of the correlations are the same ($r = .60$), we expect there is approximately one factor for this pattern of data.\index{factor analysis} - -```{r correlationMatrix1, out.width = "100%", fig.align = "center", fig.cap = "Example Correlation Matrix 1.", echo = FALSE} -knitr::include_graphics("./Images/correlationMatrix1.png") -``` - -In a single-factor model fit to these data, the factor loadings are .77 and the residual error terms are .40, as depicted in Figure \@ref(fig:factorAnalysis1).\index{factor analysis} -The amount of common variance ($R^2$) that is accounted for by an indicator is estimated as the square of the standardized loading: $.60 = .77 \times .77$.\index{factor analysis}\index{structural equation modeling!factor loading} 
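
As a concrete illustration (not part of the original example), a model like this can be fit in R with the lavaan package. The sketch below is illustrative only: it assumes six hypothetical indicators named x1 through x6 and an arbitrary sample size of 300, and it simply reconstructs Example Correlation Matrix 1 before fitting the unidimensional model.

```{r factorAnalysis1Sketch, eval = FALSE}
library(lavaan)

# Reconstruct Example Correlation Matrix 1: all correlations = .60
# (indicator names and sample size below are hypothetical)
corMat <- matrix(.60, nrow = 6, ncol = 6)
diag(corMat) <- 1
rownames(corMat) <- colnames(corMat) <- paste0("x", 1:6)

# Unidimensional (one-factor) model
oneFactorModel <- 'F1 =~ x1 + x2 + x3 + x4 + x5 + x6'

fit <- cfa(
  oneFactorModel,
  sample.cov = corMat,
  sample.nobs = 300, # hypothetical sample size
  std.lv = TRUE)     # fix the factor variance to 1

summary(fit, standardized = TRUE) # standardized loadings should be about .77
inspect(fit, "rsquare")           # R^2 about .60 per indicator; residual about .40
```

The standardized loadings and residual variances from such a fit should match the values described here (approximately .77 and .40, respectively).
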
-The amount of error for an indicator is estimated as: $\text{error} = 1 - \text{common variance}$, so in this case, the amount of error is: $.40 = 1 - .60$.\index{factor analysis}\index{structural equation modeling!residual}
-The proportion of the total variance in indicators that is accounted for by the latent factor is the sum of the square of the standardized loadings divided by the number of indicators.\index{factor analysis}
-That is, to calculate the proportion of the total variance in the variables that is accounted for by the latent factor, you would square the loadings, sum them up, and divide by the number of variables: $\frac{.77^2 + .77^2 + .77^2 + .77^2 + .77^2 + .77^2}{6} = \frac{.60 + .60 + .60 + .60 + .60 + .60}{6} = .60$.\index{factor analysis}
-Thus, the latent factor accounts for 60% of the variance in the indicators.\index{factor analysis}
-In this model, the latent factor explains the covariance among the variables.\index{factor analysis}
-If the answer is simple, a small and parsimonious model should be able to obtain the answer.\index{parsimony}\index{factor analysis}
-
-```{r factorAnalysis1, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Unidimensional Model.", echo = FALSE}
-knitr::include_graphics("./Images/FactorAnalysis-01.png")
-```
-
-Consider a different correlation matrix in Figure \@ref(fig:correlationMatrix2).\index{factor analysis}
-There is no common variance (correlations between the variables are zero), so there is no reason to believe there is a common factor that influences all of the variables.\index{factor analysis}
-Variables that are not correlated cannot be related by a third variable, such as a common factor, so a common factor is not the right model.\index{factor analysis}
-
-```{r correlationMatrix2, out.width = "100%", fig.align = "center", fig.cap = "Example Correlation Matrix 2.", echo = FALSE}
-knitr::include_graphics("./Images/correlationMatrix2.png")
-```
-
-Consider another correlation matrix in Figure \@ref(fig:correlationMatrix3).\index{factor analysis}
-
-```{r correlationMatrix3, out.width = "100%", fig.align = "center", fig.cap = "Example Correlation Matrix 3.", echo = FALSE}
-knitr::include_graphics("./Images/correlationMatrix3.png")
-```
-
-If you try to fit a single factor to this correlation matrix, it generates the factor model depicted in Figure \@ref(fig:factorAnalysis2).\index{factor analysis}
-In this model, the first three variables have a factor loading of .77, but the remaining three variables have a factor loading of zero.\index{factor analysis}
-This indicates that the three remaining variables likely do not share a common factor with the first three variables.\index{factor analysis}
-
-```{r factorAnalysis2, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Multidimensional Model.", echo = FALSE}
-knitr::include_graphics("./Images/FactorAnalysis-02.png")
-```
-
-Therefore, a one-factor model is probably not correct; instead, the structure of the data is probably best represented by a two-factor model, as depicted in Figure \@ref(fig:factorAnalysis3).\index{factor analysis}
-In the model, Factor 1 explains why measures 1, 2, and 3 are correlated, whereas Factor 2 explains why measures 4, 5, and 6 are correlated.\index{factor analysis}
-A two-factor model thus explains why measures 1, 2, and 3 are not correlated with measures 4, 5, and 6.\index{factor analysis}
-In this model, each latent factor accounts for 60% of the variance
in the indicators that load onto it: $\frac{.77^2 + .77^2 + .77^2}{3} = \frac{.60 + .60 + .60}{3} = .60$.\index{factor analysis} -Each latent factor accounts for 30% of the variance in all of the indicators: $\frac{.77^2 + .77^2 + .77^2 + 0^2 + 0^2 + 0^2}{6} = \frac{.60 + .60 + .60 + 0 + 0 + 0}{6} = .30$.\index{factor analysis} - -```{r factorAnalysis3, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Two-Factor Model With Uncorrelated Factors.", echo = FALSE} -knitr::include_graphics("./Images/FactorAnalysis-03.png") -``` - -Consider another correlation matrix in Figure \@ref(fig:correlationMatrix4).\index{factor analysis} - -```{r correlationMatrix4, out.width = "100%", fig.align = "center", fig.cap = "Example Correlation Matrix 4.", echo = FALSE} -knitr::include_graphics("./Images/correlationMatrix4.png") -``` - -One way to model these data is depicted in Figure \@ref(fig:factorAnalysis4).\index{factor analysis} -In this model, the factor loadings are .77, the residual error terms are .40, and there is a covariance path of .50 for the association between Factor 1 and Factor 2.\index{factor analysis} -Going from the model to the correlation matrix is deterministic.\index{factor analysis} -If you know the model, you can calculate the correlation matrix.\index{factor analysis} -For instance, using path tracing rules (described in Section \@ref(ctt)), the correlation of measures within a factor in this model is calculated as: $0.60 = .77 \times .77$.\index{factor analysis}\index{path analysis!path tracing rules} -Using path tracing rules, the correlation of measures across factors in this model is calculated as: $.30 = .77 \times .50 \times .77$.\index{factor analysis}\index{path analysis!path tracing rules} -In this model, each latent factor accounts for 60% of the variance in the indicators that load onto it: $\frac{.77^2 + .77^2 + .77^2}{3} = \frac{.60 + .60 + .60}{3} = .60$.\index{factor analysis} -Each latent factor accounts for 37% of the variance in all of the indicators: $\frac{.77^2 + .77^2 + .77^2 + (.50^2 \times .77^2) + (.50^2 \times .77^2) + (.50^2 \times .77^2)}{6} = \frac{.60 + .60 + .60 + .15 + .15 + .15}{6} = .37$.\index{factor analysis} - -```{r factorAnalysis4, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Two-Factor Model With Correlated Factors.", echo = FALSE} -knitr::include_graphics("./Images/FactorAnalysis-04.png") -``` - -Although going from the model to the correlation matrix is deterministic, going from the correlation matrix to the model is not deterministic.\index{factor analysis} -If you know the correlation matrix, there may be many possible models.\index{factor analysis} -For instance, the model could also be the one depicted in Figure \@ref(fig:factorAnalysis5), with factor loadings of .77, residual error terms of .40, a regression path of .50, and a disturbance term of .75.\index{factor analysis} -The proportion of variance in Factor 2 that is explained by Factor 1 is calculated as: $.25 = .50 \times .50$.\index{factor analysis} -The disturbance term is calculated as $.75 = 1 - (.50 \times .50) = 1 - .25$.\index{factor analysis} -In this model, each latent factor accounts for 60% of the variance in the indicators that load onto it: $\frac{.77^2 + .77^2 + .77^2}{3} = \frac{.60 + .60 + .60}{3} = .60$.\index{factor analysis} -Factor 1 accounts for 15% of the variance in the indicators that load onto Factor 2: $\frac{(.50^2 \times .77^2) + (.50^2 \times .77^2) + 
(.50^2 \times .77^2)}{3} = \frac{.15 + .15 + .15}{3} = .15$.\index{factor analysis} -This model has the exact same fit as the previous model, but it has different implications.\index{factor analysis} -Unlike the previous model, in this model, there is a "causal" pathway from Factor 1 to Factor 2.\index{factor analysis} -However, the causal effect of Factor 1 does not account for all of the variance in Factor 2 because the correlation is only .50.\index{factor analysis} - -```{r factorAnalysis5, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Two-Factor Model With Regression Path.", echo = FALSE} -knitr::include_graphics("./Images/FactorAnalysis-05.png") -``` - -Alternatively, something else (e.g., another factor) could be explaining the data that we have not considered, as depicted in Figure \@ref(fig:factorAnalysis6).\index{factor analysis} -This is a [higher-order factor model](#higherOrderModel), in which there is a higher-order factor ($A_1$) that influences both lower-order factors, Factor 1 ($F_1$) and Factor 2 ($F_2$).\index{factor analysis}\index{factor analysis!higher-order} -The factor loadings from the lower order factors to the manifest variables are .77, the factor loading from the higher-order factor to the lower-order factors is .71, and the residual error terms are .40.\index{factor analysis} -This model has the exact same fit as the previous models.\index{factor analysis} -The proportion of variance in a lower-order factor ($F_1$ or $F_2$) that is explained by the higher-order factor ($A_1$) is calculated as: $.50 = .71 \times .71$.\index{factor analysis} -The disturbance term is calculated as $.50 = 1 - (.71 \times .71) = 1 - .50$.\index{factor analysis} -Using path tracing rules, the correlation of measures across factors in this model is calculated as: $.30 = .77 \times .71 \times .71 \times .77$.\index{factor analysis}\index{path analysis!path tracing rules} -In this model, the higher-order factor ($A_1$) accounts for 30% of the variance in the indicators: $\frac{(.77^2 \times .71^2) + (.77^2 \times .71^2) + (.77^2 \times .71^2) + (.77^2 \times .71^2) + (.77^2 \times .71^2) + (.77^2 \times .71^2)}{6} = \frac{.30 + .30 + .30 + .30 + .30 + .30}{6} = .30$.\index{factor analysis} - -```{r factorAnalysis6, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Higher-Order Factor Model.", echo = FALSE} -knitr::include_graphics("./Images/FactorAnalysis-06.png") -``` - -Alternatively, there could be a single factor that ties measures 1, 2, and 3 together and measures 4, 5, and 6 together, as depicted in Figure \@ref(fig:factorAnalysis7).\index{factor analysis} -In this model, the measures no longer have merely [random error](#randomError): measures 1, 2, and 3 have correlated residuals—that is, they share error variance (i.e., [systematic error](#systematicError)); likewise, measures 4, 5, and 6 have correlated residuals.\index{structural equation modeling!residual!correlated}\index{factor analysis}\index{measurement error!random error}\index{measurement error!systematic error} -This model has the exact same fit as the previous models.\index{factor analysis} -The amount of common variance ($R^2$) that is accounted for by an indicator is estimated as the square of the standardized loading: $.30 = .55 \times .55$.\index{factor analysis}\index{structural equation modeling!factor loading} -The amount of error for an indicator is estimated as: $\text{error} = 1 - \text{common variance}$, so 
in this case, the amount of error is: $.70 = 1 - .30$.\index{factor analysis}\index{structural equation modeling!residual}
-Using path tracing rules, the correlation of measures within a factor in this model is calculated as: $.60 = (.55 \times .55) + (.70 \times .43 \times .70) + (.70 \times .43 \times .43 \times .70)$.\index{factor analysis}\index{path analysis!path tracing rules}
-The correlation of measures across factors in this model is calculated as: $.30 = .55 \times .55$.\index{factor analysis}\index{path analysis!path tracing rules}
-In this model, the latent factor accounts for 30% of the variance in the indicators: $\frac{.55^2 + .55^2 + .55^2 + .55^2 + .55^2 + .55^2}{6} = \frac{.30 + .30 + .30 + .30 + .30 + .30}{6} = .30$.\index{factor analysis}
-
-```{r factorAnalysis7, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Unidimensional Model With Correlated Residuals.", echo = FALSE}
-knitr::include_graphics("./Images/FactorAnalysis-07.png")
-```
-
-Alternatively, there could be a single factor that influences measures 1, 2, 3, 4, 5, and 6 in addition to a [method bias](#methodBias) factor (e.g., a particular measurement method, item stem, reverse-worded item, or another [method bias](#methodBias)) that influences measures 4, 5, and 6 equally, as depicted in Figure \@ref(fig:factorAnalysis10).\index{factor analysis}\index{method bias}
-In this model, measures 4, 5, and 6 have cross-loadings—that is, they load onto more than one latent factor.\index{factor analysis}\index{cross-loading}
-This model has the exact same fit as the previous models.\index{factor analysis}
-The amount of common variance ($R^2$) that is accounted for by an indicator is estimated as the sum of the squared standardized loadings: $.60 = .77 \times .77 = (.39 \times .39) + (.67 \times .67)$.\index{factor analysis}\index{structural equation modeling!factor loading}
-The amount of error for an indicator is estimated as: $\text{error} = 1 - \text{common variance}$, so in this case, the amount of error is: $.40 = 1 - .60$.\index{factor analysis}\index{structural equation modeling!residual}
-Using path tracing rules, the correlation of measures within a factor in this model is calculated as: $.60 = (.77 \times .77) = (.39 \times .39) + (.67 \times .67)$.\index{factor analysis}\index{path analysis!path tracing rules}
-The correlation of measures across factors in this model is calculated as: $.30 = .77 \times .39$.\index{factor analysis}\index{path analysis!path tracing rules}
-In this model, the first latent factor accounts for 37% of the variance in the indicators: $\frac{.77^2 + .77^2 + .77^2 + .39^2 + .39^2 + .39^2}{6} = \frac{.59 + .59 + .59 + .15 + .15 + .15}{6} = .37$.\index{factor analysis}
-The second latent factor accounts for 45% of the variance in its indicators: $\frac{.67^2 + .67^2 + .67^2}{3} = \frac{.45 + .45 + .45}{3} = .45$.\index{factor analysis}
-
-```{r factorAnalysis10, out.width = "100%", fig.align = "center", fig.cap = "Example Confirmatory Factor Analysis Model: Two-Factor Model With Cross-Loadings.", echo = FALSE}
-knitr::include_graphics("./Images/FactorAnalysis-10.png")
-```
-
-### Indeterminacy {#indeterminacy}
-
-There could be many more models that have the same fit to the data.\index{factor analysis!indeterminacy}
-Thus, [factor analysis](#factorAnalysis) has *indeterminacy* because all of these models can explain these same data equally well, even though they have different theoretical meanings.\index{factor analysis!indeterminacy}
-The
goal of [factor analysis](#factorAnalysis) is to look at the data and induce the model that underlies them.\index{factor analysis}
-However, most data matrices in real life are very complicated—much more complicated than in these examples.\index{factor analysis}
-
-This is why we do not calculate our own [factor analysis](#factorAnalysis) by hand; use a stats program!\index{factor analysis}
-It is important to think about the possibility of other models to determine how confident you can be in your data model.\index{factor analysis}
-For every fully specified factor model (i.e., one where the relevant paths are all defined), there is one and only one predicted data matrix (correlation matrix).\index{factor analysis}
-However, each data matrix can produce many different factor models.\index{factor analysis!indeterminacy}
-There is no way to distinguish which of these factor models is correct from the data matrix alone.\index{factor analysis!indeterminacy}
-Any given data matrix is consistent with an infinite number of factor models that accurately represent the data structure [@Raykov2001a]—so we make decisions that determine what type of factor solution our data will yield.\index{factor analysis!indeterminacy}
-
-Many models could explain your data, and there are many more models that do not explain the data.\index{factor analysis!indeterminacy}
-For equally good-fitting models, decide based on interpretability.\index{factor analysis}
-If you have strong theory, decide based on theory and things outside of [factor analysis](#factorAnalysis)!\index{factor analysis}\index{theory}
-
-### Practical Considerations {#practicalConsiderations-factorAnalysis}
-
-There are important considerations for doing [factor analysis](#factorAnalysis) in real life with complex data.\index{factor analysis!practical considerations}
-Traditionally, researchers had to consider what kind of data they have, and they often assumed interval-level data even though data in psychology are often not interval data.\index{factor analysis!practical considerations}\index{data!interval}
-In the past, [factor analysis](#factorAnalysis) was not good with categorical and dichotomous (e.g., True/False) data because the variance then is largely determined by the mean.\index{factor analysis!practical considerations}\index{data!nominal}\index{data!dichotomous}
-So, we need something more complicated for dichotomous data.\index{factor analysis!practical considerations}\index{data!nominal}\index{data!dichotomous}
-More solutions are available now for [factor analysis](#factorAnalysis) with ordinal and dichotomous data, but it is generally best to have at least four ordered categories to perform [factor analysis](#factorAnalysis).\index{factor analysis!practical considerations}\index{data!nominal}\index{data!dichotomous}\index{data!ordinal}\index{data!polytomous}
-
-The necessary sample size depends on the complexity of the true factor structure.\index{factor analysis!practical considerations}
-If there is a strong single factor for 30 items, then $N = 50$ is plenty.\index{factor analysis!practical considerations}
-But if there are five factors and some correlated errors, then the sample size will need to be closer to ~5,000.\index{structural equation modeling!residual!correlated}\index{factor analysis!practical considerations}
-[Factor analysis](#factorAnalysis) can recover the truth when the world is simple.\index{parsimony}\index{factor analysis}
-However, nature is often not simple, and factor analysis may end up distorting nature instead of revealing nature
itself.\index{factor analysis} - -Recommendations for [factor analysis](#factorAnalysis) are described by @Sellbom2019a.\index{factor analysis} - -### Decisions to Make in Factor Analysis {#factorAnalysisDecisions} - -There are many decisions to make in [factor analysis](#factorAnalysis).\index{factor analysis!decisions} -These decisions can have important impacts on the resulting solution.\index{factor analysis!decisions} -Decisions include things such as:\index{factor analysis!decisions} - -1. What variables to include in the model and how to scale them\index{factor analysis!decisions} -1. Method of factor extraction: [factor analysis](#factorAnalysis) or [PCA](#pca)\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -1. If [factor analysis](#factorAnalysis), the kind of [factor analysis](#factorAnalysis): [EFA](#efa) or [CFA](#cfa)\index{factor analysis!decisions}\index{factor analysis!confirmatory}\index{factor analysis!exploratory} -1. How many factors to retain\index{factor analysis} -1. If [EFA](#efa) or [PCA](#pca), whether and how to rotate factors (factor rotation)\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{principal component analysis} -1. Model selection and interpretation\index{factor analysis!decisions} - -#### 1. Variables to Include and their Scaling {#variablesToInclude-factorAnalysis} - -The first decision when conducting a [factor analysis](#factorAnalysis) is which variables to include and the scaling of those variables.\index{factor analysis!decisions} -What factors (or components) you extract can differ widely depending on what variables you include in the analysis.\index{factor analysis!decisions} -For example, if you include many variables from the same source (e.g., self-report), it is possible that you will extract a factor that represents the common variance among the variables from that source (i.e., the self-reported variables).\index{factor analysis!decisions}\index{factor analysis!method factor} -This would be considered a method factor, which works against the goal of estimating latent factors that represent the constructs of interest (as opposed to the measurement methods used to estimate those constructs).\index{factor analysis!decisions}\index{factor analysis!method factor} -An additional consideration is the scaling of the variables—whether to use the raw scaling or whether to standardize them to be on a more common metric (e.g., z-score metric with a mean of zero and standard deviation of one).\index{factor analysis!decisions} - -#### 2. 
Method of Factor Extraction {#methodOfFactorExtraction}
-
-The second decision is to select the method of factor extraction.\index{factor analysis!decisions}
-This is the algorithm that is going to try to identify factors.\index{factor analysis!decisions}
-There are two main families of factor or component extraction: analytic or principal components.\index{factor analysis!decisions}\index{principal component analysis}
-The principal components approach is called [principal component analysis](#pca) (PCA).\index{factor analysis!decisions}\index{principal component analysis}
-[PCA](#pca) is not really a form of [factor analysis](#factorAnalysis); rather, it is useful for data reduction [@Lilienfeld2015].\index{factor analysis!decisions}\index{principal component analysis}\index{data!reduction}
-The analytic family includes [factor analysis](#factorAnalysis) approaches such as principal axis factoring and maximum likelihood [factor analysis](#factorAnalysis).\index{factor analysis!decisions}\index{factor analysis}
-The distinction between [factor analysis](#factorAnalysis) and [PCA](#pca) is depicted in Figure \@ref(fig:factorAnalysisPCA).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-
-```{r factorAnalysisPCA, out.width = "100%", fig.align = "center", fig.cap = "Distinction Between Factor Analysis and Principal Component Analysis.", echo = FALSE}
-knitr::include_graphics("./Images/factorAnalysisPCA.png")
-```
-
-##### Principal Component Analysis {#pca}
-
-Principal component analysis (PCA) is used if you want to reduce your data matrix.\index{factor analysis!decisions}\index{principal component analysis}
-PCA composites represent the variances of the observed measures in as economical a fashion as possible, with no underlying latent variables.\index{factor analysis!decisions}\index{principal component analysis}
-The goal of PCA is to identify a smaller number of components that explain as much variance in a set of variables as possible.\index{factor analysis!decisions}\index{principal component analysis}\index{data!reduction}
-It is an atheoretical way to decompose a matrix.\index{factor analysis!decisions}\index{principal component analysis}
-PCA involves decomposition of a data matrix into a set of eigenvectors, which are transformations of the old variables.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
-
-The eigenvectors attempt to simplify the data in the matrix.\index{parsimony}\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
-PCA takes the data matrix and identifies the weighted sum of all variables that does the best job at explaining variance: these are the principal components, also called eigenvectors.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
-Principal components reflect optimally weighted sums.\index{factor analysis!decisions}\index{principal component analysis}
-In this way, PCA is a [formative model](#formativeConstruct) (by contrast, [factor analysis](#factorAnalysis) applies a [reflective model](#reflectiveConstruct)).\index{factor analysis!decisions}\index{principal component analysis}\index{construct!formative}\index{factor analysis}\index{construct!reflective}
-
-PCA decomposes the data matrix into any number of components—as many as the number of variables, which will always account for all variance.\index{factor analysis!decisions}\index{principal component analysis}
-After the model is fit, you can
look at the results and discard the components which likely reflect error variance.\index{factor analysis!decisions}\index{principal component analysis}
-Judgments about which factors to retain are based on empirical criteria in conjunction with theory to select a parsimonious number of components that account for the majority of variance.\index{factor analysis!decisions}\index{principal component analysis}
-
-The eigenvalue reflects the amount of variance explained by the component (eigenvector).\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvalue}
-When using a varimax (orthogonal) rotation, an eigenvalue for a component is calculated as the sum of squared standardized component loadings on that component.\index{factor analysis!decisions}\index{principal component analysis}\index{orthogonal rotation}\index{eigenvalue}
-When using oblique rotation, however, the items explain more variance than is attributable to their factor loadings because the factors are correlated.\index{factor analysis!decisions}\index{principal component analysis}\index{oblique rotation}
-
-PCA pulls the first principal component out (i.e., the eigenvector that explains the most variance) and makes a new data matrix: i.e., a new correlation matrix.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
-Then PCA pulls out the component that explains the next most variance (i.e., the eigenvector with the next largest eigenvalue), and it does this for all components, up to as many components as there are variables.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}\index{eigenvalue}
-For instance, if there are six variables, it will iteratively extract additional components up to six components.\index{factor analysis!decisions}\index{principal component analysis}
-You can extract as many eigenvectors as there are variables.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvector}
-If you extract all six components, the data matrix left over will be the same as the correlation matrix in Figure \@ref(fig:correlationMatrix2).\index{factor analysis!decisions}\index{principal component analysis}
-That is, the residual correlations among the variables will all be zero, because six components explain 100% of the variance from six variables.\index{factor analysis!decisions}\index{principal component analysis}
-In other words, you can explain six variables with six new things!\index{factor analysis!decisions}\index{principal component analysis}
-
-However, it does no good if you have to use all six components because there is no data reduction from the original number of variables, but hopefully the first few components will explain most of the variance.\index{factor analysis!decisions}\index{principal component analysis}\index{data!reduction}
-
-The sum of all eigenvalues is equal to the number of variables in the analysis.\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvalue}
-PCA does not have the same assumptions as [factor analysis](#factorAnalysis), which assumes that measures are partly from common variance and error.\index{factor analysis!decisions}\index{principal component analysis}\index{factor analysis}\index{measurement error}\index{structural equation modeling!common variance}
-But if you estimate six eigenvectors and only keep two, the model is a two-component model and whatever is left becomes error.\index{factor
analysis!decisions}\index{principal component analysis}\index{measurement error}\index{eigenvector} -Therefore, PCA does not have the same assumptions as [factor analysis](#factorAnalysis), but it often ends up in the same place.\index{factor analysis!decisions}\index{principal component analysis}\index{factor analysis} - -Most people who want to conduct a [factor analysis](#factorAnalysis) use PCA, but PCA is not really [factor analysis](#factorAnalysis) [@Lilienfeld2015].\index{factor analysis!decisions}\index{principal component analysis}\index{factor analysis} -PCA is what SPSS can do quickly.\index{factor analysis!decisions}\index{principal component analysis} -But computers are so fast now—just do a real [factor analysis](#factorAnalysis)!\index{factor analysis!decisions}\index{factor analysis} -[Factor analysis](#factorAnalysis) better handles error than PCA—[factor analysis](#factorAnalysis) assumes that what is in the variable is the combination of common construct variance and error.\index{factor analysis!decisions}\index{principal component analysis}\index{factor analysis} -By contrast, PCA assumes that the measures have no measurement error.\index{factor analysis!decisions}\index{principal component analysis}\index{measurement error} - -##### Factor Analysis {#factorAnalysis} - -Factor analysis is an analytic approach to factor extraction.\index{factor analysis!decisions}\index{factor analysis} -Factor analysis is a special case of [structural equation modeling](#sem) (SEM).\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling} -Factor analysis is an analytic technique that is interested in the factor structure of a measure or set of measures.\index{factor analysis!decisions}\index{factor analysis} -Factor analysis is a theoretical approach that considers that there are latent theoretical constructs that influence the scores on particular variables.\index{factor analysis!decisions}\index{factor analysis}\index{latent variable} -It assumes that part of the explanation for each variable is shared between variables, and that part of it is unique variance.\index{factor analysis!decisions}\index{factor analysis}\index{measurement error}\index{structural equation modeling!common variance} -The unique variance is considered error.\index{factor analysis!decisions}\index{factor analysis}\index{measurement error} -The common variance is called the communality, which is the factor variance.\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling!common variance}\index{communality} -Communality of a factor is estimated using the [average variance extracted](#averageVarianceExtracted) (AVE).\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling!common variance}\index{communality}\index{reliability!internal consistency!average variance extracted} -The amount of variance due to error is: $1 - \text{communality}$.\index{factor analysis!decisions}\index{factor analysis}\index{measurement error}\index{structural equation modeling!common variance}\index{communality} -There are several types of factor analysis, including principal axis factoring and maximum likelihood factor analysis.\index{factor analysis!decisions}\index{factor analysis} - -Factor analysis can be used to test [measurement/factorial invariance](#measurementInvariance) and for [multitrait-multimethod](#MTMM) designs.\index{factor analysis!decisions}\index{factor analysis}\index{multitrait-multimethod matrix} -One example 
of a [MTMM](#MTMM) model in factor analysis is the correlated traits correlated methods model [@Tackett2019b].\index{factor analysis!decisions}\index{factor analysis}\index{multitrait-multimethod matrix} - -There are several differences between (real) factor analysis versus [PCA](#pca).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -Factor analysis has greater sophistication than [PCA](#pca), but greater sophistication often results in greater assumptions.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -Factor analysis does not always work; the data may not always fit to a factor analysis model; therefore, use [PCA](#pca) as a second/last option.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -[PCA](#pca) can decompose any data matrix; it always works.\index{factor analysis!decisions}\index{principal component analysis} -[PCA](#pca) is okay if you are not interested in the factor structure.\index{factor analysis!decisions}\index{principal component analysis} -[PCA](#pca) uses all variance of variables and assumes variables have no error, so it does not account for measurement error.\index{factor analysis!decisions}\index{principal component analysis}\index{measurement error} -[PCA](#pca) is good if you just want to form a linear composite and if the causal structure is [formative](#formativeConstruct) (rather than [reflective](#reflectiveConstruct)).\index{factor analysis!decisions}\index{principal component analysis}\index{linear composite}\index{construct!formative}\index{construct!reflective} -However, if you are interested in the factor structure, use factor analysis, which estimates a latent variable that accounts for the common variance and discards error variance.\index{factor analysis!decisions}\index{factor analysis} -Factor analysis is useful for the identification of latent constructs—i.e., underlying dimensions or factors that explain (cause) scores on items.\index{factor analysis!decisions}\index{factor analysis} - -#### 3. 
EFA or CFA {#efa-cfa} - -A third decision is the kind of [factor analysis](#factorAnalysis) to use: [exploratory factor analysis](#efa) (EFA) or [confirmatory factor analysis](#cfa) (CFA).\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory} - -##### Exploratory Factor Analysis (EFA) {#efa} - -Exploratory factor analysis (EFA) is used if you have no a priori hypotheses about the factor structure of the model, but you would like to understand the latent variables represented by your items.\index{factor analysis!decisions}\index{factor analysis!exploratory} - -EFA is partly induced from the data.\index{factor analysis!decisions}\index{factor analysis!exploratory} -You feed in the data and let the program build the factor model.\index{factor analysis!decisions}\index{factor analysis!exploratory} -You can set some parameters going in, including how to extract or rotate the factors.\index{factor analysis!decisions}\index{factor analysis!exploratory} -The factors are extracted from the data without specifying the number and pattern of loadings between the items and the latent factors [@Bollen2002].\index{factor analysis!decisions}\index{factor analysis!exploratory} -All cross-loadings are freely estimated.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!exploratory} - -##### Confirmatory Factor Analysis (CFA) {#cfa} - -Confirmatory factor analysis (CFA) is used to confirm a priori hypotheses about the factor structure of the model.\index{factor analysis!decisions}\index{factor analysis!confirmatory} -CFA is a test of the hypothesis.\index{factor analysis!decisions}\index{factor analysis!confirmatory} -In CFA, you specify the model and ask how well this model represents the data.\index{factor analysis!decisions}\index{factor analysis!confirmatory} -The researcher specifies the number, meaning, associations, and pattern of free parameters in the factor loading matrix [@Bollen2002].\index{factor analysis!decisions}\index{factor analysis!confirmatory} -A key advantage of CFA is the ability to directly compare alternative models (i.e., factor structures), which is valuable for theory testing [@Strauss2009].\index{factor analysis!decisions}\index{factor analysis!confirmatory} -For instance, you could use [CFA](#cfa) to test whether the variance in several measures' scores is best explained with one factor or two factors.\index{factor analysis!confirmatory} -In CFA, cross-loadings are not estimated unless the researcher specifies them.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!confirmatory} - -##### Exploratory Structural Equation Modeling (ESEM) {#efa-cfa-esem} - -In real life, there is not a clear distinction between [EFA](#efa) and [CFA](#cfa).\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory} -In [CFA](#cfa), researchers often set only half of the constraints, and let the data fill in the rest.\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory} -In [EFA](#efa), researchers often set constraints and assumptions. 
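
To make the distinction concrete, the sketch below shows what each approach might look like in R, using the psych package for EFA and lavaan for CFA; the data frame `items` and the indicator names x1 through x6 are hypothetical placeholders rather than data from this chapter.

```{r efaCfaSketch, eval = FALSE}
library(psych)
library(lavaan)

# `items` is a hypothetical data frame of observed indicators (x1-x6)

# EFA: extract two factors without specifying which items load where;
# all cross-loadings are freely estimated
efaFit <- fa(items, nfactors = 2, fm = "ml", rotate = "oblimin")
print(efaFit$loadings, cutoff = .30)

# CFA: specify rival factor structures a priori and compare them directly
oneFactorModel <- 'F1 =~ x1 + x2 + x3 + x4 + x5 + x6'

twoFactorModel <- '
 F1 =~ x1 + x2 + x3
 F2 =~ x4 + x5 + x6
'

cfaFit1 <- cfa(oneFactorModel, data = items)
cfaFit2 <- cfa(twoFactorModel, data = items)

anova(cfaFit1, cfaFit2) # chi-square difference test between the nested models
```

Even in this simple sketch, the exploratory analysis requires confirmatory-style decisions (how many factors to extract, which rotation to use), and the confirmatory analysis involves exploratory ones (which rival models to compare).
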
-Thus, the line between [EFA](#efa) and [CFA](#cfa) is often blurred.\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory} - -[EFA](#efa) and [CFA](#cfa) can be considered special cases of exploratory structural equation modeling (ESEM), which combines features of [EFA](#efa), [CFA](#cfa), and [SEM](#sem) [@Marsh2014].\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory} -ESEM can include any combination of exploratory (i.e., [EFA](#efa)) and confirmatory ([CFA](#cfa)) factors.\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory} -ESEM, unlike traditional [CFA](#cfa) models, typically estimates all cross-loadings—at least for the exploratory factors.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory} -If a [CFA](#cfa) model without cross-loadings and correlated residuals fits as well as an ESEM model with all cross-loadings, the [CFA](#cfa) model should be retained for its simplicity.\index{cross-loading}\index{structural equation modeling!residual!correlated}\index{parsimony}\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory} -However, ESEM models often fit better than [CFA](#cfa) models because requiring no cross-loadings is an unrealistic expectation of items from many psychological instruments [@Marsh2014].\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory} -The correlations between factors tend to be positively biased when fitting [CFA](#cfa) models without cross-loadings, which leads to challenges in using [CFA](#cfa) to establish [discriminant validity](#discriminantValidity) [@Marsh2014].\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}\index{validity!discriminant} -Thus, compared to [CFA](#cfa), ESEM has the potential to more accurately estimate factor correlations and establish [discriminant validity](#discriminantValidity) [@Marsh2014].\index{factor analysis!decisions}\index{factor analysis!exploratory}\index{factor analysis!confirmatory}\index{structural equation modeling!exploratory}\index{validity!discriminant} -Moreover, ESEM can be useful in a [multitrait-multimethod](#MTMM) framework.\index{factor analysis!decisions}\index{structural equation modeling!exploratory}\index{multitrait-multimethod matrix} -We provide examples of ESEM in Section \@ref(esemModel).\index{factor analysis!decisions}\index{structural equation modeling!exploratory} - -#### 4. 
How Many Factors to Retain {#factorsToRetain} - -A goal of [factor analysis](#factorAnalysis) and [PCA](#pca) is simplification or parsimony, while still explaining as much variance as possible.\index{parsimony}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -The hope is that you can have fewer factors that explain the associations between the variables than the number of observed variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -But how do you decide on the number of factors (in [factor analysis](#factorAnalysis)) or components (in [PCA](#pca))?\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} - -There are a number of criteria that one can use to help determine how many factors/components to keep:\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} - -- Kaiser-Guttman criterion: in [PCA](#pca), components with eigenvalues greater than 1\index{factor analysis!decisions}\index{principal component analysis}\index{eigenvalue} - - or, for [factor analysis](#factorAnalysis), factors with eigenvalues greater than zero\index{factor analysis!decisions}\index{factor analysis}\index{eigenvalue} -- Cattell's scree test: the "elbow" in a scree plot minus one; sometimes operationalized with optimal coordinates (OC) or the acceleration factor (AF)\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot} -- Parallel analysis: factors that explain more variance than randomly simulated data\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{parallel analysis} -- [Very simple structure (VSS)](#vssPlot) criterion: larger is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -- Velicer's minimum average partial (MAP) test: smaller is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -- Akaike information criterion (AIC): smaller is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -- Bayesian information criterion (BIC): smaller is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -- Sample size-adjusted BIC (SABIC): smaller is better\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -- Root mean square error of approximation (RMSEA): smaller is better\index{factor analysis!decisions}\index{factor analysis} -- Chi-square difference test: smaller is better; a significant test indicates that the more complex model is significantly better fitting than the less complex model\index{factor analysis!decisions}\index{factor analysis} -- Standardized root mean square residual (SRMR): smaller is better\index{factor analysis!decisions}\index{factor analysis} -- Comparative Fit Index (CFI): larger is better\index{factor analysis!decisions}\index{factor analysis} -- Tucker Lewis Index (TLI): larger is better\index{factor analysis!decisions}\index{factor analysis} - -There is not necessarily a "correct" criterion to use in determining how many factors to keep, so it is generally recommended that researchers use multiple criteria in combination with theory and interpretability.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} - -A scree plot from a [factor 
analysis](#factorAnalysis) or [PCA](#pca) provides lots of information.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot} -A scree plot has the factor number on the x-axis and the eigenvalue on the y-axis.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}\index{eigenvalue} -The eigenvalue is the variance accounted for by a factor; when using a varimax (orthogonal) rotation, an eigenvalue (or factor variance) is calculated as the sum of squared standardized factor (or component) loadings on that factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}\index{eigenvalue} -An example of a scree plot is in Figure \@ref(fig:screePlot).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot} - -```{r screePlot, out.width = "100%", fig.align = "center", fig.cap = "Example of a Scree Plot.", echo = FALSE} -knitr::include_graphics("./Images/screePlot.png") -``` - -The total variance is equal to the number of variables you have, so one eigenvalue is approximately one variable's worth of variance.\index{eigenvalue}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -In a [factor analysis](#factorAnalysis) and [PCA](#pca), the first factor (or component) accounts for the most variance, the second factor accounts for the second-most variance, and so on.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -The more factors you add, the less variance is explained by the additional factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} - -One criterion for how many factors to keep is the Kaiser-Guttman criterion. 
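
Several of these retention criteria can be computed directly in R with the psych package; the sketch below is illustrative only, and the data frame `items` is a hypothetical placeholder for a set of observed variables.

```{r factorRetentionSketch, eval = FALSE}
library(psych)

# `items` is a hypothetical data frame of observed variables

# Eigenvalues of the correlation matrix (Kaiser-Guttman criterion: keep
# components with eigenvalues greater than 1)
eigen(cor(items))$values

# Scree plot and parallel analysis: compares observed eigenvalues to those
# obtained from random data
fa.parallel(items, fm = "ml", fa = "both")

# Very simple structure (VSS) criterion and Velicer's MAP test
vss(items, n = 4, rotate = "oblimin", fm = "ml")
```

The `fa.parallel()` output also reports the number of factors and components suggested by parallel analysis.
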
-According to the Kaiser-Guttman criterion, you should keep any factors whose eigenvalue is greater than 1.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -That is, for the sake of simplicity, parsimony, and data reduction, you should take any factors that explain more than a single variable would explain.\index{parsimony}\index{data!reduction}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -According to the Kaiser-Guttman criterion, we would keep three factors from Figure \@ref(fig:screePlot) that have eigenvalues greater than 1.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue} -The default in SPSS is to retain factors with eigenvalues greater than 1.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue} -However, keeping factors whose eigenvalue is greater than 1 is not the most correct rule.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue} -If you let SPSS do this, you may get many factors with eigenvalues around 1 (e.g., factors with an eigenvalue ~ 1.0001) that are not adding so much that it is worth the added complexity.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue} -The Kaiser-Guttman criterion usually results in keeping too many factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis} -Factors with small eigenvalues around 1 could reflect error shared across variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue} -For instance, factors with small eigenvalues could reflect method variance (i.e., method factor), such as a self-report factor that turns up as a factor in [factor analysis](#factorAnalysis), but that may be useless to you as a conceptual factor of a construct of interest.\index{factor analysis!method factor}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue} - -Another criterion is Cattell's scree test, which involves selecting the number of factors from looking at the scree plot.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot} -"Scree" refers to the rubble of stones at the bottom of a mountain.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot} -According to Cattell's scree test, you should keep the factors before the last steep drop in eigenvalues—i.e., the factors before the rubble, where the slope approaches zero.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}\index{scree plot} -The beginning of the scree (or rubble), where the slope approaches zero, is called the "elbow" of a scree plot.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot} -Using Cattell's scree test, you retain the number of factors that explain the most variance prior to the explained variance drop-off, because, ultimately, you want to include only as many factors in which you gain substantially more by the inclusion of these factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot} -That is, you would keep the 
number of factors at the elbow of the scree plot minus one.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
-If the last steep drop occurs from Factor 4 to Factor 5 and the elbow is at Factor 5, we would keep four factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
-In Figure \@ref(fig:screePlot), the last steep drop in eigenvalues occurs from Factor 3 to Factor 4; the elbow of the scree plot occurs at Factor 4.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
-We would keep the number of factors at the elbow minus one.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
-Thus, using Cattell's scree test, we would keep three factors based on Figure \@ref(fig:screePlot).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
-
-There are more sophisticated ways of using a scree plot, but they usually end up at a similar decision.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
-Examples of more sophisticated tests include parallel analysis and [very simple structure (VSS) plots](#vssPlot).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{very simple structure plot}\index{parallel analysis}
-In a parallel analysis, you examine where the eigenvalues from observed data and random data converge, so you do not retain a factor that explains less variance than would be expected by random chance.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{parallel analysis}\index{eigenvalue}
-A parallel analysis can be helpful when you have many variables and one factor accounts for the majority of the variance such that the elbow is at Factor 2 (which would result in keeping one factor), but you have theoretical reasons to select more than one factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{parallel analysis}
-An example in which parallel analysis may be helpful is with neurophysiological data.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{parallel analysis}\index{neurophysiological assessment}
-For instance, parallel analysis can be helpful when conducting temporo-spatial [PCA](#pca) of event-related potential (ERP) data in which you want to separate multiple time windows and multiple spatial locations despite a predominant signal during a given time window and spatial location [@Dien2012].\index{factor analysis!decisions}\index{principal component analysis}\index{parallel analysis}\index{neurophysiological assessment}
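-
-As a rough sketch of how these retention criteria might be examined in `R` (assuming the `psych` package is installed, and using its built-in `bfi` personality items purely for illustration), you could inspect the eigenvalues, a scree plot, and a parallel analysis:
-
-```{r retentionCriteriaSketch, eval = FALSE}
-# Illustrative sketch (not run): examining how many factors to retain.
-# The bfi items that ship with the psych package are used purely for demonstration.
-library("psych")
-
-items <- bfi[, 1:25] # 25 personality items
-
-# Eigenvalues of the correlation matrix (Kaiser-Guttman criterion: eigenvalues > 1)
-correlationMatrix <- cor(items, use = "pairwise.complete.obs")
-eigenvalues <- eigen(correlationMatrix)$values
-sum(eigenvalues > 1)
-
-# Scree plot (look for the elbow; keep the factors before it)
-scree(items)
-
-# Parallel analysis: compare observed eigenvalues to those from random data
-fa.parallel(items, fa = "both")
-```
-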
-In general, my recommendation is to use Cattell's scree test, and then test the factor solutions with plus or minus one factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
-You should never accept [PCA](#pca) components with eigenvalues less than one (or factors with eigenvalues less than zero), because they are likely to be largely composed of error.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{eigenvalue}
-If you are using maximum likelihood [factor analysis](#factorAnalysis), you can compare the fit of various models with model fit criteria to see which model fits best for its parsimony.\index{parsimony}\index{factor analysis!decisions}\index{factor analysis}
-A model will always fit better when you add additional parameters or factors, so you examine whether there is *significant* improvement in model fit when adding the additional factor—that is, you keep adding complexity until additional complexity does not buy you much.\index{parsimony}\index{factor analysis!decisions}\index{factor analysis}
-Always try a factor solution with one factor fewer and one factor more than suggested by Cattell's scree test to check the robustness of your final solution, because the purpose of [factor analysis](#factorAnalysis) is to explain things and to have interpretability.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{scree plot}
-Even if all rules or indicators suggest keeping X factors, maybe $\pm$ one factor helps clarify things.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-Even though [factor analysis](#factorAnalysis) is empirical, theory and interpretability should also inform decisions.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
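-
-As a minimal sketch of the kind of fit comparison described above (again assuming the `psych` package, with the `bfi` items used purely for illustration), you could fit maximum likelihood solutions with differing numbers of factors and compare an information criterion such as BIC:
-
-```{r factorFitComparisonSketch, eval = FALSE}
-# Illustrative sketch (not run): comparing maximum likelihood factor solutions
-# with differing numbers of factors; lower BIC values reflect a better balance
-# of fit and parsimony.
-library("psych")
-
-items <- bfi[, 1:25]
-
-bicByNumberOfFactors <- sapply(1:6, function(numberOfFactors) {
-  fa(items, nfactors = numberOfFactors, fm = "ml")$BIC
-})
-
-bicByNumberOfFactors
-```
-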
-
-#### 5. Factor Rotation {#factorRotation}
-
-The next step, if using [EFA](#efa) or [PCA](#pca), is possibly to rotate the factors to make them more interpretable and simple, which is the whole goal.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-To interpret the results of a [factor analysis](#factorAnalysis), we examine the factor matrix.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-The columns refer to the different factors; the rows refer to the different observed variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-The cells in the table are the factor loadings—they are basically the correlation between the variable and the factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{structural equation modeling!factor loading}
-Our goal is to achieve a model with simple structure because it is easily interpretable.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
-*Simple structure* means that every variable loads perfectly on one and only one factor, as operationalized by a matrix of factor loadings with values of one and zero and nothing else.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
-An example of a factor matrix that follows simple structure is depicted in Figure \@ref(fig:simpleStructure).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
-
-```{r simpleStructure, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Matrix That Follows Simple Structure.", echo = FALSE}
-knitr::include_graphics("./Images/simpleStructure.png")
-```
-
-An example of a measurement model that follows simple structure is depicted in Figure \@ref(fig:factorSolutionSimpleStructure).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
-Each variable loads onto one and only one factor, which makes it easy to interpret the meaning of each factor, because a given factor represents the common variance among the items that load onto it.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-
-```{r factorSolutionSimpleStructure, out.width = "100%", fig.align = "center", fig.cap = "Example of a Measurement Model That Follows Simple Structure. 'INT' = internalizing problems; 'EXT' = externalizing problems; 'TD' = thought-disordered problems.", fig.scap = "Example of a Measurement Model That Follows Simple Structure.", echo = FALSE}
-knitr::include_graphics("./Images/factorSolutionSimpleStructure.png")
-```
-
-However, pure simple structure only occurs in simulations, not in real-life data.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{simple structure}
-In reality, our measurement model from an unrotated [factor analysis](#factorAnalysis) might look like the model in Figure \@ref(fig:factorSolutionUnrotatedExample).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-In this example, the measurement model does not show simple structure because the items have cross-loadings—that is, the items load onto more than one factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-loading}\index{simple structure}\index{structural equation modeling!measurement model}
-The cross-loadings make the factors difficult to interpret: because all of the items load onto all of the factors, the factors are not very distinct from each other, and it is unclear what each factor means.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-loading}\index{simple structure}\index{structural equation modeling!measurement model}
-
-```{r factorSolutionUnrotatedExample, out.width = "100%", fig.align = "center", fig.cap = "Example of a Measurement Model That Does Not Follow Simple Structure. 
'INT' = internalizing problems; 'EXT' = externalizing problems; 'TD' = thought-disordered problems.", fig.scap = "Example of a Measurement Model That Does Not Follow Simple Structure.", echo = FALSE}
-knitr::include_graphics("./Images/factorSolutionUnrotatedExample.png")
-```
-
-As a result of the challenges of interpretability caused by cross-loadings, factor rotations are often performed.\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-An example of an unrotated factor matrix is in Figure \@ref(fig:factorMatrix).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-
-```{r factorMatrix, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Matrix.", echo = FALSE}
-knitr::include_graphics("./Images/factorMatrix.png")
-```
-
-In the example factor matrix in Figure \@ref(fig:factorMatrix), the [factor analysis](#factorAnalysis) is not very helpful—it tells us very little because it did not distinguish between the two factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-The variables have similar loadings on Factor 1 and Factor 2.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-An example of an unrotated factor solution is in Figure \@ref(fig:factorSolutionUnrotated).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-In the figure, all of the variables are in the middle of the quadrants—they are not on the factors' axes.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-Thus, the factors are not very informative.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-
-```{r factorSolutionUnrotated, out.width = "100%", fig.align = "center", fig.cap = "Example of an Unrotated Factor Solution.", echo = FALSE}
-knitr::include_graphics("./Images/factorSolutionUnrotated.png")
-```
-
-As a result, to improve the interpretability of the [factor analysis](#factorAnalysis), we can do what is called rotation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-*Rotation* involves changing the orientation of the factors by changing the axes so that variables end up with very high (close to one or negative one) or very low (close to zero) loadings, so that it is clear which factors include which variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-That is, it tries to identify the ideal solution (factor) for each variable.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-It searches for simple structure and keeps searching until the rotation criterion converges on its optimal value.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}
-After rotation, if the rotation was successful in imposing simple structure, each factor will have loadings close to one (or negative one) for some variables and close to zero for other variables.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}
-The goal of factor rotation is to achieve simple structure, to help make it 
easier to interpret the meaning of the factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure} -To perform factor rotation, orthogonal rotations are often used.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation} -Orthogonal rotations make the rotated factors uncorrelated.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation} -An example of a commonly used orthogonal rotation is varimax rotation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation} - -An example of a factor matrix following an orthogonal rotation is depicted in Figure \@ref(fig:factorMatrixRotated).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation} -An example of a factor solution following an orthogonal rotation is depicted in Figure \@ref(fig:factorSolutionRotated).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation} - -```{r factorMatrixRotated, out.width = "100%", fig.align = "center", fig.cap = "Example of a Rotated Factor Matrix.", echo = FALSE} -knitr::include_graphics("./Images/factorMatrixRotated.png") -``` - -```{r factorSolutionRotated, out.width = "100%", fig.align = "center", fig.cap = "Example of a Rotated Factor Solution.", echo = FALSE} -knitr::include_graphics("./Images/factorSolutionRotated.png") -``` - -An example of a factor matrix from SPSS following an orthogonal rotation is depicted in Figure \@ref(fig:rotatedFactorMatrix).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation} - -```{r rotatedFactorMatrix, out.width = "100%", fig.align = "center", fig.cap = "Example of a Rotated Factor Matrix From SPSS.", echo = FALSE} -knitr::include_graphics("./Images/rotatedFactorMatrix.png") -``` - -An example of a factor structure from an orthogonal rotation is in Figure \@ref(fig:orthogonalRotation).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{simple structure}\index{orthogonal rotation} - -```{r orthogonalRotation, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Structure From an Orthogonal Rotation.", echo = FALSE} -knitr::include_graphics("./Images/FactorAnalysis-08.png") -``` - -Sometimes, however, the two factors and their constituent variables may be correlated.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}\index{oblique rotation} -Examples of two correlated factors may be depression and anxiety.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}\index{oblique rotation} -When the two factors are correlated in reality, if we make them uncorrelated, this would result in an inaccurate model.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal 
rotation}\index{oblique rotation}
-Oblique rotation allows for factors to be correlated, but if the factors have low correlation (e.g., .2 or less), you can likely continue with orthogonal rotation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
-An example of a factor structure from an oblique rotation is in Figure \@ref(fig:obliqueRotation).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
-Results from an oblique rotation are more complicated than those from an orthogonal rotation—they provide more output and are more difficult to interpret.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
-In addition, an oblique rotation might not yield a stable solution if you have a relatively small sample size.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
-
-```{r obliqueRotation, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Structure From an Oblique Rotation.", echo = FALSE}
-knitr::include_graphics("./Images/FactorAnalysis-09.png")
-```
-
-As an example of rotation based on interpretability, consider the Five-Factor Model of Personality (the Big Five), which goes by the acronym OCEAN: **O**penness, **C**onscientiousness, **E**xtraversion, **A**greeableness, and **N**euroticism.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
-Although the five factors of personality are somewhat correlated, we can use rotation to ensure they are maximally independent.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-Upon rotation, extraversion and neuroticism are essentially uncorrelated, as depicted in Figure \@ref(fig:factorRotation).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}
-The other pole of extraversion is introversion, and the other pole of neuroticism might be emotional stability or calmness.
-
-```{r factorRotation, out.width = "100%", fig.align = "center", fig.cap = "Example of a Factor Rotation of Neuroticism and Extraversion.", echo = FALSE}
-knitr::include_graphics("./Images/factorRotation.png")
-```
-
-Simple structure is achieved when each variable loads highly onto as few factors as possible (i.e., each item has only one significant or primary loading).\index{simple structure}
-Oftentimes this is not the case, so we choose a rotation method to decide whether the factors are allowed to correlate (an oblique rotation) or are constrained to be uncorrelated (an orthogonal rotation).\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}\index{oblique rotation}
-If the factors are not correlated with each other, use an orthogonal rotation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}
-The correlation between an item and a factor is a factor loading, which indexes how strongly the variable is associated with the underlying factor.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{structural equation modeling!factor loading}
-However, its interpretation is more complicated if there are correlated factors!\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
-
-An orthogonal rotation (e.g., varimax) can help with simplicity of interpretation because it seeks to yield simple structure without cross-loadings.\index{simple structure}\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}
-Cross-loadings are instances where a variable loads onto multiple factors.\index{cross-loading}
-My recommendation would always be to use an orthogonal rotation if you have reason to believe that finding simple structure in your data is possible; otherwise, the factors are extremely difficult to interpret—what exactly does a cross-loading even mean?\index{cross-loading}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{orthogonal rotation}
-However, you should always try an oblique rotation, too, to see how strongly the factors are correlated.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
-Examples of oblique rotations include oblimin and promax.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{rotation}\index{oblique rotation}
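-
-As a rough sketch of trying both types of rotation in `R` (assuming the `psych` package, with its `bfi` items used purely for illustration), you could fit an orthogonal and an oblique solution and inspect the factor correlations from the oblique solution:
-
-```{r rotationComparisonSketch, eval = FALSE}
-# Illustrative sketch (not run): comparing an orthogonal (varimax) rotation
-# with an oblique (oblimin) rotation; the GPArotation package is needed for
-# oblimin, and the bfi items are used purely for demonstration.
-library("psych")
-
-items <- bfi[, 1:25]
-
-# Orthogonal rotation: factors are constrained to be uncorrelated
-efaVarimax <- fa(items, nfactors = 5, rotate = "varimax")
-
-# Oblique rotation: factors are allowed to correlate
-efaOblimin <- fa(items, nfactors = 5, rotate = "oblimin")
-
-# Factor loadings (suppressing small loadings for readability)
-print(efaVarimax$loadings, cutoff = .30)
-print(efaOblimin$loadings, cutoff = .30)
-
-# Factor correlations from the oblique solution; if they are small (e.g.,
-# around .2 or less), an orthogonal rotation may be reasonable
-efaOblimin$Phi
-```
-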
-#### 6. Model Selection and Interpretation {#modelSelectionInterpretation}
-
-The next step of [factor analysis](#factorAnalysis) is selecting and interpreting the model.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-One data matrix can lead to many different (correct) models—you must choose one based on the factor structure and theory.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-Use theory to interpret the model and label the factors.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-When interpreting others' findings, do not rely just on the factor labels—look at the actual items to determine what they assess.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-What they are called matters much less than what the actual items are!\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-
-### The Downfall of Factor Analysis {#downfallOfFactorAnalysis}
-
-The downfall of [factor analysis](#factorAnalysis) is poor cross-validation.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}
-Cross-validating a factor structure would mean getting the same factor structure with a new sample.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}
-We want factor structures to show good replicability across samples.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}\index{replication}
-However, cross-validation often falls apart.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}
-The way to attempt to replicate a factor structure in an independent sample is to use [CFA](#cfa) to specify the hypothesized factor structure and test it in the independent sample.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{cross-validation}\index{replication}\index{factor analysis!confirmatory}
-
-### What to Do with Factors {#whatToDoWithFactors}
-
-What can you do with factors once you have them?\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-In [SEM](#sem), factors have meaning.\index{structural equation modeling}\index{latent variable}
-You can use them as predictors, mediators, moderators, or outcomes.\index{structural equation modeling}\index{latent variable}
-And, using latent factors in [SEM](#sem) helps disattenuate associations for measurement error, as described in Section \@ref(disattenuation).
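-
-As a minimal sketch of using a latent factor as a predictor in [SEM](#sem), assuming the `lavaan` package and using its built-in `HolzingerSwineford1939` data purely for illustration:
-
-```{r latentFactorPredictorSketch, eval = FALSE}
-# Illustrative sketch (not run): a latent factor used as a predictor of an
-# observed outcome in a structural equation model; the variable names come from
-# lavaan's built-in HolzingerSwineford1939 data and are used purely for
-# demonstration.
-library("lavaan")
-
-model <- '
-  # Measurement model: a latent visual ability factor
-  visual =~ x1 + x2 + x3
-
-  # Structural model: the latent factor predicts an observed outcome
-  x9 ~ visual
-'
-
-fit <- sem(model, data = HolzingerSwineford1939)
-summary(fit, standardized = TRUE, fit.measures = TRUE)
-```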
-
-People often want to use factors outside of [SEM](#sem), but there is confusion here: when researchers find that three variables load onto Factor A, they often combine those three variables using a sum or average—but this is not accurate.\index{structural equation modeling}\index{latent variable}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-If you just add or average them, you ignore the factor loadings and the error.\index{structural equation modeling!factor loading}\index{measurement error}\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}
-Another solution is to form a linear composite (i.e., a weighted sum) by weighting the variables by their factor loadings, which retains the differences in correlations; however, this still ignores the estimated error, so it may not be generalizable and meaningful.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{linear composite}\index{measurement error}
-At the same time, weighted sums may be less generalizable than unit-weighted composites, in which each variable is given equal weight, because some variability in factor loadings likely reflects sampling error.\index{factor analysis!decisions}\index{factor analysis}\index{principal component analysis}\index{linear composite}\index{structural equation modeling!factor loading}
-
-### Missing Data Handling {#missingDataHandling-factorAnalysis}
-
-The [PCA](#pca) default in SPSS is listwise deletion of missing data: if a participant is missing data on any variable, the participant gets excluded from the analysis, so you might end up with too few participants.\index{factor analysis!decisions}\index{principal component analysis}
-Instead, use a correlation matrix with pairwise deletion for [PCA](#pca) with missing data.\index{factor analysis!decisions}\index{principal component analysis}
-Maximum likelihood [factor analysis](#factorAnalysis) can make use of all available data for a participant, even if some data points are missing.\index{factor analysis!decisions}\index{factor analysis}
-Mplus, which is often used for [SEM](#sem) and [factor analysis](#factorAnalysis), will notify you if you are removing many participants in [CFA](#cfa)/[EFA](#efa).\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling}\index{factor analysis!confirmatory}\index{factor analysis!exploratory}
-The `lavaan` package [@R-lavaan] in `R` also notifies you if you are removing participants in [CFA](#efa-cfa)/[SEM](#sem) models.\index{factor analysis!decisions}\index{factor analysis}\index{structural equation modeling}\index{factor analysis!exploratory}
-
 ## Getting Started {#gettingStarted-factorAnalysis}
 
 ### Load Libraries {#loadLibraries-factorAnalysis}