Principal Components Analysis
Reduce survey data into factors that account for maximum variance
Principal Components Analysis (PCA) uses algorithms to "reduce" data into "factors" of inter-correlated items that provide a conceptual and mathematical understanding of the construct of interest.
Going back to the construct specification and the survey items, everything has been focused on measuring for one construct related to answering the research question. Under the assumption that researchers are measuring for one construct, the individual items should correlate in some form or fashion.
These inter-correlations among different sets of survey items (or content areas) provide a mathematical basis for understanding latent or underlying relationships that may exist. Principal Components Analysis (PCA) reduces survey data down into content areas that account for the most variance.
The process of conducting a Principal Components Analysis
A Principal Components Analysis is a three-step process:
1. The inter-correlations amongst the items are calculated yielding a correlation matrix.
2. The inter-correlated items, or "factors," are extracted from the correlation matrix to yield "principal components."
3. These "factors" are rotated for purposes of analysis and interpretation.
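To make these three steps concrete, here is a minimal sketch of steps 1 and 2 in Python using NumPy. The simulated item responses and variable names are purely hypothetical stand-ins for piloted survey data.

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical data: 200 respondents answering 6 Likert-type items, built so
    # that items 1-3 and items 4-6 form two inter-correlated clusters.
    latent = rng.normal(size=(200, 2))
    responses = np.column_stack(
        [latent[:, 0] + rng.normal(scale=0.7, size=200) for _ in range(3)]
        + [latent[:, 1] + rng.normal(scale=0.7, size=200) for _ in range(3)]
    )

    # Step 1: the inter-correlations among the items yield the correlation matrix.
    R = np.corrcoef(responses, rowvar=False)

    # Step 2: extracting principal components is an eigendecomposition of that
    # correlation matrix; each eigenvalue is the variance its component explains.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]                  # largest components first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Unrotated "loadings": each item's correlation with each principal component.
    loadings = eigenvectors * np.sqrt(eigenvalues)
    print(eigenvalues.round(2))

    # Step 3 (rotation) follows once the number of "factors" to keep has been
    # decided using the eigenvalue and scree plot criteria described below.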
At this point, the researcher has to make a decision about how to move forward. Luckily, there are two statistical calculations that help you make this decision: Eigenvalues and scree plots.
An eigenvalue is essentially a ratio of the shared variance to the unique variance that each "factor" yielded by the principal components extraction accounts for in the construct of interest. An eigenvalue of 1.0 or greater, known as the Kaiser criterion, is the admittedly arbitrary cutoff accepted in the current literature for deciding whether a factor should be interpreted further. The logic underlying the 1.0 criterion is that the amount of shared variance explained by a "factor" should be at least equal to the unique variance the "factor" accounts for in the overall construct.
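As a quick, standalone illustration of this criterion, the sketch below counts how many "factors" clear the 1.0 cutoff; the eigenvalues are hypothetical values rather than output from any particular data set.

    import numpy as np

    # Hypothetical eigenvalues from a principal components extraction.
    eigenvalues = np.array([3.1, 1.8, 0.9, 0.7, 0.3, 0.2])

    # Retain only "factors" whose eigenvalue is 1.0 or greater.
    n_retain = int((eigenvalues >= 1.0).sum())
    print(f"Factors meeting the 1.0 criterion: {n_retain}")   # 2 in this example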
Scree plots provide a visual aid for deciding how many "factors" should be interpreted from the principal components extraction. In a scree plot, the eigenvalues are plotted against the order in which the "factors" were extracted. The first "factors" extracted tend to have the highest inter-correlations among their individual survey items and therefore account for the most overall variance in the construct of interest; as subsequent "factors" are extracted, the inter-correlations become weaker and the eigenvalues smaller. Looking at a scree plot, one can usually see a point at which the eigenvalues drop sharply. This "elbow," the factor at which the scree plot shows a marked drop in eigenvalue and then levels off, is often used as the criterion for selecting the number of "factors" to interpret.
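The sketch below draws a basic scree plot with Python's matplotlib, again using hypothetical eigenvalues; in practice the values would come from the extraction step.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical eigenvalues, ordered by extraction.
    eigenvalues = np.array([3.1, 1.8, 0.9, 0.7, 0.3, 0.2])

    # Scree plot: eigenvalues against extraction order; look for the "elbow"
    # where the curve drops sharply and then levels off.
    plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
    plt.axhline(1.0, linestyle="--")          # reference line for the 1.0 criterion
    plt.xlabel("Factor number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()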
So, based on the two statistical calculations above, the eigenvalues and the scree plot, make a decision about how many "factors" should be retained for interpretation.
These extracted "factors" of inter-correlated items are then "rotated." Rotation is needed because it is common for certain items to be highly inter-correlated with items on several different "factors," which makes the initial extraction difficult to interpret. The "rotation" forces these troublesome items onto the "factor" whose items they are most strongly associated with. This mathematical "rotation" increases the interpretability of the extracted "factors," but it sacrifices the ability to interpret the amount of shared variance associated with each "factor."
When it comes to interpreting the "factors" themselves, any item that does not have a correlation, or "factor loading," of at least .3 with the "factor" on which it loads should be discarded.
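As a rough illustration of rotation and the .3 cutoff, the sketch below assumes the third-party Python package factor_analyzer (specifically its Rotator class with a Direct Oblimin method) is available; the unrotated loading matrix is hypothetical.

    import numpy as np
    from factor_analyzer.rotator import Rotator

    # Hypothetical unrotated loadings for four items on two retained "factors."
    unrotated = np.array([
        [0.62,  0.45],
        [0.58,  0.41],
        [0.60, -0.38],
        [0.55, -0.44],
    ])

    # Direct Oblimin is an oblique rotation: the "factors" are allowed to correlate.
    rotated = Rotator(method="oblimin").fit_transform(unrotated)

    # Interpretation rule from the text: ignore loadings weaker than .3, shown here
    # by blanking them out (the same idea as SPSS's "suppress small coefficients").
    print(np.where(np.abs(rotated) >= 0.30, rotated.round(2), np.nan))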
There are a few assumptions that must be met to conduct a Principal Components Analysis (PCA):
1. There must be a large enough sample size to allow the correlations to converge into mutually exclusive "factors."
2. Normality and linearity of the items are assumed because correlations provide the mathematical foundation for factor analysis to extract "factors."
3. The items must be written in a fashion that allows sufficiently high correlations to be yielded and extracted.
4. Content areas and items must be utilized within some sort of theoretical or conceptual framework so that correlations can be yielded.
5. The sample must be relatively homogeneous so that the construct can be measured in its relevant context within the given population.
The steps for conducting a Principal Components Analysis (PCA) in SPSS
1. The data is entered in a within-subjects fashion.
2. Click Analyze.
3. Drag the cursor over the Dimension Reduction drop-down menu.
4. Click Factor.
5. Click on the first ordinal or continuous variable, observation, or item to highlight it.
6. Click on the arrow to move the variable into the Variables: box.
7. Repeat Steps 5 and 6 until all of the variables, observations, or items are in the Variables: box.
8. Click on the Descriptives button.
9. Click on the KMO and Bartlett's test of sphericity box to select it.
10. Click Continue.
11. Click on the Extraction button.
12. Click on the Scree plot box to select it.
13. Click Continue.
14. Click on the Rotation button.
15. Click on the Direct Oblimin choice to select it.
16. Click Continue.
17. Click on Options.
18. In the Coefficient Display Format table, click on the Suppress small coefficients box to select it.
19. Type .40 into the Absolute value below: box.
20. Click Continue.
21. Click OK.
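Researchers working outside of SPSS can approximate the same sequence of steps in Python. The sketch below assumes the third-party factor_analyzer package (its calculate_bartlett_sphericity, calculate_kmo, and FactorAnalyzer helpers) and a hypothetical file of item responses; it is an illustrative outline rather than a definitive implementation.

    import pandas as pd
    from factor_analyzer import FactorAnalyzer
    from factor_analyzer.factor_analyzer import (
        calculate_bartlett_sphericity,
        calculate_kmo,
    )

    items = pd.read_csv("pilot_survey_items.csv")   # hypothetical file of item responses

    # Steps 8-10: KMO and Bartlett's test of sphericity.
    chi_square, p_value = calculate_bartlett_sphericity(items)
    kmo_per_item, kmo_overall = calculate_kmo(items)
    print(f"KMO = {kmo_overall:.2f}, Bartlett's chi-square = {chi_square:.1f}, p = {p_value:.4f}")

    # Steps 11-13: principal components extraction; the eigenvalues feed the scree
    # plot and the eigenvalue-above-1.0 decision.
    fa = FactorAnalyzer(rotation=None, method="principal", n_factors=items.shape[1])
    fa.fit(items)
    eigenvalues, _ = fa.get_eigenvalues()
    n_factors = int((eigenvalues >= 1.0).sum())

    # Steps 14-16: re-extract the retained "factors" with a Direct Oblimin rotation.
    fa = FactorAnalyzer(rotation="oblimin", method="principal", n_factors=n_factors)
    fa.fit(items)

    # Steps 17-19: suppress loadings below .40, mirroring the SPSS display option.
    loadings = pd.DataFrame(fa.loadings_, index=items.columns)
    print(loadings.where(loadings.abs() >= 0.40).round(2))

The eigenvalue cutoff applied here simply mirrors the criterion discussed earlier; the scree plot remains a useful visual check on that choice.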
The steps for interpreting the SPSS output for PCA
1. Look in the KMO and Bartlett's Test table.
2. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) needs to be at least .6 with values closer to 1.0 being better.
3. The Sig. row of the Bartlett's Test of Sphericity is the p-value that should be interpreted.
If the p-value is LESS THAN .05, reject the null hypothesis that the correlation matrix is an identity matrix. RESEARCHERS WANT TO REJECT THE NULL HYPOTHESIS.
If the p-value is MORE THAN .05, the null hypothesis of an identity matrix cannot be rejected and a principal components analysis should not be conducted.
4. Scroll down to the Total Variance Explained table and look under the Initial Eigenvalues column heading. The Total column contains the eigenvalues; interpret only factors that have an eigenvalue above 1.0. The % of Variance column shows how much variance within the construct is accounted for by each factor. The Cumulative % column shows the total amount of variance accounted for in the construct by the factors with eigenvalues above 1.0. The total number of factors, the amount of variance each factor accounts for, and the final amount of variance accounted for by all factors with eigenvalues above 1.0 are important results to report (these decision rules are sketched in code after these steps).
5. Scroll down to the Pattern Matrix table. These are your extracted and rotated factors. Researchers will see which survey items "loaded" on each factor. The items in the factors constitute the underlying components of the overall construct.
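The decision rules in these interpretation steps can be summarized in a short, standalone sketch; the KMO value, p-value, and eigenvalues below are hypothetical placeholders for numbers read off the SPSS output.

    # Hypothetical values taken from the SPSS output tables.
    kmo = 0.74                                       # KMO and Bartlett's Test table
    bartlett_p = 0.001                               # Sig. row of Bartlett's Test of Sphericity
    eigenvalues = [3.1, 1.8, 0.9, 0.7, 0.3, 0.2]     # Total column under Initial Eigenvalues

    adequate_sampling = kmo >= 0.60
    reject_identity_matrix = bartlett_p < 0.05       # researchers want to reject
    factors_to_interpret = sum(ev > 1.0 for ev in eigenvalues)

    if adequate_sampling and reject_identity_matrix:
        print(f"Proceed with PCA and interpret {factors_to_interpret} factor(s).")
    else:
        print("A principal components analysis should not be conducted on these data.")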
At this point, researchers discard the items that did not make it through the iterations of the reliability analysis, and formally structure the newly piloted survey with the items that loaded on factors with eigenvalues higher than 1.0. These are the survey items that will be tested within a nomological network with a new sample to establish validity evidence for the survey instrument and construct.