# Principal Components Analysis

## Reduce survey data into factors that account for maximum variance

Principal Components Analysis (PCA) uses algorithms to "**reduce**" data into correlated "**factors**" that provide a **conceptual and mathematical understanding of the construct of interest**.

Going back to the construct specification and the survey items, everything has been focused on measuring for **one construct** related to answering the research question. Under the assumption that researchers are measuring for one construct, the individual items should correlate in some form or fashion.

These inter-correlations among different sets of survey items (or content areas) provide a mathematical basis for understanding latent or underlying relationships that may exist. Principal Components Analysis (PCA) reduces survey data down into content areas that account for the most variance.
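To make "accounting for the most variance" concrete, here is a minimal sketch in Python with NumPy. It assumes simulated responses: the data, sample size, and variable names are hypothetical and do not come from any real survey. It computes the inter-correlations among six items and decomposes the correlation matrix so that each component's share of the total variance can be read off.

```python
import numpy as np

# Hypothetical data: 300 respondents answer 6 survey items that all
# draw on one underlying construct plus item-specific noise.
rng = np.random.default_rng(0)
construct = rng.normal(size=(300, 1))
items = 0.7 * construct + 0.7 * rng.normal(size=(300, 6))

# Inter-correlations among the items (a 6 x 6 correlation matrix).
corr = np.corrcoef(items, rowvar=False)

# Principal components are the eigenvectors of the correlation matrix.
# eigh returns eigenvalues in ascending order, so sort them descending.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Each eigenvalue divided by the total is the share of variance
# that the corresponding component accounts for.
explained = eigenvalues / eigenvalues.sum()

for i, (ev, share) in enumerate(zip(eigenvalues, explained), start=1):
    print(f"Component {i}: eigenvalue = {ev:.2f}, variance = {share:.1%}")
```

Because the items were simulated from a single construct, the first component should carry most of the variance and the remaining components should be small. Real survey data will rarely be this clean.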

### The process of conducting a Principal Components Analysis

A Principal Components Analysis is a **three**-step process:

1. The inter-correlations amongst the items are calculated, yielding a **correlation matrix**.
2. The inter-correlated items, or "**factors**," are extracted from the correlation matrix to yield "**principal components**."
3. These "factors" are **rotated** for purposes of analysis and interpretation.

At this point, the researcher has to make a decision about how to move forward. Luckily, there are two statistical calculations that help you make this decision: **eigenvalues** and **scree plots**.

An eigenvalue is the amount of variance in the items that is accounted for by each "factor" yielded from the extraction of principal components. Because the extraction works from the correlation matrix, each standardized item contributes 1.0 unit of variance, so the eigenvalues sum to the number of items. An eigenvalue of **1.0 or greater** is the conventional, and somewhat arbitrary, criterion accepted in the current literature for deciding if a factor should be further interpreted. The logic underlying the criterion of 1.0 is that a "factor" should account for at least as much variance as a single item does.

Scree plots provide a visual aid in deciding how many "factors" should be interpreted from the principal components extraction. In a scree plot, the eigenvalues are plotted against the order of "factors" extracted from the data. The first "factors" extracted are the ones with the highest inter-correlations amongst their individual survey items, so they account for more of the overall variance in your construct of interest. As other "factors" are extracted, the inter-correlations become weaker and the eigenvalues become smaller. One can look at a scree plot and see a point where the eigenvalues drop sharply. This "**elbow**," the factor at which the scree plot shows a marked reduction in eigenvalue and then levels off, is often considered the criterion for selecting the number of "factors" to interpret.

So, based on the two statistical calculations above, the eigenvalues and the scree plot, make a decision on how many "factors" should be extracted.

These extracted "factors" of inter-correlated items are "**rotated.**" This "**rotation**" occurs because it is common for certain items to be highly inter-correlated with items on several different "factors." This makes the initial extraction of factors hard to interpret. The "rotation" forces these troublesome items onto the "factor" with which they have the strongest association. This mathematical "rotation" **increases the interpretability of extracted "factors," but cancels out the ability to interpret the amount of shared variance associated with the "factor."**

When it comes to interpreting the "factors" themselves, any item that does not have a correlation or "**factor loading" of at least .3** with the "factor" it has loaded on should be discarded.

There are a few assumptions that must be met to conduct a Principal Components Analysis (PCA):

1. There must be a **large enough sample size** to allow the correlations to converge into mutually exclusive "factors."
2. **Normality and linearity** of the items are assumed because correlations provide the mathematical foundation for factor analysis to extract "factors."
3. The items must be written in a fashion where **sufficiently high correlations can be yielded and extracted.**
4. Content areas and items must be utilized within some sort of **theoretical or conceptual framework** so that correlations can be yielded.
5. The sample must be **relatively homogeneous** so that the construct can be measured in its relative context in the given population.

### The steps for conducting a Principal Components Analysis (PCA) in SPSS

1. The data is entered in a within-subjects fashion.
2. Click **Analyze**.
3. Drag the cursor over the **Dimension Reduction** drop-down menu.
4. Click **Factor**.
5. Click on the first ordinal or continuous variable, observation, or item to highlight it.
6. Click on the **arrow** to move the variable into the **Variables:** box.
7. Repeat Steps 5 and 6 until all of the variables, observations, or items are in the **Variables:** box.
8. Click on the **Descriptives** button.
9. Click on the **KMO and Bartlett's test of sphericity** box to select it.
10. Click **Continue**.
11. Click on the **Extraction** button.
12. Click on the **Scree plot** box to select it.
13. Click **Continue**.
14. Click on the **Rotation** button.
15. Click on the **Direct Oblimin** choice to select it.
16. Click **Continue**.
17. Click on **Options**.
18. In the **Coefficient Display Format** table, click on the **Suppress small coefficients** box to select it.
19. Type **.40** into the **Absolute value below:** box.
20. Click **Continue**.
21. Click **OK**.

### The steps for interpreting the SPSS output for PCA

1. Look in the **KMO and Bartlett's Test** table.
2. The **Kaiser-Meyer-Olkin Measure of Sampling Adequacy** (**KMO**) needs to be at least **.6**, with values closer to 1.0 being better.
3. The **Sig.** row of the **Bartlett's Test of Sphericity** is the *p*-value that should be interpreted.

   If the *p*-value is **LESS THAN .05**, reject the null hypothesis that the correlation matrix is an identity matrix. **RESEARCHERS WANT TO REJECT THE NULL HYPOTHESIS**.

   If the *p*-value is **MORE THAN .05**, the correlation matrix cannot be distinguished from an identity matrix and **a principal components analysis should not be conducted**.
4. Scroll down to the **Total Variance Explained** table. Look under the **Initial Eigenvalues** column heading. The **Total** column contains the eigenvalues; interpret only factors that have an **eigenvalue above 1.0**. The **% of Variance** column shows **how much variance within the construct is accounted for by that factor**. The **Cumulative %** column shows the **total amount of variance accounted for** in the construct by factors with eigenvalues above 1.0. The total number of factors, the amount of variance each factor accounts for, and the final amount of variance accounted for by all factors with eigenvalues above 1.0 are important results to report.
5. Scroll down to the **Pattern Matrix** table. These are your extracted and rotated factors. Researchers will see which survey items "loaded" on each factor. The items in the factors constitute the underlying components of the overall construct.

At this point, researchers discard the items that did not make it through the iterations of the reliability analysis, and formally structure the newly piloted survey with the items that loaded on factors with eigenvalues higher than 1.0. These are the survey items that will be tested within a nomological network with a new sample to establish validity evidence for the survey instrument and construct.
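The checks in the interpretation steps above can also be reproduced outside SPSS. The following is a rough sketch in Python, assuming NumPy and SciPy are available and using simulated data. The function names `bartlett_sphericity` and `kmo` are illustrative helpers written for this example, not SPSS or library functions, and small numerical differences from SPSS output are possible. It computes Bartlett's test of sphericity, the overall KMO, and the eigenvalue columns that correspond to the Total Variance Explained table.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(data):
    """Bartlett's test of the null hypothesis that the correlation
    matrix is an identity matrix."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet returns (sign, log|R|); using the log-determinant directly
    # is more stable than log(det(R)).
    _, log_det = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    p_value = chi2.sf(statistic, df)  # upper-tail chi-square probability
    return statistic, df, p_value


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations come from the rescaled inverse correlation matrix.
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)


# Hypothetical data: 300 respondents, 6 items driven by one construct.
rng = np.random.default_rng(0)
construct = rng.normal(size=(300, 1))
items = 0.7 * construct + 0.7 * rng.normal(size=(300, 6))

statistic, df, p_value = bartlett_sphericity(items)
kmo_value = kmo(items)

# Eigenvalues in descending order, as in the "Total" column.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
percent = 100 * eigenvalues / eigenvalues.sum()   # "% of Variance"
cumulative = np.cumsum(percent)                   # "Cumulative %"
retained = int(np.sum(eigenvalues > 1.0))         # eigenvalue > 1.0 rule

print(f"KMO = {kmo_value:.3f} (want at least .6)")
print(f"Bartlett chi-square = {statistic:.1f}, df = {df}, p = {p_value:.4g}")
print(f"Components with eigenvalue above 1.0: {retained}")
```

The sketch stops at the unrotated extraction. It does not attempt the Direct Oblimin rotation that produces the Pattern Matrix, so item loadings should still be read from the SPSS output.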
