Statistical Package for the Social Sciences (SPSS; Armonk, NY, IBM Corp.) is a statistical software application that allows researchers to enter and manipulate data and conduct various statistical analyses. Step-by-step methods for conducting and interpreting over 60 statistical tests in SPSS are available in Research Engineer.
Comparison of independent groups on an outcome
Number of groups, scales of measurement, and meeting statistical assumptions
Between-subjects statistics are used when comparing independent groups on an outcome. Independent groups means that the groups are "different" or "independent" from each other according to some characteristic. With between-subjects designs, participants can only be part of one group (independence) and only observed once (independence of observations, IOO).
One chooses a between-subjects statistical test based on the following:
1. Number of independent groups being compared (one group, two groups, or three or more groups)
2. Scale of measurement of the outcome (categorical, ordinal, or continuous)
3. Meeting statistical assumptions (independence of observations, normality, and homogeneity of variance)
Here is a list of between-subjects statistical tests and when they are utilized in applied quantitative research:
1. Chi-square Goodness-of-fit - One group, categorical outcome, a priori hypothesis for the distribution of the outcome
2. One-sample median test - One group, ordinal outcome, a priori hypothesis for median value
3. One-sample t-test - One group, continuous outcome, meet the assumption of IOO and normality, a priori hypothesis for mean value
4. Chi-square - Two independent groups, categorical outcome, and chi-square assumption (an expected count of at least five in each cell)
5. Fisher's Exact test - Two independent groups, categorical outcome, and when the chi-square assumption is not met
6. Mann-Whitney U - Two independent groups, ordinal outcome, and when the assumptions of normality or homogeneity of variance for the independent samples t-test are violated
7. Independent samples t-test - Two independent groups, continuous outcome, meet the assumption of IOO, normality (skewness and kurtosis statistics), and homogeneity of variance (also known as homoscedasticity, tested with Levene's test)
8. Unadjusted odds ratio - Three or more independent groups, categorical outcome, chi-square assumption, choose a reference category and compare each independent group to the reference
9. Kruskal-Wallis - Three or more independent groups, ordinal outcome, and when the assumption of homogeneity of variance is violated
10. ANOVA - Three or more independent groups, continuous outcome, meet the assumption of IOO, normality, and homogeneity of variance
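The selection rules in the list above can be sketched as a small lookup. This is a minimal Python sketch that encodes only this article's rules; the function names `expected_counts_ok` and `choose_between_subjects_test` are hypothetical, and reading the chi-square assumption as "every expected cell count is at least five" is an assumption about how to apply that rule.

```python
# Hypothetical helpers that encode the between-subjects selection rules above.

def expected_counts_ok(table, minimum=5.0):
    """Return True if every expected cell count of a contingency table
    (a list of rows of observed counts) is at least `minimum`."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total
    return all(r * c / grand_total >= minimum
               for r in row_totals for c in col_totals)


def choose_between_subjects_test(n_groups, scale, assumptions_met=True):
    """Return the test named in the list above.

    scale: "categorical", "ordinal", or "continuous".
    assumptions_met: the chi-square expected-count rule for categorical
    outcomes; IOO, normality, and homogeneity of variance for continuous ones.
    """
    if scale not in ("categorical", "ordinal", "continuous"):
        raise ValueError("scale must be categorical, ordinal, or continuous")
    parametric_ok = scale == "continuous" and assumptions_met
    if n_groups == 1:
        if scale == "categorical":
            return "Chi-square Goodness-of-fit"
        # Assumption: a continuous outcome that fails the t-test assumptions
        # falls back to the ordinal test, following the article's general advice.
        return "One-sample t-test" if parametric_ok else "One-sample median test"
    if n_groups == 2:
        if scale == "categorical":
            return "Chi-square" if assumptions_met else "Fisher's Exact test"
        return "Independent samples t-test" if parametric_ok else "Mann-Whitney U"
    # Three or more independent groups; the list names no fallback for a
    # categorical outcome that fails the chi-square assumption.
    if scale == "categorical":
        return "Unadjusted odds ratio"
    return "ANOVA" if parametric_ok else "Kruskal-Wallis"


# Usage with hypothetical 2x2 counts (rows = groups, columns = outcome yes/no)
table = [[12, 8], [5, 15]]
print(choose_between_subjects_test(2, "categorical", expected_counts_ok(table)))
```

Keeping the assumption checks separate from the lookup means the same helper works whether you judged normality from skewness and kurtosis statistics or from another check.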
Parametric statistics are more powerful statistics
Non-parametric statistics are used with categorical and ordinal outcomes
As we continue our journey to break through the barriers associated with statistical lexicons, here is another dichotomy of popular statistical terms that are spoken commonly but not always understood by everyone.
Parametric statistics are used to assess differences and effects for continuous outcomes. These statistical tests include one-sample t-tests, independent samples t-tests, one-way ANOVA, repeated-measures ANOVA, ANCOVA, factorial ANOVA, multiple regression, MANOVA, and MANCOVA.
Non-parametric statistics are used to assess differences and effects for:
1. Ordinal outcomes - One-sample median tests, Mann-Whitney U, Wilcoxon, Kruskal-Wallis, Friedman's ANOVA, Proportional odds regression
2. Categorical outcomes - Chi-square, Chi-square Goodness-of-fit, odds ratio, relative risk, McNemar's, Cochran's Q, Kaplan-Meier, log-rank test, Cochran-Mantel-Haenszel, Cox regression, logistic regression, multinomial logistic regression
3. Small sample sizes (n < 30) - Smaller sample sizes make it harder to meet the statistical assumptions associated with parametric statistics. Non-parametric statistics can generate valid statistical inferences in these situations.
4. Violations of statistical assumptions for parametric tests - Normality, Homogeneity of variance, Normality of difference scores
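For categorical outcomes, the unadjusted odds ratio approach (choose a reference category and compare each independent group to it) reduces to simple arithmetic on one 2x2 table per comparison. The minimal Python sketch below uses hypothetical counts and adds a Woolf (log-based) 95% confidence interval, which is a common choice but not one this article specifies; it also assumes no zero cells.

```python
import math


def odds_ratio(group_yes, group_no, ref_yes, ref_no, z=1.96):
    """Unadjusted odds ratio of a group versus the reference group, with a
    Woolf (log-based) confidence interval. Assumes every cell is nonzero."""
    or_value = (group_yes * ref_no) / (group_no * ref_yes)
    se_log = math.sqrt(1 / group_yes + 1 / group_no + 1 / ref_yes + 1 / ref_no)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


# Hypothetical counts of (outcome present, outcome absent) for three groups
groups = {"A": (10, 30), "B": (20, 20), "C": (30, 10)}
reference = "A"
ref_yes, ref_no = groups[reference]

results = {}
for name, (yes, no) in groups.items():
    if name != reference:
        results[name] = odds_ratio(yes, no, ref_yes, ref_no)

for name, (or_value, lower, upper) in results.items():
    print(f"{name} vs {reference}: OR = {or_value:.2f} "
          f"(95% CI {lower:.2f} to {upper:.2f})")
```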
McNemar's can be used as a post hoc test
Significant main effects for Cochran's Q need to be explained
Non-parametric tests like chi-square, Fisher's exact test, Kruskal-Wallis, Cochran's Q, and Friedman's ANOVA do not have their own post hoc analyses to explain significant main effects. To conduct these post hoc analyses, researchers have to resort to subsequent non-parametric tests that compare two groups at a time.
In a prior post, I explained how Mann-Whitney U tests were used in a post hoc fashion for significant main effects found with Kruskal-Wallis analyses. This is pertinent for between-subjects tests.
If you are using a within-subjects design with three or more observations of a dichotomous categorical outcome, you utilize Cochran's Q test to assess main effects. If a significant main effect is found, then McNemar's tests have to be employed for post hoc group comparisons. Significant post hoc tests (or relative risk calculations) will provide evidence of significant differences across observations or within-subjects.
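Here is a minimal Python sketch of that workflow on hypothetical data: Cochran's Q across three observations of a yes/no outcome, followed by pairwise McNemar's tests. The McNemar statistic here is the large-sample chi-square form without a continuity correction, and all p-values come from the chi-square distribution; software may instead report exact p-values for small counts, so treat these numbers as approximations.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail chi-square probability for an integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df + 1) // 2):
        p += term
        term *= x / (2 * j + 1)
    return p


def cochrans_q(data):
    """Cochran's Q for rows = participants, columns = 0/1 observations."""
    k = len(data[0])
    col_totals = [sum(row[j] for row in data) for j in range(k)]
    row_totals = [sum(row) for row in data]
    n = sum(row_totals)
    denominator = k * n - sum(r * r for r in row_totals)
    if denominator == 0:  # no participant changed across observations
        return 0.0, k - 1
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denominator
    return q, k - 1


def mcnemar(data, i, j):
    """McNemar chi-square (no continuity correction) for observations i and j."""
    b = sum(1 for row in data if row[i] == 1 and row[j] == 0)
    c = sum(1 for row in data if row[i] == 0 and row[j] == 1)
    stat = (b - c) ** 2 / (b + c) if b + c else 0.0
    return stat, chi2_sf(stat, 1)


# Hypothetical yes (1) / no (0) outcomes for 8 participants at 3 observations
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 1, 0],
        [1, 0, 0], [0, 0, 0], [1, 1, 0], [1, 0, 0]]

q, df = cochrans_q(data)
print(f"Cochran's Q = {q:.2f}, df = {df}, p = {chi2_sf(q, df):.4f}")
post_hoc = {}
if chi2_sf(q, df) < 0.05:
    for i, j in combinations(range(len(data[0])), 2):
        post_hoc[(i, j)] = mcnemar(data, i, j)
        stat, p = post_hoc[(i, j)]
        print(f"Observation {i + 1} vs {j + 1}: chi-square = {stat:.2f}, p = {p:.4f}")
```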
Non-parametric statistics should be employed more often than they are in the literature. Many published studies use small sample sizes and ordinal or categorical outcomes. The statistical assumptions of the more powerful parametric statistics often cannot be met with these types of designs. Non-parametric statistics are robust to these violations and should be used accordingly. Post hoc analyses are just as important in non-parametric statistics as they are in parametric statistics.
Non-parametric statistics are robust to small sample sizes
The right way to conduct statistics
Mark Twain said it best: "There are three kinds of lies: lies, damned lies, and statistics." Statistics can be misleading from the standpoint of both the person conducting the analyses and the person interpreting them. Many between-subjects studies have small sample sizes (n < 20), and the statistical assumptions for parametric statistics often cannot be met.
For basic researchers who work day in and day out with small sample sizes, the answer is to use non-parametric statistics. Non-parametric statistical tests such as the Mann-Whitney U, Kruskal-Wallis, Wilcoxon, and Friedman's ANOVA are robust to violations of statistical assumptions and skewed distributions. These tests can yield interpretable medians, interquartile ranges, and p-values.
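As a minimal illustration of those descriptive statistics, the Python sketch below summarizes a hypothetical, skewed sample with its median and interquartile range using only the standard library. Quartile definitions differ across packages, so the quartiles from `statistics.quantiles` (default "exclusive" method) may not match SPSS output exactly.

```python
import statistics


def median_iqr(values):
    """Median plus first and third quartiles of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return statistics.median(values), q1, q3


# Hypothetical scores from a small group with one extreme value
scores = [3, 4, 4, 5, 6, 7, 30]
median, q1, q3 = median_iqr(scores)
print(f"Median = {median} (IQR {q1} to {q3}); mean = {statistics.mean(scores):.2f}")
```

The single extreme value pulls the mean above the third quartile, while the median and interquartile range stay with the bulk of the data.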
Non-parametric statistics are also useful in the social sciences due to the inherent measurement error associated with assessing human behaviors, thoughts, feelings, intelligence, and emotional states. The underlying algebra associated with psychometrics relies on intercorrelations amongst constructs or items. Correlations can easily be skewed by outlying observations and measurement error. Therefore, in concordance with mathematical and empirical reasoning, non-parametric statistics should be used often for between-subjects comparisons of surveys, instruments, and psychological measures.
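To make the point about outlying observations concrete, the Python sketch below compares Pearson's correlation with Spearman's rank correlation on hypothetical data. Spearman's rho is used here as an example of a rank-based, non-parametric alternative; it is not named in the text above. The first five pairs move in exactly opposite directions, and one extreme pair is appended.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: five ordinary pairs plus one extreme pair
x = [1, 2, 3, 4, 5, 100]
y = [5, 4, 3, 2, 1, 100]
print(f"Pearson r = {pearson(x, y):.3f}, Spearman rho = {spearman(x, y):.3f}")
```

Pearson's r lands near +1 because the one extreme pair dominates the sums of squares, even though the other five pairs run in opposite directions; Spearman's rho (about -0.14) is far less influenced by that single point.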
Ordinal measures and normality
Ordinal level measurement can become interval level with assumed normality
Here is an interesting trick I picked up along the way when it comes to ordinal outcomes and some unvalidated measures. If you run skewness and kurtosis statistics on the ordinal variable and its distribution meets the assumption of normality (the absolute values of both the skewness and kurtosis statistics are less than 2.0), then you can "upgrade" the variable to a continuous level of measurement and analyze it using more powerful parametric statistics.
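The skewness-and-kurtosis rule above can be checked with a few lines of Python. This minimal sketch assumes the bias-adjusted sample skewness (G1) and excess kurtosis (G2) formulas that many statistical packages report; confirm the values against your own software's output before relying on them. The 2.0 cutoff is the rule of thumb from the text, and the Likert responses are hypothetical.

```python
import math


def skewness_kurtosis(values):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt


def meets_normality_rule(values, cutoff=2.0):
    """Rule of thumb: |skewness| and |kurtosis| both below the cutoff."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) < cutoff and abs(kurt) < cutoff


# Hypothetical responses to a 5-point Likert item
responses = [2, 3, 3, 4, 4, 4, 5, 5, 3, 4, 2, 4]
skew, kurt = skewness_kurtosis(responses)
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}, "
      f"treat as continuous: {meets_normality_rule(responses)}")
```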
This type of thinking is the reason that scores on the SAT, ACT, GRE, MCAT, LSAT, and validated psychological instruments are treated as continuous. The scores produced by these instruments, by definition, are not continuous because a "true zero" does not exist. Scores from these tests are often norm- or criterion-referenced to the population so that they can be interpreted in the correct context. Therefore, with the subjectivity and measurement error associated with classical test theory and item response theory, the scores are actually ordinal.
With that being said, if the survey instrument or ordinal outcome is used often in the empirical literature and it meets the assumption of normality as per skewness and kurtosis statistics, treat the ordinal variable as a continuous variable and run analyses using parametric statistics (t-tests, ANOVA, regression) rather than non-parametric statistics (Chi-square, Mann-Whitney U, Kruskal-Wallis, McNemar's, Wilcoxon, Friedman's ANOVA, logistic regression).
Research questions lead to choice of statistical design
Between-subjects versus within-subjects designs
There are terms in statistics that many people do not understand from a practical standpoint. I'm a biostatistical scientist, and it took me YEARS to wrap my head around some fundamental aspects of statistical reasoning, let alone the lexicon. I would hypothesize that 90% of the statistics reported in the empirical literature fall into one of two categories: between-subjects and within-subjects. Here is a basic breakdown of the differences between these types of statistical tests:
1. Between-subjects - When you are comparing independent groups on a categorical, ordinal, or continuous outcome variable, you are conducting between-subjects analyses. The "between-" denotes the differences between mutually exclusive groups or levels of a categorical predictor variable. Chi-square, Mann-Whitney U, independent-samples t-tests, odds ratio, Kruskal-Wallis, and one-way ANOVA are all considered between-subjects analyses because of the comparison of independent groups.
2. Within-subjects - When you are comparing THE SAME GROUP on a categorical, ordinal, or continuous outcome ACROSS TIME OR WITHIN THE SAME OBJECT OF MEASUREMENT MULTIPLE TIMES, then you are conducting within-subjects analyses. The "within-" relates to the differences within the same object of measurement across multiple observations, time, or literally, "within-subjects." Chi-square Goodness-of-fit, Wilcoxon, repeated-measures t-tests, relative risk, Friedman's ANOVA, and repeated-measures ANOVA are within-subjects analyses because the same group or cohort of individuals is measured at several different time-points or observations.
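A quick way to see why the distinction matters: the Python sketch below analyzes the same hypothetical before/after numbers twice, once as if they came from two independent groups (pooled-variance, equal-variances-assumed independent samples t-test) and once as repeated measures on the same five participants (paired t-test on difference scores). Only the t statistics and degrees of freedom are computed; p-values would come from a t table or your software.

```python
import math
import statistics


def independent_t(group1, group2):
    """Pooled-variance (equal variances assumed) independent samples t."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, n1 + n2 - 2


def paired_t(before, after):
    """Repeated-measures t computed on each participant's difference score."""
    diffs = [a - b for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


# Hypothetical scores for five participants measured twice
before = [10, 20, 30, 40, 50]
after = [12, 21, 33, 41, 53]
t_ind, df_ind = independent_t(after, before)
t_pair, df_pair = paired_t(before, after)
print(f"Treated as independent groups: t({df_ind}) = {t_ind:.2f}")
print(f"Treated as within-subjects:    t({df_pair}) = {t_pair:.2f}")
```

Because the paired test works with each participant's own change, the large differences between participants drop out, and the same numbers yield a much larger t statistic.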
Mann-Whitney U and Wilcoxon as post hoc tests
Explain significant main effects from Kruskal-Wallis tests and Friedman's ANOVA
Non-parametric statistics are used when analyzing categorical and ordinal outcomes. These statistics are also used with smaller sample sizes (n < 20) and when the assumptions of certain statistical tests are violated.
The Mann-Whitney U test is employed when comparing two independent groups on an ordinal outcome. It is also used when the assumptions of an independent samples or unpaired t-test are violated (normality, homogeneity of variance).
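Because the choice between the independent samples t-test and the Mann-Whitney U depends partly on homogeneity of variance, here is a minimal Python sketch of Levene's test statistic. It assumes the original mean-centered version; some libraries default to a median-centered variant, so results can differ. It returns the statistic and its F degrees of freedom, and the p-value would come from an F table or your software. The data are hypothetical.

```python
import statistics


def levene_statistic(*groups):
    """Levene's W using absolute deviations from each group mean.
    Returns (W, df1, df2); compare W with F(df1, df2)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group mean
    z = [[abs(v - statistics.mean(g)) for v in g] for g in groups]
    z_means = [statistics.mean(zg) for zg in z]
    grand_mean = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - grand_mean) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical scores: group B is spread out twice as much as group A
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
w, df1, df2 = levene_statistic(group_a, group_b)
print(f"Levene W = {w:.3f} on F({df1}, {df2})")
```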
The Wilcoxon test is used when comparing ordinal outcomes at two different points in time or within-subjects. It is further used when the assumptions of a repeated measures t-test are violated (independence of observations, normality of difference scores).
A lesser known use for these two non-parametric tests is when significant main effects are found for non-parametric Kruskal-Wallis and Friedman's ANOVA tests. Much like with a parametric one-way ANOVA or repeated-measures ANOVA, if a significant main effect is found using non-parametric statistics, then a post hoc analysis must be undertaken to explain the significant main effect. Non-parametric statistics do not have Tukey, Scheffe, and Dunnett tests like parametric statistics!
When a significant main effect is found using a Kruskal-Wallis test, subsequent Mann-Whitney U tests must be employed in a post hoc fashion to explain where amongst the independent groups the actual differences exist.
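The Kruskal-Wallis then Mann-Whitney U sequence can be sketched in Python as follows, using hypothetical ordinal scores for three independent groups. Assumptions to note: H is computed without a tie correction and compared against the chi-square critical value for df = 2 rather than given an exact p-value, and the Mann-Whitney p-values use a normal approximation with no tie or continuity correction. That approximation is rough for groups this small, and software often reports exact p-values instead.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (no tie correction) and its degrees of freedom."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1), len(groups) - 1


def mann_whitney_u(a, b):
    """U statistic and two-sided normal-approximation p-value."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical ordinal scores for three independent groups
groups = {"Low": [2, 3, 1, 4], "Mid": [5, 7, 6, 8], "High": [9, 12, 10, 11]}
h, df = kruskal_wallis_h(list(groups.values()))
print(f"Kruskal-Wallis H = {h:.2f}, df = {df}")

post_hoc = {}
if h > 5.991:  # chi-square critical value for df = 2, alpha = .05
    for first, second in combinations(groups, 2):
        u, p = mann_whitney_u(groups[first], groups[second])
        post_hoc[(first, second)] = (u, p)
        print(f"{first} vs {second}: U = {u:.1f}, p = {p:.3f}")
```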
The same holds true for Friedman's ANOVA. If a significant main effect is found, then Wilcoxon tests must be used in a post hoc fashion to explain where the significant changes occur amongst the observations or within-subjects.
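The Friedman's ANOVA then Wilcoxon sequence can be sketched the same way, here with hypothetical ordinal scores for six participants observed three times. Assumptions to note: Friedman's chi-square is computed without a tie correction and compared with the df = 2 critical value, and the Wilcoxon signed-rank p-values use a normal approximation that drops zero differences and ignores ties, which is only a rough guide with six participants.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_chi_square(data):
    """Friedman's chi-square (rows = participants, columns = observations)."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:  # rank each participant's scores across observations
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return stat, k - 1


def wilcoxon_signed_rank(first, second):
    """Smaller signed-rank sum T and a two-sided normal-approximation p-value."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    if not diffs:  # no participant changed
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    t = min(w_plus, w_minus)
    n = len(diffs)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t, math.erfc(abs(t - mu) / sigma / math.sqrt(2))


# Hypothetical ordinal scores: 6 participants, 3 observations each
data = [[1, 2, 3], [2, 3, 4], [1, 3, 2], [2, 4, 5], [3, 4, 5], [1, 2, 4]]
columns = list(zip(*data))
stat, df = friedman_chi_square(data)
print(f"Friedman chi-square = {stat:.2f}, df = {df}")

post_hoc = {}
if stat > 5.991:  # chi-square critical value for df = 2, alpha = .05
    for i, j in combinations(range(len(columns)), 2):
        t, p = wilcoxon_signed_rank(columns[i], columns[j])
        post_hoc[(i, j)] = (t, p)
        print(f"Observation {i + 1} vs {j + 1}: T = {t:.1f}, p = {p:.3f}")
```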
Eric Heidel, Ph.D. is Owner and Operator of Scalë, LLC.