Statistical Package for the Social Sciences (SPSS; Armonk, NY, IBM Corp.) is a statistical software application that allows researchers to enter and manipulate data and conduct various statistical analyses. Step-by-step methods for conducting and interpreting over 60 statistical tests in SPSS are available in Research Engineer.
Parametric statistics are more powerful statistics
Non-parametric statistics are used with categorical and ordinal outcomes
As we continue our journey to break through the barriers associated with statistical lexicons, here is another dichotomy of popular statistical terms that are spoken commonly but not always understood by everyone.
Parametric statistics are used to assess differences and effects for continuous outcomes. These statistical tests include one-sample t-tests, independent samples t-tests, one-way ANOVA, repeated-measures ANOVA, ANCOVA, factorial ANOVA, multiple regression, MANOVA, and MANCOVA.
Non-parametric statistics are used to assess differences and effects for:
1. Ordinal outcomes - One-sample median tests, Mann-Whitney U, Wilcoxon, Kruskal-Wallis, Friedman's ANOVA, Proportional odds regression
2. Categorical outcomes - Chi-square, Chi-square Goodness-of-fit, odds ratio, relative risk, McNemar's, Cochran's Q, Kaplan-Meier, log-rank test, Cochran-Mantel-Haenszel, Cox regression, logistic regression, multinomial logistic regression
3. Small sample sizes (n < 30) - Smaller sample sizes make it harder to meet the statistical assumptions associated with parametric statistics. Non-parametric statistics can generate valid statistical inferences in these situations.
4. Violations of statistical assumptions for parametric tests - Normality, Homogeneity of variance, Normality of difference scores
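To illustrate why rank-based tests suit ordinal outcomes, here is a minimal sketch of the Mann-Whitney U statistic in plain Python. The ratings and the recoding are hypothetical, and the sketch computes only the U statistic, not a p-value. It shows that U depends only on the ordering of the scores, so any order-preserving relabeling of the scale gives the same result.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical ordinal ratings (1 = poor ... 5 = excellent) for two groups.
control = [1, 2, 2, 3, 3]
treated = [3, 4, 4, 5, 5]

u_treated = mann_whitney_u(treated, control)
u_control = mann_whitney_u(control, treated)

# The two U values always sum to n1 * n2.
assert u_treated + u_control == len(treated) * len(control)

# An order-preserving recoding of the scale (hypothetical values) leaves U unchanged,
# because only the ranks matter, not the distances between categories.
recode = {1: 10, 2: 20, 3: 50, 4: 51, 5: 1000}
u_recoded = mann_whitney_u([recode[x] for x in treated],
                           [recode[x] for x in control])
assert u_recoded == u_treated

print(u_treated, u_control)
```

A t-test on the same data would change with the recoding, since it relies on means and variances of the raw scores.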
Logistic regression yields adjusted odds ratios
Adjusted odds ratios are more easily generalized to clinical situations
There is a strong need in clinical medicine for adjusted odds ratios with 95% confidence intervals. Medicine, as a science, often uses categorical outcomes to research causal effects. It is important to assess clinical outcomes (measured at the dichotomous categorical level) within the context of various predictor, clinical, prognostic, demographic, and confounding variables. Logistic regression is the statistical method used to understand the associations between the aforementioned variables and dichotomous categorical outcomes.
Logistic regression yields adjusted odds ratios with 95% confidence intervals, rather than the more prevalent unadjusted odds ratios used in 2x2 tables. The odds ratios in logistic regression are "adjusted" because their associations with the dichotomous categorical outcome are "controlled for" or "adjusted" by the other variables in the model. The 95% confidence interval is used as the primary inference with adjusted odds ratios, just as with unadjusted odds ratios. If the 95% confidence interval includes 1.0, then the association with the outcome variable is non-significant.
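The arithmetic behind these intervals can be sketched in a few lines of Python. This is an illustration under stated assumptions, not output from SPSS: the 2x2 counts, the coefficient, and the standard error are all hypothetical, the function names are invented for this example, and the intervals are the usual Wald (log-scale) approximation with z = 1.96.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def adjusted_odds_ratio_ci(beta, se, z=1.96):
    """Adjusted odds ratio and 95% CI from a logistic regression
    coefficient (beta) and its standard error (se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def is_significant(lower, upper):
    """An interval that includes 1.0 indicates a non-significant association."""
    return not (lower <= 1.0 <= upper)


# Hypothetical 2x2 table: 20/100 exposed and 10/100 unexposed have the outcome.
unadj_or, unadj_lo, unadj_hi = odds_ratio_ci(20, 80, 10, 90)
print(round(unadj_or, 2), is_significant(unadj_lo, unadj_hi))

# Hypothetical coefficient and standard error from a fitted logistic model.
adj_or, adj_lo, adj_hi = adjusted_odds_ratio_ci(beta=0.9, se=0.3)
print(round(adj_or, 2), is_significant(adj_lo, adj_hi))
```

The adjusted version differs only in where the coefficient comes from: it is estimated with the other predictors in the model, so exponentiating it gives the odds ratio holding those predictors constant.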
Adjusted odds ratios are important in medicine because very few physiological or medical phenomena are bivariate in nature. Most disease states or physiological disorders are understood and detected within the context of many different factors or variables. Therefore, to truly understand treatment effects and clinical phenomena, multivariate adjustment must occur to properly account for clinical, prognostic, demographic, and confounding variables.
Multivariate statistical tests show evidence of association between predictor variables and an outcome, when controlling for demographic, confounding, and other patient data.
Multivariate statistics are more reflective of real-world medicine
We covered between-subjects and within-subjects analyses in the first Statistical Designs post. Multivariate statistics will be the focus in Statistical Designs 2.
While roughly 90% of the statistics reported in the literature are between-subjects or within-subjects analyses, these analyses do not properly account for all of the variance and confounding effects that exist in reality. Multivariate statistics play an important role in empirical reasoning because they allow us to control for various demographic, confounding, clinical, or prognostic variables that mitigate, mediate, and affect the association between a predictor and outcome variable. They are also much more representative of reality and of the true effects that exist within human populations.
Very few if any relationships or treatment effects in physiology, psychology, education, or life in general are bivariate in nature. Relationships and treatment effects in reality ARE multivariate, diverse, and confounded by any number of characteristics. Therefore, it makes sense that researchers should be conducting multivariate statistics to truly understand human phenomena.
With this being said, it is important to use multivariate statistics ONLY when you are asking a multivariate research question. Throwing a bunch of variables into a model without some sort of theoretical or conceptual reason for including them can yield false treatment effects and increase Type I errors. Also, these spurious variables can create "statistical noise" which detracts from a model's capability for detecting significant associations.
Choosing the correct multivariate statistic to answer your question is simple. You choose the multivariate analysis based on the outcome.
1. Categorical outcomes - Logistic regression (dichotomous), multinomial logistic regression (polychotomous), Kaplan-Meier, Cochran-Mantel-Haenszel, Cox regression (dichotomous/survival/time-to-event)
2. Ordinal outcomes - Proportional odds regression
3. Continuous outcomes - Factorial ANOVA with fixed effects, factorial ANOVA with random effects, factorial ANOVA with mixed effects, ANCOVA, multiple regression, MANOVA, MANCOVA
4. Count outcomes - Poisson regression (variance approximately equal to the mean) and negative binomial regression (variance larger than the mean)
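The outcome-based choice above can be written as a simple lookup, sketched here in Python. The dictionary keys and function names are hypothetical labels for this example. For count outcomes, the sketch adds a crude dispersion screen based on the standard assumption that Poisson regression expects the variance to be close to the mean; the 1.5 cutoff is an arbitrary illustrative threshold, and a raw variance-to-mean ratio ignores predictors, so it is only a first look.

```python
import statistics

# Lookup restating the list above; keys are hypothetical labels.
MULTIVARIATE_TESTS = {
    "categorical": ["logistic regression", "multinomial logistic regression",
                    "Kaplan-Meier", "Cochran-Mantel-Haenszel", "Cox regression"],
    "ordinal": ["proportional odds regression"],
    "continuous": ["factorial ANOVA", "ANCOVA", "multiple regression",
                   "MANOVA", "MANCOVA"],
    "count": ["Poisson regression", "negative binomial regression"],
}


def dispersion_ratio(counts):
    """Sample variance divided by the mean; values near 1 suit Poisson."""
    return statistics.variance(counts) / statistics.mean(counts)


def suggest_count_model(counts, cutoff=1.5):
    """Suggest negative binomial when the variance clearly exceeds the mean.

    The cutoff is an illustrative assumption, not an established rule.
    """
    if dispersion_ratio(counts) > cutoff:
        return "negative binomial regression"
    return "Poisson regression"


# Hypothetical count data (for example, clinic visits per patient).
evenly_spread = [0, 1, 1, 2, 2, 2, 3, 3, 4]
overdispersed = [0, 0, 0, 1, 1, 2, 10, 15, 25]

print(MULTIVARIATE_TESTS["ordinal"])
print(suggest_count_model(evenly_spread))
print(suggest_count_model(overdispersed))
```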
Ordinal measures and normality
Ordinal level measurement can become interval level with assumed normality
Here is an interesting trick I picked up along the way when it comes to ordinal outcomes and some unvalidated measures. If you run skewness and kurtosis statistics on the ordinal variable and its distribution meets the assumption of normality (the skewness and kurtosis statistics each have an absolute value below 2.0), then you can "upgrade" the variable to a continuous level of measurement and analyze it using more powerful parametric statistics.
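The skewness and kurtosis check can be sketched in plain Python as follows. This is a rough illustration, not the SPSS procedure: it uses simple moment-based estimates (with excess kurtosis, so a normal distribution scores 0), which can differ slightly from the sample-adjusted values SPSS reports, especially in small samples. The function names and the survey responses are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (normal distribution = 0, 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; shape statistics are undefined")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


def can_treat_as_continuous(values, cutoff=2.0):
    """Apply the rule of thumb: both statistics within +/- cutoff."""
    skewness, excess_kurtosis = skewness_and_kurtosis(values)
    return abs(skewness) < cutoff and abs(excess_kurtosis) < cutoff


# Hypothetical 5-point survey responses.
symmetric_item = [1, 2, 2, 3, 3, 3, 4, 4, 5]
floor_effect_item = [1] * 20 + [5]

print(can_treat_as_continuous(symmetric_item))
print(can_treat_as_continuous(floor_effect_item))
```

The second item piles up at the lowest category, so its skewness falls well outside the cutoff and the rule of thumb would keep it at the ordinal level.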
This type of thinking is the reason that the SAT, ACT, GRE, MCAT, LSAT, and validated psychological instruments are perceived at a continuous level. The scores yielded from these instruments, by definition, are not continuous because a "true zero" does not exist. Scores from these tests are often norm- or criterion-referenced to the population so that they can be interpreted in the correct context. Therefore, with the subjectivity and measurement error associated with classical test theory and item response theory, the scores are actually ordinal.
With that being said, if the survey instrument or ordinal outcome is used often in the empirical literature and it meets the assumption of normality as per skewness and kurtosis statistics, treat the ordinal variable as a continuous variable and run analyses using parametric statistics (t-tests, ANOVA, regression) rather than non-parametric statistics (Chi-square, Mann-Whitney U, Kruskal-Wallis, McNemar's, Wilcoxon, Friedman's ANOVA, logistic regression).
Eric Heidel, Ph.D. is Owner and Operator of Scalë, LLC.