As I have been adding statistical assumptions to the website, it is safe to say that the "levee" has broken and I have A LOT more work to do to make this the best statistical and empirical website. With that said, check out the new assumptions pages:
Analyze three or more measures of an ordinal outcome
Wilcoxon is used as a post hoc test for significant main effects
The Greenhouse-Geisser correction is often employed when analyzing data with repeated-measures ANOVA. The statistical assumption of sphericity, as assessed by Mauchly's test in SPSS, is violated more often than not. The Greenhouse-Geisser correction adjusts the degrees of freedom of the F-test to account for this violation, so the repeated-measures ANOVA can still be used. The means and standard deviations from a repeated-measures ANOVA can then be interpreted.
Friedman's ANOVA, in my experience, does not make many appearances in the empirical literature. Few people take three or more within-subjects or repeated measures of an ordinal outcome in order to answer their primary research question, I guess. It is a non-parametric statistical test since the data are measured at an ordinal level. When a significant main effect is found with a Friedman's ANOVA, post hoc comparisons must then be made between pairs of observations using Wilcoxon tests.
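The omnibus-then-post-hoc sequence described above can be sketched in Python with SciPy. This is only an illustration under stated assumptions: the ratings are made-up data, the function name `friedman_with_posthoc` is hypothetical, and the Bonferroni adjustment of alpha for the pairwise Wilcoxon tests is one common choice that the text above does not prescribe.

```python
from itertools import combinations

from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical 1-5 ordinal ratings from 10 participants at three time points.
ratings = {
    "baseline": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "month_3": [3, 3, 2, 4, 3, 3, 2, 4, 3, 3],
    "month_6": [5, 4, 4, 5, 5, 4, 4, 5, 5, 5],
}


def friedman_with_posthoc(data, alpha=0.05):
    """Run the omnibus Friedman test, then pairwise Wilcoxon tests if it is significant."""
    omnibus = friedmanchisquare(*data.values())
    results = {
        "statistic": omnibus.statistic,
        "p_value": omnibus.pvalue,
        "posthoc": {},
    }
    if omnibus.pvalue < alpha:
        pairs = list(combinations(data, 2))
        # Assumption: Bonferroni adjustment for the number of pairwise tests.
        adjusted_alpha = alpha / len(pairs)
        for first, second in pairs:
            pair_test = wilcoxon(data[first], data[second])
            results["posthoc"][(first, second)] = {
                "p_value": pair_test.pvalue,
                "significant": pair_test.pvalue < adjusted_alpha,
            }
    return results


results = friedman_with_posthoc(ratings)
print(results)
```

With real data, each list would hold one observation per participant in the same participant order, since both tests pair values by position.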
Friedman's ANOVA, while being a non-parametric statistic, may have the most statistical power when employed with cross-sectional data yielded from a survey instrument that has limited reliability and validity evidence. Likert scales and composite scores from such tests may be naturally skewed due to systematic and unsystematic error. Friedman's ANOVA is robust to these types of distributions that come from cross-sectional studies in the social sciences.
If the assumption of normality among the difference scores between observations of a continuous outcome cannot be met, then Friedman's ANOVA can be used to yield inferential evidence. But it is always a better idea to first check for outliers in a distribution (individual observations that are more than 3.29 standard deviations away from the mean) and make a decision as to whether to 1) delete the observation in a listwise fashion, or 2) run a logarithmic transformation on the distribution.
You will have to transform the other observations of the outcome if you choose #2 above. The means and standard deviations of transformed variables cannot be interpreted but the p-values can be interpreted. Report the median and interquartile range for transformed variables.
Deleting observations can introduce bias into the statistical analysis. This should only be done if the number of outliers constitutes less than 10% of the overall distribution. One can also run between-subjects comparisons between participants with all observations of the outcome versus participants without all observations. If there are no differences on predictor, confounding, and outcome variables between these two groups, then it is more reasonable to assume that deleting the observations introduced little bias.
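The outlier screen and the 10% rule described above can be sketched as follows. The data and the helper name `flag_outliers` are hypothetical, and using the sample standard deviation (`ddof=1`) is an assumption about how the z-scores are computed.

```python
import numpy as np

# Hypothetical outcome: 30 typical values plus one extreme observation.
values = np.array([1, 2, 3] * 10 + [45], dtype=float)


def flag_outliers(x, cutoff=3.29):
    """Return a boolean mask of observations more than `cutoff` SDs from the mean."""
    z_scores = (x - x.mean()) / x.std(ddof=1)  # assumption: sample standard deviation
    return np.abs(z_scores) > cutoff


mask = flag_outliers(values)
outlier_share = mask.mean()  # proportion of observations flagged as outliers

# Deletion is only considered when outliers are under 10% of the distribution.
deletion_defensible = bool(outlier_share < 0.10)
trimmed = values[~mask] if deletion_defensible else values

print(values[mask], round(float(outlier_share), 3), deletion_defensible)
```

Note that in very small samples a single extreme value inflates the standard deviation so much that it may never exceed 3.29, so this screen is more informative with a reasonable number of observations.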
Logarithmic transformations adjust skewed distributions
Analyze skewed data using more powerful parametric statistics
Logarithmic transformations are powerful statistical tools when employed and interpreted in the correct fashion. Transforming the distribution of a continuous variable due to violating normality allows researchers to account for outlying observations and use more powerful parametric statistics to assess any significant associations.
Also, some continuous variables are naturally skewed. One particular outcome that is prevalent in medicine is LOS or length of stay in the hospital. Most patients will be in the hospital between one and three days; VERY FEW will be in the hospital for weeks and months at a time. In order to include these outlying patients in analyses, transformations must be performed. Naturally skewed variables can be analyzed with parametric statistics with transformations!
An important thing to remember when conducting logarithmic transformations is that only the p-value associated with inferential statistics can be interpreted, NOT the means and standard deviations of the transformed observations. Instead, researchers should report the median and interquartile range for the distribution.
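As a rough sketch of that workflow, the snippet below log-transforms a skewed variable for analysis while reporting the median and interquartile range on the original scale. The length-of-stay values are invented, and the natural log is an assumption (a base-10 log would serve the same purpose). `np.log` needs strictly positive values, so a variable containing zeros would need a different approach such as `np.log1p`.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical length-of-stay values in days (all greater than zero).
los_days = np.array(
    [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 7, 10, 14, 21, 30, 60, 90],
    dtype=float,
)

# Assumption: natural log; the transformed values feed the parametric test.
log_los = np.log(los_days)

# Skewness before and after the transformation.
raw_skew = skew(los_days)
log_skew = skew(log_los)

# Descriptive statistics to report: median and interquartile range, original scale.
median = np.median(los_days)
q1, q3 = np.percentile(los_days, [25, 75])

print(f"skewness: raw={raw_skew:.2f}, log={log_skew:.2f}")
print(f"median={median:.1f} days, IQR={q1:.1f} to {q3:.1f} days")
```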
Some continuous variables will be naturally skewed
In medicine, there is an important metric that signifies efficiency and quality in healthcare: length of stay (LOS) in the hospital. When thinking about the distribution of a variable such as LOS, you have to put it into a relative context. The vast majority of people will have an LOS of between 0 and 3 days given the type of treatment or injury that brought them to the hospital. VERY FEW individuals will stay at the hospital one month, six months, or a year. Therefore, the distribution looks nothing like the normal curve and is extremely positively skewed.
As a researcher, you may want to predict for a continuous variable that has a natural and logical skewness to its distribution in the population. Yet, the assumption of normality is a central tenet of running statistical analyses. What is one to do in this situation?
The answer is to first run skewness and kurtosis statistics to assess the normality of your continuous outcome. If either statistic is above an absolute value of 2.0, then the distribution is non-normal. Next, check for outliers in the distribution that are more than 3.29 standard deviations away from the mean. Make sure that the outlying observations were entered correctly.
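A minimal sketch of that skewness and kurtosis screen is shown here, using SciPy. The function name `normality_screen` and both data sets are hypothetical. The kurtosis is computed as excess kurtosis (zero for a normal curve) with the small-sample bias correction turned on, which is an assumption about how the 2.0 cutoff is meant to be applied.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def normality_screen(x, cutoff=2.0):
    """Return (skewness, excess kurtosis, non_normal flag) for a continuous variable."""
    skewness = skew(x, bias=False)
    # Assumption: excess (Fisher) kurtosis, so a normal distribution scores near 0.
    excess_kurtosis = kurtosis(x, fisher=True, bias=False)
    non_normal = bool(abs(skewness) > cutoff or abs(excess_kurtosis) > cutoff)
    return float(skewness), float(excess_kurtosis), non_normal


# Hypothetical positively skewed outcome, such as length of stay in days.
skewed = np.array(
    [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 7, 10, 14, 21, 30, 60, 90],
    dtype=float,
)
# Hypothetical symmetric outcome for comparison.
symmetric = np.arange(1, 21, dtype=float)

skewed_result = normality_screen(skewed)
symmetric_result = normality_screen(symmetric)

print("skewed:", skewed_result)
print("symmetric:", symmetric_result)
```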
You now have a choice:
1. You can delete the outlying observations in a listwise fashion. This should be done only if the number of outlying observations is less than 10% of the overall distribution. This is the least preferable choice.
2. You can conduct a logarithmic transformation on the outcome variable. Doing this will normalize the distribution so that you can run the analysis using parametric statistics. The unstandardized beta coefficients, standard errors, and standardized beta coefficients are not interpretable, but the significance of the associations between the predictor variables and the transformed outcome can yield some inferential evidence.
3. You can recode the continuous outcome variable into a lower level scale of measurement such as ordinal or categorical and run non-parametric statistics to seek out any associations. Of course, you are losing the precision and accuracy of continuous-level measurement and introducing measurement error into the outcome variable, but you will still be able to run inferential statistics.
4. You can use non-parametric statistics without changing the skewed variable at all. That is one of the primary benefits of non-parametric statistics: They are robust to violations of normality and homogeneity of variance. Instead of interpreting means and standard deviations, you will interpret medians and interquartile ranges with non-parametric statistics.
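As one illustration of the fourth option, the sketch below compares a skewed outcome between two independent groups without transforming it. The Mann-Whitney U test is just one example of a non-parametric statistic, chosen here as an assumption; the two wards, their values, and the helper `median_iqr` are all hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical length-of-stay values (days) for two independent groups of patients.
ward_a = np.array([1, 1, 2, 2, 2, 3, 3, 4, 5, 30], dtype=float)
ward_b = np.array([3, 4, 5, 6, 7, 8, 10, 14, 21, 90], dtype=float)


def median_iqr(x):
    """Return (median, first quartile, third quartile) for reporting."""
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return float(median), float(q1), float(q3)


# The skewed outcome is analyzed as-is; no transformation or deletion is applied.
test = mannwhitneyu(ward_a, ward_b, alternative="two-sided")

summary_a = median_iqr(ward_a)
summary_b = median_iqr(ward_b)

print(f"U={test.statistic:.1f}, p={test.pvalue:.4f}")
print("ward A median (Q1, Q3):", summary_a)
print("ward B median (Q1, Q3):", summary_b)
```

The extreme stays of 30 and 90 days remain in the analysis, and the groups are described with medians and interquartile ranges rather than means and standard deviations.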
Click on the Statistics button to learn more.
Eric Heidel, Ph.D. is Owner and Operator of Scalë, LLC.