P-value - Eric Heidel, PhD PStat - Statistician For Hire


Published on

December 3, 2014

The assumption of independence of observations

Generalized Estimating Equations (GEE) Homogeneity Of Variance Independence Of Observations Assumption Kurtosis Levene's Test Non-parametric Statistics Normality Outliers P-value Skewness Statistical Assumptions

Independence of observations

Each participant in a sample can only be counted as one observation

As a biostatistician, I spend a lot of time testing for normality and homogeneity of variance.

Skewness and kurtosis statistics are used to assess the normality of a continuous variable's distribution. A skewness or kurtosis statistic above an absolute value of 2.0 is commonly taken as evidence of non-normality. Distributions are often non-normal because of outliers. By a common convention, any observation that falls more than 3.29 standard deviations from the mean (|z| > 3.29) is considered an outlier.
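The rules of thumb above can be sketched in a few lines of Python. This is a minimal illustration, assuming "kurtosis" means excess kurtosis (0 for a normal curve), which is what most packages report. The helper names are hypothetical, and the simple moment-based formulas used here differ slightly from the bias-adjusted versions that packages such as SPSS print, especially in small samples.

```python
from statistics import mean, stdev


def _central_moment(xs, k):
    """k-th central moment, dividing by n."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (about 0 for normal data)."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3


def normality_flags(xs, limit=2.0):
    """Apply the |statistic| > 2.0 rule of thumb to skewness and kurtosis."""
    return abs(skewness(xs)) > limit, abs(excess_kurtosis(xs)) > limit


def outliers(xs, cutoff=3.29):
    """Values whose z-score (computed with the sample SD) exceeds the cutoff."""
    m, sd = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) / sd > cutoff]


# Hypothetical data: 19 well-behaved values plus one extreme value.
data = [9, 10, 11] * 6 + [10, 100]
print(normality_flags(data))  # (skewness flag, kurtosis flag)
print(outliers(data))
```

Here the single extreme value drives both flags, which illustrates the point that non-normality is often caused by outliers.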

Levene's Test of Equality of Variances is used to test the assumption of homogeneity of variance. A Levene's Test p-value below .05 means that the assumption has been violated. When the assumption is violated, non-parametric tests can be employed.
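To show what Levene's Test actually measures, here is a hand-rolled sketch of its W statistic: it runs an ANOVA on the absolute deviations of each observation from its own group mean. The function name `levene_w` is hypothetical. In practice, `scipy.stats.levene(*groups, center="mean")` returns this same statistic together with its p-value; SciPy's default `center="median"` is the Brown-Forsythe variant.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic, using deviations from each group's mean.

    Under equal variances, W follows an F distribution on (k - 1, N - k)
    degrees of freedom; a large W (small p-value) signals unequal variances.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_group_means = [mean(zg) for zg in z]
    z_grand_mean = sum(sum(zg) for zg in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zg) * (zm - z_grand_mean) ** 2 for zg, zm in zip(z, z_group_means))
    within = sum((zij - zm) ** 2 for zg, zm in zip(z, z_group_means) for zij in zg)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: identical spread gives W = 0; very different spread gives a larger W.
print(levene_w([1, 2, 3], [4, 5, 6]))
print(levene_w([1, 2, 3], [0, 5, 10]))
```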

There is one more important statistical assumption that applies alongside the two above: the assumption of independence of observations. Simply stated, this assumption stipulates that study participants are independent of each other in the analysis. Each participant is counted only once.

In between-subjects designs, each study participant is a mutually exclusive observation that is completely independent from all other participants in all other groups.

In within-subjects designs, each participant is still independent of the other participants; there are simply multiple observations of the outcome per participant.

That being said, it is common for researchers to take multiple measurements of an outcome and compare them either in an independent (between-subjects) fashion, oftentimes with differing numbers of observations across participants, or within-subjects (ALWAYS with differing numbers of observations of the outcome). These measurements are not independent, so analyzing them this way violates the assumption of independence of observations. What is one to do?

The answer is generalized estimating equations (GEE). This family of models accounts for multiple (correlated) observations of an outcome per participant and can be used for between-subjects, within-subjects, factorial, and multivariate analyses.
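Why repeated observations matter can be seen in the simplest possible GEE: an intercept-only model (identity link, independence working correlation), which estimates the overall mean. The sketch below compares the naive standard error, which treats every row as independent, with the GEE sandwich (robust) standard error, which sums residuals within each participant before squaring them. The function name and data are hypothetical, and the sketch omits the small-sample corrections a real package applies; in practice one would fit the model with a library such as statsmodels (its `GEE` class).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def naive_and_robust_se(participant_ids, outcomes):
    """Estimate of the overall mean with a naive SE and a GEE sandwich SE.

    naive:  treats every row as an independent observation.
    robust: intercept-only GEE with independence working correlation;
            residuals are summed within each participant (cluster).
    """
    n = len(outcomes)
    beta = mean(outcomes)  # GEE estimate of the mean under independence working correlation
    naive = stdev(outcomes) / sqrt(n)
    cluster_sums = defaultdict(float)
    for pid, y in zip(participant_ids, outcomes):
        cluster_sums[pid] += y - beta
    meat = sum(s ** 2 for s in cluster_sums.values())
    robust = sqrt(meat) / n  # the "bread" is 1/n for an intercept-only model
    return beta, naive, robust


# Hypothetical data: two participants, each measured twice, with similar values within person.
print(naive_and_robust_se(["a", "a", "b", "b"], [1, 1, 3, 3]))
```

Because each participant's two measurements move together, the robust standard error is larger than the naive one: counting correlated rows as independent overstates the precision.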
Published on

November 23, 2014

Logarithmic transformations for skewed variables

Interquartile Range Logarithmic Transformations Median Normality P-value

Logarithmic transformations adjust skewed distributions

Analyze skewed data using more powerful parametric statistics

Logarithmic transformations are powerful statistical tools when employed and interpreted correctly. Transforming a continuous variable that violates normality allows researchers to account for outlying observations and use more powerful parametric statistics to assess significant associations.

Also, some continuous variables are naturally skewed. One outcome that is prevalent in medicine is length of stay (LOS) in the hospital. Most patients will be in the hospital between one and three days; VERY FEW will be in the hospital for weeks or months at a time. To include these outlying patients in analyses, a transformation must be performed. With a transformation, naturally skewed variables can be analyzed with parametric statistics!
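A minimal sketch of the idea, using hypothetical LOS values in days: take the natural log of each value and compare skewness before and after. It assumes every stay is at least one day so that the log is defined; zero-day stays would need separate handling. The moment-based skewness formula is a simplification of what statistical packages report.

```python
from math import log
from statistics import mean


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical LOS in days: most stays are short, a few are very long.
los_days = [1, 1, 2, 2, 3, 3, 4, 10, 30]
log_los = [log(x) for x in los_days]  # natural log; log10 behaves the same way here

print(f"raw skewness: {skewness(los_days):.2f}")
print(f"log skewness: {skewness(log_los):.2f}")
```

The long stays are kept in the data; the transformation pulls them in toward the rest of the distribution instead of discarding them.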

An important thing to remember when conducting logarithmic transformations is that only the p-value associated with inferential statistics can be interpreted, NOT the means and standard deviations of the transformed observations. Instead, researchers should report the median and interquartile range of the original, untransformed distribution.
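The descriptive side of this can be sketched with Python's `statistics` module, using the same kind of hypothetical LOS data. Because the log is a monotonic transformation, it does not change the ordering of observations, so the median of the logged values back-transforms exactly to the original median (for an odd number of observations). That is one reason the median and IQR on the original scale remain interpretable. The quartiles below use `quantiles`' default "exclusive" method; other packages may compute quartiles slightly differently.

```python
from math import exp, log
from statistics import median, quantiles

# Hypothetical LOS in days.
los_days = [1, 1, 2, 2, 3, 3, 4, 10, 30]

q1, q2, q3 = quantiles(los_days, n=4)  # default method="exclusive"
print(f"median = {median(los_days)} days, IQR = {q1} to {q3} days")

# The log transform preserves order, so the median survives the round trip.
log_los = [log(x) for x in los_days]
print(exp(median(log_los)))
```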
Published on

November 18, 2014

Small sample sizes, Type II errors, and empirical reasoning

Accuracy Bonferroni Effect Size P-value Precision Sample Size Statistical-power-test Type II Error

Small sample sizes can lead to Type II errors

Significant effects may go undetected

In instances where a phenomenon or outcome is less prevalent in the population, scientists are forced to work with small sample sizes. It is just the nature of the science, and of the phenomenon or outcome.

1. When working with smaller sample sizes, adequate statistical power (and therefore the ability to detect statistical significance) is VERY hard to achieve.

2. There is limited precision and accuracy when using categorical or ordinal outcomes, which can further decrease statistical power.

3. When measuring small effect sizes, small samples cannot provide enough variance in the outcome to detect small but clinically meaningful effects. This REALLY decreases your statistical power, since inferential statistics depend upon variance in the mathematical sense.
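The three points above can be made concrete with an approximate power calculation for comparing two group means. This sketch uses the normal approximation, power ≈ Φ(d·√(n/2) − z₁₋α/₂), where d is a standardized effect size (Cohen's d) and n is the sample size per group; the function name is hypothetical, and exact t-test power is slightly lower for very small samples.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation: Phi(d * sqrt(n / 2) - z_{1 - alpha/2}), ignoring the
    negligible chance of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * sqrt(n_per_group / 2) - z_crit)


# Power for a small (d = 0.3) and a large (d = 0.8) effect across sample sizes.
for n in (10, 20, 50, 100):
    print(n, round(approx_power(0.3, n), 2), round(approx_power(0.8, n), 2))
```

With small samples and small effects, power stays far below the conventional 0.80 target, which is exactly the setting where Type II errors become likely.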

With this being said, remember to interpret the p-values from RCT-level studies with small sample sizes in the context of the points above. If a treatment effect does not reach statistical significance but appears to be CLINICALLY SIGNIFICANT, with a p-value approaching significance (a possible Type II error), then perhaps more credence can be given to the effect.

If researchers run bivariate tests on 30 different outcomes with fewer than 20 observations and claim significance without a Bonferroni adjustment, throw the article out.
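The arithmetic behind that warning is short. This sketch assumes the 30 tests are independent and every null hypothesis is true; under those assumptions it computes the chance of at least one false positive at α = .05, and the Bonferroni per-test threshold (α divided by the number of tests) that keeps that familywise chance at or below .05. The function names are hypothetical.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / m


m = 30
print(f"unadjusted familywise error: {familywise_error(0.05, m):.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha(0.05, m):.4f}")
```

Without adjustment, roughly four out of five such studies would report at least one "significant" finding by chance alone.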
Published on

October 5, 2014

95% confidence intervals

95% Confidence Interval Adjusted Odds Ratio Confidence Interval Hazard Ratio Measurement Odds Ratio With 95% CI P-value Relative Risk Statistics

Precision and consistency of treatment effects

95% confidence intervals are dependent upon sample size

If there is ANY statistical calculation that holds true value for researchers and clinicians on a day-to-day basis, it is the 95% confidence interval wrapped around the findings of inferential analyses. Statistics is not an exact science in the way other mathematical sciences are: measurement error is inherent when measuring anything related to human beings, and FEW tried-and-true causal effects have been proven scientifically. The strength of statistics as a mathematical science is its ability to build confidence intervals around findings and put them into context.

Also, 95% confidence intervals act as the primary inference for unadjusted odds ratios, relative risk, hazard ratios, and adjusted odds ratios. If the confidence interval crosses over 1.0, the effect is non-significant. Wide 95% confidence intervals usually reflect small sample sizes and indicate decreased precision of the effect. Narrow 95% confidence intervals reflect increased precision and consistency of a treatment effect.
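As a sketch of how such an interval is built and read, the code below computes an unadjusted odds ratio from a 2x2 table with a Woolf (log-based) 95% confidence interval, then checks whether the interval crosses 1.0. The function names and counts are hypothetical, and tables with a zero cell would need a correction (commonly adding 0.5 to each cell), which this sketch omits.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf 95% confidence interval.

    2x2 layout:      outcome yes   outcome no
        exposed           a             b
        unexposed         c             d
    """
    or_ = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_) - z * se_log_or)
    upper = exp(log(or_) + z * se_log_or)
    return or_, lower, upper


def crosses_one(lower, upper):
    """A ratio-measure interval that contains 1.0 indicates a non-significant effect."""
    return lower <= 1.0 <= upper


# Hypothetical tables: a larger study and a very small one.
for table in [(20, 10, 10, 20), (5, 4, 4, 5)]:
    or_, lo, hi = odds_ratio_ci(*table)
    print(table, f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}", "crosses 1.0:", crosses_one(lo, hi))
```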

In essence, p-values should not be what people get excited about when it comes to statistical analyses. The interpretation of your findings within the context of the resulting population means, odds, risk, hazard, and 95% confidence intervals IS the real "meat" of applied statistics.
Published on

October 3, 2014

Chi-square p-values are not enough

95% Confidence Interval Chi-square Odds Ratio With 95% CI P-value Relative Risk Sampling Error

Chi-square p-value

Odds ratio with 95% confidence interval should be reported and interpreted

Most people who need statistics are focused only on the almighty p-value of less than .05. When running Chi-square analyses between a dichotomous categorical predictor and a dichotomous categorical outcome, p-values are not the primary inference that should be interpreted for practical purposes. The lack of precision and accuracy in categorical measures, coupled with sampling error, makes the p-values yielded from Chi-square analyses virtually worthless in the applied sense.

The correct statistic to run is an unadjusted odds ratio with 95% confidence interval. This is the best measure for interpreting the magnitude of the association between two dichotomous categorical variables collected in a retrospective fashion. Relative risk can be calculated when the association is assessed in a prospective fashion.
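The difference between the two measures can be sketched from a single 2x2 table: the odds ratio compares the odds of the outcome across exposure groups, while relative risk compares the proportions (risks). The function names and counts are hypothetical. Relative risk only makes sense when the row proportions estimate real risks, as in a prospective design that follows exposed and unexposed participants forward.

```python
def odds_ratio(a, b, c, d):
    """Odds of the outcome among exposed (a/b) divided by odds among unexposed (c/d)."""
    return (a / b) / (c / d)


def relative_risk(a, b, c, d):
    """Risk of the outcome among exposed divided by risk among unexposed.

    Rows are exposed (a = outcome, b = no outcome) and unexposed (c, d);
    meaningful when the design is prospective, so the row proportions are risks.
    """
    return (a / (a + b)) / (c / (c + d))


# Hypothetical table: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
print(odds_ratio(20, 80, 10, 90), relative_risk(20, 80, 10, 90))
```

When the outcome is common, the odds ratio sits further from 1.0 than the relative risk, so the two should not be used interchangeably.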

Whether the 95% confidence interval crosses over 1.0 dictates the significance of the association, and its width dictates the precision. With smaller sample sizes, the 95% confidence interval will be wider and less precise; larger sample sizes will yield more precise estimates.
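A quick way to see the sample-size effect is to keep the proportions in a hypothetical 2x2 table fixed and simply multiply every cell count. The sketch below reuses the Woolf interval for the odds ratio (the function name is hypothetical): the point estimate stays at 2.5, but the standard error of the log odds ratio shrinks with the square root of the sample size, so the interval narrows.

```python
from math import exp, log, sqrt


def or_ci_width(a, b, c, d, z=1.96):
    """Width of the Woolf 95% CI for the odds ratio of a 2x2 table."""
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or + z * se) - exp(log_or - z * se)


# Same proportions in every table (OR = 2.5); only the sample size changes.
for scale in (1, 4, 16):
    cells = [10 * scale, 20 * scale, 5 * scale, 25 * scale]
    print(f"N = {sum(cells)}: CI width = {or_ci_width(*cells):.2f}")
```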