Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size
My newest published article in Scientifica is now available for download online and on the Research Engineer website. Building the Statistical Power engine of Research Engineer led me to write the article. Many thanks and regards to everyone who uses Research Engineer! -EH
Writing survey items
Write survey items that cover content areas
Survey items are composed of item stems and response sets
When writing survey items that use Likert scales as response sets, use 5-point Likert scales in increasing order. A 5-point scale is preferable to 4-point, 3-point, or dichotomous scales because it leaves more room for variance in responses and includes a "neutral" rating.
Variance in the responses is needed in order to properly assess the diversity that may exist in a population. Increased variance is also important for the underlying mathematics associated with reliability analysis, exploratory factor analysis, validity analysis, and confirmatory factor analysis.
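To show why variance matters for reliability analysis, here is a minimal sketch of Cronbach's alpha, a standard internal-consistency statistic. Alpha compares the sum of the item variances with the variance of respondents' total scores, so it is undefined when everyone answers identically. The function name and the small response matrix are hypothetical, for illustration only.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])                      # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(r) for r in responses])
    if total_var == 0:
        # No variance in total scores: reliability cannot be estimated.
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical data: 5 respondents x 3 items scored on a 5-point scale.
responses = [[5, 4, 5], [4, 4, 4], [3, 3, 2], [2, 2, 3], [1, 2, 1]]
print(round(cronbach_alpha(responses), 4))  # 0.9375
```

Population variances are used throughout. Because alpha is a ratio of variances, using sample variances consistently instead gives the same result.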
Using 5-point Likert scales also works well aesthetically when structuring a survey. You can save space and administration time when items from similar content areas share the same 5-point Likert response set, because those items can then be formatted into a matrix.
Finally, present Likert response options in increasing order, from left to right:
Strongly Disagree, Disagree, Neither Agree Nor Disagree, Agree, Strongly Agree
Never, Rarely, Sometimes, Often, Always
Very Poor, Poor, Moderate, Good, Very Good
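The response sets above can be coded numerically for analysis. The sketch below assumes the common convention of scoring the leftmost option 1 and the rightmost 5, which follows directly from listing options in increasing order. It also scores a matrix of items that share one response set. The constant and function names are hypothetical.

```python
# Response sets from the text, listed in increasing order (left to right).
AGREEMENT = ["Strongly Disagree", "Disagree", "Neither Agree Nor Disagree",
             "Agree", "Strongly Agree"]
FREQUENCY = ["Never", "Rarely", "Sometimes", "Often", "Always"]
QUALITY = ["Very Poor", "Poor", "Moderate", "Good", "Very Good"]

def likert_codes(response_set):
    """Map each label to its 1-based position, so increasing order -> 1..5."""
    return {label: i for i, label in enumerate(response_set, start=1)}

def score_matrix(rows, response_set):
    """Score a matrix of items that share one response set.

    rows: one list per respondent, holding one label per item.
    """
    codes = likert_codes(response_set)
    return [[codes[label] for label in row] for row in rows]

if __name__ == "__main__":
    answers = [["Agree", "Strongly Agree"],
               ["Disagree", "Neither Agree Nor Disagree"]]
    print(score_matrix(answers, AGREEMENT))  # [[4, 5], [2, 3]]
```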
95% confidence intervals
Precision and consistency of treatment effects
95% confidence intervals are dependent upon sample size
If there is ANY statistical calculation that holds true value for researchers and clinicians on a day-to-day basis, it is the 95% confidence interval wrapped around the findings of inferential analyses. Statistics is not an exact science in the way other mathematical sciences are: measurement error is inherent in measuring anything related to human beings, and FEW tried-and-true causal effects have been proven scientifically. The strength of statistics as a mathematical science lies in its ability to build confidence intervals around findings and put them into a relative context.
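As a concrete illustration, here is a minimal sketch of a 95% confidence interval around a sample mean. It assumes the large-sample normal approximation (critical value of about 1.96). For small samples, a t critical value would be more appropriate, and Python's standard library does not provide one. With the spread held fixed, the interval narrows as n grows, which is why 95% confidence intervals depend on sample size. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

def half_width(sd, n):
    """Half-width of a large-sample 95% CI for a mean: 1.96 * SD / sqrt(n)."""
    return Z95 * sd / sqrt(n)

def mean_ci95(data):
    """Return (lower, upper) for the 95% CI around the sample mean."""
    m = mean(data)
    hw = half_width(stdev(data), len(data))
    return m - hw, m + hw

# Same spread (SD = 10), growing samples: each 4x increase in n halves the width.
for n in (25, 100, 400):
    print(n, round(half_width(10, n), 2))  # 25 3.92 / 100 1.96 / 400 0.98
```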
Also, 95% confidence intervals act as the primary inference associated with unadjusted odds ratios, relative risk, hazard ratios, and adjusted odds ratios. If the 95% confidence interval includes 1.0 (the value of no effect), the effect is not statistically significant at the 0.05 level. Wide 95% confidence intervals typically reflect small sample sizes and indicate decreased precision of the effect. Narrow 95% confidence intervals reflect increased precision and consistency of a treatment effect.
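The sketch below applies this reasoning to an unadjusted odds ratio from a 2x2 table. It uses the Woolf (log-scale) interval, one standard method among several. Both hypothetical tables yield the same odds ratio, but the smaller one produces a wider interval that includes 1.0. The cell counts and function names are invented for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

def odds_ratio_ci95(a, b, c, d):
    """Unadjusted odds ratio with a Woolf (log-scale) 95% CI from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome (all > 0).
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - Z95 * se_log_or)
    upper = exp(log(odds_ratio) + Z95 * se_log_or)
    return odds_ratio, lower, upper

def excludes_null(lower, upper, null=1.0):
    """True when the interval does not include the null value of 1.0."""
    return not (lower <= null <= upper)

# Hypothetical counts: identical odds ratio (about 2.67), different sample sizes.
for table in [(20, 30, 10, 40), (4, 6, 2, 8)]:
    or_, lo, hi = odds_ratio_ci95(*table)
    print(table, round(or_, 2), (round(lo, 2), round(hi, 2)), excludes_null(lo, hi))
```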
In essence, p-values should not be what people get excited about when it comes to statistical analyses. Interpreting your findings in the context of the estimated population means, odds, risk, hazards, and 95% confidence intervals IS the real "meat" of applied statistics.
Effect size, sample size, and statistical power
Choose an effect size to maximize statistical power and decrease sample size
Effect size, sample size, and statistical power are nebulous empirical constructs, and each requires a strong working conceptual knowledge. There are also basic interdependent relationships among the three. Holding the significance level (alpha) constant, fixing any two of them determines the third, so a change in one will ALWAYS produce a predictable change in the others.
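The interdependence can be made concrete with a minimal sketch for comparing two independent group means. It assumes a two-sided test at alpha = 0.05 and uses the normal (z) approximation. Given an effect size d and a per-group n, it returns power. Given d and a target power, it returns the per-group n. Exact t-based calculations give slightly larger sample sizes. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()

def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for effect size d."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # Ignores the negligible probability of rejecting in the opposite tail.
    return Z.cdf(d * sqrt(n_per_group / 2) - z_crit)

def n_per_group(d, power=0.80, alpha=0.05):
    """Per-group sample size needed to reach the target power for effect size d."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)

# Fix power at 0.80: smaller effects demand larger samples.
for d in (0.8, 0.5, 0.2):
    print(d, n_per_group(d))  # 0.8 -> 25, 0.5 -> 63, 0.2 -> 393
```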
An effect size is the difference that researchers hypothesize a priori between independent groups (between-subjects analysis) or across time or observations (within-subjects analysis), or the magnitude and direction of association between constructs (correlations and multivariate analyses).
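The three situations above correspond to commonly used standardized effect sizes. The sketch below assumes Cohen's d with a pooled SD for independent groups, d_z (mean paired difference over the SD of the differences) for repeated observations, and Pearson's r for association. These are standard choices, not necessarily the ones the Research Engineer engine uses. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Between-subjects: difference in means divided by the pooled SD."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * stdev(group1) ** 2 +
                  (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

def cohens_dz(before, after):
    """Within-subjects: mean of the paired differences divided by their SD."""
    diffs = [post - pre for pre, post in zip(before, after)]
    return mean(diffs) / stdev(diffs)

def pearson_r(x, y):
    """Association: Pearson's r gives the magnitude and sign (direction)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(cohens_d([5, 6, 7], [3, 4, 5]))  # 2.0
```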
Effect size planning is perhaps the HARDEST part of designing a research study. Oftentimes, researchers have NO IDEA what effect size they are trying to detect.
First and foremost, when researchers cannot state the hypothesized differences in their outcomes, they should use an evidence-based measure of effect taken from a published study that is theoretically or conceptually similar to the phenomenon of interest. Using an evidence-based measure of effect in an a priori power analysis demonstrates more empirical rigor on the part of the researchers and strengthens the justification for the study design by grounding it in published values.
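Published studies rarely report the effect size in the exact form a power analysis needs. The sketch below shows standard conversions to Cohen's d from group means and SDs, from an independent-samples t statistic, and from an odds ratio. The last one uses the widely cited logistic approximation ln(OR) * sqrt(3) / pi, which assumes an underlying logistic distribution. The function names and example numbers are hypothetical.

```python
from math import log, pi, sqrt

def d_from_means(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from published group means, SDs, and sample sizes (pooled SD)."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def d_from_t(t, n1, n2):
    """Cohen's d from a published independent-samples t statistic."""
    return t * sqrt(1 / n1 + 1 / n2)

def d_from_odds_ratio(odds_ratio):
    """Approximate d from a published odds ratio: ln(OR) * sqrt(3) / pi."""
    return log(odds_ratio) * sqrt(3) / pi

# Hypothetical published study: means 24 vs 21, SD 6 in both groups, n = 40 each.
print(round(d_from_means(24.0, 6.0, 40, 21.0, 6.0, 40), 2))  # 0.5
```

The resulting d can then serve as the effect size input to an a priori power analysis.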
Sample size is the absolute number of participants sampled from a given population for the purpose of running inferential statistics. The word "inferential" reflects the basic empirical reasoning involved: we draw a representative sample from a population and then run statistics to make inferences back to that population. An important part of preliminary study planning is to specify the inclusion and exclusion criteria for participation in your study and then get an idea of how large a participant pool is available for drawing a sample.
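These planning steps can be sketched as follows: apply the inclusion and exclusion criteria to a participant pool, check whether enough eligible people remain, and then draw a simple random sample. The pool, the age and consent criteria, and the function names are all hypothetical and stand in for whatever a real study specifies.

```python
import random

# Hypothetical participant pool, for illustration only.
pool = [{"id": i, "age": 18 + (i * 7) % 60, "consented": i % 5 != 0}
        for i in range(500)]

def eligible(person):
    """Hypothetical criteria: include consenting adults aged 18 to 65."""
    return 18 <= person["age"] <= 65 and person["consented"]

def draw_sample(pool, criteria, n, seed=None):
    """Simple random sample of n people who meet the criteria."""
    eligible_pool = [p for p in pool if criteria(p)]
    if len(eligible_pool) < n:
        raise ValueError(f"only {len(eligible_pool)} eligible participants; {n} needed")
    rng = random.Random(seed)
    return rng.sample(eligible_pool, n)

sample = draw_sample(pool, eligible, 100, seed=1)
print(len([p for p in pool if eligible(p)]), "eligible;", len(sample), "sampled")
```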
Because the standard error of an estimate shrinks as sample size grows (in proportion to 1/√n), large sample sizes will drastically increase your chances of detecting a statistically significant finding; in other terms, they drastically increase your statistical power. Sufficiently large samples will also allow you to detect both large and small effect sizes, regardless of the scale of measurement of the outcome, the research design, and/or the magnitude, variance, and direction of the effect. Small sample sizes will decrease your chances of detecting statistically significant differences (statistical power), especially with categorical and ordinal outcomes, between-subjects and multivariate designs, and small effect sizes.
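To see the sample size effect directly, the sketch below tracks the standard error of a difference between two group means and the approximate power to detect a small effect (d = 0.2) as the per-group n grows. It assumes equal group sizes, an SD of 1, a two-sided alpha of 0.05, and the normal approximation. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def se_mean_difference(sd, n_per_group):
    """Standard error of a difference between two group means (equal SD and n)."""
    return sd * sqrt(2 / n_per_group)

def approx_power(d, n_per_group, alpha=0.05):
    """Two-sided two-sample z-test power; d is the standardized effect size."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(d / se_mean_difference(1.0, n_per_group) - z_crit)

# Columns: n per group, standard error, power for a small effect (d = 0.2).
for n in (20, 50, 100, 400, 1000):
    print(n, round(se_mean_difference(1.0, n), 3), round(approx_power(0.2, n), 2))
```

Quadrupling n halves the standard error, and power for the small effect climbs from low values to near certainty.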
Statistical power is the probability that you, as a researcher, will reject the null hypothesis, given that the treatment effect actually exists in the population. Basically, statistical power is your chance of finding a significant difference or main effect when one truly exists. Statistical power is what you are interested in when you ask, "How many people do I need to find significance?"
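The definition can be checked by simulation. The sketch below generates many hypothetical two-group studies in which a true effect of size d exists. It tests each study with a two-sided z statistic at alpha = 0.05 and reports the share of studies that reject the null hypothesis. With d = 0 it instead estimates the Type I error rate. The normal data, equal group sizes, and function name are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def simulated_power(d, n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Share of simulated studies that reject H0 when the true effect is d."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        # Standard error of the mean difference, from the sample SDs.
        se = sqrt(stdev(control) ** 2 / n_per_group + stdev(treated) ** 2 / n_per_group)
        if abs(mean(treated) - mean(control)) / se > z_crit:
            rejections += 1
    return rejections / n_sims

print(simulated_power(0.5, 64))  # close to 0.80 for a medium effect, 64 per group
```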
In applied terms, measuring for large effect sizes increases statistical power, while trying to detect small effect sizes decreases it. Continuous outcomes increase statistical power because of increased precision and accuracy in measurement. Categorical and ordinal outcomes decrease statistical power because of decreased variance and less precise measurement. Within-subjects designs generate more statistical power because participants serve as their own controls, which removes between-person variability from the comparison. Between-subjects and multivariate designs require more observations to detect differences and therefore have less statistical power at a given sample size.
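The within-subjects advantage can be quantified with a minimal sketch. It compares the total sample needed for two independent groups with the sample needed when the same people are measured twice, at 80% power and two-sided alpha = 0.05. It uses the normal approximation and assumes a hypothetical correlation rho between the repeated measurements: the SD of the paired differences is sqrt(2 * (1 - rho)) in SD units, so a higher rho means a smaller N. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()

def n_between_total(d, power=0.80, alpha=0.05):
    """Total N for two independent groups (per-group n doubled)."""
    k = (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) / d
    return 2 * ceil(2 * k ** 2)

def n_within(d, rho, power=0.80, alpha=0.05):
    """N for a paired design; rho is the correlation between the two measurements."""
    dz = d / sqrt(2 * (1 - rho))   # effect size on the paired differences
    k = (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) / dz
    return ceil(k ** 2)

print(n_between_total(0.5))  # 126 (63 per group)
print(n_within(0.5, 0.5))    # 32
print(n_within(0.5, 0.8))    # 13
```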
Eric Heidel, Ph.D. is Owner and Operator of Scalë, LLC.