Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size
My newest article, published in Scientifica, is now available for download online and on the Research Engineer website. Creating the Statistical Power engine of Research Engineer led me to write the article. Click on the Download Article button below to download a .pdf directly from the website, or click on the Statistical Power button to be taken to the aforementioned engine. Many thanks and regards to everyone who uses Research Engineer! -EH
Within-subjects designs increase statistical power
Each participant serves as their own control in within-subjects designs
Within-subjects designs increase statistical power because participants serve as their own control. Between-subjects designs require more observations of the outcome to effectively compare independent groups. Multivariate analyses further decrease statistical power in that many more observations of the outcome are needed to detect significant effects. At least 20-40 additional observations of the outcome have to be collected per variable entered into a simultaneous or hierarchical regression model in order to achieve adequate statistical power when trying to account for demographic, etiological, clinical, and confounding effects.
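The per-variable rule of thumb above turns into simple arithmetic. The sketch below is a minimal illustration, assuming the 20-40 range stated in this post; the function name is hypothetical, and a formal power analysis for the regression model remains the better basis for planning.

```python
def regression_sample_size_range(num_predictors, low_per_variable=20, high_per_variable=40):
    """Return (lower, upper) total observations under a per-variable rule of thumb.

    The default 20-40 observations per variable mirror the heuristic in the text;
    this is a planning shortcut, not a substitute for a formal power analysis.
    """
    if num_predictors < 1:
        raise ValueError("num_predictors must be at least 1")
    return num_predictors * low_per_variable, num_predictors * high_per_variable


# Hypothetical model: a treatment variable plus four demographic/clinical covariates.
print(regression_sample_size_range(5))
```

For that hypothetical five-variable model, the heuristic gives a planning range of 100 to 200 observations.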
Within-subjects designs, when coupled with continuous outcomes, large effect sizes, limited variance in the outcome, and a large sample size, greatly increase statistical power. Small effect sizes are also easier to detect using within-subjects statistics because participants serve as their own control. Within-subjects designs also provide more statistical power when small sample sizes are used.
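To see why participants serving as their own control helps, here is a rough sketch comparing the two designs with a normal approximation to the t-test. It assumes a standardized mean difference d, equal group sizes in the between-subjects case, and a correlation rho between each participant's two measurements in the within-subjects case. All function names and example values are hypothetical, and the t distribution would give slightly lower power at small n.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _two_sided_power(noncentrality, alpha):
    """Normal-approximation power of a two-sided test for a given noncentrality."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(noncentrality - z_crit) + _STD_NORMAL.cdf(-noncentrality - z_crit)


def between_subjects_power(d, n_per_group, alpha=0.05):
    """Two independent groups of equal size; d = mean difference / common SD."""
    return _two_sided_power(d * (n_per_group / 2) ** 0.5, alpha)


def within_subjects_power(d, n_participants, rho, alpha=0.05):
    """Each participant measured twice; rho = correlation between the two measurements.

    The SD of the difference scores is sigma * sqrt(2 * (1 - rho)), so a positive
    correlation (participants acting as their own control) shrinks the noise.
    """
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    d_z = d / (2 * (1 - rho)) ** 0.5
    return _two_sided_power(d_z * n_participants ** 0.5, alpha)


# Same hypothetical effect (d = 0.5) and 30 observations per condition in both designs.
print(round(between_subjects_power(0.5, 30), 3))
print(round(within_subjects_power(0.5, 30, rho=0.6), 3))
```

With rho = 0 the two functions give the same power for the same n per condition; the within-subjects advantage comes from the positive correlation between repeated measurements, and the within-subjects design reaches it with half as many people.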
Case series are used to study rare outcomes and generate hypotheses
Yield measures of effect size and test methodologies
Case series designs yield the lowest form of observational evidence. Researchers choose a series of cases in the population that share a similar characteristic and then analyze pertinent predictor, demographic, and clinical factors associated with the outcome in that group of cases. Case series designs are at times also called pilot studies.
Case series designs are often employed in basic science and pre-clinical research. They are useful for generating hypotheses, estimating effect sizes for future power analyses, and studying extremely rare outcomes.
Case series designs are an excellent choice for novice researchers looking to get their feet "wet" in empirical pursuits. They also prove their worth when evidence-based measures of effect do not exist in the literature. Running a small pilot study or case series can yield important measures of effect.
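As an example of pulling a measure of effect out of a small pilot study, the sketch below computes Cohen's d for two independent groups using the pooled standard deviation. The scores are invented for illustration, the function name is hypothetical, and with only a handful of cases per group the resulting d is a rough, imprecise estimate.

```python
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Hypothetical pilot scores (not real data).
treatment = [14, 17, 15, 19, 16, 18]
control = [12, 15, 13, 14, 16, 12]
print(round(cohens_d(treatment, control), 2))
```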
Small sample sizes can lead to Type II errors
Significant effects may go undetected
In instances where a phenomenon or outcome is less prevalent in the population, scientists are forced to work with small sample sizes. It is simply the nature of the science and of the phenomenon or outcome.
1. When working with smaller sample sizes, adequate statistical power (and therefore the ability to reach statistical significance) is VERY hard to achieve.
2. There is limited precision and accuracy when using categorical or ordinal outcomes, which further decreases statistical power.
3. When measuring for small effect sizes, small sample sizes cannot provide enough variance in the outcome to detect clinically meaningful but small effects. This REALLY decreases your statistical power, since inferential statistics depend upon variance in the mathematical sense.
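The points above can be made concrete with a quick calculation. This sketch uses a normal approximation to the two-sample t-test (an assumption; it is slightly optimistic at very small n) to show how power, and its complement the Type II error rate, change as a hypothetical small effect of d = 0.2 is studied with larger samples. The function name is illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_groups(d, n_per_group, alpha=0.05):
    """Normal-approximation power for comparing two independent group means."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


# A small effect (d = 0.2) studied with progressively larger hypothetical samples.
for n in (10, 20, 50, 100, 400):
    power = power_two_groups(0.2, n)
    print(f"n per group = {n:>3}: power = {power:.2f}, Type II error rate = {1 - power:.2f}")
```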
With this being said, remember to interpret the p-values yielded by RCT-level studies with small sample sizes in the context of the aforementioned points. If a treatment effect does not reach statistical significance but appears to be CLINICALLY SIGNIFICANT, with a p-value approaching significance (a possible Type II error), then perhaps the effect deserves more credence.
If researchers run bivariate tests on 30 different outcomes with fewer than 20 observations and claim significance without a Bonferroni adjustment, throw the article out.
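The arithmetic behind that warning is short. Assuming the 30 tests are independent and every null hypothesis is true, the chance of at least one false positive at alpha = 0.05 is 1 - 0.95^30, roughly 0.785. The Bonferroni adjustment divides alpha by the number of tests. The function names below are illustrative.

```python
def bonferroni_alpha(alpha, num_tests):
    """Per-test significance threshold under a Bonferroni adjustment."""
    return alpha / num_tests


def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests when no effects exist."""
    return 1 - (1 - alpha) ** num_tests


print(round(familywise_error_rate(0.05, 30), 3))  # unadjusted: roughly 0.785
print(round(bonferroni_alpha(0.05, 30), 5))       # Bonferroni per-test threshold
print(round(familywise_error_rate(bonferroni_alpha(0.05, 30), 30), 3))  # back under 0.05
```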
Mastery of the literature leads to relevant research questions
Become an expert in the empirical field of endeavor
There is nothing more important when designing and conducting research than being heavily invested in the associated knowledge base. Research questions are born and formulated out of the literature. One cannot argue for a "gap" in the literature unless he or she has put forth the time and effort to know all of the literature. The literature also makes hard decisions in the preliminary phases of study planning much easier.
Here is what the literature can do for you:
1. Give you an evidence-based measure of effect to use in an a priori power analysis. It will show more empirical rigor on your part if you use the values from the most current and highest-quality evidence available.
2. Help you choose the "gold standard" outcome that is most generalizable and applicable to your audience and peers. Using the best outcome measure available increases the internal validity of your study as well. If the same outcome is used in many studies, then it has more validity evidence to back it up. This, again, shows stronger empirical reasoning on your part.
3. Allow you to ask a question that is relevant and that will generate new knowledge. You will be able to pass the "So what?" question with ease when you know the literature. You will know what new knowledge needs to be generated and how it is relevant in the context of the existing literature.
4. Help you choose the correct research design to answer your research question. If you find that the literature only has observational evidence related to your area of interest, then you can make the informed decision to employ a more complex design to yield causal effects.
Feasible research in terms of scope, time, resources, and expertise
Changing the face of medicine versus completing a research study
I have conducted thousands of statistical consultations over the years and have worked with many novice resident researchers over that time. One cannot help but admire the spirit, energy, and motivation of young people wanting to make an impact on medicine through research. I enjoy the zeal and drive of bright people wanting to be physicians and researchers. This is a good thing!
That being said, I spend a lot of my time with novice researchers using deductive reasoning to hone their research questions into something tangible and feasible. They come into the office with an idea that will change medicine forever and we will be cruising around the Caribbean in a year! This has never been researched before! No one has ever done this before! Trust me, I want all of these proclamations to be true and I also want to change the face of medicine. Yet, most times it is just not feasible to do so given the time, resources, participants, competencies, and environment associated with the study.
I focus on a few primary areas when it comes to feasible research questions with my consultees:
1. Participant pool - Are there enough participants available in the immediate clinical or empirical environment to achieve adequate statistical power for inferential analyses? How will you recruit the participants? What are your inclusion and exclusion criteria? Inclusion and exclusion criteria may need to be modified to increase sample size.
2. Effect size - Small effect sizes require large sample sizes.
3. Research design - Retrospective designs are usually more feasible because the data already exist.
4. Communication - Research never occurs in isolation. Researchers should communicate and collaborate with their peers regarding their research projects. Attendings and academic physicians can give you ideas on how to feasibly conduct your research.
5. Time - What is the time frame for the study from inception to publication? How much time do you have to set aside for the research study? Does the completion of your research coincide with abstract deadlines of interest?
6. Power analysis - Conduct an a priori power analysis based on an evidence-based measure of effect to see if the study is feasible with regard to the sample size needed to achieve adequate power.
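Items 1, 2, and 6 can be combined into a quick feasibility check: compute the required sample for an evidence-based effect size and compare it with the available participant pool. This is a minimal sketch using the normal-approximation formula n = 2(z_{1-alpha/2} + z_{1-beta})^2 / d^2 for two equal independent groups. The pool size and function names are hypothetical, and a t-based calculator such as G*Power will typically return a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two independent means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def is_feasible(d, available_participants, alpha=0.05, power=0.80):
    """Return (feasible?, total required) for two equal groups drawn from one pool."""
    required_total = 2 * n_per_group(d, alpha, power)
    return required_total <= available_participants, required_total


# Hypothetical clinic that can enroll about 150 eligible patients during the study window.
for effect in (0.8, 0.5, 0.2):
    feasible, needed = is_feasible(effect, available_participants=150)
    print(f"d = {effect}: need {needed} participants, feasible = {feasible}")
```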
Evidence-based measures of effect
Use the empirical literature to your advantage
One of the most important things you can do when designing your study is to conduct an a priori power analysis. Doing so will tell you how many participants you will need in your sample to detect the effect size or treatment effect in your study.
Without an a priori calculation, you could waste months or years of your life conducting a study only to find out that you only needed 100 in each group to achieve significance. Conversely, you might conduct a study with only 50 patients and find out in a post hoc fashion that you would have needed 10,000 to detect your effect!
If you are using Research Engineer and G*Power to run your analyses, here are the things you will need:
1. An evidence-based measure of effect from the literature is the first thing you should seek out. Find a study that is theoretically, conceptually, or clinically similar to your own. Try to find a study that uses the same outcome you plan to use in your study.
2. Use the means, standard deviations, and proportions from these published studies as evidence-based measures of effect size to calculate how large a sample you will need. These values will be reported in the body of the results section or in tables within the manuscript. It shows more empirical rigor on your part if you conduct an a priori power analysis based on a well-known study in the field.
3. Plug these values into G*Power using the steps published on the sample size page to find out how many people you will need to collect for your study.
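Published summary statistics can be turned into the standardized effect sizes that power calculators expect. The sketch below, assuming two independent groups, computes Cohen's d from reported means, standard deviations, and group sizes, and Cohen's h from two reported proportions. The input values are hypothetical stand-ins for a results table, and the function names are illustrative rather than part of G*Power.

```python
import math


def d_from_summary(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d from reported group means, SDs, and sample sizes (pooled SD)."""
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2))
    return (mean_1 - mean_2) / pooled_sd


def h_from_proportions(p_1, p_2):
    """Cohen's h for two proportions (arcsine transformation)."""
    if not (0 <= p_1 <= 1 and 0 <= p_2 <= 1):
        raise ValueError("proportions must lie between 0 and 1")
    return 2 * math.asin(math.sqrt(p_1)) - 2 * math.asin(math.sqrt(p_2))


# Hypothetical values standing in for a published results table.
print(round(d_from_summary(24.0, 6.0, 40, 21.0, 6.0, 40), 2))  # equal SDs: 3 / 6
print(round(h_from_proportions(0.45, 0.30), 2))
```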
G*Power is a necessary tool for every researcher's toolkit
Easy statistical power and sample size calculations
Because I am trying to run an online business, I am fully Google-integrated. Through behind-the-scenes tracking measures, I see many search queries, in many different derivations, related to sample size calculation.
There is a free tool available to EVERYONE that allows you to calculate your own a priori and post hoc power analyses. It is called G*Power, and as your personal statistical consultant, I highly suggest you go to the following web address and download Version 3.0 to your device:
The researchers who developed this program have made a great contribution to science. It is truly a great and FREE program that can run a litany of different power analyses. You can find out in minutes how large a sample you need, given that you have an idea of the effect size you are attempting to detect in your study.
Use means, proportions, and variance measures from published studies in your field to have the most empirically rigorous hypothesized effect. Enter these values into G*Power and then adjust the variance and magnitude of the effect size to see how the required sample size changes.
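The same kind of what-if exploration can be sketched in a few lines. Assuming a two-group comparison and the normal-approximation sample-size formula (not G*Power's exact t-based routine), this hypothetical helper shows how the required n per group responds when the raw mean difference or the standard deviation is nudged around a published value.

```python
import math
from statistics import NormalDist


def n_per_group_raw(mean_difference, sd, alpha=0.05, power=0.80):
    """Normal-approximation n per group from a raw mean difference and a common SD."""
    z = NormalDist()
    d = mean_difference / sd
    return math.ceil(2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / d ** 2)


# Hypothetical published values: a 5-point difference with SD 10, then nearby scenarios.
for diff in (4, 5, 6):
    row = [n_per_group_raw(diff, sd) for sd in (8, 10, 12)]
    print(f"difference {diff}: n per group for SD 8/10/12 = {row}")
```

Larger differences shrink the required sample, and larger standard deviations inflate it, which is the variance effect described above.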
Click on the Sample Size button to access the methods of conducting and interpreting sample size calculations for ten different statistical tests.
Effect size, sample size, and statistical power
Choose an effect size to maximize statistical power and decrease sample size
Effect size, sample size, and statistical power are abstract empirical constructs, and each requires a strong conceptual working knowledge. The three constructs are also interdependent: a change in one will ALWAYS produce a predictable change in the others.
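The interdependence can be shown by solving for each quantity from the other two while holding alpha fixed. This sketch assumes a two-sided comparison of two equal independent groups and a normal approximation that keeps only the dominant rejection region; the three function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def _z_total(alpha, power):
    """Critical z for a two-sided alpha plus the z corresponding to the desired power."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)


def power_given(d, n_per_group, alpha=0.05):
    """Solve for power from effect size and sample size per group."""
    return Z.cdf(d * math.sqrt(n_per_group / 2) - Z.inv_cdf(1 - alpha / 2))


def n_given(d, power=0.80, alpha=0.05):
    """Solve for sample size per group (unrounded) from effect size and power."""
    return 2 * _z_total(alpha, power) ** 2 / d ** 2


def d_given(n_per_group, power=0.80, alpha=0.05):
    """Solve for the minimum detectable effect size from sample size and power."""
    return _z_total(alpha, power) * math.sqrt(2 / n_per_group)


print(math.ceil(n_given(0.5)))         # per-group n needed for d = 0.5
print(round(d_given(30), 2))           # smallest d detectable with 30 per group
print(round(power_given(0.5, 30), 2))  # power for d = 0.5 with 30 per group
```

Because the three functions are algebraic rearrangements of one equation, feeding the output of one into another recovers the original input.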
An effect size is the hypothesized difference expected by researchers in an a priori fashion between independent groups (between-subjects analysis), across time or observations (within-subjects analysis), or the magnitude and direction of association between constructs (correlations and multivariate analyses).
Effect size planning is perhaps the HARDEST part of designing a research study. Oftentimes, researchers have NO IDEA what size of effect they are trying to detect.
First and foremost, when researchers cannot state the hypothesized differences in their outcomes, an evidence-based measure of effect yielded from a published study that is theoretically or conceptually similar to the phenomenon of interest should be used. Using an evidence-based measure of effect in an a priori power analysis shows more empirical rigor on the part of the researchers and increases the internal validity of the study with the use of published values.
Sample size is the absolute number of participants that are sampled from a given population for purposes of running inferential statistics. The word "inferential" denotes the basic empirical reasoning that we are drawing a representative sample from a population and then conducting statistics in order to make inferences back to that population. An important part of preliminary study planning is to specify the inclusion and exclusion criteria for participation in your study and then to get an idea of how large a participant pool is available to you from which to draw a sample for purposes of running inferential statistics.
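The workflow described here (apply inclusion and exclusion criteria, see how large the eligible pool is, then draw a sample) can be sketched as follows. The patient records, the age-range inclusion criterion, and the single exclusion flag are all hypothetical, and simple random sampling is only one of several ways to draw a sample.

```python
import random
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: int
    age: int
    has_exclusion_condition: bool


def eligible_pool(patients, min_age=18, max_age=65):
    """Apply a hypothetical inclusion criterion (age range) and exclusion criterion."""
    return [p for p in patients
            if min_age <= p.age <= max_age and not p.has_exclusion_condition]


def draw_sample(pool, sample_size, seed=None):
    """Simple random sample without replacement from the eligible pool."""
    if sample_size > len(pool):
        raise ValueError(f"need {sample_size} participants but only {len(pool)} are eligible")
    return random.Random(seed).sample(pool, sample_size)


# Hypothetical registry of 500 patients generated for illustration.
rng = random.Random(1)
registry = [Patient(i, rng.randint(10, 90), rng.random() < 0.1) for i in range(500)]
pool = eligible_pool(registry)
sample = draw_sample(pool, 100, seed=42)
print(len(pool), len(sample))
```

Raising an error when the pool is too small mirrors the feasibility question: if the eligible pool cannot supply the required sample, the criteria or the design need revisiting.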
Because of the mathematics underlying inferential statistics, large sample sizes will drastically increase your chances of detecting a statistically significant finding; in other terms, they drastically increase your statistical power. Large sample sizes will also allow you to detect both large and small effect sizes, regardless of the scale of measurement of the outcome, research design, and/or magnitude, variance, and direction of the effect. Small sample sizes will decrease your chances of detecting statistically significant differences (statistical power), especially with categorical and ordinal outcomes, between-subjects and multivariate designs, and small effect sizes.
Statistical power is the chance you have as a researcher to reject the null hypothesis, given that the treatment effect actually exists in the population. Basically, statistical power is the chance you have of finding a significant difference or main effect when running statistical analyses. Statistical power is what you are interested in when you ask, "How many people do I need to find significance?"
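One way to make this definition tangible is a small Monte Carlo sketch: simulate many studies in which the effect truly exists and count how often the null hypothesis is rejected. The sketch assumes normally distributed outcomes with a known SD of 1 and uses a two-sample z-test; the function name, effect size, and sample size are hypothetical.

```python
import random
from statistics import NormalDist, fmean

Z = NormalDist()


def simulated_power(d, n_per_group, alpha=0.05, reps=2000, seed=0):
    """Estimate power as the share of simulated studies that reject the null.

    Each simulated study draws two groups from normal populations whose means
    truly differ by d standard deviations (SD = 1), then runs a two-sided
    z-test with the SD treated as known.
    """
    rng = random.Random(seed)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(reps):
        treated = [rng.gauss(d, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z_stat = (fmean(treated) - fmean(control)) / se
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / reps


print(simulated_power(0.5, 64))  # should land near the analytic value of about 0.80
```

Setting d = 0 makes the same function estimate the Type I error rate, which should land near alpha.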
In the applied empirical sense, measuring for large effect sizes increases statistical power. Trying to detect small effect sizes will decrease your statistical power. Continuous outcomes increase statistical power because of increased precision and accuracy in measurement. Categorical and ordinal outcomes decrease statistical power because of decreased variance and objectivity of measurement. Within-subjects designs generate more statistical power due to participants serving as their own controls. Between-subjects and multivariate designs require more observations to detect differences and therefore decrease statistical power.
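The claim about categorical outcomes can be illustrated by simulation. This sketch, under assumed normal data with d = 0.5 and 64 participants per group, analyzes each simulated study twice: once on the continuous score and once after splitting the score at 0 into a yes/no outcome compared with a pooled two-proportion z-test. Dichotomizing a continuous measure is only one route to a categorical outcome, and the function name and all values here are hypothetical.

```python
import random
from statistics import NormalDist, fmean

Z = NormalDist()


def compare_outcome_scales(d=0.5, n_per_group=64, alpha=0.05, reps=2000, seed=0):
    """Simulated power for one effect analyzed as a continuous score versus
    the same score split at 0 into a yes/no outcome.
    """
    rng = random.Random(seed)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    hits_continuous = hits_binary = 0
    for _ in range(reps):
        treated = [rng.gauss(d, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # Continuous analysis: two-sample z-test with known SD = 1.
        z_cont = (fmean(treated) - fmean(control)) / (2 / n_per_group) ** 0.5
        hits_continuous += abs(z_cont) > z_crit
        # Categorical analysis: share of scores above 0, pooled two-proportion z-test.
        p_t = sum(x > 0 for x in treated) / n_per_group
        p_c = sum(x > 0 for x in control) / n_per_group
        p_pool = (p_t + p_c) / 2
        se = (p_pool * (1 - p_pool) * 2 / n_per_group) ** 0.5
        if se > 0:
            hits_binary += abs(p_t - p_c) / se > z_crit
    return hits_continuous / reps, hits_binary / reps


continuous_power, binary_power = compare_outcome_scales()
print(f"continuous: {continuous_power:.2f}, dichotomized: {binary_power:.2f}")
```

In this setup the continuous analysis should reject the null noticeably more often than the dichotomized one, even though both analyze the same simulated participants.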
Eric Heidel, Ph.D. is Owner and Operator of Scalë, LLC.