Statistical-power-test - Eric Heidel, PhD PStat

Tags

Published on

April 1, 2015

Categorical measurement caveats

95% Confidence Interval Categorical Diagnostic Testing Inter-rater Reliability Intraclass Correlation Coefficient Kappa Statistic Multivariate Statistics Negative Predictive Value Odds Ratio With 95% CI Positive Predictive Value Relative Risk Sample Size Sensitivity Specificity Statistical-power-test

Effects of categorical measurement

Decrease statistical power and increase sample size

Categorical variables are very prevalent in medicine. Measures like presence of comorbidities, mortality, and test results are categorical in nature. Here are some general caveats associated with categorical measurement and sample size:

1. Compared with continuous outcomes, categorical outcomes will DECREASE statistical power and INCREASE the needed sample size. This is due to the lack of precision and accuracy in categorical measurement.

2. The underlying algebra associated with calculating 95% confidence intervals of odds ratios and relative risk depends directly on the sample size. With smaller sample sizes, wider and less precise 95% confidence intervals will be found. If one of the cells of a cross-tabulation table has far fewer observations than the other cells, then the 95% confidence interval will be wider and potentially not truly interpretable. A 95% confidence interval will become narrower, or more precise, only with larger sample sizes.

3. When using categorical variables for diagnostic testing purposes, larger sample sizes will be needed to calculate precise measures of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). With smaller sample sizes in diagnostic studies, a change in one or two observations can have drastic effects on the diagnostic values.

This is especially true when a subjective rating is used to diagnose someone as "positive" or "negative" for a given disease state (for example, a radiologist reading an X-ray). Inter-rater reliability coefficients such as Kappa or ICC should be employed to ensure consistency and reliability among subsequent ratings and raters. Sensitivity, specificity, and PPV will be affected by inter-rater reliability. Receiver Operating Characteristic (ROC) curves can be used to find the cutoff value where the sensitivity and specificity of a test are maximized. ROC curves can also be used to compare the area under the curve (AUC) between several diagnostic tests at the same time so that the best can be chosen.

4. For each categorical predictor parameter (or variable) that you want to include in a multivariate model, you have to increase your sample size by at least 20-40 observations of the outcome. This is due to the limited precision, accuracy, and statistical power associated with categorical measurement. Researchers HAVE to collect more observations in order to detect any potential significant multivariate associations.

In the case that a polychotomous variable is to be used in a model, create (a-1) dichotomous variables, where a is the number of categories, with "1" coded as being in that category and "0" as not being in that category. The category left without its own variable serves as the reference group. For each level, 20-40 more observations of the outcome will be needed to have enough statistical power to detect differences amongst the multiple groups.
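Caveat 2 in the list above ties the width of a 95% confidence interval for an odds ratio to the sample size and to the smallest cell of the cross-tabulation. Here is a minimal sketch of that dependence, assuming the standard log-scale (Wald) interval. The function name and all cell counts are made up for illustration and do not come from a real study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI (computed on the log scale) for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    odds_ratio = (a * d) / (b * c)
    # The standard error depends only on the four cell counts,
    # so a single small cell dominates the sum.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical tables: the first two have the same odds ratio,
# but the second has ten times the observations.
small = odds_ratio_ci(10, 20, 5, 25)
large = odds_ratio_ci(100, 200, 50, 250)
# Hypothetical table with many observations overall but one sparse cell.
sparse = odds_ratio_ci(100, 200, 2, 250)

for label, (est, lo, hi) in (("small", small), ("large", large), ("sparse", sparse)):
    print(f"{label}: OR = {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

In this sketch the small and large tables share the same odds ratio, yet only the large table yields an interval that excludes 1. The sparse table shows that a single cell with very few observations can widen the interval even when the total sample is large.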
Published on

January 8, 2015

Research Engineer is the world's first online decision tree for applied research and statistics

Database Management Diagnostic Testing Education Epidemiology Evidence-based Medicine Psychometric Tests Research Design Research Engineer Research Question Statistical-power-test Statistics Survey Variables

Fully automated and freely accessible to researchers around the world

The first interactive decision tree that integrates statistical assumptions and post hoc analyses

Research Engineer is going to be presented for the first time in a public forum next Tuesday. I'm pretty excited to let all of my colleagues know what I've been up to these past five months. I realized earlier today that Research Engineer has completely changed my life for the better. And I'm so thankful to all of those that have supported me along the way.

And to visitors of this website, I extend my most gracious and humble thanks for your patronage. The website will continue to grow and help you in all of your future empirical endeavors.

I have built the world's first online decision engine for research questions, research designs, statistics, statistical power, databases, evidence-based medicine, survey design, psychometrics, epidemiology, diagnostic testing, variables, and education. I look forward to the future!
Published on

November 18, 2014

Small sample sizes, Type II errors, and empirical reasoning

Accuracy Bonferroni Effect Size P-value Precision Sample Size Statistical-power-test Type II Error

Small sample sizes can lead to Type II errors

Significant effects may not be able to be detected

In instances where a phenomenon or outcome is less prevalent in the population, scientists are forced to work with small sample sizes. It is just the nature of the science, and the phenomenon or outcome.

1. When working with smaller sample sizes, adequate statistical power (and therefore statistical significance) is VERY hard to achieve.

2. There is limited precision and accuracy when using categorical or ordinal outcomes, which can further decrease statistical power.

3. When measuring for small effect sizes, small sample sizes cannot provide enough variance in the outcome to detect clinically meaningful but small effects. This REALLY decreases your statistical power since inferential statistics depend upon variance in the mathematical sense.
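The three points above can be made concrete with a rough power calculation. This is a minimal sketch, assuming a two-sided comparison of two group means, a standardized effect size (Cohen's d), and the normal approximation rather than an exact t-based calculation. The function name, sample sizes, and effect sizes are chosen only for illustration.

```python
from statistics import NormalDist


def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    d is the standardized mean difference (Cohen's d).
    Uses the normal approximation, so values are slightly optimistic
    for very small samples.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    noncentrality = d * (n_per_group / 2) ** 0.5
    return norm.cdf(noncentrality - z_crit) + norm.cdf(-noncentrality - z_crit)


# Power for a small (d = 0.2) and a large (d = 0.8) effect at several sample sizes.
for n in (10, 20, 50, 200, 400):
    small_effect = approx_power_two_sample(0.2, n)
    large_effect = approx_power_two_sample(0.8, n)
    print(f"n per group = {n:3d}: power(d=0.2) = {small_effect:.2f}, power(d=0.8) = {large_effect:.2f}")
```

Under these assumptions, a small effect studied with 10 or 20 participants per group has very little chance of reaching significance, which is the Type II error risk described above.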

With this being said, remember to interpret the p-values yielded from RCT level studies with small sample sizes in the context of the aforementioned points. If a treatment effect does not obtain statistical significance but appears to be CLINICALLY SIGNIFICANT, with a p-value approaching significance, then a Type II error is plausible and perhaps more credence can be given to the effect.

If researchers run bivariate tests on 30 different outcomes with fewer than 20 observations and claim significance without a Bonferroni adjustment, throw the article out.
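To see why the Bonferroni adjustment matters with 30 outcomes, here is a minimal sketch. It assumes the 30 tests are independent when computing the chance of at least one false positive, which real outcomes measured on the same participants usually are not. The function names and the p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni adjustment."""
    return alpha / n_tests


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests unadjusted tests.

    Assumes the tests are independent, which is a simplification.
    """
    return 1 - (1 - alpha) ** n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after the adjustment."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


n_outcomes = 30
print(f"Adjusted threshold: {bonferroni_threshold(0.05, n_outcomes):.5f}")
print(f"Unadjusted family-wise error rate: {familywise_error_rate(0.05, n_outcomes):.2f}")

# Made-up p-values, for illustration only.
print(bonferroni_significant([0.001, 0.03, 0.2]))
```

With 30 unadjusted tests at an alpha of 0.05, the chance of at least one spurious "significant" finding is high under these assumptions, so an unadjusted p-value just under 0.05 carries little weight.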
Published on

October 31, 2014

Feasible research questions are answerable

Effect Size Feasible Research Questions Research Design Research Question Sample Size Statistical Power Analysis Statistical-power-test

Feasible research in terms of scope, time, resources, and expertise

Changing the face of medicine versus completing a research study

I have conducted thousands of statistical consultations over the years and have worked with many novice resident researchers over that time. One cannot help but admire the spirit, energy, and motivation of young people wanting to make an impact on medicine through research. I enjoy the zeal and drive of bright people wanting to be physicians and researchers. This is a good thing!

That being said, I spend a lot of my time with novice researchers using deductive reasoning to hone down their research questions into something tangible and feasible. They come into the office with an idea that will change medicine forever and we will be cruising around the Caribbean in a year! This has never been researched before! No one has ever done this before! Trust me, I want all of these proclamations to be true and I also want to change the face of medicine. Yet, most times it is just not feasible to do so given the time, resources, participants, competencies, and environment associated with the study.

I focus on a few primary areas when it comes to feasible research questions with my consultees:

1. Participant pool - Are there enough participants available in the immediate clinical or empirical environment to achieve adequate statistical power for inferential analyses? How will you recruit the participants? What are your inclusion and exclusion criteria? Inclusion and exclusion criteria may need to be modified to increase sample size.

2. Effect size - Small effect sizes require large sample sizes.

3. Research design - Retrospective designs are often more feasible because the data already exist.

4. Communication - Research never occurs in isolation. Researchers should communicate and collaborate with their peers regarding their research projects. Attendings and academic physicians can give you ideas on how to feasibly conduct your research.

5. Time - What is the time frame for the study from inception to publication? How much time do you have to set aside for the research study? Does the completion of your research coincide with abstract deadlines of interest?

6. Power analysis - Conduct an a priori power analysis based on an evidence-based measure of effect to see if the study is feasible with regard to the sample size needed to achieve power.
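Points 1, 2, and 6 above come together in a quick feasibility check. This is a minimal sketch, assuming a two-sided comparison of two group means, a standardized effect size (Cohen's d) taken from the literature, and the normal approximation, which gives slightly smaller numbers than an exact t-based power analysis. The function name and the participant pool of 40 per group are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sided, two-sample comparison.

    d is the standardized mean difference (Cohen's d) expected from prior evidence.
    Based on the normal approximation: n = 2 * ((z_alpha + z_beta) / d) ** 2.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Hypothetical participant pool: 40 patients available per group.
available_per_group = 40

for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    needed = n_per_group(d)
    feasible = needed <= available_per_group
    print(f"{label} effect (d = {d}): need {needed} per group, feasible = {feasible}")
```

Under these assumptions, only the large effect is detectable with the available pool, which is the kind of result that should prompt a change in the research question, the inclusion criteria, or the design before any data are collected.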


Contact Dr. Eric Heidel
consultation@scalelive.com
(865) 742-7731

Copyright © 2026 Scalë. All Rights Reserved. Patent Pending.
