Archive - Eric Heidel, PhD PStat - Statistician For Hire

Published on

October 5, 2014

Non-parametric statistics and small sample sizes

Tags: Friedman's ANOVA, Kruskal-Wallis, Mann-Whitney U, Non-parametric Statistics, Psychometric Tests, Sample Size, Wilcoxon

Non-parametric statistics are robust to small sample sizes

The right way to conduct statistics

Mark Twain popularized the line, "There are lies, damned lies, and statistics." Statistics can mislead both the person conducting the analyses and the person interpreting them. Many between-subjects studies have small sample sizes (n < 20), and with so few observations the statistical assumptions for parametric statistics often cannot be met.

For basic researchers who operate day in and day out with small sample sizes, the answer is to use non-parametric statistics. Non-parametric statistical tests such as the Mann-Whitney U, Kruskal-Wallis, Wilcoxon, and Friedman's ANOVA are robust to skewed distributions and to violations of the statistical assumptions behind parametric tests. Reported alongside medians and interquartile ranges, these tests yield interpretable p-values.
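As a rough illustration of the idea, here is a minimal sketch of a Mann-Whitney U comparison for two small independent groups, written with only the Python standard library. The data, the function name mann_whitney_u, and the choice of an exact permutation p-value are assumptions made for this example. Statistical packages may use a normal approximation or tie corrections instead, so treat this as a sketch and not as the definitive procedure.

```python
from itertools import combinations
from statistics import median, quantiles


def mann_whitney_u(x, y):
    """Return (U for x, exact two-sided p-value) for two small independent samples."""
    def u_stat(a, b):
        # Count pairs where a's value exceeds b's; ties count as half a win.
        return sum(1.0 if ai > bj else 0.5 if ai == bj else 0.0
                   for ai in a for bj in b)

    n_x, n_y = len(x), len(y)
    expected = n_x * n_y / 2          # mean of U under the null hypothesis
    observed = u_stat(x, y)

    # Exact permutation test: try every way of relabelling the pooled values.
    pooled = list(x) + list(y)
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, n_x):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(u_stat(a, b) - expected) >= abs(observed - expected) - 1e-12:
            extreme += 1
    return observed, extreme / total


# Hypothetical data: six subjects per group, one extreme outlier in "treated".
control = [3.1, 2.8, 3.5, 3.0, 2.9, 3.3]
treated = [4.2, 3.9, 4.8, 4.1, 12.6, 4.4]

u, p = mann_whitney_u(control, treated)
for name, group in (("control", control), ("treated", treated)):
    q1, _, q3 = quantiles(group, n=4)   # quartile cut points for the IQR
    print(f"{name}: median={median(group):.2f}, IQR={q1:.2f} to {q3:.2f}")
print(f"U={u}, exact two-sided p={p:.4f}")
```

Because the test works on the ordering of the values, the outlier of 12.6 counts only as "the largest observation" and does not drag the result the way it would drag a mean.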

Non-parametric statistics are also useful in the social sciences due to the inherent measurement error associated with assessing human behaviors, thoughts, feelings, intelligence, and emotional states. The underlying algebra associated with psychometrics relies on intercorrelations amongst constructs or items. Correlations can easily be skewed by outlying observations and measurement error. Therefore, in concordance with mathematical and empirical reasoning, non-parametric statistics should be used often for between-subjects comparisons of surveys, instruments, and psychological measures.
Published on

October 3, 2014

Chi-square p-values are not enough

Tags: 95% Confidence Interval, Chi-square, Odds Ratio With 95% CI, P-value, Relative Risk, Sampling Error

Chi-square p-value

Odds ratio with 95% confidence interval should be reported and interpreted

Most people who need statistics are focused only on the almighty p-value of less than .05. When running Chi-square analyses between a dichotomous categorical predictor and a dichotomous categorical outcome, p-values are not the primary inference that should be interpreted for practical purposes. The lack of precision and accuracy in categorical measures, coupled with sampling error, makes the p-values yielded from Chi-square analyses virtually worthless in the applied sense.

The correct statistic to run is an unadjusted odds ratio with 95% confidence interval. This is the best measure for interpreting the magnitude of the association between two dichotomous categorical variables collected in a retrospective fashion. Relative risk can be calculated when the association is assessed in a prospective fashion.

Whether the 95% confidence interval crosses 1.0 determines the significance of the association, and the width of the interval reflects its precision. With smaller sample sizes, the 95% confidence interval will be wider and less precise. Larger sample sizes will yield more precise estimates.
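To make the interpretation concrete, here is a minimal sketch that computes an unadjusted odds ratio with a 95% confidence interval from a 2x2 table. The function name odds_ratio_ci, the cell labels, and the counts are hypothetical, and the interval uses the common log-based (Woolf) approximation, which is an assumption of this example and not necessarily the method a given statistics package applies.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and log-based (Woolf) 95% CI for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: the same proportions at two different sample sizes.
small = odds_ratio_ci(8, 12, 4, 16)       # 40 subjects in total
large = odds_ratio_ci(80, 120, 40, 160)   # 400 subjects in total

for label, (estimate, lower, upper) in (("n=40", small), ("n=400", large)):
    verdict = "crosses 1.0" if lower <= 1.0 <= upper else "excludes 1.0"
    print(f"{label}: OR={estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f} ({verdict})")
```

Both tables give the same odds ratio, but only the larger sample produces an interval that excludes 1.0, which is the pattern described above.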
Published on

October 2, 2014

Ordinal measures becoming continuous with normality

Tags: ANOVA Test, Chi-square, Classical Test Theory, Continuous, Friedman's ANOVA, Independent Samples T-test, Item Response Theory, Kruskal-Wallis, Kurtosis, Logistic Regression, Mann-Whitney U, Non-parametric Statistics, Normality, Ordinal, Parametric Statistics, Regression, Repeated-measures ANOVA, Skewness, Wilcoxon

Ordinal measures and normality

Ordinal level measurement can become interval level with assumed normality

Here is an interesting trick I picked up along the way when it comes to ordinal outcomes and some unvalidated measures. If you run skewness and kurtosis statistics on the ordinal variable and its distribution meets the assumption of normality (the absolute values of both the skewness and kurtosis statistics are less than 2.0), then you can "upgrade" the variable to a continuous level of measurement and analyze it using more powerful parametric statistics.
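The check described above can be sketched in a few lines of standard-library Python. The function names and the Likert-style responses are hypothetical. The formulas are the sample-adjusted skewness and excess kurtosis that many statistics packages report, which is an assumption here, since the post does not say which definitions it has in mind.

```python
import math


def skewness_kurtosis(values):
    """Sample-adjusted skewness (G1) and excess kurtosis (G2)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are needed")
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt


def looks_normal(values, cutoff=2.0):
    """Apply the rule of thumb: |skewness| and |kurtosis| both below the cutoff."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) < cutoff and abs(kurt) < cutoff


# Hypothetical responses on a 1-5 ordinal scale.
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]
floor_effect = [1, 1, 1, 1, 1, 1, 1, 1, 1, 5]

for label, data in (("balanced", balanced), ("floor_effect", floor_effect)):
    skew, kurt = skewness_kurtosis(data)
    print(f"{label}: skewness={skew:.2f}, kurtosis={kurt:.2f}, "
          f"treat as continuous={looks_normal(data)}")
```

Under this rule of thumb the balanced item passes the check, while the item with a floor effect and one stray response does not and would stay with non-parametric tests.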

This type of thinking is the reason that the SAT, ACT, GRE, MCAT, LSAT, and validated psychological instruments are perceived at a continuous level. The scores yielded from these instruments, by definition, are not continuous because a "true zero" does not exist. Scores from these tests are often norm- or criterion-referenced to the population so that they can be interpreted in the correct context. Therefore, with the subjectivity and measurement error associated with classical test theory and item response theory, the scores are actually ordinal.

With that being said, if the survey instrument or ordinal outcome is used in the empirical literature often and it meets the assumption of normality as per skewness and kurtosis statistics, treat the ordinal variable as a continuous variable and run analyses using parametric statistics (t-tests, ANOVA, regression) rather than non-parametric statistics (Chi-square, Mann-Whitney U, Kruskal-Wallis, McNemar's, Wilcoxon, Friedman's ANOVA, logistic regression).
Published on

October 1, 2014

Statistical Designs

Tags: ANOVA Test, Between-subjects, Chi-square, Chi-square Goodness-of-fit, Friedman's ANOVA, Independent Samples T-test, Kruskal-Wallis, Mann-Whitney U, Odds Ratio With 95% CI, Relative Risk, Repeated-measures ANOVA, Repeated-measures T-test, Wilcoxon, Within-subjects

Research questions lead to choice of statistical design

Differences between-subjects and within-subjects designs

There are terms in statistics that many people do not understand from a practical standpoint. I'm a biostatistical scientist and it took me YEARS to wrap my head around some fundamental aspects of statistical reasoning, much less the lexicon. I would hypothesize that 90% of the statistics reported in the empirical literature as a whole fall into one of two categories of statistics, between-subjects and within-subjects. Here is a basic breakdown of the differences in these types of statistical tests:

1. Between-subjects - When you are comparing independent groups on a categorical, ordinal, or continuous outcome variable, you are conducting between-subjects analyses. The "between-" denotes the differences between mutually exclusive groups or levels of a categorical predictor variable. Chi-square, Mann-Whitney U, independent-samples t-tests, odds ratio, Kruskal-Wallis, and one-way ANOVA are all considered between-subjects analyses because of the comparison of independent groups.

2. Within-subjects - When you are comparing THE SAME GROUP on a categorical, ordinal, or continuous outcome ACROSS TIME OR WITHIN THE SAME OBJECT OF MEASUREMENT MULTIPLE TIMES, then you are conducting within-subjects analyses. The "within-" relates to the differences within the same object of measurement across multiple observations, time, or literally, "within-subjects." Chi-square Goodness-of-fit, Wilcoxon, repeated-measures t-tests, relative risk, Friedman's ANOVA, and repeated-measures ANOVA are within-subjects analyses because the same group or cohort of individuals is measured at several different time-points or observations.
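The practical difference between the two designs can be sketched with a small example. The same eight hypothetical scores are analyzed twice: once as two independent groups (an independent-samples t statistic with pooled variance) and once as four subjects measured at two time-points (a repeated-measures t statistic on the per-subject differences). The function names and data are assumptions made for illustration, and only the test statistics are computed, not p-values.

```python
import math
from statistics import mean, stdev, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic for two independent groups (between-subjects)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_b) - mean(group_a)) / se


def paired_t(first, second):
    """t statistic on per-subject differences (within-subjects)."""
    if len(first) != len(second):
        raise ValueError("paired data need one value per subject at each time-point")
    diffs = [s - f for f, s in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical scores: position i in each list belongs to the same subject
# only when the data are treated as within-subjects.
time_1 = [10.0, 12.0, 14.0, 16.0]
time_2 = [11.0, 14.0, 15.0, 18.0]

t_between = independent_t(time_1, time_2)
t_within = paired_t(time_1, time_2)
print(f"between-subjects t = {t_between:.2f}")
print(f"within-subjects  t = {t_within:.2f}")
```

The within-subjects statistic is much larger here because each subject serves as their own control, so the large differences between subjects no longer count as noise. Choosing the wrong design for the data therefore changes the answer, not just the label.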
Published on

September 29, 2014

Operationalization of constructs and behaviors

Tags: Affordable Care Act, Confirmatory Factor Analysis, Construct Validity, Cross-sectional, Operationalization, Prevalence, Principal Components Analysis, Social Science, Standards Of Care, Survey

Operationalization leading to understanding

Measurement of new phenomena

The term operationalization is very near and dear to my heart since I conducted my dissertation on operationalizing and validating the construct of isomorphism in supervision. Operationalization essentially means defining observable and measurable components of a given construct or behavior.

The term is used often in the social sciences because scientists in those fields have to spend so much time creating and validating their constructs of interest, just to be able to measure them. From an empirical standpoint, they have to operationalize the construct as it exists within the perception, context, experience, and environment of members of a population.

Many social scientists use survey methodologies (cross-sectional) to operationalize an abstract, new, or unique construct or behavior. They master the content area related to the construct, create a survey, and then administer it to a sample from a targeted population to see what content areas or items account for the most variance. Principal components analysis and confirmatory factor analysis are used to establish the construct validity of survey instruments.

Medical professionals use cross-sectional research designs to establish the prevalence of disease states. Operationalization within physiology deals more with using "gold standard" techniques and concrete measures such as lab values. Treatment protocols are another form of operationalization within medicine. Certain procedures like a central line insertion require 20+ sequential steps to be conducted by surgical team members, every time. With the advent of the Affordable Care Act and upcoming clinical pathways, operationalization will play an even larger role in building economical, efficient, and effective standards of care.
Published on

September 27, 2014

Evidence-based medicine and its applications

Tags: Acquiring Clinical Evidence, Applying Clinical Evidence, Appraisal Of The Literature, Appraising Clinical Evidence, Asking Clinical Questions, Assessing Clinical Practice, Bloom's Taxonomy, Cognitive Dissonance, EBM, Evidence-based Medicine, PICO

Critical appraisal of the clinical evidence

The cart before the horse

I'm getting ready to add an Education section to the website, so I decided to go back to first principles. Bloom's Taxonomy has had a pervasive impact on my philosophy of learning, teaching, and cognitive complexity. I used it back in February of this year for an evidence-based medicine (EBM) presentation at work. Bloom's Taxonomy* stipulated six levels of "knowing" or cognitive complexity. The six levels in increasing order were knowledge, comprehension, application, analysis, synthesis, and evaluation.

Here is the conundrum that Bloom's Taxonomy exacts upon applied EBM practice:

There are five steps to EBM: Asking, acquiring, appraising, applying, and assessing.

With asking, the EBM literature posits that clinicians experience "cognitive dissonance" when they have a knowledge gap in their clinical practice. In order to reduce the dissonance, the clinician decides to ask a clinical question to fill that gap.

With acquiring, the clinician uses the PICO (population, intervention, comparator, outcome) mnemonic to acquire the best clinical evidence, given the resources and time available.

Now we get to critical appraisal of the literature. The word "appraisal" reflects the highest level of "knowing" or cognitive complexity in Bloom's Taxonomy, evaluation. EBM stipulates that clinicians must be able to critically appraise the methods and statistical analyses of published studies. This means that clinicians have to be functioning at a very high cognitive level to do this correctly.

However, past literature has shown that researchers feel anxious and intimidated by statistics due to a lack of experience and competency.** Also, undergraduate and graduate medical training rarely equips clinicians with the necessary competencies to conduct and effectively interpret clinical research evidence.***

So, how can your everyday clinician with limited empirical/statistical training who feels "cognitive dissonance" a second time in the five steps of EBM critically appraise the literature? Therein lies the conundrum, in my opinion.

I'm positing that we need to refocus our efforts on the lower echelons of Bloom's Taxonomy by educating physicians, residents, fellows, faculty, pharmacists, nurses, and staff to better understand (knowledge), recognize (comprehension), choose (application), examine (analysis), and design (synthesis) research studies before we can expect them to critically appraise (evaluation) the literature.

*Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. New York: David McKay Company.
**Marquardt, D. W. (1981). Criteria for evaluating the performance of statistical consultants in industry. The American Statistician, 35, 216-219.
***Wegwarth, O. (2013). Statistical illiteracy in residents: What they do not learn today will hurt their patients tomorrow. Journal of Graduate Medical Education, 5, 340-341.