Statistical Consultation Line: (865) 742-7731
Accredited Professional Statistician For Hire

Statistical assumptions must be tested when using inferential statistics

12/31/2014


Statistical assumptions on Research Engineer!

Meeting assumptions is necessary when running inferential statistics

I put up some new pages for the following statistical assumptions, as promised! Click on one of the links below!

1. Independence of observations

2. Normality

3. Homogeneity of variance

4. Sphericity

5. Normality of difference scores

6. Chi-square assumption

Scale, LLC

Merry Christmas to all my loved ones, friends, colleagues, and site visitors!

12/24/2014


My very best wishes to you and yours!  

No statistical soliloquy today.  I am humbled and thankful for those listed above and for your interest in my website. Thank you for your patronage and many blessings to thee and thine. 

-EH


Non-parametric Friedman's ANOVA

12/23/2014

 

Analyze three or more measures of an ordinal outcome

Wilcoxon is used as a post hoc test for significant main effects

The Greenhouse-Geisser correction is often employed when analyzing data with repeated-measures ANOVA.  The statistical assumption of sphericity, as assessed by Mauchly's test in SPSS, is violated more often than not.  The Greenhouse-Geisser correction is robust to the violation of this statistical assumption with repeated-measures ANOVA. The means and standard deviations from a repeated-measures ANOVA can then be interpreted.

Friedman's ANOVA, in my experience, does not make many appearances in the empirical literature.  Few people take three or more within-subjects or repeated measures of an ordinal outcome in order to answer their primary research question, I guess.  It is a non-parametric statistical test because the outcome is measured at an ordinal level.  When a significant main effect is found with a Friedman's ANOVA, post hoc comparisons must be made within-subjects, between pairs of observations, using Wilcoxon tests.
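To make the omnibus test concrete, here is a minimal sketch in Python using only the standard library. The ratings are hypothetical, the function names are my own, and the statistic is the textbook Friedman chi-square without a correction for ties. The p-value shortcut only holds for exactly three measures (two degrees of freedom); for any other number of measures, a statistics package would be needed. The Wilcoxon post hoc comparisons are not shown.

```python
import math

def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def friedman_statistic(rows):
    """Friedman chi-square for rows of k repeated measures (no tie correction)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        # Ranks are assigned within each participant, across the k measures.
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical data: 6 participants rated on a 1-5 ordinal scale at 3 time points.
ratings = [
    [1, 2, 3],
    [2, 3, 5],
    [1, 3, 4],
    [2, 4, 5],
    [1, 2, 4],
    [3, 2, 5],
]

q = friedman_statistic(ratings)
# With k = 3 measures, df = 2 and the chi-square upper tail is exactly exp(-q / 2).
p_value = math.exp(-q / 2)
print(f"Friedman chi-square = {q:.2f}, p = {p_value:.4f}")
```

If the p-value falls below the chosen alpha, the pairwise Wilcoxon comparisons described above would follow.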

Friedman's ANOVA, while being a non-parametric statistic, may have the most statistical power when employed with cross-sectional data yielded from a survey instrument that has limited reliability and validity evidence.  Likert scales and composite scores from such tests may be naturally skewed due to systematic and unsystematic error.  Friedman's ANOVA is robust to these types of distributions that come from cross-sectional studies in the social sciences.

If the assumption of normality among the difference scores between observations of a continuous outcome cannot be met, then Friedman's ANOVA can be used to yield inferential evidence.  But it is always a better idea to first check for outliers in a distribution (individual observations that are more than 3.29 standard deviations away from the mean) and then decide whether to 1) delete the observation in a listwise fashion, or 2) run a logarithmic transformation on the distribution.
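The outlier check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made up, the helper name `flag_outliers` is hypothetical, and the sample standard deviation is used.

```python
from statistics import mean, stdev

def flag_outliers(values, cutoff=3.29):
    """Return observations more than `cutoff` standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return [x for x in values if abs(x - m) / s > cutoff]

# Hypothetical scores: 40 values between 50 and 56, plus one extreme value.
scores = [50 + (i % 7) for i in range(40)] + [120]

outliers = flag_outliers(scores)
print(outliers)
```

Keep in mind that in very small samples no observation can reach 3.29 standard deviations from the mean, so this rule is only informative with a reasonable number of observations.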

You will have to transform the other observations of the outcome as well if you choose #2 above.  The means and standard deviations of transformed variables cannot be interpreted but the p-values can be interpreted.  Report the median and interquartile range for transformed variables.  

Deleting observations can introduce bias into the statistical analysis.  This should only be done if the number of outliers constitutes less than 10% of the overall distribution.  One can also run between-subjects comparisons between participants with all observations of the outcome versus participants without all observations.  If there are no differences on predictor, confounding, and outcome variables between these two groups, then lessened observation bias can be assumed.


FINER and PICO

12/15/2014


An amalgamation of philosophy and objectivity

The research question is the foundation of everything empirical

Research questions (and answering them) are always the primary focus of anything and everything empirical, methodological, epidemiological, and statistical. Without a research question, there is no reason to conduct a study or run statistics.

The following are DIRECTLY derived from research questions:

1. Null and alternative hypotheses (hypothesis testing and inferential statistics)
2. Research design (observational or experimental)
3. Population of interest (inclusion and exclusion criteria) 
4. Sampling method (non-probability or probability)
5. Intervention or independent variable (categorical, ordinal, or continuous)
6. Confounding or control variables (secondary, tertiary, and ancillary research questions)
7. Comparator or control treatment (categorical, ordinal, or continuous)
8. Outcome or dependent variable (categorical, ordinal, or continuous)
9. Outcome and design for an a priori power analysis to calculate sample size
10. Structure of the database (between-subjects, within-subjects, or multivariate) and code book
11. Statistical tests used (descriptive, between-subjects, within-subjects, correlations, survival, or multivariate)

Researchers must take the appropriate amount of time to fully formulate and refine research questions. SO MUCH of their study depends upon them. Luckily, this task is made easier with the use of two prevalent mnemonics: FINER (feasible, interesting, novel, ethical, relevant) and PICO (population, intervention, comparator, outcome).

FINER is more of a philosophy for writing research questions. The arguments for the "F," "I," "N," "E," and "R" are all informed by the empirical literature in the area of empirical or clinical interest. Researchers especially have to be well versed in the most current literature in order to make sound arguments for interesting, novel, and relevant questions.

PICO is employed to explicitly and operationally define the population of interest, the intervention, the comparator, and the outcome in a research question. It is also more readily applicable in busy clinical and empirical environments and when writing literature search queries.  

These two mnemonics complement each other very well in applied empirical and clinical environments. The post-positivist philosophy of social and medical sciences lends itself well to FINER. Measurement of observable constructs and the application of experimental designs through the PICO mnemonic are also strongly reflective of a post-positivist philosophical orientation. Together, the "why" and "what" questions associated with conducting research can be argued in an evidence-based, objective, and logically sound fashion.


The Kappa statistic

12/6/2014


Kappa is a measure of inter-rater reliability

Rating performance or constructs at a dichotomous categorical level

The Kappa statistic is a measure of inter-rater reliability when the construct or behavior is being rated using a dichotomous categorical outcome.  When a sequential series of steps must be completed to yield an end product, such as with performance assessment, then a "checklist" or series of "yes/no" responses is scored by independent raters. The Kappa statistic can be used to assess the level of agreement/consistency/reliability between raters on subsequent dichotomous responses.
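As a rough illustration, Cohen's kappa for two raters can be computed by hand: observed agreement minus the agreement expected by chance, divided by one minus the chance agreement. The sketch below assumes two raters scoring the same ten checklist items; the ratings and the function name are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same response.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    # Note: undefined (division by zero) if both raters use a single category.
    return (observed - expected) / (1 - expected)

# Hypothetical "yes/no" checklist ratings from two independent raters.
rater_a = ["yes", "yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no"]
rater_b = ["yes", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 items (80%), but because 52% agreement would be expected by chance alone, kappa comes out well below the raw agreement figure.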

It is important that raters have an operational definition of what constitutes a "yes" or "no" in regard to performance. The construct or behavior of interest must be standardized between raters so that unsystematic bias can be reduced.  A lack of operationalization and standardization in performance assessment significantly DECREASES the chances of obtaining evidence of inter-rater reliability when using the Kappa statistic.

Kappa is not a "powerful" statistic because of the dichotomous categorical variables used in the analysis.  Larger sample sizes are needed to achieve adequate statistical power when categorical outcomes are utilized.  So, many observations of performance or simulation may be needed to adequately assess BOTH inter-rater reliability and outcomes of interest. The chances of having adequate inter-rater reliability decrease with fewer observations of performance or simulation.


Predictive validity is a powerful type of psychometric evidence

12/4/2014


Predictive Validity

Correlations and regression are used to establish this kind of evidence

Predictive validity evidence means that a survey instrument has the ability to predict some sort of occurrence in the future.  The most common application of predictive validity occurs in tests like the ACT, SAT, GRE, MCAT, LSAT, and GMAT. These tests are given before entering various phases of higher education to assess an individual's potential to graduate from either undergraduate or graduate school.  Interestingly enough, the correlation between these prevalent (and expensive) tests and graduation is only 0.3!  This means that 91% of what accounts for graduation is NOT associated with test scores on these instruments.  And we are talking a multi-BILLION dollar business...but, I digress.

Predictive validity is calculated using simple correlation coefficients.  A correlation of 0.1 is considered weak evidence, a correlation of 0.3 denotes moderate evidence, and a correlation of 0.5 would make most social scientists jump for joy. Remember, in order to understand the amount of shared variance between two constructs, you simply "square" the correlation coefficient to yield the coefficient of determination.  Even with the highest level of predictive evidence, a predictive validity coefficient of 0.5, you are only accounting for 25% of the variance shared between the two constructs!
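The arithmetic is easy to check. The short Python sketch below squares each of the benchmark correlations mentioned above; the function name `shared_variance` is just an illustrative label.

```python
def shared_variance(r):
    """Coefficient of determination: the squared correlation coefficient."""
    return r ** 2

for r in (0.1, 0.3, 0.5):
    r2 = shared_variance(r)
    print(f"r = {r:.1f}: shared variance = {r2:.0%}, unexplained = {1 - r2:.0%}")
```

A correlation of 0.3 works out to 9% shared variance, which is where the 91% unexplained figure for admissions tests comes from.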

Within medicine, I believe that predictive validity plays an important role in imaging and early diagnosis.  One of the benefits of working in medicine is that the measures are more objective, concrete, observable, validated, and measurable than in the social sciences.  Correlations of 0.9 are common between various etiological, prognostic, confounding, clinical, and demographic phenomena within medicine.  If an imaging or diagnostic method can detect the earlier stages of a progressing disease state, then future outcomes can be mitigated with earlier and preventative treatment.  


The assumption of independence of observations

12/3/2014


Independence of observations

Each participant in a sample can only be counted as one observation

As a biostatistician, I spend a lot of time testing for normality and homogeneity of variance.

Skewness and kurtosis statistics are used to assess the normality of a continuous variable's distribution.  A skewness or kurtosis statistic with an absolute value above 2.0 indicates a non-normal distribution.  Distributions are often non-normal due to outliers in the distribution.  Any observation that falls more than 3.29 standard deviations away from the mean is considered an outlier.
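A minimal Python sketch of this screening rule follows. It assumes the simple moment-based formulas for skewness and excess kurtosis, which differ slightly from the bias-corrected values that statistical packages typically report, and the data are hypothetical.

```python
from statistics import mean

def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical samples: one symmetric, one with a single extreme observation.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 1, 1, 2, 2, 2, 3, 3, 40]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    skew, kurt = skewness_and_kurtosis(data)
    flag = "non-normal" if abs(skew) > 2.0 or abs(kurt) > 2.0 else "acceptable"
    print(f"{name}: skewness = {skew:.2f}, kurtosis = {kurt:.2f} ({flag})")
```

The single extreme value in the second sample is enough to push both statistics past the 2.0 threshold, which matches the point that outliers often drive non-normality.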

Levene's Test of Equality of Variances is used to test the assumption of homogeneity of variance. A Levene's Test p-value below .05 means that the assumption has been violated.  In the event that the assumption is violated, non-parametric tests can be employed.
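For readers curious about what the test computes, here is a sketch of the Levene W statistic in plain Python. It assumes the classic form based on absolute deviations from each group's mean (some software defaults to the median instead), and the two groups are hypothetical. The sketch stops at the statistic: the p-value comes from an F distribution with k - 1 and N - k degrees of freedom, which a statistics package would supply.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = []
    for g in groups:
        center = mean(g)
        z.append([abs(x - center) for x in g])
    z_means = [mean(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / n_total
    # One-way ANOVA F ratio computed on the absolute deviations.
    between = sum(len(zi) * (zm - grand_mean) ** 2 for zi, zm in zip(z, z_means))
    within = sum((x - zm) ** 2 for zi, zm in zip(z, z_means) for x in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups with the same mean but very different spread.
group_a = [10, 11, 12, 13, 14]
group_b = [2, 7, 12, 17, 22]

w = levene_w(group_a, group_b)
print(f"Levene W = {w:.2f} on 1 and 8 degrees of freedom")
```

For these two groups W exceeds 5.32, the .05 critical value of F with 1 and 8 degrees of freedom, so the assumption of homogeneity of variance would be considered violated.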

There is one more important statistical assumption that exists coincident with the aforementioned two, the assumption of independence of observations.  Simply stated, this assumption stipulates that study participants are independent of each other in the analysis. They are only counted once.

In between-subjects designs, each study participant is a mutually exclusive observation that is completely independent from all other participants in all other groups.

For within-subjects designs, each participant is independent of other participants.  There are just multiple observations of the outcome, per participant.

With this being said, it is common for researchers to take multiple measurements of an outcome and compare these multiple measurements in an independent fashion (oftentimes with differing numbers of observations across participants) or within-subjects (ALWAYS with differing numbers of observations of the outcome).  By default, these are not independent measures and violate the assumption of independence of observations.  What is one to do?

The answer is generalized estimating equations (GEE).  This family of statistical tests is robust to multiple observations (or correlated observations) of an outcome and can be used for between-subjects, within-subjects, factorial, and multivariate analyses.


    Author

    Eric Heidel, Ph.D. is Owner and Operator of Scalë, LLC.


Contact Dr. Eric Heidel
consultation@scalelive.com
(865) 742-7731

Copyright © 2023 Scalë. All Rights Reserved. Patent Pending.