Tags

  • Published on

    Statistical assumptions must be tested when using inferential statistics

    Statistical assumptions on Research Engineer!

    Meeting assumptions is necessary when running inferential statistics

    I put up some new pages for the following statistical assumptions, as promised! Click on one of the links below!

    1. Independence of observations

    2. Normality

    3. Homogeneity of variance

    4. Sphericity

    5. Normality of difference scores

    6. Chi-square assumption
  • Published on

    Non-parametric Friedman's ANOVA

    Analyze three or more measures of an ordinal outcome

    Wilcoxon is used as a post hoc test for significant main effects

    The Greenhouse-Geisser correction is often employed when analyzing data with repeated-measures ANOVA.  The statistical assumption of sphericity, as assessed by Mauchly's test in SPSS, is violated more often than not.  The Greenhouse-Geisser correction adjusts the degrees of freedom to compensate for the violation of this assumption in repeated-measures ANOVA. The means and standard deviations from a repeated-measures ANOVA can then be interpreted.

    Friedman's ANOVA, in my experience, does not make many appearances in the empirical literature.  Few people take three or more within-subjects or repeated measures of an ordinal outcome in order to answer their primary research question, I guess.  It is a non-parametric statistical test because the outcome is measured at an ordinal level.  When a significant main effect is found with a Friedman's ANOVA, post hoc comparisons must then be made within-subjects, between pairs of observations, using Wilcoxon tests.
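
    As a rough illustration of the omnibus test described above, here is a minimal pure-Python sketch of the Friedman chi-square statistic. The ratings are made-up data (five hypothetical participants rated at three time points), the function names are my own, and no tie correction is applied, so treat it as a teaching sketch rather than a replacement for SPSS or another statistics package.

```python
def average_ranks(values):
    """Rank one participant's scores (1 = lowest); tied scores share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_chi_square(rows):
    """Friedman statistic for n participants (rows) by k repeated measures (columns).

    Assumption: no tie correction is applied.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)


# Hypothetical ordinal ratings: 5 participants, 3 repeated observations each.
ratings = [
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
    [2, 1, 3],
    [1, 2, 3],
]

chi_square = friedman_chi_square(ratings)
CRITICAL_DF2 = 5.991  # chi-square critical value for alpha = 0.05 with k - 1 = 2 df

print(f"Friedman chi-square = {chi_square:.2f}")
print("Run Wilcoxon post hoc tests" if chi_square > CRITICAL_DF2 else "No significant main effect")
```

    With a sample this small, the chi-square approximation is crude, so the comparison to the critical value is only meant to show the logic: the omnibus test comes first, and the pairwise Wilcoxon tests follow only when it is significant.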

    Friedman's ANOVA, while being a non-parametric statistic, may have the most statistical power when employed with cross-sectional data yielded from a survey instrument that has limited reliability and validity evidence.  Likert scales and composite scores from such tests may be naturally skewed due to systematic and unsystematic error.  Friedman's ANOVA is robust to these types of distributions that come from cross-sectional studies in the social sciences.

    If the assumption of normality among the difference scores between observations of a continuous outcome cannot be met, then Friedman's ANOVA can be used to yield inferential evidence.  But it is always a better idea to first check for outliers in a distribution (individual observations that are more than 3.29 standard deviations away from the mean) and decide whether to 1) delete the observation in a listwise fashion, or 2) run a logarithmic transformation on the distribution.

    You will have to transform the other observations of the outcome if you choose #2 above.  The means and standard deviations of transformed variables cannot be interpreted, but the p-values can be.  Report the median and interquartile range for transformed variables.

    Deleting observations can introduce bias into the statistical analysis.  This should only be done if the number of outliers constitutes less than 10% of the overall distribution.  One can also run between-subjects comparisons between participants with all observations of the outcome versus participants without all observations.  If there are no differences on predictor, confounding, and outcome variables between these two groups, then lessened observation bias can be assumed.
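
    The outlier screen described in this post can be sketched in a few lines of Python. The scores below are made-up, the function name is my own, and the base-10 logarithm is just one possible choice of logarithmic transformation (it also assumes every score is positive).

```python
import math
import statistics


def flag_outliers(values, cutoff=3.29):
    """Return True for each value more than `cutoff` sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd > cutoff for v in values]


# Hypothetical scores on a continuous outcome; the last one is extreme.
scores = [12, 14, 13, 15, 11, 14, 13, 12, 15, 14,
          13, 12, 14, 13, 15, 11, 13, 14, 12, 95]

flags = flag_outliers(scores)
proportion = sum(flags) / len(scores)

print(f"Outliers flagged: {sum(flags)} of {len(scores)}")

# Option 1: listwise deletion, only if outliers are under 10% of the distribution.
if proportion < 0.10:
    trimmed = [v for v, is_outlier in zip(scores, flags) if not is_outlier]

# Option 2: logarithmic transformation (assumes all scores are positive).
log_scores = [math.log10(v) for v in scores]
```

    Keep in mind that a single observation cannot sit more than about 3.29 standard deviations from the mean unless the sample has at least 13 or so observations, so this screen is not informative in very small samples.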
  • Published on

    FINER and PICO

    An amalgamation of philosophy and objectivity

    The research question is the foundation of everything empirical

    Research questions (and answering them) are always the primary focus of anything and everything empirical, methodological, epidemiological, and statistical. Without a research question, there is no reason to conduct a study or run statistics.

    The following are DIRECTLY derived from research questions:

    1. Null and alternative hypotheses (hypothesis testing and inferential statistics)
    2. Research design (observational or experimental)
    3. Population of interest (inclusion and exclusion criteria) 
    4. Sampling method (non-probability or probability)
    5. Intervention or independent variable (categorical, ordinal, or continuous)
    6. Confounding or control variables (secondary, tertiary, and ancillary research questions)
    7. Comparator or control treatment (categorical, ordinal, or continuous)
    8. Outcome or dependent variable (categorical, ordinal, or continuous)
    9. Outcome and design for an a priori power analysis to calculate sample size
    10. Structure of the database (between-subjects, within-subjects, or multivariate) and code book
    11. Statistical tests used (descriptive, between-subjects, within-subjects, correlations, survival, or multivariate)

    Researchers must take the appropriate amount of time to fully formulate and refine research questions. SO MUCH of their study depends upon them. Luckily, this task is made easier with the use of two prevalent mnemonics: FINER (feasible, interesting, novel, ethical, relevant) and PICO (population, intervention, comparator, outcome).

    FINER is more of a philosophy for writing research questions. The arguments for the "F," "I," "N," "E," and "R" are all informed by the empirical literature in the area of empirical or clinical interest. Researchers especially have to be well versed in the most current literature in order to make sound arguments for interesting, novel, and relevant questions.

    PICO is employed to explicitly and operationally define the population of interest, the intervention, the comparator, and the outcome in a research question. It is also more readily applicable in busy clinical and empirical environments and when writing literature search queries.  

    These two mnemonics complement each other very well in applied empirical and clinical environments. The post-positivist philosophy of social and medical sciences lends itself well to FINER. Measurement of observable constructs and the application of experimental designs through the PICO mnemonic are also strongly reflective of a post-positivist philosophical orientation. Together, the "why" and "what" questions associated with conducting research can be argued in an evidence-based, objective, and logically sound fashion.
  • Published on

    The Kappa statistic

    Kappa is a measure of inter-rater reliability

    Rating performance or constructs at a dichotomous categorical level

    The Kappa statistic is a measure of inter-rater reliability when the construct or behavior is being rated using a dichotomous categorical outcome.  When a sequential series of steps must be completed to yield an end product, such as with performance assessment, then a "checklist" or series of "yes/no" responses is scored by independent raters. The Kappa statistic can be used to assess the level of agreement/consistency/reliability between raters on these dichotomous responses.

    It is important that raters have an operational definition of what constitutes a "yes" or "no" with regard to performance. The construct or behavior of interest must be standardized between raters so that unsystematic bias can be reduced.  A lack of operationalization and standardization in performance assessment significantly DECREASES the chances of obtaining evidence of inter-rater reliability when using the Kappa statistic.

    Kappa is not a "powerful" statistic because of the dichotomous categorical variables used in the analysis.  Larger sample sizes are needed to achieve adequate statistical power when categorical outcomes are utilized.  So, many observations of performance or simulation may be needed to adequately assess BOTH inter-rater reliability and outcomes of interest. The chances of having adequate inter-rater reliability decrease with fewer observations of performance or simulation.
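
    To make the agreement calculation concrete, here is a minimal sketch assuming the two-rater (Cohen's) version of Kappa and a made-up ten-step checklist scored as 1 = "yes" and 0 = "no". The function name and the data are my own illustration.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items on a categorical scale."""
    n = len(rater_a)
    # Proportion of items on which the raters actually agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    # Undefined (division by zero) if both raters use a single category throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical checklist: 10 steps, 1 = "yes" (performed), 0 = "no".
rater_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Kappa = {kappa:.2f}")
```

    Here the raters agree on 8 of 10 steps (80%), but chance alone would produce 52% agreement given how often each rater says "yes," so Kappa comes out at roughly 0.58 rather than 0.80.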
  • Published on

    Predictive validity is a powerful type of psychometric evidence

    Predictive Validity

    Correlations and regression are used to establish this kind of evidence

    Predictive validity evidence means that a survey instrument has the ability to predict some sort of occurrence in the future.  The most common application of predictive validity occurs in tests like the ACT, SAT, GRE, MCAT, LSAT, and GMAT. These tests are given before entering various phases of higher education to assess an individual's potential to graduate from either undergraduate or graduate school.  Interestingly enough, the correlation between these prevalent (and expensive) tests and graduation is only 0.3!  Squaring that correlation gives 0.09, which means that 91% of the variance in graduation is NOT associated with test scores on these instruments.  And we are talking a multi-BILLION dollar business...but, I digress.

    Predictive validity is calculated using simple correlation coefficients.  A correlation of 0.1 is considered weak evidence, a correlation of 0.3 denotes moderate evidence, and a correlation of 0.5 would make most social scientists jump for joy. Remember, in order to understand the amount of shared variance between two constructs, you simply "square" the correlation coefficient to yield the coefficient of determination.  Even with the highest level of predictive evidence, a predictive validity coefficient of 0.5, you are only accounting for 25% of the variance shared between the two constructs!
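
    The "squaring" step is simple enough to check by hand, but a short sketch makes the pattern obvious. The function name is my own; the three correlations are the benchmarks mentioned above.

```python
def shared_variance(r):
    """Coefficient of determination: the proportion of variance two constructs share."""
    return r ** 2


for r in (0.1, 0.3, 0.5):
    explained = shared_variance(r)
    print(f"r = {r}: {explained:.0%} shared, {1 - explained:.0%} unexplained")
```

    The output shows 1%, 9%, and 25% shared variance for the three benchmarks, which leaves 99%, 91%, and 75% unexplained.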

    Within medicine, I believe that predictive validity plays an important role in imaging and early diagnosis.  One of the benefits of working in medicine is that the measures are more objective, concrete, observable, validated, and measurable than in the social sciences.  Correlations of 0.9 are common between various etiological, prognostic, confounding, clinical, and demographic phenomena within medicine.  If an imaging or diagnostic method can detect the earlier stages of a progressing disease state, then future outcomes can be mitigated with earlier and preventative treatment.