  • Published on

    Non-parametric statistics as post hoc tests

    Mann-Whitney U and Wilcoxon as post hoc tests

    Explain significant main effects from Kruskal-Wallis tests and Friedman's ANOVA

    Non-parametric statistics are used when analyzing categorical and ordinal outcomes. They are also commonly used with small samples (a frequently cited rule of thumb is n < 20) and when the assumptions of parametric tests are violated.

    The Mann-Whitney U test is used to compare two independent groups on an ordinal outcome. It is also used in place of an independent samples (unpaired) t-test when that test's assumptions of normality or homogeneity of variance are violated.

    The Wilcoxon signed-rank test is used to compare an ordinal outcome measured at two points in time or under two within-subjects conditions. It is also used in place of a repeated measures (paired) t-test when the difference scores are not normally distributed. Note that it still assumes the pairs are independent of one another.

    A lesser-known use for these two non-parametric tests is as post hoc tests after a significant main effect from a Kruskal-Wallis test or Friedman's ANOVA. Much like with a parametric one-way ANOVA or repeated-measures ANOVA, a significant non-parametric main effect only tells you that a difference exists somewhere, so a post hoc analysis must be undertaken to explain it. The Tukey, Scheffé, and Dunnett procedures used after parametric ANOVAs do not apply here! (Rank-based procedures such as Dunn's test do exist, but pairwise Mann-Whitney U and Wilcoxon tests are a widely available alternative.)

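    When a Kruskal-Wallis test yields a significant main effect, pairwise Mann-Whitney U tests can be run in a post hoc fashion to identify which independent groups actually differ. Because several tests are run, the alpha level should be adjusted for multiple comparisons (for example, a Bonferroni correction divides .05 by the number of pairwise tests).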

    The same holds true for Friedman's ANOVA. If a significant main effect is found, pairwise Wilcoxon signed-rank tests with a Bonferroni-adjusted alpha can be used post hoc to identify where the significant changes occur amongst the observations or within-subjects conditions.
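    A minimal pure-Python sketch of this post hoc workflow for the Kruskal-Wallis case follows. It assumes the omnibus test has already come back significant, uses the large-sample normal approximation for each Mann-Whitney U test (no tie or continuity correction, so p-values are only approximate for very small groups), and compares each p-value against a Bonferroni-adjusted alpha. The function names and pain scores are hypothetical; in practice `scipy.stats.mannwhitneyu` and `scipy.stats.wilcoxon` provide exact and corrected versions of these tests.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, approximate two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return u1, p


def pairwise_posthoc(groups, alpha=0.05):
    """Run every pairwise Mann-Whitney U test against a Bonferroni-adjusted alpha."""
    pairs = list(combinations(groups, 2))
    adjusted_alpha = alpha / len(pairs)
    results = []
    for a, b in pairs:
        u, p = mann_whitney_u(groups[a], groups[b])
        results.append((a, b, u, p, p < adjusted_alpha))
    return adjusted_alpha, results


if __name__ == "__main__":
    # Hypothetical 0-10 pain scores for three independent groups.
    groups = {
        "A": [2, 3, 3, 4, 2, 3, 1, 2],
        "B": [5, 6, 5, 7, 6, 5, 6, 7],
        "C": [3, 4, 4, 3, 5, 4, 3, 4],
    }
    adjusted_alpha, results = pairwise_posthoc(groups)
    print(f"Bonferroni alpha = {adjusted_alpha:.4f}")
    for a, b, u, p, sig in results:
        print(f"{a} vs {b}: U = {u:.1f}, p = {p:.4f}, significant = {sig}")
```

    The Friedman case follows the same pattern, swapping in a paired (Wilcoxon signed-rank) test for each pair of time points.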
  • Published on

    Sampling methods in research

    Probability vs. non-probability

    Establishing causal effects vs. associations

    Experimental research designs, like randomized controlled trials, can yield evidence of causal effects, while observational designs like case series, case-control studies, and cohort studies cannot establish cause-and-effect relationships on their own. This is because random selection and random assignment of participants mean that any differences at baseline occur purely by chance, AND these differences can be adjusted for in subsequent statistical analyses.

    From a conceptual standpoint, a sample assembled in a completely random fashion is more likely to be REPRESENTATIVE of the actual population. Always remember that inferential statistics are conducted on samples to make INFERENCES BACK TO THE POPULATION. With a randomized sample, the full diversity that exists in the real world has a better chance of being accounted for in the statistical analyses.

    Random selection (every member of a given population has an equal chance of being selected for the study) is the defining component of probability sampling. Random assignment (selected participants are randomly allocated to either the treatment or control group) is its counterpart in experimental designs.
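    As a rough illustration, the sketch below separates the two steps using Python's standard `random` module: `sample` draws participants so that every member of a hypothetical sampling frame has an equal chance of selection, and `shuffle` then allocates the selected participants to two arms of (near-)equal size. The function names and patient IDs are illustrative assumptions; real trials typically use dedicated randomization procedures (e.g., block randomization) with concealed allocation.

```python
import random


def random_selection(population, n, seed=None):
    """Random selection: each member has an equal chance of entering the sample."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def random_assignment(participants, seed=None):
    """Random assignment: shuffle the selected participants, then split them
    into treatment and control arms of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


if __name__ == "__main__":
    # Hypothetical sampling frame of 500 patient IDs.
    frame = [f"patient_{i:03d}" for i in range(500)]
    sample = random_selection(frame, 40, seed=1)
    arms = random_assignment(sample, seed=2)
    print(len(arms["treatment"]), len(arms["control"]))  # 20 20
```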

    There are three common types of probability sampling:

    1. Simple random sampling - Every member of a population has an equal chance of being selected for participation in the study.

    2. Stratified random sampling - The population is divided into overtly defined, mutually exclusive strata (singular: stratum) that are homogeneous in some relevant way, such as age group or disease severity. Simple random sampling is then conducted within each stratum of interest.

    3. Cluster random sampling - The population is divided into naturally occurring or defined subgroups (clusters), such as clinics, schools, or neighborhoods, and a random sample of whole clusters is selected. Members of the selected clusters are then included in the study.
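    To make the difference between stratified and cluster sampling concrete, here is a hedged sketch using only the standard library. The population, the sex strata, and the clinic clusters are hypothetical, and `stratified_sample` assumes every stratum has at least `n_per_stratum` members. Stratified sampling draws from every stratum; cluster sampling draws whole clusters and keeps everyone in them.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Group members into strata, then simple-random-sample each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    return {s: rng.sample(members, n_per_stratum) for s, members in strata.items()}


def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for member in population:
        clusters[cluster_of(member)].append(member)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: clusters[c] for c in chosen}


if __name__ == "__main__":
    # Hypothetical population records: (patient_id, sex, clinic)
    population = [(i, "F" if i % 2 else "M", f"clinic_{i % 10}") for i in range(200)]
    strat = stratified_sample(population, lambda p: p[1], 15, seed=3)
    clus = cluster_sample(population, lambda p: p[2], 3, seed=4)
    print({s: len(m) for s, m in strat.items()})
    print(sorted(clus))
```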

    Non-probability sampling is typically used in observational research designs. The lack of randomization in these designs introduces selection and observation biases that can greatly skew the inferences drawn from statistical analyses.

    There are two common types of non-probability sampling techniques:

    1. Convenience sampling is the most prevalent form of non-probability sampling. Researchers use whichever participants or records are readily available to them, such as patients in their own clinical environment or retrospective data in existing databases, and conduct statistical analyses.

    2. Purposive sampling is a more focused approach where specific groups of individuals are deliberately targeted for participation in the study.
  • Published on

    Retrospective cohort designs are useful to many researchers

    Retrospective cohort designs are very feasible

    The "go-to" research design for busy clinicians

    In my experience working as a biostatistician at a graduate school of medicine, I have learned that there are three precious commodities for busy clinicians and researchers: time, competency, and access to data.

    Systematic reviews constitute the most substantial contribution that scientist-practitioners can make to a given body of clinical knowledge. When conducted in a rigorous and objective fashion, the pooled treatment effects yielded from this design (typically via meta-analysis) are considered the highest level of applied clinical evidence that exists. It is much more of an academic/empirical task than applied experimental and observational designs are. Yet the time and experience needed to conduct a systematic review often keep researchers from pursuing one. (However, they are greatly needed and should be undertaken if at all possible! I'm going to start my first one soon.)

    True experiments such as randomized controlled trials are not feasible for most researchers due to a lack of funding, logistical support, and available resources. Also, researchers should ideally show observational evidence of a treatment effect before conducting a randomized controlled trial.

    Prospective cohort studies can generate important measures of incidence and relative risk, as well as longitudinal data. However, because this design moves forward in time, you depend on your methodology generating enough observations to achieve adequate statistical power. The logistics and time associated with this design also tend to hinder its application in busy clinical environments.

    The next highest level of evidence is the retrospective cohort design, and it is easily applied in a busy clinical environment. Because the design is retrospective, the data already exist. One defines a cohort with inclusion and exclusion criteria. Then, members of the cohort are separated into independent groups according to their exposure or non-exposure to a treatment, intervention, or risk factor. Their records are then followed up to a certain point in time to see whether or not they had the outcome. There are obvious selection and observation biases associated with this design, but it yields important measures of relative risk, and many years of data can be mined for longitudinal or large-scale cohort analyses. Survival analyses are well suited to this design when establishing the 1-year, 3-year, or 5-year survival or "time-to-event" rates of an outcome. Retrospective cohort studies are also relatively inexpensive and time-friendly to conduct. This research design is generally preferable to case-control designs.
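    The two headline outputs of a retrospective cohort can be sketched in a few lines of Python. `relative_risk` divides the cumulative incidence in the exposed group by that in the unexposed group, and `kaplan_meier` implements the product-limit estimator, which handles patients whose records end before the outcome occurs (censoring). The function names and follow-up data are hypothetical; for real analyses, packages such as `lifelines` also provide confidence intervals and log-rank tests.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Cumulative incidence in the exposed divided by that in the unexposed."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival curve.

    times:  follow-up time for each patient (e.g., years)
    events: 1 if the outcome occurred at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Patients censored at time t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


def survival_at(curve, t):
    """Survival probability at time t (step function; 1.0 before the first event)."""
    s = 1.0
    for time, prob in curve:
        if time > t:
            break
        s = prob
    return s


if __name__ == "__main__":
    print(relative_risk(30, 200, 15, 200))  # hypothetical counts
    # Hypothetical follow-up in years; 1 = outcome occurred, 0 = censored.
    years = [0.5, 1.2, 2.0, 2.0, 3.5, 4.1, 5.0, 5.0]
    event = [1, 0, 1, 0, 1, 0, 0, 1]
    curve = kaplan_meier(years, event)
    for horizon in (1, 3, 5):
        print(f"{horizon}-year survival: {survival_at(curve, horizon):.2f}")
```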
  • Published on

    Meeting statistical assumptions

    Meeting statistical assumptions is IMPORTANT

    Statistics is a flawed mathematical science and assumptions MUST be met

    I've read in the literature that somewhere between 30-90% of all statistics reported in the medical literature are incorrectly conducted. First of all, that's a WIDE range and either extreme should be pretty frightening to consumers of healthcare and other related services. If your practitioner is using evidence-based practices, then one would hope that your treatment regimen does NOT fall within that range!

    Many times, statistics are incorrect because researchers do not check the statistical assumptions associated with their statistical tests. There are three fundamental assumptions that researchers should check before running most types of statistical tests:

    1. Normality - If you are using ANY continuous variables, then use skewness and kurtosis statistics to assess their normality. A common rule of thumb is that any variable with a skewness or (excess) kurtosis statistic above an absolute value of 2.0 is treated as non-normal.

    2. Homogeneity of variance - If you are comparing independent groups on a continuous outcome (a between-subjects analysis), then use Levene's test to check the assumption of homogeneity of variance. This assumption holds when the independent groups have similar variances on the outcome. If the p-value for Levene's test is LESS THAN .05, then the assumption has been violated.

    3. "Missingness" - Missing data is a constant battle when conducting research. There are many different reasons for missing data, but regardless of the cause, missing data can skew the results of a study by under-representing parts of the population of interest. A common rule of thumb is that if ANY of your variables have MORE THAN 20% of their observations missing, then that variable should be discarded.
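    The three checks can be sketched together in plain Python. This is a hedged illustration, not a replacement for a statistics package: skewness and excess kurtosis use the simple moment-based formulas (many packages report bias-adjusted versions that differ slightly in small samples), `levene_w` returns only the mean-centered Levene statistic (a p-value needs the F(k - 1, N - k) distribution; note that `scipy.stats.levene` defaults to the median-centered, Brown-Forsythe variant unless `center='mean'` is passed), and missing values are assumed to be coded as `None`. The function names are hypothetical, and the 2.0 and 20% cutoffs simply mirror the rules of thumb above.

```python
def skewness_kurtosis(values):
    """Moment-based sample skewness (g1) and excess kurtosis (g2).
    Assumes the values are not all identical (m2 > 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3  # minus 3: a normal curve scores ~0


def levene_w(*groups):
    """Mean-centered Levene statistic: a one-way ANOVA F on |Y_ij - mean_i|."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zi, zm in zip(z, z_means) for zij in zi)
    return (n_total - k) / (k - 1) * between / within


def screen_assumptions(data, groups=None, cutoff=2.0, max_missing=0.20):
    """Flag non-normal and mostly-missing columns; optionally compute Levene's W."""
    report = {"non_normal": [], "too_much_missing": [], "levene_w": None}
    for name, column in data.items():
        present = [v for v in column if v is not None]
        if (len(column) - len(present)) / len(column) > max_missing:
            report["too_much_missing"].append(name)
        g1, g2 = skewness_kurtosis(present)
        if abs(g1) > cutoff or abs(g2) > cutoff:
            report["non_normal"].append(name)
    if groups is not None:
        report["levene_w"] = levene_w(*groups)
    return report


if __name__ == "__main__":
    # Hypothetical study variables; None marks a missing observation.
    data = {
        "age": [34, 41, 29, 52, 47, 38, 45, 50],
        "length_of_stay": [1, 1, 2, 1, 1, 2, 1, 30],
        "bmi": [22.1, None, 27.4, None, None, 31.0, 24.8, 26.3],
    }
    print(screen_assumptions(data, groups=([5, 6, 7, 6], [2, 9, 4, 11])))
```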
  • Published on

    Effect size, sample size, and statistical power

    Choose an effect size to maximize statistical power and decrease sample size

    Effect size, sample size, and statistical power are nebulous empirical constructs, and each requires a strong conceptual working knowledge. There are also basic interdependent relationships amongst the three constructs: holding the significance level (alpha) constant, a change in one will ALWAYS produce a predictable change in at least one of the other two.

    An effect size is the hypothesized difference expected by researchers in an a priori fashion between independent groups (between-subjects analysis), across time or observations (within-subjects analysis), or the magnitude and direction of association between constructs (correlations and multivariate analyses).

    Effect size planning is perhaps the HARDEST part of designing a research study. Oftentimes, researchers have NO IDEA of what type of effect size they are trying to detect.

    First and foremost, when researchers cannot state the hypothesized differences in their outcomes, they should use an evidence-based measure of effect from a published study that is theoretically or conceptually similar to the phenomenon of interest. Using an evidence-based measure of effect in an a priori power analysis shows more empirical rigor on the part of the researchers and makes the sample size justification more defensible, because it rests on published values.
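    For example, when a similar published study reports group means, standard deviations, and sample sizes, a standardized effect size such as Cohen's d can be recovered using the pooled standard deviation. The sketch below assumes two independent groups; the numbers in the usage lines are hypothetical and not taken from any real study.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical published summary statistics: treatment vs. control.
    d = cohens_d(mean1=24.0, sd1=6.0, n1=40, mean2=21.0, sd2=6.5, n2=38)
    print(f"Cohen's d = {d:.2f}")
```

    The resulting d can then be entered into an a priori power analysis in place of a guessed effect size.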

    Sample size is the absolute number of participants sampled from a given population for purposes of running inferential statistics. The word "inferential" reflects the basic empirical reasoning: we draw a representative sample from a population and then conduct statistics in order to make inferences back to said population. An important part of preliminary study planning is to specify the inclusion and exclusion criteria for participation in your study and then estimate how large a participant pool is available to you from which to draw a sample.

    Because standard errors shrink as sample size grows, large sample sizes will drastically increase your chances of detecting a statistically significant finding; in other words, they increase your statistical power. Sufficiently large samples can detect both large and small effect sizes, although the sample size required depends on the scale of measurement of the outcome, the research design, and the magnitude, variance, and direction of the effect. Small sample sizes will decrease your chances of detecting statistically significant differences (statistical power), especially with categorical and ordinal outcomes, between-subjects and multivariate designs, and small effect sizes.

    Statistical power is the chance you have as a researcher to reject the null hypothesis, given that the treatment effect actually exists in the population. Basically, statistical power is the chance of finding a significant difference or main effect when one truly exists. Statistical power is what you are interested in when you ask, "How many people do I need to find significance?"

    In the applied empirical sense, aiming to detect large effect sizes increases statistical power, while trying to detect small effect sizes decreases it. Continuous outcomes increase statistical power because of increased precision and accuracy in measurement. Categorical and ordinal outcomes decrease statistical power because each observation carries less information (variance). Within-subjects designs generate more statistical power because participants serve as their own controls. Between-subjects and multivariate designs require more observations to detect differences and therefore decrease statistical power.
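    These trade-offs can be made concrete with the standard normal-approximation formula for comparing two independent group means: n per group = 2 × (z_alpha + z_beta)^2 / d^2, where z_alpha is the two-sided critical value and z_beta the quantile for the desired power. This is a hedged sketch assuming equal group sizes, a two-sided test, and a standardized effect size d; calculations based on the t distribution give slightly larger numbers (about 64 rather than 63 per group for d = 0.5). The function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group to detect effect size d (two-sided test)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def power_two_groups(d, n, alpha=0.05):
    """Approximate power of the same comparison with n participants per group."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n / 2)  # expected z statistic under the alternative
    return Z.cdf(shift - z_alpha) + Z.cdf(-shift - z_alpha)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):  # Cohen's conventional small, medium, large effects
        print(f"d = {d}: about {n_per_group(d)} per group for 80% power")
    for n in (20, 63, 150):
        print(f"n = {n} per group, d = 0.5: power = {power_two_groups(0.5, n):.2f}")
```

    Running it shows the interdependence among the three constructs: for a fixed alpha and power, smaller effect sizes require many more participants, and for a fixed effect size, power rises with sample size.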
  • Published on

    Precision and Accuracy

    Cornerstones of measurement reasoning

    Precision and accuracy are terms that are debated intensely in empirical arenas. While definitions will differ from textbook to textbook and within different academic circles, here is a general definition and explanation for both terms:  

    Precision relates to the reliability, consistency, and stability of a variable or outcome, as it is measured in a given population. Commonly in research and biostatistics, precision is assessed using confidence intervals (most often, 95% confidence intervals).

    When using categorical outcome variables in bivariate and multivariate analyses, the precision of the resulting odds ratios is reflected in the width of their confidence intervals. WIDE confidence intervals mean that there is LESS precision/reliability/consistency/stability/confidence in the measure. With categorical outcomes, wide confidence intervals are usually attributable to small sample sizes or few outcome events.
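    As an illustration, the sketch below computes an odds ratio and its 95% confidence interval with the common Woolf (log) method from a hypothetical 2x2 table, then repeats the calculation with every cell multiplied by ten. The odds ratio stays the same while the interval narrows. It assumes no cell is zero (a 0.5 correction is often added to each cell in that case).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI (Woolf/log method) from a 2x2 table:

                   outcome   no outcome
        exposed       a          b
        unexposed     c          d
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Same proportions, ten times the sample size: same OR, narrower CI.
    for scale in (1, 10):
        or_, lo, hi = odds_ratio_ci(20 * scale, 80 * scale, 10 * scale, 90 * scale)
        print(f"n = {200 * scale}: OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```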

    Analyses using continuous outcomes report 95% confidence intervals or standard errors for means, mean differences, and unstandardized beta coefficients. Sample size also plays an important role in the width of these confidence intervals.
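    For a single mean, the role of sample size is visible in the standard error, SE = SD / sqrt(n). A minimal sketch, assuming the normal critical value 1.96 for simplicity (for small samples a t critical value, which is larger, is more appropriate); the readings are hypothetical:

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Mean, standard error, and approximate 95% confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # SE shrinks as n grows
    return mean, se, (mean - z * se, mean + z * se)


if __name__ == "__main__":
    # Hypothetical systolic blood pressure readings (mmHg).
    readings = [118, 124, 131, 127, 122, 135, 129, 120]
    for data in (readings, readings * 5):  # same spread, five times the n
        mean, se, (lo, hi) = mean_ci(data)
        print(f"n = {len(data)}: mean = {mean:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```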

    Precision is often communicated as reliability in psychometrics. Survey instruments are pilot tested and then reliability coefficients are generated using test-retest, internal consistency, or inter-rater methods.  
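    For the internal consistency method, Cronbach's alpha is the most widely reported coefficient: alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores). A minimal sketch, assuming complete item-level data with respondents listed in the same order for every item; the item scores below are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of score lists, one list per item."""
    k = len(items)
    item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    return k / (k - 1) * (1 - item_variances / statistics.variance(totals))


if __name__ == "__main__":
    # Hypothetical 1-5 Likert responses: 3 items x 5 respondents.
    items = [
        [3, 4, 2, 5, 4],
        [2, 4, 3, 5, 4],
        [3, 5, 2, 4, 4],
    ]
    print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```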

    Accuracy pertains to the validity, utility, and interpretability of a variable or outcome, as it is measured in a given population. The accuracy or validity of a measure relies upon the methods, assessment, and evidence through which it was created using a theoretical or conceptual framework. In order for a measure to be deemed accurate, it must go through rigorous testing and application in the clinical environment.

    With clinical measures related to "gold standard" treatments, the absolute risk reduction (ARR) and the number needed to treat (NNT), or the absolute risk increase (ARI) and the number needed to harm (NNH), need to be established using randomized controlled trials and systematic reviews. With diagnostic tests, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and total diagnostic accuracy need to be compared against a current and widely accepted "gold standard" diagnostic test.
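    The arithmetic behind these clinical and diagnostic measures is simple enough to sketch directly. The function names, event rates, and 2x2 counts are hypothetical; NNT and NNH are returned unrounded here, although by convention NNT is usually rounded up to the next whole patient.

```python
def risk_difference(control_event_rate, treatment_event_rate):
    """ARR and NNT when treatment lowers risk; ARI and NNH when it raises risk."""
    diff = control_event_rate - treatment_event_rate
    if diff > 0:
        return {"ARR": diff, "NNT": 1 / diff}
    if diff < 0:
        return {"ARI": -diff, "NNH": 1 / -diff}
    return {"ARR": 0.0}  # no difference, so NNT is undefined


def diagnostic_accuracy(tp, fp, fn, tn):
    """Index test vs. gold standard, from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    print(risk_difference(control_event_rate=0.20, treatment_event_rate=0.15))
    print(diagnostic_accuracy(tp=90, fp=30, fn=10, tn=170))
```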

    Finally, in psychometrics, construct validity is established by gathering many different forms of empirical evidence related to the interpretability, utility, and consequences of the measure. Researchers often use correlations, between-subjects analyses, and multivariate statistics to generate validity evidence. Predictive, concurrent, convergent, and discriminant validity evidence is generated using bivariate correlations. Known-groups validity evidence is generated using parametric and non-parametric statistical tests. Incremental validity evidence is yielded using statistical regression techniques.