Tags

  • Published on

    New Pages for Hypothesis Testing, Measurement, and Populations in Research Engineer

    Pages for alpha value, beta value, Type I and Type II error, one-tailed and two-tailed tests, precision and accuracy, and inclusion and exclusion criteria

    New content for Research Engineer

    The following pages have been added to Research Engineer. We are dedicated here at Scale, LLC to delivering the newest and most pertinent content for you!

    1. Inclusion criteria
    2. Exclusion criteria
    3. Precision in measurement
    4. Accuracy in measurement
    5. Hypothesis testing
    6. Alpha value
    7. Beta value
    8. Type I error
    9. Type II error
    10. One-sided hypothesis
    11. Two-sided hypothesis

    Thank you for your continued use of the website! ~EH
  • Published on

    Small sample sizes, Type II errors, and empirical reasoning

    Small sample sizes can lead to Type II errors

    Significant effects may go undetected

    In instances where a phenomenon or outcome is less prevalent in the population, scientists are forced to work with small sample sizes. It is just the nature of the science, and of the phenomenon or outcome.

    1. When working with smaller sample sizes, adequate statistical power (and therefore the ability to reach statistical significance) is VERY hard to achieve.

    2. There is limited precision and accuracy when using categorical or ordinal outcomes, which can further decrease statistical power.

    3. When measuring small effect sizes, small samples cannot provide enough variance in the outcome to detect clinically meaningful but small effects. This REALLY decreases your statistical power, since inferential statistics depend upon variance in the mathematical sense.
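
    To make the link between sample size and power concrete, here is a minimal sketch that uses a normal approximation to a two-sided, two-sample test. The function name `approx_power`, the standardized effect size of 0.3, and the group sizes are illustrative assumptions, not values from any particular study.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is a standardized mean difference (Cohen's d).
    This is a normal approximation; an exact t-test calculation
    would give slightly lower power for very small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical small effect (d = 0.3) across increasing group sizes
for n in (10, 20, 50, 200, 400):
    print(f"n per group = {n:>3}: power ~ {approx_power(0.3, n):.2f}")
```

    Under these assumptions, power for a small effect stays low until each group is fairly large, which is the Type II error risk described above.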

    With that said, remember to interpret the p-values from RCT-level studies with small sample sizes in the context of the points above. If a treatment effect does not reach statistical significance but appears to be CLINICALLY SIGNIFICANT, with a p-value approaching significance (a possible Type II error), then perhaps more credence can be given to the effect.

    If researchers run bivariate tests on 30 different outcomes with fewer than 20 observations and claim significance without a Bonferroni adjustment, throw the article out.
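
    As a rough illustration of why the adjustment matters, the sketch below computes the Bonferroni-adjusted threshold for 30 tests and the chance of at least one false positive without it. The function names are hypothetical, and the familywise calculation assumes the 30 tests are independent, which real outcomes often are not.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni adjustment."""
    return alpha / n_tests


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests
    unadjusted tests, assuming the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


n_outcomes = 30
print(f"Adjusted threshold: {bonferroni_threshold(0.05, n_outcomes):.4f}")
print(f"Unadjusted familywise error rate: "
      f"{familywise_error_rate(0.05, n_outcomes):.2f}")

# Made-up p-values, purely for illustration
p_values = [0.001, 0.012, 0.030, 0.049]
threshold = bonferroni_threshold(0.05, n_outcomes)
print([p <= threshold for p in p_values])
```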
  • Published on

    Writing survey items

    Write survey items that cover content areas

    Survey items are composed of item stems and response sets

    When it comes to writing survey items that use Likert scales as response sets, use 5-point Likert scales in increasing order. The 5-point scale is preferable to a 4-point, 3-point, or dichotomous scale because there is more chance for variance with a 5-point scale and there is a "neutral" rating.

    Variance in the responses is needed in order to properly assess the diversity that may exist in a population. Increased variance is also important for the underlying mathematics associated with reliability analysis, exploratory factor analysis, validity analysis, and confirmatory factor analysis.

    The use of 5-point Likert scales also works well in an aesthetic fashion for structuring a survey. Space and time can be saved in survey administration when items from similar content areas use the same 5-point Likert response set. These questions can be formatted into a matrix.

    Finally, the response options of a Likert scale should be presented in increasing order, going from left to right.

    For example:

    Strongly Disagree, Disagree, Neither Agree Nor Disagree, Agree, Strongly Agree
    Never, Rarely, Sometimes, Often, Always
    Very Poor, Poor, Moderate, Good, Very Good
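
    One way to carry the increasing order through to analysis is to code each response by its position in the response set. The sketch below assumes a 1-to-5 numeric coding and uses made-up responses; the names `AGREEMENT` and `score_responses` are hypothetical.

```python
from statistics import variance

# Response set listed in increasing order, left to right
AGREEMENT = [
    "Strongly Disagree",
    "Disagree",
    "Neither Agree Nor Disagree",
    "Agree",
    "Strongly Agree",
]


def score_responses(responses, response_set):
    """Map each response label to its 1-based position in the set."""
    codes = {label: i for i, label in enumerate(response_set, start=1)}
    return [codes[r] for r in responses]


# Made-up responses to a single item
answers = ["Agree", "Strongly Disagree", "Agree",
           "Neither Agree Nor Disagree", "Strongly Agree"]
scores = score_responses(answers, AGREEMENT)
print(scores)
print(f"Sample variance of the item: {variance(scores):.2f}")
```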
  • Published on

    Precision and Accuracy

    Cornerstones of measurement reasoning

    Precision and accuracy are terms that are debated intensely in empirical arenas. While definitions will differ from textbook to textbook and within different academic circles, here is a general definition and explanation for both terms:  

    Precision relates to the reliability, consistency, and stability of a variable or outcome, as it is measured in a given population. Commonly in research and biostatistics, precision is assessed using confidence intervals (most often, 95% confidence intervals).

    When using categorical outcome variables in bivariate and multivariate analyses, the precision of the odds ratios yielded from the analyses is determined by the width of the confidence interval. WIDE confidence intervals mean that there is LESS precision/reliability/consistency/stability/confidence in the measure. With categorical outcomes, wide confidence intervals are often attributable to small sample sizes.
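
    The effect of sample size on interval width can be sketched with the standard Wald (log-based) 95% confidence interval for an odds ratio from a 2x2 table. The cell counts below are made up for illustration, and `odds_ratio_ci` is a hypothetical helper name; the second table simply multiplies every cell of the first by ten.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    All cells must be non-zero for this formula.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Same odds ratio, ten times the sample size
for label, cells in (("small", (8, 12, 4, 16)),
                     ("large", (80, 120, 40, 160))):
    est, lo, hi = odds_ratio_ci(*cells)
    print(f"{label}: OR = {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

    Both tables give the same point estimate, but only the larger sample yields an interval narrow enough to exclude an odds ratio of 1.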

    Analyses using continuous outcomes report the 95% confidence intervals or standard errors of means, mean differences, and unstandardized beta coefficients. Sample size also plays an important role in the width of confidence intervals when using continuous outcomes.

    Precision is often communicated as reliability in psychometrics. Survey instruments are pilot tested and then reliability coefficients are generated using test-retest, internal consistency, or inter-rater methods.  
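
    For the internal consistency method, a commonly reported coefficient is Cronbach's alpha. The sketch below computes it from a small, made-up pilot data set; the function name and the data are illustrative assumptions, and a real analysis would need far more respondents.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, where each
    respondent is a list of scores on the same k items."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    sum_item_variances = sum(variance(item) for item in items)
    total_score_variance = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_score_variance)


# Made-up pilot data: 5 respondents, 3 items on a 5-point scale
pilot = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 2],
    [1, 2, 1],
]
print(f"Cronbach's alpha: {cronbach_alpha(pilot):.2f}")
```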

    Accuracy pertains to the validity, utility, and interpretability of a variable or outcome, as it is measured in a given population. The accuracy or validity of a measure relies upon the methods, assessment, and evidence through which it was created using a theoretical or conceptual framework. In order for a measure to be deemed accurate, it must go through rigorous testing and application in the clinical environment.

    With clinical measures related to "gold standard" treatments, the absolute risk reduction (ARR) and the number needed to treat (NNT), or the absolute risk increase (ARI) and the number needed to harm (NNH), need to be established using randomized controlled trials and systematic reviews. With diagnostic tests, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and total diagnostic accuracy need to be compared against a current and widely accepted "gold standard" diagnostic test.
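
    The quantities named above follow directly from event counts and a 2x2 diagnostic table. The sketch below uses the standard formulas with made-up counts; the function names are hypothetical, and the NNT is rounded up to a whole patient by convention.

```python
import math
from fractions import Fraction


def arr_and_nnt(control_events, control_n, treated_events, treated_n):
    """Absolute risk reduction and number needed to treat.
    Assumes the treated group has the lower event rate."""
    arr = Fraction(control_events, control_n) - Fraction(treated_events, treated_n)
    nnt = math.ceil(1 / arr)  # round up to a whole patient
    return float(arr), nnt


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures against a gold standard, from counts of
    true/false positives and true/false negatives."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Made-up trial: 20/100 events in control, 15/100 in treatment
print(arr_and_nnt(20, 100, 15, 100))

# Made-up diagnostic study of 300 patients
for name, value in diagnostic_metrics(tp=90, fp=30, fn=10, tn=170).items():
    print(f"{name}: {value:.3f}")
```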

    Finally, in psychometrics, construct validity is established by gathering many different forms of empirical evidence related to the interpretability, utility, and consequences of the measure. Researchers often use correlations, between-subjects analyses, and multivariate statistics to generate validity evidence. Predictive, concurrent, convergent, and discriminant validity evidence is generated using bivariate correlations. Known-groups validity evidence is generated using parametric and non-parametric statistical tests. Incremental validity evidence is generated using statistical regression techniques.