Precision and Accuracy

9/11/2014


Cornerstones of measurement reasoning

Precision and accuracy are terms that are debated intensely in empirical research. While definitions differ from textbook to textbook and across academic circles, here are general definitions and explanations of both terms:

Precision relates to the reliability, consistency, and stability of a variable or outcome, as it is measured in a given population. Commonly in research and biostatistics, precision is assessed using confidence intervals (most often, 95% confidence intervals).

When categorical outcome variables are used in bivariate and multivariate analyses, the precision of the resulting odds ratios is reflected in the width of their confidence intervals. WIDE confidence intervals mean that there is LESS precision/reliability/consistency/stability/confidence in the measure. With categorical outcomes, wide confidence intervals most often result from small sample sizes, particularly when few participants experience the outcome.
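
To make the link between sample size and interval width concrete, here is a minimal Python sketch that computes an odds ratio and its 95% confidence interval from a 2x2 table. It assumes the Woolf (log-scale) method, which is one common choice among several (exact and profile-likelihood intervals also exist). The function name odds_ratio_ci and both tables are hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    Assumes no zero cells; no continuity correction is applied.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)         # about 1.96 for 95%
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

# Hypothetical tables with the same odds ratio (2.25) but a tenfold difference in size.
small = odds_ratio_ci(6, 4, 4, 6)      # n = 20
large = odds_ratio_ci(60, 40, 40, 60)  # n = 200

for label, (odds_ratio, low, high) in (("n = 20", small), ("n = 200", large)):
    print(f"{label}: OR = {odds_ratio:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Both hypothetical tables give an odds ratio of 2.25, but the interval for the 20-person table runs from roughly 0.38 to 13.47 and includes 1, while the 200-person table narrows it to roughly 1.28 to 3.96.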

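Analyses using continuous outcomes report 95% confidence intervals or standard errors for means, mean differences, and unstandardized beta coefficients. Sample size also plays an important role in confidence interval width for continuous outcomes: because the standard error of a mean is the standard deviation divided by the square root of the sample size, quadrupling the sample size roughly halves the width of the interval.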
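
A short sketch of that square-root relationship, assuming a hypothetical outcome with a standard deviation of 10 units. It uses the normal (z) approximation rather than Student's t to stay within the Python standard library, so it slightly understates interval width for small samples; mean_ci_halfwidth is an illustrative name.

```python
import math
from statistics import NormalDist

def mean_ci_halfwidth(sd, n, level=0.95):
    """Half-width of a normal-approximation CI for a mean: z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    standard_error = sd / math.sqrt(n)
    return z * standard_error

# Hypothetical outcome with SD = 10; each step quadruples the sample size.
for n in (25, 100, 400):
    print(f"n = {n:>3}: mean ± {mean_ci_halfwidth(10, n):.2f}")
```

Each fourfold increase in n halves the half-width (about 3.92, then 1.96, then 0.98 units).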

In psychometrics, precision is often discussed as reliability. Survey instruments are pilot tested, and reliability coefficients are then generated using test-retest, internal consistency, or inter-rater methods.
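
As one illustration of an internal consistency coefficient, here is a minimal sketch of Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). Alpha is only one such coefficient (alternatives such as McDonald's omega exist), and the three Likert items and five respondents below are hypothetical pilot data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for k item-score lists covering the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical pilot data: 3 items answered by 5 respondents (one list per item).
item1 = [4, 5, 3, 4, 2]
item2 = [4, 4, 3, 5, 2]
item3 = [5, 5, 2, 4, 3]
print(round(cronbach_alpha([item1, item2, item3]), 3))
```

For these hypothetical responses alpha comes out at about 0.886; items that move together across respondents push alpha toward 1.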

Accuracy pertains to the validity, utility, and interpretability of a variable or outcome, as it is measured in a given population. The accuracy or validity of a measure depends on the theoretical or conceptual framework used to create it and on the methods, assessment, and evidence used to evaluate it. To be deemed accurate, a measure must undergo rigorous testing and application in the clinical environment.

For clinical measures related to "gold standard" treatments, the absolute risk reduction (ARR) and number needed to treat (NNT), or the absolute risk increase (ARI) and number needed to harm (NNH), need to be established using randomized controlled trials and systematic reviews. For diagnostic tests, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and total diagnostic accuracy need to be compared against a current and widely accepted "gold standard" diagnostic test.
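
The arithmetic behind these clinical measures is short. The sketch below assumes hypothetical trial counts and a hypothetical 2x2 table from comparing a new test against a reference standard; the function names are illustrative. Following the usual convention, NNT and NNH are rounded up to a whole patient, and exact fractions are used so that rounding is not thrown off by floating-point error.

```python
import math
from fractions import Fraction

def arr_and_nnt(events_control, n_control, events_treated, n_treated):
    """Risk difference (control minus treated) and the matching NNT or NNH.

    A positive difference is an ARR paired with the NNT; a negative difference
    is an ARI (its magnitude) paired with the NNH. Both are rounded up.
    """
    control_rate = Fraction(events_control, n_control)
    treated_rate = Fraction(events_treated, n_treated)
    risk_difference = control_rate - treated_rate
    if risk_difference == 0:
        return risk_difference, None  # no difference, so NNT/NNH is undefined
    return risk_difference, math.ceil(1 / abs(risk_difference))

def diagnostic_accuracy(tp, fp, fn, tn):
    """Index-test results cross-classified against a reference ("gold standard") test."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among those with the condition
        "specificity": tn / (tn + fp),  # negatives among those without the condition
        "ppv": tp / (tp + fp),          # share of positive results that are correct
        "npv": tn / (tn + fn),          # share of negative results that are correct
        "total_accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical trial: 30 of 200 control vs 20 of 200 treated patients have the event.
arr, nnt = arr_and_nnt(30, 200, 20, 200)
print(f"ARR = {float(arr):.3f}, NNT = {nnt}")

# Hypothetical validation study of 400 patients.
metrics = diagnostic_accuracy(tp=90, fp=30, fn=10, tn=270)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical trial the ARR is 0.050, so about 20 patients would need to be treated to prevent one event. The hypothetical test has sensitivity, specificity, and total accuracy of 0.900, a PPV of 0.750, and an NPV of about 0.964. PPV and NPV also depend on how common the condition is in the sample, which is one reason they are reported alongside sensitivity and specificity.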

Finally, in psychometrics, construct validity is established by gathering many different forms of empirical evidence related to the interpretability, utility, and consequences of the measure. Researchers often use correlations, between-subjects analyses, and multivariate statistics to generate this validity evidence. Predictive, concurrent, convergent, and discriminant validity evidence is generated using bivariate correlations. Known-groups validity evidence is generated using parametric and non-parametric tests that compare groups expected to differ on the construct. Incremental validity evidence is generated using regression techniques that test whether the measure explains variance in an outcome beyond existing predictors.
