# Kappa is a measure of inter-rater reliability

## Rating performance or constructs at a dichotomous categorical level

The **Kappa** statistic is a measure of inter-rater reliability when the construct or behavior is being rated using a **dichotomous categorical** outcome. When a sequential series of steps must be completed to yield an end product, as in performance assessment, a "checklist" of "yes/no" responses is scored by independent raters. The Kappa statistic can then be used to assess the level of **agreement/consistency/reliability** between raters on the resulting dichotomous responses.
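As a minimal sketch of how this agreement index works: Cohen's kappa corrects observed agreement for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The checklist data below is hypothetical, purely for illustration.

```python
# Cohen's kappa for two raters scoring dichotomous (yes/no) checklist items.
# Ratings are coded 1 = "yes" (step performed), 0 = "no".

def cohens_kappa(rater_a, rater_b):
    """kappa = (p_o - p_e) / (1 - p_e) for two equal-length lists of 0/1 ratings."""
    n = len(rater_a)
    # observed proportion of items where the raters agree
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a_yes = sum(rater_a) / n
    p_b_yes = sum(rater_b) / n
    # chance agreement: both say "yes" by chance, or both say "no" by chance
    p_e = p_a_yes * p_b_yes + (1 - p_a_yes) * (1 - p_b_yes)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 10-step checklist scored by two independent raters
a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(round(cohens_kappa(a, b), 3))  # → 0.524
```

Here the raters agree on 8 of 10 steps (p_o = 0.80), but because both say "yes" often, chance agreement is high (p_e = 0.58), so kappa is noticeably lower than raw percent agreement.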

It is important that raters share an **operational definition** of what constitutes a "yes" or "no" with regard to performance. The construct or behavior of interest must be **standardized** between raters so that unsystematic bias is reduced. A lack of operationalization and standardization in performance assessment significantly **DECREASES** the chances of obtaining evidence of inter-rater reliability when using the Kappa statistic.

Kappa is not a "powerful" statistic because of the dichotomous categorical variables used in the analysis; larger sample sizes are needed to achieve adequate statistical power when categorical outcomes are used. So, **many observations** of the performance or simulation may be needed to adequately assess **BOTH** inter-rater reliability and the outcomes of interest. The chances of obtaining adequate inter-rater reliability decrease with fewer observations of performance or simulation.
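The sample-size point can be illustrated with a small, hypothetical simulation (the agreement probabilities below are assumptions, not values from the source): two raters each mis-score a "true" checklist result 5% of the time, and kappa is re-estimated from few versus many observations. With few observations the kappa estimate swings widely; with many it stabilizes near its true value.

```python
import random
import statistics

def kappa(a, b):
    """Cohen's kappa for two lists of 0/1 ratings; None if undefined."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1.0:  # no variation in either rater's scores: kappa is undefined
        return None
    return (p_o - p_e) / (1 - p_e)

random.seed(1)

def sampled_kappas(n_obs, reps=500):
    """Repeatedly simulate two raters scoring n_obs items and collect kappas."""
    out = []
    for _ in range(reps):
        # hypothetical "true" dichotomous outcomes, 70% "yes"
        truth = [1 if random.random() < 0.7 else 0 for _ in range(n_obs)]
        # each rater flips the true score 5% of the time (unsystematic error)
        a = [t if random.random() < 0.95 else 1 - t for t in truth]
        b = [t if random.random() < 0.95 else 1 - t for t in truth]
        k = kappa(a, b)
        if k is not None:
            out.append(k)
    return out

for n in (10, 200):
    ks = sampled_kappas(n)
    print(n, round(statistics.mean(ks), 2), round(statistics.stdev(ks), 2))
```

The spread (standard deviation) of the kappa estimates is several times larger at 10 observations than at 200, which is why few observations make it hard to demonstrate adequate inter-rater reliability even when the raters genuinely agree.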