Measuring Test-Retest Reliability: The Intraclass Kappa

Dennis Fisher1, Grace Reynolds-Fisher1, Eric Neri2, Art Noda2, Helena Kraemer2
1California State University, Long Beach, 2Stanford University


Abstract

Anyone using structured interview or questionnaire instruments must establish the psychometric properties of their instrument (i.e., reliability and validity). The first property that must be established is reliability, because a measure cannot be valid unless it is sufficiently reliable. When the response data are dichotomous (Yes/No, Presence/Absence, Positive/Negative, etc.), the most common measure in the literature is Cohen’s kappa (Cohen, 1960). This measure is appropriate for interrater reliability, in which the responses from two different raters are assessed for agreement. However, many reliability studies have data from the same rater at two different points in time; this is known as intrarater reliability or test-retest reliability. Cohen’s kappa “forgives” rater bias, which is not desirable for a measure used to assess test-retest reliability. The correct statistic to use is the intraclass kappa (Kraemer, Periyakoil, & Noda, 2002). We present a SAS macro that uses a bootstrap procedure to obtain both the point estimate and the confidence limits of the intraclass kappa, so that applied researchers reporting test-retest reliability will be able to report the correct statistic. The macro will not run in SAS 9.4 (TS1M2); it will run in SAS 9.4 (TS1M5) and 9.4 (TS1M6).
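
To make the distinction concrete: for binary ratings with observed proportion of agreement p_o and positive-response proportions p_1 and p_2 at test and retest (q_i = 1 - p_i), the two statistics differ only in the chance-agreement term. Cohen’s kappa builds that term from the two separate marginals, whereas the intraclass kappa uses the marginal pooled across the two administrations (a brief sketch, in the notation used here, following Kraemer, Periyakoil, & Noda, 2002):

\[
\hat{\kappa}_{\mathrm{Cohen}} \;=\; \frac{p_o - (p_1 p_2 + q_1 q_2)}{1 - (p_1 p_2 + q_1 q_2)},
\qquad
\hat{\kappa}_{I} \;=\; \frac{p_o - (\bar{p}^{2} + \bar{q}^{2})}{1 - (\bar{p}^{2} + \bar{q}^{2})},
\qquad
\bar{p} = \frac{p_1 + p_2}{2},\quad \bar{q} = 1 - \bar{p}.
\]

Because the chance term of the intraclass kappa is built from the pooled marginal, any systematic shift between test and retest lowers agreement relative to chance and is penalized, rather than being absorbed into the chance term and "forgiven" as it is by Cohen’s kappa.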
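
By way of illustration (a simplified sketch of the general approach, not the macro presented in this paper), a percentile-bootstrap confidence interval for the intraclass kappa can be obtained by resampling subjects with replacement. The data set name PAIRS, the 0/1 variables TIME1 and TIME2, the seed, and the 2,000 replicates below are illustrative assumptions:

   /* Illustrative sketch only, not the macro presented in this paper.
      Assumes a data set PAIRS with one row per subject and 0/1
      variables TIME1 (test) and TIME2 (retest).                      */

   /* Draw 2,000 bootstrap resamples of subjects with replacement */
   proc surveyselect data=pairs out=boot seed=20210
        method=urs samprate=1 reps=2000 outhits;
   run;

   /* Compute the intraclass kappa within each resample */
   proc sql;
      create table kappas as
      select Replicate,
             mean(time1=time2)              as p_o,    /* observed agreement  */
             (mean(time1)+mean(time2))/2    as p_bar,  /* pooled marginal     */
             (calculated p_o
                - ((calculated p_bar)**2 + (1-calculated p_bar)**2))
             / (1 - ((calculated p_bar)**2 + (1-calculated p_bar)**2))
                                            as kappa_i
      from boot
      group by Replicate;
   quit;

   /* 95% confidence limits from the 2.5th and 97.5th percentiles */
   proc univariate data=kappas noprint;
      var kappa_i;
      output out=ci pctlpts=2.5 97.5 pctlpre=ci_;
   run;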