Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies


Author(s): O’Leary, Shaun; Lund, Marte; Ytre-Hauge, Tore Johan; Holm, Sigrid Reiersen; Naess, Kaja; Dalland, Lars Nagelstad; McPhail, Steven M.
Date(s)

18/03/2014

Abstract

OBJECTIVE: To compare different reliability coefficients (exact agreement and variations of kappa: generalised, Cohen's, and the Prevalence-Adjusted and Bias-Adjusted Kappa (PABAK)) for four physiotherapists conducting visual assessments of scapulae.

DESIGN: Inter-therapist reliability study.

SETTING: Research laboratory.

PARTICIPANTS: 30 individuals with no history of neck or shoulder pain and no obvious significant postural abnormalities were recruited.

MAIN OUTCOME MEASURES: Ratings of scapular posture were recorded in multiple biomechanical planes under four test conditions (at rest and during three isometric conditions) by four physiotherapists.

RESULTS: The magnitude of the discrepancy between the two therapist pairs ranged from 0.04 to 0.76 for Cohen's kappa and from 0.00 to 0.86 for PABAK. In comparison, the generalised kappa provided a score between the two paired kappa coefficients. The differences between the mean generalised kappa and the mean Cohen's kappa (0.02), and between the mean generalised kappa and the mean PABAK (0.02), were negligible, but the differences between the generalised kappa and the paired kappa coefficients within each plane and condition were substantial: 0.02 to 0.57 for Cohen's kappa and 0.02 to 0.63 for PABAK.

CONCLUSIONS: Calculating coefficients for therapist pairs alone may result in inconsistent findings. In contrast, the generalised kappa provided a coefficient close to the mean of the paired kappa coefficients. These findings support the assertion that generalised kappa may provide a better representation of reliability between three or more raters, and that reliability studies calculating agreement between only two raters should be interpreted with caution. However, generalised kappa may mask more extreme cases of agreement (or disagreement) that paired comparisons may reveal.
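For readers unfamiliar with how these coefficients are obtained, the following is a minimal sketch (not the authors' analysis code) contrasting pairwise Cohen's kappa, the generalised (Fleiss') kappa across all raters, and PABAK. The ratings, the three posture categories, and the rater count used here are illustrative assumptions only.

    # Minimal illustrative sketch: pairwise Cohen's kappa, generalised (Fleiss') kappa,
    # and a multi-category PABAK. Data below are hypothetical, not the study's data.
    import numpy as np
    from sklearn.metrics import cohen_kappa_score                      # pairwise Cohen's kappa
    from statsmodels.stats.inter_rater import fleiss_kappa, aggregate_raters

    rng = np.random.default_rng(0)
    # Hypothetical ratings: 30 subjects, 4 raters, 3 posture categories (0, 1, 2).
    # Random ratings give coefficients near zero; real ratings would show more agreement.
    ratings = rng.integers(0, 3, size=(30, 4))

    # Cohen's kappa for every rater pair (6 pairs for 4 raters)
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    cohen = {p: cohen_kappa_score(ratings[:, p[0]], ratings[:, p[1]]) for p in pairs}

    # Generalised (Fleiss') kappa across all four raters:
    # aggregate_raters converts raw ratings to a subjects x categories count table.
    table, _ = aggregate_raters(ratings)
    generalised = fleiss_kappa(table, method='fleiss')

    # PABAK for a rater pair, generalised to k categories: (k * Po - 1) / (k - 1),
    # where Po is the observed proportion of agreement.
    def pabak(r1, r2, k=3):
        po = np.mean(r1 == r2)
        return (k * po - 1) / (k - 1)

    pabak_scores = {p: pabak(ratings[:, p[0]], ratings[:, p[1]]) for p in pairs}

    print("Cohen's kappa per pair:", cohen)
    print("Generalised (Fleiss') kappa:", generalised)
    print("PABAK per pair:", pabak_scores)

Comparing the spread of the pairwise Cohen's kappa and PABAK values against the single generalised kappa illustrates the study's central point: the generalised coefficient sits near the mean of the paired coefficients, while individual pairs can diverge considerably.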

Identifier

http://eprints.qut.edu.au/66185/

Publisher

Elsevier Ltd.

Relation

DOI:10.1016/j.physio.2013.08.002

O’Leary, Shaun, Lund, Marte, Ytre-Hauge, Tore Johan, Holm, Sigrid Reiersen, Naess, Kaja, Dalland, Lars Nagelstad, & McPhail, Steven M. (2014) Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies. Physiotherapy, 100(1), pp. 27-35.

Rights

Copyright 2013 Chartered Society of Physiotherapy

This is the author’s version of a work that was accepted for publication in Physiotherapy. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Physiotherapy, [VOL 100, ISSUE 1, (2014)] DOI: 10.1016/j.physio.2013.08.002

Source

Faculty of Health; Institute of Health and Biomedical Innovation; School of Public Health & Social Work

Keywords #010402 Biostatistics #110317 Physiotherapy #111799 Public Health and Health Services not elsewhere classified #Reliability; Kappa; Scapular; Posture; Inter-therapist; Agreement
Type

Journal Article