Abstract
Six rater agreement measures, drawn from three different approaches, were compared by means of a simulation study. The coefficients σ of Bennett (1954), π of Scott (1955), κ of Cohen (1960) and γ of Gwet (2008) were selected to represent the classical, descriptive approach; the α agreement parameter of Aickin (1990) to represent the loglinear and mixture model approaches; and the ∆ measure of Martín and Femia (2004) to represent the multiple-choice test approach. The main results confirm that the descriptive measures π and κ show high mean bias in the presence of extreme values of prevalence and rater bias, but little or no bias for moderate values. The best behavior was observed for the Bennett and the Martín and Femia agreement measures at all levels of prevalence.
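As a rough illustration of why prevalence matters for the descriptive measures, the sketch below computes the four descriptive coefficients for a single two-rater table. It assumes the usual textbook form (p_o - p_e) / (1 - p_e), with a different chance term p_e for each coefficient; these formulas are not stated in the abstract. The function name `descriptive_agreement` and the example table are hypothetical, and the α and ∆ measures are left out because they require model fitting.

```python
def descriptive_agreement(table):
    """Return sigma, pi, kappa and gamma for a square K x K table of counts.

    Rows index the first rater's category, columns the second rater's.
    Assumes the standard chance-corrected form (p_o - p_e) / (1 - p_e).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p = [[cell / n for cell in row] for row in table]

    # Observed agreement: proportion of items on the main diagonal.
    p_o = sum(p[i][i] for i in range(k))

    # Marginal proportions for each rater, and their category-wise mean.
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    mean = [(row[i] + col[i]) / 2 for i in range(k)]

    # Chance-agreement term assumed for each coefficient.
    chance = {
        "sigma": 1 / k,                                    # uniform categories
        "pi": sum(m * m for m in mean),                    # pooled marginals
        "kappa": sum(r * c for r, c in zip(row, col)),     # rater-specific marginals
        "gamma": sum(m * (1 - m) for m in mean) / (k - 1), # Gwet's AC1 form
    }
    return {name: (p_o - p_e) / (1 - p_e) for name, p_e in chance.items()}


# Hypothetical table with extreme prevalence: 94% of items fall in category 1.
skewed = [[90, 4], [4, 2]]
result = descriptive_agreement(skewed)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For this invented table the observed agreement is 0.92, yet π and κ both come out near 0.29, while σ is 0.84 and γ is about 0.91. This is one made-up table, not a reproduction of the simulation study.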