Spanish word frequencies based on film subtitles

Published on: 26/06/2011 | Categories: 2008–2012, Vol. 32 (2), Vol. 32 (2011)

Fernando Cuetos, Maria Glez-Nosti, Analía Barbón, Marc Brysbaert

Abstract

Recent studies have shown that word frequency estimates obtained from film and television subtitles predict performance in word recognition experiments better than the traditional word frequency estimates based on books and newspapers. In this study, we present a subtitle-based word frequency list for Spanish, one of the most widely spoken languages. The subtitle frequencies are based on a corpus of 41 million words taken from contemporary movies and TV series (screened between 1990 and 2009). In addition, the frequencies have been validated by correlating them with the reaction times (RTs) from two megastudies involving 2,764 words each (lexical decision and word naming tasks). The subtitle frequencies explained 6% more of the variance than the existing written frequencies in lexical decision, and 2% more in word naming.
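
As a rough illustration of what "variance explained" means in this kind of validation, the sketch below correlates log-transformed frequency counts with mean RTs and squares the correlation. It is a minimal sketch under stated assumptions, not the procedure used in the study: the log10(count + 1) transform, the function names, and all the numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def variance_explained(counts, rts):
    """Proportion of RT variance explained by log frequency (r squared).

    Assumption: log10(count + 1) is used so that zero counts stay defined.
    """
    log_freq = [math.log10(c + 1) for c in counts]
    return pearson_r(log_freq, rts) ** 2


# Hypothetical toy values for five words; NOT data from the study.
subtitle_counts = [1200, 300, 45, 8, 2]
written_counts = [900, 20, 400, 3, 15]
mean_rts_ms = [520, 560, 610, 680, 720]

r2_subtitle = variance_explained(subtitle_counts, mean_rts_ms)
r2_written = variance_explained(written_counts, mean_rts_ms)

# Difference in explained variance between the two frequency measures.
extra_variance = r2_subtitle - r2_written
print(f"subtitle R^2 = {r2_subtitle:.3f}, written R^2 = {r2_written:.3f}")
print(f"extra variance explained = {extra_variance:.3f}")
```

With real data, each list would hold one entry per word in the megastudy, and the difference between the two R² values would correspond to the kind of percentage-point gap reported above.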

Open Access
