Photo by DALL·E 2

Published On: 23/09/2025 | Categories: 2023-2027, Vol. 46 (2), Vol. 46 (2025)
Cite as: Sendín, E., Conde, J., Reviriego, P., Haro, J., Ferré, P., Hinojosa, JA., Brysbaert, M. (2025). Combining the power of large language models with finetuning based on strategically collected human ratings: A case study about age-of-acquisition estimates of Spanish words. Psicologica 46(2): e17563

Abstract

This study examined the ability of a large language model, GPT-4o mini, to predict age of acquisition (AoA) for Spanish words, as compared to human ratings. We found a strong correlation (ρ=.75) between the model’s AoA estimates and mean human ratings. This correlation was lower than the level of agreement observed between individual human raters (ρ=.85), but we found that finetuning the model on a relatively small dataset of 2,000 human AoA ratings has the potential to enhance the model’s performance to a level comparable to human consensus. Consistent with theoretical expectations, our analyses confirmed that AoA estimates are meaningful only for words within an individual’s vocabulary. Finally, we present a novel dataset of AoA estimates for 28,453 Spanish words likely known by adult speakers.
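
For readers unfamiliar with the ρ values reported above, the following is a minimal sketch of how a Spearman rank correlation between model estimates and mean human ratings can be computed. It is not the authors' analysis code: the function names (`average_ranks`, `pearson`, `spearman_rho`) and the five AoA values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical AoA values (in years) for five words; not data from the study.
human_mean_aoa = [2.5, 4.0, 6.5, 9.0, 11.0]
model_aoa = [3.0, 5.0, 4.5, 8.0, 12.0]

rho = spearman_rho(human_mean_aoa, model_aoa)
print(f"Spearman rho = {rho:.2f}")
```

Because the statistic depends only on rank order, it measures whether the model orders words from early to late acquisition in the same way as the human raters, regardless of the scale on which the estimates are expressed.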

Open Access