Published On: 27/01/2017 | Categories: 2013–2017, Vol. 38 (1), Vol. 38 (2017)


As language processing models become more refined and new variables that modulate these processes are discovered, selecting stimuli for experiments with a factorial design is becoming a tough task. Selecting sets of words that differ in one variable while matching them on dozens of potentially confounding variables is time-consuming and error-prone. To assist experimenters in this thankless task, we present a simple method that performs it with little effort. The method uses k-means clustering to detect small, tight clusters of words that match on the desired variables. We have formalized the procedure as an algorithm, that is, a series of easy-to-follow steps. We also provide SPSS syntax that helps in choosing the appropriate size of the clustering. After reviewing the theory, we present a worked example that guides the reader through the complete procedure. The dataset for the worked example is available as supplementary material to this paper.
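To make the general idea concrete, here is a minimal Python sketch of matching words through k-means clustering. It is not the paper's procedure or its SPSS syntax: the word list, the variable values, the function names (`zscores`, `kmeans`, `matched_clusters`), the deterministic centroid initialization, and the choice of `k = 4` are all assumptions made for illustration. The sketch clusters words on their standardized control variables and keeps only those clusters that contain words at every level of the manipulated variable.

```python
from statistics import mean, pstdev

# Hypothetical stimuli: (word, level of manipulated variable, control variables).
# The control values (length, imageability) are invented for illustration.
WORDS = [
    ("cat", "high", (3, 6.1)), ("hen", "low", (3, 6.0)),
    ("table", "high", (5, 5.0)), ("stool", "low", (5, 5.1)),
    ("truth", "high", (5, 2.0)), ("ethos", "low", (5, 2.1)),
    ("doctrine", "high", (8, 2.5)), ("axiology", "low", (8, 2.4)),
]

def zscores(rows):
    """Standardize each column so every control variable weighs equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Plain Lloyd-style k-means with evenly spaced initial centroids (an assumption)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sqdist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels

def matched_clusters(words, k):
    """Cluster on control variables; keep clusters covering every level."""
    labels = kmeans(zscores([w[2] for w in words]), k)
    clusters = {}
    for (name, level, _), lab in zip(words, labels):
        clusters.setdefault(lab, {}).setdefault(level, []).append(name)
    levels = {w[1] for w in words}
    return [c for c in clusters.values() if set(c) == levels]

if __name__ == "__main__":
    for cluster in matched_clusters(WORDS, k=4):
        print(cluster)
```

In this sketch a larger `k` yields smaller, tighter clusters (closer matches) but leaves fewer clusters that still contain every level of the manipulated variable, which is the trade-off behind choosing the size of the clustering.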

Open Access