In this paper we reflect on the numerous calls for the development of benchmarks for interpreting effect size indices, and we review several possibilities. Such benchmarks aim to provide criteria that allow analysts to judge whether the observed effect is “small”, “medium” or “large”. The context of this discussion is single-case experimental designs, for which a great variety of procedures have been proposed. Their different nature (e.g., being based on the amount of overlap vs. on a standardized mean difference) poses challenges to interpretation. For each of the alternatives discussed, we point out its strengths and limitations. We also comment on how such empirical benchmarks can be obtained, usually by methodologists. We then illustrate how applied researchers can use these benchmarks when they want evidence on the magnitude of the observed effect, and not only on whether an effect is present. One of the alternatives discussed is a proposal we make in the current paper. Although it has certain limitations, as all alternatives do, we consider it worth discussing, together with the whole set of alternatives, in order to advance the interpretation of effect sizes, now that computing and reporting their numerical values is (or is expected to be) common practice.
