23 Generalisability

Scientific studies often rely on some kind of sampling. This could mean recruiting people to take part in the study, or it could mean compiling a collection of musical scores or recordings to analyse in a corpus analysis. In either case, we’d describe the resulting collection of participants or musical extracts as a sample group.

The goal of the scientific study is, in most cases, to establish findings that generalise outside of the particular sample groups tested. We are not interested, per se, in quantifying the psychological tendencies of the particular participants who happened to come and do our study; we are interested rather in understanding psychological tendencies of the broader human population.

When performing such a generalisation, we need to think carefully about two main aspects of the sample group: size and representativity. Let’s discuss each in turn.

23.1 Sample group size

People vary in all kinds of ways, and this variation manifests itself in the data that we collect in psychological experiments. Think back to the study discussed in a previous chapter about the impact of music lessons on general intelligence (Schellenberg, 2004). Each child tested in this study will have had their own individual learning trajectory over the year-long research period, independent of whatever training condition they were assigned to. Some will have happened to develop particularly fast; some will have happened to develop particularly slowly. The smaller the sample group, the larger the share of any observed difference between experimental conditions that could be due to these random variations, and the less confident we can be that our results will generalise to the wider population. This is why large sample groups are highly valued in scientific research.
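To make this concrete, here is a minimal simulation sketch. It is not a reanalysis of Schellenberg (2004): the function names, the standard deviation of 15 points, and the group sizes are all hypothetical values chosen for illustration. We assume that the training condition has no true effect at all, simulate many imaginary studies, and look at how much the observed difference between the two groups varies purely by chance.

```python
import random
import statistics

def simulate_group_difference(n_per_group, rng, true_effect=0.0, sd=15.0):
    """Simulate one study and return the observed difference in mean score
    change between a 'lessons' group and a 'control' group.

    All parameter values are illustrative assumptions, not real data.
    """
    control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    lessons = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
    return statistics.mean(lessons) - statistics.mean(control)

def spread_of_differences(n_per_group, n_studies=2000, seed=1):
    """Run many simulated studies and return the standard deviation of the
    observed group differences, i.e. how much they vary by chance alone."""
    rng = random.Random(seed)
    diffs = [simulate_group_difference(n_per_group, rng) for _ in range(n_studies)]
    return statistics.stdev(diffs)

# Compare chance variation for small, medium, and large sample groups.
for n in (10, 40, 160):
    print(f"n per group = {n:3d}: spread of chance differences = "
          f"{spread_of_differences(n):.2f}")
```

Under these assumptions, the spread of chance differences should roughly halve each time the group size is quadrupled. With only 10 children per group, a difference of several points could easily arise with no true effect whatsoever; with 160 per group, the same difference would be much harder to attribute to chance.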

23.2 Sample group representativity

Certain kinds of participants are typically much more convenient to recruit than others. When we conduct an experiment in a university context, it is very common for the participants to be drawn from the local undergraduate community. When we conduct a study online using crowdsourcing services such as Prolific or Amazon Mechanical Turk, we will be recruiting a particular kind of person who spends time completing online tasks for money.

When conducting such studies, we have to think very carefully about the makeup of our participant group, and the implications for the generalisability of our findings. For example, we should be very careful about declaring universals of music perception on the basis of a participant group composed solely of undergraduates from the University of Cambridge, because virtually all of these students will be familiar with the Western tonal music tradition, and this familiarity will shape their responses to music in ways that may not generalise cross-culturally.
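The following sketch illustrates why a large sample does not fix this problem. Everything in it is hypothetical: we invent a population in which 30% of people are familiar with a given musical tradition, and we assume that familiar listeners score higher on some made-up perceptual measure. We then compare a convenience sample drawn only from familiar listeners with a random sample drawn from the whole population.

```python
import random
import statistics

def make_population(rng, size=100_000, share_familiar=0.3):
    """Build a hypothetical population of (familiar, score) pairs.

    The subgroup means (70 vs 50) and the spread (10) are invented
    purely for illustration.
    """
    population = []
    for _ in range(size):
        familiar = rng.random() < share_familiar
        mean = 70.0 if familiar else 50.0
        population.append((familiar, rng.gauss(mean, 10.0)))
    return population

rng = random.Random(42)
population = make_population(rng)
population_mean = statistics.mean(score for _, score in population)

# Convenience sample: only people familiar with the tradition are recruited.
familiar_only = [score for familiar, score in population if familiar]
convenience_mean = statistics.mean(rng.sample(familiar_only, 5000))

# Random sample: everyone in the population is equally likely to be recruited.
random_mean = statistics.mean(score for _, score in rng.sample(population, 5000))

print(f"population mean:         {population_mean:.1f}")
print(f"random sample mean:      {random_mean:.1f}")
print(f"convenience sample mean: {convenience_mean:.1f}")
```

Both samples contain 5,000 people, but under these assumptions only the random sample lands close to the population mean. The convenience sample gives a precise estimate of the wrong quantity: it tells us about familiar listeners, not about people in general.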

The importance of this representativity and generalisability question was highlighted by a well-publicised series of papers concerning the acronym ‘WEIRD’, which stands for ‘Western, Educated, Industrialized, Rich, and Democratic’ (Henrich et al., 2010; Rad et al., 2018). Henrich et al. (2010) observed that the great majority of published psychological studies base their conclusions solely on WEIRD participants, even though researchers typically frame their findings as general truths about human psychology. They provide a series of interesting and counterintuitive examples of how aspects of psychology that we might intuitively consider to be innate and universal actually differ substantially cross-culturally, highlighting the fact that it’s difficult to make any real assumptions about psychological generalisability in the absence of concrete evidence. The field has been slow to respond meaningfully to this issue, partly because it is so expensive and time-consuming to collect truly representative sample groups, and partly because the scientific culture incentivises researchers to make claims that are as broad as they can get away with (Rad et al., 2018).

Representativity is also an important issue in music corpus analyses, where one takes a collection of music compositions, analyses them computationally, and tries to draw general conclusions about a musical style. Choosing an appropriate sample group in this context can be very complex, especially when one is trying to analyse broad genres (e.g. classical music, jazz music) rather than individual composers. See London (2013) for a discussion of these issues.

23.3 Conclusion

Sample group size and representativity are both very important for ensuring that the results of a scientific study are generalisable. Both bring significant practical and logistical costs, and in practice it is rarely possible to perfect both aspects. When writing up a scientific study, it is important to acknowledge both of these issues, and to address how they might limit the conclusions being drawn.

References

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Most people are not WEIRD. Nature, 466(7302), 29. https://doi.org/10.1038/466029a

London, J. (2013). Building a representative corpus of classical music. Music Perception, 31(1), 68–90.

Rad, M. S., Martingano, A. J., & Ginges, J. (2018). Toward a psychology of homo sapiens: Making psychological science more representative of the human population. Proceedings of the National Academy of Sciences of the United States of America, 115(45), 11401–11405. https://doi.org/10.1073/pnas.1721165115

Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science, 15(8), 511–514. https://doi.org/10.1111/j.0956-7976.2004.00711.x