13 Evolution

This chapter explores the biological ingredients that predispose humans to create and appreciate music, and asks why these ingredients developed in the first place.

We are going to approach these topics from the perspective of evolutionary biology. Evolution describes the way in which the inherited characteristics of a population of organisms change over successive generations in response to pressures such as natural selection. Natural selection describes how individuals with different inherited characteristics differ in their propensity to survive and reproduce. A beneficial characteristic may originally come about through a chance genetic mutation: if this characteristic helps the organism to survive and reproduce, then it will spread to more organisms in the next generation, whose own survival and reproduction chances are in turn helped by the characteristic. The theory of evolution is essential for answering ‘why’ questions in biology: it helps us to understand why particular biological traits developed in the first place.
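To make the logic of selection concrete, here is a minimal simulation of how a variant that slightly improves reproductive success spreads through a population. This is a sketch with purely illustrative numbers (a 5% fitness advantage, a starting frequency of 1%), not empirical estimates.

```python
# Minimal illustration of natural selection: a variant that slightly
# improves reproductive success gradually takes over the population.
# All parameter values are illustrative assumptions.

def next_generation_frequency(p, fitness_variant=1.05, fitness_other=1.00):
    """Frequency of the variant after one generation of selection."""
    mean_fitness = p * fitness_variant + (1 - p) * fitness_other
    return p * fitness_variant / mean_fitness

p = 0.01  # the variant starts rare, e.g. arising from a single chance mutation
for generation in range(500):
    p = next_generation_frequency(p)

print(f"frequency after 500 generations: {p:.3f}")  # approaches 1.0
```

Even a modest advantage compounds across generations: the variant's odds grow by a constant factor each generation, so it eventually dominates the population.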

We will structure our discussion around two particularly important research questions in the field, which are as follows:

  1. Where did our musical capacities (our ‘musicality’) come from?

  2. What are music’s adaptive functions?

13.1 Where did our musicality come from?

Before we can begin to answer this question, we need a clear idea of what ‘musicality’ comprises. Reviewing the music psychology literature gives us many example traits that contribute to musicality, such as beat perception, pitch perception, consonance perception, emotion perception, grouping, and sensitivity to musical ‘syntax’. So, we can decompose the question of ‘why did musicality develop’ into many smaller questions: ‘why did beat perception develop’, ‘why did pitch perception develop’, ‘why did consonance perception develop’, and so on.

A common thread throughout all of these questions is the possibility of ‘exaptation’. A biological trait is called ‘exapted’ if it originally evolved for one purpose, but then later was adopted for another purpose. For example, it is now thought that feathers originally evolved for insulation purposes, and were only later exapted for supporting flight. In the context of music, we are particularly interested in understanding whether certain aspects of musicality (for example beat perception) evolved specifically for musical functions, or whether they instead evolved originally for another purpose, and were only later adopted for music.

How can we tell whether a particular musical trait was exapted? It’s always difficult to prove things in evolutionary biology, but nonetheless two kinds of evidence are particularly useful here.

One important piece of evidence comes when the trait has (or at least had, at some point in evolutionary history) an adaptive function that is non-musical. This is a necessary precondition for exaptation. However, it’s not sufficient, because perhaps the trait originally evolved for music and only subsequently acquired its non-musical function.

So, a second useful piece of evidence comes when we find that the trait evolved before music did. If this is the case, then the trait’s evolution cannot have been guided by musical functions. Unfortunately, it is quite difficult to know exactly when music evolved, because music does not leave much of a fossil record. However, scientists try to get a handle on this question by comparing human evolution to that of related species, for example chimpanzees and other apes, which do not produce or engage with music in the way that we understand it.

If we apply these principles, it turns out that many aspects of musicality may well have evolved for non-musical purposes, only later being exapted for musicality. We come to this conclusion after establishing that these musical traits in fact have clear non-musical functions, and are shared with species that do not produce music, implying that they evolved prior to music’s evolution.

Let’s go through a few examples.

13.1.1 Pitch perception

Pitch perception is essential to appreciating most forms of music – without it we can’t begin to appreciate the structure or form of a melody. One might therefore hypothesise that we evolved pitch perception for its musical function. However, it turns out that pitch perception is far from being unique to humans – we find it in many parts of the animal kingdom. In particular, pitch perception seems to be useful for animals that communicate with one another using vocalisations. The ability to control the pitch of one’s vocalisations and to identify the pitch of others’ vocalisations unlocks a whole range of expressive potential, allowing birds to produce complex competitive singing displays in order to attract mates, allowing chimpanzees to communicate danger using alarm calls, and so on.

13.1.2 Beat perception

Beat perception is a second skill that is fundamental to appreciating many kinds of music. This is the process by which we hear a musical extract and identify an underlying isochronous beat that we can entrain to. Most humans can do this easily, but it turns out to be a non-trivial skill: many of our closest animal relatives are unable to entrain to a beat, and it is surprisingly hard to program a computer to perform accurate beat extraction. So, we might again hypothesise that beat perception is a skill that evolved especially to support music making and listening. However, when we look more broadly across the animal kingdom, we do, surprisingly, find certain species capable of beat perception, though not species that are particularly close to humans in evolutionary terms.
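To give a flavour of why beat extraction is hard to program, here is a toy tempo estimator, an illustrative sketch rather than a real beat tracker: it autocorrelates a hand-made binary onset grid and picks the lag at which the pattern best lines up with itself. Even on this clean input it must arbitrate between competing metrical levels; real audio adds noisy onsets, expressive timing, and tempo drift.

```python
# Toy tempo estimation: autocorrelate a binary onset grid and return the
# lag at which the pattern best lines up with a shifted copy of itself.
# Real beat trackers must also cope with noisy onsets, syncopation,
# expressive timing, and tempo changes.

def estimate_beat_period(onsets, min_lag=2, max_lag=12):
    def score(lag):
        # Count coinciding onsets between the pattern and its shifted copy.
        return sum(a * b for a, b in zip(onsets, onsets[lag:]))
    return max(range(min_lag, max_lag + 1), key=score)

# An illustrative syncopated rhythm with a strong pulse every 4 steps.
pattern = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0] * 4
print(estimate_beat_period(pattern))  # 4: the underlying beat period
```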

The video below shows a well-known dancing cockatoo called Snowball (Patel et al., 2009). Snowball likes to dance to the musical beat, displaying abilities that strongly imply a sophisticated capacity for beat perception. Alongside cockatoos, beat perception abilities have also been demonstrated in budgerigars and sea lions, and there are ongoing research programmes seeking to identify more species capable of beat perception. Importantly, these species seem not to have evolved beat perception in one common evolutionary event; instead, beat perception seems to have convergently evolved multiple times in multiple species, not obviously in the context of music-making.

Snowball the parrot dancing to music. Credit: Patel et al. (2009)


Patel & Iversen (2014) proposed that beat perception abilities come as a consequence of evolving ‘vocal-learning’ abilities. Vocal learning means learning to produce complex vocal signals from auditory experience and sensory feedback. Vocal-learning abilities are only present in a small range of animal species (parrots, songbirds, hummingbirds, elephants, seals, and bats), and seems to require specialised neural circuitry coupling motor regions to auditory regions. Patel and Iversen’s hypothesis is that beat perception comes ‘for free’ as a consequence of developing this neural circuitry. In later work, Patel (2021) has nuanced this view, arguing that the human capacity for entrainment was further finessed through selection pressures specific to music (see Section 13.2). Various scientists are currently trying to explore these hypotheses by investigating whether other vocal learners, for example seals and bats, possess latent beat perception abilities.

13.1.3 Emotion perception

Emotion perception is a third key characteristic of musicality. Music has a great capacity to elicit emotional responses, and one might think that the sensitivity to these emotions evolved specifically for this context. However, if we explore the cues that music uses to elicit emotional responses, it turns out that many of these cues are shared directly with language. We will discuss these mechanisms in detail in Chapter 15.

13.1.4 Auditory scene analysis

A fourth key aspect of music perception is the ability to parse complex musical textures in which many things are happening at the same time. This might involve, for example, distinguishing the melody from the accompaniment in a pop song, or hearing out the different subjects in a Bach fugue. These skills for parsing musical scenes seem directly linked to general processes of auditory scene analysis that help organisms to understand non-musical auditory environments. For example, at a cocktail party, auditory scene analysis is what allows us to separate the noisy sound signal into different perceptual streams corresponding to different conversations. Many aspects of human auditory scene analysis seem to be shared with non-human, non-musical species, implying that these traits evolved for non-musical rather than musical applications.
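As a toy illustration of one cue used in auditory scene analysis, the sketch below, an illustrative heuristic rather than a model from the literature, groups interleaved tones into streams by pitch proximity. It echoes classic streaming demonstrations in which rapid alternation between distant pitches splits into two perceptual streams; the 5-semitone threshold is an arbitrary assumption.

```python
# Toy stream segregation: assign each tone to the existing stream whose
# last pitch is nearest, or start a new stream if every stream is more
# than `threshold` semitones away. Real auditory scene analysis also
# uses timing, timbre, loudness, and spatial cues.

def segregate(pitches, threshold=5):
    streams = []
    for pitch in pitches:
        candidates = [s for s in streams if abs(s[-1] - pitch) <= threshold]
        if candidates:
            nearest = min(candidates, key=lambda s: abs(s[-1] - pitch))
            nearest.append(pitch)
        else:
            streams.append([pitch])
    return streams

# Interleaved high and low tones (MIDI note numbers): the alternating
# sequence falls apart into two streams.
print(segregate([60, 72, 61, 73, 59, 71]))
# [[60, 61, 59], [72, 73, 71]]
```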

13.1.5 Musical syntax

Musical syntax is a fifth important aspect of music perception. By musical syntax, we mean music’s sequential and hierarchical structure: for example, the sense in which successive melody notes may be grouped into implied chords, how these chords may be grouped into chord progressions, and how chord progressions come together to produce higher-level tonal organisation. Understanding these musical syntactic structures requires fairly complex cognitive processing, and we might wonder why humans developed the cognitive skills to do this.

A prominent hypothesis is that our ability to process musical syntax stems from our ability to process linguistic syntax. This hypothesis is suggested by various structural similarities between linguistic and musical syntax, for example the way that both music and language can be conceptualised as atomic elements (such as notes or phonemes) that are hierarchically grouped into higher-level structures, such as chords and melodies or words and sentences (Chomsky, 1957; Rohrmeier, 2011). There is some tentative empirical evidence for this hypothesis, demonstrating for example certain similarities or overlaps between neural signatures of linguistic and musical processing. However, it is difficult to settle the question definitively, because music and language are not found in any other species, and because neither leaves much of a fossil record, making it difficult to know which came first in the evolutionary timeline. This connection between music and language is the subject of much ongoing research.
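To make the structural analogy concrete, here is a toy generative grammar for chord progressions, loosely in the spirit of Rohrmeier (2011). The rules themselves are drastically simplified illustrative assumptions, not the published grammar; the point is just that hierarchical rewrite rules of the kind used for sentences can also generate plausible harmonic sequences.

```python
# A toy context-free grammar for chord progressions: nonterminals
# (functional categories) expand recursively into chord symbols,
# mirroring how linguistic grammars expand phrases into words.

import random

GRAMMAR = {
    "phrase":      [["tonic", "dominant", "tonic"]],
    "tonic":       [["I"], ["I", "vi"]],
    "dominant":    [["V"], ["subdominant", "V"], ["V/V", "V"]],
    "subdominant": [["IV"], ["ii"]],
}

def expand(symbol):
    """Recursively expand a symbol into a flat sequence of chord labels."""
    if symbol not in GRAMMAR:  # terminal: an actual chord symbol
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [chord for part in production for chord in expand(part)]

print(expand("phrase"))  # e.g. ['I', 'ii', 'V', 'I']
```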

13.1.6 Conclusion

So, returning to our original question of ‘why did our musical capacities develop in the first place?’, it seems clear that many of our musical capacities may in fact have originally evolved for non-musical purposes.

Does this mean that musicality has never shaped evolution in its own way? Not necessarily. It is possible that musicality originally evolved for non-musical reasons, but subsequently became evolutionarily valuable in its own right, and hence was finessed by further evolutionary processes (Patel, 2021; Savage et al., 2020). The question then is: why might musicality have become evolutionarily valuable? This is the topic of the next section.

13.2 What are music’s adaptive functions?

13.2.1 The ‘auditory cheesecake hypothesis’

One possibility is that music conveys no survival or reproductive benefits, but rather tickles pleasure centres in the brain that originally evolved for other reasons, much as various recreational drugs do. This hypothesis was most famously articulated by Steven Pinker, who wrote: “I suspect that music is auditory cheesecake, an exquisite confection crafted to tickle the sensitive spots of at least six of our mental faculties” (Pinker, 1997).

What pre-existing mental faculties could music be tickling, and why would tickling these faculties be pleasurable? We will consider a couple of possibilities now.

13.2.1.1 Information processing

Music perception can be formulated as an information processing problem. Sound arrives in a very complex form, and the brain’s job is to process and understand this complex information. It needs to group different harmonics together into tones, and group multiple tones into melodies and chords; it needs to relate the pitches and rhythm it hears to prior cultural knowledge about tonality and metre; it needs to keep track of local thematic information, modulations, and emotional cues. One proposition is that music elicits pleasure precisely because of how it stimulates this complex information processing system.

How exactly would this information processing become rewarding? We don’t really know yet, though there are various hypotheses in the literature. One hypothesis is that rewards are released when the brain makes predictions about future events and these predictions turn out to be correct. A second hypothesis is that rewards are released when the incoming information is initially complex, but the brain successfully compresses it into a compact representation, perhaps by noticing certain repeating patterns or by relating the music to prior cultural knowledge. A third hypothesis is that rewards are released when the brain senses that it is successfully learning new information. Many of these ideas have some kind of intuitive rationalisation in terms of evolutionary adaptation, but there has been little progress in demonstrating that they explain much of musical pleasure in practice.
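To make the prediction and compression hypotheses concrete, the sketch below scores each melodic step by its surprisal under a hand-specified expectation model (the transition probabilities are entirely made-up illustrative numbers), and then shows that repetitive material compresses to fewer bytes than patternless material. Neither calculation is a model of the listening brain; both are just illustrative proxies.

```python
# Two toy proxies for the 'information processing' hypotheses.

import math
import random
import zlib

# (1) Prediction: surprisal of each melodic step under an illustrative
# hand-specified model of which scale degree tends to follow which.
MODEL = {
    "do":  {"re": 0.5, "mi": 0.3, "sol": 0.2},
    "re":  {"do": 0.4, "mi": 0.6},
    "mi":  {"re": 0.5, "fa": 0.3, "do": 0.2},
    "fa":  {"mi": 0.7, "sol": 0.3},
    "sol": {"do": 0.6, "fa": 0.4},
}

def surprisal(prev, nxt):
    """-log2 P(next | previous): large when the model is surprised."""
    return -math.log2(MODEL[prev].get(nxt, 0.01))  # small floor for unexpected moves

melody = ["do", "re", "mi", "fa", "sol", "do"]
for prev, nxt in zip(melody, melody[1:]):
    print(f"{prev} -> {nxt}: {surprisal(prev, nxt):.2f} bits")

# (2) Compression: repetitive material compresses to fewer bytes than
# patternless material drawn from the same pitch range.
random.seed(0)
repetitive = bytes([60, 62, 64, 62] * 16)
patternless = bytes(random.choices(range(60, 72), k=64))
print(len(zlib.compress(repetitive)), len(zlib.compress(patternless)))
```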

13.2.1.2 Emotion processing

Another potential aspect of musical pleasure concerns music’s ability to communicate various emotions. It’s clear to all of us that music can indeed communicate a wide variety of emotions, ranging from ecstasy to tragedy. It’s not surprising that listening to happy music should elicit pleasure; what’s more surprising, though, is that listening to sad music can elicit just as much pleasure. Why is this? We will outline various possibilities here; see Eerola et al. (2018) for an in-depth review.

One possibility suggested by recent work is that listening to emotional music is a bit like listening to someone tell us an emotional story about their lives. From a survival perspective, it makes a lot of sense to pay attention to other people’s emotional stories, even when these stories are sad: it provides an opportunity to learn from their experiences without having to go through the same negative life experiences, and it provides an opportunity to develop our ability to see from other people’s perspectives (for relevant arguments in the context of literature, see Kidd & Castano, 2013; Mar & Oatley, 2008). This hypothesis may be termed the simulation hypothesis. Moreover, if the music induces the listener to feel compassion towards the musical source, this feeling of compassion may itself be pleasurable; compassion is a direct precursor of altruism, and humans have evolved to experience pleasure upon performing altruistic acts, presumably in part due to the phenomenon of kin selection (helping genetically related individuals helps to propagate one’s shared genes) and in part due to the phenomenon of reciprocal altruism (if I help you, one day you will help me in return). This hypothesis has been termed pleasurable compassion theory (Huron & Vuoskoski, 2020).
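For readers who want the formal logic behind kin selection, it is commonly summarised by Hamilton’s rule, a standard result in evolutionary biology (not specific to the sources cited above): an altruistic act is favoured by selection when

\[
rB > C
\]

where \(r\) is the genetic relatedness between helper and recipient, \(B\) is the reproductive benefit to the recipient, and \(C\) is the reproductive cost to the helper. Helping a full sibling (\(r = 0.5\)) is thus favoured whenever the benefit to them exceeds twice the cost to oneself.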

Other work has suggested that this imagined compassionate relationship may alternatively operate in the reverse direction. In particular, it has been suggested that, for sad listeners, sad music represents a virtual person with congruent emotions, and that the presence of this virtual person has a comforting and consoling effect (Lee et al., 2013). This may be termed the social surrogacy hypothesis. A related possibility is that sad music distracts the sad listener from their own emotions, and this distraction ultimately improves the listener’s mood (Drake & Winner, 2012).

13.2.1.3 Problems with the ‘cheesecake’ hypothesis

Above we explored a couple of ways in which the auditory cheesecake hypothesis could work in practice, with music eliciting pleasure despite not having any intrinsic adaptive function. We should now acknowledge two limitations of the hypothesis.

The first is the fact that humans invest a lot of resources (such as time and money) in music. Evolution tends to prune away behaviours that bring costs without benefits, which makes it less likely that music would have survived and become so widespread if it didn’t serve some kind of adaptive function.

A second limitation concerns ‘groove’: the sense in which music often elicits a deep impulse to entrain body movement to its beat, with this entrainment often feeling deeply pleasurable. Entrainment doesn’t seem to have an adaptive function in non-musical contexts, so it is difficult to explain why we would enjoy it so much.

These limitations suggest that we should entertain the possibility that music does after all have some kind of adaptive function. We’ll now discuss various ways in which this could work.

13.2.2 Sexual selection

Darwin (1871) promoted the hypothesis that human musicality was shaped by sexual selection, similar to the evolution of birdsong. He claimed that early humans competed to attract mates by virtue of their musical performances, producing a selection bias towards improved musical skills, and hence promoting the evolution of musicality. This hypothesis has some surface plausibility, but is now considered to be problematic (e.g. Mehr et al., 2020). The key issue is that sexual selection tends to create sexual dimorphism, where the relevant trait is amplified most in the sex that invests least in reproduction. This is why male birds tend to have more flamboyant plumage, for example. In contrast, we don’t see strong sexual dimorphism in human musicality.

13.2.3 Social bonding

A second hypothesis is that music’s key evolutionary function was to induce social bonding (e.g. Savage et al., 2020). Various scientific studies have confirmed that music-making seems to be an effective way of enhancing social bonding within a group. It’s obvious that social bonding has positive adaptive implications: it helps to form stable communities of individuals that can help each other to hunt for food and deter predators. However, we should ask why we need music to promote social bonding in the first place: if social bonding is so useful, why didn’t we simply evolve a propensity for social bonding that didn’t depend on time-consuming music-making?

The answer may be that music-making provides a particularly good way to develop social relationships. It allows you to practise complex collaborative behaviours without risk of dangerous consequences. It allows you to develop a relationship with many individuals at the same time, in a way that language cannot. It has a ‘floating intentionality’ that allows many people to participate at the same time and feel like they are communicating something together, even if they don’t agree much on specific things (Cross, 2001). Put another way, a communal musical experience can accommodate many people with diverse opinions, even if they wouldn’t ordinarily get on well in conversation.

Note that the social-bonding hypothesis provides a good explanation for musical ‘groove’, where people enjoy synchronising their physical movements to the musical beat: humans evolved to enjoy musical synchrony because it promoted group musical activities, and hence social bonding.

13.2.4 Credible signalling

A third hypothesis is that music-making evolved as a credible signal (e.g. Mehr et al., 2020). A credible signal is a signal that is hard to fake, and therefore provides compelling evidence for a given state of affairs. For example, the building of massive architectural structures such as the pyramids historically provided a credible signal that a civilisation was technologically advanced and had the capacity to coordinate impressively large workforces.

Along these lines, we might suppose that performing a large-scale piece of collaborative music or dance acts as a credible signal of a group’s strength, size, and ability to cooperate. A well-known example is the Maori ‘Haka’ dance, which is traditionally associated with the battle preparations of Maori warriors, and is often performed by New Zealand sports teams before international matches. Note that, like the social-bonding hypothesis, the credible-signalling hypothesis provides a reasonable explanation of musical ‘groove’.

A Maori ‘Haka’ dance.


A second potential form of credible signalling concerns parental attention. The idea is that singing to a child is a credible signal that you’re paying attention to them, because it guarantees that you’re in close proximity to the child and that you can’t be using your voice for other things (such as talking).

13.2.5 Conclusions

We began this chapter by asking why our musical capacities developed in the first place. In all the cases we considered, the answer seemed to be exaptation: in other words, the musical capacities seem to derive directly from cognitive capacities that likely evolved for other reasons. However, it seems plausible that these same capacities may have subsequently been shaped by evolutionary functions of music-making.

We then asked a second question: what might these subsequent evolutionary functions be? We discussed three potential answers. The first was the auditory cheesecake hypothesis, which claims that music experienced no special evolutionary pressures, and that people pursue music because it tickles their information-processing and emotion-processing capacities. The second claimed that humans evolved to produce music as a mechanism for inducing social bonding. The third claimed that humans evolved to produce music as a credible-signalling mechanism.

It’s important to emphasise that many of these hypotheses are still rather speculative, and more work is needed to validate them. There are also still considerable differences of opinion in the literature (e.g. Savage et al., 2020; Mehr et al., 2020). Unfortunately, the fact that music leaves essentially no fossil record makes it difficult to imagine ever achieving a conclusive end to this debate. Nonetheless, these evolutionary questions are very important to think about, because they constitute the foundations of almost all of music psychology.

References

Chomsky, N. (1957). Syntactic structures. Mouton Publishers.

Cross, I. (2001). Music, cognition, culture, and evolution. Annals of the New York Academy of Sciences, 930, 28–42. https://doi.org/10.1111/j.1749-6632.2001.tb05723.x

Darwin, C. (1871). The descent of man and selection in relation to sex. John Murray.

Drake, J. E., & Winner, E. (2012). Confronting sadness through art-making: Distraction is more beneficial than venting. Psychology of Aesthetics, Creativity, and the Arts, 6(3), 255–261. https://doi.org/10.1037/a0026909

Eerola, T., Vuoskoski, J. K., Peltola, H.-R., Putkinen, V., & Schäfer, K. (2018). An integrative review of the enjoyment of sadness associated with music. Physics of Life Reviews, 25, 100–121. https://doi.org/10.1016/j.plrev.2017.11.016

Huron, D., & Vuoskoski, J. K. (2020). On the enjoyment of sad music: Pleasurable compassion theory and the role of trait empathy. Frontiers in Psychology, 11, 1060. https://doi.org/10.3389/fpsyg.2020.01060

Kidd, D. C., & Castano, E. (2013). Reading literary fiction improves theory of mind. Science, 342(6156), 377–380. https://doi.org/10.1126/science.1239918

Lee, C. J., Andrade, E. B., & Palmer, S. E. (2013). Interpersonal relationships and preferences for mood-congruency in aesthetic experiences. The Journal of Consumer Research, 40(2), 382–391. https://doi.org/10.1086/670609

Mar, R. A., & Oatley, K. (2008). The function of fiction is the abstraction and simulation of social experience. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 3(3), 173–192. https://doi.org/10.1111/j.1745-6924.2008.00073.x

Mehr, S. A., Krasnow, M. M., Bryant, G. A., & Hagen, E. H. (2020). Origins of music in credible signaling. Behavioral and Brain Sciences, 44, e60. https://doi.org/10.1017/S0140525X20000345

Patel, A. D. (2021). Vocal learning as a preadaptation for the evolution of human beat perception and synchronization. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 376(1835), 20200326. https://doi.org/10.1098/rstb.2020.0326

Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The action simulation for auditory prediction (asap) hypothesis. Frontiers in Systems Neuroscience, 8. https://doi.org/10.3389/fnsys.2014.00057

Patel, A. D., Iversen, J. R., Bregman, M. R., & Schulz, I. (2009). Experimental evidence for synchronization to a musical beat in a nonhuman animal. Current Biology, 19(10), 880. https://doi.org/10.1016/j.cub.2009.05.023

Pinker, S. (1997). How the mind works. W. W. Norton.

Rohrmeier, M. (2011). Towards a generative syntax of tonal harmony. Journal of Mathematics and Music, 5(1), 35–53.

Savage, P. E., Loui, P., Tarr, B., Schachner, A., Glowacki, L., Mithen, S., & Fitch, W. T. (2020). Music as a coevolved system for social bonding. Behavioral and Brain Sciences, 44, e59. https://doi.org/10.1017/S0140525X20000333