21 Causality

21.1 Correlation and causation

When two variables correlate significantly with each other, this is a sign that there is some kind of underlying causal connection. However, we must be very careful about how we identify the nature of that causal connection.

A classic example from music psychology is the association between musical training and intelligence. Many studies have noted that, on average, individuals with greater levels of musical training tend to have higher levels of general intelligence, as measured for example by IQ tests (Santos-Luiz et al., 2016; Schellenberg, 2011a, 2011b).

A popular explanation for this association is that musical training causes an individual to develop greater cognitive skills (Gardiner et al., 1996; Hetland, 2000; Moreno & Bidelman, 2014; Rauscher, 2002). After all, playing music is a complex intellectual activity that loads on basic cognitive capacities such as attention, memory, and hand-eye coordination. If these cognitive capacities are anything like muscles, then training them should enhance them, with positive consequences for general intelligence.

An alternative explanation is that high intelligence predisposes individuals to persist with musical training. One way this could happen is if intelligent people tend to be more successful in the early stages of music learning, which encourages them to continue with the training process.

A third potential explanation is that neither musical training nor general intelligence causally affects the other; instead, some unnoticed third variable drives both of them. For example, we know that different children grow up with different levels of household income. Perhaps higher household income has two relevant effects here: (a) it makes the family more likely to pay for music lessons, and (b) it makes the family invest more money in the child’s academic education, increasing their performance on intelligence tests as a result.
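The household-income scenario can be illustrated with a small simulation. The sketch below is purely hypothetical: the scales and coefficients (income raising both the amount of training and IQ scores, with no causal path from training to IQ) are made-up assumptions, and `simulate_confounded` and `pearson_r` are illustrative helpers rather than functions from any library. It uses only the Python standard library.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_confounded(n=10_000, seed=1):
    """Hypothetical model: income drives both training and IQ;
    training has NO causal effect on IQ."""
    rng = random.Random(seed)
    income, training, iq = [], [], []
    for _ in range(n):
        inc = rng.gauss(0, 1)                     # standardised household income
        train = 0.6 * inc + rng.gauss(0, 1)       # income -> amount of music training
        score = 100 + 9 * inc + rng.gauss(0, 12)  # income -> IQ (no training term at all)
        income.append(inc)
        training.append(train)
        iq.append(score)
    return income, training, iq

income, training, iq = simulate_confounded()
print(f"r(training, IQ) = {pearson_r(training, iq):.2f}")
```

With these assumed coefficients the population correlation between training and IQ works out to 5.4 / (√1.36 × 15) ≈ 0.31, even though training never appears in the formula for IQ: the whole association flows through income.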

These kinds of causal dilemmas are common whenever a scientist works with an observational dataset, that is, a dataset collected solely by observing and measuring a given phenomenon, without intervening in it. There are various statistical techniques (e.g. regression modelling, causal modelling) that can help in interrogating such datasets, but all rely on assumptions (for example, that all relevant confounding variables have been measured), and it is difficult to get a definitive answer out of them. It seems strange to imagine nowadays, but it took decades for health organisations to be convinced that smoking had a causal effect on lung cancer incidence, even though the correlation between smoking and lung cancer had been established long before.
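As a rough illustration of what regression modelling can and cannot do here, the sketch below generates hypothetical income-driven data like the household-income scenario described above, then "adjusts for" income by correlating the residuals of training and IQ after each has been regressed on income (a partial correlation). The coefficients and the helper names `pearson_r` and `residualise` are assumptions made for this example.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residualise(ys, zs):
    """Residuals of a simple least-squares regression of ys on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]

# Hypothetical data: income causes both training and IQ; training does not cause IQ.
rng = random.Random(2)
income = [rng.gauss(0, 1) for _ in range(10_000)]
training = [0.6 * inc + rng.gauss(0, 1) for inc in income]
iq = [100 + 9 * inc + rng.gauss(0, 12) for inc in income]

raw_r = pearson_r(training, iq)
# Remove the part of each variable that income predicts, then correlate what is left.
adjusted_r = pearson_r(residualise(training, income), residualise(iq, income))
print(f"raw r = {raw_r:.2f}, r adjusted for income = {adjusted_r:.2f}")
```

The adjustment removes the spurious association here only because the simulated world contains exactly one confounder and we measured it without error. Real observational datasets offer no guarantee of either, which is one reason such analyses rarely settle a causal question on their own.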

21.2 Experimental manipulations

In practice, the scientist’s most powerful tool for solving these kinds of causal problems tends to be manipulation. What happens here is that the scientist actively manipulates Variable A, and observes whether Variable B changes in response. If Variable B does change, then we have strong evidence that Variable A causally influences Variable B. If not, then we must think again.
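The logic of manipulation can be sketched with the household-income scenario from earlier in this chapter, except that the experimenter now decides who receives music lessons by a coin flip. This is a hypothetical simulation: `run_experiment` is an illustrative helper, `true_effect` is an assumed causal effect of lessons on IQ (in IQ points), and all other coefficients are made up.

```python
import random
from statistics import fmean

def run_experiment(true_effect, n=10_000, seed=3):
    """Hypothetical experiment: lessons are assigned by a coin flip, so household
    income can no longer influence who receives them."""
    rng = random.Random(seed)
    trained, untrained = [], []
    for _ in range(n):
        income = rng.gauss(0, 1)              # still affects IQ, as before
        gets_lessons = rng.random() < 0.5     # the manipulation ignores income entirely
        iq = 100 + 9 * income + rng.gauss(0, 12)
        if gets_lessons:
            iq += true_effect                 # only a genuine causal effect can appear here
            trained.append(iq)
        else:
            untrained.append(iq)
    return fmean(trained) - fmean(untrained)

print(f"no true effect:  difference = {run_experiment(0.0):.2f}")
print(f"5-point effect:  difference = {run_experiment(5.0):.2f}")
```

Because income no longer predicts who gets lessons, the group difference stays close to zero when lessons do nothing and close to `true_effect` when they do, up to sampling noise.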

21.2.1 Independent and dependent variables

When conducting an experiment with a manipulation, it is conventional to classify the variables into two categories: independent and dependent variables. Independent variables are variables that the experimenter manipulates (Variable A in the example above). Dependent variables are variables that the experimenter measures without manipulating them directly (Variable B in the example above).

21.2.2 Repeated measures and between-groups designs

Experimental manipulations generally fall into two categories: repeated-measures and between-groups. Let’s consider each in turn.

In a repeated-measures design, we have a collection of participants, and we wish to examine the impact of a manipulation on these participants. Here the term ‘manipulation’ can be interpreted broadly. It could mean doing something literally to the participants — for example giving them a cup of coffee — but it could also mean doing something to the experimental setup that the participant experiences, for example changing the volume of the auditory stimuli. The defining characteristic of a repeated-measures design is that we expose each individual participant to multiple levels of the manipulation. Depending on the experiment, these levels could mean different things:

  1. Before or after an intervention, for example before or after a cup of coffee;
  2. Different categories of a discrete independent variable, for example running, walking, or sitting;
  3. Different values of a continuous independent variable, for example 33% volume, 49% volume, or 52% volume.

Sometimes it is not practical for the same subject to experience multiple levels of the same independent variable. In this case we conduct a between-groups design, where each participant experiences only one level of the independent variable. For the experimental manipulation to be considered valid, it is essential that the assignment of participants to independent variable levels be randomised. In principle, this could be done by rolling a die for each participant and choosing the value of the independent variable on the basis of the die roll; in practice researchers tend to use random number generators instead. Other methods (e.g. each participant choosing their own condition) do not qualify as proper manipulations, because the values of the independent variable will be affected by unknown pre-existing differences between the participants, which may have their own causal associations with the dependent variable.
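Here is a minimal sketch of random assignment with a random number generator, assuming we want groups of (near-)equal size. Rather than drawing a condition independently for each participant, which can leave groups unequal by chance, it shuffles a balanced list of condition labels; both approaches count as random assignment. The function name `assign_conditions`, the participant IDs, and the conditions (borrowed from the running/walking/sitting example) are illustrative.

```python
import random

def assign_conditions(participant_ids, conditions, seed=None):
    """Randomly assign each participant to exactly one condition,
    keeping group sizes as equal as possible."""
    rng = random.Random(seed)
    n = len(participant_ids)
    # Cycle through the condition labels until there is one label per participant...
    labels = [conditions[i % len(conditions)] for i in range(n)]
    # ...then shuffle, so that which participant receives which label is random.
    rng.shuffle(labels)
    return dict(zip(participant_ids, labels))

participants = [f"P{i:02d}" for i in range(1, 13)]
assignment = assign_conditions(participants, ["running", "walking", "sitting"], seed=42)
print(assignment)
```

Crucially, the assignment depends only on the random number generator, never on anything about the participants themselves.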

It is possible to have multiple independent variables in the same experiment. Experiments with all repeated-measures variables or all between-groups variables are called repeated-measures and between-groups designs respectively; experiments with both repeated-measures and between-groups variables are called mixed designs.

In practice, repeated-measures designs tend to be considerably more powerful than between-groups designs. This is because repeated-measures designs are very good at accounting for individual differences between participants; even if one participant tends to score particularly low or particularly high on a dependent variable, this idiosyncrasy should apply equally across the different levels of the independent variable, so it can be controlled for when analysing the data. In contrast, in a between-groups design it is much harder to separate individual differences from the effects of the experimental manipulations; as a consequence, such designs can require many times more participants to achieve the same statistical reliability (see this blog post for an analysis). So, where possible, it is advisable to try and formulate studies as repeated-measures rather than between-groups designs.
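The power advantage can be illustrated with a hypothetical simulation in which each participant has a stable personal baseline (individual differences, SD 10) plus trial-to-trial noise (SD 3), and the manipulation adds 2 points. All of these numbers, and the helper `simulate_repeated_measures`, are assumptions for illustration. Comparing each participant with themselves removes the baseline, so the difference scores are far less variable than the raw scores a between-groups comparison has to work with. The last line converts this into the approximate ratio of total participants needed for the same standard error, using SE_between = 2·sd_raw/√N_between and SE_within = sd_diff/√N_within (which assumes equal group sizes and ignores carry-over effects).

```python
import random
from statistics import fmean, stdev

def simulate_repeated_measures(n=5_000, effect=2.0, seed=4):
    """Hypothetical data: a stable personal baseline (SD 10) plus trial noise (SD 3)."""
    rng = random.Random(seed)
    cond_a, cond_b = [], []
    for _ in range(n):
        baseline = rng.gauss(50, 10)          # individual differences between participants
        cond_a.append(baseline + rng.gauss(0, 3))
        cond_b.append(baseline + effect + rng.gauss(0, 3))
    return cond_a, cond_b

cond_a, cond_b = simulate_repeated_measures()
diffs = [b - a for a, b in zip(cond_a, cond_b)]  # each participant compared with themselves

sd_raw = stdev(cond_a)   # spread a between-groups comparison must contend with
sd_diff = stdev(diffs)   # spread left once individual baselines cancel out
# Ratio of total participants for equal standard error of the effect estimate.
ratio = 4 * sd_raw**2 / sd_diff**2

print(f"mean difference = {fmean(diffs):.2f}")
print(f"SD of raw scores = {sd_raw:.1f}, SD of difference scores = {sd_diff:.1f}")
print(f"between-groups needs roughly {ratio:.0f}x as many participants")
```

The exact ratio depends entirely on how large individual differences are relative to trial noise; with the assumed values it is around 24 (4 × 109 / 18 in the population), but a study with small individual differences would see a much smaller gain.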

One disadvantage of repeated-measures designs, though, is that they can be susceptible to carry-over effects. A carry-over effect is one where the identity of preceding conditions influences scores in the current condition. For example, suppose we are studying the effect of physical exercise on music listening, and we have three values of the independent variable: running, walking, and sitting. The effects of physical exercise on heart rate and body temperature can be fairly long-lasting. If we have the participant run for five minutes, then sit for five minutes, then walk for five minutes, their heart rate in the ‘sit’ condition is likely to be inflated by the fact that they were running in the previous condition. It’s therefore essential in repeated-measures designs to ensure that the order of conditions is balanced between participants, rather than being the same for all participants. One way of achieving this is simply to randomise the order of conditions across participants. There also exist more sophisticated approaches, for example Latin square designs, which ensure that each condition appears equally often in each position of the sequence, so that order is exactly balanced between participants rather than just balanced on average. These only become important for very small participant groups, and we won’t consider them here.
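Randomising the order of conditions for each participant is straightforward with a random number generator. The sketch below assumes the three exercise conditions from the example above; `random_orders` is a hypothetical helper name. Tallying which condition came first illustrates that simple randomisation balances order only on average, not exactly.

```python
import random
from collections import Counter

def random_orders(n_participants, conditions, seed=None):
    """Give each participant an independently shuffled order of all conditions."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_participants):
        order = list(conditions)   # copy so the original list is left untouched
        rng.shuffle(order)
        orders.append(order)
    return orders

orders = random_orders(60, ["running", "walking", "sitting"], seed=7)
# How often each condition came first: near 20 each, but not guaranteed to be exactly 20.
print(Counter(order[0] for order in orders))
```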

21.2.3 Case study: Schellenberg (2004)

Schellenberg (2004) addressed the aforementioned question of whether musical training causes improvements in general intelligence. The study used a sample of 144 six-year-old children, with the type of lessons manipulated between groups. The children were randomly assigned to one of four conditions: 36 weeks of extracurricular keyboard lessons, voice lessons, drama lessons, or no lessons. The researchers administered a battery of cognitive tests to the children before and after the training period. On average, children in all groups increased in IQ over this period, as would be expected due to maturation and education. However, children who took music lessons showed a greater increase in IQ (~7 points) than children in the drama or no-lessons conditions (~4 points). The researchers concluded that musical training does indeed improve general intelligence (note, though, that this conclusion is controversial: see Sala & Gobet (2020) for a recent meta-analysis disputing this and related studies).

21.3 Conclusions

Experimental manipulations are a very valuable tool for identifying causal relationships. Unfortunately, however, they are not always practical to conduct. Some kinds of manipulations take too long to achieve in the context of a particular study, or are too expensive, or raise problematic ethical issues. In these cases observational studies may be the only way forward. Fortunately, there are still many interesting things that we can learn from such studies with the right kinds of statistical methods. We’ll explore some of these in subsequent chapters.

References

Gardiner, M. F., Fox, A., Knowles, F., & Jeffrey, D. (1996). Learning improved by arts training. Nature, 381(6580), 284. https://doi.org/10.1038/381284a0

Hetland, L. (2000). Learning to make music enhances spatial reasoning. Journal of Aesthetic Education, 34(3/4), 179–238. https://doi.org/10.2307/3333643

Moreno, S., & Bidelman, G. M. (2014). Examining neural plasticity and cognitive benefit through the unique lens of musical training. Hearing Research, 308, 84–97. https://doi.org/10.1016/j.heares.2013.09.012

Rauscher, F. H. (2002). Mozart and the mind: Factual and fictional effects of musical enrichment. In J. Aronson (Ed.), Improving academic achievement (pp. 267–278). Academic Press. https://doi.org/10.1016/B978-012064455-1/50016-6

Sala, G., & Gobet, F. (2020). Cognitive and academic benefits of music training with children: A multilevel meta-analysis. Memory & Cognition, 48(8), 1429–1441. https://doi.org/10.3758/s13421-020-01060-2

Santos-Luiz, C. dos, Mónico, L. S. M., Almeida, L. S., & Coimbra, D. (2016). Exploring the long-term associations between adolescents’ music training and academic achievement. Musicae Scientiae, 20(4), 512–527.

Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science, 15(8), 511–514. https://doi.org/10.1111/j.0956-7976.2004.00711.x

Schellenberg, E. G. (2011a). Examining the association between music lessons and intelligence. British Journal of Psychology, 102(3), 283–302. https://doi.org/10.1111/j.2044-8295.2010.02000.x

Schellenberg, E. G. (2011b). Music lessons, emotional intelligence, and IQ. Music Perception, 29(2), 185–194. https://doi.org/10.1525/mp.2011.29.2.185