Language, Music, and the Brain: A Mysterious Relationship

Michael A. Arbib

Print publication date: 2013

Print ISBN-13: 9780262018104

Published to MIT Press Scholarship Online: May 2015

DOI: 10.7551/mitpress/9780262018104.001.0001

PRINTED FROM MIT PRESS SCHOLARSHIP ONLINE (www.mitpress.universitypressscholarship.com). (c) Copyright The MIT Press, 2021. All Rights Reserved. An individual user may print out a PDF of a single chapter of a monograph in MITSO for personal use.

Neural Correlates of Music Perception

6 Neural Correlates of Music Perception
Language, Music, and the Brain

Stefan Koelsch

The MIT Press

Abstract and Keywords

This chapter provides an overview of neural correlates of music-syntactic and music-semantic processing, as well as of music-evoked emotions. These three aspects of music processing are often intertwined. For example, a music-syntactically irregular musical event not only evokes processes of syntactic analysis in the perceiver, but might also evoke processing of meaning, an emotional response, decoding of the producer’s intentions, etc. In addition, it becomes clear that the neural correlates of these processes show a strong overlap with the processes engaged during the perception of language. These overlaps indicate that “music” and “language” are different aspects, or two poles, of a single continuous domain: the music-language continuum. Published in the Strüngmann Forum Reports Series.

Keywords:   neuroscience of music, music perception, music–language continuum, music-syntactic processing, music-semantic processing, music-evoked emotions

Musical Syntax

The regularity-based arrangement of musical elements into sequences is referred to here as musical syntax (see also Riemann 1877; Patel 2003; Koelsch 2005; Koelsch and Siebel 2005). It is not useful, however, to conceptualize musical syntax as a unitary concept because there are different categories of syntactic organization. Such syntactic organization can emerge from regularities based on local dependencies, from regularities involving long-distance dependencies, from regularities established on a short-term basis that do not require long-term knowledge, and from regularities that can only be represented in a long-term memory format, etc. Therefore, different cognitive processes have to be considered when thinking about (different categories of) musical syntax. In this section, I begin with a discussion of the cognitive processes that can be involved in the processing of musical syntax and then describe neural correlates of some of these processes.

Cognitive Processes

What are the cognitive (sub)processes involved in processing different categories of musical syntax? Below I briefly enumerate such processes, mainly referring to tonal music (other kinds of music do not necessarily involve all of these features; see Fritz et al. 2009). The ordering of the enumerated processes does not reflect a temporal order of music-syntactic processing; the processes may partly happen in parallel:

  1. Element extraction: Elements such as tones and chords (or phonemes and words in language) are extracted from the continuous stream of auditory information. In homophonic and polyphonic music, representation of a current melodic and harmonic event is established (with the harmonic event coloring the melodic event). With regard to the temporal structure, a tactus (or “beat”) is extracted. (The tactus is represented by the most salient time periodicity with which musical elements occur, corresponding to the rate at which one might clap or tap to the music.)

  2. Knowledge-free structuring: Representation of structural organization is established online (on a moment-to-moment basis) without obligatory application of long-term knowledge. For example, in auditory oddball paradigms, with sequences such as …..–…–……–….–.. etc., it is possible to establish a representation where “.” is a high-probability standard and “–” is a low-probability deviant without any prior knowledge of any regularity underlying the construction of the sequence. Likewise, listening to a musical passage in one single key, an individual can establish a representation of the tones of a key and detect out-of-key tones (thus also enabling the determination of key membership) based on information stored in the auditory sensory memory: In-key tones become standard stimuli, such that any out-of-key tone (e.g., any black piano key producing a tone within a sequence of C major) represents a deviant stimulus (“auditory oddball”). These auditory oddballs elicit a brain-electric response referred to as the mismatch negativity (MMN). The processes that underlie the establishment of models representing such regularities include grouping and Gestalt principles. With regard to the melodic structure of a piece, grouping is required to assemble single tones into a melodic contour. In terms of temporal structure, grouping serves to extract the meter of a piece, as well as rhythmic patterns. For a discussion of the establishment of a tonal “hierarchy of stability” of tones and chords based on Gestalt principles, see Koelsch (2012).

  3. Musical expectancy formation: The online models described in point 2 are established on a moment-to-moment basis without long-term knowledge of rules or regularities. By contrast, music-syntactic processing may also involve representations of regularities that are stored in a long-term memory format (e.g., probabilities for the transition of musical elements such as chord functions). Such representations can be modeled computationally as fragment models: n-gram, Markov, chunking, or PARSER models (for details, see Rohrmeier and Koelsch 2012).

    The important difference between knowledge-free structuring and musical expectancy is that the former is based on psychoacoustic principles and information stored in the auditory sensory memory, whereas the latter is based on long-term memory (this does not exclude that during the listening to a piece, experience based on knowledge-free structuring is immediately memorized and used throughout the musical piece). With regard to tonal music, Rohrmeier (2005) found, in a statistical analysis of the frequencies of diatonic chord progressions occurring in Bach chorales, that the supertonic was five times more likely to follow the subdominant than to precede it (Rohrmeier and Cross 2008). Such statistical properties of the probabilities for the transitions of chord functions are learned implicitly during the listening experience (Tillmann 2005; Jonaitis and Saffran 2009) and stored in a long-term memory format.

    Importantly, with regard to major-minor tonal music, the interval structure of a chord function (e.g., whether a chord is presented in root position or as a sixth [first inversion] or six-four [second inversion] chord) determines the statistical probabilities of chord transitions. For example, six-four chords often have dominant character: a dominant that does not occur in root position is unlikely to indicate the arrival of a cadence; a tonic presented as a sixth chord is unlikely to be the final chord of a chord sequence; the same holds for a tonic in root position with the third in the top voice (for details, see Caplin 2004). In this sense, a chord function parallels a lexeme, a chord in root position parallels a lemma, and the different inversions of a chord parallel word inflections.

    On the metrical and tonal grid that is established due to knowledge-free structuring, musical expectancies for subsequent structural elements are formed on the basis of implicit knowledge. Note that such musical expectancies are different from the expectancies (or predictions) formed as a result of knowledge-free structuring, because the latter are formed on the basis of acoustic similarity, acoustic regularity, and Gestalt principles; long-term memory representations of statistical probabilities are not required. Importantly, predictions based on the processes of knowledge-free structuring and musical expectancy can only represent local dependencies, not long-distance dependencies, which are discussed next.

  4. Structure building: Tonal music is hierarchically organized (for an example of music that does not have a hierarchical structure, see Fritz et al. 2009). Such hierarchical organization gives rise to building and representing structures that involve long-distance dependencies on a phrase-structure level (i.e., structures based on context-free grammar). Such hierarchical structures may involve recursion, and they can best be represented graphically as tree structures. The processing and representation of such structures requires (auditory) working memory. Two approaches have so far developed systematic theoretical accounts of hierarchical structures of music: Lerdahl’s combination of the generative theory of tonal music and his tonal pitch space theory (TPS; see Lerdahl, this volume), and Rohrmeier’s generative syntax model (GSM; Rohrmeier 2011). To date, no neurophysiological investigation has tested whether individuals perceive music cognitively according to tree structures. Similarly, behavioral studies on this topic are extremely scarce (Cook 1987a, b; Bigand et al. 1996; Lerdahl and Krumhansl 2007). Whether tree structures have a psychological reality in the cognition of listeners of (major-minor tonal) music remains an open question.

  5. Structural reanalysis and revision: During the syntactic processing of a sequence of elements, perceivers often tend to structure elements in the most likely way, for example, with regard to language, based on thematic role assignment, minimal attachment, and late closure (for details, see Sturt et al. 1999). However, a listener may also recognize that an established hierarchical model needs to be revised; that is, with regard to the representation of a hierarchical organization using a tree structure, the headedness of branches, the assignment of elements to branches, etc., may have to be modified. To give an example with regard to language comprehension, the beginning of “garden-path” sentences (i.e., sentences which have a different syntactic structure than initially expected) suggests a certain hierarchical structure, which turns out to be wrong: “He painted the wall with cracks” (for further examples, see Townsend and Bever 2001). Note that while building a hierarchical structure, there is always ambiguity as to which branch a new element might belong: whether a new element belongs to a left- or a right-branching part of a tree, whether the functional assignment of a node to which the new element belongs is correct, etc. However, once a branch has been identified with reasonable certainty, it is represented as a branch, not as a single element. If this branch (or at least one node of this branch), however, subsequently turns out not to fit into the overall structure (e.g., because previously assumed dependencies break), the structure of the phrase, or sentence, must be reanalyzed and revised.

  6. Syntactic integration: As mentioned above, several syntactic features constitute the structure of a sequence. In tonal music, these include melody, harmony, and meter. These features must be integrated by a listener to establish a coherent representation of the structure, and thus to understand the structure. For example, a sequence of chord functions is only “syntactically correct” when played on a certain metric grid; when played with a different meter, or with a different rhythm, the same sequence might sound less correct (or even incorrect), because, for example, the final tonic no longer occurs on a heavy beat.

    In many listeners, the simultaneous operation of melody, meter, rhythm, harmony, intensity, instrumentation, and texture evokes feelings of pleasure. After the closure of a cadence, and particularly when the closure resolves a previous breach of expectancy and/or previous dissonances, the integrated representation of the (simultaneous) operation of all syntactic features is perceived as particularly pleasurable and relaxing.

  7. Large-scale structuring: The cognitive processes described above are concerned with the processing of phrase structure (i.e., processing of phrases that close with a cadence). Musical pieces, however, usually consist of numerous phrases, and thus have large-scale structures: verse and chorus in a song, the A-B-A(’) form of a Minuet, the parts of a sonata form, etc. When listening to music from a familiar musical style with such organization, these structures can be recognized, often with the help of typical forms and transitions. Such recognition is the basis for establishing a representation of the large-scale structuring of a piece.
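The idea in point 3 that musical expectancy rests on stored transition probabilities between chord functions can be made concrete with a small first-order Markov (bigram) model. The following is only an illustrative sketch, not the models used in the studies cited above: the toy corpus is invented for this example (it is not drawn from the Bach chorale analysis), the function names are hypothetical, and no smoothing is applied, so unseen transitions simply receive probability 0.

```python
from collections import Counter, defaultdict


def train_bigram_model(sequences):
    """Estimate P(next chord function | current chord function) by counting."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair every element with its immediate successor (local dependency only).
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {nxt: n / total for nxt, n in followers.items()}
    return model


def transition_probability(model, current, nxt):
    """Return the learned probability, or 0.0 for a transition never observed."""
    return model.get(current, {}).get(nxt, 0.0)


# Hypothetical toy corpus of chord-function sequences (invented for illustration).
corpus = [
    ["I", "IV", "ii", "V", "I"],
    ["I", "IV", "V", "I"],
    ["I", "vi", "IV", "ii", "V", "I"],
    ["I", "ii", "V", "I"],
]

model = train_bigram_model(corpus)
p_regular = transition_probability(model, "V", "I")      # dominant -> tonic
p_irregular = transition_probability(model, "V", "V/V")  # dominant -> secondary dominant
```

In this toy corpus the dominant is always followed by the tonic, so `p_regular` is 1.0 and `p_irregular` is 0.0. Because the model conditions only on the immediately preceding element, it captures local dependencies and cannot represent the long-distance dependencies described in point 4.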

Neural Correlates of Music-Syntactic Processing

To date, neurophysiological studies on music-syntactic processing have utilized the classical theory of harmony, according to which chord functions are arranged within harmonic sequences according to certain regularities. Here I describe neural correlates of music-syntactic processing that have been obtained through such studies. These studies show a remarkable similarity between neural correlates of music- and language-syntactic processing.

In major-minor tonal music, chord functions (Figure 6.1a) are arranged within harmonic sequences according to certain regularities. Chords built on the tones of a scale fulfill different functions. A chord built on the first scale tone is called the tonic (I); the chord on the fifth scale tone is the dominant (V). Normally, a chord constructed on the second scale tone of a major scale is a minor chord; however, when this chord is changed to be major, it can be interpreted as the dominant of the dominant (V/V or secondary dominant) (see square brackets in Figure 6.1a). One example of a regularity-based arrangement of chord functions is that the dominant is followed by the tonic (V-I), particularly at a possible end of a chord sequence; a progression from the dominant to the dominant of the dominant (V-V/V) is less regular (and seen as unacceptable as a marker of the end of a harmonic sequence). The left sequence in Figure 6.1b ends on a regular dominant-tonic (V-I) progression; in the right sequence of Figure 6.1b, the final chord is the dominant of the dominant (arrow). (Sound examples of the sequences can be downloaded from www.stefan-koelsch.de.)

Figure 6.1 Neural correlates of music-syntactic processing. (a) Chord functions are created from the chords built on the tones of a scale. (b) The left sequence ends on a regular dominant-tonic (V-I) progression. The final chord in the right-hand sequence (see arrow) is a dominant of the dominant; this chord function is irregular, especially at the end of a harmonic progression (sound examples are available at www.stefankoelsch.de/TC_DD). (c) Electric brain potentials (in μV) elicited by the final chords of the two sequence types presented in (b) (recorded from a right frontal electrode site, F4, from twelve subjects). Both sequence types were presented in pseudorandom order equiprobably in all twelve major keys. Brain responses to irregular chords clearly differ from those to regular chords. The first difference between the two black waveforms is maximal at about 0.2 s after the onset of the chord (this is best seen in the red difference wave, which represents regular, subtracted from irregular chords) and has a right frontal preponderance. This early right anterior negativity (ERAN) is usually followed by a later negativity, the N5. (d) With MEG, the magnetic equivalent of the ERAN was localized in the inferior frontolateral cortex. (e) fMRI data obtained from twenty subjects using a similar chord sequence paradigm. The statistical parametric maps show areas that are more strongly activated during the processing of irregular than during the processing of regular chords. Reprinted with permission from Koelsch (2005).

Figure 6.1c shows electric brain potentials elicited by the final chords of the two sequence types presented in Figure 6.1b recorded from a right frontal electrode site (F4) from twelve subjects (for details on how to obtain such potentials, see Koelsch 2012). Both sequence types were presented in pseudorandom order equiprobably in all twelve major keys. Brain responses to irregular chords clearly differ from those to regular chords. The first difference between the two black waveforms is maximal at about 0.2 s after the onset of the chord (this is best seen in the red difference wave, which represents regular, subtracted from irregular chords) and has a right frontal preponderance. This early right anterior negativity (ERAN) is usually followed by a later negativity, the N5 potential (short arrow; for details about the polarity of evoked potentials, see Koelsch 2012).
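The difference wave mentioned above is a pointwise subtraction of the averaged response to regular chords from the averaged response to irregular chords. A minimal sketch of that arithmetic follows; the amplitude values and the 10 Hz sampling rate are made up purely for illustration (they are not the recorded data), and the function names are hypothetical.

```python
def difference_wave(irregular, regular):
    """Subtract the regular-chord average from the irregular-chord average, sample by sample."""
    if len(irregular) != len(regular):
        raise ValueError("Both averaged waveforms must have the same number of samples")
    return [i - r for i, r in zip(irregular, regular)]


def peak_negativity(wave, sampling_rate_hz):
    """Return (latency in seconds, amplitude) of the most negative sample."""
    idx = min(range(len(wave)), key=lambda k: wave[k])
    return idx / sampling_rate_hz, wave[idx]


# Made-up averaged potentials in microvolts, one sample every 0.1 s from chord onset.
regular = [0.0, -0.5, -1.0, -0.5, 0.0, 0.5]
irregular = [0.0, -1.0, -3.0, -1.5, -0.5, 0.0]

diff = difference_wave(irregular, regular)
latency_s, amplitude_uv = peak_negativity(diff, sampling_rate_hz=10)
```

With these invented values the difference wave is most negative at 0.2 s after chord onset, mirroring the latency described for the ERAN. Real ERP analyses work on baseline-corrected, artifact-free averages over many trials and subjects, which this sketch leaves out.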

With magnetoencephalography (MEG), the magnetic equivalent of the ERAN was localized in the inferior frontolateral cortex: Figure 6.1d shows single-subject dipole solutions (indicated by striped disks), and the grand average of these source reconstructions (white dipoles); the grand average data show that sources of the ERAN are located bilaterally in inferior Brodmann area (BA) 44 (see also Maess et al. 2001); the dipole strength was nominally stronger in the right hemisphere, but this hemispheric difference was not statistically significant. In the left hemisphere, this region is usually referred to as part of “Broca’s area,” although it is presumed that music-syntactic processing also receives additional contributions from the ventrolateral premotor cortex and the anterior superior temporal gyrus (i.e., the planum polare) (discussed below; see also Koelsch 2006).

Results of the MEG study (Koelsch 2000; Maess et al. 2001) were supported both by functional neuroimaging studies using chord sequence paradigms reminiscent of that shown in Figure 6.1b (Koelsch et al. 2002a, 2005; Tillmann et al. 2006) and by studies that used “real,” multipart music (Janata et al. 2002b) and melodies. These studies showed activations of inferior frontolateral cortex at coordinates highly similar to those reported in the MEG study (Figure 6.1e). Particularly the fMRI study by Koelsch et al. (2005; Janata et al. 2002a) supported the assumption of neural generators of the ERAN in inferior BA 44. In addition, the ERAN has been shown to be larger in musicians than in nonmusicians (Koelsch et al. 2002b), and, in the fMRI study by Koelsch et al. (2005), effects of musical training were correlated with activations of inferior BA 44, in both adults and children.

Moreover, data recorded from intracranial grid electrodes from patients with epilepsy identified two ERAN sources: one in the inferior frontolateral cortex and one in the superior temporal gyrus (Sammler 2008). The latter was inconsistently located in the anterior, middle, and posterior superior temporal gyrus.

Finally, it is important to note that inferior BA 44 (part of Broca’s area) is involved in the processing of syntactic information during language perception (e.g., Friederici 2002), in the hierarchical processing of action sequences (e.g., Koechlin and Jubault 2006), and in the processing of hierarchically organized mathematical formulas (Friedrich and Friederici 2009). Thus, Broca’s area appears to play a role in the hierarchical processing of sequences that are arranged according to complex regularities. On a more abstract level, it is highly likely that Broca’s area is involved in the processing of hierarchically organized sequences in general, be they musical, linguistic, action-related, or mathematical.

In contrast, the processing of musical structure with finite-state complexity does not appear to require BA 44. Instead, it appears to receive main contributions from the ventral premotor cortex (PMCv). Activations of PMCv have been reported in a variety of functional imaging studies on auditory processing—using musical stimuli, linguistic stimuli, auditory oddball paradigms, pitch discrimination tasks, and serial prediction tasks—which underlines the importance of these structures for the sequencing of structural information, the recognition of structure, and the prediction of sequential information (Janata and Grafton 2003; Schubotz 2007) (Figure 6.1d). With regard to language, Friederici (2004) reports that activation foci of functional neuroimaging studies on the processing of long-distance hierarchies and transformations are located in the posterior inferior frontal gyrus (with the mean of the coordinates reported in that article being located in the inferior pars opercularis), whereas activation foci of functional neuroimaging studies on the processing of local structural violations are located in the PMCv (see also Friederici et al. 2006; Makuuchi et al. 2009; Opitz and Kotz 2011). Moreover, patients with a lesion in the PMCv show disruption of the processing of finite-state, but not phrase-structure, grammar (Opitz and Kotz 2011).

In terms of the cognitive processes involved in music-syntactic processing (see above), the ERAN elicited in the studies mentioned was probably due to a disruption of musical structure building as well as the violation of a local prediction based on the formation of musical expectancy. That is, it seems likely that, in the studies reported, processing of local and (hierarchically organized) long-distance dependencies elicited early negative potentials, and that the observed ERAN effect was a conglomerate of these potentials. The electrophysiological correlates of the formation of musical expectancy, on one hand, and the building of hierarchical structures, on the other, have not yet been separated. It seems likely, however, that the former may primarily involve the PMCv, whereas Broca’s area may be involved in the latter.

Interactions between the Syntactic Processing of Language and Music

The strongest evidence for shared neural resources in the syntactic processing of music and language stems from experiments that show interactions between both (Koelsch et al. 2005; Steinbeis and Koelsch 2008b; for behavioral studies see Slevc et al. 2009; Fedorenko et al. 2009; see also Patel, this volume).1 In these studies, chord sequences were presented simultaneously with visually presented sentences while participants were asked to focus on the language-syntactic information, and to ignore the music-syntactic information (Figure 6.2).

Using EEG and chord sequence paradigms reminiscent of those described in Figure 6.1b, two studies showed that the ERAN elicited by irregular chords interacts with the left anterior negativity (LAN), a component of an event-related potential (ERP) elicited by morphosyntactic violations during language perception. Using German sentences, Koelsch et al. (2005) and Steinbeis and Koelsch (2008b) showed that morphosyntactically irregular (gender disagreement) words elicited an LAN (compared to syntactically regular words; Figure 6.3a). In addition, the LAN was reduced when the irregular word was presented simultaneously with a music-syntactically irregular chord (compared to when the irregular word was presented with a regular chord, Figure 6.3b).

Figure 6.2 Examples of experimental stimuli used in the studies by Koelsch et al. (2005) and Steinbeis and Koelsch (2008b). (a) Examples of two chord sequences in C major, ending on a regular (upper row: the tonic) and an irregular chord (lower row: the irregular chord, a Neapolitan, is indicated by the arrow). (b) Examples of the three different sentence types (English translations of the German sentences used in the experiment). Onsets of chords (presented auditorily) and words (presented visually) were synchronous. Reprinted with permission from Steinbeis and Koelsch (2008b).

No such effects of music-syntactically irregular chords were observed for the N400 ERP elicited by words that were syntactically correct but had a low semantic cloze probability (e.g., “He sees the cold beer”), compared to words with a high semantic cloze probability (e.g., “He drinks the cold beer”)2 (Figure 6.3c). The final words of sentences with low semantic cloze probability elicited a larger N400 than words with high semantic cloze probability. This N400 effect was not influenced, however, by the syntactic regularity of chords; that is, the music-syntactic regularity of chords specifically affected the syntactic (not the semantic) processing of words (as indicated by the interaction with the LAN).

Figure 6.3 Total average of ERPs elicited by the stimuli shown in Figure 6.2. Participants ignored the musical stimulus, concentrated on the words, and, in 10% of the trials, answered whether the last sentence was (syntactically or semantically) correct or incorrect. (a) Compared to regular words, morphosyntactically irregular words elicit a LAN, best seen in the difference wave (thin line, indicated by the arrow). The LAN had a left anterior scalp distribution and was maximal at the electrode F5. All words were presented on regular chords. (b) LAN effects (difference waves) are shown for words presented on regular chords (thick line, identical to the thin difference wave in (a)) and irregular chords (dotted line). The data show that the morphosyntactic processing (as reflected in the LAN) is reduced when words have to be processed simultaneously with a syntactically irregular chord. (c) The analogous difference waves are shown for the conditions in which all words were syntactically correct, but in which ERPs elicited by words with high semantic cloze probability (e.g., “He drinks the cold beer”) were subtracted from ERPs elicited by words with low semantic cloze probability (e.g., “He sees the cold beer”). The solid line represents the condition in which words were presented on regular chords, the dotted line represents the condition in which words were presented on irregular chords. In both conditions, semantically irregular (low-cloze probability) words elicited an N400 effect. The N400 had a bilateral centroparietal scalp distribution and was maximal at the electrode PZ. Importantly, the N400 was not influenced by the syntactic irregularity of chords (both difference waves show the same N400 response). (d) ERP waves analogous to those shown in (b) are shown. Here, however, tones were presented (instead of chords) in an auditory oddball paradigm (tones presented at positions 1-4 were standard tones, and the tone at the fifth position was either a standard or a deviant tone, analogous to the chord sequences). As in the chord condition, morphosyntactically irregular words elicit a clear LAN effect (thick difference wave). In contrast to the chord condition, virtually the same LAN effect was elicited when words were presented on deviant tones. Morphosyntactic processing (as reflected in the LAN) is not influenced when words have to be processed simultaneously with an acoustically deviant tone. Thus, the interaction between language- and music-syntactic processing shown in (b) is not due to any acoustic irregularity, but rather to specific syntactic irregularities. The scale in (b) to (d) is identical to the scale in (a). Data are presented in Koelsch et al. (2005). Reprinted with permission from Koelsch et al. (2000).

In the study by Koelsch et al. (2005), a control experiment was conducted in which the same sentences were presented simultaneously with sequences of single tones. The tone sequences ended either on a standard tone or on a frequency deviant. The physical MMN elicited by the frequency deviants did not interact with the LAN (in contrast to the ERAN), indicating that the processing of auditory oddballs (as reflected in the physical MMN) does not consume resources related to syntactic processing (Figure 6.3d). These ERP studies indicate that the ERAN reflects syntactic processing, rather than detection and integration of intersound relationships inherent in the sequential presentation of discrete events into a model of the acoustic environment. The finding that language-syntactic deviances (but not language-semantic deviances or acoustic deviances) interacted with music-syntactic information suggests shared resources for the processing of music- and language-syntactic information.
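The logic of these interaction findings can be expressed as a difference of differences: the size of a word-violation effect on irregular chords minus its size on regular chords. The sketch below uses invented mean amplitudes (not values from the studies) and hypothetical function names, and it shows only the arithmetic of the contrast; whether such a contrast is reliable would have to be established with an inferential test across participants.

```python
def violation_effect(cell_means, chord):
    """Word-violation effect (violation minus control) for one chord type."""
    return cell_means[(chord, "violation")] - cell_means[(chord, "control")]


def interaction_contrast(cell_means):
    """Difference of differences: effect on irregular chords minus effect on regular chords."""
    return violation_effect(cell_means, "irregular") - violation_effect(cell_means, "regular")


# Made-up mean amplitudes in microvolts, keyed by (chord type, word type).
lan_means = {
    ("regular", "control"): 0.0,
    ("regular", "violation"): -2.0,
    ("irregular", "control"): 0.0,
    ("irregular", "violation"): -1.0,
}
n400_means = {
    ("regular", "control"): 0.0,
    ("regular", "violation"): -3.0,
    ("irregular", "control"): 0.0,
    ("irregular", "violation"): -3.0,
}

lan_interaction = interaction_contrast(lan_means)    # 1.0: the negativity shrinks on irregular chords
n400_interaction = interaction_contrast(n400_means)  # 0.0: the effect is the same on both chord types
```

A nonzero contrast for the LAN together with a zero contrast for the N400 is the pattern that the text describes as a syntax-specific interaction.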

These ERP findings have been corroborated by behavioral studies: In a study by Slevc et al. (2009), participants performed self-paced reading of “garden-path” sentences. Words (presented visually) occurred simultaneously with chords (presented auditorily). When a syntactically unexpected word occurred together with a music-syntactically irregular (out-of-key) chord, participants needed more time to read the word (i.e., participants showed a stronger garden-path effect). No such interaction between language- and music-syntactic processing was observed when words were semantically unexpected, or when the chord presented with the unexpected word had an unexpected timbre (but was harmonically correct). Similar results were reported in a study in which sentences were sung (Fedorenko et al. 2009). Sentences contained either subject-extracted or object-extracted relative clauses, and the note sung on the critical word of a sentence was either in-key or out-of-key. Participants were less accurate in their understanding of object-extracted constructions compared to subject-extracted constructions (as expected), because the object-extracted sentence constructions required more syntactic integration. Importantly, the difference between the comprehension accuracies of these two sentence types was larger when the critical word (the last word of a relative clause) was sung on an out-of-key note. No such interaction was observed when the critical word was sung with greater loudness. Thus, both of these studies (Fedorenko et al. 2009; Slevc et al. 2009) show that music- and language-syntactic processing specifically interact with each other, presumably because they both rely on common processing resources.

The findings of these EEG and behavioral studies, showing interactions between language- and music-syntactic processing, have been corroborated by a recent patient study (Patel 2008). This study showed that individuals with Broca’s aphasia also show impaired music-syntactic processing in response to out-of-key chords occurring in harmonic progressions. (Note that all patients had Broca’s aphasia, but only some of them had a lesion that included Broca’s area.)

In conclusion, neurophysiological studies show that music- and language-syntactic processes engage overlapping resources (in the frontolateral cortex). The strongest evidence that these resources underlie music- and language-syntactic processing stems from experiments that demonstrate interactions between ERP components reflecting music- and language-syntactic processing (LAN and ERAN). Importantly, such interactions are observed (a) in the absence of interactions between LAN and MMN (i.e., in the absence of interactions between language-syntactic and acoustic deviance processing, reflected in the MMN) and (b) in the absence of interactions between the ERAN and the N400 (i.e., in the absence of interactions between music-syntactic and language-semantic processing). Therefore, the reported interactions between LAN and ERAN are syntax-specific and are not observed in response to just any kind of irregularity. However, whether the interaction between ERAN and LAN is due to the processing of local or long-distance dependencies (or both) remains to be determined.

Musical Meaning

To communicate, an individual must utter information that can be interpreted and understood by another individual. This section discusses neural correlates of the processing of meaning that emerge from the interpretation of musical information by an individual. Seven dimensions of musical meaning are described, divided into the following three classes of musical meaning:

  1. Extramusical meaning can emerge from the act of referencing a musical sign to a (extramusical) referent by virtue of three different types of sign quality: iconic, indexical, and symbolic.

  2. Intramusical meaning emerges from the act of referencing a structural musical element to another structural musical element.

  3. Musicogenic meaning emerges from the physical processes (such as actions), emotions, and personality-related responses (including preferences) evoked by music.

Thus, in contrast to how the term meaning is used in linguistics, musical meaning as considered here is not confined to conceptual meaning; it can also refer to nonconceptual meaning. In language, such nonconceptual meaning may arise, for example, from the perception of affective prosody. Moreover, I use the term musical semantics in this chapter (instead of simply using the terms “musical meaning” or “musical semiotics”) to emphasize that musical meaning extends beyond musical sign qualities: For example, with regard to intramusical meaning, musical meaning can emerge from the structural relations between successive elements. Another example, in terms of extramusical meaning, is that while listening to program music, the processing of extramusical meaning usually involves integration of meaningful information into a semantic context. Note, however, that the term musical semantics does not refer to binary (true-false) truth conditions. I agree with Reich (2011): no musical tradition makes use of quantifiers (e.g., “all,” “some,” “none,” “always”), modals (e.g., “must,” “may,” “necessary”), or connectives (e.g., “and,” “if…then,” “if and only if,” “neither…nor”) unless music imitates language (such as drum and whistle languages; Stern 1957). Hence, the term “musical semantics” should not be equated with the term “propositional semantics” as it is used in linguistics. Also note that during music listening or music performance, meaning can emerge from several sources simultaneously. For example, while listening to a symphonic poem, meaning may emerge from the interpretation of extramusical sign qualities, from the processing of the intramusical structure, as well as from music-evoked (musicogenic) emotions.

(p.154) Extramusical Meaning

Extramusical meaning emerges from a reference to the extramusical world. It comprises three categories:

  1. Iconic musical meaning: Meaning that emerges from common patterns or forms, such as musical sound patterns, that resemble sounds or qualities of objects. This sign quality is reminiscent of Peirce’s “iconic” sign quality (Peirce 1931/1958); in language, this sign quality is also referred to as onomatopoeic.

  2. Indexical musical meaning: Meaning that arises from the suggestion of a particular psychological state due to its resemblance to action-related patterns (such as movements and prosody) that are typical for an emotion or intention (e.g., happiness). This sign quality is reminiscent of Peirce’s “indexical” sign quality (for a meta-analysis comparing the acoustical signs of emotional expression in music and speech, see Juslin and Laukka 2003). Cross and Morley (2008) refer to this dimension of musical meaning as “motivational-structural” due to the relationship between affective-motivational states of individuals and the structural-acoustical characteristics of (species-specific) vocalizations. With regard to intentions, an fMRI study (Steinbeis and Koelsch 2008b), which will be discussed later in detail, showed that listeners automatically engage social cognition as they listen to music, in an attempt to decode the intentions of the composer or performer (as indicated by activation of the cortical theory of mind network). That study also reported activations of posterior-temporal regions implicated in semantic processing, presumably because the decoding of intentions has meaning quality.

  3. Symbolic musical meaning: Meaning due to explicit (or conventional) extramusical associations (e.g., any national anthem). Peirce (1931/1958) denoted this sign quality as symbolic (note that the meaning of the majority of words is due to symbolic meaning). Symbolic musical meaning also includes social associations such as between music and social or ethnic groups (for the influence of such associations on behavior, see Patel 2008). Wagner’s leitmotifs are another example of symbolic extramusical sign quality.

Extramusical Meaning and the N400

The processing of extramusical meaning is reflected in the N400. As described earlier in this chapter, the N400 component is an electrophysiological index of the processing of meaning information, particularly conceptual/semantic processing or lexical access, and/or post-lexical semantic integration. Koelsch et al. (2004) showed that the N400 elicited by a word can be modulated by the (p.155) meaning of musical information preceding that word (see Figure 6.4). Further studies have revealed that short musical excerpts (duration ~ 1 s) can also elicit N400 responses when presented as a target stimulus following meaningfully unrelated words (Daltrozzo and Schön 2009). Even single chords can elicit N400 responses, as shown in affective priming paradigms using chords as targets (and words as prime stimuli) or chords as primes (and words as targets; Steinbeis and Koelsch 2008a). Finally, even single musical sounds can elicit N400 responses due to meaningful timbre associations (e.g., “colorful,” “sharp”; Grieser Painter and Koelsch 2011).

Figure 6.4 Examples of the four experimental conditions preceding a visually presented target word. Top left: sentence priming (a) and not priming (b) the target word Weite (wideness). Top right: Total-averaged brain electric responses elicited by target words after the presentation of semantically related (solid line) and unrelated prime sentences (dotted line), recorded from a central electrode. Compared to the primed target words, unprimed target words elicited a clear N400 component in the ERP. Bottom left: musical excerpt priming (c) and not priming (d) the same target word (excerpts had similar durations as sentences). Bottom right: Total-averaged ERPs elicited by primed (solid line) and non-primed (dotted line) target words after the presentation of musical excerpts. As was the case after the presentation of sentences, target words presented after unrelated musical excerpts elicited a clear N400 component compared to target words presented after related excerpts. Each trial was presented one time; conditions were distributed randomly, but in counterbalanced order across the experiment. Each prime was used in another trial as a non-prime for a different target word (and vice versa); thus, each sentence or musical excerpt was presented twice (half were first presented as primes, the other half as non-primes). Audio examples are available on www.stefan-koelsch.de. Reprinted with permission from Koelsch et al. (2004).

(p.156) Intramusical Meaning

Musical meaning can also emerge from intramusical references; that is, from the reference of one musical element to at least one other musical element (e.g., a G major chord is usually perceived as the tonic in G major, as the dominant in C major, and, in its first inversion, possibly as a Neapolitan sixth chord in F# minor). The following will illustrate that the so-called N5 appears to be an electrophysiological correlate of such processing of intramusical meaning.

N5 was described first in reports of experiments using chord sequence paradigms with music-syntactically regular and irregular chord functions (see also Figure 6.1b). As described above, such irregular chord functions typically elicit two brain-electric responses: an early right anterior negativity (the ERAN, which is taken to reflect neural mechanisms related to syntactic processing) and a late negativity, the N5. Initially, N5 was proposed to reflect processes of harmonic integration, reminiscent of the N400 reflecting semantic integration of words. N5 was therefore proposed to be related to the processing of musical meaning, or semantics (Koelsch et al. 2000). N5 also shows some remarkable similarities with the N400.

Harmonic Context Buildup

N5 was first observed in experiments using paradigms in which chord sequences (each consisting of five chords) ended either on a music-syntactically regular or irregular chord function. Figure 6.5 shows ERPs elicited by regular chords at positions 1 to 5; each chord elicits an N5 (see arrow in Figure 6.5), with the amplitude of the N5 declining toward the end of the chord sequence. Amplitude decline is taken to reflect the decreasing amount of harmonic integration required with progressing chord functions during the course of the cadence. The small N5 elicited by the (expected) final tonic chord presumably reflects the small amount of harmonic integration required at this position of a chord sequence. This phenomenology of the N5 is similar to that of the N400 elicited by open-class words (e.g., nouns, verbs): as the position of words in a sentence progresses, N400 amplitude declines toward the end of a sentence (Van Petten and Kutas 1990). In other words, during sentence processing, a semantically correct final open-class word usually elicits a rather small N400, whereas the open-class words preceding this word elicit larger N400 potentials. This is due to the semantic expectedness of words, which is rather unspecific at the beginning of a sentence, and which becomes more and more specific toward the end of the sentence (when readers can already guess what the last word will be). Thus, a smaller amount of semantic integration is required at the end of a sentence, reflected in a smaller N400. If the last word is semantically unexpected, then a large amount of semantic processing is required—reflected in a larger amplitude of N400. (p.157)

Figure 6.5 Total-averaged ERPs elicited by regular chords, separately for each of the regular chords (first to fifth position) of a five-chord cadence. Amplitude of the N5 (indicated by the arrow) is dependent on the position of the chords in the cadence: amplitude of the N5 decreases with increasing harmonic context buildup. Reprinted with permission from Koelsch (2012).

Harmonic Incongruity

Compared to regular chord functions, irregular chord functions typically elicit an ERAN, which is taken to reflect neural mechanisms related to syntactic processing. In addition, irregular chords elicit an N5 with a larger amplitude than the N5 elicited by regular chord functions (see Figure 6.2c) (for studies on N5 effects for melodies, see Miranda and Ullman 2007; Koelsch and Jentschke 2010). This increase of the N5 amplitude is taken to reflect the increased amount of harmonic integration, reminiscent of the N400 reflecting semantic integration of words (Figure 6.4). That is, at the same position within a chord sequence, N5 is modulated by the degree of fit with regard to the previous harmonic context, analogous to the N400 (elicited at the same position within a sentence), which is modulated by the degree of fit with regard to the previous semantic context (see also Figure 6.4). Therefore, N5 is proposed to be related to the processing of musical meaning, or semantics, although the type of musical meaning is unclear (Koelsch et al. 2000). Further evidence for the notion that the ERAN reflects music-syntactic and N5 music-semantic processes is reported next.

N5 and N400

We have seen (Figure 6.2) that the LAN elicited by morphosyntactic violations in language is influenced by music-syntactically irregular chord functions, whereas the N400 is not affected by such irregularities. In addition, Steinbeis and Koelsch (2008a) found that the ERAN was smaller when elicited on syntactically wrong words compared to the ERAN elicited on syntactically correct (p.158) words (cf. Figure 6.3a). This lends strong support to the notion that ERAN reflects syntactic processing.3

Moreover, and most importantly with regard to the processing of musical meaning, results of the Steinbeis and Koelsch (2008a) study also show an interaction between the N5 and the semantic cloze probability of words (in the absence of an interaction between the N5 and the syntactic regularity of words; Figure 6.3b, c): N5 was smaller when elicited on words with a low semantic cloze probability (e.g., “He sees the cold beer”) than on words with a high semantic cloze probability (e.g., “He drinks the cold beer”). Importantly, N5 did not interact with the syntactic processing of words, indicating that the N5 potential can be modulated specifically by semantic processes; namely by the activation of lexical representations of words with different semantic fit to a previous context. This modulation indicates that N5 is related to the processing of meaning information. Note that the harmonic relation between the chord functions of a harmonic sequence is an intramusical reference (i.e., a reference of one musical element to another musical element, but not a reference to anything belonging to the extramusical world). Therefore, there is reason to believe that N5 reflects the processing of intramusical meaning. The fact that irregular chord functions usually elicit both ERAN and N5 potentials suggests that irregular chord functions evoke syntactic processes (as reflected in the ERAN) as well as semantic processes (as reflected in the N5). In addition, as will be discussed later, irregular chord functions may also evoke emotional effects.4

Musicogenic Meaning

In musicogenic meaning (Koelsch 2011b), listeners not only perceive the meaning expressed by the music, they process the musical information individually in terms of (a) physical activity, (b) emotional effects, and (c) personality-related effects. This, in turn, adds quality to the meaning for the perceiver.


Individuals tend to move to music. For example, they may sing, play an instrument, dance, clap, conduct, nod their heads, tap, or sway to music. In short, individuals tend to display some sort of physical activity in response to, and (p.159) in synchrony with, music. The mere fact that an individual shows such activity carries meaning for the individual. In addition, the way in which an individual moves is in itself an expression of meaning information. Movements are “composed” by the individual. Thus they need to be differentiated from motor effects that may result as an emotional effect of music, such as smiling during emotional contagion when listening to joyful music.

In a social situation, that is, when more than one individual moves to or plays music, meaning also emerges from joint, coordinated activity. For example, an action-related effect that becomes apparent in music based on an isochronous pulse—a pulse to which we can easily clap, sing, and dance—is that individuals synchronize their movements to the external musical pulse. In effect, in a group of individuals, this leads to coordinated physical activity. Notably, humans are one of the few species that are capable of synchronizing their movements to an external beat (nonhuman primates apparently do not have this capability, although some other species do; for a detailed account, see Fitch and Jarvis, this volume). In addition, humans are unique in that they can understand other individuals as intentional agents, share their intentionality, and act jointly to achieve a shared goal. In this regard, communicating and understanding intentions as well as interindividual coordination of movements is a prerequisite for cooperation. Cross stated that, in a social context, musical meaning can emerge from such joint performative actions and referred to this dimension as “socio-intentional” (Cross 2008:6).


Musicogenic meaning can also emerge from emotions evoked by music. This view considers that feeling one’s own emotions is different from the recognition of emotion expressed by the music (Gabrielsson and Juslin 2003), the latter usually being due to indexical sign quality of music. The different principles by which music may evoke emotions are discussed elsewhere (e.g., Gabrielsson and Juslin 2003; Koelsch 2012); here, the meaning that emerges from (music-evoked) emotions is discussed.

The evocation of emotions with music has important implications for the specificity of meaning conveyed by music as opposed to language. In communicating emotions, language faces several problems: In his discussion about rule following and in his argument against the idea of a “private language,” Wittgenstein (1984) demonstrates that “inner” states (like feelings) cannot be directly observed and verbally denoted by the subject who has these states. His argument shows that the language about feelings functions in a different mode to the grammar of words and things. Wittgenstein argues that it is not possible (a) to identify correctly an inner state and (b) to guarantee the correct language use that is not controlled by other speakers. This means (c) that it is impossible for the speaker to know whether his or her use corresponds to the rules of the linguistic community and (d) whether his or her use is the same in different (p.160) situations. According to Wittgenstein, correct use of the feeling vocabulary is only possible in specific language games. Instead of assuming a direct interaction of subjective feelings and language, Gebauer (2012) proposed that feeling sensations (Wittgenstein’s Empfindungen) are reconfigured by linguistic expressions (although reconfiguration is not obligatory for subjective feeling). This means that there is no (direct) link or translation between feelings and words, thus posing fundamental problems for any assumption of a specificity of verbal communication about emotions. However, affective prosody, and perhaps even more so music, can evoke feeling sensations (Empfindungen) which, before they are reconfigured into words, bear greater interindividual correspondence than the words that individuals use to describe these sensations. 
In other words, although music seems semantically less specific than language (e.g., Slevc and Patel 2011; Fitch and Gingras 2011), music can be more specific when it conveys information about feeling sensations that are difficult to express in words, because music can operate prior to the reconfiguration of feeling sensations into words. Note that in spoken language, affective prosody also operates in part on this level, because it elicits sensational processes in a perceiver that bear resemblance to those that occur in the producer. I refer to this meaning quality as a priori musical meaning. The reconfiguration of a feeling sensation into language involves the activation of representations of a meaningful concept, such as “joy,” “fear,” etc. Zentner et al. (2008) report a list of 40 emotion words typically used by Western listeners to describe their music-evoked feelings. Such activation presumably happens without conscious deliberation, and even without conscious (overt or covert) verbalization, similar to the activations of concepts by extramusical sign qualities, of which individuals are often not consciously aware.


Feeling sensations evoked by a particular piece of music, or music of a particular composer, can have a personal relevance, and thus meaning, for an individual in that they touch or move the individual more than feeling sensations evoked by other pieces of music, or music of another composer. This is in part due to interindividual differences in personality (both on the side of the recipient and on the side of the producer). Because an individual has a personality (be it a receiver or producer of music), and personalities differ between individuals, there are also interindividual differences among receivers in the preference for, or connection with, a particular producer of music. For example, one individual may be moved more by Beethoven than by Mozart, while the opposite may be true for another. Music-evoked emotions may also be related to one’s inner self, sometimes leading to the experience that one recognizes oneself in the music in a particular, personal way.

(p.161) Music-Evoked Emotions

Unexpected Harmonies and Emotional Responses

A study by Steinbeis et al. (2006) tested the hypothesis that emotional responses can be evoked by music-syntactically unexpected chords. In this study, physiological measures including EEG, electrodermal activity (EDA, also referred to as galvanic skin response), and heart rate were recorded while subjects listened to three versions of Bach chorales. One version was the original version composed by Bach with a harmonic sequence that ended on an irregular chord function (e.g., a submediant). The same chord was also rendered expected (using a tonic chord) and very unexpected (a Neapolitan sixth chord). The EDA to these three different chord types showed clear differences between the expected and the unexpected (as well as between expected and very unexpected) chords. Because the EDA reflects activity of the sympathetic nervous system, and because this system is intimately linked to emotional experiences, these data corroborate the assumption that unexpected harmonies elicit emotional responses. The findings from this study were later replicated in another study (Koelsch et al. 2008), which also obtained behavioral data showing that irregular chords were perceived by listeners as more surprising, and less pleasant, than regular chords.

Corroborating these findings, functional neuroimaging experiments using chord sequences with unexpected harmonies (originally designed to investigate music-syntactic processing; Koelsch et al. 2005; Tillmann et al. 2006) showed activations of the amygdala (Koelsch et al. 2008), as well as of orbital frontal (Tillmann et al. 2006), and orbital frontolateral cortex (Koelsch et al. 2005) in response to the unexpected chords. The orbital frontolateral cortex (OFLC), comprising the lateral part of the orbital gyrus (BA 11) as well as the medial part of the inferior frontal gyrus (BA 47 and 10), is a paralimbic structure that plays an important role in emotional processing: OFLC has been implicated in the evaluation of the emotional significance of sensory stimuli and is considered a gateway for preprocessed sensory information into the medial orbitofrontal paralimbic division, which is also involved in emotional processing (see Mega et al. 1997; Rolls and Grabenhorst 2008). As mentioned above, the violation of musical expectancies has been regarded as an important aspect of generating emotions when listening to music (Meyer 1956), and “breaches of expectations” have been shown to activate the lateral OFC (Nobre et al. 1999). Moreover, the perception of irregular chord functions has been shown to lead to an increase of perceived tension (Bigand et al. 1996), and perception of tension has been linked to emotional experience during music listening (Krumhansl 1997).

These findings show that unexpected musical events do not only elicit responses related to the processing of the structure of the music, but also emotional responses. (This presumably also holds for unexpected words in (p.162) sentences and any other stimulus which is perceived as more or less expected.) Thus, research using stimuli that are systematically more or less expected should ideally assess the valence and arousal experience of the listener (even if an experiment is not originally designed to investigate emotion), so that these variables can potentially be taken into account when discussing neurophysiological effects.

Limbic and Paralimbic Correlates

Functional neuroimaging and lesion studies have shown that music-evoked emotions can modulate activity in virtually all limbic/paralimbic brain structures—the core structures of the generation of emotions (see Figure 6.6 for an illustration). Because emotions include changes in endocrine and autonomic system activity, and because such changes interact with immune system function (Dantzer et al. 2008), music-evoked emotions form an important basis for beneficial biological effects of music as well as for possible interventions using music in the treatment of disorders related to autonomic, endocrine, and immune system dysfunction (Koelsch 2010; Koelsch and Siebel 2005; Quiroga Murcia et al. 2011).

Using PET, Blood et al. (1999) investigated brain responses related to the valence of musical stimuli. The stimuli varied in their degree of (continuous) dissonance and were perceived as less or more unpleasant (stimuli with the highest degree of continuous dissonance were rated as the most unpleasant). Increasing unpleasantness correlated with regional cerebral blood flow (rCBF) in the (right) parahippocampal gyrus, while decreasing unpleasantness correlated with rCBF in the frontopolar and orbitofrontal cortex, as well as in the (posterior) subcallosal cingulate cortex. No rCBF changes were observed in central limbic structures such as the amygdala, perhaps because the stimuli were presented under computerized control without musical expression (which somewhat limits the power of music to evoke emotions).

However, in another PET experiment, Blood and Zatorre (2001) used naturalistic music to evoke strongly pleasurable experiences involving “chills” or “shivers down the spine.” Participants were presented with a piece of their own favorite music using normal CD recordings; as a control condition, participants listened to the favorite piece of another subject. Increasing chills intensity correlated with rCBF in brain regions thought to be involved in reward and emotion, including the ventral striatum (presumably the nucleus accumbens, NAc; see also next section), the insula, anterior cingulate cortex (ACC), orbitofrontal cortex, and ventral medial prefrontal cortex. Blood and Zatorre also found decreases of rCBF in the amygdala as well as in the anterior hippocampal formation with increasing chills intensity. Thus, activity changes were observed in central structures of the limbic/paralimbic system (e.g., amygdala, NAc, ACC, and hippocampal formation). This was the first study to show modulation of amygdalar activity with music, and is important for two reasons: First, (p.163) the activity of core structures of emotion processing was modulated by music, which supports the assumption that music can induce “real” emotions, and not merely illusions of emotions (for details, see Koelsch 2010). Second, it strengthened the empirical basis for music-therapeutic approaches for the treatment of affective disorders, such as depression and pathologic anxiety, because these disorders are partly related to dysfunction of the amygdala. In addition, depression has been related to dysfunction of the hippocampus and the NAc (Drevets et al. 2002; Stein et al. 2007).

Figure 6.6 Illustration of limbic and paralimbic structures. The diamonds represent music-evoked activity changes in these structures (see legend for references, and main text for details). Note the repeatedly reported activations of amygdala, nucleus accumbens, and hippocampus, which reflect that music is capable of modulating activity in core structures of emotion (see text for details). Top left: view of the right hemisphere; top right: medial view; bottom left: anterior view; bottom right: bottom view. Reprinted with permission from Koelsch (2010).

An fMRI study conducted by Koelsch et al. (2006) showed that activity changes in the amygdala, ventral striatum, and hippocampal formation can (p.164) be evoked by music even when an individual does not have intense “chill” experiences. This study compared brain responses to joyful instrumental tunes (played by professional musicians) with responses to electronically manipulated, continuously dissonant counterparts of these tunes. Unpleasant music elicited increases in blood oxygenation level-dependent (BOLD) signals in the amygdala, hippocampus, parahippocampal gyrus, and temporal poles; decreases of BOLD signals were observed in these structures in response to the pleasant music. During the presentation of the pleasant music, increases of BOLD signals were observed in the ventral striatum (presumably the NAc) and insula (in addition to some cortical structures not belonging to limbic or paralimbic circuits, which will not be further reported here). In addition to the studies from Blood and Zatorre (2001) and Koelsch et al. (2006), several other functional neuroimaging studies (for reviews, see Koelsch 2010; Koelsch et al. 2010) and lesion studies (Gosselin et al. 2007) have shown involvement of the amygdala in emotional responses to music. Most of these studies reported activity changes in the amygdala in response to fearful musical stimuli, but it is important to note that the amygdala is not only a “fear center” in the brain. The amygdala also plays a role for emotions that we perceive as pleasant (for further details, see Koelsch et al. 2010).

Compared to studies investigating neural correlates of emotion with stimuli other than music (e.g., photographs with emotional valence, or stimuli that reward or punish the subject), the picture provided by functional neuroimaging studies on music and emotion is particularly striking: The number of studies reporting activity changes within the (anterior) hippocampal formation is remarkably high (for reviews, see Koelsch 2010; Koelsch et al. 2010). Previously it was argued (Koelsch et al. 2010) that the hippocampus plays an important role in the generation of tender positive emotions (e.g., joy and happiness), and that one of the great powers of music is to evoke hippocampal activity related to such emotions. The activity changes in the (anterior) hippocampal formation evoked by listening to music are relevant for music therapy because patients with depression or posttraumatic stress disorder show a volume reduction of the hippocampal formation, associated with a loss of hippocampal neurons and blockage of neurogenesis in the hippocampus (Warner-Schmidt and Duman 2006), and individuals with flattened affectivity (i.e., a reduced capability of producing tender positive emotions) show reduced activity changes in the anterior hippocampal formation in response to music (Koelsch et al. 2007). Therefore, it is reasonable to assume that music can be used therapeutically to: (a) reestablish neural activity (related to positive emotion) in the hippocampus, (b) prevent death of hippocampal neurons, and (c) stimulate hippocampal neurogenesis.

Similarly, because the amygdala and the NAc function abnormally in patients with depression, studies showing modulation of activity within these structures motivate the hypothesis that music can be used to modulate activity of these structures (either by listening to or by making music), and thus (p.165) ameliorate symptoms of depression. However, the scientific evidence for the effectiveness of music therapy on depression is surprisingly weak, perhaps due to the lack of high-quality studies and the small number of studies with randomized, controlled trials (e.g., Maratos et al. 2008).

An Evolutionary Perspective: From Social Contact to Spirituality—The Seven Cs

As discussed, music can evoke activity in brain structures involved in reward and pleasure (e.g., the NAc) as well as changes in the hippocampus, possibly related to experiences of joy and happiness. Here I will summarize why music is so powerful in evoking such emotions and relate these explanations to the adaptive value of music.

Music making is an activity involving several social functions. These functions can be divided into seven different areas (see also Koelsch 2010). The ability and the need to practice these social functions is part of what makes us human, and the emotional effects of engaging in these functions include experiences of reward, joy, and happiness; such effects have important implications for music therapy. Disengagement from these functions represents an emotional stressor and has deleterious effects on health (e.g., Cacioppo and Hawkley 2003). Therefore, engaging in these social functions is important for the survival of the individual, and thus for the human species. Below, I provide an outline of the seven different dimensions of social functions:

  1. When we make music, we make contact with other individuals. Being in contact with others is a basic need of humans, as well as of numerous other species (Harlow 1958). Social isolation is a major risk factor for morbidity as well as mortality (e.g., Cacioppo and Hawkley 2003). Although no empirical evidence is yet available, I hypothesize that social isolation will result in hippocampal damage and that contact with other individuals promotes hippocampal integrity.

  2. Music automatically engages social cognition (Steinbeis and Koelsch 2008a). As individuals listen to music, they automatically engage processes of mental state attribution (“mentalizing” or “adopting an intentional stance”), in an attempt to understand the intentions, desires, and beliefs of those who actually created the music. This is often referred to as establishing a “theory of mind” (TOM). A recent fMRI study (Steinbeis and Koelsch 2008a) investigated whether listening to music would automatically engage a TOM network (typically comprising anterior frontomedian cortex, temporal poles, and the superior temporal sulcus). In this study, we presented nontonal music from Arnold Schönberg and Anton Webern to nonmusicians, either with the cue that they were written by a composer or with the cue that they were generated by a computer. Participants were not informed about the experimental manipulation, and the task was to rate after each excerpt how pleasant or unpleasant they found each piece to be. A post-imaging questionnaire revealed that during the composer condition, participants felt more strongly that intentions were expressed by the music (compared to the computer condition). Correspondingly, fMRI data showed during the composer condition (in contrast to the computer condition) a strong increase of BOLD signals in precisely the neuroanatomical network dedicated to mental state attribution; namely, the anterior medial frontal cortex (aMFC), the left and right superior temporal sulcus, as well as left and right temporal poles. Notably, the brain activity in the aMFC correlated with the degree to which participants thought that an intention was expressed in the composed pieces of music. This study thus showed that listening to music automatically engages areas dedicated to social cognition (i.e., a network dedicated to mental state attribution in the attempt to understand the composer’s intentions).

  3. Music making can engage “co-pathy” in the sense that interindividual emotional states become more homogenous (e.g., reducing anger in one individual, and depression or anxiety in another), thus decreasing conflicts and promoting cohesion of a group (e.g., Huron 2001). With regard to positive emotions, for example, co-pathy can increase the well-being of individuals during music making or during listening to music. I use the term co-pathy (instead of empathy) because empathy has many different connotations and definitions. By using the term co-pathy we not only refer to the phenomenon of thinking what one would feel if one were in someone else’s position, we also refer to the phenomenon that one’s own emotional state is actually affected in the sense that co-pathy occurs when one perceives (e.g., observes or hears), or imagines, someone else’s affect, and this evokes a feeling in the perceiver which strongly reflects what the other individual is feeling (see also Singer and Lamm 2009). Co-pathy should be differentiated from:

    • Mimicry, which is a low-level perception-action mechanism that may contribute to empathy.

    • Emotional contagion, which is a precursor of empathy (e.g., children laughing because other children laugh); both mimicry and emotional contagion may occur outside of awareness and do not require a self/other concept.

    • Sympathy, empathic concern, and compassion, which do not necessarily involve shared feelings (e.g., feeling pity for a jealous person, without feeling jealous oneself) (see Singer and Lamm 2009).

    Thus, co-pathy requires self-awareness and self/other distinction (i.e., the capability to make oneself aware that the affect may have been evoked by music made by others, although the actual source of one’s emotion lies within oneself).

  4. Music always involves communication (notably, for infants and young children, musical communication during parent-child singing of lullabies and play songs is important for social and emotional regulation, as well as for social, emotional, and cognitive development; Fitch 2006a; Trehub 2003; see also Trehub, this volume). Neuroscience and behavioral studies have revealed considerable overlap between the neural substrates and cognitive mechanisms that underlie the processing of musical syntax and language syntax (Koelsch 2005; Steinbeis and Koelsch 2008b). Moreover, musical information can systematically influence semantic processing of language (Koelsch et al. 2004; Steinbeis and Koelsch 2008b). It is also worth noting that the neural substrates engaged in speech and song strongly overlap (Callan et al. 2006). Because music is a means of communication, active music therapy (in which patients make music) can be used to train skills of (nonverbal) communication (Hillecke et al. 2005).

  5. Music making also involves coordination of actions. This requires synchronizing to a beat and keeping a beat, a human capability that is unique among primates (Patel et al. 2009). The coordination of movements in a group of individuals appears to be associated with pleasure (e.g., when dancing together), even in the absence of a shared goal (apart from deriving pleasure from concerted movements; see also Huron 2001). Interestingly, a recent study from Kirschner and Tomasello (2009) reported that children as young as two and a half years of age synchronized more accurately to an external drum beat in a social situation (i.e., when the drum beat was presented by a human play partner) compared to nonsocial situations (i.e., when the drum beat was presented by a drumming machine, or when the drum sounds were presented via a loudspeaker). This effect might originate from the pleasure that emerges when humans coordinate movements between individuals (Overy and Molnar-Szakacs 2009; Wiltermuth and Heath 2009). The capacity to synchronize movements to an external beat appears to be uniquely human among primates, although other mammals and birds might also possess this capacity. A current hypothesis (e.g., Patel 2006) is that this capacity is related to the capacity of vocal learning (e.g., as present in humans, seals, some song birds, but not in nonhuman primates), which depends (in mammals) on a direct neural connection between the motor cortex and the nucleus ambiguus, which is located in the brainstem and which contains motor neurons that innervate the larynx; the motor cortex also projects directly to brainstem nuclei innervating the tongue, jaw, palate, and lips (e.g., Jürgens 2002).

  6. A sound musical performance by multiple players is only possible if it also involves cooperation between players. Cooperation implies a shared goal, and engaging in cooperative behavior is an important potential source of pleasure. For example, Rilling et al. (2002) reported an association between cooperative behavior and activation of a reward network including the NAc. Cooperation between individuals increases interindividual trust as well as the likelihood of further cooperation between these individuals. It is worth noting that only humans have the capability to communicate about coordinated activities during cooperation to achieve a joint goal (Tomasello et al. 2005).

  7. Music leads to increased social cohesion of a group (Cross and Morley 2008; Lewis, this volume). Many studies have shown that humans have a “need to belong” and a strong motivation to form and maintain enduring interpersonal attachments (Baumeister and Leary 1995). Meeting this need increases health and life expectancy (Cacioppo and Hawkley 2003). Social cohesion also strengthens the confidence in reciprocal care (see also the caregiver hypothesis; Fitch 2005) and the confidence that opportunities to engage with others in the mentioned social functions will also emerge in the future.

Although it should clearly be noted that music can be used to manipulate other individuals as well as to support nonsocial behavior (e.g., Brown and Volgsten 2006), music is still special—although not unique—in that it can engage all of these social functions at the same time. This is presumably one explanation for the emotional power of music (for a discussion on the role of other factors, such as sexual selection, in the evolution of music, see Huron 2001; Fitch 2005). Music, therefore, serves the goal of fulfilling social needs (e.g., our need to be in contact with others, to belong, to communicate). In addition, music-evoked emotions are related to survival functions and other functions that are of vital importance for the individual.

It is worth mentioning that the experience of engaging in these social functions, along with the experience of the emotions evoked by such engagements, can be a spiritual experience (e.g., the experience of communion; the use of “spiritual” and “communion” is not intended to imply a religious context). This may explain why many religious practices involve music.

Engaging in social functions during music making evokes activity of the “reward circuit” (from the lateral hypothalamus via the medial forebrain bundle to the mesolimbic dopamine pathway, and involving the ventral tegmental area with projection to the NAc) and is immediately perceived as fun. Interestingly, in addition to experiences of mere fun, music making can also evoke attachment-related emotions (due to the engagement in the mentioned social functions), such as joy and happiness. This capacity of music is an important basis for beneficial biological effects of music, and thus for the use of music in therapy.

A Multilevel View of Language and Music: The Music-Language Continuum

The above discussion has illustrated some overlaps of the cognitive operations (and neural mechanisms) that underlie music- and language-syntactic processing, as well as the processing of meaning in music and language. These overlaps indicate that “music” and “language” are different aspects, or two poles, of a single continuous domain. I refer to this domain as the music-language continuum. Several design features (Fitch 2006a; Hockett 1960a) of “music” and “language” are identical within this continuum: complexity, generativity, cultural transmission, and transposability. Complexity means that “musical signals (like linguistic signals) are more complex than the various innate vocalizations available in our species (groans, sobs, laughter and shouts)” (Fitch 2006a:178). Generativity means that both “music” and “language” are structured according to a syntactic system (usually involving long-distance dependencies). Cultural transmission means that music, like language, is learned by experience and culturally transmitted. Transposability means that both “music” and “speech” can be produced in different keys, or with different “starting tones,” without their recognition being distorted.

Two additional design features that could be added to this list are universality (all human cultures that we know of have music as well as language) and the human innate learning capabilities for the acquisition of music and language. Even individuals without formal musical training show sophisticated abilities with regard to the decoding of musical information, the acquisition of knowledge about musical syntax, the processing of musical information according to that knowledge, and the understanding of music. The innate learning capability for the acquisition of music indicates that musicality is a natural ability of the human brain; it parallels the natural human ability to acquire language. Perhaps this natural human ability is a prerequisite for language acquisition, because it appears that an infant’s first step into language is based on prosodic features: Infants acquire considerable information about word and phrase boundaries (possibly even about word meaning) through different types of prosodic cues (i.e., the musical cues of language, such as speech melody, meter, rhythm, and timbre). With regard to production, many cultures do not have concepts such as “musician” and “nonmusician,” let alone “musical” and “unmusical” (Cross 2008). This indicates that, at least in some cultures, it is natural for everyone to participate actively in cultural practices involving music making. Even in Western cultures, which strongly distinguish between “musicians” and “nonmusicians,” it is natural for everyone to participate in singing (e.g., during religious practices or while at rock concerts).

Beyond these design features, there are also design features that are typical for either “music,” at one end of the continuum, or “language,” at the other, but which overlap between language and music in transitional zones, rather than being clear-cut distinctive features for “music” or “language” in general. These features are scale-organized discrete pitch, isochrony, and propositional semantics.

Pitch information is essential for both music and speech. With regard to language, tone languages rely on a meticulous decoding of pitch information (due to tones coding lexical or grammatical meaning), and both tonal and nontonal languages use suprasegmental variations in F0 contour (intonation) to code structure and meaning conveyed by speech (e.g., phrase boundaries, questions, imperatives, moods, and emotions). Music often uses sets of discrete pitches, whereas discrete pitches are not used for speech. However, pitches in music are often less discrete than one might think (e.g., glissandos, pitch bending), and many kinds of drum music do not use any scale-organized discrete pitches. On the other hand, the pitch height of different pitches produced during speaking appears not to be arbitrary, but rather to follow principles of the overtone series, which is also the basis of the pitches of many musical scales (e.g., Ross et al. 2007). In particular, emphatic speech (which borders on being song) often uses discrete scalelike pitches (in addition to more isochronous timing of syllables). This illustrates that discrete pitches (such as piano tones) are at the musical end of the music-language continuum and that a transitional zone of discrete pitch usage exists in both “music” and “language” (for the use of pitch in music and language, see Ladd, this volume).

The isochronous tactus (or “beat”), on which the musical signals are built in time, is at the “musical” end of the continuum. Though such an isochronous pulse does not appear to be a characteristic feature of spoken language, it can be found in poetry (Lerdahl 2001a), ritualistic speech, and emphatic speech. Still, not all kinds of music are based on a tactus (in particular, pieces of contemporary music). Thus, like discrete pitches, isochronous signals are more characteristic of the musical end of the music-language continuum, and the transitional zone from isochronous to non-isochronous signals is found in both music and speech. As mentioned above, emphatic speech borders on song and often uses discrete scale-like pitches as well as more isochronous timing of syllables. Art forms such as poetry, rap music, or recitatives represent transitional zones from speech to song.

With regard to the language end of the continuum, I have already mentioned in the discussion on musical meaning that no musical tradition makes use of propositional semantics unless music imitates language, as in drum and whistle languages (Stern 1957). Nevertheless, music can prime representations of quantifiers (such as “some” and “all”) and possibly also evoke at least vague associations of some modals (such as “must” in passages conveying strong intentionality) or connectives (by establishing dependency relations between musical elements, such as the pivot chord in a tonal modulation that belongs either to one key or to another). In Western music, such capabilities can be used to convey narrative content, but clearly there is no existence of (or necessity for) a full-blown vocabulary of propositional semantics of language. On the other hand, quantifiers, modals, or connectives are often used imprecisely in everyday language (think of the “logical and,” or the “logical or”). The mere existence of the two words “propositional” and “nonpropositional” leads easily to the illusion that there is a clear border between these two concepts or that one is the opposite of the other. However, although propositional semantics is characteristic of the language pole of the music-language continuum, a transitional zone of propositionality overlaps both language and music. Fitch noted that “lyrical music, which because it incorporates language, thus automatically inherits any linguistic design features” (Fitch 2006a:176). Therefore, anyone interested in what listening to music with propositional semantics feels like just has to listen to a song with lyrics containing propositional semantics (for the hypothesis that true-false conditions are not required in music to be made mutually explicit, see Cross 2011; Koelsch 2011a).

Meaning specificity is another design feature that is often taken as characteristic for language. Language appears to be more suitable to refer to objects of the extra-individual world; that is, objects which can be perceived by different individuals, and whose existence and qualities can thus be verified, or falsified, by others. However, although they possess a limited vocabulary, musical cultures have extramusical sign qualities that can also convey such meaning; the symbolic sign quality of music, for example, is, by definition, just as specific as the symbolic sign quality of words (although such a lexicon is not comparable quantitatively to the lexicon of language). Similarly to the terms “propositional” and “nonpropositional,” the terms “communication” (in the sense of conveying specific, unambiguous information with language) and “expression” (in the sense of conveying rather unspecific, ambiguous information with music) are normally used as if there were two separate realms of conveying meaningful information (communication and expression) with a clear border between them. However, this notion is not accurate, because there is a continuous degree of specificity of meaning information, with “expression” being located toward one end and “communication” toward the other.

More importantly, music can communicate states of the intra-individual world; that is, states which cannot be perceived by different individuals, and whose existence and qualities can thus not be falsified by others; they can, however, be expressed in part by facial expression, voice quality, and prosody (Scherer 1995; Ekman 1999): Music can evoke feeling sensations which bear greater interindividual correspondence with the feeling sensations of the producer than the words that an individual can use to describe these sensations (see discussion of a priori musical meaning in the section on musical meaning). In this sense, music has the advantage of defining a sensation without this definition being biased by the use of words. Although music might seem to be “far less specific” (Slevc and Patel 2011) than language (in terms of semantics), music can be more specific when it conveys sensations that are problematic to express in words. Importantly, in spoken language, affective prosody operates in part on this level because it elicits sensational phenomena in a perceiver that resemble those that occur in the producer. This notion is supported by the observation that affective information is coded with virtually identical acoustical features in speech and music (Scherer 1995; Juslin and Laukka 2003). For the design feature of translatability, see Patel (2008); for the design features of performative contexts and repertoire, see Fitch (2006a).

The description of the design features illustrates that the notion of clear-cut dichotomies of these features, and thus of clear-cut boundaries between music and language, is too simplistic. Therefore, any clear-cut distinction between music and language (and thus also any pair of separate definitions for language and music) is likely to be inadequate, or incomplete, and a rather artificial construct. Due to our “language games” (Wittgenstein), the meaning of “music” and “language” is sufficiently precise for an adequate use in everyday language. For scientific language, however, it is more accurate to consider the transitional nature of the design features and to distinguish a scientific use of the words “music” and “language” from the use of these words in everyday language. Using the term music-language continuum acknowledges both the commonalities between music and language and the transitional nature of the design features of music and language.


(1) It is a logical fallacy to assume that one could provide empirical evidence for resources that are “distinctively musical” vs. “distinctively linguistic,” because it is not possible to know with certainty what the musical analog for a linguistic phenomenon is (and vice versa). For example, if I do not know the musical analog of a verb inflection, then I can only arbitrarily pick, out of an almost infinite number of musical phenomena, one by which to compare the processing of such musical information with the processing of verb inflections. If the data point to different processes, then there is always the possibility that I just picked the wrong musical analog (and numerous studies, published over the last two decades, have simply made the wrong comparisons between musical and linguistic processes, ostensibly showing distinct resources for music and language). Only if an interaction between processes is observed, in the absence of an interaction with a control stimulus, can one reasonably assume that both music and language share a processing resource.

(2) The semantic cloze probability of the sentence “He drinks the cold beer” is higher than the cloze probability of the sentence “He sees the cold beer”: after the words “He sees the cold…” anything that is cold and can be seen is able to close the sentence, whereas after the words “He drinks the cold…” only things that are cold and that one can drink are able to close the sentence.

(3) The fact that an interaction between language-syntactic irregularity and the ERAN was observed in the study by Steinbeis and Koelsch (2008a) but not in the study by Koelsch et al. (2005) is probably due to the task: participants in the study by Steinbeis and Koelsch (2008a), but not in the study by Koelsch et al. (2005), were required to attend to the chords. For details see Koelsch (2012).

(4) See Koelsch (2012) for further intramusical phenomena that give rise to musical meaning, including meaning that emerges from the buildup of structure, the stability and extent of a structure, the structure following a structural breach, resolution of a structural breach, and meaning emerging from large-scale musical structures.