Shared Meaning, Mirroring, and Joint Action
Abstract and Keywords
Mirroring the behavior of others implies the existence of an underlying neural system capable of resonating motorically while this behavior is observed. This chapter aims to show that many instances of “resonance” behavior are based on a mirror mechanism that gives rise to several types of mirror systems, such as those for action and intention understanding and for empathy. This mechanism provides an immediate, automatic kind of understanding that matches biological stimuli with internal somatomotor or visceromotor representations. It can be associated with other cortical circuits when an understanding of others’ behavior implies inferential processes. Properties of the mirror system are described in monkeys, where it was originally discovered, as well as in humans. Discussion follows on how the system can be involved in social cognitive functions such as understanding of goal-directed motor acts, intention, and emotions. The possible involvement of this system/mechanism in mirroring language and music is then discussed, and it is suggested that it initially evolved for action understanding in nonhuman primates and could have been exploited for other functions involving interindividual interactions. Published in the Strüngmann Forum Reports Series.
The capacity to mirror others applies to several types of behaviors. Overtly, it can simply be manifested as a kind of automatic resonance, such as in contagious behavior, mimicry, or in the chameleon effect. Perhaps the most widespread and adaptively relevant mirroring behavior is imitation. This term, if used in a broad sense, includes several types of different processes, ranging from true imitation (i.e., to copy the form of an action; see Whiten et al. 2009) to action facilitation (e.g., an increase in frequency of an observed behavior that belongs to the observer’s repertoire) to emulation, which consists in reproducing the goal of an observed behavior independent of the means used to achieve it. Depending on the situation or context, one or more of these imitative processes may be involved. For example, to extract an object from a closed container that can be opened in different, equally efficient ways, it is not necessary to adopt the exact same strategy used by a demonstrator (Whiten 1998). However, to learn a new guitar chord, the exact finger posture and sequence shown by the expert must be reproduced. While imitative processes such as action facilitation or neonatal imitation have been observed in nonhuman primates, instances of true imitation in nonhuman primates have been described mainly in apes. Monkeys show a very limited imitative capacity (Visalberghi and Fragaszy 2001).
These examples suggest that mirroring others involves very different behaviors, although a common underlying neural capacity for “resonating” motorically can be recognized in all of them (Rizzolatti et al. 2002). This motor resonance could consist of at least two different neural mechanisms: one related to simple movements or meaningless gestures, the other to meaningful and transitive motor acts. The neural mechanisms underlying these different types of motor resonance have been examined over the last twenty years at the single neuron and population levels. At the single neuron level, many studies in monkeys have demonstrated that the capacity to resonate during the observation of others’ actions is provided by a peculiar category of visuomotor neurons (mirror neurons), which play an important role in enabling monkeys to understand the actions of others. A similar system has been demonstrated in humans, mainly at the population level. In humans, the system not only enables action understanding; it also appears (possibly as a result of new evolutionary pressures) to be involved in overtly manifesting the result of the neural resonance mechanism, as in imitation.
I begin with a description of the basic features of the mirror system in monkeys and humans. I specifically address the role of the system in intention and emotion understanding and discuss the mirroring function in language and music, including its possible neural substrate.
The Mirror System in Monkeys
For a long time, it was assumed that the motor cortex (i.e., the posterior part of the frontal lobe) could be divided into two main subdivisions: Brodmann’s area 4 and area 6 (Brodmann 1909). From a functional perspective, these areas correspond to the primary motor and premotor cortex, respectively. However, the premotor cortex itself can be subdivided into a mosaic of areas, as shown by parcellation studies (Figure 4.1; see also Matelli et al. 1985, 1991). Accordingly, the ventral part of monkey premotor cortex (PMV) is composed of two areas: F4, caudally, and F5, rostrally (Figure 4.1a).
Area F4 controls goal-directed axial and proximal movements (Fogassi et al. 1996), whereas area F5 controls both hand and mouth goal-directed movements, such as grasping, manipulation, biting, etc. (Rizzolatti et al. 1988). In addition to the purely motor neurons in area F5, there are two classes of
The response of these neurons is very specific; they do not respond to the presentation of a simple object or to an observed act that is mimicked without the target. Visual response is also independent of several details of the observed
From these examples it is clear that the most important property of mirror neurons is to match the observation or hearing of a motor act belonging to the monkey motor repertoire with its execution. It has been proposed that when the input related to the observed or heard motor act has access (in the observer) to the internal motor representation of a similar motor act, this input achieves its meaning. In other words, the internal motor knowledge of an individual constitutes the basic framework for understanding the motor acts of others. I use the term “internal motor knowledge” because this knowledge is not built through the hierarchical elaboration of sensory input but rather through the motor system. Of course, we can “recognize” motor acts by simply using the visual or auditory representations of the biological objects that are observed or heard. However, in that case we do not “internally” understand the meaning of these stimuli.
As is typical of other types of functions that require visuomotor integration, mirror neurons are the result of a process based on a specific anatomical circuit. The main areas belonging to this circuit are the anterior part of the superior temporal sulcus (STSa), the rostral part of inferior parietal cortex, and the PMV. STSa contains neurons that discharge during the observation of biological actions and, more specifically, of hand motor acts, but they are not endowed with motor properties (Perrett et al. 1989). This region is anatomically linked with area PFG (Figure 4.1) of the inferior parietal lobule (IPL), which in turn is connected to area F5 (Bonini et al. 2010; Rozzi et al. 2006). These two links allow the matching, observed in mirror neurons, between the visual representation of a hand motor act and its motor representation. Remarkably, mirror neurons have also been found in area PFG (Fogassi et al. 2005; Gallese et al. 2002; Rozzi et al. 2008) and have properties very similar to those of premotor mirror neurons. This similarity is not surprising for two reasons: First, because of the reciprocal connections between the two areas, neurons belonging to them share several properties. Second, evidence that neurons of IPL also have strong motor properties (see Rozzi et al. 2008) suggests that the matching between visual representations (of observed acts) and motor representations (of the same acts) may, in principle, also occur in this area.
All of the evidence reviewed above shows that mirror neurons activate in relation to goal-directed motor acts. These acts can also be defined as transitive movements because they are directed toward a target, in contrast to intransitive ones which lack a target. Is there any evidence for neural mechanisms underlying other types of mirroring that involve intransitive movements? As discussed, mirror neurons in the monkey do not respond to mimicked motor acts. Even when mimicking evokes their activation, this is normally weaker than when it is obtained during observation of the same act interacting with a target (Kraskov et al. 2009). There is, however, an exception: among mirror neurons that respond to the observation of mouth motor acts, there is a class that is selectively activated by the observation and execution of monkey communicative gestures (Ferrari et al. 2003; see also below). Thus, from the evidence collected in monkeys, it can be concluded that mirror neurons respond to the observation of meaningful movements, where the word “meaningful” includes both transitive movements (goal-directed motor acts) and gestures endowed with meaning. The fact that mirror neurons for intransitive movements concern only the orofacial motor representation could be related to their role in communication.
In summary, the fact that mirror neurons respond even when the motor act is hidden or when only its sound is heard strongly suggests that monkey mirror neurons underpin the understanding of goal-related motor acts. However, whether their discharge also has a role in triggering social interactions is not known. Recent data suggest, albeit indirectly, that the modulation of the response of mirror neurons to visual cues could be related to the behavioral reaction of the observing individual. Caggiano et al. (2009) studied mirror neuron response while monkeys observed an experimenter performing a grasping act that took place within the monkey’s reaching space (peripersonal space) or outside of it (extrapersonal space). Their results show that 50% of the mirror neurons recorded with this paradigm were sensitive to the distance at which the agent performed the observed motor act; half of these responded more strongly when the observed motor act was performed close to the monkey (peripersonal neurons), the other half when it was performed far from the monkey (extrapersonal neurons). This modulation has been proposed to be related to the possible subsequent interaction of the observer with other individuals whose behavior takes place inside or outside the observer’s reaching space.
Mirroring and Shared Attention
Shared attention involves the interaction between two individuals and can be instrumental in achieving joint activity between individuals. One mechanism that may contribute to the capacity for shared attention is gaze following (i.e., orienting one’s eyes and head in the same direction in which another individual is looking). Gaze-following behavior has been described in both apes and monkeys (Ferrari et al. 2000; Tomasello et al. 1998). What could be its neural underpinning?
Similar to the visual recognition of effectors, such as the forelimb or the body, the observation of eye position is known to activate neurons in the STS (Perrett et al. 1992). Recently, however, the presence of mirror neurons for eye movements in the lateral intraparietal area (LIP) has been demonstrated. This area, located inside the intraparietal sulcus, is part of a circuit involving the frontal eye field; it plays a crucial role in organizing intended eye movements (Andersen and Buneo 2002). Most LIP neurons have a visual receptive field and discharge when the monkey gazes in the direction of the receptive field. A subset has also been found to discharge when a monkey observes the picture of another monkey looking at the neuron-preferred direction (Shepherd et al. 2009), although their discharge is weaker when the observed monkey gazes in the opposite direction. Interestingly, in this study, the picture is presented frontally, outside the neuron-preferred spatial direction. This means that the signal elicited by gazing at the other monkey is processed by high-order visual areas and fed to LIP, which in principle could use it to orient the eyes toward the same direction that the observed monkey is looking. This finding suggests that a system involved in the control of eye movements toward spatial targets is endowed with a mirror mechanism. Just like the mirror system for hand and mouth motor acts, this system is goal-related; in this case, the goal is a given spatial location. Shepherd et al. (2009), however, did not investigate whether, as in the hand and mouth mirror system, the coded goal can be broad (e.g., left or right) or more specific (e.g., in relation to a given gaze amplitude).
Thus, shared attention may be partly subserved by a cortical mechanism that is strongly involved in the control of voluntary eye movements. Although the gaze-following reaction allows an individual to share the same target with another, it can only partly explain joint attention. Other factors are required, in particular the capacity of the two individuals to share intentional communication.
The Mirror System in Humans and a Comparison between Monkeys and Humans
Electrophysiological and neuroimaging studies have demonstrated the existence of a mirror system in humans: when the actions of others are observed, a cortical network becomes active that is also active when the same motor acts are executed (Rizzolatti and Craighero 2004). This network is made up of two main nodes: the frontal node includes the premotor cortex and the posterior part of the inferior frontal gyrus (IFG); the caudal node is formed by the inferior parietal cortex and, in part, the superior parietal cortex (Figure 4.1b). On the basis of anatomical and other functional considerations, the ventral premotor/IFG and inferior parietal areas belonging to this system can be considered homologous to area F5 and area PFG of the monkey, respectively. Finally, in agreement with the findings on STS in the monkey, a sector of STS is activated when motor acts are observed, but not during the execution of the same motor acts. When comparing monkey and human data, one must be cautious because it is difficult to record from single neurons in humans. This is only possible in neurosurgical patients, and only for a very limited time (for a single neuron study on mirror properties, see Mukamel et al. 2010). Furthermore, it is important to note that while the most commonly employed neuroimaging technique, fMRI, basically reveals presynaptic activity, single neuron data express neuronal output (i.e., the outcome of presynaptic integration). Therefore, although the resolution of electrophysiological and neuroimaging techniques is progressively improving, these methods primarily support inferences at the population level.
Mirroring Intransitive Movements in Monkeys and Humans
Information about mirror responses to intransitive movements in monkeys is very restricted. Unlike mirror neurons responding to transitive motor acts, which have been found in both F5 and PFG, mirror neurons that respond to the observation of mimed motor acts have so far been recorded only in area F5 (Kraskov et al. 2009). Note, however, that the monkeys used for this study were previously trained to use a tool to catch food that was out of reach. Umiltà et al. (2008) have demonstrated that F5 neurons code grasping that is performed with tools, both during observation and execution, once the monkeys are trained to use those tools; this confirms the capacity of motor and mirror neurons to code the goal at an abstract level. Thus, it is possible that the mirror responses to mimed motor acts are due to a high abstraction capacity of the premotor cortex. To date, during observation of mimed motor acts, strong mirror responses have not been reported in the parietal area PFG, but it is possible that the sample did not include this type of neuron. This may be because mirror neurons are distributed more sparsely in PFG than in F5.
Although the proportion of neurons responding to intransitive motor acts or orofacial gestures in F5 is small (Ferrari et al. 2003 and previous section), one can hypothesize that from this area, neurons capable of coding goal-directed acts evolved to assign goal-directedness to pantomimes as well as to “ritualize” such acts, thus transforming the original meaning of their discharge (see Arbib 2005a).
In contrast to monkey studies, human data have revealed that cortical activation can be elicited through the observation of meaningful as well as meaningless movements. For example, transcranial magnetic stimulation (TMS) studies show that the observation of meaningless movements elicits a resonance in the motor cortex, and that this activation corresponds somatotopically to that of the effector performing the observed movements (Fadiga et al. 1995). In neuroimaging studies, observation of meaningless movements appears to activate a dorsal premotor-parietal circuit that is different from the circuit activated by observing goal-directed motor acts (Grèzes et al. 1998). In contrast, observation of pantomimed motor acts activates the same premotor and IFG regions as those activated by the same act when directed to a target (Buccino et al. 2001; Grèzes et al. 1998; for more on the importance of pantomime in human evolution, see Arbib and Iriki, this volume, and Arbib 2005a). As far as the inferior parietal cortex is concerned, miming of functional motor acts activates the same sectors of the parietal cortex that are active during the observation of goal-related motor acts. In particular, the observation of symbolic gestures appears to activate both the ventral premotor and the inferior parietal cortex; however, the latter involves more posterior sectors than those activated by the observation of goal-directed motor acts (Lui et al. 2008). These reports indicate that the observation of intransitive movements may partly activate the same regions that belong to the classical “mirror system”; however, different regions of frontal and parietal cortices may be involved, depending on whether these movements are meaningless or meaningful.
For intransitive gestures, we need to consider those belonging to sign language. Neuroimaging studies have shown that the basic linguistic circuit is activated by the production and comprehension of sign language (see MacSweeney et al. 2008). In two single-case neuropsychological studies in deaf signers (Corina et al. 1992; Marshall et al. 2004), a dissociation was found between a clear impairment in the use of linguistic signs and a preserved use of pantomime or nonlinguistic signs. Although in normally hearing nonsigners there is overlap between some of the cortical sectors activated by observation of pantomime and those involved in language, it may well be possible that, in deaf signers, a cortical reorganization could have created clearly separate circuits for signing and pantomiming, due to the need to use forelimb gestures for linguistic communication.
Mirroring Implies the Retrieval of “First-Person” Knowledge
A fundamental aspect of the mirror mechanism is that the resonance of the motor system, during observation of motor acts, normally occurs when the observed actions are already present in the observer’s motor repertoire. That is, the observer has “first-person” knowledge of observed acts. Motor acts which do not belong to the observer’s motor repertoire should thus not activate the motor system. Precisely this has been observed in an fMRI study by Buccino et al. (2004a). Here, participants were presented with video clips of mouth gestures performed by a man, a monkey, and a dog. Two types of gestures were shown: the act of biting a piece of food and oral silent communicative gestures (e.g., speech reading, lip smacking, barking). Results showed that ingestive gestures performed by an individual of another species (e.g., a dog or a monkey) activate the human mirror system. For communicative gestures, those performed by a human activated the mirror system (particularly the IFG); those performed by non-conspecifics only weakly activated it (monkey gesture) or did not activate it at all (silent barking). In the case of silent barking, only higher-order visual areas showed signal increase. These findings indicate that while all observed acts activate higher-order visual areas, only those that are known motorically by an individual—either because the acts were learned or already part of the innate motor repertoire—enter their motor network. Visual areas, such as STS, only provide the observer with a visual description of the observed act.
In summary, Buccino et al.’s study suggests that what we call “understanding” can be related to different mechanisms. The kind of automatic understanding of actions that I have described until now is based on a pragmatic, first-person knowledge. Other types of understanding, including that which results as an outcome of inferential reasoning (discussed below), are based on mechanisms that may allow discrimination between different behaviors, but which are not related to motor experience.
Intention Coding and Emotion Understanding
Actions are organized as sequences of motor acts and are aimed at an ultimate goal (e.g., eating). Motor acts that make up an action sequence, however, have subgoals (e.g., grasping an object). The ultimate goal of an action is thus achieved when fluently linked motor acts are executed (see Jeannerod 1988). The action’s final goal corresponds to the motor intention of the acting agent. Although there has been rich evidence for the existence of neurons coding the goal of motor acts, the neural organization underlying action goals has, until recently, been poorly investigated.
It has now been shown that monkey premotor and parietal motor neurons play an important role in coding the intention of others (Bonini et al. 2010; Fogassi et al. 2005). During the execution of grasping for different purposes, neuronal discharge varies depending on the final goal of the action in which the grasping is embedded (e.g., grasping a piece of food for eating purposes vs. grasping an object on which to place food). Interestingly, during the observation of a grasping motor act that is embedded in different actions, the visual discharge of mirror neurons is also modulated according to the action goal. These data suggest that when the context in which another’s actions unfold is unambiguous, mirror neurons may play an important role in decoding the intentions of others. A long-standing debate exists, however, as to the level and type of neural mechanisms involved in coding the intentions of others (see Gallese and Goldman 1998; Jacob and Jeannerod 2005; Saxe and Wexler 2005). The main theoretical positions are:
1. The so-called theory theory: ordinary people understand others’ intentions through an inferential process that allows the observer to link internal states with behavior, thus enabling the observer to forecast the mental state of the observed individual.
2. The simulation theory: individuals are able to decode intentions of others because, by observing their behavior, they can reproduce it internally, as if they were simulating that specific behavior.
Only the second theory implies a role of the motor system in this mental function; the first implies the involvement of other cognitive circuits. While many have argued for the exclusive truth of one theory or another, it is possible that both processes are involved under certain situations. For example, when observing someone struggling with a door, we may “automatically” assume that the immediate intention is to open it (simulation theory). If, however, the door is the front door of our house and we do not know this person, we might infer that he is planning to burglarize our house (theory theory), since burglary is not part of our own action repertoire (Arbib 2012). The data described here on mirror neurons provide evidence for neural mechanisms and support the simulation theory part of this scenario; that is, showing how actions of others may be internally reproduced by the activity of the observer’s motor system which codes his own intentions. In other words, the parieto-premotor mirror neuron circuit may involve a primitive form of intention understanding that occurs automatically without any type of inference process. This function requires both the activation of mirror neurons specific to a given motor act as well as contextual and mnemonic information. Note, however, that before action execution, contextual and mnemonic information per se is not enough to elicit the differential discharge of mirror neurons. In fact, this discharge is present only when the observed agent executes a specific motor act capable of eliciting a mirror neuron response.
A similar basic process of intention understanding has been shown in human subjects as they observe motor acts performed within different contexts (Iacoboni et al. 2005). In this study, participants were presented with different video clips showing:
• Two different contexts: one shows an array of objects arranged as if a person is just about to have tea; the other shows the same objects arranged as if a person has just had tea.
• Two ways of grasping a cup (motor act) without context (empty background).
• Two ways of grasping a cup but this time within one of the two contexts (intention condition).
Iacoboni et al.’s hypothesis was that if the mirror system is modulated by the observed motor act as well as by the global action in which this act is embedded, the presentation of the same act within a cueing context should produce a higher activation than when the context is absent and, possibly, a different activation when viewing the grasping actions in the two contexts. Independent of whether participants were simply asked to observe the video clips or to observe them to figure out the intention underlying the grasping act within the context, results showed that the “intention condition” selectively activated the right IFG, when compared with the other two conditions. This suggests that decoding motor intention in humans can occur automatically, without the need for inferential reasoning. This automatic form of intention understanding occurs frequently in daily life; however, as mentioned above, there are situations in which a reasoning process is necessary to comprehend the final goal of the observed behavior of another individual. This is typical of ambiguous situations. For example, in an fMRI study by Brass et al. (2007), participants had to observe unusual actions performed in plausible and implausible situations. Here, activation resulting from the contrast between implausible and plausible situations occurred in the STS and, less reliably, in the anterior frontomedian cortex. These two regions, together with the cortex of the temporoparietal junction, are considered to belong to a “mentalizing” system involved in inferential processes, based on pure visual elaboration of the stimuli. Thus, one can hypothesize that during observation of intentional actions, when the task specifically requires inferences to understand the agent’s intentions, an additional network of cortical regions, besides the mirror system, may activate (Figure 4.3).
The discharge of mirror neurons appears to represent others’ actions and intentions through a matching process that occurs at the single neuron level. An observer can recognize another’s action because the same neural substrate is activated, regardless of whether the observer thinks of performing (or actually performs) an action or sees another person perform this same action. Activation of the mirror system alone, however, does not allow the observer to discriminate between his own and the other’s action. One possible basis for this discrimination is a difference in the intensity or timing of the visual and motor responses of mirror neurons, which would provide the observer with information on the sense of agency. To date, however, this has not been demonstrated. There are sensory cues, however, that tell the observer who is acting. For example, proprioceptive and tactile signals are only at work when the observer is acting, whereas non-egocentric visual information reaches the observer only during observation. In joint actions, the observer’s motor representations are activated by his own as well as the other’s actions. Thus, one can predict that the visual discharge of mirror neurons of the observer could come first, activating the corresponding motor representation; thereafter, a motor activation related to his behavioral reaction should follow, accompanied by proprioceptive, tactile, and egocentric visual feedback. This process is not as simple as it appears. Usually, humans internally anticipate the consequences of motor acts so that the two neuronal populations (mirror neurons and purely motor neurons involved in the motor reactions) could be active almost at the same time. Furthermore, when considering the capacity of mirror neurons to predict the intentions of others, we need to account for the comparison that takes place between the predicted consequences of another’s behavior and the actual performance.
These processes have not been thoroughly investigated at the single neuron level. Interestingly, a recent study, which used a multidimensional recording technique, found that both premotor and parietal cortex neuronal populations can show some distinction of self action from that of the other (Fujii et al. 2008).
In humans, studies that used different paradigms (e.g., reciprocal imitation, taking leadership in action, or evaluation of observed actions performed by themselves or others) came to the conclusion that, beyond a shared neural region (mirror system), other cortical regions are selectively activated for the sense of agency. Among these, the inferior parietal cortex, the temporoparietal junction, and the prefrontal cortex seem to be crucial structures for distinguishing between own versus others’ action representations (Decety and Chaminade 2003; Decety and Grèzes 2006).
Mirror System and Emotions
The mirror-matching mechanism appears to be most suitable in explaining the basic human capacity for understanding the emotions of others. In his fascinating book, The Expression of the Emotions in Man and Animals, Darwin (1872) described, in a vivid and detailed way, the primary emotional reactions and observed how similar they are among different species as well as, in the human species, among very different cultures.
Emotions are crucial for our behavioral responses. They are controlled by specific brain structures belonging to the limbic system and involving many cortical and subcortical sectors, such as cingulate and prefrontal cortex, amygdala, hypothalamus, medial thalamic nucleus, and orbitofrontal cortex (Figure 4.4a; see also LeDoux 2000; Rolls 2005b; Koelsch this volume). Although emotions are very much linked to the autonomic nervous system, the skeletomotor system is also involved in their expression. A good example, emphasized by Darwin, is the association between facial expressions (and also some other body gestures) and specific emotions. This link is so strong in humans that we recognize immediately the type of emotion felt by another individual, just by viewing facial expressions. This highlights the importance of signals about others’ emotions for our own behavior. These signals are advantageous because they allow us to avoid danger, to achieve benefits, and to create interindividual bonds. Therefore, the mechanisms that underlie the understanding of others’ emotions, which is the core of empathy, constitute an important issue for research.
There are different theories about how we understand the feelings of others, and most are based on the decoding of facial expressions. One theory maintains that the understanding of others’ emotions occurs through inferential elaboration, based on emotion-related sensory information: a certain observed facial expression means happiness, another sadness, and so on. Another, very different theory holds that we can understand emotions because emotion-related sensory information is directly mapped onto neural structures that, when active, determine a similar emotional reaction in the observer. This theory implies a “mirroring” of the affective state of the other individual and involves a partial recruitment of the visceromotor output associated with specific facial expressions. In fact, the neural structures related to affective states are also responsible for visceromotor outputs (i.e., motor commands directed to visceral organs).
Results from several studies suggest that a neural mechanism similar to that used by the mirror system for action understanding is also involved in the understanding of emotions. For example, in subjects instructed to observe and imitate several types of emotional facial expressions, Carr et al. (2003) demonstrated that the IFG and the insular cortex were activated. Frontal activation would be expected on the basis of the motor resonance (discussed above). Regarding the insular cortex (see Figure 4.4b), this region is the target of fibers that convey information about an individual’s internal body state (Craig 2002), in addition to olfactory, taste, somatosensory, and visual inputs. With respect to the motor side, according to older data in humans (Penfield and Faulk 1955; Showers and Lauer 1961; for a recent demonstration, see Krolak-Salmon et al. 2003), it has been reported (Caruana et al. 2011) that (p.97)
The relevance of the insular cortex for the understanding of emotions has been further elucidated in various fMRI studies. One of these (Wicker et al. 2003) investigated areas activated by disgust. Here, participants had, in one condition, to smell pleasant or disgusting odorants and, in another, to observe video clips of actors smelling disgusting, pleasant, and neutral odorants and expressing the corresponding emotions. The most important result was that the same sectors of the anterior insula and (to a lesser degree) anterior cingulate cortex (see Figure 4.4c) were activated when a participant was exposed to disgusting odorants and when disgust was observed. No such overlap was found, however, in the insula for pleasant stimuli. Interestingly, clinical studies show that insular lesions produce deficits in recognizing disgust expressed by others (Adolphs et al. 2003; Calder et al. 2000). As far as the cingulate cortex is concerned, this phylogenetically old region is subdivided (a) along its rostrocaudal axis into a posterior, granular sector and an anterior, agranular sector and (b) along its dorso-ventral axis into four sectors, from that adjacent to the corpus callosum to the paracingulate gyrus. Whereas the ventral part (area 24) is more endowed with motor functions, the anterior dorsal part is activated by painful (p.98) stimuli; it contributes to the processing of aversive olfactory and gustatory stimuli. Hutchison et al. (1999) reported that, during a neurosurgical operation, a single neuron in this cortical sector responded both when the patient received a painful stimulus to his finger and when he saw the surgeon receive the same stimulus.
An fMRI study by Singer et al. (2004) investigated empathy for pain: Participants were couples. Female partners were scanned while their male partners stood just outside the scanner, with only their hands visible to the female partner. Two different cues informed the female partner whether she was going to receive a light painful stimulus (“self” condition) or whether her partner was going to receive it (“other” condition). Among the areas activated in the two conditions, the anterior insula and anterior cingulate cortex (Figure 4.4b and 4.4c) showed overlapping activations. The empathic scales constructed with the subjects’ responses to specific questionnaires revealed that there was a significant positive correlation between the degree of empathy and intensity of activation in these cortical regions. In a similar, subsequent study (Singer et al. 2006), both male and female participants were scanned while they observed another individual receiving a painful stimulus, similar to the experiment described above. The observed individual was one of two actors who, before scanning, had played a game with subjects (one acted as a fair player, the other as unfair). For female participants, the results of this study replicated the previous study. For male participants, however, insula activation was present only when a fair player was observed to feel pain. When an unfair player was observed to experience pain, the nucleus accumbens, a reward-related area, was activated. This activation correlated with a desire for retaliation, as assessed in a post-scanning interview.
Altogether, these findings suggest that humans understand disgust (and most likely other emotions) through a direct mapping mechanism that recruits a first-person neural representation of the same feelings. These feelings are normally associated with precise visceromotor reactions. Intensity of activation can vary, however, depending on how the observed emotion is embodied. It can also be modulated by other cognitive factors, thus involving the activation of other areas not directly related to the observed emotion. Thus, it can be concluded that a mirror system, different from the one activated by actions but working in a similar way, may come into play during the understanding of others’ emotions. This kind of mechanism is very likely a necessary prerequisite for establishing empathic relations with others, but per se is not sufficient. To share an emotional state with another does not always elicit the same reactions. If, for instance, we see a person expressing pain, this does not mean that we automatically feel compassion. Compassion may occur with higher probability if the person in pain is a friend or a relative. However, it is much less likely to happen if the other person is not known to us or is an enemy, as demonstrated by Singer et al. (2006).
Many anatomical and functional data indicate a possible homology between human Broca’s area (a crucial component of the language system), or part of it, and the ventral premotor area F5, although there is debate over which sector of F5 is the true homologue of Broca’s area (Petrides et al. 2005; Rizzolatti and Arbib 1998). Coupled with these findings, the property of the motor system to code goals and the presence of mirror neurons in it prompted the idea that these basic functions could be good candidates for explaining how dyadic communication, and then language, evolved. However, whether this evolution was originally grounded in gesture (Rizzolatti and Arbib 1998) or vocalization (Fogassi and Ferrari 2004, 2007) is a matter of debate (for a discussion on gesture, see Arbib and Iriki, this volume).
In terms of vocalization, consider the following: First, as reported above, monkey area F5 contains a category of mirror neurons that are specifically activated when motor acts performed with the mouth are observed or executed; some of these neurons respond to communicative monkey gestures (Ferrari et al. 2003). Second, a subset of F5 motor neurons has recently been reported to activate during the production of trained calls (Coudé et al. 2011). For many years, call production was considered an attribute of emotionally related medial cortical areas. However, the data of Coudé et al. suggest that PMV (already endowed with a neural machinery for the control of voluntary hand and mouth actions) could also be involved in the voluntary control of the combination of laryngeal and articulatory movements to produce vocalizations that are not simply spontaneously driven by a stimulus. These findings, together with the known capacity of PMV to control laryngeal movements (Hast et al. 1974; Simonyan and Jürgens 2003), suggest that this cortical sector could constitute a prototype for primate voluntary vocal communication. It has still to be established whether this neuronal activity related to call production is paralleled by the presence of mirror activity for the perception of these same calls. While mirror neurons related to orofacial communicative gestures are already present at the phylogenetic monkey level, neurons endowed with the property to produce and perceive vocalizations may have appeared later in evolution.
Interestingly, as reported in chimpanzees, vocalizations can be combined with meaningful gestures (Leavens et al. 2004a). This suggests the possibility that the “gestural” and “vocal” theories of language evolution could, at some level, converge. In fact, in the hominin lineage, the frequency of combined vocalizations and gestures could have increased and, as the orofacial articulatory apparatus became more sophisticated, this apparatus may have achieved a leading role in communication. Note, however, that the link between spoken language and gestures is still present in our species and that a reciprocal influence between these modalities has been clearly demonstrated (McNeill 1992; Gentilucci et al. 2004b; Gentilucci and Corballis 2006).
(p.100) Unlike for the frontal language area, there is presently no clear evidence for a possible homologue of Wernicke’s area (for some possibilities, see Arbib and Bota 2003). In addition to the anatomical location of Broca’s area (which is similar to that of F5 in the monkey precentral cortex), fMRI experiments show that this cortical sector becomes active when subjects perform complex finger movements and during imitation of hand motor acts (Binkofski et al. 1999; Buccino et al. 2004b; Iacoboni et al. 1999). This indicates that, beyond controlling mouth motor acts, this area also contains a hand motor representation. Other fMRI studies show that Broca’s area activates during the observation of hand and mouth motor acts (Buccino et al. 2001) or listening to the noise produced by some of these acts (Gazzola et al. 2006). This is reminiscent of the presence of audiovisual mirror neurons in macaque F5, which are activated by sound produced by the observed motor act (Kohler et al. 2002). Because of all these activations, Broca’s area has been included in the frontal node of the mirror system. In addition, Broca’s area is known to activate while listening to words (Price et al. 1996).
This evidence leads to the hypothesis that if a mirror mechanism is involved in speech perception, the motor system should “resonate” during listening to verbal material. This is exactly the finding of a TMS study (Fadiga et al. 2002) in which motor evoked potentials (MEPs) were recorded from the tongue muscles of normal volunteers who were instructed to listen to acoustically presented words, pseudo-words, and bitonal sounds. In the middle of words and pseudo-words there was either a double “f” or a double “r”: “f” is a labiodental fricative consonant that, when pronounced, requires virtually no tongue movements; “r” is a linguo-palatal fricative consonant that, in contrast, requires marked involvement of the tongue muscles to be pronounced. TMS pulses were given to the left motor cortex of the participants during stimulus presentation, exactly at the time in which the double consonant was produced by the speaker. The results show that listening to words and pseudo-words containing the double “r” determined a significant increase in the amplitude of MEPs recorded from the tongue muscles with respect to listening to words and pseudo-words containing the double “f” and to bitonal sounds. Furthermore, activation during word listening was higher than during listening to pseudo-words. This strongly suggests that phonology and, perhaps, (partly) semantics are processed by the motor system. These findings appear to be in line with Liberman’s motor theory of speech perception (Liberman and Mattingly 1985), which maintains that our capacity to perceive speech sounds is based on shared representations of speech motor invariants between the sender and the receiver. While it is still being debated whether semantic attributes of words are understood directly through this mechanism or whether this mirror activity is not necessary for this function, the possibility that a mirror-matching mechanism is fundamental (p.101) for phonological perception is quite compelling.
Other theoretical approaches contrast with this view (see, e.g., Lotto et al. 2009). Among the arguments, there are data in Broca’s aphasics which show that these patients can be as good as normal in word comprehension or, although impaired in speech discrimination, can be good in speech recognition. Furthermore, there are lesions involving the left frontoparietal system that leave speech recognition intact.
Interestingly, a strict link between language and the motor system has been provided by studies in which subjects had to listen to sentences containing action verbs—I grasp a glass or I kick a ball—contrasted with abstract sentences—I love justice (Tettamanti et al. 2005). The subtraction between action-related sentences and abstract sentences produced a somatotopic activation of premotor areas, in sectors corresponding to the activation of the effectors involved in the listened sentence. Thus, verbal material related to action verbs activates not only Broca’s area but also the corresponding motor representations in the whole premotor cortex. It is worth noting that words related to nonhuman actions do not elicit activation of the motor cortex (Pulvermüller and Fadiga 2009).
In addition to phonology and semantics, the other fundamental property of human language is syntax (Hagoort and Poeppel, this volume). Lesion studies demonstrate that Broca’s area and the left perisylvian cortex play an important role in grammar processing (Caplan et al. 1996). Patients with lesions to the inferior frontal cortex also have difficulty in ordering pictures into well-known sequences of actions (Fazio et al. 2009). Neuroimaging investigations show that Broca’s region (BA 44 or pars opercularis and BA 45 or pars triangularis) in the inferior frontal cortex and Wernicke’s region in the superior temporal cortex are more strongly active in response to complex sentences than to simple control sentences. Furthermore, these areas are active when listening to hierarchically nested sequences (Bahlmann et al. 2008). Sequential organization is typical of actions, and various aspects of it are coded by several cortical regions (Tanji 2001; Tanji and Hoshi 2008; Bonini et al. 2010; Bonini et al. 2011; also discussed below). It is not clear whether and how the structure of sequential motor organization could have been exploited for linguistic construction. One hypothesis holds that during evolution, the more the motor system became capable of flexibly combining motor acts to generate a greater number of actions, the more it approximated a linguistic-like syntactic system. Such a capability could have extended to the combination of larynx and mouth movements in phono-articulatory gestures for communicative purposes (Fogassi and Ferrari 2007).
Mirroring in Dance and Music
For the mirror-matching mechanism to occur, motor representations (either of acts or sounds) must be part of the observer’s internal motor repertoire. This (p.102) repertoire allows us to understand many goal-directed actions and meaningful gestures. Dance gestures constitute one type that is easily recognized. Although many of the observed motor synergies performed by an expert dancer can be represented in the observer’s motor system, the steps specific to a given type of dance are unknown to naïve observers. Thus, motor resonance should be higher when an observer is also an expert dancer. In fact, neuroimaging studies have demonstrated that during the observation of dance steps, activation of the observer’s motor system is higher when the observed steps belong to the observer’s own motor experience (Calvo-Merino et al. 2005). Furthermore, the observation and imagination of rehearsed dance steps produce higher motor activation than steps that are not rehearsed (Cross et al. 2006). Activation of the mirror system is thus greater when the observing subject has more motor experience in the observed motor skills. This result is important because it demonstrates a mirror activation not only for actions, but also for meaningful gestures (e.g., dance steps), which are probably derived from goal-directed actions through a process of ritualization (Arbib 2005a).
The issue of the relation between dance recognition and the structure of the motor system underlying this recognition involves not only the capacity to “resonate” during observation of single gestures but also the capacity to recognize sequences of gestures. This issue has been partly addressed above for the understanding of intentional actions and language. How, though, does the motor system build and, as a consequence, recognize this sequence? A similar question is also valid for dance. It is very likely that we recognize the single gestures that form dance steps using our mirror system; however, a process is then needed to allow the appropriate sequence to be internally reconstructed. Studies on imitation learning of playing instruments may further our understanding of the neural circuitry involved in this process. The basic neural organization for encoding and programming motor acts or meaningful gestures is represented by the parietofrontal circuits, and this organization comes into play both during execution and observation. However, when individuals are required, for instance, to imitate a guitar or a piano chord, neuroimaging studies indicate that there is also a strong involvement of prefrontal cortex (Buccino et al. 2004b). Prefrontal cortex has been classically considered to have a major role in action planning. When it is necessary to build sequences of motor acts, however, its role becomes crucial (Figure 4.5).
Music is another function that may imply mirroring. Although music cannot be strictly considered as a kind of dyadic communication, it can involve a sender and a receiver of a message. Music can, of course, be experienced alone. However, music is often shared with other people. When we attend a concert, for example, both our auditory and visual systems become deeply engaged. The sensory inputs elaborated inside our cortex activate several higher-order neural structures: high-order sensory and motor areas as well as the emotional system. Because of the presence of mirror neurons, the motor system comes into play as we observe the motor acts of musicians as they perform on (p.103)
A specific case of production or perception in music is represented by musicians themselves, who must synchronize many elements to produce a perfect musical piece. As discussed for joint actions, to play music with others we require:
• a neural mechanism that permits the same motor representations (mirror mechanism) to be shared with co-players,
• a mechanism(s) to distinguish our own actions from those of the others, and
• a mechanism that allows coordination (see also Levinson, this volume).
Several studies have demonstrated a recruitment of the motor system while listening to sounds. For example, participants listening to both familiar and unfamiliar sounds show (when activation from the latter is subtracted from the former) an activation of the superior temporal cortex as well as the supplementary motor area and IFG (Peretz et al. 2009). Listening to and reproducing isochronous or complex rhythms activate several areas of the premotor cortex (pre-SMA, SMA proper, premotor cortex) and subcortical structures (basal ganglia, cerebellum) (Chen et al. 2009). The role of PMD seems to be more related to the use of the metric structure of sound for movement organization, (p.104) whereas PMV may transform known melodies into movement. This has also been described in an area between PMD and PMV, recruited during passive listening.
It is common knowledge that music can evoke several different emotions (for a review, see Juslin and Västfjäll 2008b; see also Scherer and Koelsch, both this volume). Sound intensity, fluctuation of timing, and timbre, for example, may elicit emotions such as happiness, sadness, and fear. The issue here is whether music can evoke emotional mirroring.
Gestures of players may elicit a kind of mirroring. One would expect this effect to be stronger in accomplished players. This mirroring, in turn, may evoke emotional feeling in a listener that draws upon the listener’s own past experiences of playing. Postural gestures of the player during performance emphasize the musical message and may elicit a contagion in the observer/listener. Molnar-Szakacs and Overy (2006) have proposed that while listening to expressive music, the mirror system for actions will be activated and relayed to the limbic system through the insular cortex, which, as described above, is a crucial structure for the representation of subjective states. Other authors draw a parallel between acoustic features of music and emotional speech, suggesting that in the listener there is a mechanism that mimics the perceived emotion internally, in response to specific stimulus features (Juslin 2001). In line with this theory, Koelsch et al. (2006) found that listening to music activates brain areas related to a circuitry that serves the formation of premotor representations for vocal sound production. They conclude that this could reflect a mirror function mechanism, by which listeners may mimic the emotional expression of the music internally. All of these considerations suggest that listening to music and observing performers produce both an activation of the motor system and an emotional mirroring, involving the insula and areas of the limbic system, such as cingulate cortex and amygdala.
Another important aspect of music that is strongly related to the motor system is represented by song. In animal studies, song production has been investigated in birds (see Fitch and Jarvis, this volume), although birdsong is different from a human song with words. The presence of mirror neurons activated by both the production of and listening to the species-specific song was recently demonstrated in a telencephalic nucleus of a swamp sparrow (Prather et al. 2008). These mirror neurons, however, do not appear to code the goal of a motor act, as in phonological resonance; instead, they map sequences of heard notes onto the brain motor invariants used to produce them. Interestingly, disrupting auditory feedback does not alter this singing-related neuronal activity, which clearly indicates its motor nature. The fact that these neurons innervate striatal structures that are important for song learning suggests their possible role in vocal acquisition.
The presence of mirror neurons in song suggests that resonance mechanisms are probably parsimonious solutions which have evolved in the vertebrate brain several times to process complex biological information. Indeed, (p.105) Fitch and Jarvis (this volume) view birdsong and human vocal learning as convergently evolved systems.
Another interesting aspect of birdsong is that it, like speech, is sequentially organized. Since sequential organization is typical of actions, speech, and music, it would be interesting to elucidate better the neural mechanisms that underlie both production and recognition of motifs in singing birds. In particular, it would be helpful to understand whether the structure of birdsong is more comparable to the syntax or phonology of language (Yip 2006).
Action Sequence Organization and Language Structure: A Comparison
Sequential organization is a typical feature of motor actions. Motor acts endowed with a meaning (the motor goal) form the basis for action (Jeannerod 1988; Bernstein 1996; Rosenbaum et al. 2007). A similarity of this organization with that typical of language can be suggested at two different, but not mutually exclusive, levels. At the first level, motor acts seem to play the role of the words within a phrase. Like motor acts, words are the first minimal element of language endowed with a meaning. The meaning of a sentence is given by the sequential organization of words. By changing the position of the words, the meaning of the sentence changes or is lost. Similarly, if the order of the motor acts in an action sequence is changed, the action goal may also change.
At a second, more motoric, level, syllable production is the result of the execution of orofacial motor acts in combination with the appropriate contraction of larynx muscles. The higher-level neural control, underlying this combination, intervenes to organize a fluent link between syllables, exactly as occurs in the neural control of forelimb or mouth actions. The apparent difference between syllables and motor acts is that the former are devoid of meaning. However, one could always argue that the achievement of the specific configuration necessary to pronounce a syllable represents a goal per se.
While we do not know the neural mechanism that underlies the organization of syllables into words, and of words into phrases, the issue of sequential organization in the monkey motor system at the single neuron level has been addressed in two main series of studies. The first assessed the responses of neurons in mesial (SMA/F3, pre-SMA/F6) and prefrontal cortex, while monkeys executed sequences of movements (Tanji 2001; Tanji and Hoshi 2008), such as turning, pulling, pushing, or traversing trajectories in a maze. These studies show that the neurons of these cortical sectors were activated for different aspects of the task. Some categories of neurons code the type of sequence; others denote the order of a movement within a sequence; still others code the final location of the movement series. The second series of studies (Fogassi et al. 2005; Bonini et al. 2010; Bonini et al. 2011) assessed the responses of parietal and premotor neurons during execution and observation of natural action (p.106) sequences. Results show that motor and mirror neurons from these areas can be differently activated depending on the specific action sequence in which the motor act they code is embedded.
Together, these two series of studies suggest that sequential actions are organized under the control of the premotor-parietal motor system and prefrontal cortex. Part of this neural substrate has a major role in linking motor acts for the achievement of an action goal, while part is involved in coding the order in which various motor elements can appear in a sequence. The mechanisms used by these cortical circuits could have been exploited during the evolution of the organization for syntactic structure. Note that the above described activation of motor structures during language and music production and perception is a good, although indirect, confirmation of this hypothesis.
Relation between Aphasia, Amusia, and Apraxia
Many human neuroimaging studies report an activation of Broca’s area and its right homologue during the processing of syntax in both language and music (Chen et al. 2008a; Maess et al. 2001; Patel 2003). Patients with damage to Broca’s area can show both aphasia and amusia (Alajouanine 1948; Patel 2005). Interestingly, many Broca’s patients can also show limb apraxia, although aphasia and apraxia are not necessarily associated (De Renzi 1989). A recent study showed that such patients, unlike another group of apraxic patients with parietal damage, present a deficit in gesture recognition (Pazzaglia et al. 2008). The presence of three possible syndromes with a lesion to the same region raises the question of whether there is a shared mechanism or structure (see Fadiga et al. 2009). A possible sharing of similar circuits between language and music is also suggested by some rehabilitation studies. It has been reported that dyslexic children and nonfluent aphasic patients can benefit from music therapy, as demonstrated not only by an improvement in behavioral scores but also by a change in the white matter of corticocortical connections between superior temporal cortex and premotor cortex/inferior frontal gyrus, in particular the arcuate fasciculus (Schlaug et al. 2009).
In conclusion, it is possible that language and music may partially share neural circuits, due to a possible common motor substrate and organization. The fact that cortical regions included in the mirror system for actions are also activated during the processing of language and music-related gestures lends further support to this hypothesis.
I thank D. Mallamo for his initial help in preparing the illustrations. This work has been supported by the Italian PRIN (2008), the Italian Institute of Technology (RTM), and the ESF Project CogSys.
(1) TMS is carried out by giving magnetic pulses through a coil located on the subject’s scalp, which produces an electrical field in the cortex underlying the coil, thus modifying the excitability of the neuronal population of this cortical sector. When applied to the motor cortex, pulse delivery at a given intensity elicits overt movements (motor evoked potentials, MEPs), so that it is possible to map, on the scalp, the motor representation of the activated effector. When used in research, the intensity of magnetic pulses is often kept at threshold level to study the enhancement produced by a specific task performed by the subject. For example, if a subject is asked to imagine raising his index finger, the contemporaneous TMS stimulation, at threshold, of the motor field involved in this movement will induce a MEP enhancement, whereas stimulation of the field involved in the opposite movement will determine a MEP decrease.