Evolution of Communicative Flexibility: Complexity, Creativity, and Adaptability in Human and Animal Communication

D. Kimbrough Oller and Ulrike Griebel

Print publication date: 2008

Print ISBN-13: 9780262151214

Published to MIT Press Scholarship Online: August 2013

DOI: 10.7551/mitpress/9780262151214.001.0001


Language and Niche Construction

10 Language and Niche Construction

Kim Sterelny

The MIT Press

Abstract and Keywords

This chapter considers the selectional effects of human niche creation on language evolution, effects that are linked with special characteristics of human social systems. It describes the evolution of a fully grammaticalized language and argues that quarantining the costs of deception was one of the factors driving grammaticalization. The chapter distinguishes between signaling systems and symbol-using systems, and shows that fundamental change in the organization of language itself—the grammaticalization of protolanguage—is in part driven by vetting issues. It suggests that the evolutionary transition from a basic protolanguage to full human language involved a multitude of changes to phonology, morphology, and syntax.

Keywords:   human niche, grammaticalized language, social systems, grammaticalization, signaling systems, symbol-using systems, protolanguage, phonology, morphology, syntax

Organism and Environment

In conventional evolutionary thought, selection is seen as shaping an evolving lineage to fit its environment, as a key is shaped to fit a lock. For example, the Australian interior is challenging for plant life. It is very dry. It is subject to fire and other disturbances. Soils are infertile. Plants respond. They evolve disturbance-tolerant (especially fire-tolerant) life cycles. They evolve physical adaptations to reduce water loss, and deep root systems to extract the moisture that is available. They are metabolically thrifty and guard their tissues with toxic chemicals. Eucalypts, banksias, and acacias fit an Australian world by adapting in response to the browning of Australia. The relationship between environment and lineage is asymmetrical: Environmental change has caused lineage change, but not vice versa.

However, though lineages sometimes adapt to an environment, sometimes they adapt their environment instead. Think not of banksias but termites. Termites live in massive structures that equilibrate temperature and humidity. They resist physical disturbance and exclude all but the most specialized of predators. Termites inhabit their own miniuniverses. There is a fit between termites and their world not because termites have unidirectionally adjusted to their world but because they have in part adjusted their world to fit them (Turner, 2000). As Richard Lewontin argued, organisms are active; they partially construct their own niches.¹ Richard Dawkins has developed a similar line of argument from the perspective of gene selection theory, showing that the adaptive effects of genes are often expressed outside the organism within which they are located (Dawkins, 1982a). This interactionist perspective on the relationship between lineage and environment is particularly apt when considering the evolution of language. Language is a human product, so it would be odd to think of its evolution in an externalist way, that is, to think of language as an autonomous feature of the environment to which humans have adapted. Moreover, we are niche constructors par excellence. We modify our physical and biological environment: The selective pressures on humans have altered radically as a consequence of the invention of tools, weapons, shelters, clothes, boats, and cooking. However, like other organisms, we also modify our informational environment. We are informavores, and anciently so: Our techniques for extracting resources from our environment, and protecting ourselves from its dangers, depend on access to, and use of, impressive amounts of information (Kaplan, Hill, et al., 2000). Thus, humans are active epistemic agents (informational engineers) as well as active physical agents (Clark, 1997). We intervene in the world to improve our access to information and to improve our ability to use information. Think of an act as simple and as normal as marking a trail while out in the bush. This intervention enormously simplifies coordination problems (making it easier for others to follow you) and navigation problems (making it easier to find your way back).

In this chapter I aim to show the relevance of niche construction to the evolution of language. The invention and elaboration of language is itself a crucial alteration of the informational world in which humans live. It is a crucial example of information engineering. Language makes social learning vastly more accurate and powerful, and so it enhances our capacity for the intergenerational transfer of skills and information and thus improves our access to physical and biological resources (Bickerton, 2005). But it also transforms the selective forces acting on social interactions. For instance, language enhances selection for cooperation by magnifying the importance of reputation. Once language has been invented, gossip becomes a powerful mechanism of reward and punishment (Dunbar, 1996; Wilson, Wilczynski, et al., 2000). The role of language in facilitating the transmission of information within and across generations, and in amplifying selection for communication, sets up a coevolutionary loop between language and the rest of culture: Each allows the other to become more elaborate. The elaboration of human cultural and social worlds intensified selection for languages of greater precision and expressive power. As languages of greater power and accuracy became available, that resource enabled human social worlds to become more elaborate.

One aspect of the coevolutionary elaboration of human culture and the cognitive tools that sustain it is cost management, and that will be the focus of this chapter. Humans manage their lifeways with the aid of cognitive technologies of many kinds. These include language, depictive representation, the use of templates, the cognitive division of labor, and the organization of work spaces (Clark, 2002). However, while these technologies enhance the cognitive power of human wetware, they have cognitive costs (Sterelny, 2004, 2006a). These can be contained by appropriate modifications of the tools and the environment in which these tools are used. I shall argue that niche construction is part of the mix of meta-adaptations through which we contain and manage the cognitive costs of using language.

I begin with the problem of honesty. Sharing information is a special case of cooperation, and it inherits the usual problems of explaining the evolution and stability of cooperative social lives. However, language-like systems are especially puzzling instances of cooperation. In the “Honest Signaling” section, I explain that prima facie problem and begin its solution, appealing both to population structure and to epistemic action. Even at early stages of language evolution, hominids (I argue) were preadapted to assess signal reliability, and those preadaptations enabled them to contain deception costs. Next, in the section “The Grammaticalization of Protolanguage,” I turn to the evolution of a fully grammaticalized language and argue that quarantining the costs of deception was one of the factors driving grammaticalization. Finally, in the “Symbols, Signals, and Information Load” section, I distinguish between signaling systems and symbol-using systems. In signaling systems, signal transmission covaries with a specific, ecologically salient feature of the environment (external or internal emotional). They are systems of natural signs. Learning the system as a whole may be cognitively challenging.² However, the basic signal-world relationship is covariation. Detecting covariation does not, in general, require special cognitive machinery: Associative mechanisms in rats, pigeons, and people detect covariation. But covariation is not the key to understanding the word-world relationship: Words do not covary in space and time with their targets, and hence symbol use does have special cognitive costs, costs that are contained, I shall argue, by the division of linguistic labor.

Honest Signaling

In their classic paper on the evolution of signaling, Krebs and Dawkins made vivid the evolutionary problem posed by honest signals (Krebs and Dawkins, 1984). In a competitive world, how could an agent be advantaged by honestly signaling to another about risks, opportunities, or resources? It is true that in one respect, deception is self-limiting. If deception were pervasive, other agents would cease to respond to linguistic signals, and there would be no point in sending them. The payoff for deceptive signaling is apt to decline as its proportion of total traffic increases. Therefore, we do not need any special explanation of the fact that so long as agents continue to talk, most talk will be honest. But why do they continue to talk? We need an explanation of the survival and elaboration of mostly honest signaling in the face of temptations to defect and free ride. Such an explanation cannot rely on mechanisms of surveillance and enforcement, since these presuppose a rich linguistic and social environment. They presuppose an environment in which norms can be expressed, taught, and enforced. Policing mechanisms cannot explain how a simple protolanguage used for honest signaling became established as part of our ancestors’ lives, for they presuppose something richer than such a protolanguage. Moreover, though the proportion of deceptive signals may be low, the potential cost to an individual of a lie can be very high. Lies can kill. Audiences cannot afford to ignore the threat of deception, even if it is rare.

Honest signaling is an instance of the general evolutionary problem of cooperation; cooperation, in turn, can be selected by synergy: Two agents can generate more return to each acting together than acting alone. Brian Skyrms’ The Stag Hunt is a game-theoretic exploration of such synergies. Two hunters acting together can capture a stag, whereas each hunting individually can take only a hare apiece. Cooperation is favored because a stag is worth more than twice a hare (Skyrms, 2003). The power of coordination builds a temptation to cooperate even for the prudent, selfish agent.
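The stag hunt payoffs can be made concrete with a minimal sketch. The numbers and the sharing rule below are illustrative assumptions, not values from Skyrms or from this chapter: a stag is caught only if both hunters pursue it and is then split evenly, while a hare is always caught alone. The functions `payoff` and `pure_equilibria` are hypothetical names introduced for the example.

```python
from itertools import product

def payoff(me, other, stag=5.0, hare=1.0):
    """Return my payoff in a two-hunter stag hunt.

    Assumed rules (illustrative only): a stag is taken only when both
    hunters pursue it and is split evenly; a hare is always taken alone.
    """
    if me == "hare":
        return hare
    return stag / 2 if other == "stag" else 0.0

def pure_equilibria(stag=5.0, hare=1.0):
    """List strategy pairs in which neither hunter gains by switching alone."""
    moves = ("stag", "hare")
    stable = []
    for a, b in product(moves, repeat=2):
        a_ok = all(payoff(a, b, stag, hare) >= payoff(alt, b, stag, hare)
                   for alt in moves)
        b_ok = all(payoff(b, a, stag, hare) >= payoff(alt, a, stag, hare)
                   for alt in moves)
        if a_ok and b_ok:
            stable.append((a, b))
    return stable

# A stag worth more than twice a hare: joint hunting is a stable outcome.
print(pure_equilibria(stag=5.0, hare=1.0))
# A stag worth less than twice a hare: only solitary hare hunting is stable.
print(pure_equilibria(stag=1.5, hare=1.0))
```

Under these assumptions, joint stag hunting is stable exactly when a half share of the stag is worth at least a hare, which is the condition stated in the text. Solitary hare hunting remains stable as well, which is why coordination, and not payoff alone, matters.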

A mere exchange of information may not generate significant synergies. If I tell you the location of a fruiting tree today, and you do the same for me tomorrow, no synergy has been generated. Indeed, given that it is always rational to discount future benefits (you might be killed before you have a chance to reciprocate), I am behind the game. However, agents talk in the course of coordinating and planning their actions, thus improving the synergistic payoffs of other joint activities. Moreover, cooperation is important in reducing variance. Even skilled foragers have unlucky days, so it can pay to share even if sharing does not increase your average return, because sharing insures against those days. In a world where signals from the environment are noisy and equivocal, communication might likewise have risk reduction functions. You judge that the river is safe to cross. But even for the experienced agent, such judgments are uncertain. Perhaps sampling the opinions of others can reduce this uncertainty. Christian List has explored this idea using the Condorcet jury theorem. Suppose each member in a group has a better than 0.5 but worse than 1.0 chance of determining whether this river is indeed safe to cross. As the number of agents in the group goes up, the chance of a correct majority vote rises rapidly. Information pooling increases the reliability of judgment in a noisy and uncertain world (List, 2004). Risk reduction might help underpin the evolution of cooperative signaling.
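The jury-theorem claim can be checked with a short calculation. The sketch below assumes the standard textbook setting: voters judge independently, each is correct with the same probability p, and the group size is odd so that ties cannot occur. These assumptions, the value p = 0.6, and the function name `majority_correct` are illustrative and are not taken from List (2004).

```python
from math import comb

def majority_correct(n, p):
    """Probability that a strict majority of n voters is right.

    Assumes independent voters, each right with probability p, and an
    odd n so that a tie is impossible.
    """
    if n % 2 == 0:
        raise ValueError("use an odd group size so that ties cannot occur")
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(need, n + 1))

# Reliability of the majority verdict as the group grows, with p = 0.6.
for n in (1, 3, 11, 51):
    print(n, round(majority_correct(n, 0.6), 3))
```

With individual reliability only modestly above chance, the majority verdict becomes steadily more reliable as the group grows. If judgments were correlated rather than independent, the gain would be smaller than this sketch suggests.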

The benefits of coordination, planning, and information pooling might explain why I should listen. But why should I signal? Despite the benefits of synergies, free riding is a potential barrier to the evolution of any type of cooperative social interaction. Moreover, there are specific barriers to the evolution of cooperative communication. Signaling creates a public good. Information leakage imposes a serious cost, for honest talk generates benefits from which third parties cannot be easily excluded. Eavesdropping imposes a tax on honest talk. This tax is likely to be high in the initial stages of language evolution. The more effortful we suppose protolinguistic communication to have been (and hence slower, repeated, and with redundancy) and the more dependent on pragmatic scaffolding for its interpretation (contextual cues, ostension, and the like), the harder it would be to exclude bystanders. Whispering conspiratorially in the dark requires advanced language skills, for it requires interpretation to be decoupled from context.³ Moreover, defection through failure to reciprocate is hard to detect. How will other agents discover that you knew yesterday of some resource or danger that they discover tomorrow? Active deception will sometimes be unmasked, but failures to signal are a more cryptic form of free riding.

In short, there does seem to be good reason to believe that honest communication generates synergies. It is a direct aid to coordination in cooperative economic activities: The stag hunters who talk and plan will probably do better than those that remain mute. And information pooling increases the reliability of individual judgment in informationally noisy environments. But those benefits can be eroded by free riders, so how can those costs be contained? Population structure helps block the invasion of deceptive strategies. Game theoretic models show that in a mixed population, in which all agents interact with all agents, free riding prevents the stabilization of honest, low-cost signaling systems. However, if the population is structured into small knots of agents who interact primarily with one another, cooperative strategies can be stable (Grassly, Haeseler, et al., 2000). This general idea comes in a number of different forms. Thus, Tecumseh Fitch has suggested that protolanguage begins within the family, between a mother and her offspring. Family members have overlapping evolutionary interests, so predominantly honest signaling within the family would be no surprise (Fitch, 2004). I have suggested a related idea, arguing that conflicts of interest within human groups are partially suppressed by selection for cooperation on groups. This allows cooperative signaling to evolve in tandem with other forms of cooperation (Sterelny, 2003).

These ideas both link honest signaling to the minimization of competition between signaling agents, but I prefer my form of this idea to that of Fitch. For one thing, I do not think Fitch has a plausible model of how language leaves the family. He relies on the benefits of reciprocation to amplify the circle of conversational exchange. However, there are well-known problems in scaling up reciprocation from two-player to many-player contexts. Withdrawing cooperation is a very blunt instrument for policing cooperation in n-player interactions, for it penalizes cooperators as much as free riders (Sripada, 2005). Moreover, there is a gap in Fitch’s story. His model seems to predict the evolution of protolanguage in chimpanzees, for they too are characterized by long-lasting associations between a mother and her offspring. Among common chimpanzees, a mother and her offspring often spend years foraging together, endlessly in one another’s company. Why, then, is there no chimpanzee protolanguage?⁴ The overall point, though, is that the costs of deception can be reduced by population structure. In part, population structure is independent of hominin agency: It is a response to the local geography and ecology. However, in other respects it is an effect of human agency: The extent to which the populations within a metapopulation differ from one another will depend on their customs of trade, intermarriage, and migration. Some patterns of group-group interaction will tend to damp down between-group differences; others will accentuate them.

The reduction of competition between conversational partners provides part of the explanation of the establishment of honest and cooperative signaling. Equally relevant were the preexisting capacities for epistemic action. Agents intervene in their physical environment, and they intervene in their informational environment too. Agents act epistemically to improve access to the information they need, engaging in both long-term informational engineering and short-term fixes. So consider, again, a leopard signal in the absence of a leopard. Will such signals tend to undermine the whole practice of signaling? Hardly: After all, no detection process is perfect. Leopard signals will never have been perfectly correlated with their adaptive targets. Protolanguage was not “born honest,” beginning with signals that perfectly covaried with their targets and then becoming less reliable only with the invention of deception. Hominins did not need to wait for The Fall to first hear “leopards” without leopards. We evolved in environments in which predators, competitors, and prey hid, camouflaged themselves, and disguised their intentions. They degraded the informational environment. In such environments, despite an agent’s best efforts, pickup of information can never be perfect, and the accuracy of information pickup constrains signal reliability.

Intelligent agents (and protolanguage-using hominins were intelligent) were likely to be aware of imperfect correlations between signal and target and to have established strategies to limit the costs of such failures. An agent hearing “leopard” has options in addition to those of ignoring the call or engaging in leopard flight. The audience can become more alert. In addition, they can probe the environment: They can actively scan, move to a better vantage point, and suppress extraneous noise. They can monitor others, including nonhuman others: a region of bush might have gone suspiciously quiet. If protolanguage is sufficiently rich, they can interrogate the signaler, asking exactly what the signaler saw, how well, and how far away, perhaps while monitoring the state of the signaler. Responses to specific signals are not all-or-nothing, and audiences have tools to assess the reliability of specific signals. Those tools are needed because agents signal in difficult informational environments. Signal-target failures are bound to occur in such environments, and hence signal reliability assessment is needed independent of, and prior to, the threat of deception.

As we have seen, intelligent hominids are likely to have had capacities that preadapted them to the problem of deception. In assessing the reliability of signals, deception does not seem to be strikingly different from mere error. However, as we shall see in more detail in the “Symbols, Signals, and Information Load” section, language is not a signaling system. Much of the literature on the evolution of protolanguage treats it as a kind of super-vervetese: a system for communicating about the here and now. Signs are arbitrary, but they stand for objects in the signaler’s local environment. Perhaps protolanguage began as such a system, but language is independent of the local environment. We can talk about the elsewhere and the elsewhen and about the possible, the impossible, and the imaginary as well as the actual. Communication systems that are decoupled from the current environment pose new reliability-checking problems. Changing positions for a better look is not much help in reliability-testing reports about subjects displaced in time, let alone displacements from the actual.

Even so, audiences are not helpless in the face of this expansion of the expressive power of language. Dan Sperber has argued that metalinguistic capacities are adaptations that limit the dangers of deceptions that are beyond the scope of direct checking. Metarepresentation is a tracking and calibration device. Tagging the source of information is essential if we are to keep track of the trustworthy. Thus, it is important to be able to represent the fact that it is Dave who said that the caves were empty and the creek was full. Moreover, we have a folk logic: We can explicitly represent the truth and falsity of statements, the soundness or invalidity of inferences. These are folk tools. For the first 2,000 years of its history, the discipline of logic may have done little more than systematize and organize folk judgments of validity and invalidity. Folk logic is a toolkit for assessing the reliability of what we are told when we are in no position to check the truth of assertions directly. It gives us techniques of indirect assessment. It is possible to reason without representing reasoning; it is possible to believe truly without the concept of truth. However, we could not assess and interrogate the assertions of others without metarepresentational tools in our language. These are tools that enable audiences to vet signals; they are tools for epistemic action (Cosmides and Tooby, 2000; Sperber, 2000; Sterelny, 2006b).

In short, humans have adapted their linguistic tools and their social environment to reduce the costs of unreliable signals while exploiting the benefits of honest communication. Restricting the size of conversational circles, exploiting our capacities for epistemic action, and constructing a metalinguistic apparatus to vet signals and signalers all help in keeping communication mostly honest and fairly reliable. We have engineered language and the environment in which we use it in response to the challenges of assessing reliability. I shall further suggest that fundamental change in the organization of language itself—the grammaticalization of protolanguage—is in part driven by vetting issues. Grammaticalization allows agents to quarantine the problem of deception, and hence to reduce the costs of using language, by using automated, encapsulated mechanisms to process those aspects of language for which no threat of deception arises.

The Grammaticalization of Protolanguage

I have just suggested that there are structures in language itself that function to reduce the costs of deception: our metarepresentational apparatus. I think that an even more fundamental feature of language, its regularized syntax, exists in part to reduce costs of deception. The usual supposition is that the evolution of language involves a trajectory from a system without a regularized morphology and syntax to a phonologically, morphologically, and syntactically patterned system. In the earlier, protolanguage-like stages of this trajectory, interpreting conversation would have depended very heavily on pragmatics: on context, gesture, and background knowledge. At this stage, utterances are short, consisting of just a few word-like elements but without fixed order, syntactic elements, or indications of mood (Bickerton, 1990; Jackendoff, 1999). As hominin communication evolved, it acquired patterned ways of indicating number, tense, aspect, subject and object, and illocutionary force. Context became less important. How is this trajectory to be explained? The evolution of a generative phonology and lexicon is, I suspect, a response to the increasing unpredictability of human environments. But syntax, I shall argue, helps quarantine the costs of assessing others’ reliability. I will develop these ideas by contrasting them with an alternative.

One idea is that grammaticalization is driven by learning costs.⁵ Rather than supposing that the human mind is adapted to the structure of language (by being provided with information about that structure innately), Simon Kirby, in association with Henry Brighton and Kenny Smith, suggests that language is adapted to human psychology. In particular, they argue that the fundamental organizational feature of language (its compositionality) is a consequence of selection for learning (Smith et al., 2003; Brighton et al., 2005). Holistic communication systems are ones in which every signal must be learned independent of every other signal. They model the transitions from these to compositional systems, that is, systems in which the significance of a signal depends on the units from which the signal is constructed together with its structure. In many of their simulations there is no evolution away from holistic systems. However, when this transition occurs, it is driven (they argue) by a learning bottleneck, for any human language must be learned by the N + 1 generation from the sample provided by generation N. As a consequence, languages are under cultural selection for learnability, since the N + 1 generation must reconstruct language from this limited input. There is a sample size filter through which language passes, and they argue that this explains why languages are compositional and recursive.

The general point that languages are under selection for learnability, and that features of language may be explained by structures of the language-using community rather than the structure of individual minds, is surely well-taken. However, Kirby’s iterated learning model is misleadingly idealized. For one thing, protolanguage (if it is anything like the systems Bickerton or Jackendoff have in mind) is not a holistic system, for in holistic systems similarity in meaning is not correlated with similarity in signal. A holistic system maps signals onto the world, but knowledge of one signal-world correlation tells you nothing about the next. A signal meaning that “Bill is large” is no more likely to be similar to a signal meaning that “Bill is small” than one randomly chosen. No chunk about Bill appears in both signals. Distance in signal space is not correlated at all with distance in semantic space.

This holistic pattern does not characterize protolanguage, especially if iconic representation—utterances linked with gesture and mime—was important in the early stages of protolanguage. The difficulty of interpreting protolanguage is generated by ambiguity, not holism. Signal interpretation depends on context, so one signal type can have many meanings. And if word order is not fixed, different signal types can have the same meaning. However, though protolanguage utterances do not have regularized linguistic meaning, expressions (subparts of utterances) in protolanguage are portable. An expression that picks out Bill in one utterance can be reused to pick him out again in another. Similarity in signal space is correlated with similarity in semantic space, but the signal-world map is many-to-many. Even granted the inevitable simplifications of model building, the iterated learning model does not characterize the initial state insightfully.

Learnability doubtless plays some role, but compositional signal systems are mechanisms of phenotypic plasticity. They enable agents to produce novel signals in response to novel environments. Kirby, Brighton, and Smith model language change in a range of different environments. However, within each run of a model, the environment is fixed. I suspect that this idealization may explain the fact that in their models recursive organizations do not typically evolve from holistic systems. No system with a fixed menu of signals is adequate in changing or unpredictable environments. Yet human environments are variable: The array of artifacts, resources, dangers, individuals, and social relations important to an agent cannot be predicted in advance. These features of human environments change both within and across generations. Even in stable environments, the expressive power of such systems would be limiting. Agents with a fixed menu of distinct signals will have to leave times, dates, places, and perhaps even agents unspecified, thus throwing an enormous burden onto the pragmatics of interpretation. The problem of limited expressive power is yet more serious if the environment is changeable. A compositional system gives agents the ability to signal about many new states of affairs for free. If you have the capacity to signal that a tiger is at the lake, and to signal that a bear is in the cave, you get for free the capacity to signal that a bear is at the lake.
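The tiger and bear example can be spelled out in a small sketch. It treats each utterance as an (animal, place) pair, which is a deliberate simplification and not a claim about how protolanguage was structured; the names `holistic_repertoire` and `compositional_repertoire` are hypothetical and introduced only for illustration.

```python
from itertools import product

# Two utterances the agent can already produce, as (animal, place) pairs.
known = {("tiger", "lake"), ("bear", "cave")}

def holistic_repertoire(signals):
    """A fixed menu: only the signals already learned as unanalyzed wholes."""
    return set(signals)

def compositional_repertoire(signals):
    """Recombine the parts of known signals into every (animal, place) pair."""
    animals = {animal for animal, _ in signals}
    places = {place for _, place in signals}
    return set(product(animals, places))

# What the compositional agent can say that the holistic agent cannot.
novel = compositional_repertoire(known) - holistic_repertoire(known)
print(sorted(novel))
```

In this toy setting the compositional agent gains "bear at the lake" and "tiger in the cave" without any further learning. The gap widens as the inventory of parts grows, since the number of expressible combinations is the product of the part inventories while a fixed menu grows one signal at a time.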

In the light of these considerations of novelty, Jackendoff is right to emphasize the importance of compositional phonology, for it allows speakers to generate an indefinite number of well-defined word blanks that can be recruited as terms for novel phenomena (Jackendoff, 1999). Protolanguages, as Jackendoff and Bickerton conceive them, contrast with holistic systems in being open-ended. Open-ended systems are needed by agents whose communicative needs cannot be predicted in advance, because their world cannot be predicted in advance. And ancient humans lived in an increasingly unpredictable environment, both because they themselves became increasingly important drivers of change, and because the climate became increasingly unstable (Potts, 1996, 1998). Selection for phenotypic plasticity selects for a readily expandable system, perhaps especially for a generative, indefinitely extendable lexicon. A regular phonology is important as well, for it imposes discreteness on the menu of basic referential signals. Without compensating adaptations, as the size of the lexicon increases, so too will the rate of misunderstanding: There will be an error threshold that constrains the size of the lexicon (Grassly et al., 2000). If there is selection for an increasing lexicon, there will also be selection for mechanisms making signals more obviously distinct from one another.

But what of syntax? One possibility is that compositionality quarantines the honesty problem and sets up an adaptive division of labor between consciously supervised and automatic language-processing cognitive systems. Let me begin with a crucial distinction: that between understanding and acceptance. Conflicts of interest and, hence, the temptation to deceive arise over acceptance of the sincerity of the speaker, but not over understanding of the speaker. All parties to a conversation need to have the cognitive resources to ask themselves, “Why is he saying that to me?” We need to be able to identify the pragmatic point of conversation in wholly cooperative interchanges, so that honest uses of language fulfill their coordinating role. Even more obviously, we need to have tools for evaluating others’ purposes in conversation, to guard against unfriendly manipulation of our thoughts and deeds. Conflicts of interest make the assessment of another’s motives and reliability important. The independence of language from local context makes it difficult. Thus, assessing the reliability and point of an utterance cannot safely be left to an encapsulated, automatic mechanism. We need the capacity to reflectively evaluate the expressed views of others to reduce the error costs of learning from others while keeping its benefits. This assessment can depend on any information to which an agent has access. Unmasking an utterance as deceptive (or merely unreliable) can turn on an apparently recondite detail. Much detective fiction turns on this: the subtle seeding of the plot with a tiny unmasking detail, which leads the protagonist, and can lead the reader, to the unmasking inference.

There is less need for reflective evaluation in identifying the features of an utterance that are internal to language: its lexical and phrasal organization. These features of utterances involve no temptation to deceive. There is no deception problem with syntax: Whatever the ultimate motives of people’s talking (whether they are honest cooperators or defectors), it is in their interests to make the organization of their utterance (the lexical items in use, its clausal structure, its subject/predicate organization, and its tense/aspect organization) as transparent as possible to the target of their speech. Whatever your aims in speech, those aims will miscarry if you do not secure minimal understanding of what you say. Likewise, it is in the interests of conversational targets to understand what is said, whether or not they accept (p.225) what is said to them. The expressed views of others are data, whether or not those views are true and whether or not those views are sincere. From this perspective, it is no surprise that while we have a quite rich and detailed folk logic, we have a much less elaborate folk syntax or folk phonology. If dialects really track in-group and out-group distinctions (Nettle and Dunbar, 1997), it might be important to notice that another agent sounds foreign, but there is no special need to be able to explicitly represent the aspects of speech that make other agents sound unfamiliar. And while the folk are quite good at noticing dialectal differences, those without any formal training in linguistics are pretty hopeless at identifying the source of those differences.

The form of language is relevant to its persuasive force and to its memorability, so there has long been reason to develop an awareness of how we say what is said. Thus, we are not completely without the capacity to represent form. Even so, form and organization generate no temptation to deceive. All parties in a linguistic transaction share an interest in making it as easy as possible to identify the lexical items and the way they have been structured into a sentence. Thus, it is adaptive both to individual language users and their speech community to automatize phonology, morphology, and syntax. We should expect speakers and targets to coevolve in ways that reduce the burdens of language processing. Ruth Millikan has written of language as a system of natural telepathy. In the normal case, when a speaker expresses the thought that there is a tiger in the bottom wood, the belief there is a tiger in the bottom wood appears in the audience’s mind without that audience’s representing the informational channel through which that information has been pumped from the speaker’s mind to the target’s mind (Millikan, 1998).

Human life is not cooperative enough for natural telepathy to be a good general model of linguistic interaction. However, a cut-down version of Millikan’s idea is a good model: When a speaker asks a question, her audience takes her to have asked a question without the audience’s having to attend to the informational channel that carries that structural information. For phonology, morphology, and syntax, Millikan’s “natural telepathy” metaphor is apt. Language has been engineered to make these aspects of language transparent. They can then be subcontracted to automatic systems, saving the scarce resources of attention, metarepresentation, and top-down monitoring for domains where those resources are needed: for a decision on acceptance. Once conversational interaction was no longer limited to families, or to small groups of reliably cooperating and repeatedly interacting agents, the need for reliability assessment would increase. Yet this assessment is cognitively demanding. While we have these capacities, despite the importance of reliability assessment, we use them sparingly. The modularization of phonology and syntax might have been crucial to freeing the cognitive resources needed to engage in the attention-rich, informationally demanding, metalinguistic scrutiny of what is said to us.

(p.226) The evolutionary transition from a basic protolanguage to full human language involved a multitude of changes to phonology, morphology, and syntax. There is no reason to suppose that a single selective force was responsible for all these changes. Kirby and his colleagues are surely right at some grain of analysis: Any change to the system that made it more patterned and more regular would have eased some learning problems, as samples of linguistic experience become better guides to the whole. Compositionality and portability allow adaptive linguistic response to environmental uncertainty. However, I also think it likely that as language became more complex, there was selection to automatize the processing of those aspects of language that could, safely, be automatized. Protolanguage, without a well-defined syntax, lacks a clear distinction between what is said and what a speaker is up to in saying what was said. Redesigning the system—regularizing it—introduces this distinction and thus allows much language processing to be automatized. Automatizing language will free attentional and memory resources, and those resources are always scarce. Freeing them will have many advantages. One, I have argued, is more effective counterdeception strategy.

Symbols, Signals, and Information Load

Formal models of the evolution of language often conceive language as a system of arbitrary, low-cost, referential signs. This is a reasonable first approximation, but it misrepresents the cognitive challenge of acquiring a lexicon. Referential signaling systems among animals are poor models of word use in human protolanguages, even early protolanguages. Referential calls are semantically very different from lexical items. The vervetese leopard call is not well-translated as “leopard.” The leopard call signals a state of affairs: Roughly “there is a leopard nearby.” Such signals are natural signs of the phenomena they indicate. Many shorebirds and waterbirds feed together in rather exposed places, and in rather dense flocks, and they use the alarm and escape behavior of other birds as a signal to take wing. They recruit the ecological activities of other birds as signals. There is a rough but useful covariation between the alarm flights of (say) pied stilts and danger, and many birds on an estuary will exploit that natural connection between one very easily observed phenomenon (the cloud of stilts panicking into the air) and another, much less easily observed one (a stalking feral cat; Danchin et al., 2004). Likewise, vervets treat the leopard calls of other vervets as a natural sign of leopard presence. Natural signs are just covariations between one kind of event and another. Learning to use others’ calls as natural signs is not especially challenging. Associative mechanisms can detect and exploit such contingent connections.

Words are not natural signs of their referents. In conversations we talk of the benefits for which we hope, and the potential dangers we wish to avoid. We talk of other (p.227) times and places and describe the possible as well as the actual. We tell stories. For that reason, it would be quite hopeless to regard the English term “leopard” as a natural sign of leopards. Moreover, words are often not learned by association with their referents. Ostensive definition plays a role in language acquisition, but much of language is acquired via other representations: through depiction and description. Thus, associative mechanisms that are powerful enough to detect the covariation between the vervet leopard call and leopards would not suffice to map “tiger” onto tigers or “Charles Darwin” onto Charles Darwin. Few “tiger” tokens are copresent with tigers, and these days no “Charles Darwin” tokens appear with Charles Darwin.
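The contrast between natural signs and words can be made concrete with a simple contingency measure. The sketch below is an illustration under stated assumptions, not a claim about any actual learning mechanism: it computes the difference between the probability of the target given the signal and the probability of the target without it. The function name `contingency` and the observation counts are invented for the example.

```python
def contingency(observations):
    """Return P(target | signal) - P(target | no signal).

    `observations` is an iterable of (signal_present, target_present)
    boolean pairs. A value near 1 means the signal is a reliable
    natural sign of the target; a value near 0 means no covariation.
    """
    with_signal = with_signal_and_target = 0
    without_signal = without_signal_and_target = 0
    for signal, target in observations:
        if signal:
            with_signal += 1
            with_signal_and_target += int(target)
        else:
            without_signal += 1
            without_signal_and_target += int(target)
    if with_signal == 0 or without_signal == 0:
        raise ValueError("need observations both with and without the signal")
    return (with_signal_and_target / with_signal
            - without_signal_and_target / without_signal)

# Hypothetical counts: alarm flights usually coincide with a predator.
alarm_flights = ([(True, True)] * 8 + [(True, False)] * 2
                 + [(False, True)] * 1 + [(False, False)] * 9)

# Hypothetical counts: "tiger" tokens are almost never copresent with tigers.
tiger_tokens = ([(True, True)] * 1 + [(True, False)] * 99
                + [(False, True)] * 1 + [(False, False)] * 99)

if __name__ == "__main__":
    print(contingency(alarm_flights))
    print(contingency(tiger_tokens))
```

On these made-up counts, an associative learner tracking contingency would find a strong signal in the alarm flights and none in the word tokens, which is why association with referents cannot be what maps “tiger” onto tigers.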

This negative point is universally accepted. Words are not natural signs of their referents. They are symbols of their referents (Deacon, 1997). There is, however, no consensus whatever on the nature of lexical symbols: on what an agent has to understand to use “leopard” as a symbol of leopards. However, though there is no consensus on the nature of a linguistic symbol, on every view, using a symbol is informationally demanding. My approach will be as follows. I shall give one account of the informational demands of symbol use, then show how humans exploit their social environment to minimize those demands. I shall argue that the large and varied vocabularies we all command depend on a socially organized division of linguistic labor (Putnam, 1975). Importantly, this view of the effects of social organization on the informational demands of language does not depend on the specific account of the symbol-reference relationship I discuss. That account merely illustrates the way we collectively manage the problem of information load.

How is it that my tokens of “leopard” are grounded on leopards? On one family of views, “leopard” is a symbol of leopards (rather than, say, panthers) because competent users of English associate that term with something like an identifying criterion of leopards. The use of “leopard” as a symbol of leopards depends on a speaker’s having the capacity to identify leopards as leopards. The use of language is information rich. Though referential expressions do not need to be used in the presence of their referents, their use as symbols depends on the presence of an informational image of leopards. No leopard has to be present, but identifying information does have to be present. Likewise, while Charles Darwin does not have to be present for me to talk about him, identifying information about him is present: I know Darwin to be the codiscoverer of the principle of natural selection, the author of The Origin of Species and The Descent of Man, and I use “Charles Darwin” intending thereby to refer to that very person. The use of linguistic symbols is anchored in descriptive information available to the parties of a conversation about the targets of those symbols.

There is an obvious objection to this view of linguistic symbols: Ignorant and mistaken speakers seem to be able to talk about leopards. Not everyone who can use the term “leopard” can identify leopards; some of those who talk about the Darwins of (p.228) history are confused or ignorant about what they did. This is an important objection, but, equally, there is an important response. Within the community, there are leopard experts. They do have identifying knowledge of leopards. They use the term “leopard” as a symbol of leopards, for they use it with the intention of referring to that great cat that satisfies the following description: the largest spotted cat in Africa and Asia, solitary in habits, adult length (including tail) between 2 and 2.5 meters (Sunquist and Sunquist, 2002). In the same community, there will be speakers whose grip on the distinction between leopards, jaguars, and snow leopards is hazy, but no matter. They use the term “leopard” intending to refer to that great cat to which mammalogists refer in using the term “leopard.” That is a unique description of leopards: That description is true of one and only one great cat. The ignorant do have identifying knowledge of leopards, but it is second-hand knowledge; it is the knowledge that others know about leopards, together with a way of specifying those others (Jackson and Braddon-Mitchell, 1996; Jackson, 1997). The problem of reference (the symbol grounding problem) is solved collectively rather than individually.

The division of linguistic labor goes beyond our practice of learning from others. People routinely use linguistic tags to pick out items in the world they have never seen and about which they know little. Sometimes, of course, they learn enough to become identifiers in their own right: They learn the distinguishing features of leopards or the distinctive ideas of Darwin. However, the point is that the use of linguistic symbols does not depend on the cultural transmission of identifying information about the targets of those symbols: It is still Darwin that the hopelessly confused creationist mischaracterizes, even though he or she has a garbled view of Darwin’s life and works. The creationist’s capacity to deploy a symbol for Darwin despite failing to grasp Darwinian ideas depends on a division of linguistic labor within communities. Individuals have differential access to aspects of their environment and differential knowledge about that environment. Those with little access to leopards defer linguistically to those with lots: to the experience and information of those whose worlds are leopard rich. That experience and information grounds “leopard” on leopards for all of us. Collectively, but only collectively, we know what we are talking about.

The idea that a division of linguistic labor is crucial to our ability to use a large lexicon is not tied to any specific account of the relationship between symbol and target in the mouths of experts. So-called “causal theorists” of reference think that the relation between a linguistic symbol and its target is constituted by an appropriate causal chain between language and the world. My “Charles Darwin” tokens refer to Charles Darwin, because there is an appropriate causal chain between my tokens of that name and the man. On this view, I need have no identifying knowledge of Darwin at all—not even a second-hand description like: “By ‘Darwin’ I intend to refer (p.229) to the person historians of biology call ‘Darwin.’” The relevant fact is that my disposition to use that term is caused by speakers whose disposition to use that term was caused by speakers whose disposition to use that term was caused by speakers whose disposition to use that term… was caused by speakers who knew and named Darwin. Thus, causal theorists are also committed to the view that there is an informationally privileged subset of “Darwin” users. Those who grounded the name on the individual knew him. They had to know what they were doing and to whom they were referring when Charles Robert Darwin acquired his names (Devitt and Sterelny, 1999). On this view, too, the division of linguistic labor is important, and those who ground lexical symbols on their targets carry an informational burden for the rest of us. This favor is reciprocated, for most of us play some role in grounding the language of our community, if not for kind terms like “leopard,” at least with proper names. We are all experts on our own inner circle.
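The structure shared by the descriptive and causal accounts, in which most speakers inherit reference from an informationally privileged few, can be sketched as a chain of deference. This is a minimal illustration under assumptions of my own: the speaker names, the `GROUNDED` and `DEFERS_TO` tables, and the function `resolve_reference` are all hypothetical, and the sketch is neutral on whether each link is a deferential description or a causal borrowing.

```python
# Hypothetical community. A speaker either grounds the term first-hand
# (an expert, or someone present at the naming) or inherits it from
# another speaker.
GROUNDED = {"field_biologist": "Panthera pardus"}
DEFERS_TO = {
    "zoo_visitor": "documentary_narrator",
    "documentary_narrator": "field_biologist",
    "isolated_speaker": None,  # uses the term but defers to no one
}

def resolve_reference(speaker, grounded=GROUNDED, defers_to=DEFERS_TO):
    """Follow links of deference until a grounding speaker is reached.

    Returns the referent fixed by that speaker, or None if the chain
    ends (or loops) without ever reaching anyone who grounds the term.
    """
    seen = set()
    while speaker is not None and speaker not in seen:
        if speaker in grounded:
            return grounded[speaker]
        seen.add(speaker)
        speaker = defers_to.get(speaker)
    return None

if __name__ == "__main__":
    for name in ("zoo_visitor", "field_biologist", "isolated_speaker"):
        print(name, "->", resolve_reference(name))
```

The zoo visitor stores no identifying information at all, yet the term still resolves, because the informational burden is carried further along the chain. A speaker whose chain reaches no grounder is left with no referent, which is the sense in which we know what we are talking about only collectively.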

In recent years, Dan Dennett and Andy Clark have pointed to the importance of cognitive tools in explaining human cognitive competence and its evolution (Clark, 1999; Dennett, 2000; Clark, 2001, 2002). Undoubtedly, the most important of these tools is language. We make tools for thinking, and these tools enormously enhance our cognitive powers. However, while these tools empower us, they are themselves cognitively demanding, as all of us who have stared despairingly at a computer manual know. Language magnifies our cognitive powers, but at the same time learning and using a language impose cognitive costs. In this chapter, I have discussed a few aspects of those costs and of how we have rejigged our social and linguistic environment to reduce those costs. In particular, I have discussed quality control and its costs. Quality control is necessary to keep the benefits of learning from others while reducing its error costs. But it is itself potentially very expensive, and I have discussed ways in which those costs were managed in the transitions from protolanguage to language.6


Bibliography references:

Bickerton D. 1990. Language and Species. Chicago: University of Chicago Press.

Bickerton D. 2005. The Origin of Language in Niche Construction. http://www.derekbickerton.com/blog/_archives/2005/3/28/486319.html.

Brighton H, Kirby S, et al. 2005. Cultural selection for learnability: Three principles underlying the view that language adapts to be learnable. In Language Origins: Perspectives on Evolution, ed. M Tallerman, pp. 291–309. Oxford: Oxford University Press.

Clark A. 1997. Being There: Putting Brain, Body, and World Together Again. Cambridge, MA: MIT Press.

Clark A. 1999. An embodied cognitive science? Trends in Cognitive Sciences 3: 345–50.

Clark A. 2001. Reasons, robots and the extended mind. Mind and Language 16: 121–45.

Clark A. 2002. Mindware: An Introduction to the Philosophy of Cognitive Science. Oxford: Oxford University Press.

Cosmides L, Tooby J. 2000. Consider the sources: The evolution of adaptation for decoupling and metarepresentation. In Metarepresentation: A Multidisciplinary Perspective, ed. D Sperber, pp. 53–116. Oxford: Oxford University Press.

Danchin E, et al. 2004. Public information: From nosy neighbors to cultural evolution. Science 305: 487–91.

Dawkins R. 1982. The Extended Phenotype. Oxford: Oxford University Press.

Deacon T. 1997. The Symbolic Species: The Co-evolution of Language and the Brain. New York: Norton.

Dennett DC. 2000. Making tools for thinking. In Metarepresentation: A Multidisciplinary Perspective, ed. D Sperber, pp. 17–29. Oxford: Oxford University Press.

Devitt M, Sterelny K. 1999. Language and Reality: An Introduction to Philosophy of Language. Oxford: Blackwell.

Dunbar R. 1996. Grooming, Gossip and the Evolution of Language. London: Faber and Faber.

Dunbar R. 2001. Brains on two legs: Group size and the evolution of intelligence. In Tree of Origin, ed. F de Waal, pp. 173–92. Cambridge, MA: Harvard University Press.

Fitch T. 2004. Evolving honest communication systems: Kin selection and “mother tongues.” In Evolution of Communication Systems: A Comparative Approach, ed. DK Oller, U Griebel, pp. 275–96. Cambridge, MA: MIT Press.

Grassly N, Haeseler A, et al. 2000. Error, population structure and the origin of diverse sign systems. Journal of Theoretical Biology 206: 369–78.

Jackendoff R. 1999. Possible stages in the evolution of the language capacity. Trends in Cognitive Sciences 3: 272–9.

Jackson F. 1997. From Metaphysics to Ethics. Oxford: Oxford University Press.

Jackson F, Braddon-Mitchell D. 1996. The Philosophy of Mind and Cognition. Oxford: Blackwell.

Kaplan H, Hill K, et al. 2000. A theory of human life history evolution: Diet, intelligence and longevity. Evolutionary Anthropology 9: 156–85.

Key C, Aiello L. 1999. The evolution of social organization. In The Evolution of Culture: An Interdisciplinary View, ed. R Dunbar, C Knight, C Power, pp. 15–33. Edinburgh: Edinburgh University Press.

Krakauer DC. 2001. Selective imitation for a private sign system. Journal of Theoretical Biology 213: 145–57.

(p.231) Krebs J, Dawkins R. 1984. Animal signals, mind-reading and manipulation. In Behavioural Ecology: An Evolutionary Approach, ed. JR Krebs, NB Davies, pp. 380–402. Oxford: Blackwell Scientific.

Lewontin RC. 1982. Organism and environment. In Learning, Development and Culture, ed. HC Plotkin, pp. 151–70. New York: Wiley.

Lewontin RC. 1985. The organism as subject and object of evolution. In The Dialectical Biologist, ed. RC Lewontin, R Levins, pp. 85–106. Cambridge, MA: Harvard University Press.

Lewontin RC. 2000. The Triple Helix. Cambridge, MA: Harvard University Press.

List C. 2004. Democracy in animal groups: A political science perspective. Trends in Ecology and Evolution 19: 168–9.

Millikan R. 1998. Language conventions made simple. Journal of Philosophy 95: 161–80.

Nettle D, Dunbar R. 1997. Social markers and the evolution of reciprocal exchange. Current Anthropology 38: 93–9.

Nowak M, Krakauer DC. 1999. The evolution of language. Proceedings of the National Academy of Sciences 96: 8028–33.

Odling-Smee J, Laland K, et al. 2003. Niche Construction: The Neglected Process in Evolution. Princeton, NJ: Princeton University Press.

Potts R. 1996. Humanity’s Descent: The Consequences of Ecological Instability. New York: Avon.

Potts R. 1998. Variability selection in hominid evolution. Evolutionary Anthropology 7: 81–96.

Putnam H. 1975. Mind, Language and Reality: Philosophical Papers, Volume 2. Cambridge: Cambridge University Press.

Skyrms B. 2003. The Stag Hunt and the Evolution of Social Structure. Cambridge, England: Cambridge University Press.

Smith K, Kirby S, et al. 2003. Iterated learning: A framework for the emergence of language. Artificial Life 9: 371–86.

Sperber D. 2001. An evolutionary perspective on testimony and argumentation. Philosophical Topics 29: 401–13.

Sripada CS. 2005. Punishment and the strategic structure of moral systems. Biology and Philosophy 20: 767–89.

Sterelny K. 2003. Thought in a Hostile World. New York: Blackwell.

Sterelny K. 2004. Externalism, epistemic artefacts and the extended mind. In The Externalist Challenge: New Studies on Cognition and Intentionality, ed. R Schantz, pp. 239–54. Berlin and New York: de Gruyter.

Sterelny K. 2006a. Cognitive load and human decision, or, three ways of rolling the rock up hill. In The Innate Mind, Vol. 2: Culture and Cognition, ed. P Carruthers, S Laurence, S Stich, pp. 218–33. Oxford: Oxford University Press.

Sterelny K. 2006b. Folk logic and animal rationality. In Rational Animals?, ed. S Hurley, M Nudds, pp. 291–312. Oxford: Oxford University Press.

Sunquist M, Sunquist F. 2002. Wild Cats of the World. Chicago: University of Chicago Press.

Turner JS. 2000. The Extended Organism: The Physiology of Animal-Built Structures. Cambridge, MA: Harvard University Press.

Wilson DS, Wilczynski C, et al. 2000. Gossip and other aspects of language as group-level adaptations. In The Evolution of Cognition, ed. L Huber, C Heyes, pp. 347–66. Cambridge, MA: MIT Press.


(1.) See Lewontin (1982, 1985, 2000) and more recently, and in great empirical detail, Odling-Smee, Laland, et al. (2003).

(2.) This is especially the case if there is a large menu of distinct signals. Moreover, error complicates learning: No signal will covary perfectly with its target state in the world (see Grassly, Haeseler, et al., 2000).

(3.) Krakauer (2001) suggests that the cultural evolution of arbitrary signals patches the public information problem, for only members of the local community will understand those symbols: Public information will not flow beyond the boundaries of the local group.

(4.) There is a plausible ecological hypothesis explaining selection for increased cooperation between early hominin adults. As Africa became hotter and drier, our ancestors found themselves living in woodlands and savannah. This change in habitat intensified selection for male cooperation in defense (Dunbar, 2001) and for reproductive cooperation among females (Key and Aiello, 1999). If that is right, we can explain why in our lineage alone selection on groups for cooperation partially suppressed selection on individuals for free riding.

(p.230) (5.) Or, analogously, it is driven by error costs in transmission: For this idea, see Nowak and Krakauer (1999).

(6.) Thanks to the editors, David Krakauer, and an anonymous referee for very helpful feedback on an earlier version of this chapter, and thanks also to the workshop participants for their comments and questions.