The Primacy of Grammar

Nirmalangshu Mukherji

Print publication date: 2010

Print ISBN-13: 9780262014052

Published to MIT Press Scholarship Online: August 2013

DOI: 10.7551/mitpress/9780262014052.001.0001


(c) Copyright The MIT Press, 2018. All Rights Reserved.


5 Linguistic Theory II
The Primacy of Grammar

Nirmalangshu Mukherji

The MIT Press

Abstract and Keywords

This chapter presents in detail the biolinguistic project's view that languages are systems comprising a computational system and a lexicon. On this view, the computational system includes the syntactic and the semantic systems that together provide the rich expressive power of human language. The chapter explains the significance of the Minimalist Program for linguistic theory, including its appeal to conceptual necessity and feature checking, and the later developments centered on Merge. It also examines the relation of Merge to syntax and to semantics. Finally, it discusses Chomsky’s two global ideas of least effort and last resort as economy principles, as well as CHL, the abstract notion of a single computational system of human language, with attention to linguistic specificity, principles, and displacement.

Keywords:   biolinguistic project, lexicon, Minimalist Program, Merge in the System, merge and syntax

Since its inception, the biolinguistic project has always viewed languages as systems consisting of a computational system (CS) and a lexicon. CS works on elements of the lexicon to generate sound-meaning connections of human languages. This foundational aspect of the project was firmly articulated in Chomsky 1980, 54–55, where a principled distinction was made between the computational and the conceptual aspects of languages. A computational system includes the syntactic and the semantic systems that together provide the “rich expressive power of human language”; the conceptual system was taken not to belong to the language faculty at all but to some other faculty that provides “common sense understanding” of the world.

To my knowledge, the strikingly more abstract notion of the single computational system of human language—CHL—was first used officially in Chomsky 1994a.1 To appreciate how the notion emerged, recall that the G-B system postulated a computational system each of whose principles and operations was universal—that is, the principles themselves did not vary across languages. But most of the principles came with parametric choices such that particular languages could not be described in this system until the parameters were given specific values. In that sense, the G-B system also worked with a plurality of computational systems for describing human languages. A very different conception of the organization of the language faculty emerged as follows.

To track the printed story, Chomsky (1991c, 131) proposed that “parameters of UG relate, not to the computational system, but only to the lexicon.” If this proposal is valid, then “there is only one human language, apart from the lexicon, and language acquisition is in essence a matter of determining lexical idiosyncrasies” (p. 131). The “lexical idiosyncrasies” are viewed as restricted to the morphological part of the lexicon; the rest of the lexicon is also viewed as universal. In that sense, Chomsky (1993, 170) held that “there is only one computational system and one lexicon.” Finally, in Chomsky 1994a, this computational system was given a name: a single computational system CHL for human language. “CHL” was used extensively in Chomsky 1995b.

The guiding assumption—mentioned earlier but spelled out much later, in Chomsky 2000b—is that language variation itself ought to be viewed as a problem, an “imperfection,” for learnability of languages; there are just too many of them. Notice that this idea has a different flavor from the classical considerations from the poverty of the stimulus. Even if the salience of those considerations is questioned (Pullum and Scholz 2002; see Crain and Pietroski 2002 for a response), no one can deny that there are thousands of languages and dialects. Linguistic theory, therefore, should be guided by the “Uniformity Principle” (116).

  (116) In the absence of compelling evidence to the contrary, assume languages to be uniform, with variety restricted to easily detectable properties of utterances.

Assuming the detectable properties to reside (essentially) in the morphological part of the lexicon, the conception of a single computational system for (all) human languages follows. The Minimalist Program (MP) for linguistic theory attempts to articulate this conception. As usual, I will keep to the basic design of the system following Chomsky.

5.1 Minimalist Program

To set the stage for MP, let me quickly restate one of the major aspects of the principles-and-parameters framework that was stressed at the very beginning (see section 2.2): the property ±cp (construction-particularity) disappears.2 Literally, there are no rules in the G-B system; there are only universal principles with parametric choices. Move-α is the only transformational operation which applies to any category. Although the property ±lp (language-particularity) cannot disappear, nothing in the system explicitly mentions specific languages such as English or Hopi. Specific languages are thus viewed as “epiphenomenon” produced by the system when triggered by particular experience (Chomsky 1995b, 8).

In this sense, the framework abstracts away from the vast particularities of languages to capture something like “human language” itself. By any measure, this is a striking abstraction. It is surprising that, despite this leap in abstraction, empirical coverage has in fact substantially increased, explaining the exponential growth in crosslinguistic studies in the last three decades. On the basis of what we can see from here with limited understanding, the Minimalist Program proposes to push the growing abstraction very close to the limit.

5.1.1 Conceptual Necessity

As a preparation to that end, let us ask, following Chomsky 1993: Can the linguistic system be described entirely in terms of what is (virtually) conceptually necessary? Suppose we are able to form some conception of what is minimally required of a cognitive system that is geared to the acquisition, expression and use of human language with its specific properties of unboundedness and construction of “free expression” for speech and interpretation (Chomsky 1994a). Chomsky is asking whether this minimal conception can be put to maximal explanatory use. In other words, can the relevant facts of language be fully explained with this conception alone?

To appreciate what is involved, let us see if there are nonminimal conceptions in the earlier theory. Consider the distinction between “inner” and “outer” levels of grammatical representations. We assume that (at least) two outer levels are needed in any case since the system must be accessible to external systems: sensorimotor (SM) and conceptual-intentional (C-I). Keeping to G-B, let us assume for now that PF and LF are those levels; PF and LF are then the “interface” levels of representation. The inner levels (d- and s-structures), in contrast, are just theoretical constructs to which no other system of the mind has access. In current terms, the inner levels are not conceptually necessary. So we have a lexicon and two interface levels for the SM and the C-I systems. The task of the computational system then is to map lexical information onto the two interfaces for the “external” systems to read them: satisfaction of legibility conditions.

As for the design of CHL, we not only assume as noted that it is completely universal, we would want the system to contain only those principles and operations that are required under conceptual necessity. Now, we know that there is at least one “imperfection” in human languages in that sound-meaning connections are often indirect: the displacement problem. We return to the issue of whether it in fact is an imperfection. Since syntactic objects need to be displaced, conceptual necessity suggests that there is one operation that affects it: Move-α; better, Affect-α. This part of conceptual necessity (as with much else), then, was already achieved in G-B.

Since movement is “costly,” we assume, again following the spirit of conceptual necessity, that movement happens only as a “last resort” and with “least effort.” These are notions of economy that endow an optimal character to a system. We assume that only economy principles with “least-effort” and “last-resort” effects occur in the system to enforce optimal computation. Move-α (or, Affect-α) may now be viewed quite differently: nothing moves unless there is a reason for movement and then the preference is for the most economical movement (Marantz 1995). Finally, we ensure, following conceptual necessity, that optimal computation generates only those syntactic objects that are least costly for the external systems to read. In other words, representations at the interfaces also must meet conditions of economy. Let us say, a system that meets these design specifications is a “perfect” system.

We thus reach the Strong Minimalist Thesis (SMT): “Language is an optimal solution to interface conditions that FL must satisfy.” If SMT held fully, “UG would be restricted to properties imposed by interface conditions” (Chomsky 2006c). (See figure 5.1.)

Figure 5.1 Minimalist program

The first thing we would want to know is how the displacement problem is now addressed within the restrictive conditions. We saw that postulation of s-structures allowed computations to branch off to accommodate this fact. Without s-structures, the problem reappears. We saw that displacement requires movement. So the issue of how to handle displacement gets closely related to the issue of what causes movement and what constrains it. The basic idea in MP is that certain lexical features cause movement under conditions of optimality enforced by economy principles. The idea is explained and implemented as follows.

5.1.2 Feature Checking

Suppose the system is presented with an array of lexical items. This array is called a “numeration” N, which is a (reference) set {LI, i}, where LI is a lexical item and i its index showing the number of occurrences of LI in the array. A lexical item enters the system with the operation Select. Select maps N to a pair of representations at the interfaces. Each time Select picks a lexical item, the index is reduced by one. A representation is not well formed until all indices reduce to zero; essentially, the procedure guarantees in purely algorithmic terms that each occurrence of every lexical item of the given array has entered the system. Since, by the time computation reaches the interfaces, complex syntactic objects must be formed for inspection by the external systems, two or more objects picked up by Select need to combine. The operation that repeatedly combines lexical items as they individually enter the system is Merge.
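The bookkeeping role of the numeration can be made concrete with a minimal sketch. It assumes, purely for illustration, that a numeration is a table from lexical items to their indices; the function names `make_numeration`, `select`, and `exhausted` are hypothetical and are not part of the theory's vocabulary.

```python
from collections import Counter

def make_numeration(items):
    """Build a numeration: each lexical item paired with its index (occurrence count)."""
    return Counter(items)

def select(numeration, item):
    """Select one occurrence of `item`, reducing its index by one."""
    if numeration[item] <= 0:
        raise ValueError(f"no remaining occurrence of {item!r} in the numeration")
    numeration[item] -= 1
    return item

def exhausted(numeration):
    """A representation can be well formed only once every index has reached zero."""
    return all(index == 0 for index in numeration.values())

numeration = make_numeration(["the", "man", "saw", "it"])
selected = [select(numeration, word) for word in ["the", "man", "saw"]]
partially_done = exhausted(numeration)  # "it" has not yet entered the system
select(numeration, "it")
fully_done = exhausted(numeration)      # every occurrence has now been selected
```

The sketch covers only the counting procedure; it says nothing about how the selected items are combined, which is the work of Merge.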

What are the items that must be “visible” at the interfaces for the external systems to read? It seems the following items are needed at most: properties of lexical items, and certain types of larger units formed of lexical items—units traditionally called “noun phrase,” “verb phrase,” and so on. As we will see, the semantic component seems to recognize things that can be interpreted as topic, new item, argument structure, perhaps even proposition, and the like. For this, individual lexical items (their interpretable features) and structures like CP, DP, and “light-verb” phrases are needed at the interface. In X-bar theoretic terms, just the minimal and maximal projections are needed; in particular, X-bar levels are not needed. Conceptually speaking, lexical items enter the system via Select anyway and Merge combines larger units from them “online.” So, X-bar theory is not needed under conceptual necessity. Having suggested the conceptual point, I will leave matters at that, since, technically, it is not clear that bar levels can in fact be dispensed with.3

The elimination of bar levels suggests another minimalist move. Recall that bar levels were introduced during computation; similarly for indices for binding theory, not to speak of d- and s-structures. Under conceptual necessity, lexical information must enter CHL. In contrast, bar levels, indices, and inner structures seem to be just theoretical devices, and of these, as we saw, some are certainly eliminable. In the spirit of minimalism, then, we adopt the Inclusiveness Condition: no new objects are added during computation apart from rearrangement of lexical properties. In effect, the condition requires that what must enter the system is the only thing that enters. Conceptually, the Inclusiveness Condition captures the natural idea that to know a language is to know its words (Wasow 1985); the computational system does not have anything else to access.

We recall that most of the constraints on movement were located at the inner levels of computation (figure 2.1). With the inner levels removed, the system will generate wildly. The effects of some of these constraints, therefore, need to be rephrased or captured somehow in minimalist terms. Under conceptual necessity, there are a limited number of options available: either the burden of the inner levels is distributed over the outer levels, or they are traced back to the lexicon, or the principles of the system are redesigned; or, as it turns out, all of them together. Basically, we will expect the conditions of computational efficiency, such as least effort and last resort, to constrain the operations of Merge such that, after optimal computation, all and only those structures are typically retained which meet legibility conditions. The huge empirical task is to show that this in fact is the case.

Suppose Merge combines two lexical items α and β to form the object K, a “phrase,” which is at least the set {α, β}. K cannot be identified with this set because we need to know which phrase it is: verb phrase, noun phrase, and so on. So K must be of the form {γ, {α, β}}, where γ identifies the type of K; γ is, therefore, the label of K. As discussed below, lexical items are nothing but collections of features. By the inclusiveness condition, γ can be either the set of features in α or in β or a union of these sets or their intersection. For simplicity, suppose γ is either α or β, the choice depending on which of the two items in the combination is the head: the head of a combination is the one that is “missing something,” one that needs supplementation. Suppose it is α;4 the other one, β, is then the complement. Thus K = {α, {α, β}} (for more, see Uriagereka 1998, 176–182).
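A minimal sketch of this labeled form of Merge may help. It assumes that lexical items can be stood in for by plain strings and that the head is simply supplied by the caller rather than computed from features; `merge_labeled` and `label_of` are hypothetical names introduced only for this illustration.

```python
def label_of(obj):
    """The label of a lexical item is the item itself; the label of a phrase is its gamma."""
    if isinstance(obj, str):
        return obj
    (gamma,) = [member for member in obj if isinstance(member, str)]
    return gamma

def merge_labeled(alpha, beta, head):
    """Return K = {gamma, {alpha, beta}}, where gamma is the label of the projecting head."""
    if head not in (alpha, beta):
        raise ValueError("the head must be one of the two merged objects")
    return frozenset({label_of(head), frozenset({alpha, beta})})

# K = {the, {the, man}} and L = {saw, {saw, it}}
K = merge_labeled("the", "man", head="the")
L = merge_labeled("saw", "it", head="saw")

# Merging K and L with L projecting yields an object labeled by "saw".
VP = merge_labeled(K, L, head=L)
```

Because the label is always drawn from one of the merged objects, nothing is added that was not already in the lexical items, which is one way of picturing the inclusiveness condition.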

When the and man are combined by Merge, we get the structure K = [the [the man]]. Note that K here is of the type “determiner phrase” (DP). Suppose Merge forms another object L from saw and it. Merging K and L, we get,

  (117) [VP saw [DP the [the man]] [V0 =V saw [saw it]]]

The resulting structure (117) is not exactly pretty in that it is difficult to interpret visually; the proposed tree diagrams are even more so (see Chomsky 1993). We seem to lose the familiar sense of sentencehood. But then the subroutines that underlie the single-click operations Delete or Escape in your computer, not to speak of Change All Endnotes to Footnotes, are not simple either. We expect a genuine theory to describe (states of) its objects at a remove from common expectations. (117) is what the mind “sees,” only a part of which is made available to the articulatory systems via economy conditions internal to the phonological system.

This unfamiliar structure enables a range of novel theoretical moves. Recalling X-bar concepts, think of K, which is the maximal projection of the, as the “Specifier” of saw, and the maximal projection of saw “Verb Phrase”: all and only items needed at the LF-interface have been made available under minimalist assumptions. The following local domains are also available: specifier-head, head-complement, along with a modified c-command, details of which I put aside. Notice that X-bar theory—especially the intermediate projections (the bar levels)—is essentially abandoned on the basis of most elementary assumptions (Chomsky 1995b, 249). Notice also that the scheme naturally incorporates the VP-internal-Subject hypothesis, which, as noted, has independent motivation—for example, cliticization and external θ-assignment. There are other advantages of generating phrase structure “online” as Select and Merge introduce and combine sets of lexical features. We will see that it leads to a very natural and economical theory of displacement.

Regarding the nature of (lexical) information that enters the system, the basic idea is that lexical items are a cluster of features: phonetic features (how it sounds), semantic features (what is its linguistic meaning) and formal features (what is its category). Suppose we introduce a general requirement that formal features need to be “checked” in the course of a derivation. “Checking” really means checking for agreement, a grammatical well-formedness condition known since antiquity. For example, the number feature of verbs has to agree with the same feature of Subject-nouns: they do versus he does. The number feature makes a difference to the semantic interpretation of nouns, for example, the difference between singularity and plurality. But, although the verbs have to agree with the nouns in this respect, the number feature of verbs does not contribute to its semantic interpretation: do means exactly what does does.

The fact that the number features of verbs are uninterpretable at the semantic interface conflicts with optimal design because a principle of economy on representations, Full Interpretation (FI), will be violated. The presence of uninterpretable features thus renders the system “imperfect.” Uninterpretable features abound in languages: number, gender and person features on verbs; Case features on nouns; number and gender features on articles, and so on. The peculiar and ubiquitous property of the agreement system in human languages is at once variously exploited and explained in MP.

For one such exploitation, we “save” semantic interpretation by letting the number feature of verbs be wiped out from the semantic part of the computation once the sound requirement of agreement is met. Generalizing sweepingly on this, we first impose the requirement of feature checking of certain formal features which means, in effect, that another instance of the (agreeing) feature to be checked be found somewhere in the structure. By minimalism, the relation between the feature-checker and the feature-checked must be established in a local domain: least effort. So, if an item is not in the local domain of its feature-checker, it must move—as a last resort, obeying least effort throughout—to a suitable position to get its feature checked (and be wiped out if it is not making any semantic difference). Thus, certain formal features of lexical items such as gender, person, and number, often called “φ-features,” force displacement under the stated conditions.

To take a simple example and skipping many details, consider the sentence she has gone (Radford 1997, 72). The item she has the formal features third-person female nominative in the singular (3FNomS). Apart from the tense feature (Present) the item has has the feature that it needs to agree with a 3NomS Subject. Has is in the local domain of she (specifier-head) and its features match those in the Subject; similarly for the features in has and gone. The structure is well formed. Among features, Nom is a Case feature. Case features do not make any difference to semantic interpretation. So, it is wiped out from she once the requirement of has has been met; similarly with the 3S features of has. However, the 3SF features of she remain in the structure since these are semantically significant.
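The checking of she against has can be pictured with a small sketch. It assumes, for illustration only, that each feature is a pair of a value and a flag saying whether the feature is interpretable at the semantic interface; the flags, the treatment of tense as interpretable on has, and the names `SHE`, `HAS`, and `check_agreement` are my assumptions rather than anything fixed by the theory.

```python
# Each feature maps to (value, interpretable at the semantic interface).
SHE = {
    "person": ("3", True),
    "number": ("S", True),
    "gender": ("F", True),
    "case": ("Nom", False),   # Case makes no semantic difference
}
HAS = {
    "person": ("3", False),   # agreement features on the verb are uninterpretable
    "number": ("S", False),
    "case": ("Nom", False),
    "tense": ("Pres", True),
}
AGREEMENT = ("person", "number", "case")

def check_agreement(subject, verb, features=AGREEMENT):
    """Check the agreeing features in a local domain, then wipe out the
    checked features that are uninterpretable."""
    for name in features:
        if subject[name][0] != verb[name][0]:
            raise ValueError(f"feature mismatch on {name!r}")

    def surviving(item):
        return {
            name: value
            for name, (value, interpretable) in item.items()
            if interpretable or name not in features
        }

    return surviving(subject), surviving(verb)

she_at_lf, has_at_lf = check_agreement(SHE, HAS)
# she keeps its third-person singular feminine features; has keeps only tense.
```

The sketch presupposes that the two items are already in a local (specifier-head) relation; it does not model the movement that is needed when they are not.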

The situation is different with she mistrusts him. Following Radford 1997, 122, suppose Merge creates the structure (118) in three steps.


Now, mistrusts requires a 3SNom Subject to check its uninterpretable 3SNom features. She is the desired Subject, but she and mistrusts are not in a local relation since Infl (= Tense) intervenes. The tense feature of Infl, on the other hand, lacks any value. Let us say Infl attracts the tense feature Pres of mistrusts. Suppose that the attraction of the Pres feature moves other features of mistrusts as well: general pied-piping. This results in a local relation between the features of she and the attracted features such that the checking of the rest of the features of mistrusts takes place, and uninterpretable features are wiped out. Additionally, Infl gets filled with tense information Pres such that Infl now becomes LF-interpretable (satisfies FI).

The scheme just described addressed an obvious problem. Surely, uninterpretability is something that external systems determine; how does the computational system know which features are uninterpretable without “look-ahead” information? The basic solution is that CHL looks only for valued and unvalued features, not interpretable and uninterpretable ones. For example, CHL can inspect that she has the value “female” of the gender feature or that some Infl is unvalued for the tense feature, as we saw. So, the computation will be on values of features rather than on their interpretability. Assume that features that remain unvalued after computation is done will be uninterpretable by the external systems. CHL will “know” this in advance, and the computation will crash. Again, a host of complex issues arise that I am putting aside.
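The shift from interpretability to valuation can be sketched as follows. The sketch assumes that an unvalued feature is represented by `None` and that a derivation crashes whenever any feature is still unvalued at the end; the names `value_feature`, `converges`, and `Crash` are hypothetical.

```python
class Crash(Exception):
    """Raised when a derivation ends with an unvalued feature."""

def value_feature(probe, goal, feature):
    """If `feature` is present but unvalued on the probe, copy the goal's value.
    A new feature bundle is returned; the inputs are left untouched."""
    if feature in probe and probe[feature] is None and goal.get(feature) is not None:
        return {**probe, feature: goal[feature]}
    return probe

def converges(items):
    """The computation inspects only values: any leftover unvalued feature crashes it."""
    for name, features in items.items():
        unvalued = [f for f, v in features.items() if v is None]
        if unvalued:
            raise Crash(f"{name} has unvalued features: {unvalued}")
    return True

infl = {"tense": None}            # Infl enters unvalued for tense
mistrusts = {"tense": "Pres"}     # the verb carries the value
valued_infl = value_feature(infl, mistrusts, "tense")
ok = converges({"Infl": valued_infl, "mistrusts": mistrusts})
```

Nothing in the sketch consults the external systems: the check in `converges` looks only at whether a value is present, which is the sense in which no look-ahead is required.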

5.1.3 (New) Merge

What I just sketched belongs basically to the early phase of the program (Chomsky 1995b). Developments in the years that followed suggest that we can do even better in terms of SMT. I must also note that, as with every major turn in theory, early adherents begin to branch out as the logic of a given proposal is pursued from different directions. Hence, as noted in the preface to this work, there is much less agreement with Chomsky’s more recent proposals than with the early phase of the minimalist program, even within biolinguistics.5

Figure 5.2 Unambiguous paths

With this caveat in mind, recall that MP postulated two elementary operations: Merge for forming complex syntactic objects (SO), and Move for taking SOs around. Do we need both? We can think of Merge as an elementary operation that has the simplest possible form: Merge (α, β) = {α, β}, incorporating the No Tampering Condition (NTC), which leaves α and β intact. The effect is that Merge, in contrast to the earlier formulation, now projects the union of α and β simpliciter without labels—that is, without identifying the type of syntactic object constructed. As a result, the phrase-structure component stands basically eliminated from the scheme, leading to the postulation of “phases” as we will presently see. Also, Merge takes only two objects at a time—again the simplest possibility (Boeckx 2006, 77–78)—yielding “unambiguous paths” in the form of binary branching as in figure 5.2.6 It is obvious that NTC is at work: γ is not inserted inside α and β; it is inserted on top (= edge) of α, β: NTC forces hierarchy.
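The simplest form of Merge lends itself to a very short sketch. It assumes that syntactic objects can be modeled as immutable sets, with immutability standing in for the No Tampering Condition; the helper names `contains` and `depth` are hypothetical and serve only to display hierarchy.

```python
def merge(alpha, beta):
    """Merge(alpha, beta) = {alpha, beta}: binary, unordered, and without a label."""
    return frozenset({alpha, beta})

def contains(so, part):
    """True if `part` is `so` itself or occurs anywhere inside it (containment)."""
    if so == part:
        return True
    return isinstance(so, frozenset) and any(contains(m, part) for m in so)

def depth(so):
    """Number of Merge layers above the most deeply embedded lexical item."""
    if not isinstance(so, frozenset):
        return 0
    return 1 + max(depth(member) for member in so)

inner = merge("read", "books")
outer = merge("will", inner)   # the new item is added at the edge, not inside `inner`
```

Here `merge("read", "books")` and `merge("books", "read")` are the same object, so the sketch encodes sisterhood and containment but no linear order, and `inner` survives unchanged as a member of `outer`.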

Merge is conceptually necessary. That is, “unbounded Merge or some equivalent is unavoidable in a system of hierarchic discrete infinity” because complex objects need to form without bound, so “we can assume that it ‘comes free’” (Chomsky 2005, 12; also 2006c, 137). The present formulation of Merge is the simplest since, according to Chomsky, anything more complex—for example, Merge forms the ordered pair 〈α, β〉—needs to be independently justified. The argument works even if it is suggested that, due to the physical conditions imposed on human sign systems, a specific order—linearity—is inevitable (Uriagereka 1999, 254–255). If the physical design forces linearity anyway, why should Merge specifically reflect that fact? The simplest design of the language faculty thus treats linear ordering of linguistic sequences as a property enforced by the sensorimotor systems since humans can process acoustic information only in a single channel; apparently, dolphins can process acoustic information in two channels simultaneously. The computational system itself does not enforce order; it enforces only hierarchy—that is, sisterhood and containment.7 The emergence of Merge signaled the “Great Leap Forward” in evolution.

Now, unless we make the special assumption that α and β in Merge (α, β) are necessarily distinct, β could be a part of α. Since special assumptions are “costly,” we allow the latter since it comes “free.” In that case, (Internal) Merge can put parts together repeatedly as long as other things are equal. The original part will appear as copies (= traces) conjoined to other parts:

  (119) The book seems [the book] to have been stolen [the book]

Here, displacement of the book from the original Object position just means that only one of the copies, that is, the leftmost one, is sounded for reasons of economy in the phonological component; others are left as covert elements to be interpreted by the C-I systems. Internal Merge thus functions as Move under copy theory of movement.

Merge and Syntax

Consider the effects of Merge in the system. As noted, linguistic information enters the computational system in the form of lexical features divided into three types: phonetic, formal, semantic. External Merge takes lexical features as inputs and constructs complex SOs obeying NTC; internal Merge sets up further relations within SO. These ideas enable us to redescribe the familiar phenomenon of dislocation. Following Chomsky 2002, I will give a brief and informal description (see Chomsky 2000b, 2000c, 2006c, for more formal treatment). To implement dislocation, three things are needed:

  (i) A target (= probe) located in a head that determines the type of category to be dislocated

  (ii) A position to be made available for dislocation

  (iii) A goal located at the category to be dislocated

By the inclusiveness condition, lexical information is all and the only information available to the computational system. Lexical information is distributed as properties of features. So, the preceding three requirements can be met if we can identify three features that have the relevant properties. In fact, there are three such features in a range of cases requiring dislocation.

For example, in some cases, the goal is identified by the uninterpretable Structural Case, the probe is identified by redundant agreement features, and the dislocable position (= landing site) is identified by the EPP (extended projection principle) feature. It is easy to see how the phenomenon represented in (118) can now be redescribed in probe-goal terms (Infl = Tense is the locus of EPP). The scheme provides a natural explanation for the existence of the EPP feature. Previously, the Extended Projection Principle—the Subject requirement typically satisfied by pleonastic there and it in English (see section—was considered “weird” because no semantic role is involved. Now we know what the role is: it is a position for dislocation, namely, the Subject position (Chomsky 2002; see Bošković 2007 for a different approach to EPP).
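The interplay of probe, goal, and EPP position can be sketched for the raising example (119). Several simplifications are assumed: structures are ordered tuples of the form (head, complement) so that they can be read off left to right, a goal is any DP whose Case is still unvalued, the valuation of Case and agreement is left out, and only the leftmost copy is sounded. The names `DP`, `find_goal`, `dislocate`, `pronounce`, and `count_copies` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DP:
    words: tuple
    case: Optional[str] = None   # None: structural Case not yet valued

def find_goal(so):
    """Search left to right for the first DP with unvalued Case."""
    if isinstance(so, DP):
        return so if so.case is None else None
    if isinstance(so, tuple):
        for part in so:
            found = find_goal(part)
            if found is not None:
                return found
    return None

def dislocate(phrase, has_epp=True):
    """If the head carries EPP, internally merge a copy of the goal at the edge."""
    goal = find_goal(phrase) if has_epp else None
    return phrase if goal is None else (goal, phrase)

def pronounce(so, seen=None):
    """Read the structure left to right, sounding only the first copy of each DP."""
    seen = set() if seen is None else seen
    if isinstance(so, DP):
        if so in seen:
            return []
        seen.add(so)
        return list(so.words)
    if isinstance(so, tuple):
        return [word for part in so for word in pronounce(part, seen)]
    return [so]

def count_copies(so, dp):
    """Count the occurrences of `dp` in the structure."""
    if isinstance(so, DP):
        return 1 if so == dp else 0
    if isinstance(so, tuple):
        return sum(count_copies(part, dp) for part in so)
    return 0

book = DP(("the", "book"))
embedded = dislocate(("to", ("have", ("been", ("stolen", book)))))
matrix = dislocate(("seems", embedded))
sounded = pronounce(matrix)
```

The resulting structure holds three copies of the book, matching the bracketing in (119), while `sounded` contains the phrase only once, at the left edge. Using ordered tuples is a convenience of the sketch; on the view described in the text, order is imposed by the sensorimotor systems and not by Merge.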

Figure 5.3 Merge in language

To see how the basic scheme extends to wh-movement and to illustrate the subtlety of research, I have followed Pesetsky and Torrego (2006) and Pesetsky (2007), who propose somewhat different ideas than Chomsky regarding valuation and interpretability. Setting many details (and controversies) aside, figure 5.3 displays the phenomenon of dislocation of which book in the structure (120); only relevant features for wh-movement are shown.

  (120) (I wonder) which book the girl has read.

In the diagram (adapted from Pesetsky 2007), (external) Merge generates the basic structure in six steps. According to Pesetsky and Torrego, the probe C—on analogy with T, which is typically unvalued and which hosts EPP—is viewed as containing an unvalued but interpretable wh-feature along with EPP; it thus seeks a matching valued feature to “share” the value of the latter. A valued (interrogative) but uninterpretable wh-feature is located in the goal as indicated. The EPP feature of C determines the position C′, which is the projection of the head C, for (internal) Merge to place a copy of the goal which book there.8 The dislocation of which book to the edge satisfies the “external” condition that which book is to be interpreted as a quantifier. In effect, conditions on meaning follow from the satisfaction of narrow conditions on computation.

Furthermore, the incorporation of Merge enforces a drastic reduction in computational complexity. As Chomsky (2005) explains, the G-B model required three internal levels of representation—D-Structure, S-Structure, and LF—in addition to the interface levels. This increases cyclicity. Intuitively, a syntactic “cycle” refers to the syntactic operations in a domain, where the domain is determined by a selection from the lexicon (see Boeckx 2006 for more). Now, the postulation of three internal levels in G-B required five cyclic operations on a selection from the lexicon: (i) the operations forming D-Structures by the cyclic operations of X-bar theory; (ii) the overt syntactic cycle from D- to S-Structure; (iii) the phonological/morphological cycle mapping S-Structure to the sound interface; (iv) the covert syntactic cycle mapping S-Structure to LF; and (v) formal semantic operations mapping LF compositionally to the meaning interface.

SMT suggests that all this can be reduced to a single cycle, dispensing with all internal levels; as a result, the distinction between overt and covert computation is given up. Incorporation of Merge enables the elimination of compositional cycles (i), (ii), and (iv). This leaves two mapping operations, (iii) and (v), to the interfaces. For these, Chomsky invokes the notion of a minimal SO called a “phase.” Phases denote syntactic domains constructed by Merge. In some cases, phases look like classical phrases such as CP and DP (see figure 5.3), in some others these are new objects such as “light-verb” phrases (vP) with full argument structure. Details (and controversies) aside, the basic point for SMT is that phases are all that the grammatical system generates and transfers to the interfaces; perhaps the same phase is transferred to both the interfaces.

Merge and Semantics

As noted, computations at the sound end of the system might pose problems for this coveted austere picture since phonological computation is thought to be more complicated than computation to the C-I interface (= narrow syntax): “We might discover that SMT is satisfied by phonological systems that violate otherwise valid principles of computational efficiency, while doing the best it can to satisfy the problem it faces” (Chomsky 2006c, 136).

The austere picture might still hold if we recall classical conceptions of language “as a means of ‘development of abstract or productive thinking’” with “communicative needs a secondary factor in language evolution” (Chomsky 2006c, 136).9 In contemporary terms, this means that the principal task of FL is to map syntactic objects to the C-I interface optimally; mapping to the SM interface will then be viewed as “an ancillary process.”10 SM systems are needed for externalization such as the ability to talk in the dark or at a distance; if humans were equipped with telepathy, SM systems would not be needed. Thus, SM systems have little, if anything, to do with the productivity of language. Viewing narrow syntax thus as the principal effect of FL, syntactic operations essentially feed lexical information into the C-I systems single cyclically (phase by phase), period.

Finally, the Merge-based mechanisms just sketched have interesting consequences, in line with SMT, for the meaning-side. First, the single-cyclic operation eliminates the level of representation LF/SEM; this is because, by (v) above, the existence of this level of representation will add another cycle. Second, Merge constructs two types of structures, each a type of phase, that are essentially geared to semantic interpretation. According to Chomsky, most of the grammatically sensitive semantic phenomena seem to divide into two kinds: argument structure and “elsewhere.” “Elsewhere” typically constitutes semantic requirements at the edge such as questions, topicalization, old/new information, and the like, as noted. These requirements seem to fall in place with what Merge does: as we saw, external Merge constructs argument structures, internal Merge moves items to the edge. At the sound end, internal Merge forces displacement.
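
The division of labor between the two modes of Merge can be made concrete with a small sketch. It is a toy model under stated assumptions, not a formalism drawn from the literature cited here: syntactic objects are represented as lexical items (strings) or two-membered sets, and the function names `external_merge`, `internal_merge`, and `contains` are hypothetical labels chosen for illustration.

```python
# Toy model (an illustrative assumption): a syntactic object is either a
# lexical item (a string) or a frozenset built from two syntactic objects.

def external_merge(a, b):
    """Combine two independent objects into the set {a, b}."""
    return frozenset({a, b})


def contains(obj, part):
    """Return True if `part` is `obj` itself or occurs somewhere inside it."""
    if obj == part:
        return True
    if isinstance(obj, frozenset):
        return any(contains(member, part) for member in obj)
    return False


def internal_merge(obj, part):
    """Re-merge a proper subpart of `obj` at its edge, yielding {part, obj}."""
    if obj == part or not contains(obj, part):
        raise ValueError("internal Merge needs a proper subpart of the object")
    return frozenset({part, obj})


# External Merge builds argument structure.
vp = external_merge("see", "what")          # {see, what}
clause = external_merge("you", vp)          # {you, {see, what}}

# Internal Merge places an item already in the structure at the edge.
question = internal_merge(clause, "what")   # {what, {you, {see, what}}}
```

In this sketch the displaced item is not removed from its original position; the resulting object simply contains it twice, once inside the argument structure and once at the edge, which is one simple way of picturing how a single operation can serve both kinds of semantic requirement.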

For our purposes, it is of much interest that, although the relevant format for the C-I systems is made available by Merge, many of the semantic phenomena handled in G-B are no longer covered inside the narrow syntax of MP. However, as argued above (chapter 3), some semantic computation must be taking place in mapping lexical information to the C-I interface; there is no alternative conception of what is happening in this cycle. In that sense, the narrow syntax of MP captures what may be currently viewed as “minimum” semantics. Given SMT, much of what fell under grammatical computation earlier—binding, quantifier scope, antecedent deletion condition, theta theory, and so on—can no longer be viewed as parts of FL proper. According to Chomsky (2002, 159), operations that are involved in these things—assignment of θ-roles to arguments to enforce the θ-criterion, free indexing to set up coindexing for Binding theory, and mechanisms for marking scope distinctions—“are countercyclic, or, if cyclic, involve much more complex rules transferring structures to the phonological component, and other complications to account for lack of interaction with core syntactic rules.”11

These systems are best viewed as located just outside FL, hence, at the edge of C-I systems, which, to emphasize, are performance systems: “It is conceivable that these are just the interpretive systems on the meaning side, the analogue to articulatory and acoustic phonetics, what is going (p.175) on right outside the language faculty” (Chomsky 2002, 159). These interpretive systems are activated when data for anaphora, thematic roles, and so on carry information for these procedures. It enables us to think of CHL itself as independent of these procedures.12

To speculate, although Chomsky has placed these systems “right outside” FL, it seems to me that they require a special location there. So far, the “outside” of FL at the semantic side was broadly covered by the entire array of the conceptual-intentional (C-I) systems. These comprised the conceptual system, the “pragmatic” systems giving instructions for topicalization, focus, and perhaps instructions for truth conditions; call the package “classical C-I.” The systems just expelled from FL seem to differ quite radically from classical C-I systems: (i) they enforce structural conditions like scope distinction and referential dependency; (ii) their description needs grammatical notions such as anaphors and wide scope; and, most importantly, (iii) few of them, if any, are likely to be shared with, say, chimpanzees, who are otherwise viewed as sharing much of the C-I elements with humans (Hauser, Chomsky, and Fitch 2002; Premack and Premack 2003; Reinhart 2006).

To capture the distinction between these and the classical C-I systems, let us call the former “FL-driven Interpretation” (FLI) systems. To follow Chomsky’s tale, classical C-I (and sensorimotor) systems were already in place when the brain of an ape was “rewired” to insert FL. In contrast, as we will see in section 5.2, although FLI systems are not language-specific, they are best viewed as linguistically specific; we do not expect every cognitive system requiring manipulation of symbols to have them. In other words, although FLI systems are invariant across specific languages such as English and Hopi, their application is restricted to the domain of human language. Given their linguistically specific nature, it is hard to think of FLI systems as existing in the ape’s brain prior to the insertion. In that sense, FLI systems belong to FL without belonging to CHL.

In the extended view of FL just proposed, FLI systems occupy the space between CHL and classical C-I.13 If this perspective makes sense, then, we may not view FLI systems as “external” in the sense in which classical C-I systems are external. In this sense, they are dedicated to language even if they are viewed as located outside of CHL. We are just going by a tentative list of these systems so far. As discussed in chapter 3, much of the preparatory structural work for the thought systems seems to take place here: the θ-criterion is satisfied, referential dependency is established, scope ambiguities are resolved, and so forth. What falls under (p.176) FLI is an empirical issue, but the conception ought to be reasonably clear. The proposed organization thus retains the spirit of G-B in that the information encoded at the output representation of FLI systems recaptures LF; however, in G-B, these “modules” were seen as operative in narrow syntax itself.

FLI systems thus (immediately) surround CHL. In doing so, they carve the path “interpretation must blindly follow” (Chomsky 2006a). CHL is a computational device that recursively transfers symbolic objects optimally to FLI systems. If CHL has optimal design, then we will expect CHL to transfer (mostly) those objects that meet the conditions enforced by FLI systems. For example, we will expect the structural conditions for establishing various dependencies in Binding theory, or scope distinctions for quantifiers, to follow from computational principles contained in CHL (Chomsky 2006a; 2006d; Rouveret 2008). FLI systems then enforce additional conditions—perhaps countercyclically—to establish sound-meaning correlations within grammar. In doing so, they generate structures to which classical C-I systems add content. From this perspective, most of the interesting work trying to articulate grammatically sensitive semantics of human language is in fact focused on FLI systems (Hinzen 2006; Reinhart 2006; Uriagereka 2008).

5.1.4 Economy Principles

Chomsky (1995b) extensively discussed two global ideas: least effort and last resort. As far as I can figure, these were discussed as “considerations” or “factors” that must be implemented somehow in the working of the computational system for human language; it is hard to locate an actual statement of them as principles on par with, say, the projection principle or the principles of binding.14 Be that as it may, the conceptual significance of these ideas is not difficult to appreciate. As the name suggests, last resort requires that syntactic operations—especially, operations effecting dislocation—are resorted to only when some syntactic requirement cannot be met by other means. This is because syntactic operations are “costly.” Least effort requires that, in case syntactic operations are to be executed, global preference is for executions that are least costly. The point is that, whatever be the specific articulation of these ideas, the minimalist program needs them in one form or another. Hence, subsequent controversies about the character of specific principles (noted below) do not affect the general requirement.

Although Chomsky examines a range of syntactic phenomena directly from these “considerations,” he does propose some specific principles that (p.177) appear to fall under the last resort and least effort categories. For example, Chomsky 1995b made extensive use of the principles of greed and procrastinate. Greed required that a syntactic operation, essentially Move, applies to an item α only to satisfy morphological properties of α, and of nothing else. This clearly is a last resort principle—Chomsky christened it “self-serving last resort”—which has a markedly descriptive character. It is eliminable if we simply require that Move works only under morphological considerations—that is, Move raises features which, in fact, is all that Move does. The idea is even more naturally incorporated in the probe-goal framework with (Internal) Merge, as we saw. Procrastinate, another last resort principle, required that covert operations to LF be delayed until overt operations to PF are completed because PF operations are “costly.” Skipping details, this had to do with a distinction between strong (phonological) features and weak features (for more, see Lasnik and Uriagereka 2005, 3.4). As the distinction was given up, so was procrastinate. In any case, as we saw, in the recent design there is no overt/covert distinction.

The point to note is that both of these last resort principles were cast in linguistic terms—morphological features and strong features—and both were eliminated soon after they were proposed. But the elimination of specific principles does not mean that the last resort condition has disappeared. As Boeckx (2006) shows for a variety of cases, the last resort condition is now deeply embedded in “local” syntactic mechanisms; for example, an element is no longer accessible for computation once its Case has been checked, which means that Case checking is a last resort. We can state this rather specific condition because it is a reflex of the general last resort condition. In this sense, the last resort idea remains central, but as a consideration or an effect to be implemented in syntax whenever required (Lasnik and Uriagereka 2005, 174–175).

As for least-effort principles, three principles stand out: Full Interpretation (FI), Minimal Link Condition (MLC), and Shortest Derivation Condition (SDC)—all drawn essentially from G-B. Although the general conceptions of last resort and least effort are clear enough, it is not obvious that specific principles fall neatly under them. Take Procrastinate. In the preceding discussion, Procrastinate was taken to be a last resort principle because it delayed (covert) computation to LF—that is, between PF- and LF-computation, LF-computation is the last one resorted to. By the same token, since Procrastinate allows PF-computation first, it can be viewed as a least effort principle that reduces the number of “costly” overt operations. This is reaffirmed as follows.

(p.178) Shortest Derivation Condition says that between two converging derivations, the one with fewer steps is preferred. It is then clearly a least-effort principle. However, Kitahara (1997) suggested that Procrastinate can be derived from the least effort SDC; it follows, in any case, that Procrastinate is not independently needed. Although the least effort spirit of SDC is well taken, it looks problematic as formulated since it requires that two or more converging derivations be compared for their length. Comparing derivations “globally” after they are over is a hugely costly affair not becoming of a least-effort principle. The natural solution is to basically block multiple derivations by keeping derivations short so that alternative derivations do not get the chance to branch out, as it were (Boeckx 2006). This is achieved by reducing syntactic domains and operations on them: cyclicity. In that sense, Merge-driven single-cyclic derivations by phase implement the spirit of SDC without invoking it; as a principle, SDC is not required.

This leaves the principles FI and MLC. As noted, Full Interpretation requires that no illegible objects appear at the interfaces (for more, see Lasnik and Uriagereka 2005, 105–106); Minimal Link Condition requires “shortest move” in that an element cannot move to a target if another element of the same category occurs between the first element and the target. MLC thus imposes a condition of relativized minimality—“relativized” because minimal links are defined with respect to specific categories such as WP. Both FI and MLC are clearly least-effort conditions. As with last resort and SDC, the question is if they are to be stated as specific principles. As Chomsky (1995b, 267–268) observes, if we state MLC as a principle, then it can only be implemented by inspecting whether another derivation has shorter links or not. Similar observations apply to FI if it is to be formulated as an output condition for choosing the most economical representation. In each case, the problem of globality arises.

As with last resort, the natural solution is to think of these conditions as enforced directly in the system. Thus, the reflex of MLC obtains by simply barring anything but shortest move (Reinhart 2006, 22); the reflex of FI obtains by rejecting structures with uninterpretable elements within CHL, as noted. Empirically, we know that these conditions on movement and representations are obeyed in a vast range of cases. To that extent, these conditions are empirical generalizations. If, however, evidence is located in which, say, the minimal link condition on movement is apparently violated, we try not to withdraw the least effort condition, but explain the anomaly by drawing on other factors (Boeckx 2006, 104). (p.179) Beyond empirical generalization, then, the least effort conditions act as general constraints on the system.
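
The contrast between comparing whole derivations and enforcing "shortest move" locally can be pictured with a minimal sketch. It assumes, purely for illustration, that the elements in a domain can be listed in order of structural closeness to the target and that each carries a single category label; the names `closest_goal` and `move_is_licit` and the sample items are hypothetical.

```python
def closest_goal(candidates, category):
    """Scan outward from the target and return the first element of `category`."""
    for name, cat in candidates:
        if cat == category:
            return name
    return None


def move_is_licit(candidates, mover):
    """Allow a move only if no element of the same category is closer to the target."""
    category = dict(candidates)[mover]  # assumes element names are unique
    return closest_goal(candidates, category) == mover


# Elements ordered by closeness to the target position (closest first).
domain = [("who", "wh"), ("John", "D"), ("what", "wh")]

print(move_is_licit(domain, "who"))   # True: nothing of category "wh" is closer
print(move_is_licit(domain, "what"))  # False: "who" intervenes
```

The check inspects a single domain once and never consults an alternative derivation, which is the sense in which the effect of the condition is obtained without stating it as a principle that requires global comparison.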

To sum up, it is reasonable to say that the last resort condition obtains in CHL in a general way. The least effort condition seems to obtain more specifically in three subconditions implementing the effects of SDC, MLC, and FI: these subconditions are restricted cyclicity, condition on movement, and condition on representations, respectively. To emphasize, neither the last resort nor the least effort condition is stated as a specific economy principle. During computation, we will expect the last resort and least effort conditions to obtain throughout to ensure that the computational system, CHL, meets the conditions of optimal design in its operations.

5.2 CHL and Linguistic Specificity

Although intricate and, sometimes, elaborate computations take place in CHL as information from the extremely complex lexicon of human languages enters the system, CHL itself consists of just Merge that operates under last-resort and least-effort conditions—apparently, nothing else.15 To put it differently, linguistic information is essentially stored in the lexicon and, working on it, CHL generates symbolic objects at the interfaces which are interpreted by the relevant external systems. As we saw, some of these systems are likely to be linguistically specific. Is the CHL itself—or, better, its components—linguistically specific?

I am raising this question because, once we reach the austere design of CHL under the Minimalist Program, it is difficult to dispel the intuition that the system seems to be functioning “blindly” just to sustain efficient productivity. There is a growing sense that, as the description of the human grammatical system gets progressively simple, the terms of description get progressively linguistically non-specific as well.

Let us say that a principle/operation P of a system Si is nonspecific if P makes no reference to Si-specific categories. Suppose that the collection of Ps is sufficient for describing a major component of Si for us to reach some nontrivial view of the entire system. Recall that, with respect to the language system, we have called such a principle a “purely computational principle” (PCP) earlier (section 2.2). It is the “purely computational” nature of the functioning of CHL that gives rise to the intuition of (the relevant notion of) nonspecificity.
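
The definition just given can be restated as a simple test on how a principle is formulated. The sketch below is only an illustration under stated assumptions: a principle is reduced to the set of category labels its formulation mentions, and the class `Principle`, the function `is_nonspecific`, and the particular labels in `LINGUISTIC_CATEGORIES` are hypothetical choices, not an inventory proposed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principle:
    name: str
    mentions: frozenset  # category labels the formulation refers to


# Assumed, partial list of categories specific to the language system.
LINGUISTIC_CATEGORIES = frozenset(
    {"anaphor", "pronominal", "r-expression", "NP", "Case", "theta-role"}
)


def is_nonspecific(principle, system_categories):
    """P is nonspecific for a system if it mentions none of that system's categories."""
    return principle.mentions.isdisjoint(system_categories)


full_interpretation = Principle(
    "Full Interpretation", frozenset({"element", "level of interpretation"})
)
binding = Principle(
    "Binding theory", frozenset({"anaphor", "pronominal", "r-expression"})
)

print(is_nonspecific(full_interpretation, LINGUISTIC_CATEGORIES))  # True
print(is_nonspecific(binding, LINGUISTIC_CATEGORIES))              # False
```

The test looks only at the formulation of a principle. Whether any system other than language actually requires a principle that passes it remains, as the next paragraph stresses, an empirical issue.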

Intuitively, to say that P is purely computational is to say that the character—and hence the formulation—of P is such that its application (p.180) need not be tied to any specific system Si. In that sense, P could be involved in a system Sj which is (interestingly) different from Si in which P was originally found. It could turn out of course that only Si has P since only it requires P even if its formulation is nonspecific—that is, it could be that there is no need for P anywhere else (but, in that case, the nonspecific character of P remains unexplained). So the idea really is that, if a computational system other than the language system required P, then P must be nonspecific; it does not follow from this statement alone that there are other computational systems requiring P. That is an empirical issue, but it interestingly opens up only when the collection of Ps in Si begins to look as if they are non-Si specific.

Until very recently, linguists, including Chomsky, held a very different view of the language system. The GLOW manifesto, which represents the guiding spirit and motivation of current linguistic work, states explicitly that “it appears quite likely that the system of mechanisms and principles put to work in the acquisition of the knowledge of language will turn out to be a highly specific ‘language faculty’” (Koster, Riemsdijk, and Vergnaud 1978, 342). In general, Chomsky had consistently held that, even if the “approaches” pursued in linguistic theory may be extended to study other cognitive systems, the principles postulated by the theory are likely to be specific to language: “There is good reason to suppose that the functioning of the language faculty is guided by special principles specific to this domain” (Chomsky 1980, 44; also Chomsky 1986, xxvi). Notice that this view was expressed during the G-B period that promoted a strongly modular view of the organization of language (Boeckx 2006, 62–66), as we saw. The point of interest here is that the idea of linguistic specificity was advanced for the principles and operations that constitute the computational system of human languages.

Nonetheless, I am asking whether the elements of FL are dedicated to language alone, or whether there is some motivation for thinking that significant parts of FL might apply beyond language. I am suggesting that the most reasonable way to pursue this motivation, if at all, is to focus on the combinatorial part of the system to ask whether some of the central principles and operations of this part could be used for other cognitive functions. Therefore, the term “CHL” is to be understood as a rigid designator that picks out a certain class of computational principles and operations, notwithstanding the built-in qualification regarding human language. However, so far, I am thinking of CHL as restricted to language and some other human cognitive systems, especially those that may be viewed as “language-like,” ones that are likely to require P under a first (p.181) approximation. In this formulation of the issue, the human specificity of these systems is not denied although the domain specificity of some of the central organizing principles of these systems is questioned.

The formulation arises out of the intuition that, besides language, there are many other cognitive domains where combinatorial principles seem to play a central role: arithmetic, geometry, music, logical thinking, interpretation of syntactic trees, maps, and other graphic representations (Casati and Varzi 1999; Roy 2007), to name a few. If the elements of FL are to be used elsewhere at all, it is quite likely that they reappear in some of these domains; that is the step of generalization I have in mind. In a related way, the proposal might enable us to make better sense of the architecture of the language faculty, sketched earlier, in which domain-specific FLI systems are viewed as separate from the core computational system itself. If language is a distinct cognitive domain we will expect some linguistically specific effects to cluster somewhere while the computational system itself effects bare productivity in other domains as well. For that to happen, the computational system itself needs to be linguistically nonspecific.

To my knowledge, there has been little discussion on whether and to what extent the principles actually discovered in the study of language can be extended to other cognitive domains. Clearly, the issue under discussion here arises only for those cognitive domains for which a fairly abstract body of principles is already in hand. In other words, if the principles postulated for a cognitive domain are too directly tied to the phenomena they cover, then their very form will resist generalization across phenomenal domains. For example, questions of generalization could not have been interestingly asked for the system of rules discussed in the Aspects model of language (Chomsky 1965). For the cognitive domains under consideration here, it is generally acknowledged that a sufficiently abstract formulation has been reached, if at all, only for a few cognitive domains including language, and that too very recently. Thus, given the lack of sufficient advance in studies on other “language-like” cognitive domains, the question that concerns us here has not been routinely asked.

Postponing Chomsky’s current and very different views on this issue to chapter 7, I will propose that a significant component of the language system, under suitable abstractions, consists wholly of purely computational principles. The proposal requires a study of the organization of grammar, principle by principle, to see if it is valid. Since we have just traced the development of grammatical theory, it seems to me that this is the right place (while grammatical theory is still fresh in our minds) to pursue the (p.182) proposal in terms of a quick review of what we have seen so far. I will discuss the significance, including empirical motivation, of the issue in the chapters that follow.

To recapitulate (section 2.2), we may think of four kinds of rules and principles that a linguistic theory may postulate: language-specific rules (LSR), construction-specific rules (CSR), general linguistic principles (GLP), and purely computational principles (PCP). It is obvious that, for the issue of nonspecificity, only PCPs count. From that point of view, the four kinds of rules basically form two groups: linguistically specific (LSR, CSR, GLP), and linguistically nonspecific (PCP). If PCP is empty, then the language system is entirely specific. If PCP is nonempty but “poor,” then the issue of nonspecificity is uninteresting beyond language. Thus, the real question is: is PCP rich? In other words, how much of the working of CHL can be explained with PCPs alone?

As we saw, the principles-and-parameters framework (P&P) postulates that rules of the first two kinds, that is, LSR and CSR, may be totally absent from linguistic theory. In these terms, a linguistic theory under the P&P framework postulates just two kinds of principles, GLP and PCP. However, it is clear that just the framework is not enough for our purposes, since the framework allows both GLP and PCP. Therefore, unless a more abstract scheme is found within the P&P framework in which PCPs at least predominate, no interesting notion of nonspecificity can emerge. The issue obviously is one of grades: the more PCPs there are (and the fewer GLPs) in CHL, the more nonspecific it is. The task then is to examine the short internal history of the P&P framework itself to see if a move towards progressively PCP-dominated conceptions of CHL can be discerned. As noted, CHL has two components: some principles and one operation. I discuss these in turn.

5.2.1 Principles

Recall the organization of grammar in G-B theory schematically represented in figure 2.1. The following principles are postulated in that grammar: Projection principle, X-bar, θ-criterion, Case filter, principles of Binding, empty category principle, subjacency, chain condition, and Full Interpretation, among others. Let us now see how the principles postulated by G-B theory fall under the suggested categories of GLP and PCP. The classification is going to be slightly arbitrary; we will see that it will not affect the general argument.

The projection principle stipulates that lexical information is represented at all syntactic levels to guarantee that input information may not (p.183) be lost to the system. Any computational system requires that none of the representations that encode information are lost to the system until a complete interpretation is reached. However, the formulation of the projection principle mentions the linguistic notion of lexical information. This suggests an intermediate category of principles; call it “quasi-PCPs” (Q-PCP): linguistically specific in formulation, but PCP in intent.

X-bar is a universal template, with parametric options, that imposes a certain hierarchy among syntactic categories. Again, it stands to reason that any computational system will require some notion of hierarchy if a sequence of its elements is to meet conditions of interpretation. Still, it is not obvious that every symbol system must have the rather specific hierarchy of specifiers, heads and complements captured in X-bar theory. In that sense, the principle falls somewhere between GLP and Q-PCP. Given the uncertainty, let us assume the worst case that X-bar theory is GLP.

θ-theory seems linguistically specific in that it is exclusively designed to work on S-selectional properties of predicates. The θ-criterion (“each argument must have a θ-role”), the main burden of this theory, is phrased in terms of these properties. But what does the criterion really do, computationally speaking? As we saw (section 2.3), two kinds of information are needed to precisely determine the relations between the arguments projected at d-structure: an enumeration of arguments, and the order of arguments (Chomsky, Huybregts, and Riemsdijk 1982, 85–86). Thinking of thematic roles as lexical properties of predicates, the θ-criterion checks to see if elements in argument position do have this lexical property. To the extent that the θ-criterion accomplishes this task, it is a PCP. Yet, as noted, it is phrased in GLP-terms. In my opinion, it ought to be viewed as Q-PCP.

The Case filter (“each lexical NP must have Case”), the main burden of Case theory, is also linguistically specific in exactly the same way: it cannot be phrased independently of linguistically specific properties. Yet, as for the θ-criterion, the Case filter serves a purely computational purpose to check for the ordering part of the set of arguments; as we saw, the system does not care which lexical NP has which Case as long as it has a Case. In that sense, it is a Q-PCP as well.

Binding theory explicitly invokes such linguistically specific categories as anaphors, pronominals, and r-expressions to encode a variety of dependency relations between NPs. It is implausible to think of, say, musical quantifiers, anaphors and pronominals, just as it makes no sense to look for Subject-Object asymmetries in music. Notice the problem is not that other symbol systems may lack dependency relations in general; they (p.184) cannot. The issue is whether they have relations of this sort. Similar remarks apply to the Empty Category Principle (ECP). These are then GLPs.

This brings us to the principle of Full Interpretation (FI) and Bounding theory. Bounding theory contains the Subjacency principle that stipulates the legitimate “distance” for each application of Move-α. These distances are defined in terms of bounding nodes, which in turn are labeled with names of syntactic categories such as NP or S. Abstracting over the particular notion of bounding nodes, it is an economy principle that disallows anything but the “shortest move” and, as such, it is not linguistically specific; it is Q-PCP. Finally, the principle of Full Interpretation does not mention linguistic categories at all in stipulating that every element occurring at the levels of interpretation must be interpretable at that level; in other words, all uninterpretable items must be deleted. FI then is PCP.

Notice that most of the principles cluster at the inner levels of representation: d-structure and s-structure. Moreover, the principles discussed are a mixed bag of GLPs, Q-PCPs, and PCP; predominantly Q-PCPs, in my opinion. In this scenario, although PCP is nonempty, it is poor; hence, the system is not really nonspecific. But the predominance of Q-PCPs, and the relatively meager set of GLPs, suggests that there are large PCP-factors in the system which are concealed under their linguistic guise. If these factors are extracted and explicitly represented in the scheme, G-B theory can turn into one that is more suitable for nonspecificity. I will argue that the scheme currently under investigation in the Minimalist Program may be profitably viewed in that light.

As noted, the Minimalist Program is more directly motivated by the assumption that FL has optimal design. Two basic concepts, legibility conditions and conceptual necessity, are introduced to capture this assumption. On the one hand, we saw that the intermediate levels of d- and s-structures are eliminable from the system on grounds of conceptual necessity. On the other, we saw that most of the complex array of principles of G-B theory was clustered on these inner levels. With the elimination of these levels, MP enforced drastic reordering of principles.

Recall that we viewed X-bar theory, Binding theory, and ECP as GLPs. CHL, the computational system in MP, (arguably) does not contain any of them. Further, the projection principle, Subjacency, θ-theory, and Case theory were viewed as Q-PCPs. While the projection principle as such is no longer required in the system, Case theory is basically (p.185) shifted to the lexicon. θ-theory, Binding theory, and ECP are shifted to an external cluster of FLI systems, as we saw.

This leaves Subjacency, a Q-PCP. Q-PCPs are essentially PCPs under linguistic formulation. This raises the possibility, as noted, that just the PCP-factor may be extracted out of them, and explicitly represented in the system. The MP principle Minimal Link Condition (MLC) serves that purpose with respect to Subjacency. Similar remarks apply to the G-B condition on chains. This condition is first replaced by an economy condition called the Shortest Derivation Condition (SDC), which requires that, in case there are competing derivations, the derivation with the least number of steps is chosen (Chomsky 1995b, 130, 182); as we saw, the condition was then implemented directly by restricting the domain of operation. Thus, the only G-B principle which is fully retained for the CHL in MP is Full Interpretation (FI), a PCP.

In sum, insofar as the G-B principles are concerned, all linguistically specific factors have been either removed from the system in MP, or they have been replaced by economy conditions. As I attempted to show, all the principles of MP have been factored out of those of G-B—that is, no fundamental principle has been added to the system in MP. The general picture, therefore, is that the CHL in MP is predominantly constituted of PCPs.
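
The review in this section can be tabulated for quick reference. The sketch below only records the classifications and outcomes stated in the preceding paragraphs; the tuple layout and the variable names are assumptions made for illustration, and the G-B condition on chains is left out because the text does not assign it to a category.

```python
from collections import Counter

# (G-B principle, category assigned in the text, outcome in MP as described in the text)
GB_PRINCIPLES = [
    ("Projection principle", "Q-PCP", "no longer required"),
    ("X-bar theory", "GLP", "arguably not in CHL"),
    ("theta-criterion", "Q-PCP", "shifted to FLI systems"),
    ("Case filter", "Q-PCP", "shifted to the lexicon"),
    ("Binding theory", "GLP", "shifted to FLI systems"),
    ("Empty Category Principle", "GLP", "shifted to FLI systems"),
    ("Subjacency", "Q-PCP", "PCP-factor extracted as MLC"),
    ("Full Interpretation", "PCP", "retained in CHL"),
]

# How many principles fall under each category in G-B.
by_category = Counter(category for _, category, _ in GB_PRINCIPLES)

# Which principles survive unchanged inside CHL in MP.
retained = [name for name, _, outcome in GB_PRINCIPLES if outcome == "retained in CHL"]

print(dict(by_category))  # {'Q-PCP': 4, 'GLP': 3, 'PCP': 1}
print(retained)           # ['Full Interpretation']
```

On this tally none of the principles classed as GLPs remains inside CHL, which is the pattern the summary above describes.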

The preceding discussion of MP is not exhaustive. Let us also grant that the rendition of some of the individual principles and operations, regarding the presence or absence of linguistically specific factors in them, could be contentious. Yet, plainly, when compared to the G-B framework, the overall picture is one of greater generality and abstraction away from linguistic specificity. Recall that the only issue currently under discussion is whether we can discern a progressively PCP-dominated conception of CHL.

5.2.2 Displacement

A variety of objections may be raised against the picture. A general objection is that, granting that successive phases of linguistic theory do show a movement from GLPs to PCPs, PCPs are to be understood in the context of linguistic explanation (only).

The objection is trivially true if its aim is to draw attention to a certain practice. There is no doubt that these PCPs were discovered while linguists were looking only at human languages. We need not have entered the current exercise if someone had also discovered them in the course of investigating music or arithmetic. But the future of a theoretical framework need not be permanently tied down to the initial object of investigation. As Chomsky observed in the past, a sufficiently abstract study of a single language, say, Hidatsa, can throw light on the entire class of human languages; hence, on FL. This observation cannot be made if it is held that the non-Hidatsa-specific principles that enter into an explanation of Hidatsa cannot be extended to Hindi because Hindi was not in the original agenda.

No doubt, the laws and principles postulated by a theory need to be understood in their theoretical context. For example, the notions of action and reaction as they occur in Newton’s force-pair law (“every action has an equal and opposite reaction”) have application only in the context of physical forces even if the law does not mention any specific system. We cannot extend its application to, say, psychological or social settings such as two persons shaking hands. Global limits on theoretical contexts, however, do not prevent theoretical frameworks from evolving and enlarging within those limits. The force-pair law does not apply to social situations, but it does apply to a very large range of phenomena, perhaps beyond Newton’s original concerns in some cases. For instance, the law has immediate application in static phenomena like friction, but it also applies to dynamical phenomena such as jet propulsion. So the question whether principles of CHL apply to other cognitive systems is more like asking whether the force-pair law applies to jet propulsion, rather than to people shaking hands. The burden is surely on the linguist now to tell us what exactly the boundaries of the linguistic enterprise are.

A specific objection to the picture arises as follows. As we saw in some detail earlier, human languages require that sometimes an element is interpreted in a position different from where it is sounded. John and the book receive identical interpretations in markedly different structures such as John read the book and the book was read by John. It is the task of a transformational generative grammar to show the exact mechanism by which the element the book moves from its original semantic position to the front of another structure without altering semantic interpretation. A basic operation, variously called Move-α or Affect-α in G-B, and Move, Attract, or Internal Merge in MP, implements displacement. We saw all this.

Now the objection is that nothing is more linguistically specific than the phenomenon just described. A major part of CHL is geared to facilitate instances of movement in an optimal fashion. Thus, even if the requirement of optimality leads to PCPs, the reason why they are there is essentially linguistic. In that sense, the phenomenon of displacement could be viewed as blocking any clear conception of nonspecificity of the computational system. To contest this, I will outline a number of directions to suggest that the issue of displacement (hopefully) breaks down into intelligible options that are compatible with the general picture of nonspecificity.

First, suppose displacement is specific to human languages. In that case, the general picture will not be disturbed if the phenomenon is linked to other linguistically specific aspects of the system. From one direction, that seems to be the case. We saw that the lexicon, which is a collection of features, is certainly linguistically specific in the sense under discussion here. One of the central ideas in MP is that the lexicon contains uninterpretable features such as Case. Since the presence of these features at the interfaces violates FI, CHL wipes them out during computation. The operation that executes this complex function is Move. Move is activated once uninterpretable features enter CHL; displacement is entirely forced by elements that are linguistically specific.16 There are several ways of conceptualizing this point within the general picture.

If Move is an elementary operation in the system, then we may think of this part of the system as remaining inert until linguistically specific information enters the system. The rest of CHL will still be needed for computing non-linguistic information as Merge is activated to form complex syntactic objects. In effect, only a part of the system will be put to general use and the rest will be reserved for language. Chomsky (1988, 169) says exactly the same thing for arithmetic: “We might think of the human number faculty as essentially an ‘abstraction’ from human language, preserving the mechanism of discrete infinity and eliminating the other special features of language.” Alternatively, Move may not be viewed as an elementary operation but a special case of existing operations. There are suggestions in which Move is viewed as specialized Merge (Kitahara 1997; Chomsky 2001a). As we saw in some detail, an even simpler view is that Move is simply internal Merge. So if you have (external) Merge, you have (internal) Merge for free.
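The last point, that internal Merge comes for free with Merge, can be made concrete in a toy sketch. The sketch assumes, for illustration only, that Merge simply forms the unordered set of its two arguments, with labels, features, and copies left out; the helper names `merge`, `contains`, and `kind`, and the sample items, are hypothetical.

```python
def merge(a, b):
    """Form the unordered set {a, b}. Labels and features are ignored here."""
    return frozenset([a, b])


def contains(a, b):
    """Return True if b is a proper part of the syntactic object a."""
    if not isinstance(a, frozenset):
        return False  # a lexical item has no parts in this sketch
    return any(part == b or contains(part, b) for part in a)


def kind(a, b):
    """Classify an application of merge without changing the operation itself."""
    return "internal" if contains(a, b) or contains(b, a) else "external"


# External Merge: the two objects are independent of each other.
dp = merge("the", "book")
vp = merge("read", dp)

# Internal Merge: dp is already a part of vp, and the very same
# function applies to the pair.
displaced = merge(dp, vp)
```

The point of the sketch is that `merge` is never modified: whether an application counts as external or internal depends only on whether one argument already occurs inside the other, so no separate Move operation has to be stipulated.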

Second, we may ask whether displacement in fact is linguistically specific. We saw a CHL-internal reason for displacement triggered by uninterpretable features. However, there is another reason for displacement. As noted, external systems impose certain conditions on the form of expressions at the interfaces. For example, (efficient) semantic interpretation often requires that items be placed at the edge of a clause to effect a variety of phenomena such as topicalization, definiteness, and the like. The elimination of uninterpretable features takes an element exactly where it receives, say, a definiteness or quantifier interpretation.

Given that linguistic notions such as topicalization, definiteness, and so on—“edge” phenomena—are viewed as special cases of more general notions such as focus, highlight, continuity, and the like, could it be that the external systems that enforce these conditions are not themselves linguistically specific, at least in part? If yes, then these parts could be viewed as enforcing conditions on structures which are met in different ways by different cognitive systems in terms of the internal resources available there. For example, language achieves these conditions by drawing on uninterpretable features specifically available in the human lexicon; as we will see, music could be enforcing similar deleting operations with the “unstable” feature of notes that occur in musical progression. This will make the implementation of displacement specific to the cognitive system in action; but the phenomenon of displacement need not be viewed as specific to any of them. In any case, the issue seems to be essentially empirical in character; we just need to know more.


(1.) If this evidence holds, then the observation in Lasnik and Uriagereka 2005, 2, that CHL was an “aspect” in the classical theories in generative grammar is misleading. It misses the radical departure proposed in MP.

(2.) Thanks to Norbert Hornstein and Bibhu Patnaik for a number of suggestions on this section. Needless to say, I am responsible for the remaining mistakes.

(3.) The sketch of MP presented here is obviously sanitized; it ignores turbulent debates over each G-B principle for over a decade through which an outline of MP slowly emerged. In fact, some G-B principles were found to be wrong, not just dispensable. X-bar theory could be one of them. Presence of SVO structures in supposedly SOV languages, the striking example of polysynthetic languages (Baker 2001), and so on suggested that X-bar may not even be true. Thanks to Wolfram Hinzen (personal communication) for raising this. Also, see the illuminating discussion of phrase-structure theory in Hinzen 2006, 180–193.

(4.) Formally, it could be β as there is nothing to distinguish between a and b at the first step of derivation since both are LIs. We suppose that one of the products of Merge will be “deviant” and thrown out. Notice the problem disappears after the first step for the concerned lexical items (Chomsky 2006a); it will reappear, if two new LIs are to be merged as in the tree below (figure 5.3).

(5.) See Boeckx 2006, chapter 5, for a glimpse of the range of controversies.

(6.) The postulation of unambiguous paths has a variety of consequences for virtually every component of the system. Chomsky (2006c) observes that this “limitation” could be involved in minimal search within a probe-goal framework, requirement of linearization, conditions of predicate-argument structure and others. (See below.)

(7.) This is not to trivialize the complex problem of how linear order is in fact obtained in sound—that is, outside CHL.

(8.) Thanks to David Pesetsky (personal communication) for help.

(9.) Although I have cited from a recent paper, the general idea appears repeatedly in Chomsky 1995b.

(10.) Hauser 2008 finds some indirect support for this view from cross-species studies.

(11.) See Reinhart 2006 for differences in computational mechanisms within the (narrow) language system and the systems just outside it leading to complicated “interface strategies.”

(12.) One consequence, among many, is that the operation (External) Merge can no longer be viewed as taking place at predesignated θ-positions.

(13.) If we view the system of I-meanings, which enter into computation (chapter 4), as another layer before classical C-I, the conception of FL would be even broader.

(14.) Chomsky (1995b, 261) does propose alternative formulations for a principle of last resort. But this was largely a rhetorical device to reject them all.

(15.) What follows is a quick summary of Mukherji 2003a; see this paper for details. I am indebted to Taylor and Francis Publishers for permission to use the material.

(16.) There seem to be some exceptions involving long-distance agreement to this general phenomenon, currently under study; see Chomsky 2006c for a powerful explanation.