
ALEXIS MANASTER-RAMER AND MICHAEL B. KAC

THE CONCEPT OF PHRASE STRUCTURE

0. INTRODUCTION

One of the perennial issues in the study of human language, debated especially in the mid fifties, and again in this decade, has been the adequacy of phrase structure as a model of syntax. In this paper, we shall argue that, from the very beginning, there has been confusion about exactly what theory was under discussion, so much so as to cast doubt on the substantive nature of this issue. Nevertheless, we contend that there are real questions concerning the power and the equivalence of various models that have been referred to as phrase structure grammars. In particular, this is so in regard to a largely tacit conception of phrase structure that has emerged within current syntactic theory and which appears to be distinct from any of the models explicitly bearing this label. We will seek to capture this elusive notion by offering an unformalized but rather detailed description of a class of grammars which seems to correspond to it. We will show that this class of grammars is unrestricted as to weak generative capacity but that it is severely restricted in its ability to analyze certain kinds of syntactic patterns. While these patterns appear to be rather common in human languages, our goal will not be to argue about the validity of this particular model of syntax but rather to clarify just what it is that needs to be argued about.

1. THE MANY MEANINGS OF 'PHRASE STRUCTURE'

The story of phrase structure begins with the introduction, by Bloomfield (1914, 1933), of the notion that the syntactic structure of a sentence could in most cases 1 be exhaustively specified by analyzing it into two or more immediate constituents (ICs) and then subjecting each of these to similar analysis until the level of ultimate constituents (that is, morphemes) is

1 It should be noted that, strictly speaking, Bloomfield's IC theory was not meant to cover all syntactic phenomena. He dealt separately with pro-forms ('substitution') and with sentences as opposed to clauses and phrases, i.e., "[w]hen a form is spoken alone (that is, not as a constituent of a larger form) [and] appears in some sentence-type" (1933, p. 169). For some of the background of Bloomfield's ideas, see Percival (1966).

Linguistics and Philosophy 13: 325-362, 1990. © 1990 Kluwer Academic Publishers. Printed in the Netherlands.


reached. This idea has had a variety of distinct formulations and has proved exceptionally durable. It was one of the hallmarks of American structuralism, and has continued to serve as the core of most syntactic models that have emerged since, such as transformational and unification grammars. While even the necessity of this type of analysis has from time to time been questioned (see, e.g., Hudson 1976, 1984), it is the question of its sufficiency that has arisen as a major concern in syntactic theory.

The issue of whether the syntax of human languages is amenable to description in these terms has on at least three separate occasions focused the attention of syntacticians. First, the IC model was subjected to fierce criticism by Chomsky (1956, 1957, 1961, 1962a, 1962b, and passim), who argued for a more complex formalism for human language syntax, namely, transformational grammar. Chomsky's critique was based on a purported formalization of IC analysis in terms of string-rewriting systems referred to as phrase structure grammars (PSG's), and it consisted in a series of arguments designed to show that such grammars were inherently incapable of supporting theoretically adequate analyses of certain aspects of human language syntax. In certain works, these were termed 'constituent structure grammars' (Chomsky 1962, 1963; Chomsky and Miller 1963). This usage was followed by Postal (1964a) and Harman (1963), but Chomsky (1965) and Harman (1966) reverted to 'phrase structure grammar'.2

There have been many rejoinders to Chomsky's arguments, but only on two occasions have full-scale explicit attempts been made to show that PSG could serve as an adequate generative model of human language syntax. The first was the work of Harman (1963), whose "discontinuous-constituent phrase-structure grammar with subscripts and deletes" incorporated, as the description indicates, several formal devices which were barred by Chomsky's definition.3 This defense of phrase structure was rejected by Chomsky (1965, 1966), who argued that the augmentations resulted in a model that "has no more connection with phrase structure grammar than antelopes have with ants" (1966, p. 41). While Harman's second article includes a response to Chomsky's 1965 critique (pp. 290-92),4 the debate went no further at the time, and Chomsky's position

2 To add to the confusion, Harman (1963) used the term 'phrase structure grammar' to refer not to Chomsky's formalism but to the IC model of the structuralists. This usage was withdrawn in the 1966 article.
3 Harman's work was derived in part from that of Yngve (1960), where discontinuous constituents had also been used.
4 We are grateful to Professor Harman for calling our attention to his 1966 paper. An interesting point that we will discuss below is that in this paper he purports to argue for the adequacy of PSG as defined by Chomsky, even though he reiterates his earlier position that this definition did not do justice to the IC model.


was widely accepted. In fact, the issue seemed to have been settled until the case for phrase structure was reopened with the advent of generalized phrase structure grammar (GPSG). The necessity of transformational rules was again called into question, and the possibility of a purely phrase structure account of human languages was heralded once more (Gazdar, 1981, 1982; Pullum and Gazdar, 1982; Gazdar et al., 1985).

This latest development provides us with a fresh case of the general (and often difficult) problem of deciding when two outwardly different theoretical models are in fact equivalent, so that arguments for or against one necessarily apply equally to the other. The history of the phrase structure controversy is especially enlightening in this regard, and we feel that its lessons should inform any debate about such claims of equivalence which may arise in the future. The importance of the phrase structure case also stems from the inherent interest that attaches to a strong and controversial theoretical claim: There is widespread agreement that a convincing demonstration that human language syntax is - or is not - amenable to phrase structure analysis would have significant empirical consequences. Yet, at the same time, Chomsky was surely right in his admonition that nothing of substance is to be gained by redefining existing terminology so as to obscure the subject of a debate, and the question of whether in fact the issues are truly substantive must be squarely faced.

Most of the discussion of phrase structure since Chomsky's original critique has focused on context-free grammar (CFG). While it has always been clear that there exist non-CF phrase structure grammars, these have receded into the background. In particular, on each of the two occasions when the phrase structure approach to syntax has been advocated, the specific models proposed have been equivalent to CFG's in weak generative capacity and have been explicitly advertised as notational variants of CFG's. Since the debate began with Chomsky's critique of the IC model, however, it becomes crucial to establish whether this model was itself merely a notational variant of CFG, and so we need to be clear, when making statements about PSG, whether these are about

(i) the immediate constituent model of syntax, assuming that this was a single, coherent model, or

(ii) the class of context-free grammars.5

5 It should be noted that, while modern definitions of CFG follow Bar-Hillel et al. (1961) in permitting null productions (or, more loosely, deletions), Chomsky has consistently (e.g.,


Even if the two are extensionally equivalent, this can only be decided by carefully contrasting the outwardly (intensionally) different notations. Moreover, the fidelity of this formalization was called into question by Harman, and indeed, as noted by Postal (1964, Chap. VI), it was true until rather recently that: "The majority of linguists are unconvinced that the theory of phrase structure correctly represents the morphosyntactic ideas with which most modern linguists work".6 Indeed, we will also argue that many aspects of Bloomfieldian IC theory were absent from Chomsky's grammars.

Before discussing these issues, however, we need to point out that Chomsky's attempts at formalizing IC analysis involved a broader class of formal grammars than the CFG's. When Chomsky introduced PSG's, he defined them so as to allow the use of context (1956 [1965, pp. 111-112]):

A phrase-structure grammar is defined by a finite vocabulary (alphabet) Vp, a finite set Σ of initial strings in Vp, and a finite set F of rules of the form X → Y, where X and Y are strings in Vp. Each rule is interpreted as the instruction: rewrite X as Y. For reasons that will appear directly, we require that in each such [Σ, F] grammar

(18) Σ: E1, ..., En; F: X1 → Y1, ..., Xm → Ym.

Yi is formed from Xi by the replacement of a single symbol of Xi by some string. Neither the replaced string nor the replacing string may be the identity element.
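The restriction in this definition can be made concrete with a small sketch (our own toy rules, not an example from the paper): each rule must replace exactly one symbol of its left-hand side by a non-null string, so in particular deletions are barred.

```python
# Sketch of a [Sigma, F] rewriting system in the sense of the definition
# above: each rule replaces a single symbol of its left-hand side by a
# non-null string.  The toy rules are hypothetical, for illustration only.

def is_admissible(lhs, rhs):
    """True iff rhs is obtained from lhs by replacing exactly one symbol
    of lhs with a non-empty string (no deletions, no identity rewrites)."""
    i = 0                                   # longest common prefix
    while i < len(lhs) and i < len(rhs) and lhs[i] == rhs[i]:
        i += 1
    j = 0                                   # longest common suffix after it
    while (j < len(lhs) - i and j < len(rhs) - i
           and lhs[len(lhs) - 1 - j] == rhs[len(rhs) - 1 - j]):
        j += 1
    return len(lhs) - i - j == 1 and len(rhs) - i - j >= 1

rules = [(("S",), ("A", "B")),        # S -> A B
         (("A", "B"), ("A", "b")),    # rewrite B as b in the context A __
         (("A", "b"), ("a", "b"))]    # rewrite A as a in the context __ b

def derive(string, rules):
    """Apply the first applicable rule, leftmost, until none applies."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for k in range(len(string) - len(lhs) + 1):
                if tuple(string[k:k + len(lhs)]) == lhs:
                    string = string[:k] + list(rhs) + string[k + len(lhs):]
                    changed = True
                    break
            if changed:
                break
    return string
```

Here `derive(["S"], rules)` runs the derivation S => A B => A b => a b, and `is_admissible` rejects a rule such as A -> (empty), reflecting the bar against the identity element.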

This class of grammars, which requires that at each step in the derivation a single symbol in some context be rewritten as a non-null string, was called type-1 in Chomsky (1959), but was subsequently renumbered type-2 in Chomsky (1963).7 In addition, Chomsky has called these grammars context-restricted (1961) or context-sensitive (1963 and ever since).

However, there has apparently been some confusion about this point, as evidenced by the fact that the term phrase structure grammar is often

1956, 1959) defined CFG's, like CSG's, so as to exclude this possibility. Also, regular expressions have never been allowed by Chomsky to appear on the right-hand sides of CF productions.
6 Postal (p. 72) specifically cites Pike (1954-60, III, p. 36) and Hockett (1961, p. 230).
7 Of course, in the 1959 numbering scheme, type-2 referred to context-free grammars, which became type-4 in 1963. The 1961 type-1 grammars are a proper superset of the CSG's, defined by the requirement that the right-hand side of each rule not be shorter than the left-hand side. There is no difference in weak generative capacity between these two classes, and the distinction between them will not play any role in the subsequent discussion here, though it behooves us to note that many theoretical computer scientists use the term context-sensitive grammar to refer to the broader class (e.g., Hopcroft and Ullman, 1979, p. 223).


- in theoretical computer science, almost always (e.g., Harrison, 1978, p. 13; Hopcroft and Ullman, 1979, p. 220; Berwick and Weinberg, 1982, p. 168, n. 3) - used to denote type-0 (unrestricted) grammars, which generate all the recursively enumerable languages, a proper superset of the context-sensitive languages.8 The reason seems to be that, while Chomsky (1956) clearly only dealt with CSG's, his later works sometimes offer informal descriptions of PSG's that leave out crucial conditions (such as the constraint against null replacement, e.g., 1957, p. 29) or would present the formal definition of a 'grammar' (i.e., a type-0 grammar) either immediately after or even in the midst of an extended prose discussion of PSG's, without emphasizing the distinction (e.g., 1959 [1965, pp. 128-29]). Since, moreover, type-0 grammars were not even considered in the 1956 paper, it might have seemed that, in his later works, Chomsky was in fact broadening the scope of the term phrase structure grammar to include them.

For example, (e.g., 1959 [1965, p. 128]):

A phrase structure grammar consists of a finite set of 'rewriting rules' of the form φ → ψ, where φ and ψ are strings of symbols. It contains a special 'initial' symbol S (standing for 'sentence') and a boundary symbol # indicating the beginning and end of sentences.

Since nothing is said about the relative lengths of φ and ψ, this appears to refer to type-0 grammars. In subsequent passages additional restrictions on the form of rules are introduced, which do refer to CSG (pp. 128-29):

If appropriate restrictions are placed on the form of the rules φ → ψ (in particular, the condition that ψ differ from φ by replacement of a single symbol of φ by a non-null string), it will always be possible to associate with a derivation a labeled tree in the same way.

But on a first reading, it may not be clear that Chomsky is still defining PSG's in general rather than a special subclass of these. Only a careful reading rules this out, as does the explicit definition of constituent-structure grammars in Chomsky (1961, p. 9), which reiterates the constraints stated in the 1956 paper:

Suppose that each syntactic rule φ → ψ meets the additional condition that there is a single symbol A and a non-null string w such that φ = x1Ax2 and ψ = x1wx2. ... A set of rules meeting this condition I will call a constituent structure grammar. If in each rule φ → ψ, φ is a single symbol, the grammar (and each rule) will be called context-free; otherwise, context-restricted.
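The 1961 definition just quoted lends itself to a mechanical check. The following sketch (our own illustration, with rules given as tuples of symbols) classifies a rule φ → ψ as context-free, context-restricted, or inadmissible under the definition:

```python
# Classify a rule phi -> psi per the 1961 definition quoted above:
# it is a constituent-structure rule iff phi = x1 A x2 and psi = x1 w x2
# for a single symbol A and non-null string w; it is context-free iff phi
# is a single symbol, otherwise context-restricted.  Toy rules only.

def is_cs_rule(phi, psi):
    """True iff psi = x1 w x2 where phi = x1 A x2 and w is non-null."""
    for i in range(len(phi)):             # candidate position of A in phi
        x1, x2 = phi[:i], phi[i + 1:]
        if (len(psi) >= len(x1) + len(x2) + 1       # w is non-null
                and psi[:len(x1)] == x1             # context preserved left
                and (len(x2) == 0
                     or psi[len(psi) - len(x2):] == x2)):  # ... and right
            return True
    return False

def classify(phi, psi):
    if not is_cs_rule(phi, psi):
        return "not a constituent-structure rule"
    return "context-free" if len(phi) == 1 else "context-restricted"
```

Note that a permutation rule such as A B → B A comes out inadmissible: neither symbol is rewritten in a preserved context, which is exactly the property Condition 3 (discussed below) was meant to exploit.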

In light of all this, it appears that Chomsky consistently understood PSG's

8 Some computer scientists, however, like to use the term type-0 to refer to some proper subset of unrestricted grammars. Thus, Harrison (1978, p. 17) applies it to the class of grammars in which the left-hand sides of productions may not contain terminal symbols. But these grammars still generate all the r.e. languages.


to be formally defined as CSG's in the narrow sense, i.e., grammars which replace a single symbol by a non-null string in some (possibly null) context, at each step in the derivation. This gives us a third notion of PSG:

(iii) context-sensitive grammar.

However, although strictly speaking non-CSG's were not considered PSG's, we will show that Chomsky's arguments against phrase structure as a model of human language syntax apply just as much to type-0 grammars as they do to CSG's or CFG's.

The story does not end there, for in subsequent work Chomsky hinted at a class of grammars more restricted than CSG's but more general than CFG's as the appropriate model of IC analysis. These were labeled type-3 grammars, because they were defined by a Condition 3 which distinguished them from type-2 (i.e., context-sensitive) grammars (1963, p. 366):9

Condition 3. G' is a type 2 grammar containing no rule x1Ax2 → x1ωx2, where ω is a single nonterminal symbol (i.e., ω ∈ VN).
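Condition 3 is easy to state operationally: no rule may rewrite a symbol, in context, as a single nonterminal. A sketch, with hypothetical rules in the format "rewrite A as w in the context x1 __ x2":

```python
# A grammar violates Condition 3 (as quoted above) if some rule rewrites
# a symbol, in context, as a SINGLE NONTERMINAL.  Rules are given here as
# (x1, A, x2, w); the example grammar is a hypothetical toy.

def satisfies_condition_3(rules, nonterminals):
    return not any(len(w) == 1 and w[0] in nonterminals
                   for (_x1, _a, _x2, w) in rules)

N = {"S", "NP", "VP"}
ok_rules = [((), "S", (), ("NP", "VP")),       # S -> NP VP
            (("NP",), "VP", (), ("v", "NP"))]  # VP -> v NP / NP __
bad_rules = ok_rules + [((), "NP", (), ("VP",))]  # NP -> VP: single nonterminal
```

On these toy rules, `satisfies_condition_3(ok_rules, N)` holds while `bad_rules` fail, since the added rule rewrites NP as the single nonterminal VP.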

The motivation for these grammars was to permit contextual restrictions without allowing, as in the general class of CSG's, the possibility of permutation. The fact that these grammars are more powerful than CFG's was thus part of their appeal (ibid):

Only one nontrivial property of type 3 grammars is known, namely, that stated in Theorem 14. This class of grammars merits further study, however. It seems that Condition 3 provides a reasonably adequate formalization of the set of linguistic notions involved in the richest varieties of immediate constituent analysis.10

In fact, in the same paper, Chomsky argued that even this non-context-free model was still too restricted for human language (ibid):

Condition 3, as it stands, is too strong to be met by actual grammars of natural languages, but it can be revised, without affecting generative capacity, to be perhaps not unreasonable for the construction of grammars of language-like systems. Suppose that we allow the grammar G to contain a rule x1Ax2 → x1ωx2 only when ω is either terminal (as in Condition 3) or when ω dominates only a finite number of strings in the full set of derivations (and P-markers) constructible from G. This essentially amounts to the requirement that if a category is divided into subcategories these subcategories are not phrase types but word or morpheme classes. To the extent that systems of the kind we are now discussing are at all useful for

9 The reader should beware of any confusion due to the fact that in the more familiar 1959 numbering system, type-3 refers to regular grammars and type-2 to CFG's.
10 Chomsky's Theorem 14 holds that the language L = {acⁿf²⁺⁴ⁿdⁿb, n ≥ 0}, which is not context-free, is a type-3 language. From this it follows that the proper containment of grammar types extends to classes of languages, that is, that the type-3 languages are a proper superset of the CFL's.


grammatical description, it seems likely that the particular subclass meeting this condition will in fact suffice.

To do justice to the historical facts, then, we must take into account some further possible notion of PSG:

(iv) some proper superset of CFG's such as Chomsky's 'type-3' grammars.
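For reference, the two numbering schemes mentioned above and in footnotes 7 and 9 can be tabulated as follows (our summary of the paper's account, not Chomsky's own wording):

```python
# The two numbering schemes for grammar classes, as reported in the text
# and in footnotes 7 and 9.  A summary table, not a quotation.
NUMBERING_1959 = {
    "type-0": "unrestricted",
    "type-1": "context-sensitive",
    "type-2": "context-free",
    "type-3": "regular",
}
NUMBERING_1963 = {
    "type-2": "context-sensitive (the 1959 type-1, renumbered)",
    "type-3": "'Condition 3' grammars (context without permutation)",
    "type-4": "context-free (the 1959 type-2, renumbered)",
}
```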

These particular ideas of Chomsky's appear to have been generally forgotten, but it is uncanny that in recent years there have been a number of proposals to the effect that human languages are 'mildly context-sensitive', i.e., not contained within the class of CFL's but within some class which is larger than the CFL's and itself properly contained within the CSL's (Joshi, 1985; Gazdar and Pullum, 1985; Gazdar, 1985).

As we have seen, Chomsky's notion of phrase (constituent) structure was supposed to correspond to the theory of immediate constituents. Indeed, in his early works (1956 [1965, p. 111], 1957, p. 34, 1959 [1965, p. 132]), Chomsky refers to IC analysis by name. In later publications (1965, 1966), the criticism is directed more generally at 'structuralist' or 'taxonomic' grammars, but throughout it is clear that PSG was supposed to be the formalization of a pre-existing theoretical concept. We have also seen that, while Chomsky tried to restrict the formal model of PSG in certain ways, it always remained non-context-free. Given this, it might seem mysterious that, as noted above, the debates about the adequacy of phrase structure have focused on CFG. The mystery is solved when we examine some other of Chomsky's writings, especially the linguistic (as opposed to mathematical) ones, for in several of these the claim is made that actual IC systems were purely context-free.

We first find this position in a paper read as early as 1958 but published some years later, where Chomsky (1962b) presents context-sensitive grammar as a formalization of Harris's (1951) morpheme-to-utterance formulae but argues that actually Harris's model was context-free. The context-sensitivity is offered as a modification of Harris's original system, but one made "in accordance with his intentions" (p. 129). Here Chomsky seems to be alluding to Harris's observation (1946, p. 182) that "[t]he great bulk of selection features, especially those that distinguish between individual morphemes, cannot be expressed except by very unwieldy formulae". The context-sensitive rules would then be used to capture the selectional facts missed by Harris's context-free formulae.
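How a context-sensitive rule can capture a selectional restriction that a context-free expansion misses may be illustrated with a toy sketch (the word classes and verbs are our own invented examples, not Harris's or Chomsky's):

```python
# Toy illustration: a context-free rule "V -> sleeps | elapses" would
# overgenerate ("the hour sleeps", "the boy elapses"); a context-sensitive
# rule keys each verb to the class of its left neighbor.  All classes and
# words here are hypothetical.

# (symbol to expand, required left neighbor) -> permitted expansions
cs_rules = {
    ("V", "N_animate"): ["sleeps"],
    ("V", "N_abstract"): ["elapses"],
}

def expand_verbs(classes):
    """Expand each V according to its left context (toy contexts only)."""
    out = []
    for i, sym in enumerate(classes):
        if sym == "V":
            left = classes[i - 1] if i > 0 else None
            out.append(cs_rules[(sym, left)][0])
        else:
            out.append(sym)
    return out
```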

However, we know that Chomsky regarded Harris's model as having other, deeper limitations, which could not be overcome except by the


introduction of transformations. Apparently, then, in order to focus the discussion on these more serious defects, he would direct his criticism at a model which had the needed context-sensitive features built in. Thus, the (context-sensitive) PSG's would not strictly speaking formalize the morpheme-to-utterance model. Rather, the most obvious shortcoming of the latter would be patched up by the use of context-sensitive rules to capture selectional restrictions, and the discussion could then be focused on what Chomsky perceived to be the real arguments for the superiority of transformational grammar. However, none of this was made clear at the time.

At the same time, Chomsky repeatedly describes his PSG's as representing not just the morpheme-to-utterance system but also linguistic analysis "on the IC level" (p. 129; see also p. 129, n. 11, p. 130, p. 131), without specifying in detail the relation of the IC and the morpheme-to-utterance approach. It would seem that he was accepting Harris's (1951, p. 278) assertion that the morpheme-to-utterance and IC models differed only in direction (bottom-up vs. top-down). Thus, Chomsky was apparently claiming at this time that both the morpheme-to-utterance and IC theories were context-free in letter, but context-sensitive in intent. Essentially the same position was taken by Chomsky (1961, p. 9, n. 8):

Immediate constituent analysis as developed within linguistics, particularly in the form given to this theory by Z. S. Harris, Methods in Structural Linguistics (Chicago: Univ. of Chicago Press, 1951), Chap. 16, suggests a form of grammar similar to what is here called context-free constituent structure grammar.

Here, too, he adds the qualification that "context-restricted rules are unavoidable, in practice, in grammatical description" (ibid.).

The same view is reiterated in the first part of Chomsky (1962a), although here his criticism is directed at what he calls the 'taxonomic model' of generative grammar. This model is context-sensitive, inasmuch as "The syntactic component consists of an unordered set of rewriting rules, each of which states the membership of some phrase category or formative category in some context" (p. 510). The taxonomic model is presented as "a direct outgrowth of modern structural linguistics" and "no more than an attempt to formulate a generative grammar which is in the spirit of modern procedural and descriptive approaches". Specifically, it is again said to constitute an extension of Harris's morpheme-to-utterance system.

However, for the first time, Chomsky appears to make a distinction between the morpheme-to-utterance theory, which he continues to take as context-sensitive in spirit, and IC models proper, which he apparently considers to be context-free tout court (p. 557, fn. 3):


On the syntactic level, the taxonomic model is a generalization from Harris' morpheme-to-utterance statements, which constitute the nearest approach to an explicit generative grammar on this level. Furthermore, most modern work in syntax is actually more adequately formalized in terms of rewriting rules with null context (in particular, this is true of Pike's tagmemics, as of most work in IC analysis).

Unfortunately, Chomsky did not state this distinction clearly, or say anything more specific about IC theory. In particular, he did not cite any specific names of scholars or titles of publications associated with IC analysis. To be sure, in a later revision of the same work, Chomsky (1964, p. 916, n. 4) amplifies this statement with a reference to Postal's (1964a) detailed critique of six nontransformational models of syntax.

However, Postal's account contradicts Chomsky's claims, and makes the prima facie case that both IC and morpheme-to-utterance analysis did involve context-sensitivity. Of the six structuralist approaches he examined, Postal claimed that only three were context-free, and these were all models developed at roughly the same time as or later than transformational grammar: the tagmemics of Pike (1954-60), the stratificational grammar of Lamb (1962), and Hockett's (1961) constructional grammar. The remaining post-transformational model considered was Harris's string analysis (1962), which Postal believed to be context-sensitive.

Of theories earlier than transformational grammar, Postal discusses Harris's morpheme-to-utterance model and the IC model proper. For the latter, Postal does not refer to the original theoretical and descriptive work of Bloomfield, but only to Bloch (1946), Wells (1947), and Hockett (1954),11 but he still concludes (p. 79) that the IC model - like the morpheme-to-utterance one - was context-sensitive and contrasts it on this very point with the three later, context-free models (ibid). Moreover, he cites (p. 19) an example of what he takes to be context-sensitivity (see below) from Bloch.12 Finally, writing about the context-sensitivity of Harris's morpheme-to-utterance formulae (pp. 25-26), Postal contradicts Chomsky's position that context-sensitivity was only implicit in this kind of work by citing the explicit use of it in Harris's writings. If Postal was right, then context-sensitivity was part and parcel of both IC and morpheme-to-utterance systems of syntactic analysis.

In point of fact, there is no doubt that Harris made extensive use of

11 Postal refers to Hockett's model as the "item and arrangement system", but Hockett is quite clear that he is talking about IC analysis.
12 As we mention below, Postal apparently mistook a case where Bloch (tacitly) refers to cross-classification, for context-sensitivity. This is apparently the only case where Postal claims to find context-sensitivity in IC work.


context-sensitivity as a way of controlling the substitutions allowed by his morpheme-to-utterance formulae (e.g., 1946, pp. 166-67):

Equations will be used to indicate substitutability. BC = A will mean that the sequence consisting of a morpheme of class B followed by a morpheme of class C can be substituted for a single morpheme of class A. ... When we want to say that A substitutes for B only if C follows, we shall write AC = BC.
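Read bottom-up, Harris's equations license reductions of morpheme-class sequences toward a single class. A minimal sketch, with entirely hypothetical class names:

```python
# Sketch of Harris-style substitution equations, read bottom-up:
# "B C = A" licenses replacing the sequence B C by the single class A.
# The equations and class labels below are invented for illustration.

equations = [(("T", "N"), "NP"),     # T N = NP
             (("V", "NP"), "VP"),    # V NP = VP
             (("NP", "VP"), "S")]    # NP VP = S

def reduce_once(classes):
    """Perform the leftmost applicable substitution, if any."""
    for lhs, result in equations:
        for i in range(len(classes) - len(lhs) + 1):
            if tuple(classes[i:i + len(lhs)]) == lhs:
                return classes[:i] + [result] + classes[i + len(lhs):]
    return classes

def reduce_fully(classes):
    """Substitute until no equation applies."""
    while True:
        nxt = reduce_once(classes)
        if nxt == classes:
            return classes
        classes = nxt
```

A context-restricted equation of the form AC = BC (A substitutes for B only if C follows) would add a context check before the substitution, which is precisely the point at issue in the text.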

This means that contextual restrictions were known before the advent of generative grammar. To be sure, neither Bloomfield nor Bloch seem to have made much use of them. Perhaps the clearest example would be Bloch's analysis of the zero variant of the copula in Japanese (1946, p. 213):

Before the particle ka ... the non-past indicative of the copula, da, is replaced by a zero alternant (i.e., drops out).
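Bloch's statement is, in effect, a context-sensitive rewrite: da → ∅ / __ ka. A toy sketch (the surrounding morpheme strings are invented for illustration):

```python
# Bloch's zero-copula statement as a context-sensitive rewrite:
# da -> 0 / __ ka   (the copula "da" drops before the particle "ka").
# The example morpheme strings are invented, for illustration only.

def apply_zero_copula(morphemes):
    out = []
    for i, m in enumerate(morphemes):
        if m == "da" and i + 1 < len(morphemes) and morphemes[i + 1] == "ka":
            continue  # replaced by the zero alternant in this context
        out.append(m)
    return out
```

The rule fires only in the stated context: "da" before "ka" is dropped, while "da" elsewhere survives, which is what makes it a contextual (rather than context-free) statement.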

Thus, on Postal's account - and he appears to have been right - the use of context-sensitivity in PSG's would have been nothing more than simple accuracy in formalizing either the morpheme-to-utterance or the IC model.13

In subsequent work of Chomsky's, we find no direct reference to IC analysis at all, but we do find the sweeping generalization that "it has also been shown that almost all of the nontransformational syntactic theories that have been developed within modern linguistics, pure or applied, fall within this framework" (1965, p. 67) and specifically, that "every variety of syntactic theory that falls within the general range of taxonomic syntax seems to me to be formalizable (insofar as it is clear) within the framework of phrase structure grammar (in fact, with rare exceptions, its context- free variety)" (1968, p. 39).

This is the position that seems to have been tacitly accepted by much of the field, although it does not correspond to the explicit definitions of phrase structure stated by Chomsky. As a result, CFG's alone have figured in subsequent debates over the adequacy of phrase structure models. Thus, both Harman and the generalized phrase structure grammarians explicitly undertook to defend the context-free model. More generally, it

13 McCawley (1968) has argued that the use of context in what he merely describes as "most American structuralist grammars of the 1940's" amounted to context-sensitive node admissibility conditions, which have the weak generative capacity of CFG's (Peters and Ritchie, 1969). This might appear to render the issue moot, until one realizes that the equivalence result speaks only to the class of languages that can be generated. It should be apparent that context-sensitive node admissibility systems can represent selectional and agreement phenomena in ways impossible in CFG, so the issue of what the structuralist models were capable of would still remain.

Page 11: Manaster-Ramer & Kac - The Concept of Phrase Structure

T H E C O N C E P T O F P H R A S E S T R U C T U R E 335

has become part of the folklore of syntax that that is what the phrase structure controversy is all about, as though the arguments against phrase structure were specifically concerned with the limitations of context-free models (e.g., Savitch et al., 1987, pp. vii-viii; Pullum and Gazdar, 1982, pp. 472-475, and the references cited there, Bresnan et al., 1982, p. 613). This is turn would seem to be the reason why questions of weak generative capacity have so often been seen as bearing on the issue of the adequacy of phrase structure models for human language syntax. However, as we will show in the next section, the arguments raised by Chomsky against PSG are by no means so restricted. They apply with equal force to context- sensitive, to type-l, and even to type-0 (unrestricted) grammars. This means, of course, that weak generative capacity cannot be the issue, since no formal grammar has greater weak generative capacity than type-0 grammars.

2. ON THE LIMITATIONS OF PHRASE STRUCTURE

Given what has been said about the history of the definition of PSG, it should be clear that the classic arguments against this model were directed specifically against context-sensitive grammars. The confusion about this point has obscured the full force of these arguments, for Chomsky's case against phrase structure, if accepted, establishes the inadequacy of much more than just CFG's. In fact, although this was apparently never stated at the time, almost all of these arguments apply with equal force to type-0 grammars. In particular, this would seem to be the case for arguments involving unbounded branching (constituents with arbitrarily many immediate subconstituents), discontinuous constituents, separate provision for linear order as opposed to constituency, and cross-classification.

Unbounded branching is involved, on most analyses, in coordinate structures (such as John and Mary and Sally and Burt . . .). Chomsky and Miller (1963, p. 298) took the inability of PSG's to derive such structures as a telling argument against them. It should be clear that a grammar that contains rules of the form

NP → NP (CONJ NP)*

would account for the phenomena in question at least as well as a transformational system. Moreover, we now know that allowing such rules in a CFG does not increase its weak generative capacity. But such rules were not allowed by the letter of Chomsky's definition of either CSG's or type-0 grammars, any more than by his definition of CFG's.
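The weak-equivalence point can be made concrete with a small sketch (ours, not the authors'): a Kleene-starred rule such as NP → NP (CONJ NP)* can be compiled into plain context-free rules by introducing a fresh auxiliary nonterminal (the name NPLIST is our invention). The resulting grammar generates the same strings, though with different tree structures.

```python
# Illustrative sketch (not from the paper): compiling a Kleene-star rule
# like NP -> NP (CONJ NP)* into ordinary context-free rules. The fresh
# nonterminal name "NPLIST" is our own invention.

def star_free(lhs, prefix, starred, aux="NPLIST"):
    """Replace lhs -> prefix (starred)* by plain CFG rules using a fresh
    auxiliary category `aux` (assumed not to occur elsewhere)."""
    return [
        (lhs, prefix + [aux]),          # NP     -> NP NPLIST
        (aux, list(starred) + [aux]),   # NPLIST -> CONJ NP NPLIST
        (aux, []),                      # NPLIST -> (empty)
    ]

for lhs, rhs in star_free("NP", ["NP"], ["CONJ", "NP"]):
    print(lhs, "->", " ".join(rhs) if rhs else "(empty)")
```

The point the text goes on to make is precisely that this translation preserves only the string set, not the flat, unbounded-branching constituent structure.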

Discontinuity is involved in the well-known arguments for the transformational analysis of English auxiliary verbs and the suffixes which they demand on cooccurring verb stems. The facts as described by Chomsky (1957) are that certain functions in the verbal system (perfect, progressive, and passive) are characterized by complexes consisting of an independent auxiliary verb and a suffix attached to an adjacent verb stem (viz. have V-en, be V-ing, and be V-en). Chomsky used context-free base rules to derive have-en, be-ing, and be-en as continuous elements and then let a transformation transport each suffix to the end of the appropriate verb stem, but the same facts could be handled by describing have..-en, be..-ing, and be..-en as discontinuous constituents. As Chomsky himself notes, "in the auxiliary verb phrase we really have discontinuous elements - e.g., . . . the elements have..en and be..ing" (1957, p. 41).

As another example, consider Russian yes/no questions, which differ from statements in that the morpheme li appears inside the clause, e.g., Prišol-li Ivan? 'Did Ivan come?'. There are formalisms which would allow us to analyze this sentence as being composed of the two constituents li and prišol Ivan (which by itself is equal in form to the statement 'Ivan came'). Such analyses can be stated in a model of grammar which allows discontinuous constituents, such as the weakly context-free grammars of Yngve (1960) and Harman (1963), but not in any type of Chomsky's grammars. Whether type-0, context-sensitive, or context-free, the latter operate on strings, and restrict rewriting to the replacement of a continuous string of grammar symbols by another such string.14
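One way to make the notion of a discontinuous constituent concrete is to represent a constituent as the set of string positions it occupies, so that discontinuity is simply non-contiguity. The following sketch is our illustration (the labels Q and S are our own), applied to the Russian example just given.

```python
# Sketch (our illustration): a constituent as a set of string positions.
# A discontinuous constituent is simply one whose positions do not form
# an unbroken interval. The Russian example from the text.

words = ["prisol", "li", "Ivan"]   # 'Did Ivan come?'
constituents = {
    "Q": {1},        # the question particle li
    "S": {0, 2},     # prisol ... Ivan, interrupted by li
}

def is_continuous(positions):
    """True iff the positions form an unbroken interval."""
    return max(positions) - min(positions) + 1 == len(positions)

print(is_continuous(constituents["Q"]))   # True
print(is_continuous(constituents["S"]))   # False
```

Nothing in this representation requires constituents to be continuous, which is exactly the freedom the string-rewriting formalisms discussed here lack.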

More generally, many of the arguments for transformations have hinged on dependencies between elements which are not sister constituents. If we consider a language where questions differ from statements in word order, like German Ist Hans gekommen? 'Did Hans come?' (vs. Hans ist gekommen 'Hans came'), we see that discontinuous constituents are not enough. In such cases, it is simply the difference in word order which distinguishes questions from statements, and there are formalisms which can state this fact directly, by handling constituency and linear order separately (e.g., the GPSG of Gazdar et al., 1985). However, like CFG's, both CSG's and type-0 grammars operate on totally ordered structures, and cannot handle such phenomena in the same way.
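The separation of constituency from linear order mentioned here can be sketched in miniature (our illustration, with hypothetical category names, not the GPSG formalism itself): a single unordered immediate-dominance rule is shared by statements and questions, and linear-precedence statements alone fix the order.

```python
from itertools import permutations

# Sketch (our illustration): ID/LP-style separation. An unordered
# immediate-dominance rule S -> {NP, V} is shared; linear-precedence
# statements alone decide the order, so statements and questions can
# share the same constituency.

ID_RULE = ("S", frozenset({"NP", "V"}))

def linearizations(daughters, lp):
    """Orders of the daughters consistent with LP constraints (a, b),
    each read as 'a must precede b'."""
    return [
        order for order in permutations(sorted(daughters))
        if all(order.index(a) < order.index(b) for a, b in lp)
    ]

print(linearizations(ID_RULE[1], lp=[("NP", "V")]))   # statement order
print(linearizations(ID_RULE[1], lp=[("V", "NP")]))   # question order
```

The German contrast in the text has exactly this shape: one constituency, two orderings, distinguished only by the precedence statements.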

Cross-classification is another phenomenon which figures prominently

14 By comparison, consider languages in which yes/no questions differ from statements by containing a marker which is clause-initial (e.g., Polish czy) or clause-final (e.g., Japanese ka). In such cases, even a simple CFG could analyze the question marker as one IC and the rest of the clause as another.


in arguments for transformations.15 Two such arguments (Chomsky, 1956) have become especially famous. One involves nominalizations like the shooting of the hunters in relation to clauses such as The hunters shoot and They shoot the hunters. While Chomsky argued that the ambiguity of the nominal constructions had to be accounted for by assuming that they derive from deep structures which look more or less like the verbal constructions, it should be apparent that an analysis involving complex symbols is also possible. Since Chomsky was concerned with expressing the fact that the hunters can be either subject or object, we could introduce corresponding features and make this phrase be either of the complex category [NP, subject] or [NP, object]. Alternatively, we could also refer to sentences like The hunters are shot instead of They shoot the hunters and use features to assign shooting to two different categories such as [N, active] and [N, passive]. If rules of the grammar are allowed to refer to such features, then it becomes possible to handle the verbal and nominal constructions as related without recourse to transformations.

The other argument of Chomsky's involves the relation between active and passive clausal constructions themselves. Again, the dependencies here involve differing word orders and cooccurrence restrictions on non-sister constituents (the be..en auxiliary element and transitive verbs, for example). The use of complex categories with features such as [active] and [passive] and possibly also [agent] and [patient] would allow the same general statements to be made as was possible under a transformational analysis.16 It becomes clear, incidentally, that what is responsible for the apparent problems in analyzing the active/passive relation is the particular word order and cooccurrence facts, rather than some inherent property of voice as a grammatical notion. If there were to exist a language in which actives and passives differed only in the way that statements and yes/no questions do in Polish or Japanese, then a phrase-structural analysis would be perfectly possible.17 Such a language may not be attested, and perhaps it is not even typologically possible, but the fact remains that the difficulties with handling the active/passive relation in a PSG had to do with its inability to handle certain kinds of word order and cooccurrence facts.

15 Moreover, many of the phenomena that might seem to be amenable to analysis in terms of discontinuous constituency or of separate word order rules may actually involve cross-classification. For example, many would argue that in the English auxiliary system it is desirable to treat the V-en and V-ing sequences as word-level constituents. This is, of course, precluded by the analysis in terms of discontinuous elements. However, the use of a suitable feature system of categories would allow such an analysis. 16 The limiting case would be the use of complex categories to encode directly the steps in a transformational derivation, with features like [underlying], [derived-by-the-Passive-transformation], etc. Of course, to get the full weak generative capacity of transformational grammars, an infinite number of complex symbols would be required.

And, once again, we find that cross-classification can be expressed in weakly CF models, such as that of Harman (1963) or Gazdar et al. (1985), but not in CFG, CSG, or type-0 grammar, all of which employ unanalyzed nonterminal symbols.
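The sense in which such feature-based grammars remain weakly context-free can be sketched as follows (our illustration; the bracketed naming convention merely mimics categories like [NP, subject] from the text): because the feature system is finite, every complex category can be spelled out as a single atomic nonterminal.

```python
from itertools import product

# Sketch (our illustration): spelling out complex categories as atomic
# nonterminals. Because the feature system is finite, a grammar over
# complex symbols can be recast as a plain CFG over the expanded names.

def expand(base, features):
    """All fully specified complex categories over the given features."""
    names = sorted(features)
    return [
        (base, dict(zip(names, values)))
        for values in product(*(features[n] for n in names))
    ]

def atomize(category):
    """Render a complex category as a single unanalyzable symbol."""
    base, feats = category
    return base + "".join(f"[{k}={v}]" for k, v in sorted(feats.items()))

cats = expand("NP", {"role": ["subject", "object"]})
print([atomize(c) for c in cats])   # ['NP[role=subject]', 'NP[role=object]']
```

What the spelled-out grammar loses is exactly what the dispute is about: once atomized, the rules can no longer refer to the shared NP or to the feature [role] as such.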

To summarize, type-0 grammars fare no better than CFG's or CSG's on Chomsky's arguments, since they do not allow any of the four grammatical devices suggested above: discontinuous constituents, separation of linear order from constituency, complex nonterminal symbols (cross-classification), or unbounded branching.18 As a result, if we assume that any of these phenomena must be directly represented in grammars of human languages, then these would have to be more powerful in some sense than type-0 grammars. Now, the Church-Turing thesis, together with the fact that type-0 grammars enumerate the same class of languages that Turing machines do, implies that this is precisely the class of languages that can be enumerated by any sort of computation whatsoever.

Hence, Chomsky's arguments against PSG must be logically independent of considerations of weak generative capacity. This is a point of some importance, since, if correct, it shows that we cannot take literally Chomsky's goal of finding a place for the grammars of human languages in the hierarchy which has come to be known by his name (1959 [1965, p. 127]):

Given such a classification of special kinds of Turing machines,19 the main problem of immediate relevance to the theory of language is that of determining where in the hierarchy of devices the grammars of natural languages lie. It would, for example, be extremely interesting to know whether it is in principle possible to construct a phrase structure grammar20 for English (even though there is good motivation of other kinds for not doing so).

17 This would be a hypothetical language in which the agents and patients occupy the same positions in active and passive sentences, but the latter are marked by a special sentence-initial or sentence-final particle. Thus, the hypothetical passive of John saw Mary might be John saw Mary pa (where pa is the hypothetical passive morpheme). 18 Early arguments for transformations also sometimes referred to deletions. However, the situation here is different than in the case of the other phenomena considered, because these are allowed by Chomsky's definition of type-0 grammars. Any argument for the necessity of deletions would, to be sure, still count against CSG's, but it would not apply to type-0 grammars. 19 This refers to the hierarchy of grammars, which Chomsky is presenting as special cases of the Turing machine formalism.

If even type-0 grammars are inadequate as models of human language, then the correct class of models, whatever they may be, will not be anywhere in this hierarchy. Chomsky must have meant not the hierarchy of devices, but the corresponding hierarchy of languages, although even then it might well be that human languages as a class are incomparable to the types of languages defined by his hierarchy.

It is worth recalling in this regard that the case against PSG argued by Chomsky in the late fifties made no reference to weak generative capacity. The kinds of arguments we have talked about in connection with unbounded branching, discontinuity, cross-classification, and linear order never claimed that a certain language could not be weakly generated by a PSG. Indeed, while arguments directed specifically against CFG came to rely on claims of weak inadequacy during the sixties, the quotation above shows that as late as 1959 Chomsky was still considering the possibility that human languages are context-sensitive. The reasons for rejecting PSG's as a model of human language had to do with theoretical motivations which find partial formal expression in the notion of strong generative capacity but on the whole have remained outside the scope of formal language theory.

As a result, the divisive and much-debated issue of whether human languages are context-free or not has little to do with the question of the adequacy of PSG's. Hence, questions of what weak generative capacity is required to model human languages have no bearing on the adequacy of phrase structure models for such languages.

3. SOME NEGLECTED ASPECTS OF IC ANALYSIS

We return now to the question of the extent to which the phrase structure debate has been confused from the very beginning by the fact that Chomsky's model of PSG failed to do justice to the structuralist theories and practices which it was supposed to formalize. We have seen that there has been a persistent discrepancy between Chomsky's claims that the approaches he was formalizing were really context-free and his practice of allowing context-sensitivity in the formalism. However, context-sensitivity is in fact a clumsy way of achieving the descriptive power which Bloomfield had without it, and the real shortcoming of Chomsky's proposed formalization of both IC and morpheme-to-utterance work will turn out not to be the muddle over context-sensitivity but the fact that he simply left out numerous other devices which the structuralists used to account for complex grammatical phenomena. These included discontinuous constituency, separate treatment of linear order and constituency, cross-classification, unbounded branching, and null (zero) elements.

20 I.e., CSG. Postal (1964a, p. 76, 1964b) would later claim that Mohawk was probably not a CSL, though he never advanced an explicit argument for this. All this indicates that there may have been some unclarity about the full extent of the weak generative capacity of CSG's, for a time. Cf. Chomsky's remarks a few years later: "Although type 1 grammars generate only recursive sets, there is a certain sense in which they come close to generating arbitrary recursively enumerable sets" (1963, p. 361), or the classic formulation by Hopcroft and Ullman: "Almost any language one can think of is context-sensitive . . ." (1979, p. 224).

Unbounded Branching

When Chomsky was marshalling his arguments against PSG, one of the most telling was the inability of such grammars to provide for unbounded branching (Chomsky and Miller, 1963, p. 298). Moreover, Harman accepted this aspect of Chomsky's model of phrase structure and did not provide any device for unbounded branching, a point which Chomsky (1965, p. 196) in fact raised against him. As is well known today, however, the augmentation of any class of PSG's with regular expressions formed with the Kleene star operator (which denotes concatenation closure) allows for unbounded branching to be induced, and these devices have been included in both transformational and nontransformational grammars (including GPSG).

Yet IC analyses of the structuralist period routinely allowed constructions with an unbounded number of immediate constituents. For example, Bloch (1946, p. 207) takes a Japanese sentence to consist of two IC's, the first of which is described as consisting of "as many constituents as there are non-final [i.e., subordinate] clauses", and likewise treats Japanese clauses as composed of a predicate IC preceded by an IC comprising "as many constituents as there are clause attributes". Since the number of subordinate clauses or clause attributes is unlimited, the first IC in each case may itself consist of an unbounded number of IC's. Perhaps even more telling is the way that Bloomfield analyzed coordinate constructions, which were precisely the forms that Chomsky wanted unbounded branching for. Bloomfield's description unmistakably implies the possibility of any number of IC's, for he treats coordinate constructions as having as many heads as they have conjuncts (1933, p. 195):

Endocentric constructions are of two kinds, co-ordinate (or serial) and subordinative (or attributive). In the former type the resultant phrase belongs to the same form-class as two or more of the constituents.

Bloch (1946, pp. 228-230) analyzes Japanese coordinate noun phrases in almost the same way, treating "the head [as] consist[ing] of a series of two or more nouns, each noun with or without one or more modifiers preceding, and each noun, or each except the last, followed by a CONJUNCTIVE PARTICLE". There is also evidence for this sort of analysis in Harris's (1951, p. 289) description of Moroccan Arabic coordination. While, on the one hand, he says that "Any morpheme class or sequence plus u plus an equivalent morpheme class or sequence equals the morpheme class or sequence itself",21 he goes on to add that "When two or more N4 occur with Va, the Va contains the plural morpheme . . .".22 The addition of the words or more suggests that Harris intended his statement of coordination to allow unbounded sequences of conjuncts rather than merely two.23

Insofar as IC analysis provided for unbounded branching, then, Chomsky's formalization was off the mark. The use of the Kleene star in GPSG thus represents a return to structuralist roots, but it is nonetheless a crucial departure from the model that Chomsky was arguing against in the first place. Strictly speaking, a grammar with rules of the form A → X A* Y is not a phrase structure grammar in Chomsky's sense. In current syntactic literature no distinction is made between grammars which use regular expressions and grammars which do not. But while the use of regular expressions has no effect on weak generative capacity, the fact that it does affect strong generative capacity means that the question of whether such devices are available is highly significant in the context of disputes over the theoretical adequacy of different models.

Discontinuous Constituents

With regard to discontinuous constituents, the story is partly though not exactly analogous. Discontinuous constituents were also featured in many IC analyses. A good example is Bloomfield's analysis of parentheticals (1933, p. 186):24

21 The morpheme u is a coordinating conjunction ('and'). 22 The symbol N4 denotes a noun phrase, and Va stands for verb suffixes marking subject-verb agreement. 23 It may be worth recalling that Syntactic Structures (Chomsky, 1957) also provided (transformational) rules only for binary coordination. 24 Bloomfield even allowed for discontinuous words, although he regarded these as abnormal (pp. 180-181).


Parenthesis is a variety of parataxis in which one form interrupts the other; in English the parenthetic form is ordinarily preceded and followed by a pause-pitch: I saw the boy[,] I mean Smith's boy[,] running across the street[.] In a form like Won't you please come? the please is in a close parenthesis, without pause-pitch.

Likewise, in his work on Japanese syntax, Bloch (p. 229) finds an adverb like mata 'again' inside a discontinuous constituent consisting of a series of conjoined nouns in certain coordinate constructions.

Even more telling is the fact that the problems handled by Chomsky (1957, pp. 39-41 and passim) in terms of the affix hopping transformation in English seem analogous to phenomena in Yokuts and German described by Harris (1951, pp. 165-66) in terms of discontinuous constituents. The Yokuts construction indicates uncertainty and is formally characterized by the cooccurrence of the independent element na'as and a verb stem with the suffix -al, e.g., xatxat-al na'as ~ na'as xatxat-al. Since na'as and -al always occur together, Harris proposes to analyze the two elements as a discontinuous morpheme. In German, Harris analyzes the ge- prefix of perfect participles and certain adjectives as forming a part of discontinuous morphemes together with the -et and -en suffixes, in forms like geeignet 'suitable' and gefangen 'captive'. In fact, Harris considered such an analysis for the English "affix hopping" phenomena, but dismissed it on the grounds that -en can occur not only with have but also with be and elsewhere (p. 214). This objection would, of course, still leave open the possibility that in English the auxiliary stems and the cooccurring suffixes would form complex discontinuous constituents (rather than simple discontinuous morphemes).

Chomsky (1957, p. 41, no. 6) also noted the possibility of treating the English facts in terms of discontinuity but rejected it out of hand, and left the very possibility of discontinuity out of his model of phrase structure. The subsequent history of discontinuous constituents has been quite different from that of unbounded branching, however. They were explicitly incorporated into the extensions of Chomsky's PSG introduced by Yngve (1960) and Harman. Moreover, this was the only point on which Harman specifically argued that Chomsky's model of PSG was not faithful to the IC analysis.25 However, discontinuous constituents were not incorporated into TG. They were also left out of GPSG, and have not become part of the popular conception of phrase structure. Only in recent years have some syntacticians been showing signs of readiness to bite the bullet on this matter, e.g., Bach (1981), McCawley (1982), Kac (1985a), Ojeda (1987), and Huck and Ojeda (1987).

25 Interestingly, Harman's authority for IC analysis was Postal's then unpublished book (1964a), and Harman's reading was quite correct, as we have shown.

Cross-classification

The situation with cross-classification is different still. Recent conceptions of phrase structure, such as GPSG, make extensive and crucial use of complex symbols; they have been part and parcel of transformational grammar since 1965, and they had been advocated by Harman in his defense of phrase structure two years before. To get the history straight, let us start, for the moment, with Harman, who pointed out that a PSG using complex symbols would stand up to many of the arguments Chomsky had advanced against PSG's with only simplex symbols. Chomsky (1965, 1966), while conceding this up to a point, rejoins by noting that, precisely because the use of complex symbols distinguishes Harman's model nontrivially from Chomsky's own definition of PSG, the advantages of Harman's grammars, if any, are irrelevant to the question of the adequacy of PSG. Something like a replay of this scenario, but with a difference, took place with Gazdar's somewhat heated polemic against Chomsky's contention that PSG's with complex symbols are not PSG's but form a distinct class of grammars, with some relation to transformational grammars. Gazdar rejects Chomsky's argument on the grounds that CFG's using complex symbols (such as GPSG's) are weakly and strongly equivalent to simple CFG's, whereas TG's are vastly more powerful in both respects. Since, as noted, complex symbols were adopted in TG in the aftermath of the Harman-Chomsky debate, the issue here cannot be the need for this device; nor is there much disagreement about the additional descriptive power that it imparts to a formalism. Rather, what seems to be at issue is the propriety of calling a grammar that uses it a PSG.

However, the issue is not just definitional, because ultimately PSG was supposed to be a complete formalization of a well-articulated descriptive model, that of Bloomfieldian IC analysis, and this raises the question whether complex symbols were used in IC models. This was not claimed by Harman,26 and Chomsky must have assumed the contrary. That this point is crucial can be seen by considering that, if complex symbols had not been employed by the structuralists, then Harman's would have been no defense of phrase structure, though it might have been an interesting demonstration of the utility of a new model of grammar, distinct from both TG and PSG. If, on the other hand, he was formalizing something that had been there, even implicitly, in the IC model, then it is Chomsky's purported refutation of phrase structure that loses much of its force.

26 Harman has maintained throughout that a grammar with complex symbols could be viewed as an abbreviation of a CFG and that to that extent he had not introduced a crucial modification of Chomsky's notion of PSG.

A case can be made, however, that the structuralists made use of the concept of complex symbol, though usually without any special notation. For example, Bloch in his IC analysis of Japanese repeatedly makes use of the implicit convention that a statement, say, about 'predicates' refers to 'final predicates' and 'non-final predicates' as well as 'plain predicates' and 'polite predicates' (1946, pp. 206, 213-218). 27

Likewise, Bloomfield in his discussion of English grammatical categories clearly presupposes that statements about, for example, 'adjectives' cover 'descriptive adjectives' and 'limiting adjectives' (1933, pp. 202-203). Furthermore, Bloomfield was clearly aware of cross-classification in English, e.g. (1933, p. 203):

Our limiting adjectives fall into two sub-classes of determiners and numeratives. These two classes have several subdivisions and are crossed, moreover, by several other lines of classification.

Indeed, it appears that Bloomfield was one of the first writers on language to give explicit recognition to the phenomenon of cross-classification (1933, p. 269):

Form-classes are not mutually exclusive, but cross each other and overlap and are included one within the other, and so on. Thus, in English, the nominative expressions (which serve as actors) include both substantives and marked infinitives (to scold the boys would be foolish). On the other hand, among the substantives are some pronoun-forms which, by overdifferentiation, do not serve as actors: me, us, him, her, them, whom. One group of substantives, the gerunds (scolding), belongs to a form-class with infinitives and with other verb forms, in serving as head for certain types of modifiers, such as a goal (scolding the boys). For this reason a system of parts of speech in a language like English cannot be set up in any fully satisfactory way . . .

Likewise, in Bloch's analysis of Japanese, we find perfectly clear discussion of cross-classification involving final/non-final and polite/plain predicates, with all four possibilities occurring (pp. 216-217).

The fact that categories called aA and bA can be jointly referenced by A would appear to be such a basic fact of normal English usage - and IC analyses were written in English rather than in a symbolic metalanguage - that it would never have occurred to the structuralist writers that final and non-final predicates, say, were not both of them predicates. The use of formulaic notation appears to have been introduced by Harris, who explicitly allows categories, denoted by letters, to subsume (sub)categories, denoted by the same letters with various subscripts (1946, p. 167):

We may distinguish several sub-classes such as those listed below, while V without any subclass mark will be used to indicate all the subclasses together.

Furthermore, as is rather well known, Harris also introduced the system of analysis which was reintroduced by Chomsky (1970) and has come to be known as X-bar theory, and whose whole point is that syntactic categories are complex.28

27 Postal misunderstood this and tried to explain such cases as instances of context-sensitivity, the idea being that the category predicate is rewritten as final-predicate when dominated by final-clause and as non-final-predicate when dominated by non-final-clause.

Linear Order

In PSG's, constituents are created and ordered by the same rules, so that a constituent does not exist apart from its position with respect to its sister constituents. The actual IC approach was quite different, however. The IC's of a construction did not have to be ordered at all - indeed, fixed word order was treated as one of a number of potential formal attributes of a construction (Bloomfield, 1933; Hockett, 1954). Specifically, according to Bloomfield (1933, pp. 162-206), every construction is characterized by a set of taxemes, that is, formal attributes, of which he distinguishes several kinds. One of these is (linear) order.29 Crucially, taxemes of order are not always required (Bloomfield, 1933, p. 197):

An example of a taxeme of order is the arrangement by which the actor form precedes the action form in the normal type of the English actor-action construction: John ran. In languages which use highly complex taxemes of selection, order is largely nondistinctive and connotative; in a Latin phrase such as pater amat filium 'the father loves the son', the syntactic relations are all selective (cross-reference and government) and the words appear in all possible orders (pater filium amat, filium pater amat, and so on), with differences only of emphasis and liveliness.

As the Latin example illustrates, a construction (in this case a clause) may be defined without any reference to linear order. Syntactically, pater amat filium and pater filium amat are, for Bloomfield, the same clause. This approach is amply exemplified in discussions of the IC analysis of languages with a variety of word orders, including Latin, English, German and French (Bloomfield, 1933, pp. 197ff.), Japanese (Bloch, 1946), and Menomini (Bloomfield, 1962, pp. 437ff.). An example from Bloch is the statement that "[t]he relative order of clause attributes is not determined by their type: an adverbial phrase, as such, can precede or follow a relational phrase" (1946, p. 218).

28 Postal (1964a), writing a few years earlier, could make no sense of Harris's proposals and entertained the possibility that symbols such as N3 and N4 should be treated as simplex and hence unrelated categories, like the N and NP of (pre-X-bar) transformational grammar! 29 The others are modulation (which has to do with suprasegmentals), phonetic modification (referring to sandhi processes), and selection (which subsumes agreement, government, and categorial information).

The difference between IC analysis and PSG, however, has had a different history from the three discussed above. It was not highlighted by Harris in his morpheme-to-utterance work (but see the discussion of Hidatsa in Harris, 1946), or in either of the neo-PS proposals (Harman's or the early versions of GPSG). To be sure, recent versions of GPSG (e.g., Gazdar et al., 1985) separate the immediate dominance and linear precedence components of rules. However, it may be that a better formalization of the IC approach would be in terms of the proposal, adumbrated by Manaster-Ramer (1976), that sentences are partially ordered sets, rather than in ID/LP string terms, since the former, unlike the latter, allows scrambling across constituent boundaries, which seems to have been permitted in the IC model as well.
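The ID/LP factorization mentioned here can be pictured concretely. The following sketch is our own toy illustration, not the formalism of Gazdar et al.; the category names and the function are invented for expository purposes:

```python
from itertools import permutations

def idlp_expansions(daughters, lp_constraints):
    """Enumerate the orderings of an unordered ID rule's daughters
    that satisfy every linear-precedence constraint (a, b),
    read as 'a must precede b whenever both occur'."""
    valid = []
    for order in permutations(daughters):
        if all(order.index(a) < order.index(b)
               for (a, b) in lp_constraints
               if a in order and b in order):
            valid.append(order)
    return valid

# ID rule: S -> {NP, VP, Adv} (no order stated); LP statement: NP < VP.
orders = idlp_expansions(("NP", "VP", "Adv"), [("NP", "VP")])
```

With no LP statements at all, all six orders of the three daughters would be admitted; this free-order limiting case is reminiscent of the Latin example above, though, as the text notes, ID/LP still cannot scramble across constituent boundaries.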

Null Elements

Finally, we come to null (zero) elements. The IC grammarians appear to have used this device more sparingly in syntax than in morphology, but have nevertheless left clear instances of it. For example, Bloomfield refers to it in connection with zero anaphora in English sentences like Mary dances better than Jane (1933, p. 252):

We can describe this latter type by saying that (after as and than) an actor (Jane) serves as an anaphoric substitute 30 for an actor-action expression (Jane dances), or we can say that (after as and than) a zero-feature serves as an anaphoric substitute for a finite verb expression accompanying an actor expression.

Similarly, he spoke of a zero serving as a 'relative substitute' in constructions like the man I saw, the house we lived in, and the hero he was (1933, p. 263). This should perhaps come as no surprise given Bloomfield's well-known veneration of the first grammarian known to use zeroes, Pāṇini. A similar example comes from Bloch's Japanese syntax, involving coordinate structures without explicit conjunctions (p. 229):

30 It may be useful to point out that Bloomfield used the term 'substitute' to refer to what would today be called 'pro-forms'.


Many sentences contain a series of two or more nouns with no particles between them, but with each noun, or each except the last, followed by a pause; the last noun is followed by a referent particle, 31 the copula, or the predicate of the clause.

... To cover sequences of this type, we posit a fifth conjunctive particle, phonemically zero, with the same syntactic functions as the four already mentioned. The sentence just cited might be written Hón 0, zassi 0, siNbuN o, katta, 32 with 0 (zero particle) syntactically equivalent to the particle to 'and'.

Chomsky also left this aspect of IC analysis out of his definition of CFG's and CSG's (e.g., 1956, 1959, 1963). This might appear to be contradicted by some examples of phrase structure rules given in Chomsky (1957, pp. 29 (fn. 3), 39, 111), where a morpheme written '0' appears in places where nothing is pronounced ("the morpheme which is singular for nouns and plural for verbs"). However, it seems clear that, as far as the syntax is concerned, these are treated as actual morphemes. It is only the (transformational) morphophonemic component which 'happens' to realize them as zeroes. This appears to be different from Bloch's treatment, which does not seem to call for a separate morphophonemic rule to delete the zero, but rather treats the null realization as a fact of syntax.

In any event, Bloomfield and Bloch postulated zero elements more freely in their IC analyses than Chomsky did in his phrase-structural base component, including cases which transformational grammarians would for years consider to require the use of syntactic deletion transformations.

Chomsky did not allow explicit null productions to be part of CSG (i.e., PSG) or CFG, though, of course, they are necessary in type-0 grammars. On the other hand, they were permitted in the phrase structure model of Harman and in GPSG, and they have also been all but universally recognized in more recent definitions of CFG (following the Bar-Hillel et al. (1961) definition of what they called 'simple phrase structure grammar').
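The modern tolerance of null productions rests on the fact that they can always be eliminated from a CFG without affecting the generated language (save for the empty string). The first step of the standard elimination construction, computing which nonterminals can derive zero, can be sketched as follows; this is a minimal illustration of ours, with invented category names, not drawn from any of the works cited:

```python
def nullable_symbols(rules):
    """Iteratively collect the nonterminals that can derive the
    empty string. `rules` maps each nonterminal to a list of
    right-hand sides, each a tuple of symbols; the empty tuple ()
    represents a null production."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in rules.items():
            if lhs not in nullable and any(
                    all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(lhs)
                changed = True
    return nullable

grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ()],   # the null production: a zero NP
    "VP": [("V", "NP")],
    "Det": [("the",)],
    "N": [("man",)],
    "V": [("saw",)],
}
```

The second step of the construction, which we omit, adds, for every rule mentioning a nullable symbol, variants with that symbol dropped, after which the null productions themselves can be discarded.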

Consequences for the Phrase Structure Controversy

On all five points (unbounded branching, discontinuity, cross-classification, the separation of linear from hierarchical information, and null elements), we find that the actual IC systems employed devices that were excluded from Chomsky's definition of PSG. Moreover, the fact that provision for discontinuous constituents had been made in the IC model was acknowledged by Postal (1964a, pp. 67-70 and passim) and Chomsky (1957, p. 41, n. 6; 1965, p. 210, n. 4; 1966, p. 39, n. 15), and Postal also mentions unbounded branching (pp. 23-24 and passim). To be sure, Postal (p. 78) argued that these aspects of structuralist syntax had not been formalized by the structuralists themselves, and that transformational grammar provided the only known formal account of the phenomena in question:

Thus the claim that Chomsky's notation of PSG reasonably and correctly formalizes the immediate constituent approach to linguistic analysis as developed in America appears well-founded. The only aspects of description utilized by these authors which are not formalized by PSG, namely, discontinuities and perhaps unbounded branching, do receive an apparently correct formalization in TG and this is indeed the only such formalization known.

31 What would today probably be labeled a 'case particle'. 32 Translation: '[I] bought a book and a magazine and a newspaper'.

However, the work of Yngve and Harman in the early 1960's showed that discontinuity could be formalized perfectly well in non-transformational terms. A similar treatment of unbounded branching is made possible by the use of Kleene's (1956) regular expressions (in particular, ones with the concatenation closure, or Kleene star, operator) as right-hand sides of CF productions. 33 Moreover, since no component of structuralist linguistics had been formalized, and since Chomsky was claiming to present precisely such a formalization in the form of PSG, surely it follows that the shortcomings of PSG did not reflect on the structuralist models. This becomes even clearer when we consider that no mention was made of the other ways in which PSG differed from IC theory. In fact, to the extent that structuralist models did contain devices apparently capable of handling many, if not all, of the phenomena which transformations were introduced to describe, there would seem to be some question about the prima facie case that was originally offered for transformational grammar.
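The compilation of starred right-hand sides into ordinary CF productions alluded to here can be exhibited directly. The helper below is our own sketch, with a made-up naming scheme for the auxiliary nonterminals; it handles the Kleene-star case by introducing a fresh recursive symbol per starred position:

```python
def expand_star(lhs, rhs):
    """Rewrite one production whose RHS may contain starred
    symbols (written 'X*') into plain CF productions, using a
    fresh auxiliary nonterminal for each starred position."""
    rules, new_rhs = [], []
    for i, sym in enumerate(rhs):
        if sym.endswith("*"):
            base = sym[:-1]
            aux = f"{lhs}_{base}_STAR{i}"     # invented naming scheme
            rules.append((aux, (base, aux)))  # aux -> base aux
            rules.append((aux, ()))           # aux -> (null production)
            new_rhs.append(aux)
        else:
            new_rhs.append(sym)
    rules.append((lhs, tuple(new_rhs)))
    return rules

# NP -> Det Adj* N  becomes three plain productions.
compiled = expand_star("NP", ("Det", "Adj*", "N"))
```

Note that the compiled grammar uses a null production; by the closure facts of Bar-Hillel et al. (1961) cited in footnote 33, this too leaves the weak generative capacity of the CFG's untouched.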

All this makes it clear that the issue of the adequacy of the IC model and that of PSG are quite independent. Chomsky's arguments may dispose of PSG as a plausible candidate for a theory of human language syntax, but, insofar as they ignore various components of Bloomfield's, Bloch's, and Harris's models, they have little to say about the adequacy of the latter. Of course, the question of exactly how good the structuralist models were and what we have yet to learn from them is very much open. It could be, for example, that Chomsky was right in claiming that any attempt to include such devices as discontinuity in a PSG leads only to "ad hoc and fruitless elaboration" (1957, p. 41, n. 6). Yet most if not all of these elaborations have since found their way into transformational and post-transformational syntax, whereas transformations - that other great elaboration of phrase structure - are no longer as widely accepted among generative grammarians as they were in the 1960's and the early '70's. Moreover, if such devices were indeed ill-advised, then the case against structuralist syntax would have been strengthened by including them in the PSG formalism, and then showing just what was wrong with them, rather than leaving them out of the account.

33 The use of such productions does not alter the weak generative capacity of CFG's, because the CFL's are closed under the operations involved, namely, union, concatenation, and (in the case before us) concatenation, or Kleene, closure. These closure facts had been published by Bar-Hillel et al. (1961) and Chomsky (1963, p. 380).

At the same time, Chomsky's negative judgment about PSG (that is, the model he himself defined) has never really been subjected to a substantive challenge. To see that this is indeed so, consider the fact that every time something labeled as a PSG has been advocated as a plausible model of human syntax, the defenders of phrase structure have enriched the formalism in ways designed to handle precisely the difficulties that Chomsky had identified as insuperable given the simple model of PSG. If Chomsky had been wrong, there would have been no need for null productions, regular expressions, complex symbols, discontinuous constituents, ID/LP, or any number of other devices employed in recent syntactic theories.

This puts the controversy over phrase structure in a new light. The debate began with Chomsky's attempt to show the inadequacy of the IC model. For this issue, the relevant question is not whether Harman or Gazdar et al. have come up with notational variants of PSG or of TG or with something different still; it is rather whether Chomsky's proposed formalization corresponded faithfully to the structuralist practices. If, for example, we agree that Harman came closer to capturing the actual IC ideas by allowing for complex symbols, discontinuous constituents, and deletions than Chomsky did by excluding these, then it was Chomsky and not Harman who clouded the issue with terminological equivocation. That is, Chomsky's use of PSG as a stand-in for IC theory, in spite of the major differences between them, amounted to the introduction of a new and private notion of IC analysis, divorced from the established usage. In sum, much of the actual argument that has taken place over the adequacy of phrase structure has been of little genuine import, for precisely the reasons given by Chomsky in his response to Harman, though clearly not solely, or even principally, through any fault of Harman's.

Interestingly, while Harman (1966) specifically purports to defend PSG as defined by Chomsky, his discussion leaves some doubt as to whether he is really ready to accept simple CFG, since he is still committed to complex symbols, discontinuous constituents, and null productions, but considers them irrelevant on the grounds that rules with complex symbols are abbreviations of sets of rules with simple symbols, and that grammars with discontinuous-constituent or null productions can always be rewritten without such rules. It appears that Harman is making the same kind of mistake that advocates of GPSG would later fall into by arguing that, since every GPSG can be 'compiled' into a CFG, there is a strong equivalence between the two classes of grammars. 34 The problem here is that, if the two models were indeed strongly equivalent, then it is hard to see why the more complex GPSG formalism was introduced in the first place. The same argument applies, mutatis mutandis, to Harman's position.

We would like to add that an argument over whose definition of a term is to be used is not merely, as Gazdar (1982) would have it, an exercise in terminological imperialism; if the parties to a controversy do not agree on what the terms of the debate mean, there is nothing left to debate. But, while we agree with Chomsky on this point, the sword, as we have shown, cuts both ways.

3. WHAT PSG IS TODAY: UNISINISTRALITY

We now come to a deeper question: Is there after all a notion of phrase structure, distinct both from Chomsky's narrow formal definition and from Bloomfield's broader descriptive practice, which has gained general currency among contemporary syntacticians, which is both coherent and nontrivial, and which can be defined explicitly and thus made available for serious discussion? We think that there is, that is, that the complicated history we have sketched above has engendered a popular conception of human language syntax, identified by most with phrase structure grammar but in fact sorely in need of an unambiguous definition as well as an unambiguous label. We suggest that to count as a PSG in this intuitive sense common to most syntacticians, it is necessary and sufficient for a grammar to have the following characteristics:

(i) The rules of the grammar are all of the form (X, Y), where X is a single grammar symbol and Y is a (possibly null) string of grammar symbols or a regular expression, and both X and the symbols in Y may be either simplex or complex symbols.

(ii) Each such ordered pair is interpreted in one of three ways:

as an admissibility condition on trees allowing a node labeled X to directly dominate a string of nodes labeled (in order) either by Y (if Y is a string of grammar symbols) or by some string in the regular set denoted by Y (if Y is a regular expression),

or as a production rule allowing the rewriting of the symbol X in a sentential form 35 by Y (if Y is a string of grammar symbols) or by some string in the regular set denoted by Y (if Y is a regular expression),

or as a tree formation rule allowing nodes labeled by the symbols of Y (if Y is a string of grammar symbols) or by the symbols of some string in the regular set denoted by Y (if Y is a regular expression) to be put under a hitherto terminal node X.

34 Rather than stating that the equivalence obtains, trivially, between CFG's and the 'compiled' object grammars generated by GPSG's (see below).

This definition allows us to take grammars to operate either in terms of string or tree rewriting or in terms of node admissibility. This catholicity seems to be required given that all three approaches are to be found in current literature.
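Under the production-rule interpretation, for instance, a unisinistral grammar supports derivations in the familiar sense. A minimal sketch of ours follows; the tiny actor-action grammar echoes Bloomfield's John ran example, and the function performs ordinary leftmost rewriting:

```python
def derive_leftmost(rules, start, choices):
    """Rewrite the leftmost nonterminal at each step, following
    `choices`, a list of indices selecting among a symbol's RHSs.
    Unisinistrality: every left-hand side is a single symbol,
    so a rule applies wherever that symbol occurs."""
    form = [start]
    for choice in choices:
        idx = next(i for i, s in enumerate(form) if s in rules)
        form[idx:idx + 1] = list(rules[form[idx]][choice])
    return form

rules = {
    "S": [("NP", "VP")],
    "NP": [("John",)],
    "VP": [("ran",)],
}
sentence = derive_leftmost(rules, "S", [0, 0, 0])
```

Extending the sketch to the node-admissibility or tree-formation interpretations, or to regular-expression right-hand sides, changes the bookkeeping but not the one-symbol restriction on the left-hand side that the text singles out.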

The only significant restriction on these grammars, then, is that there be only one symbol on the left-hand side of each rule, and we turn the spotlight on this feature by calling such grammars unisinistral. To be sure, there has been occasional mention of context-sensitive rules within the GPSG literature, but it seems clear to us that these have not entered the mainstream of syntactic thought associated with the concept of phrase structure.

It should be noted that we allow the nonterminal vocabulary to be an infinite (specifically, a recursively enumerable) set of symbols and that the set of rules of the grammar may likewise be infinite (recursively enumerable). Of course, an infinite grammar cannot be presented in the usual way. However, it would appear that the popular conception of phrase structure which we are trying to explicate allows the PSG to be itself derived by a higher-order grammar (metagrammar). The derived grammar (object grammar) may then be infinite without violating the intuitive requirement that, for something to be a grammar, it must be finitely specifiable. The fact that such a generalization of the concept of (phrase structure) grammar is indeed necessary to account for the state of the field is clear from the widely-accepted claims advanced by Gazdar (1982) and others about the weak and strong equivalence of their GPSG's to CFG's. Since a GPSG metagrammar derives an object grammar rather than a language, these claims can only be interpreted as referring to the sets of strings and trees generated by these object grammars. Moreover, while there is general agreement that, for linguistic reasons, we should constrain such models so as to guarantee the finiteness of object grammars, this very fact shows that an infinite object grammar is conceptually consistent with the basic model (and apparently also with the early version of GPSG of Gazdar and Sag, 1980).

35 A sentential form is a string derived from the initial symbol. Thus, it may still contain nonterminals and hence not be generated by the grammar.
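The metagrammar/object-grammar arrangement can be pictured as a function from rules to rules. The sketch below is our own invention; the 'slashed' categories are only loosely reminiscent of GPSG and are not Gazdar's actual metarule formalism. It derives an object grammar properly containing its base:

```python
def apply_metarule(base_rules, metarule):
    """Extend a finite base grammar by one pass of a metarule:
    a function mapping each rule to a derived rule or to None."""
    object_rules = set(base_rules)
    for rule in base_rules:
        derived = metarule(rule)
        if derived is not None:
            object_rules.add(derived)
    return object_rules

def slash_np(rule):
    # For every rule with an NP daughter, derive a rule in which
    # that NP is 'missing': the NP is removed and the mother is
    # marked with /NP -- a crude stand-in for a GPSG-style metarule.
    lhs, rhs = rule
    if "NP" in rhs:
        i = rhs.index("NP")
        return (lhs + "/NP", rhs[:i] + rhs[i + 1:])
    return None

base = {("S", ("NP", "VP")), ("VP", ("V", "NP"))}
obj = apply_metarule(base, slash_np)
```

If such a metarule were allowed to apply to its own outputs while minting ever-new category names, the closure would be an infinite object grammar of precisely the kind discussed in the text.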

It might seem that allowing infinite grammars trivializes the notion of unisinistral grammar, but we hasten to note (a) that we are not advocating unisinistral grammar either as a contribution to the study of formal systems or as a theory of human language but rather as an attempt at the faithful representation of an implicit notion of PSG that is already widespread in modern syntactic theory, and (b) that even such an apparently unrestricted model of grammar may be quite limited in human language applications. An example to which we shall return is discontinuous constituents, which unisinistral grammars clearly do not allow, inasmuch as Y is either a (continuous) string of symbols or variables, or a regular expression denoting a set of such strings.

There are three ways in which our unisinistral grammars differ from context-free grammars as defined by Chomsky. First, we admit regular expressions, thus allowing unbounded branching. Second, we permit complex symbols. Third, we allow the (object) grammar to be infinite. All three of these factors seem to have become an integral part of the current conception of phrase structure, and leaving them out would have ill served our goal of capturing that conception in explicit terms. It should be noted that the first two of these factors, unbounded branching and complex symbols, appear to have been present in the original Bloomfieldian conception of IC analysis, as we have argued above. We have not included any of the other facets of Bloomfield's model that were left out of Chomsky's phrase structure model, because they do not appear to have become assimilated to the popular conception of phrase structure which we are seeking to describe.

We have thus deliberately chosen to exclude some formal devices which have been gaining currency, such as the separation of rules for immediate dominance and linear precedence, explicit incorporation of discontinuity, and the like. To the extent that these devices are desirable in their own right, that very fact may point to the linguistic inadequacy of the popular conception of phrase structure, and allowing for them in our description would no more vindicate the unisinistral model than the development of generalized phrase structure grammar did the simple context-free model. It is important that two kinds of theoretical progress be distinguished: that which comes from exploiting an existing model in novel ways, and that which comes from extending or modifying it. Thus, to claim that a model incorporating, say, discontinuous constituents or the ID/LP rule format is 'really' a variant of unisinistral grammar would do nothing more than confuse the very issues which we have sought to clarify.

4. THE CONSEQUENCES AND LIMITATIONS OF UNISINISTRALITY

We shall now take up the question of the generative capacity of unisinistral grammars, beginning with some prefatory remarks on the notion of generative capacity and its relation to theoretical models of human language.

Almost all the known mathematical results about generative capacity have to do with weak generative capacity, that is, the set of terminal strings derived by a grammar. Strong generative capacity, the set of derivations (or else, on other definitions, the set of trees) defined by the grammar, has been little explored by comparison, and the implications of results about it often invite misinterpretation. In this connection, two points stand out as especially significant.

The first is that the Chomsky hierarchy can only be used to measure weak generative capacity. Although the Church-Turing thesis implies that there are no computational devices with greater weak generative capacity than that of type-0 grammars, the strong generative capacity (i.e., the sets of derivations) of type-0 grammars is anything but unrestricted. For example, no type-0 grammar allows an unbounded-length string of symbols to replace a given nonterminal in one step in a derivation, whereas a CFG augmented with productions having regular expressions on the right-hand side can yield such derivations. Yet such augmented CFG's generate only context-free languages, a proper subset of the type-0, or recursively enumerable, languages. Similarly, no type-0 grammar will ever define a derivation in which a string of nonterminals like AB is replaced by a string like aBc in one step, by means of a rewrite rule that rewrites A as ac. Yet, there exist formal systems which do precisely this but whose weak generative capacity is equal to that of the CFG's (for example, the PSG's of Yngve, 1960, and Harman, 1963).

It should be noted that, since type-0 derivations do not always define parse trees of the usual sort, we have stated these observations directly in terms of derivations, i.e., sequences of strings. However, we can also choose to consider only those type-0 derivations in which a single nonterminal is rewritten in a single step. 36 It then becomes possible to associate parse trees unambiguously with derivations (though not vice versa), and then we can simply say that tree sets with unbounded numbers of sister nodes and tree 37 sets with discontinuous constituents are outside the strong generative capacity of type-0 grammars. This claim does not contradict the obvious fact that the derivations induced by a CFG with regular expressions form an r.e. language (provided the derivations are suitably coded). Under such an encoding, each CF derivation would be represented as a string, and the set of these strings would be generated by some type-0 grammar, but this type-0 grammar would still be unable to derive an unbounded number of symbols from one nonterminal in one step. Mutatis mutandis, the same is true for the case of discontinuous constituents.

The second point about strong generative capacity is that it is undefined for almost all current models of human language. The notion was developed for CSG's and CFG's, grammars whose derivations can be represented in tree form (Chomsky, 1959 [1965, pp. 128-29]). But most contemporary syntactic theories do not incorporate the same notion of derivation or assign simple tree structures to their strings. This fact is well illustrated by the claim of Gazdar (1982, p. 134) that the use of complex symbols in a CFG does not alter strong generative capacity. As was pointed out by Chomsky (1965, pp. 84ff.), if each node of a phrase marker is labeled by a complex symbol, then the resulting object is no longer a tree. The reason is that the complex symbols introduce an 'extra dimension' owing to the fact that, while a tree is defined by relations of dominance and precedence among labeled nodes, the internal structure of the complex symbols does not fit this description: the individual features making up complex symbols do not label separate nodes. Chomsky seems to visualize a phrase marker representing this kind of structure as having an extra dimension for the components of the complex symbols.

To be sure, Chomsky notes that we could reinterpret complex symbols as ordinary node labels (in which case the phrase marker would be a tree), but that this would amount to taking each complex symbol as a simple symbol with a long name. What Chomsky perhaps did not say quite as clearly as he might have is that, if this latter proposal is adopted, then the resulting phrase marker does not in fact reflect the way the grammar functions. This would mean that a set of such phrase marker trees could not be considered to represent the strong generative capacity of a grammar that makes use of the internal structure of the complex symbols. 38

36 If we constrain type-0 grammars to allow only such derivations, weak generative capacity is not affected (Révész, 1976). 37 On a suitably relaxed definition of a tree that allows the representation of discontinuous constituents (as in McCawley, 1982).

It may, of course, be possible to define a new and more general concept of derivation, which would make sense of models, such as TG or GPSG, which define languages in other ways than conventional PSG's. For example, it may be possible to represent the processes whereby derived rules are generated by the metagrammar in GPSG, the processes whereby derived lexical items are produced in LFG, and so on, as part of such a generalized notion of derivation. Since the idea of strong generative capacity appears to have become inextricably tied up with arboreal representations of the sort commonly employed for CFG's, we shall refer to any such more general notion as derivational capacity, and for the sake of symmetry we shall also speak of string capacity rather than weak generative capacity. We shall likewise refer to derivational and string equivalence of formalisms.

It should now be clear that, given a class of grammars, arguments for or against its use as a model of anything, e.g., human language, should not confound string capacity and derivational capacity, since consequences with regard to one of these may not apply to the other. 39 For example, if we are dealing with a non-CF language, we know not only that we cannot use a CFG to describe it but also that we cannot employ any grammar that is string-equivalent to a CFG. Further, since there is a string capacity hierarchy, namely, the Chomsky hierarchy, we can say something about where we might look for a grammar that will generate the language. But if we require non-CF derivational capacity (e.g., a tree set that cannot be defined by a CFG), the situation is quite different. First of all, the class of devices that are derivationally equivalent to the CFG's is properly contained within that of the weakly CF formalisms. Second, since there is no known derivational capacity hierarchy, there is little that we can say about where to look for an adequate grammar. If, to top things off, we need a model that handles structural representations more complex than trees - which would be the case for all the models that have wide currency in contemporary syntax - then we find ourselves very far indeed from the helping hand of formal language theory.

38 Chomsky also notes that, while a simple PSG defines only strictly Markovian derivations (in the sense that the applicability of a rule depends solely on the most recent line in the derivation), the use of complex symbols allows access to earlier lines in a derivation, just as transformations do. Gazdar (1982) sidesteps this argument by pointing to the major differences between TG and CFG with complex symbols. However, the fact that two models are different in certain ways does not preclude their being identical, or similar, in others, and Chomsky's observation clearly shows that CFG with complex symbols, like TG, can encode global, as opposed to local, constraints. 39 A fortiori, we must keep both of these quite distinct from arguments that appeal to elegance, naturalness, and other informal notions.

Another important moral follows: If we were to take CFG's as our point of departure and come up with an argument that human languages require richer structural descriptions than those induced by a CFG, but had no reason to demand greater string capacity than that of the CFG's, then we would not be justified in adopting a model of grammar that in fact is much more powerful in terms of string capacity than the CFG's. By the same token, if our point of departure were a model well beyond the CFG's in both string and derivational capacity, and we discovered that all human languages are CF as far as string capacity is concerned, but found no way to capture the structural properties of human languages in simple CF terms, then we would not want to adopt CFG as our model of grammar. These observations might appear to be so obvious as to be redundant, were it not for the fact that, as our discussion has shown, the history of the origins of TG shows signs of disregard of the first point, while the history of the attempts to revive PSG reveals equal neglect of

the second one.

The reality of human language syntax, in fact, seems quite close (though not identical) to the second of the hypothetical situations we have described. On the one hand, we can reiterate our earlier observation that it is possible for a formalism to be as powerful as desired in terms of string capacity and yet to be sufficiently restricted in derivational capacity (as in the case of type-0 grammars) as to be useless for any human language application. 40 At the same time, there does not seem to be any compelling reason to take the string capacity of human language grammars much beyond that of CFG's, even if the few constructions which induce non-context-freeness are as central as argued by Rounds et al. (1987) and Manaster-Ramer (1987). Nor are these conclusions contradictory, given that string capacity, derivational capacity, and other relevant properties of grammars are largely independent of each other. The discovery that the CFG's are inadequate for human language (Gazdar and Pullum, 1985, and the references cited therein) cannot be taken to indicate a need for a class of grammars differing from the CFG's more than slightly in terms of string capacity (such as TG). Similarly, the discovery that human languages approximate context-free string sets closely, if not exactly (ibid.),

40 For much the same reasons, type-0 grammars or Turing machines are never used for specifying computer programs.


does not invalidate the first point, and says virtually nothing about the form of grammars required for human languages.

Against the background provided by these observations, we now note that though unisinistrality is incompatible with overt context-sensitivity, it does nothing to prevent non-context-freeness of the generated language from being achieved covertly. Unisinistrality does not suffice to limit weak generative capacity, and this reflects the way that the conception of phrase structure has evolved within modern syntactic theory. For example, the GPSG's of the sort described by Gazdar and Sag (1980) apparently generated the type-0 (recursively enumerable) languages, since the constraints on metarules proposed at the time allowed the number of rules in the object grammar to be infinite. 41 To be sure, there are many such grammars which no one would want to seriously entertain as descriptions of any human language, and the considerable interest of that version of GPSG had to do with the properties of only a small subset of the range of allowable grammars. (In this respect, the history of GPSG is not unlike that of early transformational grammar.) Thus, even though arguments about weak generative capacity have on occasion been seen as bearing on the question of the adequacy of phrase structure models, in reality there is little if any connection.

Perhaps more interesting in the context of the debates about phrase structure vs. transformational grammar is the fact that unisinistrality is not only not sufficient to achieve the limitation of weak generative capacity to CFL's, but not necessary either. It should be noted that some of the unease about transformational grammars and the resulting malaise in syntactic theory in the 1970's and 1980's has had to do with Peters and Ritchie's (1973) proof that TG's are string equivalent to type-0 grammars. The nontransformational models that have proliferated since have frequently been heralded, among other things, as solutions to the problem of excessive string capacity. However, a transformational grammar can be constrained in such a way as to generate less than the full set of type-0 languages. For example, a TG with a CF base component and an arbitrary set of optional transformations42 is guaranteed to generate a CF language if it is strongly structure preserving, i.e., if every derived structure is

41 Uszkoreit and Peters (1986) prove this, as well as the fact that, even if we allow in the metarules only one variable ranging over an infinite set of strings, we can still get all type-0 languages.

42 If we allow obligatory transformations, the construction outlined here for a CFL-generating TG will not work, but obligatory transformations have not been used for a number of years.


required to be identical to some base-generated structure. The strong structure preservation constraint can be enforced by an algorithm which verifies that the output of each transformation is equal to some base structure and accepts it only in that case. Since CFL's are recognizable, such a filtering algorithm clearly exists, and in fact it is easy to implement. To be sure, as Tom Wasow (p.c.) reminds us, it was taken for granted for a long time that such a restriction on TG's would be empirically inadequate (presumably because it was believed that human languages were non-context-free).43 We are not arguing, however, that such a transformational grammar would be a satisfying model of human language. Rather we contend that, even if human languages were context-free, this of itself would not be evidence against transformational grammar, any more than a demonstration that human languages are not context-free should count as an argument against phrase structure.
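Such a filter can be sketched directly (a minimal illustration of ours, over a hypothetical toy base grammar; the text describes the algorithm only abstractly): a derived tree passes just in case every local tree in it instantiates some rule of the CF base, so that the tree could itself have been base-generated.

```python
# Minimal sketch of the strong-structure-preservation filter described in
# the text.  The base grammar is a hypothetical toy; trees are nested tuples
# (CATEGORY, child, child, ...), with leaf categories as 1-tuples.

# Toy CF base:  S -> NP VP,  VP -> V NP,  NP -> N
BASE_RULES = {
    ("S",  ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("N",)),
}

def base_generable(tree):
    """Accept a derived tree only if every local tree matches a base rule."""
    label, *children = tree
    if not children:                       # leaf category: nothing to check
        return True
    local = (label, tuple(child[0] for child in children))
    return local in BASE_RULES and all(base_generable(c) for c in children)

# A derived structure identical to a base-generated one passes the filter...
ok = ("S", ("NP", ("N",)), ("VP", ("V",), ("NP", ("N",))))
# ...while one whose root local tree no base rule licenses is filtered out.
bad = ("S", ("VP", ("V",), ("NP", ("N",))), ("NP", ("N",)))

print(base_generable(ok))   # True
print(base_generable(bad))  # False
```

Running the output of every optional transformation through such a check admits only base-generable trees, so the string set generated by the constrained TG remains context-free.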

We conclude this section by mentioning some syntactic phenomena which raise substantive issues regarding unisinistral grammars that need to be addressed. Such phenomena as free word order, discontinuity, agreement, functional relations, and reduplication appear to resist satisfactory treatment in unisinistral terms even if the relevant sets of strings can be generated. To be precise, unisinistral grammars, like type-0 grammars, generate all the string sets that may be equated with the possible human languages, but this does not mean that unisinistral grammars are any more plausible as models of human languages than are type-0 grammars.
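Reduplication makes the string-capacity point vivid (our illustration, not the authors'): the copy language over {a, b}, the formal skeleton of total reduplication, is recognized by a one-line split-and-compare check, so generating the strings is trivial; what resists unisinistral treatment is stating the copying relation itself as a rule.

```python
# Our illustration: {ww : w in {a,b}*} is not context-free, yet recognizing
# its strings takes only a split-and-compare -- string capacity is not where
# the difficulty of reduplication lies.

def is_reduplicated(s):
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]

print(is_reduplicated("abab"))  # True  ("ab" doubled)
print(is_reduplicated("abba"))  # False (a palindrome, not a copy)
```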

All of the phenomena mentioned are ones which it may be impossible to handle in a principled and natural way within the unisinistral framework. Just like type-0 grammars, then, unisinistral grammars may fall far short of the goal of describing the syntax of human languages even if they are permitted full latitude in terms of string capacity. If the treatment of these phenomena as at best special cases is inevitable under the unisinistral assumptions and yet theoretically unjustified, then a very different conception of grammar will be required. Current trends in the field of syntax in fact argue that this is the case. Among these we must note the ID/LP format which divorces rules governing linear order from those controlling dominance, the formalization of discontinuous constituents by wrapping operations, and the explicit treatment of relational (functional) categories. Of course, as noted, two of the original IC devices that had been left out of Chomsky's formalization of phrase structure (unbounded branching

43 Instead, considerable energy was devoted to restricting transformations so as to exclude non-recursive languages (see Wasow, 1978, and the literature cited therein).


and cross-classification) have won virtually universal acceptance. We have, therefore, included them in our conception of unisinistral grammar, since they appear no longer to be at issue. But, to keep track of the issues, we have kept out of unisinistral grammar other devices which had been part and parcel of Bloomfield's theory of human syntax but which have not been generally included in contemporary conceptions of phrase structure.

4. CONCLUSIONS

In this paper, we have argued two main points:

(i) Much of the debate over the adequacy of phrase structure analyses of human language syntax has been rendered pointless by inconsistency in the way the term phrase structure grammar has been used. By the same token, however, the original transformationalist critique of IC analysis is rendered suspect by the discrepancies between the actual Bloomfieldian model and Chomsky's formalization of it in terms of PSG.

(ii) There is nonetheless a coherent and substantive issue which needs to be addressed in contemporary syntactic theory, namely, the adequacy of a conception of phrase structure which we have tried to explicate in terms of our notion of unisinistrality.

In the course of discussing these points we have also sought to clarify such central theoretical concepts as generative capacity, complex symbol, and tree. We have sought, with our observations regarding the generative capacity and the adequacy of type-0 and unisinistral grammars, to exemplify the important principle that even an apparently highly stringent restriction on the format of grammar rules may have no effect on the string capacity of the model. Yet at the same time, a formalism may be unrestricted in terms of string capacity, but still have no use as a tool for analyzing human languages (as in the case of type-0 grammars). If nothing else, this argues for caution in making and evaluating claims about the power, equivalence, and adequacy of different models of grammar.

REFERENCES

Bach, E.: 1981, 'Discontinuous Constituents in Generalized Categorial Grammar', Proceedings of the Eleventh Conference of the New England Linguistic Society, pp. 1-12.
Bar-Hillel, Y., M. Perles, and E. Shamir: 1961, 'On Formal Properties of Simple Phrase Structure Grammars', Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 14, 143-172.
Berwick, R. and A. Weinberg: 1982, 'Parsing Efficiency, Computational Complexity, and the Evaluation of Grammatical Theories', Linguistic Inquiry 13, 165-91.
Bloch, B.: 1946, 'Studies in Colloquial Japanese II: Syntax', Language 22, 200-48.
Bloomfield, L.: 1914, An Introduction to the Study of Language, Henry Holt and Co., New York.
Bloomfield, L.: 1933, Language, Holt, Rinehart, and Winston, New York.
Bloomfield, L.: 1962, The Menomini Language, Yale University Press, New Haven.
Bresnan, J., R. M. Kaplan, S. Peters, and A. Zaenen: 1982, 'Cross-serial Dependencies in Dutch', Linguistic Inquiry 13, 613-35.
Chomsky, N.: 1956, 'Three Models for the Description of Language', IRE Transactions on Information Theory IT-2, 113-24. Reprinted 1965 in R. D. Luce, R. R. Bush, and E. Galanter (eds.), Readings in Mathematical Psychology, Vol. 2, pp. 105-24, John Wiley, New York.
Chomsky, N.: 1957, Syntactic Structures, Mouton, The Hague.
Chomsky, N.: 1959, 'On Certain Formal Properties of Grammars', Information and Control 2, 137-67. Reprinted 1965 in R. D. Luce, R. R. Bush, and E. Galanter (eds.), Readings in Mathematical Psychology, Vol. 2, pp. 124-55, John Wiley, New York.
Chomsky, N.: 1961, 'On the Notion "Rule of Grammar"', in R. Jakobson (ed.), Proceedings of Symposia in Applied Mathematics, Vol. XII: Structure of Language and its Mathematical Aspects, pp. 6-24, American Mathematical Society, Providence.
Chomsky, N.: 1962a, 'The Logical Basis of Linguistic Theory', in Preprints of Papers for the Ninth International Congress of Linguists, August 27-31, 1962, Cambridge, Mass., pp. 509-74, [Massachusetts Institute of Technology?], Cambridge, MA.
Chomsky, N.: 1962b, 'A Transformational Approach to Syntax', in [A. A. Hill (ed.)], Third Texas Conference on Problems of Linguistic Analysis in English, May 9-12, 1958, pp. 124-158, University of Texas Press, Austin.
Chomsky, N.: 1963, 'Formal Properties of Grammars', in R. D. Luce, R. R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. 2, pp. 323-418, John Wiley, New York.
Chomsky, N.: 1964, 'The Logical Basis of Linguistic Theory', in H. G. Lunt (ed.), Proceedings of the Ninth International Congress of Linguists, Cambridge, Mass., August 27-31, 1962, pp. 914-978, Mouton, The Hague.
Chomsky, N.: 1965, Aspects of the Theory of Syntax, MIT Press, Cambridge, MA.
Chomsky, N.: 1966, Topics in the Theory of Generative Grammar, Mouton, The Hague.
Chomsky, N.: 1970, 'Remarks on Nominalization', in R. Jacobs and P. Rosenbaum (eds.), Readings in English Transformational Grammar, pp. 184-221, Ginn, Waltham, MA.
Chomsky, N. and G. A. Miller: 1963, 'Introduction to the Formal Analysis of Natural Languages', in R. D. Luce, R. R. Bush, and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. 2, pp. 269-322, John Wiley, New York.
Gazdar, G.: 1981, 'Unbounded Dependencies and Coordinate Structure', Linguistic Inquiry 12, 155-84.
Gazdar, G.: 1982, 'Phrase Structure Grammar', in P. Jacobson and G. K. Pullum (eds.), The Nature of Syntactic Representation, Reidel, Dordrecht.
Gazdar, G.: 1985, 'Applicability of Indexed Grammars to Natural Languages', CSLI Report 85-34, Stanford University, Stanford, CA.
Gazdar, G. and G. K. Pullum: 1982, GPSG: A Theoretical Synopsis, Indiana University Linguistics Club, Bloomington, IN.
Gazdar, G. and G. K. Pullum: 1985, 'Computationally Relevant Properties of Natural Languages and Their Grammars', New Generation Computing 3, 273-306.
Gazdar, G. and I. A. Sag: 1980, 'Passives and Reflexives in Phrase Structure Grammar', in J. A. G. Groenendijk, T. Janssen, and M. Stokhof (eds.), Formal Methods in the Study of Language, Mathematical Centre Tracts 135, pp. 131-52, Mathematisch Centrum, Universiteit van Amsterdam, Amsterdam.
Gazdar, G., E. Klein, G. K. Pullum, and I. A. Sag: 1985, Generalized Phrase Structure Grammar, Harvard University Press, Cambridge, MA.
Harman, G. H.: 1963, 'Generative Grammars without Transformation Rules: A Defense of Phrase Structure', Language 39, 597-616.
Harman, G. H.: 1966, 'The Adequacy of Context-Free Phrase-Structure Grammars', Word 22, 276-293.
Harris, Z. S.: 1945, 'Discontinuous Morphemes', Language 21, 121-27.
Harris, Z. S.: 1946, 'From Morpheme to Utterance', Language 22, 161-83.
Harris, Z. S.: 1951, Methods in Structural Linguistics, University of Chicago Press, Chicago.
Harris, Z. S.: 1962, String Analysis, Mouton, The Hague.
Harrison, M. A.: 1978, Introduction to Formal Language Theory, Addison-Wesley, Reading, MA.
Hockett, C. F.: 1954, 'Two Models of Grammatical Description', Word 10, 210-31.
Hockett, C. F.: 1958, A Course in Modern Linguistics, Macmillan, New York.
Hockett, C. F.: 1961, 'Grammar for the Hearer', in R. Jakobson (ed.), Proceedings of Symposia in Applied Mathematics, Vol. XII: Structure of Language and its Mathematical Aspects, pp. 220-236, American Mathematical Society, Providence.
Hopcroft, J. E. and J. D. Ullman: 1979, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA.
Huck, G. J. and A. E. Ojeda (eds.): 1987, Discontinuous Constituency [= Syntax and Semantics 20], Academic Press, Orlando, FL.
Hudson, R. A.: 1976, Arguments for a Nontransformational Grammar, University of Chicago Press, Chicago.
Hudson, R. A.: 1984, Word Grammar, Basil Blackwell, Oxford.
Joshi, A. K.: 1983, 'How Much Context-Sensitivity is Required to Provide Reasonable Structural Descriptions: Tree-Adjoining Grammars', in D. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Parsing: Psycholinguistic, Computational, and Theoretical Perspectives, pp. 190-204, Cambridge University Press, Cambridge.
Kac, M. B.: 1985a, Grammars and Grammaticality, unpublished ms., University of Minnesota.
Kac, M. B.: 1985b, 'Constraints on Predicate Coordination', Indiana University Linguistics Club, Bloomington.
Kleene, S. C.: 1956, 'Representation of Events in Nerve Nets and Finite Automata', in Automata Studies, pp. 3-42, Princeton University Press, Princeton.
Lamb, S. M.: 1962, Outline of Stratificational Grammar, University of California, Berkeley.
Manaster-Ramer, A.: 1978, 'The Position of the Verb in Dutch and German', Papers from the Fourteenth Regional Meeting, pp. 254-63, Chicago Linguistic Society, Chicago.
Manaster-Ramer, A.: 1987, 'Dutch as a Formal Language', Linguistics and Philosophy.
McCawley, J. D.: 1968, 'Concerning the Base Component of a Transformational Grammar', Foundations of Language 4, 243-269.
McCawley, J. D.: 1982, 'Parentheticals and Discontinuous Constituent Structure', Linguistic Inquiry 13, 91-106.
Ojeda, A.: 1987, 'Discontinuity and Phrase Structure', in A. Manaster-Ramer (ed.), Mathematics of Language, John Benjamins, Amsterdam.
Percival, W. K.: 1976, 'On the Historical Source of Immediate Constituent Analysis', in J. D. McCawley (ed.), Syntax and Semantics 7: Notes from the Linguistic Underground, pp. 229-242, Academic Press, New York.
Peters, S. and R. Ritchie: 1969, 'Context-sensitive Immediate Constituent Analysis - Context-free Languages Revisited', Proceedings of the ACM Symposium on Theory of Computing, pp. 1-8.
Peters, S. and R. Ritchie: 1973, 'On the Generative Power of Transformational Grammars', Information Sciences 6, 49-83.
Pike, K. L.: 1954-60, Language in Relation to a Unified Theory of the Structure of Human Behavior, 3 parts, Summer Institute of Linguistics, Glendale, CA.
Postal, P. M.: 1964a, Constituent Structure: A Study of Contemporary Models of Syntactic Description, Mouton, The Hague.
Postal, P. M.: 1964b, 'Limitations of Phrase Structure Grammars', in J. A. Fodor and J. J. Katz (eds.), The Structure of Language: Readings in the Philosophy of Language, pp. 137-51, Prentice-Hall, Englewood Cliffs, NJ.
Pullum, G. K. and G. Gazdar: 1982, 'Natural Languages and Context-free Languages', Linguistics and Philosophy 4, 471-504.
Révész, G.: 1976, 'A Note on the Relation of Turing Machines to Phrase Structure Grammars', Computational Linguistics and Computer Languages 11, 11-16.
Rounds, W. C., A. Manaster-Ramer, and J. Friedman: 1987, 'Finding Natural Languages a Home in Formal Language Theory', in A. Manaster-Ramer (ed.), Mathematics of Language, pp. 349-59, John Benjamins, Amsterdam.
Savitch, W. J., E. Bach, W. Marsh, and G. Safran-Naveh (eds.): 1988, The Formal Complexity of Natural Language, Reidel, Dordrecht.
Uszkoreit, H. and S. Peters: 1986, 'On Some Formal Properties of Metarules', Linguistics and Philosophy 9, 477-94.
Wasow, T.: 1978, 'On Constraining the Class of Transformational Languages', Synthese 39, 81-104.
Wells, R. S.: 1947, 'Immediate Constituents', Language 23, 81-117.
Yngve, V. H.: 1960, 'A Model and an Hypothesis for Language Structure', Proceedings of the American Philosophical Society 104, 444-66.

Alexis Manaster-Ramer
Box 704
T. J. Watson Research Center
IBM Corporation
Yorktown Heights, NY 10598
U.S.A.

Michael B. Kac
Department of Linguistics
University of Minnesota
Minneapolis, MN 55455
U.S.A.

