
Boyland, Joyce Tang. 2009. Usage-based Models of Language. In D. Eddington (Ed.), Experimental and Quantitative Linguistics (pp. 351-419). Munich: Lincom.


USAGE-BASED MODELS OF LANGUAGE

JOYCE TANG BOYLAND
Alverno College and University of Wisconsin-Milwaukee, U.S.A.

Abstract

What is the source of the abstract structure that linguists study in the form of syntax, morphology, and phonology? Usage-based linguists present a wide range of evidence that linguistic structure – syntactic, morphological, and phonological – is largely a product of humans' collective and accumulated experience. Over the past 20-30 years, the increasing availability of recorded and transcribed speech, as well as other large text corpora, has allowed our theorizing to be constrained by empirical observations. Research has given us reason to believe, first, that humans' multiple representations of linguistic structures reflect the statistics of an expression's occurrence, and second, that human cognitive processing can and does make use of usage statistics. This chapter gathers relevant research (both theoretical and empirical) from synchronic and diachronic research on grammar, from cognitive psychology and cognitive modeling, and from computational linguistics. Together this research gives a cross-disciplinary view of why usage is so central in the mental processes and representations of human language. Continued cross-fertilization should be a hallmark of future research.

1 Introduction

1.1 General introduction

Usage-based models of language have in recent years captured the attention of linguists of many persuasions. They are designed to explain regularity within the messiness of descriptively adequate grammars and to capitalize on the findings of research in the cognitive sciences. These models propose that linguistic knowledge is embodied in mental processing and mental representations that are sensitive to context and statistical probabilities, which are both cognitively plausible and powerful enough to account for the complexity of actual language use. Usage-based models concentrate on the power of the distributional facts of actual language use to form speakers' mental representations of language structures at all levels, from phonetics and phonology through morphology and syntax to pragmatics. When speakers record multiple instances of language use in particular contexts, they develop an increasingly rich implicit knowledge base from which they can, in a cognitively realistic way, generate increasingly sophisticated generalizations without necessary recourse to a priori grammatical rules. The idea that usage-based processes and representations constitute knowledge of language has manifold implications. It profoundly affects our understanding of synchronic phenomena that are currently seen as explainable by rules, our understanding of how change takes place diachronically as well as how stability is maintained, and our understanding of patterns of variability both within and across speakers. If usage-based mechanisms are shown to be powerful enough to explain both the rule-like core phenomena and the messy marginal variation in synchronic structure and diachronic trends, then they offer an elegant alternative to theories that minimize the role of well-established cognitive mechanisms while maximizing the role of otherwise-unobserved special-purpose innate ideas (e.g., universal grammar). The goal of this chapter is to make this case. The chapter presents the usage-based approach: its disciplinary foundations, the claims it makes, the methodology and evidence that support it, and the implications it has for linguistics.

In the broadest possible terms, a usage-based model of language is one that explains both acquisition and language processing, both synchronic and diachronic phenomena, and both low-level and high-level structure in language, all primarily through reference to speakers’ experiences with linguistic input. The goal of finding such explanations is a reflection of two primary drives: the drive for descriptive adequacy, and the drive for parsimony in explanation.

The desideratum of descriptive adequacy motivates attention to regularly occurring linguistic phenomena that have sometimes been attributed to performance. Usage-based models aim to include these regularities in their descriptions of the language. They suggest that hearing and producing instances of language creates in speakers new or modified mental representations that are part of their linguistic competence. At a minimum, these representations, and the processes that operate on them, must be able to explain what Joseph (1997b) calls the core of language: the set of utterances that submit most easily to universal generalizations. However, usage-based representations also explain what Joseph calls the periphery: utterances that may have internal structure but do not submit to universally applicable rules, such as semi-productive patterns like "Down with X!" (Tomasello, 2003b). Therefore, usage-based models take particular care not to exclude from analysis units of speech that are, to varying degrees, fixed or conventionalized (Hudson, 1998), and which have often been relegated to the status of idiom. As discussed by Pawley and Syder (1983), Hopper (1987), and others, there are degrees of conventionality, and it is important to account for speakers' graded sense of the acceptability of grammatically licensed and semantically coherent utterances that nevertheless sound unnatural. Usage-based models take care to account for the acceptability of both the freely generated and the formulaic with the same processes and representations.

The desideratum of parsimony leads usage-based models to rely whenever possible on forms of mental representation, and forms of mental processing, that are already known to cognitive science. A usage-based model reaches first for what is already known in cognitive science to help it state the properties of mental representations, or state how these representations can be arrived at by applying well-established mental processes to the input data. At the center of usage-based models are these claims about the mental representation of linguistic knowledge, and about the mental processes that regularly operate upon those representations.

A description of one usage-based model (out of many very diverse types) may serve to ground the discussion.

1.2 An example of usage-based research

One of the most comprehensive monographs completed in the usage-based framework is Krug's (2000) study of English neo-modals (e.g., gonna, wanna, hafta, gotta, useta). Krug applies the insights of usage-based linguistics to follow the progress of these auxiliary verbs, whose pragmatic, semantic, syntactic, morphological, and phonological properties have migrated far from their origins. He finds that the usage-based approach offers good analyses of both diachronic and synchronic data.

Consider his analysis of the development of wanna. Using diachronic corpus data, he documents a steep rise in the verbal use of want, particularly with an infinitival complement, between 1800 and the present, beginning with genres closer to speech (drama and personal letters), and spreading to less speech-like genres. Along with this rise in frequency, he notes a corresponding continuing increase in syntactic bondedness between want and to, as judged by this phrase’s interruptibility with intervening adverbs. This pattern is in accordance with usage-based theory, as frequency strengthens the representation of a spoken sequence (of whatever size) and increases the mutual dependency of each of its components (Boyland, 1996).
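Krug's interruptibility measure is described here only in prose. The following sketch shows one way such a bondedness proxy could be computed from corpus slices; the toy sentences, period labels, and the adjacency heuristic are assumptions introduced for illustration, not Krug's actual procedure.

```python
# Minimal sketch (not Krug's method): estimate the bondedness of want + to
# as the proportion of occurrences in which the two words are strictly
# adjacent, per corpus period. Sentences and period labels are invented.
corpus = {
    "1850-1899": [
        "i want to go", "he wants very much to stay",
        "we want to see it", "they want so much to leave",
    ],
    "1950-1999": [
        "i want to go", "you want to see this", "we want to leave now",
        "she wants to stay", "i want to know",
    ],
}

def bondedness(sentences):
    """Share of want/wants ... to sequences with no intervening material."""
    adjacent = interrupted = 0
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok in ("want", "wants"):
                following = tokens[i + 1:i + 5]
                if following[:1] == ["to"]:
                    adjacent += 1
                elif "to" in following:   # 'to' nearby but not adjacent
                    interrupted += 1
    total = adjacent + interrupted
    return adjacent / total if total else 0.0

for period, sentences in corpus.items():
    print(period, round(bondedness(sentences), 2))
```

A real study would of course use part-of-speech tags and a proper diachronic corpus; the point here is only the shape of the measure.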

Semantically, through detailed review of the historical data, he confirms that the diachronic pathways found by Bybee et al. (1994) are valid for the development of wanna in English as well. Want clearly initially expressed lack, as in for want of a nail, a shoe was lost. A sense of lack easily admits a sense of necessity, and there are indeed sentences attested in the 17th century that denote neither lack nor the current sense of volition: the Money, which you favour'd me with, I chiefly want to prosecute this design. This may be paraphrased as the money you gave me – I need it in order to carry out this plan (Krug, 2000, p. 144, from the ARCHER corpus). Over the next two centuries, the predominant sense became the sense of volition that is most common in contemporary English. More recently, however, want has begun to take on a more deontic semantics, as in You've got [a] toothache? You wanna see a dentist (Krug, 2000), which can be said regardless of the listener's actual desire to meet the bearer of the drill. In some more recent cases, the semantics has even become epistemic: Coolers? They wanna be on one of the top shelves (Krug, 2000, p. 150). This semantic pathway from objective physical absence to subjective perceived obligation and even perceived likelihood confirms the tendencies found in usage-based studies of grammaticization (Traugott, 1989; Sweetser, 1991).

On the phonological level, Krug found a wide range of pronunciations of want to, ranging from full citation form down to [w], with over 20% of cases eliding the [t], even in Britain, where [t] is generally enunciated more fully than in the United States. This synchronic variation is consistent with a usage-based conception of language change, which depends on competition between variants (Croft, 2000), and in which variation is most salient during periods of change in progress. Furthermore, he found that the higher the frequency of an expression (e.g., want to), the higher the proportion of contracted variants (e.g., wanna). Thus, for example, going to is more frequent than want to, and gonna therefore occurs in a higher proportion of instances than wanna. The frequency of the reduced forms, such as wanna, and their increased appearance in print, testify to the entrenchment of these reduced variants in speakers' repertoires, as is predicted in a usage-based account.
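The predicted relation between string frequency and rate of reduction can be made concrete with a small calculation. In the sketch below the token counts are invented; only the qualitative pattern (going to more frequent, and more often reduced, than want to) mirrors the finding reported above.

```python
# Invented counts of full vs. reduced variants; not Krug's data.
variant_counts = {
    "going to": {"full": 1200, "reduced": 2800},   # gonna
    "want to":  {"full": 1500, "reduced": 1700},   # wanna
    "got to":   {"full": 500,  "reduced": 550},    # gotta
}

for expression, counts in variant_counts.items():
    total = counts["full"] + counts["reduced"]
    rate = counts["reduced"] / total
    print(f"{expression:9s} total={total:5d} reduced_rate={rate:.2f}")
```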

Methodologically speaking, Krug’s extensive mining of corpus data is characteristic of usage-based research, and embodies a number of usage-based ideals. Corpora provide real data, which can serve purposes that judgment data cannot. Real data feed in as input to the next cycle of language use (both in language acquisition and in adult language development), and thus are more informative in the search for mechanisms of language change. In addition, Krug takes advantage of the fact that corpus data are amenable both to quantitative study, useful for grasping the general state of affairs, and to qualitative study, useful for discovering and appreciating the relevant nuances.

Krug (2000, p. 176) summarizes as follows:

It seems clear that string frequency is an indicator of increased bondedness between the constituents of constructions; it is likely that it is a fairly robust principle in the grammaticalization of items originally consisting of more than one word; and it is conceivable that it constitutes a factor in language change in general. If any of the preceding assumptions [sic] were true, it would seem foolish to exclude performance data from considerations of linguistic theory. In that case, ignoring natural discourse would prevent us from gaining insights into language and language change that cannot otherwise be obtained.

Krug's investigation yields more than a simple confirmation of prior hypotheses found in the literature, as the data in aggregate reveal a higher-order structure. At the synchronic level, the auxiliary-like verbs, including the neo-modals, form loose prototypicality clusters. None of the verbs behaves exactly like any of the others, but there are clusters that behave similarly to each other. He finds that the paths that the various not-quite-modals are taking converge on a newly emerging class of neo-modals. These neo-modals have modal-like properties that they formerly did not have. The members of this emerging class are now more similar to each other than they used to be. This change offers support for the usage-based idea that the representations of utterances in a speaker's mind – their strength and their neighborhood relations – actually do have some force on the progress of language change.

This gravitation model suggests that the cluster of neo-modals has begun coalescing around the more frequent ones, and that as the cluster gains in "mass" (or strength), it strengthens the original members of the cluster (see Rosch, 1978) and also attracts weaker, less frequent verbs into its orbit. Such a quantitative model explains the rise of the neo-modals and makes predictions for the future of this verb class.
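Krug presents the gravitation idea qualitatively; the toy simulation below is only an illustrative rendering of it. The feature dimensions, vectors, frequencies, and the update rule are all assumptions made up for this sketch, not part of Krug's model.

```python
import numpy as np

# Invented 'modal-likeness' feature vectors and token frequencies.
verbs = {
    "gonna":  (np.array([0.9, 0.8]), 900),
    "wanna":  (np.array([0.7, 0.6]), 600),
    "gotta":  (np.array([0.6, 0.7]), 400),
    "needta": (np.array([0.3, 0.4]), 50),
}

def gravitate(verbs, rate=0.1):
    """One step of attraction toward the frequency-weighted cluster centroid.

    High-frequency members anchor the centroid and barely move; low-frequency
    members are pulled in most strongly, mimicking the 'gravitation' idea.
    """
    features = np.array([vec for vec, _ in verbs.values()])
    freqs = np.array([freq for _, freq in verbs.values()], dtype=float)
    centroid = (features * freqs[:, None]).sum(axis=0) / freqs.sum()
    updated = {}
    for name, (vec, freq) in verbs.items():
        pull = rate * freqs.sum() / (freqs.sum() + freq)
        updated[name] = (vec + pull * (centroid - vec), freq)
    return updated

for _ in range(5):
    verbs = gravitate(verbs)
for name, (vec, _) in verbs.items():
    print(name, np.round(vec, 2))
```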

Krug’s account very simply handles questions that have occupied and flummoxed syntacticians on the topic of wanna-contraction (Pullum, 1997). Why can some verbs but not others undergo contraction with infinitive to? The answer is that any such compound construction that becomes highly frequent and thus highly bonded would have the potential to contract in such a way with to. Constructions that have not reached such a level of frequency and bonding would not be able to support the kinds of contractions that wanna can. It would have been difficult to make such a prediction solely on the basis of the formal properties of the sources without regard to how they are used by speakers, how they are received and processed by listeners, and how they are acquired by learners.

2 Disciplinary roots, definition, and delineation

A number of disparate research streams have converged to form the current ecumenical usage-based movement. One major driving force is known as West Coast functionalism (Bybee, 1999), or cognitive-functional linguistics (Tomasello, 2003b). From outside of linguistics proper, cognitive psychologists and psycholinguists of various stripes have contributed insights into what kinds of representations and processes are cognitively plausible. The statistical sophistication of computational linguists and their focus on real usage data have been increasingly important as well. On the side of those more committed to formal description, phonologists and syntacticians trained in Optimality Theory have joined in the conversation (Hayes & Londe, 2006). Each of these streams has brought a distinct voice that argues for the power and importance of the details of linguistic input in explaining how speakers develop mental representations for language.

2.1 West Coast cognitive functionalism

The term usage-based is sometimes used interchangeably with the terms cognitive and functionalist. Although this conflation is somewhat misleading, it is not surprising, because West Coast cognitive functionalism (WCCF) was, in many respects, the spark that launched the usage-based enterprise. There are several reasons why so many of the germinal ideas of the usage-based approach had their beginnings in WCCF.

First, WCCF has traditionally been concerned with handling phenomena that tended to be set aside by Chomskyan syntacticians, since WCCF originated as a reaction against the elevation of algebraic abstract syntax over other constructs that were legitimate targets of linguistic inquiry. Although an interest in semantics did not develop into an identifying characteristic of usage-based linguistics, semantics has had its impact by introducing non-algebraic forms of representation. A watershed moment in WCCF was the publication of Rosch et al. (1976). Rosch, a cognitive psychologist, showed that semantic categories, in human cognition, are not represented with clear-cut boundaries. Instead, categories are fuzzy; some instances of a concept are more central, while other instances are more peripheral. Later, Lakoff (1987) introduced these discoveries to the study of language. These ideas, while originally referring only to semantic categories, led to a host of arguments questioning Chomskyan linguistics across the board. If categories are fuzzy, what of syntactic categories and the rules that depend on them for coherent operation? For the purposes of usage-based models of language, what is important to recognize is that Rosch's and Lakoff's work opened the door to two things. First, it legitimized interest in the peripheral phenomena that cannot easily be shoehorned into the reflexes of the usual clearly bounded, brittle rules, and which thus are often listed as idioms or exceptions. Second, the ontological status of the rules themselves was questioned: if they do not exist in human minds, to what extent is it legitimate to say that they exist at all? (Lima, Corrigan, & Iverson, 1994).

A second reason that WCCF was a natural breeding ground for usage-based models is that WCCF is as a whole less interested in special-purpose hard-wired mechanisms and representations, and more interested in providing explanations that refer to the demands imposed by the textual or social context, and also to the cognitive processes endemic to all human thought. For example, Li and Thompson (1976), in exploring the differences between the grammatical notions of subject vs. topic, discovered that the cleanest explanation of the historical development of the grammatical category of subject was that “subjects are essentially grammaticalized topics” (p. 484). That is, the topic of a sentence is a loosely regulated functional entity with few fixed formal properties, but as speakers of a language routinize the use of topics, fixed formal properties begin to accrue and become automatized, until this tight formal structure becomes a syntactic entity, namely the subject (Givón, 1979c; 1979b; 1989). This understanding of the nature of grammar, as a product of pragmatic and cognitive considerations (Hopper, 1997), was formative in the development of usage-based models. Givón developed this idea extensively (Givón, 1979c; 2001; 1979a), and brought the insights of functionalism to the table in such a way that scholars in other linguistic sub-disciplines could make use of them. Articulating clearly and memorably how pragmatic sources and cognitive processing constraints together crystallize in syntax and morphology, Givón's prescient work is at the root of the current flowering of usage-based language research. Functionalists of this bent are accustomed to explanations referring to such external sources, so moving to an external explanation such as the distribution of input structures does not require a great leap in intellectual practice.

A third reason that usage-based models grew naturally in the soil of WCCF is methodological. In creating the study of linguistic typology, Greenberg introduced the practice of quantitatively examining the properties of unrelated languages (Greenberg, 1960), in order to identify categorical and gradient relationships between languages. This line of inquiry led not only to well-known insights about universals of form and function, but, through the concept of markedness, it led also to insights about the role of frequency in shaping the course of morphological change (Greenberg, 1974a; Greenberg, 1974b; Greenberg & O'Sullivan, 1974). Not coincidentally, researchers with typological interests developed the habit of using data from real communicative contexts, because it is only in context that it is possible to discover and understand many typological regularities. Finally, the West Coast functionalists’ training in typology caused them to take seriously the fact that languages exist dynamically in time (Croft, Denning, & Kemmer, 1990; Greenberg, 1995). This draws attention to the search for ways to integrate synchrony and diachrony, and one of the most important of these is usage.


Despite the great debt that usage-based models owe WCCF, it would be misleading to equate the usage-based movement with WCCF. Many have made this understandable move. For example, Tomasello (2003b) construes usage-based linguistics as a synonym for cognitive-functional linguistics, emphasizing that structure emerges from use. True, a full account of usage must include both the communicative function to which a structure is put and its distributional characteristics. However, the models that are known as usage-based are typically constructed more to recognize the influence of statistical distribution than to recognize the influence of communicative functional pressures per se. Communicative pressures are seen as relevant to the extent that they affect the statistical properties of the speech that is actually produced, and which eventually forms the input to others’ acquisition, and to the extent that these pressures constrain the hearer's interpretation of the speech.

2.2 Language acquisition

The study of language acquisition has also had a formative influence on usage-based models, as it highlights on the one hand the sources of speakers' knowledge of language, and on the other hand, the endpoint of speakers' process of language acquisition. If prevailing assumptions about the input to acquisition and about the output of acquisition are faulty, then conclusions about internal representation must be revisited.

First, thinking about language acquisition in connection with usage raises the empirical question of where the mental representation of linguistic structure comes from in the first place, that is, the source of a child's linguistic knowledge. Although the poverty of the stimulus argument may offer powerful reasons to believe that knowledge of grammar cannot be learned and is thus innate at the core, such arguments are based on highly questionable assumptions. Johnson (2004) points out that Gold's Theorem (Gold, 1967), the cornerstone of many arguments for innateness based on the logical problem of language acquisition, has been widely misinterpreted; in fact such arguments can be refuted by pointing out a few facts. Gold's Theorem assumes that the endpoint of acquisition is the identification of a complete grammar that replicates exactly the ideal grammar that is assumed to have generated the set of sentences heard by the learner. However, identifiability is a technically defined property involving infinite sets that does not apply to the conditions of normal human language acquisition. Therefore, the logical problem of language acquisition is not complicated by Gold's Theorem, and can simply be solved by referring to indirect negative evidence. New questions now come to the fore: Is this kind of indirect information from input available to and usable by the language learner? What kinds of mental structures are built from that information? Is the linguistic output resulting from those mental structures so tightly context-bound that it does not robustly generalize, or is the knowledge generative? Research on bootstrapping (Weissenborn & Höhle, 2001a; Weissenborn & Höhle, 2001b) and on statistical learning (Saffran, Aslin, & Newport, 1996) offers proposals as to specifically what information is available to the learner. With this research in hand, there is an empirical basis for claiming that input significantly shapes the form of a learner's language.
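The statistical learning result cited here (Saffran, Aslin, & Newport, 1996) rests on transitional probabilities between adjacent syllables. The sketch below illustrates the computation on an artificial stream; the made-up "words" and the printed comparison are assumptions for illustration, not the original stimuli.

```python
import random
from collections import Counter

# Build a continuous syllable stream from three invented trisyllabic "words".
random.seed(0)
words = ["bidaku", "padoti", "golabu"]
stream = []
for _ in range(300):
    word = random.choice(words)
    stream.extend(word[i:i + 2] for i in range(0, len(word), 2))

# Transitional probability P(next syllable | current syllable).
pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])
tp = {pair: count / first_counts[pair[0]] for pair, count in pair_counts.items()}

# Word-internal transitions (bi->da, da->ku, ...) come out near 1.0, while
# transitions spanning a word boundary (ku->pa, ti->go, ...) hover near 1/3,
# giving the learner a purely distributional cue to word boundaries.
for (a, b), p in sorted(tp.items(), key=lambda item: -item[1]):
    print(f"{a}->{b}: {p:.2f}")
```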

Second, language acquisition research has raised the question of what is acquired. It is not at all clear that language is most perspicuously conceptualized as an algebraic object, as many theories implicitly assume. One particular way this concern for the nature of the target has manifested itself is in research on how linguistic units of different lengths and levels are acquired, represented, elaborated, and generated. Examining these data with an eye to the relationship between units of different sizes, Peters and others (Peters, 1983; Tomasello, 2003a) have found that there is surprising structural similarity between the representation of short strings and longer ones. Although complex units can certainly be hierarchically generated from scratch, many apparently complex units are often represented as simple pre-fabricated strings, and others are to varying extents hybrid creations. In one sense, this is just a minor diversion – memorized sequences are hardly the meat of linguistics – but in another sense, the existence of these data gives rise to curiosity about the very large continuum that lies between the fully generated from scratch and the memorized holophrase. To put it more starkly, "if there is no clean break between the more rule-based and the more idiosyncratic items and structures of a language, then all constructions may be acquired with the same basic set of acquisitional processes" (Tomasello, 2003a, p. 6). The adult end-point is reconceptualized as a set of constructions, not just as a grammar of rules (even though, to be sure, speakers do make rule-like generalizations). Since usage-based mechanisms are the primary route for learning sets of constructions, the realization that constructions can be the framework for all grammatical knowledge is a substantial step forward for usage-based models.

2.3 Psycholinguistics

The field of psycholinguistics has also been foundational in the development of usage-based models, although more implicitly than explicitly. The primary task of cognitive psychology research in general is to show how some subset of reality outside the mind creates or leads the mind to create a corresponding structure inside the mind. Psycholinguistics follows this rubric in its search for the processes by which some subset of a speaker's linguistic milieu creates corresponding mental representations inside the speaker (Gernsbacher, 1990). For example, studies of on-line processing in language comprehension have shown that the mental representations people use when interpreting input speech are highly sensitive to word frequency in the input (Kim, Srinivas, & Trueswell, 2002). In language production, studies of structural priming (Bock, 1986; Boyland & Anderson, 1998; Branigan, Pickering, Stewart, & McLean, 2000) have shown that speakers' choice of grammatical forms is strongly influenced by the statistics of recent input. These results show that information about usage is richly represented in speakers' minds, and is drawn upon extensively. Other psycholinguistic and cognitive psychological research has, independently, discovered mechanisms for implicit learning and skill acquisition that neatly subserve usage-based models, as they transform concrete instances of input into generalizable abstractions in a way that is sensitive to frequency and distribution. With knowledge such as this, usage-based models are well-grounded in cognitive plausibility. Where psycholinguistics has not often gone is the leap from immediate short-term effects to the longer-term phenomena that usage-based models are interested in.

2.4 Computational linguistics

Without the injection of the technical infrastructure of computational linguistics, the usage-based approach could never have developed the strength it now has. Usage-based models depend greatly on the ability to generate output structures based on the statistics of the input. Computational linguists know how to construct algorithms that map distributions of structures in the input onto structures in output, sometimes mapping also onto intermediate internal states. Computational work on parsing, speech recognition, and production parallels the work of experimental psycholinguists, but it formalizes the models by using explicit algorithms and tests specific quantitative hypotheses (Christiansen & Chater, 2001a). The fruit of this formalization is greater explanatory power, which cannot be mistaken for handwaving (Sampson, 2001). Historically, the brute-force methods of weak AI have been taken to be a poor model of human functioning, but as knowledge of the powers of the brain and natural language engineering's successes have grown, respect for data-intensive rather than rule-based processes has grown with them (Budiansky, 1998).

2.5 Optimality Theory

Most recently, a few generative linguists, particularly Optimality Theorists, have joined the discussion (Albright & Hayes, 2003; Boersma, 1998; Fanselow, Fery, Schlesewsky, & Vogel, 2006). Optimality Theory (OT) is a formal generative theory in which multiple potential forms are automatically generated, and the actual preferred form is determined quantitatively by a hierarchy of constraints whose ranks are derived at least partly from properties of the input (McCarthy, 2002; Bresnan & Aissen, 2002). Whether OT work falls under the rubric of usage-based models is a matter for discussion. Although OT shares the concern of usage-based linguists for marginal, graded phenomena and for sensitivity to input, and although the ranking of constraints is based on experience, mainstream OT still posits innately-given structures that have no basis in experience (but cf. Hayes, 1999).

3 General characteristics and phenomena of interest

Usage-based models (UBMs) have come to share a number of general characteristics, as Kemmer and Barlow (1999) comprehensively delineate. Expanding on the characteristics discussed above – the focus on accounting for the full range of production data, and the desire for parsimony in explanation – Kemmer and Barlow highlight a number of additional points. First, instances of use and our mental representations of them are not seen just as products of grammars, but, in aggregate, actually constitute our linguistic competence. The fact that speakers actually do plan and utter marginally correct sentences, even if they do not have clear intuitions about them, tells us something about the linguistic structures available to speakers. Second, nearly half a century of research in artificial intelligence has shown that humans, much more so than computers, have extraordinary abilities to discern patterns in input data. Typically, usage-based models find that type and token frequencies, as well as a rich representation of the linguistic and non-linguistic context, are aspects of the input that figure prominently in extracting patterns and creating representations of them (see the sketch below). Third, usage-based models allow speakers to be sensitive not only to the richness of the input but also to its temporal unfolding; adult speakers continue to receive input, which may fluctuate in its distributional properties. Usage-based models provide the conceptual infrastructure to accommodate dynamic mental representations that reflect the continued influx of input. As input accumulates beyond puberty, the overall grammar becomes more and more stable, but there is no point at which an adult's grammar is assumed to be crystallized (MacDonald, 1999).
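The distinction between type and token frequency mentioned above can be made concrete with a small count. In the sketch below all counts are invented for illustration; the point is only that a pattern can have many types each with modest token counts (the regular -ed past), or few types carried by very frequent tokens (a small irregular class), and that usage-based work keeps track of both quantities.

```python
from collections import Counter

# Invented token counts for a handful of past-tense forms.
tokens = Counter({
    "walked": 5, "played": 4, "jumped": 3, "hugged": 2, "laughed": 2,
    "said": 30, "went": 25, "sang": 3,
})

patterns = {
    "-ed suffix": {"walked", "played", "jumped", "hugged", "laughed"},
    "irregular":  {"said", "went", "sang"},
}

for name, members in patterns.items():
    type_frequency = len(members)                      # distinct verbs
    token_frequency = sum(tokens[w] for w in members)  # total occurrences
    print(f"{name:10s} types={type_frequency:2d} tokens={token_frequency:3d}")
```

In usage-based work, high type frequency is typically linked to a pattern's productivity, and high token frequency to the entrenchment of individual forms.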

Usage-based models are committed to both mental representations and mental processes that can capitalize on the richness of language use to do linguistic work. They use input information available to adults and children to explain both core phenomena (e.g., subject-verb agreement in English) and phenomena that might otherwise be written off as noise. Usage-based models are particularly interested in phenomena in which the statistical structure of the input predicts interesting departures from a speaker's, or from a language community's, presumed status quo. On the other hand, unexceptional speech constitutes the indispensable background that is explained using the same representations and processes as exceptions, variation, and language change.

Unlike much of the rest of linguistics, which offers explanation for phenomena only within a single level of analysis (such as syntax, or morphology, or at most the interface between two such levels), usage-based models' scope is wide; they are meant to generate explanations for phenomena ranging from the phonological to the pragmatic. How this can be done without obscuring all interesting detail is part of the work of usage-based linguists. Of course, representations and processes work together, but for expositional purposes, it is convenient to categorize some common phenomena into those explained primarily by facts about representation, versus those explained primarily by facts about processing. On the representational end, usage-based models seek to explain gradations in grammatical category membership, local regularities in apparent exceptions to rules, and the dual nature, hierarchical yet string-like, of utterances. On the processing end, they seek to explain conflicts between live data and judgment data, diachronic patterns that appear as grammaticalization, and psycholinguistic data about order of acquisition and on-line speech processing. Taking each of these in turn, let us survey some examples of the range of phenomena whose explanations usage-based models can unify.

3.1 On representations: When categories fail

Linguists find it convenient, productive, and intellectually satisfying to capture the richness of language structure in brief but powerful strokes. This is normally done by distilling out an elegant set of rules that act upon insightfully well-defined categories, while labeling any remaining noise as exceptions. There is something thrilling about watching a whole phalanx of data fall to a good analysis. Unfortunately, though this scheme may appear to work in isolated cases, it runs into fundamental problems in the general case. The categories that rules act on, and the categories that rules define, all have feet of clay; they do not support the weight of linguistic description. Likewise, the rules that generate grammatical strings also turn out to operate in a more limited fashion than they at first appear to (Joseph, 1997a). Must every string then be treated in its own individual terms? Must elegant and parsimonious explanations of general patterns be given up in favor of long arbitrary lists? The news is not all bad. However disappointing it might be to face the limits to full generality, usage-based models see hope in the patterns that are present in the midst of the messy margins of language, and capture those patterns in alternative representational schemes that do the same work as categories and rules, but which also handle the more slippery world of actual speech. The remainder of this section lists some of the ways that categories and rules break down, and then points toward more robust alternative approaches to representation.

3.1.1 Gradient category membership

A major source of the difficulty in explaining, and even in describing, the full range of linguistic output is that the categories we use are far from clearly defined. Phonemic contrasts, for instance, are foundational to phonological theory. Phonemes are supposed to be atomic, and syntactic function is not supposed to depend on the phonetic detail of subphonemic gradients. But Bybee (2001) and Berkenfield (2001) show that sub-phonemic variation, such as vowel duration and formant frequencies in the pronunciation of a word, carries semantic and syntactic load. In this case, the foundational structuralist categories fail to represent what needs to be represented linguistically (Bybee & Hopper, 2001).

Gradience also exists in syntactic category membership, eliciting similar puzzles. A celebrated early demonstration is Ross (1973), who points out that tests of whether a given string constitutes an NP or not are often in disagreement. For example, X can keep tabs on Y, but can tabs be kept on Y by X? With regard to verbs, sentences such as those below have been called ungrammatical, due to violations of verb argument structure:

(1) The boy frightens sincerity (Fromkin & Rodman, 1998, pp. 184-185, as quoted in Sampson, 2001, p. 139)
(2) The book dispersed (Chomsky, 1965, as quoted in Sampson, 2001).

But are these positively ungrammatical, or are they merely semantic oddities that are hard to think of contexts for? A book-arts guild and a performance artists’ guild might collaborate to create a living human artist’s book for the occasion of a gallery opening, and disperse immediately thereafter. In fact, verb subcategorization may never be complete; Pat sneezed the foam off the cappuccino (Goldberg, 1999, p. 199) is an example that points to infinite families of verbs that raise the question of whether it might be impossible to set down a complete subcategorization scheme.

Gradience also exists in judgments of grammaticality. Often these are due to competing input leading to systematic uncertainty in the individual minds of members of a speech community (e.g., What would've you done? (Boyland, 1996); hone in on (Zwicky, 2005); between you and I (Boyland, 2001)). Although speakers often try to avoid using expressions that they are uncertain about (Zwicky, 2005), they may not always do so, and the resulting utterances do not clearly fit into the category of grammatical or ungrammatical. Objections to the idea that concepts could in any way have unclear boundaries are manifold and strongly worded. A standard approach is to argue that gradient categories could not possibly exist for logical reasons. Fodor and Lepore (1996, p. 259) consider and reject the claim that "there is no fact of the matter about whether wall-to-wall carpets are furniture." They argue, "This sort of consequence is arguably not tolerable. Excluded middle is a law of logic, and laws of logic are necessarily true." Usage-based linguists consider defining a problem out of existence to be unsatisfactory science. Given the pervasiveness of gradience, the task, instead, is to formulate the problem to be solved such that it acknowledges the data.

3.1.2 Gradient regularity of rules

Arguably an even deeper problem than the inability to classify an isolated utterance into categories is gradience in the generality of rules. A model of language that depends on a system of rules acting on categories is vulnerable to weaknesses not only in the categories it establishes, but also in the rules postulated. Gross (1979, p. 860, as quoted in Joseph, 1997a) remarks that, in an attempt to write a comprehensive grammar of French, "if we compare the syntactic properties of any two lexical items – it is observed that no two lexical items have identical syntactic properties. If we compare ... the domains of the rules, the result is the same." A term that fairly describes the state of affairs found when there is gradience in the generality of rules is quasi-regularity. Quasi-regularity is demonstrable both in so-called regular paradigms and in so-called exceptions. Without discounting the extensive data that are usually used to support the existence of rules, this section examines some evidence that both categories and rules demonstrate gradient applicability.

The standard conception of rules assumes that regular forms are normally represented as sets of items to which a set of "competence"-based rules are uniformly applied except for "performance"-based variability. Irregular forms, on the other hand, are seen as having an entirely different representation, as arbitrary sound-meaning pairings that are learned through an associative process not involving rules. There is no such thing, in this view, as a form that is intermediate between a regular and an irregular, because regulars and irregulars are fundamentally different from one another in their representation and in the processes by which they are derived. However, the linguistic data suggest that rules are not uniformly applied, and that irregular forms (exceptions to rules) display patterning very much like regular forms. The question is how to understand rules if they are not generally applicable, and how to capture the relationship between the more regular and the less regular cases. That is, should there be two systems (rules and exceptions) or one (a gradient between more and less regular)? Usage-based models generally hold that the lack of a clear line between the regulars and irregulars means that there should be a single system, not two. Debate has raged between proponents of dual-system vs. single-system representation.

The essential issue here is that, if a line is to be drawn between the regulars and the irregulars, the pervasiveness of quasi-regularity means that the line would have to be placed arbitrarily. For example, a classic locus of quasi-regularity is the English past tense (McClelland & Patterson, 2002; Rumelhart & McClelland, 1986). By default, the past tense is marked by the suffix -ed. This is not a completely general rule, however, since there are exceptions governed by competing paradigms such as sing/sang, lead/led, or meet/met. At the same time, there is a great deal of bleed-through between the various patterns. We find, for instance, that irregulars are subject to influence from multiple sources, including the more general rule. As Seidenberg explains (Liberman, 2004b), segregating the irregulars from the regulars gives no ongoing basis for the irregular past tenses to continue favoring forms that end in [d] and [t], as the regulars do. In Seidenberg's words, "Pinker has to treat the overlap between rule governed forms and exceptions as coincidental at best, or the detritus of historical events like diachronic change." Furthermore, not only do regular forms influence irregulars as just described, but irregulars also influence regulars. If regular forms were perfectly governed by rules except in the case of performance errors, a regular past tense like forgoed (Pinker, 1999) or a regular plural like dwarfs1 (Tolkien, Carpenter, & Tolkien, 1981) would raise no eyebrows – but they do, because the regular forms are represented in a way that is not in fact entirely segregated from the irregular forms.

1 J. R. R. Tolkien agonized with his editor over what to call the plural of dwarf; as a compromise, he decided to forgo both the etymologically supported dwarrow and the regular and more standard dwarfs (Tolkien, Carpenter, & Tolkien, 1981), in favor of dwarves, thus illustrating the productivity of a quasi-regular paradigm and its viability over against a representational system that forces a choice between a rule and a list of known exceptions. (See also Liberman, 2004a.)

Janda and Joseph (Joseph & Janda, 1988; Joseph, 2002; Joseph, 1992) have wrestled with the limited generality of rules, using their construct of "constellations," which they define as localized generalizations. Joseph (1997a), for instance, discusses the American English example of in school, in college, in court, in jail, but in the hospital, as well as many examples from other languages. Janda, Joseph, and Jacobs (1994) put forward the phenomenon of hyper-foreignization, in which words like lingerie lose the original [ɛ] for a hyperforeignized [an], and Beijing gets a hyperforeignized [ʒ] where English [dʒ] would be closer. These substitutions are quite systematic, automatic, and generative for most Americans, thereby constituting, in some sense, a "rule." However, as all of these cases and many others demonstrate, there is a limit to the generality of the rules. In the end, Joseph (1997a, p. 158) concludes that

It should be clear now where I stand on the question of how general our generalizations are: they are as general as speakers allow them to be, and that can be very ungeneral or quite broadly general. There are rules, and there are regularities in language, but when one examines where these rules come from, it is often from the cumulative effect of particularized extensions from one lexical item to another. Since this rule-formation process is an on-going one, a synchronic glimpse of a language is always going to capture the language with at least some incomplete generalizations. Thus if we as linguists are attempting to mirror speakers’ knowledge of their language through our grammars, we should be prepared to have less-than-fully-general generalizations, and also subregularities that are defined on a very localized basis; in short, we should expect to find, and thus to have in our grammars, both local generalizations and constellations – if speakers are able to stand having them, then so should we as linguists!

For theories of grammar that rely on an idealized language that clearly demarcates which strings are members of the set of grammatical sentences and which are not, violations of the ideal are merely noise that must be dealt with on a non-grammatical basis. As documented in the skirmishes between Pinker (Pinker & Ullman, 2002) and Seidenberg and McClelland (McClelland & Patterson, 2002; Plaut, McClelland, & Seidenberg, 1995), the rules-and-exceptions framework is a straightforward attempt to handle the incompleteness of rules by offering a distinct non-linguistic representation for the exceptions. But because the assumption is that to be linguistic implies being regular, even when there is regularity in the "detritus" that does not fit neatly into the general rules (Liberman, 2004), such regularity is treated as non-linguistic: as historical accidents, as cognitive confusions, or as simple associations incapable of generating complex structure. In any case, non-ideal cases are segregated from the ideal ones, and labeled idioms or lexical exceptions (Pinker, 1999; Fodor & Lepore, 1996). No link is made between the regular and the irregular cases; they are seen as due to entirely distinct mechanisms.

Usage-based research, however, reminds us that it may not be possible to cleanly segregate the ideally regular patterns from the partially regular and completely irregular. In morphology and syntax, some exceptions are wildly irregular, while others are only slightly irregular and form substantially regular local patterns. In between, there are different degrees of regularity both in morphological paradigms and in syntactic constructions. For example, in verbal morphology there are suppletive irregulars such as is/was. There are also irregulars such as keep/kept and sleep/slept that show stem deformation but retain the alveolar past tense suffix. Other verbs have alternate forms, such as lighted or lit (Scarry, 1966). Some verbs are slightly irregular only in very specific contexts, such as the past participle blessèd. In a rules-and-exceptions framework where exceptions are listed as lexical entries, it is unclear how to represent the degree of regularity that an exception participates in (Seidenberg, in Liberman, 2004).

Why should gradience matter? First, it matters because it is pervasive throughout language, which means that a few theoretical patches will not solve the problem. Expressions will always be conventionalized to different degrees and thus will display varying levels of submission to rules. In a system where the utterances that do not fit the rules must be segregated and treated as part of a distinct module of lexical entries, the representational scheme must include some kind of explicit tag to indicate the extent to which the utterance fits the rule. Usage-based models assume a different type of representation, one that carries that kind of information integrally, and makes it retrievable if necessary, but does not need explicit tags to keep track of the information. Secondly, it is psychologically implausible and theoretically unsatisfying to attribute all gradient phenomena to performance factors such as memory limitations. As Hayes (2000, p. 99) points out,

Patterns of gradient well-formedness often seem to be driven by the very same principles that govern absolute well-formedness . . . whatever 'performance' mechanisms we adopted would look startlingly like the grammatical mechanisms that account for non-gradient judgments.

Instead, knowing that pronunciations will always vary gradiently by context (and thus provide clues to syntactic content), and that it is impossible to fully specify verb subcategorization, suggests that these facts are due to something inherent in the human cognitive capacity for language. Finally, building gradience into basic linguistic representations would have serious implications for linguistic theory (Gahl & Garnsey, 2006). The next section describes some observations that lead toward an approach that takes the full range of data into account.

3.1.3 Multiple scaled representations as start of a usage-based solution

The problem of quasi-regularity requires a response more nuanced than either relegating the problematic structures to a separate non-linguistic system, or conversely giving up altogether on generalizations. An emphasis on gradience has historically been seen as flagrant disregard for the structure that the human mind is capable of generating (Pinker & Prince, 1988). The usual argument has been that this apparent failure to recognize generativity is tantamount to trying to explain language in terms of associations of the sort that Chomsky demolished in his critique of Skinner (Chomsky, 1959). In fact, though, respect for gradience can be construed as a heightened respect for generativity, in its original sense of systematic creativity, because it is a way of seeing complex living structure where others have only seen dead associations.

If there is one observation that grounds the center of usage-based models of language, it would be the observation that, as language users repeatedly hear and produce language, they create new representations of recurring substrings, as unanalyzed units, irrespective of how regular or compositional they might be (Tomasello, 2003a) or how cleanly they form a constituent grouping. These substrings come to reside in language users’ minds as units of different lengths and different levels of granularity. It is then these substrings that form the fundamental building blocks in a usage-based representation. The substrings selected by the speaker may or may not be decomposable by the speaker or by the listener. The substrings may also be assembled compositionally to form longer utterances. Linguists may use any analytic scheme they find useful, but the insight of usage-based models is that these substrings extracted by the language learner are basic units of linguistic representation.

For instance, a sentence I don’t know about that can be analyzed into its usual smallest lexical and morphological constituents, but the dunno (or even the i-unno, complete with its phonetic detail) exists also as a formula that becomes a building block for the larger structure (Scheibman, 2000). Extending in the opposite direction, the sentence as a whole also exists as a formula that has potential for further building (e.g., A new stadium, I don’t know about that). These collections of constructions (Kay, 1997; see below) offer an analysis of an utterance that allows it and its parts to be seen both as being composed of parts and as being an unanalyzed surface string.
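The claim that recurring substrings are stored as units, regardless of constituency, can be illustrated with a very small corpus calculation. In the sketch below the utterances, the n-gram sizes, and the frequency threshold are all assumptions chosen for illustration; nothing here is meant as a model of the actual extraction process in speakers.

```python
from collections import Counter

# Toy utterances; recurring multi-word strings become candidate stored chunks,
# whether or not they form a clean syntactic constituent.
utterances = [
    "i don't know about that", "i don't know", "i don't think so",
    "you know what i mean", "i don't know what to say", "you know",
]

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

chunk_counts = Counter()
for utterance in utterances:
    tokens = utterance.split()
    for n in (2, 3):
        chunk_counts.update(ngrams(tokens, n))

# Strings recurring at least three times are treated as stored chunks here.
for chunk, count in chunk_counts.most_common():
    if count >= 3:
        print(" ".join(chunk), count)
```

On a usage-based view, a frequent string such as i don't know can then coexist with its parts, exactly as described above.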

There are some data supporting a segregated system, rather than the integrated usage-based approach put forward here. For instance, Marslen-Wilson and Tyler (1998) found that briefly presenting readers with derived regular verb forms increased the speed with which readers could identify the related verb stem, whereas the same did not hold true for irregular verbs. Pinker (1999) takes this difference in behavior between regular and irregular verbs as support for a segregated system. Regular verbs can be morphologically derived by a default rule, which is quick and easy to apply, while irregular verbs must be searched for in the lexicon, which is slow. But the apparent distinction disappears when the task is oral rather than written; then there is no advantage for the regular verbs over the irregulars. Pinker (1999, p. 134) states that "no one knows why" this is the case. But from a usage-based perspective, this is as it should be. Human language representations are evolved for speech, not reading, and the equal performance of the irregulars and the regulars in speech suggests that they share a representational system. With a unified system, in which even derived forms of regular verbs have a listing as a surface string, there is no need for a representational distinction to be drawn. At the same time, every verb participates in some kind of patterning, which can be recovered upon analysis, and which can be the basis for the patterning selected for newly learned words in the future. In this way, there is no abandonment of generalization.
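One way to picture a single system in which the patterning selected for newly learned words emerges from stored forms is a simple analogical sketch like the one below. The toy lexicon, the similarity measure, and the fallback are all assumptions for illustration; the point is only that regular and irregular outputs can fall out of one mechanism.

```python
# Toy single-system analogical sketch: a novel verb follows whichever stored
# verb it most resembles, so regular and irregular patterns come from the
# same mechanism. Lexicon and similarity measure are invented for illustration.
lexicon = {
    "walk": "walked", "play": "played", "jump": "jumped",
    "sing": "sang", "ring": "rang", "keep": "kept", "sleep": "slept",
}

def shared_ending(a, b):
    """Length of the shared final substring (a crude stand-in for rime similarity)."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def past(novel):
    best = max(lexicon, key=lambda verb: shared_ending(novel, verb))
    overlap = shared_ending(novel, best)
    if overlap == 0:
        return novel + "ed"  # no analog at all: fall back to the default
    # transfer the analog's pattern onto the novel stem
    return novel[:len(novel) - overlap] + lexicon[best][len(best) - overlap:]

print(past("spling"))  # analogizes to sing/ring -> "splang"
print(past("blick"))   # nearest analog is a regular verb -> "blicked"
```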

Why is an integrated system desirable, where all expressions exist simultaneously both as unanalyzed wholes and as components and composites? The simple answer is elegance. Any human language uses units both small and large, both regular and irregular. Given this wide range, it is much simpler for the underlying system to work on the same principles for all units at every level, rather than having two separate systems, one a rule set and the other an appended list of arbitrary exceptions. If only there were a circumscribed set of marginal utterances that could be given special status, a rule-plus-list system would be an easy solution. But given the pervasiveness of gradient irregularity, a system that tries to accommodate exceptions by relegating them to a list of arbitrary idioms would find the list to be unbounded in length. Under these circumstances, a rule-plus-list system is just as unparsimonious as a system that consists entirely of lists. Besides, usage-based systems are much more than arbitrary lists of exceptions; without having to make a clear-cut distinction between regular and irregular, between grammatical and ungrammatical, they can still capture the regularities that are present in language. Thus the usage-based representational system is at an advantage over rule-plus-list systems, both in terms of parsimony and in terms of descriptive adequacy.

3.1.4 Interim discussion: the highly rule-governed as a special case

Explaining both the more rule-governed and the less rule-governed in an integrated framework presents the challenge of how to position the more regular and the less regular in relation to each other. In a segregated framework, a standard response (Chomsky, 1965) to this challenge is to invoke the competence/performance distinction by referring to physics: like the laws of motion that are described by Newtonian mechanics, the suggestion is that we can make true and useful headway by idealizing away factors like friction and random noise, and bringing in a separate equation for friction and noise when the circumstances dictate. This is a reasonable response. However, we must consider the boundary conditions of its usefulness.

One reason to avoid such a division into competence vs. performance, rules vs. lexically-listed exceptions, Platonic ideal vs. messy real-life, is that the messiness is everywhere and is in fact what we should be studying. The meteorology of an idealized Earth without friction would be uninteresting and practically useless. Language is not like

idealized Newtonian physics, because the non-ideal situation is the norm in the realm of language. Deviations from regularity are extensive enough and complex enough that it would be misleading to consider them merely noise in the system. Indeed, considering the larger framework of usage-based models, like the larger framework of non-Newtonian physics, allows one to see the idealized case not as the norm, but as a special case within the larger framework.2

In the end, current applications of the physics metaphor in linguistics give too little generative power to language users. There needs to be a theory of what happens under apparently non-ideal conditions, and if that theory can also generate the “core” phenomena, so much the better. The inseparability of linear and hierarchical representations depends to a large extent on the abductive nature of language comprehension and language acquisition. The listener must reconstruct the structure of the incoming message, and there are multiple analyses possible, constrained of course by the input itself, but also by the computational properties of the listener. The next section offers a usage-based perspective focused on process rather than representation.

2 It may very well turn out that the metaphor actually works in reverse. In the mechanics-plus-noise framework, which I am arguing is inappropriate, the assumption is that the actual motion of an object should be seen as a special case of the motion that would occur under ideal conditions. In this sense, the idealized situations would be more true than the actual situations. However, a completely different metaphor results from placing Newtonian mechanics in the larger context of relativity. There is no attempt here to misuse Einstein to justify claims like “science has now proven that anything goes.” Rather, the point is just that, if there is a theory in which whatever is neatly governed by known rules falls out naturally as a special case of the larger picture, then there is reason to consider the merits of the larger theory. The non-Newtonian phenomena are not mere randomness; the more Newtonian phenomena simply constitute a special case within the theory. And, in contrast to physics, the neat special case in linguistics does not actually cover most of our everyday experience. Yes, both metaphors address both the straightforward and the less straightforward, but they come to opposite conclusions about which analyses are more applicable to the problem at hand.

The position of usage-based linguistics is similar to that of cognitive behavioral economics (Wanner, as cited in Lambert, 2006), which garnered a Nobel Prize for challenging classical economics’ strict disregard for the idiosyncrasies of human mental processing. Comparing and contrasting physics and cognitive economics, David Laibson (in Lambert, 2006) remarks, “If I want to build a bridge, pass a car, or hit a baseball, Newtonian physics will suffice. But the psychologists [doing cognitive economics] said, ‘No, it’s not sufficient, we’re not just playing around at the margins, making small change.’” Again, the point is not that rules are never useful, but that for human-generated systems like economic markets and like language, discounting the data that aren’t neat will no longer do, because there are too many of them. In our world, acknowledging the primacy of usage buys us more than small change.

3.2 Processes: Can known processes supplant innate ideas? Another class of usage-based models aims to account for the facts of acquisition and on-line processing, and of variation and change, by showing how speaker/hearers’ general cognitive processes derive the knowledge they need from the language to which they are exposed. These models suggest that unlearned innate constraints need not be specific to language; instead, the general properties of human cognition sufficiently constrain the structure and the interpretation of the input to allow humans to learn and process language.

Models that focus on cognitive processes tend to share a particular rhetorical strategy. First, they demonstrate that a particular set of cognitive processes would be logically sufficient to accomplish a given set of linguistic functions; often the demonstration consists of a computer simulation. They then show that these processes are actually available to human speakers, and that these processes are in fact responsible for the performance of this set of linguistic functions in human speakers. They conclude that the linguistic work that needs to be done is explained by the processes in question, without a need for heavy innate constraints specifying the shape of linguistic representations. In the end, the claim is that there is a uniquely human set of common cognitive processes that the speakers and listeners of a community are able to perform on the speech stream, rather than a uniquely human set of common representations. The subsections below discuss the processes responsible for sensitivity to discourse pressures, the processes involved in grammaticalization, the processes of on-line sentence processing, and the processes involved in acquiring distributional information from language input. Each of these domains offers evidence that known cognitive processes account for the creation of whatever mental structures are needed for language. What is innate is what we do with language, not the knowledge of language itself. 3.2.1 Data from natural discourse point to cognitive processes that invented sentences hide One important set of processes that speakers engage in is responding to discourse pressures during speech production and comprehension. An exclusive focus on the

possible output of a presumed language capacity diverts attention from the intricacies of existing and observable actual output of an actual language capacity. Usage-based language researchers often take note of the discrepancies between the conclusions that are drawn from linguists’ intuitions and phenomena that are observed in actual usage. Schegloff, Ochs, and Thompson (1996) explain how a variety of intellectual traditions have converged to create a nucleus of interest in discourse-and-grammar, in which “the study of grammar entails both taking actual discourse as one’s primary data, and explicitly relating the structure of grammar to the structure of discourse.”

Usage-based models seek to explain relationships between discourse and grammar that become salient when the data consist of language as it is used in context. For our purposes, adequate consideration of real usage leads us to ask questions about three types of phenomena. First, we notice that interactional goals are one constraint on synchronic grammatical analyses. For example, on the basis of data from conversations, Ono, Thompson, and Suzuki (2000) question whether –ga can properly be considered a subject marker in spoken Japanese, as it traditionally has been; in fact the data seem to suggest that it regularly fails to occur on subjects, except in certain limited types of contexts that are over-represented in invented sentences. Secondly, attention to discourse allows us to notice patterns in diachrony that would not have been possible otherwise. The classic example of this phenomenon is the development of negation in Romance: ne oenum > non > ne > ne pas/point/mie/... > ne ... pas > pas. Today’s ne V pas (V not a step) in French, for instance, developed from a set of expressions that were used to strengthen the pragmatic force of a negative particle, especially given the erosion of the phonological form (Jespersen, 1966). This development cannot be fully understood without recognition of discourse pressures in the creation of grammatical structure.

More fundamentally, the consideration of data from natural discourse prompts the question of what should “count” as grammar. As Pawley (1996) points out, if grammar covers everything that can be generated from the rules of the language, but (say) unidiomatic possible utterances like *plane-driver are considered ungrammatical, then grammar is in the impossible situation of needing to accommodate NPs like “colorless green ideas” while not allowing NPs like “ship driver.” Should grammars be the minimal set of rules that will yield the maximal number of grammatical sentences? To what extent should lexical restrictions be included in the rules of “grammar”? Should they distinguish normally acceptable utterances from ones that no speaker would ever actually say, even if memory limitations were not an issue? Hopper (1987) advocates an alternative view of grammar that denies an a priori representation of a set of rules that generates all and only the strings of the language, as if the set of strings of a language could be specified by rule. Instead, he propounds a notion of “emergent grammar,” in which speech is always assembled from fragments of prior speech. Setting aside momentarily the question of how fragments are to be assembled, in such a view the abstraction of grammar is an

epiphenomenon in the linguist’s mind, which is derivative of the set of concrete utterances that the linguist is analyzing.

More generally, the observation is that grammatical marking and structure, and even membership in lexical categories, are all determined by speech as it is actually produced in natural contexts. In contrast to most of the reductive physical sciences, the systems linguists must model are not systems existing purely as a specimen of a divinely created order (nor even of an evolved order); rather, humans, with all their generative powers, collaborate in the creation of language. On the basis of data revealing varying degrees of N-hood and V-hood, Hopper and Thompson (1984) make the point, for instance, that the categories of N and V emerge from discourse pressures. Although not all usage-based models argue that syntactic structure is directly caused by functional pressures, most do argue that one source of syntactic structure is the characteristic speech output of language users, whatever the functional pressures on them or whatever their creative powers might have been. Usage-based models show why linguistic data need to come from usage, not just from intuition, because these two discrepant sources of data often lead to conflicting understandings of the structure of language.

3.2.2 Diachronic phenomena, including grammaticalization Another place where cognitive processes come to the fore is in language variation and change. The presence of phenomena that do not submit neatly to synchronic analysis is commonly found in language change, and indeed, is sometimes a necessary precondition for certain kinds of change. As Croft (2000) points out, the dominant sociolinguistic and generative theories of language change are not fully satisfactory. Sociolinguistics does well at explaining adult-to-adult transmission as well as synchronic variation, but does not fully address innovation. Generative theories focus on innovation originating in child language acquisition, but leave no room for transmission among adults, nor do they have much to say about synchronic variation. A more complete and consistent theory would explain both innovation and transmission in a coherent way, preferably empirically-based. Usage-based models aim to fill this gap.

Much of the attention to diachrony in usage-based circles falls on a collection of diachronic phenomena called grammaticalization, sometimes also called grammaticization or grammatization. A key factor in grammaticalization is variation in the statistics of usage patterns. The development of negation in Romance outlined above is representative of grammaticalization. Grammaticalization theorists (Hopper & Traugott, 2003) have noticed that many diachronic developments trace out a cyclic pattern whereby lexical items in phrasal expressions, and sometimes the entire expressions themselves, become conventionalized, then take on increasing pragmatic

significance and appear in new contexts while losing semantic and phonetic distinctiveness. Eventually they often attach themselves in their reduced state as bound morphemes that serve a grammatical or morphological function. At the same time, new combinations of lexical items within constructions are constantly introduced into the language, a select few of which become conventionalized, thus starting the process anew. Any given grammaticalizing change might begin and end at any point along this path (Boyland, 1996).

Examples of grammaticalization are abundant and are attested across a wide range of languages (Bybee, Perkins, & Pagliuca, 1994; Sun, 1996; Heine & Reh, 1984). Dahl (2001) provides many examples of early grammaticalization, which occurs when optional expressions become obligatory because of what he aptly calls inflationary effects. For example, hěn in Mandarin is a frequently-used intensifier best translated as very, and it still carries that meaning in many contexts. In the context of a predicate adjective sentence, however, where a degree adverb is required before the adjective, it has come to function as a semi-obligatory place filler, used whenever the adjective needs no intensification – exactly the opposite of its original function. This is a grammaticization on the early end of the path, where the simple process of conventionalization changes the semantic and syntactic possibilities available to a construction: pragmatic inference grows, semantic substance is lost, syntactic function is added.

The later end of grammaticization is characterized by additional morphosyntactic and phonetic change. The familiar case of going to > gonna illustrates both (Hopper & Traugott, 2003, p. 69; Bybee, 2003). As gonna increased in frequency and conventionalization, it began to show increasing emancipation from the semantic and syntactic contexts in which it originated (Tabor, 1994), appearing first in the context of traveling, then of intending, then of future-marking. With conventionalization also comes predictability; so gonna also displays the phonetic and phonological changes that take place as a construction is grammaticized, namely loss of phonetic substance and increased bonding. Each of these developments is an illustration of linguistic phenomena explained by cognitive processes applied to language.

Diachronic patterns like those observed in grammaticalization depend on the availability of a palette of semi-acceptable structures from which speakers may draw (Zwicky, 2002), even if they do not do so under normal circumstances. These semi-acceptable structures in the background (or what some would call the “margins”), can be from any level of analysis, phonetic through pragmatic. Actuation of language change occurs when a critical mass of speakers shift structures from just outside their idiolect, to inside their idiolect, partially as a function of the speech occurring around them (Giles & Coupland, 1991; Milroy, 1992).

The study of grammaticalization, with its graded category membership, its dependence on discourse, and its attention to the statistics of input speech, has become a

favored topic among usage-based linguists. Although not all claims associated with grammaticalization theory are relevant to usage-based models, a few central claims of grammaticalization theory are shared by the usage-based enterprise. First, real usage data are important for understanding the patterns of change, the timing of change, and where the changes come from. Tendencies in the kind of forms that tend to emerge from particular sorts of sources can be explained by finding regularity in the contexts in which the forms are used. Second, there is strong agreement on the importance of the processes by which usage becomes entrenched; acknowledgment of these processes is essential for understanding how and when changes are actuated and propagated.

Grammaticalization is an area of linguistic inquiry where it is clear that usage predicts the direction of changes that actually occur. Furthermore, the mechanisms that are at work in grammaticalization cannot be barred from operating in other arenas, and so these same mechanisms take a leading role in explaining other language phenomena. Indeed, studies of grammaticalization can be seen as providing an explanation for why languages have grammar at all, and for much of the shape of grammar as it exists. In sum, grammaticalization is a phenomenon in which usage patterns predict the content and the timing of changes that eventually show up in grammar. Usage-based mechanisms support a satisfyingly unified explanation of these changes. 3.2.3 On-line and acquisition phenomena The language acquisition and language processing literatures bombard us with observations about preferences for or against different structures in acquisition, comprehension, and production. MacDonald (1999) presents a set of puzzles, from on-line speech comprehension, speech production, and language acquisition, and shows how considering the human power to process distributional information solves all of these kinds of problems.

The first observation is that in the comprehension of sentences like "John said that Bill left yesterday," there is a strong preference to interpret the modifier yesterday as being attached to the closest verb left rather than to the more distant verb said. In contrast, there is a much weaker preference for local attachment in the comprehension of sentences like "Cynthia saw the woman from the balcony." In this case, although the modifier from the balcony is nearer to the NP the woman, it preferentially attaches to the verb, which is more distant. MacDonald observes that there are similar differences in production, which are due to independent constraints such as working memory and ease of retrieval (Hawkins, 1994). The different biases showing up in comprehension are due to differences in the modifiers’ distribution in different types of sentences, and with respect to different types of verbs.

The question is how such detailed distributional information is acquired and maintained for use during adult speech comprehension. MacDonald’s usage-based answer is that the distributional information specific to any given language must be learned during child language acquisition. Together with the cognitive psychological findings showing sensitivity to input distributions in children and adults (e.g., Saffran, Aslin, & Newport, 1996), she concludes that this distributional information does not stop being tracked in adulthood, and in fact is at the center of our linguistic competence. This contrasts strongly with an approach that would solve these puzzles with innately specified principles specific to the parsing of sentences in which the verb is modified at the right margin of the sentence. Within a usage-based approach, explanations requiring the creation of an innate module must give way whenever an independently motivated explanation suffices.
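
To make the distributional idea concrete, here is a minimal sketch, not MacDonald’s actual model, of how an attachment bias could be read off co-occurrence counts; the counts, feature labels, and function names are invented for illustration only.

```python
# Toy sketch (not MacDonald's model): estimating an attachment bias for a
# modifier from hypothetical counts of attachments observed in a parsed corpus.

from collections import Counter

attachment_counts = {
    ("yesterday", "local_verb"): 180,        # e.g., "Bill left yesterday"
    ("yesterday", "distant_verb"): 20,       # high attachment to "said"
    ("from the balcony", "local_noun"): 40,
    ("from the balcony", "distant_verb"): 60,
}

def attachment_bias(modifier):
    """Relative preference for each attachment site, given the usage counts."""
    counts = Counter({site: n for (mod, site), n in attachment_counts.items()
                      if mod == modifier})
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}

print(attachment_bias("yesterday"))          # strong local-attachment bias
print(attachment_bias("from the balcony"))   # weaker bias, favoring the verb
```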

3.3 Unified explanation across levels is not vacuous The same set of proposed mechanisms and representations can explain, in a unified way, all of the phenomena listed above. In a usage-based account, the details of each instance of speech are what is essential, and generalizations are built up from, and rest on, the foundation of the details. For those accustomed to thinking of specific instances as reflexes of rules, reversing the direction may go against the grain. It may appear that basing a system on specific instances leads to a disregard for the patterning that can be observed in linguistic data, but in fact building generalizations upon a large number of specific instances highly constrains the analysis. In a usage-based approach, the representational system can handle both specifics and generalities; the processes available as part of human cognition ensure that all relevant information gets to where it needs to go. 4 Overview of methods Since accounting for a full range of data is a principal aim of usage-based models, it is no coincidence that a characteristic feature of the usage-based approach is the breadth of its empirical base. The phenomena that can be explained by usage-based models are themselves wide-ranging, across linguistic levels of description, from phonology to historical syntax. Also wide-ranging is the knowledge base used to build usage-based theory, crossing the disciplines of cognitive science. For both of these reasons, the methods used to gather data are particularly diverse. What the methods have in common

is a triple focus on the language user, the real linguistic input received, and the context in which the language user is comprehending and producing speech. 4.1 Corpus linguistics The rise of large electronic corpora was a key factor that enabled the development of usage-based models of language. With these corpora, researchers have a fair chance at approximating real usage data. These data are put to use in several ways. For instance, corpora can be used to test hypotheses about historical language change, for example, whether variants with the highest token frequency have most strongly resisted change (Bybee, Perkins, & Pagliuca, 1994). They can also be used to build models of acquisition. Part of MacDonald’s solution to the puzzles she posed came from a model of verb learning by Allen (1998), which acquired verb semantics and roles associated with argument structure on the basis of the child-directed speech found in a corpus. Another common use of corpora is to assess the actual semantics of a word or phrase, which often ends up being different from what one would expect by intuition. The corpus enables one to see the actual uses of a word in context, using concordancing to identify patterns of collocation. If the actual use of I dunno and I think is not to denote an act of cogitation, but to communicate a pragmatic stance on the claim being articulated (Bybee & Scheibman, 1999; Aijmer, 1994 as cited in Traugott, 1995), that is something worth knowing, especially if it helps account for the high rate of the first person singular as subject in conversation (Hopper & Traugott, 2003; Scheibman, 2002). 4.2 Grammar-building There are also usage-based linguists who build grammars. Grammars perform two functions. First, they make explicit the patterns and regularities that UBMs find in actual speech, so that the now explicit claims can be tested. Secondly, they render plausible alternative representational schemes. Like the computational level of Marr’s (1982) classic theory of vision, these representational schemes stop short of specifying a neural implementation, while still strongly suggesting a more concrete representation like Marr’s 2 1/2–D sketch; it is an abstract representation that suggests an embodied representation.

Grammar-writing in the usage-based world has similarly taken two main forms. First, there are the grammars of particular languages that are usage-based in the plainest sense of the word: they are based on corpus data, and make explicit the patterns and

regularities that are found in actual speech or writing. Examples of these are Sinclair (2005) and Biber, Johansson, Leech, Conrad, and Finegan (1999).

The second kind of grammar-building that has an important place in usage-based thinking involves the creation of representational schemes that better accommodate the shape of users' actual experience with language input, for example, by not prescribing a strict distinction between lexicon and syntax. An example of this is Fillmore and Kay’s Sign-Based Construction Grammar (Goldberg, 2006). Kay (1997, p. 123) describes it as follows:

Construction Grammar (CG) is a non-modular, generative, non-derivational, monostratal, unification-based grammatical approach, which aims at full coverage of the facts of any language under study without loss of linguistic generalizations, within and across languages.

That is, CG is designed to generate all the infinite possible structures of any given language, but does so without requiring multiple disjoint levels of representation. Complex sentences, while differing in detail from simple phrases, share the same representational format.

Some corpus linguists have begun to weave these strands together, such as Gries and Stefanowitsch in their work on “collostructions,” which incorporate into a single analytic category both a grammatical construction and its co-occurring lexemes (Stefanowitsch & Gries, 2003; Gries & Stefanowitsch, 2004). Their work draws heavily on both corpus research and grammatical theory. 4.3 Computational modeling Computational modeling as a method for usage-based models of language typically exists in one of the following forms. Statistically-informed algorithms are usually used for parsing and ambiguity resolution. Works discussing this kind of model include Charniak (1997), Bod, Hay, and Jannedy (2003), and Manning and Schütze (1999). Connectionist models of on-line processing or acquisition (Christiansen & Chater, 2001b; Hare, Ford, & Marslen-Wilson, 2001; McRae, Spivey-Knowlton, & Tanenhaus, 1998) aim to model psycholinguistic results using a neurally-inspired parallel distributed processing computational architecture. Sometimes the model is designed around a rule set (Smolensky, 2001; Hockenmaier & Steedman, 2002). More often the representation is not rule-based (Allen & Seidenberg, 1999; Plunkett & Juola, 1999). A third type of model often associated with connectionist modeling is a dynamical systems model

(Tabor, Juliano, & Tanenhaus, 1997; Port & Van Gelder, 1995). In such models, non-linear mathematical equations describe the forces acting on a system moment by moment; the non-linearity of the equations creates discontinuities in the eventual outcomes, even while the inputs to the system are graded. This feature is clearly a desirable one, given the gradedness of the linguistic input to listeners, coupled with the resolutely discrete nature of language in what some would say is its purest idealized form.
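
A minimal illustration of this last point, generic rather than drawn from any of the cited models: a one-dimensional nonlinear system settles into one of two attractors, so graded input yields an effectively discrete outcome. The gain parameter and input values below are arbitrary choices for the demonstration.

```python
# Minimal dynamical-systems illustration: x_{t+1} = tanh(gain * x_t + input).
# The state settles into one of two attractors, so continuously graded input
# produces a sharply discontinuous final outcome.

import math

def settle(graded_input, gain=3.0, steps=200):
    x = 0.0
    for _ in range(steps):
        x = math.tanh(gain * x + graded_input)
    return x

for inp in [-0.20, -0.05, -0.01, 0.01, 0.05, 0.20]:
    print(f"input={inp:+.2f} -> final state={settle(inp):+.3f}")
# Small negative inputs drive the system to the attractor near -1 and small
# positive inputs to the attractor near +1: a sharp boundary emerges even
# though the input itself varies continuously.
```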

Computational models of all these kinds provide several benefits. First, they force the formalization of hypotheses so that they can be tested. Second, they take in complex inputs and explain the outputs by identifying key factors discoverable in the input. Third, they expand our theorizing by generating novel predictions, which can point toward insights one would not have derived from intuition alone. 4.4 Experimentation The cognitive foundations of usage-based models have been discovered primarily through experimentation, both in the psycholinguistic tradition and in cognitive psychology more broadly. Experimentation is what allows cognitive psychologists to state confidently that repeated practice does lead to automatization, or that shared context does allow speakers to get by with less precise enunciation (Krauss & Weinheimer, 1964; Krauss & Weinheimer, 1966; Garrod & Doherty, 1994). Sometimes such claims are made without empirical support, but a well-planned series of experiments can provide the support. Experiments are also used to test linguistic theories, by showing that speakers either do or do not do what a grammatical theory predicts that they will in given circumstances. For example, Hawkins (1994) discovered that, regardless of the typological word order of a language, constituents tend to be placed in a sentence such that all the constituents immediately under a mother node can be identified as early as possible in processing, and that this tendency is correlated with the syntax of prepositions and postpositions and other types of phrases that create heavy constituents, that is, longer phrases with more phonetic substance. 5 Representative results, part I: Investigations of Representation In the following sections, we introduce two of the founding voices of usage-based linguistics, R. Langacker and J. Bybee, representing rather different traditions in linguistics. Langacker (1990), as part of his research program in Cognitive Grammar, was the first to use the term usage-based. Bybee’s model (1985; 2001), while in a sense looser than a grammar, makes more specific predictions about diachronic and typological

patterns, in addition to describing and explaining synchronic patterns within languages, and making predictions that have been upheld in experiments and in child language. 5.1 Langacker’s Cognitive Grammar Langacker created Cognitive Grammar (1987), a framework for representing hierarchical structure and making cognitively plausible generalizations about linguistic patterns. Langacker’s model (1990, chapter 4) is important in that it lays out the theoretical grounding for a usage-based conception of language (Mukherjee, 2004). Langacker begins with the psychological observation that perceptual and conceptual units are created when clumps of linguistic material occur together frequently and become entrenched (Saffran, Aslin, & Newport, 1996; Gomez, 2002; Goldstone, 1998). These units may be as small as phonetic material or as large as whole utterances. For example, -ing exists as a unit in the mind, since it is a sound sequence that occurs with some regularity. Likewise, How are you doing? also exists as a cognitive unit, since it is also a sequence that occurs with some regularity. Up to this point, there is nothing surprising; any linguistic theory recognizes units that occur at different levels of analysis. In the usage-based conception, though, there is a sense in which these two units are equal in status. Both -ing and How are you doing? are strings that exist by virtue of being instances of language that are encountered by and produced by speakers; they are conventional linguistic units that exist in comprehension as declarative chunks and in production as cognitive routines.

Clearly, however, not every utterance is a pre-fabricated chunk. What of utterances, such as What are you dissecting?, that are not conventionalized routines by any stretch of the imagination? Speakers have the knowledge that -ing is affixed to many different words in contexts where a present participle is needed. How are you doing? fits into familiar patterns such as those associated with Wh- fronting, the use of present participles, subject-verb agreement, etc. These all appear to be rule-governed behaviors that participate in hierarchical structures. How does a usage-based model account for the rule-like aspect of the knowledge of language?

The explanation is based on a representational scheme that is thoroughly instance-based, out of which regularities are recognized and given their own representations, all of which exist with different strengths at different times. As a basis, Langacker draws on the widely accepted notion in cognitive psychology of schemata (Schank & Abelson, 1977), which are flexible hierarchical structures that serve to make complex stimuli cognitively manageable. In psychology, schemas are applied to phenomena as diverse as visual perception and motor planning (Arbib & Erdi, 2000) and narrative storytelling (Aksu-Koç, 1996; Bartlett, 1932), and have been applied to the structure of language as well (Arbib, Conklin, & Hill, 1987).

In the linguistic realm, once a linguistic unit is extracted from the input, it takes on the status of a linguistic object, say, the sound [ŋ]. At the same time, other linguistic units are being extracted, many of which share some or all of that phonetic material: -ing, song, Lincoln, playing, All Things Considered, eating, What are you doing? and so on. With enough representations of enough objects, a speaker begins to be able to perceive the pattern that many words end with –ing, and that there are semantic and structural commonalities among some of these words. Given the human tendency toward abstraction, speakers construct a schema for those words ending in –ing that happen to share a meaning that is approximately present participial. At a still higher level, whole verb phrases with various forms of the copula followed by the present participle are perceived and abstracted over; there are also abstractions over 2nd person yes/no questions, and so on. This multi-layered abstraction gives varying levels of granularity in speakers’ knowledge of their language. Langacker is at pains to emphasize two points about utterances such as How are you doing? or What are you dissecting? First, from the standpoint of parsimony, they must be multiply represented in terms of schemas of multiple levels of abstraction. Second, from the standpoint of cognitive plausibility, they can be so represented, since the construct of schema handles units of varying sizes and abstraction without requiring different architecture for different levels.
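
The spirit of this abstraction-over-instances can be conveyed with a toy sketch; this is an illustration only, not Langacker’s formalism, and the stored forms, suffix length, and threshold are invented.

```python
# Toy sketch of abstraction over stored instances (not Langacker's formalism):
# tally recurring endings across remembered word forms and treat sufficiently
# frequent endings as candidate schemas.

from collections import Counter

stored_forms = ["doing", "playing", "eating", "singing", "song",
                "dissecting", "thing", "being", "going"]

def candidate_schemas(forms, suffix_len=3, min_count=3):
    endings = Counter(form[-suffix_len:] for form in forms if len(form) >= suffix_len)
    return {suffix: n for suffix, n in endings.items() if n >= min_count}

print(candidate_schemas(stored_forms))   # {'ing': 8} -- the -ing pattern recurs
# Note that "thing" also ends in -ing; as the text emphasizes, the real schema
# is further constrained by shared (roughly participial) meaning, which this
# purely formal tally ignores.
```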

In summary, Langacker’s work addresses the observation that language embodies both abstract structure and instance-based detail, varying in granularity but not in basic principles. These representational units individually vary in strength as a function of use. As a system, the representational units interact with each other and form connections that vary in strength, as a function of their formal and conceptual proximity to their near and distant neighbors. 5.2 Bybee’s dynamic usage-based framework Bybee’s work (e.g., 1985; 2001; 2006; Bybee, Perkins, & Pagliuca, 1994) has been a prime force in the emergence of usage-based models of language. Her work goes beyond Langacker’s in its depth of focus on phenomena of language variation and change. Furthermore, its claims have been directly tested, with quantitative methods that are amenable to falsification. Sharing with Langacker his basic claims about the representation of linguistic units, Bybee generates powerful predictions about the circumstances under which variants will appear, about the form that the variants will take, and about the diachronic consequences of that variation. She has subjected these predictions to a variety of empirical tests, some broad in scope, some narrowly focused

on a particular construction. Her investigations have confirmed that frequency of use, individually and in particular combinations, can predict both the incidence and the form of both variation and historical change.

As detailed in her seminal work on morphology, Bybee (1985) aims not so much to find machinery by which to analyze various lexical items into their morphological constituents, but to find an explanation for those recurring patterns that arise in morphological systems and continue to appear problematic for the analyst. It may be appropriate for morphologists to seek to rein in as much apparent anomaly as possible and systematize it, leaving whatever irregularity might remain in a storehouse of idiosyncratic forms (Zwicky, 1992). However, Bybee shows that when usage is taken into account in the very representation of morphology, even irreducibly irregular word-forms are seen to be non-arbitrary. At the same time, even apparently regular allomorphy is observed to be sensitive to contexts of usage.

Bybee (1985) uses the phenomenon of splits to make a number of points about usage. A split is observed when a single form begins to show subtle polysemy, and then the morphological possibilities for the originally single form begin to diverge. A particularly striking example of an inflectional split is to be supposed to, which in its original paradigm meant ‘was believed to be.’ Now, in the form to be sposta, it clearly has a modal sense of obligation and distinctively different phonological features.

One point that splits help to demonstrate is that derived word-forms can be multiply represented. They may have one existence as a product of the application of morphological rules. In this existence, they remain placidly within their paradigms, perhaps undergoing regular phonological processes. However, they may at the same time exist as rote-learned forms, especially if the forms occur frequently in the input. The more frequently heard the expression, the stronger the rote-learned representation becomes. The stronger the rote-learned representation is relative to the rule-derived representation, the more independent it becomes, and the more likely it is to take on a meaning of its own. Once on its own, a form is subject to additional phonological or morphological processes that create what would appear to be anomalous paradigms, if one were to consider only the original stem and the rules. Since, tautologically, our everyday speech consists primarily of our most frequently used items, this dual representation of common forms is a powerful factor in explaining existing morphological systems complete with their idiosyncrasies.

In the meantime, the development of frameworks like Cognitive Grammar (Langacker, 1987) and Construction Grammar (Goldberg, 1995; 2006) has raised the possibility of representing constructions of any size (words, phrases, sentences) according to a uniform scheme. In later work, Bybee (2001; Bybee & McClelland, 2005) also expands this notion of multiple representation beyond morphology to expressions in

general. Thus whole phrases, or utterances, as well as phonemes within words and morphemes, can be understood as both rule-derived and simultaneously rote-learned.

Splits also demonstrate the principle that linguistic forms do not exist in isolation, but are stored and retrieved with reference to the strength of their connections to other items (Luce, Pisoni, & Goldinger, 1990). Distinct words within a single paradigm are connected to some degree with each other word in the paradigm. For example, the word finite has a strong connection to the word finitely, less to infinite and finitude, and still less to infinity and infinitesimal. This continuum of connection strengths accounts for the ease of acquiring and generating new derivational forms that are similar to the target word, in the realm of language acquisition. In the realm of historical change, the continuum of connection strengths also accounts for the fact that the least similar items are most likely to wander into new semantic fields. A representation based on concrete word-forms, with allowances for gradient information, lays a good foundation for a usage-based framework, where each instance is tracked and noted.

Most importantly, the manner in which splits occur constitutes evidence that usage changes the morphological representation of words within a speaker’s mental lexicon. Specifically, the occurrence of splits is influenced not only by the strength of connections between words, but by the lexical strength of the words themselves; lexical strength, in turn, is influenced by the frequency with which the word is encountered. As it turns out, more frequent, strongly represented forms split off more radically, undergoing shifts not only in semantics and phonology, but often syntax as well. Pagliuca (1976, as cited in Bybee, 1985) showed that this pattern is the empirical norm; of 323 words with the prefix pre-, the more frequent ones were not only more phonetically reduced, but also less predictable in their meaning. The independence of semantics offers evidence that these forms are represented independently. This pattern of results not only demonstrates Zipf’s Law (i.e., more frequent words tend to be shorter than less frequent words), but also suggests that the mechanism behind it involves rote learning and subsequent phonetic erosion of what has become a base form. For example, although 3rd person singular verb forms may originally be derived from a stem plus a 3rd person singular suffix, they are frequent and learned early in acquisition. Thus, in individual speakers, 3rd person singular forms are represented relatively independently, and as a result these forms can increase in independence diachronically in a speech community. As this independence grows, a form can eventually split off and can itself become a stem in some new paradigm. Such a process has been attested historically in Provençal (Bybee & Brewer, 1980), in the Celtic languages (Watkins, 1962, as cited in Bybee, 1985), and Tok Pisin (Mosel, 1980, as cited in Bybee, 1985), among others. The fact that this melding of units and reduced articulation occurs preferentially in frequently occurring words tells us that repetition causes changes in representation.

Synchronically, many forms within any given language are likely to be undergoing such a process, and the changes in representation that occur take place so gradually that they are almost imperceptible. At any given moment, forms that are in the process of undergoing such a change are not straightforwardly categorizable either as fully regular or as idiomatic, nor even as simultaneously both regular and idiomatic. Since a complete theory of language must provide a synchronic analysis for all forms, this state of affairs makes it difficult to support a theory of language that strictly divides representations into rules versus lists. At the same time, the insight of rule plus list theories is not lost; the speaker continues to rehearse cognitive routines that strengthen individual derivational or inflectional forms, while also continuing to generate generalizations at various levels of granularity. This conception in terms of generalizations and routines preserves the insight that both regularity and irregularity have their place in language. 6 Representative results, part II: Investigations of process Evidence from naturalistic studies, such as those conducted using contemporary corpora or relevant historical data, must be given pride of place in usage-based models of language; any purportedly usage-based model that did not hold up in actual use would be oxymoronic. However, naturalistic studies can profitably be complemented with studies that probe the link between usage and speakers’ mental processes and representations by introducing un-natural elements. For instance, psycholinguistic studies directly manipulate the input that a speaker hears, to see its effect on processing and representation. Computational models test different architectures for processing and representations to determine how well any given architecture handles linguistic input. These studies can assess quite directly whether there is any connection between the distribution of input speech and the mental structures and processes that the usage-based models posit. When used well, such studies can be combined synergistically so that the human data inform the specification of architectures and the modelling data clarify the possibilities for human cognitive architecture. In this section we will briefly touch on some of the non-naturalistic evidence supporting a usage-based conception of human linguistic abilities; other chapters in this volume treat these topics in more detail. 6.1 Computational modeling Computational modeling of human language has taken many forms within the overall rubric of artificial intelligence (AI). For the current discussion, it is helpful to make the

distinction between two kinds of systems: mathematical models that directly and explicitly implement a grammar along with its assumptions about representation and processing, versus connectionist systems that, however opaquely, learn to map input to output, and in doing so generate the linguistic patterns that grammars describe (Seidenberg, 1997). The success of usage-based models of both types has bolstered the overall usage-based movement. 6.1.1 Exemplar-based models One decision that must be made about a grammar is whether it will embody representations in the form of rules and lists or in the form of instances and analogies. In the former case, the processing would involve rule application, while in the latter case the processing would involve the computation of similarity between instances or the retrieval of analogous cases or exemplars. Natural language (NL) researchers within AI have developed experience-based algorithms for morphological and syntactic segmentation and parsing. These systems, given fresh unmarked text as input, will generate output text that is tagged for part of speech and parsed. Unlike rule-based systems, which derive the regularity of their performance from explicit rules applied to categorized lists of words and exceptions, the experience-based systems first store each instance of input that they have encountered. They then use that input to analyze further incoming input. As more input language is stored in the system, the analysis that the system performs becomes more and more sophisticated, and its behavior becomes rule-like yet flexible. Brent’s INCDROP model (1997; Brent & Cartwright, 1996) extracts word-sized units from input speech by subtracting out previously encountered strings and depositing the remainder into the list of previously encountered units. His results corresponded to human data on segmentation of novel speech. Daelemans and his colleagues’ model (Gillis, Daelemans, & Durieux, 1994) of the learning of stress assignment in Dutch compares each newly encountered word to its nearest neighbor in simulated memory (i.e., the most similar previously encountered item), and assigns its stress pattern to the new word. Performance was similar to that obtained from native-speaking 3-year-olds, and in general was superior to performance produced by other popular algorithms. Skousen’s Analogical Modeling model (1989; 1992; 1998; Daelemans, Gillis, & Durieux, 1994) goes a step further; it achieves even more accurate performance, by increasing its responsiveness to data in computing the set of most relevant neighbors, which are not necessarily the nearest neighbors. In models of this type, the regularities that were evident in performance were not built into the system but emerged on-line during processing time. Thus, these models demonstrate that taking into account the details of input can generate rule-like output on novel items without requiring the induction of rules or categories. See Chandler (this volume) for an extensive discussion of this type of model.
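
The core mechanism shared by these systems can be sketched in a few lines; the following is a minimal illustration in the spirit of memory-based, nearest-neighbor approaches, not a reproduction of any cited system, and the features, items, and labels are invented.

```python
# Minimal sketch of a memory-based (exemplar) classifier: a new item receives
# the label of its most similar stored exemplar. Features and items are
# invented for illustration.

def overlap(a, b):
    """Count matching feature values between two feature vectors."""
    return sum(1 for x, y in zip(a, b) if x == y)

# Stored exemplars: (hypothetical features, observed stress pattern).
# Features: (number of syllables, final syllable open?, heavy penult?)
memory = [
    ((2, True,  False), "initial"),
    ((2, False, True),  "final"),
    ((3, True,  True),  "penultimate"),
    ((3, True,  False), "initial"),
]

def classify(new_item):
    best = max(memory, key=lambda ex: overlap(ex[0], new_item))
    return best[1]

print(classify((3, True, True)))   # -> "penultimate"
# Each newly classified item can itself be added to memory, so the system's
# behavior keeps shifting in response to the input it receives.
```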

6.1.2 Probability-based models The growth in evidence surrounding statistical learning in machines and humans changes the conception of language acquisition as a feat of logical induction; lack of direct negative evidence is a much smaller problem than originally envisaged, because a probability-based learning procedure is robust against limited incorrect inputs. Indeed, the sheer amount of input, regardless of its messiness, becomes a resource rather than a liability.

Jurafsky’s “probabilistic model of lexical and syntactic access and disambiguation” (1996, p. 137; see also Gildea & Jurafsky, 2002; Roland & Jurafsky, 2002; Jurafsky, Bell, Gregory, & Raymond, 2001) exemplifies a more complete instantiation of a grammar than the exemplar-based models, in this case Construction Grammar (CG) (Kay, 1997; Goldberg, 1995). As a unificational theory, CG specifies the representation of possible structures as constructions, which can include constraints from many different levels (e.g., lexical, morphological, syntactic, semantic, pragmatic). Like any grammar, CG can generate parses of input strings. The additional demands of on-line processing, however, require the grammar to be augmented with probabilistic information, typically obtained from natural language corpora. By adding probabilities to a straight implementation of CG, it becomes possible to prune searches to retain only the most likely candidates, as well as to eventually select the best candidate; this procedure accomplishes the double duty of both making the resulting performance computationally tractable and also creating a good match to human data on syntactic ambiguity resolution. In other words, when probability information is incorporated, it becomes possible to integrate detailed structural knowledge across the different levels of representation, and moreover to perform on-line processing in a realistic way. Without the probability being represented, such a feat would not be possible. Thus, it is important not only to know the abstract structures that are possible, but to know how they are used – a central claim of usage-based linguistics.
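
A schematic sketch of the pruning idea follows; it is not Jurafsky’s implementation, and the construction inventory, probabilities, and beam width are invented for illustration.

```python
# Schematic sketch of probability-based pruning: rank candidate analyses by
# the product of the corpus-estimated probabilities of the constructions they
# use, and keep only a small beam of the most likely candidates.

import math

construction_prob = {        # hypothetical probabilities from a corpus
    "NP->Det N": 0.6,
    "NP->Det N PP": 0.2,
    "VP->V NP": 0.5,
    "VP->V NP PP": 0.3,
}

candidate_analyses = [
    ["VP->V NP PP", "NP->Det N"],     # the PP attaches to the verb
    ["VP->V NP", "NP->Det N PP"],     # the PP attaches to the noun
]

def log_score(analysis):
    return sum(math.log(construction_prob[c]) for c in analysis)

beam_width = 1
ranked = sorted(candidate_analyses, key=log_score, reverse=True)
print(ranked[:beam_width])   # only the most probable analysis is retained
```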

Bod’s Data-Oriented Parsing (DOP) model (Bod, 1998; Bod & Scha, 1997; Bod, Scha, & Sima'an, 2003) combines instance-based and probabilistic architectures. The result is a model that, like any other grammar, specifies legal strings and parses. It assumes a basic “universal representation” for the structure of syntactic knowledge – comprised of trees and sub-trees – but unlike Universal Grammar, it does not place any further constraint on usable input. It also has no place for rules to apply to the structures. The model functions by collecting the raw material of input experiences and by retaining information about their frequency. Beginning with simple two-leaf subtrees, input experience is collected in the form of trees and subtrees. Without rules, it both learns and

parses by assembling the most probable structure out of the set of subtrees available, and then by adding the resulting structure to the collection of available parts. The set of structural units that are available, along with their probabilities, are thus constantly changing in direct response to the input. The model is usage-based in that it is quite dependent on the input it receives.
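
The scoring idea can be sketched briefly; the following follows the DOP1-style relative-frequency estimate described by Bod, but the subtree inventory and counts are invented for illustration.

```python
# Minimal sketch of DOP1-style scoring: a subtree's probability is its count
# divided by the total count of stored subtrees sharing its root category,
# and a derivation's probability is the product of its subtrees' probabilities.

from collections import Counter

subtree_counts = Counter({                      # (root, subtree) -> count
    ("S",  "S -> NP VP"): 50,
    ("S",  "S -> [NP the dog] VP"): 10,         # larger, partly lexicalized subtree
    ("NP", "NP -> the dog"): 30,
    ("NP", "NP -> Det N"): 60,
    ("VP", "VP -> barked"): 20,
    ("VP", "VP -> V"): 40,
})

root_totals = Counter()
for (root, _), n in subtree_counts.items():
    root_totals[root] += n

def p_subtree(root, subtree):
    return subtree_counts[(root, subtree)] / root_totals[root]

def p_derivation(pieces):
    p = 1.0
    for root, subtree in pieces:
        p *= p_subtree(root, subtree)
    return p

# Two derivations of "the dog barked", built from different-sized fragments.
d1 = [("S", "S -> NP VP"), ("NP", "NP -> the dog"), ("VP", "VP -> barked")]
d2 = [("S", "S -> [NP the dog] VP"), ("VP", "VP -> barked")]
print(p_derivation(d1), p_derivation(d2))
```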

DOP has a number of desirable and interesting properties. For instance, it shows a preference for structures that are constructed out of the subtrees most similar to previously encountered structures. More interestingly, filtering the input in any way degrades performance on subsequent input. This outcome is exactly the opposite of what would be expected in a more top-down UG-driven system, and shows the primacy of the input. Countering claims (e.g. Lasnik, 1999) that usage-based systems cannot handle long-distance dependencies or complex structure, DOP in fact “can statistically relate constituents that are arbitrarily widely separated – structurally as well as sequentially” (Bod, 1998, p. 68), and it can represent structures of arbitrarily great complexity. Because it implements a formally specified algorithm, there is no danger that its claims are supported only by handwaving.

Computational models such as these have far-reaching implications: If this outcome is generally true, it has important consequences for linguistic theory. It means that the knowledge of a speaker/hearer cannot be understood as a grammar, but as a statistical ensemble of language experiences that changes slightly every time a new utterance is perceived or produced. The regularities we observe in language may be viewed as emergent phenomena, but they cannot be summarized into a consistent non-redundant system that unequivocally defines the structures of new utterances. . . . The problem of language acquisition would be the problem of acquiring examples of representations from linguistic experiences guided by the Universal Representation formalism. Language change could be explained as a side-effect of updating the statistical ensemble of language experiences. And if there is anything in the human language faculty that is "innate", then it should be (1) the Universal Representation of linguistic experiences, and (2) the capacity to take apart and recombine these experiences. (Bod, 1998, p. 145)

6.1.3 Connectionist models Connectionist models, also known as neural network or parallel distributed processing models, use an architecture that is rather different from the exemplar models and the

explicitly probabilistic models described above. Connectionist models are much more opaque, since the primary entities upon which they calculate mathematical values are not linguistic structures such as words gleaned from input speech, or trees and subtrees. Rather, the mathematical values calculated are node activations, which are not intended to map onto linguistic structures. These models echo the others in their parsimonious representation of gradience and in their ability to take realistic input and extract probabilistic generalizations that approximate natural language competence (Elman, 2005). Reali and Christiansen (2005), for example, used a network they had previously developed for another task and fed it a corpus of actual child-directed speech (Bernstein Ratner, 1984). When tested on embedded Aux-fronting in English questions, which is said to be unlearnable from input (Crain & Nakayama, 1987), the model did in fact proceed to learn the necessary distinctions.
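
For readers unfamiliar with this architecture, here is a bare-bones sketch of the forward pass of an Elman-style simple recurrent network, the general design behind much of this work; it is not Reali and Christiansen’s actual network, the vocabulary and layer sizes are arbitrary, the weights are untrained, and training (backpropagation through time) is omitted.

```python
# Bare-bones forward pass of an Elman-style simple recurrent network: a hidden
# state carries context forward, and the output is a probability distribution
# over the next word. Weights are random; training is not shown.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["is", "the", "dog", "that", "barking", "hungry"]
V, H = len(vocab), 8                       # vocabulary size, hidden-layer size

W_xh = rng.normal(0, 0.5, (H, V))          # input-to-hidden weights
W_hh = rng.normal(0, 0.5, (H, H))          # recurrent (context) weights
W_hy = rng.normal(0, 0.5, (V, H))          # hidden-to-output weights

def one_hot(word):
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(word, h_prev):
    h = np.tanh(W_xh @ one_hot(word) + W_hh @ h_prev)
    return softmax(W_hy @ h), h

h = np.zeros(H)
for w in ["the", "dog", "that", "is"]:
    probs, h = step(w, h)
# 'probs' is the (untrained) distribution over the next word; training on
# child-directed speech would push it toward the continuations actually
# attested in the input.
print(dict(zip(vocab, probs.round(3))))
```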

One criticism that has been leveled at connectionist models of language acquisition is that when they are presented with rare words they cannot help but respond with forms that conform to their nearest neighbors, even if the neighbors are quite distant and irregular, and thus lead the rare word into blind alleys such as treelilt as the past tense of trilb (Pinker & Ullman, 2002). This assumption that the models cannot help but respond with nearest neighbors is a common misunderstanding of usage-based models. There are well-established methods based on usage-based principles that prevent such untoward outcomes. For example, when there are not enough tokens of a new word to make confident generalizations on the basis of nearest neighbors, the system needs only to back off to the next higher level of granularity and generalize on the basis of the unit’s inferred part of speech, or other higher order features (Kim, Srinivas, & Trueswell, 2002, p. 129). Though based on concrete tokens, usage-based models are not trapped eternally in a bleak world without abstraction.
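
The back-off logic described above can be sketched in a few lines; the thresholds, counts, and pattern tables below are invented for illustration and do not reproduce any cited system.

```python
# Small sketch of granularity back-off: generalize from a word's own stored
# tokens when there is enough experience with it, otherwise back off to a
# coarser pattern keyed to its inferred category.

token_counts = {"walk": 500, "trilb": 1}            # stored experience per word
word_pattern = {"walk": "add -ed (walked)"}         # item-specific generalization
category_pattern = {"verb": "add -ed (default past tense)"}

def past_tense(word, category, min_tokens=5):
    if token_counts.get(word, 0) >= min_tokens and word in word_pattern:
        return word_pattern[word]                   # enough experience: use the item
    return category_pattern[category]               # too rare: back off to the category

print(past_tense("walk", "verb"))    # item-level generalization
print(past_tense("trilb", "verb"))   # nonce word: category-level default
```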

As seen in the work of Bybee and of Langacker, one of the tasks of usage-based linguistics is to identify forms of representation that will allow learners to learn and speakers to speak, and languages to change, all on the basis of often degenerate though always plentiful input. The contribution of connectionist models has been to demonstrate the possibility of accomplishing those tasks without relying on pre-specified representations of what is to be learned. More detailed discussions of the various families of connectionist models can be found in other chapters of this volume, so they will not be discussed further here. 6.2 Cognitive and developmental psychology Cognitive and developmental psychology informs usage-based linguistics by providing an independent characterization of human cognition. Some of this research deals directly

with language from a psychological perspective; some of it does not directly address language, but characterizes domain-general mental processes and representations that can be recruited for linguistic purposes. These processes and representations, whether domain-general or domain-specific, constitute a minimal baseline of mental machinery known to be available for the language user. 6.2.1 Language-focused psychology research The psychological research dealing directly with language includes work on acquisition, on-line processing, and variation and change. A portion of this vast literature bears on the question of how usage affects language users, that is, the question of how the speech that is available to a language user affects that language user’s private representation and public performance of linguistic content.

Karmiloff-Smith’s (1992; 1994) Representational Redescription model brings together a large body of empirical research in developmental psychology, including language acquisition, and articulates the resulting theory explicitly enough to model it computationally. Karmiloff-Smith (1992) proposes a detailed theory of how mental representations change over the lifespan, which takes into serious consideration the ways that mental representations depend on mental processes. First, this includes the idea that adult mental representations are constrained by the initial predispositions of the infant and by the logical possibilities inherent in the process of development. At the same time, it incorporates the idea that acquiring and processing knowledge engenders fundamental, qualitative changes in mental representations. The Representational Redescription model develops the idea that early learning is constrained by human predispositions for acquiring certain kinds of knowledge in certain domain-specific ways (e.g., attention to sequential patterns in auditory stimuli), but that the processing of this knowledge does not proceed rigidly through a pre-specified and unchanging module which provides output in only a pre-specified format and prevents other computations from taking place on intermediate steps. Rather, it is only after enough material is learned, at least to the point of reproducing behaviors reliably, that the mind begins to be able to make connections among the various specific representations that generate the behaviors. Only after those connections are made does it become possible to abstract over the various specific representations. These abstractions take root in parallel with increased specialization in the processing of input information, and the abstractions exist simultaneously with the specific, shallow, behavioral representations. Karmiloff-Smith’s work nicely recapitulates usage-based linguistics’ twin emphases on multiple representations of any given surface behavior and on the fundamental representational changes wrought by human interaction with input language.

Lieven and colleagues take a similar tack in their research, which investigates how children create abstract linguistic structures through interaction with input and through their increasing repertoire of behaviorally mastered utterances. For instance, Savage, Lieven, Theakston, and Tomasello (2003) examined children at three different points during language acquisition using a priming procedure to elicit speech that could differentiate between reliance on relatively concrete lexical representations versus reliance on abstract syntactic representations. They discovered that, although children at all ages were able to produce sentences that were syntactically the same (active and passive transitive sentences), the younger children relied on concrete examples and specific lexical items, while the older children generalized and produced passive sentences with new lexical items. Taking these results in conjunction with other studies using different methodologies, they suggest that children begin with specific concrete constructions learned from the input with minimal generalization, then add abstract linguistic representations only gradually. The abstract representations gain strength with increasing experience and with increasing availability of concrete representations for re-description.

On-line sentence comprehension and production research informs usage-based models because it provides insight into the forms of linguistic representation used by humans, and into the ways these representations are influenced by human cognitive processes. Syntactic ambiguity resolution is one typical research area that deals with these issues. For instance, to what extent is apparently syntactic information actually stored lexically, reflecting information that could only be acquired through extended experience with input? Novick, Kim, and Trueswell (2003) use a fast-priming paradigm to establish that verb subcategorization information is activated at a subconscious level. They found that when the task was to interpret an ambiguous verb, hearers took advantage of even an extremely brief presentation of a noun that is more frequently used with one or another argument structure. At least two conclusions can be drawn from these findings about sentence comprehension. First, some usage statistics that bear on syntactic interpretation are represented directly at a lexical level; and second, this information can cross form class boundaries. In the realm of language production, too, usage is important. Bock and colleagues (Bock, 1986; Bock & Griffin, 2000) introduced and developed the idea of structural priming. In doing so, they showed that presenting a sentence with a given syntactic structure prior to having to produce a sentence increases the likelihood that speakers will subsequently choose that syntactic structure when they speak. Thus experience with linguistic input is a factor in what utterances are produced.

Most of the usage-based work in psycholinguistics has focused on acquisition and on on-line processing. However, there is some evidence that the overall framework is also relevant to language variation and change. As seen above, many psycholinguistic
experiments have shown that usage patterns have an immediate effect on speakers. But are these merely transient activation effects or something else?

Using an extension of Bock's structural priming paradigm, Boyland and Anderson (1998) showed that speakers' experience affects their syntactic choices not only in the short term, but even 20 minutes or more later. Depending on their priming condition, participants were exposed either to double-object datives or to prepositional datives, as well as filler sentences. After a 20-minute delay, they were asked to produce descriptions of dative scenes, and their choice of structure was noted. A computational model of this process (Boyland & Anderson, 1999) in the ACT-R framework (Anderson, 1993; Anderson & Lebiere, 1998) supports an account that explains priming in terms both of transient activation and of longer-term learning (Boyland, 1998). Experiments with children and connectionist modeling also support the longer-term view (Chang, Dell, Bock, & Griffin, 2000; Savage, Lieven, Theakston, & Tomasello, 2006).
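The ACT-R account invoked here attributes priming to an activation value that is boosted by each encounter with a structure and then decays slowly, so that part of the boost survives a delay. Below is a minimal sketch of that idea using the standard ACT-R base-level learning equation from Anderson and Lebiere (1998); the function, the decay parameter, and the toy presentation times are illustrative and are not taken from Boyland and Anderson's actual model.

import math

def base_level_activation(presentation_times, now, decay=0.5):
    # ACT-R base-level learning: activation = ln(sum of age^-decay over past encounters).
    # presentation_times: times (in seconds) at which the structure was encountered.
    # decay: the conventional ACT-R value is 0.5.
    ages = [now - t for t in presentation_times if now > t]
    return math.log(sum(age ** -decay for age in ages))

# A structure primed once, seconds ago versus roughly 20 minutes ago:
recent = base_level_activation([10.0], now=20.0)
delayed = base_level_activation([10.0], now=1210.0)
print(f"just after priming: {recent:.2f}")
print(f"after a 20-minute delay: {delayed:.2f}")

Because activation falls off as a power function of time rather than vanishing, the primed structure remains easier to retrieve after the delay than it would have been with no prime at all, which is the kind of account that accommodates both transient activation and longer-term learning.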

Kaschak and Glenberg (2004) modified the structural priming paradigm to see whether and how exposure to a novel syntactic construction can enable adults to learn to interpret the construction. After exposing people to a few sentences such as the grass needs cut, which is acceptable in few places outside western Pennsylvania, participants were then tested on various measures of reading speed for a variety of target sentences. As expected, the data clearly showed that those who were exposed to the construction learned to comprehend it speedily, while those who were not exposed had great difficulty, especially in extended sentences where the anomalousness of the sentence in speakers’ native dialect became obvious. Participants’ learning generalized; learning the grass needs cut allowed them also to comprehend the grass wants cut. This shows that the learning was syntactic, not just lexical. Further data comparing effects of the new needs construction on processing of the standard needs to be cut sentence type suggests that the mechanism may involve episodic processing traces like those present in other non-linguistic domains. Kaschak and Glenberg see their work as contributing to the types of psycholinguistic theories described above in that this work, like the others, also demonstrates that speakers use mechanisms of language processing that are sensitive to shifting patterns of linguistic experience. In addition, their results support the idea that the mechanisms for learning syntax are domain-general, besides being available both in adulthood as well as in childhood.

Such research makes it plausible that language change is actuated not only in transmission across generations through imperfect learning, but also takes place within individuals, who subsequently provide modified language input to their own language communities. There, a change can take root with enough critical mass that it may eventually cross social networks and spread more widely (Milroy, 1992). In this way, usage multiplies its effect on language variation and change.

The discovery of domain-general mechanisms active in language use raises the question of what other mechanisms pertinent to language might remain as yet undiscovered or neglected in linguistics. Non-psycholinguistic cognitive psychology is an ample source of relevant knowledge. 6.2.2 Insights from research not focused on language One of the persuasive arguments for attention to usage in language is the power of experience, to which cognitive psychology bears ample witness. This cognitive evidence forms a broad-based foundation for usage-based models of language, though not all of it has heretofore been widely appreciated. Skill acquisition and implicit learning are two complementary areas of core cognitive psychology that have important implications for usage-based models of language. Finally, there are the language-focused areas of cognitive psychology that study on-line psycholinguistic processing, and language acquisition and development, as well as the social-cognitive aspects of communication such as conventionalization and communicative accommodation.

Both implicit learning and skill acquisition are phenomena in which repeated experience with stimuli leads to restructuring of mental representations. Although the relationship between them has not been completely explored (Reber, 1993; Kirsner, Speelman, Maybery, O'Brien-Malone, & Anderson, 1998; Domangue, Mathews, Sun, Roussel, & Guidry, 2004), a key difference between the two is that implicit learning research has traditionally dealt with tasks in which participants have encountered complex but unanalyzed stimuli and, upon repeated experience, have emerged with mental representations that have analyzed out some of the underlying structure of the stimuli, either in terms of overarching generalizations or in terms of component chunks. Skill acquisition, on the other hand, has traditionally dealt with tasks in which participants have repeatedly assembled simple component tasks into useful combinations, constructing efficiently compiled mental representations that enable fluent performance. These two complementary learning mechanisms ground and enrich the usage-based claims that frequent exposure to linguistic input feeds the generation of patterns both large and small within speakers’ linguistic systems. 6.2.2.1 Implicit learning A large portion of the important work in implicit learning has been carried out through the study of artificial grammar learning (AGL). Typically, in AGL experiments, participants are presented with output strings generated by a context-free grammar
(CFG), usually alphabetic letter sequences presented visually. AGL experiments have shown that participants’ ability to recognize legal strings increases with exposure to the artificial language, even if they are not aware of that ability. How does this learning occur? AGL research (Reber, 1993) shows that, early in learning, when the amount of exposure to examples is still relatively small, learners represent their understanding of the system in terms of instances of full sequences or fragments of sequences. Once enough experience is accumulated, the same relatively literal knowledge is retained, but a layer of abstraction is added. This layer of abstraction guides the learner’s analysis of the full complex sequence, so that it is parsed according to abstract symbols (Kinder & Assmann, 2000). Consistently successful grammaticality judgments on novel sequences have been found to depend largely on arriving at an appropriate set of chunks. Servan-Schreiber and Anderson (1990) formalize the process of chunking in AGL. Their computer model, in which exposure builds chunks of varying strength, and in which chunk strength affects the speed and likelihood of retrieval, produced results that closely match empirical AGL data. The fact that the model matched the empirical data is evidence that the mechanisms built into the model were sound. Furthermore, Gomez and Gerken (2002; 1999; 2000) have studied AGL in infants, and show that even infants acquire both specific chunks and more abstract syntactic knowledge when they are exposed to example strings, just as adult listeners do. The abstract knowledge that the infants display is sufficient for them even to identify strings that come from the grammar they learned but with an entirely different vocabulary.
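Servan-Schreiber and Anderson's model is characterized above only at the level of its mechanism: exposure builds chunks, and chunk strength governs the speed and likelihood of retrieval. The following is a deliberately drastic simplification in that spirit, not their implementation; chunk strength is reduced to raw frequency over a few invented Reber-style training strings, and a summed-strength score stands in for ease of retrieval.

from collections import Counter

def build_chunks(strings, max_len=3):
    # Tally candidate chunks (substrings of length 2..max_len) across the training strings.
    strength = Counter()
    for s in strings:
        for n in range(2, max_len + 1):
            for i in range(len(s) - n + 1):
                strength[s[i:i + n]] += 1
    return strength

def familiarity(string, strength, max_len=3):
    # Sum the strengths of the chunks a test string contains; higher scores
    # stand in for faster, more likely retrieval and acceptance.
    return sum(strength[string[i:i + n]]
               for n in range(2, max_len + 1)
               for i in range(len(string) - n + 1))

training = ["TSXS", "TSSXXVV", "PVV", "TXXVPS", "PTVPS"]
chunks = build_chunks(training)
print(familiarity("TSXX", chunks))   # shares frequent chunks with the training strings
print(familiarity("QZQZ", chunks))   # shares none, so it scores 0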

Studies in artificial grammar learning have revealed mental processes of structure formation that are consistent with the mechanisms posited by usage-based models of language. For example, Reber (1993, p. 129) mentions one study in which, even though a simple rule set would fully specify the legal strings, participants still learned the artificial languages by beginning with instance-based chunks, and proceeded to create their own abstract generalizations that generated the same legal strings but did so without necessarily relying on the most globally general rules (Brooks & Vokey, 1991). In fact, several studies showed that informing participants of the rule set did not help, and sometimes actually produced decrements in performance (e.g., Berry & Broadbent, 1988). Another point of contact with usage-based models is that the same strings are represented not mutually exclusively by rule or memorization, but are multiply represented, with strings starting out as memorized chunks and then more frequently encountered strings participating in local and then increasingly general abstractions (Brooks & Vokey, 1991; McAndrews & Moscovitch, 1985; Peters, 1983; Saffran & Wilson, 2003). Thus, routinized chunks exist, but they also participate in abstractions.

One caveat in applying AGL to natural language acquisition is that success in natural language acquisition is measured by meaningful production, not content-free grammaticality judgments, as has been typical of AGL experiments. Mathews and
Cochran (1998), however, have shown that when the paradigm is extended to test generative capacity, the same mechanisms hold. Another point of concern is that the artificial grammars studied have been restricted to context-free grammars, which are much simpler and more regular than natural language. However, these differences are mitigated by the fact that the implicit learning data correspond well to findings from the linguistic study of natural language, and to extensions of AGL to generative use. Janda and Joseph (Joseph, 1997a; Janda & Joseph, 1999; Joseph, 2002) showed that constellations of local regularities determined speakers’ generalizations across a number of natural languages, including modern Greek. Peters (1983) presents a body of research describing how children arrive at mental representations of linguistic units of various sizes, often involving memorized chunks that are later analyzed. In Dell, Reed, Adams, and Meyer’s (2000) work on language production, speakers reliably mirrored the phonotactics of an artificial language in speech that they produced. 6.2.2.2 Skill acquisition Human beings acquire many skills on their path to adulthood, of which language use is only one. The structure of language may be more complex and is almost always learned less consciously than the structure of the domains in which skill acquisition has been studied, and thus there are significant areas of non-overlap. Nevertheless, the study of skill acquisition brings to light many component processes that can reasonably be shared by language acquisition. Skill acquisition proceeds both in simple and complex motor domains (such as typing or piano playing) and in complex cognitive domains such as chess-playing (highly constrained) or medical diagnosis (less constrained). Furthermore, the general temporal framework of skill acquisition maps relatively well onto the course of language acquisition seen from a usage-based point of view. Van Lehn (1996) reviews the literature on cognitive skill acquisition, structuring his discussion around the specific processes that characterize progress among individuals possessing different degrees of skill. Skill acquisition begins with acquiring information, through a combination of observation, explicit instruction, and playful experimentation with minimal knowledge. In the domain of chess, for example, this early phase would include learning the movements allowed for each piece, typical opening moves, and how pieces are taken, while seeing others play and engaging in initial attempts to play. In the domain of language, this phase would include vast amounts of listening, as well as inexpert attempts to produce language at any level of analysis: sounds, morphemes, words, combinations. Once this foundation is acquired, however, there is still a long process of development that leads to qualitative changes in both representation and on-line processing.

As skill acquisition proceeds and the learner accumulates experience, the learner begins to glean generalizations that are supported by examples. The intermediate phase of skill acquisition characteristically involves accumulating experience with a wide variety of knowledge units and making generalizations over the individual pieces of knowledge that have been learned. Peters (1983) describes the process of language acquisition in these terms. Developing this idea further, Bates and Goodman (1999) note that a rapid increase in young children's syntactic competence is linked with the acquisition of a critical mass of vocabulary. This intermediate phase can include learning how to put together small units or learning how to break apart large units into commonly-occurring component parts. Both in language and in the complex skills of chess or piano or tennis, mastery of the domain relies on the complementary processes of analysis into, and synthesis of, small pieces of knowledge.

Perhaps the most relevant lesson from skill acquisition comes from the advanced phase of skill acquisition. Although a learner’s performance may reach a basic level of technical correctness relatively early, practice is the engine that drives skill acquisition, and practice is what will improve the speed and automaticity of the performance (VanLehn, 1996). In the domain of language, usage is practice. In domains ranging from typing to learning mathematics to learning a second language, it has been found that increasing the amount of practice induces a predictable speed-up in the performance (DeKeyser, 2007). The so-called power law of practice is extremely robust.
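The power law of practice invoked here is conventionally written as a power function relating performance time to amount of practice; the symbols below are the standard ones rather than parameters estimated from the studies cited in this chapter:

\[ T(N) = A + B\,N^{-c} \]

where T(N) is the time needed on the Nth practice trial, A is the asymptotic (irreducible) time, B is the improvement available through practice, and c > 0 is the learning rate, so that equal increments of practice yield progressively smaller speed-ups.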

To what extent the speed-up is due to knowledge being newly represented in procedural form, or just in larger declarative chunks, or something else (Johnson, Wang, & Zhang, 1998; Rabinowitz & Goldberg, 1995), is still a matter of controversy. In general it is agreed that practice leads to representational change in the material that was practiced. Two features of the changes due to practice are particularly worthy of note. First, highly practiced skills grow in automaticity. In the skill acquisition context, automaticity does not mean lack of control. Rather, it means that the computation of assembling units into higher order wholes becomes faster and more accurate, and requires less and less effort. Logan (1985) points out that automaticity is an important aspect of skill in all domains, and later (1989) cites speaking as an example of a highly practiced skill in most adults. Second, increases in speed come about not only through speed-up in the assembly of novel combinations, but also through the storage into memory of frequently used sequences as pre-fabricated wholes, such that fewer, larger units are retrieved for assembly each time. In the terminology of skill acquisition, computation gives way to retrieval. It is important to note that these two routes to speed and fluency are not mutually exclusive. The benefits of frequent practice accrue both in computation and in retrieval, for both new items and previously encountered items.

These findings are extremely relevant to usage-based models of language. Usage-based models depend on the existence of highly practiced units of language, and on the
fact that the units' properties change depending on how much they are used. There is every reason to suppose that the power law applies to linguistic material as well, not only because the power law is so robust, but also because language use has the right properties: like musical improvisation or scientific reasoning, it requires less effort and becomes more fluent and more creative with practice.

Both the retrieval view and the proceduralized rule use view are supported by second language acquisition research. DeKeyser (1997), reviewing his own research and that of Robinson and Ha (1993), cites evidence that automatization takes place in second language learning. Robinson and colleagues' research suggests that retrieval of stored items is automatized. DeKeyser's own research showed adherence to the power law of practice in comprehension and in production, demonstrating that use of linguistic rules in composing sentences is subject to automatization. A leading researcher of usage in second-language learning is Ellis (2008). For an excellent, thorough review of usage-based research with particular application to second language acquisition research, see Ellis (2002). 6.2.2.3 Integration Poldrack, Selco, and Field (1999) and Gupta and Cohen (2002) have made a case for collapsing many of the distinctions that have been made in the areas of implicit learning and skill acquisition, and instead recognizing that many aspects of what had previously been considered distinct processes are in fact just manifestations of the tuning of procedural responses that comes with repetition. According to Gupta and Cohen (2002), regardless of whether an experimental task involves analyzing out components (as in implicit learning) or putting together components (as in skill acquisition), and regardless of whether the responses required by the task are freshly generated or previously-rehearsed, the task itself can always be thought of as a "transduction between representations" (p. 418), and the improvement in performance is merely a tuning of the neural transducers that is a natural consequence of repetition. Applying this idea to language and its development over the lifespan, they make two relevant points. First, learning how to produce word forms follows the same trajectory as the phenomena they studied in terms of skill acquisition and the implicit learning phenomenon of repetition priming. Thus, learning how to produce words follows the tuning-through-repetition model. Second, the fact that structural priming (i.e., exposing speakers to a particular grammatical structure some time before they are asked to speak) leads speakers to preferentially use that grammatical structure also fits their model. It appears therefore that the production of grammatical structures should also be seen as subject to tuning through repetition.

Another line of research in cognitive psychology with important implications for usage-based models of learning is provided by Goldstone (1998; 2000; Goldstone & Steyvers, 2001). Cognitive psychologists have traditionally separated perceptual learning from the learning of concepts and categories. Goldstone has pushed for an integrated view of perceptual and conceptual learning, in which the perceptual units that shape our knowledge of categories are flexible and depend to a surprisingly great extent on our experiences, as well as on the categories that we use to interpret our experiences. Goldstone, Steyvers, Spencer-Smith, and Kersten (2000) review the literature on expert skill to remind us that skilled practitioners (such as chess masters, beer drinkers, and physicists) perceive the world differently from novices, and they marshal neuropsychological evidence that these differences extend down to the earliest stages of sensory processing. They discuss several mechanisms by which perception is tuned to experience.

First, experience helps learners to home in on whichever sensory dimensions are most informative in their particular environment, as long as they are within the range of learners’ perception. For example, native English speakers have learned to attend to the dimension that separates initial [l] from initial [r], whereas native Mandarin speakers
have learned to attend to the dimension that separates initial /tɕ/ (pinyin <j>) from initial
/ʈʂ/ (pinyin <zh>), because those distinctions are informative in their respective languages. These dimensions may or may not correspond to the features listed in an official analysis of the domain. Goldstone et al. (2001) have induced learners to become sensitive to arbitrary dimensions for a set of objects by tracking the statistics of the objects in combination with how the objects are used. The phenomenon of categorical perception in speech falls out from the more general theory’s idea of dimension sensitization by experience with usage.

In addition, experience forms both the shape and the size of what we consider our basic building blocks. Seeing a star-of-David shape as two large superimposed triangles produces a different mental representation compared to seeing it as a hexagon sporting a small triangle attached to each side. If, in the viewer’s experience, the presence of a hexagon has consistently been diagnostic of category membership, then the latter analysis increases in likelihood. Reanalysis of a sentence from I’m going [to eat] to I’m [going to] eat becomes likely when the presence of going to becomes increasingly diagnostic of a sense of intention. Not only is the shape of linguistic building blocks formed by experience, but their size is as well. Even in cases when a perceived entity was originally created and conceived of compositionally (e.g., as superimposed triangles), under conditions when the composed assembly takes on categorical meaning and is frequently encountered, the whole complex entity comes to be interpreted as basic (cf. Cave & Wolfe, 1990). Pawley and Syder (1983) provide a catalog of relevant linguistic examples,
such as Are you all right? or I’m so sorry to hear that, which are whole sentences that could be understood as composed, but are typically understood with little or no further syntactic analysis. I want to marry you is understood as a relatively unified entity; I want marriage with you, though perfectly grammatical, is understood as composed. Likewise, John wants to marry Nelly is cognitively simpler to a native speaker than John wants marriage with Nelly. Goldstone’s research is important to usage-based linguistics because it documents a generally applicable cognitive phenomenon that allows experience not only to change representations of frequently encountered utterances, but also to change even initial perceptions of new utterances in the future. It can reasonably be inferred that usage-based theories are not simple association machines.

Holt, Lotto, and Kluender (2001) have similarly investigated the interplay between experience and perceptual categories, but specifically in the realm of speech perception. In contrast to prevailing theories of speech perception that rely on species-specific neural discontinuities to define feature sets and phonemes, they show that perceptual learning affects the identification of phonemes. When voicing and low f0 tend to covary statistically in a language, the presence of low f0 leads perceivers to label a sound as voiced; when that covariation is not present, low f0 does not affect the labeling of a sound as voiced. An astute reader will notice that this experiment cannot ethically be carried out in a human infant population, since it would require long-term removal of the infant from his or her natural environment; any short-term experiment would not manipulate the overall long-term covariation statistics, and thus would not be an adequate test of the hypothesis. To achieve adequate experimental control of covariance patterns, they conduct many of their experiments with non-human species, such as the European starling and the Japanese quail. The fact that, in as little as 101 hours (Kluender, Lotto, Holt, & Bloedel, 1998), even a species with "more austere neural potential [than humans]" (Kluender, Lotto, & Holt, 2005, p. 213) shows perceptual learning for speech sounds demonstrates that humans are quite likely to be achieving at least as much perceptual learning. A computer simulation with no pre-programmed boundaries performed similarly well. Their research demonstrates that perceptual learning from experience is sufficient to produce the phonological classifications typical of human speech perception. Similar findings have been obtained in ASL (Wilson, 2001).
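The covariation finding can be pictured with a simple cue-weighting listener that sets the weight of the f0 cue according to how well f0 has separated voiced from voiceless tokens in its experience. This is only an illustrative sketch of statistical cue learning, not Holt, Lotto, and Kluender's model, and the f0 values and category labels are invented.

def f0_cue_weight(tokens):
    # Weight the f0 cue by the difference between category means in the
    # listener's accumulated experience; tokens are (f0_in_hz, is_voiced) pairs.
    voiced = [f0 for f0, v in tokens if v]
    voiceless = [f0 for f0, v in tokens if not v]
    return sum(voiceless) / len(voiceless) - sum(voiced) / len(voiced)

# Language A: low f0 reliably co-occurs with voicing.
covarying = [(110, True), (115, True), (120, True), (180, False), (185, False), (190, False)]
# Language B: f0 is unrelated to voicing.
uncorrelated = [(110, True), (150, True), (190, True), (120, False), (150, False), (180, False)]

print(f0_cue_weight(covarying))     # large positive weight: low f0 biases labeling toward voiced
print(f0_cue_weight(uncorrelated))  # zero: f0 does not move the labeling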

7 Summary and wider role: Usage-based models’ primary contributions The overall goal of usage-based models is to set an empirically and logically defensible foundation for the language sciences. The research conducted as part of this quest has yielded important findings and highlights crucial issues to consider, which have implications beyond the field of usage-based linguistics.

7.1 A different kind of explanation Aslin, Saffran, and Newport (1999) note wisely that no matter what one’s theoretical orientation may be, it is necessary to be aware that generalization must be constrained one way or another against output that does not occur, and that a good theory of language needs to specify these constraints. For some linguists, the constraints are the focus of attempts to explain linguistic phenomena: why are certain classes of structures not normally found in the language? Exactly how can one characterize these structures? Usage-based theorists acknowledge the necessity of constraints, but focus their efforts at explanation elsewhere. More specifically, explanation consists in getting as much mileage as possible from mechanisms already known to function in cognition, and reducing the complexity of linguistic phenomena to the natural outworking of mechanisms at some other level of analysis. The expectation of usage-based modeling is that these mechanisms will naturally constrain the output of the models.

Although there is still much work to be done, usage-based models are working
hard to reframe the logical problem of language acquisition, the puzzle of how language could be learned from unreliable input speech (MacWhinney, 2004; Lewis & Elman, 2002; Karmiloff-Smith, 1992). For historical reasons, the assumption has been that a relatively stable endstate adult grammar is the given, that input speech is insufficiently informative, and that the human cognitive capacity is the unknown whose properties we must infer from adult grammar in the absence of sufficient input. This assumption made sense at a time when cognitive science was relatively undeveloped, and when large electronic text corpora were not available. Now, however, this assumption makes less sense. Because of the knowledge gained in the last 30 years of cognitive science research, the usage-based enterprise accepts the idea that cognition is a given, that there is significant information in the linguistic input, and that the representation of the adult “endstate” is open for investigation and may or may not consist of a grammar. In this case, the fact that learning proceeds despite messy input does not have to be quite as puzzling a “logical problem” as it would be if creation of a grammar with insufficient input were the only conceivable way to pose the problem of language acquisition. 7.2 A different kind of formalism Usage-based models are not a sort of functionalism that is opposed to formalism. Instead, they are models that offer alternative formalisms, which often bear more relation to continuous than to discrete mathematics. As Pierrehumbert (1999, p. 287) puts it, “I take
‘fully formalized’ to mean formalized down to the last differential equation, and ‘exhaustively validated’ to reflect coverage of all speech behavior in its full statistical variability and physical glory.” A formalism that explains variability and probabilities is no less rigorous than one that explains only discrete rule-following competence. Usage-based formalisms range from mathematical and computational models like those of Pierrehumbert and Jurafsky, to abstract grammatical schemes like Construction Grammar (Goldberg, 2003; Kay, 1997) and Cognitive Grammar (Langacker, 1987).

Newmeyer (1998) articulates the position that "what all current [generative] models share is an algebraic approach to the explanation of grammatical phenomena" which depend on "the manipulation of discrete formal objects. . . . Foremost among these objects are the syntactic categories NP, V, S, and so on" (p. 165). That is, well-defined abstract categories and their well-defined interactions not only exist, but are the foundation of his linguistics. The alternative formalisms of usage-based linguistics allow for linguistic categories to exist in the mind. Yet these categories do not have to be a priori categories; they can be created by the language user from the input speech. At the same time, usage-based linguistics proposes that such fixed categories need not constitute the basis for linguistic theory. The evidence that categorical mental representations and behaviors can arise from low-level data-driven mechanisms operating on raw input is well-established and growing in scope (e.g., Colunga & Smith, 2004; Edelman, Solan, Horn, & Ruppin, 2004). Positing discrete categories as innately given is thus unnecessary, from a usage-based point of view. Newmeyer takes the mirror-image position, arguing that external explanations for discrete linguistic behaviors are unnecessary. In a sense, it is a question of where the burden of evidence lies. Should one assume that a priori categories are given and low-level cognitive processes operating on linguistic input are unnecessary? Or should one assume that low-level cognitive processes are given and discrete categories are derived and therefore need not be innate? Usage-based linguistics places its bets not only on the power of low-level non-algebraic processes to explain linguistic categories and non-algebraic formalisms to undergird linguistic explanation, but also on the likelihood that these processes and formalisms, even if logically unnecessary, are empirically true.

A similar argument can be made regarding the necessity of rules. Chomsky argues that rules are basic and constructions are epiphenomenal (as cited in Newmeyer, 1998, p. 221). But usage-based linguists argue precisely the contrary: that instances of use are basic while broad generalizations as notated in rules are epiphenomenal. To usage-based linguists, the weight of the empirical evidence appears to favor the primacy of instances of use.


7.3 A different kind of innateness A focus on usage as the basis of linguistic knowledge might seem to deny a strong role for innateness. In actuality the opposite is true (Elman et al., 1996; Smith, 1999b). Although usage-based models eschew claims that knowledge of linguistic structure is itself innate (in Elman et al.’s words, the content of cognition), these models rest on the assumption that there are cognitive processes that are innately constrained (the mechanisms). A system that collects input data indiscriminately, without constraint, has no chance of determining the underlying structure, because in the absence of distinctions on which to base a preliminary analysis of the input, the flood of data would be meaningless and without structure. But the needed constraints are not necessarily constraints of form. Having innate constraints on the cognitive processes used in assimilating input allows language acquisition to proceed more elegantly and more flexibly than having innate constraints on the resulting content. As pointed out by demonstrations of semantic, syntactic, and prosodic bootstrapping, a little innateness goes a long way in the acquisition process (Pinker, 1984; Gleitman, Cassidy, Nappa, Papafragou, & Trueswell, 2005; Morgan & Demuth, 1996).

Traditionally, a commitment to nativism flowed from poverty of the stimulus arguments. In these arguments, neither the input, nor the mechanisms by which input was processed, were innately constrained. Rather, it was only the resulting linguistic representations that were posited to carry heavy innate constraints on syntactic form. Thus it was concluded that many aspects of adult linguistic and conceptual representation must be innately given. For example, Crain and Pietroski (2001) argue explicitly that because of the indeterminacy of the input, children must have the innate knowledge that specific syntactic structures are expressly prohibited. However, the argument assumes that constraints should be placed on specific representations, rather than on cognitive processes.

What usage-based models offer is a view of innateness that specifies not biologically-given content knowledge or neural representations, but biologically-given constraints on predispositions in processing, which constrain how, when, and whether different kinds of information tend to be combined with others, during the acquisition process (Elman et al., 1996; Karmiloff-Smith, 1992). These constraints in mechanism create the infrastructure that forms the basis for the child’s analysis of input and subsequent building of his or her grammar. In general, all usage-based models rely on and argue for innateness of mechanism rather than innateness of content.

7.4 Theoretical unity before modularity The struggle over modularity has been nearly as contentious as the struggle over innateness, but usage-based models of language offer a way to acknowledge the value of both domain-specific and domain-general processes. Traditionally (Fodor, 1983), it has been assumed that there is a cognitive module at each level of description (phonology, morphology, syntax, etc.), encapsulating the mental processes needed to perform the operations necessary at each level, and communicating with other modules only through specific interfaces. According to this view, each level only receives information through specified input channels, and provides information to its specified output channels. The assumption of modularity has been based on a basically linear model of language processing and representation. In such a model, phonetic information is the first to enter the hearer. The low-level phonetic material is then processed and passed through an interface to the phonological level, then the morphological, syntactic, semantic, and pragmatic levels in turn. Each level is seen as being processed and linked to the next through an interface, until at the end the sense of a sentence can finally be computed (Frazier, 1995). Lower levels have smaller units; phonology traffics in phonemes, morphology traffics in morphemes. The work of each higher level consists largely in constraining the component parts fed to it by the immediately lower level.

Karmiloff-Smith's representational redescription model (1992), described in section 6.2.1, challenges this view of modularity, and elaborates a cognitive architecture that allows the language user to create representations that do not respect these analytic levels. Eventually, to be sure, the speaker-learner may also formulate higher-order representations and develop specialized routines for linguistic tasks, such as parsing or syntactic planning, but these routines are derived from, and co-exist with, the more rough-and-ready representations that the speaker is constantly creating in order to cope with immediate linguistic needs. Because we are not limited to a single non-redundant representation of each piece of linguistic information, we are free to consider interactions that have previously been problematic. For instance, can phonological facts really bear directly on syntax without going through morphology? In a usage-based model, an affirmative answer to such questions is not a problem. Regular, quasi-regular, partially irregular, and fully irregular forms all get representationally re-described according to the specifics of their distributional patterns, and the result is this diversity of outcomes, but with a coherent representational system at the base.

7.5 Methodological diversity over singularity One of the gifts of usage-based modeling to linguistics and the cognitive sciences in general is its multi-disciplinarity. To the extent that converging evidence is considered an ideal, usage-based linguistics marshals an extraordinary body of data attesting to its power as an organizing framework for investigating the processes and representations used in human language. Data from text and speech corpora, from psycholinguistic and cognitive psychology experiments in children and adults, from neuroscience and ethology, from mathematical and computational modeling, and from synchronic and diachronic linguistic argumentation, are all consistent with a usage-based conception of language.

Current research in allied fields points toward usage-based processes and representations as normal and foundational. As one example, implicit learning in cognitive psychology has shown that distributional and collocational information is easily acquired, although not usually at the conscious level (Sedlmeier & Betsch, 2002). In neuroscience, it has been found that plasticity and late specialization are normal for the higher mammals, in which brain specialization is determined much more by extensive interaction between the organism and its environment than by direct genetic coding (Dinse & Merzenich, 2002; Sur & Rubenstein, 2005; Elman et al., 1996). Artificial intelligence systems for natural language processing have demonstrated that the information necessary for generating linguistic competence is available in the environment, and have discovered computations that can extract the information and use it (Klavans & Resnik, 1996; Hatzivassiloglou, 1996). Most of these discoveries have been made completely independently of any a priori philosophical commitment to usage-based linguistics, and are taken for granted by practitioners of the respective disciplines. With such support, acceptance by linguists should be a matter of course.

We have also seen that widely varying puzzles can be explained through usage. For example, usage-based studies have identified minimal principles that determine when a verb will be regular or irregular to whatever degree (Bybee, 1995). Attention to usage, along with human predispositions for attending to certain kinds of information, predicts infant speech segmentation and inference of grammar (Saffran, Aslin, & Newport, 1996). Usage predicts attentional biases in word learning, directing attention to which dimensions are relevant to generalization of a new word’s meaning (Smith, 1999a). By observing usage to track change in progress, corpus work has uncovered new historical developments in modal verbs (Boyland, 1996; Krug, 2000) that would not otherwise have been evident. The variety of methods surrounding the notion of usage as the basis of grammar has spun a unifying thread running through a substantial body of current language research.
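The segmentation result cited above (Saffran, Aslin, & Newport, 1996) rests on learners tracking transitional probabilities between adjacent syllables, P(next | current) = freq(current followed by next) / freq(current), and treating dips in that probability as likely word boundaries. Here is a minimal sketch of that computation; the syllable stream and the boundary threshold are invented for illustration.

from collections import Counter

def transitional_probabilities(syllables):
    # P(next | current) = freq(current, next) / freq(current)
    pair_counts = Counter(zip(syllables, syllables[1:]))
    unigram_counts = Counter(syllables[:-1])
    return {pair: count / unigram_counts[pair[0]] for pair, count in pair_counts.items()}

def segment(syllables, tps, threshold=0.75):
    # Posit a word boundary wherever the transitional probability dips below threshold.
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# A toy unbroken stream built from three nonsense words, in the spirit of Saffran et al.'s stimuli.
stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti bi da ku".split()
tps = transitional_probabilities(stream)
print(segment(stream, tps))

Within-word transitions in this toy stream have probability 1.0 while between-word transitions hover around 0.5, so thresholding the dips recovers the three nonsense words (bidaku, padoti, golabu) without any prior lexicon.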

8 Issues: Limitations and unanswered questions There are a number of matters that the usage-based enterprise must attend to, reason through, and eventually argue convincingly. These are matters that perhaps have not attracted sufficient research attention in the past, or have yet to be resolved adequately, or would benefit from re-framing, or still need to be communicated clearly to diverse outside audiences.

Despite the success of usage-based models, limitations and unanswered questions remain on many fronts. A persistent problem has been that many critics have remained unconvinced by a usage-based approach to "crisp" data, i.e., phenomena that appear to be cleanly explained in a classical algebraic or set-theoretic framework where the set of grammatical utterances can be defined without error by logical formalisms. This is especially the case since there is substantial disagreement within the usage-based community on a number of foundational issues. Another problem is that while methodological diversity offers the advantage of a framework for collecting converging evidence when evidence is available, it also presents pitfalls; certain kinds of desirable converging evidence are not always available. Indeed, the research agendas of many of the investigators are widely divergent, and this lack of focus results in slower than desirable progress on key issues. 8.1 Philosophical issues A persistent criticism of usage-based models is that, by being so attentive to gradients and marginal phenomena, they fail to deal adequately with the non-gradient data that some consider to form the core of linguistics. One way to respond to such criticism is to question the premises on which it is based. For example, one could question the assumption that the core of linguistics consists of crisp explanations of crisp data; this has probably been the most common approach. Unfortunately, dispute about axiomatic assumptions, even when logical, can devolve into a barrage of assertions. Who gets to decide what the core of linguistics is? 8.1.1 We don't agree among ourselves, on deep matters A major question that remains within usage-based linguistics is what the place of rules and other formalisms ought to be. Probably the dominant view among usage-based linguists is that grammatical rules can be concise descriptive devices for a limited subset
of language, but that the regularities they encode are epiphenomenal. Specifically, they are seen as resulting from the combination of two things: conventionalized ways of marking grammatical relationships that solve standard communicative problems (Givón, 1999), and learning mechanisms that begin by creating undergeneralized representations from specific utterances and then grow in abstraction as the knowledge base grows. This view has been articulated extensively by researchers such as Bybee, Langacker, and Tomasello, who use the term abstraction rather than rule, since abstractions are not presumed to be fully regular.

This generalization sidesteps some controversies within the field. For example, in a conference and its subsequent proceedings volume, The Reality of Linguistic Rules (Lima, Corrigan, & Iverson, 1994), part of the question is whether there is any independent existence to the rules that might be used to specify conventionalized ways of marking grammatical relationships, or to the rules that might be used to specify the abstracted representations. Do these rules explain anything or do they merely describe? What would it mean for a rule to explain rather than describe? Is explanation really better than description? Do rules exist in the minds of all speakers, or only in the minds of linguists? Though the products of rules have gradient structure, must we conclude that the rules themselves also operate gradiently? Do formalisms clarify or obscure? On one end of the spectrum are those who have a philosophical objection to the use of rules and formalisms. Hopper (1987), for example, has argued that the very idea of an a priori grammar that is possessed by a speaker apart from a communicative context is untenable; instead, the regularities that do emerge are products of multiple communicative constraints that are constantly changing. On the other end of the spectrum, many (e.g., Smolensky (2001) and Steedman (2001)) see no dichotomy between formal rule-based systems and certain kinds of usage-based systems, because any mental representations underlying rule-governed behavior can be inferred by learners from usage. Givón (1999) has also been a strong advocate of retaining a place for rules, as long as there is also a place for flexibility.

8.1.2 A theoretical question: Is simplification a necessary feature of explanation? One school of thought in the philosophy of science holds that explanation consists of simplifying a mass of data down to a few basic principles from which all the data can be predicted. A problem endemic to data-hungry theories is the objection that they do not simplify, and thus do not explain. Setting aside the question of whether the following is achievable, the goal of traditional rule-based theories is to begin with a few observations, then infer a finite set of formally-specified principles that are designed to generate the infinite set of grammatical sentences in a language. Usage-based theories, on the other
hand, by their nature require immense amounts of usage data to feed a smaller set of abstractions. Because the nature of usage-based systems is that they are synergistic and exploit information from the full complement of data that enters the system, as long as the task of model-building in linguistics is to describe in finite terms the whole of language, usage-based linguistics cannot do the job.

This may be a severe limitation, but as Postal (2004) points out, generative grammars have been proved likewise unfit for this job. Their fallacy is the assumption that the number of basic elements in a natural language can really be finite. More fundamentally, it is far from unanimous that explanation must consist of the reduction of data down to the simplest rule set that will generate the data (Chomsky, 1957). Other conceptions of explanation describe better the work of usage-based linguistics, such as the identification of mechanisms that produce observed data, or the prediction of future data on the basis of observed data. Without digressing further into the philosophy of science, the questions still remain: Do usage-based models simplify? If so, in what way can we say they simplify? In what sense is simplification one of the tasks of linguistics?

8.2 Minding the gap

A current area of weakness in usage-based linguistics is that its practitioners have widely divergent expertise, not just in different areas of linguistics, but in fields that have very little knowledge in common. Since their inception, usage-based models have made the most headway when there has been cross-fertilization among researchers with diverse backgrounds. Linguists and other cognitive scientists will need to continue to work together to create a knowledge base that is truly integrated, not just heavily overlapping as is currently the case. Specifically, linguists, computer scientists, and cognitive psychologists must collaborate to design both formal models and empirical tests of those models. Working alone, most linguists and cognitive psychologists do not have the technical expertise. Computational linguists, who do have the expertise, have done interesting and relevant work, which, because of its usually applied focus, has largely not been taken advantage of. Cognitive psychologists enter the field primarily through the route of connectionist modeling or psycholinguistic sentence processing, while contributions from other relevant areas of cognitive psychology may or may not make their way to a linguist who could benefit from them.

To improve cross-fertilization, more researchers should be conversant in multiple disciplines, especially in mathematics and statistics. Such technical expertise will allow more widespread watertight demonstrations of the predictive power of large aggregates of usage data. Similarly, more training in psychology will allow easier access to ideas
about attention, perception, concept formation, and action planning that are common knowledge in psychology and relevant to language structure.

Linguistics may need to take on more fully the idea of converging evidence that cognitive scientists embrace. Realizing that any given methodology might not provide a complete answer to a question, inquiry proceeds by acquiring multiple forms of evidence and seeing where the preponderance of the evidence falls. 8.3 Proposing a research program Perhaps as a result of this lack of integration at the center, the question of whether there is a unified research program, and if so where it is headed, does not have a clear answer. Certainly a good foundation has been set in the abundance and diversity of research that has come together to offer proof of concept that usage is universally and inextricably woven into human knowledge of language (Gahl & Garnsey, 2006), and that models that take usage into account can make surprising and empirically supported claims about synchronic and diachronic patterns in pragmatics, syntax, morphology, and phonology. However, what has been done has often been misinterpreted, and there are several major areas where much more attention needs to be focused. 8.3.1 Address confusions and criticisms One need within the field is to clarify explicitly some areas that have given rise to consistent confusion from those outside the field. Newmeyer (2003) reveals a number of deep misunderstandings as he argues forcefully against the usage-based enterprise. The first is that usage-based linguistics is identical to functionalist linguistics, and thus he spends the bulk of the paper attacking the idea that communicative-functional usefulness is the primary source of grammatical structure. Regardless of whether that is an accurate characterization of functionalist linguistics, this appeal to usefulness plays no part in usage-based linguistics. Secondly, he devotes several pages to the defense of the statement that “speakers mentally represent the full grammatical structure, even if they utter only fragments” (Newmeyer, 2003, pp. 688-692). In effect, Newmeyer is assuming that usage-based models deny that speakers represent underlying hierarchical structure. However, even Hopper’s (1987) manifesto against generative grammar does not deny the existence of grammar, but describes it as the “spreading of systematicity from individual words, phrases, and small sets” (p. 142). Note that the point is not absence of systematicity; rather, the point is that the source of systematicity is not an overarching set of principles, but conservative abstractions over previously encountered instances of use.

Finally, Newmeyer states, “stochastic grammar is no more defensible as an approach to language and mind than would be a theory of vision that tries to tell us what we are likely to look at” (p. 697). Though it is quite true that theories of perception do not try to tell us what we are likely to look at, they do tell us that how we see (and presumably, also what we are able to imagine) depends a great deal on what we have been likely to look at (Goldstone & Barsalou, 1998). The structures that have become salient through repeated presence in prior visual arrays become the backbone of the analysis of visual arrays seen in the future. Likewise, our analysis of what we hear and the structure of what we say are both greatly dependent on the structures that have become salient in utterances that we have heard. Notably, both Newmeyer (2003) and Pinker (1999) have objected that usage-based models are inadequate because they cannot explain the existence of infrequent variants in adult syntax, assuming that infrequent forms ought to lose ground and gradually disappear. This objection reveals a misconstrual of usage-based tenets, assuming that in usage-based models linear order takes precedence over hierarchical structure, and that a particular string’s token frequency is fully determinative of a construction’s future usage, neither of which is accurate. Usage-based models will need to express more clearly exactly how infrequent forms are maintained, and retained for use in particular contexts. Being explicit about the sensitivity of constructions to context, including sensitivity to hierarchical structure, will also help. It is clear that continued dialog with critics, specifically addressing such confusions, will be necessary in order to counteract the misconceptions that have arisen. 8.3.2 Meet critics on their territory and test usage-based ideas there Though riddled with misunderstandings, Newmeyer (2003) also makes several useful criticisms. Chief among these is the charge that usage-based linguists have spent a fair amount of effort on providing proofs that their models can plausibly handle complex systems that approximate natural language, but have rarely provided full models of phenomena that are at the center of current syntactic inquiry. This is an area in which usage-based research has made some headway, but has not done as much as it could (Anderson, 1999). To date, most of the work has addressed what could be called the periphery: phenomena that encompass substantial variability and change, with no clear agreement among speakers. As enlightening and as important as the periphery is, if the aim is to show that usage-based models serve better as a basic model of human linguistic capacities than other prevailing models, then focusing some effort on creating superior working models of syntactic phenomena that generative linguists consider puzzling would be a worthwhile endeavor. To that end, Haskell, MacDonald, and Seidenberg (2003) have attacked the problem of plurals within noun compounds (e.g., mice-hunter
vs. *rats-hunter; Pinker, 1999) that has heretofore been used to propound the idea of a strict dichotomy between memorized words and freshly rule-generated utterances. They showed that a model based on a gradient sensitivity to input accounted for more of the data than the words vs. rules account. More recently, Goldberg (2006) has given usage-based accounts of core topics such as argument structure, subject-aux inversion, and islands and scope. Continuing to study phenomena in common with other linguists is one productive route along which usage-based research may proceed. Postal (2004) mentions the English passive and strong crossover effects, for instance, as being two phenomena that have resisted adequate analysis from a Chomskyan perspective. Although particular phenomena may be of more or less intrinsic interest, or may even seem theory-bound to some investigators, the overall end of demonstrating that usage-based models are widely applicable may override these concerns. 8.3.3 Take new territory One fundamental limitation of current usage-based models is that they have been indiscriminate recipients of input, rather than active in selecting information from the input stream and putting it to the speaker's intended use (Elman et al., 1996). As a rule, all the work in these models is performed downstream, in the processing and analysis of language input that has already entered the system. The psychological evidence suggests that selective attention is active in the domain of language processing just as it is in visual perception, where it has primarily been studied (Wolfe, Butcher, Lee, & Hyle, 2003). Models that do not build in a mechanism for selectively processing input are overly constrained by the global properties of the input, and the possibility for one form to progress in two distinct directions, as forms do in splits, is severely limited. Without selective attention, future development of any given linguistic form is captive to the input. As Elman et al. (1996) point out, this is a major gap that must be addressed. Tomasello's (2003a) theory of acquisition is a strong step in this direction, as it provides a principled and empirically-supported motivation for actively selecting input to attend to. Tomasello, who has spent much of his career studying primates, argues convincingly that a theory of language must take into account the uniquely human, and universally human, ability to make inferences about others' intentions. Seeking to understand the thoughts and intentions of others naturally magnifies the salience of some aspects of the input and releases usage-based models from the grip of a fully deterministic outlook. Tomasello's theory of intention-reading, along with Karmiloff-Smith's representational redescription, sets the stage for the creation of a less-mechanistic, more human, usage-based linguistics.

9 Conclusion

The overarching contribution of usage-based models is to have opened room for discussion of an alternative framework upon which to build a science of language, one whose goal is not to specify or infer membership in a set of grammatical sentences, but to understand how cognition makes possible the extraction, creation, and modification of structure from the messy raw materials actually encountered in the process of using language.

The unifying framework of incremental abstractions built up from bits of usage has made it possible to integrate knowledge from across disciplines, from psychology to computer science, all of which have demonstrated means of generating large abstract systems of different forms on the basis of selectively perceived input. Finding that levels of analysis as deeply different as phonology and syntax can be understood in usage-based terms is a marked departure from previous conceptions, but an exciting one. We have begun to see the insights that the usage-based framework can provide, from explaining the direction and timing of historical language change to explaining patterns of regularity and irregularity in acquisition and adult language production.

References

Aijmer, K. (1994). I think -- an English modal particle. In O. J. Westvik & T. Swan (Eds.), Modality in Germanic languages: Historical and comparative perspectives. (pp. 1-47). Berlin: Mouton de Gruyter.

Aksu-Koç, A. (1996). Frames of mind through narrative discourse. In D. I. Slobin, J. Gerhardt, A. Kyratzis, & J. Guo (Eds.), Social interaction, social context, and language: Essays in honor of Susan Ervin-Tripp. (pp. 309-328). Hillsdale, NJ: Erlbaum.

Albright, A. & Hayes, B. (2003). Rules vs. analogy in English past tenses: A computational/experimental study. Cognition, 90, 119-161.

Allen, J. (1998). Argument structures without lexical entries. Unpublished doctoral dissertation, University of Southern California.

Allen, J. & Seidenberg, M. S. (1999). The emergence of grammaticality in connectionist networks. In B. MacWhinney (Ed.), Emergentist approaches to language: proceedings of the 28th Carnegie symposium on cognition. (pp. 115-151). Hillsdale, NJ: Erlbaum.

Anderson, J. R. & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Erlbaum.

Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Erlbaum.

Anderson, S. R. (1999). A formalist's reading of some functionalist work in syntax. In M. Darnell, E. A. Moravcsik, M. Noonan, F. Newmeyer, & K. M. Wheatley (Eds.), Functionalism and formalism in linguistics. (Studies in language companion series, Vol. 41-42, pp. 111-136). Amsterdam: John Benjamins.

Arbib, M. & Erdi, P. (2000). Précis of Neural organization: Structure, function, and dynamics. Behavioral and Brain Sciences, 23(4), 513-533.

Arbib, M. A., Conklin, E. J., & Hill, J. A. C. (1987). From schema theory to language. New York: Oxford University Press.

Aslin, R. N., Saffran, J. R., & Newport, E. L. (1999). Statistical learning in linguistic and nonlinguistic domains. In B. MacWhinney (Ed.), The emergence of language. (pp. 359-380). Mahwah, NJ: Erlbaum.

Bartlett, F. C. (1932). Remembering. Cambridge, UK: Cambridge University Press.

Bates, E. & Goodman, J. (1999). On the emergence of grammar from the lexicon. In B. MacWhinney (Ed.), The emergence of language. (Carnegie Mellon symposia on cognition). Mahwah, NJ: Erlbaum.

Berkenfield, C. (2001). The role of frequency in the realization of English that. In J. Bybee & P. Hopper (Eds.), Frequency and the emergence of linguistic structure. (pp. 281-307). Amsterdam: John Benjamins.

Bernstein Ratner, N. (1984). Patterns of vowel modification in mother-child speech. Journal of Child Language, 11, 557-578.

Berry, D. C. & Broadbent, D. E. (1988). Interactive tasks and the implicit-explicit distinction. British Journal of Psychology, 79, 251-272.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. New York: Longman.

Bock, J. K. (1986). Syntactic persistence in language production. Cognitive Psychology, 18, 355-387.

Bock, J. K. & Griffin, Z. M. (2000). The persistence of structural priming: Transient activation or implicit learning? Journal of Experimental Psychology: General, 129, 177-192.

Bod, R. (1998). Beyond grammar: an experience-based theory of language. Stanford, CA: Center for the Study of Language and Information.

Bod, R., Hay, J., & Jannedy, S. (Eds.). (2003). Probabilistic linguistics. Cambridge, MA: MIT Press.

Bod, R. & Scha, R. (1997). Data-oriented language processing. In S. Young & G. Bloothooft (Eds.), Corpus-based methods in language and speech processing. (pp. 137-173). Dordrecht, Netherlands: Kluwer.

Bod, R., Scha, R., & Sima'an, K. (Eds.). (2003). Data-oriented parsing. Stanford, CA: CSLI Publications.

Boersma, P. (1998). Functional phonology: Formalizing the interactions between articulatory and perceptual drives. Unpublished doctoral dissertation, University of Amsterdam, Amsterdam, Netherlands.

Boyland, J. T. (1996). Morphosyntactic change in progress: A psycholinguistic treatment. Unpublished doctoral dissertation, University of California, Berkeley, CA.

Boyland, J. T. (1998). How developing perception and production contribute to a theory of language change: Morphologization < Expertise+Listening < Development (MELD). Proceedings of the 34th Regional Meeting of the Chicago Linguistics Society, 34(1), 27-37.

Boyland, J. T. (2001). Hypercorrect pronoun case in English? Cognitive processes that account for pronoun usage. In J. Bybee & P. Hopper (Eds.), Frequency and the emergence of linguistic structure. (pp. 383-404). Amsterdam: John Benjamins.

Boyland, J. T. & Anderson, J. R. (1999). An integrated hybrid model of syntactic priming. Paper presented at the USC Language Production Conference. November 18, University of Southern California, Los Angeles, CA.

Boyland, J. T. & Anderson, J. R. (1998). Evidence that syntactic priming is long-lasting. Proceedings of the Twentieth Annual Meeting of the Cognitive Science Society, 20, 871.

Branigan, H. P., Pickering, M. J., Stewart, A. J., & McLean, J. F. (2000). Syntactic priming in spoken production: Linguistic and temporal interference. Memory and Cognition, 28, 1297-1302.

Brent, M. R. (1997). Toward a unified model of lexical acquisition and lexical access. Journal of Psycholinguistic Research, 26, 363-375.

Brent, M. R. & Cartwright, T. A. (1996). Distributional regularity and phonotactic constraints are useful for segmentation. Cognition, 61, 93-125.

Bresnan, J. & Aissen, J. (2002). Optimality and Functionality: Objections and Refutations. Natural Language & Linguistic Theory, 20(1), 81-95.

Brooks, L. R. & Vokey, J. R. (1991). Abstract analogies and abstracted grammars: Comments on Reber (1989) and Mathews et al. (1989). Journal of Experimental Psychology: General, 120, 316-323.

Budiansky, S. (1998). Lost in translation. The Atlantic Monthly, 282(6), 80-84.

Bybee, J. (1995). Regular morphology and the lexicon. Language and Cognitive Processes, 10(5), 425-455.

Bybee, J. & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review, 22(2-4), 381-410.

Bybee, J. & Scheibman, J. (1999). The effect of usage on degrees of constituency: The reduction of don't in English. Linguistics, 37(4), 575-596.

Bybee, J. L. (2006). Frequency of use and the organization of language. New York: Oxford University Press.

Bybee, J. L. (1985). Morphology: A study of the relation between meaning and form. Amsterdam: John Benjamins.

Bybee, J. L. (1999). Usage-based phonology. In M. Darnell, E. A. Moravcsik, F. Newmeyer, M. Noonan, & K. M. Wheatley (Eds.), Functionalism and Formalism in Linguistics. (Studies in Language Companion Series, vol. 41, pp. 211-242). Amsterdam: Benjamins.

Bybee, J. L. (2001). Phonology and language use. Cambridge, UK: Cambridge University Press.

Bybee, J. L. (2003). Mechanisms of change in grammaticization: The role of frequency. In B. Joseph & R. Janda (Eds.), The handbook of historical linguistics. (pp. 602-623). Oxford: Blackwell.

Bybee, J. L. & Hopper, P. (Eds.). (2001). Frequency and the emergence of linguistic structure. Amsterdam: John Benjamins.

Bybee, J. L., Perkins, R. D., & Pagliuca, W. (1994). The evolution of grammar: tense, aspect, and modality in the languages of the world. Chicago: University of Chicago Press.

Cave, K. R. & Wolfe, J. M. (1990). Modeling the role of parallel processing in visual search. Cognitive Psychology, 22, 225-271.

Chang, F., Dell, G. S., Bock, J. K., & Griffin, Z. M. (2000). Structural priming as implicit learning: A comparison of models of sentence production. Journal of Psycholinguistic Research, 29, 217-229.

Charniak, E. (1997). Statistical techniques for natural language parsing. AI Magazine, 18(4), 33-44.

Chomsky, N. (1957). Syntactic structures. Oxford: Mouton.

Chomsky, N. (1959). Review of Verbal Behavior by B. F. Skinner. Language, 35, 26-58.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge: MIT Press.

Christiansen, M. H. & Chater, N. (2001a). Connectionist psycholinguistics: Capturing the empirical data. Trends in Cognitive Sciences, 5, 82-88.

Christiansen, M. H. & Chater, N. (Eds.). (2001b). Connectionist psycholinguistics. Westport, CT: Ablex Publishing.

Colunga, E. & Smith, L. B. (2004). Dumb mechanisms make smart concepts. Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 26, 239-244.

Crain, S. & Nakayama, M. (1987). Structure dependence in grammar formation. Language, 63, 522-543.

Crain, S. & Pietroski, P. (2001). Nature, nurture and universal grammar. Linguistics and Philosophy, 24, 139-186.

Croft, W. (2000). Explaining language change: an evolutionary approach. Harlow, England: Longman.

Croft, W., Denning, K., & Kemmer, S. (Eds.). (1990). Studies in typology and diachrony: Papers presented to Joseph H. Greenberg on his 75th birthday. Philadelphia: John Benjamins.

Daelemans, W., Gillis, S., & Durieux, G. (1994). Skousen's Analogical Modeling algorithm: A comparison with Lazy Learning. Paper presented at the International Conference on New Methods in Language Processing (NeMLaP).

Dahl, Ö. (2001). Inflationary effects in language and elsewhere. In J. Bybee & P. Hopper (Eds.), Frequency and the emergence of linguistic structure. (Typological studies in language, Vol. 45, pp. 471-480). Amsterdam: John Benjamins.

DeKeyser, R. (2007). Practice in a second language: Perspectives from applied linguistics and cognitive psychology. New York: Cambridge University Press.

DeKeyser, R. M. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition, 19, 195-221.

Dell, G. S., Reed, K. D., Adams, D. R., & Meyer, A. S. (2000). Speech errors, phonotactic constraints, and implicit learning: A study of the role of experience in

language production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1355-1367.

Dinse, H. R. & Merzenich, M. M. (2002). Adaptation of inputs in the somatosensory system. In M. Fahle & T. Poggio (Eds.), Perceptual learning. (pp. 19-42). Cambridge, MA: MIT Press.

Domangue, T. J., Mathews, R. C., Sun, R., Roussel, L. G., & Guidry, C. E. (2004). Effects of model-based and memory-based processing on speed and accuracy of grammar string generation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 1002-1011.

Edelman, S., Solan, Z., Horn, D., & Ruppin, E. (2004). Bridging computational, formal, and psycholinguistic approaches to language. Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 26.

Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24, 143-188.

Ellis, N. C. (2008). The dynamics of language use, language change, and first and second language acquisition. Modern Language Journal, 92, 232-249.

Elman, J. L. (2005). Connectionist models of cognitive development: Where next? Trends in Cognitive Sciences, 9, 112-117.

Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.

Fanselow, G., Fery, C., Schlesewsky, M., & Vogel, R. (2006). Gradience in grammar: generative perspectives. New York: Oxford University Press.

Fodor, J. & Lepore, E. (1996). The red herring and the pet fish: why concepts still can't be prototypes. Cognition, 58, 253-270.

Fodor, J. A. (1983). The modularity of mind: an essay on faculty psychology. Cambridge, MA: MIT Press.

Frazier, L. (1995). Issues in representation in psycholinguistics. In J. L. Miller & P. D. Eimas (Eds.), Speech, language, and communication. (pp. 1-27). San Diego, CA: Academic Press.

Fromkin, V. & Rodman, R. (1998). An introduction to language (6th ed.). Fort Worth: Harcourt Brace.

Gahl, S. & Garnsey, S. (2006). Knowledge of grammar includes knowledge of syntactic probabilities. Language, 82, 405-410.

Garrod, S. & Doherty, G. (1994). Conversation, co-ordination and convention: An empirical investigation of how groups establish linguistic conventions. Cognition, 53, 181-215.

Gernsbacher, M. A. (1990). Language comprehension as structure building. Hillsdale, NJ: Erlbaum.

Gildea, D. & Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Linguistics, 28, 245-288.

Giles, H. & Coupland, N. (1991). Language: contexts and consequences. Pacific Grove, CA: Brooks/Cole.

Gillis, S., Daelemans, W., & Durieux, G. (1994). Are children 'lazy learners'? A comparison of natural and machine learning of stress. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, 369-374.

Givón, T. (1979a). Discourse and syntax. New York: Academic Press.

Givón, T. (1979b). From discourse to syntax: Grammar as a processing strategy. In T. Givón (Ed.), Discourse and syntax. (Syntax and semantics, Vol. 12, pp. 81-109). New York: Academic Press.

Givón, T. (1979c). On understanding grammar. New York: Academic Press.

Givón, T. (1989). Modes of knowledge and modes of processing: The routinization of behavior and information. Mind, code, and context: essays in pragmatics. (pp. 237-268). Hillsdale, NJ: Erlbaum.

Givón, T. (1999). Generativity and variation: The notion 'rule of grammar' revisited. In B. MacWhinney (Ed.), The emergence of language. (pp. 81-114). Mahwah, NJ: Erlbaum.

Givón, T. (2001). Syntax: an introduction (Rev. ed.). Amsterdam: John Benjamins.

Gleitman, L. R., Cassidy, K., Nappa, R., Papafragou, A., & Trueswell, J. C. (2005). Hard words. Language Learning and Development, 1, 23-64.

Gold, E. M. (1967). Language identification in the limit. Information and Control, 10, 447-474.

Goldberg, A. E. (2006). Constructions at work: the nature of generalization in language. Oxford: Oxford University Press.

Goldberg, A. E. (1995). Constructions: a construction grammar approach to argument structure. Chicago: University of Chicago Press.

Goldberg, A. E. (1999). The emergence of the semantics of argument structure constructions. In B. MacWhinney (Ed.), Emergence of Language. (pp. 197–212). Hillsdale, NJ: Erlbaum.

Goldberg, A. E. (2003). Constructions: A new theoretical approach to language. Trends in Cognitive Sciences, 7, 219-224.

Goldstone, R. L. (1998). Perceptual learning. Annual Review of Psychology, 49, 585-612.

Goldstone, R. L. (2000). Unitization during category learning. Journal of Experimental Psychology: Human Perception and Performance, 26, 86-112.

Goldstone, R. L. & Barsalou, L. W. (1998). Reuniting perception and conception. Cognition, 65, 231-262.

Goldstone, R. L. & Steyvers, M. (2001). The sensitization and differentiation of dimensions during category learning. Journal of Experimental Psychology: General, 130, 116-139.

Goldstone, R. L., Steyvers, M., Spencer-Smith, J., & Kersten, A. (2000). Interactions between perceptual and conceptual learning. In E. Dietrich & A. B. Markman (Eds.), Cognitive dynamics: Conceptual and representational change in humans and machines. (pp. 191-228). Mahwah, NJ: Erlbaum.

Gomez, R. L. (2002). Variability and detection of invariant structure. Psychological Science, 13, 431-436.

Gomez, R. L. & Gerken, L. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70, 109-135.

Gomez, R. L. & Gerken, L. (2000). Infant artificial language learning and language acquisition. Trends in Cognitive Sciences, 4, 178-186.

Greenberg, J. H. (1960). A quantitative approach to the morphological typology of language. International Journal of American Linguistics, 26, 178-194.

Greenberg, J. H. (1974a). Three studies in the frequency of morphological categories: 1. A method for measuring the degree of overt expression of grammatical categories applied to the Sanskrit of the Rigveda. Stanford University Working Papers on Language Universals, 16, 1-19.

Greenberg, J. H. (1974b). Three studies in the frequency of morphological categories: 2. The relation of frequency to semantic feature in a case language (Russian). Stanford University Working Papers on Language Universals, 16, 21-45.

Greenberg, J. H. (1995). The diachronic typological approach to language. In M. Shibatani & T. Bynon (Eds.), Approaches to Language Typology. (pp. 145-166). Oxford, UK: Clarendon Press.

Greenberg, J. H. & O'Sullivan, C. (1974). Three studies in the frequency of morphological categories: 3. Frequency, marking and discourse styles with special reference to substantival categories in the Romance languages. Stanford University Working Papers on Language Universals, 16, 47-72.

Gries, S. T. & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on 'alternations'. International Journal of Corpus Linguistics, 9(1), 97-129.

Gross, M. (1979). On the failure of generative grammar. Language, 55, 859-885.

Gupta, P. & Cohen, N. J. (2002). Theoretical and computational analysis of skill learning, repetition priming, and procedural memory. Psychological Review, 109, 401-448.

Hare, M. L., Ford, M., & Marslen-Wilson, W. D. (2001). Ambiguity and frequency effects in regular verb inflection. In J. Bybee & P. Hopper (Eds.), Frequency and the emergence of linguistic structure. (pp. 181-200). Amsterdam: John Benjamins.

Haskell, T. R., MacDonald, M. C., & Seidenberg, M. S. (2003). Language learning and innateness: some implications of compounds research. Cognitive Psychology, 47, 119-163.

Hatzivassiloglou, V. (1996). Do we need linguistics when we have statistics? A comparative analysis of the contributions of linguistic cues to a statistical word grouping system. In J. L. Klavans & P. Resnik (Eds.), The balancing act: combining symbolic and statistical approaches to language. (pp. 67-94). Cambridge, MA: MIT Press.

Hawkins, J. A. (1994). A performance theory of order and constituency. Cambridge, UK: Cambridge University Press.

Hayes, B. P. (1999). Phonetically-driven phonology: The role of Optimality Theory and inductive grounding. In M. Darnell, E. A. Moravcsik, M. Noonan, F. Newmeyer, & K. M. Wheatley (Eds.), Functionalism and formalism in linguistics. (Studies in language companion series, vol. 41-42, pp. 243-285). Amsterdam: John Benjamins.

Hayes, B. P. & Londe, Z. C. (2006). Stochastic phonological knowledge: The case of Hungarian vowel harmony. Phonology, 23, 59-104.

Hayes, B. P. (2000). Gradient well-formedness in optimality theory. In J. Dekkers, F. van der Leeuw, & J. van de Weijer (Eds.), Optimality Theory: Phonology, syntax, and acquisition. (pp. 88-120). Oxford: Oxford University Press.

Heine, B. & Reh, M. (1984). Grammaticalization and reanalysis in African languages. Hamburg: H. Buske.

Hockenmaier, J. & Steedman, M. (2002). Generative models for statistical parsing with combinatory grammars. Paper presented at the 40th Meeting of the Association for Computational Linguistics. July 7-12, Philadelphia.

Holt, L. L., Lotto, A. J., & Kluender, K. R. (2001). Influence of fundamental frequency on stop-consonant voicing perception: A case of learned covariation or auditory enhancement? Journal of the Acoustical Society of America, 109, 764-774.

Hopper, P. J. & Traugott, E. C. (2003). Grammaticalization (2nd ed.). Cambridge, UK: Cambridge University Press.

Hopper, P. J. (1997). When 'Grammar' and Discourse Clash: The Problem of Source Conflicts. In J. Bybee, J. Haiman, & S. A. Thompson (Eds.), Essays on Language Function and Language Type: Dedicated to T. Givón. (pp. 231-247). Amsterdam: John Benjamins.

Hopper, P. J. (1987). Emergent grammar. Berkeley Linguistics Society, 13, 139-157.

Hopper, P. J. & Thompson, S. A. (1984). The discourse basis for lexical categories in Universal Grammar. Language, 60(4), 703-752.

Hudson, J. (1998). Perspectives on fixedness: applied and theoretical. Unpublished doctoral dissertation, Lund University, Lund, Sweden.

Janda, R. & Joseph, B. D. (1999). The Modern Greek negator mi(n)(-) as a morphological constellation. Proceedings of the 3rd International Conference on Greek Linguistics (pp. 341-351). Athens: Hellenika Grammata.

Janda, R., Joseph, B. D., & Jacobs, N. (1994). Systematic hyperforeignisms as maximally external evidence for linguistic rules. In S. D. Lima, R. L. Corrigan, & G. K. Iverson (Eds.), The reality of linguistic rules. (Studies in Language Companion Series, Vol. 26, pp. 67-92). Amsterdam: John Benjamins.

Jespersen, O. (1966). Negation in English and other languages (2nd ed.). Copenhagen: Munksgaard.

Johnson, K. (2004). Gold's Theorem and cognitive science. Philosophy of Science, 71(4), 571-592.

Johnson, T. R., Wang, H., & Zhang, J. (1998). Modeling speed-up and transfer of declarative and procedural knowledge. Proceedings of the Twentieth Annual Meeting of the Cognitive Science Society, 531-536.

Joseph, B. D. & Janda, R. (1988). The how and why of diachronic morphologization and demorphologization. In M. Hammond & M. Noonan (Eds.), Theoretical morphology. (pp. 193-210). New York: Academic Press.

Joseph, B. D. (1992). Diachronic explanation: Putting speakers back into the picture. In G. W. Davis & G. K. Iverson (Eds.), Explanation in historical linguistics. (pp. 123-144). Amsterdam: John Benjamins.

Joseph, B. D. (1997a). How general are our generalizations? What speakers actually know and what they actually do. In A. Green & V. Motapanyane (Eds.), Proceedings of the 13th Eastern States Conference on Linguistics (ESCOL '96). (pp. 148-160). Ithaca, NY: Cornell Linguistics Circle Publications/ Cascadilla Press.

Joseph, B. D. (1997b). On the linguistics of marginality: The centrality of the periphery. In G. Anderson, R. Eggert, & K. Singer (Eds.), Papers from the 33rd Regional Meeting of the Chicago Linguistic Society. (pp. 197-213). Chicago: Chicago Linguistic Society.

Joseph, B. D. (2002). Balkan Insights into the Syntax of *me in Indo-European. In M. R. V. Southern (Ed.), Indo-European Perspectives. (Journal of Indo-European Studies Monograph Series, Vol. 43, pp. 103-120). Washington, DC: Institute for the Study of Man.

Jurafsky, D. (1996). A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20, 137-194.

Jurafsky, D., Bell, A., Gregory, M., & Raymond, W. D. (2001). Probabilistic relations between words: Evidence from reduction in lexical production. In J. Bybee & P. Hopper (Eds.), Frequency and the emergence of linguistic structure. (pp. 229-254). Amsterdam: John Benjamins.

Karmiloff-Smith, A. (1992). Beyond modularity: A developmental perspective on cognitive science. Cambridge, MA: MIT Press.

Karmiloff-Smith, A. (1994). Précis of Beyond modularity: A developmental perspective on cognitive science. Behavioral and Brain Sciences, 17, 693-745.

Kaschak, M. P. & Glenberg, A. M. (2004). This construction needs learned. Journal of Experimental Psychology: General, 133, 450-467.

Kay, P. (1997). Construction grammar. Words and the Grammar of Context. (CSLI Lecture Notes 40, pp. 123-132). Stanford, CA: Center for the Study of Language and Information.

Kemmer, S. & Barlow, M. (1999). Introduction: A usage-based conception of language. In M. Barlow & S. Kemmer (Eds.), Usage-based models of language. (pp. 7-28). Stanford, CA: CSLI Publications.

Kim, A. E., Srinivas, B., & Trueswell, J. (2002). A computational model of the grammatical aspects of word recognition as supertagging. In P. Merlo & S. Stevenson (Eds.), The lexical basis of sentence processing: formal, computational, and experimental issues. (pp. 109–135). Amsterdam: John Benjamins.

Kinder, A. & Assmann, A. (2000). Learning artificial grammars: No evidence for the acquisition of rules. Memory and Cognition, 28, 1321-1332.

Kirsner, K., Speelman, C., Maybery, M., O'Brien-Malone, A., & Anderson, M. (Eds.). (1998). Implicit and explicit mental processes. Mahwah, NJ: Erlbaum.

Klavans, J. L. & Resnik, P. (Eds.). (1996). The balancing act: combining symbolic and statistical approaches to language. Cambridge, MA: MIT Press.

Kluender, K. R., Lotto, A. J., Holt, L. L., & Bloedel, S. L. (1998). Role of experience for language-specific functional mappings of vowel sounds. Journal of the Acoustical Society of America, 104, 3568-3582.

Kluender, K. R., Lotto, A. J., & Holt, L. L. (2005). Contributions of nonhuman animal models to understanding human speech perception. In S. Greenberg & W. Ainsworth (Eds.), Listening to speech: An auditory perspective. (pp. 203-220). New York: Oxford University Press.

Krauss, R. M. & Weinheimer, S. (1964). Changes in reference phrases as a function of frequency of usage in social interaction: A preliminary study. Psychonomic Science, 1, 113-114.

Krauss, R. M. & Weinheimer, S. (1966). Concurrent feedback, confirmation, and the encoding of referents in verbal communication. Journal of Personality and Social Psychology, 4, 343-346.

Krug, M. G. (2000). Emerging English modals: a corpus-based study of grammaticalization. Berlin: Mouton de Gruyter.

Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.

Lambert, C. (2006). The marketplace of perceptions. Harvard Magazine, 108, 50-57, 93-95 (11 pp.).

Langacker, R. W. (1987). Foundations of cognitive grammar (1-2). Stanford, CA: Stanford University Press.

Langacker, R. W. (1990). Concept, image, and symbol: the cognitive basis of grammar. Berlin: Mouton de Gruyter.

Lasnik, H. (1999). On the locality of movement: Formalist syntax position paper. In M. Darnell, E. A. Moravcsik, M. Noonan, F. Newmeyer, & K. M. Wheatley (Eds.), Functionalism and formalism in linguistics. (Studies in language companion series, Vol. 41-42, pp. 33-54). Amsterdam: John Benjamins.

Levelt, W. J. M. (1989). Speaking: from intention to articulation (ACL-MIT Press series in natural-language processing). Cambridge, MA: MIT Press.

Lewis, J. D. & Elman, J. L. (2002). Learnability and the statistical structure of language: Poverty of stimulus arguments revisited. Proceedings of the 26th Annual Boston University Conference on Language Development, 26, 359-370.

Li, C. N. (Ed.). (1976). Subject and topic. New York: Academic Press.

Liberman, M. Y. (2004a). Dwarves vs. dwarfs. Language Log. Retrieved 7 December 2006, http://itre.cis.upenn.edu/~myl/languagelog/archives/000344.html.

Liberman, M. Y. (2004b). The curious case of quasi-regularity. Language Log. Retrieved 7 December 2006, http://itre.cis.upenn.edu/~myl/languagelog/archives/000293.html.

Lima, S. D., Corrigan, R. L., & Iverson, G. K. (Eds.). (1994). The reality of linguistic rules. Amsterdam: John Benjamins.

Logan, G. D. (1985). Skill and automaticity: Relations, implications, and future directions. Canadian Journal of Psychology, 39, 367-386.

Luce, P. A., Pisoni, D. B., & Goldinger, S. D. (1990). Similarity neighborhoods of spoken words. In G. T. M. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives. (pp. 122-147). Cambridge, MA: MIT Press.

MacDonald, M. C. (1999). Distributional information in language comprehension, production and acquisition: Three puzzles and a moral. In B. MacWhinney (Ed.), The emergence of language. (pp. 177-196). Mahwah, NJ: Erlbaum.

MacWhinney, B. (2004). A multiple process solution to the logical problem of language acquisition. Journal of Child Language, 31, 883-914.

Manning, C. D. & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

Marr, D. (1982). Vision: a computational investigation into the human representation and processing of visual information. San Francisco: W.H. Freeman.

Marslen-Wilson, W. & Tyler, L. K. (1998). Rules, representations, and the English past tense. Trends in Cognitive Sciences, 2, 428-435.

Mathews, R. C. & Cochran, B. P. (1998). Project Grammarama revisited: Generativity of implicitly acquired knowledge. In M. A. Stadler & P. A. Frensch (Eds.), Handbook of implicit learning. (pp. 223-259). Thousand Oaks, CA: Sage Publications.

McAndrews, M. P. & Moscovitch, M. (1985). Rule-based and exemplar-based classification in artificial grammar learning. Memory and Cognition, 13(5), 469-475.

McCarthy, J. J. (2002). A thematic guide to optimality theory. Cambridge, UK: Cambridge University Press.

McClelland, J. L. & Patterson, K. (2002). Rules or connections in past-tense inflections: What does the evidence rule out? Trends in Cognitive Sciences, 6, 465-472.

McRae, K., Spivey-Knowlton, M. J., & Tanenhaus, M. K. (1998). Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language, 38, 283-312.

Milroy, J. (1992). Linguistic variation and change: on the historical sociolinguistics of English. Oxford, UK: Blackwell.

Morgan, J. L. & Demuth, K. (1996). Signal to syntax: Bootstrapping from speech to grammar in early acquisition. Hillsdale, NJ: Erlbaum.

Mosel, U. (1980). The influence of the substratum on the development of New Guinea Pidgin. Pacific Linguistics, B - 73.

Mukherjee, J. (2004). Corpus Data in a Usage-Based Cognitive Grammar. In K. Aijmer & B. Altenberg (Eds.), Advances in Corpus Linguistics: Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23) Göteborg 22-26 May 2002. (Language and Computers: Studies in Practical Linguistics, Vol. 49, pp. 85-100). Amsterdam: Rodopi.

Newmeyer, F. J. (2003). Grammar is grammar and usage is usage. Language, 79, 682-707.

Newmeyer, F. J. (1998). Language form and language function. Cambridge, MA: MIT Press.

Novick, J. M., Kim, A., & Trueswell, J. C. (2003). Studying the grammatical aspects of word recognition: Lexical priming, parsing and syntactic ambiguity resolution. Journal of Psycholinguistic Research, 32, 57-75.

Ono, T., Thompson, S. A., & Suzuki, R. (2000). The pragmatic nature of the so-called subject marker ga in Japanese: Evidence from conversation. Discourse Studies, 2, 55-84.

Pagliuca, W. (1976). PRE-fixing. Unpublished manuscript, SUNY Buffalo, Buffalo, NY.

Svartvik, J. (Ed.). (1996). Grammarian's lexicon, lexicographer's lexicon: Worlds apart. Stockholm: Kungl. Vitterhets Historie & Antikvitets Akademien.

Pawley, A. & Syder, F. H. (1983). Two puzzles for linguistic theory: nativelike selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and Communication. (pp. 191-226). New York: Longman.

Peters, A. M. (1983). The units of language acquisition. Cambridge, UK: Cambridge University Press.

Pierrehumbert, J. (1999). Formalizing functionalism. In M. Darnell, E. A. Moravcsik, M. Noonan, F. Newmeyer, & K. M. Wheatley (Eds.), Functionalism and formalism in linguistics, Vol. I. (Studies in language companion series, Vol. 41-42, pp. 287-305). Amsterdam: John Benjamins.

Pinker, S. (1984). Language learnability and language development. Cambridge, Mass: Harvard University Press.

Pinker, S. (1999). Words and rules: The ingredients of language. New York: Basic Books.

Pinker, S. & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28, 73-193.

Pinker, S. & Ullman, M. T. (2002). The past and future of the past tense. Trends in Cognitive Sciences, 6, 456-463.

Plaut, D. C., McClelland, J. C., & Seidenberg, M. S. (1995). Reading exception words and pseudowords: Are two routes really necessary?. In J. P. Levy, D. Bairaktaris, J. A. Bullinaria, & P. Cairns (Eds.), Connectionist models of memory and language. (pp. 145-159). London: UCL Press.

Plunkett, K. & Juola, P. (1999). A connectionist model of English past tense and plural morphology. Cognitive Science, 23, 463-490.

Poldrack, R. A., Selco, S. L., Field, J. E., & Cohen, N. J. (1999). The relationship between skill learning and repetition priming: Experimental and computational analyses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 208-235.

Port, R. F. & Van Gelder, T. (1995). Mind as motion: explorations in the dynamics of cognition. Cambridge, MA: MIT Press.

Postal, P. M. (2004). Skeptical linguistic essays. Oxford, UK: Oxford University Press.

Pullum, G. K. (1997). The morpholexical nature of English to-contraction. Language, 73(1), 79-102.

Rabinowitz, M. & Goldberg, N. (1995). Evaluating the structure-process hypothesis. In F. E. Weinert & W. Schneider (Eds.), Memory performance and competencies: Issues in growth and development. (pp. 225-242). Hillsdale, NJ: Erlbaum.

Reali, F. & Christiansen, M. H. (2005). Uncovering the richness of the stimulus: Structure dependence and indirect statistical evidence. Cognitive Science, 29, 1007-1028.

Reber, A. S. (1993). Implicit learning and tacit knowledge: an essay on the cognitive unconscious. New York: Oxford University Press.

Robinson, P. J. & Ha, M. A. (1993). Instance theory and second language rule learning under explicit conditions. Studies in Second Language Acquisition, 15, 413-438.

Roland, D. & Jurafsky, D. (2002). Verb sense and verb subcategorization probabilities. In P. Merlo & S. Stevenson (Eds.), The lexical basis of sentence processing: formal, computational, and experimental issues. (pp. 325-345). Amsterdam: John Benjamins.

Rosch, E. & Lloyd, B. B. (1978). Cognition and categorization. Hillsdale, NJ: Erlbaum.

Rosch, E., Mervis, C. B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.

Ross, J. R. (1973). A fake NP squish. In C.-J. N. Bailey & R. Shuy (Eds.), New Ways of Analyzing Variation in English. (pp. 96-140). Washington: Georgetown University.

Rumelhart, D. E. & McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel Distributed Processing: Explorations in the microstructure of cognition. (pp. 216-271). Cambridge, MA: MIT Press.

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926-1928.

Saffran, J. R. & Wilson, D. P. (2003). From syllables to syntax: Multilevel statistical learning by 12-month-old infants. Infancy, 4, 273-284.

Sampson, G. (2001). Empirical linguistics. London: Continuum.

Savage, C., Lieven, E., Theakston, A., & Tomasello, M. (2003). Testing the abstractness of children's linguistic representations: lexical and structural priming of syntactic constructions in young children. Developmental Science, 6, 557-567.

Savage, C., Lieven, E., Theakston, A., & Tomasello, M. (2006). Structural priming as implicit learning in language acquisition: The persistence of lexical and structural priming in 4-year-olds. Language Learning and Development, 2, 27-49.

Scarry, R. (1966). Storybook dictionary. New York: Golden Press.

Schank, R. C. & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: an inquiry into human knowledge structures. Hillsdale, NJ: Erlbaum.

Schegloff, E. A., Ochs, E., & Thompson, S. A. (1996). Introduction. In E. Ochs, E. A. Schegloff, & S. A. Thompson (Eds.), Interaction and grammar. (pp. 1-51). Cambridge, UK: Cambridge University Press.

Scheibman, J. (2000). I dunno: A usage-based account of the phonological reduction of don't in American English conversation. Journal of Pragmatics, 32, 105-124.

Scheibman, J. (2002). Point of view and grammar: structural patterns of subjectivity in American English conversation (Studies in discourse and grammar, vol. 11). Amsterdam: John Benjamins.

Sedlmeier, P. & Betsch, T. (2002). ETC.: Frequency processing and cognition. New York: Oxford University Press.

Seidenberg, M. S. (1997). Language acquisition and use: Learning and applying probabilistic constraints. Science, 275, 1599-1603.

Servan-Schreiber, E. & Anderson, J. R. (1990). Learning artificial grammars with competitive chunking. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 592-608.

Sinclair, J. (2005). English Grammar. UK: Collins CoBUILD.

Skousen, R. (1989). Analogical modeling of language. Dordrecht: Kluwer Academic.

Skousen, R. (1992). Analogy and structure. Dordrecht: Kluwer Academic.

Skousen, R. (1998). Natural statistics in language modelling. Journal of Quantitative Linguistics, 5(3), 246-255.

Smith, L. B. (1999a). Children's noun learning: How general learning processes make specialized learning mechanisms. In B. MacWhinney (Ed.), The emergence of language. (pp. 277-303). Mahwah, NJ: Erlbaum.

Smith, L. B. (1999b). Not 'either', not 'or', not 'both'. Developmental Science, 2, 162-163.

Smolensky, P. (2001). Grammar-based connectionist approaches to language. In M. H. Christiansen & N. Chater (Eds.), Connectionist psycholinguistics. (pp. 319-347). Westport, CT: Ablex.

Steedman, M. (2001). Connectionist sentence processing in perspective. In M. H. Christiansen & N. Chater (Eds.), Connectionist psycholinguistics. (pp. 348-372). Westport, CT: Ablex Publishing.

Stefanowitsch, A. & Gries, S. T. (2003). Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics, 8(2), 209-243.

Sun, C. (1996). Word-order change and grammaticalization in the history of Chinese. Stanford, CA: Stanford University Press.

Sur, M. & Rubenstein, J. L. R. (2005). Patterning and plasticity of the cerebral cortex. Science, 310, 805-810.

Sweetser, E. E. (1991). From etymology to pragmatics: Metaphorical and cultural aspects of semantic structure. Cambridge, UK: Cambridge University Press.

Tabor, W. (1994). Syntactic innovation: a connectionist model. Unpublished doctoral dissertation, Stanford University, Stanford, CA.

Tabor, W., Juliano, C., & Tanenhaus, M. K. (1997). Parsing in a dynamical system: An attractor-based account of the interaction of lexical and structural constraints in sentence processing. Language and Cognitive Processes, 12, 211-271.

Tolkien, J. R. R., Carpenter, H., & Tolkien, C. (1981). The letters of J.R.R. Tolkien. Boston: Houghton Mifflin.

Tomasello, M. (2003a). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard.

Tomasello, M. (2003b). Introduction: Some surprises for psychologists. The new psychology of language: Cognitive and functional approaches to language structure. (Vol. 2, pp. 1-14). Mahwah, NJ: Erlbaum.

Traugott, E. C. (1995). The role of the development of discourse markers in a theory of grammaticalization. Paper presented at the 12th International Conference on Historical Linguistics. August 1995, Manchester, England.

Traugott, E. C. (1989). On the rise of epistemic meanings in English: An example of subjectification in semantic change. Language, 65, 31-55.

VanLehn, K. (1996). Cognitive skill acquisition. Annual Review of Psychology, 47, 513-539.

Watkins, C. (1962). Indo-European origins of the Celtic verb. Dublin: Dublin Institute for Advanced Studies.

Weissenborn, J. & Höhle, B. (Eds.). (2001a). Approaches to bootstrapping: Phonological, lexical, syntactic, and neurophysiological aspects of early language acquisition. Vol. I. Amsterdam: John Benjamins.

Weissenborn, J. & Höhle, B. (Eds.). (2001b). Approaches to bootstrapping: Phonological, lexical, syntactic, and neurophysiological aspects of early language acquisition. Vol. II. Amsterdam: John Benjamins.

Wilson, M. (2001). The impact of sign language expertise on visual perception. In M. D. Clark, M. Marschark, & M. Karchmer (Eds.), Context, cognition, and deafness. (pp. 38-48). Washington, DC: Gallaudet University Press.

Wolfe, J. M., Butcher, S. J., Lee, C., & Hyle, M. (2003). Changing your mind: On the contributions of top-down and bottom-up guidance in visual search for feature singletons. Journal of Experimental Psychology: Human Perception and Performance, 29, 483-502.

Zwicky, A. M. (1992). Some choices in the theory of morphology. In R. Levine (Ed.), Formal grammar: theory and implementation. (Vancouver studies in cognitive science, Vol. 2, pp. 327-371). New York: Oxford University Press.

Zwicky, A. M. (2002). Seeds of variation and change (Plenary Speech). Paper presented at the New Ways of Analyzing Variation (NWAV) 31 conference. October 10-13, Stanford, CA.

Zwicky, A. M. (2005). Adventures in the advice trade. Retrieved February 9, 2006, from http://www-csli.stanford.edu/~zwicky/proposal.pdf
