Applied Linguistics: 1–25 © Oxford University Press 2011
doi:10.1093/applin/amq053
Applied Linguistics Advance Access published January 17, 2011
Downloaded from applij.oxfordjournals.org by guest on January 29, 2011

A Multidimensional Analysis of a Written L2 Spanish Corpus

*YULY ASENCIÓN-DELANEY and JOSEPH COLLENTINE

Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA
*E-mail: [email protected]

The present study adds to our understanding of how learners employ lexical and grammatical phenomena to communicate in writing in different types of interlanguage discourse. A multidimensional (factor) analysis of a corpus of L2 Spanish writing (202,241 words) generated by second- and third-year, university-level learners was performed. The analysis uncovered four significant clusters that can be considered distinct discourse types with two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features). Results also provide examples of the multiple ways that stylistic sophistication and linguistic complexity occur in the L2. Although the Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

INTRODUCTION

Second language acquisition (SLA) research views learning not only as knowing individual lexical and grammatical features of a second language (L2) but also how these features work together to generate types of discourse (e.g. narratives, descriptions). In discussing certain mismatches between linguistic theories, psycholinguistics, and corpus-based investigations, Ellis (2002) comments that there has been traditionally a ‘blind faith’ in the (psycholinguistic) reality of structural categories (e.g. noun, verb). Collentine and Asención-Delaney (2010) use corpus-based techniques to show that, statistically speaking, seemingly disparate lexical and grammatical L2 structures (e.g. verbs and adjectives) develop in tandem and work together to affect different types of discourse. Indeed, as Ellis et al. (2008) note, there is evidence that ‘human production grammar must store probabilistic relations between words’ (p. 377). Furthermore, Ellis et al. (2008) postulate that an important difference between native and non-native speakers is that ‘native speakers have extracted underlying co-occurrence information’.
Researchers have powerful analytical tools for studying how lexical and grammatical features generate discourse types (Biber and Conrad 2001). Still, Myles (2005) and Myles and Mitchell (2005) argue that SLA research has been slow to embrace corpus-based technologies, even though corpus linguistics is effective at helping theoretical linguistics to understand the nature of discourse types. Myles and Mitchell (2005) also note that corpus research can complement and increase the generalizability of SLA research by examining large amounts of data.

We present the first-known multidimensional analysis of a written Spanish learner corpus, examining how learners combine lexical and grammatical phenomena to generate different discourse types. Before presenting our research questions, we provide an overview of research about the acquisition of discursive abilities, how corpora have been used in SLA to understand the mapping of form to function, and the usefulness of the multidimensional corpus analysis.

Developing L2 discursive abilities

Canale and Swain’s (1980) and Bachman’s (1990) theories of L2 communicative competence afford discourse/textual competence a central role in SLA. The American Council on the Teaching of Foreign Languages (ACTFL 2001) proficiency scales assume a developmental progression, with learners developing conversational, then descriptive, then narrative abilities, and finally more complex abilities. In general, the study of discursive abilities entails how learners obtain the ability to produce multipropositional segments whose semantic and structural interrelationships (e.g. gender/number agreement) cohere (Givón 1985). This research has considered how pragmatic abilities affect L2 discursive abilities, the assignment of discursive roles to grammatical constructs, and learners’ use of discourse markers.

Pragmatic research concerned with the discursive context studies how we structure information across sentences, treating issues such as how syntactic structures affect coherence, topic/focus, and propositional presupposition (Horn and Ward 2006: xiii–xix). SLA pragmatic research, as Kasper (2001) notes, examines how learners organize discourse and how they employ grammatical structures to produce multipropositional messages (e.g. marking cause-effect chains with conditional if structures). The acquisition of what may be termed ‘grammatical-pragmatic’ abilities entails not only lexical and grammatical phenomena’s denotations but also their connotations, as managed by readers/listeners to interpret multisentential messages.

Research suggests that L2 discursive abilities develop as learners attain both grammatical and pragmatic abilities. In uninstructed learning contexts, functional necessities (e.g. distinguishing between topic and comment) influence which lexical and grammatical tools learners grammaticalize (Skiba and Dittmar 1992). Geyer (2007) presents data suggesting that once Japanese learners develop ‘several (grammatical-pragmatic) devices’ (p. 362), they can chain utterances to achieve discursive organization. Geyer reports that advanced learners use coordinate and adverbial conjunctions to make distinctions such as foreground/background and cohesive propositional segmenting.

The research on the acquisition of Spanish’s two copulas, ser and estar, shows how learners associate lexical and grammatical features with particular discourse types. Cheng et al. (2008), examining a corpus of Mandarin Chinese L1 learners of Spanish, report that exploratory writing led to more copula estar + adjective usage. They also noted that this structure was mostly associated with narratives and descriptions. Collentine and Asención-Delaney (2010), in their corpus-based study of the Spanish copula ser + adjective and estar + adjective, report that, overall, when learners begin mastering simple discourse types, lexical, syntactic, and morphological diversity increases and complexity increases.¹ They found that ser + adjective appears in descriptive and evaluative discourse, where lexical and morphological diversity and complexity occur. However, estar + adjective is present in narrations, descriptions, and hypothetical discourse, where, nonetheless, surprisingly little linguistic complexity occurs.

Other discourse-related research has studied L2 discourse markers. Upton and Connor (2001) use corpus techniques to compare formulaics in ‘application letters’ by L1 speakers of (American) English and L2 learners of English from Belgium and Finland. The L2 writers were much less formulaic than the L1 writers with politeness strategies, due to L1 influence. Forsberg (2005) uses an L2 French learner corpus from L1 Swedish speakers to study different types of idioms (e.g. grammatical, pragmatic, purely lexical idioms). Forsberg found that lexical idioms increase in writing over time. She also found that formulaics with a discursive role were ‘overused’ by learners of all levels.

SLA corpus research on the development of structural and lexical phenomena

Corpus-based L2 research has helped us to understand how learners map form to function.² It also elucidates the role of L2 formulaic segments. It has provided important insights into the interaction of grammar and the lexicon. What is consistent within this research is that L2 lexical phenomena and grammatical constructs work in tandem. Finally, corpus-based L2 research has explored ways to efficiently measure L2 syntactic complexity.

Regarding the mapping of form to function, Klein and Perdue (1997) used a learner corpus to posit that uninstructed settings generate a ‘basic variety’, where word order and grammatical constructs result from functional communicative necessities. Granger et al. (2002) and Belz (2004) use corpus techniques to document how social and institutional pressures affect new L2 phenomena (e.g. da-compounds such as dazu ‘there-to’, davon ‘there-from’, darin ‘there-in’).

Some investigators conceptualize formulaic segments (e.g. in other words, for example) as singular dictionary entries (cf. Ellis 2006). Using the British National Corpus (BNC Consortium 2001), Ellis et al. (2008) found that L2 learners at early stages of acquisition recognize formulaics based on their frequency in the input. Interestingly, however, Ellis et al. (2008) argue that native-like use of formulaics ultimately requires learners to alter their processing of words ‘for coherence, for co-occurrence greater than chance’ (p. 391).

Regarding the interaction between morphology and the lexicon, Marsden and David (2008) study vocabulary development among L2 learners of Spanish at ages 9 and 13. The 13-year-old learners, who had 450 more hours of classroom instruction, showed greater lexical and inflectional diversity. Their analysis supports the notion that, in inflectionally rich languages like Spanish, an important indicator of development is not only accuracy, as measured by grammatical errors, but also increased inflectional/derivational variation (Howard 2002, 2006; Collentine 2009). Their research also indicates that a key factor in differentiating levels of development is changes in the parts of speech (e.g. nouns, verbs, adjectives) that predominate in production. Marsden and David (2008) present evidence suggesting that as learners progress in speech they produce more verbs than nouns, and that as they begin to produce more verbs they also start to produce more adjectives. Since learners produce different discourse types as they progress, it is logical to surmise that these changes in lexical and grammatical production parallel changes in the discourse learners produce.

Two research projects have used corpus techniques to study the emergence of linguistic complexity and semantic density. Operationalizing ‘complexity’ and ‘density’ in corpus-based studies is challenging but important. Ortega (2000) uses a corpus of intermediate-level L2-Spanish learners’ writing to identify reliable measures of syntactic complexity. Her analysis indicated that clause length, phrasal elaboration, and amount of subordination best predict syntactic complexity. Collentine (2004) employs corpus techniques to compare morphological and lexical complexity in an in-class learning context and a study-abroad context. After a semester, the study-abroad learners employed more morphological narrative complexity by using past-tense verbs, third-person morphology, past participles, and present participles (as well as public verbs, e.g. decir que ‘to say that’). From a learner’s perspective, morphological complexity in Spanish requires the use of a range of aspectual, person/number, and gender inflections beyond simple verb tenses (e.g. the present) and other unmarked morphemes (e.g. masculine-singular nouns/adjectives; cf. Collentine 2004: 237). The in-class group was more lexically complex, producing a higher concentration of nominal features (nouns and adjectives) and so semantically dense discourse (Biber 1988).

Finally, Grant and Ginther (2000) urge corpus-based SLA researchers to combine qualitative with quantitative techniques to increase generalizability of results. Many approaches (e.g. part-of-speech tagging) do not generally factor in errors, and small corpora introduce bias by sampling particular tasks. While we do not report error rates or omissions in this study, we are confident that our multidimensional approach and its mixed-method research design (i.e. quantitative + qualitative analyses), together with the size of our corpus, address Grant and Ginther’s (2000) concerns.

The multidimensional analysis and relevant corpus linguistic techniques

Multidimensional corpus analysis shows how lexical and grammatical features bundle together to produce different and new types of discourse (Biber and Conrad 2001). It combines technological tools, exploratory factor analysis, and qualitative analysis of texts to determine which lexical and grammatical features reliably co-occur in a large corpus. Using a large collection of texts coupled with powerful statistical tools introduces fewer reliability threats than small-scale studies, where participants’ particular characteristics or raters’ judgement heavily influence results.

Biber et al. (2006) provide the first multidimensional analysis of native-speaker Spanish. They analyze a 20-million-word Spanish corpus (4,049 texts) with written and oral data representing 19 registers (e.g. face-to-face conversation, business letters). The analysis uncovered both well-known discourse types, like narratives, as well as new ones. For example, the following characterize ‘informationally rich’ discourse in Spanish: nouns, adjectives, definite articles, prepositions, derived nouns, type-token ratio, long words (i.e. multisyllabic words), and ergative se constructions. Unique to Spanish is also hypothetical discourse, containing a concentration of structures such as the conditional and the subjunctive, as well as future verb forms, verbs of obligation and causation (e.g. dejar, permitir, hacer + infinitive), and the conjunction que.

Parodi (2007) used a 2.5-million-word corpus of native-speaker Spanish to study the differences between written and spoken Spanish. His analysis complements Biber et al. (2006) in that it reveals how lexical and grammatical features cluster together based on whether the register is context dependent (i.e. the interpretation of important features depends on the ‘speech situation’), written academic, commissive in nature (in the pragmatic sense), attitudinal, or informational in focus. Additionally, Parodi provides a more refined definition (although based on many fewer tokens) of what constitutes Spanish narrative discourse than Biber et al. (2006); interestingly, Parodi’s analysis indicates that English and Spanish share many of the same narrative features.

We present the first-known multidimensional analysis of L2 Spanish, studying how learners use various lexical and grammatical phenomena to generate IL-specific discourse types. We do not make a priori assumptions about which of these phenomena work in tandem, nor do we assume that learner discourse types are the same as those of native speakers. Our analysis serves as a first step towards characterizing learner discourse at the second and third years of Spanish L2 instruction.

Research questions

We characterize Spanish learner discourse via a multidimensional analysis of a corpus of L2 Spanish generated by second- and third-year, university-level learners. Specifically, the study addresses the following research questions:

1. How do lexical and grammatical phenomena cluster together in L2 Spanish writing?

2. What types of discourses (e.g. narrative, descriptive, hypothetical) surround the concentration of these lexical and grammatical features?

METHOD

Corpus description

We used a 202,241-word corpus of written Spanish comprising edited and non-edited compositions collected from English-speaking Spanish learners at the second-year (109,224 words) and third-year (93,017 words) levels, in which more variety of texts could be collected due to more exposure to language instruction. To estimate the written proficiency of each level of instruction based on the ACTFL Writing Proficiency Scale (ACTFL 2001), the researchers selected a random sample of 50 entire documents from each of the two instructional levels (N = 100). After a training session on working with the scale, the researchers rated the samples independently. Each level on the scale was assigned a numerical value, with 0 representing a novice low and 9 a superior rating. We estimated the inter-rater reliability of the subjects’ proficiency ratings with a Pearson correlation, since the datasets represented interval scales [r(df = 98) = 0.97, p < 0.01]. The second-year learners wrote at the intermediate high level and the third-year learners at the advanced low level. This suggests that, while the third-year learners generally narrated and produced a limited number of cohesive devices as well as a variety of complex syntactic structures, the second-year learners were beginning to produce narrative structure, although their control of the verbal constructs was still developing. The second-year learners also produced few cohesive devices and limited subordination.
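
The inter-rater check described above can be illustrated with a short sketch. The ratings below are invented for illustration and are not the study's data; the 0 to 9 coding follows the text, while the names of the intermediate sublevels and the helper `pearson_r` are assumptions made for this example.

```python
from math import sqrt

# Numeric coding of the scale: 0 = novice low ... 9 = superior (as in the text).
# The labels of the intermediate sublevels are an assumption of this sketch.
ACTFL_LEVELS = [
    "novice low", "novice mid", "novice high",
    "intermediate low", "intermediate mid", "intermediate high",
    "advanced low", "advanced mid", "advanced high",
    "superior",
]
LEVEL_TO_SCORE = {name: score for score, name in enumerate(ACTFL_LEVELS)}


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings of the same six documents by two raters.
rater_a = ["intermediate mid", "intermediate high", "intermediate high",
           "advanced low", "advanced low", "advanced mid"]
rater_b = ["intermediate mid", "intermediate high", "advanced low",
           "advanced low", "advanced low", "advanced mid"]

scores_a = [LEVEL_TO_SCORE[level] for level in rater_a]
scores_b = [LEVEL_TO_SCORE[level] for level in rater_b]
r = pearson_r(scores_a, scores_b)
df = len(scores_a) - 2  # degrees of freedom for a correlation: n - 2
print(f"r(df = {df}) = {r:.2f}")
```

With 100 rated documents, the same `n - 2` rule gives the 98 degrees of freedom reported in the text.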

The corpus comprises writing samples used for course assessment purposes: letters, narratives, descriptions, summaries, and argumentative essays, written both in and out of class as well as on exams. Given the study’s exploratory and descriptive nature, we did not control the type of tasks or topics within the corpus. Topics related to textbook themes (e.g. family, childhood) and cultural readings.

Procedures: tagging, searching, and norming

Understanding multidimensional analyses requires knowledge of three key concepts: part-of-speech tagging, search patterns, and statistical techniques for eliminating various biases in how tokens are counted.

To access information about particular texts, each file includes a header with information about its topic, source, type, author biographical information, and purpose (argumentative essay, narrative). To search for morphosyntactic information (e.g. all adjectives, all subjunctive forms), one needs a part-of-speech tagger, software that annotates every word with information about its major word class (e.g. adjective, noun, verb, determiner), basic morphological information (e.g. plural, preterit), as well as its lemma (i.e. its unmarked dictionary root, such as a verb’s infinitive or a noun’s masculine singular form).

Part-of-speech tagging requires a dictionary with lexical and grammatical information. It also requires a pretagged corpus to train the software routines to determine unknown or ambiguous words’ probable tags. We compiled our own dictionary, utilized a training set from samples from the Corpus del español (Biber et al. 2006), and tagged the corpus with n-gram software routines from the Natural Language Tool Kit (NLTK; http://www.nltk.org). Additionally, we wrote Spanish-specific routines (e.g. clitic sequences, derivational morphemes) to complement the NLTK routines to achieve greater tagging precision. After the corpus is tagged in this way, the investigator must verify the accuracy of the tagging and fix errors through further programming.

We studied 78 lexical, grammatical, and lexico-grammatical features.³ The features involved all parts of speech, common morphosyntactic constructs studied by learners, as well as additional constructs studied in Biber et al. (2006). They represented adjectives (e.g. derived, postnominal position), nouns (e.g. derived, feminine), adverbs (e.g. place, time), verb classes (e.g. imperfect aspect, past participle), verb phrases (e.g. communication, knowledge), and certain morphosyntactic features like dependent clauses, noun phrase configurations (e.g. article plus noun), and pronoun usage (e.g. clitic, third person).

Textual frequencies often require two mathematical conversions. First, norming transforms a phenomenon’s count to its normed frequency. Since text lengths vary, longer texts inflate certain items’ importance. To offset this text-length bias, one scales a phenomenon’s frequency per text, such as per 1,000 words. Second, normalizing eliminates the feature-concentration bias: some phenomena are naturally scarce in a document (e.g. the subjunctive), while others are naturally common (e.g. articles) (cf. Biber and Conrad 2001). Normalizing converts a normed frequency to its z-score value vis-à-vis its normed frequency in each document. Consequently, one can measure the relative presence of two or more linguistic features within any given text. Additionally, one can sum various z-scores to determine how concentrated a set of features is in any text or group of texts.
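
The two conversions can be sketched as follows. The three texts and their counts are hypothetical, the function names are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, pstdev


def normed_frequency(count, text_length, per=1000):
    """Scale a raw count to a rate per `per` words to offset text length."""
    return count * per / text_length


def z_scores(values):
    """Standardize one feature's normed frequencies across all texts."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD: an assumption of this sketch
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw counts for a scarce feature (subjunctive) and a
# common one (definite articles) in three texts of different lengths.
texts = {
    "text_a": {"length": 250, "VRB_SBJ": 2, "DEFART": 20},
    "text_b": {"length": 500, "VRB_SBJ": 2, "DEFART": 50},
    "text_c": {"length": 1000, "VRB_SBJ": 12, "DEFART": 90},
}
features = ["VRB_SBJ", "DEFART"]

# Step 1: norming (per 1,000 words).
normed = {
    f: [normed_frequency(t[f], t["length"]) for t in texts.values()]
    for f in features
}

# Step 2: normalizing (z-scores per feature across texts).
z = {f: z_scores(normed[f]) for f in features}

for text_id, row in zip(texts, zip(*(z[f] for f in features))):
    print(text_id, [round(value, 2) for value in row])
```

After both steps the scarce and the common feature sit on the same scale, so their relative presence within a text can be compared directly.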

Factor analysis

To determine how learners cluster lexical and grammatical features, we employ an exploratory principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions, also referred to as factors, along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus.

The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary, superset factor according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relative to the others. The examination of factors’ eigenvalues in scree plots of each factor’s total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature’s importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying from 0.00, or no correlation, to 1.00, or absolute correlation. A dimension with two significant clusters will have two sets of loadings, a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster’s sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster’s communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the 0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high type-token ratio) represented semantically dense, literate discourse and the negative features (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse.
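
The roles of eigenvalues, loadings, and the 0.30 cutoff can be illustrated with a small sketch on synthetic data. It assumes NumPy and, for brevity, extracts factors by a plain eigendecomposition of the correlation matrix (a principal-component simplification, not the principal factor analysis used in the study). The function names, feature names, and data are all hypothetical.

```python
import numpy as np


def extract_factors(data, n_factors):
    """Return (all eigenvalues, loadings of the first n_factors).

    data is a texts x features array. The eigenvalues, sorted from
    largest to smallest, are what a scree plot would display.
    """
    corr = np.corrcoef(data, rowvar=False)       # standardizes internally
    eigvals, eigvecs = np.linalg.eigh(corr)      # returned in ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings


def significant_clusters(factor_loadings, names, cutoff=0.30, min_features=5):
    """Keep a positive or negative cluster only if it has enough loadings."""
    positive = [n for n, l in zip(names, factor_loadings) if l >= cutoff]
    negative = [n for n, l in zip(names, factor_loadings) if l <= -cutoff]
    return {
        "positive": positive if len(positive) >= min_features else [],
        "negative": negative if len(negative) >= min_features else [],
    }


# Synthetic corpus: 200 "texts"; five features driven by one latent
# dimension plus three unrelated noise features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
signal = latent + 0.5 * rng.normal(size=(200, 5))
noise = rng.normal(size=(200, 3))
data = np.hstack([signal, noise])
names = ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e",
         "feat_f", "feat_g", "feat_h"]

eigvals, loadings = extract_factors(data, n_factors=2)
clusters = significant_clusters(loadings[:, 0], names)
print(np.round(eigvals, 2))   # a sharp drop after the first value
print(clusters)
```

Because the sign of an eigenvector is arbitrary, the five related features may surface as either the positive or the negative cluster, which mirrors the remark that the direction of a cluster's sign is irrelevant.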

To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly ‘rotated’ (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method.
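
As a rough illustration of what rotation does, the sketch below implements a textbook Promax procedure in NumPy: an orthogonal varimax rotation followed by an oblique least-squares fit to a powered target. It is a sketch under stated assumptions (no Kaiser normalization, power 4, hypothetical loadings), not the routine used in the study.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns (rotated loadings, rotation)."""
    n_rows, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        gradient = loadings.T @ (rotated ** 3 - rotated @ column_ss / n_rows)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


def promax(loadings, power=4):
    """Oblique Promax rotation built on top of varimax."""
    rotated, rotation = varimax(loadings)
    # Raising loadings to a power (keeping their sign) shrinks small
    # values much faster than large ones, sharpening the structure.
    target = rotated * np.abs(rotated) ** (power - 1)
    coef, *_ = np.linalg.lstsq(rotated, target, rcond=None)
    scale = np.sqrt(np.diag(np.linalg.inv(coef.T @ coef)))
    coef = coef @ np.diag(scale)
    return rotated @ coef, rotation @ coef


# Hypothetical unrotated loadings: a clean two-factor structure that
# has been blurred by a 30-degree rotation.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
theta = np.pi / 6
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix

pattern, _ = promax(unrotated)
print(np.round(unrotated, 2))
print(np.round(pattern, 2))
```

Before rotation every feature loads on both factors; afterwards each feature should load mainly on one, which is the sense in which rotation maximizes high loadings and minimizes low ones.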

Table 1: Targeted linguistic features

Variable class              Variable

Noun                        Derived nouns (NOUN_DER)
                            Feminine nouns (NOUN_FEM)
                            Masculine nouns (NOUN_MSC)
                            Non-pluralizing nouns, e.g. virus ‘virus’ (NOUN_NPL)
                            Plural nouns (NOUN_PLR)
                            Singular nouns (NOUN_SNG)

Adjective                   Apocopated adjectives, e.g. buen ‘good’ (ADJ_SHRT)
                            Derived adjectives (ADJ_DERV)
                            Feminine adjectives (ADJ_FEM)
                            Four-inflection adjectives (ADJ_TYP1)
                            Masculine adjectives (ADJ_MSC)
                            Plural adjectives (ADJ_PLR)
                            Postmodifying adjectives (ADJ_POST)
                            Premodifying adjectives (ADJ_PRE)
                            Singular adjectives (ADJ_SNG)
                            Two-inflection adjectives (ADJ_TYP2)

Pronoun                     Third-person clitics (PRON_3RD)
                            Preverb clitics (CLT_PRE)
                            se + 3s verb, e.g. se come ‘one eats’ (SE_VRB3S)
                            Subject pronouns (PRON_SUB)
                            Unplanned se (SE_UNPLANNED)

Other noun phrase elements  Article-noun segments (ART_NOUN)
                            Definite articles (DEFART)
                            Possessive adjectives (DET_POSS)
                            Possessive syntax (WO_POSS)
                            Present participle modified by noun, e.g. agua hirviendo ‘boiling water’ (PS_NOUN)

Verbs                       Third-person verb (VRB_3RD)
                            Conditional (VRB_COND)
                            Copula plus adjective (not in other predicate categories) (COP_ADJ)
                            Evaluative predicates (PRD_EVAL)
                            Future (VRB_FUT)
                            Gustar-like verbs (VRB_GULI)
                            Imperfect (VRB_IMP)
                            Infinitive (VRB_INF)
                            Infinitive not preceded by verb or article (VRB_INVA)
                            Past participle (VRB_PS)
                            Past perfect (VRB_PPRF)
                            Past subjunctive (VRB_PSBJ)
                            Perfect tense verb (VRB_PRFC)
                            Periphrastic future, e.g. voy a comer ‘I’m going to eat’ (VRB_PERI)
                            Predicates of conclusions, e.g. es obvio ‘it is obvious’ (PRD_CONC)
                            Predicates of probability (PRD_PROB)
                            Present participle (VRB_PP)
                            Preterit (VRB_PRET)
                            Progressive aspect (VRB_PRGA)
                            Saber/conocer (SAB_CON)
                            Suasive predicates (PRD_SUAS)
                            Suasive verbs (VRB_SUAS)
                            Subjunctive (VRB_SBJ)
                            Verbs of communication/reporting (VRB_COM)
                            Verbs of conclusions (VRB_CONC)
                            Verbs of knowledge/cognitive verbs (VRB_CONO)
                            Verbs of observation (VRB_OBS)
                            Verbs of probability (VRB_PROB)

Adverbs                     Adverbs of time (ADV_TIME)
                            Adverbs of intensity (ADV_INTS)
                            Adverbs of manner (ADV_MAN)
                            Adverbs of place (ADV_PLC)
                            Adverbs of probability (ADV_PROB)

Miscellaneous               Comparatives (COMPARE)
                            Conjunctions followed by finite verb (CONJ_VRB)
                            Irregular comparatives (COMP_IRR)
                            Lexical density ratio (DEN_TOK)⁴
                            Long words (LONG)⁵
                            Prepositions (PREPS)
                            Que subordinator (QUE)
                            Type-token ratio (TYP_TOK)

A factor’s meaning requires interpretation: what linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums the z-scores of a cluster’s significant loadings in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al.’s (2006) and other researchers’ (e.g. Longacre 1983) descriptions of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for the distribution of words in the corpus by discourse type).
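
The summing step can be sketched as below. The feature codes are borrowed from Table 1, but the cluster membership, the z-scores, the text identifiers, and the function names are all hypothetical and chosen only to show the ranking.

```python
def cluster_score(text_z_scores, cluster_features):
    """Sum one text's z-scores over the features of a cluster."""
    return sum(text_z_scores[feature] for feature in cluster_features)


def most_representative(corpus, cluster_features, top_n=2):
    """Return the ids of the texts with the highest cluster scores."""
    ranked = sorted(
        corpus,
        key=lambda text_id: cluster_score(corpus[text_id], cluster_features),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical cluster of verbal features and hypothetical z-scores.
verbal_cluster = ["VRB_IMP", "VRB_PRGA", "VRB_PRET"]
corpus = {
    "essay_01": {"VRB_IMP": 1.2, "VRB_PRGA": 0.4, "VRB_PRET": 0.9},
    "essay_02": {"VRB_IMP": -0.8, "VRB_PRGA": -0.1, "VRB_PRET": -1.1},
    "essay_03": {"VRB_IMP": 0.3, "VRB_PRGA": 1.5, "VRB_PRET": 0.2},
}

for text_id in most_representative(corpus, verbal_cluster):
    print(text_id, round(cluster_score(corpus[text_id], verbal_cluster), 2))
```

The texts at the top of such a ranking are the ones a researcher would read closely when deciding what kind of discourse the cluster represents.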

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Table 1: Continued

Variable class              Variable

Adverbial clauses           Adverbial conjunctions of mode, e.g. lo hice según me dijeron ‘I did it the way they told me’ (ADVCLS_M)
                            Adverbial conjunctions of place, e.g. la casa donde vivo ‘the house where I live’ (ADVCLS_L)
                            Adverbial clauses of time (ADVCLS_T)
                            Causal adverbial conjunctions (ADVCLS_C)
                            If clauses (ADVCLS_S)
                            Purpose adverbial clauses, i.e. para que ‘so that’ (ADVCLS_P)

Adjective clauses           Relative pronouns with pre-posed preposition (REL_EN)
                            Relative pronouns without pre-posed preposition (REL_NO_EN)
                            Relative clauses on subjects, e.g. la casa que está allá ‘the house that is there’ (REL_SUBJ)

Nominal clauses             Nominal clauses, non-subjunctive (NOM_IND)
                            Nominal clauses, subjunctive (NOM_SUBJ)

Y ASENCION-DELANEY AND J COLLENTINE 11


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors, with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters’ discursive functions, along with a qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor    Eigenvalue    Percent of variance    Cumulative (%)
1         4.459         5.945                  5.945
2         3.691         4.922                  10.867
3         2.697         3.596                  14.463
4         2.561         3.415                  17.877
5         1.789         2.386                  20.263
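The relation between the eigenvalue and percentage columns of Table 2 can be checked with a short sketch. It assumes that each percentage is the eigenvalue divided by the number of analyzed variables, and it assumes 75 variables; that count is not stated in this section and is only inferred here from the ratio of the two columns. The function name is hypothetical.

```python
EIGENVALUES = [4.459, 3.691, 2.697, 2.561, 1.789]  # as reported in Table 2
N_VARIABLES = 75  # assumption inferred from the table, not stated in the text

def variance_explained(eigenvalues, n_variables):
    """Return (factor, eigenvalue, percent, cumulative percent) rows."""
    rows = []
    cumulative = 0.0
    for factor, eigenvalue in enumerate(eigenvalues, start=1):
        # Each standardized variable contributes one unit of variance.
        percent = 100.0 * eigenvalue / n_variables
        cumulative += percent
        rows.append((factor, eigenvalue, percent, cumulative))
    return rows

rows = variance_explained(EIGENVALUES, N_VARIABLES)
for factor, eigenvalue, percent, cumulative in rows:
    print(f"{factor}  {eigenvalue:.3f}  {percent:.3f}  {cumulative:.3f}")
```

Under these assumptions the computed values agree with the published columns to within rounding of the eigenvalues, which suggests the five factors together account for roughly a fifth of the variance.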


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Positive cluster (loadings)

Factor 1            Factor 2            Factor 3            Factor 4            Factor 5
VRB_IMP (0.56)      ADJ_DERV (0.71)     VRB_3RD (0.63)      PREPS (0.53)        VRB_INVA (0.36)
VRB_PRGA (0.55)     ADJ_PLR (0.59)      QUE (0.55)          VRB_INVA (0.52)     VRB_INF (0.35)
VRB_PRET (0.53)     ADJ_SNG (0.52)      SAB_CON (0.43)      VRB_INF (0.49)
VRB_SBJ (0.42)      ADJ_TYP1 (0.51)     VRB_CONO (0.42)     ADJ_PLR (0.31)
PRON_3RD (0.35)     ADJ_MSC (0.47)      REL_NO_EN (0.38)
VRB_CONO (0.35)     TYP_TOK (0.44)      VRB_PROB (0.30)
VRB_PSBJ (0.34)     ADJ_TYP2 (0.43)
VRB_COND (0.33)     ADJ_POST (0.41)
CLT_PRE (0.33)      ADJ_FEM (0.35)
SAB_CON (0.30)      LONG (0.30)

Negative cluster (loadings)

Factor 1            Factor 2            Factor 3            Factor 4            Factor 5
ART_NOUN (-0.65)    NOUN_SNG (-0.40)    VRB_IMP (-0.45)     ADJ_SNG (-0.59)     DEFART (-0.53)
NOUN_FEM (-0.56)    DEN_TOK (-0.38)     VRB_PRGA (-0.42)    ADJ_MSC (-0.44)     ADJ_PLR (-0.40)
NOUN_DER (-0.53)                        PREPS (-0.30)
NOUN_PLR (-0.51)
NOUN_MSC (-0.36)
ADJ_TYP1 (-0.36)
ADJ_POST (-0.35)
DEFART (-0.31)
NOUN_SNG (-0.31)
ADJ_FEM (-0.30)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo entonces despues de hacer esta observacion continue con mi plan de salir de mi dormitorio. Comence a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky; then, after making this observation, I continued with my plan to leave my bedroom. I started to run around the designated trails and, as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those texts with the highest opposing scores (i.e. with the opposite sign) are antithetical to the cluster’s function.

Interestingly, not only narrative texts but also argumentative essays and summaries have high positive scores on dimension 1. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolome y los indígenas maya es una estereotipo de los tiempos de la conquista de America Central. Tambien las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolome pensaba que su cultura de espana sería mas avanzado que los maya. En ultima instancia este pensamiento fue su ultimo. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolome thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Although the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference, while notable, did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, as the expository essay excerpt that follows exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jovenes tienen bastantes oportunidades en la vida. No obstante la mitad de los jovenes del Estados Unidos son malsanos y mas y mas tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jovenes estan faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy, and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.000].
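The year-level comparisons are reported as F(1, 634), which is the pair of degrees of freedom one would expect from a one-way analysis of variance on two groups totalling 636 texts. Assuming that is the test used, the following sketch shows how such an F-ratio is computed from per-text summed z-scores. The data are invented and the function name is hypothetical; the study's raw scores and group sizes are not available in this section.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of scores per group.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Invented summed z-scores for two tiny groups of texts.
second_year = [1.0, 2.0, 3.0]
third_year = [2.0, 3.0, 4.0]
f_ratio, df_between, df_within = one_way_f([second_year, third_year])
```

With two groups, the between-groups degrees of freedom are always 1 and the within-groups degrees of freedom are the number of texts minus 2.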

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the descriptive expository essay excerpt, which was written by a third-year learner.

Descriptive expository essay: El sueno americano se infiltra desde juventud es evidente por television escuela la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se ensena el americano que la unica cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibira la vida perfecta. Entonces en la mente americano la definicion del suceso es ser rico ser bonito y joven y tener un monton de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong, and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then, in the American mind, the definition of event (sic: success) is to be rich, be beautiful and young, and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.000].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactic (i.e. attributive and predicative) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features, namely the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginacion de los ninos. Los ninos no piensan logicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la television. Cuando un nino o nina mira una persona mirando y hablando en su direccion piensan que estan hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. Tambien los ninos piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.000].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.
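The two thresholds mentioned here (a loading of at least 0.30 in absolute value, and at least five such features) can be expressed as a small filter. The sketch below is an assumption-laden illustration: it applies the five-feature rule to each cluster separately, which the text does not spell out; the Factor 4 loadings are as we read them from Table 3, with negative-cluster values entered with a minus sign; and `EXAMPLE_WEAK` is an invented feature included only to show a loading that fails the cutoff.

```python
CUTOFF = 0.30       # minimum absolute loading, per the text
MIN_FEATURES = 5    # minimum cluster size, per the text

def split_clusters(loadings, cutoff=CUTOFF):
    """Split a factor's loadings into positive and negative clusters."""
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return positive, negative

def is_significant(cluster, min_features=MIN_FEATURES):
    """A cluster is interpretable only if it has enough salient features."""
    return len(cluster) >= min_features

factor_4 = {
    "PREPS": 0.53, "VRB_INVA": 0.52, "VRB_INF": 0.49, "ADJ_PLR": 0.31,
    "ADJ_SNG": -0.59, "ADJ_MSC": -0.44,
    "EXAMPLE_WEAK": 0.12,  # invented, below the cutoff
}
positive, negative = split_clusters(factor_4)
```

Under this reading, neither cluster of Factor 4 reaches five features, which matches the decision not to interpret it as a discourse type.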

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is dissociated from important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners employ either words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 4.6 + SD 4.9 = 10.5–11.
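The two measures defined in Notes 4 and 5 can be sketched as follows. This is a minimal illustration, not the authors' tagging pipeline: the part-of-speech labels (`NOUN`, `ADJ`, `VERB`, `DERIVED_ADV`) and function names are hypothetical, the sample words are invented, and the population standard deviation is assumed because the notes do not specify which variant was used.

```python
from statistics import mean, pstdev

# Hypothetical tag names for the non-functor classes listed in Note 4.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}

def lexical_density(tagged_words):
    """Share of non-functor words among all words of a text.

    tagged_words: list of (word, tag) pairs.
    """
    content = sum(1 for _, tag in tagged_words if tag in CONTENT_TAGS)
    return content / len(tagged_words)

def long_word_threshold(corpus_words):
    """Mean word length of the corpus plus one standard deviation."""
    lengths = [len(w) for w in corpus_words]
    return mean(lengths) + pstdev(lengths)

def long_words(words, threshold):
    """Words whose length exceeds the corpus-level threshold."""
    return [w for w in words if len(w) > threshold]

# Invented mini-text; "muy" is a non-derived adverb, so it is not counted.
tagged = [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"),
          ("muy", "ADV"), ("grande", "ADJ")]
density = lexical_density(tagged)

corpus = ["la", "casa", "es", "muy", "grande", "eventualmente"]
threshold = long_word_threshold(corpus)
found = long_words(corpus, threshold)
```

In the toy corpus only "eventualmente" exceeds the threshold, which illustrates why long words tend to be morphologically derived forms.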

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World’ [Database], available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers de l’Institut de Linguistique de Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross-Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity.’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer-assisted language learning.



Researchers have powerful analytical tools for studying how lexical and grammatical features generate discourse types (Biber and Conrad 2001). Still, Myles (2005) and Myles and Mitchell (2005) argue that SLA research has been slow to embrace corpus-based technologies, even though corpus linguistics is effective at helping theoretical linguistics to understand the nature of discourse types. Myles and Mitchell (2005) also note that corpus research can complement and increase the generalizability of SLA research by examining large amounts of data.

We present the first-known multidimensional analysis of a written Spanish learner corpus, examining how learners combine lexical and grammatical phenomena to generate different discourse types. Before presenting our research questions, we provide an overview of research about the acquisition of discursive abilities, how corpora have been used in SLA to understand the mapping of form to function, and the usefulness of the multidimensional corpus analysis.

Developing L2 discursive abilities

Canale and Swain’s (1980) and Bachman’s (1990) theories of L2 communicative competence afford discourse/textual competence a central role in SLA. The American Council on the Teaching of Foreign Languages (ACTFL 2001) proficiency scales assume a developmental progression, with learners developing conversational, then descriptive, then narrative abilities, and finally more complex abilities. In general, the study of discursive abilities entails how learners obtain the ability to produce multipropositional segments whose semantic and structural interrelationships (e.g. gender/number agreement) cohere (Givón 1985). This research has considered how pragmatic abilities affect L2 discursive abilities, the assignment of discursive roles to grammatical constructs, and learners’ use of discourse markers.

Pragmatic research concerned with the discursive context studies how we structure information across sentences, treating issues such as how syntactic structures affect coherence, topic/focus, and propositional presupposition (Horn and Ward 2006: xiii–xix). SLA pragmatic research, as Kasper (2001) notes, examines how learners organize discourse and how they employ grammatical structures to produce multipropositional messages (e.g. marking cause-effect chains with conditional if structures). The acquisition of what may be termed ‘grammatical-pragmatic’ abilities entails not only lexical and grammatical phenomena’s denotations but also their connotations, as managed by readers/listeners to interpret multisentential messages.

Research suggests that L2 discursive abilities develop as learners attain both grammatical and pragmatic abilities. In uninstructed learning contexts, functional necessities (e.g. distinguishing between topic and comment) influence which lexical and grammatical tools learners grammaticalize (Skiba and Dittmar 1992). Geyer (2007) presents data suggesting that once Japanese learners develop ‘several (grammatical-pragmatic) devices’ (p. 362), they can chain utterances to achieve discursive organization. Geyer reports that advanced learners use coordinate and adverbial conjunctions to make distinctions such as foreground/background and cohesive propositional segmenting.

The research on the acquisition of Spanish’s two copulas, ser and estar, shows how learners associate lexical and grammatical features with particular discourse types. Cheng et al. (2008), examining a corpus of Mandarin Chinese L1 learners of Spanish, report that exploratory writing led to more copula estar + adjective usage. They also noted that this structure was mostly associated with narratives and descriptions. Collentine and Asención-Delaney (2010), in their corpus-based study of the Spanish copula ser + adjective and estar + adjective, report that overall, when learners begin mastering simple discourse types, lexical, syntactic, and morphological diversity and complexity increase.¹ They found that ser + adjective appears in descriptive and evaluative discourse where lexical and morphological diversity and complexity occur. However, estar + adjective is present in narrations, descriptions, and hypothetical discourse where, nonetheless, surprisingly little linguistic complexity occurs.

Other discourse-related research has studied L2 discourse markers. Upton and Connor (2001) use corpus techniques to compare formulaics in ‘application letters’ by L1 speakers of (American) English and L2 learners of English from Belgium and Finland. The L2 writers were much less formulaic than the L1 writers, with politeness strategies due to L1 influence. Forsberg (2005) uses an L2 French learner corpus from L1 Swedish speakers to study different types of idioms (e.g. grammatical, pragmatic, purely lexical idioms). Forsberg found that lexical idioms increase in writing over time. She also found that formulaics with a discursive role were ‘overused’ by learners of all levels.

SLA corpus research on the development of structural and lexical phenomena

Corpus-based L2 research has helped us to understand how learners map form to function.² It also elucidates the role of L2 formulaic segments. It has provided important insights into the interaction of grammar and the lexicon. What is consistent within this research is that L2 lexical phenomena and grammatical constructs work in tandem. Finally, corpus-based L2 research has explored ways to efficiently measure L2 syntactic complexity.

Regarding the mapping of form to function, Klein and Perdue (1997) used a learner corpus to posit that uninstructed settings generate a ‘basic variety’ where word order and grammatical constructs result from functional communicative necessities. Granger et al. (2002) and Belz (2004) use corpus techniques to document how social and institutional pressures affect new L2 phenomena (e.g. da-compounds such as dazu ‘there-to’, davon ‘there-from’, darin ‘there-in’).

Some investigators conceptualize formulaic segments (e.g. in other words, for example) as singular dictionary entries (cf. Ellis 2006). Using the British National Corpus (BNC Consortium 2001), Ellis et al. (2008) found that L2 learners at early stages of acquisition recognize formulaics based on their frequency in the input. Interestingly, however, Ellis et al. (2008) argue that native-like use of formulaics ultimately requires learners to alter their processing of words ‘for coherence, for co-occurrence greater than chance’ (p. 391).

Regarding the interaction between morphology and the lexicon, Marsden and David (2008) study vocabulary development among L2 learners of Spanish at ages 9 and 13. The 13-year-old learners, who had 450 more hours of classroom instruction, showed greater lexical and inflectional diversity. Their analysis supports the notion that in inflectionally rich languages like Spanish, an important indicator of development is not only accuracy, as measured by grammatical errors, but also increased inflectional/derivational variation (Howard 2002, 2006; Collentine 2009). Their research also indicates that a key factor in differentiating levels of development is changes in the part of speech (e.g. nouns, verbs, adjectives) that predominate in production. Marsden and David (2008) present evidence suggesting that as learners progress in speech, they produce more verbs than nouns, and that as they begin to produce more verbs, they also start to produce more adjectives. Since learners produce different discourse types as they progress, it is logical to surmise that these changes in lexical and grammatical production parallel changes in the discourse learners produce.

Two research projects have used corpus techniques to study the emergence of linguistic complexity and semantic density. Operationalizing ‘complexity’ and ‘density’ in corpus-based studies is challenging but important. Ortega (2000) uses a corpus of intermediate-level L2-Spanish learners’ writing to identify reliable measures of syntactic complexity. Her analysis indicated that clause length, phrasal elaboration, and amount of subordination best predict syntactic complexity. Collentine (2004) employs corpus techniques to compare morphological and lexical complexity in an in-class learning context and a study-abroad context. After a semester, the study-abroad learners employed more morphological narrative complexity by using past-tense verbs, third-person morphology, past participles, and present participles (as well as public verbs, e.g. decir que ‘to say that’). From a learner’s perspective, morphological complexity in Spanish requires the use of a range of aspectual, person/number, and gender inflections beyond simple verb tenses (e.g. the present) and other unmarked morphemes (e.g. masculine-singular nouns/adjectives; cf. Collentine 2004: 237). The in-class group was more lexically complex, producing a higher concentration of nominal features (nouns and adjectives) and so semantically dense discourse (Biber 1988).

Finally, Grant and Ginther (2000) urge corpus-based SLA researchers to combine qualitative with quantitative techniques to increase generalizability of results. Many approaches (e.g. part-of-speech tagging) do not generally factor in errors, and small corpora introduce bias by sampling particular tasks. While we do not report error rates or omissions in this study, we are confident that our multidimensional approach and its mixed-method research design (i.e. quantitative + qualitative analyses), together with the size of our corpus, address Grant and Ginther’s (2000) concerns.

The multidimensional analysis and relevant corpus linguistic techniques

Multidimensional corpus analysis shows how lexical and grammatical features bundle together to produce different and new types of discourse (Biber and Conrad 2001). It combines technological tools, exploratory factor analysis, and qualitative analysis of texts to determine which lexical and grammatical features reliably co-occur in a large corpus. Using a large collection of texts coupled with powerful statistical tools introduces fewer reliability threats than small-scale studies where participants’ particular characteristics or raters’ judgement heavily influence results.

Biber et al. (2006) provide the first multidimensional analysis of native-speaker Spanish. They analyze a 20-million-word Spanish corpus (4,049 texts) with written and oral data representing 19 registers (e.g. face-to-face conversation, business letters). The analysis uncovered both well-known discourse types like narratives as well as new ones. For example, the following characterize ‘informationally rich’ discourse in Spanish: nouns, adjectives, definite articles, prepositions, derived nouns, type-token ratio, long words (i.e. multisyllabic words), and ergative se constructions. Unique to Spanish is also hypothetical discourse containing a concentration of structures such as the conditional and the subjunctive, as well as future verb forms, verbs of obligation and causation (e.g. dejar, permitir, hacer + infinitive), and the conjunction que.

Parodi (2007) used a 2.5-million-word corpus of native-speaker Spanish to study the differences between written and spoken Spanish. His analysis complements Biber et al. (2006) in that it reveals how lexical and grammatical features cluster together based on whether the register is context dependent (i.e. the interpretation of important features depends on the ‘speech situation’), written academic, commissive in nature (in the pragmatic sense), attitudinal, or informational in focus. Additionally, Parodi provides a more refined definition (although based on many fewer tokens) of what constitutes Spanish narrative discourse than Biber et al. (2006); interestingly, Parodi’s analysis indicates that English and Spanish share many of the same narrative features.

We present the first-known multidimensional analysis of L2 Spanish, studying how learners use various lexical and grammatical phenomena to generate IL-specific discourse types. We do not make a priori assumptions about which of these phenomena work in tandem, nor do we assume that learner discourse types are the same as those of native speakers. Our analysis serves as a first step towards characterizing learner discourse at the second and third years of Spanish L2 instruction.

Research questions

We characterize Spanish learner discourse via a multidimensional analysis of a corpus of L2 Spanish generated by second- and third-year university-level learners. Specifically, the study addresses the following research questions:

1. How do lexical and grammatical phenomena cluster together in L2 Spanish writing?

2. What types of discourses (e.g. narrative, descriptive, hypothetical) surround the concentration of these lexical and grammatical features?

METHOD

Corpus description

We used a 202,241-word corpus of written Spanish comprising edited and non-edited compositions collected from English-speaking Spanish learners at the second-year (109,224 words) and third-year (93,017 words) levels, in which more variety of texts could be collected due to more exposure to language instruction. To estimate the written proficiency of each level of instruction based on the ACTFL Writing Proficiency Scale (ACTFL 2001), the researchers selected a random sample of 50 entire documents from each of the two instructional levels (N = 100). After a training session on working with the scale, the researchers rated the samples independently. Each level on the scale was assigned a numerical value, with 0 representing a novice low and 9 a superior rating. We estimated the inter-rater reliability of the subjects’ proficiency ratings with a Pearson correlation since the datasets represented interval scales [r(df = 98) = 0.97, p < 0.01]. The second-year learners wrote at the intermediate high level and the third-year learners at the advanced low level. This suggests that while the third-year learners generally narrated and produced a limited number of cohesive devices as well as a variety of complex syntactic structures, the second-year learners were beginning to produce narrative structure, although their control of the verbal constructs was still developing. The second-year learners also produced few cohesive devices and limited subordination.
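
The inter-rater check described above can be illustrated with a short sketch. It assumes two raters who scored the same documents on the 0 to 9 numeric scale; the function name `pearson_r` and the ratings are hypothetical and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-variation of the two raters around their own means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings on the 0-9 numeric scale (not data from the study).
rater_a = [5, 6, 6, 7, 5, 8, 6, 7]
rater_b = [5, 6, 7, 7, 5, 8, 6, 6]
r = pearson_r(rater_a, rater_b)
df = len(rater_a) - 2  # degrees of freedom reported alongside r
print(f"r({df}) = {r:.2f}")
```

With 100 rated documents, as in the study, the same calculation would be reported with df = 98.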

The corpus comprises writing samples used for course assessment purposes: letters, narratives, descriptions, summaries, and argumentative essays, both in and out of class as well as on exams. Given the study’s exploratory and descriptive nature, we did not control the type of tasks or topics within the corpus. Topics related to textbook themes (e.g. family, childhood) and cultural readings.

Procedures: tagging, searching, and norming

Understanding multidimensional analyses requires knowledge of three key concepts: part-of-speech tagging, search patterns, and statistical techniques for eliminating various biases in how tokens are counted.

To access information about particular texts, each file includes a header with information about its topic, source, type, author biographical information, and purpose (argumentative essay, narrative). To search for morphosyntactic information (e.g. all adjectives, all subjunctive forms), one needs a part-of-speech tagger: software that annotates every word with information about its major word classes (e.g. adjective, noun, verb, determiner), basic morphological information (e.g. plural, preterit), as well as its lemma (i.e. its unmarked dictionary root, such as a verb’s infinitive or a noun’s masculine singular form).

Part-of-speech tagging requires a dictionary with lexical and grammatical information. It also requires a pretagged corpus to train the software routines to determine unknown or ambiguous words’ probable tags. We compiled our own dictionary, utilized a training set from samples from the Corpus del español (Biber et al. 2006), and tagged the corpus with n-gram software routines from the Natural Language Tool Kit (NLTK, http://www.nltk.org). Additionally, we wrote Spanish-specific routines (e.g. clitic sequences, derivational morphemes) to complement the NLTK routines to achieve greater tagging precision. After the corpus is tagged in this way, the investigator must verify the accuracy of the tagging and fix errors through further programming.

We studied 78 lexical, grammatical, and lexico-grammatical features.³ The features involved all parts of speech, common morphosyntactic constructs studied by learners, as well as additional constructs studied in Biber et al. (2006). They represented adjectives (e.g. derived, postnominal position), nouns (e.g. derived, feminine), adverbs (e.g. place, time), verb classes (e.g. imperfect aspect, past participle), verb phrases (e.g. communication, knowledge), and certain morphosyntactic features like dependent clauses, noun phrase configurations (e.g. article plus noun), and pronoun usage (e.g. clitic, third person).

Textual frequencies often require two mathematical conversions. First, norming transforms a phenomenon’s count to its normed frequency. Since text lengths vary, longer texts inflate certain items’ importance. To offset this text-length bias, one scales a phenomenon’s frequency per text, such as per 1,000 words. Second, normalizing eliminates the feature-concentration bias: some phenomena are naturally scarce in a document (e.g. the subjunctive), while others are naturally common (e.g. articles) (cf. Biber and Conrad 2001). Normalizing converts a normed frequency to its z-score value vis-à-vis its normed frequency in each document. Consequently, one can measure the relative presence of two or more linguistic features within any given text. Additionally, one can sum various z-scores to determine how concentrated a set of features is in any text or group of texts.
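
The two conversions can be sketched in a few lines. This is a minimal illustration rather than the authors’ code: the function names, the per-1,000-word base as a default argument, the use of the population standard deviation, and the counts are all assumptions made for the example.

```python
from statistics import mean, pstdev

def normed_frequency(count, text_length, per=1000):
    """Scale a raw count to occurrences per `per` words."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return count * per / text_length

def z_scores(values):
    """Standardize one feature's normed frequencies across all texts."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD is an assumption; sample SD is equally plausible
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical (raw count, text length in words) pairs for one feature in three texts.
preterit = [(12, 400), (3, 150), (30, 1200)]
normed = [normed_frequency(c, n) for c, n in preterit]
standardized = z_scores(normed)
print(normed, standardized)
```

In this toy case the longest text has the largest raw count but only a middling normed frequency, which is the text-length bias that norming is meant to remove.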

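The n-gram tagging step described in this section can be pictured with a standalone sketch of a bigram tagger that backs off to unigram counts and then to a default tag. It is not NLTK's API and not the authors' routines; the function name, the default tag, the tag labels, and the tiny training set are hypothetical.

```python
from collections import Counter, defaultdict

def train_backoff_tagger(tagged_sentences, default_tag="NOUN"):
    """Build a bigram tagger that backs off to unigram counts, then to a default tag."""
    unigram = defaultdict(Counter)  # word -> tag counts
    bigram = defaultdict(Counter)   # (previous tag, word) -> tag counts
    for sentence in tagged_sentences:
        previous = "<s>"
        for word, tag in sentence:
            key = word.lower()
            unigram[key][tag] += 1
            bigram[(previous, key)][tag] += 1
            previous = tag

    def tag(words):
        tagged, previous = [], "<s>"
        for word in words:
            key = word.lower()
            if (previous, key) in bigram:
                # Most frequent tag for this word after the previous tag.
                best = bigram[(previous, key)].most_common(1)[0][0]
            elif key in unigram:
                # Context unseen: use the word's most frequent tag overall.
                best = unigram[key].most_common(1)[0][0]
            else:
                best = default_tag  # unknown word: fall back to the default
            tagged.append((word, best))
            previous = best
        return tagged

    return tag

# Tiny hypothetical training set; the tag labels are illustrative only.
training = [
    [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"), ("bonita", "ADJ")],
    [("la", "PRON"), ("vi", "VERB"), ("ayer", "ADV")],
    [("la", "DET"), ("mesa", "NOUN")],
]
tagger = train_backoff_tagger(training)
result = tagger(["la", "casa", "bonita"])
print(result)
```

A real tagger would be trained on a large pretagged corpus and checked by hand afterwards, as the text notes.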

Factor analysis

To determine how learners cluster lexical and grammatical features, we employ an exploratory principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions (also referred to as factors) along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus.

The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary, superset factor according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relative to the others. The examination of factors’ eigenvalues in scree plots of each factor’s total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature’s importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying in absolute value from 0.00, or no correlation, to 1.00, or absolute correlation. A dimension with two significant clusters will have two sets of loadings, a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster’s sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster’s communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the 0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high type-token ratio) represented semantically dense, literate discourse and the negative features (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse.
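
The cutoff rule can be made concrete with a small sketch. It assumes one factor’s loadings are available as a mapping from feature code to loading; the function name and the loading values are hypothetical, and only the 0.30 cutoff and the five-feature minimum come from the text.

```python
def split_clusters(loadings, cutoff=0.30, min_features=5):
    """Split one factor's loadings into positive and negative clusters.

    A cluster counts as interpretable here when it has at least
    `min_features` loadings whose absolute value meets `cutoff`.
    """
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return {
        "positive": (positive, len(positive) >= min_features),
        "negative": (negative, len(negative) >= min_features),
    }

# Hypothetical loadings for one factor (feature codes follow Table 1).
factor = {
    "VRB_IMP": 0.56, "VRB_PRET": 0.53, "VRB_SBJ": 0.42,
    "PRON_3RD": 0.35, "VRB_COND": 0.33, "QUE": 0.12,
    "ART_NOUN": -0.65, "DEFART": -0.53, "NOUN_PLR": -0.51, "PREPS": -0.10,
}
clusters = split_clusters(factor)
print(clusters["positive"][1], clusters["negative"][1])
```

Here the positive cluster meets the five-feature minimum while the negative cluster, with three salient loadings, does not, so only the former would be interpreted as a discourse type.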

To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly ‘rotated’ (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method.

Table 1: Targeted linguistic features

Variable class, followed by its variables

Noun
  Derived nouns (NOUN_DER)
  Feminine nouns (NOUN_FEM)
  Masculine nouns (NOUN_MSC)
  Non-pluralizing nouns, e.g. virus ‘virus’ (NOUN_NPL)
  Plural nouns (NOUN_PLR)
  Singular nouns (NOUN_SNG)

Adjective
  Apocopated adjectives, e.g. buen ‘good’ (ADJ_SHRT)
  Derived adjectives (ADJ_DERV)
  Feminine adjectives (ADJ_FEM)
  Four-inflection adjectives (ADJ_TYP1)
  Masculine adjectives (ADJ_MSC)
  Plural adjectives (ADJ_PLR)
  Postmodifying adjectives (ADJ_POST)
  Premodifying adjectives (ADJ_PRE)
  Singular adjectives (ADJ_SNG)
  Two-inflection adjectives (ADJ_TYP2)

Pronoun
  Third-person clitics (PRON_3RD)
  Preverb clitics (CLT_PRE)
  se + 3s verb, e.g. se come ‘one eats’ (SE_VRB3S)
  Subject pronouns (PRON_SUB)
  Unplanned se (SE_UNPLANNED)

Other noun phrase elements
  Article-noun segments (ART_NOUN)
  Definite articles (DEFART)
  Possessive adjectives (DET_POSS)
  Possessive syntax (WO_POSS)
  Present participle modified by noun, e.g. agua hirviendo ‘boiling water’ (PS_NOUN)

Verbs
  Third-person verb (VRB_3RD)
  Conditional (VRB_COND)
  Copula plus adjective (not in other predicate categories) (COP_ADJ)
  Evaluative predicates (PRD_EVAL)
  Future (VRB_FUT)
  Gustar-like verbs (VRB_GULI)
  Imperfect (VRB_IMP)
  Infinitive (VRB_INF)
  Infinitive not preceded by verb or article (VRB_INVA)
  Past participle (VRB_PS)
  Past perfect (VRB_PPRF)
  Past subjunctive (VRB_PSBJ)
  Perfect tense verb (VRB_PRFC)
  Periphrastic future, e.g. voy a comer ‘I’m going to eat’ (VRB_PERI)
  Predicates of conclusions, e.g. es obvio ‘it is obvious’ (PRD_CONC)
  Predicates of probability (PRD_PROB)
  Present participle (VRB_PP)
  Preterit (VRB_PRET)
  Progressive aspect (VRB_PRGA)
  Saber/conocer (SAB_CON)
  Suasive predicates (PRD_SUAS)
  Suasive verbs (VRB_SUAS)
  Subjunctive (VRB_SBJ)
  Verbs of communication/reporting (VRB_COM)
  Verbs of conclusions (VRB_CONC)
  Verbs of knowledge/cognitive verbs (VRB_CONO)
  Verbs of observation (VRB_OBS)
  Verbs of probability (VRB_PROB)

Adverbs
  Adverbs of time (ADV_TIME)
  Adverbs of intensity (ADV_INTS)
  Adverbs of manner (ADV_MAN)
  Adverbs of place (ADV_PLC)
  Adverbs of probability (ADV_PROB)

Miscellaneous
  Comparatives (COMPARE)
  Conjunctions followed by finite verb (CONJ_VRB)
  Irregular comparatives (COMP_IRR)
  Lexical density ratio (DEN_TOK)⁴
  Long words (LONG)⁵
  Prepositions (PREPS)
  Que subordinator (QUE)
  Type-token ratio (TYP_TOK)

Adverbial clauses
  Adverbial conjunctions of mode, e.g. lo hice según me dijeron ‘I did it the way they told me’ (ADVCLS_M)
  Adverbial conjunctions of place, e.g. la casa donde vivo ‘the house where I live’ (ADVCLS_L)
  Adverbial clauses of time (ADVCLS_T)
  Causal adverbial conjunctions (ADVCLS_C)
  If clauses (ADVCLS_S)
  Purpose adverbial clauses, i.e. para que ‘so that’ (ADVCLS_P)

Adjective clauses
  Relative pronouns with pre-posed preposition (REL_EN)
  Relative pronouns without pre-posed preposition (REL_NO_EN)
  Relative clauses on subjects, e.g. la casa que está allá ‘the house that is there’ (REL_SUBJ)

Nominal clauses
  Nominal clauses, non-subjunctive (NOM_IND)
  Nominal clauses, subjunctive (NOM_SUBJ)

A factor’s meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums the z-scores of a cluster’s significant loadings in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al. (2006) and other researchers’ (e.g. Longacre 1983) descriptions of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for distribution of words in the corpus by discourse type).

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor   Eigenvalue   Percent of variance   Cumulative (%)
1        4.459        5.945                 5.945
2        3.691        4.922                 10.867
3        2.697        3.596                 14.463
4        2.561        3.415                 17.877
5        1.789        2.386                 20.263
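
The arithmetic behind Table 2 can be checked with a short sketch. It assumes the usual convention that a factor’s percent of variance is its eigenvalue divided by the number of variables analyzed; the divisor of 75 is inferred from the table’s figures and is not stated in the article, and the function names are hypothetical.

```python
eigenvalues = [4.459, 3.691, 2.697, 2.561, 1.789]       # from Table 2
reported_percent = [5.945, 4.922, 3.596, 3.415, 2.386]   # from Table 2

N_VARIABLES = 75  # inferred from the table, not stated in the article

def percent_of_variance(eigenvalue, n_variables):
    """Share of total variance for one factor, as a percentage."""
    return 100.0 * eigenvalue / n_variables

def cumulative(values):
    """Running totals of a list of percentages, rounded to three decimals."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(round(total, 3))
    return out

computed = [round(percent_of_variance(e, N_VARIABLES), 3) for e in eigenvalues]
running = cumulative(reported_percent)
print(computed)
print(running)
```

Under that assumption the computed percentages agree with the reported ones to within rounding, and the running total ends at roughly the 20.2% cited for the five-factor solution.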

Table 3: Summary of factors (i.e. dimensions) and significant clusters

Positive cluster (loadings)

Factor 1          Factor 2          Factor 3          Factor 4          Factor 5
VRB_IMP (0.56)    ADJ_DERV (0.71)   VRB_3RD (0.63)    PREPS (0.53)      VRB_INVA (0.36)
VRB_PRGA (0.55)   ADJ_PLR (0.59)    QUE (0.55)        VRB_INVA (0.52)   VRB_INF (0.35)
VRB_PRET (0.53)   ADJ_SNG (0.52)    SAB_CON (0.43)    VRB_INF (0.49)
VRB_SBJ (0.42)    ADJ_TYP1 (0.51)   VRB_CONO (0.42)   ADJ_PLR (0.31)
PRON_3RD (0.35)   ADJ_MSC (0.47)    REL_NO_EN (0.38)
VRB_CONO (0.35)   TYP_TOK (0.44)    VRB_PROB (0.30)
VRB_PSBJ (0.34)   ADJ_TYP2 (0.43)
VRB_COND (0.33)   ADJ_POST (0.41)
CLT_PRE (0.33)    ADJ_FEM (0.35)
SAB_CON (0.30)    LONG (0.30)

Negative cluster (loadings; the assignment of entries to factors is not recoverable from the source layout)

ART_NOUN (0.65), NOUN_SNG (0.40), VRB_IMP (0.45), ADJ_SNG (0.59), DEFART (0.53), NOUN_FEM (0.56), DEN_TOK (0.38), VRB_PRGA (0.42), ADJ_MSC (0.44), ADJ_PLR (0.40), NOUN_DER (0.53), PREPS (0.30), NOUN_PLR (0.51), NOUN_MSC (0.36), ADJ_TYP1 (0.36), ADJ_POST (0.35), DEFART (0.31), NOUN_SNG (0.31), ADJ_FEM (0.30)

Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky; then, after making this observation, I continued with my plan to leave my bedroom. I started to run around the designated trails and, as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those texts with the highest opposing score (i.e. with the opposing sign) are antithetical to the cluster’s function.
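
The averaging behind such a figure can be sketched as follows. The sketch assumes each text carries a text-type label and a mapping from feature code to z-score; the function names and the values are hypothetical, and the three features are only a subset of those the text associates with the narrative cluster.

```python
from collections import defaultdict

def dimension_score(text_z, cluster):
    """Sum a text's z-scores over the features in one cluster."""
    return sum(text_z.get(feature, 0.0) for feature in cluster)

def mean_score_by_type(texts, cluster):
    """Average the summed z-scores of all texts sharing a text type."""
    totals = defaultdict(list)
    for text in texts:
        totals[text["type"]].append(dimension_score(text["z"], cluster))
    return {t: sum(s) / len(s) for t, s in totals.items()}

# Subset of the narrative cluster, using the feature codes from Table 1.
NARRATIVE = ["VRB_IMP", "VRB_PRET", "PRON_3RD"]

# Hypothetical z-scores for three texts (not data from the corpus).
texts = [
    {"type": "narrative", "z": {"VRB_IMP": 1.5, "VRB_PRET": 1.0, "PRON_3RD": 0.5}},
    {"type": "narrative", "z": {"VRB_IMP": 0.5, "VRB_PRET": 1.0, "PRON_3RD": -0.5}},
    {"type": "description", "z": {"VRB_IMP": -1.0, "VRB_PRET": -1.5, "PRON_3RD": 0.0}},
]
averages = mean_score_by_type(texts, NARRATIVE)
ranked = sorted(averages, key=averages.get, reverse=True)
print(averages, ranked)
```

Text types at the top of the ranking are the most representative of the cluster; those with large scores of the opposite sign sit at the bottom.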

Interestingly not only do narrative texts have high positive scores in dimen-sion 1 but also argumentative essays and summaries Students frequentlyapproached summaries and argumentation with narrative elements perhapsto compensate for a lack of more sophisticated abilities which has shown to bethe case for L2 writers of English in expository writing (Hinkel 2004) Forinstance one second-year learner wrote an argumentative text about stereo-types of the colonial times and used evidence from a story about a fray in thenew land

Argumentative essay: El cuento de fraile Bartolome y los indígenas maya es una estereotipo de los tiempos de la conquista de America Central. Tambien las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolome pensaba que su cultura de espana sería mas avanzado que los maya. En ultima instancia este pensamiento fue su ultimo. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolome thought that his culture from Spain would be more advanced than the Mayan. Ultimately this was his last thought.)

In these narratives the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Although the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2. Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.
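As an illustration of how an average summed z-score by text type could be obtained, the sketch below standardizes each feature's rate across a corpus, sums the z-scores of one cluster's features for every text, and averages the sums by text type. It is a minimal sketch under stated assumptions, not the procedure used in this study: the function names, the data layout, the feature rates, and the use of the population standard deviation are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summed_z_scores(texts, cluster_features):
    """Return one summed z-score per text for the given cluster of features.

    texts: list of dicts with a 'type' label and a 'rates' dict mapping
    feature name -> normalized frequency (hypothetical layout).
    """
    # Corpus-wide mean and standard deviation for each feature in the cluster.
    stats = {}
    for feature in cluster_features:
        values = [t["rates"][feature] for t in texts]
        stats[feature] = (mean(values), pstdev(values))

    scores = []
    for t in texts:
        total = 0.0
        for feature in cluster_features:
            mu, sd = stats[feature]
            if sd > 0:  # a feature with no variation contributes nothing
                total += (t["rates"][feature] - mu) / sd
        scores.append(total)
    return scores


def average_by_text_type(texts, scores):
    """Average the summed z-scores within each text type."""
    groups = defaultdict(list)
    for t, s in zip(texts, scores):
        groups[t["type"]].append(s)
    return {text_type: mean(vals) for text_type, vals in groups.items()}


# Invented rates per 100 words, for illustration only.
sample = [
    {"type": "expository", "rates": {"noun": 30, "article": 12}},
    {"type": "expository", "rates": {"noun": 28, "article": 11}},
    {"type": "narrative", "rates": {"noun": 18, "article": 7}},
    {"type": "narrative", "rates": {"noun": 20, "article": 8}},
]
sample_scores = summed_z_scores(sample, ["noun", "article"])
print(average_by_text_type(sample, sample_scores))
```

Because z-scores are centered on the corpus mean, a text type with a positive average uses the cluster's features more than the corpus as a whole, which is the kind of reading a figure of this sort supports.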

The expository cluster also occurs in mini-essay questions and descriptions, as the Expository essay excerpt exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jovenes tienen bastantes oportunidades en la vida. No obstante la mitad de los jovenes del Estados Unidos son malsanos y mas y mas tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jovenes estan faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3. Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].
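The group comparisons reported in this section take the form of an F statistic with 1 and 634 degrees of freedom. The sketch below shows how such a statistic is computed in a one-way analysis of variance; it assumes (this is not stated in the text) that the second- and third-year texts were compared with a one-way ANOVA on per-text summed z-scores, in which case 634 within-group degrees of freedom would correspond to 636 texts in two groups. The function name and the toy scores are hypothetical.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each score around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented summed z-scores for two small groups of texts.
second_year = [0.2, 1.1, -0.4, 0.9, 0.5]
third_year = [2.0, 3.1, 1.8, 2.6, 2.4]
print(one_way_anova_f(second_year, third_year))
```

With only two groups, this F value is the square of the corresponding independent-samples t statistic, so either test would lead to the same conclusion.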

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay excerpt, which was written by a third-year learner.

Descriptive expository essay: El sueno americano se infiltra desde juventud es evidente por television, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se ensena el americano que la unica cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibira la vida perfecta. Entonces en la mente americano la definicion del suceso es ser rico, ser bonito y joven y tener un monton de las cosas mejores y ricas. [The American dream infiltrates from youth, it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then in the American mind the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).
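For readers unfamiliar with the type/token ratio mentioned above, the following sketch computes it as the number of distinct word forms divided by the number of running words. It is only an illustration: the tokenization rule (lowercased runs of letters) and the function name are assumptions, and the study's own tokenizer and any lemmatization step are not described in this section.

```python
import re


def type_token_ratio(text):
    """Distinct word forms divided by total running words (0.0 for empty text)."""
    # Runs of Unicode letters, lowercased, so that 'La' and 'la' count as one type.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Fragment taken from the learner excerpt quoted in this section.
print(type_token_ratio("Esta influencia es subconsciente pero fuerte"))
```

Because the ratio falls as texts get longer, comparisons are usually made over samples of similar length; how length was controlled here is not stated in this section.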

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4. Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5. Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginacion de los ninos. Los ninos no piensan logicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la television. Cuando un nino o nina mira una persona mirando y hablando en su direccion piensan que estan hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. Tambien los ninos piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This probably supports the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.
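The interpretability criterion mentioned above (at least five features loading above the 0.30 level) can be sketched as a simple filter over a factor's loadings. The code below is an illustration only: it assumes the criterion is applied separately to the positive and negative pole of each factor and to absolute loading size, and the function name and the loadings are invented rather than taken from the study.

```python
def interpretable_poles(loadings, cutoff=0.30, min_features=5):
    """Split a factor's salient loadings by sign and flag each pole.

    loadings: dict mapping feature name -> factor loading (hypothetical values).
    A pole is flagged interpretable when it has at least `min_features`
    features whose absolute loading reaches `cutoff`.
    """
    positive = sorted(f for f, w in loadings.items() if w >= cutoff)
    negative = sorted(f for f, w in loadings.items() if w <= -cutoff)
    return {
        "positive": {
            "features": positive,
            "interpretable": len(positive) >= min_features,
        },
        "negative": {
            "features": negative,
            "interpretable": len(negative) >= min_features,
        },
    }


# Invented loadings for one factor, for illustration only.
example_factor = {
    "adjective_plural": 0.52,
    "adjective_feminine": 0.47,
    "long_words": 0.41,
    "type_token_ratio": 0.38,
    "copula_ser": 0.33,
    "noun_singular": -0.44,
    "lexical_density": -0.36,
    "infinitive": 0.12,
}
print(interpretable_poles(example_factor))
```

Under these assumptions, a pole with only two salient features is reported but not treated as a discourse type of its own.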

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past; it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is dissociated from important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety rather than a small subset of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11
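The two measures defined in Notes 4 and 5 can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the part-of-speech tag names, the function names, and the choice of the population standard deviation are hypothetical, and the example words and tags are invented.

```python
from statistics import mean, pstdev

# Hypothetical tag names standing for the non-functor classes listed in Note 4.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}


def lexical_density(tagged_tokens):
    """Non-functor words divided by total words, for (word, tag) pairs."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_threshold(corpus_words):
    """Mean word length of the corpus plus one standard deviation (Note 5)."""
    lengths = [len(w) for w in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(words, threshold):
    """Words whose length is greater than the corpus-derived threshold."""
    return [w for w in words if len(w) > threshold]


# Invented tagged sentence and mini corpus, for illustration only.
tagged = [
    ("los", "DET"),
    ("jovenes", "NOUN"),
    ("tienen", "VERB"),
    ("bastantes", "ADJ"),
    ("oportunidades", "NOUN"),
]
words = [w for w, _ in tagged]
print(lexical_density(tagged))
print(long_words(words, long_word_threshold(words)))
```

Note that the threshold is computed once over the whole corpus and then applied to every text, so the same word counts as long regardless of which text it appears in.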

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World [Database],’ available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity,’ Technical Report 17. University of Hawai‘i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer-assisted language learning.



chain utterances to achieve discursive organization. Geyer reports that advanced learners use coordinate and adverbial conjunctions to make distinctions such as foreground/background and cohesive propositional segmenting.

The research on the acquisition of Spanish’s two copulas, ser and estar, shows how learners associate lexical and grammatical features with particular discourse types. Cheng et al. (2008), examining a corpus of Mandarin Chinese L1 learners of Spanish, report that exploratory writing led to more copula estar + adjective usage. They also noted that this structure was mostly associated with narratives and descriptions. Collentine and Asención-Delaney (2010), in their corpus-based study of the Spanish copula ser + adjective and estar + adjective, report that overall, when learners begin mastering simple discourse types, lexical, syntactic, and morphological diversity increases and complexity increases.¹ They found that ser + adjective appears in descriptive and evaluative discourse, where lexical and morphological diversity and complexity occur. However, estar + adjective is present in narrations, descriptions, and hypothetical discourse, where nonetheless surprisingly little linguistic complexity occurs.

Other discourse-related research has studied L2 discourse markers. Upton and Connor (2001) use corpus techniques to compare formulaics in ‘application letters’ by L1 speakers of (American) English and L2 learners of English from Belgium and Finland. The L2 writers were much less formulaic than the L1 writers with politeness strategies due to L1 influence. Forsberg (2005) uses an L2 French learner corpus from L1 Swedish speakers to study different types of idioms (e.g. grammatical, pragmatic, purely lexical idioms). Forsberg found that lexical idioms increase in writing over time. She also found that formulaics with a discursive role were ‘overused’ by learners of all levels.

SLA corpus research on the development of structural and lexical phenomena

Corpus-based L2 research has helped us to understand how learners map form to function.² It also elucidates the role of L2 formulaic segments. It has provided important insights into the interaction of grammar and the lexicon. What is consistent within this research is that L2 lexical phenomena and grammatical constructs work in tandem. Finally, corpus-based L2 research has explored ways to efficiently measure L2 syntactic complexity.

Regarding the mapping of form to function, Klein and Perdue (1997) used a learner corpus to posit that uninstructed settings generate a ‘basic variety’, where word order and grammatical constructs result from functional communicative necessities. Granger et al. (2002) and Belz (2004) use corpus techniques to document how social and institutional pressures affect new L2 phenomena (e.g. da-compounds such as dazu ‘there-to’, davon ‘there-from’, darin ‘there-in’).

Some investigators conceptualize formulaic segments (e.g. in other words, for example) as singular dictionary entries (cf. Ellis 2006). Using the British National Corpus (BNC Consortium 2001), Ellis et al. (2008) found that L2 learners at early stages of acquisition recognize formulaics based on their frequency in the input. Interestingly, however, Ellis et al. (2008) argue that native-like use of formulaics ultimately requires learners to alter their processing of words ‘for coherence, for co-occurrence greater than chance’ (p. 391).

Regarding the interaction between morphology and the lexicon, Marsden and David (2008) study vocabulary development among L2 learners of Spanish at ages 9 and 13. The 13-year-old learners, who had 450 more hours of classroom instruction, showed greater lexical and inflectional diversity. Their analysis supports the notion that, in inflectionally rich languages like Spanish, an important indicator of development is not only accuracy, as measured by grammatical errors, but also increased inflectional/derivational variation (Howard 2002, 2006; Collentine 2009). Their research also indicates that a key factor in differentiating levels of development is changes in the parts of speech (e.g. nouns, verbs, adjectives) that predominate in production. Marsden and David (2008) present evidence suggesting that, as learners progress in speech, they produce more verbs than nouns and that, as they begin to produce more verbs, they also start to produce more adjectives. Since learners produce different discourse types as they progress, it is logical to surmise that these changes in lexical and grammatical production parallel changes in the discourse learners produce.

Two research projects have used corpus techniques to study the emergence of linguistic complexity and semantic density. Operationalizing ‘complexity’ and ‘density’ in corpus-based studies is challenging but important. Ortega (2000) uses a corpus of intermediate-level L2-Spanish learners’ writing to identify reliable measures of syntactic complexity. Her analysis indicated that clause length, phrasal elaboration, and amount of subordination best predict syntactic complexity. Collentine (2004) employs corpus techniques to compare morphological and lexical complexity in an in-class learning context and a study-abroad context. After a semester, the study-abroad learners employed more morphological narrative complexity by using past-tense verbs, third-person morphology, past participles, and present participles (as well as public verbs, e.g. decir que ‘to say that’). From a learner’s perspective, morphological complexity in Spanish requires the use of a range of aspectual, person, number, and gender inflections beyond simple verb tenses (e.g. the present) and other unmarked morphemes (e.g. masculine-singular nouns/adjectives; cf. Collentine 2004: 237). The in-class group was more lexically complex, producing a higher concentration of nominal features (nouns and adjectives) and so semantically dense discourse (Biber 1988).

Finally, Grant and Ginther (2000) urge corpus-based SLA researchers to combine qualitative with quantitative techniques to increase generalizability of results. Many approaches (e.g. part-of-speech tagging) do not generally factor in errors, and small corpora bias from sampling particular tasks. While we do not report error rates or omissions in this study, we are confident that our multidimensional approach and its mixed-method research design (i.e. quantitative + qualitative analyses), together with the size of our corpus, address Grant and Ginther’s (2000) concerns.

The multidimensional analysis and relevant corpus linguistic techniques

Multidimensional corpus analysis shows how lexical and grammatical features bundle together to produce different and new types of discourse (Biber and Conrad 2001). It combines technological tools, exploratory factor analysis, and qualitative analysis of texts to determine which lexical and grammatical features reliably co-occur in a large corpus. Using a large collection of texts coupled with powerful statistical tools introduces fewer reliability threats than small-scale studies, where participants’ particular characteristics or raters’ judgement heavily influence results.

Biber et al. (2006) provide the first multidimensional analysis of native-speaker Spanish. They analyze a 20-million-word Spanish corpus (4049 texts) with written and oral data representing 19 registers (e.g. face-to-face conversation, business letters). The analysis uncovered both well-known discourse types like narratives as well as new ones. For example, the following characterize ‘informationally rich’ discourse in Spanish: nouns, adjectives, definite articles, prepositions, derived nouns, type-token ratio, long words (i.e. multisyllabic words), and ergative se constructions. Unique to Spanish is also hypothetical discourse, containing a concentration of structures such as the conditional and the subjunctive, as well as future verb forms, verbs of obligation and causation (e.g. dejar, permitir, hacer + infinitive), and the conjunction que.

Parodi (2007) used a 25-million-word corpus of native-speaker Spanish to study the differences between written and spoken Spanish. His analysis complements Biber et al. (2006) in that it reveals how lexical and grammatical features cluster together based on whether the register is context dependent (i.e. the interpretation of important features depends on the ‘speech situation’), written, academic, commissive in nature (in the pragmatic sense), attitudinal, or informational in focus. Additionally, Parodi provides a more refined definition (although based on many fewer tokens) of what constitutes Spanish narrative discourse than Biber et al. (2006); interestingly, Parodi’s analysis indicates that English and Spanish share many of the same narrative features.

We present the first-known multidimensional analysis of L2 Spanish, studying how learners use various lexical and grammatical phenomena to generate IL-specific discourse types. We do not make a priori assumptions about which of these phenomena work in tandem, nor do we assume that learner discourse types are the same as those of native speakers. Our analysis serves as a first

Y ASENCION-DELANEY AND J COLLENTINE 5

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

step towards characterizing learner discourse at the second and third years ofSpanish L2 instruction

Research questions

We characterize Spanish learner discourse via a multidimensional analysis of a corpus of L2 Spanish generated by second- and third-year university-level learners. Specifically, the study addresses the following research questions:

1. How do lexical and grammatical phenomena cluster together in L2 Spanish writing?

2. What types of discourses (e.g. narrative, descriptive, hypothetical) surround the concentration of these lexical and grammatical features?

METHOD

Corpus description

We used a 202,241-word corpus of written Spanish comprising edited and non-edited compositions collected from English-speaking Spanish learners at the second-year (109,224 words) and third-year (93,017 words) levels, in which more variety of texts could be collected due to more exposure to language instruction. To estimate the written proficiency of each level of instruction based on the ACTFL Writing Proficiency Scale (ACTFL 2001), the researchers selected a random sample of 50 entire documents from each of the two instructional levels (N = 100). After a training session on working with the scale, the researchers rated the samples independently. Each level on the scale was assigned a numerical value, with 0 representing a novice low and 9 a superior rating. We estimated the inter-rater reliability of the subjects’ proficiency ratings with a Pearson correlation, since the datasets represented interval scales [r(df = 98) = 0.97, p < 0.01]. The second-year learners wrote at the intermediate high level and the third-year learners at the advanced low level. This suggests that while the third-year learners generally narrated and produced a limited number of cohesive devices as well as a variety of complex syntactic structures, the second-year learners were beginning to produce narrative structure, although their control of the verbal constructs was still developing. The second-year learners also produced few cohesive devices and limited subordination.
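As an illustration of the reliability estimate described above, the following minimal sketch computes a Pearson correlation between two raters' scores on the 0-9 scale. The function name and the eight ratings are hypothetical and are not the study's data; the degrees of freedom follow the usual n - 2 convention, which is consistent with df = 98 for N = 100 rated documents.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length rating lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 ratings")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and sums of squares around each rater's mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical ratings (0 = novice low ... 9 = superior), one pair per document
rater_a = [5, 6, 4, 7, 6, 5, 8, 6]
rater_b = [5, 6, 5, 7, 6, 4, 8, 6]

r = pearson_r(rater_a, rater_b)
df = len(rater_a) - 2  # degrees of freedom for a correlation
print(f"r(df = {df}) = {r:.2f}")
```

A coefficient close to 1.00 indicates that the two raters ordered the documents almost identically, which is the sense in which the reported value supports the proficiency estimates.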

The corpus comprises writing samples used for course assessment purposes: letters, narratives, descriptions, summaries, and argumentative essays, written both in and out of class as well as on exams. Given the study’s exploratory and descriptive nature, we did not control the type of tasks or topics within the corpus. Topics related to textbook themes (e.g. family, childhood) and cultural readings.


Procedures: tagging, searching, and norming

Understanding multidimensional analyses requires knowledge of three key concepts: part-of-speech tagging, search patterns, and statistical techniques for eliminating various biases in how tokens are counted.

To access information about particular texts, each file includes a header with information about its topic, source, type, author biographical information, and purpose (argumentative essay, narrative). To search for morphosyntactic information (e.g. all adjectives, all subjunctive forms), one needs a part-of-speech tagger: software that annotates every word with information about its major word class (e.g. adjective, noun, verb, determiner), basic morphological information (e.g. plural, preterit), as well as its lemma (i.e. its unmarked dictionary root, such as a verb’s infinitive or a noun’s masculine singular form).

Part-of-speech tagging requires a dictionary with lexical and grammatical information. It also requires a pretagged corpus to train the software routines to determine unknown or ambiguous words’ probable tags. We compiled our own dictionary, utilized a training set from samples from the Corpus del español (Biber et al. 2006), and tagged the corpus with n-gram software routines from the Natural Language Tool Kit (NLTK; http://www.nltk.org). Additionally, we wrote Spanish-specific routines (e.g. clitic sequences, derivational morphemes) to complement the NLTK routines and achieve greater tagging precision. After the corpus is tagged in this way, the investigator must verify the accuracy of the tagging and fix errors through further programming.

We studied 78 lexical, grammatical, and lexico-grammatical features.3 The features involved all parts of speech, common morphosyntactic constructs studied by learners, as well as additional constructs studied in Biber et al. (2006). They represented adjectives (e.g. derived, postnominal position), nouns (e.g. derived, feminine), adverbs (e.g. place, time), verb classes (e.g. imperfect aspect, past participle), verb phrases (e.g. communication, knowledge), and certain morphosyntactic features like dependent clauses, noun phrase configurations (e.g. article plus noun), and pronoun usage (e.g. third-person clitics).

Textual frequencies often require two mathematical conversions. First, norming transforms a phenomenon’s count to its normed frequency. Since text lengths vary, longer texts inflate certain items’ importance. To offset this text-length bias, one scales a phenomenon’s frequency per text, such as per 1,000 words. Second, normalizing eliminates the feature-concentration bias: some phenomena are naturally scarce in a document (e.g. the subjunctive), while others are naturally common (e.g. articles) (cf. Biber and Conrad 2001). Normalizing converts a feature’s normed frequency in each document to a z-score relative to that feature’s normed frequencies across documents. Consequently, one can measure the relative presence of two or more linguistic features within any given text. Additionally, one can sum various z-scores to determine how concentrated a set of features is in any text or group of texts.
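The two conversions can be sketched as follows. This is a minimal illustration rather than the routines used in the study: the three texts and their counts are invented, the function names are hypothetical, and the use of the population standard deviation for the z-score is an assumption (the sample standard deviation would be an equally defensible choice).

```python
from statistics import mean, pstdev

def normed_frequency(count, text_length, per=1000):
    """Norming: rescale a raw count to a rate per `per` words."""
    return count * per / text_length

def z_scores(values):
    """Normalizing: express each value relative to the mean and spread
    of the same feature across all texts (population SD is an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical raw counts for two features in three texts of different lengths
texts = {
    "essay_01": {"words": 250, "VRB_PRET": 10, "DEFART": 30},
    "essay_02": {"words": 500, "VRB_PRET": 5, "DEFART": 70},
    "essay_03": {"words": 1000, "VRB_PRET": 20, "DEFART": 100},
}
features = ["VRB_PRET", "DEFART"]

# Step 1: remove the text-length bias
normed = {
    f: [normed_frequency(t[f], t["words"]) for t in texts.values()]
    for f in features
}
# Step 2: remove the feature-concentration bias
standardized = {f: z_scores(v) for f, v in normed.items()}

for name, row in zip(texts, zip(*(standardized[f] for f in features))):
    print(name, [round(z, 2) for z in row])
```

After the second step a scarce feature (here the preterit) and a common one (definite articles) are on the same scale, so their z-scores can be compared or summed within a text.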


Factor analysis

To determine how learners cluster lexical and grammatical features, we employ an exploratory principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions, also referred to as factors, along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus, and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus.

The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary superset factor, according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relative to the others. The examination of factors’ eigenvalues in scree plots of each factor’s total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature’s importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying in absolute value from 0.00, or no correlation, to 1.00, or absolute correlation. A dimension with two significant clusters will have two sets of loadings, a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster’s sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster’s communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the 0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high type-token ratio) represented semantically dense, literate discourse and the negative features (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse.
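The cutoff rule just described can be expressed compactly. The sketch below is illustrative only: the function name is hypothetical, and the loadings are invented values attached to feature codes of the kind listed in Table 1, not results from this study. It splits one factor's loadings into a positive and a negative cluster at an absolute cutoff of 0.30 and checks each cluster against the five-feature minimum.

```python
def salient_clusters(loadings, cutoff=0.30, min_features=5):
    """Split one factor's loadings into positive and negative clusters,
    keeping only features whose absolute loading meets the cutoff."""
    positive = {f: l for f, l in loadings.items() if l >= cutoff}
    negative = {f: l for f, l in loadings.items() if l <= -cutoff}
    return {
        "positive": positive,
        "negative": negative,
        # A cluster is treated as interpretable only with enough salient features
        "positive_interpretable": len(positive) >= min_features,
        "negative_interpretable": len(negative) >= min_features,
    }

# Invented loadings for a single factor (feature code -> loading)
factor_loadings = {
    "VRB_3RD": 0.63, "QUE": 0.55, "SAB_CON": 0.43, "VRB_CONO": 0.42,
    "REL_NO_EN": 0.38, "ADV_TIME": 0.12,
    "VRB_IMP": -0.45, "VRB_PRGA": -0.42, "PREPS": -0.30,
}

result = salient_clusters(factor_loadings)
print(sorted(result["positive"], key=result["positive"].get, reverse=True))
print(result["positive_interpretable"], result["negative_interpretable"])
```

In this invented case the positive cluster has five salient features and would qualify as a candidate discourse type, whereas the negative cluster has only three and would at most indicate what such texts tend to lack.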

To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly ‘rotated’ (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method.


Table 1: Targeted linguistic features

Variable class               Variable
Noun                         Derived nouns (NOUN_DER)
                             Feminine nouns (NOUN_FEM)
                             Masculine nouns (NOUN_MSC)
                             Non-pluralizing nouns, e.g. virus ‘virus’ (NOUN_NPL)
                             Plural nouns (NOUN_PLR)
                             Singular nouns (NOUN_SNG)
Adjective                    Apocopated adjectives, e.g. buen ‘good’ (ADJ_SHRT)
                             Derived adjectives (ADJ_DERV)
                             Feminine adjectives (ADJ_FEM)
                             Four-inflection adjectives (ADJ_TYP1)
                             Masculine adjectives (ADJ_MSC)
                             Plural adjectives (ADJ_PLR)
                             Postmodifying adjectives (ADJ_POST)
                             Premodifying adjectives (ADJ_PRE)
                             Singular adjectives (ADJ_SNG)
                             Two-inflection adjectives (ADJ_TYP2)
Pronoun                      Third-person clitics (PRON_3RD)
                             Preverb clitics (CLT_PRE)
                             se + 3s verb, e.g. se come ‘one eats’ (SE_VRB3S)
                             Subject pronouns (PRON_SUB)
                             Unplanned se (SE_UNPLANNED)
Other noun phrase elements   Article + noun segments (ART_NOUN)
                             Definite articles (DEFART)
                             Possessive adjectives (DET_POSS)
                             Possessive syntax (WO_POSS)
                             Present participle modifying noun, e.g. agua hirviendo ‘boiling water’ (PS_NOUN)
Verbs                        Third-person verb (VRB_3RD)
                             Conditional (VRB_COND)
                             Copula plus adjective (not in other predicate categories) (COP_ADJ)
                             Evaluative predicates (PRD_EVAL)
                             Future (VRB_FUT)
                             Gustar-like verbs (VRB_GULI)
                             Imperfect (VRB_IMP)
                             Infinitive (VRB_INF)
                             Infinitive not preceded by verb or article (VRB_INVA)
                             Past participle (VRB_PS)
                             Past perfect (VRB_PPRF)
                             Past subjunctive (VRB_PSBJ)
                             Perfect tense verb (VRB_PRFC)
                             Periphrastic future, e.g. voy a comer ‘I’m going to eat’ (VRB_PERI)
                             Predicates of conclusions, e.g. es obvio ‘it is obvious’ (PRD_CONC)
                             Predicates of probability (PRD_PROB)
                             Present participle (VRB_PP)
                             Preterit (VRB_PRET)
                             Progressive aspect (VRB_PRGA)
                             Saber/conocer (SAB_CON)
                             Suasive predicates (PRD_SUAS)
                             Suasive verbs (VRB_SUAS)
                             Subjunctive (VRB_SBJ)
                             Verbs of communication/reporting (VRB_COM)
                             Verbs of conclusions (VRB_CONC)
                             Verbs of knowledge/cognitive verbs (VRB_CONO)
                             Verbs of observation (VRB_OBS)
                             Verbs of probability (VRB_PROB)
Adverbs                      Adverbs of time (ADV_TIME)
                             Adverbs of intensity (ADV_INTS)
                             Adverbs of manner (ADV_MAN)
                             Adverbs of place (ADV_PLC)
                             Adverbs of probability (ADV_PROB)
Miscellaneous                Comparatives (COMPARE)
                             Conjunctions followed by finite verb (CONJ_VRB)
                             Irregular comparatives (COMP_IRR)
                             Lexical density ratio (DEN_TOK)4
                             Long words (LONG)5
                             Prepositions (PREPS)
                             Que subordinator (QUE)
                             Type-token ratio (TYP_TOK)


A factor’s meaning requires interpretation: what linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums the z-scores of a cluster’s significant loadings in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al.’s (2006) and other researchers’ (e.g. Longacre 1983) descriptions of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for the distribution of words in the corpus by discourse type).
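The summing step lends itself to a short sketch. Everything in it is hypothetical: the text labels, the z-scores, and the function names are invented for illustration, and the cluster is reduced to three features. Each text's score is the sum of its z-scores for the cluster's features, and texts are then ranked so that the most representative ones can be pulled for qualitative reading.

```python
def cluster_score(text_z, cluster_features):
    """Sum a text's z-scores over the features that define a cluster."""
    return sum(text_z[f] for f in cluster_features)

def rank_texts(z_by_text, cluster_features):
    """Order texts from most to least representative of the cluster."""
    scores = {
        name: cluster_score(z, cluster_features)
        for name, z in z_by_text.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented z-scores per text for three features of a hypothetical cluster
z_by_text = {
    "summary_12": {"VRB_IMP": 0.5, "VRB_PRET": 0.25, "PRON_3RD": -0.25},
    "narrative_07": {"VRB_IMP": 1.5, "VRB_PRET": 2.0, "PRON_3RD": 0.5},
    "essay_03": {"VRB_IMP": -1.0, "VRB_PRET": -0.5, "PRON_3RD": -1.5},
}
cluster = ["VRB_IMP", "VRB_PRET", "PRON_3RD"]

ranking = rank_texts(z_by_text, cluster)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Texts at the top of such a ranking are the ones a researcher would read closely when deciding what communicative function the cluster serves; texts with strongly negative sums show what the cluster is not.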

RESULTS

The following section summarizes the findings of the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering in its positive and negative loading sets, along with their communicative functions. Finally, differences between second- and third-year learners in each dimension are addressed.

Variable class               Variable
Adverbial clauses            Adverbial conjunctions of mode, e.g. lo hice según me dijeron ‘I did it the way they told me’ (ADVCLS_M)
                             Adverbial conjunctions of place, e.g. la casa donde vivo ‘the house where I live’ (ADVCLS_L)
                             Adverbial clauses of time (ADVCLS_T)
                             Causal adverbial conjunctions (ADVCLS_C)
                             If clauses (ADVCLS_S)
                             Purpose adverbial clauses, i.e. para que ‘so that’ (ADVCLS_P)
Adjective clauses            Relative pronouns with pre-posed preposition (REL_EN)
                             Relative pronouns without pre-posed preposition (REL_NO_EN)
                             Relative clauses on subjects, e.g. la casa que está allá ‘the house that is there’ (REL_SUBJ)
Nominal clauses              Nominal clauses, non-subjunctive (NOM_IND)
                             Nominal clauses, subjunctive (NOM_SUBJ)


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors, with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factors   Eigenvalues   Percent of variance   Cumulative (%)
1         4.459         5.945                 5.945
2         3.691         4.922                 10.867
3         2.697         3.596                 14.463
4         2.561         3.415                 17.877
5         1.789         2.386                 20.263
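The cumulative column of Table 2 is a running sum of the per-factor percentages. The short check below assumes the table's figures are read with three decimal places (e.g. 5.945 percent for the first factor); the variable names are ours. Recomputing from the rounded percentages can differ from the published cumulative values in the last digit, presumably because the published column was accumulated before rounding.

```python
from itertools import accumulate

# Percent of variance per factor, read from Table 2 (three-decimal reading assumed)
percent_of_variance = [5.945, 4.922, 3.596, 3.415, 2.386]
# Cumulative column as published in Table 2
published_cumulative = [5.945, 10.867, 14.463, 17.877, 20.263]

# Running sum of the per-factor percentages
recomputed = [round(c, 3) for c in accumulate(percent_of_variance)]

for factor, (ours, theirs) in enumerate(zip(recomputed, published_cumulative), start=1):
    print(f"Factor {factor}: recomputed {ours:.3f}, published {theirs:.3f}")
```

The final cumulative value corresponds to the roughly 20.2% of shared variance attributed to the five-factor solution.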


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Factor                        1                  2                  3                  4                  5
Positive cluster (loadings)   VRB_IMP (0.56)     ADJ_DERV (0.71)    VRB_3RD (0.63)     PREPS (0.53)       VRB_INVA (0.36)
                              VRB_PRGA (0.55)    ADJ_PLR (0.59)     QUE (0.55)         VRB_INVA (0.52)    VRB_INF (0.35)
                              VRB_PRET (0.53)    ADJ_SNG (0.52)     SAB_CON (0.43)     VRB_INF (0.49)
                              VRB_SBJ (0.42)     ADJ_TYP1 (0.51)    VRB_CONO (0.42)    ADJ_PLR (0.31)
                              PRON_3RD (0.35)    ADJ_MSC (0.47)     REL_NO_EN (0.38)
                              VRB_CONO (0.35)    TYP_TOK (0.44)     VRB_PROB (0.30)
                              VRB_PSBJ (0.34)    ADJ_TYP2 (0.43)
                              VRB_COND (0.33)    ADJ_POST (0.41)
                              CLT_PRE (0.33)     ADJ_FEM (0.35)
                              SAB_CON (0.30)     LONG (0.30)
Negative cluster (loadings)   ART_NOUN (-0.65)   NOUN_SNG (-0.40)   VRB_IMP (-0.45)    ADJ_SNG (-0.59)    DEFART (-0.53)
                              NOUN_FEM (-0.56)   DEN_TOK (-0.38)    VRB_PRGA (-0.42)   ADJ_MSC (-0.44)    ADJ_PLR (-0.40)
                              NOUN_DER (-0.53)                      PREPS (-0.30)
                              NOUN_PLR (-0.51)
                              NOUN_MSC (-0.36)
                              ADJ_TYP1 (-0.36)
                              ADJ_POST (-0.35)
                              DEFART (-0.31)
                              NOUN_SNG (-0.31)
                              ADJ_FEM (-0.30)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room everything seemed normal. There was no cloud that could be seen in the sky, then after making this observation I continued with my plan to leave my bedroom. I started to run around the designated trails and, as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those with the highest scores of the opposing sign are antithetical to the cluster’s function.

Interestingly, not only narrative texts but also argumentative essays and summaries have high positive scores in dimension 1. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolomé thought that his culture from Spain would be more advanced than the Mayan. Ultimately this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. They averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9); the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p = 0.00].
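The F statistics reported for the comparisons between instructional levels are consistent with a one-way analysis of variance over two groups, in which case the degrees of freedom (1, 634) would imply 636 scored texts. The sketch below shows how such a statistic is computed; the function name and the six scores are invented for illustration, and nothing here reproduces the study's data or software.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented summed z-scores for a handful of texts at each level
second_year = [1.0, 2.0, 3.0]
third_year = [3.0, 4.0, 5.0]

f_stat, df_b, df_w = one_way_anova([second_year, third_year])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With two groups the between-groups degrees of freedom are always 1, and the within-groups degrees of freedom equal the number of texts minus 2, which is how the bracketed values in the text should be read.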

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type-token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth, it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then in the American mind the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p = 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactic (i.e. attributive and predicative) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type-token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities and others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p = 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold at or above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.

DISCUSSION AND CONCLUSIONS

The development of learners' interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners' lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners' interlanguage could be explained by the five factors, while Biber et al.'s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus' variance. Biber et al.'s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2 and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity since, in Biber's (2001) study of English texts and in Parodi's (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners' narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners' personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners' tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners' discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners' verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study's limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1. Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language's inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2. Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O'Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3. A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word's lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si 'if', articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4. Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5. Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 4.6 + SD 4.9 = 10.5–11.
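The two measures defined in Notes 4 and 5 can be sketched in a few lines. This is only an illustration under stated assumptions: the tag names in `CONTENT_TAGS` are hypothetical (the study's own tagset is not reproduced here), the sample tokens are invented, and the population standard deviation is used because the notes do not say which variant was applied.

```python
from statistics import mean, pstdev

# Hypothetical tags standing in for "non-functor" word classes
# (nouns, adjectives, verbs, derived adverbs).
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV_DERIVED"}


def lexical_density(tagged_tokens):
    """Non-functor words divided by total words in one text."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_threshold(corpus_words):
    """Mean word length of the corpus plus one standard deviation."""
    lengths = [len(word) for word in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(words, threshold):
    """Words strictly longer than the threshold."""
    return [word for word in words if len(word) > threshold]


# Invented example text, tagged by hand for illustration.
tagged = [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"),
          ("muy", "ADV"), ("bonita", "ADJ")]
density = lexical_density(tagged)

corpus = ["la", "casa", "es", "muy", "bonita", "internacionalmente"]
threshold = long_word_threshold(corpus)
found = long_words(corpus, threshold)
print(density, round(threshold, 2), found)
```

Because the threshold is derived from the corpus itself, what counts as a long word shifts with the learners' overall vocabulary rather than being fixed in advance.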

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. 'ACTFL Proficiency Guidelines – Writing,' available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. 'Learner corpus analysis and the development of foreign language proficiency,' System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. 'On the complexity of discourse complexity: A multidimensional analysis' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. 'Introduction: Multidimensional analysis and the study of register variation' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. 'Spoken and written register variation in Spanish: A multi-dimensional analysis,' Corpora 1: 1–37.

Boulton, A. 2009. 'Testing the limits of data-driven learning: Language proficiency and training,' ReCALL 21/1: 37–54.

British National Consortium. 2001. 'BNC World [Database],' available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. 'Theoretical bases of communicative approaches to second language teaching and testing,' Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. 'Ambigüedad y variación del pronombre personal sujeto' in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. 'The uses of Spanish copulas by Chinese-speaking learners in a free writing task,' Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. 'The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,' Hispania 78/1: 123–36.

Collentine, J. 2004. 'The effects of learning contexts on morphosyntactic and lexical development,' Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. 'Study abroad research: Findings, implications and future directions' in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. 'A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,' Language Learning 60/2: 409–45.

Ellis, N. C. 2002. 'Reflections on frequency effects in language processing,' Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. 'Language acquisition as rational contingency learning,' Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. 'Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,' TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ''Ready-to-speak': Prefabricated sequences in spoken French as a second language and as a native language ['Prêt-à-parler': séquences préfabriquées en français parlé L2 et L1],' Cahiers De l'Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. 'Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,' Language Learning 57/3: 337–67.

Givón, T. 1985. 'Function, structure, and language acquisition' in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. 'Using computer-tagged linguistic features to describe L2 writing differences,' Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. 'Tense, aspect and the passive voice in L1 and L2 academic texts,' Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. 'Prototypical and non-prototypical marking in the advanced learner's aspectuo-temporal system,' EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. 'Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,' Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. 'Four perspectives on L2 pragmatic development,' Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. 'The basic variety (Or: Couldn't natural languages be much simpler?),' Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. 'Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,' Language Learning Journal 36/2: 181–98.

Myles, F. 2005. 'Interlanguage corpora and second language acquisition research,' Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. 'Using information technology to support empirical SLA research,' Journal of Applied Linguistics 1/2: 169–96.

O'Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. 'Variation across registers in Spanish: Exploring the El Grial PUCV Corpus' in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. 'Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,' Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. 'Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,' English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. 'Verbs of cognition in spoken Spanish: A discourse profile' in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. 'Second language development in writing: measures of fluency, accuracy, and complexity,' Technical Report 17. University of Hawai'i, Second Language Teaching and Curriculum Center.

NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer-assisted language learning.

Some investigators conceptualize formulaic segments (e.g. in other words, for example) as singular dictionary entries (cf. Ellis 2006). Using the British National Corpus (BNC Consortium 2001), Ellis et al. (2008) found that L2 learners at early stages of acquisition recognize formulaics based on their frequency in the input. Interestingly, however, Ellis et al. (2008) argue that native-like use of formulaics ultimately requires learners to alter their processing of words 'for coherence, for co-occurrence greater than chance' (p. 391).

Regarding the interaction between morphology and the lexicon, Marsden and David (2008) study vocabulary development among L2 learners of Spanish at ages 9 and 13. The 13-year-old learners, who had 450 more hours of classroom instruction, showed greater lexical and inflectional diversity. Their analysis supports the notion that, in inflectionally rich languages like Spanish, an important indicator of development is not only accuracy, as measured by grammatical errors, but also increased inflectional/derivational variation (Howard 2002, 2006; Collentine 2009). Their research also indicates that a key factor in differentiating levels of development is change in the parts of speech (e.g. nouns, verbs, adjectives) that predominate in production. Marsden and David (2008) present evidence suggesting that as learners progress in speech, they produce more verbs than nouns, and that as they begin to produce more verbs, they also start to produce more adjectives. Since learners produce different discourse types as they progress, it is logical to surmise that these changes in lexical and grammatical production parallel changes in the discourse learners produce.

Two research projects have used corpus techniques to study the emergence of linguistic complexity and semantic density. Operationalizing 'complexity' and 'density' in corpus-based studies is challenging but important. Ortega (2000) uses a corpus of intermediate-level L2-Spanish learners' writing to identify reliable measures of syntactic complexity. Her analysis indicated that clause length, phrasal elaboration, and amount of subordination best predict syntactic complexity. Collentine (2004) employs corpus techniques to compare morphological and lexical complexity in an in-class learning context and a study-abroad context. After a semester, the study-abroad learners employed more morphological narrative complexity by using past-tense verbs, third-person morphology, past participles, and present participles (as well as public verbs, e.g. decir que 'to say that'). From a learner's perspective, morphological complexity in Spanish requires the use of a range of aspectual, person/number, and gender inflections beyond simple verb tenses (e.g. the present) and other unmarked morphemes (e.g. masculine-singular nouns/adjectives; cf. Collentine 2004: 237). The in-class group was more lexically complex, producing a higher concentration of nominal features (nouns and adjectives) and so semantically dense discourse (Biber 1988).

Finally, Grant and Ginther (2000) urge corpus-based SLA researchers to combine qualitative with quantitative techniques to increase generalizability of results. Many approaches (e.g. part-of-speech tagging) do not generally factor in errors, and small corpora introduce bias from sampling particular tasks. While we do not report error rates or omissions in this study, we are confident that our multidimensional approach and its mixed-method research design (i.e. quantitative + qualitative analyses), together with the size of our corpus, address Grant and Ginther's (2000) concerns.

The multidimensional analysis and relevant corpus linguistic techniques

Multidimensional corpus analysis shows how lexical and grammatical features bundle together to produce different and new types of discourse (Biber and Conrad 2001). It combines technological tools, exploratory factor analysis, and qualitative analysis of texts to determine which lexical and grammatical features reliably co-occur in a large corpus. Using a large collection of texts coupled with powerful statistical tools introduces fewer reliability threats than small-scale studies, where participants' particular characteristics or raters' judgement heavily influence results.

Biber et al. (2006) provide the first multidimensional analysis of native-speaker Spanish. They analyze a 20-million-word Spanish corpus (4,049 texts) with written and oral data representing 19 registers (e.g. face-to-face conversation, business letters). The analysis uncovered both well-known discourse types, like narratives, as well as new ones. For example, the following characterize 'informationally rich' discourse in Spanish: nouns, adjectives, definite articles, prepositions, derived nouns, type-token ratio, long words (i.e. multisyllabic words), and ergative se constructions. Unique to Spanish is also hypothetical discourse, containing a concentration of structures such as the conditional and the subjunctive, as well as future verb forms, verbs of obligation and causation (e.g. dejar, permitir, hacer + infinitive), and the conjunction que.

Parodi (2007) used a 25-million-word corpus of native-speaker Spanish to study the differences between written and spoken Spanish. His analysis complements Biber et al. (2006) in that it reveals how lexical and grammatical features cluster together based on whether the register is context dependent (i.e. the interpretation of important features depends on the 'speech situation'), written academic, commissive in nature (in the pragmatic sense), attitudinal, or informational in focus. Additionally, Parodi provides a more refined definition (although based on many fewer tokens) of what constitutes Spanish narrative discourse than Biber et al. (2006); interestingly, Parodi's analysis indicates that English and Spanish share many of the same narrative features.

We present the first-known multidimensional analysis of L2 Spanish, studying how learners use various lexical and grammatical phenomena to generate IL-specific discourse types. We do not make a priori assumptions about which of these phenomena work in tandem, nor do we assume that learner discourse types are the same as those of native speakers. Our analysis serves as a first step towards characterizing learner discourse at the second and third years of Spanish L2 instruction.

Research questions

We characterize Spanish learner discourse via a multidimensional analysis of a corpus of L2 Spanish generated by second- and third-year, university-level learners. Specifically, the study addresses the following research questions:

1. How do lexical and grammatical phenomena cluster together in L2 Spanish writing?

2. What types of discourses (e.g. narrative, descriptive, hypothetical) surround the concentration of these lexical and grammatical features?

METHOD

Corpus description

We used a 202,241-word corpus of written Spanish comprising edited and non-edited compositions collected from English-speaking Spanish learners at the second-year (109,224 words) and third-year (93,017 words) levels, in which more variety of texts could be collected due to more exposure to language instruction. To estimate the written proficiency of each level of instruction based on the ACTFL Writing Proficiency Scale (ACTFL 2001), the researchers selected a random sample of 50 entire documents from each of the two instructional levels (N = 100). After a training session on working with the scale, the researchers rated the samples independently. Each level on the scale was assigned a numerical value, with 0 representing a novice-low and 9 a superior rating. We estimated the inter-rater reliability of the subjects' proficiency ratings with a Pearson correlation, since the datasets represented interval scales [r(df = 98) = 0.97, p < .001]. The second-year learners wrote at the intermediate-high level and the third-year learners at the advanced-low level. This suggests that, while the third-year learners generally narrated and produced a limited number of cohesive devices as well as a variety of complex syntactic structures, the second-year learners were beginning to produce narrative structure, although their control of the verbal constructs was still developing. The second-year learners also produced few cohesive devices and limited subordination.
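The inter-rater check described above can be illustrated with a small Pearson correlation routine. This is a sketch only: the two rating lists are invented, use the 0 to 9 numerical coding of the scale mentioned in the text, and `pearson_r` is a hypothetical helper rather than the authors' software. The degrees of freedom for a Pearson correlation are the number of rated documents minus two, so 100 documents give df = 98.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with 2+ items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and the two sums of squares.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical ratings of six documents by two raters (0 = novice low,
# 9 = superior); illustration only.
rater_a = [5, 6, 6, 7, 5, 8]
rater_b = [5, 6, 7, 7, 5, 8]
r = pearson_r(rater_a, rater_b)
df = len(rater_a) - 2
print(f"r(df = {df}) = {r:.2f}")
```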

The corpus comprises writing samples used for course assessment purposes: letters, narratives, descriptions, summaries, and argumentative essays, both in and out of class as well as on exams. Given the study's exploratory and descriptive nature, we did not control the type of tasks or topics within the corpus. Topics related to textbook themes (e.g. family, childhood) and cultural readings.

Procedures: tagging, searching, and norming

Understanding multidimensional analyses requires knowledge of three key concepts: part-of-speech tagging, search pattern, and statistical techniques for eliminating various biases in how tokens are counted.

To access information about particular texts, each file includes a header with information about its topic, source type, author, biographical information, and purpose (argumentative essay, narrative). To search for morphosyntactic information (e.g. all adjectives, all subjunctive forms), one needs a part-of-speech tagger, software that annotates every word with information about its major word classes (e.g. adjective, noun, verb, determiner), basic morphological information (e.g. plural, preterit), as well as its lemma (i.e. its unmarked dictionary root, such as a verb's infinitive or a noun's masculine singular form).

Part-of-speech tagging requires a dictionary with lexical and grammatical information. It also requires a pretagged corpus to train the software routines to determine unknown or ambiguous words' probable tags. We compiled our own dictionary, utilized a training set from samples from the Corpus del español (Biber et al. 2006), and tagged the corpus with n-gram software routines from the Natural Language Tool Kit (NLTK; http://www.nltk.org). Additionally, we wrote Spanish-specific routines (e.g. clitic sequences, derivational morphemes) to complement the NLTK routines to achieve greater tagging precision. After the corpus is tagged in this way, the investigator must verify the accuracy of the tagging and fix errors through further programming.

We studied 78 lexical, grammatical, and lexico-grammatical features (see Note 3). The features involved all parts of speech, common morphosyntactic constructs studied by learners, as well as additional constructs studied in Biber et al. (2006). They represented adjectives (e.g. derived, postnominal position), nouns (e.g. derived, feminine), adverbs (e.g. place, time), verb classes (e.g. imperfect aspect, past participle), verb phrases (e.g. communication, knowledge), and certain morphosyntactic features like dependent clauses, noun phrase configurations (e.g. article plus noun), and pronoun usage (e.g. clitic, third person).

Textual frequencies often require two mathematical conversions. First, norming transforms a phenomenon's count to its normed frequency. Since text lengths vary, longer texts inflate certain items' importance. To offset this text-length bias, one scales a phenomenon's frequency per text, such as per 1,000 words. Second, normalizing eliminates the feature-concentration bias: some phenomena are naturally scarce in a document (e.g. the subjunctive), while others are naturally common (e.g. articles) (cf. Biber and Conrad 2001). Normalizing converts a normed frequency to its z-score value vis-à-vis its normed frequency in each document. Consequently, one can measure the relative presence of two or more linguistic features within any given text. Additionally, one can sum various z-scores to determine how concentrated a set of features occur in any text or group of texts.
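The two conversions can be sketched as follows. The counts and text lengths are invented, the function names are hypothetical, and the z-score is taken here across the documents of the corpus using the population standard deviation, which is one reading of the description above rather than a confirmed detail of the authors' procedure.

```python
from statistics import mean, pstdev


def normed_frequency(count, text_length, per=1000):
    """Scale a raw count to a rate per `per` words of text."""
    return count * per / text_length


def z_scores(values):
    """Standardize values against their own mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A feature with identical rates everywhere carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# (raw count of one feature, words in the text) for three invented texts.
texts = [(2, 400), (3, 300), (12, 800)]
normed = [normed_frequency(count, length) for count, length in texts]
standardized = z_scores(normed)
print(normed)
print([round(z, 3) for z in standardized])
```

Note that the third text has the highest raw count largely because it is the longest; norming shows how much of that difference remains once length is taken into account, and the z-scores then put scarce and common features on a comparable scale.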

Factor analysis

To determine how learners cluster lexical and grammatical features, we employ an exploratory principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions, also referred to as factors, along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus, and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus.

The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary, superset factor according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relative to the others. The examination of factors' eigenvalues in scree plots of each factor's total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature's importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying from 0.00, or no correlation, to 1.00, or absolute correlation. A dimension with two significant clusters will have two sets of loadings, a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster's sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster's communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the 0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high type-token ratio) represented semantically dense, literate discourse and the negative features (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse.
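The five-loading rule at the 0.30 cutoff lends itself to a short illustration. In the sketch below the loadings are invented (only the feature labels are borrowed from Table 1), and both function names are hypothetical. The variance helper assumes the common convention for factor analyses of a correlation matrix, in which the share of total variance for a factor is its eigenvalue divided by the number of features; the source does not state how its percentages were computed.

```python
def significant_clusters(loadings, cutoff=0.30, min_features=5):
    """Split one factor's loadings into positive and negative clusters.

    A cluster is kept only if it has at least `min_features` loadings
    whose absolute value is at or above `cutoff`.
    """
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return {
        "positive": positive if len(positive) >= min_features else {},
        "negative": negative if len(negative) >= min_features else {},
    }


def proportion_of_variance(eigenvalue, n_features):
    """Share of total variance for one factor (correlation-matrix case)."""
    return eigenvalue / n_features


# Invented loadings for a single factor (illustration only).
factor = {
    "VRB_PRET": 0.62, "VRB_IMP": 0.55, "VRB_3RD": 0.48,
    "PRON_SUB": 0.41, "ADV_TIME": 0.33, "QUE": 0.12,
    "NOUN_DER": -0.45, "ADJ_DERV": -0.38, "LONG": -0.31,
}
clusters = significant_clusters(factor)
share = proportion_of_variance(7.8, 78)
print(sorted(clusters["positive"]), clusters["negative"], round(share, 2))
```

In this made-up factor the positive side passes the threshold with five features, while the negative side has only three qualifying loadings and is therefore not treated as a cluster.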

To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly 'rotated' (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method.

Table 1: Targeted linguistic features

Variable class: Variable

Noun:
  Derived nouns (NOUN_DER)
  Feminine nouns (NOUN_FEM)
  Masculine nouns (NOUN_MSC)
  Non-pluralizing nouns, e.g. virus 'virus' (NOUN_NPL)
  Plural nouns (NOUN_PLR)
  Singular nouns (NOUN_SNG)

Adjective:
  Apocopated adjectives, e.g. buen 'good' (ADJ_SHRT)
  Derived adjectives (ADJ_DERV)
  Feminine adjectives (ADJ_FEM)
  Four-inflection adjectives (ADJ_TYP1)
  Masculine adjectives (ADJ_MSC)
  Plural adjectives (ADJ_PLR)
  Postmodifying adjectives (ADJ_POST)
  Premodifying adjectives (ADJ_PRE)
  Singular adjectives (ADJ_SNG)
  Two-inflection adjectives (ADJ_TYP2)

Pronoun:
  Third-person clitics (PRON_3RD)
  Preverb clitics (CLT_PRE)
  se + 3s verb, e.g. se come 'one eats' (SE_VRB3S)
  Subject pronouns (PRON_SUB)
  Unplanned se (SE_UNPLANNED)

Other noun phrase elements:
  Article-noun segments (ART_NOUN)
  Definite articles (DEFART)
  Possessive adjectives (DET_POSS)
  Possessive syntax (WO_POSS)
  Present participle modified by noun, e.g. agua hirviendo 'boiling water' (PS_NOUN)

Verbs:
  Third-person verb (VRB_3RD)
  Conditional (VRB_COND)
  Copula plus adjective (not in other predicate categories) (COP_ADJ)
  Evaluative predicates (PRD_EVAL)
  Future (VRB_FUT)
  Gustar-like verbs (VRB_GULI)
  Imperfect (VRB_IMP)
  Infinitive (VRB_INF)
  Infinitive not preceded by verb or article (VRB_INVA)
  Past participle (VRB_PS)
  Past perfect (VRB_PPRF)
  Past subjunctive (VRB_PSBJ)
  Perfect tense verb (VRB_PRFC)
  Periphrastic future, e.g. voy a comer 'I'm going to eat' (VRB_PERI)
  Predicates of conclusions, e.g. es obvio 'it is obvious' (PRD_CONC)
  Predicates of probability (PRD_PROB)
  Present participle (VRB_PP)
  Preterit (VRB_PRET)
  Progressive aspect (VRB_PRGA)
  Saber/conocer (SAB_CON)
  Suasive predicates (PRD_SUAS)
  Suasive verbs (VRB_SUAS)
  Subjunctive (VRB_SBJ)
  Verbs of communication/reporting (VRB_COM)
  Verbs of conclusions (VRB_CONC)
  Verbs of knowledge/cognitive verbs (VRB_CONO)
  Verbs of observation (VRB_OBS)
  Verbs of probability (VRB_PROB)

Adverbs:
  Adverb of time (ADV_TIME)
  Adverbs of intensity (ADV_INTS)
  Adverbs of manner (ADV_MAN)
  Adverbs of place (ADV_PLC)
  Adverbs of probability (ADV_PROB)

Miscellaneous:
  Comparatives (COMPARE)
  Conjunctions followed by finite verb (CONJ_VRB)
  Irregular comparatives (COMP_IRR)
  Lexical density ratio (DEN_TOK) (see Note 4)
  Long words (LONG) (see Note 5)
  Prepositions (PREPS)
  Que subordinator (QUE)
  Type-token ratio (TYP_TOK)


A factor’s meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums each text’s z-scores on the features with significant loadings in a cluster in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al.’s (2006) and other researchers’ (e.g. Longacre 1983) descriptions of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for the distribution of words in the corpus by discourse type).
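The summing of z-scores described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the text identifiers and feature frequencies are invented for illustration, and the choice of the population standard deviation for standardizing is an assumption, since the article does not say which was used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of frequencies (population SD is an assumption)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd if sd else 0.0 for v in values]

def summed_cluster_scores(texts, cluster):
    """Sum each text's z-scores over the features in one cluster."""
    ids = list(texts)
    totals = {text_id: 0.0 for text_id in ids}
    for feature in cluster:
        standardized = z_scores([texts[text_id][feature] for text_id in ids])
        for text_id, z in zip(ids, standardized):
            totals[text_id] += z
    return totals

# Hypothetical normalized frequencies for two features of the narrative cluster.
texts = {
    "narrative_01": {"VRB_IMP": 12.0, "VRB_PRET": 15.0},
    "essay_01": {"VRB_IMP": 3.0, "VRB_PRET": 4.0},
    "summary_01": {"VRB_IMP": 6.0, "VRB_PRET": 8.0},
}

scores = summed_cluster_scores(texts, ["VRB_IMP", "VRB_PRET"])
most_representative = max(scores, key=scores.get)
print(most_representative, scores)
```

The text with the largest summed score is the one a researcher would read first when interpreting the cluster; texts with large scores of the opposite sign are the least representative.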

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Table 1 (continued)

Adverbial clauses
    Adverbial conjunctions of mode, e.g. lo hice según me dijeron ‘I did it the way they told me’ (ADVCLS_M)
    Adverbial conjunctions of place, e.g. la casa donde vivo ‘the house where I live’ (ADVCLS_L)
    Adverbial clauses of time (ADVCLS_T)
    Causal adverbial conjunctions (ADVCLS_C)
    If clauses (ADVCLS_S)
    Purpose adverbial clauses, i.e. para que ‘so that’ (ADVCLS_P)

Adjective clauses
    Relative pronouns with pre-posed preposition (REL_EN)
    Relative pronouns without pre-posed preposition (REL_NO_EN)
    Relative clauses on subjects, e.g. la casa que está allá ‘the house that is there’ (REL_SUBJ)

Nominal clauses
    Nominal clauses: non-subjunctive (NOM_IND)
    Nominal clauses: subjunctive (NOM_SUBJ)


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors, with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor    Eigenvalue    Percent of variance    Cumulative (%)
1         4.459         5.945                  5.945
2         3.691         4.922                  10.867
3         2.697         3.596                  14.463
4         2.561         3.415                  17.877
5         1.789         2.386                  20.263
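As a quick arithmetic check on Table 2, the short Python sketch below recomputes the cumulative column from the percent-of-variance column and derives the divisor implied by each eigenvalue/percentage pair. The implied divisor (about 75) is an inference from the table's own numbers, not a figure stated in the text.

```python
from itertools import accumulate

# Values transcribed from Table 2.
eigenvalues = [4.459, 3.691, 2.697, 2.561, 1.789]
percent = [5.945, 4.922, 3.596, 3.415, 2.386]

# Running total of the percent-of-variance column.
cumulative = list(accumulate(percent))

# Percent of variance = eigenvalue / divisor * 100, so the divisor is implied.
implied_divisor = [round(e / p * 100) for e, p in zip(eigenvalues, percent)]

for factor, (c, d) in enumerate(zip(cumulative, implied_divisor), start=1):
    print(f"Factor {factor}: cumulative = {c:.3f}%, implied divisor = {d}")
```

The recomputed cumulative total agrees with the reported 20.263 to within rounding of the individual percentages.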


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Positive cluster (loadings)
    Factor 1: VRB_IMP (0.56), VRB_PRGA (0.55), VRB_PRET (0.53), VRB_SBJ (0.42), PRON_3RD (0.35), VRB_CONO (0.35), VRB_PSBJ (0.34), VRB_COND (0.33), CLT_PRE (0.33), SAB_CON (0.30)
    Factor 2: ADJ_DERV (0.71), ADJ_PLR (0.59), ADJ_SNG (0.52), ADJ_TYP1 (0.51), ADJ_MSC (0.47), TYP_TOK (0.44), ADJ_TYP2 (0.43), ADJ_POST (0.41), ADJ_FEM (0.35), LONG (0.30)
    Factor 3: VRB_3RD (0.63), QUE (0.55), SAB_CON (0.43), VRB_CONO (0.42), REL_NO_EN (0.38), VRB_PROB (0.30)
    Factor 4: PREPS (0.53), VRB_INVA (0.52), VRB_INF (0.49), ADJ_PLR (0.31)
    Factor 5: VRB_INVA (0.36), VRB_INF (0.35)

Negative cluster (loadings)
    Factor 1: ART_NOUN (-0.65), NOUN_FEM (-0.56), NOUN_DER (-0.53), NOUN_PLR (-0.51), NOUN_MSC (-0.36), ADJ_TYP1 (-0.36), ADJ_POST (-0.35), DEFART (-0.31), NOUN_SNG (-0.31), ADJ_FEM (-0.30)
    Factor 2: NOUN_SNG (-0.40), DEN_TOK (-0.38)
    Factor 3: VRB_IMP (-0.45), VRB_PRGA (-0.42), PREPS (-0.30)
    Factor 4: ADJ_SNG (-0.59), ADJ_MSC (-0.44)
    Factor 5: DEFART (-0.53), ADJ_PLR (-0.40)
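The selection rules applied to the loadings (a 0.30 cutoff, and at least five features for a cluster to be treated as significant) can be sketched in Python as follows. This is an illustrative sketch, not the authors' procedure: it uses the Factor 3 loadings from Table 3, assumes the negative cluster's loadings carry a minus sign, and adds one hypothetical sub-threshold feature (ADV_TIME at 0.12) purely to show the filter at work.

```python
CUTOFF = 0.30        # minimum absolute loading reported in the article
MIN_FEATURES = 5     # five-feature threshold for a significant cluster

def split_clusters(loadings, cutoff=CUTOFF):
    """Separate salient loadings into positive and negative clusters."""
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return positive, negative

def is_significant(cluster, min_features=MIN_FEATURES):
    """A cluster counts as significant if it has enough salient features."""
    return len(cluster) >= min_features

factor3 = {
    "VRB_3RD": 0.63, "QUE": 0.55, "SAB_CON": 0.43, "VRB_CONO": 0.42,
    "REL_NO_EN": 0.38, "VRB_PROB": 0.30,
    "VRB_IMP": -0.45, "VRB_PRGA": -0.42, "PREPS": -0.30,
    "ADV_TIME": 0.12,  # hypothetical value, below the cutoff
}

positive, negative = split_clusters(factor3)
print(sorted(positive), is_significant(positive))
print(sorted(negative), is_significant(negative))
```

Under these rules the positive cluster of Factor 3 qualifies with six features, while its negative cluster, with three, does not.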


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky; then, after making this observation, I continued with my plan to leave my bedroom. I started to run around the designated trails and, as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those with the highest scores of the opposing sign are antithetical to the cluster’s function.

Interestingly, not only do narrative texts have high positive scores in dimension 1, but so do argumentative essays and summaries. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of the colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolome thought that his culture from Spain would be more advanced than the Mayan. Ultimately this was his last thought.)

In these narratives the writers are present and involved. The cluster has grammatical variables, including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Although they averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference, while notable, did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, as the following expository essay exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].

Factor 2: Descriptive expository prose

The positive cluster of dimension 2 (the only significant one) represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the following descriptive expository essay, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong, and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then in the American mind the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactic (i.e. attributive and predicative) functions. Also, long words tend to be dense in derivational morphology, and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features, the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is dissociated from important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 4.6 + SD 4.9 = 10.5–11.
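The definitions in Notes 4 and 5 translate into two short computations, sketched below in Python. This is only an illustration under stated assumptions: the part-of-speech tag names and the five-word sample are hypothetical, and the use of the population standard deviation is an assumption, since the notes do not specify which was used.

```python
from statistics import mean, pstdev

# Hypothetical tag names for the non-functor classes listed in Note 4.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV_DERIVED"}

def lexical_density(tagged_tokens):
    """Share of non-functor words among all words in a text (Note 4)."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)

def long_word_threshold(corpus_tokens):
    """Mean word length plus one standard deviation (Note 5)."""
    lengths = [len(token) for token in corpus_tokens]
    return mean(lengths) + pstdev(lengths)

def long_words(tokens, threshold):
    """Words strictly longer than the corpus-derived threshold."""
    return [token for token in tokens if len(token) > threshold]

# Invented mini-sample standing in for a tagged corpus.
tagged = [("los", "DET"), ("jóvenes", "NOUN"), ("tienen", "VERB"),
          ("bastantes", "ADJ"), ("oportunidades", "NOUN")]
tokens = [token for token, _ in tagged]

density = lexical_density(tagged)
threshold = long_word_threshold(tokens)
long_tokens = long_words(tokens, threshold)
print(density, round(threshold, 2), long_tokens)
```

In a real analysis the threshold would be computed once over the whole corpus and then applied to each text, rather than derived from a single sentence as here.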

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World’ [Database], available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg F 2005 lsquoReady-to-speakrsquo

Prefabricated sequences in spoken French as

a second language and as a native language

[lsquoPret-a-parlerrsquo sequences prefabriquees en francais

parle L2 et L1]rsquo Cahiers De lrsquoInstitut De

Linguistique De Louvain 31 183ndash95

Geyer N 2007 lsquoSelf-qualification in L2

Japanese An interface of pragmatic grammat-

ical and discourse competencesrsquo Language

Learning 573 337ndash67

Givon T 1985 lsquoFunction structure and lan-

guage acquisitionrsquo in D Slobin (ed) The Cross

Linguistic Study of Language Acquisition

Lawrence Erlbaum pp 1008ndash25

Granger S J Hung and S Petch-Tyson

2002 Computer Learner Corpora Second

Language Acquisition and Foreign Language

Teaching John Benjamins

Grant L and A Ginther 2000 lsquoUsing

computer-tagged linguistic features to describe

L2 writing differencesrsquo Journal of Second

Language Writing 92 123ndash45

Hinkel E 2004 lsquoTense aspect and the passive

voice in L1 and L2 academic textsrsquo Language

Teaching Research 81 5ndash29

Horn L and G Ward 2006 The Handbook of

Pragmatics Wiley-Blackwell

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity,’ Technical Report 17. University of Hawai‘i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer-assisted language learning.


we do not report error rates or omissions in this study, we are confident that our multidimensional approach and its mixed-method research design (i.e. quantitative + qualitative analyses), together with the size of our corpus, address Grant and Ginther’s (2000) concerns.

The multidimensional analysis and relevant corpus linguistic techniques

Multidimensional corpus analysis shows how lexical and grammatical features bundle together to produce different and new types of discourse (Biber and Conrad 2001). It combines technological tools, exploratory factor analysis, and qualitative analysis of texts to determine which lexical and grammatical features reliably co-occur in a large corpus. Using a large collection of texts coupled with powerful statistical tools introduces fewer reliability threats than small-scale studies, where participants’ particular characteristics or raters’ judgement heavily influence results.

Biber et al. (2006) provide the first multidimensional analysis of native-speaker Spanish. They analyze a 20-million-word Spanish corpus (4,049 texts) with written and oral data representing 19 registers (e.g. face-to-face conversation, business letters). The analysis uncovered both well-known discourse types, like narratives, and new ones. For example, the following characterize ‘informationally rich’ discourse in Spanish: nouns, adjectives, definite articles, prepositions, derived nouns, type-token ratio, long words (i.e. multisyllabic words), and ergative se constructions. Also unique to Spanish is hypothetical discourse, containing a concentration of structures such as the conditional and the subjunctive, as well as future verb forms, verbs of obligation and causation (e.g. dejar, permitir, hacer + infinitive), and the conjunction que.

Parodi (2007) used a 25-million-word corpus of native-speaker Spanish to study the differences between written and spoken Spanish. His analysis complements Biber et al. (2006) in that it reveals how lexical and grammatical features cluster together based on whether the register is context dependent (i.e. the interpretation of important features depends on the ‘speech situation’), written academic, commissive in nature (in the pragmatic sense), attitudinal, or informational in focus. Additionally, Parodi provides a more refined definition (although based on many fewer tokens) of what constitutes Spanish narrative discourse than Biber et al. (2006); interestingly, Parodi’s analysis indicates that English and Spanish share many of the same narrative features.

We present the first-known multidimensional analysis of L2 Spanish, studying how learners use various lexical and grammatical phenomena to generate IL-specific discourse types. We do not make a priori assumptions about which of these phenomena work in tandem, nor do we assume that learner discourse types are the same as those of native speakers. Our analysis serves as a first step towards characterizing learner discourse at the second and third years of Spanish L2 instruction.

Research questions

We characterize Spanish learner discourse via a multidimensional analysis of a corpus of L2 Spanish generated by second- and third-year university-level learners. Specifically, the study addresses the following research questions:

1. How do lexical and grammatical phenomena cluster together in L2 Spanish writing?

2. What types of discourses (e.g. narrative, descriptive, hypothetical) surround the concentration of these lexical and grammatical features?

METHOD

Corpus description

We used a 202,241-word corpus of written Spanish comprising edited and non-edited compositions collected from English-speaking Spanish learners at the second-year (109,224 words) and third-year (93,017 words) levels, in which more variety of texts could be collected due to more exposure to language instruction. To estimate the written proficiency of each level of instruction based on the ACTFL Writing Proficiency Scale (ACTFL 2001), the researchers selected a random sample of 50 entire documents from each of the two instructional levels (N = 100). After a training session on working with the scale, the researchers rated the samples independently. Each level on the scale was assigned a numerical value, with 0 representing a novice low and 9 a superior rating. We estimated the inter-rater reliability of the subjects’ proficiency ratings with a Pearson correlation, since the datasets represented interval scales [r(df = 98) = 0.97, p < 0.001]. The second-year learners wrote at the intermediate high level and the third-year learners at the advanced low level. This suggests that, while the third-year learners generally narrated and produced a limited number of cohesive devices as well as a variety of complex syntactic structures, the second-year learners were beginning to produce narrative structure, although their control of the verbal constructs was still developing. The second-year learners also produced few cohesive devices and limited subordination.
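The inter-rater check described above can be illustrated with a minimal sketch. It assumes two raters' scores on the 0 to 9 numeric scale; the ratings below are hypothetical and are not the study's data, and the function name `pearson_r` is our own.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical ratings on the 0-9 scale (illustration only).
rater_a = [5, 6, 6, 7, 5, 6, 7, 4]
rater_b = [5, 6, 7, 7, 5, 6, 7, 5]

r = pearson_r(rater_a, rater_b)
df = len(rater_a) - 2  # with 100 rated documents this gives df = 98
print(f"r({df}) = {r:.2f}")
```

The degrees of freedom for a correlation are the number of rated documents minus two, which is consistent with the df = 98 reported for the 100 sampled documents.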

The corpus comprises writing samples used for course assessment purposes: letters, narratives, descriptions, summaries, and argumentative essays, both in and out of class as well as on exams. Given the study’s exploratory and descriptive nature, we did not control the type of tasks or topics within the corpus. Topics related to textbook themes (e.g. family, childhood) and cultural readings.


Procedures: tagging, searching, and norming

Understanding multidimensional analyses requires knowledge of three key concepts: part-of-speech tagging, search patterns, and statistical techniques for eliminating various biases in how tokens are counted.

To access information about particular texts, each file includes a header with information about its topic, source, type, author biographical information, and purpose (argumentative essay, narrative). To search for morphosyntactic information (e.g. all adjectives, all subjunctive forms), one needs a part-of-speech tagger: software that annotates every word with information about its major word class (e.g. adjective, noun, verb, determiner), basic morphological information (e.g. plural, preterit), and its lemma (i.e. its unmarked dictionary root, such as a verb’s infinitive or a noun’s masculine singular form).

Part-of-speech tagging requires a dictionary with lexical and grammatical information. It also requires a pretagged corpus to train the software routines to determine unknown or ambiguous words’ probable tags. We compiled our own dictionary, utilized a training set from samples from the Corpus del español (Biber et al. 2006), and tagged the corpus with n-gram software routines from the Natural Language Tool Kit (NLTK; http://www.nltk.org). Additionally, we wrote Spanish-specific routines (e.g. clitic sequences, derivational morphemes) to complement the NLTK routines to achieve greater tagging precision. After the corpus is tagged in this way, the investigator must verify the accuracy of the tagging and fix errors through further programming.

We studied 78 lexical, grammatical, and lexico-grammatical features.³ The features involved all parts of speech, common morphosyntactic constructs studied by learners, as well as additional constructs studied in Biber et al. (2006). They represented adjectives (e.g. derived, postnominal position), nouns (e.g. derived, feminine), adverbs (e.g. place, time), verb classes (e.g. imperfect aspect, past participle), verb phrases (e.g. communication, knowledge), and certain morphosyntactic features like dependent clauses, noun phrase configurations (e.g. article plus noun), and pronoun usage (e.g. clitic, third person).

Textual frequencies often require two mathematical conversions. First, norming transforms a phenomenon’s count to its normed frequency. Since text lengths vary, longer texts inflate certain items’ importance. To offset this text-length bias, one scales a phenomenon’s frequency per text, such as per 1,000 words. Second, normalizing eliminates the feature-concentration bias: some phenomena are naturally scarce in a document (e.g. the subjunctive), while others are naturally common (e.g. articles) (cf. Biber and Conrad 2001). Normalizing converts a normed frequency to its z-score value vis-à-vis its normed frequency in each document. Consequently, one can measure the relative presence of two or more linguistic features within any given text. Additionally, one can sum various z-scores to determine how concentrated a set of features is in any text or group of texts.
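The two conversions can be sketched as follows. This is an illustration under stated assumptions rather than the study's own routines: the counts and text lengths are hypothetical, the helper names are ours, and the use of the population standard deviation is one plausible choice (the text does not say which estimator was used).

```python
from statistics import mean, pstdev

def normed_frequency(count, text_length, per=1000):
    """Scale a raw count to a rate per `per` words to offset text-length bias."""
    return count * per / text_length

def z_scores(values):
    """Standardize one feature's normed frequencies across all documents."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; a sample SD is equally plausible
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical data: (raw count, text length in words) for three documents.
raw = {
    "VRB_SBJ": [(2, 400), (0, 250), (6, 800)],    # naturally scarce feature
    "DEFART": [(40, 400), (20, 250), (90, 800)],  # naturally common feature
}

normed = {f: [normed_frequency(c, n) for c, n in rows] for f, rows in raw.items()}
standardized = {f: z_scores(v) for f, v in normed.items()}

for feature in raw:
    print(feature, [round(v, 1) for v in normed[feature]],
          [round(z, 2) for z in standardized[feature]])
```

After standardization, a scarce feature such as the subjunctive and a common one such as the definite article sit on the same scale, so their relative presence in a given text can be compared directly.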


Factor analysis

To determine how learners cluster lexical and grammatical features, we employ an exploratory principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions, also referred to as factors, along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus.

The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary, superset factor according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relative to the others. The examination of factors’ eigenvalues in scree plots of each factor’s total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature’s importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying from 0.00, or no correlation, to 1.00, or absolute correlation. A dimension with two significant clusters will have two sets of loadings: a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster’s sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster’s communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the 0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high type-token ratio) represented semantically dense, literate discourse, and the negative features (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse.

To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly ‘rotated’ (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method.


Table 1: Targeted linguistic features

Variable class
    Variable

Noun
    Derived nouns (NOUN_DER)
    Feminine nouns (NOUN_FEM)
    Masculine nouns (NOUN_MSC)
    Non-pluralizing nouns, e.g. virus ‘virus’ (NOUN_NPL)
    Plural nouns (NOUN_PLR)
    Singular nouns (NOUN_SNG)

Adjective
    Apocopated adjectives, e.g. buen ‘good’ (ADJ_SHRT)
    Derived adjectives (ADJ_DERV)
    Feminine adjectives (ADJ_FEM)
    Four-inflection adjectives (ADJ_TYP1)
    Masculine adjectives (ADJ_MSC)
    Plural adjectives (ADJ_PLR)
    Postmodifying adjectives (ADJ_POST)
    Premodifying adjectives (ADJ_PRE)
    Singular adjectives (ADJ_SNG)
    Two-inflection adjectives (ADJ_TYP2)

Pronoun
    Third-person clitics (PRON_3RD)
    Preverb clitics (CLT_PRE)
    se + 3s verb, e.g. se come ‘one eats’ (SE_VRB3S)
    Subject pronouns (PRON_SUB)
    Unplanned se (SE_UNPLANNED)

Other noun phrase elements
    Article–noun segments (ART_NOUN)
    Definite articles (DEFART)
    Possessive adjectives (DET_POSS)
    Possessive syntax (WO_POSS)
    Present participle modified by noun, e.g. agua hirviendo ‘boiling water’ (PS_NOUN)

Verbs
    Third-person verb (VRB_3RD)
    Conditional (VRB_COND)
    Copula plus adjective (not in other predicate categories) (COP_ADJ)
    Evaluative predicates (PRD_EVAL)
    Future (VRB_FUT)
    Gustar-like verbs (VRB_GULI)
    Imperfect (VRB_IMP)
    Infinitive (VRB_INF)
    Infinitive not preceded by verb or article (VRB_INVA)
    Past participle (VRB_PS)
    Past perfect (VRB_PPRF)
    Past subjunctive (VRB_PSBJ)
    Perfect tense verb (VRB_PRFC)
    Periphrastic future, e.g. voy a comer ‘I’m going to eat’ (VRB_PERI)
    Predicates of conclusions, e.g. es obvio ‘it is obvious’ (PRD_CONC)
    Predicates of probability (PRD_PROB)
    Present participle (VRB_PP)
    Preterit (VRB_PRET)
    Progressive aspect (VRB_PRGA)
    Saber/conocer (SAB_CON)
    Suasive predicates (PRD_SUAS)
    Suasive verbs (VRB_SUAS)
    Subjunctive (VRB_SBJ)
    Verbs of communication, reporting (VRB_COM)
    Verbs of conclusions (VRB_CONC)
    Verbs of knowledge, cognitive verbs (VRB_CONO)
    Verbs of observation (VRB_OBS)
    Verbs of probability (VRB_PROB)

Adverbs
    Adverbs of time (ADV_TIME)
    Adverbs of intensity (ADV_INTS)
    Adverbs of manner (ADV_MAN)
    Adverbs of place (ADV_PLC)
    Adverbs of probability (ADV_PROB)

Miscellaneous
    Comparatives (COMPARE)
    Conjunctions followed by finite verb (CONJ_VRB)
    Irregular comparatives (COMP_IRR)
    Lexical density ratio (DEN_TOK)⁴
    Long words (LONG)⁵
    Prepositions (PREPS)
    Que subordinator (QUE)
    Type token ratio (TYP_TOK)


A factor’s meaning requires interpretation: what linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums the z-scores of the features with significant loadings in a cluster in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al.’s (2006) and other researchers’ (e.g. Longacre 1983) descriptions of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for the distribution of words in the corpus by discourse type).
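The summing step can be sketched in a few lines. The sketch assumes each text already carries one z-score per feature; the z-scores and text labels below are hypothetical, the three features are only a subset of the narrative cluster reported in Table 3, and `cluster_score` is our own name for the summation.

```python
def cluster_score(z_by_feature, cluster):
    """Sum one text's z-scores over the features that define a cluster."""
    return sum(z_by_feature.get(feature, 0.0) for feature in cluster)

# Three features from the narrative cluster (a subset, for brevity).
NARRATIVE = ["VRB_IMP", "VRB_PRET", "PRON_3RD"]

# Hypothetical z-scores per text; not taken from the study's corpus.
texts = {
    "story": {"VRB_IMP": 1.8, "VRB_PRET": 1.2, "PRON_3RD": 0.9},
    "essay": {"VRB_IMP": 0.1, "VRB_PRET": 0.4, "PRON_3RD": -0.2},
    "description": {"VRB_IMP": -0.9, "VRB_PRET": -1.1, "PRON_3RD": -0.5},
}

scores = {name: cluster_score(z, NARRATIVE) for name, z in texts.items()}

# Texts with the highest summed z-scores are the most representative of the cluster.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Ranking texts this way gives the analyst a short list to read closely, which is where the qualitative interpretation of a cluster's function begins.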

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Table 1 (continued)

Variable class
    Variable

Adverbial clauses
    Adverbial conjunctions of mode, e.g. lo hice según me dijeron ‘I did it the way they told me’ (ADVCLS_M)
    Adverbial conjunctions of place, e.g. la casa donde vivo ‘the house where I live’ (ADVCLS_L)
    Adverbial clauses of time (ADVCLS_T)
    Causal adverbial conjunctions (ADVCLS_C)
    If clauses (ADVCLS_S)
    Purpose adverbial clauses, i.e. para que ‘so that’ (ADVCLS_P)

Adjective clauses
    Relative pronouns with pre-posed preposition (REL_EN)
    Relative pronouns without pre-posed preposition (REL_NO_EN)
    Relative clauses on subjects, e.g. la casa que está allá ‘the house that is there’ (REL_SUBJ)

Nominal clauses
    Nominal clauses, non-subjunctive (NOM_IND)
    Nominal clauses, subjunctive (NOM_SUBJ)


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor    Eigenvalue    Percent of variance    Cumulative (%)
1         4.459         5.945                  5.945
2         3.691         4.922                  10.867
3         2.697         3.596                  14.463
4         2.561         3.415                  17.877
5         1.789         2.386                  20.263
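The relation between the columns of Table 2 can be checked with a short calculation. In a factor analysis of a correlation matrix, a factor's share of variance is conventionally its eigenvalue divided by the number of variables. The divisor of 75 used below is an assumption: it is the value that reproduces the reported percentages to within rounding, and it is not stated in the text, which mentions 78 features.

```python
from itertools import accumulate

# Eigenvalues as reported in Table 2.
eigenvalues = [4.459, 3.691, 2.697, 2.561, 1.789]

# Assumption: inferred from the reported percentages, not stated in the text.
N_VARIABLES = 75

def percent_variance(eigenvalue, n_variables):
    """Share of total variance (in percent) attributed to one factor."""
    return eigenvalue / n_variables * 100

percents = [percent_variance(e, N_VARIABLES) for e in eigenvalues]
cumulative = list(accumulate(percents))

for factor, (e, p, c) in enumerate(zip(eigenvalues, percents, cumulative), start=1):
    print(f"Factor {factor}: eigenvalue {e:.3f}, {p:.3f}% of variance, cumulative {c:.3f}%")
```

Under this assumption the running total for five factors comes to roughly 20.26%, in line with the 20.2% of shared variance cited for the five-factor solution.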


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Positive cluster (loadings)

Factor 1            Factor 2            Factor 3            Factor 4            Factor 5
VRB_IMP (0.56)      ADJ_DERV (0.71)     VRB_3RD (0.63)      PREPS (0.53)        VRB_INVA (0.36)
VRB_PRGA (0.55)     ADJ_PLR (0.59)      QUE (0.55)          VRB_INVA (0.52)     VRB_INF (0.35)
VRB_PRET (0.53)     ADJ_SNG (0.52)      SAB_CON (0.43)      VRB_INF (0.49)
VRB_SBJ (0.42)      ADJ_TYP1 (0.51)     VRB_CONO (0.42)     ADJ_PLR (0.31)
PRON_3RD (0.35)     ADJ_MSC (0.47)      REL_NO_EN (0.38)
VRB_CONO (0.35)     TYP_TOK (0.44)      VRB_PROB (0.30)
VRB_PSBJ (0.34)     ADJ_TYP2 (0.43)
VRB_COND (0.33)     ADJ_POST (0.41)
CLT_PRE (0.33)      ADJ_FEM (0.35)
SAB_CON (0.30)      LONG (0.30)

Negative cluster (loadings)

Factor 1            Factor 2            Factor 3            Factor 4            Factor 5
ART_NOUN (-0.65)    NOUN_SNG (-0.40)    VRB_IMP (-0.45)     ADJ_SNG (-0.59)     DEFART (-0.53)
NOUN_FEM (-0.56)    DEN_TOK (-0.38)     VRB_PRGA (-0.42)    ADJ_MSC (-0.44)     ADJ_PLR (-0.40)
NOUN_DER (-0.53)
NOUN_PLR (-0.51)
NOUN_MSC (-0.36)
ADJ_TYP1 (-0.36)
ADJ_POST (-0.35)
DEFART (-0.31)
NOUN_SNG (-0.31)
ADJ_FEM (-0.30)

PREPS (-0.30) also appears in the third row of the negative cluster, under either Factor 3 or Factor 5; the surviving layout does not show which.
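The cutoff rule described in the Method section (loadings at or above 0.30 in absolute value, and at least five such loadings for a cluster to be interpretable) can be sketched as below. The six positive values are the Factor 3 loadings from Table 3; the two weak loadings are hypothetical additions so that the filter has something to discard, and the function names are ours.

```python
CUTOFF = 0.30     # minimum absolute loading, as in the text
MIN_FEATURES = 5  # Biber's (1988) recommended minimum for a meaningful cluster

def split_clusters(loadings, cutoff=CUTOFF):
    """Return (positive, negative) clusters of features meeting the cutoff."""
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return positive, negative

def is_interpretable(cluster, min_features=MIN_FEATURES):
    """A cluster needs enough salient features to suggest a discourse type."""
    return len(cluster) >= min_features

factor_3 = {
    "VRB_3RD": 0.63, "QUE": 0.55, "SAB_CON": 0.43,
    "VRB_CONO": 0.42, "REL_NO_EN": 0.38, "VRB_PROB": 0.30,
    "ADV_TIME": 0.12,   # hypothetical weak loading, below the cutoff
    "NOUN_PLR": -0.08,  # hypothetical weak loading, below the cutoff
}

positive, negative = split_clusters(factor_3)
print(sorted(positive, key=positive.get, reverse=True))
print(is_interpretable(positive), is_interpretable(negative))
```

Sorting the retained features by loading mirrors the order of the table columns, where the highest loadings carry the most weight in interpreting a cluster's communicative function.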


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky; then, after making this observation, I continued with my plan to leave my bedroom. I started to run around the designated trails and, as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those texts with the highest opposing scores (i.e. scores with the opposing sign) are antithetical to the cluster’s function.

Interestingly, not only narrative texts but also argumentative essays and summaries have high positive scores in dimension 1. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolomé thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables, including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Although the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy, and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.001].
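The bracketed statistics have the form of a one-way analysis of variance with two groups; degrees of freedom of (1, 634) would correspond to 636 texts split across the two instructional levels. The following sketch shows how such an F ratio is computed from raw scores. The scores are hypothetical, and the function is a generic textbook computation, not the authors' procedure.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical summed z-scores for a handful of texts per level.
second_year = [0.4, -1.2, 2.1, 0.9, -0.3]
third_year = [2.8, 1.5, 3.9, 0.7, 2.2]

f_stat, df_b, df_w = one_way_anova_f(second_year, third_year)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With two groups the within-groups degrees of freedom equal the total number of texts minus two, which is how the second value in F(1, 634) relates to the size of the corpus.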

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type-token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud es evidente por televisión escuela la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces en la mente americano la definición del suceso es ser rico ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture, and examples of government and politicians. This influence is subconscious but strong, and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things, and eventually he will get the perfect life. Then, in the American mind, the definition of event (sic: success) is to be rich, be beautiful and young, and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.001].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactic (i.e. attributive and predicative) functions. Also, long words tend to be dense in derivational morphology, and so in content; a high type-token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Argumentative essay El autor Mario Benedetti posiblemente quiere expli-car la imaginacion de los ninos Los ninos no piensan logicamente porque no

Figure 5 Concentration of expository prose with stance features by text typeand averaged summed z-score

Y ASENCION-DELANEY AND J COLLENTINE 19

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

saben la realidad Estaba la primera vez que Osvaldo mire la televisionCuando un nino o nina mira una persona mirando y hablando en su direc-cion piensan que estan hablando a ellos Es una que nosotros no pensarporque nos sabemos la realidad Tambien los ninos piensan que los personasen el televisor son sus amigos [The author Mario Benedetti possibly wants toexplain the imagination of children Children do not think logically becausethey do not know reality It was the first time Osvaldo watched TV When aboy or a girl watches a person looking at or talking in their direction theythink they are talking to them It is one that we do not to think (sic thinkabout) because we know the reality]

The writer here speculates about an authorrsquos intention and argues for aspecific motive by reporting on the thoughts and attitudes of the charactersin the story

Dimension 3rsquos negative cluster only indicates what is missing where stancepredominates The features include lexical and grammatical narrative toolsnamely imperfect verbs progressive aspect and prepositions These featuresdescribe the scene or the background context (eg time place people andother co-occurring events) in which the main events occur suggesting perhapsthat these learners do not couple stances with narrative elements Support forthis is found in a comparison of the two levels of learners The data indicatethat expository prose with a stance occurs mostly in less proficient learnerswhich would explain why it is disassociated with certain narrative elementsThe second-year learners averaged a higher summed z-score on this cluster(M=39 SD=45) than the third-year learners (M=22 SD=27) with thedifference being statistically significant [F(1 634) = 176 p 000]

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features tomeet the five-feature threshold above the 030 level However Factors 4 and5 share some linguistic features For both factors some elements in nominalclauses (ie singular and masculine adjectives in Factor 4 and plural adjectivesand definite articles in Factor 5) appeared in complementary distribution withinfinitive verbs This is probably evidence to support the notion that nominaland verbal elements appear in complementary distribution in these learnersrsquowriting

DISCUSSION AND CONCLUSIONS

The development of learners' interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners' lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners' interlanguage could be explained by the five factors, while Biber et al.'s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus' variance. Biber et al.'s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber's (2001) study of English texts and in Parodi's (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners' narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners' personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners' tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners' discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners' verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study's limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis that examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1. Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language's inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2. Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O'Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3. A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word's lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si 'if', articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4. Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5. Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11.
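The definitions in notes 4 and 5 translate directly into a few lines of code. The following is a minimal sketch under stated assumptions: the tag names in `CONTENT_TAGS`, the function names, and the toy data are hypothetical, and the population standard deviation (`pstdev`) is an arbitrary choice, since the notes do not say which estimator was used.

```python
from statistics import mean, pstdev

# Hypothetical tag names standing in for the non-functor word classes
# of note 4 (nouns, adjectives, verbs, derived adverbs).
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV_DERIVED"}


def lexical_density(tagged_words):
    """Share of non-functor words among all words of a text."""
    content = sum(1 for _, tag in tagged_words if tag in CONTENT_TAGS)
    return content / len(tagged_words)


def long_word_threshold(corpus_words):
    """Mean word length of the corpus plus one standard deviation."""
    lengths = [len(w) for w in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(text_words, threshold):
    """Words whose length exceeds the corpus-wide threshold."""
    return [w for w in text_words if len(w) > threshold]


# Toy data, not drawn from the corpus.
tagged = [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"), ("grande", "ADJ")]
density = lexical_density(tagged)

corpus = ["la", "casa", "es", "subconsciente"]
threshold = long_word_threshold(corpus)
found = long_words(corpus, threshold)
print(density, round(threshold, 2), found)
```

The threshold is computed once over the whole corpus and then applied text by text, so what counts as a long word depends on the corpus and is not a fixed number of characters.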

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. 'ACTFL Proficiency Guidelines – Writing,' available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. 'Learner corpus analysis and the development of foreign language proficiency,' System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. 'On the complexity of discourse complexity: A multidimensional analysis' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. 'Introduction: Multidimensional analysis and the study of register variation' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. 'Spoken and written register variation in Spanish: A multi-dimensional analysis,' Corpora 1: 1–37.

Boulton, A. 2009. 'Testing the limits of data-driven learning: Language proficiency and training,' ReCALL 21/1: 37–54.

British National Consortium. 2001. 'BNC World' [Database], available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. 'Theoretical bases of communicative approaches to second language teaching and testing,' Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. 'Ambigüedad y variación del pronombre personal sujeto' in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. 'The uses of Spanish copulas by Chinese-speaking learners in a free writing task,' Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. 'The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,' Hispania 78/1: 123–36.

Collentine, J. 2004. 'The effects of learning contexts on morphosyntactic and lexical development,' Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. 'Study abroad research: Findings, implications and future directions' in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. 'A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,' Language Learning 60/2: 409–45.

Ellis, N. C. 2002. 'Reflections on frequency effects in language processing,' Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. 'Language acquisition as rational contingency learning,' Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. 'Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,' TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ''Ready-to-speak': Prefabricated sequences in spoken French as a second language and as a native language ['Prêt-à-parler': séquences préfabriquées en français parlé L2 et L1],' Cahiers De l'Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. 'Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,' Language Learning 57/3: 337–67.

Givón, T. 1985. 'Function, structure and language acquisition' in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. 'Using computer-tagged linguistic features to describe L2 writing differences,' Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. 'Tense, aspect and the passive voice in L1 and L2 academic texts,' Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. 'Prototypical and non-prototypical marking in the advanced learner's aspectuo-temporal system,' EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. 'Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,' Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. 'Four perspectives on L2 pragmatic development,' Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. 'The basic variety (Or: Couldn't natural languages be much simpler?),' Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. 'Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,' Language Learning Journal 36/2: 181–98.

Myles, F. 2005. 'Interlanguage corpora and second language acquisition research,' Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. 'Using information technology to support empirical SLA research,' Journal of Applied Linguistics 1/2: 169–96.

O'Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. 'Variation across registers in Spanish: Exploring the El Grial PUCV Corpus' in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. 'Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,' Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. 'Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,' English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. 'Verbs of cognition in spoken Spanish: A discourse profile' in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. 'Second language development in writing: measures of fluency, accuracy and complexity,' Technical Report 17. University of Hawai'i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.


Page 6: Applied Linguistics: 1–25 Oxford University Press 2011 doi ...

step towards characterizing learner discourse at the second and third years of Spanish L2 instruction.

Research questions

We characterize Spanish learner discourse via a multidimensional analysis of a corpus of L2 Spanish generated by second- and third-year, university-level learners. Specifically, the study addresses the following research questions:

1. How do lexical and grammatical phenomena cluster together in L2 Spanish writing?

2. What types of discourses (e.g. narrative, descriptive, hypothetical) surround the concentration of these lexical and grammatical features?

METHOD

Corpus description

We used a 202,241-word corpus of written Spanish comprising edited and non-edited compositions collected from English-speaking Spanish learners at the second-year (109,224 words) and third-year (93,017 words) levels, in which more variety of texts could be collected due to more exposure to language instruction. To estimate the written proficiency of each level of instruction based on the ACTFL Writing Proficiency Scale (ACTFL 2001), the researchers selected a random sample of 50 entire documents from each of the two instructional levels (N=100). After a training session on working with the scale, the researchers rated the samples independently. Each level on the scale was assigned a numerical value, with 0 representing a novice low and 9 a superior rating. We estimated the inter-rater reliability of the subjects' proficiency ratings with a Pearson correlation, since the datasets represented interval scales [r(df=98) = 0.97, p < .001]. The second-year learners wrote at the intermediate high level and the third-year learners at the advanced low level. This suggests that, while the third-year learners generally narrated and produced a limited number of cohesive devices as well as a variety of complex syntactic structures, the second-year learners were beginning to produce narrative structure, although their control of the verbal constructs was still developing. The second-year learners also produced few cohesive devices and limited subordination.
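The inter-rater check described above is a Pearson correlation between two raters' scores for the same documents. A minimal sketch follows, assuming ratings on the 0–9 numeric scale; the ratings and the name `pearson_r` are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, then the two sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings of six documents by two raters.
rater_a = [5, 6, 6, 7, 5, 7]
rater_b = [5, 6, 7, 7, 5, 6]
r = pearson_r(rater_a, rater_b)
df = len(rater_a) - 2  # degrees of freedom for a correlation: N - 2
print(r, df)
```

For the 100 sampled documents, N - 2 gives the 98 degrees of freedom reported in the text.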

The corpus comprises writing samples used for course assessment purposes: letters, narratives, descriptions, summaries, and argumentative essays, both in and out of class, as well as on exams. Given the study's exploratory and descriptive nature, we did not control the type of tasks or topics within the corpus. Topics related to textbook themes (e.g. family, childhood) and cultural readings.


Procedures: tagging, searching, and norming

Understanding multidimensional analyses requires knowledge of three key concepts: part-of-speech tagging, search pattern, and statistical techniques for eliminating various biases in how tokens are counted.

To access information about particular texts, each file includes a header with information about its topic, source, type, author, biographical information, and purpose (argumentative essay, narrative). To search for morphosyntactic information (e.g. all adjectives, all subjunctive forms), one needs a part-of-speech tagger, software that annotates every word with information about its major word classes (e.g. adjective, noun, verb, determiner), basic morphological information (e.g. plural, preterit), as well as its lemma (i.e. its unmarked dictionary root, such as a verb's infinitive or a noun's masculine singular form).

Part-of-speech tagging requires a dictionary with lexical and grammatical information. It also requires a pretagged corpus to train the software routines to determine unknown or ambiguous words' probable tags. We compiled our own dictionary, utilized a training set from samples from the Corpus del español (Biber et al. 2006), and tagged the corpus with n-gram software routines from the Natural Language Tool Kit (NLTK; http://www.nltk.org). Additionally, we wrote Spanish-specific routines (e.g. clitic sequences, derivational morphemes) to complement the NLTK routines to achieve greater tagging precision. After the corpus is tagged in this way, the investigator must verify the accuracy of the tagging and fix errors through further programming.

We studied 78 lexical, grammatical, and lexico-grammatical features (see note 3). The features involved all parts of speech, common morphosyntactic constructs studied by learners, as well as additional constructs studied in Biber et al. (2006). They represented adjectives (e.g. derived, postnominal position), nouns (e.g. derived, feminine), adverbs (e.g. place, time), verb classes (e.g. imperfect aspect, past participle), verb phrases (e.g. communication, knowledge), and certain morphosyntactic features like dependent clauses, noun phrase configurations (e.g. article plus noun), and pronoun usage (e.g. clitic, third person).

Textual frequencies often require two mathematical conversions. First, norming transforms a phenomenon's count to its normed frequency. Since text lengths vary, longer texts inflate certain items' importance. To offset this text-length bias, one scales a phenomenon's frequency per text, such as per 1,000 words. Second, normalizing eliminates the feature-concentration bias: some phenomena are naturally scarce in a document (e.g. the subjunctive), while others are naturally common (e.g. articles) (cf. Biber and Conrad 2001). Normalizing converts a normed frequency to its z-score value vis-à-vis its normed frequency in each document. Consequently, one can measure the relative presence of two or more linguistic features within any given text. Additionally, one can sum various z-scores to determine how concentrated a set of features occur in any text or group of texts.
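The two conversions can be illustrated with a short sketch. It assumes a per-1,000-word norm and z-scores computed for one feature across the documents of a corpus using the sample standard deviation; the counts, text lengths, and function names are hypothetical.

```python
from statistics import mean, stdev


def normed_frequency(count, text_length, per=1000):
    """Scale a raw count to a rate per `per` words to offset text-length bias."""
    return count / text_length * per


def z_scores(values):
    """Standardize one feature's normed frequencies across documents."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption)
    return [(v - m) / s for v in values]


# Hypothetical (raw count of one feature, number of words) for three texts.
docs = [(3, 150), (10, 500), (4, 100)]
normed = [normed_frequency(count, length) for count, length in docs]
z = z_scores(normed)
print(normed)
print([round(v, 3) for v in z])
```

The first two texts differ greatly in raw count (3 versus 10) but have the same normed frequency, which is the text-length bias that norming removes. Once standardized, a naturally scarce feature and a naturally common one are on the same scale, so their z-scores can be compared or summed.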


Factor analysis

To determine how learners cluster lexical and grammatical features, we employ an exploratory principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions, also referred to as factors, along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus.

The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary, superset factor according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relative to the others. The examination of factors' eigenvalues in scree plots of each factor's total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature's importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying from 0.00, or no correlation, to 1.00, or absolute correlation. A dimension with two significant clusters will have two sets of loadings, a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster's sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster's communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the 0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high type/token ratio) represented semantically dense, literate discourse, and the negative features (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse.
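The cutoff rule attributed to Biber (1988) can be sketched as a small filter over one factor's loadings. The feature labels, loading values, and function name below are hypothetical; only the 0.30 cutoff and the five-feature minimum come from the text.

```python
def significant_clusters(loadings, cutoff=0.30, min_features=5):
    """Split one factor's loadings into positive and negative clusters.

    A feature joins a cluster when its absolute loading meets the cutoff;
    a cluster counts as interpretable when it has at least `min_features`.
    """
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return {
        "positive": positive,
        "negative": negative,
        "positive_interpretable": len(positive) >= min_features,
        "negative_interpretable": len(negative) >= min_features,
    }


# Hypothetical loadings for a single factor.
factor = {
    "A": 0.56, "B": 0.55, "C": 0.53, "D": 0.42, "E": 0.35,
    "F": 0.12, "G": -0.41, "H": -0.33,
}
result = significant_clusters(factor)
print(sorted(result["positive"]), result["positive_interpretable"])
print(sorted(result["negative"]), result["negative_interpretable"])
```

In this toy factor the positive cluster has five qualifying features and would be interpreted as a discourse type, while the negative cluster, with two, would only indicate what such texts tend to lack.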

To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly 'rotated' (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method.


Table 1: Targeted linguistic features

Variable class: Variable

Noun:
Derived nouns (NOUN_DER)
Feminine nouns (NOUN_FEM)
Masculine nouns (NOUN_MSC)
Non-pluralizing nouns, e.g. virus 'virus' (NOUN_NPL)
Plural nouns (NOUN_PLR)
Singular nouns (NOUN_SNG)

Adjective:
Apocopated adjectives, e.g. buen 'good' (ADJ_SHRT)
Derived adjectives (ADJ_DERV)
Feminine adjectives (ADJ_FEM)
Four-inflection adjectives (ADJ_TYP1)
Masculine adjectives (ADJ_MSC)
Plural adjectives (ADJ_PLR)
Postmodifying adjectives (ADJ_POST)
Premodifying adjectives (ADJ_PRE)
Singular adjectives (ADJ_SNG)
Two-inflection adjectives (ADJ_TYP2)

Pronoun:
Third-person clitics (PRON_3RD)
Preverb clitics (CLT_PRE)
se + 3s verb, e.g. se come 'one eats' (SE_VRB3S)
Subject pronouns (PRON_SUB)
Unplanned se (SE_UNPLANNED)

Other noun phrase elements:
Article-noun segments (ART_NOUN)
Definite articles (DEFART)
Possessive adjectives (DET_POSS)
Possessive syntax (WO_POSS)
Present participle modified by noun, e.g. agua hirviendo 'boiling water' (PS_NOUN)

Verbs:
Third-person verb (VRB_3RD)
Conditional (VRB_COND)
Copula plus adjective (not in other predicate categories) (COP_ADJ)
Evaluative predicates (PRD_EVAL)
Future (VRB_FUT)
Gustar-like verbs (VRB_GULI)
Imperfect (VRB_IMP)
Infinitive (VRB_INF)


Verbs (continued):
Infinitive not preceded by verb or article (VRB_INVA)
Past participle (VRB_PS)
Past perfect (VRB_PPRF)
Past subjunctive (VRB_PSBJ)
Perfect tense verb (VRB_PRFC)
Periphrastic future, e.g. voy a comer 'I'm going to eat' (VRB_PERI)
Predicates of conclusions, e.g. es obvio 'it is obvious' (PRD_CONC)
Predicates of probability (PRD_PROB)
Present participle (VRB_PP)
Preterit (VRB_PRET)
Progressive aspect (VRB_PRGA)
Saber/conocer (SAB_CON)
Suasive predicates (PRD_SUAS)
Suasive verbs (VRB_SUAS)
Subjunctive (VRB_SBJ)
Verbs of communication/reporting (VRB_COM)
Verbs of conclusions (VRB_CONC)
Verbs of knowledge/cognitive verbs (VRB_CONO)
Verbs of observation (VRB_OBS)
Verbs of probability (VRB_PROB)

Adverbs:
Adverb of time (ADV_TIME)
Adverbs of intensity (ADV_INTS)
Adverbs of manner (ADV_MAN)
Adverbs of place (ADV_PLC)
Adverbs of probability (ADV_PROB)

Miscellaneous:
Comparatives (COMPARE)
Conjunctions followed by finite verb (CONJ_VRB)
Irregular comparatives (COMP_IRR)
Lexical density ratio (DEN_TOK) (see note 4)
Long words (LONG) (see note 5)
Prepositions (PREPS)
Que subordinator (QUE)
Type/token ratio (TYP_TOK)


A factor's meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums the z-scores of the features with significant loadings in a cluster in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al. (2006) and other researchers' (e.g. Longacre 1983) description of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for distribution of words in the corpus by discourse type).
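The selection of representative texts can be sketched as a sum-and-rank step. The document names, z-score values, and function names below are hypothetical; the feature labels are borrowed from Table 1 purely for illustration.

```python
def summed_z(doc_z, cluster_features):
    """Sum one document's z-scores over the features of a cluster."""
    return sum(doc_z[f] for f in cluster_features)


def most_representative(corpus_z, cluster_features, top_n=2):
    """Rank documents by summed z-score, highest first, and keep the top ones."""
    ranked = sorted(
        corpus_z,
        key=lambda doc: summed_z(corpus_z[doc], cluster_features),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical per-document z-scores for three features.
corpus_z = {
    "essay_01": {"VRB_IMP": 1.2, "VRB_PRET": 0.8, "ADJ_DERV": -0.5},
    "essay_02": {"VRB_IMP": -0.3, "VRB_PRET": 0.1, "ADJ_DERV": 1.9},
    "essay_03": {"VRB_IMP": 0.4, "VRB_PRET": 0.5, "ADJ_DERV": 0.0},
}
cluster = ["VRB_IMP", "VRB_PRET"]
top = most_representative(corpus_z, cluster)
print(top)
```

The texts at the top of such a ranking are the ones a researcher would then read closely to decide what kind of discourse the cluster represents.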

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Table 1 (continued)

Adverbial clauses:
Adverbial conjunctions of mode, e.g. lo hice según me dijeron 'I did it the way they told me' (ADVCLS_M)
Adverbial conjunctions of place, e.g. la casa donde vivo 'the house where I live' (ADVCLS_L)
Adverbial clauses of time (ADVCLS_T)
Causal adverbial conjunctions (ADVCLS_C)
If clauses (ADVCLS_S)
Purpose adverbial clauses, i.e. para que 'so that' (ADVCLS_P)

Adjective clauses:
Relative pronouns with pre-posed preposition (REL_EN)
Relative pronouns without pre-posed preposition (REL_NO_EN)
Relative clauses on subjects, e.g. la casa que está allá 'the house that is there' (REL_SUBJ)

Nominal clauses:
Nominal clauses, non-subjunctive (NOM_IND)
Nominal clauses, subjunctive (NOM_SUBJ)


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors with their loadings meeting the 0.30 cutoff.
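As a rough illustration of how a loading cutoff sorts features into positive and negative clusters, consider the sketch below. It is not the authors' code: the function names are hypothetical, the Factor 3 loadings are read from Table 3 with decimal points and negative signs assumed, the `LONG` value is invented to show a feature that misses the cutoff, and the five-feature minimum is the threshold the paper mentions when discussing Factors 4 and 5.

```python
CUTOFF = 0.30        # salience cutoff for loadings
MIN_FEATURES = 5     # minimum cluster size treated as interpretable

def split_clusters(loadings, cutoff=CUTOFF):
    """Return (positive, negative) clusters of salient loadings."""
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return positive, negative

def is_interpretable(cluster, min_features=MIN_FEATURES):
    """A cluster is treated as a discourse type only if it is large enough."""
    return len(cluster) >= min_features

# Factor 3 loadings (signs on the negative set are an assumption).
factor3 = {
    "VRB_3RD": 0.63, "QUE": 0.55, "SAB_CON": 0.43,
    "VRB_CONO": 0.42, "REL_NO_EN": 0.38, "VRB_PROB": 0.30,
    "VRB_IMP": -0.45, "VRB_PRGA": -0.42, "PREPS": -0.30,
    "LONG": 0.12,    # invented, non-salient loading
}
positive, negative = split_clusters(factor3)
```

Under these assumptions the positive set has six features and passes the size check, while the three-feature negative set does not, which matches the paper's treatment of that set as indicating only what stance-heavy texts lack.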

The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor   Eigenvalue   Percent of variance   Cumulative (%)
1        4.459        5.945                  5.945
2        3.691        4.922                 10.867
3        2.697        3.596                 14.463
4        2.561        3.415                 17.877
5        1.789        2.386                 20.263
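The relationship among the columns of Table 2 can be checked with a short calculation: in a factor analysis of standardized variables, each factor's percent of variance is its eigenvalue divided by the number of variables. The sketch below takes the first eigenvalue as 4.459 (decimal points assumed) and uses 75 variables. That count is our inference from the ratio 4.459 / 0.05945 and is not stated in the text, which mentions 78 features studied.

```python
def variance_table(eigenvalues, n_variables):
    """Return rows of (factor, eigenvalue, percent, cumulative percent)."""
    rows, cumulative = [], 0.0
    for factor, ev in enumerate(eigenvalues, start=1):
        percent = 100 * ev / n_variables
        cumulative += percent
        rows.append((factor, ev, round(percent, 3), round(cumulative, 3)))
    return rows

eigenvalues = [4.459, 3.691, 2.697, 2.561, 1.789]
N_VARIABLES = 75  # assumption inferred from Table 2, not given in the paper

rows = variance_table(eigenvalues, N_VARIABLES)
for factor, ev, percent, cumulative in rows:
    print(f"{factor}  {ev:.3f}  {percent:.3f}  {cumulative:.3f}")
```

The computed percentages agree with the published ones to within rounding of the eigenvalues, and the cumulative total lands at roughly 20.26%.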


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Positive cluster (loadings)
Factor 1           Factor 2           Factor 3           Factor 4           Factor 5
VRB_IMP (0.56)     ADJ_DERV (0.71)    VRB_3RD (0.63)     PREPS (0.53)       VRB_INVA (0.36)
VRB_PRGA (0.55)    ADJ_PLR (0.59)     QUE (0.55)         VRB_INVA (0.52)    VRB_INF (0.35)
VRB_PRET (0.53)    ADJ_SNG (0.52)     SAB_CON (0.43)     VRB_INF (0.49)
VRB_SBJ (0.42)     ADJ_TYP1 (0.51)    VRB_CONO (0.42)    ADJ_PLR (0.31)
PRON_3RD (0.35)    ADJ_MSC (0.47)     REL_NO_EN (0.38)
VRB_CONO (0.35)    TYP_TOK (0.44)     VRB_PROB (0.30)
VRB_PSBJ (0.34)    ADJ_TYP2 (0.43)
VRB_COND (0.33)    ADJ_POST (0.41)
CLT_PRE (0.33)     ADJ_FEM (0.35)
SAB_CON (0.30)     LONG (0.30)

Negative cluster (loadings)
Factor 1           Factor 2           Factor 3           Factor 4           Factor 5
ART_NOUN (0.65)    NOUN_SNG (0.40)    VRB_IMP (0.45)     ADJ_SNG (0.59)     DEFART (0.53)
NOUN_FEM (0.56)    DEN_TOK (0.38)     VRB_PRGA (0.42)    ADJ_MSC (0.44)     ADJ_PLR (0.40)
NOUN_DER (0.53)                       PREPS (0.30)
NOUN_PLR (0.51)
NOUN_MSC (0.36)
ADJ_TYP1 (0.36)
ADJ_POST (0.35)
DEFART (0.31)
NOUN_SNG (0.31)
ADJ_FEM (0.30)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room everything seemed normal. There was no cloud that could be seen in the sky, then after making this observation I continued with my plan to leave my bedroom. I started to run around the designated trails and as the day still looked very nice I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those texts with the highest opposing score (i.e. the opposing sign) are antithetical to the cluster’s function.

Interestingly, not only do narrative texts have high positive scores in dimension 1, but so do argumentative essays and summaries. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of the colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolomé thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Although the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference, while notable, did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].
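The F statistics reported for the level comparisons are consistent with a one-way analysis of variance on two groups, where F(1, 634) implies 636 texts in total. The paper does not say which software was used, so the following is only a generic sketch of that computation; the function name and the toy scores are hypothetical.

```python
from statistics import mean

def one_way_f(*groups):
    """One-way ANOVA: return (F, df_between, df_within)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy summed z-scores for two instructional levels (invented numbers).
second_year = [1.0, 2.0, 3.0]
third_year = [2.0, 3.0, 4.0]
f_stat, df_b, df_w = one_way_f(second_year, third_year)
```

With two groups the result is equivalent to the square of an independent-samples t statistic, so either test would lead to the same conclusion about second- versus third-year texts.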

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth, it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then in the American mind the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge/cognitive verbs and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features, the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is dissociated from important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety rather than a small subset of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 4.6 + SD 4.9 = 10.5–11.
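The measures defined in notes 4 and 5, together with the type/token ratio listed among the variables, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the tag names in `CONTENT_TAGS` are hypothetical, the type/token ratio uses the conventional definition (distinct word forms divided by total tokens), which the paper does not spell out, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev

# Hypothetical tag names for the non-functor classes named in note 4.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV_DERIVED"}

def lexical_density(tagged_tokens):
    """Non-functor words divided by total words in a text."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)

def type_token_ratio(tokens):
    """Distinct word forms (case-folded) divided by total tokens."""
    return len({t.lower() for t in tokens}) / len(tokens)

def long_word_threshold(corpus_tokens):
    """Mean word length of the corpus plus one standard deviation."""
    lengths = [len(t) for t in corpus_tokens]
    return mean(lengths) + pstdev(lengths)

def long_words(tokens, threshold):
    """Words whose length exceeds the corpus-wide threshold."""
    return [t for t in tokens if len(t) > threshold]

# Toy examples (invented data).
tagged = [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"), ("bonita", "ADJ")]
density = lexical_density(tagged)
ttr = type_token_ratio(["la", "casa", "la", "playa"])
threshold = long_word_threshold(["a", "abc"])
found = long_words(["ab", "abcd"], threshold)
```

Because the threshold is computed once over the whole corpus, the same cut-off applies to every text, which keeps the long-word counts comparable across texts.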

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing’, available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency’, System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis’, Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training’, ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World [Database]’, available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing’, Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task’, Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish’, Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development’, Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar+adjective in three levels of Spanish FL learners’, Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing’, Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning’, Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL’, TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘‘Ready-to-speak’: Prefabricated sequences in spoken French as a second language and as a native language [‘Prêt-à-parler’: séquences préfabriquées en français parlé L2 et L1]’, Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences’, Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition, and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences’, Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts’, Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system’, EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables’, Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development’, Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?)’, Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French’, Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research’, Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research’, Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective’, Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre’, English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity’. Technical Report 17. University of Hawai‘i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. yuly.asencion@nau.edu

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.

Y ASENCION-DELANEY AND J COLLENTINE 25


Procedures: tagging, searching, and norming

Understanding multidimensional analyses requires knowledge of three key concepts: part-of-speech tagging, search patterns, and statistical techniques for eliminating various biases in how tokens are counted.

To access information about particular texts, each file includes a header with information about its topic, source type, author biographical information, and purpose (argumentative essay, narrative). To search for morphosyntactic information (e.g. all adjectives, all subjunctive forms), one needs a part-of-speech tagger: software that annotates every word with information about its major word classes (e.g. adjective, noun, verb, determiner), basic morphological information (e.g. plural, preterit), as well as its lemma (i.e. its unmarked dictionary root, such as a verb's infinitive or a noun's masculine singular form).

Part-of-speech tagging requires a dictionary with lexical and grammatical information. It also requires a pretagged corpus to train the software routines to determine unknown or ambiguous words' probable tags. We compiled our own dictionary, utilized a training set from samples from the Corpus del español (Biber et al. 2006), and tagged the corpus with n-gram software routines from the Natural Language Tool Kit (NLTK; http://www.nltk.org). Additionally, we wrote Spanish-specific routines (e.g. clitic sequences, derivational morphemes) to complement the NLTK routines to achieve greater tagging precision. After the corpus is tagged in this way, the investigator must verify the accuracy of the tagging and fix errors through further programming.

We studied 78 lexical, grammatical, and lexico-grammatical features (see Note 3). The features involved all parts of speech, common morphosyntactic constructs studied by learners, as well as additional constructs studied in Biber et al. (2006). They represented adjectives (e.g. derived, postnominal position), nouns (e.g. derived, feminine), adverbs (e.g. place, time), verb classes (e.g. imperfect aspect, past participle), verb phrases (e.g. communication, knowledge), and certain morphosyntactic features like dependent clauses, noun phrase configurations (e.g. article plus noun), and pronoun usage (e.g. third-person clitics).

Textual frequencies often require two mathematical conversions. First, norming transforms a phenomenon's count to its normed frequency. Since text lengths vary, longer texts inflate certain items' importance. To offset this text-length bias, one scales a phenomenon's frequency per text, such as per 1,000 words. Second, normalizing eliminates the feature-concentration bias: some phenomena are naturally scarce in a document (e.g. the subjunctive), while others are naturally common (e.g. articles) (cf. Biber and Conrad 2001). Normalizing converts a normed frequency to its z-score value vis-à-vis its normed frequency in each document. Consequently, one can measure the relative presence of two or more linguistic features within any given text. Additionally, one can sum various z-scores to determine how concentrated a set of features is in any text or group of texts.
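The two conversions described above can be illustrated with a minimal Python sketch. It is not the routine used in the study: the text names, lengths, and counts are invented, the helper names normed_frequency and z_scores are hypothetical, and the choice of the population standard deviation is an assumption, since the formula is not specified here.

```python
from statistics import mean, pstdev


def normed_frequency(count, text_length, per=1000):
    """Scale a raw count to a rate per `per` words of running text."""
    return count * per / text_length


def z_scores(values):
    """Standardize values against the mean and (population) SD of the list."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # a feature that never varies cannot be standardized
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical texts: length in words and raw feature counts (invented values).
texts = {
    "essay_a": {"length": 500, "counts": {"VRB_SBJ": 2, "DEFART": 40}},
    "essay_b": {"length": 1000, "counts": {"VRB_SBJ": 4, "DEFART": 90}},
    "essay_c": {"length": 250, "counts": {"VRB_SBJ": 3, "DEFART": 15}},
}
features = ["VRB_SBJ", "DEFART"]
names = sorted(texts)

# Step 1: norming removes the text-length bias (rate per 1,000 words).
normed = {
    name: {
        f: normed_frequency(texts[name]["counts"][f], texts[name]["length"])
        for f in features
    }
    for name in names
}

# Step 2: normalizing removes the feature-concentration bias by converting
# each feature's normed frequencies to z-scores across the texts.
standardized = {name: {} for name in names}
for f in features:
    column = [normed[name][f] for name in names]
    for name, z in zip(names, z_scores(column)):
        standardized[name][f] = z

for name in names:
    print(name, normed[name], standardized[name])
```

In this toy corpus the subjunctive is far rarer than the definite article in every text, yet after standardization the two features sit on the same scale, so their relative presence within a text can be compared or summed.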


Factor analysis

To determine how learners cluster lexical and grammatical features, we employ an exploratory principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions, also referred to as factors, along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus.

The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary, superset factor according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relative to the others. The examination of factors' eigenvalues in scree plots of each factor's total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature's importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying from 0.00, or no correlation, to 1.00, or absolute correlation. A dimension with two significant clusters will have two sets of loadings, a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster's sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster's communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the 0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high type-token ratio) represented semantically dense, literate discourse and the negative features (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse.

To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly 'rotated' (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method.
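The interpretive rule described above (a 0.30 loading cutoff and at least five salient loadings per cluster) can be sketched in a few lines of Python. The loadings below are invented for illustration and are not results from this study; the function names split_clusters and is_interpretable are hypothetical.

```python
CUTOFF = 0.30      # minimum absolute loading for a feature to count as salient
MIN_FEATURES = 5   # minimum number of salient loadings for an interpretable cluster


def split_clusters(loadings, cutoff=CUTOFF):
    """Separate one factor's loadings into positive and negative salient clusters."""
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return positive, negative


def is_interpretable(cluster, min_features=MIN_FEATURES):
    """Apply the five-loading threshold to a single cluster."""
    return len(cluster) >= min_features


# Hypothetical rotated loadings for one factor (invented values).
factor = {
    "VRB_IMP": 0.56, "VRB_PRET": 0.53, "VRB_SBJ": 0.42,
    "PRON_3RD": 0.35, "CLT_PRE": 0.33, "QUE": 0.12,
    "NOUN_FEM": -0.56, "DEFART": -0.53, "PREPS": -0.08,
}

positive, negative = split_clusters(factor)
print(sorted(positive), is_interpretable(positive))
print(sorted(negative), is_interpretable(negative))
```

In this invented factor the positive cluster has five salient loadings and would be interpreted, whereas the negative cluster has only two and would be treated as indicating what the positive cluster lacks rather than as a discourse type of its own.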


Table 1: Targeted linguistic features

Variable class               Variable

Noun                         Derived nouns (NOUN_DER)
                             Feminine nouns (NOUN_FEM)
                             Masculine nouns (NOUN_MSC)
                             Non-pluralizing nouns, e.g. virus 'virus' (NOUN_NPL)
                             Plural nouns (NOUN_PLR)
                             Singular nouns (NOUN_SNG)

Adjective                    Apocopated adjectives, e.g. buen 'good' (ADJ_SHRT)
                             Derived adjectives (ADJ_DERV)
                             Feminine adjectives (ADJ_FEM)
                             Four-inflection adjectives (ADJ_TYP1)
                             Masculine adjectives (ADJ_MSC)
                             Plural adjectives (ADJ_PLR)
                             Postmodifying adjectives (ADJ_POST)
                             Premodifying adjectives (ADJ_PRE)
                             Singular adjectives (ADJ_SNG)
                             Two-inflection adjectives (ADJ_TYP2)

Pronoun                      Third-person clitics (PRON_3RD)
                             Preverb clitics (CLT_PRE)
                             se + 3s verb, e.g. se come 'one eats' (SE_VRB3S)
                             Subject pronouns (PRON_SUB)
                             Unplanned se (SE_UNPLANNED)

Other noun phrase elements   Article-noun segments (ART_NOUN)
                             Definite articles (DEFART)
                             Possessive adjectives (DET_POSS)
                             Possessive syntax (WO_POSS)
                             Present participle modified by noun, e.g. agua hirviendo 'boiling water' (PS_NOUN)

Verbs                        Third-person verb (VRB_3RD)
                             Conditional (VRB_COND)
                             Copula plus adjective (not in other predicate categories) (COP_ADJ)
                             Evaluative predicates (PRD_EVAL)
                             Future (VRB_FUT)
                             Gustar-like verbs (VRB_GULI)
                             Imperfect (VRB_IMP)
                             Infinitive (VRB_INF)


Table 1: Continued

Variable class               Variable

                             Infinitive not preceded by verb or article (VRB_INVA)
                             Past participle (VRB_PS)
                             Past perfect (VRB_PPRF)
                             Past subjunctive (VRB_PSBJ)
                             Perfect tense verb (VRB_PRFC)
                             Periphrastic future, e.g. voy a comer 'I'm going to eat' (VRB_PERI)
                             Predicates of conclusions, e.g. es obvio 'it is obvious' (PRD_CONC)
                             Predicates of probability (PRD_PROB)
                             Present participle (VRB_PP)
                             Preterit (VRB_PRET)
                             Progressive aspect (VRB_PRGA)
                             Saber/conocer (SAB_CON)
                             Suasive predicates (PRD_SUAS)
                             Suasive verbs (VRB_SUAS)
                             Subjunctive (VRB_SBJ)
                             Verbs of communication, reporting (VRB_COM)
                             Verbs of conclusions (VRB_CONC)
                             Verbs of knowledge, cognitive verbs (VRB_CONO)
                             Verbs of observation (VRB_OBS)
                             Verbs of probability (VRB_PROB)

Adverbs                      Adverb of time (ADV_TIME)
                             Adverbs of intensity (ADV_INTS)
                             Adverbs of manner (ADV_MAN)
                             Adverbs of place (ADV_PLC)
                             Adverbs of probability (ADV_PROB)

Miscellaneous                Comparatives (COMPARE)
                             Conjunctions followed by finite verb (CONJ_VRB)
                             Irregular comparatives (COMP_IRR)
                             Lexical density ratio (DEN_TOK) (see Note 4)
                             Long words (LONG) (see Note 5)
                             Prepositions (PREPS)
                             Que subordinator (QUE)
                             Type/token ratio (TYP_TOK)


A factor's meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums, for each text, the z-scores of the features with significant loadings in a cluster in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al.'s (2006) and other researchers' (e.g. Longacre 1983) descriptions of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for distribution of words in the corpus by discourse type).
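A minimal Python sketch of this ranking step follows. It assumes that each text's feature frequencies have already been converted to z-scores; the text names, the z-score values, and the three-feature subset standing in for a cluster are all invented for illustration, and the function names are hypothetical.

```python
def summed_z(text_z, cluster_features):
    """Sum one text's z-scores over the features that define a cluster."""
    return sum(text_z[f] for f in cluster_features)


def rank_texts(z_by_text, cluster_features):
    """Order texts from most to least representative of the cluster."""
    scores = {
        name: summed_z(z, cluster_features) for name, z in z_by_text.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical cluster: a three-feature subset used only for illustration.
narrative_cluster = ["VRB_IMP", "VRB_PRET", "PRON_3RD"]

# Hypothetical z-scores per text (invented values).
z_by_text = {
    "story_01": {"VRB_IMP": 1.5, "VRB_PRET": 1.0, "PRON_3RD": 0.5},
    "essay_07": {"VRB_IMP": -0.5, "VRB_PRET": -1.0, "PRON_3RD": 0.25},
    "summary_03": {"VRB_IMP": 0.5, "VRB_PRET": 0.25, "PRON_3RD": 0.0},
}

ranking = rank_texts(z_by_text, narrative_cluster)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Texts at the top of such a ranking are the candidates for qualitative inspection, while texts with strongly opposite summed scores illustrate what the cluster is not.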

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Table 1: Continued

Variable class               Variable

Adverbial clauses            Adverbial conjunctions of mode, e.g. lo hice según me dijeron 'I did it the way they told me' (ADVCLS_M)
                             Adverbial conjunctions of place, e.g. la casa donde vivo 'the house where I live' (ADVCLS_L)
                             Adverbial clauses of time (ADVCLS_T)
                             Causal adverbial conjunctions (ADVCLS_C)
                             If clauses (ADVCLS_S)
                             Purpose adverbial clauses, i.e. para que 'so that' (ADVCLS_P)

Adjective clauses            Relative pronouns with pre-posed preposition (REL_EN)
                             Relative pronouns without pre-posed preposition (REL_NO_EN)
                             Relative clauses on subjects, e.g. la casa que está allá 'the house that is there' (REL_SUBJ)

Nominal clauses              Nominal clauses, non-subjunctive (NOM_IND)
                             Nominal clauses, subjunctive (NOM_SUBJ)


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters' discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor   Eigenvalue   Percent of variance   Cumulative (%)

1        4.459        5.945                 5.945
2        3.691        4.922                 10.867
3        2.697        3.596                 14.463
4        2.561        3.415                 17.877
5        1.789        2.386                 20.263
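As a quick arithmetic check on Table 2, the cumulative column can be rebuilt from the per-factor percentages. The sketch below uses only the values printed in the table; the variable names are hypothetical, and the 0.001 discrepancies it reveals are presumably due to rounding in the reported figures.

```python
from itertools import accumulate

# Values transcribed from Table 2.
percent_of_variance = [5.945, 4.922, 3.596, 3.415, 2.386]
reported_cumulative = [5.945, 10.867, 14.463, 17.877, 20.263]

# Running total of the variance explained as factors are added.
cumulative = list(accumulate(percent_of_variance))

# Largest difference between the recomputed and the reported cumulative values.
max_gap = max(abs(c - r) for c, r in zip(cumulative, reported_cumulative))

for factor, (c, r) in enumerate(zip(cumulative, reported_cumulative), start=1):
    print(f"Factor {factor}: recomputed {c:.3f}, reported {r:.3f}")
print(f"Largest gap: {max_gap:.3f}")
```

The recomputed totals agree with the reported column to within about one thousandth of a percentage point, so the five factors together account for roughly a fifth of the shared variance.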


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Factor                       1                  2                  3                  4                  5

Positive cluster (loadings)  VRB_IMP (0.56)     ADJ_DERV (0.71)    VRB_3RD (0.63)     PREPS (0.53)       VRB_INVA (0.36)
                             VRB_PRGA (0.55)    ADJ_PLR (0.59)     QUE (0.55)         VRB_INVA (0.52)    VRB_INF (0.35)
                             VRB_PRET (0.53)    ADJ_SNG (0.52)     SAB_CON (0.43)     VRB_INF (0.49)
                             VRB_SBJ (0.42)     ADJ_TYP1 (0.51)    VRB_CONO (0.42)    ADJ_PLR (0.31)
                             PRON_3RD (0.35)    ADJ_MSC (0.47)     REL_NO_EN (0.38)
                             VRB_CONO (0.35)    TYP_TOK (0.44)     VRB_PROB (0.30)
                             VRB_PSBJ (0.34)    ADJ_TYP2 (0.43)
                             VRB_COND (0.33)    ADJ_POST (0.41)
                             CLT_PRE (0.33)     ADJ_FEM (0.35)
                             SAB_CON (0.30)     LONG (0.30)

Negative cluster (loadings)  ART_NOUN (-0.65)   NOUN_SNG (-0.40)   VRB_IMP (-0.45)    ADJ_SNG (-0.59)    DEFART (-0.53)
                             NOUN_FEM (-0.56)   DEN_TOK (-0.38)    VRB_PRGA (-0.42)   ADJ_MSC (-0.44)    ADJ_PLR (-0.40)
                             NOUN_DER (-0.53)                      PREPS (-0.30)
                             NOUN_PLR (-0.51)
                             NOUN_MSC (-0.36)
                             ADJ_TYP1 (-0.36)
                             ADJ_POST (-0.35)
                             DEFART (-0.31)
                             NOUN_SNG (-0.31)
                             ADJ_FEM (-0.30)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners' written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky; then, after making this observation, I continued with my plan to leave my bedroom. I started to run around the designated trails and, as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those texts with the highest opposing scores (i.e. the opposing sign) are antithetical to the cluster's function.

Interestingly, not only do narrative texts have high positive scores in dimension 1, but also argumentative essays and summaries. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of the colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolome thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables, including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent 'irrealis' modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer's perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Although the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1's negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing 'literate' discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay sample exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].

Figure 3: Concentration of expository features by text type and averaged summed z-scores

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners' written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4's similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay sample, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture, and examples of government and politicians. This influence is subconscious but strong, and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then, in the American mind, the definition of event (sic: success) is to be rich, be beautiful and young, and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano 'Americans'), descriptor chains (e.g. subconsciente pero fuerte 'subconscious but strong'), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology, and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others' perspectives. This cluster includes mostly lexical features that indicate writers' stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer's risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5's detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

The writer here speculates about an author's intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3's negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners' writing.

DISCUSSION AND CONCLUSIONS

The development of learners' interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners' lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners' interlanguage could be explained by the five factors, while Biber et al.'s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus' variance. Biber et al.'s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber's (2001) study of English texts and in Parodi's (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners' narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners' personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners' tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 4.6 + SD 4.9 = 10.5–11.
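
The two measures defined in Notes 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag names in `CONTENT_TAGS`, the function names, and the six-word sample are hypothetical, and the use of the population standard deviation is a guess, since the notes do not say which estimator was applied to the corpus.

```python
from statistics import mean, pstdev

# Hypothetical part-of-speech tags counted as non-functor (content) words.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}


def lexical_density(tagged_tokens):
    """Non-functor words divided by total words; input is (word, tag) pairs."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_cutoff(words):
    """Mean word length plus one standard deviation of word length."""
    lengths = [len(w) for w in words]
    # Assumption: population SD; a sample SD would shift the cutoff slightly.
    return mean(lengths) + pstdev(lengths)


def long_words(words):
    """Words whose length exceeds the cutoff computed over the same word list."""
    cutoff = long_word_cutoff(words)
    return [w for w in words if len(w) > cutoff]


# Hypothetical tagged sample (illustration only, not corpus data).
sample = [
    ("la", "DET"), ("casa", "NOUN"), ("es", "VERB"),
    ("extraordinariamente", "DERIVED_ADV"), ("bonita", "ADJ"), ("y", "CONJ"),
]
words = [w for w, _ in sample]
density = lexical_density(sample)
print(round(density, 2), long_words(words))
```

In the study the cutoff is computed once over the whole corpus rather than per text; the sample here is kept tiny only so the arithmetic is easy to follow.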

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.
Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.
Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.
Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.
Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis,’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.
Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation,’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.
Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.
Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.
British National Consortium. 2001. ‘BNC World [Database],’ available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.
Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.
Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto,’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.
Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.
Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.
Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.
Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions,’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.
Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.
Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.
Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.
Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.
Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.
Givón, T. 1985. ‘Function, structure, and language acquisition,’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.
Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.
Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.
Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.
Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.
Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.
Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.
Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.
Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.
Longacre, R. 1983. The Grammar of Discourse. Plenum.
Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.
Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.
Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.
O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.
Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.
Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus,’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.
Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.
Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.
Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile,’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.
Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity,’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.


Factor analysis

To determine how learners cluster lexical and grammatical features, we employ an exploratory principal factor analysis (PFA) (Biber and Conrad 2001). PFA identifies dimensions (also referred to as factors) along which the 78 features in Table 1 co-vary statistically. The result is a series of dimensions according to which one could classify the texts of a corpus and the features characterizing each dimension. Frequently, a factor may have two opposing clusters, which is why this technique is often referred to as a multidimensional analysis. Opposing clusters of features occur in complementary distribution within the corpus.

The factors differ in terms of how much variation they account for. The dimension accounting for most variance represents the primary, superset factor according to which all texts can be classified. Dimensions that represent less variance are subset factors of the superset. A superset factor may identify the features of formal and informal language, with the remaining factors representing genres within the superset. Each factor has an eigenvalue, which represents its importance relative to the others. The examination of factors’ eigenvalues in scree plots of each factor’s total variance helps to determine how many factors are important enough to report. In a scree plot, the eigenvalues for each factor typically flatten out at some point, which is an indication of relatively unimportant factors. The measurement of a feature’s importance within a factor is referred to as its loading, which takes the form of a correlation coefficient that represents how much the feature correlates with the cluster identified, varying from 0.00, or no correlation, to 1.00, or absolute correlation. A dimension with two significant clusters will have two sets of loadings, a set of positive loadings and another set of negative loadings, differing mathematically in terms of their sign (although the direction of a cluster’s sign is irrelevant). Higher absolute loadings are more useful in the interpretation of a cluster’s communicative function. For a cluster to represent some meaningful discourse type, Biber (1988) recommends that it should have at least five loadings at or above the 0.30 cutoff. What is useful about this approach is that it identifies different discourse types in a corpus, and the loadings reveal the features that most represent those discourse types. Biber et al. (2006), for instance, found six dimensions in oral and written native-speaker Spanish: four had significant clusters of both positive and negative features; two had only one significant cluster. In the first (superset) dimension, the positive features (e.g. nouns, post-modifying adjectives, long words, high type/token ratio) represented semantically dense, literate discourse and the negative features (e.g. suasive nominal clauses, second-person pronouns, demonstratives) oral discourse.
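
The cutoff logic described in this paragraph can be sketched as follows. This is a minimal illustration rather than the authors’ procedure: the function names and the example loadings are hypothetical, and only the 0.30 cutoff and the five-loading minimum are taken from the text.

```python
def split_clusters(loadings, cutoff=0.30):
    """Split one factor's loadings into a positive and a negative cluster.

    Only features whose absolute loading meets the cutoff are kept.
    """
    positive = {f: v for f, v in loadings.items() if v >= cutoff}
    negative = {f: v for f, v in loadings.items() if v <= -cutoff}
    return positive, negative


def is_interpretable(cluster, min_features=5):
    """Rule of thumb: a cluster needs at least five salient loadings."""
    return len(cluster) >= min_features


# Hypothetical loadings for a single factor (illustration only).
example_factor = {
    "FEATURE_A": 0.56, "FEATURE_B": 0.53, "FEATURE_C": 0.42,
    "FEATURE_D": 0.35, "FEATURE_E": 0.30, "FEATURE_F": 0.12,
    "FEATURE_G": -0.65, "FEATURE_H": -0.31, "FEATURE_I": -0.05,
}

pos, neg = split_clusters(example_factor)
print(sorted(pos, key=pos.get, reverse=True))
print(is_interpretable(pos), is_interpretable(neg))
```

Keeping the two clusters in separate dictionaries mirrors the point that the sign only distinguishes the clusters from each other; within a cluster, interpretation relies on the absolute size of the loadings.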

To obtain a factor analysis that includes the fewest, most representative features in each factor while considering idiosyncrasies of textual data, linguistic PFA results are commonly ‘rotated’ (Biber 1988). This mathematical technique maximizes high loading values and minimizes low values. Following Biber (1988), we rotated the factors using the Promax method.


Table 1: Targeted linguistic features

Variable class                Variable
Noun                          Derived nouns (NOUN_DER)
                              Feminine nouns (NOUN_FEM)
                              Masculine nouns (NOUN_MSC)
                              Non-pluralizing nouns, e.g. virus ‘virus’ (NOUN_NPL)
                              Plural nouns (NOUN_PLR)
                              Singular nouns (NOUN_SNG)
Adjective                     Apocopated adjectives, e.g. buen ‘good’ (ADJ_SHRT)
                              Derived adjectives (ADJ_DERV)
                              Feminine adjectives (ADJ_FEM)
                              Four-inflection adjectives (ADJ_TYP1)
                              Masculine adjectives (ADJ_MSC)
                              Plural adjectives (ADJ_PLR)
                              Postmodifying adjectives (ADJ_POST)
                              Premodifying adjectives (ADJ_PRE)
                              Singular adjectives (ADJ_SNG)
                              Two-inflection adjectives (ADJ_TYP2)
Pronoun                       Third-person clitics (PRON_3RD)
                              Preverb clitics (CLT_PRE)
                              se + 3s verb, e.g. se come ‘one eats’ (SE_VRB3S)
                              Subject pronouns (PRON_SUB)
                              Unplanned se (SE_UNPLANNED)
Other noun phrase elements    Article–noun segments (ART_NOUN)
                              Definite articles (DEFART)
                              Possessive adjectives (DET_POSS)
                              Possessive syntax (WO_POSS)
                              Present participle modified by noun, e.g. agua hirviendo ‘boiling water’ (PS_NOUN)
Verbs                         Third-person verb (VRB_3RD)
                              Conditional (VRB_COND)
                              Copula plus adjective (not in other predicate categories) (COP_ADJ)
                              Evaluative predicates (PRD_EVAL)
                              Future (VRB_FUT)
                              Gustar-like verbs (VRB_GULI)
                              Imperfect (VRB_IMP)
                              Infinitive (VRB_INF)


                              Infinitive not preceded by verb or article (VRB_INVA)
                              Past participle (VRB_PS)
                              Past perfect (VRB_PPRF)
                              Past subjunctive (VRB_PSBJ)
                              Perfect tense verb (VRB_PRFC)
                              Periphrastic future, e.g. voy a comer ‘I’m going to eat’ (VRB_PERI)
                              Predicates of conclusions, e.g. es obvio ‘it is obvious’ (PRD_CONC)
                              Predicates of probability (PRD_PROB)
                              Present participle (VRB_PP)
                              Preterit (VRB_PRET)
                              Progressive aspect (VRB_PRGA)
                              Saber/conocer (SAB_CON)
                              Suasive predicates (PRD_SUAS)
                              Suasive verbs (VRB_SUAS)
                              Subjunctive (VRB_SBJ)
                              Verbs of communication/reporting (VRB_COM)
                              Verbs of conclusions (VRB_CONC)
                              Verbs of knowledge/cognitive verbs (VRB_CONO)
                              Verbs of observation (VRB_OBS)
                              Verbs of probability (VRB_PROB)
Adverbs                       Adverb of time (ADV_TIME)
                              Adverbs of intensity (ADV_INTS)
                              Adverbs of manner (ADV_MAN)
                              Adverbs of place (ADV_PLC)
                              Adverbs of probability (ADV_PROB)
Miscellaneous                 Comparatives (COMPARE)
                              Conjunctions followed by finite verb (CONJ_VRB)
                              Irregular comparatives (COMP_IRR)
                              Lexical density ratio (DEN_TOK) (see Note 4)
                              Long words (LONG) (see Note 5)
                              Prepositions (PREPS)
                              Que subordinator (QUE)
                              Type/token ratio (TYP_TOK)


A factor’s meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums the z-scores of a cluster’s significant loadings in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al.’s (2006) and other researchers’ (e.g. Longacre 1983) description of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for distribution of words in the corpus by discourse type).
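
As a rough sketch of the summed z-score step, the snippet below standardizes each feature across texts, adds up the z-scores of a cluster’s features for every text, and ranks the texts. The text identifiers and frequencies are hypothetical, the function name is ours, and the choice of the population standard deviation is an assumption; VRB_IMP and VRB_PRET are used only because they appear in Table 1.

```python
from statistics import mean, pstdev


def summed_z_scores(texts, cluster_features):
    """Return {text_id: sum of z-scores over the cluster's features}.

    `texts` maps a text id to {feature: normalized frequency}.
    """
    stats = {}
    for feat in cluster_features:
        values = [counts[feat] for counts in texts.values()]
        # Assumption: population SD across all texts in the corpus.
        stats[feat] = (mean(values), pstdev(values))

    scores = {}
    for text_id, counts in texts.items():
        total = 0.0
        for feat in cluster_features:
            mu, sigma = stats[feat]
            if sigma > 0:  # a constant feature contributes nothing
                total += (counts[feat] - mu) / sigma
        scores[text_id] = total
    return scores


# Hypothetical normalized frequencies for three texts (illustration only).
corpus = {
    "text_1": {"VRB_IMP": 12.0, "VRB_PRET": 10.0},
    "text_2": {"VRB_IMP": 4.0, "VRB_PRET": 6.0},
    "text_3": {"VRB_IMP": 2.0, "VRB_PRET": 2.0},
}
scores = summed_z_scores(corpus, ["VRB_IMP", "VRB_PRET"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Texts at the top of the ranking are the candidates for qualitative reading; texts at the bottom have the strongest scores with the opposite sign.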

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Table 1 (continued)

Variable class                Variable
Adverbial clauses             Adverbial conjunctions of mode, e.g. lo hice según me dijeron ‘I did it the way they told me’ (ADVCLS_M)
                              Adverbial conjunctions of place, e.g. la casa donde vivo ‘the house where I live’ (ADVCLS_L)
                              Adverbial clauses of time (ADVCLS_T)
                              Causal adverbial conjunctions (ADVCLS_C)
                              If clauses (ADVCLS_S)
                              Purpose adverbial clauses, i.e. para que ‘so that’ (ADVCLS_P)
Adjective clauses             Relative pronouns with pre-posed preposition (REL_EN)
                              Relative pronouns without pre-posed preposition (REL_NO_EN)
                              Relative clauses on subjects, e.g. la casa que está allá ‘the house that is there’ (REL_SUBJ)
Nominal clauses               Nominal clauses: non-subjunctive (NOM_IND)
                              Nominal clauses: subjunctive (NOM_SUBJ)


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors, with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor    Eigenvalue    Percent of variance    Cumulative (%)
1         4.459         5.945                  5.945
2         3.691         4.922                  10.867
3         2.697         3.596                  14.463
4         2.561         3.415                  17.877
5         1.789         2.386                  20.263
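
The cumulative column of Table 2 is a running total of the per-factor percentages, which can be checked with a short sketch. The values below assume that the table’s figures are read with the decimal point after the first digit (5.945, 4.922, and so on); that reading is an assumption about the printed layout, and the variable names are ours.

```python
from itertools import accumulate

# Percent of variance per factor, under the decimal-placement assumption above.
percent_of_variance = [5.945, 4.922, 3.596, 3.415, 2.386]

# Running total: each entry adds the current factor to all previous ones.
cumulative = list(accumulate(percent_of_variance))

for factor, (pct, cum) in enumerate(zip(percent_of_variance, cumulative), start=1):
    print(f"Factor {factor}: {pct:.3f}% (cumulative {cum:.3f}%)")
```

Under that reading the running total after five factors is a little over 20%, in line with the share of shared variance reported for the five-factor solution; differences in the last decimal place are rounding.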


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Factor 1
  Positive cluster (loadings): VRB_IMP (0.56), VRB_PRGA (0.55), VRB_PRET (0.53), VRB_SBJ (0.42), PRON_3RD (0.35), VRB_CONO (0.35), VRB_PSBJ (0.34), VRB_COND (0.33), CLT_PRE (0.33), SAB_CON (0.30)
  Negative cluster (loadings): ART_NOUN (-0.65), NOUN_FEM (-0.56), NOUN_DER (-0.53), NOUN_PLR (-0.51), NOUN_MSC (-0.36), ADJ_TYP1 (-0.36), ADJ_POST (-0.35), DEFART (-0.31), NOUN_SNG (-0.31), ADJ_FEM (-0.30)

Factor 2
  Positive cluster (loadings): ADJ_DERV (0.71), ADJ_PLR (0.59), ADJ_SNG (0.52), ADJ_TYP1 (0.51), ADJ_MSC (0.47), TYP_TOK (0.44), ADJ_TYP2 (0.43), ADJ_POST (0.41), ADJ_FEM (0.35), LONG (0.30)
  Negative cluster (loadings): NOUN_SNG (-0.40), DEN_TOK (-0.38)

Factor 3
  Positive cluster (loadings): VRB_3RD (0.63), QUE (0.55), SAB_CON (0.43), VRB_CONO (0.42), REL_NO_EN (0.38), VRB_PROB (0.30)
  Negative cluster (loadings): VRB_IMP (-0.45), VRB_PRGA (-0.42), PREPS (-0.30)

Factor 4
  Positive cluster (loadings): PREPS (0.53), VRB_INVA (0.52), VRB_INF (0.49), ADJ_PLR (0.31)
  Negative cluster (loadings): ADJ_SNG (-0.59), ADJ_MSC (-0.44)

Factor 5
  Positive cluster (loadings): VRB_INVA (0.36), VRB_INF (0.35)
  Negative cluster (loadings): DEFART (-0.53), ADJ_PLR (-0.40)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky; then, after making this observation, I continued with my plan to leave my bedroom. I started to run around the designated trails and, as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those texts with the highest opposing scores (i.e. with the opposite sign) are antithetical to the cluster’s function.

Interestingly, not only do narrative texts have high positive scores in dimension 1 but also argumentative essays and summaries. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of the colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolomé thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors: whereas the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].
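
The bracketed statistics have the form of a one-way comparison of two groups, with 1 and 634 degrees of freedom. A minimal sketch of how such an F value is computed is given below. It assumes a standard one-way ANOVA on per-text summed z-scores, which the text does not state explicitly, and the function name and two tiny groups are hypothetical.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical summed z-scores for two learner groups (illustration only).
second_year = [1.0, 2.0, 3.0]
third_year = [2.0, 3.0, 4.0]

f_stat, df_b, df_w = one_way_anova_f(second_year, third_year)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With two groups the between-groups degrees of freedom are always 1, and the within-groups degrees of freedom equal the number of texts minus 2, so F(1, 634) is consistent with 636 texts in the comparison.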

Factor 2: Descriptive expository prose

The positive cluster of dimension 2 (the only significant one) represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura, y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte, y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas, y eventualmente se recibirá la vida perfecta. Entonces, en la mente americano, la definición del suceso es ser rico, ser bonito y joven, y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture, and examples of government and politicians. This influence is subconscious but strong, and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things, and eventually he will get the perfect life. Then, in the American mind, the definition of event (sic: success) is to be rich, be beautiful and young, and have a lot of the best and rich things.]

Here, we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 4.6 + SD 4.9 = 10.5–11.
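The two measures defined in Notes 4 and 5 can be illustrated with a minimal Python sketch. It is not the authors' tagging pipeline: the tag names, the toy sentence, and the choice of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical tag names for the non-functor classes listed in Note 4.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV_DERIVED"}


def lexical_density(tagged_tokens):
    """Share of non-functor words among all words in a text (Note 4)."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _word, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_threshold(corpus_words):
    """Mean word length of the corpus plus one standard deviation (Note 5)."""
    lengths = [len(word) for word in corpus_words]
    # Population SD is an assumption; the note does not say which SD is used.
    return mean(lengths) + pstdev(lengths)


def long_words(words, threshold):
    """Words strictly longer than the corpus-level threshold."""
    return [word for word in words if len(word) > threshold]


# Toy one-sentence "corpus"; the tags are illustrative only.
tagged = [("los", "DET"), ("estudiantes", "NOUN"), ("tienen", "VERB"),
          ("grandes", "ADJ"), ("oportunidades", "NOUN")]
words = [word for word, _tag in tagged]

density = lexical_density(tagged)        # 4 content words out of 5
threshold = long_word_threshold(words)   # mean length 8 plus SD of about 3.58
longest = long_words(words, threshold)   # only words longer than the threshold
```

In a real corpus the threshold would be computed once over all words and then applied text by text, so that the LONG count is comparable across texts.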

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World [Database],’ available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition, and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy and complexity,’ Technical Report 17. University of Hawai‘i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.


Table 1: Targeted linguistic features

Variable class              Variable
Noun                        Derived nouns (NOUN_DER)
                            Feminine nouns (NOUN_FEM)
                            Masculine nouns (NOUN_MSC)
                            Non-pluralizing nouns, e.g. virus ‘virus’ (NOUN_NPL)
                            Plural nouns (NOUN_PLR)
                            Singular nouns (NOUN_SNG)
Adjective                   Apocopated adjectives, e.g. buen ‘good’ (ADJ_SHRT)
                            Derived adjectives (ADJ_DERV)
                            Feminine adjectives (ADJ_FEM)
                            Four-inflection adjectives (ADJ_TYP1)
                            Masculine adjectives (ADJ_MSC)
                            Plural adjectives (ADJ_PLR)
                            Postmodifying adjectives (ADJ_POST)
                            Premodifying adjectives (ADJ_PRE)
                            Singular adjectives (ADJ_SNG)
                            Two-inflection adjectives (ADJ_TYP2)
Pronoun                     Third-person clitics (PRON_3RD)
                            Preverb clitics (CLT_PRE)
                            se + 3s verb, e.g. se come ‘one eats’ (SE_VRB3S)
                            Subject pronouns (PRON_SUB)
                            Unplanned se (SE_UNPLANNED)
Other noun phrase elements  Article-noun segments (ART_NOUN)
                            Definite articles (DEFART)
                            Possessive adjectives (DET_POSS)
                            Possessive syntax (WO_POSS)
                            Present participle modified by noun, e.g. agua hirviendo ‘boiling water’ (PS_NOUN)
Verbs                       Third-person verb (VRB_3RD)
                            Conditional (VRB_COND)
                            Copula plus adjective (not in other predicate categories) (COP_ADJ)
                            Evaluative predicates (PRD_EVAL)
                            Future (VRB_FUT)
                            Gustar-like verbs (VRB_GULI)
                            Imperfect (VRB_IMP)
                            Infinitive (VRB_INF)
                            Infinitive not preceded by verb or article (VRB_INVA)
                            Past participle (VRB_PS)
                            Past perfect (VRB_PPRF)
                            Past subjunctive (VRB_PSBJ)
                            Perfect tense verb (VRB_PRFC)
                            Periphrastic future, e.g. voy a comer ‘I’m going to eat’ (VRB_PERI)
                            Predicates of conclusions, e.g. es obvio ‘it is obvious’ (PRD_CONC)
                            Predicates of probability (PRD_PROB)
                            Present participle (VRB_PP)
                            Preterit (VRB_PRET)
                            Progressive aspect (VRB_PRGA)
                            Saber/conocer (SAB_CON)
                            Suasive predicates (PRD_SUAS)
                            Suasive verbs (VRB_SUAS)
                            Subjunctive (VRB_SBJ)
                            Verbs of communication/reporting (VRB_COM)
                            Verbs of conclusions (VRB_CONC)
                            Verbs of knowledge/cognitive verbs (VRB_CONO)
                            Verbs of observation (VRB_OBS)
                            Verbs of probability (VRB_PROB)
Adverbs                     Adverbs of time (ADV_TIME)
                            Adverbs of intensity (ADV_INTS)
                            Adverbs of manner (ADV_MAN)
                            Adverbs of place (ADV_PLC)
                            Adverbs of probability (ADV_PROB)
Miscellaneous               Comparatives (COMPARE)
                            Conjunctions followed by finite verb (CONJ_VRB)
                            Irregular comparatives (COMP_IRR)
                            Lexical density ratio (DEN_TOK) [see Note 4]
                            Long words (LONG) [see Note 5]
                            Prepositions (PREPS)
                            Que subordinator (QUE)
                            Type/token ratio (TYP_TOK)


A factor’s meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums the z-scores of a cluster’s significant loadings in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al. (2006) and other researchers’ (e.g. Longacre 1983) description of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for distribution of words in the corpus by discourse type).
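The summing step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure as actually run in the study: the text identifiers and feature rates are invented, the rates are assumed to be normalized frequencies, and the population standard deviation is used for standardization.

```python
from statistics import mean, pstdev


def summed_z_scores(rates, cluster):
    """Sum, for each text, the z-scores of the features in one cluster.

    rates   -- {text_id: {feature: normalized frequency}}
    cluster -- feature names with significant loadings on the cluster
    """
    texts = list(rates)
    totals = {text: 0.0 for text in texts}
    for feature in cluster:
        values = [rates[text][feature] for text in texts]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a constant feature cannot separate texts
        for text in texts:
            totals[text] += (rates[text][feature] - mu) / sigma
    return totals


# Hypothetical rates per 1,000 words for two narrative-cluster features.
rates = {
    "narrative_01": {"VRB_IMP": 30.0, "VRB_PRET": 40.0},
    "essay_01": {"VRB_IMP": 10.0, "VRB_PRET": 20.0},
    "summary_01": {"VRB_IMP": 20.0, "VRB_PRET": 30.0},
}
scores = summed_z_scores(rates, ["VRB_IMP", "VRB_PRET"])

# The text with the highest summed score is the most representative one.
most_representative = max(scores, key=scores.get)
```

Texts with large scores of the opposite sign are the ones that lack the cluster's features, which is how the later figures contrast text types.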

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Table 1: Continued

Variable class              Variable
Adverbial clauses           Adverbial conjunctions of mode, e.g. lo hice según me dijeron ‘I did it the way they told me’ (ADVCLS_M)
                            Adverbial conjunctions of place, e.g. la casa donde vivo ‘the house where I live’ (ADVCLS_L)
                            Adverbial clauses of time (ADVCLS_T)
                            Causal adverbial conjunctions (ADVCLS_C)
                            If clauses (ADVCLS_S)
                            Purpose adverbial clauses, i.e. para que ‘so that’ (ADVCLS_P)
Adjective clauses           Relative pronouns with pre-posed preposition (REL_EN)
                            Relative pronouns without pre-posed preposition (REL_NO_EN)
                            Relative clauses on subjects, e.g. la casa que está allá ‘the house that is there’ (REL_SUBJ)
Nominal clauses             Nominal clauses, non-subjunctive (NOM_IND)
                            Nominal clauses, subjunctive (NOM_SUBJ)


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors, with their loadings meeting the 0.30 cutoff.
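How a factor's loadings are split into a positive and a negative cluster at the 0.30 cutoff, and how the five-feature threshold mentioned for Factors 4 and 5 applies, can be sketched as follows. The Factor 3 loadings are those reported in Table 3, entered here with negative signs for the negative cluster (an assumption about the table's sign convention); the sub-threshold ADV_TIME value and the function names are hypothetical.

```python
CUTOFF = 0.30      # loading cutoff reported in the text
MIN_FEATURES = 5   # five-feature threshold reported in the text


def split_clusters(loadings, cutoff=CUTOFF):
    """Return (positive, negative) feature lists, strongest loading first."""
    positive = sorted((f for f, v in loadings.items() if v >= cutoff),
                      key=lambda f: -loadings[f])
    negative = sorted((f for f, v in loadings.items() if v <= -cutoff),
                      key=lambda f: loadings[f])
    return positive, negative


def meets_threshold(cluster, min_features=MIN_FEATURES):
    """True when a cluster has enough features to be interpreted."""
    return len(cluster) >= min_features


factor3 = {
    "VRB_3RD": 0.63, "QUE": 0.55, "SAB_CON": 0.43, "VRB_CONO": 0.42,
    "REL_NO_EN": 0.38, "VRB_PROB": 0.30,
    "VRB_IMP": -0.45, "VRB_PRGA": -0.42, "PREPS": -0.30,
    "ADV_TIME": 0.12,  # hypothetical loading below the cutoff
}
positive, negative = split_clusters(factor3)
```

On these figures the positive cluster has six features and passes the threshold, while the negative cluster has three and does not, which matches the treatment of Dimension 3's negative cluster as indicating only what is absent.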

The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factors   Eigenvalues   Percent of variance   Cumulative (%)
1         4.459         5.945                 5.945
2         3.691         4.922                 10.867
3         2.697         3.596                 14.463
4         2.561         3.415                 17.877
5         1.789         2.386                 20.263
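The percentages in Table 2 can be checked with a short Python sketch, assuming the usual convention that a factor's share of variance is its eigenvalue divided by the number of analyzed variables. The value of 75 variables is an inference from the table's own figures (4.459 / 0.05945 ≈ 75), not a number stated in the text.

```python
# Eigenvalues as reported in Table 2.
eigenvalues = [4.459, 3.691, 2.697, 2.561, 1.789]

# Assumption: inferred from Table 2, not stated in the article.
N_VARIABLES = 75


def percent_variance(eigs, n_vars):
    """Return per-factor and cumulative percentages of variance."""
    percents = [100 * e / n_vars for e in eigs]
    cumulative = []
    running = 0.0
    for p in percents:
        running += p
        cumulative.append(running)
    return percents, cumulative


percents, cumulative = percent_variance(eigenvalues, N_VARIABLES)
for factor, (p, c) in enumerate(zip(percents, cumulative), start=1):
    print(f"Factor {factor}: {p:.3f}% of variance, {c:.3f}% cumulative")
```

Small differences in the third decimal place relative to Table 2 are expected, since the published eigenvalues are themselves rounded.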


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Factor 1            Factor 2            Factor 3            Factor 4            Factor 5

Positive cluster (loadings)
VRB_IMP (0.56)      ADJ_DERV (0.71)     VRB_3RD (0.63)      PREPS (0.53)        VRB_INVA (0.36)
VRB_PRGA (0.55)     ADJ_PLR (0.59)      QUE (0.55)          VRB_INVA (0.52)     VRB_INF (0.35)
VRB_PRET (0.53)     ADJ_SNG (0.52)      SAB_CON (0.43)      VRB_INF (0.49)
VRB_SBJ (0.42)      ADJ_TYP1 (0.51)     VRB_CONO (0.42)     ADJ_PLR (0.31)
PRON_3RD (0.35)     ADJ_MSC (0.47)      REL_NO_EN (0.38)
VRB_CONO (0.35)     TYP_TOK (0.44)      VRB_PROB (0.30)
VRB_PSBJ (0.34)     ADJ_TYP2 (0.43)
VRB_COND (0.33)     ADJ_POST (0.41)
CLT_PRE (0.33)      ADJ_FEM (0.35)
SAB_CON (0.30)      LONG (0.30)

Negative cluster (loadings)
ART_NOUN (-0.65)    NOUN_SNG (-0.40)    VRB_IMP (-0.45)     ADJ_SNG (-0.59)     DEFART (-0.53)
NOUN_FEM (-0.56)    DEN_TOK (-0.38)     VRB_PRGA (-0.42)    ADJ_MSC (-0.44)     ADJ_PLR (-0.40)
NOUN_DER (-0.53)                        PREPS (-0.30)
NOUN_PLR (-0.51)
NOUN_MSC (-0.36)
ADJ_TYP1 (-0.36)
ADJ_POST (-0.35)
DEFART (-0.31)
NOUN_SNG (-0.31)
ADJ_FEM (-0.30)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados, y como el día todavía parecía muy bonito, decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky; then, after making this observation, I continued with my plan to leave my bedroom. I started to run around the designated trails, and as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those texts with the highest opposing score (i.e. with the opposing sign) are antithetical to the cluster’s function.

Interestingly, not only do narrative texts have high positive scores in dimension 1, but also argumentative essays and summaries. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of the colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia, este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolomé thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Whereas the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference is notable but did not approach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].

Factor 2: Descriptive expository prose

The positive cluster of dimension 2 (the only significant one) represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud; es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces, en la mente americano la definición del suceso es ser rico, ser bonito y joven, y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture, and examples of government and politicians. This influence is subconscious but strong, and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things, and eventually he will get the perfect life. Then, in the American mind, the definition of event (sic: success) is to be rich, be beautiful and young, and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology, and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others' perspectives. This cluster includes mostly lexical features that indicate writers' stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer's risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5's detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author's intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3's negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners' writing.
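The threshold described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, the five-feature rule is assumed to apply to each cluster (positive or negative) separately, and the loadings are the positive Factor 3 and Factor 5 values as read from Table 3.

```python
CUTOFF = 0.30       # loading cutoff stated in the text
MIN_FEATURES = 5    # five-feature threshold stated in the text

def salient(loadings, sign):
    """Features whose loading has the given sign (+1 or -1) and meets the cutoff."""
    return sorted(f for f, w in loadings.items() if w * sign >= CUTOFF)

def is_significant_cluster(loadings, sign):
    """Assumption: a cluster counts as significant with at least MIN_FEATURES features."""
    return len(salient(loadings, sign)) >= MIN_FEATURES

# Positive loadings as read from Table 3.
factor_3 = {"VRB_3RD": 0.63, "QUE": 0.55, "SAB_CON": 0.43,
            "VRB_CONO": 0.42, "REL_NO_EN": 0.38, "VRB_PROB": 0.30}
factor_5 = {"VRB_INVA": 0.36, "VRB_INF": 0.35}

print(is_significant_cluster(factor_3, +1))  # six features at or above the cutoff
print(is_significant_cluster(factor_5, +1))  # only two features
```

Read this way, the positive cluster of Factor 3 passes with six features, while the positive cluster of Factor 5 falls short with two.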

DISCUSSION AND CONCLUSIONS

The development of learners' interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners' lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners' interlanguage could be explained by the five factors, while Biber et al.'s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus' variance. Biber et al.'s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity since, in Biber's (2001) study of English texts and in Parodi's (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners' narrative discourse is not limited to presenting an account of events in the past, but also reflects learners' personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners' tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is dissociated from important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners' discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners' verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study's limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given that a multidimensional analysis examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language's inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O'Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word's lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si 'if', articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11
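The measures defined in Notes 4 and 5, together with the type/token ratio, can be sketched as follows. This is a hedged illustration rather than the authors' tooling: the tag names, the use of the population standard deviation, case-folding for types, and the five-word sample with its tags are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical tag names standing in for nouns, adjectives, verbs, and derived adverbs.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}

def lexical_density(tagged_tokens):
    """Non-functor words divided by total words (Note 4)."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)

def long_word_threshold(corpus_words):
    """Mean word length plus one standard deviation of word length (Note 5).
    Assumption: population SD; the source does not say which SD was used."""
    lengths = [len(w) for w in corpus_words]
    return mean(lengths) + pstdev(lengths)

def long_words(words, threshold):
    """Words strictly longer than the threshold."""
    return [w for w in words if len(w) > threshold]

def type_token_ratio(words):
    """Distinct word forms (case-folded, an assumption) divided by total words."""
    return len({w.lower() for w in words}) / len(words)

# Toy sample with hypothetical tags; a real analysis would use the whole corpus.
tagged = [("los", "DET"), ("jóvenes", "NOUN"), ("tienen", "VERB"),
          ("bastantes", "ADJ"), ("oportunidades", "NOUN")]
words = [w for w, _ in tagged]
threshold = long_word_threshold(words)
print(lexical_density(tagged), long_words(words, threshold), type_token_ratio(words))
```

In practice the threshold would be computed once over the full corpus and then applied to each text, so that "long" means the same thing across all texts being compared.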

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. 'ACTFL Proficiency Guidelines – Writing,' available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.
Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.
Belz, J. 2004. 'Learner corpus analysis and the development of foreign language proficiency,' System 32/4: 577–97.
Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.
Biber, D. 2001. 'On the complexity of discourse complexity: A multidimensional analysis' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.
Biber, D. and S. Conrad. 2001. 'Introduction: Multidimensional analysis and the study of register variation' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.
Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. 'Spoken and written register variation in Spanish: A multi-dimensional analysis,' Corpora 1: 1–37.
Boulton, A. 2009. 'Testing the limits of data-driven learning: Language proficiency and training,' ReCALL 21/1: 37–54.
British National Consortium. 2001. 'BNC World [Database],' available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.
Canale, M. and M. Swain. 1980. 'Theoretical bases of communicative approaches to second language teaching and testing,' Applied Linguistics 1/1: 1–47.
Castellano, A. 2000. 'Ambigüedad y variación del pronombre personal sujeto' in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.
Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. 'The uses of Spanish copulas by Chinese-speaking learners in a free writing task,' Bilingualism: Language and Cognition 11/3: 301–17.
Collentine, J. 1995. 'The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,' Hispania 78/1: 123–36.
Collentine, J. 2004. 'The effects of learning contexts on morphosyntactic and lexical development,' Studies in Second Language Acquisition 26/2: 227–48.
Collentine, J. 2009. 'Study abroad research: Findings, implications, and future directions' in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.
Collentine, J. and Y. Asención-Delaney. 2010. 'A corpus-based analysis of the discourse functions of Ser/Estar+adjective in three levels of Spanish FL learners,' Language Learning 60/2: 409–45.
Ellis, N. C. 2002. 'Reflections on frequency effects in language processing,' Studies in Second Language Acquisition 24/2: 297–339.
Ellis, N. C. 2006. 'Language acquisition as rational contingency learning,' Applied Linguistics 27/1: 1–24.
Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. 'Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,' TESOL Quarterly 42/3: 375–96.
Forsberg, F. 2005. ''Ready-to-speak': Prefabricated sequences in spoken French as a second language and as a native language ['Prêt-à-parler': séquences préfabriquées en français parlé L2 et L1],' Cahiers De l'Institut De Linguistique De Louvain 31: 183–95.
Geyer, N. 2007. 'Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,' Language Learning 57/3: 337–67.
Givón, T. 1985. 'Function, structure, and language acquisition' in D. Slobin (ed.): The Cross-Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.
Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.
Grant, L. and A. Ginther. 2000. 'Using computer-tagged linguistic features to describe L2 writing differences,' Journal of Second Language Writing 9/2: 123–45.
Hinkel, E. 2004. 'Tense, aspect and the passive voice in L1 and L2 academic texts,' Language Teaching Research 8/1: 5–29.
Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.
Howard, M. 2002. 'Prototypical and non-prototypical marking in the advanced learner's aspectuo-temporal system,' EuroSLA Yearbook 2: 87–114.
Howard, M. 2006. 'Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,' Canadian Modern Language Review 62/3: 379–400.
Kasper, G. 2001. 'Four perspectives on L2 pragmatic development,' Applied Linguistics 22/4: 502–30.
Klein, W. and C. Perdue. 1997. 'The basic variety (Or: Couldn't natural languages be much simpler?),' Second Language Research 13/4: 301–47.
Longacre, R. 1983. The Grammar of Discourse. Plenum.
Marsden, E. and A. David. 2008. 'Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,' Language Learning Journal 36/2: 181–98.
Myles, F. 2005. 'Interlanguage corpora and second language acquisition research,' Second Language Research 21/4: 373–91.
Myles, F. and R. Mitchell. 2005. 'Using information technology to support empirical SLA research,' Journal of Applied Linguistics 1/2: 169–96.
O'Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.
Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.
Parodi, G. 2007. 'Variation across registers in Spanish: Exploring the El Grial PUCV Corpus' in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.
Skiba, R. and N. Dittmar. 1992. 'Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,' Studies in Second Language Acquisition 14/3: 323–49.
Upton, T. A. and U. Connor. 2001. 'Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,' English for Specific Purposes 20/4: 313–29.
Weber, E. and P. Bentivoglio. 1991. 'Verbs of cognition in spoken Spanish: A discourse profile' in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.
Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. 'Second language development in writing: measures of fluency, accuracy, and complexity.' Technical Report 17. University of Hawai'i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yulyasencionnauedu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.


Variable class      Variable
                    Infinitive not preceded by verb or article (VRB_INVA)
                    Past participle (VRB_PS)
                    Past perfect (VRB_PPRF)
                    Past subjunctive (VRB_PSBJ)
                    Perfect tense verb (VRB_PRFC)
                    Periphrastic future, e.g. voy a comer 'I'm going to eat' (VRB_PERI)
                    Predicates of conclusions, e.g. es obvio 'it is obvious' (PRD_CONC)
                    Predicates of probability (PRD_PROB)
                    Present participle (VRB_PP)
                    Preterit (VRB_PRET)
                    Progressive aspect (VRB_PRGA)
                    Saber/conocer (SAB_CON)
                    Suasive predicates (PRD_SUAS)
                    Suasive verbs (VRB_SUAS)
                    Subjunctive (VRB_SBJ)
                    Verbs of communication/reporting (VRB_COM)
                    Verbs of conclusions (VRB_CONC)
                    Verbs of knowledge/cognitive verbs (VRB_CONO)
                    Verbs of observation (VRB_OBS)
                    Verbs of probability (VRB_PROB)
Adverbs             Adverb of time (ADV_TIME)
                    Adverbs of intensity (ADV_INTS)
                    Adverbs of manner (ADV_MAN)
                    Adverbs of place (ADV_PLC)
                    Adverbs of probability (ADV_PROB)
Miscellaneous       Comparatives (COMPARE)
                    Conjunctions followed by finite verb (CONJ_VRB)
                    Irregular comparatives (COMP_IRR)
                    Lexical density ratio (DEN_TOK) (see Note 4)
                    Long words (LONG) (see Note 5)
                    Prepositions (PREPS)
                    Que subordinator (QUE)
                    Type/token ratio (TYP_TOK)


A factor's meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums the z-scores of a cluster's significant loadings in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al. (2006) and other researchers' (e.g. Longacre 1983) description of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for distribution of words in the corpus by discourse type).
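The summed z-score procedure described above can be sketched as follows. This is a minimal illustration under assumptions the text does not settle: feature rates are standardized across texts with the population standard deviation, the function names are hypothetical, and the three-text rate table is invented for the example.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one feature's rates across all texts."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def summed_cluster_scores(rates, cluster):
    """rates: {feature: [rate per text]}; cluster: feature names in one cluster.
    Returns one summed z-score per text."""
    per_feature = [z_scores(rates[f]) for f in cluster]
    return [sum(column) for column in zip(*per_feature)]

# Hypothetical rates for three texts (illustration only, not the study's data).
rates = {
    "VRB_IMP": [1.0, 2.0, 3.0],
    "VRB_PRET": [2.0, 4.0, 6.0],
}
scores = summed_cluster_scores(rates, ["VRB_IMP", "VRB_PRET"])
# The text with the largest summed z-score is the most representative of the cluster.
most_representative = max(range(len(scores)), key=scores.__getitem__)
print(scores, most_representative)
```

Averaging these per-text sums within each text type would give the kind of "average summed z-score by text type" that the figures in the Results section report.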

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Variable class      Variable
Adverbial clauses   Adverbial conjunctions of mode, e.g. lo hice según me dijeron 'I did it the way they told me' (ADVCLS_M)
                    Adverbial conjunctions of place, e.g. la casa donde vivo 'the house where I live' (ADVCLS_L)
                    Adverbial clauses of time (ADVCLS_T)
                    Causal adverbial conjunctions (ADVCLS_C)
                    If clauses (ADVCLS_S)
                    Purpose adverbial clauses, i.e. para que 'so that' (ADVCLS_P)
Adjective clauses   Relative pronouns with pre-posed preposition (REL_EN)
                    Relative pronouns without pre-posed preposition (REL_NO_EN)
                    Relative clauses on subjects, e.g. la casa que está allá 'the house that is there' (REL_SUBJ)
Nominal clauses     Nominal clauses, non-subjunctive (NOM_IND)
                    Nominal clauses, subjunctive (NOM_SUBJ)


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors, with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters' discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor   Eigenvalue   Percent of variance   Cumulative (%)
1        4.459        5.945                 5.945
2        3.691        4.922                 10.867
3        2.697        3.596                 14.463
4        2.561        3.415                 17.877
5        1.789        2.386                 20.263
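The arithmetic behind Table 2 can be checked with a short sketch, reading the table's values with their decimal points restored (e.g. an eigenvalue of 4.459 and 5.945% of variance for Factor 1). For standardized variables, a factor's share of variance is its eigenvalue divided by the number of variables. The count of 75 variables is an assumption inferred from the ratio 4.459 / 0.05945; it is not stated in this section.

```python
from itertools import accumulate

eigenvalues = [4.459, 3.691, 2.697, 2.561, 1.789]  # Table 2, decimals restored
N_VARIABLES = 75  # assumption inferred from the table, not stated in the text

# Percent of variance per factor and the running total.
percent = [100 * ev / N_VARIABLES for ev in eigenvalues]
cumulative = list(accumulate(percent))

for i, (ev, p, c) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"Factor {i}: eigenvalue {ev:.3f}, {p:.3f}% of variance, cumulative {c:.3f}%")
```

Under that assumption the computed percentages agree with the table to within rounding of the published eigenvalues, and the cumulative total comes to about 20.26%, in line with the 20.2% reported in the text.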


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Positive cluster (loadings)
Factor 1           Factor 2           Factor 3           Factor 4           Factor 5
VRB_IMP (0.56)     ADJ_DERV (0.71)    VRB_3RD (0.63)     PREPS (0.53)       VRB_INVA (0.36)
VRB_PRGA (0.55)    ADJ_PLR (0.59)     QUE (0.55)         VRB_INVA (0.52)    VRB_INF (0.35)
VRB_PRET (0.53)    ADJ_SNG (0.52)     SAB_CON (0.43)     VRB_INF (0.49)
VRB_SBJ (0.42)     ADJ_TYP1 (0.51)    VRB_CONO (0.42)    ADJ_PLR (0.31)
PRON_3RD (0.35)    ADJ_MSC (0.47)     REL_NO_EN (0.38)
VRB_CONO (0.35)    TYP_TOK (0.44)     VRB_PROB (0.30)
VRB_PSBJ (0.34)    ADJ_TYP2 (0.43)
VRB_COND (0.33)    ADJ_POST (0.41)
CLT_PRE (0.33)     ADJ_FEM (0.35)
SAB_CON (0.30)     LONG (0.30)

Negative cluster (loadings)
Factor 1           Factor 2           Factor 3           Factor 4           Factor 5
ART_NOUN (-0.65)   NOUN_SNG (-0.40)   VRB_IMP (-0.45)    ADJ_SNG (-0.59)    DEFART (-0.53)
NOUN_FEM (-0.56)   DEN_TOK (-0.38)    VRB_PRGA (-0.42)   ADJ_MSC (-0.44)    ADJ_PLR (-0.40)
NOUN_DER (-0.53)                      PREPS (-0.30)
NOUN_PLR (-0.51)
NOUN_MSC (-0.36)
ADJ_TYP1 (-0.36)
ADJ_POST (-0.35)
DEFART (-0.31)
NOUN_SNG (-0.31)
ADJ_FEM (-0.30)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners' written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room, everything seemed normal. There was no cloud that could be seen in the sky; then, after making this observation, I continued with my plan to leave my bedroom. I started to run around the designated trails and, as the day still looked very nice, I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those texts with the highest opposing scores (i.e. scores of the opposing sign) are antithetical to the cluster's function.

Interestingly, not only narrative texts but also argumentative essays and summaries have high positive scores in dimension 1. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of the colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia, este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolomé thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent 'irrealis' modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer's perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors: although they averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1's negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing 'literate' discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, as the following Expository essay excerpt exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. They averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].
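The group comparisons reported in this section take the form of an F statistic with two sets of degrees of freedom. As a minimal sketch, and assuming the comparison is a one-way analysis of variance on each text's summed z-score (the section does not name the test), the statistic can be computed as below. The two lists are toy values invented for illustration, not data from the corpus.

```python
from statistics import mean

def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy summed z-scores (hypothetical, for illustration only)
second_year = [1.0, 2.0, 3.0]
third_year = [2.0, 3.0, 4.0]

f_stat, df1, df2 = one_way_f(second_year, third_year)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

Under this reading, the within-groups degrees of freedom equal the number of scored texts minus the number of groups, so a value of 634 with two groups would correspond to 636 texts.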

Factor 2: Descriptive expository prose

The positive cluster of dimension 2 (the only significant one) represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners' written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4's similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the following Descriptive expository essay excerpt, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud; es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte, y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces, en la mente americano, la definición del suceso es ser rico, ser bonito y joven, y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong, and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then, in the American mind, the definition of event (sic: success) is to be rich, be beautiful and young, and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano 'Americans'), descriptor chains (e.g. subconsciente pero fuerte 'subconscious but strong'), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactic functions (i.e. attributive and predicative roles). Also, long words tend to be dense in derivational morphology, and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score

Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others' perspectives. This cluster includes mostly lexical features that indicate writers' stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer's risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features, namely the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5's detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author's intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3's negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners' writing.

DISCUSSION AND CONCLUSIONS

The development of learners' interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners' lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners' interlanguage could be explained by the five factors, while Biber et al.'s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus' variance. Biber et al.'s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2 and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber's (2001) study of English texts and in Parodi's (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners' narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners' personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners' tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is dissociated from important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners' discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners' verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study's limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1. Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language's inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2. Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O'Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3. A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word's lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si 'if', articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4. Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5. Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 4.6 + SD 4.9 = 10.5–11.
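The two measures defined in Notes 4 and 5 can be sketched in a few lines. This is a minimal illustration rather than the authors' tagging pipeline: the tag names in `CONTENT_TAGS`, the `(word, tag)` input format, and the choice of population standard deviation are all assumptions made for the example, and the five-word sample stands in for a whole corpus.

```python
from statistics import mean, pstdev

# Hypothetical part-of-speech labels for non-functor (content) words
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}

def lexical_density(tagged_tokens):
    """Share of content words among all (word, tag) tokens in a text."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)

def long_word_threshold(corpus_words):
    """Mean word length plus one standard deviation (population SD assumed)."""
    lengths = [len(w) for w in corpus_words]
    return mean(lengths) + pstdev(lengths)

def long_words(text_words, threshold):
    """Words strictly longer than the corpus-level threshold."""
    return [w for w in text_words if len(w) > threshold]

# Tiny hand-tagged sample (hypothetical tags) standing in for a corpus
tagged = [("los", "DET"), ("jóvenes", "NOUN"), ("tienen", "VERB"),
          ("bastantes", "ADJ"), ("oportunidades", "NOUN")]
words = [w for w, _ in tagged]

density = lexical_density(tagged)
threshold = long_word_threshold(words)
long_list = long_words(words, threshold)
```

In practice the threshold would be computed once over the whole corpus and then applied to each text, which is why the two steps are kept as separate functions.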

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. 'ACTFL Proficiency Guidelines – Writing,' available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. 'Learner corpus analysis and the development of foreign language proficiency,' System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. 'On the complexity of discourse complexity: A multidimensional analysis' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. 'Introduction: Multidimensional analysis and the study of register variation' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. 'Spoken and written register variation in Spanish: A multi-dimensional analysis,' Corpora 1: 1–37.

Boulton, A. 2009. 'Testing the limits of data-driven learning: Language proficiency and training,' ReCALL 21/1: 37–54.

British National Consortium. 2001. 'BNC World [Database],' available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. 'Theoretical bases of communicative approaches to second language teaching and testing,' Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. 'Ambigüedad y variación del pronombre personal sujeto' in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. 'The uses of Spanish copulas by Chinese-speaking learners in a free writing task,' Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. 'The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,' Hispania 78/1: 123–36.

Collentine, J. 2004. 'The effects of learning contexts on morphosyntactic and lexical development,' Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. 'Study abroad research: Findings, implications and future directions' in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. 'A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,' Language Learning 60/2: 409–45.

Ellis, N. C. 2002. 'Reflections on frequency effects in language processing,' Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. 'Language acquisition as rational contingency learning,' Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. 'Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,' TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. '"Ready-to-speak": Prefabricated sequences in spoken French as a second language and as a native language ["Prêt-à-parler": séquences préfabriquées en français parlé L2 et L1],' Cahiers De l'Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. 'Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,' Language Learning 57/3: 337–67.

Givón, T. 1985. 'Function, structure, and language acquisition' in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. 'Using computer-tagged linguistic features to describe L2 writing differences,' Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. 'Tense, aspect and the passive voice in L1 and L2 academic texts,' Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. 'Prototypical and non-prototypical marking in the advanced learner's aspectuo-temporal system,' EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. 'Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,' Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. 'Four perspectives on L2 pragmatic development,' Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. 'The basic variety (Or: Couldn't natural languages be much simpler?),' Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. 'Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,' Language Learning Journal 36/2: 181–98.

Myles, F. 2005. 'Interlanguage corpora and second language acquisition research,' Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. 'Using information technology to support empirical SLA research,' Journal of Applied Linguistics 1/2: 169–96.

O'Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. 'Variation across registers in Spanish: Exploring the El Grial PUCV Corpus' in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. 'Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,' Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. 'Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,' English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. 'Verbs of cognition in spoken Spanish: A discourse profile' in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. 'Second language development in writing: measures of fluency, accuracy and complexity.' Technical Report 17. University of Hawai'i Second Language Teaching and Curriculum Center.

NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.

A factor's meaning requires interpretation: What linguistic activity does this cluster of features represent? To aid in this interpretive process, one sums each text's z-scores for the features with significant loadings in a cluster, in order to easily identify the most representative texts of a factor or cluster. Then, in the qualitative analysis of our study, we examined the most representative texts using Biber et al.'s (2006) and other researchers' (e.g. Longacre 1983) descriptions of Spanish discourse types in order to determine the type of discourse portrayed by the cluster of linguistic features in those texts (see Figure 1 for distribution of words in the corpus by discourse type).
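The summing step described above can be illustrated with a short sketch. It assumes that each feature is first standardized across all texts and that a text's score for a cluster is the plain sum of its z-scores on the cluster's features; the feature rates are invented for the example, and the use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one feature's per-text rates (population SD assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def summed_z(rates_by_feature, cluster):
    """Sum each text's z-scores over the features of one cluster."""
    z = {f: z_scores(rates_by_feature[f]) for f in cluster}
    n_texts = len(z[cluster[0]])
    return [sum(z[f][i] for f in cluster) for i in range(n_texts)]

# Hypothetical rates per 1,000 words for three texts
rates = {
    "VRB_IMP": [10.0, 20.0, 30.0],
    "VRB_PRET": [5.0, 5.0, 20.0],
}

scores = summed_z(rates, ["VRB_IMP", "VRB_PRET"])
# Index of the text that best represents the cluster
most_representative = max(range(len(scores)), key=scores.__getitem__)
```

Texts with the largest sums are the ones singled out for qualitative reading, while large sums of the opposite sign mark texts that run against the cluster's function.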

RESULTS

The following section summarizes the findings for the multidimensional analysis. First, we identify the five dimensions yielded by the statistical analysis. For each dimension, we present the lexical and grammatical features clustering for positive and negative loading sets of the dimension, and their communicative functions. Finally, differences between second and third year in each dimension are addressed.

Variable class: Variable

Adverbial clauses:
  Adverbial conjunctions of mode, e.g. lo hice según me dijeron 'I did it the way they told me' (ADVCLS_M)
  Adverbial conjunctions of place, e.g. la casa donde vivo 'the house where I live' (ADVCLS_L)
  Adverbial clauses of time (ADVCLS_T)
  Causal adverbial conjunctions (ADVCLS_C)
  If clauses (ADVCLS_S)
  Purpose adverbial clauses, i.e. para que 'so that' (ADVCLS_P)

Adjective clauses:
  Relative pronouns with pre-posed preposition (REL_EN)
  Relative pronouns without pre-posed preposition (REL_NO_EN)
  Relative clauses on subjects, e.g. la casa que está allá 'the house that is there' (REL_SUBJ)

Nominal clauses:
  Nominal clauses, non-subjunctive (NOM_IND)
  Nominal clauses, subjunctive (NOM_SUBJ)

Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors with their loadings meeting the 0.30 cutoff.
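The cutoff logic can be sketched as follows. The sketch assumes that a feature joins a factor's positive or negative cluster when its loading reaches 0.30 in absolute value, and that a cluster is treated as interpretable only when it has at least five such features; applying the five-feature threshold to each cluster separately is a reading of the text, not something it states outright. The Factor 3 loadings are taken from Table 3, with negative signs following the cluster labels, and `NOUN_PLR` is given a made-up sub-cutoff value purely to show a feature being excluded.

```python
CUTOFF = 0.30       # loading cutoff reported in the text
MIN_FEATURES = 5    # five-feature threshold (assumed to apply per cluster)

def split_clusters(loadings, cutoff=CUTOFF):
    """Split one factor's loadings into positive and negative clusters."""
    positive = {f: w for f, w in loadings.items() if w >= cutoff}
    negative = {f: w for f, w in loadings.items() if w <= -cutoff}
    return positive, negative

def is_interpretable(cluster, min_features=MIN_FEATURES):
    """A cluster counts as a discourse type only with enough features."""
    return len(cluster) >= min_features

factor3 = {
    "VRB_3RD": 0.63, "QUE": 0.55, "SAB_CON": 0.43, "VRB_CONO": 0.42,
    "REL_NO_EN": 0.38, "VRB_PROB": 0.30,
    "VRB_IMP": -0.45, "VRB_PRGA": -0.42, "PREPS": -0.30,
    "NOUN_PLR": 0.12,  # hypothetical value below the cutoff
}

positive, negative = split_clusters(factor3)
```

Under these assumptions the positive cluster of Factor 3 qualifies with six features, while its three-feature negative cluster does not, which matches the way the two clusters are treated in the discussion of that factor.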

The following details the findings and our interpretations of the clusters' discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factor   Eigenvalue   Percent of variance   Cumulative (%)
1        4.459        5.945                 5.945
2        3.691        4.922                 10.867
3        2.697        3.596                 14.463
4        2.561        3.415                 17.877
5        1.789        2.386                 20.263
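The columns of Table 2 are related by simple arithmetic, which the sketch below reproduces. It assumes the usual convention that a factor's percent of variance is its eigenvalue divided by the number of analyzed variables; the value of 75 variables is inferred from the ratio of the first two columns and is not stated in this section.

```python
eigenvalues = [4.459, 3.691, 2.697, 2.561, 1.789]  # from Table 2
N_VARIABLES = 75  # assumption: inferred from 4.459 / 0.05945, not given here

# Percent of variance accounted for by each factor
percent = [100 * ev / N_VARIABLES for ev in eigenvalues]

# Running total across the five factors
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"Factor {i}: {p:.3f}% (cumulative {c:.3f}%)")
```

The cumulative figure for the five factors comes to roughly 20.26%, in line with the 20.2% of shared variance cited for the five-factor solution.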

Table 3: Summary of factors (i.e. dimensions) and significant clusters

Positive cluster (loadings)

Factor 1          Factor 2          Factor 3          Factor 4          Factor 5
VRB_IMP (0.56)    ADJ_DERV (0.71)   VRB_3RD (0.63)    PREPS (0.53)      VRB_INVA (0.36)
VRB_PRGA (0.55)   ADJ_PLR (0.59)    QUE (0.55)        VRB_INVA (0.52)   VRB_INF (0.35)
VRB_PRET (0.53)   ADJ_SNG (0.52)    SAB_CON (0.43)    VRB_INF (0.49)
VRB_SBJ (0.42)    ADJ_TYP1 (0.51)   VRB_CONO (0.42)   ADJ_PLR (0.31)
PRON_3RD (0.35)   ADJ_MSC (0.47)    REL_NO_EN (0.38)
VRB_CONO (0.35)   TYP_TOK (0.44)    VRB_PROB (0.30)
VRB_PSBJ (0.34)   ADJ_TYP2 (0.43)
VRB_COND (0.33)   ADJ_POST (0.41)
CLT_PRE (0.33)    ADJ_FEM (0.35)
SAB_CON (0.30)    LONG (0.30)

Negative cluster (loadings)

Factor 1          Factor 2          Factor 3          Factor 4          Factor 5
ART_NOUN (-0.65)  NOUN_SNG (-0.40)  VRB_IMP (-0.45)   ADJ_SNG (-0.59)   DEFART (-0.53)
NOUN_FEM (-0.56)  DEN_TOK (-0.38)   VRB_PRGA (-0.42)  ADJ_MSC (-0.44)   ADJ_PLR (-0.40)
NOUN_DER (-0.53)                    PREPS (-0.30)
NOUN_PLR (-0.51)
NOUN_MSC (-0.36)
ADJ_TYP1 (-0.36)
ADJ_POST (-0.35)
DEFART (-0.31)
NOUN_SNG (-0.31)
ADJ_FEM (-0.30)

Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners' written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative: Era un día lleno de sol y el aire lleno de aromas diferentes a flores silvestres. Al salir de mi dormitorio todo parecía normal. No había ninguna nube que se pudiera ver en el cielo, entonces después de hacer esta observación continué con mi plan de salir de mi dormitorio. Comencé a correr por los caminitos designados y como el día todavía parecía muy bonito decidí meterme adentro del bosque. (It was a sunny day and the air filled with aromas of different wildflowers. Upon leaving my room everything seemed normal. There was no cloud that could be seen in the sky, then after making this observation I continued with my plan to leave my bedroom. I started to run around the designated trails and as the day still looked very nice I decided to get into the woods.)

Figure 2 describes the text types that have the highest average summed z-scores for the features in the narrative cluster. Text types with the highest average summed z-score are most representative of the cluster, while those with the highest scores of the opposing sign are antithetical to the cluster's function.

Interestingly, not only do narrative texts have high positive scores in dimension 1, but so do argumentative essays and summaries. Students frequently approached summaries and argumentation with narrative elements, perhaps to compensate for a lack of more sophisticated abilities, which has been shown to be the case for L2 writers of English in expository writing (Hinkel 2004). For instance, one second-year learner wrote an argumentative text about stereotypes of colonial times and used evidence from a story about a friar in the new land.

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia, este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolome thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Whereas the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006).

Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].
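The group comparisons reported in this section take the form of a one-way ANOVA with two groups. The sketch below shows how such an F statistic and its degrees of freedom are computed. It is an illustration under stated assumptions: the function name and the toy scores are hypothetical, and the study's actual per-text scores are not available here. The reported degrees of freedom (1, 634) are what a two-group design with 636 scored texts would give.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical summed z-scores for a handful of texts per level.
second_year = [0.0, 1.0, 2.0]
third_year = [2.0, 3.0, 4.0]
print(one_way_anova(second_year, third_year))
```

With only two groups, this F statistic equals the square of the pooled-variance t statistic, so either test would lead to the same conclusion about the two instructional levels.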

Factor 2: Descriptive expository prose

The positive cluster of dimension 2 (the only significant one) represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth, it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then in the American mind the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features, the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11.
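The definitions in notes 4 and 5 can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the tag names in CONTENT_TAGS, the (word, tag) input format, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical part-of-speech labels for non-functor (content) words.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV_DERIVED"}


def lexical_density(tagged_tokens):
    """Non-functor words divided by total words in a text (note 4)."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_threshold(corpus_words):
    """Mean word length of the corpus plus one standard deviation (note 5)."""
    lengths = [len(word) for word in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(text_words, threshold):
    """Words whose length exceeds the corpus-level threshold."""
    return [word for word in text_words if len(word) > threshold]


# Hypothetical four-word 'corpus' used only to exercise the functions.
tagged = [("los", "DET"), ("estudiantes", "NOUN"),
          ("tienen", "VERB"), ("oportunidades", "NOUN")]
words = [word for word, _ in tagged]
print(lexical_density(tagged))
print(long_words(words, long_word_threshold(words)))
```

Because the threshold is computed once over the whole corpus, the same cutoff applies to every text, which keeps long-word counts comparable across texts of different types.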

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World [Database],’ available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar+adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity,’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yulyasencionnauedu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.


Factors and dimensions

The scree-plot analysis of the eigenvalues for the rotated factors of the PFA indicated that a five-factor solution was optimal, representing 20.2% of the shared variance (Table 2).

Table 3 shows the feature clusters in the five factors with their loadings meeting the 0.30 cutoff.

The following details the findings and our interpretations of the clusters’ discursive function, along with qualitative analysis of representative samples of the factors.

Figure 1: Corpus word counts by discourse type

Table 2: Eigenvalues of the rotated factor analysis

Factors   Eigenvalues   Percent of variance   Cumulative (%)
1         4.459         5.945                 5.945
2         3.691         4.922                 10.867
3         2.697         3.596                 14.463
4         2.561         3.415                 17.877
5         1.789         2.386                 20.263
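The relation between the Eigenvalues column and the two percentage columns of Table 2 can be checked with a short sketch. It assumes the common convention that a factor's percent of variance is its eigenvalue divided by the number of variables in the analysis. The value of 75 variables is an assumption inferred from the table itself (4.459 / 0.05945 is roughly 75), not a figure stated in this section, and the function name is hypothetical.

```python
from itertools import accumulate


def variance_table(eigenvalues, n_variables):
    """Return (eigenvalue, percent of variance, cumulative percent) rows."""
    percent = [100 * ev / n_variables for ev in eigenvalues]
    return list(zip(eigenvalues, percent, accumulate(percent)))


TABLE_2_EIGENVALUES = [4.459, 3.691, 2.697, 2.561, 1.789]  # from Table 2
N_VARIABLES = 75  # assumption: inferred from the table, not stated in the text

rows = variance_table(TABLE_2_EIGENVALUES, N_VARIABLES)
for factor, (ev, pct, cum) in enumerate(rows, start=1):
    print(f"{factor}  {ev:.3f}  {pct:.3f}  {cum:.3f}")
```

Under that assumption the computed percentages agree with the printed columns to within about 0.001, and the final cumulative value matches the roughly 20.2% of shared variance attributed to the five-factor solution.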


Table 3: Summary of factors (i.e. dimensions) and significant clusters

Factor 1           Factor 2           Factor 3           Factor 4           Factor 5

Positive cluster (loadings)
VRB_IMP (0.56)     ADJ_DERV (0.71)    VRB_3RD (0.63)     PREPS (0.53)       VRB_INVA (0.36)
VRB_PRGA (0.55)    ADJ_PLR (0.59)     QUE (0.55)         VRB_INVA (0.52)    VRB_INF (0.35)
VRB_PRET (0.53)    ADJ_SNG (0.52)     SAB_CON (0.43)     VRB_INF (0.49)
VRB_SBJ (0.42)     ADJ_TYP1 (0.51)    VRB_CONO (0.42)    ADJ_PLR (0.31)
PRON_3RD (0.35)    ADJ_MSC (0.47)     REL_NO_EN (0.38)
VRB_CONO (0.35)    TYP_TOK (0.44)     VRB_PROB (0.30)
VRB_PSBJ (0.34)    ADJ_TYP2 (0.43)
VRB_COND (0.33)    ADJ_POST (0.41)
CLT_PRE (0.33)     ADJ_FEM (0.35)
SAB_CON (0.30)     LONG (0.30)

Negative cluster (loadings)
ART_NOUN (-0.65)   NOUN_SNG (-0.40)   VRB_IMP (-0.45)    ADJ_SNG (-0.59)    DEFART (-0.53)
NOUN_FEM (-0.56)   DEN_TOK (-0.38)    VRB_PRGA (-0.42)   ADJ_MSC (-0.44)    ADJ_PLR (-0.40)
NOUN_DER (-0.53)                      PREPS (-0.30)
NOUN_PLR (-0.51)
NOUN_MSC (-0.36)
ADJ_TYP1 (-0.36)
ADJ_POST (-0.35)
DEFART (-0.31)
NOUN_SNG (-0.31)
ADJ_FEM (-0.30)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative Era un dıa lleno de sol y el aire lleno de aromas diferentes a floressilvestres Al salir de mi dormitorio todo parecıa normal No habıa ningunanube que se pudiera ver en el cielo entonces despues de hacer esta observacion con-tinue con mi plan de salir de mi dormitorio Comence a correr por los caminitosdesignados y como el dıa todavıa parecıa muy bonito decidı meterme adentro delbosque (It was a sunny day and the air filled with aromas of different wild-flowers Upon leaving my room everything seemed normal There was nocloud that could be seen in the sky then after making this observation Icontinued with my plan to leave my bedroom I started to run around thedesignated trails and as the day still looked very nice I decided to get into thewoods)

Figure 2 describes the text types that have the highest average summedz-scores for the features in the narrative cluster Text types with the highestaverage summed z-score are most representative of the cluster while thosetexts with the highest opposing (ie the opposing sign) are antithetical to theclusterrsquos function

Interestingly not only do narrative texts have high positive scores in dimen-sion 1 but also argumentative essays and summaries Students frequentlyapproached summaries and argumentation with narrative elements perhapsto compensate for a lack of more sophisticated abilities which has shown to bethe case for L2 writers of English in expository writing (Hinkel 2004) Forinstance one second-year learner wrote an argumentative text about stereo-types of the colonial times and used evidence from a story about a fray in thenew land

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolome thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. They averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9); the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2: Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, as the Expository essay excerpt exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].
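The comparisons between instructional levels report an F statistic with (1, 634) degrees of freedom. The sketch below shows how such a statistic is obtained in a one-way analysis of variance, on the assumption that this is the test behind the reported values; the scores are invented, and only the arithmetic is meant to carry over. Under that assumption, degrees of freedom of (1, 634) correspond to two groups and 636 scored texts.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means against the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores against their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical summed z-scores for a handful of texts per level.
second_year = [0.0, 1.0, 2.0]
third_year = [2.0, 3.0, 4.0]
f_stat, df_b, df_w = one_way_anova_f(second_year, third_year)
```

With two groups, the F statistic equals the square of the independent-samples t statistic, so the same comparison could equally be reported as a t-test.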

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay excerpt, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces, en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then, in the American mind the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology, and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features, singular nouns and the lexical density measure, does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4 and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.
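The threshold just described can be sketched as a simple filter. The sketch assumes that the criterion is applied per cluster (a reading suggested by the treatment of the small negative clusters of Factors 2 and 3), that a loading counts as salient when its absolute value is at least 0.30, and that negative-cluster loadings carry a minus sign. The function name and the Factor 5 values, read from Table 3 with assumed decimal points and signs, are illustrative only.

```python
def significant_clusters(loadings, cutoff=0.30, min_features=5):
    """Split salient loadings by sign and flag clusters that are large enough."""
    positive = {f: w for f, w in loadings.items() if w >= cutoff}
    negative = {f: w for f, w in loadings.items() if w <= -cutoff}
    return {
        "positive": (positive, len(positive) >= min_features),
        "negative": (negative, len(negative) >= min_features),
    }


# Factor 5 as read from Table 3 (decimal points and signs are assumptions).
factor_5 = {"VRB_INVA": 0.36, "VRB_INF": 0.35, "DEFART": -0.53, "ADJ_PLR": -0.40}
result = significant_clusters(factor_5)
```

Here neither cluster reaches five salient features, which matches the decision not to interpret Factor 5 as a discourse type.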

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1. Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2. Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3. A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4. Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5. Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11.
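The definitions of lexical density and long words in the last two notes can be made concrete with a short sketch. It assumes tokens that are already part-of-speech tagged; the tag names, the sample tokens and their tags, and the use of the population standard deviation are hypothetical, since the notes do not specify them.

```python
from statistics import mean, pstdev

# Hypothetical tag names for the non-functor (content) word classes.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}


def lexical_density(tagged_tokens):
    """Content words divided by all words in the text."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_cutoff(corpus_words):
    """Mean word length of the corpus plus one standard deviation."""
    lengths = [len(w) for w in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(text_words, cutoff):
    """Words whose length exceeds the corpus-derived cutoff."""
    return [w for w in text_words if len(w) > cutoff]


# A toy five-token text; accents are omitted and the tags are illustrative.
tagged = [
    ("los", "ART"),
    ("jovenes", "NOUN"),
    ("tienen", "VERB"),
    ("bastantes", "ADJ"),
    ("oportunidades", "NOUN"),
]
density = lexical_density(tagged)
words = [w for w, _ in tagged]
cutoff = long_word_cutoff(words)
flagged = long_words(words, cutoff)
```

In a real analysis the cutoff would be computed once over the whole corpus and then applied to each text, rather than derived from a single sentence as in this toy case.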

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing’, available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World’ [Database], available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity’. Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. ltyulyasencionnauedugt

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.



Table 3: Summary of factors (i.e. dimensions) and significant clusters

                   Factor 1          Factor 2          Factor 3          Factor 4          Factor 5
Positive cluster   VRB_IMP (0.56)    ADJ_DERV (0.71)   VRB_3RD (0.63)    PREPS (0.53)      VRB_INVA (0.36)
(loadings)         VRB_PRGA (0.55)   ADJ_PLR (0.59)    QUE (0.55)        VRB_INVA (0.52)   VRB_INF (0.35)
                   VRB_PRET (0.53)   ADJ_SNG (0.52)    SAB_CON (0.43)    VRB_INF (0.49)
                   VRB_SBJ (0.42)    ADJ_TYP1 (0.51)   VRB_CONO (0.42)   ADJ_PLR (0.31)
                   PRON_3RD (0.35)   ADJ_MSC (0.47)    REL_NO_EN (0.38)
                   VRB_CONO (0.35)   TYP_TOK (0.44)    VRB_PROB (0.30)
                   VRB_PSBJ (0.34)   ADJ_TYP2 (0.43)
                   VRB_COND (0.33)   ADJ_POST (0.41)
                   CLT_PRE (0.33)    ADJ_FEM (0.35)
                   SAB_CON (0.30)    LONG (0.30)

Negative cluster   ART_NOUN (-0.65)  NOUN_SNG (-0.40)  VRB_IMP (-0.45)   ADJ_SNG (-0.59)   DEFART (-0.53)
(loadings)         NOUN_FEM (-0.56)  DEN_TOK (-0.38)   VRB_PRGA (-0.42)  ADJ_MSC (-0.44)   ADJ_PLR (-0.40)
                   NOUN_DER (-0.53)                    PREPS (-0.30)
                   NOUN_PLR (-0.51)
                   NOUN_MSC (-0.36)
                   ADJ_TYP1 (-0.36)
                   ADJ_POST (-0.35)
                   DEFART (-0.31)
                   NOUN_SNG (-0.31)
                   ADJ_FEM (-0.30)


Factor 1: Narrative vs. expository prose

Factor 1 contained two significant clusters. The analysis indicates that second- and third-year learners’ written production tends to be either narrative or expository.

The lexical and grammatical features in the positive set include eight verbal and two pronominal features, which constitute a narrative discourse. Two of the verbal features with the largest loadings (i.e. imperfect and preterit) are grammatical features used to present events and background descriptions in the past (Biber et al. 2006). Also, third-person pronouns and clitics are used to refer to presupposed participants or story protagonists in the narratives (Longacre 1983).

Narrative Era un dıa lleno de sol y el aire lleno de aromas diferentes a floressilvestres Al salir de mi dormitorio todo parecıa normal No habıa ningunanube que se pudiera ver en el cielo entonces despues de hacer esta observacion con-tinue con mi plan de salir de mi dormitorio Comence a correr por los caminitosdesignados y como el dıa todavıa parecıa muy bonito decidı meterme adentro delbosque (It was a sunny day and the air filled with aromas of different wild-flowers Upon leaving my room everything seemed normal There was nocloud that could be seen in the sky then after making this observation Icontinued with my plan to leave my bedroom I started to run around thedesignated trails and as the day still looked very nice I decided to get into thewoods)

Figure 2 describes the text types that have the highest average summedz-scores for the features in the narrative cluster Text types with the highestaverage summed z-score are most representative of the cluster while thosetexts with the highest opposing (ie the opposing sign) are antithetical to theclusterrsquos function

Interestingly not only do narrative texts have high positive scores in dimen-sion 1 but also argumentative essays and summaries Students frequentlyapproached summaries and argumentation with narrative elements perhapsto compensate for a lack of more sophisticated abilities which has shown to bethe case for L2 writers of English in expository writing (Hinkel 2004) Forinstance one second-year learner wrote an argumentative text about stereo-types of the colonial times and used evidence from a story about a fray in thenew land

Argumentative essay: El cuento de fraile Bartolomé y los indígenas maya es una estereotipo de los tiempos de la conquista de América Central. También las relaciones malas entre las culturas europeas y indígenas. El cura fue a guatemala para convertir a los indígenas. El misionero Bartolomé pensaba que su cultura de españa sería más avanzado que los maya. En última instancia, este pensamiento fue su último. (The story of Brother Bartholomew and the indigenous Maya is a stereotype of the time of the conquest of Central America. Also the bad relations between indigenous and European cultures. The priest went to Guatemala to convert the Indians. The missionary Bartolomé thought that his culture from Spain would be more advanced than the Mayan. Ultimately, this was his last thought.)

In these narratives, the writers are present and involved. The cluster has grammatical variables including verbal features such as subjunctive, past subjunctive, conditional, and progressive aspect, the first two of which represent ‘irrealis’ modality (Biber et al. 2006), where learners describe feelings and attitudes towards possible eventualities. The presence of a lexical feature such as knowledge verbs indicates the writer’s perceptions (Weber and Bentivoglio 1991). These features personalize the narratives. Finally, it is also noteworthy that the presence of subjunctive forms in narratives indicates some degree of syntactic complexity in the form of subordination (Parodi 2007).

The third-year learners trended towards using more narrative behaviors. Although the third-year learners averaged a higher summed z-score on this cluster (M = 4.1, SD = 5.9) than the second-year learners (M = 3.0, SD = 5.9), the difference is notable but did not reach significance [F(1, 634) = 3.7, p = 0.06].

Figure 2. Concentration of narrative features by text type and averaged summed z-score

Dimension 1’s negative cluster included exclusively features associated with noun phrases. Discourse concentrated with nominal features such as nouns, articles, and adjectives is largely expository, where information is condensed into a few words (Biber et al. 2006). Such semantically dense discourse is often achieved through the use of multiple derivational morphemes (Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3. Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < 0.00].
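The group comparisons reported for each cluster rest on two steps: summing each text's z-scores over the features in a cluster, and comparing the two learner levels with an F test. The following is a minimal sketch of those steps, not the authors' actual procedure. It assumes per-text normalized feature frequencies, standardization with the population standard deviation, and a one-way ANOVA; the function names and data layout are hypothetical. Under these assumptions, the reported degrees of freedom (1, 634) would correspond to two groups and 636 scored texts.

```python
from statistics import mean, pstdev


def summed_z_scores(texts, cluster_features):
    """Sum each text's z-scores over the features of one cluster.

    texts: list of dicts mapping feature name -> normalized frequency
    (hypothetical layout). Features with zero variance contribute 0.
    """
    stats = {}
    for feature in cluster_features:
        values = [t[feature] for t in texts]
        stats[feature] = (mean(values), pstdev(values))

    scores = []
    for t in texts:
        total = 0.0
        for feature in cluster_features:
            mu, sd = stats[feature]
            total += (t[feature] - mu) / sd if sd > 0 else 0.0
        scores.append(total)
    return scores


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of scores, e.g. [second_year, third_year].
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within
```

With two groups, `one_way_f` is equivalent to a squared independent-samples t statistic, so only the relative size of the mean difference and the within-group spread matters, not the scale of the summed scores.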

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth, it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then in the American mind the definition of event (sic: success) is to be rich, be beautiful and young, and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < 0.00].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).
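The type/token ratio mentioned above can be illustrated with a short sketch. The article does not say how tokens were delimited or whether types were counted over word forms or lemmas, so the tokenizer below (lowercased runs of letters, counted as surface forms) is an assumption made only for illustration, and the function name is hypothetical.

```python
import re


def type_token_ratio(text):
    """Distinct word forms divided by total word tokens.

    Assumption: a token is any run of Unicode letters, compared
    case-insensitively; digits and punctuation are ignored.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

Because the ratio falls as texts get longer, comparisons across texts of very different lengths would need some form of length normalization, which this sketch leaves out.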

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4. Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities and others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features, the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

Figure 5. Concentration of expository prose with stance features by text type and averaged summed z-score

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < 0.00].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.
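The five-feature threshold can be made concrete with a small sketch. It assumes that a feature counts towards a cluster when its loading is at least 0.30 in absolute value, and that the positive and negative poles of a factor are checked separately; both readings are assumptions about a criterion the text only summarizes, and the function names and example loadings are hypothetical.

```python
def split_clusters(loadings, cutoff=0.30):
    """Split one factor's loadings into positive and negative clusters.

    loadings: dict mapping feature name -> factor loading.
    Assumption: a feature is salient when |loading| >= cutoff.
    """
    positive = {f: w for f, w in loadings.items() if w >= cutoff}
    negative = {f: w for f, w in loadings.items() if w <= -cutoff}
    return positive, negative


def significant_clusters(loadings, cutoff=0.30, min_features=5):
    """Report which poles of a factor reach the minimum feature count."""
    positive, negative = split_clusters(loadings, cutoff)
    return {
        "positive": len(positive) >= min_features,
        "negative": len(negative) >= min_features,
    }


# Hypothetical loadings, for illustration only (not values from the study).
example = {
    "imperfect": 0.62, "preterit": 0.58, "clitics": 0.41,
    "third_person_pronouns": 0.39, "subjunctive": 0.33,
    "nouns": -0.45, "articles": -0.31, "infinitives": 0.12,
}
```

On the hypothetical `example` loadings, the positive pole has five salient features and the negative pole only two, so only the positive cluster would be interpreted.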

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity since, in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is dissociated from important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11
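The definitions in Notes 4 and 5 translate directly into short functions. This is a sketch under stated assumptions rather than the authors' tooling: the part-of-speech tag names are hypothetical placeholders for whatever tagger output is available, word length is counted in characters, and the population standard deviation is used because the note does not say which variant was applied.

```python
from statistics import mean, pstdev

# Hypothetical tag names standing in for the non-functor categories of Note 4.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}


def lexical_density(tagged_tokens):
    """Non-functor words divided by total words in a text.

    tagged_tokens: list of (word, tag) pairs for one text.
    """
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_threshold(corpus_words):
    """Mean word length of the corpus plus one standard deviation.

    corpus_words must be a non-empty list of word strings.
    """
    lengths = [len(w) for w in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(text_words, threshold):
    """Words in a text whose length is greater than the corpus threshold."""
    return [w for w in text_words if len(w) > threshold]
```

The threshold is computed once over the whole corpus and then applied to each text, so that "long" means the same thing for every learner sample.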

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World [Database],’ available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity,’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.



Argumentative essay El cuento de fraile Bartolome y los indıgenas maya es unaestereotipo de los tiempos de la conquista de America Central Tambien las relacionesmalas entre las culturas europeas y indıgenas El cura fue a guatemala para convertir alos indıgenas El misionero Bartolome pensaba que su cultura de espana serıa masavanzado que los maya En ultima instancia este pensamiento fue su ultimo (Thestory of Brother Bartholomew and the indigenous Maya is a stereotype of thetime of the conquest of Central America Also the bad relations between in-digenous and European cultures The priest went to Guatemala to convert the

14 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

Indians The missionary Bartolome thought that his culture from Spainwould be more advanced than the Mayan Ultimately this was his lastthought)In these narratives the writers are present and involved The cluster has

grammatical variables including verbal features such as subjunctive past sub-junctive conditional and progressive aspect the first two of which representlsquoirrealisrsquo modality (Biber et al 2006) where learners describe feelings andattitudes towards possible eventualities The presence of a lexical featuresuch as knowledge verbs indicates the writerrsquos perceptions (Weber andBentivoglio 1991) These features personalize the narratives Finally it isalso noteworthy that the presence of subjunctive forms in narratives indi-cates some degree of syntactic complexity in the form of subordination(Parodi 2007)The third-year learners trended towards using more narrative behaviors

whereas the third-year learners averaged a higher summed z-score on thiscluster (M=41 SD=59) than the second-year learners (M=30 SD=59)the difference is notable but did not approach significance [F(1 634) = 37p=006]Dimension 1rsquos negative cluster included exclusively features associated

with noun phrases Discourse concentrated with nominal features such asnouns articles and adjectives is largely expository where information iscondensed into a few words (Biber et al 2006) Such semantically dense dis-course is often achieved through the use of multiple derivational morphemes

Figure 2 Concentration of narrative features by text type and averagedsummed z-score

Y ASENCION-DELANEY AND J COLLENTINE 15

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

(Biber 1988 Collentine 2004) Interestingly learners in their second and thirdyears already possess their own strategies for producing lsquoliteratersquo discoursesome of which overlap with native-speaker features (cf Biber et al 2006)Figure 3 reports the average summed z-score of the expository features bytext type confirming our interpretation that the negative cluster of dimension1 represents expository discourse

The expository cluster also occurs in mini-essay questions and descriptionswhich the Expository essay section exemplifies

Expository essay Los Estados Unidos es un paıs bastante rico y losjovenes tienen bastantes oportunidades en la vida No obstante lamitad de los jovenes del Estados Unidos son malsanos y mas y mastienen los problemas con hiperactividad y aprendizaje Es obvio quemuchos jovenes estan faltando los elementos necesarios del contenta-miento y una vida saludable (The United States is a rather rich countryand young people have many opportunities in life However half of the youthof the United States are unhealthy and more and more have problems with

Figure 3 Concentration of expository features by text type and averagedsummed z-scores

16 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

hyperactivity and learning It is clear that many young people are lacking thenecessary elements of a happy and healthy life)The third-year learners produced more nominal discourse features The

third-year learners averaged a higher summed z-score on this cluster(M=24 SD=47) than the second-year learners (M=05 SD=52) withthe difference being statistically significant [F(1 634) = 163 p 000]

Factor 2: Descriptive expository prose

The positive cluster of dimension 2 (the only significant one) represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces, en la mente americano, la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth, it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then, in the American mind, the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < .000].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).
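The type/token ratio mentioned above can be made concrete with a short computation. The Python sketch below assumes a simple definition (distinct word forms divided by total word forms, after lowercasing). The source does not specify how tokens were normalized, so that step and the function name `type_token_ratio` are assumptions.

```python
def type_token_ratio(tokens):
    """Return distinct word forms divided by total word forms.

    Assumption: tokens are compared after lowercasing, with no lemmatization.
    """
    normalized = [token.lower() for token in tokens]
    return len(set(normalized)) / len(normalized)


# A fragment of the learner essay quoted above, tokenized on whitespace.
fragment = "ser rico ser bonito y joven".split()
print(type_token_ratio(fragment))  # 5 types over 6 tokens
```

A higher ratio means fewer repeated word forms, which is why it is read here as a sign that varied adjectives are being used.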

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities and others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección, piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is dissociated from certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < .000].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.
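The retention criterion described above (at least five features loading above 0.30) can be expressed as a simple check. The sketch below is illustrative only: the loadings are invented, the criterion is applied to each pole of a factor separately, and loadings at or beyond the cutoff are counted as salient. All three choices, along with the names `salient_features` and `meets_threshold`, are assumptions rather than details confirmed by the source.

```python
def salient_features(loadings, cutoff=0.30):
    """Split one factor's loadings into positive and negative poles.

    Assumption: a loading is salient when its magnitude is >= cutoff.
    """
    positive = {name: w for name, w in loadings.items() if w >= cutoff}
    negative = {name: w for name, w in loadings.items() if w <= -cutoff}
    return positive, negative


def meets_threshold(cluster, min_features=5):
    """True when a pole has enough salient features to be interpreted."""
    return len(cluster) >= min_features


# Invented loadings for one hypothetical factor (not data from the study).
factor = {
    "feature_a": 0.62, "feature_b": 0.55, "feature_c": 0.48,
    "feature_d": 0.41, "feature_e": 0.33, "feature_f": 0.12,
    "feature_g": -0.35, "feature_h": -0.44,
}
positive, negative = salient_features(factor)
print(meets_threshold(positive), meets_threshold(negative))
```

In this invented case the positive pole has five salient features and would be interpreted, while the negative pole has only two and would not.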

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types.

In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2 and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis that examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners employ either words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 4.6 + SD 4.9 = 10.5–11.
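Notes 4 and 5 define two measures procedurally, so a brief sketch may help. The Python code below follows the definitions as stated: lexical density as non-functor words over total words, and long words as those exceeding the corpus mean word length plus the standard deviation. The part-of-speech tag names, the use of the population standard deviation, and the function names are assumptions made for illustration; the source does not describe its tagger or tooling here.

```python
from statistics import mean, pstdev

# Hypothetical tag names standing in for nouns, adjectives, verbs,
# and derived adverbs; a real tagger would use its own tag set.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV_DERIVED"}


def lexical_density(tagged_tokens):
    """Non-functor words divided by all words in a text."""
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_cutoff(corpus_words):
    """Mean word length of the corpus plus its standard deviation."""
    lengths = [len(word) for word in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(text_words, cutoff):
    """Words whose length is greater than the cutoff."""
    return [word for word in text_words if len(word) > cutoff]


# Hand-tagged fragment of a learner sentence (tags are illustrative).
tagged = [
    ("los", "DET"), ("jóvenes", "NOUN"), ("tienen", "VERB"),
    ("bastantes", "ADJ"), ("oportunidades", "NOUN"),
]
words = [word for word, _ in tagged]
cutoff = long_word_cutoff(words)
print(lexical_density(tagged), long_words(words, cutoff))
```

In practice the cutoff would be computed once over the whole corpus and then applied to each text, rather than over a single sentence as in this toy example.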

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.
Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.
Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.
Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.
Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.
Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.
Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.
Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.
British National Consortium. 2001. ‘BNC World’ [Database], available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.
Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.
Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.
Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.
Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.
Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.
Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar+adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.
Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.
Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.
Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.
Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.
Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.
Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.
Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.
Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.
Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.
Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.
Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.
Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.
Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.
Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.
Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.
Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.
O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.
Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.
Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.
Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.
Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.
Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.
Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy and complexity,’ Technical Report 17. University of Hawai‘i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.


Expository essay Los Estados Unidos es un paıs bastante rico y losjovenes tienen bastantes oportunidades en la vida No obstante lamitad de los jovenes del Estados Unidos son malsanos y mas y mastienen los problemas con hiperactividad y aprendizaje Es obvio quemuchos jovenes estan faltando los elementos necesarios del contenta-miento y una vida saludable (The United States is a rather rich countryand young people have many opportunities in life However half of the youthof the United States are unhealthy and more and more have problems with

Figure 3 Concentration of expository features by text type and averagedsummed z-scores

16 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

hyperactivity and learning It is clear that many young people are lacking thenecessary elements of a happy and healthy life)The third-year learners produced more nominal discourse features The

third-year learners averaged a higher summed z-score on this cluster(M=24 SD=47) than the second-year learners (M=05 SD=52) withthe difference being statistically significant [F(1 634) = 163 p 000]

Factor 2 Descriptive expository prose

The positive cluster of dimension 2mdashthe only significant onemdashrepresents adiscourse that is both expository and descriptive Thus to the extent thatthis cluster represents a sub-discourse of the expository type identified in di-mension 1 it seems that these learners do use lexico-grammatical features (egadjectives with different morphological information) reflecting descriptiveelements in expository writing but not as much as narrative elements Thiscluster also contains measures of lexical complexity long words and typetoken ratio Adjectives add informational density to learnersrsquo written proseStill post-nominal adjectives predominate in texts with this clusterAdditionally at these instructional levels a low degree of inflectional accuracy(ie agreement errors) is commonTexts involving expository prose and description have large average summed

z-scores Figure 4rsquos similarity to Figure 3 corroborates that this cluster is aspecialized expository discourse The difference becomes more evident uponconsideration of the Descriptive expository essay section which was written bya third-year learner

Descriptive expository essay El sueno americano se infiltra desde juven-tud es evidente por television escuela la cultura y ejemplos del gobiernoy polıticos Esta influencia es subconsciente pero fuerte y se ensena el amer-icano que la unica cosa que se necesita hacer es trabaja fielmente y comprarlas cosas correctas y eventualmente se recibira la vida perfecta Entoncesen la mente americano la definicion del suceso es ser rico ser bonito yjoven y tener un monton de las cosas mejores y ricas [The Americandream infiltrates from youth it is evident on television school culture andexamples of government and politicians This influence is subconscious butstrong and they teach the American that the only thing that one needs todo is to work faithfully and buy the right things and eventually he will getthe perfect life Then in the American mind the definition of event (sic suc-cess) is to be rich be beautiful and young and have a lot of the best and richthings]Here we see a variety of uses of adjectives such as nominalized adjectives

(eg el americano lsquoAmericansrsquo) descriptor chains (eg subconsciente pero fuertelsquosubconscious but strongrsquo) and the copula ser or estar in predicative sentencesFurthermore this sub-discourse type is probably more challenging to producesince the third-year learners produced more descriptive expository features

Y ASENCION-DELANEY AND J COLLENTINE 17

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

The third-year learners averaged a higher summed z-score on this cluster(M=28 SD=53) than the second-year learners (M=14 SD=50) withthe difference being statistically significant [F(1 634) = 754 p 000]

Learners at these levels exhibit some degree of morphological and syntacticsophistication when attempting to use lexico-grammatical features such asadjectives with inflections for number and gender in various morphologicaland syntactical (ie attributive and predicative roles) functions Also longwords tend to be dense in derivational morphology and so in content ahigh typetoken ratio indicates that various adjectives are employed in thisdiscourse Finally the inclusion of several adjectival features reflects a slightlymore developed syntax (Parodi 2007) since word order is a critical consider-ation with Spanish adjectives which can be pre- or post-nominal and descrip-tors are often chained together (ie X has qualities A B and C)

The negative cluster of featuresmdashsingular nouns and the lexical densitymeasuremdashdoes not constitute a text type Instead it indicates what descriptiveexpository texts lack Referential lexical items in the form of nouns are missingas well as a variety of parts of speech Thus descriptive expository texts lackoverall specificity and contain numerous and varied adjectives

Figure 4 Concentration of descriptive expository prose features by text typeand averaged summed z-score

18 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

Factor 3 Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse wherewriters are reporting both their own thoughts about the veracity of eventua-lities as well as othersrsquo perspectives This cluster includes mostly lexical featuresthat indicate writersrsquo stance (Biber et al 2006) or their commitment to thetruth of some proposition taking the form of verbs of knowledge cognitiveverbs and verbs of probability (cf Collentine 1995) The use of a grammaticalfeature such as third-person verb inflections is also said to minimize the wri-terrsquos risk of misconstruing the reference in the discourse (Castellano 2000) Thecluster also includes complex syntactic features the subordinating conjunctionque and relative pronouns without prepositions (ie que quien cual) whichhelp the reader to identify references in the discourseFigure 5rsquos detailing of the average summed z-scores by text type confirms the

expository function of this cluster

Argumentative essay El autor Mario Benedetti posiblemente quiere expli-car la imaginacion de los ninos Los ninos no piensan logicamente porque no

Figure 5 Concentration of expository prose with stance features by text typeand averaged summed z-score

Y ASENCION-DELANEY AND J COLLENTINE 19

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

saben la realidad Estaba la primera vez que Osvaldo mire la televisionCuando un nino o nina mira una persona mirando y hablando en su direc-cion piensan que estan hablando a ellos Es una que nosotros no pensarporque nos sabemos la realidad Tambien los ninos piensan que los personasen el televisor son sus amigos [The author Mario Benedetti possibly wants toexplain the imagination of children Children do not think logically becausethey do not know reality It was the first time Osvaldo watched TV When aboy or a girl watches a person looking at or talking in their direction theythink they are talking to them It is one that we do not to think (sic thinkabout) because we know the reality]

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M=39 SD=45) than the third-year learners (M=22 SD=27), with the difference being statistically significant [F(1 634) = 176 p 000].
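The group comparisons reported for each cluster rest on two computations: a per-text cluster score obtained by summing standardized (z-scored) feature rates, and a one-way F ratio comparing the two instructional levels. The Python sketch below illustrates both under stated assumptions. The function names, the use of the population standard deviation, and the toy numbers are illustrative only and are not taken from the study.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one feature's rates across all texts (population SD assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def cluster_scores(rates, features):
    """Sum the z-scores of the cluster's features for each text.

    rates: dict mapping a feature name to a list of per-text rates.
    features: names of the features that belong to the cluster.
    """
    z = {name: z_scores(rates[name]) for name in features}
    n_texts = len(next(iter(rates.values())))
    return [sum(z[name][i] for name in features) for i in range(n_texts)]


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


if __name__ == "__main__":
    # Hypothetical feature rates for six texts (invented values).
    rates = {
        "verbs_of_knowledge": [2.0, 3.5, 1.0, 4.0, 0.5, 1.5],
        "third_person_verbs": [10.0, 12.0, 8.0, 14.0, 7.0, 9.0],
    }
    scores = cluster_scores(rates, ["verbs_of_knowledge", "third_person_verbs"])
    second_year, third_year = scores[:3], scores[3:]
    print(one_way_f([second_year, third_year]))
```

With two groups, the between-groups degrees of freedom equal 1, which matches the first value in the reported F statistics.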

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.
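The interpretability criterion mentioned here (at least five features loading above the 0.30 level) can be sketched as a simple filter over a factor's loadings. The Python sketch below makes two assumptions that the text does not spell out: that the cutoff applies to the absolute value of a loading, and that positive and negative features are counted together. The loadings in the example are invented for illustration.

```python
def salient_features(loadings, cutoff=0.30):
    """Split a factor's features into positive and negative salient sets."""
    positive = sorted(f for f, w in loadings.items() if w >= cutoff)
    negative = sorted(f for f, w in loadings.items() if w <= -cutoff)
    return positive, negative


def is_interpretable(loadings, cutoff=0.30, min_features=5):
    """True if the factor has enough salient features to be interpreted."""
    positive, negative = salient_features(loadings, cutoff)
    return len(positive) + len(negative) >= min_features


if __name__ == "__main__":
    # Hypothetical loadings for one factor (not the study's values).
    factor = {
        "singular_adjectives": 0.41,
        "masculine_adjectives": 0.36,
        "infinitive_verbs": -0.38,
        "prepositions": 0.12,
    }
    print(salient_features(factor))
    print(is_interpretable(factor))
```

Under these assumptions, a factor with only three salient features, such as the hypothetical one above, would be set aside rather than interpreted as a discourse type.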

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety rather than a small subset of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11.
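The definitions of lexical density and long words in the notes above translate directly into short computations. The Python sketch below is one possible reading under stated assumptions: the tag labels (`NOUN`, `ADJ`, `VERB`, `DERIVED_ADV`) are hypothetical stand-ins for whatever tagset was used, word length is counted in characters, and the population standard deviation is used because the notes do not say which variant was applied.

```python
from statistics import mean, pstdev

# Hypothetical tag labels for the non-functor (content) word classes.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}


def lexical_density(tagged_tokens):
    """Share of non-functor words in a text given (word, tag) pairs."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_cutoff(corpus_words):
    """Corpus-wide cutoff: mean word length plus one standard deviation."""
    lengths = [len(w) for w in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(text_words, cutoff):
    """Words in a text whose length exceeds the corpus cutoff."""
    return [w for w in text_words if len(w) > cutoff]


if __name__ == "__main__":
    tagged = [("los", "DET"), ("jovenes", "NOUN"),
              ("tienen", "VERB"), ("oportunidades", "NOUN")]
    print(lexical_density(tagged))
    corpus = ["a", "de", "casa", "oportunidades"]
    print(long_words(corpus, long_word_cutoff(corpus)))
```

The cutoff is computed once over the whole corpus and then applied to each text, so a text's long-word count depends on the corpus it belongs to.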

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World [Database],’ available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications, and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘Ready-to-speak’: Prefabricated sequences in spoken French as a second language and as a native language [‘Prêt-à-parler’: séquences préfabriquées en français parlé L2 et L1], Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity,’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yulyasencionnauedu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer-assisted language learning.


(Biber 1988; Collentine 2004). Interestingly, learners in their second and third years already possess their own strategies for producing ‘literate’ discourse, some of which overlap with native-speaker features (cf. Biber et al. 2006). Figure 3 reports the average summed z-score of the expository features by text type, confirming our interpretation that the negative cluster of dimension 1 represents expository discourse.

The expository cluster also occurs in mini-essay questions and descriptions, which the Expository essay section exemplifies.

Expository essay: Los Estados Unidos es un país bastante rico y los jóvenes tienen bastantes oportunidades en la vida. No obstante, la mitad de los jóvenes del Estados Unidos son malsanos y más y más tienen los problemas con hiperactividad y aprendizaje. Es obvio que muchos jóvenes están faltando los elementos necesarios del contentamiento y una vida saludable. (The United States is a rather rich country and young people have many opportunities in life. However, half of the youth of the United States are unhealthy, and more and more have problems with hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

Figure 3: Concentration of expository features by text type and averaged summed z-scores

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M=24 SD=47) than the second-year learners (M=05 SD=52), with the difference being statistically significant [F(1 634) = 163 p 000].

Factor 2: Descriptive expository prose

The positive cluster of dimension 2, the only significant one, represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces en la mente americano, la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth; it is evident on television, school, culture, and examples of government and politicians. This influence is subconscious but strong, and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things, and eventually he will get the perfect life. Then, in the American mind, the definition of event (sic: success) is to be rich, be beautiful and young, and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features. The third-year learners averaged a higher summed z-score on this cluster (M=28 SD=53) than the second-year learners (M=14 SD=50), with the difference being statistically significant [F(1 634) = 754 p 000].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).
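The type/token ratio mentioned in this paragraph is the number of distinct word forms divided by the total number of words. A minimal Python sketch follows. It assumes simple lowercased word forms as types; the study may have counted types differently (for example, by lemma), so the function is illustrative only.

```python
def type_token_ratio(tokens):
    """Distinct word forms divided by total words (case-insensitive)."""
    if not tokens:
        return 0.0
    normalized = [t.lower() for t in tokens]
    return len(set(normalized)) / len(normalized)


if __name__ == "__main__":
    # Tokens adapted from the learner essay quoted above.
    sample = ["ser", "rico", "ser", "bonito", "y", "joven"]
    print(type_token_ratio(sample))
```

Because the ratio tends to fall as texts get longer, comparisons are most meaningful between texts of similar length.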

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score


Bachman L 1990 Fundamental Considerations

in Language Testing Oxford University Press

Belz J 2004 lsquoLearner corpus analysis and the

development of foreign language proficiencyrsquo

System 324 577ndash97

Biber D 1988 Variation across Speech and

Writing Cambridge University Press

Biber D 2001 lsquoOn the complexity of discourse

complexity A multidimensional analysisrsquo

in S Conrad and D Biber (eds) Variation in

English Multidimensional Studies Longman

pp 215ndash40

Biber D and S Conrad 2001 lsquoIntroduction

Multidimensional analysis and the study of

register variationrsquo in S Conrad and D Biber

(eds) Variation in English Multidimensional

Studies Longman pp 3ndash13

Biber D M Davies J Jones and N Tracy-

Ventura 2006 lsquoSpoken and written register

variation in Spanish A multi-dimensional ana-

lysisrsquo Corpora 1 1ndash37

Boulton A 2009 lsquoTesting the limits of

data-driven learning Language proficiency

and trainingrsquo ReCALL 211 37ndash54

British National Consortium 2001

lsquoBNC World [Database] available at http

wwwnatcorpoxacuk Accessed 09 July

2009

Canale M and M Swain 1980 lsquoTheoretical

bases of communicative approaches to second

language teaching and testingrsquo Applied

Linguistics 11 1ndash47

Castellano A 2000 lsquoAmbiguedad y variacion

del pronombre personal sujetorsquo in M Munoz

G Fernandez A Rodrıguez and V Benıtez

(eds) IV Congreso de Linguıstica General

Universidad de Cadiz pp 521ndash31

Cheng C H-C Lu and P Giannakouros

2008 lsquoThe uses of Spanish copulas by

Chinese-speaking learners in a free writing

taskrsquo Bilingualism Language and Cognition 11

3 301ndash17

Collentine J 1995 lsquoThe development of com-

plex syntax and mood-selection abilities by

intermediate-level learners of Spanishrsquo

Hispania 781 123ndash36

Collentine J 2004 lsquoThe effects of learning con-

texts on morphosyntactic and lexical develop-

mentrsquo Studies in Second Language Acquisition 26

2 227ndash48

Collentine J 2009 lsquoStudy abroad research

Findings implications and future directionsrsquo

in M Long and C Doughty (eds) The

Handbook of Language Teaching Wiley-

Blackwell Press pp 218ndash34

Collentine J and Y Asencion-Delaney 2010

lsquoA corpus-based analysis of the discourse func-

tions of SerEstar+adjective in three levels of

Spanish FL learnersrsquo Language Learning 602

409ndash45

Ellis N C 2002 lsquoReflections on frequency ef-

fects in language processingrsquo Studies in Second

Language Acquisition 242 297ndash339

Ellis N C 2006 lsquoLanguage acquisition as ra-

tional contingency learningrsquo Applied Linguistics

271 1ndash24

Ellis N C R Simpson-Vlach and C Maynard

2008 lsquoFormulaic language in native and second

language speakers Psycholinguistics corpus

Y ASENCION-DELANEY AND J COLLENTINE 23

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

linguistics and TESOLrsquo TESOL Quarterly 423

375ndash96

Forsberg F 2005 lsquoReady-to-speakrsquo

Prefabricated sequences in spoken French as

a second language and as a native language

[lsquoPret-a-parlerrsquo sequences prefabriquees en francais

parle L2 et L1]rsquo Cahiers De lrsquoInstitut De

Linguistique De Louvain 31 183ndash95

Geyer N 2007 lsquoSelf-qualification in L2

Japanese An interface of pragmatic grammat-

ical and discourse competencesrsquo Language

Learning 573 337ndash67

Givon T 1985 lsquoFunction structure and lan-

guage acquisitionrsquo in D Slobin (ed) The Cross

Linguistic Study of Language Acquisition

Lawrence Erlbaum pp 1008ndash25

Granger S J Hung and S Petch-Tyson

2002 Computer Learner Corpora Second

Language Acquisition and Foreign Language

Teaching John Benjamins

Grant L and A Ginther 2000 lsquoUsing

computer-tagged linguistic features to describe

L2 writing differencesrsquo Journal of Second

Language Writing 92 123ndash45

Hinkel E 2004 lsquoTense aspect and the passive

voice in L1 and L2 academic textsrsquo Language

Teaching Research 81 5ndash29

Horn L and G Ward 2006 The Handbook of

Pragmatics Wiley-Blackwell

Howard M 2002 lsquoPrototypical and

non-prototypical marking in the advanced

learnerrsquos aspectuo-temporal systemrsquo EuroSLA

Yearbook 2 87ndash114

Howard M 2006 lsquoVariation in advanced

French interlanguage A comparison of three

(socio)linguistic variablesrsquo Canadian Modern

Language Review 623 379ndash400

Kasper G 2001 lsquoFour perspectives on L2 prag-

matic developmentrsquo Applied Linguistics 224

502ndash30

Klein W and C Perdue 1997 lsquoThe basic var-

iety (Or Couldnrsquot natural languages be much

simpler)rsquo Second Language Research 134

301ndash47

Longacre R 1983 The Grammar of Discourse

Plenum

Marsden E and A David 2008 lsquoVocabulary

use during conversation a cross-sectional

study of development from year 9 to year 13

among learners of Spanish and Frenchrsquo

Language Learning Journal 362 181ndash98

Myles F 2005 lsquoInterlanguage corpora and

second language acquisition researchrsquo Second

Language Research 214 373ndash91

Myles F and R Mitchell 2005 lsquoUsing infor-

mation technology to support empirical SLA

researchrsquo Journal of Applied Linguistics 12

169ndash96

OrsquoKeeffe A M McCarthy and R Carter

2007 From Corpus to Classroom Cambridge

University Press

Ortega L 2000 Understanding Syntactic

Complexity The Measurement of Change in the

Syntax of Instructed L2 Spanish Learners

Unpublished dissertation University of Hawaii

at Manoa

Parodi G 2007 lsquoVariation across registers in

Spanish Exploring the El Grial PUCV Corpusrsquo

in G Parodi (ed) Working with Spanish

Corpora Continuum pp 11ndash53

Skiba R and N Dittmar 1992 lsquoPragmatic se-

mantic and syntactic constraints and gram-

maticalization A longitudinal perspectiversquo

Studies in Second Language Acquisition 143

323ndash49

Upton T A and U Connor 2001 lsquoUsing com-

puterized corpus analysis to investigate the

textlinguistic discourse moves of a genrersquo

English for Specific Purposes 204 313ndash29

Weber E and P Bentivoglio 1991 lsquoVerbs of

cognition in spoken Spanish A discourse pro-

filersquo in S Fleischman and L Waugh (eds)

Discourse Pragmatics and the Verb Evidence from

Romance Routledge pp 194ndash213

Wolfe-Quintero K S Inagaki and S Kim

1998 lsquoSecond language development in

writing measures of fluency accuracy and

complexityrsquo Technical Report 17

University of Hawairsquoi Second Language

Teaching and Curriculum Center

24 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

NOTES ON CONTRIBUTORS

Yuly Asencion-Delaney is an Associate Professor of Spanish at Northern ArizonaUniversity Her research interests include corpus linguistics Spanish second languageacquisition second language writing and second language assessment Address for cor-respondence Northern Arizona University Modern Languages Box 6004 Flagstaff AZ86011 USA ltyulyasencionnauedugt

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Departmentat Northern Arizona University His research interests include the study of L2 morpho-syntax the acquisition of mood corpus linguistics study abroad and computer assistedlanguage learning

Y ASENCION-DELANEY AND J COLLENTINE 25

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from


hyperactivity and learning. It is clear that many young people are lacking the necessary elements of a happy and healthy life.)

The third-year learners produced more nominal discourse features. The third-year learners averaged a higher summed z-score on this cluster (M = 2.4, SD = 4.7) than the second-year learners (M = 0.5, SD = 5.2), with the difference being statistically significant [F(1, 634) = 16.3, p < .000].
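For readers who wish to trace how a comparison of this kind can be computed, the following Python sketch shows one plausible way to derive a summed z-score per text and a one-way F statistic for two groups. It is an illustration only: the function names, the use of the population standard deviation, and the toy frequencies are assumptions introduced here, not the authors' procedure or data.

```python
from statistics import mean, pstdev


def summed_z_scores(texts, cluster_features):
    """Sum, for each text, the z-scores of the features in one cluster.

    texts: list of dicts mapping feature name -> normalized frequency.
    cluster_features: hypothetical names of the features in the cluster.
    """
    # Corpus-wide mean and (population) standard deviation per feature.
    stats = {}
    for feature in cluster_features:
        values = [text[feature] for text in texts]
        stats[feature] = (mean(values), pstdev(values))

    scores = []
    for text in texts:
        total = 0.0
        for feature in cluster_features:
            mu, sigma = stats[feature]
            if sigma > 0:  # skip features that never vary in the corpus
                total += (text[feature] - mu) / sigma
        scores.append(total)
    return scores


def one_way_f(group_a, group_b):
    """Return (F, df_between, df_within) for a two-group one-way ANOVA.

    Assumes at least some within-group variation (non-zero denominator).
    """
    groups = [group_a, group_b]
    all_values = group_a + group_b
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy data (invented): normalized frequencies of two nominal features.
toy_texts = [{"nouns": 1.0, "articles": 2.0}, {"nouns": 3.0, "articles": 2.0}]
print(summed_z_scores(toy_texts, ["nouns", "articles"]))  # [-1.0, 1.0]
print(one_way_f([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Under this formulation the within-groups degrees of freedom equal the number of texts minus the number of groups, so the reported F(1, 634) would correspond to 636 texts divided into two learner levels.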

Factor 2: Descriptive expository prose

The positive cluster of dimension 2 (the only significant one) represents a discourse that is both expository and descriptive. Thus, to the extent that this cluster represents a sub-discourse of the expository type identified in dimension 1, it seems that these learners do use lexico-grammatical features (e.g. adjectives with different morphological information) reflecting descriptive elements in expository writing, but not as much as narrative elements. This cluster also contains measures of lexical complexity: long words and type/token ratio. Adjectives add informational density to learners’ written prose. Still, post-nominal adjectives predominate in texts with this cluster. Additionally, at these instructional levels, a low degree of inflectional accuracy (i.e. agreement errors) is common.

Texts involving expository prose and description have large average summed z-scores. Figure 4’s similarity to Figure 3 corroborates that this cluster is a specialized expository discourse. The difference becomes more evident upon consideration of the Descriptive expository essay section, which was written by a third-year learner.

Descriptive expository essay: El sueño americano se infiltra desde juventud, es evidente por televisión, escuela, la cultura y ejemplos del gobierno y políticos. Esta influencia es subconsciente pero fuerte y se enseña el americano que la única cosa que se necesita hacer es trabaja fielmente y comprar las cosas correctas y eventualmente se recibirá la vida perfecta. Entonces, en la mente americano la definición del suceso es ser rico, ser bonito y joven y tener un montón de las cosas mejores y ricas. [The American dream infiltrates from youth, it is evident on television, school, culture and examples of government and politicians. This influence is subconscious but strong and they teach the American that the only thing that one needs to do is to work faithfully and buy the right things and eventually he will get the perfect life. Then, in the American mind the definition of event (sic: success) is to be rich, be beautiful and young and have a lot of the best and rich things.]

Here we see a variety of uses of adjectives, such as nominalized adjectives (e.g. el americano ‘Americans’), descriptor chains (e.g. subconsciente pero fuerte ‘subconscious but strong’), and the copula ser or estar in predicative sentences. Furthermore, this sub-discourse type is probably more challenging to produce, since the third-year learners produced more descriptive expository features.


The third-year learners averaged a higher summed z-score on this cluster (M = 2.8, SD = 5.3) than the second-year learners (M = 1.4, SD = 5.0), with the difference being statistically significant [F(1, 634) = 7.54, p < .000].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features, such as adjectives with inflections for number and gender, in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4. Concentration of descriptive expository prose features by text type and averaged summed z-score


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5. Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginación de los niños. Los niños no piensan lógicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la televisión. Cuando un niño o niña mira una persona mirando y hablando en su dirección piensan que están hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. También los niños piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M = 3.9, SD = 4.5) than the third-year learners (M = 2.2, SD = 2.7), with the difference being statistically significant [F(1, 634) = 17.6, p < .000].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.
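The screening criterion applied to these factors can be sketched in Python. The example assumes that a feature counts toward a factor when the absolute value of its loading reaches the cutoff, and that positive and negative loadings are counted together; both points are assumptions, since the text does not specify them. The feature names and loadings are invented for illustration and are not the study's results.

```python
def salient_features(loadings, cutoff=0.30):
    """Split features into positive and negative clusters by loading size."""
    positive = sorted(name for name, w in loadings.items() if w >= cutoff)
    negative = sorted(name for name, w in loadings.items() if w <= -cutoff)
    return positive, negative


def meets_threshold(loadings, cutoff=0.30, min_features=5):
    """True if the factor has at least min_features salient loadings."""
    positive, negative = salient_features(loadings, cutoff)
    return len(positive) + len(negative) >= min_features


# Invented loadings for a hypothetical factor.
example_factor = {
    "singular adjectives": 0.41,
    "masculine adjectives": 0.36,
    "infinitive verbs": -0.38,
    "definite articles": 0.12,
}

positive, negative = salient_features(example_factor)
print(positive)  # ['masculine adjectives', 'singular adjectives']
print(negative)  # ['infinitive verbs']
print(meets_threshold(example_factor))  # False: only three salient features
```

In this invented case the nominal features load positively and the infinitive verbs negatively, which is the kind of complementary distribution the paragraph describes, yet the factor would still be set aside for having too few salient features.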

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse, where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11
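Notes 4 and 5 can be expressed as short computations. The Python sketch below is illustrative only: the part-of-speech labels, the function names, and the use of the population standard deviation are assumptions introduced here, and the four sample tokens are far too few to stand in for a corpus; they serve only to show the arithmetic.

```python
from statistics import mean, pstdev

# Hypothetical tag labels for the non-functor categories named in note 4.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}


def lexical_density(tagged_tokens):
    """Share of non-functor words among all words in a text."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_cutoff(corpus_words):
    """Mean word length of the corpus plus one standard deviation."""
    lengths = [len(word) for word in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(words, cutoff):
    """Words whose length is strictly greater than the cutoff."""
    return [word for word in words if len(word) > cutoff]


# Sample tokens with assumed tags.
sample = [("el", "DET"), ("sueño", "NOUN"), ("americano", "ADJ"), ("es", "VERB")]
sample_words = [word for word, _ in sample]

print(lexical_density(sample))  # 0.75
cutoff = long_word_cutoff(sample_words)
print(long_words(sample_words, cutoff))  # ['americano']
```

In practice the cutoff would be computed once over the whole corpus and then applied to each text, so that the long-word count is comparable across texts.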

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing’, available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World’ [Database], available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar+adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy and complexity,’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.

24 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

NOTES ON CONTRIBUTORS

Yuly Asencion-Delaney is an Associate Professor of Spanish at Northern ArizonaUniversity Her research interests include corpus linguistics Spanish second languageacquisition second language writing and second language assessment Address for cor-respondence Northern Arizona University Modern Languages Box 6004 Flagstaff AZ86011 USA ltyulyasencionnauedugt

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Departmentat Northern Arizona University His research interests include the study of L2 morpho-syntax the acquisition of mood corpus linguistics study abroad and computer assistedlanguage learning

Y ASENCION-DELANEY AND J COLLENTINE 25

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from


The third-year learners averaged a higher summed z-score on this cluster (M=28, SD=53) than the second-year learners (M=14, SD=50), with the difference being statistically significant [F(1, 634) = 754, p 000].

Learners at these levels exhibit some degree of morphological and syntactic sophistication when attempting to use lexico-grammatical features such as adjectives with inflections for number and gender in various morphological and syntactical (i.e. attributive and predicative roles) functions. Also, long words tend to be dense in derivational morphology and so in content; a high type/token ratio indicates that various adjectives are employed in this discourse. Finally, the inclusion of several adjectival features reflects a slightly more developed syntax (Parodi 2007), since word order is a critical consideration with Spanish adjectives, which can be pre- or post-nominal, and descriptors are often chained together (i.e. X has qualities A, B, and C).

The negative cluster of features (singular nouns and the lexical density measure) does not constitute a text type. Instead, it indicates what descriptive expository texts lack. Referential lexical items in the form of nouns are missing, as well as a variety of parts of speech. Thus, descriptive expository texts lack overall specificity and contain numerous and varied adjectives.

Figure 4: Concentration of descriptive expository prose features by text type and averaged summed z-score
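
The group comparison above rests on summed z-scores. As a minimal sketch of how such a score could be computed, assuming each text is represented by normalized frequencies for the features in a cluster, the Python below standardizes each feature across the corpus and sums the resulting z-scores per text. The function name, the feature names, the frequencies, and the choice of the population standard deviation are illustrative assumptions, not details reported in the study.

```python
from statistics import mean, pstdev


def summed_z_scores(texts, cluster_features):
    """Return one summed z-score per text for a cluster of features.

    texts: list of dicts mapping feature name -> normalized frequency.
    cluster_features: names of the features that make up the cluster.
    """
    totals = [0.0] * len(texts)
    for feature in cluster_features:
        values = [text[feature] for text in texts]
        mu = mean(values)
        sigma = pstdev(values)  # population SD is an assumption here
        for i, value in enumerate(values):
            # A feature with no variation contributes nothing to any text.
            totals[i] += (value - mu) / sigma if sigma else 0.0
    return totals


# Hypothetical frequencies per 1,000 words for three texts.
texts = [
    {"adjectives": 40.0, "long_words": 10.0},
    {"adjectives": 60.0, "long_words": 20.0},
    {"adjectives": 80.0, "long_words": 30.0},
]
scores = summed_z_scores(texts, ["adjectives", "long_words"])
```

Averaging these per-text scores within each learner group would give the kind of group means that are compared above.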


Factor 3: Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse where writers are reporting both their own thoughts about the veracity of eventualities as well as others’ perspectives. This cluster includes mostly lexical features that indicate writers’ stance (Biber et al. 2006), or their commitment to the truth of some proposition, taking the form of verbs of knowledge, cognitive verbs, and verbs of probability (cf. Collentine 1995). The use of a grammatical feature such as third-person verb inflections is also said to minimize the writer’s risk of misconstruing the reference in the discourse (Castellano 2000). The cluster also includes complex syntactic features: the subordinating conjunction que and relative pronouns without prepositions (i.e. que, quien, cual), which help the reader to identify references in the discourse.

Figure 5’s detailing of the average summed z-scores by text type confirms the expository function of this cluster.

Figure 5: Concentration of expository prose with stance features by text type and averaged summed z-score

Argumentative essay: El autor Mario Benedetti posiblemente quiere explicar la imaginacion de los ninos. Los ninos no piensan logicamente porque no saben la realidad. Estaba la primera vez que Osvaldo mire la television. Cuando un nino o nina mira una persona mirando y hablando en su direccion piensan que estan hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. Tambien los ninos piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author’s intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3’s negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M=39, SD=45) than the third-year learners (M=22, SD=27), with the difference being statistically significant [F(1, 634) = 176, p 000].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4 and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners’ writing.
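
The retention criterion just described (at least five features loading above 0.30) can be sketched in a few lines of Python. This is an illustration only: the loadings are invented, the helper names are hypothetical, and applying the cutoff to the absolute value of each loading, so that negative loadings also count, is an assumption rather than something the text states.

```python
def salient_features(loadings, cutoff=0.30):
    """Keep features whose absolute loading exceeds the cutoff."""
    return {name: w for name, w in loadings.items() if abs(w) > cutoff}


def is_interpretable(loadings, cutoff=0.30, min_features=5):
    """True when a factor has at least `min_features` salient features."""
    return len(salient_features(loadings, cutoff)) >= min_features


# Hypothetical loadings, invented purely for illustration.
factor_4 = {
    "singular_adjectives": 0.41,
    "masculine_adjectives": 0.38,
    "infinitive_verbs": -0.36,
    "prepositions": 0.12,
}
kept = salient_features(factor_4)
```

With these invented values only three features pass the cutoff, so `is_interpretable(factor_4)` is false. A negative loading such as the one on infinitive verbs would mark a feature that patterns opposite to the positive ones.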

DISCUSSION AND CONCLUSIONS

The development of learners’ interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners’ lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners’ interlanguage could be explained by the five factors, while Biber et al.’s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus’ variance. Biber et al.’s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity since, in Biber’s (2001) study of English texts and in Parodi’s (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners’ narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners’ personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners’ tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners’ discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners’ verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study’s limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse where learners must consider numerous lexical, inflectional, derivational, and syntactic features.

NOTES

1. Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2. Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3. A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4. Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5. Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11.
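
The two measures defined in Notes 4 and 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tag labels (`NOUN`, `ADJ`, `VERB`, `DERIVED_ADV`), the function names, and the sample tokens are hypothetical, and the population standard deviation is used because the notes do not say which estimator was applied.

```python
from statistics import mean, pstdev

# Hypothetical tag labels for the non-functor classes named in Note 4.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}


def lexical_density(tagged_tokens):
    """Share of non-functor words among all words in a text."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_cutoff(corpus_words):
    """Mean word length of the corpus plus one standard deviation."""
    lengths = [len(word) for word in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(text_words, cutoff):
    """Words whose length is greater than the corpus-level cutoff."""
    return [word for word in text_words if len(word) > cutoff]


# Tiny illustrative sample; a real cutoff would come from the whole corpus.
tagged = [("los", "DET"), ("ninos", "NOUN"),
          ("piensan", "VERB"), ("logicamente", "DERIVED_ADV")]
words = [word for word, _ in tagged]
density = lexical_density(tagged)
cutoff = long_word_cutoff(words)
flagged = long_words(words, cutoff)
```

Because the cutoff is computed once over the corpus and then applied to each text, the same threshold identifies long words consistently across text types.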

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World’ [Database], available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar+adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers de l’Institut de Linguistique de Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity,’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yuly.asencion@nau.edu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.


Page 19: Applied Linguistics: 1–25 Oxford University Press 2011 doi ...

Factor 3 Expository prose with a stance

The six positive features of dimension 3 are indicative of discourse wherewriters are reporting both their own thoughts about the veracity of eventua-lities as well as othersrsquo perspectives This cluster includes mostly lexical featuresthat indicate writersrsquo stance (Biber et al 2006) or their commitment to thetruth of some proposition taking the form of verbs of knowledge cognitiveverbs and verbs of probability (cf Collentine 1995) The use of a grammaticalfeature such as third-person verb inflections is also said to minimize the wri-terrsquos risk of misconstruing the reference in the discourse (Castellano 2000) Thecluster also includes complex syntactic features the subordinating conjunctionque and relative pronouns without prepositions (ie que quien cual) whichhelp the reader to identify references in the discourseFigure 5rsquos detailing of the average summed z-scores by text type confirms the

expository function of this cluster

Argumentative essay El autor Mario Benedetti posiblemente quiere expli-car la imaginacion de los ninos Los ninos no piensan logicamente porque no

Figure 5 Concentration of expository prose with stance features by text typeand averaged summed z-score

Y ASENCION-DELANEY AND J COLLENTINE 19

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

saben la realidad Estaba la primera vez que Osvaldo mire la televisionCuando un nino o nina mira una persona mirando y hablando en su direc-cion piensan que estan hablando a ellos Es una que nosotros no pensarporque nos sabemos la realidad Tambien los ninos piensan que los personasen el televisor son sus amigos [The author Mario Benedetti possibly wants toexplain the imagination of children Children do not think logically becausethey do not know reality It was the first time Osvaldo watched TV When aboy or a girl watches a person looking at or talking in their direction theythink they are talking to them It is one that we do not to think (sic thinkabout) because we know the reality]

The writer here speculates about an authorrsquos intention and argues for aspecific motive by reporting on the thoughts and attitudes of the charactersin the story

Dimension 3rsquos negative cluster only indicates what is missing where stancepredominates The features include lexical and grammatical narrative toolsnamely imperfect verbs progressive aspect and prepositions These featuresdescribe the scene or the background context (eg time place people andother co-occurring events) in which the main events occur suggesting perhapsthat these learners do not couple stances with narrative elements Support forthis is found in a comparison of the two levels of learners The data indicatethat expository prose with a stance occurs mostly in less proficient learnerswhich would explain why it is disassociated with certain narrative elementsThe second-year learners averaged a higher summed z-score on this cluster(M=39 SD=45) than the third-year learners (M=22 SD=27) with thedifference being statistically significant [F(1 634) = 176 p 000]

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features tomeet the five-feature threshold above the 030 level However Factors 4 and5 share some linguistic features For both factors some elements in nominalclauses (ie singular and masculine adjectives in Factor 4 and plural adjectivesand definite articles in Factor 5) appeared in complementary distribution withinfinitive verbs This is probably evidence to support the notion that nominaland verbal elements appear in complementary distribution in these learnersrsquowriting

DISCUSSION AND CONCLUSIONS

The development of learnersrsquo interlanguage can be seen as the acquisition ofthe ability to create meanings by combining lexical sentential pragmatic anddiscourse features This study describes whether and how Spanish learners uselexico-grammatical features to produce interlanguage-specific discourse typesIn response to our initial research questions we found that second- andthird-year Spanish L2 learnersrsquo lexical and grammatical features clustered

20 ANALYSIS OF A WRITTEN L2 SPANISH CORPUS

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

reliably into five factors that mainly combined verbal or nominal features Theanalysis uncovered four significant clusters that can be considered distinctdiscourse types The communicative functions of these discourse types breakdown into two main stylistic variations narrative (characterized by verbalfeatures) and expository (characterized by nominal features)Only 20 of the variance in learnersrsquo interlanguage could be explained by

the five factors while Biber et alrsquos (2006) native-speaker multidimensionalanalysis accounted for 45 of the targeted corpusrsquo variance Biber et alrsquos(2006) clusters were more numerous and contained more features And al-though their native-speaker analysis included data from the written and oralmodes whereas this study included data from only the written mode the L2clusters possibly had few features because clustering of features of any kinddevelops slowly in the L2 and the psycholinguistic consolidation of productiveassociations requires time and practice (Collentine and Asencion-Delaney2010) If so it should also not be surprising that the discourse types favoredby Spanish learners at the second and third year of university-level instructionare novel in comparison with native-speaker clusters in a number of respectsAdditionally the fact that the clusters are relatively homogeneous in terms ofthe parts of speech they represent (ie either verbal or nominal) indicates thatthese learners depend on a limited repertoire of lexico-grammatical features toachieve their communicative goalsThe L2 expository dimension constitutes a concentrated use of nominal fea-

tures This discourse type represents a certain discourse complexity since inBiberrsquos (2001) study of English texts and in Parodirsquos (2007) study of Spanishtexts nominal features work together to generate semantically dense discoursethat is informationally rich This is corroborated by the observation thatadvanced-low third-year learners produced more expository discourseHowever the learner discourse type uncovered here differs fromnative-speaker expository discourse in that intermediate-high second-yearlearners produce narrative elements in argumentation such as arguing for apoint with an illustrative story Future research may reveal whether the nar-rative elements constitute a compensatory strategy as reported by Hinkel(2004) with English learners andor a function of the Spanish curriculumwhich highly emphasizes verb morphology (ie conjugations for preterit andimperfect)The narrative dimension showed that Spanish learnersrsquo narrative discourse

is not limited to presenting an account of events in the past but it also reflectslearnersrsquo personal speculations feelings and attitudes towards the events Thisinvolvement may be due to task requirements or to beginnerndashintermediatelearnersrsquo tendency to produce the L2 to talk about themselves which is typicalin the Spanish L2 curriculumFinally the remaining two discourse types are probably more or less useful

to L2 Spanish writers depending on their level of development The descrip-tive expository prose is probably a more sophisticated version of learner ex-pository prose in general since the advanced-low learners utilized it more On

Y ASENCION-DELANEY AND J COLLENTINE 21

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

the other hand the expository prose with a stance is peculiar in that it containsfeatures associated with stance and subordination and yet the second-yearlearners used it more than the third-year learners It was neverthelessshown that this discourse type does not appear intermixed with other import-ant feature sets as it is disassociated with important narrative features Thesecond-year learners may only have been able to produce segments with thesefeatures and not simultaneously generate other discourse types Clearly theemergence of stance and how it is operationalized in L2 writing deserves amore fine-grained analysis in future research

The findings provide examples of the multiple ways that linguistic complex-ity occurs in the L2 Although Spanish learnersrsquo discourse did not show signs ofsyntactic complexity (eg frequent use of relative clauses subordinate clausesuse of clitics cf Wolfe-Quintero et al 1998 Ortega 2000) the frequent use ofnominal features affects informational density due to the presence of numer-ous derivational morphemes Inflectional complexity in the form of markedforms was not predominant in the data set Still the learnersrsquo verbal inflectionsdid vary which is a sign of L2 development occurring (Howard 2002 2006Collentine 2004 Marsden and David 2008)

Regarding the studyrsquos limitations the multidimensional analysis would havebenefited from the inclusion of spontaneous learner samples (eg emails chatscripts) to understand L2 discourse where there are high online cognitive de-mands Additionally our data are limited to intermediate-high andadvanced-low learners of Spanish Finally given the nature of the multidi-mensional analysis that examines what appears in communication as opposedto what does not the present analysis does not consider learner errors (Grantand Ginther 2000) Nevertheless the data and the qualitative analyses providecomplementary perspectives on how L2 communication occurs in relativelyextended discourse where learners must consider numerous lexical inflection-al derivational and syntactic features

NOTES

1 Morphological diversity occurs when a learner uses a variety, rather than a small subset, of the language's inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners employ either words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O'Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word's lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si 'if', articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11
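The two measures defined in Notes 4 and 5 can be illustrated with a minimal sketch. It is not the authors' tagging or counting procedure: the tag labels in `CONTENT_TAGS`, the function names, and the use of the population standard deviation are all assumptions made for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical part-of-speech labels treated as non-functor (content) words.
# The study's actual tag set is not given here; these names are assumptions.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "DERIVED_ADV"}


def lexical_density(tagged_tokens):
    """Non-functor words divided by total words in a text (Note 4).

    tagged_tokens is a list of (word, tag) pairs.
    """
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_threshold(corpus_words):
    """Mean word length plus one standard deviation of word length (Note 5).

    Assumption: population standard deviation (pstdev); the source does not
    say whether a population or sample estimate was used.
    """
    lengths = [len(word) for word in corpus_words]
    return mean(lengths) + pstdev(lengths)


def long_words(text_words, threshold):
    """Words whose length is strictly greater than the threshold."""
    return [word for word in text_words if len(word) > threshold]


if __name__ == "__main__":
    sample = [("el", "DET"), ("sol", "NOUN"), ("casa", "NOUN"), ("universidad", "NOUN")]
    words = [word for word, _ in sample]
    print(lexical_density(sample))
    print(long_words(words, long_word_threshold(words)))
```

In this sketch the threshold is computed once over the whole corpus and then applied to each text, which matches the wording of Note 5 ("mean word length of the corpus"); a per-text threshold would be a different measure.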

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. 'ACTFL Proficiency Guidelines – Writing,' available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. 'Learner corpus analysis and the development of foreign language proficiency,' System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. 'On the complexity of discourse complexity: A multidimensional analysis' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. 'Introduction: Multidimensional analysis and the study of register variation' in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. 'Spoken and written register variation in Spanish: A multi-dimensional analysis,' Corpora 1: 1–37.

Boulton, A. 2009. 'Testing the limits of data-driven learning: Language proficiency and training,' ReCALL 21/1: 37–54.

British National Consortium. 2001. 'BNC World' [Database], available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. 'Theoretical bases of communicative approaches to second language teaching and testing,' Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. 'Ambigüedad y variación del pronombre personal sujeto' in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. 'The uses of Spanish copulas by Chinese-speaking learners in a free writing task,' Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. 'The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,' Hispania 78/1: 123–36.

Collentine, J. 2004. 'The effects of learning contexts on morphosyntactic and lexical development,' Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. 'Study abroad research: Findings, implications and future directions' in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. 'A corpus-based analysis of the discourse functions of Ser/Estar + adjective in three levels of Spanish FL learners,' Language Learning 60/2: 409–45.

Ellis, N. C. 2002. 'Reflections on frequency effects in language processing,' Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. 'Language acquisition as rational contingency learning,' Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. 'Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,' TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ''Ready-to-speak': Prefabricated sequences in spoken French as a second language and as a native language ['Prêt-à-parler': séquences préfabriquées en français parlé L2 et L1],' Cahiers De l'Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. 'Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,' Language Learning 57/3: 337–67.

Givón, T. 1985. 'Function, structure and language acquisition' in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. 'Using computer-tagged linguistic features to describe L2 writing differences,' Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. 'Tense, aspect and the passive voice in L1 and L2 academic texts,' Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. 'Prototypical and non-prototypical marking in the advanced learner's aspectuo-temporal system,' EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. 'Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,' Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. 'Four perspectives on L2 pragmatic development,' Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. 'The basic variety (Or: Couldn't natural languages be much simpler?),' Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. 'Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,' Language Learning Journal 36/2: 181–98.

Myles, F. 2005. 'Interlanguage corpora and second language acquisition research,' Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. 'Using information technology to support empirical SLA research,' Journal of Applied Linguistics 1/2: 169–96.

O'Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. 'Variation across registers in Spanish: Exploring the El Grial PUCV Corpus' in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. 'Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,' Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. 'Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,' English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. 'Verbs of cognition in spoken Spanish: A discourse profile' in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. 'Second language development in writing: measures of fluency, accuracy and complexity,' Technical Report 17. University of Hawai'i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yulyasencionnauedu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.


saben la realidad. Estaba la primera vez que Osvaldo mire la television. Cuando un nino o nina mira una persona mirando y hablando en su direccion, piensan que estan hablando a ellos. Es una que nosotros no pensar porque nos sabemos la realidad. Tambien los ninos piensan que los personas en el televisor son sus amigos. [The author Mario Benedetti possibly wants to explain the imagination of children. Children do not think logically because they do not know reality. It was the first time Osvaldo watched TV. When a boy or a girl watches a person looking at or talking in their direction, they think they are talking to them. It is one that we do not to think (sic: think about) because we know the reality.]

The writer here speculates about an author's intention and argues for a specific motive by reporting on the thoughts and attitudes of the characters in the story.

Dimension 3's negative cluster only indicates what is missing where stance predominates. The features include lexical and grammatical narrative tools, namely imperfect verbs, progressive aspect, and prepositions. These features describe the scene or the background context (e.g. time, place, people, and other co-occurring events) in which the main events occur, suggesting perhaps that these learners do not couple stances with narrative elements. Support for this is found in a comparison of the two levels of learners. The data indicate that expository prose with a stance occurs mostly in less proficient learners, which would explain why it is disassociated with certain narrative elements. The second-year learners averaged a higher summed z-score on this cluster (M=39, SD=45) than the third-year learners (M=22, SD=27), with the difference being statistically significant [F(1, 634) = 176, p 000].

Factors 4 and 5

The last two factors did not include enough lexical and grammatical features to meet the five-feature threshold above the 0.30 level. However, Factors 4 and 5 share some linguistic features. For both factors, some elements in nominal clauses (i.e. singular and masculine adjectives in Factor 4, and plural adjectives and definite articles in Factor 5) appeared in complementary distribution with infinitive verbs. This is probably evidence to support the notion that nominal and verbal elements appear in complementary distribution in these learners' writing.

DISCUSSION AND CONCLUSIONS

The development of learners' interlanguage can be seen as the acquisition of the ability to create meanings by combining lexical, sentential, pragmatic, and discourse features. This study describes whether and how Spanish learners use lexico-grammatical features to produce interlanguage-specific discourse types. In response to our initial research questions, we found that second- and third-year Spanish L2 learners' lexical and grammatical features clustered reliably into five factors that mainly combined verbal or nominal features. The analysis uncovered four significant clusters that can be considered distinct discourse types. The communicative functions of these discourse types break down into two main stylistic variations: narrative (characterized by verbal features) and expository (characterized by nominal features).

Only 20% of the variance in learners' interlanguage could be explained by the five factors, while Biber et al.'s (2006) native-speaker multidimensional analysis accounted for 45% of the targeted corpus' variance. Biber et al.'s (2006) clusters were more numerous and contained more features. And although their native-speaker analysis included data from the written and oral modes, whereas this study included data from only the written mode, the L2 clusters possibly had few features because clustering of features of any kind develops slowly in the L2, and the psycholinguistic consolidation of productive associations requires time and practice (Collentine and Asención-Delaney 2010). If so, it should also not be surprising that the discourse types favored by Spanish learners at the second and third year of university-level instruction are novel in comparison with native-speaker clusters in a number of respects. Additionally, the fact that the clusters are relatively homogeneous in terms of the parts of speech they represent (i.e. either verbal or nominal) indicates that these learners depend on a limited repertoire of lexico-grammatical features to achieve their communicative goals.

The L2 expository dimension constitutes a concentrated use of nominal features. This discourse type represents a certain discourse complexity, since in Biber's (2001) study of English texts and in Parodi's (2007) study of Spanish texts, nominal features work together to generate semantically dense discourse that is informationally rich. This is corroborated by the observation that advanced-low, third-year learners produced more expository discourse. However, the learner discourse type uncovered here differs from native-speaker expository discourse in that intermediate-high, second-year learners produce narrative elements in argumentation, such as arguing for a point with an illustrative story. Future research may reveal whether the narrative elements constitute a compensatory strategy, as reported by Hinkel (2004) with English learners, and/or a function of the Spanish curriculum, which highly emphasizes verb morphology (i.e. conjugations for preterit and imperfect).

The narrative dimension showed that Spanish learners' narrative discourse is not limited to presenting an account of events in the past, but it also reflects learners' personal speculations, feelings, and attitudes towards the events. This involvement may be due to task requirements or to beginner–intermediate learners' tendency to produce the L2 to talk about themselves, which is typical in the Spanish L2 curriculum.

Finally, the remaining two discourse types are probably more or less useful to L2 Spanish writers depending on their level of development. The descriptive expository prose is probably a more sophisticated version of learner expository prose in general, since the advanced-low learners utilized it more. On the other hand, the expository prose with a stance is peculiar in that it contains features associated with stance and subordination, and yet the second-year learners used it more than the third-year learners. It was nevertheless shown that this discourse type does not appear intermixed with other important feature sets, as it is disassociated with important narrative features. The second-year learners may only have been able to produce segments with these features and not simultaneously generate other discourse types. Clearly, the emergence of stance and how it is operationalized in L2 writing deserves a more fine-grained analysis in future research.

The findings provide examples of the multiple ways that linguistic complexity occurs in the L2. Although Spanish learners' discourse did not show signs of syntactic complexity (e.g. frequent use of relative clauses, subordinate clauses, use of clitics; cf. Wolfe-Quintero et al. 1998; Ortega 2000), the frequent use of nominal features affects informational density due to the presence of numerous derivational morphemes. Inflectional complexity in the form of marked forms was not predominant in the data set. Still, the learners' verbal inflections did vary, which is a sign of L2 development occurring (Howard 2002, 2006; Collentine 2004; Marsden and David 2008).

Regarding the study's limitations, the multidimensional analysis would have benefited from the inclusion of spontaneous learner samples (e.g. emails, chat scripts) to understand L2 discourse where there are high online cognitive demands. Additionally, our data are limited to intermediate-high and advanced-low learners of Spanish. Finally, given the nature of the multidimensional analysis, which examines what appears in communication as opposed to what does not, the present analysis does not consider learner errors (Grant and Ginther 2000). Nevertheless, the data and the qualitative analyses provide complementary perspectives on how L2 communication occurs in relatively extended discourse where learners must consider numerous lexical, inflectional, derivational, and syntactic features.


Page 21: Applied Linguistics: 1–25 Oxford University Press 2011 doi ...

reliably into five factors that mainly combined verbal or nominal features Theanalysis uncovered four significant clusters that can be considered distinctdiscourse types The communicative functions of these discourse types breakdown into two main stylistic variations narrative (characterized by verbalfeatures) and expository (characterized by nominal features)Only 20 of the variance in learnersrsquo interlanguage could be explained by

the five factors while Biber et alrsquos (2006) native-speaker multidimensionalanalysis accounted for 45 of the targeted corpusrsquo variance Biber et alrsquos(2006) clusters were more numerous and contained more features And al-though their native-speaker analysis included data from the written and oralmodes whereas this study included data from only the written mode the L2clusters possibly had few features because clustering of features of any kinddevelops slowly in the L2 and the psycholinguistic consolidation of productiveassociations requires time and practice (Collentine and Asencion-Delaney2010) If so it should also not be surprising that the discourse types favoredby Spanish learners at the second and third year of university-level instructionare novel in comparison with native-speaker clusters in a number of respectsAdditionally the fact that the clusters are relatively homogeneous in terms ofthe parts of speech they represent (ie either verbal or nominal) indicates thatthese learners depend on a limited repertoire of lexico-grammatical features toachieve their communicative goalsThe L2 expository dimension constitutes a concentrated use of nominal fea-

tures This discourse type represents a certain discourse complexity since inBiberrsquos (2001) study of English texts and in Parodirsquos (2007) study of Spanishtexts nominal features work together to generate semantically dense discoursethat is informationally rich This is corroborated by the observation thatadvanced-low third-year learners produced more expository discourseHowever the learner discourse type uncovered here differs fromnative-speaker expository discourse in that intermediate-high second-yearlearners produce narrative elements in argumentation such as arguing for apoint with an illustrative story Future research may reveal whether the nar-rative elements constitute a compensatory strategy as reported by Hinkel(2004) with English learners andor a function of the Spanish curriculumwhich highly emphasizes verb morphology (ie conjugations for preterit andimperfect)The narrative dimension showed that Spanish learnersrsquo narrative discourse

is not limited to presenting an account of events in the past but it also reflectslearnersrsquo personal speculations feelings and attitudes towards the events Thisinvolvement may be due to task requirements or to beginnerndashintermediatelearnersrsquo tendency to produce the L2 to talk about themselves which is typicalin the Spanish L2 curriculumFinally the remaining two discourse types are probably more or less useful

to L2 Spanish writers depending on their level of development The descrip-tive expository prose is probably a more sophisticated version of learner ex-pository prose in general since the advanced-low learners utilized it more On

Y ASENCION-DELANEY AND J COLLENTINE 21

by guest on January 29 2011applijoxfordjournalsorg

Dow

nloaded from

the other hand the expository prose with a stance is peculiar in that it containsfeatures associated with stance and subordination and yet the second-yearlearners used it more than the third-year learners It was neverthelessshown that this discourse type does not appear intermixed with other import-ant feature sets as it is disassociated with important narrative features Thesecond-year learners may only have been able to produce segments with thesefeatures and not simultaneously generate other discourse types Clearly theemergence of stance and how it is operationalized in L2 writing deserves amore fine-grained analysis in future research

The findings provide examples of the multiple ways that linguistic complex-ity occurs in the L2 Although Spanish learnersrsquo discourse did not show signs ofsyntactic complexity (eg frequent use of relative clauses subordinate clausesuse of clitics cf Wolfe-Quintero et al 1998 Ortega 2000) the frequent use ofnominal features affects informational density due to the presence of numer-ous derivational morphemes Inflectional complexity in the form of markedforms was not predominant in the data set Still the learnersrsquo verbal inflectionsdid vary which is a sign of L2 development occurring (Howard 2002 2006Collentine 2004 Marsden and David 2008)

Regarding the studyrsquos limitations the multidimensional analysis would havebenefited from the inclusion of spontaneous learner samples (eg emails chatscripts) to understand L2 discourse where there are high online cognitive de-mands Additionally our data are limited to intermediate-high andadvanced-low learners of Spanish Finally given the nature of the multidi-mensional analysis that examines what appears in communication as opposedto what does not the present analysis does not consider learner errors (Grantand Ginther 2000) Nevertheless the data and the qualitative analyses providecomplementary perspectives on how L2 communication occurs in relativelyextended discourse where learners must consider numerous lexical inflection-al derivational and syntactic features

NOTES

1 Morphological diversity occurs when a learner uses a variety rather than a small subset of the language’s inflectional and derivational morphemes (Howard 2002, 2006). Morphological complexity occurs where learners either employ words with multiple derivational morphemes or marked inflectional morphemes (e.g. subjunctive, conditional, future, past participle).

2 Just as digital technologies have altered various classroom practices, so has the ability to amass large corpora of digitized representations of the L2 changed how we design pedagogical grammars (O’Keeffe et al. 2007). Corpus-based, data-driven learning strategies are beginning to be incorporated into pedagogical materials (Boulton 2009).

3 A lexical category is one identified by a search expression or measurement (e.g. lexical density ratio) focusing on the lexical properties of a word or segment, which primarily entails checking a word’s lemma. A grammatical category is one whose search expression focuses on the inflectional properties of a word or segment (e.g. person, number, tense); it may also entail high-frequency functors whose role is largely syntactic (e.g. si ‘if’, articles, comparative constructions). A lexico-grammatical category is identified by both lexical and grammatical properties, such as feminine nouns and adverbial conjunctions.

4 Lexical density is derived from the total number of non-functor words (nouns + adjectives + verbs + derived adverbs) divided by the total words in a text.

5 Long words are defined here as any word whose length is greater than the mean word length of the corpus plus the standard deviation word length of the corpus, e.g. mean word length 46 + SD 49 = 105–11.
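The two measures defined in Notes 4 and 5 can be illustrated with a minimal Python sketch. It is not the authors' tooling: the tag labels (`NOUN`, `ADJ`, `VERB`, `ADV_DERIVED`), the function names, and the use of the population standard deviation are assumptions made only for illustration, since the notes do not specify a tagset or which standard deviation was used.

```python
from statistics import mean, pstdev

# Hypothetical tag labels for non-functor (content) words; the study's
# actual tagset is not given in the notes.
CONTENT_TAGS = {"NOUN", "ADJ", "VERB", "ADV_DERIVED"}


def lexical_density(tagged_tokens):
    """Share of non-functor words among all words (Note 4).

    tagged_tokens is a list of (word, tag) pairs.
    """
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    return content / len(tagged_tokens)


def long_word_threshold(words):
    """Mean word length plus one standard deviation (Note 5).

    Assumption: population standard deviation over the whole corpus.
    """
    lengths = [len(w) for w in words]
    return mean(lengths) + pstdev(lengths)


def long_words(words):
    """Words whose length exceeds the corpus-wide threshold."""
    threshold = long_word_threshold(words)
    return [w for w in words if len(w) > threshold]
```

Under these assumptions, `lexical_density` would be applied to each text, whereas `long_word_threshold` would be computed once over the whole corpus and then used to flag long words in individual texts.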

REFERENCES

American Council on the Teaching of Foreign Languages. 2001. ‘ACTFL Proficiency Guidelines – Writing,’ available at http://www.actfl.org/i4a/pages/index.cfm?pageid=3326. Accessed 27 September 2010.

Bachman, L. 1990. Fundamental Considerations in Language Testing. Oxford University Press.

Belz, J. 2004. ‘Learner corpus analysis and the development of foreign language proficiency,’ System 32/4: 577–97.

Biber, D. 1988. Variation across Speech and Writing. Cambridge University Press.

Biber, D. 2001. ‘On the complexity of discourse complexity: A multidimensional analysis,’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 215–40.

Biber, D. and S. Conrad. 2001. ‘Introduction: Multidimensional analysis and the study of register variation,’ in S. Conrad and D. Biber (eds): Variation in English: Multidimensional Studies. Longman, pp. 3–13.

Biber, D., M. Davies, J. Jones, and N. Tracy-Ventura. 2006. ‘Spoken and written register variation in Spanish: A multi-dimensional analysis,’ Corpora 1: 1–37.

Boulton, A. 2009. ‘Testing the limits of data-driven learning: Language proficiency and training,’ ReCALL 21/1: 37–54.

British National Consortium. 2001. ‘BNC World [Database],’ available at http://www.natcorp.ox.ac.uk. Accessed 09 July 2009.

Canale, M. and M. Swain. 1980. ‘Theoretical bases of communicative approaches to second language teaching and testing,’ Applied Linguistics 1/1: 1–47.

Castellano, A. 2000. ‘Ambigüedad y variación del pronombre personal sujeto,’ in M. Muñoz, G. Fernández, A. Rodríguez, and V. Benítez (eds): IV Congreso de Lingüística General. Universidad de Cádiz, pp. 521–31.

Cheng, C., H.-C. Lu, and P. Giannakouros. 2008. ‘The uses of Spanish copulas by Chinese-speaking learners in a free writing task,’ Bilingualism: Language and Cognition 11/3: 301–17.

Collentine, J. 1995. ‘The development of complex syntax and mood-selection abilities by intermediate-level learners of Spanish,’ Hispania 78/1: 123–36.

Collentine, J. 2004. ‘The effects of learning contexts on morphosyntactic and lexical development,’ Studies in Second Language Acquisition 26/2: 227–48.

Collentine, J. 2009. ‘Study abroad research: Findings, implications and future directions,’ in M. Long and C. Doughty (eds): The Handbook of Language Teaching. Wiley-Blackwell Press, pp. 218–34.

Collentine, J. and Y. Asención-Delaney. 2010. ‘A corpus-based analysis of the discourse functions of Ser/Estar+adjective in three levels of Spanish FL learners,’ Language Learning 60/2: 409–45.

Ellis, N. C. 2002. ‘Reflections on frequency effects in language processing,’ Studies in Second Language Acquisition 24/2: 297–339.

Ellis, N. C. 2006. ‘Language acquisition as rational contingency learning,’ Applied Linguistics 27/1: 1–24.

Ellis, N. C., R. Simpson-Vlach, and C. Maynard. 2008. ‘Formulaic language in native and second language speakers: Psycholinguistics, corpus linguistics, and TESOL,’ TESOL Quarterly 42/3: 375–96.

Forsberg, F. 2005. ‘“Ready-to-speak”: Prefabricated sequences in spoken French as a second language and as a native language [“Prêt-à-parler”: séquences préfabriquées en français parlé L2 et L1],’ Cahiers De l’Institut De Linguistique De Louvain 31: 183–95.

Geyer, N. 2007. ‘Self-qualification in L2 Japanese: An interface of pragmatic, grammatical, and discourse competences,’ Language Learning 57/3: 337–67.

Givón, T. 1985. ‘Function, structure, and language acquisition,’ in D. Slobin (ed.): The Cross Linguistic Study of Language Acquisition. Lawrence Erlbaum, pp. 1008–25.

Granger, S., J. Hung, and S. Petch-Tyson. 2002. Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. John Benjamins.

Grant, L. and A. Ginther. 2000. ‘Using computer-tagged linguistic features to describe L2 writing differences,’ Journal of Second Language Writing 9/2: 123–45.

Hinkel, E. 2004. ‘Tense, aspect and the passive voice in L1 and L2 academic texts,’ Language Teaching Research 8/1: 5–29.

Horn, L. and G. Ward. 2006. The Handbook of Pragmatics. Wiley-Blackwell.

Howard, M. 2002. ‘Prototypical and non-prototypical marking in the advanced learner’s aspectuo-temporal system,’ EuroSLA Yearbook 2: 87–114.

Howard, M. 2006. ‘Variation in advanced French interlanguage: A comparison of three (socio)linguistic variables,’ Canadian Modern Language Review 62/3: 379–400.

Kasper, G. 2001. ‘Four perspectives on L2 pragmatic development,’ Applied Linguistics 22/4: 502–30.

Klein, W. and C. Perdue. 1997. ‘The basic variety (Or: Couldn’t natural languages be much simpler?),’ Second Language Research 13/4: 301–47.

Longacre, R. 1983. The Grammar of Discourse. Plenum.

Marsden, E. and A. David. 2008. ‘Vocabulary use during conversation: a cross-sectional study of development from year 9 to year 13 among learners of Spanish and French,’ Language Learning Journal 36/2: 181–98.

Myles, F. 2005. ‘Interlanguage corpora and second language acquisition research,’ Second Language Research 21/4: 373–91.

Myles, F. and R. Mitchell. 2005. ‘Using information technology to support empirical SLA research,’ Journal of Applied Linguistics 1/2: 169–96.

O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom. Cambridge University Press.

Ortega, L. 2000. Understanding Syntactic Complexity: The Measurement of Change in the Syntax of Instructed L2 Spanish Learners. Unpublished dissertation, University of Hawaii at Manoa.

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus,’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile,’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy, and complexity,’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asencion-Delaney is an Associate Professor of Spanish at Northern ArizonaUniversity Her research interests include corpus linguistics Spanish second languageacquisition second language writing and second language assessment Address for cor-respondence Northern Arizona University Modern Languages Box 6004 Flagstaff AZ86011 USA ltyulyasencionnauedugt

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Departmentat Northern Arizona University His research interests include the study of L2 morpho-syntax the acquisition of mood corpus linguistics study abroad and computer assistedlanguage learning




Page 24: Applied Linguistics: 1–25 Oxford University Press 2011 doi ...

linguistics and TESOLrsquo TESOL Quarterly 423

375ndash96

Forsberg F 2005 lsquoReady-to-speakrsquo

Prefabricated sequences in spoken French as

a second language and as a native language

[lsquoPret-a-parlerrsquo sequences prefabriquees en francais

parle L2 et L1]rsquo Cahiers De lrsquoInstitut De

Linguistique De Louvain 31 183ndash95

Geyer N 2007 lsquoSelf-qualification in L2

Japanese An interface of pragmatic grammat-

ical and discourse competencesrsquo Language

Learning 573 337ndash67

Givon T 1985 lsquoFunction structure and lan-

guage acquisitionrsquo in D Slobin (ed) The Cross

Linguistic Study of Language Acquisition

Lawrence Erlbaum pp 1008ndash25

Granger S J Hung and S Petch-Tyson

2002 Computer Learner Corpora Second

Language Acquisition and Foreign Language

Teaching John Benjamins

Grant L and A Ginther 2000 lsquoUsing

computer-tagged linguistic features to describe

L2 writing differencesrsquo Journal of Second

Language Writing 92 123ndash45

Hinkel E 2004 lsquoTense aspect and the passive

voice in L1 and L2 academic textsrsquo Language

Teaching Research 81 5ndash29

Horn L and G Ward 2006 The Handbook of

Pragmatics Wiley-Blackwell

Howard M 2002 lsquoPrototypical and

non-prototypical marking in the advanced

learnerrsquos aspectuo-temporal systemrsquo EuroSLA

Yearbook 2 87ndash114

Howard M 2006 lsquoVariation in advanced

French interlanguage A comparison of three

(socio)linguistic variablesrsquo Canadian Modern

Language Review 623 379ndash400

Kasper G 2001 lsquoFour perspectives on L2 prag-

matic developmentrsquo Applied Linguistics 224

502ndash30

Klein W and C Perdue 1997 lsquoThe basic var-

iety (Or Couldnrsquot natural languages be much

simpler)rsquo Second Language Research 134

301ndash47

Longacre R 1983 The Grammar of Discourse

Plenum

Marsden E and A David 2008 lsquoVocabulary

use during conversation a cross-sectional

study of development from year 9 to year 13

among learners of Spanish and Frenchrsquo

Language Learning Journal 362 181ndash98

Myles F 2005 lsquoInterlanguage corpora and

second language acquisition researchrsquo Second

Language Research 214 373ndash91

Myles F and R Mitchell 2005 lsquoUsing infor-

mation technology to support empirical SLA

researchrsquo Journal of Applied Linguistics 12

169ndash96

OrsquoKeeffe A M McCarthy and R Carter

2007 From Corpus to Classroom Cambridge

University Press

Ortega L 2000 Understanding Syntactic

Complexity The Measurement of Change in the

Syntax of Instructed L2 Spanish Learners

Unpublished dissertation University of Hawaii

at Manoa

Parodi, G. 2007. ‘Variation across registers in Spanish: Exploring the El Grial PUCV Corpus’ in G. Parodi (ed.): Working with Spanish Corpora. Continuum, pp. 11–53.

Skiba, R. and N. Dittmar. 1992. ‘Pragmatic, semantic, and syntactic constraints and grammaticalization: A longitudinal perspective,’ Studies in Second Language Acquisition 14/3: 323–49.

Upton, T. A. and U. Connor. 2001. ‘Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre,’ English for Specific Purposes 20/4: 313–29.

Weber, E. and P. Bentivoglio. 1991. ‘Verbs of cognition in spoken Spanish: A discourse profile’ in S. Fleischman and L. Waugh (eds): Discourse Pragmatics and the Verb: Evidence from Romance. Routledge, pp. 194–213.

Wolfe-Quintero, K., S. Inagaki, and S. Kim. 1998. ‘Second language development in writing: measures of fluency, accuracy and complexity,’ Technical Report 17. University of Hawai’i, Second Language Teaching and Curriculum Center.


NOTES ON CONTRIBUTORS

Yuly Asención-Delaney is an Associate Professor of Spanish at Northern Arizona University. Her research interests include corpus linguistics, Spanish second language acquisition, second language writing, and second language assessment. Address for correspondence: Northern Arizona University, Modern Languages, Box 6004, Flagstaff, AZ 86011, USA. <yulyasencionnauedu>

Joseph Collentine is Professor of Spanish and Chair of the Modern Languages Department at Northern Arizona University. His research interests include the study of L2 morphosyntax, the acquisition of mood, corpus linguistics, study abroad, and computer assisted language learning.
