Applications of corpus analysis in EAP: research, learning, and teaching
Martin Hewings
The University of Birmingham
Outline of talk
• Corpus analysis in EAP research
• Students learning from corpora: Data-driven learning and an alternative
• Teachers learning from corpora: Classroom applications
Features of 31 JEAP ‘corpus’ papers
• Paper types
– 26 corpus analyses
• Corpus content
– 19 writing; 5 speech; 2 both; journal articles predominant; focus on single social science disciplines
• Corpus types
– mainly expert/published
• Focus of analysis
– mainly particular lexical/grammatical features
Open-access academic corpora include…
• Corpus of Contemporary American English (COCA) (Academic): 120 million words
• British Academic Written English (BAWE): 6.5 million words of good-standard student writing
• Michigan Corpus of Academic Spoken English (MICASE): 1.8 million words
• British Academic Spoken English (BASE): 1.6 million words
For a number of general English corpora:
• www.lextutor.ca (Lextutor)
Corpora as resources for learners: data-driven learning (DDL)
DDL:
• exposes students to ‘target’ language forms
• provides authentic examples
• provides information beyond dictionary or grammar
• encourages inductive learning
• encourages learner autonomy
“…an exceptional group of students – highly acculturated into the genres of their discourse communities, mostly on the way to their PhDs, eager to perfect their English, possessing of advanced computer skills, and perfectly comfortable with quantitative data.”
Lee & Swales (2006): DDL
DDL: some reservations
• lack of evidence to link DDL to language improvement
• are the outcomes worth the time, effort and money?
• it doesn’t suit all students
Selecting corpus data for students (as an alternative to DDL)
• An example: MBA students’ use of ‘I’
• ‘Research article (RA) corpus’: 120,000 words
• ‘MBA corpus’: essays, 22,000 words
MBA corpus
on TV or from magazine, I am in the opinion that service
more consumption of fuels. I am almost certain that there
world imports composition. I believe services commodities
s composition. In the future I believe there will be a new
osing a million dollars. So I believe services commodities
wector. As a result of this, I can predict that there will
mports appeared. After 1987, I do not think that there was a
mports about one third. But I don't think it will grow so
er. As a result, therefore, I expect that the countries
more than other commodities. I expect service industry will
ervices" and is intangible. I feel that the intangible
veloped in the next future. I personally see the above idea
world. But, before I go on, I should make a point. After
ore detail analysis, I think I should take deeper consideratio
a long term point of view, I suppose the composition of
RA corpus
uring the estimation period. I also computed Patell's (1976, p.
y. The question: When should I buy? has one logical answer:
(SVR) metric (in all cases), I choose to present only the result
he wall' statements such as 'I don't care how you do it, just do
on environment. In addition, I examine several subhypotheses bas
Size Test. In this section, I first test the hypothesis of diff
y perennial question: should I invest now or wait for the
as a long way from reality: 'I just did not want to be part of a
ep asking themselves,'How do I know? What evidence is there?' Th
y? By information technology I mean the hardware and software, c
I were doing this what would I need?' Another useful heuristic r
ed per ASR No. 190. That is, I test the hypothesis that inflatio
e key questions such as: 'If I were doing this what would I need
h domestically and globally. I will, therefore, focus more on th
Journals: published writing
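Concordance lines like those above can be produced with a few lines of code. What follows is only a minimal sketch, not the tool used for the talk: the function names `kwic` and `format_kwic`, the 30-character context width, and the sample sentences are all invented for illustration. It finds each occurrence of ‘I’ as a whole word and sorts the lines by the text to the right of the keyword, which groups together patterns such as ‘I believe’ and ‘I expect’.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left, keyword, right) tuples, sorted by right-hand context."""
    lines = []
    # \b stops 'I' from matching inside longer words such as 'In'
    for m in re.finditer(r"\b" + re.escape(keyword) + r"\b", text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, keyword, right))
    # Sorting on the right context brings similar verb choices together
    lines.sort(key=lambda t: t[2].strip().lower())
    return lines

def format_kwic(lines, width=30):
    """Right-align the left context so the keyword forms a column."""
    return "\n".join(f"{left:>{width}}{kw}{right}" for left, kw, right in lines)

# Invented sample text (not taken from either corpus)
sample = ("In the future I believe there will be growth. "
          "After 1987, I do not think that imports rose. "
          "As a result, I expect that prices will fall.")

rows = kwic(sample, "I")
print(format_kwic(rows))
```

Because contexts are cut at a fixed number of characters, words at the edges of each line are truncated, as in the slides.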
Teachers learning from corpora: checking intuitions
What adverbs come before…
… similar but not … different?
… different but not … similar?
… similar or … different?
• Cambridge Corpus of Academic English (CCAE); about 400 million words of published academic written text (& about 1 million words of speech)
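The three questions above can be checked mechanically against any plain-text corpus. The sketch below is an illustration under stated assumptions: `words_before` is a hypothetical helper, the sample text is invented, and the code counts whichever word immediately precedes the target without confirming that it is an adverb (a fuller analysis would probably rely on part-of-speech tagging).

```python
import re
from collections import Counter

def words_before(text, target):
    """Count the word that immediately precedes `target` (case-insensitive)."""
    pattern = re.compile(r"\b([a-z]+)\s+" + re.escape(target) + r"\b",
                         re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

# Invented sample text standing in for a corpus
sample = ("The results were broadly similar. "
          "The two groups were markedly different. "
          "The aims were very similar, but the methods were very different. "
          "Outcomes were broadly similar.")

before_similar = words_before(sample, "similar")
before_different = words_before(sample, "different")

# Set arithmetic answers the three questions directly
only_similar = set(before_similar) - set(before_different)
only_different = set(before_different) - set(before_similar)
both = set(before_similar) & set(before_different)

print("similar only:", sorted(only_similar))
print("different only:", sorted(only_different))
print("both:", sorted(both))
```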
• it is [adjective] to-infinitive: 48,170
• it is [adjective] that: 24,115
it is [adjective] to-infinitive
crucial difficult helpful important necessary possible safe straightforward
> 4000 times: possible 7,784; important 5,019; difficult 4,345; necessary 4,103
< 500 times: straightforward 481; crucial 282; helpful 255; safe 194
it is [adjective] that
clear interesting likely notable possible significant surprising true
> 1000 times: clear 5,284; possible 4,116; likely 2,561; true 1,170
< 300 times: significant 257; surprising 251; interesting 235; notable 206
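Counts for the two frames, ‘it is [adjective] to-infinitive’ and ‘it is [adjective] that’, can be approximated with a regular expression. This is a rough sketch rather than the query behind the figures above: `count_frames` and the sample text are invented, and the pattern accepts any single word in the adjective slot, so it would also catch non-adjectives such as ‘it is going to’. It does not check that ‘to’ is followed by an infinitive either.

```python
import re
from collections import Counter

# One word between 'it is' and 'to'/'that'; the word is NOT verified
# to be an adjective, so results need manual checking.
PATTERN = re.compile(r"\bit is (\w+) (to|that)\b", re.IGNORECASE)

def count_frames(text):
    """Return per-frame counts of the word filling the adjective slot."""
    counts = {"to": Counter(), "that": Counter()}
    for adjective, link in PATTERN.findall(text):
        counts[link.lower()][adjective.lower()] += 1
    return counts

# Invented sample text standing in for a corpus
sample = ("It is possible to measure this. It is clear that costs rose. "
          "It is possible that demand fell. It is important to note the limits. "
          "It is possible to repeat the test.")

result = count_frames(sample)
print(result["to"].most_common())
print(result["that"].most_common())
```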
it is true that
• It is true that having a theoretical foundation for what one is doing in the classroom is important, but it is at least equally important to transform that knowledge into activities that are simple, appealing to the students, and successful.
• While it is true that national expenditure estimates are often larger than those of national income, this is not always the case.
Teachers learning from corpora: discovering new information
Some nouns have a related adjective ending:
• -ic: base – basic (not basical)
• -ical: astrology – astrological (not astrologic)
• -ic or -ical: analysis – analytic/analytical
analytic 9,721; analytical 12,107
problematic 11,042; problematical 551
geographic 4,403; geographical 9,322
technologic 47; technological 8,750
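One way to read these paired counts is as the share of each pair taken by the -ical form. The short calculation below uses only the figures quoted above; the names `pairs` and `ical_share` are invented for this sketch.

```python
# (count of -ic form, count of -ical form), copied from the figures above
pairs = {
    "analytic/analytical": (9721, 12107),
    "problematic/problematical": (11042, 551),
    "geographic/geographical": (4403, 9322),
    "technologic/technological": (47, 8750),
}

def ical_share(ic_count, ical_count):
    """Proportion of the pair's occurrences that use the -ical form."""
    return ical_count / (ic_count + ical_count)

shares = {name: ical_share(*counts) for name, counts in pairs.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} -ical")
```

On these figures, ‘technological’ and ‘problematic’ are near-categorical choices, while the analytic/analytical and geographic/geographical pairs are much more evenly split.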
ecological and geographical
(rather than ecological and geographic)
taxonomic and geographic
(rather than taxonomic and geographical)
-ic or -ical?
technologic or technological?
When intuition and corpus evidence clash
Student writing corpus: problems (?)
…the number of full-time and part-time jobs was almost similar.
Their aims were also highly different.
• …the number of full-time and part-time jobs was almost similar.
– Cambridge Academic English Corpus (400+ million words of writing): 29 examples
– e.g. Comparing the charts of Figures 12 and 13 with those of Figures 7 and 6, respectively, we conclude that they are almost similar.
• Their aims were also highly different.
– Cambridge Academic English Corpus (400+ million words of writing): 17 examples
– e.g. The advocates of Semantic Syntax and of Principles and Parameters emphasize that their conceptualizations of grammatical theory are highly different.
From corpus research to teaching materials
Cambridge Academic English
From corpus research to teaching materials: ‘on the surface’
From corpus research to teaching materials: ‘below the surface’