Clavier09
Josef Schmied
English Language & Linguistics
Chemnitz University of Technology
http://www.tu-chemnitz.de/phil/english/Schmied
Using corpora as an innovative tool to compare varieties of English around the world:
the International Corpus of English
ICE story
Motivation and structure
1. The (hi)story of ICE
   beginnings and concepts (ICE0)
   ICE problems
2. The present-day status of ICE
   resources
   work in progress
3. Case studies: modalities
4. The future of ICE (ICE2)
   changing communication patterns 1989–2009
   new practical opportunities
   new theoretical challenges
Motivation ICE History ICE status Case studies ICE2future Conclusion
1. The (hi)story of ICE
a personal account
1.1 Beginnings and concepts
May 1988 ICAME Birmingham:
Schmied: “Compiling a Corpus of East African English”
discussion on Brown/LOB categories?
more sociolinguistic variables (gender, status, age, 1st language, etc.)
Oct. 1988 proposal:
Greenbaum, Sidney. “A proposal for an international computerised corpus of English”. World Englishes 7, 315
1.1 ICE0 concepts
Greenbaum (1988: 31):
(1) to sample standard varieties from other countries where English is the first language, for example Canada and Australia
(2) to sample national varieties from countries where English is an official additional language, for instance India and Nigeria; and
(3) to include spoken and manuscript English as well as printed English.
1.1 Discussing the corpus design
Schmied (1990): “Corpus-linguistics and the nativization of English”. World Englishes 9, 255-268
“corpus-compilation paradox”:
A “national” corpus should contain culture-specific text(type)s, but we can only identify them through corpus analysis
1.2 ICE problems
corpus-compilation:
1) funding (e.g. ICE-US, ICE-Nigeria)
2) adaptations in corpus compilation: technology and culture
3) copyright for distribution
4) corpus processing: annotation and parsing
corpus-application:
1) manuals for restriction (interpretation)
2) query (WordSmith or AntConc) – statistics
individual solutions
1.2.2 Adaptation
representativeness vs. comparability
McEnery/Wilson 1996 Edinburgh U.P.
http://www.lancs.ac.uk/fss/courses/ling/corpus/Corpus2/2FRA1.HTM
4 characteristics of the modern corpus:
Sampling and representativeness
Finite size
Machine-readable form
A standard reference
Appendix 6: List of written texts from Tanzania (word count)

PRINTED
Informational: Learned
  Humanities                                  W2A001T – W2A010T     20,172
  Social Sciences                             W2A011T – W2A020T     20,151
  Natural Sciences                            W2A021T – W2A027T     20,114
  Technology/Agriculture/Environmental dev.   W2A031T – W2A040T     20,148
  total                                                             80,585
Informational: Popular
  Humanities                                  W2B001T – W2B010T     20,133
  Social Sciences                             W2B011T – W2B020T     20,223
  Natural Sciences                            W2B021T – W2B24T       6,542
  Technology/Agriculture/Small Industry       W2B031T – W2B040T     20,065
  General                                     W2BGEN1T – W2BGEN8T   13,789
  total                                                             80,752
Informational: Reportage
  Splash                                      W2C001T – W2C0010T    20,018
  Reportage/Features                          W2C011T – W2C020T     20,139
  total                                                             40,157
Instructional
  Administrative/Regulatory                   W2D001T – W2D010T     20,120
Persuasive
  Institutional                               W2E001T – W2E010T     20,078
  Personal Column                             W2E011T – W2E020T     20,125
  total                                                             40,203

from ICE-East Africa handbook
1.2.2 Spoken text categories in ICE corpora
SPOKEN (300)
  Dialogues (180)
    Private (100)
      Face-to-face conversations (90)
      Phonecalls (10)
    Public (80)
      Classroom Lessons (20)
      Broadcast Discussions (20)
      Broadcast Interviews (10)
      Parliamentary Debates (10)
      Legal cross-examinations (10)
      Business Transactions (10)
  Monologues (120)
    Unscripted (70)
      Spontaneous commentaries (20)
      Unscripted Speeches (30)
      Demonstrations (10)
      Legal Presentations (10)
    Scripted (50)
      Broadcast News (20)
      Broadcast Talks (20)
      Non-broadcast Talks (10)
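The category counts above can be checked mechanically. What follows is only a minimal sketch: it assumes the design is transcribed by hand into a nested Python dict, and the names `SPOKEN_DESIGN` and `total_texts` are illustrative, not part of any ICE tool.

```python
# Hand transcription of the spoken design listed above (an assumption of this
# sketch). Leaves are numbers of texts; the names are illustrative only.
SPOKEN_DESIGN = {
    "Dialogues": {
        "Private": {
            "Face-to-face conversations": 90,
            "Phonecalls": 10,
        },
        "Public": {
            "Classroom Lessons": 20,
            "Broadcast Discussions": 20,
            "Broadcast Interviews": 10,
            "Parliamentary Debates": 10,
            "Legal cross-examinations": 10,
            "Business Transactions": 10,
        },
    },
    "Monologues": {
        "Unscripted": {
            "Spontaneous commentaries": 20,
            "Unscripted Speeches": 30,
            "Demonstrations": 10,
            "Legal Presentations": 10,
        },
        "Scripted": {
            "Broadcast News": 20,
            "Broadcast Talks": 20,
            "Non-broadcast Talks": 10,
        },
    },
}


def total_texts(node):
    """Recursively sum the text counts below a category node."""
    if isinstance(node, int):
        return node
    return sum(total_texts(child) for child in node.values())


if __name__ == "__main__":
    for name, subtree in SPOKEN_DESIGN.items():
        print(name, total_texts(subtree))
    print("SPOKEN", total_texts(SPOKEN_DESIGN))
```

The same structure could hold the counts of an adapted corpus, so that deviations from the common design become visible category by category.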
1.2.2 ICE categories and ICE-EA/Ke/Tz

                    ICE    Ke + Tz
SPOKEN              300    250
DIALOGUE            180    130    (written as spoken 50)
  private           100     30
  direct conv.       90     30
  distanced conv.    10     --
  public             80    100
WRITTEN
  press editorials   10     --     --
  institutional      --     10     10
  personal columns   --     10     10
1.2.4 Textual Markup
In written texts, features of the original layout are marked, including sentence and paragraph boundaries, headings, deletions, and typographic features.
Spoken texts are transcribed orthographically, and are marked for pauses, overlapping strings, discourse phenomena such as false starts and hesitations, and speaker turns.
The markup manual is available here.
1.2.4 Wordclass Tagging
ICE texts are automatically tagged for wordclass by the ICE Tagger, developed by Sean Wallis at the Survey of English Usage, University College London. This assigns wordclass tags to each lexical item in the corpus. The tagset has been developed especially for ICE, and is largely based on Quirk et al. (1985) A Comprehensive Grammar of the English Language. e.g.
Each PRON(univ,sing)
of PREP(ge)
these PRON(dem,plu)
is V(cop,pres)
the ART(def)
responsibility N(com,sing)
of PREP(ge)
one NUM(card,sing)
person N(com,sing)
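Tagged output of this kind is easy to read into a program. The sketch below is not the ICE Tagger's own format specification: it assumes one token per line, a word followed by whitespace and a tag of the form `CLASS(feature,feature)`, exactly as in the example above. The names `TaggedToken` and `parse_tagged_line` are hypothetical.

```python
import re
from typing import NamedTuple


class TaggedToken(NamedTuple):
    word: str
    wordclass: str
    features: tuple


# Assumed line shape: word, whitespace, upper-case class, features in brackets.
TOKEN_RE = re.compile(r"^(\S+)\s+([A-Z]+)\(([^)]*)\)$")


def parse_tagged_line(line):
    """Split one 'word CLASS(feat,feat)' line into its three parts."""
    match = TOKEN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a 'word CLASS(features)' line: {line!r}")
    word, wordclass, feats = match.groups()
    features = tuple(f.strip() for f in feats.split(",") if f.strip())
    return TaggedToken(word, wordclass, features)


# The example sentence from the slide, one token per line.
SAMPLE = """\
Each PRON(univ,sing)
of PREP(ge)
these PRON(dem,plu)
is V(cop,pres)
the ART(def)
responsibility N(com,sing)
of PREP(ge)
one NUM(card,sing)
person N(com,sing)
"""

tokens = [parse_tagged_line(line) for line in SAMPLE.splitlines()]

if __name__ == "__main__":
    for token in tokens:
        print(token.word, token.wordclass, ",".join(token.features))
```

Once tokens are split this way, counting a wordclass or a feature (for instance all singular common nouns) is a one-line filter over `tokens`.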
1.2.4 Syntactic Parsing
Every sentence in the corpus is analysed at phrase, clause, and sentence level, and the analysis is shown in the form of a parsetree:
2. The present-day status of ICE
resources:
1) WWW for ICE and ICE-corpora
2) corpora available
3) publications
2.1 ICE webpage: Corpus design
500 files of 2,000-word texts in specific categories
The texts in the corpus date from 1990 or later. The authors and speakers of the texts are aged 18 or above, were educated through the medium of English, and were either born in the country in whose corpus they are included, or moved there at an early age and received their education through the medium of English in the country concerned.
The corpus contains samples of speech and writing by both males and females, and it includes a wide range of age groups. The proportions, however, are not representative of the proportions in the population as a whole: women are not equally represented in professions such as politics and law, and so do not produce equal amounts of discourse in these fields. Similarly, various age groups are not equally represented among students or academic authors.
2.2 Currently available ICE corpora
1st Language (ENL)
  Great Britain
  Ireland*
  Jamaica*
  New Zealand
2nd Language (ESL)
  East Africa* (Ke/Tz)
  Hong Kong*
  India*
  The Philippines*
  Singapore*
* free as download
2.3 Publications
http://www.tu-chemnitz.de/phil/english/Schmied
3. ICE case studies: modalities
hypotheses for comparing ICE corpora:
English auxiliaries are very unevenly distributed in actual language usage
epistemic use is more frequent than deontic
ENL varieties: “American innovativeness, British conservativism and Australian independence from both” (Collins 2009)
ESL varieties have a smaller number of modal auxiliaries than ENL varieties:
ICE-T has lower frequencies than ICE-K generally and for epistemic usage specifically
Modal auxiliaries in Tanzania and Kenya
Case study: Modal auxiliaries in ICE-GB, ICE-Phil and ICE-EA (-K/-T)

Modal       Total per corpus                        Per million words
auxiliary   ICE-GB  ICE-Phil/10  ICE-K   ICE-T      GB      ICE-Phil   ICE-K   ICE-T
can           3574      425       2212    1681      3574     3601.69    1580    1201
could         1635      130       1485    1150      1635     1101.69    1061     821
may           1219      120       1143     782      1219     1016.95     816     559
might          693       45        249     208       693      381.36     178     149
must           687       55        652     468       687      466.10     466     334
shall          222       30        159     287       222      254.24     114     205
should        1117      100       1155    1075      1117      847.46     825     768
will          2841      505       2011    1628      2841     4279.66    1436    1163
would         3037      270       1496    1176      3037     2288.14    1069     840
Total        15025     1680      10562    8455     15025    14237.29    7545    6040
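The per-million columns are raw counts normalised by corpus size. The slide does not state the corpus sizes, so the sketch below uses sizes back-calculated from the table's own figures; these are assumptions, not documented values: 1,000,000 words for ICE-GB, 118,000 for the ICE-Phil sample, and 1,400,000 each for ICE-K and ICE-T. All names in the code are illustrative.

```python
# Assumed corpus sizes in words, inferred from the table (not stated on the slide).
ASSUMED_SIZES = {
    "ICE-GB": 1_000_000,
    "ICE-Phil/10": 118_000,
    "ICE-K": 1_400_000,
    "ICE-T": 1_400_000,
}

CORPORA = ("ICE-GB", "ICE-Phil/10", "ICE-K", "ICE-T")

# Raw counts per corpus, copied from the "Total per corpus" columns.
RAW_COUNTS = {
    "can": (3574, 425, 2212, 1681),
    "could": (1635, 130, 1485, 1150),
    "may": (1219, 120, 1143, 782),
    "might": (693, 45, 249, 208),
    "must": (687, 55, 652, 468),
    "shall": (222, 30, 159, 287),
    "should": (1117, 100, 1155, 1075),
    "will": (2841, 505, 2011, 1628),
    "would": (3037, 270, 1496, 1176),
}


def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return count / corpus_size * 1_000_000


def normalise(raw=RAW_COUNTS, sizes=ASSUMED_SIZES):
    """Return {modal: {corpus: per-million frequency}} for every modal."""
    return {
        modal: {c: per_million(n, sizes[c]) for c, n in zip(CORPORA, counts)}
        for modal, counts in raw.items()
    }


if __name__ == "__main__":
    for modal, row in normalise().items():
        print(f"{modal:<7}" + "".join(f"{row[c]:>10.2f}" for c in CORPORA))
```

With these assumed sizes the per-million values of the nine individual modals agree with the table after rounding, which is why the sizes were chosen; a different documented word count would shift every value in that column proportionally.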
Core modal aux per million words in ICE-GB, -Phil, -K, -T
[Bar chart: frequency per million words (axis 0 to 4500) for can, could, may, might, must, shall, should, will, would; series: GB, ICE-Phil, ICE-K, ICE-T]
Results: Deontic and epistemic modals compared
Summary modal auxiliaries
hypotheses confirmed:
English in Kenya has developed further towards a New English variety than English in Tanzania
exceptions (like shall) can be explained:
English in Tanzania is the more formal variety (informal texts are expressed more often in Kiswahili)
further analyses:
simple distinction epistemic vs. deontic is not always detailed enough: e.g. lexeme-specific cases like habitual or historic would
n-grams / collocations, negation, etc.
4. The future of ICE (ICE2)
1) a broader corpus basis:
   diachronic corpus: cf. Brown family
   larger monitor-corpus: web-based
2) more comparative studies:
   applied issues:
     acknowledgment of new norms
     replacing the native speaker
   theoretical issues:
     dynamic model: reanalysis
     New English sub-categories: deleters vs. preservers
4.0 Changes in international communication since 1989
“global” communication
through internet, esp. WWW, chats, blogs
replaces snail-mail / letters, ??
or additional categories?
English as a “global” language:
EIL = English as an International Language
esp. European Union, China
but “(secondary) education through the medium of English”?
4.1 Diachronic changes to ICE categories
ICE-EA 1990 – 2010:
replacing categories: email
full 1 million words each
integrated in a larger monitor corpus
4.1.2 Web monitor corpus: keywords approach
Business letters: Dear, Yours sincerely/faithfully/truly/etc, invoice, memo, fax, bank, account, financial, enquiries, thank you for, manager, secretary, order, PO Box, date, I enclose, enclosed, I look forward to, c.c.
Popular natural science: popular, everyday, environment, diet, disease, plants, animals, reptiles, medicine, health, birds, fish, whales, conservation, zoos, natural history, green issues, rainforest, everything you need to know about...., Guide to..., made easy, global warming, wildlife, botanical, ozone layer.
Administrative writing: policy, regulations, procedures, guide, benefits, grants, entitlements, Guide to..., University calendar, safety, register/registration, code of conduct, license/licensing, health regulations, FAQ
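One way to apply such keyword lists is to score each downloaded page against every category and keep the best match. The sketch below is only an illustration of that idea, not the procedure actually used for the monitor corpus: the keyword lists are shortened from the ones above, matching is plain case-insensitive phrase counting, and the names `KEYWORDS`, `score_categories` and `best_category` are hypothetical.

```python
import re

# Shortened keyword lists taken from the slide; the selection is illustrative.
KEYWORDS = {
    "business letters": [
        "dear", "yours sincerely", "invoice", "memo", "fax",
        "i enclose", "i look forward to",
    ],
    "popular natural science": [
        "environment", "wildlife", "global warming", "conservation",
        "ozone layer", "natural history",
    ],
    "administrative writing": [
        "policy", "regulations", "procedures", "code of conduct",
        "registration", "faq",
    ],
}


def keyword_hits(text, keywords):
    """Count whole-word or whole-phrase matches of the keywords, ignoring case."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(kw) + r"\b", lowered))
        for kw in keywords
    )


def score_categories(text):
    """Return the number of keyword hits per text category."""
    return {cat: keyword_hits(text, kws) for cat, kws in KEYWORDS.items()}


def best_category(text):
    """Return the highest-scoring category, or None if nothing matches."""
    scores = score_categories(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    letter = ("Dear Sir, I enclose the invoice. "
              "I look forward to your reply. Yours sincerely")
    print(score_categories(letter))
    print(best_category(letter))
```

Raw hit counts favour long texts, so a real selection step would probably normalise by text length and still require a manual check of each candidate.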
4.1.2 Webcrawler
Customisable, to exclude unwanted files, e.g. images, sounds, movies, .exe. Customised settings can be saved in an “options” file, [icelite.opt]
Fast: can download entire websites in a relatively short time (depending on the size of the site)
Stable: it never crashed, even when the download was aborted.
Can be run ‘in the background’, and won’t interfere with other processes.
Can be run overnight, and will safely switch off your PC.
Inserts time & date accessed in each downloaded file.
courtesy Nelson, Gerry 2009 ICE-light
4.1.2 Workflow for Monitor Corpus
Use Google Advanced Search to identify major English-language sites in each domain
Use HTTrack to download sites
Select texts, and record details in a spreadsheet
Targeted search to fill gaps, using Keywords
courtesy Nelson, Gerry 2009 ICE-light
4.2.1 Applied issues
Comparative studies and a comparative database help decide questions of usage and norm
Quantitative comparisons allow more gradient usage decisions than native speakers
4.2.2 Theoretical issues: dynamic model - new categorisation?
evolutionary stages:
• foundation
• exonormative stabilisation
• nativisation
• endonormative stabilisation
• differentiation
ICE-K > ICE-T; ICE-Sgp > ICE-My
4.2.2 Theoretical issues: deleters vs. preservers?
Mesthrie/Bhatt 2008:90
“One such broad dichotomy involves varieties that favour deletion of elements and those that disfavour it. In this regard the differences between Sgp Eng (especially amongst those with Chinese substrates) and African varieties are striking.”
(91):
Come what may (come).
He made me (to) do it.
As you know (that) I am from the Ciskei.
EAsia vs. Africa?
ICE-Phil/-Sgp vs. ICE-EA/-ZA/-Nig
5. Conclusion
ICE-corpora are a good basis for empirical “national” and comparative corpus work
“East-Africanism” (marked <ea>):
in grammar (modality),
lexicon (matatu)
morphology/idiomaticity (grass roots)
etc.
ICE-EA/ESL idioms are less fixed/more flexible

              Kenya                      Tanzania                   Total
              total  written  spoken     total  written  spoken
grassroots      4       3       1         16      10       6         20
grass roots    12      11       1          9       3       6         21
grass root     1  1  2
Total          16                         25                         41
Current issues
Can results of corpus-linguistic analyses of “real usage” help to decide choices of norm and standards on a national and international basis? YES
Can “objective” corpus-linguistic resources replace “subjective” native speaker intuition as a neutral international standard? YES
Can corpus analyses add the cognitive dimension to variety formation? YES
References
Greenbaum, S. (1988). “A proposal for an international computerised corpus of English”. World Englishes 7, 315
Mesthrie, R., R.M. Bhatt (2008). World Englishes: The study of new language varieties. Cambridge: CUP
Schmied, J. (1990). “Corpus-linguistics and the nativization of English”. World Englishes 9, 255-268
Schneider, E. (2006). Postcolonial English. Varieties around the world. New York: Cambridge Press.
ICES - International Corpus of English Studies Vol 1, No 1 (2009). Universitätsverlag der Technischen Universität Chemnitz, https://www.bibliothek.tu-chemnitz.de/ojs/
Using corpora as an innovative tool to compare varieties of English around the world: the ICE story
This contribution presents the story of the International Corpus of English (ICE) from the earliest discussions in 1989 to the current project applications and the big issues of the future.
The ICE teams in the past have always worked independently, so that some corpora were finished early, others were processed with the help of taggers and parsers, others were given up and restarted several times. Although modern computing facilities and processing tools (like AntConc) have made simple data analyses relatively easy for everyone, the most challenging innovations are possible in corpus collection. Nowadays, the WWW can facilitate some data collection and offers new options for multimedia corpora. This will make future corpus compilation and exploitation a real challenge.
The big issues in applying ICE corpora in theory and practice are, for instance:
Can results of corpus-linguistic analyses of “real usage” help to decide choices of norm and standards on a national and international basis?
Can corpora justify national varieties of English all around the world?
Can corpus-linguistic resources replace the native speaker as a neutral international standard?
Can corpus analyses add the cognitive dimension to variety formation?
This contribution will illustrate that the International Corpus of English can, despite some shortcomings, be used as a very innovative tool in exploring geographic varieties for researchers of every level.