Daniel Nkemleke, Humboldt Kolleg Kamerun, 30/07/2008
Corpus Linguistics and Language Education: Development and Utility of the Corpus of Cameroon English
Daniel A. Nkemleke
Department of English, Ecole Normale Supérieure
University of Yaoundé I
Outline
Introduction: Corpus Linguistics, history
Some (main) existing corpora
Development of the Corpus of Cameroon English (CCE)
Corpus utility with reference to the CCE
Prospect
Introduction: what is Corpus Linguistics?
The study of language based on examples of “real-life” language use, collected, stored and processed by computer
Facilitated by the advent of computer technology (1960s)
Latin corpus (“body”): a body of text, i.e. any collection of more than one text, written or spoken
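As a minimal illustration of collecting and processing text by computer, the Python sketch below builds a word-frequency list from a tiny corpus. It assumes a deliberately naive tokenizer (lower-cased runs of letters and apostrophes); the helper names `tokenize` and `frequency_list` and the two sentences are invented for illustration and are not taken from the CCE.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and return its word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(texts: list[str]) -> Counter:
    """Count how often each word token occurs across all texts in the corpus."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


# Two invented sentences standing in for "more than one text" (hypothetical data).
corpus = [
    "The letter arrived on Monday.",
    "The students wrote the essay on Monday.",
]
freq = frequency_list(corpus)
print(sum(freq.values()))    # 12 tokens in total
print(freq.most_common(3))   # [('the', 3), ('on', 2), ('monday', 2)]
```

Real corpus tools use far more careful tokenization (hyphens, numbers, abbreviations), but the principle of counting tokens across many texts is the same.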
Introduction (cont’d): brief history
Before the 1940s/1950s: “early corpus linguistics”, a corpus-based methodology (“primitive corpora”?)
Between the 1960s and 1980s: a minority of linguists continued corpus-based work (Quirk: Survey of English Usage (SEU); Francis & Kučera: Brown Corpus; Svartvik: London-Lund Corpus)
Computer technology: major support for corpus linguistics (CL)
First African corpus: ICE-East Africa, 1989 (Schmied 1989)
Second African corpus: CCE, 1992 (Tiamajou 1993); Nigeria?
Introduction (cont’d): brief history
“Thirty years ago when this research started it was considered impossible to process texts of several million words in length.
Twenty years ago it was considered marginally possible but lunatic.
Ten years ago it was considered quite possible but still lunatic. Today it is very popular.”
(Thomas/Short 1996: 4)
Some (main) existing corpora
L1 corpora
• Brown Corpus of American English
• Lancaster-Oslo/Bergen Corpus (LOB)
• London-Lund Corpus
• British National Corpus (BNC)
• Birmingham Corpus of British English
L2 corpora
• ICE-East Africa (Kenya & Tanzania)
• Corpus of Cameroon English
• Corpus of Nigerian English (?)
• Kolhapur Corpus of Indian English
Multinational corpus project
• International Corpus of English (ICE)
Four main characteristics of a corpus
1. Sampling & representativeness
Interest in the variety of English as a whole
Attempt to construct a “representative” sample corpus, one that represents the variety as fully as possible
Aim: as accurate and reasonable a picture as possible of a language population
Four main characteristics of a corpus (cont’d)
2. Finite size
A body of a finite number of words, e.g. 1,000,000
Figure fixed at the beginning of the project
By contrast, a monitor corpus: texts are constantly added
Four main characteristics of a corpus (cont’d)
3. Machine-readable form
Past: the term could refer to printed text
Nowadays: the term implies machine-readable form
A few corpora exist in book form (e.g. the original London-Lund Corpus)
Occasionally in other media (microfiche, recordings)
Four main characteristics of a corpus (cont’d)
4. Standard reference
A corpus tacitly constitutes a standard reference for the variety it represents
Presupposes wide availability to other researchers
Allows direct comparison of results across varieties
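Direct comparison across varieties only works if frequencies from corpora of different sizes are put on a common scale. A widespread convention, sketched below in Python, is to normalise raw counts to occurrences per million words. The function name `per_million` and the corpus sizes and counts in the example are hypothetical, chosen only to show the effect.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words of running text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    # Multiply before dividing so that whole-number results stay exact.
    return raw_count * 1_000_000 / corpus_size


# Hypothetical counts for the same word in two corpora of different sizes.
small = per_million(40, 500_000)     # 80.0 per million words
large = per_million(60, 1_000_000)   # 60.0 per million words
print(small > large)                 # True: rarer in raw terms, commoner once normalised
```

Normalisation only makes the figures comparable; whether a difference is meaningful still calls for a significance test or an effect-size measure.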
Development of the Corpus of Cameroon English (CCE)
Began in 1992 in collaboration with two British universities (Birmingham and Liverpool)
With the assistance of the British Council in Yaoundé
Target of a million words reached in 1994
Data used for classroom activities and research since then
2005: the project received a grant from the Alexander von Humboldt Foundation (AvH)
→ Goal: further development (tagging) of the database (TU Chemnitz)
Objectives
Provide authentic data for describing the main features and problems of the variety of English written in Cameroon
Provide a source of authentic material for English language teaching and learning in Cameroon
Serve as a database for comparative studies of Cameroon English (CamE) and other varieties of English
Text categories: written component
Text category               No. of texts   No. of words
A: Official Press                    257        126,539
B: Private Press                      42         49,098
C: Novels & Short Stories             21         77,096
D: Religion                           19         96,380
E: Tourism                             5         26,881
F: Official letters                   77         12,285
G: Private letters                   250         79,386
H: Students’ Essays                   83        137,399
I: Government Memos                   16         71,368
J: Advertisement                      10          4,875
K: Miscellaneous                      22        139,247
TOTAL                                802        820,554
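The figures in the written-component table can be checked and turned into proportions with a few lines of Python. The sketch below, with hypothetical helper names `totals` and `word_shares`, confirms that the categories add up to 802 texts and 820,554 words and reports each category’s share of the written component, which makes the uneven weighting across categories (e.g. advertisements against students’ essays) easy to see.

```python
# Written component of the CCE as listed in the table: category -> (texts, words).
WRITTEN = {
    "A: Official Press": (257, 126_539),
    "B: Private Press": (42, 49_098),
    "C: Novels & Short Stories": (21, 77_096),
    "D: Religion": (19, 96_380),
    "E: Tourism": (5, 26_881),
    "F: Official letters": (77, 12_285),
    "G: Private letters": (250, 79_386),
    "H: Students' Essays": (83, 137_399),
    "I: Government Memos": (16, 71_368),
    "J: Advertisement": (10, 4_875),
    "K: Miscellaneous": (22, 139_247),
}


def totals(table: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (total texts, total words) summed over all categories."""
    return (sum(t for t, _ in table.values()), sum(w for _, w in table.values()))


def word_shares(table: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return each category's percentage of the total word count."""
    _, n_words = totals(table)
    return {cat: 100 * words / n_words for cat, (_, words) in table.items()}


print(totals(WRITTEN))  # (802, 820554), matching the TOTAL row
for cat, pct in sorted(word_shares(WRITTEN).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat:<28}{pct:5.1f}%")
```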
Text categories: spoken component
Dialogues
1. Conversations
2. Phone calls
3. Broadcast discussions
4. Classroom lessons
5. Interviews
6. Parliamentary debates
7. Legal cross-examination
8. Business transactions
Monologues
1. Commentaries
2. Demonstrations
3. Legal presentations
4. Broadcast news
5. Broadcast talks
6. Non-broadcast talks
Corpus utility with reference to the CCE
Thirteen possible ways in which a corpus may be useful:
1. Corpora as a source of empirical data
2. Corpora in language teaching and learning
3. Corpora in lexical studies
4. Corpora in grammar studies
5. Corpora in speech research
6. Corpora and semantic studies
7. Corpora in pragmatic and discourse studies
8. Corpora in sociolinguistic studies
9. Corpora and stylistic studies
10. Corpora in historical linguistics
11. Corpora in dialectology and variational studies
12. Corpora in psycholinguistics
13. Corpora in cultural studies
1. Corpora as a source of empirical data
Linguists can make more objective statements about language use in the variety and compare it with other varieties
• Nkemleke/Mbangwana (2001)
• Nkemleke (2003)
• Nkemleke (2004a, 2004b)
• Nkemleke (2005)
• Nkemleke (2006)
• Nkemleke (2007a, 2007b)
• Nkemleke (forthcoming: 2008a, 2008b, 2008c)
• Schmied/Nkemleke (forthcoming: 2008a, 2008b)
• A number of postgraduate projects (ENS/Faculty)
2. Corpora in language teaching/learning
CCE data have been used for classroom activities over the years
Concordances: arrive _ NP (simplification)
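Concordance lines like these can be produced with a short keyword-in-context (KWIC) routine. The Python sketch below prints KWIC lines for forms of “arrive” and marks with * those where the next word is not a preposition, one rough way of surfacing the arrive _ NP pattern (e.g. “arrived Douala” for “arrived in Douala”). The example sentences, the preposition set and the helper names `kwic` and `lacks_preposition` are hypothetical; the filter also catches standard uses such as “arrived yesterday” or “arrived home”, so marked lines still need manual checking.

```python
import re

# Forms of ARRIVE; matching is case-insensitive.
ARRIVE = re.compile(r"\barriv(?:e|es|ed|ing)\b", re.IGNORECASE)
# Hypothetical set of prepositions that typically introduce the destination after "arrive".
PREPOSITIONS = {"at", "in", "on", "from"}


def kwic(texts: list[str], pattern: re.Pattern, width: int = 30):
    """Yield (left context, keyword, right context) for every match of pattern."""
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            yield left, m.group(0), right


def lacks_preposition(right: str) -> bool:
    """Rough arrive _ NP filter: True if the next word is not in PREPOSITIONS."""
    words = re.findall(r"[A-Za-z']+", right)
    return bool(words) and words[0].lower() not in PREPOSITIONS


# Invented example sentences, not CCE data.
texts = [
    "We arrived Douala late in the evening.",
    "The delegation arrived at the airport on time.",
    "When the letter arrives Bamenda, please call me.",
]
flagged = []
for left, key, right in kwic(texts, ARRIVE):
    mark = "*" if lacks_preposition(right) else " "
    print(f"{mark} {left:>30} [{key}] {right}")
    if mark == "*":
        flagged.append(key)
```

Aligning the keyword in a central column, as the f-string does, is what lets learners scan the right-hand context for recurring patterns.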
Value of concordances
Support teachers’ classroom explanations
Learners as researchers
Data-driven learning
A critical look at existing language teaching materials
Natural data for textbooks
CCE data have been used in studies of aspects of Cameroon English usage, e.g. Hans-Georg Wolf used data from the corpus in his book English in Cameroon, published in 2001 by Mouton de Gruyter (Berlin/New York).
3. Corpora in lexical studies
Keep track of new words and changing meanings
Retrieve word combinations and co-occurring words
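Calling up co-occurring words amounts to counting collocates within a fixed window around a node word. The Python sketch below assumes a window of two tokens on each side and a naive tokenizer; the function name `collocates` and the example sentences are hypothetical. Raw counts like these are dominated by function words such as “the”, so collocation studies normally rank candidates with an association measure (for example mutual information, t-score or log-likelihood) rather than by raw frequency.

```python
import re
from collections import Counter


def collocates(texts: list[str], node: str, window: int = 2) -> Counter:
    """Count the words occurring within `window` tokens either side of each occurrence of `node`."""
    node = node.lower()
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            # Every token in the window except the node occurrence itself.
            counts.update(tokens[j] for j in range(start, end) if j != i)
    return counts


# Invented example sentences, not CCE data.
texts = [
    "He booked an appointment with the doctor.",
    "She has an appointment at the ministry.",
]
print(collocates(texts, "appointment").most_common(2))  # [('an', 2), ('the', 2)]
```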
Prospect
ICE-Cameroon is ongoing
Future possibility of more specialized corpora, e.g. of academic texts or fiction
END
Thank You!