
Paul Baker July 2011

Structure

1 Background to a corpus approach to (C)DA
2 A 9 stage model
3 An experiment in analyst consistency

Discourse

“a set of meanings, metaphors, representations, images, stories, statements and so on that in some way together produce a particular version of events… Surrounding any one object, event, person etc., there may be a variety of different discourses, each with a different story to tell about the world, a different way of representing it to the world.” (Burr 1995: 48)

Critical discourse analysis

Identifies discourses in texts
A politically driven form of analysis
Several levels of analysis, e.g.
  Text
  Production and reception
  Intertextuality
  Social context (society’s politics, history)

Usually qualitative and with small datasets

Criticisms of CDA

“Your analysis will be the record of whatever partial interpretation suits your own agenda” (Widdowson 1998: 148)

“what is distinctive about Critical Discourse Analysis is that it is resolutely uncritical of its own discursive practices” (Widdowson 1998: 151)

The benefits of using corpora

Interpretation “grounded in systematic language description”
Need to account for much larger amounts of text
Accurate and fast calculations
Corpus-driven techniques reduce researcher political and cognitive bias (primacy effect, clustering illusion).
Potential to find exceptional cases

Contributors to the approach

Michael Stubbs (1994, 1996)
Gerlinde Mautner (1995, 2004, 2007, 2009)
Carmen Caldas-Coulthard (1995)
Ramesh Krishnamurthy (1996)
Alan Partington (Corpus Assisted Discourse Studies) (2004, 2008, 2010)
Susan Hunston (2002, 2003)
Kieran O’Halloran and Caroline Coffin (2004)
Paul Baker (2005, 2006, 2008)

My own research

Gay Men (2004a,b, 2005)
Refugees and asylum seekers (2005, 2008a,b)
Fox-hunting (2006)
Islam and Muslims (2010)
Gender (2006, 2008, 2010)
Foreign doctors (2011)

Nine stage Corpus-assisted CDA (Baker et al 2008)

1 Context-based analysis of topic via history/politics/culture/etymology

Identify existing topoi/discourses/strategies via wider reading, reference to other CDA studies.

2 Establish research questions/corpus building procedures.

3 Corpus analysis of frequencies, clusters, keywords, dispersion etc – identify potential sites of interest in the corpus along with possible discourses, strategies, relate to those existing in the literature

4 Qualitative or CDA analysis of a smaller, representative set of data (e.g. concordances of certain lexical items or of a particular text or set of texts within the corpus) – identify discourses, strategies etc.

5 Formulation of new hypotheses or research questions

6 Further corpus analysis based on new hypotheses, identify further discourses, strategies etc.

7 Analysis of intertextuality or interdiscursivity based on findings from corpus analysis

8 New hypotheses

9 Further corpus analysis, identify additional discourses, strategies etc

Stage 1 – wider reading

Stage 2 – corpus building/research questions

Do representations of Muslims match with van Dijk’s or Karim’s categories?

What differences occur over time or between different types of newspapers?

143 million word corpus (200,037 articles)

Stage 3 – corpus-driven analysis

Keyword comparison of broadsheets and tabloids
Omar Bakri and Abu Hamza were strong tabloid keywords
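A keyword comparison of this kind rests on a keyness statistic. The slides do not say which one was used; the sketch below assumes log-likelihood, a common choice in corpus tools. The function name `log_likelihood` and all frequencies are illustrative and not taken from the study.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness of a word in corpus A versus corpus B.

    freq_a, freq_b: occurrences of the word in each corpus.
    size_a, size_b: total word counts of each corpus.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the word were spread evenly across both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Hypothetical figures: a name occurring 900 times in 60m tabloid words
# and 300 times in 80m broadsheet words.
score = log_likelihood(900, 60_000_000, 300, 80_000_000)
print(round(score, 1))
```

A higher score means the word's frequency differs more sharply between the two sub-corpora; a word with the same relative frequency in both scores zero.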


Stage 4 – collocation/concordance analysis
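As a rough illustration of what a concordance view provides, the sketch below prints keyword-in-context lines for a search term. It uses simple whitespace tokenisation and does not reproduce the behaviour of any particular corpus tool; the function name `kwic`, the window size and the sample sentence are all assumptions made for the example.

```python
def kwic(text, term, window=3):
    """Return keyword-in-context lines: `window` tokens either side of each hit."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if token.strip('.,;:!?"\'').lower() == term.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines

# Invented sample text, not drawn from the corpus.
sample = ("The doctor said the report was wrong. "
          "A second doctor declined to comment on the report.")
for line in kwic(sample, "doctor"):
    print(line)
```

Reading the aligned lines side by side is what lets an analyst spot recurring descriptions around a search term.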

Stage 4 – Concordance analysis

EVIL hook-handed Muslim cleric Abu Hamza is using a legal trick to delay getting the boot from Britain for THREE years and rake in thousands more in hand-outs.

The People, March 21st, 2004

RANTING Muslim cleric Omar Bakri Mohammed pulled off another handouts coup by claiming disability benefit to get a £28,000 car, complete with satellite navigation system. Yet he walked into the showroom with barely a limp.

The Sun, May 16th, 2005

Stage 5 – new hypotheses

Stories about ‘undeserving Muslims on benefits’ originate in the tabloids and influence the discourse of right-leaning broadsheets.

Stage 6 – corpus analysis

              98   99   00   01   02   03   04   05   06   07   08   09  Total
Express       ND    0    0    9    5   71   54  194   60   52  127   62    634
Sun           ND   ND    7   31    4   56   61   85   78   41   80   60    503
Mail           4   10    6   30    7   59    8  104   47   29   57   47    408
Star          ND   ND   ND   12    0   18   20   22   15   15   24   23    149
Mirror         3    0    0   14    4    6    3   30   14    2    8    1     85
People         0    0    1    6    0   17    4    4   11   14    0    0     57
Telegraph     ND   ND    0    8    1    4    1    4    3    1   19    4     45
Times          0    0    0   11    2    9    0   14    0    6    0    0     42
Guardian       0    0    0    0    0    0    0    1    1    0    1    0      3
Independent    0    0    0    0    0    0    0    0    0    0    0    0      0
Observer       0    0    0    0    0    1    0    0    0    0    0    0      1
Total          7   10   14  121   23  241  151  458  229  160  316  197   1927
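The table's totals can be checked, and the tabloid/broadsheet split made explicit, with a few lines of arithmetic. The sketch below uses the per-newspaper totals from the table; the grouping of titles into tabloids and broadsheets is an assumption made here for illustration, since the slides do not list it.

```python
# Per-newspaper totals copied from the Total column of the table (1998-2009).
totals = {
    "Express": 634, "Sun": 503, "Mail": 408, "Star": 149,
    "Mirror": 85, "People": 57, "Telegraph": 45, "Times": 42,
    "Guardian": 3, "Independent": 0, "Observer": 1,
}
# Assumed grouping; the slides do not state which titles count as which.
broadsheets = {"Telegraph", "Times", "Guardian", "Independent", "Observer"}

grand_total = sum(totals.values())
broadsheet_total = sum(n for paper, n in totals.items() if paper in broadsheets)
tabloid_total = grand_total - broadsheet_total

print(grand_total)  # 1927, matching the table's Total row
print(tabloid_total, broadsheet_total)
print(f"{tabloid_total / grand_total:.1%} of the counts fall in the assumed tabloid group")
```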

Stage 6 – corpus analysis

INVESTIGATORS discovered £180,000 in a London bank account held by a radical Muslim cleric accused of fomenting and financing terrorism. Sheikh Abu Qatada, who lives on benefits in Acton, west London, had his assets frozen at the weekend after appearing on a Treasury list of people suspected of "committing or providing material support for acts of terrorism".

Telegraph, October 18th, 2001

The taxpayer will also fund at least £12,000 per year in benefits for Qatada, his wife and five children, even though Qatada was once found to be carrying £170,000 in cash when he was stopped by police.

Telegraph, June 18th, 2008

Stage 7 – intertextuality

DAVID Blunkett has ordered a benefits blitz on Islamic hate clerics who sponge off the state.

The Daily Express, August 17th, 2005

So, David Blunkett is to have a blitz on Muslim clerics who sponge off the state ("Benefits blitz on the hate preachers", August 17).

Daily Express, letters, August 18th, 2005

Stage 8 – new hypotheses

Some newspapers use readers’ letters as a legitimation strategy to print more Islamophobic representations.

Stage 9 – analysis of corpus

PIGGYBANKS are facing the axe - because some Muslims could take offence. Britain's top High Street banks have ruled the money-boxes are politically incorrect. But last night the move sparked snoutrage. And one of Britain's four Muslim MPs, Khalid Mahmoud, said: "A piggybank is just an ornament. Muslims would never be seriously offended."

The Star, October 24th, 2005

Stage 9 – analysis of corpus

TEXT MANIACS (The Star, October 25th, 2005)

muslims r offended by our piggy banks! ? Then the £56 me n ma wife n ma 4 girls have got in our piggy bank 2 help the ppl in pakistan wil b spent on a fry up.

Y shud we change r way of life just 2 stop offending muslims. they aint neva gonna change theirs. Maybe they shud try eating pork. a nice bacon sarnie cud change any1's mind.

Evaluation of method

Fruitful in identifying numerous features of the corpus which could not have been considered in advance
Researchers need to be trained in conducting different forms of analysis (or utilise a team of researchers with different skills).

Each stage can open up multiple pathways and/or hypotheses, not all of which can be followed due to time and money constraints. At times, the number of ‘directions’ that the research could go in felt overwhelming and endless.

Did researcher bias impact on paths followed, outcomes?

An inter-analyst consistency experiment

Subjects – 5 analysts all with prior experience of combining corpus linguistics and discourse analysis or CDA.

The corpus search term: “foreign doctors” + similar terms
All British national newspapers (about half a million words)
Research Question: “How are foreign doctors represented in the British press 2000-10?”
Any form of corpus methods or software allowed

Methods

Method 1 2 3 4 5
Concordance analysis ✓ ✓ ✓ ✓ ✓
Collocates ✓ ✓ ✓ ✓ ✓
Keywords with reference corpus ✓ ✓
Key semantic categories
Dispersion plots ✓
Clusters ✓
Diachronic keywords

Software used

Corpus Tool 1 2 3 4 5
WordSmith 5 ✓ ✓ ✓ ✓
AntConc ✓
Wmatrix ✓
Other reference corpus ✓ ✓ ✓

Findings

Finding 1 2 3 4 5
Poor language ✓ ✓ ✓ ✓ ✓
Incompetence ✓ ✓ ✓ ✓
Need to regulate/test ✓ ✓ ✓ ✓
Killer/killed ✓ ✓ ✓ ✓
NHS shortages ✓ ✓ ✓ ✓
Flood metaphor ✓ ✓ ✓
Invasion (by Germans) ✓ ✓
Taking British jobs ✓ ✓
Generalising ✓ ✓
Terrorist threat ✓
Health risk ✓
FDs ignoring vacancies ✓
Needed in own country ✓
Very expensive ✓
NHS cost-cutting measure ✓
Evil/villainous ✓
“out of hours” ✓
“alive” ✓
“NHS red tape” ✓
NHS failing ✓
Tory vs Labour ✓
Profession vs government ✓
FDs under stress ✓
Junior doctors at risk ✓
“Racist” if we complain ✓
Consultants as warners ✓

Shared findings

Analyst     1     2     3     4     5    Total findings
1           -    25%   29%   33%   33%   58%
2           -     -    29%   15%   36%   38%
3           -     -     -    13%   31%   46%
4           -     -     -     -    25%   19%
5           -     -     -     -     -    19%

e.g. calculated by: (# of shared findings made by analysts 1 and 2) / (# of findings made by either analyst 1 or 2)

Analysts 2 and 5 – most similar findings (36%)
Analysts 3 and 4 – hardly any similar findings (13%)
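The shared-findings percentage is the overlap between two analysts' sets of findings divided by their union (a Jaccard-style measure). A minimal sketch follows; the function name `shared_findings` and the two example sets are hypothetical, since the slides do not give each analyst's individual findings.

```python
def shared_findings(a, b):
    """Share of findings made by both analysts out of those made by either."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # neither analyst reported anything
    return len(a & b) / len(union)

# Hypothetical finding sets for two analysts.
analyst_x = {"poor language", "incompetence", "NHS shortages", "flood metaphor"}
analyst_y = {"poor language", "incompetence", "killer/killed"}

print(f"{shared_findings(analyst_x, analyst_y):.0%}")  # 2 shared out of 5 -> 40%
```

Dividing by the union rather than by either analyst's own total keeps the measure symmetric, so an analyst who reports many findings is not automatically rated as agreeing more.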

Did the analysts uncover different things?

Only one finding (4%) discovered by every single analyst
About a quarter of findings discovered by the majority (3+) of analysts.
But 65% of findings only discovered by 1 analyst
Distinction between “major” and “minor” findings.
Analysts agreed on the overall ‘feel’ of the data, but the specifics differed.
Two “productive” strategies: spend a long time on one technique (1, 3), use lots of different techniques (2).
Time and ability/experience are important factors.
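The percentages on this slide follow from how many analysts reported each of the 26 findings in the findings table. The sketch below recomputes them from those tick counts; the variable names are illustrative.

```python
from collections import Counter

# Number of analysts (out of 5) who reported each finding, read off the
# ticks in the findings table: one finding with 5 ticks, four with 4,
# one with 3, three with 2 and seventeen with 1.
ticks_per_finding = [5] + [4] * 4 + [3] + [2] * 3 + [1] * 17

n_findings = len(ticks_per_finding)  # 26 findings in total
by_count = Counter(ticks_per_finding)

found_by_all = by_count[5] / n_findings
found_by_majority = sum(v for k, v in by_count.items() if k >= 3) / n_findings
found_by_one = by_count[1] / n_findings

print(f"all five analysts: {found_by_all:.0%}")
print(f"three or more:     {found_by_majority:.0%}")
print(f"one analyst only:  {found_by_one:.0%}")
```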


Corpus tools give a reasonably high degree of consistency for identifying larger patterns

Caution in concluding the techniques remove all bias
Procedures direct attention to unforeseen aspects of the data
Resulting in more interesting questions and hypotheses

Thank you