
Post on 31-Mar-2015


Corpus Linguistics: Counting words, texts or features

Mike Scott, University of Liverpool

Corpus Linguistics Summer Institute June-July 2008

Aims

to identify what is in principle countable using CL techniques

to consider what it is in principle desirable to count and why

No, not that kind of sentence

What have we got, anyway?

electronic texts

is anything missing?

What is a text, anyway?

What we’re looking at

Words in Texts: sentences, paragraphs, sections, key words, etc.

Words in the Brain: memory (e.g. tip-of-the-tongue), word associations, enjoyment, priming

Words in the Language: lexicography, terminology, phraseology, etc.; patterns of “standard English”

Words in Culture: cultural key words; indicators of class and stance, bias, etc.

What is countable?

characters

word-forms

parts of speech

sentences

headings?

paragraphs?

lines?

pages?

other divisions (section, chapter) if marked up

utterances

turns

grammatical sequences
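A minimal sketch of how some of these units might be counted in a plain-text file. It is an illustration only, not the method used by WordSmith or any other tool: the function name `count_units`, the regular expressions and the sample text are all assumptions. Note how much depends on definitions (what ends a sentence, whether "didn't" is one word-form), which is why several items above carry a question mark.

```python
import re
from collections import Counter

def count_units(text):
    """Count several surface units in a plain-text string (rough heuristics)."""
    # Paragraphs: blocks separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Lines: non-blank lines only.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Sentences: crude split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word-forms: runs of letters, allowing internal apostrophes or hyphens.
    tokens = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())
    return {
        "characters": len(text),
        "tokens": len(tokens),       # running words
        "types": len(set(tokens)),   # distinct word-forms
        "sentences": len(sentences),
        "lines": len(lines),
        "paragraphs": len(paragraphs),
        "frequencies": Counter(tokens),
    }

sample = "The cat sat. The cat slept!\n\nIt didn't move."
counts = count_units(sample)
print({k: v for k, v in counts.items() if k != "frequencies"})
print(counts["frequencies"].most_common(2))
```

Parts of speech, utterances and turns are left out on purpose: they can only be counted once the text has been tagged or marked up.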

What isn’t countable?

metaphors

semantic prosody

patterns

because these are abstractions

though we have to try …

by seeking various markers, frames signalling these abstractions

recognising, however, that 1 form ≠ 1 function
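The idea of seeking markers can be sketched as a simple frame search. The two frames below ("like a/an ..." and "as ... as") are hypothetical examples chosen for illustration, not markers proposed in these slides, and the function name `find_candidates` is likewise an assumption. Each hit is returned with its context, concordance-style, so that a human reader can decide whether it is figurative.

```python
import re

# Hypothetical marker frames that sometimes signal a figurative comparison.
FRAMES = {
    "like a/an": re.compile(r"\blike an? \w+", re.IGNORECASE),
    "as X as": re.compile(r"\bas \w+ as\b", re.IGNORECASE),
}

def find_candidates(text, width=20):
    """Return (frame name, left context, match, right context) for each hit."""
    hits = []
    for name, pattern in FRAMES.items():
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            hits.append((name, left, m.group(0), right))
    return hits

sample = ("Her voice was like a bell. I would like a coffee. "
          "He is as cold as ice, and as far as I know he left.")
hits = find_candidates(sample)
for name, left, match, right in hits:
    print(f"{name:10} | {left:>20} [{match}] {right}")
```

In this sample the search returns four candidates, but only "like a bell" and "as cold as" are comparisons; "like a coffee" and "as far as" share the form without the function. The count of hits is therefore a count of markers, not of metaphors.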

Corpus Linguistics is all about pattern-seeking!

Why counting, anyway?

search for interpretations

understanding

re-defining categories

via patterns

WordSmith

What should we count?

the question of focus

the question of scope

pointfulness: the search for patterns

the POS-trap

metadata are used to forget the data (François Rastier)

Reference

Scott, M. & C. Tribble, 2006. Textual Patterns: Key words and corpus analysis in language education. Amsterdam: Benjamins. Chapters 1 & 2.