Introducing Corpus Linguistics: AntConc and Project Gutenberg. Dr Glenn Hadikin.

Post on 22-Dec-2015



• Download two magazines

• Conduct a ‘keyword’ query

What is corpus linguistics?

Corpus linguistics is the study of large bodies of naturally occurring text that are ‘visible’ to corpus analysis software.

• https://www.gutenberg.org

When you see this, press Ctrl+A to highlight it all and then Ctrl+C to copy it all.

Open up WordPad and press Ctrl+V to paste all the text into WordPad.

Press ‘save as’, choose ‘plain text’ and give it a filename such as boysandgirls.txt

• That’s how I got the boysandgirls.txt file on the website.

• The girls.txt file followed the same procedure but is a copy of ‘The Girl’s Own Paper’ from 1886
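The copy, paste and ‘save as plain text’ steps can also be scripted. The sketch below assumes the magazine text is already held in a Python string; the function name save_plain_text and the sample string are made up for illustration, and the file is written to a temporary folder rather than to the website's location.

```python
import tempfile
from pathlib import Path

def save_plain_text(text, filename):
    """Write text to a UTF-8 plain-text file and return its path."""
    path = Path(filename)
    path.write_text(text, encoding="utf-8")
    return path

# Illustrative only: a temporary folder and a one-line stand-in for the magazine text
folder = Path(tempfile.mkdtemp())
saved = save_plain_text("Sample magazine text.", folder / "boysandgirls.txt")
print(saved.read_text(encoding="utf-8"))
```

Saving as UTF-8 plain text keeps the file free of word-processor formatting, which is the point of choosing ‘plain text’ in WordPad.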

Go to ‘file’ and open ‘boysandgirls.txt’

You can type any common word into the search box at the bottom to check that it’s working okay.
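Searching for a word in this way produces a concordance: every hit shown with a little context on either side. A rough Python sketch of the same idea follows. The function name kwic, the context width and the sample sentence are assumptions for illustration, and the matching rules will not be identical to AntConc's.

```python
import re

def kwic(text, word, width=30):
    """Return one keyword-in-context line per whole-word match of `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Flatten line breaks so each hit sits on a single row
        left = left.replace("\n", " ")
        right = right.replace("\n", " ")
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = "The boy ran to the river. The girl read the paper."
for line in kwic(sample, "the", width=12):
    print(line)
```

Right-aligning the left context lines the search word up in a column, which makes repeated patterns easier to spot.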

Go to ‘tool preferences’, ‘add files’, upload ‘girls.txt’ and press ‘load’ – this is called a reference file

Before any keyword analysis you must create a ‘wordlist’
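A wordlist is simply every word in the file with a count of how often it occurs. A minimal sketch in Python is below. It assumes a very simple definition of a word (an unbroken run of the letters a to z, lower-cased), so its counts may differ from AntConc's; the function name word_list and the sample sentence are illustrative.

```python
import re
from collections import Counter

def word_list(text):
    """Count every word, treating a word as a run of the letters a-z."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

sample = "The boys read the paper and the girls read the magazine."
freq = word_list(sample)

# Most frequent words first, as in a wordlist window
for word, count in freq.most_common(3):
    print(count, word)
```

To try it on a real file, the sample string could be replaced with the contents of boysandgirls.txt read from disk.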

• Any guesses what words or ideas will be key in ‘boysandgirls’ compared with ‘girls’?
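A keyword query compares how often each word occurs in one file with how often it occurs in the reference file. One common way to score the difference is the log-likelihood statistic, sketched below. This is an illustration of the idea and not necessarily the exact calculation or settings AntConc applies; the function names, the tokeniser and the two tiny sample texts are assumptions.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Split text into lower-case words (runs of the letters a-z)."""
    return re.findall(r"[a-z]+", text.lower())

def log_likelihood(a, b, c, d):
    """Keyness of a word seen a times in c target words and b times in d reference words."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score

def keywords(target_text, reference_text, top=5):
    """Return (score, word) pairs for words relatively more frequent in the target."""
    target = Counter(tokens(target_text))
    reference = Counter(tokens(reference_text))
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference[word]  # Counter gives 0 for unseen words
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    return sorted(scored, reverse=True)[:top]

target_sample = "the boys played cricket and the boys read"
reference_sample = "the girls sewed and the girls read"
for score, word in keywords(target_sample, reference_sample):
    print(round(score, 2), word)
```

Swapping the two arguments to keywords gives the words that are key in the reference file instead.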

Click on a word to explore further…

You can go back to ‘tool preferences’ and press ‘swap’ to run the opposite comparison.

Check there are 1117 occurrences of ‘the’ to make sure the files have swapped correctly.

If the ‘boysandgirls’ keyword list comes back (with ‘illustrated’ at the top) go back to ‘tool preferences’, clear and reload the ‘boysandgirls’ reference corpus.

• Would similar patterns come up in 21st century books?

Thank you – all invited to our book launch in Blackwells book shop tomorrow at 5pm.