
Introducing Corpus Linguistics: AntConc and Project Gutenberg. Dr Glenn Hadikin.

Date post: 22-Dec-2015
Upload: grant-smith
• Download two magazines
• Conduct a ‘keyword’ query

What is corpus linguistics?

Corpus linguistics is the study of large bodies of naturally occurring text that are ‘visible’ to corpus analysis software.

• https://www.gutenberg.org

When you see this, press Ctrl+A to highlight all the text and then Ctrl+C to copy it.

Open WordPad and press Ctrl+V to paste all the text into it.

Press ‘save as’, choose ‘plain text’ and give it a filename such as boysandgirls.txt.

• That’s how I got the boysandgirls.txt file on the website.

• The girls.txt file was made in the same way, but it is a copy of ‘The Girl’s Own Paper’ from 1886.
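
For anyone who prefers a script to the copy, paste and ‘save as’ routine, the saving step can be sketched in Python. This is only an illustration: the function name `save_plain_text` is hypothetical, and the choice of UTF-8 is an assumption rather than something the slides specify.

```python
from pathlib import Path


def save_plain_text(text, filename):
    """Write copied text to a UTF-8 plain-text file and return its path."""
    path = Path(filename)
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n so that
    # every tool that opens the file sees the same line breaks.
    cleaned = text.replace("\r\n", "\n").replace("\r", "\n")
    path.write_text(cleaned, encoding="utf-8")
    return path
```

Calling `save_plain_text(copied_text, "boysandgirls.txt")` would produce a file that corpus software can open, where `copied_text` stands for whatever was copied from the web page.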

In AntConc, go to ‘file’ and open ‘boysandgirls.txt’.

To check that the file has loaded properly, type any common word into the search box at the bottom and see whether results appear.
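
What the search box returns is a concordance: every occurrence of the word shown with a little text on either side. A minimal Python sketch of the same idea follows. The function name `concordance` and the `width` parameter are illustrative choices, and the output layout is not meant to match AntConc's.

```python
import re


def concordance(text, word, width=30):
    """Return keyword-in-context lines for every whole-word match of `word`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    # Collapse line breaks and runs of spaces so the context reads as one line.
    flat = " ".join(text.split())
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the search word lines up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines
```

For example, `concordance(open("boysandgirls.txt", encoding="utf-8").read(), "the")` would list each ‘the’ in the file with 30 characters of context on each side.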

Go to ‘tool preferences’, choose ‘add files’, select ‘girls.txt’ and press ‘load’. The loaded file is called a reference file: it is the text that your main file will be compared against.

Before any keyword analysis you must create a ‘wordlist’
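
A wordlist is simply a count of how often each word occurs in a file. The sketch below shows one way to build one in Python. The tokenising rule (lower-case letters, with apostrophes allowed inside a word) is an assumption made for this example; AntConc's own token definition can be configured and may give slightly different counts.

```python
import re
from collections import Counter


def wordlist(text):
    """Count lower-cased word tokens; apostrophes inside words are kept."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(tokens)
```

`wordlist(text).most_common(10)` then gives the ten most frequent words and their counts, which is the same kind of information the wordlist view displays.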

• Any guesses what words or ideas will be key in ‘boysandgirls’ compared with ‘girls’?
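
A word is ‘key’ when it is unusually frequent in one file compared with the reference file. The slides do not say which statistic is used, so the sketch below assumes log-likelihood, a measure widely used for keyness in corpus linguistics. The function names are illustrative, and the inputs are assumed to be non-empty word counts such as `collections.Counter` objects.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood for a word seen `a` times in `c` target tokens
    and `b` times in `d` reference tokens (c and d must be above zero)."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target, reference, top=10):
    """Rank words that are relatively more frequent in `target` than in `reference`."""
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only positive keywords: higher relative frequency in the target.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]
```

Passing the two counts in the opposite order, `keywords(reference, target)`, gives the words that are key in the other file instead.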

Click on a word to explore further…

You can go back to ‘tool preferences’ and press ‘swap’ to see the opposite case: ‘girls’ compared with ‘boysandgirls’.

Check there are 1117 occurrences of ‘the’ to make sure the files have swapped correctly.

If the ‘boysandgirls’ keyword list comes back (with ‘illustrated’ at the top), go back to ‘tool preferences’, then clear and reload the ‘boysandgirls’ reference corpus.

• Would similar patterns come up in 21st century books?

Thank you. You are all invited to our book launch in Blackwell’s bookshop tomorrow at 5pm.