Mdst 3559-01-25-data-journalism

Posted on 26-Jan-2015

Data Journalism

MDST 3559: Dataesthetics
Prof. Alvarado

1/25/2011

Business

• Course web site
  – http://pages.shanti.virginia.edu/Dataesthetics_S11

• Course Collab site
  – Dataesthetics S11

• Syllabus
  – Posted as a page on the course web site
  – Review at end of class

Review

• Dataesthetics is about data design
• Data design is relevant at several levels:
  – Data modeling (tables, etc.)
  – Processing (code)
  – Visualizing (charts, graphs, interfaces, art, etc.)
  – Contextualizing (digital storytelling, arguments, presentations, etc.)
• Each level denotes a form of digital representation

Overview

• We look at the new field of Data Journalism
• A framing example for the course
  – Accessible content
  – Shows all of the levels
  – Uses available tools
  – A great example to imitate

• Thursday we will do our own DJ
  – Acquire data and use the tools

What is Data Journalism?

How is DJ related to traditional journalism?

i.e., news stories and op-eds, a.k.a. Plain Old Journalism (POJ)

Relation to POJ

• Data work is supplementary to the story
  – Combines data, visualization, and storytelling

• But also valuable in itself: publishing interesting data is a journalistic act that stands alone
  – “The Guardian curates far more data than it creates” (NJL)
  – Data tells a story

• More interactive
  – “there’s somebody out there who knows a lot more than you do, and can thus contribute.” (NJL)

What is the workflow of DJ?

“Find, interrogate, visualize, mash”

• Acquisition from diverse sources
  – Well-formatted data sources
  – Web scraping from government PDFs and web sites

• Everything ends up in Google Docs
• Data is cleaned up
• Data is interrogated, explored
• Available tools used to make visualizations
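The acquisition and cleanup steps can be sketched with nothing but the Python standard library. This is a minimal sketch, assuming the source is a simple HTML `<table>` (government PDFs need a different tool and are not covered); `TableScraper`, `table_to_csv`, and the sample table are hypothetical names and made-up data. It scrapes the cells, collapses stray whitespace, drops empty rows, and emits CSV ready to upload to a spreadsheet.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Cleanup: join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if any(self._row):  # drop rows whose cells are all empty
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Scrape every table row in `html` and return it as CSV text."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    out = io.StringIO()
    csv.writer(out).writerows(scraper.rows)
    return out.getvalue()


# Made-up sample page, for illustration only.
sample = """
<table>
  <tr><th>Region</th><th>Count</th></tr>
  <tr><td>  North </td><td>12</td></tr>
  <tr><td>South</td><td>7</td></tr>
</table>
"""
print(table_to_csv(sample))
```

Printing the result shows three CSV lines: `Region,Count`, `North,12`, and `South,7`.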

Example: Afghanistan IEDs

Example

• Get the IED data via the Data Blog’s link to its Google spreadsheet
  – http://www.guardian.co.uk/news/datablog/2010/jul/27/wikileaks-afghanistan-data-datajournalism
• Download as CSV
  – Change extension to txt

• Open in Excel and save as tab delimited file
  – Delete extra data

• Paste into Many Eyes
  – Choose Block Histogram
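The Excel step (save as tab-delimited, delete extra data) can also be done in a few lines of Python. This is a minimal sketch, assuming the downloaded CSV has a header row; `csv_to_tsv`, the column names, and the sample rows are hypothetical placeholders, not the Guardian's actual fields.

```python
import csv
import io


def csv_to_tsv(csv_text, keep_columns):
    """Return tab-delimited text containing only keep_columns, in that order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not in keep_columns,
    # which plays the role of "delete extra data".
    writer = csv.DictWriter(out, fieldnames=keep_columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample standing in for the downloaded file.
sample = "date,region,type,notes\n2009-01-03,North,A,x\n2009-01-04,South,B,y\n"
print(csv_to_tsv(sample, ["date", "region"]))
```

The tab-delimited output can then be pasted into a visualization tool the same way as the Excel export.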

Sources

Government Data

• http://www.guardian.co.uk/data
• http://www.data.gov/
• http://factual.com/

Tools

“The technology involved is surprisingly simple, and mostly free. The Guardian uses public, read-only Google Spreadsheets to share the data they’ve collected, which require no special tools for viewing and can be downloaded in just about any desired format. Visualizations are mostly via Many Eyes and Timetric, both free.”

http://www.niemanlab.org/2010/08/how-the-guardian-is-pioneering-data-journalism-with-free-tools/
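Because the shared spreadsheets can be downloaded as CSV, a script can pull them directly. This is a sketch under stated assumptions: the sheet is shared publicly, and Google's current `/export?format=csv` URL pattern applies (the 2010-era Guardian links used an older URL scheme). `SHEET_ID` and the helper names are hypothetical placeholders.

```python
import csv
import io
import urllib.request


def csv_export_url(sheet_id, gid=0):
    """Build the CSV download URL for one tab (gid) of a public sheet.

    Assumption: Google's current export URL pattern.
    """
    return (f"https://docs.google.com/spreadsheets/d/{sheet_id}"
            f"/export?format=csv&gid={gid}")


def parse_csv(text):
    """Parse CSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))


def fetch_sheet(sheet_id, gid=0):
    """Download one tab and parse it; requires network access."""
    with urllib.request.urlopen(csv_export_url(sheet_id, gid)) as resp:
        return parse_csv(resp.read().decode("utf-8"))


if __name__ == "__main__":
    SHEET_ID = "YOUR-SHEET-ID"  # hypothetical placeholder
    print(csv_export_url(SHEET_ID))
```

Building the URL and parsing are kept separate from the download so each can be checked without a network connection.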

Tim Berners-Lee (TBL) says the future of journalism "lies with journalists who know their CSV from their RDF, can throw together some quick MySQL queries for a PHP or Python output … and discover the story lurking in datasets released by governments, local authorities, agencies, or any combination of them – even across national borders."

Same for scholarship?
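The "quick queries" in the quote can be illustrated without a database server. In this sketch Python's built-in `sqlite3` in-memory database is swapped in for MySQL; the `spending` table, its columns, and its rows are made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up released dataset, for illustration only.
SAMPLE_CSV = """agency,year,amount
Parks,2009,120
Parks,2010,150
Roads,2009,300
Roads,2010,280
"""


def load_csv(conn, csv_text):
    """Load CSV text into a hypothetical 'spending' table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute("CREATE TABLE spending (agency TEXT, year INTEGER, amount REAL)")
    # Named placeholders are filled from each row dict's keys.
    conn.executemany(
        "INSERT INTO spending VALUES (:agency, :year, :amount)", rows)


def totals_by_agency(conn):
    """Total amount per agency, largest first."""
    return conn.execute(
        "SELECT agency, SUM(amount) AS total FROM spending "
        "GROUP BY agency ORDER BY total DESC").fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, SAMPLE_CSV)
print(totals_by_agency(conn))
```

The same SQL would run against MySQL with only the connection code changed.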

Types of Data

• Sources vary – often must be scraped
• CSV (‘comma-separated values’) is the lingua franca
  – Once it is in this form, you can do anything with it
  – Actually more general: any delimited format
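Since any delimited format works, one reader can handle comma-, tab-, or pipe-separated files. This sketch uses the standard library's `csv.Sniffer` to guess the delimiter; the candidate list passed in is an assumption about which delimiters to expect, and `read_delimited` is a hypothetical helper name.

```python
import csv
import io


def read_delimited(text, candidates=",\t|;"):
    """Detect the delimiter among candidates, then parse text into rows."""
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    return list(csv.reader(io.StringIO(text), dialect))


# The same table in three delimited formats parses to the same rows.
for sample in ("name,count\nA,1\nB,2\n",
               "name\tcount\nA\t1\nB\t2\n",
               "name|count\nA|1\nB|2\n"):
    print(read_delimited(sample))
```

Sniffing is a heuristic, so for a file with a known format it is simpler to pass `delimiter=` to `csv.reader` directly.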

Types of Visualization

• ManyEyes
  – http://www-958.ibm.com/software/data/cognos/manyeyes/page/Visualization_Options.html
• Google
  – http://code.google.com/apis/visualization/documentation/gallery.html

Homework

• Get a Google account and visit Google Docs
  – docs.google.com
  – Create a spreadsheet

• Create a ManyEyes account
  – http://www-958.ibm.com/software/data/cognos/manyeyes/
  – Read “Visualization Types”

Syllabus