Data Journalism 2: cleaning, combining, communicating

Post on 27-Jan-2015

Monday, 5 March 2012

Watch: West Wing on maps www.youtube.com/watch?v=n8zBC2dvERM

Online Journalism
City University
Paul Bradshaw

Data 2: clean, combine, communicate

5 things you need to know about each
Data journalism in action
Walkthrough

Themes

How clean is the data?

“With the help of just Benford’s law and data sets to compare he’s able to demonstrate how the police are systematically hiding over a thousand murders a year in a single state, and that’s just in one small part of the article”

- Pete Warden

http://delicious.com/paulb/benfordslaw
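A minimal sketch of the kind of check the quote describes: Benford's law says that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so a data set whose first digits stray far from that pattern may deserve a closer look. The sample data here (powers of 2) is an illustrative stand-in, not the crime figures from the article, and the function names are made up for this example.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(n):
    """First digit of a non-zero integer, ignoring the sign."""
    return int(str(abs(n))[0])

def observed_shares(values):
    """Share of each leading digit 1-9 among the non-zero values."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Stand-in data: powers of 2 follow Benford's law closely.
sample = [2 ** k for k in range(1, 201)]
expected = benford_expected()
observed = observed_shares(sample)

# Print digit, expected share, observed share side by side.
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

A mismatch is a prompt for questions, not proof of manipulation: some data sets (for example, figures confined to a narrow range) are not expected to follow the law at all.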

1. Data always needs cleaning up
2. Treat the ‘source’ like a source
3. Use the right ‘average’ and percentage
4. Watch for changing context: inflation, boundaries, classification
5. Always work on copies of raw data

5 things you need to know about cleaning data
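Points 3 and 4 can be made concrete with a little arithmetic. The sketch below uses made-up salary figures and made-up price index values purely for illustration; the helper names percent_change and real_terms are hypothetical.

```python
from statistics import mean, median

# Made-up salaries: one very high value drags the mean upwards.
salaries = [18_000, 21_000, 22_000, 24_000, 25_000, 250_000]
print("mean:", mean(salaries))      # sensitive to the outlier
print("median:", median(salaries))  # the middle of the distribution

def percent_change(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100

def real_terms(amount, index_then, index_now):
    """Restate an old amount in today's prices using a price index."""
    return amount * index_now / index_then

# A rate moving from 8% to 10% is a rise of 2 percentage points,
# but a 25% increase relative to where it started.
print(percent_change(8, 10))

# 100 spent when the (made-up) index stood at 80, restated for an index of 120.
print(real_terms(100, 80, 120))
```

Which 'average' is right depends on the question: the median describes a typical case, the mean describes the total shared out equally.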

“What the Independent have done is confuse the UK’s deficit with our debt [making] the debt problem look around eight times worse than it is. And it used the whole of its front page to do so.”

- James Ball

A town has two hospitals. Hospital A is bigger than hospital B. One of them has a birth rate of 60% boys. Which one is it more likely to be?

Question?

The smaller hospital is more likely to be the one where 60% of births are boys: larger samples are more stable, so they stay closer to the expected rate of roughly 50%.

Question?
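The answer can be checked with an exact binomial calculation. This is a sketch under stated assumptions: each birth is treated as an independent 50/50 event, and the hospital sizes (15 and 45 births) are invented for the example, since the slide gives no numbers.

```python
from math import comb

def prob_at_least_share(n, share_percent=60, p=0.5):
    """Exact binomial probability that at least share_percent% of n births are boys."""
    # Smallest whole number of boys that reaches the share (integer ceiling).
    k_min = (share_percent * n + 99) // 100
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

small = prob_at_least_share(15)  # assumed size of the smaller hospital
large = prob_at_least_share(45)  # assumed size of the larger hospital

print("smaller hospital:", round(small, 3))
print("larger hospital:", round(large, 3))
```

With these assumed sizes the smaller hospital reaches 60% boys far more often than the larger one, which is the point of the slide: extreme proportions are mostly a small-sample phenomenon.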

http://blog.ouseful.info/2011/10/31/power-tools-for-aspiring-data-journalists-r/

Measurement doesn't answer anything if there's only one variable
Statistical significance
Sample size and selection
Controls and the placebo effect
Regression to the mean
Read up.

What is the data worth?
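Regression to the mean is the easiest of these to see in a simulation. The sketch below is illustrative only: it assumes each score is a fixed underlying level plus random noise, and every number in it (1,000 people, a mean of 50, a spread of 10) is invented.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Each person has a stable underlying level; each test adds fresh noise.
level = [rng.gauss(50, 10) for _ in range(1000)]
first = [s + rng.gauss(0, 10) for s in level]
second = [s + rng.gauss(0, 10) for s in level]

# Pick the 100 best performers on the first test only.
top = sorted(range(1000), key=lambda i: first[i], reverse=True)[:100]

top_first = mean([first[i] for i in top])
top_second = mean([second[i] for i in top])

print("top group, first test:", round(top_first, 1))
print("top group, second test:", round(top_second, 1))
print("everyone, second test:", round(mean(second), 1))
```

Nothing was done to the top group between the two tests, yet their average falls back towards the overall average. Any story about an intervention "working" on a group chosen for its extreme scores needs a control group to rule this out.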

Data > Text to columns or =SPLIT
Find & replace
=IF(condition, if met, if not)
=TRIM, =CONCATENATE
=RIGHT, =LEFT, =MID
=REPLACE, =SUBSTITUTE
=LEN

Getting data ready to answer questions
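For anyone who prefers to see the logic outside a spreadsheet, here is a rough Python equivalent of each function listed above, applied to one made-up messy cell. The mapping is approximate (spreadsheet functions count characters from 1, Python from 0), and the sample value is invented.

```python
raw = "  Smith,   John  "  # a made-up messy cell

# =TRIM: drop leading/trailing spaces and collapse repeated inner spaces.
trimmed = " ".join(raw.split())

# Text to columns / =SPLIT: break the cell on the comma.
surname, forename = [part.strip() for part in trimmed.split(",")]

# =CONCATENATE: join pieces back together in a new order.
full_name = forename + " " + surname

# =LEFT(text, 3), =RIGHT(text, 2), =MID(text, 2, 3)
first3 = surname[:3]
last2 = surname[-2:]
middle = surname[1:4]

# =SUBSTITUTE / Find & replace: swap one piece of text for another.
swapped = full_name.replace("John", "Jon")

# =LEN: count characters.
length = len(full_name)

# =IF(condition, if met, if not)
label = "long" if length > 8 else "short"

print(trimmed, full_name, first3, last2, middle, swapped, length, label, sep=" | ")
```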

Edit cells > common transforms
Edit cells > split multi-valued cells
Facet > text facet
Export...

Walkthrough: cleaning data in Google Refine
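To show what a text facet is doing under the hood, the sketch below counts each distinct value in a column and then groups values that differ only in case, spacing, punctuation or word order. The grouping key is a simplified imitation of key-based clustering, not Refine's own code, and the council names are made-up sample data.

```python
import string
from collections import Counter, defaultdict

# Made-up column with the usual inconsistencies.
councils = [
    "Birmingham", "birmingham ", "Birmingham City Council",
    "Leeds", "Leeds", " Leeds", "City Council, Birmingham",
]

def text_facet(values):
    """Distinct values with their counts, most frequent first."""
    return Counter(values).most_common()

def fingerprint(value):
    """Simplified key: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = value.strip().lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))

def cluster(values):
    """Group distinct raw values that share the same fingerprint."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}

print(text_facet(councils))
print(cluster(councils))
```

The facet reveals that seven rows hold six different spellings; the clusters suggest which of those are probably the same thing. Whether "Birmingham" and "Birmingham City Council" should be merged is still an editorial decision.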

Communicating data stories

1. Choose the chart for the purpose
2. For answers or for story?
3. Good design is when there’s nothing more to take away
4. It should be self-contained & have refs
5. Be careful with scales and classes

5 things you need to know about visualising data
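Point 5 is easy to quantify for bar charts. As a rough illustration with invented numbers, the sketch below compares how big a difference looks on the page when the axis starts at zero with how big it looks when the axis is cut off just below the smaller value. The function name visual_ratio is made up for this example.

```python
def visual_ratio(a, b, baseline=0):
    """Ratio of drawn bar heights for values a and b when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)

# Two made-up values that differ by 10%.
honest = visual_ratio(50, 55)                  # axis starts at 0
truncated = visual_ratio(50, 55, baseline=45)  # axis starts at 45

print("axis from 0:", round(honest, 2))
print("axis from 45:", round(truncated, 2))
```

With the axis starting at 45, a 10% difference is drawn as one bar twice the height of the other.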

or http://chartchooser.juiceanalytics.com/

http://junkcharts.typepad.com/junk_charts/trifecta-checkup/

What is wrong with this picture?

http://simplecomplexity.net/statistics-without-context/

ManyEyes, Tableau, Number Picture
Wordle, Tagxedo
BatchGeo, FusionTables
Gephi
Delicious.com/paulb/vis+tools

Visualisation tools

Publish embed code & link to data
Have or join a Flickr group for visualisations, comment on others
Tumblr blog
Digg, Reddit, Stumbleupon
Buzzdata

Distribution: getting social

Mashing data

1. It is what a journalist does best
2. Look for a point of connection: place? Person? Company? Date? Code?
3. Mashups can be live, updated or static
4. What an API can do
5. What APIs there are

5 things you need to know about mashing data
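Point 2, the 'point of connection', is what a programmer would call a join key. As a minimal sketch, the code below combines two made-up tables on a shared postcode column, tidying the key first so that differences in spacing and case do not block a match. All postcodes, ward names and figures are placeholders, and normalise_key and mash are hypothetical names.

```python
def normalise_key(value):
    """Remove all whitespace and uppercase, so 'ab1 2cd' matches 'AB12CD'."""
    return "".join(value.split()).upper()

def mash(left_rows, right_rows, key):
    """Combine rows from two tables that share a key column."""
    lookup = {normalise_key(row[key]): row for row in right_rows}
    merged, unmatched = [], []
    for row in left_rows:
        match = lookup.get(normalise_key(row[key]))
        if match is None:
            unmatched.append(row)  # keep these: gaps can be a story too
        else:
            merged.append({**match, **row})
    return merged, unmatched

# Placeholder data sets sharing a 'postcode' column.
spending = [
    {"postcode": "ab1 2cd", "spend": 1200},
    {"postcode": "EF3 4GH", "spend": 800},
    {"postcode": "IJ5 6KL", "spend": 500},
]
places = [
    {"postcode": "AB12CD", "ward": "Ward A"},
    {"postcode": "EF3 4GH", "ward": "Ward B"},
]

merged, unmatched = mash(spending, places, "postcode")
print(merged)
print(unmatched)
```

Returning the unmatched rows separately is a deliberate choice: rows that fail to join are often a sign of dirty keys or of records missing from one source.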

Yahoo! Pipes, xFruits
OpenHeatMap
Mapalist, Maptube, FusionTables
Scraperwiki
Google Refine

Mashup tools

Edit column > Add column by fetching URLs
Use GREL (Google Refine Expression Language)
Search web for help & examples

Walkthrough: grabbing geo data with Google Refine
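In Refine itself these steps are written in GREL; the Python sketch below only mirrors the logic, which is to build one URL per cell and then pull the values you want out of the JSON that comes back. The geocoder address is a hypothetical placeholder, and the shape of the sample response is an assumption made for illustration, so check the documentation of whichever service you actually use. No network request is made here.

```python
import json
from urllib.parse import quote

# Hypothetical endpoint, for illustration only.
GEOCODER = "https://geocoder.example/json?address="

def build_url(address):
    """Step 1: turn a cell value into a URL, escaping spaces and punctuation."""
    return GEOCODER + quote(address)

def extract_lat_lng(response_text):
    """Step 2: parse the JSON reply and pick out latitude and longitude."""
    data = json.loads(response_text)
    results = data.get("results", [])
    if not results:
        return None  # the service found nothing for this address
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Assumed response shape with made-up coordinates.
SAMPLE_RESPONSE = '{"results": [{"geometry": {"location": {"lat": 51.5, "lng": -0.1}}}]}'

print(build_url("City University, London"))
print(extract_lat_lng(SAMPLE_RESPONSE))
print(extract_lat_lng('{"results": []}'))
```

Handling the empty result explicitly matters in practice: a column of fetched data usually contains some addresses the service could not match.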

Questions?

Links

OnlineJournalismClasses.tumblr.com
Delicious.com/paulb/cityoj09
Delicious.com/paulb/datajournalism
Delicious.com/paulb/visualisation
Delicious.com/paulb/statistics
Delicious.com/paulb/mashups

Before the lab: play with these techniques yourself, have problems, find solutions, raise questions. Install Google Refine and Tableau on your laptop to use.
- Visualise, interrogate or mash data

Lab

Books

Kaiser Fung - Numbers Rule Your World
Ben Goldacre - Bad Science
Donna Wong - The WSJ Guide to Information Graphics
Brian Suda - A Practical Guide to Designing with Data
