PLOTCON NYC: New Open Viz in R

Date post: 08-Jan-2017
Upload: plotly
Managing many models
Hadley Wickham (@hadleywickham), Chief Scientist, RStudio
November 2016
Transcript
Page 2: PLOTCON NYC: New Open Viz in R

You’ve never seen data presented like this. With the drama and urgency of a sportscaster, statistics guru Hans Rosling debunks myths about the so-called “developing world.”

https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen

Page 3: PLOTCON NYC: New Open Viz in R
Page 4: PLOTCON NYC: New Open Viz in R

[Figure: lifeExp (y-axis, 40 to 80) against year (x-axis, 1950 to 2000); 142 countries]

Page 5: PLOTCON NYC: New Open Viz in R

[Figure: estimated yearly increase in life expectancy (y-axis, 0.0 to 0.8) against R2 (x-axis, 0.00 to 1.00); points coloured by continent: Africa, Americas, Asia, Europe, Oceania]

Page 6: PLOTCON NYC: New Open Viz in R

But...

Arbitrarily complicated models

Three simple underlying ideas

Scales to big data

Page 7: PLOTCON NYC: New Open Viz in R

Each idea is partnered with a package

1. Nested data (tidyr)

2. Functional programming (purrr)

3. Models → tidy data (broom)

Page 8: PLOTCON NYC: New Open Viz in R

[Figure: lifeExp (y-axis, 40 to 80) against year (x-axis, 1950 to 2000); 142 countries]

Want to summarise each with a linear model

Page 9: PLOTCON NYC: New Open Viz in R

Currently our data has one row per observation

Country      Year  LifeExp
Afghanistan  1952  28.9
Afghanistan  1957  30.3
Afghanistan  ...   ...
Albania      1952  55.2
Albania      1957  59.3
Albania      ...   ...
Algeria      ...   ...
...          ...   ...

Page 10: PLOTCON NYC: New Open Viz in R

More convenient to have one row per group

Country      Data
Afghanistan  <df>
Albania      <df>
Algeria      <df>
...          ...

Afghanistan <df>:

Year  LifeExp
1952  28.9
1957  30.3
...   ...

Albania <df>:

Year  LifeExp
1952  55.2
1957  59.3
...   ...

I call this a nested data frame

Page 11: PLOTCON NYC: New Open Viz in R

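In R:

library(dplyr)
library(tidyr)
library(gapminder)  # provides the gapminder data frame

by_country <- gapminder %>%
  # Assumption (not shown on the slide): year1950, used by the models
  # later in the deck, is the year measured from 1950
  mutate(year1950 = year - 1950) %>%
  group_by(continent, country) %>%
  nest()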
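As a rough illustration of what nesting produces, the sketch below rebuilds the nested frame and looks inside it. It assumes the gapminder package is installed; the name `first_country` is hypothetical and used only for illustration.

```r
library(dplyr)
library(tidyr)
library(gapminder)

by_country <- gapminder %>%
  group_by(continent, country) %>%
  nest()

# One row per country; the per-country observations live in the
# `data` list-column
nrow(by_country)

# Each element of `data` is an ordinary data frame for a single country
first_country <- by_country$data[[1]]
names(first_country)
```

The grouping variables (continent and country) stay as regular columns, and every other column moves into the nested data frames.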

Page 12: PLOTCON NYC: New Open Viz in R

Each country will have an associated model

Country      Data
Afghanistan  <df>
Albania      <df>
Algeria      <df>
...          ...

lm(lifeExp ~ year1950, data = afghanistan)

lm(lifeExp ~ year1950, data = albania)

Page 13: PLOTCON NYC: New Open Viz in R

Why not store that in a column too?

Country      Data  Model
Afghanistan  <df>  <lm>
Albania      <df>  <lm>
Algeria      <df>  <lm>
...          ...   ...

Page 14: PLOTCON NYC: New Open Viz in R

List-columns keep related things together

Anything can go in a list & a list can go in a data frame
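A minimal sketch of that idea, assuming the tibble package is available. The names `things`, `df`, `label` and `thing` are made up for this example, and the model is fitted to the built-in mtcars data purely as a stand-in.

```r
library(tibble)

# A list can hold objects of different kinds...
things <- list(1:3, "a string", lm(mpg ~ wt, data = mtcars))

# ...and a list can be a column of a data frame (here, a tibble)
df <- tibble(
  label = c("numbers", "text", "model"),
  thing = things
)

# Each cell of the list-column keeps its original class
sapply(df$thing, function(x) class(x)[1])
```

Because the label and the object share a row, filtering or arranging the data frame keeps the two in sync.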

Page 15: PLOTCON NYC: New Open Viz in R

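In R:

library(dplyr)
library(purrr)

# Fit a linear model to one country's data.
# Assumes each nested data frame has a year1950 column (e.g. year - 1950).
country_model <- function(df) {
  lm(lifeExp ~ year1950, data = df)
}

# One model per row, stored in a list-column
models <- by_country %>%
  mutate(
    model = map(data, country_model)
  )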
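The sketch below shows the same pattern end to end, plus one typed variant of `map()`. It is self-contained and rests on two assumptions that the slides do not spell out: `year1950` is taken to be `year - 1950`, and the `slope` column is added here only for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(gapminder)

# Assumption: year1950 is the year measured from 1950, so the intercept
# is the fitted life expectancy in 1950
by_country <- gapminder %>%
  mutate(year1950 = year - 1950) %>%
  group_by(continent, country) %>%
  nest()

country_model <- function(df) {
  lm(lifeExp ~ year1950, data = df)
}

models <- by_country %>%
  ungroup() %>%
  mutate(
    # map() always returns a list: one lm object per country
    model = map(data, country_model),
    # map_dbl() returns a plain numeric vector: one slope per model
    slope = map_dbl(model, ~ coef(.x)[["year1950"]])
  )
```

`map()` is the right choice when each result is a complex object, while `map_dbl()` fits when each result is a single number that should become an ordinary column.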

Page 16: PLOTCON NYC: New Open Viz in R

[Figure: lifeExp (y-axis, 40 to 80) against year (x-axis, 1950 to 2000); 142 countries]

Page 17: PLOTCON NYC: New Open Viz in R

[Figure: estimated yearly increase in life expectancy (y-axis, 0.0 to 0.8) against R2 (x-axis, 0.00 to 1.00); points coloured by continent: Africa, Americas, Asia, Europe, Oceania]

Page 18: PLOTCON NYC: New Open Viz in R

What can we do with a list of models?

Country      Data    Model
Afghanistan  <data>  <lm>
Albania      <data>  <lm>
Algeria      <data>  <lm>
...          <data>  <lm>

Page 19: PLOTCON NYC: New Open Viz in R

What data can we extract from a model?

New Zealand

year  lifeExp
1952  69.4
1957  70.3
1962  71.2
1967  71.5
...   ...

lm(lifeExp ~ year, data = nz)

glance:

R2 = 0.95

tidy:

Intercept  -307.7
Slope      0.19

augment:

year  resid
1952  0.70
1957  0.61
1962  0.63
1967  -0.05
...   ...
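A small sketch of the three broom verbs applied to a single model, assuming the gapminder and broom packages are installed. The object names `nz` and `nz_mod` are illustrative.

```r
library(dplyr)
library(gapminder)
library(broom)

nz <- filter(gapminder, country == "New Zealand")
nz_mod <- lm(lifeExp ~ year, data = nz)

# Model-level summaries (R2 among them): a single row
nz_glance <- glance(nz_mod)

# Coefficient-level summaries: one row per term (intercept and slope)
nz_tidy <- tidy(nz_mod)

# Observation-level values (fitted values, residuals): one row per data point
nz_augment <- augment(nz_mod)
```

All three results are data frames, so they can be stored in list-columns and combined across models like any other data.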

Page 20: PLOTCON NYC: New Open Viz in R

We need to do that for each model

models <- models %>%
  mutate(
    glance  = map(model, broom::glance),
    tidy    = map(model, broom::tidy),
    augment = map(model, broom::augment)
  )

Page 21: PLOTCON NYC: New Open Viz in R

Which gives us:

Country      Data  Model  Glance  Tidy  Augment
Afghanistan  <df>  <lm>   <df>    <df>  <df>
Albania      <df>  <lm>   <df>    <df>  <df>
Algeria      <df>  <lm>   <df>    <df>  <df>
...          ...   ...    ...     ...   ...

Page 22: PLOTCON NYC: New Open Viz in R

Unnest lets us go back to a regular data frame

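Nested (one row per country):

Country      Data
Afghanistan  <df>
Albania      <df>
Algeria      <df>
...          ...

Regular (one row per observation):

Country      Year  LifeExp
Afghanistan  1952  28.9
Afghanistan  1957  30.3
Afghanistan  ...   ...
Albania      1952  55.2
Albania      1957  59.3
Albania      ...   ...
Algeria      ...   ...
...          ...   ...

nest() goes from the regular data frame to the nested one; unnest() goes back.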
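The same idea applies to the broom output: unnesting a list-column of one-row summaries gives a regular data frame with one row per country. The following is a sketch under stated assumptions, not the code from the demo. It fits `lifeExp ~ year` directly, and `fit_quality` is a hypothetical name.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(gapminder)
library(broom)

models <- gapminder %>%
  group_by(continent, country) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    model  = map(data, ~ lm(lifeExp ~ year, data = .x)),
    glance = map(model, broom::glance)
  )

# Keep only the identifiers and the summaries, then flatten:
# each one-row glance data frame becomes columns of the result
fit_quality <- models %>%
  select(continent, country, glance) %>%
  unnest(glance)

# Countries where a straight line fits worst
fit_quality %>%
  arrange(r.squared) %>%
  head()
```

Selecting the columns before unnesting keeps the other list-columns (`data`, `model`) out of the flattened result.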

Page 23: PLOTCON NYC: New Open Viz in R

Demo

Page 24: PLOTCON NYC: New Open Viz in R

1. Store related objects in list-columns.

2. Learn FP so you can focus on verbs, not objects.

3. Use broom to convert models to tidy data.

Page 25: PLOTCON NYC: New Open Viz in R

[Diagram: data frames (dplyr), lists (purrr) and models, with tidyr linking data frames and lists, and broom linking models to data frames]

Workflow replaces many uses of ldply()/dlply() (plyr) and do() + rowwise() (dplyr)

http://r4ds.had.co.nz/

Page 26: PLOTCON NYC: New Open Viz in R

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/
