Heatmaps for Economic Analysis

Tom Cui, Eric Zwick (DRAFT)

October 5, 2016

ericzwick.com/heatmap/heatmaps.pdf

What is a heatmap?

- A two-dimensional visualization of data that uses colour to represent magnitude

- A broad definition, which can be divided into
  - Embedded heatmaps, which overlay colour on an actual map or image (not covered here)
  - Matrix heatmaps, which present a grid of values where colours differ by cell

Examples:

(1) The WSJ vaccine visualization (DeBold, Friedman 2015)

(2) Kaiser Fung's executions data

(3) (Bad) A "quilt plot" of Hep C prevalence (Wand et al)

(4) Gene expression data plotted over samples (TCGN 2013). Figure annotations: each row (∼1500) is one gene; dendrogram; each row is a protein

Some takeaways from these examples:

- The axes change the interpretation: (1) - (3) use time as the X axis and factors as the Y axis; (4) uses factors for both

- Heatmaps represent high-dimensional data well: (4) is an extreme example of this, but a common one in bioinformatics

- Permuting axis order improves interpretation: (2) sorts Y by total count over the sampling period; (4) uses cluster analysis (recall the dendrogram)
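The row-ordering idea in the last takeaway can be sketched in base R. This is only an illustration on made-up counts: the matrix `counts` and its labels are hypothetical and do not come from any of the examples. The rows are permuted so that the Y axis is sorted by total count over the sampling period.

```r
# Hypothetical count matrix: rows are factors (Y axis), columns are years (X axis)
counts <- matrix(c(1, 0, 2,
                   5, 7, 6,
                   3, 2, 4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"), c("2001", "2002", "2003")))

# Sort rows by their total over the sampling period, largest first
row_order <- order(rowSums(counts), decreasing = TRUE)
sorted_counts <- counts[row_order, ]

rownames(sorted_counts)
```

A cluster-based ordering, as in example (4), would replace `row_order` with the leaf order of a dendrogram.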


Setting up a heatmap for economics

- In an ideal world, we could derive causal effects in a model Y = g(W) using exogenous assignment of W and observing the entire support of W

- Big data makes the latter easier. The former is still hard!

- Hence research designs that exploit a policy introduction or kink are popular

Now consider a heatmap where time is on the X axis (showing the policy introduction) and where W, a variable of interest or one related to a latent factor, is binned on the Y axis (showing the support of W).
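A minimal sketch of this setup in base R, on simulated data. Every name here (`panel`, `w_unit`, the policy date in period 4, the effect size) is an assumption made for illustration. W is binned into quantiles for the Y axis, time goes on the X axis, and the mean outcome in each cell is what the colours would show.

```r
set.seed(1)
# Simulated panel: 200 units observed over 6 periods (all names hypothetical)
n_units <- 200
n_periods <- 6
panel <- data.frame(
  id = rep(seq_len(n_units), times = n_periods),
  t  = rep(seq_len(n_periods), each = n_units)
)
w_unit <- runif(n_units)              # unit-level exposure W
panel$w <- w_unit[panel$id]
# Outcome responds to W only after a policy introduced in period 4
panel$y <- 1 + 0.5 * panel$w * (panel$t >= 4) + rnorm(nrow(panel), sd = 0.1)

# Bin W into 10 quantile groups (the Y axis)
breaks <- quantile(w_unit, probs = seq(0, 1, length.out = 11))
panel$w_bin <- cut(panel$w, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Mean outcome in each (W bin, period) cell: this matrix is what the heatmap colours
cell_means <- tapply(panel$y, list(panel$w_bin, panel$t), mean)
dim(cell_means)
```

Reading across a row of `cell_means` traces one exposure group through the policy introduction; reading down a column compares exposure groups at one date.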

Example: Scaled house sales in a heatmap sorted by FTHB exposure, from Berger, Turner, Zwick (2016)

[Figure: heatmap with months from Aug 2007 to Feb 2011 on the X axis, quantile of instrument (10 to 100) on the Y axis, and a colour scale for mean outcomes from 0.5 to 1.3]

Using earlier takeaways:

- The axes change the interpretation: placing time on X and an instrument of W on Y implies this heatmap is a visualization of nonparametric regression

- Good representation of high-dimensional data: around 8600 ZIPs binned into 100 percentiles

- Permuting axis order improves interpretation: the Y axis is sorted to be increasing in W's instrument, and the figure tells us the effect of W on Y is positive in a linear model

Extensions:

- Quantiles of instrument on X, other variables on Y, plotting means = covariate balance check

- Time on X, portfolios on Y, plotting market-adjusted returns = financial event study

- Time on X, generation on Y, plotting average of a simulated policy function = OLG model dynamics

- Index determining policy entry on X, quantiles of dependent variable on Y, plotting observation counts in each bin = fuzzy RDD

and so on.
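The last layout can be sketched in base R as a two-way table of counts. This shows only the axis setup, not an RDD estimator; the data are simulated, and the threshold at 0 and the bin widths are assumptions chosen for illustration.

```r
set.seed(2)
n <- 1000
index <- runif(n, -1, 1)                  # running variable; policy applies when index >= 0
outcome <- 0.3 * (index >= 0) + rnorm(n)  # hypothetical dependent variable

# X axis: equal-width bins of the index, with a bin edge at the threshold 0
x_bin <- cut(index, breaks = seq(-1, 1, by = 0.25), include.lowest = TRUE)
# Y axis: quintiles of the dependent variable
y_bin <- cut(outcome, breaks = quantile(outcome, probs = seq(0, 1, by = 0.2)),
             include.lowest = TRUE, labels = paste0("Q", 1:5))

# Observation counts per cell: the fill variable of the heatmap
cell_counts <- table(y_bin, x_bin)
dim(cell_counts)
```

A shift in the distribution of the dependent variable at the threshold would show up as a change in which rows carry the most mass on either side of the bin edge at 0.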


The heatmapEco package

- Many programs for creating heatmaps exist
  - Stata: twoway contour, hmap
  - R: base, gplots, ggplot2, d3heatmap, ...
  - Matlab, and Python's matplotlib

So why another package?

- heatmapEco makes it easy to build informative heatmaps by
  - Focusing on axis setup as a design framework;
  - Computing relevant axis permutations;
  - Executing prerequisite data cleaning.

- Even complicated heatmaps like TCGN's are quite uncomplicated underneath: they are literally a projection of some tabular data

- In other words, the data loaded in is a 373x1500 matrix. The values are then standardized, and the variables are clustered and given a colour

- But data may instead need to be aggregated and reshaped, axes relabelled, and colour palettes adjusted to show significant results

- heatmapEco combines R packages to simplify these changes and adds design features of its own
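The "standardize, cluster, colour" steps described for a matrix heatmap can be sketched with base R alone. The matrix below is random and much smaller than 373x1500, and the choice of 9 colour classes is an assumption. The sketch is not how heatmapEco or the TCGN figure is implemented.

```r
set.seed(3)
# Hypothetical table: 20 samples (rows) by 8 variables (columns)
x <- matrix(rnorm(20 * 8), nrow = 20,
            dimnames = list(NULL, paste0("v", 1:8)))

# Standardize each variable to mean 0 and standard deviation 1
z <- scale(x)

# Cluster the variables and reorder columns by dendrogram leaf order
hc <- hclust(dist(t(z)))
z_ordered <- z[, hc$order]

# Map standardized values to one of 9 colour classes
colour_class <- cut(z_ordered, breaks = 9, labels = FALSE)
dim(colour_class) <- dim(z_ordered)
```

`colour_class` is then an integer matrix that a plotting function could map to a palette cell by cell.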

Workflow:

- Stata (heatmap): Residualize data -> Aggregate data to axis bins -> OUTPUT: aggregated CSV

- R (heatmapEco): Residualize data -> Aggregate data to axis bins -> Aggregated dataset -> Axes defined w/ options -> heatmap built with ggplot2 -> OUTPUT: heatmap PDF

heatmapEco axes

- Currently, the X axis can be set up as:
  - An index axis over numeric values (income, policy thresholds)
  - A time axis, where time strings are converted into valid axis values by the package

- Currently, the Y axis can be set up as:
  - A factor axis, where each entry is some (aggregated) grouping
  - A quantile axis, where a continuous instrument is split into N quantiles

Currently output is in landscape letter format, but ultimately axis placement should be arbitrary and portrait-format heatmaps possible.
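To make the time axis concrete, here is one way time strings could be turned into sortable axis values in base R. The helper `parse_month_year` is hypothetical and is not the package's own conversion routine; it assumes strings of the form "Jan 2009".

```r
# Parse strings such as "Jan 2009" into Date objects (first day of the month).
# month.abb is base R's vector of English month abbreviations, so this does
# not depend on the session locale.
parse_month_year <- function(s) {
  parts <- strsplit(s, " ", fixed = TRUE)
  mon  <- match(vapply(parts, `[`, character(1), 1), month.abb)
  year <- as.integer(vapply(parts, `[`, character(1), 2))
  as.Date(sprintf("%04d-%02d-01", year, mon))
}

time_strings <- c("Aug 2007", "Feb 2008", "Jan 2009")
axis_values <- parse_month_year(time_strings)
format(axis_values)
```

Because the result is a `Date` vector, columns can be ordered chronologically and a policy date can be placed on the same scale.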

heatmapEco aggregation

In R, the aggregation process is specified using a pseudo-formula

Z ~ CrS(Y,ID,w):X(t)

where

- Z is the dependent variable, or the fill variable

- Y is the factor independent variable or a continuous instrument to be binned

- X is the index or time axis

- t allows a time-varying Y to be sorted on its values at a time t (use with caution)

- ID is the individual identifier, either unique on its own or unique together with t

- w are quantile weights

In Stata the syntax is

heatmap Z Y X [weights], id(varname) [t sort(string)]
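One reading of "quantile weights" is that bins are cut so each holds an equal share of total weight rather than an equal number of observations. The sketch below implements that reading in base R. The function `weighted_bins` is hypothetical, and the package may define weighted quantiles differently.

```r
# Assign each observation to one of n weighted quantile bins: bins are cut so
# that each holds roughly an equal share of total weight rather than an equal
# number of observations.
weighted_bins <- function(y, w, n) {
  ord <- order(y)
  cum_share <- cumsum(w[ord]) / sum(w)   # cumulative weight share, in (0, 1]
  bins <- integer(length(y))
  bins[ord] <- pmin(n, ceiling(cum_share * n))
  bins
}

y <- c(1, 2, 3, 4)
weighted_bins(y, w = c(1, 1, 1, 1), n = 2)  # equal weights
weighted_bins(y, w = c(3, 1, 1, 1), n = 2)  # first observation carries half the weight
```

With equal weights the four observations split two and two; when the first observation carries half the total weight, it fills the first bin on its own.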


- Note that, in R, an anonymous function can be passed as an argument

- This means the aggregation function argument grp.func can take many forms, so long as a summary function is involved
  - E.g. take the median of a quantile-month bin, or take the log transform of that median
  - Or add control flow: if the data are censored, first remove the censored observations and output the log median of what remains

- Stata's aggregation features are much less rich: any collapse function can be passed to grpfunc
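The grouping functions listed above might look as follows in R. `tapply` stands in for the package's internal aggregation, which is not shown in the slides; the data and the rule that zeros are the censored observations are assumptions.

```r
set.seed(4)
# Hypothetical long data: a value observed in quantile-month bins
d <- data.frame(
  bin   = rep(1:3, each = 8),
  value = c(rexp(23, rate = 0.1), 0)   # last observation is censored at 0
)

# Three possible grouping functions, in increasing order of complexity
med      <- function(x) median(x)
log_med  <- function(x) log(median(x))
# Control flow: drop censored (zero) observations, then take the log median
log_med_uncensored <- function(x) {
  x <- x[x > 0]
  if (length(x) == 0) NA_real_ else log(median(x))
}

# Any of them can be applied bin by bin, here with tapply as a stand-in
tapply(d$value, d$bin, med)
tapply(d$value, d$bin, log_med)
tapply(d$value, d$bin, log_med_uncensored)
# An anonymous function works the same way
tapply(d$value, d$bin, function(x) max(x) - min(x))
```

Each function reduces the values in one bin to a single number, which is the only requirement the slides place on grp.func.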


heatmapEco residualization

Both the dependent and independent variables (fill and Y axis) can first be residualized according to a model

Y = βW + Dθ + Fψ + Xγ + ε

where D and F are fixed effects and X are controls.

The Stata implementation uses base areg. The R implementation uses plm or lfe (TODO).
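A sketch of residualization in base R, under the assumption that "residualized" means partialling out the fixed effects and controls. It uses `lm` with factor dummies as a simple stand-in for areg, plm or lfe, on simulated data with hypothetical variable names.

```r
set.seed(5)
n <- 120
d <- data.frame(
  zip   = factor(rep(1:10, each = 12)),    # fixed effect D (hypothetical)
  month = factor(rep(1:12, times = 10)),   # fixed effect F (hypothetical)
  x     = rnorm(n)                         # control X
)
d$w <- rnorm(n) + as.integer(d$zip) / 5
d$y <- 2 * d$w + as.integer(d$zip) + 0.5 * d$x + rnorm(n)

# Residualize the fill variable and the axis variable on fixed effects and controls
d$y_res <- resid(lm(y ~ zip + month + x, data = d))
d$w_res <- resid(lm(w ~ zip + month + x, data = d))

# Frisch-Waugh-Lovell: the slope on residualized data equals the full-model slope
beta_res  <- coef(lm(y_res ~ w_res, data = d))[["w_res"]]
beta_full <- coef(lm(y ~ w + zip + month + x, data = d))[["w"]]
```

The equality of `beta_res` and `beta_full` is why a heatmap built from residualized variables can be read as a picture of β net of the fixed effects and controls.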

Colour palettes

[Figure: standard divergent colour palette]

[Figure: semi-sequential palette for count data]

- On the standard palette, the two far shades are reserved for outlier detection: binned values beyond the 1.5 × IQR range are considerably darker

- Standard colours are not equally spaced: the distribution below the median takes longer to reach dark blue hues. This is to emphasize "Ashenfelter dips"

- The count data palette is ColorBrewer YlOrBr, with high outliers and a muted hue to deemphasize data censored at 0 (by default)
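The outlier shading can be illustrated with Tukey's fences in base R, assuming the rule intended is the usual one: values more than 1.5 times the interquartile range outside the quartiles. The package's exact rule is not given in the slides, and the data and labels below are made up.

```r
# Hypothetical binned values, with two clear outliers
binned <- c(0.9, 1.0, 1.1, 0.95, 1.05, 1.0, 2.5, 0.1)

q   <- quantile(binned, probs = c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[2]] + 1.5 * iqr

# Classify each value; the outlier classes would get the two extreme shades
shade <- ifelse(binned > upper_fence, "high outlier",
         ifelse(binned < lower_fence, "low outlier", "regular"))
shade
```

Reserving separate shades for the two outlier classes keeps a few extreme cells from compressing the colour range used by the rest.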


heatmapEco Examples


WSJ replication

Download the data from Project Tycho. The cleaning in R:

    library(data.table)
    # Reshape to long format, keeping YEAR and WEEK as identifiers
    obj <- melt(fread("MEASLES_Incidence_1930-2003.csv"),
                c("YEAR", "WEEK"))
    obj[, value := as.numeric(value)]

Calling heatmapEco:

    # Sum that returns NA only when every value is missing
    nasum <- function(...)
      if (all(is.na(...))) NA else sum(..., na.rm=TRUE)
    heatmapEco(value ~ CrS(variable,variable):YEAR, obj,
               t.fmt="%Y", t.per="year", pol.break=c("Jan 1963"),
               grp.func=nasum, count=T, factor.ax=T, outliers=T, split.x=10,
               zlab="Measles Incidence (p100,000)", save="measlesRep.pdf")
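To see what the grouping step computes, the sketch below applies an NA-aware sum to a tiny hand-made table shaped like the melted data. The values are invented, `nasum` is written here as a single-argument variant of the helper, and `tapply` stands in for the package's aggregation.

```r
# NA-aware sum: NA only when every value in the bin is missing
nasum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

# Hypothetical long-format data, shaped like the melted incidence table
long <- data.frame(
  YEAR     = c(1930, 1930, 1930, 1930, 1931, 1931, 1931, 1931),
  variable = c("ALABAMA", "ALABAMA", "ALASKA", "ALASKA",
               "ALABAMA", "ALABAMA", "ALASKA", "ALASKA"),
  value    = c(4.2, NA, NA, NA, 1.0, 2.5, 0.3, NA)
)

# Aggregate weeks into one cell per (state, year)
cells <- tapply(long$value, list(long$variable, long$YEAR), nasum)
cells
```

A state-year with some reported weeks gets the sum of those weeks, while a state-year with no reports at all stays NA, so it can be grayed out instead of being shown as zero.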

[Figure: the resulting heatmap, with US states (Alabama to Wyoming, including the District of Columbia) on the Y axis, years from 1930 to 2000 on the X axis, and a colour scale for measles incidence (p100,000) from 0 to 2000]

Line by line:

- heatmapEco(value ~ CrS(variable,variable):YEAR, obj,
  Inputs the formula for aggregation and the dataset

- t.fmt="%Y", t.per="year", pol.break=c("Jan 1963"),
  Time is in pure "year" format; date of the policy line

- grp.func=nasum [nasum <- function(...) if (all(is.na(...))) NA else sum(..., na.rm=TRUE)]
  The grouping function is summation, excluding NAs (a year with only NAs is recorded as NA and grayed out)

- count=T, factor.ax=T, outliers=T, split.x=10,
  Use the count colour palette; the Y axis holds state factors; turn on outlier perception; X tick every ten units

- zlab="Measles Incidence (p100,000)", save="measlesRep.pdf")
  Fill label and output location.

Overall: 9 lines of code w/ data.table

- 9 lines fewer than base w/ heatmap.2

- 25 lines fewer than pure ggplot2


The Berger, Turner, Zwick heatmap

Let's call the program from Stata this time

    heatmap y3_trim fthomebuyers_filingunits_2000 mdate ///
        [aw=totalhsales_base], n(100) id(zip) tperiod(yearmon) ///
        ylabel(10) polbreak(Jan 2009, Dec 2009, Jul 2010) ///
        save(BTZRep.pdf)

- The default group function is the mean, but the quantiles are weighted

- Each column is a month, labelled appropriately

- polbreak() interprets time strings and adds policy lines accordingly

- ylabel(n) divides the y-axis labels into n even intervals

23 / 30

Page 52: Tom Cui, Eric Zwick (DRAFT) October 5, 2016ericzwick.com/heatmap/heatmaps.pdf · 2020-07-03 · Setting up a heatmap for economics Using earlier takeaways: I The axes change the interpretation

The Berger, Turner, Zwick heatmap

Another perspective: check the standard errors on the mean estimates over a coarser partition

heatmap y3_trim fthomebuyers_filingunits_2000 mdate ///

[aw=totalhsales_base], n(25) id(zip) tperiod(yearmon) ///

grpfunc(sem) ylabel(5) count out ///

polbreak(Jan 2009, Dec 2009, Jul 2010) save(BTZRep_se.pdf)

[Figure: heatmap produced by the command above. Y axis: quantile of instrument (25 bins, labelled 5, 10, 15, 20, 25). X axis: month, Aug 2007 to Feb 2011. Colour legend, titled "Mean Outcomes": 0.015 to 0.035.]
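As a rough illustration of what a standard-error group function computes per cell, here is a minimal Python sketch. It assumes the usual unweighted formula (sample standard deviation divided by the square root of the cell count). The name `cell_sem` is hypothetical, and the slides do not say how `grpfunc(sem)` treats the analytic weights.

```python
import math
from collections import defaultdict


def cell_sem(cells, outcomes):
    """Standard error of the mean within each cell: sample sd / sqrt(n).
    Cells with fewer than two observations get None."""
    groups = defaultdict(list)
    for cell, y in zip(cells, outcomes):
        groups[cell].append(y)
    result = {}
    for cell, ys in groups.items():
        n = len(ys)
        if n < 2:
            result[cell] = None
            continue
        mean = sum(ys) / n
        # Sample variance with the n - 1 denominator.
        var = sum((y - mean) ** 2 for y in ys) / (n - 1)
        result[cell] = math.sqrt(var / n)
    return result
```

Moving from n(100) to n(25) puts roughly four times as many observations in each cell. If the within-cell spread stays similar, each standard error falls by about half, which is one reason to look at the coarser partition.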


Conclusions


When not to use heatmaps

I Heatmaps are not a panacea: there is a tradeoff between

I Higher density of effectively presented data;

I Information lost in using colours, instead of geometric shapes, to represent change

I It is also unclear how heatmaps can display the uncertainty of statistics plotted in each bin, e.g. confidence intervals

I A good argument for a package that simplifies heatmap creation: the less time spent making a visualization, the less likely one is to get overattached to it when a better solution exists


When not to use heatmaps

A good heuristic (define Z as the variable plotted with colour):

I Plotting quantiles on the Y axis: How much clarity is gained relative to overlapping line graphs split by Y? What information is lost?

I Plotting a factor variable on the Y axis: How much clarity is gained relative to a small multiple plot split by Y? What information is lost?


When not to use heatmaps

Example: Measles vaccine revisited

[Figure: small multiples of line graphs, one panel per U.S. state plus the District of Columbia (Alabama to Wyoming). Y axis: measles incidence (per 100,000), 0 to 3000. X axis: year, 1920 to 2000. Note: "Graphs by U.S. state".]


When not to use heatmaps

Example: visualizing positive assortative matching

(L: Card, Heining & Kline (2012); R: Hagedorn, Law & Manovskii (2016))

How would the interpretation change if the visualization instead overlaid many marginals on one another? Or if it showed small multiples of marginals?


Future updates

I Easy addition of side plots to the heatmap (a histogram on both axes, time series, bar plot of differences over two periods...)

I Syntax revisions

I Let either axis support variables belonging to one of four types (time, factor, quantile, index)

I Variable dimensions for heatmap cells (for uneven discretizations of a continuous variable)

I ???


References

Berger, David, Nicholas Turner, and Eric Zwick. 2016. “Stimulating Housing Markets.” Working Paper.

Card, David, Jorg Heining, and Patrick Kline. 2012. “Workplace heterogeneity and the rise of West German wage inequality.” National Bureau of Economic Research.

DeBold, Tynan, and Dov Friedman. 2015. “Battling Infectious Diseases in the 20th Century: The Impact of Vaccines.” The Wall Street Journal, (11).

Eisen, Michael B, Paul T Spellman, Patrick O Brown, and David Botstein. 1998. “Cluster analysis and display of genome-wide expression patterns.” Proceedings of the National Academy of Sciences, 95(25): 14863–14868.

Fung, Kaiser. n.d. “Advocacy graphics.” http://junkcharts.typepad.com/junk_charts/2014/04/advocacy-graphics.html, Accessed: 2016-03-14.

Hagedorn, Marcus, Tzuo Hann Law, and Iourii Manovskii. 2016. “Identifying equilibrium models of labor market sorting.”

Cancer Genome Atlas Research Network, et al. 2013. “Integrated genomic characterization of endometrial carcinoma.” Nature, 497(7447): 67–73.

Wand, Handan, et al. 2014. “Quilt Plots: A Simple Tool for the Visualisation of Large Epidemiological Data.” PLOS One, (11).


Thanks!
