Heat (and hexagon) plots in Stata
Ben Jann
University of Bern, [email protected]
2019 London Stata Conference, London, September 5–6, 2019
Ben Jann (University of Bern) heatplot London, 05.09.2019 1
Outline
1 Introduction
2 Syntax of heatplot and hexplot
3 Examples
   Bivariate histogram
   Trivariate distributions
   Display values as marker labels
   Correlation matrix
   Dissimilarity matrix
   Spatial weights matrix
4 Installation
What is a heat plot?
Generally speaking, a heat plot is a graph in which some aspect of the data is displayed as a color gradient.
A simple example is a bivariate histogram; the color gradient is used to illustrate (relative) frequencies within bins of X and Y.
. quietly drawnorm y x, n(10000) corr(1 .5 1) cstorage(lower) clear
. heatplot y x, backfill colors(plasma)
[Figure: bivariate histogram of y (vertical axis, -4 to 4) against x (horizontal axis, -4 to 4); color key: percent, .04107 to .84893]
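The color of each square is the percentage of all observations that fall into the corresponding cell of a grid over x and y. A minimal Python sketch of that computation follows. It assumes equal-width bins spanning the observed range of each variable, and the function name is hypothetical, so it illustrates the idea rather than heatplot's own binning rule.

```python
def bivariate_histogram(x, y, xbins, ybins):
    """Return a ybins-by-xbins grid with the percent of observations per cell.

    Assumption: equal-width bins spanning [min, max] of each variable;
    each variable must take at least two distinct values.
    """
    xmin, xmax = min(x), max(x)
    ymin, ymax = min(y), max(y)
    xwidth = (xmax - xmin) / xbins
    ywidth = (ymax - ymin) / ybins
    counts = [[0] * xbins for _ in range(ybins)]
    for xi, yi in zip(x, y):
        # the maximum would land in bin number `bins`; clamp it into the last bin
        col = min(int((xi - xmin) / xwidth), xbins - 1)
        row = min(int((yi - ymin) / ywidth), ybins - 1)
        counts[row][col] += 1
    n = len(x)
    return [[100.0 * c / n for c in row] for row in counts]


grid = bivariate_histogram([0, 1, 2, 3], [0, 0, 3, 3], xbins=2, ybins=2)
print(grid)  # [[50.0, 0.0], [0.0, 50.0]]
```

Rows of the result correspond to y bins and columns to x bins; the percentages always add up to 100.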
What about hexagons?
Hexagons are great because they look a bit like circles, but you can join them together without leaving gaps.
Bees found out a long time ago how awesome hexagons are.
What about hexagons?
Later on, gully cover designers found out that hexagons look great on gully covers.
What about hexagons?
Finally, statisticians also discovered the virtues of hexagons.
“There are many reasons for using hexagons, at least over squares. Hexagons have symmetry of nearest neighbors which is lacking in square bins. Hexagons are the maximum number of sides a polygon can have for a regular tesselation of the plane, so in terms of packing a hexagon is 13% more efficient for covering the plane than squares. This property translates into better sampling efficiency at least for elliptical shapes. Lastly hexagons are visually less biased for displaying densities than other regular tesselations. For instance with squares our eyes are drawn to the horizontal and vertical lines of the grid.”¹
¹ Lewin-Koh, N. (2018). Hexagon Binning: an Overview. Available from https://cran.r-project.org/web/packages/hexbin/vignettes/hexagon_binning.pdf
Example from above using hexagons
. hexplot y x, backfill colors(plasma)
[Figure: the same bivariate histogram of y against x (both axes -4 to 4) drawn with hexagonal bins; color key: percent, .0425 to .8875]
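Assigning an observation to a hexagon is a little less obvious than assigning it to a square. One common approach, sketched below in Python, treats the hexagon centers as the union of two rectangular grids, one shifted by half a cell in each direction, and picks whichever of the two nearest candidate centers is closer. This is an illustration of the general technique under that assumption, not hexplot's actual code; the function names and the parameter `dx` (horizontal spacing of centers) are hypothetical.

```python
import math
from collections import Counter


def hex_bin(x, y, dx):
    """Return a key identifying the hexagon that contains (x, y).

    Hexagon centers are the union of two rectangular lattices:
      lattice A at (i*dx, j*dy) and lattice B at ((i+0.5)*dx, (j+0.5)*dy),
    with dy = dx*sqrt(3). Together they form a triangular lattice whose
    nearest-center regions are regular hexagons, so the bin is whichever
    of the two nearest candidate centers is closer.
    """
    dy = dx * math.sqrt(3)
    # nearest center on lattice A: round each coordinate to the grid
    ia, ja = round(x / dx), round(y / dy)
    da = (x - ia * dx) ** 2 + (y - ja * dy) ** 2
    # nearest center on lattice B: the grid shifted by half a cell
    ib, jb = math.floor(x / dx), math.floor(y / dy)
    db = (x - (ib + 0.5) * dx) ** 2 + (y - (jb + 0.5) * dy) ** 2
    return ("A", ia, ja) if da <= db else ("B", ib, jb)


def hex_counts(xs, ys, dx):
    """Count observations per hexagon."""
    return Counter(hex_bin(x, y, dx) for x, y in zip(xs, ys))


print(hex_counts([0.1, 0.0, 0.5], [0.1, 0.0, 0.8], dx=1.0))
```

Every center in this lattice has six neighbors at the same distance `dx`, which is the "symmetry of nearest neighbors" mentioned in the quote above.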
Why heat plots (be it squares or hexagons)?
Heat plots are great for visualizing structure in (large) datasets.
Here is an example:
. use example, clear
. count
  134,100
. list in 1/10

        X     Y           Z
  1.   16   193   .12484335
  2.  371    13   .00772907
  3.  157   380   .57315805
  4.  334   443   .31666994
  5.  424   205   .23699765
  6.   47   319   .30675008
  7.   50   288   .31003926
  8.  434     5   .03925507
  9.  180   303   .56515385
 10.  428   183   .21671468
Run some analyses . . .
. two (lpoly Z X, degree(1)) (lpoly Z Y), legend(order(1 "X" 2 "Y"))
[Figure: local polynomial smooths of Z on X and on Y; vertical axis "lpoly smooth: Z" (0 to .6), horizontal axis "lpoly smoothing grid" (0 to 500); legend: X, Y]
Interesting! We clearly see the business cycles and a general upward trend in country Y, but country X did not develop much and there has been a severe crisis between time 200 and 300.
Here is a heat plot of the data:
. hexplot Z Y X, xbins(10) ybins(15) levels(20) clip ///
>     xlabel(none) ylabel(none) aspect(`=447/300')
Here is a heat plot of the data:
. hexplot Z Y X, xbins(20) ybins(30) levels(20) clip ///
>     xlabel(none) ylabel(none) aspect(`=447/300')
Here is a heat plot of the data:
. hexplot Z Y X, xbins(40) ybins(60) levels(20) clip ///
>     xlabel(none) ylabel(none) aspect(`=447/300')
Here is a heat plot of the data:
. hexplot Z Y X, xbins(80) ybins(120) levels(20) clip ///
>     xlabel(none) ylabel(none) aspect(`=447/300')
Here is a heat plot of the data:
. hexplot Z Y X, xbins(160) ybins(240) levels(20) clip ///
>     xlabel(none) ylabel(none) aspect(`=447/300')
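With three variables, the color of each cell reflects an aggregate of Z within the (X, Y) bin, controlled by statistic(). Raising xbins() and ybins() shrinks the cells, which is why the hidden picture gets sharper from one version of the command to the next. The Python sketch below assumes the aggregate is the mean and uses square cells for brevity; the function name and the explicit `xrange`/`yrange` arguments are hypothetical.

```python
from collections import defaultdict


def binned_mean(x, y, z, xbins, ybins, xrange, yrange):
    """Mean of z within each cell of an equal-width (x, y) grid.

    xrange and yrange are (low, high) tuples. Keys of the result are
    (column, row) cell indices; cells without observations are omitted.
    """
    (xlo, xhi), (ylo, yhi) = xrange, yrange
    xw, yw = (xhi - xlo) / xbins, (yhi - ylo) / ybins
    sums = defaultdict(float)
    counts = defaultdict(int)
    for xi, yi, zi in zip(x, y, z):
        # clamp so values equal to the upper limit fall into the last cell
        cell = (min(int((xi - xlo) / xw), xbins - 1),
                min(int((yi - ylo) / yw), ybins - 1))
        sums[cell] += zi
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


print(binned_mean([1, 2, 9], [1, 2, 9], [0.2, 0.4, 1.0], 2, 2, (0, 10), (0, 10)))
```

Doubling both bin counts quadruples the number of cells, so each mean is based on fewer observations; that is the usual trade-off between resolution and noise.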
2 Syntax of heatplot and hexplot
Main commands
Bivariate histogram
    heatplot Y X [if] [in] [weight] [, options]
Trivariate heat plot (color gradient for Z)
    heatplot Z Y X [if] [in] [weight] [, options]
Heat plot from Stata matrix
    heatplot matname [, options]
Heat plot from Mata matrix
    heatplot mata(name) [, options]
Heat plot using hexagons
    hexplot ...
Main options
Color gradient options
    levels(#)              number of color bins
    cuts(numlist)          custom cutpoints for color bins
    colors(palette)        color map to be used for the color bins
    statistic(stat)        how Z is aggregated
    size[(exp)] | sizeprop size of color fields
    values(options)        display values as marker labels
    scatter[(...)]         render color fields as scatter plot
    keylabels(spec)        how legend keys are labeled
    . . .
Binning of Y and X
    [x|y]bins(spec)        how continuous Y and X are binned
    [x|y]bwidth(spec)      alternative to bins()
    [x|y]discrete[(#)]     treat variables as discrete and omit binning
    (note: categorical X and Y can be specified as i.varname)
    . . .
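levels(#) sets how many color bins there are, while cuts(numlist) fixes their cutpoints directly. The Python sketch below assumes that levels(#) splits the range of the plotted values into # equal-width intervals and that each cell takes the color of the interval its value falls into, with left-closed intervals and the top cutpoint included in the last bin. These conventions are assumptions for illustration, not a description of heatplot's code, and the function names are hypothetical.

```python
import bisect


def equal_cuts(lo, hi, levels):
    """levels+1 equally spaced cutpoints from lo to hi."""
    step = (hi - lo) / levels
    return [lo + k * step for k in range(levels + 1)]


def color_index(value, cuts):
    """0-based index of the color bin [cuts[k], cuts[k+1]) holding value.

    The top cutpoint is included in the last bin; values outside
    [cuts[0], cuts[-1]] return None.
    """
    if value < cuts[0] or value > cuts[-1]:
        return None
    return min(bisect.bisect_right(cuts, value) - 1, len(cuts) - 2)


cuts = equal_cuts(0.0, 1.0, 4)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print([color_index(v, cuts) for v in (0.0, 0.3, 0.75, 1.0)])
```

With levels(20), as in the hexplot Z Y X examples, the same logic would yield 21 cutpoints and 20 colors.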
Main options
Matrix options
    drop(numlist)          drop elements equal to values in numlist
    lower                  display lower triangle only
    upper                  display upper triangle only
    . . .
Graph options
    addplot(plots)         add other plots to the graph
    by(varlist[, options]) repeat plot by subgroups
    twoway_options         general twoway options
    . . .
Some more options related to storing results . . .
3 Examples: Bivariate histogram
Default
. webuse nhanes2, clear
. heatplot weight height
[Figure: bivariate histogram of weight (kg), 0 to 200, against height (cm), 140 to 200; color key: percent, .03929 to .86884]
Change resolution
. heatplot weight height, xbins(20) ybwidth(10 30)
[Figure: the same histogram of weight (kg) against height (cm) with coarser bins; color key: percent, .15651 to 4.2682]
Use counts, change color ramp, change binning, and labeling
. heatplot weight height, statistic(count) color(plasma, reverse) ///
>     cut(1(5)@max) keylabels(, range(1))
[Figure: counts of weight (kg) against height (cm); color key: count, in ranges 1-5, 6-10, ..., 86-90, 91-93]
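With integer counts, the key labels in this legend read 1-5, 6-10, ..., 91-93: each label runs from one cutpoint to one less than the next, and the last label ends at the maximum. The Python sketch below only reproduces that labeling pattern from the cutpoints 1(5)@max. It is not heatplot's implementation; reading range(1) as the gap of 1 between a label's upper end and the next cutpoint is an assumption, the maximum of 93 is taken from the last label, and the function names are hypothetical.

```python
def numlist(start, step, stop):
    """Integer version of a Stata numlist start(step)stop, e.g. 1(5)93."""
    return list(range(start, stop + 1, step))


def range_labels(cuts, maximum, gap=1):
    """Label each color bin as 'lower-upper' for integer-valued data.

    The upper end of a bin is the next cutpoint minus `gap`;
    the last bin ends at `maximum`.
    """
    uppers = [c - gap for c in cuts[1:]] + [maximum]
    return [f"{lo}-{hi}" for lo, hi in zip(cuts, uppers)]


cuts = numlist(1, 5, 93)  # 1, 6, 11, ..., 91
labels = range_labels(cuts, maximum=93)
print(labels[0], labels[1], labels[-1])  # 1-5 6-10 91-93
```

The same rule gives the 96-98 label on the hexagon slides, where the largest count is 98.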
Use hexagons instead of squares
. hexplot weight height, statistic(count) color(plasma, reverse) ///
>     cut(1(5)@max) keylabels(, range(1))
[Figure: counts of weight (kg) against height (cm) in hexagonal bins; color key: count, in ranges 1-5, 6-10, ..., 91-95, 96-98]
Scale size of hexagons by relative frequency
. hexplot weight height, statistic(count) color(plasma) ///
>     cut(1(5)@max) keylabels(, range(1)) size
[Figure: hexagons of weight (kg) against height (cm), each scaled by its relative frequency; color key: count, in ranges 1-5, 6-10, ..., 96-98]
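With size, hexagons in sparsely populated bins are drawn smaller. The Python sketch below shows one plausible scaling rule, assuming the area of each hexagon is made proportional to its count relative to the largest count, so that the linear scale factor is the square root of that ratio. heatplot's exact rule may differ, and the function names are hypothetical.

```python
import math


def size_factors(counts):
    """Linear scale factor per bin so that area is proportional to count.

    The bin with the largest count keeps full size (factor 1.0).
    """
    largest = max(counts.values())
    return {b: math.sqrt(c / largest) for b, c in counts.items()}


def scale_polygon(vertices, center, factor):
    """Shrink a polygon towards its center by a linear factor."""
    cx, cy = center
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in vertices]


print(size_factors({"a": 100, "b": 25}))  # {'a': 1.0, 'b': 0.5}
```

A bin with a quarter of the maximum count is then drawn at half the width, which keeps the visual area in line with the frequency.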
Scaling also available with squares
. heatplot weight height, statistic(count) color(plasma) ///
>     cut(1(5)@max) keylabels(, range(1)) size
[Figure: squares of weight (kg) against height (cm), each scaled by its relative frequency; color key: count, in ranges 1-5, 6-10, ..., 91-93]
Adding other plots
. hexplot weight height, statistic(count) color(plasma) ///
>     cut(1(5)@max) keylabels(, range(1)) size ///
>     addplot(lpolyci weight height, degree(1) psty(p2) lw(*1.5) ac(%50) alc(%0))
[Figure: size-scaled hexagons of weight (kg) against height (cm) with a local linear fit and confidence band overlaid; color key: count, in ranges 1-5, 6-10, ..., 96-98]
3 Examples: Trivariate distributions
Gender distribution (proportion female) by weight and height
. webuse nhanes2, clear
. hexplot female weight height, color(PiYG) ylabel(25(25)175) cuts(0(.05)1)
[Figure: hexagons of weight (kg), 25 to 175, against height (cm), 140 to 200, colored by the proportion female; color key: female, .025 to .975 in steps of .05]
Same graph taking relative frequencies into account
. hexplot female weight height, color(PiYG) ylabel(25(25)175) cuts(0(.05)1) ///
>     sizeprop recenter p(lcolor(black) lwidth(vthin) lalign(center))
[Figure: the same plot with hexagon sizes proportional to relative frequencies; color key: female, .025 to .975]
Distribution of the body mass index by gender and its relation to high blood pressure
. heatplot highbp bmi i.sex, xdiscrete(0.9) yline(18.5 25) cuts(0(.05).75) ///
>     sizeprop recenter colors(inferno) plotregion(color(gs11)) ylabel(, nogrid)
[Figure: Body Mass Index (BMI), 10 to 60, by sex (Male, Female), colored by the proportion with high blood pressure; color key: highbp, .025 to .875]
Sea surface temperature by longitude, latitude, and date
. sysuse surface, clear
(NOAA Sea Surface Temperature)
. heatplot temperature longitude latitude, discrete(.5) statistic(asis) ///
>     by(date, legend(off)) ylabel(30(1)38) aspectratio(1)
[Figure: two panels, 01mar2011 and 11mar2011; vertical axis "30N to 38.5N" (30 to 38), horizontal axis "142E to 150E" (142 to 150); note "Graphs by date"]
Same plot using hexagons
. hexplot temperature longitude latitude, discrete(.5) statistic(asis) ///
>     by(date, legend(off)) ylabel(30(1)38) aspectratio(1)
[Figure: the same two panels (01mar2011, 11mar2011) drawn with hexagons; axes "30N to 38.5N" and "142E to 150E"; note "Graphs by date"]
Same plot using hexagons, adding the clip option
. hexplot temperature longitude latitude, discrete(.5) statistic(asis) clip ///
>     by(date, legend(off)) ylabel(30(1)38) aspectratio(1)
[Figure: the same two hexagon panels (01mar2011, 11mar2011) with the clip option; axes "30N to 38.5N" and "142E to 150E"; note "Graphs by date"]
3 Examples: Display values as marker labels
Display values as marker labels
. quietly sysuse auto, clear
. hexplot price weight mpg, values(format(%9.0f)) legend(off) aspectratio(1) ///
>     colors(plasma, intensity(.6)) p(lc(black) lalign(center))
[Figure: hexagons of Weight (lbs.), 1,000 to 5,000, against Mileage (mpg), 10 to 40, colored by price; each hexagon is labeled with its value, ranging from 3866 to 15906]
3 Examples: Correlation matrix
First store correlations in a matrix and then plot from there
. quietly sysuse auto, clear
. quietly correlate price mpg trunk weight length turn foreign
. matrix C = r(C)
. heatplot C, values(format(%9.3f)) color(hcl, diverging intensity(.6)) ///
>     legend(off) aspectratio(1)
[Figure: heat map of C, each cell labeled with its value:]
             price     mpg   trunk  weight  length    turn foreign
price        1.000  -0.469   0.314   0.539   0.432   0.310   0.049
mpg         -0.469   1.000  -0.582  -0.807  -0.796  -0.719   0.393
trunk        0.314  -0.582   1.000   0.672   0.727   0.601  -0.359
weight       0.539  -0.807   0.672   1.000   0.946   0.857  -0.593
length       0.432  -0.796   0.727   0.946   1.000   0.864  -0.570
turn         0.310  -0.719   0.601   0.857   0.864   1.000  -0.631
foreign      0.049   0.393  -0.359  -0.593  -0.570  -0.631   1.000
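The matrix C holds the Pearson correlations that correlate leaves in r(C). For readers who want to check the arithmetic outside Stata, a minimal Python sketch of the same computation on complete cases follows; the function name is hypothetical.

```python
import math


def corr_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    # deviations from the column means
    dev = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(d * d for d in col)) for col in dev]
    k = len(columns)
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


R = corr_matrix([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
# format each entry like Stata's %9.3f (width 9, three decimals)
for row in R:
    print(" ".join(f"{v:9.3f}" for v in row))
```

The second column is an exact multiple of the first and the third reverses it, so the off-diagonal correlations are exactly 1 or -1 up to rounding.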
Plot lower triangle only
. heatplot C, values(format(%9.3f)) color(hcl, diverging intensity(.6)) ///
>     legend(off) aspectratio(1) lower nodiagonal
[Figure: lower triangle of C without the diagonal, each cell labeled with its value:]
             price     mpg   trunk  weight  length    turn
mpg         -0.469
trunk        0.314  -0.582
weight       0.539  -0.807   0.672
length       0.432  -0.796   0.727   0.946
turn         0.310  -0.719   0.601   0.857   0.864
foreign      0.049   0.393  -0.359  -0.593  -0.570  -0.631
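lower nodiagonal keeps only the cells strictly below the main diagonal, that is, cells whose row index exceeds their column index. Because C is symmetric, no information is lost, and the first row (price) and the last column (foreign) drop out entirely. A small Python sketch of that selection, with a hypothetical function name:

```python
def lower_cells(matrix, names, diagonal=False):
    """(row name, column name, value) for cells below the diagonal.

    With diagonal=True, the diagonal cells are kept as well.
    """
    cells = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if j < i or (diagonal and i == j):
                cells.append((names[i], names[j], value))
    return cells


M = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
print(lower_cells(M, ["a", "b", "c"]))
```

For a k-by-k matrix this leaves k(k-1)/2 cells; with the seven variables above that is 21 labeled cells.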
3 Examples: Dissimilarity matrix
Preparation: Run a cluster analysis and obtain the dissimilarity matrix; add information on clusters to the matrix
. sysuse lifeexp, clear
(Life expectancy, 1998)
. keep if gnppc<.
(5 observations deleted)
. cluster wards popgrowth lexp gnppc
cluster name: _clus_1
. cluster generate N = groups(`=_N'), ties(fewer)
. cluster generate G = groups(5)
. sort G N
. matrix dissim D = popgrowth lexp gnppc
. mata: st_matrixcolstripe("D", strofreal(st_data(., "G N")))
. mata: st_matrixrowstripe("D", strofreal(st_data(., "G N")))
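matrix dissimilarity computes pairwise distances between observations; with no measure specified, Stata uses Euclidean (L2) distance. Because the data are sorted by cluster (G) and dendrogram order (N) first, members of a cluster occupy adjacent rows and columns, and each cluster forms one diagonal block. The Python sketch below shows both steps; the function names are hypothetical, and how heatplot uses the equation names to outline the blocks is not modeled here.

```python
import math


def euclidean_dissimilarity(rows):
    """Symmetric matrix of Euclidean distances between observations."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def cluster_blocks(groups):
    """Runs of equal labels as (label, first index, last index) triples.

    After sorting by cluster, each run is one diagonal block of the matrix.
    """
    blocks = []
    start = 0
    for k in range(1, len(groups) + 1):
        if k == len(groups) or groups[k] != groups[start]:
            blocks.append((groups[start], start, k - 1))
            start = k
    return blocks


D = euclidean_dissimilarity([(0, 0), (3, 4), (0, 4)])
print(D[0][1], cluster_blocks([1, 1, 2, 3, 3, 3]))
```

Note that the three variables enter unstandardized, so gnppc, which has by far the largest scale, dominates the distances.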
Display matrix with highlighted clusters
. heatplot D, equations(lcolor(red) lwidth(*2)) ///
>     plotregion(margin(zero)) legend(off) aspectratio(1) xscale(alt)
[Figure: dissimilarity matrix D with the five clusters, labeled 1 to 5 on both axes, outlined in red]
3 Examples: Spatial weights matrix
Copy some data
. copy http://www.stata-press.com/data/r15/homicide1990.dta .
. copy http://www.stata-press.com/data/r15/homicide1990_shp.dta .
Compute spatial weights matrix (this might take a while)
. use homicide1990
(S.Messner et al.(2000), U.S southern county homicide rates in 1990)
. spmatrix create contiguity W
. spmatrix matafromsp W id = W
. mata mata describe W

      # bytes   type                        name and extent
   15,949,952   real matrix                 W[1412,1412]
(matrix W has about 2 million cells)
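The size reported by mata describe follows directly from the dimensions, assuming each real element is stored as an 8-byte double:

```python
n = 1412                  # W has 1412 rows and 1412 columns
cells = n * n             # number of cells in the dense matrix
bytes_needed = cells * 8  # 8 bytes per real (double-precision) element
print(f"{cells:,} cells, {bytes_needed:,} bytes")
```

This gives 1,993,744 cells and 15,949,952 bytes, matching the output above, even though most cells of a contiguity matrix are zero.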
Heat plot of W with default settings, ignoring cells (i.e., weights) that are equal to zero
. heatplot mata(W), drop(0) aspectratio(1)
[Figure: heat map of W, Rows (0 to 1500) against Columns (0 to 1500); color key: sum, .92938 to 22.732]
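With default settings the 1412 × 1412 matrix is not drawn cell by cell: rows and columns are grouped into bins, and, as the legend title indicates, each colored field shows the sum of the weights in its bin. drop(0) removes zero cells first, so bins that contain only zeros are left empty instead of being colored. The Python sketch below shows that block aggregation, assuming an equal number of row and column bins of roughly equal size; the bin rule and the function name are assumptions.

```python
def block_sums(matrix, nbins, drop=(0,)):
    """Sum matrix cells within an nbins-by-nbins grid of row/column bins.

    Cells whose value is in `drop` are ignored; bins with no remaining
    cells are omitted from the result. Keys are (row bin, column bin).
    """
    nrows, ncols = len(matrix), len(matrix[0])
    sums = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value in drop:
                continue
            # map indices 0..n-1 onto bins 0..nbins-1
            key = (i * nbins // nrows, j * nbins // ncols)
            sums[key] = sums.get(key, 0) + value
    return sums


# contiguity matrix of four units on a line (1-2, 2-3, 3-4 are neighbors)
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(block_sums(W, nbins=2))
```

Finer bins, as in the hexagon version with bins(100), spread the same weights over more fields, so the sums per field become smaller.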
Hexagon plot with fine-grained resolution
. heatplot mata(W), drop(0) aspectratio(1) hexagon bins(100)
[Figure: hexagon heat map of W, Rows (0 to 1500) against Columns (0 to 1500); color key: sum, .34663 to 5.8325]
Plot each cell individually using the discrete option
. heatplot mata(W), drop(0) aspectratio(1) discrete color(black) p(lalign(center))
[Figure: every nonzero cell of W drawn individually in black, Rows (0 to 1500) against Columns (0 to 1500)]
Could also use the scatter option
. heatplot mata(W), drop(0) aspectratio(1) discrete color(black) scatter p(ms(p))
[Figure: nonzero cells of W drawn as point markers, Rows (0 to 1500) against Columns (0 to 1500)]
4 Installation
Installation
To install heatplot (and hexplot) type
. ssc install heatplot, replace
heatplot depends on the palettes package, which itself depends on the ColrSpace Mata library, so you may also want to type
. ssc install palettes, replace
. ssc install colrspace, replace