Lecture notes for Statistical Computing 1 (SC1)
Stat 590
University of New Mexico
Erik B. Erhardt
Fall 2015
Contents
1 More plots in R
1.1 Tree map plots (for hierarchical data)
1.2 Parallel sets plot (for categorical data)
1.3 Sankey plots (for categorical data)
1.4 Stream graphs (stacked density plots)
1.5 When data is (dis)agreeable
1.6 Corrgrams/correlogram correlation plots
1.7 Beeswarm boxplot
1.8 Back-to-back histogram
1.9 Graphs (networks) with directed edges
Chapter 1
More plots in R
A selection of plots for more visualization possibilities. Not all of these are
good. These are meant for consideration and discussion. We’ll visit these
footnote links as we go.
Much of the R code is not shown in the pdf; refer to the R code posted
on the website.
Also, there are lots of packages used in this chapter:
install.all <- FALSE
if (install.all) {
  install.list <- c("treemap", "corrgram", "ggplot2", "GGally", "ellipse",
                    "beeswarm", "plyr", "sna", "Hmisc", "reshape2")
  # install
  install.packages(install.list)
  # load
  lapply(install.list, library, character.only = TRUE)
}
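If only some of the packages are missing, it may be quicker to install just those. The following is a minimal sketch of that idea and is not part of the original course code; the name missing.pkgs is an illustrative assumption, and nothing is installed unless install.all is set to TRUE.

```r
install.all <- FALSE
install.list <- c("treemap", "corrgram", "ggplot2", "GGally", "ellipse",
                  "beeswarm", "plyr", "sna", "Hmisc", "reshape2")
# packages from the list that are not yet installed (hypothetical name)
missing.pkgs <- setdiff(install.list, rownames(installed.packages()))
# install only what is missing, and only when explicitly requested
if (install.all && length(missing.pkgs) > 0) {
  install.packages(missing.pkgs)
}
```

Printing missing.pkgs before switching install.all to TRUE shows what would be downloaded.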
1.1 Tree map plots (for hierarchical data)
A treemap is a space-filling visualization of hierarchical structures [1]. It’s
not an easy design [2] to get right. The treemap package does a good job.
library(treemap)
# Gross national income (per capita) in dollars per country in 2010.
data(GNI2010)
str(GNI2010)
## 'data.frame': 208 obs. of 5 variables:
## $ iso3 : chr "ABW" "AFG" "AGO" "ALB" ...
## $ country : chr "Aruba" "Afghanistan" "Angola" "Albania" ...
## $ continent : chr "North America" "Asia" "Africa" "Europe" ...
## $ population: num 108 34385 19082 3205 7512 ...
## $ GNI : num 0 410 3960 3960 0 ...
head(GNI2010, 10)
## iso3 country continent population GNI
## 1 ABW Aruba North America 108 0
## 2 AFG Afghanistan Asia 34385 410
## 3 AGO Angola Africa 19082 3960
## 4 ALB Albania Europe 3205 3960
## 5 ARE United Arab Emirates Asia 7512 0
## 6 ARG Argentina South America 40412 8620
## 7 ARM Armenia Asia 3092 3200
## 8 ASM American Samoa Oceania 68 0
## 9 ATG Antigua and Barbuda North America 88 13280
## 10 AUS Australia Oceania 22299 46200
# create treemap
# (treemap() replaces tmPlot(), which was deprecated as of treemap version 2.0)
treemap(GNI2010
  , index = c("continent", "iso3")
  , vSize = "population"
  , vColor = "GNI"
  , type = "value")
[1] http://en.wikipedia.org/wiki/Treemapping
[2] http://www.juiceanalytics.com/writing/10-lessons-treemap-design/
[Figure: treemap of GNI2010. Rectangles are labeled by iso3 code and grouped by continent (labels include Africa, Asia, Europe, North America, South America); area shows population and color shows GNI on a legend from 0 to 90000.]
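To see what the rectangles encode, it can help to compute the quantities by hand. The sketch below uses a small made-up data frame (the values are hypothetical, not taken from GNI2010) and assumes that each rectangle's area is proportional to its vSize value within the whole map.

```r
# hypothetical toy data with the same column layout as GNI2010
toy <- data.frame(
  continent  = c("Asia", "Asia", "Europe", "Europe", "Africa"),
  iso3       = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  population = c(600, 200, 100, 60, 40),
  GNI        = c(4000, 1200, 40000, 30000, 900)
)
# share of the total area taken by each country (leaf rectangle)
toy$area.share <- toy$population / sum(toy$population)
# share of the total area taken by each continent (parent rectangle)
continent.share <- tapply(toy$area.share, toy$continent, sum)
print(toy)
print(continent.share)
```

The leaf shares inside a continent add up to that continent's share, which is the nesting the index argument describes.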
Obama’s budget [3] looks better as a tree map than with another method [4].
Take a look at my Windows hard drive with SpaceSniffer.exe [5].
[3] http://www.nytimes.com/interactive/2010/02/01/us/budget.html?_r=0
[4] http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html?hp
[5] http://www.uderzo.it/main_products/space_sniffer/
1.2 Parallel sets plot (for categorical data)
Parallel sets plots [6] visualize cross-tabulated data and are most helpful for tables
of at least 3 dimensions.
## Parallel sets function
parallelset <- function(..., freq, col="gray", border=0, layer,
                        alpha=0.5, gap.width=0.05) {
  p <- data.frame(..., freq, col, border, alpha, stringsAsFactors=FALSE)
  n <- nrow(p)
  if(missing(layer)) { layer <- 1:n }
  p$layer <- layer
  np <- ncol(p) - 5
  d <- p[ , 1:np, drop=FALSE]
  p <- p[ , -c(1:np), drop=FALSE]
  p$freq <- with(p, freq/sum(freq))
  col <- col2rgb(p$col, alpha=TRUE)
  if(!identical(alpha, FALSE)) { col["alpha", ] <- p$alpha*256 }
  p$col <- apply(col, 2, function(x) do.call(rgb, c(as.list(x), maxColorValue = 256)))
  getp <- function(i, d, f, w=gap.width) {
    a <- c(i, (1:ncol(d))[-i])
    o <- do.call(order, d[a])
    x <- c(0, cumsum(f[o])) * (1-w)
    x <- cbind(x[-length(x)], x[-1])
    gap <- cumsum( c(0L, diff(as.numeric(d[o,i])) != 0) )
    gap <- gap / max(gap) * w
    (x + gap)[order(o),]
  }
  dd <- lapply(seq_along(d), getp, d=d, f=p$freq)
  par(mar = c(0, 0, 2, 0) + 0.1, xpd=TRUE )
  plot(NULL, type="n",xlim=c(0, 1), ylim=c(np, 1),
       xaxt="n", yaxt="n", xaxs="i", yaxs="i", xlab='', ylab='', frame=FALSE)
  for(i in rev(order(p$layer)) ) {
    for(j in 1:(np-1) )
      polygon(c(dd[[j]][i,], rev(dd[[j+1]][i,])), c(j, j, j+1, j+1),
              col=p$col[i], border=p$border[i])
  }
  text(0, seq_along(dd), labels=names(d), adj=c(0,-2), font=2)
  for(j in seq_along(dd)) {
    ax <- lapply(split(dd[[j]], d[,j]), range)
    for(k in seq_along(ax)) {
      lines(ax[[k]], c(j, j))
      text(ax[[k]][1], j, labels=names(ax)[k], adj=c(0, -0.25))
    }
  }
}
[6] http://stats.stackexchange.com/questions/12029/is-it-possible-to-create-parallel-sets-plot-using-r
data(Titanic)
myt <- subset(as.data.frame(Titanic), Age=="Adult",
              select=c("Survived","Sex","Class","Freq"))
myt <- within(myt, {
  Survived <- factor(Survived, levels=c("Yes","No"))
  levels(Class) <- c(paste(c("First", "Second", "Third"), "Class"), "Crew")
  color <- ifelse(Survived=="Yes","#008888","#330066")
})
with(myt, parallelset(Survived, Sex, Class, freq=Freq, col=color, alpha=0.2))
[Figure: parallel sets plot of adult Titanic passengers with three axes: Survived (Yes, No), Sex (Male, Female), and Class (First Class, Second Class, Third Class, Crew).]
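The plot is a picture of a cross-tabulation, so it is worth looking at the table itself. This is a minimal sketch using only base R; the object names adults and tab are illustrative and do not appear in the function above.

```r
data(Titanic)
adults <- subset(as.data.frame(Titanic), Age == "Adult")
# three-way table of counts: Survived by Sex by Class
tab <- xtabs(Freq ~ Survived + Sex + Class, data = adults)
# flattened view, one row per Survived/Sex combination
print(ftable(tab))
# marginal proportions; band widths on the Survived axis are proportional to these
print(prop.table(margin.table(tab, 1)))
```

Each ribbon in the plot corresponds to one cell of this table, which is why the display gets busy quickly as dimensions are added.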
1.3 Sankey plots (for categorical data)
Sankey diagrams [7] are a specific type of flow diagram, in which the width
of the arrows is shown proportionally to the flow quantity. They are
typically used to visualize energy, material, or cost transfers between
processes. One of the most famous Sankey diagrams is Charles Minard’s
Map [8] of Napoleon’s Russian Campaign of 1812. If I had known about
these earlier in my career, I would have used them to show how patients were
included/excluded for different reasons in an epidemiological study.
An R function is available [9], which is used below for patient tracking.
# My example (there is another example inside Sankey.R):
inputs = c(6, 144)
losses = c(6,47,14,7, 7, 35, 34)
unit = "n ="
labels = c("Transfers",
"Referrals\n","Unable to Engage",
"Consultation only",
"Did not complete the intake",
"Did not engage in Treatment",
"Discontinued Mid-Treatment",
"Completed Treatment",
"Active in \nTreatment")
SankeyR(inputs,losses,unit,labels)
# Clean up my mess
rm("inputs", "labels", "losses", "SankeyR", "sourc.https", "unit")
## Warning in rm("inputs", "labels", "losses", "SankeyR", "sourc.https", "unit"): object 'sourc.https' not found
[7] http://www.sankey-diagrams.com/
[8] http://en.wikipedia.org/wiki/File:Minard.png
[9] https://raw.github.com/gist/1423501/55b3c6f11e4918cb6264492528b1ad01c429e581/Sankey.R
[Figure: Sankey diagram of patient flow. Inputs: Transfers 6 (4%), Referrals 144 (96%). Outputs: Unable to Engage 6 (4%), Consultation only 47 (31.3%), Did not complete the intake 14 (9.3%), Did not engage in Treatment 7 (4.7%), Discontinued Mid-Treatment 7 (4.7%), Completed Treatment 35 (23.3%), Active in Treatment 34 (22.7%).]
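Before drawing a flow diagram it is worth checking that the flows balance and computing the percentages that end up as labels. The sketch below redoes that arithmetic in base R with the same counts as the example; the named vectors and the names pct.in and pct.out are my own illustrative choices, not part of Sankey.R.

```r
inputs <- c(Transfers = 6, Referrals = 144)
losses <- c("Unable to Engage" = 6, "Consultation only" = 47,
            "Did not complete the intake" = 14,
            "Did not engage in Treatment" = 7,
            "Discontinued Mid-Treatment" = 7,
            "Completed Treatment" = 35, "Active in Treatment" = 34)
# a flow diagram only balances if everything that enters also leaves
stopifnot(sum(inputs) == sum(losses))
total <- sum(inputs)
# percentages of the total flow, rounded to one decimal place
pct.in  <- round(100 * inputs / total, 1)
pct.out <- round(100 * losses / total, 1)
print(pct.in)
print(pct.out)
```

If the stopifnot() check fails, some patients are unaccounted for and the arrow widths would not add up.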
1.4 Stream graphs (stacked density plots)
The NY Times box office revenue plot [10] was one of the first stream graphs
created, showing 22 years of data where revenues have clearly grown over
time. The plots have been discussed in detail [11], as has how to create
them in R [12]. The two examples [13] [14] below provide a start.
## Stream graphs 1 (stacked density plots)
plot.stacked <- function(x, y, ylab="", xlab="", ncol=1,
                         xlim=range(x, na.rm=TRUE),
                         ylim=c(0, 1.2*max(rowSums(y), na.rm=TRUE)),
                         border=NULL, col=rainbow(length(y[1,]))) {
  ## reorder the columns so each curve first appears behind previous curves
  ## when it first becomes the tallest curve on the landscape
  #y <- y[, unique(apply(y, 1, which.max))]
  plot(x, y[,1], ylab=ylab, xlab=xlab, ylim=ylim, xaxs="i", yaxs="i", xlim=xlim, type="n")
  bottom <- 0*y[,1]
  for(i in 1:length(y[1,])) {
    top <- rowSums(as.matrix(y[,1:i]))
    polygon(c(x, rev(x)), c(top, rev(bottom)), border=border, col=col[i])
    bottom <- top
  }
  abline(h=seq(0,200000, 10000), lty=3, col="grey")
  legend("topleft", rev(colnames(y)), ncol=ncol, inset=0, fill=rev(col), bty="o", bg="white", cex=0.8, col=col)
  box()
}
#set.seed(1)
m <- 500
n <- 15
x <- seq(m)
y <- matrix(0, nrow=m, ncol=n)
colnames(y) <- seq(n)
for(i in seq(ncol(y))) {
  mu <- runif(1, min=0.25*m, max=0.75*m)
  SD <- runif(1, min=5, max=30)
  TMP <- rnorm(1000, mean=mu, sd=SD)
  HIST <- hist(TMP, breaks=c(0,x), plot=FALSE)
  fit <- smooth.spline(HIST$counts ~ HIST$mids)
  y[,i] <- fit$y
}
plot.stacked(x,y)
[10] http://www.nytimes.com/interactive/2008/02/23/movies/20080223_REVENUE_GRAPHIC.html
[11] http://leebyron.com/else/streamgraph/
[12] http://flowingdata.com/2012/07/03/a-variety-of-area-charts-with-r/
[13] http://stackoverflow.com/questions/13084998/streamgraphs-in-r
[14] http://gallery.r-enthusiasts.com/graph/Kernel_density_estimator%3Cbr%3EIllustration_of_the_kernels_30
[Figure: stacked area plot of the 15 simulated series; x axis from 100 to 500, y axis from 0 to 150, legend entries 15 down to 1.]
## Stream graphs 2 (stacked density plots)
require("RColorBrewer")
palette(brewer.pal(7,"Accent")[-4])
x <- rnorm(5) #c(-0.475,-1.553,-0.434,-1.019,0.395)
d1 <- density(x,bw=.3,from=-3,to=3)
par(mar=c(3, 2, 2, 3) + 0.1,las=1)
plot(d1,ylim=c(-.3,.6),xlim=c(-3,3),axes=FALSE,ylab="",xlab="",main="")
axis(1)
axis(4,0:3*.2)
abline(h=-.3,col="gray")
#rug(x)
mat <- matrix(0,ncol=512,nrow=5)
for(i in 1:5) {
  d <- density(x[i],bw=.3,from=-3,to=3)
  lines(d$x,(d$y)/5-.3,col=i+1)
  mat[i,] <- d$y/5
}
for(i in 2:5) mat[i,] <- mat[i,] + mat[i-1,]
usr <- par("usr")
mat <- rbind(0,mat)
#segments(x0=rep(usr[1],5),x1=rep(d$x[171],5),y0=mat[,171],y1=mat[,171],lty=3)
for(i in 2:6) polygon(c(d$x,rev(d$x)),c(mat[i,],rev(mat[i-1,])),col=i,border=NA)
#segments(x0=d$x[171],x1=d$x[171],y0=0,y1=d1$y[171],lwd=3,col="white")
lines(d1,lwd=2)
box()
#palette("default")
[Figure: stacked kernel density plot; x axis from -3 to 3, y axis from 0.0 to 0.6.]
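Both examples above stack the layers on a zero baseline. The arithmetic behind the stacking is just cumulative sums across the columns, and one common variant centers the whole stack around zero instead. The sketch below shows both on a tiny made-up matrix; the values and the names top, bottom, and offset are hypothetical, and the centered baseline is my assumption about one simple way to get the flowing look, not a reproduction of any particular published method.

```r
# hypothetical data: 3 series (columns) observed at 5 time points (rows)
y <- cbind(a = c(1, 2, 3, 2, 1),
           b = c(2, 2, 2, 2, 2),
           c = c(0, 1, 4, 1, 0))
# upper edge of each layer when stacking from zero
top <- t(apply(y, 1, cumsum))
# lower edge of each layer is the upper edge of the layer beneath it
bottom <- cbind(0, top[, -ncol(top), drop = FALSE])
# centered baseline: shift every edge down by half the total height
offset <- -rowSums(y) / 2
top.centered    <- top + offset
bottom.centered <- bottom + offset
print(cbind(bottom.centered[, 1], top.centered[, ncol(y)]))
```

Each layer i can then be drawn with polygon(c(x, rev(x)), c(top[, i], rev(bottom[, i]))), the same call used in plot.stacked().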
1.5 When data is (dis)agreeable
Sometimes you want to emphasize [15] how you feel about your data [16].
## Grumpy and Smile examples
X1 <- runif(20,0,100)
Y1 <- runif(20,0,100)
Y2 <- 2*X1-0.01*X1^2+rnorm(20,0,10) # quad function
# grumpy version:
smile(X1,Y1,emotion="grumpy",face="green")
# happy version :
smile(X1,Y2,rainbow.gap=0.75)
[Figure: two scatterplots of Y against X (X from 0 to 100), the first drawn with the grumpy face and the second with the smiley face.]
[15] http://gallery.r-enthusiasts.com/graph/Smily_and_Grumpy_faces_174
[16] Please never use this except in jest, of course.
1.6 Corrgrams/correlogram correlation plots
Corrgrams [17] help us visualize the data in correlation matrices [18]. The corrgram
package is one strategy.
## Corrgram Examples 1 and 2
library(corrgram)
data(mtcars)
corrgram(mtcars, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Car Mileage Data in PC2/PC1 Order")
corrgram(mtcars, order=TRUE, lower.panel=panel.ellipse,
         upper.panel=panel.pts, text.panel=panel.txt,
         diag.panel=panel.minmax,
         main="Car Mileage Data in PC2/PC1 Order")
[Figures: two corrgrams of mtcars with variables in PC2/PC1 order (gear, am, drat, mpg, vs, qsec, wt, disp, cyl, hp, carb). The first uses shaded lower panels and pie upper panels; the second uses ellipse lower panels, scatterplot upper panels, and each variable's minimum and maximum on the diagonal.]
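The "PC2/PC1 order" in the titles suggests that order=TRUE arranges variables by their position in the plane of the first two principal components. The sketch below is my own rough illustration of that idea using base R, assuming the ordering is by the angle of each variable's loadings on the first two eigenvectors of the correlation matrix. It is not copied from the corrgram source, so the order it prints may differ from the package's.

```r
data(mtcars)
cmat <- cor(mtcars)
# loadings of each variable on the first two principal components
e <- eigen(cmat)$vectors[, 1:2]
# angle of each variable in the PC1/PC2 plane (assumed ordering criterion)
angle <- atan2(e[, 2], e[, 1])
ord <- order(angle)
ordered.names <- colnames(cmat)[ord]
print(ordered.names)
# reordered correlation matrix: similar variables end up adjacent
print(round(cmat[ord, ord][1:4, 1:4], 2))
```

Whatever the exact rule, the point of the reordering is that variables with similar correlation patterns sit next to each other, which makes blocks visible.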
## Corrgram Examples 3 and 4
library(corrgram)
corrgram(mtcars, order=NULL, lower.panel=panel.shade,
         upper.panel=NULL, text.panel=panel.txt,
         main="Car Mileage Data (unsorted)")
col.corrgram <- function(ncol) {
  colorRampPalette(c("darkgoldenrod4", "burlywood1",
                     "darkkhaki", "darkgreen"))(ncol)
}
corrgram(mtcars, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Correlogram of Car Mileage Data (PC2/PC1 Order)",
         col.regions = col.corrgram)
[17] http://www.datavis.ca/papers/corrgram.pdf
[18] http://www.statmethods.net/advgraphs/correlograms.html
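[Figures: a corrgram of mtcars in the original variable order (mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb) with shaded lower panels only; and a corrgram in PC2/PC1 order (gear, am, drat, mpg, vs, qsec, wt, disp, cyl, hp, carb) using the custom color palette, titled "Correlogram of Car Mileage Data (PC2/PC1 Order)".]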
Base graphics [19] and GGally [20] can also display correlation matrices.
## base graphics
panel.cor <- function(x, y, digits=2, prefix="", cex.cor) {
  # save the panel's user coordinates and restore them on exit
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits=digits)[1]
  txt <- paste(prefix, txt, sep="")
  # use cex.cor when supplied; otherwise scale the text to fit the panel
  cex <- if(missing(cex.cor)) 0.8/strwidth(txt) else cex.cor
  test <- cor.test(x,y)
  # borrowed from printCoefmat
  Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
                   cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   symbols = c("***", "**", "*", ".", " "))
  text(0.5, 0.5, txt, cex = cex * r)
  text(.8, .8, Signif, cex=cex, col=2)
}
pairs(USJudgeRatings[,c(2:3,6,1,7)],
      lower.panel=panel.smooth, upper.panel=panel.cor)
[19] http://gallery.r-enthusiasts.com/graph/Correlation_Matrix_137
[20] http://cran.r-project.org/web/packages/GGally/GGally.pdf
## ggplot + GGally
library(ggplot2)
library(GGally)
p <- ggpairs(USJudgeRatings[,c(2:3,6,1,7)])
print(p)
[Figure: base-graphics scatterplot matrix of INTG, DMNR, DECI, CONT, and PREP from USJudgeRatings, with smoothed scatterplots below the diagonal and absolute correlations with significance stars above it.]
[Figure: ggpairs scatterplot matrix of the same variables, with correlations INTG-DMNR 0.965, INTG-DECI 0.803, INTG-CONT -0.133, INTG-PREP 0.878, DMNR-DECI 0.804, DMNR-CONT -0.154, DMNR-PREP 0.856, DECI-CONT 0.0865, DECI-PREP 0.957, CONT-PREP 0.0115.]
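The significance stars in the base-graphics panel come from symnum(), which bins p-values using the cutpoints given in panel.cor. A minimal sketch of that step on its own, with made-up p-values (the vector p.values is hypothetical):

```r
p.values <- c(0.0005, 0.03, 0.5)   # hypothetical p-values
# same cutpoints and symbols as in panel.cor above
stars <- symnum(p.values, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))
print(stars)
```

Keep in mind that with many panels these are unadjusted tests, so the stars are a visual guide rather than a formal result.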
A function for correlation circles [21] has also been written.
## circle.corr example
data(mtcars)
circle.corr( cor(mtcars), order = TRUE, bg = "gray50",
col = colorRampPalette(c("blue","white","red"))(100) )
[21] http://gallery.r-enthusiasts.com/graph/Correlation_matrix_circles_152
[Figure: correlation circles for mtcars, with variables ordered gear, am, drat, mpg, vs, qsec, wt, disp, cyl, hp, carb on both axes.]
The ellipse package has a function plotcorr(), though its output is
less than ideal.
## plotcorr examples
library(ellipse)
corr.mtcars <- cor(mtcars)
# numbers don't quite give you what you expect
plotcorr(corr.mtcars, diag = TRUE, numbers = TRUE, type = "lower")
# colors can be nice
ord <- order(corr.mtcars[1,])
xc <- corr.mtcars[ord, ord]
colors <- c("#A50F15","#DE2D26","#FB6A4A","#FCAE91","#FEE5D9","white",
"#EFF3FF","#BDD7E7","#6BAED6","#3182BD","#08519C")
plotcorr(xc, col=colors[5*xc + 6], type = "lower")
[Figures: two plotcorr() displays of the mtcars correlations, lower triangle only. The first prints each correlation on a -10 to 10 scale (10 on the diagonal; for example, mpg-cyl is -9) with variables in the original order; the second draws colored ellipses with variables ordered cyl, disp, hp, carb, qsec, gear, am, vs, drat, mpg, wt.]
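The expression colors[5*xc + 6] in the second example maps a correlation in [-1, 1] onto the 11 palette entries, with white in the middle. Since 5*xc + 6 is usually not a whole number, R truncates it when indexing. The sketch below makes that explicit with a few made-up correlations (the vector r is hypothetical) and shows rounding as one possible alternative.

```r
colors <- c("#A50F15","#DE2D26","#FB6A4A","#FCAE91","#FEE5D9","white",
            "#EFF3FF","#BDD7E7","#6BAED6","#3182BD","#08519C")
r <- c(-1, -0.87, 0, 0.42, 1)        # hypothetical correlations
# what non-integer indexing does implicitly: truncate toward zero
idx.trunc <- as.integer(5 * r + 6)
# alternative: pick the nearest palette entry instead
idx.round <- round(5 * r + 6)
print(data.frame(r, idx.trunc, idx.round, color = colors[idx.round]))
```

With truncation, a correlation of -0.87 gets the same color as -1; rounding moves it to the second entry. Either is defensible, but it is good to know which one the plot uses.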
An improvement has been made with an updated version [22] of the
plotcorr() function.
## my.plotcorr example
data(mtcars)
corr.mtcars <- cor(mtcars)
# Change the column and row names for clarity
colnames(corr.mtcars) = c('Miles/gallon', 'Number of cylinders', 'Displacement', 'Horsepower', 'Rear axle ratio', 'Weight', '1/4 mile time', 'V/S', 'Transmission type', 'Number of gears', 'Number of carburetors')
rownames(corr.mtcars) = colnames(corr.mtcars)
colsc=c(rgb(241, 54, 23, maxColorValue=255), 'white', rgb(0, 61, 104, maxColorValue=255))
colramp = colorRampPalette(colsc, space='Lab')
colors = colramp(100)
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100], diag='ellipse', upper.panel="number", mar=c(0,2,0,0), main='Predictor correlations')
[22] http://hlplab.wordpress.com/2012/03/20/correlation-plot-matrices-using-the-ellipse-library/
[Figure: "Predictor correlations" plot from my.plotcorr() for the relabeled mtcars variables (Miles/gallon, Number of cylinders, Displacement, Horsepower, Rear axle ratio, Weight, 1/4 mile time, V/S, Transmission type, Number of gears, Number of carburetors), with colored ellipses on and below the diagonal and numeric correlations above it.]
1.7 Beeswarm boxplot
The beeswarm plot [23] [24] is like a dot plot organized as a violin plot with
the advantage that individual points may be colored categorically.
## beeswarm example 1
library(beeswarm)
data(breast)
beeswarm(time_survival ~ event_survival, data = breast,
method = 'swarm',
pch = 16, pwcol = as.numeric(ER),
xlab = '', ylab = 'Follow-up time (months)',
labels = c('Censored', 'Metastasis'))
boxplot(time_survival ~ event_survival,
data = breast, add = T,
names = c("",""), col="#0000ff22")
## beeswarm using ggplot
library(beeswarm)
data(breast)
beeswarm.out <- beeswarm(time_survival ~ event_survival,
data = breast, method = 'swarm',
pwcol = ER, do.plot=FALSE)[, c(1, 2, 4, 6)]
colnames(beeswarm.out) <- c("x", "y", "ER", "event_survival")
library(ggplot2)
library(plyr) # for round_any()
p <- ggplot(beeswarm.out, aes(x, y))
p <- p + xlab("")
p <- p + scale_y_continuous(expression("Follow-up time (months)"))
p <- p + geom_boxplot(aes(x, y, group = round_any(x, 1, round)), outlier.shape = NA)
p <- p + geom_point(aes(colour = ER))
p <- p + scale_x_continuous(breaks = c(1:2), labels = c("Censored", "Metastasis")
, expand = c(0, 0.5))
print(p)
## Warning: position_dodge requires constant width: output may be incorrect
## Warning: Removed 2 rows containing missing values (geom_point).
[23] http://gallery.r-enthusiasts.com/graph/Beeswarm_Boxplot_163
[24] http://gallery.r-enthusiasts.com/graph/Beeswarm_Boxplot_(with_ggplot2)_164
[Figures: beeswarm plots of follow-up time (months, 0 to 150) for the Censored and Metastasis groups with boxplots overlaid. The first uses base graphics; the second uses ggplot2 with points colored by ER status (neg, pos).]
1.8 Back-to-back histogram
A back-to-back histogram [25] can compare two distributions.
## Back-to-back histogram
require(Hmisc)
age <- rnorm(1000,50,10)
sex <- sample(c('female','male'),1000,TRUE)
out <- histbackback(split(age, sex), probability=TRUE, xlim=c(-.06,.06),
main = 'Back to Back Histogram')
#! just adding color
barplot(-out$left, col="red" , horiz=TRUE, space=0, add=TRUE, axes=FALSE)
barplot(out$right, col="blue", horiz=TRUE, space=0, add=TRUE, axes=FALSE)
# overlaid histograms
df <- data.frame(age, sex)
library(ggplot2)
p <- ggplot(df, aes(x = age, fill=sex))
p <- p + geom_histogram(binwidth = 5, alpha = 0.5, position="identity")
print(p)
[Figures: back-to-back histogram of age for female (left) and male (right), titled "Back to Back Histogram", with the density axis running from 0.06 on the left through 0.00 to 0.06 on the right; and overlaid ggplot2 histograms of age (count versus age) filled by sex.]
[25] http://gallery.r-enthusiasts.com/graph/back_to_back_histogram_136
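The idea behind the back-to-back display can be reproduced without Hmisc: bin both groups with the same breaks, then negate one side. The sketch below does only the computation, using base R. The seed, the 5-year bin width, and the names breaks, left, and right are my own choices for illustration.

```r
set.seed(590)   # arbitrary seed so the sketch is reproducible
age <- rnorm(1000, 50, 10)
sex <- sample(c("female", "male"), 1000, TRUE)
# common 5-year bins covering the whole range, so the two sides line up
breaks <- seq(5 * floor(min(age) / 5), 5 * ceiling(max(age) / 5), by = 5)
h <- lapply(split(age, sex), hist, breaks = breaks, plot = FALSE)
# negate one side so its bars extend to the left of zero
left  <- -h$female$density
right <-  h$male$density
print(round(cbind(bin.start = head(breaks, -1), left, right), 4))
```

The two vectors could then be drawn with barplot(..., horiz = TRUE, space = 0), as in the coloring step of the example above. Sharing the breaks is the important part; without it the bars on the two sides would not correspond to the same ages.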
1.9 Graphs (networks) with directed edges
Graphs can be hard to represent, and directed graphs [26] doubly so. There
is now a solution [27] which I think looks beautiful.
library(sna)
library(ggplot2)
library(Hmisc)
library(reshape2)
# Empty ggplot2 theme
new_theme_empty <- theme_bw()
new_theme_empty$line <- element_blank()
new_theme_empty$rect <- element_blank()
new_theme_empty$strip.text <- element_blank()
new_theme_empty$axis.text <- element_blank()
new_theme_empty$plot.title <- element_blank()
new_theme_empty$axis.title <- element_blank()
# build the margin with unit() rather than assembling a unit object by hand
new_theme_empty$plot.margin <- unit(c(0, 0, -1, -1), "lines")
data(coleman) # Load a high school friendship network
adjacencyMatrix <- coleman[1, , ] # Fall semester
# First plot
layoutCoordinates <- gplot(adjacencyMatrix) # Get graph layout coordinates
adjacencyList <- melt(adjacencyMatrix) # Convert to list of ties only
adjacencyList <- adjacencyList[adjacencyList$value > 0, ]
# Function to generate paths between each connected node
edgeMaker <- function(whichRow, len = 100, curved = TRUE) {
  fromC <- layoutCoordinates[adjacencyList[whichRow, 1], ]  # Origin
  toC <- layoutCoordinates[adjacencyList[whichRow, 2], ]    # Terminus
  # Add curve:
  graphCenter <- colMeans(layoutCoordinates)  # Center of the overall graph
  bezierMid <- c(fromC[1], toC[2])            # A midpoint, for bended edges
  distance1 <- sum((graphCenter - bezierMid)^2)
  if(distance1 < sum((graphCenter - c(toC[1], fromC[2]))^2)) {
    bezierMid <- c(toC[1], fromC[2])
  }  # To select the best Bezier midpoint
  bezierMid <- (fromC + toC + bezierMid) / 3  # Moderate the Bezier midpoint
  if(curved == FALSE) { bezierMid <- (fromC + toC) / 2 }  # Remove the curve
  edge <- data.frame(bezier(c(fromC[1], bezierMid[1], toC[1]),  # Generate
                            c(fromC[2], bezierMid[2], toC[2]),  # X & y
                            evaluation = len))  # Bezier path coordinates
  edge$Sequence <- 1:len  # For size and colour weighting in plot
  edge$Group <- paste(adjacencyList[whichRow, 1:2], collapse = ">")
  return(edge)
}
[26] http://www.win.tue.nl/~dholten/papers/directed_edges_chi.pdf
[27] http://is-r.tumblr.com/post/38459242505/beautiful-network-diagrams-with-ggplot2
# Generate a (curved) edge path for each pair of connected nodes
allEdges <- lapply(1:nrow(adjacencyList), edgeMaker, len = 500, curved = TRUE)
allEdges <- do.call(rbind, allEdges) # a fine-grained path ^, with bend ^
zp1 <- ggplot(allEdges) # Pretty simple plot code
zp1 <- zp1 + geom_path(aes(x = x, y = y, group = Group, # Edges with gradient
colour = Sequence, size = -Sequence)) # and taper
zp1 <- zp1 + geom_point(data = data.frame(layoutCoordinates), # Add nodes
aes(x = x, y = y), size = 2, pch = 21,
colour = "black", fill = "gray") # Customize gradient v
zp1 <- zp1 + scale_colour_gradient(low = gray(0), high = gray(9/10), guide = "none")
zp1 <- zp1 + scale_size(range = c(1/10, 1), guide = "none") # Customize taper
zp1 <- zp1 + new_theme_empty # Clean up plot
print(zp1)
[Figure: the Coleman high school friendship network (fall semester) drawn with gray nodes and curved edges whose shade and width change along their length to indicate direction.]
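Each curved edge above is a path through three control points: the origin, a moderated midpoint, and the terminus. To make that concrete, the sketch below computes such a path by hand in base R, assuming the three points define a standard quadratic Bezier curve. The helper quad.bezier and the example coordinates are hypothetical and are not part of Hmisc or the code above.

```r
# quadratic Bezier curve through control points p0, p1, p2 (each c(x, y));
# quad.bezier is a hypothetical helper written for this illustration
quad.bezier <- function(p0, p1, p2, n = 100) {
  t <- seq(0, 1, length.out = n)
  data.frame(x = (1 - t)^2 * p0[1] + 2 * (1 - t) * t * p1[1] + t^2 * p2[1],
             y = (1 - t)^2 * p0[2] + 2 * (1 - t) * t * p1[2] + t^2 * p2[2])
}
fromC <- c(0, 0)                    # made-up origin
toC   <- c(1, 1)                    # made-up terminus
corner <- c(fromC[1], toC[2])       # candidate corner point, as in edgeMaker
mid <- (fromC + toC + corner) / 3   # pulled toward the straight line
path <- quad.bezier(fromC, mid, toC, n = 50)
print(head(path, 3))
```

Averaging the corner with the two endpoints keeps the bend gentle; using the corner itself as the middle control point would give a much sharper curve. The row index of path plays the role of Sequence, which is what drives the gradient and taper.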