Lecture notes for Statistical Computing 1 (SC1)

Stat 590, University of New Mexico

Erik B. Erhardt

Fall 2015


Contents

1 More plots in R
1.1 Tree map plots (for hierarchical data)
1.2 Parallel sets plot (for categorical data)
1.3 Sankey plots (for categorical data)
1.4 Stream graphs (stacked density plots)
1.5 When data is (dis)agreeable
1.6 Corrgrams/correlogram correlation plots
1.7 Beeswarm boxplot
1.8 Back-to-back histogram
1.9 Graphs (networks) with directed edges


Chapter 1

More plots in R

A selection of plots for more visualization possibilities. Not all of these are good. These are meant for consideration and discussion. We’ll visit these footnote links as we go.

Much of the R code is not shown in the pdf; refer to the R code posted on the website.

Also, there are lots of packages used in this chapter:

install.all <- FALSE
if (install.all) {
  install.list <- c("treemap", "corrgram", "ggplot2", "GGally", "ellipse",
                    "beeswarm", "plyr", "sna", "Hmisc", "reshape2")
  # install
  install.packages(install.list)
  # load
  lapply(install.list, library, character.only = TRUE)
}


1.1 Tree map plots (for hierarchical data)

A treemap is a space-filling visualization of hierarchical structures [1]. It’s not an easy design [2] to get right. The treemap package does a good job.

library(treemap)
# Gross national income (per capita) in dollars per country in 2010.
data(GNI2010)
str(GNI2010)
## 'data.frame': 208 obs. of 5 variables:
##  $ iso3      : chr "ABW" "AFG" "AGO" "ALB" ...
##  $ country   : chr "Aruba" "Afghanistan" "Angola" "Albania" ...
##  $ continent : chr "North America" "Asia" "Africa" "Europe" ...
##  $ population: num 108 34385 19082 3205 7512 ...
##  $ GNI       : num 0 410 3960 3960 0 ...
head(GNI2010, 10)
##    iso3              country     continent population   GNI
## 1   ABW                Aruba North America        108     0
## 2   AFG          Afghanistan          Asia      34385   410
## 3   AGO               Angola        Africa      19082  3960
## 4   ALB              Albania        Europe       3205  3960
## 5   ARE United Arab Emirates          Asia       7512     0
## 6   ARG            Argentina South America      40412  8620
## 7   ARM              Armenia          Asia       3092  3200
## 8   ASM       American Samoa       Oceania         68     0
## 9   ATG  Antigua and Barbuda North America         88 13280
## 10  AUS            Australia       Oceania      22299 46200
# create treemap
tmPlot(GNI2010
     , index  = c("continent", "iso3")
     , vSize  = "population"
     , vColor = "GNI"
     , type   = "value")
## Note: tmPlot deprecated as of version 2.0. Please use treemap instead.

[1] http://en.wikipedia.org/wiki/Treemapping
[2] http://www.juiceanalytics.com/writing/10-lessons-treemap-design/

[Figure: treemap of GNI2010. Rectangle area shows population and color shows GNI (legend from 0 to 90000); countries are labeled by ISO3 code and grouped by continent.]
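As a rough illustration of the space-filling idea, the sketch below computes a one-level "slice" layout in base R: each group gets a vertical strip of the unit square whose width is proportional to its total. This is not the layout algorithm used by the treemap package; the helper name slice_layout and the toy numbers are made up for illustration.

```r
# Hypothetical helper: split the unit square into vertical strips,
# one per group, with widths proportional to the group totals.
slice_layout <- function(size, group) {
  totals <- tapply(size, group, sum)      # total size per group
  width  <- totals / sum(totals)          # share of the unit square
  right  <- cumsum(width)                 # right edge of each strip
  data.frame(group  = names(totals),
             xleft  = as.numeric(right - width),
             xright = as.numeric(right),
             area   = as.numeric(width),  # strip height is 1, so area = width
             row.names = NULL)
}

# Toy data (made-up sizes, not GNI2010)
pop  <- c(10, 30, 20, 40)
cont <- c("A", "A", "B", "C")
rects <- slice_layout(pop, cont)
print(rects)
```

Each row could then be drawn with rect(xleft, 0, xright, 1). A full slice-and-dice treemap repeats the same split inside each strip, alternating between vertical and horizontal cuts at each level of the hierarchy.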

Obama’s budget [3] looks better as a tree map than with another method [4].

Take a look at my Windows hard drive with SpaceSniffer.exe [5].

[3] http://www.nytimes.com/interactive/2010/02/01/us/budget.html?_r=0
[4] http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html?hp
[5] http://www.uderzo.it/main_products/space_sniffer/


1.2 Parallel sets plot (for categorical data)

Parallel sets plots [6] visualize cross-tabulated data and are most helpful for tables of at least 3 dimensions.

## Parallel sets function
parallelset <- function(..., freq, col="gray", border=0, layer,
                        alpha=0.5, gap.width=0.05) {
  p <- data.frame(..., freq, col, border, alpha, stringsAsFactors=FALSE)
  n <- nrow(p)
  if(missing(layer)) { layer <- 1:n }
  p$layer <- layer
  np <- ncol(p) - 5
  d <- p[ , 1:np, drop=FALSE]
  p <- p[ , -c(1:np), drop=FALSE]
  p$freq <- with(p, freq/sum(freq))
  col <- col2rgb(p$col, alpha=TRUE)
  if(!identical(alpha, FALSE)) { col["alpha", ] <- p$alpha*256 }
  p$col <- apply(col, 2, function(x) do.call(rgb, c(as.list(x), maxColorValue = 256)))
  getp <- function(i, d, f, w=gap.width) {
    a <- c(i, (1:ncol(d))[-i])
    o <- do.call(order, d[a])
    x <- c(0, cumsum(f[o])) * (1-w)
    x <- cbind(x[-length(x)], x[-1])
    gap <- cumsum( c(0L, diff(as.numeric(d[o,i])) != 0) )
    gap <- gap / max(gap) * w
    (x + gap)[order(o),]
  }
  dd <- lapply(seq_along(d), getp, d=d, f=p$freq)
  par(mar = c(0, 0, 2, 0) + 0.1, xpd=TRUE )
  plot(NULL, type="n", xlim=c(0, 1), ylim=c(np, 1),
       xaxt="n", yaxt="n", xaxs="i", yaxs="i", xlab='', ylab='', frame=FALSE)
  for(i in rev(order(p$layer)) ) {
    for(j in 1:(np-1) )
      polygon(c(dd[[j]][i,], rev(dd[[j+1]][i,])), c(j, j, j+1, j+1),
              col=p$col[i], border=p$border[i])
  }
  text(0, seq_along(dd), labels=names(d), adj=c(0,-2), font=2)
  for(j in seq_along(dd)) {
    ax <- lapply(split(dd[[j]], d[,j]), range)
    for(k in seq_along(ax)) {
      lines(ax[[k]], c(j, j))
      text(ax[[k]][1], j, labels=names(ax)[k], adj=c(0, -0.25))
    }
  }
}

data(Titanic)
myt <- subset(as.data.frame(Titanic), Age=="Adult",
              select=c("Survived","Sex","Class","Freq"))
myt <- within(myt, {
  Survived <- factor(Survived, levels=c("Yes","No"))
  levels(Class) <- c(paste(c("First", "Second", "Third"), "Class"), "Crew")
  color <- ifelse(Survived=="Yes","#008888","#330066")
})
with(myt, parallelset(Survived, Sex, Class, freq=Freq, col=color, alpha=0.2))

[6] http://stats.stackexchange.com/questions/12029/is-it-possible-to-create-parallel-sets-plot-using-r

[Figure: parallel sets plot of adult Titanic passengers with three axes: Survived (Yes, No), Sex (Male, Female), and Class (First Class, Second Class, Third Class, Crew).]
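The ribbon widths in a parallel sets plot are the cell proportions of the cross-tabulation, and the segment lengths on each axis are the marginal proportions. A minimal base R sketch of those numbers for the adult Titanic data (the object names adults, share, survived_share, and class_share are chosen here for illustration):

```r
# Cross-tabulated counts for adults only, as a long data frame
adults <- subset(as.data.frame(Titanic), Age == "Adult",
                 select = c("Survived", "Sex", "Class", "Freq"))

# Each row is one ribbon; its width is its share of all adults
adults$share <- adults$Freq / sum(adults$Freq)

# Marginal shares give the segment lengths on each axis
survived_share <- tapply(adults$share, adults$Survived, sum)
class_share    <- tapply(adults$share, adults$Class, sum)
print(round(survived_share, 3))
print(round(class_share, 3))
```

Because every axis partitions the same set of adults, the shares on each axis sum to one, which is why all axes in the plot have the same total length.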


1.3 Sankey plots (for categorical data)

Sankey diagrams [7] are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity. They are typically used to visualize energy or material or cost transfers between processes. One of the most famous Sankey diagrams is Charles Minard’s Map [8] of Napoleon’s Russian Campaign of 1812. If I had known about these earlier in my career, I would have used them to show how patients were included/excluded for different reasons in an epidemiological study.

An R function is available [9] which is used below for patient tracking.

# My example (there is another example inside Sankey.R):
# SankeyR() is defined in Sankey.R [9]; source that file first.
inputs = c(6, 144)
losses = c(6, 47, 14, 7, 7, 35, 34)
unit = "n ="
labels = c("Transfers",
           "Referrals\n", "Unable to Engage",
           "Consultation only",
           "Did not complete the intake",
           "Did not engage in Treatment",
           "Discontinued Mid-Treatment",
           "Completed Treatment",
           "Active in \nTreatment")
SankeyR(inputs, losses, unit, labels)
# Clean up my mess
rm("inputs", "labels", "losses", "SankeyR", "sourc.https", "unit")
## Warning in rm("inputs", "labels", "losses", "SankeyR", "sourc.https", "unit"): object 'sourc.https' not found

[7] http://www.sankey-diagrams.com/
[8] http://en.wikipedia.org/wiki/File:Minard.png
[9] https://raw.github.com/gist/1423501/55b3c6f11e4918cb6264492528b1ad01c429e581/Sankey.R

[Figure: Sankey diagram of patient flow. Inputs: Transfers n = 6 (4%), Referrals n = 144 (96%). Outputs: Unable to Engage n = 6 (4%), Consultation only n = 47 (31.3%), Did not complete the intake n = 14 (9.3%), Did not engage in Treatment n = 7 (4.7%), Discontinued Mid-Treatment n = 7 (4.7%), Completed Treatment n = 35 (23.3%), Active in Treatment n = 34 (22.7%).]
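A Sankey diagram only closes if total inflow equals total outflow, and the percentage attached to each arrow is that flow's share of the total. A quick base R check of the patient-tracking numbers (the vector names and pct are ours; labels are copied from the example with the line breaks removed):

```r
inputs <- c("Transfers" = 6, "Referrals" = 144)
losses <- c("Unable to Engage"            = 6,
            "Consultation only"           = 47,
            "Did not complete the intake" = 14,
            "Did not engage in Treatment" = 7,
            "Discontinued Mid-Treatment"  = 7,
            "Completed Treatment"         = 35,
            "Active in Treatment"         = 34)

# Inflow and outflow must match for the diagram to balance
stopifnot(sum(inputs) == sum(losses))

# Arrow widths are proportional to these shares of the total
pct <- round(100 * losses / sum(losses), 1)
print(pct)
```

Running the balance check before plotting catches miscounted exclusions early, which is the usual failure mode when building a patient-flow diagram by hand.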


1.4 Stream graphs (stacked density plots)

The NY Times box office revenue plot [10] was one of the first stream graphs created, showing 22 years of data where revenues have clearly grown over time. The plots have been discussed in detail [11], as has how to create them in R [12]. The two examples [13] [14] below provide a start.

## Stream graphs 1 (stacked density plots)
plot.stacked <- function(x, y, ylab="", xlab="", ncol=1,
                         xlim=range(x, na.rm=TRUE),
                         ylim=c(0, 1.2*max(rowSums(y), na.rm=TRUE)),
                         border=NULL, col=rainbow(length(y[1,]))) {
  ## reorder the columns so each curve first appears behind previous curves
  ## when it first becomes the tallest curve on the landscape
  #y <- y[, unique(apply(y, 1, which.max))]
  plot(x, y[,1], ylab=ylab, xlab=xlab, ylim=ylim, xaxs="i", yaxs="i", xlim=xlim, type="n")
  bottom <- 0*y[,1]
  for(i in 1:length(y[1,])) {
    top <- rowSums(as.matrix(y[,1:i]))
    polygon(c(x, rev(x)), c(top, rev(bottom)), border=border, col=col[i])
    bottom <- top
  }
  abline(h=seq(0,200000, 10000), lty=3, col="grey")
  legend("topleft", rev(colnames(y)), ncol=ncol, inset=0, fill=rev(col),
         bty="o", bg="white", cex=0.8, col=col)
  box()
}

#set.seed(1)
m <- 500
n <- 15
x <- seq(m)
y <- matrix(0, nrow=m, ncol=n)
colnames(y) <- seq(n)
for(i in seq(ncol(y))) {
  mu <- runif(1, min=0.25*m, max=0.75*m)
  SD <- runif(1, min=5, max=30)
  TMP <- rnorm(1000, mean=mu, sd=SD)
  HIST <- hist(TMP, breaks=c(0,x), plot=FALSE)
  fit <- smooth.spline(HIST$counts ~ HIST$mids)
  y[,i] <- fit$y
}
plot.stacked(x,y)

[10] http://www.nytimes.com/interactive/2008/02/23/movies/20080223_REVENUE_GRAPHIC.html
[11] http://leebyron.com/else/streamgraph/
[12] http://flowingdata.com/2012/07/03/a-variety-of-area-charts-with-r/
[13] http://stackoverflow.com/questions/13084998/streamgraphs-in-r
[14] http://gallery.r-enthusiasts.com/graph/Kernel_density_estimator%3Cbr%3EIllustration_of_the_kernels_30

[Figure: stacked area plot of the 15 simulated curves, x axis from 0 to 500, y axis from 0 to about 150, with a legend for series 1 to 15.]
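plot.stacked() builds each band from cumulative row sums starting at zero. What usually distinguishes a streamgraph from a plain stacked area chart is the baseline; one simple choice is to center the stack around zero. A hedged sketch of that bookkeeping in base R (stack_layers is a hypothetical helper, and the symmetric baseline is only one possible offset):

```r
# Hypothetical helper: lower and upper edges of each stacked layer.
# y is a matrix with one row per x value and one column per series.
stack_layers <- function(y, centered = FALSE) {
  total    <- rowSums(y)
  baseline <- if (centered) -total / 2 else rep(0, nrow(y))
  upper <- baseline + t(apply(y, 1, cumsum))   # running total along each row
  lower <- upper - y
  list(lower = lower, upper = upper)
}

y <- cbind(a = c(1, 2, 3), b = c(2, 2, 2), c = c(1, 0, 1))
flat   <- stack_layers(y)                   # stacked area: starts at zero
stream <- stack_layers(y, centered = TRUE)  # streamgraph: centered on zero
print(stream$lower[, 1])
print(stream$upper[, 3])
```

Layer i can then be drawn with polygon(c(x, rev(x)), c(upper[, i], rev(lower[, i]))), the same call pattern plot.stacked() uses.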

## Stream graphs 2 (stacked density plots)
require("RColorBrewer")
palette(brewer.pal(7,"Accent")[-4])
x <- rnorm(5) #c(-0.475,-1.553,-0.434,-1.019,0.395)
d1 <- density(x,bw=.3,from=-3,to=3)
par(mar=c(3, 2, 2, 3) + 0.1,las=1)
plot(d1,ylim=c(-.3,.6),xlim=c(-3,3),axes=FALSE,ylab="",xlab="",main="")
axis(1)
axis(4,0:3*.2)
abline(h=-.3,col="gray")
#rug(x)
mat <- matrix(0,ncol=512,nrow=5)
for(i in 1:5){
  d <- density(x[i],bw=.3,from=-3,to=3)
  lines(d$x,(d$y)/5-.3,col=i+1)
  mat[i,] <- d$y/5
}
for(i in 2:5) mat[i,] <- mat[i,] + mat[i-1,]
usr <- par("usr")
mat <- rbind(0,mat)
#segments(x0=rep(usr[1],5),x1=rep(d$x[171],5),y0=mat[,171],y1=mat[,171],lty=3)
for(i in 2:6) polygon(c(d$x,rev(d$x)),c(mat[i,],rev(mat[i-1,])),col=i,border=NA)
#segments(x0=d$x[171],x1=d$x[171],y0=0,y1=d1$y[171],lwd=3,col="white")
lines(d1,lwd=2)
box()
#palette("default")

[Figure: kernel density estimate (x axis from -3 to 3, right-hand axis from 0.0 to 0.6) drawn over the five stacked, colored kernels, with the individual kernels drawn below.]

1.5 When data is (dis)agreeable

Sometimes you want to emphasize [15] how you feel about your data [16].

## Grumpy and Smile examples

X1 <- runif(20,0,100)

Y1 <- runif(20,0,100)

Y2 <- 2*X1-0.01*X1^2+rnorm(20,0,10) # quad function

# grumpy version:

smile(X1,Y1,emotion="grumpy",face="green")

# happy version :

smile(X1,Y2,rainbow.gap=0.75)

[Figure: two scatterplots of Y against X (X axis from 0 to 100), one for the grumpy version and one for the happy version.]

[15] http://gallery.r-enthusiasts.com/graph/Smily_and_Grumpy_faces_174
[16] Please never use this except in jest, of course.


1.6 Corrgrams/correlogram correlation plots

Corrgrams [17] help us visualize the data in correlation matrices [18]. The corrgram package is one strategy.

## Corrgram Examples 1 and 2
library(corrgram)
data(mtcars)
corrgram(mtcars, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Car Mileage Data in PC2/PC1 Order")
corrgram(mtcars, order=TRUE, lower.panel=panel.ellipse,
         upper.panel=panel.pts, text.panel=panel.txt,
         diag.panel=panel.minmax,
         main="Car Mileage Data in PC2/PC1 Order")

[Figure: two corrgrams of mtcars, each titled "Car Mileage Data in PC2/PC1 Order", with variables ordered gear, am, drat, mpg, vs, qsec, wt, disp, cyl, hp, carb. The first uses shaded lower panels and pie upper panels. The second uses ellipses below, scatterplots above, and each variable's minimum and maximum on the diagonal: gear 3 to 5, am 0 to 1, drat 2.76 to 4.93, mpg 10.4 to 33.9, vs 0 to 1, qsec 14.5 to 22.9, wt 1.51 to 5.42, disp 71.1 to 472, cyl 4 to 8, hp 52 to 335, carb 1 to 8.]
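With order=TRUE, corrgram arranges the variables so that similar ones sit next to each other, using the first two principal components of the correlation matrix (hence "PC2/PC1 Order" in the titles). The sketch below reproduces the idea in base R by sorting variables on the angle of their loadings in the PC1/PC2 plane. It approximates what the package does rather than copying its code, and pc_order is a hypothetical name.

```r
# Hypothetical helper: order variables by the angle of their
# loadings on the first two principal components.
pc_order <- function(cmat) {
  e <- eigen(cmat)$vectors[, 1:2]   # loadings on PC1 and PC2
  angle <- atan2(e[, 2], e[, 1])    # angle of each variable in that plane
  order(angle)
}

cmat <- cor(mtcars)
ord  <- pc_order(cmat)
print(colnames(cmat)[ord])
```

Eigenvector signs are arbitrary, so the result may be a rotated or reversed version of the order shown in the figure; what matters is that strongly correlated variables end up adjacent.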

## Corrgram Examples 3 and 4
library(corrgram)
corrgram(mtcars, order=NULL, lower.panel=panel.shade,
         upper.panel=NULL, text.panel=panel.txt,
         main="Car Mileage Data (unsorted)")
col.corrgram <- function(ncol){
  colorRampPalette(c("darkgoldenrod4", "burlywood1",
                     "darkkhaki", "darkgreen"))(ncol)
}
corrgram(mtcars, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Correlogram of Car Mileage Data (PC2/PC1 Order)",
         col.regions = col.corrgram)

[17] http://www.datavis.ca/papers/corrgram.pdf
[18] http://www.statmethods.net/advgraphs/correlograms.html

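[Figure: two corrgrams of mtcars. The first, "Car Mileage Data (unsorted)", shows a shaded lower triangle with variables in data order: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb. The second, "Correlogram of Car Mileage Data (PC2/PC1 Order)", uses the custom palette with variables ordered gear, am, drat, mpg, vs, qsec, wt, disp, cyl, hp, carb.]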

Base graphics [19] and GGally [20] offer alternatives.

## base graphics
panel.cor <- function(x, y, digits=2, prefix="", cex.cor)
{
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y))   # absolute value: the sign of the correlation is dropped
  txt <- format(c(r, 0.123456789), digits=digits)[1]
  txt <- paste(prefix, txt, sep="")
  # use cex.cor if supplied; otherwise scale the text to fit the panel
  cex <- if(missing(cex.cor)) 0.8/strwidth(txt) else cex.cor
  test <- cor.test(x,y)
  # borrowed from printCoefmat
  Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
                   cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   symbols = c("***", "**", "*", ".", " "))
  text(0.5, 0.5, txt, cex = cex * r)
  text(.8, .8, Signif, cex=cex, col=2)
}
pairs(USJudgeRatings[,c(2:3,6,1,7)],
      lower.panel=panel.smooth, upper.panel=panel.cor)

## ggplot + GGally
library(ggplot2)
library(GGally)
p <- ggpairs(USJudgeRatings[,c(2:3,6,1,7)])
print(p)

[19] http://gallery.r-enthusiasts.com/graph/Correlation_Matrix_137
[20] http://cran.r-project.org/web/packages/GGally/GGally.pdf

[Figure: scatterplot matrices of the USJudgeRatings variables INTG, DMNR, DECI, CONT, and PREP, first from pairs() with panel.cor (absolute correlations with significance stars in the upper panels), then from ggpairs(). Correlations reported by ggpairs():

        DMNR    DECI    CONT     PREP
INTG    0.965   0.803   -0.133   0.878
DMNR            0.804   -0.154   0.856
DECI                    0.0865   0.957
CONT                             0.0115]
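panel.cor() turns the p-value from cor.test() into significance stars with symnum(). The same step in isolation, for one pair of variables at a time (a sketch; the helper name cor_stars is ours, and unlike panel.cor() it keeps the sign of the correlation):

```r
# Hypothetical helper: correlation followed by significance stars
cor_stars <- function(x, y) {
  test  <- cor.test(x, y)
  stars <- symnum(test$p.value, corr = FALSE, na = FALSE,
                  cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                  symbols   = c("***", "**", "*", ".", " "))
  paste0(format(unname(test$estimate), digits = 2), as.character(stars))
}

print(cor_stars(USJudgeRatings$INTG, USJudgeRatings$DMNR))
print(cor_stars(USJudgeRatings$CONT, USJudgeRatings$PREP))
```

The cutpoints are the conventional 0.001, 0.01, 0.05, and 0.1 thresholds, so the stars carry the same meaning as in a printed regression coefficient table.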

A function for correlation circles [21] has also been written.

## circle.corr example
# circle.corr() is defined at [21]
data(mtcars)
circle.corr( cor(mtcars), order = TRUE, bg = "gray50",
             col = colorRampPalette(c("blue","white","red"))(100) )

[21] http://gallery.r-enthusiasts.com/graph/Correlation_matrix_circles_152

[Figure: circle.corr() correlation matrix of mtcars on a gray background, with variables ordered gear, am, drat, mpg, vs, qsec, wt, disp, cyl, hp, carb.]

The ellipse library has a function plotcorr(), though its output is less than ideal.

## plotcorr examples

library(ellipse)

corr.mtcars <- cor(mtcars)

# numbers don't quite give you what you expect

plotcorr(corr.mtcars, diag = TRUE, numbers = TRUE, type = "lower")

# colors can be nice

ord <- order(corr.mtcars[1,])

xc <- corr.mtcars[ord, ord]

colors <- c("#A50F15","#DE2D26","#FB6A4A","#FCAE91","#FEE5D9","white",

"#EFF3FF","#BDD7E7","#6BAED6","#3182BD","#08519C")

plotcorr(xc, col=colors[5*xc + 6], type = "lower")

[Figure: two plotcorr() plots of the mtcars correlation matrix, lower triangle only. The first prints each correlation as an integer from -10 to 10 (the correlation multiplied by 10 and rounded), with variables in data order. The second shows colored ellipses with the variables reordered by ord.]
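The colored plotcorr() call picks ellipse colors by turning each correlation in [-1, 1] into an index into a color vector: colors[5*xc + 6] sends -1 to the first of 11 colors, 0 to the middle one, and +1 to the last. A general version of that mapping for a palette of any length (a sketch; corr_to_index is a hypothetical helper):

```r
# Hypothetical helper: map correlations in [-1, 1] to indices 1..n
corr_to_index <- function(r, n) {
  stopifnot(all(r >= -1 & r <= 1))
  # -1 maps to 1, 0 to the middle, +1 to n
  floor((r + 1) / 2 * (n - 1)) + 1
}

pal <- colorRampPalette(c("red", "white", "blue"))(11)
idx <- corr_to_index(c(-1, -0.5, 0, 0.5, 1), length(pal))
print(idx)
print(pal[idx])
```

A mapping that can return 0, such as (r + 1)/2 * n at r = -1, silently drops that element when used as an index, which is why the sketch adds 1 after flooring.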

An improvement has been made with an updated version [22] of the plotcorr() function.

## my.plotcorr example
# my.plotcorr() is the updated function from [22]
data(mtcars)
corr.mtcars <- cor(mtcars)
# Change the column and row names for clarity
colnames(corr.mtcars) = c('Miles/gallon', 'Number of cylinders', 'Displacement',
                          'Horsepower', 'Rear axle ratio', 'Weight',
                          '1/4 mile time', 'V/S', 'Transmission type',
                          'Number of gears', 'Number of carburetors')
rownames(corr.mtcars) = colnames(corr.mtcars)
colsc = c(rgb(241, 54, 23, maxColorValue=255), 'white', rgb(0, 61, 104, maxColorValue=255))
colramp = colorRampPalette(colsc, space='Lab')
colors = colramp(100)
my.plotcorr(corr.mtcars, col=colors[((corr.mtcars + 1)/2) * 100],
            diag='ellipse', upper.panel="number", mar=c(0,2,0,0),
            main='Predictor correlations')

[22] http://hlplab.wordpress.com/2012/03/20/correlation-plot-matrices-using-the-ellipse-library/

[Figure: "Predictor correlations", the my.plotcorr() matrix for mtcars with descriptive variable names, ellipses on and below the diagonal, and correlation values (two decimals) above it.]


1.7 Beeswarm boxplot

The beeswarm plot [23] [24] is like a dot plot organized as a violin plot with the advantage that individual points may be colored categorically.

## beeswarm example 1

library(beeswarm)

data(breast)

beeswarm(time_survival ~ event_survival, data = breast,

method = 'swarm',

pch = 16, pwcol = as.numeric(ER),

xlab = '', ylab = 'Follow-up time (months)',

labels = c('Censored', 'Metastasis'))

boxplot(time_survival ~ event_survival,

data = breast, add = T,

names = c("",""), col="#0000ff22")

## beeswarm using ggplot

library(beeswarm)

data(breast)

beeswarm.out <- beeswarm(time_survival ~ event_survival,

data = breast, method = 'swarm',

pwcol = ER, do.plot=FALSE)[, c(1, 2, 4, 6)]

colnames(beeswarm.out) <- c("x", "y", "ER", "event_survival")

library(ggplot2)

library(plyr) # for round_any()

p <- ggplot(beeswarm.out, aes(x, y))

p <- p + xlab("")

p <- p + scale_y_continuous(expression("Follow-up time (months)"))

p <- p + geom_boxplot(aes(x, y, group = round_any(x, 1, round)), outlier.shape = NA)

p <- p + geom_point(aes(colour = ER))

p <- p + scale_x_continuous(breaks = c(1:2), labels = c("Censored", "Metastasis")

, expand = c(0, 0.5))

print(p)

## Warning: position_dodge requires constant width: output may be incorrect

## Warning: Removed 2 rows containing missing values (geom_point).

[23] http://gallery.r-enthusiasts.com/graph/Beeswarm_Boxplot_163
[24] http://gallery.r-enthusiasts.com/graph/Beeswarm_Boxplot_(with_ggplot2)_164

[Figure: beeswarm plots of follow-up time (months, 0 to 150) for the Censored and Metastasis groups with boxplots overlaid. The base graphics version is followed by the ggplot2 version, which adds an ER legend (neg, pos).]
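The idea behind a swarm can be sketched without the package: points with similar y values are pushed sideways so they do not overlap, which makes the width of the cloud track the local density, much like a violin plot. The helper below is a deliberately crude version (fixed-width bins, offsets centered on zero). It is not the algorithm used by beeswarm(), and swarm_offsets is a hypothetical name.

```r
# Hypothetical helper: horizontal offsets for a crude swarm.
# Points falling in the same y bin are spread symmetrically around 0.
swarm_offsets <- function(y, bin_width) {
  bins <- floor(y / bin_width)
  rank_in_bin <- ave(seq_along(y), bins, FUN = seq_along)  # 1..k within a bin
  size_of_bin <- ave(seq_along(y), bins, FUN = length)     # k for every point
  rank_in_bin - (size_of_bin + 1) / 2
}

y   <- c(1.1, 1.2, 1.3, 5)
off <- swarm_offsets(y, bin_width = 1)
print(off)
```

Plotting group + off * spacing against y, for some small spacing, gives a rough swarm; crowded bins fan out wider than sparse ones.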


1.8 Back-to-back histogram

A back-to-back histogram [25] can compare two distributions.

## Back-to-back histogram

require(Hmisc)

age <- rnorm(1000,50,10)

sex <- sample(c('female','male'),1000,TRUE)

out <- histbackback(split(age, sex), probability=TRUE, xlim=c(-.06,.06),

main = 'Back to Back Histogram')

#! just adding color

barplot(-out$left, col="red" , horiz=TRUE, space=0, add=TRUE, axes=FALSE)

barplot(out$right, col="blue", horiz=TRUE, space=0, add=TRUE, axes=FALSE)

# overlaid histograms

df <- data.frame(age, sex)

library(ggplot2)

p <- ggplot(df, aes(x = age, fill=sex))

p <- p + geom_histogram(binwidth = 5, alpha = 0.5, position="identity")

print(p)

[Figure: left, "Back to Back Histogram" of age by sex, with female bars (red) extending left and male bars (blue) extending right on a density axis running to 0.06 on each side; right, overlaid ggplot2 histograms of age (count against age) filled by sex.]

[25] http://gallery.r-enthusiasts.com/graph/back_to_back_histogram_136
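The core of a back-to-back histogram is two histograms computed on the same bin edges, with one side negated so its bars extend leftwards from zero. A base R sketch of that computation without Hmisc (the seed, the 5-year bin width, and the name b2b are our choices for illustration):

```r
set.seed(590)  # arbitrary seed, for reproducibility only
age <- rnorm(1000, 50, 10)
sex <- sample(c("female", "male"), 1000, TRUE)

# Shared bin edges so the two sides line up
breaks <- seq(floor(min(age) / 5) * 5, ceiling(max(age) / 5) * 5, by = 5)
h <- lapply(split(age, sex), hist, breaks = breaks, plot = FALSE)

# The left side is negated so its bars point away from the right side
b2b <- data.frame(lower  = head(breaks, -1),
                  female = -h$female$density,
                  male   =  h$male$density)
print(b2b)
```

The two columns can then be drawn with barplot(b2b$female, horiz = TRUE, space = 0) followed by barplot(b2b$male, horiz = TRUE, space = 0, add = TRUE), the same pattern used to add color in the example above.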

1.9 Graphs (networks) with directed edges

Graphs can be hard to represent, and directed graphs [26] doubly so. There is now a solution [27] which I think looks beautiful.

library(sna)
library(ggplot2)
library(Hmisc)
library(reshape2)

# Empty ggplot2 theme
new_theme_empty <- theme_bw()
new_theme_empty$line <- element_blank()
new_theme_empty$rect <- element_blank()
new_theme_empty$strip.text <- element_blank()
new_theme_empty$axis.text <- element_blank()
new_theme_empty$plot.title <- element_blank()
new_theme_empty$axis.title <- element_blank()
# build the margin with grid::unit() rather than a hand-made "unit" structure
new_theme_empty$plot.margin <- grid::unit(c(0, 0, -1, -1), "lines")

data(coleman) # Load a high school friendship network
adjacencyMatrix <- coleman[1, , ] # Fall semester

# First plot
layoutCoordinates <- gplot(adjacencyMatrix) # Get graph layout coordinates
adjacencyList <- melt(adjacencyMatrix) # Convert to list of ties only
adjacencyList <- adjacencyList[adjacencyList$value > 0, ]

# Function to generate paths between each connected node
edgeMaker <- function(whichRow, len = 100, curved = TRUE){
  fromC <- layoutCoordinates[adjacencyList[whichRow, 1], ] # Origin
  toC <- layoutCoordinates[adjacencyList[whichRow, 2], ] # Terminus
  # Add curve:
  graphCenter <- colMeans(layoutCoordinates) # Center of the overall graph
  bezierMid <- c(fromC[1], toC[2]) # A midpoint, for bended edges
  distance1 <- sum((graphCenter - bezierMid)^2)
  if(distance1 < sum((graphCenter - c(toC[1], fromC[2]))^2)){
    bezierMid <- c(toC[1], fromC[2])
  } # To select the best Bezier midpoint
  bezierMid <- (fromC + toC + bezierMid) / 3 # Moderate the Bezier midpoint
  if(curved == FALSE){bezierMid <- (fromC + toC) / 2} # Remove the curve
  edge <- data.frame(bezier(c(fromC[1], bezierMid[1], toC[1]), # Generate
                            c(fromC[2], bezierMid[2], toC[2]), # X & y
                            evaluation = len)) # Bezier path coordinates
  edge$Sequence <- 1:len # For size and colour weighting in plot
  edge$Group <- paste(adjacencyList[whichRow, 1:2], collapse = ">")
  return(edge)
}

# Generate a (curved) edge path for each pair of connected nodes
allEdges <- lapply(1:nrow(adjacencyList), edgeMaker, len = 500, curved = TRUE)
allEdges <- do.call(rbind, allEdges) # a fine-grained path ^, with bend ^

zp1 <- ggplot(allEdges) # Pretty simple plot code
zp1 <- zp1 + geom_path(aes(x = x, y = y, group = Group, # Edges with gradient
                           colour = Sequence, size = -Sequence)) # and taper
zp1 <- zp1 + geom_point(data = data.frame(layoutCoordinates), # Add nodes
                        aes(x = x, y = y), size = 2, pch = 21,
                        colour = "black", fill = "gray") # Customize gradient v
zp1 <- zp1 + scale_colour_gradient(low = gray(0), high = gray(9/10), guide = "none")
zp1 <- zp1 + scale_size(range = c(1/10, 1), guide = "none") # Customize taper
zp1 <- zp1 + new_theme_empty # Clean up plot
print(zp1)

[26] http://www.win.tue.nl/~dholten/papers/directed_edges_chi.pdf
[27] http://is-r.tumblr.com/post/38459242505/beautiful-network-diagrams-with-ggplot2

[Figure: the Coleman friendship network drawn with curved edges that are dark and thick at the origin node and fade and taper toward the terminus.]
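edgeMaker() relies on bezier() from Hmisc to bend each edge through a control point. With three points this is a quadratic Bezier curve, B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2 for t in [0, 1]. A base R sketch of that formula (quad_bezier is a hypothetical helper; Hmisc's bezier() accepts any number of control points and may space its evaluation points differently):

```r
# Hypothetical helper: n points along a quadratic Bezier curve
# from p0 to p2 with control point p1 (each a length-2 vector).
quad_bezier <- function(p0, p1, p2, n = 100) {
  t <- seq(0, 1, length.out = n)
  x <- (1 - t)^2 * p0[1] + 2 * (1 - t) * t * p1[1] + t^2 * p2[1]
  y <- (1 - t)^2 * p0[2] + 2 * (1 - t) * t * p1[2] + t^2 * p2[2]
  data.frame(x = x, y = y, Sequence = seq_len(n))
}

path <- quad_bezier(c(0, 0), c(0, 1), c(1, 1), n = 5)
print(path)
```

The curve starts at P0, ends at P2, and is pulled toward P1 without passing through it, which is why edgeMaker() averages the control point with the endpoints to moderate the bend.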

