Sweave: Reproducible Research using R and LaTeX
Sandra D. Griffith
Department of Biostatistics and Epidemiology
University of [email protected]
Biostatistics Computing Workshop Series
March 15, 2012
S. Griffith ([email protected]) Sweave 15 March 2012 1 / 20
Non-reproducible Research
• Characteristics
  - Prepare or manipulate data in a spreadsheet
  - Cut and paste output to create tables
  - Multiple versions of data and analysis scripts
  - Create many versions of graphics, selecting only one for final presentation of results
• Problems
  - Data, code, and results not linked
  - Any changes in analysis or data require manual regeneration of results
  - Workflow or organization scheme may change over time
  - Can be difficult to replicate in the future
  - Less forensic evidence if results are questioned
Response to Duke University Scandal
“We now require most of our reports to be written using Sweave, a literate programming combination of LaTeX source and R code (SASweave and odfWeave are also available) so that we can rerun the reports as needed and get the same results.”
Sweave: Conceptual Overview
• Link data, code, and results with a single .Rnw file
  - Similar to a .tex file, but includes interspersed “chunks” of R code
  - Uses noweb syntax for literate programming
• Weave the .Rnw file to produce a .tex file that includes the output from the R code
• Compile the .tex file to PDF or PS as usual
• Tangle the .Rnw file to extract the R code into a separate file
• Figures are included in the output and also saved as individual files
• R expressions can be evaluated inline in regular document text using \Sexpr{}
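A minimal .Rnw file that shows these pieces together might look like the sketch below. The chunk label `summary-stats` and the use of R's built-in `cars` data set are illustrative assumptions, not part of the workshop materials.

```latex
\documentclass{article}
\begin{document}

% An R code chunk in noweb syntax: opens with <<...>>= and closes with @
<<summary-stats, echo=TRUE>>=
n <- nrow(cars)
mean_speed <- mean(cars$speed)
@

% \Sexpr{} evaluates an R expression inline in the running text
The data set contains \Sexpr{n} observations with a mean speed of
\Sexpr{round(mean_speed, 1)} mph.

\end{document}
```

Weaving this file replaces the chunk with its echoed code and each \Sexpr{} with the value of its expression.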
Getting Started with Sweave
• Assume R and LaTeX already installed
• Sweave.sty is already included with base R installation
  - Preferred method: include R folder containing Sweave.sty in your TeX path
    * Will automatically update style file when you update R
  - Copy Sweave.sty to a centralized location with other style files, also in your TeX path
    * Requires manual updates, but can be located in a central location shared among computers (e.g. Dropbox)
  - Hard path: include \usepackage{...\Sweave} in preamble
  - Copy Sweave.sty into same folder as each .Rnw file
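To find the folder that holds Sweave.sty, something like the following sketch can be run in R. It assumes the standard layout of an R installation (share/texmf/tex/latex); the variable names are illustrative.

```r
# Folder and file where R normally ships its Sweave style file
sty_dir  <- file.path(R.home("share"), "texmf", "tex", "latex")
sty_path <- file.path(sty_dir, "Sweave.sty")

# Report the expected location and whether the file is really there
cat("Sweave.sty expected at:", sty_path, "\n")
cat("File exists:", file.exists(sty_path), "\n")
```

Adding `sty_dir` to the TeX search path (for example through the TEXINPUTS environment variable) corresponds to the preferred method above.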
Anatomy of a Code Chunk
<< label (optional), options >>=
insert R code here
@
Commonly-used options (see manual for full list)
• echo = F
Suppress R input from appearing in document (default = T)
• eval = F
R code not evaluated (default = T)
• results = hide
Suppress R output from appearing in document (default = verbatim)
• results = tex
R output will be read as TeX (default = verbatim)
• fig = T
Code chunk includes a figure (default = F)
Global Options
Default options can be set in preamble and updated throughout document
• Set R chunk options
  \SweaveOpts{eval=T, echo=F}
• Preserve comments and spacing of echoed R code
  \SweaveOpts{keep.source=TRUE}
• Figure options for height, width, and file type
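As a sketch of how the figure options might be set globally, the fragment below uses the standard Sweave options `width`, `height`, `pdf`, and `eps`. The specific values are arbitrary choices for illustration.

```latex
% In the preamble: defaults for every chunk in the document
\SweaveOpts{echo=FALSE, keep.source=TRUE}
\SweaveOpts{width=6, height=4, pdf=TRUE, eps=FALSE}

% Later in the document: new defaults from this point onward
\SweaveOpts{echo=TRUE}
```

Options given in an individual chunk header still override these defaults for that chunk.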
Example
<<echo=T>>=
x <- exp(2.3)
x
@
> x <- exp(2.3)
> x
[1] 9.974182
<<echo=F>>=
x <- exp(2.3)
x
@
[1] 9.974182
<<echo=T, results=hide>>=
x <- exp(2.3)
x
@
> x <- exp(2.3)
> x
Compiling an Sweave Document
• Manually (Windows or Mac)
  1. Run Sweave("foo.Rnw") in R console
  2. Open foo.tex in a TeX editor
  3. Compile PDF using TeX editor
  4. Stangle("foo.Rnw") to extract R code if desired
• Manually (Linux/Unix)
1. Run R CMD Sweave foo.Rnw
2. Run pdflatex foo or latex foo
• Integrated Development Environment (IDE)
  - RStudio, Emacs (ESS), Eclipse (StatET), etc.
  - If supported, usually one click/command for all steps (Sweave, compile TeX, view PDF)
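The manual steps can also be scripted from R. The sketch below writes a small made-up foo.Rnw into a temporary folder, weaves and tangles it, and compiles the result with `tools::texi2pdf()` only if pdflatex is found. The file contents and the temporary folder are assumptions for illustration.

```r
# Work in a scratch folder so nothing in the project is overwritten
workdir <- tempfile("sweave-demo")
dir.create(workdir)
oldwd <- setwd(workdir)

# A tiny example document (hypothetical contents)
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<echo=TRUE>>=",
  "x <- exp(2.3)",
  "x",
  "@",
  "\\end{document}"
), "foo.Rnw")

Sweave("foo.Rnw")    # weave: writes foo.tex
Stangle("foo.Rnw")   # tangle: writes foo.R

# Compile to PDF only when a LaTeX installation is on the PATH
if (nzchar(Sys.which("pdflatex"))) {
  tryCatch(
    tools::texi2pdf("foo.tex"),
    error = function(e) message("LaTeX compilation failed: ", conditionMessage(e))
  )
}

produced <- list.files()
setwd(oldwd)
print(produced)
```

In a real project the same three calls would be run in the folder that contains the .Rnw file.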
The xtable Package: Basic Table Code
R package to convert many R objects to LaTeX or HTML tables
<<label=tab:GenderRace, results=tex>>=
library(xtable)
data(tli)
xtable(table(tli$ethnicty, tli$sex),
caption="Distribution of gender and ethnicity")
@
<<label=tab:LM1, results=tex>>=
lm1 <- lm(tlimth ~ sex + ethnicty, data=tli)
xtable(lm1, caption="Linear Model Results")
@
The xtable package: Basic Table Output
            F    M
BLACK      11   12
HISPANIC    8   12
OTHER       2    0
WHITE      30   25
Table: Distribution of gender and ethnicity

                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         71.0226      3.2894    21.59    0.0000
sexM                 3.3734      2.8594     1.18    0.2410
ethnictyHISPANIC    -3.7466      4.3044    -0.87    0.3863
ethnictyOTHER       18.4774     10.4716     1.76    0.0809
ethnictyWHITE        7.4622      3.4964     2.13    0.0354
Table: Linear Model Results
The xtable package: Customized Tables
> mat <- round(matrix(c(0.9, 0.89, 200, 0.045, 2.0),
+ c(1, 5)), 4)
> rownames(mat) <- "$y_{t-1}$"
> colnames(mat) <- c("$R^2$", "$\\bar{R}^2$",
+ "F-stat", "S.E.E", "DW")
> mat <- xtable(mat)
> print(mat, sanitize.text.function = function(x){x})
            $R^2$  $\bar{R}^2$  F-stat  S.E.E    DW
$y_{t-1}$    0.90         0.89  200.00   0.04  2.00

Almost all functionality available for LaTeX tables can be included directly in R code using xtable
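As a further sketch of customization, the chunk below sets a caption, label, and digits in `xtable()` and passes layout arguments to `print()`. The chunk label `tab:tliHead`, the caption text, and the choice of rows and columns are illustrative assumptions.

```latex
<<label=tab:tliHead, echo=FALSE, results=tex>>=
library(xtable)
data(tli)
# First rows of the tli data set shipped with xtable
tab <- xtable(tli[1:5, c("grade", "sex", "ethnicty", "tlimth")],
              caption = "First five students in the tli data",
              label = "tab:tliHead",
              digits = 0)
# Arguments of print() for xtable objects control the emitted LaTeX
print(tab,
      include.rownames = FALSE,    # drop the row-number column
      caption.placement = "top",   # caption above the table
      table.placement = "htbp")    # LaTeX float placement
@
```

The label set in `xtable()` lets the table be cited with \ref{tab:tliHead} elsewhere in the document.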
Aside: Using xtable for MS Word Tables
Non-statistical collaborators often prefer tabular results in MS Word
print(xtable(table(tli$ethnicty, tli$sex)),
      type="html",                # type and file are arguments of print(), not xtable()
      file="TabGenderRace.htm"
)
1. Save results in HTML file by calling print() on the xtable object in R
2. Open “TabGenderRace.htm” in a browser
3. Copy and paste into Word document as a fully-formatted table
Basic Figure Example
<<fig=T, echo=F, width=5, height=3.5>>=
plot(1:10, rnorm(10))
@
[Figure: scatterplot of rnorm(10) (y-axis) against 1:10 (x-axis)]
NB: Embed figure chunk within a LaTeX figure environment for more precise control
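A figure chunk wrapped in a LaTeX figure environment could be sketched as follows. The chunk label `scatter`, the caption text, and the \label name are illustrative assumptions.

```latex
\begin{figure}[htbp]
\centering
<<scatter, fig=TRUE, echo=FALSE, width=5, height=3.5>>=
plot(1:10, rnorm(10))
@
\caption{Ten standard normal draws plotted against their index.}
\label{fig:scatter}
\end{figure}
```

The environment supplies float placement, centering, a caption, and a label for cross-references, none of which the bare chunk provides.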
Large or Computationally Intensive Projects
• Use input statements or make files
• save() and load() intermediate results
• Conditional evaluation: if (file exists) {load file} else {run; save file}
• Change R chunk evaluation options as necessary
• R package: cacheSweave to cache intermediate results
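The save()/load() pattern with conditional evaluation might be sketched like this. The file name `slow_fit.RData` and the function `slow_analysis()` are hypothetical stand-ins for a real long-running step.

```r
# Cache an expensive result on disk and reuse it on later runs.
# tempdir() keeps the sketch self-contained; a real project would
# store the cache file next to the .Rnw file instead.
cache_file <- file.path(tempdir(), "slow_fit.RData")

slow_analysis <- function() {
  Sys.sleep(1)                     # stand-in for a long computation
  lm(dist ~ speed, data = cars)    # built-in example data
}

if (file.exists(cache_file)) {
  load(cache_file)                 # restores the object `fit`
} else {
  fit <- slow_analysis()
  save(fit, file = cache_file)
}

coef(fit)
```

Deleting the cache file forces the computation to run again, which is worth doing whenever the data or the analysis code changes.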
Including R code as an Appendix
• Useful for homework, solution sets, etc.
• Include \usepackage{listings} in the preamble
• Include the following R chunk and TeX code in foo.Rnw where you would like to place appendix
<<echo=FALSE, results=hide, split=TRUE>>=
Stangle(file="foo.Rnw",output="foo.R",
annotate=FALSE)
@
\pagebreak
\section{R Code}
\texttt{\lstinputlisting[emptylines=0]{foo.R}}
Miscellaneous Sweave Tricks
• Load all libraries in one chunk with results = hide option to suppress unwanted output (e.g. package dependencies)
• Beamer presentations
  - Include [fragile] option for every frame with R code to handle verbatim output
  - For frames with TeX and verbatim output, must include [containsverbatim] option instead
• R graphics package ggplot2
  - Must use print() wrapper for ggplot objects
• R session information
  > toLatex(sessionInfo(), locale=F)
  - R version 2.14.1 (2011-12-22), x86_64-pc-mingw32
  - Base packages: base, datasets, graphics, grDevices, methods, stats, utils
  - Other packages: xtable 1.7-0
  - Loaded via a namespace (and not attached): tools 2.14.1
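Two of these tricks can be combined in one sketch: a Beamer frame marked [fragile] that contains a ggplot2 figure chunk. The frame title, the chunk label `ggplot-demo`, and the use of the built-in `cars` data are illustrative assumptions.

```latex
\begin{frame}[fragile]
\frametitle{Stopping distance by speed}
<<ggplot-demo, fig=TRUE, echo=TRUE, width=5, height=3>>=
library(ggplot2)
p <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()
print(p)  # explicit print() draws the ggplot object
@
\end{frame}
```

The [fragile] option is needed here because echo=TRUE places verbatim R code inside the frame.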
Alternatives for Reproducible Research
• R for other document formats
  - HTML: R2HTML
  - Open Office: odfWeave
  - MS Word: SWord
  - MS PowerPoint: R2PPT
• Other statistical packages
  - StatWeave for SAS, Stata, or MATLAB and LaTeX or Open Office
  - Various other software-specific report generators
Resources
• Sweave user manual (Friedrich Leisch): http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-manual.pdf
• Stack Overflow questions tagged Sweave: http://stackoverflow.com/questions/tagged/sweave
• Keith Baggerly’s introduction to Sweave: http://bioinformatics.mdanderson.org/SweaveTalk/sweaveTalkb.pdf
• Quick-R summary of alternatives to Sweave: http://www.statmethods.net/interface/output.html
• Citing R with Sweave: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SweaveLatex/RCitation.pdf
• xtable gallery with examples: http://cran.r-project.org/web/packages/xtable/vignettes/xtableGallery.pdf