Reproducible Research and R - Alec Zwart

Post on 27-Jan-2015


Reproducible Research and R

CSIRO MATHEMATICS, INFORMATICS, AND STATISTICS

Alec Zwart 21 November 2012

Image: Fomel, S. & Claerbout, J. F. Guest Editors’ Introduction: Reproducible Research. Computing in Science & Engineering 11, 5–7 (2009).

Reproducible Research and R | Alec Zwart

Preliminary: Markup


Donald Knuth - Literate Programming

• D. E. Knuth, ‘Literate Programming’, The Computer Journal, 1984.

‘Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.’


WEB:


‘Weave’

‘Tangle’
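
As a rough illustration of the two operations named on this slide, the Python sketch below takes one literate source and produces a machine-facing file (tangle) and a reader-facing listing (weave). The chunk syntax (`<<name>>=` to open, `@` to close) is a toy, noweb-like assumption, and `split_source`, `tangle` and `weave` are hypothetical helper names. Real WEB does considerably more, for example assembling chunks in an order different from the text, which this sketch does not attempt.

```python
import re

# Hypothetical toy source: prose lines interleaved with named code chunks.
# A chunk opens with "<<name>>=" and closes with a line containing only "@".
SOURCE = """\
We first define a greeting.
<<greet>>=
def greet(name):
    return "Hello, " + name
@
Then we use it.
<<main>>=
print(greet("world"))
@
"""

CHUNK_OPEN = re.compile(r"^<<(.+)>>=\s*$")


def split_source(text):
    """Return a list of (kind, name, lines) pieces; kind is 'text' or 'code'."""
    pieces = []
    kind, name, buf = "text", None, []
    for line in text.splitlines():
        m = CHUNK_OPEN.match(line)
        if kind == "text" and m:
            if buf:
                pieces.append(("text", None, buf))
            kind, name, buf = "code", m.group(1), []
        elif kind == "code" and line.strip() == "@":
            pieces.append(("code", name, buf))
            kind, name, buf = "text", None, []
        else:
            buf.append(line)
    if buf:
        pieces.append((kind, name, buf))
    return pieces


def tangle(text):
    """Keep only the code, in document order: the file meant for the machine."""
    chunks = [lines for kind, _, lines in split_source(text) if kind == "code"]
    return "\n".join("\n".join(lines) for lines in chunks) + "\n"


def weave(text):
    """Keep everything: prose as-is, code indented as a labelled listing."""
    out = []
    for kind, name, lines in split_source(text):
        if kind == "text":
            out.extend(lines)
        else:
            out.append(f"[chunk: {name}]")
            out.extend("    " + line for line in lines)
    return "\n".join(out) + "\n"


print(tangle(SOURCE))
print(weave(SOURCE))
```

Both outputs come from the same file, so the explanation and the program cannot drift apart.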


Weaving (modern version)

Tools: CWEB, noweb, Sweave, knitr…

• Source document: text w/ markup + code blocks
• Language translator (R, Python…) runs the code blocks, giving code block outputs
• Woven document: text, code & output w/ markup
• Text markup processor (LaTeX, web browser, Markdown processor) turns this into formatted output
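
The modern weaving flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how Sweave or knitr are implemented: the chunk delimiters (`<<name>>=` and `@`), the output prefixes and the function name `weave_and_run` are all assumptions made for the example. Code chunks are executed by the language translator (here Python's own `exec`), and their printed output is placed in the document next to the code, ready for a markup processor.

```python
import contextlib
import io

# Hypothetical toy document: a chunk opens with "<<name>>=" and closes with "@".
DOC = """\
The sample mean of our data:
<<mean>>=
data = [2, 4, 9]
print(sum(data) / len(data))
@
The largest value:
<<max>>=
print(max(data))
@
"""


def weave_and_run(text):
    """Execute each code chunk and place its printed output after the code."""
    env = {}  # one shared namespace, so later chunks see earlier variables
    out, code, in_code = [], [], False
    for line in text.splitlines():
        if not in_code and line.startswith("<<") and line.endswith(">>="):
            in_code, code = True, []
        elif in_code and line.strip() == "@":
            buf = io.StringIO()
            with contextlib.redirect_stdout(buf):
                exec("\n".join(code), env)  # the "language translator" step
            out.extend("    > " + c for c in code)  # echo the code
            out.extend("    ## " + o for o in buf.getvalue().splitlines())  # its output
            in_code = False
        elif in_code:
            code.append(line)
        else:
            out.append(line)  # prose with markup passes through untouched
    return "\n".join(out) + "\n"


print(weave_and_run(DOC))
```

Because the numbers in the woven text are produced by running the code, changing `data` and re-weaving updates every reported result.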


Why?

• Knuth – a way to program

• Automatic report generation (web services)

• Reports, articles, program documentation/tutorials

• Reproducible research


Reproducible Research

• Promoted by Jon F. Claerbout, Stanford University (1990s?).

• Early publication: ‘WaveLab and Reproducible Research’, Buckheit & Donoho 1995.
    • ‘When we publish articles containing figures which were generated by computer, we also publish the complete software environment which generates the figures’.

• Special issue: Computing in Science & Engineering, Vol. 11, No. 1, 2009.



Reproducible Research in Statistics

• Gentleman & Temple Lang 2004, ‘Statistical Analysis and Reproducible Research’.

‘It is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, and so on with the documents that describe and rely on them.’


Gentleman and Temple Lang: The Compendium


Literate programming systems in R

• CRAN Task Views: ReproducibleResearch

• Sweave (R+LaTeX, standard for vignette production)

• Knitr (various programming languages + various markup formats)

• Other possibilities (ascii, odfWeave, brew, etc.)


Knitr + Markdown – Yihui Xie


Publish on…


CSIRO Mathematics, Informatics & Statistics
Alec Zwart

t +61 2 6216 7010
e alec.zwart@csiro.au

CSIRO MATHEMATICS, INFORMATICS AND STATISTICS

Thank you


Reproducible Research – again, why?

• Anil Potti - Duke University, North Carolina
    • Personalised medicine for cancer patients
    • Microarray work

• Statisticians Keith Baggerly, Kevin Coombes intrigued by results from Potti’s research – decide to investigate:
    • Found errors, including lots of simple ones – mislabelled samples, mismatched gene names, etc.

• To date: 10 retractions, 7 corrections, 1 partial retraction. Anil Potti resigned.

• Dishonesty? Ignorance, incompetence + wishful thinking? Unclear.


B & D: Reproducible Research – why?

• Buckheit & Donoho – anecdotes:
    • Which of these printouts was the right version of the figure? – Arrgh!
    • Stolen briefcase – loss of irreplaceable figures.
    • Limitations of oral communication of software & algorithms.
    • Documentation – returning to old work.
    • Er – can’t remember what parameter values gave this result – not to worry…

• ‘An article about computational science in a scientific publication is NOT the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures’ - Buckheit & Donoho


Knitr + markdown

• Markdown – text formatting system, not nearly as powerful as LaTeX, but simple

• Knitr + markdown great for producing quick reports in HTML

• Incorporated into RStudio – See RStudio pages for docs.

• Knitr webpage: http://yihui.name/knitr

• Documentation for code chunk options: http://yihui.name/knitr/options





G & TL – the Compendium

• For RR, may need to provide:
    • Dynamic document files
    • Extra code files
    • Extra text processing files (e.g. LaTeX style files, etc?)
    • Data files
    • Instructions/documentation

• Place all of this in a suitable container – the Compendium
    • A folder with subfolders
    • An R package!

• GolubRR package – Gentleman 2005, ‘Reproducible Research: A Bioinformatics Case Study’.
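
The "folder with subfolders" form of a compendium might be set up along the lines of the Python sketch below. The subfolder names (`doc`, `code`, `tex`, `data`), the `README.txt` file and the helper names `make_compendium` and `missing_parts` are illustrative assumptions; neither the slide nor Gentleman & Temple Lang prescribe this exact layout.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: subfolder names are illustrative, not prescribed by the source.
COMPENDIUM_LAYOUT = {
    "doc": "Dynamic document files (e.g. the Sweave/knitr source)",
    "code": "Extra code files",
    "tex": "Extra text processing files (e.g. LaTeX style files)",
    "data": "Data files",
}


def make_compendium(root):
    """Create the compendium folder, its subfolders, and a top-level README."""
    root = Path(root)
    readme = ["Compendium contents", ""]
    for name, purpose in COMPENDIUM_LAYOUT.items():
        (root / name).mkdir(parents=True, exist_ok=True)
        readme.append(f"{name}/  {purpose}")
    (root / "README.txt").write_text("\n".join(readme) + "\n", encoding="utf-8")
    return root


def missing_parts(root):
    """List expected pieces that are absent, as a quick completeness check."""
    root = Path(root)
    missing = [name for name in COMPENDIUM_LAYOUT if not (root / name).is_dir()]
    if not (root / "README.txt").is_file():
        missing.append("README.txt")
    return missing


with tempfile.TemporaryDirectory() as tmp:
    comp = make_compendium(Path(tmp) / "my-compendium")
    print(sorted(p.name for p in comp.iterdir()))
    print(missing_parts(comp))
```

A check like `missing_parts` is one way to confirm, before sharing, that the instructions and data actually travel with the dynamic document.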


WEB


CWEB, WEB

‘Weave’

‘Tangle’


Knitr

• Yihui Xie
• Scratching Sweave itches?
• Greater functionality
    • better R output capture,
    • better code formatting,
    • built-in caching,
    • better graphics handling,
    • source R code from scripts,
    • more customizable.

• Multiple programming languages (R, Python, AWK), and alternative text processing systems (LaTeX, Markdown, reStructuredText & more)
