Reproducible Research and the Cloud

Posted on 26-Jan-2015


description

Research results in peer-reviewed publications are reproducible, right? If only it were so clear-cut. With high-profile paper retractions and pushes for better data sharing by funders, publishers and the community, the spotlight is now on the way research is conducted around the world. This talk from the Software Sustainability Institute's Collaborations Workshop 2014 describes how cloud computing, with Microsoft Azure, is helping researchers realize the goals of scientific reproducibility. Find out more at www.azure4research.com

transcript

Reproducible Research and the Cloud

Dr Kenji Takeda (Kenji.Takeda@Microsoft.com)

Microsoft Research

@azure4research


Scientific Discovery

Credit: ROYAL INSTITUTION OF GREAT BRITAIN / SCIENCE PHOTO LIBRARY

$$\rho \frac{Dv}{Dt} = -\nabla p + \nabla \cdot \boldsymbol{T} + \boldsymbol{f}$$

The Research Lifecycle

Data acquisition & modelling

Collaboration and visualisation

Analysis & data mining

Dissemination & sharing

Archiving and preserving

fourthparadigm.org

Believe it or not: how much can we rely on published data on potential drug targets?

“at least 50% of published studies, even those in top-tier academic journals, can’t be repeated with the same conclusions by an industrial lab”

Osherovich, L. Hedging against academic risk. SciBX 14 Apr 2011 (doi:10.1038/scibx.2011.416).

CLOUD COMPUTING

Global presence

Datacenter

Edge point

The Microsoft Cloud

Cloud Computing

Choose from multiple runtimes and languages for your applications: Python, Java, PHP, .NET, Node.js

Run Linux on Windows Azure Virtual Machines (VHD)

Support multiple frameworks and popular open source applications with Windows Azure Web Sites

HDInsight Hadoop for Big Data analysis

Windows Azure

http://github.com/windowsazure

REPRODUCIBLE RESEARCH

http://www.phdcomics.com/comics.php?f=1689

• Computational experiments should be recomputable for all time

• Recomputation of recomputable experiments should be very easy

• It should be easier to make experiments recomputable than not to

• Tools and repositories can help recomputation become standard

• The only way to ensure recomputability is to provide virtual machines

• Runtime performance is a secondary issue

Ian Gent, Alexander Konovalov and Lars Kotthoff

Steven Crouch, Devasena Inupakutika

Recomputation.org
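The recomputation points above argue that only a virtual machine fully guarantees recomputability. As a lighter-weight complement, the sketch below records which interpreter, operating system and packages an experiment ran with. It is an illustration only, not tooling from the talk or from Recomputation.org; the function names `capture_environment` and `environment_to_json` are hypothetical.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Describe the interpreter, operating system and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


def environment_to_json(env):
    """Serialise the manifest so it can be archived next to the results."""
    return json.dumps(env, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(environment_to_json(capture_environment()))
```

A manifest like this does not replace a VM image, since it says nothing about system libraries or hardware, but it is cheap to store with every run and makes differences between two environments easy to spot.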

Zanadu.IO

Patrick Henaff and Claude Martini


khmer-protocols:

• Effort to provide standard “cheap” assembly protocols for cloud machines.

• Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis. Estimated ~$150 per data set.

• Open, versioned, forkable, citable.

Open Science

C. Titus Brown, @ctitusbrown

http://ged.cse.msu.edu/

http://ivory.idyll.org/

Explicitly a “protocol” – explicit steps, copy-paste, customizable, versioned; not black box.

No requirement for computational expertise or significant computational hardware.

~1-5 days to teach a bench biologist to use.

$100-150 of rental compute (“cloud computing”) for a $1000 data set.

Now adding in quality control and internal validation steps.

Some thoughts…

Reproducible computing environment (Azure)

Publicly available data (MMETSP)

Open and versioned protocol

Provenance tracking and registration (Synapse?)
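The four ingredients above (environment, public data, versioned protocol, provenance) can be tied together in a single record per run. The sketch below is a minimal illustration under stated assumptions: it does not use Synapse or any Azure API, and the field names, the `make_provenance_record` helper, and the example version and image identifiers are all hypothetical.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data sets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_provenance_record(data_path, protocol_version, environment_id):
    """Link one input file to the protocol version and environment used."""
    return {
        "data_file": os.path.basename(data_path),
        "data_sha256": sha256_of_file(data_path),
        "protocol_version": protocol_version,  # e.g. a tag of the versioned protocol
        "environment_id": environment_id,      # e.g. the name of a VM image
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a real sequencing file.
    with tempfile.NamedTemporaryFile("wb", suffix=".fastq", delete=False) as tmp:
        tmp.write(b"@read1\nACGT\n+\nIIII\n")
    try:
        record = make_provenance_record(tmp.name, "v0.1-example", "example-vm-image")
        print(json.dumps(record, indent=2))
    finally:
        os.remove(tmp.name)
```

Hashing the input means a later reader can check that they are analysing exactly the same data, while the protocol and environment identifiers point to the versioned steps and machine image needed to rerun it.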

Distribution Modeller

<compute + data>

Middle ground between:

Exploratory science

Procedural science

Black box that can be cracked open and modified

• Reproducing my own results

• Replicating other people’s results

• Reproducing other people’s results

Repeatability, Replicability, Reproducibility, Reuse

“reviewers have no time and no resources to reproduce data and to dig deeply into the presented work.”

Life Sci VC: Academic bias & biotech failures: http://lifescivc.com/2011/03/academic-bias-biotech-failures/

Photo: leechantmcarthur, CC-BY

Windows Azure for Research

• Azure Research Awards

• Windows Azure for Research Training Courses

– Manchester, 3-4 April’14

• Webinars

• Technical resources & curriculum

• Research community engagements

www.azure4research.com