8/11/2019 2014 EMCSR Kenji Takeda ReproducibilityWorkshopStAndrews
Reproducible Research and the Cloud
Dr Kenji Takeda
Microsoft Research
@azure4research @ktakeda1
Microsoft Research
Scientific Discovery
The Research Lifecycle
Data acquisition & modelling
Collaboration and visualisation
Analysis & data mining
Dissemination & sharing
Archiving and preserving
fourthparadigm.org
X-Info
The evolution of X-Info and Comp-X for each discipline X
How to codify and represent our knowledge
(Diagram: Experiments & Instruments, Simulations, Literature, and Other Archives each supply facts; questions go in, answers come out.)
The Generic Problems
Data ingest
Managing a petabyte
Common schema
How to organize it
How to reorganize it
How to share with others
Query and Vis tools
Building and executing models
Integrating data and Literature
Documenting experiments
Curation and long-term preservation
Data-Intensive Research
Believe it or not: how much can we rely on published data on potential drug targets?
"at least 50% of published studies, even those in top-tier academic journals, can't be repeated with the same conclusions by an industrial lab"
Osherovich, L. Hedging against academic risk. SciBX, 14 Apr 2011 (doi:10.1038/scibx.2011.416).
http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html
Cold fusion
Science 2.0 EU Consultation
http://www.consultation-science20.eu/
CLOUD COMPUTING
Cloud computing provides
On-demand services, delivered over the network
Cloud computing is good for
Getting what you need, when you need it
Cloud computing is good for
Focussing on your research
The Cloud
democratizes
access to scale &
economies of scale
Cloud Computing Patterns
(Each pattern is illustrated by a plot of compute against time t; the on-and-off plot marks an inactivity period.)

On and Off
On & off workloads (e.g. batch job)
Over-provisioned capacity is wasted
Time to market can be cumbersome

Growing Fast
Successful services need to grow/scale
Keeping up with growth is a big IT challenge
Cannot provision hardware fast enough

Unpredictable Bursting
Unexpected/unplanned peak in demand
Sudden spike impacts performance
Can't over-provision for extreme cases

Predictable Bursting
Services with micro-seasonality trends
Peaks due to periodic increased demand
IT complexity and wasted capacity
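The trade-off behind these patterns can be illustrated with a small calculation. The Python sketch below is illustrative only: the demand series, the capacity values, and the function names are assumptions made for this example, not figures from the talk. For a fixed fleet it counts the server-hours that sit idle (over-provisioning) and the server-hours of demand that go unserved (under-provisioning).

```python
def provisioning_waste(demand, fixed_capacity):
    """Return (idle server-hours, unmet server-hours) for a fixed-size fleet.

    demand is a list of servers needed in each hour (hypothetical data).
    """
    idle = sum(max(fixed_capacity - d, 0) for d in demand)
    unmet = sum(max(d - fixed_capacity, 0) for d in demand)
    return idle, unmet


def on_demand_usage(demand):
    """Server-hours consumed if capacity could track demand exactly."""
    return sum(demand)


# Hypothetical "on and off" batch workload: busy for a few hours, then idle.
on_off = [0, 0, 8, 8, 8, 0, 0, 0, 8, 8, 0, 0]

# Hypothetical "unpredictable bursting" workload: one sudden spike.
bursting = [2, 2, 2, 10, 2, 2]

print(provisioning_waste(on_off, fixed_capacity=8))    # (56, 0)
print(on_demand_usage(on_off))                         # 40
print(provisioning_waste(bursting, fixed_capacity=4))  # (10, 6)
```

In the on-and-off case a fleet sized for the peak idles for 56 server-hours to deliver 40 useful ones; in the bursting case a modest fleet both idles and fails to serve the spike. Real pricing and scaling delays would change the numbers, but not the shape of the argument.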
The Microsoft Cloud
Global presence
(Map legend: datacenter, edge point)
Cloud Computing
Microsoft Azure
Choose from multiple runtimes and languages for your applications: Python, Java, PHP, .NET, Node.js
Run Linux on Microsoft Azure Virtual Machines (VHD)
Support multiple frameworks and popular open source applications with Microsoft Azure Web Sites
HDInsight: Hadoop for Big Data analysis
http://github.com/windowsazure
Research Cloud Ecosystem
http://www.phdcomics.com/comics.php?f=1689
Computational experiments should be recomputable for all time
Recomputation of recomputable experiments should be very easy
It should be easier to make experiments recomputable than not to
Tools and repositories can help recomputation become standard
The only way to ensure recomputability is to provide virtual machines
Runtime performance is a secondary issue
Ian Gent, Alexander Konovalov and Lars Kotthoff
Steven Crouch, Devasena Inupakutika
Recomputation.org
Zanadu.IO
Patrick Henaff and Claude Martini
http://zanadu.io/
Zanadu.IO
khmer-protocols: Effort to provide standard cheap assembly protocols for cloud machines.
Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis. Est. ~$150 per data set.
Open, versioned, forkable, citable.
Open Science
C. Titus Brown, @ctitusbrown
http://ged.cse.msu.edu/
http://ivory.idyll.org/
Explicitly a protocol: explicit steps, copy-paste, customizable, versioned; not black box.
No requirement for computational expertise or significant computational hardware.
~1-5 days to teach a bench biologist to use.
$100-150 of rental compute (cloud computing) for $1000 data set.
Now adding in quality control and internal validation steps.
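The cost figures quoted on these slides can be put in proportion with a little arithmetic. The Python sketch below is a rough illustration, not part of the khmer-protocols material: the function names are hypothetical, and the hourly rate assumes a single machine billed continuously for the whole run, which the slides do not state.

```python
def compute_share(data_cost, compute_cost):
    """Fraction of the total project cost that is spent on rented compute."""
    return compute_cost / (data_cost + compute_cost)


def implied_hourly_rate(total_cost, days):
    """Hourly rate implied by a total cost, ASSUMING one machine runs 24 h/day."""
    return total_cost / (days * 24)


# $100-150 of rental compute for a $1000 data set (figures from the slide).
low = compute_share(data_cost=1000, compute_cost=100)
high = compute_share(data_cost=1000, compute_cost=150)
print(f"compute is {low:.1%} to {high:.1%} of total cost")

# ~$150 per data set over ~2-6 days (figures from the slide).
print(f"implied rate: ${implied_hourly_rate(150, 6):.2f} "
      f"to ${implied_hourly_rate(150, 2):.2f} per hour")
```

Under these assumptions the compute bill is roughly a tenth of the cost of generating the data, which is the point the slide is making about cloud rental being cheap relative to the experiment itself.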
Some thoughts
Reproducible computing environment (Azure)
Publicly available data (MMETSP)
Open and versioned protocol
Provenance tracking and registration (Synapse?)
Computing Cancer
http://biomodelanalyzer.research.microsoft.com/
Troubling Trends in Scientific Software
Azure Machine Learning
Azure Machine Learning Awards, 15 Sep 2014
Azure Machine Learning - Sharing
www.tryfsharp.org
http://www.tryfsharp.org/create/kenji/WorldBankeDemo.fsx
NOTES FROM THE FIELD
http://www.rigb.org/docs/faraday_notebooks__induction_0.pdf
21st Century Log Notebooks
Verification versus Validation
Are you building it right?
Are you building the right thing?
Repeatability, Reliability
Reproducing my own results
Replicating other people's results
Reproducing other people's results
Repeatability, Replicability, Reproducibility, Reuse
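One simple way to make "reproducing my own results" checkable is to fingerprint the output of each run and compare fingerprints. The Python sketch below is a minimal illustration under stated assumptions: results are available as bytes (for example, the contents of an output file), and the function names are hypothetical rather than taken from any tool mentioned in the talk.

```python
import hashlib


def fingerprint(result: bytes) -> str:
    """Return a SHA-256 hex digest that identifies a result byte-for-byte."""
    return hashlib.sha256(result).hexdigest()


def same_result(run_a: bytes, run_b: bytes) -> bool:
    """True if two runs produced byte-identical output."""
    return fingerprint(run_a) == fingerprint(run_b)


# Hypothetical outputs from two runs of the same analysis.
first_run = b"mean=4.20\nstddev=0.31\n"
second_run = b"mean=4.20\nstddev=0.31\n"
changed_run = b"mean=4.21\nstddev=0.31\n"

print(same_result(first_run, second_run))   # True
print(same_result(first_run, changed_run))  # False
```

Byte-for-byte equality is a strict test of repeatability. Results that legitimately vary between runs, such as floating-point output from parallel code, would need a tolerance-based comparison instead.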
"reviewers have no time and no resources to reproduce data and to dig deeply into the presented work."
Life Sci VC: Academic bias & biotech failures: http://lifescivc.com/2011/03/academic-bias-biotech-failures/
Photo: leechantmcarthur, CC-BY
Enabling Science 2.0
www.azure4research.com
Use laptops & desktop computers
Overwhelmed by data
Finding analysis ever more difficult; sharing even harder
Microsoft Azure for Research
Azure Research Awards
General: next 15 Aug
Machine Learning: next 15 Sep
Microsoft Azure for Research Online Training
Webinars
Technical papers & walkthroughs
Research community engagements
www.azure4research.com
THANK YOU
www.azure4research.com
Microsoft Azure for Research Group
@azure4research
http://www.linkedin.com/groups/Windows-Azure-Research-6521580?home=&gid=6521580&trk=groups_most_popular-h-logo
https://twitter.com/Azure4Research