Date post: | 21-Jun-2015 |
Category: | Science |
Upload: | ravi-madduri |
AWS Government, Education, and Nonprofits Symposium
Washington, DC | June 24, 2014 - June 26, 2014
Science as a Service on AWS
Ravi K Madduri
Outline
• CI Mission and Introduction of Science as a Service
• Motivation
– Why is this important?
• Separation of concerns – Going far together
• Examples of Science as a Service
• Focus on Globus Genomics as a Success story
– Announcing Globus Genomics AWS Test Drive
Our Vision for a 21st Century Discovery Infrastructure
Provide more capability for people at lower cost by delivering Science as a service
www.globus.org
Two Broader Themes
• Productivity of Researchers
– Time spent performing administrative tasks vs. time spent doing science
– Reproducibility
• Sustainability of scientific software
– Reduction in funding for science
Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
42%
Presenting 21st Century Discovery Infrastructure
Going Far Together
Separation of Concerns
Our Science Stack
• Galaxy
– Interactive execution
– Creation, Execution, Sharing, Discovering Workflows
• Globus
– Data management
– Identity Management
• AWS
– EC2, EBS, S3, SNS, Spot, Route 53, CloudFormation
Layer labels on the slide, top to bottom: SaaS, PaaS, IaaS
Time-consuming tasks in science
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
Globus: Fast, reliable data transfer
Data Source
Data Destination
1. User initiates transfer request
2. Globus moves and syncs files
3. Globus notifies user
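The three steps on this slide (request, move and sync, notify) can be illustrated with a small local sketch. This is not the Globus service or its API: the function names `file_digest` and `transfer` are hypothetical, the "endpoints" are ordinary local directories, and checksum comparison is assumed here as one plausible way to decide which files are already in sync.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def transfer(source: Path, destination: Path) -> str:
    """Hypothetical stand-in for a transfer request between two endpoints.

    Step 1 is the call itself (the user initiates the request).
    """
    copied, skipped = [], []
    # Step 2: move and sync. Only files that are missing or differ are copied.
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        if dst_file.exists() and file_digest(dst_file) == file_digest(src_file):
            skipped.append(rel.as_posix())
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(rel.as_posix())
    # Step 3: notify. Here the "notification" is just a summary string.
    return f"transfer complete: {len(copied)} copied, {len(skipped)} already in sync"
```

Calling `transfer` a second time with unchanged files copies nothing, which is the property that makes a sync safe to retry after an interruption.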
Amazon S3 Endpoints
Globus: Sharing off existing systems
Data Source
1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus tracks shared files; no need to move files to cloud storage!
3. User B logs into Globus and accesses shared file
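The idea of sharing in place (tracking permissions rather than copying files) can be sketched as a small permission registry. This is an illustrative model only, not how Globus implements sharing: `ShareRegistry`, `share`, and `can_access` are hypothetical names, and the paths, users, and groups are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ShareRegistry:
    """Tracks who may access which path; the files themselves never move."""

    # Maps (path, principal) to the set of permissions granted.
    grants: dict = field(default_factory=dict)

    def share(self, path: str, principal: str, permissions: set) -> None:
        """Step 1: the owner grants permissions on a path to a user or group."""
        self.grants.setdefault((path, principal), set()).update(permissions)

    def can_access(self, path: str, user: str, groups: tuple = (),
                   permission: str = "read") -> bool:
        """Step 3: check whether a user, directly or via a group, has access."""
        principals = (user, *groups)
        return any(permission in self.grants.get((path, p), set())
                   for p in principals)


# Step 2 is implicit: only the grant is recorded, nothing is copied.
registry = ShareRegistry()
registry.share("/lab/run1.fastq", "user_b", {"read"})
registry.share("/lab/run1.fastq", "genomics_group", {"read", "write"})
```

Because the registry stores only paths and principals, the shared data stays on the system where it already lives.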
MyProxy
Globus: Federated identity
Globus Transfer
• >25,000 registered users; >150 daily
• 50 PB moved; >1B files
• 10x (or better) performance vs. scp
• 99.9% availability
• Entirely hosted on Amazon
Globus: Data publication service
[Diagram labels: Community; Collection; Policies (Metadata, Access Control, License, Storage, Curation Workflow); Dataset (Data, Metadata)]
Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
Globus Science Stack in Action
Data Management
• Data endpoints shown: Sequencing Centers, Public Data, Storage, Local Cluster/Cloud, Seq Center, Research Lab
• Globus provides a high-performance, fault-tolerant, secure file transfer service between all data endpoints
• Transfer methods shown: Globus SaaS; FTP, SCP, HTTP, others
Data Analysis
• Galaxy Based Workflow Management System
– Globus integrated within Galaxy
– Web-based UI
– Drag-drop workflow creation
– Easily modify workflows with new tools
• Galaxy Data Libraries
• Pipeline elements shown: Fastq, Ref Genome, Alignment, Variant Calling, Picard, GATK
• Globus Genomics on Amazon EC2
– Analytical tools are automatically run on the scalable compute resources when possible
Flexible, scalable, affordable
genomics analysis for all biologists
Globus Genomics
• Analysis tools profiled for optimal performance
• Workload management for parallel execution
• Resources provisioned on demand
• High performance, reliable data movement
• Seamless access using institution’s credentials
• Best practice + extensible, customizable pipelines
Globus Climate
Globus Materials
Cardiovascular Research
Proton Cancer Treatment
No. Histories | Execution Time (s) | No. Per Hour | On-demand Cost ($2.10) | Spot Cost ($0.50) |
1.5B | 570 | 6 | $35 | $9 |
1B | 445 | 8 | $27 | $7 |
0.5B | 283 | 12 | $18 | $5 |
0.25B | 170 | 21 | $10 | $2 |
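As a quick check on the table, the "No. Per Hour" column matches the number of whole runs that fit in 3,600 seconds at the listed execution time, and the two cost columns give the fraction saved by using Spot. The sketch below uses only the values from the slide; the function names are hypothetical, and reading "No. Per Hour" as whole runs per hour is an assumption that happens to match every row.

```python
# Rows from the slide: (histories, execution time in s, on-demand cost $, spot cost $)
ROWS = [
    ("1.5B", 570, 35, 9),
    ("1B", 445, 27, 7),
    ("0.5B", 283, 18, 5),
    ("0.25B", 170, 10, 2),
]


def runs_per_hour(execution_time_s: int) -> int:
    """Whole runs that fit in one hour at the given execution time."""
    return 3600 // execution_time_s


def spot_savings(on_demand_cost: float, spot_cost: float) -> float:
    """Fraction of the on-demand cost saved by paying the spot cost instead."""
    return 1 - spot_cost / on_demand_cost


for histories, seconds, on_demand, spot in ROWS:
    print(f"{histories:>6}: {runs_per_hour(seconds):>2} runs/hour, "
          f"{spot_savings(on_demand, spot):.0%} saved on Spot")
```

The sketch does not try to reproduce the dollar figures themselves, since the slide does not state how many instances were used or how usage was billed.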
Usage has been promising
[Chart: Instance Hours and Cost ($) by Date, January through June]
2.5 Million Core hours
Exome: 3 – 12hrs ~1hr
Whole Genome: ~22hrs ~10hrs
RNA-Seq: 1 – 12hrs ~minutes
Diversity of collaborations
Dobyns Lab
Cox Lab
Volchenboum Lab
Olopade Lab
Nagarajan Lab
Common misconceptions
• Cloud is expensive
• Cloud is insecure
• It takes a long time to move data and it's hard
• Cloud is about VMs and we got VMs
• My codes won’t run on the cloud
• Cloud is not HPC-enough
• Amazon will be acquired or will file for bankruptcy
– What happens to my data?
Possible Solutions
• Outreach
• Case studies with TCO for various domains and problem types
• Compliance
• Transparency in Billing
Our Vision for a 21st Century Discovery Infrastructure
To make advanced computational capabilities available to all researchers at substantially lower cost
We’re “all in” on cloud
Identify time-consuming activities amenable to automation, outsourcing and deliver as high-quality, low-touch SaaS
Extract common elements as a research data management automation PaaS
Leverage IaaS for reliability, economies of scale
Thank you to our sponsors!