Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Maciej Malawski1,2, Piotr Nowakowski1, Tomasz Gubała1, Marek Kasztelnik1, Marian Bubak1,2, Rafael Ferreira da Silva3, Ewa Deelman3, Jarek Nabrzyski4
NSFCloud Workshop on Experimental Support for Cloud Computing
December 11-12, 2014, Arlington, VA
AGH University of Science and Technology:
1 ACC Cyfronet AGH, ul. Nawojki 11, 30-950 Kraków, Poland
2 Department of Computer Science, al. Mickiewicza 30, 30-095 Kraków, Poland
3 University of Southern California, Information Sciences Institute, Marina del Rey, CA, USA
4 Center for Research Computing, University of Notre Dame, IN, USA
Research Challenges
• Execution of complex scientific applications on clouds: workflows and their ensembles
  • Pegasus Workflow Management System (OCI SI2-SSI #1148515)
  • HyperFlow Workflow Engine
• Platform for deployment and sharing of scientific applications on hybrid clouds
  • Atmosphere Framework
• Algorithms for scheduling, provisioning and cost optimization
  • Dynamic and static algorithms
  • Mathematical programming
  • Cloud Workflow Simulator
Research: The Atmosphere Framework
Hybrid cloud as a means of provisioning computing power for virtual experiments
[Figure: Atmosphere architecture]
Cloud Management Portlets (GUI host; provides end-user features and access options)
• Provide GUI elements which enable service developers and end users to interact with the Atmosphere platform and create/deploy services on the available cloud resources
Atmosphere Core Services Host
• Atmosphere Registry (AIR): user accounts, available cloud sites, services and templates
• Atmosphere Core, with a secure RESTful API (Cloud Facade):
  • Authentication and authorization logic
  • Communication with underlying computational clouds
  • Launching and monitoring service instances
  • Creating new service templates
  • Billing and accounting
  • Logging and administrative services
Cloud sites
• OpenStack cloud site at ACC CYFRONET AGH: head node, image store, worker nodes and a worker node with a large resource pool ("fat node"); 96 CPU cores, 184 GB RAM, 4 TB storage, private IP space
• VPH-Share cloud site at UNIVIE: head node, image store and a worker node with a large resource pool ("fat node"); 128 CPU cores, 256 GB RAM, 4 TB storage, private IP space
• Amazon Elastic Compute Cloud (EC2), European availability zone: API host, image store, worker nodes; massive (functionally limitless) hardware resource pool, public IP space
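To make the hybrid-cloud launch path concrete, here is a minimal sketch of how a registry could place service instances on the three cloud sites (ACC CYFRONET AGH, UNIVIE, EC2), filling the private sites first and bursting to EC2 only when they are full. The `CloudSite` and `Registry` classes, the first-fit policy and the use of `None` for EC2's unlimited pool are hypothetical illustrations, not Atmosphere's actual data model or Cloud Facade API; only the core and RAM figures come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CloudSite:
    """A cloud site as a hypothetical registry might record it."""
    name: str
    cores: Optional[int]   # None = treated as unlimited (public cloud)
    ram_gb: Optional[int]  # None = treated as unlimited (public cloud)
    used_cores: int = 0
    used_ram_gb: int = 0

    def fits(self, cores: int, ram_gb: int) -> bool:
        cores_ok = self.cores is None or self.used_cores + cores <= self.cores
        ram_ok = self.ram_gb is None or self.used_ram_gb + ram_gb <= self.ram_gb
        return cores_ok and ram_ok


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for the registry of sites and instances."""
    sites: List[CloudSite]
    instances: List[Tuple[str, str]] = field(default_factory=list)

    def launch(self, template: str, cores: int, ram_gb: int) -> str:
        # Try sites in the listed order (private sites first) and move on
        # to the next site only when the earlier ones lack free capacity.
        for site in self.sites:
            if site.fits(cores, ram_gb):
                site.used_cores += cores
                site.used_ram_gb += ram_gb
                self.instances.append((template, site.name))
                return site.name
        raise RuntimeError("no site has enough free capacity")
```

With `Registry([CloudSite("CYFRONET", 96, 184), CloudSite("UNIVIE", 128, 256), CloudSite("EC2", None, None)])`, four successive 64-core/128 GB launches land on CYFRONET, UNIVIE, UNIVIE and finally EC2. First-fit keeps the sketch short; a real platform would also weigh cost, image availability and data location.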
Research: Simulation and Scheduling of Large-Scale Scientific Workflows on IaaS Clouds
• Large-scale scientific workflows from Pegasus WMS
  • Workflows of 100,000 tasks
• Workflow ensembles
  • Schedule as many workflows as possible within a budget and deadline
• Uses the Cloud Workflow Simulator
[Figure; axes: Time, VM]
M. Malawski, G. Juve, E. Deelman, J. Nabrzyski: Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. SC 2012: 22
Research: Cost Optimization of Applications on Clouds
• Infrastructure model
  • Multiple compute and storage clouds
  • Heterogeneous instance types
• Application model
  • Bag of tasks
  • Multi-level workflows
• Modeling with AMPL and CMPL
  • Modeling languages for mathematical programming
• Cost optimization
  • Under deadline constraints
  • Mixed integer programming
  • Bonmin and CPLEX solvers
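For a bag of tasks, the cost-minimization problem can be sketched as choosing how many instances of each type to rent so that all tasks finish before the deadline at minimum cost. The toy version below replaces the mixed integer program and its solver with exhaustive enumeration, ignores data transfer and storage costs, and bills every instance for the full deadline; the function name, instance types and prices are hypothetical, while the workload (20000 tasks of 0.1 h, i.e. 6 minutes, on a 1 CCU machine) follows the figure caption on this slide.

```python
import itertools
import math
from typing import Dict, Optional, Tuple


def min_cost_plan(n_tasks: int, task_minutes: int, deadline_h: int,
                  types: Dict[str, Tuple[int, int]]
                  ) -> Optional[Tuple[int, Dict[str, int]]]:
    """Cheapest number of instances of each type that finishes a bag of tasks.

    types maps an instance-type name to (price in cents per hour, CCU speed).
    A task takes task_minutes on a 1-CCU instance; every instance runs (and is
    billed) for the whole deadline. Exhaustive search stands in for a MIP
    solver and is only practical for a handful of instance types.
    """
    names = list(types)
    # tasks one instance of each type can finish before the deadline
    capacity = {n: deadline_h * 60 * types[n][1] // task_minutes for n in names}
    bounds = [range(math.ceil(n_tasks / capacity[n]) + 1) if capacity[n] else range(1)
              for n in names]
    best = None
    for counts in itertools.product(*bounds):
        done = sum(c * capacity[n] for c, n in zip(counts, names))
        if done < n_tasks:
            continue  # infeasible: the deadline would be missed
        cost = deadline_h * sum(c * types[n][0] for c, n in zip(counts, names))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, counts)))
    return best
```

For example, `min_cost_plan(20000, 6, 10, {"small": (6, 1), "large": (20, 4)})` returns `(10000, {'small': 0, 'large': 50})`: the hypothetical large type costs 0.5 cents per task against 0.6 for the small one, so the optimum rents only large instances. A real MIP formulation scales to many more types and adds transfer and storage terms.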
M. Malawski, K. Figiela, J. Nabrzyski, Cost minimization for computational applications on hybrid cloud infrastructures, Future Generation Computer Systems, 29(7), 2013, pp. 1786-1794, http://dx.doi.org/10.1016/j.future.2013.01.004
M. Malawski, K. Figiela, M. Bubak, E. Deelman, J. Nabrzyski, Cost Optimization of Execution of Multi-level Deadline-Constrained Scientific Workflows on Clouds, PPAM 2013, pp. 251-260, http://dx.doi.org/10.1007/978-3-642-55224-3_24
[Figure: Cost ($) vs. time limit (hours) for 20000 tasks, 512 MiB input and 512 MiB output, task execution time 0.1 h @ 1 CCU machine; series: Rackspace instances, Rackspace and private instances, Amazon's and private instances, multiple providers]
[Figure: Multi-level workflow with Layers 1-5 (Layer 1: A; Layer 2: B, B, B, C; Layer 3: D; Layer 4: E; Layer 5: F), annotated with task execution times]
[Figure: Infrastructure model: private cloud (compute); Amazon (S3 storage; m1.small, m1.large, t1.micro, m2.xlarge compute); Rackspace (Cloud Files storage; rs.1gb, rs.2gb, rs.4gb, rs.16gb compute); application tasks with input and output]
Research: Cloud Performance Evaluation
• VM deployment times
• Virtualization overhead
• Evaluation of open source cloud stacks
  • Eucalyptus, OpenNebula, OpenStack
• Survey of European public cloud providers
• Performance evaluation of top cloud providers
  • EC2, Rackspace, SoftLayer
  • A grant from Amazon has been obtained
M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski, S. Varma, Evaluation of Cloud Providers for VPH Applications, poster at CCGrid2013, Delft, the Netherlands, pp.13-16, 2013
Table: IaaS provider evaluation (1 = criterion met, 0 = not met)
Columns: EEA = EEA zoning; jClouds = jClouds API support; BLOB = BLOB storage support; Hourly = per-hour instance billing; API = API access; Price = published price; Image = VM image import/export; RDB = relational DB support

#   IaaS provider        EEA  jClouds  BLOB  Hourly  API  Price  Image  RDB  Score
    Weight                20       20    10       5    5      5      3    2
1   Amazon AWS             1        1     1       1    1      1      0    1     27
2   Rackspace              1        1     1       1    1      1      0    1     27
3   SoftLayer              1        1     1       1    1      1      0    0     25
4   CloudSigma             1        1     0       1    1      1      1    0     18
5   ElasticHosts           1        1     0       1    1      1      1    0     18
6   Serverlove             1        1     0       1    1      1      1    0     18
7   GoGrid                 1        1     0       1    1      1      0    0     15
8   Terremark ecloud       1        1     0       1    1      0      1    0     13
9   RimuHosting            1        1     0       0    1      1      0    1     12
10  Stratogen              1        1     0       0    1      0      1    0      8
11  Bluelock               1        1     0       0    1      0      0    0      5
12  Fujitsu GCP            1        1     0       0    1      0      0    0      5
13  BitRefinery            0        0     0       0    0      1      0    1      0
14  BrightBox              1        0     0       1    1      1      1    0      0
15  BT Global Services     1        0     0       0    1      0      1    0      0
16  Carpathia Hosting      1        0     0       0    0      0      1    0      0
17  City Cloud             1        0     0       1    1      1      0    0      0
18  Claris Networks        0        0     0       1    0      0      0    0      0
19  Codero                 0        0     0       1    1      1      0    0      0
20  CSC                    1        0     0       0    0      0      1    0      0
21  Datapipe               1        0     0       1    1      0      0    0      0
22  e24cloud               1        0     0       1    0      1      0    0      0
23  eApps                  0        0     0       0    0      1      0    0      0
24  FlexiScale             1        0     0       1    1      1      1    0      0
25  Google GCE             1        0     1       1    1      1      0    1      0
26  Green House Data       0        0     0       0    1      0      1    0      0
27  Hosting.com            0        0     0       0    0      1      1    1      0
28  HP Cloud               0        1     1       1    1      1      1    1      0
29  IBM SmartCloud         0        0     1       1    1      1      0    1      0
30  IIJ GIO                0        0     0       0    0      0      0    0      0
31  iland cloud            1        0     0       1    0      1      1    0      0
32  Internap               0        0     1       1    1      1      0    0      0
33  Joyent                 0        0     0       1    1      1      0    0      0
34  LunaCloud              1        0     1       1    1      1      0    0      0
35  Oktawave               1        0     1       1    1      1      0    1      0
36  Openhosting.co.uk      1        0     0       0    0      1      0    0      0
37  Openhosting.com        0        1     0       1    1      1      1    0      0
38  OpSource               1        0     1       1    1      1      1    0      0
39  ProfitBricks           1        0     0       1    1      1      0    0      0
40  Qube                   1        0     0       0    0      1      0    0      0
41  ReliaCloud             0        0     0       0    0      0      0    0      0
42  SaavisDirect           0        0     1       1    0      1      0    0      0
43  SkaliCloud             0        1     0       1    1      1      1    0      0
44  Teklinks               0        0     0       0    0      0      0    0      0
45  Terremark vcloud       0        1     0       1    1      1      1    0      0
46  Tier 3                 0        0     0       0    1      0      0    0      0
47  Umbee                  1        0     0       1    1      1      1    0      0
48  VPS.net                1        0     0       0    1      1      0    0      0
49  Windows Azure          1        0     1       1    1      1      0    1      0
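The Score column appears consistent with the following rule, which we inferred by checking it against every row rather than reading it from the source: a provider lacking EEA zoning or jClouds API support scores 0, and otherwise the score is the sum of the weights of the remaining six criteria it meets (maximum 30). The sketch below encodes that inferred rule; the names `COLUMNS`, `MANDATORY`, `WEIGHTS` and `score` are hypothetical.

```python
from typing import Dict, Sequence

# Column order of the provider table.
COLUMNS = ("eea", "jclouds", "blob", "hourly", "api", "price", "image", "rdb")
# EEA zoning and jClouds support behave as mandatory filters (inferred).
MANDATORY = ("eea", "jclouds")
# Weights of the remaining criteria, taken from the Weight row.
WEIGHTS: Dict[str, int] = {"blob": 10, "hourly": 5, "api": 5,
                           "price": 5, "image": 3, "rdb": 2}


def score(bits: Sequence[int]) -> int:
    """Score one table row given its eight 0/1 criterion values."""
    row = dict(zip(COLUMNS, bits))
    if not all(row[c] for c in MANDATORY):
        return 0  # a missing mandatory criterion disqualifies the provider
    return sum(w for c, w in WEIGHTS.items() if row[c])
```

For instance, the Amazon AWS row `[1, 1, 1, 1, 1, 1, 0, 1]` scores 10 + 5 + 5 + 5 + 2 = 27, while HP Cloud scores 0 despite meeting seven criteria, because it lacks EEA zoning.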
Experiment: Evaluation of autoscaling techniques for Atmosphere cloud platform
• Challenges
  • Requires repeated tests under varying workloads
  • Experiments in an isolated environment
• Goals
  • Perform autoscaling based on complex event processing and a time series database
  • Build an isolated environment on NSFCloud
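As an illustration of the kind of rule such an autoscaler might evaluate, the sketch below keeps a sliding time window of CPU-utilization samples and scales out or in when the window average crosses a threshold. An in-memory `deque` stands in for both the complex event processing engine and the time series database, and the class name, thresholds, window length and VM limits are hypothetical, not Atmosphere settings.

```python
from collections import deque


class AutoscalingRule:
    """Sliding-window scaling rule over a stream of CPU-utilization samples.

    A toy stand-in for a complex-event-processing query over a time series
    database; all thresholds and limits are hypothetical.
    """

    def __init__(self, window_s=300, high=0.8, low=0.2, min_vms=1, max_vms=10):
        self.window_s, self.high, self.low = window_s, high, low
        self.min_vms, self.max_vms = min_vms, max_vms
        self.samples = deque()  # (timestamp_s, utilization in [0, 1])

    def observe(self, ts, utilization, current_vms):
        """Record one sample and return the desired number of VMs."""
        self.samples.append((ts, utilization))
        # drop samples that have fallen out of the time window
        while self.samples and self.samples[0][0] <= ts - self.window_s:
            self.samples.popleft()
        avg = sum(u for _, u in self.samples) / len(self.samples)
        if avg > self.high:
            return min(current_vms + 1, self.max_vms)
        if avg < self.low:
            return max(current_vms - 1, self.min_vms)
        return current_vms
```

Averaging over a window rather than reacting to single samples damps short load spikes; replaying recorded workloads through `observe` is one way to run the repeated tests the experiment calls for.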
Experiment: Scalability of Scientific Workflows in HyperFlow Model
• Challenges
  • Issues with data transfers and data locality
  • Calibrating the performance models of applications
• Goals
  • Execute large-scale deployments on multi-site NSFCloud facilities
  • Assess the impact of network latency and bandwidth limitations
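The trade-off between data locality and faster remote resources can be sketched with the usual first-order transfer model: time = round trips × RTT + size / bandwidth. The functions below use that model to pick the site with the earliest estimated finish time for one task; the names, units (megabytes, Mbit/s, milliseconds) and the single-round-trip default are assumptions made for illustration, not HyperFlow's scheduling logic.

```python
from typing import Dict, Optional, Tuple


def transfer_time_s(size_mb: float, bandwidth_mbit_s: float, rtt_ms: float,
                    round_trips: int = 1) -> float:
    """Latency term (round trips x RTT) plus serialization term (size / bandwidth)."""
    return round_trips * rtt_ms / 1000 + size_mb * 8 / bandwidth_mbit_s


def best_site(input_mb: float,
              candidates: Dict[str, Tuple[float, Optional[float], float]]) -> str:
    """Pick the site with the earliest estimated finish time.

    candidates maps a site name to (runtime_s, bandwidth_mbit_s, rtt_ms) for
    the link from the site holding the input; bandwidth None = data is local.
    """
    def finish(item):
        runtime_s, bandwidth, rtt_ms = item[1]
        if bandwidth is None:
            return runtime_s
        return runtime_s + transfer_time_s(input_mb, bandwidth, rtt_ms)
    return min(candidates.items(), key=finish)[0]
```

With a 1000 MB input, a remote site that runs the task in 40 s instead of 100 s loses over a 100 Mbit/s link (about 80 s of transfer) but wins over 1000 Mbit/s (about 8 s), which is why measured latency and bandwidth are needed to calibrate such models.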
[Figure: HyperFlow as a PaaSage application. Components: workflow generator, workflow CAMEL generator, workflow graph, CAMEL model, HyperFlow engine, task scheduler (ready tasks, scheduled tasks), Redis, RabbitMQ job queue, executors running workflow components on cloud VMs, results, monitoring metrics, and the PaaSage platform (Upperware, Executionware), which deploys and scales the infrastructure]
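The job-queue pattern between the HyperFlow engine and its executors can be sketched in-process: a producer puts ready tasks on a shared queue, executor threads take tasks, run them and push results back. Python's `queue.Queue` and threads stand in here for RabbitMQ and executors on separate VMs, and the function name `run_with_executors` and the sentinel-based shutdown are hypothetical choices for this sketch.

```python
import queue
import threading
from typing import Any, Callable, Dict, Iterable


def run_with_executors(tasks: Iterable[Any], n_executors: int,
                       work: Callable[[Any], Any]) -> Dict[Any, Any]:
    """Dispatch tasks through a shared job queue to a pool of executors."""
    jobs: "queue.Queue" = queue.Queue()
    results: "queue.Queue" = queue.Queue()

    def executor() -> None:
        while True:
            task = jobs.get()
            if task is None:  # sentinel: no more work for this executor
                jobs.task_done()
                return
            results.put((task, work(task)))
            jobs.task_done()

    threads = [threading.Thread(target=executor) for _ in range(n_executors)]
    for t in threads:
        t.start()
    for task in tasks:            # scheduler side: enqueue ready tasks
        jobs.put(task)
    for _ in threads:             # one sentinel per executor, after all tasks
        jobs.put(None)
    for t in threads:
        t.join()
    out = {}
    while not results.empty():    # safe: all executors have finished
        task, value = results.get()
        out[task] = value
    return out
```

Decoupling the scheduler from the executors through a queue is what lets the number of executor VMs be scaled independently of the engine; a broker such as RabbitMQ adds persistence and cross-host delivery that this in-process version lacks.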
Experiment: Influence of Variability of Clouds on the Quality of Algorithms
• Challenges
  • Static scheduling methods assume that estimates of task runtimes are available
  • Runtime variations and other uncertainties influence the actual execution
• Goals
  • Use a large-scale experimental testbed to investigate the influence of these uncertainties
  • Develop new models to mitigate the negative effects of uncertainties
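The effect of runtime estimate errors on a static plan can be sketched with a small Monte Carlo experiment: fix the assignment of tasks to VMs using the estimates, perturb each actual runtime by a uniform relative error, and compare the resulting makespan with the deadline. This is a simplified illustration that ignores task dependencies; it is not the DPDS, WA-DPDS or SPSS algorithms from the plot, and the function names and uniform error model are assumptions.

```python
import random
from typing import List


def makespan_ratio(plan: List[List[float]], deadline_h: float, error: float,
                   seed: int = 0) -> float:
    """Makespan / deadline of a fixed static plan under runtime noise.

    plan lists, per VM, the estimated runtimes (hours) of the tasks that VM
    runs back to back (dependencies are ignored). Each actual runtime is the
    estimate scaled by (1 + U(-error, +error)).
    """
    rng = random.Random(seed)
    makespan = max(
        sum(est * (1 + rng.uniform(-error, error)) for est in vm) for vm in plan
    )
    return makespan / deadline_h


def miss_rate(plan: List[List[float]], deadline_h: float, error: float,
              runs: int = 1000) -> float:
    """Fraction of simulated runs whose makespan exceeds the deadline."""
    misses = sum(makespan_ratio(plan, deadline_h, error, seed=s) > 1.0
                 for s in range(runs))
    return misses / runs
```

A plan that exactly fills the deadline has a ratio of 1.0 with perfect estimates, but any error pushes some runs above 1.0; sweeping `error` over the plotted levels (0% to 50%) shows how quickly a tight static plan starts missing its deadline.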
[Figure: Makespan / Deadline (0.0-2.0) of DPDS, WA-DPDS and SPSS for runtime estimate errors of 0%, 1%, 2%, 5%, 10%, 20% and 50%]
Experiment: Interoperation of Cloud Testbed of PL-Grid Infrastructure with NSFCloud
• PL-Grid
  • One of the largest national grid infrastructures in Europe (2500+ users, 500+ teams)
  • Cloud testbed based on OpenNebula and OpenStack
• Goals
  • Possibility to run transatlantic and global-scale experiments
  • Evaluation of the impact of wide-area, high-latency networks
Thank you.
DICE Team at AGH: http://dice.cyfronet.pl
Center for Research Computing at Notre Dame: https://crc.nd.edu
Pegasus Team at USC: http://pegasus.isi.edu