Ocean Observatories Initiative
OOI Cyberinfrastructure Life Cycle Objectives Review, January 8-9, 2013
R3 Life Cycle Objective Review for Common Execution Infrastructure (CEI) Subsystem
Kate Keahey, David LaBissoniere, Patrick Armstrong, Pierre Riteau
Subsystem Purpose
• Allow OOI applications and system to
  – Provide Highly Available (HA) services
  – Scale to demand
• Enact OOI deployment policies in an elastic environment
• Provide a deployment foundation for OOI CI
Overview
• CEI Overview
• R3 Scope
• Cloud Provider Options
• Risks
• Elaboration Plan
Resources for HA and Scaling
EPU Management: monitor and regulate set properties based on system-specific and application-specific metrics
  – Cloud resources are available on demand, but any particular resource may fail at any time
  – Applications/processes can absorb new resources
  – Applications/processes can tolerate failures
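The "monitor and regulate" idea above can be sketched as a small reconciliation step. This is a minimal illustration, not the CEI implementation: the `ScalingPolicy` class, its fields, the queue-length metric, and the function names are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy: bound the backlog each worker has to handle."""
    min_workers: int = 1
    max_workers: int = 10
    target_queue_per_worker: int = 5


def desired_workers(queue_length: int, policy: ScalingPolicy) -> int:
    """Return how many workers the policy asks for at the current load."""
    # Ceiling division: enough workers that none exceeds the target backlog.
    needed = -(-queue_length // policy.target_queue_per_worker)
    return max(policy.min_workers, min(policy.max_workers, needed))


def reconcile(healthy_workers: int, queue_length: int, policy: ScalingPolicy) -> int:
    """Return the change in worker count: positive to launch, negative to terminate.

    Failed workers are simply absent from healthy_workers, so replacing them
    and scaling to demand fall out of the same comparison.
    """
    return desired_workers(queue_length, policy) - healthy_workers
```

For example, with the default policy, 3 healthy workers and 23 queued items, `reconcile(3, 23, ScalingPolicy())` asks for 2 more workers. Treating failure recovery and scaling as one comparison only works if applications can absorb new resources and tolerate failures, as the slide assumes.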
Elastic Processing Unit (EPU) Management
[Diagram. Components shown: EPU Management, Decision Engine, Provisioner, DTRS, CB, IaaS (create instance); EE instances ioncore 1.2, ioncore 1.3, and matlab 6.1, each with a context-agent and an ou-agent. Legend: AMQP, Other.]
Making the EPU HA
[Diagram. Components shown: Bootstrap EPU, Dedicated DE, Provisioner/DTRS, EPU Workers (with ou-agents), cloudinit.d, IaaS (create instance). Legend: AMQP, Other.]
Managing Processes: Creating a Process I
[Diagram. Components shown: Process Definition Registry, Process Instance Registry, Process Dispatcher, Decision Engine, EE type A instance (ee-agent). Labeled interactions: request to activate process X, lookup, launch, enter. Legend: AMQP, Other.]
Managing Processes: Creating a Process II
[Diagram. Components shown: Process Definition Registry, Process Instance Registry, Process Dispatcher, Decision Engine, EPU Management, Provisioner/DTRS, IaaS, EE type A instance (ee-agent). Labeled interactions: request to activate process X, lookup, launch, enter, request instance, create instance. Legend: AMQP, Other.]
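The two process-creation slides can be read as one flow: look up the process definition, enter a record in the instance registry, launch on an existing Execution Engine, and request a new instance only when none has room. The sketch below is one plausible reading of the diagram labels, not the actual Process Dispatcher. The class names, the `slots` capacity model, the state strings, and the `request_instance` callback (standing in for EPU Management) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionEngine:
    """Stand-in for an EE instance reached through its ee-agent."""
    ee_type: str
    slots: int  # hypothetical capacity: how many processes it can host
    running: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.running) < self.slots


class Dispatcher:
    def __init__(self, definitions, request_instance):
        self.definitions = definitions            # process name -> required EE type
        self.instances = {}                       # process name -> state (hypothetical states)
        self.engines = []                         # EE instances known to the dispatcher
        self.request_instance = request_instance  # callback standing in for EPU Management

    def activate(self, name: str) -> str:
        ee_type = self.definitions[name]          # "lookup" in the definition registry
        self.instances[name] = "REQUESTED"        # "enter" in the instance registry
        engine = next(
            (e for e in self.engines if e.ee_type == ee_type and e.has_capacity()),
            None,
        )
        if engine is None:
            # No engine of this type has room: "request instance".
            engine = self.request_instance(ee_type)
            self.engines.append(engine)
        engine.running.append(name)               # "launch" through the ee-agent
        self.instances[name] = "RUNNING"
        return self.instances[name]
```

With one single-slot engine registered, the first `activate` call launches in place (slide I) and the second triggers `request_instance` (slide II).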
Managing Processes: Inside an Execution Engine
[Diagram. Components shown: EE type A instance (context-agent, ee-agent, ou-agent), supervisord, CC instances, Kepler, process (adapter) 1, Process Dispatcher, EPU Management, Package Server. Data-flow labels: datastream, subscription, result. Arrows are annotated with the operation sets C, M, CMR, CMK, and CMKO. Legend: AMQP, Other.]
Key: C – create, M – monitor, R – restart, K – kill, O – I/O
Adventures in Availability
• Time to repair (TTR)
  – Diagnosis
  – Time to scale (TTS)
    • PENDING (request)
    • STARTED (deployment)
    • RUNNING (contextualization)
A = MTBF / (MTBF + MTTR)
where MTBF is the mean time between failures and MTTR is the mean time to repair
[Figure: TTS, preliminary results for 2,000 VMs provisioned on AWS EC2]
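The availability formula on this slide is easy to check numerically. The snippet below is a small sketch. The function name and the sample figures are illustrative assumptions, not measurements from the review.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR), with both times in the same unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative numbers only: a resource that fails on average every 99 hours.
slow_repair = availability(99.0, 1.0)   # one hour to diagnose and replace
fast_repair = availability(99.0, 0.5)   # repair time cut in half
print(f"{slow_repair:.4f} {fast_repair:.4f}")
```

Because MTBF is largely outside the system's control on cloud resources, shortening repair time (diagnosis plus time to scale) is the lever that raises A.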
CEI R3 Proposed Scope
• Robustness
  – Upgrade mechanisms, code refactoring for maintainability, more unit tests, scale and stress testing, documentation, packaging, support, etc.
• Integration
  – Component interaction update, tight inter-component integration
• New features
  – Process and resource management
    • Process activation and validation
    • New execution site registration
  – Integration with National Infrastructure
    • Framework for integration of academic cloud providers such as XSEDE and OSG
  – Support for a new cloud provider
  – SLA management
Cloud Provider Options
• Windows Azure
  – Initially PaaS, now offers IaaS with Windows & Linux
  – Pros: 8 regions in North America, Europe, and Asia
  – Cons: no libcloud support, still in preview mode (no SLA yet)
• Rackspace
  – Pros: based on OpenStack, libcloud support
  – Cons: only 3 regions with 2 in the USA
• Google Compute Engine
  – Pros: targets high performance/throughput clusters, advertises 50% more CPU per $ compared to EC2
  – Cons: still in limited preview, no libcloud support, few regions for now
Risks
• Scope
  – Mitigation: scope prioritization with the architects
• Handoff process
R3 Elaboration 1
• Theme: focus on support and high-risk elements
• Support activities (includes R2 Transition and R2.1 features)
• Assist with Kepler integration
• Design and prototype process package download and installation scheme (with COI)
• Initial prototype of Chef server integration (upgrades)
• Integration: eliminate resource registry mirroring in Process Dispatcher
R3 Elaboration 2
• Theme: continue to support existing deployments and fix issues while emphasizing integration.
• Continued support activities
• Assist with Kepler Integration
• Integration: pyon capabilities in EPUM, DTRS, and Provisioner
• New execution site registration
CEI R3 Team
• CEI Developer: Patrick Armstrong, University of Chicago (location: Victoria, Canada)
• CEI Developer: Pierre Riteau, University of Chicago (location: Oxford, England) (part-time)
• CEI Senior Developer: David LaBissoniere, University of Chicago (location: Chicago, IL)
• CEI Designer: Kate Keahey, Argonne National Lab, University of Chicago
Questions?