Supporting Time Critical Events Processing in Grids and Clouds
Qian Zhu
Supporting Time-Critical Event Processing in Grids and Clouds
Qian Zhu
Advisor: Professor Gagan Agrawal
Adaptive Applications
• Examples: earthquake modeling, coastline forecasting, medical systems
• Time-Critical Event Processing
- Compute-intensive
- Time constraints
- Application-specific flexibility
- Application Quality of Service (QoS)
Adaptive Applications (Cont’d)
Adaptive applications that perform time-critical event processing
• Application-specific flexibility: parameter adaptation
• Trade-off between application QoS and execution time
HPC Applications (compute-intensive)
• Aim to maximize performance
• Do not consider adaptation
Deadline-driven Scheduling
• Not very compute-intensive
Motivating Application - Real-time Volume Rendering
• Interactively create a 2D projection of a large time-varying 3D data set
• Application Flexibility
- Error tolerance (image quality)
- Image size
• Benefit definition (QoS metric)
- To view the rendered images from as many angles as possible
- For each view angle, display the image with the best resolution at the desired image size
Motivating Application - Real-time Volume Rendering
• Example
[Figure: rendered images (a) and (b)]
• How well can we do given 1 minute as the time constraint?
Note: RMI data set from Lawrence Livermore National Laboratory
Motivating Application - Great Lakes Nowcasting and Forecasting
• Monitor meteorological conditions of Lake Erie for nowcasting and forecasting
[Figure: Lake Erie model grid, 1 km x 1 km]
Motivating Application - Great Lakes Nowcasting and Forecasting
• Application flexibility
- Resolution of grids
- Internal time step
- External time step
• Benefit definition (QoS metric)
- To predict the water level first
- To predict as much other meteorological information as possible
• How much meteorological information can we predict given 1 hour?
Time Critical Event Processing
•Grid Computing Environment
- Geographically distributed
- Heterogeneous
- Unreliable
•Cloud Computing Environment
- On-demand resource availability
- Pay-as-you-go pricing model
Goal: Maximize the application benefit (QoS) while satisfying the pre-specified time constraints
Dissertation Overview
• Adaptive applications that perform time-critical event processing
• Parameter Adaptation
• Grid: Resource Allocation, Fault Tolerance
• Cloud: Resource Provisioning, Power Management
• Areas: scientific computing, mobile applications, parallel computing
Challenges -- Parameter Adaptation
• A Large Number of Parameters to be Adapted
- Discrete and continuous
- Correlations between parameters
•No Knowledge about the Impact of Such Parameters on Execution Time or Benefit
•Pre-specified Time Constraints
- Low adaptation overhead
Challenges -- Resource Allocation
• Grid/Cloud: Heterogeneous and Dynamic Resources
• Resource Allocation Impacts Application Benefit
• A 20-min event from the Volume Rendering application
[Figure: Benefit Value vs. Resource Configuration]
• Different CPU, Memory and/or Bandwidth Usage
- Different application components
- Different values of adaptive parameters
Challenges -- Fault Tolerance
• Grid Resources
- Heterogeneous and Unreliable
•Time Constraints
•Trade-off between Resource Efficiency and Reliability
•Effective, Low-overhead Failure Recovery
Challenges -- Resource Budget Constraints
• Elastic Cloud Computing
- Pay-as-you-go model
•Satisfy the Application QoS with the Minimum Resource Cost
•Dynamic Resource Provisioning
- Dynamically varying application workloads
- Resource budget
Contributions
• Parameter adaptation
- Q. Zhu and G. Agrawal (ICAC2008)
• Resource allocation
- Q. Zhu and G. Agrawal (IPDPS2009)
• Fault tolerance
- Q. Zhu and G. Agrawal (SC2009)
• Budget constrained resource provisioning
- Q. Zhu and G. Agrawal (HPDC 2010)
• Power-aware consolidation of workflows
- Q. Zhu, J. Zhu and G. Agrawal (submitted to SC2010)
Goal: Maximize the application benefit (QoS) while satisfying the pre-specified time constraints
Roadmap
• Motivation and Introduction
• Parameter Adaptation in the Grid Environment
- Application model
- Autonomic adaptation algorithm
- Resource allocation in time-critical event processing
• Budget Constrained Resource Provisioning
• Power-aware Consolidation of Workflows
• Future Work
• Conclusion
Contributions
• Develop an Autonomic Adaptation Algorithm
- Effectively adjust the parameters
- Low overhead
• Design an Adaptive Middleware that Supports Easy Deployment of Applications in Grid Environments
• Consider Heterogeneous Resources
- Efficiency value definition
- Efficiency value estimation
- Greedy-based scheduling algorithm
Application and Environment Model
• Volume Rendering application
[Figure: service components: Temporal Tree Construction Service, WSTP Tree Construction Service, Compression Service, Decompression Service, Unit Image Rendering Service, Image Composition Service]
Algorithm Overview
Goal: Maximize the application benefit (QoS) while satisfying the pre-specified time constraints
Normal Processing Phase (collect data)
[Figure: input data processed in intervals separated by checkpoints 1, 2, ...]
• Train system model
• Learn the relationship between the values of adaptive parameters and execution time, application benefit
Event Handling Phase (adjust parameters)
[Figure: input data processed in intervals separated by checkpoints 1, 2, ..., up to the time constraint]
• Apply the trained system model for parameter adaptation
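The two phases above can be sketched as follows. This is a minimal illustration that assumes a single adaptive parameter and a linear relationship between that parameter and both the per-interval execution time and the benefit. The helper names (`fit_line`, `train`, `choose_param`) are hypothetical, and the least-squares line stands in for whatever system model the middleware actually trains.

```python
# Hypothetical sketch: one adaptive parameter, linear models for the
# per-interval execution time and benefit as functions of that parameter.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def train(samples):
    """Normal processing phase: samples are (param, time, benefit) tuples
    recorded at checkpoints."""
    params = [p for p, _, _ in samples]
    time_model = fit_line(params, [t for _, t, _ in samples])
    benefit_model = fit_line(params, [b for _, _, b in samples])
    return time_model, benefit_model


def choose_param(candidates, time_model, benefit_model, time_left):
    """Event handling phase: return the candidate with the highest predicted
    benefit whose predicted execution time fits in the remaining time,
    or None if no candidate fits."""
    best, best_benefit = None, float("-inf")
    for p in candidates:
        t = time_model[0] + time_model[1] * p
        b = benefit_model[0] + benefit_model[1] * p
        if t <= time_left and b > best_benefit:
            best, best_benefit = p, b
    return best


# Usage: a larger error tolerance p makes an interval faster but less valuable.
samples = [(p, 10 - 2 * p, 100 - 20 * p) for p in range(5)]
time_model, benefit_model = train(samples)
print(choose_param(range(5), time_model, benefit_model, time_left=5))
```

With 5 time units left, only p = 3 and p = 4 fit. The sketch picks p = 3 because it predicts the larger benefit.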
Mapping Parameter Adaptation to an Optimal Control Model
• Adaptation Process
• Control Policy
- Policy with learning: reinforcement learning
[Figure: feedback loop of Controller, Application and Performance Measure, with signals u(k), w(k) and D(k)]
Resource Allocation
•Heterogeneous and Dynamic Resources
•Different CPU, Memory and/or Bandwidth Usage
- Different service components
- Different values of adaptive service parameters
•Schedule the Service Components to Maximize the Benefit Function Within the Time Constraint
Efficiency Value
• Assign to and to yields the maximum benefit
• Our definition of efficiency value captures the suitability of different nodes for different services
• Definition
- Represents how efficiently a service executes on a node
- Considers both application benefit and execution time
• Estimation
- Based on fuzzy logic
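The following is a minimal sketch of greedy scheduling by efficiency value. It assumes that a service's efficiency on a node is its estimated benefit divided by its estimated execution time. That definition is an assumption: the slide only states that both quantities are considered. The estimates are taken as given, so the fuzzy-logic estimation step is omitted. Function names are hypothetical.

```python
# Hypothetical sketch of greedy service-to-node scheduling by efficiency
# value. Assumption: efficiency = estimated benefit / estimated time.

def efficiency(benefit, time):
    return benefit / time


def greedy_schedule(estimates):
    """estimates maps (service, node) -> (estimated benefit, estimated time)
    and must cover every service/node pair. Each node hosts at most one
    service; the pair with the highest efficiency is fixed first."""
    services = {s for s, _ in estimates}
    nodes = {n for _, n in estimates}
    assignment = {}
    while services and nodes:
        service, node = max(
            (pair for pair in estimates
             if pair[0] in services and pair[1] in nodes),
            key=lambda pair: efficiency(*estimates[pair]),
        )
        assignment[service] = node
        services.discard(service)
        nodes.discard(node)
    return assignment


estimates = {
    ("render", "A"): (10.0, 2.0),   # efficiency 5.0
    ("render", "B"): (10.0, 5.0),   # efficiency 2.0
    ("compose", "A"): (4.0, 1.0),   # efficiency 4.0
    ("compose", "B"): (4.0, 2.0),   # efficiency 2.0
}
print(greedy_schedule(estimates))
```

A greedy pass is cheap, which matters under a time constraint. It is not guaranteed to find the benefit-maximizing assignment in general.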
Roadmap
• Motivation and Introduction
• Parameter Adaptation in the Grid Environment
• Budget Constrained Resource Provisioning
- Background: Cloud environment
- Dynamic resource provisioning algorithm
- Framework Design
- Experimental evaluation
• Power-aware Consolidation of Workflows
• Future Work
• Conclusion
Background: Cloud Environment
• Amazon EC2, Google AppEngine, Microsoft Azure, Magellan ...
• Utility-like Computing
- On-demand scalability of resources
• Resource Cost
- Pricing model: Pay-as-you-go
• Virtualization
- Resource sharing
- Customized deployment and easy migration
- Assumption: fine-grained resource allocation (i.e., CPU and memory allocations can be changed on the fly) and fine-grained pricing
Background: Pricing Model
• Charged Fees
- Base price
- Transfer fee
• Linear Pricing Model
• Exponential Pricing Model
• Notation
- Base price charged for the smallest amount of CPU cycles
- Transfer fee for each CPU allocation change
- CPU cycles at the ith allocation
- Time duration at the ith allocation
- Number of CPU cycle allocations
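The sketch below charges a sequence of CPU allocations using the quantities listed above. Several details are assumptions, not the pricing model used in this work:

- the specific linear and exponential forms;
- the growth factor of 2;
- measuring CPU in multiples of the smallest allocation;
- charging the transfer fee once per change after the first allocation.

```python
# Hypothetical pricing sketch. Each allocation is (cpu_units, duration),
# where cpu_units is measured in multiples of the smallest CPU allocation.

def linear_price(cpu_units):
    """Assumed linear form: price grows proportionally with CPU units."""
    return cpu_units


def exponential_price(cpu_units, growth=2.0):
    """Assumed exponential form: each extra unit multiplies the price."""
    return growth ** (cpu_units - 1)


def total_cost(allocations, base_price, transfer_fee, price_fn):
    """Usage charge for every allocation, plus a transfer fee for each
    change of allocation (n allocations imply n - 1 changes)."""
    usage = sum(base_price * price_fn(c) * t for c, t in allocations)
    changes = max(len(allocations) - 1, 0)
    return usage + transfer_fee * changes


allocations = [(1, 10), (2, 5), (4, 5)]
print(total_cost(allocations, 0.1, 0.5, linear_price))
print(total_cost(allocations, 0.1, 0.5, exponential_price))
```

The same allocation sequence costs more under the exponential form. This illustrates why large CPU allocations are discouraged when pricing grows faster than linearly.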
Problem Description
• Adaptive Applications
- Adaptive parameters
- Benefit
- Time constraint
• Cloud Computing Environment
- Resource budget
- Overprovisioning/Underprovisioning
• Goal
- Maximize the application benefit while satisfying the time constraints and resource budget
Contributions
•Dynamic Resource Provisioning Algorithm
- Based on multi-input-multi-output feedback control model
- Optimization to reduce provisioning overhead
•Adaptive, SOA-Oriented Framework
- Support dynamic virtual CPU and memory allocation based on application requirements
Approach Overview
[Figure: Dynamic Resource Provisioning (feedback control) passes changes to the adaptive parameters to the Resource Model (with optimization), which produces changes to CPU/memory allocations]
• Resource Provisioning Controller
- Multi-input-multi-output (MIMO) feedback control model
- Models the relationship between adaptive parameters and performance metrics
- Control policy: reinforcement learning
• Resource Model
- Maps changes in adaptive parameters to changes in CPU/memory allocations
- Optimization: avoid frequent resource changes
Resource Provisioning Controller
• Performance Metrics
- Satisfy time constraints and resource budget
• Multi-Input-Multi-Output Model
- Relationship between adaptive parameters and performance metrics
• Control Policy
- Decide how to change values of the adaptive parameters
Control Model Formulation -- Performance Metrics
• Performance Metrics
- Processing progress: ratio between the currently obtained application benefit and the elapsed execution time
- Performance/cost ratio: ratio between the currently obtained application benefit and the cost of the resources that have been assigned
•Notation
- Application benefit obtained at time step k
- Elapsed execution time at time step k
- Resource cost at time step k
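The two metrics can be written with assumed symbols for the three quantities in the notation above: B(k) for the application benefit, T(k) for the elapsed execution time and C(k) for the resource cost at time step k. These symbol names are hypothetical.

```latex
\mathrm{Progress}(k) = \frac{B(k)}{T(k)}, \qquad
\mathrm{PerfCost}(k) = \frac{B(k)}{C(k)}
```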
Control Model Formulation -- Multi-Input-Multi-Output Model
• Auto-Regressive-Moving-Average with Exogenous Inputs (ARMAX)
- Second-order model
- Inputs: the ith adaptive parameter at time step k
- The model coefficients are updated at the end of every interval
- Model terms: previously observed performance metrics; previous and current values of the adaptive parameters
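The sketch below shows the model structure described above for a single performance metric. It has a second-order autoregressive part on the two previous observations, plus the current and previous values of each adaptive parameter. It is hedged in two ways:

- The moving-average noise term is left out, so this is ARX rather than full ARMAX.
- The coefficients are updated each interval with normalized least-mean-squares. This is an assumed stand-in for whatever estimator the work uses.

In the MIMO setting, one such model would be kept per performance metric. The true coefficients in `simulate` are made up for illustration.

```python
import random

# Hypothetical sketch: a second-order ARX predictor for one performance
# metric y driven by m adaptive parameters u_1..u_m:
#   y(k) = a1*y(k-1) + a2*y(k-2) + sum_i [b0_i*u_i(k) + b1_i*u_i(k-1)]
# The moving-average (noise) term is omitted, and coefficients are adapted
# with normalized LMS, an assumed stand-in for the actual estimator.


class SecondOrderARX:
    def __init__(self, n_inputs, step=0.5):
        self.theta = [0.0] * (2 + 2 * n_inputs)  # [a1, a2, b0_1, b1_1, ...]
        self.step = step

    @staticmethod
    def regressor(y_prev, y_prev2, u_now, u_prev):
        phi = [y_prev, y_prev2]
        for now, prev in zip(u_now, u_prev):
            phi.extend([now, prev])
        return phi

    def predict(self, phi):
        return sum(t * p for t, p in zip(self.theta, phi))

    def update(self, phi, y_observed):
        """Called at the end of each interval with the observed metric."""
        error = y_observed - self.predict(phi)
        norm = 1e-9 + sum(p * p for p in phi)
        self.theta = [t + self.step * error * p / norm
                      for t, p in zip(self.theta, phi)]
        return error


def simulate(steps=3000, seed=0):
    """Identify a known (made-up) system from noise-free data."""
    rng = random.Random(seed)
    true_theta = [0.5, -0.2, 1.0, 0.3]  # hypothetical a1, a2, b0, b1
    model = SecondOrderARX(n_inputs=1)
    y1 = y2 = 0.0
    u_prev = [0.0]
    for _ in range(steps):
        u_now = [rng.uniform(-1.0, 1.0)]
        phi = model.regressor(y1, y2, u_now, u_prev)
        y = sum(t * p for t, p in zip(true_theta, phi))
        model.update(phi, y)
        y2, y1, u_prev = y1, y, u_now
    return model


print([round(t, 3) for t in simulate().theta])
```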
Control Model Formulation -- Control Policy
• First objective: Maximize Application Benefit
- Reinforcement learning (Q-Learning)
- Reward function: application benefit, subject to the time and resource budget constraints
• Second objective: Minimize Control Overhead
- Proportional-Integral (PI) controller
• Update Parameter Values
- Based on the action taken at time step k
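This is a minimal sketch of the two control-policy pieces named above.

- **Q-learning.** A tabular Q-learning update whose reward is the application benefit when the time and budget constraints hold. The penalty value, states, actions and learning rates are assumptions.
- **PI controller.** A discrete proportional-integral controller. The slides do not spell out how it is wired to the control-overhead objective, so here it simply drives a measured quantity toward a set point.

```python
import random
from collections import defaultdict

# Hypothetical sketch of the two control-policy pieces. States, actions and
# gains are illustrative; the reward is the application benefit, subject to
# the time and resource budget constraints.


def reward(benefit, elapsed, cost, time_limit, budget, penalty=-1.0):
    """Benefit counts only while both constraints hold (assumed penalty)."""
    if elapsed > time_limit or cost > budget:
        return penalty
    return benefit


class QLearner:
    """Tabular Q-learning over (state, action) pairs."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy choice of a parameter change."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, r, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


class PIController:
    """Discrete PI controller: output = kp * e(k) + ki * sum of e(0..k)."""

    def __init__(self, kp, ki):
        self.kp, self.ki, self.integral = kp, ki, 0.0

    def step(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error
        return self.kp * error + self.ki * self.integral


agent = QLearner(["decrease", "keep", "increase"], epsilon=0.0)
r = reward(benefit=1.0, elapsed=10, cost=3, time_limit=20, budget=5)
agent.learn("s0", "increase", r, "s1")
print(agent.act("s0"))
```

After one rewarded step, the greedy policy (epsilon = 0) prefers "increase" in state s0.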
Resource Model
• Offline Training
• Collect Data Points:
• Learn the Relationship Between the Values of the Parameters and CPU/memory Usage
• Model Optimization
- Avoid frequent change to CPU/memory allocations due to resource cost
- Balance global CPU/memory among multiple services
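The following is a minimal sketch of the resource model. It assumes:

- a linear relationship between one adaptive parameter and CPU usage, learned from offline data points;
- a fixed change threshold as the "avoid frequent changes" optimization;
- proportional scaling to share a host's capacity among services.

All names and the threshold rule are hypothetical; memory would be handled the same way.

```python
# Hypothetical resource-model sketch: a linear fit from an adaptive
# parameter to CPU usage, a change threshold that suppresses small
# reallocations, and proportional scaling when services exceed capacity.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


class ResourceModel:
    def __init__(self, params, cpu_usage, threshold):
        self.a, self.b = fit_line(params, cpu_usage)  # offline training
        self.threshold = threshold
        self.current = None

    def allocation_for(self, param):
        """Change the allocation only if the prediction moved by more than
        the threshold, to avoid paying for frequent small changes."""
        predicted = self.a + self.b * param
        if self.current is None or abs(predicted - self.current) > self.threshold:
            self.current = predicted
        return self.current


def balance(requests, capacity):
    """Scale per-service requests proportionally if they exceed capacity."""
    total = sum(requests.values())
    if total <= capacity:
        return dict(requests)
    return {name: amount * capacity / total for name, amount in requests.items()}


model = ResourceModel([1, 2, 3, 4], [20, 30, 40, 50], threshold=5)
print(model.allocation_for(2), model.allocation_for(2.3), model.allocation_for(3))
print(balance({"render": 60, "compose": 60}, capacity=100))
```

The small move from 2 to 2.3 keeps the existing allocation. The move to 3 crosses the threshold and triggers a new allocation.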
Framework Design
[Figure: framework architecture]
• Application
• Service Deployment, Service Wrapper
• Resource Provisioning Controller, Application Controller
• Resource Model, Model Optimizer
• Performance Manager, Priority Assignment, Status Query, Performance Analysis
• Virtualization Management (Eucalyptus, OpenNebula, ...)
• Xen Hypervisors, each hosting multiple VMs
Experiments Setup
• Schemes Compared
- Work-conserving
- Static scheduling
• Metrics
- Benefit Percentage
- Resource Cost
• Emulated Cloud Environment
- Xen 3.0
Resource Model Validation: Hardware Heterogeneity
• Our model predicts CPU cycle and memory usage within 3% of the actual resource usage
• Model trained on homogeneous hardware (M1) and on heterogeneous hardware (M2 and M3)
Performance of Dynamic Resource Provisioning Algorithm
• Considered both linear and exponential pricing models
• With the linear pricing model, our approach performs 24% worse than Work Conserving
Performance of Dynamic Resource Provisioning Algorithm
• Work Conserving costs 66% more than our approach does
Resource Provisioning Overhead
• Optimal Execution: ideal resource configurations
• Our approach performs 4%, 2%, 2%, 1% and 0.8% worse than the Optimal Execution
Roadmap
• Motivation and Introduction
• Parameter Adaptation in the Grid Environment
• Budget Constrained Resource Provisioning
• Power-aware Consolidation of Workflows
- Opportunities for consolidation
- Workload analysis
- Consolidation algorithm
- Experimental Evaluation
• Future Work
• Conclusion
Motivation
• Another Critical Issue in Cloud Environment: Power Management
- HPC servers consume a lot of energy
- Significant adverse impact on the environment
• To Reduce Resource and Energy Costs
- Server consolidation
- Minimize the total power consumption and resource costs without a substantial degradation in performance
Problem Description
• Our Target Applications
- Workflows with DAG structure
- Multiple processing stages
- Opportunities for consolidation
• Research Problems
- Combine parameter adaptation, budget constraints and resource allocation with consolidation and power optimization
- Challenge: consolidation without parameter adaptation
- Support power-aware parameter adaptation -- future work
Contributions
•A power-aware consolidation framework, pSciMapper, based on hierarchical clustering and an optimization search method
•pSciMapper is able to reduce the total power consumption by up to 56% with at most a 15% slowdown for the workflow
•pSciMapper incurs low overhead and is thus suitable for large-scale scientific workflows
Opportunities for Consolidation: GLFS
• GLFS nowcasts and forecasts meteorological information for Lake Erie
• GLFS is compute-intensive
• Individual tasks could incur low resource usage
Resource Usage of GLFS Task1
[Figure: resource usage time series for parameter settings <1000, 6, 600>, <500, 3, 600>, <2000, 12, 1200> and <1000, 6, 600>]
Resource Usage of GLFS Task2
Observations
•Periodic Behavior w.r.t. CPU, memory, disk, and network usage: Time Series
•Average Resource Usage is Significantly Smaller than its Peak Value
•Dependent on the Values of the Application Parameters and the Characteristics of the Host Server
Power Consumption Analysis
•Resource Usage Activity
- CPU, memory, disk and network
•Server Consolidation
- Virtualization
- Interference of consolidated workloads
Power Consumption Analysis: Resource Usage
• All resource activities impact power consumption
• Variation in the CPU utilization has the largest impact
• Memory footprint and cache activities also impact the consumed power
Workload        CPU    Memory   Disk     Network
CPU-bound       Vary   2%       None     None
Memory-bound    70%    Vary     None     None
Disk-bound      50%    2%       Vary     None
Network-bound   50%    2%       18MB/s   Vary
Power Consumption Analysis: Virtualization
• Virtualization incurs very low power overhead
• Contention of CPU cycles
- Dynamic CPU provisioning saves power
Power Consumption Analysis: Interference
• Consolidating dissimilar workloads incurs a small slowdown in the execution time and yields large savings in power and resource costs
• Consolidating workloads with similar resource requirements significantly increases the execution time
The pSciMapper Framework Design
• Offline Analysis: Scientific Workflows → Resource Usage Generation → Time Series → Temporal Feature Extraction → Temporal Signatures → Feature Reduction and Modeling → model stored in the Knowledgebase
• Online Consolidation: Scientific Workflows → Hierarchical Clustering → Optimization Search Algorithm → Time Varying Resource Provisioning → Consolidated Workloads
Temporal Feature Extraction
•Relate Resource Usage to Power Consumption
•Temporal Signature
- Peak value: max value of the time series
- Relative variance: normalized sample variance
- Pattern: a sequence of samples to represent the pattern
•Notation
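The following is a minimal sketch of computing a temporal signature for one resource time series. Two details are assumptions not given on the slide:

- "relative variance" is taken as the sample variance divided by the squared peak;
- the pattern is taken as the mean of each of `pattern_len` equal segments.

The function name and the `pattern_len` parameter are hypothetical.

```python
# Hypothetical sketch of a temporal signature for one resource time series.
# Assumptions: "relative variance" is the sample variance divided by the
# squared peak, and the pattern is the mean of each of pattern_len equal
# segments of the series.

def temporal_signature(series, pattern_len=4):
    n = len(series)
    if n < max(2, pattern_len):
        raise ValueError("series too short for the requested signature")
    peak = max(series)
    mean = sum(series) / n
    sample_var = sum((x - mean) ** 2 for x in series) / (n - 1)
    rel_var = sample_var / peak ** 2 if peak else 0.0
    # Integer segment bounds; each segment has at least one sample.
    bounds = [i * n // pattern_len for i in range(pattern_len + 1)]
    pattern = [sum(series[lo:hi]) / (hi - lo)
               for lo, hi in zip(bounds, bounds[1:])]
    return peak, rel_var, pattern


cpu = [1, 1, 5, 5, 1, 1, 5, 5]  # e.g. periodic CPU utilization samples
print(temporal_signature(cpu))
```

The segment means preserve the periodic on/off shape noted in the observations, while the peak and relative variance summarize its magnitude and spread.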
Kernel Canonical Correlation Analysis (KCCA)
• 52 Features from Temporal Signature
- 12 features for each of CPU, memory, disk and network (48 in total)
- 4 features representing the host capacity
• Model the resource-time and resource-power relationships
Power-aware Consolidation
• Distance Metric
• Algorithm
- Initial one-to-one assignment
- Generate resource usage time series (HMM)
- Merge clusters
- Optimal assignment (Nelder-Mead algorithm)
- Dynamic CPU provisioning
• Notation
- Distance between task i and task j
- Interference of consuming resources R1 and R2 together
- Pearson’s correlation between two workloads w.r.t. the resource usage of R1 (10 pairs in total)
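Here is one way the distance metric and the cluster-merging step could fit together. The exact formula is not given on the slide. This sketch assumes the distance sums, over the 10 resource pairs (R1, R2), an interference weight times the Pearson correlation of the two tasks' usage of those resources. Under this assumed form the value can be negative, so it behaves as an interference score: the clusters with the lowest score, whose peaks do not coincide, are merged first using average linkage. All names, weights and series are hypothetical.

```python
from itertools import combinations, combinations_with_replacement

# Hypothetical sketch of interference-aware hierarchical clustering.
# Assumed distance: for each of the 10 resource pairs (R1, R2), weight the
# Pearson correlation between the two tasks' usage of R1 and R2 by an
# interference factor. Correlated peaks on interfering resources give a
# large distance; the closest (least interfering) clusters merge first.

RESOURCES = ("cpu", "mem", "disk", "net")
PAIRS = list(combinations_with_replacement(RESOURCES, 2))  # 10 pairs


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant series: treat as uncorrelated
    return sxy / (sxx * syy) ** 0.5


def distance(ti, tj, interference):
    total = 0.0
    for r1, r2 in PAIRS:
        # Symmetrize so that distance(ti, tj) == distance(tj, ti).
        rho = 0.5 * (pearson(ti[r1], tj[r2]) + pearson(ti[r2], tj[r1]))
        total += interference[(r1, r2)] * rho
    return total


def cluster(tasks, interference, levels):
    """Average-linkage agglomerative clustering; returns one list of
    clusters per level, starting with every task on its own."""
    clusters = [[name] for name in tasks]
    history = [[list(c) for c in clusters]]

    def link(a, b):
        ds = [distance(tasks[x], tasks[y], interference) for x in a for y in b]
        return sum(ds) / len(ds)

    while len(clusters) > 1 and len(history) < levels:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append([list(c) for c in clusters])
    return history


idle = [0, 0, 0, 0]
tasks = {
    "A": {"cpu": [1, 0, 1, 0], "mem": idle, "disk": idle, "net": idle},
    "B": {"cpu": [0, 1, 0, 1], "mem": idle, "disk": idle, "net": idle},
    "C": {"cpu": [1, 0, 0, 0], "mem": idle, "disk": idle, "net": idle},
}
weights = {pair: 1.0 for pair in PAIRS}
for level in cluster(tasks, weights, levels=3):
    print(level)
```

Tasks A and B have perfectly anti-correlated CPU usage, so they are merged first. The slide's algorithm then evaluates each level with an optimization search (Nelder-Mead), which this sketch does not attempt.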
Example
• Resource profiles
- C1: CPU moderate, Mem low, Disk low, Net low
- C2: CPU moderate, Mem low, Disk low, Net moderate
- C3: CPU moderate, Mem high, Disk high, Net low
- C4: CPU high, Mem moderate, Disk low, Net low
- C5: CPU low, Mem low, Disk high, Net moderate
• Clustering levels: assignment, <power, time>
- Level 1: {C1,S2}, {C2,S3}, {C3,S5}, {C4,S1}, {C5,S4}, <180.56, 92.87>
- Level 2: {(C1, C2), S2}, {C3,S5}, {(C4,C5), S1}, <135.11, 88.03>
- Level 3: {(C1, C2, C3),S2}, {(C4,C5), S1}, <93.62, 83.93>
- Level 4: X
Experiments Setup
• Algorithms Compared
- Without Consolidation
- Optimal + Work Conserving
- pSciMapper + Static Allocation
- pSciMapper + Dynamic Provisioning
• Metrics
- Normalized total power consumption
- Execution time
• Emulated Cloud Environment
- Xen 3.0
- GridSim: a grid environment simulator
- CloudSim: a cloud environment simulator
• Power Modeling
Applications
• Two Real-world Workflows
- GLFS and VR
• Three Synthetic Workflows
Application   CPU        Memory     Disk       Network
GLFS          High       Moderate   Moderate   None
VR            Moderate   High       Moderate   Moderate
SynApp1       Low        Low        High       High
SynApp2       Moderate   High       Moderate   Low
SynApp3       High       Moderate   Low        Low
Normalized Total Power Consumption Comparison: GLFS
• Four different combinations of application parameters
• Optimal saves up to 27% of the total power, while pSciMapper + Dynamic Provisioning saves up to 35%
Normalized Total Power Consumption Comparison: VR and Synthetic Workflows
• In VR, Optimal saves up to 58% of the total power; pSciMapper + Dynamic Provisioning is 8% worse
Execution Time Comparison: GLFS
• Optimal stops when the performance degradation reaches 15%
• pSciMapper + Dynamic Provisioning performs 12% worse than Without Consolidation
Execution Time Comparison: VR and Synthetic Workflows
Scheduling Overhead and Scalability
• The overhead caused by pSciMapper + Dynamic Provisioning is much smaller than that of Optimal
• pSciMapper is suitable for large-scale scientific workflows
Roadmap
•Motivation and Introduction
•Parameter Adaptation in the Grid Environment
•Budget Constrained Resource Provisioning
•Power-aware Consolidation of Workflows
•Future Work
•Conclusion
Schedule Service Components in Parallel
•Service Components can be Parallelized
- One-to-many mapping to processing nodes
•Degree of Parallelism
- Adaptive parameters
•How Does the Degree of Parallelism Impact Parameter Adaptation?
•How Should Multiple Instances of Certain Service Components be Scheduled?
Power-aware Adaptation
•Adaptive Parameters Impact Application QoS and Execution Time
•Different Resource Usage Leads to Different Levels of Power Consumption
•Co-hosting Service Components Incurs Performance Interference
•How can we achieve the required application quality with the minimum power consumption?
Performance Modeling
•Detailed Performance Analysis
- Feedback to the application user
- Identify the performance bottleneck
•Help Understand the Application Behavior that is Dependent on Adaptive Parameters
•How Can We Accurately Determine the Factors that Limit Application Performance?
Large-Scale Optimization
• Peta-scale Applications
- Scientific computing
• A Large Number of Parameters
- Continuous and discrete
- Correlation
- Unstructured vs. structured search space
• How Can We Efficiently Explore the Large Parameter Space?
Roadmap
•Motivation and Introduction
•Parameter Adaptation in the Grid Environment
•Budget Constrained Resource Provisioning
•Power-aware Consolidation of Workflows
•Future Work
•Conclusion
Conclusion
• An Autonomic Adaptation Algorithm and an Adaptive Middleware
• In Grid Computing Environment
- An efficient resource allocation approach
- An effective fault tolerance scheme
• In Cloud Computing Environment
- A dynamic resource provisioning framework
- pSciMapper: power-aware consolidation framework
Goal: Maximize the benefit (QoS) of adaptive applications while satisfying the pre-specified time constraints
Thank You!
Related Work: Parameter Adaptation
•Autonomic Adaptation
- Lim et al. (CCNC06), Valetto et al. (ICAC05), Ruth et al. (ICAC06)
•Autonomic Computing Middleware
- AutoMate(Vanderbilt), Q-Fabric (Georgia Tech.)
•Reinforcement Learning in Autonomic Computing
- Tesauro et al. (ICAC06)
Related Work: Resource Allocation
•Resource Allocation in Grid Computing
- Singh et al. (HPDC07), Huang et al. (SC07)
- Xu et al. (ICAC07)
•Real-Time Scheduling
- Survey: Sha et al. (Real-time Systems 04)
- Gopalan et al. (MMCN02), Ghosh et al. (Cluster06)
Related Work: Resource Provisioning
•Cloud Computing Systems
- Amazon EC2, Google AppEngine, Microsoft Azure, Eucalyptus (UCSB)
•Virtualized Resource Scheduling
- Diao et al. (ACC02), Padala et al. (EuroSys07,09)
•Scheduling with Budget Constraints
- Garg et al. (ACSC09), Sakellariou et al. (GRID07)
Related Work: Power-aware Consolidation
• Scientific Workflow Scheduling
- Pegasus (USC), Kepler (UCSB), ASKALON (Innsbruck)
• Power Management
- Dynamic Voltage and Frequency Scaling (DVFS)
• Wang et al. (HPCA08), Govindan et al. (EuroSys 09), Laszewski et al. (Cluster09)
- Consolidation
• Srikantaiah et al. (HotPower08), Verma et al. (USENIX09)