Comprehensive Elastic Resource
Management to Ensure Predictable
Performance for Scientific Applications
on Public IaaS Clouds.
In Kee Kim, Jacob Steele, Yanjun Qi, Marty Humphrey
CS@University of Virginia
Current Approach
[1] Schedule-based Scaling
• Static approach
[2] Rule-based Scaling
• Dynamic but Delays
– Reactive
• Auto-Scaling
– Scale Up – Job Deadline Satisfaction (High Demand)
– Scale Down – Cost Efficiency (Low Demand)
[Figure: Schedule-based Scaling over time periods T1, T2, T3, T1; Rule-based Scaling, showing Over Provisioning and Under Provisioning caused by Scale Up Delay and Scale Down Delay]
Research Goal and Approach
• To 1) meet user-defined job deadlines and 2) minimize execution cost for scientific applications that have highly variable job execution times, we design a Comprehensive Resource Management System that uses:
- Local Linear Regression-based Job Execution Time Prediction
- Cost/Performance-Ratio based Resource Evaluation
- Availability-Aware Job Scheduling and VM Scaling
LLR: Job Execution Time Prediction
• Initial Intuition
– Job execution time has a linear relationship with IaaS/Application parameters
• Data Collection (26 samples on 4 types of VMs) and Correlation Analysis
• Local Linear Regression
Correlation with job execution time:
Operation                      Size of Data          Type of VM
Non-Data Intensive Operation   0.0973 (negligible)   0.7089 (strong)
Data Intensive Operation       0.6129 (moderate)     0.3223 (weak)
Simple Linear Model → Cannot Produce Reliable Prediction
[Figure: Job Execution Time (sec.) on m1.large, with prediction error marked. (a) Global Linear Regression (using all samples); (b) Local Linear Regression (using three samples)]
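The contrast between the global and the local fit can be illustrated with a minimal sketch. The slides only state that the local fit uses three samples; choosing the three samples whose data size is nearest to the query, the function names, and the sample history below are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_global(sizes, times, query_size):
    """Global linear regression: fit one line through all samples."""
    slope, intercept = fit_line(sizes, times)
    return slope * query_size + intercept


def predict_llr(sizes, times, query_size, k=3):
    """Local linear regression: fit a line through the k samples whose
    data size is closest to the query (neighbor choice is an assumption)."""
    nearest = sorted(zip(sizes, times), key=lambda s: abs(s[0] - query_size))[:k]
    slope, intercept = fit_line([s for s, _ in nearest], [t for _, t in nearest])
    return slope * query_size + intercept


# Hypothetical history for one VM type: data size vs. execution time (sec.).
# The relationship is linear only within each half of the range.
history_sizes = [10, 20, 30, 40, 50, 60]
history_times = [100, 200, 300, 1000, 1200, 1400]

local_estimate = predict_llr(history_sizes, history_times, 50)
global_estimate = predict_global(history_sizes, history_times, 50)
print(local_estimate, global_estimate)
```

On this made-up history the local fit recovers the trend around the query point, while the single global line is pulled away by samples from the other regime.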
Availability-Aware Job Scheduling
• AAJS first assigns as many jobs as possible to currently running VMs, based on CP evaluation results.
– Maximizes machine utilization of currently running VM instances.
– Minimizes overhead from starting new VMs.
• Job Assignment Criteria
1) VM which has a higher order (rank) in Cost/Performance ratio.
2) VM which offers the earliest job completion time, if multiple options are available.
– Job completion time: Queue Wait Time + New Job Exec Time
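The two assignment criteria above could be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the `VM` record, the `pick_vm` helper, the deadline feasibility filter, and the example numbers are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cp_rank: int       # 1 = best Cost/Performance ratio (assumed to come from the CP evaluation)
    queue_wait: float  # seconds until this running VM can start a new job


def pick_vm(running_vms, predicted_exec, deadline):
    """Choose a currently running VM for a new job.

    predicted_exec maps VM name to the predicted execution time (sec.)
    of the new job on that VM. Returns None when no running VM can
    finish before the deadline, in which case a caller would start a
    new VM (assumed behavior).
    """
    def completion(vm):
        # Completion time = Queue Wait Time + New Job Exec Time
        return vm.queue_wait + predicted_exec[vm.name]

    feasible = [vm for vm in running_vms if completion(vm) <= deadline]
    if not feasible:
        return None
    # Criterion 1: better CP rank first; criterion 2: earliest completion.
    return min(feasible, key=lambda vm: (vm.cp_rank, completion(vm)))


# Hypothetical pool of running VMs and predictions for one new job.
vms = [VM("vm-a", 1, 600), VM("vm-b", 1, 100), VM("vm-c", 2, 0)]
exec_times = {"vm-a": 300, "vm-b": 400, "vm-c": 350}

chosen = pick_vm(vms, exec_times, deadline=1000)
print(chosen.name if chosen else "start a new VM")
```

Sorting on the tuple `(cp_rank, completion)` keeps the rank as the primary criterion and uses completion time only to break ties, which matches the order in which the slide lists the criteria.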
Experiment Setup
• Baselines
– SCS – MH [SC 2011]
– SCS + LLR [NEW]
• Implementation & Deployment
– LCA and 2 baselines on AWS
• VM Types for Experiments
Instance Type   CPU/Mem   Hourly Price
m1.small        1/1.7G    $0.091/Hr.
m1.medium       1/3.7G    $0.182/Hr.
m1.large        2/7.5G    $0.364/Hr.
m1.xlarge       4/15G     $0.728/Hr.
• Workload Generation
# of Jobs      100 Watershed Delineation Jobs
Job Deadline   Mean: 30 min.   Std. Dev.: 9.7 min.
Job Duration   Mean: 15 min.   Std. Dev.: 12.5 min.
[Figure: workload patterns. (a) Steady (b) Bursty (c) Incremental (d) Random]
Job Exec. Time Predictor Performance
                     LLR      LR       kNN      Mean
Avg. Predict. Acc.   78.77%   67.62%   65.38%   60.99%
MAPE                 0.2773   0.3901   0.5012   0.8254
LLR: Local Linear Regression, LR: Linear Regression, MAPE: Mean Absolute Percentage Error
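For reference, MAPE as reported in the table (as a fraction rather than a percentage) follows the standard definition: the mean of |actual − predicted| / |actual| over all jobs. A minimal sketch, with made-up execution times:

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned as a fraction (0.15 = 15%)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual vs. predicted job execution times (sec.).
actual_times = [100.0, 200.0, 400.0]
predicted_times = [110.0, 180.0, 500.0]
print(round(mape(actual_times, predicted_times), 4))  # 0.15
```

A lower value means the predictor's estimates are closer to the measured execution times.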
Job Deadline Satisfaction Rate
LCA: Average 83.25% Job Deadline Satisfaction Rate
- 9% better than SCS+LLR
- 33% better than SCS
Overall Running Cost
LCA: Average $8.9 Overall Running Cost
- $2.5 cheaper than SCS+LLR
- $5.2 more expensive than SCS (but performance is not comparable)
Conclusion
• LCA is a novel elastic resource management system for scientific applications on public IaaS cloud based on three approaches:
[1] Local Linear Regression-based Job Execution Time Prediction
[2] Cost-Performance Ratio-based Resource Evaluation
[3] Availability-Aware Job Scheduling and VM Scaling
• LCA has better performance than the baselines (SCS, SCS with LLR) in four different workload patterns (Steady, Bursty, Incremental, Random).
– Predictor Performance: 11%-18% better accuracy
– Job Deadline Satisfaction Rate: 9%-33% better rate
– Overall Running Cost: $2.45 (22%) better cost efficiency
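The 22% figure appears consistent with the cost numbers reported earlier, assuming the saving is measured relative to the SCS+LLR cost and that this cost equals LCA's average cost plus the saving (both are assumptions, since the slides do not state the base). A quick arithmetic check:

```python
lca_cost = 8.9   # average overall running cost of LCA, in dollars
saving = 2.45    # reported saving relative to SCS+LLR, in dollars

# Assumed: SCS+LLR cost is LCA's cost plus the reported saving.
scs_llr_cost = lca_cost + saving
saving_pct = saving / scs_llr_cost * 100

print(f"{saving_pct:.1f}%")  # about 21.6%, which rounds to 22%
```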
LCA System Design
[Diagram: LCA system architecture. Components: User; Job Scheduling & VM Scaling; Prediction Module (LLR Predictor, Job History Repository); Resource Evaluation (Cost-Performance Optimized Evaluation); Availability-Aware Job Scheduling and VM Scaling (VM Manager); VMs on IaaS. Arrow labels: Job + Deadline; Request Samples; Prediction Results; VM Ranking & Selection; VM Req, Job Assign; +/- VMs, Job Assignment; Update Exe Info; Results.]
VM Utilization
[Figure: VM time breakdown by state: Startup, Idle, Job Running]
LCA: Average 69.17% VM Utilization
- 25% higher than SCS + LLR
- 11% higher than SCS
VM Instance Types
TABLE. SPECIFICATION OF GENERAL PURPOSE MICROSOFT WINDOWS INSTANCES ON AMAZON EC2 IN US EAST REGION (THE PRICE IS BASED ON MARCH 2014)
Instance Type   ECU[1]   CPU Cores   Memory   Hourly Price
m1.small        1        1           1.7GB    $0.091/Hr.
m1.medium       2        1           3.7GB    $0.182/Hr.
m1.large        4        2           7.5GB    $0.364/Hr.
m1.xlarge       8        4           15GB     $0.728/Hr.
[1] A single ECU (EC2 Compute Unit) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.