Storage Workload Isolation via Tier Warming: How Models Can Help
By Ji Xue, Feng Yan, Alma Riska, and Evgenia Smirni
Presented by Christian Contreras
Proposed solution
• Storage workload prediction model to support scheduling in a multi-tier storage system.
– Autonomic technique that learns the traffic patterns and predicts traffic changes.
– Adjusts the storage environment using the predictions.
Agenda
• Multi-tier storage system
• Traffic patterns
• Problem
• Goals
• Model
• Experimental results
• Simulation
• Conclusions and critiques
Multi-tiered Storage System: Components
Fast tier
Slow tier
Multi-tiered Storage System: Traffic
• User traffic: upload, download, read/write
• System traffic: replication, backup, restore, disaster recovery
Multi-tiered Storage System: Concerns
• Tiered storage integration
• Traffic patterns
• Performance
• Availability
• Cost
Problem: How to schedule properly?
Traffic analysis (3 days)
• User traffic
– Response time
– Availability
• System traffic
– Time window
– I/O intensive
Overall intensity, working data sets, warm up
Fast tier: What? When?
Multi-tiered Storage System
• I/O hierarchical structure and traffic patterns
• Warm-up reasoning:
– Fast tier → the user workload traffic response time requirement.
– Slow tier → the system traffic time window requirement.
• Challenge:
– What portion of the voluminous data set to bring up to the smaller fast tier.
– When to do it, so that system performance is the highest.
Proposed solution: Goals
• Predict when drastic changes happen
• Proactively prepare the system for heavy user workload
To optimize Multi-tier Storage System performance.
Analysis: Scenarios
CDFs of user response time of 1 day for different algorithms.
Can we predict the changes as well as the proactive scenario does?
Proposed solution
• Markovian-based model that captures the duration of the low/high traffic intensities in user arrivals across different time scales.
• Model that captures the changes in user performance as a function of fast tier hit rate.
• Predict when the periods of high/low intensities arrive to schedule system work and cache warm up to optimize the system performance.
Proposed solution: Algorithm for scheduling
Prediction Model Scheduling
Prediction Model: Prediction-based Scheduling Policy
Prediction Model: Traffic trace
Daily pattern and weekly pattern
Arrival intensity changes over time
Prediction Model: Class classification
Prediction Model: State classification
Use Silhouette¹ and K-means clustering to determine the states
¹ Silhouette is used to calculate the dissimilarity value s(i) of the average arrival intensity of day i.
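The K-means step can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function name `kmeans_1d`, the deterministic initialization, and the sample daily intensities are all assumptions made for this example.

```python
def kmeans_1d(values, k, iters=50):
    """Plain 1-D K-means (hypothetical helper, not from the paper).

    Returns (labels, centers). Initial centers are spread evenly over
    the sorted data so that the result is deterministic.
    """
    srt = sorted(values)
    centers = [srt[(2 * c + 1) * len(srt) // (2 * k)] for c in range(k)]

    def assign(cs):
        # Each value goes to the nearest center.
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(centers), centers


# Made-up average arrival intensities for five days.
daily_intensity = [100, 110, 105, 300, 310]
labels, centers = kmeans_1d(daily_intensity, k=2)
print(labels, centers)
```

With these sample values the three low-intensity days and the two high-intensity days fall into separate clusters.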
Prediction Model: High level Markovian model
• Classes: Weekdays, Weekends
• States: H1, H2 (high intensity) and L1, L2 (low intensity)
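To make the idea of a Markovian model that captures how long low and high periods last more concrete, here is a minimal sketch under simplifying assumptions: a first-order chain with only two states ("L" and "H"), estimated from a made-up sequence of hourly labels. The function names and the trace are hypothetical, and the model in the paper is richer than this.

```python
from collections import Counter, defaultdict


def estimate_transitions(states):
    """Estimate first-order transition probabilities by counting pairs."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }


def expected_duration(probs, state):
    """Mean number of consecutive slots spent in `state`.

    In a first-order chain the sojourn time is geometric, so the mean
    is 1 / (1 - P[state -> state]).
    """
    stay = probs[state].get(state, 0.0)
    return float("inf") if stay >= 1.0 else 1.0 / (1.0 - stay)


# Made-up hourly labels: L = low intensity, H = high intensity.
trace = list("LLLLHHHHHHLLLLLLHHHHHHLL")
P = estimate_transitions(trace)
print(P["L"]["H"])                # estimated probability of leaving the low state
print(expected_duration(P, "H"))  # roughly 6 slots for this trace
```

The expected duration is what a scheduler would use to judge how much time remains before the next state change.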
Cluster analysis: Silhouettes
• A graphical aid to the interpretation and validation of cluster analysis.
• The dissimilarity value s(i) is defined as s(i) = (b(i) − a(i)) / max{a(i), b(i)}, where:
– i is the day index
– a(i) is the average dissimilarity of day i to all other days within the same cluster
– b(i) is the lowest average dissimilarity of day i to all days in a different cluster.
• Values of s(i) are in [−1, 1] → the larger the value, the better.
• When s(i) approaches 1, a(i) ≪ b(i), which means that day i is much closer to the days in its own cluster than to those in any other cluster.
• Used to determine the number of clusters → classes and states.
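The silhouette computation can be sketched in a few lines. This is an illustrative sketch only: it assumes one-dimensional data (one average intensity per day), uses the absolute difference as the dissimilarity, and requires at least two clusters. The function name and the sample values are hypothetical.

```python
def silhouette_values(points, labels):
    """Return s(i) for every point, given one cluster label per point.

    a(i): mean dissimilarity to the other members of i's own cluster.
    b(i): lowest mean dissimilarity to the members of any other cluster.
    Assumes at least two distinct labels.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: s(i) is conventionally 0
            scores.append(0.0)
            continue
        a = sum(abs(points[i] - points[j]) for j in own) / len(own)
        b = min(
            sum(abs(points[i] - points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


# Made-up daily average intensities and two candidate labelings.
days = [100, 110, 105, 300, 310]
good = silhouette_values(days, [0, 0, 0, 1, 1])
bad = silhouette_values(days, [0, 1, 0, 1, 0])
print(sum(good) / len(good), sum(bad) / len(bad))
```

Comparing the mean s(i) across candidate cluster counts or labelings is one common way to pick the number of clusters: the labeling with the higher mean separates the days better.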
Estimation method for the instant fast tier hit rate
• Goal: estimate how the hit rate changes as active user data moves from the slow tier up to the fast tier and vice versa.
• It is necessary to warm up the fast tier cache explicitly rather than allow it to be warmed up gradually by the user accesses.
• The fast tier hit rate is estimated from the following quantities:
– µ(t) is the average service rate of user traffic at time t.
– µ_orig is the original average service rate of user traffic (no system workload).
– S(t) is the service slowdown, which describes how the average service rate changes from the original one.
– R_slow is the average slow storage tier access speed.
– R_fast is the average fast storage tier access speed.
– C is the capacity.
– F is the transfer speed.
– S(t_i-START) is the service slowdown at the beginning of time window i serving the additional work.
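As a rough illustration of why the hit rate matters, the sketch below uses a generic two-level model. It is not the estimator from the paper: it assumes that each access costs 1/speed, so a hit rate h gives a mean access time of h/R_fast + (1 − h)/R_slow, and that warming up means one sequential transfer of the working set. All names and numbers are hypothetical.

```python
def effective_rate(hit_rate, r_fast, r_slow):
    """Average access speed when a fraction `hit_rate` of accesses hit the fast tier.

    Assumption: access times (1/speed), not speeds, are averaged.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    mean_time = hit_rate / r_fast + (1.0 - hit_rate) / r_slow
    return 1.0 / mean_time


def warmup_seconds(working_set_bytes, transfer_bytes_per_s):
    """Time to bring a working set into the fast tier with one sequential read."""
    return working_set_bytes / transfer_bytes_per_s


# Made-up speeds in operations per second.
for h in (0.0, 0.5, 0.9, 1.0):
    print(h, effective_rate(h, r_fast=1000.0, r_slow=100.0))

# A 1 GiB working set read at an assumed 100 MiB/s.
print(warmup_seconds(2**30, 100 * 2**20))
```

Under this simple model most of the gain appears only at high hit rates, which is consistent with the motivation for warming the fast tier before a busy period instead of letting user accesses fill it gradually.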
When to warm up
Actual and predicted arrival intensity state changes
Proposed solution: Algorithm for scheduling
Algorithm for scheduling system work in a multi-tier storage system
Algorithm State-based Scheduling Low State
Algorithm State-based Scheduling High State
Testbed: Hardware & workload
• Server memory is 12GB; the disk enclosure holds 12 SATA 7200RPM HDDs of 3TB each.
• System memory emulates the fast tier.
• The disk enclosure is the slow tier, used for the bulk of the data.
• Workload
– Uses “fio” as the I/O workload generator
– The working set size for the user workload is 1GB
– The system working set is 24GB
– The workload is generated and measured at the host machine
• The fast tier is warmed up via a sequential read of the user working set.
• The user active working set can be determined by statistically evaluating access patterns, such as the number of accesses per storage location.
Testbed: System work scheduling policies
• user-only: used only as a baseline to evaluate the impact of the additional system work.
• feedback-based: a reactive policy that monitors the current load intensity in the system and determines whether it is in a high or low intensity period.
• prediction-based: a proactive policy that uses the proposed Markovian model to predict user traffic intensity, having learned from past data the duration of periods of high and low intensity.
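The difference between the reactive and proactive policies can be sketched as follows. This is a simplified illustration, not the scheduling algorithm from the paper: time is divided into slots, the threshold, the `warmup_slots` parameter, the action names, and the sample inputs are all assumptions.

```python
def feedback_policy(observed_intensity, threshold):
    """Reactive: decide per slot from the load observed in that slot."""
    return [
        "system" if load < threshold else "user-only"
        for load in observed_intensity
    ]


def prediction_policy(predicted_states, warmup_slots):
    """Proactive: stop system work and warm the fast tier shortly before
    each predicted high-intensity ("H") slot."""
    plan = []
    for t, state in enumerate(predicted_states):
        upcoming = predicted_states[t + 1 : t + 1 + warmup_slots]
        if state == "H":
            plan.append("user-only")
        elif "H" in upcoming:
            plan.append("warm-up")
        else:
            plan.append("system")
    return plan


# Made-up inputs: observed load per slot, and predicted state per slot.
reactive = feedback_policy([10, 12, 80, 90, 15], threshold=50)
proactive = prediction_policy(list("LLLHHLL"), warmup_slots=1)
print(reactive)
print(proactive)
```

In this sketch the reactive plan can only respond once the high load has already arrived, while the proactive plan reserves the slot before the predicted high period for warming the fast tier.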
Experimental Results: user-only policy
User IOPS (throughput) and user response time over time, user-only policy.
Experimental Results
Response time with warm up and without warm up across the experiment time
Simulation results
• Feedback method
– Uses online detection
– Stop and change
– Delay to change
• Prediction method
– Uses fast tier hit rate prediction
– Change warming up
Predicted state change by feedback method and prediction method.
– Using traces containing the arrival time
– Two-tiered storage system
Evaluation of the prediction model accuracy
Simulation results
CDF of user response time
Simulation results
Performance comparisons via simulation. Note that the throughput for system work is null in the user-only case.
Conclusions
• The Markovian-based model
• A prediction-based scheduling policy
• The prediction-based policy is very close to the ideal scenario (knowledge of the future).
• It demonstrates the effectiveness of the prediction model.
– It detects the incoming High state
– It proactively warms up the fast tier
• The larger the fast tier, the higher the benefit of the prediction-based approach.
Conclusions and critiques
• Simplistic approach
– “Determining the user working set (i.e., what to bring) is outside the scope of this paper.” p.3.
– “The fast tier is warmed up via a sequential read of the user working set.”
– “The user active working set can be determined by evaluating statistically access patterns such as the number of accesses per storage location.”
• Performance improvement in the window of workload change.
• Other factors: OS parameters (e.g., paging), data set locations, network architecture.
• Improve
– Model, testbed, sensitivity analysis
Reference
• Ji Xue, Feng Yan, Alma Riska, and Evgenia Smirni. Storage Workload Isolation via Tier Warming: How Models Can Help.
• Storage Workload Isolation via Tier Warming: How Models Can Help, presentation at ICAC2014.