Posted on 19-Jan-2016
1
Using A Multiscale Approach to Characterize Workload Dynamics
Tao Li
taoli@ece.ufl.edu
June 4, 2005
Dept. of Electrical and Computer Engineering
University of Florida
2
Motivation
Workload dynamics reveal how workload behavior changes over time
Understanding workload dynamics is important for:
Emerging workload characterization: long-running (servers, e-commerce), interactive (user, OS, DLL…), and non-deterministic (multithreaded) workloads
Run-time tuning, optimization, and monitoring of performance, power, reliability, and security
Microarchitecture trends: CMP, SMT
3
Program Time-Varying Behavior
[Figure: IPC per cycle bucket over time for (a) gzip and (b) crafty, plotted at several bucket sizes from coarse to fine]
4
Multiscale Workload Characterization
Characterize workload behavior across different time scales
“zoom-in” and “zoom-out” features
Apply wavelet analysis to study program scaling behavior
compact and parsimonious models
Complements other approaches (aggregate measurement, phase analysis)
5
Outline
Scaling models and wavelet analysis
Experimental setup
Results of SPEC 2K integer benchmarks
On-line program scaling estimation
Conclusions
6
Scaling Models
Self-similarity: a dilated portion of the sample path of a process cannot be statistically distinguished from the whole
H (Hurst parameter): the degree of self-similarity
$$X(t) \overset{d}{=} c^{H} X(t/c), \quad c > 0$$
where $\overset{d}{=}$ denotes equality in (finite-dimensional) distribution
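A short worked consequence of the definition, assuming X has finite variance: choosing c = t (for t > 0) ties the spread of the process at any time to its spread at t = 1.

$$X(t) \overset{d}{=} t^{H} X(1) \quad\Longrightarrow\quad \operatorname{Var} X(t) = t^{2H}\,\operatorname{Var} X(1)$$

For example, H = 1/2 (as for Brownian motion) gives a variance that grows linearly in t; larger H means the variance grows faster than linearly.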
7
Scaling Models (Contd.)
Long-Range Dependence (LRD): the correlation function $r_x(k)$ of a process behaves like a power law of the time lag k
$$r_x(k) \sim c_r\, k^{2H-2}, \quad k \to \infty$$
where $c_r$ is a positive constant and H, with $1/2 < H < 1$, is the Hurst parameter
LRD: correlations decay so slowly that they sum to infinity
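A short worked step showing why the power law implies a divergent sum: for 1/2 < H < 1 the exponent 2H − 2 lies in (−1, 0), so $k^{2H-2}$ is decreasing and can be bounded below by an integral.

$$\sum_{k=1}^{K} k^{2H-2} \;\ge\; \int_{1}^{K+1} x^{2H-2}\,dx \;=\; \frac{(K+1)^{2H-1} - 1}{2H-1} \;\xrightarrow{K\to\infty}\; \infty$$

Since $r_x(k) \sim c_r k^{2H-2}$ with $c_r > 0$, the tail of $\sum_k r_x(k)$ diverges as well. For example, H = 0.75 gives $r_x(k) \sim c_r k^{-1/2}$, whose partial sums grow roughly like $2 c_r \sqrt{K}$.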
8
Scaling Analysis Technique: Discrete Wavelet Transform
Consider a series $X_0 = \{X_{0,k},\ k = 0, 1, 2, \dots, 2^n - 1\}$ at the finest level of time-scale resolution
We can coarsen this series by averaging (with a slightly unusual normalization factor) over non-overlapping blocks of size two:
$$X_{1,k} = 2^{-1/2}\,(X_{0,2k} + X_{0,2k+1}) \qquad \text{(Equ. 1)}$$
This generates a new time series X1, which represents a coarser-granularity picture of the original series X0
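A minimal sketch of one coarsening step (Equ. 1), assuming the series is held in a plain Python list of even length; the function name haar_coarsen is hypothetical.

```python
import math

def haar_coarsen(x0):
    """Equ. 1: X1[k] = 2**(-1/2) * (X0[2k] + X0[2k+1]) over non-overlapping pairs."""
    if len(x0) % 2:
        raise ValueError("series length must be even")
    scale = 1.0 / math.sqrt(2.0)  # the 2^(-1/2) normalization factor
    return [scale * (x0[2 * k] + x0[2 * k + 1]) for k in range(len(x0) // 2)]

# Hypothetical example: four fine-grained samples (e.g. IPC per interval)
x0 = [1.0, 3.0, 5.0, 7.0]
x1 = haar_coarsen(x0)
print(x1)  # approx [2.828, 8.485], i.e. [4/sqrt(2), 12/sqrt(2)]
```

Using 2^(-1/2) rather than a plain average of 1/2 keeps the transform orthonormal once the details (Equ. 2) are included, so the total energy (sum of squares) is preserved from level to level.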
9
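Discrete Wavelet Transform
The difference between the two samples in each block, known as the details, is
$$d_{1,k} = 2^{-1/2}\,(X_{0,2k} - X_{0,2k+1}) \qquad \text{(Equ. 2)}$$
The original time series X0 can be reconstructed from its coarser representation X1 by simply adding in the details d1:
$$X_{0,2k} = 2^{-1/2}\,(X_{1,k} + d_{1,k}), \qquad X_{0,2k+1} = 2^{-1/2}\,(X_{1,k} - d_{1,k})$$
Repeating this process n times gives, in shorthand notation,
$$X_0 = 2^{-n/2} X_n + 2^{-n/2} d_n + 2^{-(n-1)/2} d_{n-1} + \dots + 2^{-1/2} d_1$$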
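The analysis and reconstruction equations above can be sketched for a single level as follows, assuming Python lists of even length; haar_analysis and haar_synthesis are hypothetical names. The round trip recovers X0 exactly (up to floating-point rounding).

```python
import math

S = 1.0 / math.sqrt(2.0)  # 2^(-1/2) normalization shared by Equ. 1 and Equ. 2

def haar_analysis(x0):
    """One level: coarse series X1 (Equ. 1) and details d1 (Equ. 2)."""
    if len(x0) % 2:
        raise ValueError("series length must be even")
    half = len(x0) // 2
    x1 = [S * (x0[2 * k] + x0[2 * k + 1]) for k in range(half)]
    d1 = [S * (x0[2 * k] - x0[2 * k + 1]) for k in range(half)]
    return x1, d1

def haar_synthesis(x1, d1):
    """Inverse step: X0[2k] = S*(X1[k] + d1[k]), X0[2k+1] = S*(X1[k] - d1[k])."""
    x0 = []
    for c, d in zip(x1, d1):
        x0.append(S * (c + d))
        x0.append(S * (c - d))
    return x0

x0 = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0]
x1, d1 = haar_analysis(x0)
rebuilt = haar_synthesis(x1, d1)
```

A pair of equal samples (the last two above) produces a zero detail, which is why the details capture only the change between neighboring samples.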
10
Discrete Wavelet Transform (Contd.)
Discrete wavelet coefficients: the collection of details $d_{j,k}$
The Discrete Wavelet Transform (DWT) iteratively applies Equ. 1 and Equ. 2 to calculate all $d_{j,k}$
The DWT divides data into a low-pass approximation and a high-pass detail at any level of resolution
The wavelet decomposition coefficients can be used to study the scale-dependent properties of the data
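A minimal sketch of the iteration described above, assuming the Haar form of Equ. 1 and Equ. 2 and an input whose length is divisible by 2^levels; haar_dwt and haar_idwt are hypothetical names, not the author's implementation.

```python
import math

S = 1.0 / math.sqrt(2.0)  # 2^(-1/2) normalization from Equ. 1 and Equ. 2

def haar_dwt(x, levels):
    """Iterate Equ. 1 and Equ. 2; return (approximation X_J, [d_1, ..., d_J])."""
    if len(x) % (2 ** levels):
        raise ValueError("len(x) must be divisible by 2**levels")
    approx = list(x)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        # details are computed from the current approximation before it is replaced
        d = [S * (approx[2 * k] - approx[2 * k + 1]) for k in range(half)]
        approx = [S * (approx[2 * k] + approx[2 * k + 1]) for k in range(half)]
        details.append(d)  # details[j-1] holds d_{j,k}
    return approx, details

def haar_idwt(approx, details):
    """Undo the pyramid from the coarsest level back to X_0."""
    x = list(approx)
    for d in reversed(details):
        nxt = []
        for c, dk in zip(x, d):
            nxt.extend((S * (c + dk), S * (c - dk)))
        x = nxt
    return x

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_dwt(x, 3)
print([len(d) for d in details])  # [4, 2, 1]
rebuilt = haar_idwt(approx, details)
```

Each level halves the number of coefficients, so a series of 2^n samples yields details at up to n octaves, and the whole decomposition costs O(N) operations.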
11
Energy Function and Log-scale Diagram
Given a time series $\{X_{0,k},\ k = 0, 1, 2, \dots\}$ and its discrete wavelet coefficients $d_X(j,\cdot)$ (the details $d_{j,k}$), the average energy at resolution level j is defined as:
$$E_j = \frac{1}{n_j} \sum_{k} \lvert d_X(j,k) \rvert^2$$
where $n_j$ is the number of wavelet coefficients at level j
The log-scale diagram (LD) is the plot of $E_j$ as a function of the resolution level $2^j$ on a log-log scale, i.e.
$$y_j = \log_2(E_j) \quad \text{versus} \quad \log_2(2^j) = j$$
The LD plot allows scaling to be detected by observing strict alignment (a linear trend) within some range of octaves
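A minimal sketch of computing E_j and y_j, then reading a Hurst parameter off the slope. It assumes Haar details (Equ. 1 and Equ. 2) and the standard wavelet relation for LRD processes, y_j ≈ (2H − 1) j + const, which is not stated on the slide; the unweighted least-squares fit and the octave range j1..j2 are simplifying choices, and all function names are hypothetical.

```python
import math
import random

S = 1.0 / math.sqrt(2.0)

def haar_details(x, levels):
    """Detail coefficients d_{j,.} for j = 1..levels (Equ. 1 and Equ. 2)."""
    approx, out = list(x), []
    for _ in range(levels):
        half = len(approx) // 2
        out.append([S * (approx[2 * k] - approx[2 * k + 1]) for k in range(half)])
        approx = [S * (approx[2 * k] + approx[2 * k + 1]) for k in range(half)]
    return out

def log_scale_diagram(x, levels):
    """y_j = log2(E_j); math.log2 raises ValueError if some level has zero energy."""
    ys = []
    for d in haar_details(x, levels):
        energy = sum(v * v for v in d) / len(d)  # E_j = (1/n_j) * sum_k |d_{j,k}|^2
        ys.append(math.log2(energy))
    return ys

def hurst_from_ld(ys, j1, j2):
    """Least-squares slope of y_j over octaves j1..j2 (1-based); assumes slope = 2H - 1."""
    js = list(range(j1, j2 + 1))
    sel = [ys[j - 1] for j in js]
    jm = sum(js) / len(js)
    ym = sum(sel) / len(sel)
    slope = sum((j - jm) * (y - ym) for j, y in zip(js, sel)) / sum((j - jm) ** 2 for j in js)
    return (slope + 1.0) / 2.0

# Usage: uncorrelated noise should give a flat LD and H close to 0.5
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
ys = log_scale_diagram(noise, 10)
H = hurst_from_ld(ys, 1, 8)
print(round(H, 2))
```

Coarse octaves have few coefficients (only 16 at j = 10 here), so their y_j values are noisy; that is one reason the fit is restricted to an octave range where the LD is visibly aligned.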
12
Experimental Setup
SimpleScalar 3.0 sim-outorder simulator
Parameter               Configuration
Processor Width         8
ITLB                    128 entries, 4-way, 200-cycle miss
Branch Prediction       combined, 8K tables, 10-cycle misprediction, 2 predictions/cycle
BTB                     2K entries, 4-way
Return Address Stack    32 entries
L1 Instruction Cache    32KB, 2-way, 32 Byte/line, 2 ports, 4 MSHR, 1-cycle access
RUU Size                128 entries
Load/Store Queue        64 entries
Store Buffer            16 entries
Integer ALU             4 I-ALU, 2 I-MUL/DIV
FP ALU                  2 FP-ALU, 1 FP-MUL/DIV
DTLB                    256 entries, 4-way, 200-cycle miss
L1 Data Cache           64KB, 4-way, 64 Byte/line, 2 ports, 8 MSHR, 1-cycle access
L2 Cache                unified, 1MB, 4-way, 128 Byte/line, 12-cycle access
Memory Access           100 cycles
13
Experimental Setup (Contd.)
Program Traces
Benchmark   Input                  Duration (cycles)
mcf         /ref/inp.in            570,689,841,862
gcc         /ref/166.i             33,578,085,795
crafty      /ref/crafty.in         337,250,101,460
gzip        /ref/input.graphic     52,867,265,321
bzip2       /ref/input.source      70,644,828,028
eon         /ref/chair.cook.ppm    93,485,005,275
gap         /ref/ref.in            355,758,277,267
parser      /ref/ref.in            247,035,615,983
perlbmk     /ref/splitmail.pl      49,931,474,883
twolf       /ref/ref               274,987,890,000
vortex      /ref/lendian1.raw      93,677,830,341
vpr         /ref/net.in            122,267,820,515
14
The LD Plots of Benchmarks
[Figure: log-scale diagrams, y_j versus octave j, for gzip (octaves 2 to 18) and crafty (octaves 2 to 22)]
15
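On-line Program Scaling Estimation
Pyramid algorithm for DWT computation
[Figure: Mallat's pyramid. The N-sample input x(n) is passed through filter H and filter G, each followed by decimation, giving the approximation cx(1,·) and the detail dx(1,·); each approximation cx(j,·) is split the same way down to level J, yielding cx(J,·) and dx(J,·). Coefficients are indexed by scale j and time shift k]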
16
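On-line Program Scaling Estimation (Contd.)
High-pass and low-pass filters
[Figure: three-level filter bank. x(t) passes through high-pass filter G and low-pass filter H, each followed by downsampling by 2, giving dx(1,·) and cx(1,·); cx(1,·) and then cx(2,·) are split the same way, yielding dx(2,·), dx(3,·), and cx(3,·)]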
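The filter-bank view can be sketched as FIR filtering followed by downsampling by 2, recursing on the low-pass branch. This assumes Haar filters (the slide does not name the wavelet) and zero samples before the start of the input; H_LOW, G_HIGH, fir_decimate, and filter_bank are hypothetical names. G_HIGH is ordered so that convolution evaluated at odd n reproduces the sign of Equ. 2.

```python
import math

S = 1.0 / math.sqrt(2.0)
H_LOW = [S, S]     # assumed Haar low-pass filter H
G_HIGH = [-S, S]   # assumed Haar high-pass filter G (sign chosen to match Equ. 2)

def fir_decimate(x, h):
    """Convolve x with h (y[n] = sum_i h[i] * x[n - i]) and keep odd n: downsample by 2."""
    out = []
    for n in range(1, len(x), 2):  # n = 2k + 1 becomes output index k
        acc = 0.0
        for i, coef in enumerate(h):
            if n - i >= 0:  # samples before the start are treated as zero
                acc += coef * x[n - i]
        out.append(acc)
    return out

def filter_bank(x, levels):
    """Mallat pyramid: split with (G, H) and recurse on the low-pass branch."""
    details, approx = [], list(x)
    for _ in range(levels):
        details.append(fir_decimate(approx, G_HIGH))  # dx(j, .)
        approx = fir_decimate(approx, H_LOW)          # cx(j, .)
    return approx, details

x = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 0.0, 4.0]
cx, dx = filter_bank(x, 3)  # three levels, as in the figure: dx(1..3, .) and cx(3, .)
```

With length-2 Haar filters this gives exactly the same coefficients as applying Equ. 1 and Equ. 2 directly; longer wavelet filters would only change the coefficient lists, not the structure.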
17
On-line Program Scaling Estimation (Contd.)
FIR filter structure
[Figure: FIR filter as a tapped delay line. The input x(n) passes through a chain of delay registers D; the taps are weighted by the filter coefficients h(0), h(1), ..., h(N-1) through multipliers, and adders sum the products to form the output y(n)]
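A minimal software sketch of the tapped-delay-line structure, computing y(n) = h(0)x(n) + h(1)x(n−1) + ... + h(N−1)x(n−N+1) one sample at a time. The class name StreamingFIR and the example coefficients are hypothetical; a bounded deque plays the role of the delay registers D.

```python
from collections import deque

class StreamingFIR:
    """Tapped delay line: y(n) = sum_{i=0}^{N-1} h(i) * x(n - i)."""

    def __init__(self, coeffs):
        self.h = list(coeffs)  # filter coefficients h(0), ..., h(N-1)
        # delay registers holding x(n), x(n-1), ..., x(n-N+1); cleared to zero at start
        self.taps = deque([0.0] * len(self.h), maxlen=len(self.h))

    def step(self, sample):
        """Shift one new input into the delay line and return one output sample."""
        self.taps.appendleft(sample)  # the oldest value drops off the right end
        return sum(c * v for c, v in zip(self.h, self.taps))  # multipliers + adders

# Usage: a hypothetical 2-tap moving average
fir = StreamingFIR([0.5, 0.5])
outputs = [fir.step(v) for v in [2.0, 4.0, 6.0]]
print(outputs)  # [1.0, 3.0, 5.0]
```

Feeding a unit impulse returns the coefficients h(0), ..., h(N−1) in order, which is a quick way to check that the delay line is wired correctly.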
18
Program Scaling Estimation FrameworkProgram Scaling Estimation Framework
[Figure: on-line estimator. Clock-driven performance counters on the CPU supply the input sequence to filter banks (Mallat's pyramid); the resulting DWT coefficients feed the estimation of scaling properties (Hurst parameter), whose output drives feedback control]
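A minimal sketch of the on-line path in the framework (counter samples, then a streaming pyramid, then per-octave energy, then a Hurst estimate). It assumes Haar filters, one buffered sample per octave as the only state, and the LRD relation y_j ≈ (2H − 1) j + const; the class OnlineHurstEstimator and its methods are hypothetical, and the feedback-control stage is left out.

```python
import math
import random

class OnlineHurstEstimator:
    """Hypothetical on-line estimator: streaming Haar pyramid plus per-octave energy."""

    def __init__(self, levels):
        self.levels = levels
        self.pending = [None] * levels  # one buffered sample per octave
        self.sum_sq = [0.0] * levels    # running sum of d_{j,k}^2 per octave
        self.count = [0] * levels       # n_j: detail coefficients seen per octave

    def push(self, sample):
        """Feed one counter sample (e.g. IPC of one interval) into the pyramid."""
        s = 1.0 / math.sqrt(2.0)
        value = sample
        for j in range(self.levels):
            if self.pending[j] is None:
                self.pending[j] = value  # wait for the second sample of the pair
                return
            a, self.pending[j] = self.pending[j], None
            self.sum_sq[j] += (s * (a - value)) ** 2  # Equ. 2 detail energy
            self.count[j] += 1
            value = s * (a + value)                   # Equ. 1 approximation moves up
        # the approximation beyond the last octave is discarded in this sketch

    def hurst(self, j1, j2):
        """Fit y_j = log2(E_j) over octaves j1..j2 (1-based); assumes slope = 2H - 1."""
        js = [j for j in range(j1, j2 + 1)
              if self.count[j - 1] > 0 and self.sum_sq[j - 1] > 0]
        if len(js) < 2:
            return None  # not enough octaves filled yet
        ys = [math.log2(self.sum_sq[j - 1] / self.count[j - 1]) for j in js]
        jm, ym = sum(js) / len(js), sum(ys) / len(ys)
        slope = (sum((j - jm) * (y - ym) for j, y in zip(js, ys))
                 / sum((j - jm) ** 2 for j in js))
        return (slope + 1.0) / 2.0

# Usage: stream synthetic uncorrelated samples in place of real counter readings
rng = random.Random(7)
est = OnlineHurstEstimator(levels=10)
for _ in range(2 ** 14):
    est.push(rng.gauss(0.0, 1.0))
print(est.hurst(1, 8))  # near 0.5 for uncorrelated input
```

The estimator keeps O(levels) state and does O(1) amortized work per sample, which is what makes a hardware filter-bank implementation of this kind plausible; an estimate can be requested at any time as octaves fill up.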
19
Performance of On-line Estimator
Hurst parameter estimation
[Figure: estimated Hurst parameter (axis 0.5 to 1) versus estimation serial number (axis 0 to 14 × 10^4) for crafty]
20
Conclusions
As software execution runs grow longer, their changing behavior can span a wide range of time scales
Scaling properties provide a useful tool for unraveling program dynamics over different time periods