University of Michigan, Electrical Engineering and Computer Science
Uncovering Hidden Loop Level Parallelism in Sequential Applications
Hongtao Zhong, Mojtaba Mehrara, Steve Lieberman, Scott Mahlke
Advanced Computer Architecture Lab, University of Michigan

CMP Architectures
• Multiple cores on a chip
  – Higher throughput
  – Reduced complexity (per core)
  – More power/heat friendly
• Multithreaded applications
[Figure: Intel Core 2 Duo, AMD Quad-core (Barcelona), Sun Niagara 2]

How About Single Thread?
[Source: Bridges et al., MICRO '07]

Loop Level Parallelization
• DOALL loop: i = 0-39
  – Core 0: i = 0-19
  – Core 1: i = 20-39
• Speculative DOALL loop: i = 0-39, split into loop chunks
  – Core 0: i = 0-9, i = 20-29
  – Core 1: i = 10-19, i = 30-39
• Bad news: limited number of parallel loops in general purpose applications
  – 1.3x speedup for SpecINT2000 on 4 cores
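
The chunk-to-core mapping in the figure can be sketched in C. This is a minimal illustration, assuming chunks are dealt to cores round-robin as the figure suggests; `chunk_owner` and `iterations_on_core` are hypothetical helper names and are not part of the authors' framework.

```c
#include <assert.h>

/* Hypothetical helper: index of the core that runs iteration i when
 * iterations are grouped into chunks of chunk_size and the chunks are
 * handed to cores round-robin (an assumption made for this sketch). */
int chunk_owner(int i, int chunk_size, int ncores) {
    assert(i >= 0 && chunk_size > 0 && ncores > 0);
    return (i / chunk_size) % ncores;
}

/* Count how many of the iterations [0, n) land on the given core. */
int iterations_on_core(int n, int chunk_size, int ncores, int core) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (chunk_owner(i, chunk_size, ncores) == core)
            count++;
    }
    return count;
}
```

With 40 iterations and 2 cores, a chunk size of 20 reproduces the plain DOALL split (0-19 and 20-39), and a chunk size of 10 reproduces the interleaved loop chunks of the speculative version.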

Contributions
• Code generation framework
  – Speculative parallelization of uncounted loops
• Compiler transformations
  – Speculative loop fission
  – Isolation of infrequent dependences
  – Speculative prematerialization
[Figure: generated code template]
Initialization
Spawn
IS = ...; IE = ...;
XBEGIN
if (global_brk_flag) break;
for (i=IS; i<IE; i++) {
  ......
  if (brk_cond) { local_brk_flag = 1; break; }
}
perm = RECV(THREADj-1)
XCOMMIT
if (local_brk_flag) { global_brk_flag = 1; kill_other_threads; }
else if (IE < n) SEND(perm, THREADj+1)
Consolidation
Abort Handler

Target Architecture
[Figure: block diagram with Core 0, Core 1, Core 2, Core 3 and two L2 cache blocks]
• Scalar operand network
• Hardware transactional memory

Code Generation Framework
• Supports counted and uncounted loops
  – Software managed control speculation
• Iteration chunking
• Enforce transaction ordering
• Handles livein, liveout & accumulator registers
Original loop:
for (i=0; i<n; i++)
  // original loop code
Generated code:
Spawn
while (...) {
  IS += ...; IE += ...;
  XBEGIN
  if (globalBrk) break;
  for (i=IS; i<IE; i++) {
    // original loop code
    if (brkCond) { localBrk = 1; break; }
  }
  RECV(THREADj-1)
  XCOMMIT
  if (localBrk) { globalBrk = 1; abortOtherTXs; }
  SEND(THREADj+1)
}
Consolidation
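
The interplay of chunking, the break flags, and ordered commit can be illustrated with a small sequential simulation in C. This is only a sketch under stated assumptions: there is no real transactional memory or threading here. A chunk's private `ChunkResult` stands in for its transaction, and the in-order commit loop stands in for the RECV/XCOMMIT/SEND handshake. The example loop (sum elements until the first negative value) and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHUNKS 64

typedef struct {
    long local_sum;  /* accumulator private to the chunk */
    bool local_brk;  /* chunk reached the loop's break condition */
} ChunkResult;

/* Reference version: the original uncounted loop. */
long sum_until_negative_seq(const int *a, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] < 0) break;
        sum += a[i];
    }
    return sum;
}

/* Speculative body of one chunk covering iterations [is, ie). */
ChunkResult run_chunk(const int *a, int is, int ie) {
    ChunkResult r = {0, false};
    for (int i = is; i < ie; i++) {
        if (a[i] < 0) { r.local_brk = true; break; }
        r.local_sum += a[i];
    }
    return r;
}

long sum_until_negative_chunked(const int *a, int n, int cs) {
    assert(cs > 0 && n >= 0);
    int nchunks = (n + cs - 1) / cs;
    assert(nchunks <= MAX_CHUNKS);
    ChunkResult res[MAX_CHUNKS];

    /* Phase 1: every chunk runs speculatively, including chunks that
     * lie past the break point. */
    for (int c = 0; c < nchunks; c++) {
        int is = c * cs;
        int ie = (is + cs < n) ? is + cs : n;
        res[c] = run_chunk(a, is, ie);
    }

    /* Phase 2: commit in chunk order. Once a committed chunk reports a
     * break, the remaining speculative chunks are discarded. */
    long sum = 0;
    bool global_brk = false;
    for (int c = 0; c < nchunks && !global_brk; c++) {
        sum += res[c].local_sum;
        if (res[c].local_brk) global_brk = true;
    }
    return sum;
}
```

Because results are only merged at commit time and in chunk order, work done by chunks beyond the break never becomes visible, which is the property the ordered transactions provide in the generated code.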

DOALL Coverage – Provable and Profiled
[Figure: bar chart. Y-axis: fraction of sequential execution (0 to 1). X-axis: 052.alvinn, 056.ear, 171.swim, 172.mgrid, 177.mesa, 179.art, 183.equake, 188.ammp, 008.espresso, 023.eqntott, 026.compress, 072.sc, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 164.gzip, 175.vpr, 181.mcf, 197.parser, 256.bzip2, 300.twolf, cjpeg, djpeg, epic, g721decode, g721encode, gsmdecode, gsmencode, mpeg2dec, mpeg2enc, pegwitdec, pegwitenc, rawcaudio, rawdaudio, unepic, grep, lex, yacc, average; grouped as SPEC FP, SPEC INT, Mediabench, Utilities. Series: Provable DOALL, Profiled DOALL.]
• Still not good enough! Few dependences hinder parallelization in many loops
• Compiler can help:
  – Speculative fission
  – Isolation of infrequent paths
  – Speculative prematerialization

Speculative Loop Fission
Original loop:
1:  while (node) {
2:    work(node);
3:    node = node->next;
    }
Sequential loop after fission:
1:  while (node) {
4:    node_array[count++] = node;
3:    node = node->next;
    }
Parallel loop after fission:
    while (...) {
      XBEGIN
5:    node = node_array[IS];
      i = 0;
1': while (node && i++ < CS) {
2:      work(node);
3':     node = node->next;
      }
      RECV(THREADj-1)
      XCOMMIT
      if (node != node_array[IS+CS]) {
        update_node_array;
        kill_other_threads();
      }
      SEND(THREADj+1)
    }
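
The two loops produced by fission can be sketched in plain C. This is a sequential simulation, not the authors' generated code: the chunks run one after another, `work(node)` is replaced by adding `node->value` to a sum, and a failed validation simply returns `false` instead of updating `node_array` and killing other threads. `Node`, `record_nodes`, `work_chunk`, `sum_list_fissioned`, and `MAX_NODES` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 256

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Sequential loop: pointer chasing only, records every node. */
int record_nodes(Node *node, Node **node_array) {
    int count = 0;
    while (node) {
        assert(count < MAX_NODES);
        node_array[count++] = node;
        node = node->next;
    }
    return count;
}

/* Body of one chunk: process at most cs nodes starting at node and
 * return the node at which the chunk stopped. */
Node *work_chunk(Node *node, int cs, long *local_sum) {
    int i = 0;
    while (node && i++ < cs) {
        *local_sum += node->value;  /* stands in for work(node) */
        node = node->next;
    }
    return node;
}

/* Returns false if a chunk ends somewhere other than the recorded start
 * of the next chunk, i.e. the speculation that work() leaves the list
 * structure unchanged did not hold. */
bool sum_list_fissioned(Node *head, int cs, long *out) {
    assert(cs > 0);
    Node *node_array[MAX_NODES + 1];
    int count = record_nodes(head, node_array);
    node_array[count] = NULL;  /* sentinel for the final chunk */

    long total = 0;
    for (int is = 0; is < count; is += cs) {
        long local = 0;
        Node *end = work_chunk(node_array[is], cs, &local);
        int next = (is + cs < count) ? is + cs : count;
        if (end != node_array[next]) return false;
        total += local;
    }
    *out = total;
    return true;
}
```

The chunks in `sum_list_fissioned` only read their starting pointer from `node_array`, so they carry no dependence on each other's pointer chasing; the comparison after each chunk plays the role of the check that follows XCOMMIT on the slide.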

Infrequent Dependence Isolation
[Figure: control flow graph of a loop with blocks A, B, and C and branch probabilities 99% and 1%, shown before and after the transformation. The transformed graph adds a block C' and a break on the 1% path.]

Infrequent Dependence Isolation
Sample loop from yacc benchmark
Original loop:
for (j=0; j<=nstate; ++j) {
  if (tystate[j] == 0) continue;
  if (tystate[j] == best) continue;
  count = 0;
  cbest = tystate[j];
  for (k=j; k<=nstate; ++k)
    if (tystate[k]==cbest) ++count;
  if (count > times) {  // 1%
    best = cbest;
    times = count;
  }
}
After isolation:
j=0;
while (j<=nstate) {
  for ( ; j<=nstate; ++j) {
    if (tystate[j] == 0) continue;
    if (tystate[j] == best) continue;
    count = 0;
    cbest = tystate[j];
    for (k=j; k<=nstate; ++k)
      if (tystate[k]==cbest) ++count;
    if (count > times)  // 1%
      break;
  }
  if (count > times) {  // 1%
    best = cbest;
    times = count;
    j++;
  }
}
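
The two versions of the yacc loop can be written as C functions and compared on the same input. This is a hedged sketch: the initial values `best = 0`, `times = 0`, and `count = 0`, the `Mode` struct, and the function names are assumptions made so the example is self-contained; they are not taken from the yacc source.

```c
#include <assert.h>

typedef struct {
    int best;   /* most frequent nonzero state value found */
    int times;  /* occurrence count recorded for best */
} Mode;

/* Original loop: the rare update of best/times sits inside the loop
 * and creates a cross-iteration dependence. */
Mode most_frequent_original(const int *tystate, int nstate) {
    int best = 0, times = 0, count = 0, cbest = 0;
    for (int j = 0; j <= nstate; ++j) {
        if (tystate[j] == 0) continue;
        if (tystate[j] == best) continue;
        count = 0;
        cbest = tystate[j];
        for (int k = j; k <= nstate; ++k)
            if (tystate[k] == cbest) ++count;
        if (count > times) {
            best = cbest;
            times = count;
        }
    }
    Mode m = {best, times};
    return m;
}

/* Isolated version: the inner for loop no longer writes best or times.
 * It breaks out on the rare path, the update runs outside the loop,
 * and the outer while re-enters the inner loop at the next j. */
Mode most_frequent_isolated(const int *tystate, int nstate) {
    int best = 0, times = 0, count = 0, cbest = 0;
    int j = 0;
    while (j <= nstate) {
        for (; j <= nstate; ++j) {
            if (tystate[j] == 0) continue;
            if (tystate[j] == best) continue;
            count = 0;
            cbest = tystate[j];
            for (int k = j; k <= nstate; ++k)
                if (tystate[k] == cbest) ++count;
            if (count > times) break;
        }
        if (count > times) {
            best = cbest;
            times = count;
            j++;
        }
    }
    Mode m = {best, times};
    return m;
}
```

In the isolated version the inner `for` loop only reads `best` and `times`, so between two rare breaks its iterations are candidates for speculative DOALL execution, while the update itself runs sequentially.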

DOALL Coverage – Profiled and Transformed
[Figure: bar chart. Y-axis: fraction of sequential execution (0 to 1). X-axis: 052.alvinn, 056.ear, 171.swim, 172.mgrid, 177.mesa, 179.art, 183.equake, 188.ammp, 008.espresso, 023.eqntott, 026.compress, 072.sc, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 164.gzip, 175.vpr, 181.mcf, 197.parser, 256.bzip2, 300.twolf, cjpeg, djpeg, epic, g721decode, g721encode, gsmdecode, gsmencode, mpeg2dec, mpeg2enc, pegwitdec, pegwitenc, rawcaudio, rawdaudio, unepic, grep, lex, yacc, average; grouped as SPEC FP, SPEC INT, Mediabench, Utilities. Series: profiled + provable, transformations.]

Coverage Breakdown
[Figure: bar chart for SpecINT, MediaBench, and Utilities. Y-axis: fraction of sequential execution (ticks 0 to 70). Legend: DOALL loops; Control speculation for uncounted loops; Speculative fission; Speculative prematerialization; Infrequent dependence isolation; DOALL loops after transformations.]

Experimental Setup
• OpenIMPACT compiler
• Multicore simulator
  – Simulates up to 8 ARM9-like processors
  – Models scalar operand network
  – Assumes perfect memory system
  – Uses STM library to emulate HTM functionality

Speedup
[Figure: bar chart. Y-axis: speedup (1 to 5). X-axis: 052.alvinn, 056.ear, 171.swim, 172.mgrid, 177.mesa, 179.art, 183.equake, 188.ammp, 008.espresso, 023.eqntott, 026.compress, 072.sc, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 164.gzip, 175.vpr, 181.mcf, 197.parser, 256.bzip2, 300.twolf, cjpeg, djpeg, epic, g721decode, g721encode, gsmdecode, gsmencode, mpeg2dec, mpeg2enc, pegwitdec, pegwitenc, rawcaudio, rawdaudio, unepic, grep, lex, yacc, average; grouped as SPEC FP, SPEC INT, Mediabench, Utilities. Series: 2 core, 4 core, 8 core; with transformations and without transformations. Value labels printed on the chart: 7.89, 7.37, 7.87, 6.44.]
• 1.36x, 1.84x and 2.34x speedup on 2, 4, and 8 cores

Conclusion
• Figure out ways to use available resources for legacy applications
  – Codes like error handlers, linked list & tree traversal limit parallelism
• Compiler analysis and optimization look promising
• 1.84x speedup on 4 cores after transformations compared to 1.41x without them
Questions?
Thank you!

SpecDSWP vs. Speculative Fission
[Figure: two execution schedules on Core 0, Core 1, Core 2, and Core 3 for a loop with blocks A, B, and C over iterations 0 to 3 (A0-A3, B0-B3, C0-C3), one for SpecDSWP and one for speculative fission.]

Speculative Prematerialization
Original loop:
    for (...) {
1:    current = ...;
2:    work(last);
3:    last = current;
    }
After prematerialization:
    XBEGIN
1': current = ...
3': last = ...
    for (...) {
1:    current = ...;
2:    work(last);
3:    last = current;
    }
    XCOMMIT
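
The idea of recomputing `last` at the start of each chunk can be sketched in C. This is a sequential simulation under strong assumptions: `produce(i)` is a hypothetical stand-in for the elided `current = ...` and depends only on the iteration number, `work(last)` is modeled as adding `last` to an accumulator, and no transactions are used, so nothing here validates the speculation. All function names are illustrative.

```c
#include <assert.h>

/* Hypothetical stand-in for "current = ...". It depends only on the
 * iteration number i, which is an assumption of this sketch. */
int produce(int i) {
    return 3 * i + 1;
}

/* Original loop: last carries a value from one iteration to the next. */
long run_sequential(int n, int last0) {
    long acc = 0;
    int last = last0;
    for (int i = 0; i < n; i++) {
        int current = produce(i);
        acc += last;  /* stands in for work(last) */
        last = current;
    }
    return acc;
}

/* One chunk [is, ie): last is prematerialized by re-running the
 * producer of the previous iteration instead of waiting for it. */
long run_chunk_premat(int is, int ie, int last0) {
    int last = (is == 0) ? last0 : produce(is - 1);
    long acc = 0;
    for (int i = is; i < ie; i++) {
        int current = produce(i);
        acc += last;
        last = current;
    }
    return acc;
}

/* Run all chunks of size cs; each chunk is independent of the others. */
long run_prematerialized(int n, int cs, int last0) {
    assert(cs > 0 && n >= 0);
    long total = 0;
    for (int is = 0; is < n; is += cs) {
        int ie = (is + cs < n) ? is + cs : n;
        total += run_chunk_premat(is, ie, last0);
    }
    return total;
}
```

Each chunk pays for one extra call to `produce` and in exchange no longer needs a value from the preceding chunk. When the producer can depend on other state, the recomputed value is only a guess, which is why the slide wraps the chunk in XBEGIN and XCOMMIT.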