
Trace-Level Reuse

Date post: 06-Jan-2016
1999 International Conference on Parallel Processing (ICPP'99). Trace-Level Reuse. A. González, J. Tubella and C. Molina, Dpt. d'Arquitectura de Computadors, Universitat Politècnica de Catalunya.
Page 1: Trace-Level Reuse

Trace-Level Reuse

A. González, J. Tubella and C. Molina

Dpt. d'Arquitectura de Computadors

Universitat Politècnica de Catalunya

1999 International Conference on Parallel Processing (ICPP'99)

Page 2: Trace-Level Reuse

UPC, September 21, 1999, ICPP'99

Motivation

Increase performance by overcoming the dataflow limitation

– DATA SPECULATION: exploits the predictability of values
– DATA REUSE: exploits the redundancy of computations

Page 3: Trace-Level Reuse

Motivation

Redundant computations are rather frequent
– code: loops, recursive subroutines
– data: finite domain of values

The results could be reused instead of recomputed

[Figure: a dynamic execution stream in which OUT = f(IN) appears three times; the repeated instances are redundant computations]

Page 4: Trace-Level Reuse

Motivation

Reuse granularity
– an instruction
– a sequence of instructions: TRACE-LEVEL REUSE

Performance potential of data reuse
– at instruction level
– at trace level

Page 5: Trace-Level Reuse

Outline

Trace-level reuse

Performance potential

A first approach

Related work

Conclusions

Page 6: Trace-Level Reuse

Trace-Level Reuse

Trace: any dynamic sequence of instructions

Goal: avoid the execution of a trace by reusing its results, provided that the same trace with the same inputs has already been executed

Advantages
– Reduces the utilization of other machine resources
– Reduces the time to compute results
– Allows the processor to exceed the dataflow limit

Page 7: Trace-Level Reuse

Trace-Level Reuse

Hardware scheme

Main issues
– Reuse Trace Memory (RTM)
– Dynamic trace collection
– Reuse test
– State update

Page 8: Trace-Level Reuse

Reuse Trace Memory (RTM)

RTM stores candidate traces to be reused

RTM entry fields, in order:
– Initial address
– Trace input: input register identifiers and contents; input memory addresses and contents
– Trace output: output register identifiers and contents; output memory addresses and contents
– Next address

[Figure: a TRACE box with its INPUT on one side and its OUTPUT on the other]
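The entry layout listed on this slide can be pictured as a small record type. The following is only an illustrative sketch in Python, not the hardware organization from the talk: the names `RTMEntry`, `reg_inputs`, `mem_inputs`, `reg_outputs` and `mem_outputs`, as well as the example addresses and register names, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RTMEntry:
    """One candidate trace, mirroring the fields named on the slide."""
    initial_address: int   # PC of the first instruction of the trace
    next_address: int      # PC at which execution continues after the trace
    reg_inputs: Dict[str, int] = field(default_factory=dict)   # register id -> content
    mem_inputs: Dict[int, int] = field(default_factory=dict)   # memory address -> content
    reg_outputs: Dict[str, int] = field(default_factory=dict)  # register id -> content
    mem_outputs: Dict[int, int] = field(default_factory=dict)  # memory address -> content

    def input_count(self) -> int:
        """Number of input values (registers plus memory) stored in the entry."""
        return len(self.reg_inputs) + len(self.mem_inputs)

    def output_count(self) -> int:
        """Number of output values (registers plus memory) stored in the entry."""
        return len(self.reg_outputs) + len(self.mem_outputs)


# Hypothetical entry: a trace starting at 0x1000 that continues at 0x1018.
entry = RTMEntry(
    initial_address=0x1000,
    next_address=0x1018,
    reg_inputs={"r1": 5, "r2": 7},
    mem_inputs={0x8000: 3},
    reg_outputs={"r3": 12},
    mem_outputs={0x8008: 15},
)
```

Keeping identifiers and contents together in each map reflects the "identifiers & contents" and "addresses & contents" pairs of the slide.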

Page 9: Trace-Level Reuse

Dynamic trace collection

Chooses candidate traces
– Initial address
– Next address

Input and output trace locations are computed at execution time and stored along with their values in the RTM
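One way to compute the input and output locations while a trace executes is sketched below. It assumes, for illustration only, that each executed instruction is described by the locations it reads and writes together with their values. Under that assumption an input is any location read before the trace itself has written it, and an output is any location the trace writes. The function name `collect_trace` and the record format are hypothetical.

```python
def collect_trace(executed):
    """Return (inputs, outputs) for a trace.

    executed: list of (reads, writes) pairs, one per dynamic instruction,
    where reads and writes map a location (register name or memory
    address) to the value read or written.
    """
    inputs, outputs = {}, {}
    for reads, writes in executed:
        for loc, val in reads.items():
            # A location produced earlier in the trace is internal, not an input.
            if loc not in outputs and loc not in inputs:
                inputs[loc] = val
        for loc, val in writes.items():
            # Later writes overwrite earlier ones: only the final value is an output.
            outputs[loc] = val
    return inputs, outputs


trace = [
    ({"r1": 5, "r2": 7}, {"r3": 12}),    # r3 = r1 + r2
    ({"r3": 12, "r1": 5}, {"r4": 60}),   # r4 = r3 * r1
    ({"r4": 60}, {"r1": 61}),            # r1 = r4 + 1
]
trace_inputs, trace_outputs = collect_trace(trace)
```

In this example `r1` is both an input (value 5 on entry) and an output (value 61 on exit), while `r3` and `r4` are outputs only.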

Page 10: Trace-Level Reuse

Reuse Test & State Update

Reuse test
– At some points of the execution the reuse test is performed
– Checks whether a trace input, stored in the RTM, matches the current execution state

State update
– Writes output trace values to output trace locations

REUSE LATENCY = reuse test + state update
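The two steps on this slide can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware mechanism: the machine state is a plain dictionary from locations to values, the RTM is a dictionary from initial address to a list of `(inputs, outputs, next_address)` tuples, and the names `reuse_test` and `try_reuse` are made up for the example.

```python
def reuse_test(trace_inputs, state):
    """True if every stored input value matches the current execution state."""
    return all(state.get(loc) == val for loc, val in trace_inputs.items())


def try_reuse(pc, state, rtm):
    """Attempt to reuse a trace starting at pc.

    On a hit, applies the state update and returns the next address.
    On a miss, leaves the state untouched and returns None.
    """
    for inputs, outputs, next_address in rtm.get(pc, []):
        if reuse_test(inputs, state):
            state.update(outputs)   # state update: write outputs to their locations
            return next_address
    return None


rtm = {0x1000: [({"r1": 5, "r2": 7}, {"r3": 12, "r1": 61}, 0x100C)]}

hit_state = {"r1": 5, "r2": 7, "r3": 0}
hit_next = try_reuse(0x1000, hit_state, rtm)     # inputs match: trace is skipped

miss_state = {"r1": 6, "r2": 7, "r3": 0}
miss_next = try_reuse(0x1000, miss_state, rtm)   # r1 differs: trace must execute
```

On the hit, execution would continue at the returned next address with the trace outputs already in place, which is the sense in which the trace is not executed.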

Page 11: Trace-Level Reuse

Outline

Trace-level reuse

Performance Potential

A first approach

Related work

Conclusions

Page 12: Trace-Level Reuse

Performance Potential

Base-line machine
– ISA: Alpha
– Only constrained by
  – Data dependences
  – Data dependences + finite instruction window

Reuse engine
– Perfect trace reuse
  – Maximum-length traces
  – Minimum number of traces

Page 13: Trace-Level Reuse

Performance Potential

Instruction-level reuse (ILR)
– Perfect instruction reuse engine: all previously executed instances of each instruction are checked for a possible reuse
– Maximum reusability: almost 90%

[Figure: reusability (%), vertical axis from 0 to 100]
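The reusability measured with a perfect engine can be illustrated with a short sketch. It assumes a hypothetical trace format in which each dynamic instruction is a `(pc, input values)` pair, and it treats an instruction as reusable when the same pair has occurred before at any earlier point. The function name and the sample stream are invented for the example and do not reproduce the measurements in the talk.

```python
def instruction_reusability(stream):
    """Fraction of dynamic instructions whose (pc, inputs) pair was seen before."""
    seen = set()   # unbounded history: the "perfect" engine never forgets an instance
    reused = 0
    for pc, inputs in stream:
        key = (pc, tuple(inputs))
        if key in seen:
            reused += 1
        else:
            seen.add(key)
    return reused / len(stream) if stream else 0.0


stream = [
    (0x100, (1, 2)),
    (0x104, (3,)),
    (0x100, (1, 2)),   # same instruction, same inputs: reusable
    (0x100, (2, 2)),   # same instruction, new inputs: must execute
    (0x104, (3,)),     # reusable
]
ratio = instruction_reusability(stream)   # 2 of 5 instructions are reusable
```

A real engine would also have to treat memory contents as inputs of loads; the sketch leaves that to whoever builds the `inputs` tuples.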

Page 14: Trace-Level Reuse

ILR

Performance limits
– Base-line machine constrained by data dependences
– Reuse engine: 1-cycle latency

[Figure: speed-up, vertical axis from 0 to 4]

Page 15: Trace-Level Reuse

ILR

Performance limits
– Base-line machine constrained by
  – data dependences
  – data dependences and instruction window
– Reuse latency: 1 to 4 cycles

[Figure: speed-up (vertical axis from 1 to 1.5) for reuse latencies of 1, 2, 3 and 4 cycles, with an infinite instruction window (IW) and a 256-entry IW]

Page 16: Trace-Level Reuse

ILR

Performance limits: moderate potential with a perfect reuse engine
– Instruction latency is reduced
– The reuse of a chain of dependent instructions is still a sequential process: source operands must be ready
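A toy timing model makes the sequential-chain argument concrete. It is an assumption-laden sketch, not a result from the talk: it considers a chain in which every instruction depends on the previous one, charges each step either its execution latency or the instruction reuse latency, and compares that with reusing the whole chain as one trace in a single reuse operation. All parameter names and cycle counts are illustrative.

```python
def chain_cycles(length, exec_latency, ilr_latency, tlr_latency):
    """Cycles to produce the last result of a fully dependent chain."""
    return {
        # Normal execution: each instruction waits for its predecessor.
        "executed": length * exec_latency,
        # Instruction-level reuse: each reuse still needs its source operands,
        # so the chain is resolved one instruction after another.
        "ilr": length * ilr_latency,
        # Trace-level reuse: one reuse test plus state update for the whole chain.
        "tlr": tlr_latency,
    }


cycles = chain_cycles(length=6, exec_latency=2, ilr_latency=1, tlr_latency=3)
```

With these made-up numbers the chain takes 12 cycles when executed, 6 with instruction-level reuse and 3 with trace-level reuse, even though the single trace reuse is assumed to be slower than one instruction reuse.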

Page 17: Trace-Level Reuse

Performance Potential

Trace-level reuse (TLR)
– Perfect reuse engine
– Traces consist of maximum-length dynamic sequences of reusable instructions
  – Upper bound on the reusability
  – Lower bound on the number of traces

[Figure: instructions I1 to I6 grouped into a TRACE]
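Forming maximum-length traces amounts to splitting the dynamic stream into maximal runs of consecutive reusable instructions. The sketch below assumes the reusability of each dynamic instruction is already known as a list of booleans; the function name `maximal_traces` and the sample flags are hypothetical.

```python
from itertools import groupby


def maximal_traces(reusable):
    """Return (start index, length) for each maximal run of reusable instructions."""
    traces = []
    pos = 0
    for flag, run in groupby(reusable):
        n = len(list(run))
        if flag:
            traces.append((pos, n))
        pos += n
    return traces


flags = [True, True, False, True, True, True, False, False, True]
traces = maximal_traces(flags)
average_size = sum(n for _, n in traces) / len(traces)
```

Because no run can be extended without including a non-reusable instruction, this partition yields the fewest possible traces covering all reusable instructions.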

Page 18: Trace-Level Reuse

TLR

Average trace size: 15.0 instructions (FP: 11.7, INT: 20.3)

[Figure: trace size on a logarithmic vertical axis from 1 to 100; the values 203 and 116 are printed in the figure]

Page 19: Trace-Level Reuse

TLR

Performance limits
– Base-line machine constrained by data dependences and instruction window (256-entry)
– Reuse engine latency
  – Constant
  – Linear: f(#INPUTS + #OUTPUTS)

[Figure: speed-up (vertical axis from 0 to 4) for CONSTANT reuse latencies of 1, 2, 3 and 4 cycles and LINEAR reuse latencies of (I+O)/32, (I+O)/16, (I+O)/8, (I+O)/4, (I+O)/2 and I+O]
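The two latency models can be written as small functions of the number of trace inputs (I) and outputs (O). This is a sketch under assumptions: the slide only gives the expressions, so rounding up to whole cycles and the one-cycle minimum are choices made here, and the names `constant_latency`, `linear_latency` and `values_per_cycle` are hypothetical.

```python
import math


def constant_latency(cycles):
    """Reuse latency that does not depend on the trace."""
    return lambda n_inputs, n_outputs: cycles


def linear_latency(values_per_cycle):
    """Reuse latency proportional to I + O, e.g. (I+O)/8 for values_per_cycle=8."""
    def latency(n_inputs, n_outputs):
        # Assumption: partial cycles round up, and a reuse takes at least one cycle.
        return max(1, math.ceil((n_inputs + n_outputs) / values_per_cycle))
    return latency


# A hypothetical trace with 8 inputs and 4 outputs under each linear model.
divisors = [32, 16, 8, 4, 2, 1]
linear_cycles = [linear_latency(d)(8, 4) for d in divisors]
constant_cycles = [constant_latency(c)(8, 4) for c in (1, 2, 3, 4)]
```

For this trace the linear models give 1, 1, 2, 3, 6 and 12 cycles, which shows how quickly the cost grows as fewer values can be compared and written per cycle.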

Page 20: Trace-Level Reuse

Outline

Trace-level reuse

Performance potential

A first approach

Related work

Conclusions

Page 21: Trace-Level Reuse

A First Approach

Reuse Trace Memory (RTM)
– Indexed by trace initial address (4-way and 8-way)
– Maximum number of input and output values: 8 register values, 4 memory values
– Sizes
  – 512 entries (4 different entries per initial address)
  – 4K entries (8 entries per initial address)
  – 32K entries (16 entries per initial address)
  – 256K entries (16 entries per initial address)
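The sizes above imply a number of distinct index positions if "entries per initial address" is read as the associativity of a set, which is an interpretation rather than something the slide states. The sketch below works out that arithmetic and shows one possible index function; using the PC bits just above the 4-byte Alpha instruction offset is an assumption, and `rtm_sets` and `set_index` are hypothetical names.

```python
def rtm_sets(total_entries, entries_per_address):
    """Number of sets when each set holds entries_per_address entries."""
    if total_entries % entries_per_address != 0:
        raise ValueError("total entries must be a multiple of the set size")
    return total_entries // entries_per_address


def set_index(pc, num_sets):
    """Assumed index: drop the 2 offset bits of a 4-byte instruction, then wrap."""
    return (pc >> 2) % num_sets


configs = [(512, 4), (4 * 1024, 8), (32 * 1024, 16), (256 * 1024, 16)]
sets_per_config = [rtm_sets(total, ways) for total, ways in configs]
```

Under this reading the four configurations have 128, 512, 2048 and 16384 sets respectively.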

Page 22: Trace-Level Reuse

A First Approach

– In-order execution
– Reuse test performed for every fetch operation

[Figure: pipeline with Fetch, Decode, Execute and Commit stages, together with the PC, Instruction Cache, RTM, RTM entry and Reuse Test blocks]

Page 23: Trace-Level Reuse

A First Approach

Dynamic trace collection
– Built traces have all instructions reusable: an additional memory to check instruction reusability is needed
– Fixed-length traces starting at any address
– Trace expansion on reuse hit

Page 24: Trace-Level Reuse

Reusable Instructions

25% reusability for a 4K-entry RTM

[Figure: reusable instructions (%), vertical axis from 0 to 70, for the schemes ILR, ILR-E, I1-E, I2-E, I3-E, I4-E, I5-E, I6-E, I7-E and I8-E, with RTM sizes of 512, 4K, 32K and 256K entries]

Page 25: Trace-Level Reuse

Trace Size

6 instructions for a 4K-entry RTM

[Figure: trace size, vertical axis from 0 to 8, for the schemes ILR, ILR-E, I1-E, I2-E, I3-E, I4-E, I5-E, I6-E, I7-E and I8-E, with RTM sizes of 512, 4K, 32K and 256K entries]

Page 26: Trace-Level Reuse

Related Work

Data reuse
– Software implementation: Memoization [Richardson, 92]
– Hardware implementation: Tree Machine [Harbison, 82]

At instruction level
– Reuse Buffer [Sodani and Sohi, 97]
– Register renaming [Jourdan et al., 98]
– Redundant Computation Buffer [Molina, González and Tubella, 99]

At “trace” level
– Result cache [Richardson, 93] [Oberman and Flynn, 95]
– Basic block reuse [Huang and Lilja, 99]

Page 27: Trace-Level Reuse

Conclusions

Increasing the granularity of reuse from instructions to traces
– Less reusability
– More effective
  – Required fetch bandwidth is reduced
  – Effective instruction window size is increased
  – Number of operations per reused instruction is reduced
  – DATA DEPENDENCES ARE BROKEN

Page 28: Trace-Level Reuse

Conclusions

Concentrate effort on
– devising strategies to choose reusable traces
  – High-level structures
  – Compiler assistance
– reducing the reuse test overhead
  – Boolean test
  – Invalidate/validate RTM entries

