Eager Writeback — A Technique for Improving Bandwidth Utilization

Page 1: Eager Writeback  —  A Technique for Improving Bandwidth Utilization

Hsien-Hsin Lee, Gary Tyson, Matt Farrens

Intel Corporation, Santa Clara

University of Michigan, Ann Arbor

University of California, Davis

Page 2: Agenda

Hsien-Hsin Lee, MICRO-33

Introduction
Memory Type and Bandwidth Issues
Memory Reference Characterization
Eager Writeback
Experimental Results and Analysis
Conclusions

Page 3: Modern Multimedia Computing System

[Figure: system block diagram. Labels: the host processor (core processor, L2 cache, back-side bus), front-side bus, chipset, system memory (DRAM), I/O, A.G.P., graphics processing unit, local frame buffer, texture data. Traffic labels: command and texture traffic, commands, data]

Page 4: Memory Type Support

Page-based programmable memory types
  – Uncacheable (e.g. memory-mapped I/O)
  – Write-Combining (e.g. frame buffers)
  – Write-Protected (e.g. copy-on-write on fork)
  – Write-Through
  – Write-Back or Copy-Back

Page 5: Write-through vs. Writeback

[Figure: two diagrams, each showing CPU, L1$, and Main Memory. First diagram labels: allocate, writes, reads. Second diagram labels: allocate, writes, dirty writes, reads]
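
The difference between the two policies can be illustrated with a toy traffic count. This is a minimal sketch and not part of the slides: the function name `memory_writes`, the fully associative four-line cache, and the choice of no-write-allocate for write-through are all assumptions made for the example.

```python
from collections import OrderedDict


def memory_writes(trace, policy, num_lines=4):
    """Count writes sent to main memory by a tiny fully associative LRU cache.

    trace  : iterable of (op, line_addr), op is "R" or "W"
    policy : "WT" (write-through, allocate on reads only; an assumption)
             "WB" (write-back, allocate on reads and writes)
    Dirty lines still resident at the end of the trace are not counted.
    """
    cache = OrderedDict()  # line_addr -> dirty flag, ordered LRU to MRU
    writes = 0
    for op, addr in trace:
        hit = addr in cache
        if policy == "WT":
            if op == "W":
                writes += 1  # every store goes to memory
            if hit:
                cache.move_to_end(addr)
            elif op == "R":
                if len(cache) == num_lines:
                    cache.popitem(last=False)  # victim is always clean
                cache[addr] = False
        else:  # "WB"
            if hit:
                cache.move_to_end(addr)
            else:
                if len(cache) == num_lines:
                    _, dirty = cache.popitem(last=False)
                    writes += dirty  # only a dirty victim is cast out
                cache[addr] = False
            if op == "W":
                cache[addr] = True
    return writes


# Ten stores to one line, then four reads that push it out of the cache.
trace = [("W", 0)] * 10 + [("R", a) for a in range(1, 5)]
wt = memory_writes(trace, "WT")
wb = memory_writes(trace, "WB")
```

In this trace write-through sends every store to memory, while writeback sends one dirty line, but only at eviction time, which is when a demand fetch also needs the bus.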

Page 6: Potential WB Bandwidth Issues

Conflict on the bus while streaming data in
  – Incoming: demand fetches
  – Outgoing: dirty data
Dirty data
  – Can steal cycles amid successive data streaming
  – Delay of data delivery for critical path
  – Writeback (castout) buffer could be ineffective
How to alleviate the conflicts?
  – Try to find a balance between WT and WB

Page 7: Probability of Rewrites to Dirty Lines

[Figure: two bar charts, "L1 data cache" and "L2 cache". Y-axis: probability, 0 to 1. X-axis: LRU state (MRU, MRU - 1, LRU + 1, LRU). Series: Xlock-mount, POV-ray, xdoom, Xanim, Average]

4-way caches using x-benchmark [Austin 98]

Pr(R|D) = # re-dirty / # dirty lines entering a particular LRU state
MRU lines are much more likely to be written
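
One way to read the Pr(R|D) definition is sketched below for a single 4-way set. This is an illustration under stated assumptions, not the authors' measurement code: the function name `rewrite_probability`, the trace format, and the rule of counting at most one re-dirty per stay in an LRU state are choices made for the example.

```python
def rewrite_probability(trace, ways=4):
    """Estimate Pr(R|D) per LRU state for one set with true LRU.

    trace: iterable of (op, line_addr), op is "R" or "W".
    Returns a list indexed by LRU state (0 = MRU, ways - 1 = LRU);
    an entry is None when no dirty line ever entered that state.
    """
    stack = []              # stack[0] is MRU, stack[-1] is LRU
    entered = [0] * ways    # dirty lines that entered each state
    redirty = [0] * ways    # of those, lines written again in that state

    for op, addr in trace:
        pos = next((i for i, ln in enumerate(stack) if ln["addr"] == addr), None)
        if pos is None:                      # miss: allocate at MRU
            if len(stack) == ways:
                stack.pop()                  # evict the LRU line
            line = {"addr": addr, "dirty": False, "counted": False}
            shifted = len(stack)             # every resident line moves down
        else:                                # hit
            line = stack.pop(pos)
            if op == "W" and line["dirty"] and not line["counted"]:
                redirty[pos] += 1
                line["counted"] = True
            shifted = pos                    # lines above pos move down
        stays_dirty_mru = pos == 0 and line["dirty"]
        if op == "W":
            line["dirty"] = True
        stack.insert(0, line)
        if line["dirty"] and not stays_dirty_mru:
            entered[0] += 1                  # a dirty line enters the MRU state
            line["counted"] = False
        for i in range(1, shifted + 1):      # demoted lines enter a new state
            stack[i]["counted"] = False
            if stack[i]["dirty"]:
                entered[i] += 1
    return [r / e if e else None for r, e in zip(redirty, entered)]


# A line is written twice while MRU, then ages through every state unwritten.
demo = [("W", 0), ("W", 0), ("R", 1), ("R", 2), ("R", 3), ("R", 4)]
probs = rewrite_probability(demo)
```

A real characterization would run this over every set of the cache and a full benchmark trace; the single-set version only shows how the two counters in the ratio are maintained.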

Page 8: Normalized L1 Dirty Line States

Enter dirty: the first time a line is written
Re-dirty: writing to a dirty line

[Figure: stacked bar chart, normalized from 0.00 to 1.00, for xlock, pov-ray, xdoom, and xanim. Series: # enter dirty and # re-dirty for each LRU state (MRU, MRU-1, LRU+1, LRU)]

Page 9: Eager Writeback Trigger

Dirty lines enter LRU state!
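
The trigger can be sketched on a single cache set. This is a minimal model under assumptions, not the hardware described in the talk: the class `CacheSet`, its `eager` flag, and the rule that the trigger only fires once the set is full are all hypothetical.

```python
class CacheSet:
    """One set of a write-back cache with true LRU (hypothetical model)."""

    def __init__(self, ways=4, eager=True):
        self.ways = ways
        self.eager = eager
        self.lines = []              # lines[0] is MRU, lines[-1] is LRU
        self.eager_writebacks = []   # addresses written back early
        self.demand_writebacks = []  # dirty victims written at eviction time

    def access(self, addr, is_write):
        line = next((l for l in self.lines if l["addr"] == addr), None)
        if line is not None:
            self.lines.remove(line)          # hit: will be re-inserted as MRU
        else:
            if len(self.lines) == self.ways:
                victim = self.lines.pop()    # miss in a full set: evict LRU
                if victim["dirty"]:
                    self.demand_writebacks.append(victim["addr"])
            line = {"addr": addr, "dirty": False}
        line["dirty"] = line["dirty"] or is_write
        self.lines.insert(0, line)
        # Trigger: the set is full and the line now in the LRU state is dirty.
        lru = self.lines[-1]
        if self.eager and len(self.lines) == self.ways and lru["dirty"]:
            self.eager_writebacks.append(lru["addr"])
            lru["dirty"] = False             # the line stays cached, now clean


def run(eager):
    s = CacheSet(ways=2, eager=eager)
    s.access(0, True)    # line 0 becomes dirty
    s.access(1, False)   # line 0 drops to the LRU state
    s.access(2, False)   # line 0 is evicted
    return s


with_eager = run(True)
baseline = run(False)
```

Both runs write line 0 to memory exactly once. With the eager policy the write happens when the line reaches the LRU state rather than when the demand miss for line 2 needs the bus.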

Page 10: Eager Writeback Mechanism

[Figure: set-associative cache (set0, way0, LRU bits) with a writeback buffer and MSHR, each holding block address and data, connected to the next level cache/memory. Other labels: cache miss address, data return, data forward path. LRU bits shown: 01]

Page 11: Eager Writeback Mechanism

[Figure: next animation step of the Page 10 diagram. LRU bits shown: 00]

Page 12: Eager Writeback Mechanism

[Figure: next animation step of the Page 10 diagram. LRU bits shown: 00]

Page 13: Eager Writeback Mechanism

[Figure: next animation step of the Page 10 diagram. LRU bits shown: 00. An X mark appears in the diagram]

Page 14: Eager Writeback Mechanism

[Figure: the Page 10 diagram extended with an Eager Queue (EQ) that holds set IDs. LRU bits shown: 00. Annotation: trigger when entry freed]
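
The role of the Eager Queue can be sketched as follows. This is one interpretation of the diagram labels, not a description of the actual design: the class `EagerWritebackUnit`, the `lru_dirty` dictionary standing in for the cache arrays, and the behaviour when the queue itself is full are all assumptions.

```python
from collections import deque


class EagerWritebackUnit:
    """Writeback buffer plus eager queue of set IDs (hypothetical model)."""

    def __init__(self, lru_dirty, wb_entries=1, eq_entries=4):
        # lru_dirty maps set ID -> address of that set's dirty LRU line.
        # It is a stand-in for reading the tag array and LRU bits.
        self.lru_dirty = lru_dirty
        self.wb_entries = wb_entries
        self.eq_entries = eq_entries
        self.wb_buffer = deque()   # line addresses waiting for the bus
        self.eq = deque()          # set IDs waiting for a free buffer entry

    def trigger(self, set_id):
        """Called when a dirty line enters the LRU state in set_id."""
        addr = self.lru_dirty.get(set_id)
        if addr is None:
            return "clean"                     # nothing to write back any more
        if len(self.wb_buffer) < self.wb_entries:
            self.wb_buffer.append(addr)
            del self.lru_dirty[set_id]         # line is now clean in the cache
            return "buffered"
        if set_id in self.eq:
            return "queued"
        if len(self.eq) < self.eq_entries:
            self.eq.append(set_id)             # remember the set, not the data
            return "queued"
        return "dropped"                       # falls back to a normal castout

    def drain_one(self):
        """The bus takes one writeback; the freed entry re-triggers queued sets."""
        if not self.wb_buffer:
            return None
        sent = self.wb_buffer.popleft()
        while self.eq and len(self.wb_buffer) < self.wb_entries:
            self.trigger(self.eq.popleft())
        return sent


unit = EagerWritebackUnit({3: 0x100, 7: 0x200}, wb_entries=1, eq_entries=4)
first = unit.trigger(3)     # buffer has room
second = unit.trigger(7)    # buffer full, set ID goes to the EQ
sent = unit.drain_one()     # frees the entry and pulls set 7 from the EQ
```

Queuing only the set ID keeps the queue small, and re-reading the set when the entry is freed means a line that was rewritten or evicted in the meantime is simply skipped.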

Page 15: Simulation Framework

SimpleScalar suite
8-wide OOO superscalar machine
Enhanced memory subsystem modeling
Non-blocking caches (32KB L1 / 512KB L2)
  – Model MSHRs for all cache levels
  – Model WC memory type
2-level Gshare (10-bit) branch predictor
RDRAM model (single-channel)
Model limited bus bandwidth
  – Peak front-side bus bandwidth = 1.6 GB/s

Page 16: Simulation Framework

Parameter            Spec
Core frequency       1 GHz
BSB speed            500 MHz
FSB speed            200 MHz
LSQ size             32
RUU size             64
1st level caches     3 clks / 1 clk
Cache ports          2
2nd level cache      18 clks / 10 clks
TLBs                 2 clks / 1 clk
BSB arbitration      4 clks
FSB arbitration      10 clks
RDRAM Trcd           20 clks
RDRAM Tcac           20 clks
RDRAM Trp            20 clks
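
The quoted 1.6 GB/s peak front-side bus bandwidth is consistent with the 200 MHz FSB speed if the bus moves 8 bytes per bus clock. The slides do not state the bus width, so the 8-byte figure below is an assumption, as are the constant names.

```python
CORE_HZ = 1_000_000_000   # core frequency from the parameter list
FSB_HZ = 200_000_000      # FSB speed from the parameter list
FSB_BYTES_PER_CLK = 8     # assumption: 64-bit data bus, one transfer per clock

peak_fsb_bytes_per_s = FSB_HZ * FSB_BYTES_PER_CLK
core_clks_per_fsb_clk = CORE_HZ // FSB_HZ

print(peak_fsb_bytes_per_s / 1e9, "GB/s peak")
print(core_clks_per_fsb_clk, "core clocks per FSB clock")
```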

Page 17: Case Studies

3D Geometry Engine
  – A triangle-based rendering algorithm
  – Used in Microsoft Direct3D and SGI OpenGL
Streaming

[Figure: geometry engine data flow. Labels: 3D model, Xform, Light, Driver, Driver Buffer, Geom engine, To AGP memory]

Page 18: Bandwidth Shifting (Geometry Engine)

[Figure: two plots of bus bandwidth (0 to 1.6GB/s) over execution time. Baseline Writeback, annotated 0.6GB/s. Eager Writeback, annotated 0.4GB/s]

Page 19: Load Response Time

[Figure: load response time plot. Axis labels: Vertex ID, Execution time. Tick range: 0 to 1e+06. Series: Eager Writeback, Baseline Writeback. Annotation: e.g. 600kth load]

Page 20: Performance of Geometry Engine

Free writeback represents performance upper bound

[Figure: bar chart, y-axis 0.900 to 1.200. Groups: NL wb = 1, NL wb = 4, NL wb = 256, 3DL wb = 1, 3DL wb = 4, 3DL wb = 256. Series: Baseline, EQ = 0, EQ = 4, free dirty WB]

Page 21: Bandwidth Filling (Streaming)

[Figure: two plots of bus bandwidth (0 to 1.6GB/s) over execution time, one for Eager Writeback and one for Baseline Writeback]

Page 22: Performance of Streaming Benchmark

[Figure: bar chart, y-axis 0.90 to 1.15. Groups: Stream wb = 1, Stream wb = 4, Stream wb = 256. Series: Baseline, EQ = 0, EQ = 4, free dirty WB]

Page 23: Conclusions

Writebacks compete for bandwidth with demand misses
Demand data delivery can be deferred
LRU dirty lines are rarely promoted again
Eager writeback
  – Triggered by dirty lines entering LRU state
  – Additional programmable memory type
  – Shift writeback traffic
  – Effective for content-rich apps, e.g. 3D geometry
  – Can be extended for
      • Improving context switch penalty
      • Reducing coherency miss latencies for MP systems (similar technique: LTP [LaiFalsafi 00])

Page 24: Questions & Answers

"Bandwidth problems can be cured with money. Latency problems are harder because the speed of light is fixed: you cannot bribe God."

David Clark, MIT

Page 25

That's all, folks!!!

http://www.eecs.umich.edu/~linear

Page 26: Backup Foils

Page 27: Speedup with Traffic Injection

Imitating bandwidth stealing from other bus agents
Uniform memory traffic injection

[Figure: speedup bar chart, axis 0.90 to 1.20. Injection rates: 0B/s (no injection); 400MB/s (160B/400clks); 800MB/s (320B/400clks); 1.2GB/s (480B/400clks); 400MB/s (1280B/3200clks); 800MB/s (2560B/3200clks); 1.2GB/s (3840B/3200clks)]
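
The injection rates follow from the burst size and period if "clks" means 1 GHz core clocks. That reading is an assumption (it matches the core frequency in the simulation parameters and reproduces the quoted figures); the function name is hypothetical.

```python
CORE_HZ = 1_000_000_000  # assumption: "clks" are 1 GHz core clock cycles


def injected_bandwidth(burst_bytes, period_clks, clock_hz=CORE_HZ):
    """Average injected traffic in bytes per second."""
    return burst_bytes * clock_hz / period_clks


# (burst bytes, period in clocks) pairs from the chart labels
settings = [(160, 400), (320, 400), (480, 400),
            (1280, 3200), (2560, 3200), (3840, 3200)]
rates_mb_s = [injected_bandwidth(b, p) / 1e6 for b, p in settings]
```

The short-period and long-period settings give the same average rate; they differ only in how bursty the injected traffic is.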

Page 28: Injected Memory Traffic (0.8GB/s)

[Figure: two plots of bus bandwidth (0 to 1.6GB/s) over execution time, one with 320B/400 clks injection and one with 2560B/3200 clks injection]

