(C) 2002 Milo Martin HPCA, Feb. 2002 Bandwidth Adaptive Snooping Milo M.K. Martin, Daniel J. Sorin Mark D. Hill, and David A. Wood Wisconsin Multifacet Project Computer Sciences Department University of Wisconsin—Madison

(C) 2002 Milo Martin HPCA, Feb. 2002

Bandwidth Adaptive Snooping

Milo M.K. Martin, Daniel J. Sorin

Mark D. Hill, and David A. Wood

Wisconsin Multifacet Project

Computer Sciences Department

University of Wisconsin—Madison


Two classes of multiprocessors

• Snooping (SMP) multiprocessors
  – Broadcast-based → use more interconnect bandwidth
  + Directly locate owner → low-latency cache-to-cache transfers (36% to 91% of misses are cache-to-cache transfers in our commercial workloads)

• Directory-based multiprocessors
  + Indirection → bandwidth-efficient & scalable
  – Indirection → higher-latency cache-to-cache transfers

• Problem: the higher-performing approach varies with:
  – Configuration (e.g., number of processors)
  – Workload (e.g., cache miss rate)


Which approach is best?

[Figure not recovered: micro-benchmark, 64 processors]


Bandwidth Adaptive Snooping Hybrid (BASH)

• Goals
  – Best performance aspects of both approaches
    • High performance for many configurations & workloads
    • Future workload properties unknown at design time
  – Single design
    • Coherence logic integrated with processors
    • One part for many systems

• Hybrid protocol
  – Snooping-like broadcast requests
  – Directory-like “unicast” requests

• Bandwidth adaptive
  – Estimate available bandwidth
  – Adjust rate of broadcast based on estimate


Best of both protocols

[Figure not recovered: micro-benchmark, 64 processors]


Outline

• Overview
• Bandwidth adaptive mechanism
• Hybrid protocol
• Evaluation
• Conclusions


System model

• Ordered interconnect
• Processor/Memory nodes
  – Directory state
  – Adaptive mechanism

[Diagram: processor/memory nodes (P, $, M) attached to an ordered interconnect. Components labeled inside a node: Bandwidth Adaptive Mechanism, Network Interface, Caches, Processor, Memory, Directory, Controller.]


Bandwidth adaptive mechanism

• Choose broadcast or unicast for each miss

• Goal: minimize latency - avoid extreme queuing delay

• Approach: limit average interconnect utilization
  – Contention dominates miss latency at high utilizations
  – Interconnect utilization goal (e.g., 75%)
  – Adjust rate of broadcast
  – Feedback control system


Implementation

• Two counters at each processor
  – Utilization counter (Above or below utilization threshold?)
  – Policy counter (Probability of broadcast?)

• At each processor
  – Each cycle: Monitor local link & adjust utilization counter
  – Each sampling interval: Adjust policy counter based on utilization counter
  – Each miss: Compare policy counter with a random number

• Why random?
  – Steady state of mixed broadcasts and unicasts
  – Enables us to avoid oscillation
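The two-counter scheme can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' hardware design: the class name, the sampling interval, the policy counter range, the +1/-3 counter weights (chosen so that the counter is positive exactly when utilization exceeds a 75% goal), and the step size of one per interval are all assumptions.

```python
import random


class BandwidthAdaptiveNode:
    """Hypothetical per-processor model of the utilization and policy counters."""

    def __init__(self, sampling_interval=512, policy_max=255, seed=None):
        self.interval = sampling_interval   # cycles per sample (assumed value)
        self.policy_max = policy_max        # policy counter range (assumed value)
        self.policy = policy_max            # start by always broadcasting (assumption)
        self.utilization = 0                # signed utilization counter
        self.cycles = 0
        self.rng = random.Random(seed)

    def tick(self, link_busy):
        """Call once per cycle with the state of the local link."""
        # +1 for a busy cycle, -3 for an idle one: the sum is positive
        # exactly when more than 75% of the sampled cycles were busy.
        self.utilization += 1 if link_busy else -3
        self.cycles += 1
        if self.cycles == self.interval:
            if self.utilization > 0:        # above the goal: broadcast less often
                self.policy = max(0, self.policy - 1)
            elif self.utilization < 0:      # below the goal: broadcast more often
                self.policy = min(self.policy_max, self.policy + 1)
            self.utilization = 0
            self.cycles = 0

    def should_broadcast(self):
        """Call on each miss: broadcast with probability policy / policy_max."""
        return self.rng.randrange(self.policy_max) < self.policy
```

Because each miss is decided by a random draw against the policy counter, a node settles into a mix of broadcasts and unicasts instead of flipping between all-broadcast and all-unicast.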


Outline

• Overview
• Bandwidth adaptive mechanism
• Hybrid protocol
  – Snooping-like operation
  – Directory-like operation
  – Complexity & Scalability
• Evaluation
• Conclusions


Snooping-like operation

• Ordered broadcast
• Marker places request in total order

[Diagram: arrows labeled request, marker, Data. Nodes: P0 (Requestor), M0 (Home), and P1 to P3 in states Owner, Shared, and Invalid. Directory state at M0: Owner: P1.]

Low latency cache-to-cache, but requires broadcast


Directory-like operation

• Add indirection
• Uses order to avoid acks
• Similar to Alpha GS320

[Diagram: arrows labeled request, marker, re-request, marker, Data. Nodes: P0 (Requestor), M0 (Home), and P1 to P3 in states Owner, Shared, and Invalid. Directory state at M0: Owner: P1, Sharers: {P2}.]

Avoids broadcast, but frequently adds indirection
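A rough way to see the trade-off between the snooping-like and directory-like paths is to count request deliveries and critical-path hops for a read miss that another cache has to satisfy. The sketch below is a back-of-the-envelope model, not a description of the BASH protocol itself: it assumes one delivery per destination node, ignores marker and data messages, and assumes the home node is neither the requestor nor the owner. The function name is hypothetical.

```python
def cache_to_cache_cost(num_nodes, broadcast):
    """Return (request_deliveries, critical_path_hops) for one read miss.

    Simplified model (assumptions): the block is owned by a cache on a
    third node, and only request traffic is counted.
    """
    if broadcast:
        # Snooping-like: the request reaches every other node, the owner
        # sees it directly and replies (request hop + data hop).
        return num_nodes - 1, 2
    # Directory-like: unicast to the home, the home re-requests from the
    # owner, and the owner replies (request + re-request + data hops).
    return 2, 3
```

For a 64-node system this model gives 63 request deliveries and 2 hops for a broadcast, versus 2 request deliveries and 3 hops for a unicast, which is the bandwidth-versus-latency tension the adaptive mechanism tries to balance.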


Protocol races

• Choose broadcast or unicast for each miss

• Protocol simultaneously allows
  – Broadcast requests
  – Unicast requests
  – Forwarded requests
  – Writebacks

• Like all protocols, BASH has protocol races


Protocol race example

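[Diagram: two requestors, P0 and P3, race for the same block. Arrows labeled Broadcast, Unicast, re-request, Data. Other nodes: P1 and P2 in states Owner and Shared, and M0 (Home). Directory state at M0: Owner: P1, Sharers: {P2}.]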


Protocol race example

[Diagram: arrows labeled Unicast, re-request. Nodes: P0 (Requestor), P1 (Invalid), P2 (Invalid), P3 (Modified), M0 (Home). Directory state at M0: Owner: P3, Sharers: Ø.]


Protocol race example

[Diagram: arrows labeled Unicast, re-request, Data, 2nd re-request. Nodes: P0 (Requestor), P1 (Invalid), P2 (Invalid), P3 (Modified), M0 (Home). Directory state at M0: Owner: P3, Sharers: Ø.]


Protocol races

• Race detection: directory audits all requests
  – Observes all requests
  – Compares request destination set with current sharers
  – Occasionally needs to re-issue a request

• Requests are processed uniformly
  – Processors: respond with data or invalidate
  – Directory: audit request, may forward data or request

See paper for more information
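The audit step, where the directory compares a request's destination set with the current sharers, can be pictured as a set comparison. The sketch below is an assumption-laden illustration, not the rule from the paper: it assumes a request is sufficient when it reaches the owner and, for writes, every sharer. The `DirectoryEntry` type and `audit` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    owner: str                                   # e.g. "P1", or "M0" if memory owns the block
    sharers: set = field(default_factory=set)


def audit(entry, requester, destinations, is_write):
    """Return the nodes the request should have reached but did not.

    An empty result means the request was sufficient; otherwise the
    directory would forward (re-request) to the returned nodes.
    """
    needed = {entry.owner}
    if is_write:
        needed |= entry.sharers                  # writes must invalidate sharers (assumed rule)
    needed.discard(requester)                    # the requester never needs its own request
    return needed - set(destinations)
```

With directory state Owner: P1, Sharers: {P2}, a unicast read from P0 that reaches only M0 yields {"P1"}, so the directory forwards it; a broadcast that reaches every node yields the empty set and needs no re-request.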


Complexity

• One “cost” of implementing BASH

• Quantifying complexity is difficult…
  – Protocol controllers are finite state machines
  – Similar number of states
  – BASH has twice as many events and transitions

• Moderate complexity
  – Additive, not multiplicative

• Similar to Multicast Snooping
  – Original proposal [Bilir et al., ISCA 1999]
  – Enhanced, specified & verified [Sorin et al., TPDS 2002]


Scalability

• Limited by ordered interconnect
  – BASH eliminates broadcast-only nature of snooping

• Recent systems with an ordered interconnect
  – Compaq AlphaServer GS320 (32 processors) - directory
  – Sun UE15000 (106 processors) - snooping
  – Fujitsu PrimePower 2000 (128 processors) - snooping

• Potential alternative
  – Timestamp Snooping network [Martin et al., ASPLOS 2000]


Outline

• Overview
• Bandwidth adaptive mechanism
• Hybrid protocol
• Evaluation
• Conclusions


Workloads & methods

• Workloads [CAECW ‘02]
  – OLTP: IBM’s DB2 & TPCC-like (1GB database)
  – Static web: Apache
  – Dynamic web: SlashCode
  – Java middleware: SpecJBB
  – Scientific workload: Barnes-Hut

• Set up and tuned for 16 processors

• Full system simulation
  – Virtutech’s Simics
  – Solaris 8 on SPARC V9
  – Blocking processor model

• Memory system simulator
  – Captures timing, races, and all transient states


Three Questions

1) Is our adaptive mechanism effective?

2) Does BASH adapt to multiple workloads?

3) Does BASH adapt to multiple configurations?


(1) SpecJBB on 16 processors


(1) SpecJBB on 16 processors, 4x broadcast cost


(1) SpecJBB on 16 processors, 4x broadcast cost


(2) Can BASH adapt to multiple workloads?

[Figure not recovered: 1600 MB/s links; annotations: Similar, Snooping, Directory]


(2) Can BASH adapt to multiple workloads?

[Figure not recovered: 1600 MB/s links]


(3) Can BASH adapt to multiple configurations?

[Figure not recovered: micro-benchmark, 1600 MB/s links]


(3) Can BASH adapt to multiple configurations?

[Figure not recovered: micro-benchmark, 1600 MB/s links]


Results Summary

1) Is our adaptive mechanism effective?
• Yes

2) Does BASH adapt to multiple workloads?
• Yes

3) Does BASH adapt to multiple configurations?
• Yes


Conclusions

• Bandwidth Adaptive Snooping Hybrid (BASH)
  – Hybrid of snooping and directories
  – Simple bandwidth adaptive mechanism

• Adapts to various workloads & system configurations
  – Robust performance
  – Outperforms base protocols in some cases

• Future directions
  – Focus bandwidth on likely cache-to-cache transfers
  – Explore multicasts
  – Power-adaptive coherence


Queuing model motivation

• A multiprocessor as a simple queuing model
  – Exponential service & think time distributions

[Diagram: “processors” send requests to the “interconnect” and receive responses. Plot annotation: Knee.]
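One textbook way to reproduce a knee of this kind is a closed queuing network: each processor thinks for an exponentially distributed time, then queues at a single interconnect server with exponentially distributed service time. The sketch below applies exact mean value analysis (MVA) to that model. It is offered as an illustration only; the slides do not say which analysis the authors used, and the function name and any parameter values passed to it are assumptions.

```python
def closed_queue_mva(num_processors, think_time, service_time):
    """Exact MVA for one queueing server plus a think-time delay station.

    Returns (throughput, response_time, utilization) of the server.
    """
    queue_len = 0.0
    throughput = 0.0
    response = 0.0
    for n in range(1, num_processors + 1):
        # An arriving request sees the queue of a network with n - 1 customers.
        response = service_time * (1.0 + queue_len)
        throughput = n / (think_time + response)
        queue_len = throughput * response        # Little's law
    return throughput, response, throughput * service_time
```

Sweeping `service_time` upward (as if every miss used more interconnect bandwidth) keeps the response time nearly flat at first and then makes it climb steeply as utilization approaches 1, which is the behavior that motivates capping average utilization.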

