Page 1

DoE/DoD Workshop, Nov. 29 2007

The Shape of Things to Come: Future Trends in HPC Architectures

Peter M. Kogge

Associate Dean for Research

McCourtney Prof. of CS & Engr

University of Notre Dame

IBM Fellow (retired)

Page 2

• Today: “Killer Micros” are becoming “physics-limited,” very hungry multi-core monsters

• Maturing Multi-threading & Tiling providing more nimble systems

• Is there an alternative evolutionary path we’ve ignored?

My View: Future HPC Evolutionary Paths Are Multiplying

Page 3

My Concern: We’re Focused on the Wrong Aspect of the Wall

What about bandwidth?

[Chart: Today vs. Future Trend / Memory Wall. Application: Trilinos. Chart courtesy Richard Murphy, SNL.]

7% Performance Difference

NOTE: ACCOUNTS ONLY FOR COMPUTATION (NOT MPI)!

Page 4

And Perhaps Missing Another Wall

Does Supplying Energy and Getting Rid of Heat Dominate Area?

Page 5

It Also Bothers Me That:

• Modern microprocessor state growing as Moore’s Law
– Regardless of the number of computational units

• Memory is as dumb as it was 50 years ago

• We insist on giving persistent names to the tarballs representing the physical cores

• And go to great extremes to separate the persistent names of memory from its location

• Newer classes of apps “visit” data irregularly
– Where “caching” copies is wasted energy

Page 6

The Way We Were

Page 7

The Historical Top 10

[Chart: GFlops (log scale, 1.E-01 to 1.E+10) vs. date, 1/1/72 to 1/1/20. Series: Historical Rmax, Rmax, Rmax Leading Edge, Rpeak Leading Edge.]

CAGR = 1.9
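As a rough illustration of what the trend line implies, the sketch below treats "CAGR = 1.9" as a 1.9x multiplier per year; that reading is an assumption, since the slide does not define the term. The function names are hypothetical.

```python
import math


def projected_gflops(start_gflops: float, annual_factor: float, years: float) -> float:
    """Compound growth: value after `years` at a fixed annual multiplier."""
    return start_gflops * annual_factor ** years


def years_to_grow(target_factor: float, annual_factor: float) -> float:
    """Years needed to grow by `target_factor` at a fixed annual multiplier."""
    return math.log(target_factor) / math.log(annual_factor)


if __name__ == "__main__":
    # Assumed reading of the slide: performance multiplies by 1.9 each year.
    print(projected_gflops(1.0, 1.9, 2))   # two years of growth from 1 GFlops
    print(years_to_grow(1000.0, 1.9))      # time for a 1000x improvement
```

Under this reading, a thousandfold gain takes between ten and eleven years.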

Page 8

Clock Rates

[Chart: Clock (MHz, log scale, 10 to 100,000) vs. year, 1975 to 2020. Series: Historical, ITRS Max Clock Rate (12 inverters).]

[Chart: Clock (GHz, log scale, 0.01 to 10.00) vs. date, 1/1/93 to 12/29/06. Series: Top 10, Top System, Microprocessors.]

Page 9

Processor Parallelism

[Chart: Processor Parallelism (log scale, 1.E+00 to 1.E+06) vs. date, 1/1/93 to 12/29/06. Series: Top 10, Top System.]

Page 10

Concurrency: Flops per Cycle

[Chart: Total Concurrency (log scale, 1.E+00 to 1.E+07) vs. date, 1/1/93 to 12/29/06. Series: Top 10, Top System, Top 1 Trend.]

CAGR 1.65
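The slide title defines concurrency as flops per cycle, which can be sketched as sustained flops divided by clock rate. The function name and the example machine below are hypothetical and are not taken from the chart.

```python
def total_concurrency(flops_per_second: float, clock_hz: float) -> float:
    """Flops that must complete every clock cycle to sustain the given rate."""
    return flops_per_second / clock_hz


if __name__ == "__main__":
    # Hypothetical system: 100 TFlop/s sustained at a 1 GHz clock.
    print(total_concurrency(1e14, 1e9))
```

Because clock rates have flattened, any further growth in flops has to come from this concurrency term.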

Page 11

The Moore’s Law We Know & Love

• Goal: 4X Functionality every 3 years

• Underlying technology improvement:
– Growth in transistor density [Yes]
– Growth in transistor switching speed [Yes]
– Growth in size of producible die [Not in commercial volumes]

• Microprocessors: Functionality = IPS
– ~1/2 from higher clock rate [No: heat]
– ~1/2 from more complex microarchitectures [No: complexity]

• Memory: Functionality = Storage capacity
– ~2X from smaller transistors [Yes]
– Shrinkage in architecture of basic bit cell [Yes, but ..]
– Increase in die size [Not at commercially viable prices]

And it is silent on inter-chip I/O

[Title overlay: “Knew”]

Page 12

The Darwinian Multi-Core Evolution

[Figure panels: Up to ~2002; Now]

Page 13

Area Scaling Alone Reveals the Rationale for Multi-Core

[Chart: Single Core Projected Die Size (mm2, log scale, 1 to 1000) vs. year, 1970 to 2020. Reference line: ITRS Projected Economic Die Size.]

Each line represents the scaling of a unique real microprocessor chip from its inception

Page 14

How Many Can We Fit on a cm2?

Assume we scale entire current single core chip & replicate to fill 280 sq mm die

[Chart: Number of uP per Square Centimeter (log scale, 1 to 1000) vs. year, 1970 to 2020.]

Answer: Potentially 1000’s!!!!
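The scale-and-replicate assumption can be sketched as follows, assuming ideal scaling in which core area shrinks with the square of the feature size. The 280 sq mm die comes from the slide; the core area and feature sizes in the example are hypothetical.

```python
def scaled_core_area_mm2(core_area_mm2: float, old_feature_nm: float,
                         new_feature_nm: float) -> float:
    """Core area after an ideal shrink: area scales with feature size squared."""
    return core_area_mm2 * (new_feature_nm / old_feature_nm) ** 2


def cores_per_die(core_area_mm2: float, old_feature_nm: float,
                  new_feature_nm: float, die_area_mm2: float = 280.0) -> int:
    """Whole copies of the shrunken core that fit on the die (no overhead)."""
    shrunk = scaled_core_area_mm2(core_area_mm2, old_feature_nm, new_feature_nm)
    return int(die_area_mm2 // shrunk)


if __name__ == "__main__":
    # Hypothetical: a 100 sq mm single-core chip designed at 90 nm, rebuilt at 45 nm.
    print(cores_per_die(100.0, 90.0, 45.0))
```

The sketch ignores interconnect, I/O, and shared cache, so it gives an upper bound rather than a design point.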

Page 15

And a Flood Tide of Recent Announcements

[Chart: # of New Multi-core Announcements (0 to 30) vs. year, 1992 to 2006.]

They Are All Multi Core Now

Page 16

And Not Just “Twosies”

[Chart: # of Cores/Die in New Announcements (log scale, 1 to 10000) vs. year, 1991 to 2006.]

Page 17

The Classical Limiting Factors for Microprocessor Chips:

Power & Contacts

Page 18

Peak Logic Clock Rates

[Chart: Clock (MHz, log scale, 10 to 100,000) vs. year, 1975 to 2020. Series: Historical, ITRS Max Clock Rate (12 inverters). Annotations: Classical Moore’s Law; 3 GHz.]

[Chart: Clock (MHz, log scale, 10 to 100000) vs. Feature Size (10 to 10000). Series: Historical, ITRS Max. Annotations: Classical Moore’s Law; 3 GHz.]

2005 projection was for 5.2 GHz, and we didn’t make it in production. Further, we’re still stuck at 3+ GHz in production.

Page 19

Why the Clock Flattening? POWER

[Chart: Watts per Die (log scale, 1 to 1000) vs. year, 1976 to 2006.]

[Chart: Watts per Square cm (log scale, 0.1 to 1000) vs. year, 1976 to 2006. Reference levels: Light Bulb, Iron, Rocket Nozzle.]

Hot, Hot, Hot!

Page 20

Because Vdd No Longer Declining

[Chart: Vdd (0 to 6) vs. year, 1970 to 2020.]

Page 21

Multi-core Power and Clock

Chip Power = (Cap/device) * (#_devices/core) * (cores/chip) * Clock * Voltage^2

• Chip Power: Max Limit Will Grow only Slightly
• Voltage: Reaching an Asymptotic Limit
• #_devices/core: Assume Constant for Multicore
• Cap/device: Decreasing ~linearly with Technology
• cores/chip: Increasing As Square with Technology
• Clock: Max Clock Rate Grows Rapidly with Technology, But ONLY KNOB to Balance Equation!!

Page 22

Rewriting for Clock

Clock = (Max_chip_power(T) * Reduction_in_core_area) / (Cap_per_device * V^2)

This now governs Core Frequency. Not Faster Transistors!!!
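A minimal sketch of the rearrangement, starting from the chip power product on the "Multi-core Power and Clock" slide and solving for clock. The function names and all numeric values are hypothetical and chosen only to show the shape of the relation.

```python
def chip_power_watts(cap_per_device_f: float, devices_per_core: float,
                     cores: int, clock_hz: float, vdd: float) -> float:
    """Chip Power = Cap/device * #_devices/core * cores/chip * Clock * Voltage^2."""
    return cap_per_device_f * devices_per_core * cores * clock_hz * vdd ** 2


def power_limited_clock_hz(max_power_w: float, cap_per_device_f: float,
                           devices_per_core: float, cores: int, vdd: float) -> float:
    """Solve the power product for Clock, holding chip power at its maximum."""
    return max_power_w / (cap_per_device_f * devices_per_core * cores * vdd ** 2)


if __name__ == "__main__":
    # Hypothetical values: 100 W budget, 1 fF per device, 1e8 devices per core, 1.0 V.
    for cores in (1, 2, 4, 8):
        f = power_limited_clock_hz(100.0, 1e-15, 1e8, cores, 1.0)
        print(cores, f)
```

With the power budget and voltage held fixed, doubling the core count halves the affordable clock unless capacitance per device falls to compensate.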

Page 23

Relative Change In Factors

[Chart: factors Relative to 2004 (0.00 to 1.40) vs. year, 2004 to 2020. Series: Max Power (N), Area (N), Cap per Device (D), Vdd (D), Power Limited Clock.]

Page 24

What Kind of Core Should We Replicate?

[Chart: Relative IPS vs. Relative Clock (1.00 to 4.00) and Issue Width]

Complex designs give most performance

[Chart: Relative Area vs. Relative Clock (1.00 to 4.00) and Issue Width]

But also largest area

[Chart: IPS per Unit Area vs. Relative Clock (1.00 to 4.00) and Issue Width]

But simpler gives better performance/area

Simpler is Better

Page 25

What About Memory Bus Clocks?

[Chart: MHz (log scale, 10 to 100,000) vs. year, 1990 to 2020. Series: Intel Bus Speed, Intel CPU Clock, ITRS: Max On-Chip Clock, ITRS: Max Off-Chip Clock, Constant Dissipation Clock, "0.3 of Power Limited Clock". Curve labels: Historical Intel CPU Clock, ITRS Projected CPU Clock, Clock for Constant Power Density, Historical Intel Memory Bus Rate, Assumed Projected Memory Rate.]

Page 26

Does Logic Performance Match Off-chip Bandwidth Potential?

[Chart: Growth Factor over 2004 (log scale, 1 to 1,000) vs. year, 2004 to 2020. Series: ITRS Signal Pads per Hi Perf uP, Ball Bond Contacts per sq. cm, Signal Pads * Modified Off Chip Clock, Transistor Density * Power Limited Clock. Curve labels: ITRS Ball Bond Growth Rate, ITRS Hi Perf uP Signal Pad Growth Rate.]

A Growing Mismatch!

Page 27

The Multi-Core Family Tree

Page 28

This may be the Architecture You Think of for Multi-Core

[Diagram: three multi-core organizations, with example chips]

(a) Hierarchical Designs (blocks: Cache/Memory; Cache; Core, Core, . . .)
• Intel Core Duo
• IBM Power5
• AMD Opteron
• Sun Niagara
• …

(b) Pipelined Designs (blocks: CORE, CORE, CORE, MEM, . . .; Cache/Memory)
• IBM Cell
• Most Router chips
• Many Video chips

(c) Array Designs (blocks: Cache/Memory with Core, repeated; Interconnect & Control)
• Terasys
• Execube
• Yukon
• Intel Teraflop

External Bandwidth = sum of escapes from cores

Page 29

But There’s at Least One Approach with Lower Bandwidth Needs

[Diagram: the same three multi-core organizations]

(a) Hierarchical Designs
• Intel Core Duo
• IBM Power5
• Sun Niagara
• …

(b) Pipelined Designs
• Most Router chips
• Many Video chips
• Some aspects of IBM Cell

(c) Array Designs
• Terasys
• Execube
• Yukon
• Intel Teraflop

External bandwidth largely independent of # of cores

Page 30

And then there’s Array Approaches that Provide Significant Internal Memory

[Diagram: the same three multi-core organizations]

(a) Hierarchical Designs
• Intel Core Duo
• IBM Power5
• Sun Niagara
• …

(b) Pipelined Designs
• IBM Cell
• Most Router chips
• Many Video chips

(c) Array Designs
• Terasys
• Execube
• Yukon
• Intel Teraflop
• Some Aspects of Cell

Particularly Effective for Weak Scaling Apps

Page 31

And Today’s Memory Architecture is Evolving to Feed the Beast

[Diagram blocks: Microprocessor; North Bridge Memory Controller; Memory Interface]

State of the Art Peak Aggregate Bandwidth: ~ 6.4 GB/s

Page 32

… But Not to Reduce Latency

AMB: Advanced Memory Buffer Chip

We’ve introduced 16 extra chip crossings!

… And at ~2X Power Increase

Page 33

A Simple Case Study

Page 34

A Modern HPC System

Computational Board
• 4 PE Nodes
• Each PE Node:
– Dual core Opteron @ 2.6GHz
– 4 DDR2 2GB DIMMs
• 4 Routers per Board

Key Ratios (all “Peak”)
• 2 Flops per cycle per core
• 1.5B per Flop
• 1.25B/s of Memory BW per Flop per core
• 0.25B/s Link BW per flop per PE
• 0.06-0.25B/s of Bisection BW per Flop
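The peak numbers behind these ratios can be sketched from the board description. The constants are from the slide; the function names are hypothetical. The slide does not say how "1.5B per Flop" was computed; dividing a node's memory by one core's peak happens to give about 1.5, which is shown here only as an assumed accounting.

```python
CLOCK_GHZ = 2.6                 # Opteron clock, from the slide
FLOPS_PER_CYCLE_PER_CORE = 2    # from the slide
CORES_PER_NODE = 2              # dual core
NODES_PER_BOARD = 4
DIMMS_PER_NODE = 4
GB_PER_DIMM = 2


def peak_gflops_per_core() -> float:
    return CLOCK_GHZ * FLOPS_PER_CYCLE_PER_CORE


def peak_gflops_per_node() -> float:
    return peak_gflops_per_core() * CORES_PER_NODE


def peak_gflops_per_board() -> float:
    return peak_gflops_per_node() * NODES_PER_BOARD


def memory_gb_per_node() -> int:
    return DIMMS_PER_NODE * GB_PER_DIMM


def bytes_per_flop(memory_gb: float, gflops: float) -> float:
    """Memory capacity per unit of peak rate (GB per GFlop/s equals B per Flop/s)."""
    return memory_gb / gflops


if __name__ == "__main__":
    print(peak_gflops_per_node(), peak_gflops_per_board())
    # Assumed accounting: node memory divided by a single core's peak.
    print(bytes_per_flop(memory_gb_per_node(), peak_gflops_per_core()))
```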

Page 35

What Are We Doing with the Total System Silicon?

Silicon Area Distribution
• Memory: 86%
• Processors: 3%
• Routers: 3%
• Random: 8%

Power Distribution
• Memory: 9%
• Processors: 56%
• Routers: 33%
• Random: 2%
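One way to read the two distributions together is to divide each component's share of power by its share of silicon area. The percentages are from the slide; the ratio itself is an illustrative metric that the slide does not present.

```python
AREA_SHARE = {"Memory": 86, "Processors": 3, "Routers": 3, "Random": 8}
POWER_SHARE = {"Memory": 9, "Processors": 56, "Routers": 33, "Random": 2}


def relative_power_density(power_share: dict, area_share: dict) -> dict:
    """Power share divided by area share, per component (1.0 means proportional)."""
    return {name: power_share[name] / area_share[name] for name in area_share}


if __name__ == "__main__":
    density = relative_power_density(POWER_SHARE, AREA_SHARE)
    for name, value in sorted(density.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:.2f}")
```

On these figures the processor silicon runs more than a hundred times hotter per unit area than the memory silicon.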

Page 36

What Is the Board Space Utilization Like?

Board Space Distribution
• Memory: 10%
• Processors: 24%
• Routers: 8%
• Random: 8%
• White Space: 50%

Page 37

A Dual Core Processor Chip

http://techreport.com/reviews/2005q2/opteron-x75/dualcore-chip.jpg

[Pie chart]
• 2 Cores: 56.2%
• 1 Memory Controller: 19.4%
• 3 HT Links: 5.4%
• Other: 19.0%

[Chart: Area (sq. mm, 0 to 120) vs. year, 2004 to 2020. Series: Single Core Area, Single HT Area, Single Memory Controller.]

Page 38

Some Projections

• Off chip memory controls performance

• IPC/core more sensitive to latency than bandwidth

• “Flat” off chip physical latency => relative latency grows with clock

[Chart: Single Core Performance Factors. Relative Change Over 2004 (0.0 to 4.0) vs. year, 2004 to 2020. Series: Clock Growth (Clock), Relative IPC Change (IPC), Relative IPS (IPS/Core). Annotations: 48% Drop, 73% Drop, 82% Increase, 3.08 X Increase, 1.0.]

Ack. R. Murphy, SNL
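The three series are tied together by IPS = Clock * IPC. The sketch below applies that identity to the chart's annotations; pairing the 82% clock increase with the 48% IPC drop, and the 3.08X clock increase with the 73% IPC drop, is an assumption, since the transcript does not say which annotation belongs to which curve or year.

```python
def relative_ips(relative_clock: float, relative_ipc: float) -> float:
    """Instructions per second relative to baseline: clock growth times IPC change."""
    return relative_clock * relative_ipc


if __name__ == "__main__":
    # Assumed pairings of the chart annotations (not confirmed by the slide).
    mid_case = relative_ips(1.82, 1.0 - 0.48)
    late_case = relative_ips(3.08, 1.0 - 0.73)
    print(mid_case, late_case)
```

Under these pairings both products stay below 1.0: the IPC lost to growing relative latency cancels the clock gain.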

Page 39

Where Does This Lead Us?

• Use density increase to replicate cores

• Keep clock flat to minimize power

• Still need additional I/O for both bandwidth & latency management (reduce queuing delays by multiple banks)

[Chart: Number of Cores (0 to 120) vs. year, 2002 to 2022. Series: Just Cores, Cores with HT & DDR Ctl. Annotation: Unproductive Silicon.]

Page 40

So What May This Mean to the Top 500?

[Chart: GFlops (log scale, 1.E-01 to 1.E+10) vs. date, 1/1/72 to 1/1/20. Series: Historical Rmax, Rmax, Rmax Leading Edge, Rpeak Leading Edge, Evolutionary Heavy Node Projection.]

Page 41

The Emergence of More Organized Architectures

Page 42

Tiling & Local Memory Regularizes Layout, Lowers Latency, Reduces Off-Chip Bandwidth Needs

• Works well with partitionable algorithms
• Good fit for applications that support weak scaling
• Inter-core communication DOES NOT USE CONTACTS
• Compiling problem: placement of kernels AND data structures to minimize inter-core bandwidth
• Problems with global synchronization

Page 43

Multi-Threading

• Provide explicit latency hiding

• Permits simpler cores with more efficient use of data flow

• Increase potential for memory references “in flight”

• Shares path to memory

• But still doesn’t help “single thread” performance in terms of chained memory references

• Nor reduction of off-chip bandwidth (and contacts)

Page 44

A Brief History of Multi-threaded Processors

[Chart: Relevant Features (0 to 7) vs. year, 1960 to 2010. Points: 6600, Space Shuttle IOP, HEP, J-Machine, Horizon, MTA, HTMT, PIM Lite, Hyper Threading, P5, U4, Niagara, Eldorado.]

Page 45

Sun’s Niagara
• 8 4-way multi-threaded single issue cores
• 3MB 12 bank shared L2
• 4 DDR2 Memory Interfaces
• Measured 5.76 IPC vs Peak of 8 on Java Business B/M
• 63W @90nm (2W cores)

[Pie chart]
• Cores: 37%
• L2: 21%
• FPU: 2%
• Crossbar: 3%
• DDR2 Interfaces: 11%
• Other Functions: 3%
• Remainder: 23%

[Chart: Area (sq. mm, log scale, 1 to 100) vs. year, 2004 to 2020. Series: Single Core Area, Entire L2 Area, Single DDR2 I/F Area, Crossbar.]

Page 46

Cray’s XMT

Supports 128 Threads/core

John Feo, David Harper, Simon Kahan, Petr Konecny, “Eldorado”, Computing Frontiers, 2005

Page 47

Some Interesting Comparisons

Core L1 FPU Area pJ/~
Niagara-I 24 No 11.92 1719
Niagara-II 24 yes 23.85 2364
MIP64 64 yes 9.59
MIPS64 40 No 436

So Multi-Threading is not Free

Page 48

Problems Still Remain

• Programming models not changed

• States still very heavy

• Compiling to specific cores

• Data partitioning

• Problems with coherency

• Doesn’t address barriers, sync points, …

• Doesn’t help emerging low reuse apps
– AMR
– Data mining
– Graph traversals
– Non-numeric solvers such as SAT

Page 49

Are We Ready for a Mutation?

Page 50

Ideas

• Ultra light weight “butterflies” take functions to the data flowers
– Memory reference becomes “traveling threadlet”

• But, like flowers, data can respond to the touch of the butterfly.
– Add small amount of metadata to each word

• Finally, it’s the “flowers” whose location is important

Page 51

Adding Metadata to the Memory

• “Special Values”
– Uninitialized, error code, null

• Full/Empty bits
– And multiple flavors of “empty”
– Esp. “empty pending outstanding value”
– Greatly simplifies Producer/Consumer

• Forwarding

• Locked

• Traps

• Especially interesting when aliased to thread state registers
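To illustrate why full/empty bits simplify producer/consumer code, here is a software sketch of a single word with a full/empty flag. It emulates in Python threads what the slide proposes as per-word memory metadata; the class and method names are hypothetical, and a hardware implementation would not use a lock and condition variable.

```python
import threading


class FullEmptyWord:
    """One memory word tagged with a full/empty bit (software emulation)."""

    def __init__(self):
        self._value = None
        self._full = False
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        """Block until the word is empty, store the value, mark it full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_when_full(self):
        """Block until the word is full, take the value, mark it empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_demo(n: int = 5) -> list:
    """Pass n values from a producer thread to a consumer thread through one word."""
    word = FullEmptyWord()
    received = []

    def producer():
        for i in range(n):
            word.write_when_empty(i)

    def consumer():
        for _ in range(n):
            received.append(word.read_when_full())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

The producer and consumer need no separate queue or flag variable: the state of the word itself carries the synchronization.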

Page 52

Full/Empty Bits & MPI

Ack. A. Rodrigues, SNL

Page 53

One Step Further: Allowing the Threads to Travel

• “Overprovision” memory with huge numbers of anonymous execution sites
– Place at bottom of, or near, memory

• Reduce state of a thread to a memory reference

• Make creating a new thread “near” some memory a cheap operation

• Allow thread to “move” to new site when locality demands

• Don’t require target to maintain code

Latency reduced by huge factors

Page 54

“Piglet” Processing At Base of Memory

THREADLET FORMAT: Target Address | Operands & Working Registers | Code | PC | Additional Data Payload

[Diagram blocks: “CLASSICAL” HOST CPU NODE (HEAVYWEIGHT ISA PROCESSING, CACHE); NETWORK INTERCONNECT; Memory NODES; Memory Bank, each with ADDRESS MANAGEMENT and PIGLET PROCESSING]

Page 55

Types of Piglet Programs

• Classical memory operations

• Atomic Memory Operations

• Short Vector to Memory

• “Object-oriented” method evaluation at the object

• Small slices of programs

Page 56

Example: AMO

• AMO = Atomic Memory Operation
– Update some memory location
– With guaranteed no interference
– And return result

• Parcel Registers: A=Address, D=Data, R=Return Address

• Sample Code:

MOVE
L1: LOCK & LOAD
OP
STORE & RELEASE L1
SWAP RA
MOVE A
STORE
QUIT

Code annotations: Atomic Update “At the Memory”; Return Result

Bottom Line: 2 network transactions rather than up to 6!
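The transaction count can be sketched with a toy model that counts network messages. Everything here is hypothetical scaffolding: the `Network` and `MemoryBank` classes are illustrative, and the host-driven path is assumed to need three round trips (lock, load, store with release) to reach the slide's "up to 6", a breakdown the slide does not give. The model returns the updated value; the slide says only "return result".

```python
import threading


class Network:
    """Counts one-way messages crossing the interconnect."""

    def __init__(self):
        self.messages = 0

    def send(self):
        self.messages += 1


class MemoryBank:
    """A memory bank able to run a read-modify-write locally, under its own lock."""

    def __init__(self):
        self.words = {}
        self._lock = threading.Lock()

    def run_amo(self, address, op, data):
        with self._lock:                      # LOCK & LOAD
            old = self.words.get(address, 0)
            new = op(old, data)               # OP
            self.words[address] = new         # STORE & RELEASE
        return new


def amo_via_threadlet(net, bank, address, op, data):
    net.send()                                # threadlet moves to the memory
    result = bank.run_amo(address, op, data)  # update happens at the memory
    net.send()                                # threadlet carries the result back
    return result


def amo_via_host(net, bank, address, op, data):
    # Assumed breakdown: lock, load, and store+release each cost a round trip.
    for _ in range(3):
        net.send()
        net.send()
    # The update is applied through the same helper for brevity.
    return bank.run_amo(address, op, data)


if __name__ == "__main__":
    bank = MemoryBank()
    near, far = Network(), Network()
    amo_via_threadlet(near, bank, 0x10, lambda old, d: old + d, 5)
    amo_via_host(far, bank, 0x10, lambda old, d: old + d, 5)
    print(near.messages, far.messages)
```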

Page 57

Vector Add (Z[I]=X[I]+Y[I]) via Threadlets

[Diagram: X, Y, and Z vectors, each spread across four MEMORY blocks, visited by threadlets of Type 1, Type 2, and Type 3]

Steps shown on the diagram:
• Stride thru Q elements
• Accumulate Q X’s in payload
• Spawn type 2s
• Fetch Q matching Y’s, add to X’s, save in payload, store in Q Z’s

Transaction Reduction factor:
• 1.66X (Q=1)
• 10X (Q=6)
• up to 50X (Q=30)
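The three reduction factors are consistent with a simple counting model, sketched below. The model is an inference rather than something stated on the slide: it assumes a conventional vector add costs 5 network transactions per element (request and reply for X, request and reply for Y, one write of Z), while the threadlet version costs 3 transactions per group of Q elements (one hop each to the X, Y, and Z memories).

```python
from fractions import Fraction

CONVENTIONAL_PER_ELEMENT = 5   # assumed: 2 (read X) + 2 (read Y) + 1 (write Z)
THREADLET_HOPS_PER_GROUP = 3   # assumed: one hop each to X, Y, and Z memory


def conventional_transactions(q: int) -> int:
    """Network transactions to add q elements with host-issued loads and stores."""
    return CONVENTIONAL_PER_ELEMENT * q


def threadlet_transactions(q: int) -> int:
    """Network transactions when one threadlet chain carries q elements in its payload."""
    return THREADLET_HOPS_PER_GROUP


def reduction_factor(q: int) -> Fraction:
    return Fraction(conventional_transactions(q), threadlet_transactions(q))


if __name__ == "__main__":
    for q in (1, 6, 30):
        print(q, float(reduction_factor(q)))
```

In this model the payload size Q is the lever: the hop count stays fixed while the work carried per hop grows.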

Page 58

Conclusions

Page 59

Conclusions

• (Hierarchical) Multi-core has taken over
– But clock rate will be limited by power
– And # of useable cores by contacts

• Simpler cores: more area/energy efficient
– But we can’t use all of them in hierarchical architectures

• Latency will stifle single-thread performance

• Multi-threading provides better utilization
– But at an energy cost

• Pipelined/Array chips reduce need for off-chip bandwidth
– But then run into power-limiting clock problem
– And require 2D partitioning of data and code

• Are there alternatives that don’t fix code to cores?

BEST HPC Architecture != Best commodity architecture

Page 60

A Personal Goal

[Diagram labels: A “PIM DIMM” (array of PIM chips); A “PIM Cluster”; Interconnection Network linking PIM Clusters, “Host”, and I/O]

• Huge increase in silicon per board
• Level out power dissipation

Page 61

The Future

Will We Design Like This? Or This?

