© Copyright IBM Corporation 2005. From Ingenuity to Impact. Technology Trends Presentation For Power Symposium 2006, 8-23-06. Darryl Solie, Distinguished Engineer, Chief System Architect, IBM Systems & Technology Group.
Page 1

© Copyright IBM Corporation 2005

From Ingenuity to Impact

Technology Trends Presentation For Power Symposium 2006

8-23-06

Darryl Solie, Distinguished Engineer, Chief System Architect
IBM Systems & Technology Group

Page 2

IBM Engineering & Technology Services © 2005 IBM Corporation

Agenda

What’s driving technology today?

Why Cell BE today?

Emerging System Strategy – Scale Out & Acceleration

Intel’s direction

Page 3

© 2002 IBM Corporation, IBM Engineering & Technology Services

When Moore Is Less, Watts Happen

Page 4

A Second Observation: Where have all the Gigahertz gone?

Page 5

Technology Scaling – We’ve hit the wall

[Figure: Relative Device Performance (log scale, 0.2 to 20) versus Year (1988 to 2012) for Conventional Bulk CMOS, SOI (silicon-on-insulator), High mobility, and Double-Gate devices. The trend beyond the plotted data is marked "?".]

Page 6

What’s causing the problem?

Gate Stack (10S, Tox = 11 Å)

Gate dielectric approaching a fundamental limit (a few atomic layers)

[Figure: Power Density (W/cm²), log scale from 0.001 to 1000, versus Gate Length (microns), log scale from 1 down to 0.01. Two curves, Active Power and Passive Power, spanning 1994 to 2004, with the 65 nm point marked.]

Page 7

Has This Ever Happened Before?

[Figure: power density chart annotated "Steam Iron, 5 W/cm²" and "? Opportunity".]

Page 8

Cell Processor Chip Overview

3 GHz 64 Bit PowerPC Processor

8 SPUs (VMX-like accelerators)

25 GBytes/sec memory bandwidth

Up to 75 GBytes/sec I/O bandwidth

0.5-1 GByte High Speed Memory

~ 95 Watts @ 3 GHz

[Block diagram: a 64b Power Processor and a row of Synergistic Processors, together with the memory controller (Mem. Contr.) and Flexible IO.]

Page 9

Cell BE Processor Overview

Heterogeneous multi-core system architecture
- Power Processor Element for control tasks
- Synergistic Processor Elements for data-intensive processing

Synergistic Processor Element (SPE) consists of
- Synergistic Processor Unit (SPU)
- Synergistic Memory Flow Control (SMF)
  - Data movement and synchronization
  - Interface to high-performance Element Interconnect Bus

[Block diagram: one PPE (PPU containing PXU, L1, and L2; 64-bit Power Architecture with VMX) and eight SPEs (each an SPU containing SXU and LS, plus an SMF) attached to the EIB (up to 96B/cycle), along with the MIC to Dual XDR™ memory and the BIC to FlexIO™. Link labels shown: 16B/cycle, (2x) 16B/cycle, and 32B/cycle.]

Page 10

General Purpose Cores vs Synergistic Processor Elements

Optimized acceleration can provide significant advantages

Page 11

Engineering and Technology Services © 2006 IBM Corporation

[Figure: Theoretical Peak Operations, in Billion Ops/sec (axis 0 to 250), for FP (SP), FP (DP), Int (16 bit), and Int (32 bit) on five processors: Freescale MPC8641D (1.5 GHz), AMD Athlon™ 64 X2 (2.4 GHz), PowerPC® 970MP (2.5 GHz), Cell Broadband Engine™ (3.2 GHz), and Intel Pentium D® (3.2 GHz).]
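As a rough cross-check on the Cell entry in the Theoretical Peak Operations chart, peak throughput is commonly estimated as cores × SIMD lanes × operations per lane per cycle × clock rate. The sketch below applies that estimate to the eight SPUs at 3.2 GHz. The lane count (four 32-bit lanes in a 128-bit vector) and the two operations per cycle from a fused multiply-add are assumptions that do not appear on the slides, and the function name `peak_gops` is purely illustrative.

```python
def peak_gops(cores: int, simd_lanes: int, ops_per_lane_per_cycle: int,
              clock_ghz: float) -> float:
    """Theoretical peak in billions of operations per second."""
    return cores * simd_lanes * ops_per_lane_per_cycle * clock_ghz


# From the slides: 8 SPUs and a 3.2 GHz clock.
# Assumed (not on the slides): 4 single-precision lanes per 128-bit vector,
# and 2 operations per lane per cycle (a fused multiply-add counted as two).
cell_sp_peak = peak_gops(cores=8, simd_lanes=4,
                         ops_per_lane_per_cycle=2, clock_ghz=3.2)

print(f"Cell BE SPUs, single precision: {cell_sp_peak:.1f} billion ops/sec")
```

Under these assumptions the estimate is 204.8 billion ops/sec, which fits within the chart's 0 to 250 axis. It counts the SPUs only and ignores the PowerPC core.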

Page 12

Cell BE Performance

Type              Algorithm                     3 GHz GPP                      3 GHz BE                BE Perf Advantage
HPC               Matrix Multiplication (S.P.)  25 GFlops                      190 GFlops (8 SPEs)     8x
HPC               Linpack (S.P.)                18 GFlops (IA32)               150 GFlops (BE)         8x
HPC               Linpack (D.P.)                6 GFlops (IA32)                12 GFlops (BE)          2x
bioinformatic     smith-waterman                570 Mcups (IA32)               420 Mcups (per SPE)     6x
graphics          transform-light               160 MVPS (G5/VMX)              240 MVPS (per SPE)      12x
graphics          TRE                           1.6 fps (G5/VMX)               24 fps (BE)             15x
security          AES                           1.1 Gbps (IA32)                2 Gbps (per SPE)        14x
security          TDES                          0.12 Gbps (IA32)               0.16 Gbps (per SPE)     10x
security          MD-5                          2.68 Gbps (IA32)               2.3 Gbps (per SPE)      6x
security          SHA-1                         0.85 Gbps (IA32)               1.98 Gbps (per SPE)     18x
communication     EEMBC                         501 Telemark (1.4GHz mpc7447)  770 Telemark (per SPE)  12x
video processing  mpeg2 decoder (sdtv)          200 fps (IA32)                 290 fps (per SPE)       12x

BE’s performance is about an order of magnitude better than traditional GPPs for media and other applications that can take advantage of its SIMD capability

BE can outperform a P4/SSE2 at same clock rate by 3 to 18x (assuming linear scaling) in various types of application workloads
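The advantage figures in the Cell BE performance table can be approximately reproduced from the other two columns. The sketch below assumes that a "per SPE" result scales linearly across the 8 SPUs listed on the chip overview slide, and that "BE" or "8 SPEs" results are already whole-chip numbers. That reading is an interpretation, not something the slide states, and the EEMBC baseline is used as quoted at 1.4 GHz without rescaling. The names `ROWS` and `advantage` are illustrative.

```python
SPES = 8  # SPU count from the Cell chip overview slide

# (algorithm, GPP result, BE result, BE result is per SPE?, advantage on slide)
ROWS = [
    ("Matrix Multiplication (S.P.)", 25.0, 190.0, False, 8),
    ("Linpack (S.P.)", 18.0, 150.0, False, 8),
    ("Linpack (D.P.)", 6.0, 12.0, False, 2),
    ("smith-waterman", 570.0, 420.0, True, 6),
    ("transform-light", 160.0, 240.0, True, 12),
    ("TRE", 1.6, 24.0, False, 15),
    ("AES", 1.1, 2.0, True, 14),
    ("TDES", 0.12, 0.16, True, 10),
    ("MD-5", 2.68, 2.3, True, 6),
    ("SHA-1", 0.85, 1.98, True, 18),
    ("EEMBC", 501.0, 770.0, True, 12),
    ("mpeg2 decoder (sdtv)", 200.0, 290.0, True, 12),
]


def advantage(gpp: float, be: float, per_spe: bool, spes: int = SPES) -> float:
    """Ratio of whole-chip BE throughput to the GPP result.

    Assumption: per-SPE results scale linearly with the number of SPEs.
    """
    chip_total = be * spes if per_spe else be
    return chip_total / gpp


for name, gpp, be, per_spe, quoted in ROWS:
    ratio = advantage(gpp, be, per_spe)
    print(f"{name:30s} computed {ratio:5.1f}x   slide {quoted}x")
```

Under this reading every computed ratio falls within 1 of the figure on the slide, which suggests the slide's column was rounded or truncated from the same arithmetic.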

Page 13

IBM Server Strategy

[Figure: Large SMPs, High Density Racks/Blades, and Clusters and Virtualization positioned on two axes: Scale Up / SMP Computing (vertical) and Scale Out / Distributed Computing (horizontal).]

Page 14

IBM BladeCenter

Page 15

Blue Gene/L – Lawrence Livermore System
131072 Processors / 262144 Floating Point Units
360 Teraflops / 16 Terabytes of Memory
10X Performance / 28X Less Power / 10X smaller

Page 16

Blue Gene/L – Compute SoC

[Block diagram: two 440 CPUs (one labeled as I/O proc), each with 32k/32k L1 and a "Double FPU"; an L2 per CPU with snoop; a multiported shared SRAM buffer; a shared L3 directory for EDRAM (includes ECC); 4MB EDRAM usable as L3 cache or memory; DDR control with ECC to 144 bit wide DDR (256MB); PLB (4:1); Gbit Ethernet; JTAG access; Torus (6 out and 6 in, each at 1.4 Gbit/s link); Tree (3 out and 3 in, each at 2.8 Gbit/s link); Global Interrupt. Bus widths shown: 128, 256, and 1024+144 ECC.]

2 PPC 440 Processors
4 DP Floating Point Units
4 MB EDRAM
Full Mesh Toroid Interconnect
Integrated Memory Control/Ethernet
~ 10-13 Watts/Chip
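The system totals quoted for the Lawrence Livermore machine can be tied back to the per-chip figures with a little arithmetic. The sketch below assumes that every compute chip carries the 256MB of DDR shown in the diagram, that memory is counted in binary units, and that only compute chips draw the quoted 10-13 Watts (I/O nodes, links, and cooling are ignored). All variable names are illustrative.

```python
# System-level figures from the Blue Gene/L slide
processors = 131_072
fpus = 262_144
peak_teraflops = 360

# Per-chip figures from the Compute SoC slide
processors_per_chip = 2
fpus_per_chip = 4
ddr_mb_per_chip = 256        # assumed to apply to every compute chip
watts_per_chip = (10, 13)    # approximate range quoted on the slide

chips = processors // processors_per_chip
fpu_count_matches = chips * fpus_per_chip == fpus

# Binary units assumed: 1 TB = 1024 * 1024 MB
memory_tb = chips * ddr_mb_per_chip / (1024 * 1024)

# Peak rate each floating point unit must sustain to reach the system total
gflops_per_fpu = peak_teraflops * 1000 / fpus

# Power drawn by the compute chips alone, in kilowatts
chip_power_kw = tuple(chips * w / 1000 for w in watts_per_chip)

print(f"compute chips:        {chips}")
print(f"FPU count consistent: {fpu_count_matches}")
print(f"total memory:         {memory_tb:.0f} TB")
print(f"peak per FPU:         {gflops_per_fpu:.2f} GFlops")
print(f"compute chip power:   {chip_power_kw[0]:.0f} to {chip_power_kw[1]:.0f} kW")
```

With these inputs the chip count is 65536, the memory total matches the 16 Terabytes on the slide, and each floating point unit needs a peak of roughly 1.37 GFlops to reach 360 Teraflops.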

Page 17

Intel EMEA Academic Forum 5/05

Page 18

Intel EMEA Academic Forum 5/05

Page 19

So... where do we go next?

- (More) Application Specific Acceleration!

