11. Multicore Processors


Dezső Sima

Fall 2006

D. Sima, 2006

Overview

1 Overview of MCPs

2 Attaching L2 caches

3 Attaching L3 caches

4 Connecting memory and I/O

5 Case examples

1. Overview of MCPs (1)

Figure 1.1: Processor power density trends

Source: D. Yen: Chip Multithreading Processors Enable Reliable High Throughput Computing http://www.irps.org/05-43rd/IRPS_Keynote_Yen.pdf

1. Overview of MCPs (2)

Figure 1.2: Single-stream performance vs. cost

Source: Marr T.T. et al.: „Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, Vol. 06, Issue 01, Feb. 14, 2002, pp. 4-16

1. Overview of MCPs (2)

Figure 1.2: Dual/multi-core processors (1)

1. Overview of MCPs (3)

Figure 1.3: Dual/multi-core processors (2)

1. Overview of MCPs (4)

Macro architecture of dual/multi-core processors (MCPs):

Layout of the cores

Attaching of L2 caches

Attaching of L3 caches (if available)

Layout of the I/O and memory architecture

2. Attaching L2 caches

2.1 Main aspects of attaching L2 caches to MCPs (1)

Attaching L2 caches to MCPs:

Allocation to the cores

Inclusion policy

Use by instructions/data

Banking policy

Integration of L2 caches to the proc. chip

Allocation of L2 caches to the cores

Private L2 cache for each core: UltraSPARC IV (2004), Smithfield (2005), Athlon 64 X2 (2005), Montecito (2006?)

Shared L2 cache for all cores: POWER4 (2001), POWER5 (2005), UltraSPARC T1 (2005), Yonah (2006), Core Duo (2006)

Expected trend


2.1 Main aspects of attaching L2 caches to MCPs (2)

Inclusion policy of L2 caches

Inclusive L2

Exclusive L2

In an exclusive L2 (the figure shows the data paths between the L1, the L2 and the memory):

Lines replaced (victimized) in the L1 are written into the L2.

References to data in the L2 initiate reloading that cache line into the L1.

The L2 usually operates as a write-back cache: only modified data that is replaced in the L2 is written back to the memory. Unmodified data that is replaced in the L2 is deleted.

Figure 1.1: Implementation of exclusive L2 caches

Source: Zheng, Y., Davis, B.T., Jordan, M.: “Performance evaluation of exclusive cache hierarchies”, 2004 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004, pp. 89-96.
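The exclusive policy described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the slides: both levels are fully associative with LRU replacement, a miss in both levels fills the L1 directly from memory, and the class and method names (`ExclusiveHierarchy`, `access`) are hypothetical.

```python
from collections import OrderedDict


class ExclusiveHierarchy:
    """Toy exclusive L1/L2 pair: a line is held in the L1 or the L2, never both."""

    def __init__(self, l1_lines, l2_lines):
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.l1 = OrderedDict()   # line address -> dirty flag, oldest first (LRU)
        self.l2 = OrderedDict()
        self.memory_writes = []   # lines written back to memory

    def access(self, line, write=False):
        """Reference a line; return where it was found."""
        if line in self.l1:
            dirty = self.l1.pop(line)
            where = "L1"
        elif line in self.l2:
            dirty = self.l2.pop(line)   # L2 hit: the line is reloaded into the L1
            where = "L2"
        else:
            dirty = False               # assumption: memory fills the L1 directly
            where = "memory"
        self._fill_l1(line, dirty or write)
        return where

    def _fill_l1(self, line, dirty):
        if len(self.l1) >= self.l1_lines:
            victim, victim_dirty = self.l1.popitem(last=False)
            self._fill_l2(victim, victim_dirty)   # L1 victims are written into the L2
        self.l1[line] = dirty

    def _fill_l2(self, line, dirty):
        if len(self.l2) >= self.l2_lines:
            victim, victim_dirty = self.l2.popitem(last=False)
            if victim_dirty:
                self.memory_writes.append(victim)  # write back modified data only
            # unmodified victims are simply dropped
        self.l2[line] = dirty


h = ExclusiveHierarchy(l1_lines=2, l2_lines=2)
h.access(0, write=True)
for line in (1, 2, 3, 4, 5):
    h.access(line)
print(h.memory_writes)   # only the modified line 0 reached memory
print(h.access(2))       # line 2 is found in the L2 and moved back to the L1
```

Because a line never sits in both levels at once, the two caches together hold `l1_lines + l2_lines` distinct lines in this model.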

Inclusion policy of L2 caches

Inclusive L2: most implementations

Exclusive L2: Athlon 64 X2 (2005)

Expected trend


2.1 Main aspects of attaching L2 caches to MCPs (3)

Use by instructions/data

Unified instr./data cache(s): POWER4 (2001), UltraSPARC IV (2004), Smithfield (2005), Athlon 64 X2 (2005), POWER5 (2005), UltraSPARC T1 (2005), Yonah (2006), Core Duo (2006)

Split instr./data caches: Montecito (2006?)

Expected trend


2.1 Main aspects of attaching L2 caches to MCPs (4)

Banking policy

Single-banked implementation

Multi-banked implementation


2.1 Main aspects of attaching L2 caches to MCPs (5)

Integration to the processor chip

On-chip L2 tags/contr., off-chip data

Entire L2 on chip

POWER4 (2001)

UltraSPARC IV (2004)

Athlon 64 X2 (2005)

POWER5 (2005)

Presler (2005)

Smithfield (2005)

UltraSPARC V (2005)

Expected trend

2.2 Examples of attaching L2 caches to MCPs (1)

Private L2 caches for each core

Unified instruction/data caches, on-chip L2 tags/contr., off-chip data: UltraSPARC IV (2004)

Unified instruction/data caches, entire L2 on-chip: Smithfield (2005), Presler (2005), Athlon 64 X2 (2005) (exclusive L2)

Split instruction/data caches, entire L2 on-chip: Montecito (2006?)

[Block diagrams. UltraSPARC IV (2004): two cores, each with L2 tags/contr. and L2 data; interconn. network; mem. contr.; memory; syst. if.; Fire Plane bus. Smithfield/Presler (2005): two cores, each with an L2; syst. if.; FSB. Montecito (2006?): two cores, each with L2 I, L2 D and L3; syst. if.; FSB. Athlon 64 X2 (2005): two L2 caches; System Request Queue; Xbar; mem. contr.; memory; HT-bus contr.; HT-bus.]

2.2 Examples of attaching L2 caches to MCPs (2)

Shared L2 caches for all cores

Dual core/single banked L2: Yonah Duo (2006), Core (2006)

Dual core/multi banked L2: POWER4 (2001), POWER5 (2005)

Multi core/multi banked L2: UltraSPARC T1 (2005) (Niagara) (8 cores/4 L2 banks)

Mapping of addresses to the banks (POWER4/POWER5): The 128-byte long L2 cache lines are hashed across the 3 modules. Hashing is performed by modulo 3 arithmetic applied on a large number of real address bits.

Mapping of addresses to the banks (UltraSPARC T1): The four L2 modules are interleaved at 64-byte blocks, so the blocks starting at addresses 0, 64, 128 and 192 fall into four different banks and address 256 wraps around to the first one.

[Block diagrams. Yonah Duo/Core: two cores; L2 contr.; shared L2; system if.; FSB. POWER4/POWER5: two cores; X-bar; three L2 modules; Fabric Bus Contr.; L3 tags/contr.; GX contr.; GX bus. UltraSPARC T1: cores; X-bar; L2 banks; mem. contr.; memory.]
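The two bank-mapping schemes mentioned for the shared L2 caches (64-byte interleaving across four banks, and modulo 3 hashing of 128-byte lines across three modules) can be sketched as follows. The function names are hypothetical, and the modulo 3 function is a deliberate simplification: the slides only say that the hash is applied to a large number of real address bits, so taking the whole line number modulo 3 is an assumption.

```python
def interleaved_bank(addr, block_bytes=64, banks=4):
    """Bank index when consecutive 64-byte blocks rotate through the banks."""
    return (addr // block_bytes) % banks


def hashed_module(addr, line_bytes=128, modules=3):
    """Module index from the line number modulo 3 (simplified, assumed hash)."""
    return (addr // line_bytes) % modules


for addr in (0, 64, 128, 192, 256):
    print(addr, "-> bank", interleaved_bank(addr))

for addr in (0, 128, 256, 384):
    print(addr, "-> module", hashed_module(addr))
```

A power-of-two bank count lets the bank index be read straight from a few address bits, while three modules need a true modulo operation, which is why the slide singles out the modulo 3 arithmetic.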


3. Attaching L3 caches

3.1 Main aspects of attaching L3 caches to MCPs (1)

Attaching L3 caches to MCPs:

Allocation to the L2 cache(s)

Inclusion policy

Use by instructions/data

Banking policy

Integration of L3 caches to the proc. chip

Allocation of L3 caches to the L2 caches

Private L3 cache for each L2: Montecito (2006?)

Shared L3 cache for all L2s: POWER4 (2001), POWER5 (2005), UltraSPARC IV+ (2005)


3.1 Main aspects of attaching L3 caches to MCPs (2)

Inclusion policy of L3 caches

Inclusive L3

Exclusive L3

In an exclusive L3 (the figure shows the data paths between the L2, the L3 and the memory):

Lines replaced (victimized) in the L2 are written into the L3.

References to data in the L3 initiate reloading that cache line into the L2.

The L3 usually operates as a write-back cache: only modified data that is replaced in the L3 is written back to the memory. Unmodified data that is replaced in the L3 is deleted.

Inclusion policy of L3 caches

Inclusive L3: POWER4 (2001), Montecito (2006?)

Exclusive L3: POWER5 (2005), UltraSPARC IV+ (2005)

Expected trend


3.1 Main aspects of attaching L3 caches to MCPs (3)

Use by instructions/data

Unified instr./data cache(s)

Split instr./data caches

All multicore processors unveiled until now use L3 caches that hold both instructions and data.


3.1 Main aspects of attaching L3 caches to MCPs (4)

Banking policy

Single-banked implementation

Multi-banked implementation


3.1 Main aspects of attaching L3 caches to MCPs (5)

Integration to the processor chip

On-chip L3 tags/contr., off-chip data: POWER4 (2001), POWER5 (2005), UltraSPARC IV+ (2005)

Entire L3 on chip: Montecito (2006?)

Expected trend

3.2 Examples of attaching L3 caches to MCPs (1)

Inclusive L3 cache

Private L3 caches for each L2 cache bank, entire L3 on-chip: Montecito (2006?)

Shared L3 cache for all L2 cache banks, on-chip L3 tags/contr., off-chip data: POWER4 (2001)

[Block diagrams. Montecito (2006?): per core L2 I, L2 D and L3; arbiter; system if.; FSB. POWER4 (2001): three L2 modules; Fabric Bus Contr.; L3 tags/contr.; L3 data; mem. contr.; memory.]

3.2 Examples of attaching L3 caches to MCPs (2)

Exclusive L3 cache

Shared L3 cache for all L2 cache banks, on-chip L3 tags/contr., off-chip data: POWER5 (2005), UltraSPARC IV+ (2005)

[Block diagrams. POWER5 (2005): three L2 modules; Fabric Bus Contr.; L3 tags/contr.; L3 data; memory contr.; memory. UltraSPARC IV+ (2005): two cores; L2; L3 tags/contr.; L3 data; interconn. network; mem. contr.; memory; syst. if.; Fire Plane bus.]


4. Connecting memory and I/O

4.1 Overview

Layout of the I/O and memory architecture in dual/multi-core processors:

Connection policy of I/O and memory

Integration of the memory controller to the processor chip

4.2 Connection policy (1)

Connection policy of I/O and memory

Connecting both I/O and memory via the system bus: PA-8800 (2004), PA-8900 (2005), Smithfield (2005), Presler (2005), Yonah Duo (2006), Core (2006), Montecito (2006?)

Dedicated connection of I/O and memory

Asymmetric connection of I/O and memory: POWER4 (2001), UltraSPARC T1 (2005)

Symmetric connection of I/O and memory: UltraSPARC IV (2004), UltraSPARC IV+ (2005), POWER5 (2005), Athlon 64 X2 (2005)

4.2 Connection policy (2)

Connecting both I/O and memory via the system bus

Examples: Smithfield/Presler (2005/2005), Yonah Duo/Core (2006/2006), Montecito (2006), PA-8800 (2004), PA-8900 (2005)

[Block diagrams of the examples: cores; L2 caches (L2 I/L2 D and L3 in Montecito); L2 contr.; syst. bus if.; FSB.]

4.2 Connection policy (3)

Dedicated connection of I/O and memory

Asymmetric connection of I/O and memory (connecting I/O via the internal interconnection network, and memory via the L2/L3 cache): POWER4 (2001), UltraSPARC T1 (2005)

Symmetric connection of I/O and memory (connecting both I/O and memory via the internal interconnection network): UltraSPARC IV (2004), UltraSPARC IV+ (2005), POWER5 (2005), Athlon 64 X2 (2005)

4.2 Connection policy (4)

Asymmetric connection of I/O and memory

Examples: POWER4 (2001), UltraSPARC T1 (2005)

[Block diagrams. UltraSPARC T1 (2005): Core 0 to Core 7; Xbar; four L2 banks, each with its own M. contr. and memory; bus if.; JBus. POWER4 (2001): three L2 modules; Fabric Bus Contr.; chip-to-chip/mem.-to-mem. interconn.; L3 dir./contr.; L3 data; mem. contr.; memory; GX contr.; GX-bus.]


4.2 Connection policy (6)

Symmetric connection of I/O and memory (1)

Examples: UltraSPARC IV (2004), POWER5 (2005)

[Block diagrams. UltraSPARC IV (2004): two cores, each with L2 tags/contr. and L2 data; interconn. network; mem. contr.; memory; syst. if.; Fire Plane bus. POWER5 (2005): three L2 modules; Fabric Bus Contr.; chip-chip/mem.-mem. interconn.; L3; mem. contr.; memory; GX contr.; GX bus.]

4.2 Connection policy (7)

Symmetric connection of I/O and memory (2)

Examples: Athlon 64 X2 (2005), UltraSPARC IV+ (2005)

[Block diagrams. Athlon 64 X2 (2005): two L2 caches; System Request Queue; Xbar; mem. contr.; memory; HT-bus contr.; HT-bus. UltraSPARC IV+ (2005): two cores; L2; L3 tags/contr.; L3 data; interconn. network; mem. contr.; memory; syst. if.; Fire Plane bus.]

Integration of the memory controller to the processor chip

Off-chip memory controller

On-chip memory controller

POWER4 (2001)

UltraSPARC IV+ (2005)

POWER5 (2005)

Montecito (2006?)

UltraSPARC T1 (2005)

UltraSPARC IV (2004)

Athlon 64 X2 (2005)

Presler (2005)

Smithfield (2005)

PA-8800 (2004)

PA-8900 (2005)

Core (2006)

Yonah Duo (2006)

Expected trend

4.3 Integration of the memory controller to the processor chip

5. Case examples

5.1 Intel MCPs (1)

[Roadmap chart: the move to Intel multi-core by platform (Itanium® processor, MP Server, DP Server/WS, Desktop Client, Mobile Client) over 2005, 2006 and 2007+.]

All products and dates are preliminary and subject to change without notice.

Refer to ‘fact sheet’ for specific product timings

Figure 5.1: The move to Intel multi-core

Source: A. Loktu: Itanium 2 for Enterprise Computing http://h40132.www4.hp.com/upload/se/sv/Itanium2forenterprisecomputing.pps

5.1 Intel MCPs (2)

Figure 5.2: Processor specifications of Intel’s Pentium D family (90 nm)

Source: http://www.intel.com/products/processor/index.htm

EIST: Enhanced Intel SpeedStep Technology

First delivered in Intel’s mobile and server platforms, it allows the system to dynamically adjust processor voltage and core frequency, which can result in decreased average power consumption and decreased average heat production.
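Why lowering voltage and frequency together pays off can be seen from the usual first-order model of CMOS switching power. The model and the 20 % scaling figures below are illustrative assumptions, not data from the slides; $\alpha$ is the activity factor, $C$ the switched capacitance, $V$ the supply voltage and $f$ the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P'_{\text{dyn}}}{P_{\text{dyn}}}
  = \left(\frac{V'}{V}\right)^{2} \frac{f'}{f}
  = 0.8^{2} \times 0.8 = 0.512
\quad \text{for } V' = 0.8\,V,\; f' = 0.8\,f
```

Under these assumptions a 20 % reduction of both voltage and frequency roughly halves the dynamic power, at the cost of 20 % lower clock speed.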

VT: Virtualization Technology

It is a set of hardware enhancements to Intel’s server and client platforms that can improve the performance and robustness of traditional software-based virtualization solutions.

Virtualization solutions allow a platform to run multiple operating systems and applications in independent partitions. Using virtualization capabilities, one computer system can function as multiple "virtual" systems.

ED: Execute Disable Bit

Malicious buffer overflow attacks pose a significant security threat. In a typical attack, a malicious worm creates a flood of code that overwhelms the processor, allowing the worm to propagate itself to the network and to other computers. The Execute Disable Bit can help prevent certain classes of malicious buffer overflow attacks when combined with a supporting operating system.

The Execute Disable Bit allows the processor to classify areas in memory by where application code can execute and where it cannot. When a malicious worm attempts to insert code in the buffer, the processor disables code execution, preventing damage and worm propagation.
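The idea of classifying memory areas by whether code may execute from them can be shown with a toy model. This is only an illustration, not how a real processor or operating system implements the feature; the class `ToyMemory` and its methods are hypothetical names.

```python
class ToyMemory:
    """Toy model of a per-page execute permission (not a real MMU)."""

    def __init__(self):
        self.executable = {}   # page number -> True if code may run from it

    def map_page(self, page, executable):
        self.executable[page] = executable

    def fetch_instruction(self, page):
        # Refuse instruction fetches from pages not marked executable.
        if not self.executable.get(page, False):
            raise PermissionError(f"page {page} is marked execute-disabled")
        return f"executing from page {page}"


memory = ToyMemory()
memory.map_page(0, executable=True)    # application code
memory.map_page(1, executable=False)   # data buffer or stack

print(memory.fetch_instruction(0))
try:
    memory.fetch_instruction(1)        # code injected into a data buffer
except PermissionError as err:
    print("blocked:", err)
```

The operating system decides which pages are marked as data-only, which is why the slide notes that a supporting operating system is required.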

5.1 Intel MCPs (3)

5.1 Intel MCPs (4)

Figure 5.3: Processor specifications of Intel’s Pentium D family (65 nm)

Source: http://www.intel.com/products/processor/index.htm

5.1 Intel MCPs (5)

Figure 5.4: Specifications of Intel’s Pentium Processor Extreme Edition models 840/955/965

Source: http://www.intel.com/products/processor/index.htm

5.1 Intel MCPs (6)

Figure 5.5: Processor specifications of Intel’s Yonah Duo (Core Duo) family

Source: http://www.intel.com/products/processor/index.htm

Source: http://www.intel.com/products/processor_number/chart/core2duo.htm

5.1 Intel MCPs (7)

Figure 5.6: Specifications of Intel’s Core Processors

5.1 Intel MCPs (8)

Category | Code Name | Cores | Cache | Market
Desktop | Kentsfield | Dual core, multi-die | 4 MB | Mid 2007
Desktop | Conroe | Dual core, single die | 4 MB shared | End 2006
Desktop | Allendale | Dual core, single die | 2 MB shared | End 2006
Desktop | Cedar Mill (NetBurst/P4) | Single core | 512 kB, 1 MB, 2 MB | Early 2006
Desktop | Presler (NetBurst/P4) | Dual core, dual die | 4 MB | Early 2006
Desktop/Mobile | Millville | Single core | 1 MB | Early 2007
Mobile | Yonah2 | Dual core, single die | 2 MB | Early 2006
Mobile | Yonah1 | Single core | 1/2 MB | Mid 2006
Mobile | Stealey | Single core | 512 kB | Mid 2007
Mobile | Merom | Dual core, single die | 2/4 MB shared | End 2006
Enterprise | Sossaman | Dual core, single die | 2 MB | Early 2006
Enterprise | Woodcrest | Dual core, single die | 4 MB | Mid 2006
Enterprise | Clovertown | Quad core, multi-die | 4 MB | Mid 2007
Enterprise | Dempsey (NetBurst/Xeon) | Dual core, dual die | 4 MB | Mid 2006
Enterprise | Tulsa | Dual core, single die | 4/8/16 MB | End 2006
Enterprise | Whitefield | Quad core, single die | 8 MB, 16 MB shared | Early 2008

Figure 5.7: Future 65 nm processors (overview)

Source: P. Schmid: Top Secret Intel Processor Plans Uncovered www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered

5.1 Intel MCPs (9)

Category | Codename | Cores | Cache | Market
Desktop | Wolfdale | Dual core, single die | 3 MB shared | 2008
Desktop | Ridgefield | Dual core, single die | 6 MB shared | 2008
Desktop | Yorkfield | 8 cores, multi-die | 12 MB shared | 2008+
Desktop | Bloomfield | Quad core, single die | - | 2008+
Desktop/Mobile | Perryville | Single core | 2 MB | 2008
Mobile | Penryn | Dual core, single die | 3 MB, 6 MB shared | 2008
Mobile | Silverthorne | - | - | 2008+
Enterprise | Harpertown | 8 cores, multi-die | 12 MB shared | 2008

Figure 5.8: Future 45 nm processors (overview)

Source: P. Schmid: Top Secret Intel Processor Plans Uncovered www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered

5.2 Athlon 64 X2

Figure 5.9: AMD Athlon 64 X2 dual-core processor architecture

Source: AMD Athlon 64 X2 Dual-Core Processor for Desktop – Key Architecture Features, http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_9485_13041.00.html

5.3 Sun’s UltraSPARC IV/IV+ (1)

Figure 5.10: UltraSPARC IV (Jaguar)

Source: C. Boussard: Architecture des processeurs, http://laser.igh.cnrs.fr/IMG/pdf/SUN-CNRS-archi-cpu-3.pdf

ARB: Arbiter

5.3 Sun’s UltraSPARC IV/IV+ (2)

Figure 5.11: UltraSPARC IV+ (Panther)

Source: C. Boussard: Architecture des processeurs, http://laser.igh.cnrs.fr/IMG/pdf/SUN-CNRS-archi-cpu-3.pdf

5.4 POWER4/POWER5 (1)

Figure 5.12: POWER4 chip logical view

Source: J.M. Tendler, S. Dodson, S. Fields, H. Le, B. Sinharoy: Power4 System Microarchitecture, IBM Server, Technical White Paper, October 2001

http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power4.pdf

Built-In Self-Test

Service Processor

Power On Reset

Core Interface Unit (crossbar)

Non-Cacheable Unit

Multi-Chip Module

5.4 POWER4/POWER5 (2)

Figure 5.13: POWER4 chip

Source: R. Kalla, B. Sinharoy, J. Tendler: Simultaneous Multi-threading Implementation in Power5 – IBM’s Next Generation POWER Microprocessor, 2003

http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf

5.4 POWER4/POWER5 (3)

Figure 5.14: POWER4 and POWER5 system structures

Source: R. Kalla, B. Sinharoy, J.M. Tendler: IBM Power5 chip: A Dual-core multithreaded Processor, IEEE Micro, Vol. 24, No. 2, March-April 2004, pp. 40-47.

Fabric Controller

5.5 Cell (1)

Figure 5.15: Cell (BE) microarchitecture

Source: IBM: „Cell Broadband Engine™ processor – based systems”, IBM corp. 2006

SPE: Synergistic Processing Element

EIB: Element Interconnect Bus

MFC: Memory Flow Controller

PPE: Power Processing Element

AUC: Atomic Update Cache

5.5 Cell (2)

Figure 5.16: Cell SPE architecture

Source: Blachford N.: „Cell Architecture Explained Version 2”, http://www.blachford.info/computer/Cell/Cell1_v2.html

5.5 Cell (3)

Figure 5.17: Cell floorplan

Source: Blachford N.: „Cell Architecture Explained Version 2”, http://www.blachford.info/computer/Cell/Cell1_v2.html