Source: Rochester Institute of Technology, meseec.ce.rit.edu/551-projects/fall2013/4-1.pdf

Power 7 Dan Christiani Kyle Wieschowski

History 1980 - 2000

● 1980 RISC Prototype
● 1990 POWER1 (Performance Optimization With Enhanced RISC) (1 um)
● 1993 IBM launches 66 MHz POWER2 (.35 um)
● 1997 POWER2 ‘Super Chip’
● 1998 POWER3 (.22 um) 64-bit (POWER2+PowerPC)

History 2000-2007

● 2001 POWER4 (180 nm) - Dual Core
● 2004 POWER5 (130 nm) - SMT
● 2006 POWER6 (65 nm) - High Frequency
    ○ 4.7 GHz - Dual Core
    ○ First server to hold all major benchmark records
    ○ 3x faster than the comparable Intel Itanium processor
● 2010 POWER7 (45 nm) - Cores, eDRAM

Power 7 Architectural Focus

● Reduce core area and power
    ○ Frequency is lowered to reduce power
● Fit the chip in the same sockets as POWER6
● Utilize the same SMP and I/O buses
    ○ At higher frequencies
● Remove external L3 cache chips
● Double floating-point capability of each core

Architecture Overview

● 8 Cores
    ○ 12 execution units
    ○ Four-way SMT
    ○ Integrated L2 cache
● 2 Memory Controllers
    ○ 4 channels of DDR3
● Shared L3 Cache
● 5 SMP Links
    ○ Allows 32 sockets
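
The core, SMT, and socket counts above multiply into a hardware thread count. A minimal sketch of that arithmetic, using only the figures on this slide; the constant and function names are illustrative, not part of any IBM tool:

```python
# Figures from the slide: 8 cores per chip, four-way SMT, up to 32 sockets.
CORES_PER_CHIP = 8
SMT_WAYS = 4
MAX_SOCKETS = 32


def hardware_threads(sockets, cores_per_chip=CORES_PER_CHIP, smt_ways=SMT_WAYS):
    """Return the number of hardware threads across `sockets` chips."""
    return sockets * cores_per_chip * smt_ways


threads_per_chip = hardware_threads(1)
threads_max_system = hardware_threads(MAX_SOCKETS)
print(threads_per_chip, threads_max_system)
```

With the slide's numbers this gives 32 threads per chip and 1024 threads in a fully populated 32-socket system.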

The Core

● 6 Primary Units
    ○ IFU, ISU, LSU, FXU, VSU, and decimal FPU
● 12 Execution Units
    ○ 2 fixed point, 2 load/store, 4 double-precision floating point, 1 vector, 1 branch, 1 decimal FP, 1 control register
● In a given cycle:
    ○ Fetch up to 8 instructions
    ○ Decode and dispatch up to 6 instructions
    ○ Issue and execute up to 8 instructions
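
As a quick consistency check on the numbers above, the sketch below tallies the listed execution units and takes the narrowest per-cycle width as a rough ceiling on sustained throughput. The dictionary and function names are illustrative, and the "narrowest stage" bound is a simplifying assumption, not a figure from the slides:

```python
# Unit counts and per-cycle widths as listed on the slide.
EXECUTION_UNITS = {
    "fixed_point": 2,
    "load_store": 2,
    "double_precision_fp": 4,
    "vector": 1,
    "branch": 1,
    "decimal_fp": 1,
    "control_register": 1,
}
PER_CYCLE_WIDTH = {"fetch": 8, "dispatch": 6, "issue": 8}


def total_execution_units(units=EXECUTION_UNITS):
    """Sum the per-type unit counts."""
    return sum(units.values())


def sustained_ipc_bound(widths=PER_CYCLE_WIDTH):
    """Upper bound on long-run instructions per cycle.

    Assumption: every instruction passes through every stage, so the
    narrowest stage caps the average rate even if later stages can burst.
    """
    return min(widths.values())


print(total_execution_units(), sustained_ipc_bound())
```

The tally matches the 12 execution units on the slide, and under this assumption the 6-wide dispatch stage is the limiting width.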

Instruction Fetch Unit (IFU)

● Feeds pipeline with most likely instructions
    ○ Based on branch prediction
● Maintains balance of instruction execution
    ○ Based on software-defined thread priority
● Decodes and groups instructions
● Executes branch instructions

Instruction-Sequencing Unit (ISU)

● Dispatches instructions
    ○ As groups to a single thread
● Renames registers
● Completes instructions
    ○ Global Completion Table
    ○ As groups also
● Handles exception conditions
● In charge of flushing core
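
Group dispatch and group completion can be illustrated with a toy model. This is a sketch under stated assumptions, not IBM's design: the class name, method names, and the `capacity` parameter are hypothetical. It only shows the idea that instructions in a group may finish out of order while groups retire oldest-first, and that a flush discards a group and everything younger.

```python
from collections import OrderedDict


class GlobalCompletionTable:
    """Toy completion table: groups retire in dispatch order."""

    def __init__(self, capacity):
        self.capacity = capacity      # hypothetical number of group entries
        self.groups = OrderedDict()   # group id -> set of unfinished instructions
        self.next_id = 0

    def dispatch(self, instructions):
        """Allocate an entry for a group; return None if the table is full."""
        if len(self.groups) >= self.capacity:
            return None               # dispatch would stall
        gid = self.next_id
        self.next_id += 1
        self.groups[gid] = set(instructions)
        return gid

    def finish(self, gid, instruction):
        """Mark one instruction of a group as executed (any order)."""
        self.groups[gid].discard(instruction)

    def complete(self):
        """Retire the oldest groups whose instructions have all finished."""
        completed = []
        while self.groups:
            gid, pending = next(iter(self.groups.items()))
            if pending:
                break                 # oldest group not done; younger ones wait
            self.groups.popitem(last=False)
            completed.append(gid)
        return completed

    def flush_from(self, gid):
        """Discard a group and every younger group, e.g. on an exception."""
        for g in [g for g in self.groups if g >= gid]:
            del self.groups[g]


gct = GlobalCompletionTable(capacity=4)
older = gct.dispatch(["load", "add"])
younger = gct.dispatch(["mul"])
gct.finish(younger, "mul")
print(gct.complete())                 # younger group waits for the older one
gct.finish(older, "load")
gct.finish(older, "add")
print(gct.complete())                 # both groups retire, oldest first
```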

Load/Store Unit (LSU)

● 2 symmetric load/store execution pipelines (out-of-order)
    ○ 1 load or store operation each
● Dependencies:
    ○ 1 stall cycle between a load and a dependent FXU operation
    ○ 2 stall cycles between a load and a dependent VSU operation
● Also executes FX add and logical instructions
● SRQ (store reorder queue) - 32 outstanding stores can be issued
● LRQ (load reorder queue) - 32 outstanding loads can be issued
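
The load-to-use penalties above can be turned into a rough bubble count for a sequence of dependent operations. This is a back-of-the-envelope sketch: the function name is illustrative, and it assumes each consumer directly follows the load it depends on with no independent work available to hide the stall, which out-of-order execution and SMT would often provide in practice.

```python
# Stall cycles between a load and a dependent operation, from the slide.
LOAD_USE_STALLS = {"FXU": 1, "VSU": 2}


def load_use_bubbles(consumer_units):
    """Total stall cycles for consumers that each wait on a preceding load.

    `consumer_units` lists the unit ("FXU" or "VSU") of each dependent
    operation. Assumes no stall is hidden by other independent work.
    """
    return sum(LOAD_USE_STALLS[unit] for unit in consumer_units)


print(load_use_bubbles(["FXU", "VSU", "FXU"]))
```

For one VSU consumer and two FXU consumers this counts 4 stall cycles in the worst case.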

Fixed-Point Unit (FXU)

● Two identical pipelines
● Containing:
    ○ Multiport GPR file
    ○ ALU, Divider, and Multiplier
    ○ Rotator
    ○ Count leading zeros unit
    ○ Bit-select unit
    ○ Miscellaneous unit (to execute population count, parity, and binary-coded decimal assist instructions)

Vector and Scalar Unit (VSU)

● Vector instructions for
    ○ Vector modification, e.g. Merge, Shift
    ○ Load/Store
    ○ Arithmetic - no divide
    ○ Floating Point Arithmetic - no divide

Cache

● Private 32 KB Level 1 caches
    ○ Instruction Cache integrated with the IFU
    ○ Data Cache integrated with the LSU
● Private 256 KB Level 2 caches
    ○ 8-way set associative
● 32 MB Level 3 cache
    ○ 4 MB of Local L3 (comprised of 32 eDRAM macros)
    ○ 28 MB of Global L3
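
The cache sizes above fit together arithmetically, as the sketch below works through. The capacities, associativity, and macro count come from the slides; the 128-byte cache line is an assumption added here (it does not appear in the slides), and the names are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB

CORES = 8
L2_BYTES = 256 * KIB
L2_WAYS = 8
LINE_BYTES = 128            # assumption: line size is not given on the slide
LOCAL_L3_BYTES = 4 * MIB
EDRAM_MACROS_PER_LOCAL_L3 = 32


def num_sets(capacity_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return capacity_bytes // (ways * line_bytes)


l2_sets = num_sets(L2_BYTES, L2_WAYS, LINE_BYTES)
total_l3 = CORES * LOCAL_L3_BYTES               # one local region per core
global_l3 = total_l3 - LOCAL_L3_BYTES           # the other cores' regions
edram_macro_bytes = LOCAL_L3_BYTES // EDRAM_MACROS_PER_LOCAL_L3

print(l2_sets, total_l3 // MIB, global_l3 // MIB, edram_macro_bytes // KIB)
```

Eight 4 MB local regions account for the 32 MB total, and from any one core's view the remaining 28 MB is the global L3. Each eDRAM macro works out to 128 KB, and under the assumed line size the L2 has 256 sets.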

Memory Subsystem

● 2 Memory Controllers
    ○ Synchronous Region:
        ■ Services reads and writes
        ■ Arbitrates among conflicting requests
        ■ Manages coherence directory information
    ○ Asynchronous Region:
        ■ Manages traffic through channels/buffer chips
        ■ Schedules reads, writes, and maintenance
        ■ Balances utilization of resources

The Future of Power

POWER8 (mid-2014)

● 22 nm Design
● SMT8
● 12 Core:
    ○ 10 Issue
    ○ 16 Execution Pipes
        ■ 2 FXU, 2 LSU, 2 LU
        ■ 4 FPU, 2 VMX
        ■ 1 Crypto, 1 DFU
        ■ 1 CR, 1 BR
    ○ 64 KB L1, external 128 MB L3
    ○ 2x Estimated performance during max SMT

OpenPower

“The OpenPOWER Consortium brings together an ecosystem of hardware, system software, and enterprise applications that will provide powerful computing systems based on NVIDIA GPUs and POWER CPUs”

POWER8 + Infrastructure + CUDA = NextGen DataCenter

Questions?

