AMD OPTERON ARCHITECTURE

Date post: 25-Feb-2016
Upload: neona
Page 1: AMD OPTERON ARCHITECTURE

AMD OPTERON ARCHITECTURE

Omar Aragon, Abdel Salam Sayyad

This presentation is missing the references used

Page 2: AMD OPTERON ARCHITECTURE

Outline
• Features
• Block diagram
• Microarchitecture
• Pipeline
• Cache
• Memory controller
• HyperTransport
• InterCPU Connections

Page 3: AMD OPTERON ARCHITECTURE

Features
• 64-bit x86-based microprocessor
• On-chip double-data-rate (DDR) memory controller [low memory latency]
• Three HyperTransport links [connect to other devices without support chips]
• Out-of-order, superscalar processor
• Adds 64-bit (48-bit virtual and 40-bit physical) addressing and expands the number of registers
• Supports legacy 32-bit applications without modification or recompilation
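
The address widths quoted above translate directly into address-space sizes. The following is a minimal Python sketch of that arithmetic; the function and constant names are illustrative and do not come from the slides.

```python
# Address widths quoted on the Features slide.
VIRTUAL_BITS = 48
PHYSICAL_BITS = 40

TIB = 1 << 40  # one tebibyte, in bytes


def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


virtual_tib = address_space_bytes(VIRTUAL_BITS) // TIB
physical_tib = address_space_bytes(PHYSICAL_BITS) // TIB

print(f"virtual address space:  {virtual_tib} TiB")   # 2**48 bytes
print(f"physical address space: {physical_tib} TiB")  # 2**40 bytes
```

With these widths, the virtual space is 256 times larger than the physical space.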

Page 4: AMD OPTERON ARCHITECTURE

Features
• Doubles the number of registers
  • Integer general-purpose registers (GPRs): 16
  • Streaming SIMD Extensions (SSE) registers: 16
  • Satisfies the register allocation needs of more than 80% of the functions appearing in a typical program
• Connected to memory through an integrated memory controller
• High-performance I/O subsystem via the HyperTransport bus

Page 5: AMD OPTERON ARCHITECTURE

Block diagram

Page 6: AMD OPTERON ARCHITECTURE

Microarchitecture
• Works with fixed-length micro-ops and dispatches them into two independent schedulers: one for integer, and one for floating point and multimedia (MMX, 3DNow!, SSE, and SSE2)
• Load and store micro-ops go to the load/store unit
• Up to 11 micro-ops can be dispatched each cycle to the following execution resources:
  • Three integer execution units
  • Three address generation units
  • Three floating-point and multimedia units
  • Two load/store accesses to the data cache

Page 7: AMD OPTERON ARCHITECTURE

Microarchitecture

Page 8: AMD OPTERON ARCHITECTURE

Pipeline
• Long enough for high frequency and short enough for good IPC (instructions per cycle)
• Fully integrated from instruction fetch through DRAM access
• The execute pipeline is typically
  • 12 stages for integer
  • 17 stages for floating point
  • Data cache access occurs in stage 11
• On an L1 cache miss, the pipeline accesses the L2 cache and, in parallel, the request goes to the system request queue
• The pipeline in the DRAM controller runs at the same frequency as the core

Page 9: AMD OPTERON ARCHITECTURE

Pipeline

Page 10: AMD OPTERON ARCHITECTURE

Memory, Cache, and HyperTransport

Page 11: AMD OPTERON ARCHITECTURE

Cache
• Separate L1 instruction and data caches
  • Each is 64 Kbytes, 2-way set-associative, with a 64-byte cache line
• L2 cache (data and instructions)
  • Size: 1 Mbyte, 16-way set-associative
  • Uses a pseudo-least-recently-used (pseudo-LRU) replacement policy
• Independent L1 and L2 translation look-aside buffers (TLBs)
  • The L1 TLB is fully associative and stores thirty-two 4-Kbyte page translations and eight 2-Mbyte/4-Mbyte page translations
  • The L2 TLB is four-way set-associative with 512 4-Kbyte entries
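
The cache and TLB parameters above determine the number of sets, the address-bit split, and how much memory each TLB can map. The sketch below works through that arithmetic in Python. The helper name is illustrative, the 64-byte line size for the L2 cache is an assumption (the slide gives a line size only for L1), and the large-page entries are counted at 2 Mbytes each.

```python
KB = 1024
MB = 1024 * KB


def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count and address-bit split for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2, valid for powers of two
        "index_bits": sets.bit_length() - 1,
    }


# L1 instruction or data cache: 64 Kbytes, 2-way, 64-byte lines (from the slide).
l1 = cache_geometry(64 * KB, ways=2, line_bytes=64)

# L2 cache: 1 Mbyte, 16-way (from the slide); 64-byte lines are an ASSUMPTION.
l2 = cache_geometry(1 * MB, ways=16, line_bytes=64)

# TLB reach = number of entries * page size.
l1_tlb_reach_4k = 32 * 4 * KB      # thirty-two 4-Kbyte entries
l1_tlb_reach_large = 8 * 2 * MB    # eight large-page entries, taken as 2 Mbytes each
l2_tlb_reach_4k = 512 * 4 * KB     # 512 4-Kbyte entries

print("L1 cache:", l1)
print("L2 cache:", l2)
print("L1 TLB reach (4-Kbyte pages):", l1_tlb_reach_4k // KB, "Kbytes")
print("L1 TLB reach (2-Mbyte pages):", l1_tlb_reach_large // MB, "Mbytes")
print("L2 TLB reach (4-Kbyte pages):", l2_tlb_reach_4k // MB, "Mbytes")
```

Under these assumptions, an L1 cache has 512 sets (6 offset bits, 9 index bits) and the L2 cache has 1024 sets.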

Page 12: AMD OPTERON ARCHITECTURE

Onboard Memory Control
• 128-bit memory bus
• Latency reduced and bandwidth doubled
• Multiprocessor systems: each processor has its own memory interface and its own memory
• Available memory scales with the number of processors
• DDR SDRAM only
• Up to 8 registered DDR DIMMs per processor
• Memory bandwidth of up to 5.3 Gbytes/s per processor
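
The 5.3 Gbytes/s figure can be reproduced as bus width times transfer rate. The slide does not state the DRAM transfer rate; the sketch below assumes 333 million transfers per second (DDR333) because that value matches the quoted number. It also assumes that "bandwidth doubled" is measured against a 64-bit bus at the same rate. The function name is illustrative.

```python
def peak_bandwidth_gbytes(bus_width_bits: int, transfers_per_second: float) -> float:
    """Peak bandwidth in Gbytes/s (10**9 bytes/s): bytes per transfer * transfers/s."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


# ASSUMPTION: DDR333, i.e. 333 million transfers per second (not stated on the slide).
DDR_TRANSFERS_PER_SECOND = 333e6

mem_bw = peak_bandwidth_gbytes(128, DDR_TRANSFERS_PER_SECOND)    # 128-bit bus from the slide
narrow_bw = peak_bandwidth_gbytes(64, DDR_TRANSFERS_PER_SECOND)  # hypothetical 64-bit bus

print(f"128-bit bus: {mem_bw:.1f} Gbytes/s")
print(f" 64-bit bus: {narrow_bw:.1f} Gbytes/s")
```

Because each processor brings its own controller, the aggregate peak in a multiprocessor system would be this per-processor figure multiplied by the processor count.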

Page 13: AMD OPTERON ARCHITECTURE

HyperTransport
• Bidirectional, serial/parallel, scalable, high-bandwidth, low-latency bus
• Packet based
  • 32-bit words regardless of physical width
• Facilitates power management and low latencies

Page 14: AMD OPTERON ARCHITECTURE

HyperTransport in the Opteron
• 16 CAD HyperTransport (16 bits wide; CAD = Command, Address, Data)
  • Processor-to-processor and processor-to-chipset
  • Bandwidth of up to 6.4 GB/s (per HT port)
• 8-bit-wide HyperTransport for components such as ordinary I/O hubs
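
The per-port figure follows from link width and transfer rate. The sketch below assumes 1.6 billion transfers per second in each direction (an 800 MHz link clock with data on both clock edges) and assumes that the 6.4 GB/s figure counts both directions together; neither detail is stated on the slide. The function name is illustrative.

```python
def ht_bandwidth_gbytes(width_bits: int,
                        transfers_per_second: float = 1.6e9,  # ASSUMPTION: 800 MHz, double data rate
                        directions: int = 2) -> float:        # ASSUMPTION: both directions counted
    """Aggregate link bandwidth in GB/s (10**9 bytes/s)."""
    per_direction = width_bits / 8 * transfers_per_second / 1e9
    return per_direction * directions


wide_link = ht_bandwidth_gbytes(16)    # 16-bit link: processor-to-processor or chipset
narrow_link = ht_bandwidth_gbytes(8)   # 8-bit link: I/O hubs

print(f"16-bit link: {wide_link:.1f} GB/s")
print(f" 8-bit link: {narrow_link:.1f} GB/s")
```

Under the same assumptions, an 8-bit link carries half the bandwidth of a 16-bit link.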

Page 15: AMD OPTERON ARCHITECTURE

InterCPU Connections
• Multiple CPUs are connected through a proprietary extension running on additional HyperTransport interfaces
• Allows support of a cache-coherent, Non-Uniform Memory Access (NUMA), multi-CPU memory access protocol
• Non-Uniform Memory Access
  • Separate cache memory for each processor
  • Memory access time depends on memory location (i.e., local is faster than non-local)
• Cache coherence
  • Integrity of data stored in local caches of a shared resource
• Each CPU can access the main memory of another processor, transparently to the programmer
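
The NUMA behavior described above can be illustrated with a toy cost model: each CPU owns a range of physical memory, and an access pays extra for every link it crosses. Every number and name in this Python sketch is a hypothetical placeholder chosen for illustration, not a measured Opteron latency or a real memory map.

```python
# HYPOTHETICAL values, for illustration only.
LOCAL_LATENCY_NS = 100       # cost of reaching a CPU's own DRAM
HOP_LATENCY_NS = 50          # extra cost per HyperTransport hop
MEMORY_PER_NODE = 1 << 30    # 1 GiB attached to each CPU


def home_node(physical_address: int) -> int:
    """Node whose memory controller owns this address (contiguous range per node)."""
    return physical_address // MEMORY_PER_NODE


def access_latency_ns(cpu: int, physical_address: int, hops) -> int:
    """Latency seen by `cpu`; hops[a][b] is the number of links between nodes a and b."""
    return LOCAL_LATENCY_NS + hops[cpu][home_node(physical_address)] * HOP_LATENCY_NS


# Two CPUs joined by a single link.
HOPS = [[0, 1],
        [1, 0]]

local = access_latency_ns(0, 0x1000, HOPS)                     # address in node 0's range
remote = access_latency_ns(0, MEMORY_PER_NODE + 0x1000, HOPS)  # address in node 1's range

print(f"local access:  {local} ns")
print(f"remote access: {remote} ns")
```

The caller uses the same function for both accesses and only the cost differs, which mirrors the point that remote memory is reachable transparently to the programmer.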

