BULLDOZER: AN APPROACH TO MULTITHREADED
COMPUTE PERFORMANCE
Michael Butler, Leslie Barnes, Debjit Das Sarma, Bob Gelinas, Advanced Micro Devices
IEEE Micro Electronics Conference, Aug. 2011, Volume 31, pp. 6-15
Presented By: Vikram Nunia (2011H103021H), ME (CS), 1st Yr.
ABSTRACT
• AMD’s Bulldozer module represents a new direction in
microarchitecture.
• This article discusses the module’s multithreading
architecture, power-efficient microarchitecture, and
subblocks, including the various microarchitectural
latencies, bandwidths, and structure sizes.
Introduction
• Advanced Micro Devices’ Bulldozer module is the core building block for future AMD client and server systems on a chip (SoCs).
• Future SoCs would always support multiple execution threads.
• The core would always operate in a power-constrained environment.
• The module employs various power reduction techniques, such as filtering, speculation reduction, and data movement minimization, to produce an inherently power-efficient design.
Block Diagram of AMD Bulldozer
Key Features and Motivation
• Multithreading architecture.
• Dynamic power management.
• Decoupled branch-prediction and instruction-fetch pipelines.
• Register renaming and operand delivery.
• FMAC and media extensions.
Multithreading Architecture
Dynamic Power Management
• PRF-based renaming microarchitecture.
• Macro-instruction fusing capability.
• Actively monitoring and throttling power enables average application power to be closer to TDP, with a corresponding performance increase.
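The last bullet can be illustrated with a toy frequency governor. This is only a sketch under loud assumptions: the function name `pick_frequency`, the frequency steps, the 100 W TDP, and the linear power model (power proportional to activity times frequency, ignoring voltage) are all hypothetical and are not taken from the Bulldozer paper.

```python
def pick_frequency(activity, freqs_ghz, tdp_watts, watts_per_ghz):
    """Return the highest frequency whose estimated power fits within TDP.

    activity      -- measured switching activity in [0, 1] (hypothetical monitor output)
    watts_per_ghz -- assumed power per GHz at full activity (made-up constant)
    """
    for f in sorted(freqs_ghz, reverse=True):
        estimated_power = activity * f * watts_per_ghz
        if estimated_power <= tdp_watts:
            return f
    # Nothing fits: fall back to the lowest step (maximum throttling).
    return min(freqs_ghz)


# Hypothetical frequency steps and budget, for illustration only.
STEPS = [2.0, 2.5, 3.0, 3.5]
heavy = pick_frequency(1.0, STEPS, tdp_watts=100, watts_per_ghz=40)
light = pick_frequency(0.7, STEPS, tdp_watts=100, watts_per_ghz=40)
print(heavy, light)
```

In this model a fully active workload is held at the 2.5 GHz step, while a lighter workload is allowed up to 3.5 GHz because its estimated power still fits in the budget. That is the sense in which monitoring lets average application power sit closer to TDP.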
Decoupled Branch Prediction and Instruction-Fetch Pipelines
Register Renaming and Operand Delivery
• Uses a PRF-based renaming microarchitecture instead of distributed reservation stations.
• This physically separates dependency tracking (wakeup) from data storage, easing timing pressure and allowing better scaling to larger scheduler queue sizes.
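The separation of dependency tracking from data storage can be sketched with a toy renamer. This is a minimal illustration and not AMD's design: the class `RenameTable`, its method names, and the register counts are hypothetical. Values live only in the physical register file (`prf`), while the scheduler side only needs tags and ready bits.

```python
class RenameTable:
    """Toy PRF-based renamer: architectural registers map to physical tags."""

    def __init__(self, num_arch, num_phys):
        # Architectural register i starts out mapped to physical register i.
        self.map = list(range(num_arch))
        self.free = list(range(num_arch, num_phys))
        self.prf = [0] * num_phys        # data storage: a single copy of each value
        self.ready = [True] * num_phys   # dependency tracking: ready bits only

    def rename(self, dst, srcs):
        """Rename one instruction; return (new_tag, source_tags, old_tag)."""
        source_tags = [self.map[s] for s in srcs]  # read sources before remapping dst
        if not self.free:
            raise RuntimeError("no free physical register; dispatch would stall")
        new_tag = self.free.pop(0)
        old_tag = self.map[dst]
        self.map[dst] = new_tag
        self.ready[new_tag] = False      # consumers wait on this tag, not on data
        return new_tag, source_tags, old_tag

    def writeback(self, tag, value):
        """Store the result in the PRF and wake up dependents via the ready bit."""
        self.prf[tag] = value
        self.ready[tag] = True

    def retire(self, old_tag):
        """The previous mapping of the destination can be reused after retirement."""
        self.free.append(old_tag)


rt = RenameTable(num_arch=4, num_phys=8)
first = rt.rename(dst=1, srcs=[2, 3])    # r1 = r2 op r3
second = rt.rename(dst=1, srcs=[1, 2])   # r1 = r1 op r2, depends on the first result
print(first, second)
```

The second instruction picks up the tag allocated by the first as its source, so the dependency is carried by a small tag and a ready bit rather than by a copy of the data held in a reservation station.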
FMAC and Media Extensions
Bulldozer implements a significant extension to the x86 architecture that introduces a set of three-source-operand, nondestructive instructions, including floating-point multiply-accumulate (FMAC), executed on FMAC units of 128 bits each.
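A fused multiply-accumulate computes a*b + c with one rounding at the end, and the nondestructive three-source form leaves a, b, and c intact while writing a separate destination. The sketch below emulates that behavior in software with exact rational arithmetic. It does not execute the hardware instructions, and the function names are illustrative assumptions.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate a*b + c with a single rounding, as a fused unit would produce."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact, no intermediate rounding
    return float(exact)                              # one rounding to double precision


def unfused_multiply_add(a, b, c):
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


# Inputs chosen so that the exact product 1 - 2**-54 is not representable as a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

fused = fused_multiply_add(a, b, c)      # -2**-54: the low-order bits survive
unfused = unfused_multiply_add(a, b, c)  # 0.0: the product was rounded to 1.0 first
print(fused, unfused)
```

The three sources are unchanged after both calls, which mirrors the nondestructive operand form; the difference between the two results is purely the extra rounding step in the unfused path.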
Functional Block Highlights
• Branch Prediction
• Instruction Cache
• Decode
• Integer Scheduler and Execution
• Load/Store
• Floating Point
• L2 Cache
Conclusion
The initial AMD products built with the Bulldozer module will be desktop and server SoCs. These SoCs are drop-in replacements for AMD’s existing SoCs and deliver a significant performance improvement in the same power envelope as the company’s existing products.
Thanks!!
Any Questions??