
Bulldozer(1)

Date post: 24-Oct-2014
Category:
Upload: montyji30
BULLDOZER: AN APPROACH TO MULTITHREADED COMPUTE PERFORMANCE. Michael Butler, Leslie Barnes, Debjit Das Sarma, Bob Gelinas, Advanced Micro Devices. IEEE Micro Electronics Conference – Aug. 2011, Volume 31, pp. 6-15. Presented By: Vikram Nunia (2011H103021H), ME (CS) 1st Yr.
Transcript
Page 1: Bulldozer(1)

BULLDOZER: AN APPROACH TO MULTITHREADED COMPUTE PERFORMANCE

Michael Butler, Leslie Barnes, Debjit Das Sarma, Bob Gelinas, Advanced Micro Devices

IEEE Micro Electronics Conference – Aug. 2011, Volume 31, pp. 6-15

Presented By: Vikram Nunia (2011H103021H), ME (CS) 1st Yr.

Page 2: Bulldozer(1)

ABSTRACT

• AMD’s Bulldozer module represents a new direction in microarchitecture.

• This article discusses the module’s multithreading architecture, power-efficient microarchitecture, and subblocks, including the various microarchitectural latencies, bandwidths, and structure sizes.

Page 3: Bulldozer(1)

Introduction

• Advanced Micro Devices’ Bulldozer module is the core building block for future AMD client and server systems on a chip (SoCs).
• Design premise: future SoCs would always support multiple execution threads.
• Design premise: the core would always operate in a power-constrained environment.
• The module employs various power reduction techniques (such as filtering, speculation reduction, and data movement minimization) to produce an inherently power-efficient design.

Page 4: Bulldozer(1)

Block Diagram of AMD Bulldozer

Page 5: Bulldozer(1)

Key Features and Motivation

• Multithreading architecture.
• Dynamic power management.
• Decoupled branch-prediction and instruction-fetch pipelines.
• Register renaming and operand delivery.
• FMAC and media extensions.

Page 6: Bulldozer(1)

Multithreading Architecture

Page 7: Bulldozer(1)

Dynamic Power Management

• PRF-based renaming microarchitecture.
• Macro-instruction fusing capability.
• Actively monitoring and throttling power enables average application power to be closer to TDP, with a corresponding performance increase.
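To make the monitor-and-throttle idea concrete, here is a minimal sketch of a feedback loop that raises clock frequency while estimated power stays within TDP and lowers it otherwise. Everything in it is an assumption for illustration: the function names, the cubic power model, the constants, and the 0.1 GHz step are hypothetical and do not describe AMD's actual mechanism.

```python
def estimate_power(freq_ghz, activity):
    """Toy power model (assumed): scales with activity and with frequency cubed."""
    return activity * 4.0 * freq_ghz ** 3


def step_frequency(freq_ghz, activity, tdp_watts, step=0.1, f_min=1.0, f_max=4.0):
    """One control step: throttle if over TDP, boost if the next step still fits."""
    if estimate_power(freq_ghz, activity) > tdp_watts:
        return max(f_min, round(freq_ghz - step, 1))
    if estimate_power(freq_ghz + step, activity) <= tdp_watts:
        return min(f_max, round(freq_ghz + step, 1))
    return freq_ghz


def settle(activity, tdp_watts, freq_ghz=3.0, iterations=50):
    """Run the control loop until the frequency stops changing (or we give up)."""
    for _ in range(iterations):
        freq_ghz = step_frequency(freq_ghz, activity, tdp_watts)
    return freq_ghz


heavy = settle(activity=1.0, tdp_watts=125.0)  # power-hungry workload
light = settle(activity=0.5, tdp_watts=125.0)  # lighter workload gets more headroom
```

In this toy model the lighter workload settles at a higher frequency than the heavy one, while both stay at or below the TDP budget; that is the sense in which average power moves closer to TDP with a corresponding performance gain.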

Page 8: Bulldozer(1)

Decoupled Branch Prediction and Instruction-Fetch Pipelines

Page 9: Bulldozer(1)

Register Renaming and Operand Delivery

• PRF-based renaming microarchitecture instead of distributed reservation stations.
• It can physically separate dependency tracking (wake up) from data storage, easing timing pressure and allowing better scaling to larger scheduler queue sizes.
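The separation of dependency tracking from data storage can be sketched in a few lines. The model below is a simplified illustration, not Bulldozer's actual design: the class name `Renamer`, the register counts, and the method names are hypothetical, and freeing physical registers at retirement is omitted for brevity.

```python
from collections import deque


class Renamer:
    """Toy physical-register-file (PRF) renamer."""

    def __init__(self, num_arch, num_phys):
        self.map = list(range(num_arch))              # arch reg -> phys reg
        self.free = deque(range(num_arch, num_phys))  # unallocated phys regs
        self.prf = [0] * num_phys                     # data storage (values live here only)
        self.ready = [True] * num_phys                # dependency tracking (wake-up bits)

    def rename(self, dest, srcs):
        """Map sources to physical tags, then allocate a fresh tag for dest."""
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; the front end would stall")
        new_tag = self.free.popleft()
        self.ready[new_tag] = False
        self.map[dest] = new_tag
        return new_tag, phys_srcs

    def can_issue(self, phys_srcs):
        """The scheduler checks only ready bits; it never holds operand data."""
        return all(self.ready[p] for p in phys_srcs)

    def writeback(self, tag, value):
        """Store the result in the PRF and wake up dependents."""
        self.prf[tag] = value
        self.ready[tag] = True


r = Renamer(num_arch=4, num_phys=8)
d1, s1 = r.rename(dest=1, srcs=[2, 3])  # r1 = r2 op r3
d2, s2 = r.rename(dest=2, srcs=[1, 1])  # r2 = r1 op r1, depends on d1
blocked = not r.can_issue(s2)
r.writeback(d1, 7)
woken = r.can_issue(s2)
```

Because scheduler entries in this sketch carry only tags and ready bits, growing the scheduler queue does not require widening any data path, which is the scaling benefit the slide refers to.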

Page 10: Bulldozer(1)

FMAC and Media Extensions

The module implements a significant extension to the x86 architecture: a set of three-source-operand, nondestructive instructions, including floating-point multiply-accumulate (FMAC), executed on 128-bit-wide FMAC units.
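The following sketch illustrates two properties of such instructions: a fused multiply-accumulate rounds once rather than twice, and a nondestructive form writes its result to a separate destination so no source is overwritten. The register names and helper functions are hypothetical, and the single rounding is emulated with exact rational arithmetic rather than real hardware behavior.

```python
from fractions import Fraction


def fma(a, b, c):
    """Compute a*b + c exactly, then round to a double once (emulated fusion)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fmac_nondestructive(regs, dest, src1, src2, src3):
    """dest = src1 * src2 + src3; all three sources keep their values."""
    regs[dest] = fma(regs[src1], regs[src2], regs[src3])


def fmac_destructive(regs, acc, src1, src2):
    """acc = src1 * src2 + acc; the accumulator source is overwritten."""
    regs[acc] = fma(regs[src1], regs[src2], regs[acc])


# Hypothetical register file; the values are chosen so the product needs
# more precision than a double can hold.
regs = {"x0": 1 + 2.0 ** -30, "x1": 1 - 2.0 ** -30, "x2": -1.0, "x3": 0.0}

unfused = regs["x0"] * regs["x1"] + regs["x2"]    # product rounded, then sum rounded
fmac_nondestructive(regs, "x3", "x0", "x1", "x2")  # single rounding at the end
fused = regs["x3"]
```

Here the separate multiply rounds the product to exactly 1.0, so the unfused result loses the small residual that the fused form preserves, and `x2` still holds its original value afterwards.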

Page 11: Bulldozer(1)

Functional Block Highlights

• Branch Prediction
• Instruction Cache
• Decode
• Integer Scheduler and Execution
• Load/Store
• Floating Point
• L2 Cache

Page 12: Bulldozer(1)

Conclusion

The initial AMD products built with the Bulldozer module will be desktop and server SoCs. These SoCs are drop-in replacements for AMD’s existing SoCs and deliver a significant performance improvement in the same power envelope as the company’s existing products.

Page 13: Bulldozer(1)

Thanks!!

Any Questions??

