Cost/Performance Tradeoffs: a case study

Date post: 15-Jan-2016
Page 1: Cost/Performance Tradeoffs: a case study

6.004 –Fall 2002 10/08/02 L10 –Multiplier 1

Cost/Performance Tradeoffs: a case study

Digital Systems Architecture I.

Page 2: Cost/Performance Tradeoffs: a case study

Binary Multiplication

HARD PROBLEM: design a circuit to multiply BIG (32-bit, 64-bit) numbers

n bits × n bits → 2n bits, since $(2^n - 1)^2 < 2^{2n}$
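The width claim can be checked numerically. The sketch below assumes unsigned operands; the helper name `max_product_bits` is made up for illustration.

```python
def max_product_bits(n: int) -> int:
    """Bits needed to hold the largest product of two unsigned n-bit operands."""
    largest = (1 << n) - 1          # largest n-bit value, 2^n - 1
    return (largest * largest).bit_length()

for n in (1, 2, 3, 8, 32, 64):
    # The inequality from the slide: (2^n - 1)^2 < 2^(2n)
    assert (2**n - 1) ** 2 < 2 ** (2 * n)
    # So 2n result bits always suffice.
    assert max_product_bits(n) <= 2 * n
```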

Engineering Principle:

Exploit STRUCTURE in problem.

We can make big multipliers out of little ones!

EASY PROBLEM: design combinational circuit to multiply tiny (1-, 2-, 3-bit) operands...

Page 3: Cost/Performance Tradeoffs: a case study

Making a 2n-bit multiplier using n-bit multipliers

Given n-bit multipliers:

Synthesize 2n-bit multipliers:

Page 4: Cost/Performance Tradeoffs: a case study

Our Basis:

n=1: minimalist starting point. Multiplying two 1-bit numbers is pretty simple:

Of course, we could start with optimized combinational multipliers for larger operands; e.g. a 2-bit multiplier: the logic gets more complex, but some optimizations are possible...
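Both basis cases can be written out as gate-level logic and checked exhaustively. This is a sketch, not the circuit drawn on the slide: the function names `mul1` and `mul2` are illustrative, and the simplified equations for the upper product bits are one possible optimization.

```python
from itertools import product

def mul1(a: int, b: int) -> int:
    """1-bit by 1-bit multiplier: a single AND gate."""
    return a & b

def mul2(a: int, b: int) -> int:
    """2-bit by 2-bit multiplier as gate-level logic (4-bit result)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0
    p1 = (a1 & b0) ^ (a0 & b1)
    p2 = (a1 & b1) & (1 ^ (a0 & b0))   # a1*b1 AND NOT(a0*b0)
    p3 = a0 & a1 & b0 & b1             # set only for 3 * 3 = 9
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)

# Exhaustive check against ordinary integer multiplication.
for a, b in product(range(2), repeat=2):
    assert mul1(a, b) == a * b
for a, b in product(range(4), repeat=2):
    assert mul2(a, b) == a * b
```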

Page 5: Cost/Performance Tradeoffs: a case study

Our induction step:

2n-bit by 2n-bit multiplication:

1. Divide multiplicands into n-bit pieces
2. Form 2n-bit partial products, using n-bit by n-bit multipliers.
3. Align appropriately
4. Add.

REGROUP partial products – 2 additions rather than 3!

Induction: we can use the same structuring principle to build a 4n-bit multiplier from our newly-constructed 2n-bit ones...
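The induction step can be sketched recursively. The regrouping shown here is an assumption about what the slide's figure depicts: the low×low and high×high partial products occupy disjoint bit ranges, so they can be concatenated without an adder, leaving two additions for the cross terms. The function name `multiply` and the power-of-two width restriction are choices made for this sketch.

```python
import random

def multiply(a: int, b: int, width: int) -> int:
    """Multiply two unsigned `width`-bit numbers; width must be a power of two."""
    if width == 1:
        return a & b                      # basis: 1-bit multiplier (AND gate)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half      # step 1: divide into half-width pieces
    b_lo, b_hi = b & mask, b >> half
    ll = multiply(a_lo, b_lo, half)       # step 2: four partial products
    lh = multiply(a_lo, b_hi, half)
    hl = multiply(a_hi, b_lo, half)
    hh = multiply(a_hi, b_hi, half)
    # Regroup: ll fits in `width` bits and hh sits above it, so no adder is needed.
    outer = (hh << width) | ll
    # Steps 3 and 4: align the cross terms and add (two additions).
    return outer + (lh << half) + (hl << half)

for x in range(16):
    for y in range(16):
        assert multiply(x, y, 4) == x * y

rng = random.Random(0)
for _ in range(20):
    x, y = rng.getrandbits(32), rng.getrandbits(32)
    assert multiply(x, y, 32) == x * y
```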

Page 6: Cost/Performance Tradeoffs: a case study

Brick Wall view of partial products

Making 4n-bit multipliers from n-bit ones: 2 “induction steps”

Page 7: Cost/Performance Tradeoffs: a case study

Multiplier Cookbook: Chapter 1

Given problem:

Subassemblies:
• Partial Products
• Adders

Step 1: Form (& arrange) Partial Products:

Step 2: Sum
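The two cookbook steps can be sketched in a few lines. This version assumes one partial-product row per multiplier bit, each row already shifted to its weight; the names `partial_products` and `multiply_by_rows` are illustrative.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Step 1: row i is (a AND b_i), shifted left by i to align it."""
    rows = []
    for i in range(n):
        b_i = (b >> i) & 1
        rows.append((a if b_i else 0) << i)
    return rows

def multiply_by_rows(a: int, b: int, n: int) -> int:
    """Step 2: sum the aligned rows."""
    return sum(partial_products(a, b, n))

rows = partial_products(0b101, 0b011, 3)   # 5 * 3
assert multiply_by_rows(0b101, 0b011, 3) == 15
```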

Page 8: Cost/Performance Tradeoffs: a case study

Performance/Cost Analysis

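"Order Of" notation:

"g(n) is of order f(n)", written g(n) = Θ(f(n)),

if there exist C2 ≥ C1 > 0 such that, for all but finitely many integral n ≥ 0,

$C_1 \cdot f(n) \le g(n) \le C_2 \cdot f(n)$

Example: [formulas not captured in this transcript], where the bounds hold "almost always".

Θ(...) implies both inequalities; O(...) implies only the second.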
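As an illustrative instance of the definition (chosen for this note, and not necessarily the example on the original slide), take a quadratic with lower-order terms:

```latex
\[
  g(n) = n^2 + 2n + 3, \qquad f(n) = n^2, \qquad C_1 = 1,\; C_2 = 2
\]
\[
  n^2 \;\le\; n^2 + 2n + 3 \;\le\; 2n^2 \quad \text{for all } n \ge 3,
\]
\[
  \text{since } 2n^2 - (n^2 + 2n + 3) = (n - 3)(n + 1) \ge 0 \text{ when } n \ge 3 .
\]
```

The upper bound fails only for n = 0, 1, 2, which is finitely many values, so g(n) = Θ(n²).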

Partial Products:

Things to Add:

Adder Width:

Hardware Cost:

Latency:

Page 9: Cost/Performance Tradeoffs: a case study

Observations:

Θ($n^2$) partial products.

Θ($n^2$) full adders.

Hmmm.

Page 10: Cost/Performance Tradeoffs: a case study

Repackaging Function

Θ($n^2$) partial products.

Θ($n^2$) full adders.

Engineering Principle #2:

Put the Solution where the Problem is.

How about $n^2$ blocks, each doing a little multiplication and a little addition?

Page 11: Cost/Performance Tradeoffs: a case study

Goal: Array of Identical Multiplier Cells

Single "brick" of brick-wall array...

• Forms partial product
• Adds to accumulating sum along with carry

Necessary Component: Full Adder

Takes 2 addend bits plus carry bit. Produces sum and carry output bits.

CASCADE to form an n-bit adder.
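A full adder and its cascade into an n-bit ripple-carry adder can be sketched as follows. The gate equations are the standard ones; the function names and the (sum, carry) return convention are choices made here.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Add two addend bits plus a carry bit; return (sum bit, carry out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)   # majority of the three inputs
    return s, c_out

def ripple_add(x: int, y: int, n: int) -> tuple[int, int]:
    """Cascade n full adders; return (n-bit sum, final carry out)."""
    carry = 0
    total = 0
    for i in range(n):                           # carry ripples from bit 0 upward
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

total, carry = ripple_add(5, 6, 3)               # 5 + 6 = 11 needs a 4th bit
assert (carry << 3) | total == 11
```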

Page 12: Cost/Performance Tradeoffs: a case study

Design of 1-bit multiplier "Brick":

Array Layout:
• Operand bits bused diagonally
• Carry bits propagate right-to-left
• Sum bits propagate down

Brick design:
• AND gate forms 1x1 product
• 2-bit sum propagates from top to bottom
• Carry propagates to left
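A behavioral sketch of the brick and of an n-by-n array built from it follows. The wiring is an assumption, since the slide's schematic is not in the transcript: each row handles one bit of b, carries ripple leftward within a row, sum bits pass down to the brick of the same weight in the next row, and the leftmost carry of a row becomes the top sum bit for the next row.

```python
def brick(a_bit: int, b_bit: int, sum_in: int, carry_in: int) -> tuple[int, int]:
    """One cell: AND gate forms the 1x1 product, a full adder accumulates it."""
    p = a_bit & b_bit
    s = p ^ sum_in ^ carry_in
    c = (p & sum_in) | (p & carry_in) | (sum_in & carry_in)
    return s, c

def array_multiply(a: int, b: int, n: int) -> int:
    """n x n array of bricks; sums[k] is the sum bit of weight 2^k flowing down."""
    sums = [0] * (2 * n)
    for j in range(n):                 # one row per bit of b
        carry = 0
        for i in range(n):             # bricks evaluated right to left
            sums[i + j], carry = brick((a >> i) & 1, (b >> j) & 1, sums[i + j], carry)
        sums[j + n] = carry            # leftmost carry feeds the next row's top bit
    return sum(bit << k for k, bit in enumerate(sums))

for x in range(16):
    for y in range(16):
        assert array_multiply(x, y, 4) == x * y
```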

Page 13: Cost/Performance Tradeoffs: a case study

Latency revisited

Here’s our combinational multiplier:

What’s its propagation delay?

Naive (but valid) bound:
• O(n) additions
• O(n) time for each addition
• Hence $O(n^2)$ time required

On closer inspection:
• Propagation only toward left, bottom
• Hence longest path bounded by length + width of array: O(n+n) = O(n)!
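The linear bound can be checked by computing the longest path through the array, counting one unit of delay per brick. The dependency pattern is assumed for this sketch: each brick waits on the carry from its right-hand neighbour and on the sum bit of the same weight from the row above (for the leftmost brick, the carry out of the row above).

```python
def critical_path(n: int) -> int:
    """Longest chain of bricks through an n x n array, one delay unit per brick."""
    depth = [[0] * n for _ in range(n)]
    for j in range(n):                         # row: bit of b
        for i in range(n):                     # column: bit of a, right to left
            from_right = depth[j][i - 1] if i > 0 else 0
            if j == 0:
                from_above = 0
            elif i < n - 1:
                from_above = depth[j - 1][i + 1]   # same weight, previous row
            else:
                from_above = depth[j - 1][n - 1]   # carry out of previous row
            depth[j][i] = 1 + max(from_right, from_above)
    return max(max(row) for row in depth)

for n in (1, 2, 4, 8, 16, 32):
    # Grows linearly in n, not quadratically.
    assert critical_path(n) <= 3 * n
```

Under these assumptions the longest path is 3n − 2 bricks, consistent with the O(n) bound.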

Page 14: Cost/Performance Tradeoffs: a case study

Multiplier Cookbook: Chapter 2

Combinational Multiplier:

Hardware for n by n bits:

Latency:

Throughput:

Page 15: Cost/Performance Tradeoffs: a case study

Combinational Multiplier: best bang for the buck?

Suppose we have LOTS of multiplications.

Can we do better from a cost/performance standpoint?

PIPELINING

Page 16: Cost/Performance Tradeoffs: a case study

Stupid Pipeline Tricks

Hardware cost for n by n bits:

Stages:

Clock Period:

Latency:

Throughput:

gotta break that long carry chain!

Page 17: Cost/Performance Tradeoffs: a case study

The Pipelining Bandwagon... where do I get on?

WE HAVE:
• Pipeline rules – "well-formed pipelines"
• Plenty of registers
• Demand for higher throughput.

What do we do? Where do we define stages?

Page 18: Cost/Performance Tradeoffs: a case study

Even Stupider Pipeline Tricks

Back to basics: what’s the point of pipelining, anyhow?

WORSE idea:
• Doesn’t break long combinational paths
• NOT a well-formed pipeline...
  ... different register counts on alternative paths
  ... data crosses stage boundaries in both directions!
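The "different register counts on alternative paths" failure can be detected mechanically. The sketch below treats a circuit as a small DAG whose edges carry a register count and checks that every input-to-output path crosses the same number of registers. The graph encoding and the names `register_counts` and `is_well_formed` are hypothetical, introduced only for this illustration.

```python
def register_counts(edges: dict, node: str, sink: str, regs: int = 0) -> set[int]:
    """Collect the register count of every path from `node` to `sink` in a DAG."""
    if node == sink:
        return {regs}
    counts: set[int] = set()
    for nxt, n_regs in edges.get(node, []):
        counts |= register_counts(edges, nxt, sink, regs + n_regs)
    return counts

def is_well_formed(edges: dict, source: str, sink: str) -> bool:
    """True when all paths from source to sink cross the same number of registers."""
    return len(register_counts(edges, source, sink)) == 1

# Each edge is (destination, registers on that wire).
good = {"in": [("A", 1), ("B", 1)], "A": [("out", 1)], "B": [("out", 1)]}
bad = {"in": [("A", 1), ("B", 0)], "A": [("out", 1)], "B": [("out", 1)]}

assert is_well_formed(good, "in", "out")
assert not is_well_formed(bad, "in", "out")
```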

Page 19: Cost/Performance Tradeoffs: a case study

Breaking O(n) combinational paths

GOAL: Θ(n) stages; Θ(1) clock period!

LONG PATHS go down, to left:
• Break array into diagonal slices
• Segment every long combinational path

Page 20: Cost/Performance Tradeoffs: a case study

Multiplier Cookbook: Chapter 3

• Well-formed pipeline (careful!)
• Constant (high!) throughput, independently of operand size.

... but suppose we don’t need the throughput?

Stages:

Clock Period:

Hardware cost for n by n bits:

Latency:

Throughput:

Page 21: Cost/Performance Tradeoffs: a case study

Moving down the cost curve...

Hmmm, are all these extras really needed?

Suppose we have INFREQUENT multiplications... pipelining doesn’t help us.

Can we do better from a cost/performance standpoint?

Page 22: Cost/Performance Tradeoffs: a case study

Multiplier Cookbook: Chapter 4

Sequential Multiplier:

• Re-uses a single n-bit “slice” to emulate each pipeline stage
• Operand a entered serially
• Lots of details to be filled in...

Stages:

Clock Period:

Hardware cost for n by n bits:

Latency:

Throughput:

(constant!)
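A behavioral sketch of the sequential scheme: one slice is reused once per clock, consuming one bit of operand a each cycle, so an n-bit multiply takes n clocks. The slice's internal adder is abstracted as integer addition here, and the name `sequential_multiply` and the returned cycle count are choices made for this illustration.

```python
def sequential_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Return (product, clock cycles) for an n-bit by n-bit multiply."""
    acc = 0
    cycles = 0
    for i in range(n):                      # one clock per serially entered bit of a
        a_i = (a >> i) & 1
        acc += (b if a_i else 0) << i       # the single slice adds one partial product
        cycles += 1
    return acc, cycles

product, cycles = sequential_multiply(13, 11, 4)
assert product == 143 and cycles == 4
```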

Page 23: Cost/Performance Tradeoffs: a case study

(Ridiculous?) Extremes Dept... Cost minimization: how far can we go?

Suppose we want to minimize hardware (at any cost)...

• Consider bit-serial!
• Form and add 1-bit partial product per clock
• Reuse single “brick” for each bit b_j of slice;
• Re-use slice for each bit of a operand

Page 24: Cost/Performance Tradeoffs: a case study

Multiplier Cookbook: Chapter 5

Bit Serial multiplier:

• Re-uses a single brick to emulate an n-bit slice
• Both operands entered serially
• $O(n^2)$ clock cycles required
• Needs additional storage (typically from existing registers)

Stages:

Clock Period:

Hardware cost for n by n bits:

Latency:

Throughput:

(constant!)
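A behavioral sketch of the bit-serial scheme: a single brick is used once per clock, forming and adding one 1-bit partial product, so an n-by-n multiply takes n² clocks. The `sums` list stands in for the additional storage mentioned above; the loop order, names, and carry handling are assumptions made for this illustration.

```python
def brick(a_bit: int, b_bit: int, sum_in: int, carry_in: int) -> tuple[int, int]:
    """AND gate plus full adder: one 1-bit partial product added per use."""
    p = a_bit & b_bit
    s = p ^ sum_in ^ carry_in
    c = (p & sum_in) | (p & carry_in) | (sum_in & carry_in)
    return s, c

def bit_serial_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Return (product, clock cycles) using one brick per clock."""
    sums = [0] * (2 * n)                  # accumulated sum bits (extra storage)
    cycles = 0
    for i in range(n):                    # re-use the slice for each bit of a
        carry = 0
        for j in range(n):                # re-use the brick for each bit b_j
            sums[i + j], carry = brick((a >> i) & 1, (b >> j) & 1, sums[i + j], carry)
            cycles += 1
        sums[i + n] = carry               # carry out of the emulated slice
    return sum(bit << k for k, bit in enumerate(sums)), cycles

product, cycles = bit_serial_multiply(13, 11, 4)
assert product == 143 and cycles == 16    # n^2 clocks for n = 4
```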

Page 25: Cost/Performance Tradeoffs: a case study

Summary:

Lots more multiplier technology: fast adders, Booth Encoding, column compression, ...

Scheme: $ Latency Thruput

Combinational

N-pipe

Slice-serial

Bit-serial
