
EECS 151/251A Homework 8

Due Monday, April 6th, 2020

Problem 1: Power and Leakage

Consider a 3-input “2-1 AOI” gate shown below with VDD = 1 V, CL = 5 fF, CD = 0.2 fF/nm. Assume RON,n = 10 kΩ · nm, RON,p = 20 kΩ · nm, ROFF,n = 100 MΩ · nm, ROFF,p = 500 MΩ · nm for the given device length.

a) Size the gate, using as a reference a symmetrically sized inverter with Wn = 10 nm. Your sized gate should have the same input capacitance as the reference inverter for all inputs.

Solution: To keep the pull-up and pull-down delays the same, we size Wp,a = 4Wn,a and Wp,b/c = 2Wn,b/c. To make the inputs have the same input capacitance as the reference inverter, we size the transistors as follows:

• an = (3/5)Wn = 6 nm
• ap = (12/5)Wn = 24 nm
• bn = cn = Wn = 10 nm
• bp = cp = 2Wn = 20 nm

Version: 1 - 2020-04-21 11:29:45-07:00
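The sizing above can be checked with a short sketch. It assumes that input capacitance is proportional to the total gate width driven by an input (the same capacitance per nm for NMOS and PMOS), which is the assumption implied by the solution; the helper name `size_input` is hypothetical.

```python
from fractions import Fraction

W_N_REF = 10                   # reference inverter NMOS width, nm
W_P_REF = 2 * W_N_REF          # symmetric inverter: R_ON,p is 2x R_ON,n per nm
C_IN_REF = W_N_REF + W_P_REF   # input capacitance expressed as total gate width, nm

def size_input(p_to_n_ratio):
    """Return (Wn, Wp) with Wp = ratio * Wn and Wn + Wp equal to the reference width."""
    wn = Fraction(C_IN_REF, 1 + p_to_n_ratio)
    return wn, p_to_n_ratio * wn

# Input a: one NMOS alone in the pull-down, PMOS in a series stack of two -> Wp = 4 Wn.
a_n, a_p = size_input(4)
# Inputs b and c: series stack of two in both networks -> Wp = 2 Wn.
b_n, b_p = size_input(2)

print(f"a_n = {a_n} nm, a_p = {a_p} nm, b_n = c_n = {b_n} nm, b_p = c_p = {b_p} nm")
```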

b) Assume that the probability of an input being high is 0.5 (i.e., on any given clock cycle, each input is equally likely to be a 0 or a 1) and that all inputs are independent. What is the probability that the output is high, P(Out = 1)? What is the probability that the output is low, P(Out = 0)? What is the gate activity factor (i.e., the probability that the output will transition from low to high, P0→1)?

Solution: The truth table for the 2-1 AOI gate is as follows:

a b c | out
0 0 0 | 1
0 0 1 | 1
0 1 0 | 1
0 1 1 | 0
1 0 0 | 0
1 0 1 | 0
1 1 0 | 0
1 1 1 | 0

Since each input is equally likely to be 1 or 0 and all the inputs are independent, the probability of the output being 1 is the sum of the probabilities of the input combinations that yield an output of 1. Therefore

$$P(\text{Out} = 1) = \frac{3}{8} = 0.375$$

The same process is applied to finding the probability of the output being 0.

$$P(\text{Out} = 0) = \frac{5}{8} = 0.625$$

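The probability that the output transitions from low to high is the probability that it is 0 on one cycle and 1 on the next. Because the inputs on successive cycles are independent, this is the product of the two probabilities:

$$P_{0 \to 1} = P(\text{Out}_{t-1} = 0) \cdot P(\text{Out}_{t} = 1) = P(\text{Out} = 0) \cdot P(\text{Out} = 1) = \frac{15}{64} = 0.234375$$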
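As a cross-check, the probabilities can be obtained by enumerating the truth table. This is a minimal sketch; the function name `aoi21` is hypothetical and implements out = NOT(a OR (b AND c)), matching the truth table in the solution.

```python
from fractions import Fraction
from itertools import product

def aoi21(a, b, c):
    """2-1 AOI gate: out = NOT(a OR (b AND c))."""
    return int(not (a or (b and c)))

# All 8 input combinations are equally likely (each input is 0 or 1 with probability 0.5).
outputs = [aoi21(a, b, c) for a, b, c in product((0, 1), repeat=3)]

p_high = Fraction(sum(outputs), len(outputs))
p_low = 1 - p_high
# Activity factor: output low on one cycle and high on the next, with independent cycles.
p_0_to_1 = p_low * p_high

print(p_high, p_low, p_0_to_1, float(p_0_to_1))
```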

c) What is the dynamic power dissipation of the gate, if the clock frequency is 3 GHz? You may ignore the parasitic drain capacitance in the internal nodes of the PMOS stack, but not at the output.

Solution: From lecture, we know that the dynamic power dissipation of a gate can be expressed as

$$P_{dyn} = \frac{1}{2}\,\alpha\,C_{tot}\,V_{dd}^{2}\,f_{clk}$$

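The total capacitance (from our sizing) is the drain capacitance from an, ap, and bn along with the load capacitance. Writing each drain capacitance relative to that of the reference width Wn = 10 nm (CD · Wn = 2 fF):

$$C_{tot} = \frac{3}{5} C_D W_n + \frac{12}{5} C_D W_n + C_D W_n + C_L = 4 C_D W_n + C_L$$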

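The activity factor is the probability of the output transitioning, which you found in part (b).

$$\alpha = P_{0 \to 1}$$

The other values are

$$V_{dd} = 1\ \text{V}, \quad f_{clk} = 3\ \text{GHz}$$

$$P_{dyn} = \frac{1}{2}(0.234)(4 \cdot 0.2 \cdot 10 + 5) \cdot 10^{-15} \cdot (1)^{2} \cdot (3 \cdot 10^{9})$$

$$P_{dyn} = 4.57\ \mu\text{W}$$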
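The arithmetic can be reproduced with a few lines. This sketch uses the transistor widths from part (a) and the activity factor from part (b); the variable names are illustrative only.

```python
ALPHA = 15 / 64          # activity factor P(0 -> 1) from part (b)
V_DD = 1.0               # volts
F_CLK = 3e9              # hertz
C_D_PER_NM = 0.2e-15     # drain capacitance per nm of width, farads
C_L = 5e-15              # load capacitance, farads

# Widths (nm) of the transistors whose drains sit on the output node: a_n, a_p, b_n.
drain_widths_nm = [6, 24, 10]

c_tot = C_D_PER_NM * sum(drain_widths_nm) + C_L
p_dyn = 0.5 * ALPHA * c_tot * V_DD**2 * F_CLK

print(f"C_tot = {c_tot * 1e15:.1f} fF, P_dyn = {p_dyn * 1e6:.2f} uW")
```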

d) 251A only - 151 Optional. For the following three cases, calculate the leakage current. An approximate expression is perfectly fine as long as you explain and justify your assumptions/simplifications.

(a) All inputs are zero.

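Solution: All PMOS transistors are on, so ignore their resistance. The leakage flows through the off NMOS network: a_nmos in parallel with the series combination of b_nmos and c_nmos.

$$R_{eq} \approx R_{off,n,a} \parallel (R_{off,n,b} + R_{off,n,c})$$

$$I = \frac{V_{dd}}{R_{off,n,a} \parallel (R_{off,n,b} + R_{off,n,c})}$$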

(b) All inputs are 1.

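Solution: All NMOS transistors are on, so ignore their resistance. The leakage flows through the off PMOS network: a_pmos in series with the parallel combination of b_pmos and c_pmos.

$$R_{eq} \approx R_{off,p,a} + (R_{off,p,b} \parallel R_{off,p,c})$$

$$I = \frac{V_{dd}}{R_{off,p,a} + (R_{off,p,b} \parallel R_{off,p,c})}$$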

(c) A = B = 1, C = 0.

Solution: c_pmos (on) is in series with a_pmos (off), which is in series with a_nmos (on).

$$R_{eq} \approx R_{on,p,c} + R_{off,p,a} + R_{on,n,a}$$

$$I = \frac{V_{dd}}{R_{eq}} \approx \frac{V_{dd}}{R_{on,p,c} + R_{off,p,a} + R_{on,n,a}}$$
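For a rough numerical feel, the three cases can be evaluated with the sizes from part (a). This is only a sketch under two assumptions that are not stated in the problem: each transistor's resistance is its per-width value divided by its width in nm (as the kΩ · nm units suggest), and each off transistor in the leakage path contributes its R_OFF value. The helper names `res` and `parallel` are hypothetical.

```python
V_DD = 1.0
R_ON_N, R_ON_P = 10e3, 20e3        # ohm * nm
R_OFF_N, R_OFF_P = 100e6, 500e6    # ohm * nm

# Transistor widths from part (a), in nm.
W = {"a_n": 6, "a_p": 24, "b_n": 10, "c_n": 10, "b_p": 20, "c_p": 20}

def res(rho, name):
    """Assumed model: resistance = (ohm * nm) / width in nm."""
    return rho / W[name]

def parallel(r1, r2):
    return r1 * r2 / (r1 + r2)

# (a) All inputs 0: leakage through the off NMOS network.
r_eq_a = parallel(res(R_OFF_N, "a_n"), res(R_OFF_N, "b_n") + res(R_OFF_N, "c_n"))
i_a = V_DD / r_eq_a

# (b) All inputs 1: leakage through the off PMOS network.
r_eq_b = res(R_OFF_P, "a_p") + parallel(res(R_OFF_P, "b_p"), res(R_OFF_P, "c_p"))
i_b = V_DD / r_eq_b

# (c) A = B = 1, C = 0: c_pmos on, a_pmos off, a_nmos on, all in series.
r_eq_c = res(R_ON_P, "c_p") + res(R_OFF_P, "a_p") + res(R_ON_N, "a_n")
i_c = V_DD / r_eq_c

for label, current in (("a", i_a), ("b", i_b), ("c", i_c)):
    print(f"case ({label}): I = {current * 1e9:.1f} nA")
```

In case (c) the off-resistance of a_pmos dominates, so the on-resistances change the result by well under one percent.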



Problem 2: Energy Efficiency Improvements

Consider the design of a vector add unit. As shown below, the unit has two input register banks and an output register bank. One of the input register banks holds the first vector [A3, A2, A1, A0] and the second holds [B3, B2, B1, B0]. A controller (not shown) passes the elements of the input vectors through the adder (one per clock cycle) and the result is stored in the output register bank [C3, C2, C1, C0]. As you can see, a 4-1 multiplexor is used by the controller for choosing the proper A and B register, and clock enable signals are given to select the proper C register. The circuit elements have the following delays: τadd = 16 ns, τmux = 2 ns, and τsetup = τclk−Q = 1 ns.

On average, at the nominal Vdd the energy for one data item passed through the adder block is 1 Joule, and 0.2 Joules for the multiplexor. The registers each consume 0.1 Joule on average for each new data word stored.

Your application for this circuit requires that a complete vector of 4 elements be computed every 80 ns. You can ignore the time and energy required to load new values into the A and B registers.

For this problem assume that the adder operation cannot be pipelined.

Devise a scheme that would improve the switching energy efficiency while meeting the application requirements. Compare the switching energy per result of the original circuit and your new one.



Assume that a 1/n reduction in clock frequency can accommodate a 1/n reduction in Vdd.

Solution: It takes 4 clock cycles to complete the vector addition. Each clock cycle requires going through 2 MUXes, an adder, and storing the result in the correct register. The total energy expenditure is

Etot,old = 4(2 · 0.2 + 1 + 0.1) = 6 J

Since we cannot pipeline the adder operation, the other tradeoff we can make for energy efficiency while meeting the application requirements is to spend more hardware on parallelism. We can have 4 adders running in parallel and remove the need for MUXes. The new total energy expenditure is then

Etot,new = 4 · 1 + 4 · 0.1 = 4.4 J

We can also run the clock slower since the vector addition can be completed in one clock cycle. The clock can be slowed by a factor of 80/18 = 4.44, where the new critical path is 18 ns (τclk−Q + τadd + τsetup). This allows a factor of 4.44 reduction in Vdd, resulting in a further 4.44² ≈ 19.8 times reduction in energy.
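The comparison can be tallied numerically. The sketch below assumes, as the solution does, that switching energy scales with the square of Vdd and that Vdd can drop by the same factor as the clock frequency; the final scaled energy is a derived figure, not one stated in the solution.

```python
E_ADD, E_MUX, E_REG = 1.0, 0.2, 0.1              # joules per operation at nominal Vdd
T_ADD, T_MUX, T_SETUP, T_CLK_Q = 16, 2, 1, 1     # nanoseconds
DEADLINE_NS = 80                                 # one 4-element vector every 80 ns
N_ELEMENTS = 4

# Original: 4 cycles, each through 2 muxes, the adder, and one output register write.
e_old = N_ELEMENTS * (2 * E_MUX + E_ADD + E_REG)
t_cycle_old = T_CLK_Q + T_MUX + T_ADD + T_SETUP  # 20 ns, so 4 cycles just fit in 80 ns

# Parallel: 4 adders, no muxes, every result produced in a single cycle.
e_new = N_ELEMENTS * (E_ADD + E_REG)
t_crit_new = T_CLK_Q + T_ADD + T_SETUP           # 18 ns

# The single cycle may stretch to the full deadline, so frequency and Vdd scale down.
slowdown = DEADLINE_NS / t_crit_new
e_new_scaled = e_new / slowdown**2               # assumes energy proportional to Vdd^2

print(f"old: {e_old:.2f} J, parallel: {e_new:.2f} J, "
      f"slowdown: {slowdown:.2f}x, parallel at reduced Vdd: {e_new_scaled:.3f} J")
```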

Problem 3: Race to Halt

An effective scheme for improving energy efficiency when static power consumption is a significant component of total power consumption is a technique called “race to halt”. The basic idea is to run the hardware at maximum speed to quickly compute the necessary set of computations, then turn off the power, thus preventing leakage.

Suppose you have a CPU that takes 4 seconds to run your application with an average power consumption of 8 Watts, where 50% of the power is dynamic and 50% is static. Assume that no other program is running on the CPU. You are willing to run your application slower if that could preserve energy.

You would like to determine the most effective way to run your application to preserve the battery life. You have the ability to control the supply voltage (Vdd), the clock frequency (f), and if needed can put the CPU into a sleep mode where static power is essentially zero. The CPU’s Vdd can be increased or decreased by at most 25%.

Explore “race to halt” versus running longer at a lower Vdd. Which approach will be better at conserving your battery charge? For this problem, assume that when varying the frequency f and supply voltage Vdd, the static power usage remains constant. This is more or less true. Show your work and justify your answer.

Assume that an n% increase/decrease in clock frequency can accommodate an n% increase/decrease in Vdd.

Solution: Since the nominal average power consumption is Ptot,nom = 8 W, the nominal dynamic power is $P_{dyn,nom} = \frac{1}{2} C V_{dd}^{2} f_{clk} = 4$ W and $P_{static,nom} = 4$ W.

Race to Halt: Increase Vdd and fclk by the maximum 25%.

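$$P_{dyn,race} = \frac{1}{2} C (1.25 V_{dd})^{2} (1.25 f_{clk}) = 1.25^{3} \cdot \frac{1}{2} C V_{dd}^{2} f_{clk} \approx 1.95 \cdot \frac{1}{2} C V_{dd}^{2} f_{clk}$$

The CPU now takes 3 seconds instead of 4 to run the application, so the total energy is

$$E_{tot,race} = 1.95 \cdot \frac{1}{2} C V_{dd}^{2} f_{clk} \cdot 3\ \text{s} + 4\ \text{W} \cdot 3\ \text{s}$$

$$E_{tot,race} = 1.95 \cdot 4\ \text{W} \cdot 3\ \text{s} + 4\ \text{W} \cdot 3\ \text{s} = 35.4\ \text{J}$$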

Lower Vdd: Decrease Vdd and fclk by the maximum 25%.

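$$P_{dyn,low} = \frac{1}{2} C (0.75 V_{dd})^{2} (0.75 f_{clk}) = 0.75^{3} \cdot \frac{1}{2} C V_{dd}^{2} f_{clk} \approx 0.42 \cdot \frac{1}{2} C V_{dd}^{2} f_{clk}$$

The CPU now takes 5 seconds instead of 4 to run the application, so the total energy is

$$E_{tot,low} = 0.42 \cdot \frac{1}{2} C V_{dd}^{2} f_{clk} \cdot 5\ \text{s} + 4\ \text{W} \cdot 5\ \text{s}$$

$$E_{tot,low} = 0.42 \cdot 4\ \text{W} \cdot 5\ \text{s} + 4\ \text{W} \cdot 5\ \text{s} = 28.4\ \text{J}$$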

Interestingly enough, the lower Vdd scheme is more energy efficient.
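The two schemes can be compared with a small sketch. It first uses the runtimes from the solution (3 s and 5 s, i.e. runtime changed by 25%), then, as an extra assumption not made in the solution, runtimes that scale exactly inversely with frequency (4/1.25 = 3.2 s and 4/0.75 ≈ 5.33 s). The function name `total_energy` is hypothetical.

```python
P_DYN_NOM = 4.0    # watts of dynamic power at nominal Vdd and f
P_STATIC = 4.0     # watts, assumed constant while running and zero when asleep
T_NOM = 4.0        # seconds at nominal settings

def total_energy(scale, runtime_s):
    """Energy when Vdd and f are both multiplied by `scale` for `runtime_s` seconds."""
    p_dyn = P_DYN_NOM * scale**3     # dynamic power is proportional to Vdd^2 * f
    return (p_dyn + P_STATIC) * runtime_s

e_nominal = total_energy(1.0, T_NOM)

# Runtimes as used in the solution.
e_race = total_energy(1.25, 3.0)
e_low = total_energy(0.75, 5.0)

# Alternative assumption: runtime scales exactly as 1 / frequency.
e_race_exact = total_energy(1.25, T_NOM / 1.25)
e_low_exact = total_energy(0.75, T_NOM / 0.75)

print(f"nominal: {e_nominal:.1f} J")
print(f"solution runtimes: race {e_race:.1f} J, low Vdd {e_low:.1f} J")
print(f"1/f runtimes:      race {e_race_exact:.1f} J, low Vdd {e_low_exact:.1f} J")
```

Under either runtime assumption the ordering is the same: lowering Vdd uses less energy than the nominal setting, and racing to halt uses more.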

Problem 4: Memory

a) Suppose you want to design a 32-bit wide memory block with a capacity of 2K 32-bit words of storage (remember 1K = 1024). We would like to have the core of the block square (equal number of rows and columns). How many total address bits are needed for this memory? How many address bits are used by the row-decoder? How many address bits are used by the column-decoder?

Solution:

In summary: 2 × 1024 × 32 = 65536 bits, so the core is 256 × 256 with an 11-bit address; the column decoder requires 3 bits (2^3 = 256/32 = 8) and the row decoder requires 8 bits (11 − 3). In detail:

The total memory size is

$$2 \times 1024 \times 32 = 65536\ \text{bits}$$

Since we want a square block, we take the square root of the memory size:

$$\sqrt{65536} = 256$$

This means we have to design a 256 × 256 memory core. The total number of address bits required is

$$L = \log_2(2 \cdot 1024) = 11\ \text{bits}$$



Each row contains 256/32 = 8 32-bit words. So the column-decoder needs

K = log2(8) = 3 bits

The row-decoder then requires L − K = 11 − 3 = 8 bits.
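The address split can be recomputed for this memory with a short sketch. The variable names are illustrative, and the bit counts assume power-of-two sizes, as in this problem.

```python
import math

WORDS = 2 * 1024      # 2K words
WORD_BITS = 32

total_bits = WORDS * WORD_BITS
side = math.isqrt(total_bits)          # rows = columns for a square core
assert side * side == total_bits       # the core is exactly square

address_bits = (WORDS - 1).bit_length()         # bits needed to index every word
words_per_row = side // WORD_BITS
column_bits = (words_per_row - 1).bit_length()  # selects one word within a row
row_bits = address_bits - column_bits           # selects one of the rows

print(f"{side} x {side} core, {address_bits} address bits: "
      f"{row_bits} row + {column_bits} column")
```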

b) Now you want to design the row decoder using the predecoder technique presented in lecture. You can use only gates with no more than 4 inputs. Map out the scheme and describe the design of each of the decoders.

Solution: Split the 8 bits used by the row decoder into four groups of 2 bits and predecode each group into 4 one-hot lines. Then, for each of the 256 word lines, combine one predecoded line from each of the four groups in a 4-input AND gate to decode the 8-bit address.
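A behavioral sketch of this scheme is shown below. It is an illustration rather than the lecture's exact circuit: the function names `predecode_pair` and `row_decode` are hypothetical, and each 2-to-4 predecoder is modeled with 2-input AND terms on true and complemented bits.

```python
def predecode_pair(b1, b0):
    """2-to-4 predecoder: one-hot output indexed by the value 2*b1 + b0."""
    n1, n0 = 1 - b1, 1 - b0
    return [n1 & n0, n1 & b0, b1 & n0, b1 & b0]

def row_decode(address):
    """Return the 256 word-line values for an 8-bit row address."""
    bits = [(address >> i) & 1 for i in range(8)]
    # Four predecoders, one per pair of address bits (bits 1:0, 3:2, 5:4, 7:6).
    groups = [predecode_pair(bits[2 * g + 1], bits[2 * g]) for g in range(4)]
    wordlines = []
    for row in range(256):
        # Each word line is a 4-input AND of one predecoded line from each group.
        sel = [(row >> (2 * g)) & 3 for g in range(4)]
        wordlines.append(groups[0][sel[0]] & groups[1][sel[1]]
                         & groups[2][sel[2]] & groups[3][sel[3]])
    return wordlines

lines = row_decode(0b10110100)
print(lines.index(1), sum(lines))
```

No gate in this model has more than 4 inputs: the predecoders use 2-input ANDs and the final stage uses 4-input ANDs.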

Problem 5: DRAM [4 pts]

1-transistor DRAM designs usually include a “row buffer”, a register on the periphery that is used to register an entire row.

a) Explain how this register could be used and why it’s a good idea.

Solution: It reduces power and increases memory system speed. RAM accesses exhibit spatial locality to a high degree: an access to one word in a DRAM row is likely to be followed by another access to the same row. Buffering the row saves having to read the memory cells again, returning a value to the system faster and using less power. For writing: a row is opened (copied into the row buffer) and constituent bytes/words are updated before the entire buffer is written back.

b) Explain how the inclusion of this buffer changes the detailed steps needed for a memory read and memory write operation.

Solution: For read:

• compulsory miss: slow access - open row, read data and move to row buffer, then move data to out
• row buffer hit: fast access - only move data from buffer to out
• row buffer conflict: slow access - write back existing row, open and read the new row, and update row buffer

For write:

• compulsory miss: slow access - open row, update row buffer, then move data to buffer where edits will take place
• row buffer hit: fast access - write and make edits on row buffer
• row buffer conflict: slow access - write back existing row, open and read the new row, and update row buffer
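The three cases above can be illustrated with a toy model that only tracks which row is currently held in the row buffer. This is a sketch under the assumption that a compulsory miss means the buffer holds no row yet; the class name `RowBufferBank` is hypothetical and no timing is modeled.

```python
class RowBufferBank:
    """Toy DRAM bank that classifies each access by the row buffer state."""

    def __init__(self):
        self.open_row = None   # no row has been copied into the buffer yet

    def access(self, row):
        if self.open_row is None:
            kind = "compulsory miss"       # slow: open the row, fill the buffer
        elif self.open_row == row:
            kind = "row buffer hit"        # fast: use the buffer directly
        else:
            kind = "row buffer conflict"   # slow: write back, then open the new row
        self.open_row = row
        return kind

bank = RowBufferBank()
for row in (3, 3, 5, 5, 3):
    print(row, bank.access(row))
```

The same classification applies to reads and writes; only the data movement after the row is in the buffer differs.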

Problem 6: Memory Implementation

a) Consider the design of a (very) small asynchronous-read register file block of 4 words by 4 bits each, with two read ports and one write port. You want to implement the register memory cells as positive edge-triggered flip-flops. Draw the circuit diagram for your design using the flip-flop cells, multiplexers, and logic gates.

Solution:



b) 251A only - 151 Optional. Now consider the redesign of the register file from part a) using latches instead of flip-flops. For this design, as above, the write operation occurs on the positive edge of the clock, but now the output data on a read becomes available after the falling edge of the clock.

Solution:

Problem 7: Memory Blocks [10pts]

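You are given a simple dual port (SDP) memory block that is 128x8. Show how you would use multiple instances to design a memory that has 2 independent read ports and is 256x8.

Solution: Use two 128x8 blocks to make a 256x8 memory (increasing the depth), with the top address bit selecting between them. Then duplicate that arrangement (four blocks total) to get two read ports: every write goes to both copies so they hold the same data, and each copy serves one read port.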
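A behavioral sketch of this composition follows. It is a software model rather than a hardware description, and the class names `SDP128x8` and `DualRead256x8` are hypothetical; it assumes the top address bit selects the block and that every write is mirrored into both copies.

```python
class SDP128x8:
    """Model of one simple dual port block: one write port, one read port."""

    def __init__(self):
        self.mem = [0] * 128

    def write(self, addr, data):
        self.mem[addr] = data & 0xFF

    def read(self, addr):
        return self.mem[addr]

class DualRead256x8:
    """256x8 memory with one write port and two independent read ports."""

    def __init__(self):
        # copies[port][bank]: two copies (one per read port), two banks per copy.
        self.copies = [[SDP128x8(), SDP128x8()] for _ in range(2)]

    def write(self, addr, data):
        bank, offset = addr >> 7, addr & 0x7F   # top bit picks the 128-word bank
        for copy in self.copies:                # mirror the write into both copies
            copy[bank].write(offset, data)

    def read(self, port, addr):
        bank, offset = addr >> 7, addr & 0x7F
        return self.copies[port][bank].read(offset)

mem = DualRead256x8()
mem.write(5, 0xAB)
mem.write(200, 0xCD)
print(hex(mem.read(0, 5)), hex(mem.read(1, 200)))
```

Each read port only ever touches its own copy, so the two reads never contend for the single read port of any one block.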
