
Simple Inelastic and Folded Pipelines

Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

February 14, 2011, L04-1, http://csg.csail.mit.edu/6.375

Pipelining a block

[Figure: three implementations of the block f1, f2, f3 between inQ and outQ: Combinational (C); Pipeline (P); Folded Pipeline (FP), which reuses a single f]

Clock? Area? Throughput?
Clock frequency: C < P ≈ FP
Area: FP < C < P
Throughput: FP < C < P

Inelastic Pipeline

[Figure: x flows inQ → f1 → sReg1 → f2 → sReg2 → f3 → outQ]

    rule sync_pipeline (True);
        if (inQ.notEmpty()) begin
            sReg1 <= tagged Valid f1(inQ.first());
            inQ.deq();
        end
        else sReg1 <= tagged Invalid;
        case (sReg1) matches
            tagged Valid .sx1: sReg2 <= tagged Valid f2(sx1);
            tagged Invalid:    sReg2 <= tagged Invalid;
        endcase
        case (sReg2) matches
            tagged Valid .sx2: outQ.enq(f3(sx2));
        endcase
    endrule

Both registers hold values of Maybe type.

When is this rule enabled?

The rule's explicit guard is True, so it is enabled whenever its lifted implicit guards hold. inQ.first() and inQ.deq() are reached only under if (inQ.notEmpty()), so their guards always hold. The only guard that can fail is outQ.enq's "not full", which is reached only when sReg2 is Valid. The rule is therefore disabled exactly when sReg2 is Valid and outQ is full.

    inQ   sReg1   sReg2   outQ   Enabled?
    NE    V       V       NF     Yes
    NE    V       V       F      No
    NE    V       I       NF     Yes
    NE    V       I       F      Yes
    NE    I       V       NF     Yes
    NE    I       V       F      No
    NE    I       I       NF     Yes
    NE    I       I       F      Yes
    E     V       V       NF     Yes
    E     V       V       F      No
    E     V       I       NF     Yes
    E     V       I       F      Yes
    E     I       V       NF     Yes
    E     I       V       F      No
    E     I       I       NF     Yes¹
    E     I       I       F      Yes¹

NE/E = not empty/empty; V/I = Valid/Invalid; NF/F = not full/full.
Yes¹ = yes, but no change: inQ is empty and both registers are Invalid, so the rule only writes Invalid over Invalid.
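The table can be regenerated mechanically from the lifted guard. The sketch below is a hypothetical Python model (the function name `sync_pipeline_status` is illustrative, not from the lecture) that encodes "disabled only when sReg2 is Valid and outQ is full" plus a check for the no-change case, and prints the rows in the table's order.

```python
from itertools import product


def sync_pipeline_status(inq_empty: bool, sreg1_valid: bool,
                         sreg2_valid: bool, outq_full: bool) -> tuple:
    """Return (enabled, changes_state) for the 3-stage inelastic rule.

    Lifted guard: outQ.enq is reached only when sReg2 is Valid, and
    inQ.first/deq only when inQ is not empty (so their guards always hold).
    """
    enabled = (sreg2_valid and not outq_full) or not sreg2_valid
    # Nothing changes only if inQ is empty and both registers stay Invalid.
    changes_state = (not inq_empty) or sreg1_valid or sreg2_valid
    return enabled, changes_state


if __name__ == "__main__":
    # Same row order as the table: NE before E, V before I, NF before F.
    for inq_e, s1, s2, full in product([False, True], [True, False],
                                       [True, False], [False, True]):
        enabled, changes = sync_pipeline_status(inq_e, s1, s2, full)
        if not enabled:
            answer = "No"
        else:
            answer = "Yes" if changes else "Yes (no change)"
        print("E " if inq_e else "NE", "V" if s1 else "I",
              "V" if s2 else "I", "F " if full else "NF", answer)
```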


Folded pipeline

    rule folded_pipeline (True);
        if (stage == 0) begin
            sxIn = inQ.first();
            inQ.deq();
        end
        else sxIn = sReg;
        sxOut = f(stage, sxIn);
        if (stage == n-1) outQ.enq(sxOut);
        else sReg <= sxOut;
        stage <= (stage == n-1) ? 0 : stage + 1;
    endrule

[Figure: a single f, parameterized by the stage register, takes x from inQ (stage 0) or from sReg (later stages); its result goes back to sReg or, in the last stage, to outQ]

Notice that stage is now a dynamic parameter: there is no for-loop.

sxIn and sxOut still need type declarations.
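To see what "stage is a dynamic parameter" buys and costs, here is a hypothetical cycle-level Python model of the folded rule. It assumes an unbounded outQ and a made-up stage function `demo_f`; the point is that one shared f produces one result every n cycles, and each result equals f(n-1, ... f(1, f(0, x))).

```python
def run_folded(inputs, f, n, max_cycles=10_000):
    """Cycle-by-cycle model of the folded_pipeline rule (one firing per cycle).

    Returns (cycle, value) pairs in the order values reach outQ. outQ is
    assumed unbounded, so the rule is blocked only when stage == 0 and inQ
    is empty (the implicit guard of inQ.first/deq).
    """
    in_q = list(inputs)
    out_q = []
    stage, s_reg = 0, None
    cycle = 0
    while (stage != 0 or in_q) and cycle < max_cycles:
        sx_in = in_q.pop(0) if stage == 0 else s_reg   # stage 0 reads inQ, others sReg
        sx_out = f(stage, sx_in)                       # the single shared f
        if stage == n - 1:
            out_q.append((cycle, sx_out))
        else:
            s_reg = sx_out
        stage = 0 if stage == n - 1 else stage + 1
        cycle += 1
    return out_q


def demo_f(stage, x):
    """Hypothetical stage function standing in for the real f."""
    return 2 * x + stage


if __name__ == "__main__":
    print(run_folded([1, 2], demo_f, n=3))   # one result every 3 cycles
```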

Superfolded pipeline: one Bfly-4 case

f will be invoked for 48 dynamic values of stage (3 stages × 16 Bfly-4s per stage). Each invocation modifies 4 numbers in sReg; after every 16 invocations, a permutation is done on the whole sReg.
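The bookkeeping behind "48 invocations, 4 numbers each, permute after 16" can be enumerated directly. This hypothetical Python sketch assumes the 64-point, 3-stage radix-4 structure implied by `Vector#(64, Complex)` and the 2-bit stage; it lists, for each value of the 6-bit counter, the (stage, i) split, the four sReg positions touched, and whether the permutation follows.

```python
def superfold_schedule(stages=3, bflys_per_stage=16):
    """Enumerate invocations of the superfolded f, one Bfly-4 per call.

    Each entry is (stagei, stage, i, touched, permute):
      stage, i = divmod(stagei, 16), i.e. the Bit#(2) and Bit#(4) halves of Bit#(6)
      touched  = the four sReg positions this Bfly-4 modifies, 4*i .. 4*i+3
      permute  = True only for the last Bfly-4 of a stage (i == 15)
    """
    schedule = []
    for stagei in range(stages * bflys_per_stage):
        stage, i = divmod(stagei, bflys_per_stage)
        touched = list(range(4 * i, 4 * i + 4))
        schedule.append((stagei, stage, i, touched, i == bflys_per_stage - 1))
    return schedule


if __name__ == "__main__":
    for entry in superfold_schedule()[14:18]:   # around the first stage boundary
        print(entry)
```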

Superfolded pipeline: stage function f

    function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);
        Vector#(64, Complex) stage_temp = newVector;
        Vector#(64, Complex) stage_out  = newVector;
        for (Integer i = 0; i < 16; i = i + 1) begin
            Integer idx = i * 4;
            let twid = getTwiddle(stage, fromInteger(i));
            Vector#(4, Complex) x = takeAt(idx, stage_in);   // stage_in[idx] .. stage_in[idx+3]
            let y = bfly4(twid, x);
            stage_temp[idx]   = y[0];
            stage_temp[idx+1] = y[1];
            stage_temp[idx+2] = y[2];
            stage_temp[idx+3] = y[3];
        end
        // Permutation
        for (Integer i = 0; i < 64; i = i + 1)
            stage_out[i] = stage_temp[permute[i]];
        return stage_out;
    endfunction

To superfold it, the Bit#(2) stage argument becomes a Bit#(2+4) pair (stage, i), and the permutation should be done only when i = 15.

Code for the superfolded pipeline stage function (one Bfly-4 case)

    function Vector#(64, Complex) f (Bit#(6) stagei, Vector#(64, Complex) stage_in);
        Bit#(2) stage = stagei[5:4];        // stagei div 16
        Bit#(4) i     = stagei[3:0];        // stagei mod 16: which Bfly-4
        Bit#(6) idx   = {i, 2'b00};         // 4*i: first of this Bfly-4's four elements
        let twid = getTwiddle(stage, i);
        Vector#(4, Complex) x = newVector;
        for (Integer k = 0; k < 4; k = k + 1)
            x[k] = stage_in[idx + fromInteger(k)];
        let y = bfly4(twid, x);
        let stage_temp = stage_in;          // the other 60 numbers are unchanged
        for (Integer k = 0; k < 4; k = k + 1)
            stage_temp[idx + fromInteger(k)] = y[k];
        let stage_out = stage_temp;
        if (i == 15)                        // permute only after the last Bfly-4 of a stage
            for (Integer j = 0; j < 64; j = j + 1)
                stage_out[j] = stage_temp[permute[j]];
        return stage_out;
    endfunction

Folded pipeline: stage function f

The Twiddle constants can be expressed in a table or in a case or nested case expression

[Figure: stage selects among getTwiddle0, getTwiddle1 and getTwiddle2 to produce twid; twid and sx feed the rest of stage_f, i.e., the Bfly-4s and permutations, which are shared]

802.11a Transmitter [MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind

    Design block     Lines of code (BSV)   Relative area
    Controller        49                    0%
    Scrambler         40                    0%
    Conv. Encoder    113                    0%
    Interleaver       76                    1%
    Mapper           112                   11%
    IFFT              95                   85%
    Cyc. Extender     23                    3%

Complex arithmetic libraries constitute another 200 lines of code.

802.11a Transmitter synthesis results (only the IFFT block is changing)

    IFFT design                Area (mm²)   Throughput latency (CLKs/sym)   Min. freq. required
    Pipelined                  5.25          4                               1.0 MHz
    Combinational              4.91          4                               1.0 MHz
    Folded (16 Bfly-4s)        3.97          4                               1.0 MHz
    Super-Folded (8 Bfly-4s)   3.69          6                               1.5 MHz
    SF (4 Bfly-4s)             2.45         12                               3.0 MHz
    SF (2 Bfly-4s)             1.84         24                               6.0 MHz
    SF (1 Bfly-4)              1.52         48                              12 MHz

TSMC .18 micron; numbers reported are before place and route.
The same source code.
All these designs were done in less than 24 hours!
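The last column appears to follow from the second one. Assuming the transmitter must finish one OFDM symbol every 4 µs (the 802.11a symbol duration: 3.2 µs of data plus a 0.8 µs guard interval), the minimum clock is clocks-per-symbol divided by 4 µs. A small Python check of that reading (the function name is illustrative):

```python
# 802.11a OFDM symbol duration: 3.2 us of data plus a 0.8 us guard interval.
SYMBOL_PERIOD_US = 4.0


def min_clock_mhz(clocks_per_symbol: int) -> float:
    """Lowest clock (MHz) at which the IFFT still finishes one symbol per 4 us."""
    # cycles per symbol / microseconds per symbol = cycles per microsecond = MHz
    return clocks_per_symbol / SYMBOL_PERIOD_US


if __name__ == "__main__":
    for design, clks in [("Folded (16 Bfly-4s)", 4), ("SF (8 Bfly-4s)", 6),
                         ("SF (4 Bfly-4s)", 12), ("SF (2 Bfly-4s)", 24),
                         ("SF (1 Bfly-4)", 48)]:
        print(f"{design}: {min_clock_mhz(clks)} MHz")
```

Every row of the table matches this formula, which is why halving the number of Bfly-4s doubles the required clock.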

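Why are the areas so similar?

Folding should have given a 3x improvement in IFFT area (three stages folded onto one).

BUT a constant twiddle allows low-level optimization on a Bfly-4 block: a 2.5x area reduction! In the combinational and pipelined designs every Bfly-4 sees constant twiddle factors, so synthesis can specialize each one; the folded design's shared Bfly-4s receive their twiddles dynamically (selected by stage) and cannot be optimized this way.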
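A back-of-the-envelope check of this argument: if folding saves 3x but the unfolded design already enjoys a 2.5x per-Bfly-4 saving, the net gain should be only about 3 / 2.5 = 1.2x. The table's areas presumably also include the non-IFFT blocks, so the comparison below is only a rough consistency check, not a measurement of the IFFT alone.

```python
# Rough check of the slide's argument using the synthesis table (areas in mm^2).
FOLDING_GAIN = 3.0         # three IFFT stages folded onto one set of 16 Bfly-4s
CONST_TWIDDLE_GAIN = 2.5   # per-Bfly-4 saving available only with constant twiddles

expected_ratio = FOLDING_GAIN / CONST_TWIDDLE_GAIN   # Combinational vs. Folded
observed_ratio = 4.91 / 3.97                         # Combinational / Folded (16 Bfly-4s)

if __name__ == "__main__":
    print(f"expected ~{expected_ratio:.2f}x, observed {observed_ratio:.2f}x")
```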

Elastic pipeline: use FIFOs instead of pipeline registers

[Figure: x flows inQ → f1 → fifo1 → f2 → fifo2 → f3 → outQ]

    rule stage1 (True);
        fifo1.enq(f1(inQ.first()));
        inQ.deq();
    endrule
    rule stage2 (True);
        fifo2.enq(f2(fifo1.first()));
        fifo1.deq();
    endrule
    rule stage3 (True);
        outQ.enq(f3(fifo2.first()));
        fifo2.deq();
    endrule

Firing conditions?

Can tokens be left inside the pipeline?

Easier to write?

No Maybe types?

Can all three rules fire concurrently?


What behavior do we want?

[Figure: x flows inQ → f1 → fifo1 → f2 → fifo2 → f3 → outQ]

Maximize concurrency: fire the maximum number of rules.

    inQ   fifo1    fifo2    outQ   rule1   rule2   rule3
    NE    NE,NF    NE,NF    NF     Yes     Yes     Yes
    NE    NE,NF    NE,NF    F      Yes     Yes     No
    NE    NE,NF    NE,F     NF     Yes     No      Yes
    NE    NE,NF    NE,F     F      Yes     No      No
    ….

NE = not empty; NF/F = not full/full.

FIFOs must permit concurrent enq and deq for the three rules to fire concurrently.
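Each rule's enabling condition is just the conjunction of the implicit guards of the FIFO methods it calls. A hypothetical Python sketch (names `QState` and `elastic_enabled` are illustrative) that reproduces the rows above; note that it models only whether each rule is enabled, not whether the FIFOs let enabled rules fire in the same cycle.

```python
from typing import NamedTuple


class QState(NamedTuple):
    not_empty: bool
    not_full: bool


def elastic_enabled(in_q: QState, fifo1: QState, fifo2: QState, out_q: QState):
    """Which of the three elastic-pipeline rules are enabled (implicit guards only)."""
    rule1 = in_q.not_empty and fifo1.not_full    # inQ.first/deq, fifo1.enq
    rule2 = fifo1.not_empty and fifo2.not_full   # fifo1.first/deq, fifo2.enq
    rule3 = fifo2.not_empty and out_q.not_full   # fifo2.first/deq, outQ.enq
    return rule1, rule2, rule3


if __name__ == "__main__":
    NE_NF, NE_F = QState(True, True), QState(True, False)
    print(elastic_enabled(NE_NF, NE_NF, NE_F, NE_NF))   # (True, False, True)
```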

One-Element FIFO

    import FIFO::*;

    module mkFIFO1 (FIFO#(t)) provisos (Bits#(t, tSz));
        Reg#(t)    data <- mkRegU();
        Reg#(Bool) full <- mkReg(False);
        method Action enq(t x) if (!full);
            full <= True;
            data <= x;
        endmethod
        method Action deq() if (full);
            full <= False;
        endmethod
        method t first() if (full);
            return data;
        endmethod
        method Action clear();
            full <= False;
        endmethod
    endmodule

enq and deq cannot even be enabled together (enq requires !full, deq requires full), much less fire concurrently!

[Figure: the FIFO module's interface: enq and deq methods, each with rdy and enab signals; enq's rdy is "not full", deq's rdy is "not empty"]

More on FIFOs in the next lecture
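A hypothetical Python model of mkFIFO1 makes the point concrete: the ready conditions of enq and deq are exact complements of the `full` flag, so in no state are both ready. The class and method names are illustrative only.

```python
class Fifo1:
    """Python model of mkFIFO1: one data register plus a `full` flag."""

    def __init__(self):
        self.data = None
        self.full = False

    # Ready signals mirror the method guards: if (!full) and if (full).
    def enq_ready(self) -> bool:
        return not self.full

    def deq_ready(self) -> bool:
        return self.full

    def enq(self, x):
        assert self.enq_ready(), "enq called while not ready"
        self.data, self.full = x, True

    def deq(self):
        assert self.deq_ready(), "deq called while not ready"
        self.full = False

    def first(self):
        assert self.deq_ready(), "first called while not ready"  # same guard as deq
        return self.data


if __name__ == "__main__":
    q = Fifo1()
    for _ in range(2):
        # In every state exactly one of enq/deq is ready, never both.
        print(q.full, q.enq_ready(), q.deq_ready())
        if q.enq_ready():
            q.enq(42)
        else:
            q.deq()
```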


Concurrency when the FIFOs do not permit concurrent enq and deq

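[Figure: x flows inQ → f1 → fifo1 → f2 → fifo2 → f3 → outQ, annotated with what the rules require of each queue: inQ not empty; fifo1 not empty & not full; fifo2 not empty & not full; outQ not full]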

At best alternate stages in the pipeline will be able to fire concurrently
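A small cycle-level simulation shows the alternating pattern. This hypothetical Python sketch assumes one-element FIFOs (ready for enq only when empty, for deq only when full), an unbounded outQ, and made-up stage functions; rules 1 and 3 fire together in one cycle and rule 2 fires alone in the next, so a result emerges only every second cycle.

```python
def run_elastic_fifo1(inputs, f1, f2, f3, max_cycles=10_000):
    """Cycle model of the 3-rule elastic pipeline built from one-element FIFOs.

    fifo1/fifo2 hold None when empty. Because a one-element FIFO cannot enq
    and deq in the same cycle, no two enabled rules touch the same FIFO, so
    every enabled rule fires. outQ is assumed unbounded.
    """
    in_q = list(inputs)
    n_items = len(in_q)
    fifo1 = fifo2 = None
    out_q = []
    for cycle in range(max_cycles):
        if len(out_q) == n_items:
            break
        fire1 = bool(in_q) and fifo1 is None          # inQ not empty, fifo1 not full
        fire2 = fifo1 is not None and fifo2 is None   # fifo1 not empty, fifo2 not full
        fire3 = fifo2 is not None                     # fifo2 not empty (outQ never full)
        new1, new2 = fifo1, fifo2                     # rules read start-of-cycle values
        if fire3:
            out_q.append((cycle, f3(fifo2)))
            new2 = None
        if fire2:
            new2 = f2(fifo1)
            new1 = None
        if fire1:
            new1 = f1(in_q.pop(0))
        fifo1, fifo2 = new1, new2
    return out_q


if __name__ == "__main__":
    print(run_elastic_fifo1([0, 1, 2, 3], lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3))
```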


Inelastic vs. Elastic Pipelines

In an inelastic pipeline:
- typically only one rule; the designer controls precisely which activities go on in parallel
- downside: the rule can get too complicated; it is easy to make a mistake and difficult to make changes

In an elastic pipeline:
- several smaller rules, each easy to write, and easier to make changes
- downside: sometimes rules do not fire concurrently when they should

The tension

If multiple rules never fire in the same cycle, then the machine can hardly be called a pipelined machine.

If all rules fire in parallel every cycle when they are enabled, then, in general, wrong results can be produced.

More on this in the next lecture
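One common way to make "wrong" precise, assuming Bluespec's one-rule-at-a-time semantics, is that a parallel firing is correct only if it matches some sequential ordering of the same rules. The hypothetical rules below (a: x <= y, b: y <= x) show a parallel firing whose result matches neither order; the names are illustrative only.

```python
# Two hypothetical rules that touch the same registers:
#   rule a: x <= y;      rule b: y <= x;
def rule_a(s):
    return {"x": s["y"]}          # the register writes rule a makes


def rule_b(s):
    return {"y": s["x"]}


def fire_sequentially(s, rules):
    """One rule at a time: each rule sees the previous rule's writes."""
    for r in rules:
        s = {**s, **r(s)}
    return s


def fire_in_parallel(s, rules):
    """Same cycle: every rule reads the old state; writes land together."""
    new = dict(s)
    for r in rules:
        new.update(r(s))
    return new


if __name__ == "__main__":
    start = {"x": 1, "y": 2}
    print("a then b:", fire_sequentially(start, [rule_a, rule_b]))  # {'x': 2, 'y': 2}
    print("b then a:", fire_sequentially(start, [rule_b, rule_a]))  # {'x': 1, 'y': 1}
    print("parallel:", fire_in_parallel(start, [rule_a, rule_b]))   # {'x': 2, 'y': 1}
```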


Language notes

- Pattern-matching syntax
- Vector syntax
- Implicit conditions
- Static vs. dynamic expressions

Pattern-matching: a convenient way to extract data-structure components

    typedef union tagged {
        void Invalid;
        t    Valid;
    } Maybe#(type t);

    case (m) matches
        tagged Invalid  : return 0;
        tagged Valid .x : return x;
    endcase

x will get bound to the appropriate part of m.

    if (m matches tagged Valid .x &&& (x > 10))

The &&& is a conjunction, and allows pattern variables to come into scope from left to right.

Syntax: Vector of Registers

Register: suppose x and y are both of type Reg. Then x <= y means x._write(y._read()).

Vector of Int: x[i] means sel(x,i); x[i] = y[j] means x = update(x, i, sel(y,j)).

Vector of Registers: x[i] <= y[j] does not work. The parser thinks it means (sel(x,i)._read)._write(sel(y,j)._read), which will not type check. (x[i]) <= y[j] parses as sel(x,i)._write(sel(y,j)._read), and works correctly.

Don't ask me why.

Making guards explicit

    rule recirculate (True);
        if (p) fifo.enq(8);
        r <= 7;
    endrule

becomes

    rule recirculate ((p && fifo.enqG) || !p);
        if (p) fifo.enqB(8);
        r <= 7;
    endrule

where fifo.enqG is the guard of enq and fifo.enqB is its body. Effectively, all implicit conditions (guards) are lifted and conjoined to the rule guard.

Implicit guards (conditions)

Rule:

    rule <name> (<guard>); <action>; endrule

where

    <action> ::= r <= <exp>
               | m.g(<exp>)
               | if (<exp>) <action> endif
               | <action> ; <action>

To make implicit guards explicit, m.g(<exp>) is rewritten as

    m.gB(<exp>) when m.gG

Guards vs. ifs

A guard on one action of a parallel group of actions affects every action within the group:

    (a1 when p1); (a2 when p2)
    ==> (a1; a2) when (p1 && p2)

A condition of a conditional action only affects the actions within the scope of the conditional action:

    (if (p1) a1); a2

p1 has no effect on a2 ...

Mixing ifs and whens:

    (if (p) (a1 when q)); a2
    ==> ((if (p) a1); a2) when ((p && q) || !p)
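These rewriting rules can be checked with a tiny model in which an action is a function from state to (guard, register writes). The combinators `write`, `when`, `par` and `if_` below are hypothetical Python stand-ins for the lecture's kernel constructs, not a Bluespec API; the usage encodes the recirculate rule and confirms the lifted guard ((p && fifo.enqG) || !p).

```python
from itertools import product

# An action maps a state to (guard, register writes). Names are illustrative.


def write(reg, value_fn):
    """r <= exp: an unguarded register write."""
    return lambda s: (True, {reg: value_fn(s)})


def when(a, q):
    """a when q: the guard q is conjoined with a's own guard."""
    def act(s):
        g, w = a(s)
        return g and q(s), w
    return act


def par(a1, a2):
    """a1 ; a2 in parallel: both guards must hold for either to happen."""
    def act(s):
        g1, w1 = a1(s)
        g2, w2 = a2(s)
        return g1 and g2, {**w1, **w2}
    return act


def if_(p, a):
    """if (p) a: a's guard matters only when p holds, i.e. (p && g) || !p."""
    def act(s):
        return a(s) if p(s) else (True, {})
    return act


def enq_g(s):
    """fifo.enqG, modeled as 'the FIFO is not full'."""
    return not s["fifo_full"]


# rule recirculate: if (p) fifo.enq(8); r <= 7;   (fifo.enq modeled as a write)
recirculate = par(if_(lambda s: s["p"], when(write("fifo", lambda s: 8), enq_g)),
                  write("r", lambda s: 7))

if __name__ == "__main__":
    for p, full in product([True, False], repeat=2):
        print(p, full, recirculate({"p": p, "fifo_full": full}))
```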

Static vs dynamic expressions

Expressions that can be evaluated at compile time will be evaluated at compile time: 3+4 → 7.

Some expressions do not have run-time representations and must be evaluated away at compile time; an error will occur if the compile-time evaluation does not succeed. Examples: Integers, reals, loops, lists, functions, …

Generalization: n-stage pipeline

    rule sync_pipeline (True);
        if (inQ.notEmpty()) begin
            sReg[0] <= tagged Valid f(0, inQ.first());
            inQ.deq();
        end
        else sReg[0] <= tagged Invalid;
        for (Integer i = 1; i < n-1; i = i + 1) begin
            case (sReg[i-1]) matches
                tagged Valid .sx: sReg[i] <= tagged Valid f(fromInteger(i), sx);
                tagged Invalid:   sReg[i] <= tagged Invalid;
            endcase
        end
        case (sReg[n-2]) matches
            tagged Valid .sx: outQ.enq(f(fromInteger(n-1), sx));
        endcase
    endrule

[Figure: x flows inQ → f(0) → sReg[0] → f(1) → sReg[1] → f(2) ... sReg[n-2] → f(n-1) → outQ]
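A hypothetical cycle-level Python model of the n-stage rule, with None standing for Invalid and an unbounded outQ, shows its behavior: after a latency of n-1 cycles it delivers one result per cycle, in contrast with the folded pipeline's one result every n cycles. `demo_f` is a made-up stage function.

```python
def run_sync_pipeline(inputs, f, n):
    """Cycle model of the n-stage inelastic sync_pipeline rule (n >= 2).

    sReg[k] holds None for Invalid. outQ is assumed unbounded, so the rule
    fires every cycle; all reads use register values from the start of the cycle.
    """
    in_q = list(inputs)
    s_reg = [None] * (n - 1)              # sReg[0] .. sReg[n-2]
    out_q = []
    cycle = 0
    while in_q or any(v is not None for v in s_reg):
        old = list(s_reg)                 # registers update only at the clock edge
        if old[n - 2] is not None:
            out_q.append((cycle, f(n - 1, old[n - 2])))
        for i in range(1, n - 1):
            s_reg[i] = None if old[i - 1] is None else f(i, old[i - 1])
        s_reg[0] = f(0, in_q.pop(0)) if in_q else None
        cycle += 1
    return out_q


def demo_f(stage, x):
    """Hypothetical stage function standing in for the real f."""
    return 2 * x + stage


if __name__ == "__main__":
    print(run_sync_pipeline([1, 2, 3], demo_f, n=3))
```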

Next lecture: Concurrency analysis
