Outline
1. Poor design practice and remedy
2. More counters
3. Register as fast temporary storage
4. KDA application examples
RTL Hardware Design by P. Chu
Chapter 9.1-3 and 9.5 1
1. Poor design practice and remedy
• Synchronous design is the most important methodology
• Poor practice in the past (to save chips)
– Misuse of asynchronous reset
– Misuse of gated clock
– Misuse of derived clock
Misuse of asynchronous reset
• Poor design: use the asynchronous reset to clear the register in normal operation
• E.g., a poorly designed mod-10 counter
– Clear the register immediately after the counter reaches 1010
• Problem
– Glitches in transition 1001 (9) => 0000 (0)
– Glitches in async_clr can reset the counter
– How about timing analysis? (maximal clock rate)
• Asynchronous reset should only be used for power-on initialization
• Remedy: load “0000” synchronously
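A minimal behavioral sketch of this remedy, written in Python rather than VHDL so that it can be run directly. The names `mod10_next` and `simulate` are illustrative assumptions and do not come from the slides. The wrap-around is handled in the next-state logic: when the register holds 9, the next value is 0, so the pattern 1010 never appears and the asynchronous reset is left for power-on initialization only.

```python
def mod10_next(r_reg):
    """Next-state logic: load 0 synchronously when the count is 9."""
    return 0 if r_reg == 9 else r_reg + 1


def simulate(cycles, r_reg=0):
    """Return the register value seen in each clock cycle."""
    trace = []
    for _ in range(cycles):
        trace.append(r_reg)
        r_reg = mod10_next(r_reg)  # register update at the clock edge
    return trace


if __name__ == "__main__":
    print(simulate(12))
```

Because the clear is an ordinary next-state value, the path through it is covered by normal setup-time analysis like any other register input.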
Misuse of gated clock
• Poor design: use an AND gate to disable the clock and stop the register from getting a new value
• E.g., a counter with an enable signal
• Problem
– Gated clock width can be narrow
– Gated clock may pass glitches of en
– Difficult to design the clock distribution network
• Remedy: use a synchronous enable
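A small Python model of a counter with a synchronous enable may make the idea concrete. The names `counter_next` and `run_enable_counter` and the 4-bit width are assumptions chosen for illustration. The register sees every clock edge; the enable only selects between "keep the old value" and "take the incremented value" in the next-state logic.

```python
def counter_next(r_reg, en, width=4):
    """Next-state logic: increment when en is 1, otherwise hold the value."""
    if en:
        return (r_reg + 1) % (1 << width)
    return r_reg


def run_enable_counter(en_samples, r_reg=0):
    """Apply one clock edge per en sample; return the value after each edge."""
    trace = []
    for en in en_samples:
        r_reg = counter_next(r_reg, en)  # every edge is an ungated clk edge
        trace.append(r_reg)
    return trace


if __name__ == "__main__":
    print(run_enable_counter([1, 1, 0, 0, 1]))
```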
Misuse of derived clock
• Subsystems may run at different clock rates
• Poor design: use a derived slow clock for the slow subsystem
• Problem
– Multiple clock distribution networks
– How about timing analysis? (maximal clock rate)
• Better: use a synchronous one-clock enable pulse
• E.g., second and minute counter
– Input: 1 MHz clock
– Poor design: drive the second and minute counters with derived slow clocks
– Better design: clock every register with the 1 MHz clock and use one-clock enable pulses
• VHDL code of poor design
• Remedy: use a synchronous 1-clock pulse
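The one-clock-pulse scheme can be sketched as a Python model of the second and minute counter. This is an illustration under stated assumptions, not the VHDL listing from the slides: the names `tick` and `run_clock` are hypothetical, and the divider length is a parameter (`clk_per_sec`, default 1,000,000 for the 1 MHz input) so that the model can be exercised quickly with a small value. All three registers are updated on every clock; `sec_en` and `min_en` are asserted for exactly one clock cycle.

```python
def tick(state, clk_per_sec=1_000_000):
    """One clock edge of the divider, second counter and minute counter."""
    r, sec, minute = state
    sec_en = (r == clk_per_sec - 1)      # one-clock pulse, once per second
    min_en = sec_en and sec == 59        # one-clock pulse, once per minute
    r_next = 0 if sec_en else r + 1
    sec_next = (0 if sec == 59 else sec + 1) if sec_en else sec
    min_next = (0 if minute == 59 else minute + 1) if min_en else minute
    return (r_next, sec_next, min_next)


def run_clock(cycles, clk_per_sec=1_000_000):
    """Start from all zeros and apply the given number of clock edges."""
    state = (0, 0, 0)
    for _ in range(cycles):
        state = tick(state, clk_per_sec)
    return state


if __name__ == "__main__":
    # Shortened divider (4 clocks per "second") to keep the run short.
    print(run_clock(4 * 61, clk_per_sec=4))
```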
A word about power
• Power is a major design criterion now
• In CMOS technology
– Dynamic power is proportional to the switching frequency of transistors
– High clock rate implies high switching frequency
• Clock manipulation
– Can reduce switching frequency
– But should not be done at RT level
• Development flow:
1. Design/synthesize/verify regular synchronous subsystems
2(a). Derived clock: use a special circuit (PLL etc.) to obtain derived clocks
2(b). Gated clock: use a “power optimization” software tool to convert some registers to use a gated clock
2. More counters
• Counter circulates a set of specific patterns
• Counter:
– Binary counter
– Gray counter
– Ring counter
– Linear Feedback Shift Register (LFSR)
– BCD counter
• Binary counter:
– State follows binary counting sequence
– Use an incrementor for the next-state logic
[Figure: binary counter; a register (d, q, clk, reset) holds r_reg, and a +1 incrementor produces r_next]
• Gray counter:
– State changes one bit at a time
– Use a Gray incrementor
Gray code counter (section 7.5.1)
RTL Hardware Design by P. Chu
Chapter 7 30
• Direct implementation
• Observation
– Requires 2^n rows
– No simple algorithm for Gray code increment
– One possible method
  • Gray to binary
  • Increment the binary number
  • Binary to Gray
• Binary to Gray
• Gray to binary
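The three-step method (Gray to binary, increment, binary to Gray) can be sketched in Python. The function names are illustrative assumptions. The conversions use the standard identities: a Gray code word is the binary word xored with itself shifted right by one, and the binary word is recovered by xoring together all right shifts of the Gray word.

```python
def bin_to_gray(b):
    """Binary to Gray: each Gray bit is b(i) xor b(i+1)."""
    return b ^ (b >> 1)


def gray_to_bin(g):
    """Gray to binary: each binary bit is the xor of all Gray bits at or above it."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def gray_inc(g, n=4):
    """Gray incrementor: convert to binary, add 1 (mod 2^n), convert back."""
    b = gray_to_bin(g)
    b = (b + 1) % (1 << n)
    return bin_to_gray(b)


if __name__ == "__main__":
    g = 0
    for _ in range(16):
        print(format(g, "04b"))
        g = gray_inc(g)
```

In hardware the three steps are pure combinational logic placed between r_reg and r_next, so the register structure is the same as for the binary counter.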
Ring counter
• Circulate a single 1
• E.g., 4-bit ring counter: 1000, 0100, 0010, 0001
• n patterns for n-bit register
• Output appears as an n-phase signal
• Non-self-correcting design
– Insert “0001” at initialization and circulate the pattern in normal operation
– Fastest counter
• Self-correcting design: shift in a ‘1’ only when the 3 MSBs are 000
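A Python sketch of the self-correcting rule, assuming a right-shifting register whose serial input enters at the MSB (the function name `ring_next` is hypothetical). The serial input is 1 only when the bits above the LSB are all 0, so an invalid pattern drains out and a single 1 is re-inserted.

```python
def ring_next(r, width=4):
    """Next state of a self-correcting ring counter (shift right, MSB in)."""
    upper = (r & ((1 << width) - 1)) >> 1   # bits width-1..1 after the shift
    s_in = 1 if upper == 0 else 0           # shift in '1' only when MSBs are 0
    return (s_in << (width - 1)) | upper


if __name__ == "__main__":
    state = 0b1111  # an invalid pattern, e.g. after a disturbance
    for _ in range(8):
        print(format(state, "04b"))
        state = ring_next(state)
```

Starting from 0001 the model circulates 1000, 0100, 0010, 0001; starting from an invalid pattern it falls back into that sequence after a few clocks.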
LFSR (Linear Feedback Shift Register)
• A shift register with a special feedback circuit to generate the serial input
• The feedback circuit performs an xor operation over specific bits
• Can circulate through 2^n-1 states for an n-bit register
• E.g., 4-bit LFSR
• Property of LFSR
– n-bit LFSR can cycle through 2^n-1 states
– The feedback circuit always exists
– The sequence is pseudorandom
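A 4-bit LFSR can be modeled in a few lines of Python. The tap positions are an assumption, since the slide's figure is not reproduced here: the sketch shifts right and feeds the xor of the two least significant bits into the MSB, which is one maximal-length choice for 4 bits. The names `lfsr4_next` and `lfsr4_cycle` are hypothetical.

```python
def lfsr4_next(q):
    """Shift right; serial input is q(1) xor q(0) (assumed tap positions)."""
    fb = ((q >> 1) ^ q) & 1
    return (fb << 3) | (q >> 1)


def lfsr4_cycle(seed=0b1000):
    """Collect states until the seed reappears."""
    states = [seed]
    q = lfsr4_next(seed)
    while q != seed:
        states.append(q)
        q = lfsr4_next(q)
    return states


if __name__ == "__main__":
    print([format(s, "04b") for s in lfsr4_cycle()])
```

The all-zero state is excluded from the cycle: with xor feedback, 0000 maps to itself, which is why the count is 2^n-1 rather than 2^n.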
• Application of LFSR
– Pseudorandom: used in testing, data encryption/decryption
– A counter with simple next-state logic, e.g., a 128-bit LFSR uses 3 xor gates to circulate 2^128-1 patterns (takes about 10^20 years for a 100 GHz system)
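The order of magnitude of the traversal time can be checked with a few lines of arithmetic. The sketch assumes one pattern per clock cycle and a 365.25-day year; the variable names are illustrative.

```python
patterns = 2**128 - 1                  # states visited by a 128-bit LFSR
clock_hz = 100e9                       # 100 GHz, one pattern per clock
seconds_per_year = 365.25 * 24 * 3600  # assumed year length

years = patterns / clock_hz / seconds_per_year

if __name__ == "__main__":
    print(f"{years:.2e} years")
```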
• Read the remainder of Section 9.2.3 (design that includes the 00..00 state)
• Read Section 9.2.4 (BCD counter; design similar to the second/minute counter in Section 9.1.3)
PWM (pulse width modulation)
• Duty cycle: percentage of time that the signal is asserted
• PWM: use a signal, w, to specify the duty cycle
– Duty cycle is w/16 if w is not “0000”
– Duty cycle is 16/16 if w is “0000”
• Implemented by a binary counter with a special output circuit
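One way to realize the "special output circuit" is a comparator on the counter value; the following Python sketch assumes that form (the output is asserted while the 4-bit count is below w, or always when w is 0) and is not taken from the slides' listing. The name `pwm_period` is hypothetical.

```python
def pwm_period(w):
    """Output samples over one 16-clock period of a 4-bit binary counter."""
    return [1 if (count < w or w == 0) else 0 for count in range(16)]


def duty_cycle(w):
    """Fraction of the period during which the output is asserted."""
    return sum(pwm_period(w)) / 16


if __name__ == "__main__":
    for w in (0, 1, 4, 15):
        print(w, pwm_period(w), duty_cycle(w))
```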
3. Register as fast temporary storage
• RAM
– RAM cell designed at transistor level
– Cell uses minimal area
– Behaves like a latch
– For mass storage
– Needs special interface logic
• Register
– D FF requires much larger area
– Synchronous
– For small, fast storage
– E.g., register file, fast FIFO, fast CAM (content addressable memory)
Register file
• Registers arranged as a 1-d array
• Each register is identified with an address
• Normally has 1 write port (with write enable signal)
• Can have multiple read ports
• E.g., 4-word register file w/ 1 write port and two read ports
• Register array:
– 4 registers
– Each register has an enable signal
• Write decoding circuit:
– 0000 if wr_en is 0
– 1 bit asserted according to w_addr if wr_en is 1
• Read circuit:
– A mux for each read port
• 2-d data type needed
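The three parts listed above (register array, write decoder, read muxes) can be sketched as a Python behavioral model of the 4-word register file with one write port and two read ports. The class and method names are assumptions for illustration; a Python list stands in for the 2-d data type.

```python
class RegFile:
    """4-word register file: 1 write port, 2 read ports (behavioral model)."""

    def __init__(self, words=4):
        self.regs = [0] * words  # register array

    def decode(self, wr_en, w_addr):
        """Write decoder: all zeros if wr_en is 0, else one-hot on w_addr."""
        return [1 if (wr_en and i == w_addr) else 0
                for i in range(len(self.regs))]

    def clock(self, wr_en, w_addr, w_data):
        """Clock edge: only the register whose enable is 1 loads w_data."""
        en = self.decode(wr_en, w_addr)
        self.regs = [w_data if e else r for e, r in zip(en, self.regs)]

    def read(self, r_addr0, r_addr1):
        """Two read ports: one multiplexer per port."""
        return self.regs[r_addr0], self.regs[r_addr1]


if __name__ == "__main__":
    rf = RegFile()
    rf.clock(1, 2, 0xAB)
    rf.clock(0, 1, 0xFF)   # wr_en is 0, so nothing is written
    print(rf.read(2, 1))
```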
FIFO Buffer
• “Elastic” storage between two subsystems
• Circular queue implementation
• Use two pointers and a “generic storage”
– Write pointer: points to the empty slot before the head of the queue
– Read pointer: points to the tail of the queue
• FIFO controller
– Read and write pointers: 2 counters
– Status circuit:
  • Difficult
  • Design 1: Augmented binary counter
  • Design 2: with status FFs
– LFSR as counter
• Augmented binary counter:
– Increase the counter width by 1 bit
– Use the LSBs (in this case 3 bits) as the register address
– Use the MSB to distinguish full from empty
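A Python sketch of the augmented-counter scheme for an 8-word FIFO (3 address bits plus 1 extra bit). The class name and the rule that a write is ignored when full and a read is ignored when empty are assumptions made for this illustration. The FIFO is empty when the two 4-bit pointers are equal, and full when their 3 LSBs are equal but their MSBs differ.

```python
class AugmentedFifoCtrl:
    """FIFO controller with pointers one bit wider than the address."""

    def __init__(self, addr_bits=3):
        self.addr_bits = addr_bits
        self.mod = 1 << (addr_bits + 1)   # pointers count modulo 2^(n+1)
        self.w_ptr = 0
        self.r_ptr = 0

    @property
    def empty(self):
        return self.w_ptr == self.r_ptr

    @property
    def full(self):
        # Same address bits, different MSB: xor leaves only the MSB set.
        return (self.w_ptr ^ self.r_ptr) == (1 << self.addr_bits)

    @property
    def w_addr(self):
        return self.w_ptr & ((1 << self.addr_bits) - 1)

    @property
    def r_addr(self):
        return self.r_ptr & ((1 << self.addr_bits) - 1)

    def clock(self, wr, rd):
        """One clock edge; status is sampled before the pointers move."""
        full, empty = self.full, self.empty
        if wr and not full:
            self.w_ptr = (self.w_ptr + 1) % self.mod
        if rd and not empty:
            self.r_ptr = (self.r_ptr + 1) % self.mod


if __name__ == "__main__":
    c = AugmentedFifoCtrl()
    for _ in range(8):
        c.clock(wr=True, rd=False)
    print(c.full, c.empty, c.w_addr, c.r_addr)
```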
• 2 extra status FFs
– Full_reg/empty_reg memorize the current status
– Initialized as 0 and 1
– Modified according to wr and rd signals (i.e., wr&rd):
  • 00: no change
  • 11: advance read pointer/write pointer; full/empty no change due to both read and write operations
  • 10: advance write pointer; de-assert empty; assert full if needed (when write pointer=read pointer)
  • 01: advance read pointer; de-assert full; assert empty if needed (when write pointer=read pointer)
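The four wr&rd cases can be sketched as a Python model of the controller with two status flip-flops. The class name is hypothetical, and two details are assumptions not stated on the slide: a write is ignored while full, and a read is ignored while empty. The pointer comparison uses the successor value, i.e., the pointer as it will be after the clock edge.

```python
class StatusFifoCtrl:
    """FIFO controller with n-bit pointers plus full/empty status FFs."""

    def __init__(self, addr_bits=3):
        self.size = 1 << addr_bits
        self.w_ptr = 0
        self.r_ptr = 0
        self.full = False    # full_reg initialized to 0
        self.empty = True    # empty_reg initialized to 1

    def clock(self, wr, rd):
        w_succ = (self.w_ptr + 1) % self.size
        r_succ = (self.r_ptr + 1) % self.size
        if wr and rd:                      # 11: both advance, status unchanged
            self.w_ptr, self.r_ptr = w_succ, r_succ
        elif wr:                           # 10: write only
            if not self.full:              # assumed guard
                self.w_ptr = w_succ
                self.empty = False
                self.full = (w_succ == self.r_ptr)
        elif rd:                           # 01: read only
            if not self.empty:             # assumed guard
                self.r_ptr = r_succ
                self.full = False
                self.empty = (r_succ == self.w_ptr)
        # 00: no change


if __name__ == "__main__":
    c = StatusFifoCtrl()
    for _ in range(8):
        c.clock(wr=True, rd=False)
    print(c.full, c.empty, c.w_ptr, c.r_ptr)
```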
• Non-binary counter for the pointer
– Exact location does not matter as long as the write pointer and read pointer follow the same pattern
– Other counters can be used for the second scheme
– E.g., use LFSR
KDA application: FPGA modem overview
• Modem datapath functionality (i.e., TXD and RXD) created in the Xilinx System Generator tool
• Interfaces and high-level control hand-coded in VHDL
• Implemented using Xilinx Virtex4 lx60
[Figure: FPGA modem block diagram; labels include RW_IF, IRQ, MCU, PIF, CRU, MBIF, ADIF, TXD, RXD, TX_AGC, I/Q TX_DATA, RX_RSSI, RX_DATA, PA, MCLK, ARST_N, TX_CTRL, TX_TDM, TX_EOW, RX_CTRL, RX_TDM, RX_EOW, RFIF, CLK_7M37, RF]
KDA application: Top level TXD and RXD design
[Figure: top-level TXD and RXD block diagram; labels: TX, RX, FIFO (x2), DAC, AD, 20.16 MHz, 53.76 MHz]
• Multiple-rate and multiple-clock-domain System Generator design
• 4+2 .ngc netlist files from System Generator integrated in top-level VHDL design
• Timing constraints
– Multi-cycle timing constraints included in .ngc files (NB! ChipScope Pro)
– All clock nets, clock domain crossings and other known paths constrained, keeping the number of unconstrained paths to a minimum
– ISE Timing Analyzer reports all unconstrained paths
KDA application: Packet DMA; the PTX module
• Packet DMA in transmission direction (i.e., from the CPU).
/ 75 /INF5430
• Payload data is first written to 32/64 kbyte Xilinx Block RAM (BRAM), and then two 16-bit words with the packet data start address in BRAM and the number of bytes in the packet are written to the cntrl packet FIFO.
• BUFRAM Finite State Machine (FSM) first reads the two control words and then reads payload data from BRAM.
• CTRL packet FIFO must be rather large (i.e., 1 kbyte) to be able to store many small packets.
• PCIe-PIF BRAM write interface and BUFRAM FSM read interface are in different clock domains!
• For design verification during implementation, data may be routed back (i.e., to the CPU) via the receiving PRX module (see next slide) in data loopback mode.
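The descriptor scheme described above (payload in BRAM, then two 16-bit control words in a FIFO) can be illustrated with a small Python model. Everything here is a sketch under assumptions: the class and method names are hypothetical, the BRAM is modeled as a 32 kbyte byte array with wrap-around addressing, the clock-domain crossing is ignored, and the capacity estimate assumes the control FIFO holds nothing but the two control words per packet.

```python
from collections import deque

BRAM_BYTES = 32 * 1024        # 32 kbyte option
CTRL_FIFO_BYTES = 1024        # 1 kbyte control FIFO
BYTES_PER_DESCRIPTOR = 2 * 2  # two 16-bit control words per packet
MAX_PACKETS = CTRL_FIFO_BYTES // BYTES_PER_DESCRIPTOR


class PacketDma:
    """Payload buffer plus control-word FIFO (behavioral illustration)."""

    def __init__(self):
        self.bram = bytearray(BRAM_BYTES)
        self.ctrl_fifo = deque()   # sequence of 16-bit control words
        self.wr_addr = 0           # next free BRAM address (assumed policy)

    def write_packet(self, payload):
        """Writer side: store the payload first, then push the control words."""
        start = self.wr_addr
        for i, byte in enumerate(payload):
            self.bram[(start + i) % BRAM_BYTES] = byte
        self.wr_addr = (start + len(payload)) % BRAM_BYTES
        self.ctrl_fifo.append(start & 0xFFFF)         # word 1: start address
        self.ctrl_fifo.append(len(payload) & 0xFFFF)  # word 2: byte count

    def read_packet(self):
        """Reader side: pop the two control words, then fetch the payload."""
        start = self.ctrl_fifo.popleft()
        length = self.ctrl_fifo.popleft()
        return bytes(self.bram[(start + i) % BRAM_BYTES] for i in range(length))


if __name__ == "__main__":
    dma = PacketDma()
    dma.write_packet(b"abc")
    dma.write_packet(b"hello")
    print(MAX_PACKETS, dma.read_packet(), dma.read_packet())
```

Writing the control words only after the payload is in place means the reader never sees a descriptor for data that has not yet been stored.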
KDA application: Packet DMA; the PRX module
• Packet DMA in receiver direction (i.e. to the CPU).
• Payload data is first stored in 32/64 kbyte BRAM, and then two 16-bit words with the packet data start address in BRAM and the number of bytes in the packet are written to the cntrl packet FIFO.
• The CPU first reads the two control words from the cntrl packet FIFO and then reads payload data from BRAM.
• CTRL packet FIFO must be rather large (i.e., 1 kbyte) to be able to store many small packets.
• PCIe-PIF BRAM read interface and BUFRAM FSM write interface are in different clock domains!
• Payload data may be received from the PTX module in loopback mode.