Post on 11-Jan-2016
ECE 551: Digital System Design & Synthesis
Lecture 10: Synthesis Techniques
Lecture 10 Topics
- Synthesis Process Revisited
- Optimization Stages in Synthesis
- Advanced Synthesis Strategies
Synthesis
- Verilog files aren’t hardware yet! Need to “synthesize” them
  - Tool reads hardware descriptions
  - Figures out what hardware to make
- Done automatically
  - Faster! Easier!
- Designers still have to understand hardware!
  - Avoid pre- vs. post-synthesis discrepancies
  - Describe EFFICIENT hardware
Useful Documentation
- Fairly complete documentation is available for the Synopsys tools using: /afs/engr.wisc.edu/apps/eda/synopsys/syn_Y-2006.06-SP1/sold
- See especially (through the Design Compiler link):
  - Design Vision User Guide
  - Design Compiler User Guide
  - Design Compiler Reference Manuals
  - HDL Compiler (Presto Verilog) Reference Manual
  - HDL Compiler for Verilog Reference Manual
- Use as references
HDL Compiler for Verilog Reference Manual, pg. 1-5.
HDL Compiler is called by Design Compiler and Design Vision
Why do we need to compare synthesized code to initial code?
Design Compiler User Guide, pg. 2-17
Design Vision is the GUI for Design Compiler: use design_vision
Can also run Design Compiler directly using dc_shell
To compile using a synthesis script, use: dc_shell -tcl_mode -f file_name
Synthesis Script Example [1]
# To run, place in the directory with all the Verilog files
# and type: dc_shell -tcl_mode -f script.tcl

# Analyze input files.
analyze -library WORK -format verilog {./prob5.v ./prob1.v ./prob2.v}

# Elaborate the design.
elaborate GF_multiplier_mword -architecture verilog -library WORK

# Sets clock constraint of 2ns with 50% duty cycle on signal "clock".
create_clock -name "clk" -period 2 -waveform {0 1} {clock}
set_dont_touch_network [ find clock clk ]

# Sets the area constraint for the design
set_max_area 50000
Synthesis Script Example [2]
# Check and compile the design
check_design > check_design.txt
uniquify
compile -map_effort medium

# Export netlist for post-synthesis simulation into synth_netlist.v
change_names -rule verilog -hierarchy
write -format verilog -hierarchy -output synth_netlist.v

# Generate reports
report_resources > resource_report.txt
report_area > area_report.txt
report_timing > timing_report.txt
report_constraint -all_violators > violator_report.txt
report_register -level_sensitive > latch_report.txt

exit
Internal Synthesizer Flow (Synopsys)
[Figure: flow diagram with boxes: HDL Description; Syntax Checking; Synthesizer Policy Checking; Elaboration & Translation; Structural Representation; Architectural Optimization; Multi-Level Logic Optimization; Technology Mapping; Technology Library; Technology-Based Implementation]
Initial Steps
- Parsing for Syntax and Semantics Checking
  - Gives error messages and warnings to user
  - User may modify the HDL description in response
- Synthesizer Policy Checking (“Check Design”)
  - Check for adherence to allowable language constructs
  - Are you using unsupported operators or constructs?
    - Combinational feedback? Multiple drivers to non-tristate?
  - This is where you find out you can’t use certain Verilog constructs
  - This is synthesizer-dependent
    - Example: Advanced DesignWare library allows modulo with any value; most other tools only allow modulo with powers of 2.
  - Certain things common to MOST synthesizers
  - See HDL Compiler for Verilog Reference Manual for constructs
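The modulo restriction mentioned above can be pictured with a small sketch. The module and port names are hypothetical, and whether the second assignment is accepted depends on the tool and library in use.

```verilog
// Hypothetical module, for illustration only.
module mod_examples (
    input  wire [7:0] x,
    output wire [2:0] r_pow2,
    output wire [3:0] r_any
);
    // Divisor is a power of 2: for unsigned x this is just x[2:0],
    // so it is the form most synthesizers accept.
    assign r_pow2 = x % 8;

    // Arbitrary divisor: needs a real divider/remainder circuit.
    // Support is synthesizer-dependent (assumption: check the
    // policy-checking messages of your own tool).
    assign r_any  = x % 10;
endmodule
```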
Elaboration & Translation
- Unrolls loops, substitutes macros & parameters, computes constant functions, evaluates generate conditionals
- Builds a structural representation of the design
  - Like a netlist, but includes larger components
  - Not just gate-level, may include adders, etc.
- Gives additional errors or warnings to the user
  - Issues in initial transformation to hardware
  - For example, port sizes do not match
- Affects quality achieved by optimization steps
  - Structural representation depends on HDL quality
  - Poor HDL can prevent optimization
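As a rough illustration of what elaboration has to resolve, the sketch below uses a parameter, a constant-bound loop, and a generate conditional. The module, parameter, and label names are made up for this example.

```verilog
// Hypothetical parity module, used only to illustrate elaboration.
module parity #(
    parameter WIDTH = 8,  // substituted during elaboration
    parameter ODD   = 0   // 1 = odd parity, 0 = even parity
) (
    input  wire [WIDTH-1:0] data,
    output wire             p
);
    // The loop bound is a constant, so the loop can be unrolled
    // into WIDTH XOR operations.
    integer i;
    reg     x;
    always @(*) begin
        x = 1'b0;
        for (i = 0; i < WIDTH; i = i + 1)
            x = x ^ data[i];
    end

    // The generate conditional is evaluated at elaboration time:
    // only one of the two assignments exists in the elaborated design.
    generate
        if (ODD) begin : g_odd
            assign p = ~x;
        end else begin : g_even
            assign p = x;
        end
    endgenerate
endmodule
```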
Importance of Translation
- It is important for the tool to recognize the sort of logic structures you are trying to describe.
- If it sees a 32-bit full adder, the tool has built-in solutions for optimizing adders
  - Ripple-carry, carry-save, carry look-ahead, etc.
- If it just sees a Boolean function with 65 inputs, it has to work a lot harder to achieve the same results
  - Do you think it can invent a CLA on the fly?
Implications of Translation
- Writing clear, easy-to-understand code not only benefits other engineers, but may give you better synthesis results.
  - Another reason for standard coding guidelines
  - Brush up on the list in “Verilog Styles That Kill”
- If you have a decent synthesis tool, it’s usually better to use Verilog’s built-in arithmetic operators rather than trying to build them from gates or Boolean equations
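A minimal sketch of the two coding styles contrasted above. Module names and the parameter N are hypothetical. Both modules compute the same sum, but only the first hands the tool an explicit arithmetic operator to work with; how a given tool treats the second is not guaranteed.

```verilog
// Operator style: the tool sees "+" and can choose an adder architecture.
module add_op #(parameter N = 32) (
    input  wire [N-1:0] a, b,
    input  wire         cin,
    output wire [N-1:0] sum,
    output wire         cout
);
    assign {cout, sum} = a + b + cin;
endmodule

// Equation style: a hand-built ripple-carry adder. The tool sees only
// Boolean equations and may or may not recognize an adder in them.
module add_gates #(parameter N = 32) (
    input  wire [N-1:0] a, b,
    input  wire         cin,
    output wire [N-1:0] sum,
    output wire         cout
);
    wire [N:0] c;
    assign c[0] = cin;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : g_fa
            assign sum[i] = a[i] ^ b[i] ^ c[i];
            assign c[i+1] = (a[i] & b[i]) | (c[i] & (a[i] ^ b[i]));
        end
    endgenerate

    assign cout = c[N];
endmodule
```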
Optimization in Synthesis
- None of these are guaranteed!
  - Most synthesizers will make at least some attempt
- Detect and eliminate redundant logic
- Detect combinational feedback loops
- Exploit don't-care conditions
- Try to detect unused states
- Detect and collapse equivalent states
- Make state assignments if not made already
- Synthesize multi-level logic equations subject to:
  - constraints on area and/or speed
  - available technology (library)
Optimization Process
- Optimization modifies the generic netlist resulting from elaboration and translation.
  - Uses cells from the technology library (mapping)
  - Attempts to meet all specified constraints
- The process is divided into major phases
  - All or some selection of the major phases may be performed during optimization
  - Phase selection can be controlled by the user
  - Some optimizations can be disabled (ex: set_structure) or forced (ex: set_flatten)
Optimization Phases
- Major Optimization Stages
  - Architectural
  - Logic-Level
  - Gate-Level
- Architectural optimization
  - High-level optimizations that occur before the design is mapped to the logic-level
  - Based on constraints and high-level coding style
  - After optimization, circuit function is represented by a generic, technology-independent netlist (GTECH)
Architectural Optimization
- In Synopsys, optimizations include:
  - Sharing common mathematical subexpressions
  - Sharing resources
  - Selecting DesignWare* implementations
    - Replacing the generic representation from Translation with pre-built, optimized circuits
  - Reordering operators
  - Identifying arithmetic expressions for datapath synthesis
*DesignWare is Synopsys’s library of pre-designed circuit implementations
Architectural Optimization
- Examples:
  - Replace an adder used as a counter with an incrementer
      count = count + 1;
  - Replace adder and separate subtractor with adder/subtractor if not used simultaneously
      if (~sub) z = a + b; else z = a - b;
  - Performs selection of pre-designed components (Synopsys DesignWare): adders, multipliers, shifters, comparators, muxes, etc.
- Need good code for synthesizer to do this
  - Designer knows more about the project than the tool does! It can only do so much on its own.
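The adder/subtractor example above can be written out as a complete module. This is a sketch with hypothetical module names and an assumed width parameter N. The second module shows, by hand, one way a single adder can serve both operations; it is not a claim about what any particular tool generates.

```verilog
// As the designer would write it: add and subtract are never
// needed at the same time, so one datapath element can serve both.
module addsub #(parameter N = 8) (
    input  wire [N-1:0] a, b,
    input  wire         sub,
    output reg  [N-1:0] z
);
    always @(*) begin
        if (~sub) z = a + b;
        else      z = a - b;
    end
endmodule

// Hand-written equivalent with one adder: a - b = a + ~b + 1,
// so b is conditionally inverted and sub is used as the carry-in.
module addsub_shared #(parameter N = 8) (
    input  wire [N-1:0] a, b,
    input  wire         sub,
    output wire [N-1:0] z
);
    assign z = a + (b ^ {N{sub}}) + sub;
endmodule
```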
Logic/Gate-Level Optimization
- Works on the generic netlist created by logic synthesis
- Produces a technology-specific netlist.
- In Synopsys, it consists of four stages:
  - Mapping
  - Delay optimization
  - Design rule fixing
  - Area optimization
- This phase often runs in multiple iterations if constraints are not met on the first try
Logic/Gate-Level Optimization
- Mapping
  - Generates a gate-level implementation using tech library
  - Tries to meet timing and area goals
- Delay optimization
  - Tries to fix delay violations from mapping phase.
  - Does not fix design rule violations or meet area constraints.
- Design rule fixing
  - Tries to correct design rule violations
    - Inserting buffers or resizing existing cells
  - If necessary, violates optimization constraints
- Area optimization
  - Tries to meet area constraints, which have lowest priority
Combinational Optimization
Gate-Level Optimization
Boolean Logic-Level Optimizations
[Figure: block diagram with boxes: Verilog Description; TRANSLATION ENGINE; Two-level Logic Functions; OPTIMIZATION ENGINE; Optimized Multi-level Logic Functions; MAPPING ENGINE; Technology Libraries; Technology Implementation]
Logic Optimizations
- Area
  - Number of gates: fewer == smaller
  - Size of gates (# inputs): fewer == smaller
- Delay
  - Number of logic levels: fewer == faster
  - Size of gates (# inputs): fewer == faster
- Note that the examples that follow ignore NOT gates for gate count / levels of circuits
  - This is because many libraries offer gate cells with one or more inputs already inverted.
Logic Optimizations
- Decomposition
- Extraction
- Factoring
- Substitution
- Elimination
- You don’t have to remember the names of these
  - But should understand logic optimization
  - Different techniques targeting area vs. delay
Decomposition
- Find common expressions in a single function
- Reduce redundancy
  - Reduce area (number/size of gates)
- May increase delay
  - More levels of logic
- Define a G(x) cost function to compare expressions
  - G(inverter) = 0
  - G(basic gate) = #inputs to the gate
    - Basic gates: AND, OR, NAND, NOR
  - Based on the concept that the size of a gate is proportional to the number of inputs
Decomposition Example
F = abc + abd + a’c’d’ + b’c’d’
F = ab(c + d) + c’d’(a’ + b’)
F = ab(c + d) + (c + d)’(ab)’

X = ab              1 gate, 1 level
Y = c + d           1 gate, 1 level
F = XY + X’Y’       3 gates, 2 levels
                    (5 gates, 3 levels total)

G(Original) = 16 (four 3-input, one 4-input gates)
G(Decomposed) = 10 (five 2-input gates)
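The decomposition example can be checked by simulation. The testbench below is a sketch (the module and signal names are hypothetical) that compares the original and decomposed forms of F over all 16 input combinations and reports any mismatch.

```verilog
// Hypothetical self-checking testbench for the decomposition example.
module decomposition_check;
    reg a, b, c, d;

    // Original two-level form: four 3-input ANDs, one 4-input OR.
    wire f_orig = (a & b & c) | (a & b & d) | (~a & ~c & ~d) | (~b & ~c & ~d);

    // Decomposed form: five 2-input gates (inverters not counted).
    wire x     = a & b;
    wire y     = c | d;
    wire f_dec = (x & y) | (~x & ~y);

    integer i;
    initial begin
        for (i = 0; i < 16; i = i + 1) begin
            {a, b, c, d} = i;   // low 4 bits of i
            #1;                 // let the combinational logic settle
            if (f_orig !== f_dec)
                $display("MISMATCH at abcd=%b", {a, b, c, d});
        end
        $display("decomposition check finished");
        $finish;
    end
endmodule
```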
Extraction
- Find common sub-expressions between functions
  - Like decomposition, but across more than one function
- Reduce redundancy
  - Reduce area (number/size of gates)
- May increase delay if more logic levels introduced
Extraction Example
F = (a + b)cd + e   3 gates, 3 levels
G = (a + b)e’       2 gates, 2 levels
H = cde             1 gate, 1 level

Common subexpressions: X = a + b, Y = cd   1 gate, 1 level (each)

F = XY + e          4 gates, 3 levels
G = Xe’             2 gates, 2 levels
H = Ye              2 gates, 2 levels

Before: (3) 2-input ORs, (2) 3-input ANDs, (1) 2-input AND
  G(original) = 6 + 6 + 2 = 14
After: (2) 2-input ORs, (4) 2-input ANDs
  G(extracted) = 4 + 8 = 12
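Written as Verilog, the extracted form of the example makes the sharing explicit. This is only a sketch with hypothetical module and port names, and a synthesis tool is free to restructure the logic again.

```verilog
// Hypothetical module showing the extracted form of F, G and H.
module extraction_example (
    input  wire a, b, c, d, e,
    output wire f, g, h
);
    wire x = a | b;           // X = a + b, shared by F and G
    wire y = c & d;           // Y = cd,    shared by F and H

    assign f = (x & y) | e;   // F = XY + e
    assign g = x & ~e;        // G = Xe'
    assign h = y & e;         // H = Ye
endmodule
```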
Factoring
- Traditional two-level logic is sum-of-products
- Sometimes better expressed by product-of-sums
  - Fewer literals => less area
- May increase delay if logic equation not completely factored (becomes multi-level)
Factoring Example
Definitely good:
F = ac + ad + bc + bd   7 gates, 3 levels*
F = (a + b)(c + d)      3 gates, 2 levels

Maybe good:
F = ac + ad + e         3 gates, 2 levels (G=7)
F = a(c + d) + e        3 gates, 3 levels (G=6)
This one might improve area... But will likely increase delay (tradeoff)

*Assuming 2-input gates
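Both factoring examples preserve the function; only the structure changes. A small simulation sketch (the module and signal names are hypothetical) that compares each sum-of-products form with its factored form over all 32 input combinations:

```verilog
// Hypothetical self-checking testbench for the factoring examples.
module factoring_check;
    reg a, b, c, d, e;

    // "Definitely good" example
    wire f1_sop      = (a & c) | (a & d) | (b & c) | (b & d);
    wire f1_factored = (a | b) & (c | d);

    // "Maybe good" example
    wire f2_sop      = (a & c) | (a & d) | e;
    wire f2_factored = (a & (c | d)) | e;

    integer i;
    initial begin
        for (i = 0; i < 32; i = i + 1) begin
            {a, b, c, d, e} = i;   // low 5 bits of i
            #1;
            if (f1_sop !== f1_factored || f2_sop !== f2_factored)
                $display("MISMATCH at abcde=%b", {a, b, c, d, e});
        end
        $display("factoring check finished");
        $finish;
    end
endmodule
```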
Substitution
- Similar to Extraction
  - When one function is a sub-function of another
- Reduce area
  - Fewer gates
- Can increase delay if more logic levels
Substitution Example
G = a + b           1 gate, 1 level
F = a + b + c       1 gate, 1 level

F = G + c           2 gates, 2 levels

Before: (1) 2-input OR, (1) 3-input OR
After: (2) 2-input ORs (better area but increased levels)

With compile_ultra, the sub-expressions do not have to explicitly match, i.e. a + b would still be identified if F = b + c + a
Elimination (Flattening)
- Opposite of previous optimizations
- Goal is to reduce delay
  - Make signals travel through as few logic levels as possible
- But will likely increase area
  - Gate replication / redundant logic
- Can force/disable this step using set_flatten true / set_flatten false
Elimination Example
G = c + d               1 gate, 1 level
F = Ga + G’b            3 gates, 3 levels

G = c + d               1 gate, 1 level
F = ac + ad + bc’d’     4 gates, 2 levels

Before: (2) 2-input ORs, (2) 2-input ANDs
After: (1) 2-input OR, (1) 3-input OR, (2) 2-input ANDs, (1) 3-input AND (worse area, but fewer levels)
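The flattened F in the elimination example follows from substituting G = c + d and G' = c'd' into F = Ga + G'b. A simulation sketch (hypothetical module and signal names) that confirms the two forms agree on all 16 input combinations:

```verilog
// Hypothetical self-checking testbench for the elimination example.
module elimination_check;
    reg a, b, c, d;

    // Multi-level form: F depends on the intermediate signal G.
    wire g       = c | d;
    wire f_multi = (g & a) | (~g & b);

    // Flattened form: G has been eliminated from F.
    wire f_flat  = (a & c) | (a & d) | (b & ~c & ~d);

    integer i;
    initial begin
        for (i = 0; i < 16; i = i + 1) begin
            {a, b, c, d} = i;   // low 4 bits of i
            #1;
            if (f_multi !== f_flat)
                $display("MISMATCH at abcd=%b", {a, b, c, d});
        end
        $display("elimination check finished");
        $finish;
    end
endmodule
```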
compile_ultra Optimizations
- Ultra-high mapping effort, 2-pass compilation
- Automatic hierarchical ungrouping
  - Ungroups small modules before mapping
  - Ungroups critical path based on delay
- Automatic datapath extraction *
  - E.g. carry-save adders, sharing/unsharing
- Boundary optimization
  - Propagates logic across hierarchical boundaries (constants, NC inputs/outputs, NOT)
- Sequential inversion *
  - Sequential elements can have their outputs inverted
Datapath Extraction Optimizations
- Uses carry-save adders where beneficial
  - Carry-propagate adders only when result is needed
Datapath Extraction Optimizations
- Comparator sharing
  - A>B, A=B, A<B use a single subtractor with multiple outputs
- Optimization of parallel constant multipliers
- SOP to POS transformation
- Operand reordering
- Explores trade-offs of common sub-expression sharing and mutually exclusive resource sharing
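Comparator sharing applies to code like the sketch below, where three comparisons use the same operands. The module name and width parameter are hypothetical, and whether the comparisons end up sharing one subtractor depends on the tool and its settings.

```verilog
// Hypothetical module: three comparisons on the same operands,
// which are candidates for sharing one subtractor.
module compare3 #(parameter N = 8) (
    input  wire [N-1:0] a, b,
    output wire         gt, eq, lt
);
    assign gt = (a >  b);
    assign eq = (a == b);
    assign lt = (a <  b);
endmodule
```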
Sharing and Unsharing
- Expression sharing may be overridden later due to timing
    Z1 <= A + B + C
    Z2 <= A + B + D
  - Arrival time is A < B < D < C
Sharing and Unsharing
- Mutually exclusive operations can share resources
    if (SEL) Z = A + B; else Z = C + D;
- When would this kind of sharing be a bad idea?
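The two structures a tool can choose between for the if/else above can be written out by hand. This is a sketch with hypothetical module names and an assumed width N. Note where SEL sits relative to the adder in each version, which is one of the things that matters for timing.

```verilog
// Shared: one adder, with muxes in front of it.
// SEL must settle before the addition can start.
module add_shared #(parameter N = 8) (
    input  wire         sel,
    input  wire [N-1:0] a, b, c, d,
    output wire [N-1:0] z
);
    wire [N-1:0] op1 = sel ? a : c;
    wire [N-1:0] op2 = sel ? b : d;
    assign z = op1 + op2;
endmodule

// Unshared: two adders working in parallel, with one mux after them.
// More area, but SEL is only needed at the final mux.
module add_unshared #(parameter N = 8) (
    input  wire         sel,
    input  wire [N-1:0] a, b, c, d,
    output wire [N-1:0] z
);
    wire [N-1:0] s0 = a + b;
    wire [N-1:0] s1 = c + d;
    assign z = sel ? s0 : s1;
endmodule
```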
Sequential Inversion
- set compile_seqmap_enable_output_inversion true
- Useful if the available flip-flops do not have the same asynchronous input (preset or clear) as required in the design
Register Retiming
- At the HDL level, determining the optimal placement of registers is difficult and tedious at best, or just plain impossible at worst
- The register retiming tool moves registers through the synthesized combinational logic network to improve timing and/or area
  - Equalize delay (i.e. reduce critical path delay by increasing delay in other paths)
  - Reduce the number of flip-flops if timing criteria are met
  - Usually propagate registers forward
- Be aware that this may change the values of some internal signals compared to pre-synthesis.
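A sketch of the kind of description that leaves room for retiming: the input registers have no logic in front of them, while all of the arithmetic sits between them and the output register. The module name, signal names, and widths are hypothetical.

```verilog
// Hypothetical multiply-add with an unbalanced register placement.
module mac_retime #(parameter N = 16) (
    input  wire           clk,
    input  wire [N-1:0]   a, b, c,
    output reg  [2*N-1:0] q
);
    reg [N-1:0] a_r, b_r, c_r;

    always @(posedge clk) begin
        // Stage 1: inputs are registered with no logic before them.
        a_r <= a;
        b_r <= b;
        c_r <= c;
        // Stage 2: the multiplier and adder all sit in one stage.
        // A retiming step may push the stage-1 registers forward
        // into this logic to balance the two stages.
        q <= a_r * b_r + c_r;
    end
endmodule
```

If the registers are moved, signals such as a_r, b_r and c_r may no longer hold the same values (or exist under the same names) in the synthesized netlist, which is the pre- vs. post-synthesis difference noted above.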
Register Retiming Example (1)
Register Retiming Example (2)
DC Topographical Mode
- When optimizing for delay, the synthesis engine is not aware of the net delays, since place-and-route has not yet been performed
  - Delays can be back-annotated and synthesis repeated after place-and-route, until closure is reached
- Layout-aware synthesis attempts to get faster timing closure by predicting the physical design and using that information in synthesis and optimization, particularly with respect to delay
  - Estimates the placement and routing
  - Predicts and uses net capacitances in synthesis and optimization
Further Reading
- There are many more commands out there to give you greater control over the synthesis process if you want it.
- See:
  - Synopsys Online Documentation (SOLD)
  - Design Compiler man pages