Date post: | 24-May-2015 |
Upload: | mujahid-mohammed |
Understanding Clock Tree Synthesis Log Messages
© Synopsys 2012 1
Agenda
• Prerequisites for Clock Tree Synthesis
• Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis
• Clock Tree Synthesis Log Messages
• Clock Tree Optimization Log Messages
Prerequisite 1: Run the check_clock_tree Command
• Run the check_clock_tree command prior to clock tree synthesis, and fix the issues reported
• This command checks the following, and reports issues that can lead to bad QoR:
– Clock tree structure
– Constraints
– Clock tree exceptions
Prerequisite 2: Ensure Placement Legality
• For clock tree synthesis to proceed without any errors, it is necessary to have a legally placed design.
• Use the check_legality command to check whether the design is properly placed and legalized, prior to CTS.
• In case of legality issues, use the legalize_placement command to resolve these issues.
Note:
• Clock tree synthesis will abort in case of placement legality issues.
• In some cases, like overlapping standard cells, it may still proceed and issue a warning during placement legality checking, but continuing with placement legality issues may lead to bad QoR.
Warning: Some cells in the design are not legal. (CTS-242)
Default Constraints
• The default constraints that clock tree synthesis uses are as follows:
– Maximum transition time: 0.5ns
– Maximum capacitance: 0.6pF
– Maximum fanout: 2000
Design Rule Constraints
• In addition to the clock tree design rule constraint values specified using set_clock_tree_options, IC Compiler also considers the design rule constraint values from the logic library and the design.
• The following summary shows how IC Compiler determines the design rule constraint values used during the design rule fixing stage of clock tree synthesis and optimization.
Case 1: Default behavior
– cts_use_lib_max_fanout=false
– cts_use_sdc_max_fanout=false
– cts_force_user_constraints=false
Case 2: Use library and SDC settings for maximum fanout
– cts_use_lib_max_fanout=true
– cts_use_sdc_max_fanout=true
– cts_force_user_constraints=false
Case 3: Use only user-set settings for clock tree synthesis and clock tree optimization
– cts_force_user_constraints=true
Maximum capacitance
– Case 1 and Case 2: the minimum value from set_clock_tree_options, the CTS default value (0.6pF), the logic library, and the SDC constraints
– Case 3: the value set using set_clock_tree_options
Maximum transition time
– Case 1 and Case 2: the minimum value from set_clock_tree_options, the CTS default value (0.5ns), the logic library, and the SDC constraints
– Case 3: the value set using set_clock_tree_options
Maximum fanout
– Case 1: the value set using set_clock_tree_options
– Case 2: the minimum value from the logic library, the SDC constraints, and set_clock_tree_options
– Case 3: the value set using set_clock_tree_options
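The three cases above can be sketched as a small Python model. This is only an illustration of the selection rule as summarized here, not IC Compiler code: the function names are hypothetical, `None` stands for "not set", and treating the two fanout variables independently is an assumption (the summary only shows them both false or both true).

```python
# CTS defaults quoted in the slides: 0.5 ns transition, 0.6 pF capacitance.
CTS_DEFAULT = {"max_transition": 0.5, "max_capacitance": 0.6}


def _min_defined(*values):
    """Return the smallest value that is actually set (None means unset)."""
    defined = [v for v in values if v is not None]
    return min(defined) if defined else None


def effective_limit(kind, user=None, library=None, sdc=None, force_user=False):
    """Max transition or max capacitance used during design rule fixing."""
    if force_user:  # Case 3: only the set_clock_tree_options value counts
        return user
    # Cases 1 and 2: minimum over user value, CTS default, library, and SDC
    return _min_defined(user, CTS_DEFAULT[kind], library, sdc)


def effective_fanout(user=2000, library=None, sdc=None,
                     use_lib=False, use_sdc=False, force_user=False):
    """Max fanout; library/SDC values count only when the variables are true."""
    if force_user:
        return user
    candidates = [user]
    if use_lib:
        candidates.append(library)
    if use_sdc:
        candidates.append(sdc)
    return _min_defined(*candidates)


print(effective_limit("max_transition", user=0.1, sdc=0.05))
print(effective_fanout(library=64, sdc=100, use_lib=True, use_sdc=True))
```

With a user value of 0.1 and an SDC value of 0.05, the default case yields 0.05, which matches the "max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]" log line shown later in this deck.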
Constraints Specified Using the set_clock_tree_options Command
• Library units are used for time and capacitance values specified by using the set_clock_tree_options command
• The smallest values accepted for the -max_capacitance and -max_transition options of the set_clock_tree_options command are 1fF and 1ps respectively.
• For example, if the library units are pF and ps, and you specify the following command, IC Compiler will issue an error:
icc_shell> set_clock_tree_options -max_cap 0.0009 -max_tran 0.300
Error: User max_cap constraint (0.900000 fF) is too small. (CTS-206)
Error: User max_tran constraint (0.300000 ps) is too small. (CTS-207)
– IC Compiler will not accept these small values, and will use the previously specified values or the default values for maximum capacitance and maximum transition, during clock tree synthesis.
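The unit conversion behind these two errors can be checked before running the tool. The following Python sketch is an assumption-laden illustration: the function name and the unit tables are hypothetical, and it only mirrors the 1fF / 1ps floor stated above.

```python
# Conversion factors from library units to fF and ps (illustrative subset).
FF_PER_UNIT = {"pF": 1000.0, "fF": 1.0}
PS_PER_UNIT = {"ns": 1000.0, "ps": 1.0}


def too_small(max_cap, max_tran, cap_unit, time_unit):
    """Return (cap_rejected, tran_rejected) for values given in library units."""
    cap_ff = max_cap * FF_PER_UNIT[cap_unit]
    tran_ps = max_tran * PS_PER_UNIT[time_unit]
    # Values below 1 fF or 1 ps are the ones reported as CTS-206 / CTS-207.
    return cap_ff < 1.0, tran_ps < 1.0


# The example from the slide: library units are pF and ps.
print(too_small(0.0009, 0.300, "pF", "ps"))
```

Here 0.0009 pF is 0.9 fF and 0.300 ps stays 0.3 ps, so both values fall below the floor.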
Enabling Debug Messages
• To enable clock tree synthesis debug messages in IC Compiler, use:
set cts_use_debug_mode true
• Many of the messages discussed in this presentation are available only when you enable the debug mode.
Messages in the compile_clock_tree Command Log
• Before clock tree synthesis:
– Design update
– Buffer and inverter information
– Clock tree constraints
– Clock structure before clock tree synthesis
• During clock tree synthesis:
– Clustering
– Meeting target early delay
– Gate level clock tree synthesis results
• After clock tree synthesis:
– Summary report
– Embedded clock tree optimization
– DRC fixing beyond exceptions
– Placement legalization
Overview of the compile_clock_tree Command Log
START_CMD: compile_clock_tree CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
CTS: CTS Operating Condition(s): MAX(Worst)
Prelude:
START_FUNC: prelude CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
Loading design 'ORCA_TOP'
…
Information: Design Library and main library capacitance units are matched - 1.000 pf.
END_FUNC: prelude CPU: 56 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
…
Extraction related messages:
****************************************************************
Information: TLUPlus based RC computation is enabled. (RCEX-141)
****************************************************************
Information: The distance unit in Capacitance and Resistance is 1 micron. (RCEX-007)
Information: The RC model used is TLU+. (RCEX-015)
…
CTS: Blockage Aware Algorithm
CTS: Marking Ignore Pins....
…
Buffer characterization:
Warning: too small maximum transition (=0.300000) defined at library cell dl02d4. (CTS-619)
CTS: buffer estimated skew target delay driving res input cap
CTS: invbdk [0.009 0.010] [0.043 0.058] [0.197 0.213] [0.059 0.059]
...
CTS: Prepare sources for clock domain SD_DDR_CLK
CTS: Prepare sources for clock domain SDRAM_CLK
CTS: Prepare sources for clock domain SYS_2x_CLK
…
CTS: Region Aware Algorithm is automatically turned off when design has no region or only has one region.
CTS: Info: Found net sys_2x_clk, on cell I_RISC_CORE/I_REG_FILE/REG_FILE_B_RAM is macro. Will not treat as pad.
…
clean drc fixing cell first...
In all, 0 drc fixing cell(s) are cleaned
In all, 0 drc fixing cell(s) beyond exception pins are cleaned
…
…
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_9/S is implicit ignore
…
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_11/S is implicit ignore
…
Warning: Ignore net sd_CK since it has no synchronous pins. (CTS-231)
CTS: Info: will use target transition value for initial CTS stages
Pruning of buffers and inverters:
Pruning library cells (r/f, pwr)
Min drive = 0.000372606.
…
Final pruned buffer set (7 buffers):
bufbd1
…
CTDN lib estimation: buffers should result in better clock power.
CTS: BA: Net 'sdram_clk'
CTS: Starting clock tree synthesis ...
Reporting global clock tree constraints:
CTS: Conditions = worst(1)
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300] GUI = worst[0.300 0.300] SDC = undefined/ignored
…
Information: Removing clock transition on clock PCI_CLK ... (CTS-103)
Clock tree synthesis:
CTS: gate level 1 clock tree synthesis
CTS: clock net = sdram_clk
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
…
Reporting the results of clock tree synthesis:
CTS: Clock tree synthesis completed successfully
CTS: CPU time: 18 seconds
CTS: Reporting clock tree violations ...
…
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
…
Embedded clock tree optimization:
CTS: Starting block level clock tree optimization
…
CTS: gate level 1 clock tree optimization
CTS: clock net = pclk
Gate Upsizing During Clock Tree Synthesis
• The compile_clock_tree command will upsize all the preexisting cells in the clock tree before building the clock tree.
Information: Replaced the library cell of sys_ctl/sunburst_clk_mux_div1/clk_buf from bufbd4 to bufbdf. (CTS-152)
• In the previous example, the preexisting gate is upsized from a bufbd4 to a bufbdf.
• This upsizing helps reduce the number of buffer levels needed to build the clock tree, thereby reducing the buffer count.
Maximum Capacitance and Transition Related Warnings
• Even if the set_clock_tree_options command does not issue any errors when you set the maximum capacitance and transition constraints, the compile_clock_tree command can issue warnings if the values are too small.
Warning: too small maximum transition (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)
Warning: too small maximum capacitance (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)
Warning: too small maximum transition (=0.050000) defined at library cell bufbdk. (CTS-619)
– Max trans = 50ps is too tight for the pin instCLK1GC1/Q
– Max cap = 50fF is too tight for the pin instCLK1GC1/Q
• Tight constraints can cause clock tree synthesis to use an excessive number of buffers to build the clock trees
Buffers and Inverters Used During Clock Tree Synthesis
• Before synthesizing the clock tree, IC Compiler characterizes each buffer and inverter
• To see the characterization details, set the following variable to true:
set cts_do_characterization true
• After characterization is done, characterized values for each buffer and inverter are reported
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
– bufbdf, bufbd7, and bufbd4 are buffers; inv0da is an inverter
– Each bracketed pair lists the rise value followed by the fall value
• Driving resistance determines the drive strength of the buffer or inverter.
• The smaller the driving resistance, the greater the drive strength.
• In the previous example, bufbdf is the buffer with the highest drive strength.
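To compare drive strengths across a long characterization table, the lines can be parsed with a short script. This is a minimal sketch assuming the four-column layout shown above (estimated skew, target delay, driving res, input cap, each as a [rise fall] pair); the function names are hypothetical, and ranking by the larger of the two driving-resistance values is an assumption made for illustration.

```python
import re

# One characterization row: cell name followed by four [rise fall] pairs.
ROW = re.compile(
    r"CTS:\s+(\S+)\s+\[([\d. ]+)\]\s+\[([\d. ]+)\]\s+\[([\d. ]+)\]\s+\[([\d. ]+)\]"
)
COLUMNS = ("estimated_skew", "target_delay", "driving_res", "input_cap")

SAMPLE = """\
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
"""


def parse_characterization(log_text):
    """Map each cell name to its characterized (rise, fall) pairs."""
    cells = {}
    for match in ROW.finditer(log_text):
        pairs = [tuple(float(x) for x in match.group(i).split()) for i in range(2, 6)]
        cells[match.group(1)] = dict(zip(COLUMNS, pairs))
    return cells


def strongest_driver(cells):
    """Cell with the smallest worst-case (rise or fall) driving resistance."""
    return min(cells, key=lambda name: max(cells[name]["driving_res"]))


cells = parse_characterization(SAMPLE)
print(strongest_driver(cells))
```

The header row is skipped automatically because it contains no bracketed pairs.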
Unbalanced Buffers
• Buffers and inverters that have a big difference between their rise and fall delays, which is referred to as the rise/fall delay skew, are reported.
CTS: inverter inv0da: rise/fall delay skew = 0.204816 (> 0.200000)
• Remove unbalanced buffers from the buffer list specified for clock tree synthesis, as they might cause bad skew.
• Use the set_clock_tree_references command to specify the buffers and inverters that should be used for clock tree synthesis
Pruning of Buffers and Inverters
• Pruning is a process by which IC Compiler selects the buffers and inverters which are best suited for clock tree synthesis, based on the buffer and inverter characterization, and prevents the remaining ones from being used.
• IC Compiler prunes the buffers and inverters based on drive strength and power:
Pruning library cells (r/f, pwr)
Min drive = 0.264263.
Pruning inv0d0 because drive of 0.149845 is less than 0.264263.
Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.
• IC Compiler calculates a minimum drive value based on heuristics. Buffers and inverters whose drive strength is less than the minimum drive value are considered weak drivers and are pruned by IC Compiler.
• It is not possible to override the default pruning process
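When many cells are pruned, it can help to tabulate why each one was dropped. The sketch below assumes the two message forms shown above are the only ones of interest; the function name and the reason labels are hypothetical.

```python
import re

WEAK = re.compile(r"Pruning (\S+) because drive of ([\d.]+) is less than ([\d.]+)\.")
INFERIOR = re.compile(
    r"Pruning (\S+) because it is \(w/ power-considered\) inferior to (\S+)\."
)

SAMPLE = """\
Pruning library cells (r/f, pwr)
Min drive = 0.264263.
Pruning inv0d0 because drive of 0.149845 is less than 0.264263.
Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.
"""


def pruning_reasons(log_text):
    """Map each pruned cell to ('weak_drive', drive) or ('inferior_to', cell)."""
    reasons = {}
    for cell, drive, _min_drive in WEAK.findall(log_text):
        reasons[cell] = ("weak_drive", float(drive))
    for cell, better in INFERIOR.findall(log_text):
        reasons[cell] = ("inferior_to", better)
    return reasons


print(pruning_reasons(SAMPLE))
```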
Maximum Transition, Maximum Capacitance and Timing Constraints
Before clock tree synthesis begins, all the global clock tree constraints are reported in the log, in the format shown below:
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]
CTS: max capacitance = worst[0.600 0.600] GUI = worst[0.600 0.600] SDC = undefined/ignored
CTS: max fanout = 2000 GUI = 2000 SDC = undefined/ignored
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.100]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200
• The first value on each design rule line is the value used by CTS
• GUI is the default value or the value set using set_clock_tree_options
• SDC is the value from the SDC constraints
– Undefined means no value is specified in SDC
– Ignored means the value from SDC is ignored as the cts_force_user_constraints variable is set to true
• Clock skew and insertion delay are the skew/insertion delay targets, with values set using the set_clock_tree_options command
Clock Tree Synthesis Target Specifications
• Target specifications are the internal targets for clock tree synthesis, but are not guaranteed. Only target constraints are guaranteed to be achieved
CTS: Global target spec [rise fall]
CTS: transition = worst[0.250 0.250]
CTS: capacitance = worst[0.300 0.300]
CTS: fanout = 32 (This target fanout value is not considered by CTS)
• Target specifications:
– maxTransSpec: Min(0.25, 80% of max_transition constraints)
– maxCapSpec: Min(0.30, 80% of max_capacitance constraints)
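The two formulas can be evaluated directly. This small Python sketch (the function name is hypothetical) reproduces the values seen in the logs: the defaults of 0.5ns and 0.6pF give targets of 0.250 and 0.300, and the 0.300 constraints used in the gate-level examples give 0.240 for both.

```python
def target_spec(max_transition, max_capacitance):
    """Return (maxTransSpec, maxCapSpec) from the design rule constraints.

    Values are rounded to three decimals to mirror how the log prints them.
    """
    max_trans_spec = min(0.25, 0.8 * max_transition)
    max_cap_spec = min(0.30, 0.8 * max_capacitance)
    return round(max_trans_spec, 3), round(max_cap_spec, 3)


print(target_spec(0.5, 0.6))  # CTS default constraints
print(target_spec(0.3, 0.3))  # constraints from the gate level 2 example
```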
Preexisting Clock Tree Information in the Log File
• Before starting to build the clock tree, the preexisting clock tree structure is printed in the log file
CTS: Design infomation
CTS: total gate levels = 8
CTS: Root clock net CLK2
CTS: clock gate levels = 2
CTS: clock sink pins = 4
CTS: level 2: gates = 1
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK2:
CTS: invbdk
CTS: bufbdk
...
CTS: Root clock net CLK1
CTS: clock gate levels = 8
CTS: clock sink pins = 8431
CTS: level 8: gates = 2
CTS: level 7: gates = 3
CTS: level 6: gates = 4
CTS: level 5: gates = 3
CTS: level 4: gates = 1
CTS: level 3: gates = 5
CTS: level 2: gates = 4
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK1:
CTS: invbdk
CTS: bufbdk
...
– total gate levels: maximum number of gate levels available
– clock gate levels: number of gate levels for the clock (2 for clock CLK2)
– clock sink pins: number of sinks
– level N: existing gate levels and number of gates at each level, listed from the flip-flops towards the clock source
Real Gates and Guide Buffers
• You may see the term real gates in the preexisting clock tree structure information section:
CTS: Root clock net CLK1
CTS: clock gate levels = 16
CTS: clock sink pins = 70644
...
CTS: level 13: gates = 14 (real gates = 4)
CTS: level 12: gates = 111 (real gates = 101)
CTS: level 11: gates = 146 (real gates = 136)
CTS: level 10: gates = 2488 (real gates = 2478)
• Real gates are preexisting gates in the clock tree, and are not gates added by the tool
• Guide buffers are buffers or inverters that are inserted by the tool, before it begins to build the tree. They are intended to help clock tree synthesis build a better clock tree
• The number of guide buffers inserted at each level can be determined from the difference between gates and real gates.
– In the above example, the tool has added 10 guide buffers at each of the clock tree levels shown
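The subtraction described above is easy to automate for designs with many levels. A minimal sketch, assuming the "gates = N (real gates = M)" line format shown in this example; the function name is hypothetical.

```python
import re

LEVEL = re.compile(r"level\s+(\d+):\s+gates\s*=\s*(\d+)\s+\(real gates\s*=\s*(\d+)\)")

SAMPLE = """\
CTS: level 13: gates = 14 (real gates = 4)
CTS: level 12: gates = 111 (real gates = 101)
CTS: level 11: gates = 146 (real gates = 136)
CTS: level 10: gates = 2488 (real gates = 2478)
"""


def guide_buffers_per_level(log_text):
    """Map gate level -> number of guide buffers (gates minus real gates)."""
    return {
        int(level): int(gates) - int(real)
        for level, gates, real in LEVEL.findall(log_text)
    }


print(guide_buffers_per_level(SAMPLE))
```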
Buffers and Inverters Used
• Before it begins to build the clock tree, the tool will list all the buffers and inverters it will use to build the tree
CTS: Buffer/Inverter list for CTS for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKBUFX12
CTS: Buffer/Inverter LEQ cell list for Boundary Cell for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
CTS: Buffer/Inverter LEQ cell list for CTO for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
CTS: Buffer/Inverter list for DelayInsertion for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
– CTS uses the first list to build the tree
– CTS uses the second list for inserting boundary cells
– CTO uses the third list for sizing
– CTO uses the fourth list for delay insertion
• You can change the buffer and inverter list by using the following command:
set_clock_tree_references
Clock Tree Synthesis Removes User-Specified Ideal Attributes on Clocks
• Synthesized clocks are set to be propagated, and clock transition, which is an attribute of an ideal clock, is removed
CTS: Information: Removing clock transition on clock SP0XCLK ... (CTS-103)
CTS: Information: Removing clock transition on clock SP0RCLK ... (CTS-103)
• Latency, another attribute of an ideal clock, is also removed
CTS: Information: Removing clock latency on pin Idma_scr_wrap0__Idma_scrba0_m2m0_wrap/I_dma_scrba0_m2m0/ I_dma@ ... (CTS-098)
• Source latency is removed for generated clocks
Information: Removing clock source latency on clock CLK1GC1 ... (CTS-289)
• These messages are informational only, and no action is required
Overlap or Reconvergent Paths
• Overlap or reconvergent paths occur when multiple clocks can drive a node
• IC Compiler issues warnings about such paths
Warning: Either the driven net has been synthesized previously or clock path overlaps/reconverges at pin periph/U1852/Y. (CTS-209)
• Such messages should be treated as informational, rather than as warnings
– IC Compiler has no problems handling such situations
Gate Level-by-Level Clock Tree Synthesis
• Clock tree building is done gate level by gate level, starting from the sinks to the clock root
• For each gate level, just before the synthesis starts, the following information will be printed in the log:
CTS: gate level 2 clock tree synthesis
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
CTS: clock skew = worst[0.000]
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
– The clock net and driving pin lines identify the net and driver at this gate level
Clustering During Clock Tree Synthesis
• The clock tree building starts with clustering. Clustering is the process of dividing a set of sink pins (fanouts) into groups. Each group is driven by a buffer. The instances of a cluster are all close to each other
• The following message says that 423 sink pins are divided into 27 clusters, each with approximately 423/27 sink pins
CTS: gate level 2 clock tree synthesis
...
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
...
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 423 to 27 clustering
CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 27 to 4 clustering
CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
CTS: -----------------------------------------------
– Completed 423 to 27 clustering: the counts before clustering and after clustering
– One buffer level is added with each clustering
– skew (a, b): skew before clustering and after clustering
– viol(cap trans): represents the DRCs; y means a violation is present, n means no violation
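The clustering messages can be reduced to one record per buffer level. The following Python sketch pairs each "Completed N to M clustering" line with the "BA:" line that follows it; the function name and dictionary keys are hypothetical, and the (cap, trans) order of the viol flags follows the annotation above.

```python
import re

STEP = re.compile(
    r"Completed (\d+) to (\d+) clustering\s+"
    r"CTS: BA: lp \([^)]*\): skew \(([\d.]+), ([\d.]+)\) c\([^)]*\) viol\((\w) (\w)\)"
)

SAMPLE = """\
CTS: Completed 423 to 27 clustering
CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 27 to 4 clustering
CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
"""


def clustering_steps(log_text):
    """One dict per clustering pass, i.e. per added buffer level."""
    steps = []
    for before, after, skew_before, skew_after, cap, trans in STEP.findall(log_text):
        steps.append({
            "before": int(before),
            "after": int(after),
            "avg_cluster_size": int(before) / int(after),
            "skew_before": float(skew_before),
            "skew_after": float(skew_after),
            "cap_violation": cap == "y",
            "trans_violation": trans == "y",
        })
    return steps


for step in clustering_steps(SAMPLE):
    print(step)
```

For the first pass this gives an average of about 15.7 sinks per cluster (423/27) with a transition violation still flagged; the second pass clears it.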
Clustering With Hookup Pins
• Hookup pins are input pins of gates or macros
• Unlike clock pins of flip-flops and latches (sink pins), hookup pins have a nonzero phase delay that must be balanced with the sink pins
Clustering With Hookup Pins
• Initially, the tool attempts to cluster hookup pins along with the normal sinks (trial clustering)
• In this example, there are 479 sinks and 1 hookup pin
CTS: gate level 1 clock tree synthesis
...
CTS: gate level 1 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 1 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.150 0.150]
CTS: fanout = 32
CTS: gate level 1 timing constraints
...
CTS: -----------------------------------------------
Trial clustering:
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 480 to 34 clustering
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 34 to 6 clustering
CTS: BA: this delay [max min] (skew) = worst[0.000 0.000] (0.000)
CTS: BA: next delay [max min] (skew) = worst[0.124 0.124] (0.000)
CTS: BA: target cap = 0.070 pf
Actual clustering:
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: BA: CAC set: target cap = 0.070317: targetWireCap = 0.274866
CTS: Completed 479 to 39 clustering
CTS: BA: lp (1.574, 0.770): skew (0.821, 0.451) c(1.737, 0.269) viol(n y)
CTS: -----------------------------------------------
• At the trial clustering stage, the hookup pin is considered along with the other sink pins and (479+1) to 34 to 6 clustering is obtained
• At the actual clustering stage, the tool clusters the 479 sink pins separately from the hookup pin
Clustering With Hookup Pins: Hookup Pin Clustered With Sinks
• If the trial clustering gives good QoR results, the log reports that the hookup pin is selected for the next level:
CTS: BA: lp (1.968, 2.031): skew (0.257, 0.194) c(0.076, 0.072) viol(y y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 2 clustering
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 1 clustering
CTS: BA: this delay [max min] (skew) = worst[2.040 1.844] (0.196)
CTS: BA: next delay [max min] (skew) = worst[2.161 1.965] (0.196)
CTS: BA: target cap = 0.048 pf
CTS: Pin 1: periph/U5659/A is selected for next level
CTS: delay [max min] (skew) = worst[1.976 1.921] (0.055)
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: Completed 2 to 2 clustering
CTS: BA: lp (2.031, 2.153): skew (0.194, 0.210) c(0.072, 0.026) viol(n n)
CTS: -----------------------------------------------
• When the phase delay of the hookup pin periph/U5659/A matches the delay of the already built tree at that gate level, it will be clustered at that buffer level.
Meeting Target Early Delay
• After the synthesis of the root clock net (gate level 1 synthesis), the tool checks whether the delay constraint set by the user is met.
• If it is not met, the tool inserts buffers at the root clock net to achieve the target delay specified by the user.
• In the following message, 16 buffers are inserted at the root clock net to increase the delay from 0.569ns to 2ns, which is the user-specified target.
CTS: gate level 1 clock tree synthesis
CTS: clock net = sys_clk
CTS: driving pin = sys_clk
CTS: gate level 1 design rule constraints [rise fall]
...
CTS: gate level 1 target spec [rise fall]
...
CTS: gate level 1 timing constraints
CTS: clock skew = worst[0.000]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
...
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
CTS: Completed 19 to 2 clustering
CTS: BA: lp (0.563, 0.569): skew (0.142, 0.112) c(0.008, 0.008) viol(n n)
CTS: -----------------------------------------------
CTS: Inserting delay cells for clock tree sys_clk ...
CTS: current delay = worst[0.569] worst[0.457]
CTS: constraint = worst[2.000] worst[0.000]
CTS: inserted 16 (buffd3) delay cells to the clock net sys_clk
– insertion delay = worst[2.000] is the constraint set by the user
Synthesis Results of One Gate Level
• After the synthesis of a gate level, the results are printed in the log
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
CTS: driving pin: sdram_clk
CTS: load pins : 5 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: bufbd7 (1)
CTS: buffer level 2: bufbd7 (1)
CTS: clock tree skew = worst[0.036]
CTS: longest path delay = worst[0.327](rise)
CTS: shortest path delay = worst[0.291](rise)
CTS: total capacitance = worst[0.389 0.389]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.293](rise), worst[0.256](rise); skew = worst[0.036]
CTS: (O): worst[0.151](rise), worst[0.129](rise); skew = worst[0.022]
CTS: 2 (I): worst[0.150](rise), worst[0.128](rise); skew = worst[0.022]
CTS: (O): worst[0.004](rise), worst[0.000](rise); skew = worst[0.004]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: load 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: level 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: load 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: level 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: load 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.045 0.045]
CTS: level 1: worst[0.093 0.093]
CTS: level 2: worst[0.251 0.251]
CTS: drc violations: 0 0
– clock tree skew, longest path delay, and shortest path delay: skew and insertion delay at the driving pin (here sdram_clk)
– worst: the operating condition
– The load capacitance values of the buffer levels are added and reported as the total capacitance of the subtree
– drc violations: number of cap violations, followed by number of trans violations
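The relationship between the per-level load capacitances and the reported total capacitance can be verified from the log itself. This sketch assumes the line formats of the example above and reads only the first (rise) value of each pair; the function name is hypothetical.

```python
import re

SAMPLE = """\
CTS: total capacitance = worst[0.389 0.389]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.045 0.045]
CTS: level 1: worst[0.093 0.093]
CTS: level 2: worst[0.251 0.251]
CTS: drc violations: 0 0
"""


def subtree_capacitance(log_text):
    """Return (reported_total, per_level_loads) using the rise values."""
    total = None
    loads = []
    in_cap_section = False
    for line in log_text.splitlines():
        line = line.strip()
        if "total capacitance" in line:
            total = float(re.search(r"worst\[([\d.]+)", line).group(1))
        elif "buffer level total load capacitance" in line:
            in_cap_section = True
        elif in_cap_section:
            match = re.match(r"CTS:\s+level \d+: worst\[([\d.]+)", line)
            if match:
                loads.append(float(match.group(1)))
            else:
                in_cap_section = False  # first non-level line ends the section
    return total, loads


total, loads = subtree_capacitance(SAMPLE)
print(total, round(sum(loads), 3))
```

For this example, 0.045 + 0.093 + 0.251 adds up to the reported 0.389.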
Maximum Transition and Capacitance Violations
• After each gate level is synthesized, the maximum capacitance and maximum transition violations at that gate level are reported
CTS: gate level 3 clock tree synthesis results
...
CTS: buffer level total load capacitance
...
CTS: capacitance violation on periph/CTS_755
CTS: capacitance = worst[0.052 0.052]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on periph/CTS_757
CTS: capacitance = worst[0.051 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: transition delay violation at periph/CLKBUFX20_G3B1I3/A
CTS: transition delay = worst[0.052 0.050] worst[0.052 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at periph/CLKBUFX20_G3B2I14/A
CTS: transition delay = worst[0.053 0.051] worst[0.053 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: drc violations: 18 5
– drc violations: number of cap violations (18), followed by number of trans violations (5)
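A quick way to triage a gate level is to extract the violation counts and the violating nets and pins. This is a minimal sketch, assuming the message wording shown above; the function names are hypothetical.

```python
import re

SAMPLE = """\
CTS: capacitance violation on periph/CTS_755
CTS: capacitance = worst[0.052 0.052]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on periph/CTS_757
CTS: transition delay violation at periph/CLKBUFX20_G3B1I3/A
CTS: transition delay = worst[0.052 0.050] worst[0.052 0.050]
CTS: drc violations: 18 5
"""


def drc_violation_counts(log_text):
    """Return (cap_violations, trans_violations) from the summary line."""
    match = re.search(r"drc violations:\s*(\d+)\s+(\d+)", log_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def violating_objects(log_text):
    """Return (nets with cap violations, pins with transition violations)."""
    nets = re.findall(r"capacitance violation on (\S+)", log_text)
    pins = re.findall(r"transition delay violation at (\S+)", log_text)
    return nets, pins


print(drc_violation_counts(SAMPLE))
print(violating_objects(SAMPLE))
```

Because the log excerpt is elided, the listed objects are fewer than the summary counts; on a complete log the two should agree.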
A More Complex Synthesis Result
CTS: gate level 1 clock tree synthesis results
CTS: clock net : clk
CTS: driving pin: clk
CTS: load pins : 80 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: CLKBUFX20 (1)
CTS: buffer level 2: CLKBUFX20 (2) CLKBUFX12 (1)
CTS: clock tree skew = worst[0.001]
CTS: longest path delay = worst[0.248](rise)
CTS: shortest path delay = worst[0.246](rise)
CTS: total capacitance = worst[0.549 0.549]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.247](rise), worst[0.246](rise); skew = worst[0.001]
CTS: (O): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: 2 (I): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: (O): worst[0.001](rise), worst[0.000](rise); skew = worst[0.001]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: load 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: level 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: load 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: level 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: load 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.038 0.038]
CTS: level 1: worst[0.108 0.108]
CTS: level 2: worst[0.403 0.403]
CTS: drc violations: 0 0
Gate Level and Buffer Level Nomenclature
[Figure: clock tree diagram labeling the gate levels, with gate level 1 at the clock source pin, and the buffer levels within each gate level (for example, buffer level 1 of gate level 2)]
• Red: preexisting gates
• Black: CTS-introduced gates
• At each gate level, the clock tree is built bottom-up, but the buffer names are changed to appear top-down
DRC Violation Report After Synthesis
• After building the complete clock tree, all the remaining DRC violations in the entire clock tree are reported in the log file:
CTS: Clock tree synthesis completed successfully
CTS: CPU time: 50 seconds
CTS: Reporting clock tree violations ...
CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z
CTS: transition delay = worst[0.051 0.050] worst[0.051 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on CTS_6557
CTS: capacitance = worst[0.074 0.074]
CTS: constraint = worst[0.050 0.050]
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 2
CTS: Total number of capacitance violations = 1
• The "Global design rules" lines and the "constraint" lines show the constraints.
• Only transition and capacitance violations are reported.
• The summary lines give the total number of transition and capacitance violations.
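The violation totals at the end of this report are easy to pull out of a log for regression tracking. The following is a minimal Python sketch, not part of IC Compiler: the function name `violation_totals` and the `SAMPLE_LOG` string are hypothetical, and the pattern assumes the summary lines keep the exact wording shown in the excerpt above.

```python
import re

# Excerpt copied from the violation summary shown on the slide.
SAMPLE_LOG = """\
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 2
CTS: Total number of capacitance violations = 1
"""

# Matches lines such as "CTS: Total number of transition violations = 2".
TOTAL_RE = re.compile(r"CTS:\s+Total number of (\w+) violations\s*=\s*(\d+)")


def violation_totals(log_text):
    """Return a dict mapping violation kind to its reported total."""
    return {kind: int(count) for kind, count in TOTAL_RE.findall(log_text)}


if __name__ == "__main__":
    print(violation_totals(SAMPLE_LOG))
```

If a log contains several such summaries (for example one after synthesis and one after embedded optimization), this sketch keeps only the last total for each kind.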
Summary Report After Clock Tree Synthesis
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
CTS: 5 clock domain synthesized
CTS: 30 gated clock nets synthesized
CTS: 26 buffer trees inserted
CTS: 722 buffers used (total size = 45974.2)
CTS: 752 clock nets total capacitance = worst[76.868 76.868]
• Each gate level can have multiple nets.
Clock-by-Clock Summary
• A summary is reported for each clock:
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net pclk
CTS: 3 gated clock nets synthesized
CTS: 2 buffer trees inserted
CTS: 2 buffers used (total size = 159.667)
CTS: 5 clock nets total capacitance = worst[0.514 0.514]
CTS: clock tree skew = worst[0.341]
CTS: longest path delay = worst[5.959](rise)
CTS: shortest path delay = worst[5.619](rise)
CTS: Root clock net sys_clk
...
• A buffer tree is inserted only if necessary.
Embedded Clock Tree Optimization
• After clock tree synthesis, embedded clock tree optimization begins.
• The characteristics of the buffers and inverters used are reported again:
CTS: buffer  estimated skew  target delay   driving res    input cap
CTS: bufbdf  [0.013 0.015]   [0.217 0.200]  [0.210 0.248]  [0.007 0.007]
CTS: inv0da  [0.018 0.021]   [0.097 0.119]  [0.294 0.347]  [0.036 0.036]
...
• The global constraints for the clock tree are also reported again:
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.050 0.050] SDC = undefined/ignored
...
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.000]
...
CTS: Global target spec [rise fall]
CTS: transition = worst[0.040 0.040]
...
Note: Embedded clock tree optimization is called only when the compile_clock_tree command is used. It is not called when the clock_opt command is used.
More Messages on Real Gates and Guide Buffers
• At the beginning of optimization, you might get the following messages:
CTS: Root clock net chip_sclk_src
CTS: clock gate levels = 75
CTS: clock sink pins = 125896
...
CTS: level 73: gates = 3 (real gates = 1)
CTS: level 72: gates = 2 (no real gates, guide buffers only)
• All the gates other than real gates are guide buffers and inverters inserted during clock tree synthesis.
• This information is similar to the information printed prior to clock tree synthesis.
Gate Level Optimization
• Clock tree optimization is also done for each gate level, similar to when the clock tree is built.
• Before optimizing a gate level, the tool reports the current skew, longest path delay, and shortest path delay from the driving pin of that gate level:
CTS: gate level 2 clock tree optimization
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: clock tree skew = worst[0.517]
CTS: longest path delay = worst[5.339](rise)
CTS: shortest path delay = worst[4.822](fall)
• That gate level is then optimized.
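In the excerpt for gate level 2, the reported skew equals the longest path delay minus the shortest path delay (5.339 - 4.822 = 0.517). The Python sketch below checks that relationship on the sample text. The helper name `worst_value` is hypothetical, and the tolerance is an assumption that allows for the three-decimal rounding of printed values.

```python
import re

# Excerpt copied from the gate level 2 report on the slide.
SAMPLE_LOG = """\
CTS: gate level 2 clock tree optimization
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: clock tree skew = worst[0.517]
CTS: longest path delay = worst[5.339](rise)
CTS: shortest path delay = worst[4.822](fall)
"""


def worst_value(log_text, label):
    """Return the number inside worst[...] on the line that carries `label`."""
    match = re.search(re.escape(label) + r"\s*=\s*worst\[([-\d.]+)\]", log_text)
    if match is None:
        raise ValueError("label not found in log: " + label)
    return float(match.group(1))


skew = worst_value(SAMPLE_LOG, "clock tree skew")
longest = worst_value(SAMPLE_LOG, "longest path delay")
shortest = worst_value(SAMPLE_LOG, "shortest path delay")

# Printed values are rounded to 3 decimals, so compare with a small tolerance.
skew_is_consistent = abs((longest - shortest) - skew) <= 0.0015

if __name__ == "__main__":
    print(longest - shortest, skew, skew_is_consistent)
```

Other excerpts in this deck differ in the last digit (for example 5.959 - 5.619 versus a reported 0.341), which is why the comparison uses a tolerance instead of exact equality.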
Buffer Sizing
• The following message indicates that buffer sizing was successful:
CTO-BS: Starting buffer sizing ...
Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
CTO-BS: CPU time = 0 seconds for buffer sizing
• Clock tree optimization tries to resize buffers to improve skew and insertion delay. If resizing is not beneficial, the original cell master is restored:
CTO-BS: Starting buffer sizing ...
CTO-BS: Restoring original cellMaster <CLKBUF_X20> of <CLKBUF_X20_G2B2I4>
CTO-BS: CPU time = 1 seconds for buffer sizing
Gate Sizing
CTO-GS: Starting gate sizing ...
Information: Replaced the library cell of I7188625 from TLQMUX2X60 to TULQMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7586451 from TLTMUX2X60 to TLTMUX2X50. (CTS-152)
Information: Replaced the library cell of I3342873 from TULTMUX2X50 to TLTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2X80 to TULTMUX2ZSX80. (CTS-152)
...
Information: Replaced the library cell of I6717862 from THQMUX2ZSX80 to TSTMUX2ZSX20. (CTS-152)
Information: Replaced the library cell of I9359863 from TLTMUX2ZSX80 to TULTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I10258160 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7636259 from TLTMUX2ZFFX80 to TULTMUX2ZSX60. (CTS-152)
CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
Information: Replaced the library cell of I2130284 from TLTMUX2X80 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8618764 from TLTMUX2ZFFX80 to TLTMUX2X80. (CTS-152)
Information: Replaced the library cell of I1749911 from TULTMUX2ZFFFX80 to TULTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I3342873 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8872989 from TULTMUX2ZFFFX60 to TLTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2ZSX80 to TULTMUX2X50. (CTS-152)
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.000%]
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
CTO-GS: CPU time = 2413 seconds for gate sizing
• The block after the first group of replacements is the summary of the first round of sizing (14 cells sized).
• Each round summary shows the number of gates sized (here, 14 out of 40 gates) and the improvement in skew.
• The final block is the overall summary of gate sizing done at this gate level: a total of 14 + 6 = 20 gates sized, giving a 0.106% improvement in skew at this gate level.
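The per-round counts and the overall summary can be cross-checked mechanically. The sketch below is a hypothetical helper, not a tool feature: it assumes round lines have the form "CTO-GS: N: Sized a/b cell instances" and the overall line has the form "CTO-GS: Sized a/b cell instances", as in the excerpt for this gate level.

```python
import re

# Round and summary lines copied from the gate sizing excerpt.
SAMPLE_LOG = """\
CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: improvement = worst[0.106%]
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: improvement = worst[0.000%]
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
CTO-GS: improvement = worst[0.106%]
"""

# Per-round line: has a round number between "CTO-GS:" and "Sized".
ROUND_RE = re.compile(r"CTO-GS:\s+(\d+):\s+Sized (\d+)/(\d+) cell instances")
# Overall line: "Sized" follows "CTO-GS:" directly, so round lines do not match.
TOTAL_RE = re.compile(r"CTO-GS:\s+Sized (\d+)/(\d+) cell instances")


def sizing_counts(log_text):
    """Return (cells sized per round, overall sized count or None)."""
    per_round = [int(sized) for _num, sized, _cand in ROUND_RE.findall(log_text)]
    total = TOTAL_RE.search(log_text)
    return per_round, (int(total.group(1)) if total else None)


if __name__ == "__main__":
    rounds, overall = sizing_counts(SAMPLE_LOG)
    print(rounds, overall, sum(rounds) == overall)
```

On the sample, the two rounds (14 and 6) add up to the 20 cells in the overall summary, matching the 14 + 6 = 20 annotation on the slide.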
Gate Relocation
• Gate relocation works on preexisting gates.
• If you have no preexisting gates, you might see the following message:
CTO-GR: gate relocation is skipped since there are no hookup pins
A Successful Gate Relocation
CTO-GR: Starting gate relocation ...
CTO-GR: delay [max min] (skew) = worst[9.023 8.563] (0.460)
CTO-GR: 1: Relocated 1/40 cell instances (tested 2 cell instances at 47 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: improvement = worst[0.000%]
CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
CTO-GR: 2: Relocated 2/40 cell instances (tested 5 cell instances at 83 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 3/40 cell instances (tested 7 cell instances at 130 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: CPU time = 2 seconds for gate relocation
• In the first round, 2 cells were tried at 47 new locations, and 1 was moved.
• In each round, the delay (from) line shows the initial skew, the delay (to) line shows the final skew, and the improvement line shows the improvement in skew.
• The final block is the overall summary of gate relocation at this gate level.
Gate Relocation: Failed Attempts
CTO-GR: Starting gate relocation ...
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 0/1 cell instances (tested 1 cell instances at 24 points)
CTO-GR: delay (from) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: delay (to) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: improvement = worst[0.000%]
CTO-GR: CPU time = 0 seconds for gate relocation
• In this example, clock tree optimization tried to move one gate instance to 24 different locations. Since the attempts did not improve the QoR, the gate relocation was abandoned
Buffer Relocation
• Buffer relocation is done on all buffers inserted by clock tree synthesis:
CTO-BR: Buffer relocation ...
CTO-BR: Optimization level: net
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 1: Relocated 1/6 cell instances (tested 6 cell instances at 74 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 2: Relocated 1/6 cell instances (tested 5 cell instances at 62 points)
CTO-BR: delay (from) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[0.000%]
CTO-BR: Summary of cell relocation
CTO-BR: Relocated 2/6 cell instances (tested 11 cell instances at 136 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.099] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: CPU time = 0 seconds for buffer relocation
• The information is similar to that reported for gate relocation.
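The improvement figure in the buffer relocation sample is consistent with a relative skew reduction, (skew from - skew to) / skew from. This is an inference from the printed numbers, not a documented formula, and the function name below is hypothetical. A short Python sketch of that arithmetic:

```python
def skew_improvement_percent(skew_from, skew_to):
    """Relative skew reduction in percent (assumed meaning of 'improvement')."""
    if skew_from == 0:
        raise ValueError("initial skew must be nonzero")
    return (skew_from - skew_to) / skew_from * 100.0


# Values taken from the buffer relocation summary: skew 0.596 -> 0.584.
improvement = skew_improvement_percent(0.596, 0.584)

if __name__ == "__main__":
    print(round(improvement, 3))
```

For the buffer relocation summary this reproduces the reported 2.013% to three decimals. The gate relocation sample (0.460 to 0.455) gives roughly 1.09% against a reported 1.118%, presumably because the tool computes the figure from unrounded skew values, so treat results derived from printed numbers as approximate.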
Post Embedded Clock Tree Synthesis
• After the embedded clock tree optimization, the tool prints the summary.
• It looks the same as the summary printed after clock tree synthesis.
CTS: ------------------------------------------------
CTS: Clock Tree Optimization Summary
CTS: ------------------------------------------------
CTS: 4 clock domain synthesized
CTS: 5 gated clock nets synthesized
CTS: 5 buffer trees inserted
CTS: 1000 buffers used (total size = 16570.8)
CTS: 1005 clock nets total capacitance = worst[14.010 14.010]
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net sdram_clk
CTS: 1 gated clock nets synthesized
CTS: 1 buffer trees inserted
CTS: 302 buffers used (total size = 5039.47)
CTS: 303 clock nets total capacitance = worst[4.170 4.170]
CTS: clock tree skew = worst[0.035]
CTS: longest path delay = worst[2.041](rise)
CTS: shortest path delay = worst[2.006](fall)
CTS: Root clock net sys_2x_clk
...
• After the summary, all the transition and capacitance violations on the clock tree are also reported.
CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at buffd2_G1B1I1/Z
...
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 3994
CTS: Total number of capacitance violations = 1
DRC Fixing Beyond Exceptions
• After embedded clock tree optimization, the tool will start fixing the DRC violations beyond exceptions.
• The messages are similar to clustering:
CTS: fixing DRC beyond exception pins under clock CLK1
CTS: gate level 2 DRC fixing (exception level 1)
CTS: clock net = CLK1_G1IP
CTS: driving pin = bufbd2_G1IP_1/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.100 0.100]
CTS: max capacitance = worst[0.600 0.600]
CTS: max fanout = 2000
CTS: -----------------------------------------------
CTS: Starting clustering for bufbdf with target load = worst[0.056 0.056]
CTS: Completed 4 to 1 clustering
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.050 0.050]
CTS: Completed 1 to 1 clustering
CTS: ------------------------------------------------
• After fixing the DRC violations, the whole summary and the clock-by-clock summary of DRC fixing beyond exceptions are reported.
Placement Legalization is Called After Clock Tree Synthesis
• When clock tree synthesis places a clock tree buffer or inverter, it places it at a legal location, but the location might be occupied. This causes overlaps, which need to be resolved.
• The tool calls the placement legalizer, which moves the cells to resolve the overlaps.
• After legalization, the cells with large displacement are reported in the log:
Largest displacement cells:
Cell: periph/U122 (AND3X)
Input location: (906.380 1597.520)
Legal location: (897.140 1582.400)
Displacement: 17.720 um, e.g. 3.52 row height.
Total 6 cells has large displacement (e.g. > 15.120 um or 3 row height)
• The cell shown is 1 of the 6 cells that were displaced.
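The displacement reported for periph/U122 matches the straight-line (Euclidean) distance between the input and legal locations, and the row-height figure matches a row height of 15.120 / 3 = 5.04 um taken from the threshold line. Both are inferences from this one sample, not documented behavior. A minimal Python sketch of the arithmetic, with a hypothetical function name:

```python
import math

# Row height inferred from "> 15.120 um or 3 row height" in the sample log.
ROW_HEIGHT_UM = 15.120 / 3


def displacement_um(input_xy, legal_xy):
    """Euclidean distance between the input and the legalized location."""
    return math.hypot(legal_xy[0] - input_xy[0], legal_xy[1] - input_xy[1])


# Locations copied from the periph/U122 entry.
dist = displacement_um((906.380, 1597.520), (897.140, 1582.400))
rows = dist / ROW_HEIGHT_UM

if __name__ == "__main__":
    print(round(dist, 3), round(rows, 2))
```

Here the x and y moves are 9.24 um and 15.12 um, which combine to about 17.72 um, or about 3.52 row heights.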
Agenda
• Prerequisites for Clock Tree Synthesis
• Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis
• Clock Tree Synthesis Log Messages
• Clock Tree Optimization Log Messages
The optimize_clock_tree Command Log File Messages
• Optimization options
• Report before optimization
• Optimization
• Report after optimization
Standalone Optimization Using the optimize_clock_tree Command
• Standalone optimization differs from embedded optimization in the algorithms used.
• Some of the log messages are similar to those produced by the compile_clock_tree command:
  • Design update information
  • Buffer characterization
  • Pruning of cells
  • List of cells used for clock tree optimization
CTS-352 Warning
• The default delay calculation engine is Elmore. Elmore delay calculation might lead to inferior accuracy in skew and latency estimation.
• Enable the Arnoldi delay calculation engine for more accurate delay calculation during optimization by using the following command:
set_delay_calculation -clock_arnoldi
• Otherwise, the optimize_clock_tree command issues the following warning:
Warning: set_delay_calculation is currently set to 'elmore'.
         'clock_arnoldi' is suggested. (CTS-352)
Optimization Options
• Before starting optimization, the optimize_clock_tree command reports the root pin and the optimization options for each clock.
• The following are the options that you have specified by using the set_clock_tree_optimization_options command:
Initializing parameters for clock CLK2GC:
Root pin: instCLK2GC/Q
Using the following optimization options:
gate sizing : on
gate relocation : on
preserve levels : off
area recovery : on
relax insertion delay : off
balance rc : off
Preoptimization Report
• Before the tool begins to optimize the clock tree, it reports some of the current characteristics of the clock tree:
*****************************************
* Preoptimization report (clock 'CLK3') *
*****************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.005 0.000 0.005)
Estimated Insertion Delay (r/f/b) = (0.008 -inf 0.008)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.448 ns
Cells = 24 (area=67.500000)
Buffers = 23 (area=67.500000)
Buffer Types
============
bufbd2: 1
bufbdf: 8
bufbd7: 5
bufbd4: 3
bufbd1: 6
• The banner shows the clock name, and each "Corner" line shows a CTS corner.
• The Estimated Skew and Estimated Insertion Delay lines show the starting skew and insertion delay (ID) for the clock as seen by CTO.
• Max transition is the maximum transition value present in the clock tree.
• The Cells, Buffers, and Buffer Types lines give information about the buffers and inverters present in the clock tree.
Optimization Messages
• During optimization, the tool prints out messages for sizing, insertion and removal, and switching of metal layers:
Deleting cell I_SDRAM_TOP/bufbda_G1B1I10 and output net I_SDRAM_TOP/sdram_clk_G1B1I10.
iteration 1: (0.314104, 3.328620)
Total 1 buffers removed on clock CLK3
Start (3.256, 3.527), End (3.015, 3.329)
....
iteration 2: (0.313991, 3.314841)
iteration 3: (0.308073, 3.295621)
Total 2 cells sized on clock CLK3
Start (3.015, 3.329), End (2.988, 3.296)
....
iteration 6: (0.305181, 3.275623)
Total 1 delay buffers added on clock sck_in12 (LP)
Start (2.975, 3.283), End (2.970, 3.276)
....
Switch to low metal layer for clock 'CLK3':
Total 9 out of 13 nets switched to low metal layer for clock 'CLK3' with largest cap change 0.00 percent
• The four groups of messages correspond to buffer removal, cell sizing, buffer insertion, and metal layer switching.
• Each iteration line shows (skew, ID).
• Start (sp, lp) shows the initial delays and End (sp, lp) shows the final delays, where sp is the shortest path delay and lp is the longest path delay.
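Because Start and End give (shortest path delay, longest path delay), the skew before and after a step can be recovered as lp - sp, and the End values line up with the (skew, ID) pair on the preceding iteration line. The Python sketch below works through the buffer removal sample. The function name and the returned field names are hypothetical.

```python
import re

# Lines copied from the buffer removal step in the sample log.
ITERATION_LINE = "iteration 1: (0.314104, 3.328620)"
START_END_LINE = "Start (3.256, 3.527), End (3.015, 3.329)"


def parse_start_end(line):
    """Return skew (lp - sp) and insertion delay (lp) at Start and End."""
    numbers = [float(n) for n in re.findall(r"\d+\.\d+", line)]
    if len(numbers) != 4:
        raise ValueError("expected four delay values: " + line)
    start_sp, start_lp, end_sp, end_lp = numbers
    return {
        "start_skew": start_lp - start_sp,
        "start_id": start_lp,
        "end_skew": end_lp - end_sp,
        "end_id": end_lp,
    }


result = parse_start_end(START_END_LINE)
iter_skew, iter_id = (float(n) for n in re.findall(r"\d+\.\d+", ITERATION_LINE))

if __name__ == "__main__":
    print(result)
    print(iter_skew, iter_id)
```

On this sample the End skew (3.329 - 3.015 = 0.314) and End ID (3.329) agree with iteration 1 to three decimals. Later steps can differ in the last digit because the Start and End values are printed rounded.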
Optimization Messages
• If the area recovery option is enabled, the tool does area recovery after optimizing each clock, and reports the changes made to that clock:
Area recovery optimization for clock 'CLK3':
15% 23% 30% 46% 53% 61% 76% 84% 92% 100%
Deleting cell cell I_SDRAM_TOP/bufbda_G1B1I9 and output net I_SDRAM_TOP/sdram_clk_G1B1I9.
Total 1 buffers removed (all paths) for clock 'CLK3'
Post Optimization Report
• After completing the optimization of a clock, the tool reports the new characteristics of the clock tree.
• This is similar to the information printed before optimization:
**************************************************
* Multicorner optimization report (clock 'CLK3') *
**************************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.007 0.000 0.007)
Estimated Insertion Delay (r/f/b) = (0.009 -inf 0.009)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.356 ns
Cells = 24 (area=59.000000)
Buffers = 23 (area=59.000000)
Buffer Types
============
bufbd7: 4
bufbdf: 6
bufbd4: 5
bufbd1: 7
bufbd2: 1
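Comparing the report printed before optimization with the one printed after it shows what the optimization achieved for a clock. The sketch below is a hypothetical Python helper that extracts a few fields for the first corner listed ('max' in the samples) and prints the change. The field names are assumptions, and the patterns assume the report wording shown for clock CLK3.

```python
import re

# Lines for corner 'max' copied from the two CLK3 reports.
PRE_REPORT = """\
Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Max transition = 0.448 ns
Buffers = 23 (area=67.500000)
"""
POST_REPORT = """\
Corner 'max'
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
Max transition = 0.356 ns
Buffers = 23 (area=59.000000)
"""

# Each pattern captures one number; for (r/f/b) triples it takes the third value.
FIELDS = {
    "skew": r"Estimated Skew \(r/f/b\) = \(\S+ \S+ (\S+)\)",
    "insertion_delay": r"Estimated Insertion Delay \(r/f/b\) = \(\S+ \S+ (\S+)\)",
    "max_transition": r"Max transition = (\S+) ns",
    "buffer_area": r"Buffers = \d+ \(area=([\d.]+)\)",
}


def parse_report(text):
    """Return the first occurrence of each field as a float."""
    values = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError("field not found: " + name)
        values[name] = float(match.group(1))
    return values


pre = parse_report(PRE_REPORT)
post = parse_report(POST_REPORT)
delta = {name: post[name] - pre[name] for name in FIELDS}

if __name__ == "__main__":
    for name in FIELDS:
        print(name, pre[name], "->", post[name])
```

For CLK3, skew drops from 0.073 to 0.041, insertion delay from 1.903 to 1.725, maximum transition from 0.448 ns to 0.356 ns, and buffer area from 67.5 to 59.0, with the buffer count unchanged at 23.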
Reporting the Longest and Shortest Paths
• The longest and shortest paths corresponding to all corners are reported, soon after the post optimization report:
++ Longest path for clock CLK3 in corner 'max':
object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97
…
I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_3__8_/CP (senrq1)
  167 4 289 r ( 521 520)

++ Shortest path for clock CLK3 in corner 'max':
object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97
…
I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_4__11_/CP (senrq1)
  217 4 247 r ( 687 656)
• Placement legalization related messages are located at the end of the optimize_clock_tree command log
Thank you