Post on 19-Dec-2015
Charles Dike 1R
®
Synchronization Ideas
Charles E. Dike
Intel Corporation
Introduction
• Tutorial
• Share some ideas about synchronization and metastability
• Introduce NEW, IMPROVED theory on metastability
• Charles Dike (cdike@ichips.intel.com)
Why and where synchronize?
• Reduce latency between independent clock domains.
• Asynchronous domain to synchronous clock.
• Synchronous clock to an independent synchronous clock.
• Benefit - higher performance in critical circuits.
[Figure: an asynchronous circuit, a pausable clock at 1.8 GHz, a synchronous clock at 3.0 GHz, and two synchronous clocks at 1.5 GHz.]
Design Direction
• 80s: towards 100 MHz
• 90s: towards 1 GHz
• 00s: multi-GHz
[Figure: repeated groups of MEM, FPU, and ALU blocks; label "VALUE ADDED".]
Chip Area Networks
• Late 00s: multi-GHz
I believe….
• We must be able to synchronize all domains to a PLL controlled clock
• Interconnect on chip will be asynchronous (GALS)
• We need to minimize latency
• There will be two basic synchronizer uses - near neighbor and the chip net
Topics of Discussion
• Generic synchronizer of the type used in the TeraFlops computer
• Simple synchronizer of the type used in StrongARM
• The Myrinet pipeline synchronization scheme
• Latest understanding of metastability
Generic Synchronizer
• Handles self-timed to synchronous interfaces and vice versa
• Supports synchronous to synchronous interfaces
• Can handle streaming data
• Adaptable to any speed range
• Possibly used over the chip network
Two flop synch
[Figure: two D flip-flops (#1 and #2) in series on a common CLK, passing the VALID signal.]
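The two-flop arrangement on this slide can be sketched behaviorally. The Python below is only a logical model under stated assumptions: the class and method names are made up for illustration, both flops share one clock, and metastability itself is not modeled. It shows just the two-edge latency that the second flop adds.

```python
class TwoFlopSynchronizer:
    """Purely logical model of two D flip-flops in series on one clock.

    Hypothetical illustration; real flop #1 may go metastable, which this
    model does not attempt to represent.
    """

    def __init__(self) -> None:
        self.q1 = 0  # output of flop #1
        self.q2 = 0  # output of flop #2 (the synchronized signal)

    def clock_edge(self, valid: int) -> int:
        # Both flops sample on the same edge, so flop #2 captures the value
        # flop #1 held before this edge.
        self.q2, self.q1 = self.q1, valid
        return self.q2


sync = TwoFlopSynchronizer()
outputs = [sync.clock_edge(v) for v in [0, 1, 1, 1]]
print(outputs)  # [0, 0, 1, 1]
```

The input rises on the second edge and the synchronized output follows one edge later, which is the time flop #1 is given to settle.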
Single latch synch
[Figure: SR latch with two-flop synchronizers on CLK1 and CLK2. Signals: Write Valid, Read Valid, REQ, ACK. Waveforms: latch output, receiver clock, sender clock.]
Multi latch synch
[Figure: two copies of the single latch synchronizer, each with an SR latch, two-flop synchronizers on CLK1 and CLK2, and signals Write Valid, Read Valid, REQ, ACK.]
General Case
[Figure: FIFO control logic. Blocks: write pointer, read pointer, status register, synchronizers, latency padding. Bit patterns shown: 1000000000, 0000010000, 1111100000. Outputs: EMPTY, FULL, SYNC. Inputs: Write Clock, Write Enable, Read Clock.]
empty case
[Figure: EMPTY flag logic built from D flip-flops with R and EN inputs and a synchronizer. One half uses Write Pointer a, Read Pointer b, and Read Clock; the other half uses Write Pointer b, Read Pointer a, Write Clock, and Write Enable. Blocks: write pointer, read pointer, status register.]
General Case
[Figure: FIFO control logic. Blocks: write pointer, read pointer, status register, synchronizers, latency padding. Bit patterns shown: 1000000000, 0000010000, 1111100000. Outputs: EMPTY, FULL, SYNC. Inputs: Write Clock, Write Enable, Read Clock.]
Topics of Discussion
• Generic synchronizer of the type used in the TeraFlops computer
• Simple synchronizer of the type used in StrongARM processor
• The Myrinet pipeline synchronization scheme
• Latest understanding of metastability
Simple Synchronizer
• Constrained by frequency ratio
• Supports synchronous to synchronous interfaces
• Does it support asynch to synch? Yes, with restrictions.
• Possibly used in local neighbor synchronizers
Simple Synchronizer
[Figure: chain of D flip-flops carrying signals A, A1, A2, A3 (nodes w, x, y, z), with FAST CLK, a divide-by-2 SLOW CLK, and a SYNC signal. MI* = Metastable Immune.]
timing1
[Figure: the simple synchronizer circuit with a timing diagram over fast-clock cycles 1 to 6. Traces: FAST CLOCK, SLOW CLOCK, A, A1, A2, A3, SYNC.]
timing2
[Figure: the simple synchronizer circuit with a timing diagram over fast-clock cycles 1 to 6. Traces: FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK.]
timing3
[Figure: the simple synchronizer circuit with a timing diagram over fast-clock cycles 1 to 6. Traces: FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK.]
timing4
[Figure: variants of the simple synchronizer circuit with an additional MI* flip-flop, and a timing diagram over fast-clock cycles 1 to 6. Traces: FAST CLOCK, SYNC, SLOW CLOCK, SLOW CLOCK#.]
transfers
[Figure: timing diagram over fast-clock cycles 1 to 6 (FAST CLOCK, SYNC, SLOW CLOCK, CHEATER CLOCK), with two-flop circuits for the fast to slow transfer and the slow to fast transfer.]
Topics of Discussion
• Generic synchronizer of the type used in the TeraFlops computer
• Simple synchronizer of the type used in StrongARM
• The Myrinet pipeline synchronization scheme
• Latest understanding of metastability
Pipeline Synchronizer
• Supports synchronous to synchronous interfaces
• Supports asynch to synch and vice versa
• Possibly used in local neighbor synchronizers
• Essentially a distributed FIFO and synchronizer
Pipeline Synchronizer
[Figure: three cascaded stages, each containing a synchronizing element S, with input ports Ri, Ai, Di and output ports Ro, Ao, Do.]
ME element
[Figure: the synchronizing element S built around an ME element with requests R0, R1 and acknowledges A0, A1; signal XREQ.]
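FIFO element
[Figure: FIFO stage with ports Ri, Ai, Di, Ro, Ao, Do, drawn both as a block and as its internals: a C element on the Ri, Ai, Ro, Ao handshake signals plus the data path.]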
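The "C" block in the FIFO element is presumably a Muller C-element, the usual state-holding gate in handshake pipelines. That reading is an assumption, since the slide only shows the label. A minimal behavioral sketch with hypothetical names: the output follows the inputs when they agree and holds its previous value when they differ.

```python
class CElement:
    """Behavioral Muller C-element (assumed meaning of the slide's "C")."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        # Output changes only when both inputs agree; otherwise it holds.
        if a == b:
            self.out = a
        return self.out


c = CElement()
trace = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 1, 1, 0]
```

In a FIFO stage this hold behavior is what lets a request wait until the neighboring stage has acknowledged.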
Async to sync
[Figure: three cascaded pipeline synchronizer stages, each with an S element and ports Ri, Ai, Di, Ro, Ao, Do, between the Synchronous and Asynchronous sides.]
Sync to async
[Figure: three cascaded pipeline synchronizer stages, each with an S element and ports Ri, Ai, Di, Ro, Ao, Do, between the Synchronous and Asynchronous sides.]
Points to ponder #1
• All synchronizing interfaces have one thing in common - a latching element that holds data while metastabilities are being resolved.
• There is no way to avoid the latency which is required to resolve metastabilities.
• To minimize latency the latching element characteristics can be improved.
• We will be required to understand and use this knowledge. This is the future of digital design.
Topics of Discussion
• Generic synchronizer of the type used in the TeraFlops computer
• Simple synchronizer of the type used in StrongARM
• The Myrinet pipeline synchronization scheme
• Latest understanding of metastability
Role of the Synchronizing Flop
• Reorients incoming information to a clock edge
• Its performance determines system failure rate or latency
Real Life
• There is no magic bullet
• There is a lot of misinformation on metastability around
• To date many circuits have been overdesigned through planning and luck
• Whenever a circuit fails because its frequency is too high, the ultimate cause of failure is metastability
• There is no way to synchronize a signal faster than about the time it takes to pass a signal through six static gates
Metastability is....
[Figure: latch with SET and RESET inputs, two OUT outputs, and internal nodes NODE A and NODE B.]
Technical terms
• Tw (window size) - likelihood of entering a metastable state - in units of time
• Tau (τ) - rate at which metastability resolves - in units of time
• MTBF (Mean Time Between Failures)
$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$
• Thermal noise: $\langle V_n^2 \rangle = 4kT/C$
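To get a feel for the terms on this slide, the sketch below evaluates the formula read as MTBF = e^(t/τ) / (Tw · fd · fc), taking fd and fc to be the data and clock frequencies (the usual reading, though the slide does not spell it out). The numbers are assumptions borrowed from elsewhere in the deck (τ = 20 ps, Tw = 15 ps, 1.5 GHz clocks) and are used purely for illustration.

```python
import math


def mtbf_seconds(t: float, tau: float, tw: float,
                 f_data: float, f_clock: float) -> float:
    """MTBF = exp(t / tau) / (Tw * fd * fc), all quantities in SI units."""
    return math.exp(t / tau) / (tw * f_data * f_clock)


# Illustrative values only (assumptions, not measurements from the talk).
tau = 20e-12      # resolution time constant, seconds
tw = 15e-12       # metastability window, seconds
f_data = 1.5e9    # data transition rate, Hz
f_clock = 1.5e9   # sampling clock, Hz

for n_tau in (10, 20, 30, 40):
    mtbf = mtbf_seconds(n_tau * tau, tau, tw, f_data, f_clock)
    print(f"t = {n_tau:2d} tau -> MTBF = {mtbf:.3e} s")
```

Each extra τ of resolution time multiplies the MTBF by e, which is why a few additional gate delays change the failure rate by orders of magnitude.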
Simple jamb latch
[Figure: jamb latch schematic with DATA, CLOCK, and RESET inputs, OUT output, and internal nodes NODE A and NODE B; plot of propagation delay versus time of data after clock.]
Simple jamb latch
[Figure: jamb latch schematic with DATA, CLOCK, and RESET inputs, OUT output, and internal nodes NODE A and NODE B; plot of propagation delay versus time of data after clock, annotated "~RC time constant".]
Rough Histogram
[Figure: propagation delay versus time of data after clock, on linear and log scales. On the log-scale plot the window is labeled Tw and the slope gives τ.]
$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$
Why is the theory a problem?
• It assumes a uniform distribution of data about the clock
– What happens when data always violates the setup/hold window?
• It is not detailed enough
– Doesn’t consider a deterministic region
– Doesn’t account for thermal noise
• People tend to extrapolate the theory improperly
$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$
Overview of refined theory
• Not everything past a normal propagation is a metastable event
• The Tw window can’t be improved by input edge rates
• Tw has a complex relationship to τ based on load
• The MTBF formula needs to be modified due to non-uniform distribution of data about the clock input
Schematic
Simulation of a typical latching device
[Figure: window width in ps (log scale, 0.1 to 1000) versus propagation delay in ns (0.15 to 0.35). Fit: tau = 29.9 ps, Tw = 211.9 ps, normal prop = 189.2 ps. Marked points: 0.8 ps, 1.8 ps, 2.8 ps, 4.8 ps.]
Test case
[Figure: measurement setup. Blocks: PC, pulse generator #1, pulse generator #2, two delay elements, the D flip-flop under test, and a TEK 11801-B oscilloscope with trigger and input connections.]
Measuring real data
[Figure: measured data on a log scale (0.1 to 10,000,000) versus time from -3.00E-10 to 1.00E-10; annotation "advancing time".]
Histogram
[Figure: histogram versus time with the inflection point marked; annotation 0.6 mV / 0.1 ps.]
Histogram
[Figure: histogram versus time with the inflection point marked; annotation 0.6 mV / 0.1 ps.]
Measured versus Basic
[Figure: measured propagation delay compared with the basic theory, plotted against time of data after clock (log scale), with Tw and the slope (τ) marked; annotation 0.6 mV / 0.1 ps.]
$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$
Simulated....
[Figure: simulation circuit with a battery and a voltage controlled switch; labels R1 = 100 Ω and R1 = 100 MΩ.]
Tau Simulated 2
$$\tau = \frac{|t_1 - t_2|}{\ln(V_2 / V_1)}$$
Where: V1 = voltage at time t1, V2 = voltage at time t2
[Figure: latch outputs at nodes 1 and 2 (0.0 to 1.5 volts) versus time (1.0 to 1.4 ns), and the semilog difference between latch outputs (10^-6 to 10^0 volts) versus time, with t1 and t2 marked.]
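The two-point extraction on this slide, read as τ = |t1 − t2| / ln(V2/V1), can be checked on synthetic data. The sketch below is an illustration only: the function name is hypothetical, and it assumes the difference between the latch outputs grows as a clean exponential between the two sample points, with V2 the later and larger voltage.

```python
import math


def tau_from_two_points(t1: float, v1: float, t2: float, v2: float) -> float:
    """tau = |t1 - t2| / ln(V2 / V1), assuming pure exponential growth."""
    if v1 <= 0 or v2 <= v1:
        raise ValueError("expects 0 < V1 < V2 (difference voltage still growing)")
    return abs(t1 - t2) / math.log(v2 / v1)


# Synthetic check: build two points from a known tau and recover it.
true_tau = 20e-12            # assumed value, seconds
t1, t2 = 0.0, 100e-12        # sample times, seconds
v1 = 1e-6                    # difference voltage at t1, volts
v2 = v1 * math.exp((t2 - t1) / true_tau)

estimated_tau = tau_from_two_points(t1, v1, t2, v2)
print(f"estimated tau = {estimated_tau * 1e12:.2f} ps")
```

On real simulation output the two points should be taken from the straight part of the semilog plot, before the outputs saturate.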
$$\langle V_n^2 \rangle = 4kT/C = 4kTBR$$
k = 1.38 x 10^-23 J/K
B = 1/τ = 5 x 10^10 Hz
R = ~400 Ω, T = 300 K
τ = 20 picoseconds
Vn = ~0.6 mV
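The ~0.6 mV figure can be reproduced from the values on this slide. The sketch assumes the missing symbols are τ = 20 ps and R ≈ 400 Ω, and that τ = RC, so that both forms of the expression agree. The variable names are illustrative.

```python
import math

k = 1.38e-23        # Boltzmann constant, J/K (value from the slide)
T = 300.0           # temperature, K
tau = 20e-12        # time constant, s (assumed reading of the slide)
R = 400.0           # resistance, ohms (assumed reading of the slide)

B = 1.0 / tau       # bandwidth as defined on the slide: 5e10 Hz
C = tau / R         # capacitance implied by tau = R * C (assumption)

vn_from_r = math.sqrt(4 * k * T * B * R)   # sqrt(4kTBR)
vn_from_c = math.sqrt(4 * k * T / C)       # sqrt(4kT/C)

print(f"Vn = {vn_from_r * 1e3:.3f} mV (from R), {vn_from_c * 1e3:.3f} mV (from C)")
```

Both expressions give roughly 0.58 mV, consistent with the slide's rounded ~0.6 mV.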
Putting it all together
[Figure, panel A: semilog plot with horizontal axis from -50 to 250 picoseconds and vertical axis from 0.18 fs to 1.80 ns in decades. Region marked: normal.]
Putting it all together
[Figure, panel B: semilog plot with horizontal axis from -50 to 250 picoseconds and vertical axis from 0.18 fs to 1.80 ns in decades. Region marked: deterministic, with a question mark.]
Putting it all together
[Figure, panel C: semilog plot with horizontal axis from -50 to 250 picoseconds, vertical axis from 0.18 fs to 1.80 ns in decades, and a second vertical axis from 1.80 µV to 1.80 V in decades. Marked: thermal noise point; deterministic region.]
Putting it all together
[Figure, panel D: semilog plot with horizontal axis from -50 to 250 picoseconds and vertical axis from 0.18 fs to 1.80 ns in decades. Regions marked: deterministic, true metastability. Annotation: T=19 ps.]
Putting it all together
[Figure, panel E: semilog plot with horizontal axis from -50 to 250 picoseconds and vertical axis from 0.18 fs to 1.80 ns in decades. Regions marked: deterministic, true metastability. Annotations: Tw=15 ps, T=19 ps.]
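Worst case:
$$\mathrm{MTBF} = \frac{e^{(t - t_{\mathrm{deter}})/\tau}}{T_w f_d f_c}$$
Simple case:
$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$
Expected:
$$\mathrm{MTBF} = \frac{e^{(t - 0.5\, t_{\mathrm{deter}})/\tau}}{T_w f_d f_c}$$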
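The three variants on this slide differ only in how much of the deterministic region ("deter") is subtracted from the available resolution time t. The sketch below compares them, reading each exponent as divided by τ. All numeric values, including the 100 ps deterministic region, are hypothetical and chosen only to show the ordering of the three cases.

```python
import math


def mtbf_seconds(t: float, tau: float, tw: float, f_data: float,
                 f_clock: float, deter: float = 0.0,
                 deter_weight: float = 0.0) -> float:
    """MTBF with a fraction of the deterministic region removed from t.

    deter_weight = 0.0 -> simple case, 0.5 -> expected, 1.0 -> worst case.
    """
    return math.exp((t - deter_weight * deter) / tau) / (tw * f_data * f_clock)


# Hypothetical parameters for illustration only.
params = dict(t=600e-12, tau=20e-12, tw=15e-12, f_data=1.5e9, f_clock=1.5e9)
deter = 100e-12

simple = mtbf_seconds(**params)
expected = mtbf_seconds(**params, deter=deter, deter_weight=0.5)
worst = mtbf_seconds(**params, deter=deter, deter_weight=1.0)

for name, value in (("simple", simple), ("expected", expected), ("worst", worst)):
    print(f"{name:8s} MTBF = {value:.3e} s")
```

With these numbers the worst case is lower than the simple case by a factor of e^(deter/τ), so ignoring the deterministic region overstates the MTBF.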
Points to ponder #2
Jakov Seizovic postulated a “malicious” asynchronous signal: no matter how we position the sampling window, and no matter how small we make the sampling window, the asynchronous transition will appear in that window.
This case has to be assumed when interfacing to a signal of unknown probability distribution.
We know something about just how malicious a signal can be.
Exploring
Worst case bound
< 0.1 ps
Uniform distribution
12 ps jitter
Not worst case bound
Final comments
• With the proper synchronizing device it may be possible to synchronize a signal within a single clock cycle. The constraints are:
– You require about 35 τ in order to get the MTBF out to about 1 century.
– Each typical static gate delay is equivalent to about 5 τ in a properly designed synchronizing flop.
– The metastability MTBF of a device should probably be an order of magnitude better than the mechanical MTBF.
– You must assume a ‘malicious’ input to the synchronizer. Nevertheless, this only adds about 5 τ to the delay.
– Standard flop designs are generally very poor synchronizers. Use a jamb structure. It has the best transconductance.
– You should never require more than two synchronizing flops in series.
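The resolution-time budget in the first constraint can be estimated by inverting the simple MTBF formula, t = τ · ln(MTBF · Tw · fd · fc). The sketch below does this for a one-century target. The parameter values are assumptions taken from elsewhere in the deck (τ = 20 ps, Tw = 15 ps, 1.5 GHz clocks), so the result indicates the order of magnitude and does not reproduce the speaker's exact figure.

```python
import math


def required_resolution_time(target_mtbf_s: float, tau: float, tw: float,
                             f_data: float, f_clock: float) -> float:
    """Solve MTBF = exp(t / tau) / (Tw * fd * fc) for t."""
    return tau * math.log(target_mtbf_s * tw * f_data * f_clock)


century_s = 100 * 365.25 * 24 * 3600   # one century in seconds

# Assumed, illustrative parameters.
tau = 20e-12
tw = 15e-12
f_data = 1.5e9
f_clock = 1.5e9

t_needed = required_resolution_time(century_s, tau, tw, f_data, f_clock)
n_tau = t_needed / tau
print(f"resolution time = {t_needed * 1e12:.0f} ps, about {n_tau:.1f} tau")
```

Because the dependence is logarithmic, changing the target MTBF or the clock rates by an order of magnitude moves the answer by only about 2.3 τ.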
Conclusion
• There are several ways to communicate between independent domains
• I believe more asynchronous domains will appear that are embedded within synchronous designs
– Latency must be reduced to maximize the use of asynchronous designs.
– This is a burden that asynch designers must bear
– We need to know the limitations of synchronization and metastability
• Chip area networks are coming and they will open up opportunities for asynchronous design
References
• T. Sakurai, “Optimization of CMOS Arbiter and Synchronizer Circuits with Submicrometer MOSFET’s,” IEEE J. Solid State Circuits, vol. 23, no. 4, pp. 901-906, Aug 1988.
• L. Kleeman and A. Cantoni, “Metastable Behavior in Digital Systems,” IEEE Design & Test of Computers, pp. 4-19, Dec 1987.
• I. E. Sutherland, “Micropipelines,” Turing Award Lecture, Communications of the ACM, 32(6), pp. 720-738, 1989.
• J. N. Seizovic, “Pipeline Synchronization,” Proc. Int’l Symp. Advanced Research in Asynchronous Circuits and Systems, CS Press, 1994.
• C. Dike and E. Burton, “Miller and Noise Effects in a Synchronizing Flip-Flop,” IEEE J. Solid State Circuits, vol. 34, no. 6, pp. 849-855, June 1999.
• A. Van der Ziel, Noise in Measurements. New York: Wiley, 1976.
Overview of present theory
• Everything past a normal propagation is considered a metastable event
• A deterministic region doesn’t exist
• Tw has no fixed relationship to τ
• The MTBF formula assumes a uniform distribution of data about the clock input
$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w f_d f_c}$$