ACOUSTIC WAVE DETECTORS TO PREVENT SEUS IN LLC.
Gaurang Upasani
28th June, 2011
Outline
• Motivation
• Overview
• Acoustic wave detectors: Introduction
• Application of acoustic wave detectors
  • Nehalem Quad core (i7) LLC
Motivation
• Core count is already TDP/SER limited.
• Decreasing voltage increases SER sensitivity.
• Recent trends in soft error protection:
  • Physical level: device resizing, restructuring, hardening.
  • Adding redundancy at the microarchitecture and architecture level: error detecting and correcting codes, redundant execution (DMR, TMR, etc.).
Protecting Memories: Parity
• Simplest.
• Cannot detect an even number of faults.
• Does not identify which bit has been flipped, so the word cannot be corrected.
• Not optimal in terms of the number of code bits used.
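The blind spot for an even number of faults can be shown with a few lines of Python. This is a minimal sketch assuming even parity over one small word; the helper names `parity_bit` and `parity_check_fails` and the bit patterns are illustrative, not taken from the slides.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit of a non-negative integer word."""
    return bin(word).count("1") % 2


def parity_check_fails(word: int, stored_parity: int) -> bool:
    """True when the recomputed parity disagrees with the stored bit."""
    return parity_bit(word) != stored_parity


data = 0b10110010
p = parity_bit(data)              # parity stored alongside the word

one_flip = data ^ 0b00000100      # one upset bit
two_flips = data ^ 0b00010100     # two upset bits in the same word

print(parity_check_fails(one_flip, p))   # the single flip is flagged
print(parity_check_fails(two_flips, p))  # the double flip passes the check
```

Even when the check fires, it only says that some bit in the word is wrong, which is why parity alone cannot correct the word.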
Error Correcting Codes
• k data bits, r code bits (n = k + r).
• The r code bits must be able to determine the exact position or positions of the error.
• Correcting m bits requires $2^r \ge \sum_{i=0}^{m} \binom{k+r}{i}$.
• If k = 64 (64-bit data word):
  • Single-bit correction (m = 1): the minimum r = 7.
  • Double-bit correction (m = 2): r = 13.
  • Triple-bit correction (m = 3): r = 20.
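The inequality can be evaluated directly. The sketch below searches for the smallest r that satisfies it for a given k and m; `min_check_bits` is a hypothetical helper name. The inequality is only a counting bound, so a concrete code construction may need more check bits than the value returned here, and the results need not coincide with the figures quoted for m = 2 and m = 3.

```python
from math import comb


def min_check_bits(k: int, m: int) -> int:
    """Smallest r with 2**r >= sum of C(k + r, i) for i = 0..m."""
    r = 1
    while 2 ** r < sum(comb(k + r, i) for i in range(m + 1)):
        r += 1
    return r


# Evaluate the bound for a 64-bit data word and m = 1, 2, 3.
for m in (1, 2, 3):
    print(m, min_check_bits(64, m))
```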
ECC Overhead
Multi-threaded Xeon® Processor. (DEC-TED)
Itanium processor SEC-DED
SEC-DED = single-bit error correct, double-bit error detect
DEC-TED = double-bit error correct, triple-bit error detect
Evaluation of EDACs
• Each structure must be protected separately.
• Logic to encode/decode is not cheap:
  • Going from SEC-DED to DEC-TED, the delay penalty almost doubles, the encoder area doubles, and the decoder area increases by 16x.
• Special care is required for multi-bit errors:
  • Spatial: adjacent bits, mostly from the same strike.
    • Interleaving: two errors in consecutive bits are caught in two different code words.
    • Getting more and more expensive (the interleaving factor goes up).
  • Temporal: non-adjacent bits, from two different strikes.
    • Scrubbing: costs energy and may impact performance.
• For low voltage, we need very strong ECCs.
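The interleaving point above can be illustrated with a toy model. The sketch assumes simple column interleaving, where physical bit column b belongs to code word b mod F for an interleaving factor F; the function names and the numbers are illustrative only.

```python
def codeword_of(bit_index: int, interleave: int) -> int:
    """Code word that owns a physical bit column under bit interleaving."""
    return bit_index % interleave


def worst_errors_per_word(first_bit: int, burst: int, interleave: int) -> int:
    """Largest number of flipped bits that land in a single code word
    when `burst` physically adjacent bits are upset by one strike."""
    counts = {}
    for b in range(first_bit, first_bit + burst):
        w = codeword_of(b, interleave)
        counts[w] = counts.get(w, 0) + 1
    return max(counts.values())


# Two adjacent upsets with factor 2: each code word sees one error.
print(worst_errors_per_word(first_bit=10, burst=2, interleave=2))
# Three adjacent upsets with factor 2: one code word now sees two errors.
print(worst_errors_per_word(first_bit=10, burst=3, interleave=2))
```

In this model a wider burst needs a larger interleaving factor to keep each code word at one error, which is the cost growth the slide refers to.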
Motivation summary
• Error detecting and correcting codes in low-voltage operation are expensive:
  • They consume a larger die area.
  • Slow encode/decode costs performance.
  • The added components increase the power budget.
• Finding an appropriate method to detect the vulnerable area to protect, while adding minimal hardware overhead, is a challenge.
Acoustic wave detectors : Introduction
• Our choice: dimensions L = 1 µm, W = 1 µm, H = 0.05 µm; area ~1 µm² (~1 bit).
• Can detect a peak power density of 0.3 mW/cm² at a distance of 5 mm from the source of the sound.
• Area covered by a single detector (R = 5 mm): 78.5375 mm².
• This area is equivalent to the LLC in the Nehalem Quad core (i7), 45 nm.
Figure: cantilever beam.
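The coverage figure follows from the area of a circle of radius 5 mm. A quick check, assuming the detector hears equally well in all directions in the plane of the die (the variable names are illustrative):

```python
import math

radius_mm = 5.0                             # detection range quoted above
coverage_mm2 = math.pi * radius_mm ** 2     # area of the covered disc

print(f"{coverage_mm2:.2f} mm^2")
```

The slide's 78.5375 mm² corresponds to the same formula with pi truncated to 3.1415.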
Goal
• An acoustic sensor costs about as much area as 1 parity bit.
• We want to use the sensors to locate particle strikes.
• The idea is to use fewer sensors than the total number of ECC bits, to save area.
• Once the particle location is identified:
  • Identify potential bit flips.
  • Apply corrective mechanisms if required.
MICROARCHITECTURE
Nehalem Quad core (i7), 45nm
• Die area: 245.7 mm²
• Approximate dimensions: L = 13 mm, W = 18.9 mm
Figure: die with dimensions L and W marked.
Last level cache
• Size: 8 MB; line size: 64 B.
• Approximate area: 78 mm².
• Approximate dimensions: L = 4.41 mm, W = 17.64 mm.
• Approximate area of a bit: 1.0768 µm².
Figure: LLC with dimensions L and W marked.
What is it all about?
Figure: sensors S1, S2, S3, S4.
• Using 4 such sensors it is possible to detect a particle strike in the LLC.
• But… we want to be precise in the location and the time.
Latency
• Speed of phonons (vibration energy) in the Si lattice is 10 km/s (~23 bits/ns).
• What does this mean? If the processor speed is 2 GHz (1 cycle = 0.5 ns), detecting a worst-case particle strike (5 mm away) with one detector takes 1000 cycles.
• A method to revert back 1k cycles would be used (checkpointing, logs, etc.).
• Of course, we can add as many sensors as required to decrease the worst-case (WC) latency.
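The 1000-cycle figure is the travel time of the wave over 5 mm expressed in clock cycles. A small sketch of that arithmetic, using the 10 km/s phonon speed and 2 GHz clock quoted above (`detection_latency_cycles` is a hypothetical helper name):

```python
def detection_latency_cycles(distance_mm: float,
                             speed_km_s: float = 10.0,
                             clock_ghz: float = 2.0) -> float:
    """Cycles elapsed while the acoustic wave travels distance_mm."""
    distance_m = distance_mm * 1e-3
    speed_m_s = speed_km_s * 1e3
    latency_ns = distance_m / speed_m_s * 1e9
    return latency_ns * clock_ghz      # GHz is cycles per nanosecond


print(round(detection_latency_cycles(5.0)))   # worst case, 5 mm away
print(round(detection_latency_cycles(2.5)))   # strike at half the distance
```

Because the latency scales linearly with distance, halving the worst-case distance to the nearest sensor halves the number of cycles that must be rolled back.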
Detecting with acoustic sensors
Problem Statement : Finding the unknown location of the particle strike(Xa,Ya).
Figure: sensors S1 (X1,Y1), S2 (X2,Y2), ..., Sn (Xn,Yn) at known positions and the unknown strike location (Xa,Ya).
Questions to be answered.
• How many sensors do I need? At least 3.
• Where should I put them?
• What is the accuracy? Granularity of the location: cache line, cache set, bit?
• Latency: when do I detect the error?
  • With multiple cantilevers, the detection latency is set by the first one that raises the flag.
  • Some sensors for detection latency, some for localization?
Traversal of acoustic wave.
Figure: the wave from the strike (Xa,Ya,T) reaches sensors S1 (X1,Y1,t1), S2 (X2,Y2,t2) and S3 (X3,Y3,t3); the remaining sensors S4 (X4,Y4) ... Sn (Xn,Yn) are also shown.
Determination of strike position.
Figure: sensors S1 (X1,Y1,t1), S2 (X2,Y2,t2) and S3 (X3,Y3,t3) at distances d1, d2 and d3 from the strike (Xa,Ya,T), with a timeline of T, t1, t2, t3 and the intervals DeltaT12 and DeltaT23.
Equations
• If a range-difference measurement is observed between two stations, the estimated locus of position is a hyperbola.
• We generate a simultaneous set of algebraic position equations (generally non-linear).
• We linearize them by Taylor-series estimation.
• We solve them using Gauss-Newton interpolation.
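For reference, the steps above can be written out for three sensors. This is a sketch of the standard Taylor-series formulation under the assumption of a constant wave speed in a plane; the symbols $\hat d_i$, $\delta_x$, $\delta_y$, $A$ and $z$ are introduced here for illustration and do not appear in the slides.

```latex
% Distance from sensor i at (X_i, Y_i) to the strike at (X_a, Y_a)
d_i = \sqrt{(X_a - X_i)^2 + (Y_a - Y_i)^2}

% Each measured range difference defines one hyperbola
\Delta D_{12} = d_2 - d_1, \qquad \Delta D_{23} = d_3 - d_2

% First-order Taylor expansion about a guess (X_v, Y_v),
% where \hat d_i is the distance from sensor i to the guess
\Delta D_{12} \approx (\hat d_2 - \hat d_1)
  + \left(\frac{X_v - X_2}{\hat d_2} - \frac{X_v - X_1}{\hat d_1}\right)\delta_x
  + \left(\frac{Y_v - Y_2}{\hat d_2} - \frac{Y_v - Y_1}{\hat d_1}\right)\delta_y

% Stacking one such row per sensor pair gives A\,\delta \approx z, with
% z_{12} = \Delta D_{12} - (\hat d_2 - \hat d_1); the least-squares update is
\delta = (A^{\mathsf T} A)^{-1} A^{\mathsf T} z, \qquad
(X_v, Y_v) \leftarrow (X_v + \delta_x,\; Y_v + \delta_y)
```

The update is repeated until the correction $\delta$ becomes negligible.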
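Detection Hardware
Figure: flowchart of the detection logic in the micro controller, with inputs t1, t2, t3 and the sampling frequency.
• Get the “timer0” value at event t1.
• Is event t(i+1) high (i = 1, 2)? If no, keep waiting; if yes, get the “timer0” value.
• DeltaT12 = t2 - t1; DeltaT23 = t3 - t2
• DeltaD12 = Cp * DeltaT12; DeltaD23 = Cp * DeltaT23
• Cp = speed of phonons in the Si lattice; DeltaD12 = d2 - d1; DeltaD23 = d3 - d2
Figure: timeline of T, t1, t2, t3 with the intervals DeltaT12 and DeltaT23.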
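The arithmetic in the flow above is small enough to sketch in software. The example assumes the timer counts sampling periods of 0.5 ns and takes Cp = 10 km/s = 10 µm/ns from the latency slide; `range_differences` and the tick values are hypothetical.

```python
CP_UM_PER_NS = 10.0   # 10 km/s expressed in micrometres per nanosecond


def range_differences(ticks, sample_period_ns=0.5, cp_um_per_ns=CP_UM_PER_NS):
    """Convert timer counts latched at events t1, t2, t3 into
    (DeltaD12, DeltaD23) in micrometres."""
    t1, t2, t3 = (n * sample_period_ns for n in ticks)
    delta_t12 = t2 - t1
    delta_t23 = t3 - t2
    return cp_um_per_ns * delta_t12, cp_um_per_ns * delta_t23


# Timer values captured when S1, S2 and S3 raise their flags.
print(range_differences((8, 108, 308)))
```

Only time differences enter the result, so the unknown strike time T never has to be measured.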
• Use the iterative Gauss-Newton interpolation.
• Initial guessed location: (Xv,Yv).
• Measured differences in the distances from two sensors: DeltaD12 and DeltaD23.
• Another problem: it may not converge…
Figure: sensors S1 (X1,Y1,t1), S2 (X2,Y2,t2) and S3 (X3,Y3,t3), the actual strike (Xa,Ya,T) and the initial guess (Xv,Yv).
Determination of strike position.
Real World: Errors!!!
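Figure: timeline of the strike time T and the detection times t1, t2, t3 at sensors S1, S2, S3, showing the sampling errors e1, e2, e3 and the sampling period Tp.
• Tp = 0.5 ns (sampling period).
• ei lies in [0, 0.5] ns, which implies ei - ej lies in [-0.5, 0.5] ns.
• Example: if the strike signal arrives at S1 at 3.6 ns, it is detected at S1 only at the next sample, 4.0 ns, so e1 = 0.4 ns.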
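The sampling error in this example can be reproduced with a short sketch. It assumes a sensor flag is registered at the first sampling edge at or after the true arrival time, and it reuses Cp = 10 µm/ns from the latency slide to translate the timing error into distance; the function names are illustrative.

```python
import math


def sampled_time(arrival_ns: float, tp_ns: float = 0.5) -> float:
    """First sampling edge at or after the true arrival time."""
    return math.ceil(arrival_ns / tp_ns) * tp_ns


def sampling_error(arrival_ns: float, tp_ns: float = 0.5) -> float:
    """Delay e_i between true arrival and the registered time."""
    return sampled_time(arrival_ns, tp_ns) - arrival_ns


print(sampled_time(3.6))                 # registered time for the 3.6 ns case
print(round(sampling_error(3.6), 3))     # corresponding error e1

# Bound on a range-difference error: |ei - ej| <= Tp, scaled by Cp.
cp_um_per_ns = 10.0
max_range_error_um = cp_um_per_ns * 0.5
print(max_range_error_um)
```

Under these assumptions a 0.5 ns sampling period can shift each range difference by up to 5 µm, which spans several bit cells of about 1 µm² each.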
Impact of Errors
Distribution of error depends on the number of sensors and the sampling frequency.
If we use more than 3 sensors, we have an overdetermined system of equations. This may help reduce the measurement error.
High-level Algorithm
System inputs
• Number of sensors
• Location of the sensors
• Initial (guessed) location of the strike
• Difference in measured distances of the actual strike from the sensors
System outputs
• Location of the strike
• Error estimation
Iterative triangulation method: “Gauss Newton Interpolation”
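A minimal pure-Python sketch of such an iterative solver is shown below. It is not the implementation used for the results in this talk: it assumes noise-free range differences between consecutive sensors, solves the 2x2 normal equations in closed form, and omits the error estimation output. The name `locate_strike`, the sensor layout and the tolerances are illustrative.

```python
import math


def locate_strike(sensors, delta_d, guess, max_iter=50, tol=1e-9):
    """Estimate the strike position by Gauss-Newton iteration.

    sensors : list of (x, y) positions, ordered S1, S2, ...
    delta_d : measured range differences d[i+1] - d[i]
    guess   : initial location (Xv, Yv)
    Returns (x, y, converged).
    """
    x, y = guess
    for _ in range(max_iter):
        dist = [math.hypot(x - sx, y - sy) for sx, sy in sensors]
        if min(dist) == 0.0:
            # The gradient is undefined exactly on a sensor: nudge the guess.
            x, y = x + 1e-6, y + 1e-6
            continue
        # Accumulate the normal equations (A^T A) delta = A^T z.
        saa = sab = sbb = saz = sbz = 0.0
        for i, measured in enumerate(delta_d):
            (x1, y1), (x2, y2) = sensors[i], sensors[i + 1]
            a = (x - x2) / dist[i + 1] - (x - x1) / dist[i]
            b = (y - y2) / dist[i + 1] - (y - y1) / dist[i]
            z = measured - (dist[i + 1] - dist[i])
            saa += a * a
            sab += a * b
            sbb += b * b
            saz += a * z
            sbz += b * z
        det = saa * sbb - sab * sab
        if abs(det) < 1e-12:
            return x, y, False            # degenerate geometry
        dx = (sbb * saz - sab * sbz) / det
        dy = (saa * sbz - sab * saz) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:      # termination criterion
            return x, y, True
    return x, y, False


# Four sensors on the corners of a 10 x 10 square (arbitrary length units).
sensors = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
true_xy = (3.0, 4.0)
d = [math.hypot(true_xy[0] - sx, true_xy[1] - sy) for sx, sy in sensors]
delta_d = [d[i + 1] - d[i] for i in range(len(d) - 1)]
print(locate_strike(sensors, delta_d, guess=(5.0, 5.0)))
```

With four sensors the system has three equations for two unknowns, so the same routine also covers the overdetermined case. The `converged` flag makes non-convergence explicit instead of silently returning a poor estimate.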
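Area of Error Distribution.
• We calculate the circular error probability (CEP).
• 50% probability that the strike is within CEP.
• 93% probability that the strike is within 2*CEP.
• About 99.8% probability that the strike is within 3*CEP.
• The probability that it is further than 3*CEP is 0.2%.
Figure: sensors S1 (X1,Y1,t1), S2 (X2,Y2,t2) and S3 (X3,Y3,t3), the actual strike (Xa,Ya,T) and the estimated location (Xnew,Ynew).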
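These percentages match a circular normal error model, in which the radial error is Rayleigh distributed and the probability of falling within k*CEP is 1 - 0.5^(k²). The sketch below evaluates that formula; the circular normal assumption and the name `prob_within` are not stated in the slides.

```python
def prob_within(k: float) -> float:
    """P(radial error <= k * CEP) for a circular normal distribution."""
    return 1.0 - 0.5 ** (k * k)


for k in (1, 2, 3):
    print(k, f"{100 * prob_within(k):.2f}%")
```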
Summary of issues
• Latency of detection
• Reduced error area
• 100% coverage (convergence)
Results
• Experiments were performed with different numbers of sensors at various locations on the cache, with different initial guessed locations, and with sampling frequencies from 2 GHz to 4 GHz.
• Convergence varies depending on the number of sensors, their locations and the sampling frequency.
• We managed to achieve 100% coverage.
Results
• Stabilization of the algorithm is achieved by:
  • Proper termination criteria.
  • Grid formation of the sensors.
• Current results do not include the sampling errors.
• Goal: obtain the localization with an accuracy of ~1 bit
  • for any number of sensors distributed in grid formation,
  • with dynamic selection of “n” sensors,
  • and an initial guessed location.
Results
Achieved localization granularity ~1 bit.

#Sensors      #Sensors triggered   #Sensors selected   Formation of the grid
in the grid   Min.      Max.       dynamically         Vertical   Horizontal
15            5         8          4                   5          3
18            5         9          5                   6          3
20            6         11         6                   5          4
24            7         12         7                   6          4
25            8         14         8                   5          5
30            9         15         9                   6          5
30            10        17         10                  5          6
36            11        18         11                  6          6
Results
Latency can be reduced by adding more dummy sensors.
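How grid density affects worst-case latency can be estimated with a simple geometric model. The sketch assumes one sensor at the centre of each cell of a uniform grid over the LLC rectangle (4.41 mm x 17.64 mm) and a wave speed of 10 µm/ns. The placement rule, the mapping of grid rows and columns to the LLC sides, and the function name are assumptions made here, not details from the slides.

```python
import math


def worst_case_latency_ns(length_mm, width_mm, rows, cols,
                          speed_um_per_ns=10.0):
    """Time for a wave to reach the nearest sensor from the worst point,
    assuming sensors sit at the centres of a rows x cols grid of cells."""
    cell_l = length_mm / rows
    cell_w = width_mm / cols
    worst_mm = 0.5 * math.hypot(cell_l, cell_w)   # cell corner to centre
    return worst_mm * 1e3 / speed_um_per_ns


for rows, cols in ((3, 5), (4, 6), (6, 6)):
    latency = worst_case_latency_ns(4.41, 17.64, rows, cols)
    print(rows * cols, "sensors:", round(latency, 1), "ns")
```

In this model the worst-case distance is half the diagonal of a grid cell, so adding sensors shrinks the cells and the latency with them.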
Future work
• Observe different trade-offs by varying the number of sensors, the locations of the sensors and the sampling frequency.
• Change the sensitivity of the sensors.
• Map multi-bit upsets.
• Add microarchitecture features to contain the error due to the latency.
• Improve the runtime FIT rate budget, chip error detection, protection of logic, etc.
• Extend the idea to the whole core.