Complexity Management and Design Optimization Regarding a Variety of Triple Modular Redundancy Schemes through Automation

Melanie Berg
MEI Technologies - NASA/GSFC Radiation Effects and Analysis Group
[email protected]

Presented by Melanie Berg, MARLUG, Applied Physics Lab, Maryland, March 26th 2010

Agenda

Section I: Single Event Effects in Digital Logic
Section II: FPGA Basics – Architectural Differences
Section III: Reducing System Error: Common Mitigation Techniques
Triple Modular Redundancy:
• Block Triple Modular Redundancy (BTMR)
• Local Triple Modular Redundancy (LTMR)
• Global Triple Modular Redundancy (GTMR)
Section IV: The Automation Process


Section I: Single Event Effects in Digital Logic

MEO: Medium Earth Orbit

HEO: Highly Elliptical Orbit

GEO: Geosynchronous Earth Orbit

Van Allen Radiation Belts: Illustrated by Aerospace Corp.


Source of Faults: SEEs and Ionizing Particles

Single Event Effects (SEEs)

Terrestrial devices are susceptible to faults mostly due to:
• Alpha particles: from packaging and doping
• Neutrons: caused by Galactic Cosmic Ray (GCR) interactions as the rays enter the Earth's atmosphere

Devices expected to operate at higher altitude (aerospace and military) are more prone to upsets caused by:
• Heavy ions: direct ionization
• Protons: secondary effects


Device Penetration of Heavy Ions and Linear Energy Transfer (LET)

• LET characterizes the energy deposited by charged particles
• Based on average energy loss per unit path length (stopping power)
• Mass is used to normalize LET to the target material

$\mathrm{LET} = \frac{1}{\rho} \frac{dE}{dx}$

• $\rho$: density of the target material
• $dE/dx$: average energy deposited per unit path length
• Units: MeV·cm²/mg
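As a rough numerical illustration of the LET definition, the sketch below normalizes a stopping power by the target density and converts an LET value into deposited charge per micron of path. The silicon constants (density of 2.33 g/cm³ and roughly 3.6 eV per electron-hole pair) and all function names are assumptions chosen for illustration; they do not come from the slides.

```python
# Sketch: LET = (1/rho) * dE/dx, plus a conversion to deposited charge.
# Assumed constants (not from the slides): silicon density and the
# approximate energy needed to create one electron-hole pair in silicon.
SI_DENSITY_MG_PER_CM3 = 2330.0   # 2.33 g/cm^3 expressed in mg/cm^3
EV_PER_PAIR_SI = 3.6             # approximate eV per electron-hole pair
ELECTRON_CHARGE_C = 1.602e-19    # coulombs


def let_from_stopping_power(de_dx_mev_per_cm, density_mg_per_cm3):
    """Return LET in MeV*cm^2/mg from dE/dx (MeV/cm) and density (mg/cm^3)."""
    return de_dx_mev_per_cm / density_mg_per_cm3


def charge_per_micron_pc(let_mev_cm2_per_mg,
                         density_mg_per_cm3=SI_DENSITY_MG_PER_CM3):
    """Return deposited charge in pC per micron of path length."""
    mev_per_cm = let_mev_cm2_per_mg * density_mg_per_cm3
    ev_per_um = mev_per_cm * 1e6 / 1e4            # MeV/cm -> eV/um
    pairs_per_um = ev_per_um / EV_PER_PAIR_SI
    return pairs_per_um * ELECTRON_CHARGE_C * 1e12  # C -> pC
```

Under these assumed constants, an LET near 97 MeV·cm²/mg corresponds to roughly 1 pC of deposited charge per micron in silicon.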


LET vs. Error Cross Section Graph

Error cross sections are calculated per LET value in order to characterize the number of potential faults and error rates in the space environment.

Terminology:
• Flux: particles/(sec·cm²)
• Fluence: particles/cm²
• Error cross section ($\sigma$): number of errors normalized by fluence
• Error cross section is calculated at several LET values (particle spectrum)

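$\sigma_{seu} = \frac{\#\,\mathrm{errors}}{\mathrm{fluence}}$

[Figure: LET vs. $\sigma_{seu}$ for the series labeled "8F8L 100MHz". x-axis: LET (MeV·cm²/mg), 0 to 100. y-axis: $\sigma_{seu}$ (cm²/bit), decade ticks from 1.00E-10 to 1.00E-06.]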
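The terminology above maps directly onto a small calculation. The following is a minimal sketch, assuming a constant flux during the exposure and a per-bit normalization (matching the cm²/bit axis of the graph); the function names and the example numbers are hypothetical.

```python
# Sketch: fluence from flux, then error cross section per bit.
# Assumes constant flux over the exposure; names and values are illustrative.

def fluence(flux_per_cm2_s, exposure_s):
    """Fluence (particles/cm^2) = flux (particles/(s*cm^2)) * exposure time (s)."""
    return flux_per_cm2_s * exposure_s


def error_cross_section(num_errors, fluence_per_cm2, num_bits=1):
    """Errors normalized by fluence (and by bit count), in cm^2/bit."""
    if fluence_per_cm2 <= 0 or num_bits <= 0:
        raise ValueError("fluence and bit count must be positive")
    return num_errors / (fluence_per_cm2 * num_bits)


# Hypothetical test run: 1e4 particles/(s*cm^2) for 1000 s, 50 errors, 1000 bits.
phi = fluence(1e4, 1000.0)
sigma = error_cross_section(50, phi, num_bits=1000)
```

Repeating the calculation at each LET value of the beam gives the points of an LET vs. cross-section curve.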


Single Event Faults and Common Terminology

• Single Event Latch Up (SEL): device latches in a high current state
• Single Event Burnout (SEB): device draws high current and burns out
• Single Event Gate Rupture (SEGR): gate destroyed, typically in power MOSFETs
• Single Event Transient (SET): current spike due to ionization; dissipates through the bulk
• Single Event Upset (SEU): transient is caught by a memory element
• Single Event Functional Interrupt (SEFI): upset disrupts function


Radiation Induced Fault Generation

• SETs can develop in combinatorial logic
• SETs can vary in pulse width (Tpulse) and amplitude

Transistor cutoff frequencies: each capacitance has its own $f_c$

$f_c = \frac{1}{2 \pi R C}$

$Q_{crit} = C_{node} \cdot V_{node}$

$Q_{coll} > Q_{crit}$

• Geometry of transistors
• Loading of transistors
• Length of routes
• Switching rates
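The two relations on this slide, the RC cutoff frequency and the critical charge of a node, can be evaluated with a few lines of code. This is a minimal sketch under the usual reading that a fault is generated when the collected charge exceeds the critical charge; the function names and the component values in the comments are hypothetical.

```python
import math

# Sketch: RC cutoff frequency and critical-charge check for one node.
# Assumes fc = 1/(2*pi*R*C) and Qcrit = Cnode * Vnode; values are illustrative.

def cutoff_frequency_hz(r_ohm, c_farad):
    """Cutoff frequency of a single RC node."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def critical_charge_c(c_node_farad, v_node_volt):
    """Critical charge of a node in coulombs."""
    return c_node_farad * v_node_volt


def fault_generated(q_coll_c, c_node_farad, v_node_volt):
    """True when the collected charge exceeds the node's critical charge."""
    return q_coll_c > critical_charge_c(c_node_farad, v_node_volt)


# Hypothetical node: 1 kOhm, 1 pF gives a cutoff near 159 MHz;
# 1 fF at 1.2 V gives a critical charge of 1.2 fC.
fc = cutoff_frequency_hz(1e3, 1e-12)
qcrit = critical_charge_c(1e-15, 1.2)
```

Larger node capacitance raises the critical charge but lowers the cutoff frequency, which is one way to see why loading and route length affect both fault generation and transient propagation.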


Single Event Effects (SEEs) and IC System Error

SEUs or SETs can occur in:
• Combinatorial logic (including global routes)
• Sequential logic
• Memory cells

Depending on the device and the design, each fault type will:
• Have a probability of occurrence
• Have either a significant or an insignificant contribution to system error

Every device has different error responses. We must understand the differences and design appropriately.


Section II: FPGA Basics – Architectural Differences


Configuration… Only FPGAs

FPGA MAPPING

Configuration defines the arrangement of pre-existing logic via programmable switches:
• Functionality (logic cluster)
• Connectivity (routes)

Programming switch types:
• Antifuse: One Time Programmable (OTP)
• SRAM: Reprogrammable (RP)
• Flash: Reprogrammable (RP)


Combinatorial Logic Blocks and Potential Upsets… SETs in ASICs and Anti-fuse FPGAs

[Figure: logic blocks connected through metal layers M1, M2, M3 and an antifuse. Labels: "Glitch = Transient", $P_{SET}$, "Metal layers not susceptible", "Sensitive Region".]


DFFs: SEUs and SEFIs

[Figure: D flip-flop with D, Q, CLK and reset pins. Label: "Strike Caught in Loop".]

• Probability of SEU: $P_{DFFSEU}$
• Probability of SEFI: $P_{SEFI}$


Transient Capture on a DFF Data Input Pin (SET→SEU)

[Figure: clock waveform with period tp = 1/fs and a SET of width Tpulse at the DFF data input.]

$P(f_s)_{SET \to SEU} \propto P(f_s)_{SETgen} \cdot P(f_s)_{SETprop} \cdot P_{DFFEn} \cdot T(f_s)_{pulse} \cdot f_s$

• $f_s$: system frequency
• $T(f_s)_{pulse}$: SET pulse width
• $P(f_s)_{SETgen}$: probability a SET is generated with sufficient amplitude
• $P(f_s)_{SETprop}$: probability a SET can propagate with sufficient amplitude
• $P_{DFFEn}$: probability the DFF is enabled (active)
• $P(f_s)_{SET \to SEU}$: probability a SET can be caught by a clock edge
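The terms defined on this slide can be combined numerically. The sketch below assumes a simple product form in which the capture window is the fraction of the clock period covered by the pulse (Tpulse · fs, capped at 1). That product form, the function names, and the example values are assumptions for illustration rather than the presenter's exact model; in the slides several of these terms are themselves functions of fs, while here they are passed in as plain numbers.

```python
# Sketch: relative probability that a SET is captured as an SEU.
# Assumes a product of the slide's terms with capture window Tpulse * fs.

def set_capture_window(t_pulse_s, fs_hz):
    """Fraction of the clock period (tp = 1/fs) covered by the SET pulse."""
    return min(1.0, t_pulse_s * fs_hz)


def p_set_to_seu(p_set_gen, p_set_prop, p_dff_en, t_pulse_s, fs_hz):
    """Relative SET->SEU capture probability under the assumed product form."""
    window = set_capture_window(t_pulse_s, fs_hz)
    return p_set_gen * p_set_prop * p_dff_en * window


# Hypothetical 1 ns pulse: doubling the clock from 50 MHz to 100 MHz
# doubles the capture window, all other terms held equal.
slow = p_set_to_seu(0.5, 0.5, 1.0, 1e-9, 50e6)
fast = p_set_to_seu(0.5, 0.5, 1.0, 1e-9, 100e6)
```

The linear growth with fs is what makes the SET→SEU term dynamic, in contrast to the static DFF upset term.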


Summary: Most Significant Factors of System Error Probability $P(f_s)_{error}$

$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$

• $P_{Configuration}$: configuration (SRAM based FPGAs)
• $P_{DFFSEU}$: DFFs, static (SEU)
• $P(f_s)_{SET \to SEU}$: DFFs, dynamic (SET→SEU)
• $P_{SEFI}$: SEFIs (clocks and resets, inaccessible control circuitry)
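One way to work with this summary is to keep the four contributions in a small record and ask which one dominates. The sketch below assumes the terms are simply summed; the class name, field names, and example magnitudes are hypothetical and only illustrate the bookkeeping.

```python
from dataclasses import dataclass

# Sketch: bookkeeping for the four error-probability terms.
# Assumes a simple sum; the numbers used below are made up for illustration.

@dataclass
class ErrorTerms:
    p_configuration: float   # configuration upsets
    p_dff_seu: float         # static DFF upsets
    p_set_to_seu: float      # dynamic, frequency-dependent SET capture
    p_sefi: float            # functional interrupts

    def total(self):
        """Sum of the four contributions (relative system error)."""
        return sum(vars(self).values())

    def dominant(self):
        """Name of the largest contribution."""
        terms = vars(self)
        return max(terms, key=terms.get)


# Hypothetical unmitigated device in which configuration upsets dominate.
unmitigated = ErrorTerms(1e-3, 1e-5, 1e-6, 1e-7)
```

Comparing `dominant()` before and after applying a mitigation scheme shows which term the scheme leaves behind as the new limiting factor.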


Antifuse FPGA Devices

• Currently the most widely employed FPGA devices within space applications
• Configuration is hardened due to fuse based technology (metal to metal)
• Localized (at the DFF node) mitigation (TMR or DICE) is employed
• Clock and reset lines are hardened


ACTEL RTAX-S Architecture Basics

Source: RTAX-S/SL RadTolerant FPGAs 2009 Actel.com

Super Cluster:
• Combinatorial cells: C-CELLs
• DFF cells: R-CELLs


ACTEL RTAX-S Combinatorial and Sequential Logic

[Figure: RTAX-S Super Cluster. Labels: combinatorial logic C-CELL (C), sequential logic R-CELL (R), B, and RX/TX modules.]


General Xilinx Virtex 4 FPGA Architecture: SRAM Based Configuration


Section III: Reducing System Error: Common Mitigation Techniques

Triple Modular Redundancy:
• Block Triple Modular Redundancy (BTMR)
• Local Triple Modular Redundancy (LTMR)
• Global Triple Modular Redundancy (GTMR)
• Distributed Triple Modular Redundancy (DTMR)


Mitigation: Error Correction or Error Avoidance

Mitigation can be:
• Embedded: built into the device library cells. The user does not verify the mitigation; the manufacturer does.
• User inserted: part of the actual design process. The user must verify the mitigation. Complexity is a RISK!

Mitigation should reduce error:
• Generally through redundancy
• Incorrect implementation can increase error

$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$


Example: TMR Mitigation Schemes Will Use Majority Voting

I0  I1  I2  Majority Voter
0   0   0   0
0   0   1   0
0   1   0   0
0   1   1   1
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1

$\mathrm{MajorityVoter} = I_1 I_2 + I_0 I_2 + I_0 I_1$
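The voter equation can be checked against the truth table exhaustively. This is a minimal sketch using bitwise AND and OR on 0/1 integers; the function names are illustrative.

```python
from itertools import product

# Sketch: majority voter as a sum of pairwise products, checked exhaustively.

def majority_voter(i0, i1, i2):
    """Return 1 when at least two of the three inputs are 1."""
    return (i1 & i2) | (i0 & i2) | (i0 & i1)


def truth_table():
    """All eight input combinations with the voted output, in binary order."""
    return [(i0, i1, i2, majority_voter(i0, i1, i2))
            for i0, i1, i2 in product((0, 1), repeat=3)]


for row in truth_table():
    print(*row)
```

Because the expression is purely combinatorial, a single corrupted input never changes the output, which is the property every TMR scheme in this section relies on.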


BTMR

• Need feedback to correct
• Generally cannot apply internal correction from voted outputs
• Errors can accumulate: not an effective technique

[Figure: copies of a complex function with DFFs feeding a voting matrix.]
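The accumulation problem can be seen in a toy simulation. The sketch below models each redundant block as a single toggling flip-flop whose state is never corrected from the voted output; this one-bit block, the function names, and the upset schedule are simplifying assumptions, not the design shown on the slide.

```python
# Sketch: block TMR without feedback. Each copy keeps its own state,
# so an upset persists and a second upset in another copy defeats the vote.

def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)


def run_btmr(cycles, upsets):
    """Return the cycles where the voted output differs from a fault-free copy.

    upsets maps a cycle number to the index (0..2) of the copy that is hit.
    """
    golden = 0
    copies = [0, 0, 0]
    mismatches = []
    for cycle in range(cycles):
        golden ^= 1                       # fault-free reference toggles
        copies = [s ^ 1 for s in copies]  # each copy toggles its OWN state
        if cycle in upsets:
            copies[upsets[cycle]] ^= 1    # radiation flips one copy
        if majority(*copies) != golden:
            mismatches.append(cycle)
    return mismatches


print(run_btmr(10, {3: 0}))        # prints []
print(run_btmr(10, {3: 0, 6: 1}))  # prints [6, 7, 8, 9]
```

The first upset is masked by the voter, but because the struck copy is never repaired, the second upset leaves two of three copies wrong and the output stays wrong.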


Local Triple Modular Redundancy (LTMR): Voter + Feedback = Correction

• Triple each DFF + vote + feedback
• Correct at the DFF

Unprotected:
• Clocks and resets: SEFI
• Transients (SET→SEU)
• Internal/hidden device logic: SEFI

$P(f_s)_{error} \propto P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$

With LTMR, $P_{DFFSEU}$ is low (mitigated); $P(f_s)_{SET \to SEU}$ and $P_{SEFI}$ remain non-mitigated.
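The effect of voter feedback can be shown with a toy simulation of one triplicated flip-flop. The sketch assumes a toggling flip-flop whose next state is computed from the voted value and loaded into all three copies; the function names and the upset schedule are hypothetical, and SETs, clock upsets, and SEFIs are deliberately not modeled.

```python
# Sketch: local TMR with feedback. The voted value drives the next state of
# all three copies, so a flipped copy is overwritten on the next clock edge.

def majority(a, b, c):
    return (a & b) | (a & c) | (b & c)


def run_ltmr(cycles, upsets):
    """Return the cycles where the voted output differs from a fault-free DFF.

    upsets maps a cycle number to the index (0..2) of the copy that is hit.
    """
    golden = 0
    copies = [0, 0, 0]
    mismatches = []
    for cycle in range(cycles):
        golden ^= 1                      # fault-free reference toggles
        voted = majority(*copies)        # voter output is fed back
        next_state = voted ^ 1           # toggle logic uses the voted value
        copies = [next_state] * 3        # all three copies reload together
        if cycle in upsets:
            copies[upsets[cycle]] ^= 1   # radiation flips one copy
        if majority(*copies) != golden:
            mismatches.append(cycle)
    return mismatches


print(run_ltmr(10, {3: 0, 6: 1}))  # prints []
```

Each single upset is masked by the voter and then erased at the next clock, so upsets in different copies at different times never add up.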


Example… LTMR DFF Library Components and SETs

[Figure: combinatorial logic and sequential logic cells connected through RX/TX modules. Label: "Embedded LTMR in Library Cell".]


RTAX Example: Probability of Error Reduction

• Error rate must reflect frequency of operation
• Low design implementation complexity

$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$

$P_{DFFSEU}$ is low; $P_{Configuration}$ and $P_{SEFI}$ are zero or near zero (annotated "~0" and "0").


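Example… Upper-Bound Error Prediction for LTMR + Hardened Global Routes (RHBD)

Given 15 MHz to 120 MHz, the dynamic error bit rate for $P(f_s)_{SET \to SEU}$ is:

$1 \times 10^{-9} < \frac{dE_{bit}(f_s)}{dt} < 6 \times 10^{-8} \;\; \frac{\mathrm{Errors}}{\mathrm{bit} \cdot \mathrm{day}}$

Source: NASA Goddard

$P(f_s)_{error} \propto P(f_s)_{SET \to SEU}$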


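Upper-Bound Error Prediction: Actel RHBD Anti-fuse FPGA

$P(f_s)_{error} \propto P(f_s)_{SET \to SEU}$

$\frac{dE}{dt} < \frac{dE_{bit}(f_s)}{dt} \times \#\mathrm{UsedDFFs}$

For a design with 50,000 DFFs:

$6 \times 10^{-8} \; \frac{\mathrm{Errors}}{\mathrm{bit} \cdot \mathrm{day}} \times 5 \times 10^{4} \; \frac{\mathrm{bits}}{\mathrm{design}}$

With embedded LTMR mitigation + hardened clocks:

$\frac{dE}{dt} < 3 \times 10^{-3} \; \frac{\mathrm{Errors}}{\mathrm{design} \cdot \mathrm{day}}$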
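The upper-bound arithmetic on this slide can be reproduced directly. The sketch below scales the per-bit error rate by the number of used DFFs and also converts the result into a mean time between errors; the function names are illustrative, and the mean-time figure is a derived convenience that does not appear on the slide.

```python
# Sketch: design-level upper bound = per-bit error rate * number of used DFFs.
# Inputs are the slide's values: 6e-8 errors/(bit*day) and 50,000 DFFs.

def design_error_rate(bit_rate_per_day, used_dffs):
    """Upper-bound errors per design per day."""
    return bit_rate_per_day * used_dffs


def mean_days_between_errors(rate_per_day):
    """Reciprocal of the error rate, in days."""
    return 1.0 / rate_per_day


upper_bound = design_error_rate(6e-8, 50_000)   # about 3e-3 errors/(design*day)
days = mean_days_between_errors(upper_bound)    # roughly 333 days
```

Read this way, the bound corresponds to no more than about one error per design in roughly eleven months of operation.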


Global Triple Modular Redundancy (GTMR): Largest Area → Complexity

• Triple entire design
• Triple I/O and voters
• Unprotected: hidden device logic SEFIs
• Cannot be an embedded strategy: complex to verify

$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$

With GTMR, all four terms are marked low (mitigated).


GTMR Proves to Be a Great Mitigation Strategy… BUT…

• Triplicating a design and its global routes consumes significant power and area
• Not part of the provided and well tested/characterized library elements
• Generally performed after synthesis by a tool, not as part of the RTL
• Difficult to verify
• Additional complications with clock skew and domain crossings
• Can be implemented in an ASIC, but is not considered a contemporary methodology


DTMR

• Looks a lot like GTMR; the only difference is that the global routes and I/O are not triplicated
• Small reduction in area vs. GTMR
• Small reduction in power vs. GTMR
• Can be slightly slower than GTMR because all circuitry shares the same clock

$P(f_s)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(f_s)_{SET \to SEU} + P_{SEFI}$

With DTMR, three of the four terms are marked low.


Section IV: The Automation Process


Automation through Synthesis

• Mentor Graphics and Synplicity provide TMR insertion
• It is up to the designer to understand which type of TMR to implement based on the target FPGA and the target space environment

[Table: FPGA type (Antifuse, SRAM, Flash) versus TMR scheme (LTMR, DTMR, GTMR); each cell is rated using the legend below.]

Legend:
• General Recommendation
• Not Recommended but may be a solution for some situations
• Will not be a good solution


Automation Process

VHDL → Select Mitigation → Synthesis → Review Synthesis Output → Gate Level Simulations


Summary

• SEEs will affect FPGAs in space radiation environments
• TMR has been the most effective SEE mitigation technique
• There are many types of TMR: BTMR, LTMR, DTMR, GTMR
• Vendors have integrated different TMR schemes into their synthesis packages
• The designer must be aware of the target FPGA and its SEE sensitivity before using any automated approach
• After TMR insertion, a rigorous review and simulation process must be performed

