MAPLD 2008

XRTC Use of Fault Injection to Simulate Upsets in Reconfigurable FPGAs

Gary Swift, Chen Wei Tseng, and Gregory Miller, Xilinx, Inc.; Gregory R. Allen, Jet Propulsion Laboratory / Caltech; and Heather Quinn, Los Alamos National Laboratory


Overview

• Introduction - Reconfigurable FPGAs
  – Design-Level vs. Configuration-Level
  – Radiation Test Consortium (XRTC)
• XRTC Beam Tests - Methodology and Results
• Verifying Redundant Designs
• XRTC Fault Injector
• Lessons Learned So Far
• Future Directions

[Logo: Xilinx Radiation Test Consortium]


Introducing Virtex-4QV FPGAs

• Space-Grade Reconfigurable Family of Four
  – Guaranteed 300 krad(Si) and Latchup Immune
  – Bigger and More Powerful = More Complex
• Design-Level vs. Configuration-Level
  – Triple Modular Redundancy (XTMR)
    • Resides in Design-Level Providing Upset Robustness
    • Protects Both Levels, but many more Configuration Upsets
  – Errors only on statistically “unlucky” coincident upsets
    • In two domains, same voted segment
    • During single scrub cycle (fraction of a second)
  – SRL16s, LUTROM, and LUTRAM Cross Levels
    • Formerly forbidden, new Virtex-4 feature allows their use
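The "unlucky coincident upsets" condition can be illustrated with a minimal sketch of a 2-of-3 majority voter. This is a toy model, not the XTMR implementation: each bit of a word stands in for one voted segment, and `golden`, `majority`, and `flip` are names invented for the illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote across three redundant domains."""
    return (a & b) | (a & c) | (b & c)


def flip(word: int, bit: int) -> int:
    """Return `word` with one bit inverted (a simulated upset)."""
    return word ^ (1 << bit)


golden = 0b1011_0010  # assumed fault-free value carried by every domain

# One upset in one domain: the other two domains outvote it.
single = majority(flip(golden, 3), golden, golden)

# Upsets in two domains but in different segments (bits): each is still outvoted.
different_bits = majority(flip(golden, 3), flip(golden, 5), golden)

# Upsets in two domains in the same segment before a scrub: the vote is wrong.
same_bit = majority(flip(golden, 3), flip(golden, 3), golden)

print(single == golden, different_bits == golden, same_bit == golden)
```

Only the last case produces an error, which is why the slide ties errors to two domains, the same voted segment, and a single scrub cycle.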


XRTC Beam Tests

• Basic Philosophy: Continuous Monitoring with In-Beam Strip Charts of:
  – Design Functionality
  – Configuration Upsets
  – Also Power and Temperature

[Block diagram labels: BEAM, DUT, FuncMon FPGA, ConfigMon FPGA, two Counter/Buffer blocks, two Host Computers]


XRTC Apparatus

[Photo: Testing at Texas A&M Cyclotron Institute in Air. Labels: BEAM, FPGA under test]


XRTC Results – Space Upset Rates

• SEFI Rate is about one per century
• Unprotected Designs: a few upsets per day in GEO
• Mitigation makes these upsets negligible
• Robustness at one error per century (SEFI Rate) with:
  – Design-Level: Triple Modular Redundancy
    • Assures no single-point of failure
  – Config-Level: Configuration Management
    • Prevents upset accumulation (transparent to design operation)
    • SEFI detection logic triggers reconfiguration (intrusive)
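A rough arithmetic sketch shows why scrubbing keeps upsets from accumulating. The numbers are illustrative assumptions, not measured values: 4 upsets per day stands in for "a few upsets per day", and 0.5 s stands in for a scrub cycle lasting a fraction of a second.

```python
SECONDS_PER_DAY = 86_400


def upsets_per_scrub_cycle(upsets_per_day: float, scrub_period_s: float) -> float:
    """Expected number of upsets that arrive during one scrub cycle."""
    return upsets_per_day / SECONDS_PER_DAY * scrub_period_s


# Assumed inputs: 4 upsets/day and a 0.5 s scrub period.
expected = upsets_per_scrub_cycle(upsets_per_day=4.0, scrub_period_s=0.5)
print(f"expected upsets per scrub cycle: {expected:.2e}")
```

Under these assumptions the expected count per cycle is on the order of 1e-5, so two upsets landing within the same cycle is rare.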


XRTC Fault Injector

Requirements
  – Configuration-level Fault Injector (or Upset Simulator)
  – Speed and ease of comprehensive single-bit injection
  – Kernel command set allows any middle-ware approach
    • Either hardware or software generated commands
  – No impact on DUT designs
  – Minimum impact on FuncMon design
    • Only need to add-on error signalling to ConfigMon
    • Introduce an easy-to-adapt template for FuncMon add-on


XRTC Fault Injector

Same apparatus as for beam testing
  – Leverage ConfigMon functionality without breaking it
  – Kernel is add-on to ConfigMon
  – First priority: inject as fast as possible
    • Saves time by skipping intermediate “clean” frame
  – Requires three-way coordination
    • Injector hands off to FuncMon after injecting fault
    • FuncMon tests functionality and reports results
    • Certain results cause ConfigMon to scrub or re-configure
  – Scripting of kernel commands was natural addition
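The three-way coordination can be sketched as a simple software loop. This is a toy simulation under stated assumptions, not the XRTC kernel: `SimulatedDut`, `run_campaign`, and the notion of a fixed set of "sensitive" bits are all invented for the illustration, and the sketch does not model the skipped "clean" frame optimization.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedDut:
    """Toy stand-in for a DUT: configuration bits, a few of which matter."""
    num_bits: int
    sensitive_bits: set[int]
    upset_bits: set[int] = field(default_factory=set)

    def flip(self, bit: int) -> None:
        # Injector role: toggle one configuration bit.
        self.upset_bits ^= {bit}

    def functional_test_passes(self) -> bool:
        # FuncMon role: the design fails if any sensitive bit is upset.
        return not (self.upset_bits & self.sensitive_bits)

    def scrub(self) -> None:
        # ConfigMon role: restore every configuration bit.
        self.upset_bits.clear()


def run_campaign(dut: SimulatedDut) -> list[int]:
    """Inject every single-bit fault once and return the bits that caused a failure."""
    failing = []
    for bit in range(dut.num_bits):
        dut.flip(bit)                      # inject the fault
        if dut.functional_test_passes():   # hand off to the functional check
            dut.flip(bit)                  # benign: undo the single bit
        else:
            failing.append(bit)
            dut.scrub()                    # failure: repair before continuing
    return failing


dut = SimulatedDut(num_bits=16, sensitive_bits={3, 7, 12})
failing = run_campaign(dut)
print(failing)
```

The loop mirrors the hand-off order on the slide: inject, test and report, then scrub only when the result calls for it.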


FaultMon GUI Addition

[Screenshot: Same (new) ConfigMon GUI with FaultMon GUI “sidecar”]


Three FaultMon GUI Controls

• Manual Control: one operation at a time
• Script Control: execute a list of kernel commands
• Auto Control: comprehensive single-bit fault injection

1. Choose STOP condition(s)
2. Choose starting point
3. Kick it off
4. Observe it run

[Screenshot label: Stop at End of Part]


Fault Injection Lessons So Far

• Very useful to designers and beam testers
• Found 9 SelectMAP SEFI bits
• Found an I/O Test Bit on certain pairs of I/Os
• Many problems trace to state machine implementation
  – Modern synthesis tools may “optimize” in bad ways
    • Trimming “extra” states
    • Changing the type of state machine
• Still working out complications
  – Half-latches give inconsistent results
  – Not all detected single faults are “real”
  – Other inconsistencies being worked


Future Directions

• Near-term: Replace and augment beam tests
  – Simulate tri-flux test
  – Simulate multi-bit upsets (MBUs)
• Limitations of in-beam testing for robust TMR
  – Beam Time Required is Expensive
  – No Help Locating Problem Areas
• Longer-term: Expand to Flight Design Qual
  – May require expanded test platform


V-4QV TMR-Counter Results

37% full, no single points-of-failure

[Plot: System Error Rate (errors/sec), axis 1.E-04 to 1.E+01, vs. Raw Bit Flip Rate (upsets/bit-sec), axis 1.E-07 to 1.E-03]

Kevin Heldt & Scott A. Anderson

Xilinx Confidential • Unpublished Work © Copyright 2009 Xilinx


V-4QV TMR-Counter Results

37% Full, 237 Scrub Faults & 213 Scrub+Reset Faults

[Plot: System Error Rate (errors/system-sec), axis 1.00E-04 to 1.00E+01, vs. Raw Bit Flip Rate (upsets/bit-sec), axis 1.00E-07 to 1.00E-03]


V-4QV TMR-Counter Results

37% Full

[Plot: System Error Rate (errors/sec), axis 1.E-04 to 1.E+01, vs. Raw Bit Flip Rate (upsets/bit-sec), axis 1.E-07 to 1.E-03. Series: NO FAILs, 450 FAILs]


V-4QV TMR-Counter Results

41% Full, Partial Triplication: 4016 Scrub Fails, 0 Scrub+Reset Fails

[Plot: System Error Rate (errors/system-sec), axis 1.00E-04 to 1.00E+01, vs. Raw Bit Flip Rate (upsets/bit-sec), axis 1.00E-07 to 1.00E-03]


Backup Material

• Virtex-4QV Devices and Features
• Space Upset Rate Examples
• Photos of XRTC Apparatus In Use


Virtex-4QV Devices

Architectural Features                     XQR4V SX55   XQR4V FX60   XQR4V FX140   XQR4V LX200
CFG*     Configuration Bits* (millions)    15.4         14.5         34.5          43.0
BRAM     Block Memory Bits                 5,898,240    4,276,224    10,174,464    6,193,152
LOGIC    Slices (2 Lookup Tables/slice)    24,576       25,280       63,168        89,088
DSP**    18x18 MACs**                      512          128          192           96
PPC      PowerPC405 Processors             -            2            2             -
DCM      Clock Managers                    12           12           20            12
MGT***   High-speed Transceivers***        -            N/A          N/A           -
IOBs     Input/Output Blocks               640          576          896           960

* Only real memory cells in the Configuration Bit Stream are counted here (not counting BRAM)
** MAC = multiply-and-accumulate block for digital signal processing (DSP)
*** MGTs are not supported for Virtex-4QV devices


Example Space Upset Rates

Configuration Cells

Orbit    Altitude (km)   Incl*    XQR4V SX55   XQR4V FX60   XQR4V FX140   XQR4V LX200   HI%
         400             51.6°    0.73         0.69         1.61          2.03          69
LEO      800             22.0°    7.56         7.12         16.7          21.1          2
POLAR    833             98.7°    6.02         5.67         13.3          16.8          22
MEO      1200            65.0°    23.3         21.9         51.6          65.1          5
GEO      36,000          0°       4.28         4.03         9.5           11.9          94

* Incl = Inclination; HI% = fraction from heavy ions


Example Space Upset Rates

BRAM Cells

Orbit    Altitude (km)   Incl*    XQR4V SX55   XQR4V FX60   XQR4V FX140   XQR4V LX200   HI%
         400             51.6°    0.72         0.52         1.24          0.75          84
LEO      800             22.0°    4.05         2.94         6.99          4.25          5
POLAR    833             98.7°    4.00         2.90         6.90          4.20          37
MEO      1200            65.0°    13.3         9.63         22.9          13.9          10
GEO      36,000          0°       4.49         3.26         7.75          4.71          98

* Incl = Inclination; HI% = fraction from heavy ions


Example Space Upset Rates

Virtex-4QV SEFIs

Orbit    Altitude (km)   Incl*    POR     GSIG    SMAP+   TOTAL   HI%
         400             51.6°    1225    2161    1500    515     58
LEO      800             22.0°    100     114     112     36      13
POLAR    833             98.7°    131     165     146     49      14
MEO      1200            65.0°    32      37      35      11      3
GEO      36,000          0°       225     560     290     103     91

* Incl = Inclination; HI% = fraction from heavy ions; SMAP+ = SMAP & FAR SEFIs combined


Mature Test Methods & Apparatus

[Diagram: XRTC test board and rack setup.
Board labels: DUT FPGA; FUNC FPGA with FUNCMON SDRAM DIMMs; CFG FPGA with CFGMON SDRAM DIMM; System ACE; Programming PROMs & Flash; Programming Headers; Compact FLASH Header; 40-Pin IDE Connector; Test MICTOR Connectors; P14 Teradyne Conn A; P15 Teradyne Conn B; Clock SMA Connectors; External MGTs; CFGMON & FUNCMON RS-232 Connectors; DUT RS-232 Connector; Reset Switches; Fan Pwr; BNC Power Lugs (1.5V, 2.5V, 3.0V, 3.3V, GND, NC; CFGMON and FUNCMON VCC and Vref banks).
Rack labels: Configuration Monitor and Functional Monitor, each with Counter Board and DIO Cable; GPIB Interface for Power Supply Control and SEL Monitor; 2U HP6629 Power Supply (Service); 2U HP6623 Power Supply (DUT); Breakout Box; Breakout Board; Parallel Cables; Banana Cable; Data Out and Control In 40-Pin Ribbons; Tx/Rx links; Bulkhead; Inside Vacuum Chamber; Readback/Programming Laptop.
Logos: Boeing, Xilinx Radiation Test Consortium, SEAKR]


XRTC Beam Tests

• Static Results
  – Config cells
  – User BRAM & FFs
  – Functional Upsets (aka SEFIs)
  – Both Protons & Heavy Ions
• Dynamic & Mitigation Campaigns Underway

[Plot: POR SEFI. Cross Section (cm²/device), axis 10⁻⁸ to 10⁻⁵, vs. Effective LET (MeV-cm²/mg), axis 0 to 100. Series: SX55, FX60, LX200]

[Plot: Configuration Cells. Cross Section (cm²/bit), axis 10⁻¹⁶ to 10⁻¹³, vs. Energy (MeV). Series: SX55, FX60, LX200]

[Plot: Configuration Cells. Cross Section (cm²/bit), axis 10⁻¹⁰ to 10⁻⁷, vs. Effective LET (MeV·cm²/mg), axis 0 to 120. Series: SX55, FX60, LX200]


XRTC Apparatus

[Photo: Testing at Texas A&M Cyclotron Institute in Vacuum Chamber. Labels: ConfigMon FPGA, FuncMon FPGA, FPGA under test]


The TMR Verification Problem

• “Working” TMR may actually be broken
  – Stuck-at faults
  – Domain criss-crossing
• In the pathological case of only two working domains, a design’s error cross-section is double!
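The pathological case can be sketched by counting how many domains can break the voted output with a single upset. This is a toy one-bit model, and reading "double" as "twice the sensitive area of an unprotected single copy" is an assumption made here. `sensitive_domains` is a hypothetical helper.

```python
def majority(a: int, b: int, c: int) -> int:
    """2-of-3 majority vote on single-bit values."""
    return (a & b) | (a & c) | (b & c)


def sensitive_domains(values: list[int], stuck: set[int], correct: int) -> int:
    """Count domains where one upset makes the voted output wrong."""
    count = 0
    for i in range(3):
        if i in stuck:
            continue  # a stuck domain does not respond to upsets
        trial = list(values)
        trial[i] ^= 1  # single upset in domain i
        if majority(*trial) != correct:
            count += 1
    return count


# Healthy TMR: all three domains carry the correct value 1.
healthy = sensitive_domains([1, 1, 1], stuck=set(), correct=1)

# "Working" but broken TMR: domain 2 is stuck at the wrong value, yet the vote is still 1.
broken = sensitive_domains([1, 1, 0], stuck={2}, correct=1)

unprotected = 1  # a single unprotected copy fails on any upset to itself
print(healthy, broken, unprotected)
```

In this model the healthy design has no single point of failure, while the broken one passes a functional check yet exposes two domains to single upsets, twice the exposure of one unprotected copy.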


The TMR Verification Problem

• Benchtop smoke test for three-leg functionality
• In-beam tri-flux test (expensive and non-specific)
  – Probability of a system error is approximately proportional to the square of upsets per scrub cycle
• Fault Injection (again)

[Plot: Counters. R (system errors / second), axis 0.0001 to 10, vs. r (bit errors/bit-second), axis 1e-7 to 1e-3. Series: data, general approximation, small-r form extrapolated]
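The square-law dependence can be reproduced with a simplified model, which is not the fit used for the plotted curves. It assumes upsets arrive as a Poisson process, treats each domain as one voted segment, and counts an error when at least two of three domains are hit within one scrub cycle. `BITS` and `SCRUB_S` are arbitrary illustrative values.

```python
import math


def tmr_error_probability(r: float, bits_per_domain: int, scrub_period_s: float) -> float:
    """P(at least two of three domains are upset within one scrub cycle)."""
    lam = r * bits_per_domain * scrub_period_s  # expected upsets per domain per cycle
    p = -math.expm1(-lam)                       # P(domain is hit at least once)
    return 3 * p**2 * (1 - p) + p**3


def small_r_form(r: float, bits_per_domain: int, scrub_period_s: float) -> float:
    """Leading-order term for small upset rates: 3 * lam**2."""
    lam = r * bits_per_domain * scrub_period_s
    return 3 * lam**2


BITS = 1_000    # assumed sensitive bits per domain
SCRUB_S = 0.1   # assumed scrub period in seconds

low = tmr_error_probability(1e-7, BITS, SCRUB_S)
high = tmr_error_probability(1e-6, BITS, SCRUB_S)
ratio = high / low
print(f"10x higher bit-flip rate -> {ratio:.1f}x higher error probability")
```

In this model a tenfold increase in the raw bit-flip rate raises the per-cycle error probability by close to a hundredfold, and the exact expression agrees with the small-r form while the expected upsets per cycle stay far below one.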

