Cordova, MAPLD 2005/F242
A Novel High-level Dynamic Hardware-Software Remapping Technique for Mission Critical
Reconfigurable Computers
Luis E. Cordova, Duncan A. Buell
Outline
1) Problem and definitions
2) Motivation
3) Architecture
4) N Techniques
5) Advantages
6) Disadvantages
7) Lessons learned
Problem and Definitions
1. RCs (reconfigurable computers) are built from FPGAs + CPUs + memories, etc.
2. RCs are general-purpose embedded platforms
3. RCs are used to accelerate scientific applications
Problem:
1. Achieve fault tolerance over heterogeneous hardware
2. An RC requires knowledge of where the electronics inside the satellite are used, its orbit, for how long, and which direction it is facing; the solution is adaptability
Notes: There are more FPGAs than microprocessors in an RC (examples: SRC, BEE, etc.). This is the chicken-and-egg dilemma! "SRAM-based FPGAs are less reliable than microprocessors", but they are reconfigurable.
Motivation
* Ground tracking of LEO orbit of CEASE device:
Source: [Amptek03]
Case: SRC Hardware Architecture
Source: [SRC]
[Figure: SRC hardware architecture. uP board: two P4 (2.8 GHz) processors with L2 caches, MIOC, computer memory (8 GB), PCI-X, DDR interface and SNAP. MAP board: control FPGA (XC2V6000), on-board memory (24 MB), FPGA 1 and FPGA 2 (both XC2V6000), and chain ports at 2400 MB/s for each port. Max payload rate is 1400 MB/s.]
Fault Tolerance Techniques
1) Dynamic FPGA-HOST remapping
2) Dynamic FPGA-FPGA remapping
3) FPGA Checkpointing
4) System-level radiation tolerance
5) Sanity checks with golden copy
6) Streaming Heart Beat Signal
7) Redundancy-based Data integrity
8) Control Flow Tolerance
9) Memory scrubbing
10) Hardware-Software Backup Threading
11) Dynamic Spatial Radiation Tolerance
12) HW-HW and HW-SW injection
Monitoring
Remapping and Recovery
Protection
Profiling
System Dynamic Remapping
Dynamic redistribution between uProc and FPGAs:
- Speedup demand of the computation
- Faults on FPGA side
Faults on uProc are handled by other methods.
Radiation environment (SETs/SEUs)
Trade-offs:
- parallelism
- tolerance
- FPGA resources
[Figure: program code main() { comp 1 ... comp N } with computations (comp 4, comp 7) distributed over User FPGA 1 and User FPGA 2.]
Static Host-FPGA Mapping
[Figure: Hybrid Computer Under Test (HCUT), showing the Processor and the MAP reconfigurable fabric with OBM and BRAM. Function labels: main(), map_function(), parity & check(), check & parity(), self_repair(), diagnose(), saboteur().]
Remapping and Monitoring
[Figure: rad-hard host uProc, bridge, Chip 1 and Chip 2 with on-board memory (OBM), and a heart beat signal.]
- Dynamic remapping between FPGAs
- Dynamic remapping between uProc and FPGAs
- Streaming heart beat
Hierarchical tolerance:
- uProc level
- MAP level
- RTL level
Top level Remapping
if (mapIt (mapnum1)) {
    fprintf (stdout, "Hybrid level 1 failed!\n");
    fprintf (stdout, "Entering hybrid level 2.\n");
    if (mapIt (mapnum2)) {
        fprintf (stdout, "Hybrid level 2 failed!\n");
        fprintf (stdout, "Entering level 3 (full software).\n");
        /* Computation on Software */
        computeInSoftware(A,B,C,D);
    } else {
        user2 (n, A, B, C, D, &time, 0);
    }
} else {
    user1 (n, A, B, C, D, &time, 0);
}
[Figure: levels 1, 2, 3, ordered from more to less hardware functionality mapped.]
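The three-level fallback above can be exercised without MAP hardware. The following is a minimal sketch, assuming mapIt() returns nonzero on failure as in the slide code; the map_fn type, the level enum and the three fault-model stubs are hypothetical names introduced only for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for mapIt(): returns 0 when the requested
   hardware mapping succeeds and nonzero when it fails. */
typedef int (*map_fn)(int mapnum);

enum level { LEVEL_HYBRID_1 = 1, LEVEL_HYBRID_2 = 2, LEVEL_SOFTWARE = 3 };

/* Try the mapping with the most hardware first, then degrade step by step. */
enum level select_level(map_fn map_it, int mapnum1, int mapnum2)
{
    if (map_it(mapnum1) == 0)
        return LEVEL_HYBRID_1;
    fprintf(stderr, "Hybrid level 1 failed, entering hybrid level 2.\n");
    if (map_it(mapnum2) == 0)
        return LEVEL_HYBRID_2;
    fprintf(stderr, "Hybrid level 2 failed, entering level 3 (full software).\n");
    return LEVEL_SOFTWARE;
}

/* Illustrative fault models used to drive the cascade. */
int map_always_ok(int mapnum)    { (void)mapnum; return 0; }
int map_first_fails(int mapnum)  { return mapnum == 1; }
int map_always_fails(int mapnum) { (void)mapnum; return 1; }
```

Calling select_level(map_first_fails, 1, 2) selects hybrid level 2; the caller would then dispatch to user1, user2 or computeInSoftware according to the returned level.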
Block RAM ‘Flip-Flop’ Scrubbing
// computation
for (i=0;i<n;i++) {
    tmr_in = al[i];
    saboteur = bl[i];                // reading input stream
    // Block RAM Scrubbing Technique
    if (i%2) {
        bram_rw = scrubb_flip [i];   // parity check flag error
        scrubb_flop [i] = bram_rw;
    } else {
        bram_rw = scrubb_flop [i];   // parity check flag error
        scrubb_flip [i] = bram_rw;
    }
    // bram_rw is used later on
    ...
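[Figure: two Block RAMs, Scrubb flip and Scrubb flop. On odd iterations flip is read and flop is written; on even iterations the roles swap. Parity bits are checked on each read, and the value read becomes the next bram_rw.]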
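A software model of the flip/flop scheme, as a minimal sketch: the bram_word layout, the single-word buffers and the even-parity function are assumptions made for illustration, since the slides do not show how the parity check is implemented in the MAP.

```c
#include <stdint.h>

/* One stored word plus its parity bit (illustrative layout). */
struct bram_word { uint32_t data; uint8_t parity; };

/* Returns 1 when the number of set bits in x is odd. */
static uint8_t parity32(uint32_t x)
{
    x ^= x >> 16; x ^= x >> 8; x ^= x >> 4; x ^= x >> 2; x ^= x >> 1;
    return (uint8_t)(x & 1u);
}

/* Write a value together with a freshly computed parity bit. */
void bram_store(struct bram_word *w, uint32_t value)
{
    w->data = value;
    w->parity = parity32(value);
}

/* Read src, check its parity, and rewrite dst. Returns 1 on parity error. */
int scrub_step(const struct bram_word *src, struct bram_word *dst, uint32_t *out)
{
    int error = parity32(src->data) != src->parity;
    *out = src->data;
    bram_store(dst, src->data);
    return error;
}

/* Alternate the read/write roles as on the slide: odd iterations read flip,
   even iterations read flop. Returns the number of parity errors flagged. */
int scrub_run(struct bram_word *flip, struct bram_word *flop, int n, uint32_t *last)
{
    int errors = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2)
            errors += scrub_step(flip, flop, last);
        else
            errors += scrub_step(flop, flip, last);
    }
    return errors;
}
```

Note that this model only detects an upset; single-bit parity cannot correct the word, so recovery has to come from another technique (for example the redundant data-paths).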
Hardware-Hardware & Software-Hardware Fault Injection
// datapath level module redundancy -- DPLMR
result_1 = tmr_in * bram_rw + (saboteur & 16LL);
result_2 = tmr_in * bram_rw + (saboteur & 8LL);
result_3 = tmr_in * bram_rw + (saboteur & 4LL);
result_4 = tmr_in * bram_rw + (saboteur & 2LL);
...
[Figure: redundant data-paths 1 to N. Data-path k multiplies tmr_in by bram_rw and adds the saboteur bit for k to produce result_k.]
Hardware-Hardware injection: LFSR (linear feedback shift register)
Software-Hardware injection (recall previous slide):
saboteur = bl[i]; // reading input stream
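Both injection styles can be modelled in a few lines. This is a sketch under stated assumptions: the 16-bit LFSR taps (16, 14, 13, 11) are a commonly used maximal-length choice and are not taken from the slides, and lfsr_next and datapath are hypothetical helper names.

```c
#include <stdint.h>

/* 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (assumed polynomial).
   A nonzero seed never reaches the all-zero state. */
uint16_t lfsr_next(uint16_t s)
{
    uint16_t bit = ((s >> 0) ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;
    return (uint16_t)((s >> 1) | (bit << 15));
}

/* One redundant data-path: the saboteur word corrupts the result only
   when the bit selected by this path's mask is set. */
int64_t datapath(int64_t tmr_in, int64_t bram_rw, int64_t saboteur, int64_t mask)
{
    return tmr_in * bram_rw + (saboteur & mask);
}
```

For software-hardware injection the saboteur word comes from the input stream (bl[i] on the slide); for hardware-hardware injection it would be the current LFSR state, advanced once per iteration with lfsr_next.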
Dynamic Spatial Radiation Hardening 1
if ((result_1 == result_2) && (en_hub1 == 1) && (en_hub2 == 1)) {
    final_result = result_1;
    mul_diagnose_opt = 12;
} else if ((result_2 == result_3) && (en_hub2 == 1) && (en_hub3 == 1)) {
    final_result = result_2;
    mul_diagnose_opt = 23;
} else if ((result_3 == result_4) && (en_hub3 == 1) && (en_hub4 == 1)) {
    final_result = result_3;
    mul_diagnose_opt = 34;
} else if ((result_4 == result_5) && (en_hub4 == 1) && (en_hub5 == 1)) {
    final_result = result_4;
    mul_diagnose_opt = 45;
} else if ((result_5 == result_1) && (en_hub5 == 1) && (en_hub1 == 1)) {
    final_result = result_5;
    mul_diagnose_opt = 51;
} else {
    // no enabled pair agrees
    final_result = result_5;
    mul_diagnose_opt = 55;
}
[Figure: data-paths 1 to N pass through the enabling hub; the voter compares result_A, result_B and result_C and outputs final_result and the multi-diagnose option.]
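The pairwise voting chain generalizes to a loop over adjacent data-paths. A minimal sketch, assuming five paths and the same diagnose codes as the slide code (12, 23, 34, 45, 51, with 55 when no enabled pair agrees); the array-based interface is an illustrative choice, not the MAP C form.

```c
#include <stdint.h>

#define N_PATHS 5

/* Compare each enabled data-path with its circular neighbour; the first
   agreeing pair wins. Returns the diagnose code (12, 23, ..., 51), or 55
   when no enabled pair agrees, in which case the last result is passed on. */
int vote(const int64_t result[N_PATHS], const int en_hub[N_PATHS],
         int64_t *final_result)
{
    for (int k = 0; k < N_PATHS; k++) {
        int next = (k + 1) % N_PATHS;
        if (en_hub[k] && en_hub[next] && result[k] == result[next]) {
            *final_result = result[k];
            return (k + 1) * 10 + (next + 1);
        }
    }
    *final_result = result[N_PATHS - 1];
    return 55;
}
```

With paths 1 to 3 enabled and a fault injected into path 1, the vote falls through to the pair (2, 3) and returns code 23, which identifies where agreement was found.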
Dynamic Spatial Radiation Hardening 2
// on-next-iteration do enable/disable redundant datapaths circularly
if (temp_mul_diagnose_opt != mul_diagnose_opt) {
    temp_v = en_hub5;
    en_hub5 = en_hub4;
    en_hub4 = en_hub3;
    en_hub3 = en_hub2;
    en_hub2 = en_hub1;
    en_hub1 = temp_v;
}
temp_mul_diagnose_opt = mul_diagnose_opt;
[Figure: data-paths 1 to N feed the voter (result_A, result_B, result_C), which outputs final_result and the multi-diagnose option. The enabling hub holds en_hub1 to en_hub5 and temp_v. Example with N = 5: enabled data-paths are 1, 2, 3 (*).]
* implementing an LFSR is similar
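The circular enable update can be written over an array. A minimal sketch, assuming N = 5 and the same rotate-on-change rule as the slide code; rotate_hub and update_hub are hypothetical names.

```c
#define N_PATHS 5

/* Shift every enable flag to the next data-path, wrapping the last one
   around to the first (same effect as the temp_v chain on the slide). */
void rotate_hub(int en_hub[N_PATHS])
{
    int temp_v = en_hub[N_PATHS - 1];
    for (int k = N_PATHS - 1; k > 0; k--)
        en_hub[k] = en_hub[k - 1];
    en_hub[0] = temp_v;
}

/* Rotate only when the diagnosis differs from the previous iteration,
   then remember the current diagnosis. */
void update_hub(int en_hub[N_PATHS], int mul_diagnose_opt,
                int *temp_mul_diagnose_opt)
{
    if (*temp_mul_diagnose_opt != mul_diagnose_opt)
        rotate_hub(en_hub);
    *temp_mul_diagnose_opt = mul_diagnose_opt;
}
```

Starting from enabled data-paths 1, 2, 3, a change of diagnosis moves the enabled window to data-paths 2, 3, 4, so computation shifts away from the region where disagreement was observed.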
Control Flow Tolerance: IF statement
// Agent-based control flow technique
#define xor(x,y) (((x) & !(y)) | (!(x) & (y)))
control_flag1 = 0;
control_flag2 = 0;
...
if (condition) {
    control_flag1 = 1;
    ...
}
...
if (condition & tolerance) {
    control_flag2 = 1;
    ...
}
error_flag = xor(xor(condition, control_flag1), control_flag2) ...;
[Figure: condition drives two muxes (if true / if false) associated with control_flag1 and control_flag2; the flags are combined into error_flag.]
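A software model of the IF check, as a sketch only: the slide elides the full error_flag expression, so the comparison used here (an error whenever either shadow flag disagrees with the condition) is an assumption, and the upset parameter is a hypothetical fault model.

```c
/* Returns 1 if either shadow flag disagrees with the branch condition.
   Assumes condition and both flags are 0 or 1. */
int control_flow_error(int condition, int control_flag1, int control_flag2)
{
    return (condition ^ control_flag1) | (condition ^ control_flag2);
}

/* Evaluate the guarded branch and its tolerance copy, then report the
   error flag. A nonzero `upset` simulates a fault that keeps the first
   branch from executing (illustrative fault model). */
int guarded_if(int condition, int tolerance, int upset)
{
    int control_flag1 = 0, control_flag2 = 0;
    if (condition && !upset)
        control_flag1 = 1;
    if (condition & tolerance)
        control_flag2 = 1;
    return control_flow_error(condition, control_flag1, control_flag2);
}
```

With tolerance set to 1, a fault-free run returns 0 for either value of the condition, while a skipped branch is reported as 1.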
Control Flow Tolerance: FOR statement
// Agent-based control flow technique
#pragma src parallel sections
{
    #pragma src section
    {
        for (i=0; i<sz; i++) {   // data path
            control_counter1++;
        }
    }
    #pragma src section
    {
        for (i=0; i<sz; i++) {   // dummy path
            control_counter2++;
        }
    }
}
if (control_counter1 == control_counter2) {
    error = 0;
} else {
    error = 1;
}
[Figure: the data path drives counter1 and the dummy path drives counter2; a comparator (=) produces error.]
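The counter comparison can be tried in plain C, without the SRC parallel sections. A minimal sketch; the fault_at parameter is a hypothetical way to simulate an upset that drops one iteration of the data path.

```c
/* Count the iterations of the data path and of a dummy path, then
   compare. Returns 1 on mismatch, 0 otherwise. A fault_at value in
   [0, sz) skips one increment in the data path; use -1 for no fault. */
int counted_loops_error(int sz, int fault_at)
{
    int control_counter1 = 0, control_counter2 = 0;

    for (int i = 0; i < sz; i++) {      /* data path */
        if (i != fault_at)
            control_counter1++;
    }
    for (int i = 0; i < sz; i++)        /* dummy path */
        control_counter2++;

    return control_counter1 != control_counter2;
}
```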
Resource Utilization
Area is crucial to assess efficiency, but it is also a flexible variable that we can tune with our programming model.

Table I. Resource Utilization
Design (*) | Slice FFs (%) | Slice Total (%) | LUT Logic (%) | LUT Route-Through (%) | LUT Shift Regs. (%) | LUT Total (%) | MULT18X18/RAMB16 | Design Size (eq. sys. gates)
1 | 6.66 | 8.66 | 2.41 | 0.48 | 0.003 | 2.92 | 0/0 | 59,622
2 | 8.14 | 10.86 | 3.14 | 0.66 | 0.34 | 4.15 | 0/0 | 85,436
3 | 10.18 | 13.40 | 4.35 | 0.72 | 0.57 | 5.64 | 10/4 | 416,025
LUT columns refer to 4-input LUTs.
Total for chip: xc2v6000-ff1517-4, 33,792 slices (x2 FFs), 144 Mult/BRAM
* 1 = bare-bones design, 2 = moderate radhard design, 3 = high radhard design
FPGA Checkpointing
// attempt to back up the On-Board-Memory (OBM) banks
if (status == temporary_failure) {
    obm_single_dma_stripe_backup(status, backed_up_obm_data);
} else if (status == at_speed_backup) {
    obm_double_dma_looping_backup(status, backed_up_obm_data);
} else {
    // FPGA unrecoverable
    backed_up_obm_data = NULL;
    status = 0;
}
[Figure: Chip 1 and Chip 2 with OBM banks A to H; control, status and backed_up_obm_data connect them to the rad-hard host uProc.]
Hardware-Software Backup Threading
Two types of threads:
1. POSIX thread backup
2. FPGA leading thread
[Figure: uP1 and uP2 each run comp x as an FPGA routine with an OpenMP backup; the two processors are linked by message passing.]
Compute Data Integrity
// Compute Data Integrity technique
int main()
{
    rst_count = 0;
    hw_valid = 0;
    ...
    for (i=0; i< compute_blocks; i++) {
        for (j=0; j<sz; j++) {
            if (hw_valid) {
                sw_array->aarray[j] = hw_array->aarray[j];
            } else {
                hw_array->aarray[j] = sw_array->aarray[j];
            }
        }
        ...
[Figure: hw_valid selects the copy direction between sw_array and hw_array.]
Hardware-Software Backup Threading
        // Backup threading technique
        pthread_create(&thread_hw, NULL, &foo_hw, NULL);
        pthread_create(&thread_sw, NULL, &foo_sw, NULL);
        pthread_testcancel();
        pthread_join(thread_hw, NULL);
        pthread_join(thread_sw, NULL);
        printf("compute_block done! \n");
        if (rst_count > 2) {
            system("snap Reset");
            rst_count = 0;
        }
    }
    printf("job done! \n");
    return(0);
}
...
[Figure: thread_hw runs foo_hw and thread_sw runs foo_sw. Labels: rst_count++, hw_valid = 1, hw_valid = 0.]
“foo_SW” Software Thread
// foo_sw : software version of function foo
void *foo_sw(void *arg)   // start routines passed to pthread_create take a void * argument
{
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    printf("I am thread foo_sw \n");
    for (j=0; j<sz; j++) {
        sw_array->aarray[j] = 1 + sw_array->aarray[j];
    }
    // software finished first: cancel the hardware thread
    status = pthread_cancel(thread_hw);
    pthread_testcancel();
    printf("canceling thread_hw with status = %i\n", status);
    pthread_exit(NULL);
    return NULL;
}
“foo_HW” Hardware Thread
// foo_hw : hardware version of function foo
void *foo_hw(void *arg)   // start routines passed to pthread_create take a void * argument
{
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
    printf("I am thread foo_hw \n");
    rst_count++;
    foo_hw_map(hw_array->aarray, hw_array->mapno);
    rst_count--;
    hw_valid = 1;
    // hardware finished first: cancel the software thread
    status = pthread_cancel(thread_sw);
    pthread_testcancel();
    printf("canceling thread_sw with status = %i\n", status);
    pthread_exit(NULL);
    return NULL;
}
SystemC – Calling MAP C from C++
1) Offline
Development is seamless and based on code transformation that can be copy/pasted to a MAP C design
2) Online
Online Interface (OIF). The MAP hardware is treated as an object. Computation is performed at the high level, e.g.
[Figure: a stimulus module drives reset, input_valid and sample into an FIR module clocked by CLK; the FIR module outputs output_data_ready and result to a display module. main() calls foo_hw.]
Sanity Checking
// Read back (if the API supports it)
p_bitstream_new = JTAG_bitstream_read_back();
error = compare(p_bitstream_new, p_bitstream_golden);
// Sanity checking with hw module database
foo_hw_1(argument_1, result_1);
...
foo_hw_N(argument_N, result_N);
// result[i] and golden[i]: output of module i and its golden copy
for (i=0; i< modules; i++) {
    error[i] = compare(result[i], golden[i]);
}
[Figure: the outputs of foo_1() to foo_N() are compared (=) with golden values (sw-computed or stored) to fill error[].]
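The golden-copy comparison can be sketched on the host side as follows. The compare signature (a byte-wise comparison returning 1 on mismatch) and the integer result arrays are assumptions for illustration; the slides do not define them.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 when the observed buffer differs from the golden one. */
int compare(const void *observed, const void *golden, size_t len)
{
    return memcmp(observed, golden, len) != 0;
}

/* Fill error[i] for each hardware module result and return the number
   of modules whose output does not match its golden value. */
int sanity_check(const int *results, const int *golden, int modules, int *error)
{
    int bad = 0;
    for (int i = 0; i < modules; i++) {
        error[i] = compare(&results[i], &golden[i], sizeof results[i]);
        bad += error[i];
    }
    return bad;
}
```

The golden values may be computed in software at run time or stored ahead of time, as the figure notes; the same compare routine would apply to a read-back bitstream and its golden copy.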
Advantages
Hardening:
Dynamic levels of radiation hardening or customization. The system description is fully synthesizable in both SW (compiled -> processor) and HW (forged -> C-to-FPGA compilation)
Fault injection:
Fault injection can be specified at a high level (ANSI C or Fortran) and can be interfaced with scripts for verification and test
Simulation and emulation capabilities:
At-speed tolerance check, debugging, cycle-accurate simulation, hardware emulation
Disadvantages
Too high level:
• Optimization is at first attempted only through the hardware compiler
• Further optimization requires a skilled or experienced programmer
• Fine tuning is possible at the expense of time, yet this obstacle is being overcome by more advanced hardware compiler technology and released programmer techniques
Lessons Learned
• Tested high-level advanced fault tolerance techniques
• Develop high-performance embedded computing techniques that are power aware and versatile enough to counteract different radiation scenarios
• High-performance supercomputing methodologies are in need of terrestrial-based radiation hardening due to amplifying effects in supercomputers comprising a large number of processing elements