
Luis E. Cordova, Duncan A. Buell

Page 1: Luis E. Cordova, Duncan A. Buell

Cordova 1MAPLD

2005/F242

A Novel High-level Dynamic Hardware-Software Remapping Technique for Mission Critical Reconfigurable Computers

Luis E. Cordova, Duncan A. Buell


Outline

1) Problem and definitions

2) Motivation

3) Architecture

4) N Techniques

5) Advantages

6) Disadvantages

7) Lessons learned


Problem and Definitions

1. RCs are built from FPGAs + CPUs + memories + &c.
2. RCs are general-purpose embedded platforms.
3. RCs are used to accelerate scientific applications.

Problem:
1. Achieve fault tolerance over heterogeneous hardware.
2. An RC requires knowledge of where the electronics inside the satellite are used, its orbit, for how long, and which direction it is facing; the solution is adaptability.

Notes:
There are more FPGAs than microprocessors in an RC (examples: SRC, BEE, &c).
This is the chicken-and-egg dilemma! "SRAM-based FPGAs are less reliable than microprocessors" but reconfigurable.


Motivation

* Ground tracking of LEO orbit of CEASE device:

Source: [Amptek03]


Case: SRC Hardware Architecture

Source: [SRC]

[Figure: block diagram of the SRC hardware architecture. Labeled blocks: uP board with two P4 (2.8 GHz) processors and their L2 caches, MIOC, computer memory (8 GB), PCI-X, SNAP, DDR interface, control FPGA (XC2V6000), on-board memory (24 MB), FPGA 1 (XC2V6000), FPGA 2 (XC2V6000), and chain ports (2400 MB/s for each port, 108 bits). Labeled link rates: 22400 MB/s, 4256 MB/s, 1064 MB/s, 2x800 MB/s (6x64 bits), 4800 MB/s (6x 64 bits), 2400 MB/s (192 bits). Max payload rate is 1400 MB/s.]


Fault Tolerance Techniques

1) Dynamic FPGA-HOST remapping
2) Dynamic FPGA-FPGA remapping
3) FPGA Checkpointing
4) System-level radiation tolerance
5) Sanity checks with golden copy
6) Streaming Heart Beat Signal
7) Redundancy-based Data integrity
8) Control Flow Tolerance
9) Memory scrubbing
10) Hardware-Software Backup Threading
11) Dynamic Spatial Radiation Tolerance
12) HW-HW and HW-SW injection

Technique groups (figure labels): Monitoring; Remapping and Recovery; Protection; Profiling


System Dynamic Remapping

Dynamic redistribution between uProc and FPGAs:
- Speedup demand of the computation
- Faults on FPGA side

Faults on uProc are handled by other methods.

[Figure: program code main( ) { comp 1 … comp N } // end, with computations such as comp 4 and comp 7 placed on User FPGA 1 and User FPGA 2, inside a radiation environment (SETs/SEUs).]

Trade-offs:
- parallelism
- tolerance
- FPGA resources


Static Host-FPGA Mapping

[Figure: Hybrid Computer Under Test (HCUT). Labeled blocks: Processor with main ( ) and map_function ( ); MAP reconfigurable fabric with OBM, BRAM, V, parity & check ( ), check & parity ( ), self_repair ( ), diagnose ( ), and two saboteur ( ) blocks.]


Remapping and Monitoring

- Dynamic remapping between FPGAs
- Dynamic remapping between uProc and FPGAs
- Streaming heart beat

Hierarchical tolerance:
- uProc level
- MAP level
- RTL level

[Figure: blocks labeled Chip 1, Chip 2, Bridge, heart beat, On-board-memory (OBM), and Host RadHard uProc.]


Top level Remapping

    if (mapIt (mapnum1)) {
        fprintf (stdout, "Hybrid level 1 failed!\n");
        fprintf (stdout, "Entering hybrid level 2.\n");
        if (mapIt (mapnum2)) {
            fprintf (stdout, "Hybrid level 2 failed!\n");
            fprintf (stdout, "Entering level 3 (full software).\n");
            /* Computation on Software */
            computeInSoftware(A, B, C, D);
        } else {
            user2 (n, A, B, C, D, &time, 0);
        }
    } else {
        user1 (n, A, B, C, D, &time, 0);
    }

[Figure: levels 1, 2, 3, ordered from more to less hardware functionality mapped.]
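The fallback order above can be sketched in isolation as follows. This is only an illustration under stated assumptions: `map_it`, `failed_maps`, and `choose_level` are hypothetical names, and the real `mapIt` is replaced by a stub that fakes load failures with a bitmask (nonzero return means failure, as in the slide).

```c
#include <stdio.h>

/* Hypothetical stand-in for mapIt(): returns nonzero when the mapping
   for the given map number cannot be loaded. Bit k of failed_maps set
   means that map number k fails. */
static unsigned failed_maps = 0;

static int map_it(int mapnum) {
    return (int)((failed_maps >> mapnum) & 1u);
}

enum level { LEVEL_HYBRID_1 = 1, LEVEL_HYBRID_2 = 2, LEVEL_SOFTWARE = 3 };

/* Try the two hardware mappings in order; fall back to software last. */
static enum level choose_level(int mapnum1, int mapnum2) {
    if (map_it(mapnum1) == 0)
        return LEVEL_HYBRID_1;
    fprintf(stdout, "Hybrid level 1 failed!\n");
    if (map_it(mapnum2) == 0)
        return LEVEL_HYBRID_2;
    fprintf(stdout, "Hybrid level 2 failed!\n");
    return LEVEL_SOFTWARE;
}
```

A caller would switch on the returned level to run the corresponding hardware routine or the software computation.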


Block RAM ‘Flip-Flop’ Scrubbing

    // computation
    for (i = 0; i < n; i++) {
        tmr_in = al[i];
        saboteur = bl[i];   // reading input stream

        // Block RAM Scrubbing Technique
        if (i % 2) {
            bram_rw = scrubb_flip[i];
            // parity check flag error
            scrubb_flop[i] = bram_rw;
        } else {
            bram_rw = scrubb_flop[i];
            // parity check flag error
            scrubb_flip[i] = bram_rw;
        }
        // bram_rw is used later on
        ...

[Figure: two Block RAMs, Scrubb flip and Scrubb flop. On odd iterations flip is read and flop is written; on even iterations flop is read and flip is written. Parity bits are checked on each read, and the value read becomes the next bram_rw.]
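A software-only sketch of the flip/flop scrubbing step is given below. It rests on assumptions that are not in the slides: parity is modelled as one stored bit per 32-bit word, the names `struct bram`, `bram_write`, `parity32`, and `scrub_step` are hypothetical, and on a parity error the sketch skips the write so the other copy is left untouched.

```c
#include <stdint.h>

#define N 8   /* assumed buffer depth for this sketch */

/* XOR of all bits of w: 1 when the number of set bits is odd. */
static unsigned parity32(uint32_t w) {
    w ^= w >> 16;
    w ^= w >> 8;
    w ^= w >> 4;
    w ^= w >> 2;
    w ^= w >> 1;
    return w & 1u;
}

/* Software model of one block RAM with a stored parity bit per word. */
struct bram {
    uint32_t data[N];
    unsigned par[N];
};

static void bram_write(struct bram *b, int i, uint32_t v) {
    b->data[i] = v;
    b->par[i] = parity32(v);
}

/* One scrubbing step: odd i reads flip and rewrites flop, even i does
   the opposite. Returns 1 when the word read fails its parity check;
   in that case the write is skipped (an assumption of this sketch). */
static int scrub_step(struct bram *flip, struct bram *flop, int i,
                      uint32_t *bram_rw) {
    struct bram *src = (i % 2) ? flip : flop;
    struct bram *dst = (i % 2) ? flop : flip;
    *bram_rw = src->data[i];
    int error = parity32(*bram_rw) != src->par[i];
    if (!error)
        bram_write(dst, i, *bram_rw);
    return error;
}
```

How a flagged error is repaired (for example from the untouched copy) is left open here, as it is on the slide.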


Hardware-Hardware & Software-Hardware Fault Injection

    // datapath level module redundancy -- DPLMR
    result_1 = tmr_in * bram_rw + (saboteur & 16LL);
    result_2 = tmr_in * bram_rw + (saboteur & 8LL);
    result_3 = tmr_in * bram_rw + (saboteur & 4LL);
    result_4 = tmr_in * bram_rw + (saboteur & 2LL);
    ...

[Figure: redundant data-paths 1 to N. In data-path k, tmr_in and bram_rw are multiplied (X) and the saboteur for k is added (+) to give result_k.]

Injection sources:
- Hardware-Hardware (LFSR = linear feedback shift register)
- Software-Hardware (recall previous slide):

    saboteur = bl[i];   // reading input stream
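The per-path injection can be written as a loop, as in the sketch below. The function name `dplmr` and the array form are hypothetical, and the mask for the fifth path (1) is an assumption, since the slide shows only the masks 16, 8, 4, and 2 before the ellipsis.

```c
#include <stdint.h>

#define NPATHS 5   /* assumed number of redundant data paths */

/* Compute NPATHS redundant copies of tmr_in * bram_rw. One bit of the
   saboteur word is added into each path: masks 16, 8, 4, 2, 1 for
   paths 1..5 (the last mask is assumed, not shown on the slide). */
static void dplmr(int64_t tmr_in, int64_t bram_rw, int64_t saboteur,
                  int64_t result[NPATHS]) {
    for (int k = 0; k < NPATHS; k++) {
        int64_t mask = (int64_t)1 << (NPATHS - 1 - k);
        result[k] = tmr_in * bram_rw + (saboteur & mask);
    }
}
```

With `saboteur` equal to 0 all paths agree; setting a single bit perturbs exactly one path, which is what a downstream voter is expected to catch.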


Dynamic Spatial Radiation Hardening 1

    if ((result_1 == result_2) && (en_hub1 == 1) && (en_hub2 == 1)) {
        final_result = result_1;
        mul_diagnose_opt = 12;
    } else if ((result_2 == result_3) && (en_hub2 == 1) && (en_hub3 == 1)) {
        final_result = result_2;
        mul_diagnose_opt = 23;
    } else if ((result_3 == result_4) && (en_hub3 == 1) && (en_hub4 == 1)) {
        final_result = result_3;
        mul_diagnose_opt = 34;
    } else if ((result_4 == result_5) && (en_hub4 == 1) && (en_hub5 == 1)) {
        final_result = result_4;
        mul_diagnose_opt = 45;
    } else if ((result_5 == result_1) && (en_hub5 == 1) && (en_hub1 == 1)) {
        final_result = result_5;
        mul_diagnose_opt = 51;
    } else {
        final_result = result_5;
        mul_diagnose_opt = 55;
    }

[Figure: data-paths 1 to N feed an Enabling Hub; the enabled results (result_A, result_B, result_C) go to Voting, which produces final_result and the Multi-diagnose Option.]
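The chain of comparisons above follows a regular pattern: adjacent pairs (1,2), (2,3), (3,4), (4,5), (5,1) are tried in order. A loop-based sketch, assuming N = 5 and using the hypothetical function name `vote` with array arguments instead of the numbered variables:

```c
#include <stdint.h>

#define NPATHS 5   /* assumed number of redundant data paths */

/* Pick the first adjacent pair of enabled data paths that agree.
   Returns the diagnose code 12, 23, 34, 45, or 51 for the pair used,
   or 55 when no enabled pair agrees (result 5 is then passed on). */
static int vote(const int64_t result[NPATHS], const int en_hub[NPATHS],
                int64_t *final_result) {
    for (int k = 0; k < NPATHS; k++) {
        int next = (k + 1) % NPATHS;
        if (en_hub[k] == 1 && en_hub[next] == 1 &&
            result[k] == result[next]) {
            *final_result = result[k];
            return 10 * (k + 1) + (next + 1);
        }
    }
    *final_result = result[NPATHS - 1];
    return 55;
}
```

The returned code plays the role of `mul_diagnose_opt`: a change in its value between iterations indicates that a different pair had to be used.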


Dynamic Spatial Radiation Hardening 2

    // on-next-iteration do enable/disable redundant datapaths circularly
    if (temp_mul_diagnose_opt != mul_diagnose_opt) {
        temp_v  = en_hub5;
        en_hub5 = en_hub4;
        en_hub4 = en_hub3;
        en_hub3 = en_hub2;
        en_hub2 = en_hub1;
        en_hub1 = temp_v;
    }

    temp_mul_diagnose_opt = mul_diagnose_opt;

[Figure: Enabling Hub holding the flags en_hub1 to en_hub5, rotated circularly through temp_v; the hub gates data-paths 1 to N, whose results (result_A, result_B, result_C) go to Voting, which produces final_result and the Multi-diagnose Option. N = 5; enabled data-paths are 1, 2, 3 (*).]

* implementing an LFSR is similar
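The circular shift of the enable flags can be sketched with an array, as below. `rotate_enables` and `update_enables` are hypothetical names and N = 5 is taken from the figure; the rotation direction matches the slide (en_hub1 moves to en_hub2, en_hub5 wraps around to en_hub1).

```c
#define NPATHS 5   /* N = 5, as in the figure */

/* Rotate the enable flags by one position: flag k moves to k+1 and
   the last flag wraps around to the first. */
static void rotate_enables(int en_hub[NPATHS]) {
    int temp_v = en_hub[NPATHS - 1];
    for (int k = NPATHS - 1; k > 0; k--)
        en_hub[k] = en_hub[k - 1];
    en_hub[0] = temp_v;
}

/* Rotate only when the diagnose option changed since the previous
   iteration, then remember the current option. Returns 1 if rotated. */
static int update_enables(int en_hub[NPATHS], int mul_diagnose_opt,
                          int *temp_mul_diagnose_opt) {
    int rotated = 0;
    if (*temp_mul_diagnose_opt != mul_diagnose_opt) {
        rotate_enables(en_hub);
        rotated = 1;
    }
    *temp_mul_diagnose_opt = mul_diagnose_opt;
    return rotated;
}
```

Starting from enabled data-paths 1, 2, 3, one rotation enables data-paths 2, 3, 4, so a suspect region of the chip is gradually moved out of the active set.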


Control Flow Tolerance: IF statement

    // Agent-based control flow technique
    #define xor(x,y) (((x) & !(y)) | (!(x) & (y)))

    control_flag1 = 0;
    control_flag2 = 0;
    ...
    if (condition) {
        control_flag1 = 1;
        ...
    }
    ...
    if (condition & tolerance) {
        control_flag2 = 1;
        ...
    }
    error_flag = xor(xor(condition, control_flag1), control_flag2) ...;

[Figure: condition drives two muxes selecting between the "if true" and "if false" branches; control_flag1 and control_flag2 are combined with condition to produce error_flag.]


Control Flow Tolerance: FOR statement

    // Agent-based control flow technique
    #pragma src parallel sections
    {
        #pragma src section
        {
            for (i = 0; i < sz; i++) {
                control_counter1++;
            }
        }
        #pragma src section
        {
            for (i = 0; i < sz; i++) {
                control_counter2++;
            }
        }
    }
    if (control_counter1 == control_counter2) {
        error = 0;
    } else {
        error = 1;
    }

[Figure: counter1 on the data path and counter2 on the dummy path are compared (=) to produce error.]
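The two-counter check can be illustrated on a host CPU without the `#pragma src` sections. The sketch below is an assumption-laden stand-in: `checked_sum` is a hypothetical function, the two loops run sequentially rather than in parallel, and the `inject_fault` argument fakes a control-flow upset by leaving the data loop early.

```c
/* Sum sz values on the "data path" while a "dummy path" counts the
   expected number of iterations independently. Returns 1 (error) when
   the two counters disagree, 0 otherwise. */
static int checked_sum(const int *data, int sz, long *sum,
                       int inject_fault) {
    int control_counter1 = 0;
    int control_counter2 = 0;

    *sum = 0;
    for (int i = 0; i < sz; i++) {            /* data path */
        if (inject_fault && i == sz / 2)
            break;                            /* simulated upset */
        *sum += data[i];
        control_counter1++;
    }
    for (int i = 0; i < sz; i++)              /* dummy path */
        control_counter2++;

    return control_counter1 != control_counter2;
}
```

A nonzero return tells the caller that the loop did not run its full trip count, so the partial sum should be discarded or recomputed.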


Resource Utilization

Area is crucial to assess efficiency, but it is also a flexible variable that we can tune with our programming model.

Table I. Resource Utilization

Design (*) | Slice FFs (%) | Slice Total (%) | LUT Logic (%) | LUT Route Through (%) | LUT Shift Regs. (%) | LUT Total (%) | MULT18X18/RAMB16 | Design Size (eq. sys. gates)
1          | 6.66          | 8.66            | 2.41          | 0.48                  | 0.003               | 2.92          | 0/0              | 59,622
2          | 8.14          | 10.86           | 3.14          | 0.66                  | 0.34                | 4.15          | 0/0              | 85,436
3          | 10.18         | 13.40           | 4.35          | 0.72                  | 0.57                | 5.64          | 10/4             | 416,025

LUT columns refer to 4-input LUTs. Total for chip xc2v6000-ff1517-4: 33,792 slices (x2 FFs), 144 Mult/BRAM.

* 1 = bare-bones design; 2 = radhard design, moderate; 3 = radhard design, high


FPGA Checkpointing

    // attempt to back up the On-Board-Memory (OBM) banks
    if (status == temporary_failure) {
        obm_single_dma_stripe_backup(status, backed_up_obm_data);
    } else if (status == at_speed_backup) {
        obm_double_dma_looping_backup(status, backed_up_obm_data);
    } else {
        // FPGA unrecoverable
        backed_up_obm_data = NULL;
        status = 0;
    }

[Figure: Chip 1 with blocks A B C D E and Chip 2 with blocks F G H and control; backed_up_obm_data and status are exchanged with the Host RadHard uProc.]


Hardware-Software Backup Threading

Two types of threads:
1. POSIX thread backup
2. FPGA leading thread

[Figure: processors uP1 and uP2, each running comp x as an FPGA routine with an openMP backup; label "Message Passing X" between them.]


Compute Data Integrity

    // Compute Data Integrity technique
    int main()
    {
        rst_count = 0;
        hw_valid = 0;
        ...
        for (i = 0; i < compute_blocks; i++) {
            for (j = 0; j < sz; j++) {
                if (hw_valid) {
                    sw_array->aarray[j] = hw_array->aarray[j];
                } else {
                    hw_array->aarray[j] = sw_array->aarray[j];
                }
            }
            ...

[Figure: an "if" on hw_valid selects the copy direction between sw_array and hw_array.]
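The copy step at the start of each compute block keeps the two working arrays consistent: whichever side produced the last trusted result overwrites the other. A minimal sketch, assuming plain `long` arrays instead of the `aarray` struct members and using the hypothetical function name `sync_arrays`:

```c
#include <stddef.h>

/* Before a compute block, propagate the trusted copy to the other
   side: hardware results win when hw_valid is set, otherwise the
   software results are pushed to the hardware array. */
static void sync_arrays(long *sw_array, long *hw_array, size_t sz,
                        int hw_valid) {
    for (size_t j = 0; j < sz; j++) {
        if (hw_valid)
            sw_array[j] = hw_array[j];
        else
            hw_array[j] = sw_array[j];
    }
}
```

After the call both arrays hold the same data, so either the hardware or the software version of the next block can continue from a consistent state.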


Hardware-Software Backup Threading

            // Backup threading technique
            pthread_create(&thread_hw, NULL, &foo_hw, NULL);
            pthread_create(&thread_sw, NULL, &foo_sw, NULL);
            pthread_testcancel();
            pthread_join(thread_hw, NULL);
            pthread_join(thread_sw, NULL);
            printf("compute_block done! \n");
            if (rst_count > 2) {
                system("snap Reset");
                rst_count = 0;
            }
        }
        printf("job done! \n");
        return(0);
    }
    ...

[Figure: thread_hw runs foo_hw and thread_sw runs foo_sw; the first to finish cancels the other. Labels: rst_count++, hw_valid=1, hw_valid=0.]


"foo_SW" Software Thread

    // foo_sw : software version of function foo
    void *foo_sw(void *arg)
    {
        pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
        printf("I am thread foo_sw \n");
        for (j = 0; j < sz; j++) {
            sw_array->aarray[j] = 1 + sw_array->aarray[j];
        }

        status = pthread_cancel(thread_hw);

        pthread_testcancel();

        printf("canceling thread_hw with status = %i\n", status);

        pthread_exit(NULL);
        return NULL;
    }


"foo_HW" Hardware Thread

    // foo_hw : hardware version of function foo
    void *foo_hw(void *arg)
    {
        pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

        printf("I am thread foo_hw \n");

        rst_count++;
        foo_hw_map(hw_array->aarray, hw_array->mapno);
        rst_count--;
        hw_valid = 1;

        status = pthread_cancel(thread_sw);
        pthread_testcancel();
        printf("canceling thread_sw with status = %i\n", status);
        pthread_exit(NULL);
        return NULL;
    }


SystemC – Calling MAP C from C++

1) Offline
Development is seamless and based on code transformation that can be copy/pasted to a MAP C design.

2) Online
Online Interface (OIF). The MAP hardware is treated as an object. Computation is performed at the high level, e.g.

[Figure: SystemC FIR example. A stimulus module (reset, input_valid, sample, CLK), an FIR module (reset, input_valid, sample, CLK, output_data_ready, result), and a display module (result, output_data_ready); additional labels main ( ) and foo_hw.]


Sanity Checking

    // Read back (supported if API supports it)
    p_bitstream_new = JTAG_bitstream_read_back();
    error = compare(p_bitstream_new, p_bitstream_golden);

    // Sanity checking with hw module database
    foo_hw_1(argument_1, result_1);
    ...
    foo_hw_N(argument_N, result_N);

    for (i = 0; i < modules; i++) {
        // compare result_1 ... result_N against golden_1 ... golden_N
        error[i] = compare(result[i], golden[i]);
    }

[Figure: the outputs of foo_1 ( ) to foo_N() are compared (=) against golden values (sw-computed or stored) to fill error [ ].]
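The per-module comparison against golden values can be sketched as below. The function name `sanity_check` and the use of one scalar `long` result per module are assumptions for illustration; the slide does not specify the result type or what `compare` does internally.

```c
#include <stddef.h>

/* Compare each module's result with its golden value. error[i] is set
   to 1 when module i disagrees with the golden copy, 0 otherwise.
   Returns the number of failing modules. */
static int sanity_check(const long *result, const long *golden,
                        size_t modules, int *error) {
    int failures = 0;
    for (size_t i = 0; i < modules; i++) {
        error[i] = (result[i] != golden[i]);
        failures += error[i];
    }
    return failures;
}
```

The golden values may be computed in software on the host or stored ahead of time, as the figure notes; the sketch is agnostic about their origin.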


Advantages

Hardening:
Dynamic levels of radiation hardening or customization. The system description is fully synthesizable in either SW (compiled -> processor) or HW (forged -> C to FPGA compilation).

Fault-injection:
Fault injection can be specified at a high level (ANSI C or Fortran) and can be interfaced with scripts for verification and test.

Simulation and emulation capabilities:
At-speed tolerance check, debugging, cycle-accurate simulation, hardware emulation.


Disadvantages

Too high level:

• Optimization is at first attempted only through the hardware compiler

• Further optimization is achieved by a skilled or experienced programmer

• Fine tuning is possible at the expense of time, yet this obstacle is being overcome by more advanced hardware compiler technology and released programmer techniques


Lessons Learned

• Tested high-level advanced fault-tolerance techniques

• Develop high-performance embedded computing techniques that are power aware and versatile enough to counteract different radiation scenarios

• High-performance supercomputing methodologies are in need of terrestrial radiation hardening, because radiation effects are amplified in supercomputers comprising a large number of processing elements

