
Fault Tolerance

Date post: 14-Apr-2017
Upload: karisma-ramesh
Transcript
Page 1: Fault Tolerance

FAULT TOLERANCE IN FPGA BASED SYSTEMS

CSE661-Milestone 3

Karisma Ramesh

451126715

CSE661 Milestone-3 1

Page 2: Fault Tolerance

What is an FPGA?

• A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC).


Page 3: Fault Tolerance

FPGA ARCHITECTURE


In general, FPGAs consist of regular arrays of programmable logic blocks (PLBs) connected to each other by a programmable routing matrix. An FPGA configuration defines the functionality of an FPGA: it specifies which logic blocks are used, which wire segments connect them, and what functionality each block provides, as shown in the figure.

Page 4: Fault Tolerance

What is Fault Tolerance?

• A fault can be defined as a physical occurrence within an FPGA that causes it to malfunction, such as a broken wire caused during manufacture by a dust particle. Faults usually occur at the beginning and end of a chip’s life cycle. Fabrication faults, or defects, are usually caused by contaminants or other flaws in the manufacturing process and are detected during manufacturing test. Late-life faults are usually due to failure of device resources.


System failure rate during the life cycle of an FPGA.

Page 5: Fault Tolerance

Methods Of Fault Detection

• 1. Redundant/concurrent error detection uses additional logic as a means of detecting when a logic function is not generating the correct output.

• 2. Off-line test methods cover any testing which is carried out when the FPGA is not performing its operational function.

• 3. Roving test methods perform a progressive scan of the FPGA structure by swapping blocks of functionality with a block carrying out a test function.
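
As a rough illustration of the first method, the sketch below shows duplication with comparison, one common form of redundant/concurrent error detection: the same function is computed twice and a mismatch raises an error flag. The entity name `dwc_and`, the protected function (a single AND gate), and the `fault_in` port are assumptions made for this example and do not come from the slides; `fault_in` exists only to emulate a fault in simulation. A synthesis tool may merge the two identical copies unless told to keep them, so this is a simulation-level sketch rather than a finished design.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: duplication with comparison around one AND gate.
entity dwc_and is
  port (a, b     : in  std_logic;
        fault_in : in  std_logic;   -- simulation-only: '1' flips the primary copy
        y        : out std_logic;   -- functional output (primary copy)
        err      : out std_logic);  -- '1' when the two copies disagree
end dwc_and;

architecture Behavioral of dwc_and is
  signal y0, y1 : std_logic;
begin
  y0  <= (a and b) xor fault_in;  -- primary copy, with emulated fault
  y1  <= a and b;                 -- redundant copy
  y   <= y0;
  err <= y0 xor y1;               -- comparator: any mismatch is flagged
end Behavioral;
```

The comparator only detects that the copies disagree; it cannot tell which copy is wrong, which is why detection is usually paired with one of the tolerance methods on the following slides.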


Page 6: Fault Tolerance

Methods Of Fault Tolerance

Various methods are used to implement fault tolerance in FPGA systems; some of them are listed here:

• Single Fault Tolerance: Single faults are mostly transient faults caused by extraordinary circumstances in the environment, known as single-event upsets (SEUs). Examples are charged particles striking the FPGA while it is in space, or radioactive materials emitting energy that lodges inside the FPGA's vulnerable systems. Because SEU faults are so common, a plethora of methods have been devised to mitigate them.

• Multiple Fault Tolerance: While single faults account for many of the problems that FPGA systems encounter, there are environments and applications where multiple faults can happen simultaneously. Some of these situations arise because feature size continues to decrease, which by itself can cause many faults in a manufactured device.

• Hardware level Fault Tolerance: Hardware level repair performs a correction such that the FPGA remains unchanged for the purposes of the configuration. The device retains its original number and arrangement of usable logic clusters and interconnects.


Page 7: Fault Tolerance

• Configuration level Fault Tolerance: Tolerance is achieved using resources that are unused by the design. These spare resources can replace faulty ones in the event of a fault.

• System level Fault Tolerance: Repair works at a higher level. When a design is highly modular, a fault can be tolerated by using a spare functional block or by providing degraded performance. Such methods are not considered in more detail here, as they are not limited in application to FPGAs.
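
One widely used way to mask a single transient fault such as an SEU is triple modular redundancy (TMR), which the slides do not name explicitly. The sketch below is a minimal majority voter under that assumption: three copies of a signal go in, and the output follows whichever value at least two copies agree on. The entity name `tmr_voter` and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: 2-out-of-3 majority voter for TMR.
entity tmr_voter is
  port (r0, r1, r2 : in  std_logic;   -- outputs of three redundant copies
        q          : out std_logic);  -- voted result
end tmr_voter;

architecture Behavioral of tmr_voter is
begin
  -- q is '1' exactly when at least two of the three inputs are '1',
  -- so a single wrong copy is outvoted by the other two.
  q <= (r0 and r1) or (r1 and r2) or (r0 and r2);
end Behavioral;
```

A voter like this masks one faulty copy but not two at the same time, which is the gap the multiple fault tolerance methods above are meant to address. The voter itself is also unprotected in this sketch.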


Page 8: Fault Tolerance

Open source code

VHDL Code for Generate/Propagate block correction in 8-bit Kogge-Stone Fault Correcting Adder

• library IEEE;

• use IEEE.STD_LOGIC_1164.ALL;

• use IEEE.STD_LOGIC_ARITH.ALL;

• use IEEE.STD_LOGIC_UNSIGNED.ALL;

• entity GPblock is

• port( a,b: in std_logic;

• g,p: out std_logic);

• end GPblock;

• architecture Behavioral of GPblock is

• begin

• g <= a and b;

• p <= a xor b;

• end Behavioral;


Page 9: Fault Tolerance

VHDL Code for mux correction in 8-bit Kogge-Stone Fault Correcting Adder

• library IEEE;

• use IEEE.STD_LOGIC_1164.ALL;

• entity mux is

• port (x,y: in std_logic_vector(1 downto 0);

• z : out std_logic_vector(1 downto 0);

• sel: in std_logic);

• end mux;

• architecture Behavioral of mux is

• constant delay: time := 100 ns;

• begin

• mux_proc : process(x,y,sel)

• variable temp : std_logic_vector(1 downto 0);

• begin

• case sel is

• when '0'=> temp:=x;

• when '1'=> temp:=y;

• when others => temp :="XX";

• end case;

• z<= temp;

• end process mux_proc;

• end Behavioral;


Page 10: Fault Tolerance

VHDL Code for sum correction in 8-bit Kogge-Stone Fault Correcting Adder

• library IEEE;

• use IEEE.STD_LOGIC_1164.ALL;

• use IEEE.STD_LOGIC_ARITH.ALL;

• use IEEE.STD_LOGIC_UNSIGNED.ALL;

• entity sum is

• port( p,c: in std_logic;

• s : out std_logic);

• end sum;

• architecture Behavioral of sum is

• begin

• s <= (p xor c);

• end Behavioral;
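
To show how the generate/propagate and sum equations from these listings relate within one bit position, the sketch below restates them in a single entity. The entity name `bit_slice` and the `cin`/`cout` ports are assumptions made for this example; it does not reproduce the prefix tree or the correction logic of the Kogge-Stone adder, which the slides do not show. The carry relation `cout = g or (p and cin)` is the standard generate/propagate carry equation.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: one bit position built from the g, p and sum equations.
entity bit_slice is
  port (a, b, cin : in  std_logic;
        s, cout   : out std_logic);
end bit_slice;

architecture Behavioral of bit_slice is
  signal g, p : std_logic;
begin
  g    <= a and b;           -- generate: this bit produces a carry by itself
  p    <= a xor b;           -- propagate: this bit passes an incoming carry on
  s    <= p xor cin;         -- same equation as the sum block (s = p xor c)
  cout <= g or (p and cin);  -- carry out of this bit position
end Behavioral;
```

In a full Kogge-Stone adder the carry into each bit would come from a parallel-prefix network rather than from the neighbouring slice, so this slice behaves like a ripple-carry stage and is only meant to make the equations concrete.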


Page 11: Fault Tolerance

Conclusions

• FPGAs are a very important computing resource for many different fields in the world today. Their reconfigurability allows for incredible flexibility and reuse. But this benefit comes with a cost.

• Clearly, no single fault tolerance (FT) methodology is significantly better than the others. The best general solution to FPGA FT is probably a combination of device-level (DL) and configuration-level (CL) fault tolerance methodologies. The most likely future advancement in fault tolerance will be in the area of self-adaptation in the presence of faults, which would allow FPGAs to be fault tolerant no matter the environment. As for detection and diagnosis, the focus is always on improving speed, coverage and overhead.



