Multiple Bit Parity Based Concurrent Error Detection Architecture For Parallel CRC Computation
APPENDIX: OVERVIEW OF VLSI
Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining hundreds of thousands of transistors or devices into a single chip. VLSI
began in the 1970s when complex semiconductor and communication technologies were
being developed. The microprocessor is a VLSI device. Before the introduction of VLSI
technology most ICs had a limited set of functions they could perform. An electronic
circuit might consist of a CPU, ROM, RAM and other glue logic. VLSI lets IC designers
add all of these into one chip. The history of the transistor dates to the 1920s when several
inventors attempted devices that were intended to control current in solid-state diodes and
convert them into triodes. Success came after World War II, when the use of silicon and
germanium crystals as radar detectors led to improvements in fabrication and theory.
Scientists who had worked on radar returned to solid-state device development. With the
invention of transistors at Bell Labs in 1947, the field of electronics shifted from vacuum
tubes to solid-state devices.
With the small transistor at their hands, electrical engineers of the 1950s saw the
possibilities of constructing far more advanced circuits. However, as the complexity of
circuits grew, problems arose. One problem was the size of the circuit. A complex circuit such as a computer depended on speed: if the components were large, the wires interconnecting them had to be long, and the electric signals took time to travel through the circuit, slowing the computer. The invention of the integrated circuit by Jack
Kilby and Robert Noyce solved this problem by making all the components and the chip
out of the same block (monolith) of semiconductor material. The circuits could be made
smaller, and the manufacturing process could be automated. This led to the idea of
integrating all components on a single-crystal silicon wafer, which led to small-scale
integration (SSI) in the early 1960s, medium-scale integration (MSI) in the late 1960s,
and then large-scale integration (LSI) as well as VLSI in the 1970s and 1980s, with tens
of thousands of transistors on a single chip (later hundreds of thousands, then millions,
and now billions (10⁹)). The first semiconductor chips held two transistors each.
Subsequent advances added more transistors, and as a consequence, more individual
functions or systems were integrated over time. The first integrated circuits held only a
few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it
Department of Electronics & Communication Engineering 1
possible to fabricate one or more logic gates on a single device. This level of integration, now known retrospectively as small-scale integration (SSI), was followed by improvements in technique that led to devices with hundreds of logic gates, known as medium-scale integration (MSI). Further
improvements led to large-scale integration (LSI), i.e. systems with at least a thousand
logic gates. Current technology has moved far past this mark and
today's microprocessors have many millions of gates and billions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used. But
the huge number of gates and transistors available on common devices has rendered such
fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no
longer in widespread use. In 2008, billion-transistor processors became commercially
available. This became more commonplace as semiconductor fabrication advanced from
the then-current generation of 65 nm processes. Current designs, unlike the earliest
devices, use extensive design automation and automated logic synthesis to lay out the
transistors, enabling higher levels of complexity in the resulting logic functionality.
Certain high-performance logic blocks, like the SRAM (static random-access memory) cell, are still designed by hand to ensure the highest efficiency.
System Requirement
The first step of any design process is to lay down the specifications of the system.
System specification is a high level representation of the system. The factors to be
considered in this process include: performance, functionality, and physical dimensions
(size of the die (chip)). The fabrication technology and design techniques are also
considered. The specification of a system is a compromise between market requirements, technology, and economic viability. The end results are specifications for the size,
speed, power, and functionality of the VLSI system.
Fig VLSI Design Flow
Architectural Design:
The basic architecture of the system is designed in this step. This includes such decisions as RISC (Reduced Instruction Set Computer) versus CISC (Complex Instruction Set Computer), number of ALUs, floating-point units, number and structure
of pipelines, and size of caches among others. The outcome of architectural design is a
Micro-Architectural Specification (MAS). While an MAS is a textual (English-like) description, architects can accurately predict the performance, power, and die size of the design based on such a description.
Behavioral or Functional Design:
In this step, the main functional units of the system are identified. This also identifies the
interconnect requirements between the units. The area, power, and other parameters of
each unit are estimated.
The behavioral aspects of the system are considered without implementation specific
information. For example, it may specify that a multiplication is required, but exactly in
which mode such multiplication may be executed is not specified. We may use a variety
of multiplication hardware depending on the speed and word size requirements. The key
idea is to specify behavior, in terms of input, output and timing of each unit, without
specifying its internal structure.
The outcome of functional design is usually a timing diagram or other relationships
between units. This information leads to improvement of the overall design process and
reduction of the complexity of subsequent phases. Functional or behavioral design
provides quick emulation of the system and allows fast debugging of the full system.
Behavioral design is largely a manual step with little or no automation help available.
Logic Design:
In this step the control flow, word widths, register allocation, arithmetic operations,
and logic operations of the design that represent the functional design are derived and
tested. This description is called Register Transfer Level (RTL) description. RTL is
expressed in a Hardware Description Language (HDL), such as VHDL or Verilog. This
description can be used in simulation and verification. This description consists of
Boolean expressions and timing information. The Boolean expressions are minimized to
achieve the smallest logic design which conforms to the functional design. This logic
design of the system is simulated and tested to verify its correctness. In some special
cases, logic design can be automated using high level synthesis tools. These tools produce
an RTL description from a behavioral description of the design.
Circuit Design:
The purpose of circuit design is to develop a circuit representation based on the logic
design. The Boolean expressions are converted into a circuit representation by taking into
consideration the speed and power requirements of the original design. Circuit Simulation
is used to verify the correctness and timing of each component. The circuit design is
usually expressed in a detailed circuit diagram. This diagram shows the circuit elements
(cells, macros, gates, transistors) and interconnection between these elements. This
representation is also called a net list. Tools used to manually enter such descriptions are called schematic capture tools. In many cases, a net list can be created automatically from the logic (RTL) description by using logic synthesis tools.
Physical Design:
In this step the circuit representation (or net list) is converted into a geometric
representation. As stated earlier, this geometric representation of a circuit is called a
layout. Layout is created by converting each logic component (cells, macros, gates,
transistors) into a geometric representation (specific shapes in multiple layers), which
perform the intended logic function of the corresponding component. Connections
between different components are also expressed as geometric patterns, typically lines in multiple layers.
The exact details of the layout also depend on design rules, which are guidelines based on
the limitations of the fabrication process and the electrical properties of the fabrication
materials. Physical design is a very complex process and therefore it is usually broken
down into various sub-steps. Various verification and validation checks are performed on
the layout during physical design.
In many cases, physical design can be completely or partially automated, and a layout can be generated directly from the net list by layout synthesis tools. Layout synthesis tools, while fast, do have an area and performance penalty, which limits their use to some designs. Manual layout, while slow and labor-intensive, does give better area and performance compared with synthesized layout. However, this advantage may dissipate as larger and larger designs undermine the human capability to comprehend and obtain globally optimized solutions.
Fabrication:
After layout and verification, the design is ready for fabrication. Since layout data is
typically sent to fabrication on a tape, the event of release of data is called Tape Out.
Layout data is converted (or fractured) into photo-lithographic masks, one for each layer.
Masks identify spaces on the wafer where certain materials need to be deposited,
diffused or even removed. Silicon crystals are grown and sliced to produce wafers.
Extremely small dimensions of VLSI devices require that the wafers be polished to near
perfection. The fabrication process consists of several steps involving deposition, and
diffusion of various materials on the wafer. During each step one mask is used. Several
dozen masks may be used to complete the fabrication process.
A large wafer is 20 cm (8 inches) in diameter and can be used to produce hundreds of chips, depending on the size of the chip. Before a chip is mass-produced, a prototype is made
and tested. Industry is rapidly moving towards a 30 cm (12 inch) wafer allowing even
more chips per wafer leading to lower cost per chip.
Packaging, Testing and Debugging:
Finally, the wafer is fabricated and diced into individual chips in a fabrication facility.
Each chip is then packaged and tested to ensure that it meets all the design specifications
and that it functions properly. Chips used in Printed Circuit Boards (PCBs) are packaged
in Dual In-line Package (DIP), Pin Grid Array (PGA), Ball Grid Array (BGA), and Quad
Flat Package (QFP). Chips used in Multi-Chip Modules (MCM) are not packaged, since
MCMs use bare or naked chips.
FRONT-END DESIGN FLOW
In the VLSI industry, design work is mainly divided into two major domains: the first is called the front end, and the second is the VLSI back end.
The VLSI front end covers all the logical design and verification work. In simple terms, all work up to gate-level or RTL-level design and verification is considered VLSI front-end design and verification; there are multiple approaches to the logical design of an IC in the VLSI front end.
Fig Front end design flow
Y Chart
The Gajski-Kuhn Y-chart is a model that captures the considerations in designing semiconductor devices. The three domains of the Gajski-Kuhn Y-chart lie on radial axes. Each of the domains can be divided into levels of abstraction, using concentric rings. At the top level (outer ring), we consider the architecture of the chip; at the lower levels (inner rings), we successively refine the design into finer, more detailed implementations.
Creating a structural description from a behavioral one is achieved through the processes of high-level synthesis or logic synthesis. Creating a physical description from a structural one is achieved through layout synthesis.
Fig Y chart
The design hierarchy involves the principle of "divide and conquer": the task is divided into smaller sub-tasks until each reaches its simplest level. This approach works because the lowest-level components of the design become so simple that their manufacture is easy.
The given task can be carried out in any of the design-flow domains (behavioral, structural, and geometrical). To illustrate, consider the design of a 16-bit adder, as shown in the figure below.
Fig 16-bit Adder
Here, the 16-bit adder chip is divided into four 4-bit adder modules, and each 4-bit adder is in turn divided into 1-bit adders (full adders). One-bit addition is the simplest design step, and its internal circuit is easy to fabricate on the chip. Connecting four 1-bit adders then yields a 4-bit adder, and connecting the four 4-bit adders yields the 16-bit adder.
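This hierarchical decomposition can also be sketched in software. The following Python model is purely illustrative (the function names and the LSB-first bit convention are made up for this sketch) and is not an HDL implementation:

```python
def full_adder(a, b, cin):
    """1-bit full adder: the simplest unit of the hierarchy."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def adder_4(a_bits, b_bits, cin=0):
    """4-bit adder built by chaining four 1-bit full adders (LSB first)."""
    out = []
    for a, b in zip(a_bits, b_bits):
        s, cin = full_adder(a, b, cin)
        out.append(s)
    return out, cin

def adder_16(a_bits, b_bits):
    """16-bit adder composed of four 4-bit adder modules."""
    out, carry = [], 0
    for i in range(0, 16, 4):
        s, carry = adder_4(a_bits[i:i + 4], b_bits[i:i + 4], carry)
        out += s
    return out, carry

# Usage: add 40000 + 20000 as 16-bit vectors (LSB first).
to_bits = lambda n: [(n >> i) & 1 for i in range(16)]
to_int = lambda bits: sum(b << i for i, b in enumerate(bits))
s_bits, cout = adder_16(to_bits(40000), to_bits(20000))
print(to_int(s_bits), cout)   # 60000 0
```

The composition mirrors the figure: the only hand-designed logic is the 1-bit full adder; everything above it is wiring.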
BACK-END DESIGN FLOW
The VLSI back end covers all design and verification work that follows logical (gate-level or RTL) design, as shown in the figure. This includes floorplanning, power planning, clock-tree synthesis, and place and route; timing optimization, DRC and LVS checks, and foundry work such as fabrication and packaging also come under the VLSI back end.
Fig BACK-END DESIGN FLOW
CHAPTER 1
INTRODUCTION
1.1 NETWORK LAYERS
OSI stands for Open Systems Interconnection. It was developed by the ISO (International Organization for Standardization) in 1984. It is a seven-layer architecture, with each layer having specific functionality to perform. All seven layers work collaboratively to transmit data from one person to another across the globe.
Fig 1.1 Layers in OSI Model
1.1.1 Physical Layer (Layer 1):
The lowest layer of the OSI reference model is the physical layer. It is responsible for the actual physical connection between the devices. The physical layer carries information in the form of bits. When receiving data, this layer takes the signal received, converts it into 0s and 1s, and sends them to the Data Link layer, which puts the frame back together.
The functions of the physical layer are:
1. Bit synchronization: The physical layer provides the synchronization of the bits by
providing a clock. This clock controls both sender and receiver, thus providing synchronization at the bit level.
2. Bit rate control: The Physical layer also defines the transmission rate i.e. the
number of bits sent per second.
3. Physical topologies: The physical layer specifies the way in which the different devices/nodes are arranged in a network, i.e. bus, star, or mesh topology.
4. Transmission mode: Physical layer also defines the way in which the data flows
between the two connected devices. The various transmission modes possible are:
Simplex, half-duplex and full-duplex. Hubs, repeaters, modems, and cables are Physical Layer devices. The Network Layer, Data Link Layer, and Physical Layer are also known as the Lower Layers or Hardware Layers.
1.1.2 Data Link Layer (DLL) (Layer 2):
The data link layer is responsible for node-to-node delivery of the message. The main function of this layer is to make sure data transfer is error-free from one node to another, over the physical layer. When a packet arrives in a network, it is the responsibility of the DLL to transmit it to the host using its MAC address.
Data Link Layer is divided into two sub layers:
1. Logical Link Control (LLC)
2. Media Access Control (MAC)
The packet received from the Network layer is further divided into frames depending on the frame size of the NIC (Network Interface Card). The DLL also encapsulates the sender's and receiver's MAC addresses in the header.
The receiver's MAC address is obtained by placing an ARP (Address Resolution Protocol) request onto the wire asking "Who has that IP address?"; the destination host replies with its MAC address.
The functions of the data Link layer are:
1. Framing: Framing is a function of the data link layer. It provides a way for a sender
to transmit a set of bits that are meaningful to the receiver. This can be
accomplished by attaching special bit patterns to the beginning and end of the
frame.
2. Physical addressing: After creating frames, the data link layer adds the physical addresses (MAC addresses) of the sender and/or receiver in the header of each frame.
3. Error control: Data link layer provides the mechanism of error control in which it
detects and retransmits damaged or lost frames.
4. Flow Control: The data rate must be constant on both sides, or else the data may get corrupted; thus, flow control coordinates the amount of data that can be sent before receiving an acknowledgement.
5. Access control: When a single communication channel is shared by multiple
devices, MAC sub-layer of data link layer helps to determine which device has
control over the channel at a given time.
1.1.3 Network Layer (Layer 3):
The network layer handles the transmission of data from one host to another located in a different network. It also takes care of packet routing, i.e. selection of the shortest path to transmit the packet from the routes available. The sender's and receiver's IP addresses are placed in the header by the network layer.
The functions of the Network layer are:
1. Routing: The network layer protocols determine which route is suitable from
source to destination. This function of network layer is known as routing.
2. Logical Addressing: In order to identify each device on an internetwork uniquely, the network layer defines an addressing scheme; such an address distinguishes each device uniquely and universally. A segment at the network layer is referred to as a packet. The network layer is implemented by networking devices such as routers.
1.1.4 Transport Layer (Layer 4):
The transport layer provides services to the application layer and takes services from the network layer. The data in the transport layer is referred to as segments. It is responsible for the end-to-end delivery of the complete message. The transport layer also provides an acknowledgment of successful data transmission and re-transmits the data if an error is found.
• At the sender's side: The transport layer receives the formatted data from the upper layers, performs segmentation, and implements flow and error control to ensure proper data transmission. It also adds the source and destination port numbers in its header and forwards the segmented data to the network layer.
Note: The sender needs to know the port number associated with the receiver's application.
Generally, this destination port number is configured, either by default or manually. For
example, when a web application makes a request to a web server, it typically uses port
number 80, because this is the default port assigned to web applications. Many
applications have default port assigned.
• At the receiver's side: The transport layer reads the port number from its header and forwards the data it has received to the respective application. It also performs sequencing and reassembly of the segmented data.
The functions of the transport layer are:
1. Segmentation and Reassembly: This layer accepts the message from the (session) layer and breaks the message into smaller units. Each segment produced has a header associated with it. The transport layer at the destination station reassembles the message.
2. Service Point Addressing: In order to deliver the message to correct process,
transport layer header includes a type of address called service point address or port
address. Thus by specifying this address, transport layer makes sure that the
message is delivered to the correct process.
The services provided by the transport layer are:
1. Connection-Oriented Service: It is a three-phase process which includes
– Connection Establishment
– Data Transfer
– Termination / Disconnection
In this type of transmission, the receiving device sends an acknowledgment back to the source after a packet or group of packets is received. This type of transmission is reliable and secure.
2. Connectionless Service: It is a one-phase process consisting only of data transfer. In this type of transmission, the receiver does not acknowledge receipt of a packet.
Data in the transport layer is called segments. The transport layer is operated by the operating system: it is a part of the OS and communicates with the application layer by making system calls. The transport layer is called the heart of the OSI model.
1.1.5 Session Layer (Layer 5):
This layer is responsible for establishment of connection, maintenance of sessions,
authentication and also ensures security. The functions of the session layer are :
1. Session establishment, maintenance and termination: The layer allows the two
processes to establish, use and terminate a connection.
2. Synchronization: This layer allows a process to add checkpoints, considered synchronization points, into the data. These synchronization points help to identify errors so that the data is re-synchronized properly, the ends of the messages are not cut prematurely, and data loss is avoided.
3. Dialog Controller: The session layer allows two systems to start communication
with each other in half-duplex or full-duplex.
The top three layers (Session, Presentation, and Application) are integrated as a single layer, the "Application Layer", in the TCP/IP model. Implementation of these three layers is done by the network application itself. They are also known as the Upper Layers or Software Layers.
SCENARIO:
Let’s consider a scenario where a user wants to send a message through some
Messenger application running in his browser. The “Messenger” here acts as the
application layer which provides the user with an interface to create the data. This
message, or so-called data, is compressed and, if it carries secure data, encrypted.
Fig 1.1.5 General Block Diagram of Communication
1.1.6 Presentation Layer (Layer 6):
The presentation layer is also called the translation layer. The data from the application layer is extracted here and manipulated into the required format for transmission over the network.
The functions of the presentation layer are :
1. Translation: For example, ASCII to EBCDIC.
2. Encryption/ Decryption: Data encryption translates the data into another form or
code. The encrypted data is known as the cipher text and the decrypted data is
known as plain text. A key value is used for encrypting as well as decrypting data.
3. Compression: Reduces the number of bits that need to be transmitted on the
network.
1.1.7 Application Layer (Layer 7):
At the very top of the OSI reference model stack of layers, we find the Application layer, which is implemented by the network applications. These applications produce the data which has to be transferred over the network. This layer also serves as a window for the application services to access the network and for displaying the received information to the user. The services provided by the application layer include:
1. Network Virtual Terminal
2. FTAM-File transfer access and management
3. Mail Services
4. Directory Services
The OSI model acts as a reference model and is not implemented in the Internet because of its late arrival; the model currently in use is the TCP/IP model.
There are many causes, such as noise and cross-talk, that may corrupt data during transmission. The upper layers work on a generalized view of the network architecture and are not aware of the actual hardware data processing; hence, the upper layers expect error-free transmission between the systems. Most applications would not function as expected if they received erroneous data. Applications such as voice and video may be less affected, and with some errors they may still function well.
The data-link layer uses error control mechanisms to ensure that frames (data bit streams) are transmitted with a certain level of accuracy. But to understand how errors are controlled, it is essential to know what types of errors may occur.
1.2 TYPES OF ERRORS
There may be three types of errors:
Single bit error
Only one bit of the frame, at any position, is corrupted.
Multiple bits error
The frame is received with more than one bit in a corrupted state.
Burst error
The frame contains more than one consecutive corrupted bit.
Error control mechanisms may involve two possible approaches:
Error detection
Error correction
1.2.1 Error Detection:
Errors in the received frames are detected by means of a Parity Check and the Cyclic Redundancy Check (CRC). In both cases, a few extra bits are sent along with the actual data to confirm that the bits received at the other end are the same as those sent. If the counter-check at the receiver's end fails, the bits are considered corrupted.
1.2.2 Parity Check:
One extra bit is sent along with the original bits to make number of 1s either even in case
of even parity, or odd in case of odd parity.
The sender, while creating a frame, counts the number of 1s in it. For example, if even
parity is used and number of 1s is even then one bit with value 0 is added. This way
number of 1s remains even. If the number of 1s is odd, to make it even a bit with value 1
is added.
The receiver simply counts the number of 1s in a frame. If even parity is used and the count of 1s is even, the frame is considered not corrupted and is accepted; likewise, if odd parity is used and the count of 1s is odd, the frame is considered not corrupted.
If a single bit flips in transit, the receiver can detect it by counting the number of 1s. But when more than one bit is erroneous, it is very hard for the receiver to detect the error.
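The scheme just described can be sketched in a few lines of Python (the function names are made up for this illustration):

```python
def add_even_parity(bits: str) -> str:
    """Sender: append one bit so the total count of 1s becomes even."""
    return bits + ("1" if bits.count("1") % 2 else "0")

def accept_even_parity(frame: str) -> bool:
    """Receiver: accept the frame only if its count of 1s is even."""
    return frame.count("1") % 2 == 0

def flip(s: str, i: int) -> str:
    """Simulate a transmission error by inverting bit i."""
    return s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]

frame = add_even_parity("1011001")             # four 1s, so "0" is appended
assert accept_even_parity(frame)               # error-free frame is accepted
assert not accept_even_parity(flip(frame, 0))  # a single flip is detected
assert accept_even_parity(flip(flip(frame, 0), 1))  # two flips cancel out
```

The last line demonstrates the weakness noted above: two bit flips restore even parity, so the corrupted frame is wrongly accepted.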
1.2.3 Cyclic Redundancy Check (CRC)
CRC is a different approach to detect if the received frame contains valid data. This
technique involves binary division of the data bits being sent. The divisor is generated
using polynomials. The sender performs a division operation on the bits being sent and
calculates the remainder. Before sending the actual bits, the sender adds the remainder at
the end of the actual bits. The actual data bits plus the remainder form a codeword; the sender transmits the data bits as codewords.
At the other end, the receiver performs the division operation on the codewords using the same CRC divisor. If the remainder contains all zeros, the data bits are accepted; otherwise, it is assumed that some data corruption occurred in transit.
Error Correction
In the digital world, error correction can be done in two ways:
Backward Error Correction: When the receiver detects an error in the data received, it requests the sender to retransmit the data unit.
Forward Error Correction: When the receiver detects some error in the data
received, it executes error-correcting code, which helps it to auto-recover and to
correct some kinds of errors.
The first, Backward Error Correction, is simple and can be used efficiently only where retransmission is not expensive, for example over optical fiber. But in the case of wireless transmission, retransmission may cost too much; in that case, Forward Error Correction is used.
To correct an error in a data frame, the receiver must know exactly which bit in the frame is corrupted. To locate the bit in error, redundant bits are used as parity bits for error detection. For example, if we take ASCII words (7 data bits), there are 8 kinds of information we need: seven states to tell us which bit is in error, and one more to tell us that there is no error.
For m data bits, r redundant bits are used. r bits can represent 2^r combinations of information. In an (m+r)-bit codeword, there is a possibility that the r bits themselves may get corrupted. So the number of r bits used must be able to indicate all m+r bit locations plus the no-error case, i.e. 2^r ≥ m + r + 1.
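The inequality can be evaluated directly. A small Python sketch (the helper name is made up for illustration):

```python
def redundant_bits(m: int) -> int:
    """Smallest r with 2**r >= m + r + 1: enough states to name
    every one of the m+r bit positions plus the 'no error' case."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r

print(redundant_bits(7))   # 4: a 7-bit ASCII word needs 4 parity bits
```

For m = 7 this gives r = 4, since 2^4 = 16 ≥ 7 + 4 + 1 = 12, while 2^3 = 8 falls short.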
1.2.4 Generator Polynomials
Why is the predetermined (c+1)-bit divisor that's used to calculate a CRC called a generator polynomial? In my opinion, far too many explanations of CRCs actually try to answer
that question. This leads their authors and readers down a long path that involves tons of
detail about polynomial arithmetic and the mathematical basis for the usefulness of
CRCs. This academic stuff is not important for understanding CRCs sufficiently to
implement and/or use them and serves only to create potential confusion. So I'm not
going to answer that question here.
Suffice it to say here only that the divisor is sometimes called a generator polynomial and
that you should never make up the divisor's value on your own. Several mathematically
well-understood generator polynomials have been adopted as parts of various
international communications standards; you should always use one of those. If you have
a background in polynomial arithmetic then you know that certain generator polynomials
are better than others for producing strong checksums. The ones that have been adopted
internationally are among the best of these.
Table 1 lists some of the most commonly used generator polynomials for 16- and 32-bit
CRCs. Remember that the width of the divisor is always one bit wider than the remainder.
So, for example, you'd use a 17-bit generator polynomial whenever a 16-bit checksum is
required.
Checksum     Width     Generator Polynomial
CRC-CCITT    16 bits   10001000000100001
CRC-16       16 bits   11000000000000101
CRC-32       32 bits   100000100110000010001110110110111
Table 1: International standard CRC polynomials
As is the case with other types of checksums, the width of the CRC plays an important
role in the error detection capabilities of the algorithm. Ignoring special types of errors
that are always detected by a particular checksum algorithm, the percentage of detectable
errors is limited strictly by the width of a checksum. A checksum of c bits can only take
one of 2^c unique values. Since the number of possible messages is significantly larger
than that, the potential exists for two or more messages to have an identical checksum. If
one of those messages is somehow transformed into one of the others during
transmission, the checksum will appear correct and the receiver will unknowingly accept
a bad message. The chance of this happening is directly related to the width of the
checksum. Specifically, the chance of such an error is 1/2^c. Therefore, the probability of
any random error being detected is 1 - 1/2^c.
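These figures follow directly from the 1 - 1/2^c formula; a quick sketch (the function name is ours):

```python
def detection_rate(c):
    # probability that a random error produces a different c-bit checksum
    return 1 - 1 / 2 ** c

for c in (8, 16, 32):
    print(f"{c}-bit checksum: {detection_rate(c):.4%} of random errors detected")
```

For c = 8 this prints 99.6094%, matching the figure quoted for an eight-bit checksum below.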
To repeat, the probability of detecting any random error increases as the width of the
checksum increases. Specifically, a 16-bit checksum will detect 99.9985% of all errors.
This is far better than the 99.6094% detection rate of an eight-bit checksum, but not
nearly as good as the 99.9999% detection rate of a 32-bit checksum. All of this applies to
both CRCs and addition-based checksums. What really sets CRCs apart, however, is the
number of special cases that can be detected 100% of the time. For example, I pointed out
earlier that two opposite bit inversions (one bit becoming 0, the other becoming 1) in
the same column of an addition would cause the error to be undetected. Well, that's not
the case with a CRC.
By using one of the mathematically well-understood generator polynomials like those in
Table 1 to calculate a checksum, it's possible to state that the following types of errors
will be detected without fail:
A message with any one bit in error
A message with any two bits in error (no matter how far apart, which column, and
so on)
A message with any odd number of bits in error (no matter where they are)
A message with an error burst as wide as the checksum itself
The first class of detectable error is also detected by an addition-based checksum, or even
a simple parity bit. However, the middle two classes of errors represent much stronger
detection capabilities than those other types of checksum. The fourth class of detectable
error sounds at first similar to a class of errors detected by addition-based checksums, but
in the case of CRCs the guarantee of detecting any odd number of bit errors still applies.
So the set of error bursts too wide to be detected with certainty is limited to those with an
even number of bit errors. All other types of errors fall into the relatively high 1 - 1/2^c
probability of detection.
1.2.5 Ethernet, SLIP, and PPP
Ethernet, like most physical layer protocols, employs a CRC rather than an additive
checksum. Specifically, it employs the CRC-32 algorithm. The likelihood of an error in a
packet sent over Ethernet being undetected is, therefore, extremely low. Many types of
common transmission errors are detected 100% of the time, with the less likely ones
detected 99.9999% of the time. Even if an error would somehow manage to get through at
the Ethernet layer, it would probably be detected at the IP layer checksum (if the error is
in the IP header) or by the TCP or UDP layer checksum above that. After all, the chance
of two or more different checksum algorithms failing to detect the same error is extremely
remote.
However, many embedded systems that use TCP/IP will not employ Ethernet. Instead,
they will use either the serial line Internet protocol (SLIP) or point-to-point protocol
(PPP) to send and receive IP packets directly over a serial connection of some sort.
Unfortunately, SLIP does not add a checksum or a CRC to the data from the layers above.
So unless a pair of modems with error correction capabilities sits in between the two
communicating systems, any transmission errors must hope to be detected by the
relatively weak, addition-based Internet checksum described earlier. The newer,
compressed SLIP (CSLIP) shares this weakness with its predecessor. PPP, on the other
hand, does include a 16-bit CRC in each of its frames, which can carry the same
maximum size IP packet as an Ethernet frame. So while PPP doesn't offer the same
amount of error detection capability as Ethernet, by using PPP you'll at least avoid the
much larger number of undetected errors that may occur with SLIP or CSLIP.
CHAPTER 2
CRC (CYCLIC REDUNDANCY CHECK)
2.1 INTRODUCTION
The recent advancements in wireless technology have brought forth very high speed
transmission which faces a major challenge from noisy channels. Factors such as signal
attenuation, multiple access interference, inter symbol interference and Doppler shift
degrade the quality of data to a great extent. As a result, the received signal has a chance
of getting corrupted. Consequently, for the alleviation of this problem, error control
coding is not only desirable, but has become a must to achieve an acceptable Bit Error
Rate (BER). For the purpose of checking the integrity of the data being stored or sent
over noisy channels, the most widely adopted among all the error control codes is Cyclic
Redundancy Check Code (CRC). CRCs, first introduced by Peterson and Brown in 1961
are used for the detection of any corruption of the digital signal during its production,
transmission, processing as well as storage stages. Recent communication technologies,
namely Asynchronous Transfer Mode (ATM) and IEEE standards such as wired
Ethernet (IEEE 802.3), Wi-Fi (IEEE 802.11) and WiMAX (IEEE 802.16), employ CRC
for error detection purposes. A well-written treatment of CRC computation, along with a
collection of most of the published hardware and software implementations of CRC, can
be found in the literature. Various parallel hardware structures have been described, and
several works have proposed highly efficient architectures based on performance
enhancement as well as resource utilization. The list also includes Two-Step, Cascade,
Look-Ahead, State-Space Transformed, and Retimed Architectures. The authors have
also proposed a three-step
LFSR architecture which makes it easy to address the two major issues of large fanout
limitation and iteration bound limitation in parallel LFSR architecture which is an integral
part of CRC computation. The authors have also proposed an improved solution for the
fanout problem. The authors have shown that the proposed architecture achieves
significant improvement in terms of processing speed and hardware efficiency. The
authors have extended the above idea and presented a mathematical proof showing that a
transformation exists in state space that can reduce the complexity of the parallel LFSR
feedback loop along with a new idea for high speed parallel implementation of linear
feedback shift registers based upon parallel IIR filter. In some other methods the authors
have proposed a novel parallel implementation of long BCH encoders to achieve
significant speedup by eliminating the effect of large fan out. The authors have mentioned
that similar technique can also be used for CRC.
Rapid advancement of semiconductor technologies results in rapidly increasing soft error
rates as device area, timing margins and operating power shrink. Temporary faults caused
by cosmic radiation, known as single event upsets (SEUs), are the main sources of errors.
Much attention has been paid to improving the reliability of the elementary components
themselves, reflected by their much lower failure rates. But the large number and dense
population of these elements on the chip degrade the reliability to a large extent. While
error detection codes
and error correction codes have long been used for checking errors, the reliability of the
error correction or detection encoder itself has come up as a major issue. High speed data
transmission requires high speed error detection and parallel processing at the cost of
increased hardware, which increases the probability of internal faults occurring.
As a result, fault detection of encoder and decoder have gained a lot of importance. The
authors have addressed the problem of designing parallel fault-secure encoders by
generating not only error correcting check bits, but also independently and in parallel,
error detecting check bits. The next step involves the comparison of error detection check
bits with another set of error detecting check bits generated from error correction check
bits. This design is significantly cost effective compared to duplication with comparison
technique. Several other works deal with similar topics. In some
papers the authors have also discussed the design of a fault secure encoder in between a
fault secure RAM and a faulty channel. Error detection and error correction using Residue
Number System have been discussed, as has fault detection for the Reed-Solomon
encoder. The authors have presented a new approach for the automated
synthesis of multilevel circuits with concurrent error detection using a parity-check code.
They have developed a new procedure namely structure-constrained logic optimization
and proved that this design along with a checker forms a self-checking circuit. Our
proposed method of the multiple bit parity based concurrent fault detection scheme for
parallel CRC computation inherently exploits the non-overlapping feature of the parity
generation circuits. We have presented a design concept of multiple-bit parity-based
concurrent fault detection for parallel CRC architecture. In the proposed scheme, the
generated CRC output is divided into a few blocks based on the number of parity bits and
has been made to go through a multiple-bit parity comparison unit. Then, the actual parity
bits are compared with the predicted multiple-bit parity of the blocks to incorporate the
fault detection architecture.
In the cyclic redundancy check, a fixed number of check bits, often called a checksum,
are appended to the message that needs to be transmitted. The data receivers receive the
data and inspect the check bits for any errors. Mathematically, data receivers check on the
check value attached by finding the remainder of the polynomial division of the contents
transmitted. If it seems that an error has occurred, a negative acknowledgement is
transmitted asking for data retransmission.
A cyclic redundancy check is also applied to storage devices like hard disks. In this
case, check bits are allocated to each block in the hard disk. When a corrupt or incomplete
file is read by the computer, the cyclic redundancy error is reported. This could be from
another storage device or from CD/DVDs. The common reasons for errors include system
crashes, incomplete or corrupt files, or files with lots of bugs.
CRC polynomial designs depend on the length of the block to be protected, the desired
error protection features, the resources available for CRC implementation, and the
required performance.
A cyclic redundancy check (CRC) is an error-detecting code commonly used in
digital networks and storage devices to detect accidental changes to raw data. Blocks of
data entering these systems get a short check value attached, based on the remainder of
a polynomial division of their contents. On retrieval, the calculation is repeated and, in
the event the check values do not match, corrective action can be taken against data
corruption. CRCs can be used for error correction.
CRCs are so called because the check (data verification) value is a redundancy (it
expands the message without adding information) and the algorithm is based
on cyclic codes. CRCs are popular because they are simple to implement in
binary hardware, easy to analyze mathematically, and particularly good at detecting
common errors caused by noise in transmission channels. Because the check value has a
fixed length, the function that generates it is occasionally used as a hash function.
The CRC was invented by W. Wesley Peterson in 1961; the 32-bit CRC function, used
in Ethernet and many other standards, is the work of several researchers and was
published in 1975.
CRCs are based on the theory of cyclic error-correcting codes. The use
of systematic cyclic codes, which encode messages by adding a fixed-length check value,
for the purpose of error detection in communication networks, was first proposed by W.
Wesley Peterson in 1961. Cyclic codes are not only simple to implement but have the
benefit of being particularly well suited for the detection of burst errors: contiguous
sequences of erroneous data symbols in messages. This is important because burst errors
are common transmission errors in many communication channels, including magnetic
and optical storage devices. Typically an n-bit CRC applied to a data block of arbitrary
length will detect any single error burst not longer than n bits, and the fraction of all
longer error bursts that it will detect is 1 - 2^(-n).
Specification of a CRC code requires definition of a so-called generator polynomial.
This polynomial becomes the divisor in a polynomial long division, which takes the
message as the dividend and in which the quotient is discarded and
the remainder becomes the result. The important caveat is that the
polynomial coefficients are calculated according to the arithmetic of a finite field, so the
addition operation can always be performed bitwise-parallel (there is no carry between
digits).
In practice, all commonly used CRCs employ the Galois field of two elements, GF(2).
The two elements are usually called 0 and 1, comfortably matching computer
architecture.
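The long-division view above can be sketched directly on bit strings. The helper name is ours and the small divisor 1011 (x^3 + x + 1) is chosen for illustration only; real standards use the wider divisors of Table 1:

```python
def crc_remainder(message_bits, divisor_bits):
    # message_bits, divisor_bits: strings of '0'/'1'; a divisor of n+1 bits
    # yields an n-bit remainder (the check value)
    n = len(divisor_bits) - 1
    dividend = [int(b) for b in message_bits] + [0] * n  # zero-augmentation
    divisor = [int(b) for b in divisor_bits]
    for i in range(len(message_bits)):
        if dividend[i]:                      # leading bit is 1: XOR-subtract
            for j, d in enumerate(divisor):
                dividend[i + j] ^= d
    return ''.join(str(b) for b in dividend[-n:])
```

Appending the remainder to the message makes a second division come out to zero, which is exactly the check the receiver performs.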
A CRC is called an n-bit CRC when its check value is n bits long. For a given n, multiple
CRCs are possible, each with a different polynomial. Such a polynomial has highest
degree n, which means it has n + 1 terms. In other words, the polynomial has a length
of n + 1; its encoding requires n + 1 bits. Note that most polynomial specifications either
drop the MSB or LSB, since they are always 1. The CRC and associated polynomial
typically have a name of the form CRC-n-XXX as in the table below.
The simplest error-detection system, the parity bit, is in fact a 1-bit CRC: it uses the
generator polynomial x + 1 (two terms), and has the name CRC-1.
2.2 DATA INTEGRITY
CRCs are specifically designed to protect against common types of errors on
communication channels, where they can provide quick and reasonable assurance of
the integrity of messages delivered. However, they are not suitable for protecting against
intentional alteration of data.
Firstly, as there is no authentication, an attacker can edit a message and recompute the
CRC without the substitution being detected. When stored alongside the data, CRCs and
cryptographic hash functions by themselves do not protect
against intentional modification of data. Any application that requires protection against
such attacks must use cryptographic authentication mechanisms, such as message
authentication codes or digital signatures (which are commonly based on cryptographic
hash functions).
Secondly, unlike cryptographic hash functions, CRC is an easily reversible function,
which makes it unsuitable for use in digital signatures.[4]
Thirdly, CRC is a linear function with the property that
crc(x XOR y XOR z) = crc(x) XOR crc(y) XOR crc(z); as a result, even if the CRC is
encrypted with a stream cipher that uses XOR as its combining operation
(or mode of block cipher which effectively turns it into a stream cipher, such as OFB or
CFB), both the message and the associated CRC can be manipulated without knowledge
of the encryption key; this was one of the well-known design flaws of the Wired
Equivalent Privacy (WEP) protocol.
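The linearity property behind this flaw can be demonstrated with a toy CRC. The parameters here (a 4-bit CRC over x^4 + x + 1 on 8-bit messages) are illustrative only; WEP used CRC-32, but the algebra is the same:

```python
def crc_gf2(msg, msg_bits, poly=0b10011, n=4):
    # remainder of msg * x^n divided by poly (here x^4 + x + 1) over GF(2)
    reg = msg << n
    for i in reversed(range(msg_bits)):
        if reg & (1 << (i + n)):
            reg ^= poly << i
    return reg

# linearity: crc(x ^ d) == crc(x) ^ crc(d), so an attacker who XORs a chosen
# difference d into the message can patch the CRC with crc(d) -- without ever
# knowing the plaintext or the stream-cipher key
x, d = 0b10110101, 0b00001100
forged_crc = crc_gf2(x, 8) ^ crc_gf2(d, 8)
assert crc_gf2(x ^ d, 8) == forged_crc
```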
2.3 DESIGNING POLYNOMIALS
The selection of the generator polynomial is the most important part of implementing
the CRC algorithm. The polynomial must be chosen to maximize the error-detecting
capabilities while minimizing overall collision probabilities.
The most important attribute of the polynomial is its length (the largest degree, or
exponent, of any one term, plus 1), because of its direct influence on the length of the
computed check value.
The most commonly used polynomial lengths are:
9 bits (CRC-8)
17 bits (CRC-16)
33 bits (CRC-32)
65 bits (CRC-64)
A CRC is called an n-bit CRC when its check value is n bits long. For a given n, multiple
CRCs are possible, each with a different polynomial. Such a polynomial has highest
degree n, and hence n + 1 terms (the polynomial has a length of n + 1). The remainder has
length n. The CRC has a name of the form CRC-n-XXX.
The design of the CRC polynomial depends on the maximum total length of the block to
be protected (data + CRC bits), the desired error protection features, and the type of
resources for implementing the CRC, as well as the desired performance. A common
misconception is that the "best" CRC polynomials are derived from either irreducible
polynomials or irreducible polynomials times the factor 1 + x, which adds to the code the
ability to detect all errors affecting an odd number of bits.[7] In reality, all the factors
described above should enter into the selection of the polynomial and may lead to a
reducible polynomial. However, choosing a reducible polynomial will result in a certain
proportion of missed errors, due to the quotient ring having zero divisors.
The advantage of choosing a primitive polynomial as the generator for a CRC code is
that the resulting code has maximal total block length in the sense that all 1-bit errors
within that block length have different remainders (also called syndromes) and therefore,
since the remainder is a linear function of the block, the code can detect all 2-bit errors
within that block length. If r is the degree of the primitive generator polynomial, then the
maximal total block length is 2^r - 1, and the associated code is able to detect any
single-bit or double-bit errors. We can improve this situation. If we use the generator
polynomial g(x) = p(x)(1 + x), where p(x) is a primitive polynomial of degree r - 1, then
the maximal total block length is 2^(r-1) - 1, and the code is able to detect single,
double, triple and any odd number of errors.
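The distinct-syndrome property can be checked exhaustively for a small primitive polynomial, e.g. x^4 + x + 1 with maximal block length 2^4 - 1 = 15 (illustrative code, helper name ours):

```python
def gf2_mod(val, poly):
    # reduce val modulo poly over GF(2) by repeated XOR-subtraction
    n = poly.bit_length() - 1
    while val.bit_length() > n:
        val ^= poly << (val.bit_length() - poly.bit_length())
    return val

# every single-bit error x^i within the 15-bit block leaves a different,
# non-zero remainder (syndrome), so all 1-bit and 2-bit errors are detected
syndromes = [gf2_mod(1 << i, 0b10011) for i in range(15)]
assert len(set(syndromes)) == 15
```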
A polynomial that admits other factorizations may be chosen then so as to balance the
maximal total block length with a desired error detection power. The BCH codes are a
powerful class of such polynomials. They subsume the two examples above. Regardless
of the reducibility properties of a generator polynomial of degree r, if it includes the "+1"
term, the code will be able to detect error patterns that are confined to a window
of r contiguous bits. These patterns are called "error bursts".
2.4 SPECIFICATION
The concept of the CRC as an error-detecting code gets complicated when an
implementer or standards committee uses it to design a practical system. Here are some of
the complications:
Sometimes an implementation prefixes a fixed bit pattern to the bitstream to be
checked. This is useful when clocking errors might insert 0-bits in front of a message,
an alteration that would otherwise leave the check value unchanged.
Usually, but not always, an implementation appends n 0-bits (n being the size of the
CRC) to the bit stream to be checked before the polynomial division occurs. Such
appending is explicitly demonstrated in the Computation of CRC article. This has the
convenience that the remainder of the original bit stream with the check value
appended is exactly zero, so the CRC can be checked simply by performing the
polynomial division on the received bit stream and comparing the remainder with
zero. Due to the associative and commutative properties of the exclusive-or operation,
practical table driven implementations can obtain a result numerically equivalent to
zero-appending without explicitly appending any zeroes, by using an equivalent,
faster algorithm that combines the message bit stream with the stream being shifted
out of the CRC register.
Sometimes an implementation exclusive-ORs a fixed bit pattern into the remainder of
the polynomial division.
Bit order: Some schemes view the low-order bit of each byte as "first", which then
during polynomial division means "leftmost", which is contrary to our customary
understanding of "low-order". This convention makes sense when serial-
port transmissions are CRC-checked in hardware, because some widespread serial-
port transmission conventions transmit bytes least-significant bit first.
Byte order: With multi-byte CRCs, there can be confusion over whether the byte
transmitted first (or stored in the lowest-addressed byte of memory) is the least-
significant byte (LSB) or the most-significant byte (MSB). For example, some 16-bit
CRC schemes swap the bytes of the check value.
Omission of the high-order bit of the divisor polynomial: Since the high-order bit is
always 1, and since an n-bit CRC must be defined by an (n + 1)-bit divisor
which overflows an n-bit register, some writers assume that it is unnecessary to
mention the divisor's high-order bit.
Omission of the low-order bit of the divisor polynomial: Since the low-order bit is
always 1, authors such as Philip Koopman represent polynomials with their high-
order bit intact, but without the low-order bit. This convention encodes the
polynomial complete with its degree in one integer.
These complications mean that there are three common ways to express a polynomial
as an integer: the first two, which are mirror images in binary, are the constants found in
code; the third is the number found in Koopman's papers. In each case, one term is
omitted. So, for example, the polynomial x^4 + x + 1 may be transcribed as:
0x3 = 0b0011, representing the coefficients of x^3 down to x^0, the leading x^4 term
being implicit (MSB-first code)
0xC = 0b1100, the bit-reversal of the above (LSB-first code)
0x9 = 0b1001, representing the coefficients of x^4 down to x^1, the trailing +1 being
implicit (Koopman notation)
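The three integer encodings can be derived mechanically from the full (n+1)-bit polynomial; the helper name below is ours:

```python
def poly_representations(full_poly):
    # full_poly holds every coefficient, e.g. 0b10011 for x^4 + x + 1
    n = full_poly.bit_length() - 1
    msb_first = full_poly & ((1 << n) - 1)          # drop the implicit x^n bit
    lsb_first = int(f"{msb_first:0{n}b}"[::-1], 2)  # bit-reverse that n-bit value
    koopman = full_poly >> 1                        # drop the implicit +1 bit
    return msb_first, lsb_first, koopman
```

poly_representations(0b10011) gives (0x3, 0xC, 0x9), matching the three notations above; the same routine applied to CRC-CCITT's full divisor from Table 1 yields the familiar constants 0x1021, 0x8408 and 0x8810.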
In the table below they are shown as:
Table 2: CRC Representations
2.5 STANDARDS AND COMMON USE
Numerous varieties of cyclic redundancy checks have been incorporated into technical
standards. By no means does one algorithm, or one of each degree, suit every
purpose; Koopman and Chakravarty recommend selecting a polynomial according to the
application requirements and the expected distribution of message lengths. The number of
distinct CRCs in use has confused developers, a situation which authors have sought to
address.[7] There are three polynomials reported for CRC-12, nineteen conflicting
definitions of CRC-16, and seven of CRC-32.
The polynomials commonly applied are not the most efficient ones possible. Since
1993, Koopman, Castagnoli and others have surveyed the space of polynomials between
3 and 64 bits in size, finding examples that have much better performance (in terms
of Hamming distance for a given message size) than the polynomials of earlier protocols,
and publishing the best of these with the aim of improving the error detection capacity of
future standards. In particular, iSCSI and SCTP have adopted one of the findings of this
research, the CRC-32C (Castagnoli) polynomial.
The design of the 32-bit polynomial most commonly used by standards bodies, CRC-
32-IEEE, was the result of a joint effort for the Rome Laboratory and the Air Force
Electronic Systems Division by Joseph Hammond, James Brown and Shyan-Shiang Liu
of the Georgia Institute of Technology and Kenneth Brayer of the MITRE Corporation.
The earliest known appearances of the 32-bit polynomial were in their 1975 publications:
Technical Report 2956 by Brayer for MITRE, published in January and released for
public dissemination through DTIC in August, and Hammond, Brown and Liu's report for
the Rome Laboratory, published in May. Both reports contained contributions from the
other team. During December 1975, Brayer and Hammond presented their work in a
paper at the IEEE National Telecommunications Conference: the IEEE CRC-32
polynomial is the generating polynomial of a Hamming code and was selected for its
error detection performance. Even so, the Castagnoli CRC-32C polynomial used in iSCSI
or SCTP matches its performance on messages from 58 bits to 131 kbits, and
outperforms it in several size ranges including the two most common sizes of Internet
packet. The ITU-T G.hn standard also uses CRC-32C to detect errors in the payload
(although it uses CRC-16-CCITT for PHY headers).
CRC32 computation is implemented in hardware as an operation of the SSE4.2
instruction set, first introduced in Intel processors' Nehalem microarchitecture.
CHAPTER 3
LINEAR-FEEDBACK SHIFT REGISTER
3.1 FIBONACCI LFSR
In computing, a linear-feedback shift register (LFSR) is a shift register whose input bit
is a linear function of its previous state.
The most commonly used linear function of single bits is exclusive-or (XOR). Thus, an
LFSR is most often a shift register whose input bit is driven by the XOR of some bits of
the overall shift register value.
The initial value of the LFSR is called the seed, and because the operation of the
register is deterministic, the stream of values produced by the register is completely
determined by its current (or previous) state. Likewise, because the register has a finite
number of possible states, it must eventually enter a repeating cycle. However, an LFSR
with a well-chosen feedback function can produce a sequence of bits that appears random
and has a very long cycle. Applications of LFSRs include generating pseudo-random
numbers, pseudo-noise sequences, fast digital counters, and whitening sequences. Both
hardware and software implementations of LFSRs are common.
The mathematics of a cyclic redundancy check, used to provide a quick check against
transmission errors, are closely related to those of an LFSR. The bit positions that affect
the next state are called the taps. In the diagram the taps are [16,14,13,11]. The rightmost
bit of the LFSR is called the output bit. The taps are XOR'd sequentially with the output
bit and then fed back into the leftmost bit. The sequence of bits in the rightmost position
is called the output stream.
The bits in the LFSR state that influence the input are called taps. A maximum-length
LFSR produces an m-sequence (i.e., it cycles through all possible 2^m - 1 states within
the shift register except the state where all bits are zero), unless it contains all zeros, in
which case it will never change.
As an alternative to the XOR-based feedback in an LFSR, one can also use XNOR.[2]
This function is an affine map, not strictly a linear map, but it results in an equivalent
polynomial counter whose state is the complement of the state of an LFSR. A state with
all ones is illegal when using an XNOR feedback, in the same way as a state with all
zeroes is illegal when using XOR. This state is considered illegal because the counter
would remain "locked-up" in this state. The sequence of numbers generated by an LFSR
or its XNOR counterpart can be considered a binary numeral system just as valid as Gray
code or the natural binary code. The arrangement of taps for feedback in an LFSR can be
expressed in finite field arithmetic as a polynomial mod 2. This means that the
coefficients of the polynomial must be 1s or 0s. This is called the feedback polynomial or
reciprocal characteristic polynomial. For example, if the taps are at the 16th, 14th, 13th
and 11th bits (as shown), the feedback polynomial is x^16 + x^14 + x^13 + x^11 + 1.
The "one" in the polynomial does not correspond to a tap – it corresponds to the input
to the first bit (i.e. x0, which is equivalent to 1). The powers of the terms represent the
tapped bits, counting from the left. The first and last bits are always connected as an input
and output tap respectively. The LFSR is maximal-length if and only if the corresponding
feedback polynomial is primitive. This means that the following conditions are necessary
(but not sufficient): the number of taps is even, and the set of taps is setwise co-prime, i.e.,
there must be no divisor other than 1 common to all taps. Tables of primitive polynomials
from which maximum-length LFSRs can be constructed are given below and in the
references. There can be more than one maximum-length tap sequence for a given LFSR
length. Also, once one maximum-length tap sequence has been found, another
automatically follows. If the tap sequence in an n-bit LFSR is [n, A, B, C, 0], where the 0
corresponds to the x0 = 1 term, then the corresponding "mirror" sequence is [n, n − C, n −
B, n − A, 0]. So the tap sequence [32, 22, 2, 1, 0] has as its counterpart [32, 31, 30, 10, 0].
Both give a maximum-length sequence.
Fig 3.1: A 16-bit Fibonacci LFSR
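The tap set [16, 14, 13, 11] of the figure corresponds to XORing bits 0, 2, 3 and 5 of a 16-bit register word, counting from the output end; a direct software sketch (function name and seed are illustrative):

```python
def fibonacci_lfsr_period(seed=0xACE1):
    # 16-bit Fibonacci LFSR, taps [16, 14, 13, 11]
    # (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1)
    state, period = seed, 0
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)   # shift right, feedback on the left
        period += 1
        if state == seed:
            return period
```

Because this feedback polynomial is primitive, the register visits all 2^16 - 1 = 65535 non-zero states before repeating, regardless of the non-zero seed chosen.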
3.2 GALOIS LFSRs
Named after the French mathematician Évariste Galois, an LFSR in Galois configuration,
which is also known as modular, internal XORs, or one-to-many LFSR, is an alternate
structure that can generate the same output stream as a conventional LFSR (but offset in
time). In the Galois configuration, when the system is clocked, bits that are not taps are
shifted one position to the right unchanged. The taps, on the other hand, are XORed with
the output bit before they are stored in the next position. The new output bit is the next
input bit. The effect of this is that when the output bit is zero, all the bits in the register
shift to the right unchanged, and the input bit becomes zero. When the output bit is one,
the bits in the tap positions all flip (if they are 0, they become 1, and if they are 1, they
become 0), and then the entire register is shifted to the right and the input bit becomes 1.
To generate the same output stream, the order of the taps is the counterpart (see above) of
the order for the conventional LFSR, otherwise the stream will be in reverse. Note that
the internal state of the LFSR is not necessarily the same. The Galois register shown has
the same output stream as the Fibonacci register in the first section. A time offset exists
between the streams, so a different start point will be needed to get the same output each
cycle.
Galois LFSRs do not concatenate every tap to produce the new input (the XORing is
done within the LFSR, and no XOR gates are chained in series, so the propagation
delay is reduced to that of a single XOR rather than a whole chain); thus each tap can
be computed in parallel, increasing the speed of execution.
In a software implementation of an LFSR, the Galois form is more efficient, as the
XOR operations can be implemented a word at a time: only the output bit must be
examined individually.
Fig 3.2: A 16-bit Galois LFSR.
3.3 MATRIX FORMS
Binary LFSRs of both Fibonacci and Galois configurations can be expressed as linear
functions using matrices. Using the companion matrix A of the characteristic polynomial
of the LFSR and denoting the seed as a column vector s, the state of the register in
Fibonacci configuration after k steps is given by y_k = A^k · s, with all arithmetic
carried out over GF(2).
3.4 SOME POLYNOMIALS FOR MAXIMAL LFSRs
The following table lists maximal-length polynomials for shift-register lengths up to
24. Note that more than one maximal-length polynomial may exist for any given shift-
register length. Lists of alternative maximal-length polynomials for shift-register lengths
4–32 (beyond which it becomes unfeasible to store or transfer them) can be found in the
references.
Table 3: General Polynomials
3.5 OUTPUTS-STREAM PROPERTIES
Ones and zeroes occur in "runs". The output stream 1110010, for example, consists of
four runs of lengths 3, 2, 1 and 1, in order. In one period of a maximal LFSR, 2^(n−1)
runs occur (in the example above, the 3-bit LFSR has 4 runs). Exactly half of these runs
are one bit long, a quarter are two bits long, and so on, up to a single run of zeroes
n − 1 bits long and a single run of ones n bits long. This distribution almost equals the
statistical expectation value for a truly random sequence. However, the probability of
finding exactly this distribution in a sample of a truly random sequence is rather low.
LFSR output streams are deterministic. If the present state and the positions of the
XOR gates in the LFSR are known, the next state can be predicted. This is not
possible with truly random events. With maximal-length LFSRs, prediction is easier
still, since there are only a limited number of maximal-length tap sequences for each
register length.
The output stream is reversible; an LFSR with mirrored taps will cycle through the
output sequence in reverse order.
3.6 APPLICATIONS
LFSRs can be implemented in hardware, and this makes them useful in applications
that require very fast generation of a pseudo-random sequence, such as direct-sequence
spread spectrum radio. LFSRs have also been used for generating an approximation
of white noise in various programmable sound generators.
3.6.1 USES AS COUNTERS
The repeating sequence of states of an LFSR allows it to be used as a clock divider or
as a counter when a non-binary sequence is acceptable, as is often the case where
computer index or framing locations need to be machine-readable.[7] LFSR counters have
simpler feedback logic than natural binary counters or Gray-code counters, and therefore
can operate at higher clock rates. However, it is necessary to ensure that the LFSR never
enters an all-zeros state, for example by presetting it at start-up to any other state in the
sequence. The table of primitive polynomials shows how LFSRs can be arranged in
Fibonacci or Galois form to give maximal periods. One can obtain any other period by
adding, to an LFSR with a longer period, logic that shortens the sequence by skipping
some states.
3.6.2 USES IN CRYPTOGRAPHY
LFSRs have long been used as pseudo-random number generators for use in stream
ciphers (especially in military cryptography), due to the ease of construction from
simple electromechanical or electronic circuits, long periods, and very
uniformly distributed output streams. However, an LFSR is a linear system, leading to
fairly easy cryptanalysis. For example, given a stretch of known plaintext and the
corresponding ciphertext, an attacker can recover the stretch of LFSR output
stream used in the system and, from that stretch of the output stream, can
construct an LFSR of minimal size that simulates the intended receiver by using
the Berlekamp–Massey algorithm. This LFSR can then be fed the intercepted stretch of
output stream to recover the remaining plaintext.
Three general methods are employed to reduce this problem in LFSR-based stream
ciphers:
Non-linear combination of several bits from the LFSR state;
Non-linear combination of the output bits of two or more LFSRs (see also: shrinking
generator), or the use of evolutionary algorithms to introduce non-linearity;[8]
Irregular clocking of the LFSR, as in the alternating step generator.
3.6.3 TEST-PATTERN GENERATION
Complete LFSRs are commonly used as pattern generators for exhaustive testing, since
they cover all possible inputs for an n-input circuit. Maximal-length LFSRs and weighted
LFSRs are widely used as pseudo-random test-pattern generators for pseudo-random test
applications.
3.6.4 SIGNATURE ANALYSIS
In built-in self-test (BIST) techniques, storing all the circuit outputs on chip is not
possible, but the circuit output can be compressed to form a signature that will later be
compared to the golden signature (of the good circuit) to detect faults. Since this
compression is lossy, there is always a possibility that a faulty output also generates the
same signature as the golden signature and the faults cannot be detected. This condition is
called error masking or aliasing. BIST is accomplished with a multiple-input signature
register (MISR or MSR), which is a type of LFSR. A standard LFSR has a single XOR or
XNOR gate, where the input of the gate is connected to several "taps" and the output is
connected to the input of the first flip-flop. A MISR has the same structure, but the input
to every flip-flop is fed through an XOR/XNOR gate. For example, a 4-bit MISR has a 4-
bit parallel output and a 4-bit parallel input. The input of the first flip-flop is
XOR/XNORd with parallel input bit zero and the "taps". Every other flip-flop input is
XOR/XNORd with the preceding flip-flop output and the corresponding parallel input bit.
Consequently, the next state of the MISR depends on the last several states, as opposed to
just the current state. Therefore, a MISR will always generate the same golden signature
provided that the input sequence is the same every time.
3.6.5 USES IN DIGITAL BROADCASTING
To prevent short repeating sequences (e.g., runs of 0s or 1s) from forming spectral lines
that may complicate symbol tracking at the receiver or interfere with other transmissions,
the data bit sequence is combined with the output of a linear-feedback register before
modulation and transmission. This scrambling is removed at the receiver after
demodulation. When the LFSR runs at the same bit rate as the transmitted symbol stream,
this technique is referred to as scrambling. When the LFSR runs considerably faster than
the symbol stream, the LFSR-generated bit sequence is called chipping code. The
chipping code is combined with the data using exclusive or before transmitting
using binary phase-shift keying or a similar modulation method. The resulting signal has
a higher bandwidth than the data, and therefore this is a method of spread-
spectrum communication. When used only for the spread-spectrum property, this
technique is called direct-sequence spread spectrum; when used to distinguish several
signals transmitted in the same channel at the same time and frequency, it is called code
division multiple access.
Neither scheme should be confused with encryption or encipherment; scrambling and
spreading with LFSRs do not protect the information from eavesdropping. They are
instead used to produce equivalent streams that possess convenient engineering properties
to allow robust and efficient modulation and demodulation.
Digital broadcasting systems that use linear-feedback registers:
ATSC Standards (digital TV transmission system – North America)
DAB (Digital Audio Broadcasting system – for radio)
DVB-T (digital TV transmission system – Europe, Australia, parts of Asia)
NICAM (digital audio system for television)
Other digital communications systems using LFSRs:
INTELSAT business service (IBS)
Intermediate data rate (IDR)
SDI (Serial Digital Interface transmission)
Data transfer over PSTN (according to the ITU-T V-series recommendations)
CDMA (Code Division Multiple Access) cellular telephony
100BASE-T2 "fast" Ethernet scrambles bits using an LFSR
1000BASE-T Ethernet, the most common form of Gigabit Ethernet, scrambles bits
using an LFSR
PCI Express 3.0
SATA
CHAPTER 4
PROPOSED METHOD
4.1 INTRODUCTION
In this section, we present the architecture of the proposed concurrent error detection
of the parallel CRC structure based on multiple-bit parity prediction. We have already
formulated multiple-bit parity prediction expressions for l > m, where l is the number of
input message bits in each iteration and m is the number of CRC bits.
Fig 4.1: Simplified block diagram of the proposed multiple-bit parity-based error
detection structure of parallel CRC architecture.
We apply a multiple-bit parity approach to the parallel CRC architecture considered
earlier in order to achieve concurrent fault detection. This multiple-bit
parity checking circuit will compare predicted parity and actual parity of the syndrome
bits at the end of each iteration and raise an alarm if they do not match. The output of the
parity calculation unit is the actual multiple-bit parity of the CRC syndrome bits after the
nth iteration, whereas the output of the parity prediction unit is the predicted multiple-bit
parity. The actual parity is obtained by XORing the outputs of the CRC Unit, whereas the
predicted parity is evaluated as a different function of inputs.
4.2 PARITY
Parity is the simplest form of error checking. It adds one bit to the pattern and then
requires that the modulo-2 sum of all the bits of the pattern and the parity bit have a
defined answer. The answer is 0 for even parity and 1 for odd parity. An alternative way
of making the same statement is that odd(even) parity constrains there to be an odd(even)
number of “1"s in the pattern plus parity bit. Parity bits are sufficient to catch all single
errors in the pattern plus parity bit as this will change a single 1 to a 0 or vice versa and
therefore upset the parity calculation. A repeat of the parity calculation at any time will
reveal this problem. However, the system will not catch any double errors (except the
trivial case of two errors on the same bit), and these will be flagged as valid. For example,
a pattern of 0110100 becomes 00110100 with the addition of a parity bit and enforcement
of odd parity. A single error would (for example) change the pattern to 00110000, which
has the wrong parity. However, a further error to 10110000 looks OK, but is in fact
wrong. The length of pattern which each parity bit “guards” should be as long as possible,
so that the parity bit takes up the least amount of extra storage, but not so long that a
double error becomes likely. Thus, if the probability of a single bit being in error is
10^-6, then the probability of an error in an 8-bit pattern (7 bits + parity) is about
8 × 10^-6, and the probability of a double error is about 3 × 10^-11, which is quite small.
Fig 4.2: Parity prediction circuit of each block of the proposed multi-bit scheme for l < m.
Table 3: List of the Symbols Used
CHAPTER 5
RESULTS
5.1 SIMULATION RESULTS
16-bit Generator Polynomial: x^16 + x^15 + x^2 + 1
8-bit Message Bits: 8’b00000001
16-bit Theoretical CRC: 16’b0111110100000111
Fig 5.1 RTL SCHEMATIC FOR TOP MODULE
Fig.5.2 INTERNAL STRUCTURE FOR TOP MODULE
Fig. 5.3 SIMULATION RESULT WITHOUT ERROR INJECTION
Fig. 5.4 SIMULATION RESULT WITH ERROR INJECTION
5.2 SYNTHESIS REPORT
The design has been synthesized using the Genus compiler in 90 nm and 45 nm
technologies. The leakage power, dynamic power, total power and area have been
recorded and compared. The comparison results shown in the table below are lower for
the 45 nm technology, which is therefore best suited for chip fabrication.

         Leakage power (µW)   Dynamic power (µW)   Total power (µW)   Area (microns)   Time (ns)
90 nm    4.629                544.14               548.779            911.308          1.4
45 nm    0.018                312.003              312.00             0.9 × 10^-3      1.3

Table 4: Comparison table for power and area
Table 5: Comparison of serial CRC and parallel CRC

                     Serial CRC     Parallel CRC
Area (microns)       585            529
Total power (nW)     108005.943     57212.539
Timing slack (ps)    4809           15130

It is evident from the above table that the area and power are reduced and the timing
slack is improved to a great extent in 90 nm technology. The parallel CRC
implementation employing a multi-output LFSR is therefore the better choice.
CHAPTER 6
CONCLUSIONS
A concurrent error detection design structure for parallel CRC architecture has been
proposed. Two different cases, namely l > m and l < m have been considered and the
formulation for the multiple-bit parity-based concurrent fault detection scheme has been
derived followed by the implementation of the corresponding architecture for each of
them. The error and fault coverage of the proposed scheme have been analyzed in detail
using error and fault models. We have also presented a formulation for the error detection
capability of the scheme covering all the possible 2^m − 1 cases where the output can become
erroneous. Moreover, a thorough analysis of all the possible single stuck-at fault locations
has been discussed. Furthermore, we investigated the relation between stuck-at fault location
and error propagation resulting in an erroneous output, and the relation between this
propagation and the CRC generator polynomial. Finally, the fault detection capability
and the theoretical time and area overheads of the proposed schemes have been reported
and confirmed by software simulation and ASIC implementation results. We have coded
the proposed scheme in C for software simulation and in Verilog for ASIC
implementation purposes. The proposed method is shown to be area efficient and at the
same time it has a high fault detection capability. The proposed multiple-bit parity
prediction scheme has also been shown to have not only the ability to detect a fault, but
also to point out the region of the fault. This potential advantage can further be utilized
while extending the proposed scheme to correct the error online in the future, i.e., to achieve
an error-free output even in the presence of a hardware error.
REFERENCES
1. W. Peterson and D. Brown, “Cyclic codes for error detection,” Proc. IRE, vol. 49,
pp. 228–235, 1961.
2. ATM Layer Specification, ITU-T Recommendation I.361, 1999.
3. IEEE Standard for Information Technology: Carrier Sense Multiple Access with
Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications,
ANSI/IEEE Std 802.3-2005.
4. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY)
Specifications, ANSI/IEEE Std 802.11-1999.
5. IEEE Standard for Local and Metropolitan Area Networks: Air Interface for Fixed and
Mobile Broadband Wireless Access Systems, ANSI/IEEE Std. 802.16-2004.
6. C. Kennedy, “High performance hardware and software implementations of the cyclic
redundancy check computation,” PhD Thesis, University of Western Ontario, London,
ON, 2009.
7. S. Lin and D.J. Costello, Error Control Coding. Englewood Cliffs, NJ: Prentice-Hall,
1983.
8. T. Ramabadran and S. Gaitonde, “A tutorial on CRC computations,” IEEE Micro, vol. 8,
no. 4, pp. 62–75, Aug. 1988.
9. T.-B. Pei and C. Zukowski, “High-speed parallel CRC circuits in VLSI,” IEEE Trans.
Commun., vol. 40, no. 4, pp. 653–657, Apr. 1992.
10. G. Albertengo and R. Sisto, “Parallel CRC generation,” IEEE Micro, vol. 10, no. 5,
pp. 63–71, Oct. 1990.
11. G. Campobello, G. Patane, and M. Russo, “Parallel CRC realization,” IEEE Trans.
Comput., vol. 52, no. 10, pp. 1312–1319, Oct. 2003.
12. M.-D. Shieh, M.-H. Sheu, C.-H. Chen, and H.-F. Lo, “A systematic approach for
parallel CRC computations,” J. Inf. Sci. Eng., vol. 17, no. 3, pp. 445–461, 2001.
13. M. Sprachmann, “Automatic generation of parallel CRC circuits,” IEEE Des. Test
Comput., vol. 18, no. 3, pp. 108–114, May/Jun. 2001.
14. C. Cheng and K. K. Parhi, “High-speed parallel CRC implementation based on
unfolding, pipelining, and retiming,” IEEE Trans. Circuits Syst. II, Express Briefs, vol.
53, no. 10, pp. 1017–1021, Oct. 2006.
15. C. Cheng and K. K. Parhi, “High speed VLSI architecture for general linear feedback
shift register (LFSR) structures,” in Proc. IEEE 43rd Asilomar Conf. Signals, Syst.
Comput., 2009, pp. 713–717.
16. M. Ayinala and K. K. Parhi, “High-speed parallel architectures for linear feedback
shift registers,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4459–4469, Sep. 2011.
17. C. Kennedy and A. Reyhani-Masoleh, “High-speed parallel CRC circuits,” in Proc.
42nd Asilomar Conf. Signals, Syst. Comput., Oct. 2008, pp. 1823–1829.
18. M. Braun, J. Friedrich, T. Grün, and J. Lembert, “Parallel CRC computation in
FPGAs,” in Proc. 6th Int. Workshop Field-Programm. Logic Appl., 1996, vol. 1142, pp.
156–165.
19. R. Glaise, “A two-step computation of cyclic redundancy code CRC-32 for ATM
networks,” IBM J. Res. Develop., vol. 41, no. 6, pp. 705–709, 1997.
20. J. Derby, “High-speed CRC computation using state-space transformations,” in Proc.
IEEE Global Telecommun. Conf., 2001, vol. 1, pp. 166–170.
21. P. Reviriego, S. Pontarelli and J. A. Maestro, “Concurrent error detection for
orthogonal latin squares encoders and syndrome computation,” IEEE Trans. Very Large
Scale Integration Syst., vol. 21, no. 12, pp. 2334–2338, Dec. 2013.