Multiple Bit Parity Based Concurrent Error Detection Architecture For Parallel CRC Computation
APPENDIX: OVERVIEW OF VLSI
Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining hundreds of thousands of transistors or devices into a single chip. VLSI
began in the 1970s when complex semiconductor and communication technologies were
being developed. The microprocessor is a VLSI device. Before the introduction of VLSI
technology most ICs had a limited set of functions they could perform. An electronic
circuit might consist of a CPU, ROM, RAM and other glue logic. VLSI lets IC designers
add all of these into one chip. The history of the transistor dates to the 1920s when several
inventors attempted devices that were intended to control current in solid-state diodes and
convert them into triodes. Success came after World War II, when the use of silicon and
germanium crystals as radar detectors led to improvements in fabrication and theory.
Scientists who had worked on radar returned to solid-state device development. With the
invention of transistors at Bell Labs in 1947, the field of electronics shifted from vacuum
tubes to solid-state devices.
With the small transistor at their hands, electrical engineers of the 1950s saw the
possibilities of constructing far more advanced circuits. However, as the complexity of
circuits grew, problems arose. One problem was the size of the circuit. A complex circuit such as a computer depended on speed: if the components were large, the wires interconnecting them had to be long, and the electric signals took time to travel through the circuit, slowing the computer. The invention of the integrated circuit by Jack
Kilby and Robert Noyce solved this problem by making all the components and the chip
out of the same block (monolith) of semiconductor material. The circuits could be made
smaller, and the manufacturing process could be automated. This led to the idea of
integrating all components on a single-crystal silicon wafer, which led to small-scale
integration (SSI) in the early 1960s, medium-scale integration (MSI) in the late 1960s,
and then large-scale integration (LSI) as well as VLSI in the 1970s and 1980s, with tens
of thousands of transistors on a single chip (later hundreds of thousands, then millions,
and now billions (10⁹)). The first semiconductor chips held two transistors each.
Subsequent advances added more transistors, and as a consequence, more individual
functions or systems were integrated over time. The first integrated circuits held only a
few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it
Department of Electronics & Communication Engineering 1
possible to fabricate one or more logic gates on a single device. This level of integration, now known retrospectively as small-scale integration (SSI), was followed by improvements in technique that led to devices with hundreds of logic gates, known as medium-scale integration (MSI). Further
improvements led to large-scale integration (LSI), i.e. systems with at least a thousand
logic gates. Current technology has moved far past this mark and
today's microprocessors have many millions of gates and billions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used. But
the huge number of gates and transistors available on common devices has rendered such
fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no
longer in widespread use. In 2008, billion-transistor processors became commercially
available. This became more commonplace as semiconductor fabrication advanced from
the then-current generation of 65 nm processes. Current designs, unlike the earliest
devices, use extensive design automation and automated logic synthesis to lay out the
transistors, enabling higher levels of complexity in the resulting logic functionality.
Certain high-performance logic blocks, like the SRAM (static random-access memory) cell, are still designed by hand to ensure the highest efficiency.
System Requirement
The first step of any design process is to lay down the specifications of the system.
System specification is a high level representation of the system. The factors to be
considered in this process include: performance, functionality, and physical dimensions
(size of the die (chip)). The fabrication technology and design techniques are also
considered. The specification of a system is a compromise between market requirements, technology, and economic viability. The end results are specifications for the size,
speed, power, and functionality of the VLSI system.
Fig VLSI Design Flow
Architectural Design:
The basic architecture of the system is designed in this step. This includes such decisions as RISC (Reduced Instruction Set Computer) versus CISC (Complex Instruction Set Computer), number of ALUs, floating-point units, number and structure
of pipelines, and size of caches among others. The outcome of architectural design is a
Micro-Architectural Specification (MAS). While an MAS is a textual (English-like) description, architects can accurately predict the performance, power, and die size of the design based on such a description.
Behavioral or Functional Design:
In this step, the main functional units of the system are identified. This also identifies the
interconnect requirements between the units. The area, power, and other parameters of
each unit are estimated.
The behavioral aspects of the system are considered without implementation specific
information. For example, it may specify that a multiplication is required, but exactly in
which mode such multiplication may be executed is not specified. We may use a variety
of multiplication hardware depending on the speed and word size requirements. The key
idea is to specify behavior, in terms of input, output and timing of each unit, without
specifying its internal structure.
The outcome of functional design is usually a timing diagram or other relationships
between units. This information leads to improvement of the overall design process and
reduction of the complexity of subsequent phases. Functional or behavioral design
provides quick emulation of the system and allows fast debugging of the full system.
Behavioral design is largely a manual step with little or no automation help available.
Logic Design:
In this step the control flow, word widths, register allocation, arithmetic operations,
and logic operations of the design that represent the functional design are derived and
tested. This description is called Register Transfer Level (RTL) description. RTL is
expressed in a Hardware Description Language (HDL), such as VHDL or Verilog. This
description can be used in simulation and verification. This description consists of
Boolean expressions and timing information. The Boolean expressions are minimized to
achieve the smallest logic design which conforms to the functional design. This logic
design of the system is simulated and tested to verify its correctness. In some special
cases, logic design can be automated using high level synthesis tools. These tools produce
an RTL description from a behavioral description of the design.
Circuit Design:
The purpose of circuit design is to develop a circuit representation based on the logic
design. The Boolean expressions are converted into a circuit representation by taking into
consideration the speed and power requirements of the original design. Circuit Simulation
is used to verify the correctness and timing of each component. The circuit design is
usually expressed in a detailed circuit diagram. This diagram shows the circuit elements
(cells, macros, gates, transistors) and interconnection between these elements. This
representation is also called a net list. Tools used to manually enter such descriptions are called schematic capture tools. In many cases, a net list can be created automatically from the logic (RTL) description by using logic synthesis tools.
Physical Design:
In this step the circuit representation (or net list) is converted into a geometric
representation. As stated earlier, this geometric representation of a circuit is called a
layout. Layout is created by converting each logic component (cells, macros, gates,
transistors) into a geometric representation (specific shapes in multiple layers), which
perform the intended logic function of the corresponding component. Connections
between different components are also expressed as geometric patterns, typically lines in multiple layers.
The exact details of the layout also depend on design rules, which are guidelines based on
the limitations of the fabrication process and the electrical properties of the fabrication
materials. Physical design is a very complex process and therefore it is usually broken
down into various sub-steps. Various verification and validation checks are performed on
the layout during physical design.
In many cases, physical design can be completely or partially automated, and a layout can be generated directly from the net list by layout synthesis tools. Layout synthesis tools, while fast, do have an area and performance penalty, which limits their use to some designs. Manual layout, while slow and labor-intensive, does give better area and performance compared with synthesized layout. However, this advantage may dissipate as larger and larger designs undermine the human capability to comprehend and obtain globally optimized solutions.
Fabrication:
After layout and verification, the design is ready for fabrication. Since layout data is
typically sent to fabrication on a tape, the event of release of data is called Tape Out.
Layout data is converted (or fractured) into photo-lithographic masks, one for each layer.
Masks identify spaces on the wafer where certain materials need to be deposited,
diffused or even removed. Silicon crystals are grown and sliced to produce wafers.
Extremely small dimensions of VLSI devices require that the wafers be polished to near
perfection. The fabrication process consists of several steps involving deposition, and
diffusion of various materials on the wafer. During each step one mask is used. Several
dozen masks may be used to complete the fabrication process.
A large wafer is 20 cm (8 inches) in diameter and can be used to produce hundreds of chips, depending on the size of the chip. Before a chip is mass-produced, a prototype is made
and tested. Industry is rapidly moving towards a 30 cm (12 inch) wafer allowing even
more chips per wafer leading to lower cost per chip.
Packaging, Testing and Debugging:
Finally, the wafer is fabricated and diced into individual chips in a fabrication facility.
Each chip is then packaged and tested to ensure that it meets all the design specifications
and that it functions properly. Chips used in Printed Circuit Boards (PCBs) are packaged
in Dual In-line Package (DIP), Pin Grid Array (PGA), Ball Grid Array (BGA), and Quad
Flat Package (QFP). Chips used in Multi-Chip Modules (MCM) are not packaged, since
MCMs use bare or naked chips.
FRONT-END DESIGN FLOW
In the VLSI industry, design work is mainly divided into two major domains: the first is called the front end, and the second is the VLSI back end.
The VLSI front end covers all the logical design and verification work. In simple terms, all work up to gate-level or RTL-level design and verification is considered VLSI front-end design and verification; there are multiple approaches to the logical design of an IC in the VLSI front end.
Fig Front end design flow
Y Chart
The Gajski-Kuhn Y-chart is a model that captures the considerations in designing semiconductor devices. The three domains of the Gajski-Kuhn Y-chart lie on radial axes. Each of the domains can be divided into levels of abstraction, using concentric rings. At the top level (outer ring), we consider the architecture of the chip; at the lower levels (inner rings), we successively refine the design into finer, more detailed implementations.
Creating a structural description from a behavioral one is achieved through the processes of high-level synthesis or logic synthesis. Creating a physical description from a structural one is achieved through layout synthesis.
Fig Y chart
The design hierarchy involves the principle of "divide and conquer": the task is divided into smaller sub-tasks until each reaches its simplest level. This approach works because the lowest-level components of the design become so simple that their manufacture is easy.
The given task can be carried out in any of the design-flow domains (behavioral, structural, and geometrical). To illustrate, consider the design of a 16-bit adder, as shown in the figure below.
Fig 16-bit Adder
Here, the 16-bit adder chip is divided into four 4-bit adder modules, and each 4-bit adder is in turn divided into 1-bit adders (full adders). One-bit addition is the simplest design step, and its internal circuit is easy to fabricate on the chip. Connecting four 1-bit adders then yields a 4-bit adder, and connecting the four 4-bit adders yields the 16-bit adder.
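This hierarchical decomposition can also be sketched in software. The following Python model is purely illustrative (the function names and the LSB-first bit convention are made up for this sketch) and is not an HDL implementation:

```python
def full_adder(a, b, cin):
    """1-bit full adder: the simplest unit of the hierarchy."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def adder_4(a_bits, b_bits, cin=0):
    """4-bit adder built by chaining four 1-bit full adders (LSB first)."""
    out = []
    for a, b in zip(a_bits, b_bits):
        s, cin = full_adder(a, b, cin)
        out.append(s)
    return out, cin

def adder_16(a_bits, b_bits):
    """16-bit adder composed of four 4-bit adder modules."""
    out, carry = [], 0
    for i in range(0, 16, 4):
        s, carry = adder_4(a_bits[i:i + 4], b_bits[i:i + 4], carry)
        out += s
    return out, carry

# Usage: add 40000 + 20000 as 16-bit vectors (LSB first).
to_bits = lambda n: [(n >> i) & 1 for i in range(16)]
to_int = lambda bits: sum(b << i for i, b in enumerate(bits))
s_bits, cout = adder_16(to_bits(40000), to_bits(20000))
print(to_int(s_bits), cout)   # 60000 0
```

The composition mirrors the figure: the only hand-designed logic is the 1-bit full adder; everything above it is wiring.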
BACK-END DESIGN FLOW
The VLSI back end covers all design and verification work that follows logical (gate-level or RTL) design, as shown in the figure. This includes floorplanning, power planning, clock-tree synthesis, and place and route; timing optimization, DRC and LVS checks, and foundry work such as fabrication and packaging also come under the VLSI back end.
Fig BACK-END DESIGN FLOW
CHAPTER 1
INTRODUCTION
1.1 NETWORK LAYERS
OSI stands for Open Systems Interconnection. It was developed by the ISO (International Organization for Standardization) in 1984. It is a seven-layer architecture, with each layer having specific functionality to perform. All seven layers work collaboratively to transmit data from one person to another across the globe.
Fig 1.1 Layers in OSI Model
1.1.1 Physical Layer (Layer 1):
The lowest layer of the OSI reference model is the physical layer. It is responsible for the actual physical connection between the devices. The physical layer carries information in the form of bits. When receiving data, this layer takes the signal received, converts it into 0s and 1s, and sends them to the Data Link layer, which puts the frame back together.
The functions of the physical layer are:
1. Bit synchronization: The physical layer provides the synchronization of the bits by
providing a clock. This clock controls both sender and receiver, thus providing synchronization at the bit level.
2. Bit rate control: The Physical layer also defines the transmission rate i.e. the
number of bits sent per second.
3. Physical topologies: The physical layer specifies the way in which the different devices/nodes are arranged in a network, i.e. bus, star, or mesh topology.
4. Transmission mode: Physical layer also defines the way in which the data flows
between the two connected devices. The various transmission modes possible are:
Simplex, half-duplex and full-duplex. Hubs, repeaters, modems, and cables are Physical Layer devices. The Network Layer, Data Link Layer, and Physical Layer are also known as the Lower Layers or Hardware Layers.
1.1.2 Data Link Layer (DLL) (Layer 2):
The data link layer is responsible for node-to-node delivery of the message. The main function of this layer is to make sure data transfer is error-free from one node to another, over the physical layer. When a packet arrives in a network, it is the responsibility of the DLL to transmit it to the host using its MAC address.
Data Link Layer is divided into two sub layers:
1. Logical Link Control (LLC)
2. Media Access Control (MAC)
The packet received from the Network layer is further divided into frames depending on the frame size of the NIC (Network Interface Card). The DLL also encapsulates the sender's and receiver's MAC addresses in the header.
The receiver's MAC address is obtained by placing an ARP (Address Resolution Protocol) request onto the wire asking "Who has that IP address?"; the destination host replies with its MAC address.
The functions of the data Link layer are:
1. Framing: Framing is a function of the data link layer. It provides a way for a sender
to transmit a set of bits that are meaningful to the receiver. This can be
accomplished by attaching special bit patterns to the beginning and end of the
frame.
2. Physical addressing: After creating frames, the data link layer adds the physical addresses (MAC addresses) of the sender and/or receiver in the header of each frame.
3. Error control: Data link layer provides the mechanism of error control in which it
detects and retransmits damaged or lost frames.
4. Flow Control: The data rate must be constant on both sides, or else the data may get corrupted; thus, flow control coordinates the amount of data that can be sent before receiving an acknowledgement.
5. Access control: When a single communication channel is shared by multiple
devices, MAC sub-layer of data link layer helps to determine which device has
control over the channel at a given time.
1.1.3 Network Layer (Layer 3):
The network layer handles the transmission of data from one host to another located in a different network. It also takes care of packet routing, i.e. selection of the shortest path to transmit the packet from the routes available. The sender's and receiver's IP addresses are placed in the header by the network layer.
The functions of the Network layer are:
1. Routing: The network layer protocols determine which route is suitable from
source to destination. This function of network layer is known as routing.
2. Logical Addressing: In order to identify each device on an internetwork uniquely, the network layer defines an addressing scheme; such an address distinguishes each device uniquely and universally. A segment at the network layer is referred to as a packet. The network layer is implemented by networking devices such as routers.
1.1.4 Transport Layer (Layer 4):
The transport layer provides services to the application layer and takes services from the network layer. The data in the transport layer is referred to as segments. It is responsible for the end-to-end delivery of the complete message. The transport layer also provides an acknowledgment of successful data transmission and re-transmits the data if an error is found.
• At the sender's side: The transport layer receives the formatted data from the upper layers, performs segmentation, and implements flow and error control to ensure proper data transmission. It also adds the source and destination port numbers in its header and forwards the segmented data to the network layer.
Note: The sender needs to know the port number associated with the receiver's application.
Generally, this destination port number is configured, either by default or manually. For
example, when a web application makes a request to a web server, it typically uses port
number 80, because this is the default port assigned to web applications. Many
applications have default port assigned.
• At the receiver's side: The transport layer reads the port number from its header and forwards the data it has received to the respective application. It also performs sequencing and reassembly of the segmented data.
The functions of the transport layer are:
1. Segmentation and Reassembly: This layer accepts the message from the (session) layer and breaks the message into smaller units. Each segment produced has a header associated with it. The transport layer at the destination station reassembles the message.
2. Service Point Addressing: In order to deliver the message to correct process,
transport layer header includes a type of address called service point address or port
address. Thus by specifying this address, transport layer makes sure that the
message is delivered to the correct process.
The services provided by the transport layer are:
1. Connection-Oriented Service: It is a three-phase process which includes
– Connection Establishment
– Data Transfer
– Termination / Disconnection
In this type of transmission, the receiving device sends an acknowledgment back to the source after a packet or group of packets is received. This type of transmission is reliable and secure.
2. Connectionless Service: It is a one-phase process consisting only of data transfer. In this type of transmission, the receiver does not acknowledge receipt of a packet.
Data in the transport layer is called segments. The transport layer is operated by the operating system: it is a part of the OS and communicates with the application layer by making system calls. The transport layer is called the heart of the OSI model.
1.1.5 Session Layer (Layer 5):
This layer is responsible for establishment of connection, maintenance of sessions,
authentication and also ensures security. The functions of the session layer are :
1. Session establishment, maintenance and termination: The layer allows the two
processes to establish, use and terminate a connection.
2. Synchronization: This layer allows a process to add checkpoints, considered synchronization points, into the data. These synchronization points help to identify errors so that the data is re-synchronized properly, the ends of the messages are not cut prematurely, and data loss is avoided.
3. Dialog Controller: The session layer allows two systems to start communication
with each other in half-duplex or full-duplex.
The top three layers (Session, Presentation, and Application) are integrated as a single layer, the "Application Layer", in the TCP/IP model. Implementation of these three layers is done by the network application itself. They are also known as the Upper Layers or Software Layers.
SCENARIO:
Let’s consider a scenario where a user wants to send a message through some
Messenger application running in his browser. The “Messenger” here acts as the
application layer which provides the user with an interface to create the data. This
message, or so-called data, is compressed and, if it carries secure data, encrypted.
Fig 1.1.5 General Block Diagram of Communication
1.1.6 Presentation Layer (Layer 6):
The presentation layer is also called the translation layer. The data from the application layer is extracted here and manipulated into the required format for transmission over the network.
The functions of the presentation layer are :
1. Translation: For example, ASCII to EBCDIC.
2. Encryption/ Decryption: Data encryption translates the data into another form or
code. The encrypted data is known as the cipher text and the decrypted data is
known as plain text. A key value is used for encrypting as well as decrypting data.
3. Compression: Reduces the number of bits that need to be transmitted on the
network.
1.1.7 Application Layer (Layer 7):
At the very top of the OSI reference model stack of layers, we find the Application layer, which is implemented by the network applications. These applications produce the data which has to be transferred over the network. This layer also serves as a window for the application services to access the network and for displaying the received information to the user. The services provided by the application layer include:
1. Network Virtual Terminal
2. FTAM-File transfer access and management
3. Mail Services
4. Directory Services
The OSI model acts as a reference model and is not implemented in the Internet because of its late arrival; the model currently in use is the TCP/IP model.
There are many causes, such as noise and cross-talk, that may corrupt data during transmission. The upper layers work on a generalized view of the network architecture and are not aware of the actual hardware data processing; hence, the upper layers expect error-free transmission between the systems. Most applications would not function as expected if they received erroneous data. Applications such as voice and video may be less affected, and with some errors they may still function well.
The data-link layer uses error control mechanisms to ensure that frames (data bit streams) are transmitted with a certain level of accuracy. But to understand how errors are controlled, it is essential to know what types of errors may occur.
1.2 TYPES OF ERRORS
There may be three types of errors:
Single bit error
Only one bit of the frame, at any position, is corrupted.
Multiple bits error
The frame is received with more than one bit in a corrupted state.
Burst error
The frame contains more than one consecutive corrupted bit.
Error control mechanisms may involve two possible approaches:
Error detection
Error correction
1.2.1 Error Detection:
Errors in the received frames are detected by means of a Parity Check and the Cyclic Redundancy Check (CRC). In both cases, a few extra bits are sent along with the actual data to confirm that the bits received at the other end are the same as those sent. If the counter-check at the receiver's end fails, the bits are considered corrupted.
1.2.2 Parity Check:
One extra bit is sent along with the original bits to make number of 1s either even in case
of even parity, or odd in case of odd parity.
The sender, while creating a frame, counts the number of 1s in it. For example, if even
parity is used and number of 1s is even then one bit with value 0 is added. This way
number of 1s remains even. If the number of 1s is odd, to make it even a bit with value 1
is added.
The receiver simply counts the number of 1s in a frame. If even parity is used and the count of 1s is even, the frame is considered not corrupted and is accepted; likewise, if odd parity is used and the count of 1s is odd, the frame is considered not corrupted.
If a single bit flips in transit, the receiver can detect it by counting the number of 1s. But when more than one bit is erroneous, it is very hard for the receiver to detect the error.
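The scheme just described can be sketched in a few lines of Python (the function names are made up for this illustration):

```python
def add_even_parity(bits: str) -> str:
    """Sender: append one bit so the total count of 1s becomes even."""
    return bits + ("1" if bits.count("1") % 2 else "0")

def accept_even_parity(frame: str) -> bool:
    """Receiver: accept the frame only if its count of 1s is even."""
    return frame.count("1") % 2 == 0

def flip(s: str, i: int) -> str:
    """Simulate a transmission error by inverting bit i."""
    return s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]

frame = add_even_parity("1011001")             # four 1s, so "0" is appended
assert accept_even_parity(frame)               # error-free frame is accepted
assert not accept_even_parity(flip(frame, 0))  # a single flip is detected
assert accept_even_parity(flip(flip(frame, 0), 1))  # two flips cancel out
```

The last line demonstrates the weakness noted above: two bit flips restore even parity, so the corrupted frame is wrongly accepted.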
1.2.3 Cyclic Redundancy Check (CRC)
CRC is a different approach to detect if the received frame contains valid data. This
technique involves binary division of the data bits being sent. The divisor is generated
using polynomials. The sender performs a division operation on the bits being sent and
calculates the remainder. Before sending the actual bits, the sender adds the remainder at
the end of the actual bits. The actual data bits plus the remainder form a codeword; the sender transmits the data bits as codewords.
At the other end, the receiver performs the division operation on the codewords using the same CRC divisor. If the remainder contains all zeros, the data bits are accepted; otherwise, it is assumed that some data corruption occurred in transit.
Error Correction
In the digital world, error correction can be done in two ways:
Backward Error Correction: When the receiver detects an error in the data received, it requests the sender to retransmit the data unit.
Forward Error Correction: When the receiver detects some error in the data
received, it executes error-correcting code, which helps it to auto-recover and to
correct some kinds of errors.
The first, Backward Error Correction, is simple and can be used efficiently only where retransmission is not expensive, for example over optical fiber. But in the case of wireless transmission, retransmission may cost too much; in that case, Forward Error Correction is used.
To correct an error in a data frame, the receiver must know exactly which bit in the frame is corrupted. To locate the bit in error, redundant bits are used as parity bits for error detection. For example, if we take ASCII words (7 data bits), there are 8 kinds of information we need: seven states to tell us which bit is in error, and one more to tell us that there is no error.
For m data bits, r redundant bits are used. r bits can represent 2^r combinations of information. In an (m+r)-bit codeword, there is a possibility that the r bits themselves may get corrupted. So the number of r bits used must be able to indicate all m+r bit locations plus the no-error case, i.e. 2^r ≥ m + r + 1.
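The inequality can be evaluated directly. A small Python sketch (the helper name is made up for illustration):

```python
def redundant_bits(m: int) -> int:
    """Smallest r with 2**r >= m + r + 1: enough states to name
    every one of the m+r bit positions plus the 'no error' case."""
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r

print(redundant_bits(7))   # 4: a 7-bit ASCII word needs 4 parity bits
```

For m = 7 this gives r = 4, since 2^4 = 16 ≥ 7 + 4 + 1 = 12, while 2^3 = 8 falls short.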
1.2.4 Generator Polynomials
Why is the predetermined (c+1)-bit divisor that's used to calculate a CRC called a generator polynomial? In my opinion, far too many explanations of CRCs actually try to answer
that question. This leads their authors and readers down a long path that involves tons of
detail about polynomial arithmetic and the mathematical basis for the usefulness of
CRCs. This academic stuff is not important for understanding CRCs sufficiently to
implement and/or use them and serves only to create potential confusion. So I'm not
going to answer that question here.
Suffice it to say here only that the divisor is sometimes called a generator polynomial and
that you should never make up the divisor's value on your own. Several mathematically
well-understood generator polynomials have been adopted as parts of various
international communications standards; you should always use one of those. If you have
a background in polynomial arithmetic then you know that certain generator polynomials
are better than others for producing strong checksums. The ones that have been adopted
internationally are among the best of these.
Table 1 lists some of the most commonly used generator polynomials for 16- and 32-bit
CRCs. Remember that the width of the divisor is always one bit wider than the remainder.
So, for example, you'd use a 17-bit generator polynomial whenever a 16-bit checksum is
required.
Checksum     Width     Generator Polynomial
CRC-CCITT    16 bits   10001000000100001
CRC-16       16 bits   11000000000000101
CRC-32       32 bits   100000100110000010001110110110111
Table 1: International standard CRC polynomials
As is the case with other types of checksums, the width of the CRC plays an important
role in the error detection capabilities of the algorithm. Ignoring special types of errors
that are always detected by a particular checksum algorithm, the percentage of detectable
errors is limited strictly by the width of a checksum. A checksum of c bits can only take
one of 2^c unique values. Since the number of possible messages is significantly larger
than that, the potential exists for two or more messages to have an identical checksum. If
one of those messages is somehow transformed into one of the others during
transmission, the checksum will appear correct and the receiver will unknowingly accept
a bad message. The chance of this happening is directly related to the width of the
checksum. Specifically, the chance of such an error is 1/2^c. Therefore, the probability of
any random error being detected is 1 - 1/2^c.
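These figures follow directly from the 1 - 1/2^c formula; a quick sketch (the function name is ours):

```python
def detection_rate(c):
    # probability that a random error produces a different c-bit checksum
    return 1 - 1 / 2 ** c

for c in (8, 16, 32):
    print(f"{c}-bit checksum: {detection_rate(c):.4%} of random errors detected")
```

For c = 8 this prints 99.6094%, matching the figure quoted for an eight-bit checksum below.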
To repeat, the probability of detecting any random error increases as the width of the
checksum increases. Specifically, a 16-bit checksum will detect 99.9985% of all errors.
This is far better than the 99.6094% detection rate of an eight-bit checksum, but not
nearly as good as the 99.9999% detection rate of a 32-bit checksum. All of this applies to
both CRCs and addition-based checksums. What really sets CRCs apart, however, is the
number of special cases that can be detected 100% of the time. For example, I pointed out
earlier that two opposite bit inversions (one bit becoming 0, the other becoming 1) in
the same column of an addition would cause the error to be undetected. Well, that's not
the case with a CRC.
By using one of the mathematically well-understood generator polynomials like those in
Table 1 to calculate a checksum, it's possible to state that the following types of errors
will be detected without fail:
A message with any one bit in error
A message with any two bits in error (no matter how far apart, which column, and
so on)
A message with any odd number of bits in error (no matter where they are)
A message with an error burst as wide as the checksum itself
The first class of detectable error is also detected by an addition-based checksum, or even
a simple parity bit. However, the middle two classes of errors represent much stronger
detection capabilities than those other types of checksum. The fourth class of detectable
error sounds at first similar to a class of errors detected by addition-based checksums, but
in the case of CRCs the guarantee of detecting any odd number of bit errors still applies.
So the set of error bursts too wide to be detected with certainty is limited to those with an
even number of bit errors. All other types of errors fall into the relatively high 1 - 1/2^c
probability of detection.
1.2.5 Ethernet, SLIP, and PPP
Ethernet, like most physical layer protocols, employs a CRC rather than an additive
checksum. Specifically, it employs the CRC-32 algorithm. The likelihood of an error in a
packet sent over Ethernet being undetected is, therefore, extremely low. Many types of
common transmission errors are detected 100% of the time, with the less likely ones
detected 99.9999% of the time. Even if an error would somehow manage to get through at
the Ethernet layer, it would probably be detected at the IP layer checksum (if the error is
in the IP header) or by the TCP or UDP layer checksum above that. After all, the chance
of two or more different checksum algorithms failing to detect the same error is extremely
remote.
However, many embedded systems that use TCP/IP will not employ Ethernet. Instead,
they will use either the serial line Internet protocol (SLIP) or point-to-point protocol
(PPP) to send and receive IP packets directly over a serial connection of some sort.
Unfortunately, SLIP does not add a checksum or a CRC to the data from the layers above.
So unless a pair of modems with error correction capabilities sits in between the two
communicating systems, any transmission errors must hope to be detected by the
relatively weak, addition-based Internet checksum described earlier. The newer,
compressed SLIP (CSLIP) shares this weakness with its predecessor. PPP, on the other
hand, does include a 16-bit CRC in each of its frames, which can carry the same
maximum size IP packet as an Ethernet frame. So while PPP doesn't offer the same
amount of error detection capability as Ethernet, by using PPP you'll at least avoid the
much larger number of undetected errors that may occur with SLIP or CSLIP.
CHAPTER 2
CRC (CYCLIC REDUNDANCY CHECK)
2.1 INTRODUCTION
The recent advancements in wireless technology have brought forth very high speed
transmission which faces a major challenge from noisy channels. Factors such as signal
attenuation, multiple access interference, inter symbol interference and Doppler shift
degrade the quality of data to a great extent. As a result, the received signal has a chance
of getting corrupted. Consequently, for the alleviation of this problem, error control
coding is not only desirable, but has become a must to achieve an acceptable Bit Error
Rate (BER). For the purpose of checking the integrity of the data being stored or sent
over noisy channels, the most widely adopted among all the error control codes is Cyclic
Redundancy Check Code (CRC). CRCs, first introduced by Peterson and Brown in 1961
are used for the detection of any corruption of the digital signal during its production,
transmission, processing as well as storage stages. Recent communication technologies,
namely Asynchronous Transfer Mode (ATM) and IEEE standards such as wired
Ethernet (IEEE 802.3), Wi-Fi (IEEE 802.11) and WiMAX (IEEE 802.16), employ CRC
for error detection purposes. A well-written treatment of CRC computation, along with a
collection of most of the published hardware and software implementations of CRC, can
be found in the literature. Various parallel hardware structures have been described, and
several works have proposed highly efficient architectures based on performance
enhancement as well as resource utilization. The list also includes Two-Step, Cascade,
Look-Ahead, State-Space Transformed, and Retimed Architectures. The authors have
also proposed a three-step
LFSR architecture which makes it easy to address the two major issues of large fanout
limitation and iteration bound limitation in parallel LFSR architecture which is an integral
part of CRC computation. The authors have also proposed an improved solution for the
fanout problem. The authors have shown that the proposed architecture achieves
significant improvement in terms of processing speed and hardware efficiency. The
authors have extended the above idea and presented a mathematical proof showing that a
transformation exists in state space that can reduce the complexity of the parallel LFSR
feedback loop along with a new idea for high speed parallel implementation of linear
feedback shift registers based upon parallel IIR filter. In some other methods the authors
have proposed a novel parallel implementation of long BCH encoders to achieve
significant speedup by eliminating the effect of large fan out. The authors have mentioned
that similar technique can also be used for CRC.
Rapid advancement of semiconductor technologies results in rapidly increasing soft error
rates as device area, timing margins and operating power shrink. Temporary faults caused
by cosmic radiation, known as single event upsets (SEUs), are the main sources of errors.
Much attention has been paid to improving the reliability of the elementary components
themselves, reflected by their much lower failure rates. But the large number and dense
population of these elements on the chip degrade the reliability to a large extent. While
error detection codes
and error correction codes have long been used for checking errors, the reliability of the
error correction or detection encoder itself has come up as a major issue. High speed data
transmission requires high speed error detection and parallel processing at the cost of
increased hardware, which increases the probability of internal faults occurring.
As a result, fault detection of encoder and decoder have gained a lot of importance. The
authors have addressed the problem of designing parallel fault-secure encoders by
generating not only error correcting check bits, but also independently and in parallel,
error detecting check bits. The next step involves the comparison of error detection check
bits with another set of error detecting check bits generated from error correction check
bits. This design is significantly cost effective compared to duplication with comparison
technique. Several other works deal with similar topics. In some
papers the authors have also discussed the design of a fault secure encoder in between a
fault secure RAM and a faulty channel. Error detection and error correction using Residue
Number System have been discussed, as has fault detection for the Reed-Solomon
encoder. The authors have presented a new approach for the automated
synthesis of multilevel circuits with concurrent error detection using a parity-check code.
They have developed a new procedure namely structure-constrained logic optimization
and proved that this design along with a checker forms a self-checking circuit. Our
proposed method of the multiple bit parity based concurrent fault detection scheme for
parallel CRC computation inherently exploits the non-overlapping feature of the parity
generation circuits. We have presented a design concept of multiple-bit parity-based
concurrent fault detection for parallel CRC architecture. In the proposed scheme, the
generated CRC output is divided into a few blocks based on the number of parity bits and
has been made to go through a multiple-bit parity comparison unit. Then, the actual parity
bits are compared with the predicted multiple-bit parity of the blocks to incorporate the
fault detection architecture.
In the cyclic redundancy check, a fixed number of check bits, often called a checksum,
are appended to the message that needs to be transmitted. The data receivers receive the
data and inspect the check bits for any errors. Mathematically, data receivers check on the
check value attached by finding the remainder of the polynomial division of the contents
transmitted. If it seems that an error has occurred, a negative acknowledgement is
transmitted asking for data retransmission.
A cyclic redundancy check is also applied to storage devices like hard disks. In this
case, check bits are allocated to each block in the hard disk. When a corrupt or incomplete
file is read by the computer, the cyclic redundancy error is reported. This could be from
another storage device or from CD/DVDs. The common reasons for errors include system
crashes, incomplete or corrupt files, or files with lots of bugs.
CRC polynomial designs depend on the length of the block to be protected, the desired
error protection features, the resources available for CRC implementation, and the
required performance.
A cyclic redundancy check (CRC) is an error-detecting code commonly used in
digital networks and storage devices to detect accidental changes to raw data. Blocks of
data entering these systems get a short check value attached, based on the remainder of
a polynomial division of their contents. On retrieval, the calculation is repeated and, in
the event the check values do not match, corrective action can be taken against data
corruption. CRCs can be used for error correction.
CRCs are so called because the check (data verification) value is a redundancy (it
expands the message without adding information) and the algorithm is based
on cyclic codes. CRCs are popular because they are simple to implement in
binary hardware, easy to analyze mathematically, and particularly good at detecting
common errors caused by noise in transmission channels. Because the check value has a
fixed length, the function that generates it is occasionally used as a hash function.
The CRC was invented by W. Wesley Peterson in 1961; the 32-bit CRC function, used
in Ethernet and many other standards, is the work of several researchers and was
published in 1975.
CRCs are based on the theory of cyclic error-correcting codes. The use
of systematic cyclic codes, which encode messages by adding a fixed-length check value,
for the purpose of error detection in communication networks, was first proposed by W.
Wesley Peterson in 1961. Cyclic codes are not only simple to implement but have the
benefit of being particularly well suited for the detection of burst errors: contiguous
sequences of erroneous data symbols in messages. This is important because burst errors
are common transmission errors in many communication channels, including magnetic
and optical storage devices. Typically an n-bit CRC applied to a data block of arbitrary
length will detect any single error burst not longer than n bits, and the fraction of all
longer error bursts that it will detect is 1 - 2^(-n).
Specification of a CRC code requires definition of a so-called generator polynomial.
This polynomial becomes the divisor in a polynomial long division, which takes the
message as the dividend and in which the quotient is discarded and
the remainder becomes the result. The important caveat is that the
polynomial coefficients are calculated according to the arithmetic of a finite field, so the
addition operation can always be performed bitwise-parallel (there is no carry between
digits).
In practice, all commonly used CRCs employ the Galois field of two elements, GF(2).
The two elements are usually called 0 and 1, comfortably matching computer
architecture.
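The long-division view above can be sketched directly on bit strings. The helper name is ours and the small divisor 1011 (x^3 + x + 1) is chosen for illustration only; real standards use the wider divisors of Table 1:

```python
def crc_remainder(message_bits, divisor_bits):
    # message_bits, divisor_bits: strings of '0'/'1'; a divisor of n+1 bits
    # yields an n-bit remainder (the check value)
    n = len(divisor_bits) - 1
    dividend = [int(b) for b in message_bits] + [0] * n  # zero-augmentation
    divisor = [int(b) for b in divisor_bits]
    for i in range(len(message_bits)):
        if dividend[i]:                      # leading bit is 1: XOR-subtract
            for j, d in enumerate(divisor):
                dividend[i + j] ^= d
    return ''.join(str(b) for b in dividend[-n:])
```

Appending the remainder to the message makes a second division come out to zero, which is exactly the check the receiver performs.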
A CRC is called an n-bit CRC when its check value is n bits long. For a given n, multiple
CRCs are possible, each with a different polynomial. Such a polynomial has highest
degree n, which means it has n + 1 terms. In other words, the polynomial has a length
of n + 1; its encoding requires n + 1 bits. Note that most polynomial specifications either
drop the MSB or LSB, since they are always 1. The CRC and associated polynomial
typically have a name of the form CRC-n-XXX as in the table below.
The simplest error-detection system, the parity bit, is in fact a 1-bit CRC: it uses the
generator polynomial x + 1 (two terms), and has the name CRC-1.
2.2 DATA INTEGRITY
CRCs are specifically designed to protect against common types of errors on
communication channels, where they can provide quick and reasonable assurance of
the integrity of messages delivered. However, they are not suitable for protecting against
intentional alteration of data.
Firstly, as there is no authentication, an attacker can edit a message and recompute the
CRC without the substitution being detected. When stored alongside the data, CRCs and
cryptographic hash functions by themselves do not protect
against intentional modification of data. Any application that requires protection against
such attacks must use cryptographic authentication mechanisms, such as message
authentication codes or digital signatures (which are commonly based on cryptographic
hash functions).
Secondly, unlike cryptographic hash functions, CRC is an easily reversible function,
which makes it unsuitable for use in digital signatures.[4]
Thirdly, CRC is a linear function with the property that
crc(x XOR y XOR z) = crc(x) XOR crc(y) XOR crc(z); as a result, even if the CRC is
encrypted with a stream cipher that uses XOR as its combining operation
(or mode of block cipher which effectively turns it into a stream cipher, such as OFB or
CFB), both the message and the associated CRC can be manipulated without knowledge
of the encryption key; this was one of the well-known design flaws of the Wired
Equivalent Privacy (WEP) protocol.
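The linearity property behind this flaw can be demonstrated with a toy CRC. The parameters here (a 4-bit CRC over x^4 + x + 1 on 8-bit messages) are illustrative only; WEP used CRC-32, but the algebra is the same:

```python
def crc_gf2(msg, msg_bits, poly=0b10011, n=4):
    # remainder of msg * x^n divided by poly (here x^4 + x + 1) over GF(2)
    reg = msg << n
    for i in reversed(range(msg_bits)):
        if reg & (1 << (i + n)):
            reg ^= poly << i
    return reg

# linearity: crc(x ^ d) == crc(x) ^ crc(d), so an attacker who XORs a chosen
# difference d into the message can patch the CRC with crc(d) -- without ever
# knowing the plaintext or the stream-cipher key
x, d = 0b10110101, 0b00001100
forged_crc = crc_gf2(x, 8) ^ crc_gf2(d, 8)
assert crc_gf2(x ^ d, 8) == forged_crc
```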
2.3 DESIGNING POLYNOMIALS
The selection of the generator polynomial is the most important part of implementing
the CRC algorithm. The polynomial must be chosen to maximize the error-detecting
capabilities while minimizing overall collision probabilities.
The most important attribute of the polynomial is its length (the largest degree, or
exponent, of any one term, plus 1), because of its direct influence on the length of the
computed check value.
The most commonly used polynomial lengths are:
9 bits (CRC-8)
17 bits (CRC-16)
33 bits (CRC-32)
65 bits (CRC-64)
A CRC is called an n-bit CRC when its check value is n bits long. For a given n, multiple
CRCs are possible, each with a different polynomial. Such a polynomial has highest
degree n, and hence n + 1 terms (the polynomial has a length of n + 1). The remainder has
length n. The CRC has a name of the form CRC-n-XXX.
The design of the CRC polynomial depends on the maximum total length of the block to
be protected (data + CRC bits), the desired error protection features, and the type of
resources for implementing the CRC, as well as the desired performance. A common
misconception is that the "best" CRC polynomials are derived from either irreducible
polynomials or irreducible polynomials times the factor 1 + x, which adds to the code the
ability to detect all errors affecting an odd number of bits.[7] In reality, all the factors
described above should enter into the selection of the polynomial and may lead to a
reducible polynomial. However, choosing a reducible polynomial will result in a certain
proportion of missed errors, due to the quotient ring having zero divisors.
The advantage of choosing a primitive polynomial as the generator for a CRC code is
that the resulting code has maximal total block length in the sense that all 1-bit errors
within that block length have different remainders (also called syndromes) and therefore,
since the remainder is a linear function of the block, the code can detect all 2-bit errors
within that block length. If r is the degree of the primitive generator polynomial, then the
maximal total block length is 2^r - 1, and the associated code is able to detect any
single-bit or double-bit errors. We can improve this situation. If we use the generator
polynomial g(x) = p(x)(1 + x), where p(x) is a primitive polynomial of degree r - 1, then
the maximal total block length is 2^(r-1) - 1, and the code is able to detect single,
double, triple and any odd number of errors.
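The distinct-syndrome property can be checked exhaustively for a small primitive polynomial, e.g. x^4 + x + 1 with maximal block length 2^4 - 1 = 15 (illustrative code, helper name ours):

```python
def gf2_mod(val, poly):
    # reduce val modulo poly over GF(2) by repeated XOR-subtraction
    n = poly.bit_length() - 1
    while val.bit_length() > n:
        val ^= poly << (val.bit_length() - poly.bit_length())
    return val

# every single-bit error x^i within the 15-bit block leaves a different,
# non-zero remainder (syndrome), so all 1-bit and 2-bit errors are detected
syndromes = [gf2_mod(1 << i, 0b10011) for i in range(15)]
assert len(set(syndromes)) == 15
```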
A polynomial that admits other factorizations may be chosen then so as to balance the
maximal total block length with a desired error detection power. The BCH codes are a
powerful class of such polynomials. They subsume the two examples above. Regardless
of the reducibility properties of a generator polynomial of degree r, if it includes the "+1"
term, the code will be able to detect error patterns that are confined to a window
of r contiguous bits. These patterns are called "error bursts".
2.4 SPECIFICATION
The concept of the CRC as an error-detecting code gets complicated when an
implementer or standards committee uses it to design a practical system. Here are some of
the complications:
Sometimes an implementation prefixes a fixed bit pattern to the bitstream to be
checked. This is useful when clocking errors might insert 0-bits in front of a message,
an alteration that would otherwise leave the check value unchanged.
Usually, but not always, an implementation appends n 0-bits (n being the size of the
CRC) to the bit stream to be checked before the polynomial division occurs. Such
appending is explicitly demonstrated in the Computation of CRC article. This has the
convenience that the remainder of the original bit stream with the check value
appended is exactly zero, so the CRC can be checked simply by performing the
polynomial division on the received bit stream and comparing the remainder with
zero. Due to the associative and commutative properties of the exclusive-or operation,
practical table driven implementations can obtain a result numerically equivalent to
zero-appending without explicitly appending any zeroes, by using an equivalent,
faster algorithm that combines the message bit stream with the stream being shifted
out of the CRC register.
Sometimes an implementation exclusive-ORs a fixed bit pattern into the remainder of
the polynomial division.
Bit order: Some schemes view the low-order bit of each byte as "first", which then
during polynomial division means "leftmost", which is contrary to our customary
understanding of "low-order". This convention makes sense when serial-
port transmissions are CRC-checked in hardware, because some widespread serial-
port transmission conventions transmit bytes least-significant bit first.
Byte order: With multi-byte CRCs, there can be confusion over whether the byte
transmitted first (or stored in the lowest-addressed byte of memory) is the least-
significant byte (LSB) or the most-significant byte (MSB). For example, some 16-bit
CRC schemes swap the bytes of the check value.
Omission of the high-order bit of the divisor polynomial: Since the high-order bit is
always 1, and since an n-bit CRC must be defined by an (n + 1)-bit divisor
which overflows an n-bit register, some writers assume that it is unnecessary to
mention the divisor's high-order bit.
Omission of the low-order bit of the divisor polynomial: Since the low-order bit is
always 1, authors such as Philip Koopman represent polynomials with their high-
order bit intact, but without the low-order bit. This convention encodes the
polynomial complete with its degree in one integer.
These complications mean that there are three common ways to express a polynomial
as an integer: the first two, which are mirror images in binary, are the constants found in
code; the third is the number found in Koopman's papers. In each case, one term is
omitted. So, for example, the polynomial x^4 + x + 1 may be transcribed as:
0x3 = 0b0011, representing the coefficients of x^3 down to x^0, the leading x^4 term
being implicit (MSB-first code)
0xC = 0b1100, the bit-reversal of the above (LSB-first code)
0x9 = 0b1001, representing the coefficients of x^4 down to x^1, the trailing +1 being
implicit (Koopman notation)
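The three integer encodings can be derived mechanically from the full (n+1)-bit polynomial; the helper name below is ours:

```python
def poly_representations(full_poly):
    # full_poly holds every coefficient, e.g. 0b10011 for x^4 + x + 1
    n = full_poly.bit_length() - 1
    msb_first = full_poly & ((1 << n) - 1)          # drop the implicit x^n bit
    lsb_first = int(f"{msb_first:0{n}b}"[::-1], 2)  # bit-reverse that n-bit value
    koopman = full_poly >> 1                        # drop the implicit +1 bit
    return msb_first, lsb_first, koopman
```

poly_representations(0b10011) gives (0x3, 0xC, 0x9), matching the three notations above; the same routine applied to CRC-CCITT's full divisor from Table 1 yields the familiar constants 0x1021, 0x8408 and 0x8810.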
In the table below they are shown as:
Table 2: CRC Representations
2.5 STANDARDS AND COMMON USE
Numerous varieties of cyclic redundancy checks have been incorporated into technical
standards. By no means does one algorithm, or one of each degree, suit every
purpose; Koopman and Chakravarty recommend selecting a polynomial according to the
application requirements and the expected distribution of message lengths. The number of
distinct CRCs in use has confused developers, a situation which authors have sought to
address.[7] There are three polynomials reported for CRC-12, nineteen conflicting
definitions of CRC-16, and seven of CRC-32.
The polynomials commonly applied are not the most efficient ones possible. Since
1993, Koopman, Castagnoli and others have surveyed the space of polynomials between
3 and 64 bits in size, finding examples that have much better performance (in terms
of Hamming distance for a given message size) than the polynomials of earlier protocols,
and publishing the best of these with the aim of improving the error detection capacity of
future standards. In particular, iSCSI and SCTP have adopted one of the findings of this
research, the CRC-32C (Castagnoli) polynomial.
The design of the 32-bit polynomial most commonly used by standards bodies, CRC-
32-IEEE, was the result of a joint effort for the Rome Laboratory and the Air Force
Electronic Systems Division by Joseph Hammond, James Brown and Shyan-Shiang Liu
of the Georgia Institute of Technology and Kenneth Brayer of the MITRE Corporation.
The earliest known appearances of the 32-bit polynomial were in their 1975 publications:
Technical Report 2956 by Brayer for MITRE, published in January and released for
public dissemination through DTIC in August, and Hammond, Brown and Liu's report for
the Rome Laboratory, published in May. Both reports contained contributions from the
other team. During December 1975, Brayer and Hammond presented their work in a
paper at the IEEE National Telecommunications Conference: the IEEE CRC-32
polynomial is the generating polynomial of a Hamming code and was selected for its
error detection performance. Even so, the Castagnoli CRC-32C polynomial used in iSCSI
or SCTP matches its performance on messages from 58 bits to 131 kbits, and
outperforms it in several size ranges including the two most common sizes of Internet
packet. The ITU-T G.hn standard also uses CRC-32C to detect errors in the payload
(although it uses CRC-16-CCITT for PHY headers).
CRC32 computation is implemented in hardware as an operation of the SSE4.2
instruction set, first introduced in Intel processors' Nehalem microarchitecture.
CHAPTER 3
LINEAR-FEEDBACK SHIFT REGISTER
3.1 FIBONACCI LFSR
In computing, a linear-feedback shift register (LFSR) is a shift register whose input bit
is a linear function of its previous state.
The most commonly used linear function of single bits is exclusive-or (XOR). Thus, an
LFSR is most often a shift register whose input bit is driven by the XOR of some bits of
the overall shift register value.
The initial value of the LFSR is called the seed, and because the operation of the
register is deterministic, the stream of values produced by the register is completely
determined by its current (or previous) state. Likewise, because the register has a finite
number of possible states, it must eventually enter a repeating cycle. However, an LFSR
with a well-chosen feedback function can produce a sequence of bits that appears random
and has a very long cycle. Applications of LFSRs include generating pseudo-random
numbers, pseudo-noise sequences, fast digital counters, and whitening sequences. Both
hardware and software implementations of LFSRs are common.
The mathematics of a cyclic redundancy check, used to provide a quick check against
transmission errors, are closely related to those of an LFSR. The bit positions that affect
the next state are called the taps. In the diagram the taps are [16,14,13,11]. The rightmost
bit of the LFSR is called the output bit. The taps are XOR'd sequentially with the output
bit and then fed back into the leftmost bit. The sequence of bits in the rightmost position
is called the output stream.
The bits in the LFSR state that influence the input are called taps. A maximum-length
LFSR produces an m-sequence (i.e., it cycles through all possible 2^m - 1 states within
the shift register except the state where all bits are zero), unless it contains all zeros, in
which case it will never change.
As an alternative to the XOR-based feedback in an LFSR, one can also use XNOR.[2]
This function is an affine map, not strictly a linear map, but it results in an equivalent
polynomial counter whose state is the complement of the state of an LFSR. A state with
all ones is illegal when using an XNOR feedback, in the same way as a state with all
zeroes is illegal when using XOR. This state is considered illegal because the counter
would remain "locked-up" in this state. The sequence of numbers generated by an LFSR
or its XNOR counterpart can be considered a binary numeral system just as valid as Gray
code or the natural binary code. The arrangement of taps for feedback in an LFSR can be
expressed in finite field arithmetic as a polynomial mod 2. This means that the
coefficients of the polynomial must be 1s or 0s. This is called the feedback polynomial or
reciprocal characteristic polynomial. For example, if the taps are at the 16th, 14th, 13th
and 11th bits (as shown), the feedback polynomial is x^16 + x^14 + x^13 + x^11 + 1.
The "one" in the polynomial does not correspond to a tap – it corresponds to the input
to the first bit (i.e. x0, which is equivalent to 1). The powers of the terms represent the
tapped bits, counting from the left. The first and last bits are always connected as an input
and output tap respectively. The LFSR is maximal-length if and only if the corresponding
feedback polynomial is primitive. This means that the following conditions are necessary
(but not sufficient): the number of taps is even, and the set of taps is setwise co-prime, i.e.,
there must be no divisor other than 1 common to all taps. Tables of primitive polynomials
from which maximum-length LFSRs can be constructed are given below and in the
references. There can be more than one maximum-length tap sequence for a given LFSR
length. Also, once one maximum-length tap sequence has been found, another
automatically follows. If the tap sequence in an n-bit LFSR is [n, A, B, C, 0], where the 0
corresponds to the x0 = 1 term, then the corresponding "mirror" sequence is [n, n − C, n −
B, n − A, 0]. So the tap sequence [32, 22, 2, 1, 0] has as its counterpart [32, 31, 30, 10, 0].
Both give a maximum-length sequence.
Fig 3.1: A 16-bit Fibonacci LFSR
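The tap set [16, 14, 13, 11] of the figure corresponds to XORing bits 0, 2, 3 and 5 of a 16-bit register word, counting from the output end; a direct software sketch (function name and seed are illustrative):

```python
def fibonacci_lfsr_period(seed=0xACE1):
    # 16-bit Fibonacci LFSR, taps [16, 14, 13, 11]
    # (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1)
    state, period = seed, 0
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)   # shift right, feedback on the left
        period += 1
        if state == seed:
            return period
```

Because this feedback polynomial is primitive, the register visits all 2^16 - 1 = 65535 non-zero states before repeating, regardless of the non-zero seed chosen.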
3.2 GALOIS LFSRs
Named after the French mathematician Évariste Galois, an LFSR in Galois configuration,
which is also known as modular, internal XORs, or one-to-many LFSR, is an alternate
structure that can generate the same output stream as a conventional LFSR (but offset in
time). In the Galois configuration, when the system is clocked, bits that are not taps are
shifted one position to the right unchanged. The taps, on the other hand, are XORed with
the output bit before they are stored in the next position. The new output bit is the next
input bit. The effect of this is that when the output bit is zero, all the bits in the register
shift to the right unchanged, and the input bit becomes zero. When the output bit is one,
the bits in the tap positions all flip (if they are 0, they become 1, and if they are 1, they
become 0), and then the entire register is shifted to the right and the input bit becomes 1.
To generate the same output stream, the order of the taps is the counterpart (see above) of
the order for the conventional LFSR, otherwise the stream will be in reverse. Note that
the internal state of the LFSR is not necessarily the same. The Galois register shown has
the same output stream as the Fibonacci register in the first section. A time offset exists
between the streams, so a different start point will be needed to get the same output each
cycle.
Galois LFSRs do not concatenate every tap to produce the new input (the XORing is
done within the LFSR, and no XOR gates are chained in series, so the propagation
delay is reduced to that of a single XOR rather than a whole chain); thus each tap can
be computed in parallel, increasing the speed of execution.
In a software implementation of an LFSR, the Galois form is more efficient, as the
XOR operations can be implemented a word at a time: only the output bit must be
examined individually.
Fig 3.2: A 16-bit Galois LFSR.
3.3 MATRIX FORMS
Binary LFSRs of both Fibonacci and Galois configurations can be expressed as linear
functions using matrices. Using the companion matrix A of the characteristic polynomial
of the LFSR and denoting the seed as a column vector s, the state of the register in
Fibonacci configuration after k steps is given by y_k = A^k · s, with all arithmetic
carried out over GF(2).
3.4 SOME POLYNOMIALS FOR MAXIMAL LFSRs
The following table lists maximal-length polynomials for shift-register lengths up to
24. Note that more than one maximal-length polynomial may exist for any given shift-
register length. Lists of alternative maximal-length polynomials for shift-register lengths
4–32 (beyond which it becomes unfeasible to store or transfer them) can be found in the
references.
Table 3: General Polynomials
3.5 OUTPUTS-STREAM PROPERTIES
Ones and zeroes occur in "runs". The output stream 1110010, for example, consists of
four runs of lengths 3, 2, 1 and 1, in order. In one period of a maximal LFSR, 2^(n−1)
runs occur (in the example above, the 3-bit LFSR has 4 runs). Exactly half of these runs
are one bit long, a quarter are two bits long, and so on, up to a single run of zeroes
n − 1 bits long and a single run of ones n bits long. This distribution almost equals the
statistical expectation value for a truly random sequence. However, the probability of
finding exactly this distribution in a sample of a truly random sequence is rather low.
LFSR output streams are deterministic. If the present state and the positions of the
XOR gates in the LFSR are known, the next state can be predicted. This is not
possible with truly random events. With maximal-length LFSRs, prediction is easier
still, since there are only a limited number of maximal-length tap sequences for each
register length.
The output stream is reversible; an LFSR with mirrored taps will cycle through the
output sequence in reverse order.
3.6 APPLICATIONS
LFSRs can be implemented in hardware, and this makes them useful in applications
that require very fast generation of a pseudo-random sequence, such as direct-sequence
spread spectrum radio. LFSRs have also been used for generating an approximation
of white noise in various programmable sound generators.
3.6.1 USES AS COUNTERS
The repeating sequence of states of an LFSR allows it to be used as a clock divider or
as a counter when a non-binary sequence is acceptable, as is often the case where
computer index or framing locations need to be machine-readable.[7] LFSR counters have
simpler feedback logic than natural binary counters or Gray-code counters, and therefore
can operate at higher clock rates. However, it is necessary to ensure that the LFSR never
enters an all-zeros state, for example by presetting it at start-up to any other state in the
sequence. The table of primitive polynomials shows how LFSRs can be arranged in
Fibonacci or Galois form to give maximal periods. One can obtain any other period by
adding, to an LFSR with a longer period, logic that shortens the sequence by skipping
some states.
3.6.2 USES IN CRYPTOGRAPHY
LFSRs have long been used as pseudo-random number generators for use in stream
ciphers (especially in military cryptography), due to the ease of construction from
simple electromechanical or electronic circuits, long periods, and very
uniformly distributed output streams. However, an LFSR is a linear system, leading to
fairly easy cryptanalysis. For example, given a stretch of known plaintext and the
corresponding ciphertext, an attacker can recover the stretch of LFSR output
stream used in the system and, from that stretch of the output stream, can
construct an LFSR of minimal size that simulates the intended receiver by using
the Berlekamp–Massey algorithm. This LFSR can then be fed the intercepted stretch of
output stream to recover the remaining plaintext.
Three general methods are employed to reduce this problem in LFSR-based stream
ciphers:
Non-linear combination of several bits from the LFSR state;
Non-linear combination of the output bits of two or more LFSRs (see also: shrinking
generator), or the use of evolutionary algorithms to introduce non-linearity;[8]
Irregular clocking of the LFSR, as in the alternating step generator.
3.6.3 TEST-PATTERN GENERATION
Complete LFSRs are commonly used as pattern generators for exhaustive testing, since
they cover all possible inputs for an n-input circuit. Maximal-length LFSRs and weighted
LFSRs are widely used as pseudo-random test-pattern generators for pseudo-random test
applications.
3.6.4 SIGNATURE ANALYSIS
In built-in self-test (BIST) techniques, storing all the circuit outputs on chip is not
possible, but the circuit output can be compressed to form a signature that will later be
compared to the golden signature (of the good circuit) to detect faults. Since this
compression is lossy, there is always a possibility that a faulty output also generates the
same signature as the golden signature and the faults cannot be detected. This condition is
called error masking or aliasing. BIST is accomplished with a multiple-input signature
register (MISR or MSR), which is a type of LFSR. A standard LFSR has a single XOR or
XNOR gate, where the input of the gate is connected to several "taps" and the output is
connected to the input of the first flip-flop. A MISR has the same structure, but the input
to every flip-flop is fed through an XOR/XNOR gate. For example, a 4-bit MISR has a 4-
bit parallel output and a 4-bit parallel input. The input of the first flip-flop is
XOR/XNORd with parallel input bit zero and the "taps". Every other flip-flop input is
XOR/XNORd with the preceding flip-flop output and the corresponding parallel input bit.
Consequently, the next state of the MISR depends on the last several states, as opposed to
just the current state. Therefore, a MISR will always generate the same golden signature
provided that the input sequence is the same every time.
3.6.5 USES IN DIGITAL BROADCASTING
To prevent short repeating sequences (e.g., runs of 0s or 1s) from forming spectral lines
that may complicate symbol tracking at the receiver or interfere with other transmissions,
the data bit sequence is combined with the output of a linear-feedback register before
modulation and transmission. This scrambling is removed at the receiver after
demodulation. When the LFSR runs at the same bit rate as the transmitted symbol stream,
this technique is referred to as scrambling. When the LFSR runs considerably faster than
the symbol stream, the LFSR-generated bit sequence is called chipping code. The
chipping code is combined with the data using exclusive or before transmitting
using binary phase-shift keying or a similar modulation method. The resulting signal has
a higher bandwidth than the data, and therefore this is a method of spread-
spectrum communication. When used only for the spread-spectrum property, this
technique is called direct-sequence spread spectrum; when used to distinguish several
signals transmitted in the same channel at the same time and frequency, it is called code
division multiple access.
Neither scheme should be confused with encryption or encipherment; scrambling and
spreading with LFSRs do not protect the information from eavesdropping. They are
instead used to produce equivalent streams that possess convenient engineering properties
to allow robust and efficient modulation and demodulation.
Digital broadcasting systems that use linear-feedback registers:
ATSC Standards (digital TV transmission system – North America)
DAB (Digital Audio Broadcasting system – for radio)
DVB-T (digital TV transmission system – Europe, Australia, parts of Asia)
NICAM (digital audio system for television)
Other digital communications systems using LFSRs:
INTELSAT business service (IBS)
Intermediate data rate (IDR)
SDI (Serial Digital Interface transmission)
Data transfer over PSTN (according to the ITU-T V-series recommendations)
CDMA (Code Division Multiple Access) cellular telephony
100BASE-T2 "fast" Ethernet scrambles bits using an LFSR
1000BASE-T Ethernet, the most common form of Gigabit Ethernet, scrambles bits
using an LFSR
PCI Express 3.0
SATA
CHAPTER 4
PROPOSED METHOD
4.1 INTRODUCTION
In this section, we present the architecture of the proposed concurrent error detection
of the parallel CRC structure based on multiple-bit parity prediction. We have already
formulated multiple-bit parity prediction expressions for l > m, where l is the number of
input message bits in each iteration and m is the number of CRC bits.
Fig 4.1: Simplified block diagram of the proposed multiple-bit parity-based error
detection structure of parallel CRC architecture.
We apply a multiple-bit parity approach to the parallel CRC architecture considered
earlier in order to achieve concurrent fault detection. This multiple-bit
parity checking circuit will compare predicted parity and actual parity of the syndrome
bits at the end of each iteration and raise an alarm if they do not match. The output of the
parity calculation unit is the actual multiple-bit parity of the CRC syndrome bits after the
nth iteration, whereas the output of the parity prediction unit is the predicted multiple-bit
parity. The actual parity is obtained by XORing the outputs of the CRC Unit, whereas the
predicted parity is evaluated as a different function of inputs.
4.2 PARITY
Parity is the simplest form of error checking. It adds one bit to the pattern and then
requires that the modulo-2 sum of all the bits of the pattern and the parity bit have a
defined answer. The answer is 0 for even parity and 1 for odd parity. An alternative way
of making the same statement is that odd(even) parity constrains there to be an odd(even)
number of “1"s in the pattern plus parity bit. Parity bits are sufficient to catch all single
errors in the pattern plus parity bit as this will change a single 1 to a 0 or vice versa and
therefore upset the parity calculation. A repeat of the parity calculation at any time will
reveal this problem. However, the system will not catch any double errors (except the
trivial case of two errors on the same bit), and these will be flagged as valid. For example,
a pattern of 0110100 becomes 00110100 with the addition of a parity bit and enforcement
of odd parity. A single error would (for example) change the pattern to 00110000, which
has the wrong parity. However, a further error to 10110000 looks OK, but is in fact
wrong. The length of pattern which each parity bit “guards” should be as long as possible,
so that the parity bit takes up the least amount of extra storage, but not so long that a
double error becomes likely. Thus, if the probability of a single bit being in error is
10^-6, then the probability of an error in an 8-bit pattern (7 bits + parity) is about
8 × 10^-6, and the probability of a double error is about 3 × 10^-11, which is quite small.
Fig 4.2: Parity prediction circuit of each block of the proposed multi-bit scheme for l < m.
Table 3: List of the Symbols Used
CHAPTER 5
RESULTS
5.1 SIMULATION RESULTS
16-bit Generator Polynomial: x^16 + x^15 + x^2 + 1
8-bit Message Bits: 8’b00000001
16-bit Theoretical CRC: 16’b0111110100000111
Fig 5.1 RTL SCHEMATIC FOR TOP MODULE
Fig.5.2 INTERNAL STRUCTURE FOR TOP MODULE
Fig. 5.3 SIMULATION RESULT WITHOUT ERROR INJECTION
Fig. 5.4 SIMULATION RESULT WITH ERROR INJECTION
5.2 SYNTHESIS REPORT
The design has been synthesized using the Genus compiler in 90 nm and 45 nm
technologies. The leakage power, dynamic power, total power and area have been
recorded and compared. The comparison results shown in the table below are lower for
the 45 nm technology, which is therefore best suited for chip fabrication.

         Leakage power (µW)   Dynamic power (µW)   Total power (µW)   Area (microns)   Time (ns)
90 nm    4.629                544.14               548.779            911.308          1.4
45 nm    0.018                312.003              312.00             0.9 × 10^-3      1.3

Table 4: Comparison table for power and area
Table 5: Comparison of serial CRC and parallel CRC

                     Serial CRC     Parallel CRC
Area (microns)       585            529
Total power (nW)     108005.943     57212.539
Timing slack (ps)    4809           15130

It is evident from the above table that the area and power are reduced and the timing
slack is improved to a great extent in 90 nm technology. The parallel CRC
implementation employing a multi-output LFSR is therefore the better choice.
CHAPTER 6
CONCLUSIONS
A concurrent error detection design structure for parallel CRC architecture has been
proposed. Two different cases, namely l > m and l < m have been considered and the
formulation for the multiple-bit parity-based concurrent fault detection scheme has been
derived followed by the implementation of the corresponding architecture for each of
them. The error and fault coverage of the proposed scheme have been analyzed in detail
using error and fault models. We have also presented a formulation for the error detection
capability of the scheme covering all the possible 2^m − 1 cases where the output can become
erroneous. Moreover, a thorough analysis of all the possible single stuck-at fault locations
has been discussed. Furthermore, we investigated the relation between stuck-at fault location
and error propagation resulting in an erroneous output, and the relation between this
propagation and the CRC generator polynomial. Finally, the fault detection capability
and the theoretical time and area overheads of the proposed schemes have been reported
and confirmed by software simulation and ASIC implementation results. We have coded
the proposed scheme in C for software simulation and in Verilog for ASIC
implementation purposes. The proposed method is shown to be area efficient and at the
same time it has a high fault detection capability. The proposed multiple-bit parity
prediction scheme has also been shown to have not only the ability to detect a fault, but
also to point out the region of the fault. This potential advantage can further be utilized
while extending the proposed scheme to correct the error online in the future, i.e., to achieve
an error-free output even in the presence of a hardware error.
REFERENCES
1. W. Peterson and D. Brown, “Cyclic codes for error detection,” Proc. IRE, vol. 49,
pp. 228–235, 1961.
2. ATM Layer Specification, ITU-T Recommendation I.361, 1999.
3. IEEE Standard for Information Technology: Carrier Sense Multiple Access with
Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications,
ANSI/IEEE Std 802.3-2005.
4. Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY)
Specifications, ANSI/IEEE Std 802.11-1999.
5. IEEE Standard for Local and Metropolitan Area Networks: Air Interface for Fixed and
Mobile Broadband Wireless Access Systems, ANSI/IEEE Std. 802.16-2004.
6. C. Kennedy, “High performance hardware and software implementations of the cyclic
redundancy check computation,” PhD Thesis, University of Western Ontario, London,
ON, 2009.
7. S. Lin and D.J. Costello, Error Control Coding. Englewood Cliffs, NJ: Prentice-Hall,
1983.
8. T. Ramabadran and S. Gaitonde, “A tutorial on CRC computations,” IEEE Micro, vol. 8,
no. 4, pp. 62–75, Aug. 1988.
9. T.-B. Pei and C. Zukowski, “High-speed parallel CRC circuits in VLSI,” IEEE Trans.
Commun., vol. 40, no. 4, pp. 653–657, Apr. 1992.
10. G. Albertengo and R. Sisto, “Parallel CRC generation,” IEEE Micro, vol. 10, no. 5,
pp. 63–71, Oct. 1990.
11. G. Campobello, G. Patane, and M. Russo, “Parallel CRC realization,” IEEE Trans.
Comput., vol. 52, no. 10, pp. 1312–1319, Oct. 2003.
12. M.-D. Shieh, M.-H. Sheu, C.-H. Chen, and H.-F. Lo, “A systematic approach for
parallel CRC computations,” J. Inf. Sci. Eng., vol. 17, no. 3, pp. 445–461, 2001.
13. M. Sprachmann, “Automatic generation of parallel CRC circuits,” IEEE Des. Test
Comput., vol. 18, no. 3, pp. 108–114, May/Jun. 2001.
14. C. Cheng and K. K. Parhi, “High-speed parallel CRC implementation based on
unfolding, pipelining, and retiming,” IEEE Trans. Circuits Syst. II, Express Briefs, vol.
53, no. 10, pp. 1017–1021, Oct. 2006.
15. C. Cheng and K. K. Parhi, “High speed VLSI architecture for general linear feedback
shift register (LFSR) structures,” in Proc. IEEE 43rd Asilomar Conf. Signals, Syst.
Comput., 2009, pp. 713–717.
16. M. Ayinala and K. K. Parhi, “High-speed parallel architectures for linear feedback
shift registers,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4459–4469, Sep. 2011.
17. C. Kennedy and A. Reyhani-Masoleh, “High-speed parallel CRC circuits,” in Proc.
42nd Asilomar Conf. Signals, Syst. Comput., Oct. 2008, pp. 1823–1829.
18. M. Braun, J. Friedrich, T. Grün, and J. Lembert, “Parallel CRC computation in
FPGAs,” in Proc. 6th Int. Workshop Field-Programm. Logic Appl., 1996, vol. 1142, pp.
156–165.
19. R. Glaise, “A two-step computation of cyclic redundancy code CRC-32 for ATM
networks,” IBM J. Res. Develop., vol. 41, no. 6, pp. 705–709, 1997.
20. J. Derby, “High-speed CRC computation using state-space transformations,” in Proc.
IEEE Global Telecommun. Conf., 2001, vol. 1, pp. 166–170.
21. P. Reviriego, S. Pontarelli and J. A. Maestro, “Concurrent error detection for
orthogonal latin squares encoders and syndrome computation,” IEEE Trans. Very Large
Scale Integration Syst., vol. 21, no. 12, pp. 2334–2338, Dec. 2013.