    ERROR DETECTION AND DATA RECOVERY ARCHITECTURE FOR MOTION ESTIMATION


    CHAPTER 1 INTRODUCTION

    1.1 INTRODUCTION

    Advances in semiconductors, digital signal processing, and communication technologies have made multimedia applications more flexible and reliable. A good example is the H.264 video standard, also known as MPEG-4 Part 10 Advanced Video Coding, which is widely regarded as the next-generation video compression standard. Video compression is necessary in a wide range of applications to reduce the total amount of data required for transmitting or storing video. Among the components of a coding system, motion estimation (ME) is of priority concern because it exploits the temporal redundancy between successive frames, yet it is also the most time-consuming aspect of coding. Performing 60% to 90% of the computations encountered in the entire coding system, the ME is widely regarded as the most computationally intensive part of a video coding system.

    A ME generally consists of processing elements (PEs) with a size of 4 x 4. However, accelerating the computation speed depends on a large PE array, especially in high-resolution devices with a large search range such as HDTV. Additionally, the visual quality and peak signal-to-noise ratio (PSNR) at a given bit rate are affected if an error occurs in the ME process. A testable design is thus increasingly important to ensure the reliability of the numerous PEs in a ME. Moreover, although advances in VLSI technology facilitate the integration of a large number of PEs of a ME into a chip, the logic-per-pin ratio subsequently increases, significantly decreasing the efficiency of logic testing on the chip. For a commercial chip, it is therefore necessary for the ME to incorporate design for testability (DFT).

    DFT focuses on increasing the ease of device testing, thus guaranteeing high reliability of a system. DFT methods rely on reconfiguration of a circuit under test (CUT) to improve testability. While DFT approaches enhance the testability of circuits, advances in sub-micron technology and the resulting increases in the complexity of electronic circuits and systems have meant that built-in self-test (BIST) schemes have rapidly become necessary in the digital world. BIST for the ME does not require expensive test equipment, ultimately lowering test costs. Moreover, BIST can generate test stimuli and analyze test responses without outside support, subsequently streamlining the testing and diagnosis of digital systems. However, the increasing density of circuitry requires that the built-in testing approach not only detect faults but also specify their locations for error correction. Thus, extended schemes of BIST, referred to as built-in self-diagnosis and built-in self-correction, have been developed recently.

    While the extended BIST schemes generally focus on memory circuits, testing-related issues of video coding have seldom been addressed. Thus, exploring the feasibility of an embedded testing approach to detect errors and recover data in a ME is of worthwhile interest. Additionally, the reliability of the numerous PEs in a ME can be improved by enhancing the capability of concurrent error detection (CED). The CED approach can detect errors through conflicting and undesired results generated from operations on the same operands. CED can also test the circuit at full operating speed without interrupting the system. Thus, based on the CED concept, this work develops a novel error detection and data recovery (EDDR) architecture, based on the residue-and-quotient (RQ) code, to detect errors and recover data in the PEs of a ME and, in doing so, to guarantee excellent reliability for video coding testing applications.

    1.2 OVERVIEW

    Video compression is the field in electrical engineering and computer science that deals with the representation of video data, for storage and/or transmission, for both analog and digital video. Although video coding is often considered to be only for natural video, it can also be applied to synthetic (computer-generated) video, i.e. graphics. Many representations take advantage of features of the Human Visual System to achieve an efficient representation. The biggest challenge is to reduce the size of the video data using video compression; for this reason the terms video coding and video compression are often used interchangeably. The search for efficient video compression techniques has dominated much of the research activity in video coding since the early 1980s. The major milestone was H.261, from which JPEG adopted the idea of using the DCT; since then, many other advancements have been made to algorithms such as motion estimation. Since approximately 2000 the focus has been more on metadata and video search, resulting in MPEG-7 and MPEG-21.

    1.2.1 Video Compression

    The main problem with uncompressed (raw) video is that it contains an immense amount of data, while communication and storage capacities are limited and expensive. For example, consider an HDTV video signal with 1280 x 720 pixels/frame and progressive scanning at 60 frames/sec; the transmitter must be able to send


    (1280 x 720 pixels/frame) x (60 frames/sec) x (3 colours/pixel) x (8 bits/colour) = 1.3 Gb/sec

    But the available HDTV channel bandwidth is only around 20 Mb/s, i.e., compression by a factor of roughly 70 is required. Likewise, a DVD (Digital Versatile Disc) can store only a few seconds of raw video at this frame rate and television-quality resolution.
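    As a quick sanity check on these figures, the raw bit rate and the required compression factor can be computed directly. The short Python sketch below simply reproduces the arithmetic above; the 20 Mb/s channel figure is the one quoted in the text.

        # Raw (uncompressed) bit rate for 720p HDTV: 1280 x 720 pixels/frame,
        # 60 frames/sec, 3 colour components per pixel, 8 bits per component.
        pixels_per_frame = 1280 * 720
        raw_bits_per_sec = pixels_per_frame * 60 * 3 * 8
        print(raw_bits_per_sec / 1e9)      # ~1.33 Gb/sec
        print(raw_bits_per_sec / 20e6)     # compression factor ~66 for a 20 Mb/s channel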

    1.2.2 Need for Compression

    The following statement (or something similar) has been made many times over the 20-year history of image and video compression: "Video compression will become redundant very soon, once transmission and storage capacities have increased to a sufficient level to cope with uncompressed video." It is true that both storage and transmission capacities continue to increase. However, an efficient and well-designed video compression system gives very significant performance advantages for visual communications at both low and high transmission bandwidths. At low bandwidths, compression enables applications that would not otherwise be possible, such as basic-quality video telephony over a standard telephone connection. At high bandwidths, compression can support a much higher visual quality. For example, a 4.7 Gbyte DVD can store approximately 2 hours of uncompressed QCIF video (at 15 frames per second) or 2 hours of compressed ITU-R 601 video (at 30 frames per second). Most users would prefer to see television-quality video with smooth motion rather than postage-stamp video with jerky motion. Video CODECs will therefore remain an important part of the emerging multimedia industry for the foreseeable future, allowing designers to make the most efficient use of available transmission or storage capacity. In this chapter we introduce the basic components of an image or video compression system. We then describe the main functional blocks of an image encoder/decoder (CODEC) and a video CODEC.

    1.2.3 Achieving Compression

    Video compression can be achieved by exploiting the similarities or redundancies, as well as the irrelevancy, that exist in a typical video signal. The redundancy in a video signal is based on two principles. The first is the spatial redundancy that exists within each frame. The second is the redundancy between corresponding (successive) frames; this is called temporal redundancy, and it can be eliminated by using a motion estimation and compensation procedure. The identification and reduction of redundancy in a video signal is relatively straightforward; reducing irrelevancy is harder, because deciding what is perceptually relevant and what is not is very difficult. This operation can be done by using appropriate models of the Human Visual System.

    In video, successive frames may contain the same objects (still or moving). In inter-frame coding, motion estimation and compensation have become powerful techniques to eliminate the temporal redundancy due to the high correlation between consecutive frames. In video scenes, motion can be a complex combination of translation and rotation. Such motion is difficult to estimate and may require a large amount of processing. However, translational motion is easily estimated and has been used successfully for motion-compensated coding.

    Different search algorithms are used to estimate motion between frames. When motion estimation is performed by an MPEG-2 encoder, it groups pixels into 16 x 16 macroblocks. MPEG-4 AVC encoders can divide these macroblocks into smaller partitions, down to 4 x 4, and even of variable size within the same macroblock. Partitions allow for more accuracy in motion estimation because areas with high motion can be isolated from those with less movement.


    CHAPTER 2 LITERATURE REVIEW AND PROBLEM IDENTIFICATION

    2.1 LITERATURE REVIEW

    Researchers have proposed many algorithms for motion estimation. Generally, motion estimation search methods are divided into two types:

    1) Pixel-based motion estimation

    2) Block-based motion estimation

    The pixel-based motion estimation approach seeks to determine a motion vector for every pixel in the image. This is also referred to as the optical flow method, which works on the fundamental assumption of brightness constancy, that is, the intensity of a pixel remains constant when it is displaced. However, no unique match for a pixel in the reference frame is found in the direction normal to the intensity gradient. For this reason an additional constraint is introduced in terms of the smoothness of the displacement vector in the neighborhood. The smoothness constraint makes the algorithm iterative and requires excessively large computation time, making it unsuitable for practical and real-time implementation.

    An alternative and faster approach is block-based motion estimation. In this method, the candidate frame is divided into non-overlapping blocks (of size 16 x 16, 8 x 8, or even 4 x 4 pixels in the recent standards) and, for each such candidate block, the best motion vector is determined in the reference frame. Here, a single motion vector is computed for the entire block, whereby we make the inherent assumption that the entire block undergoes translational motion. This assumption is reasonably valid except at object boundaries, and a smaller block size leads to better motion compensation and motion estimation.

    Block-based motion estimation has been adopted in all the video coding standards proposed to date. It is easy to implement in hardware, and real-time motion estimation and prediction are possible.

    Many studies in the literature use different block matching motion estimation algorithms. Among these, the full search gives the minimum error compared with all other block matching algorithms; it is the basic search, but it has the maximum computational complexity. The literature therefore offers many faster block matching algorithms. In block matching motion estimation, two main factors must be kept in mind. The first is the type of search pattern; this is the most important one because, when an object moves, the search pattern must follow it so that the minimum number of search points is needed for block matching. The second is the matching criterion, such as the mean absolute difference: when the search pattern closely tracks the object, the matching error is minimized.

    The block size is one of the important parameters in a block matching algorithm. A smaller block size achieves better prediction quality. This is due to a number of reasons. A smaller block size reduces the effect of the accuracy problem; in other words, with a smaller block size, there is less possibility that the block will contain different objects moving in different directions.

    In the literature review, different search algorithms were studied. In all of them, reducing the error increases the number of search points, and reducing the number of search points increases the error. Based on these observations, a fast block matching algorithm is proposed that gives good search accuracy with minimal error. Finally, the performance of the proposed algorithm is evaluated in terms of completeness and correctness.

    2.1.1 Motion Estimation

    A video sequence can be considered a discretized three-dimensional projection of the real four-dimensional continuous space-time. The objects in the real world may move, rotate, or deform. The movements cannot be observed directly; only the light reflected from the object surfaces and projected onto an image can. The light source can be moving, and the reflected light varies depending on the angle between a surface and the light source. There may be objects occluding the light rays and casting shadows. The objects may be transparent (so that several independent motions could be observed at the same location of an image) or there might be fog, rain, or snow blurring the observed image. The discretization introduces noise into the video sequence, from which the video encoder makes its motion estimates. There may also be noise in the image capture device (such as a video camera) or in the electrical transmission lines. A perfect motion model would take all these factors into account and find the motion that has the maximum likelihood given the observed video sequence. The difference between the current frame and the reference frame can be observed in Figure 2.1.


    Changes between frames are mainly due to the movement of objects. Using a model of

    the motion of objects between frames, the encoder estimates the motion that occurred between

    the reference frame and the current frame. This process is called motion estimation (ME). The

    encoder then uses this motion model and information to move the contents of the reference frame

    to provide a better prediction of the current frame. This process is known as motion

    compensation (MC), and the prediction so produced is called the motion-compensated prediction

    (MCP) or the displaced frame (DF). In this case, the coded prediction error signal is called the displaced-frame difference (DFD). A block diagram of a motion-compensated coding system is illustrated in Figure 2.2; this is the most commonly used inter-frame coding method.

    Figure 2.1 Motion estimation detector

    The reference frame employed for ME can occur temporally before or after the current

    frame. The two cases are known as forward prediction and backward prediction, respectively.

    The prediction can be observed in figure 2.3. In bidirectional prediction, however, two reference

    frames (one each for forward and backward prediction) are employed and the two predictions are

    interpolated (the resulting predicted frame is called B-frame). The most commonly used ME

    method is the block matching motion estimation (BMME) algorithm.


    Figure 2.2 Motion compensated video coding

    Figure 2.3 Predictive sources coding with motion compensation

    2.1.2 Motion Estimation Procedure

    After completion of motion estimation, the residual picture and the motion vectors are obtained. This procedure is executed for each block (16x16, 8x8, or 4x4) in the current frame.


    1. For each block in the current frame, a search area is defined in the reference frame. The search area is typically sized at 2 to 3 times the macroblock size (16x16). Using the fact that the motion between consecutive frames is statistically small, the search range is confined to this area. After the search process, a best match will be found within the area. The best match usually means having the lowest energy in the residual formed by subtracting the candidate block in the search region from the current block in the current frame. The process of finding the best match block by block is called block-based motion estimation.

    2. After finding the best match, the motion vectors and residues between the current block and the reference block are computed. The process of obtaining the residues and motion vectors is known as motion compensation.

    3. The residues and motion vectors of the best match are encoded by the transform unit and entropy unit and transmitted to the decoder side.

    4. At the decoder side, the process is reversed to reconstruct the original picture.

    Figure 2.4 shows an illustration of the above procedure. In modern video coding standards, the reference frame can be a previous frame, a future frame, or a combination of two or more previously coded frames. The number of reference frames needed depends on the required accuracy: the more reference frames referenced by the current block, the more accurate the prediction is.

    2.1.3 Motion Vectors

    To find the motion of each block, a motion vector is defined as the relative displacement between the current candidate block and the best matching block within the search window in the reference frame. It is a directional pair representing the displacement in the horizontal (x-axis) and vertical (y-axis) directions. The maximum value of the motion vector is determined by the search range: the larger the search range, the more bits are needed to code the motion vector. Designers need to make tradeoffs between these two conflicting parameters. The motion vector is illustrated in Figure 2.4.
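    As an illustration of this tradeoff, a simple fixed-length estimate of the motion-vector cost can be derived from the search range. The Python sketch below is only a rough model: it ignores the predictive and entropy coding of motion vectors used by real CODECs.

        import math

        def mv_bits_fixed_length(search_range):
            # Each motion-vector component can take 2*search_range + 1 integer values,
            # so a fixed-length code needs ceil(log2(2*search_range + 1)) bits per component.
            bits_per_component = math.ceil(math.log2(2 * search_range + 1))
            return 2 * bits_per_component   # horizontal + vertical components

        print(mv_bits_fixed_length(7))      # +/-7 search range  -> 8 bits per motion vector
        print(mv_bits_fixed_length(63))     # +/-63 search range -> 14 bits per motion vector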


    Figure 2.4 Motion Estimation and Motion Vector

    A motion vector is produced for each macroblock in the frame; MPEG-1 and MPEG-2 employ this property. With the introduction of variable block size motion estimation in MPEG-4 and H.264/AVC, one macroblock can produce more than one motion vector due to the existence of different kinds of sub-blocks. In H.264, 41 motion vectors can be produced for one macroblock, and they are passed to rate-distortion optimization to choose the best combination. This is known as mode selection.

    2.1.4 Prediction Of Video CODEC

    A video signal consists of a sequence of individual frames. Each frame may be compressed individually using an image CODEC: this is described as intra-frame coding, where each frame is intra-coded without any reference to other frames. However, better compression performance may be achieved by exploiting the temporal redundancy in a video sequence (the similarities between successive video frames). This may be achieved by adding a front end to the image CODEC, with two main functions:


    These are: (1) prediction and (2) compensation.

    Prediction: a prediction of the current frame is formed based on one or more previously transmitted frames.

    Compensation: the prediction is subtracted from the current frame to produce a residual frame.

    The residual frame is then processed using an image CODEC. The key to this approach

    is the prediction function: if the prediction is accurate, the residual frame will contain little data

    and will hence be compressed to a very small size by the image CODEC. In order to decode the

    frame, the decoder must reverse the compensation process, adding the prediction to the decoded

    residual frame (reconstruction). This is inter frame coding: frames are coded based on some

    relationship with other video frames, i.e. coding exploits the interdependencies of video frames.

    Figure 2.5 (a) Current Frame Figure 2.5(b) Previous Frame


    Figure 2.5(c) Residual frame

    2.1.5 Frame Differencing

    The simplest predictor is just the previously transmitted frame. Figure 2.5(c) above shows the

    residual frame produced by subtracting the previous frame from the current frame in a video

    sequence. Mid-grey areas of the residual frame contain zero data: light and dark areas indicate

    positive and negative residual data respectively. It is clear that much of the residual data is zero:

    hence, compression efficiency can be improved by compressing the residual frame rather than the

    current frame.
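    A minimal sketch of frame differencing is shown below (using NumPy, with hypothetical frame arrays). It simply forms the residual against the previous frame and shows how a decoder would add the residual back to its prediction.

        import numpy as np

        def residual(cur_frame, prev_frame):
            # Frame differencing: the predictor is simply the previous frame,
            # so the residual is the pixel-wise (signed) difference.
            return cur_frame.astype(np.int16) - prev_frame.astype(np.int16)

        def reconstruct(prev_frame, decoded_residual):
            # The decoder adds the (decoded) residual back to its prediction.
            out = prev_frame.astype(np.int16) + decoded_residual
            return np.clip(out, 0, 255).astype(np.uint8)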

    Encoder input      Encoder prediction   Encoder output /      Decoder prediction   Decoder output
                                            decoder input
    Original frame 1   Zero                 Compressed frame 1    Zero                 Decoded frame 1
    Original frame 2   Original frame 1     Compressed            Decoded frame 1      Decoded frame 2
                                            residual frame 2
    Original frame 3   Original frame 2     Compressed            Decoded frame 2      Decoded frame 3
                                            residual frame 3

    Table 2.1 Prediction drift


    Figure 2.6 Encoder with Decoding Loop

    The decoder faces a potential problem that can be illustrated as follows. Table 2.1

    shows the sequence of operations required to encode and decode a series of video frames

    using frame differencing. For the first frame the encoder and decoder use no prediction. The

    problem starts with frame 2: the encoder uses the original frame 1 as a prediction and encodes

    the resulting residual. However, the decoder only has the decoded frame 1 available to form

    the prediction. Because the coding process is lossy, there is a difference between the decoded

    and original frame 1 which leads to a small error in the prediction of frame 2 at the decoder.

    This error will build up with each successive frame and the encoder and decoder predictors

    will rapidly drift apart, leading to a significant drop in decoded quality. The solution to this

    problem is for the encoder to use a decoded frame to form the prediction. Hence the encoder

    in the above example decodes (or reconstructs) frame 1 to form a prediction for frame 2. The

    encoder and decoder use the same prediction and drift should be reduced or removed. Figure

    2.6 shows the complete encoder which now includes a decoding loop in order to reconstruct

    its prediction reference. The reconstructed (or reference) frame is stored in the encoder and

    in the decoder to form the prediction for the next coded frame.
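    The drift effect described above can be illustrated with a small behavioural sketch. The quantize() function below is only a crude stand-in for the lossy transform and quantization stage, and the frames are assumed to be NumPy arrays.

        import numpy as np

        def quantize(residual, step=8.0):
            # Crude stand-in for the lossy transform/quantization/inverse stage.
            return np.round(residual / step) * step

        def encode_decode(frames, use_decoded_reference=True):
            # If the encoder predicts from the original previous frame while the
            # decoder can only predict from its decoded previous frame, the two
            # drift apart; closing the loop with the decoded frame avoids this.
            decoded_prev = np.zeros_like(frames[0], dtype=float)
            original_prev = np.zeros_like(frames[0], dtype=float)
            decoded = []
            for frame in frames:
                reference = decoded_prev if use_decoded_reference else original_prev
                res = quantize(frame - reference)   # what the encoder transmits
                dec = decoded_prev + res            # what the decoder reconstructs
                decoded.append(dec)
                decoded_prev, original_prev = dec, frame.astype(float)
            return decoded

    Comparing the decoded frames against the originals for the two settings shows the error growing frame by frame in the open-loop case and staying bounded when the encoder uses its own decoded frame as the prediction reference.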

    2.1.6 Motion Compensated Prediction

    Frame differencing gives better compression performance than intra-frame coding when successive frames are very similar, but it does not perform well when there is a significant change between the previous and current frames. Such changes are usually due to movement in the video scene, and a significantly better prediction can be achieved by estimating this movement and compensating for it.

    Figure 2.7 below shows a video CODEC that uses motion-compensated prediction. Two new steps are required in the encoder:

    Motion estimation: a region of the current frame (often a rectangular block of luminance samples) is compared with neighboring regions of the previously reconstructed frame. The motion estimator attempts to find the best match, i.e. the neighboring block in the reference frame that gives the smallest residual block.

    Motion compensation: the matching region or block from the reference frame (identified by the motion estimator) is subtracted from the current region or block.

    Figure 2.7 Video CODEC with Motion Estimation and Compensation


    Here the decoder carries out the same motion compensation to reconstruct the current frame. This means the encoder has to transmit the location of the best matching blocks to the decoder (typically in the form of a set of motion vectors). Figure 2.8 below shows a residual frame produced by subtracting a motion-compensated version of the previous frame from the current frame; it clearly contains less data than the residual produced by simple frame differencing. This improvement in compression does not come without a price: motion estimation can be very computationally intensive. The design of a motion estimation algorithm can have a dramatic effect on the compression performance and computational complexity of a video CODEC.

    Figure 2.8 Residual frame (MAD)

    2.1.7 Block Matching Algorithm

    Figure 2.9 illustrates the process of a block matching algorithm. In a typical block matching algorithm, each frame is divided into blocks, each of which consists of luminance and chrominance blocks. Usually, for coding efficiency, motion estimation is performed only on the luminance block. Each luminance block in the present frame is matched against candidate blocks in a search area of the reference frame. These candidate blocks are just displaced versions of the original block. The best candidate block is found and its displacement (motion vector) is recorded. In a typical inter-frame coder, the input frame is subtracted from the prediction of the reference frame. Consequently the motion vector and the resulting error can be transmitted instead of the original luminance block; thus inter-frame redundancy is removed and data compression is achieved. At the receiver end, the decoder builds the frame difference signal from the received data and adds it to the reconstructed reference frame.

    Figure 2.9 Illustration of Motion Estimation Process

    This algorithm is based on a translational model of the motion of objects between frames. It also assumes that all pixels within a block undergo the same translational movement. There are many other ME methods, but BMME is normally preferred due to its simplicity and the good compromise it offers between prediction quality and motion overhead. The assumption is not strictly valid, since we capture 3-D scenes through the camera and objects have more degrees of freedom than just the translational one. However, the assumption is still reasonable considering the practical movements of objects over one frame, and it makes the computations much simpler. There are many other approaches to motion estimation, some using the frequency or wavelet domains, and designers have scope to invent new methods, since this process does not need to be specified in coding standards. The standards need only specify how the motion vectors should be interpreted by the decoder. Block matching (BM) is the most common method of motion estimation. Typically each macroblock (16 x 16 pixels) in the new frame is compared with shifted regions of the same size from the previous decoded frame, and the shift which results in the minimum error is selected as the best motion vector for that macroblock. The motion-compensated prediction frame is then formed from all the shifted regions of the previous decoded frame.
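    A minimal full-search block matching sketch is given below (NumPy, greyscale frames assumed). It is only a behavioural illustration of the exhaustive search described above, using SAD as the matching criterion.

        import numpy as np

        def sad(block_a, block_b):
            # Sum of absolute differences between two equally sized blocks.
            return int(np.sum(np.abs(block_a.astype(int) - block_b.astype(int))))

        def full_search(cur_frame, ref_frame, top, left, block=16, search=7):
            # Compare the current macroblock with every candidate block inside a
            # +/-search window of the reference frame; return the motion vector
            # (dx, dy) of the minimum-SAD match and its cost.
            h, w = ref_frame.shape
            cur_block = cur_frame[top:top + block, left:left + block]
            best_mv, best_cost = (0, 0), None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = top + dy, left + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue    # candidate would fall outside the reference frame
                    cost = sad(cur_block, ref_frame[y:y + block, x:x + block])
                    if best_cost is None or cost < best_cost:
                        best_mv, best_cost = (dx, dy), cost
            return best_mv, best_cost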


    2.1.8 Backward Motion Estimation

    Motion estimation is generally performed as backward motion estimation: the current frame is taken as the candidate frame, and the reference frame on which the motion vectors are searched is a past frame, that is, the search is backward. Backward motion estimation leads to forward motion prediction.

    Figure 2.10 Backward motion estimation with current frame k and frame (k-1) as the reference frame

    2.1.9 Forward Motion Estimation

    It is just the opposite of backward motion estimation. Here, the search for motion vectors is carried out on a frame that appears later than the candidate frame in temporal order; in other words, the search is forward. Forward motion estimation leads to backward motion prediction. It may appear that forward motion estimation is unusual, since it requires future frames to predict the candidate frame. However, this is not unusual, since the candidate frame for which the motion vector is being sought is not necessarily the current, that is the most recent, frame. It is possible to store more than one frame and to use one of the past frames as a candidate frame, with another frame appearing later in the temporal order as the reference.


    Figure 2.11 Forward motion estimation with current frame as k and frame (k+1) as the

    reference frame

    2.1.10 Matching Criteria For Motion Estimation

    Inter-frame predictive coding is used to eliminate the large amount of temporal and spatial redundancy that exists in video sequences and thereby helps in compressing them. In conventional predictive coding the difference between the current frame and the predicted frame is coded and transmitted. The better the prediction, the smaller the error and hence the transmission bit rate. When there is motion in a sequence, a pixel on the same part of the moving object is a better prediction for the current pixel. There are a number of criteria to evaluate the goodness of a match.

    A popular matching criterion used for block-based motion estimation is:

    1. Sum of Absolute Differences (SAD)

    To implement block motion estimation, the candidate video frame is partitioned into a set of non-overlapping blocks, and the motion vector is to be determined for each such candidate block with respect to the reference. For each of these criteria, a square block of size N x N pixels is considered. The intensity value of the pixel at coordinates (n1, n2) in frame k is given by S(n1, n2, k), where 0 <= n1, n2 <= N - 1. The frame k is referred to as the candidate frame and the block of pixels so defined is the candidate block.


    2.1.10.1 Sum of Absolute Differences (SAD)

    The sum of absolute differences (SAD) also makes the error values positive, but instead of summing up the squared differences, the absolute differences are summed. The SAD measure at displacement (i, j) is defined as

    SAD(i, j) = sum over n1 = 0..N-1 and n2 = 0..N-1 of | S(n1, n2, k) - S(n1 + i, n2 + j, k - 1) |

    where frame (k - 1) is the reference frame. The SAD is evaluated using the current block and a reference block selected within the search window; the search window size differs between CODECs such as H.264, MPEG, etc. The SAD is calculated for all the possible reference blocks within the search window; the block with the minimum SAD is then selected, and a motion vector is drawn in order to denote the motion.

    2.2 PROBLEM IDENTIFICATION

    As mentioned in the earlier discussion, PEs are the essential building blocks and are connected regularly to construct a ME. Generally, PEs are surrounded by sets of adders (ADDs) and accumulators that determine how data flows through them. PEs can thus be considered part of the class of circuits called iterative logic arrays (ILAs), whose testing can be easily achieved by using the cell fault model (CFM). The CFM has received considerable interest due to the accelerated growth in the use of high-level synthesis, as well as the parallel increase in the complexity and density of integrated circuits (ICs). Using the CFM makes the tests independent of the adopted synthesis tool and vendor library. Arithmetic modules like ADDs (the primary element in a PE), due to their regularity, are designed in an extremely dense configuration.

    Moreover, a more comprehensive fault model, i.e. the stuck-at (SA) model, must be adopted to cover actual failures in the interconnect data bus between PEs. The SA fault is a well-known structural fault model, which assumes that a fault causes a line in the circuit to behave as if it were permanently at logic 0 (stuck-at 0, SA0) or logic 1 (stuck-at 1, SA1). An SA fault in a ME architecture can cause errors in computing SAD values. The magnitude of the resulting computational error e is assumed here to be equal to SAD' - SAD, where SAD' denotes the computed SAD value in the presence of SA faults.
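    A minimal sketch of this fault model is shown below. For simplicity the stuck-at fault is applied directly to one bit of the SAD result rather than to an internal adder net, which is enough to illustrate the distortion e = SAD' - SAD.

        def sad(cur, ref):
            # Fault-free SAD over a list of pixel pairs.
            return sum(abs(c - r) for c, r in zip(cur, ref))

        def inject_stuck_at(value, bit, stuck_at_one):
            # Force one output bit to a constant logic value (SA1 or SA0).
            return value | (1 << bit) if stuck_at_one else value & ~(1 << bit)

        cur = [18, 200, 77, 5]
        ref = [20, 190, 70, 9]
        fault_free = sad(cur, ref)                                      # SAD
        faulty = inject_stuck_at(fault_free, bit=3, stuck_at_one=True)  # SAD'
        e = faulty - fault_free                                         # computational error e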


    CHAPTER 3 METHODOLOGY

    An error detection and data recovery (EDDR) architecture is the solution proposed for the above-mentioned problem. The technique used for error detection and correction is RQ code generation. The proposed architecture is described below.

    3.1 PROPOSED EDDR ARCHITECTURE DESIGN

    Fig. 3.1 shows the conceptual view of the proposed EDDR scheme, which comprises two major circuit designs, i.e. the error detection circuit (EDC) and the data recovery circuit (DRC), to detect errors and recover the corresponding data in a specific CUT. The test code generator (TCG) in Fig. 3.1 utilizes the concept of the RQ code to generate the corresponding test codes for error detection and data recovery. In other words, the test codes from the TCG and the primary output from the CUT are delivered to the EDC to determine whether the CUT has errors. The DRC is in charge of recovering data from the TCG. Additionally, a selector is enabled to export either error-free data or the data-recovery results. Importantly, any array-based computing structure, such as the ME, discrete cosine transform (DCT), iterative logic array (ILA), or finite impulse response (FIR) filter, is feasible for the proposed EDDR scheme to detect errors and recover the corresponding data.

    Figure 3.1. Conceptual view of the proposed EDDR architecture.


    Figure 3.2. Testing process of a specific PEi in the proposed EDDR architecture.

    This work adopts the systolic ME as a CUT to demonstrate the feasibility of the proposed

    EDDR architecture. A ME consists of many PEs incorporated in a 1-D or 2-D array for video

    encoding applications. A PE generally consists of two ADDs (i.e. an 8-b ADD and a 12-b ADD)

    and an accumulator (ACC). Next, the 8-b ADD (a pixel has 8-b data) is used to estimate the

    addition of the current pixel (Cur_pixel) and reference pixel (Ref_pixel). Additionally, a 12-b

    ADD and an ACC are required to accumulate the results from the 8-b ADD in order to determine

    the sum of absolute difference (SAD) value for video encoding applications. Notably, some

    registers and latches may exist in ME to complete the data shift and storage. Fig. 3.2 shows an

    example of the proposed EDDR circuit design for a specific PEi of a ME. The fault model

    definition, RQCG-based TCG design, operations of error detection and data recovery, and the

    overall test strategy are described carefully as follows.
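    Before turning to those designs, the PE datapath just described can be summarised with a small behavioural sketch. This is only a software model of the 8-bit difference stage and 12-bit accumulator, assuming a 4 x 4 macroblock (16 pixel pairs); it is not the hardware design itself.

        def pe_sad(cur_pixels, ref_pixels):
            # One PE: form |Cur_pixel - Ref_pixel| for each of the 16 pixel pairs
            # and accumulate the results in a 12-bit accumulator (16 * 255 = 4080
            # fits in 12 bits, so no overflow occurs for 8-bit pixels).
            assert len(cur_pixels) == len(ref_pixels) == 16
            acc = 0
            for c, r in zip(cur_pixels, ref_pixels):
                acc = (acc + abs(c - r)) & 0xFFF
            return acc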

    3.1.1 RQ Code Generation

    Coding approaches such as the parity code, Berger code, and residue code have been considered for design applications to detect circuit errors. The residue code is a separable arithmetic code obtained by estimating a residue for the data and appending it to the data. Error detection logic for operations is typically derived using a separate residue code, making the detection logic simple and easily implemented. For instance, assume that N denotes an integer, N1 and N2 represent data words, and m refers to the modulus. A separate residue code of interest is one in which N is coded as the pair (N, |N|m). Notably, |N|m is the residue of N modulo m. However, only a bit error can be detected based on the residue code; additionally, an error cannot be recovered effectively by using the residue code alone. Therefore, this work presents a quotient code, which is derived from the residue code, to assist the residue code in detecting multiple errors and recovering from errors. The mathematical model of the RQ code is described as follows. Assume that the binary data X is an n-bit word. The RQ code of X modulo m is then expressed as R = |X|m and Q = [X/m], respectively, where [i] denotes the largest integer not exceeding i.

    According to the above RQ code expression, the corresponding circuit design of the RQCG can be realized. In order to reduce the complexity of the circuit design, the modulo operation is implemented using addition operations. Additionally, based on the concept of the residue code, the following definitions can be applied to generate the RQ code for circuit design.


    To accelerate the circuit design of the RQCG, the n-bit binary data X can generally be divided into two parts, Y1 and Y0, such that X = Y1 x 2^k + Y0. Significantly, the value of k is equal to [n/2], and Y1 and Y0 are interpreted as unsigned decimal values. If the modulus is m = 2^k - 1, then the residue code of X modulo m is given by |X|m = |Y1 + Y0|m, since 2^k mod m = 1. Notably, since the sum Y1 + Y0 can still be greater than the modulus m, the equations must be simplified further to replace the complex modulo operation with simple addition operations.

    Based on these equations, the corresponding circuit design of the RQCG is easily realized by using simple adders (ADDs). Namely, the RQ code can be generated with low complexity and little hardware cost.
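    A minimal software sketch of RQ code generation is shown below. The second function mirrors the adder-style decomposition just described (splitting X into Y1 and Y0 with m = 2^k - 1); it is only a behavioural model of the RQCG, not the actual hardware design.

        def rq_code(x, k):
            # Direct definition: R = |X|m and Q = [X/m], with m = 2^k - 1.
            m = (1 << k) - 1
            return x % m, x // m

        def rq_code_adder_style(x, k):
            # Write X = Y1*2^k + Y0. Because 2^k = m + 1,
            #   X = Y1*m + (Y1 + Y0), so R = (Y1 + Y0) mod m and
            #   Q = Y1 + (Y1 + Y0) div m, which needs only additions
            #   and a small final correction.
            m = (1 << k) - 1
            y1, y0 = x >> k, x & m
            s = y1 + y0
            return s % m, y1 + s // m

        assert rq_code(200, 4) == rq_code_adder_style(200, 4) == (5, 13)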

    3.1.2 Test Code Generation Design

    According to Fig. 3.2, the TCG is an important component of the proposed EDDR architecture. Notably, the TCG design is based on the ability of the RQCG circuit to generate corresponding test codes in order to detect errors and recover data. The specific PEi in Fig. 3.2 estimates the absolute difference between the Cur_pixel of the search area and the Ref_pixel of the current macroblock. Thus, by utilizing PEs, the SAD in a macroblock of size N x N can be evaluated as

    SAD = sum over i, j of | Xij - Yij |

    where Xij and Yij represent the luminance pixel values of Cur_pixel and Ref_pixel, respectively. Based on the residue code, the definitions above can be applied to facilitate generation of the corresponding RQ code (RT and QT) of the SAD from the TCG. Namely, the circuit design of the TCG can be easily achieved (see Fig. 3.3) by accumulating the RQ codes of the individual absolute differences.

    Fig. 3.4 shows the timing chart for a macroblock with a size of 4 x 4 in a specific PEi to demonstrate the operations of the TCG circuit. The data from Cur_pixel and Ref_pixel must first be sent to a comparator at the 1st clock in order to determine which of the two luminance pixel values is larger, so that the absolute difference can be formed by subtracting the smaller value from the larger one. At the 2nd clock the difference values are generated, and the corresponding RQ codes can be captured by the RQCG circuits when the 3rd clock is triggered; these codes are obtained by using the circuit of a subtracter (SUB). The 4th clock displays the operating results, and the modulus value is then obtained at the 5th clock. Next, the summation of the quotient values and residue values modulo m proceeds from clocks 5 to 21 through the ACC circuits. Since a 4 x 4 macroblock in a specific PEi of a ME contains 16 pixels, the corresponding RQ code (RT and QT) is exported to the EDC and DRC circuits in order to detect errors and recover data after 22 clocks. Based on the TCG circuit design shown in Fig. 3.3, the error detection and data recovery operations of a specific PEi in a ME can be achieved.

    Figure 3.3. Circuit design of the TCG.


    Figure 3.4. Timing chart of the TCG.
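    The accumulation performed by the TCG can be modelled in a few lines. The sketch below is an assumption-level behavioural model of how RT and QT are formed from the per-pixel RQ codes; by construction it satisfies m x QT + RT = SAD.

        def tcg_reference_code(cur_block, ref_block, k):
            # Accumulate the RQ codes of the per-pixel absolute differences,
            # then fold the residue overflow into the quotient so that
            # m*QT + RT equals the SAD of the whole macroblock.
            m = (1 << k) - 1
            r_acc = q_acc = 0
            for c, r in zip(cur_block, ref_block):
                d = abs(c - r)          # |Cur_pixel - Ref_pixel|
                r_acc += d % m
                q_acc += d // m
            return r_acc % m, q_acc + r_acc // m     # (RT, QT)

    For example, with k = 3 (m = 7), cur = [10, 20] and ref = [3, 5] give SAD = 22 and (RT, QT) = (1, 3), and indeed 7 x 3 + 1 = 22.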

    3.2 EDDR PROCESS

    Fig. 3.2 clearly indicates that the error detection operation in a specific PEi is achieved by using the EDC, which compares the outputs of the TCG and RQCG1 in order to determine whether errors have occurred. If RPE != RT and/or QPE != QT, then an error in the specific PEi is detected. The EDC output is then used to generate a 0/1 signal indicating that the tested PEi is error-free or faulty, respectively.

    This work presents a mathematical statement to verify the error detection operation. Based on the definition of the fault model, the SAD value is influenced if SA1 and/or SA0 errors occur in a specific PEi. In other words, the SAD value is transformed to SAD' = SAD + e if an error e occurs, where the error signal e is determined by the particular stuck-at faults present. The RQ code computed from the faulty output is then given by RPEi = |SAD + e|m and QPEi = [(SAD + e)/m].

    During data recovery, the DRC circuit plays a significant role in recovering the RQ code from the TCG. The data can be recovered by implementing the mathematical model

    SAD = m x QT + RT = (2^k - 1) x QT + RT.

    To realize this data recovery operation, a barrel shifter and a corrector circuit are necessary: the barrel shifter implements the multiplication by 2^k, and the corrector applies the remaining subtraction and addition. Notably, the proposed EDDR design executes the error detection and data recovery operations simultaneously. Additionally, either the error-free data from the tested PEi or the data-recovery result from the DRC is selected by a multiplexer (MUX) and passed to the next PE under test for subsequent testing, as shown in Fig. 3.5.
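    The detection and recovery steps can be summarised in a short behavioural sketch. The functions below are an assumption-level software model of the EDC, DRC, and MUX behaviour, not the hardware implementation; (RT, QT) is the reference code from the TCG and (RPE, QPE) is the code computed from the PE output by RQCG1.

        def eddr_check_and_recover(r_pe, q_pe, r_t, q_t, k):
            # Error detection: any mismatch between the two RQ codes flags an error.
            error = (r_pe != r_t) or (q_pe != q_t)
            # Data recovery: SAD = m*QT + RT with m = 2^k - 1, computed as
            # (QT << k) - QT + RT, i.e. a barrel shift plus a correction step.
            recovered_sad = (q_t << k) - q_t + r_t
            return error, recovered_sad

        def select_output(sad_from_pe, error, recovered_sad):
            # The MUX passes on error-free data or the recovered value.
            return recovered_sad if error else sad_from_pe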


    Figure 3.5. Proposed EDDR architecture design for a ME.


    CHAPTER 4 - IMPLEMENTATION

    4.1 INTRODUCTION TO VLSI:

    VLSI stands for "Very Large Scale Integration". It is a classification of ICs: a typical VLSI chip contains millions of active devices. Typical VLSI functions include memories, computers, and signal processors. A semiconductor process technology is a method by which working circuits can be manufactured from designed specifications. There are many such technologies, each of which creates a different environment or style of design. In integrated circuit design, the specification consists of polygons of conducting and semiconducting material that will be layered on top of each other to produce a working chip. When a chip is custom-designed for a specific use, it is called an application-specific integrated circuit (ASIC). Printed-circuit (PC) design also results in precise positions of conducting materials, as they will appear on a circuit board; in addition, PC design aggregates the bulk of the electronic activity into standard IC packages, the position and interconnection of which are essential to the final circuit. Printed circuitry may be easier to debug than integrated circuitry, but it is slower, less compact, more expensive, and unable to take advantage of the specialized silicon layout structures that make VLSI systems so attractive. The design of these electronic circuits can be carried out at many different levels of refinement, from the most detailed layout to the most abstract architectures. Given the complexity that is demanded at all levels, computers are increasingly used to aid this design at each step. It is no longer reasonable to use manual design techniques, in which each layer is hand etched or composed by laying tape on film. Thus the term computer-aided design (CAD) is an accurate description of this modern approach, and it is broader in scope than the recently popular term computer-aided engineering (CAE).

    4.1.1 Application Of VLSI:

    PLAs:

    Combinational circuit elements are an important part of any digital design. Three

    common methods of implementing a combinational block are random logic, read-only memory

    (ROM), and programmable logic array (PLA). In random-logic designs, the logic description of

    the circuit is directly translated into hardware structures such as AND and OR gates. The PLA

    occupies less area on the silicon due to reduced interconnection wire space; however, it may be


    slower than purely random logic. A PLA can also be used as a compact finite state machine by

    feeding back part of its outputs to the inputs and clocking both sides. Normally, for high-speed

    applications, the PLA is not implemented as two NOR arrays. The inputs and outputs are inverted

    to preserve the AND-OR structure.

    Gate-Arrays:

    The gate-array is a popular technique used to design IC chips. Like the PLA, it contains a fixed

    mesh of unfinished layout that must be customized to yield the final circuit. Gate-arrays are more

    powerful, however, because the contents of the mesh are less structured so the interconnection

    options are more flexible. Gate-arrays exist in many forms with many names, e.g. uncommitted

    logic arrays and master-slice. The disadvantage of gate-arrays is that they are not optimal for any

    task.

    Gate Matrices:

    The gate matrix is the next step in the evolution of automatically generated layout from high-

    level specification. Like the PLA, this layout has no fixed size; a gate matrix grows according to

    its complexity. Like all regular forms of layout, this one has its fixed aspects and its customizable

    aspects. In gate matrix layout the fixed design consists of vertical columns of polysilicon gating

    material. The customizable part is the metal and diffusion wires that run horizontally to

    interconnect and form gates with the columns.

    4.1.2 Application Areas Of VLSI

    Electronic systems now perform a wide variety of tasks in daily life. Electronic systems in

    some cases have replaced mechanisms that operated mechanically, hydraulically, or by other

    means; electronics are usually smaller, more flexible, and easier to service. In other cases

    electronic systems have created totally new applications. Electronic systems perform a variety of

    tasks, some of them visible, some more hidden:

    1. Personal entertainment systems such as portable MP3 players and DVD players perform

    sophisticated algorithms with remarkably little energy.

    2. Electronic systems in cars operate stereo systems and displays; they also control fuel

    injection systems, adjust suspensions to varying terrain, and perform the control functions

    required for anti-lock braking (ABS) systems.


    3. Digital electronics compress and decompress video, even at high definition data rates, on-

    the-fly in consumer electronics.

    4. Low-cost terminals for Web browsing still require sophisticated electronics, despite their

    dedicated function.

    5. Personal computers and workstations provide word-processing, financial analysis, and

    games. Computers include both central processing units (CPUs) and special-purpose

    hardware for disk access, faster screen display, etc.

    4.1.3 Advantages Of VLSI

    While we concentrate on integrated circuits here, the properties of integrated circuits (what we can and cannot efficiently put in an integrated circuit) largely determine the architecture of the entire system. Integrated circuits improve system characteristics in several critical ways. ICs have three key advantages over digital circuits built from discrete components:

    Size. Integrated circuits are much smaller: both transistors and wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales of discrete components. Small size leads to advantages in speed and power consumption, since smaller components have smaller parasitic resistances, capacitances, and inductances.

    Speed. Signals can be switched between logic 0 and logic 1 much more quickly within a chip than they can between chips. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. The high speed of circuits on-chip is due to their small size: smaller components and wires have smaller parasitic capacitances to slow down the signal.

    Power consumption. Logic operations within a chip also take much less power. Once again, lower power consumption is largely due to the small size of circuits on the chip: smaller parasitic capacitances and resistances require less power to drive them.

    4.2 VLSI AND SYSTEMS

    These advantages of integrated circuits translate into advantages at the system level:

    Smaller physical size. Smallness is often an advantage in itself: consider portable televisions or handheld cellular telephones.

    Lower power consumption. Replacing a handful of standard parts with a single chip reduces total power consumption. Reducing power consumption has a ripple effect on the rest of the system: a smaller, cheaper power supply can be used; since less power consumption means less heat, a fan may no longer be necessary; and a simpler cabinet with less electromagnetic shielding may be feasible, too.

    Reduced cost. Reducing the number of components, the power supply requirements, cabinet

    costs, and so on, will inevitably reduce system cost. The ripple effect of integration is such that

    the cost of a system built from custom ICs can be less, even though the individual ICs cost more

    than the standard parts they replace. Understanding why integrated circuit technology has such

    profound influence on the design of digital systems requires understanding both the technology

    of IC manufacturing and the economics of ICs and digital systems.

    4.3 INTRODUCTION TO ASICS AND PROGRAMMABLE LOGIC:

    The last 15 years have witnessed a decline in the number of cell-based ASIC designs as a means for developing customized SoCs. Rising NREs, development times, and risk have mostly restricted the use of cell-based ASICs to the highest-volume applications: applications that can withstand the multi-million-dollar development costs associated with 1-2 design re-spins. Analysts estimate that the number of cell-based ASIC design starts per year is now only between 2000 and 3000, compared to roughly 10,000 in the late 1990s. The FPGA has emerged as a technology that fills some of the gap left by cell-based ASICs. Yet even after 20+ years of existence and 40X more design starts per year than cell-based ASICs, the size of the FPGA market in dollar terms remains only a fraction of that of cell-based ASICs. This suggests that there are many FPGA designs that never make it into production and that, for the most part, the FPGA is still seen by many as a vehicle for prototyping or college education, and has perhaps even succeeded in stifling industry innovation. This chapter introduces a newer technology, the second-generation Structured ASIC, that is tipped to re-energize the path to innovation within the electronics industry. It brings together some of the key advantages of FPGA technology (i.e. fast turnaround, no mask charges, no minimum order quantity) and of cell-based ASICs (i.e. low unit cost and power) to deliver a new platform for SoC design. This document defines requirements for the development of Application Specific Integrated Circuits (ASICs). It is intended to be used as an appendix to a Statement of Work. The document complements the ESA ASIC Design and Assurance Requirements (AD1), which is a precursor to a future ESA PSS document on ASIC design.

    Moore's Law

    In the 1960s Gordon Moore predicted that the number of transistors that could be manufactured on a chip would grow exponentially. His prediction, now known as Moore's Law, was remarkably prescient. Moore's ultimate prediction was that transistor count would double every two years, an estimate that has held up remarkably well. Today, an industry group maintains the International Technology Roadmap for Semiconductors (ITRS), which maps out strategies to maintain the pace of Moore's Law. (The ITRS roadmap can be found at http://www.itrs.net.)

    4.3.1 Changing Landscape

    Structured ASICs

    A new alternative has recently emerged to address the market void between FPGAs and

    cell-based ASICs. Analysts term this as the Structured ASIC.

    First Generation Structured ASICs

    Like the FPGA market, the Structured ASIC market had a flurry of early entrants, many of whom have since departed the market. Examples include respectable semiconductor companies like NEC and LSI Logic, and EDA vendors such as Simplicity.

    First Generation Structured ASICs provided designers with considerable power and cost

    improvements over FPGAs but failed to remove many barriers to entry that existed with

    traditional cell-based ASICs. First generation Structured ASICs had the following characteristics:

    1. Turn-around times were still 2-5 months from tape-out to silicon

    2. NREs were still in the range of $150-$250K or more making the technology difficult to

    access for mainstream users.

    3. Minimum order quantities were required as wafers could not be shared amongst projects

    or customers

    4. Development costs and time were also very high and long respectively, as designers were

    expected to undergo rigorous verification down to the transistor level


    5. Designers transitioning from prototyping devices like FPGAs to first generation

    Structured ASICs were still expected to redesign the product into a completely new

device, revisit timing closure, and re-qualify the new device before it was production ready.

    While some companies still offer first generation Structured ASICs today, market

    acceptance has been severely limited as a result of these barriers to entry. However, these first

    generation Structured ASICs paved the way for a new generation that would combine the benefits

    of both FPGAs and cell-based ASICs.

    Second Generation Structured ASICs

    A new generation of Structured ASICs has emerged on the market and is gaining traction.

    This generation utilizes a single via mask for configuring the device. In doing so, it removes the

    need for the massive amounts of SRAM configuration elements and metal interconnect that

plague today's FPGAs. The benefits to designers are delivered through a device that provides up

    to 20X lower device power consumption and up to 80% lower unit cost than FPGAs, depending

on device density (larger FPGAs have more configuration elements and metal interconnect).

This new generation of Structured ASICs, available from eASIC Corporation and named Nextreme, also removes the barriers of traditional cell-based ASICs and of first generation Structured ASICs. The advantages of Nextreme Structured ASICs include:

1. Turn-around times from tape-out to silicon are only 3-4 weeks

    2. There are zero mask charges as multiple projects can be shared on a wafer

    3. There is no minimum order quantity

4. Development tool costs are low (analogous to FPGA-type tools)

    5. Development time is short as designers need not perform verification down to the

    transistor level or perform exhaustive test coverage

6. A coarse, FPGA-like architecture based on cells, which provides manufacturing yield advantages.

    There are device options for both prototyping and mass production. Designers

    transitioning from prototyping Nextreme Structured ASICs to mass production Nextreme

    Structured ASICs need not revisit timing closure or re-qualify the production device.


4.3.2 Applications for Nextreme Structured ASICs:

    Embedded Processing

Nextreme Structured ASICs are ideally suited for embedded processing applications. With the availability of a firm 150MHz ARM926EJ-S processor and AMBA peripherals, backed by industry-standard development tools from ARM and its Connected Community partners, designers have the option to implement control circuits in software. A major benefit of using

    Nextreme for implementing embedded systems is that designers are able to make performance,

area, and feature tradeoffs using both hardware and software, allowing for highly differentiated yet

    cost-optimized systems.

    Signal, Video and Image Processing

    Having to deal with programmable metal interconnect and its associated carry chain

    delays ultimately forced FPGA vendors to develop dedicated DSP blocks and slices to overcome

    performance bottlenecks. With Nextreme Structured ASICs, the elimination of massive amounts

    of metal interconnect means that these devices are not subject to unacceptable carry chain delays

and many signal processing structures can be implemented, at speed, using the logic fabric alone.

Another capability within Nextreme that makes these devices particularly suitable for signal processing is memory. eRAM blocks are well suited to distributed applications such as semi-parallel filters and video processing. As these blocks are located very close together, they can be connected to form larger blocks of up to 4 Kbits per eUnit.
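As a rough sketch only (the module name, port names, and sizes below are invented for this illustration, and the mapping onto eRAM blocks is assumed rather than taken from vendor documentation), a small synchronous memory of the kind used in semi-parallel filter designs might be described in Verilog as follows; the 256 x 16 organization matches the 4 Kbit figure mentioned above:

    // Hypothetical example: a small synchronous RAM (256 x 16 = 4 Kbits)
    // that a synthesis tool could map onto distributed embedded memory.
    module small_ram #(
      parameter ADDR_W = 8,
      parameter DATA_W = 16
    )(
      input  wire                clk,
      input  wire                we,
      input  wire [ADDR_W-1:0]   addr,
      input  wire [DATA_W-1:0]   din,
      output reg  [DATA_W-1:0]   dout
    );
      reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

      always @(posedge clk) begin
        if (we)
          mem[addr] <= din;   // synchronous write
        dout <= mem[addr];    // registered read
      end
    endmodule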

    4.3.3 Field Programmable Gate Array (FPGA)

    A field-programmable gate array (FPGA) is an integrated circuit designed to be

configured by the customer or designer after manufacturing, hence "field-programmable". The

    FPGA configuration is generally specified using a hardware description language (HDL), similar

    to that used for an application-specific integrated circuit (ASIC) (circuit diagrams were

    previously used to specify the configuration, as they were for ASICs, but this is increasingly

    rare). FPGAs can be used to implement any logical function that an ASIC could perform. The

ability to update the functionality after shipping, partial re-configuration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding

    the generally higher unit cost), offer advantages for many applications.


    FPGAs contain programmable logic components called "logic blocks", and a hierarchy of

reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-

    chip programmable breadboard. Logic blocks can be configured to perform

    complex combinational functions, or merely simple logic gates like AND and XOR. In most

    FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more

    complete blocks of memory.
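As a small, hedged illustration (the signal names are invented), the Verilog fragment below describes one combinational function and one registered output; a typical FPGA flow would map the combinational logic into a look-up table inside a logic block and the register into that block's flip-flop:

    // Illustrative only: a tiny function plus a register, of the kind that
    // maps naturally onto one FPGA logic block (LUT + flip-flop).
    module logic_block_example (
      input  wire clk,
      input  wire a, b, c,
      output reg  q
    );
      wire f = (a & b) ^ c;     // combinational function -> LUT

      always @(posedge clk)
        q <= f;                 // memory element -> flip-flop
    endmodule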

    In addition to digital functions, some FPGAs have analog features. The most common

    analog feature is programmable slew rate and drive strength on each output pin, allowing the

    engineer to set slow rates on lightly loaded pins that would otherwise ring unacceptably, and to

    set stronger, faster rates on heavily loaded pins on high-speed channels that would otherwise run

too slowly. Another relatively common analog feature is differential comparators on input pins

    designed to be connected to differential signaling channels. A few "mixed signal FPGAs" have

    integrated peripheral Analog-to-Digital Converters (ADCs) and Digital-to-Analog Converters

    (DACs) with analog signal conditioning blocks allowing them to operate as a system-on-a-

    chip.[5]

    Such devices blur the line between an FPGA, which carries digital ones and zeros on its

internal programmable interconnect fabric, and a field-programmable analog array (FPAA), which

    carries analog values on its internal programmable interconnect fabric.

4.3.3.1 Definitions of Relevant Terminology

The most important terminology used in this section is defined below.

    Field-Programmable Device (FPD)

    A general term that refers to any type of integrated circuit used for implementing digital

    hardware, where the chip can be configured by the end user to realize different designs.

    Programming of such a device often involves placing the chip into a special programming unit,

    but some chips can also be configured in-system. Another name for FPDs is programmable

    logic devices (PLDs); although PLDs encompass the same types of chips as FPDs, we prefer the

    term FPD because historically the word PLD has referred to relatively simple types of devices.

    Programmable Logic Array (PLA)

    A Programmable Logic Array (PLA) is a relatively small FPD that contains two levels of

    logic, an AND-plane and an OR-plane, where both levels are programmable (note: although PLA


    structures are sometimes embedded into full-custom chips, we refer here only to those PLAs that

    are provided as separate integrated circuits and are user-programmable).

    Programmable Array Logic (PAL)

    A Programmable Array Logic (PAL) is a relatively small FPD that has a programmable

    AND-plane followed by a fixed OR-plane.

    Simple PLD

A general term for any small FPD, usually either a PLA or a PAL; commonly abbreviated SPLD.

    Complex PLD

A larger FPD, commonly abbreviated CPLD, that consists of an arrangement of multiple SPLD-like blocks on a single chip. Alternative names (not used here) sometimes adopted for this style of chip are Enhanced PLD (EPLD), Super PAL, Mega PAL, and others.

    Field-Programmable Gate Array (FPGA)

    A Field-Programmable Gate Array is an FPD featuring a general structure that allows

    very high logic capacity. Whereas CPLDs feature logic resources with a wide number of inputs

    (AND planes), FPGAs offer more narrow logic resources. FPGAs also offer a higher ratio of flip-

    flops to logic resources than do CPLDs.

    High-Capacity PLDs (HCPLD):

A single acronym that refers to both CPLDs and FPGAs. The term has been coined in trade literature to provide an easy way to refer to both types of devices.

    PAL is a trademark of Advanced Micro Devices.

    1. Interconnect - the wiring resources in an FPD.

    2. Programmable Switch- a user-programmable switch that can connect a logic element to

    an interconnect wire, or one interconnect wire to another

    3. Logic Block- a relatively small circuit block that is replicated in an array in an FPD.

    When a circuit is implemented in an FPD, it is first decomposed into smaller sub-circuits

    that can each be mapped into a logic block. The term logic block is mostly used in the

    context of FPGAs, but it could also refer to a block of circuitry in a CPLD.


    4. Logic Capacity- the amount of digital logic that can be mapped into a single FPD. This is

    usually measured in units of equivalent number of gates in a traditional gate array. In

    other words, the capacity of an FPD is measured by the size of gate array that it is

    comparable to. In simpler terms, logic capacity can be thought of as number of 2-input

    NAND gates.

    5. Logic Density - the amount of logic per unit area in an FPD.

    6. Speed-Performance- measures the maximum operable speed of a circuit when

    implemented in an FPD. For combinational circuits, it is set by the longest delay through

    any path, and for sequential circuits it is the maximum clock frequency for which the

circuit functions properly. In the remainder of this section, the evolution of FPDs over the past two decades is described to provide insight into FPD development. Additional

    background information is also included on the semiconductor technologies used in the

    manufacture of FPDs.

    4.3.4 Evolution Of Programmable Logic Devices

    The first type of user-programmable chip that could implement logic circuits was the

    Programmable Read-Only Memory (PROM), in which address lines can be used as logic circuit

    inputs and data lines as outputs. Logic functions, however, rarely require more than a few product

terms, and a PROM contains a full decoder for its address inputs. PROMs are thus an inefficient

    architecture for realizing logic circuits, and so are rarely used in practice for that purpose. The

    first device developed later specifically for implementing logic circuits was the Field-

    Programmable Logic Array (FPLA), or simply PLA for short. A PLA consists of two levels of

    logic gates: a programmable wired AND-plane followed by a programmable wired OR-

    plane. A PLA is structured so that any of its inputs (or their complements) can be ANDed

    together in the AND-plane; each AND-plane output can thus correspond to any product term of

    the inputs. Similarly, each OR plane output can be configured to produce the logical sum of any

    of the AND-plane outputs. With this structure, PLAs are well-suited for implementing logic

    functions in sum-of-products form. They are also quite versatile, since both the AND terms and

    OR terms can have many inputs (this feature is often referred to as wide AND and OR gates).
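For illustration (the signals and functions below are invented for this sketch), a pair of sum-of-products outputs of the kind a PLA realizes directly can be written in Verilog as:

    // Illustrative sum-of-products logic: each product term corresponds to
    // one AND-plane row, and each output ORs a subset of those terms.
    module pla_style_example (
      input  wire a, b, c, d,
      output wire f, g
    );
      wire p0 = a & ~b;        // product term 1
      wire p1 = b & c & d;     // product term 2
      wire p2 = ~a & ~c;       // product term 3

      assign f = p0 | p1;      // OR-plane output 1
      assign g = p1 | p2;      // OR-plane output 2 (product term p1 is shared)
    endmodule

The sharing of product term p1 between the two outputs reflects the fact that both the AND-plane and the OR-plane of a PLA are programmable.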

When PLAs were introduced in the early 1970s by Philips, their main drawbacks were that they

    were expensive to manufacture and offered somewhat poor speed-performance.


    Both disadvantages were due to the two levels of configurable logic, because

    programmable logic planes were difficult to manufacture and introduced significant propagation

    delays. To overcome these weaknesses, Programmable Array Logic (PAL) devices were

    developed. PALs feature only a single level of programmability, consisting of a programmable

wired AND-plane that feeds fixed OR-gates. To compensate for the lack of generality incurred because the OR-plane is fixed, several variants of PALs are produced, with different

    numbers of inputs and outputs, and various sizes of OR-gates. PALs usually contain flip-flops

    connected to the OR-gate outputs so that sequential circuits can be realized.
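By contrast, a hedged sketch of a PAL-style registered output (again with invented names) ORs a fixed, small number of product terms and feeds the result into a flip-flop:

    // Illustrative PAL-style macrocell: programmable product terms, a fixed
    // OR of those terms, and a flip-flop on the output for sequential logic.
    module pal_style_example (
      input  wire clk,
      input  wire a, b, c,
      output reg  q
    );
      wire sop = (a & b) | (~a & c);   // fixed OR of two product terms

      always @(posedge clk)
        q <= sop;                      // registered output
    endmodule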

    PAL devices are important because when introduced they had a profound effect on digital

hardware design, and because they are the basis for some of the newer, more sophisticated

    architectures that will be described shortly. Variants of the basic PAL architecture are featured in

    several other products known by different acronyms. All small PLDs, including PLAs, PALs, and

    PAL-like devices are grouped into a single category called Simple PLDs (SPLDs), whose most

    important characteristics are low cost and very high pin-to-pin speed-performance. As technology

    has advanced, it has become possible to produce devices with higher capacity than SPLDs. The

    difficulty with increasing capacity of a strict SPLD architecture is that the structure of the

    programmable logic-planes grows too quickly in size as the number of inputs is increased.

    The only feasible way to provide large capacity devices based on SPLD architectures is

    then to integrate multiple SPLDs onto a single chip and provide interconnect to programmably

    connect the SPLD blocks together. Many commercial FPD products exist on the market today

    with this basic structure, and are collectively referred to as Complex PLDs (CPLDs). CPLDs

    were pioneered by Altera, first in their family of chips called Classic EPLDs, and then in three

    additional series, called MAX 5000, MAX 7000 and MAX 9000. Because of a rapidly growing

    market for large FPDs, other manufacturers developed devices in the CPLD category and there

are now many choices available. CPLDs provide logic capacity up to the equivalent of about 50 typical

    SPLD devices, but it is somewhat difficult to extend these architectures to higher densities. To

    build FPDs with very high logic capacity, a different approach is needed. The highest capacity

    general purpose logic chips available today are the traditional gate arrays sometimes referred to

    as Mask-Programmable Gate Arrays (MPGAs).


    MPGAs consist of an array of pre-fabricated transistors that can be customized into the

user's logic circuit by connecting the transistors with custom wires. Customization is performed

    during chip fabrication by specifying the metal interconnect, and this means that in order for a

    user to employ an MPGA a large setup cost is involved and manufacturing time is long. Although

    MPGAs are clearly not FPDs, they are mentioned here because they motivated the design of the

    user-programmable equivalent: Field- Programmable Gate Arrays (FPGAs). Like MPGAs,

    FPGAs comprise an array of uncommitted circuit elements, called logic blocks, and interconnect

    resources, but FPGA configuration is performed through programming by the end user. An

    illustration of a typical FPGA architecture appears in Figure .

4.4 SOFTWARE REQUIREMENTS
Verification Tool: ModelSim 6.5e
Synthesis Tool: Xilinx ISE 14.4

    4.4.1 MODELSIM

    ModelSim SE - High Performance Simulation and Debug

ModelSim SE is Mentor Graphics' UNIX, Linux, and Windows-based simulation and debug

    environment, combining high performance with the most powerful and intuitive GUI in the

    industry.

    What's New in ModelSim SE?

    1. Improved FSM debug options including control of basic information, transition table and

    warning messages. Added support of FSM Multi-state transitions coverage (i.e. coverage

    for all possible FSM state sequences).

    2. Improved debugging with hyperlinked navigation between objects and their declaration,

    and between visited source files.

    3. The dataflow window can now compute and display all paths from one net to another.

    4. Enhanced code coverage data management with fine grain control of information in the

    source window.


    5. Toggle coverage has been enhanced to support SystemVerilog types: structures, packed

    unions, fixed-size multi-dimensional arrays and real.

    6. Some IEEE VHDL 2008 features are supported including source code encryption. Added

    support of new VPI types, including packed arrays of struct nets and variables.

    ModelSim SE Features:

    1. Multi-language, high performance simulation engine

    2. Verilog, VHDL, SystemVerilog Design

    3. Code Coverage

    4. SystemVerilog for Design

    5. Integrated debug

    6. JobSpy Regression Monitor

    7. Mixed HDL simulation option

    8. System C Option

    9. TCL/tk

    10. Solaris and Linux 32 & 64-bit

    11. Windows 32-bit

    ModelSim SE Benefits:

    1. High performance HDL simulation solution for FPGA & ASIC design teams

    2. The best mixed-language environment and performance in the industry

    3. Intuitive GUI for efficient interactive or post-simulation debug of RTL and gate-level

    designs

    4. Merging, ranking and reporting of code coverage for tracking verification progress

    5. Sign-off support for popular ASIC libraries

    6. All ModelSim products are 100% standards based. This means your investment is

    protected, risk is lowered, reuse is enabled, and productivity is enhanced

    7. Award-winning technical support

    High-Performance, Scalable Simulation Environment:

    ModelSim provides seamless, scalable performance and capabilities. Through the use of a

    single compiler and library system for all ModelSim configurations, employing the right


    ModelSim configuration for project needs is as simple as pointing your environment to the

    appropriate installation directory.

ModelSim also supports very fast time-to-next-simulation turnarounds while maintaining high performance with its black box use model, known as bbox. With bbox, non-changing elements can be compiled and optimized once and reused when running a modified version of the test bench. Bbox delivers dramatic throughput improvements of up to 3X when running a large

    suite of test cases.

    Easy-to-Use Simulation Environment:

    An intelligently engineered graphical user interface (GUI) efficiently displays design data

    for analysis and debug. The default configuration of windows and information is designed to

    meet the needs of most users. However, the flexibility of the ModelSim SE GUI allows users to

    easily customize it to their preferences. The result is a feature-rich GUI that is easy to use and

    quickly mastered.

    A message viewer enables simulation messages to be logged to the ModelSim results file

in addition to the standard transcript file. The GUI's organizational and filtering capabilities

    allow design and simulation information to be quickly reduced to focus on areas of interest, such

    as possible causes of design bugs.

    ModelSim SE allows many debug and analysis capabilities to be employed post-

    simulation on saved results, as well as during live simulation runs. For example, the coverage

    viewer analyzes and annotates source code with code coverage results, including FSM state and

    transition, statement, expression, branch, and toggle coverage. Signal values can be annotated in

    the source window and viewed in the waveform viewer. Race conditions, delta, and event activity

    can be analyzed in the list and wave windows. User-defined enumeration values can be easily

    defined for quicker understanding of simulation results. For improved debug productivity,

    ModelSim also has graphical and textual dataflow capabilities. The memory window identifies

    memories in the design and accommodates flexible viewing and modification of the memory

    contents. Powerful search, fill, load, and save functionalities are supported. The memory window

    allows memories to be pre-loaded with specific or randomly generated values, saving the time-

    consuming step of initializing sections of the simulation merely to load memories. All functions

    are available via the command line, so they can be used in scripting.
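As a complementary, hedged illustration (the file name and array size are invented), a memory can also be pre-loaded directly from the HDL with the standard Verilog $readmemh system task, independent of the GUI:

    // Hypothetical fragment: initialize a memory array from a hex file
    // instead of simulating the writes that would otherwise fill it.
    module preload_example;
      reg [15:0] mem [0:255];

      initial begin
        $readmemh("init_data.hex", mem);  // file name is illustrative only
      end
    endmodule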


    Advanced Code Coverage

    The ModelSim advanced code coverage capabilities deliver high performance with ease

    of use. Most simulation optimizations remain enabled with code coverage. Code coverage

    metrics can be reported by-instance or by-design unit, providing flexibility in managing coverage

    data. All coverage information is now stored in the Unified Coverage Data Base (UCDB), which

    is used to collect and manage all coverage information in one highly efficient database. Coverage

    utilities that analyze code coverage data, such as merging and test ranking, are available.

The coverage types supported include the following (a brief annotated example follows this list):

    1. Statement coverage: number of statements executed during a run

    2. Branch coverage: expressions and case statements that affect the control flow of the HDL

    execution

    3. Condition coverage: breaks down the condition on a branch into elements that make the

    result true or false

    4. Expression coverage: the same as condition coverage, but covers concurrent signal

    assignments instead of branch decisions

    5. Focused expression coverage: presents expression coverage data in a manner that

    accounts for each independent input to the expression in determining coverage results

    6. Enhanced toggle coverage: in default mode, counts low-to-high and high-to-low

    transitions; in extended mode, counts transitions to and from X

    7. Finite State Machine coverage: state and state transition coverage
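As a hedged illustration of what these metrics measure (the fragment below is invented purely for this purpose), consider the following Verilog code: the if/else chain contributes statement and branch coverage, the two-term condition contributes condition coverage, the concurrent assignment contributes expression coverage, and the signal transitions contribute toggle coverage:

    // Invented example used only to relate the coverage metrics above to code.
    module coverage_example (
      input  wire       clk, rst, en, sel,
      input  wire [3:0] din,
      output reg  [3:0] dout,
      output wire       flag
    );
      // Expression coverage: concurrent assignment with two inputs.
      assign flag = en & sel;

      always @(posedge clk) begin
        if (rst)
          dout <= 4'd0;            // branch coverage: "true" branch
        else if (en && sel)        // condition coverage: en and sel tracked separately
          dout <= din;
        else
          dout <= dout;            // final "else" branch
      end
      // Toggle coverage counts 0->1 and 1->0 transitions on din, dout, flag, etc.
    endmodule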

    SYNTHESIS TOOL:

    4.4.2 XILINX ISE

    4.4.2.1 Introduction

    For two-and-a-half decades, Xilinx has been at the forefront of the programmable logic

    revolution, with the invention and continued migration of FPGA platform technology. During

    that time, the role of the FPGA has evolved from a vehicle for prototyping and glue-logic to a

    highly flexible alternative to ASICs and ASSPs for a host of applications and markets. Today,

    Xilinx FPGAs have become strategically essential to world-class system companies that are

    hoping to survive and compete in these times of extreme global economic instability, turning


    what was once the programmable revolution into the programmable imperative for both Xilinx

and its customers.

    Programmable Imperative

    When viewed from the customer's perspective, the programmable imperative is the

    necessity to do more with less, to remove risk wherever possible, and to differentiate in order to

    survive. In essence, it is the quest to simultaneously satisfy the conflicting demands created by

    ever-evolving product requirements (i.e., cost, power, performance, and density) and mounting

    business challenges (i.e., shrinking market windows, fickle market demands, capped engineering

    budgets, escalating ASIC and ASSP non-recurring engineering costs, spiraling complexity, and

    increased risk). To Xilinx, the programmable imperative represents a two-fold commitment. The

    first is to continue developing programmable silicon innovations at every process node that

    deliver industry-leading value for every key figure of merit against which FPGAs are measured:

    price, power, performance, density, features, and programmability. The second commitment is to

    provide customers with simpler, smarter, and more strategically viable design platforms for the

creation of world-class FPGA-based solutions in a wide variety of industries, what Xilinx calls

    targeted design platforms.

    Base Platform

    The base platform is both the delivery vehicle for all new silicon offerings from Xilinx

    and the foundation upon which all Xilinx targeted design platforms are built. As such, it is the

    most fundamental platform used to develop and run customer-specific software applications and

    hardware designs as production system solutions. Released at launch, the base platform

    comprises a robust set of well-integrated, tested, and targeted elements that enable customers to

    immediately start a design. These elements include:

    1. FPGA silicon

    2. ISE Design Suite design environment

    3. Third-party synthesis, simulation, and signal integrity tools

    4. Reference designs common to many applications, such as memory interface and

    configuration designs.

    5. Development boards that run the reference designs

6. A host of widely used IP, such as Gigabit Ethernet, memory controllers, and PCIe.


    Domain-Specific Platform

    The next layer in the targeted design platform hierarchy is the domain-specific platform.

    Released from three to six months after the base platform, each domain specific platform targets

one of the three primary Xilinx FPGA user profiles (domains): the embedded processing

    developer, the digital signal processing (DSP) developer, or the logic/connectivity developer.

    This is where the real power and intent of the targeted design platform begins to emerge.

    Domain-specific platforms augment the base platform with a predictable, reliable, and

    intelligently targeted set of integrated technologies, including:

    1. Higher-level design methodologies and tools

    2. Domain-specific embedded, DSP, and connectivity IP

    3. Domain-specific development hardware and daughter cards

    4. Reference designs optimized for embedded processing, connectivity, and DSP

    5. Operating systems (required for embedded processing) and software

    Every element in these platforms is tested, targeted, and supported by Xilinx and/or our

    ecosystem partners. Starting a design with the appropriate domain-specific platform can cut

    weeks, if not months, off of the user's development time.

    Market-Specific Platform

    A market-specific platform is an integrated combination of technologies that enables

    software or hardware developers to quickly build and then run their specific application or

    solution. Built for use in specific markets such as Automotive, Consumer, Mil/Aero,

    Communications, AVB, or ISM, market-specific platforms integrate both the base and domain-

    specific platforms and provide higher level elements that can be leveraged by customer-specific

    software and hardware designs. The market-specific platform can rely more heavily on third-

    party targeted IP than the base or domain-specific platforms. The market-specific platform

    includes: the base and domain-specific platforms, reference designs, and boards (or daughter

    cards) to run reference designs that are optimized for a particular market (e.g., lane departure

early-warning systems, analytics, and display processing). Xilinx will begin releasing market-

    specific platforms three to six months after the domain-specific platforms, augmenting the

    domain-specific platforms with reference designs, IP, and software aimed at key growth markets.

    Initially, Xilinx will target markets such as Communications, Automotive, Video, and Displays


    with platform elements that abstract away the more mundane portions of the design, thereby

    further reducing the customer's development effort so they can focus their attention on creating

    differentiated value in their end solution. This systematic platform development and release

    strategy provides the framework for the consistent and efficient fulfillment of the programmable

imperative, both by Xilinx and by its customers.

    Platform Enablers

    Xilinx has instituted a number of changes and enhancements that have contributed

    substantially to the feasibility and viability of the targeted design platform. These platform-

    enabling changes cover six primary areas:

    1. Design environment enhancements

2. Socketable IP creation

    3. New targeted reference designs

    4. Scalable unified board and kit strategy

    5. Ecosystem expansion

    6. Design services supporting the targeted design platform approach

    Design Environment Enhancements

    With the breadth of advances and capabilities that the Virtex-6 and Spartan-6

programmable devices deliver, coupled with the access provided by the associated targeted design

    platforms, it is no longer feasible for one design flow or environment to fit every designer's

    needs. System designers, algorithm designers, SW coders, and logic designers each represent a

    different user-profile, with unique requirements for a design methodology and associated design

    environment. Instead of addressing the problem in terms of individual fixed tools, Xilinx targets

    the required or preferred methodology for each user, to address their specific needs with the

    appropriate design flow. At this level, the design language changes from HDL (VHDL/Verilog)

to C, C++, MATLAB software, and other higher-level languages, which are more widely used

    by these designers, and the design abstraction moves up from the block or component to the

    system level. The result is a methodology and complete design flow tailored to each user profile

    that provides design creation, design implementation, and design verification. Indicative of the

    complexity of the problem, to fully understand the user profile of a logic designer, one must

    consider the various levels of expertise represented by this demographic. The most basic category


    in this profile is the push-button user who wants to complete a design with minimum work or

    knowledge.

    The push-button user just needs good-enough results. Contrastingly, more advanced

    users want some level of interactive capabilities to squeeze more value into their design, and the

    power user (the expert) wants full control over a vast array of variables. Add the traditional

    ASIC designers, tasked with migrating their designs to an FPGA (a growing trend, given the

    intolerable costs and risks posed by ASIC development these days), and clearly the imperative

    facing Xilinx is to offer targeted flows and tools that support each user's requirements and

    capabilities, on their terms. The most recent release of the ISE Design Suite includes numerous

    changes that fulfill requirements specifically pertinent to the targeted design platform. The new

    release features a complete tool chain for each top-level user profile (the domain-specific

    personas: the embedded, DSP, and logic/connectivity designers), including specific

    accommodations for everyone from the push-button user to the ASIC designer.

    The tighter integration of embedded and DSP flows enables more seamless integration of

    designs that contain embedded, DSP, IP, and user blocks in one system. To further enhance

    productivity and help customers better manage the complexity of their designs, the new ISE

    Design Suite enables designers to target area, performance, or power by simply selecting a design

    goal in the setup. The tools then apply specific optimizations to help meet the design goal. In

    addition, the ISE Design Suite boasts substantially faster place-and-route and simulation run

    times, providing users with 2X faster compile times. Finally, Xilinx has adopted the FLEXnet

    Licensing strategy that provides a floating license to track and monitor usage.

    4.4.2.2 XILINX ISE Design Tools:

Xilinx ISE is the design tool provided by Xilinx; comparable tools from other vendors would be virtually identical for our purposes.

There are four fundamental steps in all digital logic design. These consist of:

1. Design - the schematic or code that describes the circuit.

2. Synthesis - the intermediate conversion of the human-readable circuit description to FPGA code (EDIF) format. It involves syntax checking and combining all the separate design files into a single file.


3. Place & Route - where the layout of the circuit is finalized. This is the translation of the EDIF into logic gates on the FPGA.

4. Program - the FPGA is updated to reflect the design through the use of programming (.bit) files. Test bench simulation is part of the second step. As its name implies, it is used for testing the design by simulating the result of driving the inputs and observing the outputs to verify the design. ISE has the capabilit

