    ERROR DETECTION AND DATA RECOVERY ARCHITECTURE FOR MOTION ESTIMATION


    CHAPTER 1 INTRODUCTION

    1.1 INTRODUCTION

    Advances in semiconductors, digital signal processing, and communication technologies have made multimedia applications more flexible and reliable. A good example is the H.264 video standard, also known as MPEG-4 Part 10 Advanced Video Coding, which is widely regarded as the next-generation video compression standard. Video compression is necessary in a wide range of applications to reduce the total amount of data required for transmitting or storing video. Among the components of a coding system, motion estimation (ME) is of priority concern because it exploits the temporal redundancy between successive frames, yet it is also the most time-consuming aspect of coding. Performing 60% to 90% of the computations encountered in the entire coding system, the ME is widely regarded as the most computationally intensive part of a video coding system.

    A ME generally consists of processing elements (PEs) with a size of 4 x 4. However, accelerating the computation speed depends on a large PE array, especially in high-resolution devices with a large search range such as HDTV. Additionally, the visual quality and peak signal-to-noise ratio (PSNR) at a given bit rate are affected if an error occurs in the ME process. A testable design is thus increasingly important to ensure the reliability of the numerous PEs in a ME. Moreover, although advances in VLSI technology facilitate the integration of a large number of PEs of a ME into a chip, the logic-per-pin ratio subsequently increases, significantly decreasing the efficiency of logic testing on the chip. For a commercial chip, it is therefore necessary for the ME to incorporate design for testability (DFT).

    DFT focuses on increasing the ease of device testing, thus guaranteeing high reliability of a system. DFT methods rely on reconfiguration of a circuit under test (CUT) to improve testability. While DFT approaches enhance the testability of circuits, advances in sub-micron technology and the resulting increases in the complexity of electronic circuits and systems have meant that built-in self-test (BIST) schemes have rapidly become necessary in the digital world. BIST for the ME does not require expensive test equipment, ultimately lowering test costs. Moreover, BIST can generate test stimuli and analyze test responses without outside support, subsequently streamlining the testing and diagnosis of digital systems. However, the increasing density of circuitry requires that the built-in testing approach not only detect faults but also specify their locations for error correction. Thus, extended schemes of BIST, referred to as built-in self-diagnosis and built-in self-correction, have been developed recently.

    While the extended BIST schemes generally focus on memory circuits, testing-related issues of video coding have seldom been addressed. Thus, exploring the feasibility of an embedded testing approach to detect errors and recover data in a ME is of worthwhile interest. Additionally, the reliability of the numerous PEs in a ME can be improved by enhancing the capability of concurrent error detection (CED). The CED approach can detect errors through conflicting and undesired results generated from operations on the same operands. CED can also test the circuit at full operating speed without interrupting the system. Thus, based on the CED concept, this work develops a novel error detection and data recovery (EDDR) architecture, based on the residue-and-quotient (RQ) code, to detect errors and recover data in the PEs of a ME and, in doing so, to guarantee excellent reliability for video coding testing applications.

    1.2 OVERVIEW

    Video compression is the field in electrical engineering and computer science that deals with the representation of video data, for storage and/or transmission, for both analog and digital video. Although video coding is often considered to be only for natural video, it can also be applied to synthetic (computer-generated) video, i.e. graphics. Many representations take advantage of features of the Human Visual System to achieve an efficient representation. The biggest challenge is to reduce the size of the video data using video compression; for this reason the terms video coding and video compression are often used interchangeably. The search for efficient video compression techniques has dominated much of the research activity in video coding since the early 1980s. The major milestone was H.261, from which JPEG adopted the idea of using the DCT; since then, many other advancements have been made to algorithms such as motion estimation. Since approximately 2000 the focus has been more on metadata and video search, resulting in MPEG-7 and MPEG-21.

    1.2.1 Video Compression

    The main problem with uncompressed (raw) video is that it contains an immense amount of data, while communication and storage capacities are limited and expensive. For example, consider an HDTV video signal with 1280 x 720 pixels/frame and progressive scanning at 60 frames/sec; the transmitter must be able to send


    (1280 x 720 pixels/frame) x (60 frames/sec) x (3 colours/pixel) x (8 bits/colour) = 1.3 Gb/sec

    But the available HDTV channel bandwidth is only around 20 Mb/s, i.e., compression by a factor of roughly 70 is required. Likewise, a DVD (Digital Versatile Disc) can store only a few seconds of raw video at this frame rate and television-quality resolution.
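    As a quick sanity check on these figures, the raw bit rate and the required compression factor can be computed directly. The short Python sketch below simply reproduces the arithmetic above; the 20 Mb/s channel figure is the one quoted in the text.

        # Raw (uncompressed) bit rate for 720p HDTV: 1280 x 720 pixels/frame,
        # 60 frames/sec, 3 colour components per pixel, 8 bits per component.
        pixels_per_frame = 1280 * 720
        raw_bits_per_sec = pixels_per_frame * 60 * 3 * 8
        print(raw_bits_per_sec / 1e9)      # ~1.33 Gb/sec
        print(raw_bits_per_sec / 20e6)     # compression factor ~66 for a 20 Mb/s channel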

    1.2.2 Need for Compression

    The following statement (or something similar) has been made many times over the 20-year history of image and video compression: "Video compression will become redundant very soon, once transmission and storage capacities have increased to a sufficient level to cope with uncompressed video." It is true that both storage and transmission capacities continue to increase. However, an efficient and well-designed video compression system gives very significant performance advantages for visual communications at both low and high transmission bandwidths. At low bandwidths, compression enables applications that would not otherwise be possible, such as basic-quality video telephony over a standard telephone connection. At high bandwidths, compression can support a much higher visual quality. For example, a 4.7 Gbyte DVD can store approximately 2 hours of uncompressed QCIF video (at 15 frames per second) or 2 hours of compressed ITU-R 601 video (at 30 frames per second). Most users would prefer to see television-quality video with smooth motion rather than postage-stamp video with jerky motion. Video CODECs will therefore remain an important part of the emerging multimedia industry for the foreseeable future, allowing designers to make the most efficient use of available transmission or storage capacity. In this chapter we introduce the basic components of an image or video compression system. We then describe the main functional blocks of an image encoder/decoder (CODEC) and a video CODEC.

    1.2.3 Achieving Compression

    Video compression can be achieved by exploiting the similarities or redundancies, as well as the irrelevancy, that exist in a typical video signal. The redundancy in a video signal is based on two principles. The first is the spatial redundancy that exists within each frame. The second is the redundancy between corresponding (successive) frames; this is called temporal redundancy, and it can be eliminated by using a motion estimation and compensation procedure. The identification and reduction of redundancy in a video signal is relatively straightforward; reducing irrelevancy is harder, because deciding what is perceptually relevant and what is not is very difficult. This operation can be done by using appropriate models of the Human Visual System.

    In video, successive frames may contain the same objects (still or moving). In inter-frame coding, motion estimation and compensation have become powerful techniques to eliminate the temporal redundancy due to the high correlation between consecutive frames. In video scenes, motion can be a complex combination of translation and rotation. Such motion is difficult to estimate and may require a large amount of processing. However, translational motion is easily estimated and has been used successfully for motion-compensated coding.

    Different search algorithms are used to estimate motion between frames. When motion estimation is performed by an MPEG-2 encoder, it groups pixels into 16 x 16 macroblocks. MPEG-4 AVC encoders can divide these macroblocks into smaller partitions, down to 4 x 4, and even of variable size within the same macroblock. Partitions allow for more accuracy in motion estimation because areas with high motion can be isolated from those with less movement.


    CHAPTER 2 LITERATURE REVIEW AND PROBLEM IDENTIFICATION

    2.1 LITERATURE REVIEW

    Researchers have proposed many algorithms for motion estimation. Generally, motion estimation search methods are divided into two types:

    1) Pixel-based motion estimation

    2) Block-based motion estimation

    The pixel-based motion estimation approach seeks to determine a motion vector for every pixel in the image. This is also referred to as the optical flow method, which works on the fundamental assumption of brightness constancy, that is, the intensity of a pixel remains constant when it is displaced. However, no unique match for a pixel in the reference frame is found in the direction normal to the intensity gradient. For this reason an additional constraint is introduced in terms of the smoothness of the displacement vector in the neighborhood. The smoothness constraint makes the algorithm iterative and requires excessively large computation time, making it unsuitable for practical and real-time implementation.

    An alternative and faster approach is block-based motion estimation. In this method, the candidate frame is divided into non-overlapping blocks (of size 16 x 16, 8 x 8, or even 4 x 4 pixels in the recent standards) and, for each such candidate block, the best motion vector is determined in the reference frame. Here, a single motion vector is computed for the entire block, whereby we make the inherent assumption that the entire block undergoes translational motion. This assumption is reasonably valid except at object boundaries, and a smaller block size leads to better motion compensation and motion estimation.

    Block-based motion estimation has been adopted in all the video coding standards proposed to date. It is easy to implement in hardware, and real-time motion estimation and prediction are possible.

    Many studies in the literature use different block matching motion estimation algorithms. Among these, the full search gives the minimum error compared with all other block matching algorithms; it is the basic search, but it has the maximum computational complexity. The literature therefore offers many faster block matching algorithms. In block matching motion estimation, two main factors must be kept in mind. The first is the type of search pattern; this is the most important one because, when an object moves, the search pattern must follow it so that the minimum number of search points is needed for block matching. The second is the matching criterion, such as the mean absolute difference: when the search pattern closely tracks the object, the matching error is minimized.

    The block size is one of the important parameters in a block matching algorithm. A smaller block size achieves better prediction quality. This is due to a number of reasons. A smaller block size reduces the effect of the accuracy problem; in other words, with a smaller block size, there is less possibility that the block will contain different objects moving in different directions.

    In the literature review, different search algorithms were studied. In all of them, reducing the error increases the number of search points, and reducing the number of search points increases the error. Based on these observations, a fast block matching algorithm is proposed that gives good search accuracy with minimal error. Finally, the performance of the proposed algorithm is evaluated in terms of completeness and correctness.

    2.1.1 Motion Estimation

    A video sequence can be considered a discretized three-dimensional projection of the real four-dimensional continuous space-time. The objects in the real world may move, rotate, or deform. The movements cannot be observed directly; only the light reflected from the object surfaces and projected onto an image can. The light source can be moving, and the reflected light varies depending on the angle between a surface and the light source. There may be objects occluding the light rays and casting shadows. The objects may be transparent (so that several independent motions could be observed at the same location of an image) or there might be fog, rain, or snow blurring the observed image. The discretization introduces noise into the video sequence, from which the video encoder makes its motion estimates. There may also be noise in the image capture device (such as a video camera) or in the electrical transmission lines. A perfect motion model would take all these factors into account and find the motion that has the maximum likelihood given the observed video sequence. The difference between the current frame and the reference frame can be observed in Figure 2.1.


    Changes between frames are mainly due to the movement of objects. Using a model of

    the motion of objects between frames, the encoder estimates the motion that occurred between

    the reference frame and the current frame. This process is called motion estimation (ME). The

    encoder then uses this motion model and information to move the contents of the reference frame

    to provide a better prediction of the current frame. This process is known as motion

    compensation (MC), and the prediction so produced is called the motion-compensated prediction

    (MCP) or the displaced frame (DF). In this case, the coded prediction error signal is called the displaced-frame difference (DFD). A block diagram of a motion-compensated coding system is illustrated in Figure 2.2; this is the most commonly used inter-frame coding method.

    Figure 2.1 Motion estimation detector

    The reference frame employed for ME can occur temporally before or after the current

    frame. The two cases are known as forward prediction and backward prediction, respectively.

    The prediction can be observed in figure 2.3. In bidirectional prediction, however, two reference

    frames (one each for forward and backward prediction) are employed and the two predictions are

    interpolated (the resulting predicted frame is called B-frame). The most commonly used ME

    method is the block matching motion estimation (BMME) algorithm.


    Figure 2.2 Motion compensated video coding

    Figure 2.3 Predictive sources coding with motion compensation

    2.1.2 Motion Estimation Procedure

    After completion of motion estimation, the residual picture and the motion vectors are obtained. This procedure is executed for each block (16x16, 8x8, or 4x4) in the current frame.


    1. For each block in the current frame, a search area is defined in the reference frame. The search area is typically sized at 2 to 3 times the macroblock size (16x16). Using the fact that the motion between consecutive frames is statistically small, the search range is confined to this area. After the search process, a best match will be found within the area. The best match usually means having the lowest energy in the residual formed by subtracting the candidate block in the search region from the current block in the current frame. The process of finding the best match block by block is called block-based motion estimation.

    2. After finding the best match, the motion vectors and residues between the current block and the reference block are computed. The process of obtaining the residues and motion vectors is known as motion compensation.

    3. The residues and motion vectors of the best match are encoded by the transform unit and entropy unit and transmitted to the decoder side.

    4. At the decoder side, the process is reversed to reconstruct the original picture.

    Figure 2.4 shows an illustration of the above procedure. In modern video coding standards, the reference frame can be a previous frame, a future frame, or a combination of two or more previously coded frames. The number of reference frames needed depends on the required accuracy: the more reference frames referenced by the current block, the more accurate the prediction is.

    2.1.3 Motion Vectors

    To find the motion of each block, a motion vector is defined as the relative displacement between the current candidate block and the best matching block within the search window in the reference frame. It is a directional pair representing the displacement in the horizontal (x-axis) and vertical (y-axis) directions. The maximum value of the motion vector is determined by the search range: the larger the search range, the more bits are needed to code the motion vector. Designers need to make tradeoffs between these two conflicting parameters. The motion vector is illustrated in Figure 2.4.
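    As an illustration of this tradeoff, a simple fixed-length estimate of the motion-vector cost can be derived from the search range. The Python sketch below is only a rough model: it ignores the predictive and entropy coding of motion vectors used by real CODECs.

        import math

        def mv_bits_fixed_length(search_range):
            # Each motion-vector component can take 2*search_range + 1 integer values,
            # so a fixed-length code needs ceil(log2(2*search_range + 1)) bits per component.
            bits_per_component = math.ceil(math.log2(2 * search_range + 1))
            return 2 * bits_per_component   # horizontal + vertical components

        print(mv_bits_fixed_length(7))      # +/-7 search range  -> 8 bits per motion vector
        print(mv_bits_fixed_length(63))     # +/-63 search range -> 14 bits per motion vector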


    Figure 2.4 Motion Estimation and Motion Vector

    A motion vector is produced for each macroblock in the frame; MPEG-1 and MPEG-2 employ this property. With the introduction of variable block size motion estimation in MPEG-4 and H.264/AVC, one macroblock can produce more than one motion vector due to the existence of different kinds of sub-blocks. In H.264, 41 motion vectors can be produced for one macroblock, and they are passed to rate-distortion optimization to choose the best combination. This is known as mode selection.

    2.1.4 Prediction Of Video CODEC

    A video signal consists of a sequence of individual frames. Each frame may be compressed individually using an image CODEC: this is described as intra-frame coding, where each frame is intra-coded without any reference to other frames. However, better compression performance may be achieved by exploiting the temporal redundancy in a video sequence (the similarities between successive video frames). This may be achieved by adding a front end to the image CODEC, with two main functions:


    These are: (1) prediction and (2) compensation.

    Prediction: a prediction of the current frame is formed based on one or more previously transmitted frames.

    Compensation: the prediction is subtracted from the current frame to produce a residual frame.

    The residual frame is then processed using an image CODEC. The key to this approach

    is the prediction function: if the prediction is accurate, the residual frame will contain little data

    and will hence be compressed to a very small size by the image CODEC. In order to decode the

    frame, the decoder must reverse the compensation process, adding the prediction to the decoded

    residual frame (reconstruction). This is inter frame coding: frames are coded based on some

    relationship with other video frames, i.e. coding exploits the interdependencies of video frames.

    Figure 2.5 (a) Current Frame Figure 2.5(b) Previous Frame


    Figure 2.5(c) Residual frame

    2.1.5 Frame Differencing

    The simplest predictor is just the previously transmitted frame. Figure 2.5(c) above shows the

    residual frame produced by subtracting the previous frame from the current frame in a video

    sequence. Mid-grey areas of the residual frame contain zero data: light and dark areas indicate

    positive and negative residual data respectively. It is clear that much of the residual data is zero:

    hence, compression efficiency can be improved by compressing the residual frame rather than the

    current frame.
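    A minimal sketch of frame differencing is shown below (using NumPy, with hypothetical frame arrays). It simply forms the residual against the previous frame and shows how a decoder would add the residual back to its prediction.

        import numpy as np

        def residual(cur_frame, prev_frame):
            # Frame differencing: the predictor is simply the previous frame,
            # so the residual is the pixel-wise (signed) difference.
            return cur_frame.astype(np.int16) - prev_frame.astype(np.int16)

        def reconstruct(prev_frame, decoded_residual):
            # The decoder adds the (decoded) residual back to its prediction.
            out = prev_frame.astype(np.int16) + decoded_residual
            return np.clip(out, 0, 255).astype(np.uint8)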

    Encoder input      Encoder prediction   Encoder output /      Decoder prediction   Decoder output
                                            decoder input
    Original frame 1   Zero                 Compressed frame 1    Zero                 Decoded frame 1
    Original frame 2   Original frame 1     Compressed            Decoded frame 1      Decoded frame 2
                                            residual frame 2
    Original frame 3   Original frame 2     Compressed            Decoded frame 2      Decoded frame 3
                                            residual frame 3

    Table 2.1 Prediction drift


    Figure 2.6 Encoder with Decoding Loop

    The decoder faces a potential problem that can be illustrated as follows. Table 2.1

    shows the sequence of operations required to encode and decode a series of video frames

    using frame differencing. For the first frame the encoder and decoder use no prediction. The

    problem starts with frame 2: the encoder uses the original frame 1 as a prediction and encodes

    the resulting residual. However, the decoder only has the decoded frame 1 available to form

    the prediction. Because the coding process is lossy, there is a difference between the decoded

    and original frame 1 which leads to a small error in the prediction of frame 2 at the decoder.

    This error will build up with each successive frame and the encoder and decoder predictors

    will rapidly drift apart, leading to a significant drop in decoded quality. The solution to this

    problem is for the encoder to use a decoded frame to form the prediction. Hence the encoder

    in the above example decodes (or reconstructs) frame 1 to form a prediction for frame 2. The

    encoder and decoder use the same prediction and drift should be reduced or removed. Figure

    2.6 shows the complete encoder which now includes a decoding loop in order to reconstruct

    its prediction reference. The reconstructed (or reference) frame is stored in the encoder and

    in the decoder to form the prediction for the next coded frame.
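    The drift effect described above can be illustrated with a small behavioural sketch. The quantize() function below is only a crude stand-in for the lossy transform and quantization stage, and the frames are assumed to be NumPy arrays.

        import numpy as np

        def quantize(residual, step=8.0):
            # Crude stand-in for the lossy transform/quantization/inverse stage.
            return np.round(residual / step) * step

        def encode_decode(frames, use_decoded_reference=True):
            # If the encoder predicts from the original previous frame while the
            # decoder can only predict from its decoded previous frame, the two
            # drift apart; closing the loop with the decoded frame avoids this.
            decoded_prev = np.zeros_like(frames[0], dtype=float)
            original_prev = np.zeros_like(frames[0], dtype=float)
            decoded = []
            for frame in frames:
                reference = decoded_prev if use_decoded_reference else original_prev
                res = quantize(frame - reference)   # what the encoder transmits
                dec = decoded_prev + res            # what the decoder reconstructs
                decoded.append(dec)
                decoded_prev, original_prev = dec, frame.astype(float)
            return decoded

    Comparing the decoded frames against the originals for the two settings shows the error growing frame by frame in the open-loop case and staying bounded when the encoder uses its own decoded frame as the prediction reference.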

    2.1.6 Motion Compensated Prediction

    Frame differencing gives better compression performance than intra-frame coding when successive frames are very similar, but it does not perform well when there is a significant change between the previous and current frames. Such changes are usually due to movement in the video scene, and a significantly better prediction can be achieved by estimating this movement and compensating for it.

    Figure 2.7 below shows a video CODEC that uses motion-compensated prediction. Two new steps are required in the encoder:

    Motion estimation: a region of the current frame (often a rectangular block of luminance samples) is compared with neighboring regions of the previously reconstructed frame. The motion estimator attempts to find the best match, i.e. the neighboring block in the reference frame that gives the smallest residual block.

    Motion compensation: the matching region or block from the reference frame (identified by the motion estimator) is subtracted from the current region or block.

    Figure 2.7 Video CODEC with Motion Estimation and Compensation


    Here the decoder carries out the same motion compensation to reconstruct the current frame. This means the encoder has to transmit the location of the best matching blocks to the decoder (typically in the form of a set of motion vectors). Figure 2.8 below shows a residual frame produced by subtracting a motion-compensated version of the previous frame from the current frame; it clearly contains less data than the residual produced by simple frame differencing. This improvement in compression does not come without a price: motion estimation can be very computationally intensive. The design of a motion estimation algorithm can have a dramatic effect on the compression performance and computational complexity of a video CODEC.

    Figure 2.8 Residual frame (MAD)

    2.1.7 Block Matching Algorithm

    Figure 2.9 illustrates the process of a block matching algorithm. In a typical block matching algorithm, each frame is divided into blocks, each of which consists of luminance and chrominance blocks. Usually, for coding efficiency, motion estimation is performed only on the luminance block. Each luminance block in the present frame is matched against candidate blocks in a search area of the reference frame. These candidate blocks are just displaced versions of the original block. The best candidate block is found and its displacement (motion vector) is recorded. In a typical inter-frame coder, the input frame is subtracted from the prediction of the reference frame. Consequently the motion vector and the resulting error can be transmitted instead of the original luminance block; thus inter-frame redundancy is removed and data compression is achieved. At the receiver end, the decoder builds the frame difference signal from the received data and adds it to the reconstructed reference frame.

    Figure 2.9 Illustration of Motion Estimation Process

    This algorithm is based on a translational model of the motion of objects between frames. It also assumes that all pixels within a block undergo the same translational movement. There are many other ME methods, but BMME is normally preferred due to its simplicity and the good compromise it offers between prediction quality and motion overhead. The assumption is not strictly valid, since we capture 3-D scenes through the camera and objects have more degrees of freedom than just the translational one. However, the assumption is still reasonable considering the practical movements of objects over one frame, and it makes the computations much simpler. There are many other approaches to motion estimation, some using the frequency or wavelet domains, and designers have scope to invent new methods, since this process does not need to be specified in coding standards. The standards need only specify how the motion vectors should be interpreted by the decoder. Block matching (BM) is the most common method of motion estimation. Typically each macroblock (16 x 16 pixels) in the new frame is compared with shifted regions of the same size from the previous decoded frame, and the shift which results in the minimum error is selected as the best motion vector for that macroblock. The motion-compensated prediction frame is then formed from all the shifted regions of the previous decoded frame.
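    A minimal full-search block matching sketch is given below (NumPy, greyscale frames assumed). It is only a behavioural illustration of the exhaustive search described above, using SAD as the matching criterion.

        import numpy as np

        def sad(block_a, block_b):
            # Sum of absolute differences between two equally sized blocks.
            return int(np.sum(np.abs(block_a.astype(int) - block_b.astype(int))))

        def full_search(cur_frame, ref_frame, top, left, block=16, search=7):
            # Compare the current macroblock with every candidate block inside a
            # +/-search window of the reference frame; return the motion vector
            # (dx, dy) of the minimum-SAD match and its cost.
            h, w = ref_frame.shape
            cur_block = cur_frame[top:top + block, left:left + block]
            best_mv, best_cost = (0, 0), None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = top + dy, left + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue    # candidate would fall outside the reference frame
                    cost = sad(cur_block, ref_frame[y:y + block, x:x + block])
                    if best_cost is None or cost < best_cost:
                        best_mv, best_cost = (dx, dy), cost
            return best_mv, best_cost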


    2.1.8 Backward Motion Estimation

    Motion estimation is generally performed as backward motion estimation: the current frame is taken as the candidate frame, and the reference frame on which the motion vectors are searched is a past frame, that is, the search is backward. Backward motion estimation leads to forward motion prediction.

    Figure 2.10 Backward motion estimation with current frame k and frame (k-1) as the reference frame

    2.1.9 Forward Motion Estimation

    It is just the opposite of backward motion estimation. Here, the search for motion vectors is carried out on a frame that appears later than the candidate frame in temporal order; in other words, the search is forward. Forward motion estimation leads to backward motion prediction. It may appear that forward motion estimation is unusual, since it requires future frames to predict the candidate frame. However, this is not unusual, since the candidate frame for which the motion vector is being sought is not necessarily the current, that is the most recent, frame. It is possible to store more than one frame and to use one of the past frames as a candidate frame, with another frame appearing later in the temporal order as the reference.


    Figure 2.11 Forward motion estimation with current frame as k and frame (k+1) as the

    reference frame

    2.1.10 Matching Criteria For Motion Estimation

    Inter-frame predictive coding is used to eliminate the large amount of temporal and spatial redundancy that exists in video sequences and thereby helps in compressing them. In conventional predictive coding the difference between the current frame and the predicted frame is coded and transmitted. The better the prediction, the smaller the error and hence the transmission bit rate. When there is motion in a sequence, a pixel on the same part of the moving object is a better prediction for the current pixel. There are a number of criteria to evaluate the goodness of a match.

    A popular matching criterion used for block-based motion estimation is:

    1. Sum of Absolute Differences (SAD)

    To implement block motion estimation, the candidate video frame is partitioned into a set of non-overlapping blocks, and the motion vector is to be determined for each such candidate block with respect to the reference. For each of these criteria, a square block of size N x N pixels is considered. The intensity value of the pixel at coordinates (n1, n2) in frame k is given by S(n1, n2, k), where 0 <= n1, n2 <= N - 1. The frame k is referred to as the candidate frame and the block of pixels so defined is the candidate block.


    2.1.10.1 Sum of Absolute Differences (SAD)

    The sum of absolute differences (SAD) also makes the error values positive, but instead of summing up the squared differences, the absolute differences are summed. The SAD measure at displacement (i, j) is defined as

    SAD(i, j) = sum over n1 = 0..N-1 and n2 = 0..N-1 of | S(n1, n2, k) - S(n1 + i, n2 + j, k - 1) |

    where frame (k - 1) is the reference frame. The SAD is evaluated using the current block and a reference block selected within the search window; the search window size differs between CODECs such as H.264, MPEG, etc. The SAD is calculated for all the possible reference blocks within the search window; the block with the minimum SAD is then selected, and a motion vector is drawn in order to denote the motion.

    2.2 PROBLEM IDENTIFICATION

    As mentioned in the earlier discussion, PEs are the essential building blocks and are connected regularly to construct a ME. Generally, PEs are surrounded by sets of adders (ADDs) and accumulators that determine how data flows through them. PEs can thus be considered part of the class of circuits called iterative logic arrays (ILAs), whose testing can be easily achieved by using the cell fault model (CFM). The CFM has received considerable interest due to the accelerated growth in the use of high-level synthesis, as well as the parallel increase in the complexity and density of integrated circuits (ICs). Using the CFM makes the tests independent of the adopted synthesis tool and vendor library. Arithmetic modules like ADDs (the primary element in a PE), due to their regularity, are designed in an extremely dense configuration.

    Moreover, a more comprehensive fault model, i.e. the stuck-at (SA) model, must be adopted to cover actual failures in the interconnect data bus between PEs. The SA fault is a well-known structural fault model, which assumes that a fault causes a line in the circuit to behave as if it were permanently at logic 0 (stuck-at 0, SA0) or logic 1 (stuck-at 1, SA1). An SA fault in a ME architecture can cause errors in computing SAD values. The magnitude of the resulting computational error e is assumed here to be equal to SAD' - SAD, where SAD' denotes the computed SAD value in the presence of SA faults.
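    A minimal sketch of this fault model is shown below. For simplicity the stuck-at fault is applied directly to one bit of the SAD result rather than to an internal adder net, which is enough to illustrate the distortion e = SAD' - SAD.

        def sad(cur, ref):
            # Fault-free SAD over a list of pixel pairs.
            return sum(abs(c - r) for c, r in zip(cur, ref))

        def inject_stuck_at(value, bit, stuck_at_one):
            # Force one output bit to a constant logic value (SA1 or SA0).
            return value | (1 << bit) if stuck_at_one else value & ~(1 << bit)

        cur = [18, 200, 77, 5]
        ref = [20, 190, 70, 9]
        fault_free = sad(cur, ref)                                      # SAD
        faulty = inject_stuck_at(fault_free, bit=3, stuck_at_one=True)  # SAD'
        e = faulty - fault_free                                         # computational error e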


    CHAPTER 3 METHODOLOGY

    An error detection and data recovery (EDDR) architecture is the solution proposed for the above-mentioned problem. The technique used for error detection and correction is RQ code generation. The proposed architecture is described below.

    3.1 PROPOSED EDDR ARCHITECTURE DESIGN

    Fig. 3.1 shows the conceptual view of the proposed EDDR scheme, which comprises two major circuit designs, i.e. the error detection circuit (EDC) and the data recovery circuit (DRC), to detect errors and recover the corresponding data in a specific CUT. The test code generator (TCG) in Fig. 3.1 utilizes the concept of the RQ code to generate the corresponding test codes for error detection and data recovery. In other words, the test codes from the TCG and the primary output from the CUT are delivered to the EDC to determine whether the CUT has errors. The DRC is in charge of recovering data from the TCG. Additionally, a selector is enabled to export either error-free data or the data-recovery results. Importantly, any array-based computing structure, such as the ME, discrete cosine transform (DCT), iterative logic array (ILA), or finite impulse response (FIR) filter, is feasible for the proposed EDDR scheme to detect errors and recover the corresponding data.

    Figure 3.1. Conceptual view of the proposed EDDR architecture.


    Figure 3.2. Testing process of a specific PEi in the proposed EDDR architecture.

    This work adopts the systolic ME as a CUT to demonstrate the feasibility of the proposed

    EDDR architecture. A ME consists of many PEs incorporated in a 1-D or 2-D array for video

    encoding applications. A PE generally consists of two ADDs (i.e. an 8-b ADD and a 12-b ADD)

    and an accumulator (ACC). Next, the 8-b ADD (a pixel has 8-b data) is used to estimate the

    addition of the current pixel (Cur_pixel) and reference pixel (Ref_pixel). Additionally, a 12-b

    ADD and an ACC are required to accumulate the results from the 8-b ADD in order to determine

    the sum of absolute difference (SAD) value for video encoding applications. Notably, some

    registers and latches may exist in ME to complete the data shift and storage. Fig. 3.2 shows an

    example of the proposed EDDR circuit design for a specific PEi of a ME. The fault model

    definition, RQCG-based TCG design, operations of error detection and data recovery, and the

    overall test strategy are described carefully as follows.
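    Before turning to those designs, the PE datapath just described can be summarised with a small behavioural sketch. This is only a software model of the 8-bit difference stage and 12-bit accumulator, assuming a 4 x 4 macroblock (16 pixel pairs); it is not the hardware design itself.

        def pe_sad(cur_pixels, ref_pixels):
            # One PE: form |Cur_pixel - Ref_pixel| for each of the 16 pixel pairs
            # and accumulate the results in a 12-bit accumulator (16 * 255 = 4080
            # fits in 12 bits, so no overflow occurs for 8-bit pixels).
            assert len(cur_pixels) == len(ref_pixels) == 16
            acc = 0
            for c, r in zip(cur_pixels, ref_pixels):
                acc = (acc + abs(c - r)) & 0xFFF
            return acc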

    3.1.1 RQ Code Generation

    Coding approaches such as the parity code, Berger code, and residue code have been considered for design applications to detect circuit errors. The residue code is a separable arithmetic code obtained by estimating a residue for the data and appending it to the data. Error detection logic for operations is typically derived using a separate residue code, making the detection logic simple and easily implemented. For instance, assume that N denotes an integer, N1 and N2 represent data words, and m refers to the modulus. A separate residue code of interest is one in which N is coded as the pair (N, |N|m). Notably, |N|m is the residue of N modulo m. However, only a bit error can be detected based on the residue code; additionally, an error cannot be recovered effectively by using the residue code alone. Therefore, this work presents a quotient code, which is derived from the residue code, to assist the residue code in detecting multiple errors and recovering from errors. The mathematical model of the RQ code is described as follows. Assume that the binary data X is an n-bit word. The RQ code of X modulo m is then expressed as R = |X|m and Q = [X/m], respectively, where [i] denotes the largest integer not exceeding i.

    According to the above RQ code expression, the corresponding circuit design of the RQCG can be realized. In order to reduce the complexity of the circuit design, the modulo operation is implemented using addition operations. Additionally, based on the concept of the residue code, the following definitions can be applied to generate the RQ code for circuit design.


    To accelerate the circuit design of the RQCG, the n-bit binary data X can generally be divided into two parts, Y1 and Y0, such that X = Y1 x 2^k + Y0. Significantly, the value of k is equal to [n/2], and Y1 and Y0 are interpreted as unsigned decimal values. If the modulus is m = 2^k - 1, then the residue code of X modulo m is given by |X|m = |Y1 + Y0|m, since 2^k mod m = 1. Notably, since the sum Y1 + Y0 can still be greater than the modulus m, the equations must be simplified further to replace the complex modulo operation with simple addition operations.

    Based on these equations, the corresponding circuit design of the RQCG is easily realized by using simple adders (ADDs). Namely, the RQ code can be generated with low complexity and little hardware cost.
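    A minimal software sketch of RQ code generation is shown below. The second function mirrors the adder-style decomposition just described (splitting X into Y1 and Y0 with m = 2^k - 1); it is only a behavioural model of the RQCG, not the actual hardware design.

        def rq_code(x, k):
            # Direct definition: R = |X|m and Q = [X/m], with m = 2^k - 1.
            m = (1 << k) - 1
            return x % m, x // m

        def rq_code_adder_style(x, k):
            # Write X = Y1*2^k + Y0. Because 2^k = m + 1,
            #   X = Y1*m + (Y1 + Y0), so R = (Y1 + Y0) mod m and
            #   Q = Y1 + (Y1 + Y0) div m, which needs only additions
            #   and a small final correction.
            m = (1 << k) - 1
            y1, y0 = x >> k, x & m
            s = y1 + y0
            return s % m, y1 + s // m

        assert rq_code(200, 4) == rq_code_adder_style(200, 4) == (5, 13)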

    3.1.2 Test Code Generation Design

    According to Fig. 3.2, the TCG is an important component of the proposed EDDR architecture. Notably, the TCG design is based on the ability of the RQCG circuit to generate corresponding test codes in order to detect errors and recover data. The specific PEi in Fig. 3.2 estimates the absolute difference between the Cur_pixel of the search area and the Ref_pixel of the current macroblock. Thus, by utilizing PEs, the SAD in a macroblock of size N x N can be evaluated as

    SAD = sum over i, j of | Xij - Yij |

    where Xij and Yij represent the luminance pixel values of Cur_pixel and Ref_pixel, respectively. Based on the residue code, the definitions above can be applied to facilitate generation of the corresponding RQ code (RT and QT) of the SAD from the TCG. Namely, the circuit design of the TCG can be easily achieved (see Fig. 3.3) by accumulating the RQ codes of the individual absolute differences.

    Fig. 3.4 shows the timing chart for a macroblock with a size of 4 x 4 in a specific PEi to demonstrate the operations of the TCG circuit. The data from Cur_pixel and Ref_pixel must first be sent to a comparator at the 1st clock in order to determine which of the two luminance pixel values is larger, so that the absolute difference can be formed by subtracting the smaller value from the larger one. At the 2nd clock the difference values are generated, and the corresponding RQ codes can be captured by the RQCG circuits when the 3rd clock is triggered; these codes are obtained by using the circuit of a subtracter (SUB). The 4th clock displays the operating results, and the modulus value is then obtained at the 5th clock. Next, the summation of the quotient values and residue values modulo m proceeds from clocks 5 to 21 through the ACC circuits. Since a 4 x 4 macroblock in a specific PEi of a ME contains 16 pixels, the corresponding RQ code (RT and QT) is exported to the EDC and DRC circuits in order to detect errors and recover data after 22 clocks. Based on the TCG circuit design shown in Fig. 3.3, the error detection and data recovery operations of a specific PEi in a ME can be achieved.

    Figure 3.3. Circuit design of the TCG.


    Figure 3.4. Timing chart of the TCG.
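    The accumulation performed by the TCG can be modelled in a few lines. The sketch below is an assumption-level behavioural model of how RT and QT are formed from the per-pixel RQ codes; by construction it satisfies m x QT + RT = SAD.

        def tcg_reference_code(cur_block, ref_block, k):
            # Accumulate the RQ codes of the per-pixel absolute differences,
            # then fold the residue overflow into the quotient so that
            # m*QT + RT equals the SAD of the whole macroblock.
            m = (1 << k) - 1
            r_acc = q_acc = 0
            for c, r in zip(cur_block, ref_block):
                d = abs(c - r)          # |Cur_pixel - Ref_pixel|
                r_acc += d % m
                q_acc += d // m
            return r_acc % m, q_acc + r_acc // m     # (RT, QT)

    For example, with k = 3 (m = 7), cur = [10, 20] and ref = [3, 5] give SAD = 22 and (RT, QT) = (1, 3), and indeed 7 x 3 + 1 = 22.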

    3.2 EDDR PROCESS

    Fig. 3.2 clearly indicates that the error detection operation in a specific PEi is achieved by using the EDC, which compares the outputs of the TCG and RQCG1 in order to determine whether errors have occurred. If RPE != RT and/or QPE != QT, then an error in the specific PEi is detected. The EDC output is then used to generate a 0/1 signal indicating that the tested PEi is error-free or faulty, respectively.

    This work presents a mathematical statement to verify the error detection operation. Based on the definition of the fault model, the SAD value is influenced if SA1 and/or SA0 errors occur in a specific PEi. In other words, the SAD value is transformed to SAD' = SAD + e if an error e occurs, where the error signal e is determined by the particular stuck-at faults present. The RQ code computed from the faulty output is then given by RPEi = |SAD + e|m and QPEi = [(SAD + e)/m].

    During data recovery, the DRC circuit plays a significant role in recovering the RQ code from the TCG. The data can be recovered by implementing the mathematical model

    SAD = m x QT + RT = (2^k - 1) x QT + RT.

    To realize this data recovery operation, a barrel shifter and a corrector circuit are necessary: the barrel shifter implements the multiplication by 2^k, and the corrector applies the remaining subtraction and addition. Notably, the proposed EDDR design executes the error detection and data recovery operations simultaneously. Additionally, either the error-free data from the tested PEi or the data-recovery result from the DRC is selected by a multiplexer (MUX) and passed to the next PE under test for subsequent testing, as shown in Fig. 3.5.
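    The detection and recovery steps can be summarised in a short behavioural sketch. The functions below are an assumption-level software model of the EDC, DRC, and MUX behaviour, not the hardware implementation; (RT, QT) is the reference code from the TCG and (RPE, QPE) is the code computed from the PE output by RQCG1.

        def eddr_check_and_recover(r_pe, q_pe, r_t, q_t, k):
            # Error detection: any mismatch between the two RQ codes flags an error.
            error = (r_pe != r_t) or (q_pe != q_t)
            # Data recovery: SAD = m*QT + RT with m = 2^k - 1, computed as
            # (QT << k) - QT + RT, i.e. a barrel shift plus a correction step.
            recovered_sad = (q_t << k) - q_t + r_t
            return error, recovered_sad

        def select_output(sad_from_pe, error, recovered_sad):
            # The MUX passes on error-free data or the recovered value.
            return recovered_sad if error else sad_from_pe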


    Figure 3.5. Proposed EDDR architecture design for a ME.


    CHAPTER 4 - IMPLEMENTATION

    4.1 INTRODUCTION TO VLSI:

    VLSI stands for "Very Large Scale Integration". It is a classification of ICs: a typical VLSI chip contains millions of active devices. Typical VLSI functions include memories, computers, and signal processors. A semiconductor process technology is a method by which working circuits can be manufactured from designed specifications. There are many such technologies, each of which creates a different environment or style of design. In integrated circuit design, the specification consists of polygons of conducting and semiconducting material that will be layered on top of each other to produce a working chip. When a chip is custom-designed for a specific use, it is called an application-specific integrated circuit (ASIC). Printed-circuit (PC) design also results in precise positions of conducting materials, as they will appear on a circuit board; in addition, PC design aggregates the bulk of the electronic activity into standard IC packages, the position and interconnection of which are essential to the final circuit. Printed circuitry may be easier to debug than integrated circuitry, but it is slower, less compact, more expensive, and unable to take advantage of the specialized silicon layout structures that make VLSI systems so attractive. The design of these electronic circuits can be carried out at many different levels of refinement, from the most detailed layout to the most abstract architectures. Given the complexity that is demanded at all levels, computers are increasingly used to aid this design at each step. It is no longer reasonable to use manual design techniques, in which each layer is hand etched or composed by laying tape on film. Thus the term computer-aided design (CAD) is an accurate description of this modern approach, and it is broader in scope than the recently popular term computer-aided engineering (CAE).

    4.1.1 Application Of VLSI:

    PLAs:

    Combinational circuit elements are an important part of any digital design. Three

    common methods of implementing a combinational block are random logic, read-only memory

    (ROM), and programmable logic array (PLA). In random-logic designs, the logic description of

    the circuit is directly translated into hardware structures such as AND and OR gates. The PLA

    occupies less area on the silicon due to reduced interconnection wire space; however, it may be


    slower than purely random logic. A PLA can also be used as a compact finite state machine by

    feeding back part of its outputs to the inputs and clocking both sides. Normally, for high-speed

    applications, the PLA is not implemented as two NOR arrays. The inputs and outputs are inverted

    to preserve the AND-OR structure.

    Gate-Arrays:

    The gate-array is a popular technique used to design IC chips. Like the PLA, it contains a fixed

    mesh of unfinished layout that must be customized to yield the final circuit. Gate-arrays are more

    powerful, however, because the contents of the mesh are less structured so the interconnection

    options are more flexible. Gate-arrays exist in many forms with many names, e.g. uncommitted

    logic arrays and master-slice. The disadvantage of gate-arrays is that they are not optimal for any

    task.

    Gate Matrices:

    The gate matrix is the next step in the evolution of automatically generated layout from high-

    level specification. Like the PLA, this layout has no fixed size; a gate matrix grows according to

    its complexity. Like all regular forms of layout, this one has its fixed aspects and its customizable

    aspects. In gate matrix layout the fixed design consists of vertical columns of polysilicon gating

    material. The customizable part is the metal and diffusion wires that run horizontally to

    interconnect and form gates with the columns.

    4.1.2 Application Areas Of VLSI

    Electronic systems now perform a wide variety of tasks in daily life. Electronic systems in

    some cases have replaced mechanisms that operated mechanically, hydraulically, or by other

    means; electronics are usually smaller, more flexible, and easier to service. In other cases

    electronic systems have created totally new applications. Electronic systems perform a variety of

    tasks, some of them visible, some more hidden:

    1. Personal entertainment systems such as portable MP3 players and DVD players perform

    sophisticated algorithms with remarkably little energy.

    2. Electronic systems in cars operate stereo systems and displays; they also control fuel

    injection systems, adjust suspensions to varying terrain, and perform the control functions

    required for anti-lock braking (ABS) systems.


    3. Digital electronics compress and decompress video, even at high definition data rates, on-

    the-fly in consumer electronics.

    4. Low-cost terminals for Web browsing still require sophisticated electronics, despite their

    dedicated function.

    5. Personal computers and workstations provide word-processing, financial analysis, and

    games. Computers include both central processing units (CPUs) and special-purpose

    hardware for disk access, faster screen display, etc.

    4.1.3 Advantages Of VLSI

    While we concentrate on integrated circuits here, the properties of integrated circuits (what we can and cannot efficiently put in an integrated circuit) largely determine the architecture of the entire system. Integrated circuits improve system characteristics in several critical ways. ICs have three key advantages over digital circuits built from discrete components:

    Size. Integrated circuits are much smaller: both transistors and wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales of discrete components. Small size leads to advantages in speed and power consumption, since smaller components have smaller parasitic resistances, capacitances, and inductances.

    Speed. Signals can be switched between logic 0 and logic 1 much more quickly within a chip than they can between chips. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. The high speed of circuits on-chip is due to their small size: smaller components and wires have smaller parasitic capacitances to slow down the signal.

    Power consumption. Logic operations within a chip also take much less power. Once again, lower power consumption is largely due to the small size of circuits on the chip: smaller parasitic capacitances and resistances require less power to drive them.

    4.2 VLSI AND SYSTEMS

    These advantages of integrated circuits translate into advantages at the system level:

    Smaller physical size. Smallness is often an advantage in itself: consider portable televisions or handheld cellular telephones.

    Lower power consumption. Replacing a handful of standard parts with a single chip reduces total power consumption. Reducing power consumption has a ripple effect on the rest of the system: a smaller, cheaper power supply can be used; since less power consumption means less heat, a fan may no longer be necessary; and a simpler cabinet with less electromagnetic shielding may be feasible, too.

    Reduced cost. Reducing the number of components, the power supply requirements, cabinet

    costs, and so on, will inevitably reduce system cost. The ripple effect of integration is such that

    the cost of a system built from custom ICs can be less, even though the individual ICs cost more

    than the standard parts they replace. Understanding why integrated circuit technology has such

    profound influence on the design of digital systems requires understanding both the technology

    of IC manufacturing and the economics of ICs and digital systems.

    4.3 INTRODUCTION TO ASICS AND PROGRAMMABLE LOGIC:

    The last 15 years have witnessed a decline in the number of cell-based ASIC designs as a means for developing customized SoCs. Rising NREs, development times, and risk have mostly restricted the use of cell-based ASICs to the highest-volume applications: applications that can withstand the multi-million-dollar development costs associated with 1-2 design re-spins. Analysts estimate that the number of cell-based ASIC design starts per year is now only between 2000 and 3000, compared to roughly 10,000 in the late 1990s. The FPGA has emerged as a technology that fills some of the gap left by cell-based ASICs. Yet even after 20+ years of existence and 40X more design starts per year than cell-based ASICs, the size of the FPGA market in dollar terms remains only a fraction of that of cell-based ASICs. This suggests that there are many FPGA designs that never make it into production and that, for the most part, the FPGA is still seen by many as a vehicle for prototyping or college education, and has perhaps even succeeded in stifling industry innovation. This chapter introduces a newer technology, the second-generation Structured ASIC, that is tipped to re-energize the path to innovation within the electronics industry. It brings together some of the key advantages of FPGA technology (i.e. fast turnaround, no mask charges, no minimum order quantity) and of cell-based ASICs (i.e. low unit cost and power) to deliver a new platform for SoC design. This document defines requirements for the development of Application Specific Integrated Circuits (ASICs). It is intended to be used as an appendix to a Statement of Work. The document complements the ESA ASIC Design and Assurance Requirements (AD1), which is a precursor to a future ESA PSS document on ASIC design.

    Moore's Law

    In the 1960s Gordon Moore predicted that the number of transistors that could be manufactured on a chip would grow exponentially. His prediction, now known as Moore's Law, was remarkably prescient. Moore's ultimate prediction was that transistor count would double every two years, an estimate that has held up remarkably well. Today, an industry group maintains the International Technology Roadmap for Semiconductors (ITRS), which maps out strategies to maintain the pace of Moore's Law. (The ITRS roadmap can be found at http://www.itrs.net.)

    4.3.1 Changing Landscape

    Structured ASICs

    A new alternative has recently emerged to address the market void between FPGAs and

    cell-based ASICs. Analysts term this as the Structured ASIC.

    First Generation Structured ASICs

    Like the FPGA market, the Structured ASIC market had a flurry of early entrants, many of whom have since departed the market. Examples include respectable semiconductor companies like NEC and LSI Logic, and EDA vendors such as Simplicity.

    First Generation Structured ASICs provided designers with considerable power and cost

    improvements over FPGAs but failed to remove many barriers to entry that existed with

    traditional cell-based ASICs. First generation Structured ASICs had the following characteristics:

    1. Turn-around times were still 2-5 months from tape-out to silicon

    2. NREs were still in the range of $150-$250K or more making the technology difficult to

    access for mainstream users.

    3. Minimum order quantities were required as wafers could not be shared amongst projects

    or customers

    4. Development costs and time were also very high and long respectively, as designers were

    expected to undergo rigorous verification down to the transistor level


    5. Designers transitioning from prototyping devices like FPGAs to first generation

    Structured ASICs were still expected to redesign the product into a completely new

device, revisit timing closure, and re-qualify the new device before it was production ready.

    While some companies still offer first generation Structured ASICs today, market

    acceptance has been severely limited as a result of these barriers to entry. However, these first

    generation Structured ASICs paved the way for a new generation that would combine the benefits

    of both FPGAs and cell-based ASICs.

    Second Generation Structured ASICs

    A new generation of Structured ASICs has emerged on the market and is gaining traction.

    This generation utilizes a single via mask for configuring the device. In doing so, it removes the

    need for the massive amounts of SRAM configuration elements and metal interconnect that

plague today's FPGAs. The benefits to designers are delivered through a device that provides up

    to 20X lower device power consumption and up to 80% lower unit cost than FPGAs, depending

on device density (larger FPGAs have more configuration elements and metal interconnect).

This new generation of Structured ASICs, available from eASIC Corporation and named Nextreme, also removes the barriers of traditional cell-based ASICs and of first generation Structured ASICs. The advantages of Nextreme Structured ASICs include:

1. Turn-around times from tape-out to silicon are only 3-4 weeks

    2. There are zero mask charges as multiple projects can be shared on a wafer

    3. There is no minimum order quantity

4. Development tool costs are low (analogous to FPGA-type tools)

    5. Development time is short as designers need not perform verification down to the

    transistor level or perform exhaustive test coverage

6. A coarse, FPGA-like architecture based on cells, which provides manufacturing yield advantages.

    There are device options for both prototyping and mass production. Designers

    transitioning from prototyping Nextreme Structured ASICs to mass production Nextreme

    Structured ASICs need not revisit timing closure or re-qualify the production device.


4.3.2 Applications for Nextreme Structured ASICs:

    Embedded Processing

Nextreme Structured ASICs are ideally suited for embedded processing applications. With the availability of a firm 150MHz ARM926EJ-S processor and AMBA peripherals, backed by industry-standard development tools from ARM and its Connected Community partners, designers have the option to implement control circuits in software. A major benefit of using

    Nextreme for implementing embedded systems is that designers are able to make performance,

area, and feature tradeoffs using both hardware and software, allowing for highly differentiated yet

    cost-optimized systems.

    Signal, Video and Image Processing

    Having to deal with programmable metal interconnect and its associated carry chain

    delays ultimately forced FPGA vendors to develop dedicated DSP blocks and slices to overcome

    performance bottlenecks. With Nextreme Structured ASICs, the elimination of massive amounts

    of metal interconnect means that these devices are not subject to unacceptable carry chain delays

and many signal processing structures can be implemented, at speed, using the logic fabric alone.

Another capability within Nextreme that makes these devices particularly suitable for signal processing is memory. eRAM blocks are well suited to distributed applications such as semi-parallel filters and video processing. As these blocks are located very close together, they can be connected to form larger blocks of up to 4 Kbits per eUnit.
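As a rough sketch only (the module name, port names, and sizes below are invented for this illustration, and the mapping onto eRAM blocks is assumed rather than taken from vendor documentation), a small synchronous memory of the kind used in semi-parallel filter designs might be described in Verilog as follows; the 256 x 16 organization matches the 4 Kbit figure mentioned above:

    // Hypothetical example: a small synchronous RAM (256 x 16 = 4 Kbits)
    // that a synthesis tool could map onto distributed embedded memory.
    module small_ram #(
      parameter ADDR_W = 8,
      parameter DATA_W = 16
    )(
      input  wire                clk,
      input  wire                we,
      input  wire [ADDR_W-1:0]   addr,
      input  wire [DATA_W-1:0]   din,
      output reg  [DATA_W-1:0]   dout
    );
      reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

      always @(posedge clk) begin
        if (we)
          mem[addr] <= din;   // synchronous write
        dout <= mem[addr];    // registered read
      end
    endmodule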

    4.3.3 Field Programmable Gate Array (FPGA)

    A field-programmable gate array (FPGA) is an integrated circuit designed to be

configured by the customer or designer after manufacturing, hence "field-programmable". The

    FPGA configuration is generally specified using a hardware description language (HDL), similar

    to that used for an application-specific integrated circuit (ASIC) (circuit diagrams were

    previously used to specify the configuration, as they were for ASICs, but this is increasingly

    rare). FPGAs can be used to implement any logical function that an ASIC could perform. The

ability to update the functionality after shipping, partial re-configuration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding

    the generally higher unit cost), offer advantages for many applications.


    FPGAs contain programmable logic components called "logic blocks", and a hierarchy of

reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-

    chip programmable breadboard. Logic blocks can be configured to perform

    complex combinational functions, or merely simple logic gates like AND and XOR. In most

    FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more

    complete blocks of memory.
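As a small, hedged illustration (the signal names are invented), the Verilog fragment below describes one combinational function and one registered output; a typical FPGA flow would map the combinational logic into a look-up table inside a logic block and the register into that block's flip-flop:

    // Illustrative only: a tiny function plus a register, of the kind that
    // maps naturally onto one FPGA logic block (LUT + flip-flop).
    module logic_block_example (
      input  wire clk,
      input  wire a, b, c,
      output reg  q
    );
      wire f = (a & b) ^ c;     // combinational function -> LUT

      always @(posedge clk)
        q <= f;                 // memory element -> flip-flop
    endmodule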

    In addition to digital functions, some FPGAs have analog features. The most common

    analog feature is programmable slew rate and drive strength on each output pin, allowing the

    engineer to set slow rates on lightly loaded pins that would otherwise ring unacceptably, and to

    set stronger, faster rates on heavily loaded pins on high-speed channels that would otherwise run

too slowly. Another relatively common analog feature is differential comparators on input pins

    designed to be connected to differential signaling channels. A few "mixed signal FPGAs" have

    integrated peripheral Analog-to-Digital Converters (ADCs) and Digital-to-Analog Converters

    (DACs) with analog signal conditioning blocks allowing them to operate as a system-on-a-

    chip.[5]

    Such devices blur the line between an FPGA, which carries digital ones and zeros on its

internal programmable interconnect fabric, and a field-programmable analog array (FPAA), which

    carries analog values on its internal programmable interconnect fabric.

4.3.3.1 Definitions of Relevant Terminology

The most important terminology used in this section is defined below.

    Field-Programmable Device (FPD)

    A general term that refers to any type of integrated circuit used for implementing digital

    hardware, where the chip can be configured by the end user to realize different designs.

    Programming of such a device often involves placing the chip into a special programming unit,

    but some chips can also be configured in-system. Another name for FPDs is programmable

    logic devices (PLDs); although PLDs encompass the same types of chips as FPDs, we prefer the

    term FPD because historically the word PLD has referred to relatively simple types of devices.

    Programmable Logic Array (PLA)

    A Programmable Logic Array (PLA) is a relatively small FPD that contains two levels of

    logic, an AND-plane and an OR-plane, where both levels are programmable (note: although PLA


    structures are sometimes embedded into full-custom chips, we refer here only to those PLAs that

    are provided as separate integrated circuits and are user-programmable).

    Programmable Array Logic (PAL)

    A Programmable Array Logic (PAL) is a relatively small FPD that has a programmable

    AND-plane followed by a fixed OR-plane.

    Simple PLD

A general term for any small FPD, usually either a PLA or a PAL; commonly abbreviated SPLD.

    Complex PLD

A larger FPD, commonly abbreviated CPLD, that consists of an arrangement of multiple SPLD-like blocks on a single chip. Alternative names (not used here) sometimes adopted for this style of chip are Enhanced PLD (EPLD), Super PAL, Mega PAL, and others.

    Field-Programmable Gate Array (FPGA)

    A Field-Programmable Gate Array is an FPD featuring a general structure that allows

    very high logic capacity. Whereas CPLDs feature logic resources with a wide number of inputs

    (AND planes), FPGAs offer more narrow logic resources. FPGAs also offer a higher ratio of flip-

    flops to logic resources than do CPLDs.

    High-Capacity PLDs (HCPLD):

A single acronym that refers to both CPLDs and FPGAs. The term has been coined in trade literature to provide an easy way to refer to both types of devices.

    PAL is a trademark of Advanced Micro Devices.

    1. Interconnect - the wiring resources in an FPD.

    2. Programmable Switch- a user-programmable switch that can connect a logic element to

    an interconnect wire, or one interconnect wire to another

    3. Logic Block- a relatively small circuit block that is replicated in an array in an FPD.

    When a circuit is implemented in an FPD, it is first decomposed into smaller sub-circuits

    that can each be mapped into a logic block. The term logic block is mostly used in the

    context of FPGAs, but it could also refer to a block of circuitry in a CPLD.


    4. Logic Capacity- the amount of digital logic that can be mapped into a single FPD. This is

    usually measured in units of equivalent number of gates in a traditional gate array. In

    other words, the capacity of an FPD is measured by the size of gate array that it is

    comparable to. In simpler terms, logic capacity can be thought of as number of 2-input

    NAND gates.

    5. Logic Density - the amount of logic per unit area in an FPD.

    6. Speed-Performance- measures the maximum operable speed of a circuit when

    implemented in an FPD. For combinational circuits, it is set by the longest delay through

    any path, and for sequential circuits it is the maximum clock frequency for which the

circuit functions properly. In the remainder of this section, the evolution of FPDs over the past two decades is described to provide insight into FPD development. Additional

    background information is also included on the semiconductor technologies used in the

    manufacture of FPDs.

    4.3.4 Evolution Of Programmable Logic Devices

    The first type of user-programmable chip that could implement logic circuits was the

    Programmable Read-Only Memory (PROM), in which address lines can be used as logic circuit

    inputs and data lines as outputs. Logic functions, however, rarely require more than a few product

terms, and a PROM contains a full decoder for its address inputs. PROMs are thus an inefficient

    architecture for realizing logic circuits, and so are rarely used in practice for that purpose. The

    first device developed later specifically for implementing logic circuits was the Field-

    Programmable Logic Array (FPLA), or simply PLA for short. A PLA consists of two levels of

    logic gates: a programmable wired AND-plane followed by a programmable wired OR-

    plane. A PLA is structured so that any of its inputs (or their complements) can be ANDed

    together in the AND-plane; each AND-plane output can thus correspond to any product term of

    the inputs. Similarly, each OR plane output can be configured to produce the logical sum of any

    of the AND-plane outputs. With this structure, PLAs are well-suited for implementing logic

    functions in sum-of-products form. They are also quite versatile, since both the AND terms and

    OR terms can have many inputs (this feature is often referred to as wide AND and OR gates).
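For illustration (the signals and functions below are invented for this sketch), a pair of sum-of-products outputs of the kind a PLA realizes directly can be written in Verilog as:

    // Illustrative sum-of-products logic: each product term corresponds to
    // one AND-plane row, and each output ORs a subset of those terms.
    module pla_style_example (
      input  wire a, b, c, d,
      output wire f, g
    );
      wire p0 = a & ~b;        // product term 1
      wire p1 = b & c & d;     // product term 2
      wire p2 = ~a & ~c;       // product term 3

      assign f = p0 | p1;      // OR-plane output 1
      assign g = p1 | p2;      // OR-plane output 2 (product term p1 is shared)
    endmodule

The sharing of product term p1 between the two outputs reflects the fact that both the AND-plane and the OR-plane of a PLA are programmable.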

When PLAs were introduced in the early 1970s by Philips, their main drawbacks were that they

    were expensive to manufacture and offered somewhat poor speed-performance.


    Both disadvantages were due to the two levels of configurable logic, because

    programmable logic planes were difficult to manufacture and introduced significant propagation

    delays. To overcome these weaknesses, Programmable Array Logic (PAL) devices were

    developed. PALs feature only a single level of programmability, consisting of a programmable

wired AND-plane that feeds fixed OR-gates. To compensate for the lack of generality incurred because the OR-plane is fixed, several variants of PALs are produced, with different

    numbers of inputs and outputs, and various sizes of OR-gates. PALs usually contain flip-flops

    connected to the OR-gate outputs so that sequential circuits can be realized.
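By contrast, a hedged sketch of a PAL-style registered output (again with invented names) ORs a fixed, small number of product terms and feeds the result into a flip-flop:

    // Illustrative PAL-style macrocell: programmable product terms, a fixed
    // OR of those terms, and a flip-flop on the output for sequential logic.
    module pal_style_example (
      input  wire clk,
      input  wire a, b, c,
      output reg  q
    );
      wire sop = (a & b) | (~a & c);   // fixed OR of two product terms

      always @(posedge clk)
        q <= sop;                      // registered output
    endmodule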

    PAL devices are important because when introduced they had a profound effect on digital

hardware design, and because they are the basis for some of the newer, more sophisticated

    architectures that will be described shortly. Variants of the basic PAL architecture are featured in

    several other products known by different acronyms. All small PLDs, including PLAs, PALs, and

    PAL-like devices are grouped into a single category called Simple PLDs (SPLDs), whose most

    important characteristics are low cost and very high pin-to-pin speed-performance. As technology

    has advanced, it has become possible to produce devices with higher capacity than SPLDs. The

    difficulty with increasing capacity of a strict SPLD architecture is that the structure of the

    programmable logic-planes grows too quickly in size as the number of inputs is increased.

    The only feasible way to provide large capacity devices based on SPLD architectures is

    then to integrate multiple SPLDs onto a single chip and provide interconnect to programmably

    connect the SPLD blocks together. Many commercial FPD products exist on the market today

    with this basic structure, and are collectively referred to as Complex PLDs (CPLDs). CPLDs

    were pioneered by Altera, first in their family of chips called Classic EPLDs, and then in three

    additional series, called MAX 5000, MAX 7000 and MAX 9000. Because of a rapidly growing

    market for large FPDs, other manufacturers developed devices in the CPLD category and there

are now many choices available. CPLDs provide logic capacity up to the equivalent of about 50 typical

    SPLD devices, but it is somewhat difficult to extend these architectures to higher densities. To

    build FPDs with very high logic capacity, a different approach is needed. The highest capacity

    general purpose logic chips available today are the traditional gate arrays sometimes referred to

    as Mask-Programmable Gate Arrays (MPGAs).


    MPGAs consist of an array of pre-fabricated transistors that can be customized into the

user's logic circuit by connecting the transistors with custom wires. Customization is performed

    during chip fabrication by specifying the metal interconnect, and this means that in order for a

    user to employ an MPGA a large setup cost is involved and manufacturing time is long. Although

    MPGAs are clearly not FPDs, they are mentioned here because they motivated the design of the

    user-programmable equivalent: Field- Programmable Gate Arrays (FPGAs). Like MPGAs,

    FPGAs comprise an array of uncommitted circuit elements, called logic blocks, and interconnect

    resources, but FPGA configuration is performed through programming by the end user. An

    illustration of a typical FPGA architecture appears in Figure .

4.4 SOFTWARE REQUIREMENTS
Verification Tool: ModelSim 6.5e
Synthesis Tool: Xilinx ISE 14.4

    4.4.1 MODELSIM

    ModelSim SE - High Performance Simulation and Debug

ModelSim SE is Mentor Graphics' UNIX, Linux, and Windows-based simulation and debug

    environment, combining high performance with the most powerful and intuitive GUI in the

    industry.

    What's New in ModelSim SE?

    1. Improved FSM debug options including control of basic information, transition table and

    warning messages. Added support of FSM Multi-state transitions coverage (i.e. coverage

    for all possible FSM state sequences).

    2. Improved debugging with hyperlinked navigation between objects and their declaration,

    and between visited source files.

    3. The dataflow window can now compute and display all paths from one net to another.

    4. Enhanced code coverage data management with fine grain control of information in the

    source window.


    5. Toggle coverage has been enhanced to support SystemVerilog types: structures, packed

    unions, fixed-size multi-dimensional arrays and real.

    6. Some IEEE VHDL 2008 features are supported including source code encryption. Added

    support of new VPI types, including packed arrays of struct nets and variables.

    ModelSim SE Features:

    1. Multi-language, high performance simulation engine

    2. Verilog, VHDL, SystemVerilog Design

    3. Code Coverage

    4. SystemVerilog for Design

    5. Integrated debug

    6. JobSpy Regression Monitor

    7. Mixed HDL simulation option

    8. System C Option

    9. TCL/tk

    10. Solaris and Linux 32 & 64-bit

    11. Windows 32-bit

    ModelSim SE Benefits:

    1. High performance HDL simulation solution for FPGA & ASIC design teams

    2. The best mixed-language environment and performance in the industry

    3. Intuitive GUI for efficient interactive or post-simulation debug of RTL and gate-level

    designs

    4. Merging, ranking and reporting of code coverage for tracking verification progress

    5. Sign-off support for popular ASIC libraries

    6. All ModelSim products are 100% standards based. This means your investment is

    protected, risk is lowered, reuse is enabled, and productivity is enhanced

    7. Award-winning technical support

    High-Performance, Scalable Simulation Environment:

    ModelSim provides seamless, scalable performance and capabilities. Through the use of a

    single compiler and library system for all ModelSim configurations, employing the right


    ModelSim configuration for project needs is as simple as pointing your environment to the

    appropriate installation directory.

ModelSim also supports very fast time-to-next-simulation turnarounds while maintaining high performance with its black box use model, known as bbox. With bbox, non-changing elements can be compiled and optimized once and reused when running a modified version of the test bench. Bbox delivers dramatic throughput improvements of up to 3X when running a large

    suite of test cases.

    Easy-to-Use Simulation Environment:

    An intelligently engineered graphical user interface (GUI) efficiently displays design data

    for analysis and debug. The default configuration of windows and information is designed to

    meet the needs of most users. However, the flexibility of the ModelSim SE GUI allows users to

    easily customize it to their preferences. The result is a feature-rich GUI that is easy to use and

    quickly mastered.

    A message viewer enables simulation messages to be logged to the ModelSim results file

in addition to the standard transcript file. The GUI's organizational and filtering capabilities

    allow design and simulation information to be quickly reduced to focus on areas of interest, such

    as possible causes of design bugs.

    ModelSim SE allows many debug and analysis capabilities to be employed post-

    simulation on saved results, as well as during live simulation runs. For example, the coverage

    viewer analyzes and annotates source code with code coverage results, including FSM state and

    transition, statement, expression, branch, and toggle coverage. Signal values can be annotated in

    the source window and viewed in the waveform viewer. Race conditions, delta, and event activity

    can be analyzed in the list and wave windows. User-defined enumeration values can be easily

    defined for quicker understanding of simulation results. For improved debug productivity,

    ModelSim also has graphical and textual dataflow capabilities. The memory window identifies

    memories in the design and accommodates flexible viewing and modification of the memory

    contents. Powerful search, fill, load, and save functionalities are supported. The memory window

    allows memories to be pre-loaded with specific or randomly generated values, saving the time-

    consuming step of initializing sections of the simulation merely to load memories. All functions

    are available via the command line, so they can be used in scripting.
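As a complementary, hedged illustration (the file name and array size are invented), a memory can also be pre-loaded directly from the HDL with the standard Verilog $readmemh system task, independent of the GUI:

    // Hypothetical fragment: initialize a memory array from a hex file
    // instead of simulating the writes that would otherwise fill it.
    module preload_example;
      reg [15:0] mem [0:255];

      initial begin
        $readmemh("init_data.hex", mem);  // file name is illustrative only
      end
    endmodule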


    Advanced Code Coverage

    The ModelSim advanced code coverage capabilities deliver high performance with ease

    of use. Most simulation optimizations remain enabled with code coverage. Code coverage

    metrics can be reported by-instance or by-design unit, providing flexibility in managing coverage

    data. All coverage information is now stored in the Unified Coverage Data Base (UCDB), which

    is used to collect and manage all coverage information in one highly efficient database. Coverage

    utilities that analyze code coverage data, such as merging and test ranking, are available.

The coverage types supported include the following (a brief annotated example follows this list):

    1. Statement coverage: number of statements executed during a run

    2. Branch coverage: expressions and case statements that affect the control flow of the HDL

    execution

    3. Condition coverage: breaks down the condition on a branch into elements that make the

    result true or false

    4. Expression coverage: the same as condition coverage, but covers concurrent signal

    assignments instead of branch decisions

    5. Focused expression coverage: presents expression coverage data in a manner that

    accounts for each independent input to the expression in determining coverage results

    6. Enhanced toggle coverage: in default mode, counts low-to-high and high-to-low

    transitions; in extended mode, counts transitions to and from X

    7. Finite State Machine coverage: state and state transition coverage
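As a hedged illustration of what these metrics measure (the fragment below is invented purely for this purpose), consider the following Verilog code: the if/else chain contributes statement and branch coverage, the two-term condition contributes condition coverage, the concurrent assignment contributes expression coverage, and the signal transitions contribute toggle coverage:

    // Invented example used only to relate the coverage metrics above to code.
    module coverage_example (
      input  wire       clk, rst, en, sel,
      input  wire [3:0] din,
      output reg  [3:0] dout,
      output wire       flag
    );
      // Expression coverage: concurrent assignment with two inputs.
      assign flag = en & sel;

      always @(posedge clk) begin
        if (rst)
          dout <= 4'd0;            // branch coverage: "true" branch
        else if (en && sel)        // condition coverage: en and sel tracked separately
          dout <= din;
        else
          dout <= dout;            // final "else" branch
      end
      // Toggle coverage counts 0->1 and 1->0 transitions on din, dout, flag, etc.
    endmodule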

    SYNTHESIS TOOL:

    4.4.2 XILINX ISE

    4.4.2.1 Introduction

    For two-and-a-half decades, Xilinx has been at the forefront of the programmable logic

    revolution, with the invention and continued migration of FPGA platform technology. During

    that time, the role of the FPGA has evolved from a vehicle for prototyping and glue-logic to a

    highly flexible alternative to ASICs and ASSPs for a host of applications and markets. Today,

    Xilinx FPGAs have become strategically essential to world-class system companies that are

    hoping to survive and compete in these times of extreme global economic instability, turning


    what was once the programmable revolution into the programmable imperative for both Xilinx

and its customers.

    Programmable Imperative

    When viewed from the customer's perspective, the programmable imperative is the

    necessity to do more with less, to remove risk wherever possible, and to differentiate in order to

    survive. In essence, it is the quest to simultaneously satisfy the conflicting demands created by

    ever-evolving product requirements (i.e., cost, power, performance, and density) and mounting

    business challenges (i.e., shrinking market windows, fickle market demands, capped engineering

    budgets, escalating ASIC and ASSP non-recurring engineering costs, spiraling complexity, and

    increased risk). To Xilinx, the programmable imperative represents a two-fold commitment. The

    first is to continue developing programmable silicon innovations at every process node that

    deliver industry-leading value for every key figure of merit against which FPGAs are measured:

    price, power, performance, density, features, and programmability. The second commitment is to

    provide customers with simpler, smarter, and more strategically viable design platforms for the

creation of world-class FPGA-based solutions in a wide variety of industries, what Xilinx calls

    targeted design platforms.

    Base Platform

    The base platform is both the delivery vehicle for all new silicon offerings from Xilinx

    and the foundation upon which all Xilinx targeted design platforms are built. As such, it is the

    most fundamental platform used to develop and run customer-specific software applications and

    hardware designs as production system solutions. Released at launch, the base platform

    comprises a robust set of well-integrated, tested, and targeted elements that enable customers to

    immediately start a design. These elements include:

    1. FPGA silicon

    2. ISE Design Suite design environment

    3. Third-party synthesis, simulation, and signal integrity tools

    4. Reference designs common to many applications, such as memory interface and

    configuration designs.

    5. Development boards that run the reference designs

6. A host of widely used IP, such as Gigabit Ethernet, memory controllers, and PCIe.


    Domain-Specific Platform

    The next layer in the targeted design platform hierarchy is the domain-specific platform.

    Released from three to six months after the base platform, each domain specific platform targets

one of the three primary Xilinx FPGA user profiles (domains): the embedded processing

    developer, the digital signal processing (DSP) developer, or the logic/connectivity developer.

    This is where the real power and intent of the targeted design platform begins to emerge.

    Domain-specific platforms augment the base platform with a predictable, reliable, and

    intelligently targeted set of integrated technologies, including:

    1. Higher-level design methodologies and tools

    2. Domain-specific embedded, DSP, and connectivity IP

    3. Domain-specific development hardware and daughter cards

    4. Reference designs optimized for embedded processing, connectivity, and DSP

    5. Operating systems (required for embedded processing) and software

    Every element in these platforms is tested, targeted, and supported by Xilinx and/or our

    ecosystem partners. Starting a design with the appropriate domain-specific platform can cut

    weeks, if not months, off of the user's development time.

    Market-Specific Platform

    A market-specific platform is an integrated combination of technologies that enables

    software or hardware developers to quickly build and then run their specific application or

    solution. Built for use in specific markets such as Automotive, Consumer, Mil/Aero,

    Communications, AVB, or ISM, market-specific platforms integrate both the base and domain-

    specific platforms and provide higher level elements that can be leveraged by customer-specific

    software and hardware designs. The market-specific platform can rely more heavily on third-

    party targeted IP than the base or domain-specific platforms. The market-specific platform

    includes: the base and domain-specific platforms, reference designs, and boards (or daughter

    cards) to run reference designs that are optimized for a particular market (e.g., lane departure

early-warning systems, analytics, and display processing). Xilinx will begin releasing market-

    specific platforms three to six months after the domain-specific platforms, augmenting the

    domain-specific platforms with reference designs, IP, and software aimed at key growth markets.

    Initially, Xilinx will target markets such as Communications, Automotive, Video, and Displays


    with platform elements that abstract away the more mundane portions of the design, thereby

    further reducing the customer's development effort so they can focus their attention on creating

    differentiated value in their end solution. This systematic platform development and release

    strategy provides the framework for the consistent and efficient fulfillment of the programmable

imperative, both by Xilinx and by its customers.

    Platform Enablers

    Xilinx has instituted a number of changes and enhancements that have contributed

    substantially to the feasibility and viability of the targeted design platform. These platform-

    enabling changes cover six primary areas:

    1. Design environment enhancements

2. Socketable IP creation

    3. New targeted reference designs

    4. Scalable unified board and kit strategy

    5. Ecosystem expansion

    6. Design services supporting the targeted design platform approach

    Design Environment Enhancements

    With the breadth of advances and capabilities that the Virtex-6 and Spartan-6

programmable devices deliver, coupled with the access provided by the associated targeted design

    platforms, it is no longer feasible for one design flow or environment to fit every designer's

    needs. System designers, algorithm designers, SW coders, and logic designers each represent a

    different user-profile, with unique requirements for a design methodology and associated design

    environment. Instead of addressing the problem in terms of individual fixed tools, Xilinx targets

    the required or preferred methodology for each user, to address their specific needs with the

    appropriate design flow. At this level, the design language changes from HDL (VHDL/Verilog)

to C, C++, MATLAB software, and other higher-level languages, which are more widely used

    by these designers, and the design abstraction moves up from the block or component to the

    system level. The result is a methodology and complete design flow tailored to each user profile

    that provides design creation, design implementation, and design verification. Indicative of the

    complexity of the problem, to fully understand the user profile of a logic designer, one must

    consider the various levels of expertise represented by this demographic. The most basic category


    in this profile is the push-button user who wants to complete a design with minimum work or

    knowledge.

    The push-button user just needs good-enough results. Contrastingly, more advanced

    users want some level of interactive capabilities to squeeze more value into their design, and the

    power user (the expert) wants full control over a vast array of variables. Add the traditional

    ASIC designers, tasked with migrating their designs to an FPGA (a growing trend, given the

    intolerable costs and risks posed by ASIC development these days), and clearly the imperative

    facing Xilinx is to offer targeted flows and tools that support each user's requirements and

    capabilities, on their terms. The most recent release of the ISE Design Suite includes numerous

    changes that fulfill requirements specifically pertinent to the targeted design platform. The new

    release features a complete tool chain for each top-level user profile (the domain-specific

    personas: the embedded, DSP, and logic/connectivity designers), including specific

    accommodations for everyone from the push-button user to the ASIC designer.

    The tighter integration of embedded and DSP flows enables more seamless integration of

    designs that contain embedded, DSP, IP, and user blocks in one system. To further enhance

    productivity and help customers better manage the complexity of their designs, the new ISE

    Design Suite enables designers to target area, performance, or power by simply selecting a design

    goal in the setup. The tools then apply specific optimizations to help meet the design goal. In

    addition, the ISE Design Suite boasts substantially faster place-and-route and simulation run

    times, providing users with 2X faster compile times. Finally, Xilinx has adopted the FLEXnet

    Licensing strategy that provides a floating license to track and monitor usage.

    4.4.2.2 XILINX ISE Design Tools:

Xilinx ISE is the design tool provided by Xilinx; comparable tools from other vendors would be virtually identical for our purposes.

There are four fundamental steps in all digital logic design. These consist of:

1. Design - the schematic or code that describes the circuit.

2. Synthesis - the intermediate conversion of the human-readable circuit description to FPGA code (EDIF) format. It involves syntax checking and combining all the separate design files into a single file.


3. Place & Route - where the layout of the circuit is finalized. This is the translation of the EDIF into logic gates on the FPGA.

4. Program - the FPGA is updated to reflect the design through the use of programming (.bit) files. Test bench simulation is part of the second step. As its name implies, it is used for testing the design by simulating the result of driving the inputs and observing the outputs to verify the design. ISE has the capabilit

