    DESIGN OF ON CHIP BUS WITH OCP INTERFACE DEPT OF ECE

    CHAPTER 1

    INTRODUCTION

    1.1 INTRODUCTION

    An SOC chip usually contains a large number of IP cores that communicate with

    each other through on-chip buses. As the VLSI process technology continuously

    advances, the frequency and the amount of the data communication between IP cores

    increase substantially. As a result, the ability of on chip buses to deal with the large

    amount of data traffic becomes a dominant factor for the overall performance. The design

    of on-chip buses can be divided into two parts: bus interface and bus architecture. The

    bus interface involves a set of interface signals and their corresponding timing

    relationship, while the bus architecture refers to the internal components of buses and the

    interconnections among the IP cores.

    The widely accepted on-chip bus, AMBA AHB [1], defines a set of bus interface

    to facilitate basic (single) and burst read/write transactions. AHB also defines the internal

    bus architecture, which is mainly a shared bus composed of multiplexors. The

    multiplexer-based bus architecture works well for a design with a small number of IP

    cores. When the number of integrated IP cores increases, the communication between IP

    cores also increases and it becomes quite frequent that two or more master IPs would

    request data from different slaves at the same time. The shared bus architecture often

    cannot provide efficient communication since only one bus transaction can be supported

    at a time. To solve this problem, two bus protocols have been proposed recently. One is

    the Advanced eXtensible Interface (AXI) protocol [1] proposed by ARM.

    AXI defines five independent channels (write address, write data, write response,

    read address, and read data channels). Each channel involves a set of signals. AXI does

    not restrict the internal bus architecture and leaves it to designers. Thus designers are

    allowed to integrate two IP cores with AXI by either connecting the wires directly or

    invoking an in-house bus between them. The other bus interface protocol is proposed by

    a non-profit organization, the Open Core Protocol International Partnership (OCP-IP) [2].


    OCP is an interface (or socket) aiming to standardize and thus simplify the

    system integration problems. It facilitates system integration by defining a set of concrete

    interface (I/O signals and the handshaking protocol) which is independent of the bus

    architecture. Based on this interface IP core designers can concentrate on designing the

    internal functionality of IP cores, bus designers can emphasize the internal bus

    architecture, and system integrators can focus on the system issues such as the

    requirement of the bandwidth and the whole system architecture. In this way, system

    integration becomes much more efficient.

    Most of the bus functionalities defined in AXI and OCP are quite similar. The

    most conspicuous difference between them is that AXI divides the address channel into

    independent write address channel and read address channel such that read and write

    transactions can be processed simultaneously. However, the additional area of the

    separated address channels is the penalty. Some previous work has investigated on-chip

    buses from various aspects. The work presented in [3] and [4] develops high-level

    AMBA bus models with fast simulation speed and high timing accuracy. The authors in

    [5] propose an automatic approach to generate high-level bus models from a formal

    channel model of OCP. In both of the above work, the authors concentrate on fast and

    accurate simulation models at a high level but do not provide real hardware

    implementation details. In [6], the authors implement the AXI interface on shared bus

    architecture. Even though it costs less in area, the benefit of AXI in the communication

    efficiency may be limited by the shared-bus architecture.

    In this paper we propose a high-performance on-chip bus design with OCP as the

    bus interface. We choose OCP because it is open to the public and OCP-IP has provided

    some free tools to verify this protocol. Nevertheless, most bus design techniques

    developed in this paper can also be applied to the AXI bus. Our proposed bus architecture

    features a crossbar/partial-crossbar based interconnect and realizes most transactions defined in OCP, including 1) single transactions, 2) burst transactions, 3) lock

    transactions, 4) pipelined transactions, and 5) out-of-order transactions. In addition, the

    proposed bus is flexible such that one can adjust the bus architecture according to the

    system requirement. One key issue of advanced buses is how to manipulate the order of


    transactions such that requests from masters and responses from slaves can be carried

    out in best efficiency without violating any ordering constraint. In this work we have

    developed a key bus component called the scheduler to handle the ordering issues of out-

    of-order transactions. We will show that the proposed crossbar/partial-crossbar bus

    architecture together with the scheduler can significantly enhance the communication

    efficiency of a complex SOC.

    1.2 Basic Idea

    The basic idea is to perform proper and lossless communication between IP cores that use the same protocol on a System on Chip (SOC). Basically, an SOC is a system that is considered as a set of components and the interconnects among them. Dataflow happens in the system in order to achieve a successful process, and for this the various interfaces are required. If these interfaces have issues, the process will fail, which leads to failure of the whole application.

    Generally, in an SOC system, protocols are used as interfaces, and the choice depends on the application and on the designer. Each interface has its own properties that suit the corresponding application.

    1.3 Need for Project

    This project is chosen because issues have increased in industry due to the lack of proper data transfer between IP cores on a System on Chip (SOC).

    In recent times, the development of SOC chips and reusable IP cores has been given higher priority because of lower cost and a shorter time-to-market. This raises a major and very sensitive issue: the interfacing of these IP cores. These interfaces play a vital role in an SOC and must be taken care of, since they carry the communication between the IP cores. The communication between the different IP cores should have a lossless data flow and should also be flexible for the designer.

    Hence, to resolve this issue, standard protocol buses are used in order to interface two IP cores. Here the loss of data depends on the standard of the protocol used.


    Most of the IP cores use OCP (Open Core Protocol), which is basically a core-based protocol with its own advantages and flexibility.

    1.4 Shortening SoC Design Times

    To address the time-to-market issue, first consider the benefits of designing

    individual SoC cores, and the final SoC, in parallel. Here, enterprises would clearly have

    a significant opportunity to reduce design time since all design aspects including SoC

    simulations (timing and performance analysis, etc.) occur in parallel.

    This reduces the SoC design time to that of the longest-effort, single-element

    design. The element might be an individual SoC core or, perhaps, the SoC integration

    effort. Either way, development schedule risk becomes bounded - assuring a higherprobability of a satisfactory SoC within an accelerated development schedule. This also

    allows more predictable scheduling. Since all the development is bounded and all design

    is done in parallel, problems are not solved in serial fashion. This means problems are

    detected and solved sooner. The design flows become very predictable. However, parallel

    development in this context mandates clearly defined divisions of responsibility for each

    core and shared SoC resources. That's because cores would only perform their native

    functions without any system knowledge.

    1.5 SoC Sockets

    As we have now seen, the solution to maximizing core reuse potential requires

    adopting a well-conceived and specified core-centric protocol as the native core

    interface. By selecting an adopted industry standard, core designers not only enable core

    reuse for cores developed within their own enterprise, they also enable reuse outside their

    enterprise under Intellectual Property (IP) licensing agreements. Finally, they also

    maximize their ability to license and incorporate third-party IP within their own SoC

    designs. In other words, they achieve SoC design agility and the ability to generate

    revenue through IP licensing.

    Moreover, a rigorous IP core interface specification, combined with an optimized

    system interconnect, allows core developers to focus on developing core functions. This


    eliminates the typical advance knowledge requirements regarding potential end systems,

    which might utilize a core, as well as the other IP cores that might be present in the

    application(s). Cores simply need a useful interface that de-couples them from system

    requirements. The interface then assumes the attributes of a SoCKET: an attachment

    interface that is powerful, frugal and well understood across the industry.

    Via this methodology, system integrators realize the benefits of partitioning

    components through layered hardware; designers no longer have to contend with a

    myriad of diverse core protocols and inter-core delivery strategies. Using a standard IP

    core interface eliminates having to adapt each core during each SoC integration, allowing

    system integrators the otherwise unrealized luxury of focusing on SoC design issues.

    And, since the cores are truly decoupled from the on-chip interconnect, and hence from each other,

    it becomes trivial to exchange one core for another to meet evolving system and market

    requirements.

    In summary, for true core reuse, cores must remain completely untouched as

    designers integrate them into any SoC. This only occurs when, say, a change in bus

    width, bus frequency or bus electrical loading does not require core modification. In other

    words, a complete socket insulates cores from the vagaries of, and change to, the SoC

    interconnect mechanism. The existence of such a socket enables supporting tool and

    collateral development for protocol, checkers, models, test benches and test generators.

    This allows independent core development that delivers plug-and-play modularity

    without core-interconnect rework. This also allows core development in parallel with a

    system design that saves precious design time.

    1.6 Overview

    The Open Core Protocol (OCP) defines a high-performance, bus-independent

    interface between IP cores that reduces design time, design risk, and manufacturing costs

    for SOC designs. An IP core can be a simple peripheral core, a high-performance

    microprocessor, or an on-chip communication subsystem such as a wrapped on-chip bus.

    The Open Core Protocol:


    Achieves the goal of IP design reuse. The OCP transforms IP cores making them

    independent of the architecture and design of the systems in which they are used

    Optimizes die area by configuring into the OCP only those features needed by the

    communicating cores

    Simplifies system verification and testing by providing a firm boundary around

    each IP core that can be observed, controlled, and validated

    The approach adopted by the Virtual Socket Interface Alliance's (VSIA) Design

    Working Group on On-Chip Buses (DWGOCB) is to specify a bus wrapper to provide a

    bus-independent Transaction Protocol-level interface to IP cores. The OCP is equivalent

    to VSIA's Virtual Component Interface (VCI). While the VCI addresses only data flow

    aspects of core communications, the OCP is a superset of VCI, additionally supporting configurable sideband control signaling and test harness signals. The OCP is the only

    standard that defines protocols to unify all of the inter-core communication.

    The Open Core Protocol (OCP) delivers the only non-proprietary, openly

    licensed, core-centric protocol that comprehensively describes the system-level

    integration requirements of intellectual property (IP) cores. While other bus and

    component interfaces address only the data flow aspects of core communications, the

    OCP unifies all inter-core communications, including sideband control and test harness

    signals. OCP's synchronous unidirectional signaling produces simplified core

    implementation, integration, and timing analysis.

    OCP eliminates the task of repeatedly defining, verifying, documenting and

    supporting proprietary interface protocols. The OCP readily adapts to support new core

    capabilities while limiting test suite modifications for core upgrades. Clearly delineated

    design boundaries enable cores to be designed independently of other system cores

    yielding definitive, reusable IP cores with reusable verification and test suites.

    Any on-chip interconnects can be interfaced to the OCP rendering it appropriate

    for many forms of on-chip communications:

    Dedicated peer-to-peer communications, as in many pipelined signal processing

    applications such as MPEG2 decoding.

    Simple slave-only applications such as slow peripheral interfaces.


    High-performance, latency-sensitive, multi-threaded applications, such as multi-

    bank DRAM architectures.

    The OCP supports very high performance data transfer models ranging from

    simple request-grants through pipelined and multi-threaded objects. Higher complexity

    SOC communication models are supported using thread identifiers to manage out-of-

    order completion of multiple concurrent transfer sequences.

    The Open Core Protocol interface addresses communications between the

    functional units (or IP cores) that comprise a system on a chip. The OCP provides

    independence from bus protocols without having to sacrifice high-performance access to

    on-chip interconnects. By designing to the interface boundary defined by the OCP, you

    can develop reusable IP cores without regard for the ultimate target system.

    Given the wide range of IP core functionality, performance and interface

    requirements, a fixed definition interface protocol cannot address the full spectrum of

    requirements. The need to support verification and test requirements adds an even higher

    level of complexity to the interface. To address this spectrum of interface definitions, the

    OCP defines a highly configurable interface. The OCP's structured methodology includes all of the signals required to describe an IP core's communications, including data flow,

    control, and verification and test signals.

    Here the importance of the project comes into the picture, i.e. OCP (Open Core Protocol) plays a vital role in carrying out transactions between two different IP cores, and the application will fail when it does not work properly.

    1.7 Applications

    Since it is an IP block, it can be used in any kind of SOC application. The applications can be listed as follows.

    SRAM

    Processor


    CHAPTER 2

    Literature survey

    2.1 Introduction AXI

    With the rapid progress of system-on-a-chip (SOC) and massive data movement

    requirement, on-chip system bus becomes the central role in determining the performance

    of a SOC. Two types of on-chip bus have been widely used in current designs: pipelined-

    based bus and packet-based bus.

    For pipeline-based buses, such as ARM's AMBA 2.0 AHB [1], IBM's CoreConnect [2] and OpenCores' WishBone [3], the cost and complexity to bridge the

    communications among on-chip designs are low. However, pipeline-based bus suffers

    from bus contention and inherent blocking characteristics due to the protocol. The

    contention issue can be alleviated by adopting multi-layer bus structure [4] or using

    proper arbitration policies [5, 6]. However, the blocking characteristic, which allows a

    transfer to complete only if the previous transfer has completed, cannot be altered without

    changing the bus protocol. This blocking characteristic reduces the bus bandwidth

    utilisation when accessing long latency devices, such as an external memory controller.

    To cope with the issues of pipeline-based buses, packet-based buses such as ARM's AMBA 3.0 AXI [7], OCP-IP's Open Core Protocol (OCP) [8], and STMicroelectronics' STBus [9] have been proposed to support outstanding transfers and

    out-of-order transfer completion. We will focus on AXI here because of its popularity.

    AXI bus possesses multiple independent channels to support multiple simultaneous

    address and data streams. Besides, AXI also supports improved burst operation, register

    slicing with registered input and secured transfer.

    Despite the above features, AXI requires high cost and possesses long transaction

    handshaking latency. However, a shared-link AXI interconnect can provide good

    performance while requiring less than half of the hardware required by a crossbar AXI

    implementation. This work focused on the performance analysis of a shared-link AXI.

    The handshaking latency is at least two cycles if the interface or interconnect is designed

    with registered input. This would limit the bandwidth utilisation to less than 50%. To


    reduce the handshaking latency, we proposed a hybrid data locked transfer mode. Unlike

    the lock transfer in [10] which requires arbitration lock over transactions, our data locked

    mode is based on a transfer-level arbitration scheme and allows bus ownership to change

    between transactions. This gives more flexibility to arbitration policy selection.

    With the additional features of AXI, new factors that affect the bus performance

    are also introduced. The first factor is the arbitration combination. The multi-channel

    architecture allows different and independent arbitration policies to be adopted by each

    channel. However, existing AXI-related works often assumed a unified arbitration policy

    where each channel adopts the same arbitration policy [10-12]. Another key factor is the

    interface buffer size. A larger interface buffer usually implies that more out-of-order

    transactions can be handled. The third factor is the task access setting, which defines how

    the transfer modes should be used by the devices within a system. Proper task access

    settings can yield better performance. However, the proper setting may be different under

    different circumstances, such as different buffer sizes.

    Being aware of the performance factors mentioned above, we conducted a

    detailed simulation-based analysis on the performance impact of the factors. The analysis

    is carried out by simulating a multi-core platform with a shared-link AXI backbone

    running a video phone application. The performance is evaluated in terms of bandwidth

    utilization, average transaction latency and system task completion time. In addition to

    the analysis on the performance impact of the aforementioned factors, the performance of

    a corresponding five-layer AHB-lite bus, which has a cost comparable to a 5-channel

    shared-link AXI, is also included for comparison. The rest of the paper is organized as

    follows. Section 2 presents the related works on AXI bus. Section 3 presents the proposed

    transfer modes.


    2.2 Transfer modes

    2.2.1 Normal mode

    This mode is the basic transfer mode in an AXI bus with registered interface. In

    the first cycle of a transfer using normal mode, the initiator sets the valid signal high and

    sends it to the target. In the second cycle, the target receives the high valid signal and sets

    the ready signal high for one cycle in response. Once the initiator receives the high ready

    signal, the initiator resets the valid signal low and this transfer is completed. As a result,

    at least two cycles are needed to complete a transfer in an AXI bus with registered

    interface. Fig. 2.1 illustrates the transfer of two normal transactions with a data burst length of four.

    Figure 2.1 Normal mode transfer example

    It takes 16 bus cycles to complete the eight data transfers in the two transactions. This means 50% of the available bus bandwidth is wasted.
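    The two-cycle handshake described above can be sketched at register-transfer level as follows. The entity and port names are invented for illustration and only the target (ready) side is shown; the sketch simply assumes a registered interface in which ready is asserted one cycle after valid is sampled high, which is what limits a normal-mode transfer to one beat every two cycles.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Illustrative target-side handshake for the "normal" transfer mode.
    entity axi_normal_target is
      port (
        clk     : in  std_logic;
        rst_n   : in  std_logic;
        s_valid : in  std_logic;  -- driven high by the initiator in the first cycle
        s_ready : out std_logic   -- driven high by the target one cycle later
      );
    end entity axi_normal_target;

    architecture rtl of axi_normal_target is
      signal ready_i : std_logic := '0';
    begin
      s_ready <= ready_i;

      process (clk)
      begin
        if rising_edge(clk) then
          if rst_n = '0' then
            ready_i <= '0';
          else
            -- Registered input: valid is sampled on one rising edge and ready is
            -- asserted for one cycle on the next, so each transfer costs at least
            -- two cycles, which is the 50% utilisation ceiling noted above.
            ready_i <= s_valid and not ready_i;
          end if;
        end if;
      end process;
    end architecture rtl;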


    2.2.2 Interleaved mode

    The interleaved mode [10, 20] hides transfer latency by allowing two transactions

    from different initiators to be transferred in an interleaved manner. Fig. 2.2 illustrates the

    transfer of the two transactions mentioned earlier using interleaved transfer mode. The

    one cycle latency introduced in the normal mode for request B is hidden by the transfer of

    request A. Similarly, the interleaved transfer mode can also be applied to data channels.

    As a result, transferring the data of the two transactions only takes nine cycles.

    To support the interleaved mode, only the bus interconnect needs additional

    hardware. No additional hardware in device interface or modification on bus protocol is

    required. Hence, an AXI interconnect that supports the interleaved mode can be used

    with standard AXI devices.

    2.2.3 Proposed data locked mode

    Although the interleaved mode can increase bandwidth utilization when more than one

    initiator is using the bus, the interleaved mode cannot be enabled when only one

    standalone initiator is using the bus. To handle this, we proposed the data locked mode.

    In contrast to the locked transfer implemented in [11], which can only be performed when the bus

    ownership is locked across consecutive transactions, the proposed data locked mode

    locks the ownership of the bus only within the period of burst data transfers. During the

    burst data transfer period, the ready signal is tied high and hence the handshaking process

    is bypassed. Unlike the interleaved mode, which can be applied to both request and data

    channels, the proposed data locked mode supports only burst data transfer.

    Fig. 2.3 illustrates an example of two transactions using the data locked mode to

    transfer data. Device M0 sends a data locked request A and device M1 sends a data

    locked request B. Once the bus interconnect accepts request A, the bus interconnect

    records the transaction ID of request A. When a data transfer with the matched ID

    appears in the data channel, the bus interconnect uses data locked mode to transfer the

    data continuously. For a transaction with a data burst of n, the data transfer latency is (n +

    1) cycles.


    There are two approaches to signal the bus interconnect to use the data locked

    mode for a transaction. One uses ARLOCK/AWLOCK signal in the address channels to

    signal the bus of an incoming transaction using data locked transfer. However, doing so

    requires modifying the protocol definition of these signals and the bus interface. To avoid

    modifying the protocol, the other approach assigns the devices that can use the data

    locked mode in advance. The overhead of this approach is that the bus interconnect must

    provide mechanisms to configure the device transfer mode mapping. Note that these two

    approaches can be used together without conflict.

    To support the proposed data locked mode, the bus interconnect needs an

    additional buffer, called data locked mode buffer, to keep record of the transactions using

    the data locked mode. Each entry in the buffer stores one transaction ID. If all the entries

    in the data locked mode buffer are in use, no more transactions can be transferred using

    the data locked mode.

    Figure 2.2 Interleaved mode transfer example


    Figure 2.3 Data locked mode transfer example

    2.2.4 Proposed hybrid data locked mode

    The hybrid data locked mode is proposed to allow additional data locked mode

    transaction requests to be transferred using the normal or interleaved mode when the data

    locked mode buffer is full. This allows more transactions to be available to the scheduler

    of the devices that support transaction scheduling. With the additional transactions, the

    scheduler of such devices may achieve better scheduling result. However, only a limited

    number of additional transactions using the data locked mode can be transferred using the

    normal or interleaved mode. This avoids bandwidth-hungry devices from occupying the

    bus with too many transactions. A hybrid mode counter is included to count the number

    of additional transactions transferred. If the counter value reaches the preset threshold, no

    more data locked mode transactions can be transferred using the normal or interleaved

    mode until the data locked mode buffer becomes not full again. Once the data locked

    mode buffer is not full, the hybrid mode counter is reset.
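    A minimal sketch of the hybrid-mode bookkeeping might look like the VHDL below. The entity name, port names and the default threshold are assumptions made for illustration; only the counting rule itself (count hybrid transfers while the data locked mode buffer is full, block further ones at the threshold, and clear the counter once the buffer is no longer full) is taken from the description above.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity hybrid_mode_counter is
      generic (
        THRESHOLD : natural := 4   -- preset limit on extra normal/interleaved transfers (assumed value)
      );
      port (
        clk          : in  std_logic;
        rst_n        : in  std_logic;
        buffer_full  : in  std_logic;  -- data locked mode buffer has no free entry
        hybrid_xfer  : in  std_logic;  -- a data-locked request was sent in normal/interleaved mode
        hybrid_allow : out std_logic   -- more hybrid transfers may still be accepted
      );
    end entity hybrid_mode_counter;

    architecture rtl of hybrid_mode_counter is
      signal count : unsigned(7 downto 0) := (others => '0');
    begin
      -- Further hybrid transfers are allowed only while the counter is below the threshold.
      hybrid_allow <= '1' when to_integer(count) < THRESHOLD else '0';

      process (clk)
      begin
        if rising_edge(clk) then
          if rst_n = '0' then
            count <= (others => '0');
          elsif buffer_full = '0' then
            -- Once the data locked mode buffer is not full, the counter is reset.
            count <= (others => '0');
          elsif hybrid_xfer = '1' and to_integer(count) < THRESHOLD then
            count <= count + 1;
          end if;
        end if;
      end process;
    end architecture rtl;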


    2.3 AMBA-AHB Protocol

    2.3.1 Introduction

    The AHB (Advanced High-performance Bus) is a high-performance bus in

    AMBA (Advanced Microcontroller Bus Architecture) family. This AHB can be used in

    high clock frequency system modules. The AHB acts as the high-performance system

    backbone bus. AHB supports the efficient connection of processors, on-chip memories

    and off-chip external memory interfaces with low-power peripheral macro cell functions.

    AHB is also specified to ensure ease of use in an efficient design flow using automated

    test techniques. The AHB is technology-independent and ensures that highly reusable

    peripheral and system macro cells can be migrated across a diverse range of IC processes

    and be appropriate for full-custom, standard cell and gate array technologies.

    2.3.2 Basic Idea

    The basic idea is to perform proper and lossless communication between IP cores that use the same protocol on a System on Chip (SOC). Basically, an SOC is a system that is considered as a set of components and the interconnects among them. Dataflow happens in the system in order to achieve a successful process, and for this the various interfaces are required. If these interfaces have issues, the process will fail, which leads to failure of the whole application.

    Generally, in an SOC system, protocols are used as interfaces, and the choice depends on the application and on the designer. Each interface has its own properties that suit the corresponding application.

    2.3.3 Need for Project

    This project is chosen because issues have increased in industry due to the lack of proper data transfer between IP cores on a System on Chip (SOC).

    In recent times, the development of SOC chips and reusable IP cores has been given higher priority because of lower cost and a shorter time-to-market.


    This raises a major and very sensitive issue: the interfacing of these IP cores. These interfaces play a vital role in an SOC and must be taken care of, since they carry the communication between the IP cores. The communication between the different IP cores should have a lossless data flow and should also be flexible for the designer.

    Hence, to resolve this issue, standard protocol buses are used in order to interface two IP cores. Here the loss of data depends on the standard of the protocol used. Most of the IP cores from ARM use the AMBA (Advanced Microcontroller Bus Architecture), which includes the AHB (Advanced High-performance Bus). This bus has its own advantages and flexibility. A full AHB interface is used for the following.

    Bus masters

    On-chip memory blocks

    External memory interfaces

    High-bandwidth peripherals with FIFO interfaces

    DMA slave peripherals

    2.3.4 Objectives of the AMBA Specification

    The AMBA specification has been derived to satisfy four key requirements:

    To facilitate the right-first-time development of embedded microcontroller

    products with one or more CPUs or signal processors.

    To be technology-independent and ensure that highly reusable peripheral and

    system macro cells can be migrated across a diverse range of IC processes and be appropriate for full-custom, standard cell and gate array technologies.

    To encourage modular system design to improve processor independence,

    providing a development road-map for advanced cached CPU cores and the development of peripheral libraries.

    To minimize the silicon infrastructure required to support efficient on-chip and off-chip communication for both operation and manufacturing test.


    2.3.5 Typical AMBA-based Microcontroller

    An AMBA-based microcontroller typically consists of a high-performance system

    backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory

    bandwidth, on which the CPU, on-chip memory and other Direct Memory Access (DMA)

    devices reside. This bus provides a high-bandwidth interface between the elements that

    are involved in the majority of transfers. Also located on the high-performance bus is a

    bridge to the lower bandwidth APB, where most of the peripheral devices in the system

    are located.

    Figure 2.3.5 Typical AMBA Systems

    The key advantages of a typical AMBA System are listed as follows.

    High performance

    Pipelined operation

    Multiple bus masters

    Burst transfers


    Split transactions

    AMBA APB provides the basic peripheral macro cell communications infrastructure as a secondary bus from the higher bandwidth pipelined main system bus. Such peripherals typically:

    Have interfaces which are memory-mapped registers

    Have no high-bandwidth interfaces

    Are accessed under programmed control

    The external memory interface is application-specific and may only have a

    narrow data path, but may also support a test access mode which allows the internal

    AMBA AHB, ASB and APB modules to be tested in isolation with system-independent

    test sets.

    Here the importance of the project comes into the picture, i.e. AMBA-AHB plays a vital role in carrying out transactions between two different IP cores, and the application will fail when it does not work properly.

    2.4. Terminology

    The following terms are used throughout this specification

    2.4.1 Bus Cycle

    A bus cycle is a basic unit of one bus clock period and for the purpose of AMBA

    AHB or APB protocol descriptions is defined from rising-edge to rising-edge transitions.

    An ASB bus cycle is defined from falling-edge to falling-edge transitions. Bus signal

    timing is referenced to the bus cycle clock.

    2.4.2 Bus Transfer

    An AMBA AHB bus transfer is a read or write operation of a data object, which

    may take one or more bus cycles. The bus transfer is terminated by a completion response

    from the addressed slave. The transfer sizes supported by AMBA AHB include byte (8-

    bit), half word (16-bit) and word (32-bit).

    2.4.3 Burst Operation

    A burst operation is defined as one or more data transactions, initiated by a bus

    master, which have a consistent width of transaction to an incremental region of address


    space. The increment step per transaction is determined by the width of transfer (byte,

    half word and word).
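    As a small illustration of that rule, the next beat address of an incrementing burst is simply the current address plus the transfer width in bytes. The VHDL function below is only a sketch of this; the package and function names are invented for illustration, and the HSIZE encoding shown ("000" = byte, "001" = half word, "010" = word) follows the usual AMBA convention.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Sketch: next address of an incrementing AHB burst beat.
    package ahb_burst_pkg is
      function next_incr_addr (haddr : unsigned(31 downto 0);
                               hsize : std_logic_vector(2 downto 0))
               return unsigned;
    end package ahb_burst_pkg;

    package body ahb_burst_pkg is
      function next_incr_addr (haddr : unsigned(31 downto 0);
                               hsize : std_logic_vector(2 downto 0))
               return unsigned is
        variable step : natural;
      begin
        case hsize is
          when "000"  => step := 1;  -- byte
          when "001"  => step := 2;  -- half word
          when others => step := 4;  -- word
        end case;
        return haddr + step;         -- increment step equals the transfer width in bytes
      end function next_incr_addr;
    end package body ahb_burst_pkg;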

    2.5 APPLICATIONS

    AMBA-AHB can be used in different applications and it is also technology-independent.

    ARM Controllers are designed according to the specifications of AMBA.

    In the present technology, high performance and speed are required which are

    convincingly met by AMBA-AHB

    Compared to the other architectures AMBA-AHB is far more advanced and

    efficient.

    To minimize the silicon infrastructure to support on-chip and off-chip

    communications

    Any embedded project which involves ARM processors or microcontrollers must always make use of this AMBA-AHB as the common bus throughout the project.

    2.6 Features

    AMBA Advanced High-performance Bus (AHB) supports the following features.

    High performance

    Burst transfers

    Split transactions

    Single edge clock operation

    SEQ, NONSEQ, BUSY, and IDLE Transfer Types

    Programmable number of idle cycles

    Large Data bus-widths - 32, 64, 128 and 256 bits wide

    Address Decoding with Configurable Memory Map

    2.7 Merits

    Since AHB is one of the most commonly used bus protocols, it has many advantages from the designer's point of view, which are mentioned below.


    AHB offers a fairly low cost (in area), low power (based on I/O) bus with a

    moderate amount of complexity and it can achieve higher frequencies when

    compared to others because this protocol separates the address and data phases.

    AHB can use the higher frequency along with separate data buses that can be

    defined to 128-bit and above to achieve the bandwidth required for high-

    performance bus applications.

    AHB can access other protocols through the proper bridging converter. Hence it

    supports the bridge configuration for data transfer.

    AHB allows slaves with significant latency to respond to read with an HRESP of

    SPLIT. The slave will then request the bus on behalf of the master when the read data is available. This enables better bus utilization.

    AHB offers burst capability by defining incrementing bursts of specified length

    and it supports both incrementing and wrapping. Although AHB requires that an

    address phase be provided for each beat of data, the slave can still use the burst

    information to make the proper request on the other side. This helps to mask the

    latency of the slave.

    AHB is defined with a choice of several bus widths, from 8-bit to 1024-bit. The

    most common implementation has been 32-bit, but higher bandwidth

    requirements may be satisfied by using 64 or 128-bit buses.

    AHB uses the HRESP signal, driven by the slaves, to indicate when an error has occurred.

    AHB also offers a large selection of verification IP from several different

    suppliers. The solutions offered support several different languages and run in a

    choice of environments.

    Access to the target device is controlled through a MUX, thereby admitting bus-

    access to one bus-master at a time.

    AHB Masters, Slaves and Arbiters support Early Burst Termination. Bursts can

    be early terminated either as a result of the Arbiter removing the HGRANT to a

    master part way through a burst or after a slave returns a non-OKAY response to


    any beat of a burst. Note, however, that a master cannot decide to terminate a defined-length burst unless prompted to do so by the Arbiter or Slave responses.

    Any slave which does not use SPLIT responses can be connected directly to an

    AHB master. If the slave does use SPLIT responses then a simplified version of

    the arbiter is also required.

    Thus the strengths of the AHB protocol are listed above, which clearly explain the reason for the wide use of this protocol.

    2.8 Demerits

    Even though the AHB protocol is a commonly used bus in designs, it has some tolerable demerits, which are listed below.

    AHB cannot achieve full data bus utilization and bandwidth if some slaves have a relatively high latency.

    AHB defines transfer sizes of 1, 2, 4, 8, and 16 bytes. Because byte enables are

    not defined, there are cases where multiple transfers must be made inside a single

    quadword.

    AHB defines timing parameters for many of the relationships between signals on

    the bus. However, these are not associated with requirements relative to a clock

    cycle. Therefore, SoC developers must integrate AHB cores and run chip level

    static timing analysis to judge how compatible AHB masters and slaves are with

    one another.

    Power-based SoCs cover a wide range of applications, and there is a

    corresponding wide range of address map requirements. Having the address

    decodes for all AHB slaves reside within the interconnect means having to

    support the most complex split address ranges, even for the simplest of slaves.

    Thus the weaknesses of the AHB protocol are mentioned above, and they can be tolerated with respect to its useful advantages.


    CHAPTER 3

    OPEN CORE PROTOCOL

    3.1 INTRODUCTION

    The Open Core Protocol (OCP) is a core-centric protocol which defines a high-

    performance, bus-independent interface between IP cores that reduces design time,

    design risk, and manufacturing costs for SOC designs. Main property of OCP is that it

    can be configured with respect to the application required. The OCP is chosen because of

    its advanced supporting features such as configurable sideband control signaling and test

    harness signals, when compared to other core protocols.

    While other bus and component interfaces address only the data flow aspects of core communications, the OCP unifies all inter-core communications, including sideband control and test harness signals. The OCP's synchronous unidirectional signaling

    produces simplified core implementation, integration, and timing analysis. The OCP

    readily adapts to support new core capabilities while limiting test suite modifications for

    core upgrades.

    3.2 Merits:

    The OCP has many advantages which will make the designers more comfortable

    which are listed below.

    OCP is a point-to-point protocol which can be directly interfaced between two

    IP cores.

    Most important advantage is that the OCP can be configured with respect to the

    application due to its configurable property.

    This configurable property will lead to reduction of the die area and the design

    time too. Hence the optimization of die area is attained.

    OCP is a bus independent protocol i.e. it can be interfaced to any bus protocol like

    AHB.


    It supports pipelined operation and multi-threaded applications such as multi-bank DRAM architectures.

    Supports burst operation, which generates the sequence of addresses with respect to the burst length.

    The OCP provides more flexibility to the designer who uses it and also gives high performance through improved core maintenance.

    The reuse of IP cores can be done easily using OCP; the issue that otherwise arises while reusing IP cores for another application is that the interfaces already used in the system have to be modified with respect to the application.

    Supports sideband signals, which carry information such as interrupts, flags, errors and status, and which are said to be non-dataflow signals.

    Also supports the Testing Signals such as scan interface, clock control interface

    and Debug and test interface. This ensures that the OCP can also be used to

    interface the Device under Test (DUT) and test signals can be passed.

    This OCP also enables the Threads and Tags which does the independent

    concurrent transfer sequence.

    OCP doubles the peak bandwidth at a given frequency by using separate buses for

    read and write data. These buses are used in conjunction with pipelining

    command to data phases to increase performance.

    Simplified circuitry needed to bridge an OCP based core to another

    communication interface standard.

    Thus the advantages of the OCP are listed above, which clearly explain the basic reason for choosing this protocol over others.


    3.3 Demerits:

    Every protocol has its own demerits, which should not significantly affect the application or the flexibility of the designer. Some of the demerits of OCP are mentioned below.

    Designing and verifying an OCP that supports all possible burst operations is complex, and it also needs more time and effort.

    Slaves that support the largest burst transfer sizes will consume more die area than slaves that are able to accept only smaller transfers.

    The main disadvantage of OCP is that the core which is to be interfaced with OCP should be OCP-compliant; if not, an OCP-compliant bridge must be created to make the core OCP-compliant.

    3.4 Basic Block Diagram

    The block diagram which explains the basic operation and characteristics of OCP

    is shown in Figure 3.1.

    The OCP defines a point-to-point interface between two communicating entities

    such as IP cores and bus interface modules. One entity acts as the master of the OCP

    instance, and the other as the slave. Only the master can present commands and is the

    controlling entity.

    The slave responds to commands presented to it, either by accepting data from the

    master, or presenting data to the master. For two entities to communicate there need to be

    two instances of the OCP connecting them such as one where the first entity is a master,

    and one where the first entity is a slave.


    Figure 3.1 Basic block diagram of OCP instance

    Figure 3.1 shows a simple system containing a wrapped bus and three IP core

    entities such as one that is a system target, one that is a system initiator, and an entity that

    is both. The characteristics of the IP core determine whether the core needs master, slave,

    or both sides of the OCP and the wrapper interface modules must act as the

    complementary side of the OCP for each connected entity. A transfer across this system

    occurs as follows.

    A system initiator (as the OCP master) presents command, control, and possibly

    data to its connected slave (a bus wrapper interface module). The interface module plays

    the request across the on-chip bus system. The OCP does not specify the embedded bus

    functionality. Instead, the interface designer converts the OCP request into an embedded

    bus transfer. The receiving bus wrapper interface module (as the OCP master) converts

    the embedded bus operation into a legal OCP command. The system target (OCP slave)

    receives the command and takes the requested action.


    Each instance of the OCP is configured (by choosing signals or bit widths of a

    particular signal) based on the requirements of the connected entities and is independent

    of the others. For instance, system initiators may require more address bits in their OCP

    instances than do the system targets; the extra address bits might be used by the

    embedded bus to select which bus target is addressed by the system initiator.

    The OCP is flexible. There are several useful models for how existing IP cores

    communicate with one another. Some employ pipelining to improve bandwidth and

    latency characteristics. Others use multiple-cycle access models, where signals are held

    static for several clock cycles to simplify timing analysis and reduce implementation

    area. Support for this wide range of behavior is possible through the use of synchronous

    handshaking signals that allow both the master and slave to control when signals are

    allowed to change.

    3.5 Theory of Operation

    The various operation involved in the Open Core Protocol will be discussed as

    follows.

    Point-to-Point Synchronous Interface

    To simplify timing analysis, physical design, and general comprehension, the

    OCP is composed of uni-directional signals driven with respect to, and sampled by the

    rising edge of the OCP clock. The OCP is fully synchronous (with the exception of reset)

    and contains no multi-cycle timing paths with respect to the OCP clock. All signals other

    than the clock signal are strictly point-to-point.

    Bus Independence

    A core utilizing the OCP can be interfaced to any bus. A test of any bus-

    independent interface is to connect a master to a slave without an intervening on-chip

    bus. This test not only drives the specification towards a fully symmetric interface but

    helps to clarify other issues. For instance, device selection techniques vary greatly among

    on-chip buses. Some use address decoders. Others generate independent device select

    signals (analogous to a board level chip select). This complexity should be hidden from


    IP cores, especially since in the directly-connected case there is no

    decode/selection logic. OCP-compliant slaves receive device selection information

    integrated into the basic command field.

    Arbitration schemes vary widely. Since there is virtually no arbitration in the

    directly-connected case, arbitration for any shared resource is the sole responsibility of

    the logic on the bus side of the OCP. This permits OCP-compliant masters to pass a

    command field across the OCP that the bus interface logic converts into an arbitration

    request sequence.

    Address/Data

    Wide widths, characteristic of shared on-chip address and data buses, make tuning

    the OCP address and data widths essential for area-efficient implementation. Only those

    address bits that are significant to the IP core should cross the OCP to the slave. The OCP

    address space is flat and composed of 8-bit bytes (octets).

    To increase transfer efficiencies, many IP cores have data field widths significantly greater than an octet. The OCP supports a configurable data width to allow

    multiple bytes to be transferred simultaneously. The OCP refers to the chosen data field

    width as the word size of the OCP. The term word is used in the traditional computer

    system context; that is, a word is the natural transfer unit of the block. OCP supports

    word sizes of power-of-two and non-power-of-two as would be needed for a 12-bit DSP

    core. The OCP address is a byte address that is word aligned.

    Transfers of less than a full word of data are supported by providing byte enable

    information that specifies which octets are to be transferred. Byte enables are linked to

    specific data bits (byte lanes). Byte lanes are not associated with particular byte

    addresses.
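    The byte-lane association can be illustrated with a small VHDL sketch of a byte-enable controlled write to one 32-bit OCP word. The entity and internal signal names (other than the OCP-style MByteEn and MData) are invented for illustration, and a 32-bit word size is assumed.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Sketch: byte-enable controlled write of one 32-bit OCP word.
    entity ocp_byteen_write is
      port (
        clk      : in  std_logic;
        wr_en    : in  std_logic;
        MByteEn  : in  std_logic_vector(3 downto 0);   -- one bit per byte lane
        MData    : in  std_logic_vector(31 downto 0);
        mem_word : out std_logic_vector(31 downto 0)
      );
    end entity ocp_byteen_write;

    architecture rtl of ocp_byteen_write is
      signal word_r : std_logic_vector(31 downto 0) := (others => '0');
    begin
      mem_word <= word_r;

      process (clk)
      begin
        if rising_edge(clk) then
          if wr_en = '1' then
            -- MByteEn(0) guards MData(7 downto 0), MByteEn(1) guards
            -- MData(15 downto 8), and so on: enables follow byte lanes,
            -- not byte addresses.
            for lane in 0 to 3 loop
              if MByteEn(lane) = '1' then
                word_r(8*lane + 7 downto 8*lane) <= MData(8*lane + 7 downto 8*lane);
              end if;
            end loop;
          end if;
        end if;
      end process;
    end architecture rtl;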


    Pipelining

    The OCP allows pipelining of transfers. To support this feature, the return of read

    data and the provision of write data may be delayed after the presentation of the

    associated request.

    Response

    The OCP separates requests from responses. A slave can accept a command

    request from a master on one cycle and respond in a later cycle. The division of request

    from response permits pipelining. The OCP provides the option of having responses for

    Write commands, or completing them immediately without an explicit response.

    Burst

    To provide high transfer efficiency, burst support is essential for many IP cores.

    The extended OCP supports annotation of transfers with burst information. Bursts can

    either include addressing information for each successive command (which simplifies the

    requirements for address sequencing/burst count processing in the slave), or include

    addressing information only once for the entire burst.
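    For the single-request style, the per-beat addresses have to be regenerated from the one address that was issued. The sketch below assumes an incrementing burst, a 32-bit (word-wide) data path and the illustrative port names shown; it covers only the address sequencing idea, not the full burst handling.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Sketch: per-beat address regeneration for a single-request incrementing burst.
    entity ocp_burst_addr_gen is
      port (
        clk          : in  std_logic;
        rst_n        : in  std_logic;
        req_valid    : in  std_logic;              -- a new single-request burst was accepted
        MAddr        : in  unsigned(31 downto 0);  -- address issued once for the burst
        MBurstLength : in  unsigned(7 downto 0);   -- number of beats in the burst
        beat_advance : in  std_logic;              -- one data beat completed
        beat_addr    : out unsigned(31 downto 0);  -- regenerated address of the current beat
        burst_done   : out std_logic
      );
    end entity ocp_burst_addr_gen;

    architecture rtl of ocp_burst_addr_gen is
      signal addr_r : unsigned(31 downto 0) := (others => '0');
      signal left_r : unsigned(7 downto 0)  := (others => '0');
    begin
      beat_addr  <= addr_r;
      burst_done <= '1' when left_r = 0 else '0';

      process (clk)
      begin
        if rising_edge(clk) then
          if rst_n = '0' then
            addr_r <= (others => '0');
            left_r <= (others => '0');
          elsif req_valid = '1' then
            addr_r <= MAddr;                -- the address arrives only once per burst
            left_r <= MBurstLength;
          elsif beat_advance = '1' and left_r /= 0 then
            addr_r <= addr_r + 4;           -- word-wide (4-byte) data path assumed
            left_r <= left_r - 1;
          end if;
        end if;
      end process;
    end architecture rtl;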


    CHAPTER 4

    Design of Open Core Protocol

    4.1 Introduction

    The literature survey on the OCP has been made and the basic signal flow block diagram identified. In this dataflow signal diagram, the basic signals are identified and are used in the simple read, write and burst operations of the OCP master and slave. First the Finite State Machine (FSM) is developed, and then the developed FSM is modeled in VHDL.

    In this chapter, the design of the OCP protocol is discussed and its simulations are verified against the basic operations.

    4.2 On-Chip Bus Functionalities

    The on-chip bus functionalities are classified into four types: 1) burst, 2) lock, 3) pipelined, and 4) out-of-order transactions.

    4.2.1 Burst transactions

    The burst transactions allow the grouping of multiple transactions that have a certain

    address relationship, and can be classified into multi-request burst and single-request burst according to how many times the addresses are issued. FIGURE 1 shows the two

    types of burst read transactions. The multi-request burst as defined in AHB is illustrated

    in FIGURE 1(a) where the address information must be issued for each command of a

    burst transaction (e.g., A11, A12, A13 and A14). This may cause some unnecessary

    overhead. In the more advanced bus architecture, the single-request burst transaction is

    supported. As shown in FIGURE 1(b), which is the burst type defined in AXI, the

    address information is issued only once for each burst transaction. In our proposed bus

    design we support both burst transactions such that IP cores with various burst types can

    use the proposed on-chip bus without changing their original burst behavior.


    4.2.2 Lock transactions

    Lock is a protection mechanism for masters that have low bus priorities. Without this

    mechanism the read/write transactions of masters with lower priority would be

    interrupted whenever a higher-priority master issues a request. Lock transactions prevent

    an arbiter from performing arbitration and assure that a low-priority master can complete its granted transaction without being interrupted.

    4.2.3 Pipelined transactions (outstanding transactions)

    Figure 2(a) and 2(b) show the difference between non-pipelined and pipelined (also

    called outstanding in AXI) read transactions. In FIGURE 2(a), for a non-pipelined

    transaction a read data must be returned after its corresponding address is issued plus a

    period of latency. For example, D21 is sent right after A21 is issued plus the latency t. For a

    pipelined transaction as shown in FIGURE 2(b), this hard link is not required. Thus A21

    can be issued right after A11 is issued without waiting for the return of data requested by

    A11 (i.e., D11-D14).


    4.2.4 Out-of-order transactions

    The out-of-order transactions allow the return order of responses to be different from the

    order of their requests. These transactions can significantly improve the communication

    efficiency of an SOC system containing IP cores with various access latencies as

    illustrated in FIGURE 3. In FIGURE 3(a) which does not allow out-of-order transactions,

    the corresponding responses of A21 and A31 must be returned after the response of A11.

    With the support of out-of-order transactions as shown in FIGURE 3(b), the response

    with shorter access latency (D21, D22 and D31) can be returned before those with longer

    latency (D11-D14), and thus the transactions can be completed in far fewer cycles.

    4.3 Hardware Design of the On-Chip Bus

    The architecture of the proposed on-chip bus is illustrated in FIGURE 4, where we show

    an example with two masters and two slaves. A crossbar architecture is employed such that more than one master can communicate with more than one slave simultaneously. If not all masters require access paths to all slaves, a partial crossbar architecture is also allowed. The main blocks of the proposed bus architecture are described next.


    Arbiter

    In traditional shared bus architecture, resource contention happens whenever more

    than one master requests the bus at the same time. For a crossbar or partial crossbar

    architecture, resource contention occurs when more than one master is to access the same

    slave simultaneously. In the proposed design each slave IP is associated with an arbiter

    that determines which master can access the slave.
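    As an illustration, a per-slave arbiter for the two-master case could be as simple as the fixed-priority sketch below. The entity name, port names and the fixed-priority policy are assumptions made for this sketch; the actual arbiter could equally implement round-robin or another policy, and it must also respect lock transactions as described in Section 4.2.2.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Sketch: fixed-priority arbiter attached to one slave port (two masters).
    entity slave_arbiter is
      port (
        clk   : in  std_logic;
        rst_n : in  std_logic;
        req   : in  std_logic_vector(1 downto 0);  -- req(i) = master i wants this slave
        lock  : in  std_logic;                     -- current owner holds a lock transaction
        grant : out std_logic_vector(1 downto 0)   -- one-hot grant to the winning master
      );
    end entity slave_arbiter;

    architecture rtl of slave_arbiter is
      signal grant_r : std_logic_vector(1 downto 0) := "00";
    begin
      grant <= grant_r;

      process (clk)
      begin
        if rising_edge(clk) then
          if rst_n = '0' then
            grant_r <= "00";
          elsif lock = '1' and grant_r /= "00" then
            grant_r <= grant_r;              -- lock transactions are never re-arbitrated
          elsif req(0) = '1' then
            grant_r <= "01";                 -- master 0 has the higher fixed priority
          elsif req(1) = '1' then
            grant_r <= "10";
          else
            grant_r <= "00";
          end if;
        end if;
      end process;
    end architecture rtl;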

    4.4 OCP BLOCK DIAGRAM


    Decoder

    Since more than one slave exists in the system, the decoder decodes the address and decides which slave should return the response to the requesting master. In addition, the proposed decoder also checks whether the transaction address is illegal or nonexistent and responds with an error message if necessary.
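
    A minimal decoder sketch follows, assuming a two-slave address map chosen purely for illustration; the real map, the select encoding and the error reporting depend on the system configuration.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Illustrative address decoder: selects one of two slaves and flags
    -- accesses to unmapped addresses so an error response can be returned.
    entity bus_decoder is
      port (
        maddr     : in  std_logic_vector(12 downto 0);
        sel_slave : out std_logic_vector(1 downto 0);  -- one-hot slave select
        addr_err  : out std_logic                      -- no slave at this address
      );
    end entity;

    architecture rtl of bus_decoder is
    begin
      process (maddr)
      begin
        sel_slave <= "00";
        addr_err  <= '0';
        if unsigned(maddr) < 2048 then            -- 0x0000-0x07FF -> slave 0
          sel_slave <= "01";
        elsif unsigned(maddr) < 4096 then         -- 0x0800-0x0FFF -> slave 1
          sel_slave <= "10";
        else
          addr_err <= '1';                        -- triggers the error response
        end if;
      end process;
    end architecture;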

    4.5 FSM-M & FSM-S

    Depending on whether a transaction is a read or a write operation, the request and response processes are different. For a write transaction, the data to be written is sent out together with the address of the target slave, and the transaction is complete when the target slave accepts the data and acknowledges its reception. For a read operation, the address of the target slave is first sent out, and the target slave issues an accept signal when it receives the request. The slave then generates the required data and sends it to the bus, where it is properly directed to the master requesting the data. The read transaction completes when the master accepts the response and issues an acknowledge signal. In the proposed bus architecture, we employ two types of finite state machines, FSM-M and FSM-S, to control the flow of each transaction. FSM-M acts as a master and generates the OCP signals of a master, while FSM-S acts as a slave and generates those of a slave. These finite state machines are designed so that burst, pipelined, and out-of-order read/write transactions can all be properly controlled.
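
    As a rough illustration of this split, the two controller types can be built around state enumerations such as the sketch below; the state names are assumptions, and the actual FSMs additionally carry counters and buffers for burst, pipelined and out-of-order transfers.

    -- Illustrative state sets only: FSM-M drives master-side OCP signals,
    -- FSM-S drives slave-side OCP signals.
    package bus_fsm_types is
      type fsm_m_state_t is (M_IDLE, M_REQUEST, M_WAIT_RESPONSE);
      type fsm_s_state_t is (S_IDLE, S_WRITE, S_READ);
    end package;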

    4.6 OCP Dataflow Signals

    The OCP interface has dataflow signals that are divided into basic signals, burst extensions, tag extensions, and thread extensions. A small set of signals from the basic dataflow group is required in all OCP configurations; optional signals can be configured to support additional core communication requirements. The OCP is a synchronous interface with a single clock signal. All OCP signals are driven with respect to the rising edge of the OCP clock and are also sampled on the rising edge of the OCP clock. Except for the clock, OCP signals are strictly point-to-point and unidirectional.


    The basic signals between the two cores are identified and shown in Figure 4.5, the dataflow signal diagram. Here Core 1 acts as the master that issues the command, and Core 2 acts as the slave that accepts the command given by the master in order to perform the operation.

    [Figure 4.5: Core 1 (master) and Core 2 (slave) connected through the OCP dataflow signals CLK, MCmd, MAddr, MData, MBurstSeq, MBurstLength, MBurstPrecise, MDataLast, SCmdAccept, SResp, SData and SRespLast, plus the system-side InputAddr, Control, Input Data and OutputData ports, grouped into request, response and data-handshake phases.]

    Figure 4.5 OCP dataflow signals

    Figure 4.5 shows the OCP dataflow signals, which include the request, response and data-handshake groups. The request-phase signals are those the master uses to request a particular operation from the slave; the request phase is ended by the SCmdAccept signal. Similarly, the response-phase signals are used for sending the proper response to the corresponding request; the response phase is ended by the SResp signal. The data-handshake signals deal with the data transfer from either the master or the slave.

    The basic signals are the ones used in the simple read and write operations of the OCP master and slave; this simple operation can also support pipelining. These basic signals are extended for the burst operation, in which a single request is associated with multiple data transfers. Equivalently, the burst extensions allow the grouping of multiple transfers that have a defined address relationship. The burst extensions are enabled only when MBurstLength is included in the interface.

    The burst length indicates how many write or read operations are to be carried out in a burst. The burst length is supplied by the system to the master, which in turn passes it to the slave through the MBurstLength signal. The burst length therefore acts as an input to the master only when burst mode is enabled; in simple write and read operations, the burst-length input is not needed.

    From Figure 4.5, the inputs and outputs of the OCP are clearly identified and are discussed as follows.

    4.7 Inputs and Outputs:

    Basically, the OCP in this design has a 13-bit address, 8-bit data, a 3-bit control signal, and a burst length of integer type. An 8K-deep memory (2^13 = 8192 addressable locations) is used on the slave side in order to verify the protocol functionality. The system gives the inputs to the OCP master during a write operation and receives signals from the OCP slave during a read operation, as listed below; a sketch of the corresponding port list follows the list.


    Master - system interface signals:

    Control
    o The control signal is an input that indicates whether a WRITE or READ operation is to be performed by the master; it is driven by the processor through the control pin.

    Input address
    o The system gives the address to the master through the addr pin; the write or read operation is carried out at this address.

    Input data
    o The system provides the write data to the master through the data_in pin; during a write operation this data is stored at the corresponding address.

    Burst Length
    o This input is used only when the burst profile is enabled. It is of integer type and indicates the number of operations to be carried out in a burst.

    Output data
    o In a read operation, the master sends the address and the slave receives it. The slave then fetches the data stored at that address, and the data is driven out through the data_out pin.
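
    The entity declaration below is a minimal sketch of the port list implied by these inputs and outputs; the system-side names (control, addr, data_in, burst_length, data_out) and the integer range chosen for burst_length are assumptions made for illustration.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Sketch of the OCP master wrapper: system-side ports on one side,
    -- OCP dataflow signals of Figure 4.5 on the other.
    entity ocp_master is
      port (
        clk          : in  std_logic;
        -- system side
        control      : in  std_logic_vector(2 downto 0);    -- idle/write/read/burst
        addr         : in  std_logic_vector(12 downto 0);
        data_in      : in  std_logic_vector(7 downto 0);
        burst_length : in  integer range 0 to 255;           -- used only in burst mode
        data_out     : out std_logic_vector(7 downto 0);
        -- OCP side
        MCmd         : out std_logic_vector(2 downto 0);
        MAddr        : out std_logic_vector(12 downto 0);
        MData        : out std_logic_vector(7 downto 0);
        MBurstLength : out std_logic_vector(12 downto 0);
        SCmdAccept   : in  std_logic;
        SResp        : in  std_logic_vector(1 downto 0);
        SData        : in  std_logic_vector(7 downto 0)
      );
    end entity;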

    4.8 OCP Specification:

    The specifications for the Open Core Protocol are identified for the simple write and read operation, which also supports pipelining, and for the burst operation. The identified specifications are presented in tabular format.

    4.8.1 Simple write and read

    The basic and mandatory signals required for the simple write and read operation are tabulated in Table 4.8.1.


    Table 4.8.1 Basic OCP signal specification

    S.No.  Name        Width              Driver  Function
    1      Clk         1                  Varies  OCP clock
    2      MCmd        3                  Master  Transfer command
    3      MAddr       13 (configurable)  Master  Transfer address
    4      MData       8 (configurable)   Master  Write data
    5      SCmdAccept  1                  Slave   Slave accepts transfer
    6      SData       8 (configurable)   Slave   Read data
    7      SResp       2                  Slave   Transfer response

    The request issued by the system is given to the slave by the MCmd signal. Similarly, in a write operation, the input address and data provided by the system are given to the slave through the MAddr and MData signals; when this information is accepted, the slave asserts SCmdAccept, which ensures that the system can issue the next request. During a read operation, the system issues the request and address to the slave, which sets SResp and fetches the corresponding data, given to the output through SData.

    Clk

    Input clock signal for the OCP clock. The rising edge of the OCP clock is defined as a rising edge of Clk that samples the asserted EnableClk. Falling edges of Clk, and any rising edge of Clk that does not sample EnableClk asserted, do not constitute rising edges of the OCP clock.

    EnableClk


    EnableClk indicates which rising edges of Clk are the rising edges of the OCP clock, that is, which rising edges of Clk should sample and advance interface state. Use the enableclk parameter to configure this signal. EnableClk is driven by a third entity and serves as an input to both the master and the slave.

    When enableclk is set to 0 (the default), the signal is not present and the OCP behaves as if EnableClk were constantly asserted. In that case all rising edges of Clk are rising edges of the OCP clock.
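
    A minimal sketch of this sampling rule is shown below; the surrounding entity is a throwaway example, and only the nested if reflects the EnableClk behaviour described above.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Only rising edges of Clk that sample EnableClk = '1' count as OCP
    -- clock edges and advance interface state; other edges hold the state.
    entity ocp_clock_enable_demo is
      port (
        Clk       : in  std_logic;
        EnableClk : in  std_logic;
        d         : in  std_logic;
        q         : out std_logic
      );
    end entity;

    architecture rtl of ocp_clock_enable_demo is
    begin
      process (Clk)
      begin
        if rising_edge(Clk) then
          if EnableClk = '1' then   -- an OCP clock edge
            q <= d;
          end if;                   -- otherwise: not an OCP clock edge, hold q
        end if;
      end process;
    end architecture;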

    MAddr

    Transfer address. MAddr specifies the slave-dependent address of the resource targeted by the current transfer. To configure this field into the OCP, use the addr parameter. To configure the width of this field, use the addr_wdth parameter.

    MCmd

    Transfer command. This signal indicates the type of OCP transfer the master is requesting. Each non-idle command is either a read-type or a write-type request, depending on the direction of data flow.

    MData

    Write data. This field carries the write data from the master to the slave. The field is configured into the OCP using the mdata parameter and its width is configured using the data_wdth parameter. The width is not restricted to multiples of 8.

    SCmdAccept

    Slave accepts transfer. A value of 1 on the SCmdAccept signal indicates that the slave accepts the master's transfer request. To configure this field into the OCP, use the cmdaccept parameter.

    SData

    This field carries the requested read data from the slave to the master. The field is configured into the OCP using the sdata parameter and its width is configured using the data_wdth parameter. The width is not restricted to multiples of 8.

    SResp


    Response field sent from the slave in reply to a transfer request from the master. The field is configured into the OCP using the resp parameter.

    4.8.2 Burst extension

    The signals required for the burst operation are identified and tabulated in Table 4.8.2. Burst length indicates the number of transfers in a burst. For precise bursts, the value indicates the total number of transfers in the burst and is constant throughout the burst. For imprecise bursts, the value indicates the best guess of the number of transfers remaining (including the current request) and may change with every request.

    The burst length can be configured, meaning that the corresponding number of read or write operations is performed in sequence. The burst-precise field indicates whether the precise length of a burst is known at the start of the burst. The burst-sequence field indicates the sequence of addresses for requests in a burst; an incrementing burst sequence increments the address sequentially.

    Table 4.8.2 OCP burst signal specification

    S.No.  Name           Width              Driver  Function
    1      MBurstLength   13 (configurable)  Master  Burst length
    2      MBurstPrecise  1                  Master  Given burst length is precise
    3      MBurstSeq      3                  Master  Address sequence of burst
    4      MDataLast      1                  Master  Last write data in burst
    5      SRespLast      1                  Slave   Last response in burst


    Each sequence type is indicated by a corresponding encoding; for example, the incrementing sequence is indicated by setting the burst-sequence signal to 000. Data last marks the last write data in a burst: this field indicates whether the current write data transfer is the last in the burst. Last response marks the last response in a burst: this field indicates whether the current response is the last in the burst.

    MBurstLength

    Basically, this field indicates the number of transfers for a row of the burst and stays constant throughout the burst. For imprecise bursts, the value indicates the best guess of the number of transfers remaining (including the current request) and may change with every request. To configure this field into the OCP, use the burstlength parameter.

    MBurstPrecise

    This field indicates whether the precise length of a burst is known at the start of the burst. When set to 1, MBurstLength indicates the precise length of the burst during the first request of the burst. If set to 0, MBurstLength for each request is a hint of the remaining burst length. To configure this field into the OCP, use the burstprecise parameter.

    MBurstSeq

    This field indicates the sequence of addresses for requests in a burst. To configure this field into the OCP, use the burstseq parameter.

    MDataLast

    Last write data in a burst. This field indicates whether the current write data transfer is the last in a burst. To configure this field into the OCP, use the datalast parameter. When this field is set to 0, more write data transfers are coming for the burst; when set to 1, the current write data transfer is the last in the burst.

    SRespLast

    Last response in a burst. This field indicates whether the current response is the last in this burst. To configure this field into the OCP, use the resplast parameter.


    When the field is set to 0, more responses are coming for this burst; when set to 1, the current response is the last in the burst.

    Thus the OCP basic block diagram, the dataflow signal diagram and the signal specifications have been presented, giving a clear view of how the Open Core Protocol bus is designed.

    4.9 Summary

    The literature survey was carried out, covering the merits and demerits of OCP, and the signal flow diagram was identified.

    The specification for the signals shown in the signal flow diagram was identified, and their operation was explained with the help of the block diagram.

    An overview of the OCP operation was discussed, covering all the signals involved in the OCP.


    CHAPTER 5

    Implementation of Open Core Protocol

    5.1 Introduction

    The design of the Open Core Protocol starts with an initial study, based on which the FSMs (finite state machines) for the various supported operations are developed, after which the VHDL for those FSMs is written. The development of the FSMs is the basic step on which the design is modelled: the FSMs ensure and explain the operation of the OCP step by step and hence act as the foundation for the design.

    The notations used while designing the OCP are listed in Table 5.1, Table 5.2 and Table 5.3, which follow; a VHDL package collecting these encodings is sketched after the tables.

    Table 5.1 Input control values

    Control  Notation used  Command
    000      IDL            Idle
    001      WR             Write
    010      RD             Read
    011      INCR_WR        Burst write
    100      INCR_RD        Burst read


    Table 5.2 OCP master command (MCmd) values

    MCmd  Notation used  Command
    000   IDL            Idle
    001   WR             Write
    010   RD             Read

    Table 5.3 Slave response (SResp) values

    SResp  Notation used  Response
    00     NUL            No response
    01     DVA            Data valid / accept
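
    The encodings of Tables 5.1 to 5.3 can be collected in a VHDL package such as the sketch below; gathering them in one package is an implementation choice made here for readability, not something the tables mandate.

    library ieee;
    use ieee.std_logic_1164.all;

    -- Command and response encodings from Tables 5.1, 5.2 and 5.3.
    package ocp_encodings is
      -- system control values (Table 5.1)
      constant C_IDL     : std_logic_vector(2 downto 0) := "000";
      constant C_WR      : std_logic_vector(2 downto 0) := "001";
      constant C_RD      : std_logic_vector(2 downto 0) := "010";
      constant C_INCR_WR : std_logic_vector(2 downto 0) := "011";
      constant C_INCR_RD : std_logic_vector(2 downto 0) := "100";
      -- OCP master command values (Table 5.2)
      constant MCMD_IDL  : std_logic_vector(2 downto 0) := "000";
      constant MCMD_WR   : std_logic_vector(2 downto 0) := "001";
      constant MCMD_RD   : std_logic_vector(2 downto 0) := "010";
      -- slave response values (Table 5.3)
      constant SRESP_NUL : std_logic_vector(1 downto 0) := "00";
      constant SRESP_DVA : std_logic_vector(1 downto 0) := "01";
    end package;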

    5.2 IMPLEMENTATION

    5.2.1 Simple Write and Read Operation

    The simple write and read operation in OCP uses the mandatory signals whose specification is given in Table 4.8.1.

    FSM for OCP master

    The finite state machine (FSM) is developed for the simple write and read operation of the OCP master. Simple write and read means that the control returns to the IDLE state after every operation. The FSM for the OCP master simple write and read is shown in Figure 5.4. In total there are four states in this FSM: IDLE, WRITE, READ and WAIT.

    Basically, an operation in the OCP is carried out in two phases: the request phase and the response phase.

    Initially the control is in the IDLE state (Control = 000), in which all the outputs MCmd, MAddr and MData are set to don't care. When the system issues a write request to the master, the control moves to the WRITE state (Control = 001). In this state, the address and the data to be written are given to the slave, and the process finishes only when SCmdAccept is asserted high. If SCmdAccept is not set, the write operation is still in progress and the control remains in the WRITE state. Once the write operation is over, the control returns to the IDLE state and then checks for the next request.

    Figure 5.4 FSM for OCP master - simple write and read

    When a read request is made, the control moves to the READ state (Control = 010) and the address is sent to the slave, which in turn asserts the SCmdAccept signal that ends the request phase. Once SCmdAccept is set and SResp is not yet Data Valid (DVA), the control moves to the WAIT state and waits for the SResp signal.

    [Figure 5.4: state diagram with IDLE, WRITE, READ and WAIT states; transitions are driven by Control (WrReq/RdReq), SCmdAccept and SResp = DVA, with MAddr, MCmd and MData driven on requests and Data_out = SData on read completion.]


    When the read operation is over, SResp is set to DVA and the data for the corresponding address is captured. The SResp signal thus ends the response phase, and the control returns to the IDLE state and checks for the next request.
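
    The following is a condensed sketch of the master FSM of Figure 5.4 for the simple write and read case, using the widths of Table 4.8.1 and the encodings of Tables 5.1 to 5.3; reset handling and the don't-care driving of MAddr and MData in IDLE are omitted to keep the example short, so it is an illustration rather than the actual implementation.

    library ieee;
    use ieee.std_logic_1164.all;

    entity ocp_master_fsm is
      port (
        clk        : in  std_logic;
        control    : in  std_logic_vector(2 downto 0);   -- request from the system
        addr       : in  std_logic_vector(12 downto 0);
        data_in    : in  std_logic_vector(7 downto 0);
        SCmdAccept : in  std_logic;
        SResp      : in  std_logic_vector(1 downto 0);
        SData      : in  std_logic_vector(7 downto 0);
        MCmd       : out std_logic_vector(2 downto 0);
        MAddr      : out std_logic_vector(12 downto 0);
        MData      : out std_logic_vector(7 downto 0);
        data_out   : out std_logic_vector(7 downto 0)
      );
    end entity;

    architecture rtl of ocp_master_fsm is
      type state_t is (ST_IDLE, ST_WRITE, ST_READ, ST_WAIT);
      signal state : state_t := ST_IDLE;
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          case state is
            when ST_IDLE =>
              MCmd <= "000";                        -- idle command
              if control = "001" then               -- write request
                MCmd <= "001"; MAddr <= addr; MData <= data_in;
                state <= ST_WRITE;
              elsif control = "010" then            -- read request
                MCmd <= "010"; MAddr <= addr;
                state <= ST_READ;
              end if;
            when ST_WRITE =>                        -- request phase ends on SCmdAccept
              if SCmdAccept = '1' then
                MCmd <= "000"; state <= ST_IDLE;
              end if;
            when ST_READ =>                         -- request accepted, wait for the response
              if SCmdAccept = '1' then
                MCmd <= "000";
                if SResp = "01" then                -- DVA already valid
                  data_out <= SData; state <= ST_IDLE;
                else
                  state <= ST_WAIT;
                end if;
              end if;
            when ST_WAIT =>                         -- response phase ends on SResp = DVA
              if SResp = "01" then
                data_out <= SData; state <= ST_IDLE;
              end if;
          end case;
        end if;
      end process;
    end architecture;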

    FSM for OCP slave

    The FSM for the OCP slave supporting the simple write and read operation is developed and is shown in Figure 5.5.

    Figure 5.5 FSM for OCP slave - simple write and read

    The slave is set to the respective state based on the MCmd issued by the master, and its outputs are SCmdAccept and SResp. Initially the control is in the IDLE state; when the master issues a write request, the control moves to the WRITE state, in which the data is written to the corresponding memory address location sent by the master. Once the write operation is finished, the SCmdAccept signal is set high and is given to the master.

    When MCmd is given as a read request, the control moves to the READ state, in which the data is read from the particular memory address location given by the master. SCmdAccept is then set high and SResp is set to DVA, which indicates that the read operation is over, and the control returns to the IDLE state.

    [Figure 5.5: state diagram with IDLE, WRITE and READ states; transitions are driven by MCmd (WrReq/RdReq), with Store_Mem = MData on writes, SData = Store_Mem on reads, and SCmdAccept/SResp driven accordingly.]
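
    A condensed sketch of the slave behaviour of Figure 5.5 is given below; a 16-location memory stands in for the 2^13-location memory of the actual design, and the WRITE and READ states are collapsed into single-cycle responses to keep the example short.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity ocp_slave_fsm is
      port (
        clk        : in  std_logic;
        MCmd       : in  std_logic_vector(2 downto 0);
        MAddr      : in  std_logic_vector(12 downto 0);
        MData      : in  std_logic_vector(7 downto 0);
        SCmdAccept : out std_logic;
        SResp      : out std_logic_vector(1 downto 0);
        SData      : out std_logic_vector(7 downto 0)
      );
    end entity;

    architecture rtl of ocp_slave_fsm is
      type mem_t is array (0 to 15) of std_logic_vector(7 downto 0);
      signal mem : mem_t := (others => (others => '0'));
    begin
      process (clk)
        variable idx : integer range 0 to 15;
      begin
        if rising_edge(clk) then
          SCmdAccept <= '0';
          SResp      <= "00";                              -- NUL by default
          idx := to_integer(unsigned(MAddr(3 downto 0)));
          if MCmd = "001" then                             -- write request
            mem(idx)   <= MData;                           -- Store_Mem = MData
            SCmdAccept <= '1';                             -- accept the transfer
          elsif MCmd = "010" then                          -- read request
            SData      <= mem(idx);                        -- SData = Store_Mem
            SCmdAccept <= '1';
            SResp      <= "01";                            -- DVA: data valid
          end if;
        end if;
      end process;
    end architecture;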

    Simulation result for simple write and read

    The FSMs developed above for the OCP master and slave supporting the simple write and read operation are described in VHDL and simulated. The designed OCP master and slave are integrated into a single design, and the simulated waveform represents the complete transaction of a simple write and read operation from master to slave and vice versa, as shown in Figure 5.6.


    Figure 5.6 Waveform for OCP master and slave - simple write and read

    The simulation of the integrated OCP master and slave clearly illustrates the operation of the FSM developed for the simple write and read. The input data is written to the given 0th and 3rd memory address locations during the write operation and is read out by giving the corresponding addresses during the read operation.

    [Waveform annotations in Figure 5.6: on a write request the inputs are applied and MCmd, MAddr and MData are asserted; master and slave enter the WRITE state, the input data is stored at the corresponding address, and both return to IDLE (where MAddr and MData are don't cares) once the write is over; the control then takes the next request into the READ state, the stored data is read back from the corresponding address, and after the read the control again returns to IDLE.]


    5.2.2 Burst Operation

    The burst signals used for the burst extension in OCP are tabulated, with their specification, in Table 4.8.2. The main advantage of this burst extension is that the address is generated automatically according to the specified burst length; this automatic address generation is a major advantage of OCP.

    FSM for OCP master

    The FSM for the OCP master that supports the burst extension is developed with respect to its functionality and is shown in Figure 5.7.

    Figure 5.7 FSM for OCP master burst operation

    Note

    In the FSM with burst extension shown in Figure 5.7, the transitions for the condition (Count = BurstLength) are the same as those for the condition (Count != BurstLength); the only difference lies in the address generation. When (Count = BurstLength) holds, the address is generated again from the starting location, and when (Count != BurstLength) holds, the address continues from the previous location.

    [Figure 5.7: state diagram with IDLE, WRITE, READ and WAIT states; transitions depend on Control (WrReq/RdReq), SCmdAccept, SResp = DVA and whether Count has reached BurstLength, with MAddr, MCmd, MBurstLength and MData driven on requests and Data_out = SData on read completion.]


    The basic operation for this burst extension remains the same as in the previously developed FSMs. Initially the control is in the IDLE state and moves to the WRITE state when a write request is given. In the burst extension, the mandatory signal is the burst length, which gives the number of transfers in a burst. A counter is implemented for this operation; it starts counting, and the address generation proceeds with it. When SCmdAccept is set high, the control checks whether the count value has reached the burst length. If not, the address generation continues; if the count has reached the burst length, the count is reset to zero and the address generation restarts from the initial location.

    Similarly, the control is in the READ state for a read request, during which the count is processed; when SCmdAccept is set high, the control moves to the WAIT state. In the WAIT state the counting is paused, and hence the address generation is also stopped. Once SResp is set to DVA, the counting resumes and the address generation continues. The data corresponding to the generated address is read from the memory and sent to the master through the SData signal.

    Thus, after every burst operation, whether write or read, the control returns to the IDLE state, where the next request is checked and then carried out accordingly.
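
    The counter and incrementing address generation described above can be sketched as follows; the start and advance control inputs are assumptions that stand in for the FSM conditions (SCmdAccept, and SResp = DVA on reads) discussed in the text.

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Incrementing-burst address generator: the counter advances once per
    -- accepted transfer and wraps back to the start address when it reaches
    -- the burst length, matching the incrementing burst sequence.
    entity burst_addr_gen is
      port (
        clk          : in  std_logic;
        start        : in  std_logic;                      -- first request of a burst
        advance      : in  std_logic;                      -- one transfer completed
        start_addr   : in  unsigned(12 downto 0);
        burst_length : in  unsigned(7 downto 0);
        maddr        : out unsigned(12 downto 0)
      );
    end entity;

    architecture rtl of burst_addr_gen is
      signal addr  : unsigned(12 downto 0) := (others => '0');
      signal count : unsigned(7 downto 0)  := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if start = '1' then
            addr  <= start_addr;                 -- burst begins at the given address
            count <= (others => '0');
          elsif advance = '1' then
            if count = burst_length - 1 then     -- last transfer of the burst
              addr  <= start_addr;               -- reset for the next burst
              count <= (others => '0');
            else
              addr  <= addr + 1;                 -- increment the address sequentially
              count <= count + 1;
            end if;
          end if;
        end if;
      end process;
      maddr <= addr;
    end architecture;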

    FSM for OCP slave

    The FSM for the OCP slave with the burst extension is developed and is shown in Figure 5.8.

    Note

    The transitions for the condition (Count = BurstLength) are the same as those for the condition (Count != BurstLength); the only difference is the address generation. When (Count = BurstLength) holds, the address is generated again from the starting location, and when (Count != BurstLength) holds, the address continues from the previous location or value.


    The initial state is IDLE, and when MCmd is set to a write request the control moves to the WRITE state. Here the burst length and the count are also declared on the slave side, because the slave may not otherwise know whether the burst extension is enabled. The generated address and the input data are given to the slave, which stores the data at the corresponding address and asserts the SCmdAccept signal. The control then checks the count and the next MCmd request and proceeds accordingly.

    Figure 5.8 FSM for OCP slave burst operation

    When MCmd carries a read request, the control moves to the READ state, where the data corresponding to the generated address is read while SCmdAccept is set high. Once the read process is over, SResp is set to DVA, and the control checks both the count and the next request.

    [Figure 5.8: state diagram with IDLE, WRITE and READ states; Store_Mem = MData on writes, SData = Store_Mem on reads, with transitions depending on MCmd, SCmdAccept, SResp and whether Count has reached MBurstLength.]

    CHAPTER 6

    SYNTHESIS RESULTS


    Block Level Diagram

    Internal Diagram


    6.1.1 Address mux

    6.1.2 Slave response mux


    6.1.3 Master write


    6.1.4 Slave schematic


    Design Statistics

    # IOs : 42

    Cell Usage :

    # BELS : 7818

    # BUF : 7

    # GND : 1

    # INV : 8

    # LUT1 : 52

    # LUT2 : 100

    # LUT2_D : 5

    # LUT2_L : 6

    # LUT3 : 2314

    # LUT3_D : 48

    # LUT3_L : 7

    # LUT4 : 2783

    # LUT4_D : 270

    # LUT4_L : 21

    # MUXCY : 71

    # MUXF5 : 1165

    # MUXF6 : 512

    # MUXF7 : 256

    # MUXF8 : 128

    # VCC : 1

    # XORCY : 63

    # FlipFlops/Latches : 2368

    # FD : 2123
    # FDE : 190

    # FDR : 21

    # FDRS : 1

    # FDS : 33


    # Clock Buffers : 1

    # BUFGP : 1

    # IO Buffers : 41

    # IBUF : 33

    # OBUF : 8

    6.4 Device utilization summary:

    Selected Device : 3s500efg320-5

    Number of Slices: 2947 out of 4656 63%

    Number of Slice Flip Flops: 2368 out of 9312 25%

    Number of 4 input LUTs: 5614 out of 9312 60%

    Number of IOs: 42

    Number of bonded IOBs: 42 out of 232 18%

    Number of GCLKs: 1 out of 24 4%

    6.5 Timing Summary:

    ---------------

    Speed Grade:

    Minimum period: 13.413ns (Maximum Frequency: 74.555MHz)

    Minimum input arrival time before clock: 8.467ns

    Maximum output required time after clock: 5.184ns

    Maximum combinational path delay: No path found


    6.6 Summary

    Based on the literature review, the working of the OCP master and slave was made clear, and the design was made according to the identified specifications.

    Initially, FSMs were developed separately for the OCP master and slave, covering the simple write and read operation and the burst operation.

    The developed FSMs of the OCP were modelled using VHDL.

    Finally, the OCP was designed such that the transactions between master and slave are carried out with proper delay and timing.

    The screenshots of the simulated waveform results are displayed and explained with respect to the design behaviour.


    CHAPTER 7

    RESULTS

    7.1 Simulation results for burst operation

    The developed FSMs are described in VHDL and the master and slave are integrated. The simulation result of the integrated design gives a clear view of the OCP burst write operation, which is shown in Figure 7.1.

    Here the burst length is given as 8, so addresses are generated for a number of memory locations equal to the burst length and the corresponding input data is supplied. The sequential address generation and the writing of the input data to the corresponding memory locations are clearly shown in the waveform. The increment of the count value, on which the address sequence is based, is also visible in the waveform.


    Figure 7.1 Waveform for OCP master and slave burst write operation

    The simulated waveform for the burst read operation is shown i

