
F2 Mod13 PCI PCIX - Welcome to Freescale - Freescale Semiconductor

Date post: 03-Feb-2022
Transcript
Page 1: F2 Mod13 PCI PCIX - Welcome to Freescale - Freescale Semiconductor

Course Introduction

Purpose
• The intent of this course is to explain the major differences between PCI and PCI-X blocks of Freescale’s PowerQUICC III Processor and to explain the programming model for the PCI and PCI-X blocks.

Objectives
• Identify the major differences between PCI and PCI-X blocks.
• Describe the major features of PCI blocks.
• Describe the major features of PCI-X blocks.
• Identify the programming models of the PCI and PCI-X blocks.

Contents
• 30 pages
• 5 questions

Learning Time
• 45 minutes

Welcome to this course on the PCI and PCI-X blocks of Freescale’s PowerQUICC III Processor. Additional information is provided at the end to explain the programming model for PCI and PCI-X blocks.

The underlying assumption of this course is that people are familiar with PCI, so we are not going to spend much time describing the protocol. Instead, we’ll talk about the major differences between PCI and PCI-X from a top level and how they are implemented in the PowerQUICC III.

Some of the major features of the PCI and PCI-X blocks will also be discussed. And finally, at the end of this course, the programming models of the PCI and PCI-X blocks will be described.

Page 2

PCI Bus Architecture

• Hierarchical arbitrated multimaster 32- or 64-bit muxed address/data bus
• Up to 66 MHz operation
– Peak bandwidth 528 MB/s
– Significantly less in practice (50%?)
• 32- and 64-bit addressing
• Transactions may be retried or deferred
– Retried transactions repeated by the master
– Deferred transactions accepted and started by target while master retries transaction

[Figure: typical PCI-based system. Labels: CPU, Mem, Host Bus, Host Bridge, PCI 0, PCI Bridge (two), PCI 1, PCI 2, PCI Device (five).]

PCI is an input/output (I/O) interconnect architecture that has been around for a very long time. This figure shows a typical PCI-based system. As you can see, you can add many PCI devices using the Host PCI bridge and PCI-to-PCI bridge. A large number of devices can be connected to a PCI Bus.

According to the PCI specification, the bus width is either 32 or 64 bits, and the PowerQUICC III part supports both. The common PCI frequencies are 33 MHz and 66 MHz, although slower clocks are sometimes used. At 64 bits and a 66 MHz PCI Bus frequency, the peak bandwidth is 528 megabytes per second. PCI supports both 32-bit and 64-bit addressing.
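The 528 MB/s figure follows from one data transfer per clock across the full bus width. A minimal sketch of that arithmetic, assuming whole-MHz clock values (the function name is illustrative, not part of any Freescale tool):

```python
def peak_bandwidth_mb_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak: one transfer of bus_width_bits on every clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# 64-bit bus at 66 MHz, the case quoted in the narration
print(peak_bandwidth_mb_per_s(64, 66))   # 528.0
# 64-bit bus at 100 MHz (a PCI-X operating point)
print(peak_bandwidth_mb_per_s(64, 100))  # 800.0
```

As the slide notes, sustained throughput in a real system is significantly lower than this peak.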

PCI also defines retry and deferred mechanisms for transactions that a target cannot complete immediately.

Retried transactions are repeated by the master, and deferred transactions are accepted by the target. For example, let’s say a master initiates a transaction addressed towards a target on the PCI Bus. Then, if the target is busy or doesn’t have enough resources to process the transaction, it will retry the initiator. Now, the master is allowed to retry this particular transaction at a later time. On the other hand, the target should be ready to accept the transaction which it deferred once. This is basically a very top level description of the PCI Bus architecture.

Page 3

PCI-X Differences from PCI

133 MHz Operation
• Latch-to-latch protocol
• More margin for prop delay
• More margin for receiver logic
• In practice, only point-to-point is achievable at 133 MHz

[Figure: timing comparison. PCI: Transmitter Asserts Signal, Prop Delay, Receiver Logic, Receiver Responds. PCI-X: Transmitter Asserts Signal, Prop Delay, Receiver Samples, Receiver Logic, Receiver Responds. Inset labels: Scan, PCI 2.2 Logic, PCI-X Logic, PCI/PCI-X I/O Pin.]

Let’s take a look at the differences between PCI/X and PCI.

The conventional PCI uses the immediate protocol. Look at the PCI flow in this diagram.

The transmitter asserts a signal, then the signal propagates across the bus, which causes the propagation delay. Then on the same cycle, the receiver logic decodes the transaction to find out whether it should respond. On the following cycle, the receiver responds if it has to. Notice that on the first clock cycle for PCI, time is needed for the transmitter to assert the signal, then for propagation delay, and then finally the receiver logic.

PCI-X uses the latch-to-latch or register-to-register protocol. As you can see from the PCI-X flow of the diagram, only transmitter signal assertion and propagation delay occur in the first cycle.

The receiver logic which decodes the transaction is not used in the first cycle. In fact, at the end of the first cycle for PCI-X, the transaction is latched and the PCI-X receiver logic decodes the latched transaction on the second clock cycle.

Finally, on the third clock cycle, the receiver responds if required. From this simple diagram, you can see that PCI requires two cycles and PCI-X requires three clock cycles. The key point is that when it comes to PCI-X, the amount of work it has to do on a clock cycle basis is less because it uses latch-to-latch protocol. As a consequence, the time period for PCI-X is also less, and that’s why the PCI-X can operate at a much higher frequency than the conventional PCI.
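The trade-off described above (more cycles, but less work per cycle) can be written down as a small model. This is only a sketch of the narration: the step names are taken from the diagram, and the schedule lists are an illustrative assumption rather than a timing specification.

```python
# Work packed into each clock cycle, following the narration of the diagram.
PCI_SCHEDULE = [
    ["transmitter asserts signal", "propagation delay", "receiver logic"],
    ["receiver responds"],
]
PCIX_SCHEDULE = [
    ["transmitter asserts signal", "propagation delay"],
    ["receiver logic"],      # decodes the value latched at the end of cycle 1
    ["receiver responds"],
]


def cycles_until_response(schedule):
    """Number of clock cycles from signal assertion to the response."""
    return len(schedule)


def max_steps_per_cycle(schedule):
    """The busiest cycle bounds how short the clock period can be."""
    return max(len(steps) for steps in schedule)


def clock_period_ns(freq_mhz):
    """Time available in one clock cycle."""
    return 1000.0 / freq_mhz


print(cycles_until_response(PCI_SCHEDULE), max_steps_per_cycle(PCI_SCHEDULE))
print(cycles_until_response(PCIX_SCHEDULE), max_steps_per_cycle(PCIX_SCHEDULE))
print(round(clock_period_ns(66), 2), round(clock_period_ns(133), 2))
```

PCI-X spends one more cycle, but its busiest cycle holds two steps instead of three, which is what lets the period shrink from about 15 ns at 66 MHz to about 7.5 ns at 133 MHz.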

Note that although PCI-X can reach a higher frequency, at the maximum frequency of 133 MHz it becomes a point-to-point protocol: you can’t have more than two loads (one host and one agent). As you keep increasing the number of loads in a PCI-X system, the operating frequency drops.

Page 4

Question

Is the following statement true or false?

“PCI requires two cycles, and PCI-X requires three clock cycles.”

• True
• False

Consider this question regarding the differences between PCI and PCI-X.

Correct. PCI requires two cycles, and PCI-X requires three clock cycles.

Page 5

PCI-X Differences from PCI

• Split Transactions
– Separate transaction for request and response
– Request and response are separately arbitrated
– Latency and bus utilization are improved

[Figure: split read transaction. Labels: Master (Initiator), Target (Completer), Completion Initiator, REQ, GNT, COMPLETE, Other Bus Activity, Data Read, READ COMPLETE.]

Conventional PCI protocol supports delayed transactions. With a delayed transaction, the device requesting data must poll the target to determine when the request has been completed and whether its data is available. This polling time is pure overhead: during it, the bus is held up by the requester, which does nothing but wait for the data from the target, and other masters that could have used the bus can’t use it.

PCI-X implements split transactions to improve bus utilization. Let’s look at the diagram. The master arbitrates for the PCI-X Bus and makes a read request. The target might not be able to process the request immediately. In that case, the completer logic of the target sends only an acknowledgment to the initiator, and the transaction completes without the initiator getting the data. After this, the target can continue doing other things. When the target has enough resources available to process the request, it does so.

The target then arbitrates for the bus, provides the data that the master requested, and terminates the transaction. With the help of these split transactions, the PCI-X Bus can be used efficiently.

In sum, the requester sends a request to the target. The target device informs the requester that it has accepted the request. The requester is free to process other information until the target device initiates a new transaction and sends the data to the requester. Thus, split transactions enable more efficient use of the bus.
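To make the utilization argument concrete, here is a deliberately simplified model that counts how many times the bus is occupied for one read. The latency and polling interval are hypothetical numbers chosen for illustration; neither function models real arbitration or timing.

```python
def delayed_read_bus_transactions(target_latency, poll_interval):
    """Conventional PCI delayed read: the master repeats the read until
    the target has the data. Every attempt occupies the bus."""
    attempts = 0
    elapsed = 0
    while True:
        attempts += 1                  # master arbitrates and issues the read
        if elapsed >= target_latency:  # data is ready: this attempt completes
            return attempts
        elapsed += poll_interval       # target not ready: try again later


def split_read_bus_transactions():
    """PCI-X split read: one request from the requester, then one
    completion that the target itself initiates."""
    return 2


# Hypothetical case: target needs 40 clocks, master polls every 10 clocks.
print(delayed_read_bus_transactions(target_latency=40, poll_interval=10))
print(split_read_bus_transactions())
```

In this sketch the delayed read occupies the bus five times while the split read occupies it twice, and the gap grows with the target's latency.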

Page 6

PCI-X Differences from PCI

• Insertion of wait states restricted
– Initiators cannot insert wait states.
– Targets cannot insert wait states after the first data beat.
– Both initiators and targets may end a burst only on naturally aligned 128-byte boundaries.
– This improves bus utilization.
• Additional state carried with transactions
– Each transaction in a sequence carries a byte count.
– Each transaction carries the identity of the initiator.
– This improves buffer management.
• Maximum transaction size limited to 4K bytes
– This improves worst case latency.

Let’s take a look at another major difference between PCI and PCI-X: wait states.

Conventional PCI devices often add extra clock cycles, or wait states, into their transactions. The wait states are added to “stall” the bus if the PCI device is not ready to proceed with the transaction. This can slow bus throughput dramatically.

PCI-X eliminates the use of wait states, except for initial target latency. In other words, initiators are not allowed to add wait states, and the target can add wait states only before the first data beat. When a PCI-X device does not have data to transfer, it will remove itself from the bus so that another device can use the bus bandwidth. This provides more efficient use of bus and memory resources.

With PCI-X, adapters and bridges (host-to-PCI-X and PCI-X-to-PCI-X) are permitted to disconnect transactions only on naturally aligned 128-byte boundaries. This encourages longer bursts and enables more efficient use of cache-line based resources such as the processor bus and main memory.

The following enhancements are included within the attribute phase.

With conventional PCI protocols, the bridge (host-to-PCI or PCI-to-PCI) fetches a default number of cache lines (typically one or two) for every data request. Because the bridge has no way of knowing how much data will be requested, it uses the default number of cache lines. With PCI-X, the bridge knows exactly how much data to fetch because the byte count is included in the attribute field.

Each PCI-X transaction in a sequence identifies the total number of bytes remaining to be read or written in its associated sequence. This enables more efficient buffer management schemes in the bridge as well as more efficient utilization of the bus and other system resources.

The sequence number uniquely identifies transactions that are part of the same sequence. It identifies the initiator and which bus segment the initiator resides on, as well as other information. The sequence number is used to increase efficiency in buffer-management algorithms.
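The two rules above (disconnect only on naturally aligned 128-byte boundaries, and carry the remaining byte count with each transaction) can be sketched as follows. This is an illustrative model, not the PCI-X attribute encoding: it assumes the worst case in which the burst is disconnected at every boundary, and the tuple layout is made up for the example.

```python
ADB = 128             # spacing of allowable disconnect boundaries, in bytes
MAX_SEQUENCE = 4096   # maximum transaction size from the slide (4K bytes)


def split_at_adbs(start_addr, byte_count):
    """Return (address, length, bytes_remaining) for each burst of a sequence.

    Each burst ends on a naturally aligned 128-byte boundary or at the end
    of the sequence. bytes_remaining is the count still outstanding when
    the burst starts, which is what lets a bridge size its buffers.
    """
    if not 0 < byte_count <= MAX_SEQUENCE:
        raise ValueError("byte count must be between 1 and 4096")
    bursts = []
    addr, remaining = start_addr, byte_count
    while remaining > 0:
        next_adb = (addr // ADB + 1) * ADB       # next aligned boundary
        length = min(remaining, next_adb - addr)
        bursts.append((addr, length, remaining))
        addr += length
        remaining -= length
    return bursts


for addr, length, remaining in split_at_adbs(0x1060, 300):
    print(hex(addr), length, remaining)
```

For a 300-byte sequence starting at 0x1060, the sketch yields a 32-byte burst up to 0x1080, two full 128-byte bursts, and a final 12-byte burst.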

Page 7

PCI & PCI-X Performance

PCI Width | PCI Frequency | PCI Bandwidth | PCI Slots | PCI-X Slots
32-bit    | 33 MHz        | 133 MB/s      |           | N/A
64-bit    | 66 MHz        | 528 MB/s      |           |
64-bit    | 100 MHz       | 800 MB/s      | N/A       |
64-bit    | 133 MHz       | 1066 MB/s     | N/A       |

Now let’s look at the performance differences between PCI and PCI-X.

This table lists the common PCI and PCI-X frequencies and how many loads can be connected at those frequencies. You can see from the table that at 33 MHz, PCI allows one host and four devices for a total of five loads, but PCI-X is not defined at this frequency.

The bandwidths mentioned in the table are theoretical bandwidths, and in an actual system these figures might differ based on system architecture, such as loading.

Also, you can see from this table that PCI is not truly defined above 66 MHz; PCI-X supports three loads at 100 MHz and two loads at 133 MHz. Performance will be degraded as more loads are added to the system. Note that people run PCI at many frequencies other than those listed here.

Page 8

Question

Place the operating frequency and number of slots shown in their correct positions in the table.

[Interactive table: the PCI & PCI-X Performance table from the previous page, with the 33 MHz frequency cell and the slot cells to be filled in using items A to D.]

Here is a question to check your understanding of the material presented so far.

Correct! A PCI Frequency of 33 MHz allows one host and four devices for a total of 5 syncs, and PCI-X supports three and two syncs at 100 MHz and 133 MHz frequencies.

Page 9

PCI & PCI-X Ordering Rules

• Ordering rules accomplish two purposes
– Guarantee the results of one master’s write transactions are observable by other masters in the proper order even when writes are posted at bridges
– Prevent deadlock when bridges attempt to empty their posting buffers
• Ordering Rules
– Posted writes must occur in the same order on the second bus
– Writes flowing through a bridge in one direction have no ordering with respect to writes flowing in the other
– A read appearing at a bridge pushes writes ahead through the bridge and also pushes all posted writes in the other direction
– Bridge may not make a posted write contingent on prior completion of a master transaction

Let’s take a look at the ordering rules that should be implemented in PCI-X block design to avoid a deadlock situation.

Ordering rules are important because they help prevent PCI-X deadlock. Another key point of the ordering rules is that writes must be observed in order by all devices.

A write flowing in one direction does not depend on a write flowing in the other direction. For instance, the Master may write to another device, and it may also allow other Masters to write into its memory. These two writes in two different directions do not depend on each other. This is also very important to remember.

Another key point is that read pushes write ahead in the bridge. For example, suppose a Master makes four write transactions. It could be four consecutive write transactions followed by a read transaction. Now if the read transaction is successful, this implies that all the other four writes are also successful, because read will push out all four of the posted writes.

Note that from the Master’s standpoint, the writes are always posted, so in other words, the write gets buffered up in the internal buffer space. The read will make sure that it pushes out all the writes from the posted buffer, and then it places the transaction on the PCI Bus.
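The "read pushes writes" rule can be illustrated with a toy bridge model. The `Bridge` class below is a hypothetical sketch covering one direction only; it is not how the PowerQUICC III hardware is organized.

```python
from collections import deque


class Bridge:
    """Toy bridge with a posted-write buffer for one direction of traffic."""

    def __init__(self, target_memory):
        self.target = target_memory   # dict standing in for the far-side bus
        self.posted = deque()         # posted writes, kept in arrival order

    def write(self, addr, value):
        """Posted write: done from the master's view, but only buffered."""
        self.posted.append((addr, value))

    def flush(self):
        """Drain the buffer in FIFO order so write order is preserved."""
        while self.posted:
            addr, value = self.posted.popleft()
            self.target[addr] = value

    def read(self, addr):
        """A read pushes all earlier posted writes ahead of it."""
        self.flush()
        return self.target.get(addr)


memory = {}
bridge = Bridge(memory)
bridge.write(0x10, 1)
bridge.write(0x10, 2)
bridge.write(0x14, 3)
print(memory)              # still empty: the writes are only posted
print(bridge.read(0x10))   # the read drains the buffer first
print(memory)
```

Once the read returns, the master knows every write it issued before the read has reached the target, in order.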

Page 10

RapidIO Transaction Mapping

RapidIO Interoperability Specification discusses PCI/X interoperability with RapidIO:
• Recommends mappings for PCI/X to RapidIO transactions
– Note that NWRITE-R transactions are not allowed
• Recommends RapidIO priority for PCI/X transaction types
– Note that all RapidIO transactions to PCI/X must be priority 0

This page, about PCI/X to RapidIO transaction mapping, is for your information only; specifically, these items are not supported by the PowerQUICC III PCI-X to RapidIO mapping. These two are deviations as far as the mapping between PCI-X and RapidIO is concerned.

Page 11

MPC8540/60

General PCI/X features:
• 32-bit and 64-bit interfaces
• Host and Agent mode support
• 64-bit dual address cycle (DAC) support
• On-chip PCI/X arbiter supporting 5 REQ/GNT pairs
• ATMU supports access to entire PCI/X memory and I/O address space
• Inbound ATMU supports snoop, stash attributes
• Inbound and outbound streaming to memory
• Inbound and outbound write posting

So far, we have seen the difference between the PCI and PCI-X and also looked into some important implementation guidelines. Now, let’s discuss the specifics of PCI and PCI-X.

As you can see, both PCI and PCI-X can be configured as 32-bit or 64-bit I/O interfaces. They can be configured either as a Host or as an Agent. There is also an on-chip arbiter for the PCI and the PCI-X block. This arbiter supports five REQ/GNT pairs. The 64-bit dual address cycle is also supported by the PowerQUICC III.

Many configuration options are available for PCI and PCI-X. You can enable or disable the arbiter, operate either at 32-bit or 64-bit mode, configure either as the Host or as an Agent, and if it is in the Agent mode, you can decide whether it should be in Agent configuration lock mode. Finally, you can select whether you want to use PCI or PCI-X.

One important thing to note is that you cannot use both PCI and PCI-X simultaneously; you have to choose one. To facilitate the programming model, the address translation and mapping unit (ATMU) can provide many address mappings for the PCI/PCI-X block.

The ATMU for this block is like the ATMUs for the other blocks. Inbound ATMU transactions can be snooped by the processor, or they can be stashed into the L2 cache. The block supports streaming and write posting.

Page 12

MPC8540/60 PCI

Specific Features:
• PCI rev 2.2 compatible
• 3:1 to 16:1 clock ratios supported
• Memory prefetching of PCI read transactions
• PCI 3.3V compatible

Now let’s look at the PCI-specific features of PowerQUICC III. The PCI block of PowerQUICC III is compatible with PCI specification 2.2.

PCI supports all the clock ratios starting from 3:1 to 16:1. This ratio is between the platform frequency and PCI Bus frequency. One important thing to note here is that the 2:1 ratio between the platform and PCI Bus frequency is not supported for PCI.

Memory prefetching for PCI read transactions is supported. In order to enable memory prefetching, the ATMU has to be programmed with this feature. Prefetching happens on a cache line boundary, and the PCI block is 3.3V compliant.

Page 13

Question

Identify the clock ratio that will complete the sentence.

PCI supports all the clock ratios starting from 3:1 to 16:1. This ratio is between the platform frequency and PCI Bus frequency. One important thing to note here is that the ___ ratio between the platform and PCI Bus frequency is not supported for PCI.

a. 1:2

b. 2:1

c. 3:1

d. 16:1

Let’s see if you can remember the clock ratios.

Correct. PCI supports all the clock ratios from 3:1 to 16:1. This ratio is between the platform frequency and PCI Bus frequency. One important thing to note here is that the 2:1 ratio between the platform and PCI Bus frequency is not supported for PCI.

Page 14

MPC8540/60 PCI-X

MPC8540/60 PCI specific features | MPC8540/60 PCI-X specific features
PCI rev 2.2 compatible | PCI-X rev 1.0a compatible
3:1 to 16:1 clock ratios supported | 2:1 to 16:1 clock ratios supported
-- | Support for 4 split transactions; complete ADB support
-- | Gathers up to 4 cache line transactions into an ADB
-- | All PCI-X ordering rules enforced
Memory prefetching of PCI read transactions | Implementation of global outbound relaxed ordering (not per transaction)
PCI 3.3V compatible | PCI-X 3.3V compatible

Now let’s look at the PCI-X specific features for the PowerQUICC III part.

It is compatible with PCI-X specification rev1.0a, and as far as the clocking is concerned, it supports integer clock ratios from 2:1 to 16:1. Again this is the ratio between platform frequency and the PCI frequency. The PCI input clock and the system input clock are one and the same.
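The supported ratio ranges for the two modes can be captured in a small check. This is a sketch under the assumption that the ratio is read as platform frequency to PCI bus frequency, as the narration states; the function and table names are illustrative.

```python
# Integer platform:PCI clock ratios named in the course material.
SUPPORTED_RATIOS = {
    "PCI": range(3, 17),     # 3:1 to 16:1 (2:1 is not supported for PCI)
    "PCI-X": range(2, 17),   # 2:1 to 16:1
}


def platform_clock_mhz(mode, pci_clock_mhz, ratio):
    """Return the platform frequency for a given PCI clock and ratio,
    rejecting ratios outside the supported range for the mode."""
    if ratio not in SUPPORTED_RATIOS[mode]:
        raise ValueError(f"{ratio}:1 is not supported in {mode} mode")
    return pci_clock_mhz * ratio


print(platform_clock_mhz("PCI-X", 133, 2))   # 266
print(platform_clock_mhz("PCI", 66, 4))      # 264
```

Asking for a 2:1 ratio in PCI mode raises `ValueError`, which mirrors the restriction called out for the PCI block.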

The PCI-X block supports four split transactions. It can gather four cache line transactions into an allowable disconnect boundary (ADB), which is a naturally aligned 128-byte boundary.

Unlike PCI, PCI-X has relaxed ordering. When you select relaxed ordering, you lose ordering guarantees such as writes staying in order and reads pushing writes. Also, relaxed ordering for the PCI-X block is a global feature, not a per-transaction one. The PCI-X block is 3.3V compatible.

Page 15

PCI/X Internal Architecture

• PCI and PCI-X have separate logic blocks
• OCN Gasket contains most buffers
• PCI and PCI-X blocks exchange data with gasket in 32-byte chunks

[Figure: internal architecture. Labels: PCI Block (Arbiter, Registers), PCI-X Block (Arbiter, Registers), PCI-X Interface, OCN Gasket, OCN Interface.]

Now let’s look at the PCI/X internal architecture.

This diagram shows the internal architecture implementation of the PCI/X blocks.

The PCI block and the PCI-X block have separate logic.

The PCI block has its own arbiter, register set, and so on. The PCI-X block also has its own resources. Both share the same set of PCI/X I/O pins.

The PCI and PCI-X blocks talk to the same OCeaN (OCN) gasket.

The PCI-X block exchanges data with the OCN gasket in 32-byte chunks, regardless of whether the PCI-X interface is 32- or 64-bit.

Page 16

PCI/X to OCN Gasket

• OCN to PCI/X request logic
– Supports up to 13 transactions:
  • 7 writes or read-responses
  • 6 read-requests
– Splits up to 256 byte OCeaN transactions into cache-line size for transmission to PCI/X Logic
– Enforces request ordering
• PCI/X to OCN logic
– Supports up to 7 transactions
– Supports priority reordering of requests or responses
– Supports pipelining of PCI/X requests
– Enforces request ordering
– Splits byte enable transactions into legal OCeaN transactions

Here are more implementation details on the PCI/X to OCeaN gasket. The OCeaN to PCI/X request logic supports 13 transactions. The PCI/X block and the OCeaN gasket exchange 32 bytes at a time, whereas OCeaN transactions are up to 256 bytes long. Therefore, OCeaN transactions larger than 32 bytes are broken into 32-byte cache lines and then delivered to the PCI/X block.
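The splitting step can be sketched in a few lines. This assumes the transaction starts on a 32-byte cache-line boundary, which the course does not state explicitly; the function name is hypothetical.

```python
CACHE_LINE = 32    # bytes exchanged between the gasket and the PCI/X block
MAX_OCEAN = 256    # largest OCeaN transaction, in bytes


def split_ocean_transaction(addr, length):
    """Break one OCeaN transaction into (address, size) cache-line pieces.

    ASSUMPTION: addr is 32-byte aligned.
    """
    if addr % CACHE_LINE:
        raise ValueError("this sketch assumes a cache-line aligned start")
    if not 0 < length <= MAX_OCEAN:
        raise ValueError("OCeaN transactions are at most 256 bytes")
    return [
        (addr + offset, min(CACHE_LINE, length - offset))
        for offset in range(0, length, CACHE_LINE)
    ]


print(len(split_ocean_transaction(0x2000, 256)))   # 8 pieces
print(split_ocean_transaction(0x2000, 80))
```

A full 256-byte transaction becomes eight 32-byte pieces; an 80-byte one becomes two full pieces and a 16-byte remainder.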

The OCeaN to PCI-X request is basically for the outbound transactions. For instance, on the local side, if any of the Masters are trying to make a PCI-X transaction, it will go via the OCeaN and then the PCI-X logic. In the other direction, when the inbound transaction comes in, the PCI/X to OCeaN logic is used. This logic block supports 7 transactions. Although reordering of requests or responses is possible, the default priority for inbound transactions is 0. All the requests that come in can be pipelined.

Page 17

PCI/X to OCN Gasket

4 separate data buffer pools:
• One for writes from OCN to PCI/X
– 7 transactions, 256 bytes each
• One for read-responses from OCN to PCI/X
– 8 transactions, 32 bytes each
• One for writes from PCI/X to OCN
– 8 transactions, 32 bytes each
• One for read responses from PCI/X to OCN
– 6 transactions, 256 bytes each

Four data buffer pools are implemented using two SRAMs. These buffer pools hold the data temporarily.
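Adding up the pool sizes from the slide gives a feel for the total buffering in the gasket. The dictionary below simply restates the slide; how the four pools are divided between the two SRAMs is not stated in the course, so it is not modeled here.

```python
# (number of transactions, bytes per transaction) for each pool on the slide
BUFFER_POOLS = {
    "writes, OCN to PCI/X": (7, 256),
    "read responses, OCN to PCI/X": (8, 32),
    "writes, PCI/X to OCN": (8, 32),
    "read responses, PCI/X to OCN": (6, 256),
}


def pool_bytes(entries, entry_size):
    """Capacity of one pool in bytes."""
    return entries * entry_size


total = 0
for name, (entries, entry_size) in BUFFER_POOLS.items():
    size = pool_bytes(entries, entry_size)
    total += size
    print(f"{name}: {size} bytes")
print(f"total: {total} bytes")   # 3840 bytes
```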

Page 18

PCI/X Interface Memory Maps

4 Inbound Windows
• 1 linked to PCSRBAR
• 1 linked to 32-bit BAR
• 2 linked to 32/64-bit BAR

The PCI Base Address registers point to the beginning of each of the address ranges to which the device responds by asserting “PCI_DEVSEL”. The BAR at offset 0x10 is a fixed 1 MB window that is automatically translated to the local configuration, control, and status registers address space.

The other base address registers are aliases, with differing format, of the PCI inbound ATMU windows. The 32-bit Base Address register at offset 0x14 corresponds to inbound ATMU window 1; the 64-bit Base Address registers at offsets 0x18 and 0x20 correspond to inbound ATMU windows 2 and 3. If one of these registers is written, the corresponding ATMU register is also updated; if a PCI inbound ATMU register is written, the corresponding BAR is also updated.
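The BAR-to-window relationship above can be written as a lookup table. The offsets and window numbers come from the text; the byte widths (4 bytes for a 32-bit BAR, 8 bytes for a 64-bit BAR) follow the standard PCI configuration header, and the table and function names are illustrative.

```python
# (first offset, size in bytes, what the BAR is linked to)
BAR_LAYOUT = [
    (0x10, 4, "PCSRBAR: fixed 1 MB window onto the local CCSR space"),
    (0x14, 4, "inbound ATMU window 1 (32-bit BAR)"),
    (0x18, 8, "inbound ATMU window 2 (64-bit BAR)"),
    (0x20, 8, "inbound ATMU window 3 (64-bit BAR)"),
]


def bar_for_offset(config_offset):
    """Return the description of the BAR that covers a config-space offset,
    or None if the offset is outside the BAR area."""
    for start, size, description in BAR_LAYOUT:
        if start <= config_offset < start + size:
            return description
    return None


print(bar_for_offset(0x10))
print(bar_for_offset(0x1C))   # upper half of the 64-bit BAR at 0x18
```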

[Figure: Memory Mapped Registers at [CCSRBAR] + 0x0_8000, from the top down: Config Access, IACK Access, Outbound Windows, Inbound Windows, Error Management.]

5 Outbound Windows
• 1 default window - used when the Outbound transaction misses the other four windows
• 4 programmable windows

So far, we have talked about the major differences between PCI and PCI-X and about the PowerQUICC III-specific PCI and PCI-X features. We have also looked at some of the implementation details. In the remaining part of this course, we will talk about the programming model of the PCI/X block. Before looking into register programming, we will describe some of the important data structures and memory-map parameters.

Beginning at the top, you can see the major data structures that need to be programmed in a PowerQUICC III-based PCI system. Located at the top of the memory map are the configuration address register and configuration data register, followed by the Interrupt Acknowledgment registers, the Outbound and Inbound registers, and finally, the Error Management registers.

CCSRBAR is a pointer to the configuration control and status registers (CCSR), which are memory mapped.

There are five Outbound windows, which enable data transfer from the local side to the PCI/X Bus side.
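The slide describes four programmable outbound windows plus a default window that catches anything the others miss. A minimal sketch of that selection logic follows; the `Window` class, the base addresses, and the sizes are hypothetical, and real ATMU windows also carry translation and attribute fields that are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Window:
    name: str
    base: int   # local base address (hypothetical value in the example)
    size: int   # window size in bytes

    def hits(self, addr):
        return self.base <= addr < self.base + self.size


def select_outbound_window(addr, programmable, default_name="default window"):
    """Pick the first programmable window that contains addr; otherwise
    fall back to the default window."""
    if len(programmable) > 4:
        raise ValueError("only four programmable outbound windows exist")
    for window in programmable:
        if window.hits(addr):
            return window.name
    return default_name


windows = [
    Window("outbound window 1", base=0x8000_0000, size=0x1000_0000),
    Window("outbound window 2", base=0xA000_0000, size=0x0100_0000),
]
print(select_outbound_window(0x8000_1000, windows))
print(select_outbound_window(0xC000_0000, windows))   # misses both
```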

The base address of the Inbound windows can be defined either from the local side or from the PCI side. A link exists between the local Inbound Base Address register, which is part of the internal CCSRBAR map, and the Configuration Space Base Address register, which is part of the standard PCI configuration space.

The PCI/X block also has several error management registers.

Page 19

PCI/X Interface Memory Maps

• Configuration Registers
– Accessible from PCI/X Bus
  • Configuration cycles
  • Memory mapped (PCSRBAR)
– Accessible internally
  • Configuration using CFG_ADDR/CFG_DATA

[Figure: configuration access paths. Labels: PCI Configuration Registers; memory map at [CCSRBAR] + 0x0_8000 (Config Access, IACK Access, Outbound Windows, Inbound Windows, Error Management); PCI Masters; [PCSRBAR]; Local Address Space Transactions.]

Let’s look at the Configuration Space Data Structure and how it can be accessed from the local side and from the PCI-X side.

From the PCI side, an external Master can use two methods to access the PCI Configuration Space registers. This configuration data structure is represented by the blue block.

One way for the PCI Master to access this Configuration Space is by generating standard configuration cycles. Another way for the PCI Master to access this Configuration register is via the PCSRBAR register, which is actually a memory mapped register.

Page 20

Question

Label the Memory Mapped registers by placing the items in their correct locations.

A. Config Access
B. Inbound Windows
C. Outbound Windows
D. IACK Access
E. Error Management

Solution, from the top down: A, D, C, B, E

Here is a question to check your understanding of the material presented so far.

Correct! From the top down, the registers are Config Access, IACK Access, Outbound Windows, Inbound Windows, and then Error Management.

Page 21

Initialization

Initialization: Host, Agent and Configuration Space

• When PCI/X interface configured as host:
– Bus Master enabled on interface following reset
• When PCI/X interface configured as agent:
– Bus Master disabled on interface following reset
– Agent Config Lock = 0: Responds to external configuration cycles
– Agent Config Lock = 1: Retries external configuration cycles
• Configuration Space
– Internal devices access configuration space via
  • Memory mapped CFG_ADDR/DATA registers
  • Can generate both Type 0 and Type 1 configuration cycles
  • Registers accessed as little-endian
– All other memory-mapped registers big-endian
– External Devices access configuration registers via
  • Configuration cycles
  • Memory mapped through PCSRBAR

The PCI or PCI/X block can be configured as either a Host or as an Agent.

From the local side, the PCI configuration register space can be accessed in only one way: by using the CFG_ADDR register and CFG_DATA register. We will look into the programming details as to how to access the Configuration Space later in this course.

This page describes the various PCI-X modes and the initialization status of the block after the system boots up. When the PCI-X Interface is configured as a Host, it has the Bus Mastering capability by default. Then, the PCI-X block can also be configured to act as an Agent. In this case, the Bus Mastering is not enabled by default; rather, it is disabled after the reset.

Click “PCI/X Host” and “PCI/X Agent” for more information on when the PCI-X block is configured as each.

In other words, if the Agent needs the Bus Mastering capability, software has to enable this feature. In Agent mode, there are two different types of Agents: one with Agent Config Lock enabled and one with Agent Config Lock disabled. When Agent Config Lock is disabled, the Agent responds to external configuration cycles by default.

That is, right after reset, the Agent is able to respond to configuration cycles generated by an external Master. If it is operating in Agent Config Lock mode immediately after reset, the Agent will retry all external configuration cycles initiated by other PCI Masters.

The Configuration Space Locked mode for the Agent is sometimes useful because during the boot process, the Agent can do some basic boot setup from the EEPROM. While the configuration space is locked, the Agent will not allow other external PCI Masters to change its Configuration Space registers.

In Agent mode, the boot sequence can run as follows. The Agent reads the boot data from an EEPROM and remains in the Config Lock mode so the external Master cannot access its registers. When the boot procedure is finished, it can disable the Config Lock. At this point, the external Master can access the Configuration Space registers.

The configuration space registers can be accessed from the local side or from the PCI side. If configuration registers need to be accessed from the local side, then the CFG_ADDR register needs to be programmed, followed by a read or write from the CFG_DATA register. On the other hand, if the configuration registers need to be accessed from the PCI side by an external PCI master, then PCSRBAR register needs to be used.



PCI/X Host

Config as Host
– Controlling the system
– Bus master mode is enabled following reset

[This is a reference page for the “PCI/X Host” button on E021.]


PCI/X Agent

Config as Agent
– Bus master mode is disabled following reset; it needs to be configured by someone else
– cfg lock = 0
– cfg lock = 1: will retry config cycles, so a boot sequence can configure the part during boot and no one else can try to run config cycles
– Cfg space: the PCI/X config block; 256 that sits out in memory floating, defined by PCI/X; we can choose to
– These registers are memory mapped; config addr @0, config data @4
– The PowerQUICC III PCI/X block supports both Type 0 and Type 1
– Spec'd as LE: all memory-mapped registers are BE, but the 256 are LE
– Internal device: CONFIG_ADDR/DATA
– External device: config cycle, or PCSRBAR [inbound ATMU], which links to the on-board PowerQUICC III memory-mapped registers

[This is a reference page for the “PCI/X Agent” button on E021.]


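Address Mapping Example

[Figure: Address Mapping Example. Shows the Local Map, the PCI Map, and the RapidIO Map. LAW entries provide Local Port Mapping (for example, to DRAM); IBW entries perform Inbound Address Translation and OBW entries perform Outbound Address Translation. Address ranges labeled in the figure: 0 to 2^32 – 1, 0 to 2^34 – 1, and 0 to 2^64 – 1.]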

Let’s take a look at an address mapping example.

Mapping is possible, for instance, between the PCI and the RapidIO with the help of the Inbound Window (IBW) in the PCI Map.

The Inbound Window can help put the data in the Local Map.

The Inbound Window can also be set up so that it will be directed to the RapidIO.

A variety of mappings are possible using the Inbound Windows, the Local Address Windows (LAW), and the Outbound Windows.


Creating an Outbound Window

• 8 KB Outbound Window starting at 0xA000_0000 in the Local Memory

• Any access within 0xA000_0000 and 0xA000_2000 gets mapped to 0x8000_0000-0x8000_2000

[Figure: the OBW maps 0xA000_0000-0xA000_2000 in the local memory map to 0x8000_0000-0x8000_2000 in the PCI memory map.]


Let’s take a look at the actual programming data.

First, how is an Outbound Window (OBW) created?

The purpose of creating an Outbound Window is so that from the internal side you can make an access that reaches out to the PCI side. In this case, the transaction is going out with respect to the local side; thus the name Outbound Window.

Here are some numbers; on the next page you will see how these numbers are programmed into the registers to enable Outbound Mapping.

The key concept here is on the Outbound Window side.

The Outbound Window’s starting address is 0xA000_0000, and it is an 8 KB window. Any access that falls within this Outbound Window gets mapped into the PCI Space.


This PCI space starts at 0x8000_0000. For example, suppose that from the local side an e500 CPU accesses 0xA000_0000. If this is a write access, the write shows up on the PCI or PCI-X bus, and the address of that write transaction is 0x8000_0000. Notice that a translation happens from local memory to PCI memory; that is, the local transaction at address 0xA000_0000 ends up as a PCI transaction with address 0x8000_0000. This occurs because of the programming of the Outbound ATMU.
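The translation described above can be modeled in a few lines of C. This is only a software sketch of the arithmetic, not how the hardware is programmed; the `outbound_window` type and `obw_translate` function are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one outbound window. The names are
   illustrative only and are not PowerQUICC III register names. */
typedef struct {
    uint32_t local_base;  /* window start in the local memory map */
    uint32_t size;        /* window size in bytes */
    uint32_t pci_base;    /* translated start address on the PCI side */
} outbound_window;

/* Returns true and stores the PCI address if local_addr hits the window;
   returns false (no translation) otherwise. */
static inline bool obw_translate(const outbound_window *w,
                                 uint32_t local_addr, uint32_t *pci_addr)
{
    /* Unsigned subtraction wraps for addresses below the base, which
       makes the offset larger than any valid window size. */
    uint32_t offset = local_addr - w->local_base;
    if (offset >= w->size) {
        return false;
    }
    *pci_addr = w->pci_base + offset;
    return true;
}
```

With the numbers from this page (local base 0xA000_0000, size 8 KB, PCI base 0x8000_0000), an access to 0xA000_0000 translates to 0x8000_0000, and an access just past the 8 KB window is not translated.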



Creating an Outbound Window

• Five outbound windows in total; Window 0 is the default window and is enabled upon reset
– CCSRBAR->POTAR1 = 0x00080000;
– CCSRBAR->POTEAR1 = 0x00000000;
– CCSRBAR->POWBAR1 = 0x000A0000;
– CCSRBAR->POWAR1 = 0x8004400C;

• EN=1, S_D=0, RTT=0b0100, WTT=0b0100, OWS=0b001100

Let’s take a look at the register programming details.

There are four important registers that you must program. The first register is POTAR1, which contains the top 20 bits of the PCI starting address, 0x8000_0000. From 0x8000_0000, you take the top 20 bits and put them in this register, as shown above.

The second register is POTEAR1. This register is useful only when you are dealing with a 64-bit interface. With a 32-bit interface you do not need it; this example assumes a 32-bit PCI or PCI-X interface.

The third register is POWBAR1, which should contain the starting address of the Outbound Window. You learned previously that this address is 0xA000_0000. You take the top 20 bits of this address and put them into this register.
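As a small worked check of the "top 20 bits" step, the following C sketch assumes the register value is simply the 32-bit address shifted right by 12, which matches the POTAR1 and POWBAR1 values in the example. The function name `window_addr_field` is hypothetical.

```c
#include <stdint.h>

/* Keep the top 20 bits of a 32-bit address, right-justified.
   Assumed encoding: register value = address >> 12, chosen because it
   reproduces the POTAR1 and POWBAR1 values shown in the example. */
static inline uint32_t window_addr_field(uint32_t addr)
{
    return addr >> 12;
}
```

For instance, `window_addr_field(0x80000000u)` yields 0x00080000 and `window_addr_field(0xA0000000u)` yields 0x000A0000.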

Finally, there is the POWAR1 register, which is an attribute register. You can assign many different attributes to the Outbound Window. The key one is the EN bit: if it is not set, translation is disabled, so make sure that EN is set to 1.

S_D enables byte swapping on the fly. In our case, we set it to 0 because we do not want it. RTT and WTT follow; in this case they select the standard memory read and standard memory write. A variety of other combinations are possible.

The last field is OWS, which determines the window size; in our case, it is 8 KB. The register description lists the available encodings, and you select the one that matches the size of your Outbound Window.
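The OWS encoding can be illustrated with a short C sketch. It assumes the window size is 2^(OWS+1) bytes, which is consistent with OWS=0b001100 selecting 8 KB in this example; treat the formula as an assumption and confirm it against the register description. Both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed encoding: window size in bytes = 2^(OWS + 1).
   OWS = 0b001100 (12) then gives 2^13 = 8 KB, as in the example. */
static inline uint64_t ows_to_bytes(uint32_t ows)
{
    return (uint64_t)1 << (ows + 1);
}

/* Inverse mapping for power-of-two sizes from 4 KB up to 4 GB. */
static inline uint32_t bytes_to_ows(uint64_t size)
{
    uint32_t ows = 0;
    while (((uint64_t)1 << (ows + 1)) < size) {
        ows++;
    }
    return ows;
}
```

Under this assumption, 4 KB corresponds to OWS=0b001011 and 4 GB to OWS=0b011111.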

Click “Example” to see the output of the programmed registers.


Example
1) POTARx = PCI/X Outbound translation address register
2) POTEARx = PCI/X Outbound translation extended address register
– For 32-bit, POTARx[TA:12-31] = bits 31-12 of the translation address
– For 64-bit, POTARx[TA:12-31] = bits 31-12 of the translation address, and POTEARx[TEA:12-31] = bits 43-32 of the translation address
3) POWBARx = PCI/X Outbound window base address register
4) POWARx = PCI/X Outbound attributes register
– EN=1: translation enabled
– S_D=0: byte lane redirection on the data bus is disabled
– RTT=0b0100 (memory read); selects among the different read types on the PCI bus
– WTT=0b0100 (memory write); selects among the different write types on the PCI bus
– OWS = outbound window size; (min, max) = (4 KB, 4 GB)

[This is a reference page for the “Example” button on E024.]


Creating an Inbound Window

[Figure: the IBW maps 0x9000_0000-0x9000_2000 in the PCI memory map to 0xB000_0000-0xB000_2000 in the local memory map.]

• 8 KB Inbound Window starting at 0x9000_0000 in the PCI Memory

• Any access within 0x9000_0000 and 0x9000_2000 by an external PCI master gets mapped to 0xB000_0000-0xB000_2000


Let’s take a look at how an Inbound Window is created.

The register programming model is similar to the outbound case, but here the mechanism is inbound: the transaction originates on the PCI side and ends up on the local side. So, naturally, the transaction has to be initiated by an external PCI Master.

In this case, on the PCI side, the Inbound Window starts at 0x9000_0000. The size of the Inbound Window is 8 KB.


On the local side, the window starts at 0xB000_0000.


For example, if an external PCI Master performs a write transaction at address 0x9000_0000, the transaction is mapped into the PowerQUICC III Local Memory Map, and the mapped address becomes 0xB000_0000.

Page 29: F2 Mod13 PCI PCIX - Welcome to Freescale - Freescale Semiconductor

29

Creating an Inbound Window

• The inbound ATMU is comprised of four windows: a configuration window and three general translation windows.
– CCSRBAR->PITAR = 0x000B0000;
– CCSRBAR->PIWBAR = 0x00090000;
– CCSRBAR->PIWBEAR = 0x00000000;
– CCSRBAR->PIWAR = 0x80F4400C;

• (EN=1,S_D=0,PF=0,TGI=0b1111,RTT=0b0100,WTT=0b0100,IWS=0b001100)


Identify the address.

Collect the top 20 bits.

Place in this Register.

Let’s look at the register programming details to implement this Inbound Window. It is somewhat similar to creating an Outbound Window.


Again, the basic idea is you identify the address, collect the top 20 bits, and put them in this register as shown.

Similar to the Outbound Window, there is also the Attribute register, which allows you to define a variety of attributes for your Inbound Window.
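To make the attribute fields concrete, here is a minimal C sketch that packs them into a 32-bit value. The bit positions are assumptions, not taken from this course: EN in the most significant bit, PF two bits below it, TGI, RTT, and WTT as consecutive 4-bit fields, and IWS in the low 6 bits. S_D is left out because its position is not given here and it is 0 in the example. The `piwar_fields` type and `piwar_pack` function are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical container for the inbound attribute fields. */
typedef struct {
    uint32_t en;   /* 1 = translation enabled */
    uint32_t pf;   /* PF flag, 1 bit */
    uint32_t tgi;  /* TGI field, 4 bits */
    uint32_t rtt;  /* read transaction type, 4 bits */
    uint32_t wtt;  /* write transaction type, 4 bits */
    uint32_t iws;  /* inbound window size code, 6 bits */
} piwar_fields;

/* Pack the fields under the ASSUMED bit layout described above.
   Verify the positions against the register description before use. */
static inline uint32_t piwar_pack(piwar_fields f)
{
    return ((f.en  & 0x1u) << 31) |
           ((f.pf  & 0x1u) << 29) |
           ((f.tgi & 0xFu) << 20) |
           ((f.rtt & 0xFu) << 16) |
           ((f.wtt & 0xFu) << 12) |
            (f.iws & 0x3Fu);
}
```

Under this layout, the listed field values (EN=1, PF=0, TGI=0b1111, RTT=0b0100, WTT=0b0100, IWS=0b001100) pack to 0x80F4400C. The same positions for EN, RTT, WTT, and the size field reproduce the outbound attribute value 0x8004400C from the Outbound Window example, which is some evidence that the assumed layout is reasonable.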


Question

Is the following statement true or false? Click “Done” when you are finished.

“For creating an Inbound Window, the transaction originates in the PCI side, and it ends up on the local side. Therefore, the transaction has to be initiated by an external PCI Master.”

True

False

Consider this question regarding the Inbound Window.

Correct. For creating an Inbound Window, the transaction originates in the PCI side, and it ends up on the local side. So naturally, the transaction has to be initiated by an external PCI Master.


Configuration Access

• Accessing internally using CFG_ADDR/CFG_DATA
• Note that CFG_DATA is part of the little endian mapped data structure, whereas CFG_ADDR is not
• Example 1: Configuration Read from local side
– r1 contains the pointer to CFG_ADDR register.
– r2 contains the pointer to CFG_DATA register.
• Initial value:
– r0=0x80000000 (CFG_ADDR register format => EN=1; Bus#=0b00000000; Device#=0b00000; Function#=0b000; Register#=0b000000)
– r1=CCSRBAR + 0x8000 //addr of CFG_ADDR
– r2=CCSRBAR + 0x8004 //addr of CFG_DATA

The final item we will discuss is Configuration Access. Configuration Access pertains to the Configuration Space for the standard PCI/PCI-X nomenclature.

Let’s take a look at how Configuration Access can be done from the local side or internally.

From the internal side, if you want to do a Configuration Access, then you have to use the CFG_ADDR register and CFG_DATA register. One of the most important things to notice here is CFG_DATA register is part of the little Endian Memory Map data structure. CFG_ADDR is not part of the little Endian Map data structure.

When you access the CFG_DATA register, you have to make sure that you perform byte swapping properly.

For example, let's assume that we want to do a configuration read from the local side. To do this, we need to program the CFG_ADDR register first, followed by a read from CFG_DATA register. For the purpose of programming the CFG_ADDR register, let's assume that we have these general purpose registers with the shown initialized values. r0 register contains the value that needs to be programmed into the CFG_ADDR register. r0 also contains all the information pertaining to the bus number, device number, function number, and the register number, which are all the standard things you need to generate the configuration cycle.

You determine these items to construct the value and put it on r0 register.

In this example, the constructed value is 0x80000000. r1 contains the pointer to the Configuration Address register, and r2 contains the pointer to the Configuration Data register. Also note that the Configuration Address register is at offset 0x8000 from the base of the memory-mapped register space (CCSRBAR), and the Configuration Data register is at offset 0x8004.
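Constructing the value for r0 can be sketched in C as follows. The field widths (8-bit bus, 5-bit device, 3-bit function, 6-bit register number) come from the slide; the bit positions are an assumption based on the conventional PCI configuration address layout, with EN in the top bit. The function name `cfg_addr_value` is hypothetical.

```c
#include <stdint.h>

/* Build a CFG_ADDR value from its fields.
   Assumed layout: EN in bit 31, bus in bits 23:16, device in bits 15:11,
   function in bits 10:8, register number (32-bit word index) in bits 7:2. */
static inline uint32_t cfg_addr_value(uint32_t bus, uint32_t dev,
                                      uint32_t func, uint32_t reg)
{
    return 0x80000000u |               /* EN = 1 */
           ((bus  & 0xFFu) << 16) |
           ((dev  & 0x1Fu) << 11) |
           ((func & 0x07u) << 8)  |
           ((reg  & 0x3Fu) << 2);
}
```

With all fields zero, as in this example, the result is 0x80000000, the value placed in r0.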


Configuration Access

• Code Sequence
    stw r0,0(r1)     //writing to CFG_ADDR
    sync
    lwbrx r3,0,r2    //reading from CFG_DATA (LE)
    sync

RESULT: r3 now contains the vendor and device ID
– Device ID=0x0003, Vendor ID = 0x1057

Here you can see an example of a configuration read performed by the e500 CPU. The correct sequence to generate the configuration cycle is described below. The first step is to construct the CFG_ADDR value and write it to the Configuration Address register.

For a configuration access, it is important to write to CFG_ADDR before attempting to access CFG_DATA. For performance reasons, the e500 CPU can execute loads and stores out of order. In this scenario, we need to ensure that it does not: the store "stw" must happen first, followed by the load "lwbrx". Therefore, we put a “sync” instruction between the two instructions, which ensures that the store happens first and is followed by the load.

Again, you have to make sure that you write to the Configuration Address register before you access the Configuration Data register.

The third instruction is the load from the Configuration Data register. Note that a byte-reversed load instruction is used because the data is part of the little-endian data structure.

This is how you can access Configuration Space registers from the internal side. The result of this code sequence is that r3 holds the device ID and the vendor ID of the part.
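The effect of the byte-reversed load can be illustrated in C without hardware. The sketch below assumes the conventional PCI layout of the first configuration register (vendor ID in the low 16 bits, device ID in the high 16 bits); the function names are hypothetical. A plain big-endian load of the little-endian register would return the bytes in reverse order, and swapping them recovers the expected value.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word, as a byte-reversed load does. */
static inline uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Assumed layout of the ID register: vendor ID in bits 15:0,
   device ID in bits 31:16. */
static inline uint16_t vendor_id(uint32_t id_reg)
{
    return (uint16_t)(id_reg & 0xFFFFu);
}

static inline uint16_t device_id(uint32_t id_reg)
{
    return (uint16_t)(id_reg >> 16);
}
```

For the IDs on this page, a non-reversed big-endian read would see 0x57100300; `swap32` turns that into 0x00031057, from which `vendor_id` gives 0x1057 and `device_id` gives 0x0003.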


Course Summary

• PCI Bus architecture

• PCI-X and PCI features

• PCI-X differences from PCI

• Memory maps

• Outbound and Inbound Windows

• Configuration access

This course outlined the PCI and PCI-X Blocks of Freescale’s PowerQUICC III Processor, including their features, and the major differences between them. Both the PCI and PCI-X can be configured as 32-bit or 64-bit I/O interfaces. There is also an on-chip arbiter for the PCI and the PCI-X block. The 64-bit dual address cycle is also supported in the PowerQUICC III.

This course also introduced you to the memory maps and programming models for PCI and PCI-X blocks, including Inbound and Outbound windows. PCI and PCI-X blocks can be configured either as a Host or as an Agent. Finally, configuration access windows were discussed.

Thank you for taking this PowerQUICC III course.

