HyperTransport 3.1 Interconnect Technology
Page 1: HyperTransport 3.1 Interconnect Technology
Page 2: HyperTransport 3.1 Interconnect Technology

training that fits your needs

MindShare recognizes and addresses your company's technical training issues with:

• Scalable cost training
• Customizable training options
• Reducing time away from work
• Just-in-time training
• Overview and advanced topic courses
• Training delivered effectively globally
• Training in a classroom, at your cubicle or home office
• Concurrently delivered multiple-site training

bringing life to knowledge. Real-world tech training put into practice worldwide.

Are your company’s technical training needs being addressed in the most effective manner?

MindShare has over 25 years of experience in conducting technical training on cutting-edge technologies. We understand the challenges companies have when searching for quality, effective training that reduces the students' time away from work and provides cost-effective alternatives. MindShare offers many flexible solutions to meet those needs. Our courses are taught by highly skilled, enthusiastic, knowledgeable and experienced instructors. We bring life to knowledge through a wide variety of learning methods and delivery options.

• PCI Express 2.0®
• Intel Core 2 Processor Architecture
• AMD Opteron Processor Architecture
• Intel 64 and IA-32 Software Architecture
• Intel PC and Chipset Architecture
• PC Virtualization
• USB 2.0
• Wireless USB
• Serial ATA (SATA)
• Serial Attached SCSI (SAS)
• DDR2/DDR3 DRAM Technology
• PC BIOS Firmware
• High-Speed Design
• Windows Internals and Drivers
• Linux Fundamentals

... and many more.

All courses can be customized to meet your group’s needs. Detailed course outlines can be found at www.mindshare.com

world-class technical training

MindShare training courses expand your technical skillset

*PCI Express® is a registered trademark of the PCI-SIG.

Page 3: HyperTransport 3.1 Interconnect Technology

www.mindshare.com 4285 SLASH PINE DRIVE COLORADO SPRINGS, CO 80908 USA M 1.602.617.1123 O 1.800.633.1440 [email protected]

Engage MindShare

Have knowledge that you want to bring to life? MindShare will work with you to "Bring Your Knowledge to Life." Engage us to transform your knowledge and design courses that can be delivered in classroom or virtual classroom settings, create online eLearning modules, or publish a book that you author.

We are proud to be the preferred training provider at an extensive list of clients that include:

ADAPTEC • AMD • AGILENT TECHNOLOGIES • APPLE • BROADCOM • CADENCE • CRAY • CISCO • DELL • FREESCALE
GENERAL DYNAMICS • HP • IBM • KODAK • LSI LOGIC • MOTOROLA • MICROSOFT • NASA • NATIONAL SEMICONDUCTOR
NETAPP • NOKIA • NVIDIA • PLX TECHNOLOGY • QLOGIC • SIEMENS • SUN MICROSYSTEMS • SYNOPSYS • TI • UNISYS

Classroom Training

Invite MindShare to train you in-house, or sign up to attend one of our many public classes held throughout the year and around the world. No more boring classes; the 'MindShare Experience' is sure to keep you engaged.

Virtual Classroom Training

The majority of our courses live over the web in an interactive environment with WebEx and a phone bridge. We deliver training cost-effectively across multiple sites and time zones. Imagine being trained in your cubicle or home office and avoiding the hassle of travel. Contact us to attend one of our public virtual classes.

eLearning Module Training

MindShare is also an eLearning company. Our growing list of interactive eLearning modules includes:

• Intro to Virtualization Technology

• Intro to IO Virtualization

• Intro to PCI Express 2.0 Updates

• PCI Express 2.0

• USB 2.0

• AMD Opteron Processor Architecture

• Virtualization Technology ...and more

MindShare Press

Purchase our books and eBooks or publish your own content through us. MindShare has authored over 25 books and the list is growing. Let us help make your book project a successful one.

MindShare Learning Options

MindShare Classroom: In-House Training, Public Training
MindShare Virtual Classroom: Virtual In-House Training, Virtual Public Training
MindShare eLearning: Intro eLearning Modules, Comprehensive eLearning Modules
MindShare Press: Books, eBooks

Page 4: HyperTransport 3.1 Interconnect Technology

HyperTransport 3.1 Interconnect Technology

MINDSHARE, INC.

Brian Holden
Jay Trodden

Don Anderson

MINDSHARE PRESS

Page 5: HyperTransport 3.1 Interconnect Technology

The authors and publishers have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

Visit MindShare, Inc. on the web: www.mindshare.com

Library of Congress Control Number: 2006934887

Copyright ©2008 by MindShare, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

For information on obtaining permission for use of material from this work, please submit a written request to:

MindShare Press
Attn: Maryanne Daves
4285 Slash Pine Dr.
Colorado Springs, CO 80908
Fax: 719-487-1434

Set in 10-point Times New Roman by MindShare, Inc.

ISBN 978-0-9770878-2-2

First Printing September 2008

Page 6: HyperTransport 3.1 Interconnect Technology

1 Introduction to HyperTransport

This Chapter

This chapter discusses some of the motivations leading to the development of HyperTransport. It reviews some of the attributes that limit the ability of older-generation I/O buses to keep pace with the increasing demands of new applications and advances in processor and memory technologies. The chapter then summarizes the key features behind the improved performance of HT over earlier buses.

The Next Chapter

The next chapter provides an overview of HT architecture, including the primary elements of HT technology and the relationship between them. The chapter describes the general features, capabilities, and limitations of HT and introduces the terminology and concepts necessary for in-depth discussions of the various HT topics in subsequent chapters.

Computers: Three Subsystems

A server, desktop, or notebook computer system comprises three major subsystems:

1. Processor. In all systems, the processor device may have more than one CPU core. In servers, there may be more than one processor device.
2. Main DRAM Memory.
3. I/O Subsystem (Southbridge). This connects to all other devices, including such things as graphics, mass storage, networks, legacy hardware, and the buses required to support them: PCIe, PCI, PCI-X, AGP, USB, IDE, etc.

Page 7: HyperTransport 3.1 Interconnect Technology

CPUs are Faster Than Their Interconnect

Because of improvements in both the CPU internal execution speed and in the number of CPU cores per device, processor devices are more demanding than ever when they access external resources such as memory and I/O. Each external read or write by a CPU core represents a huge performance hit compared to internal execution.

Multiple Processors Aggravate The Problem

In systems with multiple processor devices, such as servers, the problem of accessing external devices becomes worse because of competition for access to system DRAM and the single set of I/O resources.

Caches Help DRAM Keep Up

System DRAM memory has kept up reasonably well with the increasing demands of CPUs for a couple of reasons. First, the performance penalty for accessing memory is mitigated by the use of internal processor cache memory. Modern processors generally implement multiple levels of internal caches that run at the full CPU clock rate and are tuned for high “hit rates”. Each fetch from an internal cache eliminates the need for an external bus cycle to memory.

Second, in cases where an external memory fetch is required, DRAM technology and the use of high-speed, mixed-signal interfaces (e.g. DDR-3, FB-DIMM, etc.) have allowed DRAM to maintain bandwidths comparable with the processor external bus rates.

Southbridge Interconnect Must Keep Pace

In order to prevent system slow-downs, the connectivity to the Southbridge with its I/O subsystem must keep pace with the processor.

I/O Accesses Can Slow Down The Processor

Although external DRAM accesses by processors can be minimized through the use of internal caches, there is no way to avoid external bus operations when accessing I/O devices. Many applications require the processor to perform small, inefficient external transactions, which then must find their way through the I/O subsystem to the bus hosting the device.

Page 8: HyperTransport 3.1 Interconnect Technology

Lack of Bandwidth Also Hurts Fast Peripherals

Similarly, bus master I/O devices using slower subsystem buses to reach main memory are also hindered by the lack of bandwidth. Some modern peripheral devices are capable of running much faster than the buses they live on. This presents another system bottleneck. This is a particular problem in cases where applications are running that emphasize latency-critical and/or real-time movement of data through the I/O subsystem over CPU processing.

Reducing I/O Bottlenecks

Two important schemes have been used to connect I/O devices to main memory. The first is the shared bus approach, as used in PCI and PCI-X. The second involves point-to-point component interconnects, and includes some proprietary buses as well as open architectures such as HyperTransport and PCI-Express. These are described here, along with the advantages and disadvantages of each.

The Historic Approach

Figure 1-1 on page 12 depicts the classic “North-South” bridge PCI implementation. Note that the PCI bus acts as both an “add-in” bus for user peripheral cards and as an interconnect bus to memory for all devices residing on or below it. Even traffic to and from the USB and IDE controllers integrated in the South Bridge must cross the PCI bus to reach main memory.

The topology shown in Figure 1-1 was the traditional way of connecting desktop systems for a number of reasons, including:

1. A shared bus reduces the number of traces on the motherboard to a single set.
2. All of the devices located on the PCI bus are only one bridge interface away from the principal target of their transactions: main DRAM memory.
3. A single, very popular protocol (PCI) can be used for all embedded devices, add-in cards, and chipset components attached to the bus.

Unfortunately, some of the things that made this topology so popular also have made it difficult to fix the I/O bandwidth problems, which have become more obvious as processors and memory have become faster.

Page 9: HyperTransport 3.1 Interconnect Technology

Figure 1-1: Classic PCI North-South Bridge System

[Block diagram: the CPU connects over its FSB to a North Bridge hosting main memory and an AGP port (AGP graphics accelerator, local video memory, video BIOS, monitor). The PCI bus below the North Bridge carries the PCI slots, a SCSI HBA, Ethernet, and the South Bridge, which integrates the IDE hard drive and CD-ROM interfaces, USB, and the interrupt controller (IRQs, INTR), plus an ISA bus with ISA slots, Super I/O (COM1/COM2, RTC), a sound chipset, and the system BIOS.]

Page 10: HyperTransport 3.1 Interconnect Technology

2 HT Architectural Overview

The Previous Chapter

To understand why HT was developed, it is helpful to review the previous generation of I/O buses and interconnects. The previous chapter reviewed the factors that limit the ability of older-generation buses to keep pace with the increasing demands of new applications. Finally, it discussed the key features of HT technology that provide its improved capability.

This Chapter

This chapter provides an overview of the HT architecture that defines the primary elements of HT technology and the relationship between these elements. This chapter summarizes the features, capabilities, and limitations of HT and provides the background information necessary for in-depth discussions of the various HT topics in later chapters.

The Next Chapter

The next chapter describes the function of each signal in the high- and low-speed HyperTransport signal groups.

General

HyperTransport is, in essence, a hardware interface. While software is involved in configuring, controlling, and utilizing the interface, the bulk of the protocols described in this book are implemented in hardware. The interface has been built into many devices, including all of the microprocessors in AMD's Turion, Athlon, Phenom, and Opteron product lines.

Page 11: HyperTransport 3.1 Interconnect Technology

HyperTransport provides a point-to-point interconnect that can be extended to support a wide range of devices. Figure 2-1 on page 19 illustrates a sample HT system with four internal links. HyperTransport provides a high-speed, high-performance, point-to-point dual simplex link for interconnecting IC components on a PCB. Data is transmitted from one device to another across the link.

The width of the link and the clock frequency at which data is transferred are scalable and auto-negotiable:

• Link width ranges from 2 bits to 32 bits
• Clock frequency ranges from 200 MHz to 3.2 GHz

This scalability allows for a wide range of link performance and potential applications with bandwidths ranging from 200MB/s to 51.2GB/s.
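These endpoints follow from simple arithmetic: aggregate bandwidth is link width × clock rate × 2 bit times per clock period (data moves on both clock edges) × 2 directions (the link is dual simplex). A minimal sketch of that calculation (the function name and units are ours, not the specification's):

```c
#include <stdio.h>

/* Aggregate link bandwidth in bytes per second.
 * width_bits - CAD width per direction (2..32)
 * clock_hz   - link clock frequency
 * Two bit times per clock period (DDR) and two directions (dual simplex). */
static double ht_aggregate_bandwidth(unsigned width_bits, double clock_hz)
{
    double bits_per_sec_per_dir = (double)width_bits * clock_hz * 2.0; /* DDR */
    return 2.0 * bits_per_sec_per_dir / 8.0; /* both directions, in bytes */
}

int main(void)
{
    /* Narrowest/slowest link: 2 bits at 200 MHz -> 200 MB/s aggregate. */
    printf("%.1f MB/s\n", ht_aggregate_bandwidth(2, 200e6) / 1e6);
    /* Widest/fastest link: 32 bits at 3.2 GHz -> 51.2 GB/s aggregate. */
    printf("%.1f GB/s\n", ht_aggregate_bandwidth(32, 3.2e9) / 1e9);
    return 0;
}
```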

Once again referring to Figure 2-1, the HT bus has been extended in the sample system via a series of devices known as tunnels. A tunnel is an HT device that performs some function and also has a second HT interface that permits the connection of another HT device. The end device is termed a cave, which always represents the termination of a chain of devices that all reside on the same HT bus. Cave devices include a function, but no additional HT connection. The series of devices that comprise an HT bus is sometimes simply referred to as an HT chain.
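A chain can therefore be pictured as a short linked list: zero or more tunnels, each with a function and a downstream connection, terminated by a single cave with no downstream connection. A hypothetical sketch of such a model (the type and field names are ours, not defined by HyperTransport):

```c
#include <stddef.h>

/* Hypothetical model of the devices on one HT chain (names are illustrative). */
typedef enum { HT_TUNNEL, HT_CAVE } ht_device_type;

typedef struct ht_device {
    ht_device_type    type;
    const char       *function;    /* e.g. "PCIe bridge", "SAS controller"      */
    struct ht_device *downstream;  /* next device in the chain; NULL for a cave */
} ht_device;

/* A well-formed chain ends in exactly one cave with nothing below it. */
static int chain_is_terminated(const ht_device *first)
{
    const ht_device *d = first;
    while (d != NULL && d->type == HT_TUNNEL)
        d = d->downstream;
    return d != NULL && d->type == HT_CAVE && d->downstream == NULL;
}
```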

In Figure 2-1, the tunnel devices provide connections to Infiniband, PCI-Express, and Ethernet. The Cave device provides a connection to a Serial Attached SCSI storage device. Devices supporting connections to any of these other technologies can be made as either cave or tunnel devices.

Additional HT buses (i.e. chains) may be implemented in a given system by using an HT-to-HT bridge. In this way, a fabric of HT devices may be implemented. Refer to the section entitled "Extending the Topology" on page 31 for additional detail.

Transfer Types Supported

HT supports two types of addressing semantics:

1. legacy PC, address-based semantics
2. messaging semantics common to networking environments

Most of this book discusses the address-based semantics common to compatible PC implementations. Message-passing semantics are discussed in "The Direct Packet Feature Set" on page 497.

Page 12: HyperTransport 3.1 Interconnect Technology

Address-Based Semantics

The HT bus was initially implemented as a PC-compatible solution that by definition uses address-based semantics. This includes both the 40-bit, 1 Terabyte (TB) and the 64-bit, 18 Exabyte (EB) address spaces. The 64-bit address map is the same as the 40-bit address map, with 24 extra bits that are defined to be 00_0000h for 40-bit transactions. Transactions specify locations within this address space that are to be read from or written to. The address space is divided into blocks that are allocated for particular functions, listed in Figure 2-2 on page 20. Unless otherwise specified, 40-bit addresses are used in this book.

Read and write request command packets contain a 40-bit address, Addr[39:2]. When an address extension doubleword is present, twenty-four additional address bits are provided for a total of 64 address bits.
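As a small illustration of the arithmetic (the helper name is ours, not a specification API): Addr[39:2] supplies bits 39:2 of the byte address, the extension doubleword supplies bits 63:40, and bits 1:0 are implicitly zero because addressing is dword-aligned.

```c
#include <stdint.h>

/* Illustrative only: compose the full 64-bit byte address from the fields
 * described above. addr_39_2 holds Addr[39:2]; ext_63_40 holds the 24
 * extension bits (zero when no address extension doubleword is present). */
static uint64_t ht_full_address(uint32_t ext_63_40, uint64_t addr_39_2)
{
    return ((uint64_t)(ext_63_40 & 0xFFFFFFu) << 40) |
           ((addr_39_2 & 0x3FFFFFFFFFull) << 2);
}
```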

Figure 2-1: Example HyperTransport System

[System diagram: two CPUs connect to a memory/graphics hub that acts as the HyperTransport host bridge, with DDR SDRAM and PCIe attached. Below the bridge, a chain of HyperTransport tunnel devices provides Infiniband, PCIe slot, and Gigabit Ethernet connections, and a HyperTransport cave device terminates the chain with a SAS/SATA connection to a RAID disk array. An HTX slot and "out of box" Infiniband and Ethernet cable connections are also shown, with HyperTransport links joining the devices.]

Page 13: HyperTransport 3.1 Interconnect Technology

HyperTransport does not contain a dedicated I/O address space as PCI does. Instead, CPU I/O space is memory-mapped to a high address range (FD_FC00_0000h to FD_FDFF_FFFFh). Each HyperTransport device is configured at initialization time by the boot ROM configuration software to respond to a range of memory address space. The devices are assigned addresses via the base address registers contained in the configuration register header. Note that these registers are based on the PCI configuration registers, and are also mapped to memory space (FD_FE00_0000h to FD_FFFF_FFFFh). Also unlike the PCI bus, there is no dedicated configuration address space.

Additional memory address ranges are used for interrupt signaling and system management messages. Details regarding the use of each range of address space are discussed in the subsequent chapters that cover the related topics. For example, a detailed discussion of the configuration address space can be found in Chapter 14, entitled "Device Configuration," on page 329.

Figure 2-2: 40-bit HT Address Map

Address Range (40-bit)              Size       Function
00_0000_0000h to FC_FFFF_FFFFh      1012 GB    DRAM / Memory-Mapped I/O
FD_0000_0000h to FD_F8FF_FFFFh      3984 MB    Interrupt / EOI
FD_F900_0000h to FD_F90F_FFFFh      1 MB       Legacy PIC IACK
FD_F910_0000h to FD_F91F_FFFFh      1 MB       System Management
FD_F920_0000h to FD_FBFF_FFFFh      46 MB      Reserved
FD_FC00_0000h to FD_FDFF_FFFFh      32 MB      I/O
FD_FE00_0000h to FD_FFFF_FFFFh      32 MB      Configuration
FE_0000_0000h to FE_1FFF_FFFFh      512 MB     Extended Configuration
FE_2000_0000h to FF_FFFF_FFFFh      7680 MB    Reserved
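A minimal sketch of how one might classify a 40-bit address against this map (the function and region strings are ours; the boundaries are taken from the table above):

```c
#include <stdint.h>

/* Classify a 40-bit HT address per the map in Figure 2-2.
 * Assumes addr fits in 40 bits; names and function are illustrative. */
static const char *ht_region(uint64_t addr)
{
    if (addr <= 0xFCFFFFFFFFull) return "DRAM / memory-mapped I/O";
    if (addr <= 0xFDF8FFFFFFull) return "Interrupt / EOI";
    if (addr <= 0xFDF90FFFFFull) return "Legacy PIC IACK";
    if (addr <= 0xFDF91FFFFFull) return "System Management";
    if (addr <= 0xFDFBFFFFFFull) return "Reserved";
    if (addr <= 0xFDFDFFFFFFull) return "I/O (memory-mapped)";
    if (addr <= 0xFDFFFFFFFFull) return "Configuration";
    if (addr <= 0xFE1FFFFFFFull) return "Extended Configuration";
    return "Reserved";
}
```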

Page 14: HyperTransport 3.1 Interconnect Technology

3 Signal Groups

The Previous Chapter

The previous chapter provided an overview of the HT architecture that defines the primary elements of HT technology and the relationship between these elements. The chapter summarized the features, capabilities, and limitations of HT and provided the background information necessary for in-depth discussions of the various HT topics in later chapters.

This Chapter

This chapter describes the function of each signal in the high- and low-speed HyperTransport signal groups. The CAD, CTL and CLK high-speed signals are routed point-to-point as low-voltage differential pairs between two devices (or between a device and a connector in some cases). The RESET#, PWROK, LDTREQ#, and LDTSTOP# low-speed signals are single-ended low-voltage CMOS and may be bused to multiple devices. In addition, each device requires power supply and ground pins. Because the CAD bus width is scalable, the actual number of CAD, CTL and CLK signal pairs varies, as does the number of power and ground pins to the device.

The Next Chapter

The next chapter describes the use of HyperTransport control and data packets to construct HyperTransport link transactions. Control packet types include Information, Request, and Response variants; data packets contain a payload of 0-64 valid bytes. The transmission, structure, and use of each packet type are presented.

Introduction

Signals on each HyperTransport link fall into two groups: high-speed signals associated with the sending and receiving of control and data packets, and miscellaneous low-speed signals required for such things as reset and power management. The low-speed signals are not scalable and employ conventional low-voltage CMOS signaling. The high-speed signal group is scalable in terms of both bus width and clock rate, and each signal is carried by a low-voltage differential signal pair.

Page 15: HyperTransport 3.1 Interconnect Technology

While device pin count varies with scaling, signal group functions remain the same; the only real difference in signaling between a 32-bit link and a 2-bit link is the number of bit times required to shift information onto the bus.

The Signal Groups

As illustrated in Figure 3-1 on page 54, the high-speed HyperTransport signals on each link consist of an outbound (transmit) set of signals and an inbound (receive) set of signals for each device; these are routed point-to-point. Having two sets of unidirectional signals allows concurrent traffic. In addition, there is one set of low-speed signals that may be bused to multiple devices.

Figure 3-1: HyperTransport Signal Groups

[Diagram: Device A and Device B are connected by a link consisting of CAD[p:0], CTL[n:0], and CLK[m:0] differential signal pairs in each direction (one transmit set and one receive set per device), plus the low-speed signals PWROK, RESET#, and the optional LDTSTOP# and LDTREQ#, which are bused from system logic. VHT = 1.2 volts. The signal-pair indices scale with link width:

For link width =      2    4    8    16   32
m (CLK[m:0])          0    0    0    1    3
n (CTL[n:0]), Gen3    0    0    0    1    3
n (CTL[n:0]), Gen1    0    0    0    0    0
p (CAD[p:0])          1    3    7    15   31]

Page 16: HyperTransport 3.1 Interconnect Technology

The High Speed Signals (One Set In Each Direction)

Each high-speed signal is actually a differential signal pair. The CAD (Command/Address/Data) signals carry the HyperTransport packets. When a link transmitter sends packets on the CAD bus, the receive side of the interface uses the CLK signals, also supplied by the transmitter, to latch in packet information during each bit time. The CTL signal or signals are used to delineate between the packet types.

The CAD Signal Group

The CAD bus is always driven by the transmitter side of a link, and comprises signal pairs that carry HyperTransport requests, responses, and data. Each CAD bus may consist of between 2 bits (two differential signal pairs) and 32 bits (thirty-two differential signal pairs). The HyperTransport specification permits the CAD bus width to be different (asymmetrical) for the two directions. To enable the corresponding receiver to make a distinction as to the type of information currently being sent over the CAD bus, the transmitter also drives the CTL signals (see the following description).

Control Signal (CTL)

This set of signal pairs is driven by the transmitter to identify the information being sent concurrently over the CAD signals. The receiver uses this information to delineate the incoming CAD information.

In the Gen1 Protocol, there is one (and only one) CTL signal pair for each link direction, regardless of the width of the CAD bus. If this signal is asserted (high), the transmitter is indicating that it is sending a control packet; if deasserted, the transmitter is sending a data packet. The CTL line may only change on the boundaries of the transfer of the logical 32-bit CAD bus. The receiver uses this restriction to allow framing on the data.

In the Gen3 Protocol, there is one CTL signal pair for each set of 8 or fewer CAD lines. These CTL signals are encoded so that four CTL bits are transferred per 32 CAD bits. Five of the sixteen available codepoints for these bits are used to delineate the transfer of commands, inserted commands, the CRC for a command with data, the CRC for a command without data, and finally the data itself. The receiver frames on the presence of the used codepoints.

Page 17: HyperTransport 3.1 Interconnect Technology

Clock Signal(s) (CLK)

Each HyperTransport transmitter sends a differential clock signal along with the CAD and CTL signals to the receiver at the other end of the link. There is one CLK signal pair for each set of 8 or fewer CAD signal pairs. Having a CLK signal pair per set of 8 or fewer CAD signal pairs aids in the partitioning of on-chip clock circuitry as well as easing board layout.

In HyperTransport revisions 1.03 to 1.1, the CLK signal is source synchronous with the CAD and CTL lines. The clock speeds range from 200 MHz (default) up to 800 MHz.

In HyperTransport revision 2.0 the CLK signal is also source synchronous with the CAD and CTL lines. The clock speeds range from 200 MHz (default) up to 1.4 GHz.

In HyperTransport revision 3.1 the CLK signal is also source synchronous with the CAD and CTL lines, but the phase of CAD and CTL with respect to CLK is allowed to vary within a defined budget over several bit times. The clock speeds range from 200 MHz (default) up to 3.2 GHz.

For all revisions, the jitter on the CLK line is tightly controlled.

Scaling Hazards: Burden Is On The Transmitter

It is a requirement in HyperTransport that the software in the transmitter side of each link must be aware of the capabilities of its corresponding receiver and avoid the double hazard of a scalable bus: running at a faster clock rate than the receiver can handle or using a wider data path than the receiver supports. Because the link is not a shared bus, the transmitter side of each device is concerned with the capabilities of only one target.
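In practice, configuration software reads each side's supported widths and clock rates from its link capability registers and programs values both ends can handle; new settings typically take effect after a warm reset or an LDTSTOP# disconnect sequence. The sketch below simply expresses the "never exceed the receiver" rule as a highest-common-capability calculation (the type and field names are ours, not the specification's):

```c
/* Illustrative only: pick operating values no greater than what either
 * the transmitter or its link partner's receiver supports. */
typedef struct {
    unsigned max_width_bits;  /* 2, 4, 8, 16, or 32 */
    unsigned max_clock_mhz;   /* e.g. 200 .. 3200   */
} ht_link_caps;

static ht_link_caps ht_negotiate(ht_link_caps tx, ht_link_caps rx)
{
    ht_link_caps out;
    out.max_width_bits = tx.max_width_bits < rx.max_width_bits
                       ? tx.max_width_bits : rx.max_width_bits;
    out.max_clock_mhz  = tx.max_clock_mhz < rx.max_clock_mhz
                       ? tx.max_clock_mhz : rx.max_clock_mhz;
    return out;
}
```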

The Low Speed Signals

Power OK (PWROK) And Reset (RESET#)

PWROK, used with RESET#, indicates to HyperTransport devices whether a Cold or Warm Reset is in progress. Which system logic component is responsible for managing the PWROK and RESET# signals is beyond the scope of the HyperTransport specification, but the timing and use of the signals are defined. The basic use of the signals includes:

Page 18: HyperTransport 3.1 Interconnect Technology

4 Packet Protocol

The Previous Chapter

The previous chapter described the function of each signal in the high- and low-speed HyperTransport signal groups. The CAD, CTL, and CLK high-speed signals are routed point-to-point as low-voltage differential pairs between two devices (or between a device and a connector in some cases). The RESET#, PWROK, LDTREQ#, and LDTSTOP# low-speed signals are single-ended low-voltage CMOS and may be bused to multiple devices. In addition, each device requires power supply and ground pins. Because the CAD bus width is scalable, the actual number of CAD, CTL and CLK signal pairs varies, as does the number of power and ground pins to the device.

This Chapter

This chapter describes the use of HyperTransport control and data packets to construct HyperTransport link transactions. Control packet types include Information, Request, and Response variants; data packets contain a payload of 0-64 valid bytes. The transmission, structure, and use of each packet type are presented.

The Next Chapter

The next chapter describes HyperTransport flow control, used to throttle the movement of packets across each link interface. On a high-performance connection such as HyperTransport, efficient management of transaction flow is nearly as important as the raw bandwidth made possible by clock speed and data bus width. Topics covered include background information on bus flow control and the initialization and use of the HyperTransport virtual channel flow control buffer mechanism defined for each transmitter-receiver pair.

The Packet-Based Protocol

HyperTransport employs a packet-based protocol in which all information (address, commands, and data) travels in packets that are multiples of four bytes each. Packets are used in link management (e.g. flow control and error reporting) and as building blocks in constructing more complex transactions such as read and write data transfers.

Page 19: HyperTransport 3.1 Interconnect Technology

It should be noted that, while packet descriptions in this chapter are in terms of bytes, the link’s bidirectional interface width (2, 4, 8, 16, or 32 bits) ultimately determines the amount of packet information sent during each bit time on HyperTransport links. There are two bit times per clock period.

Before looking at packet function and use, the following sections describe the mechanics of packet delivery over 2-, 4-, 8-, 16-, and 32-bit scalable link interfaces.

8 Bit Interfaces

For 8-bit interfaces, one byte of packet information may be sent in each bit time. For example, a 4-byte request packet would be sent by the transmitter during four adjacent bit times, least significant byte first as shown in Figure 4-1 on page 64. Total time to complete a four-byte packet is two clock periods.
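The general relationship, checked against this example and against the narrower- and wider-interface cases in the next sections, can be written in a couple of lines (a sketch with our own function names; the widths and packet sizes are those the text uses):

```c
/* Illustrative helpers (names are ours): how long a packet occupies the link.
 * There are two bit times per clock period (data moves on both clock edges). */
static unsigned ht_bit_times(unsigned packet_bytes, unsigned width_bits)
{
    /* Packets are multiples of 4 bytes and widths are 2..32, so this divides evenly. */
    return (packet_bytes * 8u) / width_bits;
}

static unsigned ht_clock_periods(unsigned packet_bytes, unsigned width_bits)
{
    return (ht_bit_times(packet_bytes, width_bits) + 1u) / 2u; /* rounded up */
}

/* A 4-byte packet:  8-bit link -> 4 bit times, 2 clock periods (as above)
 *                   2-bit link -> 16 bit times, 8 clock periods (next section)
 *                  32-bit link -> 1 bit time, i.e. half a clock period       */
```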

Figure 4-1: Four Byte Packet On An 8-Bit Interface

[Timing diagram: Device A drives CAD0-7 to Device B. Bytes A, B, C, and D of the four-byte packet are sent in four consecutive bit times, byte 0 (A) first, with two bit times per clock period.]

Page 20: HyperTransport 3.1 Interconnect Technology

Interfaces Narrower Than 8 Bits

For link interfaces that are narrower than 8 bits, the first byte of packet information is shifted out over multiple bit times, least-significant bits first. Referring to Figure 4-2 on page 65, a 2-bit interface would require four bit times to transmit each byte of information. After the first byte is sent, subsequent bytes in the packet are shifted out in the same manner. Total time to complete a four-byte packet: eight clock periods.

Figure 4-2: Four Byte Packet On A 2-Bit Interface

[Timing diagram: Device A drives CAD0-1 to Device B. Each packet byte is shifted out two bits at a time over four bit times, least-significant bits first; the sixteen 2-bit transfers (A through P) deliver the four-byte packet in eight clock periods.]

Page 21: HyperTransport 3.1 Interconnect Technology

Interfaces Wider Than 8 Bits

For 16- or 32-bit interfaces, packet delivery is accelerated by sending multiple bytes of packet information in parallel with each other.

16 Bit Interfaces

On 16-bit interfaces, two bytes of information may be sent in each bit time. Referring to Figure 4-3 on page 66, note that even-numbered bytes travel on the lower portion of the 16-bit interface, odd-numbered bytes on the upper portion.

Figure 4-3: Four Byte Packet On A 16-Bit Interface

[Timing diagram: on a 16-bit interface, even-numbered bytes (0 and 2) travel on CAD0-7 while odd-numbered bytes (1 and 3) travel in parallel on CAD8-15; the four-byte packet completes in two bit times.]

Page 22: HyperTransport 3.1 Interconnect Technology

6 I/O Ordering

The Previous Chapter

The previous chapter described HyperTransport flow control, used to throttle the movement of packets across each link interface. On a high-performance connection such as HyperTransport, efficient management of transaction flow is nearly as important as the raw bandwidth made possible by clock speed and data bus width. Topics covered included background information on bus flow control and the initialization and use of the HyperTransport virtual channel flow control buffer mechanism defined for each transmitter-receiver pair.

This Chapter

This chapter describes the ordering rules that apply to HyperTransport packets. Attribute bits in request and response packets are configured according to application requirements. Additional ordering requirements, used when interfacing to PCI, PCI-X, PCIe, and AGP, can be found in Chapter 23, entitled "I/O Compatibility," on page 525.

The Next Chapter

In the next chapter, examples are presented which apply the packet principles described in the preceding chapter. The examples also entail more complex system transactions than discussed previously, including reads, posted and non-posted writes, and atomic read-modify-write operations.

The Purpose Of Ordering Rules

Some of the important reasons for enforcing ordering rules on packets moving through HyperTransport include the following:

Page 23: HyperTransport 3.1 Interconnect Technology

Maintain Data Correctness

If transactions are in some way dependent on each other, a method is required to assure that they complete in a deterministic way. For example, if Device A performs a write transaction targeting main memory and then follows it with a read request targeting the same location, what data will the read transaction return? HyperTransport ordering seeks to make such events predictable (deterministic) and to match the intent of the programmer. Note that, compared to a shared bus such as PCI, HyperTransport transaction ordering is complicated somewhat by point-to-point connections which result in target devices on the same chain (logical bus) being at different levels of fabric hierarchy.

Avoid Deadlocks

Another reason for ordering rules is to prevent cases where two separate transactions are each dependent on the other completing first. HyperTransport ordering includes a number of rules for deadlock avoidance. Some of the rules are in the specification because of known deadlock hazards associated with other buses to which HyperTransport may interface (e.g. PCI).

Support Legacy Buses

One of the principal roles of HyperTransport is to serve as a backbone bus which is bridged to other peripheral buses. HyperTransport explicitly supports PCI, PCI-X, and AGP and the ordering requirements of those buses.

Maximize Performance

Finally, HyperTransport permits devices in the path to the target, and the target itself, some flexibility in reordering packets around each other to enhance performance. When acceptable, relaxed ordering may be enabled by the requester on a per-transaction basis using attribute bits in request and response packets.

Page 24: HyperTransport 3.1 Interconnect Technology

Introduction: Three Types Of Traffic Flow

HyperTransport defines three types of traffic: Programmed I/O (PIO), Direct Memory Access (DMA), and Peer-to-Peer. Figure 6-1 on page 127 depicts the three types of traffic.

1. Programmed I/O traffic originates at the host bridge on behalf of the CPU and targets I/O or memory-mapped I/O in one of the peripherals. These types of transactions are often generated by the CPU to set up peripherals for bus master activity, check status, program configuration space, etc.

2. DMA traffic originates at a bus master peripheral and typically targets main memory. It may also originate from a DMA controller located within the processor device outside of the CPU itself. This traffic is used so that the CPU may be offloaded from the burden of moving large amounts of data to and from the I/O subsystem. Generally, the CPU uses a few PIO instructions to program the peripheral device with information about a required DMA transfer (transfer size, target address in memory, read or write, etc.), then performs some other task while the DMA transfer is carried out. When the transfer is complete, the DMA device may generate an interrupt message to inform the CPU.

3. Peer-to-Peer traffic is generated by an interior node and targets another interior node. In HyperTransport, direct peer-to-peer traffic is not allowed. As indicated in Figure 6-1 on page 127, the request is issued upstream and must travel to the host bridge. The host bridge examines the address and determines whether the request should be reflected downstream. If the request is non-posted, the response will similarly travel from the target back up to the host bridge and then be reissued to the original requester.

Figure 6-1: PIO, DMA, And Peer-to-Peer Traffic

[Diagram of the three traffic types, each shown on a Bus 0 chain with a host bridge and tunnel: PIO flows from the host bridge (source, UnitID 0) downstream to a target (UnitID 2); DMA flows from a source peripheral (UnitID 2) upstream through a tunnel to a memory target behind the host bridge; peer-to-peer traffic flows from a source (UnitID 2) upstream to the host bridge, which reflects it back downstream to the target (UnitID 1).]

Page 25: HyperTransport 3.1 Interconnect Technology

The Ordering Rules

HyperTransport packet ordering rules are divided into groups: general rules, rules for upstream I/O ordering, and rules for downstream ordering. Even the peer-to-peer example in Figure 6-1 on page 127 can be broken into two parts: the request moving to the bridge (covered by upstream ordering rules) and the reflection of the request downstream to the peer-to-peer target (covered by downstream I/O ordering rules). Refer to Chapter 23, entitled "I/O Compatibility," on page 525 for a discussion of ordering when packets move between HyperTransport and another protocol (PCI, PCI-X, or AGP).

General I/O Ordering Limits

Ordering Covers Targets At Same Hierarchy Level

Ordering rules only apply to the order in which operations are detected by targets at the same level in the HyperTransport fabric hierarchy. Referring to Figure 6-2 on page 128, assume that two peer-to-peer writes targeting devices on two different chains have been performed by the end device in chain 0.

Figure 6-2: Targets At Different Levels In Hierarchy And In Different Chains

[Diagram: an HT host bridge heads three chains (chain 0, chain 1, and chain 2) containing HT-to-PCI-X, HT-to-GbE, HT-to-PCI, and HT-to-SCSI tunnels. An I/O hub at the end of chain 0 issues Request A and Request B toward targets on the other chains, which sit at different levels of the fabric hierarchy.]

Page 26: HyperTransport 3.1 Interconnect Technology

7 Transaction Examples

The Previous Chapter

The previous chapter described the ordering rules that apply to HyperTransport packets. Attribute bits in request and response packets are configured according to application requirements. Additional ordering requirements, used when interfacing to PCI, PCI-X, PCIe, and AGP, can be found in Chapter 23, entitled "I/O Compatibility," on page 525.

This Chapter

In this chapter, examples are presented that apply the packet principles described in the preceding chapters and involve more complex system transactions than previously discussed. The examples include reads, posted and non-posted writes, and atomic read-modify-write operations.

The Next Chapter

HT uses an interrupt signaling scheme similar to PCI's Message Signaled Interrupts. The next chapter defines how HT delivers interrupts to the Host Bridge via posted memory writes. It also defines an End of Interrupt message and details the mechanism that HT uses for configuring and setting up interrupt transactions (which is different from the PCI-defined mechanisms).

Transaction Examples: Introduction

The following sections describe some transactions that might be seen in HyperTransport. Not all cases are covered, but the ones that are presented are intended to illustrate two important aspects of each transaction:

Page 27: HyperTransport 3.1 Interconnect Technology

Packet Format And Optional Features

Most of the control packet variants have a number of fields used to enable optional features: isochronous vs. standard virtual channels, byte vs. dword data transfers, posted vs. non-posted channels, etc. In each of the examples, the key options are described; in some cases, certain packet fields don't apply to a particular request or response at all. Refer to Chapter 4, entitled "Packet Protocol," on page 63 for low-level, bit-by-bit packet field descriptions.

General Sequence Of Events

Once the packet fields are determined, the next topic covered in the transaction examples is a general summary of events which must occur in the HyperTransport topology to perform the transaction. The description starts at the agent initiating an information or request control packet and follows it (and possibly a data packet) to the target; if there is a response required, this packet (and any data associated with it) is similarly followed from the target back to the original requester. Packet movement through HyperTransport involves aspects of topics described in more detail in other chapters, including flow control, packet ordering and routing, error handling, etc.

For each of the examples, assume the link interface CAD bus is eight bits in each direction. This simplifies the discussion of packet bytes moving across the connection: each byte of packet content is sent in one bit time. The only difference, when link width is greater or less than 8 bits, is how the packet bytes are shifted out and received. For example, on a 2-bit interface each byte of packet content is shifted onto the link over 4 bit times. For an interface 32 bits wide, four bytes of packet content are sent in parallel with each other in each bit time.

Page 28: HyperTransport 3.1 Interconnect Technology

Example 1: NOP Information Packet

Situation: Device 2 in Figure 7-1 on page 147 is using a NOP packet to inform Device 1 about the availability of two new Posted Request (CMD) entries and one new Response (Data) entry in its receiver flow control buffers.

Figure 7-1: Example 1: NOP Information Packet With Buffer Updates

[Diagram: Device 1 and Device 2 are connected by CAD0-7, CTL, and CLK in each direction; Device 2 transmits a four-byte NOP packet reporting the state of its receiver flow control buffers. NOP packet fields and the values used in this example:

Byte 0: Cmd[5:0] = 000000b (NOP), DisCon = 0b
Byte 1: PostCmd[1:0] = 10b (2), PostData[1:0] = 00b, Response[1:0] = 00b, ResponseData[1:0] = 01b (1)
Byte 2: NonPostCmd[1:0] = 00b, NonPostData[1:0] = 00b, Isoc = 0b, Diag = 0b
Byte 3: RxNextPktToAck[7:0] = 25h]

Page 29: HyperTransport 3.1 Interconnect Technology

Example 1: NOP Packet Setup

(Refer to Figure 7-1 on page 147.)

Command[5:0] Field (Byte 0, Bits 5:0)

The NOP information packet command code. There are no option bits within this field. For this example, field = 000000b.

DisCon Bit Field (Byte 0, Bit 6)

This bit is set to indicate an LDTSTOP# sequence is beginning. Not active for this example. Refer to "HT Link Disconnect/Reconnect Sequence" on page 217 for a discussion of the DisCon bit and the LDTSTOP# sequence. For this example, field = 0b.

PostCMD[1:0] Field (Byte 1, Bits 1:0)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver’s Posted Request (CMD) flow control buffers which have become available since the last NOP reporting this information was sent. In our example, assume that two new entries have become available and this field = 10b.

PostData[1:0] Field (Byte 1, Bits 3:2)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver’s Posted Request (Data) flow control buffers that have become available since the last NOP reporting this information was sent. In our example, assume no new entries in this buffer have become available and this field = 00b.

Response[1:0] Field (Byte 1, Bits 5:4)

This NOP field dynamically reports the number of new entries (0-3) in the receiver's Response (CMD) flow control buffers which have become available since the last NOP reporting this information was sent. In our example, assume no new entries in this buffer have become available. For this example, field = 00b.

ResponseData[1:0] Field (Byte 1, Bits 7:6)

This NOP field is used by all devices to dynamically report the number of new entries (0-3) in the receiver’s Response (Data) flow control buffers which have become avail-able since the last NOP reporting this information was sent. In our example, assume one new entry in this buffer has become available. For this example, field = 01b.
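Putting the fields above together, here is a hedged sketch of how Example 1's NOP packet could be packed into its four bytes. The helper is ours, not a specification API; the byte 0 and byte 1 bit positions follow the field descriptions in this section, and the Isoc/Diag positions in byte 2 are our reading of Figure 7-1.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: pack the NOP fields described above into four bytes. */
static void ht_build_nop(uint8_t pkt[4],
                         unsigned discon,
                         unsigned post_cmd, unsigned post_data,
                         unsigned response, unsigned response_data,
                         unsigned nonpost_cmd, unsigned nonpost_data,
                         unsigned isoc, unsigned diag,
                         uint8_t rx_next_pkt_to_ack)
{
    pkt[0] = (uint8_t)(0x00u | ((discon & 1u) << 6));   /* Cmd = 000000b, DisCon at bit 6 */
    pkt[1] = (uint8_t)((post_cmd & 3u)              |   /* bits 1:0 */
                       ((post_data & 3u)      << 2) |   /* bits 3:2 */
                       ((response & 3u)       << 4) |   /* bits 5:4 */
                       ((response_data & 3u)  << 6));   /* bits 7:6 */
    pkt[2] = (uint8_t)((nonpost_cmd & 3u)           |   /* bits 1:0 */
                       ((nonpost_data & 3u)   << 2) |   /* bits 3:2 */
                       ((isoc & 1u)           << 5) |   /* assumed position */
                       ((diag & 1u)           << 6));   /* assumed position */
    pkt[3] = rx_next_pkt_to_ack;                        /* RxNextPktToAck[7:0] */
}

int main(void)
{
    uint8_t nop[4];
    /* Example 1: two new Posted Request (CMD) entries, one new Response (Data)
     * entry, RxNextPktToAck = 25h. */
    ht_build_nop(nop, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0x25);
    printf("%02X %02X %02X %02X\n", nop[0], nop[1], nop[2], nop[3]); /* 00 42 00 25 */
    return 0;
}
```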
