Contents of the Lecture


1. Introduction
2. Methods for I/O Operations
3. Buses
4. Liquid Crystal Displays
5. Other Types of Displays
6. Graphics Adapters
7. Optical Discs

12/21/2017 Input/Output Systems and Peripheral Devices (06-1)


6. Graphics Adapters

Structure of a Graphics Adapter
Video Memory
Graphics Accelerators
3D Accelerators
Graphics Processing Units
Digital Interfaces for Monitors



Structure of a Graphics Adapter (1)




Graphics controller: implements the main functions of the graphics adapter. It contains the following units:

System bus interface: transfers data in burst mode; reads the video memory with no wait states; uses a FIFO memory for efficient writes to the video memory.




Video memory interface: allows the video images to be updated.

VGA registers and control registers: enable the video adapter to be programmed for operation in VGA modes. Some adapters are no longer compatible with the VGA standard.

Cursor generator

Graphic functions: implemented by graphics accelerators.


Structure of a Graphics Adapter (4)

Video BIOS: provides video functions for accessing the graphics adapter. The BIOS programs of different adapters differ, which makes programming difficult. The VESA (Video Electronics Standards Association) standard defines BIOS functions for high-resolution modes.

Video memory: holds the video image (the frame buffer).




RAMDAC circuit (RAM Digital-to-Analog Converter): reads the digital image from the video memory and converts it into analog signals. The RAMDAC functions may be integrated into the graphics controller. A RAMDAC is only required for displays with analog inputs; displays that operate in the digital domain must reconvert the analog signals to digital form when they are connected through an analog interface.




Video ports: enable the video images to be transferred to a monitor. There are several variants of video ports:

VGA (Video Graphics Array): analog interface designed for CRT displays, but also used by some liquid crystal displays; susceptible to electrical noise; 15-pin DE-15 connector (often called DB-15).




VIVO (Video In Video Out): analog interface for connecting TV sets, DVD players, and game consoles (TV out). Signals: S-Video (Y/C), composite video, component video (e.g., RGB). 9-pin mini-DIN connector.

DVI (Digital Visual Interface): digital interface with a DVI-I connector (digital and analog signals) or a DVI-D connector (digital signals only).



HDMI (High-Definition Multimedia Interface): digital interface for uncompressed video data; digital audio data can be sent over the same cable; 19-pin (single-link) or 29-pin (dual-link) connector.

DisplayPort: digital interface for video and audio data, intended to replace the VGA and DVI interfaces; 20-pin connectors for 1, 2, or 4 lanes.







Video Memory (1)

Video memory can be single-ported or dual-ported.

Single-ported video memory: the same data port is used both to refresh the screen and to write new data.

Dual-ported video memory: one port is used to update the images in memory; the second port has serial access and is used to refresh the images on the screen.



Video Memory (2)

Video memory transfer rate: the maximum transfer rate (bandwidth) is affected by the video memory technology and access time. The bandwidth has to be shared by the screen refresh circuits, the CPU, and the graphics controller; 30 .. 50% of the bandwidth should be reserved for functions other than screen refreshing.


Video Memory (3)

Peak rates of 64-bit (8-byte) memory modules (transfers per second x 8 bytes):

DDR-400 (PC3200) memory: maximum transfer rate 3,200 MB/s; average transfer rate ~1,600 MB/s

DDR2-667 (PC2-5300) memory: maximum transfer rate 5.336 GB/s

DDR3-2133 (PC3-17000) memory: maximum transfer rate ~17 GB/s

DDR4-3200 (PC4-25600) memory: maximum transfer rate 25.6 GB/s
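The peak figures above follow from multiplying the transfer rate by the bus width. The sketch below is a hedged illustration that assumes 64-bit modules and a hypothetical 1920 x 1080, 32-bit color, 60 Hz display mode; it also estimates how much bandwidth screen refreshing alone consumes, which is what the 30 .. 50% reservation guideline is about.

```python
def peak_rate_mb_s(mega_transfers_per_s: float, bus_width_bits: int = 64) -> float:
    """Peak transfer rate in MB/s; assumes a 64-bit (8-byte) module by default."""
    return mega_transfers_per_s * bus_width_bits / 8


def refresh_load_mb_s(width: int, height: int, bytes_per_pixel: int, refresh_hz: float) -> float:
    """Bandwidth in MB/s needed to read the whole frame buffer on every screen refresh."""
    return width * height * bytes_per_pixel * refresh_hz / 1e6


# Peak rates of the modules listed above (MT/s = million transfers per second)
for name, mts in [("DDR-400", 400), ("DDR2-667", 667), ("DDR3-2133", 2133), ("DDR4-3200", 3200)]:
    print(f"{name}: {peak_rate_mb_s(mts):,.0f} MB/s")

# Hypothetical display mode: 1920 x 1080 pixels, 4 bytes per pixel, 60 Hz
refresh = refresh_load_mb_s(1920, 1080, 4, 60)
average_ddr400 = 1600  # average DDR-400 rate quoted above, in MB/s
print(f"refresh needs {refresh:.1f} MB/s = {refresh / average_ddr400:.0%} of the DDR-400 average rate")
```

With these assumptions, screen refreshing alone takes about 31% of the DDR-400 average rate, leaving the rest for the CPU and the graphics controller.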



Video Memory (4)

GDDR (Graphics Double Data Rate): designed by ATI Technologies in collaboration with the JEDEC committee. Several versions: GDDR2 .. GDDR5.

GDDR2 and GDDR3: based on DDR2 technology. GDDR4 and GDDR5: based on DDR3 technology.

Low voltage (1.8 V .. 1.5 V): reduced power consumption and heat output. Separate data strobe signals for read and write.



Video Memory (5)

GDDR5: combines high performance with stable operation and low implementation costs.
Memory organization: 32 data lines (x32).
Differential command clock signal (CK, CK#).
Two differential write clock signals (WCK, WCK#); two data bytes are aligned to each WCK signal.

Example for a data rate of 5 Gbit/s per pin:

fCK = 1.25 GHz; fWCK = 2.5 GHz
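The clock values in this example follow from two relations consistent with the figures above: data is transferred on both edges of WCK, and WCK runs at twice the CK frequency. The sketch below is a minimal illustration that assumes exactly these relations and the x32 organization; it also converts a per-pin data rate into the aggregate device bandwidth.

```python
def gddr5_clocks(data_rate_gbps: float) -> tuple[float, float]:
    """Return (fCK, fWCK) in GHz for a per-pin data rate in Gbit/s.

    Assumes double data rate on WCK (2 bits per WCK cycle) and fWCK = 2 x fCK.
    """
    f_wck = data_rate_gbps / 2
    f_ck = f_wck / 2
    return f_ck, f_wck


def gddr5_bandwidth_gb_s(data_rate_gbps: float, data_pins: int = 32) -> float:
    """Aggregate bandwidth in GB/s: per-pin rate x number of data pins / 8 bits per byte."""
    return data_rate_gbps * data_pins / 8


print(gddr5_clocks(5.0))          # (1.25, 2.5), the example above
print(gddr5_bandwidth_gb_s(5.0))  # 20.0 GB/s for a x32 device
```

The same function gives 16 GB/s at 4 Gbit/s per pin and 28 GB/s at 7 Gbit/s per pin.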



Video Memory (6)

Data bus inversion: reduces the number of zero bits transmitted. The inversion is indicated by a DBI# signal for each byte. Because the transmission lines are terminated to the high level, only zero bits draw termination current, so power dissipation is reduced.

Address bus inversion

Signal training: phase adjustment of the clock, data, and address signals.
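Data bus inversion can be sketched per byte as follows. The threshold is an assumption for illustration: a byte is inverted when more than four of its eight bits are 0, so the data lines never carry more than four zeros. The returned flag stands for asserting the DBI# signal, and the function names are hypothetical.

```python
def dbi_encode(byte: int) -> tuple[int, bool]:
    """Return (value driven on the 8 data lines, inverted flag) for one data byte.

    Assumed rule: invert when more than 4 of the 8 bits are 0.
    """
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        # Inverting turns 5..8 zeros into 3..0 zeros, saving termination power
        return byte ^ 0xFF, True
    return byte, False


def dbi_decode(bus_value: int, inverted: bool) -> int:
    """Receiver side: undo the inversion signalled by DBI#."""
    return bus_value ^ 0xFF if inverted else bus_value


for b in (0x00, 0x0F, 0x81, 0xFE):
    sent, flag = dbi_encode(b)
    print(f"{b:08b} -> {sent:08b} inverted={flag}")
```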


Video Memory (7)

Address training: alignment of the address bus to the CK clock signal.
Alignment of the WCK signals to the CK signal.
Data training: alignment of the data lines to the corresponding WCK signal. A “hidden” data re-training is possible.

Calibration: improves reliability. Auto-calibration of the drive strength and termination impedance; software-controlled adjustment is also possible.



Video Memory (8)

Burst read/write access to the internal memory: 8 bits per pin, i.e., 256 bits for the 32 data pins (two CK cycles).

Maximum transfer rates of 4 .. 7 Gbit/s per pin, i.e., 16 .. 28 GB/s for 32 pins.

Error detection: dedicated EDC (Error Detection Code) pins send CRC codes to the controller. A CRC code is computed for each data byte together with its DBI# line, which allows single-bit and double-bit errors to be detected.
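The EDC mechanism can be illustrated with a bitwise CRC-8. This is a sketch under stated assumptions: the generator polynomial is assumed to be x^8 + x^2 + x + 1 (0x07), the 72 bits of one byte lane over an 8-transfer burst (8 data bits plus the DBI# bit per transfer) are packed into 9 bytes in a hypothetical order, and the exact bit ordering of the JEDEC specification is not modeled. The check confirms that every single-bit and double-bit error in those 72 bits changes the CRC.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, initial value 0; assumed generator x^8 + x^2 + x + 1."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def flip_bits(data: bytes, *positions: int) -> bytes:
    """Copy of data with the given bit positions inverted (bit 0 = MSB of byte 0)."""
    buf = bytearray(data)
    for p in positions:
        buf[p // 8] ^= 0x80 >> (p % 8)
    return bytes(buf)


# Hypothetical burst layout on one byte lane: 8 data bytes, then the 8 DBI# bits
burst = bytes([0x3C, 0xA5, 0x00, 0xFF, 0x12, 0x34, 0x56, 0x78, 0b10100000])
good = crc8(burst)
n = len(burst) * 8  # 72 bits
single_ok = all(crc8(flip_bits(burst, i)) != good for i in range(n))
double_ok = all(crc8(flip_bits(burst, i, j)) != good
                for i in range(n) for j in range(i + 1, n))
print(single_ok, double_ok)  # True True
```

Because the CRC is linear, an error pattern goes undetected only if the generator polynomial divides it; for this polynomial that cannot happen with one or two flipped bits within 72 bits.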



Video Memory (9)

Power management: features that allow power to be consumed only when it is needed:
Scalable clock frequency and data rate: 5 Gbit/s .. 200 Mbit/s
Low power mode for the DRAM core
Multiple levels of termination impedance: the impedance is increased at slower data rates
Low supply voltage: 1.5 V
Data and address bus inversion



Graphics Accelerators (1)

Graphics accelerators contain specialized circuits that execute the mathematical operations required for graphics rendering.

They relieve the CPU of the task of executing these operations.

The first graphics accelerators were AVGA (Accelerated VGA) adapters; they were followed by 2D accelerators. The link between the accelerator circuitry and the OS is made via a driver.



Graphics Accelerators (2)

Common 2D graphics functions:

BitBlt (Bit Block Transfer): two bitmaps are combined with a raster operation (a Boolean operator), and the result is transferred to the destination area. Blitter: a dedicated circuit for the BitBlt operation.
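A BitBlt with a raster operation can be sketched on small bitmaps stored as lists of rows. The ROP names below are borrowed from the classic Windows GDI constants purely as labels; the function and the table are hypothetical, and no clipping against the destination bounds is performed.

```python
from typing import Callable

Bitmap = list[list[int]]

# Boolean raster operations: (source pixel, destination pixel) -> result pixel.
# Names borrowed from Windows GDI as labels only.
ROPS: dict[str, Callable[[int, int], int]] = {
    "SRCCOPY": lambda s, d: s,        # copy the source
    "SRCAND": lambda s, d: s & d,     # AND source with destination
    "SRCPAINT": lambda s, d: s | d,   # OR source with destination
    "SRCINVERT": lambda s, d: s ^ d,  # XOR source with destination
}


def bitblt(dst: Bitmap, dx: int, dy: int, src: Bitmap, rop: str) -> None:
    """Combine src with the block of dst whose top-left corner is (dx, dy), in place."""
    op = ROPS[rop]
    for y, row in enumerate(src):
        for x, s in enumerate(row):
            dst[dy + y][dx + x] = op(s, dst[dy + y][dx + x])


screen = [[0xF0] * 4 for _ in range(3)]
sprite = [[0xFF, 0x0F]]
bitblt(screen, 1, 2, sprite, "SRCINVERT")
print([hex(p) for p in screen[2]])  # ['0xf0', '0xf', '0xff', '0xf0']
```

XOR is a popular choice for moving cursors and sprites, because applying the same XOR BitBlt a second time restores the original background.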

Drawing lines, rectangles, and circles; filling surfaces or polygons; adding color.
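Accelerators trace lines with incremental integer algorithms. As one well-known example (the lecture does not say which algorithm a particular accelerator uses), Bresenham's algorithm picks, at each step, the pixel closest to the ideal line using only additions and comparisons. Its output is exactly the staircase of horizontal and vertical segments that anti-aliasing later tries to smooth.

```python
def bresenham(x0: int, y0: int, x1: int, y1: int) -> list[tuple[int, int]]:
    """Bresenham's line algorithm (one common choice); works in all octants."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # combined error term, kept in integers
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # error allows a step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # error allows a step in y
            err += dx
            y0 += sy
    return points


print(bresenham(0, 0, 5, 2))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```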



Graphics Accelerators (3)

Multimedia accelerators: graphics accelerators extended with audio and video acceleration functions. Functions:

Decoding audio data streams; scaling video images in the x and y directions; converting digital video signals into RGB signals; decompressing video images represented in various formats.
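Converting digital video into RGB is a per-pixel linear transform. The sketch below assumes full-range 8-bit YCbCr with the ITU-R BT.601 coefficients as used in JPEG/JFIF; broadcast video usually uses a limited range (Y from 16 to 235) and, for HD, BT.709 coefficients, which would change the constants.

```python
def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    """Assumes full-range BT.601 YCbCr (JFIF convention); returns 8-bit RGB."""
    def clamp(v: float) -> int:
        return max(0, min(255, round(v)))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


print(ycbcr_to_rgb(128, 128, 128))  # (128, 128, 128): mid gray, no chroma
print(ycbcr_to_rgb(76, 85, 255))    # (254, 0, 0): close to pure red
```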



3D Accelerators

3D Images
3D Operations



3D Images (1)

3D images are managed using abstract models. An object is represented as a set of points defined by their x, y, and z coordinates (the positions of the vertices). If the object's vertices are connected with lines, surfaces are obtained, which can be filled with a certain color or texture. Each 3D object is composed of a large number of triangles (or polygons) that describe its surface.



3D Images (2)

Animated 3D graphics requires a series of geometry computations that define the position of objects in 3D space.

The geometry computations that handle the vertices of triangles can be performed by the CPU or by the graphics processor.

The graphics processor must convert these triangles into solid surfaces, which requires intensive computations.



3D Images (3)

In the real world, objects interact with each other. Complex mathematical equations are used to determine whether an object is visible in a scene from a given angle. Besides the color components, an alpha value must also be stored for each pixel.

The alpha value indicates the degree of transparency of the pixel in the final image.



3D Images (4)

Another piece of information that must be stored is the depth in space, or z coordinate.

When several objects cover the same screen position, the accelerator compares the z values of their pixels and displays the one with the smaller z value (the one closest to the viewer). The pixels' depth information is stored in a separate buffer, the z-buffer. Usually, 32 bits are allocated in the z-buffer for each pixel.
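The z-buffer test can be sketched as follows. For readability the sketch stores depths as Python floats initialized to infinity rather than 32-bit integers, and the class name is hypothetical; the rule is the one described above: a new pixel is written only if its z value is smaller than the stored one.

```python
import math


class ZBuffer:
    """Frame buffer plus depth buffer with a 'smaller z wins' depth test.

    Simplification: float depths instead of 32-bit integer depths.
    """

    def __init__(self, width: int, height: int, background: int = 0x000000):
        self.depth = [[math.inf] * width for _ in range(height)]
        self.color = [[background] * width for _ in range(height)]

    def plot(self, x: int, y: int, z: float, color: int) -> bool:
        """Write the pixel if it is closer than what is stored; report whether it was written."""
        if z < self.depth[y][x]:
            self.depth[y][x] = z
            self.color[y][x] = color
            return True
        return False


zb = ZBuffer(4, 3)
zb.plot(1, 1, 5.0, 0xFF0000)  # red surface at depth 5: written
zb.plot(1, 1, 8.0, 0x00FF00)  # green surface behind it: rejected
zb.plot(1, 1, 2.0, 0x0000FF)  # blue surface in front: written
print(hex(zb.color[1][1]))    # 0xff
```

Because every pixel is resolved independently, the objects can be drawn in any order and the closest one still ends up visible.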



3D Images (5)

Each time the image is updated, the color and depth of the pixels must be recomputed.

Applying the various 3D computations to the scene is the process of rendering. Rendering fills in all of the points on the surface of an object that was previously stored only as a set of vertices, so that a solid object with 3D effects is drawn on the monitor.





3D Operations (1)

3D operations are performed in two stages:

Geometry stage: clipping, transformation, lighting.
Rendering stage: shading, texture mapping with perspective correction, texture filtering, alpha blending.
On current 3D accelerators, operations in both stages are performed by the graphics processor.



3D Operations (3)

Clipping: determines what part of an object is visible on the screen and eliminates the parts that are not visible.

Lighting: objects are lit by the light sources modeled in the scene. Lighting effects create color shading, light reflections, and shadows.



3D Operations (4)

Transformation:
Translation: moving every point by a fixed distance in the same direction.
Reflection: transforming an object into its mirror image.
Glide reflection: combining a reflection with a translation along the reflection axis.
Scaling: a linear transformation that changes the size of objects.
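Graphics hardware usually expresses these transformations as 4 x 4 matrices acting on homogeneous coordinates (x, y, z, 1), so that translation, like scaling and reflection, becomes a matrix product and several transformations can be combined into a single matrix. A minimal sketch with hypothetical helper names; the glide reflection is built as a reflection in the plane x = 0 followed by a translation along y, which lies in the mirror plane.

```python
Matrix = list[list[float]]
Point = tuple[float, float, float]


def translation(tx: float, ty: float, tz: float) -> Matrix:
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(sx: float, sy: float, sz: float) -> Matrix:
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def reflection_x() -> Matrix:
    """Mirror image in the plane x = 0 (a scaling by -1 along x)."""
    return scaling(-1, 1, 1)


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b: the result applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transform(m: Matrix, p: Point) -> Point:
    v = (*p, 1)  # homogeneous coordinates with w = 1
    x, y, z = (sum(m[i][k] * v[k] for k in range(4)) for i in range(3))
    return (x, y, z)


# Glide reflection: mirror in x = 0, then translate by 2 along y (parallel to the mirror)
glide = compose(translation(0, 2, 0), reflection_x())
print(transform(glide, (1, 1, 0)))  # (-1, 3, 0)
```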



3D Operations (5)

Tessellation: dividing polygons into smaller structures for rendering. Dividing polygons into triangles is called triangulation.
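The simplest triangulation is a fan: for a convex polygon with n vertices listed in order, connecting vertex 0 to every other vertex yields n - 2 triangles. This sketch only handles convex polygons (concave polygons need a method such as ear clipping), and it returns the triangles as index triples into the vertex list; the function names are hypothetical.

```python
def fan_triangulate(vertices: list[tuple[float, float]]) -> list[tuple[int, int, int]]:
    """Triangulate a convex polygon (convex only); returns index triples sharing vertex 0."""
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return [(0, i, i + 1) for i in range(1, n - 1)]


def triangle_area(a, b, c) -> float:
    """Unsigned area of a 2D triangle (half the magnitude of the cross product)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
tris = fan_triangulate(square)
print(tris)  # [(0, 1, 2), (0, 2, 3)]
print(sum(triangle_area(*(square[i] for i in t)) for t in tris))  # 4.0, the area of the square
```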



3D Operations (6)

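Shading: enables the realistic representation of 3D objects on 2D screens. Algorithms: Gouraud, Phong. Gouraud shading reads the color information of the vertices and interpolates the intensities of the color components across the polygon; Phong shading interpolates the surface normals instead and computes the lighting for each pixel.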
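Gouraud shading can be sketched with barycentric coordinates: the color at a point inside a triangle is the weighted sum of the three vertex colors, the weights being the point's barycentric coordinates. This is a simplified screen-space version with hypothetical function names; real rasterizers evaluate it incrementally along scanlines and usually apply perspective correction.

```python
Point = tuple[float, float]
Color = tuple[float, float, float]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> tuple[float, float, float]:
    """Weights (wa, wb, wc) with p = wa*a + wb*b + wc*c and wa + wb + wc = 1."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def gouraud(p: Point, verts: tuple[Point, Point, Point], colors: tuple[Color, Color, Color]) -> Color:
    """Interpolate the vertex colors at p, channel by channel (screen space, no perspective correction)."""
    wa, wb, wc = barycentric(p, *verts)
    return tuple(wa * ca + wb * cb + wc * cc for ca, cb, cc in zip(*colors))


tri = ((0, 0), (4, 0), (0, 4))
rgb = ((255, 0, 0), (0, 255, 0), (0, 0, 255))  # red, green, blue vertices
print(gouraud((2, 0), tri, rgb))  # (127.5, 127.5, 0.0): halfway between red and green
```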



3D Operations (7)

Texture mapping: adding surface details (textures) to the polygons that represent objects. Steps:

Loading the texture elements (texels) from a bitmap; combining the texels; writing the resulting pixel to video memory.

A single texture can be applied, or multi-texturing can be used, in which a combination of textures is applied to an object.



3D Operations (8)

Textures may require a large amount of memory, so compression is used. Textures must be corrected to create the perspective effect.



3D Operations (9)

Texture filtering: reduces some unwanted effects that may occur with texture mapping. The color of a new pixel is determined by interpolating between the colors of several texels in the original texture. Bilinear filtering: uses the weighted average of the four texels nearest to the sampling point.



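3D Operations (10)

Trilinear filtering

The texture resolution is reduced as the distance to the object increases: 3D accelerators store in memory several variants of a texture at decreasing resolutions (“MIP mapping”). Trilinear filtering combines this feature with bilinear filtering by interpolating between bilinear samples taken from the two nearest MIP map levels.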
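Bilinear filtering, MIP mapping, and trilinear filtering can be sketched together on a grayscale texture stored as a list of rows. All assumptions here are illustrative: texel centers sit at integer coordinates, edges are clamped, the texture is square with a power-of-two side, each MIP level averages 2 x 2 texels of the previous one, and the level of detail is log2 of the number of texels covered by one pixel along an axis. The function names are hypothetical.

```python
import math

Texture = list[list[float]]


def bilinear(tex: Texture, u: float, v: float) -> float:
    """Weighted average of the four texels around texel coordinates (u, v); edges clamped."""
    h, w = len(tex), len(tex[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)  # u, v are non-negative here, so int() acts as floor()
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0
    top = (1 - fx) * tex[y0][x0] + fx * tex[y0][x1]
    bottom = (1 - fx) * tex[y1][x0] + fx * tex[y1][x1]
    return (1 - fy) * top + fy * bottom


def build_mipmaps(tex: Texture) -> list[Texture]:
    """Level 0 is the texture; each next level averages 2 x 2 blocks, down to 1 x 1."""
    levels = [tex]
    while len(levels[-1]) > 1:
        prev, n = levels[-1], len(levels[-1]) // 2
        levels.append([[(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
                         + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4
                        for x in range(n)] for y in range(n)])
    return levels


def level_of_detail(texels_per_pixel: float) -> float:
    """Assumed LOD rule: 0 when one texel maps to one pixel, +1 each time the footprint doubles."""
    return math.log2(max(texels_per_pixel, 1.0))


def trilinear(levels: list[Texture], s: float, t: float, lod: float) -> float:
    """Blend bilinear samples from the two MIP levels nearest to lod; s, t in [0, 1]."""
    lod = min(max(lod, 0.0), len(levels) - 1.0)
    lo = int(lod)
    hi = min(lo + 1, len(levels) - 1)
    frac = lod - lo

    def sample(level: int) -> float:
        size = len(levels[level])
        return bilinear(levels[level], s * (size - 1), t * (size - 1))

    return (1 - frac) * sample(lo) + frac * sample(hi)


tex = [[0, 0, 4, 4], [0, 0, 4, 4], [8, 8, 12, 12], [8, 8, 12, 12]]
mips = build_mipmaps(tex)            # 4 x 4, 2 x 2, 1 x 1
print(mips[1], mips[2])              # [[0.0, 4.0], [8.0, 12.0]] [[6.0]]
print(trilinear(mips, 1.0, 1.0, 1.5))  # 9.0: halfway between levels 1 and 2
```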



3D Operations (11)

Fogging: gradually fading objects in the distance. The scene appears more realistic (illusion of distant objects). Fogging also allows the 3D processing to be performed faster, because objects hidden by the fog do not need to be rendered in detail.

Alpha blending: used to create the transparency effect for some objects (e.g., windows).
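Fogging and alpha blending are both per-channel linear interpolations between two colors. The sketch below assumes the simple linear fog model (the fog amount grows linearly between a start and an end distance; exponential models also exist) and the usual "source over destination" blend; the names and parameters are hypothetical.

```python
Color = tuple[float, float, float]


def lerp(a: Color, b: Color, t: float) -> Color:
    """Channel-wise linear interpolation: t = 0 gives a, t = 1 gives b."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def linear_fog(obj: Color, fog: Color, distance: float, start: float, end: float) -> Color:
    """Assumed linear fog model: fade into the fog color between the start and end distances."""
    amount = min(max((distance - start) / (end - start), 0.0), 1.0)
    return lerp(obj, fog, amount)


def alpha_blend(src: Color, dst: Color, alpha: float) -> Color:
    """result = alpha * src + (1 - alpha) * dst; alpha = 1 means fully opaque."""
    return lerp(dst, src, alpha)


gray_fog = (200, 200, 200)
print(linear_fog((100, 100, 100), gray_fog, 50, 0, 100))  # (150.0, 150.0, 150.0)
print(alpha_blend((255, 0, 0), (0, 0, 255), 0.25))        # (63.75, 0.0, 191.25): faint red pane over blue
```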



3D Operations (12)

Anti-aliasing: oblique lines are approximated by combining vertical segments with horizontal segments, which causes the aliasing (“staircase”) effect. Removing this effect is called anti-aliasing:

Changing the color of pixels near the outlines: the background color is gradually mixed with the object's color.

The drawback is that the clarity of the outlines is reduced.
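One way to decide how much of the background to mix in is supersampling: estimate the fraction of each pixel covered by the object and use it as the mixing weight. The lecture does not name a specific anti-aliasing method, so treat this as one illustrative technique; the sample count and the function names are assumptions.

```python
from typing import Callable

Color = tuple[float, float, float]


def coverage(px: int, py: int, inside: Callable[[float, float], bool], n: int = 4) -> float:
    """Supersampling (one possible technique): fraction of an n x n sample grid inside the shape."""
    hits = 0
    for i in range(n):
        for j in range(n):
            if inside(px + (i + 0.5) / n, py + (j + 0.5) / n):
                hits += 1
    return hits / (n * n)


def antialiased_pixel(px: int, py: int, inside: Callable[[float, float], bool],
                      obj: Color, bg: Color, n: int = 4) -> Color:
    """Mix the object color with the background color in proportion to the coverage."""
    c = coverage(px, py, inside, n)
    return tuple(c * o + (1 - c) * b for o, b in zip(obj, bg))


def below_edge(x: float, y: float) -> bool:
    """A dark shape below the oblique edge y = 0.4 * x."""
    return y < 0.4 * x


row = [antialiased_pixel(x, 1, below_edge, (0, 0, 0), (255, 255, 255))[0] for x in range(8)]
print([round(v) for v in row])  # intermediate grays instead of a hard black/white step
```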





Graphics Processing Units

Overview
GPGPU Computing
The CUDA Architecture
The NVIDIA GP100 GPU



Overview (1)

GPU (Graphics Processing Unit): a dedicated graphics processor for PCs, workstations, and game consoles.

Initially used to accelerate the rendering stage of 3D graphics (e.g., texture mapping); later also used to accelerate the geometric computations (rotation, translation).

GPUs contain shader units and modules for texture mapping, anti-aliasing, etc.



Overview (2)

Vertex shader units: transform the 3D position of each vertex into 2D screen coordinates and a depth value for the z-buffer; modify the attributes of vertices: position, color, texture coordinates.

Geometry shader units: generate geometric figures or add volumetric details to objects.
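The position transform performed by vertex shaders can be sketched as a perspective projection: dividing x and y by the depth z makes distant points converge toward the screen center, and z is remapped to a depth value for the z-buffer. This is a simplified sketch with assumed conventions (camera looking along +z, a focal length expressed in pixels, screen y growing downward, depth 0 at the near plane and 1 at the far plane); real pipelines do the same with a 4 x 4 projection matrix followed by a divide by w.

```python
def project_vertex(x: float, y: float, z: float, focal: float,
                   width: int, height: int, near: float, far: float) -> tuple[float, float, float]:
    """Camera-space vertex (z > 0 in front of the camera) -> (pixel x, pixel y, depth).

    Assumed conventions: focal length in pixels, y grows downward, depth in [0, 1].
    """
    sx = focal * x / z               # perspective divide: farther points move toward the center
    sy = focal * y / z
    px = width / 2 + sx
    py = height / 2 - sy             # screen rows are numbered from the top
    depth = (1 / near - 1 / z) / (1 / near - 1 / far)  # 0 at the near plane, 1 at the far plane
    return px, py, depth


print(project_vertex(2, 1, 2, 100, 640, 480, 1, 100))  # pixel (420.0, 190.0), depth about 0.505
print(project_vertex(2, 1, 4, 100, 640, 480, 1, 100))  # same x, y twice as far: closer to the center
```

The depth mapping is non-linear in z, so more depth precision is spent near the viewer, which is why the number of bits per z-buffer entry matters.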



Overview (3)

Pixel/fragment shader units: determine the color, z depth, and alpha value for each pixel or fragment.

Unified shader units: programmable units able to perform various shading operations (vertex, geometry, pixel). GPUs contain an array of computing units and a unit that distributes the operations to be performed.



Overview (4)

The architecture with programmable units allows a more flexible use of the hardware resources. The programmable units can also be used for other types of computations, which yields a flexible parallel architecture.

GPUs also include modules for 2D acceleration, MPEG compression, and high-definition video decoding.



Overview (5)

GPUs can be dedicated or integrated.

Dedicated GPUs: used in graphics cards interfaced with the motherboard via a PCI Express bus or an AGP (Accelerated Graphics Port) interface; they have memory dedicated to the card's use. Examples:

AMD Radeon HD 8xxxM (e.g., 8970M)

NVIDIA GeForce GTX (e.g., GTX 1080)



Overview (6)

Integrated GPUs: integrated into a chipset or processor; they use a portion of the system memory and have lower performance than dedicated GPUs. Examples:

Intel HD Graphics (e.g., UHD Graphics 630); AMD A10 APU (Accelerated Processing Unit) processor series; NVIDIA GPUs in Tegra processors (K1, X1).



Overview (7)

The design of GPUs was influenced by the 2D and 3D programming interfaces.

GPUs implement API functions in hardware. OpenGL (Open Graphics Library):

Available for various platforms and languages; functions to draw 3D scenes from primitives.

Direct3D (component of DirectX): only for the Microsoft operating systems; low-level interface to the 3D hardware functions.



Overview (8)

Technologies for connecting multiple GPUs on different graphics cards:

NVIDIA SLI (Scalable Link Interface): 2 .. 4 identical graphics cards are connected via the motherboard (PCIe x16).

AMD CrossFireX: up to 4 graphics cards can be connected; the graphics cards do not have to be identical; the cards have external connectors.



GPGPU Computing (1)

GPGPU (General-Purpose computing on GPUs): the GPU processing cores provide massive floating-point computational power.

Example: a single NVIDIA GP100 GPU with 3,584 enabled cores (as in the Tesla P100) achieves 10.6 single-precision TFLOPS.

The graphics pipeline can also be used for general-purpose applications.

For suitable, highly parallel problems, the performance can be orders of magnitude higher than that of conventional CPUs.



GPGPU Computing (2)

GPUs can process independent vertices and pixels/fragments, so they act as stream processors.

Stream: a set of records that require similar computation. Kernel function: a function applied to each element of the stream. Shared memories cannot be used.

Ideal GPGPU applications: large data sets, high parallelism, few dependencies.



GPGPU Computing (3)

Disadvantages of GPGPU computing:

The programmer needs to be familiar with the graphics APIs and the GPU architecture. Problems need to be expressed in terms of coordinates, textures, and shader functions. Graphics programming languages must be used: OpenGL, DirectX, Cg.

API extensions for running some program functions on the GPU's processors: CUDA (NVIDIA), OpenCL (Khronos Group).



The CUDA Architecture (1)

CUDA (Compute Unified Device Architecture): a software and hardware architecture.

Enables GPUs to execute programs written in C, C++, Fortran, and OpenCL; allows Microsoft's DirectCompute API to be used; allows direct access to the GPU resources for general-purpose computing.

Exploits the GPU's capability to operate on large matrices in parallel.




A CUDA program calls kernel functions that are executed by threads. Threads are organized into blocks, and blocks into groups of blocks (grids).

Thread block: a set of concurrent threads that communicate via a shared memory. Each thread has an identifier, registers, private memory, inputs, and outputs.




Grid of blocks: a group (array) of thread blocks that execute the same kernel function. Synchronization between dependent kernel functions is ensured; results are shared in a global memory allocated to the application (global synchronization).




The hierarchy of threads is executed on a hierarchy of processors on the GPU:

Threads: executed by CUDA cores and other execution units. Thread blocks: executed by a streaming multiprocessor (SM), in groups of 32 threads called warps. Grids of blocks: executed by the GPU.
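The index arithmetic behind this hierarchy can be illustrated without a GPU. The sketch below is a sequential Python emulation, not CUDA code: in a real CUDA kernel each thread computes its global index as blockIdx.x * blockDim.x + threadIdx.x and runs concurrently with the others, whereas here the hypothetical launch and saxpy_kernel functions simply loop over a one-dimensional grid. It shows how a 1,000-element vector maps onto blocks of 256 threads, why the last block needs a bounds check, and how threads group into warps of 32.

```python
from dataclasses import dataclass
from typing import Callable

WARP_SIZE = 32  # threads per warp


@dataclass(frozen=True)
class ThreadId:
    block_idx: int   # role of CUDA's blockIdx.x
    thread_idx: int  # role of CUDA's threadIdx.x
    block_dim: int   # role of CUDA's blockDim.x


def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int, *args) -> None:
    """Run kernel once per thread of a 1D grid (sequential emulation, unlike a real GPU)."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(ThreadId(b, t, block_dim), *args)


def saxpy_kernel(tid: ThreadId, a: float, x: list[float], y: list[float], out: list[float]) -> None:
    i = tid.block_idx * tid.block_dim + tid.thread_idx  # global thread index
    if i < len(x):                                      # the last block may be only partly used
        out[i] = a * x[i] + y[i]


def warp_of(thread_idx: int) -> int:
    """Warp number of a thread inside its block."""
    return thread_idx // WARP_SIZE


n, block_dim = 1000, 256
grid_dim = (n + block_dim - 1) // block_dim  # 4 blocks = 1,024 threads for 1,000 elements
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
print(grid_dim, block_dim // WARP_SIZE, out[:3], out[-1])  # 4 8 [1.0, 3.0, 5.0] 1999.0
```

A sequential loop cannot show the concurrency, shared memory, or synchronization of a real thread block; it only makes the mapping from (block, thread) to data element concrete.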



The CUDA Architecture (6)

Unified Virtual Addressing (CUDA 4): provides a single virtual memory address space for CPU and GPU memory.

Unified Memory (CUDA 6): a pool of managed memory is shared between the CPU and GPU; the CUDA software migrates data allocated in the unified memory between GPU and CPU. Memory modified by the CPU must be synchronized with the GPU memory.



The NVIDIA GP100 GPU (1)

Uses a new architecture, Pascal. Previous architectures: Fermi, Kepler, Maxwell.

Main features: 15.3 billion transistors, 16 nm technology; NVIDIA's NVLink interconnect, with a bandwidth of up to 160 GB/s; integrated HBM2 (High Bandwidth Memory 2) stacked memory, 16 GB .. 32 GB; improved power efficiency (performance/W).



The NVIDIA GP100 GPU (2)

A full implementation contains:
Six Graphics Processing Cluster (GPC) units
30 Texture Processing Cluster (TPC) units
60 SM units, each with 64 CUDA cores and 4 texture units (3,840 cores; 240 texture units)
Eight 512-bit memory controllers (4,096-bit memory interface)
Four HBM2 DRAM memory stacks
A common L2 cache memory for the SM units
The GigaThread Engine global scheduler




Each SM unit contains:
64 single-precision (SP) CUDA cores, each with an integer arithmetic and logic unit and an IEEE 754-2008 floating-point unit that supports the fused multiply-add (FMA) instruction
32 double-precision (DP) units
16 load/store units (LD/ST)
16 special-function units (SFU) for transcendental functions




Threads are scheduled in groups of 32 (warps). Each SM unit contains:

Two warp schedulers, which provide increased performance and reduced power consumption; four instruction dispatch units.

From each warp, two instructions can be dispatched in each clock cycle.




Memory subsystem: each SM unit contains an instruction cache memory. There is a separate L1 data cache memory, which can also be used as a texture memory. 4,096 KB of unified L2 cache memory allow data to be shared between the SM units. Each SM unit has 64 KB of shared memory. The register files and memories are protected by an ECC code.




The NVIDIA Tesla P100 GPU Accelerator: based on the Pascal GPU architecture and the CUDA parallel computing model; designed as an accelerator for datacenters; contains one GP100 GPU. Number of CUDA cores: 56 x 64 = 3,584.

Only 56 SM units are enabled. Single-precision FP performance: 10.6 TFLOPS. Double-precision FP performance: 5.3 TFLOPS. Memory size: 4 x 4 GB = 16 GB (HBM2).
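The quoted peak figures can be cross-checked with peak = number of units x clock frequency x 2, since a fused multiply-add counts as two floating-point operations per cycle. The boost clock used below (about 1.48 GHz for the NVLink version of the Tesla P100) is an assumption taken from outside this lecture.

```python
def peak_tflops(units: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Peak throughput in TFLOPS; an FMA counts as 2 FLOPs per unit per cycle."""
    return units * clock_ghz * flops_per_cycle / 1000


ENABLED_SMS = 56
BOOST_CLOCK_GHZ = 1.48  # assumption: Tesla P100 (NVLink) boost clock, not given in the lecture

fp32 = peak_tflops(ENABLED_SMS * 64, BOOST_CLOCK_GHZ)  # 64 SP cores per SM -> 3,584 cores
fp64 = peak_tflops(ENABLED_SMS * 32, BOOST_CLOCK_GHZ)  # 32 DP units per SM -> 1,792 units
print(f"FP32: {fp32:.1f} TFLOPS, FP64: {fp64:.1f} TFLOPS")  # FP32: 10.6 TFLOPS, FP64: 5.3 TFLOPS
```

The 2:1 ratio between single- and double-precision performance follows directly from the 64 SP cores versus 32 DP units per SM.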



Summary (1)

The main component of a graphics adapter is the graphics controller

It contains the system bus interface; the speed of this interface is an important performance factor

Video ports allow video images from other sources to be combined with graphics images. The transfer rate of the video memory has a major impact on performance.

Dual-ported memories: updating the images and refreshing the screen can be performed in parallel



Summary (2)

The GDDR5 memory has advanced features for high performance and stable operation

Data and address bus inversion; signal training; calibration; error detection; power management

3D accelerators are required to convert 3D objects into 2D images in a realistic manner. For each pixel of a 3D object, an alpha value and the z coordinate have to be stored. 3D operations are performed in two stages: the geometry stage and the rendering stage.



Summary (3)

GPUs are used to accelerate the geometry stage and the rendering stage of 3D graphics.

They can be dedicated or integrated into a chipset or processor. GPUs contain a large number of processing cores, programmable for various types of shading.

The processing power of GPUs can also be used for applications that require vector operations.

The CUDA architecture allows direct access to the GPU resources for general-purpose computing.



Concepts, Knowledge (1)

Structure of a graphics adapter
Components of the graphics controller
Function of the RAMDAC circuit
Features of the GDDR5 graphics memory
Data and address bus inversion of the GDDR5 graphics memory
Signal training of the GDDR5 graphics memory
Representation of 3D objects
3D operations performed in the geometry stage



Concepts, Knowledge (2)

3D operations performed in the rendering stage
Types of shader units contained by GPUs
Dedicated and integrated GPUs
Advantages and disadvantages of GPGPU computing
The CUDA architecture
Thread block in the CUDA architecture
Grid of blocks in the CUDA architecture



Questions

1. What is the advantage of a dual-ported video memory?

2. What are the power management features of the GDDR5 video memory?

3. What information is required for representing 3D objects?

4. What operations are performed in the rendering stage for 3D images?
