Contents of the Lecture

Date post: 23-Mar-2016
Upload: morna

Contents of the Lecture

1. Introduction
2. Methods for I/O Operations
3. Buses
4. Liquid Crystal Displays
5. Other Types of Displays
6. Graphics Adapters
7. Optical Discs

12/21/2017 Input/Output Systems and Peripheral Devices (06-1)

6. Graphics Adapters

Structure of a Graphics Adapter
Video Memory
Graphics Accelerators
3D Accelerators
Graphics Processing Units
Digital Interfaces for Monitors

Structure of a Graphics Adapter (1)

Structure of a Graphics Adapter (2)

Graphics controller: implements the main functions of the graphics adapter. It includes the system bus interface.

System bus interface: transfers data in burst mode, reads the video memory with no wait states, and uses a FIFO memory for efficient writes to the video memory.

Structure of a Graphics Adapter (3)

Video memory interface: allows the video images to be updated.

VGA registers and control registers: enable programming the video adapter for operation in VGA modes. Some adapters are no longer compatible with the VGA standard.

Cursor generator.

Graphic functions: implemented by graphics accelerators.

Structure of a Graphics Adapter (4)

Video BIOS: provides video functions for accessing the graphics adapter. Because the BIOS programs of different adapters differ, programming them is difficult. VESA (Video Electronics Standards Association) defined a standard for high-resolution BIOS functions.

Video memory: holds the frame buffer that contains the video image.

Structure of a Graphics Adapter (5)

RAMDAC circuit (RAMDAC: RAM Digital-to-Analog Converter)

Reads the digital image and converts it into analog signals. The RAMDAC functions may be integrated into the graphics controller. A RAMDAC is only required for displays with analog inputs; displays that operate internally in the digital domain must convert these analog signals back into digital form.
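
The conversion performed by a RAMDAC can be sketched in a few lines. This is a simplified model, not a description of a specific chip: it assumes an indexed-color mode in which the "RAM" part is a 256-entry palette (color lookup table) holding 8-bit R, G, B values, and it assumes ideal linear DACs with the VGA full-scale level of 0.7 V per color channel. The names `palette`, `dac`, and `ramdac_output` are hypothetical.

```python
# Simplified RAMDAC model (illustrative assumptions, not a real device):
# - the "RAM" part is a 256-entry palette (color lookup table) of 8-bit R, G, B values
# - each of the three DACs maps an 8-bit code linearly onto 0.0 .. 0.7 V

FULL_SCALE_V = 0.7  # assumed full-scale analog level of one color channel, in volts


def dac(code, bits=8):
    """Convert an unsigned digital code into an analog voltage (ideal linear DAC)."""
    max_code = (1 << bits) - 1
    if not 0 <= code <= max_code:
        raise ValueError("code out of range")
    return FULL_SCALE_V * code / max_code


def ramdac_output(pixel_index, palette):
    """Look up the pixel's color in the palette, then convert each component."""
    r, g, b = palette[pixel_index]
    return dac(r), dac(g), dac(b)


# Hypothetical palette: entry 0 = black, entry 1 = white, entry 2 = pure red
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)] + [(0, 0, 0)] * 253
print(ramdac_output(2, palette))  # red channel near 0.7 V, green and blue at 0 V
```

A display with digital inputs skips this step entirely, which is why the RAMDAC is only needed for analog outputs.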

Structure of a Graphics Adapter (6)

Video ports: enable the transfer of video images to a monitor. There are several variants of video ports:

VGA (Video Graphics Array): analog interface designed for CRT displays, but also used by some liquid crystal displays. The analog signals are susceptible to electrical noise. Uses a DB-15 connector.

Structure of a Graphics Adapter (7)

VIVO (Video In Video Out): analog interface for connecting to TV sets, DVD players, and game consoles (TV Out). Signals: S-Video (Y/C), composite video, component video (e.g., RGB). Uses a 9-pin mini-DIN connector.

DVI (Digital Visual Interface): digital interface. Uses a DVI-I connector (digital and analog signals) or a DVI-D connector (digital signals only).

Structure of a Graphics Adapter (8)

HDMI (High-Definition Multimedia Interface): digital interface for uncompressed video data. Digital audio data can be sent over the same cable. Uses a 19-pin (single-link) or 29-pin (dual-link) connector.

DisplayPort: digital interface for video and audio data, intended to replace the VGA and DVI interfaces. Uses 20-pin connectors; a link has 1, 2, or 4 lanes.

6. Graphics Adapters

Structure of a Graphics Adapter
Video Memory
Graphics Accelerators
3D Accelerators
Graphics Processing Units
Digital Interfaces for Monitors

Video Memory (1)

Video memory can be single-ported or dual-ported.

Single-ported video memory: the same data port is used both to refresh the screen and to write new data.

Dual-ported video memory: one port is used to update the images in memory; the second port has serial access and is used to refresh the images on the screen.

Video Memory (2)

Video memory transfer rate: the maximum transfer rate defines the bandwidth of the video memory. It depends on the memory technology and the access time. The bandwidth must be shared by the screen refresh circuits, the CPU, and the graphics controller; 30 to 50% of it should be reserved for functions other than screen refresh.
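
A rough estimate shows how much of the bandwidth screen refresh consumes. The sketch assumes a hypothetical 1920 × 1080 display mode with 4 bytes per pixel at 60 Hz, and uses the average DDR-400 transfer rate listed in this lecture (about 1,600 MB/s) as the available bandwidth; the function names are illustrative.

```python
def refresh_bandwidth(width, height, bytes_per_pixel, refresh_hz):
    """Bytes per second read from the video memory just to refresh the screen."""
    return width * height * bytes_per_pixel * refresh_hz


def remaining_fraction(total_bw, refresh_bw):
    """Fraction of the memory bandwidth left for the CPU and the graphics controller."""
    return (total_bw - refresh_bw) / total_bw


# Hypothetical display mode: 1920 x 1080 pixels, 4 bytes per pixel, 60 Hz
refresh = refresh_bandwidth(1920, 1080, 4, 60)   # 497,664,000 B/s, about 0.5 GB/s
total = 1.6e9                                    # average DDR-400 rate from the lecture, B/s
left = remaining_fraction(total, refresh)
print(f"refresh: {refresh / 1e9:.2f} GB/s, left for other functions: {left:.0%}")
print("30% reserve available:", left >= 0.30)
```

Under these assumptions refresh takes about 0.5 GB/s and roughly 69% of the bandwidth remains. The refresh term grows linearly with resolution, color depth, and refresh rate, so a slower memory or a larger display quickly eats into the reserve.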

Video Memory (3)

DDR-400 (PC3200) memory: maximum transfer rate 3,200 MB/s; average transfer rate about 1,600 MB/s

DDR2-667 (PC2-5300) memory: maximum transfer rate 5.336 GB/s

DDR3-2133 (PC3-17000) memory: maximum transfer rate 17 GB/s

DDR4-3200 (PC4-25600) memory: maximum transfer rate 25.6 GB/s
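
The maximum transfer rates above can be reproduced from the module names. The number after "DDR" is the data rate in millions of transfers per second (MT/s), and this sketch assumes the standard 64-bit (8-byte) data bus of a memory module. The PC rating in parentheses is the same peak rate in MB/s, rounded for marketing (for example, 5,336 MB/s is sold as PC2-5300).

```python
BUS_WIDTH_BYTES = 8  # assumed 64-bit (8-byte) module data bus


def peak_rate_mb_s(mega_transfers_per_s):
    """Peak transfer rate in MB/s = millions of transfers per second x bytes per transfer."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES


def io_clock_mhz(mega_transfers_per_s):
    """Double data rate: two transfers per I/O clock cycle (rising and falling edge)."""
    return mega_transfers_per_s / 2


for name, mts in [("DDR-400", 400), ("DDR2-667", 667), ("DDR3-2133", 2133), ("DDR4-3200", 3200)]:
    print(f"{name}: {peak_rate_mb_s(mts):,} MB/s peak, I/O clock {io_clock_mhz(mts):g} MHz")
```

The loop prints 3,200, 5,336, 17,064, and 25,600 MB/s, matching the list. The lower average rate quoted for DDR-400 reflects access overheads that this peak-rate formula ignores.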

Video Memory (4)

GDDR (Graphics Double Data Rate): designed by ATI Technologies in collaboration with the JEDEC committee. Several versions exist, from GDDR2 to GDDR5.

GDDR2 and GDDR3 are based on DDR2 technology; GDDR4 and GDDR5 are based on DDR3 technology.

Low supply voltage (1.8 V to 1.5 V), which reduces power consumption and heat output. Separate data strobe signals for read and write.

Video Memory (5)

GDDR5: combines high performance with stable operation and low implementation costs. Memory organization: x32 (32 data pins). It uses a differential command clock signal (CK, CK#) and two differential write clock signals (WCK, WCK#).

Two data bytes are aligned to each WCK signal. Example for a data rate of 5 Gbit/s per pin:

f_CK = 1.25 GHz; f_WCK = 2.5 GHz
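
The clock values in the example follow from two ratios, which this sketch takes as given by the example: data is transferred on both edges of WCK, so the per-pin data rate is 2 × f_WCK, and WCK runs at twice the CK frequency, so f_CK = f_WCK / 2. For a x32 device, the per-pin rate also gives the device bandwidth. Function names are hypothetical.

```python
def gddr5_clocks(data_rate_gbps):
    """Return (f_CK, f_WCK) in GHz for a per-pin data rate in Gbit/s.

    Assumes double data rate on WCK (2 bits per WCK cycle) and f_WCK = 2 x f_CK,
    the ratios used in the 5 Gbit/s example.
    """
    f_wck = data_rate_gbps / 2
    f_ck = f_wck / 2
    return f_ck, f_wck


def device_bandwidth_gb_s(data_rate_gbps, pins=32):
    """Bandwidth of one device in GB/s: per-pin rate x number of data pins / 8 bits."""
    return data_rate_gbps * pins / 8


print(gddr5_clocks(5.0))            # (1.25, 2.5)
print(device_bandwidth_gb_s(5.0))   # 20.0
```

Plugging in 4 and 7 Gbit/s gives the 16 and 28 GB/s quoted for 32 pins. At these ratios each pin carries 4 bits per CK cycle, so a burst of 8 bits per pin takes two CK cycles.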

Video Memory (6)

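Data bus inversion: reduces the number of zero bits transmitted. A DBI# signal for each byte indicates whether the byte was inverted. Because the transmission lines are terminated to the high level, only low (zero) bits draw termination current, so transmitting fewer zeros reduces power dissipation.

Address bus inversion: the same technique applied to the address bus.

Signal training: phase adjustment of the clock, data, and address signals.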
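
Data bus inversion can be sketched for one byte. The encoding rule here is an assumption modeled on the usual GDDR5 scheme: if a byte contains more than four zero bits, it is transmitted inverted and the active-low DBI# line is driven to 0; otherwise it is transmitted unchanged with DBI# = 1. In both cases at most four of the nine lines (eight data lines plus DBI#) are driven low. Function names are hypothetical.

```python
def dbi_encode(byte):
    """Return (bits driven on the 8 data lines, level of the DBI# line).

    Assumed rule: invert when more than 4 of the 8 bits are zero, so that at most
    4 of the 9 lines (8 data + DBI#) are driven low.
    """
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, 0  # inverted; DBI# = 0 (asserted, active low)
    return byte & 0xFF, 1         # unchanged; DBI# = 1 (not asserted)


def dbi_decode(lines, dbi_n):
    """Receiver side: undo the inversion when DBI# is asserted (low)."""
    return (~lines) & 0xFF if dbi_n == 0 else lines


def zero_count(lines, dbi_n):
    """Number of low (current-drawing) lines among the 8 data lines and DBI#."""
    return (8 - bin(lines).count("1")) + (1 - dbi_n)


print(dbi_encode(0x00))  # (255, 0): all zeros are sent as all ones, with DBI# low
```

Because the decision is made independently for each byte, every byte lane needs its own DBI# line.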

Video Memory (7)

Address training: aligns the address bus to the CK clock signal.
WCK-to-CK alignment: aligns the WCK signal to the CK signal.
Data training: aligns the data lines to the corresponding WCK signal. A “hidden” data re-training is possible.

Calibration: improves reliability. Auto-calibration adjusts the drive strength and the termination impedance; software-controlled adjustment is also possible.

Video Memory (8)

Burst read/write access to the internal memory: 8 bits per pin, i.e., 256 bits over the 32 data pins (two CK cycles).

Maximum transfer rates of 4 to 7 Gbit/s per pin, i.e., 16 to 28 GB/s for 32 pins.

Error detection: dedicated EDC (Error Detection Code) pins send CRC codes to the controller. A CRC code is computed for each data byte together with its DBI# line. It allows single-bit and double-bit errors to be detected.
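
The EDC check can be illustrated with a bitwise CRC-8. This sketch assumes the generator polynomial x^8 + x^2 + x + 1 (0x07), the polynomial usually cited for GDDR5 EDC, and a CRC covering one burst of 8 transfers on the 8 data lines of a byte plus its DBI# line (72 bits); the exact bit ordering of the real interface is not modeled. The controller recomputes the CRC and compares it with the one received on the EDC pin; the loop verifies that every single flipped bit is caught.

```python
POLY = 0x07  # assumed generator polynomial x^8 + x^2 + x + 1


def crc8_bits(bits):
    """Bitwise CRC-8 (MSB-first, initial value 0, no final XOR) over a list of bits."""
    crc = 0
    for bit in bits:
        feedback = ((crc >> 7) & 1) ^ bit
        crc = (crc << 1) & 0xFF
        if feedback:
            crc ^= POLY
    return crc


def bytes_to_bits(data):
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


# One byte lane of a burst: 8 beats x (8 data lines + DBI#) = 72 bits (hypothetical data)
burst = [(i * 7 + 3) % 2 for i in range(72)]
sent_crc = crc8_bits(burst)

# The controller recomputes the CRC; any single flipped bit produces a mismatch
for pos in range(len(burst)):
    corrupted = burst.copy()
    corrupted[pos] ^= 1
    assert crc8_bits(corrupted) != sent_crc
print("all 72 single-bit errors detected")
```

As a sanity check of the routine itself, it returns 0xF4 for the ASCII string "123456789", the standard check value for CRC-8 with polynomial 0x07 and initial value 0.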

Video Memory (9)

Power management: features that consume power only when it is needed:
scalable clock frequency and data rate (from 5 Gbit/s down to 200 Mbit/s);
low-power mode for the DRAM core;
multiple termination impedance levels (the impedance is increased at lower data rates);
low supply voltage (1.5 V);
data and address bus inversion.

6. Graphics Adapters

Structure of a Graphics Adapter
Video Memory
Graphics Accelerators
3D Accelerators
Graphics Processing Units
Digital Interfaces for Monitors

Graphics Accelerators (1)

Graphics accelerators contain specialized circuits that execute the mathematical operations required for graphics rendering.

They relieve the CPU of the task of executing these operations.

The first graphics accelerators were AVGA (Accelerated VGA) adapters; they were followed by 2D accelerators. The accelerator circuitry is linked to the operating system through a driver.

Graphics Accelerators (2)

Common 2D graphics functions:

BitBlt (Bit Block Transfer): two bitmaps are combined using a raster operation (a Boolean operator), and the result is transferred to the destination area. A blitter is a dedicated circuit for the BitBlt operation.

Drawing lines, rectangles, and circles; filling surfaces or polygons; adding color.
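
A BitBlt can be sketched on small monochrome bitmaps stored as lists of rows of 0/1 values. The raster operation is passed in as a Boolean function of the source bit and the destination bit; copy, AND, and XOR are shown as common examples. The function names are hypothetical, and writes that fall outside the destination are clipped.

```python
# A few common raster operations: Boolean functions of a source bit and a destination bit
def rop_copy(src, dst):
    return src


def rop_and(src, dst):
    return src & dst


def rop_xor(src, dst):
    return src ^ dst


def bitblt(dst, dx, dy, src, rop):
    """Combine `src` with the area of `dst` at (dx, dy) and write the result back."""
    for y, row in enumerate(src):
        for x, s in enumerate(row):
            ty, tx = dy + y, dx + x
            if 0 <= ty < len(dst) and 0 <= tx < len(dst[ty]):  # clip to the destination
                dst[ty][tx] = rop(s, dst[ty][tx])


screen = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
cursor = [[1, 1], [1, 0]]
bitblt(screen, 1, 1, cursor, rop_xor)
print(screen)
```

XOR is a classic choice for cursors and rubber-band outlines because applying the same BitBlt twice restores the original destination.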

Graphics Accelerators (3)

Multimedia accelerators: graphics accelerators extended with audio and video acceleration functions. Functions:

decoding audio data streams;
scaling video images in the x and y directions;
converting digital video signals into RGB signals;
decompressing video images represented in various formats.
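
Converting digital video into RGB is typically a per-pixel matrix operation on luma/chroma (YCbCr) samples. The sketch assumes 8-bit full-range YCbCr with the BT.601-derived constants used by JPEG/JFIF; broadcast video often uses limited range (16 to 235) or BT.709 coefficients, which need different constants. The function name is hypothetical.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit full-range YCbCr sample to 8-bit RGB (BT.601 / JFIF constants)."""
    def clamp(v):
        return max(0, min(255, round(v)))

    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)


print(ycbcr_to_rgb(128, 128, 128))  # mid gray: (128, 128, 128)
```

Hardware performs the same multiply-add per pixel in fixed point, which is why this conversion is a natural candidate for acceleration.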

6. Graphics Adapters

Structure of a Graphics Adapter
Video Memory
Graphics Accelerators
3D Accelerators
Graphics Processing Units
Digital Interfaces for Monitors

3D Accelerators

3D Images
3D Operations

3D Images (1)

3D images are managed using abstract models. An object is represented as a set of points (its vertices), each defined by its x, y, and z coordinates. Connecting the vertices with lines produces surfaces, which can be filled with a color or a texture. Each 3D object is composed of a large number of triangles (or polygons) that describe its surface.

3D Images (2)

Animated 3D graphics requires a series of geometry computations that define the position of objects in 3D space.

The geometry computations that handle the vertices of the triangles can be performed by the CPU or by the graphics processor.

The graphics processor must convert these triangles into solid surfaces, which requires intensive computations.

3D Images (3)

In the real world, objects interact with each other: complex mathematical equations are used to determine whether an object is visible in a scene from a given angle. Besides the color components, an alpha value must also be stored for each pixel.

The alpha value indicates the degree of transparency of the pixel in the final image.

3D Images (4)

Another piece of information that must be stored is the depth in space, or z coordinate.

When pixels of several objects fall on the same position in the image plane, the accelerator compares their z values and displays the pixel with the smallest z value. The depth information of the pixels is stored in a separate buffer, the z-buffer. Usually, 32 bits per pixel are allocated in the z-buffer.
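
The depth test can be sketched with a z-buffer initialized to "infinitely far" plus a color buffer. The fragment format (x, y, z, color) and the function name are hypothetical; as described above, a smaller z is taken to mean closer to the viewer.

```python
import math


def render_with_zbuffer(width, height, fragments, background=(0, 0, 0)):
    """Resolve visibility per pixel: keep the fragment with the smallest z.

    `fragments` is an iterable of (x, y, z, color) tuples in any order.
    """
    zbuf = [[math.inf] * width for _ in range(height)]
    color = [[background] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < zbuf[y][x]:      # depth test: closer than what is stored?
            zbuf[y][x] = z      # update the z-buffer
            color[y][x] = c     # and the color buffer
    return color, zbuf


# Two overlapping objects cover pixel (1, 0); the red one is closer (z = 2.0)
frags = [(1, 0, 5.0, (0, 0, 255)), (1, 0, 2.0, (255, 0, 0)), (0, 0, 3.0, (0, 255, 0))]
img, depth = render_with_zbuffer(2, 1, frags)
print(img)
```

Because each pixel keeps only the nearest fragment seen so far, the objects can be drawn in any order, which is the main advantage of the z-buffer.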

3D Images (5)

Each time the image is updated, the color and depth of pixels must be recomputed

Applying the various 3D computations to the scene is called rendering. Rendering fills in all of the points on the surface of an object that was previously stored only as a set of vertices, so that a solid object with 3D effects is drawn on the monitor.

3D Accelerators

3D Images
3D Operations

3D Operations (1)

3D operations are performed in two stages:

Geometry stage: clipping, transformation, lighting.
Rendering stage: shading, texture mapping with perspective correction, texture filtering, alpha blending.

On current 3D accelerators, the operations of both stages are performed by the graphics processor.

3D Operations (2)

3D Operations (3)

Clipping: determines which part of an object is visible on the screen and eliminates the parts that are not visible.

Lighting: objects are illuminated by the light sources in the scene. Lighting effects create color shading, light reflections, and shadows.

3D Operations (4)

Transformation:
translation: moving every point by a fixed distance in the same direction;
reflection: transforming an object into its mirror image;
glide reflection: combining a reflection with a translation along the reflection axis;
scaling: a linear transformation that changes the size of objects.
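
The transformations listed above can all be written as 4 × 4 matrices acting on homogeneous coordinates (x, y, z, 1), which lets a graphics processor chain them by matrix multiplication. The sketch uses plain Python lists; reflecting across the yz-plane (x becomes -x) and gliding along the y axis are arbitrary example choices, and all function names are hypothetical.

```python
def matmul(a, b):
    """Product of two 4 x 4 matrices (applying the result = apply b, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def apply(m, p):
    """Transform a 3D point using homogeneous coordinates (w = 1)."""
    v = (*p, 1.0)
    x, y, z, w = (sum(m[i][k] * v[k] for k in range(4)) for i in range(4))
    return (x / w, y / w, z / w)


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def reflection_x():
    """Mirror image with respect to the yz-plane: x -> -x."""
    return scaling(-1, 1, 1)


def glide_reflection_x(ty):
    """Reflection x -> -x followed by a translation along y (parallel to the mirror plane)."""
    return matmul(translation(0, ty, 0), reflection_x())


p = (1.0, 2.0, 3.0)
print(apply(translation(5, 0, 0), p))    # (6.0, 2.0, 3.0)
print(apply(glide_reflection_x(4), p))   # (-1.0, 6.0, 3.0)
```

Composing matrices once and then applying the single product to every vertex is what makes this representation attractive for hardware.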

3D Operations (5)

Tessellation: dividing polygons into smaller structures for rendering. Dividing polygons into triangles is called triangulation.
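
For a convex polygon, the simplest triangulation is a "fan": the first vertex is connected to every pair of consecutive remaining vertices, giving n - 2 triangles for n vertices. This sketch assumes a convex polygon given as an ordered vertex list; concave polygons need a more general algorithm (for example, ear clipping), which is not shown.

```python
def fan_triangulate(polygon):
    """Split a convex polygon (ordered list of vertices) into n - 2 triangles."""
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    v0 = polygon[0]
    return [(v0, polygon[i], polygon[i + 1]) for i in range(1, len(polygon) - 1)]


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
for tri in fan_triangulate(square):
    print(tri)
```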

3D Operations (6)

Shading: enables the realistic representation of 3D objects on 2D screens. Algorithms: Gouraud, Phong. The color information of the vertices is read, and the intensities of the color components are interpolated between the vertices.
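
The interpolation step can be sketched for one triangle in screen space: the color at a point is the sum of the three vertex colors weighted by the point's barycentric coordinates. This is one common way to express Gouraud-style interpolation; Phong shading would instead interpolate normals and compute the lighting per pixel. Function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (wa, wb, wc) of 2D point p in triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)  # twice the signed area
    wb = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / area
    wc = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / area
    return 1.0 - wb - wc, wb, wc


def gouraud_color(p, verts, colors):
    """Interpolate per-vertex RGB intensities at point p inside the triangle."""
    wa, wb, wc = barycentric(p, *verts)
    return tuple(wa * ca + wb * cb + wc * cc for ca, cb, cc in zip(*colors))


tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
cols = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # red, green, blue vertices
print(gouraud_color((1.0, 1.0), tri, cols))
```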

3D Operations (7)

Texture mapping: adding surface details (textures) to the polygons that represent objects.

Steps: loading the texture elements (texels) from a bitmap; combining the texels; writing the resulting pixel to the video memory.

Either a single texture is applied, or, with multi-texturing, a combination of textures is applied to an object.

3D Operations (8)

Textures may require a large amount of memory, so texture compression is used. Textures must also be corrected to create the perspective effect.
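
Perspective correction can be shown for a single texture coordinate u along an edge. Interpolating u linearly in screen space is wrong when the endpoints lie at different depths; the usual fix is to interpolate u/w and 1/w linearly and then divide. In this sketch, w is assumed to be the homogeneous coordinate (the view-space depth) of each endpoint after projection, and the function names are hypothetical.

```python
def lerp(a, b, t):
    return a + (b - a) * t


def affine_u(u0, u1, t):
    """Naive screen-space interpolation of a texture coordinate (no correction)."""
    return lerp(u0, u1, t)


def perspective_u(u0, w0, u1, w1, t):
    """Perspective-correct interpolation: interpolate u/w and 1/w, then divide."""
    return lerp(u0 / w0, u1 / w1, t) / lerp(1 / w0, 1 / w1, t)


# Edge from a near point (w = 1) to a far point (w = 3), halfway across the screen
print(affine_u(0.0, 1.0, 0.5))                   # 0.5
print(perspective_u(0.0, 1.0, 1.0, 3.0, 0.5))    # ~0.25
```

Halfway across the screen, only a quarter of the texture has been traversed, because the far half of the edge is compressed by the perspective. With equal depths the two methods agree.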

3D Operations (9)

Texture filtering: reduces some unwanted effects that may occur with texture mapping. The color of a new pixel is determined by interpolating between the colors of several texels of the original texture. Bilinear filtering uses the weighted average of the four texels nearest to the sampling point.
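
Bilinear filtering can be sketched on a small grayscale texture stored as a 2D list. The sample position (u, v) is given in texel units; texel centers are assumed to lie at integer coordinates, with clamping at the edges. Real hardware uses normalized coordinates, half-texel offsets, and configurable wrap modes, which are omitted here.

```python
import math


def bilinear(texture, u, v):
    """Weighted average of the four texels surrounding the sample point (u, v)."""
    h, w = len(texture), len(texture[0])
    u = min(max(u, 0.0), w - 1.0)  # clamp to the texture edges
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0        # fractional distances = interpolation weights
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


tex = [[0, 100],
       [100, 200]]
print(bilinear(tex, 0.5, 0.5))  # 100.0: the average of the four texels
```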

3D Operations (10)

Trilinear filtering: the texture resolution is reduced as the distance to the object increases. 3D accelerators store several variants of a texture, at decreasing resolutions, in memory (“MIP mapping”). Trilinear filtering combines this feature with bilinear filtering.
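
Trilinear filtering can be sketched as follows: build a MIP chain by averaging 2 × 2 texel blocks, choose a fractional level of detail from how many texels fall under one screen pixel, filter bilinearly in the two nearest levels, and blend the two results linearly. The square power-of-two grayscale texture, the level-of-detail rule (log2 of the texel-to-pixel ratio), and the coordinate scaling per level are simplifying assumptions; all function names are hypothetical.

```python
import math


def bilinear(tex, u, v):
    """Bilinear sample of a square texture at (u, v) in texel units, clamped to the edges."""
    n = len(tex)
    u, v = min(max(u, 0.0), n - 1.0), min(max(v, 0.0), n - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, n - 1), min(y0 + 1, n - 1)
    fx, fy = u - x0, v - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def build_mipmaps(tex):
    """Level 0 is the full texture; each following level halves the size (2 x 2 average)."""
    levels = [tex]
    while len(levels[-1]) > 1:
        t = levels[-1]
        half = len(t) // 2
        levels.append([[(t[2 * y][2 * x] + t[2 * y][2 * x + 1]
                         + t[2 * y + 1][2 * x] + t[2 * y + 1][2 * x + 1]) / 4
                        for x in range(half)] for y in range(half)])
    return levels


def trilinear(levels, u, v, texels_per_pixel):
    """Sample at normalized (u, v) in [0, 1], blending the two nearest MIP levels."""
    lod = math.log2(max(texels_per_pixel, 1.0))   # level of detail
    lod = min(lod, len(levels) - 1.0)             # clamp to the smallest level
    lo = int(lod)
    hi = min(lo + 1, len(levels) - 1)
    frac = lod - lo

    def sample(level):
        n = len(levels[level])
        return bilinear(levels[level], u * (n - 1), v * (n - 1))

    return sample(lo) * (1 - frac) + sample(hi) * frac


tex = [[0, 0, 100, 100],
       [0, 0, 100, 100],
       [100, 100, 200, 200],
       [100, 100, 200, 200]]
levels = build_mipmaps(tex)
print([len(level) for level in levels])        # [4, 2, 1]
print(trilinear(levels, 0.0, 0.0, 2 ** 1.5))   # roughly halfway between levels 1 and 2
```

Blending between two levels avoids the visible seams that appear when plain MIP mapping switches abruptly from one resolution to the next.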

3D Operations (11)

Fogging: gradually fading objects as their distance increases. The scene appears more realistic, giving the illusion of distant objects. Fogging also allows 3D processing to be performed faster, since objects completely hidden by the fog do not need to be drawn.

Alpha blending: used to create the transparency effect for some objects (e.g., windows).
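
Both effects are per-pixel blends of two colors. The sketch assumes linear fog, where the fog factor falls from 1 (no fog) at `fog_start` to 0 (only fog) at `fog_end`, and the common "source over" alpha blend, result = alpha × source + (1 - alpha) × destination. Colors are RGB tuples with components in 0..255, and all names are hypothetical.

```python
def lerp_color(a, b, t):
    """Component-wise t * a + (1 - t) * b."""
    return tuple(t * ca + (1 - t) * cb for ca, cb in zip(a, b))


def apply_fog(color, z, fog_color, fog_start, fog_end):
    """Linear fog: factor 1 (no fog) before fog_start, 0 (only fog) beyond fog_end."""
    f = (fog_end - z) / (fog_end - fog_start)
    f = min(max(f, 0.0), 1.0)
    return lerp_color(color, fog_color, f)


def alpha_blend(src, dst, alpha):
    """Source-over blending: alpha = 1 is opaque, alpha = 0 is fully transparent."""
    return lerp_color(src, dst, alpha)


gray = (128, 128, 128)
print(apply_fog((255, 0, 0), 75.0, gray, 50.0, 100.0))   # halfway into the fog
print(alpha_blend((0, 0, 255), (255, 255, 255), 0.25))   # a mostly transparent blue "window"
```

Since both reduce to the same interpolation, hardware can implement fog and alpha blending with shared blending logic.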

3D Operations (12)

Anti-aliasing: oblique lines are approximated by combining vertical segments with horizontal segments, which produces the aliasing effect. Removing this effect (“anti-aliasing”) consists of changing the color of the pixels near the outlines: the background color is gradually mixed with the object's color.

As a side effect, the sharpness of the outlines is reduced.

6. Graphics Adapters

Structure of a Graphics Adapter
Color Representation
Video Memory
Graphics Accelerators
3D Accelerators
Graphics Processing Units
Digital Interfaces for Monitors

Graphics Processing Units

Overview
GPGPU Computing
The CUDA Architecture
The NVIDIA GP100 GPU

Overview (1)

GPU (Graphics Processing Unit): a dedicated graphics processor for PCs, workstations, and game consoles.

Initially used to accelerate the rendering stage of 3D graphics (e.g., texture mapping); later also used to accelerate the geometric computations (rotation, translation).

GPUs contain shader units and modules for texture mapping, anti-aliasing, etc.

Overview (2)

Vertex shader units: transform the 3D position of each vertex into 2D screen coordinates and a depth value for the z-buffer. They can modify vertex attributes such as position, color, and texture coordinates.

Geometry shader units: generate geometric figures or add volumetric details to objects.

Overview (3)

Pixel/fragment shader units: determine the color, z depth, and alpha value of each pixel or fragment.

Unified shader units: programmable units able to perform various shading operations (vertex, geometry, pixel). GPUs contain an array of such computing units and a unit that distributes the operations to be performed among them.

Overview (4)

The architecture with programmable units allows a more flexible use of the hardware resources. The programmable units can also be used for other types of computations, so a flexible parallel architecture is obtained.

GPUs also include modules for 2D acceleration, MPEG compression, high-definition video decoding

Overview (5)

GPUs can be dedicated or integrated.

Dedicated GPUs: used in graphics cards connected to the motherboard via a PCI Express bus or an AGP (Accelerated Graphics Port) interface. They have dedicated memory for the card's exclusive use. Examples:

AMD Radeon HD 8xxxM (e.g., 8970M)

NVIDIA GeForce GTX (e.g., GTX 1080)

Overview (6)

Integrated GPUs: integrated into a chipset or processor. They use a portion of the system memory and have lower performance than dedicated GPUs. Examples:

Intel HD Graphics (e.g., UHD Graphics 630); the AMD A10 APU (Accelerated Processing Unit) processor series; NVIDIA GPUs in Tegra processors (K1, X1).

Overview (7)

The design of GPUs was influenced by the 2D and 3D programming interfaces: GPUs implement API functions in hardware.

OpenGL (Open Graphics Library): available for various platforms and languages; provides functions to draw 3D scenes from primitives.

Direct3D (a component of DirectX): only for Microsoft operating systems; a low-level interface to the 3D hardware functions.

Overview (8)

Technologies for connecting multiple GPUs on different graphics cards:

NVIDIA SLI (Scalable Link Interface): 2 to 4 identical graphics cards are connected via the motherboard (PCIe x16).

AMD CrossFireX: up to 4 graphics cards can be connected. The graphics cards do not have to be identical. The cards have external connectors.

Graphics Processing Units

Overview
GPGPU Computing
The CUDA Architecture
The NVIDIA GP100 GPU

GPGPU Computing (1)

GPGPU (General-Purpose computing on GPUs): the GPU processing cores provide massive floating-point computational power.

Example: a single NVIDIA GP100 GPU (3,584 cores) achieves 10.6 TFLOPS in single precision.

The graphics pipeline can also be used for general-purpose applications.

For suitable, highly parallel applications, the performance can be orders of magnitude higher than that of conventional CPUs.
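
The 10.6 TFLOPS figure can be reproduced from the core count. This sketch assumes that each CUDA core completes one fused multiply-add (counted as two floating-point operations) per clock, and a boost clock of about 1.48 GHz for the Tesla P100, a value not given in this lecture. The same formula with 32 double-precision units per SM gives the 5.3 TFLOPS double-precision figure.

```python
def peak_tflops(units, clock_ghz, flops_per_unit_per_clock=2):
    """Peak throughput = units x clock x FLOPs per unit per clock (an FMA counts as 2)."""
    return units * clock_ghz * 1e9 * flops_per_unit_per_clock / 1e12


BOOST_CLOCK_GHZ = 1.48   # assumed Tesla P100 boost clock
SMS, SP_PER_SM, DP_PER_SM = 56, 64, 32

print(f"SP: {peak_tflops(SMS * SP_PER_SM, BOOST_CLOCK_GHZ):.1f} TFLOPS")   # 10.6
print(f"DP: {peak_tflops(SMS * DP_PER_SM, BOOST_CLOCK_GHZ):.1f} TFLOPS")   # 5.3
```

These are theoretical peaks; real applications reach them only when they keep all cores busy with multiply-add work.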

GPGPU Computing (2)

GPUs process independent vertices and pixels/fragments, so they can operate as stream processors.

Stream: a set of records that require similar computation. Kernel function: applied to each element of the stream. Shared memories cannot be used.

Ideal GPGPU applications: large data sets, high parallelism, and few dependencies between data elements.

GPGPU Computing (3)

Disadvantages of GPGPU computing:

The programmer needs to be familiar with the graphics APIs and the GPU architecture. Problems need to be expressed in terms of coordinates, textures, and shader functions. Graphics programming interfaces and languages must be used: OpenGL, DirectX, Cg.

API extensions allow some program functions to run on the GPU's processors: CUDA (NVIDIA), OpenCL (Khronos Group).

Graphics Processing Units

Overview
GPGPU Computing
The CUDA Architecture
The NVIDIA GP100 GPU

The CUDA Architecture (1)

CUDA (Compute Unified Device Architecture): a software and hardware architecture.

Enables GPUs to execute programs written in C, C++, Fortran, and OpenCL. Allows the use of Microsoft's DirectCompute API. Allows direct access to the GPU resources for general-purpose computing.

Exploits the GPU's capability to operate on large matrices in parallel.

The CUDA Architecture (2)

A CUDA program calls kernel functions that are executed by threads. Threads are organized into blocks, and blocks into groups of blocks (grids).

Thread block: a set of concurrent threads that communicate through a shared memory. Each thread has an identifier, registers, private memory, inputs, and outputs.

The CUDA Architecture (3)

Grid of blocks: a group (array) of thread blocks that execute the same kernel function. Grids ensure synchronization between dependent kernel functions: results are shared in a global memory allocated to the application, after a global synchronization.

The CUDA Architecture (4)

The CUDA Architecture (5)

The hierarchy of threads is executed on a hierarchy of processors on the GPU

Threads: executed by CUDA cores and other execution units.
Thread blocks: executed by a streaming multiprocessor (SM); a group of 32 threads is called a warp.
Grids of blocks: executed by the GPU.
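
The mapping from the thread hierarchy to data elements can be emulated on the CPU. The sketch mimics a one-dimensional CUDA launch: each (block, thread) pair computes a global index as blockIdx.x × blockDim.x + threadIdx.x, the grid size is rounded up so that all n elements are covered, and threads are grouped into warps of 32. It is a sequential Python model for illustration, not CUDA code; the kernel `saxpy_kernel` and the helpers are hypothetical.

```python
WARP_SIZE = 32


def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Hypothetical kernel body: one thread computes one element of a * x + y."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(x):                           # guard: the last block may be partial
        out[i] = a * x[i] + y[i]


def launch(kernel, n, block_dim, *args):
    """Sequentially emulate a 1D grid launch covering n elements."""
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(grid_dim):            # in CUDA, each block runs on one SM
        for thread_idx in range(block_dim):      # in CUDA, these threads run concurrently
            kernel(block_idx, block_dim, thread_idx, *args)
    return grid_dim


def warp_of(thread_idx):
    """Warp number of a thread within its block (consecutive groups of 32)."""
    return thread_idx // WARP_SIZE


n = 1000
x, y = list(range(n)), [1.0] * n
out = [0.0] * n
blocks = launch(saxpy_kernel, n, 256, 2.0, x, y, out)
print(blocks, out[:3], out[-1])   # 4 blocks of 256 threads cover 1000 elements
```

The bounds check inside the kernel is needed because the grid is rounded up: here 4 × 256 = 1,024 threads are launched for 1,000 elements.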

The CUDA Architecture (6)

Unified Virtual Addressing (CUDA 4): provides a single virtual address space for CPU and GPU memory.

Unified Memory (CUDA 6): part of the GPU physical memory is shared between the CPU and the GPU. The CUDA software migrates data allocated in the unified memory between the GPU and the CPU. Memory modified by the CPU must be synchronized with the GPU memory.

Graphics Processing Units

Overview
GPGPU Computing
The CUDA Architecture
The NVIDIA GP100 GPU

The NVIDIA GP100 GPU (1)

Uses a new architecture, Pascal.

Previous architectures: Fermi, Kepler, Maxwell.

Main features: 15.3 billion transistors, 16 nm technology; NVIDIA's NVLink interconnect, with a bandwidth of up to 160 GB/s; an integrated HBM2 (High Bandwidth Memory 2) stacked memory of 16 GB to 32 GB; improved power efficiency (performance per watt).

The NVIDIA GP100 GPU (2)

A full implementation contains:
six Graphics Processing Cluster (GPC) units;
30 Texture Processing Cluster (TPC) units;
60 SM units, each with 64 CUDA cores and 4 texture units (3,840 cores and 240 texture units in total);
eight 512-bit memory controllers (4,096-bit memory interface);
four HBM2 DRAM memory stacks;
a common L2 cache memory for the SM units;
the GigaThread Engine global scheduler.

The NVIDIA GP100 GPU (3)

The NVIDIA GP100 GPU (4)

Each SM unit contains:
64 single-precision (SP) CUDA cores, each with an integer arithmetic and logic unit and an IEEE 754-2008 floating-point unit supporting the fused multiply-add (FMA) instruction;
32 double-precision (DP) units;
16 load/store (LD/ST) units;
16 special-function units (SFU) for transcendental functions.

The NVIDIA GP100 GPU (5)

The NVIDIA GP100 GPU (6)

Threads are scheduled in groups of 32 (warps). Each SM unit contains:
two warp schedulers, which provide increased performance and reduced power consumption;
four instruction dispatch units.

Two instructions from each warp can be dispatched in each clock cycle.

The NVIDIA GP100 GPU (7)

Memory subsystem:
each SM unit contains an instruction cache memory;
a separate L1 data cache memory can also be used as a texture cache;
4,096 KB of unified L2 cache memory allows data to be shared between the SM units;
each SM unit has 64 KB of shared memory;
the register files and memories are protected by an ECC (error-correcting code).

The NVIDIA GP100 GPU (8)

The NVIDIA Tesla P100 GPU accelerator: based on the Pascal GPU architecture and the CUDA parallel computing model; designed as an accelerator for datacenters. It contains one GP100 GPU, with only 56 SM units enabled.

Number of CUDA cores: 56 × 64 = 3,584.
Single-precision FP performance: 10.6 TFLOPS.
Double-precision FP performance: 5.3 TFLOPS.
Memory size: 4 × 4 GB = 16 GB (HBM2).

The NVIDIA GP100 GPU (9)

Summary (1)

The main component of a graphics adapter is the graphics controller

It contains the system bus interface; the speed of this interface is an important performance factor

Video ports allow video images from other sources to be combined with graphics images.

The transfer rate of the video memory has a major impact on performance.

Dual-ported memories: updating the images and refreshing the screen can be performed in parallel

Summary (2)

The GDDR5 memory has advanced features for high performance and stable operation

Data and address bus inversion; signal training; calibration; error detection; power management

3D accelerators are required to convert 3D objects into 2D images in a realistic manner.

For each pixel of a 3D object, an alpha value and the z coordinate have to be stored.

3D operations are performed in two stages: the geometry stage and the rendering stage.

Summary (3)

GPUs are used to accelerate the geometry stage and the rendering stage of 3D graphics.

GPUs can be dedicated or integrated into a chipset or processor. They contain a large number of processing cores, programmable for various types of shading.

The processing power of GPUs can also be used for applications that require vector operations.

The CUDA architecture allows direct access to the GPU resources for general-purpose computing.

Concepts, Knowledge (1)

Structure of a graphics adapter
Components of the graphics controller
Function of the RAMDAC circuit
Features of the GDDR5 graphics memory
Data and address bus inversion of the GDDR5 graphics memory
Signal training of the GDDR5 graphics memory
Representation of 3D objects
3D operations performed in the geometry stage

Concepts, Knowledge (2)

3D operations performed in the rendering stage
Types of shader units contained by GPUs
Dedicated and integrated GPUs
Advantages and disadvantages of GPGPU computing
The CUDA architecture
Thread block in the CUDA architecture
Grid of blocks in the CUDA architecture

Questions

1. What is the advantage of a dual-ported video memory?

2. What are the power management features of the GDDR5 video memory?

3. What information is required for representing 3D objects?

4. What operations are performed in the rendering stage for 3D images?
