How a GPU Works - Kayvon Fatahalian

  • 7/25/2019 How a GPU Works - Kayvon Fatahalian

    Kayvon Fatahalian

    15-462 (Fall 2011)

    How a GPU Works

    Today

    1. Review: the graphics pipeline

    2. History: a few old GPUs

    3. How a modern GPU works (and why it is so fast)

    4. Closer look at a real GPU design

    Part 1: The graphics pipeline

    Vertex processing

    [Figure: six vertices, v0 through v5]

    Vertices are transformed into screen space.

    Each vertex is transformed independently.
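
To make the independence of per-vertex work concrete, here is a minimal Python sketch of vertex processing. The 4x4 row-major matrix layout, the viewport mapping, and the names `transform_vertex`, `to_screen`, and `process_vertices` are assumptions made for illustration; they are not taken from the lecture.

```python
def transform_vertex(m, v):
    """Multiply a 4x4 row-major matrix by a homogeneous vertex (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def to_screen(clip, width, height):
    """Perspective divide, then map normalized [-1, 1] coordinates to pixels."""
    x, y, z, w = clip
    ndc_x, ndc_y = x / w, y / w
    return ((ndc_x + 1) * 0.5 * width, (1 - ndc_y) * 0.5 * height)


def process_vertices(m, vertices, width, height):
    # Each vertex depends only on its own data, so the iterations could run
    # in any order, or all at once on parallel hardware.
    return [to_screen(transform_vertex(m, v), width, height) for v in vertices]


# Hypothetical input: identity transform and three vertices.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
screen = process_vertices(
    IDENTITY, [(0, 0, 0, 1), (1, 1, 0, 1), (-1, -1, 0, 1)], 640, 480)
```

Because no vertex reads another vertex's result, the list comprehension could be split across any number of workers without changing the output.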

    Primitive processing

    [Figure: vertices v0 through v5 grouped into primitives]

    Vertices are then organized into primitives, which are clipped and culled.

    Rasterization

    Primitives are rasterized into pixel fragments.

    Fragment processing

    Fragments are shaded to compute a color at each pixel.

    Pixel operations

    Fragments are blended into the frame buffer at their pixel locations (the z-buffer determines visibility).
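
As a rough illustration of how a z-buffer determines visibility, the Python sketch below keeps only the nearest fragment at each pixel. The fragment tuple layout, the "smaller depth is closer" convention, and the function name `resolve_fragments` are assumptions for this example; it also writes opaque colors rather than blending.

```python
def resolve_fragments(fragments, width, height, clear_color=(0, 0, 0)):
    """Keep the nearest fragment per pixel using a depth (z) buffer.

    Each fragment is (x, y, depth, color); smaller depth means closer.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[clear_color] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:      # depth test: is this fragment visible?
            depth[y][x] = z
            color[y][x] = c      # opaque write; real blending may mix colors
    return color


# Two fragments land on pixel (0, 0); the one with depth 0.3 wins.
frame = resolve_fragments(
    [(0, 0, 0.8, (255, 0, 0)), (0, 0, 0.3, (0, 255, 0)), (1, 0, 0.5, (0, 0, 255))],
    width=2, height=1)
```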

    Pipeline entities

    [Figure panels: Vertices, Primitives, Fragments]

    Graphics pipeline

    Stages: Vertex Generation, Vertex Processing, Primitive Generation, Primitive Processing, Fragment Generation

    Entities: Vertices, Primitives

    Streams between stages: Vertex stream, Primitive stream, Fragment stream

    Memory buffers: Vertex Data Buffers, Textures

    Part 2: Graphics architecture

    What's so important about independent computations?

    Silicon Graphics RealityEngine (1

    [Figure: pipeline stages (Vertex Generation, Vertex Processing, Primitive Generation, Primitive Processing, Fragment Generation)]

    graphics supercomputer

    Pre-1999 PC 3D graphics accelerator

    3dfx Voodoo

    NVIDIA RIVA TNT

    [Figure: pipeline stages (Vertex Generation, Vertex Processing, Primitive Generation, Primitive Processing, Fragment Generation), with blocks labeled CPU, Clip/cull/rasterize, Tex, Tex]

    GPU* circa 1999

    [Figure: pipeline stages (Vertex Generation, Vertex Processing, Primitive Generation, Primitive Processing, Fragment Generation), with blocks labeled CPU and GPU]

    Direct3D 10 programmability: 20

    NVIDIA GeForce 8800

    [Figure: pipeline stages (Vertex Generation, Vertex Processing, Primitive Generation, Primitive Processing, Fragment Generation) next to a chip diagram with 16 blocks labeled Core, 6 labeled Pixel op, and Tex, Clip/Cull/Rast, and Scheduler units]

    Part 3: How a shader core works

    GPUs are fast

    Intel Core i7 Quad Core: ~100 GFLOPS peak, 730 million transistors

    AMD Radeon HD: ~2.7 TFLOPS peak, 2.2 billion transistors

    A diffuse reflectance shader

    Shader programming model:

    Fragments are processed independently, but there is no explicit parallel programming.

    Independent logical sequence of control per fragment. ***
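
As an illustration of what a diffuse reflectance shader computes for a single fragment, here is a minimal Python sketch. It assumes the usual Lambertian form (texture color scaled by the clamped dot product of light direction and surface normal); the names `diffuse_shader`, `sample_texture`, and `gray_texture` are hypothetical and not taken from the lecture.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def diffuse_shader(norm, uv, light_dir, sample_texture):
    """Shade one fragment: texture color scaled by the clamped N.L term."""
    kd = sample_texture(uv)                                  # texture lookup
    n_dot_l = sum(l * n for l, n in zip(light_dir, norm))    # 3-component dot product
    scale = clamp(n_dot_l, 0.0, 1.0)                         # no negative light
    return (kd[0] * scale, kd[1] * scale, kd[2] * scale, 1.0)


# Hypothetical constant-color "texture" used only for this example.
def gray_texture(uv):
    return (0.5, 0.5, 0.5)


lit = diffuse_shader((0.0, 0.0, 1.0), (0.25, 0.75), (0.0, 0.0, 1.0), gray_texture)
unlit = diffuse_shader((0.0, 0.0, -1.0), (0.25, 0.75), (0.0, 0.0, 1.0), gray_texture)
```

The function takes one fragment's inputs and returns one color. Nothing in it mentions other fragments, which is the sense in which the programming model has no explicit parallelism.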

    Big Guy, lookin' diffuse

    Compile shader

    <diffuseShader>:
    sample r0, v4, t0, s0
    mul  r3, v0, cb0[0]
    madd r3, v1, cb0[1], r3
    madd r3, v2, cb0[2], r3
    clmp r3, r3, l(0.0), l(1.0)
    mul  o0, r0, r3
    mul  o1, r1, r3
    mul  o2, r2, r3
    mov  o3, l(1.0)

    Execute shader

    [Figure: the compiled instruction stream above, executed one instruction at a time]

    CPU-style cores

    Fetch/Decode

    ALU (Execute)

    Execution Context

    Out-of-order control logic

    Fancy branch predictor

    Memory pre-fetcher

    Data cache (a big one)

    Slimming down

    Fetch/Decode

    ALU (Execute)

    Execution Context

    Idea #1: Remove components that help a single instruction stream run fast

    Two cores (two fragments in parallel)

    [Figure: two cores, each with Fetch/Decode, ALU (Execute), and Execution Context]

    <diffuseShader>:
    sample r0, v4, t0, s0
    mul  r3, v0, cb0[0]
    madd r3, v1, cb0[1], r3
    madd r3, v2, cb0[2], r3
    clmp r3, r3, l(0.0), l(1.0)
    mul  o0, r0, r3
    mul  o1, r1, r3
    mul  o2, r2, r3
    mov  o3, l(1.0)

    Four cores (four fragments in parallel)

    [Figure: four cores, each with Fetch/Decode, ALU (Execute), and Execution Context]

    Sixteen cores (sixteen fragments in parallel)

    16 cores = 16 simultaneous instruction streams

    Instruction stream sharing

    But... many fragments should be able to share an instruction stream!

    <diffuseShader>:
    sample r0, v4, t0, s0
    mul  r3, v0, cb0[0]
    madd r3, v1, cb0[1], r3
    madd r3, v2, cb0[2], r3
    clmp r3, r3, l(0.0), l(1.0)
    mul  o0, r0, r3
    mul  o1, r1, r3
    mul  o2, r2, r3
    mov  o3, l(1.0)

    Fetch/Decode

    ALU (Execute)

    Execution Context

    Add ALUs

    Idea #2: Amortize the cost/complexity of managing an instruction stream across many ALUs

    SIMD processing

    [Figure: one Fetch/Decode unit driving ALU 1 through ALU 8, with eight execution contexts (Ctx) and shared Ctx data]

    Modifying the shader

    [Figure: one Fetch/Decode unit driving ALU 1 through ALU 8, with eight execution contexts (Ctx) and shared Ctx data]

    <diffuseShader>:
    sample r0, v4, t0, s0
    mul  r3, v0, cb0[0]
    madd r3, v1, cb0[1], r3
    madd r3, v2, cb0[2], r3
    clmp r3, r3, l(0.0), l(1.0)
    mul  o0, r0, r3
    mul  o1, r1, r3
    mul  o2, r2, r3
    mov  o3, l(1.0)

    New compiled shader: processes eight fragments using vector ops on vector registers

    <VEC8_diffuseShader>:
    VEC8_sample vec_r0, vec_v4, t0, vec_s0
    VEC8_mul  vec_r3, vec_v0, cb0[0]
    VEC8_madd vec_r3, vec_v1, cb0[1], vec_r3
    VEC8_madd vec_r3, vec_v2, cb0[2], vec_r3
    VEC8_clmp vec_r3, vec_r3, l(0.0), l(1.0)
    VEC8_mul  vec_o0, vec_r0, vec_r3
    VEC8_mul  vec_o1, vec_r1, vec_r3
    VEC8_mul  vec_o2, vec_r2, vec_r3
    VEC8_mov  o3, l(1.0)

    [Figure: fragments 1 through 8 mapped onto ALU 1 through ALU 8]
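
The idea of one instruction stream driving eight ALUs can be sketched in Python by treating each "vector register" as a list with one slot per fragment. This is only a model of the concept: the helper names (`vec_mul`, `vec_madd`, `vec_clamp`, `shade_eight`) and the single-channel color are assumptions for illustration, and a Python list comprehension is of course not real SIMD hardware.

```python
LANES = 8  # one lane per ALU in the eight-wide core


def vec_mul(a, b):
    """One vector instruction: the same multiply applied to every lane."""
    return [x * y for x, y in zip(a, b)]


def vec_madd(a, b, c):
    """Vector multiply-add: a * b + c in every lane."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def vec_clamp(a, lo, hi):
    return [max(lo, min(hi, x)) for x in a]


def shade_eight(nx, ny, nz, kd, light):
    """Shade eight fragments with a single instruction stream.

    nx, ny, nz, kd are length-8 lists (one value per fragment);
    light is a 3-vector shared by all lanes.
    """
    lx, ly, lz = light
    d = vec_mul(nx, [lx] * LANES)
    d = vec_madd(ny, [ly] * LANES, d)
    d = vec_madd(nz, [lz] * LANES, d)
    d = vec_clamp(d, 0.0, 1.0)
    return vec_mul(kd, d)


nz = [1.0, 0.5, 0.0, -1.0, 1.0, 0.5, 0.0, -1.0]
out = shade_eight([0.0] * 8, [0.0] * 8, nz, [0.8] * 8, (0.0, 0.0, 1.0))
```

`shade_eight` issues five vector operations regardless of how many lanes there are; the per-instruction overhead is paid once and shared by all eight fragments.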

    16 cores = 128 ALUs, 16 simultaneous instruction streams

    128 [vertices / fragments / primitives / OpenCL work items / CUDA threads] in parallel

    But what about branches?

    [Figure: ALU 1 through ALU 8 processing fragments 1 through 8 over time (clocks)]

    Coherent execution*** (admittedly fuzzy definition): when processing of different entities is similar, and thus can share resources for efficient execution

    - Instruction stream coherence: different fragments follow the same sequence of logic

    - Memory access coherence:

      Different fragments access similar data (avoid memory transactions by reusing data in cache)

      Different fragments simultaneously access contiguous data (enables efficient, bulk-granularity memory transactions)

    Divergence: lack of coherence

    - Usually used in reference to instruction streams (divergent execution does not make full use of SIMD processing)

    *** Do not confuse this use of the term "coherence" with cache coherence protocols

    GPUs share instruction streams across many fragments

    In modern GPUs: 16 to 64 fragments share an instruction stream.
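
One way to see why divergence wastes SIMD capacity is to model a group of fragments that share an instruction stream and hit a branch. The Python sketch below is a simplified cost model, not a description of any particular GPU: the function name `run_branch_simd`, the `x > 0` condition, and the per-side clock costs are all assumptions chosen for illustration.

```python
def run_branch_simd(xs, then_cost, else_cost):
    """Model one SIMD group executing `if x > 0: ... else: ...`.

    All lanes share one instruction stream, so the group pays for every
    side of the branch that at least one lane takes. Lanes whose mask bit
    is off sit idle while the other side runs. `xs` must be non-empty.
    Returns (clocks_spent, fraction_of_ALU_slots_doing_useful_work).
    """
    mask = [x > 0 for x in xs]
    lanes = len(xs)
    clocks = 0
    useful = 0
    if any(mask):                       # some lane takes the 'then' side
        clocks += then_cost
        useful += then_cost * sum(mask)
    if not all(mask):                   # some lane takes the 'else' side
        clocks += else_cost
        useful += else_cost * (lanes - sum(mask))
    return clocks, useful / (clocks * lanes)


coherent = run_branch_simd([1, 2, 3, 4, 5, 6, 7, 8], then_cost=10, else_cost=2)
divergent = run_branch_simd([1, -1, -1, -1, -1, -1, -1, -1], then_cost=10, else_cost=2)
```

In this model the coherent group keeps every ALU busy, while the divergent group spends 12 clocks and uses only a quarter of the available ALU slots, because seven lanes idle through the expensive side taken by one fragment.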

    Stalls!

    Stalls occur when a core cannot run the next instruction because of a dependency on a previous operation.

    Recall: diffuse reflectance shader

    Texture access: latency of 100s to 1000s of cycles

    Recall: CPU-style core

    Fetch/Decode

    ALU

    Execution Context

    OOO exec logic

    Branch predictor

    Data cache (a big one: several MB)

    CPU-style memory hierarchy

    CPU cores run efficiently when data is resident in cache (caches reduce latency, provide high bandwidth)

    Processing core (several cores per chip): Fetch/Decode, ALU, execution contexts, OOO exec logic, branch predictor, L1 cache (32 KB), L2 cache (256 KB)

    L3 cache (8 MB), shared across cores

    Stalls!

    Texture access latency = 100s to 1000s of cycles

    We've removed the fancy caches and logic that help avoid stalls.

    Stalls occur when a core cannot run the next instruction because of a dependency on a previous operation.

    But we have LOTS of independent fragments. (Way more fragments to process than ALUs)

    Idea #3: Interleave processing of many fragments on a single core to avoid stalls caused by high-latency operations.

    Hiding shader stalls

    [Figure: timeline (clocks) for one core with ALU 1 through ALU 8 and four contexts (1 to 4) holding fragment groups Frag 1-8, Frag 9-16, Frag 17-24, and Frag 25-32. Each group starts, alternates between Stall and Runnable, and finishes (Done!).]

    Throughput!

    Increase run time of one group to increase throughput of many groups
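
The effect of interleaving can be sketched with a small Python simulation of a single core. The workload shape (compute, wait on a fetch, compute again), the first-runnable scheduling policy, and the name `simulate_core` are assumptions made for this example; real hardware schedulers differ.

```python
def simulate_core(num_groups, run_clocks, stall_clocks):
    """Clocks for one core to finish `num_groups` fragment groups.

    Hypothetical workload: each group computes for `run_clocks`, waits
    `stall_clocks` on a high-latency fetch, then computes for `run_clocks`
    again. The core runs one group at a time and switches to another
    runnable group whenever the current one stalls.
    Returns (total_clocks, alu_utilization).
    """
    ready_at = [0] * num_groups      # clock at which each group can run next
    phases_left = [2] * num_groups   # compute phases remaining per group
    clock = 0
    busy = 0
    while any(phases_left):
        runnable = [g for g in range(num_groups)
                    if phases_left[g] and ready_at[g] <= clock]
        if not runnable:
            # Every unfinished group is stalled: the ALUs sit idle.
            clock = min(ready_at[g] for g in range(num_groups) if phases_left[g])
            continue
        g = runnable[0]
        clock += run_clocks
        busy += run_clocks
        phases_left[g] -= 1
        ready_at[g] = clock + stall_clocks
    return clock, busy / clock


one_group = simulate_core(1, run_clocks=10, stall_clocks=30)
four_groups = simulate_core(4, run_clocks=10, stall_clocks=30)
```

With these made-up numbers, a single group leaves the ALUs idle for 30 of its 50 clocks, while four interleaved groups finish four times the work in 80 clocks with the ALUs busy throughout.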

    Storing contexts

    [Figure: one Fetch/Decode unit, ALU 1 through ALU 8, and a pool of context storage (128 KB)]

    Eighteen small contexts (maximal latency hiding ability)

    Twelve medium contexts

    Four large contexts (low latency hiding ability)

    My chip!

    16 cores

    8 mul-add ALUs per core (128 total)

    16 simultaneous instruction streams

    64 concurrent (but interleaved) instruction streams

    512 concurrent fragments

    = 256 GFLOPS (@ 1 GHz)

    My enthusiast chip!

    32 cores, 16 ALUs per core (512 total) = 1 TFLOP (@ 1 GHz)
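
The peak figures for both example chips follow from simple arithmetic, sketched below in Python. The sketch assumes one mul-add counts as two floating-point operations and that the 16-core chip interleaves four groups of eight fragments per core; the function name `peak_gflops` is made up for this example.

```python
def peak_gflops(cores, alus_per_core, clock_ghz, flops_per_alu_per_clock=2):
    """Peak rate, counting one mul-add as two floating-point operations."""
    return cores * alus_per_core * flops_per_alu_per_clock * clock_ghz


my_chip = peak_gflops(cores=16, alus_per_core=8, clock_ghz=1.0)       # 256 GFLOPS
enthusiast = peak_gflops(cores=32, alus_per_core=16, clock_ghz=1.0)   # 1024 GFLOPS, about 1 TFLOP

# Concurrency on the 16-core chip, assuming four interleaved groups per core.
streams = 16 * 4          # concurrent (but interleaved) instruction streams
fragments = streams * 8   # eight fragments share each stream
```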

    Summary: three key ideas for high-throughput execution

  • 7/25/2019 How a GPU Works - Kayvon Fatahalian

    1. Use many slimmed down cores, run them in parallel

    2. Pack cores full of ALUs (by sharing instruction stream overhead across groups of fragments)

       Option 1: Explicit SIMD vector instructions

       Option 2: Implicit sharing managed by hardware

    3. Avoid latency stalls by interleaving execution of many groups of fragments

       When one group stalls, work on another group

    Putting the three ideas into practice: A closer look at a real GPU

    NVIDIA GeForce GTX 480 (Fermi)

    NVIDIA-speak:

    - 480 stream processors (CUDA cores)

    - SIMT execution

    Generic speak:

    - 15 cores

    - 2 groups of 16 SIMD functional units per core

    NVIDIA GeForce GTX 480 core

    [Figure: one core, with two Fetch/Decode units, SIMD function units, execution contexts (128 KB), and shared scratchpad memory (16+48 KB)]

    SIMD function unit: control shared across 16 units (1 MUL-ADD per clock)

    Groups of 32 fragments share an instruction stream

    Up to 48 groups are simultaneously interleaved

    Up to 1536 individual contexts

    The core contains 32 functional units

    Two groups are selected each clock (decode, fetch, and execute two instruction streams in parallel)

    Source: Fermi Compute Architecture Whitepaper; CUDA Programming Guide 3.1, Appendix G

    NVIDIA GeForce GTX 480 SM

    [Figure: one SM, with two Fetch/Decode units, CUDA cores, execution contexts (128 KB), and shared scratchpad memory (16+48 KB)]

    CUDA core: 1 MUL-ADD per clock

    The SM contains 32 CUDA cores

    Two warps are selected each clock (decode, fetch, and execute two warps in parallel)

    Up to 48 warps are interleaved, totaling 1536 CUDA threads

    Source: Fermi Compute Architecture Whitepaper; CUDA Programming Guide 3.1, Appendix G

    NVIDIA GeForce GTX 480

    There are 15 of these things on the GTX 480:

    That's 23,000 fragments!

    (or 23,000 CUDA threads!)
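
The "23,000" figure can be checked from the per-core numbers quoted above. The short Python sketch below only multiplies those numbers together; the constant names are chosen for this example.

```python
SM_COUNT = 15           # cores (SMs) on the GTX 480
WARPS_PER_SM = 48       # groups interleaved per core
THREADS_PER_WARP = 32   # fragments sharing one instruction stream

contexts_per_sm = WARPS_PER_SM * THREADS_PER_WARP   # 1536 contexts per core
contexts_on_chip = SM_COUNT * contexts_per_sm       # 23040, rounded to 23,000
cuda_cores = SM_COUNT * 2 * 16                      # 2 groups of 16 units per core
```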

    Looking Forward

    Current and future: GPU architectures

    - Bigger and faster (more cores, more FLOPS)

      2 TFLOPs today, and counting

    - Addition of (select) CPU-like features

      More traditional caches

    - Tight integration with CPUs (CPU+GPU hybrids)

      See AMD Fusion

    - What fixed-function hardware should remain?

    Recent trends

    - Support for alternative programming interfaces

      Accelerate non-graphics applications using GPU (CUDA, OpenCL)

    - How does the graphics pipeline abstraction change to enable more advanced real-time graphics?

      Direct3D 11 adds three new pipeline stages

    Global illumination algorithms

    Credit: Bratincevic

    Credit: NVIDIA

    Ray tracing: for accurate reflections, shadows

    Credit: Ingo Wald

    Alternative shading structures (e.g., deferred shading)

    For more efficient scaling to many lights (1000 lights, [Andersson 09])

    Simulation

    Cinematic scene complexity

    Motion blur

    Thanks!

    Relevant CMU Courses for students interested in high performance graphics:

    15-869: Graphics and Imaging Architectures (my special topics course)

    15-668: Advanced Parallel Graphics (Treuille)

    15-418: Parallel Architecture and Programming (spring semester)

