03-powerOf3DRendering-BPS2011-koduri

  • 7/29/2019 03-powerOf3DRendering-BPS2011-koduri

    1/32

    Power of Realtime 3D-Rendering

    Raja Koduri

    We ate our GPU cake - and had more too!

    16+ years of (sugar) high!

    In every GPU generation:

    More performance and performance-per-watt

    More programmability, precision and features

    While maintaining compatibility with 8+ generations of APIs

    "Vuoi la botte piena e la moglie ubriaca" (Italian proverb: "you want the barrel full and the wife drunk", i.e. to have your cake and eat it too)

    Chip Power = Static Power + Dynamic Power

    System Power = CPU + GPU + Other

    Static power is leakage of inactive transistors

    Static Power = N*V*e^(-Vt)

    Dynamic power is from active switching transistors

    Dynamic Power = A*N*C*F*V^2

    N - Number of transistors

    V - Voltage

    C - Capacitance per transistor

    F - Frequency

    Vt - Threshold Voltage

    A - Activity Factor

    Energy = Power(Watts) * Time
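The three relations on this slide can be tried out numerically. The following is a minimal sketch, assuming normalized units and proportionality constants of 1 (the slide gives only the shape of each formula); the function names are illustrative and not from the talk.

```python
import math

def static_power(n, v, vt):
    """Leakage model from the slide: N * V * e^(-Vt), in relative units."""
    return n * v * math.exp(-vt)

def dynamic_power(a, n, c, f, v):
    """Switching model from the slide: A * N * C * F * V^2, in relative units."""
    return a * n * c * f * v ** 2

def energy(power, seconds):
    """Energy = Power * Time."""
    return power * seconds

# Assumed example: drop both frequency and voltage to 80% of nominal.
base = dynamic_power(a=1.0, n=1.0, c=1.0, f=1.0, v=1.0)
scaled = dynamic_power(a=1.0, n=1.0, c=1.0, f=0.8, v=0.8)
ratio = scaled / base  # 0.8 * 0.8^2, roughly half the dynamic power
```

Because voltage enters squared, lowering F and V together cuts dynamic power much faster than it cuts frequency.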

    0-10 Watts TDP: Mobile Devices (phones, tablets, etc.)

    10-50 Watts TDP: Mobile Computers (computers with a battery)

    50-300 Watts TDP: Desktops, Servers, Consoles, etc. (always plugged in)

    TDP - Thermal design power

    Maximum amount of power that the thermal system can sustain

    Sub-Moore's-law scaling of performance-per-watt

    N wants to go up, but sub-Moore scaling on C and V

    GPUs scale better than CPUs

    Beware of the Fine Print though

    Towards lower power computers

    Battery life

    Thermal limits

    Acoustics

    $-per-kilowatt-hour

    Green

    [Figure: power scale with marks at 0 W, 10 W, 50 W and 300 W, covering Mobile Devices, Mobile Computers and Desktops]

    HW vendors compelled to succeed in mobile

    Chip Power = N*V*e^(-Vt) + A*N*C*F*V^2

    CPUs

    Prioritize Frequency - higher V and Vt

    Spend N for caches, cores, flexibility and compute qu

    GPUs

    Lower Frequency and Voltage

    Spend N for shaders, textures, pixels etc

    FixedFunction

    Lowest N, F and V for a given task

    If your workload parallelizes well on a GPU

    Use the GPU

    Optimize for System Energy

    It's only a win if

    GPU_Workload_power * GPU_Executiontime + CPU_GPU_feeding_power * CPU_GPU_feeding_time

    is less than

    CPU_Workload_power * CPU_Executiontime
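The inequality above is easy to check for a given workload. This is a minimal sketch; the function name and every number in the examples are made-up illustrations, not measurements from the talk.

```python
def gpu_offload_saves_energy(gpu_power_w, gpu_time_s,
                             feed_power_w, feed_time_s,
                             cpu_power_w, cpu_time_s):
    """True if running on the GPU (plus the CPU cost of feeding it)
    uses less system energy than running the workload on the CPU."""
    gpu_path_j = gpu_power_w * gpu_time_s + feed_power_w * feed_time_s
    cpu_path_j = cpu_power_w * cpu_time_s
    return gpu_path_j < cpu_path_j

# Hypothetical case A: GPU is 4x faster, so 20*0.5 + 10*0.5 = 15 J vs 30*2 = 60 J.
win = gpu_offload_saves_energy(20.0, 0.5, 10.0, 0.5, 30.0, 2.0)

# Hypothetical case B: GPU is only 2x faster, so 40*1 + 15*1 = 55 J vs 25*2 = 50 J.
loss = gpu_offload_saves_energy(40.0, 1.0, 15.0, 1.0, 25.0, 2.0)
```

Case B shows why a speedup alone is not enough: the feeding cost on the CPU side can erase the gain.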

    Power off unused areas (power gating)

    When not in use, Static Power ~= 0

    Fine print: latency with power toggle (a few nanoseconds to a few hundred microseconds)

    Too aggressive switching may cause performance problems, too conservative switching may lead to wasted power

    Static Power = N*V*e^(-Vt)

    Primary OS+HW strategy is to control F & V

    Based on history of A

    Applications have control of Activity(A)

    Lower A leads to lower F and V, and lower power.

    Fine Print

    Switching F & V states can take from a few milliseconds to a few seconds

    Dynamic Power = A*N*C*F*V^2
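A toy version of "control F & V based on history of A" can make the reactive behavior concrete. This is a sketch under stated assumptions: the operating points in `STATES`, the 4-sample history window, and the selection rule are all hypothetical, and real power managers are considerably more involved.

```python
from collections import deque

# Hypothetical (frequency, voltage) operating points, normalized to the top state.
STATES = [(0.3, 0.7), (0.5, 0.8), (0.8, 0.9), (1.0, 1.0)]

def pick_state(recent_activity):
    """Pick the slowest state whose frequency covers the average recent activity."""
    avg = sum(recent_activity) / len(recent_activity)
    for freq, volt in STATES:
        if freq >= avg:
            return freq, volt
    return STATES[-1]

def relative_dynamic_power(activity, freq, volt):
    """A*N*C*F*V^2 with N and C held at 1 (same chip)."""
    return activity * freq * volt ** 2

# Activity drops from 0.9 to 0.2 halfway through; the state follows with a lag.
history = deque(maxlen=4)
trace = []
for activity in [0.9, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.2]:
    history.append(activity)
    trace.append(pick_state(history))
```

In this sketch the governor only reaches the lowest state once the whole history window reflects the lower activity, which is the reactive lag the fine print warns about.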

    [Figure: Activity/Performance/Power sample illustration. Activity, Frequency and Power (0 to 1.0) plotted against Time, with the state switch latency marked]

    Applications

    GPU API/Runtime

    GPU User Mode Drivers

    GPU Kernel Mode Drivers

    GPU Power Management Driver

    GPU Power Microcontroller

    Tip 1

    Control frame rate to the minimum desired

    Pumping out more frames than the user can see is wasteful anyway
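One common way to follow Tip 1 is a frame limiter that sleeps out the unused part of each frame budget instead of rendering extra frames. The sketch below is an assumption-laden illustration: `render_frame` stands in for the application's real rendering call, and real applications would more often rely on vsync or the platform's presentation API.

```python
import time

def remaining_frame_budget(elapsed_s, target_fps):
    """Seconds left in the frame budget after elapsed_s of work (never negative)."""
    return max(0.0, 1.0 / target_fps - elapsed_s)

def run_capped(render_frame, target_fps, frames):
    """Render `frames` frames, sleeping out the rest of each frame budget.
    Returns the total wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(frames):
        frame_start = time.perf_counter()
        render_frame()  # hypothetical stand-in for real rendering work
        elapsed = time.perf_counter() - frame_start
        time.sleep(remaining_frame_budget(elapsed, target_fps))
    return time.perf_counter() - start

# Four trivially cheap frames capped at 200 fps take about 20 ms instead of ~0 ms.
total_s = run_capped(lambda: None, target_fps=200, frames=4)
```

Sleeping (rather than spinning) during the leftover budget is what lets the hardware drop to a lower power state.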

    [Figure: 50 Watt System. Power in Watts versus Frames-per-sec for simple rendering (like menus) and complex rendering (like gameplay), with the power limit at 50 W]

    [Figure: 25 Watt System. Power in Watts versus Frames-per-sec for simple rendering (like menus) and complex rendering (like gameplay), with the power limit at 25 W, the thermal limit, and a 60 fps app marked]

    Tip 2

    Optimize the frame rendering time to a minimum

    Don't stop optimizing when you hit your minimum frame rate targets!

    [Figure: Activity (0 to 100) versus Time in Milliseconds, Case 1]

    Case 1 Energy = 4*Pmax + 12*Pstatic

    Pstatic is near zero in power-optimized GPUs, so

    Case 1 Energy ~= 4*Pmax

    [Figure: Activity (0 to 100) versus Time in Milliseconds, Case 1 and Case 2]

    Case 1 Energy ~= 4*Pmax

    Case 2 Energy = 16*Pmax
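The two cases can be compared with a few lines of arithmetic. This is a minimal sketch assuming a 16 ms frame (4 ms active plus 12 ms idle in Case 1, as the energy expressions imply) and a hypothetical Pmax of 25 W; only the 4:16 ratio comes from the slides.

```python
def frame_energy_mj(active_ms, frame_ms, p_max_w, p_static_w=0.0):
    """Energy for one frame in millijoules (watts * milliseconds):
    full power while active, static power for the rest of the frame."""
    idle_ms = frame_ms - active_ms
    return active_ms * p_max_w + idle_ms * p_static_w

P_MAX_W = 25.0  # hypothetical peak power, for illustration only

case1_mj = frame_energy_mj(active_ms=4, frame_ms=16, p_max_w=P_MAX_W)   # finish early, then idle
case2_mj = frame_energy_mj(active_ms=16, frame_ms=16, p_max_w=P_MAX_W)  # busy for the whole frame
saving = case2_mj / case1_mj
```

With Pstatic near zero, the frame that finishes in a quarter of the time uses a quarter of the energy, which is the reason to keep optimizing past the frame rate target.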

    Tip 3

    Don't scatter work in a frame (coalesce)

    Scattered work leaves insufficient idle intervals for power-state reduction

    [Figure: Activity (0 to 100) versus Time in Milliseconds, with work scattered across the frame]

    May not be enough idle time for switching to lower power states
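To see why scattering hurts, one can count how much idle time in a frame is actually long enough to use. The sketch below assumes a 16 ms frame and a hypothetical 3 ms cost to enter a lower power state; the interval lists are invented, and both contain the same 4 ms of work.

```python
def usable_idle_ms(busy_intervals, frame_ms, switch_latency_ms):
    """Total idle time that remains after paying the switch latency,
    counting only idle gaps longer than that latency."""
    usable = 0.0
    cursor = 0.0
    for start, end in sorted(busy_intervals):
        gap = start - cursor
        if gap > switch_latency_ms:
            usable += gap - switch_latency_ms
        cursor = max(cursor, end)
    tail = frame_ms - cursor
    if tail > switch_latency_ms:
        usable += tail - switch_latency_ms
    return usable

scattered = [(0, 1), (4, 5), (8, 9), (12, 13)]  # four 1 ms bursts, 3 ms apart
coalesced = [(0, 4)]                            # the same 4 ms done back to back

scattered_idle = usable_idle_ms(scattered, frame_ms=16, switch_latency_ms=3.0)
coalesced_idle = usable_idle_ms(coalesced, frame_ms=16, switch_latency_ms=3.0)
```

Under these assumptions the scattered frame never gets a gap longer than the switch latency, while the coalesced frame leaves 9 ms of usable low-power time.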

    Tip 4

    Avoid spin-loops

    E.g. CPU waiting on GPU

    This looks like real work to the CPU
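The difference between spinning and blocking can be sketched with ordinary threads. Here `fake_gpu_job` is a hypothetical stand-in for GPU work that signals completion; a real application would wait on whatever fence or event its graphics API provides.

```python
import threading
import time

def spin_wait(done):
    """Anti-pattern: poll in a tight loop. The CPU stays fully active,
    so the power manager sees real work. Returns the number of polls."""
    polls = 0
    while not done.is_set():
        polls += 1
    return polls

def blocking_wait(done, timeout_s):
    """Preferred: block until signaled, letting the CPU idle meanwhile."""
    return done.wait(timeout_s)

def fake_gpu_job(done, duration_s):
    """Hypothetical stand-in for GPU work that signals when finished."""
    time.sleep(duration_s)
    done.set()

finished = threading.Event()
worker = threading.Thread(target=fake_gpu_job, args=(finished, 0.01))
worker.start()
signaled = blocking_wait(finished, timeout_s=1.0)
worker.join()
```

Both waits return at the same moment; the blocking version simply leaves the core idle until then, so activity-based power management can lower its state.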

    A complex subject

    Dynamic scheduling systems based on power and thermal feedback

    Hardware vs. software schedulers

    Scheduling CPU and GPU

    Many more topics

    Premature declaration of death of fixed-function in GPU hardware

    What new candidates can we move to FixedFunction?

    What interface principles should FixedFunction hardware adopt to be mainstream programmer friendly?

    Can we build predictive models to augment current reactive models?

    Should there be APIs for apps to influence power states and monitor feedback?

    2560x1600 Screen @60Hz ~ 250 MPixels/Sec

    A 25W GPU today

    5800 MPixels/Sec

    17400 MTexels/Sec

    696000 MFLOPS
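The headroom implied by these figures can be worked out directly. This sketch uses only the numbers on the slide; the variable names and the per-pixel framing are illustrative.

```python
def mpixels_per_sec(width, height, refresh_hz):
    """Pixels that must be produced per second to fill the screen, in millions."""
    return width * height * refresh_hz / 1e6

screen = mpixels_per_sec(2560, 1600, 60)  # 245.76, which the slide rounds to ~250

fill_per_pixel = 5800 / screen      # pixel writes available per displayed pixel
texels_per_pixel = 17400 / screen   # texel fetches available per displayed pixel
flops_per_pixel = 696000 / screen   # floating-point operations per displayed pixel
```

That works out to roughly 23 pixel writes, 70 texel fetches and 2800 floating-point operations available for every pixel shown on screen, each frame.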

    Reduction choices

    Reduce N, but that reduces capabilities

    Reduce V, but that limits performance and is not in your control (process limits)

    Raise Vt (lower leakage), but slower transistor, lower performance

    Static Power = N*V*e^(-Vt)

    Dominant battery life factor for common usage scenarios

    [Figure: Activity (0 to 100) versus Time in Milliseconds, Case 1 and Case 3]

    Case 1 Energy ~= 4*Pmax

    Case 3 Energy = 8*0.7*P?

