Multi-GPU Load Balancing for Simulation and Rendering
Yong Cao, Computer Science Department, Virginia Tech, USA · 2013. 3. 22.
Transcript
  • Multi-GPU Load Balancing for Simulation and Rendering

    Yong Cao Computer Science Department, Virginia Tech, USA

  • In-situ Visualization and Visual Analytics

    •  Instant visualization of, and interaction with, running computing tasks
    •  Applications:
       –  Computational Fluid Dynamics
       –  Seismic Propagation
       –  Molecular Dynamics
       –  Network Security Analysis
       –  …


  • Generalized Execution Loop

    [Diagram: Execution — the Simulation stage writes data to memory; the Rendering stage reads it back.]

  • Generalized Execution Loop

    [Diagram: The same loop generalized — Task 1 writes data to memory; Task 2 reads it.]

  • Parallel Execution – Task Split

    [Diagram: Tasks T1 and T2 alternate on Processors 1 and 2, writing to and reading from shared memory.]

    Problem: Task (Context) Switch

    •  Disadvantages of a context switch:
       –  Overhead of another kernel launch
       –  Flush of the cache lines
       –  Persistent threads are not possible

  • Parallel Execution: Pipelining

    [Diagram: Processors 1 and 2 run Tasks 1 and 2 as pipeline stages over successive time steps t, t+1, …, communicating through memory.]

    +  Simplified kernel for each GPU
    +  Better shared-memory and cache usage
    +  Persistent threads for distributed scheduling

  • Parallel Execution: Pipelining

    [Diagram: The same two-stage pipeline over time steps t, t+1, ….]

    Problem: bubbles in the pipeline — when the stage times differ, the faster processor idles.

  • Multi-GPU Pipeline Architecture

    [Diagram: A multi-GPU array of Sim GPUs and Vis GPUs connected by a FIFO data buffer — Sim GPUs write each time step into the buffer, Vis GPUs read completed steps out; the Sim/Write and Vis/Read pairs repeat over time steps 1, 2, …, n.]
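The Sim-GPU / FIFO-buffer / Vis-GPU pipeline above can be sketched with ordinary threads and a bounded queue standing in for the GPUs and the data buffer. This is a minimal illustration of the producer–consumer structure, not the authors' implementation; all names and the stub workloads are hypothetical.

```python
import queue
import threading

N_STEPS = 8
fifo = queue.Queue(maxsize=4)   # bounded FIFO data buffer between stages
rendered = []

def sim_gpu():
    """Stand-in for the simulation GPUs: produce one frame per time step."""
    for t in range(N_STEPS):
        frame = {"step": t, "state": t * t}  # stub simulation output
        fifo.put(frame)                      # "Write" end of the buffer (blocks when full)
    fifo.put(None)                           # sentinel: simulation finished

def vis_gpu():
    """Stand-in for the visualization GPUs: consume frames in FIFO order."""
    while True:
        frame = fifo.get()                   # "Read" end of the buffer (blocks when empty)
        if frame is None:
            break
        rendered.append(frame["step"])       # stub rendering work

producer = threading.Thread(target=sim_gpu)
consumer = threading.Thread(target=vis_gpu)
producer.start(); consumer.start()
producer.join(); consumer.join()
print(rendered)  # every simulated step is rendered, in order
```

Because the buffer is bounded, the blocking `put`/`get` calls are exactly where the pipeline stalls when one stage outruns the other — which is the imbalance the next slide's scheduler reacts to.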

  • Adaptive Load Balancing

    [Diagram: The same multi-GPU array and FIFO data buffer, with adaptive and distributed scheduling of GPU roles.]

    •  Full buffer: shift GPUs toward rendering (more Vis GPUs, fewer Sim GPUs)
    •  Empty buffer: shift GPUs toward simulation (more Sim GPUs, fewer Vis GPUs)
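The buffer-driven shifting rule above can be written as a small reassignment function. This is a hedged sketch of the idea only — the thresholds, the one-GPU-per-step move, and the function name are all assumptions, not the paper's algorithm.

```python
def rebalance(n_gpus, n_sim, fullness, low=0.25, high=0.75):
    """Hypothetical rebalancing rule: move at most one GPU per step.

    fullness: fraction of the FIFO buffer currently occupied.
    Full buffer  -> rendering is the bottleneck -> move a GPU to Vis.
    Empty buffer -> simulation is the bottleneck -> move a GPU to Sim.
    Always keep at least one GPU on each role.
    """
    if fullness > high and n_sim > 1:
        n_sim -= 1          # shift toward rendering
    elif fullness < low and n_sim < n_gpus - 1:
        n_sim += 1          # shift toward simulation
    return n_sim

print(rebalance(8, 4, 0.9))   # full buffer: 4 -> 3 Sim GPUs
print(rebalance(8, 4, 0.1))   # empty buffer: 4 -> 5 Sim GPUs
print(rebalance(8, 4, 0.5))   # balanced buffer: unchanged, 4
```

Moving a single GPU per decision keeps the schedule stable: the buffer fullness is an integrated signal, so small oscillations around the thresholds do not cause large swings in the assignment.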

  • Task Partition

    •  Intra-frame partition: one time step t is split across processors
    •  Inter-frame partition: successive time steps t, t+1, t+2, t+3 are assigned to different processors
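The two partition schemes can be sketched as index arithmetic — a toy illustration with hypothetical function names, where "work within a frame" is just a list of body indices:

```python
def intra_frame(bodies, n_gpus):
    """Intra-frame: split one time step's work (here, its bodies)
    into contiguous chunks, one chunk per GPU."""
    chunk = (len(bodies) + n_gpus - 1) // n_gpus   # ceiling division
    return [bodies[g * chunk:(g + 1) * chunk] for g in range(n_gpus)]

def inter_frame(time_steps, n_gpus):
    """Inter-frame: assign whole time steps to GPUs round-robin."""
    return [time_steps[g::n_gpus] for g in range(n_gpus)]

print(intra_frame(list(range(10)), 2))   # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(inter_frame([0, 1, 2, 3], 2))      # [[0, 2], [1, 3]]
```

The next slide's point follows from the data dependencies: simulation steps depend on each other, so a step must be split across GPUs (intra-frame), while finished frames are independent and can be rendered on different GPUs (inter-frame).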

  • Task Partition for Visual Simulation

    [Diagram: The multi-GPU array with the FIFO data buffer between the Sim GPUs (Write) and the Vis GPUs (Read).]

    •  Simulation: intra-frame partition
    •  Rendering: inter-frame partition

  • Problem: Scheduling Algorithm

    •  Performance Model:
       –  n: the number of assigned GPUs
       –  M: the number of assigned Simulation GPUs
    •  Schedule to optimize: choose M given n

  • Case Study Application

    •  N-body simulation with ray-traced rendering
    •  Performance model parameters:
       –  Simulation: number of iterations (i), number of simulated bodies (p)
       –  Rendering: number of samples for supersampling (s)
    •  Scheduling optimization:

       M_t = f(i_t, s_t, p_t)

  • Static Load-Balancing

    •  Assumption: the performance parameters do NOT change at run-time, so the time index drops out:

       M_t = f(i_t, s_t, p_t)  →  M = f(i, s, p)

    •  Data-driven modeling approach:
       –  Sample the three-dimensional (i, s, p) space on a regular grid
       –  Use trilinear interpolation to evaluate the model for new inputs
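Trilinear interpolation over the sampled (i, s, p) grid is standard; a self-contained sketch follows. The grid contents here are a made-up linear test function, not measured performance data, and the function names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b with weight t in [0, 1]."""
    return a + (b - a) * t

def trilinear(grid, x, y, z):
    """Evaluate a 3-D nested list sampled at integer coordinates
    at fractional coordinates (x, y, z) by three nested lerps."""
    x0, y0, z0 = int(x), int(y), int(z)
    x1 = min(x0 + 1, len(grid) - 1)
    y1 = min(y0 + 1, len(grid[0]) - 1)
    z1 = min(z0 + 1, len(grid[0][0]) - 1)
    tx, ty, tz = x - x0, y - y0, z - z0
    c00 = lerp(grid[x0][y0][z0], grid[x1][y0][z0], tx)   # interpolate along x
    c10 = lerp(grid[x0][y1][z0], grid[x1][y1][z0], tx)
    c01 = lerp(grid[x0][y0][z1], grid[x1][y0][z1], tx)
    c11 = lerp(grid[x0][y1][z1], grid[x1][y1][z1], tx)
    c0 = lerp(c00, c10, ty)                               # then along y
    c1 = lerp(c01, c11, ty)
    return lerp(c0, c1, tz)                               # then along z

# A function linear in each axis is reproduced exactly from a 2x2x2 grid:
grid = [[[i + 10 * s + 100 * p for p in range(2)] for s in range(2)]
        for i in range(2)]
print(trilinear(grid, 0.5, 0.5, 0.5))  # 55.5
```

In the static scheme this lookup replaces any run-time measurement: M = f(i, s, p) is read straight off the interpolated grid before execution starts.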

  • Static Load-Balancing: Results

    •  Performance Parameter Sampling
    •  Load Balancing

    [Charts: results for 16 samples, 80 iterations and for 4 samples, 80 iterations]

  • Dynamic Load Balancing

    •  Assumption: performance parameters change at run-time.
    •  Find an indirect load-balance indicator:
       –  Execution time of the previous time step
    •  Problem: performance can differ dramatically between two time steps.
       –  Remedy: also use the fullness of the buffer, F
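One way to combine the two indicators on this slide — previous-step execution times and the buffer fullness F — is sketched below. The proportional split and the thresholds are assumptions for illustration; the slide names the indicators but not this exact rule.

```python
def schedule_sim_gpus(n_gpus, prev_sim_t, prev_vis_t, fullness):
    """Hypothetical dynamic scheduler.

    Start from a split proportional to the previous step's stage times
    (so both pipeline stages take roughly equal time), then nudge it by
    the buffer fullness F to absorb sudden parameter changes that the
    previous-step times cannot predict."""
    total = prev_sim_t + prev_vis_t
    n_sim = round(n_gpus * prev_sim_t / total)   # time-proportional split
    if fullness > 0.75:
        n_sim -= 1   # buffer filling: rendering lags, give it a GPU
    elif fullness < 0.25:
        n_sim += 1   # buffer draining: simulation lags, give it a GPU
    return max(1, min(n_gpus - 1, n_sim))        # keep both stages alive

# Simulation took 3x as long as rendering last step, buffer half full:
print(schedule_sim_gpus(8, prev_sim_t=30.0, prev_vis_t=10.0, fullness=0.5))  # 6
```

The fullness term acts as feedback on top of the feed-forward time estimate, which is what keeps the schedule stable when parameters jump between steps.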

  • Dynamic Load Balancing: Results

    •  Stability of the dynamic scheduling algorithm

    [Charts: one run with no parameter change (parameters set only at the beginning); one run where the parameters change at the dotted line]

  • Comparison: Dynamic vs. Static Scheduling

    [Charts: performance speedup over static load-balancing, for 2000 and 4000 particles]

  • Conclusion

    +  Pipelining
    +  Dynamic load balancing

    –  Fine-granularity load balancing (SM level)
    –  Communication overhead
    –  Programmability: software framework, library

  • Questions?

    •  Contact Information:

       Yong Cao
       Computer Science Department, Virginia Tech
       Email: [email protected]
       Website: www.cs.vt.edu/~yongcao

