
Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Page 1: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Large Vector-Field Visualization, Theory and Practice:

Large Data and Parallel Visualization

Hank Childs
Lawrence Berkeley National Laboratory / University of California at Davis
October 25, 2010

Page 2: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Outline

• Motivation
• Parallelization strategies
• Master-slave parallelization
• Hybrid parallelism

Page 3: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Outline

• Motivation
• Parallelization strategies
• Master-slave parallelization
• Hybrid parallelism

Page 4: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Supercomputers are generating large data sets that often require parallelized postprocessing.

217 pin reactor cooling simulation.
Nek5000 simulation on ¼ of Argonne BG/P.
Image credit: Paul Fischer using VisIt

1 billion element unstructured mesh

Page 5: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Communication between “channels” is a key factor in effective cooling.

Page 6: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Particle advection can be used to study communication properties.
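As a minimal illustration of what advecting a particle means, the sketch below integrates one seed point through a velocity field. The slides do not name an integrator or a field, so both the classical fourth-order Runge-Kutta step and the analytic rotating field `velocity` are assumptions chosen only to keep the example self-contained.

```python
import math

def velocity(x, y):
    """Hypothetical steady 2D field (rigid rotation about the origin)."""
    return -y, x

def rk4_step(x, y, h):
    """Advance one particle by step size h using classical RK4 (an assumed integrator)."""
    k1x, k1y = velocity(x, y)
    k2x, k2y = velocity(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = velocity(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = velocity(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0)

def advect(seed, h, steps):
    """Return the list of positions visited by a particle released at seed."""
    x, y = seed
    path = [(x, y)]
    for _ in range(steps):
        x, y = rk4_step(x, y, h)
        path.append((x, y))
    return path

# One particle seeded on the unit circle, followed for 100 steps.
path = advect((1.0, 0.0), 0.01, 100)
```

In a real data set the call to `velocity` would be an interpolation into whichever block of the mesh contains the particle, which is why the location of the data matters for the strategies discussed later.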

Page 7: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

This sort of analysis requires many particles to be statistically significant.

Place thousands of particles in one channel

Observe which channels the particles pass through

Observe where particles come out (compare with experimental data)

Repeat for other channels

How can we parallelize this process?

Page 8: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Outline

• Motivation
• Parallelization strategies
• Master-slave parallelization
• Hybrid parallelism

Page 9: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

For embarrassingly parallel algorithms, the most common processing technique is pure parallelism.

[Figure: parallelized visualization data flow network. A parallel simulation code writes pieces of data P0 through P9 to disk; Processors 0, 1, and 2 each run a Read → Process → Render pipeline over pieces of that data.]

Particle advection is not embarrassingly parallel.

So how to parallelize? A: it depends

Page 10: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Particle advection: Four dimensions of complexity

Data set size

vs

Seed set distribution

vs

Seed set size

vs

Vector field complexity

Page 11: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Do we need parallel processing? When? How complex?

• Data set size?
  • Not enough!

• Large #’s of particles?

Page 12: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Parallelization for small data and a large number of particles.

[Figure: parallelized visualization data flow network. The simulation code writes a file; Processors 0, 1, and 2 each run a Read → Advect → Render pipeline.]

GPU-accelerated approaches follow a variant of this model.

The key is that the data is small enough that it can fit in memory.

This scheme is referred to as parallelizing-over-particles.
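A minimal sketch of parallelizing-over-particles follows, assuming every processor holds its own full copy of a small field. The round-robin split, the 1D constant field, and the forward Euler update are illustrative assumptions, and the processors are simulated with an ordinary loop rather than real parallel tasks.

```python
def partition_seeds(seeds, num_procs):
    """Round-robin split: processor p receives seeds p, p + num_procs, ..."""
    return [seeds[p::num_procs] for p in range(num_procs)]

def advect_all(seeds, field, steps, h):
    """Work done by one processor: advect only its own seeds.

    The processor needs no communication because it has the whole field.
    Forward Euler is used only to keep the sketch short.
    """
    results = []
    for x in seeds:
        for _ in range(steps):
            x = x + h * field(x)
        results.append(x)
    return results

def field(x):
    """Hypothetical constant 1D velocity field."""
    return 1.0

seeds = [float(i) for i in range(10)]
parts = partition_seeds(seeds, 3)
# Simulated sequentially here; real processors would run these calls concurrently.
results = [advect_all(p, field, steps=10, h=0.1) for p in parts]
```

Because no particle ever has to leave its processor, the only cost of this scheme is that each processor must be able to read (and hold) all the data its particles touch.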

Page 13: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Do we need advanced parallelization techniques? When?

• Data set size?
  • Not enough!

• Large #’s of particles?
  • Need to parallelize, but embarrassingly parallel OK

• Large #’s of particles + large data set sizes

Page 14: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Parallelization for large data with good “distribution”.

[Figure: parallelized visualization data flow network. A parallel simulation code writes pieces of data P0 through P9 to disk; Processors 0, 1, and 2 each run a Read → Advect → Render pipeline over pieces of that data.]

This scheme is referred to as parallelizing-over-data.
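A minimal sketch of parallelizing-over-data, under strong simplifying assumptions: a 1D domain split into four unit-length blocks, one block per processor, and a constant field flowing in the +x direction. All names and constants here are hypothetical. Each processor advects a particle only while it stays inside the block that processor owns, then hands it to the owner of the next block.

```python
from collections import deque

NUM_BLOCKS = 4   # block b covers [b, b + 1); processor b owns block b
H = 0.25         # integration step size
SPEED = 1.0      # hypothetical constant field, flowing in +x

def owner(x):
    """Processor that owns position x (valid for 0 <= x < NUM_BLOCKS)."""
    return int(x)

def run(seeds):
    """Simulate the scheme sequentially; return per-processor steps, handoffs, exits."""
    queues = [deque() for _ in range(NUM_BLOCKS)]
    for s in seeds:
        queues[owner(s)].append(s)
    steps_done = [0] * NUM_BLOCKS
    handoffs = 0
    finished = []
    while any(queues):
        for p in range(NUM_BLOCKS):
            if not queues[p]:
                continue  # processor p has nothing to do this round
            x = queues[p].popleft()
            # Advect while the particle stays inside the block owned by p.
            while 0 <= x < NUM_BLOCKS and owner(x) == p:
                x += H * SPEED
                steps_done[p] += 1
            if x >= NUM_BLOCKS:
                finished.append(x)           # particle left the domain
            else:
                queues[owner(x)].append(x)   # pass particle to the next owner
                handoffs += 1
    return steps_done, handoffs, finished

# Both seeds start in block 0, so processors 1 to 3 have no work until particles arrive.
steps_done, handoffs, finished = run([0.0, 0.0])
```

Each block is read by exactly one processor, but when the seeds are concentrated in a few blocks, as in this run, most processors wait for particles to be passed to them.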

Page 15: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Do we need advanced parallelization techniques? When?

• Data set size?
  • Not enough!

• Large #’s of particles?
  • Need to parallelize, but embarrassingly parallel OK

• Large #’s of particles + large data set sizes
  • Need to parallelize, simple schemes may be OK

• Large #’s of particles + large data set sizes + (bad distribution OR complex vector field)
  • Need smart algorithm for parallelization

Page 16: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Parallelization with big data & lots of seed points & bad distribution

Two extremes:
• Partition data over processors and pass particles amongst processors
  − Parallel inefficiency!
• Partition seed points over processors and process necessary data for advection
  − Redundant I/O!

[Figure: notional streamline example, with regions labeled P0 through P4.]

Parallelizing over   I/O    Efficiency
Data                 Good   Bad
Particles            Bad    Good

[Figure: spectrum running from parallelize over particles to parallelize over data, with hybrid algorithms in between.]

Page 17: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Outline

• Motivation
• Parallelization strategies
• Master-slave parallelization
• Hybrid parallelism

Page 18: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

The master-slave algorithm is an example of a hybrid technique.

• “Scalable Computation of Streamlines on Very Large Datasets”, Pugmire, Childs, Garth, Ahern, Weber. SC09
  • Many of the following slides are courtesy of Dave Pugmire.

• Algorithm adapts during runtime to avoid pitfalls of parallelize-over-data and parallelize-over-particles.
  • Nice property for production visualization tools.

• Implemented inside the VisIt visualization and analysis package.

Page 19: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Master-Slave Hybrid Algorithm

• Divide processors into groups of N

• Uniformly distribute seed points to each group

Master:
- Monitor workload
- Make decisions to optimize resource utilization

Slaves:
- Respond to commands from Master
- Report status when work complete

[Figure: processors P0 through P15 arranged in four groups, each with one master and three slaves.]
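The grouping step can be sketched as follows. Which rank in a group acts as master is not stated on the slide, so picking the first rank of each group is an assumption, as are the function names and the round-robin seed split.

```python
def make_groups(num_procs, group_size):
    """Split ranks 0..num_procs-1 into consecutive groups of group_size.

    Assumption: the first rank of each group is the master and the rest are slaves.
    """
    groups = []
    for start in range(0, num_procs, group_size):
        ranks = list(range(start, min(start + group_size, num_procs)))
        groups.append({"master": ranks[0], "slaves": ranks[1:]})
    return groups

def distribute_seeds(seeds, num_groups):
    """Give each group an (almost) equal share of the seed points."""
    return [seeds[g::num_groups] for g in range(num_groups)]

groups = make_groups(16, 4)                       # 16 processors, N = 4
shares = distribute_seeds(list(range(10)), len(groups))
```

With 16 processors and N = 4 this yields four groups of one master and three slaves, matching the layout in the figure.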

Page 20: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Master Process Pseudocode

Master() {
    while ( ! done ) {
        if ( NewStatusFromAnySlave() ) {
            commands = DetermineMostEfficientCommand()
            for cmd in commands
                SendCommandToSlaves( cmd )
        }
    }
}

What are the possible commands?

Page 21: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Commands that can be issued by master

[Diagram: Master and Slave]

Slave is given a streamline that is contained in a block that is already loaded.

1. Assign / Loaded Block
2. Assign / Unloaded Block
3. Handle OOB / Load
4. Handle OOB / Send

OOB = out of bounds

Page 22: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Commands that can be issued by master

[Diagram: Master and Slave]

Slave is given a streamline and loads the block.

1. Assign / Loaded Block
2. Assign / Unloaded Block
3. Handle OOB / Load
4. Handle OOB / Send

OOB = out of bounds

Page 23: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Commands that can be issued by master

[Diagram: Master sends “Load” to Slave]

Slave is instructed to load a block. The streamline in that block can then be computed.

1. Assign / Loaded Block
2. Assign / Unloaded Block
3. Handle OOB / Load
4. Handle OOB / Send

OOB = out of bounds

Page 24: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Commands that can be issued by master

[Diagram: Master sends “Send to J” to Slave; Slave J]

Slave is instructed to send a streamline to another slave that has loaded the block.

1. Assign / Loaded Block
2. Assign / Unloaded Block
3. Handle OOB / Load
4. Handle OOB / Send

OOB = out of bounds
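To make the four commands concrete, here is one possible way a master could pick among them. The rule used (prefer a slave that already has the block loaded, otherwise load it) is an illustrative assumption and not the decision logic of the SC09 paper; the names `Command` and `choose_command` and all parameters are hypothetical.

```python
from enum import Enum

class Command(Enum):
    ASSIGN_LOADED = 1     # 1. Assign / Loaded Block
    ASSIGN_UNLOADED = 2   # 2. Assign / Unloaded Block
    OOB_LOAD = 3          # 3. Handle OOB / Load
    OOB_SEND = 4          # 4. Handle OOB / Send

def choose_command(block, holder, loaded, slaves):
    """Pick a command for one streamline (assumed heuristic, not the paper's).

    block  : id of the block the streamline needs next
    holder : slave currently holding the streamline, or None if unassigned
    loaded : dict mapping slave id -> set of block ids it has in memory
    slaves : non-empty list of candidate slave ids
    Precondition: if holder is not None, holder has not loaded `block`
    (that is what makes the streamline out of bounds).
    Returns (command, target slave).
    """
    have_block = [s for s in slaves if block in loaded[s]]
    if holder is None:
        if have_block:
            return Command.ASSIGN_LOADED, have_block[0]
        return Command.ASSIGN_UNLOADED, slaves[0]
    others = [s for s in have_block if s != holder]
    if others:
        return Command.OOB_SEND, others[0]   # ship the streamline to slave J
    return Command.OOB_LOAD, holder          # holder reads the missing block

loaded = {1: {0}, 2: {5}}
example = choose_command(5, 1, loaded, [1, 2])
```

Here `example` is the "Handle OOB / Send" case: slave 1 holds a streamline that entered block 5, and slave 2 already has that block in memory.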

Page 25: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Master Process Pseudocode

Master() {
    while ( ! done ) {
        if ( NewStatusFromAnySlave() ) {
            commands = DetermineMostEfficientCommand()   * See SC 09 paper for details
            for cmd in commands
                SendCommandToSlaves( cmd )
        }
    }
}
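The master loop can be exercised offline with a small runnable sketch. Everything outside the loop structure is an assumption: status messages are plain dictionaries taken from a pre-filled queue instead of arriving over the network, "done" is taken to mean that all streamlines have been reported terminated, and `determine_commands` and `send` are caller-supplied stand-ins for DetermineMostEfficientCommand and SendCommandToSlaves.

```python
from collections import deque

def run_master(statuses, total_streamlines, determine_commands, send):
    """Offline imitation of the master loop from the pseudocode.

    statuses           : iterable of status dicts, standing in for slave messages
    total_streamlines  : number of streamlines that must terminate (assumed 'done' test)
    determine_commands : callable(status) -> list of commands (hypothetical)
    send               : callable(cmd), stand-in for sending a command to a slave
    """
    pending = deque(statuses)
    terminated = 0
    while terminated < total_streamlines:          # while ( ! done )
        if pending:                                 # NewStatusFromAnySlave()
            status = pending.popleft()
            terminated += status.get("terminated", 0)
            for cmd in determine_commands(status):
                send(cmd)
        else:
            break  # no more messages in this sketch; a real master would keep waiting
    return terminated

sent = []
statuses = [{"slave": 1, "terminated": 2}, {"slave": 2, "terminated": 1}]
done_count = run_master(statuses, 3, lambda s: [("continue", s["slave"])], sent.append)
```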

Page 26: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Master-slave in action

[Figure: notional streamline example with processors P0 through P4; steps annotated “0: Read”, “1: Pass”, and “1: Read”.]

Iteration   Action
0           P0 reads B0, P3 reads B1
1           P1 passes points to P0, P4 passes points to P3, P2 reads B0

- When to pass and when to read?
- How to coordinate communication? Status? Efficiently?

Page 27: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Algorithm Test Cases

- Core collapse supernova simulation
- Magnetic confinement fusion simulation
- Hydraulic flow simulation

Page 28: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Workload distribution in supernova simulation

Parallelization by: Particles, Data, Hybrid

Colored by processor doing integration

Page 29: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Workload distribution in parallelize-over-particles

Too much I/O

Page 30: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Workload distribution in parallelize-over-data

Starvation

Page 31: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Workload distribution in hybrid algorithm

Just right

Page 32: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Comparison of workload distribution

Particles, Data, Hybrid

Page 33: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Astrophysics Test Case: Total time to compute 20,000 Streamlines

[Figure: two plots of seconds versus number of procs, one for uniform seeding and one for non-uniform seeding, each comparing Data, Particles, and Hybrid.]

Page 34: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Astrophysics Test Case: Number of blocks loaded

[Figure: two plots of blocks loaded versus number of procs, one for uniform seeding and one for non-uniform seeding, each comparing Data, Particles, and Hybrid.]

Page 35: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Outline

• Motivation
• Parallelization strategies
• Master-slave parallelization
• Hybrid parallelism

Page 36: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Are today’s algorithms going to fit well on tomorrow’s machines?

Traditional approach for parallel visualization (1 core per MPI task) may not work well on future supercomputers, which will have 100-1000 cores per node.
• Exascale machines will likely have O(1M) nodes … and we anticipate in situ particle advection.

Hybrid parallelism blends distributed- and shared-memory parallelism concepts.

Page 37: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

The word “hybrid” is being used in two contexts…

• The master-slave algorithm is a hybrid algorithm, sharing concepts from both parallelization-over-data and parallelization-over-seeds.

• Hybrid parallelism involves using a mix of shared and distributed memory techniques, e.g. MPI + pthreads or MPI+CUDA.

• One could think about implementing a hybrid particle advection algorithm in a hybrid parallel setting.

Page 38: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

What can we learn from a hybrid parallel study?

• How do we implement parallel particle advection algorithms in a hybrid parallel setting?

• How do they perform?
• Which algorithms perform better? How much better?
• Why?

Streamline Integration Using MPI-Hybrid Parallelism on a Large Multi-Core Architecture

by David Camp, Christoph Garth, Hank Childs, Dave Pugmire and Ken Joy

Accepted to TVCG

Page 39: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Streamline integration using MPI-hybrid parallelism on a large multi-core architecture

• Implement parallelize-over-data and parallelize-over-particles in a hybrid parallel setting
  • Did not study the master-slave algorithm

• Run series of tests on NERSC Franklin machine (Cray)
  • Compare 128 MPI tasks (non-hybrid) vs 32 MPI tasks / 4 cores per task (hybrid)
  • 12 test cases:
    large vs small # of seeds
    uniform vs non-uniform seed locations
    3 data sets

Page 40: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Hybrid parallelism for parallelize-over-data

• Expected benefits:
  • Less communication and communicators
  • Should be able to avoid starvation by sharing data within a group.

Page 41: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Measuring the benefits of hybrid parallelism for parallelize-over-data

Page 42: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Gantt chart for parallelize-over-data

Page 43: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Hybrid parallelism for parallelize-over-particles

• Expected benefits:
  • Only need to read blocks once per node, instead of once per core.
  • Larger cache allows for reduced reads
  • “Long” paths automatically shared among cores on node
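The first expected benefit can be illustrated with a small shared-memory sketch: several worker threads on one node request blocks through a single cache, so each block is read once per node rather than once per core. This is only a Python illustration of the idea, not the study's implementation (the slides mention MPI with pthreads or CUDA); `BlockCache` and `fake_read` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BlockCache:
    """Block cache shared by all worker threads on a node."""

    def __init__(self, read_block):
        self._read_block = read_block   # function that actually performs the I/O
        self._blocks = {}
        self._lock = threading.Lock()
        self.reads = 0                  # number of real reads performed

    def get(self, block_id):
        # The lock is held during the read, which serializes I/O but
        # guarantees that no block is ever read twice.
        with self._lock:
            if block_id not in self._blocks:
                self._blocks[block_id] = self._read_block(block_id)
                self.reads += 1
            return self._blocks[block_id]

def fake_read(block_id):
    """Hypothetical stand-in for reading a block from disk."""
    return [block_id] * 4

cache = BlockCache(fake_read)
requests = [0, 1, 0, 2, 1, 0, 2, 2]     # blocks needed by eight particles
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cache.get, requests))
```

Eight requests from four workers trigger only three reads, one per distinct block; with one cache per core, the same blocks could be read by every core that needs them.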

Page 44: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Measuring the benefits of hybrid parallelism for parallelize-over-particles

Page 45: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Gantt chart for parallelize-over-particles

Page 46: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Summary of Hybrid Parallelism Study

• Hybrid parallelism appears to be extremely beneficial to particle advection.

• We believe the parallelize-over-data results are highly relevant to the in situ use case.

• Although we didn’t implement the master-slave algorithm, we believe the benefits shown at the spectrum extremes provide good evidence that hybrid algorithms will also benefit.

• Implemented on VisIt branch, goal to integrate into VisIt proper in the coming months.

Page 47: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Summary for Large Data and Parallelization

• The type of parallelization required will vary based on data set size, number of seeds, seed locations, and vector field complexity

• Parallelization may occur via parallelization-over-data, parallelization-over-particles, or somewhere in between (master-slave). Hybrid algorithms have the opportunity to de-emphasize the pitfalls of the traditional techniques.

• Hybrid parallelism appears to be very beneficial.

Page 48: Large Vector-Field Visualization, Theory and Practice: Large Data and Parallel Visualization

Thank you for your attention!
• Acknowledgments: DoE SciDAC program, VACET
• Participants: Hank Childs ([email protected]), Dave Pugmire (ORNL), Dave Camp (LBNL/UCD), Christoph Garth (UCD), Sean Ahern (ORNL), Gunther Weber (LBNL), and Ken Joy (UCD)
• Questions?

