SC05November, [email protected]
Supercomputing • Communications • Data
NCAR Scientific Computing Division
Desktop techniques for the exploration of terascale size, time-varying data sets
John Clyne & Alan Norton
Scientific Computing Division
National Center for Atmospheric Research
Boulder, CO USA
National Center for Atmospheric Research
Space Weather • Turbulence • Atmospheric Chemistry • Climate • Weather • The Sun
More than just the atmosphere… from the earth’s oceans to the solar interior
Goals
1. Improve scientists’ ability to investigate and understand complex phenomena found in high-resolution fluid flow simulations
– Accelerate analysis process and improve scientific productivity
– Enable exploration of data sets heretofore impractical due to unwieldy size
– Gain insight into physical processes governing fluid dynamics widely found in the natural world
2. Demonstrate visualization’s ability to aid the day-to-day scientific discovery process
Problem motivation: Analysis of high-resolution numerical turbulence simulations
• Simulations are huge!!
– May require months of supercomputer time
– Multi-variate (typically 5 to 8 variables)
– Time-varying data
– A single experiment may yield terabytes of numerical data
• Analysis requirements are formidable
– Numerical outputs simulate phenomena not easily observed!!!
– Interesting domain regions (ROIs) may not be known a priori
• Additionally…
– Historical focus of computing centers on batch processing
– Dichotomy of batch and interactive processing needs
– Currently available analysis tools inadequate for large data needs
  • Single threaded, 32-bit, in-core algorithms
  • Lack advanced visualization capabilities
– Currently available visualization tools ill-suited for analysis
[Numerical] models that can currently be run on typical supercomputing platforms produce data in amounts that make storage expensive, movement cumbersome, visualization difficult, and detailed analysis impossible. The result is a significantly reduced scientific return from the nation's largest computational efforts.
And furthermore…
A sampling of various technology performance curves
• Not all technologies advance at same rate!!!
[Figure: Performance gains from 1980 to present. Log-scale improvement factor (1 to 100,000) for five series: disk drive internal data rate, disk drive interface data rate, Ethernet network bandwidth, Intel microprocessor clock speed, and drive capacity.]
Example: Compressible plume dynamics
• 504x504x2048
• 5 variables (u,v,w,rho,temp)
• ~500 time steps saved
• 9 TB of storage
• Six months compute time required on 112 IBM SP RS/6000 processors
• Three months for post-processing
• Data may be analyzed for several years
M. Rast, 2004. Image courtesy of Joseph Mendoza, NCAR/SCD
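As a rough check on the figures above, the raw output volume can be estimated from the grid size, the variable count, and the number of saved time steps. This is a minimal sketch: the bytes-per-value figure is an assumption (the slides do not state the storage precision), and `estimate_storage_bytes` is a hypothetical helper, not part of any tool named here.

```python
def estimate_storage_bytes(grid, n_vars, n_steps, bytes_per_value):
    """Raw size in bytes of a time-varying, multi-variate gridded data set."""
    nx, ny, nz = grid
    return nx * ny * nz * n_vars * n_steps * bytes_per_value


grid = (504, 504, 2048)  # plume simulation grid from the slide
for bytes_per_value in (4, 8):  # assumed: single or double precision
    total = estimate_storage_bytes(grid, n_vars=5, n_steps=500,
                                   bytes_per_value=bytes_per_value)
    print(f"{bytes_per_value} bytes/value: {total / 1e12:.1f} TB")
```

The two assumed precisions bracket the 9 TB quoted on the slide; the exact figure depends on the actual precision and on how many time steps were saved.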
Visualization and Analysis Platform for oceanic, atmospheric, and solar Research (VAPoR)
Key components
• Domain specific: numerically simulated turbulence in the natural sciences
• Data processing language: data post-processing and quantitative analysis
• Advanced visualization: identify spatial/temporal ROIs
• Multiresolution: enable speed/quality trade-offs
This work is funded in part through a U.S. National Science Foundation, Information Technology Research program grant
Combination of visualization with a multiresolution data representation that provides sufficient data reduction to enable interactive work on time-varying data
Multiresolution Data Representation
• Geometry reduction (Schroeder et al., 1992; Lindstrom & Silva, 2001; Shaffer & Garland, 2001)
• Wavelet-based progressive data access
– Mathematical transforms similar to Fourier transformations
– Invertible and lossless
– Numerically efficient forward and inverse transform
– No additional storage costs
– Permit hierarchical representations of functions
– See Clyne, VIIP2003
[Figure: Visualization Pipeline. Source → data → Transform (e.g. iso, cut plane) → geometry → Render → pixels; Analyze & Manipulate yields text and 2D graphics. Reduce steps are applied to the data and to the geometry.]
• Data reduction (Cignoni et al., 1994; Wilhelms & Van Gelder, 1994; Pascucci & Frank, 2001; Clyne, 2003)
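The properties listed for wavelet-based progressive access (invertible, no extra storage, hierarchical) can be illustrated with a one-dimensional Haar transform. This is a sketch under stated assumptions: Haar is chosen only because it is the simplest wavelet (the slides do not say which wavelet is used), the input length is assumed to be a power of two, and `haar_forward` and `haar_inverse` are hypothetical names.

```python
import numpy as np


def haar_forward(signal):
    """Multi-level Haar transform; output has the same length as the input."""
    coeffs = np.asarray(signal, dtype=np.float64).copy()
    n = coeffs.size  # assumed to be a power of two
    while n > 1:
        half = n // 2
        evens = coeffs[0:n:2].copy()
        odds = coeffs[1:n:2].copy()
        coeffs[:half] = (evens + odds) / 2.0   # coarser approximation
        coeffs[half:n] = (evens - odds) / 2.0  # detail needed to refine it
        n = half
    return coeffs


def haar_inverse(coeffs, levels_to_skip=0):
    """Rebuild the signal; skipping the finest levels yields a coarser version."""
    coeffs = np.asarray(coeffs, dtype=np.float64)
    target = coeffs.size >> levels_to_skip
    approx = coeffs[:1].copy()
    n = 1
    while n < target:
        detail = coeffs[n:2 * n]
        finer = np.empty(2 * n)
        finer[0::2] = approx + detail
        finer[1::2] = approx - detail
        approx = finer
        n *= 2
    return approx


signal = [4.0, 2.0, 6.0, 8.0]
coeffs = haar_forward(signal)
print(haar_inverse(coeffs))      # full resolution
print(haar_inverse(coeffs, 1))   # half resolution, reads only half the coefficients
```

Reading a coarser level touches only a prefix of the coefficient array, which is what makes the access progressive. In floating point the round trip is exact only up to rounding.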
Putting it all together
• Visual data browsing permits rapid identification of features of interest, reducing data domain
• Multiresolution data representation affords a second level of data reduction by permitting speed/quality trade-offs, enabling rapid hypothesis testing
• Quantitative operators and data processing enable data analysis
• Result: Integrated environment for large-data exploration and discovery
Goal: Avoid unnecessary and expensive full-domain calculations
– Execute on human time scales!!!
[Figure: cycle of visual data browsing, data manipulation, and quantitative analysis, with Refine and Coarsen steps between resolution levels.]
Compressible Convection
[Figure: simulation shown at 128³ and 512³ resolution. M. Rast, 2002]
Compressible plume
Resolution:     504x504x2048    252x252x1024    126x126x512    63x63x256
Problem size:   Full            1/8             1/64           1/512
[Figure: Compressible plume data set shown at native and progressively coarser resolutions]
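The resolution and problem-size rows follow from halving each axis per level, which divides the number of grid points by eight each time. A short sketch of that arithmetic (`coarsened_levels` is a hypothetical helper):

```python
def coarsened_levels(native_dims, n_levels):
    """Return (dims, denominator) per level; problem size is 1/denominator."""
    levels = []
    for level in range(n_levels):
        dims = tuple(d // 2**level for d in native_dims)  # halve each axis
        levels.append((dims, 8**level))                   # 2*2*2 fewer points
    return levels


for dims, denom in coarsened_levels((504, 504, 2048), 4):
    size = "Full" if denom == 1 else f"1/{denom}"
    print("x".join(str(d) for d in dims), size)
```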
Rendering timings
[Figure: two log-scale plots of rendering time in seconds versus resolution (Full, 1/2, 1/4, 1/8), one for the 512³ Compressible Convection data set and one for the 504²x2048 Compressible Plume data set. Series: Mdb, Vtk. Platforms: SGI Octane2, 1x600MHz R14k; SGI Origin, 10x600MHz R14k. Annotation: Interactive!!]
Reduced resolution affords responsive interaction while preserving all but finest features
Derived quantities
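$p$: pressure
$\rho$: density
$T$: temperature
$\chi$: ionization potential
$N_A$: Avogadro’s number
$m_e$: electron mass
$k$: Boltzmann’s constant
$h$: Planck’s constant
(1) Pressure: $p = \rho T$
(2) Ionization: $\frac{y^2}{1-y} = \frac{1}{\rho N_A}\left(\frac{2\pi m_e k T}{h^2}\right)^{3/2} e^{-\chi/kT}$
(3) Enstrophy: $\omega^2 = |\nabla \times \mathbf{u}|^2$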
Derived quantities produced from the simulation’s field variables as a post-process
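A minimal NumPy sketch of such a post-process follows. It assumes the pressure is the nondimensional product of density and temperature, that the ionization relation has the Saha form y²/(1−y) = A with A computed elsewhere from ρ and T, and that enstrophy is the squared magnitude of the curl of the velocity on a uniform grid with axes ordered (x, y, z). All function names are hypothetical.

```python
import numpy as np


def pressure(rho, temp):
    """Assumed nondimensional ideal-gas form: p = rho * T."""
    return rho * temp


def ionization_fraction(saha_rhs):
    """Solve y^2 / (1 - y) = A for the root with 0 <= y < 1."""
    a = np.asarray(saha_rhs, dtype=np.float64)
    return 0.5 * (-a + np.sqrt(a * a + 4.0 * a))


def enstrophy(u, v, w, spacing=1.0):
    """Squared vorticity magnitude |curl(u, v, w)|^2 via finite differences."""
    wx = np.gradient(w, spacing, axis=1) - np.gradient(v, spacing, axis=2)
    wy = np.gradient(u, spacing, axis=2) - np.gradient(w, spacing, axis=0)
    wz = np.gradient(v, spacing, axis=0) - np.gradient(u, spacing, axis=1)
    return wx**2 + wy**2 + wz**2


# Solid-body rotation about z: vorticity is (0, 0, 2), so enstrophy is 4.
x, y, z = np.meshgrid(np.arange(4.0), np.arange(5.0), np.arange(6.0),
                      indexing="ij")
print(enstrophy(-y, x, np.zeros_like(x)).max())
```

The pressure is a point-wise algebraic operation, while the enstrophy needs spatial derivatives, which is why the two respond very differently to coarsening.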
Calculation timings for derived quantities
[Figure: log-scale plot of calculation time in seconds (0.01 to 10,000) versus resolution (Full, 1/2, 1/4, 1/8) for three series: pressure (eq 1), ionization (eq 2), enstrophy (eq 3).]
Note: 1/2 resolution is 1/8 problem size, etc.
Deriving new quantities on interactive time scales is only possible with data reduction
SGI Origin, 10x600MHz R14k
Error in approximations
• Error is highly dependent on operation performed
• Algebraic operations tested introduced low error even after substantial coarsening
• Error grows rapidly for gradient calculation
• Point-wise error gives no indication of global (average) error
Point-wise, normalized, maximum, absolute error:
$\max_i \frac{|s_i - \hat{s}_i|}{|s_i|}$

Resolution   P (Eq 1)   Y (Eq 2)   $\omega^2$ (Eq 3)
Full         0          0          0
1/2          1.09       0.03       85.57
1/4          2.53       0.14       97.3
1/8          3.79       0.65       99.8
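The gap between point-wise and average error can be shown with a small sketch. It assumes the error at each grid point is normalized by the magnitude of the full-resolution value (the normalization is an assumption, as are the function names) and that the reference field has no zeros.

```python
import numpy as np


def pointwise_relative_error(exact, approx):
    """|s_i - s_hat_i| / |s_i| at every point; exact must be nonzero."""
    exact = np.asarray(exact, dtype=np.float64)
    approx = np.asarray(approx, dtype=np.float64)
    return np.abs(exact - approx) / np.abs(exact)


def max_error(exact, approx):
    """Point-wise, normalized, maximum, absolute error."""
    return pointwise_relative_error(exact, approx).max()


def mean_error(exact, approx):
    """Global (average) normalized error, for comparison."""
    return pointwise_relative_error(exact, approx).mean()


# One badly approximated point out of 1000 dominates the maximum
# while barely moving the average.
exact = np.ones(1000)
approx = exact.copy()
approx[0] = 0.5
print(max_error(exact, approx), mean_error(exact, approx))
```

A single outlier sets the maximum, so a large point-wise error does not by itself mean the coarsened field is poor on average.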
Integrated visualization and analysis on interactively selected subdomains:
[Figure: Vertical vorticity of the flow. Panels: full domain seen from above; subdomain from side.]
[Figure: Mach number of the vertical velocity. Panels: full domain seen from above; subdomain from side.]
Efficient analysis requires rapid calculation and visualization of unanticipated derived quantities. This can be facilitated by a combination of subdomain selection and resolution reduction.
A test of multiresolution analysis: Force balance in supersonic downflows
Sites of supersonic downflow are also those of very high vertical vorticity. The cores of the vortex tubes are evacuated, with centripetal acceleration balancing that due to the inward-directed pressure gradient. Buoyancy forces are maximum on the tube periphery due to mass flux convergence.
The same interpretation results from analysis at half resolution.
[Figure: force-balance terms in the vortex tubes (pressure gradient, centripetal acceleration, buoyancy) compared at full and half resolution.]
Subdomain selection and reduced resolution together yield data reduction by a factor of 128
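How the two reductions multiply can be sketched with array slicing. The split used here is hypothetical, chosen only to reproduce a factor of 128: a subdomain covering 1/16 of the volume combined with half resolution (1/8 of the points). Plain strided subsampling stands in for the wavelet-based coarsening, and `reduce_data` is an illustrative name.

```python
import numpy as np


def reduce_data(field, region, stride):
    """Crop to a subdomain, then keep every `stride`-th sample per axis."""
    (x0, x1), (y0, y1), (z0, z1) = region
    subdomain = field[x0:x1, y0:y1, z0:z1]
    return subdomain[::stride, ::stride, ::stride]


field = np.zeros((64, 64, 64), dtype=np.float32)  # placeholder data
region = ((0, 32), (0, 32), (0, 16))              # assumed ROI: 1/16 of volume
reduced = reduce_data(field, region, stride=2)    # half resolution: 1/8
print(field.size // reduced.size)                 # 128
```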
Summary
• Presented a prototype integrated analysis environment aimed at aiding the investigation of high-resolution numerical fluid flow simulations
• Orders of magnitude data reduction achieved through:
1. Visualization: Reduce full domain to ROI
2. Multiresolution: Enable speed/quality trade-offs
• Coarsened data frequently suitable for rapid hypothesis testing that may later be verified at full resolution
Future work
• Quantify and predict error in results obtained with various mathematical operations applied to coarsened data
• Investigate lossy and lossless data compression
• Add support for less regular meshes
• Explore other scientific domains – Climate, weather, atmospheric chemistry,…
Future???
[Figure: original data compared with 20:1 lossy compression]
Acknowledgements
• Steering Committee
– Nic Brummell - CU, JILA
– Aimé Fournier - NCAR, IMAGe
– Helene Politano - Observatoire de la Cote d'Azur
– Pablo Mininni - NCAR, IMAGe
– Yannick Ponty - Observatoire de la Cote d'Azur
– Annick Pouquet - NCAR, ESSL
– Mark Rast - NCAR, HAO
– Duane Rosenberg - NCAR, IMAGe
– Matthias Rempel - NCAR, HAO
– Yuhong Fan - NCAR, HAO
• Developers
– Alan Norton - NCAR, SCD
– John Clyne - NCAR, SCD
• Research Collaborators
– Kwan-Liu Ma - U.C. Davis
– Hiroshi Akiba - U.C. Davis
– Han-Wei Shen - Ohio State
– Liya Li - Ohio State
• Systems Support
– Joey Mendoza - NCAR, SCD
Questions???
http://www.scd.ucar.edu/hss/dasg/software/vapor