March 29, 2001
Hiroshi Takahara & Toshifumi Takei
NEC Corporation
Visualization for High-End Weather and Climate Modeling on the NEC SX Series
The 3rd International Workshop on Next Generation Climate Models for Advanced High Performance Computing Facilities
E-mail: [email protected]
Advancement of HPC Technology
[Figure: performance in FLOPS (1 Giga, 1 Tera, 1 Peta) for successive architectures: PC and server microprocessors; parallel processing clusters; highly parallel vector processing (supercomputer); high-performance 1-chip vector / massively parallel processing; multi-layered highly parallel processing / distributed global computing (future system). The Earth Simulator is marked. Applications shown: biochemistry, weather/climate, ocean, structural & thermal analysis, CFD, crash.]
Required performance and memory capacity
[Figure (DARPA): memory size (8 M to 80 GBytes) versus performance (100M to 1T FLOPS) required by applications: airfoil, 48-hour weather, 2-D plasma modeling, oil reservoir modeling, estimate of Higgs boson mass, 3-D plasma modeling, 72-hour weather, vehicle designing, structural biology, pharmaceutical designing, chemical dynamics, climate modeling, turbulence simulation, human genome, oceanic circulation, viscous fluid dynamics, semiconductor modeling, quantum chromodynamics.]
Vector & scalar processing
[Figure: applications (weather, crash, genome, FEM, CFD, chemistry) placed by amount of computation and data size, divided into a vector-tailored region and a scalar-tailored region.]
Vector --- tailored to large-scale simulations and huge data (Meteo/climate, CFD, crash…)
Scalar --- suitable for small-to-medium sized problems
Limited performance scalability due to inter-PE communications
Merits and Demerits of Each Architecture
[Table: vector and scalar architectures, each in shared-memory and distributed-memory form.
Merits listed: excellent effective performance; ease of use (auto parallelization); high scalability; excellent cost/peak performance; wide application range.
Demerits listed: difficult parallelization (requires high skills); poor effective performance; high cost; limited scalability.]
Some views from the weather & climate community for vector computers
Shared-memory, vector computers manufactured in Japan, have a combination of usability and performance...
The purchase of Japanese vector computers would have an immediate impact on climate and weather science in the U.S.
The use of distributed memory, commodity-based processor parallel computers increases the needed software investment …
USGCRP Report (Dec.2000)
Pros and Cons about the Validity of the TOP500
Pros: A ranking of high-performance computers worldwide that carries much sway
Cons: Does NOT represent the complete range of applications. Has too much impact on policy making
Changing acceptance among the HPC community because of the increased dominance of business computing vendors (particularly in the lower rankings)
[Figure: bar chart of the number of TOP500 systems per vendor (axis 0 to 250): IBM 215, Sun 92, SGI 67, Cray 47, NEC (23), Fujitsu (17), Hitachi (16), Compaq (11), HP (5), Self (5), HPTi (1), Intel (1).]
Finance, DB, Web etc.: 72 sites
#15/34 Charles Schwab
#53 European Patent Office
#93 Sobeys
#102 Deutsche Telekom
#112 Bank Administration Institute (BAI)
#120 State Farm
#177 NTT
#213 Chase Manhattan
Finance, DB, Web etc.: 54 sites
#136 New York City - Human Resources
#139 Bank Westboro
#140 E-commerce Santa Clara
#169 Airline London
#170 Bank Milano
#171 Bank Munich
#173 Chase GlobalNet
#176 Rakuten **
TOP500 top 10, Nov. 2000
Rank  System                           Rmax (GFLOPS)  Site
1     IBM ASCI White                   4938           Lawrence Livermore National Laboratory
2     Intel ASCI Red                   2379           Sandia National Labs
3     IBM ASCI Blue-Pacific            2144           Lawrence Livermore National Laboratory
4     SGI ASCI Blue Mountain           1608           Los Alamos National Laboratory
5     IBM SP Power3 375 MHz            1417           Naval Oceanographic Office (NAVOCEANO)
6     IBM SP Power3 375 MHz            1179           National Centers for Environmental Prediction
7     Hitachi SR8000-F1/112            1035           Leibniz Rechenzentrum
8     IBM SP Power3 375 MHz 8 way      929            UCSD/San Diego Supercomputer Center
9     Hitachi SR8000-F1/100            917            High Energy Accelerator Research Organization /KEK
10    Cray Inc. T3E1200                892            Government
** Rank 176: Rakuten is the largest cyber mall in Japan!!!
SX-5 Series / SX-5S (HPC Server) Products
[Figure: peak performance (FLOPS, axis from 1G to 5T) by CPU model, for single-node and multi-node (IXS or HIPPI-SW) configurations.
SX-5 Series: A Model 64G - 128 GFLOPS; B Model 32G - 64 GFLOPS; C Model 16G - 32 GFLOPS; D Model 8G - 16 GFLOPS.
HPC Server SX-5S (models labeled Be and Ce): 4G - 8 GFLOPS and 8G - 16 GFLOPS.
Multi-node configurations extend to the 4T - 5T FLOPS range.]
Performance of Mission-Critical NWP Codes on the SX Series
Sustained performance ...what you pay for:
[Figure: sustained performance as % of machine peak (axis 0 to 60) versus number of CPUs (1, 2, 4, 8, 12, 16, 20, 24, 32, 47) for the codes IFS, BOM GASP, BOM LAPS, CHMI_ALADIN and DMI HIRLAM. The plotted values lie between 35.8% and 55% of peak.]
SX Series in Meteorology / Environmental Science
SX Series at Worldwide Major Meteorological Institutions
Europe
・ Danish Meteorological Institute (DMI)
・ Czech Hydrometeorological Institute (CHMI)
・ Institute for Atmospheric Physics in Germany (IAP)
・ Deutsches Klimarechenzentrum (DKRZ)
・ Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University (ICMW)
・ Istituto Nazionale di Geofisica e Vulcanologia (INGV)
・ Swiss Center for Scientific Computing (CSCS)
Japan
・ National Institute for Environmental Studies (NIES)
・ Japan Marine Science and Technology Center (JAMSTEC)
・ Frontier Research System for Global Change
Asia
・ Korea Meteorological Administration (KMA)
・ Meteorological Service Singapore (MSS)
Australia
・ Bureau of Meteorology (BOM)/CSIRO
North America
・ Atmospheric Environment Service (AES)
・ IRI Lamont Doherty
South America
・ Instituto Nacional de Pesquisas Espaciais (INPE)
Real-Time Visual Simulation Library
RVSLIB
http://www.sw.nec.co.jp/APSOFT/SX/rvslib_e/
Image-based visualization tailored to the large volumes of data resulting from numerical simulations / observations
Challenges in Visualizing a Large Volume of Data
[Diagram: the computing server runs the simulation program and sends its raw numerical output (columns of double-precision values such as 1.199937909288815 -0.1175956774159311 3.017484603229200D-04 0.3024392297917339) over the Internet to a post-processor on the user's terminal.]
      program cfdc
      implicit real*8 (a-h,o-z)
      parameter ( maxi=81,maxj=41,maxk=5 )
      parameter ( maxgrd=maxi*maxj*maxk ,maxobj=101
     &           ,maxiwk=512*512*15 ,maxrwk=maxgrd*61 )
      integer irvslibstate
c
c-- permanent array --
      dimension x(maxgrd),y(maxgrd),z(maxgrd)
     &         ,scal(maxgrd*5)
     &         ,iobj(maxobj*6),rwork(maxrwk),iwork(maxiwk)
     &         ,iobj2(maxobj*6)
      :
      :
Disk space problem
Storing all the computational results for each parameter set needs more than several GBytes of disk space.
Ex. 100*100*100 grid points * 10000 time steps --- 200 GBytes (5 variables at each grid point)
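The 200 GBytes figure in the example above can be reproduced with a short calculation. This is a minimal sketch, assuming 4-byte (single-precision) values and decimal GBytes; the slide does not state the word size, and the function name is illustrative only.

```python
def output_size_bytes(nx, ny, nz, time_steps, variables, bytes_per_value=4):
    """Bytes needed to keep every variable at every grid point for every step.

    bytes_per_value=4 is an assumption (single precision); the slide only
    gives the grid size, step count, variable count and the 200 GBytes total.
    """
    return nx * ny * nz * time_steps * variables * bytes_per_value


size = output_size_bytes(100, 100, 100, 10_000, 5)
print(f"{size / 1e9:.0f} GBytes")
```

With 8-byte (double-precision) values the same case would need twice as much space.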
Data transfer bottleneck
Transferring GB-order data over a network is next to impossible.
Ex. Effective performance 1MB/sec --> 100GB/(1MB/sec) = 28h
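The transfer-time estimate above is a plain division. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and ignores protocol overhead; the function name is illustrative.

```python
def transfer_hours(data_bytes, rate_bytes_per_sec):
    """Time in hours to move data_bytes at a constant effective rate."""
    return data_bytes / rate_bytes_per_sec / 3600.0


hours = transfer_hours(100e9, 1e6)  # 100 GB at 1 MB/sec
print(f"{hours:.1f} h")
```

The result is just under 28 hours, which matches the rounded value on the slide.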
Memory capacity problem
Loading a large volume of data that were output by a supercomputer may be difficult.
Intensive needs for grasping simulated results on the fly
Memory capacity
  NWP code: 100-200 array elements per grid point
    Increasing demand with model resolution and complexity
    T319L50 model (40km mesh) requires 20-40 GBytes
    Ensemble forecasting of 50 members --> >> 1 TBytes
    Data assimilation / chemical models are even more demanding
  Climate code: 1-year simulation
    30-60 GBytes (T213L50)
    2-4 TBytes (T1280L100)
Disk space
  NCAR: empirically 114 Bytes per MFLOP
    5 TBytes/month net growth* (*RCI Workshop, April 2000)
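The ensemble figure above follows from the single-model requirement. As a rough sketch, assuming that all members are held in memory at once and that each member needs the same 20-40 GBytes as the T319L50 model (both assumptions, not statements from the slide):

```python
def ensemble_memory_gbytes(per_member_gbytes, members):
    """Naive lower and upper memory bounds for an ensemble, in GBytes.

    per_member_gbytes is a (low, high) pair for one member; linear scaling
    with the number of members is an assumption.
    """
    low, high = per_member_gbytes
    return low * members, high * members


low, high = ensemble_memory_gbytes((20, 40), 50)
print(f"{low / 1000:.0f}-{high / 1000:.0f} TBytes")
```

This naive scaling already gives 1-2 TBytes before any workspace for ensemble statistics is counted.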
- Approach A: Conventional Post-processing -
(Vis5D, GrADS, and many off-the-shelf packages)
Graphical mapping and rendering on the client side
Approach adopted by many conventional post-processors
◆ Advantages
Full exploitation of the server for number crunching and of local machine resources for graphical processing
◆ Drawbacks
Challenges in transferring a huge volume of (polygon) data across the network and manipulating them on the local machine
[Diagram: numerical simulation on the computing server; mapping, rendering and image display on the user's terminal.]
- Approach B (Server-side Visualization) -
Approach of NEC RVSLIB
Both mapping and rendering processes on the server side
◆ Advantages
Efficient usage of the network because image data (NOT massive polygon data) are transferred
Image compression techniques available for further reduction of data
◆ Drawbacks
Increased load on computing and memory resources on the server side for graphical mapping and rendering processes
[Diagram: numerical simulation, mapping and rendering on the computing server; compressed image data sent to the user's terminal for image display.]
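The bandwidth argument for Approach B can be illustrated with a rough size comparison: the raw field data of one time step grow with the grid, while one rendered frame has a fixed size. The sketch below is not RVSLIB code. The 512 x 512 RGB frame, the 5 variables of 4 bytes each, and the use of zlib on a synthetic gradient image are all assumptions made for illustration; the slides do not say which compression scheme is used.

```python
import zlib


def field_bytes(nx, ny, nz, variables=5, bytes_per_value=4):
    """Raw size of one time step of simulation output (assumed layout)."""
    return nx * ny * nz * variables * bytes_per_value


def image_bytes(width=512, height=512, channels=3):
    """Uncompressed size of one rendered frame (assumed resolution)."""
    return width * height * channels


def compressed_image_bytes(width=512, height=512):
    """Size of a synthetic RGB gradient frame after zlib compression."""
    row = bytearray()
    for x in range(width):
        v = x * 255 // (width - 1)
        row += bytes((v, 0, 255 - v))
    frame = bytes(row) * height
    return len(zlib.compress(frame))


for n in (100, 200, 400):
    print(n, field_bytes(n, n, n), image_bytes(), compressed_image_bytes())
```

The field size grows with the cube of the grid dimension while the frame size stays the same, which is the sense in which the transfer rate is independent of the scale of the simulation. A real rendered frame would compress less well than this synthetic gradient.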
RVSLIB: Real-time Visual Simulation Library
[Diagram: on the computing server (supercomputer/workstation), the user's program (flow simulator etc.) calls the RVSLIB server, which handles visualization, steering of the solver and creation of images, and works with a scenario file and an animation file. Compressed image data and tracking information travel over the network (LAN/WAN) to the RVSLIB client on the terminal (workstation/PC), which provides image display, GUI, rendering and animation; steering commands travel back to the server.]
Reduced Cost and Effort
• Monitoring of an on-going simulation (tracking) and alteration of its parameters (steering) while continuing the simulation
Efficient Use of NW Bandwidth
- Constant and reduced data transfer rate between the server and client regardless of the scale of simulations
RVSLIB/Server: SX, WS
RVSLIB/Client: PC, WS (Java)
Usage of RVSLIB
User's code                     RVSLIB server
CALL RVS_INIT               --> Initialization
                                  in interactive mode: handshake with the client
                                  in batch mode: loading a scenario
CALL RVS_BFC                --> Data management (no data copy)
Main loop body
  Time integration
  CALL RVS_MAIN             --> Rendering, C/S communication
CALL RVS_TERM               --> Termination
Interactive Mode
RVSLIB/Client (GUI): tracking and steering of user code
- UNIX version based on X/Motif
- Java version for Windows / UNIX
Batch Mode
Movie generation: scenario script --> server --> movie in AVI etc.
Off-line converter --> movie in avi or mpeg2
C/S communication protocols
- Intranet: TCP/IP socket
- Internet/firewall: HTTP
- Single machine: shared memory
Data interfaces with GrADS and NetCDF formats supported for post-processing
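The call sequence on the usage slide (initialize, register data, call the library inside the time loop, terminate) can be mimicked with a small mock. Everything in this sketch is hypothetical: MockServer and its methods are stand-ins written for illustration and are not the RVSLIB API, whose argument lists are not given in the slides. The toy diffusion update merely plays the role of the time integration.

```python
class MockServer:
    """Hypothetical stand-in for a server-side visualization library."""

    def __init__(self, render_every=10):
        self.render_every = render_every
        self.calls = []
        self.field = None

    def init(self, mode="batch"):       # plays the role of CALL RVS_INIT
        self.calls.append(f"init:{mode}")

    def register(self, field):          # plays the role of CALL RVS_BFC
        self.field = field              # keep a reference, no data copy
        self.calls.append("register")

    def main(self, step):               # plays the role of CALL RVS_MAIN
        self.calls.append("main")
        if step % self.render_every != 0:
            return None
        # "Render": summarize the current field instead of drawing it.
        return (step, min(self.field), max(self.field))

    def term(self):                     # plays the role of CALL RVS_TERM
        self.calls.append("term")


def run(steps=30, render_every=10):
    server = MockServer(render_every)
    server.init(mode="batch")
    field = [0.0] * 8
    field[4] = 1.0
    server.register(field)
    frames = []
    for step in range(1, steps + 1):
        # Time integration: explicit 1-D diffusion, updated in place.
        old = field[:]
        for i in range(1, len(field) - 1):
            field[i] = old[i] + 0.25 * (old[i - 1] - 2 * old[i] + old[i + 1])
        frame = server.main(step)
        if frame is not None:
            frames.append(frame)
    server.term()
    return server, field, frames


server, field, frames = run()
print(server.calls[0], server.calls[-1], [f[0] for f in frames])
```

Because the mock stores a reference to the solver's array rather than a copy, each call inside the loop sees the current values, which is one way to read the "no data copy" note on the slide.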
Best Benefits Gained From RVSLIB
Reduced cost and effort on a trial-and-error basis
- Monitoring of an on-going simulation (tracking) and alteration of its parameters (steering) while continuing the simulation
- Conventional post-processing and batch-mode graphics also available
Efficient use of vector/parallel facilities and network
- Efficient graphical processing and image creation capitalizing on vector/parallel computing capabilities
- Reduced and almost constant network traffic exploiting image data compression
Animation based on scenario
- Easily navigable visualization based on a plot described in a scenario file
Library format tailored to a wide spectrum of simulation programs (BFC grid, FEM, multi-block grid, particle simulation, …)
Applications
Visualization of flow around a baseball
- Collaboration with Physical & Chemical Res. Inst., Japan -
Computation: Finite Difference Method
Unsteady, incompressible, viscous flow
Number of grids: 169 * 92 * 101
Reynolds number: 100000 -- 200000
Ball speed: 75 ~ 150 km/h
Computation timing data (10000 time steps):
  Solver only (no visualization): 27150 sec (7.54 h)
  + Visualization with same viewing: 28000 sec (7.78 h)
  + Visualization with variable viewing: 28150 sec (7.82 h)
---> Almost no additional CPU time required for visualization because of high-speed visualization on the SX Series
# Computation on SX-5S1 (4 GFLOPS)
# Visualization every 10 time steps (contour and tracer)
# Tracer movement calculated at each time step
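The "almost no additional CPU time" claim can be quantified from the timing data above. A minimal sketch using only the three reported totals (the function name is illustrative):

```python
def overhead_percent(solver_sec, total_sec):
    """Extra time relative to the solver-only run, in percent."""
    return 100.0 * (total_sec - solver_sec) / solver_sec


SOLVER_ONLY = 27150  # sec, solver without visualization
for label, total in (("same viewing", 28000), ("variable viewing", 28150)):
    print(f"{label}: {overhead_percent(SOLVER_ONLY, total):.1f}%")
```

The overhead comes out at roughly 3 to 4 percent of the solver time in both cases.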
Post-processing with RVSLIB
- Collaboration with BoM/Australia -
[Diagram: at the Bureau of Meteorology (Australia), numerical weather prediction output from the user's solver on the SX-4/32 server is kept as NetCDF format files; the RVSLIB server reads these files for offline visualization and sends compressed image data to the RVSLIB client.]
Oceanic circulation simulated with ACOM2
Atmospheric simulation -- relative humidity around Australia represented by isosurfaces
On-going & Future enhancements
◆ MPI-based performance optimization
◆ Hierarchical data structure for visualization of huge data combined with wavelet transformation
◆ Inter-server collaboration
◆ Active visualization
  - Automatic extraction of specific features from data
  - Visualization combined with data mining
Needs for Grid Services
[Diagram: grid services over the network: remote access, remote monitoring, information services, fault detection, resource control, collaboration tools, data management tools, distributed simulation, ...]
Toward Global Computing Environments
One Single Machine Never Fits All ...
SX-5 Series: http://www.sw.nec.co.jp/hpc/sx-e/index.html
RVSLIB: http://www.sw.nec.co.jp/APSOFT/SX/rvslib_e/