SCEC/CME Project - How Earthquake Simulations Drive Middleware Requirements
Philip Maechling
SCEC IT Architect
24 June 2005
GRIDS Center Community Workshop 2005
Southern California Earthquake Center
• Consortium of 15 core institutions and 39 other participating organizations, founded as an NSF STC in 1991
• Co-funded by NSF and USGS under the National Earthquake Hazards Reduction Program (NEHRP)
• Mission:
– Gather data on earthquakes in Southern California
– Integrate information into a comprehensive, physics-based understanding of earthquake phenomena
– Communicate understanding to end-users and the general public to increase earthquake awareness and reduce earthquake risk
Core Institutions
University of Southern California (lead)
California Institute of Technology
Columbia University
Harvard University
Massachusetts Institute of Technology
San Diego State University
Stanford University
U.S. Geological Survey (3 offices)
University of California, Los Angeles
University of California, San Diego
University of California, Santa Barbara
University of Nevada, Reno

Participating Institutions
39 national and international universities and research organizations

http://www.scec.org
Recent Earthquakes In California
Observed Areas of Strong Ground Motion
Simulations Supplement Observed Data
SCEC/CME Project
Goal: To develop a cyberinfrastructure that can support system-level earthquake science – the SCEC Community Modeling Environment (CME)
Support: 5-yr project funded by the NSF/ITR program under the CISE and Geoscience Directorates
Start date: Oct 1, 2001
[Diagram: the SCEC/ITR Project spans Information Science and Earth Science, linking NSF (CISE and GEO directorates) with the SCEC institutions, IRIS, USGS, ISI, and SDSC]
www.scec.org/cme
SCEC/CME Scientific Workflow Construction
A major SCEC/CME objective is the ability to construct and run complex scientific workflows for SHA.
[Pathway 1 example workflow: Define Scenario Earthquake (ERF definition, IMR definition, gridded region definition, probability of exceedance and IMR definition, GMT map configuration parameters) → Calculate Hazard Curves (9,000 hazard curve files; 9,000 x 0.5 MB = 4.5 GB) → Extract IMR Value (lat/long/amp xyz file with 3,000 data points, ~100 KB) → Plot Hazard Map]
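To make the tail of this pathway concrete, here is a minimal, hypothetical Python sketch of the "Extract IMR Value" and xyz-file steps: interpolate each site's hazard curve at a chosen probability of exceedance and emit lat/long/amp rows for GMT. The file names, curve-file layout, and 10% PoE target are illustrative assumptions, not the actual SCEC/CME formats.

```python
# Hypothetical sketch of the "Extract IMR Value" stage of Pathway 1.
# Assumed curve-file layout (illustrative only): a "lat lon" header line,
# then "IML PoE" rows with PoE decreasing as IML increases.
import glob
import math

TARGET_POE = 0.10  # e.g., 10% probability of exceedance in 50 years

def read_curve(path):
    with open(path) as f:
        lat, lon = map(float, f.readline().split())
        points = [tuple(map(float, line.split())) for line in f]
    return lat, lon, points

def iml_at_poe(points, target):
    """Interpolate the IML at the target PoE (linear in IML vs. log PoE)."""
    for (x1, p1), (x2, p2) in zip(points, points[1:]):
        if p2 <= target <= p1:
            frac = (math.log(p1) - math.log(target)) / (math.log(p1) - math.log(p2))
            return x1 + frac * (x2 - x1)
    return float("nan")  # target lies outside the computed curve

# One lat/long/amp row per site; the xyz file is then gridded and plotted with GMT.
with open("hazard_map.xyz", "w") as out:
    for path in sorted(glob.glob("curves/*.txt")):
        lat, lon, pts = read_curve(path)
        out.write(f"{lon} {lat} {iml_at_poe(pts, TARGET_POE)}\n")
```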
SCEC/CME Scientific Workflow System
[Architecture diagram: a Pathway Composition Tool and a Grid-Based Data Selector feed the Compositional Analysis Tool (CAT), backed by the CAT Knowledge Base and the SCEC Datatype DB. A DAX Generator emits a DAX; Pegasus, consulting the Metadata Catalog Service and the Replica Location Service, converts it to a DAG; Condor DAGMan submits the resulting RSL jobs to grid hosts (host1, host2) holding the data, producing the final hazard map]
SCEC/CME SRB-based Digital Library
[Diagram: the SCEC Community Library lets a user select a scenario fault model and source model, select a receiver (lat/lon), and output time-history seismograms]
SRB-based digital library capacity:
– More than 100 terabytes of tape archive
– 4 terabytes of on-line disk
– 5 terabytes of disk cache for derivations
Integrated Workflow Architecture
[Diagram: a Workflow Template Editor (CAT) builds Workflow Templates (WTs) from a Component Library (components carry execution requirements and I/O data descriptions), a Domain Ontology, and a Workflow Library; a Conceptual Data Query Engine (DataFinder) queries the Metadata Catalog for data given metadata; data selection plus a queried WT yields a Workflow Instance (WI); Workflow Mapping (Pegasus) uses grid information services to produce an Executable Workflow on the grid. Engineers interact through tools at each stage. Contributors: L. Hearn @ UBC, K. Olsen @ SDSU, D. Okaya @ USC, J. Zechar @ USC (teamwork: Geo + CS)]
SCEC/CME HPC Allocations
• SCEC/CME researchers need, and have access to, significant high-performance computing capabilities
• TeraGrid allocations (April 2005 – March 2006):
– TG-MCA03S012 (Olsen): 1,020,000 SUs
– TG-BCS050002S (Okaya): 145,000 SUs
• USC HPCC allocations:
– CME group allocation (Maechling): 100,000 SUs
– Investigator allocations (Li, Jordan): 300,000 SUs
• SCEC cluster:
– Dedicated 16-processor Pentium 4 cluster (102 GFlops)
SCEC/CME TeraGrid Support
• A TeraGrid Strategic Application Collaboration (SAC) greatly improved our AWM run-time on TeraGrid
• Advanced TeraGrid Support (ATS) for the TeraShake 2 and CyberShake simulations
• SDSC Visualization Services support for SCEC simulations
Three Types of Simulations
• SCEC/CME supports widely varying types of earthquake simulations
• Each simulation type creates its own set of middleware requirements
• We will describe three examples and comment on their middleware implications and computational system requirements:
– Probabilistic seismic hazard maps
– 3D waveform propagation simulations
– 3D waveform-based intensity measure relationships
Probabilistic Seismic Hazard Maps
(1) Earthquake-Rupture Forecast (ERF): the probability of all possible fault-rupture events (M ≥ ~5) for a region and time span
(2) Intensity-Measure Relationship (IMR): gives Prob(IMT ≥ IML) for a given site and fault-rupture event
– Attenuation relationships (traditional; no physics)
– Full-waveform modeling (developmental; more physics), solving the elastodynamic equations
$$\rho\,\ddot{u}_i = \sigma_{ij,j} + f_i, \qquad \sigma_{ij} = \lambda\,\delta_{ij}\,\varepsilon_{pp} + 2\mu\,\varepsilon_{ij}$$
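For readers unfamiliar with PSHA, here is a minimal sketch (not SCEC/CME code) of how the two components combine, assuming the standard treatment where independent ruptures give P(IMT ≥ IML) = 1 − Π_k (1 − P(rup_k) · P(IMT ≥ IML | site, rup_k)):

```python
# Minimal PSHA hazard-curve sketch: the ERF supplies ruptures with
# occurrence probabilities for the time span; the IMR supplies the
# conditional exceedance probability for each rupture at the site.
def hazard_curve(ruptures, imr, site, imls):
    """ruptures: iterable of (rup, p_rup); imr(site, rup, iml) -> P(IMT>=IML | rup)."""
    curve = []
    for iml in imls:
        p_no_exceed = 1.0
        for rup, p_rup in ruptures:
            p_no_exceed *= 1.0 - p_rup * imr(site, rup, iml)
        curve.append((iml, 1.0 - p_no_exceed))  # total exceedance probability
    return curve
```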
Example Hazard Curve
Site: USC; ERF: Frankel-02; IMR: Field; IMT: Peak Velocity; Time Period: 50 Years
Probabilistic Hazard Map Calculations
Characteristics of PSHA Simulations
• 10k independent hazard curve calculations for each map calculation
– A high-throughput, not high-performance, computing problem
• 10k resulting files per map
– Metadata saved for each file
• Short run times for each calculation
– The overhead of starting up a job is expensive
• Would like to offer map calculations as a service to SCEC users (who may not have an allocation)
Middleware Implications
• High-throughput scheduling
– Well suited to a Condor pool
• Bundling of short run-time jobs will reduce job startup overhead (a minimal submit-file sketch follows below)
• Bundling of jobs is also useful for cluster execution
• Metadata tracking with an RDBMS-based catalog system (e.g., the Metadata Catalog Service (MCS) and Replica Location Service (RLS))
– Databases present installation and operational problems at every site where we request them
• Software support for interpreted languages on computational clusters
– Our codes are implemented in an interpreted programming language
• On-demand execution by non-allocated users
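One plausible shape for the bundling idea is a Condor submit description that queues wrapper jobs, each computing a batch of curves; the wrapper script and site-list files are placeholders, not actual SCEC/CME artifacts:

```
# Hypothetical Condor submit description: 100 jobs x ~100 curves per job
# covers the ~10k curves of one hazard map while amortizing startup cost.
universe                = vanilla
executable              = run_curve_bundle.sh
arguments               = sites_$(Process).list
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = sites_$(Process).list
output                  = bundle_$(Process).out
error                   = bundle_$(Process).err
log                     = hazard_map.log
queue 100
```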
3D Wave Propagation Simulations
Characteristics of 3D Wave Propagation Simulations
• More physically realistic than existing PSHA, but more computationally expensive
• High-performance computing, cluster-based codes
• 4D calculations (time-varying volumetric data)
• Output large volumetric data sets
• Physics limited by grid resolution: higher ground-motion frequencies require a denser grid, and doubling the density increases storage by a factor of 8 (see the scaling sketch below)
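A back-of-envelope sketch of that scaling: halving the grid spacing to double the resolved frequency cubes the mesh-point count (the factor of 8 in storage per snapshot) and, because the stable time step also halves, roughly multiplies the computation by 16.

```python
def cost_scaling(freq_factor):
    """Rough cost growth when the resolved frequency rises by freq_factor."""
    mesh_points = freq_factor ** 3   # grid refines in all three dimensions
    storage = freq_factor ** 3       # per saved snapshot, proportional to points
    compute = freq_factor ** 4       # 8x points x 2x time steps (CFL limit)
    return mesh_points, storage, compute

print(cost_scaling(2))  # (8, 8, 16)
```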
Example: TeraShake Simulation
• Magnitude 7.7 earthquake on the southern San Andreas
• Mesh of ~2 billion cubes, dx = 200 m
• 0.011 s time step, 20,000 time steps: a ~220 s (roughly 3.7-minute) simulation
• Kinematic source (from Denali) from Cajon Creek to Bombay Beach
– 60 s source duration
– 18,886 point sources, each 6,800 time steps in duration
• 240 processors on the San Diego Supercomputer Center DataStar
• ~20,000 CPU hours, approximately 5 days wall clock
• ~50 TB of output
• During execution, "on-the-fly" graphics (…attempt aborted!)
• Metadata capture and storage in the SCEC digital library
Domain Decomposition For TeraShake Simulations
Simulations Supplement Observed Data
Peak Velocity: NW-SE Rupture vs. SE-NW Rupture
[Figure: peak ground velocity maps for the two rupture directions]
SE-NW rupture: Montebello 337 cm/s, Downtown 52 cm/s, Long Beach 48 cm/s, San Diego 8 cm/s, Palm Springs 36 cm/s
NW-SE rupture: Montebello 8 cm/s, Downtown 4 cm/s, Long Beach 9 cm/s, San Diego 6 cm/s, Palm Springs 23 cm/s
Break-down of Output
– Full volume velocities, every 10th time step: 43.2 TB
– Full surface velocities, every time step: 1.1 TB
– Checkpoints (restarts), every 1,000 steps: 3.0 TB
– Input files, etc.: 0.1 TB
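These sizes are reproducible from the mesh parameters. The sketch below assumes the TeraShake domain was 600 km x 300 km x 80 km at dx = 200 m (a 3000 x 1500 x 400 grid, i.e. the slide's ~2 billion cubes) with three 4-byte velocity components; the domain dimensions are an assumption, not stated on this slide.

```python
# Rough reconstruction of the output break-down from assumed mesh parameters.
nx, ny, nz = 3000, 1500, 400      # assumed 600 x 300 x 80 km domain at 200 m
steps = 20_000
comp, nbytes = 3, 4               # velocity components, bytes per float

vol_snapshot = nx * ny * nz * comp * nbytes    # one full-volume snapshot
surf_snapshot = nx * ny * comp * nbytes        # one free-surface snapshot

TB = 1e12
print(vol_snapshot * (steps // 10) / TB)   # ~43.2 TB: volume, every 10th step
print(surf_snapshot * steps / TB)          # ~1.1 TB: surface, every step
```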
Middleware Implications for 3D Wave Propagation Simulations
• Multi-day high-performance runs
– Checkpoint/restart support needed
• Scheduled reservations on clusters
– Reservations and special queues are often arranged
• Large file and data movement
– Terabyte transfers require highly reliable, long-running data transfers
• Ability to stop and restart
– Can we move a restart from one system to another?
• Draining of temporary storage during runs (a sidecar sketch follows below)
– Storage required for full output often exceeds the capacity of scratch, so output files must be moved during the simulation
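A minimal sketch of the draining idea: a sidecar process moves finished snapshot files from scratch to archive staging while the simulation keeps writing. The paths and the .done-marker convention for "safe to move" files are assumptions for illustration.

```python
# Hypothetical scratch-draining sidecar for a multi-day simulation run.
import os
import shutil
import time

SCRATCH = "/scratch/terashake/out"      # assumed paths, for illustration
STAGING = "/archive/terashake/staging"

while True:
    for name in os.listdir(SCRATCH):
        if name.endswith(".done"):      # writer creates foo.done after foo is closed
            data = os.path.join(SCRATCH, name[:-len(".done")])
            if os.path.exists(data):
                shutil.move(data, os.path.join(STAGING, os.path.basename(data)))
            os.remove(os.path.join(SCRATCH, name))
    time.sleep(60)                      # poll once a minute for newly finished files
```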
Middleware Implications for 3D Wave Propagation Simulations (continued)
• On-the-fly visualization for rapid validation of results
– Verify results before the full simulation completes
• Standard protocols for data transfers and metadata registration into SRB-based storage
Waveform-based Intensity Measure Relationship (CyberShake)
(1) Earthquake-Rupture Forecast (ERF): the probability of all possible fault-rupture events (M ≥ ~5) for a region and time span
(2) Intensity-Measure Relationship (IMR): gives Prob(IMT ≥ IML) for a given site and fault-rupture event
– Attenuation relationships (traditional; no physics)
– Full-waveform modeling (developmental; more physics), solving the same elastodynamic equations shown earlier
Intensity-Measure Relationship
[Class diagram: an IMR exposes a list of supported IMTs and a list of site-related independent parameters; given IMT, IML(s), site(s), and a rupture, it returns Prob(IMT ≥ IML | site, rupture). IMR types (subclasses):
– Attenuation relationships: a Gaussian distribution is assumed; mean and standard deviation come from various parameters
– Simulation IMRs: exceedance probability computed using a suite of synthetic seismograms
– Vector IMRs: compute the joint probability of exceeding multiple IMTs (Bazzurro & Cornell, 2002)
– Multi-site IMRs: compute the joint probability of exceeding IML(s) at multiple sites (e.g., Wesson & Perkins, 2002)]
CyberShake Simulations Push Macro and Micro Computing
• CyberShake requires large forward wave propagation simulations and volumetric data storage
• CyberShake requires 100k seismogram synthesis computations using multi-terabyte volumetric data sets; during synthesis processing, this data needs to be disk-based
• 100k data files, plus their metadata, must be managed
• High-throughput requirements are driving the implementation toward a TeraGrid-wide computing approach
• High-throughput requirements are driving integration of non-TeraGrid grids with TeraGrid
Example CyberShake Region (200 km x 200 km)
USC: 34.05, -118.24; minLat = 31.889, minLon = -120.60, maxLat = 36.1858, maxLon = -115.70
CyberShake Strain Green Tensor AWM
• Large (TeraShake-scale) forward calculations for each site
– SHA typically ignores ruptures more than 200 km from a site, so this is used as the cutoff distance
– A 20 km buffer is used around the edges of the volume to reduce edge effects
– 65 km depth to support the frequencies of interest
– Volume is 440 km x 440 km x 65 km at 200 m spacing
• 1.573 billion mesh points (see the arithmetic check below)
• Simulation time: 240 seconds
– Volumetric data saved for 2 horizontal simulations
• Estimated storage per site: 7 TB (4.5 TB data + 2.5 TB checkpoint files)
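The mesh-point count follows directly from the stated volume and spacing:

```python
# 440 km x 440 km x 65 km at 200 m spacing.
nx = 440_000 // 200   # 2200
ny = 440_000 // 200   # 2200
nz = 65_000 // 200    # 325
print(nx * ny * nz)   # 1,573,000,000 -> the slide's 1.573 billion mesh points
```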
Ruptures in the ERF within 200 km of USC
43,227 ruptures in the Frankel-02 ERF with M ≥ 5.0 within 200 km of USC
CyberShake Computational Elements
CyberShake Seismogram Synthesis
• Requires calculation of 100,000+ seismograms for each site
• Estimated rupture variations scale with magnitude (ruptures x variations per rupture; see the sketch below):
– Mw 5.0: 20,450 x 1 = 20,450
– Mw 6.0: 21,699 x 10 = 216,990
– Mw 7.0: 1,069 x 100 = 106,900
– Mw 8.0: 9 x 1,000 = 9,000
– Total: 353,340 rupture variations, x 2 components
• Current estimated number of seismogram files per site is 43,000 (components and variations are combined into a single file per rupture)
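The per-magnitude products reproduce both totals; the rupture counts per bin are inferred from the slide's products and match the 43,227 ruptures within 200 km of USC quoted earlier.

```python
# Rupture-variation arithmetic for one CyberShake site.
bins = {            # magnitude: (ruptures, variations per rupture) -- inferred
    5.0: (20_450, 1),
    6.0: (21_699, 10),
    7.0: (1_069, 100),
    8.0: (9, 1_000),
}
variations = sum(r * v for r, v in bins.values())
ruptures = sum(r for r, _ in bins.values())
print(variations)       # 353,340 rupture variations
print(variations * 2)   # 706,680 seismograms with 2 horizontal components
print(ruptures)         # 43,227 -> ~43,000 files at one file per rupture
```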
CyberShake Seismogram Synthesis
• The seismogram synthesis stage requires disk-based storage of large volumetric data sets, so a tape-based archive of the volumetric data does not work
• To distribute seismogram synthesis across TeraGrid, we need to either duplicate terabytes of data or have globally visible disk systems
Example Hazard Curve
Site: USC; ERF: Frankel-02; IMR: Field; IMT: Peak Velocity; Time Period: 50 Years
Workflows Run Using Grid VDS Workflow Tools
[Diagram: application-dependent tools (an Input Data Selector and the Compositional Analysis Tool (CAT)) produce a workflow template; an abstract workflow service (Chimera, also used by Montage) turns it into an abstract workflow; application-independent tools (Pegasus, then Condor DAGMan) map it to a concrete workflow and submit jobs to grid resources, returning the results]
Example Hazard Map Region (50 km x 50 km at 2 km grid spacing = 625 sites)
OpenSHA, SA 1.0, Frankel 2002 ERF and Sadigh IMR with 10% POE in 50 years
Summary of SCEC Experiences
• As soon as we develop a computational capability, the geophysicists develop applications that push the technology
– Compute technology, data management technology, and resource-sharing technology are all applied
• In many ways, the IT capabilities required for geophysical problems exceed what is currently possible, and this limits the state of knowledge in geophysics and public safety
– For example, higher-frequency simulations are of significant interest but exceed the computational and storage capabilities currently available
Major Middleware-Related Issues for SCEC/CME: Security and Allocation Management
• The lack of a widely accepted CA makes adding organizations to the SCEC grid problematic
• Ability to run under group allocations for "on demand" requests (a community allocation?)
Major Middleware-Related Issues for SCEC/CME: Software Installation and Maintenance
• The middleware software stack, even on supercomputer systems, should include support for "micro" jobs, e.g., runtimes such as Java
• Database management support for database-oriented tools such as metadata catalogs is important (backup, recovery, cleanup, performance, modifications)
• Guidelines for tools in the middleware software stack should describe when local installations are required and when remote installations are acceptable for tools such as RLS and MCS
Major Middleware-Related Issues for SCEC/CME: Supercomputing and Storage
• Globally (TeraGrid-wide) visible disk storage
• Well-supported, reliable file transfers, with monitoring and restart of transfers that have problems, are essential
• Interoperability between grid tools and data management tools such as SRB must cover data, metadata, and metadata search
Major Middleware-Related Issues for SCEC/CME: Scheduling
• Support for Reservation-based scheduling
• Partial run and restart capability
• Failure detection and alerting
Major Middleware-Related Issues for SCEC/CME: Usability and Monitoring
• Monitoring tools that include the status of available storage resources
• On-the-fly visualizations for run-time validation of results
• Interfaces to workflow systems are complex and developer-oriented; easier-to-use interfaces are needed