Post on 18-Jan-2016
CISM Collaboratory Development Plan
Philip J. Maechling
Information Technology Architect
Southern California Earthquake Center
March 11, 2015
CISM Collaboratory Development Plan
• System Requirements
• System Architecture
• Computing Environment
• Essential Software Components
• Computational and Data Estimates
• Software Development Process
• IP Considerations
Key CISM Use Cases
Based on the CISM Proposal, we identified the following CISM Use Cases:
Year 1:
• Couple the empirical Uniform California Earthquake Rupture Forecast to the CyberShake ground-motion forecasting models of the Los Angeles region.
• Provide new computational tools to assist the development of rupture simulators such as RSQSim and ground-motion simulators such as CyberShake.
Year 2:
• Couple the RSQSim physics-based rupture simulator to the CyberShake ground-motion forecasting models.
• Retrospectively calibrate and test the resulting comprehensive forecasting models.
Year 3:
• Construct a computational environment that can sustain the long-term development of comprehensive, physics-based earthquake forecasting models.
• Submit exemplars to CSEP for prospective testing against observed earthquake activity in California.
Additional CISM System Requirements
CISM must be designed to meet several additional non-functional requirements:
1. Must use existing scientific software written in a variety of programming languages.
2. Must use local computing resources and high-performance parallel computing resources from external resource providers.
3. Must be able to “show our work” to support scientific review of results.
4. Must be inexpensive to design, build, maintain, and operate.
5. Must be easy to modify without significant re-implementation or down time.
6. Must support new development without impacting ongoing operations.
7. Must run for years to get statistically significant results.
CISM Modular Processing Architecture
Build a modular, extensible, distributed, high performance computing framework:
1) Define and execute a multi-stage series of scientific calculations
2) Execute calculations on SCEC and external resources and return results to SCEC
3) Modular construction to enable evaluation of multiple alternative methods
4) Ensure repeatable and reviewable results
* We will use a workflow-based distributed computing framework developed on SCEC HPC Projects
Define Rupture Catalog
Define the list of possible earthquakes for the region of interest during the period of interest
Assign Rupture Probabilities
Assign a probability to each rupture in catalog during period of interest
Calculate Rupture Ground Motions
Calculate ground motions produced by each rupture in region of interest
Forecast Future Ground Motions
Combine ground motions with probabilities to produce probabilistic ground motion forecast
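The four stages above can be sketched end to end in a few lines. This is only an illustration, not the CISM implementation: the rupture names, probabilities, and ground-motion values are invented, ruptures are assumed to occur independently, and each rupture is given a single deterministic ground-motion amplitude at one site.

```python
# Hypothetical sketch of the four-stage pipeline; every number is invented.

def define_rupture_catalog():
    # Stage 1: list the possible ruptures for the region and period of interest.
    return ["rupture_a", "rupture_b", "rupture_c"]

def assign_rupture_probabilities(catalog):
    # Stage 2: probability that each rupture occurs during the forecast period.
    assumed = {"rupture_a": 0.02, "rupture_b": 0.10, "rupture_c": 0.005}
    return {rupture: assumed[rupture] for rupture in catalog}

def calculate_ground_motions(catalog):
    # Stage 3: peak ground motion (in g) each rupture produces at one site.
    assumed = {"rupture_a": 0.45, "rupture_b": 0.12, "rupture_c": 0.80}
    return {rupture: assumed[rupture] for rupture in catalog}

def forecast_exceedance(probabilities, motions, threshold):
    # Stage 4: probability that at least one rupture whose ground motion
    # exceeds the threshold occurs, assuming independent ruptures.
    p_no_exceedance = 1.0
    for rupture, probability in probabilities.items():
        if motions[rupture] > threshold:
            p_no_exceedance *= 1.0 - probability
    return 1.0 - p_no_exceedance

catalog = define_rupture_catalog()
probabilities = assign_rupture_probabilities(catalog)
motions = calculate_ground_motions(catalog)

# A small hazard curve: exceedance probability at three thresholds (in g).
hazard_curve = {x: forecast_exceedance(probabilities, motions, x) for x in (0.1, 0.3, 0.6)}
```

Because each stage only consumes the previous stage's output, any one stage can be swapped for an alternative method without touching the others, which is the point of the modular construction.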
CISM Modular Processing Architecture
[Figure: the four processing stages, labeled with the software used for each]
• Define Rupture Catalog: OpenSHA, UCERF 3 Ruptures
• Assign Rupture Probabilities: UCERF 3 Probabilities
• Calculate Rupture Ground Motions: CyberShake, 3D Wave Propagation Simulations
• Forecast Future Ground Motions: OpenSHA, Combine Amplitudes into Forecast
Focus CISM Software Development on Defining Workflows to Minimize Software Development
Workflow Configuration Environment (CISM Software Development)
Workflow Execution Environment (Existing Open-Source Software)
CISM Workflow-oriented System Implementation
1. CISM forecasts are implemented by running a series of scientific programs.
2. CISM Workflows define the programs used, the input and output files, and the order in which the programs must be run.
3. Workflows are first defined without details specific to a machine or computing environment (called abstract workflows).
4. After the target run site is selected, the abstract workflow is “planned”: specific executables and physical file names are inserted (called concrete workflows).
– This technique is well suited for computing environments that move computing from one system to another.
– Workflow tools also provide logging, metadata collection, and restart capabilities.
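The abstract-to-concrete planning step can be illustrated with a small sketch. It does not use any particular workflow tool; the job names, file names, site names, and directory paths are all hypothetical, and the "planner" only orders jobs by their file dependencies and fills in site-specific paths.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+

@dataclass(frozen=True)
class AbstractJob:
    name: str       # logical program name, no path
    inputs: tuple   # logical input file names
    outputs: tuple  # logical output file names

# Abstract workflow: no machine-specific details (all names are hypothetical).
ABSTRACT = [
    AbstractJob("define_catalog", (), ("ruptures.txt",)),
    AbstractJob("assign_probabilities", ("ruptures.txt",), ("probabilities.txt",)),
    AbstractJob("simulate_ground_motions", ("ruptures.txt",), ("amplitudes.txt",)),
    AbstractJob("combine_forecast", ("probabilities.txt", "amplitudes.txt"), ("forecast.txt",)),
]

# Hypothetical site catalog: where executables and data live on each system.
SITES = {
    "local": {"bin": "/opt/cism/bin", "data": "/data/cism"},
    "hpc": {"bin": "/home/cism/bin", "data": "/scratch/cism"},
}

def plan(abstract_jobs, site_name):
    """Turn an abstract workflow into an ordered list of concrete jobs."""
    site = SITES[site_name]
    by_name = {job.name: job for job in abstract_jobs}
    # Map each logical file to the job that produces it.
    producer = {f: job.name for job in abstract_jobs for f in job.outputs}
    # A job depends on the producers of its input files.
    depends_on = {job.name: {producer[f] for f in job.inputs} for job in abstract_jobs}
    concrete = []
    for name in TopologicalSorter(depends_on).static_order():
        job = by_name[name]
        concrete.append({
            "executable": f"{site['bin']}/{name}",
            "inputs": [f"{site['data']}/{f}" for f in job.inputs],
            "outputs": [f"{site['data']}/{f}" for f in job.outputs],
        })
    return concrete

hpc_plan = plan(ABSTRACT, "hpc")
local_plan = plan(ABSTRACT, "local")
```

The same `ABSTRACT` list yields different concrete plans for `"local"` and `"hpc"`, which is what lets a workflow move between systems without being rewritten.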
Computing Environment
Develop a distributed computing environment, based at USC HPCC, utilizing NSF and DOE HPC systems.
• Establish both an (1) operational and (2) development computing environment
• Maintain cumulative data results locally
• Provide external interfaces to forecasts and forecast results
Essential CISM Scientific Codes
[1] OpenSHA: Implement Uniform California Earthquake Rupture Forecast 2 and 3, GMPEs, and probabilistic seismic hazard processing / Language: Java / Multi-threaded / Primary Developers: Ned Field, Kevin Milner
[2] RSQSim: Large-scale simulations of earthquake occurrence to characterize system-level response of fault systems, including processes that control time, place, and extent of earthquake slip / Language: C / MPI-based / Primary Developers: James Dieterich, Keith Richards-Dinger
[3] CyberShake: 3D wave propagation simulations for a large set of ruptures, and seismogram processing resulting in peak ground motions and other parameters / Language: C / MPI-based / Primary Developers: Robert Graves, Scott Callaghan, Philip Maechling, Thomas Jordan
[4] CSEP: Automated execution and evaluation of short-term earthquake forecast models / Language: Python / Multi-threaded / Primary Developers: D. Schorlemmer, T. Jordan, M. Liukis
References
[1] Field, E.H., T.H. Jordan, and C.A. Cornell (2003), OpenSHA: A Developing Community-Modeling Environment for Seismic Hazard Analysis, Seismological Research Letters, v. 74, no. 4, p. 406-419.
[2] Richards-Dinger, K., and J.H. Dieterich (2012), RSQSim Earthquake Simulator, Seismological Research Letters, v. 83, no. 6, p. 983-990, doi: 10.1785/0220120105.
[3] Graves, R., T. Jordan, S. Callaghan, E. Deelman, E. Field, G. Juve, C. Kesselman, P. Maechling, G. Mehta, K. Milner, D. Okaya, P. Small, and K. Vahi (2010), CyberShake: A Physics-Based Seismic Hazard Model for Southern California, Pure Applied Geophys., v. 169, i. 3-4, doi: 10.1007/s00024-010-0161-6.
[4] Zechar, J.D., D. Schorlemmer, M. Liukis, J. Yu, F. Euchner, P.J. Maechling, and T.H. Jordan (2010), The Collaboratory for the Study of Earthquake Predictability Perspective on Computational Earthquake Science, Concurrency and Computation: Practice and Experience, v. 22, p. 1836-1847.
Computing and Data Estimates
Estimated Annual Large-scale HPC Runs:
RSQSim: Regional (1200 km faults), Simulated time: 100K years, Number of ruptures: 100M, Repetitions: 50
• Core Hours Required: 40M
• Local Results Data: 60 TB
CyberShake: Regional (300 sites), Spacing: 10 km, Max Freq: 1 Hz, Min Vs: 500 m/s
• Core Hours Required: 70M
• Local Results Data: 10 TB
SCEC Software Engineering Practices
Iterative Development Process (Typically 3-Month Iterations)
• Develop end-to-end processing that provides scientific value
• Deploy the operational system and operate it during the next iteration
• Extend the system on the development system, preserving existing capabilities and adding new ones
• Migrate the development system to operational at the end of the iteration
Software Engineering Practices
• Software version control
• Automated testing frameworks
• Standards-based data formats and management
• Metadata collection
• Process logging
• Error detection and monitoring
CISM IP Principles
1. Integrate best-available academic codes developed and contributed by research community
2. Accept NSF support and private company gifts to support software development
3. Release as free and open-source software to support scientific transparency and build confidence in results
4. License software in a way that lets it be used by academic institutions and US agencies, including USGS.
Open-source license criteria focus on the availability of the source code and the ability to modify and share it, while free software licenses focus on the user's freedom to use the program, to modify it, and to share it.
Key Rights and Issues: Apache License v2.0
1. Software distribution must include license
2. Software distribution must include source code
3. No warranty offered
4. User agrees to no liability
5. Users are granted a copyright license to software and source code
6. Users are granted a patent license to use software
7. Users are not permitted to use any trademarks in distribution without permission
8. Private use is allowed
9. Commercial use is allowed
10. Redistribution is allowed with licenses intact
11. Users are allowed to make modifications
12. Users must state what changes they made
13. Users can distribute their modifications under different licenses, including proprietary ones
14. Users are permitted to link to any other software that uses different licenses, including proprietary ones
Backup Slides
System Requirement Details
Our focus will be on system-specific models for time-dependent earthquake forecasting that are comprehensive, physics-based, data calibrated, and prospectively testable.
A model is system-specific if it identifies a physics-based parameter, such as future ground motions, that it seeks to predict, and integrates all relevant physics into the model needed to predict that parameter.
A model is comprehensive if it forecasts ground-motion exceedance probabilities; i.e., the chances that ground motions at any surface site will exceed an adjustable (risk-sensitive) intensity threshold during a forecasting interval. Comprehensive forecasts must thus combine earthquake rupture forecasts with ground-motion predictions that are conditional on the rupture (Fig. 2).
Physics-based models adhere to the laws of physics and thus automatically approximate many essential physical constraints; they can thereby capture more predictability than strictly empirical models.
A model is data calibrated if it is validated against some type of observational data and can be improved with additional observations.
Key CISM Use Cases
Based on the CISM Proposal, we identify the following key CISM system use cases:
1. Incorporate into a common software framework the UCERF3 models, one or more rupture simulators developed by SCEC’s Earthquake Simulator technical activity group, and SCEC’s CyberShake ground-motion simulation platform.
2. Integrate UCERF3 and CyberShake into a comprehensive forecasting model, replacing the empirical ground motion prediction equations used in the national maps with a physics-based model derived from simulations of seismic wave excitation and propagation through realistic 3D crustal structures.
3. Couple CyberShake with RSQSim, replacing the empirical UCERF3 model with a physics-based rupture simulator that accounts for earthquake nucleation and stress transfer.
4. Use Monte Carlo techniques to incorporate the deterministic, physics-based models into a probabilistic framework that properly accounts for epistemic uncertainties in the models.
Project Personnel
• Executive Director of Science Programs - With the Project PI, responsible for defining and conducting a collaborative scientific process that can identify and address the full range of scientific issues encountered during CISM development, ensuring that all necessary scientific models are integrated into a system that produces testable forecasts.
• Graduate Students - Responsible for in-depth analysis of selected scientific issues related to time-dependent earthquake forecasts that arise during CISM development, under the direction of the Project PI and the Executive Director of Science Programs.
• Post-Doc - Responsible for integrating existing CISM scientific models into time-dependent earthquake forecast models and developing software prototypes that can be used as initial implementations of necessary CISM processing systems.
• Phil - Responsible for defining and developing the CISM system and software architecture, including the internal CISM processing and data management capabilities, and CISM external interfaces to observational data sources, external high-performance computer resources, and presentation of results to end-users and stakeholders.
• Software Engineer - Responsible for software implementation of the CISM system architecture, for implementation of the CISM data interfaces, and for integration of existing and new scientific software into the CISM end-to-end processing systems.
System Architecture
CISM Modular Processing Architecture
[Figure: the four processing stages, labeled with the software used for each]
• Define Rupture Catalog: RSQSim, Long-Period Earthquake Simulations
• Assign Rupture Probabilities: ETAS Probabilities
• Calculate Rupture Ground Motions: CyberShake, 3D Wave Propagation Simulations
• Forecast Future Ground Motions: OpenSHA, Combine Amplitudes into Forecast
CISM Software Combines CSEP and CyberShake Capabilities
CyberShake workflows
[Figure: CyberShake workflow diagram]
• Workflows: Tensor Workflow, Post-Processing Workflow, Data Products Workflow
• Steps: Mesh generation, Tensor simulation, Tensor extraction, Seismogram synthesis, PSA, DBInsert, HazardCurve
• Job counts shown: 1 job, 2 jobs, 7,000 jobs, 415,000 jobs
[Figure: forecast evaluation flow. Labels: Earthquake Catalog, Retrieve Data, Filter Catalog, Filtered Earthquake Catalog, Forecast EQs, Earthquake Forecast, Evaluate Forecast, Evaluation of Earthquake Predictions]
Conceptual CSEP Processing Model For Seismicity Based Forecasts
CSEP Collaboratory
[Figure: Earthquake Catalog feeding the CSEP Software]
CSEP Software steps:
• Retrieve data on a daily basis
• Prepare data sets
• Prepare for testing
• Test
• Publish results
SCEC: An NSF + USGS Research Center
Benefits of Scientific Workflows (from the point of view of an application scientist)
• Conducts a series of computational tasks.
– Resources distributed across Internet.
• Chaining (outputs become inputs) replaces manual hand-offs.
– Accelerated creation of products.
• Ease of use - gives non-developers access to sophisticated codes.
– Avoids need to download-install-learn how to use someone else's code.
• Provides framework to host or assemble community set of applications.
– Honors original codes. Allows for heterogeneous coding styles.
• Framework to define common formats or standards when useful.
– Promotes exchange of data, products, codes. Community metadata.
• Multi-disciplinary workflows can promote even broader collaborations.
– E.g., ground motions fed into simulation of building shaking.
• Certain rules or guidelines make it easier to add a code into a workflow.
SCEC/CME HPC Allocation Growth
SCEC Pursuing Leadership Class Computer Systems
[Figure: computing pyramid. Tiers: 100 TF systems, 10's of projects; 10's of 10 TF systems, 1,000's of users; 100's of 1 TF systems, 10,000's of users; GigaFLOPS, millions of users. Tier labels: Workstations, Departmental HPC, HPC Centers]
Key function of the NSF Supercomputer Centers: Provide facilities over and above what can be found in the typical campus/lab environment.
[Figure: Scientific Computing. Axes: Compute (more FLOPS) and Data (more BYTES). Regions: Home, Lab, Campus, Desktop; Traditional HPC environment; Data-oriented Science and Engineering Environment]
CyberShake Estimates
G4 CyberShake PSHA Jordan 1.0Hz CyberShake Hazard map at 1.0Hz 500m/s Min Vs, output 3 components using 10 billion elements, 40k timesteps
AWP-ODC-GPU
300 0.33 99.00 10.00
SGT data: 878.90625
RSQSim Estimates
Intensity-Measure Relationship (IMR) (Figure 3)
• Inputs: IMT, IML(s); Site(s); Rupture
• Provides: List of Supported IMTs; List of Site-Related Ind. Params
Two types of IMRs (subclasses):
• Attenuation Relationship: Here, the possible IMLs for a given Site and Earthquake Rupture are assumed to exhibit a Gaussian distribution. Thus, in addition to reporting Prob(IMT>IML), this class can also give the predicted mean IML and the standard deviation. These models are usually constructed by regression of observed IMLs onto some functional form.
• Waveform-Simulation Based IMR: Given an arbitrary Site and Earthquake Rupture, a suite of “viable” synthetic seismograms is computed using Pathway 2, where the range of synthetics reflects uncertainties in the modeling process. The IMLs computed from the suite are then used to compute the probability of exceeding the specified IML.
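The two ways of computing an exceedance probability described on this slide can be sketched as follows. This is not OpenSHA code: the function names and all numeric values are hypothetical, the Gaussian case takes the mean and standard deviation in the same units as the IML, and the simulation case simply counts the fraction of synthetic IMLs above the threshold.

```python
from statistics import NormalDist

def attenuation_exceedance(mean_iml, std_iml, iml):
    # Attenuation-relationship style: IMLs are assumed Gaussian, so the
    # exceedance probability is one minus the Gaussian CDF at the threshold.
    return 1.0 - NormalDist(mean_iml, std_iml).cdf(iml)

def simulation_exceedance(simulated_imls, iml):
    # Waveform-simulation style: use the suite of IMLs computed from
    # synthetic seismograms and count how many exceed the threshold.
    exceeding = sum(1 for value in simulated_imls if value > iml)
    return exceeding / len(simulated_imls)

# Hypothetical values, in g.
p_gaussian = attenuation_exceedance(mean_iml=0.20, std_iml=0.05, iml=0.20)
p_simulated = simulation_exceedance([0.10, 0.20, 0.30, 0.40], iml=0.25)
```

Both functions answer the same question, Prob(IMT>IML) for one Site and Rupture, so a hazard calculation can call either one through a common interface.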
SCEC/CME OpenSHA Conceptual Model
Seismic Hazard Calculation
[Figure: components of the calculation]
• Intensity Measure Type & Level (IMT & IML)
• Intensity-Measure Relationship: List of Supported Intensity-Measure Types; List of Site-Related Independent Parameters
• Earthquake-Rupture Forecast: List of Adjustable Parameters; Time Span
• Site: Location; List of Site-Related Parameters
Computing Estimates
80M Core Hours/Year x 0.0280M Total
• 10M Year RSQSim 20M
• 1Hz CyberShake regional Map 50M
• Post-processing Local
Key Databases
• Simulation History
• Rupture Lists
• Rupture Variation Lists
• Seismograms and Amplitudes
• Seismic Hazard Curves at different periods
• Ground Motion Forecasts
Existing Data Exchange Formats
• OpenSHA: Earthquake Rupture Catalog
• RSQSim: Rupture Exchange Format
• CyberShake: Standard Rupture Format
• Ground Motion Forecast:
Standardized Rupture Description Development
Standardized Rupture Description (Robert Graves) supports exchange of ruptures between Pathways 1 and 2 and between Pathway 2 codes.
1.0
PLANE 1
-118.1020 33.9670 46 27 46.00 27.00
289 27 5.00 -10.00 20.25
POINTS 1242
-117.8700 33.9049 5.2270 289 27 1.00000e+10 8.5146 1.00000e-01
90 6.51 25 0.00 0 0.00 0
0.00000e+00 8.35041e-01 1.67008e+00 6.81352e-01
6.48907e-01 6.16462e-01 5.84016e-01 5.51571e-01 5.19126e-01 4.86680e-01
4.54235e-01 4.21790e-01 3.89344e-01 3.56899e-01 3.24454e-01 2.92008e-01
2.59563e-01 2.27117e-01 1.94672e-01 1.62227e-01 1.29781e-01 9.73361e-02
6.48907e-02 3.24454e-02 0.00000e+00
-117.8802 33.9078 5.2270 289 27 1.00000e+10 8.3068 1.00000e-01
90 19.61 25 0.00 0 0.00 0
0.00000e+00 8.35041e-01 1.67008e+00 6.81352e-01
6.48907e-01 6.16462e-01 5.84016e-01 5.51571e-01 5.19126e-01 4.86680e-01
4.54235e-01 4.21790e-01 3.89344e-01 3.56899e-01 3.24454e-01 2.92008e-01
2.59563e-01 2.27117e-01 1.94672e-01 1.62227e-01 1.29781e-01 9.73361e-02
6.48907e-02 3.24454e-02 0.00000e+00
The testing area is separated into cells (grid-based models)
A bin defines a volume (cell), magnitude range, and range of focal mechanism angles for which a forecast is issued
The default binning:
• Lon/Lat: 0.1°x0.1°
• Depth: 0-30 km
• Magnitude: 0.1
• Focal Mech.: None (30°)
0.1°x0.1° Cells
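As an illustration of the default binning, the sketch below maps one earthquake to a (longitude, latitude, magnitude) bin index. It is not CSEP code: the function name, the region origin, and the minimum magnitude are hypothetical, only lower bounds of the testing area are checked, and focal mechanism is ignored because the default binning does not use it.

```python
import math

CELL_DEG = 0.1                   # default cell size in longitude and latitude
MAG_BIN = 0.1                    # default magnitude bin width
DEPTH_RANGE_KM = (0.0, 30.0)     # default depth range

def bin_index(lon, lat, depth_km, magnitude, lon_min, lat_min, mag_min):
    """Return (lon_index, lat_index, mag_index), or None if outside the bins.

    lon_min, lat_min, and mag_min are hypothetical lower edges of the
    testing area and magnitude range.
    """
    if not (DEPTH_RANGE_KM[0] <= depth_km <= DEPTH_RANGE_KM[1]):
        return None
    if lon < lon_min or lat < lat_min or magnitude < mag_min:
        return None

    def index(value, origin, width):
        # Round before flooring to absorb floating-point noise near bin edges.
        return math.floor(round((value - origin) / width, 9))

    return (
        index(lon, lon_min, CELL_DEG),
        index(lat, lat_min, CELL_DEG),
        index(magnitude, mag_min, MAG_BIN),
    )

# Hypothetical event and region origin.
event_bin = bin_index(-117.85, 33.97, 10.0, 5.23, lon_min=-125.0, lat_min=31.0, mag_min=4.95)
too_deep = bin_index(-117.85, 33.97, 40.0, 5.23, lon_min=-125.0, lat_min=31.0, mag_min=4.95)
```

A forecast then assigns one expected rate to each such bin, and observed earthquakes are counted per bin in the same way for testing.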
Identification of Key Interfaces
• UCERF3 -> OpenSHA
• OpenSHA -> CyberShake
• CyberShake -> OpenSHA
• OpenSHA -> CSEP
• RSQSim -> OpenSHA
• Users -> RSQSim
• CISM -> Users
• CISM - CSEP
CSEP Objectives & Design
1. Establish rigorous procedures for registering and evaluating prediction experiments
2. Construct community standards and protocols for comparative testing of predictions
3. Develop an infrastructure that allows groups of researchers to participate in prediction experiments
4. Provide access to authorized data sets and monitoring products for calibrating and testing prediction algorithms
5. Accommodate experiments involving fault systems in different geographic and tectonic environments
SCEC Computational Platform Concept
• The Computational Platform Concept emerged from the following observations:
– Using cyberinfrastructure in large-scale research quickly identifies which technologies are ready for application and which are still research.
– A significant portion of the work involved in a large research study is the vertical integration of the cyberinfrastructure used. It is desirable to preserve this integration once achieved.
– Large-scale research computing needs geoscientists and computer scientists working together.
SCEC Computational Platform Concept
• Definition of Computational Platform
– A vertically integrated collection of hardware, software, and people that provides a broadly useful research capability
• Implied capabilities
– Validated simulation software and geophysical models
– Re-usable simulation capabilities
– Imports parameters from other systems. Exports results to other systems
– IT/geoscience collaboration involved in operation
– Access to high-performance hardware and large-scale data and metadata management
– May use workflow management tools
Platform Maturity Levels
[Figure: users, requirements, and maturity levels for computational research systems]
SCEC Computational Research Users:
• Public and Governmental Forecasts
• Engineering and Interdisciplinary Research
• Collaborative Research Project
• Individual Research Project
Scientific and Engineering Requirements for Computational Research Systems:
• Computational codes, structural models, and simulation results versioned with associated tests.
• Development of new computational, data, and physical models.
• Automated retrospective testing of forecast models using community defined validation problems.
• Automated prospective performance evaluation of forecast models over time within collaborative forecast testing center.
Maturity levels:
• Quantitatively Managed
• Defined
• Managed
• Initial Activity
Our NSF awards require a “standard open-source license” and have approved the Open Source Initiative (http://opensource.org) requirements as the standard. The Open Source Initiative requires that the distribution terms of open-source software comply with the following criteria:
1. Free Redistribution: The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
2. Source Code: The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.
3. Derived Works: The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.
4. Integrity of The Author's Source Code: The license may restrict source code from being distributed in modified form only if the license allows the distribution of "patch files" with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.
5. No Discrimination Against Persons or Groups: The license must not discriminate against any person or group of persons.
6. No Discrimination Against Fields of Endeavor: The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.
7. Distribution of License: The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.
8. License Must Not Be Specific to a Product: The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.
9. License Must Not Restrict Other Software: The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.
10. License Must Be Technology-Neutral: No provision of the license may be predicated on any individual technology or style of interface.