TeraGrid Gateway User Concept – Supporting Users
V. E. Lynch, M. L. Chen, J. W. Cobb, J. A. Kohl, S. D. Miller, S. S. Vazhkudai
Oak Ridge National Laboratory
In collaboration with many teams:
NSTG, SNS Scientific Computing, McStas group, Open Science Grid, Tech-X Corp, and the TeraGrid Partners teams.
What is a Science Gateway?
A Science Gateway:
• Enables scientific communities of users with a common scientific goal
• Uses high performance computing
• Has a common interface
• Leverages community investment
Three common forms:
• Web-based portals
• Application programs running on users' machines but accessing services in TeraGrid
• Coordinated access points enabling users to move seamlessly between TeraGrid and other grids
How can a Gateway help?
Make science more productive:
• Researchers use same tools
• Complex workflows
• Common data formats
• Data sharing
Bring TeraGrid capabilities to the broad science community:
• Lots of disk space
• Lots of compute resources
• Powerful analysis capabilities
• A nice interface to information
What is the TeraGrid?
[Figure: map of TeraGrid sites: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC/ISI, UNC/RENCI, UW, LONI, NICS. Legend: Resource Provider (RP), Software Integration Partner, Grid Infrastructure Group (GIG, UChicago).]
• 20 computers at 11 facilities
• 10 Gbps network
• Over a petaflop of computing power
• 136,470 CPU cores
• 60 petabytes of long-term storage
• Growing
Neutron Science TeraGrid Gateway
Focus is neutron science
Connects facilities with cyberinfrastructure
Bridges cyberinfrastructure
Combines TeraGrid computational resources with neutron datasets
Data movement across TeraGrid
Outreach to neutron science
Community Certificate and Account
Gateways with community accounts scale to thousands of facility users
Have Jimmy Neutron community accounts on 14 TeraGrid computers
Use Jimmy Neutron Community Certificate from SNS community account
Record end-user identification for auditing and return of results
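The auditing idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not the gateway's actual implementation: `AuditLog`, `AuditRecord`, the field names, and the `COMMUNITY_ACCOUNT` placeholder are all hypothetical. It shows only the bookkeeping: every job runs under one shared community account, so the portal must keep its own record of which end user each job belongs to.

```python
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical placeholder for the shared login name used on the remote machines.
COMMUNITY_ACCOUNT = "community"


@dataclass
class AuditRecord:
    portal_user: str        # end user authenticated by the portal
    community_account: str  # shared account the job actually ran under
    resource: str           # remote computer the job was sent to
    job_id: str             # identifier returned by the remote scheduler
    submitted_at: float     # submission time, seconds since the epoch


class AuditLog:
    """In-memory map from (resource, job_id) to the end user who owns the job."""

    def __init__(self):
        self._records = {}

    def record_submission(self, portal_user, resource, job_id, now=None):
        record = AuditRecord(
            portal_user=portal_user,
            community_account=COMMUNITY_ACCOUNT,
            resource=resource,
            job_id=job_id,
            submitted_at=time.time() if now is None else now,
        )
        self._records[(resource, job_id)] = record
        return record

    def owner_of(self, resource, job_id):
        # Used both for auditing and for routing results back to the right user.
        return self._records[(resource, job_id)].portal_user

    def to_json(self):
        return json.dumps([asdict(r) for r in self._records.values()], indent=2)
```

A portal would call `record_submission` when it submits a job and `owner_of` when results come back. A real deployment would persist the log rather than hold it in memory.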
Community Account
Before Gateways
Large facilities:
• Recorded histogram data from experiments
Users:
• Took their data home on floppy disk in pocket
• Saved permanent copy on hard disk
• Did not have event data to change histogram
• Translated data into format needed for analysis
• Wrote their own code to read and analyze data
• Installed discipline-focused software on their PC for analysis
• Installed plotting programs/libraries to plot analysis output
After Gateways
Large facilities:
• Record data from multiple facilities: SNS, HFIR, LENS, IPNS, LUJAN, …
• Save permanent copy of raw data
• Bin event data into histogram
• Translate data into standard NeXus format
• Have analysis and simulation programs available from portal
• Use remote TeraGrid cyberinfrastructure for computations
• Have visualization capability in portal
Users:
• Use portal from web for all data, analysis, and visualization
Gateway Savings
Users do not duplicate efforts
Facilities do not duplicate efforts
Data is not lost
Data is easily shared
Natural way to integrate community-contributed, instrument-specific software, host it, and make it widely available to many facility users
Analysis is done quickly on high performance computers
Portal submit to TeraGrid
Simulation Service
• Simulation of neutron instrument is available in portal with McStas
• Simulations agree with experimental results
• Linear scaling to 1024 cores
• Output is NeXus
• Use cases:
- Instrument design and construction
- Experiment planning and analysis
Fitting Service
• Fits theoretical models to the NeXus data files from the experiments
• Adaptive nonlinear least squares algorithm implemented in parallel
• Linear speedup to 32 cores
• Service to run on TeraGrid
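To illustrate what a nonlinear least squares fit involves, here is a small serial sketch in Python. It is not the Fitting Service's algorithm: it uses a generic Levenberg-Marquardt iteration, and the exponential-decay `model`, the function name `fit_exponential`, and all default parameters are assumptions chosen for illustration. Reading NeXus files and the parallel implementation are outside its scope.

```python
import math


def model(x, a, b):
    """Hypothetical two-parameter model: exponential decay a * exp(-b * x)."""
    return a * math.exp(-b * x)


def fit_exponential(xs, ys, a=1.0, b=1.0, lam=1e-3, max_iter=200, tol=1e-12):
    """Fit (a, b) by Levenberg-Marquardt; returns (a, b, sum of squared residuals)."""

    def sse(pa, pb):
        return sum((y - model(x, pa, pb)) ** 2 for x, y in zip(xs, ys))

    cost = sse(a, b)
    for _ in range(max_iter):
        # Accumulate the normal equations J^T J and J^T r at the current point.
        jaa = jab = jbb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-b * x)
            r = y - a * e          # residual
            da = e                 # d(model)/da
            db = -a * x * e        # d(model)/db
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r

        # Damped 2x2 system: (J^T J + lam * diag(J^T J)) * step = J^T r.
        m11 = jaa * (1.0 + lam)
        m22 = jbb * (1.0 + lam)
        det = m11 * m22 - jab * jab
        if det == 0.0 or lam > 1e12:
            break
        step_a = (ga * m22 - jab * gb) / det
        step_b = (m11 * gb - jab * ga) / det

        new_cost = sse(a + step_a, b + step_b)
        if new_cost < cost:
            # Accept the step and trust the quadratic model more next time.
            improvement = cost - new_cost
            a, b, cost = a + step_a, b + step_b, new_cost
            lam = max(lam / 10.0, 1e-12)
            if improvement < tol:
                break
        else:
            # Reject the step and increase the damping.
            lam *= 10.0
    return a, b, cost
```

The damping factor `lam` is raised when a trial step makes the fit worse and lowered when it helps, which is what keeps the iteration stable from a poor starting guess. The two loops over the data points are the natural candidates for distributing across cores.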
Reduction Service
• Reduction software is available for backscattering and reflectometry through the portal
• Calculations will be sent to a local cluster and TeraGrid
• Attempted to parallelize this calculation by distributing regions of the time-of-flight to each processor
• Each processor reads only its region of the NeXus input data file and writes a new file containing only that region
• Each processor performs the data reduction on its file
• The results are merged at the end of the calculation
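The split, reduce, and merge pattern described above can be sketched as follows. This is an illustration under several assumptions: plain Python lists stand in for the time-of-flight axis of a NeXus file, worker threads stand in for processors, and `reduce_region` is a placeholder that merely divides counts by a monitor value. The function names are hypothetical and the actual reduction software is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def split_regions(n_bins, n_workers):
    """Return contiguous (start, stop) index ranges that together cover n_bins."""
    base, extra = divmod(n_bins, n_workers)
    regions = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional bin each.
        stop = start + base + (1 if worker < extra else 0)
        if stop > start:
            regions.append((start, stop))
        start = stop
    return regions


def reduce_region(counts, monitor, region):
    """Placeholder reduction: normalize counts by monitor within one region."""
    start, stop = region
    return [c / m if m else 0.0
            for c, m in zip(counts[start:stop], monitor[start:stop])]


def parallel_reduce(counts, monitor, n_workers):
    """Reduce each time-of-flight region independently, then merge in order."""
    regions = split_regions(len(counts), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of `regions`, so the merge is a concatenation.
        parts = list(pool.map(lambda r: reduce_region(counts, monitor, r), regions))
    merged = []
    for part in parts:
        merged.extend(part)
    return merged
```

The approach works only when each region can be reduced without data from its neighbours, as in this per-bin placeholder. A reduction that couples neighbouring bins would need overlapping regions or a different decomposition.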
Reflectometry Data Reduction
Job Information Service
• Portal Job Information Service tells where a job is running, when it started, and its status
• Daily tests submit five simultaneous remote portal jobs
• Success rate is above 82%
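A success percentage of this kind is simply successful jobs divided by submitted jobs. The sketch below assumes each daily test is recorded as a list of five pass/fail values. The function name and the sample outcomes are hypothetical and are not the measurements behind the figure quoted above.

```python
def success_percentage(daily_outcomes):
    """Percentage of successful jobs over all daily test runs.

    daily_outcomes: list of days, each a list of booleans (True = job succeeded).
    """
    total = sum(len(day) for day in daily_outcomes)
    if total == 0:
        raise ValueError("no test jobs recorded")
    succeeded = sum(1 for day in daily_outcomes for ok in day if ok)
    return 100.0 * succeeded / total


# Hypothetical sample: two days of five simultaneous jobs, one failure overall.
sample = [
    [True, True, True, True, True],
    [True, True, True, True, False],
]
rate = success_percentage(sample)
```

Counting individual jobs rather than whole days means a day with one failed job out of five still contributes four successes.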
Tests of Remote Job Submission
• Difficult to diagnose the problem from the Globus output.
- Check the status of the computer
- Look at the output files
• Some problems diagnosed:
- Updated software on a computer that required relinked executables
- Globus software setting the wrong time limit
- Batch prologue script that killed jobs on same core
- Long queue waits
- Firewall installed
Conclusions
Gateways help facilities scale to a large number of users
Gateways give facilities access to high performance computing such as the TeraGrid
Gateways enable a scientific community to use community software through a common interface
Researchers are more productive if they use the same tools, use a common data format, and share data easily