Astronomy Applications in the TeraGrid Environment
Roy Williams, Caltech
with thanks for material to: Sandra Bittner, ANL; Sharon Brunett, Caltech; Derek Simmel, PSC; John Towns, NCSA; Nancy Wilkins-Diehr, SDSC
NVO Summer School Sept 2004
The TeraGrid Vision
• Distributing the resources is better than putting them at one site
• Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications: new hardware, new networks, new software, new practices, new policies
• Expand the centers to support cyberinfrastructure: a distributed, coordinated operations center
• Exploit unique partner expertise and resources to make the whole greater than the sum of its parts
• Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization: run a single job across the entire TeraGrid; move executables between sites
What is the Grid, Really?
It is:
• A set of powerful Beowulf clusters
• Lots of disk storage
• Fast interconnects
• Unified account management
• Interesting software
The Grid is not:
• Magic
• Infinite
• Simple
• A universal panacea
• The hype that you have read
Grid as Federation
TeraGrid is a federation, like a union of states:
• independent centers bring flexibility
• a unified interface brings power and strength
• a large-state/small-state compromise
TeraGrid Wide Area Network
Grid Astronomy
Quasar Science
An NVO-TeraGrid project: Penn State, CMU, Caltech
• 60,000 quasar spectra from the Sloan Sky Survey
• Each is 1 CPU-hour: submit to the grid queue
• Fits a complex model (173 parameters)
• Derives black hole mass from line widths
[Diagram: NVO data services feed a manager, which submits jobs via globusrun to the clusters]
N-point Galaxy Correlation
An NVO-TeraGrid project: Pitt, CMU
• Finding the triple correlation in the 3D SDSS galaxy catalog (RA/Dec/z)
• Lots of large parallel jobs
• kd-tree algorithms (see the sketch below)
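As a flavor of what kd-tree pair counting looks like, here is a minimal modern-Python sketch using scipy's cKDTree; the random catalog and radial bins are illustrative, not the project's actual code:

import numpy as np
from scipy.spatial import cKDTree

# placeholder catalog: (N, 3) comoving positions derived from RA/Dec/z
positions = np.random.rand(10000, 3)

tree = cKDTree(positions)
radii = np.array([0.01, 0.02, 0.05, 0.10])

# count_neighbors gives cumulative pair counts within each radius;
# differencing yields counts per radial bin, the raw ingredient of the
# 2-point function (the 3-point analogue walks triples of tree nodes)
cumulative = tree.count_neighbors(tree, radii)
print(np.diff(cumulative))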
Palomar-Quest Survey
Caltech, NCSA, Yale
• Transient pipeline: a computing reservation at sunrise for immediate follow-up of transients (ALERT)
• Synoptic survey: massive resampling (Atlasmaker) for ultrafaint detection
• NCSA, Caltech, and Yale run different pipelines on the same data
• 50 GB/night, 5 TB total
[Diagram: data from the P48 telescope flows over the TeraGrid network to the Caltech, Yale, and NCSA pipelines]
Transient from PQ (from the catalog pipeline)
PQ stacked images (from the image pipeline)
Wide-area Mosaicking (Hyperatlas)
An NVO-TeraGrid project: Caltech
• High quality: flux-preserving, spatially accurate
• Stackable: Hyperatlas
• Edge-free: pyramid weighting
• Mining AND outreach: DPOSS 15°, Griffith Observatory "Big Picture"
2MASS Mosaicking Portal
An NVO-TeraGrid project: Caltech IPAC
TeraGrid Hardware
TeraGrid Components
• Compute hardware: Intel/Linux clusters, Alpha SMP clusters, POWER4 cluster, …
• Large-scale storage systems: hundreds of terabytes for secondary storage
• Very high-speed network backbone: bandwidth for rich interaction and tight coupling
• Grid middleware: Globus, data management, …
• Next-generation applications
Overview of Distributed TeraGrid Resources
[Diagram: external networks link the four sites, each with its own resources and archival storage (HPSS, UniTree): NCSA/PACI 10.3 TF, 240 TB; SDSC 4.1 TF, 225 TB; Caltech; Argonne]
Compute Resources – NCSA: 2.6 TF → ~10.6 TF, with 230 TB
• 2.6 TF Madison: 256 nodes (2p Madison, 4 GB memory, 2x73 GB disk each)
• 8 TF Madison: 667 nodes (2p Madison, 4 GB memory, 2x73 GB disk each)
• Existing nodes: 2p 1.3 GHz, 4 or 12 GB memory, 73 GB scratch
• Interactive + spare nodes: 8 4p Madison nodes (login, FTP)
• Storage I/O over Myrinet and/or GbE: 250 MB/s/node × 670 nodes and 250 MB/s/node × 256 nodes
• 230 TB storage behind Brocade 12000 switches (256 2x FC, 92 2x FC)
• GbE fabric + Myrinet fabric; 30 Gbps to the TeraGrid network
Compute Resources – SDSC: 1.3 TF → ~4.3 + 1.1 TF, with 500 TB
• 1.3 TF Madison: 128 nodes (2p Madison, 4 GB memory, 2x73 GB disk each)
• 3 TF Madison: 256 nodes (2p Madison, 4 GB memory, 2x73 GB disk each)
• Existing nodes: 2p 1.3 GHz, 4 GB memory, 73 GB scratch
• Interactive + spare nodes: 6 4p Madison nodes (login, FTP)
• 500 TB storage behind Brocade 12000 switches (128 and 256 2x FC); 128 × 250 MB/s links
• GbE fabric + Myrinet fabric; 30 Gbps to the TeraGrid network
Compute Resources – Caltech: ~100 GF, with 100 TB
• 34 GF Madison: 17 HP/Intel nodes (2p Madison, 6 GB memory, 73 GB scratch each)
• 72 GF Madison: 36 IBM/Intel nodes (2p Madison, 6 GB memory, 2x73 GB disk each)
• Interactive node: 2p IBM Madison (login, FTP)
• HPSS: 13 tape drives, 1.2 PB raw silo capacity (13 2xFC)
• Datawulf: 6 Opteron nodes (4p Opteron, 8 GB memory), 66 TB RAID5
• 33 IA32 storage nodes (2p ia32, 6 GB memory), 100 TB /pvfs
• 17, 36, and 33 × 250 MB/s links; GbE fabric + Myrinet fabric; 30 Gbps to the TeraGrid network
Using TeraGrid
Wide Variety of Usage Scenarios
• Tightly coupled jobs storing vast amounts of data, performing visualization remotely, and making data available through online collections (ENZO)
• Thousands of independent jobs using data from a distributed data collection (NVO)
• Science Gateways: "not a Unix prompt"! From a web browser, with security, or from an application, e.g. IRAF or IDL
Traditional Parallel Processing
A single executable to be run on a single remote machine. Big assumption: runtime necessities (e.g. executables, input files, shared objects) are available on the remote system!
Log in to a head node and choose a submission mechanism:
• Direct, interactive execution: mpirun -np 16 ./a.out
• Through a batch job manager: qsub my_script, where my_script describes the executable location, runtime duration, redirection of stdout/err, mpirun specification, …
Traditional Parallel Processing II
• Through Globus: globusrun -r [some-teragrid-head-node].teragrid.org/jobmanager -f my_rsl_script, where my_rsl_script describes the same details as in qsub's my_script!
• Through Condor-G: condor_submit my_condor_script, where my_condor_script describes the same details as the Globus my_rsl_script! (A sample RSL script follows.)
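For concreteness, a minimal RSL script of the kind globusrun consumes might look like this; the attribute values are illustrative, not a tested TeraGrid configuration:

&(executable=/home/roy/a.out)
 (count=16)
 (jobtype=mpi)
 (maxtime=60)
 (stdout=my.out)
 (stderr=my.err)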
Distributed Parallel Processing
• Decompose the application over geographically distributed resources: functional or domain decomposition fits well
• Take advantage of load-balancing opportunities, but think about the latency impact
• Improved utilization of many resources
• Flexible job management
Pipelined/Dataflow Processing
Suited for problems that divide into a series of sequential tasks, where:
• multiple instances of the problem need executing
• a series of data needs processing, with multiple operations on each item
• information from one processing phase can be passed to the next phase before the current phase is complete
Security
ssh with password:
• Too much password-typing
• Not very secure: a big break-in hit TG in April 2004
• One failure is a big failure: all of TG!
• Caltech and Argonne no longer allow it; SDSC does not allow password change
Security
ssh with public key: single sign-on!
• Use ssh-keygen on Unix or PuTTYgen on Windows
• You get a public key file (e.g. id_rsa.pub) AND a private key file (e.g. id_rsa) AND a passphrase
• On the remote machine, put the public key in .ssh/authorized_keys
• On the local machine, combine the private key and passphrase (the ATM card model: something you have plus something you know)
• On TG, you can put your public key on the application form: immediate login, no snailmail
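A minimal key-setup session might look like this; the hostname and account are illustrative:

% ssh-keygen -t rsa            # writes ~/.ssh/id_rsa and id_rsa.pub; choose a passphrase
% scp ~/.ssh/id_rsa.pub roy@tg-login.sdsc.teragrid.org:
% ssh roy@tg-login.sdsc.teragrid.org "cat id_rsa.pub >> .ssh/authorized_keys"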
Security
X.509 certificates: single sign-on!
Issued by a Certificate Authority (e.g. Verisign, US Navy, DOE, etc.). A certificate is:
• a Distinguished Name (DN), e.g. /C=US/O=National Center for Supercomputing Applications/CN=Roy Williams
• AND a private file (usercert.p12) AND a passphrase
The remote machine needs an entry in its gridmap file (which maps DN to account): use the gx-map command
• You can create a certificate with ncsa-cert-request etc.
• Certificates can be lodged in a web browser
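In practice you create a short-lived proxy from the certificate before running grid commands; a typical Globus Toolkit 2 session looks roughly like this (output abbreviated):

% grid-proxy-init
Your identity: /C=US/O=National Center for Supercomputing Applications/CN=Roy Williams
Enter GRID pass phrase for this identity: ********
Creating proxy .................. Done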
3 Ways to Submit a Job
1. Directly to the PBS batch scheduler: simple; scripts are portable among PBS TeraGrid clusters
2. Globus: common batch script syntax; scripts are portable among other grids using Globus
3. Condor-G: a nice interface atop Globus, with monitoring of all jobs submitted via Condor-G, and higher-level tools like DAGMan
PBS Batch Submission
• ssh tg-login.[caltech|ncsa|sdsc|uc].teragrid.org
• qsub flatten.sh -v "FILE=f544"
• qstat or showq
• ls *.dat, and look at the pbs.out, pbs.err files
globus-job-submit
For running batch/offline jobs:
• globus-job-submit: submit a job (same interface as globus-job-run, but returns immediately)
• globus-job-status: check job status
• globus-job-cancel: cancel a job
• globus-job-get-output: get job stdout/err
• globus-job-clean: clean up after a job
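A sketch of the round trip, assuming a PBS jobmanager at SDSC; the contact string and job URL are illustrative:

% globus-job-submit tg-login.sdsc.teragrid.org/jobmanager-pbs /bin/date
https://tg-login.sdsc.teragrid.org:50001/12345/1096300000/
% globus-job-status https://tg-login.sdsc.teragrid.org:50001/12345/1096300000/
DONE
% globus-job-get-output https://tg-login.sdsc.teragrid.org:50001/12345/1096300000/
Mon Sep 27 10:15:00 PDT 2004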
Condor-G Job Submission
[Diagram: Condor-G on a workstation (mickey.disney.edu) uses the Globus API to reach the Globus jobmanager on tg-login.sdsc.teragrid.org, which hands the job to PBS]
The submit file looks like:
    executable = /wd/doit
    universe = globus
    globusscheduler = <…>
    globusrsl = (maxtime=10)
    queue
Condor-G
Combines the strengths of Condor and the Globus Toolkit. Advantages when managing grid jobs:
• full-featured queuing service
• credential management
• fault tolerance
• DAGMan (== pipelines)
Condor DAGMan
• Manages workflow interdependencies
• Each task is a Condor description file
• A DAG file controls the order in which the Condor files are run (example below)
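A minimal DAG file for a diamond-shaped workflow; the file names are illustrative:

# diamond.dag: B and C run after A completes; D runs after both
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D

Run it with condor_submit_dag diamond.dag; DAGMan then submits each Condor file when its parents have finished.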
Where's the disk?
• Home directory: $TG_CLUSTER_HOME, for example /home/roy
• Shared writeable global areas: $TG_CLUSTER_PFS, for example /pvfs/MCA04N009/roy
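A typical pattern, with illustrative paths: keep source and scripts in home, and stage bulky data into the parallel area:

% cp /home/roy/source/f544.fits $TG_CLUSTER_PFS/source/
% qsub flat.sh -v "FILE=f544"    # the job reads and writes under $TG_CLUSTER_PFS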
GridFTP
Moving a test file (here from a user-started server on port 5678):
% globus-url-copy -s "`grid-cert-info -subject`" \
      gsiftp://localhost:5678/tmp/file1 \
      file:///tmp/file2
Also uberftp and scp.
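The same command also does third-party, site-to-site transfers when both URLs are gsiftp; the hostnames here are illustrative:

% globus-url-copy \
      gsiftp://tg-login.ncsa.teragrid.org/home/roy/file1 \
      gsiftp://tg-login.sdsc.teragrid.org/home/roy/file1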
Storage Resource Broker (SRB)
• A single logical namespace while accessing distributed archival storage resources
• Effectively infinite storage (first to 1 TB wins a t-shirt)
• Data replication
• Parallel transfers
• Interfaces: command line, API, web/portal
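A minimal session with the SRB Scommands; the resource name and file names are illustrative:

% Sinit                          # start a session (reads ~/.srb/.MdasEnv)
% Sput -S hpss-sdsc f544.fits .  # store a file onto the hpss-sdsc resource
% Sls                            # list the current SRB collection
% Sget f544.fits /tmp/           # retrieve a copy
% Sexit                          # end the session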
Storage Resource Broker (SRB): Virtual Resources, Replication
[Diagram: an SRB client (command line or API) on a workstation reaches virtual resources at NCSA and SDSC: hpss-sdsc, sfs-tape-sdsc, hpss-caltech, …]
Allocations Policies
• TG resources are allocated via the PACI allocations and review process, modeled after the NSF process; TG is considered a single resource for grid allocations
• Different levels of review for different allocation sizes:
  DAC: up to 10,000 SUs (minimal review, fast turnaround)
  PRAC/AAB: < 200,000 SUs/year
  NRAC: 200,000+ SUs/year
• Policies/procedures posted at http://www.paci.org/Allocations.html
• Proposal submission through the PACI On-Line Proposal System (POPS): https://pops-submit.paci.org/
Requesting a TeraGrid Allocation
htt
p:/
/ww
w.p
aci
.org
NVO Summer School Sept 2004
24/7 Consulting Support
• [email protected]: advanced ticketing system for cross-site support, staffed 24/7
• 866-336-2357, 9-5 Pacific Time
• http://news.teragrid.org/
• Extensive experience solving problems for early-access users: networking, compute resources, extensible TeraGrid resources
Links
• www.teragrid.org/userinfo: getting an account
• [email protected]
• news.teragrid.org: site monitors
Demo
Data-intensive computing with NVO services
DPOSS Flattening
• 2650 × 1.1 GB files
• Cropping borders
• Quadratic fit and subtract (see the sketch below)
• Virtual data
[Images: source and target plates]
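The heart of the "quadratic fit and subtract" step, sketched with modern numpy (the production pipeline is the compiled flat program shown in the PBS script below):

import numpy as np

def flatten(image):
    # fit a 2D quadratic surface to the image and subtract it
    ny, nx = image.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    x = xx.ravel().astype(float)
    y = yy.ravel().astype(float)
    z = image.ravel()
    # design matrix for z ~ a + b*x + c*y + d*x^2 + e*x*y + f*y^2
    A = np.column_stack([np.ones_like(x), x, y, x*x, x*y, y*y])
    coeffs, _, _, _ = np.linalg.lstsq(A, z, rcond=None)
    return image - (A @ coeffs).reshape(ny, nx)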
Driving the Queues

import os

def filetime(path):
    # modification time of a file
    return os.path.getmtime(path)

for f in os.listdir(inputDirectory):
    # if the target exists, with the right size, and is newer than
    # the source, then we keep it
    ifile = inputDirectory + "/" + f
    ofile = outputDirectory + "/" + f
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != 1109404800:
            print " -- wrong target size, remaking", osize
        else:
            time_tgt = filetime(ofile)
            time_src = filetime(ifile)
            if time_tgt < time_src:
                print " -- target older than source, remaking"
            else:
                print " -- already have target file"
                continue
    cmd = "qsub flat.sh -v \"FILE=" + f + "\""
    print " -- submitting batch job:", cmd
    os.system(cmd)
Here is the driver that makes and submits jobs
PBS script
#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00
cd /home/roy/dposs-flat/flat
./flat \
-infile /pvfs/mydata/source/${FILE}.fits \
-outfile /pvfs/mydata/target/${FILE}.fits \
-chop 0 0 1500 23552 \
-chop 0 0 23552 1500 \
-chop 0 22052 23552 23552 \
-chop 22052 0 23552 23552 \
-chop 18052 0 23552 4000
A PBS script. Can do: qsub script.sh -v "FILE=f345"
Atlasmaker
A service-oriented application on TeraGrid
[Diagram: the VO Registry and SIAP services supply federated images (wavelength, time, ...), which are resampled onto Hyperatlas pages; downstream come source detection, average/max, and subtraction]
Hyperatlas
Standard naming for atlases and pages, e.g. TM-5-SIN-20, page 1589:
• Standard scales: scale s means 2^(20-s) arcseconds per pixel, so scale 20 is 1 arcsecond per pixel
• Standard projections: SIN projection, TAN projection
• Standard layouts: TM-5 layout, HV-4 layout
Hyperatlas is a Service
All pages: <baseURL>/getChart?atlas=TM-5-SIN-20
    0     2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0  -90.0
    1     2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0  -85.0
    2     2.77777778E-4  'RA---SIN'  'DEC--SIN'   36.0  -85.0
    ...
    1731  2.77777778E-4  'RA---SIN'  'DEC--SIN'  288.0   85.0
    1732  2.77777778E-4  'RA---SIN'  'DEC--SIN'  324.0   85.0
    1733  2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0   90.0
Best page: <baseURL>/getChart?atlas=TM-5-SIN-20&RA=182&Dec=62
    1604  2.77777778E-4  'RA---SIN'  'DEC--SIN'  184.61538  60.0
Numbered page: <baseURL>/getChart?atlas=TM-5-SIN-20&page=1604
    1604  2.77777778E-4  'RA---SIN'  'DEC--SIN'  184.61538  60.0
Replicated implementations:
    baseURL = http://mercury.cacr.caltech.edu:8080/hyperatlas (try services)
    baseURL = http://virtualsky.org/servlet
GET services from Python

import urllib

hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
    + "&RA=" + str(center1) + "&Dec=" + str(center2)
stream = urllib.urlopen(hyperatlasURL)

# the result is a tab-separated line, so use split() to tokenize
tokens = stream.readline().split('\t')
print "Using page", tokens[0], "of atlas", atlas

self.scale = float(tokens[1])
self.CTYPE1 = tokens[2]
self.CTYPE2 = tokens[3]
rval1 = float(tokens[4])
rval2 = float(tokens[5])

This code uses a service to find the best hyperatlas page for a given sky location
VOTable parser in Python

import urllib
import xml.dom.minidom

stream = urllib.urlopen(SIAP_URL)
doc = xml.dom.minidom.parse(stream)

# Make a dictionary mapping column UCD to column index
col_ucd_dict = {}
col_counter = 0
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_FIELD in XML_TABLE.getElementsByTagName("FIELD"):
        col_ucd = XML_FIELD.getAttribute("ucd")
        col_ucd_dict[col_ucd] = col_counter
        col_counter += 1

urlColumn = col_ucd_dict["VOX:Image_AccessReference"]
formatColumn = col_ucd_dict["VOX:Image_Format"]
raColumn = col_ucd_dict["POS_EQ_RA_MAIN"]
deColumn = col_ucd_dict["POS_EQ_DEC_MAIN"]

From a SIAP URL we get the XML, and extract the columns that hold the image references, image format, and image RA/Dec
(need exception catching here: a missing UCD raises KeyError)
VOTable parser in Python
table=[]
for XML_TABLE in doc.getElementsByTagName("TABLE"):
for XML_DATA in XML_TABLE.getElementsByTagName("DATA"):
for XML_TABLEDATA in XML_DATA.getElementsByTagName("TABLEDATA"):
for XML_TR in XML_TABLEDATA.getElementsByTagName("TR"):
row=[]
for XML_TD in XML_TR.getElementsByTagName("TD"):
data = ""
for child in XML_TD.childNodes:
data += child.data
row.append(data)
table.append(row)
Table is a list of rows, and each row is a list of table cells
SOAP client in Python
from SOAPpy import *
# get fitsheader string as FITS header
# get x1, x2 as coordinates on image
server = SOAPProxy("http://mercury.cacr.caltech.edu:9091")
wcsR = server.xy2sky(fitsheader, x1, x2)
ra = wcsR["c1"]
dec = wcsR["c2"]
status = wcsR["status"]
message = wcsR["message"]
print "Sky coordinates are:", ra, dec
print "status is: ", status
print "Message is: ", message
WCSTools (xy2sky and sky2xy) as web services
Future: Science Gateways
TeraGrid Impediments
• Learn Globus
• Learn MPI
• Learn PBS
• Port code to Itanium
• Get a certificate
• Get logged in
• Wait 3 months for an account
• Write a proposal
... and now do some science....
A Better Way: Graduated Security for Science Gateways
• Web form (anonymous) → some science....
• Register (logging and reporting) → more science....
• Authenticate with X.509, via browser or command line → big-iron computing....
• Write a proposal (own account) → power user
Secure Web Services for TeraGrid Access
Ways in:
• a web form (the browser holds the certificate)
• an auto-generated client API for scripted submission (certificate in .globus/)
• embedded in an existing client application (Root, IRAF, IDL, ...)
• embedded as part of another service (a proxy agent)
Middleware such as Clarens, BOSS, PBS, GridPort, and XForms distributes the jobs on the grid
Secure Web Services for TeraGrid Access
• Shell command
• List files, get files
• Submit a job to the TG queue (Condor / DAGMan / globusrun)
• Monitor running jobs
TeraGrid Wants YOU!
• Your astronomy applications
• Your science gateway projects
• TeraGrid has hundreds of processors and hundreds of terabytes