Interoperability Achieved by GADU in Using Multiple Grids: OSG, TeraGrid and ANL Jazz
Presented by: Dinanath Sulakhe
Mathematics and Computer Science Division, Argonne National Laboratory
Computational Institute, University of Chicago
GADU Applications
It's all about comparative analysis
• Insights into biology are gained by comparative analysis:
  - Unknown genes are compared against known genes.
  - Similar genes tend to perform the same functions.
• Comparative analysis shows what is the same and what is different between two strains of an organism:
  - Example: what distinguishes an organism living at boiling temperatures, such as 108 °C, from one living in extreme freezing conditions?
  - Differences between pathogenic and non-pathogenic organisms: Mycobacterium tuberculosis, the pathogen that causes TB, differs by only 12 genes from the non-pathogenic BCG used as a vaccine against TB.
• Tools: BLAST, Blocks, Chisel, InterPro, etc.
• An embarrassingly parallel workload.
GADU’s evolution
GADU evolved step by step into what it is today:
• Chiba City at Argonne
• Jazz cluster at Argonne
• Grid2003 to OSG
• TeraGrid
• All of them together
Some Results and Highlights
• GADU can successfully use OSG and TeraGrid resources simultaneously.
• Individual clusters such as ANL Jazz are also used in parallel.
• Site selection and scheduling work across multiple grids.
• A new site can easily be added to the pool of sites.
Site Name        Site Test      MaxNodes  Gridcat
ASGC_OSG         18             199       Pass
FNAL_FERMIGRID   12             12        Pass
FNAL_GPFARM      266            749       Pass
GRASE-CCR-U2     114            2112      Pass
Nebraska         FAIL_TIMEOUT   252       Pass
OSG_LIGO_PSU     28             312       Pass
Purdue-ITaP      13             1224      Pass
Purdue-Physics   14             63        Pass
STAR-BNL         FAIL_TIMEOUT   672       Pass
UFlorida-PG      279            268       Pass
UMATLAS          FAIL_TIMEOUT   771       Pass
UTA_DPCC         18             154       Inactive
UWMadisonCMS     FAIL_TIMEOUT   90        Pass
grow-UNI-P       FAIL_TIMEOUT   17        Pass
TG_UC            44             316       NONE
TG_NCSA          55             1000      NONE
TG_PURDUE        FAIL_FTP       1024      NONE
Last Run (last week)
• Ran 38,830 BLAST jobs
• 70% on OSG
• 30% on TeraGrid
Grid Resources: Open Science Grid and TeraGrid
Authentication
OSG
• GADU VOMS server.
• DOE Grid certificates are automatically picked up by the sites.
TeraGrid
• Individual accounts via allocations.
• DOE Grid certificates are added manually to each site (gx-map).
Application Deployment
OSG
• The OSG variables $OSG_APP and $OSG_DATA are used to install GADU’s applications and pre-stage databases such as NR.
TeraGrid
• GADU has a community space available on each of the sites.
• Applications are installed within this community space.
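As a rough illustration of the deployment convention above, the sketch below resolves an install directory from a per-grid environment variable. $OSG_APP and $TG_COMMUNITY are named in the slides; the mapping table, the function name `gadu_app_dir` and the `gadu` subdirectory are assumptions made for this example, not GADU's actual layout.

```python
import os

# Hypothetical mapping from grid type to the environment variable that names
# the shared application area on a site. The variable names come from the
# slides; the "gadu" subdirectory below is an illustrative assumption.
APP_ROOT_VARS = {"osg": "OSG_APP", "teragrid": "TG_COMMUNITY"}


def gadu_app_dir(grid, env=None):
    """Return the directory where GADU tools would be installed on a site."""
    env = os.environ if env is None else env
    var = APP_ROOT_VARS[grid.lower()]
    root = env.get(var)
    if not root:
        raise KeyError(f"{var} is not set on this site")
    return root.rstrip("/") + "/gadu"
```

Passing the environment as a dictionary keeps the lookup testable without touching the real site environment.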
Resource Independent GADU
GADU uses Pegasus-based VDS and Condor-G
[Figure: architecture diagram. GADU’s automated analysis server (database, controller, query interface) expresses, executes and tracks the scientific workflows on the Grid. Submit host: abstract workflow as VDL, Pegasus, tc.data, pool.config, Condor submit files, DAGMan, Condor-G, Globus GRAM interface. Remote resources: gatekeeper and job manager, job management system, worker nodes (WN). Information services.]
Resource Independent GADU
GADU uses Pegasus-based VDS and Condor-G
• The workflow generator in GADU is responsible for producing a workflow suitable for execution in the Grid environment. It does this using the “virtual data language” (VDL).
• Once the VDL for the workflow is written, VDS converts it into Condor submit files and a DAG that can be submitted to the site chosen by the site selector.
TR FileBreaker( input filename, none nodes, output sequences[], none species ) {
    argument = ${species};
    argument = ${filename};
    argument = ${nodes};
    profile globus.maxwalltime = "300";
}
TR BLAST( none OutPre, none evalue, input query[], none type ) {
    argument = ${OutPre};
    argument = ${evalue};
    profile globus.maxwalltime = "300";
}
DV jobNo_1_1separator->FileBreaker(
    filename=@{input:"inputfile.1"|rt},
    nodes="5",
    sequences=[ @{output:"job1.0":"tmp"},
                @{output:"job1.1":"tmp"},
                @{output:"job1.2":"tmp"},
                @{output:"job1.3":"tmp"},
                @{output:"job1.4":"tmp"} ],
    species="Aeropyrum_Pernix"
)
….
VDL for BLAST workflow
Resource Independent GADU
Fig. Example of a DAG representing the workflow (labels: 4 million sequences; 1000 sequences).
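The figure labels suggest that a large input (4 million sequences) is broken into chunks of 1000 sequences that are analysed independently. A minimal sketch of that fan-out is given below, assuming one BLAST job per chunk; the function names and the node naming scheme are hypothetical and are not taken from GADU or VDS.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_workflow(n_sequences, batch_size):
    """Return (nodes, edges) for a split -> BLAST fan-out workflow.

    Assumption: one FileBreaker node feeds one BLAST node per chunk.
    """
    n_jobs = -(-n_sequences // batch_size)  # ceiling division
    blast_nodes = [f"BLAST_{i}" for i in range(n_jobs)]
    nodes = ["FileBreaker"] + blast_nodes
    edges = [("FileBreaker", node) for node in blast_nodes]
    return nodes, edges
```

Because the BLAST nodes share no edges with each other, they can run on any mix of sites, which is what makes the workload embarrassingly parallel.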
Resource Independent GADU
Representing a site and the applications on it

tc.data
#SITE     Transformation   PFN   TYPE
ANL_Jazz  BLAST            /soft/apps/BLAST/bin/blastall   null
ANL_Jazz  Blocks           /soft/apps/run-Blocks.pl        null
ANL_Jazz  Chisel           /soft/apps/chisel/runChisel.pl  null
ANL_Jazz  IPRSCAN          /soft/apps/iprscan_wrapper.pl   null
ANL_Jazz  globus-url-copy  /soft/apps/packages/globus-2.2.4/bin/globus-url-copy  GLOBUS_LOCATION=/soft/apps/packages/globus-2.2.4/;LD_LIBRARY_PATH=/soft/apps/packages/globus-2.2.4/lib;PATH=/soft/apps/packages/globus-2.2.4/bin

pool.config
pool ANL_Jazz {
    lrc "rls://gnare.mcs.anl.gov"
    gridftp "gsiftp://jmayor1.lcrc.anl.gov:2812/soft/apps/gadu"
    gridlaunch "/soft/apps/gadu/bin/kickstart"
    workdir "/soft/apps/gadu/vdldata"
    universe vanilla "jmayor1.lcrc.anl.gov:2121/jobmanager-pbs"
    universe globus "jmayor1.lcrc.anl.gov:2121/jobmanager-pbs"
    universe transfer "jmayor1.lcrc.anl.gov:2812/jobmanager-fork"
}
….
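To show how a catalog like tc.data lets the same workflow run on any site, the sketch below parses entries into a lookup keyed by site and transformation. The four-column layout is inferred from the sample on this slide only, not from the VDS specification, and `parse_tc` is a hypothetical helper.

```python
def parse_tc(text):
    """Map (site, transformation) -> (path, env) from tc.data-style text.

    Assumes four whitespace-separated columns, where the last column is
    either "null" or a ';'-separated list of NAME=value pairs.
    """
    catalog = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        site, name, path, extra = line.split(None, 3)
        env = {}
        if extra != "null":
            for pair in extra.split(";"):
                key, _, value = pair.partition("=")
                env[key] = value
        catalog[(site, name)] = (path, env)
    return catalog


SAMPLE = """\
#SITE Transformation PFN TYPE
ANL_Jazz BLAST /soft/apps/BLAST/bin/blastall null
ANL_Jazz globus-url-copy /soft/apps/packages/globus-2.2.4/bin/globus-url-copy GLOBUS_LOCATION=/soft/apps/packages/globus-2.2.4/;PATH=/soft/apps/packages/globus-2.2.4/bin
"""
```

With such a lookup, the abstract name BLAST resolves to a different physical path on each site while the workflow description stays unchanged.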
Requirements: Information Services
A VDS-like system can provide an architecture-independent mechanism to use different sites (grids).
In order to automatically add a new Grid site, we need information about the site:
• Information services at various levels:
  - Authentication: are the certificates valid at this site?
  - Architecture: is it an ia-32 or an ia-64 cluster?
  - Gatekeeper and GridFTP server.
  - Environment variables: $OSG_APP, $TG_COMMUNITY.
  - Number of CPUs.
  - Number of used CPUs.
  - Number of idle CPUs.
  - VO (user) specific jobs running at a given site.
  - VO (user) specific jobs sitting in the queue at a given site (why?).
• We need standards and protocols for these information services, and we need to identify more information variables that the grids should publish.
• GridCat, MDS, or something else: currently GADU uses GridCat to collect site-specific information for OSG and manually adds information for TeraGrid and Jazz. We are working on an MDS-based information interface on TeraGrid.
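The information listed above could be collected into one record per site. The sketch below is only an illustration of such a record: the class name, field names and the `usable` rule are assumptions, and deriving idle CPUs as total minus used is a simplification (the slides list idle CPUs as a separately published value).

```python
from dataclasses import dataclass


@dataclass
class SiteInfo:
    """Hypothetical per-site record built from the variables on this slide."""
    name: str
    cert_valid: bool        # authentication: are our certificates accepted?
    architecture: str       # e.g. "ia-32" or "ia-64"
    gatekeeper: str
    gridftp_server: str
    app_dir: str            # value of $OSG_APP or $TG_COMMUNITY on the site
    total_cpus: int
    used_cpus: int
    vo_jobs_running: int = 0
    vo_jobs_queued: int = 0

    @property
    def idle_cpus(self):
        # Simplifying assumption: idle = total - used.
        return self.total_cpus - self.used_cpus

    def usable(self):
        # Illustrative rule: valid certificates, free CPUs, nothing of ours queued.
        return self.cert_valid and self.idle_cpus > 0 and self.vo_jobs_queued == 0
```

A record like this could be filled from GridCat for OSG sites and by hand for TeraGrid and Jazz, matching the current practice described above.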
Another Big Challenge: Site Selection
GADU has access to 60 OSG sites and 5 TeraGrid sites.
One challenge in using the Grid reliably for high-throughput analysis is monitoring the state of all Grid sites and how well they have performed for job requests from a given submit host. We view a site as “available” if our submit host can communicate with it, if it is responding to Globus job-submission commands, and if it will run our jobs promptly, with minimal queuing delays.
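[Figure: site selection workflow between the GADU server and the grid sites (GRID3, TeraGrid, JAZZ, PDSF, UBuffalo, ANL, SDSC, …), with numbered steps 1 to 7.]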
site_tester.pl
• A test job is run for each site. The tests run in parallel by forking.
• Each child process writes to the site status file below.

Site status file: status | test-time* | site
1    10    jazz
0    FAIL  pdsf
#1   80    sdsc – tg
….
* time in seconds
Status codes: # = manually forced not to be used; 1 = working site; 0 = site failed.
[Figure labels: the Blast/Blocks server requests a site and gets back a site with details.]
Site_selector.pl
    get_all_working_sites
    foreach working_site {
        get_condor_q details
        if ( # of jobs in Q == 0 ) &&
           ( total # of jobs on the host < max_allowed )
            select the site
    }
    get_selected_site_details
    return (@site_and_details)
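A runnable reading of the selection pseudocode is sketched below. Several points are assumptions: the queue is modelled as a list of (site, state) pairs rather than real condor_q output, "total jobs on the host" is interpreted as all jobs in the submit host's queue, and the first qualifying site is returned.

```python
def working_sites(status_lines):
    """Sites marked 1 in the status file and not disabled with '#'."""
    sites = []
    for line in status_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "1":
            sites.append(fields[2])
    return sites


def select_site(status_lines, queue, site_info, max_allowed):
    """Return (site, details) for the first eligible site, or None.

    queue: list of (site, state) pairs, state "R" (running) or "Q" (queued).
    site_info: dict mapping site name to its details.
    """
    total_jobs = len(queue)
    for site in working_sites(status_lines):
        queued = sum(1 for s, state in queue if s == site and state == "Q")
        if queued == 0 and total_jobs < max_allowed:
            return site, site_info[site]
    return None
```

Requiring an empty queue at the site before sending more work is what keeps jobs from piling up behind a slow scheduler.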
Site info file: site | #max_nodes | nodes/batch | seqs/node
jazz  360  30  100
pdsf  500  40  100
sdsc  70   10  150
…..
Sequences/batch = nodes/batch x seqs/node
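The batch-size rule can be checked against the three sample rows. The sketch below assumes the whitespace-separated layout shown on the slide; the function name is hypothetical.

```python
SITE_INFO = """\
jazz 360 30 100
pdsf 500 40 100
sdsc 70 10 150
"""


def sequences_per_batch(text):
    """Compute sequences/batch = nodes/batch x seqs/node for each site."""
    result = {}
    for line in text.splitlines():
        site, _max_nodes, nodes_per_batch, seqs_per_node = line.split()
        result[site] = int(nodes_per_batch) * int(seqs_per_node)
    return result
```

For the sample rows this gives 3000 sequences per batch on jazz, 4000 on pdsf and 1500 on sdsc.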
condor_q -global -globus
ID | .. .. | manager | ST | ..
1    jazz   R  blast..
1.1  jazz   R  blast..
2    Ubuff  Q  blast..
…..
Another Big Challenge: Site Selection
GADU has access to 60 OSG sites and 5 TeraGrid sites.
Web Interface to Control the Selection of Sites for GADU:
http://compbio.mcs.anl.gov/sulakhe/cgi-bin/site_selection_new.pl?user=dina
Web Interface showing live status of usage:
http://compbio.mcs.anl.gov/gaduvo/gadu_jobs.cgi
Grid may not worry about this…
Next Steps..
• Working with Teragrid Information Services group – MDS based interface.
• Continue to improve GADU’s implementation of Site Selection.
• Trying to generalize Site Selection using the Information Services such as MDS and Gridcat.
• Continue to deploy faster scientific applications for the Bioinformatics Group at Argonne.
Acknowledgements
Bioinformatics Group
• Natalia Maltsev, PI
• Alex Rodriguez
• Elizabeth Glass
• Mark D’Souza
• Mustafa Syed
• Yi Zhang
Globus and VDS
• Mike Wilde
• Nika Nefedova
• Jens Voeckler
• Ian Foster
• Rick Stevens
• VDT support
• Condor support
• Systems at MCS
Open Science Grid
• Thanks to Ruth Pordes and the OSG team for their wonderful support
TeraGrid
• Charlie Catlett
• Special thanks to David O’Neal, Joseph Insley, and Sergiu Sanielevici