Date post: 28-Dec-2015
Upload: duane-brooks
Page 1: Life Science Software and High Performance Computing Seminar Series Part IV Craig A. Stewart Fulbright Senior Scholar beim ZIH Associate Vice President,

Life Science Software and High Performance Computing Seminar Series Part IV

Craig A. Stewart

Fulbright Senior Scholar beim ZIH

Associate Vice President, Research & Academic Computing


License Terms

• Please cite this presentation as: Stewart, C.A. Life Science Software and High Performance Computing: Seminar Series Part IV. 2006. Presentation. Presented at: Technische Universitaet Dresden (Dresden, Germany, 27 Apr 2006). Available from: http://hdl.handle.net/2022/14767

• Portions of this document that originated from sources outside IU are shown here and used by permission or under licenses indicated within this document.

• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.

• Except where otherwise noted, the contents of this presentation are copyright 2007 by the Trustees of Indiana University. This content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.


Life Science Software and HPC Seminar Plan as of today

• Today:
– Some thoughts and observations on US national projects and centers
    • Funding agencies
    • HPC/grid computing
    • Bioinformatics and computational biology
– Performance analysis
• Late June – another visit to Dresden, associated with the ISC
• Late August – another visit to Dresden, associated with Euro-PAR


US Funding agencies (1)

• National Science Foundation - $5.5B annual budget; funds about 20% of all basic research in the US. Basic research in computer science, math, biology, geology, etc. www.nsf.gov

• National Institutes of Health - $27.5B/year. Funds the largest share of medical research. 27 separate institutes and centers. www.nih.gov

• Department of Energy. Funds much applied and basic research. Funds: Argonne National Laboratory, Brookhaven National Laboratory, Fermi National Accelerator Laboratory, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, Sandia National Laboratories, Stanford Linear Accelerator Center, Electron accelerators, Thomas Jefferson National Accelerator Facility www.doe.gov


US Funding agencies (2)

• Department of Defense. http://www.defenselink.mil/
– Defense Advanced Research Projects Agency (DARPA) http://www.darpa.mil/
– High Productivity Computing Systems program http://www.darpa.mil/ipto/programs/hpcs/programplan.htm
• Military branches (esp. Army, Navy, Air Force)
• Department of Homeland Security http://www.dhs.gov/dhspublic/
• National Security Agency www.nsa.gov
• Congressional markups


Some shining successes

• DARPANet/Internet/Abilene
• NSF HPC Centers/NITRD
• “Hallmark” demos, e.g. Tornado, Caterpillar bulldozer design
• It’s really possible for a good researcher to get time on a nationally shared supercomputer and get help with it


DARPA High Productivity Computing System program

IBM, Cray, Sun currently phase II industry partners

http://www.darpa.mil/ipto/programs/hpcs/programplan.htm


Real, not peak

http://www.darpa.mil/ipto/programs/hpcs/assessment.htm


Current Top500 list

• DOE impact on top of list!

• http://www.top500.org/lists/2005/11/basic


NSF strategies

• Office of Cyberinfrastructure. Daniel Atkins, Director
• Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. http://www.nsf.gov/publications/pub_summ.jsp?ods_key=cise051203 (aka “the Atkins Report”).
• Draft – NSF’s Cyberinfrastructure vision for the 21st century. http://www.nsf.gov/od/oci/ci_v5.pdf
• NSF Cyberinfrastructure panel
• Systems
– $30M/year x 4 solicitations for large shared systems
– $200M for a 1 PetaFLOPS *achieved* system
– Focus on science results
• Software
– National Middleware Initiative


National supercomputer centers

• Pittsburgh Supercomputing Center
• San Diego Supercomputer Center
• National Computational Science Alliance
• TeraGrid
• Other university centers of note:
– Purdue University
– Ohio Supercomputer Center
– Louisiana State University
– Texas Advanced Computing Center
– Texas Tech
– Rice
– Caltech
– Cornell
– U. Chicago (computation, electronic visualization lab)
– Florida/SURA


NIH

• National Center for Research Resources
• Really focused on clinical resources, not computing resources
• NIH is perhaps doing more than any other funding agency to promote openness in research as a result of its data access policies and support for open source software
• National Library of Medicine, Protein Data Bank (also supported by NSF)


A semi-random walk through some US projects


CIPRES Cyberinfrastructure for Phylogenetic Research (CIPRES)

• http://www.phylo.org/
• The largest active phylogenetics group going. “The goal of the CIPRES project is to enable large-scale phylogenetic reconstructions on a scale that will enable analyses of huge datasets containing hundreds of thousands of biomolecular sequences.” Has 5 years of funding.
• Computational phylogenetics activities: phylogenetic reconstruction from gene order and gene sequences. Horizontal gene transfer.


RENCI (Renaissance Computing Institute)

• http://www.renci.org/
• Led by Dan Reed. “a major collaborative venture of Duke University, North Carolina State University, the University of North Carolina at Chapel Hill and the state of North Carolina.”
• Funding through the National Middleware Initiative
• Key role in the TeraGrid


Argonne National Lab Biosciences Division

• Led by Rick Stevens. http://www.bio.anl.gov/
• LOTS of structural biology. Very focused, well funded and dedicated group.


Cal-IT2

• Led by Larry Smarr. http://www.calit2.net/
• Lots of areas of focus, including:
– GEON: The Geosciences Network [GEON]
– Laboratory for the Ocean Observatory Knowledge INtegration Grid [LOOKING]
– Sensor Networks


BIRN

• Biomedical Informatics Research Network
• http://www.nbirn.net/
• NIH-sponsored attempt to create health-oriented cyberinfrastructure
• Function BIRN – brain function and disorders, e.g. schizophrenia
• Morphometry BIRN – brain structural disorders, e.g. Alzheimer’s
• Mouse BIRN – studying mouse brain and mouse models of human brain disorders
• Grid technology, using federated data system approach, based on Globus, SRB, etc.


OptIPuter

• “The OptIPuter, so named for its use of Optical networking, Internet Protocol, computer storage, processing and visualization technologies, is an envisioned infrastructure that will tightly couple computational resources over parallel optical networks using the IP communication mechanism. The OptIPuter exploits a new world in which the central architectural element is optical networking, not computers - creating "supernetworks".”

• LambdaRAM
• http://www.optiputer.net/index.html


Genomes to Life

• http://www.doegenomestolife.org/
• Original goals:
– Identify and Characterize the Molecular Machines of Life — the Multiprotein Complexes That Execute Cellular Functions and Govern Cell Form
– Characterize Gene Regulatory Networks
– Characterize the Functional Repertoire of Complex Microbial Communities in Their Natural Environments at the Molecular Level
– Develop the Computational Methods and Capabilities to Advance Understanding of Complex Biological Systems and Predict Their Behavior
– (Goals taken directly from Genomes to Life web site)


Genomes to Life refactored

• The Department of Energy’s Office of Science announced ... that it is revising its plans for the deployment of new research facilities to support its Genomics:GTL program. … The specific goal of the new facilities plan will be to accelerate GTL systems biology research in the area of bioenergy, with the objective of developing cost-effective, biologically based renewable energy sources to reduce U.S. dependence on fossil fuels.

• http://www.sc.doe.gov/Sub/Newsroom/News_Releases/DOE-SC/2006/GTL/index.htm


Current Genomic Pipeline

[Pipeline diagram. Labels recoverable from the slide:]

– Input: Arabidopsis protein sequences
– Prediction of: signal peptides (SignalP, PSORT), transmembrane (TMHMM, PSORT), coiled coils (COILS), low complexity regions (SEG)
– Structural assignment of domains by PSI-BLAST on FOLDLIB
– Structural assignment of domains by 123D on FOLDLIB
– “Only sequences w/out A-prediction” (label appears twice)
– Create PSI-BLAST profiles for protein sequences
– Store assigned regions in the DB
– Functional assignment by PFAM, NR, PSIPred assignments
– Domain location prediction by sequence
– Data sources: FOLDLIB; NR, PFAM; SCOP, PDB
– Legend: structure info, sequence info
– Building FOLDLIB: PDB chains, SCOP domains, PDP domains, CE matches PDB vs. SCOP; 90% sequence non-identical, minimum size 25 aa, coverage (90%, gaps <30, ends <30)

http://eol.sdsc.edu/methodology.html


Scale of Multi-genome Analysis

[Pipeline diagram with the same stage labels as on the previous slide, except that the input is protein sequences from many genomes. Scale annotations recoverable from the slide:]

– Input: ~800 genomes @ 10k-20k per genome = ~10^7 ORFs
– CPU time annotated on the pipeline stages (in the order extracted from the slide): 4, 228, 3, 9, 252, and 3 CPU years
– 10^4 entries

http://eol.sdsc.edu/methodology.html
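
As a rough check on the scale figures for the multi-genome analysis, the arithmetic can be sketched in Python. This is only an illustration under stated assumptions: it assumes the six CPU-year annotations are per-stage costs that simply add up, and the 1,000-CPU machine size and the `wall_clock_days` helper are hypothetical and do not come from the slide.

```python
# Back-of-the-envelope check of the multi-genome scale figures.
# Assumption: the six CPU-year annotations are per-stage costs that add up.

GENOMES = 800                          # "~800 genomes" from the slide
ORFS_PER_GENOME = (10_000, 20_000)     # "10k-20k per" genome

orfs_low = GENOMES * ORFS_PER_GENOME[0]    # lower bound on total ORFs
orfs_high = GENOMES * ORFS_PER_GENOME[1]   # upper bound on total ORFs

# CPU-year annotations in the order they were extracted from the slide.
stage_cpu_years = [4, 228, 3, 9, 252, 3]
total_cpu_years = sum(stage_cpu_years)


def wall_clock_days(cpu_years, n_cpus, efficiency=1.0):
    """Days of wall-clock time if the work is spread over n_cpus.

    Hypothetical helper: assumes the work is embarrassingly parallel and
    scales with the given parallel efficiency (1.0 = perfect scaling).
    """
    return cpu_years * 365.0 / (n_cpus * efficiency)


if __name__ == "__main__":
    print(f"Total ORFs: {orfs_low:.1e} to {orfs_high:.1e}")
    print(f"Total CPU years: {total_cpu_years}")
    # 1,000 CPUs is an illustrative machine size, not a figure from the talk.
    print(f"Days on 1,000 CPUs: {wall_clock_days(total_cpu_years, 1000):.1f}")
```

The bounds of 8 million and 16 million ORFs bracket the slide's order-of-magnitude figure of about 10^7, and under the additive assumption the annotated stages total roughly 500 CPU years.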


Other centers of note

• National Resource for Biomedical Supercomputing (NRBSC). Pittsburgh. Source of MCell. http://www.nrbsc.org/.

• Scientific Computing and Imaging Institute – Christopher R. Johnson http://www.sci.utah.edu/

• UCSD Bioinformatics Program - http://bioinformatics.ucsd.edu/
• Wash U bioinformatics http://www.ccb.wustl.edu/
• MIT, Johns Hopkins also have interesting programs
• List (incomplete) at http://zlab.bu.edu/~mfrith/BioinfoCenters.html


Some international efforts

• eScience project - http://www.nesc.ac.uk/. EDIAMOND
• Japanese Petaflops Protein Folding project - http://www.jsbi.org/journal/GIW02/GIW02P121.pdf


Some activities at IU

• FlyBase – authoritative source of annotated fruit fly genomic information. http://flybase.bio.indiana.edu/
• Lifescienceweb http://www.lifescienceweb.org/
– Mutdb http://www.mutdb.org/
– SBLEST: “The Structure-Based Local Environment Search Tool uses vectors of amino acid structural environments to perform K Nearest Neighbor queries against a database of protein structures. Our Web services allow for authenticated (password protected) submission of a protein structure, or selection of an existing structure and searching it against common databases and then visualization of the results using UCSF Chimera or PyMOL.” http://www.lifescienceweb.org/index.php?mode=sBlest_about
• TeraGrid – teragrid.iu.edu
• IU IT Strategic Plan
• IU Life Sciences Strategic Plan
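
The K Nearest Neighbor query mentioned in the SBLEST description can be illustrated with a minimal Python sketch. This is not SBLEST's implementation: the three-component vectors, the labels `env_A` to `env_D`, the Euclidean distance metric, and the `k_nearest` function are all hypothetical stand-ins for whatever structural-environment features and metric the real tool uses.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, database, k):
    """Return the k (distance, label) pairs closest to the query vector.

    `database` maps a label to a feature vector. A brute-force scan is
    used here; a real service over a large structure database would
    likely need an index.
    """
    scored = ((euclidean(query, vec), label) for label, vec in database.items())
    return heapq.nsmallest(k, scored)


# Hypothetical feature vectors standing in for structural environments.
database = {
    "env_A": (0.0, 0.0, 1.0),
    "env_B": (1.0, 0.0, 0.0),
    "env_C": (0.9, 0.1, 0.0),
    "env_D": (5.0, 5.0, 5.0),
}

hits = k_nearest((1.0, 0.0, 0.0), database, k=2)

if __name__ == "__main__":
    for distance, label in hits:
        print(f"{label}: {distance:.3f}")
```

With this toy data the two nearest entries to the query are `env_B` (an exact match) and `env_C`.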


Some .orgs and commercial activities

• Bioinformatics.org
– Includes BioBrew Linux
• BioPerl http://www.bioperl.org/wiki/Main_Page
• BioPython http://www.biopython.org/
• BioJava http://biojava.org/wiki/Main_Page
• BioMoby http://biomoby.open-bio.org/index.php/what-is-moby/
• Bio grid activities
– folding@home http://folding.stanford.edu/
– Protein predictor @ home http://predictor.scripps.edu/
– rosetta@home http://boinc.bakerlab.org/rosetta/
– Fight aids @ home http://fightaidsathome.scripps.edu/
– World community grid http://www.worldcommunitygrid.org/
• Commercial:
– Apple bioclusters (uses SGE)
– IBM Life Science Institutes of Innovation
– Sun Center of Excellence
– Dell Center of Excellence


Some Good Books

• Computational Cell Biology. 2002. Springer Verlag (Fall et al, eds).
• Foundations of systems biology. MIT Press, 2001. Kitano (ed)
• Winter, P.C., G.I. Hickey, H.L. Fletcher. 1998. Instant notes in genetics. Springer-Verlag, NY. ISBN 0-387-91562-1
• Durbin, R., S. Eddy, A. Krogh, G. Mitchison. 2000. Biological sequence analysis. Cambridge University Press.
• Gibas, C., and P. Jambeck. 2001. Developing bioinformatics computer skills. O’Reilly.
• Tisdall, J. 2001. Beginning perl for bioinformatics. O’Reilly.
• Tisdall, J. 2003. Mastering perl for bioinformatics. O’Reilly.
• Gusfield, D. 1997. Algorithms on strings, trees, and sequences. Cambridge University Press.
• Berman, F., G.C. Fox, A.J.G. Hey. (eds) 2003. Grid computing: making the grid infrastructure a reality. Wiley, Sussex


Acknowledgments

• Funding for projects described in this talk has come from the National Science Foundation, National Institutes of Health, Lilly Endowment, Inc., State of Indiana (particularly through support of I-light Initiative and the 21st Century Fund)

• The work described here was made possible by the faculty, students, and staff of Indiana University. Thanks especially to the staff of RAC, CPO, Telecommunications, PTL, UITS generally, the participants in the Indiana Genomics Initiative, and the participants in the METACyt Initiative.

• Several of the slides and ideas presented here were developed by colleagues or collaborators – the Research and Academic Computing Division of UITS in general, and Dick Repasky in particular.

• Stewart’s visit to Dresden is funded in part by the Council for International Exchange of Scholars, the Technical University of Dresden, and Indiana University.

• And thank you very much! This has been fun and educational for me!

