
Computational Science and the School of Informatics at Indiana University

Computational Science and the School of Informatics at Indiana University. IU/HBCU STEM Initiative, IUPUI, April 11 2007. Geoffrey Fox, Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401. [email protected] http://www.infomall.org
Transcript
Page 1: Computational Science and the School of Informatics at  Indiana University

Computational Science and the School of Informatics at Indiana University

IU/HBCU STEM Initiative, IUPUI
April 11 2007

Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401

[email protected]
http://www.infomall.org

Page 2: Computational Science and the School of Informatics at  Indiana University

What is Computational Science?
Informatics is the integration of the art, science, and the human dimensions of information technology to provide solutions to discipline-specific problems
Informatics is a response to the data/information/knowledge gaps (data deluge) caused by “billions and billions of bits”
• Grids are technology supporting this in distributed research
Computational Science could be the same as this or focus on the large scale simulation part
Multicore chips will revitalize simulation!

Page 3: Computational Science and the School of Informatics at  Indiana University

Bioinformatics Data Deluge: Challenge and Opportunity
1985: 1 experiment, 1 gene, 10 data
2000: 1 experiment, 10,000 genes, 10,000,000 data
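The jump from 10 to 10,000,000 data per experiment can be turned into a rough growth rate. The sketch below assumes steady exponential growth between 1985 and 2000, which the slide does not claim; the variable names are illustrative only.

```python
import math

# Figures taken from the slide: data per experiment in 1985 and in 2000.
YEARS = 2000 - 1985
data_1985 = 10
data_2000 = 10_000_000

# Assumption (not from the slide): growth was a steady exponential.
growth_factor = data_2000 / data_1985       # a million-fold increase
doublings = math.log2(growth_factor)        # roughly 20 doublings
doubling_time_years = YEARS / doublings     # roughly nine months per doubling

print(f"{growth_factor:,.0f}x over {YEARS} years, "
      f"doubling about every {doubling_time_years:.2f} years")
```

Under that assumption the data per experiment doubled roughly every nine months.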

Page 4: Computational Science and the School of Informatics at  Indiana University

e-moreorlessanything and the Grid
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ From its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
• The growing use of outsourcing is one example
The Grid provides the information technology e-infrastructure for e-moreorlessanything.
A deluge of data of unprecedented and inevitable size must be managed and understood.
People, computers, data and instruments must be linked.
On demand assignment of experts, computers, networks and storage resources must be supported

Page 5: Computational Science and the School of Informatics at  Indiana University

Why Grids/Cyberinfrastructure Useful
Supports distributed science – data, people, computers
Exploits Internet technology (Web 2.0) adding management, security, supercomputers etc.
It has two aspects: parallel – low latency (microseconds) between nodes, and distributed – highish latency (milliseconds) between nodes
Parallel needed to get high performance on individual 3D simulations, data analysis etc.; must decompose problem
Distributed aspect integrates already distinct components
Cyberinfrastructure is in general a distributed collection of parallel systems
Grids are made of services that are “just” programs or data sources packaged for distributed access
Web 2.0 can be used “instead of” Grids
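"Must decompose problem" can be made concrete with a minimal sketch of block domain decomposition for a 3D simulation grid. The function names, grid size, and worker layout are hypothetical and not taken from the talk; the code only computes which index ranges each worker would own.

```python
def block_ranges(n_cells: int, n_workers: int) -> list[tuple[int, int]]:
    """Split indices 0..n_cells-1 into contiguous half-open ranges.

    Range sizes differ by at most one cell, so the load is balanced.
    """
    base, extra = divmod(n_cells, n_workers)
    ranges = []
    start = 0
    for rank in range(n_workers):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose_3d(shape, workers_per_axis):
    """Return one (x_range, y_range, z_range) block per worker."""
    per_axis = [block_ranges(n, p) for n, p in zip(shape, workers_per_axis)]
    return [(x, y, z)
            for x in per_axis[0]
            for y in per_axis[1]
            for z in per_axis[2]]


# Hypothetical example: a 100 x 100 x 50 grid shared by 2 x 2 x 1 workers.
blocks = decompose_3d((100, 100, 50), (2, 2, 1))
for block in blocks:
    print(block)
```

In a real parallel code each worker would also exchange boundary values with its neighbours every step, which is where the microsecond latency between nodes matters.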

Page 6: Computational Science and the School of Informatics at  Indiana University

TeraGrid: Integrating NSF Cyberinfrastructure

TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
Today 100 Teraflop; tomorrow a petaflop; Indiana 20 teraflop today and doubling

[Map labels: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo, UNC-RENCI, Wisc]

Page 7: Computational Science and the School of Informatics at  Indiana University

APEC Cooperation for Earthquake Simulation
ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction
• iSERVO is Infrastructure to support work of ACES
• SERVOGrid is (completed) US Grid that is a prototype of iSERVO
• http://www.quakes.uq.edu.au/ACES/
Chartered under APEC – the Asia Pacific Economic Cooperation of 21 economies

Page 8: Computational Science and the School of Informatics at  Indiana University

Grid of Grids: Research Grid and Education Grid
[Diagram labels: SERVOGrid; Sensors; Streaming Data; Field Trip Data; Data Filter Services; Database (two); Repositories, Federated Databases; Discovery Services; Research Simulations; Analysis and Visualization; Portal; Customization Services; From Research to Education; Research; Education; Education Grid; Computer Farm; GIS Grid; Sensor Grid; Database Grid; Compute Grid]

Page 9: Computational Science and the School of Informatics at  Indiana University

SERVOGrid and Cyberinfrastructure
Grids are the technology based on Web services that implement Cyberinfrastructure i.e. support eScience or science as a team sport
• Internet scale managed services that link computers, data repositories, sensors, instruments and people
There is a portal and services in SERVOGrid for
• Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs …
• Job management and monitoring web services for running the above codes
• File management web services for moving files between various machines
• Geographical Information System services
• Quaketables earthquake specific database
• Sensors as well as databases
• Context (dynamic metadata) and UDDI system long term metadata services
• Services support streaming real-time data

Page 10: Computational Science and the School of Informatics at  Indiana University

LEAD Gateway Portal
NSF Large ITR and TeraGrid Gateway
- Adaptive Response to Mesoscale weather events
- Supports Data exploration, Grid Workflow

Page 11: Computational Science and the School of Informatics at  Indiana University

Grid Workflow Datamining in Earth Science
Work with Scripps Institute
Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California
[Diagram labels: NASA GPS; Earthquake; Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS)]
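A workflow of services acting on a sensor stream can be sketched as a chain of generator stages. Everything below is hypothetical: the record shape, the station names, the stage order, and especially `detect`, a simple threshold filter that only stands in for the Hidden Markov datamining step and does not implement it.

```python
from typing import Iterable, Iterator

# Hypothetical record shape: (station id, displacement in mm).
Reading = tuple[str, float]


def check_data(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Data checking stage: drop readings whose value is NaN."""
    for station, value in stream:
        if value == value:  # NaN is the only float not equal to itself
            yield station, value


def transform(stream: Iterable[Reading], offset_mm: float) -> Iterator[Reading]:
    """Transformation stage: subtract a fixed reference offset."""
    for station, value in stream:
        yield station, value - offset_mm


def detect(stream: Iterable[Reading], threshold_mm: float) -> Iterator[Reading]:
    """Stand-in for the datamining stage: flag large displacements."""
    for station, value in stream:
        if abs(value) > threshold_mm:
            yield station, value


def run_workflow(stream, offset_mm=1.0, threshold_mm=5.0):
    """Chain the stages; each reading flows through them one at a time."""
    return list(detect(transform(check_data(stream), offset_mm), threshold_mm))


sample = [("A", 1.5), ("B", float("nan")), ("C", 9.0), ("A", -7.5)]
events = run_workflow(sample)
print(events)
```

Because each stage is a generator, readings are processed as they arrive rather than after the whole stream has been collected, which is the property a real-time workflow needs.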

Page 12: Computational Science and the School of Informatics at  Indiana University

Some Organizations I work with
• MSI CI2 Minority-Serving Institutions (MSI) Cyberinfrastructure Institute led by the
• Alliance for Equity in Higher Education. Working with the Alliance will have systemic impact on at least 335 Minority Serving Institutions covered by the
• AIHEC (American Indian Higher Education Consortium)
• HACU (Hispanic Association of Colleges and Universities)
• NAFEO (National Association for Equal Opportunity in Higher Education)
• MSI-CIEC Minority-Serving Institution Cyberinfrastructure (CI) Empowerment Coalition led by
• UHD (University of Houston Downtown) as a major Hispanic Serving Institution
• I am Senior Research Associate in the Center for Computational Science and Advanced Distributed Simulation at UHD and Visiting Scholar for Cyberinfrastructure Development at the Alliance for Equity in Higher Education

Page 13: Computational Science and the School of Informatics at  Indiana University

Basic Ideas
• Cyberinfrastructure is critical to all involved in Research and Education
• Cyberinfrastructure is intrinsically democratic, supporting broad participation
• MSIs should lead MSI integration with Cyberinfrastructure
• One should guide the projects with experts
• One should aim at scalable (systemic) approaches
• Goal is peer collaborations involving all institutions of higher education

Page 14: Computational Science and the School of Informatics at  Indiana University

Teaching Jackson State Fall 97 to Spring 2005

Programming for the Web: General Introduction
Course at Jackson State University, Spring 98 and Fall 97
http://www.npac.syr.edu/users/gcf/jsufall97intro
Nancy McCracken, Geoffrey Fox, Tom Scavo
Syracuse University NPAC
111 College Place, Syracuse NY 13244 4100
3154432163
2/22/2001 JSUFall97Master http://www.npac.syr.edu [email protected]
[Diagram labels: JSU, Syracuse]

Page 15: Computational Science and the School of Informatics at  Indiana University

Example: Setting up a Polar CI/Grid
• NSF CI-Team project with HBCU ECSU in North Carolina and Kansas University will design and set up a Polar Grid – CI-enable MSIs (ECSU, Haskell) and a community (Polar Science)
• The North and South poles are melting with potentially huge environmental impact
– We have changed the 100,000 year Glacier cycle into a ~50 year cycle; the field has increased dramatically in importance and interest
• Polar Grid is a network of computers, sensors (on robots and satellites), data and people aimed at understanding science of ice-sheets and impact of global warming
• We are planning Polar Grid relevant CI Education Infrastructure and initial projects with Undergraduate students (ECSU) and Graduate students (Kansas)
– Polar weather stations as Grid resources
– Use distance education to cover all CReSIS sites

Page 16: Computational Science and the School of Informatics at  Indiana University
Page 17: Computational Science and the School of Informatics at  Indiana University

CReSIS PolarGrid
• Important CReSIS-specific Cyberinfrastructure components include
– Managed data from sensors and satellites
– Data analysis such as SAR processing – possibly with parallel algorithms
– Electromagnetic simulations (currently commercial codes) to design instrument antennas
– 3D simulations of ice-sheets (glaciers) with non-uniform meshes
– GIS Geographical Information Systems
• Also need capabilities present in many Grids
– Portal i.e. Science Gateway
– Submitting multiple sequential or parallel jobs
• TeraGrid etc. (the National Cyberinfrastructure) is having Cyberinfrastructure days at various places around the country to popularize and identify how institutions can participate
– ECSU will be later this year

Page 18: Computational Science and the School of Informatics at  Indiana University

Indiana University Cheminformatics Center Summary

Indiana University is focusing on two major areas:
• Creating a comprehensive, easily accessible infrastructure for chemoinformatics tools and data sources, linked with PubChem and made available as web services, and partnering with screening centers and other users to demonstrate how this infrastructure can be usefully applied
– Infrastructure can include any tools, not just ours (commercial/open source, chemoinformatics, bioinformatics, and so on)
– New, custom applications can be built quickly using existing services in a similar way to Google Maps and other “web 2.0” resources
• Being a central hub of chemoinformatics education, including offering distance courses on chemoinformatics theory and techniques, practical workshops on using chemoinformatics resources, and freely available web-based educational resources
– We currently offer a Ph.D., M.S. and graduate certificate (distance) in chemical informatics
– Distance education program allows you to “pick and choose” courses to meet educational needs: certificate is awarded on completion of four courses

Page 19: Computational Science and the School of Informatics at  Indiana University

CICC Combines Grid Computing with Chemical Informatics

CICC: Chemical Informatics and Cyberinfrastructure Collaboratory
Funded by the National Institutes of Health
www.chembiogrid.org
Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories

Science and Cyberinfrastructure

Large Scale Computing Challenges

Chemical Informatics is a non-traditional area of high performance computing, but many new, challenging problems may be investigated.

CICC is an NIH funded project to support chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs.

CICC supports the NIH mission by combining state of the art chemical informatics techniques with
• World class high performance computing
• National-scale computing resources (TeraGrid)
• Internet-standard web services
• International activities for service orchestration
• Open distributed computing infrastructure for scientists world wide

[Diagram labels: NIH PubMed DataBase; OSCAR Text Analysis; POVRay Parallel Rendering; Initial 3D Structure Calculation; Toxicity Filtering; Cluster Grouping; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; IU’s Varuna DataBase; NIH PubChem DataBase]

Chemical informatics text analysis programs can process hundreds of thousands of abstracts of online journal articles to extract chemical signatures of potential drugs.

OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.

Big Red (and the TeraGrid) will also enable us to perform time consuming, multi-stepped Quantum Chemistry calculations on all of PubMed. Results go back to public databases that are freely accessible by the scientific community.
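The “pleasingly parallel” pattern described above, where every molecule is scored independently and the best ones are kept, can be sketched with Python's standard `concurrent.futures`. The `docking_score` function is a hypothetical placeholder (it just returns the length of the SMILES string) and is not a real docking calculation; the molecule list is likewise illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def docking_score(molecule: str) -> float:
    """Hypothetical stand-in for an expensive, independent calculation."""
    return float(len(molecule))


def rank_molecules(molecules, top_n=2, workers=4):
    """Score every molecule independently, then keep the top scorers."""
    # No task depends on another, so the work can be farmed out freely.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(docking_score, molecules))
    ranked = sorted(zip(molecules, scores),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Illustrative SMILES strings: ethanol, benzene, acetic acid, methane.
top = rank_molecules(["CCO", "c1ccccc1", "CC(=O)O", "C"])
print(top)
```

Threads keep the sketch simple; for CPU-bound scoring one would more likely use a process pool or, at the scale the poster describes, a batch system on a cluster.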

Page 20: Computational Science and the School of Informatics at  Indiana University

CICC Web Service Infrastructure

Portal Services: RSS Feeds, User Profiles, Collaboration as in Sakai
Grid Services: Service Registry, Job Submission and Management
Local Clusters, IU Big Red, TeraGrid, Open Science Grid
Varuna.net: Quantum Chemistry

Cheminformatics Services
• Core functionality: Fingerprints, Similarity, Descriptors, 2D diagrams, File format conversion
• Applications: Docking, Filtering, Druglikeness, OSCAR Document Analysis, InChI Generation/Search, Computational Chemistry (Gamess, Jaguar etc.)

Statistics Services
• Computation functionality: Regression, Classification, Clustering, Sampling distributions
• Applications: Predictive models, Feature selection, 2D plots, Toxicity predictions, Arbitrary R code (PkCell), Mutagenicity predictions, Anti-cancer activity predictions, Pharmacokinetic parameters

Database Services
• 3D structures by CID, SMARTS, 3D Similarity
• Docking scores/poses by CID, SMARTS, Protein, Docking scores
• PubChem related data by CID, SMARTS

Page 21: Computational Science and the School of Informatics at  Indiana University

Varuna environment for molecular modeling (Baik, IU)

[Diagram labels: Researcher; Chemical Concepts; Experiments; Papers etc.; QM Database; QM/MM Database; Reaction DB; PubChem, PDB, NCI, etc.; ChemBioGrid; Simulation Service (FORTRAN Code, Scripts); DB Service (Queries, Clustering, Curation, etc.); Condor “Flocks”; TeraGrid Supercomputers]

