BioCoRE and GEMS: Cyber Infrastructure for Cyber Chemistry Jesús A. Izaguirre Computer Science & Engineering University of Notre Dame with Kirby Vandivort NIH Resource for Macromolecular Modeling and Bioinformatics University of Illinois
BioCoRE and GEMS: Cyber Infrastructure for Cyber Chemistry

Jesús A. Izaguirre
Computer Science & Engineering
University of Notre Dame

with Kirby Vandivort
NIH Resource for Macromolecular Modeling and Bioinformatics
University of Illinois

BioCoRE and GEMS 3 October 2004

Overview I

• Chemical applications such as virtual screening, protein kinetics and structure, and analysis and validation of molecular simulations require enormous resources that can be provided by CyberInfrastructure

• Successful solution of these problems requires collaborative approaches, which CyberInfrastructure also facilitates

Overview II

To make CyberInfrastructure effective, the following issues must be addressed:

• Users of CyberInfrastructure need a data-centric way of managing their computations and data

• Distributed databases on the grid need to address the problem of reliability and fault-tolerance of data

Overview III

• We will study examples of collaborative software that address these issues, primarily:
  – BioCoRE: A Collaboratory for Structural Biology
  – GEMS: Grid Enabled Molecular Simulations Toolset and Database

Sample CyberScience Projects

• Collaborative Biophysics: BioCoRE (K. Schulten, Illinois)

• Virtual Screening: The Screensaver Project (W.G. Richards, Oxford)

• Protein Kinetics: Folding@Home (V. Pande, Stanford)

• Distributed Database of Molecular Simulations: BioSimGrid (M. Sansom, Oxford)

What is BioCoRE?

BioCoRE: a collaborative work environment for biomedical research, research management and training.

BioCoRE assists the entire research process, from talking with collaborators to performing simulations and collecting data, to preparing papers and reports.

Sharing Documents

With BioFS and WebDAV, scientists can exchange and edit files from anywhere with a web connection.
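
As a rough illustration of what file exchange over WebDAV involves, the sketch below builds an HTTP PUT request, which is how WebDAV uploads a file. It only constructs the request and does not send it. The server URL and remote path are hypothetical placeholders, not BioCoRE addresses, and authentication is left out.

```python
import urllib.request


def webdav_put_request(base_url, remote_path, data: bytes):
    """Build (but do not send) an HTTP PUT request that would upload a file."""
    url = base_url.rstrip("/") + "/" + remote_path.lstrip("/")
    return urllib.request.Request(url, data=data, method="PUT")


# Hypothetical server and path, used only to show the shape of the request.
req = webdav_put_request("https://dav.example.org/project", "notes/plan.txt", b"draft")
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` against a real server with suitable credentials.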

Setting Up and Running Simulations

• NAMDCFG: A “Simulation Setup Wizard”

• Online help and error checking for NAMD input files

• Job submission to supercomputers simplified

• Job status monitored for easy retrieval

• Job data archived for future reference

Sharing Molecular Views

Using VMD and BioCoRE, collaborators may exchange and manipulate 3-D models of molecules.

• Emphasis on collaborative sessions

• Streamlined process of sharing views

Communicating

• Control Panel provides instant messaging and notifications

• BioCoRE also provides message boards, Web site library, lab book

Programming Interface

• Provides a way for users to interact with BioCoRE programmatically

• Communication (Control Panel), shared states (VMD)

• WebDAV

Availability

• Free

• Can be accessed from Illinois site, or server software can be installed locally

• Server software can be modified if necessary

• http://www.ks.uiuc.edu/Research/biocore/

Virtual Screening

• Combinatorial Complexity Lead Exploration

• Screen docking affinities based on a scoring function (interaction energies, RMSD, etc…)

• Modeled as an all pairs problem

• Logically independent computational requirements are well suited for wide area grid distribution

[Figure: leads (ligands) L0001 through L0005]
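
The all-pairs formulation on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any project named in these slides: the receptor identifiers and the `score` placeholder are hypothetical, and the sketch assumes that a lower score means a better hit. The point it shows is that every (receptor, ligand) pair is an independent task, which is what makes the problem easy to distribute.

```python
from itertools import product


def score(receptor: str, ligand: str) -> float:
    """Placeholder scoring function (hypothetical).

    A real screen would compute a docking affinity; here a deterministic
    number is derived from the identifiers so the sketch is runnable.
    """
    return float(sum(map(ord, receptor + ligand)) % 100)


def all_pairs(receptors, ligands):
    """Enumerate every (receptor, ligand) pair; each pair is an independent task."""
    return list(product(receptors, ligands))


def screen(receptors, ligands, top_n=3):
    """Score every pair and return the top_n pairs with the lowest score."""
    results = [(score(r, l), r, l) for r, l in all_pairs(receptors, ligands)]
    return sorted(results)[:top_n]


receptors = ["R0001", "R0002"]                      # hypothetical targets
ligands = ["L0001", "L0002", "L0003", "L0004", "L0005"]
tasks = all_pairs(receptors, ligands)
print(len(tasks), "independent tasks")
for s, r, l in screen(receptors, ligands):
    print(r, l, s)
```

Because no task depends on another, the list returned by `all_pairs` could be split across any number of machines and the scores merged afterwards.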

CyberInfrastructure Needs for Virtual Screening I

• Incorporate protein (receptor) flexibility
  – Use multiple protein structures (hierarchical representations and algorithms)

• Iterative refinement of results
  – Add new protein conformations to improve docking
  – Use higher resolution models for promising hits (integration of data and work flow)
  – Monitor status of results (not just jobs running)

CyberInfrastructure Needs for Virtual Screening II

• Manage computation and storage in the grid
  – Declarative rather than imperative specification

• Automate usage of algorithms / tools
  – Select software and optimal parameters for algorithms (recommender system)
  – Example: MDSimAid (http://mdsimaid.cse.nd.edu) selects optimal MD simulation protocol (limited options)
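
One way to picture a declarative rather than imperative specification is sketched below. The user states which results are wanted as a set of parameter values, and a small function expands that statement into concrete tasks. The dictionary layout, key names, and values are all assumptions made for illustration; the slides do not define a format.

```python
from itertools import product

# Hypothetical declarative description of a screening campaign: it states
# WHAT results are wanted, not how or where each job should run.
spec = {
    "application": "docking",          # assumed label, not a real tool name
    "resolution": "coarse",
    "parameters": {
        "receptor": ["conf_A", "conf_B"],
        "ligand": ["L0001", "L0002", "L0003"],
    },
}


def expand(spec):
    """Expand a declarative spec into one concrete task per parameter combination."""
    names = sorted(spec["parameters"])
    values = [spec["parameters"][n] for n in names]
    fixed = {k: v for k, v in spec.items() if k != "parameters"}
    return [dict(fixed, **dict(zip(names, combo))) for combo in product(*values)]


tasks = expand(spec)
print(len(tasks))
print(tasks[0])
```

An imperative version would instead list each job by hand. With the declarative form, adding a ligand or a receptor conformation is a one-line change to `spec`.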

BioSimGrid (Mark S. P. Sansom, Oxford)

• Trajectory data stored in relational database tables per Data Schema

• Semi-Automated Deposition of trajectory files for certain formats (CHARMM, NAMD, etc…)

• Trajectory analysis modules

• Future goal to distribute database

• Database for biomolecular simulations

• Specifically: molecular dynamics trajectories

• Facilitate validation and analysis of simulations

• Provides “independence” from the specific simulation semantics (configuration parameters, architecture, simulation tools, etc…)
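
To make the idea of trajectories in relational tables concrete, here is a small sketch using an in-memory SQLite database. The table and column names are invented for this example and are not the BioSimGrid schema. It shows deposition of a toy trajectory and one analysis query that works the same way whichever simulation code produced the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trajectory (
    id          INTEGER PRIMARY KEY,
    molecule    TEXT NOT NULL,
    software    TEXT NOT NULL,       -- label of the code that produced the data
    timestep_fs REAL NOT NULL
);
CREATE TABLE coordinate (
    trajectory_id INTEGER NOT NULL REFERENCES trajectory(id),
    frame INTEGER NOT NULL,
    atom  INTEGER NOT NULL,
    x REAL, y REAL, z REAL,
    PRIMARY KEY (trajectory_id, frame, atom)
);
""")


def deposit(conn, molecule, software, timestep_fs, frames):
    """Store one trajectory; frames is a list of frames, each a list of (x, y, z)."""
    cur = conn.execute(
        "INSERT INTO trajectory (molecule, software, timestep_fs) VALUES (?, ?, ?)",
        (molecule, software, timestep_fs),
    )
    tid = cur.lastrowid
    rows = [
        (tid, f, a, x, y, z)
        for f, frame in enumerate(frames)
        for a, (x, y, z) in enumerate(frame)
    ]
    conn.executemany("INSERT INTO coordinate VALUES (?, ?, ?, ?, ?, ?)", rows)
    return tid


def mean_x_per_frame(conn, tid):
    """Toy analysis module: mean x coordinate of each frame."""
    return conn.execute(
        "SELECT frame, AVG(x) FROM coordinate WHERE trajectory_id = ? "
        "GROUP BY frame ORDER BY frame",
        (tid,),
    ).fetchall()


# Two frames of a two-atom toy system (made-up coordinates).
tid = deposit(conn, "toy-dimer", "NAMD", 1.0,
              [[(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
               [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]])
print(mean_x_per_frame(conn, tid))
```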

CyberInfrastructure Needs for Distributed Databases I

• Metadata for trajectories
  – Simulation protocol, software, etc.

• Distribution on the grid
  – Storage fault tolerance / reliability
  – Scalable solution: reduce storage requirements and centralization

CyberInfrastructure Needs for Distributed Databases II

• Data-driven model for the user
  – Data organized around key themes (trajectories, molecules)

• Generic tools for developers
  – Applicable to different applications

Solving Integration Problem

• We need to capture the data flow and the work flow
  – Ecce project
  – XML metadata
  – Component architectures (e.g., JavaBeans, Common Component Architecture)

Solving Integration Problem

• BioCoRE (K. Schulten, Illinois)
  – Use of programming interface
  – Provides multiple services to applications (web file system, job management, shared visualization)

Solving Grid Management

• Current grid tools are task oriented: run this particular simulation code with these input files, etc.
  – Web portals are an incremental improvement over command line or stand alone applications

• Problem: Controlling multiple resources
  – For example, create 10,000 tasks & keep track of the data, as might be needed for virtual screening or @home applications
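
The bookkeeping problem in the last bullet can be sketched in a few lines. The example creates 10,000 task records keyed by their parameters, attaches an output location to one of them, and summarizes progress. The record layout and the output path are hypothetical; the sketch only illustrates why tracking data per result matters once tasks number in the thousands.

```python
from collections import Counter
from itertools import product


def make_tasks(n_targets, n_ligands):
    """Create one task record per (target, ligand) pair, keyed by its parameters."""
    return {
        (t, l): {"status": "pending", "output": None}
        for t, l in product(range(n_targets), range(n_ligands))
    }


def record_result(tasks, key, output):
    """Attach the output location to the task that produced it."""
    tasks[key]["status"] = "done"
    tasks[key]["output"] = output


def summary(tasks):
    """Count tasks by status so progress is visible at a glance."""
    return Counter(t["status"] for t in tasks.values())


tasks = make_tasks(10, 1000)                         # 10,000 tasks
record_result(tasks, (0, 0), "results/t0_l0.out")    # hypothetical path
print(summary(tasks))
```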

Solving Grid Management with GIPSE

• GIPSE: Grid Interface for Parameter-driven Simulation Environments
  – Shift focus from management to research
  – Result-driven interface
  – Scripting capabilities

Solving grid management with GIPSE

• Simplify process
  – XML Data format
  – Missing “glue”

• Powerful searches
  – Optimizations
  – Control loops

[Figure: GEMS Toolset, HIV-1 Protease]

Solving grid management with GIPSE

• Manage data
  – Storage
  – Database retrieval

• Monitor progress
  – Status
  – Application-specific

[Figure: GEMS Toolset, HIV-1 Protease]

GEMS Database Toolset

• Grid Enabled Molecular Simulation

– Data Centric

– Wide area distributed storage

– Researchers have data and resource autonomy

– Simulation configuration, input data files, and output data files identified via XML

– Centralized SQL locator

– Availability via replication
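
The last three items on this slide (XML identification, a centralized SQL locator, and availability via replication) can be illustrated together. The sketch below is an assumption-laden toy, not the GEMS implementation: the XML element and attribute names, the `replica` table, and the host names are all invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def describe(sim_id, config, inputs, outputs):
    """Build an XML descriptor for one simulation (element names are assumptions)."""
    root = ET.Element("simulation", id=sim_id)
    cfg = ET.SubElement(root, "configuration")
    for name, value in sorted(config.items()):
        ET.SubElement(cfg, "parameter", name=name).text = str(value)
    for tag, files in (("input", inputs), ("output", outputs)):
        for path in files:
            ET.SubElement(root, tag, file=path)
    return ET.tostring(root, encoding="unicode")


# A single central table that records where each file is stored.
locator = sqlite3.connect(":memory:")
locator.execute("CREATE TABLE replica (file TEXT, host TEXT, PRIMARY KEY (file, host))")


def register(file, hosts):
    """Record every host that holds a copy of the file."""
    locator.executemany("INSERT INTO replica VALUES (?, ?)", [(file, h) for h in hosts])


def locate(file):
    """Return all hosts holding the file; more than one means it is replicated."""
    rows = locator.execute(
        "SELECT host FROM replica WHERE file = ? ORDER BY host", (file,)
    )
    return [h for (h,) in rows]


xml_text = describe("sim-001", {"timestep_fs": 1.0}, ["protein.pdb"], ["run.dcd"])
register("run.dcd", ["hostA.example.org", "hostB.example.org"])   # hypothetical hosts
print(xml_text)
print(locate("run.dcd"))
```

In this toy, the data itself stays on the researchers' own hosts and only the file-to-host mapping is centralized, which is one reading of how data autonomy and a central locator can coexist.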

Reliability and Leveraged Availability via Runtime Imaging

• Reliability of data storage is increased

• User can trade off availability versus storage volume

• Workspace data has 2-way redundancy by default

• Archival data has a 2-way redundancy of fewer snapshots, but saves the computational images

• For each computational run through the GEMS portal a comprehensive runtime image is created from which the simulation can automatically be regenerated.

• Runtime images include executable version and location, library requirements, hardware requirements, input files, and configuration parameters
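
A runtime image of the kind listed above could be represented as a small immutable record. The sketch below is only a guess at a workable shape: the class name, field names, fingerprint scheme, and command-line format are assumptions, and the executable name is a placeholder.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RuntimeImage:
    """Everything needed to regenerate a run (field names are assumptions)."""
    executable: str            # location of the program
    executable_version: str
    libraries: tuple           # library requirements
    hardware: str              # hardware requirements
    input_files: tuple
    parameters: tuple          # (name, value) pairs

    def fingerprint(self) -> str:
        """Stable hash of the image, usable as a key for regenerable outputs."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def regenerate_command(image):
    """Rebuild an illustrative command line from the image (format is hypothetical)."""
    args = " ".join(f"--{name}={value}" for name, value in image.parameters)
    files = " ".join(image.input_files)
    return f"{image.executable} {args} {files}"


image = RuntimeImage(
    executable="md_engine",                 # placeholder program name
    executable_version="1.0",
    libraries=("libfftw",),
    hardware="x86 cluster",
    input_files=("protein.pdb", "protein.psf"),
    parameters=(("timestep_fs", 1.0),),
)
print(image.fingerprint()[:12])
print(regenerate_command(image))
```

Storing only the image for archival snapshots trades storage volume for recomputation time, which matches the availability versus storage trade-off described on this slide.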

Integration of Distributed Data Into New Simulations

• A grid distributed “make” based on a computational requirement over a set parameter sweep
  – Example: optimize MD simulation protocol

• Before starting the sweep a query determines data points that are up to date and those that require computation (including regeneration)
  – Example: keep current list of results of virtual screening as more computations are performed or targets and ligands added
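
The "make"-style query described above can be sketched as a function that splits a parameter sweep into points that are up to date and points that need computation. The store layout (a mapping from parameter tuple to the version label that produced the result) and the version labels are assumptions for illustration only.

```python
from itertools import product


def plan_sweep(sweep, store, current_version):
    """Split a parameter sweep into up-to-date points and points needing computation.

    store maps a parameter tuple to the version label that produced its result.
    """
    names = sorted(sweep)
    up_to_date, to_compute = [], []
    for combo in product(*(sweep[n] for n in names)):
        if store.get(combo) == current_version:
            up_to_date.append(combo)
        else:                       # missing, or produced by an older version
            to_compute.append(combo)
    return up_to_date, to_compute


# Hypothetical sweep over two MD protocol parameters.
sweep = {"timestep_fs": [1.0, 2.0], "cutoff_A": [10.0, 12.0]}
# Keys follow the sorted parameter names: (cutoff_A, timestep_fs).
store = {(10.0, 1.0): "v2", (10.0, 2.0): "v1"}
fresh, stale = plan_sweep(sweep, store, current_version="v2")
print("up to date:", fresh)
print("to compute:", stale)
```

Only the points in `stale` would be submitted to the grid. Adding a new parameter value to `sweep` leaves existing results untouched and schedules just the new combinations.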

Example: Validating Simulations

• Locate specific published simulation configurations for benchmarking

• Select pertinent input data files (pdb, psf, force fields, etc…) for direct utilization in a new simulation for purpose of comparison/contrast.

• Researcher B wants to vary certain parameters of Researcher A’s published simulation to test her new MD integrator

Acknowledgments

• Collaborators in GIPSE and GEMS:
  – Aaron Striegel
  – Doug Thain
  – Jeff Peng

• Students
  – Paul Brenner
  – Santanu Chatterjee

• Funding from NSF Career and Biocomplexity

• Klaus Schulten

• BioCoRE Team:
  – Robert Brunner
  – Michael Bach
  – David Brandon

• BioCoRE funding from NIH

