Page 1: Thomas Steinke Zuse Institute Berlin (ZIB) steinke@zib.de Activities of the COST D37 GridChem Computational Chemistry Workflow Group EGEE'07 Conference.

Thomas Steinke

Zuse Institute Berlin (ZIB) <www.zib.de>
steinke@zib.de

Activities of the COST D37 GridChem Computational Chemistry Workflow Group

EGEE'07 Conference

Budapest

01.10.2007

2

Partners in the CCWF Working Group

• Thomas Steinke, Tim Clark (DE)
• Hans-Peter Lüthi, Martin Brändle (CH)
• Peter Murray-Rust, Henry Rzepa (UK)
• Antonio Márquez (ES)
• Kurt Mikkelsen (DK)

- CSCS (Manno, CH)
- ZIB (Berlin, DE)

[Map locations: Berlin, Manno, Erlangen, London, Sevilla, Zürich, Cambridge, København]

3

“Traditional” Workflow in Computational Chemistry

Workflows have a long tradition in the CC domain.

• start
• knowledge base (DB search)
• automated/manually edited molecular structures
• molecular simulations: method / program A, method / program B, …
• properties
• primary visualization / quality control
• analysis / archival / DB storage
• new insights?

in the 80’s – 90’s

4

Databases: Computational protocol (T. Clark, 1998)

Complete protocol runs automatically with less than 0.5% failure rate.

Steps: Cleanup → 2D→3D conversion → VAMP optimization → Calculate properties

~3,000 compounds per processor day (3 GHz Xeon)

Enhanced 3D-Databases: A Fully Electrostatic Database of AM1-Optimized Structures. B. Beck, A. Horn, J. E. Carpenter, and T. Clark, J. Chem. Inf. Comput. Sci. 1998, 38, 1214-1217.

source: Tim Clark, Uni Erlangen
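The throughput quoted on this slide can be turned into rough planning numbers: ~3,000 compounds per processor day is about 28.8 s per compound. The sketch below is only an illustration under stated assumptions. The database size of 250,000 compounds is a hypothetical example, not a figure from the talk, and the failure count is an upper bound derived from the "less than 0.5%" statement.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on the slide (approximate).
COMPOUNDS_PER_PROCESSOR_DAY = 3000
MAX_FAILURE_RATE = 0.005  # "less than 0.5% failure rate"

# Average wall time per compound on one processor.
seconds_per_compound = SECONDS_PER_DAY / COMPOUNDS_PER_PROCESSOR_DAY


def processor_days(n_compounds, per_day=COMPOUNDS_PER_PROCESSOR_DAY):
    """Processor days needed to push n_compounds through the protocol."""
    return n_compounds / per_day


def max_expected_failures(n_compounds, rate=MAX_FAILURE_RATE):
    """Upper bound on the number of compounds that need manual attention."""
    return n_compounds * rate


# Hypothetical database size, chosen only for illustration.
n_compounds = 250_000
print(f"{seconds_per_compound:.1f} s per compound")
print(f"{processor_days(n_compounds):.1f} processor days")
print(f"at most about {max_expected_failures(n_compounds):.0f} failures")
```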

5

Distributed Computing Environment in the 90’s

QM packages

6

Distributed Computing Environment in the 90’s

Example: UniChem

distributed environment for quantum-chemical simulations

Cray Research Inc. 1991-(2004)

7

CCWF Chemical Illustrator Applications

• Molecular design of functionalised enzynes
  Hans-Peter Lüthi, Martin Brändle, Zürich; Peter Murray-Rust, Cambridge; Henry Rzepa, London

• Quantum chemical based QSAR/QSPR
  Tim Clark, Erlangen; Jon Essex, Southampton

• High-order dynamic and static electrostatic molecular properties
  Kurt Mikkelsen, Copenhagen

• Computational heterogeneous catalysis
  Antonio M. Márquez Cruz, Javier Fdez. Sanz, Sevilla

8

Molecular Design Workflow (Enzyne Design)

Steps:

• Generation and archiving of data
• Extraction (XPath queries)
• Statistical analysis

[Diagram components: DB, QC Input, QC Application, QC Output, Input, Output, Parser, XML, XPath Query, XSLT, Statistical Analysis]

source: Hans-Peter Lüthi, ETH Zürich
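The extraction step of this slide (XPath queries over parsed XML output) could look roughly like the sketch below. It uses only Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The XML layout, the element and attribute names, and the numeric values are invented for illustration and are not the schema used by the working group.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical parsed QC output; names and values are illustrative only.
XML_DOC = """
<results>
  <molecule id="m1"><property name="energy" units="hartree">-76.40</property></molecule>
  <molecule id="m2"><property name="energy" units="hartree">-40.50</property></molecule>
  <molecule id="m3"><property name="dipole" units="debye">1.85</property></molecule>
</results>
"""


def extract_property(xml_text, name):
    """Return {molecule id: value} for every molecule that has the named property."""
    root = ET.fromstring(xml_text)
    values = {}
    for mol in root.findall("./molecule"):
        # XPath predicate on an attribute value (supported by ElementTree).
        prop = mol.find(f"./property[@name='{name}']")
        if prop is not None:
            values[mol.get("id")] = float(prop.text)
    return values


# Extraction followed by a trivial "statistical analysis" step.
energies = extract_property(XML_DOC, "energy")
mean_energy = mean(energies.values())
print(energies, mean_energy)
```

Keeping extraction as a query over a neutral XML representation means the statistics step does not need to know which QC program produced the data.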

9

Quantum Chemical Based QSAR and QSPR

Workflow: 2D-Database → 2D→3D (Conformations, Tautomers) → VAMP → ParaSurf → QSPR

• 2D→3D: generate structures, conformations and protonation states
• VAMP: semiempirical MO geometry optimization and electron density
• ParaSurf: generate isodensity surfaces, spherical-harmonic fits and local properties
• QSPR: apply models

Application areas: Virtual Screening, ADME/Tox., Pharmacokinetics, Molecular Info, Materials Design, Multiscale Modeling, Property Optimization

source: Tim Clark, Uni Erlangen

10

Properties: Free Energies of Hydration

[Scatter plot: calculated Gsolv(H2O) (kcal mol-1) versus experimental Gsolv(H2O) (kcal mol-1), both axes from -14 to 4]

N = 362
MUE = 0.85 kcal mol-1
RMSD = 1.09 kcal mol-1
r2 = 0.88
q2 = 0.83

source: Tim Clark, Uni Erlangen
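For readers unfamiliar with the quality measures quoted on this slide, the sketch below shows how MUE (mean unsigned error), RMSD and r2 are commonly computed from paired experimental and calculated values. It assumes r2 is the squared Pearson correlation coefficient; the slide does not say how it was obtained. q2 usually denotes a cross-validated r2 and is not reproduced here. The sample values are made up and are not data from the talk.

```python
import math


def mue(experimental, calculated):
    """Mean unsigned (absolute) error."""
    n = len(experimental)
    return sum(abs(c - e) for e, c in zip(experimental, calculated)) / n


def rmsd(experimental, calculated):
    """Root-mean-square deviation."""
    n = len(experimental)
    return math.sqrt(sum((c - e) ** 2 for e, c in zip(experimental, calculated)) / n)


def r_squared(experimental, calculated):
    """Squared Pearson correlation coefficient (an assumed definition of r2)."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_c = sum(calculated) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(experimental, calculated))
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    return cov * cov / (var_e * var_c)


# Made-up hydration free energies in kcal/mol, for illustration only.
experimental = [-6.0, -4.0, -2.0, 0.0]
calculated = [-5.0, -4.5, -2.5, 1.0]

print(mue(experimental, calculated))
print(rmsd(experimental, calculated))
print(r_squared(experimental, calculated))
```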

11

Computing the NCI database (P. Murray-Rust, ’05)

MOPAC PM5

source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute

Workflow built with Taverna

12

Times to run jobs

[Plot: time / s (axis from 0 to 120,000) versus (n basis functions)^4 (axis from 0.E+00 to 1.E+09)]

source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute
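The horizontal axis of this plot is the fourth power of the number of basis functions, which suggests the run times were examined against an n^4 scaling model. A minimal sketch of such a fit follows, assuming a model time ≈ k · n^4 with a least-squares line through the origin. The timing data are synthetic, and the function names are hypothetical.

```python
def fit_quartic_scaling(n_basis, times):
    """Least-squares slope k for the model time = k * n**4 (line through the origin)."""
    x = [n ** 4 for n in n_basis]
    return sum(xi * ti for xi, ti in zip(x, times)) / sum(xi * xi for xi in x)


def predict_time(k, n):
    """Estimated run time in seconds for a job with n basis functions."""
    return k * n ** 4


# Synthetic timings generated from k = 1e-4 s, not measurements from the talk.
n_basis = [50, 100, 150]
times = [1e-4 * n ** 4 for n in n_basis]

k = fit_quartic_scaling(n_basis, times)
print(k, predict_time(k, 120))
```

Such an estimate can help a workflow system order or batch jobs before submission, although real run times may deviate from the simple model.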

13

Protocol

[Diagram labels: Log Files, Parse, System Crashes, Science Errors, Analysis, Pathological Behaviour, Statistics, Other Science, Disseminate Results, Unsuitable Data, Program Crashes, Inform Developer]

source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute

14

source: Peter Murray-Rust et al., Uni Cambridge / Unilever Institute

Conclusions from NCI “Experiment” (2005)

Protocols can be automated

Machines can highlight unusual behaviour, geometries and distribution of results for humans to consider

Computational programs can provide high quality “experimental” molecular properties

15

Motivation

The orchestration of complex workflow scenarios is on today’s agenda.

• complex scientific solution paths linking in-house and (commercial) legacy codes
• transformation of scientific ventures into a scientifically validated protocol, allowing highly (semi-)automated data generation (pre-processing) and data processing steps

16

Goals of the CCWF Working Group

• implementation of workflow environments for QC by adapting standard (Grid) technologies
• fostering standard techniques (interfaces) for handling quantum chemical data in a flexible and extensible format, to ensure interoperability between application programs and to support efficient access to chemical information based on a CC ontology
• implementation of computational chemistry illustrator scenarios to demonstrate the applicability of our approach

17

Generic Workflow

1. Automatic generation + validation of input data
2. Submission, monitoring, and gathering of output data of simulation jobs
3. Integration of results (primary data) into project database
4. Data mining and visualization techniques to reduce complexity
5. Knowledge generation by applying methods of statistical analysis and pattern recognition
6. On-line publication and archiving of valuable scientific data
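The six generic steps can be sketched as a chain of small functions. This is a toy illustration only: every function name is hypothetical, the "simulation" is a stand-in formula rather than a quantum chemistry code, the project database is a plain dictionary, and steps 4 and 5 are collapsed into a single summary statistic.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Job:
    name: str
    input_data: dict
    output: dict = field(default_factory=dict)


def generate_inputs(molecules):
    """Step 1: build inputs and validate them; invalid entries are dropped."""
    return [Job(name, {"n_atoms": n}) for name, n in molecules.items() if n > 0]


def run_jobs(jobs):
    """Step 2: stand-in for submission and monitoring; a toy 'energy' replaces a real run."""
    for job in jobs:
        job.output = {"energy": -1.5 * job.input_data["n_atoms"]}
    return jobs


def store_results(jobs, database):
    """Step 3: integrate primary data into the project database (a dict here)."""
    for job in jobs:
        database[job.name] = job.output
    return database


def analyse(database):
    """Steps 4-5: reduce the primary data to a summary statistic."""
    energies = [record["energy"] for record in database.values()]
    return {"n": len(energies), "mean_energy": mean(energies)}


def publish(summary):
    """Step 6: stand-in for on-line publication and archiving."""
    return f"{summary['n']} results, mean energy {summary['mean_energy']:.2f}"


# Hypothetical input set; "broken" fails validation in step 1.
molecules = {"water": 3, "methane": 5, "broken": 0}
db = store_results(run_jobs(generate_inputs(molecules)), {})
report = publish(analyse(db))
print(report)
```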

18

Challenges

Diversity:
• Molecular properties derived from state functions obtained with electronic-structure methods
• ab-initio, semi-empirical, DFT, approximate potentials
• Gaussian, COLUMBUS, Dalton, Turbomole, MOPAC, Vamp, CPMD…

Data formats:
• How to implement seamless data export/import?
• ~80 relevant formats known in CC: XYZ, MDL, SDF, PDB, …
• OpenBABEL
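To give a feeling for what data export/import involves even for the simplest of these formats, here is a minimal reader and writer for XYZ files (first line: atom count, second line: comment, then one "symbol x y z" line per atom). It is a hand-written sketch and does not use OpenBABEL. The water geometry is illustrative, and the function names are hypothetical.

```python
def parse_xyz(text):
    """Parse an XYZ block into (comment, [(symbol, x, y, z), ...])."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    comment = lines[1] if len(lines) > 1 else ""
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


def write_xyz(comment, atoms):
    """Serialize atoms back into XYZ text with six decimal places."""
    rows = [str(len(atoms)), comment]
    rows += [f"{s} {x:.6f} {y:.6f} {z:.6f}" for s, x, y, z in atoms]
    return "\n".join(rows) + "\n"


# Illustrative geometry only, coordinates in Angstrom.
WATER = """3
water
O 0.000000 0.000000 0.117300
H 0.000000 0.757200 -0.469200
H 0.000000 -0.757200 -0.469200
"""

comment, atoms = parse_xyz(WATER)
round_trip = parse_xyz(write_xyz(comment, atoms))
print(comment, len(atoms))
```

Richer formats carry bonds, charges or residue information, so a lossless round trip between arbitrary formats is much harder than this example suggests.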

19

Challenges (cont.)

Scaling, Robustness, Load Balancing:
• I can handle O(10) jobs by hand, but what about campaigns of O(1000) jobs? → workflow system
• computational resources → distributed computing
• persistence, automated failure recovery, …
• long simulation times, sometimes unpredictable

Acceptance: ease of use, GUI + CLI
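The step from O(10) to O(1000) jobs is mostly about bookkeeping and automated failure recovery. The sketch below shows one simple pattern, a bounded retry loop that records jobs which keep failing. It is an assumption-laden toy: `run_campaign` and `flaky_job` are hypothetical names, the failures are simulated, jobs run sequentially in one process, and persistence across restarts is not covered.

```python
def run_campaign(job_ids, run_job, max_attempts=3):
    """Run every job, retrying failures; return (results, permanently failed jobs)."""
    results, failed = {}, {}
    for job_id in job_ids:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                results[job_id] = run_job(job_id)
                break  # success, move on to the next job
            except RuntimeError as err:
                last_error = err
        else:
            # All attempts were used up without success.
            failed[job_id] = str(last_error)
    return results, failed


attempts = {}


def flaky_job(job_id):
    """Simulated job: every 10th job fails once, job 13 always fails."""
    attempts[job_id] = attempts.get(job_id, 0) + 1
    if job_id == 13:
        raise RuntimeError("simulated persistent crash")
    if job_id % 10 == 0 and attempts[job_id] == 1:
        raise RuntimeError("simulated transient failure")
    return job_id * job_id


results, failed = run_campaign(range(1000), flaky_job)
print(len(results), failed)
```

Only the jobs left in `failed` need human attention, which is the point of delegating a large campaign to a workflow system.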

20

What I Want…

• ease of use: workflow orchestration, usage, installation / maintenance
• sharing of workflow descriptions with my colleagues: standard languages
• support in a heterogeneous environment: laptop – server – cluster – supercomputer – grid

21

Which Workflow System?

… to be spoilt for choice?

22

Some Assessment Criteria

• workflows in distributed systems
• supported batch systems: PBS (LSF)
• support for managing large files
• recovery / backup
• quality of the documentation
• customizability
• PKI / security
• required installation effort
• Web interface
• WF language
• robustness, stability
• Grid environment
• open source
• restart/stop/debugging
• user/installation base
• status & exception handling
• legacy codes and Web services
• project development activity
• GUI

23

TRIANA Experiences (2005/06)

+ workflow orchestration
+ integration of web services
+ semantic check of WSDL files
+ support for self-written Triana modules
+ negligible control logic overhead
+ pre-requisite for migration to Grid environments

- proprietary workflow description language in TRIANA (BPEL is announced)

- GUI robustness for very complex workflow definitions

24

GWES Experiences (MediGRID, since 2006)

+ integration of web services and legacy codes
+ monitoring + debugging support
+ Grid environments
+ under active development (A. Hoheisel et al./FhG FIRST)

- workflow orchestration (WF GUI builder in preparation)

- proprietary workflow description language

25

26

OMII Server: Attractive Features

Workflows
• language: BPEL (Active BPEL)
• WF editor (Eclipse)
• Web Services customization

Job submission & monitoring via WS job manager API
• persistent (job recovery), in-memory (via Hibernate)

Distributed Resource Management (DRM)
• Condor-G, Globus GRAM
• SSH-exec
• your own plug-ins, e.g. PBS

Data
• GridSAM file staging support within job (JSDL): file stage in/out
• Apache Virtual File System library (vfs): FTP, local files, http, https, sftp; zip, jar, tar, bzip2, gzip; ram - data in memory
• GridFTP

27

OMII/Active BPEL Experiences (3 months)

+ workflow orchestration (Eclipse plugin)
+ standardized WF language
+ monitoring support
+ Grid environments
+ security features: https + signed messages (X.509 cert.)
+ active development (UK eScience)

- deployment requires manual workarounds
- learning barrier (BPEL)
- BPEL editor not fully mature (validation of BPEL workflows)

28

Summary

• there are a couple of workflow systems available
• design/development of workflow systems is still ongoing research
• not yet decided for our working group
• barriers:
  • ease of use vs. robustness
  • middleware stack: more complicated Grid environments vs. script-based approaches on clusters
  • standards vs. proprietary but powerful/sufficient WF languages
• BPEL has a high chance to survive

29

Acknowledgement

Core members of D37 CCWF working group
• Hans-Peter Lüthi, ETH Zurich
• Tim Clark, CCC Uni Erlangen
• J. A. Townsend, P. Murray-Rust, S. M. Tyrrell, Y. Zhang, Uni Cambridge/Unilever Inst.

developers of the workflow systems mentioned in this talk

30

QUESTIONS?

