Ilkay ALTINTAS - October, 2007
Ilkay ALTINTAS
Lab Director, Scientific Workflow Automation Technologies
San Diego Supercomputer Center, UCSD
Kepler Scientific Workflows: Current and Future Development
Scientific Workflow Systems
• Combination of
– data integration, analysis, and visualization steps
– automated “scientific process”
• Mission of scientific workflow systems
– Promote “scientific discovery” by providing tools and methods to generate scientific workflows
– Create an extensible and customizable graphical user interface for scientists from different scientific domains
– Support computational experiment creation, execution, sharing, reuse and provenance
– Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources
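To make the idea of an automated "scientific process" with provenance concrete, here is a minimal sketch in plain Python. It is not Kepler code: the step functions, `run_workflow`, `fingerprint`, and the sample readings are all hypothetical names and data chosen only for illustration. It chains an integration, an analysis, and a visualization step, and records one provenance entry per step.

```python
import hashlib
import json
from datetime import datetime, timezone


def integrate(sources):
    """Data integration step: merge records from several sources."""
    return [record for source in sources for record in source]


def analyze(records):
    """Analysis step: compute the mean of the 'value' field."""
    values = [r["value"] for r in records]
    return {"n": len(values), "mean": sum(values) / len(values)}


def visualize(summary):
    """Visualization step: render the summary as one line of text."""
    return f"n={summary['n']} mean={summary['mean']:.2f}"


def fingerprint(obj):
    """Short, stable hash of a JSON-serializable value."""
    text = json.dumps(obj, sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def run_workflow(steps, data):
    """Run the steps in order, recording one provenance entry per step."""
    provenance = []
    for step in steps:
        result = step(data)
        provenance.append({
            "step": step.__name__,
            "input": fingerprint(data),
            "output": fingerprint(result),
            "finished": datetime.now(timezone.utc).isoformat(),
        })
        data = result
    return data, provenance


# Hypothetical sample data standing in for two heterogeneous sources.
site_a = [{"site": "A", "value": 21.5}, {"site": "A", "value": 22.5}]
site_b = [{"site": "B", "value": 19.0}]
report, provenance = run_workflow([integrate, analyze, visualize], [site_a, site_b])
```

Because each entry stores a hash of its input and output, the output hash of one step matches the input hash of the next, which is one simple way to trace a result back through the steps that produced it.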
Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflow
KEPLER = “Ptolemy II + X” for Scientific Workflows
Kepler is a Scientific Workflow System
• … and a cross-project collaboration
• 3rd Beta release (Jan 8, 2007)
www.kepler-project.org
• Builds upon the open-source Ptolemy II framework
Kepler use cases represent many science domains!
• Ecology
– SEEK: Ecological Niche Modeling and climate change
– REAP: Modeling parasite invasions in grasslands using sensor networks
– NEON: Ecological sensor networks
– COMET: Environmental science
• Geosciences
– GEON: LiDAR data processing, Geological data integration
– NEESit: Earthquake engineering
• Molecular biology
– SDM: Gene promoter identification and ScalaBLAST
– ChIP-chip: Genome-scale research
– CAMERA: Metagenomics
• Oceanography
– REAP: SST data processing
– LOOKING/OOI CI: ocean observing CI
– ROADNet: real-time data modeling and analysis
– Ocean Life project
• Phylogenetics
– ATOL: Processing Phylodata
– CiPRES: Phylogenetic tools
• Chemistry
– Resurgence: Computational chemistry
– DART/ARCHER: X-Ray crystallography
• Library science
– DIGARCH: Digital preservation
– UK Text Mining Center: Cheshire feature and archival
• Conservation biology
– SanParks: Thresholds of Potential Concerns
• Physics
– SDM: astrophysics TSI-1 and TSI-2
– CPES: Plasma fusion simulation
– ITER-EU: ITM fusion workflows
Kepler today is a research prototype and a production workflow tool!
• Some of the current R&D
– Distributed execution of workflow parts (peer to peer)
– Efficient data transfer
– Provenance tracking of data and processes
– Tracking workflow evolution
– Streaming data analysis
– Easy-to-deploy batch interfaces
– Intuitive workflow design
– Customizable semantic typing
– Interoperability with other workflow and analytical environments (at exec level)
• Production workflow examples:
– GEON LiDAR workflow (GLW)
  • 116 registered, 106 active users
  • 2076 submitted jobs to date
– Center for Plasma Edge Simulation Code-Coupling Workflow (CPES-CCW)
  • 2000 actors, 5 levels of model hierarchy
  • Longest run duration: 3 hours
– PtII AirForce Lab Model
  • 12920 actors, 65331 attributes
  • Longest run duration: 10 minutes
– Longest running real-time simple monitoring model in PtII: months at a time
• All generated using the GUI and executed in batch mode…
– No coding and text manipulation
REAP: Realtime Environment for Analytical Processing
• Funded 2006-2009
– NSF CEO:P
  • Jones (PI), Altintas, Baru, Ludaescher, Schildhauer
– Partners:
  • NCEAS/UCSB (Lead), SDSC/UCSD, UCDavis, CENS/UCLA, OpenDAP, OSU
• Management and Analysis of Observatory Data using Kepler Scientific Workflows
• The vision:
– An integrated environment for analyzing data from observatories
• Two scientific use cases:
– Terrestrial ecology
– Oceanography
reap.ecoinformatics.org
REAP Views
• For data-grid engineers
– monitoring and management capabilities of underlying sensor networks
• For outside users
– access to observatory data and results of models, approachable to non-scientists
• For scientists
– capabilities for designing and executing complex analytical models over near real-time and archived data sources
REAP: Terrestrial Ecology Usecase
Workflows to develop and test models exploring the impacts of abiotic factors (real-time light, temperature, and rainfall measurements) on the dynamics of plant host populations and their susceptibility to viral pathogens.
REAP: RBNB Streaming Data Actor
Example data from Terrestrial UseCase Hardware: a Campbell Scientific CR800 datalogger with eight attached sensors, operating on a workbench.
REAP: Oceanographic Usecase
Facilitate the quantitative evaluation of SST data sets.
Kepler/C.O.R.E
• Funded 2007-2010
– NSF SDCI
  • Ludaescher (PI), Altintas, Bowers, Jones, McPhillips, Schildhauer
– Partners:
  • Genome Center/UCDavis (Lead), SDSC/UCSD, NCEAS/UCSB
• SDCI NMI Improvement: Development of Kepler/CORE – A Comprehensive, Open, Reliable, and Extensible Scientific Workflow Infrastructure
• The vision:
– Coordinate development of a comprehensive, open, reliable and extensible Kepler scientific workflow infrastructure
kepler-project.org
Builds on community participation as a driving force for Kepler.
Kepler/C.O.R.E.
• Comprehensive
– First-class support for technical features
• Open
– well designed and clearly articulated mechanisms and interfaces provided to facilitate developing extensions
• Reliable
– Both as a development platform and as a run-time environment for the user
• Extensible
– Independently extensible by groups not directly collaborating with the team
Directors in Kepler
• Means to execute networks of components under multiple execution models
– Dataflow (SDF, PN, DDF) vs. time-based (CT) vs. event-based (DE) vs. all combined
• Makes use of separation of concerns principle
– e.g., component execution, workflow execution and provenance tracking
• The manager acts like a “common execution environment”
– governing different concerns related to execution of the network and services
Ptolemy and Kepler are unique in combining different execution models in heterogeneous models!
Process Networks, Rendezvous, Publish and Subscribe, Continuous Time, Finite State Machines, Dataflow, Time Triggered, Synchronous/reactive model, Discrete Event, Wireless
Credits
• Kepler community and colleagues
• On REAP and Kepler/CORE:
– Shawn Bowers, Bertram Ludaescher, Timothy McPhillips, Genome Center, UCD
– Matt Jones, Derik Barseghian, Mark Schildhauer, NCEAS, UCSB
– Eric Seabloom, OSU
– Peter Cornillion, OpenDAP
Ilkay [email protected]
+1 (858) 822-5453
http://www.sdsc.edu
Questions…