
Overview of the Science Environment for Ecological Knowledge (SEEK) Ricardo Scachetti Pereira.

Date post: 28-Mar-2015
Upload: antonio-baker
Overview of the Science Environment for Ecological Knowledge (SEEK) http://seek.ecoinformatics.org http://kepler-project.org Ricardo Scachetti Pereira (with many, many slides from Matt Jones, Bertram Ludäscher, Ilkay Altintas, Chad Berkeley and others) University of Kansas, USA June 30, 2005
Transcript
Page 1: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Overview of the Science Environment for Ecological Knowledge

(SEEK)

http://seek.ecoinformatics.org http://kepler-project.org

Ricardo Scachetti Pereira (with many, many slides from Matt Jones, Bertram Ludäscher, Ilkay Altintas, Chad Berkeley and others)

University of Kansas, USA
June 30, 2005

Page 2: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

http://seek.ecoinformatics.org

SWDB Aug 29, 2004

June, 2005

Outline

• Introduction to SEEK

• Introduction to Kepler

• Kepler capabilities and sample workflows

• Current and future developments

Page 3: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

What is SEEK?

Science Environment for Ecological Knowledge

Multidisciplinary project to create:

Scientific-workflow system (Kepler)
  – Design, document, reuse, and execute scientific analyses

Distributed data network (EcoGrid)
  – Environmental, ecological, and systematics data

Knowledge Representation & Semantic Mediation
  – Discover, integrate, and compose hard-to-relate data and services via ontologies

Taxonomic, Biology, and Education subcomponents

Collaborators (the SEEK team)
• NCEAS, UNM, SDSC/UCSD, U Kansas, UC Davis
• Vermont, Napier, ASU, UNC

Page 4: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Scientific Workflows

• Model the way scientists work with their data now
  – Mentally coordinate export and import of data among software systems
    1) Capture data in the field
    2) Digitize it into Excel spreadsheets
    3) Export as CSV files
    4) Import into statistical package
    5) Perform analysis
    6) Export results, tables and graphics
    7) Write and publish article

Query EcoGrid to find data

Archive output to EcoGrid with workflow metadata
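
The manual steps above can be read as a chain of small functions, which is roughly what a workflow system makes explicit. The following is a minimal Python sketch of steps 4 to 6 only; the function names, the sample data, and the choice of a mean as the "analysis" are illustrative assumptions, not part of SEEK or Kepler.

```python
import csv
import io
import statistics


def import_csv(text):
    """Step 4: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def analyze(rows, column):
    """Step 5: a stand-in analysis, the mean of one numeric column."""
    values = [float(row[column]) for row in rows]
    return {"column": column, "n": len(values), "mean": statistics.mean(values)}


def export_results(result):
    """Step 6: format the result as a small CSV table."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["column", "n", "mean"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerow(result)
    return out.getvalue()


def run_workflow(csv_text, column):
    """Chain the steps so that no manual export/import is needed."""
    return export_results(analyze(import_csv(csv_text), column))


# Hypothetical field data, as it might look after step 3 (export as CSV).
FIELD_DATA = "site,count\nA,4\nB,6\nC,8\n"
REPORT = run_workflow(FIELD_DATA, "count")
```

Because each step is a separate function with explicit inputs and outputs, the chain can be documented, rerun, and rearranged, which is the property the slide attributes to scientific workflows.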

Page 5: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Scientific Workflows

• Scientific workflows:
  – Are not linear
  – Involve multiple data sets
  – Involve multiple analytical steps

Page 6: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Metadata driven data ingestion

• Key information needed to read and machine process a data file is in the metadata
  – File descriptors (CSV, Excel, RDBMS, etc.)
  – Entity (table) and Attribute (column) descriptions
    • Name
    • Type (integer, float, string, etc.)
    • Codes (missing values, nulls, etc.)
    • In the future, this will include semantic typing
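
As an illustration of the idea, the sketch below reads a headerless CSV file using only a metadata record that lists each attribute's name, type, and missing-value codes. The dictionary layout is a hypothetical simplification made up for this example; real EML documents are XML and considerably richer.

```python
import csv
import io

# Hypothetical, simplified metadata record (not the EML schema).
METADATA = {
    "format": "csv",
    "attributes": [
        {"name": "site", "type": "string", "missing": [""]},
        {"name": "count", "type": "integer", "missing": ["-999", "NA"]},
        {"name": "area_m2", "type": "float", "missing": ["-999", "NA"]},
    ],
}

# Map the declared attribute types onto Python constructors.
CASTS = {"string": str, "integer": int, "float": float}


def ingest(text, metadata):
    """Read delimited text, casting each column as the metadata describes."""
    if metadata["format"] != "csv":
        raise ValueError("this sketch only handles CSV")
    names = [attr["name"] for attr in metadata["attributes"]]
    rows = []
    for raw in csv.DictReader(io.StringIO(text), fieldnames=names):
        row = {}
        for attr in metadata["attributes"]:
            value = raw[attr["name"]]
            # Declared missing-value codes become None instead of bogus numbers.
            if value in attr["missing"]:
                row[attr["name"]] = None
            else:
                row[attr["name"]] = CASTS[attr["type"]](value)
        rows.append(row)
    return rows


DATA = "plot1,12,4.0\nplot2,-999,2.5\nplot3,7,NA\n"
ROWS = ingest(DATA, METADATA)
```

The reader never hard-codes column names or types; changing the metadata record is enough to ingest a differently structured file.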

Page 7: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Metadata driven data ingestion

• Metadata is revised following any transformation
• Versioning of metadata and data is very important
• This process results in a lineage of the data file as it has been transformed
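
One way to picture this is a record that carries data, metadata, a revision number, and a growing lineage list. The sketch below is an assumption-laden illustration: the record layout, the `transform` helper, and the use of a short SHA-256 fingerprint to identify the parent are all invented for this example and do not describe how SEEK stores lineage.

```python
import hashlib
import json


def fingerprint(obj):
    """Stable short hash of data or metadata (JSON with sorted keys)."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def transform(version, step_name, func, metadata_update):
    """Apply func to the data and return a new version linked to its parent."""
    new_data = func(version["data"])
    # Metadata is revised together with the data it describes.
    new_metadata = {**version["metadata"], **metadata_update}
    return {
        "data": new_data,
        "metadata": new_metadata,
        "revision": version["revision"] + 1,
        "lineage": version["lineage"]
        + [{"step": step_name, "parent": fingerprint(version["data"])}],
    }


original = {
    "data": [12, -999, 7],
    "metadata": {"attribute": "count", "missing_code": -999},
    "revision": 1,
    "lineage": [],
}

# Dropping the missing-value code changes both the data and its metadata.
cleaned = transform(
    original,
    "drop_missing",
    lambda values: [v for v in values if v != -999],
    {"missing_code": None},
)
```

The original record is left untouched, so every revision remains available and the lineage list shows which step produced each one.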

Page 8: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Data integration

• Integration of heterogeneous data requires much more advanced metadata and processing
  – Attributes must be semantically typed
  – Collection protocols must be known
  – Units and measurement scale must be known
  – Measurement mechanics must be known (e.g., that Density = Count/Area)
  – This is an advanced research topic within the SEEK project
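
The Density = Count/Area relationship shows why units and semantic types both matter: two data sets can only be merged once their areas are on a common scale. The sketch below is a small Python illustration; the `Measurement` class, the type labels, and the unit table are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str            # e.g. "individuals", "m2", "individuals/m2"
    semantic_type: str   # e.g. "Count", "Area", "Density"


# Conversion factors to square metres (a small, assumed lookup table).
AREA_TO_M2 = {"m2": 1.0, "ha": 10_000.0, "km2": 1_000_000.0}


def density(count, area):
    """Derive Density = Count / Area, refusing inputs of the wrong kind."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError("density needs a Count and an Area")
    area_m2 = area.value * AREA_TO_M2[area.unit]
    return Measurement(count.value / area_m2, f"{count.unit}/m2", "Density")


# Two hypothetical surveys that report area in different units.
site_a = density(
    Measurement(50, "individuals", "Count"), Measurement(100, "m2", "Area")
)
site_b = density(
    Measurement(2500, "individuals", "Count"), Measurement(0.5, "ha", "Area")
)
```

Both sites come out at 0.5 individuals per square metre, which only becomes visible after the hectare value is converted; without the unit metadata the two raw numbers would not be comparable.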

Page 9: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Semantic typing

• Label data with semantic types
• Label inputs and outputs of analytical components with semantic types
• Use SMS to generate transformation steps
  – Beware analytical constraints
• Use SMS to discover relevant components
• Ontology = specification of a conceptualization (a knowledge map)

Figure labels: Data, Ontology, Workflow Components
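
Component discovery by semantic type can be sketched with a toy is-a hierarchy: a component is relevant if the type it accepts is the same as, or more general than, the type of the data. Everything named below (the concepts, the component names, the `discover` function) is hypothetical and far simpler than a real ontology-based mediation system.

```python
# Toy is-a ontology: child concept -> parent concept (all names hypothetical).
IS_A = {
    "SpeciesCount": "Count",
    "Count": "Measurement",
    "Biomass": "Measurement",
    "Area": "Measurement",
}


def is_subtype(concept, ancestor):
    """True if concept equals ancestor or reaches it by following is-a links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


# Analytical components labelled with the semantic type of their input.
COMPONENTS = {
    "ShannonIndex": "SpeciesCount",
    "SummaryStatistics": "Measurement",
    "BiomassRegression": "Biomass",
}


def discover(data_type):
    """Return the components whose input type subsumes the data's type."""
    return sorted(
        name
        for name, input_type in COMPONENTS.items()
        if is_subtype(data_type, input_type)
    )
```

For example, `discover("SpeciesCount")` returns `['ShannonIndex', 'SummaryStatistics']`, while `discover("Area")` returns only `['SummaryStatistics']`, because an area is a measurement but not a species count.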

Page 10: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

SEEK Components Revisited

Page 11: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

SEEK EcoGrid

• Goal: allow diverse environmental data systems to interoperate
  – Hides complexity of underlying systems using lightweight interfaces
  – Integrates diverse data networks from ecology, biodiversity, and environmental sciences

• Data systems
  – Any system can implement these interfaces
  – Prototyping using:
    • Metacat, SRB, DiGIR, Xanthoria, etc.

• Supports multiple metadata standards
  – EML, Darwin Core as foci

• Implemented as OGSA Grid Services
  – Query()
  – Get()
  – Put()
  – Login()
  – …

• Tiered implementation critical to adoption
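
The idea of a lightweight interface that any data system can implement can be sketched as an abstract class with the four operations named on the slide. The argument lists, return values, and the in-memory backend below are assumptions for illustration only; the slide gives the operation names but not their signatures, and this is not the EcoGrid API.

```python
from abc import ABC, abstractmethod


class EcoGridNode(ABC):
    """Hypothetical minimal interface; only the method names come from the slide."""

    @abstractmethod
    def login(self, user, password): ...

    @abstractmethod
    def query(self, session, keyword): ...

    @abstractmethod
    def get(self, session, identifier): ...

    @abstractmethod
    def put(self, session, identifier, document): ...


class InMemoryNode(EcoGridNode):
    """Stand-in backend; a real node would wrap an existing data system."""

    def __init__(self):
        self._docs = {}
        self._sessions = set()

    def login(self, user, password):
        # No real authentication in this sketch: any credentials are accepted.
        session = f"session-{user}"
        self._sessions.add(session)
        return session

    def _check(self, session):
        if session not in self._sessions:
            raise PermissionError("not logged in")

    def query(self, session, keyword):
        self._check(session)
        return sorted(
            ident for ident, doc in self._docs.items()
            if keyword.lower() in doc.lower()
        )

    def get(self, session, identifier):
        self._check(session)
        return self._docs[identifier]

    def put(self, session, identifier, document):
        self._check(session)
        self._docs[identifier] = document
```

A client written against `EcoGridNode` would not need to know which backend answers it, which is the sense in which the interfaces hide the complexity of the underlying systems.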

Page 12: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Kepler: Scientific Workflows

• Implements the workflow system in SEEK

• Open, collaborative effort of:
  – SEEK, SciDAC/SDM, GEON, Ptolemy Project
  – Ecology, biodiversity, molecular bio, geology, engineering

• Based on Ptolemy II system

• Kepler aims to extend the Ptolemy system with:
  – Web and grid service access
  – Data integration support
  – Semantic reasoning

• Kepler actors are written in Java but can wrap other applications (such as MATLAB, GRASS)

• Actors can call arbitrary Web (or Grid) Services

• Ptolemy already has a very large inventory of actors
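
The notion of an actor that wraps an external application can be illustrated outside Kepler. Kepler actors themselves are Java classes; the Python sketch below only mirrors the idea and uses none of the Kepler or Ptolemy II APIs. The class name, the `fire` method, and the wrapped command are assumptions for this example.

```python
import subprocess
import sys


class CommandLineActor:
    """Toy actor: one input, one output, wrapping an external program."""

    def __init__(self, command):
        self.command = command  # argument list for the wrapped program

    def fire(self, token):
        """Send the input token to the program's stdin and return its stdout."""
        completed = subprocess.run(
            self.command, input=token, capture_output=True, text=True, check=True
        )
        return completed.stdout.strip()


# The "external application" here is just another Python process that
# upper-cases its input; it stands in for a tool such as a statistics
# or GIS package.
upper = CommandLineActor(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
)
RESULT = upper.fire("niche model")
```

The workflow only sees the actor's input and output, so the wrapped program can be replaced without changing the rest of the workflow.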

Page 13: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Actor Search and Browse

• Actors Panel
  – Large number of actors
  – Organized hierarchically
  – Search makes it easy to find the right actor
  – Ontology-based

• Plan to support multiple views

Page 14: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

EcoGrid: EML Data Access

Page 15: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

EcoGrid: Queries

Page 16: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

EcoGrid: Queries

Page 17: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

EML Metadata Display

Page 18: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

EcoGrid: DarwinCore Access

Page 19: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Kepler: database access

Page 20: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Kepler: web service example

Page 21: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Kepler: grid services access

Page 22: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Kepler: ecological modeling

Page 23: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

New ENM Workflow

Page 24: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Data Analysis: Biodiversity Indices

Page 25: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

‘R’ in Kepler

Source: Dan Higgins, Kepler/SEEK

Page 26: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

ORB

Page 27: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Kepler today

• Supports scientific workflows
  – Ecology, molecular bio, geology, …
  – Variety of analytical components (including spatial data transformations)
  – Support for R scripts and Matlab scripts

• EcoGrid access to heterogeneous data
  – EML Data support
    • Experimental data, survey data, spatial raster and vector data, etc.
  – DarwinCore Data support
    • Museum collections
  – EcoGrid registry to discover data sources

• Ontology-based browsing for analytical components
  – Exploit semantics to improve the user experience

• Demonstration workflows
  – Ecology: Ecological Niche Modeling
  – Genomics: Promoter Identification Workflow
  – Geology: Geologic Map Information Integration
  – Oceanography: Real-time Revelle example of data access

Page 28: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Kepler this year

• Usability engineering
  – Full evaluation and user-oriented customization of all UI components

• Distributed computing/grid computing
  – Large jobs, lots of machines
  – Detached execution

• Component repository / downloadable components

• “Smart” data and component discovery
  – Support annotating data sources

• Automated data and service integration and transformation using ontologies

• Complete EcoGrid access
  – Full EML support
  – Support for “large” data and 3rd-party transfer
  – More data sources and types of data sources (e.g., JDBC, GEON data)

• Provenance and metadata propagation

Page 29: Overview of the Science Environment for Ecological Knowledge (SEEK)   Ricardo Scachetti Pereira.

Acknowledgements

This material is based upon work supported by:

The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.

Collaborators: NCEAS (UC Santa Barbara), University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research), University of Vermont, University of North Carolina, Napier University, Arizona State University, UC Davis

The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.

The Andrew W. Mellon Foundation.

Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON

