
Intelligent Distributed Data Management in Earth system science

Date post: 15-Jan-2016
Page 1: Intelligent Distributed Data Management in  Earth system science

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Intelligent Distributed Data Management in Earth system science

K. Ronneberger, DKRZ, Germany
S. Kindermann, DKRZ, Germany
T. Brücher, University of Cologne, Germany
H. Ramthun, M&D, Germany
M. Stockhause, MPI-Met, IFM-Geomar, Germany

Page 2: Intelligent Distributed Data Management in  Earth system science

1st EU-Review May 15.-16 2007 2

Enabling Grids for E-sciencE

INFSO-RI-031688

QFLUX: Humidity flux calculation

Page 3: Intelligent Distributed Data Management in  Earth system science

Structure

• What is Earth system science about?
– Typical workflows
– Traditional infrastructure

• Why can grid technology help?
– Limits of the current practice
– Outline of possible and existing use areas

• How do we use this technology?
– Conceptual outline of the developing infrastructure
– Demo of an example workflow

• Potential impact and vision
– Next steps and challenges

Page 4: Intelligent Distributed Data Management in  Earth system science

Earth system sciences

• Goal: learn about the past, the present, and possible futures of the Earth system

• Community: distributed internationally and across disciplines, but strongly interconnected

• Method: analysing, comparing and processing data

• Input: data from observations and/or other modelling studies

Typical workflow (diagram):

(1) Find & Select from distributed climate data (model data, observation data, scenario data) and their data description
(2) Collect & Prepare: analysis dataset
(3) Analyse: result dataset
(4) Visualize

Page 5: Intelligent Distributed Data Management in  Earth system science

An example workflow: “qflux”

(1) Find & Select relevant & available datasets (distributed climate data)
(2) Collect & Prepare a temporal and spatial subset of the data (analysis dataset)
(3) Analyse the integrated transport of humidity between selected levels (result dataset)
(4) Visualize selected result

Input variables: wind speed, temperature, specific humidity

Data volume along the workflow (in slide order): several PB; ~3.1 TB (300-500 files); ~10.3 GB (28 files); ~76 MB; ~6 MB; ~66 KB

Location along the workflow (in slide order): various data centers & portals; institutional storage & computing facilities; local facilities; personal computer
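The slides do not give the formula behind the "Analyse" step. A common definition of the vertically integrated humidity transport between two pressure levels is (1/g) times the integral of q·v over pressure, with specific humidity q and wind speed v. The following is a minimal sketch under that assumption; the function name, the trapezoidal rule and the sample profile are illustrative and not taken from the qflux tool. Temperature, which the slide also lists as an input, is not used here.

```python
G = 9.80665  # standard gravity in m/s^2


def integrated_humidity_flux(pressure_pa, specific_humidity, wind_ms):
    """Trapezoidal estimate of (1/g) * integral of q*v dp over the given levels.

    pressure_pa: pressure levels in Pa, ordered from one bound to the other
    specific_humidity: q in kg/kg on each level
    wind_ms: one wind component in m/s on each level
    Returns the transport in kg per metre per second.
    """
    if not (len(pressure_pa) == len(specific_humidity) == len(wind_ms)):
        raise ValueError("all profiles must have the same number of levels")
    if len(pressure_pa) < 2:
        raise ValueError("need at least two levels")
    total = 0.0
    for k in range(len(pressure_pa) - 1):
        # Layer thickness in pressure, independent of the ordering of levels
        dp = abs(pressure_pa[k + 1] - pressure_pa[k])
        flux_lo = specific_humidity[k] * wind_ms[k]
        flux_hi = specific_humidity[k + 1] * wind_ms[k + 1]
        total += 0.5 * (flux_lo + flux_hi) * dp
    return total / G


if __name__ == "__main__":
    # Hypothetical three-level profile between 1000 hPa and 700 hPa
    levels = [100000.0, 85000.0, 70000.0]
    q = [0.012, 0.008, 0.004]
    u = [5.0, 8.0, 12.0]
    print(integrated_humidity_flux(levels, q, u))
```

Applied per grid point and time step, a reduction of this kind collapses several three-dimensional input fields into one two-dimensional field, which is consistent with the shrinking data volumes listed above.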

Page 6: Intelligent Distributed Data Management in  Earth system science

Potential use of grid technology

Current issues:

• Search & select
– Different portals with different authentications and data descriptions

• Collect & prepare
– Different access mechanisms of the different providers
– Pre-processing requires sufficient local facilities

• Analyse
– Existing tools and already processed data are available locally and lack a proper description

• Visualize
– Detached from the remaining workflow

Potential use:

• Central unique authentication to a common catalogue with standardized metadata

• Shared resources with standardized access hiding proprietary access mechanisms

• Commonly defined tool description

• Log processing steps and automatically republish processed data

• Integrate basic visualization (first peek) into the workflow
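The idea of logging processing steps so that derived data can be republished with its history can be illustrated with a small sketch. Everything below is hypothetical: the class names, fields and identifiers are invented for illustration and do not reflect the C3Grid or AMGA schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ProcessingStep:
    """One logged processing step (hypothetical structure)."""
    tool: str
    parameters: str


@dataclass
class DatasetRecord:
    """Minimal metadata record carrying its processing history."""
    identifier: str
    title: str
    lineage: List[ProcessingStep] = field(default_factory=list)


def derive(source, new_id, title, step):
    """Return a record for a derived dataset; the source record is left unchanged."""
    return DatasetRecord(new_id, title, source.lineage + [step])


if __name__ == "__main__":
    raw = DatasetRecord("raw-1", "Model output")
    subset = derive(raw, "sub-1", "Regional subset",
                    ProcessingStep("subset", "levels=850-500hPa"))
    result = derive(subset, "res-1", "Humidity flux",
                    ProcessingStep("qflux", "integration=trapezoidal"))
    for step in result.lineage:
        print(step.tool, step.parameters)
```

Because each derived record copies and extends the lineage of its source, a republished result carries the full chain of steps that produced it.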

Page 7: Intelligent Distributed Data Management in  Earth system science

C3 Grid and EGEE - the components

Workflow steps (side labels): Find & select, Collect & prepare, Analyse, Visualize

• Central web portal: unique entry point to the common central metadata catalogue (Lucene index) and access facility

• Standardized metadata: hierarchical description of discovery aspects and some use aspects of the data (ISO 19115/ISO 19139)

• Standardized access interface: hides the complexity of specific data access mechanisms and pre-processing functionalities (webservice technology)

• Automatic update and republishing of metadata: metadata of data processing is logged, managed and can be harvested (AMGA + Java extension, OAI-PMH server)
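Harvesting over OAI-PMH amounts to an HTTP request with a `verb` parameter and an XML response. The sketch below builds a `ListRecords` request and reads the record headers from a response. The base URL and the `iso19139` metadata prefix are assumptions (prefixes are defined by each repository), the sample response is made up, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build a ListRecords request; from_date enables incremental harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def record_headers(response_xml):
    """Return (identifier, datestamp) pairs for every record in a response."""
    root = ET.fromstring(response_xml)
    headers = []
    for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
        headers.append((
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
        ))
    return headers


# Made-up response with a single record, for illustration only
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:qflux-result-1</identifier>
        <datestamp>2007-05-15</datestamp>
      </header>
      <metadata/>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai", "iso19139", "2007-05-01"))
    print(record_headers(SAMPLE_RESPONSE))
```

Passing a `from` date restricts the harvest to records changed since the last run, which is what makes periodic republishing of updated metadata inexpensive for the harvesting portal.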

Page 8: Intelligent Distributed Data Management in  Earth system science

Data access in ESR grid projects

Scope (project)
– Earth System Grid project (USA): high performance access of climate model data
– C3 Grid (Germany): uniform & effective discovery and access of data of various disciplines & types
– NERC data grid (UK): harmonized & detailed search and access of data of various disciplines & types

Data stock (status)
– Earth System Grid project (USA): homogeneous; flat-file storage
– C3 Grid (Germany): heterogeneous; databases & flat-file storage
– NERC data grid (UK): heterogeneous; databases & flat-file storage

Data description (solution)
– Earth System Grid project (USA): use aspect of data, tools and models; e.g. NcML for netCDF data
– C3 Grid (Germany): discovery and some use aspects; ISO 19115/ISO 19139
– NERC data grid (UK): content of the data in great detail; semantic data model (CSML, based on GML)

Data access (solution)
– Earth System Grid project (USA): different protocols; intelligence at portal
– C3 Grid (Germany): uniform access interface; intelligence at data provider / grid
– NERC data grid (UK): different protocols; intelligence at portal

Page 9: Intelligent Distributed Data Management in  Earth system science

Bridging EGEE and C3

[Architecture diagram]

Components shown: EGEE (UI, SE, CE, WNs, LFC catalog, AMGA metadata catalog); C3Grid data interface (climate data, workspace); Web Portal C3 (Lucene index); webservice interfaces; OAI-PMH servers; data resource and metadata

German climate data providers: WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS

Labelled arrows:
(a) Publish (ISO 19115/19139)
(b) Harvest (OAI-PMH)
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)

Page 10: Intelligent Distributed Data Management in  Earth system science

Demo

(1) Search, discover and select functionality of the portal

(2) Upload and register data to EGEE

(3) Trigger the example workflow qflux from the portal

Page 11: Intelligent Distributed Data Management in  Earth system science

Upload pre-processed data to EGEE

[Architecture diagram: EGEE (UI, SE, CE, WNs, LFC catalog, AMGA metadata catalog), C3Grid data interface (climate data, workspace), Web Portal C3 (Lucene index), webservice interfaces, OAI-PMH servers, data resource and metadata]

Workflow steps covered: (1) Find & Select, (2) Collect & Prepare

Labelled arrows:
(a) Request (webservice)
(b) Retrieve (JDBC or archive)
(c) Stage & Provide
(d) Notify
(e) Request (webservice)
(f) Transfer & Register (lcg-tools)
(g) Register (Java API)
(f) Publish (ISO 19115/19139)

Page 12: Intelligent Distributed Data Management in  Earth system science

Trigger qflux workflow

[Architecture diagram: EGEE (UI, SE, CE, WNs, LFC catalog, AMGA metadata catalog), C3Grid data interface (climate data, workspace), Web Portal C3 (Lucene index), webservice interfaces, OAI-PMH servers, data resource and metadata]

Workflow steps covered: (3) Analyse, (4) Visualize

Labelled arrows:
(a) Request (webservice)
(b) Submit qflux (gLite)
(c) Retrieve (lcg-tools)
(d) Update (Java API)
(e) Return graphic
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)

Page 13: Intelligent Distributed Data Management in  Earth system science

Potential Impact

• Potential impact on the German ESR community: ease and accelerate the search, discovery, access and processing of German ESR data

• Potential impact on the current and potential EGEE ESR community: provide a framework to easily and consistently exchange and manage ESR data and tools between EGEE and traditional earth science data storage systems

• Potential impact on the international ESR community: other portals or infrastructures can be integrated analogously to EGEE

• Potential impact on other disciplines: built on international standards and thus easily adaptable and expandable by other disciplines and by further partners

Page 14: Intelligent Distributed Data Management in  Earth system science

Next steps

• Expand the demonstrated prototype to a reliable and stable system

• Port further workflows and some pre-processing functionalities to EGEE

• Enlarge the user community

Page 15: Intelligent Distributed Data Management in  Earth system science

Future challenges or missing bricks

• Establish a comprehensive and consistent security context to control access to (restricted) data with a single sign-on
– C3Grid is starting to implement a federated AA infrastructure based on Shibboleth

• Describe analysis services to improve discovery, use and sharing
– First approaches to adapt ISO 19119/ISO 19139 as a common metadata format for tool description

• Modularize workflows to increase flexibility and enable intelligent scheduling
– First steps to implement a workflow information service

