Post on 21-Jan-2016

A Cyberinfrastructure Framework for Discovery, Integration, and Analysis of Earth Science Data: A Prototype System

A. K. Sinha*, Z. Malik*, A. Rezgui*, A. Dalton*, K. Lin**

* Virginia Tech

** San Diego Supercomputer Center

2

Hypothesis Evaluation: Are A-Type Rocks in Virginia related to a Hot Spot Trace?

[Figure: Spatio-Temporal Distribution of Igneous Rocks. Labels: Laurentian Crust and Lithosphere; Plume Head; Hot Spot Trace?]

3

GEON’s DIA Engine

Evaluating a hypothesis requires:

- Discovery: access to data
- Integration of data: provide data products
- Analysis of data: verify the hypothesis

4

Data Discovery

Registration of data: prerequisite for data discovery

- Level 1 Registration: Keywords
- Level 2 Registration: Ontologic Classes
- Level 3 Registration: Item Detail Level

5

Registration of Data: Key to Discovery, Integration and Analysis

- Level 1: Discovery of data resources (e.g., gravity, geologic maps) requires registration through high-level index terms. GEON has deployed an extension of the AGI index terms, which will be cross-indexed to others such as GCMD and AGU.
- Level 2: Discovering item-level databases requires registration to data-level ontologies (e.g., bulk rock geochemistry, gravity database).
- Level 3: Item-detail-level registration (e.g., the column in a geochemical database that represents the SiO2 measurement). This level of registration is a requirement for semantic integration.
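The three registration levels can be pictured as three fields on a catalog record. The following is only a minimal sketch under stated assumptions: the `Registration` class, the `discover` function, and every resource name, index term, and column name are hypothetical illustrations, not GEON's actual registration schema.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    """One registered data resource; all values used below are hypothetical."""
    resource: str
    index_terms: set          # Level 1: high-level index terms (keywords)
    ontology_classes: set     # Level 2: data-level ontology classes
    item_map: dict = field(default_factory=dict)  # Level 3: column -> concept


def discover(catalog, keyword=None, ontology_class=None, concept=None):
    """Return the names of resources matching every criterion that was given."""
    hits = []
    for reg in catalog:
        if keyword is not None and keyword not in reg.index_terms:
            continue
        if ontology_class is not None and ontology_class not in reg.ontology_classes:
            continue
        if concept is not None and concept not in reg.item_map.values():
            continue
        hits.append(reg.resource)
    return hits


catalog = [
    Registration("vt_igneous_rocks", {"geochemistry", "igneous rocks"},
                 {"BulkRockGeochemistry"}, {"sio2_wt": "SiO2", "zr_ppm": "Zr"}),
    Registration("regional_gravity", {"gravity"},
                 {"GravityDatabase"}, {"bouguer": "BouguerAnomaly"}),
]
```

With this layout, a keyword search only needs Level 1, while a question such as "which resources hold an SiO2 measurement" can only be answered from the Level 3 column mapping.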

6

AGI Index Terms

GEON Index Ontology

http://www.geoscienceworld.org/

Level 1 Registration

7

Ontological Look at Virginia Tech Igneous Rock Database

[Figure: ontological view of the database tables.

Group labels: Rock; Geologic Images; Methods & References; Isotope; Location; Mineral; Structure.

Table labels: MapReference, References, FeTreatment, Minerals, BulkRockGeochemMethods, AnalyticalMethods, BodyShapes, Fractures, Fabric, RockGeoChemistry, ModelComposition, Images, GeologicLocation, MineralChemistry, Rb_Sr_Isotope_Whole_Rock, Sm_Nd_Isotope_Whole_Rock, U_Th_Pb_Isotope_Whole_Rock, Rb_Sr_Isotope_Mineral, Sm_Nd_Isotope_Mineral, U_Th_Pb_Isotope_Mineral.

Ontology classes: Mineral, Rock, Element, Isotope, Structure, Location.]

Level 2: Registration at the Item Level

Level 2 Registration

8

A Section from Planetary Material Ontology

[UML class diagram; association multiplicity 1 to 0..n]

AnalyticalOxideConcentration
- analyticalOxide : AnalyticalOxide
- concentration : ValueWithUnit
- errorOfConcentration : ValueWithUnit

The GEON approach of registering data to concepts removes structural (format) and semantic heterogeneity.

Level 3 Registration
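As a rough illustration of how registering columns to a concept removes heterogeneity, the ontology fragment above can be mirrored as two small data classes. This is a sketch only: the attribute names follow the slide, but `COLUMN_REGISTRY`, `to_concept`, and the database and column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueWithUnit:
    value: float
    unit: str


@dataclass(frozen=True)
class AnalyticalOxideConcentration:
    analytical_oxide: str
    concentration: ValueWithUnit
    error_of_concentration: ValueWithUnit


# Hypothetical Level 3 registrations: (database, column) -> (oxide, unit).
COLUMN_REGISTRY = {
    ("lab_a", "SiO2_pct"): ("SiO2", "wt%"),
    ("lab_b", "silica"): ("SiO2", "wt%"),
}


def to_concept(database, column, value, error):
    """Lift a raw cell into the shared concept using its column registration."""
    oxide, unit = COLUMN_REGISTRY[(database, column)]
    return AnalyticalOxideConcentration(
        oxide, ValueWithUnit(value, unit), ValueWithUnit(error, unit)
    )
```

Two differently named columns that are registered to the same concept produce equal objects, so downstream tools never need to know the original column names.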

9

DIA Engine (1): How does GEON discover data?

- Keywords, Resource Type, Temporal, Spatial
- Invoke the GEON protocol for discovering databases
- Discovery, Integration and Analysis Engine
- Retrieve the discovered data from registered databases
- Emphasize geospatial and aspatial discoveries (not everything needs to be done through a map-based browser)
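The four discovery criteria (keywords, resource type, temporal, spatial) can be sketched as optional filters over a catalog, which also shows why an aspatial query needs no map. Everything here is an assumption made for illustration: the `Resource` fields, the age ranges, and the bounding boxes are invented, and this is not the GEON discovery protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    keywords: frozenset
    resource_type: str
    age_range_ma: tuple   # (youngest, oldest), millions of years; hypothetical
    bbox: tuple           # (min_lon, min_lat, max_lon, max_lat); hypothetical


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def boxes_intersect(a, b):
    """True when two lon/lat bounding boxes overlap on both axes."""
    return (intervals_overlap((a[0], a[2]), (b[0], b[2]))
            and intervals_overlap((a[1], a[3]), (b[1], b[3])))


def discover(resources, keyword=None, resource_type=None,
             age_range_ma=None, bbox=None):
    """Apply only the criteria that were supplied; omit bbox for an aspatial query."""
    hits = []
    for res in resources:
        if keyword is not None and keyword not in res.keywords:
            continue
        if resource_type is not None and res.resource_type != resource_type:
            continue
        if age_range_ma is not None and not intervals_overlap(res.age_range_ma, age_range_ma):
            continue
        if bbox is not None and not boxes_intersect(res.bbox, bbox):
            continue
        hits.append(res.name)
    return hits


RESOURCES = [
    Resource("va_igneous", frozenset({"igneous", "geochemistry"}), "database",
             (550.0, 760.0), (-83.7, 36.5, -75.2, 39.5)),
    Resource("wy_igneous", frozenset({"igneous", "geochemistry"}), "database",
             (2500.0, 2700.0), (-111.1, 41.0, -104.0, 45.0)),
]
```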

10

DIA Engine (2)

Geoscience Templates
- Geologic Map (USA)
- Geologic Map (States)
- Terrane Map
- Geologic Provinces
- Geophysical Map

- Experimental Databases
- Tools

Geospatial Engine, Aspatial Engine

11

High-Level View of the DIA Engine

- The user specifies the class of data for analysis
- The DIA Engine derives and retrieves the different data sets needed for the requested analysis
- The DIA Engine applies processing and filtering techniques to generate the requested data product
- Data products and query steps can be saved

[Diagram labels: Raw Data, Query Tool, Data Product, Modeling, Computation]
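One way to picture "query steps can be saved" is to record each step as plain data, so that the same data product can be regenerated later from raw data. The sketch below is an assumption-laden illustration: the operation names (`filter_equals`, `select`), the JSON step format, and the sample rows are all hypothetical.

```python
import json

# Hypothetical processing operations, looked up by name so that steps
# can be stored as plain JSON and replayed later.
OPERATIONS = {
    "filter_equals": lambda rows, key, value: [r for r in rows if r.get(key) == value],
    "select": lambda rows, keys: [{k: r[k] for k in keys} for r in rows],
}


def run_steps(rows, steps):
    """Apply each saved step in order, turning raw data into a data product."""
    for step in steps:
        rows = OPERATIONS[step["op"]](rows, **step["args"])
    return rows


raw = [
    {"body": "B1", "type": "A-type", "state": "VA"},
    {"body": "B2", "type": "I-type", "state": "VA"},
]
steps = [
    {"op": "filter_equals", "args": {"key": "type", "value": "A-type"}},
    {"op": "select", "args": {"keys": ["body", "state"]}},
]

saved = json.dumps(steps)                      # persist the query steps
product = run_steps(raw, json.loads(saved))    # replay them on the raw data
```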

12

Data products (1)

Data products can be in the form of interactive maps, interactive filtering diagrams, or Excel data files.

Examples:
- A map showing the A-Type bodies in the Mid-Atlantic region
- An Excel file giving the ages of those A-Type bodies
- A gravity database table spatially related to A-Type bodies, saved as a contoured gravity map

13

Data products (2)

Data products can be:
- Pre-packaged: quickly queried, but not flexible, and providing little support for complex scientific discovery
- Created dynamically: may require extensive on-the-fly query processing, but enables far richer possibilities for scientific discovery. Requires semantic integration.

14

Data Integration (1)

Semantic integration of data products requires:
- Ontologies: a common language to interpret data from different sources
- Data sharing: requires data registration

Fine-grained (i.e., item-level) registration is necessary to enable the automatic processing (by tools) of shared data.
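To make the role of item-level registration concrete, here is a small sketch in which two groups store the same quantity under different column names and units, and a tool merges them automatically. The group names, column names, and the `REGISTRY` layout are hypothetical; the only fixed fact used is that 1 wt% equals 10,000 ppm.

```python
PPM_PER_WT_PERCENT = 10_000.0  # 1 wt% = 10,000 ppm

# Hypothetical item-level registrations: source -> column -> (concept, unit).
REGISTRY = {
    "group_a": {"sample": ("sample_id", None), "Zr_ppm": ("Zr", "ppm")},
    "group_b": {"id": ("sample_id", None), "zirconium_wtpct": ("Zr", "wt%")},
}


def harmonize(source, row):
    """Rename columns to shared concepts and convert wt% values to ppm."""
    out = {}
    for column, value in row.items():
        concept, unit = REGISTRY[source][column]
        if unit == "wt%":
            value = value * PPM_PER_WT_PERCENT
        out[concept] = value
    return out


def integrate(tables):
    """Merge rows from several registered sources into one uniform table."""
    return [harmonize(source, row)
            for source, rows in tables.items()
            for row in rows]


merged = integrate({
    "group_a": [{"sample": "A1", "Zr_ppm": 300.0}],
    "group_b": [{"id": "B7", "zirconium_wtpct": 0.03125}],
})
```

Without the column-level entries in the registry, the tool would have no way to know that `Zr_ppm` and `zirconium_wtpct` describe the same element, which is the point the slide makes about fine-grained registration.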

15

Data Integration (2)

[Diagram labels.

Legend: Query Tool (QT), Data Product (DP).

Registration: Data Owner, Raw Data, Register Data, Geo-chemistry Ontology, Geo-physics Ontology.

Integration within an ontological class: Ontologically Registered Data 1, Ontologically Registered Data 2, QT 1, QT 2, DP 1.

Integration across ontological classes: Ontologically Registered Data (Geo-chemistry), Ontologically Registered Data (Geo-physics), Integration Class: Location, DP 2.]
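Integration across ontological classes through a shared Location class can be sketched as a spatial join: each geochemical sample is paired with the nearest geophysical (gravity) station. This is an illustrative assumption about how such a join might work, not the prototype's method; the field names and coordinates are hypothetical, and distance is computed with the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def join_on_location(samples, stations, max_km):
    """Attach the nearest gravity reading to each sample within max_km."""
    joined = []
    if not stations:
        return joined
    for s in samples:
        def dist(g):
            return haversine_km(s["lat"], s["lon"], g["lat"], g["lon"])
        nearest = min(stations, key=dist)
        d = dist(nearest)
        if d <= max_km:
            joined.append({**s, "gravity_mgal": nearest["gravity_mgal"],
                           "distance_km": d})
    return joined


# Hypothetical records from two ontological classes sharing only a location.
samples = [{"id": "S1", "lat": 37.20, "lon": -80.40, "zr_ppm": 400.0}]
stations = [
    {"lat": 37.25, "lon": -80.40, "gravity_mgal": -50.0},
    {"lat": 40.00, "lon": -75.00, "gravity_mgal": -10.0},
]
```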

16

Limitations of Current Data Sharing Approaches

- Each research group adopts its own acronyms, notations, conventions, units, etc.
- Data sharing is of limited scope
  - Data discovery is ad hoc
  - Only a small community of scientists may be aware of and share a given data set
- Integration is difficult
  - Extensive conversion efforts may be needed
  - Absence of streamlined integration leads to poor ability to answer complex scientific questions
- Solution: ontology-based data registration

17

Query Building

- Menu-based (used in the demo)
  - The GUI lets the user select only specific items, so each query touches only a subset of the data
  - A robust system informs the user of any incorrect input and guides the user in the right direction
  - Results are guaranteed, because every query the menus can build has an answer
- Text-based
  - The entire database can be queried
  - Result sets may be empty
  - Even a small mistake in the query can return incorrect results, without the user being able to spot the error

18

Menu-based Query Building

- In a selected "region of interest" the user is provided with a number of options (the menu)
- The user clicks through the different menus to build an exact query
- Click history is maintained to enable future referencing

[Diagram: Menu # 1, Menu # 2, Menu # 3, Menu # 4, Menu # 5]
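Menu-based query building with a click history might look like the following minimal sketch. The class name `MenuQueryBuilder`, the menu fields, and the option values are hypothetical; the point is only that each click is validated against the current menu and appended to a history that fully determines the query.

```python
class MenuQueryBuilder:
    """Build a query one menu at a time, keeping the click history."""

    def __init__(self, menus):
        self.menus = menus    # ordered list of (field, allowed options)
        self.history = []     # click history as (field, choice) pairs

    def options(self):
        """Return the current menu, or None when every menu has been answered."""
        if len(self.history) >= len(self.menus):
            return None
        return self.menus[len(self.history)]

    def click(self, choice):
        """Record a choice from the current menu; reject anything off-menu."""
        current = self.options()
        if current is None:
            raise ValueError("no menus left")
        field, allowed = current
        if choice not in allowed:
            raise ValueError(f"{choice!r} is not an option for {field}")
        self.history.append((field, choice))

    def query(self):
        """The exact query is simply the accumulated choices."""
        return dict(self.history)


builder = MenuQueryBuilder([
    ("state", ["VA", "WY", "TX"]),
    ("rock_type", ["A-type", "I-type"]),
])
builder.click("VA")
builder.click("A-type")
```

Because only listed options are accepted, a malformed query cannot be constructed, which is the trade-off against text-based querying described on the previous slide.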

19

Query Tool Selection

- Tools provided by GEON can be used to answer a query, OR
- Other geologic tools can be incorporated (invocation interfaces need to be defined)
  - Example: GCDkit can be used for classification, geotectonic, and normative calculations for igneous rocks

20

Analysis

Data product(s) generated can be analyzed using various techniques:
- Modeling
- Computation

21

Workflow Associated with the Demo

[Diagram labels.

Query: A-Type polygons in a region R using discrimination diagram D?

- User: Java/VB Script-enabled Web browser
- Web Server, SDSC: Java/VB Script, ASP.net, VB.net
- GEON Server, Virginia Tech
- Rock Classification Ontology
- US National Gazetteer
- Discrimination Functions (Visual Basic): 10000*Ga/Al vs. Zr; FeO*/MgO vs. Zr+Nb+Ce+Y; Y vs. Nb
- Geo-Spatial Data Server: Geo-Spatial Data (ESRI ArcSDE, ESRI ArcGIS Server)
- Geo-Chemical Data Server 1, Virginia Tech (Mid-Atlantic): Geo-Chemical Data (MS SQL Server)
- Geo-Chemical Data Server 2 (Wyoming): Geo-Chemical Data (MS SQL Server)
- Geo-Chemical Data Server 3 (Texas): Geo-Chemical Data (MS SQL Server)]

22

Used Technologies

- User interface: Java / VB Script, ASP.NET, VB.NET
- Back end: ESRI ArcGIS Server 9.1; ESRI ArcSDE 9.1 (spatial database); Microsoft SQL Server (geo-chemical database)
- Functionality coding: Visual Basic (to code the discrimination filters)
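A discrimination filter of the kind named in the demo (10000*Ga/Al vs. Zr, and Zr+Nb+Ce+Y) could be sketched as below. The slides do not give the filter logic or threshold values, so the defaults here (2.6, 250 ppm, 350 ppm) are assumptions based on values commonly quoted in the literature for these diagrams and are exposed as parameters. The function names and sample inputs are hypothetical, and the sketch is in Python rather than the Visual Basic used in the prototype.

```python
AL_ATOMIC_MASS = 26.9815
O_ATOMIC_MASS = 15.999
# Mass fraction of Al in Al2O3, used to convert oxide wt% to elemental Al.
AL_FRACTION_IN_AL2O3 = (2 * AL_ATOMIC_MASS) / (2 * AL_ATOMIC_MASS + 3 * O_ATOMIC_MASS)


def ga_al_ratio(ga_ppm, al2o3_wt_percent):
    """Return 10000*Ga/Al, with Al (ppm) derived from Al2O3 (wt%)."""
    al_ppm = al2o3_wt_percent * 10_000.0 * AL_FRACTION_IN_AL2O3
    return 10_000.0 * ga_ppm / al_ppm


def is_a_type_ga_al_zr(ga_ppm, al2o3_wt_percent, zr_ppm,
                       ratio_min=2.6, zr_min=250.0):
    """Filter on the 10000*Ga/Al vs. Zr diagram; thresholds are assumed defaults."""
    return ga_al_ratio(ga_ppm, al2o3_wt_percent) > ratio_min and zr_ppm > zr_min


def is_a_type_hfse_sum(zr_ppm, nb_ppm, ce_ppm, y_ppm, sum_min=350.0):
    """Filter on Zr+Nb+Ce+Y (ppm); the threshold is an assumed default."""
    return (zr_ppm + nb_ppm + ce_ppm + y_ppm) > sum_min
```

For example, a hypothetical sample with 25 ppm Ga, 12.5 wt% Al2O3 and 400 ppm Zr gives 10000*Ga/Al of about 3.78 and passes the first filter, while one with 15 ppm Ga, 15 wt% Al2O3 and 120 ppm Zr does not.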

23

Demo Starts Here

24

Current Tool Sharing Approaches

- Each research group develops its own tools
  - Tools developed by a research group are rarely used by other groups
  - Redundancy of development efforts
- Little interoperability amongst tools
  - Interaction amongst different tools is often not possible or requires extensive (re)coding
- Solution: wrap tools as Web services accessible to the scientific community worldwide

25

The Future: Integration through Ontologies and Web Services

Benefits of Web services:
- Facilitate integration
  - Tools developed independently may easily be integrated into new applications
  - Example: discrimination tools may be offered as Web services
- Provide high reusability
  - More tools available to the research community
  - Reduced development time, effort, and cost

26

Web Services Explained (1)

[Diagram labels: Function 1 (Service Provider 1), Function 2 (Service Provider 2), Function 3 (Service Provider 3); Web; Users; Application Provider 1, Application Provider 2; UDDI Registry; WSDL Service Descriptions; SOAP Messages; Web Services.]

Steps:
1. Publish Web service
2. Discover Web service
3. Invoke Web service

WS standards:
- WSDL: Web Services Description Language
- UDDI: Universal Description, Discovery, and Integration
- SOAP: Simple Object Access Protocol
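The publish, discover, and invoke steps can be mimicked with a toy in-memory registry. This is only a conceptual sketch: the `Registry` class is a hypothetical stand-in and does not implement UDDI, WSDL, or SOAP, and the service name and description are invented.

```python
class Registry:
    """In-memory stand-in for a service registry (not a UDDI implementation)."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, description, endpoint):
        """Step 1: a provider lists a service with a description and a callable."""
        self._entries[name] = {"description": description, "endpoint": endpoint}

    def discover(self, keyword):
        """Step 2: a consumer looks up services whose description mentions keyword."""
        return sorted(name for name, entry in self._entries.items()
                      if keyword.lower() in entry["description"].lower())

    def invoke(self, name, **kwargs):
        """Step 3: the consumer calls the service it found."""
        return self._entries[name]["endpoint"](**kwargs)


def hfse_sum(zr_ppm, nb_ppm, ce_ppm, y_ppm):
    """Hypothetical tool wrapped as a service: Zr+Nb+Ce+Y in ppm."""
    return zr_ppm + nb_ppm + ce_ppm + y_ppm


registry = Registry()
registry.publish("hfse_sum", "Discrimination helper: Zr+Nb+Ce+Y sum", hfse_sum)
found = registry.discover("discrimination")
total = registry.invoke(found[0], zr_ppm=400.0, nb_ppm=30.0, ce_ppm=120.0, y_ppm=60.0)
```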

27

Web Services Explained (2)

- WSDL (the service provider describes the service using WSDL)
  - An XML-based language to describe the capabilities of Web services
  - The capabilities of a Web service are described as a set of endpoints that can exchange messages
  - WSDL is a separate specification from UDDI; WSDL descriptions can be referenced from UDDI registry entries
- UDDI (the service provider publishes the service using UDDI)
  - A Web-based directory where service providers may list their services and where service consumers may retrieve the services published by the providers (like yellow pages)
- SOAP (clients and services communicate using SOAP)
  - An XML-based protocol used to encode the messages (requests and responses) exchanged between a Web service and its clients
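To show what "an XML-based protocol used to encode the messages" means in practice, the sketch below builds and reads back a SOAP 1.1 request envelope with Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:discrimination`, the operation name `ClassifyAType`, and the parameter names are hypothetical and do not describe any actual GEON service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:discrimination"               # hypothetical service namespace


def build_request(operation, params):
    """Encode an operation call as a SOAP envelope and return it as text."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Decode the envelope back into (operation name, parameter dict)."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    op = body[0]
    operation = op.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in op}
    return operation, params


message = build_request("ClassifyAType", {"zr_ppm": 400, "ga_al": 3.8})
```

Parameter values come back as strings after parsing, since XML text carries no type information; a real service would rely on its WSDL description to interpret them.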

28

Hypothesis Evaluation: Are A-Type Rocks in Virginia related to a Hot Spot Trace?

[Diagram labels.

- Discovery: Geospatial Query, Aspatial Query
- Ontologically Registered Data: Geochemical, Geophysics, Geologic Time
- Integration: Within Same Ontologic Class; Between Different Ontologic Classes
- Geochemical: VA. Ontologically Registered Data; WY. Ontologically Registered Data; TX. Ontologically Registered Data
- Data Product: A-Type Identification
- Analysis]