Post on 12-Jan-2016
www.geongrid.org CYBERINFRASTRUCTURE FOR THE GEOSCIENCES
Towards a Generic Framework for Semantic Data Registration and Integration in Geosciences
Kai Lin, Chaitan Baru
San Diego Supercomputer Center
University of California, San Diego
Data Integration Goal
• Query heterogeneous data sources as a single resource
– Query: without writing a program (“ad hoc, non-procedural query languages”)
– Heterogeneous: the local resource controls the definition of the data
– Single resource: remove the burden of individually accessing each data source
Data Integration Challenges: Heterogeneities
• Syntactic Heterogeneity
heterogeneous data formats
e.g. 02-04-2004 vs. 02/04/04
• Structural Heterogeneity
heterogeneous data models and schemas
e.g. 02-04-2004 is saved as three columns or as one column
• Semantic Heterogeneity
fuzzy metadata, terminology, “hidden” semantics, implicit assumptions
GEON Solution:
• data should be semantically registered to GEON first
• heterogeneities are resolved by registration
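The two date examples above can be made concrete with a small sketch. It is not GEON code: the class `DateNormalizer`, its method names, and the month-day-year reading of both formats are assumptions made only for illustration. It shows one value arriving in two text formats (syntactic heterogeneity) and as three separate columns (structural heterogeneity), all reduced to one representation.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

// Hypothetical helper, not part of GEON: normalizes dates that arrive
// in different syntactic and structural forms into one LocalDate.
final class DateNormalizer {
    // Assumed source formats; the month-day-year order is an assumption.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("MM-dd-uuuu"),
            DateTimeFormatter.ofPattern("MM/dd/uu"));

    // Syntactic heterogeneity: the same value in different text formats.
    static LocalDate fromText(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return LocalDate.parse(text, format);
            } catch (DateTimeParseException e) {
                // Not this format; try the next assumed one.
            }
        }
        throw new IllegalArgumentException("Unrecognized date: " + text);
    }

    // Structural heterogeneity: the same value stored as three columns.
    static LocalDate fromColumns(int month, int day, int year) {
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(fromText("02-04-2004"));
        System.out.println(fromText("02/04/04"));
        System.out.println(fromColumns(2, 4, 2004));
    }
}
```

Semantic heterogeneity is not addressed by this sketch: whether 02-04 means February 4 or April 2 is exactly the kind of implicit assumption that registration is meant to record.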
Levels of Registration
• Metadata-level registration
– Register metadata associated with a resource: submit required metadata. Predefined semantics.
• “Item” level registration
– Register the “schema” of a resource, e.g. relational database, shapefiles, …
– Record semantics of schema elements, e.g. table name, column name
• “Item-Detail” level registration
– Register individual values in a dataset
– Record semantics of each item in a record/column
Registering Structured Data
• Relational databases
• Shapefiles → database tables
• Excel spreadsheets → database tables
• Delimited ASCII files → database tables
• Headers of scientific data files, e.g. netCDF
Item Level Database Registration and Access
[Figure: tables and views in the Original Database are selected for registration; their table and view definitions form the Published Database, which an Application reaches through the GEON JDBC Driver and the GEON Mediator.]
How to Connect to GEON Databases
• Download GEON JDBC Driver
• Use the following code to create a connection
import java.sql.Connection;
import java.sql.DriverManager;
// load driver
Class.forName("org.geongrid.jdbc.driver.Driver");
// set the mediator URL
String url = "jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f";
// open the connection
Connection conn = DriverManager.getConnection(url, "geonuser", "geongrid");
In the URL, "jdbc:geon" is the GEON JDBC protocol, "geon01.sdsc.edu:2532" is the host name and port number of the GEON Mediator, and "GEON-63cb404c-6038-11d9-a69f" is the GEON ID.
Note: the original account information is not accessible to end users
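The three labelled parts of the mediator URL can be separated with a short sketch. The class `GeonUrl` is hypothetical and is not part of the GEON JDBC driver; it only assumes that the URL always has the shape shown on the slide (protocol, host and port, then GEON ID as the path).

```java
import java.net.URI;

// Hypothetical helper, not part of the GEON driver: splits a GEON JDBC
// URL into the parts labelled on the slide.
final class GeonUrl {
    final String host;
    final int port;
    final String geonId;

    private GeonUrl(String host, int port, String geonId) {
        this.host = host;
        this.port = port;
        this.geonId = geonId;
    }

    static GeonUrl parse(String url) {
        if (!url.startsWith("jdbc:geon://")) {
            throw new IllegalArgumentException("Not a GEON JDBC URL: " + url);
        }
        // Drop "jdbc:" so java.net.URI sees an ordinary hierarchical URI.
        URI uri = URI.create(url.substring("jdbc:".length()));
        String path = uri.getPath();
        if (uri.getHost() == null || uri.getPort() < 0 || path == null || path.length() < 2) {
            throw new IllegalArgumentException("Missing host, port, or GEON ID: " + url);
        }
        // The path starts with "/", which is not part of the GEON ID.
        return new GeonUrl(uri.getHost(), uri.getPort(), path.substring(1));
    }

    public static void main(String[] args) {
        GeonUrl parsed = parse("jdbc:geon://geon01.sdsc.edu:2532/GEON-63cb404c-6038-11d9-a69f");
        System.out.println(parsed.host + " " + parsed.port + " " + parsed.geonId);
    }
}
```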
GEON Mediator Enables Write Protection
• Only accepts SELECT statements
• Rejects any requests other than SELECT
[Figure: an UPDATE B request is sent to the Mediator, which sits in front of a Database holding tables A, B, and C.]
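The SELECT-only rule can be illustrated with a deliberately naive check. This is a sketch, not the mediator's implementation: the class `SelectOnlyGuard` is hypothetical, and a string test like this one is only a stand-in for real SQL parsing. It is conservative, so it also refuses a SELECT whose string literal happens to contain a semicolon.

```java
import java.util.Locale;

// Hypothetical guard, not GEON code: accepts a single SELECT statement
// and rejects everything else.
final class SelectOnlyGuard {
    static boolean isAllowed(String sql) {
        String statement = sql.trim();
        // Tolerate one trailing semicolon.
        if (statement.endsWith(";")) {
            statement = statement.substring(0, statement.length() - 1).trim();
        }
        // Any remaining semicolon may hide a second statement; refuse it.
        if (statement.contains(";")) {
            return false;
        }
        // The statement must start with the keyword SELECT.
        return statement.toUpperCase(Locale.ROOT).matches("(?s)SELECT\\s.*");
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("SELECT * FROM B"));
        System.out.println(isAllowed("UPDATE B SET x = 1"));
    }
}
```

A production check would work on the parsed statement, since some SQL dialects allow SELECT forms that write data.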
Read Protection for Unregistered Tables and Views
An unregistered table or view is invisible to an end user
• The data in the table can’t be viewed by a SELECT statement
• The schema can’t be fetched
[Figure: a SELECT * FROM A request is sent to the Mediator, which sits in front of a Database holding tables A, B, and C.]
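One way to picture this rule is a catalog that exposes only registered names. The class `RegisteredCatalog` below is hypothetical and only sketches the idea; the slides do not say how the mediator tracks registrations.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical catalog, not GEON code: only registered tables and views
// are visible, both for data access and for schema listing.
final class RegisteredCatalog {
    private final Set<String> registered;

    RegisteredCatalog(Set<String> registered) {
        this.registered = new HashSet<>(registered);
    }

    // An unregistered name is treated as if it did not exist.
    boolean isVisible(String tableOrView) {
        return registered.contains(tableOrView);
    }

    // Schema listing returns only the registered subset of the backend.
    Set<String> visibleTables(Set<String> backendTables) {
        Set<String> visible = new TreeSet<>(backendTables);
        visible.retainAll(registered);
        return visible;
    }

    public static void main(String[] args) {
        RegisteredCatalog catalog = new RegisteredCatalog(Set.of("B"));
        System.out.println(catalog.isVisible("A"));
        System.out.println(catalog.visibleTables(Set.of("A", "B", "C")));
    }
}
```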
GEON Database Integration
GEON Mediator supports integration at three levels
Level 1: Federation-Based Integration
• End users need to be knowledgeable about each database
Level 2: View-Based Integration
• End users see “integrated views”. An intermediary designs these views.
Level 3: Ontology-Based Integration
• End users can query using familiar concepts
• Requires middleware and formal representation of domain knowledge
Level 1: Federation-Based Integration
• Use SQL to query the federated database
• Structural and semantic heterogeneity must be resolved by users themselves
[Figure: backend databases holding tables A through G are federated by the GEON Mediator; a user query spans backends, e.g. SELECT * FROM A, E WHERE ……]
Level 2: View-Based Integration
• Allow defining views on top of the federated databases
• Allow hiding the original backend schemas
• Integration results can be shared and reused
[Figure: views V and W are defined in the GEON Mediator over backend tables A through G; a user query addresses the views, e.g. SELECT * FROM V, W WHERE ……]
Level 3: Ontology-Based Integration
• Requires ontology annotations for backend databases
• Use a simple ontology query language to query the integrated database
• End users do not need to know the backend schemas and local semantics
[Figure: an ontology-based query is sent to the GEON Mediator, which sits over backend tables A through G.]
GEON Ontology Based Data Integration
• Ontology Enabled Semantic Integration
Challenges for Computer Scientists and Domain Scientists
– Computer Scientists: build an integration system based on the ontological registration of datasets
– Domain Scientists: create domain ontologies
– Data Providers: register datasets to ontologies
[Figure: dataset1, dataset2, dataset3, and dataset4 are registered to Ontology1, Ontology2, and Ontology3.]
Ontological Data Registration for Data Integration
• Registering a dataset to an ontology for data integration is a procedure to generate a partial model of the ontology from the dataset itself
[Figure: registration derives individuals of the ontology from the dataset.]
Not all the constraints in the ontology are satisfied by the generated individuals
Registering Relational Tables to Ontology Classes
• Associate one or more columns under an optional SQL condition to a selected class in the ontology
• Provide a mapping method if no explicit names of individuals should be generated
[Table: columns … | Latitude | … | Longitude | …, with a row holding 23.5 and 47.9; the two columns are mapped to the class Location.]
(23.5, 47.9) is the name of an individual of the class Location
Same name indicates the same location
[Table: columns RockSample | GeologicAge | …, with the values Jurassic/Triassic and Precambrian; the GeologicAge column is mapped to the class GeologicalAge, shown with Precambrian, Cenozoic, and Paleozoic.]
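The naming rule above, where the tuple of registered column values is the individual's name, can be sketched as follows. The class `IndividualNamer` is hypothetical, and the exact name format used by GEON beyond the "(23.5, 47.9)" example is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not GEON code: derives names of individuals from
// the values of the registered columns of each row.
final class IndividualNamer {
    // The name is the tuple of column values, e.g. "(23.5, 47.9)".
    static String nameOf(List<?> columnValues) {
        return columnValues.stream()
                .map(value -> String.valueOf(value))
                .collect(Collectors.joining(", ", "(", ")"));
    }

    // Rows with the same values yield the same name, hence one individual.
    static Set<String> individuals(List<? extends List<?>> rows) {
        Set<String> names = new LinkedHashSet<>();
        for (List<?> row : rows) {
            names.add(nameOf(row));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(individuals(List.of(
                List.of(23.5, 47.9),
                List.of(23.5, 47.9),
                List.of(10.0, 20.0))));
    }
}
```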
Registering Relational Tables to Ontology Object Properties
• Associate two entities which are already registered to the domain class and the range class of a selected object property in the ontology
[Table: columns … | RockSampleID | … | PERIOD | …; RockSampleID is registered to Rock, PERIOD to GeologicAge, and the two are linked by the object property hasAge.]
ODAL and SOQL
Register item/item-detail to Ontology: ODAL (Ontological Database Annotation Language)
User query: SOQL (Simple Ontology Query Language)
ODAL (Ontological Database Annotation Language)
<odal:NamedIndividuals odal:id="RockSample" odal:database="VTDatabase">
  <odal:Class odal:resource="http://geon.vt.edu#RockSample" />
  <odal:Table>Samples</odal:Table>
  <odal:Table>RockTexture</odal:Table>
  <odal:Table>RockGeoChemistry</odal:Table>
  <odal:Table>ModalData</odal:Table>
  <odal:Table>MineralChemistry</odal:Table>
  <odal:Table>Images</odal:Table>
  <odal:Column>ssID</odal:Column>
</odal:NamedIndividuals>
[Figure: a GUI generates ODAL, which is passed to the ODAL processor.]
The values in the column ssID of the tables Samples, RockTexture, RockGeoChemistry, ModalData, MineralChemistry and Images represent instances of RockSample
• Create a partial model of ontologies from databases
• Independent of end interface
• Independent of specific database implementations
• The ODAL mapping is itself a “first-class” object
ODAL: Import Ontologies
The ontologies used for annotating a database can be imported as follows:
<?xml version="1.0"?>
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:odal="http://www.sdsc.edu/odal#">
  <odal:Ontology>
    <odal:Imports rdf:resource="http://www.library.org/Book.owl"/>
    <odal:Imports rdf:resource="http://www.writer.org/Writer.owl"/>
  </odal:Ontology>
  ……
</odal:ODAL>
ODAL: Database Connection Declaration
The target database for annotation is declared as follows:
<?xml version="1.0"?>
<odal:ODAL xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:odal="http://www.sdsc.edu/odal#">
  ……
  <odal:Database odal:id="PublicationDatabase">
    <odal:DatabaseProductName>Oracle</odal:DatabaseProductName>
    <odal:DatabaseProductVersion>9.1.21</odal:DatabaseProductVersion>
    <odal:Host>oracle.sdsc.edu</odal:Host>
    <odal:Port>3456</odal:Port>
    <odal:DatabaseName>Publications</odal:DatabaseName>
  </odal:Database>
  ……
</odal:ODAL>
ODAL: Simple Named Individuals
<odal:NamedIndividuals odal:id="BookInTableBookPrice" odal:database="PublicationDatabase">
  <odal:Class odal:resource="http://www.amazon.com/Book.owl#Book"/>
  <odal:Schema>Collections</odal:Schema>
  <odal:Table>book-price</odal:Table>
  <odal:Column>ISBN</odal:Column>
</odal:NamedIndividuals>
Suppose the Book ontology contains a class Book and the schema Collections contains a table book-price with a column ISBN.
odal:id gives a name to the declaration and represents the set of individuals generated by the statement.
The statement says that each value in the column ISBN represents a Book individual.
ODAL: Named Individuals from Multiple Columns
<odal:NamedIndividuals odal:id="LocationInTableRockSample">
  <odal:Class odal:resource="http://www.usgs.org/Space.owl#Location"/>
  <odal:Schema>California</odal:Schema>
  <odal:Table>Rock-Sample</odal:Table>
  <odal:Column>Latitude</odal:Column>
  <odal:Column>Longitude</odal:Column>
</odal:NamedIndividuals>
Suppose an ontology contains a class Location and a database table Rock-Sample with two columns Latitude and Longitude.
The statement says that a pair of latitude and longitude gives a location.
ODAL: Named Individuals with Conditions
<odal:NamedIndividuals odal:id="MaleEmployeeInTableEmployee">
  <odal:Class odal:resource="http://www.abc.com/Employee.owl#MaleEmployee"/>
  <odal:Table>employee</odal:Table>
  <odal:Column>EmployeeId</odal:Column>
  <odal:Condition><![CDATA[ Gender='M' ]]></odal:Condition>
</odal:NamedIndividuals>
<odal:NamedIndividuals odal:id="FemaleEmployeeInTableEmployee">
  <odal:Class odal:resource="http://www.abc.com/Employee#FemaleEmployee"/>
  <odal:Table>employee</odal:Table>
  <odal:Column>EmployeeId</odal:Column>
  <odal:Condition><![CDATA[ Gender='F' ]]></odal:Condition>
</odal:NamedIndividuals>
A condition in an odal:Condition element should be a boolean expression that is valid in the WHERE clause of an SQL query.
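Because the condition must be valid in a WHERE clause, a declaration like the ones above could plausibly be turned into a query that enumerates the individuals. The sketch below is an assumption about how that might work, not a description of the ODAL processor; the class `NamedIndividuals` and its fields are hypothetical stand-ins for the XML elements.

```java
import java.util.List;

// Hypothetical in-memory form of one odal:NamedIndividuals declaration.
// How the real ODAL processor uses these declarations is not stated on
// the slides; this only illustrates why the condition must be valid SQL.
final class NamedIndividuals {
    final String table;
    final List<String> columns;
    final String condition; // null when the declaration has no condition

    NamedIndividuals(String table, List<String> columns, String condition) {
        this.table = table;
        this.columns = columns;
        this.condition = condition;
    }

    // Builds a query whose rows name the individuals of the declaration.
    // Identifiers are not quoted here; a name such as book-price would
    // need quoting in a real system.
    String toSql() {
        String sql = "SELECT DISTINCT " + String.join(", ", columns) + " FROM " + table;
        return condition == null ? sql : sql + " WHERE " + condition.trim();
    }

    public static void main(String[] args) {
        NamedIndividuals male =
                new NamedIndividuals("employee", List.of("EmployeeId"), " Gender='M' ");
        System.out.println(male.toSql());
    }
}
```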
ODAL: Data Type Property Declaration
<odal:NamedIndividuals odal:id="PersonInTablePerson">
  <odal:Class odal:resource="http://www.foo.org/Person.owl#Person"/>
  <odal:Table>Person</odal:Table>
  <odal:Column>ssn</odal:Column>
</odal:NamedIndividuals>
<odal:OntologyProperty>
  <odal:DatatypeProperty odal:resource="http://www.foo.org/Person.owl#hasAge"/>
  <odal:Table>person</odal:Table>
  <odal:Domain odal:resource="PersonInTablePerson" />
  <odal:Range odal:resource="age" />
</odal:OntologyProperty>
[Table: columns … | age | … | SSN | …, with a row … | 8 | … | 1234-56-7890 | …; SSN is registered to the class Person, and age is linked to it as a double through the datatype property hasAge.]
Conditions for Joining Individuals from Different Resources
• To join data across independent resources we need to know the correspondence between entities.
• For example, does “10001” represent the same rock in the two resources? By default, we assume it does not.
• A set of datatype properties can be declared as a key for a class in the ontology. Joins across multiple resources are based on keys.
e.g. { hasLatitude, hasLongitude } can be declared as a key of Location
Two locations from different resources are the same if they have the same latitude and longitude
[Figure: two resources, one with a column RockSampleID and one with a column RockID, each containing the value 10001 and both registered to the class Rock.]
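A key-based join of this kind can be sketched with individuals represented as maps from property name to value. The class `KeyJoin` and this representation are assumptions made for illustration; the slides do not describe the mediator's data structures. Individuals that lack a key property are never matched, mirroring the default that entities from different resources are assumed distinct.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not GEON code: joins individuals from two
// resources when all declared key properties have equal values.
final class KeyJoin {
    // Returns the key tuple, or null if any key property is missing.
    static List<Object> keyOf(Map<String, Object> individual, List<String> keyProperties) {
        List<Object> key = new ArrayList<>();
        for (String property : keyProperties) {
            Object value = individual.get(property);
            if (value == null) {
                return null;
            }
            key.add(value);
        }
        return key;
    }

    // Hash join: index the right side by key, then probe with the left.
    static List<List<Map<String, Object>>> join(List<Map<String, Object>> left,
                                                List<Map<String, Object>> right,
                                                List<String> keyProperties) {
        Map<List<Object>, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> individual : right) {
            List<Object> key = keyOf(individual, keyProperties);
            if (key != null) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(individual);
            }
        }
        List<List<Map<String, Object>>> pairs = new ArrayList<>();
        for (Map<String, Object> individual : left) {
            List<Object> key = keyOf(individual, keyProperties);
            if (key == null) {
                continue;
            }
            for (Map<String, Object> match : index.getOrDefault(key, List.of())) {
                pairs.add(List.of(individual, match));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, Object> a = Map.of("hasLatitude", 23.5, "hasLongitude", 47.9);
        Map<String, Object> b = Map.of("hasLatitude", 23.5, "hasLongitude", 47.9);
        System.out.println(join(List.of(a), List.of(b), List.of("hasLatitude", "hasLongitude")).size());
    }
}
```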
SOQL (Simple Ontology Query Language)
Query single or integrated resources
• via ontologies (i.e., high level logical views)
• independent of schema-level representation
[Figure: ontology fragment with the classes RockSample, Location, and ValueWithUnit, the datatypes float and string, and the properties location, hasSiO2, value, lat, long, and unit.]
SELECT X.location.*
FROM RockSample X
WHERE X.location.lat > 60 AND X.location.long > 100 AND X.hasSiO2.value < 30 AND X.hasSiO2.unit = 'weightPercetage'
[Figure: a GUI generates SOQL, which is passed to the SOQL processor.]
The Architecture of GEON Semantic Mediator
[Figure: architecture diagram. Components shown: Portal or Application; GUI; Mediator JDBC Driver; SOQL Processor with SOQL Parser, Semantic Query Rewriter, and Ontology Reasoner; ODAL Processor; OWL and ODAL inputs; Spatial SQL against federated schemas; SQL Parser; Query Planning, Query Optimization, and Query Execution; Internal Database; backends Oracle, DB2, MySQL, SQL Server, PostgreSQL, and PostGIS.]
Question: Finding all seismic stations within 1 mile from railroads
SOQL query from the GEON SOQL GUI:
SELECT X.code, X.location.* FROM SeismicStation X, Railroad Y WHERE distance(X.location, Y.geometry) < 1
SQL produced by the SOQL Processor:
SELECT X2.stationcode, X2.lat, X2.lon FROM railroads_of_the_united_states X1, stationdatatable X2 WHERE distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
Subqueries issued by the Schema Mediator:
Railroad shapefile: SELECT X1.the_geom FROM railroads X1
Seismic Stations: SELECT X2.stationcode, X2.lat, X2.lon FROM stationdatatable X2 WHERE bounding box condition
Join condition: distance(X1.the_geom, MakePoint(X2.lat, X2.lon)) < 1
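The slide names a "bounding box condition" without spelling it out. One common reading is a cheap prefilter that discards stations far from a railroad before the exact distance is computed. The sketch below follows that reading under strong simplifying assumptions: planar coordinates already in miles, and a railroad reduced to a single straight segment. The class `BoundingBoxFilter` is hypothetical.

```java
// Hypothetical sketch, not GEON code: bounding-box prefilter followed by
// an exact point-to-segment distance test, in planar coordinates.
final class BoundingBoxFilter {
    // Exact planar distance from point (px, py) to segment (ax, ay)-(bx, by).
    static double distanceToSegment(double px, double py,
                                    double ax, double ay, double bx, double by) {
        double dx = bx - ax;
        double dy = by - ay;
        double lengthSquared = dx * dx + dy * dy;
        // Projection parameter of the point onto the segment, clamped to [0, 1].
        double t = lengthSquared == 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / lengthSquared;
        t = Math.max(0, Math.min(1, t));
        return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
    }

    // Cheap test: is the point inside the segment's box grown by the limit?
    static boolean inExpandedBox(double px, double py,
                                 double ax, double ay, double bx, double by, double limit) {
        return px >= Math.min(ax, bx) - limit && px <= Math.max(ax, bx) + limit
                && py >= Math.min(ay, by) - limit && py <= Math.max(ay, by) + limit;
    }

    // The box test runs first; the exact distance is computed only for survivors.
    static boolean within(double px, double py,
                          double ax, double ay, double bx, double by, double limit) {
        return inExpandedBox(px, py, ax, ay, bx, by, limit)
                && distanceToSegment(px, py, ax, ay, bx, by) < limit;
    }

    public static void main(String[] args) {
        System.out.println(within(5, 0.5, 0, 0, 10, 0, 1));
        System.out.println(within(5, 2, 0, 0, 10, 0, 1));
    }
}
```

The box test can be pushed to the backend as simple range comparisons on lat and lon, which is presumably why it appears in the WHERE clause of the station subquery.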