Date post: | 21-Dec-2015 |
1
Lecture 13: Database Heterogeneity
Debriefing Project Phase 2
2
Outline
• Database Integration
• Wrappers
• Mediators
• Schema Integration
Book Section
3
Database Integration
• How to build applications using multiple DBs?
[Figure: example data sources (eBay, DVD orders, IMDB, Amazon) running on Oracle, PointBase, MySQL, and IBM DB2; application tasks: movie DB, order movie, order status]
4
Problem Dimensions
Distribution
Autonomy
Heterogeneity
5
How to Deal with Distribution?
• Problems
• Solutions
6
How to Deal with Autonomy?
• Problems
• Solutions
7
How to Deal with Heterogeneity?
• Problems
• Solutions
8
Solution Variants
• General issues
  – Bottom-up vs. top-down engineering
  – Virtual vs. materialized integration
  – Read-only vs. read-write access
  – Transparency: language, schema, location
• What did you do?
9
A Generic System Architecture
• Wrapper-Mediator architecture
[Figure: DB1 (Oracle), DB2 (PointBase), DB3 (MySQL), DB4 (IBM DB2), each behind its own wrapper; one mediator above the wrappers; applications 1, 2, and 3 above the mediator]
  – Mediators integrate the data from the DBs
  – Wrappers convert to a common representation
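The layering above can be sketched in a few lines of Python. This is a minimal illustration, not the architecture of any particular system: the class names `ListWrapper` and `Mediator`, the `fetch`/`query` methods, and the sample movie rows are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]  # assumed common representation: one dict per row


class ListWrapper:
    """Hypothetical wrapper around a source that stores rows as tuples."""

    def __init__(self, columns: List[str], rows: List[tuple]):
        self.columns = columns
        self.rows = rows

    def fetch(self) -> Iterable[Record]:
        # Convert the source's native tuples to the common representation.
        for row in self.rows:
            yield dict(zip(self.columns, row))


class Mediator:
    """Integrates the records delivered by all registered wrappers."""

    def __init__(self, wrappers: Iterable[ListWrapper]):
        self.wrappers = list(wrappers)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Ask every wrapper for its data, then filter centrally.
        return [rec for w in self.wrappers for rec in w.fetch() if predicate(rec)]


movies_a = ListWrapper(["title", "year"], [("Alien", 1979), ("Heat", 1995)])
movies_b = ListWrapper(["title", "year"], [("Brazil", 1985)])
mediator = Mediator([movies_a, movies_b])
old_titles = [r["title"] for r in mediator.query(lambda r: r["year"] < 1990)]
```

The application only talks to the mediator; adding a fifth source means adding one wrapper, not changing application code.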
10
A Closer Look at Data Models
• Data model used by sources
  – relational? HTML? XML? Text?
• Data model used by integrated DB
  – canonical data model (e.g. relational, XML)
• Query models
  – Structured queries, retrieval queries, data mining (statistics)
11
A Generic Wrapper Architecture
[Figure: wrapper layers, top to bottom; request/query goes in, result/data comes out]
  – Compensation for missing processing capabilities
  – Transformation of data model
  – Communication interface
  – Source: data, metadata, integrity constraints
12
Wrapper Tasks
• Data model consists of
  – Data types
  – Integrity constraints
  – Operations (e.g. query language)
• Translate among different data models
• Overcome other "syntactic" heterogeneity
Which was the task?
How was it implemented?
13
Example: Wrapping Relational Data in XML/HTML
• Data types
  – trivial
• Integrity constraints (e.g. primary keys)
  – requires XML Schema
• Operations
  – none in HTML
Where did this play a role?
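A minimal sketch of this direction, assuming a SQLite source and Python's standard `xml.etree.ElementTree`; the table `movie`, its rows, and the helper `table_to_xml` are made up for illustration. Data types map trivially (every value becomes text), while the primary key is silently lost because plain XML carries no constraints.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movie VALUES (?, ?, ?)",
    [(1, "Alien", 1979), (2, "Heat", 1995)],
)


def table_to_xml(conn, table):
    """Wrap one relational table as an XML element tree (hypothetical helper)."""
    # The table name is interpolated directly; acceptable only for trusted names.
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY 1")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table + "s")
    for row in cur:
        elem = ET.SubElement(root, table)
        for name, value in zip(columns, row):
            # Every value becomes text: the "trivial" data type mapping.
            ET.SubElement(elem, name).text = str(value)
    return root


xml_text = ET.tostring(table_to_xml(conn, "movie"), encoding="unicode")
```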
14
Example: Wrapping XML/HTML into Relational
• Data types
  – which difficulties?
• Integrity constraints
  – none in HTML
• Operations
  – generally requires XQuery
  – form fields can be considered as hard-coded queries
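One data type difficulty in this direction is that XML/HTML content is untyped text, so the wrapper has to decide what to do with values that do not fit the relational column type. The sketch below assumes a small hand-written XML document and a hypothetical helper `to_int_or_none`; mapping unparsable values to NULL is one possible policy, not the only one.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed source document; the second year is deliberately not a number.
SOURCE = """
<movies>
  <movie><title>Alien</title><year>1979</year></movie>
  <movie><title>Heat</title><year>unknown</year></movie>
</movies>
"""


def to_int_or_none(text):
    """Map untyped text to INTEGER, or to NULL when it cannot be parsed."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, year INTEGER)")
for elem in ET.fromstring(SOURCE).findall("movie"):
    conn.execute(
        "INSERT INTO movie VALUES (?, ?)",
        (elem.findtext("title"), to_int_or_none(elem.findtext("year"))),
    )
rows = conn.execute("SELECT title, year FROM movie ORDER BY title").fetchall()
```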
15
A Closer Look at Schemas
• Tight vs. loose integration
  – Is there a global schema?
• Support for semantic integration
  – collection, fusion, abstraction
16
Schema Architecture for Federated DBMS
[Figure: schema layers, top to bottom: View 1, View 2, View 3; Integrated Schema; Import Schemas; Export Schemas; data sources (relational DBMS, object-oriented DBMS, file system, web server, ...)]
• accepted model for integrated database systems with integrated schema
• 5-level architecture
• data independence
17
Export Schema
• provided by data source
• source DB can change without changing export schema
Which was the export schema?
18
Import Schema
• provided by wrapper
• export schema can change without changing import schema
Which was the import schema?
19
Integrated Schema
• provided by mediator
• import schemas can change without changing integrated schema
Which was the integrated schema?
20
Application View
• provided by application
• integrated DB can change without changing application (code)
Which were the application views?
21
Mediator Tasks
• Integrate data with same "real-world meaning", but different representation
  – integration mapping → schema integration
  – can be implemented, e.g., as database view
• Decompose queries against the integrated schema into queries against source DBs
  – only for virtual integration
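A minimal sketch of an integration mapping implemented as a database view, using SQLite. For simplicity both "sources" live in one in-memory database; the table names `src1_film` and `src2_movie`, their columns, and the sample rows are assumptions for the example. A query against the view `movie` is answered from the source tables at query time, which corresponds to virtual integration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two import schemas that represent the same real-world concept differently.
    CREATE TABLE src1_film  (title TEXT, yr INTEGER);
    CREATE TABLE src2_movie (name TEXT, release_year TEXT);
    INSERT INTO src1_film  VALUES ('Alien', 1979);
    INSERT INTO src2_movie VALUES ('Heat', '1995');

    -- Integrated schema: renames attributes and converts the year to INTEGER.
    CREATE VIEW movie AS
        SELECT title, yr AS year, 'src1' AS source FROM src1_film
        UNION ALL
        SELECT name, CAST(release_year AS INTEGER), 'src2' FROM src2_movie;
    """
)

# The application queries the integrated schema only.
result = conn.execute("SELECT title, year FROM movie WHERE year > 1990").fetchall()
```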
22
Schema Integration
• Standard Methodology
  – Schema translation (wrapper)
  – Correspondence investigation
  – Conflict resolution and schema integration
23
Identifying Schema Correspondences
• Sources of information
  – source schema
  – source database
  – source application
  – database administrator, developer, user
Which were your information sources?
24
Identifying Schema Correspondences
• Semantic correspondences
  – e.g. related names
• Structural correspondences
  – reachability by paths
• Data analysis
  – distribution of values
Can you give examples?
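As one possible example, related names and value distributions can both be turned into simple scores. The sketch below uses `difflib.SequenceMatcher` for name similarity and Jaccard overlap of the value sets as a crude stand-in for comparing value distributions; the function names, the 0.5 threshold, and the sample columns are assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Semantic hint: how similar are two attribute names (0.0 to 1.0)?"""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(values_a, values_b):
    """Data analysis hint: Jaccard overlap of the two value sets."""
    a, b = set(values_a), set(values_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def candidate_pairs(schema_a, schema_b, threshold=0.5):
    """Propose attribute pairs whose best score reaches the threshold."""
    pairs = []
    for col_a, vals_a in schema_a.items():
        for col_b, vals_b in schema_b.items():
            score = max(name_similarity(col_a, col_b), value_overlap(vals_a, vals_b))
            if score >= threshold:
                pairs.append((col_a, col_b))
    return pairs


source_a = {"title": ["Alien", "Heat"], "year": [1979, 1995]}
source_b = {"movie_title": ["Heat", "Alien", "Brazil"], "released": [1995, 1979, 1985]}
pairs = candidate_pairs(source_a, source_b)
```

Here `year` and `released` share no naming hint and are paired only through their values. Such candidates still need confirmation by a person who knows the sources.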
25
Conflicts
• What types of problems did you encounter integrating corresponding data?
26
Types of Conflicts
• Schema level
  – Naming conflicts
  – Structural conflicts
  – Classification conflicts
  – Constraint and behavioral conflicts
• Data level
  – Identification conflicts
  – Representational conflicts
  – Data errors
27
Conflict Resolution
• Depends on type of conflict
• Requires construction of mappings
• Mappings might be complex, e.g. not expressible as SQL views
28
Naming Conflicts
• Homonyms (give example)
  – same name used for different concepts
  – Resolution: introduce prefixes to distinguish the names
• Synonyms (give example)
  – different names for the same concepts
  – Resolution: introduce a mapping to a common name
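Both resolutions can be sketched as a renaming step. The synonym table, the source names `shop` and `staff`, and the attribute `title` (a movie title in one source, a job title in the other) are invented for illustration.

```python
# Hypothetical synonym table: different names mapped to one common name.
SYNONYMS = {"film": "movie", "picture": "movie"}


def resolve_names(source, attributes, homonyms):
    """Return a mapping from each source attribute to its integrated name."""
    resolved = {}
    for name in attributes:
        common = SYNONYMS.get(name, name)   # synonyms: map to the common name
        if common in homonyms:              # homonyms: prefix with the source
            common = f"{source}_{common}"
        resolved[name] = common
    return resolved


# "title" means movie title in the shop DB and job title in the staff DB.
shop = resolve_names("shop", ["film", "title"], homonyms={"title"})
staff = resolve_names("staff", ["title", "name"], homonyms={"title"})
```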
29
Structural Conflicts
• Different, non-corresponding attributes
  – Resolution: create a relation with the union of the attributes
• Different datatypes
  – Resolution: build a mapping function
• Different data model constructs
  – e.g. attribute vs. relation
  – Resolution: requires higher order mappings
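The first two resolutions can be sketched as follows, with rows represented as dictionaries. The helper names `outer_union` and `minutes_from_text`, the runtime format "1h 57m", and the sample rows are assumptions; the third case (attribute vs. relation) is not covered here.

```python
def outer_union(rows_a, rows_b):
    """Relation over the union of the attributes; missing values become None."""
    all_rows = rows_a + rows_b
    attributes = sorted({key for row in all_rows for key in row})
    return [{attr: row.get(attr) for attr in attributes} for row in all_rows]


def minutes_from_text(runtime):
    """Mapping function for a datatype conflict: '1h 57m' -> 117 (minutes)."""
    hours, minutes = runtime.replace("m", "").split("h")
    return int(hours) * 60 + int(minutes)


rows_a = [{"title": "Alien", "year": 1979}]
rows_b = [{"title": "Heat", "runtime": minutes_from_text("2h 50m")}]
merged = outer_union(rows_a, rows_b)
```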
30
Classification Conflicts
• Relations can have different coverage (inclusion, non-empty intersection)
  – Resolution: build generalization hierarchies
• Additional problem
  – Identification of corresponding data instances
  – "real world" correspondence is application dependent
31
Data Correspondences
• Corresponding data instances
  – similar to naming conflicts at schema level
  – Resolution: mapping tables and functions
  – Similarity functions
• Corresponding data values, data conflicts
  – of corresponding data instances
  – Resolution: mapping tables and functions
  – Prefer data from more trusted data source
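A minimal sketch of both steps: a similarity function decides whether two records describe the same instance, and conflicting values are then taken from the more trusted source. The trust ranking, the 0.8 threshold, the record layout, and the sample values are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical trust ranking: a higher number means a more trusted source.
TRUST = {"imdb": 2, "shop": 1}


def same_entity(a, b, threshold=0.8):
    """Similarity function: do two records refer to the same movie?"""
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= threshold


def fuse(a, b):
    """Merge two corresponding records, preferring the more trusted source."""
    first, second = sorted([a, b], key=lambda r: TRUST[r["source"]], reverse=True)
    fused = dict(second)
    # Values from the trusted record win, unless they are missing.
    fused.update({k: v for k, v in first.items() if v is not None})
    return fused


imdb = {"source": "imdb", "title": "The Matrix", "year": 1999}
shop = {"source": "shop", "title": "the matrix ", "year": 1998, "price": 9.99}
fused = fuse(imdb, shop) if same_entity(imdb, shop) else None
```

The conflicting year is resolved in favor of the trusted source, while the price, which only the shop knows, is kept.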
32
Constraint and Behavioral Conflicts
• Cardinality conflicts
  – different types of cardinality constraints on relationships
  – Resolution: use the more general constraint
• Behavioral conflicts for relation update
  – E.g. cascading delete vs. non-cascading
  – Resolution: add missing behavior at global level
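For cardinality conflicts, "the more general constraint" can be read as the weakest constraint that both sources satisfy. The sketch below assumes cardinalities are written as `(min, max)` pairs with `None` meaning unbounded; this encoding and the function name are choices made for the example.

```python
def more_general(card_a, card_b):
    """Return the weakest (min, max) cardinality that covers both inputs."""
    (min_a, max_a), (min_b, max_b) = card_a, card_b
    lower = min(min_a, min_b)
    # None stands for an unbounded maximum ("n" or "*").
    upper = None if max_a is None or max_b is None else max(max_a, max_b)
    return (lower, upper)


# Source A: every order has exactly one customer; source B: zero or more.
integrated = more_general((1, 1), (0, None))
```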
33
More?
• Security
  – protecting data
• Data quality
  – actively managing data quality
• Integration as agreement process
  – "emergent semantics"