Berendt: Advanced databases, first semester 2009, http://www.cs.kuleuven.ac.be/~berendt/teaching. Advanced databases – Introduction and overview. Prof. Dr. Bettina Berendt, Katholieke Universiteit Leuven, Department of Computer Science. http://www.cs.kuleuven.ac.be/~berendt/teaching/2009-10-1stsemester/adb/ Last update: 23 September 2009
Transcript
Page 1:

Berendt: Advanced databases, first semester 2009, http://www.cs.kuleuven.ac.be/~berendt/teaching

Advanced databases –

Introduction and overview

Prof. Dr. Bettina Berendt

Katholieke Universiteit Leuven, Department of Computer Science

http://www.cs.kuleuven.ac.be/~berendt/teaching/2009-10-1stsemester/adb/

Last update: 23 September 2009

Page 2:

Agenda

Organisation of the course

Motivation and overview

Data, information, and knowledge

Conceptual modelling, schemas, and ontologies

Recap: Entity-relationship model for data modelling

Page 3:

Organisation of the course

www.cs.kuleuven.be/~berendt/teaching/2009-10-1stsemester/adb

Master’s course for CS students specializing in Databases + others

Teaching

Lecture: Bettina Berendt, in English

Exercises and homeworks: Ilija Subašić, in English

Materials:

see Web site; available ~ 1 week before each class

Grading based on exercises; no exam

Contact us:

via the toledo system (details to be announced)

[ bettina.berendt | ilija.subasic]@cs.kuleuven.be

Page 4:

Agenda

Organisation of the course

Motivation and overview

Data, information, and knowledge

Conceptual modelling, schemas, and ontologies

Recap: Entity-relationship model for data modelling

Page 5:

LOTS of data (often, but not always, in database form)

Page 6:

What is this course about? (1) – What does it build on?

The database field profits from a well-understood, well-functioning, commonly-used general model: relational databases

You have learned about this in the Databases course

Relational databases: a „homogenizing model“

What else makes databases so powerful today?

Page 7:

1. Data are accessible because they are interconnected

(often, but not always, over the Internet/Web)

Page 8:

2. Heterogeneous data are integrated (often, but not always, „semantically“)

Page 9:

3. They are analysed to reveal the „knowledge“ implicit in them (e.g., link structure → PageRank → sorting to order by relevance)
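As a hedged illustration of the PageRank step named on this slide, here is a minimal power-iteration sketch in Python. The toy link graph, the damping factor of 0.85, the iteration count, and the function name `pagerank` are assumptions made for this example; none of them come from the lecture.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy link structure (hypothetical): A and C both link to B.
toy_links = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
ranks = pagerank(toy_links)

# "Sorting to order by relevance": highest rank first.
ordered = sorted(ranks, key=ranks.get, reverse=True)
print(ordered)
```

The link structure is the only input; the ordering is derived from it, which is the sense in which the analysis reveals something that is only implicit in the data.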

Page 10:

An application example: Where do people live who will buy the Qur‘an soon?

Page 11:

Data source #1: Amazon wishlists

[Owad, T. (2006). Data Mining 101: Finding Subversives with Amazon Wishlists. http://www.applefritter.com/bannedbooks]

Page 12:

Data sources #2-#4: address books, geocoders, visualizations

1. http://www.amazon.com/gp/registry/search.html/?encoding=UTF8&type=wishlist&field-name=edgar&page=1 contains “edgar“ wishlist URLs:

http://www.amazon.com/gp/registry/registry.html/?encoding=UTF8&type=wishlist&id=theFirstEdgar...

2. 6-line shell script + wget → many wish lists

3. ls -1 | xargs grep -HiFof /Volumes/UFS/terms.txt > /Volumes/UFS/matches.txt (or search by ISBN):

search term (or ISBN) → {person name + city}

4. http://people.yahoo.com/

book → {name + address}

5. http://www.ontok.com/geocode :

book → {geo-coordinates}

6. Google Maps API: insert geo-coordinates into map
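The matching in step 3 (case-insensitive search for fixed strings from a term list, reporting the file name with each hit) can be sketched in Python as follows. This is only an approximation of the shell command under stated assumptions: the function name `find_matches`, the file names, and the sample texts are invented for the example, and the sketch reports each term at most once per file.

```python
from pathlib import Path
import tempfile

def find_matches(directory, terms):
    """Return (file name, term) pairs for every term found in a file's text.

    Matching is case-insensitive and on fixed strings (no regular expressions).
    """
    matches = []
    lowered_terms = [t.lower() for t in terms if t.strip()]
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        for term in lowered_terms:
            if term in text:
                matches.append((path.name, term))
    return matches

# Usage with made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "list_a.html").write_text("Title: Databases for Beginners", encoding="utf-8")
    Path(tmp, "list_b.html").write_text("Title: Gardening", encoding="utf-8")
    result = find_matches(tmp, ["databases", "ontology"])
print(result)
```

The point of the slide carries over: each step is technically trivial, and the result only becomes sensitive once the outputs of the separate sources are joined.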

Page 13:

So: What is this course about? (2) – What will it be about?

The database field profits from a well-understood, well-functioning, commonly-used general model: relational databases

You have learned about this in the Databases course

Relational databases: a „homogenizing model“

What else makes databases so powerful today?

Semantic integration of heterogeneous data

Integration over the Internet/Web

Analysis beyond retrieval: „Knowledge discovery (in databases)“ aka „Data mining“

Page 14:

Outline of the course

Lectures (see Web page)

Exercises progress from small „Bachelor-type exercises“ to a larger joint „mini-project“ with distributed teams

conceptual elements (modelling), tool use, programming, reports

Will be similar in structure to last year:

1. Create a conceptual model in UML of ...

2. Model the same domain in OWL

3. Federated search: Retrieve information from different databases

4. Convert information (2008: XML → the OWL model created in ex. 2)

5. Extract implicit knowledge from a given relational database table

6. Extract implicit knowledge from a given semi-structured dataset

7. Knowledge discovery from real data on the Web (Wikipedia): retrieval, preprocessing, semantic enrichment, model integration, pattern extraction, visualisation, model comparison

Page 15:

Learning outcomes: After this course, you will ...

understand and master relevant concepts and techniques of current databases and processing based on databases

understand the potentials, limitations, and risks inherent in assembling, combining, and processing huge amounts of heterogeneous data in globally interconnected environments

be able to design such databases and connectivity and relevant methods for combining and enriching data

have worked with concrete examples of such data collection / processing

Page 16:

Agenda

Organisation of the course

Motivation and overview

Data, information, and knowledge

Conceptual modelling, schemas, and ontologies

Recap: Entity-relationship model for data modelling

Page 17:

Data and information

Datum / Data

Fact or concept from reality, in a form suitable for communicating it, interpreting it, and processing it

Information

Interpreted data

Example:

The length of the road is 400 km

[Slide annotation labelling the parts of the example sentence: Interpretation, Data]

(based on Henk Olivié: Gegevensbanken – 01. 2006/07)

Page 18:

Data, information, and knowledge

Data represents a fact or statement of event without relation to other things. Ex: It is raining.

Information embodies the understanding of a relationship of some sort, possibly cause and effect.

Ex: The temperature dropped 15 degrees and then it started raining.

Knowledge represents a pattern that connects and generally provides a high level of predictability as to what is described or what will happen next.

Ex: If the humidity is very high and the temperature drops substantially, the atmosphere is often unlikely to be able to hold the moisture, so it rains.

(This is from knowledge-management theory. If you want to know about wisdom, check the Web page:

G. Bellinger, D. Castro, & A. Mills: Data, Information, Knowledge, and Wisdom. http://www.systems-thinking.org/dikw/dikw.htm )

Page 19:

„Knowledge“ as used in this course

Data represents a fact or statement of event without relation to other things. Ex: It is raining.

Information embodies the understanding of a relationship of some sort, possibly cause and effect.

Ex: The temperature dropped 15 degrees and then it started raining.

Knowledge represents a pattern that connects and generally provides a high level of predictability as to what is described or what will happen next.

Ex: If the humidity is very high and the temperature drops substantially, the atmosphere is often unlikely to be able to hold the moisture, so it rains.

This definition of „knowledge“ corresponds to that used in

Data mining (aka „knowledge discovery (in databases)“)

(in particular symbolic) AI (e.g., „knowledge-based systems“)

It is not the only definition; e.g., cognitive psychology generally assumes that only people can have knowledge, such that computers can only possess (different types of) information.
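To make the data-mining sense of „knowledge“ concrete, the following Python sketch estimates how reliably a pattern predicts an outcome from individual records. It is an illustration under stated assumptions: the observations are toy values invented for this example, and `rule_confidence` is a hypothetical helper name, not a method from the lecture.

```python
# Each observation is one datum-level record (toy values, invented for illustration).
observations = [
    {"humidity_high": True,  "temp_drop": True,  "rain": True},
    {"humidity_high": True,  "temp_drop": True,  "rain": True},
    {"humidity_high": True,  "temp_drop": False, "rain": False},
    {"humidity_high": False, "temp_drop": True,  "rain": False},
    {"humidity_high": True,  "temp_drop": True,  "rain": False},
    {"humidity_high": False, "temp_drop": False, "rain": False},
]

def rule_confidence(records, conditions, outcome):
    """Fraction of records satisfying all conditions in which the outcome also holds."""
    matching = [r for r in records if all(r[c] for c in conditions)]
    if not matching:
        return None  # the rule never applies, so no confidence can be computed
    return sum(1 for r in matching if r[outcome]) / len(matching)

confidence = rule_confidence(observations, ["humidity_high", "temp_drop"], "rain")
print(f"humidity_high AND temp_drop -> rain: confidence {confidence:.2f}")
```

A single record corresponds to data in the slide's terminology; the rule together with its confidence is the kind of pattern with predictive value that the slide calls knowledge.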

Page 20:

Computerizing data, information, and knowledge: Databases and knowledge bases

Databases

= data + interpretation (metadata)

focus on data and information

= focus on the retrieval of data and information

Knowledge bases

a special kind of database

provide the means for the computerized collection, organization, and retrieval of knowledge

focus on knowledge

= focus on the inferences that can be made from data+information
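The contrast between retrieval and inference can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under assumptions: the table `located_in`, its columns, and the helper `infer_regions` are invented for the example, and the transitivity rule is applied in plain Python rather than by a real knowledge-base system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema (table and column names) is the metadata that interprets the raw values.
conn.execute("CREATE TABLE located_in (place TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO located_in VALUES (?, ?)",
    [("Leuven", "Flanders"), ("Flanders", "Belgium"), ("Belgium", "Europe")],
)

# Database view: retrieve what is stored.
stored = conn.execute(
    "SELECT region FROM located_in WHERE place = ?", ("Leuven",)
).fetchall()

def infer_regions(connection, place):
    """Knowledge-base view: apply the rule 'located_in is transitive'."""
    found, frontier = set(), [place]
    while frontier:
        current = frontier.pop()
        rows = connection.execute(
            "SELECT region FROM located_in WHERE place = ?", (current,)
        ).fetchall()
        for (region,) in rows:
            if region not in found:
                found.add(region)
                frontier.append(region)
    return found

inferred = infer_regions(conn, "Leuven")
print(stored)            # only the stored fact
print(sorted(inferred))  # stored fact plus everything the rule derives
```

The facts are identical in both views; what differs is whether the system only returns them or also derives new statements from them.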

Page 21:

Combining data and knowledge from different sources: The importance of conceptual models

To combine data from different databases:

know + integrate their conceptual models

To combine data from databases and knowledge bases:

1. understand the commonalities and differences of their conceptual meta-models

Simplified:

database conceptual models = entities + relations

knowledge base conceptual models = entities + relations + rules for inferencing

2. integrate these conceptual models (as for databases)
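A rough sketch of what integrating conceptual models can mean in practice: two sources describe the same kind of entity with different schemas, and a mapping rewrites each into one shared model. Everything here is hypothetical (the tables `customer` and `member`, the sample rows, and the shared entity Person); the code uses only Python's standard `sqlite3` module.

```python
import sqlite3

# Source A stores people as (full_name, city); source B as (surname, given_name, town).
source_a = sqlite3.connect(":memory:")
source_a.execute("CREATE TABLE customer (full_name TEXT, city TEXT)")
source_a.execute("INSERT INTO customer VALUES ('Ada Example', 'Leuven')")

source_b = sqlite3.connect(":memory:")
source_b.execute("CREATE TABLE member (surname TEXT, given_name TEXT, town TEXT)")
source_b.execute("INSERT INTO member VALUES ('Sample', 'Bob', 'Gent')")

# Shared conceptual model: entity Person with attributes name and city.
# Each mapping is a query that rewrites one source schema into that model.
mappings = [
    (source_a, "SELECT full_name AS name, city FROM customer"),
    (source_b, "SELECT given_name || ' ' || surname AS name, town AS city FROM member"),
]

def integrated_persons():
    """Collect Person records from all sources in the shared form."""
    persons = []
    for connection, query in mappings:
        for name, city in connection.execute(query):
            persons.append({"name": name, "city": city})
    return persons

persons = integrated_persons()
print(persons)
```

Writing the mappings requires knowing what `town` and `city` mean in each source, which is why the slide puts knowledge of the conceptual models before the integration itself.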

Page 22:

Agenda

Organisation of the course

Motivation and overview

Data, information, and knowledge

Conceptual modelling, schemas, and ontologies

Recap: Entity-relationship model for data modelling

Page 23:

Conceptual modelling as a part of database design

Page 24:

Conceptual database schemas and conceptual models in general

Conceptual schema: a concise description of the data requirements of the users

includes detailed descriptions of the entity types, relationships, and constraints

does not include implementation details

can be used to communicate with non-technical users

(Elmasri, R. & Navathe, S.B. (2007). Fundamentals of Database Systems. Boston: Addison Wesley. 5th Edition. p. 60)

Conceptual model: a theoretical construct that represents something, with a set of variables and a set of logical and quantitative relationships between them.

describes the semantics of the modelled domain

Models in this sense are constructed to enable reasoning within an idealized logical framework

Often in the form of an ontology, or having an ontology as a part

– Ontology (a simple definition): ~ schema plus axioms for inference
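The simple definition „schema plus axioms for inference“ can be illustrated with a toy class hierarchy in Python. The class names, the individual, and the function `inferred_types` are assumptions made for this sketch; real ontology languages such as OWL offer far richer axioms than the single subclass rule shown here.

```python
# Schema part: classes and a subclass relation (toy domain, invented for illustration).
subclass_of = {
    "MastersCourse": "Course",
    "Course": "TeachingActivity",
}
# Data part: asserted types of individuals.
asserted_type = {"AdvancedDatabases": "MastersCourse"}

def inferred_types(individual):
    """Axiom: an instance of a class is also an instance of every superclass.

    Assumes the subclass hierarchy is acyclic.
    """
    types = []
    current = asserted_type.get(individual)
    while current is not None:
        types.append(current)
        current = subclass_of.get(current)
    return types

print(inferred_types("AdvancedDatabases"))
```

Only one type is asserted for the individual; the other two follow from the axiom, which is what distinguishes this from a plain schema lookup.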

Page 25:

Conceptual modelling: languages, automated code generation, integration

Typically, the conceptual model(s) that are developed are captured in a software tool, using a particular conceptual modeling language.

Entity-relationship models (ERM)

Unified modeling language (UML)

But also: resource description framework (RDF), Web ontology language (OWL)

Conceptual modeling is one of the key activities in developing computerized systems for two important reasons.

Firstly, more and more, it is now possible to use computerized tools that can generate part (or sometimes all) of a computer application from the conceptual models encoded in standardized modeling languages [such as UML].

Secondly, computerization of enterprises continues with a focus on integrating systems.

Integration of systems requires an understanding of the semantics of each of the systems to be integrated.

The availability of conceptual models for the participant systems can facilitate the integration process and will require the involved staff to be fluent with the basics of the models employed and to have some modeling capabilities of their own. ...
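The first reason (generating part of an application from a conceptual model) can be sketched on a very small scale: a Python description of two entities is turned into SQL table definitions. This is a hedged illustration, not how UML tools work internally; the `Entity` class, the function `to_ddl`, and the lecturer/course model are all invented for the example.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    attributes: dict            # attribute name -> SQL type
    key: str                    # name of the identifying attribute
    references: dict = field(default_factory=dict)  # attribute -> referenced entity name

def to_ddl(entity, entities_by_name):
    """Generate one CREATE TABLE statement from an entity description."""
    columns = []
    for attr, sql_type in entity.attributes.items():
        column = f"{attr} {sql_type}"
        if attr == entity.key:
            column += " PRIMARY KEY"
        columns.append(column)
    for attr, target_name in entity.references.items():
        target = entities_by_name[target_name]
        columns.append(f"FOREIGN KEY ({attr}) REFERENCES {target.name}({target.key})")
    return f"CREATE TABLE {entity.name} ({', '.join(columns)})"

# Hypothetical conceptual model: a course is taught by one lecturer.
model = [
    Entity("lecturer", {"id": "INTEGER", "name": "TEXT"}, key="id"),
    Entity("course", {"code": "TEXT", "title": "TEXT", "lecturer_id": "INTEGER"},
           key="code", references={"lecturer_id": "lecturer"}),
]
by_name = {e.name: e for e in model}
statements = [to_ddl(e, by_name) for e in model]

conn = sqlite3.connect(":memory:")
for statement in statements:
    conn.execute(statement)   # run the generated DDL against SQLite
print(statements[1])
```

The model stays free of implementation details such as SQL syntax; those are introduced only by the generator, which mirrors the separation between conceptual schema and implementation described on the previous slide.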

Page 26:

Agenda

Organisation of the course

Motivation and overview

Data, information, and knowledge

Conceptual modelling, schemas, and ontologies

Recap: Entity-relationship model for data modelling

Page 27:

Recap: Conceptual modelling in the Entity-Relationship Model

insert here:

Jeff Ullman

The Entity-Relationship (E/R) Model.

2004 Slide set.

http://infolab.stanford.edu/~ullman/dscb/pslides/er.ppt

(in particular pp. 1-39)

(A lot of detail also in Henk Olivié, Gegevensbanken [Databases]:

3: gegevensmodellering met het entiteit-relatie model [data modelling with the entity-relationship model],

4: het uitgebreide entiteit relatie model en UML [the extended entity-relationship model and UML])

Or

(Instructor slides of the Elmasri/Navathe book, in English:

ch03.ppt, ch04.ppt in the directory „Lecture/OtherSlides“ of this course's Web site)

Page 28:

Next lecture

Organisation of the course

Motivation and overview

Data, information, and knowledge

Conceptual modelling, schemas, and ontologies

Recap: Entity-relationship model for data modelling

Data modelling: UML, logics, Semantic Web

Page 29:

References / background reading; acknowledgements

p. 23:

Elmasri, R. & Navathe, S.B. (2007). Fundamentals of Database Systems. Boston: Addison Wesley. 5th Edition. p. 410

p. 25: Based on: Dagstuhl seminar April 2008: The Evolution of Conceptual Modeling

http://www.dagstuhl.de/de/programm/kalender/semhp/?semnr=2008181

p. 27 – the referenced Ullman slides refer to

Hector Garcia-Molina, Jeff Ullman, & Jennifer Widom (2002). Database Systems: The Complete Book. Upper Saddle River, NJ: Prentice-Hall.

