Berendt: Advanced databases, winter term 2007/08, http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/

Advanced databases –

Inferring implicit/new knowledge from data(bases):

Tying it all together (a start)

Bettina Berendt

Katholieke Universiteit Leuven, Department of Computer Science

http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/

Last update: 6 December 2007

Goal 1 for today

Wrap up yesterday’s lecture and discussion, and prepare you for the next assignment

Goal 2 for today: identify “missing links” and point to solution approaches

(on the board)

Agenda

Naïve Bayes [remaining from yesterday]

Changing representation: LSI [rem. from yesterday]

Ont.+KDD: Apriori and taxonomies

KDD+DB: Constrained pattern mining – ex. WUM

KDD+DB: Inductive databases (very brief)

KDD+Ont.: Induction and Semantic Web (very brief)

Mining association rules

Apriori: (slides from D. Delic)

Mining generalized association rules: (Karlsruhe slides)
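As a rough illustration of how taxonomies enter association-rule mining, the sketch below extends each transaction with the ancestors of its items and then runs a plain level-wise Apriori over the extended transactions. The taxonomy, the item names and the helper functions (`ancestors`, `extend`, `apriori`) are hypothetical and chosen only for this example; the sketch does not prune itemsets that contain both an item and its own ancestor, which a full generalized-rule miner would do.

```python
from itertools import combinations

# Hypothetical taxonomy (child -> parent); not taken from the lecture slides.
taxonomy = {
    "jacket": "outerwear",
    "ski_pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "hiking_boots": "footwear",
}

def ancestors(item, taxonomy):
    """All ancestors of `item`, walking up the child -> parent links."""
    result = []
    while item in taxonomy:
        item = taxonomy[item]
        result.append(item)
    return result

def extend(transaction, taxonomy):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, taxonomy))
    return frozenset(extended)

def apriori(transactions, min_support):
    """Level-wise search: candidates of size k are built from frequent (k-1)-itemsets."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([i]) for t in transactions for i in t}
    level = {c for c in level if sup(c) >= min_support}
    frequent = {}
    k = 1
    while level:
        for c in level:
            frequent[c] = sup(c)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and sup(c) >= min_support
        }
    return frequent

# Hypothetical toy transactions.
raw = [
    frozenset({"jacket", "hiking_boots"}),
    frozenset({"ski_pants", "hiking_boots"}),
    frozenset({"shirt"}),
    frozenset({"jacket"}),
]
plain = apriori(raw, min_support=0.5)
generalized = apriori([extend(t, taxonomy) for t in raw], min_support=0.5)
```

On this toy data, no pair of leaf items reaches the 0.5 threshold, but the generalized pair {outerwear, hiking_boots} does, because jackets and ski pants are both counted as outerwear.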

Main interestingness measures of association rules

Support of a rule A → B

= no. of instances with A and B / no. of all instances

Confidence of a rule A → B

= no. of instances with A and B / no. of instances with A

= support (A & B) / support (A)

Lift of a rule A → B

= support (A & B) / [ support (A) * support (B) ]

What does this measure, and in what numerical interval can it be?
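The three measures can be computed directly from their definitions. The following is a minimal sketch on hypothetical toy transactions; the function names and the data are assumptions made for this example.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, body, head):
    """support(A & B) / support(A) for the rule body -> head."""
    both = set(body) | set(head)
    return support(transactions, both) / support(transactions, body)

def lift(transactions, body, head):
    """support(A & B) / (support(A) * support(B)) for the rule body -> head."""
    both = set(body) | set(head)
    return support(transactions, both) / (
        support(transactions, body) * support(transactions, head)
    )

# Hypothetical toy data: five transactions.
transactions = [
    frozenset({"jacket", "umbrellas"}),
    frozenset({"jacket", "umbrellas", "flight_Dublin"}),
    frozenset({"jacket"}),
    frozenset({"umbrellas"}),
    frozenset({"bread"}),
]

s = support(transactions, {"jacket", "umbrellas"})       # 2/5
c = confidence(transactions, {"jacket"}, {"umbrellas"})  # (2/5) / (3/5)
l = lift(transactions, {"jacket"}, {"umbrellas"})        # (2/5) / (3/5 * 3/5)
```

Lift compares the observed co-occurrence of A and B with what would be expected if A and B were independent: a value of 1 corresponds to independence, values above 1 to a positive association, values below 1 to a negative one. It is never negative and has no fixed upper bound.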

Interestingness measures

Interestingness as a constraint

So we’re not interested in

“show me all patterns”

but in

“show me all patterns that are interesting = that have properties X”

→ constraints!

Examples from MINERULE

MINE RULE exemple AS

SELECT DISTINCT 1..n Item AS BODY, 1..1 Item AS HEAD, SUPPORT, CONFIDENCE

WHERE HEAD.Item = 'umbrellas' // also other fields, e.g. Date

FROM Purchase

GROUP BY Tid

HAVING COUNT(*) < 6

EXTRACTING RULES WITH SUPPORT: 0.06, CONFIDENCE: 0.9

E.g., jacket, flight_Dublin → umbrellas (support 0.08, confidence 0.93)
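To make the constraints in this statement concrete, the sketch below emulates its intended reading in plain Python on a hypothetical Purchase table: group rows by Tid, keep groups with fewer than 6 rows, fix the rule head, and keep rules that meet the support and confidence thresholds. This is not how the MINE RULE operator is implemented. The function `mine_rules`, the toy rows and the `max_body` bound (which caps the 1..n body size so that brute-force enumeration stays small) are assumptions, as is the choice to compute support relative to the groups that pass the HAVING filter.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical Purchase table as (Tid, Item) rows.
purchase = [
    (1, "jacket"), (1, "flight_Dublin"), (1, "umbrellas"),
    (2, "jacket"), (2, "flight_Dublin"), (2, "umbrellas"),
    (3, "jacket"), (3, "umbrellas"),
    (4, "bread"),
]

def mine_rules(rows, head_item, min_support, min_confidence,
               max_group_size=6, max_body=2):
    # GROUP BY Tid
    groups = defaultdict(set)
    row_counts = defaultdict(int)
    for tid, item in rows:
        groups[tid].add(item)
        row_counts[tid] += 1
    # HAVING COUNT(*) < 6
    baskets = [items for tid, items in groups.items()
               if row_counts[tid] < max_group_size]
    n = len(baskets)
    # WHERE HEAD.Item = head_item: the head is fixed, bodies use the other items.
    body_items = sorted({i for b in baskets for i in b} - {head_item})
    rules = []
    for size in range(1, max_body + 1):
        for body in combinations(body_items, size):
            body_set = set(body)
            n_body = sum(1 for b in baskets if body_set <= b)
            n_both = sum(1 for b in baskets
                         if body_set <= b and head_item in b)
            if n_body == 0:
                continue
            sup, conf = n_both / n, n_both / n_body
            # EXTRACTING RULES WITH SUPPORT: ..., CONFIDENCE: ...
            if sup >= min_support and conf >= min_confidence:
                rules.append((body, head_item, sup, conf))
    return rules

rules = mine_rules(purchase, "umbrellas", min_support=0.06, min_confidence=0.9)
```

Each returned tuple has the form (body, head, support, confidence), mirroring the BODY, HEAD, SUPPORT and CONFIDENCE columns selected in the statement.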

The site

Business understanding / problem definition:

* How do users search in this online catalog?

* Which search criteria are popular?

* Which are efficient?

[Berendt & Spiliopoulou, VLDB Journal 2000]

The concept hierarchies / site ontology (excerpt)

SEITE1-...LI (1st page of a list) or SEITEn-...LI (further page)

LA (“Land”: country/state), SA (“Schulart”: school type), SU (“Suche”: search)

Sequence mining – one result pattern: successful search for a school in Germany

[Figure annotations: a refinement, a repetition, a continuation]

one example pattern

select t

from node a b, template a * b as t

where a.url startswith "SEITE1-"

and a.occurrence = 1

and b.url contains "1SCHULE"

and b.occurrence = 1

and (b.support / a.support) >= 0.2

(Berendt & Spiliopoulou, VLDB J. 2000)

/liste.html?offset=920&zeilen=20&anzahl=1323&sprache=de&sw_kategorie=de&erscheint=&suchfeld=&suchwert=&staat=de&region=by&schultyp=
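The MINT query above can be read as: find pairs of pages (a, b) where a is the first page of a list, b is a school page reached at some later point in the same session, both at their first occurrence, and at least 20% of the sessions containing a go on to b. The sketch below emulates that reading directly on a list of sessions. It is a simplification and not WUM's algorithm, which works on an aggregated log; the page names, the sessions and the function `template_a_star_b` are hypothetical.

```python
from collections import Counter

def template_a_star_b(sessions, a_pred, b_pred, min_conf):
    """Emulate the template `a * b` with a.occurrence = 1 and b.occurrence = 1.

    a_support[a]      = number of sessions that contain page a
    ab_support[(a,b)] = number of sessions where the first visit to b
                        comes after the first visit to a
    """
    a_support = Counter()
    ab_support = Counter()
    for session in sessions:
        pages = set(session)
        for a in (p for p in pages if a_pred(p)):
            a_support[a] += 1
            first_a = session.index(a)  # first occurrence of a
            for b in (p for p in pages if b_pred(p)):
                if session.index(b) > first_a:  # first occurrence of b, after a
                    ab_support[(a, b)] += 1
    # Keep bindings whose confidence b.support / a.support meets the threshold.
    return {
        (a, b): n / a_support[a]
        for (a, b), n in ab_support.items()
        if n / a_support[a] >= min_conf
    }

# Hypothetical sessions over hypothetical page names.
sessions = [
    ["SEITE1-LA-LI", "SEITEn-LA-LI", "1SCHULE"],
    ["SEITE1-LA-LI", "1SCHULE"],
    ["SEITE1-LA-LI", "SEITEn-LA-LI"],
    ["SEITE1-SU-LI", "SEITE1-SU-LI"],
    ["1SCHULE", "SEITE1-LA-LI"],
]

patterns = template_a_star_b(
    sessions,
    a_pred=lambda url: url.startswith("SEITE1-"),
    b_pred=lambda url: "1SCHULE" in url,
    min_conf=0.2,
)
```

In this toy log, four sessions contain SEITE1-LA-LI and two of them reach 1SCHULE afterwards, so the pattern is kept with confidence 0.5; the session that visits the school page before the list does not count towards it.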
Sequences

Generalized sequences, navigation patterns, hits in WUM

Aggregated Logs: The basic internal representation in WUM

The confidence measure for generalized sequences

Templates in the query language MINT, g-sequences, and navigation patterns

Interestingness measures: Support (hits) and confidence

Aggregated Logs, queries, and query results

The basic idea of the WUM algorithm

MINT can express 3 types of constraints (“predicates”)

The WUM gseqm algorithm

(B predicates)

Also for higher-order structures (graphs): Ex. MolFea

Inductive databases: the basic idea

(on the board)

Induction and Semantic Web: (one) basic idea

(on the board)

Next lecture

Naïve Bayes [remaining from yesterday]

Changing representation: LSI [rem. from yesterday]

Ont.+KDD: Apriori and taxonomies

KDD+DB: Constrained pattern mining – ex. WUM

KDD+DB: Inductive databases (very brief)

KDD+Ont.: Induction and Semantic Web (very brief)

Applications

References and background reading; acknowledgements

Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages 207-216, Washington, D.C., May 1993. http://citeseer.ist.psu.edu/agrawal93mining.html

(presentation from Delic, D. (2002). Mining Association Rules with Rough Sets and Large Itemsets - A Comparative Study.)

Ramakrishnan Srikant and Rakesh Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995. http://citeseer.ist.psu.edu/srikant95mining.html

(presentation from http://www.kde.cs.uni-kassel.de/lehre/ss2004/kdd/folien/4Folie_VII.3_Assoziationsregeln.pdf)

P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002. http://citeseer.ist.psu.edu/tan02selecting.html

MINERULE: R. Meo, G. Psaila and S. Ceri, An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, Vol. 2 (2), pp. 195-224, 1998. http://www.springerlink.com/index/L57188431Q027L73.pdf

WUM and the Schulweb study: Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75. http://vasarely.wiwi.hu-berlin.de/Home/berendt-spiliopoulou-vldbj00.pdf

MolFea (esp. the example): S. Kramer, L. De Raedt, C. Helma. Molecular Feature Mining in HIV Data. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 2001.

De Raedt, L. (2002) A perspective on inductive databases. SIGKDD Explorations. Volume 4, Issue 2, 69-77. http://owl-workshop.man.ac.uk/acceptedLong/submission_25.pdf