Berendt: Advanced databases, winter term 2007/08, http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/
Advanced databases –
Inferring implicit/new knowledge from data(bases):
Tying it all together (a start)
Bettina Berendt
Katholieke Universiteit Leuven, Department of Computer Science
http://www.cs.kuleuven.be/~berendt/teaching/2007w/adb/
Last update: 6 December 2007
Goal 1 for today
Wrap up yesterday's lecture and discussion, and prepare you for the next assignment
Goal 2 for today: identify "missing links" and point to solution approaches
(on the board)
Agenda
Naïve Bayes [remaining from yesterday]
Changing representation: LSI [rem. from yesterday]
Ont.+KDD: Apriori and taxonomies
KDD+DB: Constrained pattern mining – ex. WUM
KDD+DB: Inductive databases (very brief)
KDD+Ont.: Induction and Semantic Web (very brief)
Mining association rules
Apriori: (slides from D. Delic)
Mining generalized association rules: (Karlsruhe slides)
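One common way to mine generalized association rules, in the spirit of Srikant and Agrawal (1995), is to add each item's taxonomy ancestors to every transaction and then run a level-wise Apriori search. The following is a minimal Python sketch under that assumption; the taxonomy, the transactions, and all function names are invented for illustration and are not taken from the slides referenced above.

```python
from itertools import combinations

# Hypothetical taxonomy (child -> parent), invented for this sketch.
TAXONOMY = {
    "jacket": "outerwear",
    "ski_pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking_boots": "footwear",
}

def extend_with_ancestors(transaction, taxonomy):
    """Return the transaction plus all taxonomy ancestors of its items."""
    extended = set(transaction)
    for item in transaction:
        while item in taxonomy:
            item = taxonomy[item]
            extended.add(item)
    return frozenset(extended)

def apriori(transactions, min_support):
    """Level-wise search for all itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent; then check support.
        level = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent

# Made-up transactions, first without and then with taxonomy ancestors.
plain = [frozenset(t) for t in (
    {"shirt"}, {"jacket", "hiking_boots"}, {"ski_pants", "hiking_boots"},
    {"shoes"}, {"shoes"}, {"jacket"},
)]
generalized = [extend_with_ancestors(t, TAXONOMY) for t in plain]

frequent_plain = apriori(plain, min_support=0.3)
frequent_generalized = apriori(generalized, min_support=0.3)
```

On the plain transactions only single items are frequent, whereas the extended transactions also yield itemsets such as {outerwear, hiking_boots}. This naive version also reports redundant itemsets that contain an item together with its own ancestor (e.g. jacket with outerwear); the published algorithms prune these.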
Main interestingness measures of association rules
Support of a rule A → B
= no. of instances with A and B / no. of all instances
Confidence of a rule A → B
= no. of instances with A and B / no. of instances with A
= support (A & B) / support (A)
Lift of a rule A → B
= support (A & B) / [ support (A) * support (B) ]
What does lift measure, and in what numerical interval can it lie?
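The three measures can be computed directly from their definitions. The sketch below does so in Python on a small made-up list of transactions; the item names and function names are assumptions chosen for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(body, head, transactions):
    """support(A & B) / support(A) for the rule A -> B."""
    both = set(body) | set(head)
    return support(both, transactions) / support(body, transactions)

def lift(body, head, transactions):
    """support(A & B) / (support(A) * support(B)) for the rule A -> B."""
    both = set(body) | set(head)
    return support(both, transactions) / (
        support(body, transactions) * support(head, transactions)
    )

# Made-up example data: five transactions.
transactions = [
    {"jacket", "umbrellas"},
    {"jacket", "umbrellas", "flight_Dublin"},
    {"jacket"},
    {"umbrellas"},
    {"shoes"},
]

s = support({"jacket", "umbrellas"}, transactions)    # 2 of 5 transactions
c = confidence({"jacket"}, {"umbrellas"}, transactions)
l = lift({"jacket"}, {"umbrellas"}, transactions)
```

Here the support is 2/5, the confidence is (2/5) / (3/5) = 2/3, and the lift is (2/5) / ((3/5) * (3/5)) = 10/9. A lift of 1 corresponds to A and B occurring independently of each other.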
Interestingness measures
Interestingness as a constraint
So we're not interested in
"show me all patterns"
but in
"show me all patterns that are interesting, i.e. that have properties X"
That is, constraints!
Examples from MINERULE
MINE RULE exemple AS
SELECT DISTINCT 1..n Item AS BODY, 1..1 Item AS HEAD, SUPPORT, CONFIDENCE
WHERE HEAD.Item = "umbrellas" // also other fields, e.g. Date
FROM Purchase
GROUP BY Tid
HAVING COUNT(*) < 6
EXTRACTING RULES WITH SUPPORT: 0.06, CONFIDENCE: 0.9
E.g., jacket, flight_Dublin → umbrellas (support 0.08, confidence 0.93)
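To make the role of each clause concrete, here is a rough Python sketch that mimics the query above on a tiny made-up Purchase table. It is not how MINERULE is implemented. It assumes that COUNT(*) counts distinct items per Tid and that support is taken relative to the groups that pass the HAVING filter; the function name, the `max_body` limit, and the data are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

def mine_rules(purchases, head_item, max_group_size,
               min_support, min_confidence, max_body=2):
    """Brute-force rules body -> head_item over baskets grouped by Tid."""
    groups = defaultdict(set)
    for tid, item in purchases:              # GROUP BY Tid
        groups[tid].add(item)
    # HAVING COUNT(*) < max_group_size (assumption: distinct items per Tid)
    baskets = [b for b in groups.values() if len(b) < max_group_size]
    n = len(baskets)
    if n == 0:
        return []
    body_items = sorted({i for b in baskets for i in b} - {head_item})
    rules = []
    for size in range(1, max_body + 1):      # 1..n items as BODY (capped here)
        for body in combinations(body_items, size):
            body_set = set(body)
            n_body = sum(body_set <= b for b in baskets)
            n_both = sum(body_set <= b and head_item in b for b in baskets)
            if n_body == 0:
                continue
            supp, conf = n_both / n, n_both / n_body
            if supp >= min_support and conf >= min_confidence:
                rules.append((body, head_item, supp, conf))
    return rules

# Made-up (Tid, Item) rows; Tid 5 has six items and is dropped by HAVING.
purchases = [
    (1, "jacket"), (1, "flight_Dublin"), (1, "umbrellas"),
    (2, "jacket"), (2, "flight_Dublin"), (2, "umbrellas"),
    (3, "jacket"), (3, "shoes"),
    (4, "umbrellas"),
    (5, "jacket"), (5, "flight_Dublin"), (5, "shoes"),
    (5, "hat"), (5, "scarf"), (5, "gloves"),
]

rules = mine_rules(purchases, head_item="umbrellas", max_group_size=6,
                   min_support=0.06, min_confidence=0.9)
```

On this data the sketch returns two rules, flight_Dublin → umbrellas and flight_Dublin, jacket → umbrellas, each with support 0.5 and confidence 1.0. Without the HAVING filter, the six-item basket of Tid 5 would lower both confidences to 2/3 and the rules would be rejected.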
Agenda
Naïve Bayes [remaining from yesterday]
Changing representation: LSI [rem. from yesterday]
Ont.+KDD: Apriori and taxonomies
KDD+DB: Constrained pattern mining – ex. WUM
KDD+DB: Inductive databases (very brief)
KDD+Ont.: Induction and Semantic Web (very brief)
The site
Business understanding / problem definition:
* How do users search in this online catalog?
* Which search criteria are popular?
* Which are efficient?
[Berendt & Spiliopoulou,VLDB Journal 2000]
The concept hierarchies / site ontology (excerpt)
SEITE1-...LI (1st page of a list) or SEITEn-...LI (further page)
LA ("Land", state) SA ("Schulart", school type) SU ("Suche", search)
Sequence mining – one result pattern: successful search for a school in Germany
a refinement
a repetition
a continuation
one example pattern
select t
from node a b, template a * b as t
where a.url startswith "SEITE1-"
and a.occurrence = 1
and b.url contains "1SCHULE"
and b.occurrence = 1
and (b.support / a.support) >= 0.2
(Berendt & Spiliopoulou, VLDB J. 2000)
/liste.html?offset=920&zeilen=20&anzahl=1323&sprache=de&sw_kategorie=de&erscheint=&suchfeld=&suchwert=&staat=de&region=by&schultyp=
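The MINT query above can be read as: find pairs of pages a and b such that the first occurrence of a is followed, after any number of intermediate pages, by the first occurrence of b, and at least 20% of the sessions reaching a also reach b. The Python sketch below evaluates this reading naively on a few made-up sessions. It is a simplification, not the WUM algorithm; the page names, the function name, and the session data are assumptions.

```python
def mint_like_query(sessions, a_prefix, b_substring, min_confidence):
    """Return {(a, b): (a_support, b_support)} for the template a * b."""
    a_urls = {u for s in sessions for u in s if u.startswith(a_prefix)}
    b_urls = {u for s in sessions for u in s if b_substring in u}
    results = {}
    for a in a_urls:
        # Sessions that contain page a at all.
        a_support = sum(a in s for s in sessions)
        for b in b_urls:
            # list.index gives the first occurrence (occurrence = 1);
            # b must come after a, with any pages in between (the "*").
            b_support = sum(
                a in s and b in s and s.index(a) < s.index(b)
                for s in sessions
            )
            if b_support / a_support >= min_confidence:
                results[(a, b)] = (a_support, b_support)
    return results

# Made-up sessions; the page names only imitate the site's naming scheme.
sessions = [
    ["start", "SEITE1-LA", "SEITE2-LA", "1SCHULE"],
    ["start", "SEITE1-LA", "1SCHULE"],
    ["start", "SEITE1-LA", "start"],
    ["start", "SEITE1-SU", "SEITE1-SU", "exit"],
    ["1SCHULE", "SEITE1-LA"],
]

result = mint_like_query(sessions, "SEITE1-", "1SCHULE", min_confidence=0.2)
```

Here four sessions contain SEITE1-LA and two of them later reach 1SCHULE, so the pair is returned with confidence 2/4 = 0.5. The pair starting at SEITE1-SU is dropped because no session continues to a school page.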
Sequences
Generalized sequences, navigation patterns, hits in WUM
Aggregated Logs: The basic internal representation in WUM
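The slide content is not preserved here, so the following is only a plausible illustration: it assumes that an aggregated log merges sessions with a common prefix into one tree, and that each node stores a page, its occurrence number within the session, and the number of sessions passing through it. The class and function names and the sessions are invented.

```python
class Node:
    """One node of the prefix tree: a page occurrence with a support count."""
    def __init__(self, page, occurrence):
        self.page = page
        self.occurrence = occurrence
        self.support = 0          # number of sessions passing through here
        self.children = {}        # (page, occurrence) -> Node

def build_aggregated_log(sessions):
    """Merge sessions that share a prefix into a single tree."""
    root = Node("^", 1)           # artificial root shared by all sessions
    for session in sessions:
        root.support += 1
        node = root
        seen = {}                 # how often each page occurred so far
        for page in session:
            seen[page] = seen.get(page, 0) + 1
            key = (page, seen[page])
            child = node.children.get(key)
            if child is None:
                child = node.children[key] = Node(page, seen[page])
            child.support += 1
            node = child
    return root

# Made-up sessions: the first two share the prefix a, b.
root = build_aggregated_log([["a", "b", "c"], ["a", "b", "d"], ["a", "a"]])
first_a = root.children[("a", 1)]
```

In this example the node for the first visit to a has support 3, its child b has support 2, and the second visit to a in the last session is kept as a separate node with occurrence 2.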
The confidence measure for generalized sequences
Templates in the query language MINT, g-sequences, and navigation patterns
Interestingness measures: Support (hits) and confidence
Aggregated Logs, queries, and query results
The basic idea of the WUM algorithm
MINT can express 3 types of constraints ("predicates")
The WUM gseqm algorithm
(B predicates)
Also for higher-order structures (graphs): Ex. MolFea
Agenda
Naïve Bayes [remaining from yesterday]
Changing representation: LSI [rem. from yesterday]
Ont.+KDD: Apriori and taxonomies
KDD+DB: Constrained pattern mining – ex. WUM
KDD+DB: Inductive databases (very brief)
KDD+Ont.: Induction and Semantic Web (very brief)
The basic idea
(on the board)
Agenda
Naïve Bayes [remaining from yesterday]
Changing representation: LSI [rem. from yesterday]
Ont.+KDD: Apriori and taxonomies
KDD+DB: Constrained pattern mining – ex. WUM
KDD+DB: Inductive databases (very brief)
KDD+Ont.: Induction and Semantic Web (very brief)
(One) basic idea
(on the board)
Next lecture
Naïve Bayes [remaining from yesterday]
Changing representation: LSI [rem. from yesterday]
Ont.+KDD: Apriori and taxonomies
KDD+DB: Constrained pattern mining – ex. WUM
KDD+DB: Inductive databases (very brief)
KDD+Ont.: Induction and Semantic Web (very brief)
Applications
References and background reading; acknowledgements
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages 207-216, Washington, D.C., May 1993. http://citeseer.ist.psu.edu/agrawal93mining.html
(presentation from Delic, D. (2002). Mining Association Rules with Rough Sets and Large Itemsets - A Comparative Study.)
Ramakrishnan Srikant and Rakesh Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995. http://citeseer.ist.psu.edu/srikant95mining.html
(presentation from http://www.kde.cs.uni-kassel.de/lehre/ss2004/kdd/folien/4Folie_VII.3_Assoziationsregeln.pdf)
P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 2002. http://citeseer.ist.psu.edu/tan02selecting.html
MINERULE: R. Meo, G. Psaila and S. Ceri, An extension to SQL for mining association rules. Data Mining and Knowledge Discovery, Vol. 2 (2), pp. 195-224, 1998. http://www.springerlink.com/index/L57188431Q027L73.pdf
WUM and the Schulweb study: Berendt, B. & Spiliopoulou, M. (2000). Analysis of navigation behaviour in web sites integrating multiple information systems. The VLDB Journal, 9, 56-75. http://vasarely.wiwi.hu-berlin.de/Home/berendt-spiliopoulou-vldbj00.pdf
MolFea (esp. the example): S. Kramer, L. De Raedt, C. Helma. Molecular Feature Mining in HIV Data. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 2001.
De Raedt, L. (2002) A perspective on inductive databases. SIGKDD Explorations. Volume 4, Issue 2, 69-77. http://owl-workshop.man.ac.uk/acceptedLong/submission_25.pdf