
ICMLC2007, Aug. 19~22, 2007, Hong Kong

Incremental Maintenance of Ontology-Exploiting Association Rules

Ming-Cheng Tseng (1), Wen-Yang Lin (2) and Rong Jeng (3)

(1, 3) Institute of Information Engineering, I-Shou University, Taiwan

(2) Dept. of Comp. Sci. & Info. Eng., National University of Kaohsiung, Taiwan

August 20, 2007


Outline

Introduction

Problem description

The proposed algorithm

Performance evaluation

Conclusions

Introduction

Motivation

In general, there exist many semantic relationships (domain knowledge) among items.

It is natural to incorporate a domain ontology into the data mining process to explore more innovative rules.

The source databases change over time, e.g., through insertion, deletion, and modification.

The discovered knowledge (rules) has to be updated to reflect the new situation.

Introduction (cont.)

Association rules

Given: a database of customer transactions, where each transaction is a set of items.

Find all rules X => Y that correlate the presence of one set of items X with another set of items Y.

Example: Sony VAIO => HP LaserJet 1300 (Sup. 30%, Conf. 60%)

Introduction (cont.)

Strong association rules

Given: user-specified constraints, namely minimum support (min_sup) and minimum confidence (min_conf).

Find rules X => Y with support and confidence larger than the user-specified minimum values.

Example: min_sup = 25%, min_conf = 50%

Sony VAIO => HP LaserJet 1300 (Sup. 30%, Conf. 60%)

Introduction (cont.)

Frequent itemsets (patterns) mining

The association mining problem can be reduced to the problem of mining frequent itemsets, i.e., itemsets with support larger than min_sup.

Example: min_sup = 25%, min_conf = 50%

Sony VAIO => HP LaserJet 1300 (Sup. 30%, Conf. 60%)

sup({Sony VAIO, HP LaserJet 1300}) = 30%

sup({Sony VAIO}) = 50%

Hence conf(Sony VAIO => HP LaserJet 1300) = 30% / 50% = 60%.
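The support and confidence figures in this example can be reproduced with a few lines of Python. This is a sketch only: the ten-transaction database and the helper names `support` and `confidence` are assumptions chosen so that the numbers match the slide, not data or code from the talk.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """conf(X => Y) = sup(X union Y) / sup(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical toy database (not from the slides): 10 transactions.
transactions = (
    [frozenset({"Sony VAIO", "HP LaserJet 1300"})] * 3
    + [frozenset({"Sony VAIO"})] * 2
    + [frozenset({"Epson EPL"})] * 5
)

rule_sup = support({"Sony VAIO", "HP LaserJet 1300"}, transactions)
rule_conf = confidence({"Sony VAIO"}, {"HP LaserJet 1300"}, transactions)
print(f"sup = {rule_sup:.0%}, conf = {rule_conf:.0%}")
```

With min_sup = 25% and min_conf = 50%, the rule passes both thresholds and therefore counts as strong.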

Introduction (cont.)

Ontology

W3C Web Ontology Working Group: “An ontology formally defines a common set of terms that are used to describe and represent a domain knowledge.”

E.g., a taxonomy is a kind of ontology presenting classification relationships among objects.

[Figure: example taxonomy with the nodes Vegetable, Non-root Vegetable, Fruit, Tomato, Carrot, Kale, Pickle, Apple, and Papaya]

Introduction (cont.)

Ontology-exploiting association rules

[Figure: item ontology with classification and composition relationships among the items PC, Desktop PC, Notebook, Sony VAIO, Gateway GE, IBM TP, Memory, Hard Disk, RAM 256MB, RAM 512MB, S 60GB, IBM 60GB, Printer, HP DeskJet, Epson EPL, Ink Cartridge, Photo Conductor, and Toner Cartridge]

Example rule: IBM 60GB HD => HP DeskJet

Problem Description

Incremental maintenance of ontology-exploiting association rules

Given:

A database of customer transactions, DB

An incremental database, db

An item ontology, T

The discovered frequent itemsets in DB, L

Minimum support, ms, and minimum confidence, mc

Find all frequent itemsets in UD = DB + db w.r.t. ms.

Construct all strong rules from the frequent itemsets w.r.t. mc.
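The second step, constructing strong rules from frequent itemsets, could look roughly like the following. This is a minimal sketch, not the authors' implementation: the function name `strong_rules` and the threshold mc = 0.9 are assumptions, while the three counts are taken from the discovered frequent itemsets L of the example database.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def strong_rules(counts: Dict[Itemset, int],
                 mc: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Derive rules X => Y with confidence >= mc from frequent-itemset counts.

    `counts` must hold every frequent itemset together with all of its
    non-empty subsets (these are frequent too, by downward closure).
    """
    rules = []
    for itemset, count in counts.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                conf = count / counts[x]  # conf(X => Y) = count(X u Y) / count(X)
                if conf >= mc:
                    rules.append((x, itemset - x, conf))
    return rules


# Counts over the six transactions of the example database DB.
counts = {
    frozenset({"Printer"}): 5,
    frozenset({"IBM TP"}): 4,
    frozenset({"Printer", "IBM TP"}): 4,
}
rules = strong_rules(counts, mc=0.9)  # mc = 0.9 is an arbitrary illustrative value
for x, y, conf in rules:
    print(sorted(x), "=>", sorted(y), f"conf = {conf:.2f}")
```

Only IBM TP => Printer survives here: its confidence is 4/4, whereas Printer => IBM TP reaches only 4/5.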

Problem Description (cont.) -- Example

Customer transactions DB

TID  Purchased Items
---  ---------------
1    IBM TP, Epson EPL, Toner Cartridge
2    Sony VAIO, IBM TP, Epson EPL
3    IBM TP, HP DeskJet, Ink Cartridge
4    HP DeskJet
5    IBM TP, HP DeskJet, Ink Cartridge
6    Sony VAIO, Ink Cartridge

Item ontology G

[Figure: item ontology G with classification and composition relationships among the items PC, Sony VAIO, IBM TP, RAM 256MB, S 60GB, IBM 60GB, Printer, HP DeskJet, Epson EPL, Ink Cartridge, Photo Conductor, and Toner Cartridge]

Discovered frequent itemsets L

L1            Count
------------  -----
{Printer}     5
{PC}          5
{IBM TP}      4
{RAM 256MB*}  5
{IBM 60GB*}   4

L2 & L3                            Count
---------------------------------  -----
{Printer, PC}                      4
{Printer, IBM TP}                  4
{Printer, RAM 256MB*}              4
{Printer, IBM 60GB*}               4
{RAM 256MB*, IBM 60GB*}            4
{Printer, RAM 256MB*, IBM 60GB*}   4

minsup = 70% (algorithms AROC, AROS)

Problem Description (cont.)

Example

Customer transactions DB

TID  Purchased Items
---  ---------------
1    IBM TP, Epson EPL, Toner Cartridge
2    Sony VAIO, IBM TP, Epson EPL
3    IBM TP, HP DeskJet, Ink Cartridge
4    HP DeskJet
5    IBM TP, HP DeskJet, Ink Cartridge
6    Sony VAIO, Ink Cartridge

Incremental transactions db

TID  Items Purchased
---  ---------------
7    Toner Cartridge
8    IBM TP, HP DeskJet, IBM 60GB, Toner Cartridge
9    IBM 60GB, Toner Cartridge

Item ontology G

[Figure: item ontology G with classification and composition relationships among the items PC, Sony VAIO, IBM TP, RAM 256MB, S 60GB, IBM 60GB, Printer, HP DeskJet, Epson EPL, Ink Cartridge, Photo Conductor, and Toner Cartridge]

minsup = 70%

Updated frequent itemsets L’ = ?

The Proposed Algorithm – IMARO

Basic scheme

An Apriori-based maintenance algorithm

Employing a bottom-up, level-wise searching strategy

Starting from frequent 1-itemsets, L1, then L2, …, Lk, etc.

[Figure: itemset lattice over the items A, B, C, D, searched level by level: A, B, C, D; AB, AC, AD, BC, BD, CD; ABC, ABD, ACD, BCD; ABCD]
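The level-wise search over the A, B, C, D lattice relies on the standard Apriori candidate generation step: join frequent (k-1)-itemsets, then discard any candidate with an infrequent (k-1)-subset. The sketch below shows that generic step only; the function name `apriori_gen` and the choice of L2 are illustrative assumptions, and the ontology-specific parts of IMARO are not modelled.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def apriori_gen(prev_frequent: Iterable[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = set(prev_frequent)
    candidates: Set[FrozenSet[str]] = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != len(a) + 1:
                continue  # join step: a and b must differ in exactly one item
            # Prune step: every (k-1)-subset of the candidate must be frequent.
            if all(frozenset(s) in prev for s in combinations(union, len(a))):
                candidates.add(union)
    return candidates


# Hypothetical level 2 of the lattice: every pair except CD is frequent.
L2 = {frozenset(pair) for pair in ("AB", "AC", "AD", "BC", "BD")}
C3 = apriori_gen(L2)
print(sorted("".join(sorted(c)) for c in C3))
```

ACD and BCD are never counted because their subset CD is infrequent, so only ABC and ABD remain as candidates.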

The Proposed Algorithm – IMARO (cont.)

Terminology

Notation  Definition
--------  ----------
DB        Original database
db        Incremental database
UD        Updated database, UD = DB + db
T         Item ontology
ED        Extension of DB with extended items in T
ed        Extension of db with extended items in T
UE        Updated extended database, UE = ED + ed

The Proposed Algorithm – IMARO (cont.)

Example

The Proposed Algorithm – IMARO (cont.)

Note on database extension

A component item may also exist as a primitive item itself.

To clarify the meaning of associations involving such an item, we have to differentiate the roles this item plays.

E.g., IBM TP => Ink Cartridge can be read in two ways:

A customer who buys an IBM TP notebook also buys an Ink Cartridge.

A customer who buys an IBM TP notebook also buys a product composed of Ink Cartridge.

Original transaction

TID  Purchased Items
---  ---------------
5    IBM TP, HP DeskJet, Ink Cartridge

Extended transaction

TID  Primitive Items                      Extended Items
---  -----------------------------------  -----------------------------------------------
5    IBM TP, HP DeskJet, Ink Cartridge*   PC, RAM 256MB, IBM 60GB, Printer, Ink Cartridge
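The extension of transaction 5 can be sketched as a lookup in two maps, one per relationship type. The dictionaries below are a hypothetical encoding of only the part of the ontology needed for this transaction, inferred from the extended items listed for TID 5; the names `GENERALIZATION`, `COMPOSITION`, and `extend` are assumptions. Keeping generalized items and components in separate sets is one simple way to preserve the role distinction discussed above.

```python
from typing import Dict, Set, Tuple

# Hypothetical fragment of the item ontology (classification edges).
GENERALIZATION: Dict[str, str] = {"IBM TP": "PC", "HP DeskJet": "Printer"}
# Hypothetical fragment of the item ontology (composition edges).
COMPOSITION: Dict[str, Set[str]] = {
    "IBM TP": {"RAM 256MB", "IBM 60GB"},
    "HP DeskJet": {"Ink Cartridge"},
}


def extend(transaction: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (generalized items, component items) implied by a transaction."""
    generalized: Set[str] = set()
    components: Set[str] = set()
    for item in transaction:
        node = item
        while node in GENERALIZATION:  # climb the classification hierarchy
            node = GENERALIZATION[node]
            generalized.add(node)
        components |= COMPOSITION.get(item, set())
    return generalized, components


generalized, components = extend({"IBM TP", "HP DeskJet", "Ink Cartridge"})
print(sorted(generalized))
print(sorted(components))
```

Ink Cartridge ends up both in the original transaction (as a purchased item) and in the component set (as a part of HP DeskJet), which is exactly the ambiguity the role marker resolves.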

The Proposed Algorithm – IMARO (cont.)

[Figure: process flow for updating frequent k-itemsets, in four numbered steps. Labels: inputs DB, db, and T; candidate generation (C_k); mining (e.g., AROC or AROS) yielding L_k^ed; test "frequent or infrequent in UE" against L_k^ED, with outcomes "determined" and "undetermined"; scan and count for the undetermined candidates; result L_k^UE]

The Proposed Algorithm – IMARO (cont.)

Frequent/infrequent itemsets inference

[Figure: the four cases that arise when an itemset is large (frequent) or small (infrequent) with respect to the minimum support in DB + T and in db + T, and its resulting status in UD + T]

Conditions          Results
A in L^ED  A in L^ed  UE       Action                     Case
---------  ---------  -------  -------------------------  ----
yes        yes        freq.    no                         1
yes        no         undetd.  compare sup_UD(A) with ms  2
no         yes        undetd.  scan DB                    3
no         no         infreq.  no                         4
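The four cases can be expressed as a small decision function. This is a sketch under stated assumptions rather than the IMARO code: the name `infer_status` and its parameters are hypothetical, "frequent" is taken to mean a count of at least ms times the database size, and the sample counts assume that HP DeskJet generalizes to Printer, as in the extended transaction 5.

```python
from typing import Dict, FrozenSet, Tuple

Itemset = FrozenSet[str]


def infer_status(itemset: Itemset, frequent_ED: Dict[Itemset, int],
                 count_ed: int, n_ED: int, n_ed: int,
                 ms: float) -> Tuple[int, str]:
    """Return (case number, status in UE) for one candidate itemset.

    frequent_ED maps each itemset already known to be frequent in ED to its
    count; count_ed is the candidate's count in the extended increment ed.
    """
    in_ED = itemset in frequent_ED
    in_ed = count_ed >= ms * n_ed
    if in_ED and in_ed:
        return 1, "frequent"  # frequent in both parts, no further action
    if in_ED:
        # Case 2: both counts are known, so compare the total with ms directly.
        total = frequent_ED[itemset] + count_ed
        status = "frequent" if total >= ms * (n_ED + n_ed) else "infrequent"
        return 2, status
    if in_ed:
        return 3, "undetermined: scan DB"  # the count in DB is unknown
    return 4, "infrequent"  # infrequent in both parts, no further action


# Example numbers: |DB| = 6, |db| = 3, ms = 70%.
frequent_ED = {frozenset({"Printer"}): 5}
printer = infer_status(frozenset({"Printer"}), frequent_ED, 1, 6, 3, 0.7)
toner = infer_status(frozenset({"Toner Cartridge"}), frequent_ED, 3, 6, 3, 0.7)
print(printer, toner)
```

{Printer} falls into case 2 and is resolved without touching DB, since 5 + 1 = 6 occurrences fall short of 0.7 × 9. {Toner Cartridge} is frequent in the increment but was not recorded as frequent before, so it is the only one of the two that forces a scan of the original database.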

The Proposed Algorithm – IMARO (cont.)

Optimization 1: Candidate pruning

Any candidate itemset that contains both an item and any one of its extensions (generalized item or component) is pruned.

[Figure: item ontology G with classification and composition relationships among the items PC, Sony VAIO, IBM TP, RAM 256MB, S 60GB, IBM 60GB, Printer, HP DeskJet, Epson EPL, Ink Cartridge, Photo Conductor, and Toner Cartridge]

Examples of pruned candidates:

{Epson EPL, Printer}

{Epson EPL, Toner Cartridge*}
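Candidate pruning amounts to an intersection test between a candidate and the extensions of each of its items. In the sketch below, the `EXTENSIONS` map is a hypothetical encoding covering only the two pruned examples, and it is assumed to list all generalized items and components of an item, with components carrying a trailing "*"; the function names are likewise assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical extension map: item -> all of its generalized items and components.
EXTENSIONS: Dict[str, Set[str]] = {
    "Epson EPL": {"Printer", "Toner Cartridge*"},
}


def is_redundant(candidate: FrozenSet[str]) -> bool:
    """True if the candidate holds an item together with one of its extensions."""
    return any(EXTENSIONS.get(item, set()) & candidate for item in candidate)


def prune(candidates: Iterable[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Keep only the candidates that pass the redundancy check."""
    return [c for c in candidates if not is_redundant(c)]


candidates = [
    frozenset({"Epson EPL", "Printer"}),
    frozenset({"Epson EPL", "Toner Cartridge*"}),
    frozenset({"Epson EPL", "IBM TP"}),
]
kept = prune(candidates)
print([sorted(c) for c in kept])
```

Such candidates carry no information: every transaction containing Epson EPL also contains its extensions after database extension, so the support of the pair equals the support of Epson EPL alone.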

The Proposed Algorithm – IMARO (cont.)

Optimization 2: Extension filtering

An extended item is added to a transaction only if that item appears in at least one candidate itemset currently being counted.

[Figure: item ontology G with classification and composition relationships among the items PC, Sony VAIO, IBM TP, RAM 256MB, S 60GB, IBM 60GB, Printer, HP DeskJet, Epson EPL, Ink Cartridge, Photo Conductor, and Toner Cartridge]
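Extension filtering can be sketched as intersecting a transaction's extended items with the set of items that occur in the current candidates. The function name `filter_extensions` and the two sample candidates are assumptions; the extended items are those listed for transaction 5.

```python
from typing import FrozenSet, Iterable, Set


def filter_extensions(extended_items: Set[str],
                      candidates: Iterable[FrozenSet[str]]) -> Set[str]:
    """Keep only the extended items that occur in some current candidate."""
    needed: Set[str] = set().union(*candidates)
    return extended_items & needed


# Extended items of transaction 5 and two hypothetical candidate 2-itemsets.
extended = {"PC", "RAM 256MB", "IBM 60GB", "Printer", "Ink Cartridge"}
candidates = [frozenset({"Printer", "PC"}), frozenset({"Printer", "IBM TP"})]
added = filter_extensions(extended, candidates)
print(sorted(added))
```

Only PC and Printer are added in this pass, which keeps the extended transaction short and reduces the work of matching it against the candidates.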

Performance Evaluation

IMARO is compared with applying our proposed algorithms, AROC and AROS, to the whole database DB + db with T.

Test data: a synthetic dataset generated by the IBM data generator with an artificially built ontology

Parameter  Description                      Default value
---------  -------------------------------  -------------
|DB|       Number of original transactions  200,000
|t|        Average size of transactions     20
N          Number of items                  362
R          Number of groups                 30
L          Number of levels                 4
F          Fanout                           5

Performance Evaluation (cont.)

Varying minimum supports

[Figure: run time (sec., log scale, 10 to 1000) versus minimum support ms (%), from 1 to 3.5, for AROC, AROS, and IMARO; |db| = 40,000]

Performance Evaluation (cont.)

Varying incremental transaction size

[Figure: run time (sec., 0 to 300) versus number of incremental transactions (x 10,000), from 2 to 20, for AROC, AROS, and IMARO; ms = 1.5%]

Conclusions

We have investigated the problem of updating ontology-exploiting association rules when new transactions are inserted into the database.

An Apriori-based algorithm is proposed.

Other issues

More complicated semantic relationships and knowledge

Non-uniform minimum support

Generalized items or composite items occur more frequently

Towards a total solution for evolving environments

Ontology evolution, database update

Interactive refinement of support constraints

Thanks for your attention!

Conclusions (cont.)

Taxonomy of semantic relationships

[Figure: taxonomy of semantic relationships. Source: Veda C. Storey, VLDB Journal, 1993]

Related Work

Comparison with previous work: model of incremental maintenance of association rules

Contributors             Type of database update               Type of ontology
-----------------------  ------------------------------------  ----------------------------
Srikant & Agrawal, 1995  none                                  classification
Han & Fu, 1995           none                                  classification
Cheung et al., 1996      insertion                             classification
Cheung et al., 1997      insertion, deletion and modification  none
Jea et al., 2003         none                                  composition
Chien et al., 2005       none                                  classification & composition