Post on 18-Jan-2016

Restrictions on Concept Lattices for Pattern Management

Léonard Kwuida, Rokia Missaoui, Beligh Ben Amor,

Lahcen Boumedjout, Jean Vaillancourt

October 20, 2010

2

Outline

Introduction
Pattern management
Restrictions on concept lattices
    Projection
    Selection
Algorithms
    Depth-first Search (DFS)
    Breadth-first Search (BFS)
    Leading Bits Sort (LBS)
    Bottom-up Search (BUS)
Experiments
Conclusion

3

Objectives

Adapt the relational operators (e.g. projection) to the formal concept analysis framework in order to manipulate sets of concepts.

Manage patterns using restrictions on the objects or attributes of a given data set.

Query a concept lattice through a restriction (projection or selection).

Compare restriction on formal contexts vs restriction on concept lattices.

4

Pattern Management

Objective
    Store, process and retrieve patterns defined over raw data.
Different types of patterns
    Rules, clusters, decision trees, …
Basic operations
    Selection, projection, join, union, difference, …
Cross-over operations
    Drill-through: from a pattern to raw data
    Covering: does a pattern hold for a given dataset?
    Approximation (Quafafou, Missaoui & Kwuida)

5

Pattern Management

European PANDA Project
    A generic framework to model various classes of patterns.
    SQL operators
CINQ Project
    Inductive databases.
Terrovitis et al. (2007)
    A uniform framework for data and pattern management.
    Links between data and pattern spaces.
Jeudy et al. (2007)
    A Model for Managing Collections of Patterns.

6

Restrictions on Concept lattices

7

Restrictions on Concept lattices

Projection of a concept set onto N. The projection of a concept set r over a set of attributes $N \subseteq M$ is given by:

$\Pi_N(r) = \mathrm{Project}(r, N) = \{\, c_1 = (\mathrm{Ext}(c), \mathrm{Int}(c) \cap N) \mid c \in r \text{ and } c_1 \text{ is maximal in its equivalence class} \,\}$.

Two concepts $c_1$ and $c_2$ are equivalent if $\mathrm{Int}(c_1) \cap N = \mathrm{Int}(c_2) \cap N$.
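The projection defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `project`, the representation of a concept as an `(extent, intent)` pair of frozensets, and the toy concept set `CONCEPTS` are all hypothetical and do not come from the slides. The sketch also assumes that r is a complete concept lattice, so that each equivalence class has a single maximal element, namely the one with the largest extent.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Concept = Tuple[FrozenSet[int], FrozenSet[str]]  # (extent, intent)


def project(concepts: Iterable[Concept], attrs: Iterable[str]) -> Set[Concept]:
    """Keep, for each value of Int(c) & N, the concept with the largest extent."""
    n = frozenset(attrs)
    best: Dict[FrozenSet[str], FrozenSet[int]] = {}
    for extent, intent in concepts:
        key = intent & n  # projected intent, identifies the equivalence class
        if key not in best or len(extent) > len(best[key]):
            best[key] = extent
    return {(extent, key) for key, extent in best.items()}


# Hypothetical toy context: 1 -> {a, b}, 2 -> {a, c}, 3 -> {a, b, c}
CONCEPTS = [
    (frozenset({1, 2, 3}), frozenset("a")),
    (frozenset({1, 3}), frozenset("ab")),
    (frozenset({2, 3}), frozenset("ac")),
    (frozenset({3}), frozenset("abc")),
]

print(project(CONCEPTS, "ab"))
```

On this toy lattice the four concepts collapse into two classes, one per projected intent.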

8

Restrictions on Concept lattices

Selection on a concept set. The selection on a concept set r w.r.t. a (conjunctive) restriction F on attributes $A_i$ ($i \in N$) is the set of concepts c that logically satisfy that restriction:

$\mathrm{Select}(r, F = \{A_1 = a_1 \wedge \dots \wedge A_N = a_N\}) = \{\, c \mid c \in r \text{ and } c \models F \,\}$

The output corresponds to the order ideal in r generated by $\bigwedge_{i \in N} \mu(a_i)$, where $\mu(a_i) = (a_i', a_i'')$.

For simplicity, we assume that F is in conjunctive form.
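For a binary context, a conjunctive restriction amounts to requiring that a set of attributes appear in the intent. The following sketch illustrates this reading; the names `select`, `ideal_generator` and the toy concept set `CONCEPTS` are hypothetical, and the concept set is assumed to be a complete lattice so that the selected concepts have a greatest element.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Concept = Tuple[FrozenSet[int], FrozenSet[str]]  # (extent, intent)


def select(concepts: Iterable[Concept], required: Iterable[str]) -> Set[Concept]:
    """Concepts whose intent contains every required attribute."""
    f = frozenset(required)
    return {(extent, intent) for extent, intent in concepts if f <= intent}


def ideal_generator(selected: Set[Concept]) -> Concept:
    """Greatest selected concept: the one with the largest extent."""
    return max(selected, key=lambda c: len(c[0]))


# Hypothetical toy context: 1 -> {a, b}, 2 -> {a, c}, 3 -> {a, b, c}
CONCEPTS = [
    (frozenset({1, 2, 3}), frozenset("a")),
    (frozenset({1, 3}), frozenset("ab")),
    (frozenset({2, 3}), frozenset("ac")),
    (frozenset({3}), frozenset("abc")),
]

chosen = select(CONCEPTS, "b")
top = ideal_generator(chosen)
# Every selected concept lies below the generator (its extent is a subset).
print(all(extent <= top[0] for extent, _ in chosen))
```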

9

Example

[Table: cross table of the formal context; rows are objects (transactions) 1 to 8, columns are properties (items) a to i]

Basket market analysis: transactions and items (products)
Context K := (G, M, I)

10

Example

[Figure: Concept Lattice of the example context; each node is labeled with its intent (from a at the top to abcdefghi at the bottom) and its extent (12345678 at the top)]

11

Example - Projection

[Figure: concept lattice of the example, captioned Project(r, {abcd})]

12

Projection

Projection on {S, T, U, V} of the initial concept lattice. On the left, the equivalence classes are marked on the initial lattice. On the right, each equivalence class is represented by a single node (behind which the whole class is attached).

13

Algorithms - Projection

Depth-first Search (DFS)
Breadth-first Search (BFS)
Leading Bits Sort (LBS)
Bottom-up Search (BUS)

14

Depth-first Search

Algorithm idea:

Set the first class with the top element.
Test whether the current node is in the same class as one of its marked parents or children.
If it is not, create a new equivalence class for it.
Set up the links between the representatives of equivalence classes.

[Figure: input lattice B and output lattice B1; B1 starts with the top node (a, 12345678)]
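The class-marking part of a depth-first traversal can be sketched as follows. This is not the authors' procedure: instead of comparing a node with its marked parents or children, the sketch looks the projected intent up in a dictionary, and it leaves out the final step that links the representatives. The names `dfs_classes` and `CHILDREN` are hypothetical, and each node of the toy lattice is named by its intent.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy lattice given by its cover relation (node -> children);
# each node is named by its intent, "a" is the top and "abc" the bottom.
CHILDREN: Dict[str, List[str]] = {
    "a": ["ab", "ac"],
    "ab": ["abc"],
    "ac": ["abc"],
    "abc": [],
}


def dfs_classes(
    children: Dict[str, List[str]], top: str, attrs: str
) -> Tuple[Dict[FrozenSet[str], List[str]], Dict[FrozenSet[str], str]]:
    """Group nodes by projected intent while walking the lattice depth-first."""
    n = frozenset(attrs)
    classes: Dict[FrozenSet[str], List[str]] = {}
    seen = set()
    stack = [top]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        key = frozenset(node) & n  # projected intent of the node
        classes.setdefault(key, []).append(node)
        stack.extend(children[node])
    # In a complete lattice the class maximum has the smallest intent.
    reps = {key: min(members, key=len) for key, members in classes.items()}
    return classes, reps


classes, reps = dfs_classes(CHILDREN, "a", "b")
print(reps)
```

The representative is chosen after the traversal because a depth-first walk may reach a lower member of a class before its maximum.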

20

Breadth-first Search

Algorithm idea:

Start with the top element e.
Move to each child of this element and compare it with e.
If it is not in the same class, check whether all of its parents are marked; if so, create a new class for it.
Set up the links between the representatives of equivalence classes.

[Figure: input lattice B and output lattice B1; B1 starts with the top node (a, 12345678)]
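A minimal sketch of the breadth-first marking, assuming the input is the cover relation of a complete concept lattice in which every node is reachable from the top. A node is handled only once all of its parents are marked, then compared with the classes of those parents. The names `bfs_classes` and `CHILDREN` are hypothetical, nodes are named by their intent, and the linking of representatives is left out.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy lattice given by its cover relation (node -> children);
# each node is named by its intent, "a" is the top and "abc" the bottom.
CHILDREN: Dict[str, List[str]] = {
    "a": ["ab", "ac"],
    "ab": ["abc"],
    "ac": ["abc"],
    "abc": [],
}


def bfs_classes(children: Dict[str, List[str]], top: str, attrs: str) -> Dict[str, str]:
    """Map every node to the representative (maximum) of its equivalence class."""
    n = frozenset(attrs)
    parents: Dict[str, List[str]] = {node: [] for node in children}
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].append(parent)
    pending = {node: len(ps) for node, ps in parents.items()}  # unmarked parents
    class_of = {top: top}  # the first class is created for the top element
    queue = deque([top])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            pending[child] -= 1
            if pending[child] > 0:
                continue  # wait until all parents of the child are marked
            key = frozenset(child) & n
            for parent in parents[child]:
                if frozenset(class_of[parent]) & n == key:
                    class_of[child] = class_of[parent]  # joins a parent's class
                    break
            else:
                class_of[child] = child  # no parent matches: new class
            queue.append(child)
    return class_of


print(bfs_classes(CHILDREN, "a", "b"))
```

Waiting for all parents guarantees that a node without a matching parent is the maximum of its class, because in a complete lattice every other member has a parent in the same class.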

24

Leading Bits Sort

Algorithm idea:

The lectic order on subsets of M states that A precedes B if the first position in which A and B differ contains 0 in A and 1 in B.
When the attributes of N occupy the leading bits, equivalent concepts/intents are necessarily consecutive.
Use the iPred procedure of Baixeries et al. to set links between the representatives of equivalence classes.

Intents of the input lattice B, for Project(r, {abcd}):

Intents  a b c d e f g h i
a        1 0 0 0 0 0 0 0 0
ag       1 0 0 0 0 0 1 0 0
agh      1 0 0 0 0 0 1 1 0
ad       1 0 0 1 0 0 0 0 0
adf      1 0 0 1 0 1 0 0 0
ac       1 0 1 0 0 0 0 0 0
acgh     1 0 1 0 0 0 1 1 0
acghi    1 0 1 0 0 0 1 1 1
ab       1 1 0 0 0 0 0 0 0
abg      1 1 0 0 0 0 1 0 0
abgh     1 1 0 0 0 0 1 1 0
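The sorting and grouping step can be illustrated on the eleven intents listed in the table. This is a sketch under stated assumptions: the helper names `bits` and `lbs_classes` are hypothetical, intents are written as strings of attribute letters, and the linking step with iPred is not shown.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

ATTRS = "abcdefghi"  # attribute set M of the example, in column order


def bits(intent: str, order: List[str]) -> Tuple[int, ...]:
    """Bit vector of an intent for a given attribute order."""
    return tuple(1 if m in intent else 0 for m in order)


def lbs_classes(intents: Iterable[str], n_attrs: str, all_attrs: str = ATTRS) -> List[List[str]]:
    """Sort intents with the attributes of N as leading bits, then cut into classes."""
    leading = [m for m in all_attrs if m in n_attrs]
    order = leading + [m for m in all_attrs if m not in n_attrs]
    k = len(leading)
    ranked = sorted(intents, key=lambda s: bits(s, order))
    # Equivalent intents share their first k bits, so they are consecutive.
    return [list(group) for _, group in groupby(ranked, key=lambda s: bits(s, order)[:k])]


INTENTS = ["abgh", "a", "acghi", "ad", "ag", "ab", "acgh", "adf", "agh", "ac", "abg"]
for members in lbs_classes(INTENTS, "abcd"):
    print(members[0], members)  # the first member is the class representative
```

The first member of each group has the smallest bit vector, which is the intent contained in all others of its class, so it can serve as the representative.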

26

Leading Bits Sort

[Figure: output lattice B1 = Project(r, {abcd}); node intents a, ab, ac, ad, abc, abd, acd, abcd, with top node (a, 12345678)]

27

Bottom-up Search

Algorithm idea:

We start the exploration of the lattice (upwards from the bottom) with the most general concept c whose intent contains N.

There are two possibilities:

If the concept c has exactly N as intent, then the output of the projection is the filter generated by c.

If N is not an intent, then the attributes that are in N″ \ N are deleted one by one from the intents of the concepts in the filter generated by c.

[Figure: input lattice B]
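The two cases can be sketched together, reading the attributes to delete as those in the closure N″ that are not in N. The name `bottom_up_project` and the toy concept set `CONCEPTS` are hypothetical, concepts are `(extent, intent)` pairs of frozensets, and the input is assumed to be a complete concept lattice so that some intent always contains N.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Concept = Tuple[FrozenSet[int], FrozenSet[str]]  # (extent, intent)


def bottom_up_project(concepts: Iterable[Concept], n_attrs: Iterable[str]) -> Set[Concept]:
    """Project via the filter of the most general concept whose intent contains N."""
    all_concepts: List[Concept] = list(concepts)
    n = frozenset(n_attrs)
    # Most general concept c with N inside its intent: the largest extent.
    candidates = [c for c in all_concepts if n <= c[1]]
    c_extent, c_intent = max(candidates, key=lambda c: len(c[0]))
    # Filter generated by c: all concepts above c.
    above = [(e, i) for e, i in all_concepts if c_extent <= e]
    extra = c_intent - n  # empty when N itself is an intent
    return {(e, i - extra) for e, i in above}


# Hypothetical toy context: 1 -> {a, b}, 2 -> {a, c}, 3 -> {a, b, c}
CONCEPTS = [
    (frozenset({1, 2, 3}), frozenset("a")),
    (frozenset({1, 3}), frozenset("ab")),
    (frozenset({2, 3}), frozenset("ac")),
    (frozenset({3}), frozenset("abc")),
]

print(bottom_up_project(CONCEPTS, "ab"))  # N is an intent: the filter itself
print(bottom_up_project(CONCEPTS, "b"))   # N is not an intent: "a" is removed
```

Only the concepts above c are touched, which is why this approach can be cheap when the filter is small.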

31

Experiments

Environment
    Java, 1.9 GHz processor and 3 GB memory
Parameters
    Number of concepts in K = (G, M, I)
    Density of K: 40%, 50%, 60%
    Ratio N/M: 10%, ..., 80%
    Data: from 71114 to 234946 concepts

32

Experiments

Results
    Better performance for LBS and BUS when the percentage of projection is higher than 40%
    LBS has lower variation than BUS
    DFS is the worst algorithm
    Projection on context is not the best choice!

33

Experiments

34

Conclusion

Focus on projection
Work can be adapted for the selection
Possibility to handle the two operations in one shot on a given concept lattice
Projection on lattices vs on contexts
    Special cases where the projection on lattices is more efficient
    More experiments are needed

35

Future Work

An important fact: the projection is the inverse operation of the assembly of two lattices!
Projection on implication sets
Algorithm improvement
    Execution time and memory consumption
Other operations on concept lattices

36

THANK YOU!

37

Projection

K = (G, M, W, I)
Projection on a set N of attributes

38

Selection

K = (G, M, W, I)
Selection on a set of objects

39

DFS complexity

To analyze the complexity of this procedure, we consider the number of accesses to each node and the number of comparisons.

Each node is visited at least twice (on the way down and back).

If q is the number of equivalence classes, then there are on average q/2 comparisons to mark a node.

40

BFS complexity

To evaluate the complexity of this algorithm, we consider two parameters: the number of comparisons needed and the number of times each node is accessed. Each node o is visited exactly #parent(o) + 1 times. The overall number of node accesses is therefore:

$\sum_{o \in B} (\#\mathrm{parent}(o) + 1)$

41

LBS complexity

The sorting process with respect to the lectic order can be done in $O(n \ln n)$, where n is the number of concepts in B. The marking of equivalence classes on B is straightforward since it takes one linear pass over the sorted set of concepts. Thus, the overall process has a complexity of $O(n \ln n)$.

42

iPred

• It sorts the elements of the lattice by size.
• Δ[ci] is initialized to the empty set for each element ci of the input set.
• Δ[ci] will contain the accumulation of faces for each element.
• The first element in the border is the first element in the sequence.
• All remaining elements in the input sequence are processed in the order in which they appear in the enumeration.
• The candidate set is computed by intersecting the current element ci with all the elements in the border.
• We check whether the current element belongs to the upper set of the elements that are in the candidate set.
• If the test result is positive, ci ≺ c̃, so we add this connection to the output set, then add that face to the set of accumulated faces of c̃, and finally remove c̃ from the border.
• Before the next element is processed, we make sure that ci is added to the border.
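The steps listed above can be sketched as follows. This is one reading of the procedure, not a reference implementation: the membership test is written as "the accumulated faces Δ of the candidate share no attribute with ci", the function name `ipred` is hypothetical, and the input is assumed to be a family of intents that is closed under intersection (otherwise a candidate may be missing from `delta`).

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Intent = FrozenSet[str]


def ipred(intents: Iterable[str]) -> Set[Tuple[Intent, Intent]]:
    """Return the cover pairs (smaller intent, larger intent) of the family."""
    seq: List[Intent] = sorted({frozenset(s) for s in intents}, key=len)
    delta: Dict[Intent, Intent] = {c: frozenset() for c in seq}  # accumulated faces
    border: List[Intent] = [seq[0]]
    links: Set[Tuple[Intent, Intent]] = set()
    for ci in seq[1:]:
        candidates = {ci & b for b in border}
        for cand in candidates:
            if not (delta[cand] & ci):  # ci is an immediate neighbour of cand
                links.add((cand, ci))
                delta[cand] = delta[cand] | (ci - cand)  # record the new face
                if cand in border:
                    border.remove(cand)
        border.append(ci)
    return links


# Hypothetical intersection-closed family of intents ("" is the empty intent).
for small, large in sorted(ipred(["", "a", "b", "ab", "bc"]), key=lambda p: sorted(p[1])):
    print("".join(sorted(small)) or "{}", "<", "".join(sorted(large)))
```

Each pair links an intent to one of its immediate supersets in the family, which is the Hasse diagram of the lattice read through the intents.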

43

BUS complexity

The complexity of this procedure depends on two factors:
    When we find the most general concept whose intent contains the set of attributes N.
    The number of attributes to be deleted.

44

Work of Jeudy et al.

Sort the concepts in topological order.
Find the equivalence classes and their representatives.
Scan the input lattice a second time to build links between the representatives of equivalence classes.