Post on 18-Jan-2016
Restrictions on Concept Lattices for Pattern Management
Léonard Kwuida, Rokia Missaoui, Beligh Ben Amor,
Lahcen Boumedjout, Jean Vaillancourt
October 20, 2010
2
Outline
Introduction
Pattern management
Restrictions on concept lattices
  Projection
  Selection
Algorithms
  Depth-first Search (DFS)
  Breadth-first Search (BFS)
  Leading Bits Sort (LBS)
  Bottom-up Search (BUS)
Experiments
Conclusion
3
Objectives
Adapt the relational operators (e.g. projection) to the formal concept analysis framework to manipulate sets of concepts.
Manage patterns using restrictions on the objects or attributes of a given data set.
Query a concept lattice through a restriction (projection or selection).
Compare restrictions on formal contexts vs restrictions on concept lattices.
4
Pattern Management
Objective
  Store, process and retrieve patterns defined over raw data.
Different types of patterns
  Rules, clusters, decision trees, …
Basic operations
  Selection, projection, join, union, difference, …
Cross-over operations
  Drill-through: from a pattern to raw data
  Covering: does a pattern hold for a given dataset?
  Approximation (Quafafou, Missaoui & Kwuida)
5
Pattern Management
European PANDA Project
  A generic framework to model various classes of patterns.
  SQL operators
CINQ Project
  Inductive databases.
Terrovitis et al. (2007)
  A uniform framework for data and pattern management.
  Links between data and pattern spaces.
Jeudy et al. (2007)
  A Model for Managing Collections of Patterns.
6
Restrictions on Concept lattices
7
Restrictions on Concept lattices
Projection of a concept set onto N.
The projection of a concept set $r$ over a set of attributes $N \subseteq M$ is given by:
$\pi_N(r) = \mathrm{Project}(r, N) = \{\, c_1 = (\mathrm{Ext}(c),\ \mathrm{Int}(c) \cap N) \mid c \in r \text{ and } c_1 \text{ is maximal in its equivalence class} \,\}$.
Two concepts $c_1$ and $c_2$ are equivalent if $\mathrm{Int}(c_1) \cap N = \mathrm{Int}(c_2) \cap N$.
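The definition of the projection can be tried out on the example lattice of this talk. The following is a minimal sketch, not the authors' implementation: the name `project`, the pair layout `(extent, intent)` and the `LATTICE` dictionary (intents and extents read off the example lattice) are assumptions made for illustration, and "maximal in its class" is taken to mean the member with the largest extent.

```python
# Example lattice, written as intent -> extent (objects 1..8, attributes a..i).
# Assumption: values are read off the example concept lattice of this talk.
LATTICE = {
    "a": "12345678", "ab": "12356", "ac": "34678", "ad": "5678", "ag": "1234",
    "abc": "36", "abg": "123", "acd": "678", "adf": "568", "agh": "234",
    "abdf": "56", "abgh": "23", "acdf": "68", "acgh": "34", "acde": "7",
    "abcdf": "6", "abcgh": "3", "acghi": "4", "abcdefghi": "",
}
CONCEPTS = {(frozenset(ext), frozenset(intent)) for intent, ext in LATTICE.items()}


def project(concepts, n):
    """Project a set of (extent, intent) concepts onto the attribute set n."""
    n = frozenset(n)
    best = {}  # projected intent -> largest extent seen in that equivalence class
    for ext, intent in concepts:
        key = intent & n  # concepts with the same key are equivalent
        if key not in best or len(ext) > len(best[key]):
            best[key] = ext
    return {(ext, key) for key, ext in best.items()}


projected = project(CONCEPTS, "abcd")
for ext, intent in sorted(projected, key=lambda c: -len(c[0])):
    print("".join(sorted(intent)), "".join(sorted(ext)))
```

On this data the 19 concepts collapse into 8 classes, one per distinct value of Int(c) ∩ N.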
8
Restrictions on Concept lattices
Selection on a concept set.
The selection on a concept set $r$ w.r.t. a (conjunctive) restriction $F$ on attributes $A_i$ ($i \le N$) is the set of concepts $c$ that logically satisfy that restriction:
$\mathrm{Select}(r, F = \{A_1 = a_1 \wedge \dots \wedge A_N = a_N\}) = \{\, c \mid c \in r \text{ and } c \models F \,\}$
The output corresponds to the order ideal in $r$ generated by $\bigwedge_{i \le N} \mu(a_i)$, where $\mu(a_i) = (a_i', a_i'')$.
For simplicity, we assume that F is in conjunctive form.
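For binary attributes, a conjunctive restriction simply lists the attributes a concept must have, so the selection keeps the concepts whose intent contains them. The sketch below illustrates this under stated assumptions: `select`, `LATTICE` and the pair layout `(extent, intent)` are hypothetical names, and the data is read off the example lattice of this talk.

```python
# Example lattice, written as intent -> extent (assumption: read off the talk's example).
LATTICE = {
    "a": "12345678", "ab": "12356", "ac": "34678", "ad": "5678", "ag": "1234",
    "abc": "36", "abg": "123", "acd": "678", "adf": "568", "agh": "234",
    "abdf": "56", "abgh": "23", "acdf": "68", "acgh": "34", "acde": "7",
    "abcdf": "6", "abcgh": "3", "acghi": "4", "abcdefghi": "",
}
CONCEPTS = {(frozenset(ext), frozenset(intent)) for intent, ext in LATTICE.items()}


def select(concepts, required):
    """Keep the concepts whose intent contains every required attribute."""
    required = frozenset(required)
    return {(ext, intent) for ext, intent in concepts if required <= intent}


selected = select(CONCEPTS, "cd")
# The selected concept with the largest extent generates the order ideal.
generator = max(selected, key=lambda c: len(c[0]))
# The order ideal of the generator: every concept whose extent it contains.
ideal = {c for c in CONCEPTS if c[0] <= generator[0]}
```

Here the generator is the meet of the attribute concepts of c and d, and `ideal` coincides with `selected`.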
9
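Example
Basket market analysis: transactions and items (products)
Context K := (G, M, I)
Objects (transactions) are the rows; properties (items) are the columns.

     a  b  c  d  e  f  g  h  i
1    X  X  .  .  .  .  X  .  .
2    X  X  .  .  .  .  X  X  .
3    X  X  X  .  .  .  X  X  .
4    X  .  X  .  .  .  X  X  X
5    X  X  .  X  .  X  .  .  .
6    X  X  X  X  .  X  .  .  .
7    X  .  X  X  X  .  .  .  .
8    X  .  X  X  .  X  .  .  .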
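To make the link between the context K and its concepts concrete, the sketch below enumerates all intents as intersections of object rows and pairs each with its extent. It is an illustration only: `CONTEXT`, `extent` and `all_intents` are hypothetical names, and the rows are an assumption read off the example concept lattice of this talk.

```python
# Object -> attributes it has (assumption: rows recovered from the example lattice).
CONTEXT = {
    1: "abg", 2: "abgh", 3: "abcgh", 4: "acghi",
    5: "abdf", 6: "abcdf", 7: "acde", 8: "acdf",
}
M = frozenset("abcdefghi")


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(g for g, row in CONTEXT.items() if attrs <= frozenset(row))


def all_intents():
    """Every intent is M or an intersection of object rows."""
    intents = {M}
    for row in CONTEXT.values():
        # Intersect the new row with everything found so far.
        intents |= {frozenset(row) & b for b in intents}
    return intents


concepts = {(extent(b), b) for b in all_intents()}
print(len(concepts), "concepts")
```

This brute-force closure is fine for a toy context; it is not one of the algorithms compared in the talk.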
10
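Example
Concept Lattice
Figure: concept lattice of the context K, with 19 concepts (intent: extent):
a: 12345678
ab: 12356, ac: 34678, ad: 5678, ag: 1234
abc: 36, abg: 123, acd: 678, adf: 568, agh: 234
abdf: 56, abgh: 23, acdf: 68, acgh: 34, acde: 7
abcdf: 6, abcgh: 3, acghi: 4
abcdefghi: (empty extent)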
11
Example - Projection
Figure: the concept lattice of the example, shown for Project(r, {a, b, c, d}).
12
Projection
Projection on {S, T, U, V} of the initial concept lattice. On the left, the equivalence classes are marked on the initial lattice. On the right, each equivalence class is represented by a single node (behind which the whole class is attached).
13
Algorithms - Projection
Depth-first Search (DFS)
Breadth-first Search (BFS)
Leading Bits Sort (LBS)
Bottom-up Search (BUS)
14
Depth-first Search
Algorithm idea:
  Set the first class with the top element.
  Test whether the current node is in the same class as one of its marked parents or children.
  If not, create a new class for it.
  Set up the links between the representatives of the equivalence classes.
Input: lattice B
Output lattice B1 (so far): a 12345678
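The marking step of the depth-first idea can be sketched as follows. This is one possible reading, not the authors' code: `dfs_classes`, `children`, `parents` and `INTENTS` are hypothetical names, the intents are read off the example lattice, and two classes that turn out to be connected through the current node are merged (a detail the slide leaves open). Linking the representatives is not covered.

```python
from itertools import count

# Intents of the example lattice (assumption: read off the talk's example).
INTENTS = [frozenset(s) for s in (
    "a ab ac ad ag abc abg acd adf agh abdf abgh "
    "acdf acgh acde abcdf abcgh acghi abcdefghi").split()]


def children(node):
    """Lower neighbours: minimal intents strictly containing node."""
    below = [x for x in INTENTS if node < x]
    return [x for x in below if not any(y < x for y in below)]


def parents(node):
    """Upper neighbours: maximal intents strictly contained in node."""
    above = [x for x in INTENTS if x < node]
    return [x for x in above if not any(x < y for y in above)]


def dfs_classes(top, n):
    """Map every node to the representative of its class w.r.t. n."""
    n = frozenset(n)
    cls = {}          # marked node -> class id
    fresh = count()

    def visit(node):
        marked = [x for x in parents(node) + children(node) if x in cls]
        same = {cls[x] for x in marked if x & n == node & n}
        cid = min(same) if same else next(fresh)  # join a class or open one
        for x in cls:                             # merge classes joined by node
            if cls[x] in same:
                cls[x] = cid
        cls[node] = cid
        for child in children(node):
            if child not in cls:
                visit(child)

    visit(top)
    rep = {}
    for node in sorted(cls, key=len):  # smallest intent = maximal concept
        rep.setdefault(cls[node], node)
    return {node: rep[cls[node]] for node in cls}


rep = dfs_classes(frozenset("a"), "abcd")
```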
17
Depth-first Search
Output lattice B1 (so far): a 12345678; ac 34
19
Depth-first Search
Output lattice B1 (so far), node labels as extracted from the figure: a 12345678; ac 34; ab 123; abc 3; abcd; 34678
20
Breadth-first Search
Algorithm idea:
  Start with the top element e.
  Move to each child of this element and compare it with e.
  If it is not in the same class, check whether all its parents are marked. If so, create a new class for it.
  Set up the links between the representatives of the equivalence classes.
Input: lattice B
Output lattice B1 (so far): a 12345678
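A minimal sketch of the breadth-first marking, under the assumption that a node is only classified once all of its parents are marked; it then either joins the class of a parent with the same projected intent or opens a new class. The names `bfs_classes`, `children`, `parents` and `INTENTS` are hypothetical, and the intents are read off the example lattice. Linking the representatives is left out.

```python
from collections import deque

# Intents of the example lattice (assumption: read off the talk's example).
INTENTS = [frozenset(s) for s in (
    "a ab ac ad ag abc abg acd adf agh abdf abgh "
    "acdf acgh acde abcdf abcgh acghi abcdefghi").split()]


def children(node):
    """Lower neighbours: minimal intents strictly containing node."""
    below = [x for x in INTENTS if node < x]
    return [x for x in below if not any(y < x for y in below)]


def parents(node):
    """Upper neighbours: maximal intents strictly contained in node."""
    above = [x for x in INTENTS if x < node]
    return [x for x in above if not any(x < y for y in above)]


def bfs_classes(top, n):
    """Map every node to the representative of its class w.r.t. n."""
    n = frozenset(n)
    rep = {top: top}          # marked node -> class representative
    queue = deque([top])
    while queue:
        node = queue.popleft()
        for child in children(node):
            # Skip marked nodes; wait until every parent has been marked.
            if child in rep or any(p not in rep for p in parents(child)):
                continue
            same = [p for p in parents(child) if p & n == child & n]
            rep[child] = rep[same[0]] if same else child
            queue.append(child)
    return rep


rep = bfs_classes(frozenset("a"), "abcd")
```

Because parents are always marked first, the first node seen in each class is its maximal element, so no merging is needed in this variant.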
22
Breadth-first Search
Output lattice B1 (so far): a 12345678; ab 12356; ac 34678; ad 5678
24
Leading Bits Sort
Algorithm idea:
  The lectic order on subsets of M states that A precedes B if the first position in which A and B differ contains 0 in A and 1 in B.
  Equivalent concepts/intents are necessarily consecutive in this order once the attributes of N occupy the leading positions.
  Use the iPred procedure of Baixeries et al. to set the links between the representatives of the equivalence classes.
Intents of the input lattice B, sorted for Project(r, {abcd}):

Intents  a  b  c  d  e  f  g  h  i
a        1  0  0  0  0  0  0  0  0
ag       1  0  0  0  0  0  1  0  0
agh      1  0  0  0  0  0  1  1  0
ad       1  0  0  1  0  0  0  0  0
adf      1  0  0  1  0  1  0  0  0
ac       1  0  1  0  0  0  0  0  0
acgh     1  0  1  0  0  0  1  1  0
acghi    1  0  1  0  0  0  1  1  1
ab       1  1  0  0  0  0  0  0  0
abg      1  1  0  0  0  0  1  0  0
abgh     1  1  0  0  0  0  1  1  0
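The sort-then-scan idea can be sketched in a few lines: put the attributes of N first, sort the intents as bit vectors, and cut the sorted sequence wherever the leading bits change. The names `lbs_classes`, `ATTRS` and `INTENTS` are hypothetical, and the intents are an assumption read off the example lattice; the iPred linking step is not included.

```python
from itertools import groupby

ATTRS = "abcdefghi"
# Intents of the example lattice (assumption: read off the talk's example).
INTENTS = ("a ab ac ad ag abc abg acd adf agh abdf abgh "
           "acdf acgh acde abcdf abcgh acghi abcdefghi").split()


def lbs_classes(intents, n):
    """Group intents into equivalence classes w.r.t. n by one sort and one pass."""
    lead = [m for m in ATTRS if m in n]              # attributes of N go first
    order = lead + [m for m in ATTRS if m not in n]

    def bits(intent):
        return tuple(int(m in intent) for m in order)

    # Tuple comparison on 0/1 vectors is exactly the lectic order described above.
    ranked = sorted(intents, key=bits)
    # Equal leading bits <=> same projected intent, and such intents are adjacent.
    return [list(g) for _, g in groupby(ranked, key=lambda s: bits(s)[:len(lead)])]


groups = lbs_classes(INTENTS, "abcd")
representatives = [g[0] for g in groups]  # smallest intent of each class
```

The first intent of each group is the lectically smallest one, which is also the smallest intent of its class and hence its maximal concept.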
26
Leading Bits Sort
Output lattice B1 for Project(r, {abcd}): eight nodes, with intents a, ab, ac, ad, abc, abd, acd, abcd; the top node is a (extent 12345678).
27
Bottom-up Search
Algorithm idea:
  Start the exploration of the lattice (upwards from the bottom) with the most general concept c whose intent contains N.
  There are two possibilities:
    If the concept c has exactly N as intent, then the output of the projection is the filter generated by c.
    If N is not an intent, then the attributes in N'' \ N are deleted one by one from the intents of the concepts in the filter generated by c.
Input: lattice B
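A minimal sketch of the bottom-up idea, working on intents only. It assumes the attributes to delete are those of the closure of N that are not in N; `bus_project` and `INTENTS` are hypothetical names, and the intents are read off the example lattice.

```python
# Intents of the example lattice (assumption: read off the talk's example).
INTENTS = [frozenset(s) for s in (
    "a ab ac ad ag abc abg acd adf agh abdf abgh "
    "acdf acgh acde abcdf abcgh acghi abcdefghi").split()]


def bus_project(intents, n):
    """Projected intents: the filter of the closure of n, minus the extra attributes."""
    n = frozenset(n)
    # Most general concept whose intent contains n: the smallest such intent.
    closure = min((b for b in intents if n <= b), key=len)
    to_delete = closure - n                      # empty when n is itself an intent
    # Filter generated by that concept: all intents contained in the closure.
    return {b - to_delete for b in intents if b <= closure}


result = bus_project(INTENTS, "abcd")
print(sorted("".join(sorted(b)) for b in result))
```

For N = {a, b, c, d} the closure is abcdf, so only f has to be removed from the ten intents of the filter, which leaves eight distinct projected intents.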
29
Bottom-up Search
Figure: the filter generated by c.
30
Bottom-up Search
Figure: output lattice B1.
31
Experiments
Environment
  Java, 1.9 GHz processor and 3 GB memory
Parameters
  Number of concepts in K = (G, M, I)
  Density of K: 40%, 50%, 60%
  Ratio N/M: 10%, ..., 80%
  Data: from 71114 to 234946 concepts
32
Experiments
Results
  Better performance for LBS and BUS when the percentage of projection is higher than 40%
  LBS has lower variation than BUS
  DFS is the worst algorithm
  Projection on context is not the best choice!
33
Experiments
34
Conclusion
  Focus on projection
  Work can be adapted for the selection
  Possibility to handle the two operations in one shot on a given concept lattice
  Projection on lattices vs on contexts
    Special cases where the projection on lattices is more efficient
    More experiments are needed
35
Future Work
  An important fact: the projection is the inverse operation of the assembly of two lattices!
  Projection on implication sets
  Algorithm improvement
    Execution time and memory consumption
  Other operations on concept lattices
36
THANK YOU!
37
Projection
K = (G, M, W, I)
Projection on a set N of attributes
38
Selection
K = (G, M, W, I)
Selection on a set of objects
39
DFS complexity
To analyze the complexity of this procedure, we consider the number of accesses to each node and the number of comparisons.
Each node is visited at least twice (on the way down and back).
If q is the number of equivalence classes, then there are on average q/2 comparisons to mark a node.
40
BFS complexity
To evaluate the complexity of this algorithm, we consider two parameters: the number of needed comparisons and the number of times each node is accessed.
Each node o is visited exactly #parent(o) + 1 times. The overall number of node accesses is therefore $\sum_{o \in B} (\#\mathrm{parent}(o) + 1)$.
41
LBS complexity
The sorting process with respect to the lectic order can be done in O(n ln(n)), where n is the number of concepts in B.
The marking of equivalence classes on B is straightforward, since it takes one linear pass over the sorted set of concepts.
Thus, the overall process has a complexity of O(n ln(n)).
42
iPred
• It sorts the elements of the lattice by size.
• The Δ[ci] of each element ci of the input set is initialized to the empty set.
• This Δ[ci] will contain the accumulation of faces for each element.
• The first element in the border is the first element in the sequence.
• All remaining elements in the input sequence are processed in the order in which they appear in the enumeration.
• The candidate set is computed by intersecting the current element ci with all the elements in the border.
• We check whether the current element belongs to the upper set of the elements that are in the candidate set.
• If the test result is positive, ci ≺ c̃, so we add this connection to the output set, then add that face to the set of accumulated faces of c̃, and finally remove c̃ from the border.
• Before the next element is processed, we make sure that ci is added to the border.
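The steps above can be sketched as follows. This is one possible reading of the procedure, not the reference implementation: the name `ipred` and the `INTENTS` data (read off the example lattice) are assumptions, the input family is assumed to be closed under intersection, and the cover test is implemented as "no accumulated face of the candidate meets the current element".

```python
# Intents of the example lattice (assumption: read off the talk's example).
INTENTS = [frozenset(s) for s in (
    "a ab ac ad ag abc abg acd adf agh abdf abgh "
    "acdf acgh acde abcdf abcgh acghi abcdefghi").split()]


def ipred(sets):
    """Return the cover pairs (smaller, larger) of an intersection-closed family."""
    seq = sorted(sets, key=len)                 # sort the elements by size
    delta = {c: frozenset() for c in seq}       # accumulated faces per element
    border = {seq[0]}                           # starts with the first element
    links = set()
    for ci in seq[1:]:
        candidates = {ci & b for b in border}   # meets with the border
        for cand in candidates:
            if not (delta[cand] & ci):          # no known cover of cand lies below ci
                links.add((cand, ci))           # cand is an immediate neighbour of ci
                delta[cand] |= ci - cand        # remember the new face
                border.discard(cand)
        border.add(ci)                          # ci joins the border
    return links


links = ipred(INTENTS)
```

A brute-force comparison of all triples gives the same pairs on this small lattice; the point of the border is to avoid that cubic check on larger inputs.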
43
BUS complexity
The complexity of this procedure depends on two factors:
  When we find the most general concept whose intent contains the set of attributes N.
  The number of attributes to be deleted.
44
Work of Jeudy et al.
  Sort the concepts in topological order.
  Find the equivalence classes and their representatives.
  Scan the input lattice once more to build the links between the representatives of the equivalence classes.