On the Mining of Numerical Data with Formal Concept Analysis
Doctoral thesis in computer science
Mehdi Kaytoue
22 April 2011
Amedeo Napoli Sebastien Duplessis
Somewhere... in a temperate forest...
Context
A biological problem: how does symbiosis work at the cellular level?
Analyse biological processes
Find genes involved in symbiosis
Choose a model for understanding symbiosis: Laccaria bicolor
Analysing Gene Expression Data (GED)
F. Martin et al. The Genome of Laccaria bicolor Provides Insights into Mycorrhizal Symbiosis. In Nature, 2008.
Gene expression data (GED)
A numerical dataset, or data table, with
genes in rows
biological situations in columns
the expression value of the gene in a row for the situation in a column
A row denotes the expression profile of a gene (GEP)
      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5     5   8   5
Biological hypothesis
A group of genes having a similar expression profile interact together within the same biological process
With very large datasets...
Gene expression data of Laccaria bicolor
22,294 genes
3 types of biological situations reflecting cells of the organism in various stages of its biological cycle:
free-living mycelium
symbiotic tissues
fruiting bodies
Attribute values range over [0, 65000]
Knowledge discovery in databases
An iterative and interactive process
U. Fayyad, G. Piatetsky-Shapiro and P. Smyth. The KDD Process for Extracting Useful Knowledge from Volumes of Data. In Commun. ACM, 1996.
Mining gene expression data
Extracting (maximal) rectangles in numerical data
A set of genes co-expressed in some biological situations
Local patterns: biological processes may be activated in some situations only
Overlapping patterns: a gene may be involved in several biological processes
      m1  m2  m3  m4  m5
g1     1   2   2   1   6
g2     2   1   1   0   6
g3     2   2   1   7   6
g4     8   9   2   6   7
Biclustering: a difficult problem relying on heuristics
R. Peeters. The Maximum Edge Biclique Problem is NP-Complete. In Discrete Applied Math., vol. 131, no. 3, 2003.
Core of the thesis
Mining gene expression data with formal concept analysis
Turning GED into binary, encoding over/under expression
Bringing the problem into well-known settings
Allowing a complete and mathematically well-defined approach
Exploiting algorithms and “tools”
      m1  m2  m3  m4  m5
g1     1   2   2   1   6
g2     2   1   1   5   6
g3     2   2   1   7   6
g4     8   9   2   6   7
⇒
      m1  m2  m3  m4  m5
g1     0   0   0   0   1
g2     0   0   0   0   1
g3     0   0   0   1   1
g4     1   1   0   1   1
Can we work with FCA directly on numerical data?
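The binarization step on this slide can be sketched in a few lines of Python. This is only an illustration: the cut-off (a value strictly above 5 counts as over-expressed) is an assumption inferred from the binary table shown, not a threshold stated in the slides, and the names `DATA`, `THRESHOLD` and `binarize` are hypothetical.

```python
# Gene expression table from the slide (rows g1..g4, columns m1..m5).
DATA = {
    "g1": [1, 2, 2, 1, 6],
    "g2": [2, 1, 1, 5, 6],
    "g3": [2, 2, 1, 7, 6],
    "g4": [8, 9, 2, 6, 7],
}

# Assumption: "over-expressed" means strictly greater than 5.
# This value is inferred from the slide's binary table, not given by the author.
THRESHOLD = 5


def binarize(table, threshold):
    """Return a 0/1 table with 1 wherever the value exceeds the threshold."""
    return {gene: [int(v > threshold) for v in row] for gene, row in table.items()}


binary = binarize(DATA, THRESHOLD)
for gene, row in binary.items():
    print(gene, row)
```

A different threshold would give a different binary table, which is exactly the loss of information that motivates working on the numerical data directly.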
Outline
1 Context
2 Formal Concept Analysis
3 Contributions
   Interval pattern structures
   Introducing similarity
   A KDD-oriented discussion
4 Conclusion and perspectives
Formal Concept Analysis
A binary table as a formal context
Given by (G, M, I) with
G a set of objects
M a set of attributes
I a binary relation between objects and attributes: (g, m) ∈ I means that “object g owns attribute m”
      m1  m2  m3
g1     ×       ×
g2     ×   ×
g3         ×   ×
g4         ×   ×
g5     ×   ×   ×
G = {g1, ..., g5}
M = {m1, m2, m3}
(g1, m3) ∈ I
B. Ganter and R. Wille. Formal Concept Analysis. Springer, Mathematical Foundations, 1999.
A maximal rectangle as a formal concept
A Galois connection to characterize formal concepts
A′ = {m ∈ M | ∀g ∈ A : (g, m) ∈ I} for A ⊆ G
B′ = {g ∈ G | ∀m ∈ B : (g, m) ∈ I} for B ⊆ M
(A, B) is a concept with extent A = B′ and intent B = A′
{g3}′ = {m2, m3}
{m2, m3}′ = {g3, g4, g5}
      m1  m2  m3
g1     ×       ×
g2     ×   ×
g3         ×   ×
g4         ×   ×
g5     ×   ×   ×
({g3, g4, g5}, {m2,m3}) is a formal concept
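The two derivation operators can be written down directly. The sketch below is illustrative: the cross-table `CONTEXT` is an assumption, reconstructed from the derivations and concepts quoted on the slides, and the function names are hypothetical.

```python
# Assumption: cross-table rebuilt from the derivations quoted on the slides.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """A' : the attributes shared by every object of A."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """B' : the objects owning every attribute of B."""
    return {g for g, owned in CONTEXT.items() if set(attributes) <= owned}


def is_concept(objects, attributes):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return intent(objects) == set(attributes) and extent(attributes) == set(objects)


print(intent({"g3"}))          # attributes of g3
print(extent({"m2", "m3"}))    # objects owning both m2 and m3
print(is_concept({"g3", "g4", "g5"}, {"m2", "m3"}))
```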
Concept lattice
Ordered set of concepts...
(A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1)
({g1, g5}, {m1, m3}) ≤ ({g1, g2, g5}, {m1})
... with interesting properties
Maximality of concepts as rectangles
Overlapping of concepts
Specialization/generalisation hierarchy
Synthetic representation of the data without loss of information
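For a context this small, the whole concept lattice can be enumerated by brute force: close every attribute subset and keep the distinct (extent, intent) pairs. This is a naive sketch for illustration, not one of the algorithms used in the thesis; the cross-table is again an assumption reconstructed from the concepts quoted on the slides.

```python
from itertools import combinations

# Assumption: cross-table rebuilt from the concepts quoted on the slides.
CONTEXT = {
    "g1": {"m1", "m3"},
    "g2": {"m1", "m2"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = ["m1", "m2", "m3"]


def extent(attributes):
    return frozenset(g for g, owned in CONTEXT.items() if set(attributes) <= owned)


def intent(objects):
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def all_concepts():
    """Close every attribute subset; exponential, fine only for toy contexts."""
    concepts = set()
    for size in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(ATTRIBUTES, size):
            a = extent(attrs)
            concepts.add((a, intent(a)))
    return concepts


def leq(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


concepts = all_concepts()
for a, b in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(a), sorted(b))
```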
Handling numerical data with FCA?
Initial problem
Extracting groups of genes with similar numerical values
Conceptual scaling (discretization or binarization)
An object has an attribute if its value lies in a predefined interval
      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5     5   8   5
      m1 ∈ [4, 5]   m2 ∈ [4, 7]   m3 ∈ [5, 6]
g1         ×             ×             ×
g2
g3         ×                           ×
g4         ×
g5         ×                           ×
Different scalings: different interpretations of the data
General problem of the thesis
How to directly build a concept lattice from numerical data?
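Conceptual scaling with one predefined interval per attribute can be sketched as follows. The intervals are the ones shown on the slide; the dictionary layout and the name `scale_context` are illustrative assumptions.

```python
# Numerical table from the slide.
DATA = {
    "g1": {"m1": 5, "m2": 7, "m3": 6},
    "g2": {"m1": 6, "m2": 8, "m3": 4},
    "g3": {"m1": 4, "m2": 8, "m3": 5},
    "g4": {"m1": 4, "m2": 9, "m3": 8},
    "g5": {"m1": 5, "m2": 8, "m3": 5},
}

# Scale from the slide: one closed interval per attribute.
SCALE = {"m1": (4, 5), "m2": (4, 7), "m3": (5, 6)}


def scale_context(data, scale):
    """An object owns a binary attribute iff its value lies in the interval."""
    return {
        g: {m for m, (low, high) in scale.items() if low <= row[m] <= high}
        for g, row in data.items()
    }


scaled = scale_context(DATA, SCALE)
for g, attrs in scaled.items():
    print(g, sorted(attrs))
```

Changing `SCALE` changes the resulting binary context, which is the point made above: each scaling is one interpretation of the data.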
Contributions – Interval pattern structures
How to handle complex descriptions
An intersection as a similarity operator
∩ behaves as a similarity operator
{m1, m2} ∩ {m1, m3} = {m1}
∩ induces an ordering relation ⊆
N ∩ O = N ⇔ N ⊆ O
{m1} ∩ {m1, m2} = {m1} ⇔ {m1} ⊆ {m1, m2}
∩ has the properties of a meet ⊓ in a semi-lattice: a commutative, associative and idempotent operation
c ⊓ d = c ⇔ c ⊑ d
A. Tversky. Features of Similarity. In Psychological Review, 84 (4), 1977.
Pattern structure
Given by (G, (D, ⊓), δ)
G a set of objects
(D, ⊓) a semi-lattice of descriptions or patterns
δ a mapping such that δ(g) ∈ D describes object g
A Galois connection
A□ = ⊓_{g ∈ A} δ(g) for A ⊆ G
d□ = {g ∈ G | d ⊑ δ(g)} for d ∈ (D, ⊓)
B. Ganter and S. O. Kuznetsov. Pattern Structures and their Projections. In International Conference on Conceptual Structures, 2001.
Numerical data are pattern structures
Interval pattern structures
      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5     5   8   5
{g1, g2}□ = ⊓_{g ∈ {g1, g2}} δ(g)
          = ⟨5, 7, 6⟩ ⊓ ⟨6, 8, 4⟩
          = ⟨[5, 6], [7, 8], [4, 6]⟩
⟨[5, 6], [7, 8], [4, 6]⟩□ = {g ∈ G | ⟨[5, 6], [7, 8], [4, 6]⟩ ⊑ δ(g)}
                          = {g1, g2, g5}
({g1, g2, g5}, ⟨[5, 6], [7, 8], [4, 6]⟩) is a (pattern) concept
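The interval pattern structure on this table can be sketched directly from the definitions: the meet of two interval vectors is the componentwise convex hull, and a pattern subsumes a description when every interval of the description lies inside the matching interval of the pattern. The function names below are illustrative assumptions, and each interval is represented as a `(low, high)` tuple.

```python
from functools import reduce

# Numerical table from the slide (columns m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def delta(g):
    """Describe an object by degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])


def meet(c, d):
    """Componentwise convex hull of two interval vectors."""
    return tuple((min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(c, d))


def subsumes(c, d):
    """c is below d in the semi-lattice: each interval of d lies within that of c."""
    return all(a1 <= a2 and b2 <= b1 for (a1, b1), (a2, b2) in zip(c, d))


def description(objects):
    """Common description of a non-empty set of objects (the meet of their deltas)."""
    return reduce(meet, (delta(g) for g in objects))


def image(pattern):
    """Objects whose description is subsumed by the pattern."""
    return {g for g in DATA if subsumes(pattern, delta(g))}


d = description(["g1", "g2"])
print(d, sorted(image(d)))
```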
Interval pattern concept lattice
Lowest concepts: few objects, small intervals
Highest concepts: many objects, large intervals
The number of concepts/patterns can be overwhelming
Links with conceptual scaling
Interordinal scaling [Ganter & Wille]
A scale to encode intervals of attribute values
      m1 ≤ 4   m1 ≤ 5   m1 ≤ 6   m1 ≥ 4   m1 ≥ 5   m1 ≥ 6
4        ×        ×        ×        ×
5                 ×        ×        ×        ×
6                          ×        ×        ×        ×
Equivalent concept lattice
Example
({g1, g2, g5}, {m1 ≤ 6, m1 ≥ 4, m1 ≥ 5, ... , ... })
({g1, g2, g5}, ⟨[5, 6], ... , ... ⟩)
Why use pattern structures when we already have scaling?
Processing a pattern structure is more efficient
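The correspondence between interordinal scaling and intervals can be checked on attribute m1. In this sketch a binary attribute such as "m1 ≤ 6" is represented by the tuple `("<=", 6)`; that encoding and the function names are assumptions made for illustration.

```python
# Values of attribute m1 in the running example.
M1 = {"g1": 5, "g2": 6, "g3": 4, "g4": 4, "g5": 5}
VALUES = sorted(set(M1.values()))  # the scale thresholds: 4, 5, 6


def interordinal_attributes(value):
    """Binary attributes 'm1 <= s' and 'm1 >= s' owned by a value."""
    owned = {("<=", s) for s in VALUES if value <= s}
    owned |= {(">=", s) for s in VALUES if value >= s}
    return owned


def shared_attributes(objects):
    """Intent of a set of objects in the scaled context (m1 part only)."""
    sets = [interordinal_attributes(M1[g]) for g in objects]
    return set.intersection(*sets)


def to_interval(attributes):
    """Tightest interval encoded by a set of interordinal attributes."""
    low = max(s for op, s in attributes if op == ">=")
    high = min(s for op, s in attributes if op == "<=")
    return (low, high)


intent_m1 = shared_attributes(["g1", "g2", "g5"])
print(sorted(intent_m1), to_interval(intent_m1))
```

The binary intent carries redundant attributes (here both "m1 ≥ 4" and "m1 ≥ 5"), whereas the interval pattern stores only the two bounds.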
Contributions – Introducing similarity
Introducing a similarity relation
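How can objects with similar values be grouped in the same concept?
A natural similarity relation on numbers
a ≃θ b ⇔ |a − b| ≤ θ, e.g. 4 ≃1 5 and 4 ≄1 6
Similarity operator ⊓ in pattern structures
[Figure: meet-semilattice of the intervals built from the values 4, 5 and 6, shown for θ = 2, θ = 1 and θ = 0]
4       5       6
  [4,5]   [5,6]
      [4,6]
How can a similarity relation defined w.r.t. a distance be taken into account?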
Towards a similarity between values
Introduce an element ∗ ∈ (D, ⊓) denoting dissimilarity
c ⊓ d = ∗ iff c ≄θ d
c ⊓ d ≠ ∗ iff c ≃θ d
Example with θ = 1
      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5     5   8   5
{g3, g4}□ = ⟨[4, 4], [8, 9], ∗⟩
⟨[4, 4], [8, 9], ∗⟩□ = {g3, g4}
({g3, g4}, ⟨[4, 4], [8, 9], ∗⟩) is a concept: g3 and g4 have similar values for attributes m1 and m2 only
Is {g3, g4} maximal w.r.t. similarity? We can add g5...
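A minimal sketch of the θ-constrained meet: the convex hull of two intervals is kept only if its length does not exceed θ, otherwise the component becomes ∗. The string `"*"` standing for ∗ and the function names are assumptions made for illustration.

```python
STAR = "*"   # stands for the dissimilarity element
THETA = 1    # threshold used in the slide's example

DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def meet_value(c, d, theta):
    """Convex hull of two intervals, or STAR when it is longer than theta."""
    if c == STAR or d == STAR:
        return STAR
    low, high = min(c[0], d[0]), max(c[1], d[1])
    return (low, high) if high - low <= theta else STAR


def description(objects, theta=THETA):
    """Common description of a non-empty list of objects."""
    result = tuple((v, v) for v in DATA[objects[0]])
    for g in objects[1:]:
        other = tuple((v, v) for v in DATA[g])
        result = tuple(meet_value(a, b, theta) for a, b in zip(result, other))
    return result


def image(pattern):
    """Objects matching the pattern; a STAR component matches any value."""
    def fits(interval, value):
        return interval == STAR or interval[0] <= value <= interval[1]
    return {g for g, row in DATA.items() if all(fits(i, v) for i, v in zip(pattern, row))}


d = description(["g3", "g4"])
print(d, sorted(image(d)))
print(description(["g3", "g4", "g5"]))
```

The last call illustrates the remark above: adding g5 to {g3, g4} still yields pairwise similar values on m1 and m2, so {g3, g4} is not maximal with respect to similarity.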
Classes of tolerance in numerical data
Towards maximal sets of similar values
≃θ is a tolerance relation: reflexive, symmetric, not necessarily transitive
Consider an attribute taking values in {6, 8, 11, 16, 17} and θ = 5
8 ≃5 11 and 11 ≃5 16, but 8 ≄5 16
A class of tolerance is a maximal set of pairwise similar values
{6, 8, 11}   {11, 16}   {16, 17}
[6, 11]      [11, 16]   [16, 17]
S. O. Kuznetsov. Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research. In Formal Concept Analysis, Foundations and Applications, 2005.
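For numbers on a line, the classes of tolerance can be found with a single sweep over the sorted values: each class is a maximal window whose extreme values differ by at most θ. The sketch below is one simple way to compute them, not necessarily the procedure used in the thesis; `tolerance_classes` is a hypothetical name.

```python
def tolerance_classes(values, theta):
    """Maximal sets of pairwise theta-similar values, as sorted lists."""
    vals = sorted(set(values))
    classes = []
    last_end = -1  # index of the last value of the previous class
    for i, low in enumerate(vals):
        j = i
        # Extend the window while the new value stays within theta of the lowest one.
        while j + 1 < len(vals) and vals[j + 1] - low <= theta:
            j += 1
        # Keep the window only if it is not contained in the previous class.
        if j > last_end:
            classes.append(vals[i:j + 1])
            last_end = j
    return classes


classes = tolerance_classes([6, 8, 11, 16, 17], theta=5)
print(classes)
print([(c[0], c[-1]) for c in classes])  # the corresponding intervals
```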
Tolerance in pattern structures
Projecting the pattern structure
Each value is replaced by the interval characterizing its class of tolerance (if unique)
Each pattern d is projected with a mapping ψ such that ψ(d) ⊑ d (pre-processing)
Example with θ = 1
      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5     5   8   5
{g3, g4}□ = ψ(⟨[4, 4], [8, 9], ∗⟩) = ⟨[4, 5], [8, 9], ∗⟩
⟨[4, 5], [8, 9], ∗⟩□ = {g3, g4, g5}
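The projection can be sketched as a pre-processing of the table: every value that belongs to exactly one class of tolerance is replaced by that class's interval, and the θ-constrained meet is then applied to the projected descriptions. This is one reading of the slide under stated assumptions; the helper names and the `"*"` marker are hypothetical.

```python
from functools import reduce

STAR = "*"
THETA = 1
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def tolerance_classes(values, theta):
    """Intervals (low, high) spanned by the maximal sets of similar values."""
    vals, classes, last_end = sorted(set(values)), [], -1
    for i, low in enumerate(vals):
        j = i
        while j + 1 < len(vals) and vals[j + 1] - low <= theta:
            j += 1
        if j > last_end:
            classes.append((low, vals[j]))
            last_end = j
    return classes


def project_value(value, classes):
    """Interval of the unique class containing the value, else [value, value]."""
    containing = [c for c in classes if c[0] <= value <= c[1]]
    return containing[0] if len(containing) == 1 else (value, value)


# Pre-processing: project every cell of the table, column by column.
CLASSES = [tolerance_classes(col, THETA) for col in zip(*DATA.values())]
PROJECTED = {
    g: tuple(project_value(v, CLASSES[i]) for i, v in enumerate(row))
    for g, row in DATA.items()
}


def meet_value(c, d):
    if c == STAR or d == STAR:
        return STAR
    low, high = min(c[0], d[0]), max(c[1], d[1])
    return (low, high) if high - low <= THETA else STAR


def description(objects):
    return reduce(lambda c, d: tuple(map(meet_value, c, d)),
                  (PROJECTED[g] for g in objects))


def image(pattern):
    def inside(p, d):
        return p == STAR or (p[0] <= d[0] and d[1] <= p[1])
    return {g for g, desc in PROJECTED.items() if all(map(inside, pattern, desc))}


d = description(["g3", "g4"])
print(d, sorted(image(d)))
```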
Biological results
One extracted pattern among 2,150 others
Its genes show a high expression level in the fruiting-body situations
Some of these genes encode metabolic enzymes involved in the remobilization of fungal resources towards the new organ in development
Other genes are unknown but specific to Laccaria bicolor: these require biological experiments
Relevant publications
Interval pattern structures and GED analysis
M. Kaytoue, S. Duplessis, S. O. Kuznetsov, and A. Napoli. Two FCA-Based Methods for Mining Gene Expression Data. In International Conference on Formal Concept Analysis (ICFCA), 2009.
M. Kaytoue, S. O. Kuznetsov, A. Napoli and S. Duplessis. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis. In Information Sciences, Spec. Iss.: Lattices (Elsevier), 2011.
Introducing tolerance relations and information fusion
M. Kaytoue, Z. Assaghir, N. Messai and A. Napoli. Two Complementary Classification Methods for Designing a Concept Lattice from Interval Data. In Foundations of Information and Knowledge Systems, 6th International Symposium (FoIKS), 2010.
M. Kaytoue, Z. Assaghir, A. Napoli and S. O. Kuznetsov. Embedding Tolerance Relations in Formal Concept Analysis: an Application in Information Fusion. In ACM Conference on Information and Knowledge Management (CIKM), 2010.
Other works
Pattern structures are useful for several tasks
Bi-clustering and tolerance relations
M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Biclustering Numerical Data in Formal Concept Analysis. In International Conference on Formal Concept Analysis (ICFCA), 2011.
Information fusion: enhancing decision making
Z. Assaghir, M. Kaytoue, A. Napoli and H. Prade. Managing Information Fusion with Formal Concept Analysis. In Modeling Decisions for Artificial Intelligence, 6th International Conference (MDAI), 2010.
KDD: a study of equivalence classes of interval patterns
M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Revisiting Numerical Pattern Mining with Formal Concept Analysis. In International Joint Conference on Artificial Intelligence (IJCAI), 2011.
Contributions – A KDD-oriented discussion
Interval pattern search space
Counting all possible interval patterns
⟨[a_m1, b_m1], [a_m2, b_m2], ...⟩ where a_mi, b_mi ∈ W_mi and a_mi ≤ b_mi (W_mi: the set of distinct values taken by attribute mi)
      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5     5   8   5
∏_{i ∈ {1, ..., |M|}} ( |W_mi| × (|W_mi| + 1) ) / 2
360 possible interval patterns in our small example
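The count can be checked with a few lines of Python: an attribute with n distinct values admits n(n + 1)/2 intervals whose bounds are taken among those values, and the counts multiply across attributes. The function name is an illustrative assumption.

```python
from math import prod

# Numerical table from the slide (columns m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def count_interval_patterns(rows):
    """Product over attributes of n * (n + 1) / 2, n = number of distinct values."""
    sizes = [len(set(column)) for column in zip(*rows)]
    return prod(n * (n + 1) // 2 for n in sizes)


total = count_interval_patterns(DATA.values())
print(total)  # 6 * 6 * 10 for 3, 3 and 4 distinct values
```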
Semantics for interval patterns
Interval patterns as (hyper) rectangles
      m1  m3
g1     5   6
g2     6   4
g3     4   5
g4     4   8
g5     5   5
⟨[4, 5], [5, 6]⟩□ = {g1, g3, g5}
⟨[4, 5], [5, 7]⟩□ = {g1, g3, g5}
⟨[4, 6], [5, 6]⟩□ = {g1, g3, g5}
⟨[4, 5], [4, 6]⟩□ = {g1, g3, g5}
⟨[4, 6], [5, 7]⟩□ = {g1, g3, g5}
⟨[4, 5], [4, 7]⟩□ = {g1, g3, g5}
[Figure: the descriptions δ(g1), ..., δ(g5) plotted as points in the plane (m1 on the horizontal axis, m3 on the vertical axis), with the interval patterns above drawn as rectangles]
A condensed representation
Equivalence classes of interval patterns
Two interval patterns with the same image are said to be equivalent
c ≅ d ⇔ c□ = d□
Equivalence class of a pattern d
[d] = {c | c ≅ d}
with a unique closed pattern: the smallest rectangle
and one or several generators: the largest rectangles
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining Frequent Patterns with Counting Inference. SIGKDD Expl., 2(2):66–75, 2000.
In our example: 360 patterns; 18 closed; 44 generators
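The number of closed interval patterns in the small example can be verified by brute force: the common description of any non-empty set of objects is closed (it is the smallest rectangle around them), so collecting the distinct descriptions over all subsets gives the closed patterns. This naive enumeration is only a check on the toy table, not the MinIntChange algorithm, and it does not compute generators.

```python
from itertools import combinations

# Numerical table of the running example (columns m1, m2, m3).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def description(objects):
    """Smallest rectangle containing the objects: (min, max) per attribute."""
    columns = zip(*(DATA[g] for g in objects))
    return tuple((min(col), max(col)) for col in columns)


def closed_patterns():
    """Distinct descriptions of all non-empty object subsets (exponential)."""
    genes = list(DATA)
    closed = set()
    for size in range(1, len(genes) + 1):
        for subset in combinations(genes, size):
            closed.add(description(subset))
    return closed


closed = closed_patterns()
print(len(closed))
```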
Algorithms & experiments
Algorithms: MinIntChange, MinIntChangeG[t|h]
[Figure: meet-semilattice of the intervals built from the values 4, 5 and 6]
4       5       6
  [4,5]   [5,6]
      [4,6]
Experiments
Mining several datasets from the Bilkent University Repository
Compression rate varies between 10^7 and 10^9
Interordinal scaling: encodes ≈ 30,000 binary patterns
not efficient even with the best algorithms (e.g. LCMv2)
redundancy problem that rules out its use for generator extraction
Discussion
Advantages
Minimum description length principle favours generators
Potential applications
Data privacy and k-anonymisation
k-box problem in computational geometry
Quantitative association rule mining
Data summarization
Problem
With very large datasets, compression is not enough
Numerical data are noisy
Need for fault-tolerant condensed representations
Conclusion and perspectives
Conclusion
A new insight into the mining of numerical data
Our main tools...
Formal Concept Analysis and conceptual scaling
Pattern structures and projections
Tolerance relation
... for numerical data mining
Conceptual representations of numerical data
Bi-clustering
Information fusion
Applications: GED analysis and agricultural practice assessment
Conclusion
An application in GED analysis
With FCA and pattern structures
Many ways of extracting patterns in GED
Biological validation of several patterns
We now need a systematic validation step using new knowledge
transcription factors
biological knowledge base, e.g. Gene Ontology
To be continued...
Short and mid term
Handle other types of biclusters and compare algorithms
S. C. Madeira and A. L. Oliveira. Biclustering Algorithms for Biological Data Analysis: A Survey. In IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.
Insert domain knowledge for biological data
Study the effect of the threshold θ on the number of tolerance classes
Post-doctoral position
Biclustering (multi-dimensional) numerical data
Numerical pattern based classifier and association rules
Data privacy and pattern projection
Wagner Meira Jr. (Universidade Federal de Minas Gerais, Brazil)
Cross-domain fertilization
Itemset mining in KDD
Other frameworks for closed patterns
H. Arimura and T. Uno. Polynomial-Delay and Polynomial-Space Algorithms for Mining Closed Sequences, Graphs, and Pictures in Accessible Set Systems. In SIAM International Conference on Data Mining, 2009.
G. C. Garriga. Formal Methods for Mining Structured Objects. PhD Thesis, Universitat Politecnica de Catalunya, 2006.
Condensed representations and fault-tolerant patterns
      m1  m2  m3
g1     5   7   6
g2     6   8   4
g3     4   8   5
g4     4   9   8
g5    15   8   5
R. Pensa and J.-F. Boulicaut. Towards Fault-Tolerant Formal Concept Analysis. In Proc. 9th Congress of the Italian Association for Artificial Intelligence (AI*IA), Springer, 2005.
Cross-domain fertilization
Data-analysis
Symbolic data analysis and distances
P. Agarwal, M. Kaytoue, S. O. Kuznetsov, A. Napoli and G. Polaillon. Symbolic Galois Lattices with Pattern Structures. In International Conference on Rough Sets, Fuzzy Sets, Data-mining and Granularity Computing (RSFDGrC), 2011.
Information fusion and fuzzy concept analysis
Fuzzy settings and possibility theory
Z. Assaghir, M. Kaytoue, and H. Prade. A Possibility Theory Oriented Discussion of Conceptual Pattern Structures. In Scalable Uncertainty Management, 4th International Conference (SUM), 2010.
Thank you (Merci, Danke schön, Spasibo)