Lecture Notes : (In)Formal concept analysis 30/03/2009
Formal Concept Analysis
Prof. Kim Mens
Louvain School of Engineering
Department of Computing Science and Engineering
UCL
http://www.info.ucl.ac.be/~km
Lecture Notes : (In)Formal concept analysis 30/03/2009 – Prof. Kim Mens – UCL, Belgium
Information explosion
IT advances in the last decade(s) have caused an explosion of information
E.g., growth of the internet
This leads to a real information overload
How to manage (i.e., search, structure) all that information?
(Small) example
Dataset = someone’s iTunes™ music library
≥ 5000 songs each having a name, artist, rating, genre, ...
How to manage all that data?
How to find a song we like?
Can we find interesting relations between songs?
which songs are similar?
in what way are they similar?
Managing large data sets
Given a data set with many thousands of elements:
web pages, text or other documents
data libraries (books, songs, movies, ...)
customer and personnel databases
having certain properties:
indexes, relevant keywords, tags, genres, ...
In general ...
Managing large data sets
Given a data set with many thousands of elements:
web pages, text or other documents
data libraries (books, songs, movies, ...)
customer and personnel databases
Questions
1. How to find relevant data?
2. How to discover (hidden) structure in that data?
Running example (revisited)
[Figure: table relating songs to their genres]
Running example
How to manage all those songs?
Three concrete applications
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• e.g., what songs does she like most
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party dance [search]
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party dance [search]
Search results [ party, dance ] :
• Technologic – Daft Punk
• Whole Again – Atomic Kitten
• Get Busy – Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar
Refine search by genres :
• [ slow, pop, soft ]
• [ beat ]
Remove genres from search :
• party
• dance
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party dance beat [search]
Search results [ party, dance, beat ] :
• Technologic – Daft Punk
• Get Busy – Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar
Refine search by genres :
• [ electronic ]
• [ reggae ]
Remove genres from search :
• party
• dance
• beat
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party dance beat reggae [search]
Search results [ party, dance, beat, reggae ] :
• Get Busy – Sean Paul
Remove genres from search :
• party
• dance
• beat
• reggae
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party reggae [search]
Search results [ party, reggae ] :
• Could You Be Loved – Bob Marley
Refine search by genres :
• [ dance, beat ]
Remove genres from search :
• party
• reggae
Running example
How to manage all those songs?
Three concrete applications:
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• what songs does she like most
Structure of the world-wide music scene
http://sixdegrees.hu/last.fm/index.html
Dependencies between genres
New wave is so eighties
Dance music is party music
Disco is from the seventies
Classical music and slows are for softies
...
Running example
How to manage all those songs?
Three concrete applications:
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• what songs does she like most
Discover a user profile
To analyse the preferred genres of a user
for match-making or publicity purposes
For example,
most of her music is party music
she likes background music
she’s not such a big fan of classical
none of her music is hard
Running example
How to manage all those songs?
Three concrete applications:
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• what songs does she like most
So how can we achieve all this?
Formal concept analysis...
... may be of help
FCA was invented around 1980 in Darmstadt as a mathematical theory for modelling the notion of a “concept”
Since then it has been applied in many domains of computer science dealing with large data sets
data analysis
knowledge discovery
software engineering
Data set is represented by a “context”
[Figure: context table with the objects, the attributes and the relation between them]
Formal concept analysis...
Starts from a context C = (G, M, I) consisting of
a set G of objects
a set M of attributes
a binary relation I ⊆ G × M between the objects and the attributes (g I m reads “object g has attribute m”)
Determines concepts
Maximal groups of objects and attributes
Plus hierarchical relationships
Subset relationships between those groups
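To make the definition concrete, here is a minimal Python sketch (Python is chosen only for illustration). It assumes a hypothetical toy context pieced together from the genres visible on the example slides, not the real music library, and stores the relation I as a mapping from each object to its set of attributes. The two helper functions, whose names are illustrative, are the usual derivation operators: for a set of objects A, A′ is the set of attributes they all share; for a set of attributes B, B′ is the set of objects having all of them.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy context (G, M, I): each object (song) maps to the set of
# attributes (genres) it has. Reconstructed from the example slides only;
# M is taken to be the union of all these genre sets.
CONTEXT: Dict[str, Set[str]] = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "In Da Club": {"party", "dance", "beat"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Could You Be Loved": {"party", "reggae"},
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
}


def common_attributes(objects: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """A': the attributes shared by every object in A (all of M when A is empty)."""
    shared: Set[str] = set().union(*context.values())  # start from M
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def common_objects(attributes: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """B': the objects having every attribute in B (all of G when B is empty)."""
    wanted = set(attributes)
    return frozenset(g for g, attrs in context.items() if wanted <= attrs)


print(sorted(common_objects({"party", "reggae"}, CONTEXT)))
print(sorted(common_attributes({"Alice", "A Forest"}, CONTEXT)))
```

With this toy data the first call returns Could You Be Loved and Get Busy, and the second returns eighties, new wave and party. The concepts, the hierarchy and the implications discussed later can all be phrased in terms of these two operators.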
A “concept” represents a group of related objects and attributes
Intuitively, we look for maximal “rectangles” in the binary relation I
A concept
[Figure: the objects Alice – Sisters of Mercy and A Forest – The Cure together with the attributes New Wave, Party and Eighties]
A concept is a maximal group of objects and attributes
Group:
Every object of the concept has those attributes
Every attribute of the concept holds for those objects
Maximal
No other object (outside the concept) has all those attributes
No other attribute (outside the concept) is shared by all these objects
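The group and maximality conditions above amount to A′ = B and B′ = A, using the derivation operators of the context. The following Python sketch checks them on a hypothetical four-song context; only the two new wave songs are taken from the slide, the other two are added so the check has something to exclude, and the function names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy context: the two "new wave" songs come from the slide,
# the other two songs are added only to give the check something to exclude.
CONTEXT: Dict[str, Set[str]] = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Could You Be Loved": {"party", "reggae"},
}


def common_attributes(objects: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """A': attributes shared by every object in A (M = union of all attribute sets)."""
    shared: Set[str] = set().union(*context.values())
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def common_objects(attributes: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """B': objects having every attribute in B."""
    wanted = set(attributes)
    return frozenset(g for g, attrs in context.items() if wanted <= attrs)


def is_concept(extent: Iterable[str], intent: Iterable[str], context: Dict[str, Set[str]]) -> bool:
    """(A, B) is a concept iff A' = B (no shared attribute is left out)
    and B' = A (no object having all of B is left out)."""
    a, b = frozenset(extent), frozenset(intent)
    return common_attributes(a, context) == b and common_objects(b, context) == a


print(is_concept({"Alice", "A Forest"}, {"new wave", "party", "eighties"}, CONTEXT))  # True
print(is_concept({"Alice"}, {"new wave", "party", "eighties"}, CONTEXT))  # False: A Forest is missing
print(is_concept({"Alice", "A Forest"}, {"new wave", "party"}, CONTEXT))  # False: eighties is missing
```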
Not a concept
[Figure: a rectangle that is not maximal, annotated “Need to include this” and “Need to include this as well”]
Intuitively, we look for maximal “rectangles” in the binary relation I
Formal concept analysis...
... derives hierarchies of concepts from data sets
It generates and visualizes hierarchies of concepts on a mathematically founded basis
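As a rough illustration of how all concepts can be derived from a data set, the Python sketch below uses a deliberately naive method: the intents of a context are exactly the intersections of object intents (with M itself as the empty intersection), so closing the object intents under intersection and then computing each extent yields every concept. It runs on a hypothetical toy context reconstructed from the slides; it is not necessarily how FCA tools work, and more efficient algorithms such as Ganter's NextClosure exist.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy context reconstructed from the example slides.
CONTEXT: Dict[str, Set[str]] = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "In Da Club": {"party", "dance", "beat"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Could You Be Loved": {"party", "reggae"},
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
}

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def all_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Every formal concept (extent, intent) of a small, finite context."""
    # M itself is the intent of the bottom concept (the empty intersection);
    # here M is taken to be the union of all object attribute sets.
    intents: Set[FrozenSet[str]] = {frozenset().union(*context.values())}
    # Close the family under intersection with each object's attribute set.
    for attrs in context.values():
        intents |= {frozenset(attrs) & b for b in intents}
    # Each intent B determines its concept (B', B).
    return [(frozenset(g for g, a in context.items() if b <= a), b) for b in intents]


for extent, intent in sorted(all_concepts(CONTEXT), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(len(extent), sorted(intent))
```

With the toy data this yields 9 concepts, from the top concept (all 9 songs, {party}) down to the bottom concept (no songs, all 10 genres). Each object can at most double the family of candidate intents, which is one reason why this naive approach only suits small contexts.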
A concept hierarchy
Yet another concept
A subconcept
The blue concept is a subconcept of the green one.
A subconcept
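[Figure: the concept ({Technologic, Destination Calabria, Rock This Party}, {Party, Electronic, Dance, Beat}) is a subconcept of the concept ({Technologic, In Da Club, Get Busy, Destination Calabria, Rock This Party}, {Party, Dance, Beat}): its set of objects is a subset of the other concept's objects, and the other concept's attributes are a subset of its attributes]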
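Because a concept is determined by either of its two sets, “is subconcept of” can be tested by set inclusion alone. The Python sketch below encodes the two concepts from the slide as pairs of sets (the function name is illustrative) and checks the order in both directions.

```python
from typing import FrozenSet, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (objects, attributes)


def is_subconcept(c1: Concept, c2: Concept) -> bool:
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2 (equivalently, B2 of B1)."""
    (extent1, intent1), (extent2, intent2) = c1, c2
    # For genuine formal concepts the two tests are equivalent; checking both
    # mirrors the two "is subset of" arrows on the slide.
    return extent1 <= extent2 and intent2 <= intent1


# The two concepts shown on the slide.
party_dance_beat: Concept = (
    frozenset({"Technologic", "In Da Club", "Get Busy",
               "Destination Calabria", "Rock This Party"}),
    frozenset({"Party", "Dance", "Beat"}),
)
electronic_party: Concept = (
    frozenset({"Technologic", "Destination Calabria", "Rock This Party"}),
    frozenset({"Party", "Electronic", "Dance", "Beat"}),
)

print(is_subconcept(electronic_party, party_dance_beat))  # True
print(is_subconcept(party_dance_beat, electronic_party))  # False
```

Reversing the arguments returns False, since In Da Club and Get Busy are not in the smaller concept.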
Concept lattice
For a given context, the set of all formal concepts, together with the partial order “is subconcept of”, forms a lattice
A lattice is a mathematical structure with some interesting properties:
for any two concepts there is always a greatest common subconcept and a least common superconcept
it is even a complete lattice: every set of concepts, not only every pair, has a greatest common subconcept and a least common superconcept; in particular, a unique top element (the concept containing all objects) and a unique bottom element (the concept containing all attributes) exist
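For two concepts (A1, B1) and (A2, B2), the greatest common subconcept is (A1 ∩ A2, (A1 ∩ A2)′) and the least common superconcept is ((B1 ∩ B2)′, B1 ∩ B2). The Python sketch below computes both on a hypothetical toy context reconstructed from the slides; the function and variable names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy context reconstructed from the example slides.
CONTEXT: Dict[str, Set[str]] = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "In Da Club": {"party", "dance", "beat"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Could You Be Loved": {"party", "reggae"},
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
}

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def common_attributes(objects: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """A': attributes shared by every object in A (M = union of all attribute sets)."""
    shared: Set[str] = set().union(*context.values())
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def common_objects(attributes: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """B': objects having every attribute in B."""
    wanted = set(attributes)
    return frozenset(g for g, attrs in context.items() if wanted <= attrs)


def meet(c1: Concept, c2: Concept, context: Dict[str, Set[str]]) -> Concept:
    """Greatest common subconcept: (A1 & A2, (A1 & A2)')."""
    extent = c1[0] & c2[0]
    return extent, common_attributes(extent, context)


def join(c1: Concept, c2: Concept, context: Dict[str, Set[str]]) -> Concept:
    """Least common superconcept: ((B1 & B2)', B1 & B2)."""
    intent = c1[1] & c2[1]
    return common_objects(intent, context), intent


electronic: Concept = (
    frozenset({"Technologic", "Destination Calabria", "Rock This Party"}),
    frozenset({"party", "dance", "beat", "electronic"}),
)
get_busy: Concept = (frozenset({"Get Busy"}), frozenset({"party", "dance", "beat", "reggae"}))

print(join(electronic, get_busy, CONTEXT))
print(meet(electronic, get_busy, CONTEXT))
```

Here the least common superconcept of the electronic dance songs and Get Busy is the Party/Dance/Beat concept, which also contains In Da Club, while their greatest common subconcept is the bottom element: no song, all genres.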
A concept lattice
A concept lattice
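[Figure: concept lattice containing, among others, the concept ({Alice – Sisters of Mercy, A Forest – The Cure}, {New Wave, Party, Eighties}) and the concept ({Technologic, In Da Club, Get Busy, Destination Calabria, Rock This Party}, {Party, Dance, Beat}) with its subconcept ({Technologic, Destination Calabria, Rock This Party}, {Party, Electronic, Dance, Beat}), linked by “is subconcept of” edges]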
Tool support : Concept Explorer
A concept lattice in detail (sparse labelling)
Running example revisited: How does it work?
How to manage all those songs?
Three concrete applications
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• e.g., what songs does she like most
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party dance [search]
Search results [ party, dance ] :
• Technologic – Daft Punk
• Whole Again – Atomic Kitten
• Get Busy – Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar
Refine search by genres :
• [ slow, pop, soft ]
• [ beat ]
Remove genres from search :
• party
• dance
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party dance beat [search]
Search results [ party, dance, beat ] :
• Technologic – Daft Punk
• Get Busy – Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar
Refine search by genres :
• [ electronic ]
• [ reggae ]
Remove genres from search :
• party
• dance
• beat
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party dance beat reggae [search]
Search results [ party, dance, beat, reggae ] :
• Get Busy – Sean Paul
Remove genres from search :
• party
• dance
• beat
• reggae
A Google-like search engine for songs (Galois)
Genres (separated by spaces) : party reggae [search]
Search results [ party, reggae ] :
• Could You Be Loved – Bob Marley
Refine search by genres :
• [ dance, beat ]
Remove genres from search :
• party
• reggae
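The behaviour of these search screens can be approximated with the concept lattice: the results for a query Q are the extent Q′, and each “refine” suggestion corresponds to a direct subconcept (lower neighbour) of the concept (Q′, Q″), shown as the genres it adds. The Python sketch below is an assumption about how such an engine could work, not the implementation behind the screenshots; the toy context is reconstructed from the slides, so its result lists do not always match the screenshots, and all names are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical toy context reconstructed from the example slides.
CONTEXT: Dict[str, Set[str]] = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "In Da Club": {"party", "dance", "beat"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Could You Be Loved": {"party", "reggae"},
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
}


def common_attributes(objects: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    shared: Set[str] = set().union(*context.values())  # start from M
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def common_objects(attributes: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    wanted = set(attributes)
    return frozenset(g for g, attrs in context.items() if wanted <= attrs)


def search(query: Iterable[str], context: Dict[str, Set[str]]) -> Tuple[List[str], List[List[str]]]:
    """Return the matching songs and the genre sets offered as refinements."""
    results = common_objects(query, context)      # Q': songs having every query genre
    intent = common_attributes(results, context)  # Q'': genres all those songs share
    all_genres: FrozenSet[str] = frozenset().union(*context.values())
    # Every direct subconcept arises by closing the intent plus one extra genre.
    candidates: Set[FrozenSet[str]] = set()
    for genre in all_genres - intent:
        songs = common_objects(intent | {genre}, context)
        if songs:  # skip refinements that would match no song at all
            candidates.add(common_attributes(songs, context))
    # Direct subconcepts have minimal intents among the candidates.
    direct = [c for c in candidates if not any(other < c for other in candidates)]
    refinements = sorted(sorted(c - intent) for c in direct)
    return sorted(results), refinements


results, refinements = search({"party", "dance", "beat"}, CONTEXT)
print("Search results:", results)
print("Refine search by genres:", refinements)
```

With the toy data, the query party dance suggests [beat] and [pop, slow, soft], and party dance beat suggests [electronic] and [reggae], as on the slides; for party reggae the toy data also returns Get Busy.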
Running example revisited: How does it work?
How to manage all those songs?
Three concrete applications
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• e.g., what songs does she like most
Implications
New wave is from the eighties
Dance music is party music
Disco is from the seventies
Slows are soft
Classical music is soft
Implications
Slows are soft
Classical music is soft
Disco is from the seventies
Dance music is party music
New wave is from the eighties
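An implication A → B holds in a context when every object having all attributes of A also has all attributes of B, that is, when A′ ⊆ B′. The Python sketch below tests some of the implications above on a hypothetical toy context reconstructed from the slides (names are illustrative); disco, seventies and classical do not occur in that toy data, so those implications are left out.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy context reconstructed from the example slides.
CONTEXT: Dict[str, Set[str]] = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "In Da Club": {"party", "dance", "beat"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Could You Be Loved": {"party", "reggae"},
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
}


def common_objects(attributes: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """B': objects having every attribute in B."""
    wanted = set(attributes)
    return frozenset(g for g, attrs in context.items() if wanted <= attrs)


def implication_holds(premise: Iterable[str], conclusion: Iterable[str],
                      context: Dict[str, Set[str]]) -> bool:
    """A -> B holds iff every object having all of A also has all of B (A' within B')."""
    return common_objects(premise, context) <= common_objects(conclusion, context)


rules = [({"dance"}, {"party"}), ({"new wave"}, {"eighties"}),
         ({"slow"}, {"soft"}), ({"party"}, {"dance"})]
for premise, conclusion in rules:
    print(sorted(premise), "->", sorted(conclusion), implication_holds(premise, conclusion, CONTEXT))
```

The last rule fails because, for example, Could You Be Loved is party music without being dance music. An implication whose premise no object satisfies holds trivially, so such rules are best read together with the number of objects that support them.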
Associations
Most dance music has a beat
Most of her music is party music
A lot of music from the eighties is party music
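Associations are rules that hold for most, rather than all, objects. A common way to quantify “most”, borrowed from association rule mining, is the confidence |(A ∪ B)′| / |A′|. The Python sketch below computes it exactly with fractions on a hypothetical toy context reconstructed from the slides; the 3/4 threshold and all names are illustrative assumptions.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy context reconstructed from the example slides.
CONTEXT: Dict[str, Set[str]] = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "In Da Club": {"party", "dance", "beat"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Could You Be Loved": {"party", "reggae"},
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
}


def common_objects(attributes: Iterable[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """B': objects having every attribute in B."""
    wanted = set(attributes)
    return frozenset(g for g, attrs in context.items() if wanted <= attrs)


def confidence(premise: Iterable[str], conclusion: Iterable[str],
               context: Dict[str, Set[str]]) -> Fraction:
    """Share of the objects having all of A that also have all of B: |(A | B)'| / |A'|."""
    a, b = set(premise), set(conclusion)
    with_premise = common_objects(a, context)
    if not with_premise:
        raise ValueError("no object has all the premise attributes")
    with_both = common_objects(a | b, context)
    return Fraction(len(with_both), len(with_premise))


MIN_CONFIDENCE = Fraction(3, 4)  # hypothetical threshold for "most"
rule_confidence = confidence({"dance"}, {"beat"}, CONTEXT)
print(rule_confidence, rule_confidence >= MIN_CONFIDENCE)
```

In the toy data, five of the six dance songs have a beat, so “most dance music has a beat” passes the threshold, whereas party → dance only reaches 2/3.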
Running example revisited: How does it work?
How to manage all those songs?
Three concrete applications
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• e.g., what songs does she like most
Concept lattice (with number of objects)
Preferred music is party music
Also likes some background music
Not such a big fan of classical
and so on ...
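Annotating each concept with the number of objects in its extent is what turns the lattice into a rough user profile. The Python sketch below counts, for every genre m, the songs in the concept ({m}′, {m}″), which is simply the number of songs tagged with m; the toy library is hypothetical and reconstructed from the slides, and the function name is illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy library reconstructed from the example slides.
CONTEXT: Dict[str, Set[str]] = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "In Da Club": {"party", "dance", "beat"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Could You Be Loved": {"party", "reggae"},
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
}


def genre_profile(context: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """For each genre m, the size of the extent {m}' (the songs having m),
    most common genre first, ties broken alphabetically."""
    genres = set().union(*context.values())
    counts = {m: sum(1 for attrs in context.values() if m in attrs) for m in genres}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


total = len(CONTEXT)
for genre, n in genre_profile(CONTEXT):
    print(f"{genre:10} {n}/{total}")
```

Genres that never occur, such as classical or hard in this toy data, simply do not appear in the profile, which is how an absence like “none of her music is hard” shows up.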
Some problems...
Concept lattice can get very dense for large data sets
Concept lattice can grow exponentially in the size of the context (up to 2^min(|G|, |M|) concepts)
Attributes are not always binary
What if the data is incomplete or imprecise?
False positives and negatives
...
(Some solutions have been proposed to overcome these problems)
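The exponential growth is easy to reproduce. In the so-called contranominal scale, object i has every attribute except attribute i; every subset of attributes is then an intent, so n objects and n attributes already give 2^n concepts. The Python sketch below (names are illustrative) counts the concepts by closing the object intents under intersection and confirms the doubling for small n.

```python
from typing import Dict, FrozenSet, Set, Tuple

Context = Dict[str, FrozenSet[str]]


def contranominal_context(n: int) -> Tuple[Context, FrozenSet[str]]:
    """n objects g0..g(n-1) and n attributes m0..m(n-1); object gi lacks only mi."""
    attributes = frozenset(f"m{i}" for i in range(n))
    context = {f"g{i}": attributes - {f"m{i}"} for i in range(n)}
    return context, attributes


def count_concepts(context: Context, attributes: FrozenSet[str]) -> int:
    """Count concepts by closing the object intents under intersection.

    M is passed explicitly, because an attribute held by no object would
    otherwise be missed."""
    intents: Set[FrozenSet[str]] = {attributes}  # M: intent of the bottom concept
    for attrs in context.values():
        intents |= {attrs & b for b in intents}
    return len(intents)


for n in range(1, 11):
    context, attributes = contranominal_context(n)
    print(n, count_concepts(context, attributes))  # doubles with each extra object
```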
Conclusion
FCA is an interesting technique to analyse large data sets
especially to discover interesting concepts, relations and structures in the data
Can be applied to many application domains
Based on a formal mathematical theory
Yet easy to use and understand intuitively
Quality of results depends on size and quality of the data
Sources
B. Ganter, R. Wille: Formal Concept Analysis – Mathematical Foundations. Springer, Heidelberg, 1999
Uta Priss: Formal Concept Analysis Homepage
http://www.upriss.org.uk/fca/fca.html
Gerd Stumme: course “Formale Begriffsanalyse” (Formal Concept Analysis)
http://www.kde.cs.uni-kassel.de/lehre/ss2005/formale_begriffsanalyse
Concept Explorer (ConExp)
http://conexp.sourceforge.net/
J. Fallon: Application des treillis de Galois à la recherche d’informations (Applying Galois lattices to information retrieval). Master’s thesis, Université catholique de Louvain, Département d’Ingénierie Informatique, 2004