Date posted: 24-Jan-2015
Category: Education
Uploaded by: albert-orriols-puig
Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency
Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu
Grup de Recerca en Sistemes Intel·ligents
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
{jrios, aorriols, josepmg}@salle.url.edu
Motivation
What is the Holy Grail of Machine Learning?
– Find the right Learning Algorithm for every Problem
– Real Problems are black boxes
  • We don't know which knowledge is contained
  • We can't answer:
    – When to stop training?
    – How efficient is the learning process?
– Artificial Problems:
  • Knowledge-driven
  • Property-driven
[Diagram labels: DI, DIK, Complex. Met.]
Slide 2 GRSI Enginyeria i Arquitectura la Salle
Framework
Machine Learning as a Communication System
[Diagram: Environment (Knowledge to be learned) → Communication Channel (Data Set) → Learning Algorithm (Learned Knowledge)]
Outline
1. Algorithm Evaluation Methodology Definition
2. Methodology Implementation
3. Experiment Description
4. Results and Analysis
5. Conclusions and Further Work
1 Algorithm Evaluation Process
[Diagram: Process Execution and Control DB. Labels: Problem, Data Set, Sampling Size, Sampling Method, Algorithm Parameters, DI Generation, Learning Algorithm, Knowledge, Optimal Knowledge Comparison. Outputs: Learning Accuracy and Population plots (example data set DS1kMulplx6m1, x-axis 0 to 10000).]
1 Algorithm Evaluation Process Dimensions
[Diagram: evaluation dimensions, applied to each Problem. Sampling Size: 1, 10, 100, 1000, 10000, 100000. Sampling Methods: SRS, SIS, RRS, RIS. Alg. Param.: AP1 ... APn.]
2 Knowledge Representation
Rule Set: each rule has a Condition and a Class/Action
Rule 1: a11 a12 ... a1j ... a1m → C1
Rule i: ai1 ai2 ... aij ... aim → Ci
Rule n: an1 an2 ... anj ... anm → Cn
aij ∈ {0, 1, #}, Ci ∈ N
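A minimal sketch of how such a rule set could be matched against a binary instance, assuming "#" means "don't care" and that the first matching rule gives the class. The names `matches`, `classify` and the 3-bit rule set are hypothetical and only for illustration.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, int]  # (condition over {0, 1, #}, class)


def matches(condition: str, instance: str) -> bool:
    """True if every non-# position of the condition equals the instance bit."""
    return len(condition) == len(instance) and all(
        c == "#" or c == x for c, x in zip(condition, instance)
    )


def classify(rule_set: List[Rule], instance: str) -> Optional[int]:
    """Return the class of the first matching rule, or None if no rule matches."""
    for condition, label in rule_set:
        if matches(condition, instance):
            return label
    return None


# Hypothetical 3-bit rule set, not taken from the slides.
rules = [("0##", 0), ("10#", 1), ("11#", 2)]
```

With this rule set, `classify(rules, "101")` follows the second rule, since its condition `10#` ignores the last bit.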
2 Sampling Methods
– SRS Sequential Rule Selection: Random # substitution
– SIS Sequential Instance Selection: Sequential # substitution
– RRS Random Rule Selection: Random # substitution
– RIS Random Instance Selection: Sequential # substitution
[Diagrams: each method is drawn as two steps labeled 1st and 2nd.]
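The four sampling methods could be sketched as follows. This is one possible reading of the slide, not the authors' implementation: it assumes rule selection picks a rule and then fills each "#" with a random bit, while instance selection first expands every rule into all the instances it covers (sequential # substitution) and then picks among those instances. The function names are hypothetical.

```python
import itertools
import random
from typing import List, Tuple

Rule = Tuple[str, int]  # (condition over {0, 1, #}, class)


def expand(condition: str) -> List[str]:
    """All instances covered by a condition, enumerated in order."""
    slots = [("0", "1") if c == "#" else (c,) for c in condition]
    return ["".join(bits) for bits in itertools.product(*slots)]


def fill_random(condition: str, rng: random.Random) -> str:
    """Replace each # with a random bit; a condition without # is unchanged."""
    return "".join(rng.choice("01") if c == "#" else c for c in condition)


def sample(rules: List[Rule], size: int, method: str, seed: int = 0) -> List[Rule]:
    """Generate `size` labeled instances with SRS, SIS, RRS or RIS (assumed semantics)."""
    if method not in ("SRS", "SIS", "RRS", "RIS"):
        raise ValueError(f"unknown sampling method: {method}")
    rng = random.Random(seed)
    if method in ("SRS", "RRS"):
        pool = list(rules)  # select among rules
    else:
        pool = [(inst, label) for cond, label in rules for inst in expand(cond)]
    data = []
    for k in range(size):
        sequential = method in ("SRS", "SIS")
        i = k % len(pool) if sequential else rng.randrange(len(pool))
        condition, label = pool[i]
        data.append((fill_random(condition, rng), label))
    return data
```

Under these assumptions SIS is fully deterministic, and for a rule set without "#" the rule pool and the instance pool coincide, so SRS and SIS (and RRS and RIS with the same seed) produce the same data.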
2 Problems to learn and Learning Algorithm
Problems: Mux6, Mux11, Parity5, Position5, Position11, Parity5-3
Learning Algorithm: XCS
Rule sets shown (condition → class; "..." marks rows not shown):
Mux6
0 0 # # # 0 → 0
0 0 # # # 1 → 1
0 1 # # 0 # → 0
0 1 # # 1 # → 1
...
1 1 0 # # # → 0
1 1 1 # # # → 1
Parity5
0 0 0 0 0 → 0
0 0 0 0 1 → 1
0 0 0 1 0 → 1
0 0 0 1 1 → 0
...
1 1 1 1 0 → 0
1 1 1 1 1 → 1
Position5
0 0 0 0 0 → 0
0 0 0 0 1 → 1
0 0 0 1 # → 2
0 0 1 # # → 3
...
1 # # # # → 5
Parity5-3
0 0 0 0 0 # # # → 0
0 0 0 0 1 # # # → 1
0 0 0 1 0 # # # → 1
0 0 0 1 1 # # # → 0
...
1 1 1 1 0 # # # → 0
1 1 1 1 1 # # # → 1
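The target functions behind these rule sets can be sketched as plain Python functions. The definitions below are inferred from the rows shown on the slide (for example, the Mux6 address `00` selects the last data bit and `11` the first); the function names are hypothetical, and rows not shown on the slide are assumed to follow the same pattern.

```python
def mux6(bits: str) -> int:
    """6-multiplexer: the first two bits address one of the four data bits."""
    address = int(bits[:2], 2)
    return int(bits[5 - address])  # address 00 -> last bit, 11 -> third bit


def parity(bits: str) -> int:
    """1 if the number of ones is odd, else 0."""
    return bits.count("1") % 2


def position(bits: str) -> int:
    """Position of the leftmost 1 counted from the right (1-based); 0 if all zeros."""
    return int(bits, 2).bit_length()


def parity5_3(bits: str) -> int:
    """Parity of the first five bits; the last three bits are irrelevant."""
    return parity(bits[:5])
```

For instance, `position("00110")` is 3, which agrees with the rule `0 0 1 # # → 3`.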
2 Problem Properties
Optimal Rule Sets
– Complete
– Non overlapped
– Irreducible
Why?
– Simple structure of knowledge complexity
– Well-known artificial problems
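Two of these properties can be checked by brute force on small problems. The sketch below assumes "complete" means every possible instance is matched by at least one rule and "non overlapped" means no instance is matched by more than one; irreducibility is not checked. The function names and the 3-bit rule set are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Rule = Tuple[str, int]  # (condition over {0, 1, #}, class)


def match_counts(rules: List[Rule], n_bits: int) -> Dict[str, int]:
    """For every n_bits instance, count how many rule conditions match it."""
    counts = {}
    for bits in product("01", repeat=n_bits):
        instance = "".join(bits)
        counts[instance] = sum(
            all(c == "#" or c == x for c, x in zip(condition, instance))
            for condition, _ in rules
        )
    return counts


def is_complete(rules: List[Rule], n_bits: int) -> bool:
    """Every instance is covered by at least one rule."""
    return all(n >= 1 for n in match_counts(rules, n_bits).values())


def is_non_overlapped(rules: List[Rule], n_bits: int) -> bool:
    """No instance is covered by more than one rule."""
    return all(n <= 1 for n in match_counts(rules, n_bits).values())


# Hypothetical 3-bit rule set, not taken from the slides.
rules = [("0##", 0), ("10#", 1), ("11#", 2)]
```

The enumeration grows as 2^n_bits, so this is only practical for small problems such as the ones listed here.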
3 Sampling and Learning Iteration
[Diagram: the Algorithm Evaluation Process with {Sampling Iteration} and {Training Iteration} loops marked. Labels: Problem, Data Set, Sampling Size, Sampling Method, Algorithm Parameters, DI Generation, Learning Algorithm, Knowledge, Optimal Knowledge Comparison. Outputs: Learning Accuracy and Optimal Population plots (example data set DS1kMulplx6m1, x-axis 0 to 10000).]
3 Output Results and Iteration Reduction
Output Results
– 2 Plots for every Problem, Sampling Method, Sampling Size and Algorithm Parameters
  • Optimal Population
  • Accuracy
Iteration Reduction
– SIS Pure sequential
  • No Sampling Iteration Needed
– Problems without "don't care"
  • SRS=SIS and RRS=RIS
[Plot: Optimal Population for DS1kMulplx6m1, x-axis 0 to 10000.]
3 Experimental Parameters
Number of Problems = 6
Number of Sampling Methods = 4
Number of different Sampling Sizes = 4
Number of different Algorithm Parameter Sets = 2
Number of Sampling Iterations = 10
Number of Training Iterations = 10
Number of Data Sets Generated = 744
Number of Training Processes = 14880
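The totals can be reproduced with a short calculation. This is one reading that matches the reported numbers, not a breakdown given on the slides: it assumes SIS, being purely sequential, contributes a single data set per configuration instead of 10, and that each data set is trained with both parameter sets for 10 training iterations.

```python
problems = 6
sampling_sizes = 4
parameter_sets = 2
training_iterations = 10

# Assumed sampling iterations per method: SIS is deterministic, so 1 instead of 10.
sampling_iterations = {"SRS": 10, "SIS": 1, "RRS": 10, "RIS": 10}

data_sets = problems * sampling_sizes * sum(sampling_iterations.values())
training_processes = data_sets * parameter_sets * training_iterations

print(data_sets)           # 744
print(training_processes)  # 14880
```

Without the SIS reduction the same grid would give 6 × 4 × 4 × 10 = 960 data sets.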
Problem Dimension
Sampling M. = RIS, Sampling Size = 1000, Learning Alg. Param. = pDNC 0.2
[Plots: Mux6 (DS1kMulplx6m4) and Parity5 (DS1kParity5m4), two plots each, x-axis 0 to 10000.]
Sampling Method Dimension
Problem = Position5, Sampling Size = 1000, Learning Alg. Param. = pDNC 0.2
[Plots: SRS Sequential Rule Selection (DS1kPosition5m1) and RIS Random Instance Selection (DS1kPosition5m4), two plots each, x-axis 0 to 10000.]
Sampling Size Dimension
Problem = Parity5, Sampling M. = RIS, Learning Alg. Param. = pDNC 0.2
[Plots: Sampling Size 100 (DS100Parity5m4) and 10000 (DS10kParity5m4), two plots each, x-axis 0 to 10000.]
Parameter Algorithm Dimension
Problem = Mux6, Sampling M. = RIS, Sampling Size = 1000
[Plots: pDNC 0.8 and pDNC 0.2 (both on DS1kMulplx6m4), two plots each, x-axis 0 to 10000.]
Conclusions and Further Work
Conclusions
– Automatic Learning Algorithm Analyzer based on Artificial Data Sets
– Four-dimension comparisons
– Methodology Implementation, Experiment and Results Analysis
Further Work
– Non ORS Problems
– Real Attributes
– Sampling Methods based on distance or transition matrix
– Multi Step Problems
– Different Learning Algorithms
– Different Knowledge representations
– Knowledge Covering Metrics
– Applying Data Set Complexity Metrics Suite
Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency
Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu
{jrios, aorriols, josepmg}@salle.url.edu
GRSI (Grup de Recerca en Sistemes Intel·ligents)
• http://www.salle.url.edu/GRSI
– Oriented to:
  • CBR (Case-Based Reasoning) Algorithms
  • Evolutionary Computation Algorithms
  • Data Mining Technology Transfer