HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms...

Date post: 24-Jan-2015
Upload: albert-orriols-puig
Description: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency. Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu. Grup de Recerca en Sistemes Intel·ligents, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull. {jrios, aorriols, josepmg}@salle.url.edu
Transcript
Page 1: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu

Grup de Recerca en Sistemes Intel·ligents, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

{jrios, aorriols, josepmg}@salle.url.edu

Page 2: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Motivation

What is the Holy Grail of Machine Learning?
– Find the right Learning Algorithm for every Problem
– Real Problems are black boxes
  • We don't know which knowledge is contained
  • We can't answer:
    – When to stop training?
    – How efficient is the learning process?
– Artificial Problems:
  • Knowledge-driven
  • Property-driven

[Figure: diagram whose surviving labels are "DI", "DIK", "Property-driven" and "Complex. Met."]

Slide 2 GRSI Enginyeria i Arquitectura la Salle

Page 3: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Framework

Machine Learning as a Communication System

[Diagram labels: Environment, Knowledge to be learned, Communication Channel, Data Set, Learning Algorithm, Learned Knowledge]

Page 4: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Outline

1. Algorithm Evaluation Methodology Definition
2. Methodology Implementation
3. Experiment Description
4. Results and Analysis
5. Conclusions and Further Work

Page 5: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

1 Algorithm Evaluation Process

[Figure: process diagram. Surviving labels: Process Execution and Control DB, Problem, Data Set, Sampling Size, Sampling Method, Algorithm Parameters, DI Generation, Learning Algorithm, Knowledge Comparison. Two output plots for data set DS1kMulplx6m1 (x axis 0 to 10000), labeled Learning Accuracy and Optimal Population.]

Page 6: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

1 Algorithm Evaluation Process Dimensions

[Figure: evaluation dimensions, applied to each Problem. Sampling Size axis with ticks 1, 10, 100, 1000, 10000, 100000. Sampling Methods axis: SRS, SIS, RRS, RIS. Alg. Param. axis: AP1 to APn.]

Page 7: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Outline

1. Algorithm Evaluation Methodology Definition
2. Methodology Implementation
3. Experiment Description
4. Results and Analysis
5. Conclusions and Further Work

Page 8: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

2 Knowledge Representation

A Rule Set is a list of rules, each with a Condition and a Class/Action:

          Condition                      Class/Action
Rule 1    a11  a12  ...  a1j  ...  a1m   C1
Rule i    ai1  ai2  ...  aij  ...  aim   Ci
Rule n    an1  an2  ...  anj  ...  anm   Cn

aij ∈ {0, 1, #}, Ci ∈ N
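The rule layout above can be sketched in code. This is only an illustration: the `Rule` type and the `matches` and `classify` helpers are hypothetical names that do not come from the slides, and the sketch assumes that `#` matches either bit value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    condition: str  # one symbol per attribute, each in {'0', '1', '#'}
    action: int     # class/action label, a natural number


def matches(rule: Rule, instance: str) -> bool:
    """True if every non-# symbol of the condition equals the instance bit."""
    return len(rule.condition) == len(instance) and all(
        c == "#" or c == b for c, b in zip(rule.condition, instance)
    )


def classify(rule_set: List[Rule], instance: str) -> Optional[int]:
    """Return the action of the first matching rule, or None if none match."""
    for rule in rule_set:
        if matches(rule, instance):
            return rule.action
    return None


# Two example rules written in the slide's notation.
rules = [Rule("00###0", 0), Rule("00###1", 1)]
print(classify(rules, "001010"))  # 0
print(classify(rules, "111111"))  # None
```

Returning `None` for an uncovered instance is a design choice of this sketch; it makes an incomplete rule set visible instead of silently assigning a default class.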

Page 9: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

2 Sampling Methods

SRS  Sequential Rule Selection
SIS  Sequential Instance Selection
RRS  Random Rule Selection
RIS  Random Instance Selection

[Diagram: each method is drawn as two steps, marked "1st" and "2nd". The step labels that survive are "Random # substitution" and "Sequential # substitution".]
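The two building blocks named on this slide, choosing a rule (sequentially or at random) and replacing each `#` with a concrete bit (sequentially or at random), might be combined as in the sketch below. The transcript does not preserve how the four methods pair these steps, so the sketch exposes them as two independent options and does not claim to reproduce SRS, SIS, RRS or RIS exactly. All function and parameter names are hypothetical.

```python
import itertools
import random
from typing import Iterator, List, Tuple

Rule = Tuple[str, int]  # (condition over {'0', '1', '#'}, class)


def fill_random(condition: str, rng: random.Random) -> str:
    """Replace each # with a random bit."""
    return "".join(rng.choice("01") if c == "#" else c for c in condition)


def fill_sequential(condition: str) -> Iterator[str]:
    """Yield every instance covered by the condition, in counting order."""
    slots = condition.count("#")
    for bits in itertools.product("01", repeat=slots):
        it = iter(bits)
        yield "".join(next(it) if c == "#" else c for c in condition)


def sample(rules: List[Rule], size: int, rule_order: str, fill: str,
           seed: int = 0) -> List[Tuple[str, int]]:
    """Draw `size` labeled instances from a rule set.

    rule_order: "sequential" (round-robin over rules) or "random".
    fill:       "sequential" (enumerate # values) or "random".
    """
    rng = random.Random(seed)
    # One sequential enumerator per rule, restarted when exhausted.
    enumerators = [itertools.cycle(list(fill_sequential(c))) for c, _ in rules]
    data = []
    for k in range(size):
        if rule_order == "sequential":
            i = k % len(rules)
        else:
            i = rng.randrange(len(rules))
        cond, cls = rules[i]
        inst = next(enumerators[i]) if fill == "sequential" else fill_random(cond, rng)
        data.append((inst, cls))
    return data


toy_rules = [("0#", 0), ("1#", 1)]
print(sample(toy_rules, 4, "sequential", "sequential"))
```

With both options set to "sequential" the output is fully deterministic, which is consistent with the later remark that a purely sequential method needs no repeated sampling iterations.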

Page 10: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

2 Problems to Learn and Learning Algorithm

Problems: Mux6, Mux11, Parity5, Position5, Position11, Parity5-3
Learning Algorithm: XCS

Rules that survive in the transcript (condition bits, then class). No rows survive for Mux11 or Position11.

Mux6
0 0 # # # 0    0
0 0 # # # 1    1
0 1 # # 0 #    0
0 1 # # 1 #    1
1 1 0 # # #    0
1 1 1 # # #    1

Parity5
0 0 0 0 0    0
0 0 0 0 1    1
0 0 0 1 0    1
0 0 0 1 1    0
1 1 1 1 0    0
1 1 1 1 1    1

Position5
0 0 0 0 0    0
0 0 0 0 1    1
0 0 0 1 #    2
0 0 1 # #    3
1 # # # #    5

Parity5-3
0 0 0 0 0 # # #    0
0 0 0 0 1 # # #    1
0 0 0 1 0 # # #    1
0 0 0 1 1 # # #    0
1 1 1 1 0 # # #    0
1 1 1 1 1 # # #    1
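As an illustration, the Mux6 and Parity5 rows listed on this slide can be generated programmatically. The sketch below is an assumption-laden reconstruction: the bit ordering (address value v selects position 5 - v) is inferred from the six Mux6 rows that survive, the two rules for address 10 are extrapolated from that pattern, and the function names are hypothetical.

```python
from typing import List, Tuple


def mux6_rules() -> List[Tuple[str, int]]:
    """Build eight rules: 2 address bits, 4 data bits, one relevant data bit."""
    rules = []
    for address in range(4):
        addr_bits = format(address, "02b")
        selected = 5 - address  # assumed position of the data bit this address selects
        for value in "01":
            cond = list(addr_bits + "####")
            cond[selected] = value
            rules.append(("".join(cond), int(value)))
    return rules


def parity5_class(bits: str) -> int:
    """Class of a Parity5 instance: 1 if the number of ones is odd, else 0."""
    return bits.count("1") % 2


for condition, cls in mux6_rules():
    print(condition, cls)
```

The generated list contains the six Mux6 rows shown above, and `parity5_class` agrees with the six Parity5 rows (for example, `11110` maps to 0 and `11111` maps to 1).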

Page 11: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

2 Problem Properties

Optimal Rule Sets
– Complete
– Non-overlapped
– Irreducible

Why?
– Simple structure of knowledge complexity
– Well-known artificial problems
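Two of the three properties listed above can be checked by brute force on small binary problems: a rule set is complete if every instance is matched by at least one rule, and non-overlapped if no instance is matched by more than one. The sketch below assumes these readings of the terms and uses a hypothetical 3-bit rule set; it does not test irreducibility.

```python
from itertools import product
from typing import List, Tuple

Rule = Tuple[str, int]  # (condition over {'0', '1', '#'}, class)


def covers(condition: str, instance: str) -> bool:
    """True if the condition matches the instance, treating # as a wildcard."""
    return all(c == "#" or c == b for c, b in zip(condition, instance))


def match_counts(rules: List[Rule], n_bits: int) -> List[int]:
    """Number of rules matching each of the 2**n_bits instances."""
    return [
        sum(covers(cond, "".join(bits)) for cond, _ in rules)
        for bits in product("01", repeat=n_bits)
    ]


def is_complete(rules: List[Rule], n_bits: int) -> bool:
    return min(match_counts(rules, n_bits)) >= 1


def is_non_overlapped(rules: List[Rule], n_bits: int) -> bool:
    return max(match_counts(rules, n_bits)) <= 1


# Hypothetical 3-bit rule set, used only to exercise the checks.
toy = [("000", 0), ("001", 1), ("01#", 2), ("1##", 3)]
print(is_complete(toy, 3), is_non_overlapped(toy, 3))  # True True
```

Enumeration costs 2**n_bits matches per rule, which is fine for the small problems named in these slides but would not scale to long conditions.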

Page 12: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Outline

1. Algorithm Evaluation Methodology Definition
2. Methodology Implementation
3. Experiment Description
4. Results and Analysis
5. Conclusions and Further Work

Page 13: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

3 Sampling and Learning Iteration

[Figure: the evaluation process diagram with two iteration loops marked {Sampling Iteration} and {Training Iteration}. Surviving labels: Problem, Data Set, Sampling Size, Sampling Method, Algorithm Parameters, DI Generation, Learning Algorithm, Knowledge Comparison. Two output plots for data set DS1kMulplx6m1 (x axis 0 to 10000), labeled Learning Accuracy and Optimal Population.]

Page 14: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

3 Output Results and Iteration Reduction

Output Results
– 2 plots for every Problem, Sampling Method, Sampling Size and Algorithm Parameters
  • Optimal Population
  • Accuracy

Iteration Reduction
– SIS: pure sequential
  • No Sampling Iteration needed
– Problems without "don't care"
  • SRS = SIS and RRS = RIS

[Figure: example plot for data set DS1kMulplx6m1, x axis 0 to 10000.]

Page 15: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

3 Experimental Parameters

Number of Problems = 6
Number of Sampling Methods = 4
Number of different Sampling Sizes = 4
Number of different Algorithm Parameter Sets = 2
Number of Sampling Iterations = 10
Number of Training Iterations = 10
Number of Data Sets Generated = 744
Number of Training Processes = 14880
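The two totals can be reproduced from the other parameters under one assumption that the slides do not state explicitly: three of the four sampling methods are sampled 10 times each, while SIS, being purely sequential, is sampled only once. The variable names below are hypothetical.

```python
problems = 6
sampling_sizes = 4
param_sets = 2
sampling_iterations = 10
training_iterations = 10

# Assumption: SRS, RRS and RIS are each sampled 10 times, SIS only once.
data_sets_per_cell = 3 * sampling_iterations + 1  # 31 per (problem, size)

data_sets = problems * sampling_sizes * data_sets_per_cell
training_runs = data_sets * param_sets * training_iterations

print(data_sets)      # 744
print(training_runs)  # 14880
```

Without any reduction, the grid would give 6 x 4 x 4 x 10 = 960 data sets, so the reported 744 is consistent with skipping the repeated sampling iterations for SIS alone.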

Page 16: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Outline

1. Algorithm Evaluation Methodology Definition
2. Methodology Implementation
3. Experiment Description
4. Results and Analysis
5. Conclusions and Further Work

Page 17: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Problem Dimension

Sampling M. = RIS, Sampling Size = 1000, Learning Alg. Param. = pDNC 0.2

[Figure: two plots each for Mux6 (data set DS1kMulplx6m4) and Parity5 (data set DS1kParity5m4), x axis 0 to 10000.]

Page 18: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Sampling Method Dimension

Problem = Position5, Sampling Size = 1000, Learning Alg. Param. = pDNC 0.2

[Figure: two plots each for SRS Sequential Rule Selection (data set DS1kPosition5m1) and RIS Random Instance Selection (data set DS1kPosition5m4), x axis 0 to 10000.]

Page 19: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Sampling Size Dimension

Problem = Parity5, Sampling M. = RIS, Learning Alg. Param. = pDNC 0.2

[Figure: two plots each for Sampling Size 100 (data set DS100Parity5m4) and Sampling Size 10000 (data set DS10kParity5m4), x axis 0 to 10000.]

Page 20: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Parameter Algorithm Dimension

Problem = Mux6, Sampling M. = RIS, Sampling Size = 1000

[Figure: two plots each for pDNC 0.8 and pDNC 0.2 (data set DS1kMulplx6m4), x axis 0 to 10000.]

Page 21: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Outline

1. Algorithm Evaluation Methodology Definition
2. Methodology Implementation
3. Experiment Description
4. Results and Analysis
5. Conclusions and Further Work

Page 22: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Conclusions and Further Work

Conclusions
– Automatic Learning Algorithm Analyzer based on Artificial Data Sets
– Four dimensions comparisons
– Methodology Implementation, Experiment and Results Analysis

Further Work
– Non ORS Problems
– Real Attributes
– Sampling Methods based on distance or transition matrix
– Multi Step Problems
– Different Learning Algorithms
– Different Knowledge representations
– Knowledge Covering Metrics
– Applying Data Set Complexity Metrics Suite

Page 23: HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

GRSI

Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu
{jrios, aorriols, josepmg}@salle.url.edu

GRSI (Grup de Recerca en Sistemes Intel·ligents)
• http://www.salle.url.edu/GRSI
– Oriented to:
  • CBR (Case-Based Reasoning) Algorithms
  • Evolutionary Computation Algorithms
  • Data Mining Technology Transfer

