HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms...

Artificial Data Sets based on Knowledge Generators: Analysis of Learning Algorithms Efficiency

Joaquin Rios-Boutin, Albert Orriols-Puig, Josep-Maria Garrell-Guiu

Grup de Recerca en Sistemes Intel·ligents, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

{jrios, aorriols, josepmg}@salle.url.edu

Motivation

What is the Holy Grail of Machine Learning?
– Find the right Learning Algorithm for every Problem

– Real Problems are black boxes
• We don’t know which knowledge is contained
• We can’t answer:
  – When to stop training?
  – How efficient is the learning process?

– Artificial Problems:
• Knowledge-driven
• Property-driven

Framework

Machine Learning as a Communication System

[Diagram labels: Environment (Knowledge to be learned); Communication Channel (Data Set); Learning Algorithm (Learned Knowledge)]

Outline

1. Algorithm Evaluation Methodology Definition

2. Methodology Implementation

3. Experiment Description

4. Results and Analysis

5. Conclusions and Further Work


1 Algorithm Evaluation Process

[Diagram: Process Execution and Control DB. Labels: Problem, Data Set, Sampling Size, Sampling Method, Algorithm Parameters, DI Generation, Learning Algorithm, Knowledge, Knowledge Comparison. Output plots: Learning Accuracy and Optimal Population, x axis 0 to 10000 (example run DS1kMulplx6m1).]

1 Algorithm Evaluation Process Dimensions

[Figure: for each Problem, three axes: Sampling Size (ticks 1, 10, 100, 1000, 10000, 100000), Sampling Methods (SRS, SIS, RRS, RIS), Alg. Param. (AP1 ... APn).]

2 Knowledge Representation

Rule Set

          Condition                  Class/Action
Rule 1    a11 a12 ... a1j ... a1m    C1
...
Rule i    ai1 ai2 ... aij ... aim    Ci
...
Rule n    an1 an2 ... anj ... anm    Cn

aij ∈ {0, 1, #}    Ci ∈ N
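The representation above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `Rule`, `matches` and `classify` are hypothetical, a condition is stored as a string over `0`, `1`, `#`, and `#` is read as "don't care". The two example rows are taken from the Mux6 rules listed later in the deck.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (condition over {0, 1, #}, class)
Rule = Tuple[str, int]


def matches(condition: str, instance: str) -> bool:
    """True when every non-# position of the condition equals the instance bit."""
    return len(condition) == len(instance) and all(
        c == "#" or c == x for c, x in zip(condition, instance)
    )


def classify(rule_set: List[Rule], instance: str) -> Optional[int]:
    """Return the class of the first matching rule, or None if no rule matches."""
    for condition, label in rule_set:
        if matches(condition, instance):
            return label
    return None


# Two Mux6 rows as they appear on the problems slide
MUX6_ROWS: List[Rule] = [("00###0", 0), ("00###1", 1)]

print(classify(MUX6_ROWS, "001010"))  # 0
print(classify(MUX6_ROWS, "110000"))  # None (not covered by these two rows)
```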

2 Sampling Methods

Each method builds instances in two steps (1st, 2nd); the 2nd step substitutes the # symbols.

SRS  Sequential Rule Selection      2nd: Random # substitution
SIS  Sequential Instance Selection  2nd: Sequential # substitution
RRS  Random Rule Selection          2nd: Random # substitution
RIS  Random Instance Selection      2nd: Sequential # substitution
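One possible reading of the four sampling methods is sketched below. It is an interpretation, not the authors' implementation: it assumes the 1st step picks a rule (in order or at random) and the 2nd step replaces each `#` either at random or by stepping through the 0/1 combinations in counting order. The function names and the `rule_order` / `substitution` parameters are hypothetical.

```python
import itertools
import random
from typing import Iterator, List, Tuple

Rule = Tuple[str, int]  # (condition over {0, 1, #}, class)


def expand_sequential(condition: str) -> Iterator[str]:
    """Yield every instance a condition covers, filling # in counting order."""
    slots = [i for i, c in enumerate(condition) if c == "#"]
    for bits in itertools.product("01", repeat=len(slots)):
        chars = list(condition)
        for i, b in zip(slots, bits):
            chars[i] = b
        yield "".join(chars)


def expand_random(condition: str, rng: random.Random) -> str:
    """Replace each # with a random bit."""
    return "".join(rng.choice("01") if c == "#" else c for c in condition)


def sample(rules: List[Rule], size: int, rule_order: str,
           substitution: str, seed: int = 0) -> List[Tuple[str, int]]:
    """Assumed mapping: SRS=(sequential, random), SIS=(sequential, sequential),
    RRS=(random, random), RIS=(random, sequential)."""
    rng = random.Random(seed)
    # One endless sequential cursor per rule, used by sequential substitution
    cursors = [itertools.cycle(expand_sequential(c)) for c, _ in rules]
    data = []
    for k in range(size):
        idx = k % len(rules) if rule_order == "sequential" else rng.randrange(len(rules))
        condition, label = rules[idx]
        if substitution == "sequential":
            instance = next(cursors[idx])
        else:
            instance = expand_random(condition, rng)
        data.append((instance, label))
    return data


toy_rules = [("0#", 0), ("1#", 1)]  # hypothetical two-rule problem
print(sample(toy_rules, 4, "sequential", "sequential"))
# [('00', 0), ('10', 1), ('01', 0), ('11', 1)]
```

Under this reading, a rule set without `#` gives identical data sets for SRS and SIS, and for RRS and RIS with the same seed, because the substitution step has nothing to replace.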


2 Problems to learn and Learning Algorithm

Problems: Mux6, Mux11, Parity5, Position5, Position11, Parity5-3
Learning Algorithm: XCS

Example rules for four of the problems (condition, then class):

Mux6
0 0 # # # 0    0
0 0 # # # 1    1
0 1 # # 0 #    0
0 1 # # 1 #    1
1 1 0 # # #    0
1 1 1 # # #    1

Parity5
0 0 0 0 0    0
0 0 0 0 1    1
0 0 0 1 0    1
0 0 0 1 1    0
1 1 1 1 0    0
1 1 1 1 1    1

Position5
0 0 0 0 0    0
0 0 0 0 1    1
0 0 0 1 #    2
0 0 1 # #    3
1 # # # #    5

Parity5-3
0 0 0 0 0 # # #    0
0 0 0 0 1 # # #    1
0 0 0 1 0 # # #    1
0 0 0 1 1 # # #    0
1 1 1 1 0 # # #    0
1 1 1 1 1 # # #    1
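The Parity and Position rows above follow a regular pattern, so their rule sets can be generated rather than typed in. The sketch below is an assumption drawn from the visible rows only: `parity_rules` and `position_rules` are hypothetical helper names, and rows that do not appear in the slide text (for example the Position5 rule with class 4) are filled in by extending the pattern.

```python
from itertools import product
from typing import List, Tuple

Rule = Tuple[str, int]  # (condition over {0, 1, #}, class)


def parity_rules(n_relevant: int, n_irrelevant: int = 0) -> List[Rule]:
    """One rule per setting of the relevant bits; class = parity of the ones.
    Irrelevant trailing bits are written as # (as in the Parity5-3 rows)."""
    rules = []
    for bits in product("01", repeat=n_relevant):
        condition = "".join(bits) + "#" * n_irrelevant
        rules.append((condition, bits.count("1") % 2))
    return rules


def position_rules(n: int) -> List[Rule]:
    """Class = position of the leftmost 1 counted from the right, 0 if none."""
    rules = [("0" * n, 0)]
    for i in range(n - 1, -1, -1):
        rules.append(("0" * i + "1" + "#" * (n - i - 1), n - i))
    return rules


for condition, label in position_rules(5):
    print(condition, label)
# 00000 0
# 00001 1
# 0001# 2
# 001## 3
# 01### 4
# 1#### 5

print(len(parity_rules(5)), len(parity_rules(5, 3)))  # 32 32
```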

2 Problem Properties

Optimal Rule Sets
– Complete
– Non overlapped
– Irreducible

Why?
– Simple structure of knowledge complexity
– Well-known artificial problems
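Two of the three properties can be checked by brute force on small problems. The sketch below assumes that "complete" means every binary instance is matched by at least one rule and "non overlapped" means no instance is matched by more than one; irreducibility is not checked. The function names and the `POSITION3` rule set (a 3-bit analogue of the Position5 rows) are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Rule = Tuple[str, int]  # (condition over {0, 1, #}, class)


def matches(condition: str, instance: str) -> bool:
    """True when every non-# position of the condition equals the instance bit."""
    return all(c == "#" or c == x for c, x in zip(condition, instance))


def coverage_counts(rule_set: List[Rule], n_bits: int) -> Dict[str, int]:
    """Map each n_bits-long binary instance to the number of rules matching it."""
    counts = {}
    for bits in product("01", repeat=n_bits):
        instance = "".join(bits)
        counts[instance] = sum(matches(c, instance) for c, _ in rule_set)
    return counts


def is_complete(rule_set: List[Rule], n_bits: int) -> bool:
    """Assumed definition: every instance is covered by at least one rule."""
    return all(n >= 1 for n in coverage_counts(rule_set, n_bits).values())


def is_non_overlapped(rule_set: List[Rule], n_bits: int) -> bool:
    """Assumed definition: no instance is covered by more than one rule."""
    return all(n <= 1 for n in coverage_counts(rule_set, n_bits).values())


POSITION3 = [("000", 0), ("001", 1), ("01#", 2), ("1##", 3)]
print(is_complete(POSITION3, 3), is_non_overlapped(POSITION3, 3))  # True True
```

The enumeration grows as 2^n, so this check is only practical for the small artificial problems discussed here.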


3 Sampling and Learning Iteration

[Diagram: the evaluation process with {Sampling Iteration} and {Training Iteration} loops. Labels: Problem, Data Set, Sampling Size, Sampling Method, Algorithm Parameters, DI Generation, Learning Algorithm, Knowledge, Knowledge Comparison. Output plots: Learning Accuracy and Optimal Population, x axis 0 to 10000 (example run DS1kMulplx6m1).]

3 Output Results and Iteration Reduction

Output Results
– 2 Plots for every Problem, Sampling Method, Sampling Size and Algorithm Parameters
• Optimal Population
• Accuracy

Iteration Reduction
– SIS Pure sequential
• No Sampling Iteration Needed
– Problems without “don’t care”
• SRS=SIS and RRS=RIS

[Plot: example run DS1kMulplx6m1, x axis 0 to 10000.]

3 Experimental Parameters

Number of Problems = 6

Number of Sampling Methods = 4

Number of different Sampling Sizes = 4

Number of different Algorithms Parameters Sets = 2

Number of Sampling Iterations = 10

Number of Training Iterations = 10

Number of Data Sets Generated = 744

Number of Training Process = 14880
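The two totals can be reproduced from the other parameters under one assumption that the slides do not state outright: SIS, being purely sequential, needs a single data set per problem and sampling size, while the other three methods need one per sampling iteration. The short check below is a sketch of that reading; the variable and function names are illustrative.

```python
from itertools import product

PROBLEMS = ["Mux6", "Mux11", "Parity5", "Position5", "Position11", "Parity5-3"]
METHODS = ["SRS", "SIS", "RRS", "RIS"]
N_SIZES = 4              # four sampling sizes
N_PARAM_SETS = 2         # two algorithm parameter sets
SAMPLING_ITERATIONS = 10
TRAINING_ITERATIONS = 10


def data_sets_per_cell(method: str) -> int:
    # Assumption: SIS is deterministic, so it needs no repeated sampling
    return 1 if method == "SIS" else SAMPLING_ITERATIONS


data_sets = sum(
    data_sets_per_cell(method)
    for _problem, method, _size in product(PROBLEMS, METHODS, range(N_SIZES))
)
training_runs = data_sets * N_PARAM_SETS * TRAINING_ITERATIONS

print(data_sets)      # 744   = 6 * 4 * (10 + 1 + 10 + 10)
print(training_runs)  # 14880 = 744 * 2 * 10
```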

Problem Dimension

Sampling M. = RIS, Sampling Size = 1000, Learning Alg. Param. = pDNC 0.2

[Plots: Mux6 (DS1kMulplx6m4) and Parity5 (DS1kParity5m4), two plots each, x axis 0 to 10000.]

Sampling Method Dimension

Problem = Position5, Sampling Size = 1000, Learning Alg. Param. = pDNC 0.2

[Plots: SRS Sequential Rule Selection (DS1kPosition5m1) and RIS Random Instance Selection (DS1kPosition5m4), two plots each, x axis 0 to 10000.]

Sampling Size Dimension

Problem = Parity5, Sampling M. = RIS, Learning Alg. Param. = pDNC 0.2

[Plots: Sampling Size 100 (DS100Parity5m4) and Sampling Size 10000 (DS10kParity5m4), two plots each, x axis 0 to 10000.]

Parameter Algorithm Dimension

Problem = Mux6, Sampling M. = RIS, Sampling Size = 1000

[Plots: pDNC 0.8 and pDNC 0.2 (both DS1kMulplx6m4), two plots each, x axis 0 to 10000.]

Conclusions and Further Work

Conclusions
– Automatic Learning Algorithm Analyzer based on Artificial Data Sets
– Four dimensions comparisons
– Methodology Implementation, Experiment and Results Analysis

Further Work
– Non ORS Problems
– Real Attributes
– Sampling Methods based on distance or transition matrix
– Multi Step Problems
– Different Learning Algorithms
– Different Knowledge representations
– Knowledge Covering Metrics
– Applying Data Set Complexity Metrics Suite

GRSI (Grup de Recerca en Sistemes Intel·ligents)
• http://www.salle.url.edu/GRSI
– Oriented to:
• CBR (Computer Based Reasoning) Algorithms
• Evolutive Computation Algorithms
• Data Mining Technology Transfer


Date post:	24-Jan-2015
Category:	Education
Upload:	albert-orriols-puig