Presentation by Xavier Llorà, Kumara Sastry, & David E. Goldberg showing how linkage learning makes problems tractable in Pittsburgh-style learning classifier systems
Linkage Learning for Pittsburgh LCS: Making Problems Tractable — Xavier Llorà, Kumara Sastry, & David E. Goldberg, Illinois Genetic Algorithms Lab, University of Illinois at Urbana-Champaign, {xllora,kumara,deg}@illigal.ge.uiuc.edu
Transcript
1. Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Xavier Llorà, Kumara Sastry, & David E. Goldberg
Illinois Genetic Algorithms Lab, University of Illinois at Urbana-Champaign
{xllora,kumara,deg}@illigal.ge.uiuc.edu
2. Motivation and Early Work
Can we apply Wilson's ideas for evolving rule sets formed only by maximally accurate and general rules in Pittsburgh LCS?
Previous multi-objective approaches:
- Bottom up (Bernadó, 2002): panmictic populations; multimodal optimization (sharing/crowding for niche formation)
- Top down (Llorà, Goldberg, Traus, & Bernadó, 2003): explicitly address accuracy and generality; use it to push and produce compact rule sets
The compact classifier system (CCS) is rooted in the bottom-up approach.
NIGEL 2006, Llorà, X., Sastry, K., and Goldberg, D.
3. Maximally Accurate and General Rules
Accuracy and generality can be computed as

  α(r) = (n_t+(r) + n_t#(r)) / n_t
  ε(r) = n_t+(r) / n_m

Fitness should combine accuracy and generality:

  f(r) = α(r) · ε(r)^γ

Such a measure can be applied either to rules or to rule sets.
The CCS uses this fitness and a compact genetic algorithm (cGA) to evolve such rules.
One cGA run provides one rule; multiple rules are required to form a rule set.
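The fitness above can be sketched in code. This is a minimal illustration, not the authors' implementation; the '#'-as-wildcard matching convention and the assumption that each rule predicts class 1 are mine:

```python
# Sketch of the CCS rule fitness f(r) = alpha(r) * eps(r)**gamma
# for a ternary-condition rule over binary examples.

def matches(condition, example):
    """A '#' matches any bit; '0'/'1' must match exactly."""
    return all(c == '#' or c == b for c, b in zip(condition, example))

def fitness(condition, examples, gamma=1.0):
    """examples: list of (bit-string, label) pairs, label in {0, 1}.
    Assumes the rule predicts class 1 for matched examples."""
    n_t = len(examples)
    n_m = sum(1 for x, _ in examples if matches(condition, x))
    # n_t+ : matched positives (correctly classified)
    n_pos = sum(1 for x, y in examples if matches(condition, x) and y == 1)
    # n_t# : unmatched negatives (correctly left alone)
    n_neg = sum(1 for x, y in examples if not matches(condition, x) and y == 0)
    alpha = (n_pos + n_neg) / n_t          # accuracy over all examples
    eps = n_pos / n_m if n_m else 0.0      # accuracy over matched examples
    return alpha * eps ** gamma
```

For the 3-input multiplexer, the maximally general rule "01#" scores higher than the over-general "###", which matches everything but is wrong half the time.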
4. The cGA Can Make It
Rules may be obtained by optimizing f(r) = α(r) · ε(r)^γ
The basic cGA scheme:
1. Initialization: p⁰(x_i) = 0.5
2. Model sampling (two individuals are generated)
3. Evaluation (f(r))
4. Selection (tournament selection)
5. Probabilistic model update
6. Repeat steps 2-5 until the termination criteria are met
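The six steps above can be sketched as follows. This is a generic cGA, not the authors' code; the step size 1/pop_size and the clamping are standard choices, and the `perturb` parameter anticipates the model perturbation of the next slide:

```python
import random

def cga(fitness, n_bits, steps=10000, perturb=0.0, pop_size=50, seed=0):
    """Compact GA sketch: the population is a probability vector p,
    where p[i] = P(gene i == 1)."""
    rng = random.Random(seed)
    # Step 1: initialization (optionally perturbed around 0.5)
    p = [min(0.9, max(0.1, 0.5 + rng.uniform(-perturb, perturb)))
         for _ in range(n_bits)]
    for _ in range(steps):
        # Step 2: model sampling -- two individuals are generated
        a = [1 if rng.random() < pi else 0 for pi in p]
        b = [1 if rng.random() < pi else 0 for pi in p]
        # Steps 3-4: evaluation and binary tournament selection
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Step 5: update the model toward the winner by 1/pop_size
        for i in range(n_bits):
            if winner[i] != loser[i]:
                p[i] += (1.0 / pop_size) if winner[i] == 1 else -(1.0 / pop_size)
                p[i] = min(1.0, max(0.0, p[i]))
    return p

# usage: maximize the number of ones; p converges toward all-1s
p = cga(lambda x: sum(x), n_bits=8)
```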
5. cGA Model Perturbation
Facilitate the evolution of different rules.
Explore the frequency of appearance of each optimal rule.
Initial model perturbation: p⁰(x_i) = 0.5 + U(−0.4, 0.4)
Experiments using the 3-input multiplexer, 1,000 independent runs.
Visualize the pair-wise relations of the genes.
6. But One Rule Is Not Enough
Model perturbation in the cGA evolves different rules.
The goal: evolve a population of rules that solve the problem together.
The fitness measure f(r) can also be applied to rule sets.
Two mechanisms:
- Spawn populations until the solution is met
- Fuse populations when they represent the same rule
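The fusing mechanism can be illustrated with a simple heuristic. The closeness test and the averaging rule below are my assumptions for illustration; the slide does not specify the exact criterion:

```python
def same_rule(p, q, tol=0.1):
    """Heuristic test (an assumption, not the paper's exact criterion):
    two cGA populations represent the same rule if their probability
    vectors are close component-wise."""
    return all(abs(a - b) <= tol for a, b in zip(p, q))

def fuse(p, q):
    """Fuse two populations representing the same rule by averaging
    their probability vectors."""
    return [(a + b) / 2 for a, b in zip(p, q)]
```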
7. Spawning and Fusing Populations
8. Experiments & Scalability
Analysis using multiplexer problems (3-, 6-, and 11-input).
The number of rules in [O] grows exponentially: as 2^i, where i is the number of inputs.
Assume equal probability of hitting a rule (binomial model): the number of runs needed to obtain all the rules in [O] also grows exponentially.
cGA success rate as a function of the problem size: 3-input: 97%; 6-input: 73.93%; 11-input: 43.03%
Scalability over 10,000 independent runs.
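Under the slide's equal-probability (binomial) assumption, the expected number of independent runs needed to recover all k rules in [O] is the standard coupon-collector quantity k·H_k; since k grows exponentially with problem size, so does the required number of runs. A quick computation (the formula is standard; its use here just follows the slide's assumption):

```python
def expected_runs(k):
    """Coupon-collector expectation: with k equally likely rules,
    E[runs until all k have been seen] = k * (1 + 1/2 + ... + 1/k)."""
    return k * sum(1.0 / i for i in range(1, k + 1))

# expected run counts for a few rule-set sizes
for k in (4, 8, 16):
    print(k, round(expected_runs(k), 2))
```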
9. Scalability of CCS
10. So?
Open questions:
- Multiple runs are not an option.
- Could the poor cGA scalability be the result of the existence of linkage?
The χ-ary extended compact classifier system (eCCS) needs to provide answers to:
- Perform linkage learning to improve the scalability of the rule learning process.
- Evolve [O] in a single run (rule niching?).
The eCCS answer:
- Use the extended compact genetic algorithm (Harik, 1999)
- Rule niching via restricted tournament replacement (Harik, 1995)
11. Extended Compact Genetic Algorithm
A probabilistic model-building GA (Harik, 1999) that builds models of good solutions as linkage groups.
Key idea: a good probability distribution means linkage learning.
Key components:
- Representation: marginal product model (MPM), the marginal distributions of a gene partition
- Quality: minimum description length (MDL); Occam's razor principle: all things being equal, simpler models are better
- Search method: greedy heuristic search
12. Marginal Product Model (MPM)
Partition the variables into clusters: the model is the product of the marginal distributions over a partition of the genes.
The gene partition maps to linkage groups.
Example MPM over genes x1 ... xl: [1, 2, 3], [4, 5, 6], ..., [l-2, l-1, l]
Each three-gene group stores the marginals {p000, p001, p010, p011, p100, p101, p110, p111}.
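Sampling from an MPM is just drawing each linkage group's configuration from its marginal table independently. A minimal sketch (the data layout — tuples of gene indices plus per-group probability dicts — is my choice):

```python
import random

def sample_mpm(partitions, tables, rng):
    """Sample one individual from a marginal product model.
    partitions: list of gene-index tuples, e.g. [(0, 1, 2), (3, 4, 5)]
    tables: for each partition, a dict mapping a bit-tuple to its
    marginal probability (each dict's probabilities sum to 1)."""
    genes = {}
    for part, table in zip(partitions, tables):
        configs, probs = zip(*table.items())
        # draw the whole group's configuration at once
        choice = rng.choices(configs, weights=probs)[0]
        for idx, bit in zip(part, choice):
            genes[idx] = bit
    return [genes[i] for i in sorted(genes)]
```

Because a group is sampled as a unit, correlations inside a linkage group are preserved while different groups stay independent.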
13. Minimum Description Length Metric
Hypothesis: for an optimal model, model size and error are minimum.
Model complexity, Cm: the number of bits required to store all marginal probabilities.
Compressed population complexity, Cp: the entropy of the marginal distributions over all partitions.
MDL metric: Cc = Cm + Cp
14. Building an Optimal MPM
1. Assume independent genes: [1], [2], ..., [l]
2. Compute the MDL metric, Cc
3. Form all combinations of two-subset merges, e.g. {([1,2], [3], ..., [l]), ([1,3], [2], ..., [l]), ..., ([1], [2], ..., [l-1, l])}
4. Compute the MDL metric for all model candidates
5. Select the candidate with the minimum MDL
6. If its Cc improves on the current model's, accept it and go to step 2; else the current model is optimal
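The metric and the greedy search can be sketched together. This is one common form of the eCGA metric for binary genes (log2(n+1) bits per stored frequency); treat it as an illustration under those assumptions, not the paper's exact code:

```python
import math
from collections import Counter

def mdl(model, pop):
    """Cc = Cm + Cp for a marginal product model over binary genes.
    model: list of tuples of gene indices; pop: list of bit-lists."""
    n = len(pop)
    # Cm: log2(n+1) bits for each of the (2^|group| - 1) frequencies
    cm = math.log2(n + 1) * sum(2 ** len(g) - 1 for g in model)
    # Cp: population entropy under the model's marginals
    cp = 0.0
    for group in model:
        counts = Counter(tuple(ind[i] for i in group) for ind in pop)
        cp += -sum(c * math.log2(c / n) for c in counts.values())
    return cm + cp

def build_mpm(pop, n_genes):
    """Greedy search: start with all genes independent; at each step try
    every pairwise merge of groups and keep the one that most reduces
    Cc; stop when no merge improves it."""
    model = [(i,) for i in range(n_genes)]
    best = mdl(model, pop)
    while True:
        candidate, cand_score = None, best
        for i in range(len(model)):
            for j in range(i + 1, len(model)):
                trial = [g for k, g in enumerate(model) if k not in (i, j)]
                trial.append(model[i] + model[j])
                score = mdl(trial, pop)
                if score < cand_score:
                    candidate, cand_score = trial, score
        if candidate is None:
            return model
        model, best = candidate, cand_score
```

On a population where two genes are perfectly correlated and a third is independent, the search merges exactly the correlated pair.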
15. Extended Compact Genetic Algorithm
1. Initialize the population (usually random initialization)
2. Evaluate the fitness of the individuals
3. Select promising solutions (e.g., tournament selection)
4. Build the probabilistic model: optimize structure & parameters to best fit the selected individuals (automatic identification of sub-structures)
5. Sample the model to create new candidate solutions (effective exchange of building blocks)
6. Repeat steps 2-5 until some convergence criteria are met
16. Models Built by eCGA
Use the model-building procedure of the extended compact GA: partition genes into (mutually) independent groups, start with the lowest-complexity model, and search for a least-complex, most-accurate model.

Model structure                                                    Metric
[X0] [X1] [X2] [X3] [X4] [X5] [X6] [X7] [X8] [X9] [X10] [X11]     1.0000
[X0] [X1] [X2] [X3] [X4 X5] [X6] [X7] [X8] [X9] [X10] [X11]       0.9933
[X0] [X1] [X2] [X3] [X4 X5 X7] [X6] [X8] [X9] [X10] [X11]         0.9819
[X0] [X1] [X2] [X3] [X4 X5 X6 X7] [X8] [X9] [X10] [X11]           0.9644
...
[X0] [X1] [X2] [X3] [X4 X5 X6 X7] [X8 X9 X10 X11]                 0.9273
...
[X0 X1 X2 X3] [X4 X5 X6 X7] [X8 X9 X10 X11]                       0.8895
17. Modifying eCGA for Rule Learning
Rules are described using χ-ary alphabets: {0, 1, #}.
eCCS uses a χ-ary version of eCGA (Sastry and Goldberg, 2003; de la Ossa, Sastry, and Lobo, 2006).
Maximally general and maximally accurate rules may be obtained using f(r) = α(r) · ε(r)^γ
Maintaining multiple rules in a run requires niching: we need an efficient niching method that does not adversely affect the quality of the probabilistic models.
Restricted tournament replacement (Harik, 1995)
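Restricted tournament replacement can be sketched as follows. This is the standard RTR idea, with Hamming distance as the similarity measure (a common choice, assumed here rather than taken from the slide):

```python
import random

def rtr(population, fitness, offspring, window, rng):
    """Restricted tournament replacement sketch: each new individual
    competes against the most similar of `window` randomly chosen
    members, and replaces it only if fitter. Niches survive because
    candidates can only displace near-identical solutions."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    for child in offspring:
        idxs = rng.sample(range(len(population)), window)
        nearest = min(idxs, key=lambda i: hamming(population[i], child))
        if fitness(child) > fitness(population[nearest]):
            population[nearest] = child
    return population
```

A weak offspring near an existing niche cannot overwrite a distant, different rule, which is exactly the property eCCS needs to keep multiple rules alive in one run.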
18. Experiments
Goals:
1. Is linkage learning useful to solve the multiplexer problem using Pittsburgh LCS?
2. How far can we push it?
Multiplexer problems: the address bits determine which input to use; there is an underlying structure, isn't there?
The largest solved using Pittsburgh approaches (11-input) matched all the examples, with no linkage learning available.
We borrowed the population-sizing theory for eCGA.
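For reference, the multiplexer function itself: with k address bits and 2^k data bits, the address selects which data bit is the output, giving the input lengths 3, 6, 11, 20, 37, 70 seen throughout the deck. A minimal sketch:

```python
def multiplexer(bits):
    """k-address-bit multiplexer: the first k bits address one of the
    2**k remaining data bits; the output is that data bit.
    Valid input lengths are k + 2**k: 3, 6, 11, 20, 37, 70, ..."""
    k = 0
    while k + 2 ** k < len(bits):
        k += 1
    assert k + 2 ** k == len(bits), "invalid multiplexer length"
    address = int("".join(map(str, bits[:k])), 2)
    return bits[k + address]
```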
19. eCCS Models for Different Multiplexers
Building block size increases.
20. eCCS Scalability
Follows the facet-wise theory:
1. Grows exponentially with the number of address bits (building block size)
2. Grows quadratically with the problem size
21. Conclusions
The eCCS builds on competent GAs; the facet-wise models from GA theory hold.
The eCCS is able to:
1. Perform linkage learning to improve the scalability of the rule learning process.
2. Evolve [O] in a single run.
The eCCS shows the need for linkage learning in Pittsburgh LCS to effectively solve multiplexer problems.
eCCS solved the 20-input, 37-input, and 70-input multiplexer problems for the first time using Pittsburgh LCS.
22. Linkage Learning for Pittsburgh LCS: Making Problems Tractable
Xavier Llorà, Kumara Sastry, & David E. Goldberg
Illinois Genetic Algorithms Lab, University of Illinois at Urbana-Champaign
{xllora,kumara,deg}@illigal.ge.uiuc.edu