Machine learning and black-box expensive optimization

Sébastien Verel
Laboratoire d'Informatique, Signal et Image de la Côte d'Opale (LISIC), Université du Littoral Côte d'Opale, Calais, France
http://www-lisic.univ-littoral.fr/~verel/
June 18th, 2018

Outline: Introduction - Learning for optimization - Optimization for learning
AI: Machine Learning, Optimization, perception, etc.

Learning:
Minimize an error function.
M_θ: model to learn on data.
Search θ* = arg min_θ Error(M_θ, data).
According to the model dimension, variables, error function, etc., there is a huge number of optimization algorithms.

Optimization:
Learn to design an algorithm that finds good solutions.
A_θ: search algorithm for problems (X, f).
Learn A_θ such that A_θ(X, f) = arg min_{x∈X} f(x).
According to the class of algorithms, search spaces, functions, etc., there is a huge number of learning algorithms.

Artificial: from paper to computer!
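The "learning as optimization" view above can be sketched on a toy case: fitting a one-parameter model M_θ by gradient descent on an error function. All names and data here are illustrative, not from the talk.

```python
# Minimal sketch of "learning = optimization": fit M_theta(x) = theta * x
# by minimizing a squared-error function with plain gradient descent.

def error(theta, data):
    # Error(M_theta, data): mean squared error over (x, y) pairs
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def fit(data, lr=0.05, steps=200):
    # Search theta* = argmin_theta Error(M_theta, data)
    theta = 0.0
    for _ in range(steps):
        grad = sum(2 * (theta * x - y) * x for x, y in data) / len(data)
        theta -= lr * grad
    return theta

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # roughly y = 2x
theta_star = fit(data)
print(round(theta_star, 2))
```

Any gradient-based trainer in machine learning is an instance of this loop, with θ high-dimensional and the gradient computed by backpropagation.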
Black-box (expensive) optimization

x → [black box] → f(x)

No information on the definition of the objective function f.

Objective function:
• can be irregular, non-continuous, non-differentiable, etc.
• given by a computation or an (expensive) simulation

A few examples from the team:
• Mobility simulation (Florian Leprêtre)
• Plant biology, plant growth (Amaury Dubois)
• Logistics simulation (Brahim Aboutaib)
• Cellular automata
• Nuclear power plant (Valentin Drouet)
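A black-box optimizer may only query f, never inspect it. As a minimal sketch (illustrative code, not the team's), a (1+1) evolutionary algorithm optimizing a stand-in objective:

```python
import random

# Sketch of black-box optimization: the search loop only calls f(x),
# with no access to its definition.

def onemax(bits):
    # Stand-in for an expensive black-box simulation
    return sum(bits)

def one_plus_one_ea(f, n, evals, seed=0):
    # (1+1) evolutionary algorithm: flip each bit with probability 1/n,
    # keep the mutant if it is at least as good.
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x)
    for _ in range(evals):
        y = [b ^ 1 if rng.random() < 1 / n else b for b in x]
        fy = f(y)
        if fy >= fx:
            x, fx = y, fy
    return x, fx

best, value = one_plus_one_ea(onemax, n=20, evals=2000)
print(value)
```

When each call to f is an expensive simulation, the evaluation budget `evals` is the scarce resource, which motivates the parallel computing and surrogate models discussed next.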
Real-world black-box expensive optimization
PhD theses of Mathieu Muniglia (2014-2017) and Valentin Drouet (2017-2020), CEA, Paris

x → [multi-physics simulator] → f(x)
(73, ..., 8) → [multi-physics simulator] → Δz_P

Expensive optimization: parallel computing and a surrogate model.
Adaptive distributed optimization algorithms
Christopher Jankee, Bilel Derbel, Cyril Fonlupt

Portfolio of algorithms: control of the algorithm during optimization.

How to select an algorithm? Design reinforcement learning methods for distributed computing (ε-greedy, adaptive pursuit, UCB, ...).

How to compute a reward? Use an aggregation function of the local rewards (mean, max, etc.) for a global selection.

Methodology: use designed benchmark functions with designed properties, and experimental analysis.
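The selection question can be sketched with ε-greedy, one of the strategies listed above. The rewards here are synthetic stand-ins for local fitness gains; all names are illustrative.

```python
import random

# Sketch of epsilon-greedy algorithm selection in a portfolio:
# pick an algorithm, observe a reward, update its running average.

def epsilon_greedy(rewards, steps, eps=0.1, seed=1):
    # rewards: one stochastic reward function per algorithm in the portfolio
    rng = random.Random(seed)
    k = len(rewards)
    counts, means = [0] * k, [0.0] * k
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                        # explore
        else:
            a = max(range(k), key=lambda i: means[i])   # exploit
        r = rewards[a](rng)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]          # incremental mean
    return counts, means

# Two hypothetical algorithms; the second yields higher expected reward.
portfolio = [lambda rng: rng.gauss(0.2, 0.05),
             lambda rng: rng.gauss(0.5, 0.05)]
counts, means = epsilon_greedy(portfolio, steps=1000)
print(counts.index(max(counts)))   # index of the most-selected algorithm
```

In the distributed setting, the reward fed to this controller is the chosen aggregation (mean, max, etc.) of the local rewards reported by the nodes.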
Features to learn: multi-objective fitness landscapes
K. Tanaka, H. Aguirre (Univ. Shinshu), A. Liefooghe, B. Derbel (Univ. Lille), 2010-2018

Fitness landscape (X, f, N): search space, objective function, neighborhood relation.
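The triplet (X, f, N) can be made concrete on bit strings with the Hamming-1 neighborhood. An illustrative sketch (enumerating local optima is only feasible for tiny n):

```python
from itertools import product

# A fitness landscape (X, f, N) made concrete:
# X = {0,1}^n, f = a small pseudo-boolean function, N = Hamming-1 neighbors.

def neighbors(x):
    # N(x): all bit strings at Hamming distance 1 from x
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def local_optima(f, n):
    # A point is a local optimum if no neighbor is strictly better
    return [x for x in product((0, 1), repeat=n)
            if all(f(y) <= f(x) for y in neighbors(x))]

f = lambda x: sum(x)        # OneMax: a single optimum at (1, ..., 1)
opts = local_optima(f, 3)
print(opts)
```

Landscape features (number of local optima, objective correlations, etc.) summarize such structure and, as below, can be computed from samples rather than full enumeration.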
[Figure: Objective 1 vs. Objective 2 scatter plots for three cases: conflicting objectives, independent objectives, correlated objectives]
[Figure: heatmap of Kendall's tau correlations between landscape features (hv, hvd, nhv, #sup, #inf, #inc, #lnd, #lsupp, length, rho, f_cor, k_n; averaged and rank-1 variants over rws/aws walks)]
Performance prediction of GSEMO (cross-validation):

feature set        MAE       MSE       R2        rank
all                0.007781  0.000118  0.951609  1
enumeration        0.008411  0.000142  0.943046  2
sampling all       0.009113  0.000161  0.932975  3
sampling rws       0.009284  0.000167  0.930728  4
sampling aws       0.010241  0.000195  0.917563  5
{r, m, n, k/n}     0.010609  0.000215  0.911350  6
{r, m, n}          0.026974  0.001123  0.518505  7
{m, n}             0.032150  0.001545  0.340715  8
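The table's error columns follow the standard definitions; a sketch on toy numbers (not the study's data):

```python
# Standard regression-quality metrics used in the table above.

def mae(y, yhat):
    # mean absolute error
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mse(y, yhat):
    # mean squared error
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def r2(y, yhat):
    # coefficient of determination: 1 - residual / total sum of squares
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

y    = [0.10, 0.20, 0.30, 0.40]   # toy true performances
yhat = [0.12, 0.19, 0.33, 0.38]   # toy predictions
print(round(mae(y, yhat), 3), round(r2(y, yhat), 3))
```

Lower MAE/MSE and higher R2 are better, which is what ranks the feature sets in the table.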
Learning/tuning parameters according to features
Surrogate models for pseudo-boolean functions

Goal: replace/learn the (expensive) objective function with a (cheap) meta-model during the optimization process.

Continuous optimization: neural networks, Gaussian processes (kriging); EGO: sample the next solution with maximum expected improvement.

GP: random variables that have a joint Gaussian distribution.
mean: m(y(x)) = μ
covariance: cov(y(x), y(x')) = exp(−θ d(x, x')^p)

From: Rasmussen & Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
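A minimal sketch of kriging prediction with the covariance above, assuming μ = 0 (simple kriging), two observed points, and closed-form inversion of the 2×2 covariance matrix. All values are illustrative.

```python
import math

# Simple kriging with cov(y(x), y(x')) = exp(-theta * d(x, x')**p),
# assuming a zero prior mean (mu = 0).

def k(x, xp, theta=1.0, p=2):
    return math.exp(-theta * abs(x - xp) ** p)

xs, ys = [0.0, 1.0], [0.0, 1.0]   # two observed (x, f(x)) points
xq = 0.5                           # query point

# 2x2 covariance matrix K of the observations and its closed-form inverse
a, b = k(xs[0], xs[0]), k(xs[0], xs[1])
det = a * a - b * b
Kinv = [[a / det, -b / det], [-b / det, a / det]]

# Posterior mean at xq: k_q^T K^{-1} y
kq = [k(xq, xs[0]), k(xq, xs[1])]
alpha = [sum(Kinv[i][j] * ys[j] for j in range(2)) for i in range(2)]
mean = sum(kq[i] * alpha[i] for i in range(2))
print(round(mean, 3))
```

EGO wraps this predictor (mean plus posterior variance) in an expected-improvement criterion to choose the next expensive evaluation.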
Proposition
Walsh function basis: ∀x ∈ {0,1}^n, φ_k(x) = (−1)^{Σ_{j=0}^{n−1} k_j x_j}

f(x) = Σ_{k=0}^{2^n−1} w_k φ_k(x), with w_k = (1/2^n) Σ_{x∈{0,1}^n} f(x) φ_k(x)

Surrogate model: f̂(x) = Σ_{k : o(φ_k) ≤ d} w_k φ_k(x)
Estimate the coefficients with LARS.
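The averaging formula for w_k can be checked directly on a tiny function (illustrative sketch; in practice the coefficients are estimated from a sample with LARS rather than by full enumeration):

```python
from itertools import product

# Walsh decomposition of a small pseudo-boolean function: compute every
# coefficient w_k by the averaging formula, then check that the
# expansion reconstructs f exactly.

def phi(k, x):
    # Walsh basis function: (-1) ** sum_j k_j * x_j, with k, x bit tuples
    return -1 if sum(kj * xj for kj, xj in zip(k, x)) % 2 else 1

def walsh_coeffs(f, n):
    # w_k = (1 / 2^n) * sum_x f(x) * phi_k(x)
    X = list(product((0, 1), repeat=n))
    return {k: sum(f(x) * phi(k, x) for x in X) / 2 ** n for k in X}

n = 3
f = lambda x: sum(x) + x[0] * x[1]   # small pseudo-boolean function
w = walsh_coeffs(f, n)

# Reconstruction: f(x) = sum_k w_k * phi_k(x)
x = (1, 0, 1)
approx = sum(wk * phi(k, x) for k, wk in w.items())
print(approx, f(x))
```

The order o(φ_k) is the number of nonzero bits of k; the surrogate keeps only the terms with o(φ_k) ≤ d, which bounds the number of coefficients to estimate.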
[Figure: mean absolute error of the fitness model vs. sample size (0-500), comparing the kriging and Walsh surrogate methods]
Energy surface of deep learning problems

To learn a deep NN: high-dimensional space; minimize the error with variants of stochastic gradient descent.

Why does it work? Depth improves expressiveness but complicates optimization.
What is the shape of the energy surface?

A. Choromanska, M. Henaff, M. Mathieu, G. Ben Arous, and Y. LeCun. "The loss surfaces of multilayer networks." In Artificial Intelligence and Statistics, pp. 192-204, 2015.
P. Chaudhari, A. Choromanska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, and R. Zecchina. "Entropy-SGD: Biasing gradient descent into wide valleys." ICLR, 2017.
S. Arora, N. Cohen, and E. Hazan. "On the optimization of deep networks: Implicit acceleration by overparameterization." arXiv preprint arXiv:1802.06509, 2018.

Perspective
Study the geometry of the fitness landscape...
Any ideas and collaborations are welcome!