Page 1

CHAPTER 6: STOCHASTIC APPROXIMATION AND THE FINITE-DIFFERENCE METHOD

• Organization of chapter in ISSO
  – Contrast of gradient-based and gradient-free algorithms
  – Motivating examples
  – Finite-difference algorithm
  – Convergence theory
  – Asymptotic normality
  – Selection of gain sequences
  – Numerical examples
  – Extensions and segue to SPSA in Chapter 7

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

Page 2

Motivation for Algorithms Not Requiring Gradient of Loss Function

• Primary interest here is in optimization problems for which we cannot obtain direct measurements of the gradient g(θ) = ∂L/∂θ
  – Cannot use techniques such as Robbins-Monro SA, steepest descent, etc.
  – Can (in principle) use techniques such as Kiefer-Wolfowitz SA (Chapter 6), genetic algorithms (Chapters 9–10), …

• Many such “gradient-free” problems arise in practice

– Generic difficult parameter estimation

– Model-free feedback control

– Simulation-based optimization

– Experimental design: sensor configuration

Page 3

Model-Free Control Setup (Example 6.2 in ISSO)

Page 4

Finite-Difference SA (FDSA) Method

• FDSA has standard "first-order" form of root-finding (Robbins-Monro) SA
  – Finite-difference approximation replaces direct gradient measurement (Chap. 5)
  – Resulting algorithm sometimes called Kiefer-Wolfowitz SA

• Let $\hat{g}_k(\hat{\theta}_k)$ denote the FD estimate of $g(\theta)$ at the kth iteration (next slide)
• Let $\hat{\theta}_k$ denote the estimate for $\theta$ at the kth iteration
• FDSA algorithm has the form

  $$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \hat{g}_k(\hat{\theta}_k)$$

  where $a_k$ is a nonnegative gain value
• Under appropriate conditions, $\hat{\theta}_k \to \theta^\star$ in a stochastic sense (a.s.)
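As a concrete illustration, a minimal Python sketch of this recursion follows; the function name fdsa, the grad_estimate argument, and the default gain constants are assumptions for illustration, not prescriptions from ISSO.

```python
import numpy as np

def fdsa(grad_estimate, y, theta0, a=0.1, c=0.1, A=0,
         alpha=1.0, gamma=1/6, n_iter=1000):
    """Run the FDSA recursion theta_{k+1} = theta_k - a_k * g_hat_k(theta_k).

    grad_estimate(y, theta, c_k) returns an FD gradient estimate from noisy
    loss measurements y(.); a matching sketch follows the next slide.
    """
    theta = np.asarray(theta0, dtype=float)
    for k in range(n_iter):
        a_k = a / (k + 1 + A) ** alpha   # nonnegative gain a_k (Sect. 6.5 form)
        c_k = c / (k + 1) ** gamma       # finite-difference interval c_k
        theta = theta - a_k * grad_estimate(y, theta, c_k)
    return theta
```

The default decay rates alpha = 1 and gamma = 1/6 are one choice, matching the asymptotically optimal rates cited later in these slides.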

Page 5

Finite-Difference Gradient Approximation

• Classical method for approximating gradients in Kiefer-Wolfowitz SA is by finite differences
• FD gradient approximation used in SA recursion as gradient measurement (previous slide)
• Standard two-sided gradient approximation at iteration k is

  $$\hat{g}_k(\hat{\theta}_k) = \begin{bmatrix} \dfrac{y(\hat{\theta}_k + c_k \xi_1) - y(\hat{\theta}_k - c_k \xi_1)}{2c_k} \\ \vdots \\ \dfrac{y(\hat{\theta}_k + c_k \xi_p) - y(\hat{\theta}_k - c_k \xi_p)}{2c_k} \end{bmatrix}$$

  where $\xi_j$ is p-dimensional with 1 in the jth entry, 0 elsewhere
• Each computation of the FD approximation takes 2p measurements of y(·)
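A matching sketch of the two-sided approximation, continuing the fdsa sketch above (y is any callable returning a noisy loss measurement):

```python
import numpy as np

def fd_gradient(y, theta, c_k):
    """Two-sided FD gradient estimate; costs 2p measurements of y per call."""
    p = len(theta)
    g_hat = np.empty(p)
    for j in range(p):
        xi = np.zeros(p)
        xi[j] = 1.0  # jth standard basis vector
        g_hat[j] = (y(theta + c_k * xi) - y(theta - c_k * xi)) / (2.0 * c_k)
    return g_hat
```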

Page 6

Shaded Triangle Shows Valid Coefficient Values α and γ in Gain Sequences $a_k = a/(k+1+A)^\alpha$ and $c_k = c/(k+1)^\gamma$ (Sect. 6.5 of ISSO)

[Figure: shaded triangle of valid (α, γ) pairs. Solid line indicates non-strict border (≥ or ≤) and dashed line indicates strict border (>).]
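For concreteness, a small snippet evaluating gain sequences of this form; the constants a, A, c below are arbitrary placeholders, and the decay rates are the asymptotically optimal pair cited on the next slide:

```python
import numpy as np

a, A, alpha = 0.5, 10, 1.0   # placeholder gain constants and decay rate
c, gamma = 1.0, 1/6          # placeholder FD-interval constants
k = np.arange(1000)
a_k = a / (k + 1 + A) ** alpha   # gain sequence a_k
c_k = c / (k + 1) ** gamma       # difference-interval sequence c_k
```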

Page 7

Example: Wastewater Treatment Problem (Example 6.5 in ISSO)

• Small-scale problem with p = 2
  – Aim is to optimize water cleanliness and methane gas byproduct
  – Evaluated algorithms with 50 realizations of N = 2000 measurements
• Used FDSA with gains $a_k = a/(1+k)$ and $c_k = 1/(1+k)^{1/6}$
  – Asymptotically optimal decay rates found "best"
• Gain tuning chooses a; naïve gain sets a = 1
• Also compared with random search algorithm B from Chapter 2
• Algorithms use noisy loss measurements (same level as in Example 2.7 in ISSO)
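In the sketches above, this example's gain choices correspond to alpha = 1 and gamma = 1/6. The quadratic stand-in loss and the value a = 0.5 below are illustrative assumptions only; the actual wastewater model is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_loss(theta):
    # Stand-in p = 2 loss with additive measurement noise (illustrative only)
    return (theta[0] - 1.0) ** 2 + (theta[1] + 0.5) ** 2 + 0.1 * rng.normal()

# 500 iterations at 2p = 4 measurements each gives N = 2000 measurements
theta_hat = fdsa(fd_gradient, noisy_loss, theta0=[0.0, 0.0],
                 a=0.5, c=1.0, alpha=1.0, gamma=1/6, n_iter=500)
```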

Page 8

Mean Values of $L(\hat{\theta}_k)$ with 95% Confidence Intervals

                        FDSA with "naïve" gains    FDSA with tuned gains
N = 100 (25 iters.)     0.11 [0.087, 0.140]        0.083 [0.057, 0.108]
N = 2000 (500 iters.)   0.023 [0.017, 0.028]       0.021 [0.016, 0.026]

Above numbers much lower than random search algorithm B: best value at N = 2000 is 0.38

Shows value of approximating gradient in FDSA

Page 9

Example: Skewed-Quartic Loss Function (Examples 6.6 and 6.7 in ISSO)

• Larger-scale problem with p = 10:

  $$L(\theta) = \theta^T B^T B \theta + 0.1 \sum_{i=1}^{p} (B\theta)_i^3 + 0.01 \sum_{i=1}^{p} (B\theta)_i^4$$

  where $(B\theta)_i$ is the ith component of $B\theta$, and $pB$ is an upper triangular matrix of ones
• Used N = 1000 measurements; 50 replications
• Used FDSA with gains $a_k = a/(1+k+A)^\alpha$ and $c_k = c/(1+k)^\gamma$
• "Semi-automatic" and manual gain tuning
• Also compared with random search algorithm B
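A sketch of this loss in the same Python setting; the slide does not specify the measurement-noise model, so only the noise-free loss is shown:

```python
import numpy as np

p = 10
B = np.triu(np.ones((p, p))) / p   # scaled so p*B is upper triangular of ones

def skewed_quartic(theta):
    b = B @ theta
    # theta^T B^T B theta + 0.1*sum (B theta)_i^3 + 0.01*sum (B theta)_i^4
    return b @ b + 0.1 * np.sum(b ** 3) + 0.01 * np.sum(b ** 4)
```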

Page 10

Algorithm Comparison with Skewed-Quartic Loss Function (p = 10) (Example 6.6 in ISSO)

Page 11

Example with Skewed-Quartic Loss: Mean Terminal Values and 95% Confidence Intervals for $\|\hat{\theta}_k - \theta^\star\| / \|\hat{\theta}_0 - \theta^\star\|$

FDSA: semi-automatic gains   FDSA: manually tuned gains   Random search B
0.427 [0.411, 0.443]         0.531 [0.502, 0.561]         1.285 [1.190, 1.378]

FDSA with semi-automatic gains is best with respect to the error in θ

Random search algorithm B produces a solution further from θ* than the initial condition!

But its loss value is better than at the initial condition