
Scanning Probe Microscopy Based On Reinforcement Learning

Eric Yue Ma, Department of Applied Physics, Stanford University, CA 94305

Motivation and Goal

• Scanning probe microscopy (SPM) is one of the most important tools in solid-state and molecular science and technology
• A sharp tip is scanned across the sample surface while maintaining a constant tip-sample interaction, usually achieved via proportional-integral (PI) feedback
• Use machine learning to improve this feedback:
  o Automatic PI parameter tuning via a neural network
  o Reinforcement learning based feedback without explicit PI parameters

SPM Simulator

• Input: vertical position of the sample surface (zs) and tip (zt)
• Output: measured tip-sample interaction signal s
• In the simplest case, s is the position of a laser beam deflected by a micro-cantilever, which is linearly proportional to the tip-sample force upon contact

[Figure: schematics of the tip approach and a line scan, showing the sample surface zs, the tip height zt, the laser beam deflected by the cantilever, the signal s with setpoint s0, and the error s - s0 along the scan direction x]
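As an illustration of the simulator described above, here is a minimal sketch in Python (the function name spm_signal, the noise level, and the linear force constant k are illustrative assumptions, not the author's code): out of contact the signal is zero, and in contact it grows linearly with the tip-sample overlap.

```python
import numpy as np

def spm_signal(z_t, z_s, k=1.0, noise=0.01, rng=np.random.default_rng()):
    """Simulated deflection signal s for tip height z_t over local surface height z_s.

    Zero when the tip is out of contact (z_t above the surface); linear in the
    tip-sample overlap (z_s - z_t) once in contact, plus measurement noise.
    """
    overlap = max(z_s - z_t, 0.0)
    return k * overlap + noise * rng.standard_normal()

# A toy sample topography z_s(x), used for the line scans in the sketches below
x = np.linspace(0.0, 1.0, 400)
z_surface = 0.1 * np.sin(2 * np.pi * 3 * x)
```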

• PI feedback: the tip height zt is corrected at each point by a term proportional to the error s - s0 plus a term proportional to its time integral
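Continuing the sketch above, a minimal PI feedback loop over one line scan could look like the following (the gains P and I, the setpoint s0, and the starting tip height are placeholders; spm_signal and z_surface are the hypothetical objects defined earlier):

```python
def pi_line_scan(z_surface, s0=0.05, P=0.5, I=0.1, dt=1.0):
    """Scan across a surface profile, adjusting z_t with PI feedback to hold s at s0.

    Returns the tip trajectory and the per-pixel error s - s0.
    """
    z_t = z_surface[0]                    # start near the surface after tip approach
    integral = 0.0
    z_trace, errors = [], []
    for z_s in z_surface:
        s = spm_signal(z_t, z_s)          # measure the interaction signal
        err = s - s0
        integral += err * dt
        z_t += P * err + I * integral     # raise the tip when s > s0, lower it when s < s0
        z_trace.append(z_t)
        errors.append(err)
    return np.array(z_trace), np.array(errors)

mean_abs_error = np.mean(np.abs(pi_line_scan(z_surface)[1]))
```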

PI Tuning via Neural Network

• Performance varies strongly across PI parameter combinations; direct gradient descent on the error is impractical due to local minima
• 400x400 "brute force" exploration of the (P, I) space: 160,000 line scans
• Instead, fit the error vs. (P, I) function with a neural network (a sketch follows this list)
• Good performance is achieved with 400 + 200 sample line scans (~10 min)
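The following is one possible sketch of this tuning procedure (the poster does not specify the network architecture or library; the sklearn-style MLP, the (P, I) ranges, and the sample count are assumptions, and pi_line_scan / z_surface come from the hypothetical sketches above): measure the mean |error| for a few hundred sampled (P, I) line scans, fit a small neural network to error(P, I), and take the minimum of the fitted surface.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# 1. Run line scans at randomly sampled (P, I) values and record the mean |error|
samples = rng.uniform(low=[0.01, 0.0], high=[2.0, 1.0], size=(400, 2))
errors = np.array([np.mean(np.abs(pi_line_scan(z_surface, P=p, I=i)[1]))
                   for p, i in samples])

# 2. Fit the error vs. (P, I) function with a small neural network
net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
net.fit(samples, errors)

# 3. Evaluate the fitted surface on a dense grid and pick the (P, I) minimizing it
P_grid, I_grid = np.meshgrid(np.linspace(0.01, 2.0, 200), np.linspace(0.0, 1.0, 200))
grid = np.column_stack([P_grid.ravel(), I_grid.ravel()])
best_P, best_I = grid[np.argmin(net.predict(grid))]
```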

Reinforcement Learning Based Feedback

• The MDP model:
  o States S: the error s - s0, discretized
  o Actions A: the tip move Δzt, discretized
  o Transition probabilities Psa: depend on the tip-sample interaction and zs
  o Discount factor γ: 0.5-0.9
  o Reward R: -|s - s0|

• Each line scan is one trial, after which Psa, the value function, and the optimal policy are updated (see the sketch after this list)

• High-level expectation: the model will learn about the tip-sample interaction and the features of the sample (zs), and thus achieve good performance (small error s - s0)

• Typical performance with NS = NA = 30, γ = 0.9: [line-scan error plot in the original poster]

• Problems: stochastic and bottle-necked learning; performance inferior to well-tuned PI feedback; a continuous-valued state space does not qualitatively improve performance

• Speculative cause: the error due to the unpredictable zs is comparable to that from the unoptimized MDP/policy, so there is no way to learn well with an MDP model
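To make the trial-and-update loop above concrete, here is one possible minimal sketch (the discretization ranges, ε-greedy exploration, and Laplace smoothing of Psa are assumptions beyond the S, A, Psa, γ, R specified in the table; spm_signal and z_surface are the hypothetical objects from the earlier sketches): each line scan is run under the current policy, the empirical transition counts update Psa, and value iteration recomputes the value function and greedy policy.

```python
import numpy as np

N_S, N_A, GAMMA = 30, 30, 0.9
err_bins = np.linspace(-0.1, 0.1, N_S)      # discretized error s - s0 -> states S
actions = np.linspace(-0.05, 0.05, N_A)     # discretized tip moves Δz_t -> actions A

counts = np.ones((N_S, N_A, N_S))           # transition counts (Laplace-smoothed)
policy = np.zeros(N_S, dtype=int)           # current greedy policy
V = np.zeros(N_S)                            # value function
R = -np.abs(err_bins)                        # reward R = -|s - s0| per state

def to_state(err):
    return int(np.argmin(np.abs(err_bins - err)))

def run_trial(z_surface, s0=0.05, eps=0.1, rng=np.random.default_rng()):
    """One line scan under the current (ε-greedy) policy; returns transitions and mean |error|."""
    z_t = z_surface[0]
    state = to_state(spm_signal(z_t, z_surface[0]) - s0)
    transitions, errs = [], []
    for z_s in z_surface[1:]:
        a = policy[state] if rng.random() > eps else rng.integers(N_A)
        z_t += actions[a]
        err = spm_signal(z_t, z_s) - s0
        next_state = to_state(err)
        transitions.append((state, a, next_state))
        errs.append(err)
        state = next_state
    return transitions, np.mean(np.abs(errs))

def update_model_and_policy(transitions, n_iter=100):
    """Update Psa from the new transitions, then run value iteration and re-derive the policy."""
    global V, policy
    for s_, a_, s2 in transitions:
        counts[s_, a_, s2] += 1
    Psa = counts / counts.sum(axis=2, keepdims=True)
    for _ in range(n_iter):
        Q = R[:, None] + GAMMA * Psa @ V     # Q[s, a] = R(s) + γ Σ_s' Psa(s'|s,a) V(s')
        V = Q.max(axis=1)
    policy = Q.argmax(axis=1)

mean_errors = []
for trial in range(200):                      # each line scan is one trial
    transitions, mean_err = run_trial(z_surface)
    update_model_and_policy(transitions)
    mean_errors.append(mean_err)              # watch whether the error shrinks over trials
```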

Conclusion

• Sample line scans + neural network fitting is a robust way to automatically tune the PI parameters
• MDP-based reinforcement learning feedback is not particularly suitable for SPM applications

2015 CS229 final project poster