Stochastic Processes and Applications by G.A. Pavliotis

STOCHASTIC PROCESSES AND APPLICATIONS

G.A. Pavliotis

Department of Mathematics

Imperial College London

London SW7 2AZ, UK

February 23, 2011


Contents

Preface

1 Introduction
1.1 Introduction
1.2 Historical Overview
1.3 The One-Dimensional Random Walk
1.4 Stochastic Modeling of Deterministic Chaos
1.5 Why Randomness
1.6 Discussion and Bibliography
1.7 Exercises

2 Elements of Probability Theory
2.1 Introduction
2.2 Basic Definitions from Probability Theory
2.2.1 Conditional Probability
2.3 Random Variables
2.3.1 Expectation of Random Variables
2.4 Conditional Expectation
2.5 The Characteristic Function
2.6 Gaussian Random Variables
2.7 Types of Convergence and Limit Theorems
2.9 Exercises

3 Basics of the Theory of Stochastic Processes
3.1 Introduction
3.2 Definition of a Stochastic Process
3.3 Stationary Processes
3.3.1 Strictly Stationary Processes
3.3.2 Second Order Stationary Processes
3.3.3 Ergodic Properties of Second-Order Stationary Processes
3.4 Brownian Motion
3.5 Other Examples of Stochastic Processes
3.5.1 Brownian Bridge
3.5.2 Fractional Brownian Motion
3.5.3 The Poisson Process
3.6 The Karhunen-Loeve Expansion
3.8 Exercises

4 Markov Processes
4.1 Introduction
4.2 Examples
4.3 Definition of a Markov Process
4.4 The Chapman-Kolmogorov Equation
4.5 The Generator of a Markov Process
4.5.1 The Adjoint Semigroup
4.6 Ergodic Markov Processes
4.6.1 Stationary Markov Processes
4.8 Exercises

5 Diffusion Processes
5.1 Introduction
5.2 Definition of a Diffusion Process
5.3 The Backward and Forward Kolmogorov Equations
5.3.1 The Backward Kolmogorov Equation
5.3.2 The Forward Kolmogorov Equation
5.4 Multidimensional Diffusion Processes
5.5 Connection with Stochastic Differential Equations
5.6 Examples of Diffusion Processes
5.8 Exercises

6 The Fokker-Planck Equation
6.1 Introduction
6.2 Basic Properties of the FP Equation
6.2.1 Existence and Uniqueness of Solutions
6.2.2 The FP equation as a conservation law
6.2.3 Boundary conditions for the Fokker–Planck equation
6.3 Examples of Diffusion Processes
6.3.1 Brownian Motion
6.3.2 The Ornstein-Uhlenbeck Process
6.3.3 The Geometric Brownian Motion
6.4 The Ornstein-Uhlenbeck Process and Hermite Polynomials
6.5 Reversible Diffusions
6.5.1 Markov Chain Monte Carlo (MCMC)
6.6 Perturbations of non-Reversible Diffusions
6.7 Eigenfunction Expansions
6.7.1 Reduction to a Schrödinger Equation
6.9 Exercises

7 Stochastic Differential Equations
7.1 Introduction
7.2 The Ito and Stratonovich Stochastic Integral
7.2.1 The Stratonovich Stochastic Integral
7.3 Stochastic Differential Equations
7.3.1 Examples of SDEs
7.4 The Generator, Ito’s formula and the Fokker-Planck Equation
7.4.1 The Generator
7.4.2 Ito’s Formula
7.5 Linear SDEs
7.6 Derivation of the Stratonovich SDE
7.6.1 Ito versus Stratonovich
7.7 Numerical Solution of SDEs
7.8 Parameter Estimation for SDEs
7.9 Noise Induced Transitions
7.11 Exercises

8 The Langevin Equation
8.1 Introduction
8.2 The Fokker-Planck Equation in Phase Space (Klein-Kramers Equation)
8.3 The Langevin Equation in a Harmonic Potential
8.4 Asymptotic Limits for the Langevin Equation
8.4.1 The Overdamped Limit
8.4.2 The Underdamped Limit
8.5 Brownian Motion in Periodic Potentials
8.5.1 The Langevin equation in a periodic potential
8.5.2 Equivalence With the Green-Kubo Formula
8.6 The Underdamped and Overdamped Limits of the Diffusion Coefficient
8.6.1 Brownian Motion in a Tilted Periodic Potential
8.7 Numerical Solution of the Klein-Kramers Equation
8.9 Exercises

9 Exit Time Problems
9.1 Introduction
9.2 Brownian Motion in a Bistable Potential
9.3 The Mean First Passage Time
9.3.1 The Boundary Value Problem for the MFPT
9.3.2 Examples
9.4 Escape from a Potential Barrier
9.4.1 Calculation of the Reaction Rate in the Overdamped Regime
9.4.2 The Intermediate Regime: γ = O(1)
9.4.3 Calculation of the Reaction Rate in the energy-diffusion-limited regime
9.6 Exercises

10 Stochastic Resonance and Brownian Motors
10.1 Introduction
10.2 Stochastic Resonance
10.3 Brownian Motors
10.4 Introduction
10.5 The Model
10.6 Multiscale Analysis
10.6.1 Calculation of the Effective Drift
10.6.2 Calculation of the Effective Diffusion Coefficient
10.7 Effective Diffusion Coefficient for Correlation Ratchets
10.9 Exercises

11 Stochastic Processes and Statistical Mechanics
11.1 Introduction
11.2 The Kac-Zwanzig Model
11.3 Quasi-Markovian Stochastic Processes
11.3.1 Open Classical Systems
11.4 The Mori-Zwanzig Formalism
11.5 Derivation of the Fokker-Planck and Langevin Equations
11.6 Linear Response Theory
11.8 Exercises

Preface

The purpose of these notes is to present various results and techniques from the theory of stochastic

processes that are useful in the study of stochastic problems in physics, chemistry and other areas.

These notes have been used for several years for a course on applied stochastic processes offered to

fourth year and MSc students in applied mathematics at the Department of Mathematics, Imperial

College London.

G.A. Pavliotis

London, December 2010


Chapter 1

Introduction

1.1 Introduction

In this chapter we introduce some of the concepts and techniques that we will study in this book.

In Section 1.2 we present a brief historical overview of the development of the theory of stochastic

processes in the twentieth century. In Section 1.3 we introduce the one-dimensional random walk

and we use this example to introduce several concepts such as Brownian motion and the Markov

property. In Section 1.4 we discuss the stochastic modeling of deterministic chaos. Some

comments on the role of probabilistic modeling in the physical sciences are offered in Section 1.5.

Discussion and bibliographical comments are presented in Section 1.6. Exercises are included in

Section 1.7.

1.2 Historical Overview

The theory of stochastic processes, at least in terms of its application to physics, started with

Einstein’s work on the theory of Brownian motion: Concerning the motion, as required by the

molecular-kinetic theory of heat, of particles suspended in liquids at rest (1905) and in a series

of additional papers that were published in the period 1905 − 1906. In these fundamental works,

Einstein presented an explanation of Brown’s observation (1827) that when suspended in water,

small pollen grains are found to be in a very animated and irregular state of motion. In develop-

ing his theory Einstein introduced several concepts that still play a fundamental role in the study

of stochastic processes and that we will study in this book. Using modern terminology, Einstein

introduced a Markov chain model for the motion of the particle (molecule, pollen grain...).

Furthermore, he introduced the idea that it makes more sense to talk about the probability of finding

the particle at position x at time t, rather than about individual trajectories.

In his work many of the main aspects of the modern theory of stochastic processes can be

found:

• The assumption of Markovianity (no memory) expressed through the Chapman-Kolmogorov

equation.

• The Fokker–Planck equation (in this case, the diffusion equation).

• The derivation of the Fokker-Planck equation from the master (Chapman-Kolmogorov) equa-

tion through a Kramers-Moyal expansion.

• The calculation of a transport coefficient (the diffusion coefficient) using macroscopic (kinetic

theory-based) considerations:

$$D = \frac{k_B T}{6 \pi \eta a},$$

where $k_B$ is Boltzmann’s constant, $T$ is the temperature, $\eta$ is the viscosity of the fluid and $a$ is the

radius of the particle.

Einstein’s theory is based on the Fokker-Planck equation. Langevin (1908) developed a theory

based on a stochastic differential equation. The equation of motion for a Brownian particle is

$$m \frac{d^2 x}{dt^2} = -6 \pi \eta a \frac{dx}{dt} + \xi,$$

where ξ is a random force. It can be shown that there is complete agreement between Einstein’s

theory and Langevin’s theory. The theory of Brownian motion was developed independently by

Smoluchowski, who also performed several experiments.

The approaches of Langevin and Einstein represent the two main approaches in the theory of

stochastic processes:

• Study individual trajectories of Brownian particles. Their evolution is governed by a stochastic

differential equation:

$$\frac{dX}{dt} = F(X) + \Sigma(X) \xi(t),$$

where $\xi(t)$ is a random force.

• Study the probability $\rho(x, t)$ of finding a particle at position $x$ at time $t$. This probability

distribution satisfies the Fokker-Planck equation:

$$\frac{\partial \rho}{\partial t} = -\nabla \cdot (F(x) \rho) + \frac{1}{2} \nabla \nabla : (A(x) \rho),$$

where $A(x) = \Sigma(x) \Sigma(x)^T$.

The theory of stochastic processes was developed during the 20th century by several mathematicians

and physicists including Smoluchowski, Planck, Kramers, Chandrasekhar, Wiener, Kolmogorov,

Ito and Doob.

1.3 The One-Dimensional Random Walk

We let time be discrete, i.e. t = 0, 1, . . . . Consider the following stochastic process $S_n$: $S_0 = 0$; at

each time step it moves by ±1 with equal probability 1/2.

In other words, at each time step we flip a fair coin. If the outcome is heads, we move one unit

to the right. If the outcome is tails, we move one unit to the left.

Alternatively, we can think of the random walk as a sum of independent random variables:

$$S_n = \sum_{j=1}^{n} X_j,$$

where $X_j \in \{-1, 1\}$ with $\mathbb{P}(X_j = \pm 1) = \frac{1}{2}$.

We can simulate the random walk on a computer:

• We need a (pseudo)random number generator to generate n independent random variables

which are uniformly distributed in the interval [0,1].

• If the value of the random variable is greater than 1/2 then the particle moves to the left, otherwise it

moves to the right.

• We then take the sum of all these random moves.

• The sequence $\{S_n\}_{n=1}^{N}$ indexed by the discrete time $T = \{1, 2, \dots, N\}$ is the path of the

random walk. We use a linear interpolation (i.e. connect the points $(n, S_n)$ by straight lines)

to generate a continuous path.
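The recipe above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used to produce the figures: the function name `random_walk`, the seed and the path length are assumptions made for the example, and only the standard library generator `random.Random` is used.

```python
import random


def random_walk(n_steps, rng):
    """Return the path [S_0, S_1, ..., S_N] of the simple symmetric random walk."""
    path = [0]  # S_0 = 0
    for _ in range(n_steps):
        u = rng.random()  # uniformly distributed in [0, 1)
        step = -1 if u > 0.5 else 1  # u > 1/2: move left, otherwise move right
        path.append(path[-1] + step)  # running sum of the random moves
    return path


rng = random.Random(0)  # fixed seed (arbitrary choice) so that the run is reproducible
path = random_walk(50, rng)  # one path of length N = 50
```

Plotting the points (n, path[n]) and joining them by straight lines gives the linearly interpolated path described in the last bullet.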


Figure 1.1: Three paths of the random walk of length N = 50.

Figure 1.2: Three paths of the random walk of length N = 1000.


Figure 1.3: Sample Brownian paths.

Every path of the random walk is different: it depends on the outcome of a sequence of indepen-

dent random experiments. We can compute statistics by generating a large number of paths and

computing averages. For example, $E(S_n) = 0$, $E(S_n^2) = n$. The paths of the random walk (without

the linear interpolation) are not continuous: the random walk has a jump of size 1 at each time step.

This is an example of a discrete time, discrete space stochastic process. The random walk is a

time-homogeneous Markov process. If we take a large number of steps, the random walk starts

looking like a continuous time process with continuous paths.
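The two averages quoted above can be checked numerically by generating many independent paths. The following is a minimal Monte Carlo sketch; the number of steps, the number of paths and the seed are arbitrary choices made for illustration, and the estimates fluctuate around the exact values.

```python
import random


def endpoint(n_steps, rng):
    """Return S_n, the sum of n independent steps equal to +1 or -1 with probability 1/2."""
    return sum(rng.choice((-1, 1)) for _ in range(n_steps))


rng = random.Random(1)  # fixed seed (arbitrary choice)
n_steps, n_paths = 100, 5000
samples = [endpoint(n_steps, rng) for _ in range(n_paths)]

mean = sum(samples) / n_paths  # estimate of E(S_n), exact value 0
second_moment = sum(s * s for s in samples) / n_paths  # estimate of E(S_n^2), exact value n
```

Increasing `n_paths` reduces the statistical error of both estimates, which decays like the inverse square root of the number of paths.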

We can quantify this observation by introducing an appropriate rescaled process and by taking

an appropriate limit. Consider the sequence of continuous time stochastic processes

$$Z_t^n := \frac{1}{\sqrt{n}} S_{nt}.$$

In the limit as $n \to \infty$, the sequence $\{Z_t^n\}$ converges (in some appropriate sense, that will be

made precise in later chapters) to a Brownian motion with diffusion coefficient $D = \frac{\Delta x^2}{2 \Delta t} = \frac{1}{2}$.

Brownian motion W (t) is a continuous time stochastic process with continuous paths that starts

at 0 (W (0) = 0) and has independent, normally distributed (Gaussian) increments. We can simulate

the Brownian motion on a computer using a random number generator that generates normally

distributed, independent random variables. We can write an equation for the evolution of the paths


of a Brownian motion Xt with diffusion coefficient D starting at x:

$$dX_t = \sqrt{2D} \, dW_t, \quad X_0 = x.$$

This is the simplest example of a stochastic differential equation. The transition probability density

$\rho(y, t)$, i.e. the probability density of finding $X_t$ at $y$ at time $t$ given that it was at $x$ at time $t = 0$,

satisfies the PDE

$$\frac{\partial \rho}{\partial t} = D \frac{\partial^2 \rho}{\partial y^2}, \quad \rho(y, 0) = \delta(y - x).$$

This is the simplest example of the Fokker-Planck equation. The connection between Brownian

motion and the diffusion equation was made by Einstein in 1905.
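As an illustration of how a path of this equation can be generated with normally distributed random numbers, here is a minimal sketch. It relies on the standard fact that the increment of W over a time step of length dt is Gaussian with mean 0 and variance dt, so that the increment of X has variance 2D dt. The function name `brownian_path`, the step count, the sample size and the seed are assumptions made for the example.

```python
import math
import random


def brownian_path(x0, D, T, n_steps, rng):
    """Sample X at times 0, dt, ..., T for dX = sqrt(2D) dW with X_0 = x0."""
    dt = T / n_steps
    path = [x0]
    for _ in range(n_steps):
        dW = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # increment of W: N(0, dt)
        path.append(path[-1] + math.sqrt(2.0 * D) * dW)
    return path


rng = random.Random(2)  # fixed seed (arbitrary choice)
D, T = 0.5, 1.0
endpoints = [brownian_path(0.0, D, T, 50, rng)[-1] for _ in range(4000)]

mean = sum(endpoints) / len(endpoints)  # should be close to x0 = 0
var = sum((x - mean) ** 2 for x in endpoints) / len(endpoints)  # should be close to 2*D*T
```

The empirical variance of the end points is consistent with the Gaussian solution of the diffusion equation above, whose variance at time t is 2Dt.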

1.4 Stochastic Modeling of Deterministic Chaos

1.5 Why Randomness

Why introduce randomness in the description of physical systems?

• To describe outcomes of a repeated set of experiments. Think of tossing a coin repeatedly or

of throwing a die.

• To describe a deterministic system for which we have incomplete information: we have

imprecise knowledge of initial and boundary conditions or of model parameters.

– ODEs with random initial conditions are equivalent to stochastic processes that can be

described using stochastic differential equations.

• To describe systems for which we are not confident about the validity of our mathematical

model.

• To describe a dynamical system exhibiting very complicated behavior (chaotic dynamical

systems). Determinism versus predictability.

• To describe a high dimensional deterministic system using a simpler, low dimensional stochas-

tic system. Think of the physical model for Brownian motion (a heavy particle colliding with

many small particles).


• To describe a system that is inherently random. Think of quantum mechanics.

Stochastic modeling is currently used in many different areas ranging from biology to climate

modeling to economics.

1.6 Discussion and Bibliography

The fundamental papers of Einstein on the theory of Brownian motion have been reprinted by

Dover [20]. The readers of this book are strongly encouraged to study these papers. Other fun-

damental papers from the early period of the development of the theory of stochastic processes

include the papers by Langevin, Ornstein and Uhlenbeck, Doob, Kramers and Chandrasekhar’s

famous review article [12]. Many of these early papers on the theory of stochastic processes have

been reprinted in [18]. Very useful historical comments can be found in the books by Nelson [68]

and Mazo [66].

1.7 Exercises

1. Read the papers by Einstein, Ornstein-Uhlenbeck, Doob etc.

2. Write a computer program for generating the random walk in one and two dimensions. Study

numerically the Brownian limit and compute the statistics of the random walk.


Chapter 2

Elements of Probability Theory

2.1 Introduction

In this chapter we put together some basic definitions and results from probability theory that will

be used later on. In Section 2.2 we give some basic definitions from the theory of probability.

In Section 2.3 we present some properties of random variables. In Section 2.4 we introduce the

concept of conditional expectation and in Section 2.5 we define the characteristic function, one of

the most useful tools in the study of (sums of) random variables. Some explicit calculations for

the multivariate Gaussian distribution are presented in Section 2.6. Different types of convergence

and the basic limit theorems of the theory of probability are discussed in Section 2.7. Discussion

and bibliographical comments are presented in Section 2.8. Exercises are included in Section 2.9.

2.2 Basic Definitions from Probability Theory

In Chapter 1 we defined a stochastic process as a dynamical system whose law of evolution is

probabilistic. In order to study stochastic processes we need to be able to describe the outcome of

a random experiment and to calculate functions of this outcome. First we need to describe the set

of all possible outcomes of an experiment.

Definition 2.2.1. The set of all possible outcomes of an experiment is called the sample space and

is denoted by Ω.

Example 2.2.2. • The possible outcomes of the experiment of tossing a coin are H and T. The

sample space is Ω = {H, T}.

• The possible outcomes of the experiment of throwing a die are 1, 2, 3, 4, 5 and 6. The

sample space is Ω = {1, 2, 3, 4, 5, 6}.

We define events to be subsets of the sample space. Of course, we would like the unions,

intersections and complements of events to also be events. When the sample space Ω is uncount-

able, then technical difficulties arise. In particular, not all subsets of the sample space need to be

events. A definition of the collection of events which is appropriate for finitely additive

probability is the following.

Definition 2.2.3. A collection F of subsets of Ω is called a field on Ω if

i. ∅ ∈ F;

ii. if A ∈ F then Ac ∈ F;

iii. If A, B ∈ F then A ∪B ∈ F .

From the definition of a field we immediately deduce that F is closed under finite unions and

finite intersections:

$$A_1, \dots, A_n \in \mathcal{F} \Rightarrow \cup_{i=1}^{n} A_i \in \mathcal{F}, \quad \cap_{i=1}^{n} A_i \in \mathcal{F}.$$

When Ω is infinite then the above definition is not appropriate since we need to

consider countable unions of events.

Definition 2.2.4. A collection F of subsets of Ω is called a σ-field or σ-algebra on Ω if

i. ∅ ∈ F;

ii. if A ∈ F then Ac ∈ F;

iii. If $A_1, A_2, \dots \in \mathcal{F}$ then $\cup_{i=1}^{\infty} A_i \in \mathcal{F}$.

A σ-algebra is also closed under the operation of taking countable intersections.

Example 2.2.5. • F = {∅, Ω}.

• F = {∅, A, $A^c$, Ω} where A is a subset of Ω.

• The power set of Ω, denoted by $\{0, 1\}^{\Omega}$, which contains all subsets of Ω.

Let F be a collection of subsets of Ω. It can be extended to a σ-algebra (take for example the

power set of Ω). Consider all the σ-algebras that contain F and take their intersection, denoted

by σ(F), i.e. A ∈ σ(F) if and only if A is in every σ-algebra containing F. σ(F) is a σ-algebra

(see Exercise 1). It is the smallest σ-algebra containing F and it is called the σ-algebra generated

by F.
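For a finite sample space the generated σ-algebra can be computed explicitly, since closing a collection under complements and pairwise unions is then enough (countable unions reduce to finite ones, and intersections follow from De Morgan's laws). The sketch below is only an illustration under that finiteness assumption; the function name `generated_sigma_algebra` and the sample collection are hypothetical choices.

```python
from itertools import combinations


def generated_sigma_algebra(omega, collection):
    """Smallest family containing `collection` that is closed under complement and union.

    Assumes omega is finite, so this family is the sigma-algebra generated by `collection`.
    """
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(a) for a in collection}
    changed = True
    while changed:  # repeat until no new set appears
        changed = False
        current = list(sets)
        for a in current:
            complement = omega - a
            if complement not in sets:
                sets.add(complement)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in sets:
                sets.add(union)
                changed = True
    return sets


omega = {1, 2, 3, 4}
sigma = generated_sigma_algebra(omega, [{1}, {1, 2}])
```

In this example the generated σ-algebra consists of all unions of the three blocks {1}, {2} and {3, 4}, so it has 8 elements and cannot distinguish the outcome 3 from the outcome 4.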

Example 2.2.6. Let Ω = Rn. The σ-algebra generated by the open subsets of Rn (or, equivalently,

by the open balls of Rn) is called the Borel σ-algebra of Rn and is denoted by B(Rn).

Let X be a closed subset of Rn. Similarly, we can define the Borel σ-algebra of X , denoted by

B(X).

A sub-σ–algebra is a collection of subsets of a σ–algebra which satisfies the axioms of a σ–

algebra.

The σ−field F of a sample space Ω contains all possible outcomes of the experiment that we

want to study. Intuitively, the σ−field contains all the information about the random experiment

that is available to us.

Now we want to assign probabilities to the possible outcomes of an experiment.

Definition 2.2.7. A probability measure P on the measurable space (Ω, F) is a function

P : F → [0, 1] satisfying

i. P(∅) = 0, P(Ω) = 1;

ii. For $A_1, A_2, \dots$ with $A_i \cap A_j = \emptyset$, $i \neq j$, we have

$$\mathbb{P}(\cup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} \mathbb{P}(A_i).$$

Definition 2.2.8. The triple (Ω, F, P) comprising a set Ω, a σ-algebra F of subsets of Ω and a

probability measure P on (Ω, F) is called a probability space.

Example 2.2.9. A biased coin is tossed once: Ω = {H, T}, F = {∅, {H}, {T}, Ω} = $\{0, 1\}^{\Omega}$, P :

F → [0, 1] such that P(∅) = 0, P(H) = p ∈ [0, 1], P(T) = 1 − p, P(Ω) = 1.

Example 2.2.10. Take Ω = [0, 1], F = B([0, 1]), P = Leb([0, 1]). Then (Ω,F ,P) is a probability

space.


2.2.1 Conditional Probability

One of the most important concepts in probability is that of the dependence between events.

Definition 2.2.11. A family $\{A_i : i \in I\}$ of events is called independent if

$$\mathbb{P}\left(\cap_{j \in J} A_j\right) = \prod_{j \in J} \mathbb{P}(A_j)$$

for all finite subsets J of I.

When two events A, B are dependent it is important to know the probability that the event

A will occur, given that B has already happened. We define this to be conditional probability,

denoted by P(A|B). We know from elementary probability that

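$$\mathbb{P}(A|B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)},$$

provided that $\mathbb{P}(B) > 0$. A very useful result is the law of total probability.

Definition 2.2.12. A family of events $\{B_i : i \in I\}$ is called a partition of Ω if

$$B_i \cap B_j = \emptyset, \; i \neq j \quad \text{and} \quad \cup_{i \in I} B_i = \Omega.$$

Proposition 2.2.13. Law of total probability. For any event A and any partition $\{B_i : i \in I\}$ we have

$$\mathbb{P}(A) = \sum_{i \in I} \mathbb{P}(A|B_i) \mathbb{P}(B_i).$$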

The proof of this result is left as an exercise. In many cases the calculation of the probability

of an event is simplified by choosing an appropriate partition of Ω and using the law of total

probability.
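A small worked example may help. The sketch below checks the law of total probability for one throw of a fair die, using exact rational arithmetic. The event A and the partition are hypothetical choices made only for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space of one throw of a fair die


def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b):
    """Conditional probability P(A|B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(a & b) / prob(b)


A = {1, 2}  # the event "the outcome is at most 2"
partition = [{1}, {2, 3}, {4, 5, 6}]  # pairwise disjoint sets whose union is omega

total = sum(cond_prob(A, B) * prob(B) for B in partition)
```

Here the three terms of the sum are 1 · 1/6, 1/2 · 1/3 and 0 · 1/2, which add up to P(A) = 1/3.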

Let (Ω, F, P) be a probability space and fix B ∈ F with P(B) > 0. Then P(·|B) defines a probability measure

on F. Indeed, we have that

$$\mathbb{P}(\emptyset|B) = 0, \quad \mathbb{P}(\Omega|B) = 1$$

and (since $A_i \cap A_j = \emptyset$ implies that $(A_i \cap B) \cap (A_j \cap B) = \emptyset$)

$$\mathbb{P}(\cup_{j=1}^{\infty} A_j | B) = \sum_{j=1}^{\infty} \mathbb{P}(A_j|B),$$

for a countable family of pairwise disjoint sets $\{A_j\}_{j=1}^{+\infty}$. Consequently, (Ω, F, P(·|B)) is a probability

space for every B ∈ F with P(B) > 0.


2.3 Random Variables

We are usually interested in the consequences of the outcome of an experiment, rather than the

experiment itself. The function of the outcome of an experiment is a random variable, that is, a

map from Ω to R.

Definition 2.3.1. A sample space Ω equipped with a σ−field of subsets F is called a measurable

space.

Definition 2.3.2. Let (Ω,F) and (E,G) be two measurable spaces. A function X : Ω → E such

that the event

$$\{\omega \in \Omega : X(\omega) \in A\} =: \{X \in A\} \tag{2.1}$$

belongs to F for arbitrary A ∈ G is called a measurable function or random variable.

When E is R equipped with its Borel σ-algebra, then (2.1) can be replaced with

$$\{X \leq x\} \in \mathcal{F} \quad \forall x \in \mathbb{R}.$$

Let X be a random variable (measurable function) from (Ω,F , µ) to (E,G). If E is a metric space

then we may define expectation with respect to the measure µ by

$$\mathbb{E}[X] = \int_{\Omega} X(\omega) \, d\mu(\omega).$$

More generally, let f : E → R be G–measurable. Then,

$$\mathbb{E}[f(X)] = \int_{\Omega} f(X(\omega)) \, d\mu(\omega).$$

Let U be a topological space. We will use the notation B(U) to denote the Borel σ–algebra of U :

the smallest σ–algebra containing all open sets of U . Every random variable from a probability

space (Ω,F , µ) to a measurable space (E,B(E)) induces a probability measure on E:

$$\mu_X(B) = \mathbb{P} X^{-1}(B) = \mu(\{\omega \in \Omega : X(\omega) \in B\}), \quad B \in \mathcal{B}(E). \tag{2.2}$$

The measure µX is called the distribution (or sometimes the law) of X .
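On a finite probability space the induced measure can be computed by summing the probabilities of all outcomes that X maps to a given value. The sketch below does this for a hypothetical example, the sum of two fair dice; the names `mu`, `X` and `law` are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces of two fair dice, with the uniform measure mu.
omega = list(product(range(1, 7), repeat=2))
mu = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable: the sum of the two faces."""
    return w[0] + w[1]


# Distribution (law) of X: law[x] = mu({w : X(w) = x}).
law = defaultdict(Fraction)
for w, p in mu.items():
    law[X(w)] += p
```

The dictionary `law` is a probability measure on the values {2, ..., 12}: its entries are nonnegative and sum to 1, and for instance `law[7]` equals 1/6.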

Example 2.3.3. Let I denote a subset of the positive integers. A vector $\rho_0 = \{\rho_{0,i}, i \in I\}$ is a

distribution on I if it has nonnegative entries and its total mass equals 1: $\sum_{i \in I} \rho_{0,i} = 1$.

Consider the case where E = R equipped with the Borel σ−algebra. In this case a random

variable is defined to be a function X : Ω→ R such that

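$$\{\omega \in \Omega : X(\omega) \leq x\} \in \mathcal{F} \quad \forall x \in \mathbb{R}.$$

We can now define the probability distribution function of X, $F_X : \mathbb{R} \to [0, 1]$, as

$$F_X(x) = \mathbb{P}(\{\omega \in \Omega \,|\, X(\omega) \leq x\}) =: \mathbb{P}(X \leq x). \tag{2.3}$$

In this case, $(\mathbb{R}, \mathcal{B}(\mathbb{R}), F_X)$ becomes a probability space.

The distribution function $F_X(x)$ of a random variable has the properties that $\lim_{x \to -\infty} F_X(x) = 0$,

$\lim_{x \to +\infty} F_X(x) = 1$ and is right continuous.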

Definition 2.3.4. A random variable X with values on R is called discrete if it takes values in

some countable subset $\{x_0, x_1, x_2, \dots\}$ of R, i.e. $\mathbb{P}(X = x) \neq 0$ only for $x = x_0, x_1, \dots$.

With a random variable we can associate the probability mass function pk = P(X = xk).

We will consider nonnegative integer valued discrete random variables. In this case pk = P(X =

k), k = 0, 1, 2, . . . .

Example 2.3.5. The Poisson random variable is the nonnegative integer valued random variable

with probability mass function

$$p_k = \mathbb{P}(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \dots,$$

where λ > 0.

Example 2.3.6. The binomial random variable is the nonnegative integer valued random variable

with probability mass function

$$p_k = \mathbb{P}(X = k) = \frac{N!}{k!(N-k)!} p^k q^{N-k}, \quad k = 0, 1, 2, \dots, N,$$

where p ∈ (0, 1), q = 1− p.
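As a quick numerical sanity check of the two probability mass functions in Examples 2.3.5 and 2.3.6, the sketch below verifies that each sums to 1 (the Poisson series is truncated at 60 terms, an arbitrary cut-off) and that the binomial mean equals Np, a standard fact. The parameter values are arbitrary choices made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam > 0."""
    return lam ** k / math.factorial(k) * math.exp(-lam)


def binomial_pmf(k, N, p):
    """P(X = k) for a binomial random variable with parameters N and p."""
    return math.comb(N, k) * p ** k * (1 - p) ** (N - k)


lam, N, p = 2.0, 10, 0.3  # illustrative parameter values
poisson_total = sum(poisson_pmf(k, lam) for k in range(60))  # truncated series
binomial_total = sum(binomial_pmf(k, N, p) for k in range(N + 1))
binomial_mean = sum(k * binomial_pmf(k, N, p) for k in range(N + 1))  # equals N * p
```

Note that `math.comb` requires Python 3.8 or later.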

Definition 2.3.7. A random variable X with values on R is called continuous if P(X = x) =

0 ∀x ∈ R.


Let (Ω, F, P) be a probability space and let X : Ω → R be a random variable with distribution

$F_X$. This is a probability measure on B(R). We will assume that it is absolutely continuous with

respect to the Lebesgue measure with density $\rho_X$: $F_X(dx) = \rho_X(x) \, dx$. We will call the density

$\rho_X(x)$ the probability density function (PDF) of the random variable X.

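Example 2.3.8. i. The exponential random variable has PDF

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0, \\ 0, & x < 0, \end{cases}$$

with λ > 0.

ii. The uniform random variable has PDF

$$f(x) = \begin{cases} \frac{1}{b-a}, & a < x < b, \\ 0, & x \notin (a, b), \end{cases}$$

with a < b.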

Definition 2.3.9. Two random variables X and Y are independent if the events $\{\omega \in \Omega \,|\, X(\omega) \leq x\}$

and $\{\omega \in \Omega \,|\, Y(\omega) \leq y\}$ are independent for all x, y ∈ R.

Let X, Y be two continuous random variables. We can view them as a random vector, i.e. a

random variable from Ω to R2. We can then define the joint distribution function

$$F_{X,Y}(x, y) = \mathbb{P}(X \leq x, Y \leq y).$$

The mixed derivative of the distribution function $f_{X,Y}(x, y) := \frac{\partial^2 F_{X,Y}}{\partial x \partial y}(x, y)$, if it exists, is called the

joint PDF of the random vector {X, Y}:

$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(x', y') \, dy' \, dx'.$$

If the random variables X and Y are independent, then

$$F_{X,Y}(x, y) = F_X(x) F_Y(y)$$

and

$$f_{X,Y}(x, y) = f_X(x) f_Y(y).$$


The joint distribution function has the properties

$$F_{X,Y}(x, y) = F_{Y,X}(y, x),$$

$$F_{X,Y}(+\infty, y) = F_Y(y), \quad f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y) \, dx.$$

We can extend the above definition to random vectors of arbitrary finite dimensions. Let X be

a random variable from (Ω, F, µ) to $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. The (joint) distribution function $F_X : \mathbb{R}^d \to [0, 1]$

is defined as

$$F_X(x) = \mathbb{P}(X \leq x).$$

Let X be a random variable in $\mathbb{R}^N$ with distribution function $f_N(x_N)$ where $x_N = \{x_1, \dots, x_N\}$. We

define the marginal or reduced distribution function $f_{N-1}(x_{N-1})$ by

$$f_{N-1}(x_{N-1}) = \int_{\mathbb{R}} f_N(x_N) \, dx_N.$$

We can define other reduced distribution functions:

$$f_{N-2}(x_{N-2}) = \int_{\mathbb{R}} f_{N-1}(x_{N-1}) \, dx_{N-1} = \int_{\mathbb{R}} \int_{\mathbb{R}} f_N(x_N) \, dx_{N-1} \, dx_N.$$

2.3.1 Expectation of Random Variables

We can use the distribution of a random variable to compute expectations and probabilities:

\[
E[f(X)] = \int_{\mathbb{R}} f(x) \, dF_X(x) \tag{2.4}
\]

and

\[
P[X \in G] = \int_{G} dF_X(x), \qquad G \in \mathcal{B}(E). \tag{2.5}
\]

The above formulas apply to both discrete and continuous random variables, provided that we

define the integrals in (2.4) and (2.5) appropriately.

When $E = \mathbb{R}^d$ and a PDF exists, $dF_X(x) = f_X(x) \, dx$, we have

\[
F_X(x) := P(X \le x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f_X(x') \, dx'.
\]

When $E = \mathbb{R}^d$ then by $L^p(\Omega; \mathbb{R}^d)$, or sometimes $L^p(\Omega; \mu)$ or even simply $L^p(\mu)$, we mean the Banach space of measurable functions on Ω with norm

\[
\|X\|_{L^p} = \left( E|X|^p \right)^{1/p}.
\]

Let X be a nonnegative integer valued random variable with probability mass function $p_k$. We can compute the expectation of an arbitrary function of X using the formula

\[
E(f(X)) = \sum_{k=0}^{\infty} f(k) \, p_k.
\]

Let X, Y be random variables. We want to know whether they are correlated and, if they are, to calculate how correlated they are. We define the covariance of the two random variables as

\[
\mathrm{cov}(X, Y) = E\big[(X - EX)(Y - EY)\big] = E(XY) - EX \, EY.
\]

The correlation coefficient is

\[
\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)} \, \sqrt{\mathrm{var}(Y)}}. \tag{2.6}
\]

The Cauchy-Schwarz inequality yields that ρ(X, Y ) ∈ [−1, 1]. We will say that two random

variables X and Y are uncorrelated provided that ρ(X, Y ) = 0. It is not true in general that

two uncorrelated random variables are independent. This is true, however, for Gaussian random

variables (see Exercise 5).
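The claim that uncorrelated random variables need not be independent can be checked on a small discrete example. The sketch below is illustrative only: the choice of X uniform on {-1, 0, 1} and Y = X² is an assumption made here, not an example from the text. It uses exact rational arithmetic and the formula cov(X, Y) = E(XY) − EX EY.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X**2 (an illustrative choice, not from the text).
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

def expect(g):
    """E[g(X)] = sum_k g(k) p_k for the discrete random variable X."""
    return sum(g(k) * p for k, p in pmf_x.items())

ex = expect(lambda x: x)            # E X
ey = expect(lambda x: x * x)        # E Y = E X^2
exy = expect(lambda x: x * x * x)   # E(XY) = E X^3
cov = exy - ex * ey                 # cov(X, Y) = E(XY) - EX EY

# Independence would require P(X = 0, Y = 0) = P(X = 0) P(Y = 0).
p_joint = pmf_x[0]                  # the event {X = 0} equals {X = 0, Y = 0}
p_product = pmf_x[0] * expect(lambda x: 1 if x * x == 0 else 0)

print(cov, p_joint, p_product)
```

The covariance vanishes exactly, while the joint probability and the product of the marginals differ, so X and Y are uncorrelated but not independent.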

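Example 2.3.10. • Consider the random variable X : Ω → R with pdf

\[
\gamma_{\sigma,b}(x) := (2\pi\sigma)^{-\frac{1}{2}} \exp\left( -\frac{(x - b)^2}{2\sigma} \right).
\]

Such an X is termed a Gaussian or normal random variable. The mean is

\[
EX = \int_{\mathbb{R}} x \, \gamma_{\sigma,b}(x) \, dx = b
\]

and the variance is

\[
E(X - b)^2 = \int_{\mathbb{R}} (x - b)^2 \, \gamma_{\sigma,b}(x) \, dx = \sigma.
\]

• Let $b \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ be symmetric and positive definite. The random variable $X : \Omega \to \mathbb{R}^d$ with pdf

\[
\gamma_{\Sigma,b}(x) := \left( (2\pi)^d \det \Sigma \right)^{-\frac{1}{2}} \exp\left( -\frac{1}{2} \langle \Sigma^{-1}(x - b), (x - b) \rangle \right)
\]

is termed a multivariate Gaussian or normal random variable. The mean is

\[
E(X) = b \tag{2.7}
\]

and the covariance matrix is

\[
E\big( (X - b) \otimes (X - b) \big) = \Sigma. \tag{2.8}
\]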

Since the mean and variance specify completely a Gaussian random variable on R, the Gaussian

is commonly denoted by N (m,σ). The standard normal random variable is N (0, 1). Similarly,

since the mean and covariance matrix completely specify a Gaussian random variable on Rd, the

Gaussian is commonly denoted by N (m,Σ).

Some analytical calculations for Gaussian random variables will be presented in Section 2.6.

2.4 Conditional Expectation

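Assume that $X \in L^1(\Omega, \mathcal{F}, \mu)$ and let $\mathcal{G}$ be a sub–σ–algebra of $\mathcal{F}$. The conditional expectation of X with respect to $\mathcal{G}$ is defined to be the function (random variable) $E[X|\mathcal{G}] : \Omega \to E$ which is $\mathcal{G}$–measurable and satisfies

\[
\int_{G} E[X|\mathcal{G}] \, d\mu = \int_{G} X \, d\mu \qquad \forall \, G \in \mathcal{G}.
\]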

We can define E[f(X)|G] and the conditional probability P[X ∈ F |G] = E[IF (X)|G], where IF is

the indicator function of F , in a similar manner.

We list some of the most important properties of conditional expectation.

Theorem 2.4.1. [Properties of Conditional Expectation]. Let (Ω,F , µ) be a probability space and

let G be a sub–σ–algebra of F .

(a) If X is G−measurable and integrable then E(X|G) = X .

(b) (Linearity) If X1, X2 are integrable and c1, c2 constants, then

E(c1X1 + c2X2|G) = c1E(X1|G) + c2E(X2|G).

(c) (Order) If X1, X2 are integrable and X1 ≤ X2 a.s., then E(X1|G) ≤ E(X2|G) a.s.

(d) If Y and XY are integrable, and X is G−measurable then E(XY |G) = XE(Y |G).

(e) (Successive smoothing) If D is a sub–σ–algebra of F , D ⊂ G and X is integrable, then

E(X|D) = E[E(X|G)|D] = E[E(X|D)|G].

(f) (Convergence) Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables such that, for all n, $|X_n| \le Z$ where Z is integrable. If $X_n \to X$ a.s., then $E(X_n|G) \to E(X|G)$ a.s. and in $L^1$.

Proof. See Exercise 10.
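On a finite sample space the definition of conditional expectation becomes concrete: if the sub-σ-algebra is generated by a finite partition, E[X|G] is constant on each block and equals the average of X over that block. The following sketch assumes a fair die as sample space and two nested partitions, all of which are illustrative choices and not part of the text. It checks the defining integral identity and the successive smoothing property (e) with exact arithmetic.

```python
from fractions import Fraction

omega = range(6)                         # sample space of a fair die (hypothetical example)
mu = {w: Fraction(1, 6) for w in omega}  # uniform probability measure
X = {w: Fraction(w + 1) for w in omega}  # X(w) = face value

def cond_exp(Z, partition):
    """E[Z | G] where G is the sigma-algebra generated by a finite partition.

    On each block A the conditional expectation is the constant
    (integral of Z over A) / mu(A).
    """
    out = {}
    for block in partition:
        mass = sum(mu[w] for w in block)
        avg = sum(Z[w] * mu[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out

G = [{0, 1}, {2, 3}, {4, 5}]   # finer partition
D = [{0, 1, 2, 3}, {4, 5}]     # coarser partition, so sigma(D) is contained in sigma(G)

EX_G = cond_exp(X, G)
EX_D = cond_exp(X, D)

# Defining property: the integrals of E[X|G] and X agree on every block of G.
defining_ok = all(
    sum(EX_G[w] * mu[w] for w in A) == sum(X[w] * mu[w] for w in A) for A in G
)
# Successive smoothing, property (e): E[E(X|G)|D] = E(X|D) = E[E(X|D)|G].
tower_ok = cond_exp(EX_G, D) == EX_D == cond_exp(EX_D, G)
print(defining_ok, tower_ok)
```

Checking the identity on the blocks of the partition suffices here, because every set in the generated σ-algebra is a disjoint union of blocks.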

2.5 The Characteristic Function

Many of the properties of (sums of) random variables can be studied using the Fourier transform

of the distribution function. Let F (λ) be the distribution function of a (discrete or continuous)

random variable X . The characteristic function of X is defined to be the Fourier transform of

the distribution function

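\[
\phi(t) = \int_{\mathbb{R}} e^{it\lambda} \, dF(\lambda) = E(e^{itX}). \tag{2.9}
\]

For a continuous random variable for which the distribution function F has a density, $dF(\lambda) = p(\lambda) \, d\lambda$, (2.9) gives

\[
\phi(t) = \int_{\mathbb{R}} e^{it\lambda} p(\lambda) \, d\lambda.
\]

For a discrete random variable for which $P(X = \lambda_k) = \alpha_k$, (2.9) gives

\[
\phi(t) = \sum_{k=0}^{\infty} e^{it\lambda_k} \alpha_k.
\]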

From the properties of the Fourier transform we conclude that the characteristic function determines uniquely the distribution function of the random variable, in the sense that there is a one-to-one correspondence between F(λ) and φ(t). Furthermore, in the exercises at the end of the chapter the reader is asked to prove the following two results.

Lemma 2.5.1. Let $X_1, X_2, \dots, X_n$ be independent random variables with characteristic functions $\phi_j(t)$, $j = 1, \dots, n$, and let $Y = \sum_{j=1}^{n} X_j$ with characteristic function $\phi_Y(t)$. Then

\[
\phi_Y(t) = \prod_{j=1}^{n} \phi_j(t).
\]

Lemma 2.5.2. Let X be a random variable with characteristic function φ(t) and assume that it has finite moments. Then

\[
E(X^k) = \frac{1}{i^k} \phi^{(k)}(0).
\]
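Both lemmas can be checked numerically for a discrete random variable, where the characteristic function is a finite sum. The sketch below assumes a fair die as the underlying distribution (an illustrative choice, not from the text). It compares the characteristic function of the sum of two independent dice with the product of the individual characteristic functions, and recovers the mean from a finite-difference approximation of φ'(0).

```python
import cmath

def char_fn(pmf, t):
    """phi(t) = sum_k exp(i t lambda_k) alpha_k for a discrete random variable."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def pmf_of_sum(pmf_a, pmf_b):
    """Distribution of X1 + X2 for independent X1, X2 (discrete convolution)."""
    out = {}
    for xa, pa in pmf_a.items():
        for xb, pb in pmf_b.items():
            out[xa + xb] = out.get(xa + xb, 0.0) + pa * pb
    return out

die = {k: 1.0 / 6.0 for k in range(1, 7)}   # fair die, an illustrative choice
two_dice = pmf_of_sum(die, die)

t = 0.7
lhs = char_fn(two_dice, t)        # phi_Y(t) with Y = X1 + X2
rhs = char_fn(die, t) ** 2        # phi_1(t) * phi_2(t)

# Lemma 2.5.2 with k = 1: E X = phi'(0) / i, using a central difference.
h = 1e-5
mean_from_phi = ((char_fn(die, h) - char_fn(die, -h)) / (2 * h) / 1j).real
print(abs(lhs - rhs), mean_from_phi)
```

The difference between the two sides is at the level of floating-point rounding, and the recovered mean agrees with the exact value 3.5 up to the finite-difference error.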


2.6 Gaussian Random Variables

In this section we present some useful calculations for Gaussian random variables. In particular,

we calculate the normalization constant, the mean and variance and the characteristic function of

multidimensional Gaussian random variables.

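Theorem 2.6.1. Let $b \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ a symmetric and positive definite matrix. Let X be the multivariate Gaussian random variable with probability density function

\[
\gamma_{\Sigma,b}(x) = \frac{1}{Z} \exp\left( -\frac{1}{2} \langle \Sigma^{-1}(x - b), x - b \rangle \right).
\]

Then

i. The normalization constant is

\[
Z = (2\pi)^{d/2} \sqrt{\det(\Sigma)}.
\]

ii. The mean vector and covariance matrix of X are given by

\[
EX = b
\]

and

\[
E\big( (X - EX) \otimes (X - EX) \big) = \Sigma.
\]

iii. The characteristic function of X is

\[
\phi(t) = e^{i \langle b, t \rangle - \frac{1}{2} \langle t, \Sigma t \rangle}.
\]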

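Proof. i. From the spectral theorem for symmetric positive definite matrices we have that there exists a diagonal matrix Λ with positive entries and an orthogonal matrix B such that

\[
\Sigma^{-1} = B^T \Lambda^{-1} B.
\]

Let z = x − b and y = Bz. We have

\[
\langle \Sigma^{-1} z, z \rangle = \langle B^T \Lambda^{-1} B z, z \rangle = \langle \Lambda^{-1} B z, B z \rangle = \langle \Lambda^{-1} y, y \rangle = \sum_{i=1}^{d} \lambda_i^{-1} y_i^2.
\]

Furthermore, we have that $\det(\Sigma^{-1}) = \prod_{i=1}^{d} \lambda_i^{-1}$, that $\det(\Sigma) = \prod_{i=1}^{d} \lambda_i$ and that the Jacobian of an orthogonal transformation satisfies $|J| = |\det(B)| = 1$. Hence,

\begin{align*}
\int_{\mathbb{R}^d} \exp\left( -\frac{1}{2} \langle \Sigma^{-1}(x - b), x - b \rangle \right) dx
&= \int_{\mathbb{R}^d} \exp\left( -\frac{1}{2} \langle \Sigma^{-1} z, z \rangle \right) dz \\
&= \int_{\mathbb{R}^d} \exp\left( -\frac{1}{2} \sum_{i=1}^{d} \lambda_i^{-1} y_i^2 \right) |J| \, dy \\
&= \prod_{i=1}^{d} \int_{\mathbb{R}} \exp\left( -\frac{1}{2} \lambda_i^{-1} y_i^2 \right) dy_i \\
&= (2\pi)^{d/2} \prod_{i=1}^{d} \lambda_i^{1/2} = (2\pi)^{d/2} \sqrt{\det(\Sigma)},
\end{align*}

from which we get that

\[
Z = (2\pi)^{d/2} \sqrt{\det(\Sigma)}.
\]

In the above calculation we have used the elementary calculus identity

\[
\int_{\mathbb{R}} e^{-\alpha \frac{x^2}{2}} \, dx = \sqrt{\frac{2\pi}{\alpha}}.
\]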

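ii. From the above calculation we have that

\[
\gamma_{\Sigma,b}(x) \, dx = \gamma_{\Sigma,b}(B^T y + b) \, dy = \frac{1}{(2\pi)^{d/2} \sqrt{\det(\Sigma)}} \prod_{i=1}^{d} \exp\left( -\frac{1}{2} \lambda_i^{-1} y_i^2 \right) dy_i.
\]

Consequently

\begin{align*}
EX &= \int_{\mathbb{R}^d} x \, \gamma_{\Sigma,b}(x) \, dx \\
&= \int_{\mathbb{R}^d} (B^T y + b) \, \gamma_{\Sigma,b}(B^T y + b) \, dy \\
&= b \int_{\mathbb{R}^d} \gamma_{\Sigma,b}(B^T y + b) \, dy = b,
\end{align*}

where the term involving $B^T y$ vanishes because the density is an even function of y.

We note that, since $\Sigma^{-1} = B^T \Lambda^{-1} B$, we have that $\Sigma = B^T \Lambda B$. Furthermore, $z = B^T y$. We calculate

\begin{align*}
E\big( (X_i - b_i)(X_j - b_j) \big)
&= \int_{\mathbb{R}^d} z_i z_j \, \gamma_{\Sigma,b}(z + b) \, dz \\
&= \frac{1}{(2\pi)^{d/2} \sqrt{\det(\Sigma)}} \int_{\mathbb{R}^d} \sum_{k} B_{ki} y_k \sum_{m} B_{mj} y_m \exp\left( -\frac{1}{2} \sum_{\ell} \lambda_\ell^{-1} y_\ell^2 \right) dy \\
&= \frac{1}{(2\pi)^{d/2} \sqrt{\det(\Sigma)}} \sum_{k,m} B_{ki} B_{mj} \int_{\mathbb{R}^d} y_k y_m \exp\left( -\frac{1}{2} \sum_{\ell} \lambda_\ell^{-1} y_\ell^2 \right) dy \\
&= \sum_{k,m} B_{ki} B_{mj} \lambda_k \delta_{km} \\
&= \Sigma_{ij}.
\end{align*}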

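iii. Let Y be a multivariate Gaussian random variable with mean 0 and covariance I. Let also $C = B^T \sqrt{\Lambda}$. We have that $\Sigma = B^T \Lambda B = C C^T$. We have that

\[
X = CY + b.
\]

To see this, we first note that X is Gaussian since it is given through a linear transformation of a Gaussian random variable. Furthermore,

\[
EX = b \quad \text{and} \quad E\big( (X_i - b_i)(X_j - b_j) \big) = (C C^T)_{ij} = \Sigma_{ij}.
\]

Now we have:

\begin{align*}
\phi(t) &= E e^{i \langle X, t \rangle} = e^{i \langle b, t \rangle} E e^{i \langle CY, t \rangle} \\
&= e^{i \langle b, t \rangle} E e^{i \langle Y, C^T t \rangle} \\
&= e^{i \langle b, t \rangle} E e^{i \sum_j \left( \sum_k C_{kj} t_k \right) Y_j} \\
&= e^{i \langle b, t \rangle} e^{-\frac{1}{2} \sum_j \left| \sum_k C_{kj} t_k \right|^2} \\
&= e^{i \langle b, t \rangle} e^{-\frac{1}{2} \langle C^T t, C^T t \rangle} \\
&= e^{i \langle b, t \rangle} e^{-\frac{1}{2} \langle t, C C^T t \rangle} \\
&= e^{i \langle b, t \rangle} e^{-\frac{1}{2} \langle t, \Sigma t \rangle}.
\end{align*}

Consequently,

\[
\phi(t) = e^{i \langle b, t \rangle - \frac{1}{2} \langle t, \Sigma t \rangle}.
\]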
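The representation X = CY + b with Σ = CCᵀ used in part iii also gives a practical way to sample a multivariate Gaussian. The sketch below is a minimal illustration in dimension two. It uses the lower-triangular Cholesky factor for C rather than the spectral factor of the proof (any C with CCᵀ = Σ works), and the values of b and Σ are arbitrary assumptions chosen for the example.

```python
import math
import random

# Illustrative parameters (not from the text): mean b and covariance Sigma in R^2.
b = (1.0, -1.0)
Sigma = ((2.0, 0.6), (0.6, 1.0))

# Lower-triangular Cholesky factor C with Sigma = C C^T, written out for d = 2.
c11 = math.sqrt(Sigma[0][0])
c21 = Sigma[1][0] / c11
c22 = math.sqrt(Sigma[1][1] - c21 * c21)

rng = random.Random(1)          # fixed seed so the run is reproducible
n = 200_000
samples = []
for _ in range(n):
    y1, y2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)   # Y ~ N(0, I)
    samples.append((c11 * y1 + b[0], c21 * y1 + c22 * y2 + b[1]))  # X = C Y + b

mean = [sum(s[i] for s in samples) / n for i in range(2)]
cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
        for j in range(2)] for i in range(2)]
print(mean, cov)
```

The sample mean and sample covariance should be close to b and Σ, with a statistical error of order n^(-1/2).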


2.7 Types of Convergence and Limit Theorems

One of the most important aspects of the theory of random variables is the study of limit theorems for sums of random variables. The most well known limit theorems in probability theory are the law of large numbers and the central limit theorem. There are various different types of convergence for sequences of random variables. We list the most important types of convergence below.

Definition 2.7.1. Let $\{Z_n\}_{n=1}^{\infty}$ be a sequence of random variables. We will say that

(a) $Z_n$ converges to Z with probability one if

\[
P\left( \lim_{n \to +\infty} Z_n = Z \right) = 1.
\]

(b) $Z_n$ converges to Z in probability if for every ε > 0

\[
\lim_{n \to +\infty} P\big( |Z_n - Z| > \varepsilon \big) = 0.
\]

(c) $Z_n$ converges to Z in $L^p$ if

\[
\lim_{n \to +\infty} E\big[ |Z_n - Z|^p \big] = 0.
\]

(d) Let $F_n(\lambda)$, n = 1, 2, …, and F(λ) be the distribution functions of $Z_n$, n = 1, 2, …, and Z, respectively. Then $Z_n$ converges to Z in distribution if

\[
\lim_{n \to +\infty} F_n(\lambda) = F(\lambda)
\]

for all λ ∈ R at which F is continuous.

Recall that the distribution function FX of a random variable from a probability space (Ω,F ,P)

to R induces a probability measure on R and that (R,B(R), FX) is a probability space. We can

show that the convergence in distribution is equivalent to the weak convergence of the probability

measures induced by the distribution functions.

Definition 2.7.2. Let (E, d) be a metric space, B(E) the σ−algebra of its Borel sets, Pn a sequence

of probability measures on (E,B(E)) and let Cb(E) denote the space of bounded continuous

functions on E. We will say that the sequence of Pn converges weakly to the probability measure

P if, for each f ∈ Cb(E),

\[
\lim_{n \to +\infty} \int_{E} f(x) \, dP_n(x) = \int_{E} f(x) \, dP(x).
\]

Theorem 2.7.3. Let $F_n(\lambda)$, n = 1, 2, …, and F(λ) be the distribution functions of $Z_n$, n = 1, 2, …, and Z, respectively. Then $Z_n$ converges to Z in distribution if and only if, for all $g \in C_b(\mathbb{R})$,

\[
\lim_{n \to +\infty} \int_{\mathbb{R}} g(x) \, dF_n(x) = \int_{\mathbb{R}} g(x) \, dF(x). \tag{2.10}
\]

Notice that (2.10) is equivalent to

\[
\lim_{n \to +\infty} E_n g(Z_n) = E g(Z),
\]

where $E_n$ and E denote the expectations with respect to $F_n$ and F, respectively.

When the sequence of random variables whose convergence we are interested in takes values in $\mathbb{R}^d$ or, more generally, a metric space (E, d) then we can use weak convergence of the sequence of probability measures induced by the sequence of random variables to define convergence in distribution.

Definition 2.7.4. A sequence of random variables $X_n$ defined on probability spaces $(\Omega_n, \mathcal{F}_n, P_n)$ and taking values in a metric space (E, d) is said to converge in distribution if the induced measures $F_n(B) = P_n(X_n \in B)$ for $B \in \mathcal{B}(E)$ converge weakly to a probability measure P.

Let $\{X_n\}_{n=1}^{\infty}$ be iid random variables with $EX_n = V$. Then, the strong law of large numbers states that the average of the sum of the iid random variables converges to V with probability one:

\[
P\left( \lim_{N \to +\infty} \frac{1}{N} \sum_{n=1}^{N} X_n = V \right) = 1.
\]

The strong law of large numbers provides us with information about the behavior of a sum of random variables (or, a large number of repetitions of the same experiment) on average. We can also study fluctuations around the average behavior. Indeed, let $E(X_n - V)^2 = \sigma^2$. Define the centered iid random variables $Y_n = X_n - V$. Then, the sequence of random variables $\frac{1}{\sigma \sqrt{N}} \sum_{n=1}^{N} Y_n$ converges in distribution to a N(0, 1) random variable:

\[
\lim_{N \to +\infty} P\left( \frac{1}{\sigma \sqrt{N}} \sum_{n=1}^{N} Y_n \le a \right) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} x^2} \, dx.
\]

This is the central limit theorem.
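Both limit theorems are easy to observe numerically. The following sketch assumes iid Uniform(0, 1) variables, for which V = 1/2 and σ² = 1/12. The distribution, the sample sizes and the threshold a are illustrative choices, not prescribed by the text.

```python
import math
import random

rng = random.Random(2)          # fixed seed so the run is reproducible

def sample_mean(n):
    """(1/N) sum X_n for N iid Uniform(0, 1) variables (mean V = 1/2)."""
    return sum(rng.random() for _ in range(n)) / n

def normalized_sum(n):
    """(1 / (sigma sqrt(N))) sum (X_n - V), with sigma^2 = 1/12 for Uniform(0, 1)."""
    sigma = math.sqrt(1.0 / 12.0)
    return sum(rng.random() - 0.5 for _ in range(n)) / (sigma * math.sqrt(n))

# Law of large numbers: the error of the sample mean shrinks as N grows.
lln_errors = {n: abs(sample_mean(n) - 0.5) for n in (100, 10_000, 1_000_000)}

# Central limit theorem: compare P(S_N <= a) with the standard normal CDF.
a, n_terms, n_rep = 1.0, 50, 20_000
empirical = sum(normalized_sum(n_terms) <= a for _ in range(n_rep)) / n_rep
normal_cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
print(lln_errors, empirical, normal_cdf)
```

The error of the sample mean is typically of order σ/√N, which is the scale of the fluctuations described by the central limit theorem.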



2.8 Discussion and Bibliography

The material of this chapter is very standard and can be found in many books on probability theory.

Well known textbooks on probability theory are [8, 23, 24, 56, 57, 48, 90].

The connection between conditional expectation and orthogonal projections is discussed in [13].

The reduced distribution functions defined in Section 2.3 are used extensively in statistical

mechanics. A different normalization is usually used in physics textbooks. See for instance [2,

Sec. 4.2].

The calculations presented in Section 2.6 are essentially an exercise in linear algebra. See [53,

Sec. 10.2].

Random variables and probability measures can also be defined in infinite dimensions. More

information can be found in [75, Ch. 2].

The study of limit theorems is one of the cornerstones of probability theory and of the theory

of stochastic processes. A comprehensive study of limit theorems can be found in [43].

2.9 Exercises

1. Show that the intersection of a family of σ-algebras is a σ-algebra.

2. Prove the law of total probability, Proposition 2.2.13.

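3. Calculate the mean, variance and characteristic function of the following probability density functions.

(a) The exponential distribution with density

\[
f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0, \\ 0, & x < 0, \end{cases}
\]

with λ > 0.

(b) The uniform distribution with density

\[
f(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b, \\ 0, & x \notin (a, b), \end{cases}
\]

with a < b.

(c) The Gamma distribution with density

\[
f(x) = \begin{cases} \dfrac{\lambda}{\Gamma(\alpha)} (\lambda x)^{\alpha - 1} e^{-\lambda x}, & x > 0, \\ 0, & x < 0, \end{cases}
\]

with λ > 0, α > 0 and Γ(α) is the Gamma function

\[
\Gamma(\alpha) = \int_{0}^{\infty} \xi^{\alpha - 1} e^{-\xi} \, d\xi, \qquad \alpha > 0.
\]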

4. Let X and Y be independent random variables with distribution functions $F_X$ and $F_Y$. Show that the distribution function of the sum Z = X + Y is the convolution of $F_X$ and $F_Y$:

\[
F_Z(x) = \int F_X(x - y) \, dF_Y(y).
\]

5. Let X and Y be Gaussian random variables. Show that they are uncorrelated if and only if they

are independent.

6. (a) Let X be a continuous random variable with characteristic function φ(t). Show that

\[
EX^k = \frac{1}{i^k} \phi^{(k)}(0),
\]

where $\phi^{(k)}(t)$ denotes the k-th derivative of φ evaluated at t.

(b) Let X be a nonnegative random variable with distribution function F(x). Show that

\[
E(X) = \int_{0}^{+\infty} (1 - F(x)) \, dx.
\]

(c) Let X be a continuous random variable with probability density function f(x) and characteristic function φ(t). Find the probability density and characteristic function of the random variable Y = aX + b with a, b ∈ R.

(d) Let X be a random variable with uniform distribution on [0, 2π]. Find the probability density of the random variable Y = sin(X).

7. Let X be a discrete random variable taking values on the set of nonnegative integers with probability mass function $p_k = P(X = k)$ with $p_k \ge 0$, $\sum_{k=0}^{+\infty} p_k = 1$. The generating function is defined as

\[
g(s) = E(s^X) = \sum_{k=0}^{+\infty} p_k s^k.
\]

(a) Show that

\[
EX = g'(1) \quad \text{and} \quad EX^2 = g''(1) + g'(1),
\]

where the prime denotes differentiation.

(b) Calculate the generating function of the Poisson random variable with

\[
p_k = P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 0, 1, 2, \dots \text{ and } \lambda > 0.
\]

(c) Prove that the generating function of a sum of independent nonnegative integer valued

random variables is the product of their generating functions.

8. Write a computer program for studying the law of large numbers and the central limit theorem.

Investigate numerically the rate of convergence of these two theorems.

9. Study the properties of Gaussian measures on separable Hilbert spaces from [75, Ch. 2].

10. Prove Theorem 2.4.1.


Chapter 3

Basics of the Theory of Stochastic Processes

3.1 Introduction

In this chapter we present some basic results from the theory of stochastic processes and we investigate the properties of some of the standard stochastic processes in continuous time. In Section 3.2 we give the definition of a stochastic process. In Section 3.3 we present some properties of stationary stochastic processes. In Section 3.4 we introduce Brownian motion and study some of its properties. Various examples of stochastic processes in continuous time are presented in Section 3.5. The Karhunen-Loeve expansion, one of the most useful tools for representing stochastic processes and random fields, is presented in Section 3.6. Further discussion and bibliographical comments are presented in Section 3.7. Section 3.8 contains exercises.

3.2 Definition of a Stochastic Process

Stochastic processes describe dynamical systems whose evolution law is of probabilistic nature.

The precise definition is given below.

Definition 3.2.1. Let T be an ordered set, (Ω, F, P) a probability space and (E, G) a measurable space. A stochastic process is a collection of random variables $X = \{X_t; \, t \in T\}$ where, for each fixed t ∈ T, $X_t$ is a random variable from (Ω, F, P) to (E, G). Ω is called the sample space and E is the state space of the stochastic process $X_t$.

The set T can be either discrete, for example the set of positive integers Z+, or continuous,

T = [0,+∞). The state space E will usually be Rd equipped with the σ–algebra of Borel sets.


A stochastic process X may be viewed as a function of both t ∈ T and ω ∈ Ω. We will

sometimes write X(t), X(t, ω) or Xt(ω) instead of Xt. For a fixed sample point ω ∈ Ω, the

function Xt(ω) : T → E is called a sample path (realization, trajectory) of the process X.

Definition 3.2.2. The finite dimensional distributions (fdd) of a stochastic process are the distributions of the $E^k$–valued random variables $(X(t_1), X(t_2), \dots, X(t_k))$ for arbitrary positive integer k and arbitrary times $t_i \in T$, $i \in \{1, \dots, k\}$:

\[
F(x) = P\big( X(t_i) \le x_i, \; i = 1, \dots, k \big)
\]

with $x = (x_1, \dots, x_k)$.

From experiments or numerical simulations we can only obtain information about the finite dimensional distributions of a process. A natural question arises: are the finite dimensional distributions of a stochastic process sufficient to determine a stochastic process uniquely? This is true for processes with continuous paths.¹ This is the class of stochastic processes that we will study in these notes.

Definition 3.2.3. We will say that two processes Xt and Yt are equivalent if they have the same finite dimensional distributions.

Definition 3.2.4. A one dimensional Gaussian process is a continuous time stochastic process for which E = R and all the finite dimensional distributions are Gaussian, i.e. every finite dimensional vector $(X_{t_1}, X_{t_2}, \dots, X_{t_k})$ is a $N(\mu_k, K_k)$ random variable for some vector $\mu_k$ and a symmetric nonnegative definite matrix $K_k$ for all k = 1, 2, … and for all $t_1, t_2, \dots, t_k$.

From the above definition we conclude that the finite dimensional distributions of a Gaussian continuous time stochastic process are Gaussian with PDF (whenever $K_k$ is invertible)

\[
\gamma_{\mu_k, K_k}(x) = (2\pi)^{-k/2} (\det K_k)^{-1/2} \exp\left[ -\frac{1}{2} \langle K_k^{-1}(x - \mu_k), x - \mu_k \rangle \right],
\]

where $x = (x_1, x_2, \dots, x_k)$.

It is straightforward to extend the above definition to arbitrary dimensions. A Gaussian process x(t) is characterized by its mean

\[
m(t) := E x(t)
\]

and the covariance (or autocorrelation) matrix

\[
C(t, s) = E\Big( \big( x(t) - m(t) \big) \otimes \big( x(s) - m(s) \big) \Big).
\]

Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process.

¹ In fact, what we need is the stochastic process to be separable. See the discussion in Section 3.7.

3.3 Stationary Processes

3.3.1 Strictly Stationary Processes

In many stochastic processes that appear in applications, the statistics remain invariant under time translations. Such stochastic processes are called stationary. It is possible to develop a quite general theory for stochastic processes that enjoy this symmetry property.

Definition 3.3.1. A stochastic process is called (strictly) stationary if all finite dimensional distributions are invariant under time translation: for any integer k and times $t_i \in T$, the distribution of $(X(t_1), X(t_2), \dots, X(t_k))$ is equal to that of $(X(s + t_1), X(s + t_2), \dots, X(s + t_k))$ for any s such that $s + t_i \in T$ for all $i \in \{1, \dots, k\}$. In other words,

\[
P(X_{t_1 + t} \in A_1, X_{t_2 + t} \in A_2, \dots, X_{t_k + t} \in A_k) = P(X_{t_1} \in A_1, X_{t_2} \in A_2, \dots, X_{t_k} \in A_k), \qquad \forall t \in T.
\]

Example 3.3.2. Let $Y_0, Y_1, \dots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$. Then $X_n$ is a strictly stationary process (see Exercise 1). Assume furthermore that $EY_0 = \mu < +\infty$. Then, by the strong law of large numbers, we have that

\[
\frac{1}{N} \sum_{j=0}^{N-1} X_j = \frac{1}{N} \sum_{j=0}^{N-1} Y_j \to EY_0 = \mu,
\]

almost surely. In fact, Birkhoff's ergodic theorem states that, for any function f such that $E|f(Y_0)| < +\infty$, we have that

\[
\lim_{N \to +\infty} \frac{1}{N} \sum_{j=0}^{N-1} f(X_j) = E f(Y_0), \tag{3.1}
\]

almost surely. The sequence of iid random variables is an example of an ergodic strictly stationary process.


Ergodic strictly stationary processes satisfy (3.1). Hence, we can calculate the statistics of a stochastic process $X_n$ using a single sample path, provided that it is long enough ($N \gg 1$).

Example 3.3.3. Let Z be a random variable and define the stochastic process $X_n = Z$, n = 0, 1, 2, …. Then $X_n$ is a strictly stationary process (see Exercise 2). We can calculate the long time average of this stochastic process:

\[
\frac{1}{N} \sum_{j=0}^{N-1} X_j = \frac{1}{N} \sum_{j=0}^{N-1} Z = Z,
\]

which is independent of N and does not converge to the mean of the stochastic process $EX_n = EZ$ (assuming that it is finite), or any other deterministic number. This is an example of a non-ergodic process.
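The contrast between Examples 3.3.2 and 3.3.3 can be seen directly by computing time averages along individual sample paths. The sketch below assumes standard normal variables for both $Y_n$ and Z (an illustrative choice, not from the text), so that the ensemble mean is 0 in both cases.

```python
import random

rng = random.Random(3)          # fixed seed so the run is reproducible
N = 100_000

def time_average(path):
    """(1/N) sum_{j=0}^{N-1} X_j along a single sample path."""
    return sum(path) / len(path)

# Example 3.3.2: X_n = Y_n with Y_n iid N(0, 1), so E Y_0 = 0.
iid_averages = [time_average([rng.gauss(0.0, 1.0) for _ in range(N)]) for _ in range(3)]

# Example 3.3.3: X_n = Z for all n, with a single draw Z ~ N(0, 1) per path.
const_averages = []
for _ in range(3):
    z = rng.gauss(0.0, 1.0)
    const_averages.append((z, time_average([z] * N)))

print(iid_averages)    # each entry is close to the ensemble mean 0
print(const_averages)  # each time average reproduces its own Z, not E Z = 0
```

For the iid process every path yields nearly the same time average, whereas for the constant process the time average depends on the path, however long it is.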

3.3.2 Second Order Stationary Processes

Let (Ω, F, P) be a probability space. Let $X_t$, t ∈ T (with T = R or Z) be a real-valued random process on this probability space with finite second moment, $E|X_t|^2 < +\infty$ (i.e. $X_t \in L^2(\Omega, P)$ for all t ∈ T). Assume that it is strictly stationary. Then,

\[
E(X_{t+s}) = EX_t, \qquad s \in T \tag{3.2}
\]

from which we conclude that $EX_t$ is constant, say µ, and

\[
E\big( (X_{t_1 + s} - \mu)(X_{t_2 + s} - \mu) \big) = E\big( (X_{t_1} - \mu)(X_{t_2} - \mu) \big), \qquad s \in T \tag{3.3}
\]

from which we conclude that the covariance or autocorrelation or correlation function $C(t, s) = E((X_t - \mu)(X_s - \mu))$ depends only on the difference between the two times, t and s, i.e. C(t, s) = C(t − s). This motivates the following definition.

Definition 3.3.4. A stochastic process Xt ∈ L2 is called second-order stationary or wide-sense

stationary or weakly stationary if the first moment EXt is a constant and the covariance function

E(Xt − µ)(Xs − µ) depends only on the difference t− s:

EXt = µ, E((Xt − µ)(Xs − µ)) = C(t− s).


The constant µ is the expectation of the process $X_t$. Without loss of generality, we can set µ = 0, since if $EX_t = \mu$ then the process $Y_t = X_t - \mu$ is mean zero. A mean zero process will be called a centered process. The function C(t) is the covariance (sometimes also called autocovariance) or the autocorrelation function of $X_t$. Notice that $C(t) = E(X_t X_0)$, whereas $C(0) = E(X_t^2)$, which is finite, by assumption. Since we have assumed that $X_t$ is a real valued process, we have that C(t) = C(−t), t ∈ R.

Remark 3.3.5. Let Xt be a strictly stationary stochastic process with finite second moment (i.e.

Xt ∈ L2). The definition of strict stationarity implies that EXt = µ, a constant, and E((Xt −µ)(Xs − µ)) = C(t − s). Hence, a strictly stationary process with finite second moment is also

stationary in the wide sense. The converse is not true.

Example 3.3.6. Let $Y_0, Y_1, \dots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$. From Example 3.3.2 we know that this is a strictly stationary process, irrespective of whether $Y_0$ is such that $EY_0^2 < +\infty$. Assume now that $EY_0 = 0$ and $EY_0^2 = \sigma^2 < +\infty$. Then $X_n$ is a second order stationary process with mean zero and correlation function $R(k) = \sigma^2 \delta_{k0}$. Notice that in this case we have no correlation between the values of the stochastic process at different times n and k.

Example 3.3.7. Let Z be a single random variable and consider the stochastic process $X_n = Z$, n = 0, 1, 2, …. From Example 3.3.3 we know that this is a strictly stationary process irrespective of whether $E|Z|^2 < +\infty$ or not. Assume now that EZ = 0, $EZ^2 = \sigma^2$. Then $X_n$ becomes a second order stationary process with $R(k) = \sigma^2$. Notice that in this case the values of our stochastic process at different times are strongly correlated.

We will see in Section 3.3.3 that for second order stationary processes, ergodicity is related to

fast decay of correlations. In the first of the examples above, there was no correlation between our

stochastic processes at different times and the stochastic process is ergodic. On the contrary, in our

second example there is very strong correlation between the stochastic process at different times

and this process is not ergodic.

Remark 3.3.8. The first two moments of a Gaussian process are sufficient for a complete charac-

terization of the process. Consequently, a Gaussian stochastic process is strictly stationary if and

only if it is weakly stationary.


Continuity properties of the covariance function are equivalent to continuity properties of the

paths of Xt in the L2 sense, i.e.

\[
\lim_{h \to 0} E|X_{t+h} - X_t|^2 = 0.
\]

Lemma 3.3.9. Assume that the covariance function C(t) of a second order stationary process is

continuous at t = 0. Then it is continuous for all t ∈ R. Furthermore, the continuity of C(t) is

equivalent to the continuity of the process Xt in the L2-sense.

Proof. Fix t ∈ R and (without loss of generality) set $EX_t = 0$. We calculate:

\begin{align*}
|C(t + h) - C(t)|^2 &= |E(X_{t+h} X_0) - E(X_t X_0)|^2 = |E((X_{t+h} - X_t) X_0)|^2 \\
&\le E(X_0^2) \, E(X_{t+h} - X_t)^2 \\
&= C(0) \big( EX_{t+h}^2 + EX_t^2 - 2 E X_t X_{t+h} \big) \\
&= 2 C(0) \big( C(0) - C(h) \big) \to 0,
\end{align*}

as h → 0. Thus, continuity of C(·) at 0 implies continuity for all t.

Assume now that C(t) is continuous. From the above calculation we have

\[
E|X_{t+h} - X_t|^2 = 2 \big( C(0) - C(h) \big), \tag{3.4}
\]

which converges to 0 as h → 0. Conversely, assume that $X_t$ is $L^2$-continuous. Then, from the above equation we get $\lim_{h \to 0} C(h) = C(0)$.

Notice that from (3.4) we immediately conclude that $C(0) \ge C(h)$, h ∈ R.

The Fourier transform of the covariance function of a second order stationary process always

exists. This enables us to study second order stationary processes using tools from Fourier analysis.

To make the link between second order stationary processes and Fourier analysis we will use Bochner's theorem, which applies to all nonnegative definite functions.

Definition 3.3.10. A function f(x) : R → R is called nonnegative definite if

\[
\sum_{i,j=1}^{n} f(t_i - t_j) \, c_i \bar{c}_j \ge 0 \tag{3.5}
\]

for all n ∈ N, $t_1, \dots, t_n \in \mathbb{R}$, $c_1, \dots, c_n \in \mathbb{C}$.


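Lemma 3.3.11. The covariance function of a second order stationary process is a nonnegative definite function.

Proof. We will use the notation $X_t^c := \sum_{i=1}^{n} X_{t_i} c_i$. We have

\begin{align*}
\sum_{i,j=1}^{n} C(t_i - t_j) \, c_i \bar{c}_j &= \sum_{i,j=1}^{n} E X_{t_i} X_{t_j} \, c_i \bar{c}_j \\
&= E\left( \sum_{i=1}^{n} X_{t_i} c_i \sum_{j=1}^{n} X_{t_j} \bar{c}_j \right) = E\left( X_t^c \, \overline{X_t^c} \right) \\
&= E|X_t^c|^2 \ge 0.
\end{align*}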

Theorem 3.3.12. (Bochner) Let C(t) be a continuous positive definite function. Then there exists

a unique nonnegative measure ρ on R such that ρ(R) = C(0) and

\[
C(t) = \int_{\mathbb{R}} e^{ixt} \, \rho(dx) \qquad \forall t \in \mathbb{R}. \tag{3.6}
\]

Definition 3.3.13. Let Xt be a second order stationary process with autocorrelation function C(t)

whose Fourier transform is the measure ρ(dx). The measure ρ(dx) is called the spectral measure

of the process Xt.

In the following we will assume that the spectral measure is absolutely continuous with respect to the Lebesgue measure on R with density f(x), i.e. ρ(dx) = f(x)dx. The Fourier transform f(x) of the covariance function is called the spectral density of the process:

\[
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} C(t) \, dt.
\]

From (3.6) it follows that the autocorrelation function of a mean zero, second order stationary process is given by the inverse Fourier transform of the spectral density:

\[
C(t) = \int_{-\infty}^{\infty} e^{itx} f(x) \, dx. \tag{3.7}
\]

There are various cases where the experimentally measured quantity is the spectral density (or power spectrum) of a stationary stochastic process. Conversely, from a time series of observations of a stationary process we can calculate the autocorrelation function and, by inverting (3.7), the spectral density.

The autocorrelation function of a second order stationary process enables us to associate a time scale to $X_t$, the correlation time $\tau_{cor}$:

\[
\tau_{cor} = \frac{1}{C(0)} \int_{0}^{\infty} C(\tau) \, d\tau = \int_{0}^{\infty} E(X_\tau X_0) / E(X_0^2) \, d\tau.
\]

The slower the decay of the correlation function, the larger the correlation time is. Notice that when the correlations do not decay sufficiently fast, so that C(t) is not integrable, then the correlation time will be infinite.

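Example 3.3.14. Consider a mean zero, second order stationary process with correlation function

\[
R(t) = R(0) e^{-\alpha |t|} \tag{3.8}
\]

where α > 0. We will write $R(0) = \frac{D}{\alpha}$ where D > 0. The spectral density of this process is:

\begin{align*}
f(x) &= \frac{1}{2\pi} \frac{D}{\alpha} \int_{-\infty}^{+\infty} e^{-ixt} e^{-\alpha |t|} \, dt \\
&= \frac{1}{2\pi} \frac{D}{\alpha} \left( \int_{-\infty}^{0} e^{-ixt} e^{\alpha t} \, dt + \int_{0}^{+\infty} e^{-ixt} e^{-\alpha t} \, dt \right) \\
&= \frac{1}{2\pi} \frac{D}{\alpha} \left( \frac{1}{-ix + \alpha} + \frac{1}{ix + \alpha} \right) \\
&= \frac{D}{\pi} \frac{1}{x^2 + \alpha^2}.
\end{align*}

This function is called the Cauchy or the Lorentz distribution. The correlation time is (we have that R(0) = D/α)

\[
\tau_{cor} = \int_{0}^{\infty} e^{-\alpha t} \, dt = \alpha^{-1}.
\]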
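The closed-form spectral density and the correlation time of this example can be checked by numerical quadrature. The sketch below is a rough check rather than a production routine: the values of α and D, the truncation of the integral at a finite time and the midpoint rule are all assumptions made for the illustration.

```python
import math

alpha, D = 2.0, 3.0           # illustrative parameter values

def R(t):
    """Correlation function (3.8) with R(0) = D / alpha."""
    return (D / alpha) * math.exp(-alpha * abs(t))

def spectral_density_numeric(x, t_max=20.0, n=200_000):
    """(1/(2 pi)) * integral of exp(-i x t) R(t) dt by the midpoint rule.

    R is even, so the integral equals (1/pi) * int_0^inf cos(x t) R(t) dt;
    the tail beyond t_max is negligible for these parameters.
    """
    h = t_max / n
    total = sum(math.cos(x * (k + 0.5) * h) * R((k + 0.5) * h) for k in range(n))
    return total * h / math.pi

def spectral_density_exact(x):
    """Closed form from the example: (D / pi) / (x^2 + alpha^2)."""
    return (D / math.pi) / (x * x + alpha * alpha)

for x in (0.0, 1.0, 5.0):
    print(x, spectral_density_numeric(x), spectral_density_exact(x))

# Correlation time: (1 / R(0)) * int_0^inf R(t) dt = 1 / alpha.
h = 20.0 / 200_000
tau_cor = sum(R((k + 0.5) * h) for k in range(200_000)) * h / R(0.0)
print(tau_cor, 1.0 / alpha)
```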

A Gaussian process with an exponential correlation function is of particular importance in the

theory and applications of stochastic processes.

Definition 3.3.15. A real-valued Gaussian stationary process defined on R with correlation func-

tion given by (3.8) is called the (stationary) Ornstein-Uhlenbeck process.

The Ornstein-Uhlenbeck process is used as a model for the velocity of a Brownian particle. It is of interest to calculate the statistics of the position of the Brownian particle, i.e. of the integral

\[
X(t) = \int_{0}^{t} Y(s) \, ds, \tag{3.9}
\]

where Y(t) denotes the stationary OU process.


Lemma 3.3.16. Let Y(t) denote the stationary OU process with covariance function (3.8) and set α = D = 1. Then the position process (3.9) is a mean zero Gaussian process with covariance function

\[
E(X(t) X(s)) = 2 \min(t, s) + e^{-\min(t, s)} + e^{-\max(t, s)} - e^{-|t - s|} - 1. \tag{3.10}
\]
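Formula (3.10) can be verified numerically from the representation $E(X(t)X(s)) = \int_0^t \int_0^s E(Y(u)Y(v)) \, dv \, du$ with $E(Y(u)Y(v)) = e^{-|u - v|}$. The sketch below uses a plain two-dimensional midpoint rule; the grid size and the test points are arbitrary choices, and the accuracy is limited by the kink of the integrand along u = v.

```python
import math

def cov_formula(t, s):
    """Right-hand side of (3.10), valid for alpha = D = 1."""
    lo, hi = min(t, s), max(t, s)
    return 2.0 * lo + math.exp(-lo) + math.exp(-hi) - math.exp(-abs(t - s)) - 1.0

def cov_numeric(t, s, n=400):
    """Midpoint-rule approximation of int_0^t int_0^s exp(-|u - v|) dv du.

    This uses E(X(t) X(s)) = int_0^t int_0^s E(Y(u) Y(v)) dv du together with
    E(Y(u) Y(v)) = exp(-|u - v|), the correlation function (3.8) for alpha = D = 1.
    """
    hu, hv = t / n, s / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * hu
        for j in range(n):
            v = (j + 0.5) * hv
            total += math.exp(-abs(u - v))
    return total * hu * hv

for t, s in ((1.0, 1.0), (2.0, 1.0), (0.5, 3.0)):
    print(t, s, cov_numeric(t, s), cov_formula(t, s))
```

For t = s the formula reduces to $2t - 2 + 2e^{-t}$, which grows linearly for large t, in line with the diffusive behavior of the position process.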


3.3.3 Ergodic Properties of Second-Order Stationary Processes

Second order stationary processes have nice ergodic properties, provided that the correlation be-

tween values of the process at different times decays sufficiently fast. In this case, it is possible to

show that we can calculate expectations by calculating time averages. An example of such a result

is the following.

Theorem 3.3.17. Let $\{X_t\}_{t \ge 0}$ be a second order stationary process on a probability space (Ω, F, P) with mean µ and covariance R(t), and assume that $R(t) \in L^1(0, +\infty)$. Then

\[
\lim_{T \to +\infty} E\left| \frac{1}{T} \int_{0}^{T} X(s) \, ds - \mu \right|^2 = 0. \tag{3.11}
\]

For the proof of this result we will first need an elementary lemma.

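Lemma 3.3.18. Let R(t) be an integrable symmetric function. Then

\[
\int_{0}^{T} \int_{0}^{T} R(t - s) \, dt \, ds = 2 \int_{0}^{T} (T - s) R(s) \, ds. \tag{3.12}
\]

Proof. We make the change of variables u = t − s, v = t + s. The domain of integration in the t, s variables is [0, T] × [0, T]. In the u, v variables, u ranges over [−T, T] and, for fixed u, v ranges over the interval [|u|, 2T − |u|] of length 2(T − |u|). Since the integrand does not depend on v, we may shift this interval to [0, 2(T − |u|)]. The Jacobian of the transformation is

\[
J = \frac{\partial(t, s)}{\partial(u, v)} = \frac{1}{2}.
\]

The integral becomes

\begin{align*}
\int_{0}^{T} \int_{0}^{T} R(t - s) \, dt \, ds &= \int_{-T}^{T} \int_{0}^{2(T - |u|)} R(u) \, J \, dv \, du \\
&= \int_{-T}^{T} (T - |u|) R(u) \, du \\
&= 2 \int_{0}^{T} (T - u) R(u) \, du,
\end{align*}

where the symmetry of the function R(u) was used in the last step.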


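Proof of Theorem 3.3.17. We use Lemma 3.3.18 to calculate:

\begin{align*}
E\left| \frac{1}{T} \int_{0}^{T} X_s \, ds - \mu \right|^2 &= \frac{1}{T^2} E\left| \int_{0}^{T} (X_s - \mu) \, ds \right|^2 \\
&= \frac{1}{T^2} E \int_{0}^{T} \int_{0}^{T} (X(t) - \mu)(X(s) - \mu) \, dt \, ds \\
&= \frac{1}{T^2} \int_{0}^{T} \int_{0}^{T} R(t - s) \, dt \, ds \\
&= \frac{2}{T^2} \int_{0}^{T} (T - u) R(u) \, du \\
&\le \frac{2}{T} \int_{0}^{T} \left| \left( 1 - \frac{u}{T} \right) R(u) \right| du \le \frac{2}{T} \int_{0}^{+\infty} |R(u)| \, du \to 0,
\end{align*}

using the dominated convergence theorem and the assumption $R(\cdot) \in L^1$.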

Assume that µ = 0 and define

\[
D = \int_{0}^{+\infty} R(t) \, dt, \tag{3.13}
\]

which, from our assumption on R(t), is a finite quantity.² The above calculation suggests that, for $T \gg 1$, we have that

\[
E\left( \int_{0}^{T} X(t) \, dt \right)^2 \approx 2DT.
\]

This implies that, at sufficiently long times, the mean square displacement of the integral of the ergodic second order stationary process $X_t$ scales linearly in time, with proportionality coefficient 2D.

Assume that $X_t$ is the velocity of a (Brownian) particle. In this case, the integral of $X_t$

\[
Z_t = \int_{0}^{t} X_s \, ds,
\]

represents the particle position. From our calculation above we conclude that, for $t \gg 1$,

\[
E Z_t^2 \approx 2Dt,
\]

where

\[
D = \int_{0}^{\infty} R(t) \, dt = \int_{0}^{\infty} E(X_t X_0) \, dt \tag{3.14}
\]

is the diffusion coefficient. Thus, one expects that at sufficiently long times and under appropriate

assumptions on the correlation function, the time integral of a stationary process will approximate2Notice however that we do not know whether it is nonzero. This requires a separate argument.

38

a Brownian motion with diffusion coefficient D. The diffusion coefficient is an example of a

transport coefficient and (3.14) is an example of the Green-Kubo formula: a transport coefficient

can be calculated in terms of the time integral of an appropriate autocorrelation function. In the

case of the diffusion coefficient we need to calculate the integral of the velocity autocorrelation

function.
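The long-time scaling can be checked numerically. The following is a minimal sketch, assuming an exponential correlation function $R(u) = r_0 e^{-\alpha u}$ (so that $D = r_0/\alpha$); the helper name `integrated_variance` and the parameter values are illustrative choices, not part of the text. It evaluates $2\int_0^T (T-u)R(u)\,du$ by the trapezoidal rule and compares it with $2DT$.

```python
import numpy as np

def integrated_variance(T, alpha, r0=1.0, n=200_001):
    """Approximate 2 * int_0^T (T - u) R(u) du for R(u) = r0 * exp(-alpha * u)."""
    u = np.linspace(0.0, T, n)
    integrand = (T - u) * r0 * np.exp(-alpha * u)
    du = u[1] - u[0]
    # Composite trapezoidal rule written out by hand.
    return 2.0 * du * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

alpha, r0 = 2.0, 1.0          # assumed illustrative parameters
D = r0 / alpha                # int_0^infty R(t) dt for this choice of R
horizons = (1.0, 10.0, 100.0, 1000.0)
ratios = [integrated_variance(T, alpha, r0) / (2.0 * D * T) for T in horizons]

for T, ratio in zip(horizons, ratios):
    print(f"T = {T:7.1f}   variance / (2 D T) = {ratio:.4f}")
```

For this particular $R$ the ratio equals $1 - (1 - e^{-\alpha T})/(\alpha T)$, so it approaches 1 only like $1/T$: the linear scaling is a genuinely long-time statement.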

Example 3.3.19. Consider the stochastic process with an exponential correlation function from Example 3.3.14, and assume that this stochastic process describes the velocity of a Brownian particle. Since $R(t) \in L^1(0,+\infty)$, Theorem 3.3.17 applies. Furthermore, the diffusion coefficient of the Brownian particle is given by

\[
\int_0^{+\infty} R(t)\,dt = R(0)\,\tau_c^{-1} = \frac{D}{\alpha^2}.
\]

3.4 Brownian Motion

The most important continuous time stochastic process is Brownian motion. Brownian motion is a mean zero, continuous (i.e. it has continuous sample paths: for a.e. $\omega \in \Omega$ the function $X_t$ is a continuous function of time) process with independent Gaussian increments. A process $X_t$ has independent increments if for every sequence $t_0 < t_1 < \dots < t_n$ the random variables

\[
X_{t_1} - X_{t_0},\; X_{t_2} - X_{t_1},\; \dots,\; X_{t_n} - X_{t_{n-1}}
\]

are independent. If, furthermore, for any $t_1, t_2, s \in T$ and Borel set $B \subset \mathbb{R}$

\[
\mathbb{P}(X_{t_2+s} - X_{t_1+s} \in B) = \mathbb{P}(X_{t_2} - X_{t_1} \in B),
\]

then the process $X_t$ has stationary independent increments.

Definition 3.4.1.

• A one dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}$ is a real valued stochastic process such that

i. $W(0) = 0$.

ii. $W(t)$ has independent increments.

iii. For every $t > s \geq 0$, $W(t) - W(s)$ has a Gaussian distribution with mean 0 and variance $t - s$. That is, the density of the random variable $W(t) - W(s)$ is

\[
g(x; t, s) = \big(2\pi(t-s)\big)^{-1/2} \exp\left(-\frac{x^2}{2(t-s)}\right). \tag{3.15}
\]

• A d-dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}^d$ is a collection of $d$ independent one dimensional Brownian motions:

\[
W(t) = (W_1(t), \dots, W_d(t)),
\]

where $W_i(t)$, $i = 1, \dots, d$, are independent one dimensional Brownian motions. The density of the Gaussian random vector $W(t) - W(s)$ is thus

\[
g(x; t, s) = \big(2\pi(t-s)\big)^{-d/2} \exp\left(-\frac{\|x\|^2}{2(t-s)}\right).
\]

Brownian motion is sometimes referred to as the Wiener process.

Brownian motion has continuous paths. More precisely, it has a continuous modification.

Definition 3.4.2. Let $X_t$ and $Y_t$, $t \in T$, be two stochastic processes defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The process $Y_t$ is said to be a modification of $X_t$ if $\mathbb{P}(X_t = Y_t) = 1$ for all $t \in T$.

Lemma 3.4.3. There is a continuous modification of Brownian motion.

This follows from a theorem due to Kolmogorov.

Theorem 3.4.4. (Kolmogorov) Let $X_t$, $t \in [0,\infty)$, be a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Suppose that there are positive constants $\alpha$ and $\beta$, and for each $T > 0$ there is a constant $C(T)$ such that

\[
\mathbb{E}|X_t - X_s|^{\alpha} \leq C(T)\,|t-s|^{1+\beta}, \quad 0 \leq s, t \leq T. \tag{3.16}
\]

Then there exists a continuous modification $Y_t$ of the process $X_t$.

The proof of Lemma 3.4.3 is left as an exercise.

Remark 3.4.5. Equivalently, we could have defined the one dimensional standard Brownian motion as a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with continuous paths for almost all $\omega \in \Omega$, and Gaussian finite dimensional distributions with zero mean and covariance $\mathbb{E}(W_{t_i} W_{t_j}) = \min(t_i, t_j)$. One can then show that Definition 3.4.1 follows from the above definition.

It is possible to prove rigorously the existence of the Wiener process (Brownian motion):

Figure 3.1: Brownian sample paths

Theorem 3.4.6. (Wiener) There exists an almost-surely continuous process $W_t$ with independent increments and $W_0 = 0$, such that for each $t \geq 0$ the random variable $W_t$ is $\mathcal{N}(0, t)$. Furthermore, $W_t$ is almost surely locally Hölder continuous with exponent $\alpha$ for any $\alpha \in (0, \tfrac{1}{2})$.

Notice that Brownian paths are not differentiable.

We can also construct Brownian motion through the limit of an appropriately rescaled random walk: let $X_1, X_2, \dots$ be iid random variables on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with mean 0 and variance 1. Define the discrete time stochastic process $S_n$ with $S_0 = 0$, $S_n = \sum_{j=1}^{n} X_j$, $n \geq 1$. Define now a continuous time stochastic process with continuous paths as the linearly interpolated, appropriately rescaled random walk:

\[
W_t^n = \frac{1}{\sqrt{n}} S_{[nt]} + (nt - [nt]) \frac{1}{\sqrt{n}} X_{[nt]+1},
\]

where $[\cdot]$ denotes the integer part of a number. Then $W_t^n$ converges weakly, as $n \to +\infty$, to a one dimensional standard Brownian motion.
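The construction above can be tried out directly. The sketch below is an illustration only: it assumes symmetric $\pm 1$ steps (one admissible choice of iid variables with mean 0 and variance 1), and the function name `rescaled_walk_at` is hypothetical. It samples $W_t^n$ at a fixed time and compares the empirical mean and variance with the values 0 and $t$ expected for Brownian motion.

```python
import numpy as np

def rescaled_walk_at(t, n, n_paths, rng):
    """Sample W^n_t for the linearly interpolated, rescaled +/-1 random walk."""
    k = int(np.floor(n * t))                                 # integer part [nt]
    steps = rng.choice([-1.0, 1.0], size=(n_paths, k + 1))   # iid, mean 0, variance 1
    s_k = steps[:, :k].sum(axis=1)                           # S_[nt]
    # Linear interpolation between S_[nt] and S_[nt]+1, then rescaling by sqrt(n).
    return (s_k + (n * t - k) * steps[:, k]) / np.sqrt(n)

rng = np.random.default_rng(0)
samples = rescaled_walk_at(t=0.7, n=400, n_paths=5_000, rng=rng)
sample_mean, sample_var = samples.mean(), samples.var()
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f} (target 0 and 0.7)")
```

Matching the first two moments is of course much weaker than weak convergence of the whole path; the sketch only illustrates the scaling by $\sqrt{n}$.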

Brownian motion is a Gaussian process. For the d-dimensional Brownian motion, and for $I$ the $d \times d$ identity matrix, we have (see (2.7) and (2.8))

\[
\mathbb{E} W(t) = 0 \quad \forall t \geq 0
\]

and

\[
\mathbb{E}\big((W(t)-W(s)) \otimes (W(t)-W(s))\big) = (t-s)\,I. \tag{3.17}
\]

Moreover,

\[
\mathbb{E}\big(W(t) \otimes W(s)\big) = \min(t,s)\,I. \tag{3.18}
\]

From the formula for the Gaussian density $g(x; t, s)$, eqn. (3.15), we immediately conclude that $W(t)-W(s)$ and $W(t+u)-W(s+u)$ have the same pdf. Consequently, Brownian motion has stationary increments. Notice, however, that Brownian motion itself is not a stationary process. Since $W(t) = W(t) - W(0)$, the pdf of $W(t)$ is

\[
g(x,t) = \frac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}.
\]

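We can easily calculate all moments of the Brownian motion:

\[
\mathbb{E}\big(W(t)^n\big) = \frac{1}{\sqrt{2\pi t}}\int_{-\infty}^{+\infty} x^n e^{-x^2/2t}\,dx =
\begin{cases}
1 \cdot 3 \cdots (n-1)\, t^{n/2}, & n \text{ even}, \\
0, & n \text{ odd}.
\end{cases}
\]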
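As a sanity check of the moment formula, one can compare it with a numerical evaluation of the Gaussian integral. The sketch below is one way to do this, assuming Gauss-Hermite quadrature from NumPy; the function names `bm_moment_formula` and `bm_moment_quadrature` are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def bm_moment_formula(n, t):
    """1 * 3 * ... * (n - 1) * t^(n/2) for even n, and 0 for odd n."""
    if n % 2 == 1:
        return 0.0
    return float(np.prod(np.arange(1, n, 2))) * t ** (n / 2)

def bm_moment_quadrature(n, t, deg=40):
    """E[W(t)^n] by Gauss-Hermite quadrature, writing W(t) = sqrt(t) * Z with Z ~ N(0, 1)."""
    nodes, weights = hermegauss(deg)   # nodes/weights for the weight function exp(-z^2 / 2)
    return float(np.sum(weights * (np.sqrt(t) * nodes) ** n) / np.sqrt(2.0 * np.pi))

t = 2.0
for n in range(0, 9):
    print(n, bm_moment_formula(n, t), bm_moment_quadrature(n, t))
```

With 40 nodes the quadrature is exact (up to rounding) for polynomials of degree up to 79, so the two columns should agree to machine precision for the moments shown.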

Brownian motion is invariant under various transformations in time.

Theorem 3.4.7. Let $W_t$ denote a standard Brownian motion in $\mathbb{R}$. Then $W_t$ has the following properties:

i. (Rescaling). For each $c > 0$ define $X_t = \frac{1}{\sqrt{c}} W(ct)$. Then $(X_t, t \geq 0) = (W_t, t \geq 0)$ in law.

ii. (Shifting). For each $c > 0$, $W_{c+t} - W_c$, $t \geq 0$, is a Brownian motion which is independent of $W_u$, $u \in [0, c]$.

iii. (Time reversal). Define $X_t = W_{1-t} - W_1$, $t \in [0, 1]$. Then $(X_t, t \in [0, 1]) = (W_t, t \in [0, 1])$ in law.

iv. (Inversion). Let $X_t$, $t \geq 0$, be defined by $X_0 = 0$, $X_t = t\,W(1/t)$. Then $(X_t, t \geq 0) = (W_t, t \geq 0)$ in law.

We emphasize that the equivalence in the above theorem holds in law and not in a pathwise

sense.



We can also add a drift and change the diffusion coefficient of the Brownian motion: we define a Brownian motion with drift $\mu$ and variance $\sigma^2$ as the process

\[
X_t = \mu t + \sigma W_t.
\]

The mean and variance of $X_t$ are

\[
\mathbb{E} X_t = \mu t, \qquad \mathbb{E}(X_t - \mathbb{E} X_t)^2 = \sigma^2 t.
\]

Notice that $X_t$ satisfies the equation

\[
dX_t = \mu\,dt + \sigma\,dW_t.
\]

This is the simplest example of a stochastic differential equation.

We can define the OU process through the Brownian motion via a time change.

Lemma 3.4.8. Let $W(t)$ be a standard Brownian motion and consider the process

\[
V(t) = e^{-t}\,W(e^{2t}).
\]

Then $V(t)$ is a Gaussian stationary process with mean 0 and correlation function

\[
R(t) = e^{-|t|}. \tag{3.19}
\]

For the proof of this result we first need to show that time changed Gaussian processes are also Gaussian.

Lemma 3.4.9. Let $X(t)$ be a Gaussian stochastic process and let $Y(t) = X(f(t))$ where $f(t)$ is a strictly increasing function. Then $Y(t)$ is also a Gaussian process.

Proof. We need to show that, for all positive integers $N$ and all sequences of times $t_1, t_2, \dots, t_N$, the random vector

\[
\big(Y(t_1), Y(t_2), \dots, Y(t_N)\big) \tag{3.20}
\]

is a multivariate Gaussian random variable. Since $f(t)$ is strictly increasing, it is invertible, and the times $s_i = f(t_i)$, $i = 1, \dots, N$ (equivalently, $t_i = f^{-1}(s_i)$) are distinct. Thus, the random vector (3.20) can be rewritten as

\[
\big(X(s_1), X(s_2), \dots, X(s_N)\big),
\]

which is Gaussian for all $N$ and all choices of times $s_1, s_2, \dots, s_N$. Hence $Y(t)$ is also Gaussian.

Proof of Lemma 3.4.8. The fact that $V(t)$ is mean zero follows immediately from the fact that $W(t)$ is mean zero. To show that the correlation function of $V(t)$ is given by (3.19), we calculate

\[
\mathbb{E}(V(t)V(s)) = e^{-t-s}\,\mathbb{E}\big(W(e^{2t})W(e^{2s})\big) = e^{-t-s}\min(e^{2t}, e^{2s}) = e^{-|t-s|}.
\]

The Gaussianity of the process $V(t)$ follows from Lemma 3.4.9 (notice that the transformation that gives $V(t)$ in terms of $W(t)$ is invertible and we can write $W(s) = s^{1/2}\,V(\tfrac{1}{2}\ln(s))$).
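The last equality in the covariance calculation can be spelled out by distinguishing the two orderings of $t$ and $s$; the following worked step uses only the monotonicity of the exponential:

\[
e^{-t-s}\min(e^{2t}, e^{2s}) =
\begin{cases}
e^{-t-s}\,e^{2s} = e^{-(t-s)}, & t \geq s, \\
e^{-t-s}\,e^{2t} = e^{-(s-t)}, & t < s,
\end{cases}
\qquad\text{so in both cases}\qquad
\mathbb{E}(V(t)V(s)) = e^{-|t-s|}.
\]

Since the covariance depends only on $t - s$ and the mean is constant, the Gaussian process $V(t)$ is stationary.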

3.5 Other Examples of Stochastic Processes

3.5.1 Brownian Bridge

Let $W(t)$ be a standard one dimensional Brownian motion. We define the Brownian bridge (from 0 to 0) to be the process

\[
B_t = W_t - t\,W_1, \quad t \in [0, 1]. \tag{3.21}
\]

Notice that $B_0 = B_1 = 0$. Equivalently, we can define the Brownian bridge to be the continuous Gaussian process $\{B_t : 0 \leq t \leq 1\}$ such that

\[
\mathbb{E} B_t = 0, \quad \mathbb{E}(B_t B_s) = \min(s, t) - st, \quad s, t \in [0, 1]. \tag{3.22}
\]

Another, equivalent definition of the Brownian bridge is through an appropriate time change of the Brownian motion:

\[
B_t = (1-t)\,W\!\left(\frac{t}{1-t}\right), \quad t \in [0, 1). \tag{3.23}
\]

Conversely, we can write the Brownian motion as a time change of the Brownian bridge:

\[
W_t = (t+1)\,B\!\left(\frac{t}{1+t}\right), \quad t \geq 0.
\]
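As a worked check that the pathwise definition (3.21) is consistent with the covariance in (3.22), one can expand the product and use $\mathbb{E}(W_t W_s) = \min(t, s)$ together with $s, t \in [0, 1]$:

\[
\begin{aligned}
\mathbb{E}(B_t B_s) &= \mathbb{E}\big((W_t - tW_1)(W_s - sW_1)\big) \\
&= \mathbb{E}(W_t W_s) - s\,\mathbb{E}(W_t W_1) - t\,\mathbb{E}(W_1 W_s) + ts\,\mathbb{E}(W_1^2) \\
&= \min(s, t) - st - ts + ts = \min(s, t) - st.
\end{aligned}
\]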

3.5.2 Fractional Brownian Motion

Definition 3.5.1. A (normalized) fractional Brownian motion $W_t^H$, $t \geq 0$, with Hurst parameter $H \in (0, 1)$ is a centered Gaussian process with continuous sample paths whose covariance is given by

\[
\mathbb{E}(W_t^H W_s^H) = \frac{1}{2}\left(s^{2H} + t^{2H} - |t-s|^{2H}\right). \tag{3.24}
\]

Proposition 3.5.2. Fractional Brownian motion has the following properties.

i. When $H = \tfrac{1}{2}$, $W_t^{1/2}$ becomes the standard Brownian motion.

ii. $W_0^H = 0$, $\mathbb{E} W_t^H = 0$, $\mathbb{E}(W_t^H)^2 = |t|^{2H}$, $t \geq 0$.

iii. It has stationary increments, $\mathbb{E}(W_t^H - W_s^H)^2 = |t-s|^{2H}$.

iv. It has the following self similarity property

\[
(W_{\alpha t}^H,\ t \geq 0) = (\alpha^H W_t^H,\ t \geq 0), \quad \alpha > 0, \tag{3.25}
\]

where the equivalence is in law.

Proof. See Exercise 19.
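Because fractional Brownian motion is a centered Gaussian process, it is determined by the covariance (3.24), and a path on a finite grid can be sampled from a Cholesky factor of the covariance matrix. The sketch below is a minimal illustration under that assumption; it is not an efficient method for long paths, and the names `fbm_covariance` and `sample_fbm` are hypothetical.

```python
import numpy as np

def fbm_covariance(times, hurst):
    """Covariance matrix (3.24) of fractional Brownian motion on the given times."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))

def sample_fbm(times, hurst, rng):
    """One sample path on strictly positive times via a Cholesky factor of the covariance."""
    chol = np.linalg.cholesky(fbm_covariance(times, hurst))
    return chol @ rng.standard_normal(len(times))

times = np.linspace(0.01, 1.0, 100)   # t = 0 is excluded: W_0^H = 0 is deterministic
path = sample_fbm(times, hurst=0.75, rng=np.random.default_rng(1))

# Property i: for H = 1/2 the covariance reduces to min(s, t).
print(np.allclose(fbm_covariance(times, 0.5), np.minimum.outer(times, times)))
```

At the level of covariances, the self similarity property (3.25) reads `fbm_covariance(a * times, H) == a ** (2 * H) * fbm_covariance(times, H)`, which can be checked in the same way.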

3.5.3 The Poisson Process

Another fundamental continuous time process is the Poisson process:

Definition 3.5.3. The Poisson process with intensity $\lambda$, denoted by $N(t)$, is an integer-valued, continuous time, stochastic process with independent increments satisfying

\[
\mathbb{P}\big[(N(t) - N(s)) = k\big] = \frac{e^{-\lambda(t-s)}\big(\lambda(t-s)\big)^k}{k!}, \quad t > s \geq 0,\ k \in \mathbb{N}.
\]

The Poisson process does not have a continuous modification. See Exercise 20.
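A Poisson path can be simulated by accumulating waiting times between jumps. The sketch below relies on the standard fact (not stated in the definition above) that these waiting times are iid exponential with mean $1/\lambda$; the function name `poisson_counts` and the parameter values are illustrative. It compares the empirical mean and variance of $N(t)$ with $\lambda t$.

```python
import numpy as np

def poisson_counts(lam, t, n_paths, rng):
    """N(t) for n_paths independent paths, built from iid Exp(lam) waiting times."""
    counts = np.zeros(n_paths, dtype=int)
    for i in range(n_paths):
        clock = rng.exponential(1.0 / lam)      # time of the first jump
        while clock <= t:
            counts[i] += 1                      # one more jump before time t
            clock += rng.exponential(1.0 / lam)
    return counts

rng = np.random.default_rng(2)
counts = poisson_counts(lam=3.0, t=2.0, n_paths=4_000, rng=rng)
print(f"mean = {counts.mean():.3f}, variance = {counts.var():.3f} (both should be near 6)")
```

Each path is a step function with jumps of size one, which is a concrete way to see why no continuous modification can exist.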

3.6 The Karhunen-Loeve Expansion

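Let $f \in L^2(\Omega)$ where $\Omega$ is a subset of $\mathbb{R}^d$ and let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal basis in $L^2(\Omega)$. Then, it is well known that $f$ can be written as a series expansion:

\[
f = \sum_{n=1}^{\infty} f_n e_n,
\]

where

\[
f_n = \int_{\Omega} f(x)\,e_n(x)\,dx.
\]

The convergence is in $L^2(\Omega)$:

\[
\lim_{N \to \infty} \left\| f(x) - \sum_{n=1}^{N} f_n e_n(x) \right\|_{L^2(\Omega)} = 0.
\]

It turns out that we can obtain a similar expansion for an $L^2$ mean zero process which is continuous in the $L^2$ sense:

\[
\mathbb{E} X_t^2 < +\infty, \quad \mathbb{E} X_t = 0, \quad \lim_{h \to 0} \mathbb{E}|X_{t+h} - X_t|^2 = 0. \tag{3.26}
\]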

For simplicity we will take T = [0, 1]. Let R(t, s) = E(XtXs) be the autocorrelation function.

Notice that from (3.26) it follows that R(t, s) is continuous in both t and s (exercise 21).

Let us assume an expansion of the form

\[
X_t(\omega) = \sum_{n=1}^{\infty} \xi_n(\omega)\,e_n(t), \quad t \in [0, 1], \tag{3.27}
\]

where $\{e_n\}_{n=1}^{\infty}$ is an orthonormal basis in $L^2(0, 1)$. The random variables $\xi_n$ are calculated as

\[
\int_0^1 X_t\,e_k(t)\,dt = \int_0^1 \sum_{n=1}^{\infty} \xi_n e_n(t)\,e_k(t)\,dt = \sum_{n=1}^{\infty} \xi_n \delta_{nk} = \xi_k,
\]

where we assumed that we can interchange the summation and integration. We will assume that these random variables are orthogonal:

\[
\mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm},
\]

where $\{\lambda_n\}_{n=1}^{\infty}$ are positive numbers that will be determined later.

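Assuming that an expansion of the form (3.27) exists, we can calculate

\[
\begin{aligned}
R(t, s) = \mathbb{E}(X_t X_s) &= \mathbb{E}\left(\sum_{k=1}^{\infty}\sum_{\ell=1}^{\infty} \xi_k e_k(t)\,\xi_\ell e_\ell(s)\right) \\
&= \sum_{k=1}^{\infty}\sum_{\ell=1}^{\infty} \mathbb{E}(\xi_k \xi_\ell)\,e_k(t)\,e_\ell(s) \\
&= \sum_{k=1}^{\infty} \lambda_k e_k(t)\,e_k(s).
\end{aligned}
\]

Consequently, in order for the expansion (3.27) to be valid we need

\[
R(t, s) = \sum_{k=1}^{\infty} \lambda_k e_k(t)\,e_k(s). \tag{3.28}
\]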

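From equation (3.28) it follows that

\[
\begin{aligned}
\int_0^1 R(t, s)\,e_n(s)\,ds &= \int_0^1 \sum_{k=1}^{\infty} \lambda_k e_k(t)\,e_k(s)\,e_n(s)\,ds \\
&= \sum_{k=1}^{\infty} \lambda_k e_k(t) \int_0^1 e_k(s)\,e_n(s)\,ds \\
&= \sum_{k=1}^{\infty} \lambda_k e_k(t)\,\delta_{kn} \\
&= \lambda_n e_n(t).
\end{aligned}
\]

Hence, in order for the expansion (3.27) to be valid, $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ have to be the eigenvalues and eigenfunctions of the integral operator whose kernel is the correlation function of $X_t$:

\[
\int_0^1 R(t, s)\,e_n(s)\,ds = \lambda_n e_n(t). \tag{3.29}
\]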

Hence, in order to prove the expansion (3.27) we need to study the eigenvalue problem for the integral operator $\mathcal{R} : L^2[0, 1] \to L^2[0, 1]$. It is easy to check that this operator is self-adjoint ($(\mathcal{R}f, h) = (f, \mathcal{R}h)$ for all $f, h \in L^2(0, 1)$) and nonnegative ($(\mathcal{R}f, f) \geq 0$ for all $f \in L^2(0, 1)$). Hence, all its eigenvalues are real and nonnegative. Furthermore, it is a compact operator (if $\{\phi_n\}_{n=1}^{\infty}$ is a bounded sequence in $L^2(0, 1)$, then $\{\mathcal{R}\phi_n\}_{n=1}^{\infty}$ has a convergent subsequence). The spectral theorem for compact, self-adjoint operators implies that $\mathcal{R}$ has a countable sequence of eigenvalues tending to 0. Furthermore, for every $f \in L^2(0, 1)$ we can write

\[
f = f_0 + \sum_{n=1}^{\infty} f_n e_n(t),
\]

where $\mathcal{R} f_0 = 0$, $\{e_n(t)\}$ are the eigenfunctions of $\mathcal{R}$ corresponding to nonzero eigenvalues, and the convergence is in $L^2$. Finally, Mercer's Theorem states that for $R(t, s)$ continuous on $[0, 1] \times [0, 1]$, the expansion (3.28) is valid, where the series converges absolutely and uniformly.

Now we are ready to prove (3.27).

Theorem 3.6.1. (Karhunen-Loeve). Let $\{X_t, t \in [0, 1]\}$ be an $L^2$ process with zero mean and continuous correlation function $R(t, s)$. Let $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ be the eigenvalues and eigenfunctions of the operator $\mathcal{R}$ defined in (3.35). Then

\[
X_t = \sum_{n=1}^{\infty} \xi_n e_n(t), \quad t \in [0, 1], \tag{3.30}
\]

where

\[
\xi_n = \int_0^1 X_t\,e_n(t)\,dt, \quad \mathbb{E}\xi_n = 0, \quad \mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm}. \tag{3.31}
\]

The series converges in $L^2$ to $X(t)$, uniformly in $t$.

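Proof. The fact that $\mathbb{E}\xi_n = 0$ follows from the fact that $X_t$ is mean zero. The orthogonality of the random variables $\{\xi_n\}_{n=1}^{\infty}$ follows from the orthogonality of the eigenfunctions of $\mathcal{R}$:

\[
\begin{aligned}
\mathbb{E}(\xi_n \xi_m) &= \mathbb{E}\int_0^1\!\!\int_0^1 X_t X_s\,e_n(t)\,e_m(s)\,dt\,ds \\
&= \int_0^1\!\!\int_0^1 R(t, s)\,e_n(t)\,e_m(s)\,ds\,dt \\
&= \lambda_n \int_0^1 e_n(s)\,e_m(s)\,ds \\
&= \lambda_n \delta_{nm}.
\end{aligned}
\]

Consider now the partial sum $S_N = \sum_{n=1}^{N} \xi_n e_n(t)$. We have

\[
\begin{aligned}
\mathbb{E}|X_t - S_N|^2 &= \mathbb{E}X_t^2 + \mathbb{E}S_N^2 - 2\,\mathbb{E}(X_t S_N) \\
&= R(t, t) + \mathbb{E}\sum_{k,\ell=1}^{N} \xi_k \xi_\ell\,e_k(t)\,e_\ell(t) - 2\,\mathbb{E}\left(X_t \sum_{n=1}^{N} \xi_n e_n(t)\right) \\
&= R(t, t) + \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 - 2\,\mathbb{E}\sum_{k=1}^{N} \int_0^1 X_t X_s\,e_k(s)\,e_k(t)\,ds \\
&= R(t, t) - \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 \to 0,
\end{aligned}
\]

by Mercer's theorem.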

Remark 3.6.2. Let $X_t$ be a Gaussian second order process with continuous covariance $R(t, s)$. Then the random variables $\{\xi_k\}_{k=1}^{\infty}$ are Gaussian, since they are defined through the time integral of a Gaussian process. Furthermore, since they are Gaussian and orthogonal, they are also independent. Hence, for Gaussian processes the Karhunen-Loeve expansion becomes:

\[
X_t = \sum_{k=1}^{+\infty} \sqrt{\lambda_k}\,\xi_k\,e_k(t), \tag{3.32}
\]

where $\{\xi_k\}_{k=1}^{\infty}$ are independent $\mathcal{N}(0, 1)$ random variables.

Example 3.6.3. The Karhunen-Loeve Expansion for Brownian Motion. The correlation function of Brownian motion is $R(t, s) = \min(t, s)$. The eigenvalue problem $\mathcal{R}\psi_n = \lambda_n \psi_n$ becomes

\[
\int_0^1 \min(t, s)\,\psi_n(s)\,ds = \lambda_n \psi_n(t).
\]

Let us assume that $\lambda_n > 0$ (it is easy to check that 0 is not an eigenvalue). Upon setting $t = 0$ we obtain $\psi_n(0) = 0$. The eigenvalue problem can be rewritten in the form

\[
\int_0^t s\,\psi_n(s)\,ds + t\int_t^1 \psi_n(s)\,ds = \lambda_n \psi_n(t).
\]

We differentiate this equation once:

\[
\int_t^1 \psi_n(s)\,ds = \lambda_n \psi_n'(t).
\]

We set $t = 1$ in this equation to obtain the second boundary condition $\psi_n'(1) = 0$. A second differentiation yields

\[
-\psi_n(t) = \lambda_n \psi_n''(t),
\]

where primes denote differentiation with respect to $t$. Thus, in order to calculate the eigenvalues and eigenfunctions of the integral operator whose kernel is the covariance function of Brownian motion, we need to solve the Sturm-Liouville problem

\[
-\psi_n(t) = \lambda_n \psi_n''(t), \quad \psi_n(0) = \psi_n'(1) = 0.
\]

It is easy to check that the eigenvalues and (normalized) eigenfunctions are

\[
\psi_n(t) = \sqrt{2}\,\sin\left(\tfrac{1}{2}(2n-1)\pi t\right), \quad \lambda_n = \left(\frac{2}{(2n-1)\pi}\right)^2.
\]

Thus, the Karhunen-Loeve expansion of Brownian motion on $[0, 1]$ is

\[
W_t = \sqrt{2}\,\sum_{n=1}^{\infty} \xi_n \frac{2}{(2n-1)\pi} \sin\left(\tfrac{1}{2}(2n-1)\pi t\right). \tag{3.33}
\]
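The expansion (3.33) translates directly into a sampling routine. The sketch below is an illustration under stated assumptions: the truncation levels, the grid sizes and the names `kl_spectrum`, `kl_brownian_path` and `truncated_variance` are all choices made here, not part of the text. It draws a truncated path, checks Mercer's identity $\sum_k \lambda_k \psi_k(t)^2 = \min(t, t) = t$, and compares the exact eigenvalues with those of a simple midpoint discretization of the integral operator in (3.29).

```python
import numpy as np

def kl_spectrum(n_terms):
    """Eigenvalues lambda_n and frequencies (2n - 1) * pi / 2 from Example 3.6.3."""
    n = np.arange(1, n_terms + 1)
    freq = 0.5 * (2 * n - 1) * np.pi
    return 1.0 / freq ** 2, freq

def kl_brownian_path(t, n_terms, rng):
    """Truncated expansion (3.33) evaluated on the time grid t."""
    lam, freq = kl_spectrum(n_terms)
    xi = rng.standard_normal(n_terms)            # independent N(0, 1) coefficients
    return np.sqrt(2.0) * np.sin(np.outer(t, freq)) @ (np.sqrt(lam) * xi)

def truncated_variance(t, n_terms):
    """sum_k lambda_k * psi_k(t)^2, which should approach R(t, t) = t."""
    lam, freq = kl_spectrum(n_terms)
    return (2.0 * np.sin(np.outer(t, freq)) ** 2) @ lam

t = np.linspace(0.0, 1.0, 201)
path = kl_brownian_path(t, n_terms=500, rng=np.random.default_rng(3))
mercer_error = np.max(np.abs(truncated_variance(t, 2000) - t))

# Midpoint (Nystrom-type) discretization of the operator with kernel min(t, s).
m = 400
mid = (np.arange(m) + 0.5) / m
numeric = np.sort(np.linalg.eigvalsh(np.minimum.outer(mid, mid) / m))[::-1][:3]
exact = kl_spectrum(3)[0]
print(mercer_error, numeric, exact)
```

Since $\lambda_n \sim n^{-2}$, the variance missed by truncating after $N$ terms decays only like $1/N$, which is worth keeping in mind when choosing the number of terms.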

We can use the KL expansion in order to study the $L^2$-regularity of stochastic processes. First, let $\mathcal{R}$ be a compact, symmetric positive definite operator on $L^2(0, 1)$ with eigenvalues and normalized eigenfunctions $\{\lambda_k, e_k(x)\}_{k=1}^{+\infty}$ and consider a function $f \in L^2(0, 1)$ with $\int_0^1 f(s)\,ds = 0$. We can define the one parameter family of Hilbert spaces $H^{\alpha}$ through the norm

\[
\|f\|_{\alpha}^2 = \|\mathcal{R}^{-\alpha/2} f\|_{L^2}^2 = \sum_k |f_k|^2 \lambda_k^{-\alpha}.
\]

The inner product can be obtained through polarization. This norm enables us to measure the regularity of the function $f(t)$.³ Let $X_t$ be a mean zero second order (i.e. with finite second moment) process with continuous autocorrelation function. Define the space $\mathcal{H}^{\alpha} := L^2((\Omega, P), H^{\alpha}(0, 1))$ with (semi)norm

\[
\|X_t\|_{\alpha}^2 = \mathbb{E}\|X_t\|_{H^{\alpha}}^2 = \sum_k |\lambda_k|^{1-\alpha}. \tag{3.34}
\]

Notice that the regularity of the stochastic process $X_t$ depends on the decay of the eigenvalues of the integral operator $\mathcal{R}\,\cdot := \int_0^1 R(t, s)\,\cdot\,ds$.

As an example, consider the $L^2$-regularity of Brownian motion. From Example 3.6.3 we know that $\lambda_k \sim k^{-2}$. Consequently, from (3.34) we get that, in order for $W_t$ to be an element of the space $\mathcal{H}^{\alpha}$, we need that

\[
\sum_k |\lambda_k|^{1-\alpha} \sim \sum_k k^{-2(1-\alpha)} < +\infty,
\]

from which we obtain that $\alpha < 1/2$. This is consistent with the Hölder continuity of Brownian motion from Theorem 3.4.6.⁴


3.7 Discussion and Bibliography

The Ornstein-Uhlenbeck process was introduced by Ornstein and Uhlenbeck in 1930 as a model for the velocity of a Brownian particle [93].

The kind of analysis presented in Section 3.3.3 was initiated by G.I. Taylor in [91]. The proof of

Bochner’s theorem 3.3.12 can be found in [50], where additional material on stationary processes

can be found. See also [46].

³ Think of $\mathcal{R}$ as being the inverse of the Laplacian with periodic boundary conditions. In this case $H^{\alpha}$ coincides with the standard fractional Sobolev space.

⁴ Notice, however, that Wiener's theorem refers to a.s. Hölder continuity, whereas the calculation presented in this section is about $L^2$-continuity.

The spectral theorem for compact, self-adjoint operators which was needed in the proof of the

Karhunen-Loeve theorem can be found in [81]. The Karhunen-Loeve expansion is also valid for random fields. See [88] and the references therein.

3.8 Exercises

1. Let $Y_0, Y_1, \dots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$.

(a) Show that $X_n$ is a strictly stationary process.

(b) Assume that $\mathbb{E}Y_0 = \mu < +\infty$ and $\mathbb{E}Y_0^2 = \sigma^2 < +\infty$. Show that

\[
\lim_{N \to +\infty} \mathbb{E}\left| \frac{1}{N}\sum_{j=0}^{N-1} X_j - \mu \right| = 0.
\]

(c) Let $f$ be such that $\mathbb{E}f^2(Y_0) < +\infty$. Show that

\[
\lim_{N \to +\infty} \mathbb{E}\left| \frac{1}{N}\sum_{j=0}^{N-1} f(X_j) - \mathbb{E}f(Y_0) \right| = 0.
\]

2. Let Z be a random variable and define the stochastic process Xn = Z, n = 0, 1, 2, . . . . Show

that Xn is a strictly stationary process.

3. Let $A_0, A_1, \dots, A_m$ and $B_0, B_1, \dots, B_m$ be uncorrelated random variables with mean zero and variances $\mathbb{E}A_i^2 = \sigma_i^2$, $\mathbb{E}B_i^2 = \sigma_i^2$, $i = 0, \dots, m$. Let $\omega_0, \omega_1, \dots, \omega_m \in [0, \pi]$ be distinct frequencies and define, for $n = 0, \pm 1, \pm 2, \dots$, the stochastic process

\[
X_n = \sum_{k=0}^{m} \big(A_k \cos(n\omega_k) + B_k \sin(n\omega_k)\big).
\]

Calculate the mean and the covariance of Xn. Show that it is a weakly stationary process.

4. Let $\{\xi_n : n = 0, \pm 1, \pm 2, \dots\}$ be uncorrelated random variables with $\mathbb{E}\xi_n = \mu$, $\mathbb{E}(\xi_n - \mu)^2 = \sigma^2$, $n = 0, \pm 1, \pm 2, \dots$. Let $a_1, a_2, \dots$ be arbitrary real numbers and consider the stochastic process

\[
X_n = a_1 \xi_n + a_2 \xi_{n-1} + \dots + a_m \xi_{n-m+1}.
\]

(a) Calculate the mean, variance and the covariance function of $X_n$. Show that it is a weakly stationary process.

(b) Set $a_k = 1/\sqrt{m}$ for $k = 1, \dots, m$. Calculate the covariance function and study the cases $m = 1$ and $m \to +\infty$.

5. Let W (t) be a standard one dimensional Brownian motion. Calculate the following expecta-

tions.

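(a) $\mathbb{E}\,e^{iW(t)}$.

(b) $\mathbb{E}\,e^{i(W(t)+W(s))}$, $t, s \in (0, +\infty)$.

(c) $\mathbb{E}\left(\sum_{i=1}^{n} c_i W(t_i)\right)^2$, where $c_i \in \mathbb{R}$, $i = 1, \dots, n$ and $t_i \in (0, +\infty)$, $i = 1, \dots, n$.

(d) $\mathbb{E}\exp\left(i\sum_{i=1}^{n} c_i W(t_i)\right)$, where $c_i \in \mathbb{R}$, $i = 1, \dots, n$ and $t_i \in (0, +\infty)$, $i = 1, \dots, n$.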

6. Let $W_t$ be a standard one dimensional Brownian motion and define

\[
B_t = W_t - t\,W_1, \quad t \in [0, 1].
\]

(a) Show that $B_t$ is a Gaussian process with

\[
\mathbb{E}B_t = 0, \quad \mathbb{E}(B_t B_s) = \min(t, s) - ts.
\]

(b) Show that, for $t \in [0, 1)$, an equivalent definition of $B_t$ is through the formula

\[
B_t = (1-t)\,W\!\left(\frac{t}{1-t}\right).
\]

(c) Calculate the distribution function of Bt.

7. Let $X_t$ be a mean-zero second order stationary process with autocorrelation function

\[
R(t) = \sum_{j=1}^{N} \frac{\lambda_j^2}{\alpha_j}\,e^{-\alpha_j |t|},
\]

where $\{\alpha_j, \lambda_j\}_{j=1}^{N}$ are positive real numbers.

(a) Calculate the spectral density and the correlation time of this process.

(b) Show that the assumptions of Theorem 3.3.17 are satisfied and use the argument presented in Section 3.3.3 (i.e. the Green-Kubo formula) to calculate the diffusion coefficient of the process $Z_t = \int_0^t X_s\,ds$.

(c) Under what assumptions on the coefficients $\{\alpha_j, \lambda_j\}_{j=1}^{N}$ can you study the above questions in the limit $N \to +\infty$?

8. Prove Lemma 3.10.

9. Let $a_1, \dots, a_n$ and $s_1, \dots, s_n$ be positive real numbers. Calculate the mean and variance of the random variable

\[
X = \sum_{i=1}^{n} a_i W(s_i).
\]

10. Let $W(t)$ be the standard one-dimensional Brownian motion and let $\sigma, s_1, s_2 > 0$. Calculate

(a) $\mathbb{E}\,e^{\sigma W(t)}$.

(b) $\mathbb{E}\big(\sin(\sigma W(s_1))\,\sin(\sigma W(s_2))\big)$.

11. Let $W_t$ be a one dimensional Brownian motion, let $\mu, \sigma > 0$ and define

\[
S_t = e^{t\mu + \sigma W_t}.
\]

(a) Calculate the mean and the variance of $S_t$.

(b) Calculate the probability density function of $S_t$.

12. Use Theorem 3.4.4 to prove Lemma 3.4.3.

13. Prove Theorem 3.4.7.

14. Use Lemma 3.4.8 to calculate the distribution function of the stationary Ornstein-Uhlenbeck

process.

15. Calculate the mean and the correlation function of the integral of a standard Brownian motion

\[
Y_t = \int_0^t W_s\,ds.
\]

16. Show that the process

\[
Y_t = \int_t^{t+1} (W_s - W_t)\,ds, \quad t \in \mathbb{R},
\]

is second order stationary.

17. Let $V_t = e^{-t}\,W(e^{2t})$ be the stationary Ornstein-Uhlenbeck process. Give the definition and study the main properties of the Ornstein-Uhlenbeck bridge.

18. The autocorrelation function of the velocity $Y(t)$ of a Brownian particle moving in a harmonic potential $V(x) = \tfrac{1}{2}\omega_0^2 x^2$ is

\[
R(t) = e^{-\gamma |t|}\left(\cos(\delta |t|) - \frac{1}{\delta}\sin(\delta |t|)\right),
\]

where $\gamma$ is the friction coefficient and $\delta = \sqrt{\omega_0^2 - \gamma^2}$.

(a) Calculate the spectral density of $Y(t)$.

(b) Calculate the mean square displacement $\mathbb{E}(X(t))^2$ of the position of the Brownian particle $X(t) = \int_0^t Y(s)\,ds$. Study the limit $t \to +\infty$.

19. Show the scaling property (3.25) of the fractional Brownian motion.

20. Use Theorem 3.4.4 to show that there does not exist a continuous modification of the Poisson process.

21. Show that the correlation function of a process Xt satisfying (3.26) is continuous in both t and

s.

22. Let $X_t$ be a stochastic process satisfying (3.26) and $R(t, s)$ its correlation function. Show that the integral operator $\mathcal{R} : L^2[0, 1] \to L^2[0, 1]$,

\[
\mathcal{R}f := \int_0^1 R(t, s)\,f(s)\,ds, \tag{3.35}
\]

is self-adjoint and nonnegative. Show that all of its eigenvalues are real and nonnegative. Show that eigenfunctions corresponding to different eigenvalues are orthogonal.

23. Let $H$ be a Hilbert space. An operator $\mathcal{R} : H \to H$ is said to be Hilbert-Schmidt if there exists a complete orthonormal sequence $\{\phi_n\}_{n=1}^{\infty}$ in $H$ such that

\[
\sum_{n=1}^{\infty} \|\mathcal{R}\phi_n\|^2 < \infty.
\]

Let $\mathcal{R} : L^2[0, 1] \to L^2[0, 1]$ be the operator defined in (3.35) with $R(t, s)$ being continuous both in $t$ and $s$. Show that it is a Hilbert-Schmidt operator.

24. Let $X_t$ be a mean zero second order stationary process defined in the interval $[0, T]$ with continuous covariance $R(t)$ and let $\{\lambda_n\}_{n=1}^{+\infty}$ be the eigenvalues of the covariance operator. Show that

\[
\sum_{n=1}^{\infty} \lambda_n = T\,R(0).
\]

25. Calculate the Karhunen-Loeve expansion for a second order stochastic process with correlation

function R(t, s) = ts.

26. Calculate the Karhunen-Loeve expansion of the Brownian bridge on [0, 1].

27. Let $X_t$, $t \in [0, T]$, be a second order process with continuous covariance and Karhunen-Loeve expansion

\[
X_t = \sum_{k=1}^{\infty} \xi_k e_k(t).
\]

Define the process

\[
Y(t) = f(t)\,X_{\tau(t)}, \quad t \in [0, S],
\]

where $f(t)$ is a continuous function and $\tau(t)$ a continuous, nondecreasing function with $\tau(0) = 0$, $\tau(S) = T$. Find the Karhunen-Loeve expansion of $Y(t)$, in an appropriate weighted $L^2$ space, in terms of the KL expansion of $X_t$. Use this in order to calculate the KL expansion of the Ornstein-Uhlenbeck process.

28. Calculate the Karhunen-Loeve expansion of a centered Gaussian stochastic process with co-

variance function R(s, t) = cos(2π(t− s)).

29. Use the Karhunen-Loeve expansion to generate paths of the

(a) Brownian motion on [0, 1].

(b) Brownian bridge on [0, 1].

(c) Ornstein-Uhlenbeck process on [0, 1].

Study computationally the convergence of the KL expansion for these processes. How many terms do you need to keep in the KL expansion in order to calculate accurate statistics of these processes?

Chapter 4

Markov Processes

4.1 Introduction

In this chapter we will study some of the basic properties of Markov stochastic processes. In

Section 4.2 we present various examples of Markov processes, in discrete and continuous time.

In Section 4.3 we give the precise definition of a Markov process. In Section 4.4 we derive the

Chapman-Kolmogorov equation, the fundamental equation in the theory of Markov processes. In

Section 4.5 we introduce the concept of the generator of a Markov process. In Section 4.6 we study

ergodic Markov processes. Discussion and bibliographical remarks are presented in Section 4.7

and exercises can be found in Section 4.8.

4.2 Examples

Roughly speaking, a Markov process is a stochastic process that retains no memory of where it has

been in the past: only the current state of a Markov process can influence where it will go next. A

bit more precisely: a Markov process is a stochastic process for which, given the present, past and

future are statistically independent.

Perhaps the simplest example of a Markov process is that of a random walk in one dimension. We defined the one dimensional random walk as the sum of independent, mean zero and variance 1 random variables $\xi_i$, $i = 1, \dots$:

\[
X_N = \sum_{n=1}^{N} \xi_n, \quad X_0 = 0.
\]

Let $i_1, i_2, \dots$ be a sequence of integers. Then, for all integers $n$ and $m$ we have that¹

\[
\mathbb{P}(X_{n+m} = i_{n+m} \mid X_1 = i_1, \dots, X_n = i_n) = \mathbb{P}(X_{n+m} = i_{n+m} \mid X_n = i_n). \tag{4.1}
\]

In words, the probability that the random walk will be at $i_{n+m}$ at time $n + m$ depends only on its current value (at time $n$) and not on how it got there.

The random walk is an example of a discrete time Markov chain:

Definition 4.2.1. A stochastic process $\{S_n; n \in \mathbb{N}\}$ with state space $S = \mathbb{Z}$ is called a discrete time Markov chain provided that the Markov property (4.1) is satisfied.

Consider now a continuous-time stochastic process $X_t$ with state space $S = \mathbb{Z}$ and denote by $\{X_s, s \leq t\}$ the collection of values of the stochastic process up to time $t$. We will say that $X_t$ is a Markov process provided that

\[
\mathbb{P}(X_{t+h} = i_{t+h} \mid \{X_s, s \leq t\}) = \mathbb{P}(X_{t+h} = i_{t+h} \mid X_t = i_t), \tag{4.2}
\]

for all $h \geq 0$. A continuous-time, discrete state space Markov process is called a continuous-time Markov chain.

Example 4.2.2. The Poisson process is a continuous-time Markov chain with

\[
\mathbb{P}(N_{t+h} = j \mid N_t = i) =
\begin{cases}
0, & \text{if } j < i, \\[4pt]
\dfrac{e^{-\lambda h}(\lambda h)^{j-i}}{(j-i)!}, & \text{if } j \geq i.
\end{cases}
\]

Similarly, we can define a continuous-time Markov process whose state space is $\mathbb{R}$. In this case, the above definitions become

\[
\mathbb{P}(X_{t+h} \in \Gamma \mid \{X_s, s \leq t\}) = \mathbb{P}(X_{t+h} \in \Gamma \mid X_t = x) \tag{4.3}
\]

for all Borel sets $\Gamma$.

Example 4.2.3. The Brownian motion is a Markov process with conditional probability density

\[
p(y, t \mid x, s) := p(W_t = y \mid W_s = x) = \frac{1}{\sqrt{2\pi(t-s)}} \exp\left(-\frac{|x-y|^2}{2(t-s)}\right). \tag{4.4}
\]

¹ In fact, it is sufficient to take $m = 1$ in (4.1). See Exercise 1.

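Example 4.2.4. The Ornstein-Uhlenbeck process $V_t = e^{-t}\,W(e^{2t})$ is a Markov process with conditional probability density

\[
p(y, t \mid x, s) := p(V_t = y \mid V_s = x) = \frac{1}{\sqrt{2\pi(1 - e^{-2(t-s)})}} \exp\left(-\frac{|y - x e^{-(t-s)}|^2}{2(1 - e^{-2(t-s)})}\right). \tag{4.5}
\]

To prove (4.5) we use the formula for the distribution function of the Brownian motion to calculate, for $t > s$,

\[
\begin{aligned}
\mathbb{P}(V_t \leq y \mid V_s = x) &= \mathbb{P}\big(e^{-t}W(e^{2t}) \leq y \mid e^{-s}W(e^{2s}) = x\big) \\
&= \mathbb{P}\big(W(e^{2t}) \leq e^{t}y \mid W(e^{2s}) = e^{s}x\big) \\
&= \int_{-\infty}^{e^{t}y} \frac{1}{\sqrt{2\pi(e^{2t} - e^{2s})}} \exp\left(-\frac{|z - x e^{s}|^2}{2(e^{2t} - e^{2s})}\right) dz \\
&= \int_{-\infty}^{y} \frac{e^{t}}{\sqrt{2\pi e^{2t}(1 - e^{-2(t-s)})}} \exp\left(-\frac{|\rho e^{t} - x e^{s}|^2}{2 e^{2t}(1 - e^{-2(t-s)})}\right) d\rho \\
&= \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi(1 - e^{-2(t-s)})}} \exp\left(-\frac{|\rho - x e^{-(t-s)}|^2}{2(1 - e^{-2(t-s)})}\right) d\rho,
\end{aligned}
\]

where we used the change of variables $z = \rho e^{t}$. Consequently, the transition probability density for the OU process is given by the formula

\[
p(y, t \mid x, s) = \frac{\partial}{\partial y}\mathbb{P}(V_t \leq y \mid V_s = x) = \frac{1}{\sqrt{2\pi(1 - e^{-2(t-s)})}} \exp\left(-\frac{|y - x e^{-(t-s)}|^2}{2(1 - e^{-2(t-s)})}\right).
\]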
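The formula (4.5) can be tested numerically against the consistency relation that any Markov transition density has to satisfy, namely that conditioning on an intermediate time $u \in (s, t)$ and integrating over the intermediate state reproduces the direct transition. The sketch below is only a numerical check with arbitrarily chosen values; the function name `ou_density` and the grid are assumptions of this illustration.

```python
import numpy as np

def ou_density(y, t, x, s):
    """Transition density (4.5) of V_t = exp(-t) W(exp(2t)), for t > s."""
    var = 1.0 - np.exp(-2.0 * (t - s))
    mean = x * np.exp(-(t - s))
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Grid for the intermediate state z; the densities are negligible outside [-10, 10].
z = np.linspace(-10.0, 10.0, 4001)
dz = z[1] - z[0]

x, s, u, t, y = 0.8, 0.0, 0.4, 1.0, -0.3    # arbitrary test values with s < u < t
two_step = np.sum(ou_density(y, t, z, u) * ou_density(z, u, x, s)) * dz
one_step = ou_density(y, t, x, s)
total_mass = np.sum(ou_density(z, t, x, s)) * dz

print(two_step, one_step, total_mass)
```

Agreement of `two_step` and `one_step` reflects the fact that the conditional mean decays like $e^{-(t-s)}$ while the conditional variance $1 - e^{-2(t-s)}$ composes correctly over intermediate times.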

Markov stochastic processes appear in a variety of applications in physics, chemistry, biology

and finance. In this and the next chapter we will develop various analytical tools for studying them.

In particular, we will see that we can obtain an equation for the transition probability

\[
\mathbb{P}(X_{n+1} = i_{n+1} \mid X_n = i_n), \quad \mathbb{P}(X_{t+h} = i_{t+h} \mid X_t = i_t), \quad p(X_{t+h} = y \mid X_t = x), \tag{4.6}
\]

which will enable us to study the evolution of a Markov process. This equation will be called the

Chapman-Kolmogorov equation.

We will be mostly concerned with time-homogeneous Markov processes, i.e. processes for which the conditional probabilities are invariant under time shifts. For time-homogeneous discrete-time Markov chains we have

\[
\mathbb{P}(X_{n+1} = j \mid X_n = i) = \mathbb{P}(X_1 = j \mid X_0 = i) =: p_{ij}.
\]

We will refer to the matrix $P = \{p_{ij}\}$ as the transition matrix. It is easy to check that the transition matrix is a stochastic matrix, i.e. it has nonnegative entries and $\sum_j p_{ij} = 1$. Similarly, we can define the n-step transition matrix $P_n = \{p_{ij}(n)\}$ as

\[
p_{ij}(n) = \mathbb{P}(X_{m+n} = j \mid X_m = i).
\]

We can study the evolution of a Markov chain through the Chapman-Kolmogorov equation:

\[
p_{ij}(m+n) = \sum_k p_{ik}(m)\,p_{kj}(n). \tag{4.7}
\]

Indeed, let $\mu_i^{(n)} := \mathbb{P}(X_n = i)$. The (possibly infinite dimensional) vector $\mu^{(n)}$ determines the state of the Markov chain at time $n$. A simple consequence of the Chapman-Kolmogorov equation is that we can write an evolution equation for the vector $\mu^{(n)}$:

\[
\mu^{(n)} = \mu^{(0)} P^n, \tag{4.8}
\]

where $P^n$ denotes the nth power of the matrix $P$. Hence in order to calculate the state of the Markov chain at time $n$ all we need is the initial distribution $\mu^{(0)}$ and the transition matrix $P$. Componentwise, the above equation can be written as

\[
\mu_j^{(n)} = \sum_i \mu_i^{(0)}\,p_{ij}(n).
\]
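For a finite state space, equations (4.7) and (4.8) are statements about matrix powers and can be checked in a few lines. The sketch below uses a hypothetical three-state transition matrix chosen purely for illustration; the helper names `n_step` and `distribution` are likewise assumptions.

```python
import numpy as np

# Hypothetical 3-state transition matrix: nonnegative entries, rows summing to 1.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])
mu0 = np.array([1.0, 0.0, 0.0])       # chain started in the first state

def n_step(P, n):
    """n-step transition matrix, i.e. the nth power of P."""
    return np.linalg.matrix_power(P, n)

def distribution(mu0, P, n):
    """Distribution at time n from (4.8): a row vector times the nth power of P."""
    return mu0 @ n_step(P, n)

# Chapman-Kolmogorov (4.7) with m = 2 and n = 3.
ck_holds = np.allclose(n_step(P, 5), n_step(P, 2) @ n_step(P, 3))
mu10 = distribution(mu0, P, 10)
print(ck_holds, mu10, mu10.sum())
```

Note that the distribution is a row vector multiplying $P$ from the left, in agreement with the componentwise form of (4.8).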

Consider now a continuous time Markov chain with transition probability

\[
p_{ij}(s, t) = \mathbb{P}(X_t = j \mid X_s = i), \quad s \leq t.
\]

If the chain is homogeneous, then

\[
p_{ij}(s, t) = p_{ij}(0, t-s) \quad \text{for all } i, j, s, t.
\]

In particular,

\[
p_{ij}(t) = \mathbb{P}(X_t = j \mid X_0 = i).
\]

The Chapman-Kolmogorov equation for a continuous time Markov chain is

\[
\frac{dp_{ij}}{dt} = \sum_k p_{ik}(t)\,g_{kj}, \tag{4.9}
\]

where the matrix $G$ is called the generator of the Markov chain. Equation (4.9) can also be written in matrix notation:

\[
\frac{dP_t}{dt} = P_t G.
\]

The generator of the Markov chain is defined as

\[
G = \lim_{h \to 0} \frac{1}{h}(P_h - I).
\]

Let now $\mu_t^i = \mathbb{P}(X_t = i)$. The vector $\mu_t$ is the distribution of the Markov chain at time $t$. We can study its evolution using the equation

\[
\mu_t = \mu_0 P_t.
\]

Thus, as in the case of discrete time Markov chains, the evolution of a continuous time Markov chain is completely determined by the initial distribution and the transition matrix.
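For a finite state space the solution of (4.9) with $P_0 = I$ is the matrix exponential $P_t = e^{tG}$. The sketch below illustrates this for a hypothetical two-state chain with jump rates `a` and `b` (assumed values), computing the exponential with a simple truncated Taylor series combined with scaling and squaring rather than a library routine; the function name `transition_matrix` is an assumption. For this chain $G^2 = -(a+b)G$, which gives the closed form used for comparison.

```python
import numpy as np

def transition_matrix(G, t, terms=20, squarings=10):
    """P_t = exp(t G) via a truncated Taylor series with scaling and squaring."""
    A = (t / 2 ** squarings) * G
    P = np.eye(G.shape[0])
    term = np.eye(G.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k           # A^k / k!
        P = P + term
    for _ in range(squarings):
        P = P @ P                     # undo the scaling: exp(A)^(2^squarings)
    return P

a, b = 1.5, 0.5                       # assumed jump rates 0 -> 1 and 1 -> 0
G = np.array([[-a, a],
              [b, -b]])               # rows of a generator sum to zero

t = 0.8
P_t = transition_matrix(G, t)
closed_form = np.eye(2) + (1.0 - np.exp(-(a + b) * t)) / (a + b) * G

h = 1e-6
G_from_limit = (transition_matrix(G, h) - np.eye(2)) / h   # finite-h version of the limit
mu_t = np.array([1.0, 0.0]) @ P_t                          # distribution at time t

print(P_t, closed_form, G_from_limit, mu_t)
```

As $t \to \infty$ both rows of $P_t$ approach $(b, a)/(a+b)$, the invariant distribution of this two-state chain.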

Consider now the case of a continuous time Markov process with continuous state space and with continuous paths. As we have seen in Example 4.2.3 the Brownian motion is an example of such a process. It is a standard result in the theory of partial differential equations that the conditional probability density of the Brownian motion (4.4) is the fundamental solution of the diffusion equation:

\[
\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial^2 p}{\partial y^2}, \quad \lim_{t \to s} p(y, t \mid x, s) = \delta(y - x). \tag{4.10}
\]

Similarly, the conditional distribution (4.5) of the OU process satisfies the initial value problem

\[
\frac{\partial p}{\partial t} = \frac{\partial (yp)}{\partial y} + \frac{\partial^2 p}{\partial y^2}, \quad \lim_{t \to s} p(y, t \mid x, s) = \delta(y - x). \tag{4.11}
\]

The Brownian motion and the OU process are examples of a diffusion process. A diffusion process is a continuous time Markov process with continuous paths. We will see in Chapter 5 that the conditional probability density $p(y, t \mid x, s)$ of a diffusion process satisfies the forward Kolmogorov or Fokker-Planck equation

\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}\big(a(y, t)\,p\big) + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(b(y, t)\,p\big), \quad \lim_{t \to s} p(y, t \mid x, s) = \delta(y - x), \tag{4.12}
\]

as well as the backward Kolmogorov equation

\[
-\frac{\partial p}{\partial s} = a(x, s)\frac{\partial p}{\partial x} + \frac{1}{2}b(x, s)\frac{\partial^2 p}{\partial x^2}, \quad \lim_{t \to s} p(y, t \mid x, s) = \delta(y - x), \tag{4.13}
\]

for appropriate functions $a(y, t)$, $b(y, t)$. Hence, a diffusion process is determined uniquely from these two functions.

4.3 Definition of a Markov Process

In Section 4.2 we gave the definition of a Markov process whose time is either discrete or continuous, and whose state space is the set of integers. We also gave several examples of Markov chains as well as of processes whose state space is the real line. In this section we give the precise definition of a Markov process with $t \in T$, a general index set, and $S = E$, an arbitrary metric space. We will use this formulation in the next section to derive the Chapman-Kolmogorov equation.

In order to state the definition of a continuous-time Markov process that takes values in a metric space we need to introduce various new concepts. For the definition of a Markov process we need to use the conditional expectation of the stochastic process conditioned on all past values. We can encode all past information about a stochastic process into an appropriate collection of σ-algebras. Our setting will be that we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and an ordered set $T$. Let $X = X_t(\omega)$ be a stochastic process from the sample space $(\Omega, \mathcal{F})$ to the state space $(E, \mathcal{G})$, where $E$ is a metric space (we will usually take $E$ to be either $\mathbb{R}$ or $\mathbb{R}^d$). Remember that the stochastic process is a function of two variables, $t \in T$ and $\omega \in \Omega$.

We start with the definition of a σ–algebra generated by a collection of sets.

Definition 4.3.1. Let K be a collection of subsets of Ω. The smallest σ–algebra on Ω which

contains K is denoted by σ(K) and is called the σ–algebra generated by K.

Definition 4.3.2. Let $X_t : \Omega \mapsto E$, $t \in T$. The smallest σ–algebra $\sigma(X_t, t \in T)$ such that the family of mappings $\{X_t, t \in T\}$ is a stochastic process with sample space $(\Omega, \sigma(X_t, t \in T))$ and state space $(E, \mathcal{G})$ is called the σ–algebra generated by $\{X_t, t \in T\}$.

In other words, the σ–algebra generated by $X_t$ is the smallest σ–algebra such that $X_t$ is a measurable function (random variable) with respect to it: the set $\{\omega \in \Omega : X_t(\omega) \le x\} \in \sigma(X_t, t \in T)$ for all $x \in \mathbb{R}$ (we have assumed that $E = \mathbb{R}$).

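Definition 4.3.3. A filtration on $(\Omega, \mathcal{F})$ is a nondecreasing family $\{\mathcal{F}_t, t \in T\}$ of sub–σ–algebras of $\mathcal{F}$: $\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$ for $s \le t$.

We set $\mathcal{F}_\infty = \sigma(\cup_{t \in T} \mathcal{F}_t)$. The filtration generated by $X_t$, where $X_t$ is a stochastic process, is

\[
\mathcal{F}_t^X := \sigma(X_s;\ s \le t).
\]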


Definition 4.3.4. A stochastic process $\{X_t;\ t \in T\}$ is adapted to the filtration $\{\mathcal{F}_t\} := \{\mathcal{F}_t, t \in T\}$ if for all $t \in T$, $X_t$ is an $\mathcal{F}_t$–measurable random variable.

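Definition 4.3.5. Let $\{X_t\}$ be a stochastic process defined on a probability space $(\Omega, \mathcal{F}, \mu)$ with values in $E$ and let $\mathcal{F}_t^X$ be the filtration generated by $\{X_t;\ t \in T\}$. Then $\{X_t;\ t \in T\}$ is a Markov process if

\[
\mathbb{P}(X_t \in \Gamma | \mathcal{F}_s^X) = \mathbb{P}(X_t \in \Gamma | X_s) \tag{4.14}
\]

for all $t, s \in T$ with $t \ge s$, and $\Gamma \in \mathcal{B}(E)$.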

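Remark 4.3.6. The filtration $\mathcal{F}_s^X$ is generated by events of the form $\{\omega \,|\, X_{s_1} \in B_1, X_{s_2} \in B_2, \dots, X_{s_n} \in B_n\}$, with $0 \le s_1 < s_2 < \dots < s_n \le s$ and $B_i \in \mathcal{B}(E)$. The definition of a Markov process is thus equivalent to the hierarchy of equations

\[
\mathbb{P}(X_t \in \Gamma | X_{t_1}, X_{t_2}, \dots, X_{t_n}) = \mathbb{P}(X_t \in \Gamma | X_{t_n}) \quad \text{a.s.}
\]

for $n \ge 1$, $0 \le t_1 < t_2 < \dots < t_n \le t$ and $\Gamma \in \mathcal{B}(E)$.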

Roughly speaking, the statistics of Xt for t > s are completely determined once Xs is known;

information about Xt for t < s is superfluous. In other words: a Markov process has no mem-

ory. More precisely: when a Markov process is conditioned on the present state, then there is no

memory of the past. The past and future of a Markov process are statistically independent when

the present is known.

Remark 4.3.7. A non-Markovian process Xt can be described through a Markovian one Yt by

enlarging the state space: the additional variables that we introduce account for the memory in

the Xt. This "Markovianization" trick is very useful since there exist many analytical tools for

analyzing Markovian processes.

Example 4.3.8. The velocity of a Brownian particle is modeled by the stationary Ornstein-Uhlenbeck process $Y_t = e^{-t} W(e^{2t})$. The particle position is given by the integral of the OU process (we take $X_0 = 0$)

\[
X_t = \int_0^t Y_s \, ds.
\]

The particle position depends on the past of the OU process and, consequently, is not a Markov process. However, the joint position-velocity process $\{X_t, Y_t\}$ is. Its transition probability density $p(x, y, t|x_0, y_0)$ satisfies the forward Kolmogorov equation

\[
\frac{\partial p}{\partial t} = -y\frac{\partial p}{\partial x} + \frac{\partial}{\partial y}(y p) + \frac{1}{2}\frac{\partial^2 p}{\partial y^2}.
\]

4.4 The Chapman-Kolmogorov Equation

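With a Markov process $\{X_t\}$ we can associate a function $P : T \times T \times E \times \mathcal{B}(E) \to \mathbb{R}^+$ defined through the relation

\[
\mathbb{P}\left[X_t \in \Gamma | \mathcal{F}_s^X\right] = P(s, t, X_s, \Gamma),
\]

for all $t, s \in T$ with $t \ge s$ and all $\Gamma \in \mathcal{B}(E)$. Assume that $X_s = x$. Since $\mathbb{P}[X_t \in \Gamma | \mathcal{F}_s^X] = \mathbb{P}[X_t \in \Gamma | X_s]$ we can write

\[
P(\Gamma, t|x, s) = \mathbb{P}[X_t \in \Gamma | X_s = x].
\]

The transition function $P(\Gamma, t|x, s)$ is (for fixed $t$, $x$, $s$) a probability measure on $E$ with $P(E, t|x, s) = 1$; it is $\mathcal{B}(E)$–measurable in $x$ (for fixed $t$, $s$, $\Gamma$) and satisfies the Chapman–Kolmogorov equation

\[
P(\Gamma, t|x, s) = \int_E P(\Gamma, t|y, u)\, P(dy, u|x, s) \tag{4.15}
\]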

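for all $x \in E$, $\Gamma \in \mathcal{B}(E)$ and $s, u, t \in T$ with $s \le u \le t$. The derivation of the Chapman-Kolmogorov equation is based on the assumption of Markovianity and on properties of the conditional probability. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space, $X$ a random variable from $(\Omega, \mathcal{F}, \mu)$ to $(E, \mathcal{G})$ and let $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}$. Then (see Theorem 2.4.1)

\[
\mathbb{E}(\mathbb{E}(X|\mathcal{F}_2)|\mathcal{F}_1) = \mathbb{E}(\mathbb{E}(X|\mathcal{F}_1)|\mathcal{F}_2) = \mathbb{E}(X|\mathcal{F}_1). \tag{4.16}
\]

Given $\mathcal{G} \subset \mathcal{F}$ we define the function $P_X(B|\mathcal{G}) = P(X \in B|\mathcal{G})$ for $B \in \mathcal{F}$. Assume that $f$ is such that $\mathbb{E}(f(X)) < \infty$. Then

\[
\mathbb{E}(f(X)|\mathcal{G}) = \int_{\mathbb{R}} f(x)\, P_X(dx|\mathcal{G}). \tag{4.17}
\]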


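Now we use the Markov property, together with equations (4.16) and (4.17) and the fact that $s < u \Rightarrow \mathcal{F}_s^X \subset \mathcal{F}_u^X$ to calculate:

\[
\begin{aligned}
P(\Gamma, t|x, s) &:= \mathbb{P}(X_t \in \Gamma | X_s = x) = \mathbb{P}(X_t \in \Gamma | \mathcal{F}_s^X) \\
&= \mathbb{E}(I_\Gamma(X_t) | \mathcal{F}_s^X) = \mathbb{E}(\mathbb{E}(I_\Gamma(X_t) | \mathcal{F}_s^X) | \mathcal{F}_u^X) \\
&= \mathbb{E}(\mathbb{E}(I_\Gamma(X_t) | \mathcal{F}_u^X) | \mathcal{F}_s^X) = \mathbb{E}(\mathbb{P}(X_t \in \Gamma | X_u) | \mathcal{F}_s^X) \\
&= \mathbb{E}(\mathbb{P}(X_t \in \Gamma | X_u = y) | X_s = x) \\
&= \int_{\mathbb{R}} P(\Gamma, t | X_u = y)\, P(dy, u | X_s = x) \\
&=: \int_{\mathbb{R}} P(\Gamma, t|y, u)\, P(dy, u|x, s).
\end{aligned}
\]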

IΓ(·) denotes the indicator function of the set Γ. We have also set E = R. The CK equation is

an integral equation and is the fundamental equation in the theory of Markov processes. Under

additional assumptions we will derive from it the Fokker-Planck PDE, which is the fundamental

equation in the theory of diffusion processes, and will be the main object of study in this course.

Definition 4.4.1. A Markov process is homogeneous if

\[
P(t, \Gamma | X_s = x) := P(s, t, x, \Gamma) = P(0, t - s, x, \Gamma).
\]

We set $P(0, t, \cdot, \cdot) = P(t, \cdot, \cdot)$. The Chapman–Kolmogorov (CK) equation becomes

\[
P(t + s, x, \Gamma) = \int_E P(s, x, dz)\, P(t, z, \Gamma). \tag{4.18}
\]

Let $X_t$ be a homogeneous Markov process and assume that the initial distribution of $X_t$ is given by the probability measure $\nu(\Gamma) = P(X_0 \in \Gamma)$ (for deterministic initial conditions, $X_0 = x$, we have that $\nu(\Gamma) = I_\Gamma(x)$). The transition function $P(t, x, \Gamma)$ and the initial distribution $\nu$ determine the finite dimensional distributions of $X$ by

\[
\begin{aligned}
\mathbb{P}(X_0 \in \Gamma_0, X_{t_1} \in \Gamma_1, \dots, X_{t_n} \in \Gamma_n)
= \int_{\Gamma_0} \int_{\Gamma_1} \dots \int_{\Gamma_{n-1}} & P(t_n - t_{n-1}, y_{n-1}, \Gamma_n)\, P(t_{n-1} - t_{n-2}, y_{n-2}, dy_{n-1}) \\
& \cdots \times P(t_1, y_0, dy_1)\, \nu(dy_0).
\end{aligned} \tag{4.19}
\]

Theorem 4.4.2. ([21, Sec. 4.1]) Let P (t, x,Γ) satisfy (4.18) and assume that (E, ρ) is a complete

separable metric space. Then there exists a Markov process X in E whose finite-dimensional

distributions are uniquely determined by (4.19).
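
Formula (4.19) also suggests how to sample the process at finitely many times: draw $X_0$ from $\nu$ and then draw each subsequent value from the transition function started at the previous value. The following is a minimal sketch of this idea, assuming standard Brownian motion as the example (so that $P(t, x, dy)$ is the normal law $N(x, t)$); the function names `sample_bm_transition` and `sample_finite_dimensional` are ours and purely illustrative.

```python
import math
import random

def sample_bm_transition(rng, t, x):
    """Draw from P(t, x, dy) for standard Brownian motion, i.e. from N(x, t)."""
    return rng.gauss(x, math.sqrt(t))

def sample_finite_dimensional(rng, times, sample_initial, sample_transition):
    """Sample (X_0, X_{t_1}, ..., X_{t_n}) for increasing times 0 < t_1 < ... < t_n."""
    values = [sample_initial(rng)]          # X_0 drawn from the initial law nu
    previous_time = 0.0
    for t in times:
        # By homogeneity only the elapsed time t - previous_time matters.
        values.append(sample_transition(rng, t - previous_time, values[-1]))
        previous_time = t
    return values

rng = random.Random(0)
# Deterministic initial condition X_0 = 0 (nu is a point mass at 0).
path = sample_finite_dimensional(rng, [0.5, 1.0, 2.0], lambda r: 0.0, sample_bm_transition)
print(path)
```

Only the initial law and the transition function enter the sampler, which is exactly the information used in (4.19).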


Let $X_t$ be a homogeneous Markov process with initial distribution $\nu(\Gamma) = P(X_0 \in \Gamma)$ and transition function $P(t, x, \Gamma)$. We can calculate the probability of finding $X_t$ in a set $\Gamma$ at time $t$:

\[
\mathbb{P}(X_t \in \Gamma) = \int_E P(t, x, \Gamma)\, \nu(dx).
\]

Thus, the initial distribution and the transition function are sufficient to characterize a homoge-

neous Markov process. Notice that they do not provide us with any information about the actual

paths of the Markov process. The transition probability P (Γ, t|x, s) is a probability measure. As-

sume that it has a density for all t > s:

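\[
P(\Gamma, t|x, s) = \int_\Gamma p(y, t|x, s) \, dy.
\]

Clearly, for $t = s$ we have $P(\Gamma, s|x, s) = I_\Gamma(x)$. The Chapman-Kolmogorov equation becomes:

\[
\int_\Gamma p(y, t|x, s) \, dy = \int_{\mathbb{R}} \int_\Gamma p(y, t|z, u)\, p(z, u|x, s) \, dy \, dz,
\]

and, since $\Gamma \in \mathcal{B}(\mathbb{R})$ is arbitrary, we obtain the equation

\[
p(y, t|x, s) = \int_{\mathbb{R}} p(y, t|z, u)\, p(z, u|x, s) \, dz. \tag{4.20}
\]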

The transition probability density is a function of 4 arguments: the initial position and time x, s

and the final position and time y, t.
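
Equation (4.20) can be checked numerically for a concrete density. The sketch below does this for standard Brownian motion, whose transition density is Gaussian with mean $x$ and variance $t - s$; the integral over the intermediate state $z$ is approximated by the trapezoidal rule on a truncated interval. The function names and the particular values of $x, s, u, y, t$ are illustrative choices.

```python
import math

def bm_density(y, t, x, s):
    """Transition density p(y, t | x, s) of standard Brownian motion, for t > s."""
    var = t - s
    return math.exp(-(y - x) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def ck_right_hand_side(y, t, x, s, u, half_width=12.0, n=4000):
    """Trapezoidal approximation of the integral over z of p(y,t|z,u) p(z,u|x,s)."""
    # Truncate the real line to an interval that carries essentially all the mass.
    centre = 0.5 * (x + y)
    a = centre - half_width
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        z = a + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * bm_density(y, t, z, u) * bm_density(z, u, x, s)
    return total * h

lhs = bm_density(0.7, 2.0, -0.3, 0.5)                  # direct transition (x, s) -> (y, t)
rhs = ck_right_hand_side(0.7, 2.0, -0.3, 0.5, u=1.2)   # via the intermediate time u
print(lhs, rhs)
```

The two numbers should agree up to quadrature error, for any intermediate time $u$ strictly between $s$ and $t$.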

In words, the CK equation tells us that, for a Markov process, the transition from x, s to y, t

can be done in two steps: first the system moves from x to z at some intermediate time u. Then it

moves from z to y at time t. In order to calculate the probability for the transition from (x, s) to

(y, t) we need to sum (integrate) the transitions from all possible intermediary states z. The above

description suggests that a Markov process can be described through a semigroup of operators,

i.e. a one-parameter family of linear operators with the properties

\[
P_0 = I, \qquad P_{t+s} = P_t \circ P_s \quad \forall\, t, s \ge 0.
\]

Indeed, let P (t, x, dy) be the transition function of a homogeneous Markov process. It satisfies

the CK equation (4.18):

\[
P(t + s, x, \Gamma) = \int_E P(s, x, dz)\, P(t, z, \Gamma).
\]

Let X := Cb(E) and define the operator

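\[
(P_t f)(x) := \mathbb{E}(f(X_t) | X_0 = x) = \int_E f(y)\, P(t, x, dy).
\]

This is a linear operator with

\[
(P_0 f)(x) = \mathbb{E}(f(X_0) | X_0 = x) = f(x) \;\Rightarrow\; P_0 = I.
\]

Furthermore:

\[
\begin{aligned}
(P_{t+s} f)(x) &= \int f(y)\, P(t + s, x, dy) \\
&= \int \int f(y)\, P(s, z, dy)\, P(t, x, dz) \\
&= \int \left( \int f(y)\, P(s, z, dy) \right) P(t, x, dz) \\
&= \int (P_s f)(z)\, P(t, x, dz) \\
&= (P_t \circ P_s f)(x).
\end{aligned}
\]

Consequently:

\[
P_{t+s} = P_t \circ P_s.
\]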
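
The semigroup property can be illustrated numerically. The sketch below assumes standard Brownian motion, for which $P(t, x, dy)$ is the Gaussian law $N(x, t)$, approximates $(P_t f)(x)$ by the trapezoidal rule, and compares $P_{t+s} f$ with $P_t (P_s f)$ for the bounded continuous function $f = \cos$. The helper name `heat_semigroup` and the chosen values of $t$, $s$, $x$ are ours.

```python
import math

def heat_semigroup(f, t, x, half_width=10.0, n=400):
    """Approximate (P_t f)(x), the integral of f(y) against N(x, t), by the trapezoidal rule."""
    if t == 0.0:
        return f(x)  # P_0 is the identity
    std = math.sqrt(t)
    a = x - half_width * std            # integrate over x +/- half_width standard deviations
    h = 2.0 * half_width * std / n
    total = 0.0
    for i in range(n + 1):
        y = a + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(y) * math.exp(-(y - x) ** 2 / (2.0 * t))
    return total * h / math.sqrt(2.0 * math.pi * t)

f = math.cos
t, s, x = 0.3, 0.5, 0.4
direct = heat_semigroup(f, t + s, x)                                  # (P_{t+s} f)(x)
composed = heat_semigroup(lambda z: heat_semigroup(f, s, z), t, x)    # (P_t (P_s f))(x)
print(direct, composed)
```

For this particular $f$ the exact value is $e^{-(t+s)/2} \cos x$, which gives an independent check of the quadrature.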

4.5 The Generator of a Markov Process

Let $(E, \rho)$ be a metric space and let $\{X_t\}$ be an $E$-valued homogeneous Markov process. Define the one parameter family of operators $P_t$ through

\[
P_t f(x) = \int f(y)\, P(t, x, dy) = \mathbb{E}[f(X_t) | X_0 = x]
\]

for all $f(x) \in C_b(E)$ (continuous bounded functions on $E$). Assume for simplicity that $P_t : C_b(E) \to C_b(E)$. Then the one-parameter family of operators $P_t$ forms a semigroup of operators on $C_b(E)$. We define by $D(L)$ the set of all $f \in C_b(E)$ such that the strong limit

\[
L f = \lim_{t \to 0} \frac{P_t f - f}{t}
\]

exists.

Definition 4.5.1. The operator L : D(L) → Cb(E) is called the infinitesimal generator of the

operator semigroup Pt.


Definition 4.5.2. The operator $L : D(L) \to C_b(E)$ defined above is called the generator of the Markov process $\{X_t;\ t \ge 0\}$.

The semigroup property and the definition of the generator of a semigroup imply that, formally

at least, we can write:

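\[
P_t = \exp(L t).
\]

Consider the function $u(x, t) := (P_t f)(x)$. We calculate its time derivative:

\[
\frac{\partial u}{\partial t} = \frac{d}{dt}(P_t f) = \frac{d}{dt}\left(e^{L t} f\right) = L\left(e^{L t} f\right) = L P_t f = L u.
\]

Furthermore, $u(x, 0) = P_0 f(x) = f(x)$. Consequently, $u(x, t)$ satisfies the initial value problem

\[
\frac{\partial u}{\partial t} = L u, \qquad u(x, 0) = f(x). \tag{4.21}
\]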

When the semigroup $P_t$ is the transition semigroup of a Markov process $X_t$, then equation (4.21) is called the backward Kolmogorov equation. It governs the evolution of an observable

\[
u(x, t) = \mathbb{E}(f(X_t) | X_0 = x).
\]

Thus, given the generator of a Markov process L, we can calculate all the statistics of our process

by solving the backward Kolmogorov equation. In the case where the Markov process is the

solution of a stochastic differential equation, then the generator is a second order elliptic operator

and the backward Kolmogorov equation becomes an initial value problem for a parabolic PDE.
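
As a quick worked illustration of (4.21), take the standard one dimensional Brownian motion, whose generator is $L = \frac{1}{2}\frac{d^2}{dx^2}$ (see Example 4.5.4), and the bounded observable $f(x) = \cos x$, which is our choice for this sketch. Using $\mathbb{E}\, e^{i W_t} = e^{-t/2}$, one finds:

```latex
\begin{aligned}
u(x, t) &= \mathbb{E}\big(\cos(x + W_t)\big) = e^{-t/2} \cos x, \\
\frac{\partial u}{\partial t} &= -\tfrac{1}{2}\, e^{-t/2} \cos x, \\
L u = \tfrac{1}{2} \frac{\partial^2 u}{\partial x^2} &= -\tfrac{1}{2}\, e^{-t/2} \cos x, \\
u(x, 0) &= \cos x = f(x).
\end{aligned}
```

Both sides of the backward Kolmogorov equation agree and the initial condition is satisfied.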

The space $C_b(E)$ is natural in a probabilistic context, but other Banach spaces often arise in applications; in particular when there is a measure $\mu$ on $E$, the spaces $L^p(E; \mu)$ sometimes arise. We will quite often use the space $L^2(E; \mu)$, where $\mu$ is the invariant measure of our Markov process. The generator is frequently taken as the starting point for the definition of a homogeneous Markov process. Conversely, let $P_t$ be a contraction semigroup (let $X$ be a Banach space and $T : X \to X$ a bounded operator; then $T$ is a contraction provided that $\|T f\|_X \le \|f\|_X$ for all $f \in X$), with $D(P_t) \subset C_b(E)$, closed. Then, under mild technical hypotheses, there is an $E$–valued homogeneous Markov process $\{X_t\}$ associated with $P_t$ defined through

\[
\mathbb{E}[f(X(t)) | \mathcal{F}_s^X] = P_{t-s} f(X(s))
\]

for all $t, s \in T$ with $t \ge s$ and $f \in D(P_t)$.


Example 4.5.3. The Poisson process is a homogeneous Markov process.

Example 4.5.4. The one dimensional Brownian motion is a homogeneous Markov process. The transition function is the Gaussian defined in the example in Lecture 2:

\[
P(t, x, dy) = \gamma_{t,x}(y) \, dy, \qquad \gamma_{t,x}(y) = \frac{1}{\sqrt{2 \pi t}} \exp\left(-\frac{|x - y|^2}{2t}\right).
\]

The semigroup associated to the standard Brownian motion is the heat semigroup $P_t = e^{\frac{t}{2}\frac{d^2}{dx^2}}$. The generator of this Markov process is $\frac{1}{2}\frac{d^2}{dx^2}$.

Notice that the transition probability density $\gamma_{t,x}$ of the one dimensional Brownian motion is the fundamental solution (Green's function) of the heat (diffusion) PDE

\[
\frac{\partial u}{\partial t} = \frac{1}{2}\frac{\partial^2 u}{\partial x^2}.
\]
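
That the Gaussian kernel solves the heat equation can be checked by hand, or numerically as in the following rough sketch, which estimates the residual $\partial_t \gamma - \frac{1}{2} \partial_x^2 \gamma$ by central finite differences at a single point. The step sizes and the evaluation point are arbitrary choices made for illustration.

```python
import math

def gaussian_kernel(t, x, y):
    """Transition density gamma_{t,x}(y) of standard Brownian motion."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def heat_residual(t, x, y, dt=1e-5, dx=1e-3):
    """Central-difference estimate of d/dt gamma - (1/2) d^2/dx^2 gamma."""
    time_derivative = (gaussian_kernel(t + dt, x, y)
                       - gaussian_kernel(t - dt, x, y)) / (2.0 * dt)
    second_space_derivative = (gaussian_kernel(t, x + dx, y)
                               - 2.0 * gaussian_kernel(t, x, y)
                               + gaussian_kernel(t, x - dx, y)) / dx ** 2
    return time_derivative - 0.5 * second_space_derivative

# The residual should vanish up to discretization and rounding error.
print(heat_residual(1.0, 0.3, -0.5))
```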

4.5.1 The Adjoint Semigroup

The semigroup Pt acts on bounded measurable functions. We can also define the adjoint semigroup

P ∗t which acts on probability measures:

\[
P_t^* \mu(\Gamma) = \int_{\mathbb{R}} \mathbb{P}(X_t \in \Gamma | X_0 = x) \, d\mu(x) = \int_{\mathbb{R}} P(t, x, \Gamma) \, d\mu(x).
\]

The image of a probability measure $\mu$ under $P_t^*$ is again a probability measure. The operators $P_t$ and $P_t^*$ are adjoint in the $L^2$-sense:

\[
\int_{\mathbb{R}} P_t f(x) \, d\mu(x) = \int_{\mathbb{R}} f(x) \, d(P_t^* \mu)(x). \tag{4.22}
\]

We can, formally at least, write

\[
P_t^* = \exp(L^* t),
\]

where $L^*$ is the $L^2$-adjoint of the generator of the process:

\[
\int L f \, h \, dx = \int f \, L^* h \, dx.
\]

Let $\mu_t := P_t^* \mu$. This is the law of the Markov process and $\mu$ is the initial distribution. An argument similar to the one used in the derivation of the backward Kolmogorov equation (4.21) enables us to obtain an equation for the evolution of $\mu_t$:

\[
\frac{\partial \mu_t}{\partial t} = L^* \mu_t, \qquad \mu_0 = \mu.
\]

Assuming that $\mu_t = \rho(y, t) \, dy$, $\mu = \rho_0(y) \, dy$ this equation becomes:

\[
\frac{\partial \rho}{\partial t} = L^* \rho, \qquad \rho(y, 0) = \rho_0(y). \tag{4.23}
\]

This is the forward Kolmogorov or Fokker-Planck equation. When the initial conditions are

deterministic, X0 = x, the initial condition becomes ρ0 = δ(y − x). Given the initial distribution

and the generator of the Markov process Xt, we can calculate the transition probability density by

solving the forward Kolmogorov equation. We can then calculate all statistical quantities of this process through the formula

\[
\mathbb{E}(f(X_t) | X_0 = x) = \int f(y)\, \rho(t, y; x) \, dy.
\]

We will derive rigorously the backward and forward Kolmogorov equations for Markov processes

that are defined as solutions of stochastic differential equations later on.

We can study the evolution of a Markov process in two different ways: Either through the

evolution of observables (Heisenberg/Koopman)

\[
\frac{\partial (P_t f)}{\partial t} = L (P_t f),
\]

or through the evolution of states (Schrödinger/Frobenius-Perron)

\[
\frac{\partial (P_t^* \mu)}{\partial t} = L^* (P_t^* \mu).
\]

We can also study Markov processes at the level of trajectories. We will do this after we define the

concept of a stochastic differential equation.

4.6 Ergodic Markov processes

A very important concept in the study of limit theorems for stochastic processes is that of er-

godicity. This concept, in the context of Markov processes, provides us with information on the

long–time behavior of a Markov semigroup.

Definition 4.6.1. A Markov process is called ergodic if the equation

\[
P_t g = g, \qquad g \in C_b(E), \quad \forall\, t \ge 0
\]

has only constant solutions.


Roughly speaking, ergodicity corresponds to the case where the semigroup $P_t$ is such that $P_t - I$ has only constants in its null space, or, equivalently, to the case where the generator $L$ has only constants in its null space. This follows from the definition of the generator of a Markov process.

Under some additional compactness assumptions, an ergodic Markov process has an invariant

measure µ with the property that, in the case T = R+,

\[
\lim_{t \to +\infty} \frac{1}{t} \int_0^t g(X_s) \, ds = \mathbb{E} g(x),
\]

where E denotes the expectation with respect to µ. This is a physicist’s definition of an ergodic

process: time averages equal phase space averages.

Using the adjoint semigroup we can define an invariant measure as the solution of the equation

\[
P_t^* \mu = \mu.
\]

If this measure is unique, then the Markov process is ergodic. Using this, we can obtain an equation

for the invariant measure in terms of the adjoint of the generator L∗, which is the generator of the

semigroup P ∗t . Indeed, from the definition of the generator of a semigroup and the definition of an

invariant measure, we conclude that a measure µ is invariant if and only if

L∗µ = 0

in some appropriate generalized sense ((L∗µ, f) = 0 for every bounded measurable function).

Assume that µ(dx) = ρ(x) dx. Then the invariant density satisfies the stationary Fokker-Planck

equation

L∗ρ = 0.

The invariant measure (distribution) governs the long-time dynamics of the Markov process.

4.6.1 Stationary Markov Processes

If $X_0$ is distributed according to $\mu$, then so is $X_t$ for all $t > 0$. The resulting stochastic process, with $X_0$ distributed in this way, is stationary. In this case the transition probability density (the solution of the Fokker-Planck equation) is independent of time: $\rho(x, t) = \rho(x)$. Consequently, the statistics of the Markov process is independent of time.


Example 4.6.2. Consider the one-dimensional Brownian motion. The generator of this Markov process is

\[
L = \frac{1}{2}\frac{d^2}{dx^2}.
\]

The stationary Fokker-Planck equation becomes

\[
\frac{d^2 \rho}{dx^2} = 0, \tag{4.24}
\]

together with the normalization and non-negativity conditions

\[
\rho \ge 0, \qquad \int_{\mathbb{R}} \rho(x) \, dx = 1. \tag{4.25}
\]

There are no solutions to Equation (4.24), subject to the constraints (4.25).² Thus, the one dimensional Brownian motion is not an ergodic process.

Example 4.6.3. Consider a one-dimensional Brownian motion on $[0, 1]$, with periodic boundary conditions. The generator of this Markov process $L$ is the differential operator $L = \frac{1}{2}\frac{d^2}{dx^2}$, equipped with periodic boundary conditions on $[0, 1]$. This operator is self-adjoint. The null space of both $L$ and $L^*$ comprises constant functions on $[0, 1]$. Both the backward Kolmogorov and the Fokker-Planck equation reduce to the heat equation

\[
\frac{\partial \rho}{\partial t} = \frac{1}{2}\frac{\partial^2 \rho}{\partial x^2}
\]

with periodic boundary conditions in [0, 1]. Fourier analysis shows that the solution converges to

a constant at an exponential rate. See Exercise 6.
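
The Fourier argument can be sketched as follows, under the assumption (made here only for simplicity) that the initial density $\rho_0$ has absolutely summable Fourier coefficients $\hat{\rho}_k$. Since $\frac{1}{2}\frac{d^2}{dx^2} e^{2 \pi i k x} = -2 \pi^2 k^2 e^{2 \pi i k x}$, each mode decays independently:

```latex
\begin{aligned}
\rho(x, t) &= \sum_{k \in \mathbb{Z}} \hat{\rho}_k \, e^{-2 \pi^2 k^2 t} \, e^{2 \pi i k x},
\qquad \hat{\rho}_k = \int_0^1 \rho_0(x) \, e^{-2 \pi i k x} \, dx, \\
\hat{\rho}_0 &= \int_0^1 \rho_0(x) \, dx = 1, \\
|\rho(x, t) - 1| &\le e^{-2 \pi^2 t} \sum_{k \neq 0} |\hat{\rho}_k| .
\end{aligned}
```

Hence the solution converges to the constant density $1$ on $[0, 1]$, at a rate governed by the first nonzero eigenvalue $2 \pi^2$.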

Example 4.6.4. The one dimensional Ornstein-Uhlenbeck (OU) process is a Markov process with generator

\[
L = -\alpha x \frac{d}{dx} + D \frac{d^2}{dx^2}.
\]

The null space of $L$ comprises constants in $x$. Hence, it is an ergodic Markov process. In order to calculate the invariant measure we need to solve the stationary Fokker–Planck equation:

\[
L^* \rho = 0, \qquad \rho \ge 0, \qquad \|\rho\|_{L^1(\mathbb{R})} = 1. \tag{4.26}
\]

² The general solution to Equation (4.24) is $\rho(x) = A x + B$ for arbitrary constants $A$ and $B$. This function is not normalizable, i.e. there do not exist constants $A$ and $B$ so that $\int_{\mathbb{R}} \rho(x) \, dx = 1$.

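Let us calculate the $L^2$-adjoint of $L$. Assuming that $f, h$ decay sufficiently fast at infinity, we have:

\[
\begin{aligned}
\int_{\mathbb{R}} L f \, h \, dx &= \int_{\mathbb{R}} \left[ (-\alpha x \, \partial_x f) h + (D \, \partial_x^2 f) h \right] dx \\
&= \int_{\mathbb{R}} \left[ f \, \partial_x(\alpha x h) + f (D \, \partial_x^2 h) \right] dx =: \int_{\mathbb{R}} f \, L^* h \, dx,
\end{aligned}
\]

where

\[
L^* h := \frac{d}{dx}(\alpha x h) + D \frac{d^2 h}{dx^2}.
\]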

We can calculate the invariant distribution by solving equation (4.26). The invariant measure of this process is the Gaussian measure

\[
\mu(dx) = \sqrt{\frac{\alpha}{2 \pi D}} \exp\left(-\frac{\alpha}{2D} x^2\right) dx.
\]
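
For completeness, here is one way to arrive at this Gaussian from (4.26); it is a sketch that assumes $\rho$ and $\rho'$ decay at infinity, so that the integration constant vanishes:

```latex
\begin{aligned}
L^* \rho = \frac{d}{dx}\left( \alpha x \rho + D \frac{d \rho}{dx} \right) = 0
\;&\Longrightarrow\; \alpha x \rho + D \frac{d \rho}{dx} = c = 0, \\
\frac{1}{\rho}\frac{d \rho}{dx} = -\frac{\alpha x}{D}
\;&\Longrightarrow\; \rho(x) = C \exp\left( -\frac{\alpha}{2D} x^2 \right), \\
\int_{\mathbb{R}} \rho(x) \, dx = 1
\;&\Longrightarrow\; C = \sqrt{\frac{\alpha}{2 \pi D}} .
\end{aligned}
```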

If the initial condition of the OU process is distributed according to the invariant measure, then

the OU process is a stationary Gaussian process.
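
Ergodicity of the OU process can be illustrated by comparing a long time average with the corresponding average under the invariant measure. The sketch below relies on an assumption not derived in this section: that the process with generator $L = -\alpha x \frac{d}{dx} + D \frac{d^2}{dx^2}$ has Gaussian transition law with mean $x e^{-\alpha t}$ and variance $\frac{D}{\alpha}(1 - e^{-2 \alpha t})$. It then samples a path on a time grid and averages $X_t^2$; the parameter values and the name `simulate_ou` are illustrative.

```python
import math
import random

def simulate_ou(rng, alpha, D, dt, n_steps, x0=0.0):
    """Sample the OU process on a uniform time grid using its Gaussian transition law."""
    decay = math.exp(-alpha * dt)
    # Conditional standard deviation over one time step of length dt (assumed formula).
    step_std = math.sqrt(D / alpha * (1.0 - decay ** 2))
    x = x0
    path = [x]
    for _ in range(n_steps):
        x = decay * x + step_std * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

alpha, D = 1.0, 1.0
path = simulate_ou(random.Random(1), alpha, D, dt=0.01, n_steps=200_000)
time_average = sum(x * x for x in path) / len(path)
# The invariant measure N(0, D / alpha) has second moment D / alpha.
print(time_average, D / alpha)
```

The time average is random, so agreement with $D/\alpha$ is only approximate and improves slowly with the length of the path.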

Let $X_t$ be the 1d OU process and let $X_0 \sim \mathcal{N}(0, D/\alpha)$. Then $X_t$ is a mean zero, Gaussian second order stationary process on $[0, \infty)$ with correlation function

\[
R(t) = \frac{D}{\alpha} e^{-\alpha |t|}
\]

and spectral density

\[
f(x) = \frac{D}{\pi} \frac{1}{x^2 + \alpha^2}.
\]

Furthermore, the OU process is the only real-valued mean zero Gaussian second-order stationary

Markov process defined on R.

4.7 Discussion and Bibliography


The study of operator semigroups started in the late 40’s independently by Hille and Yosida. Semi-

group theory was developed in the 50’s and 60’s by Feller, Dynkin and others, mostly in connection

to the theory of Markov processes. Necessary and sufficient conditions for an operator L to be the

generator of a (contraction) semigroup are given by the Hille-Yosida theorem [22, Ch. 7].


4.8 Exercises

1. Let $\{X_n\}$ be a stochastic process with state space $S = \mathbb{Z}$. Show that it is a Markov process if and only if for all $n$

\[
\mathbb{P}(X_{n+1} = i_{n+1} | X_1 = i_1, \dots, X_n = i_n) = \mathbb{P}(X_{n+1} = i_{n+1} | X_n = i_n).
\]

2. Show that (4.4) is the solution of the initial value problem (4.10) as well as of the final value problem

\[
-\frac{\partial p}{\partial s} = \frac{1}{2}\frac{\partial^2 p}{\partial x^2}, \qquad \lim_{s \to t} p(y, t|x, s) = \delta(y - x).
\]

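3. Use (4.5) to show that the forward and backward Kolmogorov equations for the OU process are

\[
\frac{\partial p}{\partial t} = \frac{\partial}{\partial y}(y p) + \frac{1}{2}\frac{\partial^2 p}{\partial y^2}
\]

and

\[
-\frac{\partial p}{\partial s} = -x \frac{\partial p}{\partial x} + \frac{1}{2}\frac{\partial^2 p}{\partial x^2}.
\]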

4. Let $W(t)$ be a standard one dimensional Brownian motion, let $Y(t) = \sigma W(t)$ with $\sigma > 0$ and consider the process

\[
X(t) = \int_0^t Y(s) \, ds.
\]

Show that the joint process $\{X(t), Y(t)\}$ is Markovian and write down the generator of the process.

5. Let $Y(t) = e^{-t} W(e^{2t})$ be the stationary Ornstein-Uhlenbeck process and consider the process

\[
X(t) = \int_0^t Y(s) \, ds.
\]

Show that the joint process $\{X(t), Y(t)\}$ is Markovian and write down the generator of the process.

6. Consider a one-dimensional Brownian motion on $[0, 1]$, with periodic boundary conditions. The generator of this Markov process $L$ is the differential operator $L = \frac{1}{2}\frac{d^2}{dx^2}$, equipped with periodic boundary conditions on $[0, 1]$. Show that this operator is self-adjoint. Show that the null space of both $L$ and $L^*$ comprises constant functions on $[0, 1]$. Conclude that this process is ergodic. Solve the corresponding Fokker-Planck equation for arbitrary initial conditions $\rho_0(x)$. Show that the solution converges to a constant at an exponential rate.

7. (a) Let $X$, $Y$ be mean zero Gaussian random variables with $\mathbb{E} X^2 = \sigma_X^2$, $\mathbb{E} Y^2 = \sigma_Y^2$ and correlation coefficient $\rho$ (the correlation coefficient is $\rho = \frac{\mathbb{E}(XY)}{\sigma_X \sigma_Y}$). Show that

\[
\mathbb{E}(X | Y) = \frac{\rho \sigma_X}{\sigma_Y} Y.
\]

(b) Let $X_t$ be a mean zero stationary Gaussian process with autocorrelation function $R(t)$. Use the previous result to show that

\[
\mathbb{E}[X_{t+s} | X_s] = \frac{R(t)}{R(0)} X(s), \qquad s, t \ge 0.
\]

(c) Use the previous result to show that the only stationary Gaussian Markov process with continuous autocorrelation function is the stationary OU process.

8. Show that a Gaussian process $X_t$ is a Markov process if and only if

\[
\mathbb{E}(X_{t_n} | X_{t_1} = x_1, \dots, X_{t_{n-1}} = x_{n-1}) = \mathbb{E}(X_{t_n} | X_{t_{n-1}} = x_{n-1}).
\]



Chapter 5

Diffusion Processes

5.1 Introduction

In this chapter we study a particular class of Markov processes, namely Markov processes with

continuous paths. These processes are called diffusion processes and they appear in many appli-

cations in physics, chemistry, biology and finance.

In Section 5.2 we give the definition of a diffusion process. In Section 5.3 we derive the forward

and backward Kolmogorov equations for one-dimensional diffusion processes. In Section 5.4 we

present the forward and backward Kolmogorov equations in arbitrary dimensions. The connec-

tion between diffusion processes and stochastic differential equations is presented in Section 5.5.

Discussion and bibliographical remarks are included in Section 5.7. Exercises can be found in

Section 5.8.

5.2 Definition of a Diffusion Process

A Markov process consists of three parts: a drift (deterministic), a random process and a jump

process. A diffusion process is a Markov process that has continuous sample paths (trajectories).

Thus, it is a Markov process with no jumps. A diffusion process can be defined by specifying its

first two moments:

Definition 5.2.1. A Markov process $X_t$ with transition function $P(\Gamma, t|x, s)$ is called a diffusion process if the following conditions are satisfied.

i. (Continuity). For every $x$ and every $\varepsilon > 0$

\[
\int_{|x - y| > \varepsilon} P(dy, t|x, s) = o(t - s) \tag{5.1}
\]

uniformly over $s < t$.

ii. (Definition of drift coefficient). There exists a function $a(x, s)$ such that for every $x$ and every $\varepsilon > 0$

\[
\int_{|y - x| \le \varepsilon} (y - x)\, P(dy, t|x, s) = a(x, s)(t - s) + o(t - s). \tag{5.2}
\]

iii. (Definition of diffusion coefficient). There exists a function $b(x, s)$ such that for every $x$ and every $\varepsilon > 0$

\[
\int_{|y - x| \le \varepsilon} (y - x)^2\, P(dy, t|x, s) = b(x, s)(t - s) + o(t - s). \tag{5.3}
\]

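Remark 5.2.2. In Definition 5.2.1 we had to truncate the domain of integration since we did not know whether the first and second moments exist. If we assume that there exists a $\delta > 0$ such that

\[
\lim_{t \to s} \frac{1}{t - s} \int_{\mathbb{R}^d} |y - x|^{2 + \delta}\, P(dy, t|x, s) = 0, \tag{5.4}
\]

then we can extend the integration over the whole $\mathbb{R}^d$ and use expectations in the definition of the drift and the diffusion coefficient. Indeed, let $k = 0, 1, 2$ and notice that

\[
\begin{aligned}
\int_{|y - x| > \varepsilon} |y - x|^k\, P(dy, t|x, s)
&= \int_{|y - x| > \varepsilon} |y - x|^{2 + \delta} |y - x|^{k - (2 + \delta)}\, P(dy, t|x, s) \\
&\le \frac{1}{\varepsilon^{2 + \delta - k}} \int_{|y - x| > \varepsilon} |y - x|^{2 + \delta}\, P(dy, t|x, s) \\
&\le \frac{1}{\varepsilon^{2 + \delta - k}} \int_{\mathbb{R}^d} |y - x|^{2 + \delta}\, P(dy, t|x, s).
\end{aligned}
\]

Using this estimate together with (5.4) we conclude that:

\[
\lim_{t \to s} \frac{1}{t - s} \int_{|y - x| > \varepsilon} |y - x|^k\, P(dy, t|x, s) = 0, \qquad k = 0, 1, 2.
\]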


This implies that assumption (5.4) is sufficient for the sample paths to be continuous ($k = 0$) and for the replacement of the truncated integrals in (5.2) and (5.3) by integrals over $\mathbb{R}$ ($k = 1$ and $k = 2$, respectively). The definitions of the drift and diffusion coefficients become:

\[
\lim_{t \to s} \mathbb{E}\left( \frac{X_t - X_s}{t - s} \,\Big|\, X_s = x \right) = a(x, s) \tag{5.5}
\]

and

\[
\lim_{t \to s} \mathbb{E}\left( \frac{|X_t - X_s|^2}{t - s} \,\Big|\, X_s = x \right) = b(x, s). \tag{5.6}
\]
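
The limits (5.5) and (5.6) can be watched converging in a case where the conditional moments are known in closed form. The sketch below assumes an OU process written as $dX_t = -\alpha X_t \, dt + \sigma \, dW_t$ (a parameterization chosen for this illustration), whose transition law is Gaussian with mean $x e^{-\alpha h}$ and variance $\sigma^2 (1 - e^{-2 \alpha h}) / (2 \alpha)$. In this parameterization the limits should be $a(x) = -\alpha x$ and $b(x) = \sigma^2$.

```python
import math

def ou_increment_moments(x, h, alpha, sigma):
    """Return (E[X_{s+h} - x | X_s = x] / h, E[(X_{s+h} - x)^2 | X_s = x] / h)."""
    mean = x * math.exp(-alpha * h)
    variance = sigma ** 2 * (1.0 - math.exp(-2.0 * alpha * h)) / (2.0 * alpha)
    first = (mean - x) / h
    # Second moment of the increment = variance + (mean of the increment)^2.
    second = (variance + (mean - x) ** 2) / h
    return first, second

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, ou_increment_moments(x=1.5, h=h, alpha=2.0, sigma=0.8))
```

As $h$ decreases, the first component should approach $-\alpha x = -3$ and the second $\sigma^2 = 0.64$; the squared mean increment contributes only at order $h$, which is why the second moment and the variance give the same diffusion coefficient.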

5.3 The Backward and Forward Kolmogorov Equations

In this section we show that a diffusion process is completely determined by its first two moments.

In particular, we will obtain partial differential equations that govern the evolution of the condi-

tional expectation of an arbitrary function of a diffusion process Xt, u(x, s) = E(f(Xt)|Xs = x),

as well as of the transition probability density p(y, t|x, s). These are the backward and forward

Kolmogorov equations.

In this section we shall derive the backward and forward Kolmogorov equations for one-

dimensional diffusion processes. The extension to multidimensional diffusion processes is pre-

sented in Section 5.4.

5.3.1 The Backward Kolmogorov Equation

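Theorem 5.3.1. (Kolmogorov) Let $f(x) \in C_b(\mathbb{R})$ and let

\[
u(x, s) := \mathbb{E}(f(X_t) | X_s = x) = \int f(y)\, P(dy, t|x, s).
\]

Assume furthermore that the functions $a(x, s)$, $b(x, s)$ are continuous in both $x$ and $s$. Then $u(x, s) \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$ and it solves the final value problem

\[
-\frac{\partial u}{\partial s} = a(x, s)\frac{\partial u}{\partial x} + \frac{1}{2} b(x, s)\frac{\partial^2 u}{\partial x^2}, \qquad \lim_{s \to t} u(x, s) = f(x). \tag{5.7}
\]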

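Proof. First we notice that the continuity assumption (5.1), together with the fact that the function $f(x)$ is bounded, implies that

\[
\begin{aligned}
u(x, s) &= \int_{\mathbb{R}} f(y)\, P(dy, t|x, s) \\
&= \int_{|y - x| \le \varepsilon} f(y)\, P(dy, t|x, s) + \int_{|y - x| > \varepsilon} f(y)\, P(dy, t|x, s) \\
&\le \int_{|y - x| \le \varepsilon} f(y)\, P(dy, t|x, s) + \|f\|_{L^\infty} \int_{|y - x| > \varepsilon} P(dy, t|x, s) \\
&= \int_{|y - x| \le \varepsilon} f(y)\, P(dy, t|x, s) + o(t - s).
\end{aligned}
\]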

We add and subtract the final condition $f(x)$ and use the previous calculation to obtain:

\[
\begin{aligned}
u(x, s) &= \int_{\mathbb{R}} f(y)\, P(dy, t|x, s) = f(x) + \int_{\mathbb{R}} (f(y) - f(x))\, P(dy, t|x, s) \\
&= f(x) + \int_{|y - x| \le \varepsilon} (f(y) - f(x))\, P(dy, t|x, s) + \int_{|y - x| > \varepsilon} (f(y) - f(x))\, P(dy, t|x, s) \\
&= f(x) + \int_{|y - x| \le \varepsilon} (f(y) - f(x))\, P(dy, t|x, s) + o(t - s).
\end{aligned}
\]

Now the final condition follows from the fact that $f(x) \in C_b(\mathbb{R})$ and the arbitrariness of $\varepsilon$.

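Now we show that $u(x, s)$ solves the backward Kolmogorov equation. We use the Chapman-Kolmogorov equation (4.15) to obtain

\[
\begin{aligned}
u(x, \sigma) &= \int_{\mathbb{R}} f(z)\, P(dz, t|x, \sigma) && (5.8) \\
&= \int_{\mathbb{R}} \int_{\mathbb{R}} f(z)\, P(dz, t|y, \rho)\, P(dy, \rho|x, \sigma) \\
&= \int_{\mathbb{R}} u(y, \rho)\, P(dy, \rho|x, \sigma). && (5.9)
\end{aligned}
\]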

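The Taylor series expansion of the function $u(x, s)$ gives

\[
u(z, \rho) - u(x, \rho) = \frac{\partial u(x, \rho)}{\partial x}(z - x) + \frac{1}{2}\frac{\partial^2 u(x, \rho)}{\partial x^2}(z - x)^2 (1 + \alpha_\varepsilon), \qquad |z - x| \le \varepsilon, \tag{5.10}
\]

where

\[
\alpha_\varepsilon = \sup_{\rho,\, |z - x| \le \varepsilon} \left| \frac{\partial^2 u(x, \rho)}{\partial x^2} - \frac{\partial^2 u(z, \rho)}{\partial x^2} \right|.
\]

Notice that, since $u(x, s)$ is twice continuously differentiable in $x$, $\lim_{\varepsilon \to 0} \alpha_\varepsilon = 0$.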

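We now combine (5.9) with (5.10) to calculate

\[
\begin{aligned}
\frac{u(x, s) - u(x, s + h)}{h}
&= \frac{1}{h} \left( \int_{\mathbb{R}} P(dy, s + h|x, s)\, u(y, s + h) - u(x, s + h) \right) \\
&= \frac{1}{h} \int_{\mathbb{R}} P(dy, s + h|x, s) \big( u(y, s + h) - u(x, s + h) \big) \\
&= \frac{1}{h} \int_{|x - y| < \varepsilon} P(dy, s + h|x, s) \big( u(y, s + h) - u(x, s + h) \big) + o(1) \\
&= \frac{\partial u}{\partial x}(x, s + h)\, \frac{1}{h} \int_{|x - y| < \varepsilon} (y - x)\, P(dy, s + h|x, s) \\
&\quad + \frac{1}{2}\frac{\partial^2 u}{\partial x^2}(x, s + h)\, \frac{1}{h} \int_{|x - y| < \varepsilon} (y - x)^2\, P(dy, s + h|x, s)\, (1 + \alpha_\varepsilon) + o(1) \\
&= a(x, s)\frac{\partial u}{\partial x}(x, s + h) + \frac{1}{2} b(x, s)\frac{\partial^2 u}{\partial x^2}(x, s + h)(1 + \alpha_\varepsilon) + o(1).
\end{aligned}
\]

Equation (5.7) follows by taking the limits $\varepsilon \to 0$, $h \to 0$.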

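Assume now that the transition function has a density $p(y, t|x, s)$. In this case the formula for $u(x, s)$ becomes

\[
u(x, s) = \int_{\mathbb{R}} f(y)\, p(y, t|x, s) \, dy.
\]

Substituting this in the backward Kolmogorov equation we obtain

\[
\int_{\mathbb{R}} f(y) \left( \frac{\partial p(y, t|x, s)}{\partial s} + \mathcal{A}_{s,x}\, p(y, t|x, s) \right) dy = 0 \tag{5.11}
\]

where

\[
\mathcal{A}_{s,x} := a(x, s)\frac{\partial}{\partial x} + \frac{1}{2} b(x, s)\frac{\partial^2}{\partial x^2}.
\]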

Since (5.11) is valid for arbitrary functions $f(y)$, we obtain a partial differential equation for the transition probability density:

\[
-\frac{\partial p(y, t|x, s)}{\partial s} = a(x, s)\frac{\partial p(y, t|x, s)}{\partial x} + \frac{1}{2} b(x, s)\frac{\partial^2 p(y, t|x, s)}{\partial x^2}. \tag{5.12}
\]

Notice that the variation is with respect to the "backward" variables $x, s$. We will obtain an equation with respect to the "forward" variables $y, t$ in the next section.

5.3.2 The Forward Kolmogorov Equation

In this section we will obtain the forward Kolmogorov equation. In the physics literature it is called the Fokker-Planck equation. We assume that the transition function has a density with respect to Lebesgue measure:

\[
P(\Gamma, t|x, s) = \int_\Gamma p(y, t|x, s) \, dy.
\]

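Theorem 5.3.2. (Kolmogorov) Assume that conditions (5.1), (5.2), (5.3) are satisfied and that $p(y, t|\cdot, \cdot)$, $a(y, t)$, $b(y, t) \in C^{2,1}(\mathbb{R} \times \mathbb{R}^+)$. Then the transition probability density satisfies the equation

\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial y}\big(a(y, t)\, p\big) + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(b(y, t)\, p\big), \qquad \lim_{t \to s} p(y, t|x, s) = \delta(x - y). \tag{5.13}
\]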

Proof. Fix a function $f(y) \in C_0^2(\mathbb{R})$. An argument similar to the one used in the proof of the backward Kolmogorov equation gives

\[
\lim_{h \to 0} \frac{1}{h} \left( \int f(y)\, p(y, s + h|x, s) \, dy - f(x) \right) = a(x, s) f_x(x) + \frac{1}{2} b(x, s) f_{xx}(x), \tag{5.14}
\]

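where subscripts denote differentiation with respect to $x$. On the other hand

\[
\begin{aligned}
\int f(y) \frac{\partial}{\partial t} p(y, t|x, s) \, dy
&= \frac{\partial}{\partial t} \int f(y)\, p(y, t|x, s) \, dy \\
&= \lim_{h \to 0} \frac{1}{h} \int \big( p(y, t + h|x, s) - p(y, t|x, s) \big) f(y) \, dy \\
&= \lim_{h \to 0} \frac{1}{h} \left( \int p(y, t + h|x, s) f(y) \, dy - \int p(z, t|x, s) f(z) \, dz \right) \\
&= \lim_{h \to 0} \frac{1}{h} \left( \int \int p(y, t + h|z, t)\, p(z, t|x, s) f(y) \, dy \, dz - \int p(z, t|x, s) f(z) \, dz \right) \\
&= \lim_{h \to 0} \frac{1}{h} \int p(z, t|x, s) \left( \int p(y, t + h|z, t) f(y) \, dy - f(z) \right) dz \\
&= \int p(z, t|x, s) \left( a(z, t) f_z(z) + \frac{1}{2} b(z, t) f_{zz}(z) \right) dz \\
&= \int \left( -\frac{\partial}{\partial z}\big(a(z, t)\, p(z, t|x, s)\big) + \frac{1}{2}\frac{\partial^2}{\partial z^2}\big(b(z, t)\, p(z, t|x, s)\big) \right) f(z) \, dz.
\end{aligned}
\]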

In the above calculation we used the Chapman-Kolmogorov equation. We have also performed two

integrations by parts and used the fact that, since the test function f has compact support, the

boundary terms vanish.

Since the above equation is valid for every test function f(y), the forward Kolmogorov equation

follows.

Assume now that the initial distribution of $X_t$ is $\rho_0(x)$ and set $s = 0$ (the initial time) in (5.13). Define

\[
p(y, t) := \int p(y, t|x, 0)\, \rho_0(x) \, dx. \tag{5.15}
\]


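We multiply the forward Kolmogorov equation (5.13) by $\rho_0(x)$ and integrate with respect to $x$ to obtain the equation

\[
\frac{\partial p(y, t)}{\partial t} = -\frac{\partial}{\partial y}\big(a(y, t)\, p(y, t)\big) + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(b(y, t)\, p(y, t)\big), \tag{5.16}
\]

together with the initial condition

\[
p(y, 0) = \rho_0(y). \tag{5.17}
\]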

The solution of equation (5.16) provides us with the probability density that the diffusion process $X_t$, which initially was distributed according to the probability density $\rho_0(x)$, is equal to $y$ at time $t$. Alternatively, we can think of the solution to (5.13) as the Green's function for the PDE (5.16). Using (5.16) we can calculate the expectation of an arbitrary function of the diffusion process $X_t$:

\[
\begin{aligned}
\mathbb{E}(f(X_t)) &= \int \int f(y)\, p(y, t|x, 0)\, p(x, 0) \, dx \, dy \\
&= \int f(y)\, p(y, t) \, dy,
\end{aligned}
\]

where $p(y, t)$ is the solution of (5.16). Quite often we need to calculate joint probability densities, for example the probability that $X_{t_1} = x_1$ and $X_{t_2} = x_2$. From the properties of conditional expectation we have that

\[
\begin{aligned}
p(x_1, t_1, x_2, t_2) &= \mathbb{P}(X_{t_1} = x_1, X_{t_2} = x_2) \\
&= \mathbb{P}(X_{t_1} = x_1 | X_{t_2} = x_2)\, \mathbb{P}(X_{t_2} = x_2) \\
&= p(x_1, t_1|x_2, t_2)\, p(x_2, t_2).
\end{aligned}
\]

Using the joint probability density we can calculate the statistics of a function of the diffusion process $X_t$ at times $t$ and $s$:

\[
\mathbb{E}(f(X_t, X_s)) = \int \int f(y, x)\, p(y, t|x, s)\, p(x, s) \, dx \, dy. \tag{5.18}
\]

The autocorrelation function at time $t$ and $s$ is given by

\[
\mathbb{E}(X_t X_s) = \int \int y x\, p(y, t|x, s)\, p(x, s) \, dx \, dy.
\]

In particular,

\[
\mathbb{E}(X_t X_0) = \int \int y x\, p(y, t|x, 0)\, p(x, 0) \, dx \, dy.
\]


5.4 Multidimensional Diffusion Processes

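Let $X_t$ be a diffusion process in $\mathbb{R}^d$. The drift and diffusion coefficients of a diffusion process in $\mathbb{R}^d$ are defined as:

\[
\lim_{t \to s} \frac{1}{t - s} \int_{|y - x| < \varepsilon} (y - x)\, P(dy, t|x, s) = a(x, s)
\]

and

\[
\lim_{t \to s} \frac{1}{t - s} \int_{|y - x| < \varepsilon} (y - x) \otimes (y - x)\, P(dy, t|x, s) = b(x, s).
\]

The drift coefficient $a(x, s)$ is a $d$-dimensional vector field and the diffusion coefficient $b(x, s)$ is a $d \times d$ symmetric matrix (second order tensor). The generator of a $d$ dimensional diffusion process is

\[
\begin{aligned}
L &= a(x, s) \cdot \nabla + \frac{1}{2} b(x, s) : \nabla \nabla \\
&= \sum_{j=1}^d a_j(x, s)\frac{\partial}{\partial x_j} + \frac{1}{2} \sum_{i,j=1}^d b_{ij}(x, s)\frac{\partial^2}{\partial x_i \partial x_j}.
\end{aligned}
\]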

Exercise 5.4.1. Derive rigorously the forward and backward Kolmogorov equations in arbitrary

dimensions.

Assuming that the first and second moments of the multidimensional diffusion process exist, we can write the formulas for the drift vector and diffusion matrix as

\[
\lim_{t \to s} \mathbb{E}\left( \frac{X_t - X_s}{t - s} \,\Big|\, X_s = x \right) = a(x, s) \tag{5.19}
\]

and

\[
\lim_{t \to s} \mathbb{E}\left( \frac{(X_t - X_s) \otimes (X_t - X_s)}{t - s} \,\Big|\, X_s = x \right) = b(x, s). \tag{5.20}
\]

Notice that from the above definition it follows that the diffusion matrix is symmetric and nonnegative definite.

5.5 Connection with Stochastic Differential Equations

Notice also that the continuity condition can be written in the form

\[
\mathbb{P}(|X_t - X_s| > \varepsilon \,|\, X_s = x) = o(t - s).
\]

Now it becomes clear that this condition implies that the probability of large changes in $X_t$ over short time intervals is small. Notice, on the other hand, that the above condition implies that the sample paths of a diffusion process are not differentiable: if they were, then the right hand side of the above equation would have to be $0$ when $t - s \ll 1$. The sample paths of a diffusion process have the regularity of Brownian paths. A Markovian process cannot be differentiable: we can define the derivative of a sample path only for processes for which the past and future are not statistically independent when conditioned on the present.

Let us denote the expectation conditioned on Xs = x by Es,x. Notice that the definitions of the

drift and diffusion coefficients (5.5) and (5.6) can be written in the form

\[
E^{s,x}(X_t - X_s) = a(x, s)(t - s) + o(t - s)
\]

and

\[
E^{s,x}\big( (X_t - X_s) \otimes (X_t - X_s) \big) = b(x, s)(t - s) + o(t - s).
\]

Consequently, the drift coefficient defines the mean velocity vector for the stochastic process $X_t$, whereas the diffusion coefficient (tensor) is a measure of the local magnitude of fluctuations of $X_t - X_s$ about the mean value. Hence, we can write locally:

Xt −Xs ≈ a(s,Xs)(t− s) + σ(s,Xs) ξt,

where $b = \sigma\sigma^T$ and $\xi_t$ is a mean zero Gaussian process with

\[
E^{s,x}(\xi_t \otimes \xi_s) = (t - s) I.
\]

Since we have that

Wt −Ws ∼ N (0, (t− s)I),

we conclude that we can write locally:

∆Xt ≈ a(s,Xs)∆t+ σ(s,Xs)∆Wt.

Or, replacing the differences by differentials:

dXt = a(t,Xt)dt+ σ(t,Xt)dWt.

Hence, the sample paths of a diffusion process are governed by a stochastic differential equation

(SDE).
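The local relation ΔXt ≈ a Δt + σ ΔWt can be turned directly into a simulation recipe. The following is a minimal sketch of this idea (usually called the Euler-Maruyama scheme, a name not used in the text above) for a scalar SDE. The function name `euler_maruyama`, the parameter values and the Ornstein-Uhlenbeck test case dXt = −αXt dt + √D dWt are illustrative assumptions; the reference mean and variance used for comparison are the standard ones for that linear SDE.

```python
import numpy as np

def euler_maruyama(a, sigma, x0, t_end, n_steps, n_paths, seed=0):
    """Simulate dX = a(t, X) dt + sigma(t, X) dW for a scalar process.

    Each step applies the local approximation
    X_{t+dt} - X_t ~ a(t, X_t) dt + sigma(t, X_t) (W_{t+dt} - W_t).
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.full(n_paths, float(x0))
    t = 0.0
    for _ in range(n_steps):
        # Brownian increments are independent N(0, dt) random variables.
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + a(t, x) * dt + sigma(t, x) * dw
        t += dt
    return x

# Hypothetical test case: dX = -alpha X dt + sqrt(D) dW, X_0 = 2.
alpha, D, x_start, t_end = 1.0, 1.0, 2.0, 1.0
x_end = euler_maruyama(lambda t, x: -alpha * x,
                       lambda t, x: np.sqrt(D),
                       x0=x_start, t_end=t_end, n_steps=1000, n_paths=20000)

# Reference values for this linear SDE.
mean_exact = x_start * np.exp(-alpha * t_end)
var_exact = D / (2 * alpha) * (1 - np.exp(-2 * alpha * t_end))
print(x_end.mean(), mean_exact)
print(x_end.var(), var_exact)
```

The sample mean and variance should agree with the reference values up to Monte Carlo error and a discretization bias of order Δt.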


5.6 Examples of Diffusion Processes

i. The 1-dimensional Brownian motion starting at x is a diffusion process with generator

\[
\mathcal{L} = \frac{1}{2} \frac{d^2}{dx^2}.
\]

The drift and diffusion coefficients are, respectively a(x) = 0 and b(x) = 1. The corre-

sponding stochastic differential equation is

dXt = dWt, X0 = x.

The solution of this SDE is

Xt = x+Wt.

ii. The 1-dimensional Ornstein-Uhlenbeck process is a diffusion process with drift and diffusion

coefficients, respectively, a(x) = −αx and b(x) = D. The generator of this process is

\[
\mathcal{L} = -\alpha x \frac{d}{dx} + \frac{D}{2} \frac{d^2}{dx^2}.
\]

The corresponding SDE is

dXt = −αXt dt+√DdWt.

The solution to this equation is

\[
X_t = e^{-\alpha t} X_0 + \sqrt{D} \int_0^t e^{-\alpha (t - s)}\, dW_s.
\]


5.7 Discussion and Bibliography

The argument used in the derivation of the forward and backward Kolmogorov equations goes back to Kolmogorov’s original work. More material on diffusion processes can be found in [36], [42].

5.8 Exercises

1. Prove equation (5.14).

2. Derive the initial value problem (5.16), (5.17).

3. Derive rigorously the backward and forward Kolmogorov equations in arbitrary dimensions.


Chapter 6

The Fokker-Planck Equation

6.1 Introduction

In the previous chapter we derived the backward and forward (Fokker-Planck) Kolmogorov equations and we showed that all statistical properties of a diffusion process can be calculated from the solution of the Fokker-Planck equation.¹ In this long chapter we study various properties of this equation such as existence and uniqueness of solutions, long time asymptotics, boundary conditions and spectral properties of the Fokker-Planck operator. We also study in some detail various examples of diffusion processes and of the associated Fokker-Planck equation. We will restrict attention to time-homogeneous diffusion processes, for which the drift and diffusion coefficients do not depend on time.

In Section 6.2 we study various basic properties of the Fokker-Planck equation, including existence and uniqueness of solutions, writing the equation as a conservation law and boundary conditions. In Section 6.3 we present some examples of diffusion processes and use the corresponding Fokker-Planck equation in order to calculate various quantities of interest such as moments. In Section 6.4 we study the multidimensional Ornstein-Uhlenbeck process and the spectral properties of the corresponding Fokker-Planck operator. In Section 6.5 we study stochastic processes whose drift is given by the gradient of a scalar function (gradient flows). In Section 6.7 we solve the Fokker-Planck equation for a gradient SDE using eigenfunction expansions and we show how the eigenvalue problem for the Fokker-Planck operator can be reduced to the eigenfunction expansion for a Schrödinger operator. In Section 8.2 we study the Langevin equation and the associated Fokker-Planck equation. In Section 8.3 we calculate the eigenvalues and eigenfunctions of the Fokker-Planck operator for the Langevin equation in a harmonic potential. Discussion and bibliographical remarks are included in Section 6.8. Exercises can be found in Section 6.9.

¹ In this chapter we will call the equation Fokker-Planck, which is more customary in the physics literature, rather than forward Kolmogorov, which is more customary in the mathematics literature.

6.2 Basic Properties of the FP Equation

6.2.1 Existence and Uniqueness of Solutions

Consider a homogeneous diffusion process on Rd with drift vector and diffusion matrix a(x) and

b(x). The Fokker-Planck equation is

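\[
\frac{\partial p}{\partial t} = -\sum_{j=1}^{d} \frac{\partial}{\partial x_j}\big(a_j(x)\, p\big) + \frac{1}{2} \sum_{i,j=1}^{d} \frac{\partial^2}{\partial x_i \partial x_j}\big(b_{ij}(x)\, p\big), \quad t > 0,\ x \in \mathbb{R}^d, \tag{6.1a}
\]

\[
p(x, 0) = f(x), \quad x \in \mathbb{R}^d. \tag{6.1b}
\]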

Since f(x) is the probability density of the initial condition (which is a random variable), we have

that

\[
f(x) \geq 0, \quad \text{and} \quad \int_{\mathbb{R}^d} f(x)\, dx = 1.
\]

We can also write the equation in non-divergence form:

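\[
\frac{\partial p}{\partial t} = \sum_{j=1}^{d} \tilde{a}_j(x) \frac{\partial p}{\partial x_j} + \frac{1}{2} \sum_{i,j=1}^{d} b_{ij}(x) \frac{\partial^2 p}{\partial x_i \partial x_j} + c(x)\, p, \quad t > 0,\ x \in \mathbb{R}^d, \tag{6.2a}
\]

\[
p(x, 0) = f(x), \quad x \in \mathbb{R}^d, \tag{6.2b}
\]

where

\[
\tilde{a}_i(x) = -a_i(x) + \sum_{j=1}^{d} \frac{\partial b_{ij}}{\partial x_j}, \qquad
c(x) = \frac{1}{2} \sum_{i,j=1}^{d} \frac{\partial^2 b_{ij}}{\partial x_i \partial x_j} - \sum_{i=1}^{d} \frac{\partial a_i}{\partial x_i}.
\]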

By definition (see equation (5.20)), the diffusion matrix is always symmetric and nonnegative. We will assume that it is actually uniformly positive definite, i.e. we will impose the uniform ellipticity condition: there exists a constant $\alpha > 0$, independent of $x$, such that

\[
\sum_{i,j=1}^{d} b_{ij}(x)\, \xi_i \xi_j \geq \alpha \|\xi\|^2, \quad \forall\, \xi \in \mathbb{R}^d. \tag{6.3}
\]

Furthermore, we will assume that the coefficients a, b, c are smooth and that they satisfy the growth conditions

\[
\|b(x)\| \leq M, \quad \|a(x)\| \leq M(1 + \|x\|), \quad \|c(x)\| \leq M(1 + \|x\|^2). \tag{6.4}
\]


Definition 6.2.1. We will call a solution to the Cauchy problem for the Fokker–Planck equa-

tion (6.2) a classical solution if:

i. $u \in C^{2,1}(\mathbb{R}^d, \mathbb{R}^+)$.

ii. $\forall\, T > 0$ there exists a $c > 0$ such that

\[
\|u(t, x)\|_{L^\infty(0, T)} \leq c\, e^{\alpha \|x\|^2}.
\]

iii. $\lim_{t \to 0} u(t, x) = f(x)$.

It is a standard result in the theory of parabolic partial differential equations that, under the

regularity and uniform ellipticity assumptions, the Fokker-Planck equation has a unique smooth

solution. Furthermore, the solution can be estimated in terms of an appropriate heat kernel (i.e. the

solution of the heat equation on Rd).

Theorem 6.2.2. Assume that conditions (6.3) and (6.4) are satisfied, and assume that $|f| \leq c\, e^{\alpha \|x\|^2}$. Then there exists a unique classical solution to the Cauchy problem for the Fokker–Planck equation. Furthermore, there exist positive constants $K$, $\delta$ so that

\[
|p|,\ |p_t|,\ \|\nabla p\|,\ \|D^2 p\| \leq K\, t^{(-n+2)/2} \exp\left( -\frac{1}{2t}\, \delta \|x\|^2 \right). \tag{6.5}
\]

Notice that from estimates (6.5) it follows that all moments of a uniformly elliptic diffusion process exist. In particular, we can multiply the Fokker-Planck equation by monomials $x^n$, integrate over $\mathbb{R}^d$ and integrate by parts. No boundary terms will appear, in view of the estimate (6.5).

Remark 6.2.3. The solution of the Fokker-Planck equation is nonnegative for all times, provided that the initial distribution is nonnegative. This follows from the maximum principle for parabolic PDEs.

6.2.2 The FP equation as a conservation law

The Fokker-Planck equation is in fact a conservation law: it expresses the law of conservation of

probability. To see this we define the probability current to be the vector whose ith component is

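\[
J_i := a_i(x)\, p - \frac{1}{2} \sum_{j=1}^{d} \frac{\partial}{\partial x_j}\big( b_{ij}(x)\, p \big). \tag{6.6}
\]

We use the probability current to write the Fokker–Planck equation as a continuity equation:

\[
\frac{\partial p}{\partial t} + \nabla \cdot J = 0.
\]

Integrating the FP equation over $\mathbb{R}^d$ and integrating by parts on the right hand side of the equation we obtain

\[
\frac{d}{dt} \int_{\mathbb{R}^d} p(x, t)\, dx = 0.
\]

Consequently:

\[
\|p(\cdot, t)\|_{L^1(\mathbb{R}^d)} = \|p(\cdot, 0)\|_{L^1(\mathbb{R}^d)} = 1. \tag{6.7}
\]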

Hence, the total probability is conserved, as expected. Equation (6.7) simply means that

\[
P(X_t \in \mathbb{R}^d) = 1, \quad t \geq 0.
\]

6.2.3 Boundary conditions for the Fokker–Planck equation

When studying a diffusion process that can take values on the whole of Rd, then we study the

pure initial value (Cauchy) problem for the Fokker-Planck equation, equation (6.1). The boundary

condition was that the solution decays sufficiently fast at infinity. For ergodic diffusion processes

this is equivalent to requiring that the solution of the backward Kolmogorov equation is an element

of L2(µ) where µ is the invariant measure of the process. There are many applications where it is

important to study stochastic processes in bounded domains. In this case it is necessary to specify

the value of the stochastic process (or equivalently of the solution to the Fokker-Planck equation)

on the boundary.

To understand the type of boundary conditions that we can impose on the Fokker-Planck equation, let us consider the example of a random walk on the domain {0, 1, . . . , N}.² When the random walker reaches either the left or the right boundary we can either set

i. X0 = 0 or XN = 0, which means that the particle gets absorbed at the boundary;

ii. X0 = X1 or XN = XN−1, which means that the particle is reflected at the boundary;

iii. X0 = XN, which means that the particle is moving on a circle (i.e., we identify the left and right boundaries).

Hence, we can have absorbing, reflecting or periodic boundary conditions.

² Of course, the random walk is not a diffusion process. However, as we have already seen, the Brownian motion can be defined as the limit of an appropriately rescaled random walk. A similar construction exists for more general diffusion processes.

Consider the Fokker-Planck equation posed in Ω ⊂ Rd where Ω is a bounded domain with

smooth boundary. Let J denote the probability current and let n be the unit outward pointing

normal vector to the surface. The above boundary conditions become:

i. The transition probability density vanishes on an absorbing boundary:

p(x, t) = 0, on ∂Ω.

ii. There is no net flow of probability on a reflecting boundary:

n · J(x, t) = 0, on ∂Ω.

iii. The transition probability density is a periodic function in the case of periodic boundary

conditions.

Notice that, using the terminology customary in PDE theory, absorbing boundary conditions correspond to Dirichlet boundary conditions and reflecting boundary conditions correspond to Neumann. Of course, one can consider more complicated, mixed boundary conditions.

Consider now a diffusion process in one dimension on the interval [0, L]. The boundary conditions are

\[
\begin{aligned}
p(0, t) = p(L, t) &= 0 && \text{absorbing}, \\
J(0, t) = J(L, t) &= 0 && \text{reflecting}, \\
p(0, t) &= p(L, t) && \text{periodic},
\end{aligned}
\]

where the probability current is defined in (6.6). An example of mixed boundary conditions would

be absorbing boundary conditions at the left end and reflecting boundary conditions at the right

end:

p(0, t) = J(L, t) = 0.

There is a complete classification of boundary conditions in one dimension, the Feller classifica-

tion: the BC can be regular, exit, entrance and natural.


6.3 Examples of Diffusion Processes

6.3.1 Brownian Motion

Brownian Motion on R

Set a(y, t) ≡ 0, b(y, t) ≡ 2D > 0. This diffusion process is the Brownian motion with diffusion

coefficient D. Let us calculate the transition probability density of this process assuming that

the Brownian particle is at y at time s. The Fokker-Planck equation for the transition probability

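density p(x, t|y, s) is:

\[
\frac{\partial p}{\partial t} = D \frac{\partial^2 p}{\partial x^2}, \quad p(x, s \mid y, s) = \delta(x - y). \tag{6.8}
\]

The solution to this equation is the Green’s function (fundamental solution) of the heat equation:

\[
p(x, t \mid y, s) = \frac{1}{\sqrt{4 \pi D (t - s)}} \exp\left( -\frac{(x - y)^2}{4 D (t - s)} \right). \tag{6.9}
\]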

Notice that using the Fokker-Planck equation for the Brownian motion we can immediately show

that the mean squared displacement grows linearly in time. Assuming that the Brownian particle

is at the origin at time t = 0 we get

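\[
\begin{aligned}
\frac{d}{dt} E W_t^2 &= \frac{d}{dt} \int_{\mathbb{R}} x^2\, p(x, t \mid 0, 0)\, dx \\
&= D \int_{\mathbb{R}} x^2\, \frac{\partial^2 p(x, t \mid 0, 0)}{\partial x^2}\, dx \\
&= 2D \int_{\mathbb{R}} p(x, t \mid 0, 0)\, dx = 2D,
\end{aligned}
\]

where we performed two integrations by parts and we used the fact that, in view of (6.9), no boundary terms remain. From this calculation we conclude that

\[
E W_t^2 = 2 D t.
\]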
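The linear growth of the mean squared displacement can also be checked numerically from the Green's function (6.9). The sketch below integrates x² p(x, t|0, 0) on a wide grid with a plain Riemann sum; the value D = 0.7, the grid and the helper name `heat_kernel` are arbitrary illustrative choices.

```python
import numpy as np

def heat_kernel(x, t, y=0.0, s=0.0, D=1.0):
    """Transition density (6.9) of Brownian motion with diffusion coefficient D."""
    return np.exp(-(x - y) ** 2 / (4 * D * (t - s))) / np.sqrt(4 * np.pi * D * (t - s))

D = 0.7
x = np.linspace(-60.0, 60.0, 200001)
dx = x[1] - x[0]

mass = {}
msd = {}
for t in (0.5, 1.0, 2.0):
    p = heat_kernel(x, t, D=D)
    mass[t] = np.sum(p) * dx            # total probability, should be 1
    msd[t] = np.sum(x ** 2 * p) * dx    # second moment, should be 2 D t
    print(t, mass[t], msd[t], 2 * D * t)
```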

Assume now that the initial condition W0 of the Brownian particle is a random variable with distri-

bution ρ0(x). To calculate the probability density function (distribution function) of the Brownian

particle we need to solve the Fokker-Planck equation with initial condition ρ0(x). In other words,

we need to take the average of the probability density function p(x, t|y, 0) over all initial real-

izations of the Brownian particle. The solution of the Fokker-Planck equation, the distribution

function, is

\[
p(x, t) = \int p(x, t \mid y, 0)\, \rho_0(y)\, dy. \tag{6.10}
\]

Notice that the transition probability density depends on x and y only through their difference. Thus, we can write p(x, t|y, 0) = p(x − y, t). From (6.10) we see that the distribution function is given by the convolution between the transition probability density and the initial condition, as we know from the theory of partial differential equations:

\[
p(x, t) = \int p(x - y, t)\, \rho_0(y)\, dy =: p \star \rho_0.
\]

Brownian motion with absorbing boundary conditions

We can also consider Brownian motion in a bounded domain, with either absorbing, reflecting or

periodic boundary conditions. Set D = 1/2 and consider the Fokker-Planck equation (6.8) on [0, 1] with absorbing boundary conditions:

\[
\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial x^2}, \quad p(0, t) = p(1, t) = 0. \tag{6.11}
\]

We look for a solution to this equation in a sine Fourier series:

\[
p(x, t) = \sum_{n=1}^{\infty} p_n(t) \sin(n \pi x). \tag{6.12}
\]

Notice that the boundary conditions are automatically satisfied. The initial condition is

\[
p(x, 0) = \delta(x - x_0),
\]

where we have assumed that $W_0 = x_0$. The Fourier coefficients of the initial condition are

\[
p_n(0) = 2 \int_0^1 \delta(x - x_0) \sin(n \pi x)\, dx = 2 \sin(n \pi x_0).
\]

We substitute the expansion (6.12) into (6.11) and use the orthogonality properties of the Fourier basis to obtain the equations

\[
\dot{p}_n = -\frac{n^2 \pi^2}{2}\, p_n, \quad n = 1, 2, \ldots
\]

The solution of this equation is

\[
p_n(t) = p_n(0)\, e^{-\frac{n^2 \pi^2}{2} t}.
\]

Consequently, the transition probability density for the Brownian motion on [0, 1] with absorbing boundary conditions is

\[
p(x, t \mid x_0, 0) = 2 \sum_{n=1}^{\infty} e^{-\frac{n^2 \pi^2}{2} t} \sin(n \pi x_0) \sin(n \pi x).
\]

Notice that

\[
\lim_{t \to \infty} p(x, t \mid x_0, 0) = 0.
\]

This is not surprising, since all Brownian particles will eventually get absorbed at the boundary.
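The loss of probability through the absorbing boundaries can be made quantitative by integrating the sine series over [0, 1]. The following sketch evaluates a truncated version of the series and the resulting survival probability; the starting point x0 = 0.3, the truncation at 200 terms and the function name `p_absorbing` are assumptions made for illustration only.

```python
import numpy as np

def p_absorbing(x, t, x0, n_terms=200):
    """Truncated sine series for p(x, t | x0, 0) with absorbing ends at 0 and 1."""
    n = np.arange(1, n_terms + 1)[:, None]
    terms = (np.exp(-0.5 * (n * np.pi) ** 2 * t)
             * np.sin(n * np.pi * x0) * np.sin(n * np.pi * x))
    return 2.0 * terms.sum(axis=0)

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
x0 = 0.3

survival = {}
for t in (0.05, 0.2, 1.0, 5.0):
    p = p_absorbing(x, t, x0)
    # Trapezoidal rule: probability that the particle has not been absorbed by time t.
    survival[t] = np.sum(p[:-1] + p[1:]) * 0.5 * dx
    print(t, survival[t])

# For large t only the n = 1 mode matters: survival ~ (4/pi) sin(pi x0) exp(-pi^2 t / 2).
leading_mode = 4.0 / np.pi * np.sin(np.pi * x0) * np.exp(-0.5 * np.pi ** 2 * 1.0)
```

The survival probability decreases monotonically in t and, for large t, decays like the slowest mode exp(−π²t/2).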

Brownian Motion with Reflecting Boundary Condition

Consider now Brownian motion on the interval [0, 1] with reflecting boundary conditions and set D = 1/2 for simplicity. In order to calculate the transition probability density we have to solve the Fokker-Planck equation, which is the heat equation on [0, 1] with Neumann boundary conditions:

\[
\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial x^2}, \quad \partial_x p(0, t) = \partial_x p(1, t) = 0, \quad p(x, 0) = \delta(x - x_0).
\]

The boundary conditions are satisfied by functions of the form $\cos(n \pi x)$. We look for a solution in the form of a cosine Fourier series

\[
p(x, t) = \frac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n(t) \cos(n \pi x).
\]

From the initial conditions we obtain

\[
a_n(0) = 2 \int_0^1 \cos(n \pi x)\, \delta(x - x_0)\, dx = 2 \cos(n \pi x_0).
\]

We substitute the expansion into the PDE and use the orthonormality of the Fourier basis to obtain the equations for the Fourier coefficients:

\[
\dot{a}_n = -\frac{n^2 \pi^2}{2}\, a_n,
\]

from which we deduce that

\[
a_n(t) = a_n(0)\, e^{-\frac{n^2 \pi^2}{2} t}.
\]

Consequently

\[
p(x, t \mid x_0, 0) = 1 + 2 \sum_{n=1}^{\infty} \cos(n \pi x_0) \cos(n \pi x)\, e^{-\frac{n^2 \pi^2}{2} t}.
\]

Notice that Brownian motion with reflecting boundary conditions is an ergodic Markov process. To see this, let us consider the stationary Fokker-Planck equation

\[
\frac{\partial^2 p_s}{\partial x^2} = 0, \quad \partial_x p_s(0) = \partial_x p_s(1) = 0.
\]

The unique normalized solution to this boundary value problem is $p_s(x) = 1$. Indeed, we multiply the equation by $p_s$, integrate by parts and use the boundary conditions to obtain

\[
\int_0^1 \left| \frac{d p_s}{dx} \right|^2 dx = 0,
\]

from which it follows that $p_s$ is constant and hence, by normalization, $p_s(x) = 1$. Alternatively, by taking the limit of $p(x, t \mid x_0, 0)$ as $t \to \infty$ we obtain the invariant distribution:

\[
\lim_{t \to \infty} p(x, t \mid x_0, 0) = 1.
\]

Now we can calculate the stationary autocorrelation function:

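\[
\begin{aligned}
E(W(t) W(0)) &= \int_0^1 \!\! \int_0^1 x\, x_0\, p(x, t \mid x_0, 0)\, p_s(x_0)\, dx\, dx_0 \\
&= \int_0^1 \!\! \int_0^1 x\, x_0 \left( 1 + 2 \sum_{n=1}^{\infty} \cos(n \pi x_0) \cos(n \pi x)\, e^{-\frac{n^2 \pi^2}{2} t} \right) dx\, dx_0 \\
&= \frac{1}{4} + \frac{8}{\pi^4} \sum_{n=0}^{+\infty} \frac{1}{(2n + 1)^4}\, e^{-\frac{(2n+1)^2 \pi^2}{2} t}.
\end{aligned}
\]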
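The closed form for the stationary autocorrelation function can be cross-checked against a direct numerical evaluation of the double integral. In the sketch below the double integral is reduced, mode by mode, to the one-dimensional integrals of x cos(nπx), which are computed with the trapezoidal rule. The function names and the truncation levels are illustrative assumptions.

```python
import numpy as np

def autocorr_closed_form(t, n_terms=200):
    """1/4 + (8/pi^4) * sum over odd k of exp(-k^2 pi^2 t / 2) / k^4."""
    k = 2.0 * np.arange(n_terms) + 1.0
    return 0.25 + 8.0 / np.pi ** 4 * np.sum(np.exp(-0.5 * (k * np.pi) ** 2 * t) / k ** 4)

def autocorr_quadrature(t, n_modes=60, n_grid=4001):
    """Integrate x * x0 * p(x, t | x0, 0) over [0, 1]^2 using the cosine series."""
    x = np.linspace(0.0, 1.0, n_grid)
    dx = x[1] - x[0]
    w = np.full(n_grid, dx)
    w[0] = w[-1] = 0.5 * dx                      # trapezoidal weights
    total = np.sum(w * x) ** 2                   # contribution of the constant mode
    for n in range(1, n_modes + 1):
        c_n = np.sum(w * x * np.cos(n * np.pi * x))
        total += 2.0 * c_n ** 2 * np.exp(-0.5 * (n * np.pi) ** 2 * t)
    return total

for t in (0.0, 0.1, 1.0):
    print(t, autocorr_closed_form(t), autocorr_quadrature(t))
```

At t = 0 both expressions reduce to the second moment 1/3 of the uniform distribution on [0, 1], and for large t they approach 1/4, the square of its mean.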

6.3.2 The Ornstein-Uhlenbeck Process

We set now a(x, t) = −αx, b(x, t) = 2D > 0. With this drift and diffusion coefficients the

Fokker-Planck equation becomes

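\[
\frac{\partial p}{\partial t} = \alpha \frac{\partial (x p)}{\partial x} + D \frac{\partial^2 p}{\partial x^2}. \tag{6.13}
\]

This is the Fokker-Planck equation for the Ornstein-Uhlenbeck process. The corresponding stochastic differential equation is

\[
dX_t = -\alpha X_t\, dt + \sqrt{2D}\, dW_t.
\]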

So, in addition to Brownian motion there is a linear force pulling the particle towards the origin.

We know that Brownian motion is not a stationary process, since the variance grows linearly in

time. By adding a linear damping term, it is reasonable to expect that the resulting process can be

stationary. As we have already seen, this is indeed the case.

The transition probability density $p_{OU}(x, t \mid y, s)$ for an OU particle that is located at y at time s is

\[
p_{OU}(x, t \mid y, s) = \sqrt{\frac{\alpha}{2 \pi D (1 - e^{-2\alpha(t - s)})}} \exp\left( -\frac{\alpha (x - e^{-\alpha(t - s)} y)^2}{2 D (1 - e^{-2\alpha(t - s)})} \right). \tag{6.14}
\]

We obtained this formula in Example (4.2.4) (for α = D = 1) by using the fact that the OU process can be defined through a time change of the Brownian motion. We can also derive it by solving equation (6.13). To obtain (6.14), we first take the Fourier transform of the transition probability density with respect to x, solve the resulting first order PDE using the method of characteristics and then take the inverse Fourier transform.³

Notice that from formula (6.14) it immediately follows that in the limit as the friction coefficient

α goes to 0, the transition probability of the OU processes converges to the transition probability

of Brownian motion. Furthermore, by taking the long time limit in (6.14) we obtain (we have set

s = 0)

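\[
\lim_{t \to +\infty} p_{OU}(x, t \mid y, 0) = \sqrt{\frac{\alpha}{2 \pi D}} \exp\left( -\frac{\alpha x^2}{2D} \right),
\]

irrespective of the initial position y of the OU particle. This is to be expected, since as we have already seen the Ornstein-Uhlenbeck process is an ergodic Markov process, with a Gaussian invariant distribution

\[
p_s(x) = \sqrt{\frac{\alpha}{2 \pi D}} \exp\left( -\frac{\alpha x^2}{2D} \right). \tag{6.15}
\]

Using now (6.14) and (6.15) we obtain the stationary joint probability density

\[
\begin{aligned}
p_2(x, t \mid y, 0) &= p(x, t \mid y, 0)\, p_s(y) \\
&= \frac{\alpha}{2 \pi D \sqrt{1 - e^{-2\alpha t}}} \exp\left( -\frac{\alpha (x^2 + y^2 - 2 x y\, e^{-\alpha t})}{2 D (1 - e^{-2\alpha t})} \right).
\end{aligned}
\]

More generally, we have

\[
p_2(x, t \mid y, s) = \frac{\alpha}{2 \pi D \sqrt{1 - e^{-2\alpha |t - s|}}} \exp\left( -\frac{\alpha (x^2 + y^2 - 2 x y\, e^{-\alpha |t - s|})}{2 D (1 - e^{-2\alpha |t - s|})} \right). \tag{6.16}
\]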

Now we can calculate the stationary autocorrelation function of the OU process

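\[
\begin{aligned}
E(X(t) X(s)) &= \int\!\!\int x\, y\, p_2(x, t \mid y, s)\, dx\, dy && (6.17) \\
&= \frac{D}{\alpha}\, e^{-\alpha |t - s|}. && (6.18)
\end{aligned}
\]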

In order to calculate the double integral we need to perform an appropriate change of variables.

The calculation is similar to the one presented in Section 2.6. See Exercise 2.
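As a sanity check of (6.18), the stationary OU process can be sampled exactly, without time-discretization error, because the transition density (6.14) is Gaussian with mean e^{−αh} y and variance (D/α)(1 − e^{−2αh}) over a time step h. The sketch below draws the initial condition from the invariant density (6.15) and estimates E(X(t)X(0)) by Monte Carlo; the parameter values, the seed and the helper name `ou_step` are illustrative assumptions.

```python
import numpy as np

alpha, D = 1.5, 0.8
rng = np.random.default_rng(1)
n_paths = 200_000

# Stationary initial condition: X_0 ~ N(0, D / alpha), see (6.15).
x0 = rng.normal(0.0, np.sqrt(D / alpha), n_paths)

def ou_step(x, h):
    """Advance the OU process by time h using the Gaussian transition density (6.14)."""
    decay = np.exp(-alpha * h)
    std = np.sqrt(D / alpha * (1.0 - decay ** 2))
    return decay * x + std * rng.normal(size=x.shape)

estimates = {}
exact = {}
x, t = x0.copy(), 0.0
for lag in (0.25, 0.5, 1.0):
    x = ou_step(x, lag - t)
    t = lag
    estimates[lag] = np.mean(x * x0)            # Monte Carlo estimate of E(X_t X_0)
    exact[lag] = D / alpha * np.exp(-alpha * lag)
    print(lag, estimates[lag], exact[lag])
```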

³ This calculation will be presented in Section ?? for the Fokker-Planck equation of a linear SDE in arbitrary dimensions.

Assume that the initial position of the OU particle is a random variable distributed according to a distribution ρ0(x). As in the case of a Brownian particle, the probability density function (distribution function) is given by the convolution integral

\[
p(x, t) = \int p(x - y, t)\, \rho_0(y)\, dy, \tag{6.19}
\]

where p(x− y, t) := p(x, t|y, 0). When the OU process is distributed initially according to its in-

variant distribution, ρ0(x) = ps(x) given by (6.15), then the Ornstein-Uhlenbeck process becomes

stationary. The distribution function is given by ps(x) at all times and the joint probability density

is given by (6.16).

Knowledge of the distribution function enables us to calculate all moments of the OU process

using the formula

\[
E((X_t)^n) = \int x^n\, p(x, t)\, dx.
\]

We will calculate the moments by using the Fokker-Planck equation, rather than the explicit formula for the transition probability density. Let $M_n(t)$ denote the nth moment of the OU process,

\[
M_n := \int_{\mathbb{R}} x^n\, p(x, t)\, dx, \quad n = 0, 1, 2, \ldots
\]

Let n = 0. We integrate the FP equation over $\mathbb{R}$ to obtain:

\[
\int \frac{\partial p}{\partial t}\, dx = \alpha \int \frac{\partial (x p)}{\partial x}\, dx + D \int \frac{\partial^2 p}{\partial x^2}\, dx = 0,
\]

after an integration by parts and using the fact that p(x, t) decays sufficiently fast at infinity. Consequently:

\[
\frac{d}{dt} M_0 = 0 \quad \Rightarrow \quad M_0(t) = M_0(0) = 1.
\]

In other words, since

\[
\frac{d}{dt} \|p\|_{L^1(\mathbb{R})} = 0,
\]

we deduce that

\[
\int_{\mathbb{R}} p(x, t)\, dx = \int_{\mathbb{R}} p(x, t = 0)\, dx = 1,
\]

which means that the total probability is conserved, as we have already shown for the general Fokker-Planck equation in arbitrary dimensions. Let n = 1. We multiply the FP equation for the OU process by x, integrate over $\mathbb{R}$ and perform an integration by parts to obtain:

\[
\frac{d}{dt} M_1 = -\alpha M_1.
\]

Consequently, the first moment converges exponentially fast to 0:

\[
M_1(t) = e^{-\alpha t} M_1(0).
\]

Let now $n \geq 2$. We multiply the FP equation for the OU process by $x^n$ and integrate by parts (once on the first term on the RHS and twice on the second) to obtain:

\[
\frac{d}{dt} \int x^n p\, dx = -\alpha n \int x^n p\, dx + D n (n - 1) \int x^{n-2} p\, dx.
\]

Or, equivalently:

\[
\frac{d}{dt} M_n = -\alpha n M_n + D n (n - 1) M_{n-2}, \quad n \geq 2.
\]

This is a first order linear inhomogeneous differential equation. We can solve it using the variation

of constants formula:

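\[
M_n(t) = e^{-\alpha n t} M_n(0) + D n (n - 1) \int_0^t e^{-\alpha n (t - s)} M_{n-2}(s)\, ds. \tag{6.20}
\]

We can use this formula, together with the formulas for the first two moments in order to calculate all higher order moments in an iterative way. For example, for n = 2 we have

\[
\begin{aligned}
M_2(t) &= e^{-2\alpha t} M_2(0) + 2D \int_0^t e^{-2\alpha (t - s)} M_0(s)\, ds \\
&= e^{-2\alpha t} M_2(0) + \frac{D}{\alpha}\, e^{-2\alpha t} (e^{2\alpha t} - 1) \\
&= \frac{D}{\alpha} + e^{-2\alpha t} \left( M_2(0) - \frac{D}{\alpha} \right).
\end{aligned}
\]

Consequently, the second moment converges exponentially fast to its stationary value $D/\alpha$. The stationary moments of the OU process are:

\[
\langle y^n \rangle_{OU} := \sqrt{\frac{\alpha}{2 \pi D}} \int_{\mathbb{R}} y^n e^{-\frac{\alpha y^2}{2D}}\, dy =
\begin{cases}
1 \cdot 3 \cdots (n - 1) \left( \dfrac{D}{\alpha} \right)^{n/2}, & n \text{ even}, \\
0, & n \text{ odd}.
\end{cases}
\]

It is not hard to check that (see Exercise 3)

\[
\lim_{t \to \infty} M_n(t) = \langle y^n \rangle_{OU} \tag{6.21}
\]

exponentially fast.⁴ Since we have already shown that the distribution function of the OU process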

converges to the Gaussian distribution in the limit as t→ +∞, it is not surprising that the moments

also converge to the moments of the invariant Gaussian measure. What is not so obvious is that the

convergence is exponentially fast. In the next section we will prove that the Ornstein-Uhlenbeck

process does, indeed, converge to equilibrium exponentially fast. Of course, if the initial conditions

of the OU process are stationary, then the moments of the OU process become independent of time

and given by their equilibrium values

Mn(t) = Mn(0) = 〈xn〉OU . (6.22)
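The moment hierarchy dMn/dt = −αnMn + Dn(n − 1)Mn−2 is a closed linear system, so the convergence to the stationary moments can be observed numerically. The sketch below integrates the system with a hand-written classical Runge-Kutta (RK4) step, used here in place of the variation of constants formula (6.20). The parameter values, the deterministic initial condition X0 = 1 and all function names are illustrative assumptions.

```python
import numpy as np

def moment_rhs(m, alpha, D):
    """Right-hand side of dM_n/dt = -alpha n M_n + D n (n-1) M_{n-2}, n = 0..N."""
    n = np.arange(len(m), dtype=float)
    dm = -alpha * n * m
    dm[2:] += D * n[2:] * (n[2:] - 1.0) * m[:-2]
    return dm

def integrate_moments(m0, alpha, D, t_end, n_steps):
    """Integrate the moment equations with the classical RK4 scheme."""
    m = np.array(m0, dtype=float)
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = moment_rhs(m, alpha, D)
        k2 = moment_rhs(m + 0.5 * h * k1, alpha, D)
        k3 = moment_rhs(m + 0.5 * h * k2, alpha, D)
        k4 = moment_rhs(m + h * k3, alpha, D)
        m = m + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return m

def stationary_moment(n, alpha, D):
    """1 * 3 * ... * (n-1) * (D/alpha)^(n/2) for even n, and 0 for odd n."""
    if n % 2 == 1:
        return 0.0
    return float(np.prod(np.arange(1, n, 2))) * (D / alpha) ** (n / 2)

alpha, D = 1.0, 0.5
m0 = np.ones(7)                      # X_0 = 1 deterministically, so M_n(0) = 1
m_long = integrate_moments(m0, alpha, D, t_end=25.0, n_steps=5000)
m_stat = np.array([stationary_moment(n, alpha, D) for n in range(7)])
print(m_long)
print(m_stat)

# Compare M_2 at a finite time with the closed form derived above.
t_check = 0.7
m2_numeric = integrate_moments(m0, alpha, D, t_check, n_steps=700)[2]
m2_closed = D / alpha + np.exp(-2 * alpha * t_check) * (m0[2] - D / alpha)
```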

6.3.3 The Geometric Brownian Motion

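We set $a(x) = \mu x$, $b(x) = \sigma^2 x^2$. This is the geometric Brownian motion. The corresponding stochastic differential equation is

\[
dX_t = \mu X_t\, dt + \sigma X_t\, dW_t.
\]

This equation is one of the basic models in mathematical finance. The coefficient σ is called the volatility. The generator of this process is

\[
\mathcal{L} = \mu x \frac{\partial}{\partial x} + \frac{\sigma^2 x^2}{2} \frac{\partial^2}{\partial x^2}.
\]

Notice that this operator is not uniformly elliptic. The Fokker-Planck equation of the geometric Brownian motion is:

\[
\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}(\mu x\, p) + \frac{\partial^2}{\partial x^2} \left( \frac{\sigma^2 x^2}{2}\, p \right).
\]

We can easily obtain an equation for the nth moment of the geometric Brownian motion:

\[
\frac{d}{dt} M_n = \left( \mu n + \frac{\sigma^2}{2} n (n - 1) \right) M_n, \quad n \geq 2.
\]

The solution of this equation is

\[
M_n(t) = e^{\left( \mu + (n - 1) \frac{\sigma^2}{2} \right) n t} M_n(0), \quad n \geq 2
\]

and

\[
M_1(t) = e^{\mu t} M_1(0).
\]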

⁴ Of course, we need to assume that the initial distribution has finite moments of all orders in order to justify the above calculations.

Notice that the nth moment might diverge as $t \to \infty$, depending on the values of µ and σ. Consider for example the second moment and assume that $\mu < 0$. We have

\[
M_2(t) = e^{(2\mu + \sigma^2) t} M_2(0),
\]

which diverges when $\sigma^2 + 2\mu > 0$.
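The moment formula can be checked without simulating paths. The sketch below relies on the standard closed-form solution of the geometric Brownian motion SDE, Xt = X0 exp((µ − σ²/2)t + σWt), which is not derived in the text above and is used here as an assumption. The expectation over Wt ~ N(0, t) is computed with Gauss-Hermite quadrature. The parameter values (chosen so that µ < 0 but σ² + 2µ > 0) and the function names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gbm_moment_quadrature(n, t, mu, sigma, x0=1.0, n_nodes=80):
    """E[X_t^n] for X_t = x0 exp((mu - sigma^2/2) t + sigma W_t), W_t ~ N(0, t)."""
    z, w = hermegauss(n_nodes)                   # nodes/weights for weight exp(-z^2/2)
    x_t = x0 * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * np.sqrt(t) * z)
    return np.sum(w * x_t ** n) / np.sqrt(2.0 * np.pi)

def gbm_moment_formula(n, t, mu, sigma, x0=1.0):
    """M_n(t) = exp((mu + (n-1) sigma^2 / 2) n t) M_n(0) with M_n(0) = x0^n."""
    return x0 ** n * np.exp((mu + 0.5 * (n - 1) * sigma ** 2) * n * t)

mu, sigma = -0.5, 1.2                            # mu < 0 but sigma^2 + 2 mu = 0.44 > 0
for n in (1, 2, 3):
    print(n, gbm_moment_quadrature(n, 1.0, mu, sigma), gbm_moment_formula(n, 1.0, mu, sigma))
```

With these values the mean decays in time while the second moment grows, which is the situation described above.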

6.4 The Ornstein-Uhlenbeck Process and Hermite Polynomials

The Ornstein-Uhlenbeck process is one of the few stochastic processes for which we can calcu-

late explicitly the solution of the corresponding SDE, the solution of the Fokker-Planck equation

as well as the eigenfunctions of the generator of the process. In this section we will show that

the eigenfunctions of the OU process are the Hermite polynomials. We will also study various

properties of the generator of the OU process. In the next section we will show that many of the

properties of the OU process (ergodicity, self-adjointness of the generator, exponentially fast con-

vergence to equilibrium, real, discrete spectrum) are shared by a large class of diffusion processes,

namely those for which the drift term can be written in terms of the gradient of a smooth function.

The generator of the $d$-dimensional OU process is (we set the drift coefficient equal to 1)

\[
\mathcal{L} = -p \cdot \nabla_p + \beta^{-1} \Delta_p \tag{6.23}
\]

where β denotes the inverse temperature. We have already seen that the OU process is an ergodic Markov process whose unique invariant measure is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$ with Gaussian density $\rho_\beta \in C^\infty(\mathbb{R}^d)$

\[
\rho_\beta(p) = \frac{1}{(2 \pi \beta^{-1})^{d/2}}\, e^{-\beta \frac{|p|^2}{2}}.
\]

The natural function space for studying the generator of the OU process is the $L^2$-space weighted by the invariant measure of the process. This is a separable Hilbert space with norm

\[
\|f\|_\rho^2 := \int_{\mathbb{R}^d} f^2 \rho_\beta\, dp
\]

and corresponding inner product

\[
(f, h)_\rho = \int_{\mathbb{R}^d} f h\, \rho_\beta\, dp.
\]

Similarly, we can define weighted $L^2$-spaces involving derivatives, i.e. weighted Sobolev spaces. See Exercise .

The reason why this is the right function space in which to study questions related to conver-

gence to equilibrium is that the generator of the OU process becomes a self-adjoint operator in this

space. In fact, L defined in (6.23) has many nice properties that are summarized in the following

proposition.

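Proposition 6.4.1. The operator $\mathcal{L}$ has the following properties:

i. For every $f, h \in C_0^2(\mathbb{R}^d) \cap L^2_\rho(\mathbb{R}^d)$,

\[
(\mathcal{L} f, h)_\rho = (f, \mathcal{L} h)_\rho = -\beta^{-1} \int_{\mathbb{R}^d} \nabla f \cdot \nabla h\, \rho_\beta\, dp. \tag{6.24}
\]

ii. $\mathcal{L}$ is a non-positive operator on $L^2_\rho$.

iii. $\mathcal{L} f = 0$ iff $f \equiv \text{const}$.

iv. For every $f \in C_0^2(\mathbb{R}^d) \cap L^2_\rho(\mathbb{R}^d)$ with $\int f \rho_\beta = 0$,

\[
(-\mathcal{L} f, f)_\rho \geq \|f\|_\rho^2. \tag{6.25}
\]

Proof. Equation (6.24) follows from an integration by parts, using $\nabla \rho_\beta = -\beta p\, \rho_\beta$:

\[
\begin{aligned}
(\mathcal{L} f, h)_\rho &= \int -p \cdot \nabla f\, h\, \rho_\beta\, dp + \beta^{-1} \int \Delta f\, h\, \rho_\beta\, dp \\
&= \int -p \cdot \nabla f\, h\, \rho_\beta\, dp - \beta^{-1} \int \nabla f \cdot \nabla h\, \rho_\beta\, dp + \int p \cdot \nabla f\, h\, \rho_\beta\, dp \\
&= -\beta^{-1} (\nabla f, \nabla h)_\rho.
\end{aligned}
\]

Non-positivity of $\mathcal{L}$ follows from (6.24) upon setting $h = f$:

\[
(\mathcal{L} f, f)_\rho = -\beta^{-1} \|\nabla f\|_\rho^2 \leq 0.
\]

Similarly, multiplying the equation $\mathcal{L} f = 0$ by $f \rho_\beta$, integrating over $\mathbb{R}^d$ and using (6.24) gives

\[
\|\nabla f\|_\rho = 0,
\]

from which we deduce that $f \equiv \text{const}$. The spectral gap follows from (6.24), together with Poincaré’s inequality for Gaussian measures:

\[
\int_{\mathbb{R}^d} f^2 \rho_\beta\, dp \leq \beta^{-1} \int_{\mathbb{R}^d} |\nabla f|^2 \rho_\beta\, dp \tag{6.26}
\]

for every $f \in H^1(\mathbb{R}^d; \rho_\beta)$ with $\int f \rho_\beta = 0$. Indeed, upon combining (6.24) with (6.26) we obtain:

\[
(\mathcal{L} f, f)_\rho = -\beta^{-1} \|\nabla f\|_\rho^2 \leq -\|f\|_\rho^2.
\]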

The spectral gap of the generator of the OU process, which is equivalent to the compactness of its resolvent, implies that $\mathcal{L}$ has discrete spectrum. Furthermore, since it is also a self-adjoint operator, we have that its eigenfunctions form a countable orthonormal basis for the separable Hilbert space $L^2_\rho$. In fact, we can calculate the eigenvalues and eigenfunctions of the generator of the OU process in one dimension.⁵

⁵ The multidimensional problem can be treated similarly by taking tensor products of the eigenfunctions of the one dimensional problem.

Theorem 6.4.2. Consider the eigenvalue problem for the generator of the OU process in one dimension

\[
-\mathcal{L} f_n = \lambda_n f_n. \tag{6.27}
\]

Then the eigenvalues of $\mathcal{L}$ are the nonnegative integers:

\[
\lambda_n = n, \quad n = 0, 1, 2, \ldots
\]

The corresponding eigenfunctions are the normalized Hermite polynomials:

\[
f_n(p) = \frac{1}{\sqrt{n!}} H_n\big( \sqrt{\beta}\, p \big), \tag{6.28}
\]

where

\[
H_n(p) = (-1)^n e^{\frac{p^2}{2}} \frac{d^n}{dp^n} \left( e^{-\frac{p^2}{2}} \right). \tag{6.29}
\]

For the subsequent calculations we will need some additional properties of Hermite polynomials which we state here without proof (we use the notation $\rho_1 = \rho$).

Proposition 6.4.3. For each $\lambda \in \mathbb{C}$, set

\[
H(p; \lambda) = e^{\lambda p - \frac{\lambda^2}{2}}, \quad p \in \mathbb{R}.
\]

Then

\[
H(p; \lambda) = \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} H_n(p), \quad p \in \mathbb{R}, \tag{6.30}
\]

where the convergence is both uniform on compact subsets of $\mathbb{R} \times \mathbb{C}$, and for λ’s in compact subsets of $\mathbb{C}$, uniform in $L^2(\mathbb{C}; \rho)$. In particular, $\{ f_n(p) := \frac{1}{\sqrt{n!}} H_n(\sqrt{\beta}\, p) : n \in \mathbb{N} \}$ is an orthonormal basis in $L^2(\mathbb{C}; \rho_\beta)$.

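From (6.29) it is clear that $H_n$ is a polynomial of degree n. Furthermore, only odd (even) powers appear in $H_n(p)$ when n is odd (even). Furthermore, the coefficient multiplying $p^n$ in $H_n(p)$ is always 1. The orthonormality of the modified Hermite polynomials $f_n(p)$ defined in (6.28) implies that

\[
\int_{\mathbb{R}} f_n(p) f_m(p)\, \rho_\beta(p)\, dp = \delta_{nm}.
\]

The first few Hermite polynomials and the corresponding rescaled/normalized eigenfunctions of the generator of the OU process are:

\[
\begin{aligned}
H_0(p) &= 1, & f_0(p) &= 1, \\
H_1(p) &= p, & f_1(p) &= \sqrt{\beta}\, p, \\
H_2(p) &= p^2 - 1, & f_2(p) &= \frac{\beta}{\sqrt{2}}\, p^2 - \frac{1}{\sqrt{2}}, \\
H_3(p) &= p^3 - 3p, & f_3(p) &= \frac{\beta^{3/2}}{\sqrt{6}}\, p^3 - \frac{3 \sqrt{\beta}}{\sqrt{6}}\, p, \\
H_4(p) &= p^4 - 6p^2 + 3, & f_4(p) &= \frac{1}{\sqrt{24}} \left( \beta^2 p^4 - 6 \beta p^2 + 3 \right), \\
H_5(p) &= p^5 - 10 p^3 + 15 p, & f_5(p) &= \frac{1}{\sqrt{120}} \left( \beta^{5/2} p^5 - 10 \beta^{3/2} p^3 + 15 \beta^{1/2} p \right).
\end{aligned}
\]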
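The polynomials defined by (6.29) are the probabilists' Hermite polynomials, which NumPy exposes as `numpy.polynomial.hermite_e.HermiteE`. The sketch below uses them to check three statements made above: the coefficients of H4, the orthonormality of the fn with respect to ρβ, and the eigenvalue relation, written in the rescaled variable z = √β p as z H′n − H″n = n Hn. The value β = 2 and the helper name `f` are illustrative choices.

```python
import numpy as np
from math import factorial
from numpy.polynomial import Polynomial
from numpy.polynomial.hermite_e import HermiteE, hermegauss

beta = 2.0

def f(n, p):
    """Normalized eigenfunction f_n(p) = H_n(sqrt(beta) p) / sqrt(n!), see (6.28)."""
    return HermiteE.basis(n)(np.sqrt(beta) * p) / np.sqrt(factorial(n))

# Coefficients of H_4 in increasing powers of p: 3 - 6 p^2 + p^4.
he4_coef = HermiteE.basis(4).convert(kind=Polynomial).coef

# Gauss quadrature for the weight exp(-z^2/2); the substitution z = sqrt(beta) p
# turns the integral against rho_beta(p) dp into one against the standard Gaussian.
z, w = hermegauss(40)
w = w / np.sqrt(2.0 * np.pi)
p = z / np.sqrt(beta)
gram = np.array([[np.sum(w * f(n, p) * f(m, p)) for m in range(6)] for n in range(6)])

# Eigenvalue relation in the variable z: z H_n' - H_n'' = n H_n.
z_poly = Polynomial([0.0, 1.0])
residuals = []
for n in range(6):
    h = HermiteE.basis(n).convert(kind=Polynomial)
    lhs = z_poly * h.deriv(1) - h.deriv(2)
    residuals.append(np.max(np.abs((lhs - n * h).coef)))

print(he4_coef)
print(np.round(gram, 12))
print(residuals)
```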

The proof of Theorem 6.4.2 follows essentially from the properties of the Hermite polynomials.

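First, notice that by combining (6.28) and (6.30) we obtain

\[
H(\sqrt{\beta}\, p; \lambda) = \sum_{n=0}^{+\infty} \frac{\lambda^n}{\sqrt{n!}}\, f_n(p).
\]

We differentiate this formula with respect to p to obtain

\[
\lambda \sqrt{\beta}\, H(\sqrt{\beta}\, p; \lambda) = \sum_{n=1}^{+\infty} \frac{\lambda^n}{\sqrt{n!}}\, \partial_p f_n(p),
\]

since $f_0 = 1$. From this equation we obtain

\[
\begin{aligned}
H(\sqrt{\beta}\, p; \lambda) &= \sum_{n=1}^{+\infty} \frac{\lambda^{n-1}}{\sqrt{\beta} \sqrt{n!}}\, \partial_p f_n(p) \\
&= \sum_{n=0}^{+\infty} \frac{\lambda^n}{\sqrt{\beta} \sqrt{(n + 1)!}}\, \partial_p f_{n+1}(p),
\end{aligned}
\]

from which we deduce that

\[
\frac{1}{\sqrt{\beta}}\, \partial_p f_k = \sqrt{k}\, f_{k-1}. \tag{6.31}
\]

Similarly, if we differentiate (6.30) with respect to λ we obtain

\[
(p - \lambda) H(p; \lambda) = \sum_{k=0}^{+\infty} \frac{\lambda^k}{k!}\, p H_k(p) - \sum_{k=1}^{+\infty} \frac{\lambda^k}{(k - 1)!}\, H_{k-1}(p) = \sum_{k=0}^{+\infty} \frac{\lambda^k}{k!}\, H_{k+1}(p),
\]

from which we obtain the recurrence relation

\[
p H_k = H_{k+1} + k H_{k-1}.
\]

Upon rescaling, we deduce that

\[
p f_k = \sqrt{\beta^{-1} (k + 1)}\, f_{k+1} + \sqrt{\beta^{-1} k}\, f_{k-1}. \tag{6.32}
\]

We combine now equations (6.31) and (6.32) to obtain

\[
\left( \sqrt{\beta}\, p - \frac{1}{\sqrt{\beta}}\, \partial_p \right) f_k = \sqrt{k + 1}\, f_{k+1}. \tag{6.33}
\]

Now we observe that

\[
\begin{aligned}
-\mathcal{L} f_n &= \left( \sqrt{\beta}\, p - \frac{1}{\sqrt{\beta}}\, \partial_p \right) \frac{1}{\sqrt{\beta}}\, \partial_p f_n \\
&= \left( \sqrt{\beta}\, p - \frac{1}{\sqrt{\beta}}\, \partial_p \right) \sqrt{n}\, f_{n-1} = n f_n.
\end{aligned}
\]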

The operators $\left( \sqrt{\beta}\, p - \frac{1}{\sqrt{\beta}}\, \partial_p \right)$ and $\frac{1}{\sqrt{\beta}}\, \partial_p$ play the role of creation and annihilation operators. In fact, we can generate all eigenfunctions of the OU operator from the ground state $f_0 = 1$ through a repeated application of the creation operator.

Proposition 6.4.4. Set β = 1 and let $a^- = \partial_p$. Then the $L^2_\rho$-adjoint of $a^-$ is

\[
a^+ = -\partial_p + p.
\]

The generator of the OU process can be written in the form

\[
\mathcal{L} = -a^+ a^-.
\]

Furthermore, $a^+$ and $a^-$ satisfy the following commutation relation

\[
[a^+, a^-] = -1.
\]

Define now the creation and annihilation operators on $C^1(\mathbb{R})$ by

\[
S^+ = \frac{1}{\sqrt{n + 1}}\, a^+
\]

and

\[
S^- = \frac{1}{\sqrt{n}}\, a^-.
\]

Then

\[
S^+ f_n = f_{n+1} \quad \text{and} \quad S^- f_n = f_{n-1}. \tag{6.34}
\]

In particular,

\[
f_n = \frac{1}{\sqrt{n!}}\, (a^+)^n 1 \tag{6.35}
\]

and

\[
1 = \frac{1}{\sqrt{n!}}\, (a^-)^n f_n. \tag{6.36}
\]

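Proof. Let $f, h \in C^1(\mathbb{R}) \cap L^2_\rho$. We calculate

\[
\begin{aligned}
\int \partial_p f\, h\, \rho &= -\int f\, \partial_p (h \rho) && (6.37) \\
&= \int f \big( -\partial_p + p \big) h\, \rho. && (6.38)
\end{aligned}
\]

Now,

\[
-a^+ a^- = -(-\partial_p + p)\, \partial_p = \partial_p^2 - p\, \partial_p = \mathcal{L}.
\]

Similarly,

\[
a^- a^+ = -\partial_p^2 + p\, \partial_p + 1
\]

and

\[
[a^+, a^-] = -1.
\]

Formulas (6.34) follow from (6.31) and (6.33). Finally, formulas (6.35) and (6.36) are a consequence of (6.31) and (6.33), together with a simple induction argument.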
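The algebra of Proposition 6.4.4 can be reproduced on polynomials with a few lines of code. In the sketch below (β = 1) the operators a⁻ = ∂p and a⁺ = −∂p + p act on `numpy.polynomial.Polynomial` objects. Repeated application of a⁺ to the constant 1 should generate the unnormalized Hermite polynomials Hn = √(n!) fn, and the commutator should act as multiplication by −1. The function names and the test polynomial are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.hermite_e import HermiteE

P = Polynomial([0.0, 1.0])           # the monomial p

def a_minus(h):
    """Annihilation operator a^- = d/dp."""
    return h.deriv()

def a_plus(h):
    """Creation operator a^+ = -d/dp + p."""
    return P * h - h.deriv()

# (a^+)^n applied to 1 gives H_n, i.e. sqrt(n!) f_n for beta = 1.
generated = [Polynomial([1.0])]
for n in range(1, 7):
    generated.append(a_plus(generated[-1]))

hermite_errors = [
    np.max(np.abs((generated[n] - HermiteE.basis(n).convert(kind=Polynomial)).coef))
    for n in range(7)
]

# Commutator [a^+, a^-] h = a^+ a^- h - a^- a^+ h should equal -h.
h_test = Polynomial([1.0, -2.0, 0.5, 3.0])
commutator = a_plus(a_minus(h_test)) - a_minus(a_plus(h_test))
commutator_error = np.max(np.abs((commutator + h_test).coef))

# The generator L = -a^+ a^- satisfies -L H_n = n H_n.
eigen_errors = [
    np.max(np.abs((a_plus(a_minus(generated[n])) - n * generated[n]).coef))
    for n in range(7)
]
print(hermite_errors, commutator_error, eigen_errors)
```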

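Notice that upon using (6.35) and (6.36) and the fact that $a^+$ is the adjoint of $a^-$ we can easily check the orthonormality of the eigenfunctions:

\[
\begin{aligned}
\int f_n f_m\, \rho &= \frac{1}{\sqrt{m!}} \int f_n\, (a^+)^m 1\, \rho \\
&= \frac{1}{\sqrt{m!}} \int (a^-)^m f_n\, \rho \\
&= \int f_{n-m}\, \rho = \delta_{nm}.
\end{aligned}
\]

From the eigenfunctions and eigenvalues of $\mathcal{L}$ we can easily obtain the eigenvalues and eigenfunctions of $\mathcal{L}^*$, the Fokker-Planck operator.

Lemma 6.4.5. The eigenvalues and eigenfunctions of the Fokker-Planck operator

\[
\mathcal{L}^* \cdot = \partial_p^2 \cdot + \partial_p (p\, \cdot)
\]

are

\[
\lambda_n^* = -n, \quad n = 0, 1, 2, \ldots \quad \text{and} \quad f_n^* = \rho f_n.
\]

Proof. We have

\[
\mathcal{L}^* (\rho f_n) = f_n \mathcal{L}^* \rho + \rho\, \mathcal{L} f_n = -n \rho f_n.
\]

An immediate corollary of the above calculation is that the nth eigenfunction of the Fokker-Planck operator is given by

\[
f_n^* = \rho(p)\, \frac{1}{\sqrt{n!}}\, (a^+)^n 1.
\]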

6.5 Reversible Diffusions

The stationary Ornstein-Uhlenbeck process is an example of a reversible Markov process:

Definition 6.5.1. A stationary stochastic process Xt is time reversible if for every m ∈ N and

every t1, t2, . . . , tm ∈ R+, the joint probability distribution is invariant under time reversals:

p(Xt1 , Xt2 , . . . , Xtm) = p(X−t1 , X−t2 , . . . , X−tm). (6.39)

106

In this section we study a more general class (in fact, as we will see later the most general

class) of reversible Markov processes, namely stochastic perturbations of ODEs with a gradient

structure.

Let $V(x) = \frac{1}{2}\alpha x^2$. The generator of the OU process can be written as:

\[ L = -\partial_x V \, \partial_x + \beta^{-1} \partial_x^2. \]

Consider diffusion processes with a potential V (x), not necessarily quadratic:

L = −∇V (x) · ∇+ β−1∆ (6.40)

In applications of (6.40) to statistical mechanics the diffusion coefficient is $\beta^{-1} = k_B T$, where $k_B$ is Boltzmann's constant and $T$ the absolute temperature. The corresponding stochastic differential equation is

\[ dX_t = -\nabla V(X_t) \, dt + \sqrt{2\beta^{-1}} \, dW_t. \tag{6.41} \]

Hence, we have a gradient ODE $\dot{X}_t = -\nabla V(X_t)$ perturbed by noise due to thermal fluctuations. The corresponding FP equation is:

\[ \frac{\partial p}{\partial t} = \nabla \cdot (\nabla V \, p) + \beta^{-1} \Delta p. \tag{6.42} \]

It is not possible to calculate the time dependent solution of this equation for an arbitrary potential.

We can, however, always calculate the stationary solution, if it exists.

Definition 6.5.2. A potential $V$ will be called confining if $\lim_{|x| \to +\infty} V(x) = +\infty$ and

\[ e^{-\beta V(x)} \in L^1(\mathbb{R}^d) \tag{6.43} \]

for all $\beta \in \mathbb{R}^+$.

Gradient SDEs in a confining potential are ergodic:

Proposition 6.5.3. Let $V(x)$ be a smooth confining potential. Then the Markov process with generator (6.40) is ergodic. The unique invariant distribution is the Gibbs distribution

\[ p(x) = \frac{1}{Z} e^{-\beta V(x)} \tag{6.44} \]

where the normalization factor $Z$ is the partition function

\[ Z = \int_{\mathbb{R}^d} e^{-\beta V(x)} \, dx. \]

The fact that the Gibbs distribution is an invariant distribution follows by direct substitution. Uniqueness follows from a PDE argument (see discussion below). It is more convenient to "normalize" the solution of the Fokker-Planck equation with respect to the invariant distribution.

Theorem 6.5.4. Let $p(x, t)$ be the solution of the Fokker-Planck equation (6.42), assume that (6.43) holds and let $\rho(x)$ be the Gibbs distribution (6.44). Define $h(x, t)$ through

\[ p(x, t) = h(x, t) \rho(x). \]

Then the function $h$ satisfies the backward Kolmogorov equation:

\[ \frac{\partial h}{\partial t} = -\nabla V \cdot \nabla h + \beta^{-1} \Delta h, \quad h(x, 0) = p(x, 0) \rho^{-1}(x). \tag{6.45} \]

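Proof. The initial condition follows from the definition of $h$. Using $\nabla \rho = -\beta \nabla V \rho$, we calculate the gradient and Laplacian of $p = h\rho$:

\[ \nabla p = \rho \nabla h - \rho h \beta \nabla V \]

and

\[ \Delta p = \rho \Delta h - 2\rho\beta \nabla V \cdot \nabla h - h \beta \Delta V \rho + h |\nabla V|^2 \beta^2 \rho. \]

We substitute these formulas into the FP equation to obtain

\[ \rho \frac{\partial h}{\partial t} = \rho \left( -\nabla V \cdot \nabla h + \beta^{-1} \Delta h \right), \]

from which the claim follows.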

Consequently, in order to study properties of solutions to the FP equation, it is sufficient to study the backward equation (6.45). The generator $L$ is self-adjoint in the right function space. We define the weighted $L^2$ space $L^2_\rho$:

\[ L^2_\rho = \Big\{ f \ \Big| \ \int_{\mathbb{R}^d} |f|^2 \rho(x) \, dx < \infty \Big\}, \]

where $\rho(x)$ is the Gibbs distribution. This is a Hilbert space with inner product

\[ (f, h)_\rho = \int_{\mathbb{R}^d} f h \rho(x) \, dx. \]

Theorem 6.5.5. Assume that $V(x)$ is a smooth potential and assume that condition (6.43) holds. Then the operator

\[ L = -\nabla V(x) \cdot \nabla + \beta^{-1} \Delta \]

is self-adjoint in $L^2_\rho$. Furthermore, it is non-positive and its kernel consists of constants.

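Proof. Let $f, h \in C_0^2(\mathbb{R}^d)$. We calculate

\begin{align*}
(Lf, h)_\rho &= \int_{\mathbb{R}^d} (-\nabla V \cdot \nabla + \beta^{-1} \Delta) f \, h \rho \, dx \\
&= -\int_{\mathbb{R}^d} (\nabla V \cdot \nabla f) h \rho \, dx - \beta^{-1} \int_{\mathbb{R}^d} \nabla f \cdot \nabla h \, \rho \, dx - \beta^{-1} \int_{\mathbb{R}^d} h \nabla f \cdot \nabla \rho \, dx \\
&= -\beta^{-1} \int_{\mathbb{R}^d} \nabla f \cdot \nabla h \, \rho \, dx,
\end{align*}

where the last step uses $\nabla \rho = -\beta \nabla V \rho$. The final expression is symmetric in $f$ and $h$, from which self-adjointness follows.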

If we set f = h in the above equation we get

(Lf, f)ρ = −β−1‖∇f‖2ρ,

which shows that L is non-positive.

Clearly, constants are in the null space of L. Assume that f ∈ N (L). Then, from the above

equation we get

0 = −β−1‖∇f‖2ρ,

and, consequently, f is a constant.

Remark 6.5.6. The expression (−Lf, f)ρ is called the Dirichlet form of the operator L. In the

case of a gradient flow, it takes the form

(−Lf, f)ρ = β−1‖∇f‖2ρ. (6.46)

Using the properties of the generator $L$ we can show that the solution of the Fokker-Planck equation converges to the Gibbs distribution exponentially fast. For this we need to use the fact that, under appropriate assumptions on the potential $V$, the Gibbs measure $\mu(dx) = Z^{-1} e^{-\beta V(x)} \, dx$ satisfies Poincaré's inequality:

Theorem 6.5.7. Assume that the potential $V$ satisfies the convexity condition

\[ D^2 V \geq \lambda I. \]

Then the corresponding Gibbs measure satisfies the Poincaré inequality with constant $\lambda$:

\[ \int_{\mathbb{R}^d} f \rho = 0 \ \Rightarrow \ \|\nabla f\|_\rho \geq \sqrt{\lambda} \, \|f\|_\rho. \tag{6.47} \]

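Theorem 6.5.8. Assume that $p(x, 0) \in L^2(e^{\beta V})$. Then the solution $p(x, t)$ of the Fokker-Planck equation (6.42) converges to the Gibbs distribution exponentially fast:

\[ \|p(\cdot, t) - Z^{-1} e^{-\beta V}\|_{\rho^{-1}} \leq e^{-\lambda D t} \|p(\cdot, 0) - Z^{-1} e^{-\beta V}\|_{\rho^{-1}}, \tag{6.48} \]

where $D = \beta^{-1}$.

Proof. We use (6.45), (6.46) and (6.47) to calculate

\begin{align*}
-\frac{d}{dt} \|h - 1\|_\rho^2 &= -2 \Big( \frac{\partial h}{\partial t}, h - 1 \Big)_\rho = -2 \, (Lh, h - 1)_\rho \\
&= 2 \, (-L(h - 1), h - 1)_\rho = 2 \beta^{-1} \|\nabla (h - 1)\|_\rho^2 \\
&\geq 2 \beta^{-1} \lambda \|h - 1\|_\rho^2.
\end{align*}

Our assumption on $p(\cdot, 0)$ implies that $h(\cdot, 0) \in L^2_\rho$. Consequently, the above calculation shows that

\[ \|h(\cdot, t) - 1\|_\rho \leq e^{-\lambda \beta^{-1} t} \|h(\cdot, 0) - 1\|_\rho. \]

This, and the definition of $h$, $p = \rho h$, lead to (6.48).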

Remark 6.5.9. The assumption

\[ \int_{\mathbb{R}^d} |p(x, 0)|^2 Z^{-1} e^{\beta V} \, dx < \infty \]

is very restrictive (think of the case where $V = x^2$). The function space $L^2(\rho^{-1}) = L^2(e^{\beta V})$ in which we prove convergence is not the right space to use. Since $p(\cdot, t) \in L^1$, ideally we would like to prove exponentially fast convergence in $L^1$. We can prove convergence in $L^1$ using the theory of logarithmic Sobolev inequalities. In fact, we can also prove convergence in relative entropy:

\[ H(p | \rho_V) := \int_{\mathbb{R}^d} p \ln \Big( \frac{p}{\rho_V} \Big) \, dx. \]

The relative entropy controls the $L^1$ norm:

\[ \|\rho_1 - \rho_2\|_{L^1}^2 \leq C H(\rho_1 | \rho_2). \]

Using a logarithmic Sobolev inequality, we can prove exponentially fast convergence to equilibrium, assuming only that the relative entropy of the initial conditions is finite.

A much sharper version of the theorem of exponentially fast convergence to equilibrium is the following:

Theorem 6.5.10. Let $p$ denote the solution of the Fokker–Planck equation (6.42) where the potential is smooth and uniformly convex. Assume that the initial conditions satisfy

\[ H(p(\cdot, 0) | \rho_V) < \infty. \]

Then $p$ converges to the Gibbs distribution exponentially fast in relative entropy:

\[ H(p(\cdot, t) | \rho_V) \leq e^{-\lambda \beta^{-1} t} H(p(\cdot, 0) | \rho_V). \]

Self-adjointness of the generator of a diffusion process is equivalent to time-reversibility.

Theorem 6.5.11. Let Xt be a stationary Markov process in Rd with generator

L = b(x) · ∇+ β−1∆

and invariant measure µ. Then the following three statements are equivalent.

i. The process is time-reversible.

ii. The generator of the process is symmetric in $L^2(\mathbb{R}^d; \mu(dx))$.

iii. There exists a scalar function V (x) such that

b(x) = −∇V (x).

6.5.1 Markov Chain Monte Carlo (MCMC)

The Smoluchowski SDE (6.41) has a very interesting application in statistics. Suppose we want to sample from a probability distribution $\pi(x)$. One method for doing this is by generating the dynamics whose invariant distribution is precisely $\pi(x)$. In particular, we consider the Smoluchowski equation

\[ dX_t = \nabla \ln(\pi(X_t)) \, dt + \sqrt{2} \, dW_t. \tag{6.49} \]

Assuming that $-\ln(\pi(x))$ is a confining potential, $X_t$ is an ergodic Markov process with invariant distribution $\pi(x)$. Furthermore, the law of $X_t$ converges to $\pi(x)$ exponentially fast:

\[ \|\rho_t - \pi\|_{L^1} \leq e^{-\Lambda t} \|\rho_0 - \pi\|_{L^1}. \]

The exponent $\Lambda$ is related to the spectral gap of the generator $L = \frac{1}{\pi(x)} \nabla \pi(x) \cdot \nabla + \Delta$. This technique for sampling from a given distribution is an example of the Markov Chain Monte Carlo (MCMC) methodology.
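To make the sampling idea concrete, here is a minimal sketch that discretizes (6.49) with the Euler-Maruyama scheme (often called the unadjusted Langevin algorithm). The discretization, the Gaussian target $N(m, s^2)$, the step size and all names are assumptions made for illustration; the time step introduces a small bias that the continuous-time process does not have.

```python
import math
import random

def langevin_samples(grad_log_pi, x0, step, n_steps, rng):
    """Euler-Maruyama discretization of dX = grad log pi(X) dt + sqrt(2) dW."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x += step * grad_log_pi(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

def gaussian_target(m, s):
    """Gradient of log pi for the illustrative target pi = N(m, s^2)."""
    return lambda x: -(x - m) / (s * s)

def sample_mean_var(m=2.0, s=0.5, step=0.01, n_steps=100_000, burn_in=5_000, seed=0):
    rng = random.Random(seed)
    xs = langevin_samples(gaussian_target(m, s), 0.0, step, n_steps, rng)[burn_in:]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var

if __name__ == "__main__":
    print(sample_mean_var())  # should be close to (2.0, 0.25)
```

Time averages along a single long trajectory are used here, which relies on the ergodicity stated above; a Metropolis correction would be needed to remove the discretization bias.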


6.6 Perturbations of non-Reversible Diffusions

We can add a non-reversible perturbation to a reversible diffusion without changing the invariant distribution $Z^{-1} e^{-\beta V}$.

Proposition 6.6.1. Let $V(x)$ be a confining potential, $\gamma(x)$ a smooth vector field and consider the diffusion process

\[ dX_t = \left( -\nabla V(X_t) + \gamma(X_t) \right) dt + \sqrt{2\beta^{-1}} \, dW_t. \tag{6.50} \]

Then the invariant measure of the process $X_t$ is the Gibbs measure $\mu(dx) = \frac{1}{Z} e^{-\beta V(x)} \, dx$ if and only if $\gamma(x)$ is divergence-free with respect to the density of this measure:

\[ \nabla \cdot \left( \gamma(x) e^{-\beta V(x)} \right) = 0. \tag{6.51} \]
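Condition (6.51) can be tested numerically for a candidate vector field. The sketch below is illustrative only: it assumes a two-dimensional potential $V(x, y) = \frac{1}{2}(x^2 + y^2) + \frac{1}{4}x^4$ and compares $\gamma = J \nabla V$, with $J$ a constant antisymmetric matrix (a standard choice that is not taken from the text), against a constant field that violates the condition. Derivatives are approximated by central finite differences.

```python
import math

BETA = 1.0  # inverse temperature (illustrative value)

def V(x, y):
    """Illustrative confining potential."""
    return 0.5 * (x * x + y * y) + 0.25 * x ** 4

def grad_V(x, y):
    return (x + x ** 3, y)

def gamma_rot(x, y):
    """gamma = J grad V with J = [[0, 1], [-1, 0]] (antisymmetric)."""
    gx, gy = grad_V(x, y)
    return (gy, -gx)

def gamma_const(x, y):
    """A constant field, which does not satisfy (6.51) for this V."""
    return (1.0, 0.0)

def weighted_divergence(gamma, x, y, h=1e-4):
    """Central-difference approximation of div(gamma * exp(-beta V))."""
    def flux(u, v):
        g = gamma(u, v)
        w = math.exp(-BETA * V(u, v))
        return (g[0] * w, g[1] * w)
    d_dx = (flux(x + h, y)[0] - flux(x - h, y)[0]) / (2 * h)
    d_dy = (flux(x, y + h)[1] - flux(x, y - h)[1]) / (2 * h)
    return d_dx + d_dy

if __name__ == "__main__":
    print(weighted_divergence(gamma_rot, 0.7, -0.3))    # approximately zero
    print(weighted_divergence(gamma_const, 0.7, -0.3))  # clearly nonzero
```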

6.7 Eigenfunction Expansions

Consider the generator of a gradient stochastic flow with a uniformly convex potential

L = −∇V · ∇+D∆. (6.52)

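We know that $L$ is a non-positive self-adjoint operator on $L^2_\rho$ and that it has a spectral gap:

\[ (Lf, f)_\rho \leq -D\lambda \|f\|_\rho^2 \quad \text{for all } f \text{ with } \int_{\mathbb{R}^d} f \rho = 0, \]

where $\lambda$ is the Poincaré constant of the potential $V$ (i.e. for the Gibbs measure $Z^{-1} e^{-\beta V(x)} \, dx$).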

The above imply that we can study the spectral problem for −L:

−Lfn = λnfn, n = 0, 1, . . .

The operator −L has real, discrete spectrum with

0 = λ0 < λ1 < λ2 < . . .

Furthermore, the eigenfunctions $\{f_j\}_{j=0}^\infty$ form an orthonormal basis in $L^2_\rho$: we can express every element of $L^2_\rho$ in the form of a generalized Fourier series:

\[ \phi = \sum_{n=0}^\infty \phi_n f_n, \quad \phi_n = (\phi, f_n)_\rho \tag{6.53} \]

with $(f_n, f_m)_\rho = \delta_{nm}$. This enables us to solve the time dependent Fokker–Planck equation in terms of an eigenfunction expansion. Consider the backward Kolmogorov equation (6.45). We assume that the initial condition satisfies $h_0(x) = \phi(x) \in L^2_\rho$, and consequently we can expand it in the form (6.53). We look for a solution of (6.45) in the form

\[ h(x, t) = \sum_{n=0}^\infty h_n(t) f_n(x). \]

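We substitute this expansion into the backward Kolmogorov equation:

\begin{align}
\frac{\partial h}{\partial t} = \sum_{n=0}^\infty \dot{h}_n f_n &= L \Big( \sum_{n=0}^\infty h_n f_n \Big) \tag{6.54} \\
&= \sum_{n=0}^\infty -\lambda_n h_n f_n. \tag{6.55}
\end{align}

We multiply this equation by $f_m$, integrate with respect to the Gibbs measure and use the orthonormality of the eigenfunctions to obtain the sequence of equations

\[ \dot{h}_n = -\lambda_n h_n, \quad n = 0, 1, \dots \]

The solution is

\[ h_0(t) = \phi_0, \quad h_n(t) = e^{-\lambda_n t} \phi_n, \quad n = 1, 2, \dots \]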

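Notice that

\begin{align*}
1 = \int_{\mathbb{R}^d} p(x, 0) \, dx &= \int_{\mathbb{R}^d} p(x, t) \, dx \\
&= \int_{\mathbb{R}^d} h(x, t) Z^{-1} e^{-\beta V} \, dx = (h, 1)_\rho = (\phi, 1)_\rho \\
&= \phi_0.
\end{align*}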

Consequently, the solution of the backward Kolmogorov equation is

\[ h(x, t) = 1 + \sum_{n=1}^\infty e^{-\lambda_n t} \phi_n f_n. \]

This expansion, together with the fact that all eigenvalues $\lambda_n$ with $n \geq 1$ are positive, shows that the solution of the backward Kolmogorov equation converges to 1 exponentially fast. The solution of the Fokker–Planck equation is

\[ p(x, t) = Z^{-1} e^{-\beta V(x)} \Big( 1 + \sum_{n=1}^\infty e^{-\lambda_n t} \phi_n f_n \Big). \]

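6.7.1 Reduction to a Schrödinger Equation

Lemma 6.7.1. The Fokker–Planck operator for a gradient flow can be written in the self-adjoint form

\[ \frac{\partial p}{\partial t} = D \nabla \cdot \left( e^{-V/D} \nabla \left( e^{V/D} p \right) \right). \tag{6.56} \]

Define now $\psi(x, t) = e^{V/2D} p(x, t)$. Then $\psi$ solves the PDE

\[ \frac{\partial \psi}{\partial t} = D \Delta \psi - U(x) \psi, \quad U(x) := \frac{|\nabla V|^2}{4D} - \frac{\Delta V}{2}. \tag{6.57} \]

Let $H := -D\Delta + U$. Then $-L^*$ and $H$ have the same eigenvalues. The $n$th eigenfunction $\phi_n$ of $L^*$ and the $n$th eigenfunction $\psi_n$ of $H$ are associated through the transformation

\[ \psi_n(x) = \phi_n(x) \exp \Big( \frac{V(x)}{2D} \Big). \]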

Remarks 6.7.2. i. Equation (6.56) shows that the FP operator can be written in the form

\[ L^* \cdot = D \nabla \cdot \left( e^{-V/D} \nabla \left( e^{V/D} \cdot \right) \right). \]

ii. The operator that appears on the right hand side of eqn. (6.57) has the form of a Schrödinger operator:

\[ -H = D\Delta - U(x). \]

iii. The spectral problem for the FP operator can be transformed into the spectral problem for a Schrödinger operator. We can thus use all the available results from quantum mechanics to study the FP equation and the associated SDE.

iv. In particular, the weak noise asymptotics $D \ll 1$ is equivalent to the semiclassical approximation from quantum mechanics.

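Proof. We calculate

\[ D \nabla \cdot \left( e^{-V/D} \nabla \left( e^{V/D} f \right) \right) = D \nabla \cdot \left( e^{-V/D} \left( D^{-1} \nabla V f + \nabla f \right) e^{V/D} \right) = \nabla \cdot (\nabla V f + D \nabla f) = L^* f. \]

Consider now the eigenvalue problem for the FP operator:

\[ -L^* \phi_n = \lambda_n \phi_n. \]

Set $\phi_n = \psi_n \exp \left( -\frac{1}{2D} V \right)$. We calculate $-L^* \phi_n$:

\begin{align*}
-L^* \phi_n &= -D \nabla \cdot \left( e^{-V/D} \nabla \left( e^{V/D} \psi_n e^{-V/2D} \right) \right) \\
&= -D \nabla \cdot \left( e^{-V/D} \Big( \nabla \psi_n + \frac{\nabla V}{2D} \psi_n \Big) e^{V/2D} \right) \\
&= \left( -D \Delta \psi_n + \Big( \frac{|\nabla V|^2}{4D} - \frac{\Delta V}{2} \Big) \psi_n \right) e^{-V/2D} = e^{-V/2D} H \psi_n.
\end{align*}

From this we conclude that $e^{-V/2D} H \psi_n = \lambda_n \psi_n e^{-V/2D}$, from which the equivalence between the two eigenvalue problems follows.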

Remarks 6.7.3. i. We can rewrite the Schrödinger operator in the form

\[ H = D A^* A, \quad A = \nabla + \frac{\nabla V}{2D}, \quad A^* = -\nabla + \frac{\nabla V}{2D}. \]

ii. These are creation and annihilation operators. They can also be written in the form

\[ A \cdot = e^{-V/2D} \nabla \left( e^{V/2D} \cdot \right), \quad A^* \cdot = -e^{V/2D} \nabla \left( e^{-V/2D} \cdot \right). \]

iii. The forward and backward Kolmogorov operators have the same eigenvalues. Their eigenfunctions are related through

\[ \phi_n^B = \phi_n^F \exp(V/D), \]

where $\phi_n^B$ and $\phi_n^F$ denote the eigenfunctions of the backward and forward operators, respectively.

6.8 Discussion and Bibliography


The proof of existence and uniqueness of classical solutions for the Fokker-Planck equation of a

uniformly elliptic diffusion process with smooth drift and diffusion coefficients, Theorem 6.2.2,

can be found in [30]. A standard textbook on PDEs, with a lot of material on parabolic PDEs

is [22], particularly Chapters 2 and 7 in this book.

It is important to emphasize that the condition that solutions to the Fokker-Planck equation

do not grow too fast, see Definition 6.2.1, is necessary to ensure uniqueness. In fact, there are

infinitely many solutions of

\[ \frac{\partial p}{\partial t} = \Delta p \ \text{ in } \mathbb{R}^d \times (0, T), \qquad p(x, 0) = 0. \]

Each of these solutions besides the trivial solution p = 0 grows very rapidly as x → +∞. More

details can be found in [44, Ch. 7].

The Fokker-Planck equation is studied extensively in Risken’s monograph [82]. See also [35]

and [42]. The connection between the Fokker-Planck equation and stochastic differential equations

is presented in Chapter 7. See also [1, 31, 32].

Hermite polynomials appear very frequently in applications and they also play a fundamental

role in analysis. It is possible to prove that the Hermite polynomials form an orthonormal basis

for L2(Rd, ρβ) without using the fact that they are the eigenfunctions of a symmetric operator with

compact resolvent.⁶ The proof of Proposition 6.4.1 can be found in [90], Lemma 2.3.4 in particular.

Diffusion processes in one dimension are studied in [61]. The Feller classification for one

dimensional diffusion processes can be also found in [45, 24].

Convergence to equilibrium for kinetic equations (such as the Fokker-Planck equation) both

linear and non-linear (e.g., the Boltzmann equation) has been studied extensively. It has been

recognized that the relative entropy and logarithmic Sobolev inequalities play an important role in

the analysis of the problem of convergence to equilibrium. For more information see [62].

6.9 Exercises

1. Solve equation (6.13) by taking the Fourier transform, using the method of characteristics for

first order PDEs and taking the inverse Fourier transform.

2. Use the formula for the stationary joint probability density of the Ornstein-Uhlenbeck process,

eqn. (6.17) to obtain the stationary autocorrelation function of the OU process.

3. Use (6.20) to obtain formulas for the moments of the OU process. Prove, using these formulas,

that the moments of the OU process converge to their equilibrium values exponentially fast.

4. Show that the autocorrelation function of the stationary Ornstein-Uhlenbeck process is

\begin{align*}
E(X_t X_0) &= \int_{\mathbb{R}} \int_{\mathbb{R}} x x_0 \, p_{OU}(x, t | x_0, 0) p_s(x_0) \, dx \, dx_0 \\
&= \frac{D}{2\alpha} e^{-\alpha |t|},
\end{align*}

where $p_s(x)$ denotes the invariant Gaussian distribution.

⁶ In fact, Poincaré's inequality for Gaussian measures can be proved using the fact that the Hermite polynomials form an orthonormal basis for $L^2(\mathbb{R}^d, \rho_\beta)$.

5. Let Xt be a one-dimensional diffusion process with drift and diffusion coefficients a(y, t) =

−a0 − a1y and b(y, t) = b0 + b1y + b2y2 where ai, bi > 0, i = 0, 1, 2.

(a) Write down the generator and the forward and backward Kolmogorov equations for Xt.

(b) Assume that X0 is a random variable with probability density ρ0(x) that has finite mo-

ments. Use the forward Kolmogorov equation to derive a system of differential equations

for the moments of Xt.

(c) Find the first three moments M0, M1, M2 in terms of the moments of the initial distribu-

tion ρ0(x).

(d) Under what conditions on the coefficients ai, bi > 0, i = 0, 1, 2 is M2 finite for all times?

6. Let V be a confining potential in Rd, β > 0 and let ρβ(x) = Z−1e−βV (x). Give the definition of

the Sobolev space Hk(Rd; ρβ) for k a positive integer and study some of its basic properties.

7. Let Xt be a multidimensional diffusion process on [0, 1]d with periodic boundary conditions.

The drift vector is a periodic function a(x) and the diffusion matrix is 2DI , where D > 0 and

I is the identity matrix.

(a) Write down the generator and the forward and backward Kolmogorov equations for Xt.

(b) Assume that a(x) is divergence-free (∇ · a(x) = 0). Show that Xt is ergodic and find the

invariant distribution.

(c) Show that the probability density p(x, t) (the solution of the forward Kolmogorov equa-

tion) converges to the invariant distribution exponentially fast in L2([0, 1]d). (Hint: Use

Poincare’s inequality on [0, 1]d).

8. The Rayleigh process $X_t$ is a diffusion process that takes values on $(0, +\infty)$ with drift and diffusion coefficients $a(x) = -ax + \frac{D}{x}$ and $b(x) = 2D$, respectively, where $a, D > 0$.

(a) Write down the generator and the forward and backward Kolmogorov equations for $X_t$.

(b) Show that this process is ergodic and find its invariant distribution.

(c) Solve the forward Kolmogorov (Fokker-Planck) equation using separation of variables.

(Hint: Use Laguerre polynomials).

9. Let $\mathbf{x}(t) = \{x(t), y(t)\}$ be the two-dimensional diffusion process on $[0, 2\pi]^2$ with periodic boundary conditions with drift vector $a(x, y) = (\sin(y), \sin(x))$ and diffusion matrix $b(x, y)$ with $b_{11} = b_{22} = 1$, $b_{12} = b_{21} = 0$.

(a) Write down the generator of the process $\{x(t), y(t)\}$ and the forward and backward Kolmogorov equations.

(b) Show that the constant function

\[ \rho_s(x, y) = C \]

is the unique stationary distribution of the process $\{x(t), y(t)\}$ and calculate the normalization constant.

(c) Let $E$ denote the expectation with respect to the invariant distribution $\rho_s(x, y)$. Calculate

\[ E\big( \cos(x) + \cos(y) \big) \quad \text{and} \quad E\big( \sin(x) \sin(y) \big). \]

10. Let a, D be positive constants and let X(t) be the diffusion process on [0, 1] with periodic

boundary conditions and with drift and diffusion coefficients a(x) = a and b(x) = 2D, respec-

tively. Assume that the process starts at x0, X(0) = x0.

(a) Write down the generator of the process X(t) and the forward and backward Kolmogorov

equations.

(b) Solve the initial/boundary value problem for the forward Kolmogorov equation to calcu-

late the transition probability density p(x, t|x0, 0).

(c) Show that the process is ergodic and calculate the invariant distribution ps(x).

(d) Calculate the stationary autocorrelation function

\[ E(X(t) X(0)) = \int_0^1 \int_0^1 x x_0 \, p(x, t | x_0, 0) p_s(x_0) \, dx \, dx_0. \]

Chapter 7

Stochastic Differential Equations

7.1 Introduction

In this part of the course we will study stochastic differential equations (SDEs): ODEs driven by Gaussian white noise.

Let $W(t)$ denote a standard $m$-dimensional Brownian motion, $h : \mathcal{Z} \to \mathbb{R}^d$ a smooth vector-valued function and $\gamma : \mathcal{Z} \to \mathbb{R}^{d \times m}$ a smooth matrix-valued function (in this course we will take $\mathcal{Z} = \mathbb{T}^d$, $\mathbb{R}^d$ or $\mathbb{R}^l \oplus \mathbb{T}^{d-l}$). Consider the SDE

\[ \frac{dz}{dt} = h(z) + \gamma(z) \frac{dW}{dt}, \quad z(0) = z_0. \tag{7.1} \]

We think of the term $\frac{dW}{dt}$ as representing Gaussian white noise: a mean-zero Gaussian process with correlation $\delta(t - s) I$. Such a process exists only as a distribution. The function $h$ in (7.1) is sometimes referred to as the drift and $\gamma$ as the diffusion coefficient. The precise interpretation of (7.1) is as an integral equation for $z(t) \in C(\mathbb{R}^+, \mathcal{Z})$:

\[ z(t) = z_0 + \int_0^t h(z(s)) \, ds + \int_0^t \gamma(z(s)) \, dW(s). \tag{7.2} \]

In order to make sense of this equation we need to define the stochastic integral against W (s).

7.2 The Ito and Stratonovich Stochastic Integral

For the rigorous analysis of stochastic differential equations it is necessary to define stochastic

integrals of the form

\[ I(t) = \int_0^t f(s) \, dW(s), \tag{7.3} \]

where $W(t)$ is a standard one dimensional Brownian motion. This is not straightforward because $W(t)$ does not have bounded variation. In order to define the stochastic integral we assume that $f(t)$ is a random process, adapted to the filtration $\mathcal{F}_t$ generated by the process $W(t)$, and such that

\[ E \Big( \int_0^T f(s)^2 \, ds \Big) < \infty. \]

The Ito stochastic integral $I(t)$ is defined as the $L^2$-limit of the Riemann sum approximation of (7.3):

\[ I(t) := \lim_{K \to \infty} \sum_{k=1}^{K-1} f(t_{k-1}) \left( W(t_k) - W(t_{k-1}) \right), \tag{7.4} \]

where $t_k = k \Delta t$ and $K \Delta t = t$. Notice that the function $f(t)$ is evaluated at the left end of each interval $[t_{k-1}, t_k]$ in (7.4). The resulting Ito stochastic integral $I(t)$ is a.s. continuous in $t$. These ideas are readily generalized to the case where $W(s)$ is a standard $d$ dimensional Brownian motion and $f(s) \in \mathbb{R}^{m \times d}$ for each $s$.

The resulting integral satisfies the Ito isometry

\[ E|I(t)|^2 = \int_0^t E|f(s)|_F^2 \, ds, \tag{7.5} \]

where $|\cdot|_F$ denotes the Frobenius norm $|A|_F = \sqrt{\mathrm{tr}(A^T A)}$. The Ito stochastic integral is a martingale:

\[ E I(t) = 0 \]

and

\[ E[I(t) | \mathcal{F}_s] = I(s) \quad \forall \, t \geq s, \]

where $\mathcal{F}_s$ denotes the filtration generated by $W(s)$.

Example 7.2.1. • Consider the Ito stochastic integral

\[ I(t) = \int_0^t f(s) \, dW(s), \]

where $f, W$ are scalar-valued. This is a martingale with quadratic variation

\[ \langle I \rangle_t = \int_0^t (f(s))^2 \, ds. \]

• More generally, for $f$, $W$ in arbitrary finite dimensions, the integral $I(t)$ is a martingale with quadratic variation

\[ \langle I \rangle_t = \int_0^t (f(s) \otimes f(s)) \, ds. \]

7.2.1 The Stratonovich Stochastic Integral

In addition to the Ito stochastic integral, we can also define the Stratonovich stochastic integral. It is defined as the $L^2$-limit of a different Riemann sum approximation of (7.3), namely

\[ I_{strat}(t) := \lim_{K \to \infty} \sum_{k=1}^{K-1} \frac{1}{2} \big( f(t_{k-1}) + f(t_k) \big) \left( W(t_k) - W(t_{k-1}) \right), \tag{7.6} \]

where $t_k = k \Delta t$ and $K \Delta t = t$. Notice that the function $f(t)$ is evaluated at both endpoints of each interval $[t_{k-1}, t_k]$ in (7.6). The multidimensional Stratonovich integral is defined in a similar way. The resulting integral is written as

\[ I_{strat}(t) = \int_0^t f(s) \circ dW(s). \]

The limit in (7.6) gives rise to an integral which differs from the Ito integral. The situation is more complex than that arising in the standard theory of Riemann integration for functions of bounded variation: in that case the points in $[t_{k-1}, t_k]$ where the integrand is evaluated do not affect the definition of the integral, via a limiting process. In the case of integration against Brownian motion, which does not have bounded variation, the limits differ. When $f$ and $W$ are correlated through an SDE, then a formula exists to convert between them.
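The difference between the two Riemann sums can be seen numerically. The following minimal sketch takes the integrand $f = W$ (an illustrative choice, not from the text), for which the Ito integral equals $(W(T)^2 - T)/2$ and the Stratonovich integral equals $W(T)^2/2$; the function name and parameters are assumptions.

```python
import math
import random

def riemann_sums(T=1.0, n=100_000, seed=0):
    """Left-point (Ito) and endpoint-average (Stratonovich) sums for int_0^T W dW."""
    rng = random.Random(seed)
    dt = T / n
    w = 0.0      # current value W(t_{k-1})
    ito = 0.0
    strat = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))    # W(t_k) - W(t_{k-1})
        ito += w * dw                         # integrand at the left endpoint
        strat += 0.5 * (w + (w + dw)) * dw    # average of both endpoints
        w += dw
    return w, ito, strat

if __name__ == "__main__":
    w_T, ito, strat = riemann_sums()
    print("Ito sum:         ", ito, " vs (W_T^2 - T)/2 =", (w_T ** 2 - 1.0) / 2)
    print("Stratonovich sum:", strat, " vs W_T^2/2       =", w_T ** 2 / 2)
```

The two sums differ by half the sum of the squared increments, which converges to $T/2$; this is the quadratic variation term that has no analogue for integrators of bounded variation.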

7.3 Stochastic Differential Equations

Definition 7.3.1. By a solution of (7.1) we mean a $\mathcal{Z}$-valued stochastic process $z(t)$ on $t \in [0, T]$ with the properties:

i. $z(t)$ is continuous and $\mathcal{F}_t$-adapted, where the filtration is generated by the Brownian motion $W(t)$;

ii. $h(z(t)) \in L^1((0, T))$, $\gamma(z(t)) \in L^2((0, T))$;

iii. equation (7.1) holds for every $t \in [0, T]$ with probability 1.

The solution is called unique if any two solutions $x_i(t)$, $i = 1, 2$ satisfy

\[ P(x_1(t) = x_2(t), \ \forall t \in [0, T]) = 1. \]

It is well known that existence and uniqueness of solutions for ODEs (i.e. when $\gamma \equiv 0$ in (7.1)) holds for globally Lipschitz vector fields $h(x)$. A very similar theorem holds when $\gamma \neq 0$. As for ODEs the conditions can be weakened, when a priori bounds on the solution can be found.

Theorem 7.3.2. Assume that both $h(\cdot)$ and $\gamma(\cdot)$ are globally Lipschitz on $\mathcal{Z}$ and that $z_0$ is a random variable independent of the Brownian motion $W(t)$ with

\[ E|z_0|^2 < \infty. \]

Then the SDE (7.1) has a unique solution $z(t) \in C(\mathbb{R}^+; \mathcal{Z})$ with

\[ E \Big[ \int_0^T |z(t)|^2 \, dt \Big] < \infty \quad \forall \, T < \infty. \]

Furthermore, the solution of the SDE is a Markov process.

The Stratonovich analogue of (7.1) is

\[ \frac{dz}{dt} = h(z) + \gamma(z) \circ \frac{dW}{dt}, \quad z(0) = z_0. \tag{7.7} \]

By this we mean that $z \in C(\mathbb{R}^+, \mathcal{Z})$ satisfies the integral equation

\[ z(t) = z(0) + \int_0^t h(z(s)) \, ds + \int_0^t \gamma(z(s)) \circ dW(s). \tag{7.8} \]

By using definitions (7.4) and (7.6) it can be shown that $z$ satisfying the Stratonovich SDE (7.7) also satisfies the Ito SDE

\begin{align}
\frac{dz}{dt} &= h(z) + \frac{1}{2} \nabla \cdot \left( \gamma(z) \gamma(z)^T \right) - \frac{1}{2} \gamma(z) \nabla \cdot \left( \gamma(z)^T \right) + \gamma(z) \frac{dW}{dt}, \tag{7.9a} \\
z(0) &= z_0, \tag{7.9b}
\end{align}

provided that $\gamma(z)$ is differentiable. White noise is, in most applications, an idealization of a stationary random process with short correlation time. In this context the Stratonovich interpretation of an SDE is particularly important because it often arises as the limit obtained by using smooth approximations to white noise. On the other hand the martingale machinery which comes with the Ito integral makes it more important as a mathematical object. It is very useful that we can convert from the Ito to the Stratonovich interpretation of the stochastic integral. There are other interpretations of the stochastic integral, e.g. the Klimontovich stochastic integral.

The definition of Brownian motion implies the scaling property

\[ W(ct) = \sqrt{c} \, W(t), \]

where the above should be interpreted as holding in law. From this it follows that, if $s = ct$, then

\[ \frac{dW}{ds} = \frac{1}{\sqrt{c}} \frac{dW}{dt}, \]

again in law. Hence, if we scale time to $s = ct$ in (7.1), then we get the equation

\[ \frac{dz}{ds} = \frac{1}{c} h(z) + \frac{1}{\sqrt{c}} \gamma(z) \frac{dW}{ds}, \quad z(0) = z_0. \]

7.3.1 Examples of SDEs

The SDE for Brownian motion is:

\[ dX = \sqrt{2\sigma} \, dW, \quad X(0) = x. \]

The solution is:

\[ X(t) = x + \sqrt{2\sigma} \, W(t). \]

The SDE for the Ornstein-Uhlenbeck process is

\[ dX = -\alpha X \, dt + \sqrt{2\lambda} \, dW, \quad X(0) = x. \]

We can solve this equation using the variation of constants formula:

\[ X(t) = e^{-\alpha t} x + \sqrt{2\lambda} \int_0^t e^{-\alpha (t - s)} \, dW(s). \]

We can use Ito's formula to obtain equations for the moments of the OU process. The generator is:

\[ L = -\alpha x \partial_x + \lambda \partial_x^2. \]

We apply Ito's formula to the function $f(x) = x^n$ to obtain:

\begin{align*}
dX(t)^n &= L X(t)^n \, dt + \sqrt{2\lambda} \, \partial_x X(t)^n \, dW \\
&= -\alpha n X(t)^n \, dt + \lambda n (n - 1) X(t)^{n-2} \, dt + n \sqrt{2\lambda} X(t)^{n-1} \, dW.
\end{align*}

Consequently:

\[ X(t)^n = x^n + \int_0^t \left( -\alpha n X(s)^n + \lambda n (n - 1) X(s)^{n-2} \right) ds + n \sqrt{2\lambda} \int_0^t X(s)^{n-1} \, dW(s). \]

By taking the expectation in the above equation we obtain the equation for the moments of the OU process that we derived earlier using the Fokker-Planck equation:

\[ M_n(t) = x^n + \int_0^t \left( -\alpha n M_n(s) + \lambda n (n - 1) M_{n-2}(s) \right) ds. \]
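The moment equations can be compared with a direct simulation. The sketch below is illustrative: it discretizes the OU SDE with the Euler-Maruyama scheme (an assumption, not a method used in the text) and compares sample moments with the solutions of the moment equations for $n = 1, 2$, namely $M_1(t) = e^{-\alpha t} x$ and $M_2(t) = e^{-2\alpha t} x^2 + \frac{\lambda}{\alpha}(1 - e^{-2\alpha t})$. Parameter values and names are arbitrary.

```python
import math
import random

def simulate_ou(alpha, lam, x0, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of dX = -alpha X dt + sqrt(2 lam) dW; returns X(T) samples."""
    rng = random.Random(seed)
    dt = T / n_steps
    noise = math.sqrt(2.0 * lam * dt)
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += -alpha * x * dt + noise * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals

def exact_moments(alpha, lam, x0, t):
    """Solutions of the moment equations for n = 1 and n = 2."""
    m1 = math.exp(-alpha * t) * x0
    m2 = math.exp(-2 * alpha * t) * x0 ** 2 + (lam / alpha) * (1 - math.exp(-2 * alpha * t))
    return m1, m2

if __name__ == "__main__":
    xs = simulate_ou(alpha=1.0, lam=0.5, x0=1.0, T=1.0, n_steps=200, n_paths=4000)
    print("sample M1, M2:", sum(xs) / len(xs), sum(x * x for x in xs) / len(xs))
    print("exact  M1, M2:", exact_moments(1.0, 0.5, 1.0, 1.0))
```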

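Consider the geometric Brownian motion

\[ dX(t) = \mu X(t) \, dt + \sigma X(t) \, dW(t), \tag{7.10} \]

where we use the Ito interpretation of the stochastic differential. The generator of this process is

\[ L = \mu x \partial_x + \frac{\sigma^2 x^2}{2} \partial_x^2. \]

The solution to this equation is

\[ X(t) = X(0) \exp \Big( \Big( \mu - \frac{\sigma^2}{2} \Big) t + \sigma W(t) \Big). \tag{7.11} \]

To derive this formula, we apply Ito's formula to the function $f(x) = \log(x)$:

\begin{align*}
d \log(X(t)) &= L \big( \log(X(t)) \big) \, dt + \sigma x \partial_x \log(X(t)) \, dW(t) \\
&= \Big( \mu x \frac{1}{x} + \frac{\sigma^2 x^2}{2} \Big( -\frac{1}{x^2} \Big) \Big) dt + \sigma \, dW(t) \\
&= \Big( \mu - \frac{\sigma^2}{2} \Big) dt + \sigma \, dW(t).
\end{align*}

Consequently:

\[ \log \Big( \frac{X(t)}{X(0)} \Big) = \Big( \mu - \frac{\sigma^2}{2} \Big) t + \sigma W(t), \]

from which (7.11) follows. Notice that the Stratonovich interpretation of this equation leads to the solution

\[ X(t) = X(0) \exp(\mu t + \sigma W(t)). \]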
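The practical consequence of the two interpretations shows up already in the mean. As a rough sketch, the code below samples $W(t) \sim N(0, t)$ and evaluates both closed-form solutions; it relies on the standard facts that the Ito solution (7.11) has mean $X(0) e^{\mu t}$ while the Stratonovich solution has mean $X(0) e^{(\mu + \sigma^2/2) t}$. Function and parameter names are illustrative.

```python
import math
import random

def gbm_sample_means(mu, sigma, x0, t, n_samples, seed=0):
    """Monte Carlo means of the Ito and Stratonovich solutions of geometric Brownian motion."""
    rng = random.Random(seed)
    ito_sum = 0.0
    strat_sum = 0.0
    for _ in range(n_samples):
        w = rng.gauss(0.0, math.sqrt(t))  # W(t) is Gaussian with variance t
        ito_sum += x0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w)
        strat_sum += x0 * math.exp(mu * t + sigma * w)
    return ito_sum / n_samples, strat_sum / n_samples

if __name__ == "__main__":
    mu, sigma, x0, t = 0.1, 0.4, 1.0, 1.0
    ito_mean, strat_mean = gbm_sample_means(mu, sigma, x0, t, 50_000)
    print("Ito:         ", ito_mean, " expected", x0 * math.exp(mu * t))
    print("Stratonovich:", strat_mean, " expected", x0 * math.exp((mu + 0.5 * sigma ** 2) * t))
```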

7.4 The Generator, Ito's formula and the Fokker-Planck Equation

7.4.1 The Generator

Given the function $\gamma(z)$ in the SDE (7.1) we define

\[ \Gamma(z) = \gamma(z) \gamma(z)^T. \tag{7.12} \]

The generator $L$ is then defined as

\[ L v = h \cdot \nabla v + \frac{1}{2} \Gamma : \nabla \nabla v. \tag{7.13} \]

This operator, equipped with a suitable domain of definition, is the generator of the Markov process given by (7.1). The formal $L^2$-adjoint operator $L^*$ is

\[ L^* v = -\nabla \cdot (h v) + \frac{1}{2} \nabla \cdot \nabla \cdot (\Gamma v). \]

7.4.2 Ito’s Formula

The Ito formula enables us to calculate the rate of change in time of functions $V : \mathcal{Z} \to \mathbb{R}^n$ evaluated at the solution of a $\mathcal{Z}$-valued SDE. Formally, we can write:

\[ \frac{d}{dt} \big( V(z(t)) \big) = L V(z(t)) + \Big\langle \nabla V(z(t)), \gamma(z(t)) \frac{dW}{dt} \Big\rangle. \]

Note that if $W$ were a smooth time-dependent function this formula would not be correct: there is an additional term in $LV$, proportional to $\Gamma$, which arises from the lack of smoothness of Brownian motion. The precise interpretation of the expression for the rate of change of $V$ is in integrated form:

Lemma 7.4.1. (Ito's Formula) Assume that the conditions of Theorem 7.3.2 hold. Let $z(t)$ solve (7.1) and let $V \in C^2(\mathcal{Z}, \mathbb{R}^n)$. Then the process $V(z(t))$ satisfies

\[ V(z(t)) = V(z(0)) + \int_0^t L V(z(s)) \, ds + \int_0^t \langle \nabla V(z(s)), \gamma(z(s)) \, dW(s) \rangle. \]

Let $\phi : \mathcal{Z} \mapsto \mathbb{R}$ and consider the function

\[ v(z, t) = E \big( \phi(z(t)) \, | \, z(0) = z \big), \tag{7.14} \]

where the expectation is with respect to all Brownian driving paths. By averaging in the Ito formula, which removes the stochastic integral, and using the Markov property, it is possible to obtain the backward Kolmogorov equation.

Theorem 7.4.2. Assume that $\phi$ is chosen sufficiently smooth so that the backward Kolmogorov equation

\begin{align}
\frac{\partial v}{\partial t} &= L v \quad \text{for } (z, t) \in \mathcal{Z} \times (0, \infty), \notag \\
v &= \phi \quad \text{for } (z, t) \in \mathcal{Z} \times \{0\}, \tag{7.15}
\end{align}

has a unique classical solution $v(z, t) \in C^{2,1}(\mathcal{Z} \times (0, \infty))$. Then $v$ is given by (7.14) where $z(t)$ solves (7.2).
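As a small consistency check of (7.14) and (7.15), consider the Ornstein-Uhlenbeck process of Section 7.3.1 with $\phi(x) = x^2$. The second-moment formula gives $v(x, t) = e^{-2\alpha t} x^2 + \frac{\lambda}{\alpha}(1 - e^{-2\alpha t})$, and the sketch below verifies by finite differences that this function satisfies $\partial_t v = L v$ with $L = -\alpha x \partial_x + \lambda \partial_x^2$. The parameter values, step size and names are illustrative assumptions.

```python
import math

ALPHA = 1.0  # OU drift coefficient (illustrative value)
LAM = 0.5    # OU noise coefficient (illustrative value)

def v(x, t):
    """v(x, t) = E(X(t)^2 | X(0) = x) for the OU process."""
    decay = math.exp(-2.0 * ALPHA * t)
    return decay * x * x + (LAM / ALPHA) * (1.0 - decay)

def residual(x, t, h=1e-3):
    """Finite-difference approximation of dv/dt - L v, which should vanish."""
    v_t = (v(x, t + h) - v(x, t - h)) / (2 * h)
    v_x = (v(x + h, t) - v(x - h, t)) / (2 * h)
    v_xx = (v(x + h, t) - 2 * v(x, t) + v(x - h, t)) / (h * h)
    return v_t - (-ALPHA * x * v_x + LAM * v_xx)

if __name__ == "__main__":
    for x, t in [(0.0, 0.5), (1.0, 1.0), (2.0, 0.1)]:
        print(x, t, residual(x, t))
```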

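For a Stratonovich SDE the rules of standard calculus apply: Consider the Stratonovich SDE (7.29) and let $V(x) \in C^2(\mathbb{R})$. Then

\[ dV(X(t)) = \frac{dV}{dx}(X(t)) \left( f(X(t)) \, dt + \sigma(X(t)) \circ dW(t) \right). \]

Consider the Stratonovich SDE (7.29) on $\mathbb{R}^d$ (i.e. $f \in \mathbb{R}^d$, $\sigma : \mathbb{R}^n \mapsto \mathbb{R}^d$, $W(t)$ is standard Brownian motion on $\mathbb{R}^n$). The corresponding Fokker-Planck equation is:

\[ \frac{\partial \rho}{\partial t} = -\nabla \cdot (f \rho) + \frac{1}{2} \nabla \cdot (\sigma \nabla \cdot (\sigma \rho)). \tag{7.16} \]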

Now we can derive rigorously the Fokker-Planck equation.

Theorem 7.4.3. Consider equation (7.2) with $z(0)$ a random variable with density $\rho_0(z)$. Assume that the law of $z(t)$ has a density $\rho(z, t) \in C^{2,1}(\mathcal{Z} \times (0, \infty))$. Then $\rho$ satisfies the Fokker-Planck equation

\begin{align}
\frac{\partial \rho}{\partial t} &= L^* \rho \quad \text{for } (z, t) \in \mathcal{Z} \times (0, \infty), \tag{7.17a} \\
\rho &= \rho_0 \quad \text{for } (z, t) \in \mathcal{Z} \times \{0\}. \tag{7.17b}
\end{align}

Proof. Let $E^\mu$ denote averaging with respect to the product measure induced by the measure $\mu$ with density $\rho_0$ on $z(0)$ and the independent driving Wiener measure on the SDE itself. Averaging over random $z(0)$ distributed with density $\rho_0(z)$, we find

\begin{align*}
E^\mu(\phi(z(t))) &= \int_{\mathcal{Z}} v(z, t) \rho_0(z) \, dz \\
&= \int_{\mathcal{Z}} (e^{Lt} \phi)(z) \rho_0(z) \, dz \\
&= \int_{\mathcal{Z}} (e^{L^* t} \rho_0)(z) \phi(z) \, dz.
\end{align*}

But since $\rho(z, t)$ is the density of $z(t)$ we also have

\[ E^\mu(\phi(z(t))) = \int_{\mathcal{Z}} \rho(z, t) \phi(z) \, dz. \]

Equating these two expressions for the expectation at time $t$ we obtain

\[ \int_{\mathcal{Z}} (e^{L^* t} \rho_0)(z) \phi(z) \, dz = \int_{\mathcal{Z}} \rho(z, t) \phi(z) \, dz. \]

We use a density argument so that the identity can be extended to all $\phi \in L^2(\mathcal{Z})$. Hence, from the above equation we deduce that

\[ \rho(z, t) = \left( e^{L^* t} \rho_0 \right)(z). \]

Differentiation of the above equation gives (7.17a). Setting $t = 0$ gives the initial condition (7.17b).

7.5 Linear SDEs

In this section we study linear SDEs in arbitrary finite dimensions. Let $A \in \mathbb{R}^{d \times d}$ be a symmetric positive definite matrix and let $D > 0$ be a positive constant. We will consider the SDE

\[ dX(t) = -A X(t) \, dt + \sqrt{2D} \, dW(t) \]

or, componentwise,

\[ dX_i(t) = -\sum_{j=1}^d A_{ij} X_j(t) \, dt + \sqrt{2D} \, dW_i(t), \quad i = 1, \dots, d. \]

The corresponding Fokker-Planck equation is

\[ \frac{\partial p}{\partial t} = \nabla \cdot (A x p) + D \Delta p \]

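or

\[ \frac{\partial p}{\partial t} = \sum_{i,j=1}^d \frac{\partial}{\partial x_i} (A_{ij} x_j p) + D \sum_{j=1}^d \frac{\partial^2 p}{\partial x_j^2}. \]

Let us now solve the Fokker-Planck equation with initial conditions $p(x, 0 | x_0, 0) = \delta(x - x_0)$. We take the Fourier transform of the Fokker-Planck equation to obtain

\[ \frac{\partial \hat{p}}{\partial t} = -A k \cdot \nabla_k \hat{p} - D |k|^2 \hat{p} \tag{7.18} \]

with

\[ p(x, t | x_0, 0) = (2\pi)^{-d} \int_{\mathbb{R}^d} e^{i k \cdot x} \hat{p}(k, t | x_0, 0) \, dk. \]

The initial condition is

\[ \hat{p}(k, 0 | x_0, 0) = e^{-i k \cdot x_0}. \tag{7.19} \]

We know that the transition probability density of a linear SDE is Gaussian. Since the Fourier transform of a Gaussian function is also Gaussian, we look for a solution to (7.18) which is of the form

\[ \hat{p}(k, t | x_0, 0) = \exp \Big( -i k \cdot M(t) - \frac{1}{2} k^T \Sigma(t) k \Big). \]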

We substitute this into (7.18) and use the symmetry of $A$ to obtain the equations

\[ \frac{dM}{dt} = -A M \quad \text{and} \quad \frac{d\Sigma}{dt} = -2 A \Sigma + 2 D I, \]

with initial conditions (which follow from (7.19)) $M(0) = x_0$ and $\Sigma(0) = 0$, where $0$ denotes the zero $d \times d$ matrix. We can solve these equations using the spectral resolution of $A = B^T \Lambda B$. The solutions are

\[ M(t) = e^{-At} M(0) \]

and

\[ \Sigma(t) = D A^{-1} - D A^{-1} e^{-2At}. \]

We calculate now the inverse Fourier transform of $\hat{p}$ to obtain the fundamental solution (Green's function) of the Fokker-Planck equation

\[ p(x, t | x_0, 0) = (2\pi)^{-d/2} (\det(\Sigma(t)))^{-1/2} \exp \Big( -\frac{1}{2} \left( x - e^{-At} x_0 \right)^T \Sigma^{-1}(t) \left( x - e^{-At} x_0 \right) \Big). \tag{7.20} \]

We note that the generator of the Markov process $X_t$ is of the form

\[ L = -\nabla V(x) \cdot \nabla + D \Delta \]

with $V(x) = \frac{1}{2} x^T A x = \frac{1}{2} \sum_{i,j=1}^d A_{ij} x_i x_j$. This is a confining potential and from the theory presented in Section 6.5 we know that the process $X_t$ is ergodic. The invariant distribution is the Gibbs distribution with $\beta^{-1} = D$,

\[ p_s(x) = \frac{1}{Z} e^{-\frac{1}{2D} x^T A x} \tag{7.21} \]

with $Z = \int_{\mathbb{R}^d} e^{-\frac{1}{2D} x^T A x} \, dx = (2\pi D)^{d/2} \sqrt{\det(A^{-1})}$. Using the above calculations, we can calculate the stationary autocorrelation matrix, which is given by the formula

\[ E(X_0^T X_t) = \int \int x_0^T x \, p(x, t | x_0, 0) p_s(x_0) \, dx \, dx_0. \]

We substitute the formulas for the transition probability density and the stationary distribution, equations (7.20) and (7.21), into the above equation and do the Gaussian integration to obtain

\[ E(X_0^T X_t) = D A^{-1} e^{-At}. \]

We now use the variation of constants formula to obtain

\[ X_t = e^{-At} X_0 + \sqrt{2D} \int_0^t e^{-A(t - s)} \, dW(s). \]

The matrix exponential can be calculated using the spectral resolution of $A$:

\[ e^{-At} = B^T e^{-\Lambda t} B. \]

7.6 Derivation of the Stratonovich SDE

When white noise is approximated by a smooth process this often leads to Stratonovich inter-

pretations of stochastic integrals, at least in one dimension. We use multiscale analysis (singular

perturbation theory for Markov processes) to illustrate this phenomenon in a one-dimensional ex-

ample.

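Consider the equations

\[ \frac{dx}{dt} = h(x) + \frac{1}{\varepsilon}f(x)y, \tag{7.22a} \]

\[ \frac{dy}{dt} = -\frac{\alpha y}{\varepsilon^2} + \sqrt{\frac{2D}{\varepsilon^2}}\frac{dV}{dt}, \tag{7.22b} \]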

with V being a standard one-dimensional Brownian motion. We say that the process x(t) is driven

by colored noise: the noise that appears in (7.22a) has non-zero correlation time. The correlation

function of the colored noise η(t) := y(t)/ε is (we take y(0) = 0)

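\[ R(t) = E(\eta(t)\eta(s)) = \frac{1}{\varepsilon^2}\frac{D}{\alpha}e^{-\frac{\alpha}{\varepsilon^2}|t-s|}. \]

The power spectrum of the colored noise $\eta(t)$ is

\[ f^\varepsilon(x) = \frac{1}{\varepsilon^2}\frac{D\varepsilon^{-2}}{\pi}\frac{1}{x^2 + (\alpha\varepsilon^{-2})^2} = \frac{D}{\pi}\frac{1}{\varepsilon^4x^2 + \alpha^2} \to \frac{D}{\pi\alpha^2} \]

and, consequently,

\[ \lim_{\varepsilon\to0}E\Bigl(\frac{y(t)}{\varepsilon}\frac{y(s)}{\varepsilon}\Bigr) = \frac{2D}{\alpha^2}\delta(t-s), \]

which implies the heuristic

\[ \lim_{\varepsilon\to0}\frac{y(t)}{\varepsilon} = \sqrt{\frac{2D}{\alpha^2}}\frac{dV}{dt}. \tag{7.23} \]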
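The weight $2D/\alpha^2$ of the delta function is the total integral of the correlation function, and it does not depend on $\varepsilon$. The following sketch checks this with a simple trapezoid rule; the function names, the truncation width and the number of grid points are illustrative assumptions.

```python
import math

def correlation(tau, eps, D, alpha):
    """R(tau) = D / (alpha * eps^2) * exp(-alpha * |tau| / eps^2)."""
    return D / (alpha * eps**2) * math.exp(-alpha * abs(tau) / eps**2)

def total_weight(eps, D, alpha, n=100_000, width=40.0):
    """Trapezoid-rule integral of R over [-L, L] with L = width * eps^2 / alpha."""
    L = width * eps**2 / alpha
    h = 2.0 * L / n
    total = 0.5 * (correlation(-L, eps, D, alpha) + correlation(L, eps, D, alpha))
    for i in range(1, n):
        total += correlation(-L + i * h, eps, D, alpha)
    return total * h

if __name__ == "__main__":
    D, alpha = 0.5, 2.0
    for eps in (1.0, 0.3, 0.1):
        # Each value should be close to 2 * D / alpha**2.
        print(eps, total_weight(eps, D, alpha), 2.0 * D / alpha**2)
```

As $\varepsilon$ decreases the correlation function becomes taller and narrower while its integral stays fixed, which is the usual picture of a sequence converging to a multiple of $\delta(t-s)$.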

Another way of seeing this is by solving (7.22b) for y/ε:

\[ \frac{y}{\varepsilon} = \sqrt{\frac{2D}{\alpha^2}}\frac{dV}{dt} - \frac{\varepsilon}{\alpha}\frac{dy}{dt}. \tag{7.24} \]

If we neglect the O(ε) term on the right hand side then we arrive, again, at the heuristic (7.23).

Both of these arguments lead us to conjecture the limiting Ito SDE:

\[ \frac{dX}{dt} = h(X) + \frac{\sqrt{2D}}{\alpha}f(X)\frac{dV}{dt}. \tag{7.25} \]

In fact, as applied, the heuristic gives the incorrect limit. Whenever white noise is approximated by a smooth process, the limiting equation should be interpreted in the Stratonovich sense, giving

\[ \frac{dX}{dt} = h(X) + \frac{\sqrt{2D}}{\alpha}f(X)\circ\frac{dV}{dt}. \tag{7.26} \]

This is usually called the Wong-Zakai theorem. A similar result is true in arbitrary finite and even

infinite dimensions. We will show this using singular perturbation theory.

Theorem 7.6.1. Assume that the initial conditions for y(t) are stationary and that the function f

is smooth. Then the solution of eqn (7.22a) converges, in the limit as ε → 0 to the solution of the

Stratonovich SDE (7.26).

Remarks 7.6.2. i. It is possible to prove pathwise convergence under very mild assumptions.

ii. The generator of a Stratonovich SDE has the form

\[ \mathcal{L}_{strat} = h(x)\partial_x + \frac{D}{\alpha^2}f(x)\partial_x\bigl(f(x)\partial_x\bigr). \]

iii. Consequently, the Fokker-Planck operator of the Stratonovich SDE can be written in divergence form:

\[ \mathcal{L}^*_{strat}\,\cdot = -\partial_x\bigl(h(x)\,\cdot\bigr) + \frac{D}{\alpha^2}\partial_x\bigl(f(x)\partial_x(f(x)\,\cdot)\bigr). \]

iv. In most applications in physics the white noise is an approximation of a more complicated noise process with non-zero correlation time. Hence, the physically correct interpretation of the stochastic integral is the Stratonovich one.

v. In higher dimensions an additional drift term might appear due to the noncommutativity of the row vectors of the diffusion matrix. This is related to the Lévy area correction in the theory of rough paths.

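Proof of Theorem 7.6.1. The generator of the process $(x(t), y(t))$ is

\[ \mathcal{L} = \frac{1}{\varepsilon^2}\bigl(-\alpha y\partial_y + D\partial_y^2\bigr) + \frac{1}{\varepsilon}f(x)y\partial_x + h(x)\partial_x =: \frac{1}{\varepsilon^2}\mathcal{L}_0 + \frac{1}{\varepsilon}\mathcal{L}_1 + \mathcal{L}_2. \]

The "fast" process is a stationary Markov process with invariant density

\[ \rho(y) = \sqrt{\frac{\alpha}{2\pi D}}\,e^{-\frac{\alpha y^2}{2D}}. \tag{7.27} \]

The backward Kolmogorov equation is

\[ \frac{\partial u^\varepsilon}{\partial t} = \Bigl(\frac{1}{\varepsilon^2}\mathcal{L}_0 + \frac{1}{\varepsilon}\mathcal{L}_1 + \mathcal{L}_2\Bigr)u^\varepsilon. \tag{7.28} \]

We look for a solution to this equation in the form of a power series expansion in $\varepsilon$:

\[ u^\varepsilon(x,y,t) = u_0 + \varepsilon u_1 + \varepsilon^2u_2 + \dots \]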

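We substitute this into (7.28) and equate terms of the same power in $\varepsilon$ to obtain the following hierarchy of equations:

\[ -\mathcal{L}_0u_0 = 0, \]
\[ -\mathcal{L}_0u_1 = \mathcal{L}_1u_0, \]
\[ -\mathcal{L}_0u_2 = \mathcal{L}_1u_1 + \mathcal{L}_2u_0 - \frac{\partial u_0}{\partial t}. \]

The ergodicity of the fast process implies that the null space of the generator $\mathcal{L}_0$ consists only of functions that are constant in $y$. Hence:

\[ u_0 = u(x,t). \]

The second equation in the hierarchy becomes

\[ -\mathcal{L}_0u_1 = f(x)y\partial_xu. \]

This equation is solvable since the right hand side is orthogonal to the null space of the adjoint of $\mathcal{L}_0$ (this is the Fredholm alternative). We solve it using separation of variables:

\[ u_1(x,y,t) = \frac{1}{\alpha}f(x)\partial_xu\,y + \psi_1(x,t). \]

In order for the third equation to have a solution we need to require that the right hand side is orthogonal to the null space of $\mathcal{L}_0^*$:

\[ \int_{\mathbb{R}}\Bigl(\mathcal{L}_1u_1 + \mathcal{L}_2u_0 - \frac{\partial u_0}{\partial t}\Bigr)\rho(y)\,dy = 0. \]

We calculate:

\[ \int_{\mathbb{R}}\frac{\partial u_0}{\partial t}\rho(y)\,dy = \frac{\partial u}{\partial t}. \]

Furthermore:

\[ \int_{\mathbb{R}}\mathcal{L}_2u_0\,\rho(y)\,dy = h(x)\partial_xu. \]

Finally, using $\langle y\rangle = 0$ and $\langle y^2\rangle = D/\alpha$ under $\rho(y)$,

\[ \int_{\mathbb{R}}\mathcal{L}_1u_1\,\rho(y)\,dy = \int_{\mathbb{R}}f(x)y\partial_x\Bigl(\frac{1}{\alpha}f(x)\partial_xu\,y + \psi_1(x,t)\Bigr)\rho(y)\,dy \]
\[ = \frac{1}{\alpha}f(x)\partial_x\bigl(f(x)\partial_xu\bigr)\langle y^2\rangle + f(x)\partial_x\psi_1(x,t)\langle y\rangle \]
\[ = \frac{D}{\alpha^2}f(x)\partial_x\bigl(f(x)\partial_xu\bigr) \]
\[ = \frac{D}{\alpha^2}f(x)\partial_xf(x)\,\partial_xu + \frac{D}{\alpha^2}f(x)^2\partial_x^2u. \]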
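The averaging step relies on the first two moments of the invariant density (7.27). A minimal numerical check, assuming a plain midpoint rule on a truncated interval (the helper names and grid parameters are illustrative), is:

```python
import math

def rho(y, alpha, D):
    """Invariant density (7.27) of the fast OU process."""
    return math.sqrt(alpha / (2.0 * math.pi * D)) * math.exp(-alpha * y * y / (2.0 * D))

def moment(k, alpha, D, n=20_000, width=12.0):
    """Midpoint-rule approximation of the k-th moment of rho over +/- width std devs."""
    L = width * math.sqrt(D / alpha)
    h = 2.0 * L / n
    total = 0.0
    for i in range(n):
        y = -L + (i + 0.5) * h
        total += (y ** k) * rho(y, alpha, D)
    return total * h

def effective_coefficient(alpha, D):
    """Coefficient of f d_x (f d_x u) in the averaged equation: <y^2> / alpha."""
    return moment(2, alpha, D) / alpha

if __name__ == "__main__":
    alpha, D = 2.0, 0.5
    print(moment(0, alpha, D), moment(1, alpha, D), moment(2, alpha, D))
    print(effective_coefficient(alpha, D), D / alpha**2)
```

The last line prints two numbers that should agree, which is the coefficient $D/\alpha^2$ appearing in the limiting equation.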

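Putting everything together we obtain the limiting backward Kolmogorov equation

\[ \frac{\partial u}{\partial t} = \Bigl(h(x) + \frac{D}{\alpha^2}f(x)\partial_xf(x)\Bigr)\partial_xu + \frac{D}{\alpha^2}f(x)^2\partial_x^2u, \]

from which we read off the limiting Stratonovich SDE

\[ \frac{dX}{dt} = h(X) + \frac{\sqrt{2D}}{\alpha}f(X)\circ\frac{dV}{dt}. \]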

7.6.1 Ito versus Stratonovich

A Stratonovich SDE

\[ dX(t) = f(X(t))\,dt + \sigma(X(t))\circ dW(t) \tag{7.29} \]

can be written as an Ito SDE

\[ dX(t) = \Bigl(f(X(t)) + \frac{1}{2}\Bigl(\sigma\frac{d\sigma}{dx}\Bigr)(X(t))\Bigr)dt + \sigma(X(t))\,dW(t). \]

Conversely, an Ito SDE

\[ dX(t) = f(X(t))\,dt + \sigma(X(t))\,dW(t) \tag{7.30} \]

can be written as a Stratonovich SDE

\[ dX(t) = \Bigl(f(X(t)) - \frac{1}{2}\Bigl(\sigma\frac{d\sigma}{dx}\Bigr)(X(t))\Bigr)dt + \sigma(X(t))\circ dW(t). \]

The Ito and Stratonovich interpretations of an SDE can lead to equations with very different properties!

When the diffusion coefficient depends on the solution of the SDE $X(t)$, we will say that we have an equation with multiplicative noise.
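The two conversion rules amount to adding or subtracting the drift correction $\frac{1}{2}\sigma\sigma'$. The sketch below implements them for scalar SDEs, approximating $\sigma'$ by a central difference; the helper names and the geometric Brownian motion example (with hypothetical parameters `mu` and `s`) are assumptions made for illustration.

```python
def derivative(g, x, h=1e-6):
    """Central-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

def stratonovich_to_ito_drift(f, sigma):
    """Ito drift equivalent to the Stratonovich SDE dX = f dt + sigma o dW."""
    return lambda x: f(x) + 0.5 * sigma(x) * derivative(sigma, x)

def ito_to_stratonovich_drift(f, sigma):
    """Stratonovich drift equivalent to the Ito SDE dX = f dt + sigma dW."""
    return lambda x: f(x) - 0.5 * sigma(x) * derivative(sigma, x)

if __name__ == "__main__":
    # Geometric Brownian motion with f(x) = mu * x and sigma(x) = s * x.
    mu, s = 0.3, 0.8
    f = lambda x: mu * x
    sigma = lambda x: s * x
    ito = stratonovich_to_ito_drift(f, sigma)
    # The Ito drift should be (mu + s^2 / 2) * x.
    print(ito(2.0), (mu + 0.5 * s * s) * 2.0)
```

For additive noise ($\sigma$ constant) the correction vanishes and the two interpretations coincide, which the functions reproduce up to finite-difference error.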

7.7 Numerical Solution of SDEs

7.8 Parameter Estimation for SDEs

7.9 Noise Induced Transitions

Consider the Landau equation:

\[ \frac{dX_t}{dt} = X_t(c - X_t^2), \quad X_0 = x. \tag{7.31} \]

This is a gradient flow for the potential $V(x) = -\frac{1}{2}cx^2 + \frac{1}{4}x^4$. When $c < 0$ all solutions are attracted to the single steady state $X^* = 0$. When $c > 0$ the steady state $X^* = 0$ becomes unstable and $X_t \to \sqrt{c}$ if $x > 0$ and $X_t \to -\sqrt{c}$ if $x < 0$. Consider additive random perturbations to the Landau equation:

\[ \frac{dX_t}{dt} = X_t(c - X_t^2) + \sqrt{2\sigma}\frac{dW_t}{dt}, \quad X_0 = x. \tag{7.32} \]

This equation defines an ergodic Markov process on $\mathbb{R}$: there exists a unique invariant distribution

\[ \rho(x) = Z^{-1}e^{-V(x)/\sigma}, \quad Z = \int_{\mathbb{R}}e^{-V(x)/\sigma}\,dx, \quad V(x) = -\frac{1}{2}cx^2 + \frac{1}{4}x^4. \]

ρ(x) is a probability density for all values of c ∈ R. The presence of additive noise in some

sense ”trivializes” the dynamics. The dependence of various averaged quantities on c resembles

the physical situation of a second order phase transition.

Consider now multiplicative perturbations of the Landau equation,

\[ \frac{dX_t}{dt} = X_t(c - X_t^2) + \sqrt{2\sigma}X_t\frac{dW_t}{dt}, \quad X_0 = x, \tag{7.33} \]

where the stochastic differential is interpreted in the Ito sense. The generator of this process is

\[ \mathcal{L} = x(c - x^2)\partial_x + \sigma x^2\partial_x^2. \]

Notice that $X_t = 0$ is always a solution of (7.33). Thus, if we start with $x > 0$ ($x < 0$) the solution will remain positive (negative). We will assume that $x > 0$.

Consider the function $Y_t = \log(X_t)$. We apply Ito's formula to this function:

\[ dY_t = \mathcal{L}\log(X_t)\,dt + \sqrt{2\sigma}X_t\partial_x\log(X_t)\,dW_t \]
\[ = \Bigl(X_t(c - X_t^2)\frac{1}{X_t} - \sigma X_t^2\frac{1}{X_t^2}\Bigr)dt + \sqrt{2\sigma}X_t\frac{1}{X_t}\,dW_t \]
\[ = (c - \sigma)\,dt - X_t^2\,dt + \sqrt{2\sigma}\,dW_t. \]

Thus, we have been able to transform (7.33) into an SDE with additive noise:

\[ dY_t = \bigl[(c - \sigma) - e^{2Y_t}\bigr]dt + \sqrt{2\sigma}\,dW_t. \tag{7.34} \]

This is a gradient flow with potential

\[ V(y) = -\Bigl[(c - \sigma)y - \frac{1}{2}e^{2y}\Bigr]. \]

The invariant measure, if it exists, is of the form

\[ \rho(y)\,dy = Z^{-1}e^{-V(y)/\sigma}\,dy. \]

Going back to the variable $x$ we obtain:

\[ \rho(x)\,dx = Z^{-1}x^{(c/\sigma - 2)}e^{-\frac{x^2}{2\sigma}}\,dx. \]

We need to make sure that this distribution is integrable:

\[ Z = \int_0^{+\infty}x^\gamma e^{-\frac{x^2}{2\sigma}}\,dx < \infty, \quad \gamma = \frac{c}{\sigma} - 2. \]

For this it is necessary that

\[ \gamma > -1 \;\Rightarrow\; c > \sigma. \]

Not all multiplicative random perturbations lead to ergodic behavior. The dependence of the invariant distribution on $c$ is similar to the physical situation of first order phase transitions.
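The integrability condition can be made concrete: the substitution $u = x^2/(2\sigma)$ gives $Z = \frac{1}{2}(2\sigma)^{(\gamma+1)/2}\,\Gamma\bigl(\frac{\gamma+1}{2}\bigr)$ whenever $\gamma > -1$. The sketch below evaluates this closed form with the standard library; the function name and the convention of returning infinity for $c \le \sigma$ are illustrative choices.

```python
import math

def normalization_constant(c, sigma):
    """Z = int_0^inf x^gamma exp(-x^2 / (2 sigma)) dx with gamma = c / sigma - 2.

    Returns math.inf when the integral diverges at x = 0, i.e. when c <= sigma.
    """
    gamma = c / sigma - 2.0
    if gamma <= -1.0:
        return math.inf
    s = 0.5 * (gamma + 1.0)
    return 0.5 * (2.0 * sigma) ** s * math.gamma(s)

if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0, 3.0):
        print(c, normalization_constant(c, sigma=1.0))
```

For $\sigma = 1$ and $c = 3$ the integrand is $x e^{-x^2/2}$, whose integral over $(0, \infty)$ equals 1, which gives a quick consistency check of the formula.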


7.10 Discussion and Bibliography

Colored Noise. When the noise which drives an SDE has non-zero correlation time we will say that we have colored noise. The properties of the SDE (stability, ergodicity etc.) are quite robust under "coloring of the noise". See G. Blankenship and G.C. Papanicolaou, Stability and control of stochastic systems with wide-band noise disturbances. I, SIAM J. Appl. Math., 34(3), 1978, pp. 437–476. Colored noise appears in many applications in physics and chemistry. For a review see P. Hanggi and P. Jung, Colored noise in dynamical systems, Adv. Chem. Phys. 89, 239 (1995).

In the case where there is an additional small time scale in the problem, in addition to the correlation time of the colored noise, it is not clear what the right interpretation of the stochastic integral is (in the limit as both small time scales go to 0). This is usually called the Ito versus Stratonovich problem. Consider, for example, the SDE

\[ \tau\ddot{X} = -\dot{X} + v(X)\eta^\varepsilon(t), \]

where $\eta^\varepsilon(t)$ is colored noise with correlation time $\varepsilon^2$. In the limit where both small time scales go to 0 we can get either Ito or Stratonovich or neither. See [51, 71].

Noise induced transitions are studied extensively in [42]. The material in Section 7.9 is based

on [59]. See also [58].

7.11 Exercises

1. Calculate all moments of the geometric Brownian motion for the Ito and Stratonovich interpre-

tations of the stochastic integral.

2. Study additive and multiplicative random perturbations of the ODE

\[ \frac{dx}{dt} = x(c + 2x^2 - x^4). \]

3. Analyze equation (7.33) for the Stratonovich interpretation of the stochastic integral.

Chapter 8

The Langevin Equation

8.1 Introduction

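8.2 The Fokker-Planck Equation in Phase Space (Klein-Kramers Equation)

Consider a diffusion process in two dimensions for the variables q (position) and momentum p. The generator of this Markov process is

\[ \mathcal{L} = p\cdot\nabla_q - \nabla_qV\cdot\nabla_p + \gamma(-p\cdot\nabla_p + D\Delta_p). \tag{8.1} \]

The $L^2(dp\,dq)$-adjoint is

\[ \mathcal{L}^*\rho = -p\cdot\nabla_q\rho + \nabla_qV\cdot\nabla_p\rho + \gamma\bigl(\nabla_p\cdot(p\rho) + D\Delta_p\rho\bigr). \]

The corresponding FP equation is

\[ \frac{\partial\rho}{\partial t} = \mathcal{L}^*\rho. \]

The corresponding stochastic differential equation is the Langevin equation

\[ \ddot{X}_t = -\nabla V(X_t) - \gamma\dot{X}_t + \sqrt{2\gamma D}\,\dot{W}_t. \tag{8.2} \]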

This is Newton’s equation perturbed by dissipation and noise. The Fokker-Planck equation for the

Langevin equation, which is sometimes called the Klein-Kramers-Chandrasekhar equation was

first derived by Kramers in 1923 and was studied by Kramers in his famous paper [?]. Notice that

L∗ is not a uniformly elliptic operator: there are second order derivatives only with respect to p

and not q. This is an example of a degenerate elliptic operator. It is, however, hypoelliptic. We can

still prove existence, uniqueness and regularity of solutions for the Fokker-Planck equation, and

obtain estimates on the solution. It is not possible to obtain the solution of the FP equation for an

arbitrary potential. We can, however, calculate the (unique normalized) solution of the stationary

Fokker-Planck equation.

Theorem 8.2.1. Let $V(x)$ be a smooth confining potential. Then the Markov process with generator (8.1) is ergodic. The unique invariant distribution is the Maxwell-Boltzmann distribution

\[ \rho(p,q) = \frac{1}{Z}e^{-\beta H(p,q)} \tag{8.3} \]

where

\[ H(p,q) = \frac{1}{2}\|p\|^2 + V(q) \]

is the Hamiltonian, $\beta = (k_BT)^{-1}$ is the inverse temperature and the normalization factor $Z$ is the partition function

\[ Z = \int_{\mathbb{R}^{2d}}e^{-\beta H(p,q)}\,dp\,dq. \]

It is possible to obtain rates of convergence in either a weighted $L^2$-norm or the relative entropy norm:

\[ H(p(\cdot,t)|\rho) \le Ce^{-\alpha t}. \]

The proof of this result is very complicated, since the generator $\mathcal{L}$ is degenerate and non-selfadjoint.

See for example and the references therein.

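Let $\rho(q,p,t)$ be the solution of the Kramers equation and let $\rho_\beta(q,p)$ be the Maxwell-Boltzmann distribution. We can write

\[ \rho(q,p,t) = h(q,p,t)\rho_\beta(q,p), \]

where $h(q,p,t)$ solves the equation

\[ \frac{\partial h}{\partial t} = -\mathcal{A}h + \gamma\mathcal{S}h \tag{8.4} \]

where

\[ \mathcal{A} = p\cdot\nabla_q - \nabla_qV\cdot\nabla_p, \quad \mathcal{S} = -p\cdot\nabla_p + \beta^{-1}\Delta_p. \]

The operator $\mathcal{A}$ is antisymmetric in $L^2_\rho := L^2(\mathbb{R}^{2d};\rho_\beta(q,p))$, whereas $\mathcal{S}$ is symmetric.

Let $X_i := -\frac{\partial}{\partial p_i}$. The $L^2_\rho$-adjoint of $X_i$ is

\[ X_i^* = -\beta p_i + \frac{\partial}{\partial p_i}. \]

Since $X_i^*X_i = -\partial_{p_i}^2 + \beta p_i\partial_{p_i}$, we have that

\[ \mathcal{S} = -\beta^{-1}\sum_{i=1}^dX_i^*X_i. \]

Consequently, the generator of the Markov process $\{q(t), p(t)\}$ can be written in Hörmander's "sum of squares" form:

\[ \mathcal{L} = \mathcal{A} - \gamma\beta^{-1}\sum_{i=1}^dX_i^*X_i. \tag{8.5} \]

We calculate the commutators between the vector fields in (8.5):

\[ [\mathcal{A}, X_i] = \frac{\partial}{\partial q_i}, \quad [X_i, X_j] = 0, \quad [X_i, X_j^*] = \beta\delta_{ij}. \]

Consequently,

\[ \mathrm{Lie}(X_1,\dots,X_d,[\mathcal{A},X_1],\dots,[\mathcal{A},X_d]) = \mathrm{Lie}(\nabla_p,\nabla_q) \]

which spans $T_{p,q}\mathbb{R}^{2d}$ for all $p, q \in \mathbb{R}^d$. This shows that the generator $\mathcal{L}$ is a hypoelliptic operator.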

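Let now $Y_i = -\frac{\partial}{\partial q_i}$ with $L^2_\rho$-adjoint $Y_i^* = \frac{\partial}{\partial q_i} - \beta\frac{\partial V}{\partial q_i}$. We have that

\[ X_i^*Y_i - Y_i^*X_i = \beta\Bigl(p_i\frac{\partial}{\partial q_i} - \frac{\partial V}{\partial q_i}\frac{\partial}{\partial p_i}\Bigr). \]

Consequently, the generator can be written in the form

\[ \mathcal{L} = \beta^{-1}\sum_{i=1}^d\bigl(X_i^*Y_i - Y_i^*X_i - \gamma X_i^*X_i\bigr). \tag{8.6} \]

Notice also that

\[ \mathcal{L}_V := -\nabla_qV\cdot\nabla_q + \beta^{-1}\Delta_q = -\beta^{-1}\sum_{i=1}^dY_i^*Y_i. \]

The phase-space Fokker-Planck equation can be written in the form

\[ \frac{\partial\rho}{\partial t} + p\cdot\nabla_q\rho - \nabla_qV\cdot\nabla_p\rho = Q(\rho, f_B) \]

where the collision operator has the form

\[ Q(\rho, f_B) = D\nabla\cdot\bigl(f_B\nabla(f_B^{-1}\rho)\bigr). \]

The Fokker-Planck equation has a similar structure to the Boltzmann equation (the basic equation in the kinetic theory of gases), with the difference that the collision operator for the FP equation is linear. Convergence of solutions of the Boltzmann equation to the Maxwell-Boltzmann distribution has also been proved. See ??.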

We can study the backward and forward Kolmogorov equations for (8.2) by expanding the solution with respect to the Hermite basis. We consider the problem in 1d. We set $D = 1$. The generator of the process is:

\[ \mathcal{L} = p\partial_q - V'(q)\partial_p + \gamma(-p\partial_p + \partial_p^2) =: \mathcal{L}_1 + \gamma\mathcal{L}_0, \]

where

\[ \mathcal{L}_0 := -p\partial_p + \partial_p^2 \quad\text{and}\quad \mathcal{L}_1 := p\partial_q - V'(q)\partial_p. \]

The backward Kolmogorov equation is

\[ \frac{\partial h}{\partial t} = \mathcal{L}h. \tag{8.7} \]

The solution should be an element of the weighted $L^2$-space

\[ L^2_\rho = \Bigl\{f \;\Big|\; \int_{\mathbb{R}^2}|f|^2Z^{-1}e^{-\beta H(p,q)}\,dp\,dq < \infty\Bigr\}. \]

We notice that the invariant measure of our Markov process is a product measure:

\[ e^{-\beta H(p,q)} = e^{-\beta\frac{1}{2}|p|^2}e^{-\beta V(q)}. \]

The space $L^2(e^{-\beta\frac{1}{2}|p|^2}\,dp)$ is spanned by the Hermite polynomials. Consequently, we can expand the solution of (8.7) in the Hermite basis:

\[ h(p,q,t) = \sum_{n=0}^\infty h_n(q,t)f_n(p), \tag{8.8} \]

where $f_n(p) = \frac{1}{\sqrt{n!}}H_n(p)$. Our plan is to substitute (8.8) into (8.7) and obtain a sequence of equations for the coefficients $h_n(q,t)$. We have:

\[ \mathcal{L}_0h = \mathcal{L}_0\sum_{n=0}^\infty h_nf_n = -\sum_{n=0}^\infty nh_nf_n. \]

Furthermore

\[ \mathcal{L}_1h = -\partial_qV\,\partial_ph + p\,\partial_qh. \]

We calculate each term on the right hand side of the above equation separately. For this we will need the formulas

\[ \partial_pf_n = \sqrt{n}f_{n-1} \quad\text{and}\quad pf_n = \sqrt{n}f_{n-1} + \sqrt{n+1}f_{n+1}. \]

We have

\[ p\,\partial_qh = p\,\partial_q\sum_{n=0}^\infty h_nf_n = p\,\partial_qh_0 + \sum_{n=1}^\infty\partial_qh_n\,pf_n \]
\[ = \partial_qh_0f_1 + \sum_{n=1}^\infty\partial_qh_n\bigl(\sqrt{n}f_{n-1} + \sqrt{n+1}f_{n+1}\bigr) \]
\[ = \sum_{n=0}^\infty\bigl(\sqrt{n+1}\,\partial_qh_{n+1} + \sqrt{n}\,\partial_qh_{n-1}\bigr)f_n \]

with $h_{-1} \equiv 0$. Furthermore

\[ \partial_qV\,\partial_ph = \sum_{n=0}^\infty\partial_qV\,h_n\partial_pf_n = \sum_{n=0}^\infty\partial_qV\,h_n\sqrt{n}f_{n-1} = \sum_{n=0}^\infty\partial_qV\,h_{n+1}\sqrt{n+1}f_n. \]

Consequently, since $\mathcal{L}_1h = p\,\partial_qh - \partial_qV\,\partial_ph$:

\[ \mathcal{L}h = \mathcal{L}_1h + \gamma\mathcal{L}_0h = \sum_{n=0}^\infty\Bigl(-\gamma nh_n + \sqrt{n+1}\,\partial_qh_{n+1} + \sqrt{n}\,\partial_qh_{n-1} - \sqrt{n+1}\,\partial_qV\,h_{n+1}\Bigr)f_n. \]

Using the orthonormality of the eigenfunctions of $\mathcal{L}_0$ we obtain the following set of equations which determine $\{h_n(q,t)\}_{n=0}^\infty$:

\[ \dot{h}_n = -\gamma nh_n + \sqrt{n+1}\,\partial_qh_{n+1} + \sqrt{n}\,\partial_qh_{n-1} - \sqrt{n+1}\,\partial_qV\,h_{n+1}, \quad n = 0, 1, \dots \]

This set of equations is usually called the Brinkman hierarchy (1956). We can use this approach to develop a numerical method for solving the Klein-Kramers equation. For this we need to expand each coefficient $h_n$ in an appropriate basis with respect to $q$. Obvious choices are either the Hermite basis (polynomial potentials) or the standard Fourier basis (periodic potentials). We will do this for the case of periodic potentials. The resulting method is usually called the continued fraction expansion. See [82]. The Hermite expansion of the distribution function with respect to the velocity is used in the study of various kinetic equations (including the Boltzmann equation). It was initiated by Grad in the late 40's. It is quite often used in the approximate calculation of transport coefficients (e.g. the diffusion coefficient). This expansion can be justified rigorously for the Fokker-Planck equation. See [67]. This expansion can also be used in order to solve the Poisson equation $-\mathcal{L}\phi = f(p,q)$. See [73].
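The two Hermite identities used to derive the hierarchy, $\partial_pf_n = \sqrt{n}f_{n-1}$ and $pf_n = \sqrt{n}f_{n-1} + \sqrt{n+1}f_{n+1}$, can be checked numerically. The sketch below assumes $\beta = 1$ and the probabilists' normalization of the Hermite polynomials (orthogonal with respect to $e^{-p^2/2}$); the function names are illustrative.

```python
import math

def hermite_prob(n, p):
    """Probabilists' Hermite polynomial He_n(p) via He_{k+1} = p He_k - k He_{k-1}."""
    prev, cur = 0.0, 1.0  # He_{-1} := 0, He_0 = 1
    for k in range(n):
        prev, cur = cur, p * cur - k * prev
    return cur

def f(n, p):
    """Normalized basis function f_n = He_n / sqrt(n!), with f_{-1} := 0."""
    if n < 0:
        return 0.0
    return hermite_prob(n, p) / math.sqrt(math.factorial(n))

def multiplication_residual(n, p):
    """p f_n - (sqrt(n) f_{n-1} + sqrt(n+1) f_{n+1}); should vanish."""
    return p * f(n, p) - (math.sqrt(n) * f(n - 1, p) + math.sqrt(n + 1) * f(n + 1, p))

def derivative_residual(n, p, h=1e-5):
    """Central-difference f_n'(p) minus sqrt(n) f_{n-1}(p); should be close to zero."""
    df = (f(n, p + h) - f(n, p - h)) / (2.0 * h)
    return df - math.sqrt(n) * f(n - 1, p)

if __name__ == "__main__":
    for n in range(5):
        print(n, multiplication_residual(n, 0.7), derivative_residual(n, 0.7))
```

Both residuals should be at the level of rounding or finite-difference error, which is what makes the infinite system for the coefficients $h_n$ tridiagonal in the index $n$.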

8.3 The Langevin Equation in a Harmonic Potential

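There are very few potentials for which we can solve the Langevin equation or calculate the eigenvalues and eigenfunctions of the generator of the Markov process $\{q(t), p(t)\}$. One case where we can calculate everything explicitly is that of a Brownian particle in a quadratic (harmonic) potential

\[ V(q) = \frac{1}{2}\omega_0^2q^2. \tag{8.9} \]

The Langevin equation is

\[ \ddot{q} = -\omega_0^2q - \gamma\dot{q} + \sqrt{2\gamma\beta^{-1}}\,\dot{W} \tag{8.10} \]

or

\[ \dot{q} = p, \quad \dot{p} = -\omega_0^2q - \gamma p + \sqrt{2\gamma\beta^{-1}}\,\dot{W}. \tag{8.11} \]

This is a linear equation that can be solved explicitly. Rather than doing this, we will calculate the eigenvalues and eigenfunctions of the generator, which takes the form

\[ \mathcal{L} = p\partial_q - \omega_0^2q\partial_p + \gamma(-p\partial_p + \beta^{-1}\partial_p^2). \tag{8.12} \]

The Fokker-Planck operator is

\[ \mathcal{L}^*\,\cdot = -p\partial_q\,\cdot + \omega_0^2q\partial_p\,\cdot + \gamma\bigl(\partial_p(p\,\cdot) + \beta^{-1}\partial_p^2\,\cdot\bigr). \tag{8.13} \]

The process $\{q(t), p(t)\}$ is an ergodic Markov process with Gaussian invariant measure

\[ \rho_\beta(q,p)\,dq\,dp = \frac{\beta\omega_0}{2\pi}e^{-\frac{\beta}{2}p^2 - \frac{\beta\omega_0^2}{2}q^2}\,dq\,dp. \tag{8.14} \]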

For the calculation of the eigenvalues and eigenfunctions of the operator $\mathcal{L}$ it is convenient to introduce creation and annihilation operators in both the position and momentum variables. We set

\[ a^- = \beta^{-1/2}\partial_p, \quad a^+ = -\beta^{-1/2}\partial_p + \beta^{1/2}p \tag{8.15} \]

and

\[ b^- = \omega_0^{-1}\beta^{-1/2}\partial_q, \quad b^+ = -\omega_0^{-1}\beta^{-1/2}\partial_q + \omega_0\beta^{1/2}q. \tag{8.16} \]

We have that

\[ a^+a^- = -\beta^{-1}\partial_p^2 + p\partial_p \]

and

\[ b^+b^- = -\omega_0^{-2}\beta^{-1}\partial_q^2 + q\partial_q. \]

Consequently, the operator

\[ \mathcal{L} = -a^+a^- - b^+b^- \tag{8.17} \]

is the generator of the OU process in two dimensions.

The operators $a^\pm$, $b^\pm$ satisfy the commutation relations

\[ [a^+, a^-] = -1, \tag{8.18a} \]
\[ [b^+, b^-] = -1, \tag{8.18b} \]
\[ [a^\pm, b^\pm] = 0. \tag{8.18c} \]

See Exercise 3. Using now the operators $a^\pm$ and $b^\pm$ we can write the generator $\mathcal{L}$ in the form

\[ \mathcal{L} = -\gamma a^+a^- - \omega_0(b^+a^- - a^+b^-), \tag{8.19} \]

which is a particular case of (8.6). In order to calculate the eigenvalues and eigenfunctions of (8.19) we need to make an appropriate change of variables in order to bring the operator $\mathcal{L}$ into the "decoupled" form (8.17). Clearly, this is a linear transformation and can be written in the form

\[ Y = AX \]

where $X = (q, p)$ for some $2\times2$ matrix $A$. It is somewhat easier to make this change of variables at the level of the creation and annihilation operators. In particular, our goal is to find first order differential operators $c^\pm$ and $d^\pm$ so that the operator (8.19) becomes

\[ \mathcal{L} = -Cc^+c^- - Dd^+d^- \tag{8.20} \]

for some appropriate constants $C$ and $D$. Since our goal is, essentially, to map $\mathcal{L}$ to the two-dimensional OU process, we require that the operators $c^\pm$ and $d^\pm$ satisfy the canonical commutation relations

\[ [c^+, c^-] = -1, \tag{8.21a} \]
\[ [d^+, d^-] = -1, \tag{8.21b} \]
\[ [c^\pm, d^\pm] = 0. \tag{8.21c} \]

The operators $c^\pm$ and $d^\pm$ should be given as linear combinations of the old operators $a^\pm$ and $b^\pm$. From the structure of the generator $\mathcal{L}$ (8.19), the decoupled form (8.20) and the commutation relations (8.21) and (8.18) we conclude that $c^\pm$ and $d^\pm$ should be of the form

\[ c^+ = \alpha_{11}a^+ + \alpha_{12}b^+, \tag{8.22a} \]
\[ c^- = \alpha_{21}a^- + \alpha_{22}b^-, \tag{8.22b} \]
\[ d^+ = \beta_{11}a^+ + \beta_{12}b^+, \tag{8.22c} \]
\[ d^- = \beta_{21}a^- + \beta_{22}b^-. \tag{8.22d} \]

Notice that $c^-$ and $d^-$ are not the adjoints of $c^+$ and $d^+$. If we substitute now these equations into (8.20) and equate it with (8.19) and into the commutation relations (8.21) we obtain a system of equations for the coefficients $\{\alpha_{ij}\}$, $\{\beta_{ij}\}$. In order to write down the formulas for these coefficients it is convenient to introduce the eigenvalues of the deterministic problem

\[ \ddot{q} = -\gamma\dot{q} - \omega_0^2q. \]

The solution of this equation is

\[ q(t) = C_1e^{-\lambda_1t} + C_2e^{-\lambda_2t} \]

with

\[ \lambda_{1,2} = \frac{\gamma\pm\delta}{2}, \quad \delta = \sqrt{\gamma^2 - 4\omega_0^2}. \tag{8.23} \]

The eigenvalues satisfy the relations

\[ \lambda_1 + \lambda_2 = \gamma, \quad \lambda_1 - \lambda_2 = \delta, \quad \lambda_1\lambda_2 = \omega_0^2. \tag{8.24} \]

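Proposition 8.3.1. Let $\mathcal{L}$ be the generator (8.19) and let $c^\pm$, $d^\pm$ be the operators

\[ c^+ = \frac{1}{\sqrt{\delta}}\bigl(\sqrt{\lambda_1}a^+ + \sqrt{\lambda_2}b^+\bigr), \tag{8.25a} \]
\[ c^- = \frac{1}{\sqrt{\delta}}\bigl(\sqrt{\lambda_1}a^- - \sqrt{\lambda_2}b^-\bigr), \tag{8.25b} \]
\[ d^+ = \frac{1}{\sqrt{\delta}}\bigl(\sqrt{\lambda_2}a^+ + \sqrt{\lambda_1}b^+\bigr), \tag{8.25c} \]
\[ d^- = \frac{1}{\sqrt{\delta}}\bigl(-\sqrt{\lambda_2}a^- + \sqrt{\lambda_1}b^-\bigr). \tag{8.25d} \]

Then $c^\pm$, $d^\pm$ satisfy the canonical commutation relations (8.21) as well as

\[ [\mathcal{L}, c^\pm] = \mp\lambda_1c^\pm, \quad [\mathcal{L}, d^\pm] = \mp\lambda_2d^\pm. \tag{8.26} \]

Furthermore, the operator $\mathcal{L}$ can be written in the form

\[ \mathcal{L} = -\lambda_1c^+c^- - \lambda_2d^+d^-. \tag{8.27} \]

Proof. First we check the commutation relations:

\[ [c^+, c^-] = \frac{1}{\delta}\bigl(\lambda_1[a^+, a^-] - \lambda_2[b^+, b^-]\bigr) = \frac{1}{\delta}(-\lambda_1 + \lambda_2) = -1. \]

Similarly,

\[ [d^+, d^-] = \frac{1}{\delta}\bigl(-\lambda_2[a^+, a^-] + \lambda_1[b^+, b^-]\bigr) = \frac{1}{\delta}(\lambda_2 - \lambda_1) = -1. \]

Clearly, we have that

\[ [c^+, d^+] = [c^-, d^-] = 0. \]

Furthermore,

\[ [c^+, d^-] = \frac{1}{\delta}\bigl(-\sqrt{\lambda_1\lambda_2}[a^+, a^-] + \sqrt{\lambda_1\lambda_2}[b^+, b^-]\bigr) = \frac{1}{\delta}\bigl(\sqrt{\lambda_1\lambda_2} - \sqrt{\lambda_1\lambda_2}\bigr) = 0. \]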

Finally:

\[ [\mathcal{L}, c^+] = -\lambda_1c^+c^-c^+ + \lambda_1c^+c^+c^- \]
\[ = -\lambda_1c^+(1 + c^+c^-) + \lambda_1c^+c^+c^- \]
\[ = -\lambda_1c^+, \]

and similarly for the other equations in (8.26). Now we calculate

\[ \mathcal{L} = -\lambda_1c^+c^- - \lambda_2d^+d^- \]
\[ = \frac{\lambda_2^2 - \lambda_1^2}{\delta}a^+a^- + 0\,b^+b^- + \frac{\sqrt{\lambda_1\lambda_2}}{\delta}(\lambda_1 - \lambda_2)a^+b^- + \frac{1}{\delta}\sqrt{\lambda_1\lambda_2}(-\lambda_1 + \lambda_2)b^+a^- \]
\[ = -\gamma a^+a^- - \omega_0(b^+a^- - a^+b^-), \]

which is precisely (8.19). In the above calculation we used (8.24).

Using now (8.27) we can readily obtain the eigenvalues and eigenfunctions of $\mathcal{L}$. From our experience with the two-dimensional OU processes (or, the Schrödinger operator for the two-dimensional quantum harmonic oscillator), we expect that the eigenfunctions should be tensor products of Hermite polynomials. Indeed, we have the following, which is the main result of this section.

Theorem 8.3.2. The eigenvalues and eigenfunctions of the generator of the Markov process $\{q, p\}$ (8.11) are

\[ \lambda_{nm} = \lambda_1n + \lambda_2m = \frac{1}{2}\gamma(n + m) + \frac{1}{2}\delta(n - m), \quad n, m = 0, 1, \dots \tag{8.28} \]

and

\[ \phi_{nm}(q,p) = \frac{1}{\sqrt{n!m!}}(c^+)^n(d^+)^m1, \quad n, m = 0, 1, \dots \tag{8.29} \]

Proof. We have

\[ [\mathcal{L}, (c^+)^2] = \mathcal{L}(c^+)^2 - (c^+)^2\mathcal{L} \]
\[ = (c^+\mathcal{L} - \lambda_1c^+)c^+ - c^+(\mathcal{L}c^+ + \lambda_1c^+) \]
\[ = -2\lambda_1(c^+)^2 \]

and similarly $[\mathcal{L}, (d^+)^2] = -2\lambda_2(d^+)^2$. A simple induction argument now shows that (see Exercise 8.3.3)

\[ [\mathcal{L}, (c^+)^n] = -n\lambda_1(c^+)^n \quad\text{and}\quad [\mathcal{L}, (d^+)^m] = -m\lambda_2(d^+)^m. \tag{8.30} \]

We use (8.30) to calculate

\[ \mathcal{L}(c^+)^n(d^+)^m1 = (c^+)^n\mathcal{L}(d^+)^m1 - n\lambda_1(c^+)^n(d^+)^m1 \]
\[ = (c^+)^n(d^+)^m\mathcal{L}1 - m\lambda_2(c^+)^n(d^+)^m1 - n\lambda_1(c^+)^n(d^+)^m1 \]
\[ = -n\lambda_1(c^+)^n(d^+)^m1 - m\lambda_2(c^+)^n(d^+)^m1 \]

from which (8.28) and (8.29) follow.

Exercise 8.3.3. Show that

\[ [\mathcal{L}, (c^\pm)^n] = \mp n\lambda_1(c^\pm)^n, \quad [\mathcal{L}, (d^\pm)^n] = \mp n\lambda_2(d^\pm)^n, \quad [c^-, (c^+)^n] = n(c^+)^{n-1}, \quad [d^-, (d^+)^n] = n(d^+)^{n-1}. \tag{8.31} \]

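Remark 8.3.4. In terms of the operators $a^\pm$, $b^\pm$ the eigenfunctions of $\mathcal{L}$ are

\[ \phi_{nm} = \sqrt{n!m!}\,\delta^{-\frac{n+m}{2}}\lambda_1^{n/2}\lambda_2^{m/2}\sum_{\ell=0}^n\sum_{k=0}^m\frac{1}{k!(m-k)!\,\ell!(n-\ell)!}\Bigl(\frac{\lambda_1}{\lambda_2}\Bigr)^{\frac{k-\ell}{2}}(a^+)^{n+m-k-\ell}(b^+)^{\ell+k}1. \]

The first few eigenfunctions are

\[ \phi_{00} = 1, \]
\[ \phi_{10} = \frac{\sqrt{\beta}\bigl(\sqrt{\lambda_1}p + \sqrt{\lambda_2}\omega_0q\bigr)}{\sqrt{\delta}}, \]
\[ \phi_{01} = \frac{\sqrt{\beta}\bigl(\sqrt{\lambda_2}p + \sqrt{\lambda_1}\omega_0q\bigr)}{\sqrt{\delta}}, \]
\[ \phi_{11} = \frac{-2\sqrt{\lambda_1\lambda_2} + \sqrt{\lambda_1\lambda_2}\,\beta p^2 + \beta\lambda_1\omega_0pq + \beta\lambda_2\omega_0pq + \sqrt{\lambda_1\lambda_2}\,\omega_0^2\beta q^2}{\delta}, \]
\[ \phi_{20} = \frac{-\lambda_1 + \beta p^2\lambda_1 + 2\sqrt{\lambda_1\lambda_2}\,\beta\omega_0pq - \lambda_2 + \omega_0^2\beta q^2\lambda_2}{\sqrt{2}\,\delta}, \]
\[ \phi_{02} = \frac{-\lambda_2 + \beta p^2\lambda_2 + 2\sqrt{\lambda_1\lambda_2}\,\beta\omega_0pq - \lambda_1 + \omega_0^2\beta q^2\lambda_1}{\sqrt{2}\,\delta}. \]

Notice that the eigenfunctions are not orthonormal.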

As we already know, the first eigenvalue, corresponding to the constant eigenfunction, is 0:

\[ \lambda_{00} = 0. \]

Notice that the operator $\mathcal{L}$ is not self-adjoint and consequently, we do not expect its eigenvalues to be real. Indeed, whether the eigenvalues are real or not depends on the sign of the discriminant $\Delta = \gamma^2 - 4\omega_0^2$. In the underdamped regime, $\gamma < 2\omega_0$, the eigenvalues are complex:

\[ \lambda_{nm} = \frac{1}{2}\gamma(n + m) + \frac{1}{2}i\sqrt{-\gamma^2 + 4\omega_0^2}\,(n - m), \quad \gamma < 2\omega_0. \]

This is to be expected, since in the underdamped regime the dynamics is dominated by the deterministic Hamiltonian dynamics that give rise to the antisymmetric Liouville operator. We set $\omega = \frac{1}{2}\sqrt{4\omega_0^2 - \gamma^2}$, i.e. $\delta = 2i\omega$. The eigenvalues can be written as

\[ \lambda_{nm} = \frac{\gamma}{2}(n + m) + i\omega(n - m). \]

In Figure 8.1 we present the first few eigenvalues of $\mathcal{L}$ in the underdamped regime. The eigenvalues are contained in a cone on the right half of the complex plane. The cone is determined by

\[ \lambda_{n0} = \frac{\gamma}{2}n + i\omega n \quad\text{and}\quad \lambda_{0m} = \frac{\gamma}{2}m - i\omega m. \]

The eigenvalues along the diagonal are real:

\[ \lambda_{nn} = \gamma n. \]

On the other hand, in the overdamped regime, $\gamma > 2\omega_0$, all eigenvalues are real:

\[ \lambda_{nm} = \frac{1}{2}\gamma(n + m) + \frac{1}{2}\sqrt{\gamma^2 - 4\omega_0^2}\,(n - m), \quad \gamma > 2\omega_0. \]

In fact, in the overdamped limit $\gamma \to +\infty$ (which we will study in Chapter ??), the eigenvalues of the generator $\mathcal{L}$ converge to the eigenvalues of the generator of the OU process:

\[ \lambda_{nm} = \gamma n + \frac{\omega_0^2}{\gamma}(m - n) + O(\gamma^{-3}). \]

This is consistent with the fact that in this limit the solution of the Langevin equation converges to the solution of the OU SDE. See Chapter ?? for details.
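The eigenvalue formula (8.28) and the two regimes are easy to explore numerically. The sketch below computes $\lambda_{nm} = n\lambda_1 + m\lambda_2$ with complex arithmetic, so that the underdamped case $\gamma < 2\omega_0$ is handled by the same code; the function name and the dictionary layout are illustrative assumptions.

```python
import cmath
import math

def langevin_eigenvalues(gamma, omega0, n_max):
    """Return {(n, m): n * lam1 + m * lam2} with lam_{1,2} = (gamma +/- delta) / 2."""
    delta = cmath.sqrt(gamma**2 - 4.0 * omega0**2)
    lam1 = (gamma + delta) / 2.0
    lam2 = (gamma - delta) / 2.0
    return {(n, m): n * lam1 + m * lam2
            for n in range(n_max + 1) for m in range(n_max + 1)}

if __name__ == "__main__":
    # Underdamped example: gamma = 1 and omega0 chosen so that omega = 1.
    under = langevin_eigenvalues(1.0, math.sqrt(1.25), 2)
    print(under[(1, 0)], under[(0, 1)], under[(1, 1)])
    # Overdamped example: all eigenvalues are real.
    over = langevin_eigenvalues(3.0, 1.0, 2)
    print(over[(1, 0)], over[(0, 1)])
```

In the underdamped example the eigenvalues $\lambda_{n0}$ and $\lambda_{0m}$ lie on the two edges of the cone, while the diagonal values $\lambda_{nn} = \gamma n$ are real.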

Figure 8.1: First few eigenvalues of L for γ = ω = 1.

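The eigenfunctions of $\mathcal{L}$ do not form an orthonormal basis in $L^2_\beta := L^2(\mathbb{R}^2, Z^{-1}e^{-\beta H})$ since $\mathcal{L}$ is not a selfadjoint operator. Using the eigenfunctions/eigenvalues of $\mathcal{L}$ we can easily calculate the eigenfunctions/eigenvalues of the $L^2_\beta$ adjoint of $\mathcal{L}$. From the calculations presented in Section 8.2 we know that the adjoint operator is

\[ \widehat{\mathcal{L}} := -\mathcal{A} + \gamma\mathcal{S} \tag{8.32} \]
\[ = \omega_0(b^+a^- - b^-a^+) - \gamma a^+a^- \tag{8.33} \]
\[ = -\lambda_1(c^-)^*(c^+)^* - \lambda_2(d^-)^*(d^+)^*, \tag{8.34} \]

where

\[ (c^+)^* = \frac{1}{\sqrt{\delta}}\bigl(\sqrt{\lambda_1}a^- + \sqrt{\lambda_2}b^-\bigr), \tag{8.35a} \]
\[ (c^-)^* = \frac{1}{\sqrt{\delta}}\bigl(\sqrt{\lambda_1}a^+ - \sqrt{\lambda_2}b^+\bigr), \tag{8.35b} \]
\[ (d^+)^* = \frac{1}{\sqrt{\delta}}\bigl(\sqrt{\lambda_2}a^- + \sqrt{\lambda_1}b^-\bigr), \tag{8.35c} \]
\[ (d^-)^* = \frac{1}{\sqrt{\delta}}\bigl(-\sqrt{\lambda_2}a^+ + \sqrt{\lambda_1}b^+\bigr). \tag{8.35d} \]

$\widehat{\mathcal{L}}$ has the same eigenvalues as $\mathcal{L}$:

\[ -\widehat{\mathcal{L}}\psi_{nm} = \lambda_{nm}\psi_{nm}, \]

where $\lambda_{nm}$ are given by (8.28). The eigenfunctions are

\[ \psi_{nm} = \frac{1}{\sqrt{n!m!}}\bigl((c^-)^*\bigr)^n\bigl((d^-)^*\bigr)^m1. \tag{8.36} \]

Proposition 8.3.5. The eigenfunctions of $\mathcal{L}$ and $\widehat{\mathcal{L}}$ satisfy the biorthonormality relation

\[ \int\!\!\int\phi_{nm}\psi_{\ell k}\rho_\beta\,dp\,dq = \delta_{n\ell}\delta_{mk}. \tag{8.37} \]

Proof. We will use formulas (8.31). Notice that using the third and fourth of these equations together with the fact that $c^-1 = d^-1 = 0$ we can conclude that (for $n \ge \ell$)

\[ (c^-)^\ell(c^+)^n1 = n(n-1)\cdots(n-\ell+1)(c^+)^{n-\ell}1. \tag{8.38} \]

We have

\[ \int\!\!\int\phi_{nm}\psi_{\ell k}\rho_\beta\,dp\,dq = \frac{1}{\sqrt{n!m!\ell!k!}}\int\!\!\int\bigl((c^+)^n(d^+)^m1\bigr)\bigl(((c^-)^*)^\ell((d^-)^*)^k1\bigr)\rho_\beta\,dp\,dq \]
\[ = \frac{n(n-1)\cdots(n-\ell+1)\,m(m-1)\cdots(m-k+1)}{\sqrt{n!m!\ell!k!}}\int\!\!\int(c^+)^{n-\ell}(d^+)^{m-k}1\,\rho_\beta\,dp\,dq \]
\[ = \delta_{n\ell}\delta_{mk}, \]

since all non-constant eigenfunctions average to 0 with respect to $\rho_\beta$.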

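From the eigenfunctions of $\widehat{\mathcal{L}}$ we can obtain the eigenfunctions of the Fokker-Planck operator. Using the formula (see equation (8.4))

\[ \mathcal{L}^*(f\rho_\beta) = \rho_\beta\widehat{\mathcal{L}}f \]

we immediately conclude that the Fokker-Planck operator has the same eigenvalues as those of $\mathcal{L}$ and $\widehat{\mathcal{L}}$. The eigenfunctions are

\[ \psi^*_{nm} = \rho_\beta\psi_{nm} = \rho_\beta\frac{1}{\sqrt{n!m!}}\bigl((c^-)^*\bigr)^n\bigl((d^-)^*\bigr)^m1. \tag{8.39} \]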

8.4 Asymptotic Limits for the Langevin Equation

There are very few SDEs/Fokker-Planck equations that can be solved explicitly. In most cases

we need to study the problem under investigation either approximately or numerically. In this

part of the course we will develop approximate methods for studying various stochastic systems

of practical interest. There are many problems of physical interest that can be analyzed using

techniques from perturbation theory and asymptotic analysis:

i. Small noise asymptotics at finite time intervals.

ii. Small noise asymptotics/large times (rare events): the theory of large deviations, escape from

a potential well, exit time problems.

iii. Small and large friction asymptotics for the Fokker-Planck equation: The Freidlin–Wentzell

(underdamped) and Smoluchowski (overdamped) limits.

iv. Large time asymptotics for the Langevin equation in a periodic potential: homogenization

and averaging.

v. Stochastic systems with two characteristic time scales: multiscale problems and methods.

We will study various asymptotic limits for the Langevin equation (we have set $m = 1$)

\[ \ddot{q} = -\nabla V(q) - \gamma\dot{q} + \sqrt{2\gamma\beta^{-1}}\,\dot{W}. \tag{8.40} \]

There are two parameters in the problem, the friction coefficient $\gamma$ and the inverse temperature $\beta$. We want to study the qualitative behavior of solutions to this equation (and to the corresponding Fokker-Planck equation). There are various asymptotic limits at which we can eliminate some of the variables of the equation and obtain a simpler equation for fewer variables. In the large temperature limit, $\beta \ll 1$, the dynamics of (8.40) is dominated by diffusion: the Langevin equation (8.40) can be approximated by free Brownian motion:

\[ \ddot{q} = \sqrt{2\gamma\beta^{-1}}\,\dot{W}. \]

The small temperature asymptotics, $\beta \gg 1$, is much more interesting and more subtle. It leads to exponential, Arrhenius type asymptotics for the reaction rate (in the case of a particle escaping from a potential well due to thermal noise) or the diffusion coefficient (in the case of a particle moving in a periodic potential in the presence of thermal noise)

\[ \kappa = \nu\exp(-\beta E_b), \tag{8.41} \]

where $\kappa$ can be either the reaction rate or the diffusion coefficient. The small temperature asymptotics will be studied later for the case of a bistable potential (reaction rate) and for the case of a periodic potential (diffusion coefficient).

Assuming that the temperature is fixed, the only parameter that is left is the friction coefficient $\gamma$. The large and small friction asymptotics can be expressed in terms of a slow/fast system of SDEs. In many applications (especially in biology) the friction coefficient is large: $\gamma \gg 1$. In this case the momentum is the fast variable which we can eliminate to obtain an equation for the position. This is the overdamped or Smoluchowski limit. In various problems in physics the friction coefficient is small: $\gamma \ll 1$. In this case the position is the fast variable whereas the energy is the slow variable. We can eliminate the position and obtain an equation for the energy. This is the underdamped or Freidlin-Wentzell limit. In both cases we have to look at sufficiently long time scales.

We rescale the solution to (8.40):

\[ q^\gamma(t) = \lambda_\gamma q(t/\mu_\gamma). \]

This rescaled process satisfies the equation

\[ \ddot{q}^\gamma = -\frac{\lambda_\gamma}{\mu_\gamma^2}\partial_qV(q^\gamma/\lambda_\gamma) - \frac{\gamma}{\mu_\gamma}\dot{q}^\gamma + \sqrt{2\gamma\lambda_\gamma^2\mu_\gamma^{-3}\beta^{-1}}\,\dot{W}. \tag{8.42} \]

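Different choices for these two parameters lead to the overdamped and underdamped limits.

First choice: $\lambda_\gamma = 1$, $\mu_\gamma = \gamma^{-1}$, $\gamma \gg 1$. In this case equation (8.42) becomes

\[ \gamma^{-2}\ddot{q}^\gamma = -\partial_qV(q^\gamma) - \dot{q}^\gamma + \sqrt{2\beta^{-1}}\,\dot{W}. \tag{8.43} \]

Under this scaling, the interesting limit is the overdamped limit, $\gamma \gg 1$. We will see later that in the limit as $\gamma \to +\infty$ the solution to (8.43) can be approximated by the solution to

\[ \dot{q} = -\partial_qV + \sqrt{2\beta^{-1}}\,\dot{W}. \]

Second choice: $\lambda_\gamma = 1$, $\mu_\gamma = \gamma$, $\gamma \ll 1$:

\[ \ddot{q}^\gamma = -\gamma^{-2}\nabla V(q^\gamma) - \dot{q}^\gamma + \sqrt{2\gamma^{-2}\beta^{-1}}\,\dot{W}. \tag{8.44} \]

Under this scaling the interesting limit is the underdamped limit, $\gamma \ll 1$. We will see later that in the limit as $\gamma \to 0$ the energy of the solution to (8.44) converges to a stochastic process on a graph.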

8.4.1 The Overdamped Limit

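We consider the rescaled Langevin equation (8.43):

\[ \varepsilon^2\ddot{q}^\gamma(t) = -\nabla V(q^\gamma(t)) - \dot{q}^\gamma(t) + \sqrt{2\beta^{-1}}\,\dot{W}(t), \tag{8.45} \]

where we have set $\varepsilon^{-1} = \gamma$, since we are interested in the limit $\gamma \to \infty$, i.e. $\varepsilon \to 0$. We will show that, in the limit as $\varepsilon \to 0$, $q^\gamma(t)$, the solution of the Langevin equation (8.45), converges to $q(t)$, the solution of the Smoluchowski equation

\[ \dot{q} = -\nabla V + \sqrt{2\beta^{-1}}\,\dot{W}. \tag{8.46} \]

We write (8.45) as a system of SDEs:

\[ \dot{q} = \frac{1}{\varepsilon}p, \tag{8.47} \]
\[ \dot{p} = -\frac{1}{\varepsilon}\nabla V(q) - \frac{1}{\varepsilon^2}p + \sqrt{\frac{2}{\beta\varepsilon^2}}\,\dot{W}. \tag{8.48} \]

This system of SDEs defines a Markov process in phase space. Its generator is

\[ \mathcal{L}^\varepsilon = \frac{1}{\varepsilon^2}\bigl(-p\cdot\nabla_p + \beta^{-1}\Delta_p\bigr) + \frac{1}{\varepsilon}\bigl(p\cdot\nabla_q - \nabla_qV\cdot\nabla_p\bigr) =: \frac{1}{\varepsilon^2}\mathcal{L}_0 + \frac{1}{\varepsilon}\mathcal{L}_1. \]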

This is a singularly perturbed differential operator. We will derive the Smoluchowski equation (8.46) using a pathwise technique, as well as by analyzing the corresponding Kolmogorov equations.

We apply Ito's formula to $p$:

\[ dp(t) = \mathcal{L}^\varepsilon p(t)\,dt + \frac{1}{\varepsilon}\sqrt{2\beta^{-1}}\,\partial_pp(t)\,dW = -\frac{1}{\varepsilon^2}p(t)\,dt - \frac{1}{\varepsilon}\nabla_qV(q(t))\,dt + \frac{1}{\varepsilon}\sqrt{2\beta^{-1}}\,dW. \]

Consequently:

\[ \frac{1}{\varepsilon}\int_0^tp(s)\,ds = -\int_0^t\nabla_qV(q(s))\,ds + \sqrt{2\beta^{-1}}\,W(t) + O(\varepsilon). \]

From equation (8.47) we have that

\[ q(t) = q(0) + \frac{1}{\varepsilon}\int_0^tp(s)\,ds. \]

Combining the above two equations we deduce

\[ q(t) = q(0) - \int_0^t\nabla_qV(q(s))\,ds + \sqrt{2\beta^{-1}}\,W(t) + O(\varepsilon) \]

from which (8.46) follows.
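The pathwise argument can be illustrated by driving the system (8.47)-(8.48) and the Smoluchowski equation (8.46) with the same Brownian increments and monitoring the gap between the two positions. The sketch below is a plain Euler-Maruyama discretization for the harmonic potential $V(q) = q^2/2$; the potential, the step size, the seed and all names are illustrative assumptions, and the explicit scheme needs `dt` much smaller than `eps**2`.

```python
import math
import random

def simulate_gap(eps, beta=1.0, T=1.0, dt=1e-4, seed=0):
    """Return (q, Q, max_gap): Langevin position q, Smoluchowski position Q at
    time T, and the largest |q - Q| along the path, for shared noise."""
    rng = random.Random(seed)
    grad_V = lambda x: x          # harmonic potential V(x) = x^2 / 2 (assumed)
    noise = math.sqrt(2.0 / beta)
    q, p, Q = 1.0, 0.0, 1.0
    max_gap = 0.0
    for _ in range(int(round(T / dt))):
        dW = rng.gauss(0.0, math.sqrt(dt))
        # Euler-Maruyama step for (8.47)-(8.48).
        q_new = q + (p / eps) * dt
        p_new = p + (-grad_V(q) / eps - p / eps**2) * dt + (noise / eps) * dW
        # Euler-Maruyama step for (8.46) with the same increment dW.
        Q = Q - grad_V(Q) * dt + noise * dW
        q, p = q_new, p_new
        max_gap = max(max_gap, abs(q - Q))
    return q, Q, max_gap

if __name__ == "__main__":
    for eps in (0.2, 0.1, 0.05):
        print(eps, simulate_gap(eps)[2])
```

For this linear example the gap is roughly $\varepsilon\,|p(t)|$, so it is expected to shrink as $\varepsilon$ decreases, although a single sample path gives only a rough indication of the rate.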

Notice that in this derivation we assumed that

\[ E|p(t)|^2 \le C. \]

This estimate is true, under appropriate assumptions on the potential $V(q)$ and on the initial conditions. In fact, we can prove a pathwise approximation result:

\[ \Bigl(E\sup_{t\in[0,T]}|q^\gamma(t) - q(t)|^p\Bigr)^{1/p} \le C\varepsilon^{2-\kappa}, \]

where $\kappa > 0$ is arbitrarily small (it accounts for logarithmic corrections).

The pathwise derivation of the Smoluchowski equation implies that the solution of the Fokker-Planck equation corresponding to the Langevin equation (8.45) converges (in some appropriate sense to be explained below) to the solution of the Fokker-Planck equation corresponding to the Smoluchowski equation (8.46). It is important in various applications to calculate corrections to the limiting Fokker-Planck equation. We can accomplish this by analyzing the Fokker-Planck equation

for (8.45) using singular perturbation theory. We will consider the problem in one dimension. This

mainly to simplify the notation. The multi–dimensional problem can be treated in a very similar

way.

The Fokker–Planck equation associated to equations (8.47) and (8.48) is

∂ρ

∂t= L∗ρ

=1

ε(−p∂qρ+ ∂qV (q)∂pρ) +

1

ε2

(∂p(pρ) + β−1∂2

pρ)

=:

(1

ε2L∗0 +

1

εL∗1)ρ. (8.49)

The invariant distribution of the Markov process q, p, if it exists, is

ρβ(p, q) =1

Ze−βH(p,q), Z =

∫R2

e−βH(p,q) dpdq,

where H(p, q) = 12p2 + V (q). We define the function f(p,q,t) through

ρ(p, q, t) = f(p, q, t)ρβ(p, q). (8.50)

Proposition 8.4.1. The function f(p, q, t) defined in (8.50) satisfies the equation

∂f

∂t=

[1

ε2

(−p∂q + β−1∂2

p

)− 1

ε(p∂q − ∂qV (q)∂p)

]f

=:

(1

ε2L0 −

1

εL1

)f. (8.51)

Remark 8.4.2. This is "almost" the backward Kolmogorov equation, with the difference that we have $-L_1$ instead of $L_1$. This is related to the fact that $L_0$ is a symmetric operator in $L^2(\mathbb{R}^2; Z^{-1}e^{-\beta H(p,q)})$, whereas $L_1$ is antisymmetric.

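Proof. We write $\rho_0 := \rho_\beta$ and note that $L_0^*\rho_0 = 0$ and $L_1^*\rho_0 = 0$. We use this to calculate:

$$\begin{aligned}
L_0^*\rho = L_0^*(f\rho_0) &= \partial_p(p f\rho_0) + \beta^{-1}\partial_p^2(f\rho_0) \\
&= \rho_0\,p\,\partial_p f + \rho_0\beta^{-1}\partial_p^2 f + f L_0^*\rho_0 + 2\beta^{-1}\partial_p f\,\partial_p\rho_0 \\
&= \big(-p\,\partial_p f + \beta^{-1}\partial_p^2 f\big)\rho_0 = \rho_0 L_0 f,
\end{aligned}$$

where the last line uses $\partial_p\rho_0 = -\beta p\rho_0$. Similarly,

$$\begin{aligned}
L_1^*\rho = L_1^*(f\rho_0) &= (-p\,\partial_q + \partial_q V\,\partial_p)(f\rho_0) \\
&= \rho_0\,(-p\,\partial_q f + \partial_q V\,\partial_p f) = -\rho_0 L_1 f.
\end{aligned}$$

Consequently, the Fokker–Planck equation (8.49) becomes

$$\rho_0\frac{\partial f}{\partial t} = \rho_0\Big(\frac{1}{\varepsilon^2}L_0 f - \frac{1}{\varepsilon}L_1 f\Big),$$

from which the claim follows.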

We will assume that the initial conditions for (8.51) depend only on q:

f(p, q, 0) = fic(q). (8.52)

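Another way of stating this assumption is the following: let $H = L^2(\mathbb{R}^{2d}; \rho_\beta(p,q))$ and define the projection operator $P : H \mapsto L^2(\mathbb{R}^d; \rho_\beta(q))$, with $\rho_\beta(q) = \frac{1}{Z_q}e^{-\beta V(q)}$, $Z_q = \int_{\mathbb{R}^d} e^{-\beta V(q)}\,dq$:

$$P\,\cdot := \frac{1}{Z_p}\int_{\mathbb{R}^d} \cdot\; e^{-\beta\frac{|p|^2}{2}}\,dp, \tag{8.53}$$

with $Z_p := \int_{\mathbb{R}^d} e^{-\beta|p|^2/2}\,dp$. Then, assumption (8.52) can be written as

$$P f_{ic} = f_{ic}.$$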

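We look for a solution to (8.51) in the form of a truncated power series in $\varepsilon$:

$$f(p,q,t) = \sum_{n=0}^{N}\varepsilon^n f_n(p,q,t). \tag{8.54}$$

We substitute this expansion into eqn. (8.51) to obtain the following system of equations:

$$\begin{aligned}
L_0 f_0 &= 0, && \text{(8.55a)}\\
-L_0 f_1 &= -L_1 f_0, && \text{(8.55b)}\\
-L_0 f_2 &= -L_1 f_1 - \frac{\partial f_0}{\partial t}, && \text{(8.55c)}\\
-L_0 f_n &= -L_1 f_{n-1} - \frac{\partial f_{n-2}}{\partial t}, \quad n = 3, 4, \dots, N. && \text{(8.55d)}
\end{aligned}$$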

The null space of L0 consists of constants in p. Consequently, from equation (8.55a) we conclude

that

f0 = f(q, t).

Now we can calculate the right hand side of equation (8.55b):

L1f0 = p∂qf.

Equation (8.55b) becomes:

$$L_0 f_1 = p\,\partial_q f.$$

The right hand side of this equation is orthogonal to $\mathcal{N}(L_0^*)$ and consequently there exists a unique solution. We obtain this solution using separation of variables:

$$f_1 = -p\,\partial_q f + \psi_1(q,t).$$

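Now we can calculate the RHS of equation (8.55c). We need to calculate $L_1 f_1$:

$$-L_1 f_1 = \big(p\,\partial_q - \partial_q V\,\partial_p\big)\big(p\,\partial_q f - \psi_1(q,t)\big) = p^2\partial_q^2 f - p\,\partial_q\psi_1 - \partial_q V\,\partial_q f.$$

The solvability condition for (8.55c) is

$$\int_{\mathbb{R}}\Big(-L_1 f_1 - \frac{\partial f_0}{\partial t}\Big)\rho_{OU}(p)\,dp = 0,$$

from which we obtain the backward Kolmogorov equation corresponding to the Smoluchowski SDE:

$$\frac{\partial f}{\partial t} = -\partial_q V\,\partial_q f + \beta^{-1}\partial_q^2 f, \tag{8.56}$$

together with the initial condition (8.52).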
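As a worked check of this step, the averaging can be carried out explicitly. This is a sketch under the assumption, implicit in the text, that $\rho_{OU}(p) = (2\pi\beta^{-1})^{-1/2}e^{-\beta p^2/2}$ is the Gaussian invariant density of the OU process, so that its first two moments are known:

$$\int_{\mathbb{R}} p\,\rho_{OU}(p)\,dp = 0, \qquad \int_{\mathbb{R}} p^2\rho_{OU}(p)\,dp = \beta^{-1}.$$

Applying these to the expression for $-L_1 f_1$ term by term gives

$$\int_{\mathbb{R}}\big(p^2\partial_q^2 f - p\,\partial_q\psi_1 - \partial_q V\,\partial_q f\big)\rho_{OU}(p)\,dp = \beta^{-1}\partial_q^2 f - \partial_q V\,\partial_q f,$$

and since $f_0 = f(q,t)$ does not depend on $p$, the solvability condition reduces to $\beta^{-1}\partial_q^2 f - \partial_q V\,\partial_q f - \partial_t f = 0$, which is (8.56).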

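Now we solve the equation for $f_2$. We use (8.56) to write (8.55c) in the form

$$L_0 f_2 = \big(\beta^{-1} - p^2\big)\partial_q^2 f + p\,\partial_q\psi_1.$$

The solution of this equation is

$$f_2(p,q,t) = \frac{1}{2}\partial_q^2 f(q,t)\,p^2 - \partial_q\psi_1(q,t)\,p + \psi_2(q,t).$$

Now we calculate the right hand side of the equation for $f_3$, equation (8.55d) with $n = 3$. First we calculate

$$L_1 f_2 = \frac{1}{2}p^3\partial_q^3 f - p^2\partial_q^2\psi_1 + p\,\partial_q\psi_2 - \partial_q V\,\partial_q^2 f\,p + \partial_q V\,\partial_q\psi_1.$$

The solvability condition is

$$\int_{\mathbb{R}}\Big(\frac{\partial\psi_1}{\partial t} + L_1 f_2\Big)\rho_{OU}(p)\,dp = 0.$$

This leads to the equation

$$\frac{\partial\psi_1}{\partial t} = -\partial_q V\,\partial_q\psi_1 + \beta^{-1}\partial_q^2\psi_1,$$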

together with the initial condition $\psi_1(q,0) = 0$. From the calculations presented in the proof of Theorem 6.5.5, and using Poincaré's inequality for the measure $\frac{1}{Z_q}e^{-\beta V(q)}$, we deduce that

$$\frac{1}{2}\frac{d}{dt}\|\psi_1\|^2 \leq -C\|\psi_1\|^2.$$

We use Gronwall's inequality now to conclude that

$$\psi_1 \equiv 0.$$

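Putting everything together we obtain the first two terms in the $\varepsilon$-expansion of the Fokker–Planck equation (8.51):

$$\rho(p,q,t) = Z^{-1}e^{-\beta H(p,q)}\big(f + \varepsilon(-p\,\partial_q f) + \mathcal{O}(\varepsilon^2)\big),$$

where $f$ is the solution of (8.56). Notice that we can rewrite the leading order term of the expansion in the form

$$\rho(p,q,t) = (2\pi\beta^{-1})^{-\frac{1}{2}}e^{-\beta p^2/2}\rho_V(q,t) + \mathcal{O}(\varepsilon),$$

where $\rho_V = Z_q^{-1}e^{-\beta V(q)}f$ is the solution of the Smoluchowski Fokker-Planck equation

$$\frac{\partial\rho_V}{\partial t} = \partial_q(\partial_q V\rho_V) + \beta^{-1}\partial_q^2\rho_V.$$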

It is possible to expand the $n$-th term in the expansion (8.54) in terms of Hermite functions (the eigenfunctions of the generator of the OU process)

$$f_n(p,q,t) = \sum_{k=0}^{n} f_{nk}(q,t)\phi_k(p), \tag{8.57}$$

where $\phi_k(p)$ is the $k$-th eigenfunction of $L_0$:

$$-L_0\phi_k = \lambda_k\phi_k.$$
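For orientation, the lowest eigenpairs can be verified by hand. The following is a sketch that assumes the normalization $\int\phi_k^2\,\rho_{OU}\,dp = 1$ with $\rho_{OU}\propto e^{-\beta p^2/2}$, so that the $\phi_k$ are rescaled Hermite polynomials and $\lambda_k = k$; this normalization is an assumption made here and is not stated explicitly in the text:

$$\phi_0(p) = 1, \qquad \phi_1(p) = \sqrt{\beta}\,p, \qquad \phi_2(p) = \frac{\beta p^2 - 1}{\sqrt{2}}.$$

With $L_0 = -p\,\partial_p + \beta^{-1}\partial_p^2$ one finds

$$L_0\phi_1 = -\sqrt{\beta}\,p = -\phi_1, \qquad L_0\phi_2 = \frac{-2\beta p^2 + 2}{\sqrt{2}} = -2\phi_2,$$

so that $\lambda_1 = 1$ and $\lambda_2 = 2$, as expected for the OU generator.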

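We can obtain the following system of equations ($\mathcal{L} = \beta^{-1}\partial_q - \partial_q V$):

$$\begin{aligned}
\mathcal{L}f_{n1} &= 0,\\
\sqrt{\frac{k+1}{\beta^{-1}}}\,\mathcal{L}f_{n,k+1} + \sqrt{k\beta^{-1}}\,\partial_q f_{n,k-1} &= -k f_{n+1,k}, \quad k = 1, 2, \dots, n-1,\\
\sqrt{n\beta^{-1}}\,\partial_q f_{n,n-1} &= -n f_{n+1,n},\\
\sqrt{(n+1)\beta^{-1}}\,\partial_q f_{n,n} &= -(n+1) f_{n+1,n+1}.
\end{aligned}$$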

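Using this method we can obtain the first three terms in the expansion:

$$\begin{aligned}
\rho(p,q,t) = \rho_0(p,q)\bigg( & f + \varepsilon\big(-\sqrt{\beta^{-1}}\,\partial_q f\,\phi_1\big) + \varepsilon^2\Big(\frac{\beta^{-1}}{\sqrt{2}}\partial_q^2 f\,\phi_2 + f_{20}\Big) \\
& + \varepsilon^3\Big(-\sqrt{\frac{\beta^{-3}}{3!}}\,\partial_q^3 f\,\phi_3 + \big(-\sqrt{\beta^{-1}}\,\mathcal{L}\partial_q^2 f - \sqrt{\beta^{-1}}\,\partial_q f_{20}\big)\phi_1\Big)\bigg) + \mathcal{O}(\varepsilon^4).
\end{aligned}$$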

8.4.2 The Underdamped Limit

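Consider now the rescaling $\lambda_{\gamma,\varepsilon} = 1$, $\mu_{\gamma,\varepsilon} = \gamma$. The Langevin equation becomes

$$\ddot{q}^\gamma = -\gamma^{-2}\nabla V(q^\gamma) - \dot{q}^\gamma + \sqrt{2\gamma^{-2}\beta^{-1}}\,\dot{W}. \tag{8.58}$$

We write equation (8.58) as a system of two equations

$$\dot{q}^\gamma = \gamma^{-1}p^\gamma, \qquad \dot{p}^\gamma = -\gamma^{-1}V'(q^\gamma) - p^\gamma + \sqrt{2\beta^{-1}}\,\dot{W}.$$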

This is the equation for an $\mathcal{O}(1/\gamma)$ Hamiltonian system perturbed by $\mathcal{O}(1)$ noise. We expect that, to leading order, the energy is conserved, since it is conserved for the Hamiltonian system. We apply Itô's formula to the Hamiltonian of the system to obtain

$$\dot{H} = \big(\beta^{-1} - p^2\big) + \sqrt{2\beta^{-1}p^2}\,\dot{W}$$

with $p^2 = p^2(H,q) = 2(H - V(q))$.

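Thus, in order to study the $\gamma\to 0$ limit we need to analyze the following fast/slow system of SDEs

$$\begin{aligned}
\dot{H} &= \big(\beta^{-1} - p^2\big) + \sqrt{2\beta^{-1}p^2}\,\dot{W}, && \text{(8.59a)}\\
\dot{p}^\gamma &= -\gamma^{-1}V'(q^\gamma) - p^\gamma + \sqrt{2\beta^{-1}}\,\dot{W}. && \text{(8.59b)}
\end{aligned}$$

The Hamiltonian is the slow variable, whereas the momentum (or position) is the fast variable. Assuming that we can average over the Hamiltonian dynamics, we obtain the limiting SDE for the Hamiltonian:

$$\dot{H} = \big(\beta^{-1} - \langle p^2\rangle\big) + \sqrt{2\beta^{-1}\langle p^2\rangle}\,\dot{W}. \tag{8.60}$$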

The limiting SDE lives on the graph associated with the Hamiltonian system. The domain of definition of the limiting Markov process is defined through appropriate boundary conditions (the gluing conditions) at the interior vertices of the graph.

We identify all points belonging to the same connected component of a level curve $\{x : H(x) = H\}$, $x = (q,p)$. Each point on the edges of the graph corresponds to a trajectory. Interior vertices correspond to separatrices. Let $I_i$, $i = 1,\dots,d$ be the edges of the graph. Then $(i, H)$ defines a global coordinate system on the graph.

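We will study the small $\gamma$ asymptotics by analyzing the corresponding backward Kolmogorov equation using singular perturbation theory. The generator of the process $\{q^\gamma, p^\gamma\}$ is

$$\begin{aligned}
L^\gamma &= \gamma^{-1}(p\,\partial_q - \partial_q V\,\partial_p) - p\,\partial_p + \beta^{-1}\partial_p^2 \\
&= \gamma^{-1}L_0 + L_1.
\end{aligned}$$

Let $u^\gamma = \mathbb{E}\big(f(p^\gamma(p,q;t), q^\gamma(p,q;t))\big)$. It satisfies the backward Kolmogorov equation associated to the process $\{q^\gamma, p^\gamma\}$:

$$\frac{\partial u^\gamma}{\partial t} = \Big(\frac{1}{\gamma}L_0 + L_1\Big)u^\gamma. \tag{8.61}$$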

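We look for a solution in the form of a power series expansion in $\gamma$:

$$u^\gamma = u_0 + \gamma u_1 + \gamma^2 u_2 + \dots$$

We substitute this ansatz into (8.61) and equate equal powers in $\gamma$ to obtain the following sequence of equations:

$$\begin{aligned}
L_0 u_0 &= 0, && \text{(8.62a)}\\
L_0 u_1 &= -L_1 u_0 + \frac{\partial u_0}{\partial t}, && \text{(8.62b)}\\
L_0 u_2 &= -L_1 u_1 + \frac{\partial u_1}{\partial t}, && \text{(8.62c)}\\
&\;\;\vdots
\end{aligned}$$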

Notice that the operator $L_0$ is the backward Liouville operator of the Hamiltonian system with Hamiltonian

$$H = \frac{1}{2}p^2 + V(q).$$

We assume that there are no integrals of motion other than the Hamiltonian. This means that the null space of $L_0$ consists of functions of the Hamiltonian:

$$\mathcal{N}(L_0) = \big\{\text{functions of } H\big\}. \tag{8.63}$$

Let us now analyze equations (8.62). We start with (8.62a); eqn. (8.63) implies that u0 depends on

q, p through the Hamiltonian function H:

u0 = u(H(p, q), t) (8.64)

Now we proceed with (8.62b). For this we need to find the solvability condition for equations of

the form

L0u = f (8.65)

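We multiply it by an arbitrary smooth function of $H(p,q)$, integrate over $\mathbb{R}^2$ and use the skew-symmetry of the Liouville operator $L_0$ to deduce:$^{1}$

$$\begin{aligned}
\int_{\mathbb{R}^2} L_0 u\,F(H(p,q))\,dp\,dq &= \int_{\mathbb{R}^2} u\,L_0^*F(H(p,q))\,dp\,dq \\
&= \int_{\mathbb{R}^2} u\,\big(-L_0 F(H(p,q))\big)\,dp\,dq \\
&= 0, \qquad \forall F\in C_b^\infty(\mathbb{R}).
\end{aligned}$$

This implies that the solvability condition for equation (8.65) is that

$$\int_{\mathbb{R}^2} f(p,q)F(H(p,q))\,dp\,dq = 0, \qquad \forall F\in C_b^\infty(\mathbb{R}). \tag{8.66}$$

We use the solvability condition in (8.62b) to obtain that

$$\int_{\mathbb{R}^2}\Big(L_1 u_0 - \frac{\partial u_0}{\partial t}\Big)F(H(p,q))\,dp\,dq = 0. \tag{8.67}$$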

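To proceed, we need to understand how $L_1$ acts on functions of $H(p,q)$. Let $\phi = \phi(H(p,q))$. We have that

$$\frac{\partial\phi}{\partial p} = \frac{\partial H}{\partial p}\frac{\partial\phi}{\partial H} = p\,\frac{\partial\phi}{\partial H}$$

and

$$\frac{\partial^2\phi}{\partial p^2} = \frac{\partial}{\partial p}\Big(p\,\frac{\partial\phi}{\partial H}\Big) = \frac{\partial\phi}{\partial H} + p^2\frac{\partial^2\phi}{\partial H^2}.$$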

The above calculations imply that, when $L_1$ acts on functions $\phi = \phi(H(p,q))$, it becomes

$$L_1 = (\beta^{-1} - p^2)\,\partial_H + \beta^{-1}p^2\,\partial_H^2, \tag{8.68}$$

where

$$p^2 = p^2(H,q) = 2(H - V(q)).$$

$^{1}$ We assume that both $u_1$ and $F$ decay to 0 as $|p|\to\infty$ to justify the integration by parts that follows.

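We want to change variables in the integral (8.67) and go from $(p,q)$ to $(H,q)$. The Jacobian of the transformation is:

$$\frac{\partial(p,q)}{\partial(H,q)} = \begin{vmatrix}\dfrac{\partial p}{\partial H} & \dfrac{\partial p}{\partial q}\\[6pt] \dfrac{\partial q}{\partial H} & \dfrac{\partial q}{\partial q}\end{vmatrix} = \frac{\partial p}{\partial H} = \frac{1}{p(H,q)}.$$

We use this, together with (8.68), to rewrite eqn. (8.67) as

$$\int\!\!\int\Big(-\frac{\partial u}{\partial t} + \big[(\beta^{-1} - p^2)\,\partial_H + \beta^{-1}p^2\,\partial_H^2\big]u\Big)F(H)\,p^{-1}(H,q)\,dH\,dq = 0.$$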

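We introduce the notation

$$\langle\cdot\rangle := \int\cdot\;dq.$$

The integration over $q$ can be performed "explicitly":

$$\int\Big[-\frac{\partial u}{\partial t}\langle p^{-1}\rangle + \Big(\big(\beta^{-1}\langle p^{-1}\rangle - \langle p\rangle\big)\partial_H + \beta^{-1}\langle p\rangle\,\partial_H^2\Big)u\Big]F(H)\,dH = 0.$$

This equation should be valid for every smooth function $F(H)$, and this requirement leads to the differential equation

$$\langle p^{-1}\rangle\frac{\partial u}{\partial t} = \big(\beta^{-1}\langle p^{-1}\rangle - \langle p\rangle\big)\partial_H u + \langle p\rangle\beta^{-1}\partial_H^2 u,$$

or,

$$\frac{\partial u}{\partial t} = \big(\beta^{-1} - \langle p^{-1}\rangle^{-1}\langle p\rangle\big)\partial_H u + \langle p^{-1}\rangle^{-1}\langle p\rangle\beta^{-1}\partial_H^2 u.$$

Thus, we have obtained the limiting backward Kolmogorov equation for the energy, which is the "slow variable". From this equation we can read off the limiting SDE for the Hamiltonian:

$$\dot{H} = b(H) + \sigma(H)\dot{W}, \tag{8.69}$$

where

$$b(H) = \beta^{-1} - \langle p^{-1}\rangle^{-1}\langle p\rangle, \qquad \sigma(H) = \sqrt{2\beta^{-1}\langle p^{-1}\rangle^{-1}\langle p\rangle},$$

so that $\sigma^2/2$ equals the coefficient of $\partial_H^2 u$ in the backward equation.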

Notice that the noise that appears in the limiting equation (8.69) is multiplicative, contrary to

the additive noise in the Langevin equation.

As is well known from classical mechanics, the action and frequency are defined as

$$I(E) = \int p(q,E)\,dq$$

and

$$\omega(E) = 2\pi\Big(\frac{dI}{dE}\Big)^{-1},$$

respectively. Using the action and the frequency we can write the limiting Fokker–Planck equation

for the distribution function of the energy in a very compact form.
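To make the definitions of the action and the frequency concrete, here is a minimal numerical sketch (not part of the original text) for the harmonic potential $V(q) = q^2/2$, for which the closed-orbit action is $I(E) = 2\pi E$ and $\omega(E) = 1$. It assumes that the integral defining $I(E)$ runs over one full closed orbit, i.e. twice the integral of $p = \sqrt{2(E - V(q))}$ between the turning points. The function names and the midpoint quadrature are illustrative choices.

```python
import math


def action(E, V, q_left, q_right, n=20000):
    """Closed-orbit action I(E) = 2 * int_{q_left}^{q_right} sqrt(2 (E - V(q))) dq.

    Assumption: the orbit is closed and symmetric in p, so the loop integral
    is twice the integral over the branch p > 0 (midpoint rule).
    """
    h = (q_right - q_left) / n
    total = 0.0
    for i in range(n):
        q = q_left + (i + 0.5) * h
        total += math.sqrt(max(2.0 * (E - V(q)), 0.0))
    return 2.0 * h * total


def harmonic(q):
    """Illustrative potential V(q) = q^2 / 2 (hypothetical test case)."""
    return 0.5 * q * q


def harmonic_action(E):
    """Action of the harmonic oscillator; the turning points are +-sqrt(2E)."""
    a = math.sqrt(2.0 * E)
    return action(E, harmonic, -a, a)


def frequency(action_fn, E, dE=1e-3):
    """omega(E) = 2 pi / (dI/dE), with dI/dE from a central difference."""
    dI_dE = (action_fn(E + dE) - action_fn(E - dE)) / (2.0 * dE)
    return 2.0 * math.pi / dI_dE


if __name__ == "__main__":
    for E in (0.5, 1.0, 2.0):
        print(E, harmonic_action(E), frequency(harmonic_action, E))
```

For other potentials only the turning points and `V` need to change; near a separatrix the period diverges, so the finite difference for $dI/dE$ would need a smaller step and a finer grid.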

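Theorem 8.4.3. The limiting Fokker–Planck equation for the energy distribution function $\rho(E,t)$ is

$$\frac{\partial\rho}{\partial t} = \frac{\partial}{\partial E}\Big(I(E)\Big(1 + \beta^{-1}\frac{\partial}{\partial E}\Big)\Big(\frac{\omega(E)\rho}{2\pi}\Big)\Big). \tag{8.70}$$

Proof. We notice that

$$\frac{dI}{dE} = \int\frac{\partial p}{\partial E}\,dq = \int p^{-1}\,dq$$

and consequently

$$\langle p^{-1}\rangle^{-1} = \frac{\omega(E)}{2\pi}.$$

Hence, since $\langle p\rangle = I(E)$, the limiting Fokker–Planck equation can be written as

$$\begin{aligned}
\frac{\partial\rho}{\partial t} &= -\frac{\partial}{\partial E}\Big(\Big(\beta^{-1} - \frac{I(E)\omega(E)}{2\pi}\Big)\rho\Big) + \beta^{-1}\frac{\partial^2}{\partial E^2}\Big(\frac{I\omega}{2\pi}\rho\Big) \\
&= -\beta^{-1}\frac{\partial\rho}{\partial E} + \frac{\partial}{\partial E}\Big(\frac{I\omega}{2\pi}\rho\Big) + \beta^{-1}\frac{\partial}{\partial E}\Big(\frac{dI}{dE}\frac{\omega\rho}{2\pi}\Big) + \beta^{-1}\frac{\partial}{\partial E}\Big(I\frac{\partial}{\partial E}\Big(\frac{\omega\rho}{2\pi}\Big)\Big) \\
&= \frac{\partial}{\partial E}\Big(\frac{I\omega}{2\pi}\rho\Big) + \beta^{-1}\frac{\partial}{\partial E}\Big(I\frac{\partial}{\partial E}\Big(\frac{\omega\rho}{2\pi}\Big)\Big) \\
&= \frac{\partial}{\partial E}\Big(I(E)\Big(1 + \beta^{-1}\frac{\partial}{\partial E}\Big)\Big(\frac{\omega(E)\rho}{2\pi}\Big)\Big),
\end{aligned}$$

where the third line uses $\frac{dI}{dE}\frac{\omega}{2\pi} = 1$. This is precisely equation (8.70).

Remarks 8.4.4.

i. We emphasize that the above formal procedure does not provide us with the boundary conditions for the limiting Fokker–Planck equation. We will discuss this issue in the next section.

ii. If we rescale back to the original time-scale we obtain the equation

$$\frac{\partial\rho}{\partial t} = \gamma\frac{\partial}{\partial E}\Big(I(E)\Big(1 + \beta^{-1}\frac{\partial}{\partial E}\Big)\Big(\frac{\omega(E)\rho}{2\pi}\Big)\Big). \tag{8.71}$$

We will use this equation later on to calculate the rate of escape from a potential barrier in the energy-diffusion-limited regime.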


8.5 Brownian Motion in Periodic Potentials

Basic model

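$$m\ddot{x} = -\gamma\dot{x}(t) - \nabla V(x(t), f(t)) + y(t) + \sqrt{2\gamma k_B T}\,\xi(t). \tag{8.72}$$

Goal: Calculate the effective drift and the effective diffusion tensor

$$U_{\mathrm{eff}} = \lim_{t\to\infty}\frac{\langle x(t)\rangle}{t} \tag{8.73}$$

and

$$D_{\mathrm{eff}} = \lim_{t\to\infty}\frac{\big\langle(x(t) - \langle x(t)\rangle)\otimes(x(t) - \langle x(t)\rangle)\big\rangle}{2t}. \tag{8.74}$$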

8.5.1 The Langevin equation in a periodic potential

We start by studying the underdamped dynamics of a Brownian particle x(t) ∈ Rd moving in a

smooth, periodic potential.

$$\ddot{x} = -\nabla V(x(t)) - \gamma\dot{x}(t) + \sqrt{2\gamma k_B T}\,\xi(t), \tag{8.75}$$

where $\gamma$ is the friction coefficient, $k_B$ the Boltzmann constant and $T$ denotes the temperature. $\xi(t)$ stands for the standard $d$-dimensional white noise process, i.e.

$$\langle\xi_i(t)\rangle = 0 \quad\text{and}\quad \langle\xi_i(t)\xi_j(s)\rangle = \delta_{ij}\delta(t-s), \qquad i,j = 1,\dots,d.$$

The potential $V(x)$ is periodic in $x$ with period 1 in all spatial directions and satisfies $\|\nabla V(x)\|_{L^\infty} = 1$:

$$V(x + e_i) = V(x), \qquad i = 1,\dots,d,$$

where $\{e_i\}_{i=1}^d$ denotes the standard basis of $\mathbb{R}^d$.

Notice that we have already non-dimensionalized eqn. (8.75) in such a way that the non-dimensional particle mass is 1 and the maximum of the (gradient of the) potential is fixed [52]. Hence, the only parameters in the problem are the friction coefficient and the temperature. Notice, furthermore, that the parameter $\gamma$ in (8.75) controls the coupling between the Hamiltonian system $\ddot{x} = -\nabla V(x)$ and the thermal heat bath: $\gamma\gg 1$ implies that the Hamiltonian system is strongly coupled to the heat bath, whereas $\gamma\ll 1$ corresponds to weak coupling.

Equation (8.75) defines a Markov process in the phase space $\mathbb{T}^d\times\mathbb{R}^d$. Indeed, let us write (8.75) as a first order system

$$\begin{aligned}
\dot{x}(t) &= y(t), && \text{(8.76a)}\\
\dot{y}(t) &= -\nabla V(x(t)) - \gamma y(t) + \sqrt{2\gamma k_B T}\,\xi(t). && \text{(8.76b)}
\end{aligned}$$

The process $\{x(t), y(t)\}$ is Markovian with generator

$$L = y\cdot\nabla_x - \nabla V(x)\cdot\nabla_y + \gamma\big(-y\cdot\nabla_y + D\Delta_y\big).$$

In writing the above we have set $D = k_B T$. This process is ergodic. The unique invariant measure is absolutely continuous with respect to the Lebesgue measure and its density is the Maxwell–Boltzmann distribution

$$\rho(y,x) = \frac{1}{(2\pi D)^{d/2}Z}\,e^{-\frac{1}{D}H(x,y)}, \tag{8.77}$$

where $Z = \int_{\mathbb{T}^d} e^{-V(x)/D}\,dx$ and $H(x,y)$ is the Hamiltonian of the system

$$H(x,y) = \frac{1}{2}|y|^2 + V(x).$$

The long time behavior of solutions to (8.75) is governed by an effective Brownian motion. Indeed,

the following central limit theorem holds [83, 70, ?]

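Theorem 8.5.1. Let $V(x)\in C(\mathbb{T}^d)$. Define the rescaled process

$$x^\varepsilon(t) := \varepsilon x(t/\varepsilon^2).$$

Then $x^\varepsilon(t)$ converges weakly, as $\varepsilon\to 0$, to a Brownian motion with covariance

$$D_{\mathrm{eff}} = \int_{\mathbb{T}^d\times\mathbb{R}^d} -L\Phi\otimes\Phi\,\mu(dx\,dy), \tag{8.78}$$

where $\mu(dx\,dy) = \rho(x,y)\,dx\,dy$ and the vector valued function $\Phi$ is the solution of the Poisson equation

$$-L\Phi = y. \tag{8.79}$$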

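We are interested in analyzing the dependence of $D_{\mathrm{eff}}$ on $\gamma$. We will mostly focus on the one dimensional case. We start by rescaling the Langevin equation (8.75)

$$\ddot{x} = F(x) - \gamma\dot{x} + \sqrt{2\gamma\beta^{-1}}\,\dot{W}, \tag{8.80}$$

where we have set $F(x) = -\nabla V(x)$. We will assume that the potential is periodic with period $2\pi$ in every direction. Since we expect that at sufficiently long length and time scales the particle performs a purely diffusive motion, we perform a diffusive rescaling to the equations of motion (8.80): $t\to t/\varepsilon^2$, $x\to x/\varepsilon$. Using the fact that $\dot{W}(ct) = \frac{1}{\sqrt{c}}\dot{W}(t)$ in law we obtain:

$$\varepsilon^2\ddot{x} = \frac{1}{\varepsilon}F\Big(\frac{x}{\varepsilon}\Big) - \gamma\dot{x} + \sqrt{2\gamma\beta^{-1}}\,\dot{W}.$$

Introducing $p = \varepsilon\dot{x}$ and $q = x/\varepsilon$ we write this equation as a first order system:

$$\begin{aligned}
\dot{x} &= \frac{1}{\varepsilon}p,\\
\dot{p} &= \frac{1}{\varepsilon^2}F(q) - \frac{1}{\varepsilon^2}\gamma p + \sqrt{\frac{2\gamma\beta^{-1}}{\varepsilon^2}}\,\dot{W},\\
\dot{q} &= \frac{1}{\varepsilon^2}p,
\end{aligned} \tag{8.81}$$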

with the understanding that $q\in[-\pi,\pi]^d$ and $x, p\in\mathbb{R}^d$. Our goal now is to eliminate the fast variables $p, q$ and to obtain an equation for the slow variable $x$. We shall accomplish this by studying the corresponding backward Kolmogorov equation using singular perturbation theory for partial differential equations.

Let

$$u^\varepsilon(p,q,x,t) = \mathbb{E}f\big(p(t), q(t), x(t)\,\big|\,p(0) = p,\ q(0) = q,\ x(0) = x\big),$$

where $\mathbb{E}$ denotes the expectation with respect to the Brownian motion $W(t)$ in the Langevin equation and $f$ is a smooth function.$^{2}$ The evolution of the function $u^\varepsilon(p,q,x,t)$ is governed by the backward Kolmogorov equation associated to equations (8.81) [74]:$^{3}$

$$\begin{aligned}
\frac{\partial u^\varepsilon}{\partial t} &= \frac{1}{\varepsilon}p\cdot\nabla_x u^\varepsilon + \frac{1}{\varepsilon^2}\Big(-\nabla_q V(q)\cdot\nabla_p + p\cdot\nabla_q + \gamma\big(-p\cdot\nabla_p + \beta^{-1}\Delta_p\big)\Big)u^\varepsilon \\
&=: \Big(\frac{1}{\varepsilon^2}L_0 + \frac{1}{\varepsilon}L_1\Big)u^\varepsilon,
\end{aligned} \tag{8.82}$$

where:

$$\begin{aligned}
L_0 &= -\nabla_q V(q)\cdot\nabla_p + p\cdot\nabla_q + \gamma\big(-p\cdot\nabla_p + \beta^{-1}\Delta_p\big),\\
L_1 &= p\cdot\nabla_x.
\end{aligned}$$

$^{2}$ In other words, we have that

$$u^\varepsilon(p,q,x,t) = \int f(x,v,t;p,q)\,\rho(x,v,t;p,q)\,\mu(p,q)\,dp\,dq\,dx\,dv,$$

where $\rho(x,v,t;p,q)$ is the solution of the Fokker-Planck equation and $\mu(p,q)$ is the initial distribution.

$^{3}$ It is more customary in the physics literature to use the forward Kolmogorov equation, i.e. the Fokker-Planck equation. However, for the calculation presented below, it is more convenient to use the backward as opposed to the forward Kolmogorov equation. The two formulations are equivalent. See [72, Ch. 6] for details.

The invariant distribution of the fast process $\{q(t), p(t)\}$ in $\mathbb{T}^d\times\mathbb{R}^d$ is the Maxwell-Boltzmann distribution

$$\rho_\beta(q,p) = Z^{-1}e^{-\beta H(q,p)}, \qquad Z = \int_{\mathbb{T}^d\times\mathbb{R}^d} e^{-\beta H(q,p)}\,dq\,dp,$$

where $H(q,p) = \frac{1}{2}|p|^2 + V(q)$. Indeed, we can readily check that

$$L_0^*\rho_\beta(q,p) = 0,$$

where $L_0^*$ denotes the Fokker-Planck operator which is the $L^2$-adjoint of the generator of the process $L_0$:

$$L_0^* f = \nabla_q V(q)\cdot\nabla_p f - p\cdot\nabla_q f + \gamma\big(\nabla_p\cdot(pf) + \beta^{-1}\Delta_p f\big).$$

The null space of the generator $L_0$ consists of constants in $q, p$. Moreover, the equation

$$-L_0 f = g \tag{8.83}$$

has a unique (up to constants) solution if and only if

$$\langle g\rangle_\beta := \int_{\mathbb{T}^d\times\mathbb{R}^d} g(q,p)\rho_\beta(q,p)\,dq\,dp = 0. \tag{8.84}$$

Equation (8.83) is equipped with periodic boundary conditions with respect to $q$ and is such that

$$\int_{\mathbb{T}^d\times\mathbb{R}^d}|f|^2\mu_\beta\,dq\,dp < \infty. \tag{8.85}$$

These two conditions are sufficient to ensure existence and uniqueness of solutions (up to constants) of equation (8.83) [38, 39, 70].

We assume that the following ansatz for the solution $u^\varepsilon$ holds:

$$u^\varepsilon = u_0 + \varepsilon u_1 + \varepsilon^2 u_2 + \dots \tag{8.86}$$

with $u_i = u_i(p,q,x,t)$, $i = 1, 2, \dots$ being $2\pi$ periodic in $q$ and satisfying condition (8.85). We substitute (8.86) into (8.82) and equate equal powers in $\varepsilon$ to obtain the following sequence of equations:

$$\begin{aligned}
L_0 u_0 &= 0, && \text{(8.87a)}\\
L_0 u_1 &= -L_1 u_0, && \text{(8.87b)}\\
L_0 u_2 &= -L_1 u_1 + \frac{\partial u_0}{\partial t}. && \text{(8.87c)}
\end{aligned}$$

From the first equation in (8.87) we deduce that $u_0 = u_0(x,t)$, since the null space of $L_0$ consists of functions which are constants in $p$ and $q$. Now the second equation in (8.87) becomes:

$$L_0 u_1 = -p\cdot\nabla_x u_0.$$

Since $\langle p\rangle = 0$, the right hand side of the above equation is mean-zero with respect to the Maxwell-Boltzmann distribution. Hence, the above equation is well-posed. We solve it using separation of variables:

$$u_1 = \Phi(p,q)\cdot\nabla_x u_0$$

with

$$-L_0\Phi = p. \tag{8.88}$$

This Poisson equation is posed on $\mathbb{T}^d\times\mathbb{R}^d$. The solution is periodic in $q$ and satisfies condition (8.85). Now we proceed with the third equation in (8.87). We apply the solvability condition to obtain:

$$\begin{aligned}
\frac{\partial u_0}{\partial t} &= \int_{\mathbb{T}^d\times\mathbb{R}^d} L_1 u_1\,\rho_\beta(p,q)\,dp\,dq \\
&= \sum_{i,j=1}^{d}\Big(\int_{\mathbb{T}^d\times\mathbb{R}^d} p_i\Phi_j\,\rho_\beta(p,q)\,dp\,dq\Big)\frac{\partial^2 u_0}{\partial x_i\partial x_j}.
\end{aligned}$$

This is the backward Kolmogorov equation which governs the dynamics on large scales. We write it in the form

$$\frac{\partial u_0}{\partial t} = \sum_{i,j=1}^{d} D_{ij}\frac{\partial^2 u_0}{\partial x_i\partial x_j}, \tag{8.89}$$

where the effective diffusion tensor is

$$D_{ij} = \int_{\mathbb{T}^d\times\mathbb{R}^d} p_i\Phi_j\,\rho_\beta(p,q)\,dp\,dq, \qquad i,j = 1,\dots,d. \tag{8.90}$$

The calculation of the effective diffusion tensor requires the solution of the boundary value problem (8.88) and the calculation of the integral in (8.90). The limiting backward Kolmogorov equation is well posed since the diffusion tensor is nonnegative. Indeed, let $\xi$ be a unit vector in $\mathbb{R}^d$. We calculate (we use the notation $\Phi_\xi = \Phi\cdot\xi$ and $\langle\cdot,\cdot\rangle$ for the Euclidean inner product)

$$\begin{aligned}
\langle\xi, D\xi\rangle &= \int(p\cdot\xi)(\Phi_\xi)\,\mu_\beta\,dp\,dq = \int\big(-L_0\Phi_\xi\big)\Phi_\xi\,\mu_\beta\,dp\,dq \\
&= \gamma\beta^{-1}\int\big|\nabla_p\Phi_\xi\big|^2\mu_\beta\,dp\,dq \geq 0,
\end{aligned} \tag{8.91}$$

where an integration by parts was used.

Thus, from the multiscale analysis we conclude that at large length/time scales the particle which diffuses in a periodic potential performs an effective Brownian motion with a nonnegative diffusion tensor which is given by formula (8.90).

We mention in passing that the analysis presented above can also be applied to the problem of Brownian motion in a tilted periodic potential. The Langevin equation becomes

$$\ddot{x}(t) = -\nabla V(x(t)) + F - \gamma\dot{x}(t) + \sqrt{2\gamma\beta^{-1}}\,\dot{W}(t), \tag{8.92}$$

where $V(x)$ is periodic with period $2\pi$ and $F$ is a constant force field. The formulas for the effective drift and the effective diffusion tensor are

$$V = \int_{\mathbb{R}^d\times\mathbb{T}^d} p\,\rho(q,p)\,dq\,dp, \qquad D = \int_{\mathbb{R}^d\times\mathbb{T}^d}(p - V)\otimes\phi\,\rho(p,q)\,dp\,dq, \tag{8.93}$$

where

$$\begin{aligned}
-L\phi &= p - V, && \text{(8.94a)}\\
L^*\rho &= 0, \qquad \int_{\mathbb{R}^d\times\mathbb{T}^d}\rho(p,q)\,dp\,dq = 1, && \text{(8.94b)}
\end{aligned}$$

with

$$L = p\cdot\nabla_q + (-\nabla_q V + F)\cdot\nabla_p + \gamma\big(-p\cdot\nabla_p + \beta^{-1}\Delta_p\big). \tag{8.95}$$

We have used $\otimes$ to denote the tensor product between two vectors; $L^*$ denotes the $L^2$-adjoint of the operator $L$, i.e. the Fokker-Planck operator. Equations (8.94) are equipped with periodic boundary conditions in $q$. The solution of the Poisson equation (8.94a) is also taken to be square integrable with respect to the invariant density $\rho(q,p)$:

$$\int_{\mathbb{R}^d\times\mathbb{T}^d}|\phi(q,p)|^2\rho(p,q)\,dp\,dq < +\infty.$$

The diffusion tensor is nonnegative definite. A calculation similar to the one used to derive (8.91) shows this:

$$\langle\xi, D\xi\rangle = \gamma\beta^{-1}\int\big|\nabla_p\phi_\xi\big|^2\rho(p,q)\,dp\,dq \geq 0, \tag{8.96}$$

for every vector $\xi$ in $\mathbb{R}^d$, where $\phi_\xi = \phi\cdot\xi$. The study of diffusion in a tilted periodic potential, in the underdamped regime and in high dimensions, based on the above formulas for $V$ and $D$, will be the subject of a separate publication.

8.5.2 Equivalence With the Green-Kubo Formula

Let us now show that the formula for the diffusion tensor obtained in the previous section, equation (8.90), is equivalent to the Green-Kubo formula (3.14). To simplify the notation we will prove the equivalence of the two formulas in one dimension. The generalization to arbitrary dimensions is immediate. Let $(x(t;q,p), v(t;q,p))$ with $v = \dot{x}$ and initial conditions $x(0;q,p) = q$, $v(0;q,p) = p$ be the solution of the Langevin equation

$$\ddot{x} = -\partial_x V - \gamma\dot{x} + \xi,$$

where $\xi(t)$ stands for Gaussian white noise in one dimension with correlation function

$$\langle\xi(t)\xi(s)\rangle = 2\gamma k_B T\,\delta(t-s).$$

We assume that the $(x,v)$ process is stationary, i.e. that the initial conditions are distributed according to the Maxwell-Boltzmann distribution

$$\rho_\beta(q,p) = Z^{-1}e^{-\beta H(p,q)}.$$

The velocity autocorrelation function is [15, eq. 2.10]

$$\langle v(t;q,p)v(0;q,p)\rangle = \int v\,p\,\rho(x,v,t;p,q)\,\rho_\beta(p,q)\,dp\,dq\,dx\,dv, \tag{8.97}$$

and $\rho(x,v,t;p,q)$ is the solution of the Fokker-Planck equation

$$\frac{\partial\rho}{\partial t} = L^*\rho, \qquad \rho(x,v,0;p,q) = \delta(x-q)\delta(v-p),$$

where

$$L^*\rho = -v\,\partial_x\rho + \partial_x V(x)\,\partial_v\rho + \gamma\big(\partial_v(v\rho) + \beta^{-1}\partial_v^2\rho\big).$$

We rewrite (8.97) in the form

$$\begin{aligned}
\langle v(t;q,p)v(0;q,p)\rangle &= \int\!\!\int\Big(\int\!\!\int v\,\rho(x,v,t;p,q)\,dv\,dx\Big)p\,\rho_\beta(p,q)\,dp\,dq \\
&=: \int\!\!\int v(t;p,q)\,p\,\rho_\beta(p,q)\,dp\,dq.
\end{aligned} \tag{8.98}$$

The function $v(t)$ satisfies the backward Kolmogorov equation which governs the evolution of observables [74, Ch. 6]

$$\frac{\partial v}{\partial t} = Lv, \qquad v(0;p,q) = p. \tag{8.99}$$

We can write, formally, the solution of (8.99) as

$$v = e^{Lt}p. \tag{8.100}$$

We combine now equations (8.98) and (8.100) to obtain the following formula for the velocity autocorrelation function

$$\langle v(t;q,p)v(0;q,p)\rangle = \int\!\!\int p\,\big(e^{Lt}p\big)\rho_\beta(p,q)\,dp\,dq. \tag{8.101}$$

We substitute this into the Green-Kubo formula to obtain

$$\begin{aligned}
D &= \int_0^\infty\langle v(t;q,p)v(0;q,p)\rangle\,dt \\
&= \int\Big(\int_0^\infty e^{Lt}\,dt\;p\Big)p\,\rho_\beta\,dp\,dq \\
&= \int\big(-L^{-1}p\big)p\,\rho_\beta\,dp\,dq \\
&= \int_{-\infty}^{\infty}\int_{-\pi}^{\pi}\phi\,p\,\rho_\beta\,dp\,dq,
\end{aligned}$$

where $\phi$ is the solution of the Poisson equation (8.88). In the above derivation we have used the formula $-L^{-1} = \int_0^\infty e^{Lt}\,dt$, whose proof can be found in [74, Ch. 11].
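As a sanity check of this equivalence, consider a special case that is not treated in the text: the free particle, $V\equiv 0$, in one dimension. The generator reduces to $L = p\,\partial_q + \gamma(-p\,\partial_p + \beta^{-1}\partial_p^2)$, and both routes can be followed by hand. The Poisson equation (8.88) is solved by a function that is linear in $p$:

$$-L\phi = p \quad\Longrightarrow\quad \phi = \frac{p}{\gamma}, \qquad D = \langle p\,\phi\rangle_\beta = \frac{\langle p^2\rangle_\beta}{\gamma} = \frac{1}{\gamma\beta}.$$

On the Green-Kubo side, $Lp = -\gamma p$ gives $e^{Lt}p = e^{-\gamma t}p$, so that

$$\langle v(t)v(0)\rangle = \beta^{-1}e^{-\gamma t}, \qquad \int_0^\infty\langle v(t)v(0)\rangle\,dt = \frac{1}{\gamma\beta}.$$

Both expressions agree and reproduce the Einstein relation $D = k_B T/\gamma$.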

8.6 The Underdamped and Overdamped Limits of the Diffusion Coefficient

In this section we derive approximate formulas for the diffusion coefficient which are valid in the overdamped $\gamma\gg 1$ and underdamped $\gamma\ll 1$ limits. The derivation of these formulas is based on the asymptotic analysis of the Poisson equation (8.88).

The Underdamped Limit

In this subsection we solve the Poisson equation (8.88) in one dimension perturbatively for small $\gamma$. We shall use singular perturbation theory for partial differential equations. The operator $L_0$ that appears in (8.88) can be written in the form

$$L_0 = L_H + \gamma L_{OU},$$

where $L_H$ stands for the (backward) Liouville operator associated with the Hamiltonian $H(p,q)$ and $L_{OU}$ for the generator of the OU process, respectively:

$$L_H = p\,\partial_q - \partial_q V\,\partial_p, \qquad L_{OU} = -p\,\partial_p + \beta^{-1}\partial_p^2.$$

We expect that the solution of the Poisson equation scales like $\gamma^{-1}$ when $\gamma\ll 1$. Thus, we look for a solution of the form

$$\Phi = \frac{1}{\gamma}\phi_0 + \phi_1 + \gamma\phi_2 + \dots \tag{8.102}$$

We substitute this ansatz in (8.88) to obtain the sequence of equations

$$\begin{aligned}
L_H\phi_0 &= 0, && \text{(8.103a)}\\
-L_H\phi_1 &= p + L_{OU}\phi_0, && \text{(8.103b)}\\
-L_H\phi_2 &= L_{OU}\phi_1. && \text{(8.103c)}
\end{aligned}$$

From equation (8.103a) we deduce that, since $\phi_0$ is in the null space of the Liouville operator, the first term in the expansion is a function of the Hamiltonian $z(p,q) = \frac{1}{2}p^2 + V(q)$:

$$\phi_0 = \phi_0(z(p,q)).$$

Now we want to obtain an equation for $\phi_0$ by using the solvability condition for (8.103b). To this end, we multiply this equation by an arbitrary function of $z$, $g = g(z)$, and integrate over $p$ and $q$ to obtain

$$\int_{-\infty}^{+\infty}\int_{-\pi}^{\pi}\big(p + L_{OU}\phi_0\big)\,g(z(p,q))\,dp\,dq = 0.$$

We change now from $p, q$ coordinates to $z, q$, so that the above integral becomes

$$\int_{E_{min}}^{+\infty}\int_{-\pi}^{\pi} g(z)\big(p(z,q) + L_{OU}\phi_0(z)\big)\frac{1}{p(z,q)}\,dz\,dq = 0,$$

where $J = p^{-1}(z,q)$ is the Jacobian of the transformation. The operator $L_{OU}$, when applied to functions of the Hamiltonian, becomes:

$$L_{OU} = (\beta^{-1} - p^2)\frac{\partial}{\partial z} + \beta^{-1}p^2\frac{\partial^2}{\partial z^2}.$$

Hence, the integral equation for $\phi_0(z)$ becomes

$$\int_{E_{min}}^{+\infty}\int_{-\pi}^{\pi} g(z)\Big[p(z,q) + \Big((\beta^{-1} - p^2)\frac{\partial}{\partial z} + \beta^{-1}p^2\frac{\partial^2}{\partial z^2}\Big)\phi_0(z)\Big]\frac{1}{p(z,q)}\,dz\,dq = 0.$$

Let $E_0$ denote the critical energy, i.e. the energy along the separatrix (homoclinic orbit). We set

$$S(z) = \int_{x_1(z)}^{x_2(z)} p(z,q)\,dq, \qquad T(z) = \int_{x_1(z)}^{x_2(z)}\frac{1}{p(z,q)}\,dq,$$

where Risken's notation [82, p. 301] has been used for $x_1(z)$ and $x_2(z)$.

We need to consider the cases $\{z > E_0,\ p > 0\}$, $\{z > E_0,\ p < 0\}$ and $\{E_{min} < z < E_0\}$ separately.

We consider first the case $E > E_0$, $p > 0$. In this case $x_1(z) = -\pi$, $x_2(z) = \pi$. We can perform the integration with respect to $q$ to obtain

$$\int_{E_0}^{+\infty} g(z)\Big[2\pi + \Big(\big(\beta^{-1}T(z) - S(z)\big)\frac{\partial}{\partial z} + \beta^{-1}S(z)\frac{\partial^2}{\partial z^2}\Big)\phi_0(z)\Big]dz = 0.$$

This equation is valid for every test function $g(z)$, from which we obtain the following differential equation for $\phi_0$:

$$-\mathcal{L}\phi := -\beta^{-1}\frac{1}{T(z)}S(z)\phi'' + \Big(\frac{1}{T(z)}S(z) - \beta^{-1}\Big)\phi' = \frac{2\pi}{T(z)}, \tag{8.104}$$

where primes denote differentiation with respect to $z$ and where the subscript 0 has been dropped for notational simplicity.

A similar calculation shows that in the regions $E > E_0$, $p < 0$ and $E_{min} < E < E_0$ the equation for $\phi_0$ is

$$-\mathcal{L}\phi = -\frac{2\pi}{T(z)}, \qquad E > E_0,\ p < 0 \tag{8.105}$$

and

$$-\mathcal{L}\phi = 0, \qquad E_{min} < E < E_0. \tag{8.106}$$

Equations (8.104), (8.105), (8.106) are augmented with condition (8.85) and a continuity condition at the critical energy [27]

$$2\phi_3'(E_0) = \phi_1'(E_0) + \phi_2'(E_0), \tag{8.107}$$

where $\phi_1, \phi_2, \phi_3$ are the solutions of equations (8.104), (8.105) and (8.106), respectively.

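The average of a function $h(q,p) = h(q,p(z,q))$ can be written in the form [82, p. 303]

$$\begin{aligned}
\langle h(q,p)\rangle_\beta &:= \int_{-\infty}^{\infty}\int_{-\pi}^{\pi} h(q,p)\,\mu_\beta(q,p)\,dq\,dp \\
&= Z_\beta^{-1}\int_{E_{min}}^{+\infty}\int_{x_1(z)}^{x_2(z)}\big(h(q,p(z,q)) + h(q,-p(z,q))\big)(p(q,z))^{-1}e^{-\beta z}\,dz\,dq,
\end{aligned}$$

where the partition function is

$$Z_\beta = \sqrt{\frac{2\pi}{\beta}}\int_{-\pi}^{\pi} e^{-\beta V(q)}\,dq.$$

From equation (8.106) we deduce that $\phi_3(z) = 0$. Furthermore, we have that $\phi_1(z) = -\phi_2(z)$. These facts, together with the above formula for the averaging with respect to the Boltzmann distribution, yield:

$$\begin{aligned}
D = \langle p\,\Phi(p,q)\rangle_\beta &= \frac{1}{\gamma}\langle p\,\phi_0\rangle_\beta + \mathcal{O}(1) && \text{(8.108)}\\
&\approx \frac{2}{\gamma}Z_\beta^{-1}\int_{E_0}^{+\infty}\int_{-\pi}^{\pi}\phi_0(z)e^{-\beta z}\,dq\,dz \\
&= \frac{4\pi}{\gamma}Z_\beta^{-1}\int_{E_0}^{+\infty}\phi_0(z)e^{-\beta z}\,dz, && \text{(8.109)}
\end{aligned}$$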

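to leading order in $\gamma$, and where $\phi_0(z)$ is the solution of the two point boundary value problem (8.104). We remark that if we start with formula $D = \gamma\beta^{-1}\langle|\partial_p\Phi|^2\rangle_\beta$ for the diffusion coefficient, and use $\partial_p\Phi\approx\gamma^{-1}p\,\partial_z\phi_0$, we obtain the following formula, which is equivalent to (8.109):

$$D = \frac{2}{\gamma\beta}Z_\beta^{-1}\int_{E_0}^{+\infty} S(z)\,|\partial_z\phi_0(z)|^2e^{-\beta z}\,dz.$$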

Now we solve the equation for $\phi_0(z)$ (for notational simplicity, we will drop the subscript 0). Using the fact that $S'(z) = T(z)$, we rewrite (8.104) as

$$-\beta^{-1}(S\phi')' + S\phi' = 2\pi.$$

This equation can be rewritten as

$$-\beta^{-1}\big(e^{-\beta z}S\phi'\big)' = 2\pi e^{-\beta z}.$$

Condition (8.85) implies that the derivative of the unique solution of (8.104) is

$$\phi'(z) = 2\pi S^{-1}(z).$$

We use this in (8.109), together with an integration by parts, to obtain the following formula for the diffusion coefficient:

$$D = \frac{1}{\gamma}8\pi^2Z_\beta^{-1}\beta^{-1}\int_{E_0}^{+\infty}\frac{e^{-\beta z}}{S(z)}\,dz. \tag{8.110}$$

We emphasize the fact that this formula is exact in the limit as $\gamma\to 0$ and is valid for all periodic potentials and for all values of the temperature.

Consider now the case of the nonlinear pendulum $V(q) = -\cos(q)$. The partition function is

$$Z_\beta = \frac{(2\pi)^{3/2}}{\beta^{1/2}}J_0(\beta),$$

where $J_0(\cdot)$ is the modified Bessel function of the first kind. Furthermore, a simple calculation yields

$$S(z) = 2^{5/2}\sqrt{z+1}\,E\Big(\sqrt{\frac{2}{z+1}}\Big),$$

where $E(\cdot)$ is the complete elliptic integral of the second kind. The formula for the diffusion coefficient becomes

$$D = \frac{1}{\gamma}\frac{\sqrt{\pi}}{2\beta^{1/2}J_0(\beta)}\int_1^{+\infty}\frac{e^{-\beta z}}{\sqrt{z+1}\,E\big(\sqrt{2/(z+1)}\big)}\,dz. \tag{8.111}$$

We use now the asymptotic formula $J_0(\beta)\approx(2\pi\beta)^{-1/2}e^{\beta}$, $\beta\gg 1$, and the fact that $E(1) = 1$ to obtain the small temperature asymptotics for the diffusion coefficient:

$$D = \frac{1}{\gamma}\frac{\pi}{2\beta}e^{-2\beta}, \qquad \beta\gg 1, \tag{8.112}$$

which is precisely formula (??), obtained by Risken.
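The agreement between (8.111) and its low-temperature limit (8.112) can be checked numerically. The sketch below is not from the text. It assumes that $J_0$ denotes the modified Bessel function usually written $I_0$ (as the text states), evaluates the elliptic integral $E(k)$, with $k$ the modulus, by midpoint quadrature, and substitutes $z = 1 + s/\beta$ in the integral of (8.111). All function names and grid sizes are illustrative.

```python
import math


def bessel_i0(x, terms=120):
    """Modified Bessel function I_0(x) from its power series sum_k (x^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x * x / 4.0) / ((k + 1) ** 2)
    return total


def ellip_e(k, n=200):
    """Complete elliptic integral of the second kind with modulus k:
    E(k) = int_0^{pi/2} sqrt(1 - k^2 sin^2 t) dt (midpoint rule)."""
    h = 0.5 * math.pi / n
    total = 0.0
    for i in range(n):
        s = math.sin((i + 0.5) * h)
        total += math.sqrt(max(1.0 - (k * s) ** 2, 0.0))
    return h * total


def diffusion_underdamped(beta, gamma, s_max=40.0, n=1000):
    """Formula (8.111) for V(q) = -cos(q), using the substitution z = 1 + s / beta."""
    h = s_max / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        z = 1.0 + s / beta
        k = math.sqrt(2.0 / (z + 1.0))
        total += math.exp(-s) / (math.sqrt(z + 1.0) * ellip_e(k))
    # int_1^inf e^{-beta z} g(z) dz = (e^{-beta} / beta) * int_0^inf e^{-s} g(1 + s/beta) ds
    integral = math.exp(-beta) / beta * h * total
    prefactor = math.sqrt(math.pi) / (2.0 * math.sqrt(beta) * bessel_i0(beta))
    return prefactor * integral / gamma


def diffusion_low_temperature(beta, gamma):
    """Asymptotic formula (8.112), intended for beta >> 1."""
    return math.pi / (2.0 * beta * gamma) * math.exp(-2.0 * beta)


if __name__ == "__main__":
    for beta in (5.0, 10.0, 20.0):
        ratio = diffusion_underdamped(beta, 1.0) / diffusion_low_temperature(beta, 1.0)
        print(f"beta = {beta:5.1f}   (8.111) / (8.112) = {ratio:.4f}")
```

The printed ratio should approach 1 as $\beta$ grows. The approach is slow because the corrections to $E(k)$ near $k = 1$ are logarithmic.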

Unlike the overdamped limit, which is treated in the next section, it is not straightforward to obtain the next order correction in the formula for the effective diffusivity. This is due to the discontinuity of the solution of the Poisson equation (8.88) along the separatrix. In particular, the next order correction to $\phi$ when $\gamma\ll 1$ is of $\mathcal{O}(\gamma^{-1/2})$, rather than $\mathcal{O}(1)$ as suggested by ansatz (8.102).

Upon combining the formula for the diffusion coefficient and the formula for the hopping rate from Kramers' theory [41, eqn. 4.48(a)] we can obtain a formula for the mean square jump length at low friction. For the cosine potential, and for $\beta\gg 1$, this formula is

$$\langle\ell^2\rangle = \frac{\pi^2}{8\gamma^2\beta^2} \quad\text{for } \gamma\ll 1,\ \beta\gg 1. \tag{8.113}$$

The Overdamped Limit

In this subsection we study the large $\gamma$ asymptotics of the diffusion coefficient. As in the previous case, we use singular perturbation theory, e.g. [42, Ch. 8]. The regularity of the solution of (8.88) when $\gamma\gg 1$ will enable us to obtain the first two terms in the $\frac{1}{\gamma}$ expansion without any difficulty.

We set $\gamma = \frac{1}{\varepsilon}$. The differential operator $L_0$ becomes

$$L_0 = \frac{1}{\varepsilon}L_{OU} + L_H.$$

We look for a solution of (8.88) in the form of a power series expansion in $\varepsilon$:

$$\Phi = \phi_0 + \varepsilon\phi_1 + \varepsilon^2\phi_2 + \varepsilon^3\phi_3 + \dots \tag{8.114}$$

We substitute this into (8.88) and obtain the following sequence of equations:

$$\begin{aligned}
-L_{OU}\phi_0 &= 0, && \text{(8.115a)}\\
-L_{OU}\phi_1 &= p + L_H\phi_0, && \text{(8.115b)}\\
-L_{OU}\phi_2 &= L_H\phi_1, && \text{(8.115c)}\\
-L_{OU}\phi_3 &= L_H\phi_2. && \text{(8.115d)}
\end{aligned}$$

The null space of the Ornstein-Uhlenbeck operator $L_{OU}$ consists of constants in $p$. Consequently, from the first equation in (8.115) we deduce that the first term in the expansion is independent of $p$, $\phi_0 = \phi(q)$. The second equation becomes

$$-L_{OU}\phi_1 = p\,(1 + \partial_q\phi).$$

Let

$$\nu_\beta(p) = \Big(\frac{2\pi}{\beta}\Big)^{-\frac{1}{2}}e^{-\beta\frac{p^2}{2}}$$

be the invariant distribution of the OU process (i.e. $L_{OU}^*\nu_\beta(p) = 0$). The solvability condition for an equation of the form $-L_{OU}\phi = f$ requires that the right hand side averages to 0 with respect to $\nu_\beta(p)$, i.e. that the right hand side of the equation is orthogonal to the null space of the adjoint of $L_{OU}$. This condition is clearly satisfied for the equation for $\phi_1$. Thus, by the Fredholm alternative, this equation has a solution which is

$$\phi_1(p,q) = (1 + \partial_q\phi)\,p + \psi_1(q),$$

where the function $\psi_1(q)$ is to be determined. We substitute this into the right hand side of the third equation to obtain

$$-L_{OU}\phi_2 = p^2\partial_q^2\phi - \partial_q V\,(1 + \partial_q\phi) + p\,\partial_q\psi_1(q).$$

From the solvability condition for this we obtain an equation for $\phi(q)$:

$$\beta^{-1}\partial_q^2\phi - \partial_q V\,(1 + \partial_q\phi) = 0, \tag{8.116}$$

together with the periodic boundary conditions. The derivative of the solution of this two-point boundary value problem is

$$\partial_q\phi + 1 = \frac{2\pi}{\int_{-\pi}^{\pi} e^{\beta V(q)}\,dq}\,e^{\beta V(q)}. \tag{8.117}$$

The first two terms in the large $\gamma$ expansion of the solution of equation (8.88) are

$$\Phi(p,q) = \phi(q) + \frac{1}{\gamma}(1 + \partial_q\phi)\,p + \mathcal{O}\Big(\frac{1}{\gamma^2}\Big),$$

where $\phi(q)$ is the solution of (8.116). Substituting this in the formula for the diffusion coefficient and using (8.117) we obtain

$$D = \int_{-\infty}^{\infty}\int_{-\pi}^{\pi} p\,\Phi\,\rho_\beta(p,q)\,dp\,dq = \frac{4\pi^2}{\beta\gamma Z\hat{Z}} + \mathcal{O}\Big(\frac{1}{\gamma^3}\Big),$$

where $Z = \int_{-\pi}^{\pi} e^{-\beta V(q)}\,dq$, $\hat{Z} = \int_{-\pi}^{\pi} e^{\beta V(q)}\,dq$. This is, of course, the Lifson-Jackson formula which gives the diffusion coefficient in the overdamped limit [54]. Continuing in the same fashion, we can also calculate the next two terms in the expansion (8.114), see Exercise 4. From this, we can compute the next order correction to the diffusion coefficient. The final result is

$$D = \frac{4\pi^2}{\beta\gamma Z\hat{Z}} - \frac{4\pi^2\beta Z_1}{\gamma^3Z\hat{Z}^2} + \mathcal{O}\Big(\frac{1}{\gamma^5}\Big), \tag{8.118}$$

where $Z_1 = \int_{-\pi}^{\pi}|V'(q)|^2e^{\beta V(q)}\,dq$.

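In the case of the nonlinear pendulum, $V(q) = \cos(q)$, formula (8.118) gives

$$D = \frac{1}{\gamma\beta}J_0^{-2}(\beta) - \frac{\beta}{\gamma^3}\Big(\frac{J_2(\beta)}{J_0^3(\beta)} - J_0^{-2}(\beta)\Big) + \mathcal{O}\Big(\frac{1}{\gamma^5}\Big), \tag{8.119}$$

where $J_n(\beta)$ is the modified Bessel function of the first kind.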
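The leading-order term of (8.118) and (8.119) can be cross-checked numerically. The following sketch (an illustration, not the author's code) evaluates the Lifson-Jackson expression $4\pi^2/(\beta\gamma Z\hat{Z})$ by midpoint quadrature for $V(q) = \cos(q)$ and compares it with the closed form $1/(\gamma\beta I_0(\beta)^2)$, where $I_0$ is the modified Bessel function that the text writes as $J_0$. Here $Z$ and $\hat{Z}$ denote the integrals of $e^{-\beta V}$ and $e^{+\beta V}$ over $[-\pi,\pi]$. The function names are hypothetical.

```python
import math


def bessel_i0(x, terms=80):
    """Modified Bessel function I_0(x) from its power series sum_k (x^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x * x / 4.0) / ((k + 1) ** 2)
    return total


def lifson_jackson(V, beta, gamma, n=2000):
    """Leading-order overdamped diffusion coefficient 4 pi^2 / (beta gamma Z Zhat).

    Z and Zhat are the integrals of exp(-beta V) and exp(+beta V) over
    [-pi, pi], computed with the midpoint rule (very accurate for smooth
    periodic integrands).
    """
    h = 2.0 * math.pi / n
    qs = [-math.pi + (i + 0.5) * h for i in range(n)]
    Z = h * sum(math.exp(-beta * V(q)) for q in qs)
    Zhat = h * sum(math.exp(beta * V(q)) for q in qs)
    return 4.0 * math.pi ** 2 / (beta * gamma * Z * Zhat)


def lifson_jackson_cosine(beta, gamma):
    """Closed form for V(q) = cos(q), where Z = Zhat = 2 pi I_0(beta)."""
    return 1.0 / (gamma * beta * bessel_i0(beta) ** 2)


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 4.0):
        print(beta, lifson_jackson(math.cos, beta, 10.0), lifson_jackson_cosine(beta, 10.0))
```

For a flat potential the quadrature reduces to the free diffusion coefficient $1/(\beta\gamma)$, which is a convenient extra check.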

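In the multidimensional case, a similar analysis leads to the large $\gamma$ asymptotics:

$$\langle\xi, D\xi\rangle = \frac{1}{\gamma}\langle\xi, D_0\xi\rangle + \mathcal{O}\Big(\frac{1}{\gamma^3}\Big),$$

where $\xi$ is an arbitrary unit vector in $\mathbb{R}^d$ and $D_0$ is the diffusion coefficient for the Smoluchowski (overdamped) dynamics:

$$D_0 = Z^{-1}\int_{\mathbb{R}^d}\big(-L_V\chi\big)\otimes\chi\,e^{-\beta V(q)}\,dq, \tag{8.120}$$

where

$$L_V = -\nabla_q V\cdot\nabla_q + \beta^{-1}\Delta_q$$

and $\chi(q)$ is the solution of the PDE $L_V\chi = \nabla_q V$ with periodic boundary conditions.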

Now we prove several properties of the effective diffusion tensor in the overdamped limit. For this we will need the following integration by parts formula

$$\int_{\mathbb{T}^d}\big(\nabla_y\chi\big)\rho\,dy = \int_{\mathbb{T}^d}\big(\nabla_y(\chi\rho) - \chi\otimes\nabla_y\rho\big)\,dy = -\int_{\mathbb{T}^d}\big(\chi\otimes\nabla_y\rho\big)\,dy. \tag{8.121}$$

The proof of this formula is left as an exercise, see Exercise 5.

Theorem 8.6.1. The effective diffusion tensor D0 (8.120) satisfies the upper and lower bounds

D

ZZ6 〈ξ,Kξ〉 6 D|ξ|2 ∀ξ ∈ Rd, (8.122)

where

Z =

∫TdeV (y)/D dy.

In particular, diffusion is always depleted when compared to molecular diffusivity. Furthermore,

the effective diffusivity is symmetric.

Proof. The lower bound follows from the general lower bound (??), equation (??) and the formula

for the Gibbs measure. To establish the upper bound, we use (8.121) and (??) to obtain

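\[
\begin{aligned}
K &= DI + 2D \int_{\mathbb{T}^d} (\nabla \chi)^T \rho\, dy + \int_{\mathbb{T}^d} -\nabla_y V \otimes \chi \rho\, dy \\
&= DI - 2D \int_{\mathbb{T}^d} \nabla_y \rho \otimes \chi\, dy + \int_{\mathbb{T}^d} -\nabla_y V \otimes \chi \rho\, dy \\
&= DI - 2 \int_{\mathbb{T}^d} -\nabla_y V \otimes \chi \rho\, dy + \int_{\mathbb{T}^d} -\nabla_y V \otimes \chi \rho\, dy \\
&= DI - \int_{\mathbb{T}^d} -\nabla_y V \otimes \chi \rho\, dy \\
&= DI - \int_{\mathbb{T}^d} \big(-\mathcal{L}_0 \chi\big) \otimes \chi \rho\, dy \\
&= DI - D \int_{\mathbb{T}^d} \big(\nabla_y \chi \otimes \nabla_y \chi\big) \rho\, dy.
\end{aligned} \tag{8.123}
\]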

Hence, for $\chi_\xi = \chi \cdot \xi$,

\[
\langle \xi, K\xi \rangle = D|\xi|^2 - D \int_{\mathbb{T}^d} |\nabla_y \chi_\xi|^2 \rho\, dy \leq D|\xi|^2.
\]

This proves depletion. The symmetry of $K$ follows from (8.123).

The One Dimensional Case

The one dimensional case is always in gradient form: b(y) = −∂yV (y). Furthermore in one

dimension we can solve the cell problem (??) in closed form and calculate the effective diffusion

coefficient explicitly–up to quadratures. We start with the following calculation concerning the

structure of the diffusion coefficient.

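\[
\begin{aligned}
K &= D + 2D \int_0^1 \partial_y \chi\, \rho\, dy + \int_0^1 -\partial_y V\, \chi \rho\, dy \\
&= D + 2D \int_0^1 \partial_y \chi\, \rho\, dy + D \int_0^1 \chi\, \partial_y \rho\, dy \\
&= D + 2D \int_0^1 \partial_y \chi\, \rho\, dy - D \int_0^1 \partial_y \chi\, \rho\, dy \\
&= D \int_0^1 \big(1 + \partial_y \chi\big) \rho\, dy.
\end{aligned} \tag{8.124}
\]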

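The cell problem (??) in one dimension is

\[
D\,\partial_{yy}\chi - \partial_y V\, \partial_y \chi = \partial_y V. \tag{8.125}
\]

We multiply equation (8.125) by $e^{-V(y)/D}$ and divide through by $D$ to obtain

\[
\partial_y\big(\partial_y \chi\, e^{-V(y)/D}\big) = -\partial_y\big(e^{-V(y)/D}\big).
\]

We integrate this equation once and multiply by $e^{V(y)/D}$ to obtain

\[
\partial_y \chi(y) = -1 + c_1 e^{V(y)/D},
\]

where $c_1$ is the constant of integration. Another integration yields

\[
\chi(y) = -y + c_1 \int_0^y e^{V(y')/D}\, dy' + c_2.
\]

The periodic boundary conditions imply that $\chi(0) = \chi(1)$, from which we conclude that

\[
-1 + c_1 \int_0^1 e^{V(y)/D}\, dy = 0.
\]

Hence

\[
c_1 = \frac{1}{\widehat{Z}}, \quad \widehat{Z} = \int_0^1 e^{V(y)/D}\, dy.
\]

We deduce that

\[
\partial_y \chi = -1 + \frac{1}{\widehat{Z}} e^{V(y)/D}.
\]

We substitute this expression into (8.124) to obtain

\[
\begin{aligned}
K &= \frac{D}{Z} \int_0^1 (1 + \partial_y \chi(y))\, e^{-V(y)/D}\, dy \\
&= \frac{D}{Z\widehat{Z}} \int_0^1 e^{V(y)/D} e^{-V(y)/D}\, dy \\
&= \frac{D}{Z\widehat{Z}},
\end{aligned} \tag{8.126}
\]

with

\[
Z = \int_0^1 e^{-V(y)/D}\, dy, \quad \widehat{Z} = \int_0^1 e^{V(y)/D}\, dy. \tag{8.127}
\]

The Cauchy-Schwarz inequality shows that $Z\widehat{Z} \geq 1$. Notice that in the one–dimensional case the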

formula for the effective diffusivity is precisely the lower bound in (8.122). This shows that the

lower bound is sharp.
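As a numerical illustration of the one-dimensional formula $K = D/(Z\widehat{Z})$, the following sketch evaluates the two integrals by the midpoint rule. The potential $V(y) = \cos(2\pi y)$, the function names and the grid size are illustrative assumptions, not taken from the text.

```python
import math

def effective_diffusivity(V, D, n=2000):
    """Return K = D / (Z * Zhat) for a 1-periodic potential V (midpoint rule)."""
    h = 1.0 / n
    ys = [(i + 0.5) * h for i in range(n)]
    Z = h * sum(math.exp(-V(y) / D) for y in ys)     # Z    = int_0^1 exp(-V/D) dy
    Zhat = h * sum(math.exp(V(y) / D) for y in ys)   # Zhat = int_0^1 exp(+V/D) dy
    return D / (Z * Zhat)

def V_cos(y):
    # Hypothetical smooth 1-periodic potential chosen for illustration.
    return math.cos(2.0 * math.pi * y)

for D in (0.25, 0.5, 1.0, 2.0):
    K = effective_diffusivity(V_cos, D)
    print(f"D = {D:5.2f}   K = {K:.6f}   K/D = {K / D:.6f}")
```

In every case the computed ratio $K/D$ is at most one, in agreement with the depletion of diffusion, and it approaches one as $D$ grows.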

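Example 8.6.2. Consider the potential

\[
V(y) = \begin{cases} a_1 & : y \in [0, \tfrac{1}{2}], \\ a_2 & : y \in (\tfrac{1}{2}, 1], \end{cases} \tag{8.128}
\]

where $a_1$, $a_2$ are positive constants (see footnote 4).

It is straightforward to calculate the integrals in (8.127) to obtain the formula

\[
K = \frac{D}{\cosh^2\!\left(\frac{a_1 - a_2}{2D}\right)}. \tag{8.129}
\]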

In Figure 8.2 we plot the effective diffusivity given by (8.129) as a function of the molecular

diffusivity D. We observe that K decays exponentially fast in the limit as D → 0.
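For the piecewise constant potential the integrals in (8.127) can be evaluated by hand: $Z = \tfrac{1}{2}(e^{-a_1/D} + e^{-a_2/D})$ and $\widehat{Z} = \tfrac{1}{2}(e^{a_1/D} + e^{a_2/D})$, so that $Z\widehat{Z} = \cosh^2\big((a_1 - a_2)/(2D)\big)$. The sketch below checks this identity numerically and tabulates the decay of $K$ as $D \to 0$; the function names and the parameter values $a_1 = 2$, $a_2 = 1$ are assumptions made for the illustration.

```python
import math

def K_piecewise(a1, a2, D):
    """K = D / (Z * Zhat) with the integrals of (8.127) evaluated exactly."""
    Z = 0.5 * (math.exp(-a1 / D) + math.exp(-a2 / D))
    Zhat = 0.5 * (math.exp(a1 / D) + math.exp(a2 / D))
    return D / (Z * Zhat)

def K_closed_form(a1, a2, D):
    """Same quantity written with the hyperbolic cosine."""
    return D / math.cosh((a1 - a2) / (2.0 * D)) ** 2

for D in (2.0, 1.0, 0.5, 0.2, 0.1):
    print(f"D = {D:4.1f}   K = {K_piecewise(2.0, 1.0, D):.3e}   "
          f"closed form = {K_closed_form(2.0, 1.0, D):.3e}")
```

Since $\cosh^2(x) \sim e^{2|x|}/4$ for large $|x|$, the table shows $K$ behaving like $4D\,e^{-|a_1 - a_2|/D}$ for small $D$.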

Figure 8.2: Effective diffusivity versus molecular diffusivity for the potential (8.128).

Footnote 4: Of course, this potential is not even continuous, let alone smooth, and the theory as developed in this chapter does not apply. It is possible, however, to consider a regularized version of this discontinuous potential and then homogenization theory applies.

8.6.1 Brownian Motion in a Tilted Periodic Potential

In this section we use our method to obtain a formula for the effective diffusion coefficient of an overdamped particle moving in a one dimensional tilted periodic potential. This formula was first derived and analyzed in [80, 79] without any appeal to multiscale analysis. The equation of motion is

\[
\dot{x} = -V'(x) + F + \sqrt{2D}\,\xi, \tag{8.130}
\]

where $V(x)$ is a smooth periodic function with period $L$, $F$ and $D > 0$ are constants and $\xi(t)$ is standard white noise in one dimension. To simplify the notation we have set $\gamma = 1$.

The stationary Fokker–Planck equation corresponding to (8.130) is

\[
\partial_x \big( (V'(x) - F)\, \rho(x) + D\, \partial_x \rho(x) \big) = 0, \tag{8.131}
\]

with periodic boundary conditions. Formula (10.13) for the effective drift now becomes

\[
U_{\mathrm{eff}} = \int_0^L (-V'(x) + F)\, \rho(x)\, dx. \tag{8.132}
\]

The solution of eqn. (8.131) is [77, Ch. 9]

\[
\rho(x) = \frac{1}{Z} \int_x^{x+L} dy\, Z_+(y) Z_-(x), \tag{8.133}
\]

with

\[
Z_\pm(x) := e^{\pm \frac{1}{D}(V(x) - Fx)},
\]

and

\[
Z = \int_0^L dx \int_x^{x+L} dy\, Z_+(y) Z_-(x). \tag{8.134}
\]

Upon using (8.133) in (8.132) we obtain [77, Ch. 9]

\[
U_{\mathrm{eff}} = \frac{DL}{Z}\left(1 - e^{-\frac{FL}{D}}\right). \tag{8.135}
\]

Our goal now is to calculate the effective diffusion coefficient. For this we first need to solve the

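Poisson equation (10.20) which now becomes

\[
\mathcal{L}\chi(x) := D\,\partial_{xx}\chi(x) + (-V'(x) + F)\,\partial_x \chi(x) = V'(x) - F + U_{\mathrm{eff}}, \tag{8.136}
\]

with periodic boundary conditions. Then we need to evaluate the integrals in (10.18):

\[
D_{\mathrm{eff}} = D + \int_0^L (-V'(x) + F - U_{\mathrm{eff}})\, \chi(x) \rho(x)\, dx + 2D \int_0^L \partial_x \chi(x)\, \rho(x)\, dx.
\]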

It will be more convenient for the subsequent calculation to rewrite the above formula for the effective diffusion coefficient in a different form. The fact that $\rho(x)$ solves the stationary Fokker–Planck equation, together with elementary integrations by parts, yields that, for all sufficiently smooth periodic functions $\phi(x)$,

\[
\int_0^L \phi(x)\,(-\mathcal{L}\phi(x))\, \rho(x)\, dx = D \int_0^L (\partial_x \phi(x))^2 \rho(x)\, dx.
\]

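Now we have

\[
\begin{aligned}
D_{\mathrm{eff}} &= D + \int_0^L (-V'(x) + F - U_{\mathrm{eff}})\, \chi(x) \rho(x)\, dx + 2D \int_0^L \partial_x \chi(x)\, \rho(x)\, dx \\
&= D + \int_0^L (-\mathcal{L}\chi(x))\, \chi(x) \rho(x)\, dx + 2D \int_0^L \partial_x \chi(x)\, \rho(x)\, dx \\
&= D + D \int_0^L (\partial_x \chi(x))^2 \rho(x)\, dx + 2D \int_0^L \partial_x \chi(x)\, \rho(x)\, dx \\
&= D \int_0^L (1 + \partial_x \chi(x))^2 \rho(x)\, dx.
\end{aligned} \tag{8.137}
\]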

Now we solve the Poisson equation (8.136) with periodic boundary conditions. We multiply the equation by $Z_-(x)$ and divide through by $D$ to rewrite it in the form

\[
\partial_x\big(\partial_x \chi(x)\, Z_-(x)\big) = -\partial_x Z_-(x) + \frac{U_{\mathrm{eff}}}{D} Z_-(x).
\]

We integrate this equation from $x - L$ to $x$ and use the periodicity of $\chi(x)$ and $V(x)$ together with formula (8.135) to obtain

\[
\partial_x \chi(x)\, Z_-(x) \left(1 - e^{-\frac{FL}{D}}\right) = -Z_-(x) \left(1 - e^{-\frac{FL}{D}}\right) + \frac{L}{Z} \left(1 - e^{-\frac{FL}{D}}\right) \int_{x-L}^{x} Z_-(y)\, dy,
\]

from which we immediately get

\[
\partial_x \chi(x) + 1 = \frac{L}{Z} \int_{x-L}^{x} Z_-(y) Z_+(x)\, dy.
\]

Substituting this into (8.137) and using the formula for the invariant distribution (8.133) we finally obtain

\[
D_{\mathrm{eff}} = \frac{D L^2}{Z^3} \int_0^L (I_+(x))^2 I_-(x)\, dx, \tag{8.138}
\]

with

\[
I_+(x) = \int_{x-L}^{x} Z_-(y) Z_+(x)\, dy \quad \text{and} \quad I_-(x) = \int_x^{x+L} Z_+(y) Z_-(x)\, dy.
\]

Formula (8.138) for the effective diffusion coefficient (formula (22) in [79]) is the main result of this section.

8.7 Numerical Solution of the Klein-Kramers Equation

8.8 Discussion and Bibliography


The rigorous study of the overdamped limit can be found in [68]. A similar approximation theorem

is also valid in infinite dimensions (i.e. for SPDEs); see [10, 11].

More information about the underdamped limit of the Langevin equation can be found in [89, 28, 29].

We also mention in passing that the various formulae for the effective diffusion coefficient

that have been derived in the literature [34, 54, 80, 85] can be obtained from equation (??): they

correspond to cases where equations (??) and (??) can be solved analytically. An example–the

calculation of the effective diffusion coefficient of an overdamped Brownian particle in a tilted

periodic potential–is presented in Section 8.6.1. Similar calculations yield analytical expressions for

all other exactly solvable models that have been considered in the literature.


8.9 Exercises

1. Let L be the generator of the two-dimensional Ornstein-Uhlenbeck operator (8.17). Calculate

the eigenvalues and eigenfunctions of L. Show that there exists a transformation that transforms

L into the Schrödinger operator of the two-dimensional quantum harmonic oscillator.

2. Let L be the operator defined in (8.34).

(a) Show by direct substitution that L can be written in the form

L = −λ1(c−)∗(c+)∗ − λ2(d−)∗(d+)∗.

(b) Calculate the commutators

[(c+)∗, (c−)∗], [(d+)∗, (d−)∗], [(c±)∗, (d±)∗], [L, (c±)∗], [L, (d±)∗].

3. Show that the operators a±, b± defined in (8.15) and (8.16) satisfy the commutation relations

[a+, a−] = −1, (8.139a)

[b+, b−] = −1, (8.139b)

[a±, b±] = 0. (8.139c)

4. Obtain the second term in the expansion (8.118).

5. Prove formula (8.121).


Chapter 9

Exit Time Problems

9.1 Introduction

9.2 Brownian Motion in a Bistable Potential

There are many systems in physics, chemistry and biology that exist in at least two stable states.

Among the many applications we mention the switching and storage devices in computers. Another example is biological macromolecules that can exist in many different states. The problems that we would like to solve are:

• How stable are the various states relative to each other?

• How long does it take for a system to switch spontaneously from one state to another?

• How is the transfer made, i.e. through what path in the relevant state space? There is a lot of

important current work on this problem by E, Vanden Eijnden etc.

• How does the system relax to an unstable state?

We can distinguish between the one dimensional problem, the finite dimensional problem and the infinite dimensional problem (SPDEs). We will solve the one dimensional problem completely and discuss the finite dimensional problem in some detail. The infinite dimensional situation is an extremely hard problem and we will only make some remarks. The study of bistability and metastability is a very active research area, in particular the development of numerical methods for the calculation of various quantities such as reaction rates, transition pathways etc.

We will mostly consider the dynamics of a particle moving in a bistable potential, under the

influence of thermal noise in one dimension:

\[
\dot{x} = -V'(x) + \sqrt{2 k_B T}\, \dot{\beta}. \tag{9.1}
\]

An example of the class of potentials that we will consider is shown in Figure. It has two local minima, one local maximum and it increases at least quadratically at infinity. This ensures that the state space is "compact", i.e. that the particle cannot escape to infinity. The standard potential that satisfies these assumptions is

\[
V(x) = \frac{1}{4}x^4 - \frac{1}{2}x^2 + \frac{1}{4}. \tag{9.2}
\]

It is easily checked that this potential has three critical points: a local maximum at $x = 0$ and two local minima at $x = \pm 1$. The values of the potential at these three points are:

\[
V(\pm 1) = 0, \quad V(0) = \frac{1}{4}.
\]

We will say that the height of the potential barrier is $\frac{1}{4}$. The physically (and mathematically!)

interesting case is when the thermal fluctuations are weak when compared to the potential barrier

that the particle has to climb over.

More generally, we assume that the potential has two local minima at the points a and c and a

local maximum at b. Let us consider the problem of the escape of the particle from the left local

minimum a. The potential barrier is then defined as

∆E = V (b)− V (a).

Our assumption that the thermal fluctuations are weak can be written as

\[
\frac{k_B T}{\Delta E} \ll 1.
\]

In this limit, it is intuitively clear that the particle is most likely to be found at either a or c. There

it will perform small oscillations around either of the local minima. This is a result that we can

obtain by studying the small temperature limit by using perturbation theory. The result is that we

can describe locally the dynamics of the particle by appropriate Ornstein–Uhlenbeck processes.

Of course, this result is valid only for finite times: at sufficiently long times the particle can escape from one local minimum, $a$ say, and surmount the potential barrier to end up at $c$. It will then spend a long time in the neighborhood of $c$ until it crosses the potential barrier again and ends up at $a$. This is an example of a rare event. The relevant time scale, the exit time or the mean first passage time, scales exponentially in $\beta := (k_B T)^{-1}$:

\[
\tau = \nu^{-1} \exp(\beta \Delta E).
\]

It is more customary to calculate the reaction rate $\kappa := \tau^{-1}$ which gives the rate with which particles escape from a local minimum of the potential:

\[
\kappa = \nu \exp(-\beta \Delta E). \tag{9.3}
\]

It is very important to notice that the escape from a local minimum, i.e. a state of local stability,

can happen only at positive temperatures: it is a noise assisted event. Indeed, consider the case

T = 0. The equation of motion becomes

\[
\dot{x} = -V'(x), \quad x(0) = x_0.
\]

In this case the potential becomes a Lyapunov function:

\[
\frac{d}{dt} V(x(t)) = V'(x)\, \frac{dx}{dt} = -(V'(x))^2 \leq 0.
\]

Hence, depending on the initial condition the particle will converge either to a or c. The particle

cannot escape from either state of local stability.

On the other hand, at high temperatures the particle does not "see" the potential barrier: it essentially jumps freely from one local minimum to another.

To get a better understanding of the dependence of the dynamics on the depth of the potential

barrier relative to temperature, we solve the equation of motion (10.3) numerically. In Figure we

present the time series of the particle position. We observe that at small temperatures the particle

spends most of its time around x = ±1 with rapid transitions from −1 to 1 and back.
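A minimal sketch of such a numerical experiment is given below. It assumes the Euler–Maruyama discretization of the overdamped dynamics $\dot{x} = -V'(x) + \sqrt{2 k_B T}\,\dot{\beta}$ with the quartic potential (9.2); the scheme, the step size, the seed and the helper `count_transitions` are illustrative choices and are not prescribed by the text.

```python
import math
import random

def dV(x):
    """Derivative of V(x) = x^4/4 - x^2/2 + 1/4."""
    return x ** 3 - x

def simulate(kT, x0=-1.0, dt=1e-3, n_steps=200_000, seed=0):
    """Euler-Maruyama path of dx = -V'(x) dt + sqrt(2 kT) dB."""
    rng = random.Random(seed)
    amp = math.sqrt(2.0 * kT * dt)   # standard deviation of one noise increment
    x = x0
    path = [x]
    for _ in range(n_steps):
        x += -dV(x) * dt + amp * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def count_transitions(path, threshold=0.8):
    """Count switches between the wells near -1 and +1 (with hysteresis)."""
    side = -1 if path[0] < 0 else 1
    n = 0
    for x in path:
        if side < 0 and x > threshold:
            side, n = 1, n + 1
        elif side > 0 and x < -threshold:
            side, n = -1, n + 1
    return n

for kT in (0.05, 0.25, 1.0):
    print(f"kT = {kT:4.2f}   transitions = {count_transitions(simulate(kT))}")
```

With the barrier height $1/4$, one expects few or no transitions over this time horizon when $k_B T$ is much smaller than $1/4$ and frequent transitions when $k_B T$ is comparable to or larger than the barrier; the precise counts depend on the seed.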

9.3 The Mean First Passage Time

The Arrhenius-type factor in the formula for the reaction rate, eqn. (9.3), is intuitive and it was

observed experimentally in the late nineteenth century by Arrhenius and others. What is

extremely important both from a theoretical and an applied point of view is the calculation of the

prefactor ν, the rate coefficient. A systematic approach for the calculation of the rate coefficient,

as well as the justification of the Arrhenius kinetics, is that of the mean first passage time method

(MFPT). Since this method is of independent interest and is useful in various other contexts, we

will present it in a quite general setting and apply it to the problem of the escape from a potential

barrier in later sections. We will first treat the one dimensional problem and then extend the theory

to arbitrary finite dimensions.

We will restrict ourselves to the case of homogeneous Markov processes. It is not very easy to

extend the method to non-Markovian processes.

9.3.1 The Boundary Value Problem for the MFPT

Let $X_t$ be a continuous time diffusion process on $\mathbb{R}^d$ whose evolution is governed by the SDE

\[
dX_t^x = b(X_t^x)\, dt + \sigma(X_t^x)\, dW_t, \quad X_0^x = x. \tag{9.4}
\]

Let $D$ be a bounded subset of $\mathbb{R}^d$ with smooth boundary. Given $x \in D$, we want to know how long it takes for the process $X_t$ to leave the domain $D$ for the first time:

\[
\tau_D^x = \inf\{ t \geq 0 : X_t^x \notin D \}.
\]

Clearly, this is a random variable. The average of this random variable is called the mean first passage time (MFPT) or the first exit time:

\[
\tau(x) := \mathbb{E}\, \tau_D^x.
\]

We can calculate the MFPT by solving an appropriate boundary value problem.

Theorem 9.3.1. The MFPT is the solution of the boundary value problem

−Lτ = 1, x ∈ D, (9.5a)

τ = 0, x ∈ ∂D, (9.5b)

where L is the generator of the SDE (9.4).

The homogeneous Dirichlet boundary conditions correspond to an absorbing boundary: the

particles are removed when they reach the boundary. Other choices of boundary conditions are

also possible. The rigorous proof of Theorem 9.3.1 is based on Itô's formula.

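Proof. Let $\rho(X, x, t)$ be the probability distribution of the particles that have not left the domain $D$ at time $t$. It solves the Fokker–Planck equation with absorbing boundary conditions:

\[
\frac{\partial \rho}{\partial t} = \mathcal{L}^* \rho, \quad \rho(X, x, 0) = \delta(X - x), \quad \rho|_{\partial D} = 0. \tag{9.6}
\]

We can write the solution to this equation in the form

\[
\rho(X, x, t) = e^{\mathcal{L}^* t} \delta(X - x),
\]

where the absorbing boundary conditions are included in the definition of the semigroup $e^{\mathcal{L}^* t}$. The homogeneous Dirichlet (absorbing) boundary conditions imply that

\[
\lim_{t \to +\infty} \rho(X, x, t) = 0.
\]

That is: all particles will eventually leave the domain. The (normalized) number of particles that are still inside $D$ at time $t$ is

\[
S(x, t) = \int_D \rho(X, x, t)\, dX.
\]

Notice that this is a decreasing function of time. We can write

\[
\frac{\partial S}{\partial t} = -f(x, t),
\]

where $f(x, t)$ is the first passage time distribution. The MFPT is the first moment of the distribution $f(x, t)$:

\[
\begin{aligned}
\tau(x) &= \int_0^{+\infty} f(x, s)\, s\, ds = \int_0^{+\infty} -\frac{dS}{ds}\, s\, ds \\
&= \int_0^{+\infty} S(x, s)\, ds = \int_0^{+\infty} \int_D \rho(X, x, s)\, dX\, ds \\
&= \int_0^{+\infty} \int_D e^{\mathcal{L}^* s} \delta(X - x)\, dX\, ds \\
&= \int_0^{+\infty} \int_D \delta(X - x) \big(e^{\mathcal{L} s} 1\big)(X)\, dX\, ds = \int_0^{+\infty} \big(e^{\mathcal{L} s} 1\big)(x)\, ds.
\end{aligned}
\]

We apply $\mathcal{L}$ to the above equation to deduce:

\[
\mathcal{L}\tau = \int_0^{+\infty} \big(\mathcal{L} e^{\mathcal{L} t} 1\big)\, dt = \int_0^{+\infty} \frac{d}{dt} \big(e^{\mathcal{L} t} 1\big)\, dt = -1.
\]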

9.3.2 Examples

In this section we consider a few simple examples for which we can calculate the mean first passage

time in closed form.

Brownian motion with one absorbing and one reflecting boundary.

We consider the problem of Brownian motion moving in the interval $[a, b]$. We assume that the left boundary is absorbing and the right boundary is reflecting. The boundary value problem for the MFPT becomes

\[
-\frac{d^2 \tau}{dx^2} = 1, \quad \tau(a) = 0, \quad \frac{d\tau}{dx}(b) = 0. \tag{9.7}
\]

The solution of this boundary value problem is

\[
\tau(x) = -\frac{x^2}{2} + bx + a\left(\frac{a}{2} - b\right).
\]

The MFPT for Brownian motion with one absorbing and one reflecting boundary in the interval $[-1, 1]$ is plotted in Figure 9.1.

Figure 9.1: The mean first passage time for Brownian motion with one absorbing and one reflecting boundary.

Brownian motion with two absorbing boundaries.

Consider again the problem of Brownian motion moving in the interval $[a, b]$, but now with both boundaries being absorbing. The boundary value problem for the MFPT becomes

\[
-\frac{d^2 \tau}{dx^2} = 1, \quad \tau(a) = 0, \quad \tau(b) = 0. \tag{9.8}
\]

The solution of this boundary value problem is

\[
\tau(x) = \frac{1}{2}(x - a)(b - x).
\]

The MFPT for Brownian motion with two absorbing boundaries in the interval $[-1, 1]$ is plotted in Figure 9.2.
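The two boundary value problems above can be checked numerically. The sketch below discretizes $-\tau'' = 1$ with second-order central differences and solves the resulting tridiagonal system with the Thomas algorithm; the function names, the ghost-point treatment of the reflecting boundary and the grid size are illustrative assumptions. The reference solutions are obtained by integrating $\tau'' = -1$ twice and imposing the boundary conditions: $\tau(x) = -x^2/2 + bx + a(a/2 - b)$ for the absorbing/reflecting case and $\tau(x) = (x - a)(b - x)/2$ for two absorbing boundaries.

```python
def solve_mfpt(a, b, n, right="reflecting"):
    """Finite-difference solution of -tau'' = 1 on [a, b] with tau(a) = 0.

    right="reflecting": tau'(b) = 0;  right="absorbing": tau(b) = 0.
    Returns tau at the n + 1 grid points a, a + h, ..., b.
    """
    h = (b - a) / n
    # Unknowns tau_1..tau_n; row i encodes -tau_{i-1} + 2 tau_i - tau_{i+1} = h^2.
    sub = [-1.0] * n
    diag = [2.0] * n
    sup = [-1.0] * n
    rhs = [h * h] * n
    if right == "absorbing":
        sub[-1], diag[-1], rhs[-1] = 0.0, 1.0, 0.0   # tau_n = 0
    else:
        sub[-1] = -2.0   # ghost point tau_{n+1} = tau_{n-1} enforces tau'(b) = 0
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    tau = [0.0] * n
    tau[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        tau[i] = (rhs[i] - sup[i] * tau[i + 1]) / diag[i]
    return [0.0] + tau

def tau_reflecting_exact(x, a, b):
    return -0.5 * x * x + b * x + a * (0.5 * a - b)

def tau_absorbing_exact(x, a, b):
    return 0.5 * (x - a) * (b - x)
```

Because both exact solutions are quadratic polynomials, the central difference scheme reproduces them up to rounding error, which makes this a convenient sanity check before moving to diffusions with nonconstant coefficients.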

The Mean First Passage Time for a One-Dimensional Diffusion Process

Consider now the mean exit time problem from an interval [a, b] for a general one-dimensional

diffusion process with generator

\[
\mathcal{L} = a(x) \frac{d}{dx} + \frac{1}{2} b(x) \frac{d^2}{dx^2},
\]

where the drift and diffusion coefficients are smooth functions and where the diffusion coefficient $b(x)$ is a strictly positive function (uniform ellipticity condition). In order to calculate the mean first passage time we need to solve the differential equation

\[
-\left( a(x) \frac{d}{dx} + \frac{1}{2} b(x) \frac{d^2}{dx^2} \right) \tau = 1, \tag{9.9}
\]

together with appropriate boundary conditions, depending on whether we have one absorbing and one reflecting boundary or two absorbing boundaries. To solve this equation we first define the function $\psi(x)$ through $\psi'(x) = 2a(x)/b(x)$ to write (9.9) in the form

\[
\big( e^{\psi(x)} \tau'(x) \big)' = -\frac{2}{b(x)}\, e^{\psi(x)}.
\]

The general solution of (9.9) is obtained after two integrations:

\[
\tau(x) = -2 \int_a^x e^{-\psi(z)}\, dz \int_a^z \frac{e^{\psi(y)}}{b(y)}\, dy + c_1 \int_a^x e^{-\psi(y)}\, dy + c_2,
\]

where the constants $c_1$ and $c_2$ are to be determined from the boundary conditions. When both boundaries are absorbing, $\tau(a) = 0$ gives $c_2 = 0$ and $\tau(b) = 0$ gives $c_1 = 2\widehat{Z}/Z$, so that

\[
\tau(x) = -2 \int_a^x e^{-\psi(z)}\, dz \int_a^z \frac{e^{\psi(y)}}{b(y)}\, dy + \frac{2\widehat{Z}}{Z} \int_a^x e^{-\psi(y)}\, dy, \tag{9.10}
\]

with

\[
Z = \int_a^b e^{-\psi(y)}\, dy, \quad \widehat{Z} = \int_a^b e^{-\psi(z)}\, dz \int_a^z \frac{e^{\psi(y)}}{b(y)}\, dy.
\]

Figure 9.2: The mean first passage time for Brownian motion with two absorbing boundaries.

9.4 Escape from a Potential Barrier

In this section we use the theory developed in the previous section to study the long time/small temperature asymptotics of solutions to the Langevin equation for a particle moving in a one–dimensional potential of the form (9.2):

\[
\ddot{x} = -V'(x) - \gamma \dot{x} + \sqrt{2\gamma k_B T}\, \dot{W}. \tag{9.11}
\]

In particular, we justify the Arrhenius formula for the reaction rate

\[
\kappa = \nu(\gamma) \exp(-\beta \Delta E)
\]

and we calculate the escape rate $\nu = \nu(\gamma)$. In particular, we analyze the dependence of the escape rate on the friction coefficient. We will see that we need to distinguish between the cases of large and small friction coefficients.

9.4.1 Calculation of the Reaction Rate in the Overdamped Regime

We consider the Langevin equation (9.11) in the limit of large friction. As we saw in Section 8.4, in the overdamped limit $\gamma \gg 1$, the solution to (9.11) can be approximated by the solution to the Smoluchowski equation (10.3)

\[
\dot{x} = -V'(x) + \sqrt{2\beta^{-1}}\, \dot{W}.
\]

We want to calculate the rate of escape from the potential barrier in this case. We assume that the particle is initially at $x_0$ which is near $a$, the left potential minimum. Consider the boundary value problem for the MFPT of the one dimensional diffusion process (10.3) from the interval $(a, b)$:

\[
-\beta^{-1} e^{\beta V} \partial_x \big( e^{-\beta V} \partial_x \tau \big) = 1. \tag{9.12}
\]

We choose a reflecting boundary condition at $x = a$ and an absorbing boundary condition at $x = b$. We can solve (9.12) with these boundary conditions by quadratures:

\[
\tau(x) = \beta \int_x^b dy\, e^{\beta V(y)} \int_a^y dz\, e^{-\beta V(z)}. \tag{9.13}
\]

Now we can solve the problem of the escape from a potential well: the reflecting boundary is at $x = a$, the left local minimum of the potential, and the absorbing boundary is at $x = b$, the local maximum. We can replace the boundary condition at $x = a$ by a repelling boundary condition at $x = -\infty$:

\[
\tau(x) = \beta \int_x^b dy\, e^{\beta V(y)} \int_{-\infty}^y dz\, e^{-\beta V(z)}.
\]

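When $E_b \beta \gg 1$, where $E_b := V(b) - V(a)$ is the height of the potential barrier, the integral with respect to $z$ is dominated by the value of the potential near $a$. Furthermore, we can replace the upper limit of integration by $\infty$:

\[
\int_{-\infty}^{y} \exp(-\beta V(z))\, dz \approx \int_{-\infty}^{+\infty} \exp(-\beta V(a)) \exp\left( -\frac{\beta \omega_0^2}{2} (z - a)^2 \right) dz = \exp(-\beta V(a)) \sqrt{\frac{2\pi}{\beta \omega_0^2}},
\]

where we have used the Taylor series expansion around the minimum:

\[
V(z) = V(a) + \frac{1}{2} \omega_0^2 (z - a)^2 + \dots
\]

Similarly, the integral with respect to $y$ is dominated by the value of the potential around the saddle point (here the local maximum $b$). We use the Taylor series expansion

\[
V(y) = V(b) - \frac{1}{2} \omega_b^2 (y - b)^2 + \dots
\]

Assuming that $x$ is close to $a$, the minimum of the potential, we can replace the lower limit of integration by $-\infty$. We finally obtain

\[
\int_x^b \exp(\beta V(y))\, dy \approx \int_{-\infty}^{b} \exp(\beta V(b)) \exp\left( -\frac{\beta \omega_b^2}{2} (y - b)^2 \right) dy = \frac{1}{2} \exp(\beta V(b)) \sqrt{\frac{2\pi}{\beta \omega_b^2}}.
\]

Putting everything together we obtain a formula for the MFPT:

\[
\tau(x) = \frac{\pi}{\omega_0 \omega_b} \exp(\beta E_b).
\]

The rate of arrival at $b$ is $1/\tau$. Only half of the particles escape. Consequently, the escape rate (or reaction rate) is given by $\frac{1}{2\tau}$:

\[
\kappa = \frac{\omega_0 \omega_b}{2\pi} \exp(-\beta E_b).
\]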
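The quality of the Laplace approximation can be examined numerically. The sketch below evaluates the double integral $\tau(x) = \beta \int_x^b e^{\beta V(y)} \int_{-\infty}^{y} e^{-\beta V(z)}\, dz\, dy$ with the trapezoidal rule for the quartic potential (9.2), for which $a = -1$, $b = 0$, $\omega_0^2 = V''(-1) = 2$, $\omega_b^2 = -V''(0) = 1$ and $E_b = 1/4$, and compares it with $\pi/(\omega_0 \omega_b)\, e^{\beta E_b}$. The truncation of the lower limit at $-4$, the grid size and the function names are assumptions made for this illustration.

```python
import math

def V(x):
    return 0.25 * x ** 4 - 0.5 * x ** 2 + 0.25

def mfpt_quadrature(beta, x_start=-1.0, b=0.0, lower=-4.0, n=8000):
    """Trapezoidal evaluation of the MFPT double integral, started at x_start."""
    h = (b - lower) / n
    grid = [lower + i * h for i in range(n + 1)]
    w = [math.exp(-beta * V(z)) for z in grid]
    # inner[i] approximates int_{lower}^{grid[i]} exp(-beta V(z)) dz
    inner = [0.0]
    for i in range(1, n + 1):
        inner.append(inner[-1] + 0.5 * h * (w[i - 1] + w[i]))
    total = 0.0
    for i in range(n):
        if grid[i] < x_start - 1e-12:
            continue   # the outer integral runs over [x_start, b] only
        f0 = math.exp(beta * V(grid[i])) * inner[i]
        f1 = math.exp(beta * V(grid[i + 1])) * inner[i + 1]
        total += 0.5 * h * (f0 + f1)
    return beta * total

def mfpt_kramers(beta, omega0=math.sqrt(2.0), omegab=1.0, Eb=0.25):
    """Laplace (small temperature) approximation of the MFPT."""
    return math.pi / (omega0 * omegab) * math.exp(beta * Eb)

for beta in (10.0, 20.0, 40.0):
    ratio = mfpt_quadrature(beta) / mfpt_kramers(beta)
    print(f"beta = {beta:5.1f}   quadrature / asymptotic = {ratio:.4f}")
```

The ratio should approach one as $\beta$ increases, since the neglected terms in the two Taylor expansions contribute relative corrections of order $1/\beta$.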

9.4.2 The Intermediate Regime: γ = O(1)

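• Consider now the problem of escape from a potential well for the Langevin equation

\[
\ddot{q} = -\partial_q V(q) - \gamma \dot{q} + \sqrt{2\gamma\beta^{-1}}\, \dot{W}. \tag{9.14}
\]

• The reaction rate depends on the friction coefficient and the temperature. In the overdamped limit ($\gamma \gg 1$) we retrieve (??), appropriately rescaled with $\gamma$:

\[
\kappa = \frac{\omega_0 \omega_b}{2\pi\gamma} \exp(-\beta E_b). \tag{9.15}
\]

• We can also obtain a formula for the reaction rate for $\gamma = O(1)$:

\[
\kappa = \frac{\sqrt{\frac{\gamma^2}{4} + \omega_b^2} - \frac{\gamma}{2}}{\omega_b}\, \frac{\omega_0}{2\pi} \exp(-\beta E_b). \tag{9.16}
\]

• Naturally, in the limit as $\gamma \to +\infty$ (9.16) reduces to (9.15).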
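The large friction limit can be checked with a few lines of code. The sketch assumes the standard form of Kramers' intermediate-friction prefactor, $\big(\sqrt{\gamma^2/4 + \omega_b^2} - \gamma/2\big)/\omega_b$, and compares the resulting rate with the overdamped rate $\omega_0 \omega_b/(2\pi\gamma)\, e^{-\beta E_b}$; the parameter values and function names are illustrative only.

```python
import math

def kramers_rate(gamma, omega0, omegab, beta, Eb):
    """Reaction rate for gamma = O(1) (Kramers' intermediate-friction formula)."""
    prefactor = (math.sqrt(gamma ** 2 / 4.0 + omegab ** 2) - gamma / 2.0) / omegab
    return prefactor * omega0 / (2.0 * math.pi) * math.exp(-beta * Eb)

def overdamped_rate(gamma, omega0, omegab, beta, Eb):
    """Reaction rate in the overdamped limit, rescaled with gamma."""
    return omega0 * omegab / (2.0 * math.pi * gamma) * math.exp(-beta * Eb)

params = dict(omega0=math.sqrt(2.0), omegab=1.0, beta=20.0, Eb=0.25)
for gamma in (1.0, 10.0, 100.0, 1000.0):
    ratio = kramers_rate(gamma, **params) / overdamped_rate(gamma, **params)
    print(f"gamma = {gamma:7.1f}   ratio = {ratio:.6f}")
```

Expanding the square root for large $\gamma$ gives $\sqrt{\gamma^2/4 + \omega_b^2} - \gamma/2 = \omega_b^2/\gamma + O(\gamma^{-3})$, so the printed ratio tends to one.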

9.4.3 Calculation of the Reaction Rate in the energy-diffusion-limited regime

In order to calculate the reaction rate in the underdamped or energy-diffusion-limited regime $\gamma \ll 1$ we need to study the diffusion process for the energy, (8.69) or (8.70). The result is

\[
\kappa = \gamma \beta I(E_b)\, \frac{\omega_0}{2\pi}\, e^{-\beta E_b}, \tag{9.17}
\]

where $I(E_b)$ denotes the action evaluated at $b$.

9.5 Discussion and Bibliography


The calculation of reaction rates and the stochastic modeling of chemical reactions have been a very active area of research since the 1930s. One of the first methods that were developed was that of transition state theory. Kramers developed his theory in his celebrated paper [49]. In this chapter we have based our approach on the calculation of the mean first passage time. Our analysis is based mostly on [35, Ch. 5, Ch. 9], [96, Ch. 4] and the excellent review article [41]. We highly recommend this review article for further information on reaction rate theory. See also [40] and the review article of Melnikov (1991). A formula for the escape rate which is valid for all values of the friction coefficient was obtained by Melnikov and Meshkov in 1986, J. Chem. Phys 85(2) 1018-1027. This formula requires the calculation of integrals and it reduces to (9.15) and (9.17) in the overdamped and underdamped limits, respectively.

There are many applications of interest where it is important to calculate reaction rates for

non-Markovian Langevin equations of the form

\[
\ddot{x} = -V'(x) - \int_0^t \gamma(t - s)\, \dot{x}(s)\, ds + \xi(t), \tag{9.18a}
\]

\[
\langle \xi(t)\xi(0) \rangle = k_B T M^{-1} \gamma(t). \tag{9.18b}
\]

We will derive generalized non–Markovian equations of the form (9.18a), together with the

fluctuation–dissipation theorem (11.10), in Chapter 11. The calculation of reaction rates for the

generalized Langevin equation is presented in [40].

The long time/small temperature asymptotics can be studied rigorously by means of the theory

of Freidlin-Wentzell [29]. See also [6]. A related issue is that of the small temperature asymptotics

for the eigenvalues (in particular, the first eigenvalue) of the generator of the Markov process x(t)

which is the solution of

\[
\gamma \dot{x} = -\nabla V(x) + \sqrt{2\gamma k_B T}\, \dot{W}.
\]

The theory of Freidlin and Wentzell has also been extended to infinite dimensional problems. This

is a very important problem in many applications such as micromagnetics...We refer to CITE...

for more details.

A systematic study of the problem of the escape from a potential well was developed by

Matkowsky, Schuss and collaborators [86, 63, 64]. This approach is based on a systematic use of singular perturbation theory. In particular, the calculation of the transition rate which is uniformly valid in the friction coefficient is presented in [64]. This formula is obtained through a careful analysis of the PDE

\[
p\,\partial_q \tau - \partial_q V\, \partial_p \tau + \gamma\big(-p\,\partial_p + k_B T\, \partial_p^2\big)\tau = -1,
\]

for the mean first passage time τ . The PDE is equipped, of course, with the appropriate boundary

conditions. Singular perturbation theory is used to study the small temperature asymptotics of

solutions to the boundary value problem. The formula derived in this paper reduces to the formulas

which are valid at large and small values of the friction coefficient at the appropriate asymptotic

limits.

The study of rare transition events between long lived metastable states is a key feature in

many systems in physics, chemistry and biology. Rare transition events play an important role,

for example, in the analysis of the transition between different conformation states of biological

macromolecules such as DNA [87]. The study of rare events is one of the most active research areas in applied stochastic processes. Recent developments in this area involve the transition path theory of W. E and Vanden Eijnden. Various simple applications of this theory are presented in Metzner, Schutte et al 2006. As in the mean first passage time approach, transition path theory is also based on the solution of an appropriate boundary value problem for the so-called committor function.

9.6 Exercises


Chapter 10

Stochastic Resonance and Brownian Motors

10.1 Introduction

10.2 Stochastic Resonance

10.3 Brownian Motors

10.4 Introduction

Particle transport in spatially periodic, noisy systems has attracted considerable attention over the

last decades, see e.g. [82, Ch. 11], [78] and the references therein. There are various physical

systems where Brownian motion in periodic potentials plays a prominent role, such as Josephson

junctions [3], surface diffusion [52, 84] and superionic conductors [33]. When the system of a Brownian particle in a periodic potential is kept away from equilibrium by an external, deterministic or random, force, detailed balance does not hold. Consequently, and in the absence of any spatial symmetry, a net particle current will appear, without any violation of the second law of thermodynamics. It was this fundamental observation [60] that led to a revival of interest in the problem of particle transport in periodic potentials with broken spatial symmetry. These types of non–equilibrium systems, which are often called Brownian motors or ratchets, have found new and exciting applications, e.g. as the basis of theoretical models for various intracellular transport

processes such as molecular motors [9]. Furthermore, various experimental methods for particle

separation have been suggested which are based on the theory of Brownian motors [7].

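The long time behavior of a Brownian particle in a periodic potential is determined uniquely by the effective drift and the effective diffusion tensor which are defined, respectively, as

\[
U_{\mathrm{eff}} = \lim_{t \to \infty} \frac{\langle x(t) - x(0) \rangle}{t} \tag{10.1}
\]

and

\[
D_{\mathrm{eff}} = \lim_{t \to \infty} \frac{\big\langle (x(t) - \langle x(t) \rangle) \otimes (x(t) - \langle x(t) \rangle) \big\rangle}{2t}. \tag{10.2}
\]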

Here x(t) denotes the particle position, 〈·〉 denotes ensemble average and ⊗ stands for the tensor

product. Indeed, an argument based on the central limit theorem [5, Ch. 3], [47] implies that at

long times the particle performs an effective Brownian motion which is a Gaussian process, and

hence the first two moments are sufficient to determine the process uniquely. The main goal of all

theoretical investigations of noisy, non–equilibrium particle transport is the calculation of (10.1)

and (10.2). One wishes, in particular, to analyze the dependence of these two quantities on the

various parameters of the problem, such as the friction coefficient, the temperature and the particle

mass.
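Definitions (10.1) and (10.2) translate directly into Monte Carlo estimators: simulate an ensemble of independent particles up to a large time $t$, then divide the ensemble mean displacement by $t$ and the ensemble variance by $2t$. The sketch below does this in one dimension for an overdamped particle $\dot{x} = F(x) + \sqrt{2D}\,\xi$ using the Euler–Maruyama scheme; the function name, the scheme and all numerical parameters are assumptions made for the illustration.

```python
import math
import random

def estimate_drift_and_diffusion(force, D, t_final=5.0, dt=0.01,
                                 n_particles=1000, seed=1):
    """Ensemble estimates of U_eff and D_eff for dx = force(x) dt + sqrt(2 D) dW."""
    rng = random.Random(seed)
    n_steps = int(round(t_final / dt))
    amp = math.sqrt(2.0 * D * dt)
    displacements = []
    for _ in range(n_particles):
        x = 0.0                      # every particle starts at x(0) = 0
        for _ in range(n_steps):
            x += force(x) * dt + amp * rng.gauss(0.0, 1.0)
        displacements.append(x)
    t = n_steps * dt
    mean = sum(displacements) / n_particles
    var = sum((d - mean) ** 2 for d in displacements) / n_particles
    return mean / t, var / (2.0 * t)   # finite-time versions of (10.1), (10.2)
```

For a constant force, `estimate_drift_and_diffusion(lambda x: 1.0, 0.5)` should return values close to the exact answers $U_{\mathrm{eff}} = 1$ and $D_{\mathrm{eff}} = 0.5$, up to sampling error. With a periodic force one passes, for example, `lambda x: -math.sin(x) + 0.5` and increases `t_final` until the estimates stabilize.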

Enormous theoretical effort has been put into the study of Brownian ratchets and, more gen-

erally, of Brownian particles in spatially periodic potentials [78]. The vast majority of all these

theoretical investigations is concerned with the calculation of the effective drift for one dimen-

sional models. This is not surprising, since the theoretical tools that are currently available are

not sufficient for the analytical treatment of the multi–dimensional problem. This is only possible

when the potential and/or noise are such that the problem can be reduced to a one dimensional one

[19]. For more general multi–dimensional problems one has to resort to numerical simulations.

There are various applications, however, where the one dimensional analysis is inadequate. As an

example we mention the technique for separation of macromolecules in microfabricated sieves that

was proposed in [14]. In the two–dimensional setting considered in this paper, an appropriately

chosen driving force in the y direction produces a constant drift in the x direction, but with a zero

net velocity in the y direction. On the other hand, a force in the x direction produces no drift in the

y direction. The theoretical analysis of this problem requires new technical tools.

Furthermore, the number of theoretical studies related to the calculation of the effective diffu-

sion tensor has also been scarce [34, 55, 79, 80, 85]. In these papers, relatively simple potentials

and/or forcing terms are considered, such as tilting periodic potentials or simple periodic in time

forcing. It is widely recognized that the calculation of the effective diffusion coefficient is tech-

nically more demanding than that of the effective drift. Indeed, as we will show in this paper, it


requires the solution of a Poisson equation, in addition to the solution of the stationary Fokker–

Planck equation which is sufficient for the calculation of the effective drift. Diffusive, rather than

directed, transport can be potentially extremely important in the design of experimental setups for

particle selection [78, Sec 5.11] [85]. It is therefore desirable to develop systematic tools for the

calculation of the effective diffusion coefficient (or tensor, in the multi–dimensional setting).

From a mathematical point of view, non–equilibrium systems which are subject to unbiased

noise can be modelled as non–reversible Markov processes [76] and can be expressed in terms

of solutions to stochastic differential equations (SDEs). The SDEs which govern the motion of

a Brownian particle in a periodic potential possess inherent length and time scales: those related

to the spatial period of the potential and the temporal period (or correlation time) of the external

driving force. From this point of view the calculation of the effective drift and the effective dif-

fusion coefficient amounts to studying the behavior of solutions to the underlying SDEs at length

and time scales which are much longer than the characteristic scales of the system. A systematic

methodology for studying problems of this type, which is based on scale separation, has been de-

veloped many years ago [5, ?, ?]. The techniques developed in the aforementioned references are

appropriate for the asymptotic analysis of stochastic systems (and Markov processes in particular)

which are spatially and/or temporally periodic. The purpose of this work is to apply these multiscale techniques to the study of Brownian motors in arbitrary dimensions, with particular emphasis on the calculation of the effective diffusion tensor.

The rest of this paper is organized as follows. In section 10.5 we introduce the model that we

will study. In section 10.6 we obtain formulae for the effective drift and the effective diffusion

tensor in the case where all external forces are Markov processes. In section 10.7 we study the

effective diffusion coefficient for a Brownian particle in a periodic potential driven simultaneously

by additive Gaussian white and colored noise. Section ?? is reserved for conclusions. In Appendix

A we derive formulae for the effective drift and the effective diffusion coefficient for the case where

the Brownian particle is driven away from equilibrium by periodic in time external fluctuations.

Finally, in appendix B we use the method developed in this paper to calculate the effective diffusion

coefficient of an overdamped particle in a one dimensional tilted periodic potential.


10.5 The Model

We consider the overdamped d-dimensional stochastic dynamics for a state variable $x(t) \in \mathbb{R}^d$ [78, sec. 3]

\[ \gamma \dot{x}(t) = -\nabla V(x(t), f(t)) + y(t) + \sqrt{2\gamma k_B T}\,\xi(t), \tag{10.3} \]

where $\gamma$ is the friction coefficient, $k_B$ the Boltzmann constant and $T$ denotes the temperature. $\xi(t)$ stands for the standard d-dimensional white noise process, i.e.

\[ \langle \xi_i(t) \rangle = 0 \quad \text{and} \quad \langle \xi_i(t)\xi_j(s) \rangle = \delta_{ij}\,\delta(t-s), \quad i, j = 1, \dots, d. \]

We take $f(t)$ and $y(t)$ to be Markov processes with respective state spaces $E_f$, $E_y$ and generators $\mathcal{L}_f$, $\mathcal{L}_y$. The potential $V(x, f)$ is periodic in $x$ for every $f$, with period $L$ in all spatial directions:

\[ V(x + L e_i, f) = V(x, f), \quad i = 1, \dots, d, \]

where $\{e_i\}_{i=1}^d$ denotes the standard basis of $\mathbb{R}^d$. We will use the notation $Q = [0, L]^d$.

The processes f(t) and y(t) can be continuous in time diffusion processes which are con-

structed as solutions of stochastic differential equations, dichotomous noise [42, Ch. 9], more

general Markov chains etc. The (easier) case where f(t) and y(t) are deterministic, periodic func-

tions of time is treated in the appendix.

For simplicity, we have assumed that the temperature in (10.3) is constant. This assumption
entails no loss of generality, since eqn. (10.3) with a time dependent temperature can
be mapped to an equation with constant temperature and an appropriate effective potential [78, sec.

6]. Thus, the above framework is general enough to encompass most of the models that have been

studied in the literature, such as pulsating, tilting, or temperature ratchets. We remark that the state

variable x(t) does not necessarily denote the position of a Brownian particle. We will, however,

refer to x(t) as the particle position in the sequel.

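The process $\{x(t), f(t), y(t)\}$ in the extended phase space $\mathbb{R}^d \times E_f \times E_y$ is Markovian with generator

\[ \mathcal{L} = F(x, f, y) \cdot \nabla_x + D\Delta_x + \mathcal{L}_f + \mathcal{L}_y, \]

where $D := k_B T/\gamma$ and

\[ F(x, f, y) = \frac{1}{\gamma}\left( -\nabla V(x, f) + y \right). \]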

To this process we can associate the initial value problem for the backward Kolmogorov equation [69, Ch. 8]

\[ \frac{\partial u}{\partial t} = \mathcal{L}u, \quad u(x, y, f, t = 0) = u_{in}(x, y, f), \tag{10.4} \]

which is, of course, the adjoint to the Fokker–Planck equation. Our derivation of formulae for the

effective drift and the effective diffusion tensor is based on singular perturbation analysis of the

initial value problem (10.4).

10.6 Multiscale Analysis

In this section we derive formulae for the effective drift and the effective diffusion tensor for x(t),

the solution of (10.3). Let us outline the basic philosophy behind the derivation of formulae (10.13)

and (10.18). We are interested in the long time, large scale behavior of x(t). For the analysis that

follows it is convenient to introduce a parameter $\varepsilon \ll 1$ which is, in effect, the ratio between the
length scale defined through the period of the potential and a large "macroscopic" length scale at
which the motion of the particle is governed by an effective Brownian motion. The limit $\varepsilon \to 0$

corresponds to the limit of infinite scale separation. The behavior of the system in this limit can be

analysed using singular perturbation theory.

We remark that the calculations of the effective drift and of the effective diffusion tensor are
performed separately, because a different rescaling is needed in each case. This is due to the fact that
advection and diffusion have different characteristic time scales.

10.6.1 Calculation of the Effective Drift

The backward Kolmogorov equation reads

\[ \frac{\partial u(x, y, f, t)}{\partial t} = \left( F(x, f, y) \cdot \nabla_x + D\Delta_x + \mathcal{L}_f + \mathcal{L}_y \right) u(x, y, f, t). \tag{10.5} \]

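We rescale space and time in (10.5) according to

\[ x \to \varepsilon x, \quad t \to \varepsilon t \]

and divide through by $\varepsilon$ to obtain

\[ \frac{\partial u^\varepsilon}{\partial t} = \frac{1}{\varepsilon}\left( F\left(\frac{x}{\varepsilon}, f, y\right) \cdot \nabla_x + \varepsilon D\Delta_x + \mathcal{L}_f + \mathcal{L}_y \right) u^\varepsilon. \tag{10.6} \]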

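We solve (10.6) perturbatively by looking for a solution in the form of a two-scale expansion

\[ u^\varepsilon(x, f, y, t) = u_0\left(x, \frac{x}{\varepsilon}, f, y, t\right) + \varepsilon u_1\left(x, \frac{x}{\varepsilon}, f, y, t\right) + \varepsilon^2 u_2\left(x, \frac{x}{\varepsilon}, f, y, t\right) + \dots \tag{10.7} \]

All terms in the expansion (10.7) are periodic functions of $z = x/\varepsilon$. From the chain rule we have

\[ \nabla_x \to \nabla_x + \frac{1}{\varepsilon}\nabla_z. \tag{10.8} \]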

Notice that we do not take the terms in the expansion (10.7) to depend explicitly on t/ε. This is

because the coefficients of the backward Kolmogorov equation (10.6) do not depend explicitly on

the fast time t/ε. In the case where the fluctuations are periodic, rather than Markovian, in time,

we will need to assume that the terms in the multiscale expansion for uε(x, t) depend explicitly on

t/ε. The details are presented in the appendix.

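We now substitute (10.7) into (10.6), use (10.8) and treat $x$ and $z$ as independent variables.
Upon equating the coefficients of equal powers in $\varepsilon$ we obtain the following sequence of equations

\[ \mathcal{L}_0 u_0 = 0, \tag{10.9} \]
\[ \mathcal{L}_0 u_1 = -\mathcal{L}_1 u_0 + \frac{\partial u_0}{\partial t}, \tag{10.10} \]
\[ \dots = \dots, \]

where

\[ \mathcal{L}_0 = F(z, f, y) \cdot \nabla_z + D\Delta_z + \mathcal{L}_y + \mathcal{L}_f \tag{10.11} \]

and

\[ \mathcal{L}_1 = F(z, f, y) \cdot \nabla_x + 2D\,\nabla_z \cdot \nabla_x. \]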

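The operator $\mathcal{L}_0$ is the generator of a Markov process on $Q \times E_y \times E_f$. In order to proceed
we need to assume that this process is ergodic: there exists a unique stationary solution of the
Fokker–Planck equation

\[ \mathcal{L}_0^* \rho(z, y, f) = 0, \tag{10.12} \]

with

\[ \int_{Q \times E_y \times E_f} \rho(z, y, f)\, dz\, dy\, df = 1 \]

and

\[ \mathcal{L}_0^* \rho = -\nabla_z \cdot \left( F(z, f, y)\rho \right) + D\Delta_z \rho + \mathcal{L}_y^* \rho + \mathcal{L}_f^* \rho. \]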

In the above L∗f and L∗y are the Fokker–Planck operators of f and y, respectively. The stationary

density ρ(z, y, f) satisfies periodic boundary conditions in z and appropriate boundary conditions

in f and y. We emphasize that the ergodicity of the ”fast” process is necessary for the very exis-

tence of an effective drift and an effective diffusion coefficient, and it has been tacitly assumed in

all theoretical investigations concerning Brownian motors [78].

Under the assumption that (10.12) has a unique solution, eqn. (10.9) implies, by the Fredholm
alternative, that $u_0$ is independent of the fast scales:

\[ u_0 = u(x, t). \]

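Eqn. (10.10) now becomes

\[ \mathcal{L}_0 u_1 = \frac{\partial u(x, t)}{\partial t} - F(z, y, f) \cdot \nabla_x u(x, t). \]

In order for this equation to be well posed it is necessary that the right hand side averages to 0
with respect to the invariant distribution $\rho(z, f, y)$. This leads to the following backward Liouville
equation

\[ \frac{\partial u(x, t)}{\partial t} = U_{\mathrm{eff}} \cdot \nabla_x u(x, t), \]

with the effective drift given by

\[ U_{\mathrm{eff}} = \int_{Q \times E_y \times E_f} F(z, y, f)\,\rho(z, y, f)\, dz\, dy\, df = \frac{1}{\gamma} \int_{Q \times E_y \times E_f} \left( -\nabla_z V(z, f) + y \right) \rho(z, y, f)\, dz\, dy\, df. \tag{10.13} \]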

10.6.2 Calculation of the Effective Diffusion Coefficient

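We assume for the moment that the effective drift vanishes, $U_{\mathrm{eff}} = 0$. We perform a diffusive
rescaling in (10.5)

\[ x \to \varepsilon x, \quad t \to \varepsilon^2 t \]

and divide through by $\varepsilon^2$ to obtain

\[ \frac{\partial u^\varepsilon}{\partial t} = \frac{1}{\varepsilon^2}\left( F\left(\frac{x}{\varepsilon}, f, y\right) \cdot \nabla_x + \varepsilon D\Delta_x + \mathcal{L}_f + \mathcal{L}_y \right) u^\varepsilon. \tag{10.14} \]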

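We go through the same analysis as in the previous subsection to obtain the following sequence of
equations:

\[ \mathcal{L}_0 u_0 = 0, \tag{10.15} \]
\[ \mathcal{L}_0 u_1 = -\mathcal{L}_1 u_0, \tag{10.16} \]
\[ \mathcal{L}_0 u_2 = -\mathcal{L}_1 u_1 - \mathcal{L}_2 u_0, \tag{10.17} \]
\[ \dots = \dots, \]

where $\mathcal{L}_0$ and $\mathcal{L}_1$ were defined in the previous subsection and

\[ \mathcal{L}_2 = -\frac{\partial}{\partial t} + D\Delta_x. \]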

Equation (10.15) implies that $u_0 = u(x, t)$. Now (10.16) becomes

\[ \mathcal{L}_0 u_1 = -F(z, y, f) \cdot \nabla_x u(x, t). \]

Since we have assumed that $U_{\mathrm{eff}} = 0$, the right hand side of the above equation averages to zero
with respect to $\rho$, i.e. it is orthogonal to the null space of $\mathcal{L}_0^*$, and this equation is well posed. Its solution is

\[ u_1(x, z, f, y, t) = \chi(z, y, f) \cdot \nabla_x u(x, t), \]

where the auxiliary field $\chi(z, y, f)$ satisfies the Poisson equation

\[ -\mathcal{L}_0 \chi(z, y, f) = F(z, y, f) \]

with periodic boundary conditions in $z$ and appropriate boundary conditions in $y$ and $f$.

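We proceed now with the analysis of equation (10.17). The solvability condition for this equation
reads

\[ \int_{Q \times E_y \times E_f} \left( -\mathcal{L}_1 u_1 - \mathcal{L}_2 u_0 \right) \rho(z, y, f)\, dz\, dy\, df = 0, \]

from which, after some straightforward algebra, we obtain the limiting backward Kolmogorov
equation for $u(x, t)$

\[ \frac{\partial u(x, t)}{\partial t} = \sum_{i,j=1}^{d} D^{\mathrm{eff}}_{ij} \frac{\partial^2 u(x, t)}{\partial x_i \partial x_j}. \]

The effective diffusion tensor is

\[ D^{\mathrm{eff}}_{ij} = D\delta_{ij} + \left\langle F_i(z, y, f)\chi_j(z, y, f) \right\rangle_\rho + 2D \left\langle \frac{\partial \chi_i(z, y, f)}{\partial z_j} \right\rangle_\rho, \tag{10.18} \]

where the notation $\langle \cdot \rangle_\rho$ for the averaging with respect to the invariant density has been introduced.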

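The case where the effective drift does not vanish, $U_{\mathrm{eff}} \neq 0$, can be reduced to the situation
analyzed in this subsection through a Galilean transformation with respect to $U_{\mathrm{eff}}$.¹ The effective
diffusion tensor is now given by

\[ D^{\mathrm{eff}}_{ij} = D\delta_{ij} + \left\langle \left( F_i(z, y, f) - U^i_{\mathrm{eff}} \right) \chi_j(z, y, f) \right\rangle_\rho + 2D \left\langle \frac{\partial \chi_i(z, y, f)}{\partial z_j} \right\rangle_\rho, \tag{10.19} \]

and the field $\chi(z, f, y)$ satisfies the Poisson equation

\[ -\mathcal{L}_0 \chi = F(z, y, f) - U_{\mathrm{eff}}. \tag{10.20} \]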

10.7 Effective Diffusion Coefficient for Correlation Ratchets

In this section we consider the following model [4, 17]

\[ \gamma \dot{x}(t) = -\nabla V(x(t)) + y(t) + \sqrt{2\gamma k_B T}\,\xi(t), \tag{10.21a} \]
\[ \dot{y}(t) = -\frac{1}{\tau} y(t) + \sqrt{\frac{2\sigma}{\tau}}\,\zeta(t), \tag{10.21b} \]

where $\xi(t)$ and $\zeta(t)$ are mutually independent standard d-dimensional white noise processes. The
potential $V(x)$ is assumed to be $L$-periodic in all spatial directions. The process $y(t)$ is the
d-dimensional Ornstein–Uhlenbeck (OU) process [35], which is a mean zero Gaussian process with
correlation function

\[ \langle y_i(t) y_j(s) \rangle = \delta_{ij}\,\sigma\, e^{-\frac{|t-s|}{\tau}}, \quad i, j = 1, \dots, d. \]

¹ In other words, the process $x^{(\varepsilon)}(t) := \varepsilon\left( x(t/\varepsilon^2) - \varepsilon^{-2} U_{\mathrm{eff}}\, t \right)$ converges to a mean zero Gaussian process with
effective diffusivity given by (10.19).

Let $z(t)$ denote the restriction of $x(t)$ to $Q = [0, 2\pi]^d$. The generator of the Markov process
$\{z(t), y(t)\}$ is

\[ \mathcal{L} = \frac{1}{\gamma}\left( -\nabla_z V(z) + y \right) \cdot \nabla_z + D\Delta_z + \frac{1}{\tau}\left( -y \cdot \nabla_y + \sigma\Delta_y \right) \]

with $D := k_B T/\gamma$. Standard results from the ergodic theory of Markov processes, see e.g. [5, ch.
3], ensure that the process $\{z(t), y(t)\} \in Q \times \mathbb{R}^d$ with generator $\mathcal{L}$ is ergodic and that the unique
invariant measure has a smooth density $\rho(y, z)$ with respect to the Lebesgue measure. This is true

even at zero temperature [?, 65]. Hence, the results of section 10.6 apply: the effective drift and

effective diffusion tensor are given by formulae (10.13) and (10.18), respectively. Of course, in

order to calculate these quantities we need to solve equations (10.12) and (10.20) which take the

form:

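\[ -\frac{1}{\gamma}\nabla_z \cdot \left( \left( -\nabla_z V(z) + y \right) \rho(y, z) \right) + D\Delta_z \rho(y, z) + \frac{1}{\tau}\left( \nabla_y \cdot \left( y\rho(y, z) \right) + \sigma\Delta_y \rho(y, z) \right) = 0 \]

and

\[ -\frac{1}{\gamma}\left( -\nabla_z V(z) + y \right) \cdot \nabla_z \chi(y, z) - D\Delta_z \chi(y, z) - \frac{1}{\tau}\left( -y \cdot \nabla_y \chi(y, z) + \sigma\Delta_y \chi(y, z) \right) = \frac{1}{\gamma}\left( -\nabla_z V(z) + y \right) - U_{\mathrm{eff}}. \]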

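The effective diffusion tensor is non-negative definite. To prove this, let $e$ be a unit vector in $\mathbb{R}^d$, define
$f = F \cdot e$, $u = U_{\mathrm{eff}} \cdot e$ and let $\phi := e \cdot \chi$ denote the unique solution of the scalar problem

\[ -\mathcal{L}\phi = (F - U_{\mathrm{eff}}) \cdot e =: f - u, \quad \phi(y, z + L) = \phi(y, z), \quad \langle \phi \rangle_\rho = 0. \]

Let now $h(y, z)$ be a sufficiently smooth function. Elementary computations yield

\[ \mathcal{L}^*(h\rho) = -\rho\mathcal{L}h + 2D\nabla_z \cdot (\rho\nabla_z h) + \frac{2\sigma}{\tau}\nabla_y \cdot (\rho\nabla_y h). \]

We use the above calculation in the formula for the effective diffusion tensor, together with an
integration by parts and the fact that $\langle \phi(y, z) \rangle_\rho = 0$, to obtain

\[ \begin{aligned}
e \cdot D^{\mathrm{eff}} \cdot e &= D + \langle f\phi \rangle_\rho + 2D\langle e \cdot \nabla_z \phi \rangle_\rho \\
&= D + \langle u\phi \rangle_\rho - \langle \phi\mathcal{L}\phi \rangle_\rho + 2D\langle e \cdot \nabla_z \phi \rangle_\rho \\
&= D + D\langle |\nabla_z \phi|^2 \rangle_\rho + 2D\langle e \cdot \nabla_z \phi \rangle_\rho + \frac{\sigma}{\tau}\langle |\nabla_y \phi|^2 \rangle_\rho \\
&= D\langle |e + \nabla_z \phi|^2 \rangle_\rho + \frac{\sigma}{\tau}\langle |\nabla_y \phi|^2 \rangle_\rho.
\end{aligned} \]

From the above formula we see that the effective diffusion tensor is non-negative definite and that
it is well defined even at zero temperature:

\[ e \cdot D^{\mathrm{eff}}(T = 0) \cdot e = \frac{\sigma}{\tau}\langle |\nabla_y \phi(T = 0)|^2 \rangle_\rho. \]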

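Although we cannot solve these equations in closed form, it is possible to calculate the small $\tau$
expansion of the effective drift and the effective diffusion coefficient, at least in one dimension.
Indeed, a tedious calculation using singular perturbation theory, e.g. [42, ?], yields

\[ U_{\mathrm{eff}} = O(\tau^3), \tag{10.22} \]

and

\[ D_{\mathrm{eff}} = \frac{L^2}{Z\widehat{Z}}\left( D + \tau\sigma\left( 1 + \frac{1}{\gamma D^2}\left( \frac{Z_2}{\widehat{Z}} - \frac{Z_1}{Z} \right) \right) \right) + O(\tau^2). \tag{10.23} \]

In writing eqn. (10.23) we have used the following notation

\[ Z = \int_0^L e^{-\frac{V(z)}{D}}\, dz, \quad \widehat{Z} = \int_0^L e^{\frac{V(z)}{D}}\, dz, \]
\[ Z_1 = \int_0^L V(z)\, e^{-\frac{V(z)}{D}}\, dz, \quad Z_2 = \int_0^L V(z)\, e^{\frac{V(z)}{D}}\, dz. \]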

It is relatively straightforward to obtain the next order correction to (10.23); the resulting formula

is, however, too complicated to be of much use.
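The first-order formula can be evaluated numerically with a few lines of code. The sketch below is only illustrative: the function name `deff_small_tau` and the grid size are hypothetical choices, the exponents are read as V(z)/D, and the pairing of Z2 with the integral of exp(+V/D) and of Z1 with the integral of exp(-V/D) is an assumption about how the (garbled) formula (10.23) should be read. Setting tau = 0 gives the value of the effective diffusion coefficient in the absence of the OU forcing.

```python
import numpy as np

def deff_small_tau(V, L, D, tau, sigma, gamma=1.0, n=4096):
    """Evaluate the first-order small-tau formula for D_eff in one dimension.

    Assumption: Z2 is divided by the integral of exp(+V/D) and Z1 by the
    integral of exp(-V/D); this pairing is a reading of formula (10.23).
    """
    z = np.linspace(0.0, L, n, endpoint=False)  # uniform periodic grid on [0, L)
    v = V(z)

    def integral(g):
        # Rectangle rule; very accurate for smooth periodic integrands.
        return L * np.mean(g)

    Z = integral(np.exp(-v / D))
    Z_hat = integral(np.exp(v / D))
    Z1 = integral(v * np.exp(-v / D))
    Z2 = integral(v * np.exp(v / D))
    correction = 1.0 + (Z2 / Z_hat - Z1 / Z) / (gamma * D**2)
    return L**2 / (Z * Z_hat) * (D + tau * sigma * correction)

# Cosine potential with D = sigma = gamma = 1, as in the numerical experiments.
d0 = deff_small_tau(np.cos, 2 * np.pi, D=1.0, tau=0.0, sigma=1.0)
d1 = deff_small_tau(np.cos, 2 * np.pi, D=1.0, tau=0.1, sigma=1.0)
print(d0, d1)
```

For the cosine potential the tau = 0 value reduces to 1/I0(1/D)^2 in units where D = 1, with I0 the modified Bessel function, which gives a quick consistency check on the quadrature.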

The small $\tau$ asymptotics for the effective drift were also studied in [4, 17] for the model
considered in this section and in [16, 92] when the external fluctuations are given by a continuous
time Markov chain. It was shown in [16, 92] that, for the case of dichotomous noise, the small
$\tau$ expansion for $U_{\mathrm{eff}}$ is valid only for sufficiently smooth potentials. Indeed, the first non-zero
term, which is of order $O(\tau^3)$, involves the second derivative of the potential. Non-smooth potentials lead
to an effective drift which is $O(\tau^{5/2})$. By contrast, eqn. (10.23) does not involve any derivatives
of the potential and, hence, is well defined even for non-smooth potentials. On the other
hand, the $O(\tau^2)$ term involves third order derivatives of the potential and can be defined only when
$V(x) \in C^3(0, L)$.

We also remark that the expansion (10.23) is only valid for positive temperatures. The problem

becomes substantially more complicated at zero temperature because the generator of the Markov

process becomes a degenerate differential operator at T = 0.

Naturally, in the limit as $\tau \to 0$ the effective diffusion coefficient converges to its value for
$y \equiv 0$:

\[ D_{\mathrm{eff}} = \frac{L^2 D}{Z\widehat{Z}}. \tag{10.24} \]

This is the effective diffusion coefficient for a Brownian particle moving in a periodic potential, in

the absence of external fluctuations [54, 94]. It is well known, and easy to prove, that the effective
diffusion coefficient given by (10.24) is bounded from above by D. This is not the case for the
effective diffusivity of the correlation ratchet (10.21).

Figure 10.1: Effective diffusivity for (10.21) with V(x) = cos(x) as a function of τ, for σ = 1,
D = kBT/γ = 1, γ = 1. Solid line: results from Monte Carlo simulations. Dashed line: results
from formula (10.23).

We now compare the small $\tau$ asymptotics for the effective diffusion coefficient with Monte
Carlo simulations. The results presented in Figures 10.1 and 10.2 were obtained from the numerical
solution of equations (10.21) using the Euler–Maruyama method, for the cosine potential $V(x) = \cos(x)$.
The integration step that was used was $\Delta t = 10^{-4}$ and the total number of integration
steps was $10^7$. The effective diffusion coefficient was calculated by ensemble averaging over 2000
particle trajectories which were initially uniformly distributed on $[0, 2\pi]$.
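A minimal sketch of such a simulation in one dimension is given below. It is not the code used to produce the figures: the function name `simulate_deff`, the default step size and particle count, the stationary initialisation of y, and the estimator Var(x(t) - x(0)) / (2t) are all assumptions made for illustration. Much smaller defaults than those quoted in the text are used so that the example runs in about a second.

```python
import numpy as np

def simulate_deff(tau, sigma, D=1.0, gamma=1.0, dt=1e-3,
                  n_steps=20000, n_particles=500, seed=0):
    """Euler-Maruyama estimate of the effective diffusivity for (10.21), d = 1.

    Assumptions: V(x) = cos(x), y(0) drawn from the stationary law N(0, sigma),
    and D_eff estimated as Var(x(t) - x(0)) / (2 t) at the final time.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2 * np.pi, n_particles)       # uniform initial positions
    y = rng.normal(0.0, np.sqrt(sigma), n_particles)   # stationary OU start
    x0 = x.copy()
    for _ in range(n_steps):
        xi = rng.standard_normal(n_particles)
        zeta = rng.standard_normal(n_particles)
        # -V'(x) = sin(x) for V(x) = cos(x); noise amplitude sqrt(2 D dt)
        x_new = x + (np.sin(x) + y) / gamma * dt + np.sqrt(2 * D * dt) * xi
        # OU update: dy = -(y / tau) dt + sqrt(2 sigma / tau) dW
        y = y - y / tau * dt + np.sqrt(2 * sigma / tau * dt) * zeta
        x = x_new
    t_final = n_steps * dt
    return np.var(x - x0) / (2 * t_final)

if __name__ == "__main__":
    print(simulate_deff(tau=0.1, sigma=1.0))
```

The estimate carries both a statistical error (finite number of particles) and a transient bias (finite final time), so close agreement with the asymptotic formula should only be expected for much longer runs than the defaults above.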

In Figure 10.1 we present the effective diffusion coefficient as a function of the correlation time $\tau$
of the OU process. We also plot the results of the small $\tau$ asymptotics. The agreement between
theoretical predictions and numerical results is quite satisfactory for $\tau \ll 1$. We also observe that
the effective diffusivity is an increasing function of $\tau$.

In Figure 10.2 we plot the effective diffusivity as a function of the noise strength σ of the OU

process. As expected, the effective diffusivity is an increasing function of σ. The agreement

between the theoretical predictions from (10.23) and the numerical experiments is excellent.

Figure 10.2: Effective diffusivity for (10.21) with V(x) = cos(x) as a function of σ, for τ = 0.1,
D = kBT/γ = 1, γ = 1. Solid line: results from Monte Carlo simulations. Dashed line:
results from formula (10.23).


10.9 Exercises

1. In this appendix we derive formulae for the mean drift and the effective diffusion coefficient for

a Brownian particle which moves according to

\[ \gamma \dot{x}(t) = -\nabla V(x(t), t) + y(t) + \sqrt{2\gamma k_B T(x(t), t)}\,\xi(t), \tag{10.25} \]

for space–time periodic potential V (x, t) and temperature T (x, t) > 0, and periodic in time

force y(t). We take the spatial period to be L in all directions and the temporal period of

V (x, t), T (x, t) and y(t) to be T . We use the notation Q = [0, L]d. Equation (10.25) is inter-

preted in the Ito sense.


Chapter 11

Stochastic Processes and Statistical Mechanics

11.1 Introduction

We will consider some simple ”particle + environment” systems for which we can obtain rigorously

a stochastic equation that describes the dynamics of the Brownian particle.

The dynamics of the Brownian particle/fluid system is described by the Hamiltonian

\[ H(Q_N, P_N; q, p) = H_{BP}(Q_N, P_N) + H_{HB}(q, p) + H_I(Q_N, q), \tag{11.1} \]

where $\{q, p\} := \{\{q_j\}_{j=1}^N, \{p_j\}_{j=1}^N\}$ are the positions and momenta of the fluid particles and $N$ is the
number of fluid (heat bath) particles (we will need to take the thermodynamic limit $N \to +\infty$).
The initial conditions of the Brownian particle are taken to be fixed, whereas the fluid is assumed
to be initially in equilibrium (Gibbs distribution). Our goal is to eliminate the fluid variables $\{q, p\}$
to obtain a closed equation for the Brownian particle. We will see that, in the limit as $N \to +\infty$, this
equation is a stochastic integrodifferential equation, the Generalized Langevin Equation (GLE)

\[ \ddot{Q} = -V'(Q) - \int_0^t R(t - s)\dot{Q}(s)\, ds + F(t), \tag{11.2} \]

where $R(t)$ is the memory kernel and $F(t)$ is the noise. We will also see that, in some appropriate
limit, we can derive the Markovian Langevin equation (9.11).


11.2 The Kac-Zwanzig Model

We need to model the interaction between the heat bath particles and the coupling between the
Brownian particle and the heat bath. The simplest model is that of a harmonic heat bath and of linear
coupling:

\[ H(Q_N, P_N, q, p) = \frac{P_N^2}{2} + V(Q_N) + \sum_{n=1}^{N} \left( \frac{p_n^2}{2m_n} + \frac{1}{2}k_n(q_n - \lambda Q_N)^2 \right). \tag{11.3} \]

The initial conditions of the Brownian particle, $\{Q_N(0), P_N(0)\} := \{Q_0, P_0\}$, are taken to be
deterministic.

The initial conditions of the heat bath particles are distributed according to the Gibbs distribution,
conditional on the knowledge of $\{Q_0, P_0\}$:

\[ \mu_\beta(dp\, dq) = Z^{-1} e^{-\beta H(q, p)}\, dq\, dp, \tag{11.4} \]

where $\beta$ is the inverse temperature. This is a way of introducing the concept of temperature into
the system (through the average kinetic energy of the bath particles). In order to choose the initial
conditions according to $\mu_\beta(dp\, dq)$ we can take

\[ q_n(0) = \lambda Q_0 + \sqrt{\beta^{-1} k_n^{-1}}\,\xi_n, \quad p_n(0) = \sqrt{m_n \beta^{-1}}\,\eta_n, \tag{11.5} \]

where $\{\xi_n\}$ and $\{\eta_n\}$ are mutually independent sequences of i.i.d. $N(0, 1)$ random variables. Notice that
we actually consider the Gibbs measure of an effective (renormalized) Hamiltonian. Other choices
for the initial conditions are possible. For example, we can take $q_n(0) = \sqrt{\beta^{-1} k_n^{-1}}\,\xi_n$. Our choice
of initial conditions ensures that the forcing term in the GLE that we will derive is mean zero (see below).

Hamilton’s equations of motion are:

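\[ \ddot{Q}_N + V'(Q_N) = \sum_{n=1}^{N} k_n\left( \lambda q_n - \lambda^2 Q_N \right), \tag{11.6a} \]
\[ \ddot{q}_n + \omega_n^2\left( q_n - \lambda Q_N \right) = 0, \quad n = 1, \dots, N, \tag{11.6b} \]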

where ω2n = kn/mn. The equations for the heat bath particles are second order linear inhomoge-

neous equations with constant coefficients. Our plan is to solve them and then to substitute the

result in the equations of motion for the Brownian particle. We can solve the equations of motion
for the heat bath variables using the variation of constants formula

\[ q_n(t) = q_n(0)\cos(\omega_n t) + \frac{p_n(0)}{m_n \omega_n}\sin(\omega_n t) + \omega_n \lambda \int_0^t \sin(\omega_n(t - s))\, Q_N(s)\, ds. \]

An integration by parts yields

\[ q_n(t) = q_n(0)\cos(\omega_n t) + \frac{p_n(0)}{m_n \omega_n}\sin(\omega_n t) + \lambda Q_N(t) - \lambda Q_N(0)\cos(\omega_n t) - \lambda \int_0^t \cos(\omega_n(t - s))\, \dot{Q}_N(s)\, ds. \]

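We substitute this into equation (11.6a) and use the initial conditions (11.5) to obtain the Generalized
Langevin Equation

\[ \ddot{Q}_N = -V'(Q_N) - \lambda^2 \int_0^t R_N(t - s)\dot{Q}_N(s)\, ds + \lambda F_N(t), \tag{11.7} \]

where the memory kernel is

\[ R_N(t) = \sum_{n=1}^{N} k_n \cos(\omega_n t) \tag{11.8} \]

and the noise process is

\[ \begin{aligned}
F_N(t) &= \sum_{n=1}^{N} \left( k_n\left( q_n(0) - \lambda Q_0 \right)\cos(\omega_n t) + \frac{k_n p_n(0)}{m_n \omega_n}\sin(\omega_n t) \right) \\
&= \sqrt{\beta^{-1}} \sum_{n=1}^{N} \sqrt{k_n}\left( \xi_n \cos(\omega_n t) + \eta_n \sin(\omega_n t) \right).
\end{aligned} \tag{11.9} \]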

Remarks 11.2.1. i. The dissipative and random terms are related through the fluctuation-dissipation
theorem:

\[ \langle F_N(t) F_N(s) \rangle = \beta^{-1} \sum_{n=1}^{N} k_n\left( \cos(\omega_n t)\cos(\omega_n s) + \sin(\omega_n t)\sin(\omega_n s) \right) = \beta^{-1} R_N(t - s). \tag{11.10} \]

ii. The noise $F_N(t)$ is a mean zero Gaussian process.

iii. The choice of the initial conditions (11.5) for q, p is crucial for the form of the GLE and, in

particular, for the fluctuation-dissipation theorem (11.10) to be valid.

iv. The parameter λ measures the strength of the coupling between the Brownian particle and

the heat bath.

v. By choosing the frequencies ωn and spring constants kn(ω) of the heat bath particles appro-

priately we can pass to the limit as N → +∞ and obtain the GLE with different memory

kernels R(t) and noise processes F (t).
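The fluctuation-dissipation relation (11.10) can be checked empirically by sampling the noise (11.9) many times and comparing its sample covariance with the kernel (11.8). The sketch below does this for a small bath; the helper names `sample_noise` and `memory_kernel`, and the particular frequencies, spring constants and value of β, are arbitrary illustrative choices and not part of the model.

```python
import numpy as np

def sample_noise(k, omega, t, beta, n_samples, rng):
    """Draw n_samples realisations of F_N on the time grid t, following (11.9)."""
    xi = rng.standard_normal((n_samples, k.size))
    eta = rng.standard_normal((n_samples, k.size))
    phase = np.outer(omega, t)                      # shape (N, len(t))
    c = np.sqrt(k)[:, None] * np.cos(phase)
    s = np.sqrt(k)[:, None] * np.sin(phase)
    return (xi @ c + eta @ s) / np.sqrt(beta)       # shape (n_samples, len(t))

def memory_kernel(k, omega, t):
    """R_N(t) = sum_n k_n cos(omega_n t), following (11.8); t is a 1-d array."""
    return np.cos(np.outer(t, omega)) @ k

rng = np.random.default_rng(1)
N, beta = 20, 2.0
omega = rng.uniform(0.5, 3.0, N)          # hypothetical bath frequencies
k = rng.uniform(0.1, 1.0, N) / N          # hypothetical spring constants
t = np.linspace(0.0, 5.0, 11)

F = sample_noise(k, omega, t, beta, n_samples=50000, rng=rng)
cov_empirical = F.T @ F / F.shape[0]      # estimate of <F_N(t_i) F_N(t_j)>
lags = t[:, None] - t[None, :]
cov_exact = memory_kernel(k, omega, lags.ravel()).reshape(lags.shape) / beta
print(np.max(np.abs(cov_empirical - cov_exact)))
```

The printed number is the largest deviation between the sample covariance and β⁻¹ R_N(t − s); it should shrink like the inverse square root of the number of samples.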

Let $a \in (0, 1)$, $2b = 1 - a$ and set $\omega_n = N^a \zeta_n$, where $\{\zeta_n\}_{n=1}^{\infty}$ are i.i.d. with $\zeta_1 \sim U(0, 1)$.
Furthermore, we choose the spring constants according to

\[ k_n = \frac{f^2(\omega_n)}{N^{2b}}, \]

where the function $f(\omega_n)$ decays sufficiently fast at infinity. We can rewrite the dissipation and
noise terms in the form

\[ R_N(t) = \sum_{n=1}^{N} f^2(\omega_n)\cos(\omega_n t)\, \Delta\omega \]

and

\[ F_N(t) = \sqrt{\beta^{-1}} \sum_{n=1}^{N} f(\omega_n)\left( \xi_n \cos(\omega_n t) + \eta_n \sin(\omega_n t) \right)\sqrt{\Delta\omega}, \]

where $\Delta\omega = N^a/N$. Using now properties of Fourier series with random coefficients/frequencies
and of weak convergence of probability measures we can pass to the limit:

\[ R_N(t) \to R(t) \quad \text{in } L^1[0, T], \]

for a.a. $\{\zeta_n\}_{n=1}^{\infty}$ and

\[ F_N(t) \to F(t) \quad \text{weakly in } C([0, T], \mathbb{R}). \]

The time $T > 0$ is finite but arbitrary. The limiting kernel and noise satisfy the fluctuation-dissipation
theorem (11.10):

\[ \langle F(t) F(s) \rangle = \beta^{-1} R(t - s). \tag{11.11} \]

$Q_N(t)$, the solution of (11.7), converges weakly to the solution of the limiting GLE

\[ \ddot{Q} = -V'(Q) - \lambda^2 \int_0^t R(t - s)\dot{Q}(s)\, ds + \lambda F(t). \tag{11.12} \]

The properties of the limiting dissipation and noise are determined by the function $f(\omega)$. As an
example, consider the Lorentzian function

\[ f^2(\omega) = \frac{2\alpha/\pi}{\alpha^2 + \omega^2} \tag{11.13} \]

with $\alpha > 0$. Then

\[ R(t) = e^{-\alpha|t|}. \]
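The convergence of the finite-N kernel to the exponential can be observed numerically. The sketch below draws the random frequencies ω_n = N^a ζ_n, forms the sum Σ f²(ω_n) cos(ω_n t) Δω with Δω = N^a / N, and compares it with exp(−α|t|). The function name `memory_kernel_RN` and the values a = 1/3, α = 1 and N = 10^6 are illustrative assumptions.

```python
import numpy as np

def memory_kernel_RN(t, N, a=1.0 / 3.0, alpha=1.0, seed=0):
    """Finite-N memory kernel for the Lorentzian f^2 with random frequencies.

    omega_n = N**a * zeta_n with zeta_n ~ U(0, 1) and Delta omega = N**a / N.
    """
    rng = np.random.default_rng(seed)
    omega = N**a * rng.uniform(0.0, 1.0, N)
    f2 = (2.0 * alpha / np.pi) / (alpha**2 + omega**2)
    d_omega = N**a / N
    return np.array([np.sum(f2 * np.cos(omega * ti)) * d_omega for ti in t])

if __name__ == "__main__":
    t_grid = np.array([0.0, 0.5, 1.0, 2.0])
    R_N = memory_kernel_RN(t_grid, N=10**6)
    print(np.column_stack([t_grid, R_N, np.exp(-t_grid)]))
```

Two sources of error remain at finite N: the random sampling of the frequencies and the truncation of the frequency range at N^a, so the agreement improves only slowly as N grows.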

The noise process $F(t)$ is a mean zero stationary Gaussian process with continuous paths and,
from (11.11), exponential correlation function:

\[ \langle F(t) F(s) \rangle = \beta^{-1} e^{-\alpha|t - s|}. \]

Hence, $F(t)$ is the stationary Ornstein-Uhlenbeck process:

\[ \frac{dF}{dt} = -\alpha F + \sqrt{2\beta^{-1}\alpha}\, \frac{dW}{dt}, \tag{11.14} \]

with $F(0) \sim N(0, \beta^{-1})$. The GLE (11.12) becomes

\[ \ddot{Q} = -V'(Q) - \lambda^2 \int_0^t e^{-\alpha|t - s|}\dot{Q}(s)\, ds + \lambda F(t), \tag{11.15} \]

where F (t) is the OU process (11.14). Q(t), the solution of the GLE (11.12), is not a Markov

process, i.e. the future is not statistically independent of the past, when conditioned on the present.

The stochastic process Q(t) has memory. We can turn (11.12) into a Markovian SDE by enlarging

the dimension of state space, i.e. introducing auxiliary variables. We might have to introduce

infinitely many variables! For the case of the exponential memory kernel, when the noise is given

by an OU process, it is sufficient to introduce one auxiliary variable. We can rewrite (11.15) as a

system of SDEs:

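\[ \frac{dQ}{dt} = P, \]
\[ \frac{dP}{dt} = -V'(Q) + \lambda Z, \]
\[ \frac{dZ}{dt} = -\alpha Z - \lambda P + \sqrt{2\alpha\beta^{-1}}\, \frac{dW}{dt}, \]

where $Z(0) \sim N(0, \beta^{-1})$.

The process $\{Q(t), P(t), Z(t)\} \in \mathbb{R}^3$ is Markovian.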

It is a degenerate Markov process: noise acts directly only on one of the 3 degrees of freedom.
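The extended Markovian system is straightforward to simulate. The sketch below integrates it with the Euler-Maruyama scheme for the quadratic potential V(Q) = Q²/2, which is an assumption made only for illustration, as are the function name `simulate_extended_gle` and the parameter values. One can check from the generator that the density proportional to exp(−β(P²/2 + V(Q) + Z²/2)) is invariant for this system, so at long times the sample average of P² is expected to be close to β⁻¹, up to discretisation and sampling errors.

```python
import numpy as np

def simulate_extended_gle(lam=1.0, alpha=1.0, beta=1.0, dt=0.01,
                          n_steps=5000, n_paths=4000, seed=0):
    """Euler-Maruyama integration of the (Q, P, Z) system.

    Assumption: V(Q) = Q**2 / 2, so that V'(Q) = Q. Noise enters only
    through the auxiliary variable Z.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros(n_paths)
    P = np.zeros(n_paths)
    Z = rng.normal(0.0, np.sqrt(1.0 / beta), n_paths)   # Z(0) ~ N(0, 1/beta)
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        Q_new = Q + P * dt
        P_new = P + (-Q + lam * Z) * dt
        Z = Z + (-alpha * Z - lam * P) * dt + np.sqrt(2.0 * alpha / beta) * dW
        Q, P = Q_new, P_new
    return Q, P, Z

if __name__ == "__main__":
    Q, P, Z = simulate_extended_gle()
    print(np.mean(P**2), np.mean(Q**2), np.mean(Z**2))
```

Although the noise acts directly only on Z, the coupling λ transfers it to P and then to Q, which is the numerical counterpart of the degeneracy discussed in the text.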

We can eliminate the auxiliary process Z by taking an appropriate distinguished limit.

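Set $\lambda = \sqrt{\gamma}\,\varepsilon^{-1}$, $\alpha = \varepsilon^{-2}$. Equations (11.17) become

\[ \frac{dQ}{dt} = P, \]
\[ \frac{dP}{dt} = -V'(Q) + \frac{\sqrt{\gamma}}{\varepsilon} Z, \]
\[ \frac{dZ}{dt} = -\frac{1}{\varepsilon^2} Z - \frac{\sqrt{\gamma}}{\varepsilon} P + \sqrt{\frac{2\beta^{-1}}{\varepsilon^2}}\, \frac{dW}{dt}. \]

We can use tools from singular perturbation theory for Markov processes to show that, in the limit
as $\varepsilon \to 0$, we have that

\[ \frac{\sqrt{\gamma}}{\varepsilon} Z \to \sqrt{2\gamma\beta^{-1}}\, \frac{dW}{dt} - \gamma P. \]

Thus, in this limit we obtain the Markovian Langevin Equation ($R(t) = \gamma\delta(t)$)

\[ \ddot{Q} = -V'(Q) - \gamma\dot{Q} + \sqrt{2\gamma\beta^{-1}}\, \frac{dW}{dt}. \tag{11.18} \]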

11.3 Quasi-Markovian Stochastic Processes

In the previous section we studied the GLE for the case where the memory kernel decays
exponentially fast. We showed that we can represent the GLE as a Markovian process by adding
one additional variable, the solution of a linear SDE. A natural question which arises is whether
it is always possible to turn the GLE into a Markovian system by adding a finite number of
additional variables. This is not always the case. However, there are many applications where the
memory kernel decays sufficiently fast so that we can approximate the GLE by a finite dimensional
Markovian system.

We introduce the concept of a quasi-Markovian stochastic process.

Definition 11.3.1. We will say that a stochastic process $X_t$ is quasi-Markovian if it can be
represented as a Markovian stochastic process by adding a finite number of additional variables: there
exists a stochastic process $Y_t$ so that $\{X_t, Y_t\}$ is a Markov process.

In many cases the additional variables $Y_t$ can be expressed in terms of solutions to linear SDEs. This is
possible, for example, when the memory kernel consists of a sum of exponential functions, a natural
extension of the case considered in the previous section.

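Proposition 11.3.2. Consider the generalized Langevin equation

\[ \dot{Q} = P, \quad \dot{P} = -V'(Q) - \int_0^t R(t - s) P(s)\, ds + F(t) \tag{11.19} \]

with a memory kernel of the form

\[ R(t) = \sum_{j=1}^{n} \lambda_j^2 e^{-\alpha_j |t|} \tag{11.20} \]

and $F(t)$ being a mean zero stationary Gaussian process, where $R(t)$ and $F(t)$ are related
through the fluctuation-dissipation theorem,

\[ \langle F(t) F(s) \rangle = \beta^{-1} R(t - s). \tag{11.21} \]

Then (11.19) is equivalent to the Markovian SDE

\[ \dot{Q} = P, \quad \dot{P} = -V'(Q) + \sum_{j=1}^{n} \lambda_j u_j, \quad \dot{u}_j = -\alpha_j u_j - \lambda_j P + \sqrt{2\alpha_j \beta^{-1}}\, \dot{W}_j, \quad j = 1, \dots, n, \tag{11.22} \]

with $u_j(0) \sim N(0, \beta^{-1})$ and where $W_j(t)$ are independent standard one dimensional Brownian motions.

Proof. We solve the equations for $u_j$:

\[ \begin{aligned}
u_j &= -\lambda_j \int_0^t e^{-\alpha_j(t - s)} P(s)\, ds + e^{-\alpha_j t} u_j(0) + \sqrt{2\alpha_j \beta^{-1}} \int_0^t e^{-\alpha_j(t - s)}\, dW_j(s) \\
&=: -\int_0^t R_j(t - s) P(s)\, ds + \eta_j(t),
\end{aligned} \]

with $R_j(t) = \lambda_j e^{-\alpha_j |t|}$. We substitute this into the equation for $P$ to obtain

\[ \begin{aligned}
\dot{P} &= -V'(Q) + \sum_{j=1}^{n} \lambda_j u_j \\
&= -V'(Q) + \sum_{j=1}^{n} \lambda_j \left( -\int_0^t R_j(t - s) P(s)\, ds + \eta_j(t) \right) \\
&= -V'(Q) - \int_0^t R(t - s) P(s)\, ds + F(t),
\end{aligned} \]

where $R(t) = \sum_{j=1}^{n} \lambda_j R_j(t)$ is given by (11.20) and the noise process $F(t)$ is

\[ F(t) = \sum_{j=1}^{n} \lambda_j \eta_j(t), \]

with $\eta_j(t)$ being one-dimensional stationary independent OU processes. We readily check that the
fluctuation-dissipation theorem is satisfied:

\[ \begin{aligned}
\langle F(t) F(s) \rangle &= \sum_{i,j=1}^{n} \lambda_i \lambda_j \langle \eta_i(s) \eta_j(t) \rangle \\
&= \beta^{-1} \sum_{i,j=1}^{n} \lambda_i \lambda_j \delta_{ij} e^{-\alpha_i |t - s|} \\
&= \beta^{-1} \sum_{i=1}^{n} \lambda_i^2 e^{-\alpha_i |t - s|} = \beta^{-1} R(t - s).
\end{aligned} \]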

These additional variables are solutions of a linear system of SDEs. This follows from results
in approximation theory. Consider now the case where the memory kernel is a bounded analytic
function. Its Laplace transform

\[ \hat{R}(s) = \int_0^{+\infty} e^{-st} R(t)\, dt \]

can be represented as a continued fraction:

\[ \hat{R}(s) = \cfrac{\Delta_1^2}{s + \gamma_1 + \cfrac{\Delta_2^2}{\ddots}}, \quad \gamma_i > 0. \tag{11.23} \]

Since $R(t)$ is bounded, we have that

\[ \lim_{s \to \infty} \hat{R}(s) = 0. \]

Consider an approximation $R_N(t)$ such that the continued fraction representation terminates after
$N$ steps.

$R_N(t)$ is bounded, which implies that

\[ \lim_{s \to \infty} \hat{R}_N(s) = 0. \]

The Laplace transform of $R_N(t)$ is a rational function:

\[ \hat{R}_N(s) = \frac{\sum_{j=1}^{N} a_j s^{N-j}}{s^N + \sum_{j=1}^{N} b_j s^{N-j}}, \quad a_j, b_j \in \mathbb{R}. \tag{11.24} \]

This is the Laplace transform of the autocorrelation function of an appropriate linear system of
SDEs. Indeed, set

\[ \frac{dx_j}{dt} = -b_j x_j + x_{j+1} + a_j \frac{dW_j}{dt}, \quad j = 1, \dots, N, \tag{11.25} \]

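with $x_{N+1}(t) = 0$. The process $x_1(t)$ is a stationary Gaussian process with autocorrelation function
$R_N(t)$. For $N = 1$ and $b_1 = \alpha$, $a_1 = \sqrt{2\beta^{-1}\alpha}$ we derive the GLE (11.15) with $F(t)$ being the OU
process (11.14). Consider now the case $N = 2$ with $b_i = \alpha_i$, $i = 1, 2$ and $a_1 = 0$, $a_2 = \sqrt{2\beta^{-1}\alpha_2}$.
The GLE becomes

\[ \ddot{Q} = -V'(Q) - \lambda^2 \int_0^t R(t - s)\dot{Q}(s)\, ds + \lambda F_1(t), \]
\[ \dot{F}_1 = -\alpha_1 F_1 + F_2, \]
\[ \dot{F}_2 = -\alpha_2 F_2 + \sqrt{2\beta^{-1}\alpha_2}\, \dot{W}_2, \]

with

\[ \beta^{-1} R(t - s) = \langle F_1(t) F_1(s) \rangle. \]

We can write (11.27) as a Markovian system for the variables $\{Q, P, Z_1, Z_2\}$:

\[ \dot{Q} = P, \]
\[ \dot{P} = -V'(Q) + \lambda Z_1(t), \]
\[ \dot{Z}_1 = -\alpha_1 Z_1 + Z_2, \]
\[ \dot{Z}_2 = -\alpha_2 Z_2 - \lambda P + \sqrt{2\beta^{-1}\alpha_2}\, \dot{W}_2. \]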

Notice that this diffusion process is "more degenerate" than (11.15): noise acts on fewer degrees
of freedom. It is still, however, hypoelliptic (Hörmander's condition is satisfied): there is sufficient
interaction between the degrees of freedom $\{Q, P, Z_1, Z_2\}$ so that noise (and hence regularity)
is transferred from the degrees of freedom that are directly forced by noise to the ones that are
not. The corresponding Markov semigroup has nice regularizing properties, and there exists a smooth
density. Stochastic processes that can be written as a Markovian process by adding a finite number
of additional variables are called quasimarkovian. Under appropriate assumptions on the
potential $V(Q)$ the solution of the GLE is an ergodic process. It is possible to study the
ergodic properties of a quasimarkovian process by analyzing the spectral properties of the generator
of the corresponding Markov process. This leads to the analysis of the spectral properties of
hypoelliptic operators.

11.3.1 Open Classical Systems

When studying the Kac-Zwanzig model we considered a one dimensional Hamiltonian system
coupled to a finite dimensional Hamiltonian system with random initial conditions (the harmonic
heat bath) and then passed to the thermodynamic limit $N \to \infty$. We can instead consider a small
Hamiltonian system coupled to its environment, which we model as an infinite dimensional Hamiltonian
system with random initial conditions. We have a coupled particle-field model. The distinguished
particle (Brownian particle) is described through the Hamiltonian

\[ H_{DP} = \frac{1}{2}p^2 + V(q). \tag{11.28} \]

We will model the environment through a classical linear field theory (i.e. the wave equation) with
infinite energy:

\[ \partial_t^2 \phi(t, x) = \partial_x^2 \phi(t, x). \tag{11.29} \]

The Hamiltonian of this system is

\[ H_{HB}(\phi, \pi) = \int \left( |\partial_x \phi|^2 + |\pi(x)|^2 \right) dx. \tag{11.30} \]

$\pi(x)$ denotes the conjugate momentum field. The initial conditions are distributed according to
the Gibbs measure (which in this case is a Gaussian measure) at inverse temperature $\beta$, which we
formally write as

\[ \text{"}\mu_\beta = Z^{-1} e^{-\beta H(\phi, \pi)}\, d\phi\, d\pi\text{"}. \tag{11.31} \]

Care has to be taken when defining probability measures in infinite dimensions.

Under this assumption on the initial conditions, typical configurations of the heat bath have

infinite energy. In this way, the environment can pump enough energy into the system so that

non-trivial fluctuations emerge. We will assume linear coupling between the particle and the field:

\[ H_I(q, \phi) = q \int \partial_q \phi(x)\rho(x)\, dx, \tag{11.32} \]

where the function $\rho(x)$ models the coupling between the particle and the field. This coupling is
motivated by the dipole coupling approximation from classical electrodynamics. The Hamiltonian
of the particle-field model is

\[ H(q, p, \phi, \pi) = H_{DP}(p, q) + H_{HB}(\phi, \pi) + H_I(q, \phi). \tag{11.33} \]

The corresponding Hamiltonian equations of motion form a coupled system for the particle and
the field. Now we can proceed as in the case of the finite dimensional heat
bath. We can integrate the equations of motion for the heat bath variables and substitute the solution into
the equations for the Brownian particle to obtain the GLE. The final result is

\[ \ddot{q} = -V'(q) - \int_0^t R(t - s)\dot{q}(s)\, ds + F(t), \tag{11.34} \]

with appropriate definitions for the memory kernel and the noise, which are related through the

fluctuation-dissipation theorem.

11.4 The Mori-Zwanzig Formalism

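Consider now the (N + 1)-dimensional Hamiltonian (particle + heat bath) with random initial
conditions. The (N + 1)-particle probability distribution function $f_{N+1}$ satisfies the Liouville equation

\[ \frac{\partial f_{N+1}}{\partial t} + \{f_{N+1}, H\} = 0, \tag{11.35} \]

where $H$ is the full Hamiltonian and $\{\cdot, \cdot\}$ is the Poisson bracket

\[ \{A, B\} = \sum_{j=0}^{N} \left( \frac{\partial A}{\partial q_j}\frac{\partial B}{\partial p_j} - \frac{\partial B}{\partial q_j}\frac{\partial A}{\partial p_j} \right). \]

We introduce the Liouville operator

\[ \mathcal{L}_{N+1}\,\cdot = -i\{\cdot, H\}. \]

The Liouville equation can be written as

\[ i\frac{\partial f_{N+1}}{\partial t} = \mathcal{L}_{N+1} f_{N+1}. \tag{11.36} \]

We want to obtain a closed equation for the distribution function of the Brownian particle. We
introduce a projection operator $P$ which projects onto the distribution function $f$ of the Brownian
particle:

\[ P f_{N+1} = f, \quad (I - P) f_{N+1} = h. \]

The Liouville equation becomes

\[ i\frac{\partial f}{\partial t} = P\mathcal{L}(f + h), \tag{11.37a} \]
\[ i\frac{\partial h}{\partial t} = (I - P)\mathcal{L}(f + h). \tag{11.37b} \]

We integrate the second equation and substitute into the first equation. We obtain

\[ i\frac{\partial f}{\partial t} = P\mathcal{L}f - i\int_0^t P\mathcal{L}\, e^{-i(I - P)\mathcal{L}s}(I - P)\mathcal{L}f(t - s)\, ds + P\mathcal{L}\, e^{-i(I - P)\mathcal{L}t} h(0). \tag{11.38} \]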

In the Markovian limit (large mass ratio) we obtain the Fokker-Planck equation (??).

11.5 Derivation of the Fokker-Planck and Langevin Equations

11.6 Linear Response Theory


The original papers by Kac et al and by Zwanzig are [26, 95]. See also [25]. The variant of

the Kac-Zwanzig model that we have discussed in this chapter was studied in [37]. An excellent

discussion on the derivation of the Fokker-Planck equation using projection operator techniques

can be found in [66].

Applications of linear response theory to climate modeling can be found in.

11.8 Exercises

224

Index

autocorrelation function, 32

Banach space, 16

Brownian motion

scaling and symmetry properties, 42

central limit theorem, 24

conditional expectation, 18

correlation coefficient, 17

covariance function, 32

Diffusion process

mean first passage time, 188

Diffusion processes

reversible, 106

Dirichlet form, 109

equation

Fokker-Planck, 88

kinetic, 116

Klein-Kramers-Chandrasekhar, 137

Langevin, 137

Fokker-Planck, 88

Fokker-Planck equation, 126

Fokker-Planck equation

classical solution of, 89

Gaussian stochastic process, 30

generator, 68, 125

Gibbs distribution, 107

Gibbs measure, 109

Green-Kubo formula, 39

inverse temperature, 100

Ito formula, 125

Joint probability density, 96

Karhunen-Loeve Expansion, 45

Karhunen-Loeve Expansion

for Brownian Motion, 49

kinetic equation, 116

Kolmogorov equation, 126

Langevin equation, 137

law, 13

law of large numbers

strong, 24

Markov Chain Monte Carlo, 111

MCMC, 111

Mean first passage time, 188

Multiplicative noise, 133

operator

hypoelliptic, 137

Ornstein-Uhlenbeck process


Fokker-Planck equation for, 95

partition function, 107

Poincare’s inequality

for Gaussian measures, 101

Poincare’s inequality, 109

Quasimarkovian stochastic process, 221

random variable

Gaussian, 17

uncorrelated, 17

Reversible diffusion, 106

spectral density, 35

stationary process, 31

stationary process

second order stationary, 32

strictly stationary, 31

wide sense stationary, 32

stochastic differential equation, 43

Stochastic Process

quasimarkovian, 221

stochastic process

definition, 29

Gaussian, 30

second-order stationary, 32

stationary, 31

equivalent, 30

stochastic processes

strictly stationary, 31

transport coefficient, 39

Wiener process, 40


Bibliography

[1] L. Arnold. Stochastic differential equations: theory and applications. Wiley-Interscience

[John Wiley & Sons], New York, 1974. Translated from the German.

[2] R. Balescu. Statistical dynamics. Matter out of equilibrium. Imperial College Press, London,

1997.

[3] A. Barone and G. Paterno. Physics and Applications of the Josephson Effect. Wiley, New

York, 1982.

[4] R. Bartussek, P. Reimann, and P. Hanggi. Precise numerics versus theory for correlation

ratchets. Phys. Rev. Lett., 76(7):1166–1169, 1996.

[5] A. Bensoussan, J.-L. Lions, and G. Papanicolaou. Asymptotic analysis for periodic structures,

volume 5 of Studies in Mathematics and its Applications. North-Holland Publishing Co.,

Amsterdam, 1978.

[6] N. Berglund and B. Gentz. Noise-induced phenomena in slow-fast dynamical systems. Probability and its Applications (New York). Springer-Verlag London Ltd., London, 2006. A sample-paths approach.

[7] M. Bier and R.D. Astumian. Biasing Brownian motion in different directions in a 3–state

fluctuating potential and application for the separation of small particles. Phys. Rev. Lett.,

76(22):4277, 1996.

[8] L. Breiman. Probability, volume 7 of Classics in Applied Mathematics. Society for Industrial

and Applied Mathematics (SIAM), Philadelphia, PA, 1992. Corrected reprint of the 1968

original.


[9] C. Bustamante, D. Keller, and G. Oster. The physics of molecular motors. Acc. Chem. Res.,

34:412–420, 2001.

[10] S. Cerrai and M. Freidlin. On the Smoluchowski-Kramers approximation for a system with

an infinite number of degrees of freedom. Probab. Theory Related Fields, 135(3):363–394,

2006.

[11] S. Cerrai and M. Freidlin. Smoluchowski-Kramers approximation for a general class of

SPDEs. J. Evol. Equ., 6(4):657–689, 2006.

[12] S. Chandrasekhar. Stochastic problems in physics and astronomy. Rev. Mod. Phys., 15(1):1–89, Jan 1943.

[13] A.J. Chorin and O.H. Hald. Stochastic tools in mathematics and science, volume 1 of Surveys

and Tutorials in the Applied Mathematical Sciences. Springer, New York, 2006.

[14] I. Derenyi and R.D. Astumian. ac separation of particles by biased Brownian motion in a

two–dimensional sieve. Phys. Rev. E, 58(6):7781–7784, 1998.

[15] W. Dietrich, I. Peschel, and W.R. Schneider. Diffusion in periodic potentials. Z. Phys,

27:177–187, 1977.

[16] C.R. Doering, L. A. Dontcheva, and M.M. Klosek. Constructive role of noise: fast fluctuation

asymptotics of transport in stochastic ratchets. Chaos, 8(3):643–649, 1998.

[17] C.R. Doering, W. Horsthemke, and J. Riordan. Nonequilibrium fluctuation–induced transport. Phys. Rev. Lett., 72(19):2984–2987, 1994.

[18] N. Wax (editor). Selected Papers on Noise and Stochastic Processes. Dover, New York, 1954.

[19] R. Eichhorn and P. Reimann. Paradoxical directed diffusion due to temperature anisotropies.

Europhys. Lett., 69(4):517–523, 2005.

[20] A. Einstein. Investigations on the theory of the Brownian movement. Dover Publications Inc.,

New York, 1956. Edited with notes by R. Furth, Translated by A. D. Cowper.

[21] S.N. Ethier and T.G. Kurtz. Markov processes. Wiley Series in Probability and Mathematical

Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986.


[22] L.C. Evans. Partial Differential Equations. AMS, Providence, Rhode Island, 1998.

[23] W. Feller. An introduction to probability theory and its applications. Vol. I. Third edition.

John Wiley & Sons Inc., New York, 1968.

[24] W. Feller. An introduction to probability theory and its applications. Vol. II. Second edition.

John Wiley & Sons Inc., New York, 1971.

[25] G. W. Ford and M. Kac. On the quantum Langevin equation. J. Statist. Phys., 46(5-6):803–810, 1987.

[26] G. W. Ford, M. Kac, and P. Mazur. Statistical mechanics of assemblies of coupled oscillators.

J. Mathematical Phys., 6:504–515, 1965.

[27] M. Freidlin and M. Weber. A remark on random perturbations of the nonlinear pendulum.

Ann. Appl. Probab., 9(3):611–628, 1999.

[28] M. I. Freidlin and A. D. Wentzell. Random perturbations of Hamiltonian systems. Mem.

Amer. Math. Soc., 109(523):viii+82, 1994.

[29] M.I. Freidlin and A.D. Wentzell. Random perturbations of dynamical systems. Springer-Verlag, New York, 1984.

[30] A. Friedman. Partial differential equations of parabolic type. Prentice-Hall Inc., Englewood

Cliffs, N.J., 1964.

[31] A. Friedman. Stochastic differential equations and applications. Vol. 1. Academic Press

[Harcourt Brace Jovanovich Publishers], New York, 1975. Probability and Mathematical

Statistics, Vol. 28.

[32] A. Friedman. Stochastic differential equations and applications. Vol. 2. Academic Press

[Harcourt Brace Jovanovich Publishers], New York, 1976. Probability and Mathematical

Statistics, Vol. 28.

[33] P. Fulde, L. Pietronero, W. R. Schneider, and S. Strassler. Problem of brownian motion in a

periodic potential. Phys. Rev. Lett., 35(26):1776–1779, 1975.


[34] H. Gang, A. Daffertshofer, and H. Haken. Diffusion in periodically forced Brownian particles

moving in space–periodic potentials. Phys. Rev. Lett., 76(26):4874–4877, 1996.

[35] C. W. Gardiner. Handbook of stochastic methods. Springer-Verlag, Berlin, second edition,

1985. For physics, chemistry and the natural sciences.

[36] I. I. Gikhman and A. V. Skorokhod. Introduction to the theory of random processes. Dover

Publications Inc., Mineola, NY, 1996.

[37] D. Givon, R. Kupferman, and A.M. Stuart. Extracting macroscopic dynamics: model problems and algorithms. Nonlinearity, 17(6):R55–R127, 2004.

[38] M. Hairer and G. A. Pavliotis. From ballistic to diffusive behavior in periodic potentials. J.

Stat. Phys., 131(1):175–202, 2008.

[39] M. Hairer and G.A. Pavliotis. Periodic homogenization for hypoelliptic diffusions. J. Statist.

Phys., 117(1-2):261–279, 2004.

[40] P. Hanggi. Escape from a metastable state. J. Stat. Phys., 42(1/2):105–140, 1986.

[41] P. Hanggi, P. Talkner, and M. Borkovec. Reaction-rate theory: fifty years after Kramers. Rev.

Modern Phys., 62(2):251–341, 1990.

[42] W. Horsthemke and R. Lefever. Noise-induced transitions, volume 15 of Springer Series in

Synergetics. Springer-Verlag, Berlin, 1984. Theory and applications in physics, chemistry,

and biology.

[43] J. Jacod and A.N. Shiryaev. Limit theorems for stochastic processes, volume 288 of

Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical

Sciences]. Springer-Verlag, Berlin, 2003.

[44] F. John. Partial differential equations, volume 1 of Applied Mathematical Sciences. Springer-

Verlag, New York, fourth edition, 1991.

[45] S. Karlin and H. M. Taylor. A second course in stochastic processes. Academic Press Inc.

[Harcourt Brace Jovanovich Publishers], New York, 1981.


[46] S. Karlin and H.M. Taylor. A first course in stochastic processes. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London, 1975.

[47] C. Kipnis and S. R. S. Varadhan. Central limit theorem for additive functionals of reversible

Markov processes and applications to simple exclusions. Comm. Math. Phys., 104(1):1–19,

1986.

[48] L. B. Koralov and Y. G. Sinai. Theory of probability and random processes. Universitext.

Springer, Berlin, second edition, 2007.

[49] H. A. Kramers. Brownian motion in a field of force and the diffusion model of chemical

reactions. Physica, 7:284–304, 1940.

[50] N. V. Krylov. Introduction to the theory of diffusion processes, volume 142 of Translations

of Mathematical Monographs. American Mathematical Society, Providence, RI, 1995.

[51] R. Kupferman, G. A. Pavliotis, and A. M. Stuart. Ito versus Stratonovich white-noise limits

for systems with inertia and colored multiplicative noise. Phys. Rev. E (3), 70(3):036120, 9,

2004.

[52] A.M. Lacasta, J.M. Sancho, A.H. Romero, I.M. Sokolov, and K. Lindenberg. From subdiffusion to superdiffusion of particles on solid surfaces. Phys. Rev. E, 70:051104, 2004.

[53] P. D. Lax. Linear algebra and its applications. Pure and Applied Mathematics (Hoboken).

Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second edition, 2007.

[54] S. Lifson and J.L. Jackson. On the self–diffusion of ions in polyelectrolytic solution. J. Chem.

Phys, 36:2410, 1962.

[55] B. Lindner, M. Kostur, and L. Schimansky-Geier. Optimal diffusive transport in a tilted

periodic potential. Fluctuation and Noise Letters, 1(1):R25–R39, 2001.

[56] M. Loeve. Probability theory. I. Springer-Verlag, New York, fourth edition, 1977. Graduate

Texts in Mathematics, Vol. 45.

[57] M. Loeve. Probability theory. II. Springer-Verlag, New York, fourth edition, 1978. Graduate

Texts in Mathematics, Vol. 46.


[58] M. C. Mackey. Time’s arrow. Dover Publications Inc., Mineola, NY, 2003. The origins of

thermodynamic behavior, Reprint of the 1992 original [Springer, New York; MR1140408].

[59] M.C. Mackey, A. Longtin, and A. Lasota. Noise-induced global asymptotic stability. J.

Statist. Phys., 60(5-6):735–751, 1990.

[60] M. O. Magnasco. Forced thermal ratchets. Phys. Rev. Lett., 71(10):1477–1481, 1993.

[61] P. Mandl. Analytical treatment of one-dimensional Markov processes. Die Grundlehren der mathematischen Wissenschaften, Band 151. Academia Publishing House of the Czechoslovak Academy of Sciences, Prague, 1968.

[62] P. A. Markowich and C. Villani. On the trend to equilibrium for the Fokker-Planck equation:

an interplay between physics and functional analysis. Mat. Contemp., 19:1–29, 2000.

[63] B. J. Matkowsky, Z. Schuss, and E. Ben-Jacob. A singular perturbation approach to Kramers’

diffusion problem. SIAM J. Appl. Math., 42(4):835–849, 1982.

[64] B. J. Matkowsky, Z. Schuss, and C. Tier. Uniform expansion of the transition rate in Kramers’

problem. J. Statist. Phys., 35(3-4):443–456, 1984.

[65] J.C. Mattingly and A. M. Stuart. Geometric ergodicity of some hypo-elliptic diffusions for

particle motions. Markov Processes and Related Fields, 8(2):199–214, 2002.

[66] R.M. Mazo. Brownian motion, volume 112 of International Series of Monographs on

Physics. Oxford University Press, New York, 2002.

[67] J. Meyer and J. Schroter. Comments on the Grad procedure for the Fokker-Planck equation.

J. Statist. Phys., 32(1):53–69, 1983.

[68] E. Nelson. Dynamical theories of Brownian motion. Princeton University Press, Princeton,

N.J., 1967.

[69] B. Øksendal. Stochastic differential equations. Universitext. Springer-Verlag, Berlin, 2003.

[70] G.C. Papanicolaou and S. R. S. Varadhan. Ornstein-Uhlenbeck process in a random potential.

Comm. Pure Appl. Math., 38(6):819–834, 1985.


[71] G. A. Pavliotis and A. M. Stuart. Analysis of white noise limits for stochastic systems with

two fast relaxation times. Multiscale Model. Simul., 4(1):1–35 (electronic), 2005.

[72] G. A. Pavliotis and A. M. Stuart. Parameter estimation for multiscale diffusions. J. Stat.

Phys., 127(4):741–781, 2007.

[73] G. A. Pavliotis and A. Vogiannou. Diffusive transport in periodic potentials: Underdamped

dynamics. Fluct. Noise Lett., 8(2):L155–173, 2008.

[74] G.A. Pavliotis and A.M. Stuart. Multiscale methods, volume 53 of Texts in Applied Mathematics. Springer, New York, 2008. Averaging and homogenization.

[75] G. Da Prato and J. Zabczyk. Stochastic Equations in Infinite Dimensions, volume 44 of

Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1992.

[76] H. Qian, Min Qian, and X. Tang. Thermodynamics of the general diffusion process: time-reversibility and entropy production. J. Stat. Phys., 107(5/6):1129–1141, 2002.

[77] R. L. Stratonovich. Topics in the theory of random noise. Vol. II. Revised English

edition. Translated from the Russian by Richard A. Silverman. Gordon and Breach Science

Publishers, New York, 1967.

[78] P. Reimann. Brownian motors: noisy transport far from equilibrium. Phys. Rep., 361(2-4):57–265, 2002.

[79] P. Reimann, C. Van den Broeck, H. Linke, P. Hanggi, J.M. Rubi, and A. Perez-Madrid. Diffusion in tilted periodic potentials: enhancement, universality and scaling. Phys. Rev. E, 65(3):031104, 2002.

[80] P. Reimann, C. Van den Broeck, H. Linke, J.M. Rubi, and A. Perez-Madrid. Giant acceleration of free diffusion by use of tilted periodic potentials. Phys. Rev. Lett., 87(1):010602, 2001.

[81] Frigyes Riesz and Bela Sz.-Nagy. Functional analysis. Dover Publications Inc., New York,

1990. Translated from the second French edition by Leo F. Boron, Reprint of the 1955

original.


[82] H. Risken. The Fokker-Planck equation, volume 18 of Springer Series in Synergetics.

Springer-Verlag, Berlin, 1989.

[83] H. Rodenhausen. Einstein’s relation between diffusion constant and mobility for a diffusion

model. J. Statist. Phys., 55(5-6):1065–1088, 1989.

[84] J.M. Sancho, A.M. Lacasta, K. Lindenberg, I.M. Sokolov, and A.H. Romero. Diffusion on a solid surface: anomalous is normal. Phys. Rev. Lett., 92(25):250601, 2004.

[85] M. Schreier, P. Reimann, P. Hanggi, and E. Pollak. Giant enhancement of diffusion and particle selection in rocked periodic potentials. Europhys. Lett., 44(4):416–422, 1998.

[86] Z. Schuss. Singular perturbation methods in stochastic differential equations of mathematical

physics. SIAM Review, 22(2):119–155, 1980.

[87] Ch. Schutte and W. Huisinga. Biomolecular conformations can be identified as metastable

sets of molecular dynamics. In Handbook of Numerical Analysis (Computational Chemistry),

Vol X, 2003.

[88] C. Schwab and R.A. Todor. Karhunen-Loeve approximation of random fields by generalized

fast multipole methods. J. Comput. Phys., 217(1):100–122, 2006.

[89] R.B. Sowers. A boundary layer theory for diffusively perturbed transport around a heteroclinic cycle. Comm. Pure Appl. Math., 58(1):30–84, 2005.

[90] D.W. Stroock. Probability theory, an analytic view. Cambridge University Press, Cambridge,

1993.

[91] G. I. Taylor. Diffusion by continuous movements. London Math. Soc., 20:196, 1921.

[92] T.C. Elston and C.R. Doering. Numerical and analytical studies of nonequilibrium fluctuation-induced transport processes. J. Stat. Phys., 83:359–383, 1996.

[93] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the brownian motion. Phys. Rev.,

36(5):823–841, Sep 1930.

[94] M. Vergassola and M. Avellaneda. Scalar transport in compressible flow. Phys. D, 106(1-2):148–166, 1997.


[95] R. Zwanzig. Nonlinear generalized Langevin equations. J. Stat. Phys., 9(3):215–220, 1973.

[96] R. Zwanzig. Nonequilibrium statistical mechanics. Oxford University Press, New York,

2001.

