Vanier College
Science Programme - Comprehensive Assessment
Stochastic Differential Equations
Lauren Ménard
201-HTH-05
May 20, 2016
1 Introduction
Differential equations in which one or more random processes are involved are
called stochastic differential equations (SDE), where the term stochastic is used
to describe the random behaviour of such processes. It is not surprising that
these equations are widely used in various fields. For instance, their applications
range from the modelling of the fluctuations of stock prices to the diffusion of
particles in a given physical medium. They arise wherever an underlying random process can be found.
To evaluate these phenomena, we must be able to solve the corresponding stochastic differential equations. However, to do so, we must first understand the properties of stochastic processes at a certain level. That is why, in this introduction to SDEs, we will explore the dynamics of these types of processes in the discrete-time case, e.g. Markov chains, as well as in the continuous-time case, e.g. classes of processes such as Markov processes and diffusion processes, with emphasis on the example of the Wiener process. The latter is particularly important because the main integral used to solve SDEs, the Itô integral, crucially uses properties of the Wiener process.
To provide further context, the Wiener process is a mathematical interpre-
tation, devised by mathematician Norbert Wiener, of the physical phenomenon
that describes the erratic movement of particles suspended in a fluid, whether
it be liquid or gas, resulting from their collision with atoms or molecules of the
medium. This phenomenon was discovered by botanist Robert Brown in 1827,
when observing the movement of grains of pollen suspended in water at the
microscopic level, and is appropriately called Brownian Motion.
Consequently, stochastic calculus and the related mathematical fields owe credit to the discoveries made about the relationships between different random processes and physical phenomena. With all that said, the theory as presented in this introduction will hopefully help provide further clarification and understanding of stochastic differential equations.
2 Markov Chains
2.1 Random Walks
To get a better understanding of the significance of Markov chains and stochastic
matrices, it is useful to look at the simplest cases of random walks. Consider a
particle moving along an axis with discrete single-unit steps and discrete time,
starting at position 0 at time 0, x(0) = 0.
The initial distribution is described by
$$p(i) = \begin{cases} 1, & i = 0 \\ 0, & i \neq 0, \end{cases}$$
where the probability that the particle be at position $i = 0$ is $p(i) = 1$, and the probability that the particle be anywhere else, $i \neq 0$, is $p(i) = 0$.
Next, suppose that this particle only moves one unit to the right or left in one unit of time with associated probabilities $p$ and $q = 1 - p$, respectively. Finally, let $p = q = 1/2$. This particular type of random walk is called a standard random walk.
Aside: Let $E$ be the set of all states of the particle, which in this example is the set of all integers, $E = \mathbb{Z}$. Notice that, in this particular case, if $E_n$ corresponds to the possible positions (states) of the particle after $n$ single-unit steps, then $E_n$ is contained between $-n$ and $n$: $E_n = \{-n, -n+2, \ldots, n-2, n\}$.
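The standard random walk is straightforward to simulate. The following Python sketch (the step count and seed are illustrative choices, not from the text) generates one trajectory and checks that after $n$ steps the position lies in $E_n$:

```python
import random

def standard_random_walk(n_steps, seed=0):
    """Simulate a standard random walk: from x(0) = 0, step +1 or -1
    with equal probability 1/2 at each unit of time."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice([-1, 1])
        path.append(position)
    return path

path = standard_random_walk(1000)
# After n steps the state lies in E_n = {-n, -n+2, ..., n-2, n}:
n = len(path) - 1
assert abs(path[-1]) <= n and (path[-1] - n) % 2 == 0
```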
Figure 1: Random walks with smaller and smaller time steps.
The transition probability distribution for the standard random walk defined above can be described by the following,
$$P(i,j) = \begin{cases} p, & j = i+1 \\ q, & j = i-1 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
where $i$ is the current state of the particle and $j$ is the next immediate state of the particle.
Modifications can be added to this standard random walk. For example, if we allow the particle to remain in the same position for any unit of time, then the probability that describes this situation is denoted $r$. Therefore, $p + q + r = 1$, as the probabilities must sum to 1, and the transition probability distribution, i.e. the transition from position $i$ to $j$, is described by the following,
$$P(i,j) = \begin{cases} p, & j = i+1 \\ q, & j = i-1 \\ r, & j = i \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
Another modification that can be applied to random walks is the case where there are two limiting barriers. If the barriers are found at points $A$ and $B$, respectively, and $0$ is contained between these points, $A < 0 < B$, then the set of all states $E$ has $B - A + 1$ states. The transition probability distribution is defined by the following,
$$P(i,j) = \begin{cases} s_A, & i = A,\ j = A \\ 1 - s_A, & i = A,\ j = A+1 \\ s_B, & i = B,\ j = B \\ 1 - s_B, & i = B,\ j = B-1 \\ p, & i \neq A, B,\ j = i+1 \\ q, & i \neq A, B,\ j = i-1 \\ r, & i \neq A, B,\ j = i \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
where sA and sB are the probabilities that the particle stay at points A or B,
respectively, in the next immediate step. Note that if sA = sB = 1, then there
is full absorption by the barriers and if sA = sB = 0, then there is full repulsion
by the barriers.
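Distribution (3) is easy to realize concretely. The sketch below (the values $A = -2$, $B = 2$, $p = q = 0.4$, $r = 0.2$, $s_A = s_B = 1$, i.e. fully absorbing barriers, are illustrative choices) builds the $(B - A + 1) \times (B - A + 1)$ transition matrix and verifies that every row sums to 1:

```python
def barrier_walk_matrix(A, B, p, q, r, sA, sB):
    """Transition matrix of a random walk on {A, ..., B} with barriers,
    following distribution (3); row = current state i, column = next state j."""
    states = list(range(A, B + 1))
    n = len(states)
    P = [[0.0] * n for _ in range(n)]
    for row, i in enumerate(states):
        if i == A:
            P[row][0] = sA            # stay at barrier A
            P[row][1] = 1 - sA        # step to A + 1
        elif i == B:
            P[row][n - 1] = sB        # stay at barrier B
            P[row][n - 2] = 1 - sB    # step to B - 1
        else:
            P[row][row + 1] = p       # j = i + 1
            P[row][row - 1] = q       # j = i - 1
            P[row][row] = r           # j = i
    return P

P = barrier_walk_matrix(-2, 2, 0.4, 0.4, 0.2, 1.0, 1.0)
# Every row of a stochastic matrix sums to 1:
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
```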
Furthermore, one can write (1), (2) and (3) in matrix form, with rows indexed by the current position $i$ and columns by the next position $j$. For the unbounded walks (1) and (2), with states ordered $\ldots, -1, 0, 1, \ldots$,
$$P = \begin{pmatrix} \ddots & \ddots & & & \\ \ddots & 0 & p & & \\ & q & 0 & p & \\ & & q & 0 & \ddots \\ & & & \ddots & \ddots \end{pmatrix}, \qquad P = \begin{pmatrix} \ddots & \ddots & & & \\ \ddots & r & p & & \\ & q & r & p & \\ & & q & r & \ddots \\ & & & \ddots & \ddots \end{pmatrix},$$
and for the bounded walk (3), with states ordered $A, A+1, \ldots, B-1, B$,
$$P = \begin{pmatrix} s_A & 1-s_A & & & \\ q & r & p & & \\ & \ddots & \ddots & \ddots & \\ & & q & r & p \\ & & & 1-s_B & s_B \end{pmatrix},$$
respectively. These transition matrices are stochastic matrices, whose rows are known as probability or random vectors: vectors with nonnegative entries that sum to 1.
For the case where the particle is not bounded between two points $A$ and $B$, such as in cases (1) and (2), the stochastic matrix is an infinite matrix $P_\infty$. It is for this reason that we typically use matrices to evaluate discrete-time processes with a finite set of states.
2.2 Markov Chains
Following the theory for stochastic matrices, a Markov chain can be defined as a sequence of random variables $X_0, X_1, X_2, \ldots$ taking values in a set of states $E$, associated to a certain stochastic matrix $P$, such that the following conditions hold:

(1) $P(X_0 = i) = p(i)$ for each state $i \in E$;

(2) $P(X_0 \in E) = \sum_{i \in E} p(i) = 1$;

(3) $P(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = P(X_{n+1} = i_{n+1} \mid X_n = i_n) = P(i_n, i_{n+1})$;

(4) $P(X_{n+1} \in E \mid X_n = i) = \sum_{j \in E} P(X_{n+1} = j \mid X_n = i) = \sum_{j \in E} P(i,j) = 1$ for every $i \in E$.
The above conditions can be interpreted as follows. If $E$ is the set of all states of some system, then $X_n$ indicates the active state at time $n$. We assume that the probability that $i$ is the active state at time zero is $p(i)$ for any initial state $i \in E$, and that the system at time zero is in $E$ with probability 1. Furthermore, according to Property (3), the transition probability from some state $i$ to another state $j$ depends only on the state $i$ and not on any previously visited states. Lastly, Property (4) says that it is impossible for the system to leave the set $E$, as the transition from $i$ always lands in $E$.
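Property (3) translates directly into a simulation rule: the next state is drawn from the row of the current state only. A minimal sketch (the matrix, run length, and seed are illustrative choices):

```python
import random

def simulate_chain(P, start, n_steps, seed=42):
    """Simulate a Markov chain: at each step, sample the next state j
    from row P[i] of the current state i (the Markov property)."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(n_steps):
        i = states[-1]
        states.append(rng.choices(range(len(P)), weights=P[i])[0])
    return states

P = [[0.5, 0.3, 0.2],
     [0.2, 0.8, 0.0],
     [0.3, 0.3, 0.4]]
chain = simulate_chain(P, start=0, n_steps=10000)
# The chain never leaves the state set E = {0, 1, 2} (Property (4)):
assert set(chain) <= {0, 1, 2}
```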
Now, consider a finite set $C$ of states where, for every pair of states $i$ and $j$, the probability of moving from $i$ to $j$ in some finite number of steps is positive, and likewise from $j$ to $i$. In this case, states $i$ and $j$ are said to communicate. In other words, every pair of states in $C$ communicates, and it is possible to get to any state from any state in a certain number of steps. A Markov chain is irreducible if its set of states forms such a communicating class $C$.
Another important concept is that of regular stochastic matrices. A stochastic matrix $P$ for which some power $P^n$ has strictly positive entries is called regular. Note that regular matrices are only discussed in the case of finite Markov chains. Furthermore, the Markov chain described by this type of matrix is necessarily irreducible. However, the converse is not always true: an irreducible Markov chain is not necessarily regular.
For example, consider the transition matrix
$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
It is irreducible, as it is possible to move to any state from any state. However, there exists no power of $P$ where all entries are strictly positive, as
$$P^{2n} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad P^{2n+1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Therefore, it is not regular.
Moreover, an irreducible Markov chain with set $C$ of states can be aperiodic
or periodic. If any return to state $i \in C$ can occur at irregular times (the greatest common divisor of the numbers of steps in which a return to $i$ is possible is 1), then the state is said to be aperiodic. For an irreducible Markov chain, a single aperiodic state implies that all states are aperiodic. However, if returns to $i$ can occur only at multiples of some $k > 1$, then the chain is said to be periodic. In the previous example, $k = 2$, so the chain is periodic. Consequently, that Markov chain is irreducible, but not aperiodic. This leads to the following proposition: finite Markov chains are regular if and only if they are irreducible and aperiodic.
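Regularity can be tested directly from its definition by computing powers of $P$. The sketch below (pure-Python matrix powers; the test matrices are illustrative) checks the periodic example above against a matrix that is already strictly positive:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(P, max_power=50):
    """Return True if some power P^n (n <= max_power) has strictly
    positive entries -- the definition of a regular stochastic matrix."""
    Q = P
    for _ in range(max_power):
        if all(entry > 0 for row in Q for entry in row):
            return True
        Q = mat_mul(Q, P)
    return False

periodic = [[0.0, 1.0], [1.0, 0.0]]   # irreducible but periodic (k = 2)
regular = [[0.5, 0.5], [0.2, 0.8]]    # strictly positive already at n = 1
assert not is_regular(periodic)
assert is_regular(regular)
```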
These definitions are useful when discussing the long-term behaviour of a system. For an aperiodic irreducible Markov chain, high powers of the associated regular stochastic matrix $P$ approach a limiting value in the sense that $\lim_{n\to\infty} P^n = \Pi$, where $\Pi$ is a matrix with all rows equal to the same probability vector $\pi$. Thus, $\pi$ is called the steady-state vector of the Markov chain. Note that a Markov chain must be irreducible and aperiodic for the stochastic matrix to be regular and for there to exist a steady state. A Markov chain that respects the conditions described above as $n \to \infty$ is called ergodic. With the addition of this new definition, the following claim arises:
A finite Markov chain is aperiodic and irreducible $\iff$ regular $\iff$ ergodic.
There are other ways, however, to find the steady-state vector $\pi$. Consider the following. If we have a matrix $A$, then a row vector $\xi$ that satisfies the equation $\xi A = \lambda \xi$ is called a (left) eigenvector with eigenvalue $\lambda$. To find such eigenvectors, one can use $A^T \xi^T = \lambda \xi^T$. In the case where $A$ is a regular stochastic matrix, now called $P$, we can apply the Perron-Frobenius theorem:

(1) there is always one eigenvalue $\lambda_1 = 1$ of $P$;

(2) all other eigenvalues satisfy $|\lambda_i| < 1$ for $i > 1$;

(3) $\lambda_1 = 1$ has algebraic and geometric multiplicity 1.
Consequently, the $\lambda = 1$ eigenvector $\xi_1$ is the steady-state vector $\pi$, which is shown by the following argument. Write $\pi_n = c_1 \lambda_1^n \xi_1 + c_2 \lambda_2^n \xi_2 + \ldots + c_k \lambda_k^n \xi_k$, where, by (1) and (2), taking $n \to \infty$ collapses all terms to 0 except the $\xi_1$ term. Therefore, we get $\pi = \lim_{n\to\infty} \pi_n = c_1 \xi_1$. This particular probability vector, $\pi$, is the unique solution of the equation $\xi_1 P = \xi_1$ normalized so that its entries sum to 1.
Example: Let
$$P = \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.2 & 0.8 & 0 \\ 0.3 & 0.3 & 0.4 \end{pmatrix}$$
be a regular stochastic matrix (each row sums to 1). The state of this system is described by the Markov chain $x_k = x_0 P^k$ for $k = 0, 1, 2, \ldots$
The steady-state vector can be found by taking high powers of $P$. For example, within the precision of a computer algebra system,
$$P^{100} = \begin{pmatrix} 0.3 & 0.6 & 0.1 \\ 0.3 & 0.6 & 0.1 \\ 0.3 & 0.6 & 0.1 \end{pmatrix}$$
and $\pi = (\,0.3\ \ 0.6\ \ 0.1\,)$.
However, as previously discussed, we can also find the steady-state vector by solving $(P^T - I)\xi_1^T = 0$:
$$\left(\begin{array}{ccc|c} -0.5 & 0.2 & 0.3 & 0 \\ 0.3 & -0.2 & 0.3 & 0 \\ 0.2 & 0 & -0.6 & 0 \end{array}\right) \rightarrow \left(\begin{array}{ccc|c} 1 & 0 & -3 & 0 \\ 0 & 1 & -6 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right),$$
so
$$\xi_1^T = \begin{pmatrix} 3 \\ 6 \\ 1 \end{pmatrix} s,$$
and by choosing $s = 1/10$, $\pi = (\,0.3\ \ 0.6\ \ 0.1\,)$.
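Both routes to $\pi$ can be checked in a few lines of Python (pure-Python matrix arithmetic; the tolerances are illustrative): take a high power of the example matrix, and verify that $\pi$ satisfies $\pi P = \pi$.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def vec_mat(v, P):
    """Row vector times matrix: (vP)_j = sum_i v_i P(i, j)."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.5, 0.3, 0.2],
     [0.2, 0.8, 0.0],
     [0.3, 0.3, 0.4]]

# Method 1: high powers of P approach a matrix with identical rows pi.
Q = P
for _ in range(99):          # Q = P^100
    Q = mat_mul(Q, P)
for row in Q:
    assert all(abs(row[j] - [0.3, 0.6, 0.1][j]) < 1e-9 for j in range(3))

# Method 2: pi is the normalized left eigenvector with eigenvalue 1.
pi = [0.3, 0.6, 0.1]
assert all(abs(vec_mat(pi, P)[j] - pi[j]) < 1e-12 for j in range(3))
```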
Now, let us discuss non-regular Markov chains. If the associated matrix is non-regular, denoted $Q$, then $Q^n$ does not converge and $\lim_{n\to\infty} Q^n(i,j)$ does not exist.
Example: Let
$$Q = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
be a periodic irreducible stochastic matrix. Then
$$Q^2 = Q^4 = Q^6 = \cdots = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad Q^1 = Q^3 = Q^5 = \cdots = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
In this case, $Q$ is irreducible, but not regular because of its periodicity.
Stochastic matrices are particularly effective when analyzing the long-term behaviour, i.e. the steady-state probability vector, of a discrete-time Markov chain. However, there exist many other cases where time is continuous, called continuous-time stochastic processes. Such processes will be discussed in the next section.
3 Continuous-Time Stochastic Processes
As the section title suggests, we will consider stochastic processes in continuous
time, where specific classes, notably the Markov processes and the Diffusion
processes, will be discussed. In addition, the Wiener process, an example of said
classes, will be explored.
3.1 Markov Processes
A Markov process, as its name suggests, is an extension of the previously explored Markov chain to continuous time. It can be viewed as a stochastic process that satisfies the Markov property, which can be described as follows. Let $X = \{X(t), t \in \mathbb{R}_+\}$ be a continuous-time stochastic process, i.e. a family of random variables $X(t)$, where $t \ge 0$. Then,
$$P\big[X(t_{n+1}) \in B \mid X(t_n) = x_n, \ldots, X(t_1) = x_1\big] = P\big[X(t_{n+1}) \in B \mid X(t_n) = x_n\big]$$
for all Borel subsets $B$ of $\mathbb{R}$, all time instants $0 < t_1 < \ldots < t_n < t_{n+1}$, and all states $x_1, \ldots, x_n$ in $\mathbb{R}$. In other words, a stochastic process has the Markov property if the conditional probability distribution of future states of the process depends only upon the present state and not on the sequence of events that precede it.
Moreover, Borel sets in the real line are the class of events obtained from intervals of the real line by relative complements, countable unions, and countable intersections. In the case where the set of all possible outcomes is the real line or an interval of it, meaning that the sample space is not finite or countable, it would be unrealistic to assign probabilities to all possible subsets of that interval. Therefore, the use of Borel sets is necessary.
The transition probabilities of the Markov process $X(t)$ can be written as follows,
$$P(s, x; t, B) = P\big[X(t) \in B \mid X(s) = x\big] \quad \text{for } 0 \le s < t,$$
and,
$$P(s, x; t, B) = \int_B p(s, x; t, y)\, dy,$$
for the continuous case, where the density $p(s, x; t, \cdot)$ is called the transition density.
Furthermore, a Markov process is said to be homogeneous if all its transition probabilities depend only on the time difference $t - s$ between two instants. This means that
$$P\big[X(s+t) = j \mid X(s) = i\big]$$
is independent of $s$. When this holds, setting $s = 0$, we obtain,
$$P\big[X(s+t) = j \mid X(s) = i\big] = P\big[X(t) = j \mid X(0) = i\big], \quad \forall s, t \ge 0.$$
An important class of Markov process called the diffusion process will be
explored in the following section.
3.2 Diffusion Processes
The diffusion process, which will be discussed in the one-dimensional case, is
a special case of a Markov process with continuous sample paths. Note that the
terms sample path and trajectory can be used interchangeably.
A Markov process X(t) with transition densities p(s, x; t, y) is called a dif-
fusion process if the following conditions are satisfied.
(1) For all $x$ and all $\varepsilon > 0$,
$$\lim_{t \to s^+} \frac{1}{t-s} \int_{|y-x| > \varepsilon} p(s, x; t, y)\, dy = 0;$$

(2) There exists a function $a(s, x)$ such that for all $x$ and all $\varepsilon > 0$,
$$\lim_{t \to s^+} \frac{1}{t-s} \int_{|y-x| \le \varepsilon} (y - x)\, p(s, x; t, y)\, dy = a(s, x);$$

(3) There exists a function $b(s, x)$ such that for all $x$ and all $\varepsilon > 0$,
$$\lim_{t \to s^+} \frac{1}{t-s} \int_{|y-x| \le \varepsilon} (y - x)^2\, p(s, x; t, y)\, dy = b^2(s, x).$$
The first condition implies, as stated above, that the process has continuous sample paths. Furthermore, the second condition states that there exists a function $a(s, x)$, called the drift coefficient of the diffusion, where the drift is the instantaneous rate of change of the mean of the process given that $X(s) = x$. In addition, for a diffusion process, there exists a function $b(s, x)$, called the diffusion coefficient, for which $b^2(s, x)$ is the instantaneous rate of change of the squared fluctuations of the process given that $X(s) = x$.
3.2.1 Kolmogorov Equations
The Kolmogorov equations are two partial differential equations (PDEs), introduced by Andrei Kolmogorov in 1931, that arise in the case of continuous-time and continuous-state Markov processes. These equations, namely the backward and forward Kolmogorov equations, will be explored in the case of Markov diffusion processes. Note that the forward equation is also known as the Fokker-Planck equation.
The Forward Kolmogorov Equation At time $s$, we are given information about the state of a system. This information is described by a probability density $p(s, x)$, which imposes an initial condition on the partial differential equation. The equation is then solved forward in time, from $s$ to $t$ for any $t > s$, hence the term forward. In other words, the solution of the PDE at the later time $t$ is found by integrating forward in time, from $s$ to $t$.
Suppose $X(t)$ is a diffusion process with transition density $p(s, x; t, y)$, which is a continuous function of its arguments. Furthermore, suppose that both $a(t, y)$ and $b(t, y)$ are continuous in $t$ and $y$. Then, for fixed $(s, x)$, the density $p = p(s, x; t, y)$ is a solution of
$$\frac{\partial p}{\partial t} + \frac{\partial}{\partial y}\big[a(t, y)\, p\big] - \frac{1}{2} \frac{\partial^2}{\partial y^2}\big[b^2(t, y)\, p\big] = 0, \tag{4}$$
with initial condition $p(s, x; s, y) = \delta(x - y)$.
The proof that diffusion processes obey the forward Kolmogorov equation is
similar to the proof for the backward Kolmogorov equation, which will be given
below.
The Backward Kolmogorov Equation Conversely, at time $s$, we are interested in whether, at a future time point $t$, the system will be in a given subset of states. This information is carried by a function $u(s, x)$ with prescribed values at the final time $t$, which imposes a terminal condition on the partial differential equation. The equation is solved backward in time, from $t$ to $s$ for any $t > s$, hence the term backward. In other words, the solution of the PDE is found by integrating backward in time, from $t$ to $s$.
Theorem: Let $f(x)$ be a continuous bounded function on $\mathbb{R}$, and let $u(s, x)$ be the conditional expectation
$$u(s, x) = E\big[f(X_t) \mid X_s = x\big] = \int f(y)\, p(s, x; t, y)\, dy,$$
with $t$ fixed. Furthermore, suppose that the drift and diffusion coefficients $a(s, x)$ and $b(s, x)$ are continuous in both $s$ and $x$. Then $u(s, x)$ is a solution of the partial differential equation
$$\frac{\partial u}{\partial s} + a(s, x) \frac{\partial u}{\partial x} + \frac{1}{2} b^2(s, x) \frac{\partial^2 u}{\partial x^2} = 0, \tag{5}$$
for $s \in [0, t]$, with the terminal condition $u(t, x) = f(x)$.
Proof. First observe that the continuity assumption on the diffusion process, together with the fact that the function $f(x)$ is bounded, implies that
$$u(s,x) = \int_{\mathbb{R}} f(y)\, p(s,x;t,y)\, dy = \int_{|y-x|\le\varepsilon} f(y)\, p(s,x;t,y)\, dy + \int_{|y-x|>\varepsilon} f(y)\, p(s,x;t,y)\, dy$$
$$\le \int_{|y-x|\le\varepsilon} f(y)\, p(s,x;t,y)\, dy + \|f\|_\infty \int_{|y-x|>\varepsilon} p(s,x;t,y)\, dy = \int_{|y-x|\le\varepsilon} f(y)\, p(s,x;t,y)\, dy + o(t-s).$$
Here the little-$o$ notation signifies a term which goes to zero faster than $t-s$ as $t \to s$, and $\|f\|_\infty$ is the maximum absolute value of the bounded function $f$. We add and subtract the terminal value $f(x)$ and repeat the previous calculation to obtain
$$u(s,x) = f(x) + \int_{|y-x|\le\varepsilon} \big(f(y)-f(x)\big)\, p(s,x;t,y)\, dy + o(t-s).$$
Using the Chapman-Kolmogorov equation we obtain, for $s < r < t$,
$$u(s,x) = \int_{\mathbb{R}} f(z)\, p(s,x;t,z)\, dz = \int_{\mathbb{R}}\int_{\mathbb{R}} p(s,x;r,y)\, p(r,y;t,z)\, dz\, dy = \int_{\mathbb{R}} u(r,y)\, p(s,x;r,y)\, dy.$$
From Taylor's theorem we have
$$u(r,z) - u(r,x) = \frac{\partial u(r,x)}{\partial x}(z-x) + \frac{1}{2}\frac{\partial^2 u(r,x)}{\partial x^2}(z-x)^2(1+\alpha_\varepsilon), \quad |z-x|\le\varepsilon,$$
where $\lim_{\varepsilon\to 0}\alpha_\varepsilon = 0$. Combining the above equations we calculate
$$\frac{u(s,x)-u(s+h,x)}{h} = \frac{1}{h}\left(\int_{\mathbb{R}} p(s,x;s+h,y)\, u(s+h,y)\, dy - u(s+h,x)\right)$$
$$= \frac{1}{h}\int_{|x-y|\le\varepsilon} \big(u(s+h,y)-u(s+h,x)\big)\, p(s,x;s+h,y)\, dy + o(1).$$
Substituting the Taylor expansion and applying the defining limits of the drift and diffusion coefficients, letting $h \to 0$ yields
$$-\frac{\partial u}{\partial s} = a(s,x)\frac{\partial u}{\partial x} + \frac{1}{2} b^2(s,x)\frac{\partial^2 u}{\partial x^2},$$
which is precisely equation (5).
Figure 2: Three dimensional Brownian motion
3.3 The Wiener Process
The standard Wiener process $W = \{W(t), t \ge 0\}$ is a family of Gaussian random variables $W(t)$ that depends continuously on $t \ge 0$ and satisfies the following:

(1) $W(0) = 0$;

(2) $E(W(t)) = 0$;

(3) $\operatorname{Var}(W(t) - W(s)) = t - s$;

for all $0 \le s \le t$. We can gather from these conditions that, as time increases, the variance also increases while the mean remains 0 if the process starts at 0. The Wiener process is sample-path continuous, meaning that it is continuous along any choice of trajectory. This is not surprising, as the same can be said more generally for diffusion processes. However, with probability 1, it is nowhere differentiable at any time $t \ge 0$. This will be proved in the mean-square sense.
Proof. By definition, $W(t)$ is Gaussian with variance $t$. Consider the difference quotient for the derivative,
$$\frac{W(t+h) - W(t)}{(t+h) - t},$$
whose mean square satisfies
$$\lim_{h \to 0} E\left[\left(\frac{W(t+h) - W(t)}{(t+h) - t}\right)^2\right] = \lim_{h \to 0} \frac{1}{h} = \infty,$$
since the ratio has mean square $1/h$, which goes to infinity as $h$ approaches 0. Therefore, as no such limit exists, the trajectories of the Wiener process are nowhere differentiable. In other words, as the curve of $W(t)$ is observed on an increasingly smaller scale, it becomes more and more erratic, resulting in a completely random quantity.
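The mean-square blow-up of the difference quotient can be observed in simulation; the sketch below (sample counts, step sizes, and seed are illustrative choices) uses the fact that the increment $W(t+h) - W(t)$ is $N(0, h)$ to estimate $E[((W(t+h) - W(t))/h)^2] = 1/h$ for shrinking $h$:

```python
import random

def mean_square_quotient(h, n_samples=100_000, seed=1):
    """Estimate E[((W(t+h) - W(t)) / h)^2]; the increment is N(0, h)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        increment = rng.gauss(0.0, h ** 0.5)
        total += (increment / h) ** 2
    return total / n_samples

# The mean square of the quotient is 1/h, so it grows without bound:
for h in (0.1, 0.01, 0.001):
    est = mean_square_quotient(h)
    assert abs(est - 1.0 / h) < 0.05 / h   # within 5% of 1/h
```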
The transition density of the Wiener process is
$$p(s, x; t, y) = \frac{1}{\sqrt{2\pi(t-s)}} \exp\left(-\frac{(y-x)^2}{2(t-s)}\right). \tag{6}$$
Note that this transition density is expressed as a Gaussian distribution.
Furthermore, by evaluating the partial derivatives of (6), we find that they satisfy the partial differential equations
$$\frac{\partial p}{\partial t} - \frac{1}{2} \frac{\partial^2 p}{\partial y^2} = 0, \quad (s, x) \text{ fixed}, \tag{7}$$
and
$$\frac{\partial p}{\partial s} + \frac{1}{2} \frac{\partial^2 p}{\partial x^2} = 0, \quad (t, y) \text{ fixed}. \tag{8}$$
Proof. To verify (7), let $(s, x)$ be fixed at $(0, 0)$, so that equation (6) becomes
$$p(0, 0; t, y) = \frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{y^2}{2t}\right).$$
Taking the first partial derivative with respect to $t$, and the first and second partial derivatives with respect to $y$, we obtain
$$\frac{\partial}{\partial t}\left[\frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{y^2}{2t}\right)\right] = -\frac{1}{2\sqrt{2\pi}\, t^{3/2}} \exp\left(-\frac{y^2}{2t}\right) + \frac{1}{2\sqrt{2\pi t}}\, \frac{y^2}{t^2} \exp\left(-\frac{y^2}{2t}\right), \tag{9}$$
$$\frac{\partial}{\partial y}\left[\frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{y^2}{2t}\right)\right] = -\frac{y}{\sqrt{2\pi}\, t^{3/2}} \exp\left(-\frac{y^2}{2t}\right),$$
$$\frac{\partial^2}{\partial y^2}\left[\frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{y^2}{2t}\right)\right] = \frac{\partial}{\partial y}\left[-\frac{y}{\sqrt{2\pi}\, t^{3/2}} \exp\left(-\frac{y^2}{2t}\right)\right] = \frac{1}{\sqrt{2\pi t}}\, \frac{y^2}{t^2} \exp\left(-\frac{y^2}{2t}\right) - \frac{1}{\sqrt{2\pi}\, t^{3/2}} \exp\left(-\frac{y^2}{2t}\right). \tag{10}$$
Comparing (9) and (10), we observe that
$$\frac{\partial p}{\partial t} = \frac{1}{2} \frac{\partial^2 p}{\partial y^2},$$
and, consequently,
$$\frac{\partial p}{\partial t} - \frac{1}{2} \frac{\partial^2 p}{\partial y^2} = 0.$$
Similarly, by selecting specific conditions for $(t, y)$, it can be shown that
$$\frac{\partial p}{\partial s} + \frac{1}{2} \frac{\partial^2 p}{\partial x^2} = 0.$$
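The same identity can be checked numerically: central finite differences of the density $p(0,0;t,y)$ should satisfy $\partial p/\partial t \approx \tfrac{1}{2}\, \partial^2 p/\partial y^2$ up to discretization error. A minimal sketch (the grid point and step sizes are illustrative choices):

```python
import math

def p(t, y):
    """Transition density of the Wiener process started at (s, x) = (0, 0)."""
    return math.exp(-y * y / (2 * t)) / math.sqrt(2 * math.pi * t)

t, y = 1.0, 0.5
dt, dy = 1e-5, 1e-4

# Central finite differences for dp/dt and d^2p/dy^2:
dp_dt = (p(t + dt, y) - p(t - dt, y)) / (2 * dt)
d2p_dy2 = (p(t, y + dy) - 2 * p(t, y) + p(t, y - dy)) / dy ** 2

# The heat equation dp/dt = (1/2) d^2p/dy^2 holds at the grid point:
assert abs(dp_dt - 0.5 * d2p_dy2) < 1e-5
```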
To get a better understanding of the Wiener process, let us clearly relate it to the diffusion process. The standard Wiener process is a diffusion process with drift coefficient $a(s, x) = 0$ and diffusion coefficient $b(s, x) = 1$. Indeed,
$$a(s, x) = \lim_{t \to s} E\left(\frac{X_t - X_s}{t - s} \,\Big|\, X_s = x\right) = 0,$$
and
$$b^2(s, x) = \lim_{t \to s} E\left(\frac{(X_t - X_s)^2}{t - s} \,\Big|\, X_s = x\right) = \lim_{t \to s} \frac{t - s}{t - s} = 1.$$
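These limits suggest estimating $a$ and $b^2$ from data as the per-unit-time mean and mean square of small increments. The sketch below (step size, sample count, and seed are illustrative choices) does this for simulated Wiener increments and recovers $a \approx 0$ and $b^2 \approx 1$:

```python
import random

def estimate_coefficients(dt=0.01, n=100_000, seed=7):
    """Estimate drift a = E(dX)/dt and squared diffusion b^2 = E(dX^2)/dt
    from simulated Wiener increments dX ~ N(0, dt)."""
    rng = random.Random(seed)
    increments = [rng.gauss(0.0, dt ** 0.5) for _ in range(n)]
    a_hat = sum(increments) / (n * dt)
    b2_hat = sum(dx * dx for dx in increments) / (n * dt)
    return a_hat, b2_hat

a_hat, b2_hat = estimate_coefficients()
assert abs(a_hat) < 0.15          # drift coefficient a(s, x) = 0
assert abs(b2_hat - 1.0) < 0.05   # diffusion coefficient b^2(s, x) = 1
```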
Consequently, substituting these values into the forward equation (4) and the backward equation (5), we obtain
$$\frac{\partial p}{\partial t} + \frac{\partial}{\partial y}\big[(0)\, p\big] - \frac{1}{2}(1) \frac{\partial^2 p}{\partial y^2} = 0, \quad \text{i.e.} \quad \frac{\partial p}{\partial t} - \frac{1}{2} \frac{\partial^2 p}{\partial y^2} = 0, \tag{11}$$
and
$$\frac{\partial u}{\partial s} + (0) \frac{\partial u}{\partial x} + \frac{1}{2}(1) \frac{\partial^2 u}{\partial x^2} = 0, \quad \text{i.e.} \quad \frac{\partial u}{\partial s} + \frac{1}{2} \frac{\partial^2 u}{\partial x^2} = 0,$$
respectively, which are precisely the previously derived equations (7) and (8). Note that equation (11) is called the heat equation and is used to model the diffusion of heat. This makes the Wiener process a very important stochastic process in many different fields. However, for it to be useful, mathematical meaning must be assigned to its infinitesimal changes. Therefore, stochastic calculus is the logical continuation of the theory, where Itô integrals will be introduced.
4 Stochastic Calculus
Stochastic calculus is necessary to effectively study stochastic processes because,
as previously mentioned, properties of said processes prevent us from using
regular calculus techniques. This section will explore the various obstacles and results that arose in solving this problem.
Recall that the Riemann integral $\int_a^b f(t)\, dt$ is defined for a continuous function $f$ on a bounded interval $[a, b]$. The interval is partitioned into $n$ subintervals with $a = t_0 < t_1 < \ldots < t_n = b$, and the Riemann integral is the limit of the corresponding sums as $n \to \infty$ and the width of each subinterval approaches 0,
$$\int_a^b f(t)\, dt = \lim_{(t_j - t_{j-1}) \to 0} \sum_{j=1}^{n} f(t_{j-1})\, (t_j - t_{j-1}).$$
There exists, however, a more general integral called the Riemann-Stieltjes integral. Suppose $f(t)$ and $g(t)$ are real-valued bounded functions defined on an interval $[a, b]$. The simple $dt$ integration can be generalized to increments $dg(t)$ by using $g(t_j) - g(t_{j-1})$ instead of $t_j - t_{j-1}$. Thus, we obtain the Riemann-Stieltjes integral,
$$\int_a^b f(t)\, dg(t) = \lim_{(t_j - t_{j-1}) \to 0} \sum_{j=1}^{n} f(t_{j-1}) \big(g(t_j) - g(t_{j-1})\big).$$
Note that for such integrals to exist, the variation of $g$ must be bounded and finite over the interval $[a, b]$.
These types of integrals arise when solving the equations of various processes. For example, consider the case where a small amount of liquid flows with macroscopic velocity $a(t, u(t))$, where $u(t)$ describes its position at time $t$. Furthermore, suppose that a microscopic particle is suspended in this liquid, displaying evidence of Brownian motion. Consequently, the change in the position of the particle satisfies the equation
$$du(t) = a\big(t, u(t)\big)\, dt + b\big(t, u(t)\big)\, dW_t. \tag{12}$$
However, the second term of equation (12) does not make sense on its own, because the trajectories of the Wiener process are nowhere differentiable, as previously discussed. If we represent this equation in integral form, we obtain the following,
$$u(t) - u(0) = \int_0^t a\big(s, u(s)\big)\, ds + \int_0^t b\big(s, u(s)\big)\, dW_s. \tag{13}$$
Observe that the second term of equation (13) has the form of a Riemann-Stieltjes integral. Therefore, for that interpretation to apply, the integrator must be of bounded variation. However, the Wiener process is nowhere differentiable and its sample paths are not of bounded variation. Consequently, $\int_a^b f(t)\, dW_t$ cannot be interpreted as a Riemann-Stieltjes integral. This is precisely why Itô integrals are necessary for stochastic calculus. These integrals will be discussed in the next section.
4.1 Itô Stochastic Integrals
We want to make sense of the expression
$$\int_{t_0}^{T} f(s, \omega)\, dW_s(\omega),$$
which we will call a stochastic integral, defined for a random function $f : [0, T] \times \Omega \to \mathbb{R}$.
For a fixed sample path $\omega$, the Riemann-Stieltjes approach would express the integral as the limit of the sums
$$\sum_{j=1}^{n} f\big(\tau_j^{(n)}, \omega\big)\Big(W_{t_{j+1}^{(n)}}(\omega) - W_{t_j^{(n)}}(\omega)\Big), \tag{14}$$
for all possible choices of evaluation points $\tau_j^{(n)} \in \big[t_j^{(n)}, t_{j+1}^{(n)}\big]$ and partitions $0 = t_1^{(n)} < t_2^{(n)} < \ldots < t_{n+1}^{(n)} = T$ of $[0, T]$, as
$$\max_{1 \le j \le n}\big\{t_{j+1}^{(n)} - t_j^{(n)}\big\} \to 0 \quad \text{as } n \to \infty.$$
However, this limit does not exist, as the sample paths of the Wiener process are not of bounded variation. Hence, instead of considering such pathwise convergence, we consider $L^2$-convergence, where the limit of the sums (14) may exist but differ depending on the choice of evaluation points $\tau_j^{(n)} \in \big[t_j^{(n)}, t_{j+1}^{(n)}\big]$.
For example, consider the case where $f(t, \omega) = W_t(\omega)$ and
$$\tau_j^{(n)} = (1 - \lambda)\, t_j^{(n)} + \lambda\, t_{j+1}^{(n)} = (j + \lambda)\delta, \quad \lambda \in [0, 1],$$
is the fixed choice of evaluation point, where the parameter $\lambda$ determines the location of the evaluation point within each subinterval. Furthermore, let $\delta = t_{j+1} - t_j = T/n$ be the constant step size. The terms of (14) can, therefore, be rearranged as follows,
$$W_{\tau_j}\big(W_{t_{j+1}} - W_{t_j}\big) = -\frac{1}{2}\big(W_{t_{j+1}} - W_{\tau_j}\big)^2 + \frac{1}{2}\big(W_{\tau_j} - W_{t_j}\big)^2 + \frac{1}{2}\big(W_{t_{j+1}}^2 - W_{t_j}^2\big).$$
By taking the sums, we obtain,
$$\sum_{j=1}^{n} W_{\tau_j}\big(W_{t_{j+1}} - W_{t_j}\big) = -\frac{1}{2}\sum_{j=1}^{n}\big(W_{t_{j+1}} - W_{\tau_j}\big)^2 + \frac{1}{2}\sum_{j=1}^{n}\big(W_{\tau_j} - W_{t_j}\big)^2 + \frac{1}{2}\sum_{j=1}^{n}\big(W_{t_{j+1}}^2 - W_{t_j}^2\big).$$
The third term of the right-hand side telescopes,
$$\frac{1}{2}\sum_{j=1}^{n}\big(W_{t_{j+1}}^2 - W_{t_j}^2\big) = \frac{1}{2}\big(W_T^2 - W_0^2\big) = \frac{1}{2} W_T^2.$$
Moreover, recall that the Wiener process has variance
$$\operatorname{Var}\big(W(t) - W(s)\big) = E\big(W(t) - W(s)\big)^2 = t - s.$$
Consequently,
$$E\left(\sum_{j=1}^{n} W_{\tau_j}\big(W_{t_{j+1}} - W_{t_j}\big)\right) = -\frac{1}{2}\sum_{j=1}^{n}\big(t_{j+1} - \tau_j\big) + \frac{1}{2}\sum_{j=1}^{n}\big(\tau_j - t_j\big) + \frac{1}{2}E\big(W_T^2\big).$$
Observe that the following equalities arise from the chosen evaluation point,
$$t_{j+1} - \tau_j = t_{j+1} - (1-\lambda)\, t_j - \lambda\, t_{j+1} = (1-\lambda)\big[t_{j+1} - t_j\big] = (1-\lambda)\,\delta,$$
and,
$$\tau_j - t_j = (1-\lambda)\, t_j + \lambda\, t_{j+1} - t_j = \lambda\big[t_{j+1} - t_j\big] = \lambda\,\delta.$$
Therefore, using $n\delta = T$ and $E(W_T^2) = T$,
$$E\left(\sum_{j=1}^{n} W_{\tau_j}\big(W_{t_{j+1}} - W_{t_j}\big)\right) = -\frac{1}{2}(1-\lambda)\,\delta n + \frac{1}{2}\lambda\,\delta n + \frac{1}{2}T = -\Big(\frac{1}{2} - \lambda\Big)T + \frac{1}{2}T = \lambda T.$$
Note that, in the expected-value sense, the integral becomes
$$E\left(\int_0^T W_t\, dW_t\right) = \lambda T. \tag{15}$$
Thus we have a convergent sum in the $L^2$-sense, but the result depends on the location of the evaluation point. Furthermore, by taking $\lambda = 0$, thus making the evaluation point the left endpoint of each subinterval, we obtain
$$E\left(\int_0^T W_t\, dW_t\right) = 0,$$
which is a useful result, as the integrand $W_{\tau_j}$ then has $E(W_t) = 0$ and is independent of the increments $W_{t_{j+1}} - W_{t_j}$. Moreover,
$$E\left(\left|\int_0^T W_t\, dW_t\right|^2\right) = \int_0^T E\big(|W_t|^2\big)\, dt = \int_0^T t\, dt = \frac{1}{2}T^2.$$
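The $\lambda$-dependence in (15) can be seen in a short Monte Carlo experiment. The sketch below (step count, path count, and seed are illustrative choices) evaluates the sums with left endpoints ($\lambda = 0$, the Itô choice) and right endpoints ($\lambda = 1$) on simulated paths, expecting averages near $0$ and $T$ respectively:

```python
import random

def riemann_sum_endpoint(rng, n_steps, T, use_right):
    """One sample of sum_j W_{tau_j} (W_{t_{j+1}} - W_{t_j}) on [0, T],
    evaluating W at the right endpoint if use_right, else the left."""
    dt = T / n_steps
    w = 0.0
    total = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, dt ** 0.5)
        w_next = w + dw
        total += (w_next if use_right else w) * dw
        w = w_next
    return total

rng = random.Random(3)
T, n_steps, n_paths = 1.0, 200, 2000
left = sum(riemann_sum_endpoint(rng, n_steps, T, False)
           for _ in range(n_paths)) / n_paths
right = sum(riemann_sum_endpoint(rng, n_steps, T, True)
            for _ in range(n_paths)) / n_paths

# Expected value is lambda * T: 0 for the Ito (left-endpoint) choice, T for lambda = 1.
assert abs(left - 0.0) < 0.1
assert abs(right - T) < 0.1
```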
The main point of the construction of the Itô integral is that, for the stochastic integral $\int_{t_0}^{T} f(s, \omega)\, dW_s(\omega)$, the dependence of $f(s, \omega)$ on $W(s, \omega)$ is nonanticipative, meaning that the random function $f(s, \omega)$ can depend, at most, on the present and past values of the Wiener process $W(s, \omega)$ and is independent of its future.
To clarify, the integrand $f(s, \omega)$ is nonanticipative if the random variable $f(t, \cdot)$ is $\mathcal{A}_t$-measurable for $t \in [0, T]$, where $\{\mathcal{A}_t, t \ge 0\}$ is the increasing family of $\sigma$-algebras generated by $W_t$, $t \ge 0$.
Furthermore, the relevant class $L^2_T$ consists of functions $f : [0, T] \times \Omega \to \mathbb{R}$ satisfying

(1) $f$ is jointly $\beta \times \mathcal{A}$-measurable, where $\beta$ is the Borel $\sigma$-algebra on $[0, T]$ (note that the collection of Borel sets is the smallest $\sigma$-algebra containing the open sets);

(2) $\int_0^T E\big(f(t, \cdot)^2\big)\, dt < \infty$;

(3) $f(t, \cdot)$ is $\mathcal{A}_t$-measurable for each $t \in [0, T]$.

For such functions, the Itô integral $I(f) = \int_0^T f(s, \omega)\, dW_s(\omega)$ is defined as the mean-square limit of the sums (14) with the evaluation points taken at the left endpoints, $\tau_j^{(n)} = t_j^{(n)}$. Among its basic properties is linearity:

(4) $I(\alpha f + \beta g) = \alpha I(f) + \beta I(g)$ for $f, g \in L^2_T$ and all $\alpha, \beta \in \mathbb{R}$.
5 Conclusion
Itô integrals now provide us with the necessary theory to interpret the stochastic differential equation
$$dX_t = a(t, X_t)\, dt + b(t, X_t)\, dW_t$$
as a stochastic integral equation
$$X_t = X_{t_0} + \int_{t_0}^{t} a(s, X_s)\, ds + \int_{t_0}^{t} b(s, X_s)\, dW_s,$$
whose solutions are diffusion processes obeying the Kolmogorov equations previously discussed.
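In practice, such stochastic integral equations are approximated numerically. The simplest scheme, the Euler-Maruyama method, advances $X$ by $a(t, X)\Delta t + b(t, X)\Delta W$ at each step; the sketch below is a minimal implementation (the Ornstein-Uhlenbeck-type coefficients $a = -x$, $b = 1$ and all numerical parameters are illustrative choices, not taken from the text):

```python
import random

def euler_maruyama(a, b, x0, T, n_steps, seed=11):
    """Approximate a solution path of dX_t = a(t, X_t) dt + b(t, X_t) dW_t
    via the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    dt = T / n_steps
    t, x = 0.0, x0
    path = [x]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, dt ** 0.5)   # Wiener increment ~ N(0, dt)
        x = x + a(t, x) * dt + b(t, x) * dw
        t += dt
        path.append(x)
    return path

# Illustrative example: mean-reverting SDE dX = -X dt + dW.
path = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0,
                      x0=2.0, T=5.0, n_steps=5000)
assert len(path) == 5001 and path[0] == 2.0
```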
The theory presented in this introduction to stochastic differential equations allows us to consider the solutions of such equations. In fact, as can be seen, these solutions are stochastic processes. The representation of SDEs at the macroscopic level is a diffusion, say of some substance suspended in a physical medium, that is modelled by the Kolmogorov equations mentioned above. The century-long discussion revolving around the relationship between the microscopic random behaviour of particles and the nature of a diffusion (among many eminent mathematicians and scientists, including Einstein) has been settled by the Itô integral, devised by K. Itô, which gives meaning to stochastic differential equations. The main result of this discussion is the relationship between stochastic diffusion processes and the Kolmogorov partial differential equations.