Vanier College

Science Programme - Comprehensive Assessment

Stochastic Differential Equations

Lauren Ménard

201-HTH-05

May 20, 2016

1 Introduction

Differential equations in which one or more random processes are involved are called stochastic differential equations (SDEs), where the term stochastic describes the random behaviour of such processes. It is not surprising that these equations are widely used in various fields: their applications range from modelling the fluctuations of stock prices to the diffusion of particles in a given physical medium. They arise wherever an underlying random process can be found.

To evaluate these phenomena, we must be able to solve the corresponding stochastic differential equations. To do so, however, we must first understand the properties of stochastic processes to a certain level. That is why, in this introduction to SDEs, we will explore the dynamics of these types of processes in the discrete-time case, e.g. Markov chains, as well as in the continuous-time case, e.g. classes of processes such as Markov processes and diffusion processes, with emphasis on the example of the Wiener process. The latter is particularly important, as the main integral used to solve SDEs, the Itô integral, crucially relies on properties of the Wiener process.

To provide further context, the Wiener process is a mathematical interpretation, devised by mathematician Norbert Wiener, of the physical phenomenon describing the erratic movement of particles suspended in a fluid, whether liquid or gas, resulting from their collisions with the atoms or molecules of the medium. This phenomenon was discovered by botanist Robert Brown in 1827 while observing grains of pollen suspended in water at the microscopic level, and is appropriately called Brownian motion.

Consequently, stochastic calculus and the related mathematical fields owe credit to the discoveries made about the relationships between different random processes and physical phenomena. With all that said, the theory as presented in this introduction will hopefully provide further clarification and understanding of stochastic differential equations.

2 Markov Chains

2.1 Random Walks

To get a better understanding of the significance of Markov chains and stochastic matrices, it is useful to look at the simplest cases of random walks. Consider a particle moving along an axis with discrete single-unit steps in discrete time, starting at position 0 at time 0, x(0) = 0.

The initial distribution is described by

$$p(i) = \begin{cases} 1, & i = 0 \\ 0, & i \neq 0, \end{cases}$$

where the probability that the particle is at position i = 0 is p(i) = 1, and the probability that it is anywhere else, i ≠ 0, is p(i) = 0.

Next, suppose that this particle only moves one unit to the right or left in one unit of time, with associated probabilities p and q = 1 − p, respectively. Finally, let p = q = 1/2. This particular type of random walk is called a standard random walk.

Aside: Let E be the set of all states of the particle, which in this example is the set of all integers, E = Z. Notice that, in this particular case, if E_n corresponds to the position of the particle (its state) at time n, where n also counts the number of single-unit steps, then the set of states reachable at time n is enclosed between −n and n: E_n = {−n, −n + 2, \ldots, n − 2, n}.

  • Figure 1: Random walks with smaller and smaller time steps.
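The standard random walk is straightforward to simulate. The sketch below is plain Python (the function name and parameters are our own, not from the text); it generates one trajectory with p = q = 1/2 and verifies the constraint on E_n noted in the aside.

```python
import random

def random_walk(n_steps, p=0.5, seed=0):
    """Simulate a standard random walk x(0) = 0 with single-unit steps:
    +1 with probability p, -1 with probability q = 1 - p."""
    rng = random.Random(seed)
    x = 0
    path = [x]
    for _ in range(n_steps):
        x += 1 if rng.random() < p else -1
        path.append(x)
    return path

path = random_walk(100)
# After n steps the state lies in E_n = {-n, -n+2, ..., n-2, n}:
# bounded by n in absolute value and of the same parity as n.
assert all(abs(x) <= n and (x - n) % 2 == 0 for n, x in enumerate(path))
```

Re-running with a different seed produces a different trajectory, but the parity and range constraints on E_n always hold.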

The transition probability distribution for the standard random walk defined above can be described by the following,

$$P(i, j) = \begin{cases} p, & j = i + 1 \\ q, & j = i - 1 \\ 0, & \text{otherwise,} \end{cases} \qquad (1)$$

where i is the current state of the particle and j is the next immediate state of the particle.

Modifications can be made to this standard random walk. For example, if we allow the particle to remain in the same position over a unit of time, then the probability that describes this situation is denoted r. Therefore p + q + r = 1, as the probabilities must sum to 1, and the transition probability distribution, i.e. the transition from position i to j, is described by the following,

$$P(i, j) = \begin{cases} p, & j = i + 1 \\ q, & j = i - 1 \\ r, & j = i \\ 0, & \text{otherwise.} \end{cases} \qquad (2)$$

Another modification that can be applied to random walks is the case where there are two limiting barriers. If the barriers are found at points A and B, respectively, and 0 is contained between these points, A < 0 < B, then the set of all states E has B − A + 1 states. The transition probability distribution is defined by the following,

$$P(i, j) = \begin{cases} s_A, & i = A,\ j = A \\ 1 - s_A, & i = A,\ j = A + 1 \\ s_B, & i = B,\ j = B \\ 1 - s_B, & i = B,\ j = B - 1 \\ p, & i \neq A, B,\ j = i + 1 \\ q, & i \neq A, B,\ j = i - 1 \\ r, & i \neq A, B,\ j = i \\ 0, & \text{otherwise,} \end{cases} \qquad (3)$$

where s_A and s_B are the probabilities that the particle stays at point A or B, respectively, in the next immediate step. Note that if s_A = s_B = 1, then there is full absorption by the barriers, and if s_A = s_B = 0, then there is full repulsion by the barriers.
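Distribution (3) is easy to materialize as a concrete matrix. The sketch below (plain Python; the helper name and the particular values of A, B and the probabilities are illustrative choices, not taken from the text) builds the (B − A + 1) × (B − A + 1) transition matrix and checks that every row sums to 1.

```python
def barrier_walk_matrix(A, B, p, q, r, sA, sB):
    """Transition matrix of the random walk on {A, ..., B} following
    distribution (3): barriers at A and B with stickiness sA, sB.
    Rows/columns are indexed by (state - A)."""
    n = B - A + 1
    P = [[0.0] * n for _ in range(n)]
    for i in range(A, B + 1):
        row = P[i - A]
        if i == A:
            row[0], row[1] = sA, 1 - sA          # stay at A or step right
        elif i == B:
            row[n - 1], row[n - 2] = sB, 1 - sB  # stay at B or step left
        else:
            row[i - A + 1] = p                   # j = i + 1
            row[i - A - 1] = q                   # j = i - 1
            row[i - A] = r                       # j = i
    return P

# Illustrative parameters: absorbing barrier at A, repelling at B.
P = barrier_walk_matrix(A=-2, B=3, p=0.4, q=0.4, r=0.2, sA=1.0, sB=0.0)
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)  # stochastic rows
```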

Furthermore, one can write (1), (2) and (3) in matrix form, with rows indexed by the current state i and columns by the next state j. For (1) and (2) the matrices are infinite and tridiagonal,

$$P(i, j) = \begin{pmatrix} \ddots & \ddots & \ddots & & \\ & q & 0 & p & \\ & & q & 0 & p \\ & & & \ddots & \ddots \end{pmatrix}, \qquad P(i, j) = \begin{pmatrix} \ddots & \ddots & \ddots & & \\ & q & r & p & \\ & & q & r & p \\ & & & \ddots & \ddots \end{pmatrix},$$

and for (3), with rows and columns ordered A, A + 1, \ldots, B − 1, B,

$$P(i, j) = \begin{pmatrix} s_A & 1 - s_A & & & \\ q & r & p & & \\ & \ddots & \ddots & \ddots & \\ & & q & r & p \\ & & & 1 - s_B & s_B \end{pmatrix},$$

respectively, where the row label of the matrices indicates the current position i (current state) and the column label indicates the next position j. These transition matrices are stochastic matrices P_{n×n}, whose rows are known as probability or random vectors. Such vectors X_0, X_1, X_2, \ldots, X_n each have nonnegative entries that sum to 1.

For the case where the particle is not bounded between two points A and B, as in cases (1) and (2), the stochastic matrix is an infinitely large, unbounded matrix P_∞. It is for this reason that we typically use matrices to evaluate discrete-time processes with a finite set of states.

2.2 Markov Chains

Following the theory for stochastic matrices, a Markov chain can be defined as a sequence of probability vectors X_0, X_1, X_2, \ldots associated with a certain stochastic matrix P such that the following conditions hold:

(1) P(X_0 = i) = p_0(i) for each state i ∈ E;

(2) $P(X_0 \in E) = \sum_{i \in E} p(i) = 1$;

(3) $P(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = P(X_{n+1} = i_{n+1} \mid X_n = i_n) = P(i_n, i_{n+1})$;

(4) $P(X_{n+1} \in E \mid X_n = i) = \sum_{j \in E} P(X_{n+1} = j \mid X_n = i) = \sum_{j \in E} P(i, j) = 1$ for every i ∈ E.

The above conditions can be interpreted as follows. If E is the set of all states

of some system, then X_n indicates the active state at time n. We assume that the probability that i is the active state at time zero is p(i) for any initial state i ∈ E, and that the system at time zero is in E with probability 1. Furthermore, according to Property (3), the transition probability from some state i to another state j depends only on the state i and not on any previously visited states. Lastly, Property (4) says that it is impossible for the system to leave the set E, as the transition from i to j always stays within E.

Now, consider a finite set C of states in which all the transition probabilities from i → j and from j → i in some finite fixed number of steps are positive. In this case, states i and j are said to communicate. In other words, every pair of states in C communicates, and it is possible to get from any state to any other state in a certain number of steps. A Markov chain is irreducible if its set of states forms such a communicating class C.

Another important concept is that of regular stochastic matrices. A stochastic matrix P for which some power P^n has strictly positive entries is called regular. Note that regular matrices are only discussed in the case of finite Markov chains. Furthermore, the Markov chain described by this type of matrix is necessarily irreducible. However, the converse is not always true: an irreducible Markov chain is not necessarily regular.

For example, consider the transition matrix

$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

It is irreducible, as it is possible to move to any state from any state. However, there exists no power of P in which all entries are strictly positive, since

$$P^{2n} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad P^{2n+1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

Therefore, it is not regular.

Moreover, an irreducible Markov chain with set of states C can be aperiodic or periodic. If returns to a state i ∈ C occur at irregular times (the greatest common divisor of the numbers of steps in which a return to i is possible is 1), then the state is said to be aperiodic. An irreducible Markov chain needs only one aperiodic state for all of its states to be aperiodic. However, if returns to i occur only at multiples of some k > 1, then the chain is said to be periodic. In the previous example, k = 2, so the chain is periodic. Consequently, that Markov chain is irreducible but not aperiodic. This leads to the following proposition: a finite Markov chain is regular if and only if it is irreducible and aperiodic.

These definitions are useful when discussing the long-term behaviour of a system. For an aperiodic irreducible Markov chain, high powers of the associated regular stochastic matrix P approach a limiting value, in the sense that

$$\lim_{n \to \infty} P^n = \Pi,$$

where Π is a matrix with all rows equal to the same probability vector π. Thus, π is called the steady-state vector of the Markov chain. Note that a Markov chain must be irreducible and aperiodic for the stochastic matrix to be regular and for a steady state to exist. A Markov chain that respects the conditions described above as n → ∞ is called ergodic. With the addition of this new definition, the following claim arises:

A finite Markov chain is aperiodic and irreducible ⟺ regular ⟺ ergodic.

There are other ways, however, to find a steady-state vector π. Consider the following. If we have a matrix A, then a row vector ξ that satisfies the equation ξA = λξ is called an eigenvector with eigenvalue λ. To find such eigenvectors, one can use A^T ξ^T = λ ξ^T. In the case where A is a regular stochastic matrix, now called P, we can apply the Perron-Frobenius theorem:

(1) P always has an eigenvalue λ_1 = 1;

(2) all other eigenvalues satisfy |λ_i| < 1 for i > 1;

(3) λ_1 = 1 has algebraic and geometric multiplicity 1.

Consequently, the λ = 1 eigenvector ξ_1 is the steady-state vector π, which is shown by the following argument. Let

$$\pi_n = c_1 \lambda_1^n \xi_1 + c_2 \lambda_2^n \xi_2 + \ldots + c_k \lambda_k^n \xi_k,$$

where, by (1) and (2), taking $\lim_{n \to \infty} \pi_n$ collapses all terms to 0 except the ξ_1 term. Therefore, we get $\pi = \lim_{n \to \infty} \pi_n = c_1 \xi_1$. This particular probability vector, π, is the unique normalized solution of the equation ξ_1 P = ξ_1.

Example: Let

$$P = \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.2 & 0.8 & 0 \\ 0.3 & 0.3 & 0.4 \end{pmatrix}$$

be a regular stochastic matrix (stochastic along its rows). The state of this system is described by the Markov chain x_k = x_0 P^k for k = 0, 1, 2, \ldots

The steady-state vector can be found by taking high powers of P. For example, within the precision of a computer algebra system,

$$P^{100} = \begin{pmatrix} 0.3 & 0.6 & 0.1 \\ 0.3 & 0.6 & 0.1 \\ 0.3 & 0.6 & 0.1 \end{pmatrix}$$

and π = (0.3, 0.6, 0.1).

However, as previously discussed, we can also find the steady-state vector by solving (P^T − I) ξ_1^T = 0. Row reduction gives

$$P^T - I = \begin{pmatrix} -0.5 & 0.2 & 0.3 \\ 0.3 & -0.2 & 0.3 \\ 0.2 & 0 & -0.6 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & -6 \\ 0 & 0 & 0 \end{pmatrix},$$

so

$$\xi_1^T = s \begin{pmatrix} 3 \\ 6 \\ 1 \end{pmatrix},$$

and by choosing s = 1/10, π = (0.3, 0.6, 0.1).
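The power computation in this example is easy to reproduce without a computer algebra system. The sketch below (plain Python; the helper name is our own) iterates x_{k+1} = x_k P from an arbitrary probability vector, which is equivalent to computing x_0 P^k, and recovers π = (0.3, 0.6, 0.1).

```python
def mat_vec(x, P):
    """Row vector times matrix: (xP)_j = sum_i x_i P[i][j]."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.5, 0.3, 0.2],
     [0.2, 0.8, 0.0],
     [0.3, 0.3, 0.4]]

x = [1.0, 0.0, 0.0]           # arbitrary initial probability vector
for _ in range(100):          # x becomes x_0 P^100
    x = mat_vec(x, P)

pi = [0.3, 0.6, 0.1]          # steady-state vector from the example
assert all(abs(a - b) < 1e-9 for a, b in zip(x, pi))
```

Convergence is fast here because the subdominant eigenvalues of P have modulus well below 1, so the non-steady-state components decay geometrically.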

Now, let us discuss non-regular Markov chains. If the associated matrix, denoted Q, is non-regular, then Q^n does not converge and $\lim_{n \to \infty} Q^n(i, j)$ does not exist.

Example: Let

$$Q = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

be a periodic irreducible stochastic matrix. Then

$$Q^2 = Q^4 = Q^6 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad Q^1 = Q^3 = Q^5 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

In this case, Q is irreducible, but not regular because of its periodicity.

Stochastic matrices are particularly effective when analyzing the long-term behaviour, i.e. the steady-state probability vector, of a discrete-time Markov chain. However, there exist many other cases where time is continuous, called continuous-time stochastic processes. Such processes will be discussed in the next section.

3 Continuous-Time Stochastic Processes

As the section title suggests, we will consider stochastic processes in continuous time, where specific classes, notably Markov processes and diffusion processes, will be discussed. In addition, the Wiener process, an example of both classes, will be explored.

3.1 Markov Processes

A Markov process, as its name suggests, is an extension of the previously explored Markov chain to continuous time. It can be viewed as a stochastic process that satisfies the Markov property, which can be described as follows. Let X = {X(t), t ∈ R⁺} be a continuous-time stochastic process, i.e. a family of random variables X(t), where t ≥ 0. Then,

$$P\big[X(t_{n+1}) \in B \mid X(t_n) = x_n, \ldots, X(t_1) = x_1\big] = P\big[X(t_{n+1}) \in B \mid X(t_n) = x_n\big]$$

for all Borel subsets B of R, all time instants 0 < t_1 < \ldots < t_n < t_{n+1}, and all states x_1, \ldots, x_n in R. In other words, a stochastic process has the Markov property if the conditional probability distribution of future states of the process depends only upon the present state and not on the sequence of events that precede it.

Moreover, Borel sets in the real line are a class of events obtained as relative complements, countable unions and countable intersections of intervals of the real line. In the case where the set of all possible outcomes is defined as the real line or as an interval of it, meaning that the sample space is not finite or countable, it would be unrealistic to assign probabilities to all possible subsets of that interval. Therefore, the use of Borel sets is necessary.

The transition probabilities of the Markov process X(t) can be written as follows,

$$P(s, x; t, B) = P\big[X(t) \in B \mid X(s) = x\big] \quad \text{for } 0 \le s < t,$$

and, in the continuous case,

$$P(s, x; t, B) = \int_B p(s, x; t, y)\, dy,$$

where the density p(s, x; t, ·) is called the transition density.

Furthermore, a Markov process is said to be homogeneous if all its transition probabilities depend only on the time difference t − s between two instants. This means that

$$P\big[X(s + t) = j \mid X(s) = i\big]$$

is independent of s. When this holds, setting s = 0, we obtain

$$P\big[X(s + t) = j \mid X(s) = i\big] = P\big[X(t) = j \mid X(0) = i\big], \quad \forall\, s, t \ge 0.$$

An important class of Markov processes, called diffusion processes, will be explored in the following section.

3.2 Diffusion Processes

The diffusion process, which will be discussed in the one-dimensional case, is a special case of Markov process with continuous sample paths. Note that the terms sample path and trajectory can be used interchangeably.

A Markov process X(t) with transition densities p(s, x; t, y) is called a diffusion process if the following conditions are satisfied.

(1) For all x and all ε > 0,

$$\lim_{t \to s^+} \frac{1}{t - s} \int_{|y - x| > \varepsilon} p(s, x; t, y)\, dy = 0;$$

(2) there exists a function a(s, x) such that, for all x and all ε > 0,

$$\lim_{t \to s^+} \frac{1}{t - s} \int_{|y - x| \le \varepsilon} (y - x)\, p(s, x; t, y)\, dy = a(s, x);$$

(3) there exists a function b(s, x) such that, for all x and all ε > 0,

$$\lim_{t \to s^+} \frac{1}{t - s} \int_{|y - x| \le \varepsilon} (y - x)^2\, p(s, x; t, y)\, dy = b^2(s, x).$$

The first condition implies, as stated above, that the process is continuous along any chosen sample path. Furthermore, the second condition states that there exists a function a(s, x), called the drift coefficient of the diffusion, which gives the instantaneous rate of change of the mean of the process given that X(s) = x. In addition, for a diffusion process there exists a function b(s, x), called the diffusion coefficient, for which b²(s, x) gives the instantaneous rate of change of the squared variation of the process given that X(s) = x.

3.2.1 Kolmogorov Equations

The Kolmogorov equations are two partial differential equations (PDEs) that arise in the case of continuous-time, continuous-state Markov processes; they were introduced by Andrei Kolmogorov in 1931. These equations, namely the backward and forward Kolmogorov equations, will be explored in the case of Markov diffusion processes. It should be noted that the forward equation is also known as the Fokker-Planck equation.

The Forward Kolmogorov Equation At time s, we are given information about the state of the system. This information is described by a probability density p(s, x), which imposes an initial condition on the partial differential equation; the equation is then solved from time s to t for any s < t, hence the term forward. In other words, the solution of the PDE is found by integrating forward in time, from s to t.

Suppose X(t) is a diffusion process with transition density p(s, x; t, y), assumed continuous in its arguments, and suppose that both a(t, y) and b(t, y) are continuous in t and y. Then, as a function of t and y, the transition density p is a solution to

$$\frac{\partial p}{\partial t} + \frac{\partial}{\partial y}\big[a(t, y)\, p\big] - \frac{1}{2}\frac{\partial^2}{\partial y^2}\big[b^2(t, y)\, p\big] = 0, \qquad (4)$$

with initial condition p(s, x; s, y) = δ(x − y).

The proof that diffusion processes obey the forward Kolmogorov equation is similar to the proof for the backward Kolmogorov equation, which will be given below.

The Backward Kolmogorov Equation Conversely, at time s we are interested in whether, at a future time point t, the system will be in a given subset of states. This subset is described by a function u(t, x), which imposes a terminal condition on the partial differential equation; the equation is solved from time t back to s for any t > s, hence the term backward. In other words, the solution of the PDE is found by integrating backward in time, from t to s.

Theorem: Let f(x) be a continuous bounded function on R, and let u(s, x) be the conditional expectation

$$u(s, x) = E\big[f(X_t) \mid X_s = x\big] = \int f(y)\, p(s, x; t, y)\, dy,$$

with t fixed. Furthermore, suppose that both functions describing the drift and diffusion coefficients, a(s, x) and b(s, x), respectively, are continuous in both s and x. Then u(s, x) is a solution to the partial differential equation

$$\frac{\partial u}{\partial s} + a(s, x)\frac{\partial u}{\partial x} + \frac{1}{2} b^2(s, x)\frac{\partial^2 u}{\partial x^2} = 0, \qquad (5)$$

with the terminal condition u(t, x) = f(x), for s ∈ [0, t].

Proof. First observe that the continuity assumption on the diffusion process, together with the fact that the function f(x) is bounded, implies that

$$u(s, x) = \int_{\mathbb{R}} f(y)\, p(s, x; t, y)\, dy = \int_{|y - x| \le \varepsilon} f(y)\, p(s, x; t, y)\, dy + \int_{|y - x| > \varepsilon} f(y)\, p(s, x; t, y)\, dy$$
$$\le \int_{|y - x| \le \varepsilon} f(y)\, p(s, x; t, y)\, dy + \|f\|_{\infty} \int_{|y - x| > \varepsilon} p(s, x; t, y)\, dy = \int_{|y - x| \le \varepsilon} f(y)\, p(s, x; t, y)\, dy + o(t - s).$$

Here the little-o notation signifies a term which goes to zero faster than t − s as t → s, and ‖f‖_∞ is the maximum absolute value of the bounded function f. We add and subtract the final condition f(x) and repeat the previous calculation to obtain

$$u(s, x) = f(x) + \int_{|y - x| \le \varepsilon} \big(f(y) - f(x)\big)\, p(s, x; t, y)\, dy + o(t - s).$$

Using the Chapman-Kolmogorov equation we obtain, for s < r < t,

$$u(s, x) = \int_{\mathbb{R}} f(z)\, p(s, x; t, z)\, dz = \int_{\mathbb{R}} \int_{\mathbb{R}} p(s, x; r, y)\, p(r, y; t, z)\, dz\, dy = \int_{\mathbb{R}} u(r, y)\, p(s, x; r, y)\, dy.$$

From Taylor's theorem we have

$$u(r, z) - u(r, x) = \frac{\partial u(r, x)}{\partial x}(z - x) + \frac{1}{2}\frac{\partial^2 u(r, x)}{\partial x^2}(z - x)^2 (1 + \alpha_\varepsilon), \quad |z - x| \le \varepsilon,$$

where $\lim_{\varepsilon \to 0} \alpha_\varepsilon = 0$. Combining the above equations we calculate

$$\frac{u(s, x) - u(s + h, x)}{h} = \frac{1}{h}\left(\int_{\mathbb{R}} p(s, x; s + h, y)\, u(s + h, y)\, dy - u(s + h, x)\right).$$

Expanding u(s + h, y) around x with the Taylor formula above over |y − x| ≤ ε and letting h → 0, the defining limits (2) and (3) of the drift and diffusion coefficients turn the right-hand side into a(s, x) ∂u/∂x + (1/2) b²(s, x) ∂²u/∂x², while the left-hand side tends to −∂u/∂s. Rearranging gives exactly equation (5).
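The theorem can be sanity-checked numerically in the simplest case a = 0, b = 1, i.e. the Wiener process explored below, whose transition density is a Gaussian with variance t − s. The sketch below (plain Python; the function names, grid sizes, and the admittedly unbounded choice f(y) = y², for which u(s, x) = x² + (t − s) in closed form, are our own illustrative assumptions) computes u by quadrature and checks equation (5) by central differences.

```python
import math

def u(s, x, t=2.0):
    """E[f(X_t) | X_s = x] for a = 0, b = 1 and f(y) = y^2, computed by
    midpoint-rule integration of f against the Gaussian transition
    density with mean x and variance t - s."""
    var = t - s
    m = 2000                                 # quadrature points
    width = 12.0 * math.sqrt(var)            # +/- 6 standard deviations
    dy = width / m
    total = 0.0
    for k in range(m):
        y = x - width / 2.0 + (k + 0.5) * dy
        total += y * y * math.exp(-(y - x) ** 2 / (2.0 * var)) * dy
    return total / math.sqrt(2.0 * math.pi * var)

# Closed form u(s, x) = x^2 + (t - s) gives du/ds = -1 and
# d2u/dx2 = 2, so du/ds + (1/2) d2u/dx2 = 0, as equation (5) requires.
h = 1e-3
du_ds = (u(0.5 + h, 1.0) - u(0.5 - h, 1.0)) / (2.0 * h)
d2u_dx2 = (u(0.5, 1.0 + h) - 2.0 * u(0.5, 1.0) + u(0.5, 1.0 - h)) / (h * h)
assert abs(du_ds + 0.5 * d2u_dx2) < 1e-3
```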

3.3 The Wiener Process

Figure 2: Three-dimensional Brownian motion

The standard Wiener process W = {W(t), t ≥ 0} is a family of Gaussian random variables W(t) that depends continuously on t ≥ 0 and that satisfies the following:

(1) W(0) = 0;

(2) E(W(t)) = 0;

(3) Var(W(t) − W(s)) = t − s;

for all 0 ≤ s ≤ t. We can gather from these conditions that, as time increases, the variance also increases, while the mean remains 0 if the process starts at 0. The Wiener process is sample-path continuous, meaning that it is continuous along any choice of trajectory. This is not surprising, as the same can be said more generally of diffusion processes. However, with probability 1, it is nowhere differentiable for any time t ≥ 0. This will be proved in the mean-square sense.

Proof. By definition, the increment W(t + h) − W(t) is Gaussian with mean 0 and variance h. If we consider the quotient for the derivative,

$$\frac{W(t + h) - W(t)}{(t + h) - t},$$

then

$$\lim_{h \to 0} E\left[\left(\frac{W(t + h) - W(t)}{(t + h) - t}\right)^2\right] = \lim_{h \to 0} \frac{1}{h} = \infty,$$

since the quotient has mean square 1/h, which goes to infinity as h approaches 0. Therefore, as no such limit exists, the trajectories of the Wiener process are nowhere differentiable. In other words, as the curve of W(t) is observed on an increasingly smaller scale, it becomes more and more erratic, which results in a completely random quantity.
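Although nowhere differentiable, the Wiener process is easy to approximate on a time grid, since over a step h the increment W(t + h) − W(t) is Gaussian with mean 0 and variance h. The sketch below (plain Python with the standard-library random module; function name, grid sizes, and seeds are our own choices) simulates paths this way and checks property (3), Var(W(t) − W(s)) = t − s, empirically.

```python
import random

def wiener_path(T=1.0, n=100, seed=0):
    """Approximate a standard Wiener path on [0, T] with n steps by
    summing independent N(0, h) increments, h = T/n."""
    rng = random.Random(seed)
    h = T / n
    W = [0.0]                          # property (1): W(0) = 0
    for _ in range(n):
        W.append(W[-1] + rng.gauss(0.0, h ** 0.5))
    return W

# Empirical check of property (3): Var(W(t) - W(s)) = t - s.
n, n_paths = 100, 4000
s_idx, t_idx = 25, 75                  # s = 0.25, t = 0.75, t - s = 0.5
incs = []
for k in range(n_paths):
    W = wiener_path(n=n, seed=k)
    incs.append(W[t_idx] - W[s_idx])
mean = sum(incs) / n_paths
var = sum((x - mean) ** 2 for x in incs) / n_paths
assert abs(var - 0.5) < 0.08           # theoretical value t - s = 0.5
```

Refining the grid makes each simulated path look more erratic at small scales, in line with the nowhere-differentiability just proved.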

The transition density of the Wiener process is

$$p(s, x; t, y) = \frac{1}{\sqrt{2\pi (t - s)}}\, \exp\left(-\frac{(y - x)^2}{2 (t - s)}\right). \qquad (6)$$

Note that this transition density is expressed as a Gaussian distribution. Furthermore, by evaluating the partial derivatives of (6), we find that they satisfy the partial differential equations

$$\frac{\partial p}{\partial t} - \frac{1}{2}\frac{\partial^2 p}{\partial y^2} = 0, \quad (s, x) \text{ fixed}, \qquad (7)$$

and

$$\frac{\partial p}{\partial s} + \frac{1}{2}\frac{\partial^2 p}{\partial x^2} = 0, \quad (t, y) \text{ fixed}. \qquad (8)$$

Proof. For equation (7), letting (s, x) be fixed at (0, 0), equation (6) becomes

$$p(0, 0; t, y) = \frac{1}{\sqrt{2\pi t}}\, \exp\left(-\frac{y^2}{2t}\right).$$

Taking the first partial derivative with respect to t, and the first and second partial derivatives with respect to y, we obtain

$$\frac{\partial}{\partial t}\left[\frac{1}{\sqrt{2\pi t}}\, \exp\left(-\frac{y^2}{2t}\right)\right] = -\frac{1}{2\sqrt{2\pi}\, t^{3/2}}\, \exp\left(-\frac{y^2}{2t}\right) + \frac{1}{2\sqrt{2\pi t}}\, \frac{y^2}{t^2}\, \exp\left(-\frac{y^2}{2t}\right), \qquad (9)$$

$$\frac{\partial}{\partial y}\left[\frac{1}{\sqrt{2\pi t}}\, \exp\left(-\frac{y^2}{2t}\right)\right] = -\frac{y}{\sqrt{2\pi}\, t^{3/2}}\, \exp\left(-\frac{y^2}{2t}\right),$$

$$\frac{\partial^2}{\partial y^2}\left[\frac{1}{\sqrt{2\pi t}}\, \exp\left(-\frac{y^2}{2t}\right)\right] = \frac{\partial}{\partial y}\left[-\frac{y}{\sqrt{2\pi}\, t^{3/2}}\, \exp\left(-\frac{y^2}{2t}\right)\right] = \frac{1}{\sqrt{2\pi t}}\, \frac{y^2}{t^2}\, \exp\left(-\frac{y^2}{2t}\right) - \frac{1}{\sqrt{2\pi}\, t^{3/2}}\, \exp\left(-\frac{y^2}{2t}\right). \qquad (10)$$

Comparing (9) and (10) obtained from this calculation, we observe that

$$\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial^2 p}{\partial y^2},$$

and, consequently,

$$\frac{\partial p}{\partial t} - \frac{1}{2}\frac{\partial^2 p}{\partial y^2} = 0.$$

Similarly, by selecting specific conditions for (t, y), it can be shown that

$$\frac{\partial p}{\partial s} + \frac{1}{2}\frac{\partial^2 p}{\partial x^2} = 0.$$
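The identity ∂p/∂t = (1/2) ∂²p/∂y² can also be checked numerically. The sketch below (plain Python; the sample point (t₀, y₀) and step size are illustrative choices) evaluates the density p(0, 0; t, y) and approximates both sides of equation (7) by central differences.

```python
import math

def p(t, y):
    """Transition density p(0, 0; t, y) of the standard Wiener process."""
    return math.exp(-y * y / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

t0, y0, h = 1.5, 0.7, 1e-4

# Central differences for dp/dt and d^2p/dy^2 at (t0, y0).
dp_dt = (p(t0 + h, y0) - p(t0 - h, y0)) / (2.0 * h)
d2p_dy2 = (p(t0, y0 + h) - 2.0 * p(t0, y0) + p(t0, y0 - h)) / (h * h)

# Equation (7): dp/dt - (1/2) d^2p/dy^2 = 0.
residual = dp_dt - 0.5 * d2p_dy2
assert abs(residual) < 1e-6
```

The residual is limited only by finite-difference truncation and round-off error; the same check with s and x varying confirms equation (8).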

To get a better understanding of the Wiener process, let us clearly relate it to the diffusion process. The standard Wiener process is a diffusion process with drift coefficient a(s, x) = 0 and diffusion coefficient b(s, x) = 1. Hence, we obtain

$$a(s, x) = \lim_{t \to s} E\left(\frac{X_t - X_s}{t - s}\ \middle|\ X_s = x\right) = 0,$$

and

$$b^2(s, x) = \lim_{t \to s} E\left(\frac{(X_t - X_s)^2}{t - s}\ \middle|\ X_s = x\right) = \lim_{t \to s} \frac{t - s}{t - s} = 1.$$

Consequently, substituting these values into the forward equation (4) and the backward equation (5), we obtain

$$\frac{\partial p}{\partial t} + (0)\frac{\partial p}{\partial y} - \frac{1}{2}(1)\frac{\partial^2 p}{\partial y^2} = 0, \quad \text{i.e.} \quad \frac{\partial p}{\partial t} - \frac{1}{2}\frac{\partial^2 p}{\partial y^2} = 0, \qquad (11)$$

and

$$\frac{\partial u}{\partial s} + (0)\frac{\partial u}{\partial x} + \frac{1}{2}(1)\frac{\partial^2 u}{\partial x^2} = 0, \quad \text{i.e.} \quad \frac{\partial u}{\partial s} + \frac{1}{2}\frac{\partial^2 u}{\partial x^2} = 0,$$

respectively, and these results are precisely the previously derived equations (7) and (8). Note that equation (11) is called the heat equation and is used to model the diffusion of heat. This makes the Wiener process a very important stochastic process in many different fields. However, for it to be useful, mathematical meaning must be assigned to its infinitesimal changes. Therefore, stochastic calculus is the logical continuation of the theory, where Itô integrals will be introduced.

4 Stochastic Calculus

Stochastic calculus is necessary to effectively study stochastic processes because, as previously mentioned, the properties of said processes prevent us from using regular calculus techniques. This section will explore the various obstacles and results that arose in the solving of this problem.

Recall that a Riemann integral $\int_a^b f(t)\, dt$ is defined for a continuous function f on a bounded interval [a, b]. The interval is partitioned into n subintervals with a = t_0 < t_1 < \ldots < t_n = b, and the Riemann integral is the limit of the corresponding sums as n → ∞ and the widths of the subintervals approach 0,

$$\int_a^b f(t)\, dt = \lim_{(t_j - t_{j-1}) \to 0} \sum_{j=1}^{n} f(t_{j-1})\, (t_j - t_{j-1}).$$

There exists, however, in the more general case, an integral called the Riemann-Stieltjes integral. Suppose f(t) and g(t) are real-valued bounded functions defined on an interval [a, b]. The simple dt integration can be generalized to increments dg(t) by using g(t_j) − g(t_{j-1}) instead of t_j − t_{j-1}. Thus, we obtain the Riemann-Stieltjes integral,

$$\int_a^b f(t)\, dg(t) = \lim_{(g(t_j) - g(t_{j-1})) \to 0} \sum_{j=1}^{n} f(t_{j-1})\, \big(g(t_j) - g(t_{j-1})\big).$$

Note that for such integrals to exist, the variation of g must be bounded and finite over the interval [a, b].

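For an integrator g of bounded variation, these sums do converge. A minimal sketch in plain Python (the helper name and the particular f and g are illustrative choices): it approximates ∫₀¹ t dg(t) with g(t) = t², whose exact value is ∫₀¹ 2t² dt = 2/3, using left-endpoint sums.

```python
def riemann_stieltjes(f, g, a, b, n):
    """Left-endpoint Riemann-Stieltjes sum of f dg over [a, b]
    with n equal subintervals."""
    h = (b - a) / n
    total = 0.0
    for j in range(n):
        t0, t1 = a + j * h, a + (j + 1) * h
        total += f(t0) * (g(t1) - g(t0))
    return total

# Integrate f(t) = t against g(t) = t^2 on [0, 1]:
# dg = 2t dt, so the exact value is 2/3.
approx = riemann_stieltjes(lambda t: t, lambda t: t * t, 0.0, 1.0, 100000)
assert abs(approx - 2.0 / 3.0) < 1e-4
```

The sum converges here precisely because g is smooth, hence of bounded variation; it is this step that fails when g is replaced by a Wiener path, as the next paragraphs explain.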

These types of integrals arise when solving the equations of various processes. For example, consider the case where a small amount of liquid flows with macroscopic velocity a(t, u(t)), where u(t) describes its position at time t. Furthermore, suppose that a microscopic particle suspended in this liquid displays evidence of Brownian motion. Consequently, the change in the position of the particle has the following equation,

$$du(t) = a\big(t, u(t)\big)\, dt + b\big(t, u(t)\big)\, dW_t. \qquad (12)$$

However, the second term of equation (12) does not make sense on its own, because the trajectories of the Wiener process are nowhere differentiable, as was previously discussed. If we represent this equation in integral form, we obtain the following,

$$u(t) - u(0) = \int_0^t a\big(s, u(s)\big)\, ds + \int_0^t b\big(s, u(s)\big)\, dW_s. \qquad (13)$$

Observe that the second term of equation (13) has the form of a Riemann-Stieltjes integral. For that interpretation to apply, the integrator must be of bounded variation. However, the Wiener process is nowhere differentiable and is not of bounded variation. Consequently, $\int_a^b f(t)\, dW_t$ cannot be interpreted as a Riemann-Stieltjes integral. This is precisely why Itô integrals are necessary for stochastic calculus. These integrals will be discussed in the next section.

4.1 Itô Stochastic Integrals

We want to make sense of the following expression,

$$\int_{t_0}^{T} f(s, \omega)\, dW_s(\omega),$$

which we will call a stochastic integral, defined for a random function f : [0, T] × Ω → R.

For a fixed sample path ω, the Riemann-Stieltjes construction would express the integral as the limit of the sums

$$\sum_{j=1}^{n} f\big(\tau_j^{(n)}, \omega\big)\Big(W_{t_{j+1}^{(n)}}(\omega) - W_{t_j^{(n)}}(\omega)\Big), \qquad (14)$$

over all possible choices of evaluation points $\tau_j^{(n)} \in \big[t_j^{(n)}, t_{j+1}^{(n)}\big]$ and partitions $0 = t_1^{(n)} < t_2^{(n)} < \ldots < t_{n+1}^{(n)} = T$ of [0, T], as

$$\max_{1 \le j \le n}\big\{t_{j+1}^{(n)} - t_j^{(n)}\big\} \to 0 \quad \text{as } n \to \infty.$$

However, this limit does not exist, as the sample paths of the Wiener process are not of bounded variation. Hence, instead of considering such pathwise convergence, we may consider L²-convergence, where the limit of the sums (14) may exist but differ depending on the choice of evaluation points $\tau_j^{(n)} \in \big[t_j^{(n)}, t_{j+1}^{(n)}\big]$.

    For example, consider the case where f(t, ω) = Wt(ω) and,

    τ(n)j = (1− λ)t

    (n)j + λt

    (n)j+1 = (j + λ)δ, λ ∈ [0, 1], ∀ j = 0, 1, . . . , n− 1,

    27

  • is a fixed evaluation point. Note that λ is a parameter that determines the

    choice of evaluation point. Furthermore, let δ = tj+1 − tj = Tn be the constant

    step size. The terms of (13) can, therefore, be rearranged as follows,

    Wτj

    (Wtj+1 −Wtj

    )= −1

    2

    (Wtj+1 −Wτj

    )2+

    1

    2

    (Wτj −Wtj

    )2+

    1

    2

    (W 2tj+1 −W

    2tj

    ).

    By taking the sums, we obtain,

    n∑j=1

    Wτj

    (Wtj+1 −Wtj

    )=

    −12

    n∑j=1

    (Wtj+1 −Wτj

    )2+

    1

    2

    n∑j=1

    (Wτj −Wtj

    )2+

    1

    2

    n∑j=1

    (W 2tj+1 −W

    2tj

    ).

    The third term of the right hand side becomes,

    1

    2

    n∑j=1

    (W 2τj −W

    2tj

    )=

    1

    2

    (W 2T −W 20

    )=

    1

    2W 2T .

    Moreover, recall that the Wiener process has variance,

    Var(W (t)−W (s)

    )= E

    (W (t)−W (s)

    )2= t− s.

    Consequently,

    E

    (n∑j=1

    Wτj

    (Wtj+1 −Wtj

    ))=

    −12

    n∑j=1

    (tj+1 − τj

    )+

    1

    2

    n∑j=1

    (τj − tj

    )+

    1

    2W 2T .

    28

Observe that the following equalities arise from the chosen evaluation point,
\[
t_{j+1} - \tau_j = t_{j+1} - (1 - \lambda)\, t_j - \lambda\, t_{j+1} = (1 - \lambda)\bigl[t_{j+1} - t_j\bigr] = (1 - \lambda)\,\delta,
\]
and,
\[
\tau_j - t_j = (1 - \lambda)\, t_j + \lambda\, t_{j+1} - t_j = \lambda\bigl[t_{j+1} - t_j\bigr] = \lambda\,\delta.
\]
Therefore, using $E(W_T^2) = T$ and $\delta n = T$,
\begin{align*}
E\left(\sum_{j=0}^{n-1} W_{\tau_j}\bigl(W_{t_{j+1}} - W_{t_j}\bigr)\right) &= -\frac{1}{2}\sum_{j=0}^{n-1} (1 - \lambda)\,\delta + \frac{1}{2}\sum_{j=0}^{n-1} \lambda\,\delta + \frac{1}{2}\, E\bigl(W_T^2\bigr) \\
&= -\frac{1}{2}(1 - \lambda)\,\delta n + \frac{1}{2}\,\lambda\,\delta n + \frac{1}{2}\, T \\
&= -\Bigl(\frac{1}{2} - \lambda\Bigr) T + \frac{1}{2}\, T \\
&= \lambda T.
\end{align*}
Note that the integral in the expected-value sense becomes,

\[
E\left(\int_0^T W_t \, dW_t\right) = \lambda T. \tag{15}
\]
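This $\lambda$-dependence can be observed in simulation. The sketch below is our own illustration (the helper `ito_sum_mean` is a name we introduce, not from the source): it simulates Wiener paths on a fine grid and averages the evaluation-point sums for $\lambda \in \{0, \tfrac{1}{2}, 1\}$.

```python
# Monte Carlo check that the mean of the Riemann sums
# sum_j W_{tau_j} (W_{t_{j+1}} - W_{t_j}) is approximately lambda * T.
import random

def ito_sum_mean(lam, T=1.0, n=100, n_paths=3000, seed=0):
    """Average the evaluation-point sums over simulated Wiener paths."""
    rng = random.Random(seed)
    delta = T / n
    h = delta / 2  # fine grid so tau_j = (j + lam)*delta is a grid point for lam in {0, 0.5, 1}
    total = 0.0
    for _ in range(n_paths):
        W = [0.0]  # W[k] approximates W(k * h)
        for _ in range(2 * n):
            W.append(W[-1] + rng.gauss(0.0, h ** 0.5))
        total += sum(W[2 * j + int(2 * lam)] * (W[2 * (j + 1)] - W[2 * j])
                     for j in range(n))
    return total / n_paths

means = {lam: ito_sum_mean(lam) for lam in (0.0, 0.5, 1.0)}
print(means)  # each value should be close to lam * T
```

In fact, for this integrand the expectation of each finite sum is exactly $\lambda T$, so only Monte Carlo noise separates the printed values from $\lambda T$.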

Thus we have a convergent sum in the $L^2$-sense, but the result depends on the location of the evaluation point. Furthermore, by taking $\lambda = 0$, thus making the evaluation point the left endpoint of each subinterval, we obtain,
\[
E\left(\int_0^T W_t \, dW_t\right) = 0,
\]
which is natural since, for $\lambda = 0$, the integrand $W_{t_j}$ has $E(W_{t_j}) = 0$ and is independent of the increment $W_{t_{j+1}} - W_{t_j}$. Moreover,
\[
E\left(\left|\int_0^T W_t \, dW_t\right|^2\right) = \int_0^T E\bigl(|W_t|^2\bigr)\, dt = \int_0^T t \, dt = \frac{1}{2}\, T^2.
\]
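The second-moment identity can likewise be checked by Monte Carlo. The sketch below is our own illustration (the helper name is ours): it estimates $E\bigl[\bigl(\sum_j W_{t_j}(W_{t_{j+1}} - W_{t_j})\bigr)^2\bigr]$ using left-endpoint sums.

```python
# Monte Carlo check that the left-endpoint (Ito) sums satisfy
# E[(sum_j W_{t_j} (W_{t_{j+1}} - W_{t_j}))^2] ≈ T^2 / 2.
import random

def ito_sum_second_moment(T=1.0, n=100, n_paths=3000, seed=1):
    rng = random.Random(seed)
    delta = T / n
    total = 0.0
    for _ in range(n_paths):
        W = [0.0]  # W[j] approximates W(j * delta)
        for _ in range(n):
            W.append(W[-1] + rng.gauss(0.0, delta ** 0.5))
        s = sum(W[j] * (W[j + 1] - W[j]) for j in range(n))
        total += s * s
    return total / n_paths

m2 = ito_sum_second_moment()
print(m2)  # should be close to T^2 / 2 = 0.5
```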

The main point of the construction of the Itô integral is that, for the stochastic integral $\int_{t_0}^T f(s, \omega)\, dW_s(\omega)$, the dependence of $f(s, \omega)$ on $W(s, \omega)$ is nonanticipative, meaning that the random function $f(s, \omega)$ can depend, at most, on the present and past values of the Wiener process $W(s, \omega)$ and is independent of its future.

To clarify, for the stochastic integral $\int_{t_0}^T f(s, \omega)\, dW_s(\omega)$, the integrand $f(s, \omega)$ is nonanticipative if the random variable $f(t, \cdot)$ is $\mathcal{A}_t$-measurable for $t \in [0, T]$, where $\{\mathcal{A}_t,\ t \ge 0\}$ is an increasing family of $\sigma$-algebras generated by $W_t$, for $t \ge 0$.

Furthermore, the relevant class $L^2_T$ consists of functions $f : [0, T] \times \Omega \to \mathbb{R}$ satisfying

(1) $f$ is jointly $\mathcal{B} \times \mathcal{A}$-measurable, where $\mathcal{B}$ is the Borel $\sigma$-algebra on $[0, T]$. Note that the collection of Borel sets is the smallest $\sigma$-algebra containing the open sets,

(2) $\displaystyle\int_0^T E\bigl(f(t, \cdot)^2\bigr)\, dt < \infty$,

(3) $f(t, \cdot)$ is $\mathcal{A}_t$-measurable for each $t \in [0, T]$, i.e. $f$ is nonanticipative.

For such $f$, the Itô integral $I(f) = \int_{t_0}^T f(s, \omega)\, dW_s(\omega)$ is, in particular, linear,

(4) $I(\alpha f + \beta g) = \alpha I(f) + \beta I(g)$ for $f, g \in L^2_T$ and $\forall\, \alpha, \beta \in \mathbb{R}$.

5 Conclusion

The Itô integral now provides us with the necessary theory to interpret the stochastic differential equation
\[
dX_t = a(t, X_t)\, dt + b(t, X_t)\, dW_t,
\]
as a stochastic integral equation
\[
X_t = X_{t_0} + \int_{t_0}^t a(s, X_s)\, ds + \int_{t_0}^t b(s, X_s)\, dW_s,
\]
where the solutions of these types of stochastic differential equations are diffusion processes which obey the Kolmogorov equations previously discussed.
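The integral-equation form also suggests the simplest numerical interpretation: step forward in time, adding a drift increment $a\,dt$ and a Wiener increment $b\,dW$. The following is a minimal Euler–Maruyama sketch, our own illustration (function and parameter names are ours, and the example coefficients are arbitrary), not a method presented in this paper.

```python
# Minimal Euler-Maruyama discretization of dX_t = a(t,X_t) dt + b(t,X_t) dW_t.
import random

def euler_maruyama(a, b, x0, t0, T, n, seed=0):
    """Approximate one sample path on [t0, T] with n equal steps."""
    rng = random.Random(seed)
    dt = (T - t0) / n
    t, x = t0, x0
    path = [x0]
    for _ in range(n):
        dW = rng.gauss(0.0, dt ** 0.5)  # Wiener increment ~ N(0, dt)
        x += a(t, x) * dt + b(t, x) * dW
        t += dt
        path.append(x)
    return path

# Example coefficients: mean-reverting drift a = -x, constant diffusion b = 0.3
path = euler_maruyama(lambda t, x: -x, lambda t, x: 0.3,
                      x0=1.0, t0=0.0, T=1.0, n=1000)
print(path[-1])
```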

The theory presented in this introduction to stochastic differential equations allows us to consider the solutions of such equations. In fact, as can be seen, these solutions are stochastic processes. The representation of SDEs at the macroscopic level is a diffusion, say of some substance suspended in a physical medium, that is modelled by the Kolmogorov equations mentioned above. The century-long discussion revolving around the relationship between the microscopic random behaviour of particles and the nature of a diffusion (among many eminent mathematicians and scientists, including Einstein) has been settled by the Itô integral, devised by K. Itô, which gives meaning to stochastic differential equations. The main result of this discussion is the relationship between stochastic diffusion processes and the Kolmogorov partial differential equations.


