
Introduction to Probability Theory and Stochastic Processes

  • 8/2/2019 Introduction to Probability Theory and Stochastic Processes

    1/178

    VIENNA GRADUATE SCHOOL OF FINANCE (VGSF)

    LECTURE NOTES

    Introduction to Probability Theory and

    Stochastic Processes (STATS)

    Helmut Strasser

    Department of Statistics and Mathematics

Vienna University of Economics and Business Administration

    [email protected]

    http://helmut.strasserweb.net/public

    October 19, 2006

Copyright © 2006 by Helmut Strasser. All rights reserved. No part of this text may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the author.


    Contents

    Preliminaries i

    0.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

    0.2 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

    I Measure and Integration 1

    1 Measure and probability 3

    1.1 Sigma-fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.2 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    1.3 Measures on the real line . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.4 Probability distributions . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Measurable functions and random variables 13

    2.1 The idea of measurability . . . . . . . . . . . . . . . . . . . . . . . . 13

    2.2 The basic abstract assertions . . . . . . . . . . . . . . . . . . . . . . 14

    2.3 The structure of real-valued measurable functions . . . . . . . . . . . 14

    3 Integral and expectation 17

    3.1 The integral of simple functions . . . . . . . . . . . . . . . . . . . . 17

    3.2 The extension process . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    3.3 Convergence of integrals . . . . . . . . . . . . . . . . . . . . . . . . 21

    The theorem of monotone convergence . . . . . . . . . . . . . . . . . 21

The infinite series theorem . . . . . . . . . . . . . . . . . . . . . . . 22

    The dominated convergence theorem . . . . . . . . . . . . . . . . . . 23

    3.4 Stieltjes integration . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    3.5 Proofs of the main theorems . . . . . . . . . . . . . . . . . . . . . . 27

    4 Selected topics 29

    4.1 Image measures and distributions . . . . . . . . . . . . . . . . . . . . 29

    4.2 Measures with densities . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.3 Product measures and Fubini's theorem . . . . . . . . . . . . . . . . 33

    4.4 Spaces of integrable functions . . . . . . . . . . . . . . . . . . . . . 36


    4.5 Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    II Probability theory 43

    5 Beyond measure theory 45

    5.1 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

    5.2 Convergence and limit theorems . . . . . . . . . . . . . . . . . . . . 46

    5.3 The causality theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 48

    6 Random walks 51

    6.1 The ruin problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

    6.2 Optional stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

6.3 Wald's equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    6.4 Gambling systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    7 Conditioning 61

    7.1 Conditional expectation . . . . . . . . . . . . . . . . . . . . . . . . . 61

    7.2 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

    7.3 Some theorems on martingales . . . . . . . . . . . . . . . . . . . . . 67

8 Stochastic processes 71

    8.1 Basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    8.2 The Poisson process . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    8.3 Point processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

8.4 Lévy processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

    8.5 The Wiener Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

    9 Martingales 79

    9.1 From independent increments to martingales . . . . . . . . . . . . . . 79

    9.2 A technical issue: Augmentation . . . . . . . . . . . . . . . . . . . . 81

    9.3 Stopping times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    Hitting times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

    The optional stopping theorem . . . . . . . . . . . . . . . . . . . . . 85

    9.4 Application: First passage times of the Wiener process . . . . . . . . 88

    One-sided boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . 88

    Two-sided boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . 90

    The reflection principle . . . . . . . . . . . . . . . . . . . . . . . . . 91

    9.5 The Markov property . . . . . . . . . . . . . . . . . . . . . . . . . . 93


    III Stochastic calculus 95

    10 The stochastic integral 97

    10.1 Integrals along stochastic paths . . . . . . . . . . . . . . . . . . . . . 97

    10.2 The integral of simple processes . . . . . . . . . . . . . . . . . . . . 98

    10.3 Semimartingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

    10.4 Extending the stochastic integral . . . . . . . . . . . . . . . . . . . . 103

    10.5 The Wiener integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

    11 Calculus for the stochastic integral 107

    11.1 The associativity rule . . . . . . . . . . . . . . . . . . . . . . . . . . 107

    11.2 Quadratic variation and the integration-by-parts formula . . . . . . . 108

11.3 Itô's formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

    12 Applications to financial markets 115

    12.1 Financial markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

    12.2 Trading strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

    12.3 The Black-Scholes equation . . . . . . . . . . . . . . . . . . . . . . 117

    12.4 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

    12.5 Change of numeraire . . . . . . . . . . . . . . . . . . . . . . . . . . 121

    13 Stochastic differential equations 125

13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

    13.2 The abstract linear equation . . . . . . . . . . . . . . . . . . . . . . 126

    13.3 Wiener driven models . . . . . . . . . . . . . . . . . . . . . . . . . . 128

    14 Martingale properties of stochastic integrals 131

    14.1 Locally square integrable martingales . . . . . . . . . . . . . . . . . 131

    14.2 Square integrable martingales . . . . . . . . . . . . . . . . . . . . . . 134

14.3 Lévy's theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

    14.4 Martingale representation . . . . . . . . . . . . . . . . . . . . . . . . 136

15 Exponential martingale and Girsanov's theorem 141

    15.1 The exponential martingale . . . . . . . . . . . . . . . . . . . . . . . 141

    15.2 Likelihood processes . . . . . . . . . . . . . . . . . . . . . . . . . . 142

    15.3 Change of probability measures . . . . . . . . . . . . . . . . . . . . 143

    16 Martingales in financial markets 147

    16.1 Pricing in financial markets . . . . . . . . . . . . . . . . . . . . . . . 147

    16.2 Pricing in Black-Scholes markets . . . . . . . . . . . . . . . . . . . . 147

    16.3 Pricing in diffusion market models . . . . . . . . . . . . . . . . . . . 149


    IV Appendix 151

    17 Foundations of modern analysis 153

    17.1 Basic notions on set theory . . . . . . . . . . . . . . . . . . . . . . . 153

    Set operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

    Cartesian products . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

    Uncountable sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    17.2 Sets and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    17.3 The set of real numbers . . . . . . . . . . . . . . . . . . . . . . . . . 157

    17.4 Real-valued functions . . . . . . . . . . . . . . . . . . . . . . . . . . 159

    Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

    Continuous functions . . . . . . . . . . . . . . . . . . . . . . . . . . 160

    Regulated functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

The variation of functions . . . . . . . . . . . . . . . . . . . . . . . . 163

    17.5 Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

    17.6 Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165


    Preliminaries

    0.1 Introduction

The goal of this course is to give an introduction to some mathematical concepts and tools which are indispensable for understanding the modern mathematical theory of finance. Let us give an overview of the historic origins of some of the mathematical tools.

    The central topic will be those probabilistic concepts and results which play an

    important role in mathematical finance. Therefore we have to deal with mathematical

    probability theory. Mathematical probability theory is formulated in a language that

    comes from measure theory and integration. This language differs considerably from

    the language of classical analysis, known under the label of calculus. Therefore, our

    first step will be to get an impression of basic measure theory and integration.

We will not go into the advanced problems of measure theory where this theory becomes exciting. Such topics would be closely related to advanced set theory and topology, which differ fundamentally from the mere set-theoretic language and topologically driven slang that is convenient for talking about mathematics but nothing more.

    Similarly, our usage of measure theory and integration is sort of a convenient language

which on this level is of little interest in itself. For us, its worth lies in its power to

    give insight into exciting applications like probability and mathematical finance.

    Therefore, our presentation of measure theory and integration will be an overview

    rather than a specialized training program. We will become more and more familiar

    with the language and its typical kind of reasoning as we go into those applications

    for which we are highly motivated. These will be probability theory and stochastic

    calculus.

    In the field of probability theory we are interested in probability models having a

dynamic structure, i.e. a time evolution governed by endogenous correlation properties. Such probability models are called stochastic processes.

    Probability theory is a young theory compared with the classical cornerstones of

    mathematics. It is illuminating to have a look at the evolution of some fundamental

    ideas of defining a dynamic structure of stochastic processes.

One important line of thought is looking at stationarity. Models which are themselves stationary or are cumulatives of stationary models have determined the econometric literature for decades. For Gaussian models one need not distinguish between

strict and weak (covariance) stationarity. As for weak stationarity it turns out that typical processes follow difference or differential equations driven by some noise process.

    The concept of a noise process is motivated by the idea that it does not transport any

    information.

    From the beginning of serious investigation of stochastic processes (about 1900)

another idea was prominent in the scientific literature, namely the Markov property. This

    is not the place to go into details of the overwhelming progress in Markov chains

    and processes achieved in the first half of the 20th century. However, for a long time

this theory failed to describe the dynamic behaviour of continuous time Markov processes in terms of equations between single states at different times. Such equations

have been the common tools for deterministic dynamics (ordinary difference and differential equations) and for discrete time stationary stochastic sequences. In contrast,

    continuous time Markov processes were defined in terms of the dynamic behaviour of

    their distributions rather than of their states, using partial difference and differential

equations.

    The situation changed dramatically about the middle of the 20th century. There

were two ingenious concepts at the beginning of this disruption. The first is the concept of a martingale introduced by Doob. The martingale turned out to be the final

mathematical fixation of the idea of noise. The notion of a martingale is located between a process with uncorrelated increments and a process with independent increments, both of which were the competing noise concepts up to that time. The second

concept is that of a stochastic integral due to K. Itô. This notion makes it possible to

    apply differential reasoning to stochastic dynamics.

At the beginning of the stochastic part of this lecture we will present an introduction to the ideas of martingales and stopping times by means of stochastic sequences (discrete time processes). However, the main subject of the second half of the lecture

will be continuous time processes with a strong focus on the Wiener process. Nevertheless, the notions of martingales, semimartingales and stochastic integrals are introduced in

    a way which lays the foundation for the study of more general process theory. The

choice of examples is governed by the needs of financial applications (covering the

    notion of gambling, of course).

    0.2 Literature

Let us give some comments on the bibliography.

The popular monograph by Bauer, [1], has been for a long time the standard textbook in Germany on measure theoretic probability. However, probability theory has

    many different faces. The book by Shiryaev, [21], is much closer to those modern

    concepts we are heading to. Both texts are mathematically oriented, i.e. they aim at

giving complete and general proofs of fundamental facts, preferably in abstract terms.

A modern introduction into probability models containing plenty of fascinating phenomena is given by Bremaud, [6] and [7]. The older monograph by Bremaud, [5], is

not located at the focus of this lecture but contains as an appendix an excellent primer on probability theory.

    Our topic in stochastic processes will be the Wiener process and the stochastic

    analysis of Wiener driven systems. A standard monograph on this subject is Karatzas

    and Shreve, [15]. The Wiener systems part of the probability primer by Bremaud

    gives a very compact overview of the main facts. Today, Wiener driven systems are

    a very special framework for modelling financial markets. In the meanwhile, general

    stochastic analysis is in a more or less final state, called semimartingale theory. Present

    and future research applies this theory in order to get a much more flexible modelling

    of financial markets. Our introduction to semimartingale theory follows the outline by

    Protter, [20] (see also [19]).

    Let us mention some basic literature on mathematical finance.

There is a standard source by Hull, [11]. Although this book tries hard to present itself as undemanding, the contrary is true. The reason is that the combination of financial intuition and the apparently informal use of advanced mathematical tools requires a lot of mathematical knowledge on the reader's side in order to catch the intrinsics. Paul Wilmott, [22] and [23], tries to cover all topics in

financial mathematics together with the corresponding intuition, and to make the analytical framework a bit more explicit and detailed than Hull does. I consider these

    books by Hull and Wilmott as a must for any beginner in mathematical finance.

    The books by Hull and Wilmott do not pretend to talk about mathematics. Let us

    mention some references which have a similar goal as this lecture, i.e. to present the

    mathematical theory of stochastic analysis aiming at applications in finance.

    A very popular book which may serve as a bridge from mathematical probability

to financial mathematics is by Björk, [4]. Another book, giving an introduction both to the mathematical theory and to financial mathematics, is by Hunt and Kennedy, [12].

Standard monographs on mathematical finance which could be considered as cornerstones marking the state of the art at the time of their publication are Karatzas and

    Shreve, [16], Musiela and Rutkowski, [17], and Bielecki and Rutkowski, [3]. The

    present lecture should lay some foundations for reading books of that type.


    Part I

    Measure and Integration


    Chapter 1

    Measure and probability

    1.1 Sigma-fields

Let Ω be a (non-empty) set. We are interested in systems of subsets of Ω which are closed under set operations.

    1.1 Example. In general, a system of subsets need not be closed under set operations.

Let Ω = {1, 2, 3}. Consider the system of subsets A = {{1}, {2}, {3}}. This system is not closed under union, intersection or complementation. E.g. the complement of {1} is not in A.

    It is clear that the power set is closed under any set operations. However, there are smaller systems of sets which are closed under set operations, too. Let Ω = {1, 2, 3}. Consider the system of subsets B = {∅, Ω, {1}, {2, 3}}. It is easy to see that this system is closed under union, intersection and complementation. Moreover, it follows that these set operations can be repeated in arbitrary order, resulting always in sets contained in B.

    1.2 Definition. A (non-empty) system F of subsets of Ω is called a σ-field if it is closed under union, intersection and complementation as well as under building limits of monotone sequences. The pair (Ω, F) is called a measurable space.

    There are some obvious necessary properties of a σ-field.

    1.3 Problem.

(1) Show that every σ-field on Ω contains ∅ and Ω.
    (2) What is the smallest possible σ-field on Ω?

If we want to check whether a given system of sets is actually a σ-field then it is sufficient to verify only a minimal set of conditions. The following assertion states such a minimal set of conditions.

    1.4 Proposition. A (non-empty) system F of subsets of Ω is a σ-field iff it satisfies the following conditions:


(1) Ω ∈ F,
    (2) A ∈ F ⇒ A^c ∈ F,
    (3) if (A_i)_{i=1}^∞ ⊆ F then ⋃_{i=1}^∞ A_i ∈ F.

    1.5 Problem. Prove 1.4.

    Let us discuss a number of examples.

When one starts to construct a σ-field one usually starts with a family C of sets which in any case should be contained in the σ-field. If this starting family C does not fulfil all conditions of a σ-field then a simple idea could be to add further sets until the family fulfils all required conditions. Actually, this procedure works if the starting family C is a finite system.

    1.6 Definition. Let C be any system of subsets of Ω. The σ-field generated by C is the smallest σ-field F which contains C. It is denoted by σ(C).

1.7 Problem. Assume that C = {A}. Find σ(C).

1.8 Problem. Assume that C = {A, B}. Find σ(C).

1.9 Problem. Show by giving an example that the union of two σ-fields need not be a σ-field.

If the system C is any finite system then σ(C) consists of all sets which can be obtained by finitely many unions, intersections and complementations of sets in C. Although the resulting system σ(C) is still finite, a systematic overview over all sets could be rather complicated.
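For a finite system C this closure procedure can be carried out mechanically. The following sketch (the helper name `generate_sigma_field` is ours, not from the text) computes σ(C) on a small finite Ω by repeatedly adding complements and pairwise unions until nothing new appears; on a finite set this finite closure is all that σ-stability requires:

```python
from itertools import combinations

def generate_sigma_field(omega, generators):
    """Compute the sigma-field on a finite set generated by a system of subsets.

    Sets are represented as frozensets. On a finite omega, closing under
    complement and pairwise union suffices: countable unions of finitely
    many distinct sets reduce to finite unions.
    """
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        # close under complementation
        for a in current:
            if omega - a not in field:
                field.add(omega - a)
                changed = True
        # close under pairwise union
        for a, b in combinations(current, 2):
            if a | b not in field:
                field.add(a | b)
                changed = True
    return field

# Example 1.1 revisited: C = {{1}, {2}, {3}} on omega = {1, 2, 3}
sf = generate_sigma_field({1, 2, 3}, [{1}, {2}, {3}])
print(len(sf))  # 8: the closure is the full power set
```

Running it with a single generator {A} reproduces the answer one expects for Problem 1.7: the four sets ∅, A, A^c, Ω.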

Things are much easier if the generating system is a finite partition of Ω.

1.10 Proposition. Assume that C is a finite partition of Ω. Then σ(C) consists of ∅ and of all unions of sets in C.

    1.11 Problem. Prove 1.10.

1.12 Problem. Let Ω be a finite set. Find the σ-field which is generated by the one-point sets.

It is a remarkable fact that every finite σ-field is generated by a partition.

1.13 Problem. Show that every finite σ-field F is generated by a partition of Ω.
    Hint: Call a nonempty set A ∈ F an atom if it contains no nonempty proper subset in F. Show that the collection of atoms is a partition of Ω and that every set in F is a union of atoms.


    Information sets

In probability theory a model of a random experiment consists of a pair (Ω, F) where Ω is a non-empty set and F is a σ-field on Ω.

    The set Ω serves as sample space. It is interpreted as the set of possible outcomes of the experiment. Note that it is not necessarily the case that single outcomes are actually observable.

    The σ-field F is interpreted as the field of observable events. Observability of a set A means that after having performed the random experiment it can be decided whether A has been realized or not. In this sense the σ-field contains the information which is obtained after having performed the random experiment. Therefore F is also called the information set of the random experiment.

A simple random variable X is a simple function whose basic partition is observable, i.e. (X = a) ∈ F for every value a of X. The information set of X is the σ-field which is generated by the basic partition of X. It is denoted by σ(X).

1.14 Example. Consider the random experiment of throwing a coin n times. Denote the sides of the coin by 0 and 1. Then the sample space is Ω = {0, 1}^n. Assume that the outcomes of each throw are observable. If X_i denotes the outcome of the i-th throw then this means that (X_i = 0) and (X_i = 1) are observable.
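Example 1.14 can be made concrete for n = 3. The minimal sketch below (helper names are ours) enumerates Ω = {0, 1}^3 and collects the outcomes belonging to the observable events (X_i = 0) and (X_i = 1):

```python
from itertools import product

n = 3
omega = list(product((0, 1), repeat=n))  # all 2^3 = 8 outcomes

def event(i, value):
    """The observable event (X_i = value): outcomes whose i-th throw equals value."""
    return {w for w in omega if w[i] == value}

# (X_1 = 0) and (X_1 = 1) partition the sample space
assert event(0, 0) | event(0, 1) == set(omega)
assert event(0, 0) & event(0, 1) == set()
print(len(event(0, 0)))  # 4 outcomes
```

Such two-block partitions are exactly the basic partitions that generate the information sets σ(X_i) asked for in Problem 1.15.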

1.15 Problem. Let Ω = {0, 1}^3 and define S_k := ∑_{i=1}^k X_i.
    (1) Find σ(S_1), σ(S_2), σ(S_3).
    (2) Find σ(X_1), σ(X_2), σ(X_3).

1.16 Problem. Let Ω be the sample space of throwing a die twice. Denote the outcomes of the throws by X and Y, respectively. Find σ(X), σ(Y), σ(X + Y), σ(XY).

    Borel sigma-fields

Let us discuss σ-fields on R. Clearly, the power set of R is a σ-field. However, the power set is too large. Let us be more modest and start with a system of simple sets and then try to extend the system to a σ-field.

    The following example shows that such a procedure does not work if we start with

    one-point sets.

1.17 Problem. Let F be the collection of all subsets of R which are countable or are the complement of a countable set.
    (1) Show that F is a σ-field.
    (2) Show that F is the smallest σ-field which contains all one-point sets.
    (3) Does F contain intervals?


A reasonable σ-field on R should at least contain all intervals.

1.18 Definition. The smallest σ-field on R which contains all intervals is called the Borel σ-field. It is denoted by B and its elements are called Borel sets.

    Unfortunately, there is no way of describing all sets in B in a simple manner. All we can say is that any set which can be obtained from intervals by countably many set operations is a Borel set. E.g., every set which is the countable union of intervals is a Borel set. But there are even much more complicated sets in B. On the other hand, there are subsets of R which are not in B.

The concept of Borel sets is easily extended to R^n.

1.19 Definition. The σ-field on R^n which is generated by all rectangles

    R = {I_1 × I_2 × ⋯ × I_n : I_k being any interval}

    is called the Borel σ-field on R^n and is denoted by B^n.

All open and all closed sets in R^n are Borel sets since open sets can be represented as a countable union of rectangles and closed sets are the complements of open sets.

    Random variables

Let (Ω, F) be a model of a random experiment. What is a random variable?

    The idea of a random variable is that of a function X : Ω → R such that assertions about X are observable events, i.e. are contained in F. But what are assertions about X? In the case of a simple function we considered assertions of the form (X = a). But for functions taking an uncountable number of values we have to consider also assertions of the form (X ∈ I) where I is an interval.

    1.20 Definition. A random variable is a function X : Ω → R such that (X ∈ I) ∈ F for every interval I.

1.21 Problem. Show that every function satisfying (X ≤ x) ∈ F for every x ∈ R is a random variable.

Let us turn to the question of the information set of a general random variable. Conceptually, the information set σ(X) is the σ-field that is generated by all events which can be observed through X.

Obviously, the system C consisting of the sets (X ∈ I), I being an interval, is not a σ-field. However, using the Borel σ-field we can describe the information set of a random variable X in a quasi-explicit way.

1.22 Theorem. The information set σ(X) is the system of sets (X ∈ B) where B is an arbitrary Borel set. In particular, for a random variable X we have (X ∈ B) ∈ F for all B ∈ B.


    1.2 Measures

    Measures are set functions. Let us consider some examples.

1.23 Example. Let Ω be an arbitrary set and for any subset A ⊆ Ω define

    μ(A) = |A| := k if A contains k elements, and μ(A) = |A| := ∞ if A contains infinitely many elements.

    This set function is called a counting measure. It is defined for all subsets of Ω. Obviously, it is additive, i.e.

    A ∩ B = ∅ ⇒ μ(A ∪ B) = μ(A) + μ(B).
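For finite sets the additivity of the counting measure can be checked mechanically; a minimal sketch (the function name is ours, and we restrict to finite sets, so the ∞ case of the definition never occurs):

```python
def counting_measure(a):
    """|A| for a finite set A; the text assigns infinity to infinite sets."""
    return len(a)

a, b = {1, 2}, {3, 4, 5}
assert a & b == set()  # A and B are disjoint
# additivity: mu(A union B) = mu(A) + mu(B) for disjoint A, B
assert counting_measure(a | b) == counting_measure(a) + counting_measure(b)
print(counting_measure(a | b))  # 5
```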

Measures are set functions which intuitively should be related to the notion of volume. Therefore measures should be nonnegative and additive. In order to apply

    additivity they should be defined on systems of subsets which are closed under the

usual set operations. This leads to the requirement that measures should be defined on σ-fields. Finally, if the underlying σ-field contains infinitely many sets there should be some rule how to handle limits of infinite sequences of sets.

    Thus, we are ready for the definition of a measure.

1.24 Definition. Let Ω be a non-empty set. A measure on Ω is a set function μ which satisfies the following conditions:

    (1) μ is defined on a σ-field F on Ω.
    (2) μ is nonnegative, i.e. μ(A) ≥ 0, A ∈ F, and μ(∅) = 0.
    (3) μ is σ-additive, i.e. for every pairwise disjoint sequence (A_i)_{i=1}^∞ ⊆ F

    μ(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ μ(A_i).

    A measure μ is called finite if μ(Ω) < ∞. A measure P is called a probability measure if P(Ω) = 1. If μ|F is a measure then (Ω, F, μ) is a measure space. If P|F is a probability measure then (Ω, F, P) is called a probability space.

    There are some obvious consequences of the preceding definition.

    1.25 Problem. Show that every measure is additive.

1.26 Problem. Let μ|F be a measure.
    (1) Show that A_1 ⊆ A_2 implies μ(A_1) ≤ μ(A_2).
    (2) Show the inclusion-exclusion law:

    μ(A_1) + μ(A_2) = μ(A_1 ∪ A_2) + μ(A_1 ∩ A_2)


(3) The preceding problem gives a formula for μ(A_1 ∪ A_2) provided that all sets have finite measure. Extend this formula to the union of three sets.

The property of being σ-additive both guarantees additivity and implies easy rules for handling infinite sequences of sets.

1.27 Problem. Let μ|F be a measure.
    (1) If A_i ↑ A then μ(A_i) ↑ μ(A).
    (2) If A_i ↓ A and μ(A_1) < ∞ then μ(A_i) ↓ μ(A).

    1.28 Problem.

    (1) Any nonnegative linear combination of measures is a measure.

    (2) Every infinite sum of measures is a measure.

1.29 Problem. Explain the construction of measures on a finite σ-field.
    Hint: Measures have to be defined for atoms only.

    1.3 Measures on the real line

The simplest example of a measure is a point measure.

1.30 Definition. The set function defined by

    δ_a(A) = 1_A(a), A ⊆ R,

    is called the point measure at a ∈ R.

    Take a moment's reflection on whether this definition actually satisfies the properties of a measure. Note that any point measure can be defined for all subsets of R, i.e. it is defined on the largest possible σ-field 2^R.

    Taking linear combinations of point measures gives a lot of further examples of measures.
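Point measures and their nonnegative linear combinations can be sketched in a few lines. In the toy code below (the helper names and the example measure ν = δ_1 + 3δ_2 are ours, not from the text) a set is represented by its membership test, so that intervals are easy to encode:

```python
def point_measure(a):
    """delta_a(A) = 1_A(a): mass one if a lies in A, zero otherwise."""
    return lambda contains: 1.0 if contains(a) else 0.0

def combine(terms):
    """A nonnegative linear combination of measures is again a measure (Problem 1.28)."""
    return lambda contains: sum(w * m(contains) for w, m in terms)

# nu = delta_1 + 3 * delta_2; a set A is passed as its membership predicate
nu = combine([(1.0, point_measure(1)), (3.0, point_measure(2))])

print(nu(lambda x: 0 < x <= 2))  # mass of (0, 2]: 1 + 3 = 4.0
```

Evaluating ν on any interval just asks, atom by atom, whether the atom lies in the interval, which is exactly the pattern needed for Problem 1.31.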

1.31 Problem.
    (1) Let μ = δ_0 + 2δ_1 + 0.5δ_{−1}. Calculate μ([0, 1)), μ([−1, 1)), μ((−1, 1]).
    (2) Describe in words the values of μ = ∑_{j=1}^k a_j δ_{x_j}.
    (3) Let x ∈ R^n be a list of data and let μ(I) be the percentage of data contained in I. Show that μ is a measure by writing it as a linear combination of point measures.

Let Ω = R and for every interval I ⊆ R define

    λ(I) := length of I.

    E.g. λ((a, b]) = b − a. This set function is called the Lebesgue content of intervals. At the moment it is defined only on the family of all intervals.


The Lebesgue content is also additive in the following sense: If I_1 and I_2 are two intervals such that the union I_1 ∪ I_2 = I_3 is an interval, too, then

    I_1 ∩ I_2 = ∅ ⇒ λ(I_1 ∪ I_2) = λ(I_1) + λ(I_2).

    However, the family of intervals is not a σ-field. In order to obtain a measure we have to extend the Lebesgue content to a σ-field which contains the intervals. The smallest σ-field with this property is the Borel σ-field.

1.32 Theorem. (Measure extension theorem)
    There exists a uniquely determined measure λ|B such that λ((a, b]) = b − a, a < b. This measure is called the Lebesgue measure.

Knowing that λ|B is a measure we may calculate its values for simple Borel sets which are not intervals.

1.33 Problem. Find the Lebesgue measure of Q.

Now, let us turn to the problem of how to get an overview over all measures μ|B. We restrict our interest to measures which give finite values to bounded intervals.

Let μ|B be a measure such that μ((a, b]) < ∞ for a < b. Define

    Δ(x) := μ((0, x]) if x > 0, and Δ(x) := −μ((x, 0]) if x ≤ 0,

    and note that for any a < b we have

    μ((a, b]) = Δ(b) − Δ(a) =
    μ((0, b]) − μ((0, a]) if 0 ≤ a < b,
    μ((0, b]) + μ((a, 0]) if a < 0 < b,
    −μ((b, 0]) + μ((a, 0]) if a < b ≤ 0.

    This means: For every such measure μ there is a function Δ : R → R which defines the measure at least for all intervals. This function is called the measure-defining function of μ.

    Note that our definition of the measure-defining function is such that Δ(0) = 0. However, any function which differs from Δ by an additive constant only defines the same measure.

1.34 Problem. Calculate the measure-defining function of the following measures:
(1) A point measure: δ₂, δ₀, δ₋₃.
(2) A linear combination of point measures: δ₂ + 2δ₀ + 0.5δ₋₃.
(3) The Lebesgue measure λ.
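The measure-defining function of a discrete measure is a step function. A small Python sketch, using the linear combination δ₂ + 2δ₀ + 0.5δ₋₃ from part (2) of the problem above and the convention Δ(0) = 0:

```python
def measure_defining_function(weights):
    """Delta(x) = mu((0, x]) for x > 0 and -mu((x, 0]) for x <= 0,
    for a discrete measure mu = sum_j a_j * delta_{x_j}."""
    def Delta(x):
        if x > 0:
            return sum(a for p, a in weights.items() if 0 < p <= x)
        return -sum(a for p, a in weights.items() if x < p <= 0)
    return Delta

# mu = delta_2 + 2*delta_0 + 0.5*delta_{-3}
Delta = measure_defining_function({2: 1.0, 0: 2.0, -3: 0.5})

# mu((a, b]) = Delta(b) - Delta(a)
print(Delta(3) - Delta(1))    # mu((1, 3]): only the point 2 -> 1.0
print(Delta(0) - Delta(-4))   # mu((-4, 0]): points -3 and 0 -> 2.5
```

Note how the half-open convention matters: the point 0 carries the weight of δ₀ into every interval (a, 0] with a < 0, which is why Δ jumps at each atom and is right-continuous.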

1.35 Problem. Let μ|B be finite on bounded intervals. Explain the fundamental properties of the measure-defining function Δ:


(1) Δ is increasing.
(2) Δ is right-continuous.

The following is an existence theorem which establishes a one-to-one relation between functions and measures.

1.36 Theorem. (Measure extension theorem)
For every function Δ : R → R satisfying properties (1) and (2) of 1.35 there exists a uniquely determined measure μ_Δ such that μ_Δ((a, b]) = Δ(b) − Δ(a).

If the measure-defining function Δ is continuous and piecewise differentiable then its derivative Δ′ is called the density of the measure μ_Δ (with respect to the Lebesgue measure λ). This name comes from

    Δ′(x) = lim_{h→0} (Δ(x + h) − Δ(x − h)) / (2h)
          = lim_{h→0} μ_Δ((x − h, x + h]) / λ((x − h, x + h]).

In such a situation we have

    μ_Δ((a, b]) = ∫_a^b Δ′(x) dx.

A measure μ|B is discrete if it is a finite or infinite linear combination of point measures. A counting measure is a discrete measure where all point measures with positive weight have weight one.

1.37 Problem. Explain the characteristic properties of the measure-defining function of a discrete measure and of a counting measure.

1.38 Problem. Let Δ be the measure-defining function of μ.
(1) Show that μ({a}) = Δ(a) − Δ(a−), the jump of Δ at a.
(2) For which measures is Δ continuous ?
(3) For which measures is Δ a step-function ?

1.4 Probability distributions

A probability model consists of a sample space Ω, a σ-field F and a probability measure P|F. Such a triple (Ω, F, P) is called a probability space.

For practical applications it is important to specify the particular probability measure under consideration. This can be done either if the σ-field F has a simple structure, e.g. if it is finite (confer problem 1.29), or if the σ-field is the information set of a random variable X.

Let us consider the second case. Let X be a random variable. The information set of X is the σ-field σ(X) consisting of all events (X ∈ B) where B ∈ B.


1.39 Definition. The set function

    P^X : B ↦ P(X ∈ B),  B ∈ B,

is called the distribution of X (under P).

1.40 Problem. Show that P^X is a probability measure on (R, B).

Since P^X is a measure on (R, B) it can be represented by its measure-defining function Δ. For probability measures it is, however, simpler to use the distribution function

    F(x) = P(X ≤ x) = P^X((−∞, x]) = P^X((−∞, 0]) + Δ(x),

which differs from Δ only by an additive constant. Thus we have

    P(a < X ≤ b) = P^X((a, b]) = F(b) − F(a) = Δ(b) − Δ(a).

1.41 Proposition. Let X be a random variable with distribution function F. Then P^X = μ_F.

Many examples illustrating the relation between random variables and their distribution functions have been considered in the introductory course.
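As a numerical aside (not part of the formal development), the identity P(a < X ≤ b) = F(b) − F(a) can be checked by simulation for a distribution whose distribution function is known in closed form, e.g. the exponential distribution with F(x) = 1 − e^(−x):

```python
import math
import random

random.seed(0)

# X exponentially distributed with rate 1: F(x) = 1 - exp(-x)
F = lambda x: 1.0 - math.exp(-x)

a, b, n = 0.5, 2.0, 200_000
sample = [random.expovariate(1.0) for _ in range(n)]
# relative frequency of the event (a < X <= b)
freq = sum(1 for x in sample if a < x <= b) / n

print(freq)             # close to F(b) - F(a) = exp(-0.5) - exp(-2)
```

With 200 000 draws the relative frequency agrees with F(b) − F(a) ≈ 0.471 up to sampling error of order 1/√n.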


    Chapter 2

    Measurable functions and random

    variables

    2.1 The idea of measurability

Recall the concept of a random variable. This is a function X : (Ω, F, P) → R defined on a probability space such that the sets (X ∈ B) are in F for all Borel sets B ∈ B.

    The notion of a random variable is a special case of the notion of a measurable

    function.

2.1 Definition. A function f : (Ω, F) → R defined on a measurable space is called measurable if the sets (f ∈ B) are in F for all Borel sets B ∈ B.

    The notion of measurability is not restricted to real-valued functions.

Let (Ω, A) and (Y, B) be measurable spaces. Moreover, let f : Ω → Y be a function. Recall that (f ∈ B) is the inverse image of B under f, usually denoted by f⁻¹(B).

2.2 Definition. A function f : (Ω, A) → (Y, B) is called (A, B)-measurable if f⁻¹(B) ∈ A for all B ∈ B.

    Let us agree upon some terminology.

(1) When we consider real-valued functions then we always use the Borel σ-field in the range of f. If f : (Ω, F) → (R, B) then we simply say that f is F-measurable if we mean that it is (F, B)-measurable.
(2) When we consider functions f : R → R then (B, B)-measurability is called Borel measurability. The term Borel is thus concerned with the σ-field in the domain of f.

    To get an idea what measurability means let us consider some simple examples.

2.3 Problem. Let (Ω, F, μ) be a measure space and let f = 1_A where A ⊆ Ω. Show that f is F-measurable iff A ∈ F.


It follows that very complicated functions are Borel-measurable, e.g. f = 1_Q.

2.4 Problem. Let (Ω, F, μ) be a measure space and let f : Ω → R be a simple function. Show that f is F-measurable iff all sets of the canonical representation are in F.

    2.2 The basic abstract assertions

There are two fundamental principles for dealing with measurability. The first principle says that measurability is a property which is preserved under composition of functions.

2.5 Theorem. Let f : (Ω, A) → (Y, B) be (A, B)-measurable, and let g : (Y, B) → (Z, C) be (B, C)-measurable. Then g ∘ f is (A, C)-measurable.

    2.6 Problem. Prove 2.5.

The second principle is concerned with checking measurability. For checking measurability of f it is sufficient to consider the sets in a generating system of the σ-field in the range of f.

2.7 Theorem. Let f : (Ω, A) → (Y, B) and let C be a generating system of B, i.e. B = σ(C). Then f is (A, B)-measurable iff f⁻¹(C) ∈ A for all C ∈ C.

Proof: Let D := {D ⊆ Y : f⁻¹(D) ∈ A}. It can be shown that D is a σ-field. If f⁻¹(C) ∈ A for all C ∈ C then C ⊆ D. This implies σ(C) ⊆ D. □

    2.8 Problem. Fill in the details of the proof of 2.7.

    2.3 The structure of real-valued measurable functions

Let (Ω, F) be a measurable space. Let L(F) denote the set of all F-measurable real-valued functions. We start with the most common and most simple criterion for checking measurability of a real-valued function.

2.9 Problem. Show that a function f : Ω → R is F-measurable iff (f ≤ α) ∈ F for every α ∈ R.
Hint: Apply 2.7.

    This provides us with a lot of examples of Borel-measurable functions.

2.10 Problem.

(a) Show that every monotone function f : R → R is Borel-measurable.


(b) Show that every continuous function f : Rⁿ → R is Bⁿ-measurable.
Hint: Note that (f ≤ α) is a closed set.
(c) Let f : (Ω, F) → R be F-measurable. Show that f⁺, f⁻, |f|, and every polynomial a₀ + a₁f + ··· + a_n fⁿ are F-measurable.

    The next exercise is a first step towards the measurability of expressions involving

    several measurable functions.

2.11 Problem. Let f₁, f₂, ..., f_n be measurable functions. Then

    f = (f₁, f₂, ..., f_n) : Ω → Rⁿ

is (F, Bⁿ)-measurable.

2.12 Corollary. Let f₁, f₂, ..., f_n be measurable functions. Then for every continuous function φ : Rⁿ → R the composition φ ∘ (f₁, f₂, ..., f_n) is measurable.

Proof: Apply 2.5. □

2.13 Corollary. Let f₁, f₂ be measurable functions. Then f₁ + f₂, f₁ − f₂, f₁ · f₂, f₁ ∨ f₂ and f₁ ∧ f₂ are measurable functions.

    2.14 Problem. Prove 2.13.

As a result we see that L(F) is a space of functions where we may perform any algebraic operations without leaving the space. Thus it is a very convenient space for formal manipulations. The next assertion shows that we may even perform all of those operations involving a countable set (e.g. a sequence) of measurable functions !

2.15 Theorem. Let (f_n)_{n∈N} be a sequence of measurable functions. Then sup_n f_n and inf_n f_n are measurable functions. Let A := (lim_n f_n exists). Then A ∈ F and lim_n f_n · 1_A is measurable.

Proof: Since

    (sup_n f_n ≤ α) = ⋂_n (f_n ≤ α)

it follows from 2.9 that sup_n f_n and inf_n f_n = −sup_n(−f_n) are measurable. We have

    A := (lim_n f_n exists) = (sup_k inf_{n≥k} f_n = inf_k sup_{n≥k} f_n).

This implies A ∈ F. The last statement follows from

    lim_n f_n = sup_k inf_{n≥k} f_n  on A.  □


Note that the preceding corollaries are only very special examples of the power of Theorem 2.5. Roughly speaking, any function which can be written as an expression involving countably many operations on countably many measurable functions is measurable. Therefore it is rather difficult to construct non-measurable functions.

Let us denote the set of all F-measurable simple functions by S(F). Clearly, all limits of simple measurable functions are measurable. The remarkable fact, being fundamental for almost everything in integration theory, is the converse of this statement.

2.16 Theorem.
(a) Every measurable function f is the limit of some sequence of simple measurable functions.
(b) If f ≥ 0 then the approximating sequence can be chosen to be increasing.

Proof: The fundamental statement is (b). Let f ≥ 0. For every n ∈ N define

    f_n := (k − 1)/2ⁿ  whenever (k − 1)/2ⁿ ≤ f < k/2ⁿ,  k = 1, 2, ..., n·2ⁿ,
    f_n := n           whenever f ≥ n.

Then f_n ↑ f. If f is bounded then (f_n) converges uniformly to f. Part (a) follows from f = f⁺ − f⁻. □

    2.17 Problem. Draw a diagram illustrating the construction of the proof of 2.16.

    2.18 Problem. Show: If f is bounded then the approximating sequence can be

    chosen to be uniformly convergent.
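The dyadic construction of the proof of 2.16 can also be sketched in code instead of a diagram; `dyadic_approx` below is a hypothetical helper implementing f_n (round f down to the grid of step 2⁻ⁿ and cap at n), and the loop checks pointwise monotonicity f_n ≤ f_{n+1} ≤ f:

```python
import math

def dyadic_approx(f, n):
    """n-th simple function from the proof of 2.16: round f >= 0 down to
    the grid {0, 1/2^n, 2/2^n, ...} and cap the result at n."""
    def fn(x):
        y = f(x)
        return n if y >= n else math.floor(y * 2**n) / 2**n
    return fn

f = lambda x: x * x          # a nonnegative measurable function
xs = [0.0, 0.3, 0.7, 1.9, 5.0]
for n in range(1, 8):
    fn, fn1 = dyadic_approx(f, n), dyadic_approx(f, n + 1)
    # increasing in n and dominated by f at every test point
    assert all(fn(x) <= fn1(x) <= f(x) for x in xs)

print(dyadic_approx(f, 3)(0.7))   # floor(0.49 * 8)/8 = 3/8 = 0.375
```

Refining the grid (n → n+1) splits each level in two, which is exactly why the sequence is increasing; the cap at n handles unbounded f.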


    Chapter 3

    Integral and expectation

    3.1 The integral of simple functions

Let (Ω, F, μ) be a measure space. We start with defining the μ-integral of a measurable simple function.

3.1 Definition. Let f = Σ_{i=1}^n a_i 1_{F_i} be a nonnegative simple F-measurable function with its canonical representation. Then

    ∫ f dμ := Σ_{i=1}^n a_i μ(F_i)

is called the μ-integral of f.

We had to restrict the preceding definition to nonnegative functions since we admit the case μ(F) = ∞. If we were dealing with a finite measure the definition would work for all F-measurable simple functions.

3.2 Example. Let (Ω, F, P) be a probability space and let X = Σ_{i=1}^n a_i 1_{F_i} be a simple random variable. Then we have E(X) = ∫ X dP.

    3.3 Problem. What is the integral with respect to a linear combination of point

    measures ? Which functions can be integrated ?

    3.4 Problem. Give a geometric interpretation of the integral of a step function with

    respect to a Borel measure.
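For the point-measure case of problem 3.3 the answer is that for μ = Σ_j a_j δ_{x_j} the integral reduces to the weighted sum ∫ f dμ = Σ_j a_j f(x_j), and the integrable functions are those for which this sum is finite. A Python sketch of this reduction (the helper name is made up):

```python
def integral_discrete(f, weights):
    """Integral of f with respect to mu = sum_j a_j * delta_{x_j}:
    the integral is the weighted sum of point evaluations."""
    return sum(a * f(x) for x, a in weights.items())

# mu = delta_0 + 2*delta_1 + 0.5*delta_{-1}
mu_weights = {0: 1.0, 1: 2.0, -1: 0.5}

print(integral_discrete(lambda x: x * x, mu_weights))  # 1*0 + 2*1 + 0.5*1 = 2.5
print(integral_discrete(lambda x: 1.0, mu_weights))    # total mass = 3.5
```

Integrating the constant 1 recovers the total mass of μ, in line with property (1) of Theorem 3.5 below.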

3.5 Theorem. The μ-integral on S(F)⁺ has the following properties:
(1) ∫ 1_F dμ = μ(F),
(2) ∫ (sf + tg) dμ = s ∫ f dμ + t ∫ g dμ if s, t ∈ R⁺ and f, g ∈ S(F)⁺,
(3) ∫ f dμ ≤ ∫ g dμ if f ≤ g and f, g ∈ S(F)⁺.

Proof: The only nontrivial part is to prove that ∫ (f + g) dμ = ∫ f dμ + ∫ g dμ. □


3.6 Problem. Show that ∫ (f + g) dμ = ∫ f dμ + ∫ g dμ for f, g ∈ S(F)⁺.
Hint: Try to find the canonical representation of f + g in terms of the canonical representations of f and g.

It follows that the defining formula of the μ-integral can be applied to any (nonnegative) linear combination of indicators, not only to canonical representations !

    3.2 The extension process

We know that every nonnegative measurable function f ∈ L(F)⁺ is the limit of an increasing sequence (f_n) ⊆ S(F)⁺ of measurable simple functions: f_n ↑ f. It is a natural idea to think of the integral of f as something like

    ∫ f dμ := lim_n ∫ f_n dμ    (1)

This is actually the way we will proceed. But there are some points to worry about.

First of all, we should ask whether the limit on the right hand side exists. This is always the case. Indeed, the integrals ∫ f_n dμ form an increasing sequence in [0, ∞]. This sequence either has a finite limit or it increases to ∞. Both cases are covered by our definition.

The second and far more subtle question is whether the definition is compatible with the definition of the integral on S(F). This is the only nontrivial part of the extension process of the integral and it is the point where σ-additivity of μ comes in. This is proved in Theorem 3.51.

The third question is whether the value of the limit is independent of the approximating sequence. This is also the case and is proved in Theorem 3.52.

Thus, (1) is a valid definition of the integral of f ∈ L(F)⁺.

3.7 Definition. Let (Ω, F, μ) be a measure space. The μ-integral of a function f ∈ L(F)⁺ is defined by equation (1) where (f_n) is any increasing sequence (f_n) ⊆ S(F)⁺ of measurable simple functions such that f_n ↑ f.

It is now straightforward that the basic properties of the integral of simple functions stated in Theorem 3.5 carry over to L(F)⁺.

3.8 Theorem. The μ-integral on L(F)⁺ has the following properties:
(1) ∫ 1_F dμ = μ(F),
(2) ∫ (sf + tg) dμ = s ∫ f dμ + t ∫ g dμ if s, t ∈ R⁺ and f, g ∈ L(F)⁺,
(3) ∫ f dμ ≤ ∫ g dμ if f ≤ g and f, g ∈ L(F)⁺.

    The following problems establish some easy properties of the integral developed

    so far.


3.9 Problem. Let f ∈ L(F)⁺. Prove Markov's inequality:

    μ(f > a) ≤ (1/a) ∫ f dμ,  a > 0.

3.10 Problem. Let f ∈ L(F)⁺. Show that ∫ f dμ = 0 implies μ(f > 0) = 0.
Hint: Show that μ(f > 1/n) = 0 for every n ∈ N.

An assertion A about a measurable function f is said to hold μ-almost everywhere (μ-a.e.) if μ(Aᶜ) = 0. Using this terminology the assertion of the preceding exercise can be phrased as:

    ∫ f dμ = 0, f ≥ 0  ⇒  f = 0 μ-a.e.

If we are talking about probability measures and random variables the phrase almost everywhere is sometimes replaced by almost surely.

3.11 Problem. Let f ∈ L(F)⁺. Show that ∫ f dμ < ∞ implies μ(f > a) < ∞ for every a > 0.

Now the integral is defined for every nonnegative measurable function. The value of the integral may be ∞. In order to define the integral for measurable functions which may take both positive and negative values we have to exclude infinite integrals.

3.12 Definition. A measurable function f is μ-integrable if ∫ f⁺ dμ < ∞ and ∫ f⁻ dμ < ∞. If f is μ-integrable then

    ∫ f dμ := ∫ f⁺ dμ − ∫ f⁻ dμ.

The set of all μ-integrable functions is denoted by L¹(μ) = L¹(Ω, F, μ).

Proving the basic properties of the integral of integrable functions is an easy matter. We collect these facts in a couple of problems.

3.13 Problem. Show that f ∈ L(F) is μ-integrable iff ∫ |f| dμ < ∞.

3.14 Problem. The set L¹(μ) is a linear space and the μ-integral is a linear functional on L¹(μ).

3.15 Problem. The μ-integral is an isotonic functional on L¹(μ).

3.16 Problem. Let f ∈ L¹(μ). Show that |∫ f dμ| ≤ ∫ |f| dμ.

3.17 Problem. Let f be a measurable function and assume that there is an integrable function g such that |f| ≤ g (say: f is dominated). Then f is integrable.


    3.18 Problem.

    (a) Discuss the question whether bounded measurable functions are integrable.

    (b) Characterize those measurable simple functions which are integrable.

Many assertions in measure theory concerning measurable functions are stable under linear combinations and under convergence. Assertions of such a type need only be proved for indicators. The procedure of proving (understanding) an assertion for indicators and extending it to nonnegative and to integrable functions is called measure theoretic induction.

    3.19 Problem. Show that integrals are linear with respect to the integrating measure.

Let us finish this section with some notational remarks. For convenience we denote

    ∫_A f dμ := ∫ 1_A f dμ,  A ∈ F.

3.20 Problem.
(a) Let f be an integrable function. Then ∫_A f dμ = 0 for all A ∈ F implies f = 0 μ-a.e.
(b) Let f and g be integrable functions. Then ∫_A f dμ = ∫_A g dμ for all A ∈ F implies f = g μ-a.e.

If (Ω, F, P) is a probability space and X ≥ 0 is a random variable then

    E(X) := ∫ X dP

is called the expectation of X. Thus, expectations are integrals of random variables w.r.t. the underlying probability measures.

3.21 Problem. Let X be a P-integrable random variable. Prove Chebyshev's inequality.

Suppose we are dealing with Borel measure spaces (R, B, μ_Δ) where the measure is defined by some increasing right-continuous function Δ. Then we write

    ∫ f dΔ := ∫ f dμ_Δ = ∫ f(x) dΔ(x).

A special case is the Lebesgue integral ∫ f dλ = ∫ f(x) dx. Moreover, integral limits are defined by

    ∫_a^b f dΔ := ∫_(a,b] f dΔ.

Note that the lower integral limit is not included, but the upper limit is included !

3.22 Problem. What is the difference between ∫_a^b f dΔ and ∫_(a,b) f dΔ ?


    3.3 Convergence of integrals

One of the reasons for the great success of abstract integration theory is its convergence theorems for integrals. The problem is the following. Assume that (f_n) is a sequence of functions converging to some function f. When can we conclude that

    lim_n ∫ f_n dμ = ∫ f dμ ?

There are (at least) three basic assertions of this kind which could be viewed as the three basic principles of integral convergence. We will present these principles together with typical applications.

    The theorem of monotone convergence

    The first principle says that for increasing sequences of nonnegative functions the limit

    and the integral may be interchanged.

3.23 Theorem. (Theorem of Beppo Levi)
Let (f_n) ⊆ L(F)⁺. Then

    f_n ↑ f  ⇒  lim_n ∫ f_n dμ = ∫ f dμ.

The theorem is proved in section 3.5. Note that there is no assumption on integrability. If the sequence is decreasing instead of increasing the corresponding assertion is only valid if the sequence is integrable.

3.24 Problem.
(a) Let (f_n) ⊆ L¹(F)⁺. Then f_n ↓ f ⇒ lim_n ∫ f_n dμ = ∫ f dμ.
(b) Show by example that the integrability assumption cannot be omitted without compensation.

    The first application looks harmless.

3.25 Problem.
(a) Let f be a measurable function such that f = 0 μ-a.e. Then f is integrable and ∫ f dμ = 0.
Hint: Consider f⁺ and f⁻ separately.
(b) Let f and g be measurable functions such that f = g μ-a.e. Then f is integrable iff g is integrable.

Our next application is the starting point of a couple of problems which are concerned with advanced calculus. They serve as a warm-up for stochastic calculus which will be the subject of part III of this text.

Let Δ : [a, b] → R be increasing and right-continuous. For any bounded measurable function f : [a, b] → R let

    (f • Δ)(t) := ∫_a^t f dΔ.


3.26 Problem.
(a) Show that f • Δ is right-continuous with left limits.
(b) Show that (f • Δ)(t) − (f • Δ)(t−) = f(t)(Δ(t) − Δ(t−)).

    The infinite series theorem

The second principle says that for nonnegative measurable functions integrals and infinite sums may be interchanged. It is an easy consequence of the monotone convergence theorem (see section 3.5).

3.27 Theorem. For every sequence (f_n) of nonnegative measurable functions we have

    ∫ Σ_{n=1}^∞ f_n dμ = Σ_{n=1}^∞ ∫ f_n dμ.

3.28 Problem. Let (Ω, F, μ) be a measure space and f ≥ 0 a measurable function. Show that ν : A ↦ ∫_A f dμ is a measure.

3.29 Problem. Let (a_{mn}) be a double sequence of nonnegative numbers. Show that Σ_m Σ_n a_{mn} = Σ_n Σ_m a_{mn}.
Hint: Define f_n(x) := a_{mn} if x ∈ (m − 1, m].

3.30 Problem.
(a) Let Ω = N and F = 2^N. Show that for every sequence a_n ≥ 0 there is a uniquely determined measure μ|F such that μ({n}) = a_n.
(b) Find ∫ f dμ for f ≥ 0.
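For the measure of problem 3.30 one finds ∫ f dμ = Σ_n f(n) a_n, an instance of the infinite series theorem applied to f = Σ_n f(n) 1_{{n}}. A truncated numerical sketch, assuming the particular weights a_n = 2⁻ⁿ (so that Σ_{n≥1} n·2⁻ⁿ = 2):

```python
def integral_on_N(f, a, n_max=200):
    """Integral of f >= 0 with respect to the measure on N with
    mu({n}) = a(n): a truncated version of sum_n f(n) * a(n)."""
    return sum(f(n) * a(n) for n in range(1, n_max + 1))

a = lambda n: 2.0 ** (-n)      # mu({n}) = 2^(-n), a probability measure on N
f = lambda n: n

print(integral_on_N(f, a))     # sum_{n>=1} n / 2^n = 2 (up to truncation)
```

The truncation at n_max is harmless here because the tail of the series is geometrically small; for general a_n the full series has to converge for f to be integrable.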

3.31 Problem. Let f ≥ 0 be a measurable function and Δ right-continuous and increasing. If f = 0 except at countably many points (x_i)_{i∈N} then

    ∫ f dΔ = Σ_{i=1}^∞ f(x_i)(Δ(x_i) − Δ(x_i−)).

3.32 Problem. Let f and Δ be increasing right-continuous functions. Show that

    ∫_a^b f(t) dΔ(t) = …


    The dominated convergence theorem

The most popular result concerning this issue is Lebesgue's theorem on dominated convergence. Find the proof in section 3.5.

3.33 Theorem. (Dominated convergence theorem)
Let (f_n) be a sequence of measurable functions which is dominated by an integrable function g, i.e. |f_n| ≤ g, n ∈ N. If f_n → f μ-a.e. then f ∈ L¹(μ) and lim_n ∫ f_n dμ = ∫ f dμ.

3.34 Problem. Show that under the assumptions of the dominated convergence theorem we even have

    lim_n ∫ |f_n − f| dμ = 0.

(This type of convergence is called mean convergence.)

    3.35 Problem. Discuss the question whether a uniformly bounded sequence of

    measurable functions is dominated in the sense of the dominated convergence theorem.

There are plenty of applications of the dominated convergence theorem. Let us present those consequences which show the superiority of general measure theory compared with previous approaches to integration.

Recall the notion of a Riemannian sequence of subdivisions of an interval [a, b].

3.36 Problem. Let f : [a, b] → R be a regulated function and let Δ be increasing and right-continuous. Show that for every Riemannian sequence of subdivisions of [a, b]

(a) lim_n Σ_{i=1}^{k_n} f(t_{i−1})(Δ(t_i) − Δ(t_{i−1})) = ∫_a^b f₋ dΔ,

(b) lim_n Σ_{i=1}^{k_n} f(t_i)(Δ(t_i) − Δ(t_{i−1})) = ∫_a^b f₊ dΔ,

where f₋(t) := f(t−) and f₊(t) := f(t+).

    The preceding convergence statements for Riemannian sums are the key for impor-tant mathematical theorems.
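Such Riemannian sums are easy to evaluate numerically. The sketch below (an illustration with a made-up integrator, not taken from the notes) uses Δ(t) = t + 1_{[0.5,∞)}(t) on [0, 1], for which ∫_0^1 f dΔ = ∫_0^1 f(t) dt + f(0.5) whenever f is continuous:

```python
def left_riemann_stieltjes(f, Delta, a, b, n):
    """Left-endpoint Riemannian sum sum_i f(t_{i-1}) (Delta(t_i) - Delta(t_{i-1}))
    over a uniform subdivision of [a, b]."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(ts[i - 1]) * (Delta(ts[i]) - Delta(ts[i - 1]))
               for i in range(1, n + 1))

# increasing right-continuous integrator with a jump of size 1 at t = 0.5
Delta = lambda t: t + (1.0 if t >= 0.5 else 0.0)
f = lambda t: t * t

# exact value: int_0^1 t^2 dt + f(0.5) = 1/3 + 0.25
approx = left_riemann_stieltjes(f, Delta, 0.0, 1.0, 100_000)
print(approx)
```

The sum picks up the Lebesgue part and the jump part of Δ simultaneously; for continuous f the left- and right-endpoint sums have the same limit.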

3.37 Problem. Let f and g be increasing right-continuous functions. Show the following versions of the integration by parts formula (a < b):

(a) f(b)g(b) = f(a)g(a) + ∫_a^b f(t−) dg(t) + ∫_a^b g(t) df(t),

(b) f(b)g(b) = f(a)g(a) + ∫_a^b f(t−) dg(t) + ∫_a^b g(t−) df(t) + Σ_{a<t≤b} (f(t) − f(t−))(g(t) − g(t−)).


3.38 Problem. Let f be increasing and right-continuous. Show that for every Riemannian sequence of subdivisions of [a, b]

    lim_n Σ_{i=1}^{k_n} (f(t_i) − f(t_{i−1}))² = Σ_{a<t≤b} (f(t) − f(t−))².
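The limit in 3.38 — the sum of the squared jumps — is visible numerically: on a fine subdivision the continuous part of an increasing function contributes only O(1/n) to the sum of squared increments. A sketch with a made-up example function:

```python
def sum_squared_increments(f, a, b, n):
    """sum_i (f(t_i) - f(t_{i-1}))^2 over a uniform subdivision of [a, b]."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum((f(ts[i]) - f(ts[i - 1])) ** 2 for i in range(1, n + 1))

# f(t) = t plus a jump of size 1 at t = 0.5
f = lambda t: t + (1.0 if t >= 0.5 else 0.0)

q = sum_squared_increments(f, 0.0, 1.0, 100_000)
print(q)   # close to the squared jump, 1; the contribution of t -> t vanishes
```

This is the elementary prototype of the quadratic variation computations that become essential for stochastic processes, where the continuous part no longer vanishes.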


3.4 Stieltjes integration

3.41 Problem. Explain why f • g is of bounded variation.

The Stieltjes integral has many properties which can be used for calculation purposes. Moreover, the Stieltjes integral is a special case of the general stochastic integral which is an indispensable tool in the theory of stochastic processes and their applications.

If h : [a, b] → R is bounded measurable then the integral ∫_a^b f d(h • g) is well-defined. How can we express this integral in terms of an integral with respect to g ?

3.42 Theorem. (Associativity rule)
Let f and h be bounded measurable functions. Then

    ∫_a^b f d(h • g) = ∫_a^b f h dg.

Proof: The assertion is obvious for f = 1_(a,t], a ≤ t ≤ b. The general case follows by measure theoretic induction. □

Since for rules of this kind the function f is only a dummy function it is convenient to state the rule in a more compact way as

    d(h • g) = h dg,

which is called differential notation. It should be kept in mind that such formulas always have to be interpreted as assertions about integrals.

3.43 Problem. Let g be differentiable with continuous derivative g′. Show that dg = g′ dt.

If g has jumps then we have (h • g)(t) − (h • g)(t−) = h(t)(g(t) − g(t−)). This follows by the same argument as for problem 3.24. Hence, h • g is continuous whenever g is continuous.

Note that the assertions on the convergence of Riemannian sums, shown in problem 3.36, are also true for right-continuous functions g of bounded variation. Hence, we have the integration by parts formula.

3.44 Theorem. (Integration by parts)
Let both f and g be right-continuous and of bounded variation. Then

    f(b)g(b) = f(a)g(a) + ∫_a^b f(t−) dg(t) + ∫_a^b g(t−) df(t) + Σ_{a<t≤b} (f(t) − f(t−))(g(t) − g(t−)).


3.46 Problem. What is the quadratic variation of a function of bounded variation ? Distinguish between the continuous and the non-continuous case.

There is a third calculation rule for Stieltjes integrals called the transformation formula. Since this rule is the classical prototype of the famous Itô formula of stochastic integration let us try to explain it very carefully.

3.47 Theorem. (Transformation formula)
Let φ : R → R be a continuous function with a continuous derivative. Let f : [a, b] → R be a function of bounded variation.

(1) If f is continuous then

    φ(f(b)) = φ(f(a)) + ∫_a^b φ′(f) df.

(2) If f is right-continuous then

    φ(f(b)) = φ(f(a)) + ∫_a^b φ′(f(t−)) df(t) + Σ_{a<t≤b} (φ(f(t)) − φ(f(t−)) − φ′(f(t−))(f(t) − f(t−))).


    3.5 Proofs of the main theorems

3.51 Theorem. Let f ∈ S(F)⁺ and (f_n) ⊆ S(F)⁺. Then

    f_n ↑ f  ⇒  lim_n ∫ f_n dμ = ∫ f dμ.

Proof: Note that lim_n ∫ f_n dμ ≤ ∫ f dμ is clear. For an arbitrary ε > 0 let B_n := (f ≤ f_n(1 + ε)). It is clear that

    ∫ 1_{B_n} f dμ ≤ ∫ 1_{B_n} f_n (1 + ε) dμ ≤ (1 + ε) ∫ f_n dμ.

From B_n ↑ Ω it follows that A ∩ B_n ↑ A and μ(A ∩ B_n) → μ(A) by σ-additivity. Writing f = Σ_{j=1}^m α_j 1_{A_j} in canonical representation we get

    ∫ f dμ = Σ_{j=1}^m α_j μ(A_j) = lim_n Σ_{j=1}^m α_j μ(A_j ∩ B_n) = lim_n ∫ 1_{B_n} f dμ,

which implies

    ∫ f dμ ≤ (1 + ε) lim_n ∫ f_n dμ.

Since ε is arbitrarily small the assertion follows. □

3.52 Theorem. Let (f_n) and (g_n) be increasing sequences of nonnegative measurable simple functions. Then

    lim_n f_n = lim_n g_n  ⇒  lim_n ∫ f_n dμ = lim_n ∫ g_n dμ.

Proof: It is sufficient to prove the assertion with ≤ replacing =. Since lim_k (f_n ∧ g_k) = f_n ∧ lim_k g_k = f_n we obtain by 3.51

    ∫ f_n dμ = lim_k ∫ f_n ∧ g_k dμ ≤ lim_k ∫ g_k dμ.  □

3.53 Theorem. (Theorem of Beppo Levi)
Let f ∈ L(F)⁺ and (f_n) ⊆ L(F)⁺. Then

    f_n ↑ f  ⇒  lim_n ∫ f_n dμ = ∫ f dμ.

Proof: We have to show ∫ f dμ ≤ lim_n ∫ f_n dμ.


For every n ∈ N let (f_{nk})_{k∈N} be an increasing sequence in S(F)⁺ such that lim_k f_{nk} = f_n. Define

    g_k := f_{1k} ∨ f_{2k} ∨ ... ∨ f_{kk}.

Then

    f_{nk} ≤ g_k ≤ f_k ≤ f  whenever n ≤ k.

It follows that g_k ↑ f and

    ∫ f dμ = lim_k ∫ g_k dμ ≤ lim_k ∫ f_k dμ.  □

3.54 Problem. Prove Fatou's lemma: For every sequence (f_n) of nonnegative measurable functions

    ∫ lim inf_n f_n dμ ≤ lim inf_n ∫ f_n dμ.

Hint: Recall that lim inf_n x_n = lim_k inf_{n≥k} x_n. Consider g_k := inf_{n≥k} f_n and apply Levi's theorem to (g_k).

3.55 Theorem. (Dominated convergence theorem)
Let (f_n) be a sequence of measurable functions which is dominated by an integrable function g, i.e. |f_n| ≤ g, n ∈ N. Then

    f_n → f μ-a.e.  ⇒  f ∈ L¹(μ) and lim_n ∫ f_n dμ = ∫ f dμ.

Now it is easy to prove several important facts concerning the integral. We state these as problems.

Proof: Integrability of f is obvious since f is dominated by g, too. Moreover, the sequences g − f_n and g + f_n consist of nonnegative measurable functions. Therefore we may apply Fatou's lemma:

    ∫ (g − f) dμ ≤ lim inf_n ∫ (g − f_n) dμ = ∫ g dμ − lim sup_n ∫ f_n dμ

and

    ∫ (g + f) dμ ≤ lim inf_n ∫ (g + f_n) dμ = ∫ g dμ + lim inf_n ∫ f_n dμ.

This implies

    ∫ f dμ ≤ lim inf_n ∫ f_n dμ ≤ lim sup_n ∫ f_n dμ ≤ ∫ f dμ.  □


    Chapter 4

    Selected topics

    4.1 Image measures and distributions

Let (Ω, A, μ) be a measure space and let (Y, B) be a measurable space. Moreover, let f : Ω → Y be a function. We are going to consider the problem of mapping the measure μ to the set Y by means of the function f.

The concept of the distribution of a random variable is an important special case of mapping a measure from one set to another (confer definition 1.39).

4.1 Definition. Let f : (Ω, A, μ) → (Y, B) be (A, B)-measurable. Then

    μ^f(B) := μ(f ∈ B) = μ(f⁻¹(B)),  B ∈ B,

is the image of μ under f or the distribution of f under μ.

4.2 Problem. Show that μ^f is indeed a measure on B.

4.3 Problem. Let (Ω, F, μ) be a measure space and let f = 1_A where A ⊆ Ω. Find μ^f.

4.4 Problem. Let (Ω, F, μ) be a measure space and let f : Ω → R be a simple function. Find μ^f.

4.5 Problem. Let (Ω, F, P) be a probability space and let X be a random variable with distribution function F. Show that P^X = μ_F.

    An important point is how integrals behave under measure mappings.

4.6 Theorem. (Transformation formula)
Let (Ω, F, μ) be a measure space and let g ∈ L(F). Then for every f ∈ L⁺(B)

    ∫ f ∘ g dμ = ∫ f dμ^g.


4.7 Problem. Prove 4.6 by measure theoretic induction.

4.8 Problem. Let (Ω, F, μ) be a measure space and let g ∈ L(F). Show that f ∘ g is μ-integrable iff f is μ^g-integrable. In case of integrability the transformation formula holds.

4.9 Problem. Let (Ω, F, P) be a probability space and X a random variable with distribution function F. Explain the formula

    E(f ∘ X) = ∫ f dF.
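On a finite sample space the transformation formula of 4.6 is just a regrouping of a finite sum. The sketch below (with toy data) builds the image measure fibre by fibre and compares ∫ f ∘ g dμ with ∫ f dμ^g:

```python
from collections import defaultdict

def image_measure(mu, g):
    """Image of a discrete measure mu (dict: point -> mass) under g:
    collect the mass of each fibre g^{-1}({y})."""
    mu_g = defaultdict(float)
    for omega, mass in mu.items():
        mu_g[g(omega)] += mass
    return dict(mu_g)

mu = {"w1": 0.2, "w2": 0.3, "w3": 0.5}       # measure on a 3-point space
g = lambda w: 0 if w == "w1" else 1          # maps w2 and w3 to the same point
f = lambda y: y + 1.0

mu_g = image_measure(mu, g)                  # {0: 0.2, 1: 0.8}

lhs = sum(f(g(w)) * m for w, m in mu.items())    # int f(g) dmu
rhs = sum(f(y) * m for y, m in mu_g.items())     # int f d(mu^g)
print(lhs, rhs)                                  # the two integrals agree
```

The general proof by measure theoretic induction (problem 4.7) extends exactly this regrouping from indicators to all nonnegative measurable f.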

    4.2 Measures with densities

Let (Ω, F, μ) be a measure space and let f ∈ L⁺(F).

4.10 Problem. Show that ν : A ↦ ∫_A f dμ, A ∈ F, is a measure.

We would like to say that f is the density of ν with respect to μ but for doing so we have to be sure that f is uniquely determined by ν. But this is not true, in general.

4.11 Problem. Show that the density is uniquely determined if the measure μ is finite.

4.12 Example. Let μ|B be a measure such that all countable sets B ∈ B have measure zero and all uncountable sets have measure μ(B) = ∞. A moment's reflection shows that this is actually a measure. Now for every positive constant function f ≡ c > 0 we have

    ∫_B f dμ = μ(B),  B ∈ B.

In the light of the preceding example we see that we have to exclude unreasonable measures in order to obtain uniqueness of densities. The following lemma shows the direction we have to go.

4.13 Lemma. Let f, g ∈ L⁺(F). Then

    ∫_A f dμ = ∫_A g dμ  ∀ A ∈ F  ⇒  μ((f ≠ g) ∩ A) = 0 whenever μ(A) < ∞.

In other words: f = g μ-a.e. on every set of finite μ-measure.


Proof: Let μ(M) < ∞ and define M_n := M ∩ (f ≤ n) ∩ (g ≤ n). Since f 1_{M_n} and g 1_{M_n} are μ-integrable it follows that f 1_{M_n} = g 1_{M_n} μ-a.e. For n → ∞ we have M_n ↑ M which implies f 1_M = g 1_M μ-a.e. □

    Since densities are uniquely determined on sets of finite measure we have unique-

    ness of densities for finite measures and also for measures which can be decomposedinto finite measures.

4.14 Definition. A measure $\mu|\mathcal{F}$ is called $\sigma$-finite if there is a sequence of sets $(F_n)_{n \in \mathbb{N}} \subseteq \mathcal{F}$ with $F_n \uparrow \Omega$ such that $\mu(F_n) < \infty$ for all $n \in \mathbb{N}$.

Note that Borel measures are $\sigma$-finite. For $\sigma$-finite measures densities are uniquely determined.

4.15 Lemma. If $\mu$ is finite or $\sigma$-finite, then
$$\int_A f\, d\mu = \int_A g\, d\mu \quad \forall A \in \mathcal{F} \;\Longleftrightarrow\; f = g\ \mu\text{-a.e.}$$

4.16 Definition. Let $\mu$ be $\sigma$-finite and define a measure $\nu = f\mu$ by
$$\nu: A \mapsto \int_A f\, d\mu, \quad A \in \mathcal{F}.$$
Then $f =: \dfrac{d\nu}{d\mu}$ is called the density or the Radon-Nikodym derivative of $\nu$ with respect to $\mu$.

A density w.r.t. the Lebesgue measure is called a Lebesgue density.

4.17 Problem. Let $\alpha: \mathbb{R} \to \mathbb{R}$ be an increasing function which is supposed to be differentiable on $\mathbb{R}$. Show that $\mu_\alpha = \alpha'\lambda$.

4.18 Problem. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X$ a random variable with differentiable distribution function $F$. Explain the formulas
$$P(X \in B) = \int_B F'(t)\, dt \quad \text{and} \quad E(g \circ X) = \int g(t) F'(t)\, dt$$
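A numeric sanity check of the first formula (my illustration, assuming $X \sim \mathrm{Exp}(1)$ with $F'(t) = e^{-t}$):

```python
import math

# Sketch: P(X ∈ [a,b]) = ∫_a^b F'(t) dt with F'(t) = e^{-t} for X ~ Exp(1).
a, b, n = 0.5, 2.0, 100_000
h = (b - a) / n
integral = sum(math.exp(-(a + (i + 0.5) * h)) * h for i in range(n))  # midpoint rule
exact = math.exp(-a) - math.exp(-b)   # closed form of the same probability
print(round(integral, 6), round(exact, 6))
```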

Which measures have densities w.r.t. other measures?

4.19 Problem. Let $\nu = f\mu$. Show that $\mu(A) = 0$ implies $\nu(A) = 0$, $A \in \mathcal{F}$.


4.20 Definition. Let $\mu|\mathcal{F}$ and $\nu|\mathcal{F}$ be measures. The measure $\nu$ is said to be absolutely continuous w.r.t. the measure $\mu$ ($\nu \ll \mu$) if
$$\mu(A) = 0 \;\Longrightarrow\; \nu(A) = 0, \quad A \in \mathcal{F}.$$

We saw that absolute continuity is necessary for having a density. It is even sufficient.

4.21 Theorem. (Radon-Nikodym theorem)
Assume that $\mu$ is $\sigma$-finite. Then $\nu \ll \mu$ iff $\nu = f\mu$ for some $f \in L^+(\mathcal{F})$.
Proof: See Bauer, [2]. $\Box$

4.22 Problem. Let $P$ and $Q$ be probability measures on a finite field $\mathcal{F}$.
(1) State $Q \ll P$ in terms of the generating partition of $\mathcal{F}$.
(2) If $Q \ll P$, find $dQ/dP$.

An important question is how $\nu$-integrals can be transformed into $\mu$-integrals.

4.23 Problem. Let $\nu = f\mu$. Discuss the validity of
$$\int g\, d\nu = \int g\, \frac{d\nu}{d\mu}\, d\mu$$
Hint: Prove it for $g \in S^+(\mathcal{F})$ and extend it by measure theoretic induction.

The following prepares for chapter 15.

4.24 Definition. The probability measures $P|\mathcal{F}$ and $Q|\mathcal{F}$ are said to be equivalent ($P \sim Q$) if they are mutually absolutely continuous, i.e.
$$P(F) = 0 \;\Longleftrightarrow\; Q(F) = 0 \quad \text{whenever } F \in \mathcal{F}.$$

Obviously, we have $P \sim Q$ iff $Q \ll P$ and $P \ll Q$. Therefore the Radon-Nikodym derivatives $\dfrac{dQ}{dP}$ and $\dfrac{dP}{dQ}$ exist. The following two problems contain general properties of Radon-Nikodym derivatives.

4.25 Problem. Let $P \sim Q$. Show that $\dfrac{dP}{dQ} = \left(\dfrac{dQ}{dP}\right)^{-1}$.
Hint: Show that for all $F \in \mathcal{F}$
$$\int_F \left( \frac{dP}{dQ}\, \frac{dQ}{dP} - 1 \right) dP = 0$$
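On a finite space the content of problem 4.25 is elementary and can be checked directly (weights chosen for illustration):

```python
from fractions import Fraction

# Equivalent P and Q on a three-point space (same null sets: none).
P = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
Q = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}

dQdP = {w: Q[w] / P[w] for w in P}
dPdQ = {w: P[w] / Q[w] for w in P}

# dP/dQ is the pointwise reciprocal of dQ/dP.
assert all(dPdQ[w] == 1 / dQdP[w] for w in P)
print(sorted(dQdP.values()))
```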


4.26 Problem. Let $Q \ll P$. Show that $P \sim Q$ iff $\dfrac{dQ}{dP} > 0$ $P$-a.s.
Hint: For proving $\Leftarrow$, show that $Q(F) = 0$ implies $1_F\, \dfrac{dQ}{dP} = 0$ $P$-a.s.

4.27 Problem. Let $Q \ll P$ and $(A_n) \subseteq \mathcal{F}$. Then $P(A_n) \to 0$ implies $Q(A_n) \to 0$.
Hint: Let $\varepsilon > 0$ and choose $M$ such that
$$\int_{\{\frac{dQ}{dP} > M\}} \frac{dQ}{dP}\, dP < \varepsilon.$$
(Why is this possible?) Let $B = \left\{ \dfrac{dQ}{dP} > M \right\}$ and split $A_n = (A_n \cap B) \cup (A_n \cap B^c)$.

4.3 Product measures and Fubini's theorem

Let $(\Omega_1, \mathcal{F})$ and $(\Omega_2, \mathcal{G})$ be measurable spaces. We want to discuss measure and integration on $\Omega_1 \times \Omega_2$.

To begin with we have to define a $\sigma$-field on $\Omega_1 \times \Omega_2$. This $\sigma$-field should be large enough to contain at least the rectangles (diagram) $F \times G$ where $F \in \mathcal{F}$ and $G \in \mathcal{G}$.

4.28 Definition. The $\sigma$-field on $\Omega_1 \times \Omega_2$ which is generated by the family of measurable rectangles
$$\mathcal{R} = \{F \times G : F \in \mathcal{F},\ G \in \mathcal{G}\}$$
is called the product of $\mathcal{F}$ and $\mathcal{G}$ and is denoted by $\mathcal{F} \otimes \mathcal{G}$.

A special case of a product $\sigma$-field is the Borel $\sigma$-field $\mathfrak{B}^2$.

Having established a $\sigma$-field we turn to measurable functions. Recall that any continuous function $f: \mathbb{R}^2 \to \mathbb{R}$ is $\mathfrak{B}^2$-measurable.

4.29 Problem.

(1) Let $f: \Omega_1 \to \mathbb{R}$ be $\mathcal{F}$-measurable. Show that $(x, y) \mapsto f(x)$ is $\mathcal{F} \otimes \mathcal{G}$-measurable.
(2) Let $f: \Omega_1 \to \mathbb{R}$ be $\mathcal{F}$-measurable, $g: \Omega_2 \to \mathbb{R}$ be $\mathcal{G}$-measurable, and let $\phi: \mathbb{R}^2 \to \mathbb{R}$ be continuous. Show that $(x, y) \mapsto \phi(f(x), g(y))$ is $\mathcal{F} \otimes \mathcal{G}$-measurable.

The preceding problem shows that functions of several variables which are set up as compositions of measurable functions of one variable are usually measurable with respect to the product $\sigma$-field (confer corollaries 2.12 and 2.13).

The next point is to talk about measures. Basically, there are measures on product spaces having a very complicated structure. But there is a special class of measures on product spaces which are constructed from measures on the components in a simple way.

The starting idea is the geometric content of rectangles in $\mathbb{R}^2$. If $I_1$ and $I_2$ are intervals, then the geometric content (area) of the rectangle $I_1 \times I_2$ is the product of the contents (lengths) of the constituting intervals. The extension of this idea to general measures leads to product measures.

4.30 Theorem. Let $(\Omega_1, \mathcal{F}, \mu)$ and $(\Omega_2, \mathcal{G}, \nu)$ be measure spaces. Then there exists a uniquely determined measure $\mu \otimes \nu\,|\,\mathcal{F} \otimes \mathcal{G}$ satisfying
$$(\mu \otimes \nu)(F \times G) = \mu(F)\nu(G), \quad F \times G \in \mathcal{R}.$$
The measure $\mu \otimes \nu$ is called the product measure of $\mu$ and $\nu$.
Proof: See Bauer, [2]. $\Box$

As a consequence it follows that there is a uniquely determined measure on $(\mathbb{R}^2, \mathfrak{B}^2)$ which measures rectangles by their geometric area. In terms of product measure this is $\lambda^2 = \lambda \otimes \lambda$, and is called the Lebesgue measure on $\mathbb{R}^2$.

Let us turn to integration. Integration for general measures on product spaces can be a rather delicate matter. Things are much simpler when we are dealing with product measures. The main point is that multiple integration (i.e. integration w.r.t. product measures) can be reduced to iterated integration (i.e. evaluating integrals over single components).

Let us proceed step by step.

The most simple case is the integration of the indicator of a rectangle. Let $F \times G \in \mathcal{R}$. Then we have
$$\int 1_{F \times G}\, d(\mu \otimes \nu) = (\mu \otimes \nu)(F \times G) = \mu(F)\nu(G) = \int 1_F\, d\mu \int 1_G\, d\nu$$
In general, a set $A \in \mathcal{F} \otimes \mathcal{G}$ need not be a rectangle. How can we extend the formula above to general sets? The answer is the section theorem (Cavalieri's principle).

For any set $A \subseteq \Omega_1 \times \Omega_2$ we call
$$A_y := \{x \in \Omega_1 : (x, y) \in A\}, \quad y \in \Omega_2,$$
the $y$-section of $A$ (diagram!). Similarly the $x$-section $A_x$, $x \in \Omega_1$, is defined. Note that for rectangles the sections are particularly simple.

    4.31 Problem. Find the sections of a rectangle.

The section theorem says that the volume of a set is the integral of the volumes of its sections.

4.32 Theorem. Let $A \in \mathcal{F} \otimes \mathcal{G}$. Then all sections of $A$ are measurable, i.e. $A_y \in \mathcal{F}$, $y \in \Omega_2$, and $y \mapsto \mu(A_y)$ is a $\mathcal{G}$-measurable function. Moreover, we have
$$(\mu \otimes \nu)(A) = \int \mu(A_y)\, \nu(dy)$$


Proof: The measurability parts of the section theorem are a matter of measure theoretic routine arguments. Much more interesting is the integral formula.

In order to understand the integral formula we write it as an iterated integral:
$$(\mu \otimes \nu)(A) = \int \left( \int 1_A(x, y)\, \mu(dx) \right) \nu(dy)$$
It is easy to see that the inner integral evaluates to $\mu(A_y)$. Why is this formula valid? First of all, it is valid for rectangles $A = F \times G \in \mathcal{R}$. This follows immediately from the definition of the product measure. Moreover, both sides of the equation define measures on the $\sigma$-field $\mathcal{F} \otimes \mathcal{G}$. Since these two measures are equal on rectangles, they necessarily are equal on the generated $\sigma$-field. $\Box$

    Let us illustrate how the section theorem works.

    4.33 Problem. Find the area of the unit circle by means of the section theorem.

Outline: Let $A$ be the unit circle with center at the origin. Then we have
$$\lambda^2(A) = \int \lambda(A_y)\, dy = 2 \int_{-1}^{1} \sqrt{1 - y^2}\, dy$$
Substitute $y = \sin t$ and apply $(\sin t \cos t)' = 2\cos^2 t - 1$.

4.34 Problem. Find the area of a right-angled triangle by means of the section theorem.
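The integral in problem 4.33 is easy to check numerically (my sketch; midpoint rule applied to the section lengths $\lambda(A_y) = 2\sqrt{1 - y^2}$):

```python
import math

# λ²(A) = ∫_{-1}^{1} 2*sqrt(1 - y^2) dy, approximated by the midpoint rule.
n = 200_000
h = 2.0 / n
area = sum(2.0 * math.sqrt(max(0.0, 1.0 - (-1.0 + (i + 0.5) * h) ** 2)) * h
           for i in range(n))
print(round(area, 4))  # close to π
```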

Our last topic in this section is to extend the section theorem to integrals. The resulting general assertion is Fubini's theorem.

4.35 Theorem. (Fubini's theorem) Let $f: \Omega_1 \times \Omega_2 \to \mathbb{R}$ be a nonnegative $\mathcal{F} \otimes \mathcal{G}$-measurable function. Then
$$x \mapsto f(x, y) \quad \text{and} \quad y \mapsto \int f(x, y)\, \mu(dx)$$
are measurable functions and
$$\int f\, d(\mu \otimes \nu) = \int \left( \int f(x, y)\, \mu(dx) \right) \nu(dy)$$
Proof: Fubini's theorem follows from the section theorem in a straightforward way by measure theoretic induction. $\Box$
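For counting measures on finite sets, Fubini's theorem reduces to interchanging finite sums; a trivial sketch (grid and integrand chosen for illustration):

```python
# Summing f over a finite grid in either order gives the same "double integral".
f = lambda x, y: (x + 1) * (y + 2)
grid_x, grid_y = range(5), range(7)

by_rows = sum(sum(f(x, y) for x in grid_x) for y in grid_y)  # inner: dμ, outer: dν
by_cols = sum(sum(f(x, y) for y in grid_y) for x in grid_x)  # order interchanged
assert by_rows == by_cols
print(by_rows)
```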

4.36 Problem. Find a version of Fubini's theorem for integrable functions.

    4.37 Problem. Explain when it is possible to interchange the order of integration for

    an iterated integral.

4.38 Problem. Deduce from Fubini's theorem assertions for interchanging the order of summation for double series of numbers.


    4.4 Spaces of integrable functions

We know that the space $\mathcal{L}^1 = \mathcal{L}^1(\Omega, \mathcal{F}, \mu)$ is a vector space. We would like to define a norm on $\mathcal{L}^1$. A natural idea is to define
$$\|f\|_1 := \int |f|\, d\mu, \quad f \in \mathcal{L}^1.$$
It is easy to see that this definition has the following properties:
(1) $\|f\|_1 \ge 0$, and $f = 0 \Rightarrow \|f\|_1 = 0$,
(2) $\|f + g\|_1 \le \|f\|_1 + \|g\|_1$, $f, g \in \mathcal{L}^1$,
(3) $\|\alpha f\|_1 = |\alpha|\, \|f\|_1$, $\alpha \in \mathbb{R}$, $f \in \mathcal{L}^1$.
However, we have
$$\|f\|_1 = 0 \;\Longleftrightarrow\; f = 0\ \mu\text{-a.e.}$$
A function with zero norm need not be identically zero! Therefore, $\|\cdot\|_1$ is not a norm on $\mathcal{L}^1$ but only a pseudo-norm.

In order to get a normed space one has to modify the space $\mathcal{L}^1$ in such a way that all functions with $f = g$ $\mu$-a.e. are considered as equal. Then every $f$ with $f = 0$ $\mu$-a.e. can be considered as the null element of the vector space. The space of integrable functions modified in this way is denoted by $L^1 = L^1(\Omega, \mathcal{F}, \mu)$.

4.39 Discussion. For those readers who like to have hard facts instead of soft wellness we provide some details.

For any $f \in L(\mathcal{F})$ let
$$\bar{f} = \{g \in L(\mathcal{F}) : f = g\ \mu\text{-a.e.}\}$$
denote the equivalence class of $f$. Then integrability is a class property and the space
$$L^1 := \{\bar{f} : f \in \mathcal{L}^1\}$$
is a vector space. The value of the integral depends only on the class and therefore defines a linear function on $L^1$ having the usual properties. In particular, $\|\bar{f}\|_1 := \|f\|_1$ defines a norm on $L^1$.

It is common practice to work with $L^1$ instead of $\mathcal{L}^1$, but to write $f$ instead of $\bar{f}$. This is a typical example of what mathematicians call abuse of language.

4.40 Theorem. The space $L^1(\Omega, \mathcal{F}, \mu)$ is a Banach space.
Proof: Let $(f_n)$ be a Cauchy sequence in $L^1$, i.e. for every $\varepsilon > 0$ there is $N(\varepsilon)$ such that
$$\int |f_n - f_m|\, d\mu < \varepsilon \quad \text{whenever } n, m \ge N(\varepsilon).$$
Let $n_i := N(1/2^i)$. Then $\int |f_{n_{i+1}} - f_{n_i}|\, d\mu < 1/2^i$, and one shows that $(f_{n_i})$ converges $\mu$-a.e. and in $L^1$ to some $f$, which is then the $L^1$-limit of the whole sequence $(f_n)$. $\Box$

4.41 Theorem. Let $\mathcal{R}$ be a field which generates $\mathcal{F}$. Then the set of $\mathcal{R}$-measurable simple functions is dense in $L^1(\Omega, \mathcal{F}, P)$.
Proof: Let $\varepsilon > 0$. First we note that for every $f \in L^1(\Omega, \mathcal{F}, P)$ there exists an $\mathcal{F}$-measurable simple function $g$ such that $\|f - g\|_1 < \varepsilon$. This can easily be shown for the positive and the negative parts separately. Second, we have to show that for every $\mathcal{F}$-measurable simple function $g$ there exists an $\mathcal{R}$-measurable simple function $h$ such that $\|g - h\|_1 < \varepsilon$. This follows from the measure extension theorem. We do not go into details but refer to Bauer, [2]. $\Box$

Let
$$\mathcal{L}^2 = \mathcal{L}^2(\Omega, \mathcal{F}, \mu) = \Big\{ f \in L(\mathcal{F}) : \int f^2\, d\mu < \infty \Big\}$$


    This is another important space of integrable functions.

4.42 Problem.
(a) Show that $\mathcal{L}^2$ is a vector space.
(b) Show that $\int f^2\, d\mu < \infty$ is a property of the $\mu$-equivalence class of $f \in L(\mathcal{F})$.

By $L^2 = L^2(\Omega, \mathcal{F}, \mu)$ we again denote the corresponding space of equivalence classes. On this space there is an inner product
$$\langle f, g \rangle := \int f g\, d\mu, \quad f, g \in L^2.$$
The corresponding norm is
$$\|f\|_2 = \sqrt{\langle f, f \rangle} = \Big( \int f^2\, d\mu \Big)^{1/2}$$

The following facts can be proved in a way similar to the $L^1$-case.

4.43 Theorem. The space $L^2(\Omega, \mathcal{F}, \mu)$ is a Hilbert space.

4.44 Theorem. Let $\mathcal{R}$ be a field which generates $\mathcal{F}$. Then the set of $\mathcal{R}$-measurable simple functions is dense in $L^2(\Omega, \mathcal{F}, P)$.

    4.5 Fourier transforms

    In order to represent and treat measures and probability measures in a mathematically

    convenient way measure transforms play a predominant role. The most simple measure

    transform is the moment generating function.

4.45 Definition. Let $\mu|\mathfrak{B}$ be a finite measure. Then the function
$$m(t) = \int e^{tx}\, \mu(dx), \quad t \in \mathbb{R},$$
is called the Laplace transform or moment generating function of $\mu$.

The moment generating function shares important useful properties with other measure transforms, but it has a serious drawback. The exponential function $x \mapsto e^{tx}$ is unbounded and therefore may fail to be integrable for some values of $t$ and measures $\mu$. The application of moment generating functions is only possible in such cases where the exponential function is integrable at least for all values of $t$ in an interval of positive length.
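For the standard Gaussian the moment generating function is finite for every $t$, namely $m(t) = e^{t^2/2}$; a numeric sketch (my illustration, midpoint rule):

```python
import math

# m(t) = ∫ e^{tx} φ(x) dx for the standard normal density φ; exact value: e^{t²/2}.
def mgf(t, n=200_000, L=10.0):
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        total += math.exp(t * x) * math.exp(-x * x / 2) / math.sqrt(2 * math.pi) * h
    return total

m1 = mgf(1.0)
print(round(m1, 4), round(math.exp(0.5), 4))
```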


This kind of complication vanishes if we replace the real-valued exponential function $x \mapsto e^{tx}$ by its complex version $x \mapsto e^{itx}$. The corresponding measure transform is called the Fourier transform.

    4.46 Discussion. Let us recall some basic facts on complex numbers.

The complex number field
$$\mathbb{C} = \{z = u + iv : u, v \in \mathbb{R}\}$$
is an extension of the real numbers $\mathbb{R}$ in which a number $i$ (the imaginary unit) is introduced satisfying $i^2 = i \cdot i = -1$. All other rules of calculation carry over from $\mathbb{R}$ to $\mathbb{C}$.

Complex numbers are not ordered but have an absolute value, defined by $|z| = \sqrt{u^2 + v^2}$ if $z = u + iv$. For every complex number $z \in \mathbb{C}$ there is a conjugate number $\bar{z} := u - iv$. The operation of conjugation satisfies $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$. Moreover, we have $z\bar{z} = |z|^2$.

Several functions defined on $\mathbb{R}$ can be extended to $\mathbb{C}$. For our purposes only the exponential function is of importance. It is defined by
$$e^{u + iv} := e^u(\cos(v) + i \sin(v)), \quad u, v \in \mathbb{R}.$$
This definition satisfies $e^{z_1 + z_2} = e^{z_1} e^{z_2}$, $z_1, z_2 \in \mathbb{C}$. For the notion of the Fourier transform it is important to note that $|e^{iv}| = 1$, $v \in \mathbb{R}$. This is a consequence of familiar properties of trigonometric functions.

Differentiation and integration of complex-valued functions of a real variable are simply performed for the real and the imaginary parts separately. Be sure to note that we are not dealing with functions of a complex variable! That would be a much more advanced topic called complex analysis.

4.47 Problem. Find the derivative of $x \mapsto e^{ax}$, $x \in \mathbb{R}$, where $a \in \mathbb{C}$.

4.48 Problem. Show that the basic derivation rules (summation rule, product rule and chain rule) are valid for complex-valued functions.

4.49 Problem. Let $f$ be a complex-valued measurable function (both the real and the imaginary part are measurable). Show that $|f|$ is $\mu$-integrable iff both the real and the imaginary part of $f$ are $\mu$-integrable.

4.50 Problem. Show that the $\mu$-integral of complex-valued functions on $\mathbb{R}$ is a linear functional.

4.51 Problem. Let $f$ be a complex-valued $\mu$-integrable function. Show that
$$\Big| \int f\, d\mu \Big| \le \int |f|\, d\mu.$$


The next problem shows that the usual integration calculus (substitution, integration by parts) carries over from real-valued functions to complex-valued functions.

    4.52 Problem. Show that indefinite integrals of complex-valued functions on R are

    primitives of their integrands.

4.53 Problem. Find $\int_c^d e^{ax}\, dx$, where $c, d \in \mathbb{R}$, $a \in \mathbb{C}$.

    With these preparations we are in a position to proceed with Fourier transforms.

4.54 Definition. Let $\mu|\mathfrak{B}$ be a finite measure. Then the function
$$\hat{\mu}(t) = \int e^{itx}\, \mu(dx), \quad t \in \mathbb{R},$$
is called the Fourier transform of $\mu$.

Note that the Fourier transform is well-defined and finite for every $t \in \mathbb{R}$.

4.55 Problem. Find the Fourier transform of a point measure.

    4.56 Problem. Find the Fourier transform of an exponential distribution.

4.57 Problem. Find the Fourier transform of a Poisson distribution.
Hint: The series expansion of the exponential function carries over to the complex-valued case.

    4.58 Problem. Find the Fourier transform of a Gaussian distribution.

    Hint: Derive a differential equation for the Fourier transform.
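Problem 4.58 can be cross-checked numerically: the Fourier transform of $N(0, 1)$ is $\hat{\mu}(t) = e^{-t^2/2}$, and it is real by symmetry (my sketch, midpoint rule):

```python
import math

# Real and imaginary part of ∫ e^{itx} φ(x) dx for the standard normal density φ.
def fourier(t, n=200_000, L=8.0):
    h = 2 * L / n
    re = im = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        w = math.exp(-x * x / 2) / math.sqrt(2 * math.pi) * h
        re += math.cos(t * x) * w
        im += math.sin(t * x) * w
    return re, im

re, im = fourier(1.5)
print(round(re, 6), round(im, 6))  # ≈ exp(-1.125) and 0
```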

    The Fourier transform can be used to find the moments of a measure.

4.59 Theorem. Let $\mu|\mathfrak{B}$ be a finite measure. If $\int |x|^k\, \mu(dx) < \infty$ then $\hat{\mu}$ is $k$ times differentiable and
$$\frac{d^k}{dt^k}\, \hat{\mu}(t)\Big|_{t=0} = i^k \int x^k\, \mu(dx)$$

4.60 Problem. Prove 4.59.

    The fundamental fact on Fourier transforms is the uniqueness theorem.

4.61 Theorem. Let $\mu_1|\mathfrak{B}$ and $\mu_2|\mathfrak{B}$ be finite measures. Then $\mu_1 = \mu_2$ iff $\hat{\mu}_1 = \hat{\mu}_2$.


We don't prove this theorem here since it is a reformulation of the fundamental Stone-Weierstrass approximation theorem of mathematical analysis. We refer to Bauer, [2].

The notion of the Fourier transform can be extended to measures on $(\mathbb{R}^n, \mathfrak{B}^n)$.

4.62 Definition. Let $\mu|\mathfrak{B}^n$ be a finite measure on $\mathbb{R}^n$. Then the function
$$\hat{\mu}(t) = \int e^{i\, t \cdot x}\, \mu(dx), \quad t \in \mathbb{R}^n,$$
is called the Fourier transform of $\mu$.

    The uniqueness theorem is true also for the n-dimensional case.


Part II

Probability theory


    Chapter 5

    Beyond measure theory

    5.1 Independence

    The notion of independence marks the point where probability theory goes beyond

    measure theory.

Recall that two events $A, B \in \mathcal{F}$ are independent if the product formula $P(A \cap B) = P(A)P(B)$ holds. This is easily extended to families of events.

5.1 Definition. Let $\mathcal{C}$ and $\mathcal{D}$ be subfamilies of $\mathcal{F}$. The families $\mathcal{C}$ and $\mathcal{D}$ are said to be independent (with respect to $P$) if $P(A \cap B) = P(A)P(B)$ for every choice $A \in \mathcal{C}$ and $B \in \mathcal{D}$.

It is natural to call random variables $X$ and $Y$ independent if the corresponding information sets are independent.

5.2 Definition. Two random variables $X$ and $Y$ are independent if $\sigma(X)$ and $\sigma(Y)$ are independent.

The preceding definition can be stated as follows: two random variables $X$ and $Y$ are independent if
$$P(X \in B_1,\, Y \in B_2) = P(X \in B_1)\, P(Y \in B_2), \quad B_1, B_2 \in \mathfrak{B}.$$
This is equivalent to saying that the joint distribution $P^{X,Y}$ of $X$ and $Y$ is the product of $P^X$ and $P^Y$.

How can we check independence of random variables? Is it sufficient to check the independence of generators of the information sets? This is not true in general, but with a minor modification it is.

5.3 Theorem. Let $X$ and $Y$ be random variables and let $\mathcal{C}$ and $\mathcal{D}$ be generators of the corresponding information sets. If $\mathcal{C}$ and $\mathcal{D}$ are independent and closed under intersection, then $X$ and $Y$ are independent.


5.4 Problem. Let $F(x, y)$ be the joint distribution function of $(X, Y)$. Show that $X$ and $Y$ are independent iff $F(x, y) = h(x)k(y)$ for some functions $h$ and $k$.

    For independent random variables there is a product formula for expectations.

5.5 Theorem. (1) Let $X \ge 0$ and $Y \ge 0$ be independent random variables. Then
$$E(XY) = E(X)E(Y)$$
(2) Let $X \in L^1$ and $Y \in L^1$ be independent random variables. Then $XY \in L^1$ and
$$E(XY) = E(X)E(Y)$$
Proof: Apply measure theoretic induction to obtain (1). Part (2) follows from (1). $\Box$
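A quick simulation of Theorem 5.5 (my illustration; $X, Y$ independent $\mathrm{Uniform}(0,1)$, so $E(XY) = E(X)E(Y) = 1/4$):

```python
import random

random.seed(0)
n = 200_000
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]   # drawn independently of xs

mean = lambda v: sum(v) / len(v)
lhs = mean([x * y for x, y in zip(xs, ys)])   # E(XY), estimated
rhs = mean(xs) * mean(ys)                     # E(X)E(Y), estimated
print(round(lhs, 2), round(rhs, 2))  # both ≈ 0.25
```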

5.6 Problem. Let $X$ and $Y$ be random variables on a common probability space. Show that $X$ and $Y$ are independent iff
$$E(e^{i(sX + tY)}) = E(e^{isX})\, E(e^{itY}), \quad s, t \in \mathbb{R}.$$

Recall that square integrable random variables $X$ and $Y$ are called uncorrelated if $E(XY) = E(X)E(Y)$. This is a weaker notion than independence.

    5.7 Problem. Show that uncorrelated random variables need not be independent.
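A standard counterexample for problem 5.7 (my sketch): take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$; then $E(XY) = E(X^3) = 0 = E(X)E(Y)$, yet $Y$ is a function of $X$.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2.
support = [-1, 0, 1]
p = Fraction(1, 3)

EX  = sum(p * x for x in support)          # E(X)   = 0
EY  = sum(p * x * x for x in support)      # E(X^2) = 2/3
EXY = sum(p * x ** 3 for x in support)     # E(X^3) = 0

assert EXY == EX * EY                      # uncorrelated
# Not independent: P(X=0, Y=0) = 1/3, but P(X=0)·P(Y=0) = 1/3 · 1/3 = 1/9.
assert p != p * p
print(EX, EY, EXY)
```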

    5.8 Problem. Find the variance of the sample mean of independent random variables.

5.9 Problem. Show that $X$ and $Y$ are independent iff $f(X)$ and $g(Y)$ are uncorrelated for all bounded measurable functions $f$ and $g$.

The notion of independence (as well as the notion of uncorrelated random variables) can be extended to more than two random variables. We will state the appropriate facts when we need them.

    5.2 Convergence and limit theorems

In probability theory, kinds of convergence other than those we have met so far play a predominant role.


5.10 Definition. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $(X_n)$ be a sequence of random variables. The sequence $(X_n)$ is said to converge to a random variable $X$ $P$-almost surely if
$$\lim_{n \to \infty} X_n(\omega) = X(\omega) \quad \text{for } P\text{-almost all } \omega \in \Omega.$$

This kind of convergence is also considered in measure theory, and we know that under certain additional conditions $P$-almost sure convergence implies convergence of the expectations of the random variables.

However, the probabilistic meaning of almost sure convergence is limited. The reason is that the idea of approximating a random variable $X$ by another random variable $Y$ in a probabilistic sense does not require that the random variables are similar for all $\omega \in \Omega$. It is sufficient that the probability of being near to each other is large.

5.11 Definition. Let $(\Omega, \mathcal{F},$


Recommended