
Introduction to Probability Theory and Stochastic Processes

  • 8/2/2019 Introduction to Probability Theory and Stochastic Processes

    1/178

    VIENNA GRADUATE SCHOOL OF FINANCE (VGSF)

    LECTURE NOTES

    Introduction to Probability Theory and

    Stochastic Processes (STATS)

    Helmut Strasser

    Department of Statistics and Mathematics

Vienna University of Economics and Business Administration

    [email protected]

    http://helmut.strasserweb.net/public

    October 19, 2006

Copyright © 2006 by Helmut Strasser. All rights reserved. No part of this text may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the author.


    Contents

    Preliminaries i

    0.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

    0.2 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

    I Measure and Integration 1

    1 Measure and probability 3

    1.1 Sigma-fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.2 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    1.3 Measures on the real line . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.4 Probability distributions . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Measurable functions and random variables 13

    2.1 The idea of measurability . . . . . . . . . . . . . . . . . . . . . . . . 13

    2.2 The basic abstract assertions . . . . . . . . . . . . . . . . . . . . . . 14

    2.3 The structure of real-valued measurable functions . . . . . . . . . . . 14

    3 Integral and expectation 17

    3.1 The integral of simple functions . . . . . . . . . . . . . . . . . . . . 17

    3.2 The extension process . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    3.3 Convergence of integrals . . . . . . . . . . . . . . . . . . . . . . . . 21

    The theorem of monotone convergence . . . . . . . . . . . . . . . . . 21

The infinite series theorem . . . . . . . . . . . . . . . . . . . . . . . 22

    The dominated convergence theorem . . . . . . . . . . . . . . . . . . 23

    3.4 Stieltjes integration . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    3.5 Proofs of the main theorems . . . . . . . . . . . . . . . . . . . . . . 27

    4 Selected topics 29

    4.1 Image measures and distributions . . . . . . . . . . . . . . . . . . . . 29

    4.2 Measures with densities . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.3 Product measures and Fubini's theorem . . . . . . . . . . . . . . . . 33

    4.4 Spaces of integrable functions . . . . . . . . . . . . . . . . . . . . . 36


    4.5 Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    II Probability theory 43

    5 Beyond measure theory 45

    5.1 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

    5.2 Convergence and limit theorems . . . . . . . . . . . . . . . . . . . . 46

    5.3 The causality theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 48

    6 Random walks 51

    6.1 The ruin problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

    6.2 Optional stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

6.3 Wald's equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    6.4 Gambling systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

    7 Conditioning 61

    7.1 Conditional expectation . . . . . . . . . . . . . . . . . . . . . . . . . 61

    7.2 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

    7.3 Some theorems on martingales . . . . . . . . . . . . . . . . . . . . . 67

8 Stochastic processes 71

    8.1 Basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    8.2 The Poisson process . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

    8.3 Point processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

8.4 Lévy processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

    8.5 The Wiener Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

    9 Martingales 79

    9.1 From independent increments to martingales . . . . . . . . . . . . . . 79

    9.2 A technical issue: Augmentation . . . . . . . . . . . . . . . . . . . . 81

    9.3 Stopping times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    Hitting times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

    The optional stopping theorem . . . . . . . . . . . . . . . . . . . . . 85

    9.4 Application: First passage times of the Wiener process . . . . . . . . 88

    One-sided boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . 88

    Two-sided boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . 90

    The reflection principle . . . . . . . . . . . . . . . . . . . . . . . . . 91

    9.5 The Markov property . . . . . . . . . . . . . . . . . . . . . . . . . . 93


    III Stochastic calculus 95

    10 The stochastic integral 97

    10.1 Integrals along stochastic paths . . . . . . . . . . . . . . . . . . . . . 97

    10.2 The integral of simple processes . . . . . . . . . . . . . . . . . . . . 98

    10.3 Semimartingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

    10.4 Extending the stochastic integral . . . . . . . . . . . . . . . . . . . . 103

    10.5 The Wiener integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

    11 Calculus for the stochastic integral 107

    11.1 The associativity rule . . . . . . . . . . . . . . . . . . . . . . . . . . 107

    11.2 Quadratic variation and the integration-by-parts formula . . . . . . . 108

11.3 Itô's formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

    12 Applications to financial markets 115

    12.1 Financial markets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

    12.2 Trading strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

    12.3 The Black-Scholes equation . . . . . . . . . . . . . . . . . . . . . . 117

    12.4 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

    12.5 Change of numeraire . . . . . . . . . . . . . . . . . . . . . . . . . . 121

    13 Stochastic differential equations 125

13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

    13.2 The abstract linear equation . . . . . . . . . . . . . . . . . . . . . . 126

    13.3 Wiener driven models . . . . . . . . . . . . . . . . . . . . . . . . . . 128

    14 Martingale properties of stochastic integrals 131

    14.1 Locally square integrable martingales . . . . . . . . . . . . . . . . . 131

    14.2 Square integrable martingales . . . . . . . . . . . . . . . . . . . . . . 134

14.3 Lévy's theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

    14.4 Martingale representation . . . . . . . . . . . . . . . . . . . . . . . . 136

15 Exponential martingale and Girsanov's theorem 141

    15.1 The exponential martingale . . . . . . . . . . . . . . . . . . . . . . . 141

    15.2 Likelihood processes . . . . . . . . . . . . . . . . . . . . . . . . . . 142

    15.3 Change of probability measures . . . . . . . . . . . . . . . . . . . . 143

    16 Martingales in financial markets 147

    16.1 Pricing in financial markets . . . . . . . . . . . . . . . . . . . . . . . 147

    16.2 Pricing in Black-Scholes markets . . . . . . . . . . . . . . . . . . . . 147

    16.3 Pricing in diffusion market models . . . . . . . . . . . . . . . . . . . 149


    IV Appendix 151

    17 Foundations of modern analysis 153

    17.1 Basic notions on set theory . . . . . . . . . . . . . . . . . . . . . . . 153

    Set operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

    Cartesian products . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

    Uncountable sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    17.2 Sets and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

    17.3 The set of real numbers . . . . . . . . . . . . . . . . . . . . . . . . . 157

    17.4 Real-valued functions . . . . . . . . . . . . . . . . . . . . . . . . . . 159

    Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

    Continuous functions . . . . . . . . . . . . . . . . . . . . . . . . . . 160

    Regulated functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

The variation of functions . . . . . . . . . . . . . . . . . . . . . . . . 163

    17.5 Banach spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

    17.6 Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165


    Preliminaries

    0.1 Introduction

The goal of this course is to give an introduction to some mathematical concepts and tools which are indispensable for understanding the modern mathematical theory of finance. Let us give an overview of the historic origins of some of the mathematical tools.

    The central topic will be those probabilistic concepts and results which play an

    important role in mathematical finance. Therefore we have to deal with mathematical

    probability theory. Mathematical probability theory is formulated in a language that

    comes from measure theory and integration. This language differs considerably from

    the language of classical analysis, known under the label of calculus. Therefore, our

    first step will be to get an impression of basic measure theory and integration.

We will not go into the advanced problems of measure theory where this theory becomes exciting. Such topics would be closely related to advanced set theory and topology, which differ fundamentally from the mere set-theoretic language and topologically driven slang that is convenient for talking about mathematics but nothing more.

    Similarly, our usage of measure theory and integration is sort of a convenient language

which on this level is of little interest in itself. For us, its worth lies in its power to

    give insight into exciting applications like probability and mathematical finance.

    Therefore, our presentation of measure theory and integration will be an overview

    rather than a specialized training program. We will become more and more familiar

    with the language and its typical kind of reasoning as we go into those applications

    for which we are highly motivated. These will be probability theory and stochastic

    calculus.

    In the field of probability theory we are interested in probability models having a

dynamic structure, i.e. a time evolution governed by endogenous correlation properties. Such probability models are called stochastic processes.

    Probability theory is a young theory compared with the classical cornerstones of

    mathematics. It is illuminating to have a look at the evolution of some fundamental

    ideas of defining a dynamic structure of stochastic processes.

One important line of thought is looking at stationarity. Models which are themselves stationary or are cumulatives of stationary models have determined the econometric literature for decades. For Gaussian models one need not distinguish between

strict and weak (covariance) stationarity. As for weak stationarity it turns out that typical processes follow difference or differential equations driven by some noise process.

    The concept of a noise process is motivated by the idea that it does not transport any

    information.

    From the beginning of serious investigation of stochastic processes (about 1900)

another idea was prominent in the scientific literature, namely the Markov property. This

    is not the place to go into details of the overwhelming progress in Markov chains

    and processes achieved in the first half of the 20th century. However, for a long time

this theory failed to describe the dynamic behaviour of continuous time Markov processes in terms of equations between single states at different times. Such equations

have been the common tools for deterministic dynamics (ordinary difference and differential equations) and for discrete time stationary stochastic sequences. In contrast,

    continuous time Markov processes were defined in terms of the dynamic behaviour of

    their distributions rather than of their states, using partial difference and differential

equations.

    The situation changed dramatically about the middle of the 20th century. There

were two ingenious concepts at the beginning of this disruption. The first is the concept of a martingale introduced by Doob. The martingale turned out to be the final

mathematical fixation of the idea of noise. The notion of a martingale is located between a process with uncorrelated increments and a process with independent increments, both of which were the competing noise concepts up to that time. The second

concept is that of a stochastic integral due to K. Itô. This notion makes it possible to

    apply differential reasoning to stochastic dynamics.

At the beginning of the stochastic part of this lecture we will present an introduction to the ideas of martingales and stopping times by means of stochastic sequences (discrete time processes). However, the main subject of the second half of the lecture

will be continuous time processes with a strong focus on the Wiener process. Nevertheless, the notions of martingales, semimartingales and stochastic integrals are introduced in

    a way which lays the foundation for the study of more general process theory. The

choice of examples is governed by the needs of financial applications (covering the

    notion of gambling, of course).

    0.2 Literature

Let us give some comments on the bibliography.

The popular monograph by Bauer, [1], has been for a long time the standard textbook in Germany on measure theoretic probability. However, probability theory has

    many different faces. The book by Shiryaev, [21], is much closer to those modern

    concepts we are heading to. Both texts are mathematically oriented, i.e. they aim at

giving complete and general proofs of fundamental facts, preferably in abstract terms.

A modern introduction into probability models containing plenty of fascinating phenomena is given by Bremaud, [6] and [7]. The older monograph by Bremaud, [5], is

not located at the focus of this lecture but contains as an appendix an excellent primer on probability theory.

    Our topic in stochastic processes will be the Wiener process and the stochastic

    analysis of Wiener driven systems. A standard monograph on this subject is Karatzas

    and Shreve, [15]. The Wiener systems part of the probability primer by Bremaud

    gives a very compact overview of the main facts. Today, Wiener driven systems are

    a very special framework for modelling financial markets. In the meanwhile, general

    stochastic analysis is in a more or less final state, called semimartingale theory. Present

    and future research applies this theory in order to get a much more flexible modelling

    of financial markets. Our introduction to semimartingale theory follows the outline by

    Protter, [20] (see also [19]).

    Let us mention some basic literature on mathematical finance.

There is a standard source by Hull, [11]. Although this book tries hard to present itself as undemanding, the contrary is true. The reason is that the combination of financial intuition and the apparently informal use of advanced mathematical tools requires a lot of mathematical knowledge on the reader's side in order to catch the intrinsics. Paul Wilmott, [22] and [23], tries to cover all topics in

financial mathematics together with the corresponding intuition, and to make the analytical framework a bit more explicit and detailed than Hull does. I consider these

    books by Hull and Wilmott as a must for any beginner in mathematical finance.

    The books by Hull and Wilmott do not pretend to talk about mathematics. Let us

    mention some references which have a similar goal as this lecture, i.e. to present the

    mathematical theory of stochastic analysis aiming at applications in finance.

    A very popular book which may serve as a bridge from mathematical probability

to financial mathematics is by Björk, [4]. Another book, giving an introduction both to the mathematical theory and to financial mathematics, is by Hunt and Kennedy, [12].

Standard monographs on mathematical finance which could be considered as cornerstones marking the state of the art at the time of their publication are Karatzas and

    Shreve, [16], Musiela and Rutkowski, [17], and Bielecki and Rutkowski, [3]. The

    present lecture should lay some foundations for reading books of that type.


    Part I

    Measure and Integration


    Chapter 1

    Measure and probability

    1.1 Sigma-fields

Let Ω be a (non-empty) set. We are interested in systems of subsets of Ω which are closed under set operations.

    1.1 Example. In general, a system of subsets need not be closed under set operations.

Let Ω = {1, 2, 3}. Consider the system of subsets A = {{1}, {2}, {3}}. This system is not closed under union, intersection or complementation. E.g. the complement of {1} is not in A.

    It is clear that the power set is closed under any set operations. However, there are smaller systems of sets which are closed under set operations, too. Let Ω = {1, 2, 3}. Consider the system of subsets B = {∅, Ω, {1}, {2, 3}}. It is easy to see that this system is closed under union, intersection and complementation. Moreover, it follows that these set operations can be repeated in arbitrary order, resulting always in sets contained in B.

    1.2 Definition. A (non-empty) system F of subsets of Ω is called a σ-field if it is closed under union, intersection and complementation as well as under building limits of monotone sequences. The pair (Ω, F) is called a measurable space.

    There are some obvious necessary properties of a σ-field.

    1.3 Problem.

(1) Show that every σ-field on Ω contains ∅ and Ω.
    (2) What is the smallest possible σ-field on Ω?

If we want to check whether a given system of sets is actually a σ-field then it is sufficient to verify only a minimal set of conditions. The following assertion states such a minimal set of conditions.

    1.4 Proposition. A (non-empty) system F of subsets of Ω is a σ-field iff it satisfies the following conditions:


(1) Ω ∈ F,
    (2) A ∈ F ⇒ A^c ∈ F,
    (3) if (A_i)_{i=1}^∞ ⊆ F then ⋃_{i=1}^∞ A_i ∈ F.

    1.5 Problem. Prove 1.4.

    Let us discuss a number of examples.

When one starts to construct a σ-field one usually starts with a family C of sets which in any case should be contained in the σ-field. If this starting family C does not fulfil all conditions of a σ-field then a simple idea could be to add further sets until the family fulfils all required conditions. Actually, this procedure works if the starting family C is a finite system.

    1.6 Definition. Let C be any system of subsets of Ω. The σ-field generated by C is the smallest σ-field F which contains C. It is denoted by σ(C).

1.7 Problem. Assume that C = {A}. Find σ(C).

1.8 Problem. Assume that C = {A, B}. Find σ(C).

1.9 Problem. Show by giving an example that the union of two σ-fields need not be a σ-field.

If the system C is any finite system then σ(C) consists of all sets which can be obtained by finitely many unions, intersections and complementations of sets in C. Although the resulting system σ(C) is still finite, a systematic overview over all sets could be rather complicated.
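For a finite system C this closure procedure can be carried out mechanically. The following sketch (the helper name `generate_sigma_field` is ours, not from the text) computes σ(C) on a small finite Ω by repeatedly adding complements and pairwise unions until nothing new appears; on a finite set this finite closure is all that σ-stability requires:

```python
from itertools import combinations

def generate_sigma_field(omega, generators):
    """Compute the sigma-field on a finite set generated by a system of subsets.

    Sets are represented as frozensets. On a finite omega, closing under
    complement and pairwise union suffices: countable unions of finitely
    many distinct sets reduce to finite unions.
    """
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        # close under complementation
        for a in current:
            if omega - a not in field:
                field.add(omega - a)
                changed = True
        # close under pairwise union
        for a, b in combinations(current, 2):
            if a | b not in field:
                field.add(a | b)
                changed = True
    return field

# Example 1.1 revisited: C = {{1}, {2}, {3}} on omega = {1, 2, 3}
sf = generate_sigma_field({1, 2, 3}, [{1}, {2}, {3}])
print(len(sf))  # 8: the closure is the full power set
```

Running it with a single generator {A} reproduces the answer one expects for Problem 1.7: the four sets ∅, A, A^c, Ω.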

Things are much easier if the generating system is a finite partition of Ω.

1.10 Proposition. Assume that C is a finite partition of Ω. Then σ(C) consists of ∅ and of all unions of sets in C.

    1.11 Problem. Prove 1.10.

1.12 Problem. Let Ω be a finite set. Find the σ-field which is generated by the one-point sets.

It is a remarkable fact that every finite σ-field is generated by a partition.

1.13 Problem. Show that every finite σ-field F is generated by a partition of Ω.
    Hint: Call a nonempty set A ∈ F an atom if it contains no nonempty proper subset in F. Show that the collection of atoms is a partition of Ω and that every set in F is a union of atoms.


    Information sets

In probability theory a model of a random experiment consists of a pair (Ω, F) where Ω is a non-empty set and F is a σ-field on Ω.

    The set Ω serves as sample space. It is interpreted as the set of possible outcomes of the experiment. Note that it is not necessarily the case that single outcomes are actually observable.

    The σ-field F is interpreted as the field of observable events. Observability of a set A means that after having performed the random experiment it can be decided whether A has been realized or not. In this sense the σ-field contains the information which is obtained after having performed the random experiment. Therefore F is also called the information set of the random experiment.

A simple random variable X is a simple function whose basic partition is observable, i.e. (X = a) ∈ F for every value a of X. The information set of X is the σ-field which is generated by the basic partition of X. It is denoted by σ(X).

1.14 Example. Consider the random experiment of throwing a coin n times. Denote the sides of the coin by 0 and 1. Then the sample space is Ω = {0, 1}^n. Assume that the outcomes of each throw are observable. If X_i denotes the outcome of the i-th throw then this means that (X_i = 0) and (X_i = 1) are observable.
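Example 1.14 can be made concrete for n = 3. The minimal sketch below (helper names are ours) enumerates Ω = {0, 1}^3 and collects the outcomes belonging to the observable events (X_i = 0) and (X_i = 1):

```python
from itertools import product

n = 3
omega = list(product((0, 1), repeat=n))  # all 2^3 = 8 outcomes

def event(i, value):
    """The observable event (X_i = value): outcomes whose i-th throw equals value."""
    return {w for w in omega if w[i] == value}

# (X_1 = 0) and (X_1 = 1) partition the sample space
assert event(0, 0) | event(0, 1) == set(omega)
assert event(0, 0) & event(0, 1) == set()
print(len(event(0, 0)))  # 4 outcomes
```

Such two-block partitions are exactly the basic partitions that generate the information sets σ(X_i) asked for in Problem 1.15.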

1.15 Problem. Let Ω = {0, 1}^3 and define S_k := ∑_{i=1}^k X_i.
    (1) Find σ(S_1), σ(S_2), σ(S_3).
    (2) Find σ(X_1), σ(X_2), σ(X_3).

1.16 Problem. Let Ω be the sample space of throwing a die twice. Denote the outcomes of the throws by X and Y, respectively. Find σ(X), σ(Y), σ(X + Y), σ(XY).

    Borel sigma-fields

Let us discuss σ-fields on R. Clearly, the power set of R is a σ-field. However, the power set is too large. Let us be more modest and start with a system of simple sets and then try to extend the system to a σ-field.

    The following example shows that such a procedure does not work if we start with

    one-point sets.

1.17 Problem. Let F be the collection of all subsets of R which are countable or are the complement of a countable set.
    (1) Show that F is a σ-field.
    (2) Show that F is the smallest σ-field which contains all one-point sets.
    (3) Does F contain intervals?


A reasonable σ-field on R should at least contain all intervals.

1.18 Definition. The smallest σ-field on R which contains all intervals is called the Borel σ-field. It is denoted by B and its elements are called Borel sets.

    Unfortunately, there is no way of describing all sets in B in a simple manner. All we can say is that any set which can be obtained from intervals by countably many set operations is a Borel set. E.g., every set which is the countable union of intervals is a Borel set. But there are even much more complicated sets in B. On the other hand, there are subsets of R which are not in B.

The concept of Borel sets is easily extended to R^n.

1.19 Definition. The σ-field on R^n which is generated by all rectangles

    R = {I_1 × I_2 × ⋯ × I_n : I_k being any interval}

    is called the Borel σ-field on R^n and is denoted by B^n.

All open and all closed sets in R^n are Borel sets since open sets can be represented as a countable union of rectangles and closed sets are the complements of open sets.

    Random variables

Let (Ω, F) be a model of a random experiment. What is a random variable?

    The idea of a random variable is that of a function X : Ω → R such that assertions about X are observable events, i.e. are contained in F. But what are assertions about X? In the case of a simple function we considered assertions of the form (X = a). But for functions taking an uncountable number of values we have to consider also assertions of the form (X ∈ I) where I is an interval.

    1.20 Definition. A random variable is a function X : Ω → R such that (X ∈ I) ∈ F for every interval I.

1.21 Problem. Show that every function satisfying (X ≤ x) ∈ F for every x ∈ R is a random variable.

Let us turn to the question of the information set of a general random variable. Conceptually, the information set σ(X) is the σ-field that is generated by all events which can be observed through X.

Obviously, the system C consisting of the sets (X ∈ I), I being an interval, is not a σ-field. However, using the Borel σ-field we can describe the information set of a random variable X in a quasi-explicit way.

1.22 Theorem. The information set σ(X) is the system of sets (X ∈ B) where B is an arbitrary Borel set. In particular, for a random variable X we have (X ∈ B) ∈ F for all B ∈ B.


    1.2 Measures

    Measures are set functions. Let us consider some examples.

1.23 Example. Let Ω be an arbitrary set and for any subset A ⊆ Ω define

    μ(A) = |A| := k if A contains k elements, and μ(A) = |A| := ∞ if A contains infinitely many elements.

    This set function is called a counting measure. It is defined for all subsets of Ω. Obviously, it is additive, i.e.

    A ∩ B = ∅ ⇒ μ(A ∪ B) = μ(A) + μ(B).
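For finite sets the additivity of the counting measure can be checked mechanically; a minimal sketch (the function name is ours, and we restrict to finite sets, so the ∞ case of the definition never occurs):

```python
def counting_measure(a):
    """|A| for a finite set A; the text assigns infinity to infinite sets."""
    return len(a)

a, b = {1, 2}, {3, 4, 5}
assert a & b == set()  # A and B are disjoint
# additivity: mu(A union B) = mu(A) + mu(B) for disjoint A, B
assert counting_measure(a | b) == counting_measure(a) + counting_measure(b)
print(counting_measure(a | b))  # 5
```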

Measures are set functions which intuitively should be related to the notion of volume. Therefore measures should be nonnegative and additive. In order to apply

    additivity they should be defined on systems of subsets which are closed under the

usual set operations. This leads to the requirement that measures should be defined on σ-fields. Finally, if the underlying σ-field contains infinitely many sets there should be some rule how to handle limits of infinite sequences of sets.

    Thus, we are ready for the definition of a measure.

1.24 Definition. Let Ω be a non-empty set. A measure on Ω is a set function μ which satisfies the following conditions:

    (1) μ is defined on a σ-field F on Ω.
    (2) μ is nonnegative, i.e. μ(A) ≥ 0, A ∈ F, and μ(∅) = 0.
    (3) μ is σ-additive, i.e. for every pairwise disjoint sequence (A_i)_{i=1}^∞ ⊆ F

    μ(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ μ(A_i).

    A measure μ is called finite if μ(Ω) < ∞. A measure P is called a probability measure if P(Ω) = 1. If μ|F is a measure then (Ω, F, μ) is a measure space. If P|F is a probability measure then (Ω, F, P) is called a probability space.

    There are some obvious consequences of the preceding definition.

    1.25 Problem. Show that every measure is additive.

1.26 Problem. Let μ|F be a measure.
    (1) Show that A_1 ⊆ A_2 implies μ(A_1) ≤ μ(A_2).
    (2) Show the inclusion-exclusion law:

    μ(A_1) + μ(A_2) = μ(A_1 ∪ A_2) + μ(A_1 ∩ A_2)


(3) The preceding problem gives a formula for μ(A_1 ∪ A_2) provided that all sets have finite measure. Extend this formula to the union of three sets.

The property of being σ-additive both guarantees additivity and implies easy rules for handling infinite sequences of sets.

1.27 Problem. Let μ|F be a measure.
    (1) If A_i ↑ A then μ(A_i) ↑ μ(A).
    (2) If A_i ↓ A and μ(A_1) < ∞ then μ(A_i) ↓ μ(A).

    1.28 Problem.

    (1) Any nonnegative linear combination of measures is a measure.

    (2) Every infinite sum of measures is a measure.

1.29 Problem. Explain the construction of measures on a finite σ-field.
    Hint: Measures have to be defined for atoms only.

    1.3 Measures on the real line

The simplest example of a measure is a point measure.

1.30 Definition. The set function defined by

    δ_a(A) = 1_A(a), A ⊆ R,

    is called the point measure at a ∈ R.

    Take a moment's reflection on whether this definition actually satisfies the properties of a measure. Note that any point measure can be defined for all subsets of R, i.e. it is defined on the largest possible σ-field 2^R.

    Taking linear combinations of point measures gives a lot of further examples of measures.
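Point measures and their nonnegative linear combinations can be sketched in a few lines. In the toy code below (the helper names and the example measure ν = δ_1 + 3δ_2 are ours, not from the text) a set is represented by its membership test, so that intervals are easy to encode:

```python
def point_measure(a):
    """delta_a(A) = 1_A(a): mass one if a lies in A, zero otherwise."""
    return lambda contains: 1.0 if contains(a) else 0.0

def combine(terms):
    """A nonnegative linear combination of measures is again a measure (Problem 1.28)."""
    return lambda contains: sum(w * m(contains) for w, m in terms)

# nu = delta_1 + 3 * delta_2; a set A is passed as its membership predicate
nu = combine([(1.0, point_measure(1)), (3.0, point_measure(2))])

print(nu(lambda x: 0 < x <= 2))  # mass of (0, 2]: 1 + 3 = 4.0
```

Evaluating ν on any interval just asks, atom by atom, whether the atom lies in the interval, which is exactly the pattern needed for Problem 1.31.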

1.31 Problem.
    (1) Let μ = δ_0 + 2δ_1 + 0.5δ_{−1}. Calculate μ([0, 1)), μ([−1, 1)), μ((−1, 1]).
    (2) Describe in words the values of μ = ∑_{j=1}^k a_j δ_{x_j}.
    (3) Let x ∈ R^n be a list of data and let μ(I) be the percentage of data contained in I. Show that μ is a measure by writing it as a linear combination of point measures.

Let Ω = R and for every interval I ⊆ R define

    λ(I) := length of I.

    E.g. λ((a, b]) = b − a. This set function is called the Lebesgue content of intervals. At the moment it is defined only on the family of all intervals.


The Lebesgue content is also additive in the following sense: If I_1 and I_2 are two intervals such that the union I_1 ∪ I_2 = I_3 is an interval, too, then

    I_1 ∩ I_2 = ∅ ⇒ λ(I_1 ∪ I_2) = λ(I_1) + λ(I_2).

    However, the family of intervals is not a σ-field. In order to obtain a measure we have to extend the Lebesgue content to a σ-field which contains the intervals. The smallest σ-field with this property is the Borel σ-field.

1.32 Theorem. (Measure extension theorem)
    There exists a uniquely determined measure λ|B such that λ((a, b]) = b − a, a < b. This measure is called the Lebesgue measure.

Knowing that λ|B is a measure we may calculate its values for simple Borel sets which are not intervals.

1.33 Problem. Find the Lebesgue measure of Q.

Now, let us turn to the problem of how to get an overview over all measures μ|B. We restrict our interest to measures which give finite values to bounded intervals.

Let μ|B be a measure such that μ((a, b]) < ∞ for a < b. Define

    Δ(x) := μ((0, x]) if x > 0, and Δ(x) := −μ((x, 0]) if x ≤ 0,

    and note that for any a < b we have

    μ((a, b]) = Δ(b) − Δ(a) =
    μ((0, b]) − μ((0, a]) if 0 ≤ a < b,
    μ((0, b]) + μ((a, 0]) if a < 0 < b,
    −μ((b, 0]) + μ((a, 0]) if a < b ≤ 0.

    This means: For every such measure μ there is a function Δ : R → R which defines the measure at least for all intervals. This function is called the measure-defining function of μ.

    Note that our definition of the measure-defining function is such that Δ(0) = 0. However, any function which differs from Δ by an additive constant only defines the same measure.

1.34 Problem. Calculate the measure-defining function of the following measures:
(1) A point measure: δ₂, δ₀, δ₋₃.
(2) A linear combination of point measures: δ₂ + 2δ₀ + 0.5δ₋₃.
(3) The Lebesgue measure λ.
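The measure-defining function of a discrete measure is a step function. A small Python sketch, using the linear combination δ₂ + 2δ₀ + 0.5δ₋₃ from part (2) of the problem above and the convention Δ(0) = 0:

```python
def measure_defining_function(weights):
    """Delta(x) = mu((0, x]) for x > 0 and -mu((x, 0]) for x <= 0,
    for a discrete measure mu = sum_j a_j * delta_{x_j}."""
    def Delta(x):
        if x > 0:
            return sum(a for p, a in weights.items() if 0 < p <= x)
        return -sum(a for p, a in weights.items() if x < p <= 0)
    return Delta

# mu = delta_2 + 2*delta_0 + 0.5*delta_{-3}
Delta = measure_defining_function({2: 1.0, 0: 2.0, -3: 0.5})

# mu((a, b]) = Delta(b) - Delta(a)
print(Delta(3) - Delta(1))    # mu((1, 3]): only the point 2 -> 1.0
print(Delta(0) - Delta(-4))   # mu((-4, 0]): points -3 and 0 -> 2.5
```

Note how the half-open convention matters: the point 0 carries the weight of δ₀ into every interval (a, 0] with a < 0, which is why Δ jumps at each atom and is right-continuous.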

1.35 Problem. Let μ|B be finite on bounded intervals. Explain the fundamental properties of the measure-defining function Δ:


(1) Δ is increasing.
(2) Δ is right-continuous.

The following is an existence theorem which establishes a one-to-one relation between functions and measures.

1.36 Theorem. (Measure extension theorem)
For every function Δ : R → R satisfying properties (1) and (2) of 1.35 there exists a uniquely determined measure μ_Δ such that μ_Δ((a, b]) = Δ(b) − Δ(a).

If the measure-defining function Δ is continuous and piecewise differentiable then its derivative Δ′ is called the density of the measure μ_Δ (with respect to the Lebesgue measure λ). This name comes from

    Δ′(x) = lim_{h→0} (Δ(x + h) − Δ(x − h)) / (2h)
          = lim_{h→0} μ_Δ((x − h, x + h]) / λ((x − h, x + h]).

In such a situation we have

    μ_Δ((a, b]) = ∫_a^b Δ′(x) dx.

A measure μ|B is discrete if it is a finite or infinite linear combination of point measures. A counting measure is a discrete measure where all point measures with positive weight have weight one.

1.37 Problem. Explain the characteristic properties of the measure-defining function of a discrete measure and of a counting measure.

1.38 Problem. Let Δ be the measure-defining function of μ.
(1) Show that μ({a}) = Δ(a) − Δ(a−), the jump of Δ at a.
(2) For which measures is Δ continuous ?
(3) For which measures is Δ a step-function ?

1.4 Probability distributions

A probability model consists of a sample space Ω, a σ-field F and a probability measure P|F. Such a triple (Ω, F, P) is called a probability space.

For practical applications it is important to specify the particular probability measure under consideration. This can be done either if the σ-field F has a simple structure, e.g. if it is finite (confer problem 1.29), or if the σ-field is the information set of a random variable X.

Let us consider the second case. Let X be a random variable. The information set of X is the σ-field σ(X) consisting of all events (X ∈ B) where B ∈ B.


1.39 Definition. The set function

    P^X : B ↦ P(X ∈ B),  B ∈ B,

is called the distribution of X (under P).

1.40 Problem. Show that P^X is a probability measure on (R, B).

Since P^X is a measure on (R, B) it can be represented by its measure-defining function Δ. For probability measures it is, however, simpler to use the distribution function

    F(x) = P(X ≤ x) = P^X((−∞, x]) = P^X((−∞, 0]) + Δ(x),

which differs from Δ only by an additive constant. Thus we have

    P(a < X ≤ b) = P^X((a, b]) = F(b) − F(a) = Δ(b) − Δ(a).

1.41 Proposition. Let X be a random variable with distribution function F. Then P^X = μ_F.

Many examples illustrating the relation between random variables and their distribution functions have been considered in the introductory course.
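As a numerical aside (not part of the formal development), the identity P(a < X ≤ b) = F(b) − F(a) can be checked by simulation for a distribution whose distribution function is known in closed form, e.g. the exponential distribution with F(x) = 1 − e^(−x):

```python
import math
import random

random.seed(0)

# X exponentially distributed with rate 1: F(x) = 1 - exp(-x)
F = lambda x: 1.0 - math.exp(-x)

a, b, n = 0.5, 2.0, 200_000
sample = [random.expovariate(1.0) for _ in range(n)]
# relative frequency of the event (a < X <= b)
freq = sum(1 for x in sample if a < x <= b) / n

print(freq)             # close to F(b) - F(a) = exp(-0.5) - exp(-2)
```

With 200 000 draws the relative frequency agrees with F(b) − F(a) ≈ 0.471 up to sampling error of order 1/√n.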


    Chapter 2

    Measurable functions and random

    variables

    2.1 The idea of measurability

Recall the concept of a random variable. This is a function X : (Ω, F, P) → R defined on a probability space such that the sets (X ∈ B) are in F for all Borel sets B ∈ B.

    The notion of a random variable is a special case of the notion of a measurable

    function.

2.1 Definition. A function f : (Ω, F) → R defined on a measurable space is called measurable if the sets (f ∈ B) are in F for all Borel sets B ∈ B.

    The notion of measurability is not restricted to real-valued functions.

Let (Ω, A) and (Y, B) be measurable spaces. Moreover, let f : Ω → Y be a function. Recall that (f ∈ B) is the inverse image of B under f, usually denoted by f⁻¹(B).

2.2 Definition. A function f : (Ω, A) → (Y, B) is called (A, B)-measurable if f⁻¹(B) ∈ A for all B ∈ B.

    Let us agree upon some terminology.

(1) When we consider real-valued functions then we always use the Borel σ-field in the range of f. If f : (Ω, F) → (R, B) then we simply say that f is F-measurable if we mean that it is (F, B)-measurable.
(2) When we consider functions f : R → R then (B, B)-measurability is called Borel measurability. The term Borel is thus concerned with the σ-field in the domain of f.

    To get an idea what measurability means let us consider some simple examples.

2.3 Problem. Let (Ω, F, μ) be a measure space and let f = 1_A where A ⊆ Ω. Show that f is F-measurable iff A ∈ F.


It follows that very complicated functions are Borel-measurable, e.g. f = 1_Q.

2.4 Problem. Let (Ω, F, μ) be a measure space and let f : Ω → R be a simple function. Show that f is F-measurable iff all sets of the canonical representation are in F.

    2.2 The basic abstract assertions

There are two fundamental principles for dealing with measurability. The first principle says that measurability is a property which is preserved under composition of functions.

2.5 Theorem. Let f : (Ω, A) → (Y, B) be (A, B)-measurable, and let g : (Y, B) → (Z, C) be (B, C)-measurable. Then g ∘ f is (A, C)-measurable.

    2.6 Problem. Prove 2.5.

The second principle is concerned with checking measurability. For checking measurability of f it is sufficient to consider the sets in a generating system of the σ-field in the range of f.

2.7 Theorem. Let f : (Ω, A) → (Y, B) and let C be a generating system of B, i.e. B = σ(C). Then f is (A, B)-measurable iff f⁻¹(C) ∈ A for all C ∈ C.

Proof: Let D := {D ⊆ Y : f⁻¹(D) ∈ A}. It can be shown that D is a σ-field. If f⁻¹(C) ∈ A for all C ∈ C then C ⊆ D. This implies σ(C) ⊆ D. □

    2.8 Problem. Fill in the details of the proof of 2.7.

    2.3 The structure of real-valued measurable functions

Let (Ω, F) be a measurable space. Let L(F) denote the set of all F-measurable real-valued functions. We start with the most common and most simple criterion for checking measurability of a real-valued function.

2.9 Problem. Show that a function f : Ω → R is F-measurable iff (f ≤ α) ∈ F for every α ∈ R.
Hint: Apply 2.7.

    This provides us with a lot of examples of Borel-measurable functions.

2.10 Problem.

(a) Show that every monotone function f : R → R is Borel-measurable.


(b) Show that every continuous function f : Rⁿ → R is Bⁿ-measurable.
Hint: Note that (f ≤ α) is a closed set.
(c) Let f : (Ω, F) → R be F-measurable. Show that f⁺, f⁻, |f|, and every polynomial a₀ + a₁f + ··· + a_n fⁿ are F-measurable.

    The next exercise is a first step towards the measurability of expressions involving

    several measurable functions.

2.11 Problem. Let f₁, f₂, ..., f_n be measurable functions. Then

    f = (f₁, f₂, ..., f_n) : Ω → Rⁿ

is (F, Bⁿ)-measurable.

2.12 Corollary. Let f₁, f₂, ..., f_n be measurable functions. Then for every continuous function φ : Rⁿ → R the composition φ ∘ (f₁, f₂, ..., f_n) is measurable.

Proof: Apply 2.5. □

2.13 Corollary. Let f₁, f₂ be measurable functions. Then f₁ + f₂, f₁ − f₂, f₁ · f₂, f₁ ∨ f₂ and f₁ ∧ f₂ are measurable functions.

    2.14 Problem. Prove 2.13.

As a result we see that L(F) is a space of functions where we may perform any algebraic operations without leaving the space. Thus it is a very convenient space for formal manipulations. The next assertion shows that we may even perform all of those operations involving a countable set (e.g. a sequence) of measurable functions !

2.15 Theorem. Let (f_n)_{n∈N} be a sequence of measurable functions. Then sup_n f_n and inf_n f_n are measurable functions. Let A := (lim_n f_n exists). Then A ∈ F and lim_n f_n · 1_A is measurable.

Proof: Since

    (sup_n f_n ≤ α) = ⋂_n (f_n ≤ α)

it follows from 2.9 that sup_n f_n and inf_n f_n = −sup_n(−f_n) are measurable. We have

    A := (lim_n f_n exists) = (sup_k inf_{n≥k} f_n = inf_k sup_{n≥k} f_n).

This implies A ∈ F. The last statement follows from

    lim_n f_n = sup_k inf_{n≥k} f_n  on A.  □


Note that the preceding corollaries are only very special examples of the power of Theorem 2.5. Roughly speaking, any function which can be written as an expression involving countably many operations on countably many measurable functions is measurable. Therefore it is rather difficult to construct non-measurable functions.

Let us denote the set of all F-measurable simple functions by S(F). Clearly, all limits of simple measurable functions are measurable. The remarkable fact, being fundamental for almost everything in integration theory, is the converse of this statement.

2.16 Theorem.
(a) Every measurable function f is the limit of some sequence of simple measurable functions.
(b) If f ≥ 0 then the approximating sequence can be chosen to be increasing.

Proof: The fundamental statement is (b). Let f ≥ 0. For every n ∈ N define

    f_n := (k − 1)/2ⁿ  whenever (k − 1)/2ⁿ ≤ f < k/2ⁿ,  k = 1, 2, ..., n·2ⁿ,
    f_n := n           whenever f ≥ n.

Then f_n ↑ f. If f is bounded then (f_n) converges uniformly to f. Part (a) follows from f = f⁺ − f⁻. □

    2.17 Problem. Draw a diagram illustrating the construction of the proof of 2.16.

    2.18 Problem. Show: If f is bounded then the approximating sequence can be

    chosen to be uniformly convergent.
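The dyadic construction of the proof of 2.16 can also be sketched in code instead of a diagram; `dyadic_approx` below is a hypothetical helper implementing f_n (round f down to the grid of step 2⁻ⁿ and cap at n), and the loop checks pointwise monotonicity f_n ≤ f_{n+1} ≤ f:

```python
import math

def dyadic_approx(f, n):
    """n-th simple function from the proof of 2.16: round f >= 0 down to
    the grid {0, 1/2^n, 2/2^n, ...} and cap the result at n."""
    def fn(x):
        y = f(x)
        return n if y >= n else math.floor(y * 2**n) / 2**n
    return fn

f = lambda x: x * x          # a nonnegative measurable function
xs = [0.0, 0.3, 0.7, 1.9, 5.0]
for n in range(1, 8):
    fn, fn1 = dyadic_approx(f, n), dyadic_approx(f, n + 1)
    # increasing in n and dominated by f at every test point
    assert all(fn(x) <= fn1(x) <= f(x) for x in xs)

print(dyadic_approx(f, 3)(0.7))   # floor(0.49 * 8)/8 = 3/8 = 0.375
```

Refining the grid (n → n+1) splits each level in two, which is exactly why the sequence is increasing; the cap at n handles unbounded f.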


    Chapter 3

    Integral and expectation

    3.1 The integral of simple functions

Let (Ω, F, μ) be a measure space. We start with defining the μ-integral of a measurable simple function.

3.1 Definition. Let f = Σ_{i=1}^n a_i 1_{F_i} be a nonnegative simple F-measurable function with its canonical representation. Then

    ∫ f dμ := Σ_{i=1}^n a_i μ(F_i)

is called the μ-integral of f.

We had to restrict the preceding definition to nonnegative functions since we admit the case μ(F) = ∞. If we were dealing with a finite measure the definition would work for all F-measurable simple functions.

3.2 Example. Let (Ω, F, P) be a probability space and let X = Σ_{i=1}^n a_i 1_{F_i} be a simple random variable. Then we have E(X) = ∫ X dP.

    3.3 Problem. What is the integral with respect to a linear combination of point

    measures ? Which functions can be integrated ?

    3.4 Problem. Give a geometric interpretation of the integral of a step function with

    respect to a Borel measure.
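For the point-measure case of problem 3.3 the answer is that for μ = Σ_j a_j δ_{x_j} the integral reduces to the weighted sum ∫ f dμ = Σ_j a_j f(x_j), and the integrable functions are those for which this sum is finite. A Python sketch of this reduction (the helper name is made up):

```python
def integral_discrete(f, weights):
    """Integral of f with respect to mu = sum_j a_j * delta_{x_j}:
    the integral is the weighted sum of point evaluations."""
    return sum(a * f(x) for x, a in weights.items())

# mu = delta_0 + 2*delta_1 + 0.5*delta_{-1}
mu_weights = {0: 1.0, 1: 2.0, -1: 0.5}

print(integral_discrete(lambda x: x * x, mu_weights))  # 1*0 + 2*1 + 0.5*1 = 2.5
print(integral_discrete(lambda x: 1.0, mu_weights))    # total mass = 3.5
```

Integrating the constant 1 recovers the total mass of μ, in line with property (1) of Theorem 3.5 below.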

3.5 Theorem. The μ-integral on S(F)⁺ has the following properties:
(1) ∫ 1_F dμ = μ(F),
(2) ∫ (sf + tg) dμ = s ∫ f dμ + t ∫ g dμ if s, t ∈ R⁺ and f, g ∈ S(F)⁺,
(3) ∫ f dμ ≤ ∫ g dμ if f ≤ g and f, g ∈ S(F)⁺.

Proof: The only nontrivial part is to prove that ∫ (f + g) dμ = ∫ f dμ + ∫ g dμ. □


3.6 Problem. Show that ∫ (f + g) dμ = ∫ f dμ + ∫ g dμ for f, g ∈ S(F)⁺.
Hint: Try to find the canonical representation of f + g in terms of the canonical representations of f and g.

It follows that the defining formula of the μ-integral can be applied to any (nonnegative) linear combination of indicators, not only to canonical representations !

    3.2 The extension process

We know that every nonnegative measurable function f ∈ L(F)⁺ is the limit of an increasing sequence (f_n) ⊆ S(F)⁺ of measurable simple functions: f_n ↑ f. It is a natural idea to think of the integral of f as something like

    ∫ f dμ := lim_n ∫ f_n dμ    (1)

This is actually the way we will proceed. But there are some points to worry about.

First of all, we should ask whether the limit on the right hand side exists. This is always the case. Indeed, the integrals ∫ f_n dμ form an increasing sequence in [0, ∞]. This sequence either has a finite limit or it increases to ∞. Both cases are covered by our definition.

The second and far more subtle question is whether the definition is compatible with the definition of the integral on S(F). This is the only nontrivial part of the extension process of the integral and it is the point where σ-additivity of μ comes in. This is proved in Theorem 3.51.

The third question is whether the value of the limit is independent of the approximating sequence. This is also the case and is proved in Theorem 3.52.

Thus, (1) is a valid definition of the integral of f ∈ L(F)⁺.

3.7 Definition. Let (Ω, F, μ) be a measure space. The μ-integral of a function f ∈ L(F)⁺ is defined by equation (1) where (f_n) is any increasing sequence (f_n) ⊆ S(F)⁺ of measurable simple functions such that f_n ↑ f.

It is now straightforward that the basic properties of the integral of simple functions stated in Theorem 3.5 carry over to L(F)⁺.

3.8 Theorem. The μ-integral on L(F)⁺ has the following properties:
(1) ∫ 1_F dμ = μ(F),
(2) ∫ (sf + tg) dμ = s ∫ f dμ + t ∫ g dμ if s, t ∈ R⁺ and f, g ∈ L(F)⁺,
(3) ∫ f dμ ≤ ∫ g dμ if f ≤ g and f, g ∈ L(F)⁺.

    The following problems establish some easy properties of the integral developed

    so far.


3.9 Problem. Let f ∈ L(F)⁺. Prove Markov's inequality:

    μ(f > a) ≤ (1/a) ∫ f dμ,  a > 0.

3.10 Problem. Let f ∈ L(F)⁺. Show that ∫ f dμ = 0 implies μ(f > 0) = 0.
Hint: Show that μ(f > 1/n) = 0 for every n ∈ N.

An assertion A about a measurable function f is said to hold μ-almost everywhere (μ-a.e.) if μ(Aᶜ) = 0. Using this terminology the assertion of the preceding exercise can be phrased as:

    ∫ f dμ = 0, f ≥ 0  ⇒  f = 0 μ-a.e.

If we are talking about probability measures and random variables the phrase almost everywhere is sometimes replaced by almost surely.

3.11 Problem. Let f ∈ L(F)⁺. Show that ∫ f dμ < ∞ implies μ(f > a) < ∞ for every a > 0.

Now the integral is defined for every nonnegative measurable function. The value of the integral may be ∞. In order to define the integral for measurable functions which may take both positive and negative values we have to exclude infinite integrals.

3.12 Definition. A measurable function f is μ-integrable if ∫ f⁺ dμ < ∞ and ∫ f⁻ dμ < ∞. If f is μ-integrable then

    ∫ f dμ := ∫ f⁺ dμ − ∫ f⁻ dμ.

The set of all μ-integrable functions is denoted by L¹(μ) = L¹(Ω, F, μ).

Proving the basic properties of the integral of integrable functions is an easy matter. We collect these facts in a couple of problems.

3.13 Problem. Show that f ∈ L(F) is μ-integrable iff ∫ |f| dμ < ∞.

3.14 Problem. The set L¹(μ) is a linear space and the μ-integral is a linear functional on L¹(μ).

3.15 Problem. The μ-integral is an isotonic functional on L¹(μ).

3.16 Problem. Let f ∈ L¹(μ). Show that |∫ f dμ| ≤ ∫ |f| dμ.

3.17 Problem. Let f be a measurable function and assume that there is an integrable function g such that |f| ≤ g (say: f is dominated). Then f is integrable.


    3.18 Problem.

    (a) Discuss the question whether bounded measurable functions are integrable.

    (b) Characterize those measurable simple functions which are integrable.

Many assertions in measure theory concerning measurable functions are stable under linear combinations and under convergence. Assertions of such a type need only be proved for indicators. The procedure of proving (understanding) an assertion for indicators and extending it to nonnegative and to integrable functions is called measure theoretic induction.

    3.19 Problem. Show that integrals are linear with respect to the integrating measure.

Let us finish this section with some notational remarks. For convenience we denote

    ∫_A f dμ := ∫ 1_A f dμ,  A ∈ F.

3.20 Problem.
(a) Let f be an integrable function. Then ∫_A f dμ = 0 for all A ∈ F implies f = 0 μ-a.e.
(b) Let f and g be integrable functions. Then ∫_A f dμ = ∫_A g dμ for all A ∈ F implies f = g μ-a.e.

If (Ω, F, P) is a probability space and X ≥ 0 is a random variable then

    E(X) := ∫ X dP

is called the expectation of X. Thus, expectations are integrals of random variables w.r.t. the underlying probability measures.

3.21 Problem. Let X be a P-integrable random variable. Prove Chebyshev's inequality.

Suppose we are dealing with Borel measure spaces (R, B, μ_Δ) where the measure is defined by some increasing right-continuous function Δ. Then we write

    ∫ f dΔ := ∫ f dμ_Δ = ∫ f(x) dΔ(x).

A special case is the Lebesgue integral ∫ f dλ = ∫ f(x) dx. Moreover, integral limits are defined by

    ∫_a^b f dΔ := ∫_(a,b] f dΔ.

Note that the lower integral limit is not included, but the upper limit is included !

3.22 Problem. What is the difference between ∫_a^b f dΔ and ∫_(a,b) f dΔ ?


    3.3 Convergence of integrals

One of the reasons for the great success of abstract integration theory is its convergence theorems for integrals. The problem is the following. Assume that (f_n) is a sequence of functions converging to some function f. When can we conclude that

    lim_n ∫ f_n dμ = ∫ f dμ ?

There are (at least) three basic assertions of this kind which could be viewed as the three basic principles of integral convergence. We will present these principles together with typical applications.

    The theorem of monotone convergence

    The first principle says that for increasing sequences of nonnegative functions the limit

    and the integral may be interchanged.

3.23 Theorem. (Theorem of Beppo Levi)
Let (f_n) ⊆ L(F)⁺. Then

    f_n ↑ f  ⇒  lim_n ∫ f_n dμ = ∫ f dμ.

The theorem is proved in section 3.5. Note that there is no assumption on integrability. If the sequence is decreasing instead of increasing the corresponding assertion is only valid if the sequence is integrable.

3.24 Problem.
(a) Let (f_n) ⊆ L¹(F)⁺. Then f_n ↓ f ⇒ lim_n ∫ f_n dμ = ∫ f dμ.
(b) Show by example that the integrability assumption cannot be omitted without compensation.

    The first application looks harmless.

3.25 Problem.
(a) Let f be a measurable function such that f = 0 μ-a.e. Then f is integrable and ∫ f dμ = 0.
Hint: Consider f⁺ and f⁻ separately.
(b) Let f and g be measurable functions such that f = g μ-a.e. Then f is integrable iff g is integrable.

Our next application is the starting point of a couple of problems which are concerned with advanced calculus. They serve as a warm-up for stochastic calculus which will be the subject of part III of this text.

Let Δ : [a, b] → R be increasing and right-continuous. For any bounded measurable function f : [a, b] → R let

    (f • Δ)(t) := ∫_a^t f dΔ.


3.26 Problem.
(a) Show that f • Δ is right-continuous with left limits.
(b) Show that (f • Δ)(t) − (f • Δ)(t−) = f(t)(Δ(t) − Δ(t−)).

    The infinite series theorem

The second principle says that for nonnegative measurable functions integrals and infinite sums may be interchanged. It is an easy consequence of the monotone convergence theorem (see section 3.5).

3.27 Theorem. For every sequence (f_n) of nonnegative measurable functions we have

    ∫ Σ_{n=1}^∞ f_n dμ = Σ_{n=1}^∞ ∫ f_n dμ.

3.28 Problem. Let (Ω, F, μ) be a measure space and f ≥ 0 a measurable function. Show that ν : A ↦ ∫_A f dμ is a measure.

3.29 Problem. Let (a_{mn}) be a double sequence of nonnegative numbers. Show that Σ_m Σ_n a_{mn} = Σ_n Σ_m a_{mn}.
Hint: Define f_n(x) := a_{mn} if x ∈ (m − 1, m].

3.30 Problem.
(a) Let Ω = N and F = 2^N. Show that for every sequence a_n ≥ 0 there is a uniquely determined measure μ|F such that μ({n}) = a_n.
(b) Find ∫ f dμ for f ≥ 0.
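For the measure of problem 3.30 one finds ∫ f dμ = Σ_n f(n) a_n, an instance of the infinite series theorem applied to f = Σ_n f(n) 1_{{n}}. A truncated numerical sketch, assuming the particular weights a_n = 2⁻ⁿ (so that Σ_{n≥1} n·2⁻ⁿ = 2):

```python
def integral_on_N(f, a, n_max=200):
    """Integral of f >= 0 with respect to the measure on N with
    mu({n}) = a(n): a truncated version of sum_n f(n) * a(n)."""
    return sum(f(n) * a(n) for n in range(1, n_max + 1))

a = lambda n: 2.0 ** (-n)      # mu({n}) = 2^(-n), a probability measure on N
f = lambda n: n

print(integral_on_N(f, a))     # sum_{n>=1} n / 2^n = 2 (up to truncation)
```

The truncation at n_max is harmless here because the tail of the series is geometrically small; for general a_n the full series has to converge for f to be integrable.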

3.31 Problem. Let f ≥ 0 be a measurable function and Δ right-continuous and increasing. If f = 0 except at countably many points (x_i)_{i∈N} then

    ∫ f dΔ = Σ_{i=1}^∞ f(x_i)(Δ(x_i) − Δ(x_i−)).

3.32 Problem. Let f and Δ be increasing right-continuous functions. Show that

    ∫_a^b f(t) dΔ(t) = …


    The dominated convergence theorem

The most popular result concerning this issue is Lebesgue's theorem on dominated convergence. Find the proof in section 3.5.

3.33 Theorem. (Dominated convergence theorem)
Let (f_n) be a sequence of measurable functions which is dominated by an integrable function g, i.e. |f_n| ≤ g, n ∈ N. If f_n → f μ-a.e. then f ∈ L¹(μ) and lim_n ∫ f_n dμ = ∫ f dμ.

3.34 Problem. Show that under the assumptions of the dominated convergence theorem we even have

    lim_n ∫ |f_n − f| dμ = 0.

(This type of convergence is called mean convergence.)

    3.35 Problem. Discuss the question whether a uniformly bounded sequence of

    measurable functions is dominated in the sense of the dominated convergence theorem.

There are plenty of applications of the dominated convergence theorem. Let us present those consequences which show the superiority of general measure theory compared with previous approaches to integration.

Recall the notion of a Riemannian sequence of subdivisions of an interval [a, b].

3.36 Problem. Let f : [a, b] → R be a regulated function and let Δ be increasing and right-continuous. Show that for every Riemannian sequence of subdivisions of [a, b]

(a) lim_n Σ_{i=1}^{k_n} f(t_{i−1})(Δ(t_i) − Δ(t_{i−1})) = ∫_a^b f₋ dΔ,

(b) lim_n Σ_{i=1}^{k_n} f(t_i)(Δ(t_i) − Δ(t_{i−1})) = ∫_a^b f₊ dΔ,

where f₋(t) := f(t−) and f₊(t) := f(t+).

    The preceding convergence statements for Riemannian sums are the key for impor-tant mathematical theorems.
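Such Riemannian sums are easy to evaluate numerically. The sketch below (an illustration with a made-up integrator, not taken from the notes) uses Δ(t) = t + 1_{[0.5,∞)}(t) on [0, 1], for which ∫_0^1 f dΔ = ∫_0^1 f(t) dt + f(0.5) whenever f is continuous:

```python
def left_riemann_stieltjes(f, Delta, a, b, n):
    """Left-endpoint Riemannian sum sum_i f(t_{i-1}) (Delta(t_i) - Delta(t_{i-1}))
    over a uniform subdivision of [a, b]."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(ts[i - 1]) * (Delta(ts[i]) - Delta(ts[i - 1]))
               for i in range(1, n + 1))

# increasing right-continuous integrator with a jump of size 1 at t = 0.5
Delta = lambda t: t + (1.0 if t >= 0.5 else 0.0)
f = lambda t: t * t

# exact value: int_0^1 t^2 dt + f(0.5) = 1/3 + 0.25
approx = left_riemann_stieltjes(f, Delta, 0.0, 1.0, 100_000)
print(approx)
```

The sum picks up the Lebesgue part and the jump part of Δ simultaneously; for continuous f the left- and right-endpoint sums have the same limit.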

3.37 Problem. Let f and g be increasing right-continuous functions. Show the following versions of the integration by parts formula (a < b):

(a) f(b)g(b) = f(a)g(a) + ∫_a^b f(t−) dg(t) + ∫_a^b g(t) df(t),

(b) f(b)g(b) = f(a)g(a) + ∫_a^b f(t−) dg(t) + ∫_a^b g(t−) df(t) + Σ_{a<t≤b} (f(t) − f(t−))(g(t) − g(t−)).


3.38 Problem. Let f be increasing and right-continuous. Show that for every Riemannian sequence of subdivisions of [a, b]

    lim_n Σ_{i=1}^{k_n} (f(t_i) − f(t_{i−1}))² = Σ_{a<t≤b} (f(t) − f(t−))².
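The limit in 3.38 — the sum of the squared jumps — is visible numerically: on a fine subdivision the continuous part of an increasing function contributes only O(1/n) to the sum of squared increments. A sketch with a made-up example function:

```python
def sum_squared_increments(f, a, b, n):
    """sum_i (f(t_i) - f(t_{i-1}))^2 over a uniform subdivision of [a, b]."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum((f(ts[i]) - f(ts[i - 1])) ** 2 for i in range(1, n + 1))

# f(t) = t plus a jump of size 1 at t = 0.5
f = lambda t: t + (1.0 if t >= 0.5 else 0.0)

q = sum_squared_increments(f, 0.0, 1.0, 100_000)
print(q)   # close to the squared jump, 1; the contribution of t -> t vanishes
```

This is the elementary prototype of the quadratic variation computations that become essential for stochastic processes, where the continuous part no longer vanishes.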


3.4 Stieltjes integration

3.41 Problem. Explain why f • g is of bounded variation.

The Stieltjes integral has many properties which can be used for calculation purposes. Moreover, the Stieltjes integral is a special case of the general stochastic integral which is an indispensable tool in the theory of stochastic processes and their applications.

If h : [a, b] → R is bounded measurable then the integral ∫_a^b f d(h • g) is well-defined. How can we express this integral in terms of an integral with respect to g ?

3.42 Theorem. (Associativity rule)
Let f and h be bounded measurable functions. Then

    ∫_a^b f d(h • g) = ∫_a^b f h dg.

Proof: The assertion is obvious for f = 1_(a,t], a ≤ t ≤ b. The general case follows by measure theoretic induction. □

Since for rules of this kind the function f is only a dummy function it is convenient to state the rule in a more compact way as

    d(h • g) = h dg,

which is called differential notation. It should be kept in mind that such formulas always have to be interpreted as assertions about integrals.

3.43 Problem. Let g be differentiable with continuous derivative g′. Show that dg = g′ dt.

If g has jumps then we have (h • g)(t) − (h • g)(t−) = h(t)(g(t) − g(t−)). This follows by the same argument as for problem 3.24. Hence, h • g is continuous whenever g is continuous.

Note that the assertions on the convergence of Riemannian sums, shown in problem 3.36, are also true for right-continuous functions g of bounded variation. Hence, we have the integration by parts formula.

3.44 Theorem. (Integration by parts)
Let both f and g be right-continuous and of bounded variation. Then

    f(b)g(b) = f(a)g(a) + ∫_a^b f(t−) dg(t) + ∫_a^b g(t−) df(t) + Σ_{a<t≤b} (f(t) − f(t−))(g(t) − g(t−)).


3.46 Problem. What is the quadratic variation of a function of bounded variation ? Distinguish between the continuous and the non-continuous case.

There is a third calculation rule for Stieltjes integrals called the transformation formula. Since this rule is the classical prototype of the famous Itô formula of stochastic integration let us try to explain it very carefully.

3.47 Theorem. (Transformation formula)
Let φ : R → R be a continuous function with a continuous derivative. Let f : [a, b] → R be a function of bounded variation.

(1) If f is continuous then

    φ(f(b)) = φ(f(a)) + ∫_a^b φ′(f) df.

(2) If f is right-continuous then

    φ(f(b)) = φ(f(a)) + ∫_a^b φ′(f(t−)) df(t) + Σ_{a<t≤b} (φ(f(t)) − φ(f(t−)) − φ′(f(t−))(f(t) − f(t−))).


    3.5 Proofs of the main theorems

3.51 Theorem. Let f ∈ S(F)⁺ and (f_n) ⊆ S(F)⁺. Then

    f_n ↑ f  ⇒  lim_n ∫ f_n dμ = ∫ f dμ.

Proof: Note that lim_n ∫ f_n dμ ≤ ∫ f dμ is clear. For an arbitrary ε > 0 let B_n := (f ≤ f_n(1 + ε)). It is clear that

    ∫ 1_{B_n} f dμ ≤ ∫ 1_{B_n} f_n (1 + ε) dμ ≤ (1 + ε) ∫ f_n dμ.

From B_n ↑ Ω it follows that A ∩ B_n ↑ A and μ(A ∩ B_n) → μ(A) by σ-additivity. Writing f = Σ_{j=1}^m α_j 1_{A_j} in canonical representation we get

    ∫ f dμ = Σ_{j=1}^m α_j μ(A_j) = lim_n Σ_{j=1}^m α_j μ(A_j ∩ B_n) = lim_n ∫ 1_{B_n} f dμ,

which implies

    ∫ f dμ ≤ (1 + ε) lim_n ∫ f_n dμ.

Since ε is arbitrarily small the assertion follows. □

3.52 Theorem. Let (f_n) and (g_n) be increasing sequences of nonnegative measurable simple functions. Then

    lim_n f_n = lim_n g_n  ⇒  lim_n ∫ f_n dμ = lim_n ∫ g_n dμ.

Proof: It is sufficient to prove the assertion with ≤ replacing =. Since lim_k (f_n ∧ g_k) = f_n ∧ lim_k g_k = f_n we obtain by 3.51

    ∫ f_n dμ = lim_k ∫ f_n ∧ g_k dμ ≤ lim_k ∫ g_k dμ.  □

3.53 Theorem. (Theorem of Beppo Levi)
Let f ∈ L(F)⁺ and (f_n) ⊆ L(F)⁺. Then

    f_n ↑ f  ⇒  lim_n ∫ f_n dμ = ∫ f dμ.

Proof: We have to show ∫ f dμ ≤ lim_n ∫ f_n dμ.


For every n ∈ N let (f_{nk})_{k∈N} be an increasing sequence in S(F)⁺ such that lim_k f_{nk} = f_n. Define

    g_k := f_{1k} ∨ f_{2k} ∨ ... ∨ f_{kk}.

Then

    f_{nk} ≤ g_k ≤ f_k ≤ f  whenever n ≤ k.

It follows that g_k ↑ f and

    ∫ f dμ = lim_k ∫ g_k dμ ≤ lim_k ∫ f_k dμ.  □

3.54 Problem. Prove Fatou's lemma: For every sequence (f_n) of nonnegative measurable functions

    ∫ lim inf_n f_n dμ ≤ lim inf_n ∫ f_n dμ.

Hint: Recall that lim inf_n x_n = lim_k inf_{n≥k} x_n. Consider g_k := inf_{n≥k} f_n and apply Levi's theorem to (g_k).

3.55 Theorem. (Dominated convergence theorem)
Let (f_n) be a sequence of measurable functions which is dominated by an integrable function g, i.e. |f_n| ≤ g, n ∈ N. Then

    f_n → f μ-a.e.  ⇒  f ∈ L¹(μ) and lim_n ∫ f_n dμ = ∫ f dμ.

Now it is easy to prove several important facts concerning the integral. We state these as problems.

Proof: Integrability of f is obvious since f is dominated by g, too. Moreover, the sequences g − f_n and g + f_n consist of nonnegative measurable functions. Therefore we may apply Fatou's lemma:

    ∫ (g − f) dμ ≤ lim inf_n ∫ (g − f_n) dμ = ∫ g dμ − lim sup_n ∫ f_n dμ

and

    ∫ (g + f) dμ ≤ lim inf_n ∫ (g + f_n) dμ = ∫ g dμ + lim inf_n ∫ f_n dμ.

This implies

    ∫ f dμ ≤ lim inf_n ∫ f_n dμ ≤ lim sup_n ∫ f_n dμ ≤ ∫ f dμ.  □


    Chapter 4

    Selected topics

    4.1 Image measures and distributions

Let (Ω, A, μ) be a measure space and let (Y, B) be a measurable space. Moreover, let f : Ω → Y be a function. We are going to consider the problem of mapping the measure μ to the set Y by means of the function f.

The concept of the distribution of a random variable is an important special case of mapping a measure from one set to another (confer definition 1.39).

4.1 Definition. Let f : (Ω, A, μ) → (Y, B) be (A, B)-measurable. Then

    μ^f(B) := μ(f ∈ B) = μ(f⁻¹(B)),  B ∈ B,

is the image of μ under f or the distribution of f under μ.

4.2 Problem. Show that μ^f is indeed a measure on B.

4.3 Problem. Let (Ω, F, μ) be a measure space and let f = 1_A where A ⊆ Ω. Find μ^f.

4.4 Problem. Let (Ω, F, μ) be a measure space and let f : Ω → R be a simple function. Find μ^f.

4.5 Problem. Let (Ω, F, P) be a probability space and let X be a random variable with distribution function F. Show that P^X = μ_F.

    An important point is how integrals behave under measure mappings.

4.6 Theorem. (Transformation formula)
Let (Ω, F, μ) be a measure space and let g ∈ L(F). Then for every f ∈ L⁺(B)

    ∫ f ∘ g dμ = ∫ f dμ^g.


4.7 Problem. Prove 4.6 by measure theoretic induction.

4.8 Problem. Let (Ω, F, μ) be a measure space and let g ∈ L(F). Show that f ∘ g is μ-integrable iff f is μ^g-integrable. In case of integrability the transformation formula holds.

4.9 Problem. Let (Ω, F, P) be a probability space and X a random variable with distribution function F. Explain the formula

    E(f ∘ X) = ∫ f dF.
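On a finite sample space the transformation formula of 4.6 is just a regrouping of a finite sum. The sketch below (with toy data) builds the image measure fibre by fibre and compares ∫ f ∘ g dμ with ∫ f dμ^g:

```python
from collections import defaultdict

def image_measure(mu, g):
    """Image of a discrete measure mu (dict: point -> mass) under g:
    collect the mass of each fibre g^{-1}({y})."""
    mu_g = defaultdict(float)
    for omega, mass in mu.items():
        mu_g[g(omega)] += mass
    return dict(mu_g)

mu = {"w1": 0.2, "w2": 0.3, "w3": 0.5}       # measure on a 3-point space
g = lambda w: 0 if w == "w1" else 1          # maps w2 and w3 to the same point
f = lambda y: y + 1.0

mu_g = image_measure(mu, g)                  # {0: 0.2, 1: 0.8}

lhs = sum(f(g(w)) * m for w, m in mu.items())    # int f(g) dmu
rhs = sum(f(y) * m for y, m in mu_g.items())     # int f d(mu^g)
print(lhs, rhs)                                  # the two integrals agree
```

The general proof by measure theoretic induction (problem 4.7) extends exactly this regrouping from indicators to all nonnegative measurable f.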

    4.2 Measures with densities

Let (Ω, F, μ) be a measure space and let f ∈ L⁺(F).

4.10 Problem. Show that ν : A ↦ ∫_A f dμ, A ∈ F, is a measure.

We would like to say that f is the density of ν with respect to μ but for doing so we have to be sure that f is uniquely determined by ν. But this is not true, in general.

4.11 Problem. Show that the density is uniquely determined if the measure μ is finite.

4.12 Example. Let μ|B be a measure such that all countable sets B ∈ B have measure zero and all uncountable sets have measure μ(B) = ∞. A moment's reflection shows that this is actually a measure. Now for every positive constant function f ≡ c > 0 we have

    ∫_B f dμ = μ(B),  B ∈ B.

In the light of the preceding example we see that we have to exclude unreasonable measures in order to obtain uniqueness of densities. The following lemma shows the direction we have to go.

4.13 Lemma. Let f, g ∈ L⁺(F). Then

    ∫_A f dμ = ∫_A g dμ  ∀ A ∈ F  ⇒  μ((f ≠ g) ∩ A) = 0 whenever μ(A) < ∞.

In other words: f = g μ-a.e. on every set of finite μ-measure.


Proof: Let μ(M) < ∞ and define M_n := M ∩ (f ≤ n) ∩ (g ≤ n). Since f 1_{M_n} and g 1_{M_n} are μ-integrable it follows that f 1_{M_n} = g 1_{M_n} μ-a.e. For n → ∞ we have M_n ↑ M which implies f 1_M = g 1_M μ-a.e. □

    Since densities are uniquely determined on sets of finite measure we have unique-

    ness of densities for finite measures and also for measures which can be decomposedinto finite measures.

4.14 Definition. A measure $\mu|\mathcal{F}$ is called $\sigma$-finite if there is a sequence of sets $(F_n)_{n \in \mathbb{N}} \subseteq \mathcal{F}$ with $F_n \uparrow \Omega$ such that $\mu(F_n) < \infty$ for all $n \in \mathbb{N}$.

Note that Borel measures are $\sigma$-finite. For $\sigma$-finite measures densities are uniquely determined.

4.15 Lemma. If $\mu$ is finite or $\sigma$-finite, then
$$\int_A f\, d\mu = \int_A g\, d\mu \quad \forall A \in \mathcal{F} \;\Longleftrightarrow\; f = g\ \mu\text{-a.e.}$$

4.16 Definition. Let $\mu$ be $\sigma$-finite and define a measure $\nu = f\mu$ by
$$\nu: A \mapsto \int_A f\, d\mu, \quad A \in \mathcal{F}.$$
Then $f =: \dfrac{d\nu}{d\mu}$ is called the density or the Radon-Nikodym derivative of $\nu$ with respect to $\mu$.

A density w.r.t. the Lebesgue measure is called a Lebesgue density.

4.17 Problem. Let $\alpha: \mathbb{R} \to \mathbb{R}$ be an increasing function which is supposed to be differentiable on $\mathbb{R}$. Show that $\mu_\alpha = \alpha'\lambda$.

4.18 Problem. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X$ a random variable with differentiable distribution function $F$. Explain the formulas
$$P(X \in B) = \int_B F'(t)\, dt \quad \text{and} \quad E(g \circ X) = \int g(t) F'(t)\, dt$$
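A numeric sanity check of the first formula (my illustration, assuming $X \sim \mathrm{Exp}(1)$ with $F'(t) = e^{-t}$):

```python
import math

# Sketch: P(X ∈ [a,b]) = ∫_a^b F'(t) dt with F'(t) = e^{-t} for X ~ Exp(1).
a, b, n = 0.5, 2.0, 100_000
h = (b - a) / n
integral = sum(math.exp(-(a + (i + 0.5) * h)) * h for i in range(n))  # midpoint rule
exact = math.exp(-a) - math.exp(-b)   # closed form of the same probability
print(round(integral, 6), round(exact, 6))
```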

Which measures have densities w.r.t. other measures?

4.19 Problem. Let $\nu = f\mu$. Show that $\mu(A) = 0$ implies $\nu(A) = 0$, $A \in \mathcal{F}$.


4.20 Definition. Let $\mu|\mathcal{F}$ and $\nu|\mathcal{F}$ be measures. The measure $\nu$ is said to be absolutely continuous w.r.t. the measure $\mu$ ($\nu \ll \mu$) if
$$\mu(A) = 0 \;\Longrightarrow\; \nu(A) = 0, \quad A \in \mathcal{F}.$$

We saw that absolute continuity is necessary for having a density. It is even sufficient.

4.21 Theorem. (Radon-Nikodym theorem)
Assume that $\mu$ is $\sigma$-finite. Then $\nu \ll \mu$ iff $\nu = f\mu$ for some $f \in L^+(\mathcal{F})$.
Proof: See Bauer, [2]. $\Box$

4.22 Problem. Let $P$ and $Q$ be probability measures on a finite field $\mathcal{F}$.
(1) State $Q \ll P$ in terms of the generating partition of $\mathcal{F}$.
(2) If $Q \ll P$, find $dQ/dP$.

An important question is how $\nu$-integrals can be transformed into $\mu$-integrals.

4.23 Problem. Let $\nu = f\mu$. Discuss the validity of
$$\int g\, d\nu = \int g\, \frac{d\nu}{d\mu}\, d\mu$$
Hint: Prove it for $g \in S^+(\mathcal{F})$ and extend it by measure theoretic induction.

The following prepares for chapter 15.

4.24 Definition. The probability measures $P|\mathcal{F}$ and $Q|\mathcal{F}$ are said to be equivalent ($P \sim Q$) if they are mutually absolutely continuous, i.e.
$$P(F) = 0 \;\Longleftrightarrow\; Q(F) = 0 \quad \text{whenever } F \in \mathcal{F}.$$

Obviously, we have $P \sim Q$ iff $Q \ll P$ and $P \ll Q$. Therefore the Radon-Nikodym derivatives $\dfrac{dQ}{dP}$ and $\dfrac{dP}{dQ}$ exist. The following two problems contain general properties of Radon-Nikodym derivatives.

4.25 Problem. Let $P \sim Q$. Show that $\dfrac{dP}{dQ} = \left(\dfrac{dQ}{dP}\right)^{-1}$.
Hint: Show that for all $F \in \mathcal{F}$
$$\int_F \left( \frac{dP}{dQ}\, \frac{dQ}{dP} - 1 \right) dP = 0$$
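On a finite space the content of problem 4.25 is elementary and can be checked directly (weights chosen for illustration):

```python
from fractions import Fraction

# Equivalent P and Q on a three-point space (same null sets: none).
P = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
Q = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}

dQdP = {w: Q[w] / P[w] for w in P}
dPdQ = {w: P[w] / Q[w] for w in P}

# dP/dQ is the pointwise reciprocal of dQ/dP.
assert all(dPdQ[w] == 1 / dQdP[w] for w in P)
print(sorted(dQdP.values()))
```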


4.26 Problem. Let $Q \ll P$. Show that $P \sim Q$ iff $\dfrac{dQ}{dP} > 0$ $P$-a.s.
Hint: For proving $\Leftarrow$, show that $Q(F) = 0$ implies $1_F\, \dfrac{dQ}{dP} = 0$ $P$-a.s.

4.27 Problem. Let $Q \ll P$ and $(A_n) \subseteq \mathcal{F}$. Then $P(A_n) \to 0$ implies $Q(A_n) \to 0$.
Hint: Let $\varepsilon > 0$ and choose $M$ such that
$$\int_{\{\frac{dQ}{dP} > M\}} \frac{dQ}{dP}\, dP < \varepsilon.$$
(Why is this possible?) Let $B = \left\{ \dfrac{dQ}{dP} > M \right\}$ and split $A_n = (A_n \cap B) \cup (A_n \cap B^c)$.

4.3 Product measures and Fubini's theorem

Let $(\Omega_1, \mathcal{F})$ and $(\Omega_2, \mathcal{G})$ be measurable spaces. We want to discuss measure and integration on $\Omega_1 \times \Omega_2$.

To begin with we have to define a $\sigma$-field on $\Omega_1 \times \Omega_2$. This $\sigma$-field should be large enough to contain at least the rectangles (diagram) $F \times G$ where $F \in \mathcal{F}$ and $G \in \mathcal{G}$.

4.28 Definition. The $\sigma$-field on $\Omega_1 \times \Omega_2$ which is generated by the family of measurable rectangles
$$\mathcal{R} = \{F \times G : F \in \mathcal{F},\ G \in \mathcal{G}\}$$
is called the product of $\mathcal{F}$ and $\mathcal{G}$ and is denoted by $\mathcal{F} \otimes \mathcal{G}$.

A special case of a product $\sigma$-field is the Borel $\sigma$-field $\mathfrak{B}^2$.

Having established a $\sigma$-field we turn to measurable functions. Recall that any continuous function $f: \mathbb{R}^2 \to \mathbb{R}$ is $\mathfrak{B}^2$-measurable.

4.29 Problem.

(1) Let $f: \Omega_1 \to \mathbb{R}$ be $\mathcal{F}$-measurable. Show that $(x, y) \mapsto f(x)$ is $\mathcal{F} \otimes \mathcal{G}$-measurable.
(2) Let $f: \Omega_1 \to \mathbb{R}$ be $\mathcal{F}$-measurable, $g: \Omega_2 \to \mathbb{R}$ be $\mathcal{G}$-measurable, and let $\phi: \mathbb{R}^2 \to \mathbb{R}$ be continuous. Show that $(x, y) \mapsto \phi(f(x), g(y))$ is $\mathcal{F} \otimes \mathcal{G}$-measurable.

The preceding problem shows that functions of several variables which are set up as compositions of measurable functions of one variable are usually measurable with respect to the product $\sigma$-field (confer corollaries 2.12 and 2.13).

The next point is to talk about measures. Basically, there are measures on product spaces having a very complicated structure. But there is a special class of measures on product spaces which are constructed from measures on the components in a simple way.

The starting idea is the geometric content of rectangles in $\mathbb{R}^2$. If $I_1$ and $I_2$ are intervals, then the geometric content (area) of the rectangle $I_1 \times I_2$ is the product of the contents (lengths) of the constituting intervals. The extension of this idea to general measures leads to product measures.

4.30 Theorem. Let $(\Omega_1, \mathcal{F}, \mu)$ and $(\Omega_2, \mathcal{G}, \nu)$ be measure spaces. Then there exists a uniquely determined measure $\mu \otimes \nu\,|\,\mathcal{F} \otimes \mathcal{G}$ satisfying
$$(\mu \otimes \nu)(F \times G) = \mu(F)\nu(G), \quad F \times G \in \mathcal{R}.$$
The measure $\mu \otimes \nu$ is called the product measure of $\mu$ and $\nu$.
Proof: See Bauer, [2]. $\Box$

As a consequence it follows that there is a uniquely determined measure on $(\mathbb{R}^2, \mathfrak{B}^2)$ which measures rectangles by their geometric area. In terms of product measure this is $\lambda^2 = \lambda \otimes \lambda$, and is called the Lebesgue measure on $\mathbb{R}^2$.

Let us turn to integration. Integration for general measures on product spaces can be a rather delicate matter. Things are much simpler when we are dealing with product measures. The main point is that multiple integration (i.e. integration w.r.t. product measures) can be reduced to iterated integration (i.e. evaluating integrals over single components).

Let us proceed step by step.

The most simple case is the integration of the indicator of a rectangle. Let $F \times G \in \mathcal{R}$. Then we have
$$\int 1_{F \times G}\, d(\mu \otimes \nu) = (\mu \otimes \nu)(F \times G) = \mu(F)\nu(G) = \int 1_F\, d\mu \int 1_G\, d\nu$$
In general, a set $A \in \mathcal{F} \otimes \mathcal{G}$ need not be a rectangle. How can we extend the formula above to general sets? The answer is the section theorem (Cavalieri's principle).

For any set $A \subseteq \Omega_1 \times \Omega_2$ we call
$$A_y := \{x \in \Omega_1 : (x, y) \in A\}, \quad y \in \Omega_2,$$
the $y$-section of $A$ (diagram!). Similarly the $x$-section $A_x$, $x \in \Omega_1$, is defined. Note that for rectangles the sections are particularly simple.

    4.31 Problem. Find the sections of a rectangle.

The section theorem says that the volume of a set is the integral of the volumes of its sections.

4.32 Theorem. Let $A \in \mathcal{F} \otimes \mathcal{G}$. Then all sections of $A$ are measurable, i.e. $A_y \in \mathcal{F}$, $y \in \Omega_2$, and $y \mapsto \mu(A_y)$ is a $\mathcal{G}$-measurable function. Moreover, we have
$$(\mu \otimes \nu)(A) = \int \mu(A_y)\, \nu(dy)$$


Proof: The measurability parts of the section theorem are a matter of measure theoretic routine arguments. Much more interesting is the integral formula.

In order to understand the integral formula we write it as an iterated integral:
$$(\mu \otimes \nu)(A) = \int \left( \int 1_A(x, y)\, \mu(dx) \right) \nu(dy)$$
It is easy to see that the inner integral evaluates to $\mu(A_y)$. Why is this formula valid? First of all, it is valid for rectangles $A = F \times G \in \mathcal{R}$. This follows immediately from the definition of the product measure. Moreover, both sides of the equation define measures on the $\sigma$-field $\mathcal{F} \otimes \mathcal{G}$. Since these two measures are equal on rectangles, they necessarily are equal on the generated $\sigma$-field. $\Box$

    Let us illustrate how the section theorem works.

    4.33 Problem. Find the area of the unit circle by means of the section theorem.

Outline: Let $A$ be the unit circle with center at the origin. Then we have
$$\lambda^2(A) = \int \lambda(A_y)\, dy = 2 \int_{-1}^{1} \sqrt{1 - y^2}\, dy$$
Substitute $y = \sin t$ and apply $(\sin t \cos t)' = 2\cos^2 t - 1$.

4.34 Problem. Find the area of a right-angled triangle by means of the section theorem.
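The integral in problem 4.33 is easy to check numerically (my sketch; midpoint rule applied to the section lengths $\lambda(A_y) = 2\sqrt{1 - y^2}$):

```python
import math

# λ²(A) = ∫_{-1}^{1} 2*sqrt(1 - y^2) dy, approximated by the midpoint rule.
n = 200_000
h = 2.0 / n
area = sum(2.0 * math.sqrt(max(0.0, 1.0 - (-1.0 + (i + 0.5) * h) ** 2)) * h
           for i in range(n))
print(round(area, 4))  # close to π
```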

Our last topic in this section is to extend the section theorem to integrals. The resulting general assertion is Fubini's theorem.

4.35 Theorem. (Fubini's theorem) Let $f: \Omega_1 \times \Omega_2 \to \mathbb{R}$ be a nonnegative $\mathcal{F} \otimes \mathcal{G}$-measurable function. Then
$$x \mapsto f(x, y) \quad \text{and} \quad y \mapsto \int f(x, y)\, \mu(dx)$$
are measurable functions and
$$\int f\, d(\mu \otimes \nu) = \int \left( \int f(x, y)\, \mu(dx) \right) \nu(dy)$$
Proof: Fubini's theorem follows from the section theorem in a straightforward way by measure theoretic induction. $\Box$
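For counting measures on finite sets, Fubini's theorem reduces to interchanging finite sums; a trivial sketch (grid and integrand chosen for illustration):

```python
# Summing f over a finite grid in either order gives the same "double integral".
f = lambda x, y: (x + 1) * (y + 2)
grid_x, grid_y = range(5), range(7)

by_rows = sum(sum(f(x, y) for x in grid_x) for y in grid_y)  # inner: dμ, outer: dν
by_cols = sum(sum(f(x, y) for y in grid_y) for x in grid_x)  # order interchanged
assert by_rows == by_cols
print(by_rows)
```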

4.36 Problem. Find a version of Fubini's theorem for integrable functions.

    4.37 Problem. Explain when it is possible to interchange the order of integration for

    an iterated integral.

4.38 Problem. Deduce from Fubini's theorem assertions for interchanging the order of summation for double series of numbers.


    4.4 Spaces of integrable functions

We know that the space $\mathcal{L}^1 = \mathcal{L}^1(\Omega, \mathcal{F}, \mu)$ is a vector space. We would like to define a norm on $\mathcal{L}^1$. A natural idea is to define
$$\|f\|_1 := \int |f|\, d\mu, \quad f \in \mathcal{L}^1.$$
It is easy to see that this definition has the following properties:
(1) $\|f\|_1 \ge 0$, and $f = 0 \Rightarrow \|f\|_1 = 0$,
(2) $\|f + g\|_1 \le \|f\|_1 + \|g\|_1$, $f, g \in \mathcal{L}^1$,
(3) $\|\alpha f\|_1 = |\alpha|\, \|f\|_1$, $\alpha \in \mathbb{R}$, $f \in \mathcal{L}^1$.
However, we have
$$\|f\|_1 = 0 \;\Longleftrightarrow\; f = 0\ \mu\text{-a.e.}$$
A function with zero norm need not be identically zero! Therefore, $\|\cdot\|_1$ is not a norm on $\mathcal{L}^1$ but only a pseudo-norm.

In order to get a normed space one has to modify the space $\mathcal{L}^1$ in such a way that all functions with $f = g$ $\mu$-a.e. are considered as equal. Then every $f$ with $f = 0$ $\mu$-a.e. can be considered as the null element of the vector space. The space of integrable functions modified in this way is denoted by $L^1 = L^1(\Omega, \mathcal{F}, \mu)$.

4.39 Discussion. For those readers who like to have hard facts instead of soft wellness we provide some details.

For any $f \in L(\mathcal{F})$ let
$$\bar{f} = \{g \in L(\mathcal{F}) : f = g\ \mu\text{-a.e.}\}$$
denote the equivalence class of $f$. Then integrability is a class property and the space
$$L^1 := \{\bar{f} : f \in \mathcal{L}^1\}$$
is a vector space. The value of the integral depends only on the class and therefore defines a linear function on $L^1$ having the usual properties. In particular, $\|\bar{f}\|_1 := \|f\|_1$ defines a norm on $L^1$.

It is common practice to work with $L^1$ instead of $\mathcal{L}^1$, but to write $f$ instead of $\bar{f}$. This is a typical example of what mathematicians call abuse of language.

4.40 Theorem. The space $L^1(\Omega, \mathcal{F}, \mu)$ is a Banach space.
Proof: Let $(f_n)$ be a Cauchy sequence in $L^1$, i.e. for every $\varepsilon > 0$ there is $N(\varepsilon)$ such that
$$\int |f_n - f_m|\, d\mu < \varepsilon \quad \text{whenever } n, m \ge N(\varepsilon).$$
Let $n_i := N(1/2^i)$. Then $\int |f_{n_{i+1}} - f_{n_i}|\, d\mu < 1/2^i$, and one shows that $(f_{n_i})$ converges $\mu$-a.e. and in $L^1$ to some $f$, which is then the $L^1$-limit of the whole sequence $(f_n)$. $\Box$

4.41 Theorem. Let $\mathcal{R}$ be a field which generates $\mathcal{F}$. Then the set of $\mathcal{R}$-measurable simple functions is dense in $L^1(\Omega, \mathcal{F}, P)$.
Proof: Let $\varepsilon > 0$. First we note that for every $f \in L^1(\Omega, \mathcal{F}, P)$ there exists an $\mathcal{F}$-measurable simple function $g$ such that $\|f - g\|_1 < \varepsilon$. This can easily be shown for the positive and the negative parts separately. Second, we have to show that for every $\mathcal{F}$-measurable simple function $g$ there exists an $\mathcal{R}$-measurable simple function $h$ such that $\|g - h\|_1 < \varepsilon$. This follows from the measure extension theorem. We do not go into details but refer to Bauer, [2]. $\Box$

Let
$$\mathcal{L}^2 = \mathcal{L}^2(\Omega, \mathcal{F}, \mu) = \Big\{ f \in L(\mathcal{F}) : \int f^2\, d\mu < \infty \Big\}$$


    This is another important space of integrable functions.

4.42 Problem.
(a) Show that $\mathcal{L}^2$ is a vector space.
(b) Show that $\int f^2\, d\mu < \infty$ is a property of the $\mu$-equivalence class of $f \in L(\mathcal{F})$.

By $L^2 = L^2(\Omega, \mathcal{F}, \mu)$ we again denote the corresponding space of equivalence classes. On this space there is an inner product
$$\langle f, g \rangle := \int f g\, d\mu, \quad f, g \in L^2.$$
The corresponding norm is
$$\|f\|_2 = \sqrt{\langle f, f \rangle} = \Big( \int f^2\, d\mu \Big)^{1/2}$$

The following facts can be proved in a way similar to the $L^1$-case.

4.43 Theorem. The space $L^2(\Omega, \mathcal{F}, \mu)$ is a Hilbert space.

4.44 Theorem. Let $\mathcal{R}$ be a field which generates $\mathcal{F}$. Then the set of $\mathcal{R}$-measurable simple functions is dense in $L^2(\Omega, \mathcal{F}, P)$.

    4.5 Fourier transforms

    In order to represent and treat measures and probability measures in a mathematically

    convenient way measure transforms play a predominant role. The most simple measure

    transform is the moment generating function.

4.45 Definition. Let $\mu|\mathfrak{B}$ be a finite measure. Then the function
$$m(t) = \int e^{tx}\, \mu(dx), \quad t \in \mathbb{R},$$
is called the Laplace transform or moment generating function of $\mu$.

The moment generating function shares important useful properties with other measure transforms, but it has a serious drawback. The exponential function $x \mapsto e^{tx}$ is unbounded and therefore may fail to be integrable for some values of $t$ and measures $\mu$. The application of moment generating functions is only possible in such cases where the exponential function is integrable at least for all values of $t$ in an interval of positive length.
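For the standard Gaussian the moment generating function is finite for every $t$, namely $m(t) = e^{t^2/2}$; a numeric sketch (my illustration, midpoint rule):

```python
import math

# m(t) = ∫ e^{tx} φ(x) dx for the standard normal density φ; exact value: e^{t²/2}.
def mgf(t, n=200_000, L=10.0):
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        total += math.exp(t * x) * math.exp(-x * x / 2) / math.sqrt(2 * math.pi) * h
    return total

m1 = mgf(1.0)
print(round(m1, 4), round(math.exp(0.5), 4))
```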


This kind of complication vanishes if we replace the real-valued exponential function $x \mapsto e^{tx}$ by its complex version $x \mapsto e^{itx}$. The corresponding measure transform is called the Fourier transform.

    4.46 Discussion. Let us recall some basic facts on complex numbers.

The complex number field
$$\mathbb{C} = \{z = u + iv : u, v \in \mathbb{R}\}$$
is an extension of the real numbers $\mathbb{R}$ in which a number $i$ (the imaginary unit) is introduced satisfying $i^2 = i \cdot i = -1$. All other rules of calculation carry over from $\mathbb{R}$ to $\mathbb{C}$.

Complex numbers are not ordered but have an absolute value, defined by $|z| = \sqrt{u^2 + v^2}$ if $z = u + iv$. For every complex number $z \in \mathbb{C}$ there is a conjugate number $\bar{z} := u - iv$. The operation of conjugation satisfies $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$. Moreover, we have $z\bar{z} = |z|^2$.

Several functions defined on $\mathbb{R}$ can be extended to $\mathbb{C}$. For our purposes only the exponential function is of importance. It is defined by
$$e^{u + iv} := e^u(\cos(v) + i \sin(v)), \quad u, v \in \mathbb{R}.$$
This definition satisfies $e^{z_1 + z_2} = e^{z_1} e^{z_2}$, $z_1, z_2 \in \mathbb{C}$. For the notion of the Fourier transform it is important to note that $|e^{iv}| = 1$, $v \in \mathbb{R}$. This is a consequence of familiar properties of trigonometric functions.

Differentiation and integration of complex-valued functions of a real variable are simply performed for the real and the imaginary parts separately. Be sure to note that we are not dealing with functions of a complex variable! That would be a much more advanced topic called complex analysis.

4.47 Problem. Find the derivative of $x \mapsto e^{ax}$, $x \in \mathbb{R}$, where $a \in \mathbb{C}$.

4.48 Problem. Show that the basic derivation rules (summation rule, product rule and chain rule) are valid for complex-valued functions.

4.49 Problem. Let $f$ be a complex-valued measurable function (both the real and the imaginary part are measurable). Show that $|f|$ is $\mu$-integrable iff both the real and the imaginary part of $f$ are $\mu$-integrable.

4.50 Problem. Show that the $\mu$-integral of complex-valued functions on $\mathbb{R}$ is a linear functional.

4.51 Problem. Let $f$ be a complex-valued $\mu$-integrable function. Show that
$$\Big| \int f\, d\mu \Big| \le \int |f|\, d\mu.$$


The next problem shows that the usual integration calculus (substitution, integration by parts) carries over from real-valued functions to complex-valued functions.

    4.52 Problem. Show that indefinite integrals of complex-valued functions on R are

    primitives of their integrands.

4.53 Problem. Find $\int_c^d e^{ax}\, dx$, where $c, d \in \mathbb{R}$, $a \in \mathbb{C}$.

    With these preparations we are in a position to proceed with Fourier transforms.

4.54 Definition. Let $\mu|\mathfrak{B}$ be a finite measure. Then the function
$$\hat{\mu}(t) = \int e^{itx}\, \mu(dx), \quad t \in \mathbb{R},$$
is called the Fourier transform of $\mu$.

Note that the Fourier transform is well-defined and finite for every $t \in \mathbb{R}$.

4.55 Problem. Find the Fourier transform of a point measure.

    4.56 Problem. Find the Fourier transform of an exponential distribution.

4.57 Problem. Find the Fourier transform of a Poisson distribution.
Hint: The series expansion of the exponential function carries over to the complex-valued case.

    4.58 Problem. Find the Fourier transform of a Gaussian distribution.

    Hint: Derive a differential equation for the Fourier transform.
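Problem 4.58 can be cross-checked numerically: the Fourier transform of $N(0, 1)$ is $\hat{\mu}(t) = e^{-t^2/2}$, and it is real by symmetry (my sketch, midpoint rule):

```python
import math

# Real and imaginary part of ∫ e^{itx} φ(x) dx for the standard normal density φ.
def fourier(t, n=200_000, L=8.0):
    h = 2 * L / n
    re = im = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        w = math.exp(-x * x / 2) / math.sqrt(2 * math.pi) * h
        re += math.cos(t * x) * w
        im += math.sin(t * x) * w
    return re, im

re, im = fourier(1.5)
print(round(re, 6), round(im, 6))  # ≈ exp(-1.125) and 0
```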

    The Fourier transform can be used to find the moments of a measure.

4.59 Theorem. Let $\mu|\mathfrak{B}$ be a finite measure. If $\int |x|^k\, \mu(dx) < \infty$ then $\hat{\mu}$ is $k$ times differentiable and
$$\frac{d^k}{dt^k}\, \hat{\mu}(t)\Big|_{t=0} = i^k \int x^k\, \mu(dx)$$

4.60 Problem. Prove 4.59.

    The fundamental fact on Fourier transforms is the uniqueness theorem.

4.61 Theorem. Let $\mu_1|\mathfrak{B}$ and $\mu_2|\mathfrak{B}$ be finite measures. Then $\mu_1 = \mu_2$ iff $\hat{\mu}_1 = \hat{\mu}_2$.


We don't prove this theorem here since it is a reformulation of the fundamental Stone-Weierstrass approximation theorem of mathematical analysis. We refer to Bauer, [2].

The notion of the Fourier transform can be extended to measures on $(\mathbb{R}^n, \mathfrak{B}^n)$.

4.62 Definition. Let $\mu|\mathfrak{B}^n$ be a finite measure on $\mathbb{R}^n$. Then the function
$$\hat{\mu}(t) = \int e^{i\, t \cdot x}\, \mu(dx), \quad t \in \mathbb{R}^n,$$
is called the Fourier transform of $\mu$.

    The uniqueness theorem is true also for the n-dimensional case.


Part II

Probability theory


    Chapter 5

    Beyond measure theory

    5.1 Independence

    The notion of independence marks the point where probability theory goes beyond

    measure theory.

Recall that two events $A, B \in \mathcal{F}$ are independent if the product formula $P(A \cap B) = P(A)P(B)$ holds. This is easily extended to families of events.

5.1 Definition. Let $\mathcal{C}$ and $\mathcal{D}$ be subfamilies of $\mathcal{F}$. The families $\mathcal{C}$ and $\mathcal{D}$ are said to be independent (with respect to $P$) if $P(A \cap B) = P(A)P(B)$ for every choice $A \in \mathcal{C}$ and $B \in \mathcal{D}$.

It is natural to call random variables $X$ and $Y$ independent if the corresponding information sets are independent.

5.2 Definition. Two random variables $X$ and $Y$ are independent if $\sigma(X)$ and $\sigma(Y)$ are independent.

The preceding definition can be stated as follows: two random variables $X$ and $Y$ are independent if
$$P(X \in B_1,\, Y \in B_2) = P(X \in B_1)\, P(Y \in B_2), \quad B_1, B_2 \in \mathfrak{B}.$$
This is equivalent to saying that the joint distribution $P^{X,Y}$ of $X$ and $Y$ is the product of $P^X$ and $P^Y$.

How can we check independence of random variables? Is it sufficient to check the independence of generators of the information sets? This is not true in general, but with a minor modification it is.

5.3 Theorem. Let $X$ and $Y$ be random variables and let $\mathcal{C}$ and $\mathcal{D}$ be generators of the corresponding information sets. If $\mathcal{C}$ and $\mathcal{D}$ are independent and closed under intersection, then $X$ and $Y$ are independent.


5.4 Problem. Let $F(x, y)$ be the joint distribution function of $(X, Y)$. Show that $X$ and $Y$ are independent iff $F(x, y) = h(x)k(y)$ for some functions $h$ and $k$.

    For independent random variables there is a product formula for expectations.

5.5 Theorem. (1) Let $X \ge 0$ and $Y \ge 0$ be independent random variables. Then
$$E(XY) = E(X)E(Y)$$
(2) Let $X \in L^1$ and $Y \in L^1$ be independent random variables. Then $XY \in L^1$ and
$$E(XY) = E(X)E(Y)$$
Proof: Apply measure theoretic induction to obtain (1). Part (2) follows from (1). $\Box$
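A quick simulation of Theorem 5.5 (my illustration; $X, Y$ independent $\mathrm{Uniform}(0,1)$, so $E(XY) = E(X)E(Y) = 1/4$):

```python
import random

random.seed(0)
n = 200_000
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]   # drawn independently of xs

mean = lambda v: sum(v) / len(v)
lhs = mean([x * y for x, y in zip(xs, ys)])   # E(XY), estimated
rhs = mean(xs) * mean(ys)                     # E(X)E(Y), estimated
print(round(lhs, 2), round(rhs, 2))  # both ≈ 0.25
```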

5.6 Problem. Let $X$ and $Y$ be random variables on a common probability space. Show that $X$ and $Y$ are independent iff
$$E(e^{i(sX + tY)}) = E(e^{isX})\, E(e^{itY}), \quad s, t \in \mathbb{R}.$$

Recall that square integrable random variables $X$ and $Y$ are called uncorrelated if $E(XY) = E(X)E(Y)$. This is a weaker notion than independence.

    5.7 Problem. Show that uncorrelated random variables need not be independent.
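A standard counterexample for problem 5.7 (my sketch): take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$; then $E(XY) = E(X^3) = 0 = E(X)E(Y)$, yet $Y$ is a function of $X$.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2.
support = [-1, 0, 1]
p = Fraction(1, 3)

EX  = sum(p * x for x in support)          # E(X)   = 0
EY  = sum(p * x * x for x in support)      # E(X^2) = 2/3
EXY = sum(p * x ** 3 for x in support)     # E(X^3) = 0

assert EXY == EX * EY                      # uncorrelated
# Not independent: P(X=0, Y=0) = 1/3, but P(X=0)·P(Y=0) = 1/3 · 1/3 = 1/9.
assert p != p * p
print(EX, EY, EXY)
```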

    5.8 Problem. Find the variance of the sample mean of independent random variables.

5.9 Problem. Show that $X$ and $Y$ are independent iff $f(X)$ and $g(Y)$ are uncorrelated for all bounded measurable functions $f$ and $g$.

The notion of independence (as well as the notion of uncorrelated random variables) can be extended to more than two random variables. We will state the appropriate facts when we need them.

    5.2 Convergence and limit theorems

In probability theory, kinds of convergence other than those we have met so far play a predominant role.


5.10 Definition. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $(X_n)$ be a sequence of random variables. The sequence $(X_n)$ is said to converge to a random variable $X$ $P$-almost surely if
$$\lim_{n \to \infty} X_n(\omega) = X(\omega) \quad \text{for } P\text{-almost all } \omega \in \Omega.$$

This kind of convergence is also considered in measure theory, and we know that under certain additional conditions $P$-almost sure convergence implies convergence of the expectations of the random variables.

However, the probabilistic meaning of almost sure convergence is limited. The reason is that the idea of approximating a random variable $X$ by another random variable $Y$ in a probabilistic sense does not require that the random variables are similar for all $\omega \in \Omega$. It is sufficient that the probability of being near to each other is large.

5.11 Definition. Let $(\Omega, \mathcal{F},$


Recommended