Mathematical Analysis

by

Tom Lindstrøm

Department of Mathematics, University of Oslo

2014


Preface

These notes were first written as an emergency measure when the textbook for the course MAT2400 failed to show up in the spring of 2011. The present version is corrected, updated and extended, but still far from perfect. I would like to thank everybody who has pointed out mistakes and weaknesses in the previous versions, in particular Geir Ellingsrud, Erik Løw, Per Størset, and Daniel Aubert.

Blindern, January 7th, 2014

Tom Lindstrøm


Contents

1 Preliminaries: Proofs, Sets, and Functions

1.1 Proofs

1.2 Sets and boolean operations

1.3 Families of sets

1.4 Functions

1.5 Relations and partitions

1.6 Countability

2 Metric Spaces

2.1 Definitions and examples

2.2 Convergence and continuity

2.3 Open and closed sets

2.4 Complete spaces

2.5 Compact sets

2.6 An alternative description of compactness

2.7 The completion of a metric space

3 Spaces of continuous functions

3.1 Modes of continuity

3.2 Modes of convergence

3.3 The spaces $C(X,Y)$

3.4 Applications to differential equations

3.5 Compact subsets of $C(X,\mathbb{R}^m)$

3.6 Differential equations revisited

3.7 Polynomials are dense in $C([a,b],\mathbb{R})$

3.8 Baire’s Category Theorem

4 Series of functions

4.1 lim sup and lim inf

4.2 Integrating and differentiating sequences

4.3 Power series

4.4 Abel’s Theorem

4.5 Normed spaces


4.6 Inner product spaces

4.7 Linear operators

5 Measure and integration

5.1 Measure spaces

5.2 Complete measures

5.3 Measurable functions

5.4 Integration of simple functions

5.5 Integrals of nonnegative functions

5.6 Integrable functions

5.7 $L^1(X,\mathcal{A},\mu)$ and $L^2(X,\mathcal{A},\mu)$

6 Constructing measures

6.1 Outer measure

6.2 Measurable sets

6.3 Carathéodory’s Theorem

6.4 Lebesgue measure on $\mathbb{R}$

6.5 Approximation results

6.6 The coin tossing measure

6.7 Product measures

6.8 Fubini’s Theorem


Chapter 1

Preliminaries: Proofs, Sets, and Functions

Chapters with the word “preliminaries” in the title are never much fun, but they are useful: they provide the reader with the background information necessary to enjoy the main body of the text. This chapter is no exception, but I have tried to keep it short and to the point; everything you find here will be needed at some stage, and most of the material will show up throughout the book.

Mathematical analysis is a continuation of calculus, but it is more abstract and therefore in need of a larger vocabulary and more precisely defined concepts. You have undoubtedly dealt with proofs, sets, and functions in your previous mathematics courses, but probably in a rather casual way. Now they become the centerpiece of the theory, and there is no way to understand what is going on if you don’t have a good grasp of them: The subject matter is so abstract that you can no longer rely on drawings and intuition; you simply have to be able to understand the concepts and to read, create and write proofs. Fortunately, this is not as difficult as it may sound if you have never tried to take proofs and formal definitions seriously before.

1.1 Proofs

There is nothing mysterious about mathematical proofs; they are just chains of logically irrefutable arguments that bring you from things you already know to whatever you want to prove. Still, there are a few tricks of the trade that are useful to know about.

Many mathematical statements are of the form “If $A$, then $B$”. This simply means that whenever statement $A$ holds, statement $B$ also holds, but not necessarily vice versa. A typical example is: “If $n\in\mathbb{N}$ is divisible by 14, then $n$ is divisible by 7”. This is a true statement since any natural number that is divisible by 14 is also divisible by 7. The opposite statement is not true, as there are numbers that are divisible by 7 but not by 14 (e.g. 7 and 21).

Instead of “If $A$, then $B$”, we often say that “$A$ implies $B$” and write $A\implies B$. As already observed, $A\implies B$ and $B\implies A$ mean two different things. If they are both true, $A$ and $B$ hold in exactly the same cases, and we say that $A$ and $B$ are equivalent. In words, we say “$A$ if and only if $B$”, and in symbols we write $A\iff B$. A typical example is:

“A triangle is equilateral if and only if all three angles are $60^\circ$.”

When we want to prove that $A\iff B$, it is often convenient to prove $A\implies B$ and $B\implies A$ separately.

If you think a little, you will realize that “$A\implies B$” and “not-$B\implies$ not-$A$” mean exactly the same thing: they both say that whenever $A$ happens, so does $B$. This means that instead of proving “$A\implies B$”, we might just as well prove “not-$B\implies$ not-$A$”. This is called a contrapositive proof, and it is convenient when the hypothesis “not-$B$” gives us more to work on than the hypothesis “$A$”. Here is a typical example.

Proposition 1.1.1 If $n^2$ is an even number, so is $n$.

Proof: We prove the contrapositive statement: “If $n$ is odd, so is $n^2$”: If $n$ is odd, it can be written as $n=2k+1$ for a nonnegative integer $k$. But then

$$n^2=(2k+1)^2=4k^2+4k+1=2(2k^2+2k)+1,$$

which is clearly odd. □

It should be clear why a contrapositive proof is best in this case: The hypothesis “$n$ is odd” is much easier to work with than the original hypothesis “$n^2$ is even”.
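A computer cannot prove a statement about all natural numbers, but a finite check is a cheap way to catch a false claim before trying to prove it. The Python sketch below (the function names are ours, chosen for illustration) tests Proposition 1.1.1 and the identity $(2k+1)^2=2(2k^2+2k)+1$ used in its proof for small values:

```python
def is_even(n: int) -> bool:
    return n % 2 == 0


def check_proposition(limit: int) -> bool:
    """For 0 <= n < limit: if n^2 is even, then n is even."""
    return all(is_even(n) for n in range(limit) if is_even(n * n))


def check_odd_square(limit: int) -> bool:
    """For 0 <= k < limit: (2k+1)^2 == 2(2k^2 + 2k) + 1, so (2k+1)^2 is odd."""
    return all((2 * k + 1) ** 2 == 2 * (2 * k * k + 2 * k) + 1 for k in range(limit))


print(check_proposition(10_000), check_odd_square(10_000))  # True True
```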

A related method of proof is proof by contradiction or reductio ad absurdum. In these proofs, we assume the opposite of what we want to show, and prove that this leads to a contradiction. Hence our assumption must be false, and the original claim is established. Here is a well-known example.

Proposition 1.1.2 $\sqrt{2}$ is an irrational number.

Proof: We assume for contradiction that $\sqrt{2}$ is rational. This means that

$$\sqrt{2}=\frac{m}{n}$$

for natural numbers $m$ and $n$. By cancelling as much as possible, we may assume that $m$ and $n$ have no common factors.

If we square the equality above and multiply by $n^2$ on both sides, we get

$$2n^2=m^2$$

This means that $m^2$ is even, and by the previous proposition, so is $m$. Hence $m=2k$ for some natural number $k$, and if we substitute this into the last formula above and cancel a factor 2, we see that

$$n^2=2k^2$$

This means that $n^2$ is even, and by the previous proposition $n$ is even. Thus we have proved that both $m$ and $n$ are even, which is impossible as we assumed that they have no common factors. This means that the assumption that $\sqrt{2}$ is rational leads to a contradiction, and hence $\sqrt{2}$ must be irrational. □
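The proof shows that the equation $m^2=2n^2$ has no solution in natural numbers. As a small illustration (not a substitute for the proof), the following Python sketch searches for solutions with $n$ up to a bound; `math.isqrt` from the standard library gives the only possible candidate for $m$. The function name `solutions` is our own.

```python
from math import isqrt


def solutions(limit: int) -> list[tuple[int, int]]:
    """Return all pairs (m, n) of natural numbers with n <= limit and m^2 == 2 n^2."""
    found = []
    for n in range(1, limit + 1):
        m = isqrt(2 * n * n)      # largest integer with m^2 <= 2 n^2
        if m * m == 2 * n * n:    # m is a solution only if the square root is exact
            found.append((m, n))
    return found


print(solutions(100_000))  # [] (no solutions, as Proposition 1.1.2 predicts)
```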

Let me end this section by reminding you of a technique you have certainly seen before, proof by induction. We use this technique when we want to prove that a certain statement $P(n)$ holds for all natural numbers $n=1,2,3,\ldots$. A typical statement one may want to prove in this way is

$$P(n):\quad 1+2+3+\cdots+n=\frac{n(n+1)}{2}$$

The basic observation behind the technique is:

1.1.3 (Induction Principle) Assume that $P(n)$ is a statement about natural numbers $n=1,2,3,\ldots$. Assume that the following two conditions are satisfied:

(i) $P(1)$ is true.

(ii) If $P(k)$ is true for a natural number $k$, then $P(k+1)$ is also true.

Then $P(n)$ holds for all natural numbers $n$.

Let us see how we can use the principle to prove that

$$P(n):\quad 1+2+3+\cdots+n=\frac{n(n+1)}{2}$$

holds for all natural numbers $n$.

First we check that the statement holds for $n=1$: In this case the formula says

$$1=\frac{1\cdot(1+1)}{2},$$

which is obviously true. Assume now that $P(k)$ holds for some natural number $k$, i.e.

$$1+2+3+\cdots+k=\frac{k(k+1)}{2}$$

We then have

$$1+2+3+\cdots+k+(k+1)=\frac{k(k+1)}{2}+(k+1)=\frac{k(k+1)+2(k+1)}{2}=\frac{(k+1)(k+2)}{2},$$

which means that $P(k+1)$ is true. By the Induction Principle, $P(n)$ holds for all natural numbers $n$.
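The Induction Principle is what lets us pass from finitely many cases to all natural numbers, so a computer check can only confirm the individual ingredients. The Python sketch below (names ours) checks the base case, the formula for $n\le 1000$, and the algebraic identity $\frac{k(k+1)}{2}+(k+1)=\frac{(k+1)(k+2)}{2}$ used in the induction step:

```python
def lhs(n: int) -> int:
    """1 + 2 + ... + n, computed by brute force."""
    return sum(range(1, n + 1))


def rhs(n: int) -> int:
    """The closed form n(n+1)/2; n(n+1) is always even, so // is exact."""
    return n * (n + 1) // 2


assert lhs(1) == rhs(1) == 1                                        # base case P(1)
assert all(lhs(n) == rhs(n) for n in range(1, 1001))                # P(n) for n <= 1000
assert all(rhs(k) + (k + 1) == rhs(k + 1) for k in range(1, 1001))  # induction step
print("all checks passed")
```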

Exercises for Section 1.1

1. Assume that the product of two integers $x$ and $y$ is even. Show that at least one of the numbers is even.

2. Assume that the sum of two integers $x$ and $y$ is even. Show that $x$ and $y$ are either both even or both odd.

3. Show that if $n$ is a natural number such that $n^2$ is divisible by 3, then $n$ is divisible by 3. Use this to show that $\sqrt{3}$ is irrational.

1.2 Sets and boolean operations

In the systematic development of mathematics, set is usually taken as the fundamental notion from which all other concepts are developed. We shall not be so ambitious; we shall just think naively of a set as a collection of mathematical objects. A set may be finite, such as the set

$$\{1,2,3,4,5,6,7,8,9\}$$

of all natural numbers less than 10, or infinite, such as the set $(0,1)$ of all real numbers between 0 and 1.

We shall write $x\in A$ to say that $x$ is an element of the set $A$, and $x\notin A$ to say that $x$ is not an element of $A$. Two sets are equal if they have exactly the same elements, and we say that $A$ is a subset of $B$ (and write $A\subseteq B$) if all elements of $A$ are elements of $B$, but not necessarily vice versa. Note that there is no requirement that $A$ is strictly included in $B$, and hence it is correct to write $A\subseteq B$ when $A=B$ (in fact, a standard technique for showing that $A=B$ is first to show that $A\subseteq B$ and then that $B\subseteq A$). By $\emptyset$ we shall mean the empty set, i.e. the set with no elements (you may feel that a set with no elements is a contradiction in terms, but mathematical life would be much less convenient without the empty set).

Many common sets have a standard name and notation, such as

$\mathbb{N}=\{1,2,3,\ldots\}$, the set of natural numbers

$\mathbb{Z}=\{\ldots,-3,-2,-1,0,1,2,3,\ldots\}$, the set of all integers

$\mathbb{Q}$, the set of all rational numbers

$\mathbb{R}$, the set of all real numbers

$\mathbb{C}$, the set of all complex numbers

$\mathbb{R}^n$, the set of all real $n$-tuples

To specify other sets, we shall often use expressions of the kind

$$A=\{a\mid P(a)\}$$

which means the set of all objects satisfying condition $P$. Often it is more convenient to write

$$A=\{a\in B\mid P(a)\}$$

which means the set of all elements in $B$ satisfying the condition $P$. Examples of this notation are

$$[-1,1]=\{x\in\mathbb{R}\mid -1\le x\le 1\}$$

and

$$A=\{2n-1\mid n\in\mathbb{N}\}$$

where $A$ is the set of all odd natural numbers. To increase readability I shall occasionally replace the vertical bar $\mid$ by a colon $:$ and write $A=\{a : P(a)\}$ and $A=\{a\in B : P(a)\}$ instead of $A=\{a\mid P(a)\}$ and $A=\{a\in B\mid P(a)\}$, e.g. in expressions like $\{\|\alpha x\| : |\alpha|<1\}$ where there are lots of vertical bars already.

If $A_1,A_2,\ldots,A_n$ are sets, their union and intersection are given by

$$A_1\cup A_2\cup\cdots\cup A_n=\{a\mid a \text{ belongs to at least one of the sets } A_1,A_2,\ldots,A_n\}$$

and

$$A_1\cap A_2\cap\cdots\cap A_n=\{a\mid a \text{ belongs to all the sets } A_1,A_2,\ldots,A_n\},$$

respectively. Two sets are called disjoint if they do not have elements in common, i.e. if $A\cap B=\emptyset$.

When we calculate with numbers, the distributive law tells us how to move common factors in and out of parentheses:

$$b(a_1+a_2+\cdots+a_n)=ba_1+ba_2+\cdots+ba_n$$

Unions and intersections are distributive both ways, i.e. we have:

Proposition 1.2.1 For all sets $B,A_1,A_2,\ldots,A_n$

$$B\cap(A_1\cup A_2\cup\cdots\cup A_n)=(B\cap A_1)\cup(B\cap A_2)\cup\cdots\cup(B\cap A_n)\tag{1.2.1}$$

and

$$B\cup(A_1\cap A_2\cap\cdots\cap A_n)=(B\cup A_1)\cap(B\cup A_2)\cap\cdots\cap(B\cup A_n)\tag{1.2.2}$$


Proof: We prove the first formula and leave the second as an exercise. The proof is in two steps: first we prove that the set on the left is a subset of the one on the right, and then we prove that the set on the right is a subset of the one on the left.

Assume first that $x$ is an element of the set on the left, i.e. $x\in B\cap(A_1\cup A_2\cup\cdots\cup A_n)$. Then $x$ must be in $B$ and in at least one of the sets $A_i$. But then $x\in B\cap A_i$, and hence $x\in(B\cap A_1)\cup(B\cap A_2)\cup\cdots\cup(B\cap A_n)$. This proves that

$$B\cap(A_1\cup A_2\cup\cdots\cup A_n)\subseteq(B\cap A_1)\cup(B\cap A_2)\cup\cdots\cup(B\cap A_n)$$

To prove the opposite inclusion, assume that $x\in(B\cap A_1)\cup(B\cap A_2)\cup\cdots\cup(B\cap A_n)$. Then $x\in B\cap A_i$ for at least one $i$, and hence $x\in B$ and $x\in A_i$. But if $x\in A_i$ for some $i$, then $x\in A_1\cup A_2\cup\cdots\cup A_n$, and hence $x\in B\cap(A_1\cup A_2\cup\cdots\cup A_n)$. This proves that

$$B\cap(A_1\cup A_2\cup\cdots\cup A_n)\supseteq(B\cap A_1)\cup(B\cap A_2)\cup\cdots\cup(B\cap A_n)$$

As we now have inclusion in both directions, (1.2.1) follows. □

Remark: It is possible to prove formula (1.2.1) in one sweep by noticing that all steps in the argument are equivalences and not only implications, but most people are more prone to making mistakes when they work with chains of equivalences than with chains of implications.
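Python’s built-in `set` type implements $\cap$ and $\cup$ as the operators `&` and `|`, so both distributive laws can be spot-checked on randomly chosen finite sets. This is a minimal sketch; the helper names `union_of` and `intersection_of` are ours:

```python
import random
from functools import reduce


def union_of(sets):
    """A_1 ∪ A_2 ∪ ... ∪ A_n for a list of sets."""
    return set().union(*sets)


def intersection_of(sets):
    """A_1 ∩ A_2 ∩ ... ∩ A_n for a non-empty list of sets."""
    return reduce(lambda s, t: s & t, sets)


random.seed(0)
universe = range(20)
for _ in range(1000):
    B = set(random.sample(universe, 8))
    As = [set(random.sample(universe, 6)) for _ in range(4)]
    assert B & union_of(As) == union_of([B & A for A in As])                 # (1.2.1)
    assert B | intersection_of(As) == intersection_of([B | A for A in As])   # (1.2.2)
print("both distributive laws held in every trial")
```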

There are also other algebraic rules for unions and intersections, but most of them are so obvious that we do not need to state them here (an exception is De Morgan’s laws, which we shall return to in a moment).

The set theoretic difference $A\setminus B$ (also written $A-B$) is defined by

$$A\setminus B=\{a\mid a\in A,\ a\notin B\}$$

In many situations we are only interested in subsets of a given set $U$ (often referred to as the universe). The complement $A^c$ of a set $A$ with respect to $U$ is defined by

$$A^c=U\setminus A=\{a\in U\mid a\notin A\}$$

We can now formulate De Morgan’s laws:

Proposition 1.2.2 (De Morgan’s laws) Assume that $A_1,A_2,\ldots,A_n$ are subsets of a universe $U$. Then

$$(A_1\cup A_2\cup\cdots\cup A_n)^c=A_1^c\cap A_2^c\cap\cdots\cap A_n^c\tag{1.2.3}$$

and

$$(A_1\cap A_2\cap\cdots\cap A_n)^c=A_1^c\cup A_2^c\cup\cdots\cup A_n^c\tag{1.2.4}$$

(These rules are easy to remember if you observe that you can distribute the $c$ outside the parentheses onto the individual sets provided you turn all $\cup$’s into $\cap$’s and all $\cap$’s into $\cup$’s.)

Proof of De Morgan’s laws: We prove the first part and leave the second as an exercise. The strategy is as indicated above; we first show that any element of the set on the left must also be an element of the set on the right, and then vice versa.

Assume that $x\in(A_1\cup A_2\cup\cdots\cup A_n)^c$. Then $x\notin A_1\cup A_2\cup\cdots\cup A_n$, and hence for all $i$, $x\notin A_i$. This means that for all $i$, $x\in A_i^c$, and hence $x\in A_1^c\cap A_2^c\cap\cdots\cap A_n^c$.

Assume next that $x\in A_1^c\cap A_2^c\cap\cdots\cap A_n^c$. This means that $x\in A_i^c$ for all $i$, in other words: for all $i$, $x\notin A_i$. Thus $x\notin A_1\cup A_2\cup\cdots\cup A_n$, which means that $x\in(A_1\cup A_2\cup\cdots\cup A_n)^c$. □
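With a fixed finite universe $U$, the complement is just set difference, and De Morgan’s laws can be checked directly. In this sketch the universe and the sets $A_1,A_2,A_3$ are arbitrary choices made for illustration:

```python
U = set(range(10))                       # the universe
A1, A2, A3 = {0, 1, 2}, {2, 3, 4}, {4, 5}


def complement(A):
    """A^c = U - A, the complement with respect to the universe U."""
    return U - A


assert complement(A1 | A2 | A3) == complement(A1) & complement(A2) & complement(A3)  # (1.2.3)
assert complement(A1 & A2 & A3) == complement(A1) | complement(A2) | complement(A3)  # (1.2.4)
print(sorted(complement(A1 | A2 | A3)))  # [6, 7, 8, 9]
```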

We end this section with a brief look at cartesian products. If we have two sets, $A$ and $B$, the cartesian product $A\times B$ consists of all pairs $(a,b)$ where $a\in A$ and $b\in B$. If we have more sets $A_1,A_2,\ldots,A_n$, the cartesian product $A_1\times A_2\times\cdots\times A_n$ consists of all $n$-tuples $(a_1,a_2,\ldots,a_n)$ where $a_1\in A_1,a_2\in A_2,\ldots,a_n\in A_n$. If all the sets are the same (i.e. $A_i=A$ for all $i$), we usually write $A^n$ instead of $A\times A\times\cdots\times A$. Hence $\mathbb{R}^n$ is the set of all $n$-tuples of real numbers, just as you are used to, and $\mathbb{C}^n$ is the set of all $n$-tuples of complex numbers.
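For finite sets, the standard library function `itertools.product` enumerates exactly the pairs (or $n$-tuples) described above. A short sketch with arbitrarily chosen sets:

```python
from itertools import product

A = {1, 2}
B = {"x", "y", "z"}

AxB = set(product(A, B))         # all pairs (a, b) with a in A and b in B
A3 = set(product(A, repeat=3))   # A^3 = A x A x A, all triples from A

print(len(AxB), len(A3))         # 6 8
```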

Exercises for Section 1.2

1. Show that $[0,2]\cup[1,3]=[0,3]$ and that $[0,2]\cap[1,3]=[1,2]$.

2. Let $U=\mathbb{R}$ be the universe. Explain that $(-\infty,0)^c=[0,\infty)$.

3. Show that $A\setminus B=A\cap B^c$.

4. The symmetric difference $A\triangle B$ of two sets $A,B$ consists of the elements that belong to exactly one of the sets $A,B$. Show that

$$A\triangle B=(A\setminus B)\cup(B\setminus A)$$

5. Prove formula (1.2.2).

6. Prove formula (1.2.4).

7. Prove that $A_1\cup A_2\cup\cdots\cup A_n=U$ if and only if $A_1^c\cap A_2^c\cap\cdots\cap A_n^c=\emptyset$.

8. Prove that $(A\cup B)\times C=(A\times C)\cup(B\times C)$ and $(A\cap B)\times C=(A\times C)\cap(B\times C)$.

1.3 Families of sets

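A collection of sets is usually called a family. An example is the family

$$\mathcal{A}=\{[a,b]\mid a,b\in\mathbb{R}\}$$

of all closed and bounded intervals on the real line. Families may seem abstract, but you have to get used to them as they appear in all parts of higher mathematics. We can extend the notions of union and intersection to families in the following way: If $\mathcal{A}$ is a family of sets, we define

$$\bigcup_{A\in\mathcal{A}}A=\{a\mid a \text{ belongs to at least one set } A\in\mathcal{A}\}$$

and

$$\bigcap_{A\in\mathcal{A}}A=\{a\mid a \text{ belongs to all sets } A\in\mathcal{A}\}$$

The distributive laws extend to this case in the obvious way, i.e.,

$$B\cap\Big(\bigcup_{A\in\mathcal{A}}A\Big)=\bigcup_{A\in\mathcal{A}}(B\cap A)\quad\text{and}\quad B\cup\Big(\bigcap_{A\in\mathcal{A}}A\Big)=\bigcap_{A\in\mathcal{A}}(B\cup A)$$

and so do the laws of De Morgan:

$$\Big(\bigcup_{A\in\mathcal{A}}A\Big)^c=\bigcap_{A\in\mathcal{A}}A^c\quad\text{and}\quad\Big(\bigcap_{A\in\mathcal{A}}A\Big)^c=\bigcup_{A\in\mathcal{A}}A^c$$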

Families are often given as indexed sets. This means that we have a basic set $I$, and that the family consists of one set $A_i$ for each element $i$ in $I$. We then write the family as

$$\mathcal{A}=\{A_i\mid i\in I\},$$

and use notation such as

$$\bigcup_{i\in I}A_i\quad\text{and}\quad\bigcap_{i\in I}A_i$$

or alternatively

$$\bigcup\{A_i : i\in I\}\quad\text{and}\quad\bigcap\{A_i : i\in I\}$$

for unions and intersections.

A rather typical example of an indexed set is $\mathcal{A}=\{B_r\mid r\in[0,\infty)\}$, where $B_r=\{(x,y)\in\mathbb{R}^2\mid x^2+y^2=r^2\}$. This is the family of all circles in the plane with centre at the origin.
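An indexed family $\{A_i\mid i\in I\}$ can be modelled in Python as a dictionary from indices to sets, with `set().union(*...)` and `set.intersection(*...)` playing the roles of $\bigcup$ and $\bigcap$. Python sets are finite, so this sketch uses the integer analogue $A_i=\{-i,\ldots,i\}$ for $i=1,\ldots,5$ (our choice, not from the text) of the intervals in Exercise 1 below, and also checks De Morgan’s laws for the family inside a small universe:

```python
I = range(1, 6)
family = {i: set(range(-i, i + 1)) for i in I}   # A_i = {-i, ..., i}

big_union = set().union(*family.values())
big_intersection = set.intersection(*family.values())

# De Morgan's laws for the family, inside the universe U = {-10, ..., 10}
U = set(range(-10, 11))
assert U - big_union == set.intersection(*(U - A for A in family.values()))
assert U - big_intersection == set().union(*(U - A for A in family.values()))

print(sorted(big_intersection))   # [-1, 0, 1]
```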

Exercises for Section 1.3

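1. Show that $\bigcup_{n\in\mathbb{N}}[-n,n]=\mathbb{R}$.

2. Show that $\bigcap_{n\in\mathbb{N}}(-\frac{1}{n},\frac{1}{n})=\{0\}$.

3. Show that $\bigcup_{n\in\mathbb{N}}[\frac{1}{n},1]=(0,1]$.

4. Show that $\bigcap_{n\in\mathbb{N}}(0,\frac{1}{n}]=\emptyset$.

5. Prove the distributive laws for families, i.e.,

$$B\cap\Big(\bigcup_{A\in\mathcal{A}}A\Big)=\bigcup_{A\in\mathcal{A}}(B\cap A)\quad\text{and}\quad B\cup\Big(\bigcap_{A\in\mathcal{A}}A\Big)=\bigcap_{A\in\mathcal{A}}(B\cup A)$$

6. Prove De Morgan’s laws for families:

$$\Big(\bigcup_{A\in\mathcal{A}}A\Big)^c=\bigcap_{A\in\mathcal{A}}A^c\quad\text{and}\quad\Big(\bigcap_{A\in\mathcal{A}}A\Big)^c=\bigcup_{A\in\mathcal{A}}A^c$$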

1.4 Functions

Functions can be defined in terms of sets, but for our purposes it suffices to think of a function $f:X\to Y$ from $X$ to $Y$ as a rule which to each element $x\in X$ assigns an element $y=f(x)$ in $Y$. If $f(x)\neq f(y)$ whenever $x\neq y$, we call the function injective (or one-to-one). If for each $y\in Y$ there is an $x\in X$ such that $f(x)=y$, the function is called surjective (or onto). A function which is both injective and surjective is called bijective; it establishes a one-to-one correspondence between the elements of $X$ and $Y$.

If $A$ is a subset of $X$, the set $f(A)\subseteq Y$ defined by

$$f(A)=\{f(a)\mid a\in A\}$$

is called the image of $A$ under $f$. If $B$ is a subset of $Y$, the set $f^{-1}(B)\subseteq X$ defined by

$$f^{-1}(B)=\{x\in X\mid f(x)\in B\}$$

is called the inverse image of $B$ under $f$. In analysis, images and inverse images of sets play important parts, and it is useful to know how these operations relate to the boolean operations of union and intersection. Let us begin with the good news.
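For a finite domain, both definitions translate directly into set comprehensions. In this sketch the names `image` and `preimage` are ours, and the domain $X$ has to be passed explicitly to `preimage`, since $f^{-1}(B)$ consists of points of $X$:

```python
def image(f, A):
    """f(A) = {f(a) | a in A}."""
    return {f(a) for a in A}


def preimage(f, X, B):
    """f^{-1}(B) = {x in X | f(x) in B}."""
    return {x for x in X if f(x) in B}


def square(x):
    return x * x


X = set(range(-3, 4))                            # X = {-3, ..., 3}
print(sorted(image(square, {-2, -1, 0, 1})))     # [0, 1, 4]
print(sorted(preimage(square, X, {1, 4, 5})))    # [-2, -1, 1, 2]
```

Note that $B$ need not lie inside $f(X)$: the value 5 is not hit by any $x\in X$ and simply contributes nothing to the inverse image.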

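Proposition 1.4.1 Let $\mathcal{B}$ be a family of subsets of $Y$. Then for all functions $f:X\to Y$ we have

$$f^{-1}\Big(\bigcup_{B\in\mathcal{B}}B\Big)=\bigcup_{B\in\mathcal{B}}f^{-1}(B)\quad\text{and}\quad f^{-1}\Big(\bigcap_{B\in\mathcal{B}}B\Big)=\bigcap_{B\in\mathcal{B}}f^{-1}(B)$$

We say that inverse images commute with arbitrary unions and intersections.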

Proof: I prove the first part; the second part is proved similarly. Assume first that $x\in f^{-1}(\bigcup_{B\in\mathcal{B}}B)$. This means that $f(x)\in\bigcup_{B\in\mathcal{B}}B$, and consequently there must be at least one $B\in\mathcal{B}$ such that $f(x)\in B$. But then $x\in f^{-1}(B)$, and hence $x\in\bigcup_{B\in\mathcal{B}}f^{-1}(B)$. This proves that $f^{-1}(\bigcup_{B\in\mathcal{B}}B)\subseteq\bigcup_{B\in\mathcal{B}}f^{-1}(B)$.

To prove the opposite inclusion, assume that $x\in\bigcup_{B\in\mathcal{B}}f^{-1}(B)$. There must be at least one $B\in\mathcal{B}$ such that $x\in f^{-1}(B)$, and hence $f(x)\in B$. This implies that $f(x)\in\bigcup_{B\in\mathcal{B}}B$, and hence $x\in f^{-1}(\bigcup_{B\in\mathcal{B}}B)$. □

For forward images the situation is more complicated:

Proposition 1.4.2 Let $\mathcal{A}$ be a family of subsets of $X$. Then for all functions $f:X\to Y$ we have

$$f\Big(\bigcup_{A\in\mathcal{A}}A\Big)=\bigcup_{A\in\mathcal{A}}f(A)\quad\text{and}\quad f\Big(\bigcap_{A\in\mathcal{A}}A\Big)\subseteq\bigcap_{A\in\mathcal{A}}f(A)$$

In general, we do not have equality in the last case. Hence forward images commute with unions, but not always with intersections.

Proof: To prove the statement about unions, we first observe that since $A\subseteq\bigcup_{A\in\mathcal{A}}A$ for all $A\in\mathcal{A}$, we have $f(A)\subseteq f(\bigcup_{A\in\mathcal{A}}A)$ for all such $A$. Since this inclusion holds for all $A$, we must also have $\bigcup_{A\in\mathcal{A}}f(A)\subseteq f(\bigcup_{A\in\mathcal{A}}A)$. To prove the opposite inclusion, assume that $y\in f(\bigcup_{A\in\mathcal{A}}A)$. This means that there exists an $x\in\bigcup_{A\in\mathcal{A}}A$ such that $f(x)=y$. This $x$ has to belong to at least one $A\in\mathcal{A}$, and hence $y\in f(A)\subseteq\bigcup_{A\in\mathcal{A}}f(A)$.

To prove the inclusion for intersections, just observe that since $\bigcap_{A\in\mathcal{A}}A\subseteq A$ for all $A\in\mathcal{A}$, we must have $f(\bigcap_{A\in\mathcal{A}}A)\subseteq f(A)$ for all such $A$. Since this inclusion holds for all $A$, it follows that $f(\bigcap_{A\in\mathcal{A}}A)\subseteq\bigcap_{A\in\mathcal{A}}f(A)$. The example below shows that the opposite inclusion does not always hold. □

Example 1: Let $X=\{x_1,x_2\}$ and $Y=\{y\}$. Define $f:X\to Y$ by $f(x_1)=f(x_2)=y$, and let $A_1=\{x_1\}$, $A_2=\{x_2\}$. Then $A_1\cap A_2=\emptyset$ and consequently $f(A_1\cap A_2)=\emptyset$. On the other hand $f(A_1)=f(A_2)=\{y\}$, and hence $f(A_1)\cap f(A_2)=\{y\}$. This means that $f(A_1\cap A_2)\neq f(A_1)\cap f(A_2)$. ♣
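Example 1 is small enough to run. In this sketch a dictionary lookup serves as the function $f$, and the strings `"x1"`, `"x2"`, `"y"` stand for $x_1,x_2,y$:

```python
def image(f, A):
    """f(A) = {f(a) | a in A}."""
    return {f(a) for a in A}


f = {"x1": "y", "x2": "y"}.get      # f(x1) = f(x2) = y, so f is not injective
A1, A2 = {"x1"}, {"x2"}

print(image(f, A1 & A2))            # set()  since A1 ∩ A2 is empty
print(image(f, A1) & image(f, A2))  # {'y'}
```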

The problem in this example stems from the fact that $y$ belongs to both $f(A_1)$ and $f(A_2)$, but only as the image of two different elements $x_1\in A_1$ and $x_2\in A_2$; there is no common element $x\in A_1\cap A_2$ which is mapped to $y$. This problem disappears if $f$ is injective:

Corollary 1.4.3 Let $\mathcal{A}$ be a family of subsets of $X$. Then for all injective functions $f:X\to Y$ we have

$$f\Big(\bigcap_{A\in\mathcal{A}}A\Big)=\bigcap_{A\in\mathcal{A}}f(A)$$

Proof: The easiest way to show this is probably to apply Proposition 1.4.1 to the inverse function of $f$, but I choose instead to prove the missing inclusion $f(\bigcap_{A\in\mathcal{A}}A)\supseteq\bigcap_{A\in\mathcal{A}}f(A)$ directly.

Assume $y\in\bigcap_{A\in\mathcal{A}}f(A)$. For each $A\in\mathcal{A}$ there must be an element $x_A\in A$ such that $f(x_A)=y$. Since $f$ is injective, all these $x_A\in A$ must be the same element $x$, and hence $x\in A$ for all $A\in\mathcal{A}$. This means that $x\in\bigcap_{A\in\mathcal{A}}A$, and since $y=f(x)$, we have proved that $y\in f(\bigcap_{A\in\mathcal{A}}A)$. □

Taking complements is another operation that commutes with inverse images, but not (in general) with forward images.

Proposition 1.4.4 Assume that $f:X\to Y$ is a function and that $B\subseteq Y$. Then $f^{-1}(B^c)=(f^{-1}(B))^c$. (Here, of course, $B^c=Y\setminus B$ is the complement with respect to the universe $Y$, while $(f^{-1}(B))^c=X\setminus f^{-1}(B)$ is the complement with respect to the universe $X$.)

Proof: An element $x\in X$ belongs to $f^{-1}(B^c)$ if and only if $f(x)\in B^c$. On the other hand, it belongs to $(f^{-1}(B))^c$ if and only if $f(x)\notin B$, i.e. if and only if $f(x)\in B^c$. □

Finally, let us just observe that being disjoint is also a property that is preserved under inverse images; if $A\cap B=\emptyset$, then $f^{-1}(A)\cap f^{-1}(B)=\emptyset$. Again the corresponding property for forward images does not hold in general.
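Proposition 1.4.1, Proposition 1.4.4 and the remark on disjoint sets can all be spot-checked for a non-injective function on a finite set. In the sketch below, $f(x)=x^2$ on $X=\{-3,\ldots,3\}$ with $Y=\{0,\ldots,9\}$ is our own choice, and we test every pair of subsets $B,C$ of $\{0,1,4,9\}$:

```python
from itertools import combinations

X = set(range(-3, 4))
Y = set(range(10))


def f(x):
    return x * x          # maps X into Y; f(-x) == f(x), so f is not injective


def preimage(B):
    """f^{-1}(B) = {x in X | f(x) in B}."""
    return {x for x in X if f(x) in B}


values = [0, 1, 4, 9]
subsets = [set(c) for r in range(len(values) + 1) for c in combinations(values, r)]

for B in subsets:
    assert preimage(Y - B) == X - preimage(B)                 # Prop. 1.4.4
    for C in subsets:
        assert preimage(B | C) == preimage(B) | preimage(C)   # Prop. 1.4.1 (unions)
        assert preimage(B & C) == preimage(B) & preimage(C)   # Prop. 1.4.1 (intersections)
        if not B & C:                                         # B and C disjoint
            assert not preimage(B) & preimage(C)
print("all checks passed")
```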

Exercises for Section 1.4

1. Let $f:\mathbb{R}\to\mathbb{R}$ be the function $f(x)=x^2$. Find $f([-1,2])$ and $f^{-1}([-1,2])$.

2. Let $g:\mathbb{R}^2\to\mathbb{R}$ be the function $g(x,y)=x^2+y^2$. Find $g([-1,1]\times[-1,1])$ and $g^{-1}([0,4])$.

3. Show that a strictly increasing function $f:\mathbb{R}\to\mathbb{R}$ is injective. Does it have to be surjective?

4. Prove the second part of Proposition 1.4.1.

5. Find a function $f:X\to Y$ and a set $A\subseteq X$ such that we have neither $f(A^c)\subseteq f(A)^c$ nor $f(A)^c\subseteq f(A^c)$.

6. Show that if $f:X\to Y$ and $g:Y\to Z$ are injective, then $g\circ f:X\to Z$ is injective.

7. Show that if $f:X\to Y$ and $g:Y\to Z$ are surjective, then $g\circ f:X\to Z$ is surjective.

1.5 Relations and partitions

In mathematics there are lots of relations between objects; numbers may be smaller or larger than each other, lines may be parallel, vectors may be orthogonal, matrices may be similar, and so on. Sometimes it is convenient to have an abstract definition of what we mean by a relation.


Definition 1.5.1 By a relation on a set $X$, we mean a subset $R$ of the cartesian product $X\times X$. We usually write $xRy$ instead of $(x,y)\in R$ to denote that $x$ and $y$ are related. The symbols $\sim$ and $\equiv$ are often used to denote relations, and we then write $x\sim y$ and $x\equiv y$.

At first glance this definition may seem strange, as very few people think of relations as subsets of $X\times X$, but a little thought will convince you that it gives us a convenient starting point, especially if I add that in practice relations are rarely arbitrary subsets of $X\times X$, but have much more structure than the definition indicates. We shall take a look at one such class of relations, the equivalence relations. Equivalence relations are used to partition sets into subsets, and from a pedagogical point of view, it is probably better to start with the related notion of a partition.

Informally, a partition is what we get if we divide a set into non-overlapping pieces. More precisely, if $X$ is a set, a partition $\mathcal{P}$ of $X$ is a family of subsets of $X$ such that each element $x\in X$ belongs to exactly one set $P\in\mathcal{P}$. The elements $P$ of $\mathcal{P}$ are called partition classes of $\mathcal{P}$.

Given a partition $\mathcal{P}$ of $X$, we may introduce a relation $\sim$ on $X$ by

$$x\sim y\iff x \text{ and } y \text{ belong to the same set } P\in\mathcal{P}$$

It is easy to check that $\sim$ has the following three properties:

(i) $x\sim x$ for all $x\in X$.

(ii) If $x\sim y$, then $y\sim x$.

(iii) If $x\sim y$ and $y\sim z$, then $x\sim z$.

We say that $\sim$ is the relation induced by the partition $\mathcal{P}$. Let us now turn the tables around and start with a relation on $X$ satisfying conditions (i)-(iii):

Definition 1.5.2 An equivalence relation on $X$ is a relation $\sim$ satisfying the following conditions:

(i) Reflexivity: $x\sim x$ for all $x\in X$,

(ii) Symmetry: If $x\sim y$, then $y\sim x$,

(iii) Transitivity: If $x\sim y$ and $y\sim z$, then $x\sim z$.

Given an equivalence relation $\sim$ on $X$, we may for each $x\in X$ define the equivalence class $[x]$ of $x$ by:

$$[x]=\{y\in X\mid x\sim y\}$$

The following result tells us that there is a one-to-one correspondence between partitions and equivalence relations: all partitions induce an equivalence relation, and all equivalence relations define a partition.

Proposition 1.5.3 If $\sim$ is an equivalence relation on $X$, the collection of equivalence classes

$$\mathcal{P}=\{[x] : x\in X\}$$

is a partition of $X$.

Proof: We have to prove that each $x$ in $X$ belongs to exactly one equivalence class. We first observe that since $x\sim x$ by (i), $x\in[x]$ and hence $x$ belongs to at least one equivalence class. To finish the proof, we have to show that if $x\in[y]$ for some other element $y\in X$, then $[x]=[y]$.

We first prove that $[y]\subseteq[x]$. To this end assume that $z\in[y]$. By definition, this means that $y\sim z$. On the other hand, the assumption that $x\in[y]$ means that $y\sim x$, which by (ii) implies that $x\sim y$. We thus have $x\sim y$ and $y\sim z$, which by (iii) means that $x\sim z$. Thus $z\in[x]$, and hence we have proved that $[y]\subseteq[x]$.

The opposite inclusion $[x]\subseteq[y]$ is proved similarly: Assume that $z\in[x]$. By definition, this means that $x\sim z$. On the other hand, the assumption that $x\in[y]$ means that $y\sim x$. We thus have $y\sim x$ and $x\sim z$, which by (iii) implies that $y\sim z$. Thus $z\in[y]$, and we have proved that $[x]\subseteq[y]$. □

The main reason why this result is useful is that it is often more natural to describe situations through equivalence relations than through partitions. The following example assumes that you remember a little linear algebra:

Example 1.5.3: Let $V$ be a vector space and $U$ a subspace. Define a relation on $V$ by

$$x\sim y\iff x-y\in U$$

Let us show that $\sim$ is an equivalence relation by checking the three conditions (i)-(iii) in the definition:

(i) Since $x-x=0\in U$, we see that $x\sim x$ for all $x\in V$.

(ii) Assume that $x\sim y$. This means that $x-y\in U$, and hence $y-x=-(x-y)\in U$ since subspaces are closed under multiplication by scalars. This means that $y\sim x$.

(iii) If $x\sim y$ and $y\sim z$, then $x-y\in U$ and $y-z\in U$. Since subspaces are closed under addition, this means that $x-z=(x-y)+(y-z)\in U$, and hence $x\sim z$.

As we have now proved that $\sim$ is an equivalence relation, the equivalence classes of $\sim$ form a partition of $V$. ♣

If $\sim$ is an equivalence relation on $X$, we let $X/\!\sim$ denote the set of all equivalence classes of $\sim$. Such quotient constructions are common in all parts of mathematics, and you will see a few examples in this book.
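On a finite set, the classes $[x]$ can be computed straight from the definition, and the conclusion of Proposition 1.5.3 can be checked for a particular relation. The sketch below uses congruence modulo 3 on $\{-6,\ldots,6\}$ (compare Exercise 6); the function names are ours, and `related` is assumed to be an equivalence relation. Classes are stored as `frozenset`s because they are themselves elements of the set $X/\!\sim$:

```python
def equivalence_classes(X, related):
    """Return X/~ as a set of frozensets, with [x] = {y in X | x ~ y}."""
    return {frozenset(y for y in X if related(x, y)) for x in X}


def is_partition(X, classes):
    """True if no class is empty and every x in X lies in exactly one class."""
    return all(classes) and all(sum(x in c for c in classes) == 1 for x in X)


def congruent_mod_3(x, y):
    return (x - y) % 3 == 0


X = range(-6, 7)
classes = equivalence_classes(X, congruent_mod_3)
print(len(classes), is_partition(X, classes))   # 3 True
```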


Exercises for Section 1.5

1. Let $\mathcal{P}$ be a partition of a set $A$, and define a relation $\sim$ on $A$ by

$$x\sim y\iff x \text{ and } y \text{ belong to the same set } P\in\mathcal{P}$$

Check that $\sim$ is an equivalence relation.

2. Assume that $\mathcal{P}$ is the partition defined by an equivalence relation $\sim$. Show that $\sim$ is the equivalence relation induced by $\mathcal{P}$.

3. Let $\mathcal{L}$ be the collection of all lines in the plane. Define a relation on $\mathcal{L}$ by saying that two lines are equivalent if and only if they are parallel. Show that this is an equivalence relation on $\mathcal{L}$.

4. Define a relation on $\mathbb{C}$ by

$$z\sim w\iff |z|=|w|$$

Show that $\sim$ is an equivalence relation. What do the equivalence classes look like?

5. Define a relation $\sim$ on $\mathbb{R}^3$ by

$$(x,y,z)\sim(x',y',z')\iff 3x-y+2z=3x'-y'+2z'$$

Show that $\sim$ is an equivalence relation and describe the equivalence classes of $\sim$.

6. Let $m$ be a natural number. Define a relation $\equiv$ on $\mathbb{Z}$ by

$$x\equiv y\iff x-y \text{ is divisible by } m$$

Show that $\equiv$ is an equivalence relation on $\mathbb{Z}$. How many partition classes are there, and what do they look like?

7. Let $\mathcal{M}$ be the set of all $n\times n$ matrices. Define a relation $\sim$ on $\mathcal{M}$ by

$$A\sim B\iff \text{there exists an invertible matrix } P \text{ such that } A=P^{-1}BP$$

Show that $\sim$ is an equivalence relation.

1.6 Countability

A set $A$ is called countable if it is possible to make a list $a_1,a_2,\ldots,a_n,\ldots$ which contains all elements of $A$. Finite sets $A=\{a_1,a_2,\ldots,a_m\}$ are obviously countable¹ as they can be listed

$$a_1,a_2,\ldots,a_m,a_m,a_m,\ldots$$

(you may list the same elements many times). The set $\mathbb{N}$ of all natural numbers is also countable as it is automatically listed by

$$1,2,3,\ldots$$

¹Some books exclude the finite sets from the countable and treat them as a separate category, but that would be impractical for our purposes.

It is a little less obvious that the set $\mathbb{Z}$ of all integers is countable, but we may use the list

$$0,1,-1,2,-2,3,-3,\ldots$$

It is also easy to see that a subset of a countable set must be countable, and that the image $f(A)$ of a countable set is countable (if $\{a_n\}$ is a listing of $A$, then $\{f(a_n)\}$ is a listing of $f(A)$).
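A listing is naturally expressed as a Python generator. This sketch (the name `integers` is ours) produces the list $0,1,-1,2,-2,\ldots$ of $\mathbb{Z}$ used above; the integer $-k$ appears in position $2k$ when positions are counted from 0:

```python
from itertools import count, islice


def integers():
    """Yield 0, 1, -1, 2, -2, 3, -3, ..., a list containing every integer."""
    yield 0
    for n in count(1):
        yield n
        yield -n


print(list(islice(integers(), 7)))   # [0, 1, -1, 2, -2, 3, -3]
```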

The next result is perhaps more surprising:

Proposition 1.6.1 If the sets $A,B$ are countable, so is the cartesian product $A\times B$.

Proof: Since $A$ and $B$ are countable, there are lists $\{a_n\}$, $\{b_n\}$ containing all the elements of $A$ and $B$, respectively. But then

$$(a_1,b_1),(a_2,b_1),(a_1,b_2),(a_3,b_1),(a_2,b_2),(a_1,b_3),(a_4,b_1),(a_3,b_2),\ldots$$

is a list containing all elements of $A\times B$ (observe how the list is made: first we list the (only) element $(a_1,b_1)$ where the indices sum to 2, then we list the elements $(a_2,b_1),(a_1,b_2)$ where the indices sum to 3, then the elements $(a_3,b_1),(a_2,b_2),(a_1,b_3)$ where the indices sum to 4, etc.). □
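The diagonal order in the proof is easy to generate. In this sketch the listings of $A$ and $B$ are passed in as functions `a(n)`, `b(n)` giving the $n$-th element, $n=1,2,\ldots$; that interface, like the function names, is an assumption made for the illustration:

```python
from itertools import count, islice


def diagonal_pairs(a, b):
    """Yield (a_i, b_j) for i + j = 2, 3, 4, ..., in the order used in the proof."""
    for s in count(2):                  # s = i + j
        for i in range(s - 1, 0, -1):   # i = s-1, ..., 1 and j = s - i
            yield (a(i), b(s - i))


def identity(n):
    return n


print(list(islice(diagonal_pairs(identity, identity), 6)))
# [(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3)]
```

With `a = b = identity` the generator lists $\mathbb{N}\times\mathbb{N}$, and every pair $(i,j)$ is reached after finitely many steps, which is exactly what a listing requires.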

Remark If $A_1,A_2,\ldots,A_n$ is a finite collection of countable sets, then the cartesian product $A_1\times A_2\times\cdots\times A_n$ is countable. This can be proved by induction from the proposition above, using that $A_1\times\cdots\times A_k\times A_{k+1}$ is essentially the same set as $(A_1\times\cdots\times A_k)\times A_{k+1}$.

The same trick we used to prove Proposition 1.6.1 can also be used to prove the next result:

Proposition 1.6.2 If the sets $A_1,A_2,\ldots,A_n,\ldots$ are countable, so is their union $\bigcup_{n\in\mathbb{N}}A_n$. Hence a countable union of countable sets is itself countable.

Proof: Let $A_i=\{a_{i1},a_{i2},\ldots,a_{in},\ldots\}$ be a listing of the $i$-th set. Then

$$a_{11},a_{21},a_{12},a_{31},a_{22},a_{13},a_{41},a_{32},\ldots$$

is a listing of $\bigcup_{i\in\mathbb{N}}A_i$. □

Proposition 1.6.1 can also be used to prove that the rational numbers are countable:


Proposition 1.6.3 The set Q of all rational numbers is countable.

Proof: According to Proposition 1.6.1, the set Z × N is countable and can be listed (a1, b1), (a2, b2), (a3, b3), . . . . But then a1/b1, a2/b2, a3/b3, . . . is a list of all the elements in Q (due to cancellations, all rational numbers will appear infinitely many times in this list, but that doesn't matter). □
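The same idea can be run for Q. In this sketch (hypothetical names, using Python's standard `fractions.Fraction` for exact rational arithmetic), Z is listed as 0, 1, −1, 2, −2, . . . as earlier in this section, N as 1, 2, 3, . . ., and the pairs are traversed diagonally as in the proof of Proposition 1.6.1. Repeats are kept in the listing, just as in the proof; the usage lines only strip them to show the first distinct rationals.

```python
from fractions import Fraction
from itertools import count, islice

def nth_integer(k):
    """k-th element (k = 1, 2, 3, ...) of the listing 0, 1, -1, 2, -2, ... of Z."""
    if k == 1:
        return 0
    return k // 2 if k % 2 == 0 else -(k // 2)

def rationals():
    """Yield a_i / b_j for the pairs (i, j) in diagonal order, where a_i is the
    i-th integer in the listing above and b_j = j. Every rational appears
    (infinitely often, because of cancellations)."""
    for s in count(2):
        for i in range(s - 1, 0, -1):
            j = s - i
            yield Fraction(nth_integer(i), j)

# Usage: collect the distinct values among the first 100 terms.
distinct = []
for q in islice(rationals(), 100):
    if q not in distinct:
        distinct.append(q)
print([str(q) for q in distinct[:6]])   # ['0', '1', '-1', '1/2', '2', '-1/2']
```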

Finally, we prove an important result in the opposite direction:

Theorem 1.6.4 The set R of all real numbers is not countable.

Proof: (Cantor's diagonal argument) Assume for contradiction that R is countable and can be listed r1, r2, r3, . . . . Let us write down the decimal expansions of the numbers on the list:

r1 = w1.a11 a12 a13 a14 . . .
r2 = w2.a21 a22 a23 a24 . . .
r3 = w3.a31 a32 a33 a34 . . .
r4 = w4.a41 a42 a43 a44 . . .
⋮

(wi is the integer part of ri, and ai1, ai2, ai3, . . . are the decimals). To get our contradiction, we introduce a new decimal number c = 0.c1c2c3c4 . . . where the decimals are defined by:

$$c_i = \begin{cases} 1 & \text{if } a_{ii} \neq 1 \\ 2 & \text{if } a_{ii} = 1 \end{cases}$$

This number has to be different from the i-th number ri on the list, as the decimal expansions disagree in the i-th place (since c has only 1 and 2 as decimals, there are no problems with nonuniqueness of decimal expansions). This is a contradiction, as we assumed that all real numbers were on the list. □
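The construction of c is completely mechanical, which the following Python sketch tries to show. Since no program can handle an infinite list, the hypothetical function `diagonal_digits` works on a finite beginning of a list, given as the decimal strings of r1, r2, . . . (the integer parts wi play no role); it returns the first decimals of c, chosen by the rule above. The sample list is made up for illustration.

```python
def diagonal_digits(decimals):
    """decimals[k] is the string of decimals a_{k+1,1} a_{k+1,2} ... of r_{k+1}
    (at least k + 1 digits). Return c_1 c_2 ... c_n, where
    c_i = 1 if a_ii != 1 and c_i = 2 if a_ii = 1."""
    digits = []
    for k, row in enumerate(decimals):
        a_ii = row[k]                      # i-th decimal of the i-th number (i = k + 1)
        digits.append("2" if a_ii == "1" else "1")
    return "".join(digits)

# Decimals of the first five numbers of a made-up list r1, r2, ..., r5.
listing = ["1415926", "7182818", "4142135", "0000000", "1111111"]
c = "0." + diagonal_digits(listing)
print(c)   # 0.22112
```

However many numbers we feed in, the output differs from the i-th of them in the i-th decimal, which is exactly the property the proof uses.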


1. Show that a subset of a countable set is countable.

2. Show that if A1, A2, . . . , An are countable, then A1 × A2 × · · · × An is countable.

3. Show that the set of all finite sequences (q1, q2, . . . , qk), k ∈ N, of rational numbers is countable.

4. Show that if A is an infinite, countable set, then there is a list a1, a2, a3, . . . which only contains elements in A and where each element in A appears only once. Show that if A and B are two infinite, countable sets, there is a bijection (i.e. an injective and surjective function) f : A → B.


5. Show that the set of all subsets of N is not countable (Hint: Try to modify the proof of Theorem 1.6.4.)


Chapter 2

Metric Spaces

Many of the arguments you have seen in several variable calculus are almost identical to the corresponding arguments in one variable calculus, especially arguments concerning convergence and continuity. The reason is that the notions of convergence and continuity can be formulated in terms of distance, and that the notion of distance between numbers that you need in the one variable theory is very similar to the notion of distance between points or vectors that you need in the theory of functions of several variables. In more advanced mathematics, we need to find the distance between more complicated objects than numbers and vectors, e.g. between sequences, sets and functions. These new notions of distance lead to new notions of convergence and continuity, and these again lead to new arguments surprisingly similar to those we have already seen in one and several variable calculus.

After a while it becomes quite boring to perform almost the same arguments over and over again in new settings, and one begins to wonder if there is a general theory that covers all these examples: is it possible to develop a general theory of distance where we can prove the results we need once and for all? The answer is yes, and the theory is called the theory of metric spaces.

A metric space is just a set X equipped with a function d of two variables which measures the distance between points: d(x, y) is the distance between two points x and y in X. It turns out that if we put mild and natural conditions on the function d, we can develop a general notion of distance that covers distances between numbers, vectors, sequences, functions, sets and much more. Within this theory we can formulate and prove results about convergence and continuity once and for all. The purpose of this chapter is to develop the basic theory of metric spaces. In later chapters we shall meet some of the applications of the theory.


2.1 Definitions and examples

As already mentioned, a metric space is just a set X equipped with a function d : X × X → R which measures the distance d(x, y) between points x, y ∈ X. For the theory to work, we need the function d to have properties similar to the distance functions we are familiar with. So what properties do we expect from a measure of distance?

First of all, the distance d(x, y) should be a nonnegative number, and it should only be equal to zero if x = y. Second, the distance d(x, y) from x to y should equal the distance d(y, x) from y to x. Note that this is not always a reasonable assumption (if we, e.g., measure the distance from x to y by the time it takes to walk from x to y, d(x, y) and d(y, x) may be different), but we shall restrict ourselves to situations where the condition is satisfied. The third condition we shall need says that the distance obtained by going directly from x to y should always be less than or equal to the distance we get when we go via a third point z, i.e.

d(x, y) ≤ d(x, z) + d(z, y)

It turns out that these conditions are the only ones we need, and we sum them up in a formal definition.

Definition 2.1.1 A metric space (X, d) consists of a non-empty set X and a function d : X × X → [0, ∞) such that:

(i) (Positivity) For all x, y ∈ X, d(x, y) ≥ 0 with equality if and only if x = y.

(ii) (Symmetry) For all x, y ∈ X, d(x, y) = d(y, x).

(iii) (Triangle inequality) For all x, y, z ∈ X

d(x, y) ≤ d(x, z) + d(z, y)

A function d satisfying conditions (i)-(iii) is called a metric on X.

Comment: When it is clear – or irrelevant – which metric d we have in mind, we shall often refer to “the metric space X” rather than “the metric space (X, d)”.
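When experimenting with a candidate metric, it can be useful to test conditions (i)-(iii) on a finite set of sample points. The following Python sketch (the name `check_metric` is hypothetical) does exactly that. Passing such a test is only evidence, since the definition requires the conditions for all points of X, but a failure gives a concrete counterexample.

```python
from itertools import product

def check_metric(points, d, tol=1e-12):
    """Check (i) positivity, (ii) symmetry and (iii) the triangle inequality
    of Definition 2.1.1 on a finite list of sample points.

    Returns a description of the first violation found, or None.
    """
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return f"positivity fails for {x!r}, {y!r}"
        if abs(d(x, y) - d(y, x)) > tol:
            return f"symmetry fails for {x!r}, {y!r}"
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return f"triangle inequality fails for x={x!r}, y={y!r}, z={z!r}"
    return None

def discrete(x, y):
    """The discrete metric (Example 6 below)."""
    return 0 if x == y else 1

def squared_difference(x, y):
    """(x - y)^2 on R: positive and symmetric, but not a metric."""
    return (x - y) ** 2

print(check_metric(["a", "b", "c"], discrete))            # None
print(check_metric([0.0, 1.0, 2.0], squared_difference))  # triangle inequality fails for x=0.0, y=2.0, z=1.0
```

The tolerance `tol` only absorbs floating-point rounding; for exact inputs such as integers it plays no role.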

Let us take a look at some examples of metric spaces.

Example 1: If we let d(x, y) = |x − y|, then (R, d) is a metric space. The first two conditions are obviously satisfied, and the third follows from the ordinary triangle inequality for real numbers:

d(x, y) = |x− y| = |(x− z) + (z − y)| ≤ |x− z|+ |z − y| = d(x, z) + d(z, y)

Example 2: If we let d(x, y) = |x − y|, then (Rn, d) is a metric space. The first two conditions are obviously satisfied, and the third follows from the triangle inequality for vectors the same way as above:

d(x,y) = |x− y| = |(x− z) + (z− y)| ≤ |x− z|+ |z− y| = d(x, z) + d(z,y)

Example 3: Assume that we want to move from one point x = (x1, x2) in the plane to another y = (y1, y2), but that we are only allowed to move horizontally and vertically. If we first move horizontally from (x1, x2) to (y1, x2) and then vertically from (y1, x2) to (y1, y2), the total distance is

d(x, y) = |y1 − x1| + |y2 − x2|

This gives us a metric on R2 which is different from the usual metric in Example 2. It is often referred to as the Manhattan metric or the taxi cab metric.

Also in this case the first two conditions of a metric space are obviously satisfied. To prove the triangle inequality, observe that for any third point z = (z1, z2), we have

d(x, y) = |y1 − x1| + |y2 − x2| =
= |(y1 − z1) + (z1 − x1)| + |(y2 − z2) + (z2 − x2)| ≤
≤ |y1 − z1| + |z1 − x1| + |y2 − z2| + |z2 − x2| =
= |z1 − x1| + |z2 − x2| + |y1 − z1| + |y2 − z2| =
= d(x, z) + d(z, y)

where we have used the ordinary triangle inequality for real numbers to get from the second to the third line. ♣
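Here is a small Python sketch of the Manhattan metric next to the usual metric from Example 2 (function names are hypothetical), followed by a random spot-check of the triangle inequality just proved. The check is an illustration, not a substitute for the proof.

```python
import random

def manhattan(x, y):
    """Taxi cab distance |y1 - x1| + |y2 - x2| between points of R2."""
    return abs(y[0] - x[0]) + abs(y[1] - x[1])

def euclidean(x, y):
    """The usual distance in R2 (Example 2), for comparison."""
    return ((y[0] - x[0]) ** 2 + (y[1] - x[1]) ** 2) ** 0.5

print(manhattan((0.0, 0.0), (3.0, 4.0)), euclidean((0.0, 0.0), (3.0, 4.0)))   # 7.0 5.0

# Spot-check the triangle inequality on random triples (an illustration, not a proof).
rng = random.Random(0)
for _ in range(10_000):
    x, y, z = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(3)]
    assert manhattan(x, y) <= manhattan(x, z) + manhattan(z, y) + 1e-12
```

The small tolerance in the assertion only guards against floating-point rounding.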

Example 4: We shall now take a look at an example of a different kind. Assume that we want to send messages in a language with N symbols (letters, numbers, punctuation marks, space, etc.). We assume that all messages have the same length K (if they are too short or too long, we either fill them out or break them into pieces). We let X be the set of all messages, i.e. all sequences of symbols from the language of length K. If x = (x1, x2, . . . , xK) and y = (y1, y2, . . . , yK) are two messages, we define

d(x, y) = the number of indices n such that xn ≠ yn

It is not hard to check that d is a metric. It is usually referred to as the Hamming metric, and is much used in coding theory, where it serves as a measure of how much a message gets distorted during transmission. ♣
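The Hamming metric is straightforward to compute. In this Python sketch (the function name and the sample messages are made up for illustration), messages are strings of the same length K, and the distance counts the positions where they disagree.

```python
def hamming(x, y):
    """Number of indices n with x_n != y_n, for two messages of the same length K."""
    if len(x) != len(y):
        raise ValueError("messages must have the same length K")
    return sum(1 for xn, yn in zip(x, y) if xn != yn)

sent = "METRIC SPACE"
received = "METRIX SPADE"
print(hamming(sent, received))   # 2
```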


Example 5: There are many ways to measure the distance between functions, and in this example we shall look at some. Let X be the set of all continuous functions f : [a, b] → R. Then

d1(f, g) = sup{|f(x) − g(x)| : x ∈ [a, b]}

is a metric on X. This metric determines the distance between two functions by measuring the distance at the x-value where the graphs are farthest apart. This means that the distance between two functions may be large even if the functions are quite close on average. The metric

$$d_2(f, g) = \int_a^b |f(x) - g(x)|\,dx$$

instead sums up the distance between f(x) and g(x) at all points. A third popular metric is

$$d_3(f, g) = \left( \int_a^b |f(x) - g(x)|^2\,dx \right)^{1/2}$$

This metric is a generalization of the usual (Euclidean) metric in Rn:

$$d(x, y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2} = \left( \sum_{i=1}^n (x_i - y_i)^2 \right)^{1/2}$$

(think of the integral as a generalized sum). That we have more than one metric on X doesn't mean that one of them is “right” and the others “wrong”, but that they are useful for different purposes. ♣
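The difference between d1 and the integral metrics can be seen numerically. The Python sketch below (hypothetical names; plain Python, no libraries beyond `math`) approximates d1 by a maximum over grid points and d2, d3 by the midpoint rule, so all values are approximations. The example compares the zero function with a narrow "spike" of height 1: in d1 the two functions are far apart, while in d2 and d3 they are close, which is the phenomenon described above.

```python
import math

def d1(f, g, a, b, n=10_000):
    """sup{|f(x) - g(x)| : x in [a, b]}, approximated by a max over n + 1 grid points."""
    h = (b - a) / n
    return max(abs(f(a + k * h) - g(a + k * h)) for k in range(n + 1))

def _midpoints(a, b, n):
    """Midpoints of n equal subintervals of [a, b], and their common length."""
    h = (b - a) / n
    return [a + (k + 0.5) * h for k in range(n)], h

def d2(f, g, a, b, n=10_000):
    """Integral of |f(x) - g(x)| over [a, b], approximated by the midpoint rule."""
    xs, h = _midpoints(a, b, n)
    return h * sum(abs(f(x) - g(x)) for x in xs)

def d3(f, g, a, b, n=10_000):
    """(Integral of |f(x) - g(x)|^2 over [a, b])^(1/2), approximated by the midpoint rule."""
    xs, h = _midpoints(a, b, n)
    return math.sqrt(h * sum((f(x) - g(x)) ** 2 for x in xs))

def zero(x):
    return 0.0

def spike(x):
    """Continuous tent of height 1 on [0.49, 0.51], zero elsewhere."""
    return max(0.0, 1.0 - abs(x - 0.5) / 0.01)

print(d1(zero, spike, 0, 1))   # 1.0: the graphs are far apart at x = 0.5
print(d2(zero, spike, 0, 1))   # close to 0.01, the area under the spike
print(d3(zero, spike, 0, 1))   # close to (0.02 / 3) ** 0.5, about 0.0816
```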

Example 6: The metrics in this example may seem rather strange. Although they are not very useful in applications, they are important to know about, as they are totally different from the metrics we are used to from Rn and may help sharpen our intuition of what a metric can look like. Let X be any non-empty set, and define:

$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases}$$

It is not hard to check that d is a metric on X, usually referred to as the discrete metric. ♣

Example 7: There are many ways to make new metric spaces from old. The simplest is the subspace metric: If (X, d) is a metric space and A is a non-empty subset of X, we can make a metric dA on A by putting dA(x, y) = d(x, y) for all x, y ∈ A; we simply restrict the metric to A. It is trivial to check that dA is a metric on A. In practice, we rarely bother to change the name of the metric and refer to dA simply as d, but remember in the back of our head that d is now restricted to A. ♣

There are many more types of metric spaces than we have seen so far, but the hope is that the examples above will give you a certain impression of the variety of the concept. In the next section we shall see how we can define convergence and continuity for sequences and functions in metric spaces. When we prove theorems about these concepts, they automatically hold in all metric spaces, saving us the labor of having to prove them over and over again each time we introduce a new class of spaces.

An important question is when two metric spaces (X, dX) and (Y, dY) are the same. The easy answer is to say that we need the sets X, Y and the functions dX, dY to be equal. This is certainly correct if one interprets “being the same” in the strictest sense, but it is often more appropriate to use a looser definition: in mathematics we are usually not interested in what the elements of a set are, but only in the relationship between them (you may, e.g., want to ask yourself what the natural number 3 “is”).

An isometry between two metric spaces is a bijection which preserves what is important for metric spaces: the distance between points. More precisely:

Definition 2.1.2 Assume that (X, dX) and (Y, dY) are metric spaces. An isometry from (X, dX) to (Y, dY) is a bijection i : X → Y such that dX(x, y) = dY(i(x), i(y)) for all x, y ∈ X. We say that (X, dX) and (Y, dY) are isometric if there exists an isometry from (X, dX) to (Y, dY).

In many situations it is convenient to think of two metric spaces as “the same” if they are isometric. Note that if i is an isometry from (X, dX) to (Y, dY), then the inverse i⁻¹ is an isometry from (Y, dY) to (X, dX), and hence being isometric is a symmetric relation.
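Definition 2.1.2 can be spot-checked numerically. The Python sketch below (hypothetical names; it anticipates Exercise 12) builds the map x ↦ Ux + b on R2, where U is a rotation matrix, together with its inverse y ↦ Uᵀ(y − b), and checks on random points that distances in the usual metric are preserved and that the inverse undoes the map. This is evidence, not a proof; `math.dist` requires Python 3.8 or later.

```python
import math
import random

def rotation_plus_shift(theta, b):
    """The map x -> U x + b on R2, where U is the rotation matrix
    [[cos theta, -sin theta], [sin theta, cos theta]] (an orthogonal matrix)."""
    c, s = math.cos(theta), math.sin(theta)
    def i(x):
        return (c * x[0] - s * x[1] + b[0], s * x[0] + c * x[1] + b[1])
    return i

def inverse_map(theta, b):
    """The inverse y -> U^T (y - b), using U^{-1} = U^T."""
    c, s = math.cos(theta), math.sin(theta)
    def j(y):
        u, v = y[0] - b[0], y[1] - b[1]
        return (c * u + s * v, -s * u + c * v)
    return j

theta, b = 0.7, (2.0, -1.0)
i, j = rotation_plus_shift(theta, b), inverse_map(theta, b)

rng = random.Random(1)
pts = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(50)]
# Largest change in distance over all pairs; only rounding error should remain.
worst = max(abs(math.dist(x, y) - math.dist(i(x), i(y))) for x in pts for y in pts)
print(worst < 1e-9)                                      # True
print(all(math.dist(j(i(x)), x) < 1e-9 for x in pts))   # True
```

Replacing `math.dist` by the Manhattan metric of Example 3 would make the first check fail for this angle, a reminder that being an isometry depends on the metrics involved.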

A map which preserves distance, but does not necessarily hit all of Y, is called an embedding:

Definition 2.1.3 Assume that (X, dX) and (Y, dY) are metric spaces. An embedding of (X, dX) into (Y, dY) is an injection i : X → Y such that dX(x, y) = dY(i(x), i(y)) for all x, y ∈ X.

Note that an embedding i can be regarded as an isometry between X and its image i(X).

We end this section with an important consequence of the triangle inequality.


Proposition 2.1.4 (Inverse Triangle Inequality) For all elements x, y, z in a metric space (X, d), we have

|d(x, y)− d(x, z)| ≤ d(y, z)

Proof: Since the absolute value |d(x, y) − d(x, z)| is the larger of the two numbers d(x, y) − d(x, z) and d(x, z) − d(x, y), it suffices to show that they are both less than or equal to d(y, z). By the triangle inequality

d(x, y) ≤ d(x, z) + d(z, y)

and hence d(x, y) − d(x, z) ≤ d(z, y) = d(y, z). To get the other inequality, we use the triangle inequality again:

d(x, z) ≤ d(x, y) + d(y, z)

and hence d(x, z) − d(x, y) ≤ d(y, z). □


1. Show that (X, d) in Example 4 is a metric space.

2. Show that (X, d1) in Example 5 is a metric space.

3. Show that (X, d2) in Example 5 is a metric space.

4. Show that (X, d) in Example 6 is a metric space.

5. A sequence {xn}n∈N of real numbers is called bounded if there is a number M ∈ R such that |xn| ≤ M for all n ∈ N. Let X be the set of all bounded sequences. Show that

d({xn}, {yn}) = sup{|xn − yn| : n ∈ N}

is a metric on X.

6. If V is a (real) vector space, a function || · || : V → R is called a norm if the following conditions are satisfied:

(i) For all x ∈ V , ||x|| ≥ 0 with equality if and only if x = 0.

(ii) ||αx|| = |α| ||x|| for all α ∈ R and all x ∈ V.

(iii) ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ V.

Show that if || · || is a norm, then d(x, y) = ||x− y|| defines a metric on V .

7. Show that for vectors x,y, z ∈ Rm,

| |x− y| − |x− z| | ≤ |y − z|

8. Assume that d1 and d2 are two metrics on X. Show that

d(x, y) = d1(x, y) + d2(x, y)

is a metric on X.


9. Assume that (X, dX) and (Y, dY ) are two metric spaces. Define a function

d : (X × Y )× (X × Y )→ R

by

d((x1, y1), (x2, y2)) = dX(x1, x2) + dY(y1, y2)

Show that d is a metric on X × Y .

10. Let X be a non-empty set, and let ρ : X ×X → R be a function satisfying:

(i) ρ(x, y) ≥ 0 with equality if and only if x = y.

(ii) ρ(x, y) ≤ ρ(x, z) + ρ(z, y) for all x, y, z ∈ X.

Define d : X ×X → R by

d(x, y) = max{ρ(x, y), ρ(y, x)}

Show that d is a metric on X.

11. Let a ∈ R. Show that the function f(x) = x+ a is an isometry from R to R.

12. Recall that an n × n matrix U is orthogonal if U⁻¹ = Uᵀ. Show that if U is orthogonal and b ∈ Rn, then the mapping i : Rn → Rn given by i(x) = Ux + b is an isometry.

2.2 Convergence and continuity

We begin our study of metric spaces by defining convergence. A sequence {xn} in a metric space X is just an ordered collection x1, x2, x3, . . . , xn, . . . of elements in X enumerated by the natural numbers.

Definition 2.2.1 Let (X, d) be a metric space. A sequence {xn} in X converges to a point a ∈ X if for every ε > 0 there exists an N ∈ N such that d(xn, a) < ε for all n ≥ N. We write limn→∞ xn = a or xn → a.

Note that this definition exactly mimics the definition of convergence in R and Rn. Here is an alternative formulation.

Lemma 2.2.2 A sequence {xn} in a metric space (X, d) converges to a if and only if limn→∞ d(xn, a) = 0.

Proof: The distances d(xn, a) form a sequence of nonnegative numbers. This sequence converges to 0 if and only if for every ε > 0 there exists an N ∈ N such that d(xn, a) < ε when n ≥ N. But this is exactly what the definition above says. □

May a sequence converge to more than one point? We know that it cannot in Rn, but some of these new metric spaces are so strange that we cannot be certain without a proof.


Proposition 2.2.3 A sequence in a metric space cannot converge to more than one point.

Proof: Assume that limn→∞ xn = a and limn→∞ xn = b. We must show that this is only possible if a = b. According to the triangle inequality

d(a, b) ≤ d(a, xn) + d(xn, b)

Taking limits, we get

$$d(a, b) \le \lim_{n\to\infty} d(a, x_n) + \lim_{n\to\infty} d(x_n, b) = 0 + 0 = 0$$

Consequently, d(a, b) = 0, and according to point (i) (positivity) in the definition of metric spaces, a = b. □

Note how we use the conditions in Definition 2.1.1 in the proof above. So far they are all we know about metric spaces. As the theory develops, we shall get more and more tools to work with.

We can also phrase the notion of convergence in more geometric terms. If a is an element of a metric space X, and r is a positive number, the (open) ball centered at a with radius r is the set

B(a; r) = {x ∈ X | d(x, a) < r}

As the terminology suggests, we think of B(a; r) as a ball around a with radius r. Note that x ∈ B(a; r) means exactly the same as d(x, a) < r.

The definition of convergence can now be rephrased by saying that {xn} converges to a if the elements of the sequence {xn} eventually end up inside any ball B(a; ε) around a.

Let us now see how we can define continuity in metric spaces.

Definition 2.2.4 Assume that (X, dX), (Y, dY) are two metric spaces. A function f : X → Y is continuous at a point a ∈ X if for every ε > 0 there is a δ > 0 such that dY(f(x), f(a)) < ε whenever dX(x, a) < δ.

This definition says exactly the same as the usual definitions of continuity for functions of one or several variables: we can get the distance between f(x) and f(a) smaller than ε by choosing x such that the distance between x and a is smaller than δ. The only difference is that we are now using the metrics dX and dY to measure the distances.

A more geometric formulation of the definition is to say that for any open ball B(f(a); ε) around f(a), there is an open ball B(a; δ) around a such that f(B(a; δ)) ⊆ B(f(a); ε) (make a drawing!).

There is a close connection between continuity and convergence which reflects our intuitive feeling that f is continuous at a point a if f(x) approaches f(a) whenever x approaches a.

Proposition 2.2.5 The following are equivalent for a function f : X → Y between metric spaces:

(i) f is continuous at a point a ∈ X.

(ii) For all sequences {xn} converging to a, the sequence {f(xn)} converges to f(a).

Proof: (i) ⇒ (ii): We must show that for any ε > 0, there is an N ∈ N such that dY(f(xn), f(a)) < ε when n ≥ N. Since f is continuous at a, there is a δ > 0 such that dY(f(x), f(a)) < ε whenever dX(x, a) < δ. Since {xn} converges to a, there is an N ∈ N such that dX(xn, a) < δ when n ≥ N. But then dY(f(xn), f(a)) < ε for all n ≥ N.

(ii) ⇒ (i): We argue contrapositively: Assume that f is not continuous at a. We shall show that there is a sequence {xn} converging to a such that {f(xn)} does not converge to f(a). That f is not continuous at a means that there is an ε > 0 such that no matter how small we choose δ > 0, there is an x such that dX(x, a) < δ, but dY(f(x), f(a)) ≥ ε. In particular, we can for each n ∈ N find an xn such that dX(xn, a) < 1/n, but dY(f(xn), f(a)) ≥ ε. Then {xn} converges to a, but {f(xn)} does not converge to f(a). □

The composition of two continuous functions is continuous.

Proposition 2.2.6 Let (X, dX), (Y, dY), (Z, dZ) be three metric spaces. Assume that f : X → Y and g : Y → Z are two functions, and let h : X → Z be the composition h(x) = g(f(x)). If f is continuous at the point a ∈ X and g is continuous at the point b = f(a), then h is continuous at a.

Proof: Assume that {xn} converges to a. Since f is continuous at a, the sequence {f(xn)} converges to f(a), and since g is continuous at b = f(a), the sequence {g(f(xn))} converges to g(f(a)), i.e. {h(xn)} converges to h(a). By the proposition above, h is continuous at a. □

As in calculus, a function is called continuous if it is continuous at all points:

Definition 2.2.7 A function f : X → Y between two metric spaces is called continuous if it is continuous at all points x in X.

Occasionally, we need to study functions which are only defined on a subset A of our metric space X. We define continuity of such functions by restricting the conditions to elements in A:

Definition 2.2.8 Assume that (X, dX), (Y, dY) are two metric spaces and that A is a subset of X. A function f : A → Y is continuous at a point a ∈ A if for every ε > 0 there is a δ > 0 such that dY(f(x), f(a)) < ε whenever x ∈ A and dX(x, a) < δ. We say that f is continuous if it is continuous at all a ∈ A.

There is another way of formulating this definition that is often useful: We can think of f as a function from the metric space (A, dA) (recall Example 7 in Section 2.1) to (Y, dY) and use the original definition of continuity in Definition 2.2.4. By just writing it out, it is easy to see that this definition says exactly the same as the one above. The advantage of the second definition is that it makes it easier to transfer results from the full to the restricted setting; e.g., it is now easy to see that Proposition 2.2.5 can be generalized to:

Proposition 2.2.9 Assume that (X, dX) and (Y, dY) are metric spaces and that A ⊆ X. Then the following are equivalent for a function f : A → Y:

(i) f is continuous at a point a ∈ A.

(ii) For all sequences {xn} in A converging to a, the sequence {f(xn)} converges to f(a).


1. Assume that (X, d) is a discrete metric space (recall Example 6 in Section 2.1). Show that the sequence {xn} converges to a if and only if there is an N ∈ N such that xn = a for all n ≥ N.

2. Prove Proposition 2.2.6 without using Proposition 2.2.5, i.e. use only the definition of continuity.

3. Prove Proposition 2.2.9.

4. Assume that (X, d) is a metric space, and let R have the usual metric dR(x, y) = |x − y|. Assume that f, g : X → R are continuous functions.

a) Show that cf is continuous for all constants c ∈ R.

b) Show that f + g is continuous.

c) Show that fg is continuous.

5. Let (X, d) be a metric space and choose a point a ∈ X. Show that the function f : X → R given by f(x) = d(x, a) is continuous (we are using the usual metric dR(x, y) = |x − y| on R).

6. Let (X, dX) and (Y, dY) be two metric spaces. A function f : X → Y is said to be a Lipschitz function if there is a constant K ∈ R such that dY(f(u), f(v)) ≤ K dX(u, v) for all u, v ∈ X. Show that all Lipschitz functions are continuous.

7. Let dR be the usual metric on R and let ddisc be the discrete metric on R. Let id : R → R be the identity function id(x) = x. Show that

id : (R, ddisc)→ (R, dR)


is continuous, but that

id : (R, dR)→ (R, ddisc)

is not continuous. Note that this shows that the inverse of a bijective, continuous function is not necessarily continuous.

8. Assume that d1 and d2 are two metrics on the same space X. We say that d1 and d2 are equivalent if there are constants K and M such that d1(x, y) ≤ K d2(x, y) and d2(x, y) ≤ M d1(x, y) for all x, y ∈ X.

a) Assume that d1 and d2 are equivalent metrics on X. Show that if {xn} converges to a in one of the metrics, it also converges to a in the other metric.

b) Assume that d1 and d2 are equivalent metrics on X, and that (Y, d) is a metric space. Show that if f : X → Y is continuous when we use the d1-metric on X, it is also continuous when we use the d2-metric.

c) We are in the same setting as in part b), but this time we have a function g : Y → X. Show that if g is continuous when we use the d1-metric on X, it is also continuous when we use the d2-metric.

d) Assume that d1, d2 and d3 are three metrics on X. Show that if d1 and d2 are equivalent, and d2 and d3 are equivalent, then d1 and d3 are equivalent.

e) Show that

d1(x,y) = |x1 − y1|+ |x2 − y2|+ . . .+ |xn − yn|

d2(x, y) = max{|x1 − y1|, |x2 − y2|, . . . , |xn − yn|}

d3(x, y) = √(|x1 − y1|² + |x2 − y2|² + . . . + |xn − yn|²)

are equivalent metrics on Rn.

2.3 Open and closed sets

In this and the following sections, we shall study some of the most important classes of subsets of metric spaces. We begin by recalling and extending the definition of balls in a metric space:

Definition 2.3.1 Let a be a point in a metric space (X, d), and assume that r is a positive, real number. The (open) ball centered at a with radius r is the set

B(a; r) = {x ∈ X : d(x, a) < r}

The closed ball centered at a with radius r is the set

B̄(a; r) = {x ∈ X : d(x, a) ≤ r}


In many ways, balls in metric spaces behave just the way we are used to, but geometrically they may look quite different from ordinary balls. A ball in the Manhattan metric (Example 3 in Section 2.1) looks like an ace of diamonds, while a ball in the discrete metric (Example 6 in Section 2.1) consists either of only one point or the entire space X.
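The shapes of balls can be made visible by marking the integer grid points that lie inside them. In the Python sketch below (hypothetical names), `ball_picture` prints "#" for the points (x, y) with d((x, y), (0, 0)) < r; with the Manhattan metric the marked points form the diamond described above, and with the discrete metric a ball is a single point or everything. Only grid points are tested, so the pictures are coarse.

```python
def ball_picture(d, r, size=4):
    """Rows of text for the grid points (x, y) with -size <= x, y <= size:
    '#' if the point lies in the open ball B((0, 0); r), '.' otherwise.
    The first row is y = size, the last row is y = -size."""
    rows = []
    for y in range(size, -size - 1, -1):
        rows.append("".join("#" if d((x, y), (0, 0)) < r else "."
                            for x in range(-size, size + 1)))
    return rows

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def discrete(p, q):
    return 0 if p == q else 1

for name, d, r in [("Manhattan", manhattan, 3.5), ("Euclidean", euclidean, 3.5),
                   ("discrete, r = 1", discrete, 1), ("discrete, r = 1.5", discrete, 1.5)]:
    print(name)
    print("\n".join(ball_picture(d, r)))
```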

If A is a subset of X and x is a point in X, there are three possibilities:

(i) There is a ball B(x; r) around x which is contained in A. In this case x is called an interior point of A.

(ii) There is a ball B(x; r) around x which is contained in the complement Aᶜ. In this case x is called an exterior point of A.

(iii) All balls B(x; r) around x contain points in A as well as points in the complement Aᶜ. In this case x is a boundary point of A.

Note that an interior point always belongs to A, while an exterior point never belongs to A. A boundary point will sometimes belong to A, and sometimes to Aᶜ.

We now define the important concepts of open and closed sets:

Definition 2.3.2 A subset A of a metric space is open if it does not contain any of its boundary points, and it is closed if it contains all its boundary points.

Most sets contain some, but not all, of their boundary points, and are hence neither open nor closed. The empty set ∅ and the entire space X are both open and closed, as they do not have any boundary points. Here is an obvious, but useful reformulation of the definition of an open set.

Proposition 2.3.3 A subset A of a metric space X is open if and only if it only consists of interior points, i.e. for all a ∈ A, there is a ball B(a; r) around a which is contained in A.

Observe that a set A and its complement Aᶜ have exactly the same boundary points. This leads to the following useful result.

Proposition 2.3.4 A subset A of a metric space X is open if and only if its complement Aᶜ is closed.

Proof: If A is open, it does not contain any of the (common) boundary points. Hence they all belong to Aᶜ, and Aᶜ must be closed.

Conversely, if Aᶜ is closed, it contains all boundary points, and hence A cannot have any. This means that A is open. □

The following observation may seem obvious, but needs to be proved:


Lemma 2.3.5 All open balls B(a; r) are open sets, while all closed balls B̄(a; r) are closed sets.

Proof: We prove the statement about open balls and leave the other as an exercise. Assume that x ∈ B(a; r); we must show that there is a ball B(x; ε) around x which is contained in B(a; r). If we choose ε = r − d(x, a) (which is positive, since d(x, a) < r), we see that if y ∈ B(x; ε) then by the triangle inequality

d(y, a) ≤ d(y, x) + d(x, a) < ε+ d(x, a) = (r − d(x, a)) + d(x, a) = r

Thus d(y, a) < r, and hence B(x; ε) ⊆ B(a; r). □

The next result shows that closed sets are indeed closed as far as sequences are concerned:

Proposition 2.3.6 Assume that F is a subset of a metric space X. The following are equivalent:

(i) F is closed.

(ii) If {xn} is a convergent sequence of elements in F, then the limit a = limn→∞ xn always belongs to F.

Proof: Assume that F is closed and that a does not belong to F. We must show that a sequence from F cannot converge to a. Since F is closed and contains all its boundary points, a has to be an exterior point, and hence there is a ball B(a; ε) around a which only contains points from the complement of F. But then a sequence from F can never get inside B(a; ε), and hence cannot converge to a.

Assume now that F is not closed. We shall construct a sequence from F that converges to a point outside F. Since F is not closed, there is a boundary point a that does not belong to F. For each n ∈ N, we can find a point xn from F in B(a; 1/n). Then {xn} is a sequence from F that converges to a point a which is not in F. □

An open set containing x is called a neighborhood of x.¹ The next result is rather silly, but also quite useful.

Lemma 2.3.7 Let U be a subset of the metric space X, and assume that each x0 ∈ U has a neighborhood Ux0 ⊆ U. Then U is open.

Proof: We must show that any x0 ∈ U is an interior point. Since Ux0 is open, there is an r > 0 such that B(x0; r) ⊆ Ux0. But then B(x0; r) ⊆ U, which shows that x0 is an interior point of U. □

¹In some books, a neighborhood of x is not necessarily open, but does contain a ball centered at x. What we have defined is then referred to as an open neighborhood.

In Proposition 2.2.5 we gave a characterization of continuity in terms of sequences. We shall now prove three characterizations in terms of open and closed sets. The first one characterizes continuity at a point.

Proposition 2.3.8 Let f : X → Y be a function between metric spaces, and let x0 be a point in X. Then the following are equivalent:

(i) f is continuous at x0.

(ii) For all neighborhoods V of f(x0), there is a neighborhood U of x0 suchthat f(U) ⊆ V .

Proof: (i) ⇒ (ii): Assume that f is continuous at x0. If V is a neighborhood of f(x0), there is a ball BY(f(x0); ε) centered at f(x0) and contained in V. Since f is continuous at x0, there is a δ > 0 such that dY(f(x), f(x0)) < ε whenever dX(x, x0) < δ. But this means that f(BX(x0; δ)) ⊆ BY(f(x0); ε) ⊆ V. Hence (ii) is satisfied if we choose U = BX(x0; δ), which is open by Lemma 2.3.5.

(ii) ⇒ (i): We must show that for any given ε > 0, there is a δ > 0 such that dY(f(x), f(x0)) < ε whenever dX(x, x0) < δ. Since V = BY(f(x0); ε) is a neighborhood of f(x0) (it is open by Lemma 2.3.5), there must be a neighborhood U of x0 such that f(U) ⊆ V. Since U is open, there is a ball BX(x0; δ) centered at x0 and contained in U. Assume that dX(x, x0) < δ. Then x ∈ BX(x0; δ) ⊆ U, and hence f(x) ∈ V = BY(f(x0); ε), which means that dY(f(x), f(x0)) < ε. Hence we have found a δ > 0 such that dY(f(x), f(x0)) < ε whenever dX(x, x0) < δ, and thus f is continuous at x0. □

We can also use open sets to characterize global continuity of functions:

Proposition 2.3.9 The following are equivalent for a function f : X → Y between two metric spaces:

(i) f is continuous.

(ii) Whenever V is an open subset of Y, the inverse image f⁻¹(V) is an open set in X.

Proof: (i) ⇒ (ii): Assume that f is continuous and that V ⊆ Y is open. We shall prove that f⁻¹(V) is open. For any x0 ∈ f⁻¹(V), we have f(x0) ∈ V, so V is a neighborhood of f(x0), and we know from the previous proposition that there is a neighborhood Ux0 of x0 such that f(Ux0) ⊆ V. But then Ux0 ⊆ f⁻¹(V), and by Lemma 2.3.7, f⁻¹(V) is open.

(ii) ⇒ (i): Assume that the inverse images of open sets are open. To prove that f is continuous at an arbitrary point x0, Proposition 2.3.8 tells us that it suffices to show that for any neighborhood V of f(x0), there is a neighborhood U of x0 such that f(U) ⊆ V. But this is easy: Since the inverse image of an open set is open, we can simply choose U = f⁻¹(V), which contains x0 since f(x0) ∈ V. □

The description above is useful in many situations. Using that inverse images commute with complements, and that closed sets are the complements of open sets, we can translate it into a statement about closed sets:

Proposition 2.3.10 The following are equivalent for a function f : X → Y between two metric spaces:

(i) f is continuous.

(ii) Whenever F is a closed subset of Y, the inverse image f⁻¹(F) is a closed set in X.

Proof: (i) ⇒ (ii): Assume that f is continuous and that F ⊆ Y is closed. Then Fᶜ is open, and by the previous proposition, f⁻¹(Fᶜ) is open. Since inverse images commute with complements, (f⁻¹(F))ᶜ = f⁻¹(Fᶜ). This means that f⁻¹(F) has an open complement and hence is closed.

(ii) ⇒ (i): Assume that the inverse images of closed sets are closed. According to the previous proposition, it suffices to show that the inverse image of any open set V ⊆ Y is open. But if V is open, the complement Vᶜ is closed, and hence by assumption f⁻¹(Vᶜ) is closed. Since inverse images commute with complements, (f⁻¹(V))ᶜ = f⁻¹(Vᶜ). This means that the complement of f⁻¹(V) is closed, and hence f⁻¹(V) is open. □

Mathematicians usually sum up the last two propositions by saying that openness and closedness are preserved under inverse, continuous images. Be aware that these properties are not preserved under continuous, direct images: even if f is continuous, the image f(U) of an open set U need not be open, and the image f(F) of a closed set F need not be closed:

Example 1: Let f, g : R→ R be the continuous functions defined by

f(x) = x² and g(x) = arctan x

The set R is both open and closed, but f(R) equals [0, ∞), which is not open, and g(R) equals (−π/2, π/2), which is not closed. Hence the continuous image of an open set need not be open, and the continuous image of a closed set need not be closed. ♣

We end this section with two simple but useful observations on open and closed sets.

Proposition 2.3.11 Let (X, d) be a metric space.


a) If $\mathcal{G}$ is a (finite or infinite) collection of open sets, then the union $\bigcup_{G \in \mathcal{G}} G$ is open.

b) If $G_1, G_2, \ldots, G_n$ is a finite collection of open sets, then the intersection $G_1 \cap G_2 \cap \cdots \cap G_n$ is open.

Proof: Left to the reader (see Exercise 12, where you are also asked to show that the intersection of infinitely many open sets is not necessarily open). □

Proposition 2.3.12 Let (X, d) be a metric space.

a) If $\mathcal{F}$ is a (finite or infinite) collection of closed sets, then the intersection $\bigcap_{F \in \mathcal{F}} F$ is closed.

b) If $F_1, F_2, \ldots, F_n$ is a finite collection of closed sets, then the union $F_1 \cup F_2 \cup \cdots \cup F_n$ is closed.

Proof: Left to the reader (see Exercise 13, where you are also asked to show that the union of infinitely many closed sets is not necessarily closed). □

Propositions 2.3.11 and 2.3.12 are the starting points for topology, an even more abstract theory of nearness.


Problems to Section 2.3

1. Assume that $(X, d)$ is a discrete metric space.

a) Show that an open ball in $X$ is either a set with only one element (a singleton) or all of $X$.

b) Show that all subsets of X are both open and closed.

c) Assume that $(Y, d_Y)$ is another metric space. Show that all functions $f\colon X \to Y$ are continuous.

2. Give a geometric description of the ball $B(a; r)$ in the Manhattan metric (see Example 3 in Section 2.1). Make a drawing of a typical ball. Show that the Manhattan metric and the usual metric in $\mathbb{R}^2$ have exactly the same open sets.

3. Assume that $F$ is a non-empty, closed and bounded subset of $\mathbb{R}$ (with the usual metric $d(x, y) = |y - x|$). Show that $\sup F \in F$ and $\inf F \in F$. Give an example of a bounded, but not closed set $F$ such that $\sup F \in F$ and $\inf F \in F$.

4. Prove the second part of Lemma 2.3.5, i.e. prove that a closed ball $\overline{B}(a; r)$ is always a closed set.

5. Assume that $f\colon X \to Y$ and $g\colon Y \to Z$ are continuous functions. Use Proposition 2.3.9 to show that the composition $g \circ f\colon X \to Z$ is continuous.


6. Assume that $A$ is a subset of a metric space $(X, d)$. Show that the interior points of $A$ are the exterior points of $A^c$, and that the exterior points of $A$ are the interior points of $A^c$. Check that the boundary points of $A$ are the boundary points of $A^c$.

7. Assume that $A$ is a subset of a metric space $X$. The interior $A^\circ$ of $A$ is the set consisting of all interior points of $A$. Show that $A^\circ$ is open.

8. Assume that $A$ is a subset of a metric space $X$. The closure $\overline{A}$ of $A$ is the set consisting of all interior points plus all boundary points of $A$.

a) Show that $\overline{A}$ is closed.

b) Let $\{a_n\}$ be a sequence from $A$ converging to a point $a$. Show that $a \in \overline{A}$.

9. Let $(X, d)$ be a metric space, and let $A$ be a subset of $X$. We shall consider $A$ with the subset metric $d_A$.

a) Assume that $G \subseteq A$ is open in $(X, d)$. Show that $G$ is open in $(A, d_A)$.

b) Find an example which shows that although $G \subseteq A$ is open in $(A, d_A)$, it need not be open in $(X, d)$.

c) Show that if $A$ is an open set in $(X, d)$, then a subset $G$ of $A$ is open in $(A, d_A)$ if and only if it is open in $(X, d)$.

10. Let $(X, d)$ be a metric space, and let $A$ be a subset of $X$. We shall consider $A$ with the subset metric $d_A$.

a) Assume that $F \subseteq A$ is closed in $(X, d)$. Show that $F$ is closed in $(A, d_A)$.

b) Find an example which shows that although $F \subseteq A$ is closed in $(A, d_A)$, it need not be closed in $(X, d)$.

c) Show that if $A$ is a closed set in $(X, d)$, then a subset $F$ of $A$ is closed in $(A, d_A)$ if and only if it is closed in $(X, d)$.

11. Let $(X, d)$ be a metric space and give $\mathbb{R}$ the usual metric. Assume that $f\colon X \to \mathbb{R}$ is continuous.

a) Show that the set

$$\{x \in X \mid f(x) < a\}$$

is open for all $a \in \mathbb{R}$.

b) Show that the set

$$\{x \in X \mid f(x) \le a\}$$

is closed for all $a \in \mathbb{R}$.

12. Prove Proposition 2.3.11. Find an example of an infinite collection of open sets $G_1, G_2, \ldots$ whose intersection is not open.

13. Prove Proposition 2.3.12. Find an example of an infinite collection of closed sets $F_1, F_2, \ldots$ whose union is not closed.


2.4 Complete spaces

One of the reasons why calculus in $\mathbb{R}^n$ is so successful is that $\mathbb{R}^n$ is a complete space. We shall now generalize this notion to metric spaces. The key concept is that of a Cauchy sequence:

Definition 2.4.1 A sequence $\{x_n\}$ in a metric space $(X, d)$ is a Cauchy sequence if for each $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that $d(x_n, x_m) < \varepsilon$ whenever $n, m \ge N$.

We begin with a simple observation:

Proposition 2.4.2 Every convergent sequence is a Cauchy sequence.

Proof: If $a$ is the limit of the sequence, there is for any $\varepsilon > 0$ a number $N \in \mathbb{N}$ such that $d(x_n, a) < \frac{\varepsilon}{2}$ whenever $n \ge N$. If $n, m \ge N$, the triangle inequality tells us that

$$d(x_n, x_m) \le d(x_n, a) + d(a, x_m) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$

and consequently $\{x_n\}$ is a Cauchy sequence. □

The converse of the proposition above does not hold in all metric spaces, and we make the following definition:

Definition 2.4.3 A metric space is called complete if all Cauchy sequences converge.

We know from MAT1110 that $\mathbb{R}^n$ is complete, but that $\mathbb{Q}$ is not when we use the usual metric $d(x, y) = |x - y|$. The complete spaces are in many ways the “nice” metric spaces, and we shall spend much time studying their properties. We shall also spend some time showing how we can make non-complete spaces complete. Example 5 in Section 2.1 (where $X$ is the space of all continuous $f\colon [a, b] \to \mathbb{R}$) shows some interesting cases; $X$ with the metric $d_1$ is complete, but not $X$ with the metrics $d_2$ and $d_3$. By introducing a stronger notion of integral (the Lebesgue integral) we can extend $d_2$ and $d_3$ to complete metrics by making them act on richer spaces of functions. In Section 2.7, we shall study an abstract method for making incomplete spaces complete by adding new points.
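To make the claim about $\mathbb{Q}$ concrete, here is a small Python sketch (our own illustration, not part of the notes). It runs Newton's iteration $x_{k+1} = x_k/2 + 1/x_k$ for $\sqrt{2}$ in exact rational arithmetic using the standard `fractions` module. All terms are rational, the distances between consecutive terms shrink rapidly (the sequence converges in $\mathbb{R}$ and is therefore Cauchy in $\mathbb{Q}$), but the squares approach $2$, and no rational number squares to $2$; so the sequence has no limit in $\mathbb{Q}$. The function name `newton_sqrt2` is hypothetical, and a finite printout is of course an illustration rather than a proof.

```python
from fractions import Fraction


def newton_sqrt2(n_terms):
    """Return the first n_terms of x_{k+1} = x_k/2 + 1/x_k with x_0 = 2, as exact rationals."""
    x = Fraction(2)
    terms = [x]
    for _ in range(n_terms - 1):
        x = x / 2 + 1 / x  # stays in Q: Fraction arithmetic is exact
        terms.append(x)
    return terms


terms = newton_sqrt2(6)
for k in range(len(terms) - 1):
    step = terms[k + 1] - terms[k]  # d(x_k, x_{k+1}) shrinks quickly
    defect = terms[k + 1] ** 2 - 2  # positive and tending to 0, but never 0
    print(k, float(abs(step)), float(defect))
```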

The following proposition is quite useful. Remember that if $A$ is a subset of $X$, then $d_A$ is the subspace metric obtained by restricting $d$ to $A$ (see Example 7 in Section 2.1).

Proposition 2.4.4 Assume that $(X, d)$ is a complete metric space. If $A$ is a subset of $X$, then $(A, d_A)$ is complete if and only if $A$ is closed.


Proof: Assume first that $A$ is closed. If $\{a_n\}$ is a Cauchy sequence in $A$, then $\{a_n\}$ is also a Cauchy sequence in $X$, and since $X$ is complete, $\{a_n\}$ converges to a point $a \in X$. Since $A$ is closed, Proposition 2.3.6 tells us that $a \in A$. But then $\{a_n\}$ converges to $a$ in $(A, d_A)$, and hence $(A, d_A)$ is complete.

If $A$ is not closed, there is a boundary point $a$ that does not belong to $A$. Each ball $B(a; \frac{1}{n})$ must contain an element $a_n$ from $A$. In $X$, the sequence $\{a_n\}$ converges to $a$, and must be a Cauchy sequence; since $d_A$ is just the restriction of $d$, it is also a Cauchy sequence in $(A, d_A)$. However, since $a \notin A$ and limits are unique, the sequence $\{a_n\}$ does not converge to a point in $A$. Hence we have found a Cauchy sequence in $(A, d_A)$ that does not converge to a point in $A$, and hence $(A, d_A)$ is incomplete. □

The nice thing about complete spaces is that we can prove that sequences converge to a limit without actually constructing or specifying the limit; all we need is to prove that the sequence is a Cauchy sequence. To prove that a sequence has the Cauchy property, we only need to work with the given terms of the sequence and not the unknown limit, and this often makes the arguments much easier. As an example of this technique, we shall now prove an important theorem that will be useful later in the book, but first we need some definitions.

A function $f\colon X \to X$ is called a contraction if there is a positive number $s < 1$ such that

$$d(f(x), f(y)) \le s\, d(x, y) \quad \text{for all } x, y \in X$$

We call $s$ a contraction factor for $f$. All contractions are continuous (prove this!), and by induction it is easy to see that

$$d(f^n(x), f^n(y)) \le s^n d(x, y)$$

where $f^n(x) = f(f(\cdots f(x) \cdots))$ is the result of iterating $f$ exactly $n$ times. If $f(a) = a$, we say that $a$ is a fixed point for $f$.
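The estimate $d(f^n(x), f^n(y)) \le s^n d(x, y)$ can be checked numerically for a concrete contraction. In the Python sketch below (our own choice of example, not from the notes) we take $X = \mathbb{R}$ with $d(x, y) = |x - y|$ and $f(x) = x/2 + 1$, which is a contraction with factor $s = 1/2$; the helper `iterate` is a hypothetical name for computing $f^n(x)$.

```python
def iterate(f, n, x):
    """Return f^n(x): f applied n times to x (f^0 is the identity)."""
    for _ in range(n):
        x = f(x)
    return x


def f(x):
    # |f(x) - f(y)| = |x - y| / 2, so f is a contraction with factor s = 1/2
    return x / 2 + 1


s = 0.5
x, y = 10.0, -4.0
for n in range(6):
    lhs = abs(iterate(f, n, x) - iterate(f, n, y))
    rhs = s ** n * abs(x - y)
    print(n, lhs, rhs)  # for this affine f the bound holds with equality
```

For a non-affine contraction the inequality is typically strict; the affine example is chosen only because every quantity can be checked by hand.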

Theorem 2.4.5 (Banach’s Fixed Point Theorem) Assume that $(X, d)$ is a complete metric space and that $f\colon X \to X$ is a contraction. Then $f$ has a unique fixed point $a$, and no matter which starting point $x_0 \in X$ we choose, the sequence

$$x_0,\ x_1 = f(x_0),\ x_2 = f^2(x_0),\ \ldots,\ x_n = f^n(x_0),\ \ldots$$

converges to $a$.

Proof: Let us first show that $f$ cannot have more than one fixed point. If $a$ and $b$ are two fixed points, and $s$ is a contraction factor for $f$, we have

$$d(a, b) = d(f(a), f(b)) \le s\, d(a, b)$$


Since 0 < s < 1, this is only possible if d(a, b) = 0, i.e. if a = b.

To show that $f$ has a fixed point, choose a starting point $x_0$ in $X$ and consider the sequence

$$x_0,\ x_1 = f(x_0),\ x_2 = f^2(x_0),\ \ldots,\ x_n = f^n(x_0),\ \ldots$$

Assume, for the moment, that we can prove that this is a Cauchy sequence. Since $(X, d)$ is complete, the sequence must converge to a point $a$. To prove that $a$ is a fixed point, observe that we have $x_{n+1} = f(x_n)$ for all $n$, and taking the limit as $n \to \infty$ (using that $f$ is continuous), we get $a = f(a)$. Hence $a$ is a fixed point of $f$, and the theorem must hold. Thus it suffices to prove our assumption that $\{x_n\}$ is a Cauchy sequence.

Choose two elements $x_n$ and $x_{n+k}$ of the sequence. By repeated use of the triangle inequality, we get

$$\begin{aligned}
d(x_n, x_{n+k}) &\le d(x_n, x_{n+1}) + d(x_{n+1}, x_{n+2}) + \cdots + d(x_{n+k-1}, x_{n+k}) \\
&= d(f^n(x_0), f^n(x_1)) + d(f^{n+1}(x_0), f^{n+1}(x_1)) + \cdots + d(f^{n+k-1}(x_0), f^{n+k-1}(x_1)) \\
&\le s^n d(x_0, x_1) + s^{n+1} d(x_0, x_1) + \cdots + s^{n+k-1} d(x_0, x_1) \\
&= \frac{s^n (1 - s^k)}{1 - s}\, d(x_0, x_1) \le \frac{s^n}{1 - s}\, d(x_0, x_1)
\end{aligned}$$

where we have summed a geometric series to get to the last line. Since $s < 1$, we can get the last expression as small as we want by choosing $n$ large enough. Given an $\varepsilon > 0$, we can in particular find an $N$ such that $\frac{s^N}{1 - s}\, d(x_0, x_1) < \varepsilon$. For $n$ and $m = n + k$ larger than or equal to $N$, we thus have

$$d(x_n, x_m) \le \frac{s^n}{1 - s}\, d(x_0, x_1) \le \frac{s^N}{1 - s}\, d(x_0, x_1) < \varepsilon$$

and hence $\{x_n\}$ is a Cauchy sequence. □
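The proof is constructive and even tells us how many iterations suffice: letting $m \to \infty$ in the last estimate gives $d(x_n, a) \le \frac{s^n}{1 - s}\, d(x_0, x_1)$. The Python sketch below (an illustration under our own assumptions, not part of the notes) iterates until this a priori bound drops below a tolerance. As a test case we take $X = [0, 1]$ with the usual metric (a closed subset of $\mathbb{R}$, hence complete by Proposition 2.4.4) and $f = \cos$, which maps $[0, 1]$ into $[\cos 1, 1] \subseteq [0, 1]$ and, by the mean value theorem, is a contraction with factor $s = \sin 1 < 1$. The function name `banach_iterate` and its parameters are hypothetical.

```python
import math


def banach_iterate(f, x0, s, eps, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) until the a priori bound s^n/(1-s) * d(x0, x1) < eps.

    Assumes (without checking) that f is a contraction with factor s on a complete
    subset of R with d(x, y) = |x - y|. Then the returned x_n satisfies |x_n - a| < eps,
    where a is the unique fixed point.
    """
    d01 = abs(f(x0) - x0)
    x, n = x0, 0
    while s ** n / (1 - s) * d01 >= eps:
        x = f(x)
        n += 1
        if n > max_iter:
            raise RuntimeError("bound not reached; is f really a contraction?")
    return x, n


s = math.sin(1.0)  # |cos'(x)| = |sin x| <= sin 1 < 1 on [0, 1]
a, n = banach_iterate(math.cos, 0.0, s, 1e-10)
print(n, a, abs(math.cos(a) - a))
```

The bound is pessimistic: it uses only $s$ and the first step, so the iteration usually reaches the tolerance earlier than the loop admits. Its advantage is exactly the one stressed in the text, namely that it never refers to the unknown limit $a$.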

In Section 3.4 we shall use Banach’s Fixed Point Theorem to prove the existence of solutions to quite general differential equations.


Problems to Section 2.4

1. Show that a space with the discrete metric is always complete.

2. Assume that $(X, d_X)$ and $(Y, d_Y)$ are complete spaces, and give $X \times Y$ the metric $d$ defined by

$$d((x_1, y_1), (x_2, y_2)) = d_X(x_1, x_2) + d_Y(y_1, y_2)$$

Show that $(X \times Y, d)$ is complete.


3. If $A$ is a subset of a metric space $(X, d)$, the diameter $\operatorname{diam}(A)$ of $A$ is defined by

$$\operatorname{diam}(A) = \sup\{d(x, y) \mid x, y \in A\}$$

Let $\{A_n\}$ be a collection of subsets of $X$ such that $A_{n+1} \subseteq A_n$ and $\operatorname{diam}(A_n) \to 0$, and assume that $\{a_n\}$ is a sequence such that $a_n \in A_n$ for each $n \in \mathbb{N}$. Show that if $X$ is complete, the sequence $\{a_n\}$ converges.

4. Assume that $d_1$ and $d_2$ are two metrics on the same space $X$. We say that $d_1$ and $d_2$ are equivalent if there are constants $K$ and $M$ such that $d_1(x, y) \le K d_2(x, y)$ and $d_2(x, y) \le M d_1(x, y)$ for all $x, y \in X$. Show that if $d_1$ and $d_2$ are equivalent, and one of the spaces $(X, d_1)$, $(X, d_2)$ is complete, then so is the other.

5. Assume that $f\colon [0, 1] \to [0, 1]$ is a differentiable function and that there is a number $s < 1$ such that $|f'(x)| < s$ for all $x \in (0, 1)$. Show that there is exactly one point $a \in [0, 1]$ such that $f(a) = a$.

6. You are standing with a map in your hand inside the area depicted on the map. Explain that there is exactly one point on the map that is vertically above the point it depicts.

7. Assume that $(X, d)$ is a complete metric space, and that $f\colon X \to X$ is a function such that $f^n$ is a contraction for some $n \in \mathbb{N}$. Show that $f$ has a unique fixed point.

8. A subset $D$ of a metric space $X$ is dense if for all $x \in X$ and all $\varepsilon \in \mathbb{R}_+$ there is an element $y \in D$ such that $d(x, y) < \varepsilon$. Show that if all Cauchy sequences $\{y_n\}$ from a dense set $D$ converge in $X$, then $X$ is complete.

2.5 Compact sets

We now turn to the study of compact sets. These sets are related both to closed sets and to the notion of completeness, and they are extremely useful in many applications.

Assume that $\{x_n\}$ is a sequence in a metric space $X$. If we have a strictly increasing sequence of natural numbers

$$n_1 < n_2 < n_3 < \cdots < n_k < \cdots$$

we call the sequence $\{y_k\} = \{x_{n_k}\}$ a subsequence of $\{x_n\}$. A subsequence contains infinitely many of the terms in the original sequence, but usually not all.
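As a small illustration (our own example, not from the notes), the sequence $x_n = (-1)^n + 1/n$ does not converge, but the subsequence with $n_k = 2k$ converges to $1$ and the one with $n_k = 2k - 1$ converges to $-1$. The Python sketch below only inspects finitely many terms, so it illustrates rather than proves this; the helper `subsequence` is a hypothetical name that enforces the requirement $n_1 < n_2 < \cdots$ on the chosen indices.

```python
def x(n):
    """The sequence x_n = (-1)^n + 1/n for n >= 1."""
    return (-1) ** n + 1 / n


def subsequence(seq, indices):
    """Return [seq(n_1), seq(n_2), ...] for strictly increasing indices n_1 < n_2 < ..."""
    if any(a >= b for a, b in zip(indices, indices[1:])):
        raise ValueError("indices must be strictly increasing")
    return [seq(n) for n in indices]


even = subsequence(x, [2 * k for k in range(1, 8)])     # n_k = 2k
odd = subsequence(x, [2 * k - 1 for k in range(1, 8)])  # n_k = 2k - 1
print(even)  # terms 1 + 1/(2k), approaching 1
print(odd)   # terms -1 + 1/(2k - 1), approaching -1
```

Since a convergent sequence passes its limit on to all of its subsequences, two subsequences with different limits show that $\{x_n\}$ itself cannot converge.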

I leave the first result as an exercise:

Proposition 2.5.1 If the sequence $\{x_n\}$ converges to $a$, so do all its subsequences.

We are now ready to define compact sets:


Definition 2.5.2 A subset $K$ of a metric space $(X, d)$ is called compact if every sequence in $K$ has a subsequence converging to a point in $K$. The space $(X, d)$ is compact if $X$ is a compact set, i.e. if all sequences in $X$ have a convergent subsequence.

Compactness is a rather complex notion that takes a while to get used to. We shall start by relating it to other concepts we have already introduced. First a definition:

Definition 2.5.3 A subset $A$ of a metric space $(X, d)$ is bounded if there is a point $b \in X$ and a constant $K \in \mathbb{R}$ such that $d(a, b) \le K$ for all $a \in A$ (it does not matter which point $b \in X$ we use in this definition).

Here is our first result on compact sets:

Proposition 2.5.4 Every compact set $K$ in a metric space $(X, d)$ is closed and bounded.

Proof: We argue contrapositively. First we show that if a set $K$ is not closed, then it cannot be compact, and then we show that if $K$ is not bounded, it cannot be compact.

Assume that $K$ is not closed. Then there is a boundary point $a$ that does not belong to $K$. For each $n \in \mathbb{N}$, there is an $x_n \in K$ such that $d(x_n, a) < \frac{1}{n}$. The sequence $\{x_n\}$ converges to $a \notin K$, and so do all its subsequences; since limits are unique, no subsequence can converge to a point in $K$.

Assume now that $K$ is not bounded, and fix a point $b \in X$. For every $n \in \mathbb{N}$ there is an element $x_n \in K$ such that $d(x_n, b) > n$. If $\{y_k\}$ is a subsequence of $\{x_n\}$, clearly $\lim_{k \to \infty} d(y_k, b) = \infty$. It is easy to see that $\{y_k\}$ cannot converge to any element $y \in X$: According to the triangle inequality

$$d(y_k, b) \le d(y_k, y) + d(y, b)$$

and since $d(y_k, b) \to \infty$, we must have $d(y_k, y) \to \infty$. Hence $\{x_n\}$ has no convergent subsequences, and $K$ cannot be compact. □

In $\mathbb{R}^n$ the converse of the result above holds: All closed and bounded subsets of $\mathbb{R}^n$ are compact (this is just a reformulation of the Bolzano-Weierstrass Theorem in MAT1110). The following example shows that this is not the case for all metric spaces.

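Example 1: Consider the metric space $(\mathbb{N}, d)$ where $d$ is the discrete metric. Then $\mathbb{N}$ is complete, closed and bounded, but the sequence $\{n\}$ does not have a convergent subsequence: since $d(n, m) = 1$ whenever $n \neq m$, no subsequence is even a Cauchy sequence.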
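The key fact in Example 1 is that distinct terms of $\{n\}$ are always at distance $1$. A short Python check (our own illustration; `discrete` is a hypothetical helper implementing the discrete metric) confirms this for the first few terms: all pairwise distances are bounded by $1$, so the set is bounded, yet every pair of distinct terms is exactly $1$ apart, so no subsequence can be Cauchy and hence, by Proposition 2.4.2, none can converge.

```python
def discrete(x, y):
    """The discrete metric: 0 if x == y, and 1 otherwise."""
    return 0 if x == y else 1


terms = range(1, 50)  # the first terms of the sequence {n} in (N, discrete)
pair_distances = [discrete(a, b) for a in terms for b in terms if a != b]
print(min(pair_distances), max(pair_distances))  # prints: 1 1
```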


We shall later see how we can strengthen the boundedness condition (to something called total boundedness) to get a characterization of compactness.

We next want to take a look at the relationship between completeness and compactness. Not all complete spaces are compact ($\mathbb{R}$ is complete but not compact), but it turns out that all compact spaces are complete. To prove this, we need a lemma on subsequences of Cauchy sequences that is also useful in other contexts.

Lemma 2.5.5 Assume that $\{x_n\}$ is a Cauchy sequence in a (not necessarily complete) metric space $(X, d)$. If there is a subsequence $\{x_{n_k}\}$ converging to a point $a$, then the original sequence $\{x_n\}$ also converges to $a$.

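Proof: We must show that for any given $\varepsilon > 0$, there is an $N \in \mathbb{N}$ such that $d(x_n, a) < \varepsilon$ for all $n \ge N$. Since $\{x_n\}$ is a Cauchy sequence, there is an $N \in \mathbb{N}$ such that $d(x_n, x_m) < \frac{\varepsilon}{2}$ for all $n, m \ge N$. Since $\{x_{n_k}\}$ converges to $a$, there is a $K$ such that $n_K \ge N$ and $d(x_{n_K}, a) \le \frac{\varepsilon}{2}$. For all $n \ge N$ we then have

$$d(x_n, a) \le d(x_n, x_{n_K}) + d(x_{n_K}, a) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$

by the triangle inequality. □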

Proposition 2.5.6 Every compact metric space is complete.

Proof: Let $\{x_n\}$ be a Cauchy sequence. Since $X$ is compact, there is a subsequence $\{x_{n_k}\}$ converging to a point $a$. By the lemma above, $\{x_n\}$ also converges to $a$. Hence all Cauchy sequences converge, and $X$ must be complete. □

Here is another useful result:

Proposition 2.5.7 A closed subset F of a compact set K is compact.

Proof: Assume that $\{x_n\}$ is a sequence in $F$; we must show that $\{x_n\}$ has a subsequence converging to a point in $F$. Since $\{x_n\}$ is also a sequence in $K$, and $K$ is compact, there is a subsequence $\{x_{n_k}\}$ converging to a point $a \in K$. Since $F$ is closed, $a \in F$, and hence $\{x_n\}$ has a subsequence converging to a point in $F$. □

We have previously seen that if $f$ is a continuous function, the inverse images of open and closed sets are open and closed, respectively. The inverse image of a compact set need not be compact, but it turns out that the (direct) image of a compact set under a continuous function is always compact.


Proposition 2.5.8 Assume that $f\colon X \to Y$ is a continuous function between two metric spaces. If $K \subseteq X$ is compact, then $f(K)$ is a compact subset of $Y$.

Proof: Let $\{y_n\}$ be a sequence in $f(K)$; we shall show that $\{y_n\}$ has a subsequence converging to a point in $f(K)$. Since $y_n \in f(K)$, we can for each $n$ find an element $x_n \in K$ such that $f(x_n) = y_n$. Since $K$ is compact, the sequence $\{x_n\}$ has a subsequence $\{x_{n_k}\}$ converging to a point $x \in K$. But then by Proposition 2.2.5, $\{y_{n_k}\} = \{f(x_{n_k})\}$ is a subsequence of $\{y_n\}$ converging to $y = f(x) \in f(K)$. □

So far we have only proved technical results about the nature of compact sets. The next result gives the first indication of why these sets are useful.

Theorem 2.5.9 (The Extreme Value Theorem) Assume that $K$ is a non-empty, compact subset of a metric space $(X, d)$ and that $f\colon K \to \mathbb{R}$ is a continuous function. Then $f$ has maximum and minimum points in $K$, i.e. there are points $c, d \in K$ such that

$$f(d) \le f(x) \le f(c)$$

for all $x \in K$.

Proof: There is a quick way of proving this theorem by using the previous proposition (see the remark below), but I choose a slightly longer proof as I think it gives a better feeling for what is going on and how compactness arguments are used in practice. I only prove the maximum part and leave the minimum as an exercise.

Let

$$M = \sup\{f(x) \mid x \in K\}$$

(if $f$ is unbounded above, we put $M = \infty$) and choose a sequence $\{x_n\}$ in $K$ such that $\lim_{n \to \infty} f(x_n) = M$. Since $K$ is compact, $\{x_n\}$ has a subsequence $\{x_{n_k}\}$ converging to a point $c \in K$. Then on the one hand $\lim_{k \to \infty} f(x_{n_k}) = M$, and on the other $\lim_{k \to \infty} f(x_{n_k}) = f(c)$ according to Proposition 2.2.9. Hence $f(c) = M$, and since $M = \sup\{f(x) \mid x \in K\}$, we see that $c$ is a maximum point for $f$ on $K$. □

Remark: As already mentioned, it is possible to give a shorter proof of the Extreme Value Theorem by using Proposition 2.5.8. According to it, the set $f(K)$ is compact and thus closed and bounded. This means that $\sup f(K)$ and $\inf f(K)$ belong to $f(K)$, and hence there are points $c, d \in K$ such that $f(c) = \sup f(K)$ and $f(d) = \inf f(K)$. Clearly, $c$ is a maximum and $d$ a minimum point for $f$.

Let us finally turn to the description of compactness in terms of total boundedness.


Definition 2.5.10 A subset $A$ of a metric space $X$ is called totally bounded if for each $\varepsilon > 0$ there is a finite number $B(a_1; \varepsilon), B(a_2; \varepsilon), \ldots, B(a_n; \varepsilon)$ of balls with centers in $A$ and radius $\varepsilon$ that cover $A$ (i.e. $A \subseteq B(a_1; \varepsilon) \cup B(a_2; \varepsilon) \cup \cdots \cup B(a_n; \varepsilon)$).

We first observe that a compact set is always totally bounded.

Proposition 2.5.11 Let $K$ be a compact subset of a metric space $X$. Then $K$ is totally bounded.

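Proof: We argue contrapositively: Assume that $K$ is not totally bounded. Then there is an $\varepsilon > 0$ such that no finite collection of $\varepsilon$-balls covers $K$. We shall construct a sequence $\{x_n\}$ in $K$ that does not have a convergent subsequence. We begin by choosing an arbitrary element $x_1 \in K$. Since $B(x_1; \varepsilon)$ does not cover $K$, we can choose $x_2 \in K \setminus B(x_1; \varepsilon)$. Since $B(x_1; \varepsilon)$ and $B(x_2; \varepsilon)$ do not cover $K$, we can choose $x_3 \in K \setminus \bigl(B(x_1; \varepsilon) \cup B(x_2; \varepsilon)\bigr)$. Continuing in this way, we get a sequence $\{x_n\}$ such that

$$x_n \in K \setminus \bigl(B(x_1; \varepsilon) \cup B(x_2; \varepsilon) \cup \cdots \cup B(x_{n-1}; \varepsilon)\bigr)$$

This means that $d(x_n, x_m) \ge \varepsilon$ for all $n, m \in \mathbb{N}$ with $n > m$. Hence no subsequence of $\{x_n\}$ is a Cauchy sequence, and by Proposition 2.4.2 no subsequence converges. Thus $K$ is not compact. □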
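The construction in this proof is a greedy procedure: keep picking a point outside all the $\varepsilon$-balls chosen so far. On a finite sample of points it always stops, and the points picked then form an $\varepsilon$-net of the sample: every sample point lies within distance $\varepsilon$ of a chosen center, and the centers are pairwise at least $\varepsilon$ apart. The Python sketch below is our own illustration (the name `greedy_eps_net` and the sample of the unit square are assumptions, not taken from the notes); it uses the Euclidean metric on $\mathbb{R}^2$ via `math.dist`. On a set that is not totally bounded, the same procedure never stops and produces exactly the kind of sequence used in the proof.

```python
import math


def greedy_eps_net(points, eps, dist):
    """Pick centers greedily: a point becomes a new center if it lies outside
    all open balls B(c; eps) around the centers chosen so far."""
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


# a finite sample of the unit square [0, 1]^2 with the usual (Euclidean) metric
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
centers = greedy_eps_net(grid, 0.25, math.dist)
print(len(grid), len(centers))
```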

We are now ready for the final theorem. Note that we have now added the assumption that $X$ is complete; without this condition, the statement is false.

Theorem 2.5.12 A subset $A$ of a complete metric space $X$ is compact if and only if it is closed and totally bounded.

Proof: As we already know that a compact set is closed and totally bounded, it suffices to prove that a closed and totally bounded set $A$ is compact. Let $\{x_n\}$ be a sequence in $A$. Our aim is to construct a convergent subsequence $\{x_{n_k}\}$. Choose balls $B^1_1, B^1_2, \ldots, B^1_{k_1}$ of radius one that cover $A$. At least one of these balls must contain infinitely many terms from the sequence. Call this ball $S_1$ (if there is more than one such ball, just choose one). We now choose balls $B^2_1, B^2_2, \ldots, B^2_{k_2}$ of radius $\frac{1}{2}$ that cover $A$. At least one of these balls must contain infinitely many of the terms from the sequence that lie in $S_1$. If we call this ball $S_2$, then $S_1 \cap S_2$ contains infinitely many terms from the sequence. Continuing in this way, we find a sequence of balls $S_k$ of radius $\frac{1}{k}$ such that

$$S_1 \cap S_2 \cap \cdots \cap S_k$$

always contains infinitely many terms from the sequence.

We can now construct a convergent subsequence of $\{x_n\}$. Choose $n_1$ to be the first number such that $x_{n_1}$ belongs to $S_1$. Choose $n_2$ to be the first number larger than $n_1$ such that $x_{n_2}$ belongs to $S_1 \cap S_2$, then choose $n_3$ to be the first number larger than $n_2$ such that $x_{n_3}$ belongs to $S_1 \cap S_2 \cap S_3$. Continuing in this way, we get a subsequence $\{x_{n_k}\}$ such that

$$x_{n_k} \in S_1 \cap S_2 \cap \cdots \cap S_k$$

for all $k$. Since the $S_k$'s are shrinking, $\{x_{n_k}\}$ is a Cauchy sequence: if $j, l \ge k$, both $x_{n_j}$ and $x_{n_l}$ lie in the ball $S_k$ of radius $\frac{1}{k}$, and hence $d(x_{n_j}, x_{n_l}) < \frac{2}{k}$. Since $X$ is complete, $\{x_{n_k}\}$ converges to a point $a$, and since $A$ is closed, $a \in A$. Hence we have proved that any sequence in $A$ has a subsequence converging to a point in $A$, and thus $A$ is compact. □

Problems to Section 2.5

1. Show that a space $(X, d)$ with the discrete metric is compact if and only if $X$ is a finite set.


3. Prove the minimum part of Theorem 2.5.9.

4. Let $b$ and $c$ be two points in a metric space $(X, d)$, and let $A$ be a subset of $X$. Show that if there is a number $K \in \mathbb{R}$ such that $d(a, b) \le K$ for all $a \in A$, then there is a number $M \in \mathbb{R}$ such that $d(a, c) \le M$ for all $a \in A$. Hence it doesn’t matter which point $b \in X$ we use in Definition 2.5.3.

5. Assume that $(X, d)$ is a metric space and that $f\colon X \to [0, \infty)$ is a continuous function. Assume that for each $\varepsilon > 0$, there is a compact $K_\varepsilon \subseteq X$ such that $f(x) < \varepsilon$ when $x \notin K_\varepsilon$. Show that $f$ has a maximum point.

6. Let $(X, d)$ be a compact metric space, and assume that $f\colon X \to \mathbb{R}$ is continuous when we give $\mathbb{R}$ the usual metric. Show that if $f(x) > 0$ for all $x \in X$, then there is a positive, real number $a$ such that $f(x) > a$ for all $x \in X$.

7. Assume that $f\colon X \to Y$ is a continuous function between metric spaces, and let $K$ be a compact subset of $Y$. Show that $f^{-1}(K)$ is closed. Find an example which shows that $f^{-1}(K)$ need not be compact.

8. Show that a totally bounded subset of a metric space is always bounded. Find an example of a bounded set in a metric space that is not totally bounded.

9. The Bolzano-Weierstrass Theorem says that any bounded sequence in $\mathbb{R}^n$ has a convergent subsequence. Use it to prove that a subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.

10. Let (X, d) be a metric space.

a) Assume that $K_1, K_2, \ldots, K_n$ is a finite collection of compact subsets of $X$. Show that the union $K_1 \cup K_2 \cup \cdots \cup K_n$ is compact.

b) Assume that $\mathcal{K}$ is a collection of compact subsets of $X$. Show that the intersection $\bigcap_{K \in \mathcal{K}} K$ is compact.

11. Let $(X, d)$ be a metric space. Assume that $\{K_n\}$ is a sequence of non-empty, compact subsets of $X$ such that $K_1 \supseteq K_2 \supseteq \cdots \supseteq K_n \supseteq \cdots$. Prove that $\bigcap_{n \in \mathbb{N}} K_n$ is non-empty.


12. Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces. Assume that $(X, d_X)$ is compact, and that $f\colon X \to Y$ is bijective and continuous. Show that the inverse function $f^{-1}\colon Y \to X$ is continuous.

13. Assume that $C$ and $K$ are disjoint, compact subsets of a metric space $(X, d)$, and define

$$a = \inf\{d(x, y) \mid x \in C,\ y \in K\}$$

Show that $a$ is strictly positive and that there are points $x_0 \in C$, $y_0 \in K$ such that $d(x_0, y_0) = a$. Show by an example that the result does not hold if we only assume that one of the sets $C$ and $K$ is compact and the other one closed.

14. Assume that $(X, d)$ is compact and that $f\colon X \to X$ is continuous.

a) Show that the function $g(x) = d(x, f(x))$ is continuous and has a minimum point.

b) Assume in addition that $d(f(x), f(y)) < d(x, y)$ for all $x, y \in X$ with $x \neq y$. Show that $f$ has a unique fixed point. (Hint: Use the minimum from a).)

2.6 An alternative description of compactness

The descriptions of compactness we studied in the previous section suffice for most purposes in this book, but for some of the more advanced proofs there is another description that is more convenient. This alternative description is also the right one to use if one wants to extend the concept of compactness to even more general spaces, so-called topological spaces. In such spaces, sequences are not always an efficient tool, and it is better to have a description of compactness in terms of coverings by open sets.

To see what this means, assume that $K$ is a subset of a metric space $X$. An open covering of $K$ is simply a (finite or infinite) collection $\mathcal{O}$ of open sets whose union contains $K$, i.e.

$$K \subseteq \bigcup\{O : O \in \mathcal{O}\}$$

The purpose of this section is to show that in metric spaces, the following property is equivalent to compactness.

Definition 2.6.1 (Open Covering Property) Let $K$ be a subset of a metric space $X$. Assume that for each open covering $\mathcal{O}$ of $K$, there is a finite number of elements $O_1, O_2, \ldots, O_n$ in $\mathcal{O}$ such that

$$K \subseteq O_1 \cup O_2 \cup \cdots \cup O_n$$

(we say that each open covering of $K$ has a finite subcovering). Then the set $K$ is said to have the open covering property.


The open covering property is quite abstract and may take some time to get used to, but it turns out to be a very efficient tool. Note that the term “open covering property” is not standard terminology, and that it will more or less disappear once we have proved that it is equivalent to compactness.

Let us first prove that a set with the open covering property is necessarily compact. Before we begin, we need a simple observation: Assume that $x$ is a point in our metric space $X$, and that no subsequence of a sequence $\{x_n\}$ converges to $x$. Then there must be an open ball $B(x; r)$ around $x$ which only contains finitely many terms from $\{x_n\}$ (because if all balls around $x$ contained infinitely many terms, we could use these terms to construct a subsequence converging to $x$).

Proposition 2.6.2 If a subset $K$ of a metric space $X$ has the open covering property, then it is compact.

Proof: We argue contrapositively, i.e., we assume that $K$ is not compact and prove that it does not have the open covering property. Since $K$ is not compact, there is a sequence $\{x_n\}$ in $K$ which does not have any subsequences converging to points in $K$. By the observation above, this means that for each element $x \in K$, there is an open ball $B(x; r_x)$ around $x$ which only contains finitely many terms of the sequence. The family $\{B(x; r_x) : x \in K\}$ is an open covering of $K$, but it cannot have a finite subcovering, since any such subcovering $B(y_1; r_{y_1}), B(y_2; r_{y_2}), \ldots, B(y_m; r_{y_m})$ can only contain finitely many of the infinitely many terms in the sequence. □

To prove the opposite implication, we shall use an elegant trick based on the Extreme Value Theorem, but first we need a lemma (the strange cut-off at 1 in the definition of $f(x)$ below is just to make sure that the function is finite):

Lemma 2.6.3 Let $\mathcal{O}$ be an open covering of a subset $A$ of a metric space $X$. Define a function $f\colon A \to \mathbb{R}$ by

$$f(x) = \sup\{r \in \mathbb{R} \mid r < 1 \text{ and } B(x; r) \subseteq O \text{ for some } O \in \mathcal{O}\}$$

Then $f$ is continuous and strictly positive (i.e. $f(x) > 0$ for all $x \in A$).

Proof: The strict positivity is easy: Since $\mathcal{O}$ is a covering of $A$, there is a set $O \in \mathcal{O}$ such that $x \in O$, and since $O$ is open, there is an $r$, $0 < r < 1$, such that $B(x; r) \subseteq O$. Hence $f(x) \ge r > 0$.

To prove the continuity, it suffices to show that $|f(x) - f(y)| \le d(x, y)$, as we can then choose $\delta = \varepsilon$ in the definition of continuity. Observe first that if $f(x), f(y) \le d(x, y)$, there is nothing to prove. Assume therefore that at least one of these values is larger than $d(x, y)$. Without loss of generality, we may assume that $f(x)$ is the larger of the two. There must then be an $r$ with $d(x, y) < r < 1$ and an $O \in \mathcal{O}$ such that $B(x; r) \subseteq O$. For any such $r$, the triangle inequality gives $B(y; r - d(x, y)) \subseteq B(x; r) \subseteq O$, and hence $f(y) \ge r - d(x, y)$. Taking the supremum over all such $r$, we get $f(y) \ge f(x) - d(x, y)$. Since by assumption $f(x) \ge f(y)$, we have $|f(x) - f(y)| \le d(x, y)$, which is what we set out to prove. □

We are now ready for the main theorem:

Theorem 2.6.4 A subset $K$ of a metric space is compact if and only if it has the open covering property.

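Proof: It remains to prove that if $K$ is compact and $\mathcal{O}$ is an open covering of $K$, then $\mathcal{O}$ has a finite subcovering (if $K$ is empty, there is nothing to prove). By the Extreme Value Theorem, the function $f$ in the lemma (with $A = K$) attains a minimal value $r$ on $K$, and since $f$ is strictly positive, $r > 0$. Since $f(x) \ge r > \frac{r}{2}$ for all $x \in K$, the definition of $f$ as a supremum shows that for all $x \in K$, the ball $B(x; \frac{r}{2})$ is contained in a set $O \in \mathcal{O}$. Since $K$ is compact, it is totally bounded, and hence there is a finite collection of balls $B(x_1; \frac{r}{2}), B(x_2; \frac{r}{2}), \ldots, B(x_n; \frac{r}{2})$ with centers in $K$ that cover $K$. Each ball $B(x_i; \frac{r}{2})$ is contained in a set $O_i \in \mathcal{O}$, and hence $O_1, O_2, \ldots, O_n$ is a finite subcovering of $\mathcal{O}$. □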
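When $K = [lo, hi]$ is an interval and the covering consists of open intervals, the proof of Theorem 2.6.4 can be run as an algorithm. The Python sketch below is our own illustration under these assumptions (the function names, the grid approximation of the minimum, and the sample cover are hypothetical choices, not from the notes). For an interval $(a, b)$ containing $x$, the largest admissible radius is $\min(x - a, b - x)$, which gives the function $f$ of Lemma 2.6.3. Since $f$ is 1-Lipschitz, its minimum on $K$ is at least the minimum over a grid of mesh $h$ minus $h/2$; we use that value as $r$, place centers in $K$ at most $r/2$ apart, and pick for each center an interval containing the ball of radius $r/2$ around it. A computer can only store finitely many intervals, so the cover below is finite but deliberately redundant.

```python
import math


def lebesgue_function(x, cover):
    """f(x) from Lemma 2.6.3 for a cover of a subset of R by open intervals (a, b).

    For an interval (a, b) containing x, the largest admissible radius is min(x - a, b - x).
    """
    best = 0.0
    for a, b in cover:
        if a < x < b:
            best = max(best, min(x - a, b - x))
    return min(best, 1.0)  # the cut-off at 1 from the lemma


def finite_subcover(cover, lo, hi, steps=1000):
    """Extract a finite subcover of K = [lo, hi], following the proof of Theorem 2.6.4."""
    h = (hi - lo) / steps
    grid = [lo + h * i for i in range(steps + 1)]
    # f is 1-Lipschitz, so its true minimum on K is at least (grid minimum - h/2)
    r = min(lebesgue_function(x, cover) for x in grid) - h / 2
    if r <= 0:
        raise ValueError("grid too coarse for this cover (or cover does not cover K)")
    # centers in K spaced at most r/2 apart, so the balls (c - r/2, c + r/2) cover K
    count = math.ceil(2 * (hi - lo) / r)
    centers = [min(lo + k * r / 2, hi) for k in range(count + 1)]
    chosen = []
    for c in centers:
        # f(c) >= r > r/2, so some interval contains the whole ball (c - r/2, c + r/2)
        interval = next((a, b) for a, b in cover if a <= c - r / 2 and c + r / 2 <= b)
        if interval not in chosen:
            chosen.append(interval)
    return chosen


# a deliberately redundant cover of K = [0, 1] by open intervals
cover = [(-0.1, 0.3), (0.2, 0.5), (0.45, 0.8), (0.7, 1.1)]
cover += [(k / 100 - 0.02, k / 100 + 0.02) for k in range(101)]
sub = finite_subcover(cover, 0.0, 1.0)
print(len(cover), len(sub), sub)
```

Which intervals end up in the subcover depends on the order in which the cover is searched; the proof only guarantees that some finite subcollection works.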

As usual, there is a reformulation of the theorem above in terms of closed sets. Let us first agree to say that a collection $\mathcal{F}$ of sets has the finite intersection property over $K$ if

$$K \cap F_1 \cap F_2 \cap \cdots \cap F_n \neq \emptyset$$

for all finite collections $F_1, F_2, \ldots, F_n$ of sets from $\mathcal{F}$.

Corollary 2.6.5 Assume that $K$ is a subset of a metric space $X$. Then the following are equivalent:

(i) K is compact.

(ii) If a collection $\mathcal{F}$ of closed sets has the finite intersection property over $K$, then

$$K \cap \Bigl(\bigcap_{F \in \mathcal{F}} F\Bigr) \neq \emptyset$$

Proof: Left to the reader (see Exercise 8). □


Problems to Section 2.6

1. Assume that $\mathcal{I}$ is a collection of open intervals in $\mathbb{R}$ whose union contains $[0, 1]$. Show that there exists a finite collection $I_1, I_2, \ldots, I_n$ of sets from $\mathcal{I}$ such that

$$[0, 1] \subseteq I_1 \cup I_2 \cup \cdots \cup I_n$$


2. Let $\{K_n\}$ be a decreasing sequence (i.e., $K_{n+1} \subseteq K_n$ for all $n \in \mathbb{N}$) of nonempty, compact sets. Show that $\bigcap_{n \in \mathbb{N}} K_n \neq \emptyset$.

3. Assume that $f\colon X \to Y$ is a continuous function between two metric spaces. Use the open covering property to show that if $K$ is a compact subset of $X$, then $f(K)$ is a compact subset of $Y$.

4. Assume that $K_1, K_2, \ldots, K_n$ are compact subsets of a metric space $X$. Use the open covering property to show that $K_1 \cup K_2 \cup \cdots \cup K_n$ is compact.

5. Use the open covering property to show that a closed subset of a compact set is compact.

6. Assume that $f\colon X \to Y$ is a continuous function between two metric spaces, and assume that $K$ is a compact subset of $X$. We shall prove that $f$ is uniformly continuous on $K$, i.e. that for each $\varepsilon > 0$, there exists a $\delta > 0$ such that whenever $x, y \in K$ and $d_X(x, y) < \delta$, then $d_Y(f(x), f(y)) < \varepsilon$ (this looks very much like ordinary continuity, but the point is that we can use the same $\delta$ at all points $x, y \in K$).

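a) Given $\varepsilon > 0$, explain that for each $x \in K$ there is a $\delta(x) > 0$ such that $d_Y(f(x), f(y)) < \frac{\varepsilon}{2}$ for all $y$ with $d_X(x, y) < \delta(x)$.

b) Explain that $\{B(x; \frac{\delta(x)}{2})\}_{x \in K}$ is an open cover of $K$, and that it has a finite subcover $B(x_1; \frac{\delta(x_1)}{2}), B(x_2; \frac{\delta(x_2)}{2}), \ldots, B(x_n; \frac{\delta(x_n)}{2})$.

c) Put $\delta = \min\{\frac{\delta(x_1)}{2}, \frac{\delta(x_2)}{2}, \ldots, \frac{\delta(x_n)}{2}\}$, and show that if $x, y \in K$ with $d_X(x, y) < \delta$, then $d_Y(f(x), f(y)) < \varepsilon$.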

2.7 The completion of a metric space

Completeness is probably the most important notion in this book, as most of the deep and important theorems about metric spaces only hold when the space is complete. In this section we shall see that it is always possible to make an incomplete space complete by adding new elements, but before we turn to this, we need to take a look at a concept that will be important in many different contexts throughout the book.

Definition 2.7.1 Let $(X, d)$ be a metric space and assume that $D$ is a subset of $X$. We say that $D$ is dense in $X$ if for each $x \in X$ there is a sequence $\{y_n\}$ from $D$ converging to $x$.

We know that $\mathbb{Q}$ is dense in $\mathbb{R}$; we may, e.g., approximate a real number by longer and longer parts of its decimal expansion. For $x = \sqrt{2}$ this would mean the approximating sequence

$$y_1 = 1.4 = \frac{14}{10},\quad y_2 = 1.41 = \frac{141}{100},\quad y_3 = 1.414 = \frac{1414}{1000},\quad y_4 = 1.4142 = \frac{14142}{10000},\quad \ldots$$
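The approximating sequence can be generated exactly with integer arithmetic: $y_n = \lfloor 10^n \sqrt{2} \rfloor / 10^n$, and $\lfloor 10^n \sqrt{2} \rfloor$ is the integer square root of $2 \cdot 10^{2n}$. The Python sketch below (our own illustration; `sqrt2_truncation` is a hypothetical name) uses `math.isqrt` (Python 3.8 or later) and `fractions.Fraction`, and checks without any floating point that $y_n^2 \le 2 < (y_n + 10^{-n})^2$, i.e. $0 \le \sqrt{2} - y_n < 10^{-n}$, which shows that $y_n \to \sqrt{2}$.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncation(n):
    """Return y_n, the decimal expansion of sqrt(2) cut off after n digits, as a rational."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)


for n in range(1, 5):
    y = sqrt2_truncation(n)
    gap = Fraction(1, 10 ** n)
    assert y * y <= 2 < (y + gap) ** 2  # hence 0 <= sqrt(2) - y_n < 10^(-n)
    print(n, float(y))  # 1.4, 1.41, 1.414, 1.4142
```

Note that `Fraction` stores $y_n$ in lowest terms, so for instance $y_1 = \frac{14}{10}$ is kept as $\frac{7}{5}$; the value is the same.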

There is an alternative description of density that we shall also need.


Proposition 2.7.2 A subset $D$ of a metric space $X$ is dense if and only if for each $x \in X$ and each $\delta > 0$, there is a $y \in D$ such that $d(x, y) \le \delta$.

Proof: Left as an exercise. □

We are now ready to return to our initial problem: How do we extend an incomplete metric space to a complete one? The following definition describes what we are looking for.

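Definition 2.7.3 If $(X, d_X)$ is a metric space, a completion of $(X, d_X)$ is a metric space $(\bar{X}, \bar{d}_X)$ such that:

(i) $(X, d_X)$ is a subspace of $(\bar{X}, \bar{d}_X)$; i.e. $X \subseteq \bar{X}$ and $\bar{d}_X(x, y) = d_X(x, y)$ for all $x, y \in X$.

(ii) $X$ is dense in $(\bar{X}, \bar{d}_X)$.

(iii) $(\bar{X}, \bar{d}_X)$ is complete.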

The canonical example of a completion is that $\mathbb{R}$ is the completion of $\mathbb{Q}$. We also note that a complete metric space is its own (unique) completion.

An incomplete metric space will have more than one completion, but as they are all isometric², they are the same for all practical purposes, and we usually talk about *the* completion of a metric space.

Proposition 2.7.4 Assume that $(Y, d_Y)$ and $(Z, d_Z)$ are completions of the metric space $(X, d_X)$. Then $(Y, d_Y)$ and $(Z, d_Z)$ are isometric.

Proof: We shall construct an isometry $i : Y \to Z$. Since $X$ is dense in $Y$, there is for each $y \in Y$ a sequence $\{x_n\}$ from $X$ converging to $y$. This sequence must be a Cauchy sequence in $X$ and hence in $Z$. Since $Z$ is complete, $\{x_n\}$ converges to an element $z \in Z$. The idea is to define $i$ by letting $i(y) = z$. For the definition to work properly, we have to check that if $\{\hat{x}_n\}$ is another sequence in $X$ converging to $y$, then $\{\hat{x}_n\}$ converges to $z$ in $Z$. This is the case since $d_Z(x_n, \hat{x}_n) = d_X(x_n, \hat{x}_n) = d_Y(x_n, \hat{x}_n) \to 0$ as $n \to \infty$.

To prove that $i$ preserves distances, assume that $y, \hat{y}$ are two points in $Y$, and that $\{x_n\}, \{\hat{x}_n\}$ are sequences in $X$ converging to $y$ and $\hat{y}$, respectively. Then $\{x_n\}, \{\hat{x}_n\}$ converge to $i(y)$ and $i(\hat{y})$, respectively, in $Z$, and we have

$$d_Z(i(y), i(\hat{y})) = \lim_{n\to\infty} d_Z(x_n, \hat{x}_n) = \lim_{n\to\infty} d_X(x_n, \hat{x}_n) = \lim_{n\to\infty} d_Y(x_n, \hat{x}_n) = d_Y(y, \hat{y})$$

² Recall from Section 2.1 that an isometry from $(X, d_X)$ to $(Y, d_Y)$ is a bijection $i : X \to Y$ such that $d_Y(i(x), i(y)) = d_X(x, y)$ for all $x, y \in X$. Two metric spaces are often considered “the same” when they are isometric; i.e. when there is an isometry between them.


It remains to prove that $i$ is a bijection. Injectivity follows immediately from distance preservation: If $y \neq \hat{y}$, then $d_Z(i(y), i(\hat{y})) = d_Y(y, \hat{y}) \neq 0$, and hence $i(y) \neq i(\hat{y})$. To show that $i$ is surjective, consider an arbitrary element $z \in Z$. Since $X$ is dense in $Z$, there is a sequence $\{x_n\}$ from $X$ converging to $z$. This sequence is Cauchy in $Z$, and hence in $X$ and in $Y$; since $Y$ is complete, $\{x_n\}$ also converges to an element $y$ in $Y$. By construction, $i(y) = z$, and hence $i$ is surjective. □

We shall use the rest of the section to show that all metric spaces $(X, d)$ have a completion. The construction is longer and more complicated than most others in this book, but also quite instructive, as it is typical of a type of construction that is very common in mathematics. As what we want to construct is a space where all Cauchy sequences from $X$ have a limit, it is not unnatural to start with the set $\mathcal{X}$ of all Cauchy sequences and see if we can turn it into a metric space containing $X$.

The first lemma gives us the information we need to construct a metric.

Lemma 2.7.5 Assume that $\{x_n\}$ and $\{y_n\}$ are two Cauchy sequences in a metric space $(X, d)$. Then $\lim_{n\to\infty} d(x_n, y_n)$ exists.

Proof: As $\mathbb{R}$ is complete, it suffices to show that $\{d(x_n, y_n)\}$ is a Cauchy sequence. We have

$$\begin{aligned} |d(x_n, y_n) - d(x_m, y_m)| &= |d(x_n, y_n) - d(x_m, y_n) + d(x_m, y_n) - d(x_m, y_m)| \\ &\leq |d(x_n, y_n) - d(x_m, y_n)| + |d(x_m, y_n) - d(x_m, y_m)| \leq d(x_n, x_m) + d(y_n, y_m) \end{aligned}$$

where we have used the inverse triangle inequality (Proposition 2.1.4) in the final step. Since $\{x_n\}$ and $\{y_n\}$ are Cauchy sequences, we can get $d(x_n, x_m)$ and $d(y_n, y_m)$ as small as we wish by choosing $n$ and $m$ sufficiently large, and hence $\{d(x_n, y_n)\}$ is a Cauchy sequence. □

As already mentioned, we let $\mathcal{X}$ be the set of all Cauchy sequences in the metric space $(X, d_X)$. We want to turn $\mathcal{X}$ into a metric space by using the “metric” $d(\{x_n\}, \{y_n\}) = \lim_{n\to\infty} d(x_n, y_n)$ to measure the distance between the sequences $\{x_n\}$ and $\{y_n\}$, but before we can do this, we have to identify Cauchy sequences that will converge to the same point in any completion. To this end we introduce a relation $\sim$ on $\mathcal{X}$ by

$$\{x_n\} \sim \{y_n\} \iff \lim_{n\to\infty} d(x_n, y_n) = 0$$

Lemma 2.7.6 ∼ is an equivalence relation.

Proof: We have to check the three properties in Definition 1.5.2:

Reflexivity: Since $\lim_{n\to\infty} d(x_n, x_n) = 0$, the relation is reflexive.

Symmetry: Since $\lim_{n\to\infty} d(x_n, y_n) = \lim_{n\to\infty} d(y_n, x_n)$, the relation is symmetric.

Transitivity: Assume that $\{x_n\} \sim \{y_n\}$ and $\{y_n\} \sim \{z_n\}$. Then $\lim_{n\to\infty} d(x_n, y_n) = \lim_{n\to\infty} d(y_n, z_n) = 0$, and consequently

$$0 \leq \lim_{n\to\infty} d(x_n, z_n) \leq \lim_{n\to\infty} \big(d(x_n, y_n) + d(y_n, z_n)\big) = \lim_{n\to\infty} d(x_n, y_n) + \lim_{n\to\infty} d(y_n, z_n) = 0$$

which shows that $\{x_n\} \sim \{z_n\}$. □
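To see the relation $\sim$ at work, one can take two different Cauchy sequences of rationals that both approximate $\sqrt{2}$: the decimal truncations from the start of this section, and the Newton iterates $x_{n+1} = \frac{1}{2}(x_n + 2/x_n)$ with $x_0 = 1$ (our own choice of second sequence, not taken from the text). Neither has a limit in $\mathbb{Q}$, but $d(x_n, y_n) \to 0$, so the two sequences are equivalent and should represent the same element of the completion. The sketch below, with hypothetical helper names, computes the distances exactly with `fractions.Fraction`.

```python
from fractions import Fraction
from math import isqrt

def truncation(n):
    """n-th decimal truncation of sqrt(2), computed exactly: floor(sqrt(2) * 10^n) / 10^n."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def newton(n):
    """n-th Newton iterate for t^2 = 2 started at 1; every iterate is rational."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def d(p, q):
    """The usual metric on Q."""
    return abs(p - q)

gaps = [d(truncation(n), newton(n)) for n in range(1, 8)]
for n, gap in enumerate(gaps, start=1):
    print(n, float(gap))
```

Of course, a finite computation only suggests that the gaps tend to 0; the proof that both sequences converge to $\sqrt{2}$ in $\mathbb{R}$ is what settles it.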

We shall denote the equivalence class of $\{x_n\}$ by $[x_n]$, and we let $\bar{X}$ be the set of all equivalence classes. The next lemma will allow us to define a natural metric on $\bar{X}$.

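Lemma 2.7.7 If $\{x_n\} \sim \{\hat{x}_n\}$ and $\{y_n\} \sim \{\hat{y}_n\}$, then $\lim_{n\to\infty} d(x_n, y_n) = \lim_{n\to\infty} d(\hat{x}_n, \hat{y}_n)$.

Proof: Since $d(x_n, y_n) \leq d(x_n, \hat{x}_n) + d(\hat{x}_n, \hat{y}_n) + d(\hat{y}_n, y_n)$ by the triangle inequality, and $\lim_{n\to\infty} d(x_n, \hat{x}_n) = \lim_{n\to\infty} d(\hat{y}_n, y_n) = 0$, we get

$$\lim_{n\to\infty} d(x_n, y_n) \leq \lim_{n\to\infty} d(\hat{x}_n, \hat{y}_n)$$

By reversing the roles of elements with and without hats, we get the opposite inequality. □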

We may now define a function $\bar{d} : \bar{X} \times \bar{X} \to [0, \infty)$ by

$$\bar{d}([x_n], [y_n]) = \lim_{n\to\infty} d(x_n, y_n)$$

Note that by the previous lemma $\bar{d}$ is well-defined; i.e. the value of $\bar{d}([x_n], [y_n])$ does not depend on which representatives $\{x_n\}$ and $\{y_n\}$ we choose from the equivalence classes $[x_n]$ and $[y_n]$.

Lemma 2.7.8 $(\bar{X}, \bar{d})$ is a metric space.

Proof: We need to check the three conditions in the definition of a metric space.

Positivity: Clearly $\bar{d}([x_n], [y_n]) = \lim_{n\to\infty} d(x_n, y_n) \geq 0$, and by definition of the equivalence relation, we have equality if and only if $[x_n] = [y_n]$.

Symmetry: Since the underlying metric $d$ is symmetric, we have

$$\bar{d}([x_n], [y_n]) = \lim_{n\to\infty} d(x_n, y_n) = \lim_{n\to\infty} d(y_n, x_n) = \bar{d}([y_n], [x_n])$$

Triangle inequality: For all equivalence classes $[x_n], [y_n], [z_n]$, we have

$$\bar{d}([x_n], [z_n]) = \lim_{n\to\infty} d(x_n, z_n) \leq \lim_{n\to\infty} d(x_n, y_n) + \lim_{n\to\infty} d(y_n, z_n) = \bar{d}([x_n], [y_n]) + \bar{d}([y_n], [z_n])$$

□

For each $x \in X$, let $\bar{x}$ be the equivalence class of the constant sequence $\{x, x, x, \ldots\}$. Since $\bar{d}(\bar{x}, \bar{y}) = \lim_{n\to\infty} d(x, y) = d(x, y)$, the mapping $x \mapsto \bar{x}$ is an embedding³ of $X$ into $\bar{X}$. Hence $\bar{X}$ contains a copy of $X$, and the next lemma shows that this copy is dense in $\bar{X}$.

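Lemma 2.7.9 The set

$$D = \{\bar{x} : x \in X\}$$

is dense in $\bar{X}$.

Proof: Assume that $[x_n] \in \bar{X}$. By Proposition 2.7.2, it suffices to show that for each $\epsilon > 0$ there is an $\bar{x} \in D$ such that $\bar{d}(\bar{x}, [x_n]) < \epsilon$. Since $\{x_n\}$ is a Cauchy sequence, there is an $N \in \mathbb{N}$ such that $d(x_n, x_N) < \frac{\epsilon}{2}$ for all $n \geq N$. Put $x = x_N$. Then $\bar{d}([x_n], \bar{x}) = \lim_{n\to\infty} d(x_n, x_N) \leq \frac{\epsilon}{2} < \epsilon$. □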

It still remains to prove that $(\bar{X}, \bar{d})$ is complete. The next lemma is the first step in this direction.

Lemma 2.7.10 All Cauchy sequences in $D$ converge to an element in $\bar{X}$.

Proof: Let $\{\bar{u}_k\}$ be a Cauchy sequence in $D$. Since $d(u_n, u_m) = \bar{d}(\bar{u}_n, \bar{u}_m)$, $\{u_n\}$ is a Cauchy sequence in $X$, and gives rise to an element $[u_n]$ in $\bar{X}$. To see that $\{\bar{u}_k\}$ converges to $[u_n]$, note that $\bar{d}(\bar{u}_k, [u_n]) = \lim_{n\to\infty} d(u_k, u_n)$. Since $\{u_n\}$ is a Cauchy sequence, this limit goes to 0 as $k$ goes to infinity. □

We are now ready to prove completeness:

Lemma 2.7.11 $(\bar{X}, \bar{d})$ is complete.

Proof: Let $\{x_n\}$ be a Cauchy sequence in $\bar{X}$. Since $D$ is dense in $\bar{X}$, there is for each $n$ a $y_n \in D$ such that $\bar{d}(x_n, y_n) < \frac{1}{n}$. It is easy to check that since $\{x_n\}$ is a Cauchy sequence, so is $\{y_n\}$. By the previous lemma, $\{y_n\}$ converges to an element in $\bar{X}$, and by construction $\{x_n\}$ must converge to the same element. Hence $(\bar{X}, \bar{d})$ is complete. □

We have reached the main theorem.

Theorem 2.7.12 Every metric space (X, d) has a completion.

³ Recall Definition 2.1.3.


Proof: We have already proved that $(\bar{X}, \bar{d})$ is a complete metric space that contains $D = \{\bar{x} : x \in X\}$ as a dense subset. In addition, we know that $D$ is a copy of $X$ (more precisely, $x \mapsto \bar{x}$ is an isometry from $X$ to $D$). All we have to do is to replace the elements $\bar{x}$ in $D$ by the original elements $x$ in $X$, and we have found a completion of $X$. □

Remark: The theorem above doesn’t solve all problems with incomplete spaces, as there may be additional structure we want the completion to reflect. If, e.g., the original space consists of functions, we may want the completion also to consist of functions, but there is nothing in the construction above that guarantees that this is possible. We shall return to this question in later chapters.



2. Let us write $(X, d_X) \sim (Y, d_Y)$ to indicate that the two spaces are isometric. Show that

(i) $(X, d_X) \sim (X, d_X)$

(ii) If $(X, d_X) \sim (Y, d_Y)$, then $(Y, d_Y) \sim (X, d_X)$

(iii) If $(X, d_X) \sim (Y, d_Y)$ and $(Y, d_Y) \sim (Z, d_Z)$, then $(X, d_X) \sim (Z, d_Z)$.

3. Show that the only completion of a complete metric space is the space itself.

4. Show that $\mathbb{R}$ is the completion of $\mathbb{Q}$ (in the usual metrics).

5. Assume that $i : X \to Y$ is an isometry between two metric spaces $(X, d_X)$ and $(Y, d_Y)$.

(i) Show that a sequence $\{x_n\}$ converges in $X$ if and only if $\{i(x_n)\}$ converges in $Y$.

(ii) Show that a set $A \subseteq X$ is open/closed/compact if and only if $i(A)$ is open/closed/compact.


Chapter 3

Spaces of continuous functions

In this chapter we shall apply the theory we developed in the previous chapter to spaces where the elements are continuous functions. We shall study completeness and compactness of such spaces and take a look at some applications.

3.1 Modes of continuity

If $(X, d_X)$ and $(Y, d_Y)$ are two metric spaces, the function $f : X \to Y$ is continuous at a point $a$ if for each $\epsilon > 0$ there is a $\delta > 0$ such that $d_Y(f(x), f(a)) < \epsilon$ whenever $d_X(x, a) < \delta$. If $f$ is also continuous at another point $b$, we may need a different $\delta$ to match the same $\epsilon$. A question that often comes up is when we can use the same $\delta$ for all points $x$ in the space $X$. The function is then said to be uniformly continuous in $X$. Here is the precise definition:

Definition 3.1.1 Let $f : X \to Y$ be a function between two metric spaces. We say that $f$ is uniformly continuous if for each $\epsilon > 0$ there is a $\delta > 0$ such that $d_Y(f(x), f(y)) < \epsilon$ for all points $x, y \in X$ such that $d_X(x, y) < \delta$.

A function which is continuous at all points in $X$, but not uniformly continuous, is often called pointwise continuous when we want to emphasize the distinction.

Example 1 The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is pointwise continuous, but not uniformly continuous. The reason is that the curve becomes steeper and steeper as $|x|$ goes to infinity, and that we hence need increasingly smaller $\delta$'s to match the same $\epsilon$ (make a sketch!). See Exercise 1 for a more detailed discussion. ♣
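The shrinking $\delta$'s can be made explicit. For $x \geq 0$ and a fixed $\epsilon$, the largest $\delta$ with $|y^2 - x^2| < \epsilon$ whenever $|y - x| < \delta$ is $\sqrt{x^2 + \epsilon} - x = \frac{\epsilon}{\sqrt{x^2 + \epsilon} + x}$, since the binding side is $y > x$. The sketch below (the helper name `largest_delta` and the value $\epsilon = 0.1$ are our own choices) tabulates this value, which behaves like $\frac{\epsilon}{2x}$ and so tends to 0; no single $\delta$ can work for all $x$.

```python
import math

def largest_delta(x, eps):
    """Largest delta such that |y**2 - x**2| < eps whenever |y - x| < delta (x >= 0).

    For x >= 0 the binding side is y > x, where y**2 - x**2 < eps exactly when
    y < sqrt(x**2 + eps); the difference sqrt(x**2 + eps) - x is rewritten as
    eps / (sqrt(x**2 + eps) + x) to avoid cancellation for large x.
    """
    return eps / (math.sqrt(x * x + eps) + x)

eps = 0.1
for x in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    print(f"x = {x:7.1f}   largest delta = {largest_delta(x, eps):.2e}")
```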


If the underlying space $X$ is compact, pointwise continuity and uniform continuity are the same. This means that a continuous function defined on a closed and bounded subset of $\mathbb{R}^n$ is always uniformly continuous.

Proposition 3.1.2 Assume that $X$ and $Y$ are metric spaces. If $X$ is compact, all continuous functions $f : X \to Y$ are uniformly continuous.

Proof: We argue contrapositively: Assume that $f$ is not uniformly continuous; we shall show that $f$ is not continuous.

Since $f$ fails to be uniformly continuous, there is an $\epsilon > 0$ we cannot match; i.e. for each $\delta > 0$ there are points $x, y \in X$ such that $d_X(x, y) < \delta$, but $d_Y(f(x), f(y)) \geq \epsilon$. Choosing $\delta = \frac{1}{n}$, there are thus points $x_n, y_n \in X$ such that $d_X(x_n, y_n) < \frac{1}{n}$ and $d_Y(f(x_n), f(y_n)) \geq \epsilon$. Since $X$ is compact, the sequence $\{x_n\}$ has a subsequence $\{x_{n_k}\}$ converging to a point $a$. Since $d_X(x_{n_k}, y_{n_k}) < \frac{1}{n_k}$, the corresponding sequence $\{y_{n_k}\}$ of $y$'s must also converge to $a$. We are now ready to show that $f$ is not continuous at $a$: Had it been, the two sequences $\{f(x_{n_k})\}$ and $\{f(y_{n_k})\}$ would both have converged to $f(a)$, something they clearly cannot since $d_Y(f(x_n), f(y_n)) \geq \epsilon$ for all $n \in \mathbb{N}$. □

There is an even more abstract form of continuity that will be important later. This time we are not considering a single function, but a whole collection of functions:

Definition 3.1.3 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and let $\mathcal{F}$ be a collection of functions $f : X \to Y$. We say that $\mathcal{F}$ is equicontinuous if for all $\epsilon > 0$, there is a $\delta > 0$ such that for all $f \in \mathcal{F}$ and all $x, y \in X$ with $d_X(x, y) < \delta$, we have $d_Y(f(x), f(y)) < \epsilon$.

Note that in this case, the same $\delta$ should not only work at all points $x, y \in X$, but also for all functions $f \in \mathcal{F}$.

Example 2 Let $\mathcal{F}$ be the set of all contractions $f : X \to X$. Then $\mathcal{F}$ is equicontinuous, since we can choose $\delta = \epsilon$. To see this, just note that if $d_X(x, y) < \delta = \epsilon$, then $d_X(f(x), f(y)) \leq d_X(x, y) < \epsilon$ for all $x, y \in X$ and all $f \in \mathcal{F}$. ♣

Equicontinuous families will be important when we study compact sets of continuous functions in Section 3.5.


1. Show that the function $f(x) = x^2$ is not uniformly continuous on $\mathbb{R}$. (Hint: You may want to use the factorization $f(x) - f(y) = x^2 - y^2 = (x + y)(x - y)$.)


2. Prove that the function $f : (0, 1) \to \mathbb{R}$ given by $f(x) = \frac{1}{x}$ is not uniformly continuous.

3. A function $f : X \to Y$ between metric spaces is said to be Lipschitz-continuous with Lipschitz constant $K$ if $d_Y(f(x), f(y)) \leq K d_X(x, y)$ for all $x, y \in X$. Assume that $\mathcal{F}$ is a collection of functions $f : X \to Y$ with Lipschitz constant $K$. Show that $\mathcal{F}$ is equicontinuous.

4. Let $f : \mathbb{R} \to \mathbb{R}$ be a differentiable function and assume that the derivative $f'$ is bounded. Show that $f$ is uniformly continuous.

3.2 Modes of convergence

In this section we shall study two ways in which a sequence $\{f_n\}$ of continuous functions can converge to a limit function $f$: pointwise convergence and uniform convergence. The distinction is rather similar to the distinction between pointwise and uniform continuity in the previous section: in the pointwise case, a condition can be satisfied in different ways for different $x$'s; in the uniform case, it must be satisfied in the same way for all $x$. We begin with pointwise convergence:

Definition 3.2.1 Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces, and let $\{f_n\}$ be a sequence of functions $f_n : X \to Y$. We say that $\{f_n\}$ converges pointwise to a function $f : X \to Y$ if $f_n(x) \to f(x)$ for all $x \in X$. This means that for each $x$ and each $\epsilon > 0$, there is an $N \in \mathbb{N}$ such that $d_Y(f_n(x), f(x)) < \epsilon$ when $n \geq N$.

Note that the $N$ in the last sentence of the definition depends on $x$; we may need a much larger $N$ for some $x$'s than for others. If we can use the same $N$ for all $x \in X$, we have uniform convergence. Here is the precise definition:

Definition 3.2.2 Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces, and let $\{f_n\}$ be a sequence of functions $f_n : X \to Y$. We say that $\{f_n\}$ converges uniformly to a function $f : X \to Y$ if for each $\epsilon > 0$, there is an $N \in \mathbb{N}$ such that if $n \geq N$, then $d_Y(f_n(x), f(x)) < \epsilon$ for all $x \in X$.

At first glance, the two definitions may seem confusingly similar, but the difference is that in the last one, the same $N$ should work simultaneously for all $x$, while in the first we can adapt $N$ to each individual $x$. Hence uniform convergence implies pointwise convergence, but a sequence may converge pointwise but not uniformly. Before we look at an example, it will be useful to reformulate the definition of uniform convergence.

Proposition 3.2.3 Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces, and let $\{f_n\}$ be a sequence of functions $f_n : X \to Y$. For any function $f : X \to Y$ the following are equivalent:

(i) $\{f_n\}$ converges uniformly to $f$.

(ii) $\sup\{d_Y(f_n(x), f(x)) \mid x \in X\} \to 0$ as $n \to \infty$.

Hence uniform convergence means that the “maximal” distance between $f$ and $f_n$ goes to zero.

Proof: (i) $\Rightarrow$ (ii) Assume that $\{f_n\}$ converges uniformly to $f$. For any $\epsilon > 0$, we can find an $N \in \mathbb{N}$ such that $d_Y(f_n(x), f(x)) < \epsilon$ for all $x \in X$ and all $n \geq N$. This means that $\sup\{d_Y(f_n(x), f(x)) \mid x \in X\} \leq \epsilon$ for all $n \geq N$ (note that we may have non-strict inequality $\leq$ for the supremum although we have strict inequality $<$ for each $x \in X$), and since $\epsilon$ is arbitrary, this implies that $\sup\{d_Y(f_n(x), f(x)) \mid x \in X\} \to 0$.

(ii) $\Rightarrow$ (i) Assume that $\sup\{d_Y(f_n(x), f(x)) \mid x \in X\} \to 0$ as $n \to \infty$. Given an $\epsilon > 0$, there is an $N \in \mathbb{N}$ such that $\sup\{d_Y(f_n(x), f(x)) \mid x \in X\} < \epsilon$ for all $n \geq N$. But then we have $d_Y(f_n(x), f(x)) < \epsilon$ for all $x \in X$ and all $n \geq N$, which means that $\{f_n\}$ converges uniformly to $f$. □

Here is an example which shows clearly the distinction between pointwise and uniform convergence:

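Example 1 Let $f_n : [0, 1] \to \mathbb{R}$ be the function in Figure 1. It is constant zero except on the interval $[0, \frac{1}{n}]$ where it looks like a tent of height 1.

[Figure 1: graph of $f_n$ on $[0, 1]$, a tent over $[0, \frac{1}{n}]$ with peak value 1.]

If you insist, the function is defined by

$$f_n(x) = \begin{cases} 2nx & \text{if } 0 \leq x < \frac{1}{2n} \\ -2nx + 2 & \text{if } \frac{1}{2n} \leq x < \frac{1}{n} \\ 0 & \text{if } \frac{1}{n} \leq x \leq 1 \end{cases}$$

but it is much easier just to work from the picture.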

The sequence $\{f_n\}$ converges pointwise to 0, because at every point $x \in [0, 1]$ the value of $f_n(x)$ eventually becomes 0 (for $x = 0$, the value is always 0, and for $x > 0$ the “tent” will eventually pass to the left of $x$). However, since the maximum value of all $f_n$ is 1, $\sup\{d_Y(f_n(x), f(x)) \mid x \in [0, 1]\} = 1$ for all $n$, and hence $\{f_n\}$ does not converge uniformly to 0. ♣
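Both halves of the argument can be checked numerically. The sketch below codes the formula for $f_n$ (the name `tent` is our own) and prints, for growing $n$, the value at the fixed point $x = 0.01$, which becomes 0, next to the maximum value at the moving peak $x = \frac{1}{2n}$, which stays 1. By Proposition 3.2.3 the second column alone rules out uniform convergence. Evaluating at the peak, rather than maximizing over a fixed grid, matters here: for large $n$ the whole tent can fall between two grid points.

```python
def tent(n, x):
    """The tent function f_n of Example 1, for x in [0, 1]."""
    if x < 1 / (2 * n):
        return 2 * n * x
    if x < 1 / n:
        return -2 * n * x + 2
    return 0.0

for n in [1, 10, 100, 1000]:
    value = tent(n, 0.01)        # f_n at the fixed point x = 0.01
    peak = tent(n, 1 / (2 * n))  # the maximum of f_n, attained at x = 1/(2n)
    print(f"n = {n:4d}   f_n(0.01) = {value:.3f}   max f_n = {peak:.3f}")
```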

When we are working with convergent sequences, we would often like the limit to inherit properties from the elements in the sequence. If, e.g., $\{f_n\}$ is a sequence of continuous functions converging to a limit $f$, we are often interested in showing that $f$ is also continuous. The next example shows that this is not always the case when we are dealing with pointwise convergence.

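Example 2: Let $f_n : \mathbb{R} \to \mathbb{R}$ be the function in Figure 2.

[Figure 2: graph of $f_n$, equal to $-1$ for $x \leq -\frac{1}{n}$, rising linearly to 1 on $[-\frac{1}{n}, \frac{1}{n}]$, and equal to 1 for $x \geq \frac{1}{n}$.]

It is defined by

$$f_n(x) = \begin{cases} -1 & \text{if } x \leq -\frac{1}{n} \\ nx & \text{if } -\frac{1}{n} < x < \frac{1}{n} \\ 1 & \text{if } \frac{1}{n} \leq x \end{cases}$$

The sequence $\{f_n\}$ converges pointwise to the function $f$ defined by

$$f(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases}$$

but although all the functions $f_n$ are continuous, the limit function $f$ is not. ♣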
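Consistent with Proposition 3.2.4 below, the convergence in Example 2 cannot be uniform, and a short computation shows why. The sketch (function names are our own) checks pointwise convergence at the fixed point $x = 0.3$, and evaluates $|f_n(x) - f(x)|$ at the moving point $x = \frac{1}{2n}$, where $f_n$ is only halfway up its ramp while $f(x) = 1$. So $\sup_x |f_n(x) - f(x)| \geq \frac{1}{2}$ for every $n$.

```python
def f_n(n, x):
    """The piecewise linear function f_n of Example 2."""
    if x <= -1 / n:
        return -1.0
    if x < 1 / n:
        return n * x
    return 1.0

def f(x):
    """The pointwise limit: -1, 0 or 1 according to the sign of x."""
    return float((x > 0) - (x < 0))

for n in [1, 10, 100, 1000]:
    x = 1 / (2 * n)   # f_n is only halfway up its ramp here, while f(x) = 1
    gap = abs(f_n(n, x) - f(x))
    print(f"n = {n:4d}   f_n(0.3) = {f_n(n, 0.3):.2f}   |f_n - f| at 1/(2n) = {gap:.2f}")
```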

If we strengthen the convergence from pointwise to uniform, the limit of a sequence of continuous functions is always continuous.


Proposition 3.2.4 Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces, and assume that $\{f_n\}$ is a sequence of continuous functions $f_n : X \to Y$ converging uniformly to a function $f$. Then $f$ is continuous.

Proof: Let $a \in X$. Given an $\epsilon > 0$, we must find a $\delta > 0$ such that $d_Y(f(x), f(a)) < \epsilon$ whenever $d_X(x, a) < \delta$. Since $\{f_n\}$ converges uniformly to $f$, there is an $N \in \mathbb{N}$ such that when $n \geq N$, $d_Y(f(x), f_n(x)) < \frac{\epsilon}{3}$ for all $x \in X$. Since $f_N$ is continuous at $a$, there is a $\delta > 0$ such that $d_Y(f_N(x), f_N(a)) < \frac{\epsilon}{3}$ whenever $d_X(x, a) < \delta$. If $d_X(x, a) < \delta$, we then have

$$d_Y(f(x), f(a)) \leq d_Y(f(x), f_N(x)) + d_Y(f_N(x), f_N(a)) + d_Y(f_N(a), f(a)) < \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon$$

and hence $f$ is continuous at $a$. □

The technique in the proof above is quite common, and arguments of this kind are often referred to as $\frac{\epsilon}{3}$-arguments.


1. Let $f_n : \mathbb{R} \to \mathbb{R}$ be defined by $f_n(x) = \frac{x}{n}$. Show that $\{f_n\}$ converges pointwise, but not uniformly, to 0.

2. Let $f_n : (0, 1) \to \mathbb{R}$ be defined by $f_n(x) = x^n$. Show that $\{f_n\}$ converges pointwise, but not uniformly, to 0.

3. The function fn : [0,∞)→ R is defined by fn(x) = e−x(xn

)ne.

a) Show that fn converges pointwise.

b) Find the maximum value of fn. Does fn converge uniformly?

4. The function $f_n : (0, \infty) \to \mathbb{R}$ is defined by

$$f_n(x) = n(x^{1/n} - 1)$$

Show that $\{f_n\}$ converges pointwise to $f(x) = \ln x$. Show that the convergence is uniform on each interval $(\frac{1}{k}, k)$, $k \in \mathbb{N}$, but not on $(0, \infty)$.

5. Let $f_n : \mathbb{R} \to \mathbb{R}$ and assume that the sequence $\{f_n\}$ of continuous functions converges uniformly to $f : \mathbb{R} \to \mathbb{R}$ on all intervals $[-k, k]$, $k \in \mathbb{N}$. Show that $f$ is continuous.

6. Assume that $X$ is a metric space and that $f_n, g_n$ are functions from $X$ to $\mathbb{R}$. Show that if $\{f_n\}$ and $\{g_n\}$ converge uniformly to $f$ and $g$, respectively, then $\{f_n + g_n\}$ converges uniformly to $f + g$.

7. Assume that $f_n : [a, b] \to \mathbb{R}$ are continuous functions converging uniformly to $f$. Show that

$$\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx$$

Find an example which shows that this is not necessarily the case if $\{f_n\}$ only converges pointwise to $f$.


8. Let $f_n : \mathbb{R} \to \mathbb{R}$ be given by $f_n(x) = \frac{1}{n}\sin(nx)$. Show that $\{f_n\}$ converges uniformly to 0, but that the sequence $\{f_n'\}$ of derivatives does not converge. Sketch the graphs of $f_n$ to see what is happening.

9. Let $(X, d)$ be a metric space and assume that the sequence $\{f_n\}$ of continuous functions converges uniformly to $f$. Show that if $\{x_n\}$ is a sequence in $X$ converging to $x$, then $f_n(x_n) \to f(x)$. Find an example which shows that this is not necessarily the case if $\{f_n\}$ only converges pointwise to $f$.

10. Assume that the functions $f_n : X \to Y$ converge uniformly to $f$, and that $g : Y \to Z$ is uniformly continuous. Show that the sequence $\{g \circ f_n\}$ converges uniformly. Find an example which shows that the conclusion does not necessarily hold if $g$ is only pointwise continuous.

11. Assume that $\sum_{n=0}^\infty M_n$ is a convergent series of positive numbers. Assume that $f_n : X \to \mathbb{R}$ is a sequence of continuous functions defined on a metric space $(X, d)$. Show that if $|f_n(x)| \leq M_n$ for all $x \in X$ and all $n \in \mathbb{N}$, then the partial sums $s_N(x) = \sum_{n=0}^N f_n(x)$ converge uniformly to a continuous function $s : X \to \mathbb{R}$ as $N \to \infty$. (This is called Weierstrass’ M-test.)

12. Assume that $(X, d)$ is a compact space and that $\{f_n\}$ is a decreasing sequence of continuous functions converging pointwise to a continuous function $f$. Show that the convergence is uniform (this is called Dini's theorem).
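For Exercise 11, a quick numerical illustration (not a proof) of the estimate behind the M-test may help. With the hypothetical choice $f_n(x) = \sin(nx)/n^2$ and $M_n = 1/n^2$ for $n \geq 1$ (we start at $n = 1$ to avoid dividing by zero), two partial sums satisfy $|s_M(x) - s_N(x)| \leq \sum_{n=N+1}^{M} M_n$ at every $x$. The right hand side does not depend on $x$, which is exactly the uniform Cauchy estimate. The sketch checks this on a grid in $[0, 2\pi]$; grid size, $N$ and $M$ are arbitrary choices.

```python
import math

def partial_sum(N, x):
    """s_N(x) = sum_{n=1}^{N} sin(n x) / n^2."""
    return sum(math.sin(n * x) / n ** 2 for n in range(1, N + 1))

def tail_bound(N, M):
    """sum_{n=N+1}^{M} M_n with M_n = 1/n^2: a bound for |s_M(x) - s_N(x)| valid for every x."""
    return sum(1 / n ** 2 for n in range(N + 1, M + 1))

grid = [2 * math.pi * i / 500 for i in range(501)]
N, M = 20, 400
worst = max(abs(partial_sum(M, x) - partial_sum(N, x)) for x in grid)
print(f"largest |s_M - s_N| on the grid: {worst:.4f}   bound: {tail_bound(N, M):.4f}")
```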

3.3 The spaces $C(X, Y)$

If $(X, d_X)$ and $(Y, d_Y)$ are metric spaces, we let

$$C(X, Y) = \{f : X \to Y \mid f \text{ is continuous}\}$$

be the collection of all continuous functions from $X$ to $Y$. In this section we shall see how we can turn $C(X, Y)$ into a metric space. To avoid certain technicalities, we shall restrict ourselves to the case where $X$ is compact, as this is sufficient to cover most interesting applications (see Exercise 4 for one possible way of extending the theory to the non-compact case).

The basic idea is to measure the distance between two functions by looking at the point where they are furthest apart; i.e. by

$$\rho(f, g) = \sup\{d_Y(f(x), g(x)) \mid x \in X\}$$
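When $X = [a, b]$ and $Y = \mathbb{R}$ with the usual metric, $\rho(f, g)$ can be approximated by maximizing $|f(x) - g(x)|$ over a fine grid. The helper `rho_on_grid` below is hypothetical and only for illustration: a grid maximum can only underestimate the supremum, but for continuous functions on a compact interval it approaches the supremum as the grid is refined. For $f(x) = x$ and $g(x) = x^3$ on $[0, 1]$ (our own example), the exact value is $\frac{2}{3\sqrt{3}}$, attained at $x = 1/\sqrt{3}$.

```python
import math

def rho_on_grid(f, g, a, b, points=100_001):
    """Approximate rho(f, g) = sup{|f(x) - g(x)| : x in [a, b]} by a maximum over a uniform grid.

    For continuous f, g on the compact interval [a, b] the grid maximum
    approaches the true supremum from below as the grid is refined.
    """
    step = (b - a) / (points - 1)
    return max(abs(f(a + i * step) - g(a + i * step)) for i in range(points))

approx = rho_on_grid(lambda x: x, lambda x: x ** 3, 0.0, 1.0)
exact = 2 / (3 * math.sqrt(3))   # attained where the derivative 1 - 3x^2 vanishes
print(approx, exact)
```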

Our first task is to show that $\rho$ is a metric on $C(X, Y)$. But first we need a lemma:

Lemma 3.3.1 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and assume that $X$ is compact. If $f, g : X \to Y$ are continuous functions, then

$$\rho(f, g) = \sup\{d_Y(f(x), g(x)) \mid x \in X\}$$

is finite, and there is a point $x \in X$ such that $d_Y(f(x), g(x)) = \rho(f, g)$.


Proof: The result will follow from the Extreme Value Theorem (Theorem 2.5.9) if we can only show that the function

$$h(x) = d_Y(f(x), g(x))$$

is continuous. By the triangle inequality for numbers and the inverse triangle inequality 2.1.4, we get

$$\begin{aligned} |h(x) - h(y)| &= |d_Y(f(x), g(x)) - d_Y(f(y), g(y))| \\ &= |d_Y(f(x), g(x)) - d_Y(f(x), g(y)) + d_Y(f(x), g(y)) - d_Y(f(y), g(y))| \\ &\leq |d_Y(f(x), g(x)) - d_Y(f(x), g(y))| + |d_Y(f(x), g(y)) - d_Y(f(y), g(y))| \\ &\leq d_Y(g(x), g(y)) + d_Y(f(x), f(y)) \end{aligned}$$

To prove that $h$ is continuous at $x$, just observe that since $f$ and $g$ are continuous at $x$, there is for any given $\epsilon > 0$ a $\delta > 0$ such that $d_Y(f(x), f(y)) < \frac{\epsilon}{2}$ and $d_Y(g(x), g(y)) < \frac{\epsilon}{2}$ when $d_X(x, y) < \delta$. But then

$$|h(x) - h(y)| \leq d_Y(f(x), f(y)) + d_Y(g(y), g(x)) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$

whenever $d_X(x, y) < \delta$, and hence $h$ is continuous. □

We are now ready to prove that ρ is a metric on C(X,Y ):

Proposition 3.3.2 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and assume that $X$ is compact. Then

$$\rho(f, g) = \sup\{d_Y(f(x), g(x)) \mid x \in X\}$$

defines a metric on $C(X, Y)$.

Proof: By the lemma, $\rho(f, g)$ is always finite, and we only have to prove that $\rho$ satisfies the three properties of a metric: positivity, symmetry, and the triangle inequality. The first two are more or less obvious, and we concentrate on the triangle inequality:

Assume that $f, g, h$ are three functions in $C(X, Y)$; we must show that

$$\rho(f, g) \leq \rho(f, h) + \rho(h, g)$$

According to the lemma, there is a point $x \in X$ such that $\rho(f, g) = d_Y(f(x), g(x))$. But then

$$\rho(f, g) = d_Y(f(x), g(x)) \leq d_Y(f(x), h(x)) + d_Y(h(x), g(x)) \leq \rho(f, h) + \rho(h, g)$$

where we have used the triangle inequality in $Y$ and the definition of $\rho$. □

Not surprisingly, convergence in $C(X, Y)$ is exactly the same as uniform convergence.


Proposition 3.3.3 A sequence $\{f_n\}$ converges to $f$ in $(C(X, Y), \rho)$ if and only if it converges uniformly to $f$.

Proof: According to Proposition 3.2.3, $\{f_n\}$ converges uniformly to $f$ if and only if

$$\sup\{d_Y(f_n(x), f(x)) \mid x \in X\} \to 0$$

This just means that $\rho(f_n, f) \to 0$, which is to say that $\{f_n\}$ converges to $f$ in $(C(X, Y), \rho)$. □

The next result is the starting point for many applications; it shows that $C(X, Y)$ is complete if $Y$ is.

Theorem 3.3.4 Assume that $(X, d_X)$ is a compact and $(Y, d_Y)$ a complete metric space. Then $(C(X, Y), \rho)$ is complete.

Proof: Assume that $\{f_n\}$ is a Cauchy sequence in $C(X, Y)$. We must prove that $\{f_n\}$ converges to a function $f \in C(X, Y)$.

Fix an element $x \in X$. Since $d_Y(f_n(x), f_m(x)) \leq \rho(f_n, f_m)$ and $\{f_n\}$ is a Cauchy sequence in $(C(X, Y), \rho)$, the function values $\{f_n(x)\}$ form a Cauchy sequence in $Y$. Since $Y$ is complete, $\{f_n(x)\}$ converges to a point $f(x)$ in $Y$. This means that $\{f_n\}$ converges pointwise to a function $f : X \to Y$. We must prove that $f \in C(X, Y)$ and that $\{f_n\}$ converges to $f$ in the $\rho$-metric.

Since $\{f_n\}$ is a Cauchy sequence, we can for any $\epsilon > 0$ find an $N \in \mathbb{N}$ such that $\rho(f_n, f_m) < \frac{\epsilon}{2}$ when $n, m \geq N$. This means that for all $x \in X$ and all $n, m \geq N$, $d_Y(f_n(x), f_m(x)) < \frac{\epsilon}{2}$. If we let $m \to \infty$, we see that for all $x \in X$ and all $n \geq N$

$$d_Y(f_n(x), f(x)) = \lim_{m\to\infty} d_Y(f_n(x), f_m(x)) \leq \frac{\epsilon}{2} < \epsilon$$

This means that $\{f_n\}$ converges uniformly to $f$. According to Proposition 3.2.4, $f$ is continuous and belongs to $C(X, Y)$, and according to the proposition above, $\{f_n\}$ converges to $f$ in $(C(X, Y), \rho)$. □

In the next section we shall combine the result above with Banach's Fixed Point Theorem to obtain our first real application.


1. Let $f, g : [0, 1] \to \mathbb{R}$ be given by $f(x) = x$, $g(x) = x^2$. Find $\rho(f, g)$.

2. Let $f, g : [0, 2\pi] \to \mathbb{R}$ be given by $f(x) = \sin x$, $g(x) = \cos x$. Find $\rho(f, g)$.

3. Complete the proof of Proposition 3.3.2 by showing that $\rho$ satisfies the first two conditions of a metric (positivity and symmetry).


4. The main reason why we have restricted the theory above to the case where $X$ is compact is that if not,

$$\rho(f, g) = \sup\{d_Y(f(x), g(x)) \mid x \in X\}$$

may be infinite, and then $\rho$ is not a metric. In this problem we shall sketch a way to avoid this problem.

A function $f : X \to Y$ is called bounded if there is a point $a \in Y$ and a constant $K \in \mathbb{R}$ such that $d_Y(a, f(x)) \leq K$ for all $x \in X$ (it doesn't matter which point $a$ we use in this definition). Let $C_0(X, Y)$ be the set of all bounded, continuous functions $f : X \to Y$, and define

$$\rho(f, g) = \sup\{d_Y(f(x), g(x)) \mid x \in X\}$$

a) Show that $\rho(f, g) < \infty$ for all $f, g \in C_0(X, Y)$.

b) Show by an example that there need not be a point $x$ in $X$ such that $\rho(f, g) = d_Y(f(x), g(x))$.

c) Show that $\rho$ is a metric on $C_0(X, Y)$.

d) Show that if a sequence $\{f_n\}$ of functions in $C_0(X, Y)$ converges uniformly to a function $f$, then $f \in C_0(X, Y)$.

e) Assume that $(Y, d_Y)$ is complete. Show that $(C_0(X, Y), \rho)$ is complete.

f) Let $c_0$ be the set of all bounded sequences in $\mathbb{R}$. If $\{x_n\}, \{y_n\}$ are in $c_0$, define

$$\rho(\{x_n\}, \{y_n\}) = \sup\{|x_n - y_n| : n \in \mathbb{N}\}$$

Prove that $(c_0, \rho)$ is a complete metric space. (Hint: You may think of $c_0$ as $C_0(\mathbb{N}, \mathbb{R})$ where $\mathbb{N}$ has the discrete metric.)

3.4 Applications to differential equations

Consider a system of differential equations

$$\begin{aligned} y_1'(t) &= f_1(t, y_1(t), y_2(t), \ldots, y_n(t)) \\ y_2'(t) &= f_2(t, y_1(t), y_2(t), \ldots, y_n(t)) \\ &\;\;\vdots \\ y_n'(t) &= f_n(t, y_1(t), y_2(t), \ldots, y_n(t)) \end{aligned}$$

with initial conditions $y_1(0) = Y_1, y_2(0) = Y_2, \ldots, y_n(0) = Y_n$. In this section we shall use Banach's Fixed Point Theorem 2.4.5 and the completeness of $C([0, a], \mathbb{R}^n)$ to prove that under reasonable conditions such systems have a unique solution.

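We begin by introducing vector notation to make the formulas easier to read:

$$\mathbf{y}(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_n(t) \end{pmatrix}, \qquad \mathbf{y}_0 = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$

and

$$\mathbf{f}(t, \mathbf{y}(t)) = \begin{pmatrix} f_1(t, y_1(t), y_2(t), \ldots, y_n(t)) \\ f_2(t, y_1(t), y_2(t), \ldots, y_n(t)) \\ \vdots \\ f_n(t, y_1(t), y_2(t), \ldots, y_n(t)) \end{pmatrix}$$

In this notation, the system becomes

$$\mathbf{y}'(t) = \mathbf{f}(t, \mathbf{y}(t)), \quad \mathbf{y}(0) = \mathbf{y}_0 \tag{3.4.1}$$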

The next step is to rewrite the differential equation as an integral equation. If we integrate on both sides of (3.4.1), we get

$$\mathbf{y}(t) - \mathbf{y}(0) = \int_0^t \mathbf{f}(s, \mathbf{y}(s))\,ds$$

i.e.

$$\mathbf{y}(t) = \mathbf{y}_0 + \int_0^t \mathbf{f}(s, \mathbf{y}(s))\,ds \tag{3.4.2}$$

On the other hand, if we start with a solution of (3.4.2) and differentiate, we arrive at (3.4.1). Hence solving (3.4.1) and (3.4.2) amounts to exactly the same thing, and for us it will be convenient to concentrate on (3.4.2).

Let us begin by putting an arbitrary, continuous function $\mathbf{z}$ into the right hand side of (3.4.2). What we get out is another function $\mathbf{u}$ defined by

$$\mathbf{u}(t) = \mathbf{y}_0 + \int_0^t \mathbf{f}(s, \mathbf{z}(s))\,ds$$

We can think of this as a function $F$ mapping continuous functions $\mathbf{z}$ to continuous functions $\mathbf{u} = F(\mathbf{z})$. From this point of view, a solution $\mathbf{y}$ of the integral equation (3.4.2) is just a fixed point for the function $F$: we are looking for a $\mathbf{y}$ such that $\mathbf{y} = F(\mathbf{y})$. (Don't worry if you feel a little dizzy; that's just normal at this stage! Note that $F$ is a function acting on a function $\mathbf{z}$ to produce a new function $\mathbf{u} = F(\mathbf{z})$; it takes some time to get used to such creatures!)

Our plan is to use Banach's Fixed Point Theorem to prove that $F$ has a unique fixed point, but first we have to introduce a crucial condition. We say that the function $\mathbf{f} : [a, b] \times \mathbb{R}^n \to \mathbb{R}^n$ is uniformly Lipschitz with Lipschitz constant $K$ on the interval $[a, b]$ if $K$ is a real number such that

$$|\mathbf{f}(t, \mathbf{y}) - \mathbf{f}(t, \mathbf{z})| \leq K|\mathbf{y} - \mathbf{z}|$$

for all $t \in [a, b]$ and all $\mathbf{y}, \mathbf{z} \in \mathbb{R}^n$. Here is the key observation in our argument.


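Lemma 3.4.1 Assume that $\mathbf{y}_0 \in \mathbb{R}^n$ and that $\mathbf{f} : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and uniformly Lipschitz with Lipschitz constant $K$ on $[0, \infty)$. If $a < \frac{1}{K}$, the map

$$F : C([0, a], \mathbb{R}^n) \to C([0, a], \mathbb{R}^n)$$

defined by

$$F(\mathbf{z})(t) = \mathbf{y}_0 + \int_0^t \mathbf{f}(s, \mathbf{z}(s))\,ds$$

is a contraction.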

Remark: The notation here is rather messy. Remember that $F(\mathbf{z})$ is a function from $[0, a]$ to $\mathbb{R}^n$. The expression $F(\mathbf{z})(t)$ denotes the value of this function at the point $t \in [0, a]$.

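Proof: Let $\mathbf{v}, \mathbf{w}$ be two elements in $C([0, a], \mathbb{R}^n)$, and note that for any $t \in [0, a]$

$$\begin{aligned} |F(\mathbf{v})(t) - F(\mathbf{w})(t)| &= \left|\int_0^t \big(\mathbf{f}(s, \mathbf{v}(s)) - \mathbf{f}(s, \mathbf{w}(s))\big)\,ds\right| \leq \int_0^t |\mathbf{f}(s, \mathbf{v}(s)) - \mathbf{f}(s, \mathbf{w}(s))|\,ds \\ &\leq \int_0^t K|\mathbf{v}(s) - \mathbf{w}(s)|\,ds \leq K\int_0^t \rho(\mathbf{v}, \mathbf{w})\,ds \leq K\int_0^a \rho(\mathbf{v}, \mathbf{w})\,ds = Ka\,\rho(\mathbf{v}, \mathbf{w}) \end{aligned}$$

Taking the supremum over all $t \in [0, a]$, we get

$$\rho(F(\mathbf{v}), F(\mathbf{w})) \leq Ka\,\rho(\mathbf{v}, \mathbf{w}).$$

Since $Ka < 1$, this means that $F$ is a contraction. □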
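The estimate $\rho(F(\mathbf{v}), F(\mathbf{w})) \leq Ka\,\rho(\mathbf{v}, \mathbf{w})$ can also be observed numerically. The sketch below covers the scalar case $n = 1$ only, with the hypothetical choice $f(t, y) = \sin y$, which is uniformly Lipschitz with $K = 1$, and $a = 0.5$, so $Ka = 0.5$. Functions on $[0, a]$ are represented by their values on a uniform grid, the integral in $F$ is replaced by the cumulative trapezoidal rule, and $\rho$ by the maximum over the grid. For this discretization the chain of inequalities in the proof goes through step by step, so every observed ratio should be at most $Ka$. The test functions $A\sin(Bt + C)$ are an arbitrary choice.

```python
import math
import random

def apply_F(f, y0, ts, z):
    """Discretized F(z)(t) = y0 + integral_0^t f(s, z(s)) ds on the grid ts.

    z holds the values of z on ts; the integral is replaced by the cumulative
    trapezoidal rule, so the result is again a list of grid values.
    """
    u = [y0]
    for i in range(1, len(ts)):
        h = ts[i] - ts[i - 1]
        u.append(u[-1] + h * (f(ts[i - 1], z[i - 1]) + f(ts[i], z[i])) / 2)
    return u

def rho(v, w):
    """Sup metric, approximated by the maximum over the grid points."""
    return max(abs(vi - wi) for vi, wi in zip(v, w))

def random_function(ts):
    """Sample a random continuous function A*sin(B*t + C) on the grid ts."""
    A, B, C = random.uniform(-3, 3), random.uniform(0, 20), random.uniform(0, 6)
    return [A * math.sin(B * t + C) for t in ts]

def f(t, y):
    return math.sin(y)        # |sin y - sin z| <= |y - z|, so K = 1

K, a, y0 = 1.0, 0.5, 0.0
ts = [a * i / 200 for i in range(201)]

random.seed(0)
ratios = []
for _ in range(100):
    v, w = random_function(ts), random_function(ts)
    ratios.append(rho(apply_F(f, y0, ts, v), apply_F(f, y0, ts, w)) / rho(v, w))
print(f"largest observed ratio {max(ratios):.3f} <= Ka = {K * a}")
```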

We are now ready for the main theorem.

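Theorem 3.4.2 Assume that $\mathbf{y}_0 \in \mathbb{R}^n$ and that $\mathbf{f} : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and uniformly Lipschitz on $[0, \infty)$. Then the initial value problem

$$\mathbf{y}'(t) = \mathbf{f}(t, \mathbf{y}(t)), \quad \mathbf{y}(0) = \mathbf{y}_0 \tag{3.4.3}$$

has a unique solution $\mathbf{y}$ on $[0, \infty)$.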

Proof: Let $K$ be the uniform Lipschitz constant, and choose a number $a < 1/K$. According to the lemma, the function

$$F : C([0, a], \mathbb{R}^n) \to C([0, a], \mathbb{R}^n)$$

defined by

$$F(\mathbf{z})(t) = \mathbf{y}_0 + \int_0^t \mathbf{f}(s, \mathbf{z}(s))\,ds$$

is a contraction. Since $C([0, a], \mathbb{R}^n)$ is complete by Theorem 3.3.4, Banach's Fixed Point Theorem tells us that $F$ has a unique fixed point $\mathbf{y}$. This means that the integral equation

$$\mathbf{y}(t) = \mathbf{y}_0 + \int_0^t \mathbf{f}(s, \mathbf{y}(s))\,ds \tag{3.4.4}$$

has a unique solution on the interval $[0, a]$. To extend the solution to a longer interval, we just repeat the argument on the interval $[a, 2a]$, using $\mathbf{y}(a)$ as initial value. The function we then get is a solution of the integral equation (3.4.4) on the extended interval $[0, 2a]$, as we for $t \in [a, 2a]$ have

$$\mathbf{y}(t) = \mathbf{y}(a) + \int_a^t \mathbf{f}(s, \mathbf{y}(s))\,ds = \mathbf{y}_0 + \int_0^a \mathbf{f}(s, \mathbf{y}(s))\,ds + \int_a^t \mathbf{f}(s, \mathbf{y}(s))\,ds = \mathbf{y}_0 + \int_0^t \mathbf{f}(s, \mathbf{y}(s))\,ds$$

Continuing this procedure to new intervals $[2a, 3a], [3a, 4a], \ldots$, we see that the integral equation (3.4.4) has a unique solution on all of $[0, \infty)$. As we have already observed that equation (3.4.3) has exactly the same solutions as equation (3.4.4), the theorem is proved. □
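The proof is constructive: Banach's Fixed Point Theorem says that the iterates $\mathbf{z}, F(\mathbf{z}), F(F(\mathbf{z})), \ldots$ converge to the solution, and the solution is then continued interval by interval. The sketch below imitates both steps for the scalar test problem $y' = y$, $y(0) = 1$ (our own choice, with $K = 1$ and exact solution $e^t$), taking $a = 0.5 < 1/K$ and four intervals. Integrals are replaced by the cumulative trapezoidal rule on a grid, so the result agrees with $e^t$ only up to a small discretization error; the function name `picard_solve`, the grid size and the number of iterations are arbitrary, hypothetical choices.

```python
import math

def picard_solve(f, t0, y0, a, steps=500, iterations=40):
    """Iterate the discretized map z -> y0 + integral_{t0}^t f(s, z(s)) ds on [t0, t0 + a].

    Starts from the constant function y0 and returns the grid and the final iterate.
    """
    h = a / steps
    ts = [t0 + i * h for i in range(steps + 1)]
    z = [y0] * len(ts)
    for _ in range(iterations):
        u = [y0]
        for i in range(1, len(ts)):
            u.append(u[-1] + h * (f(ts[i - 1], z[i - 1]) + f(ts[i], z[i])) / 2)
        z = u
    return ts, z

def f(t, y):
    return y                                 # y' = y, uniformly Lipschitz with K = 1

a = 0.5
t_grid, y_vals = [], []
t0, y0 = 0.0, 1.0
for _ in range(4):                           # intervals [0, a], [a, 2a], [2a, 3a], [3a, 4a]
    ts, ys = picard_solve(f, t0, y0, a)
    t_grid += ts if not t_grid else ts[1:]   # avoid duplicating the shared endpoint
    y_vals += ys if not y_vals else ys[1:]
    t0, y0 = ts[-1], ys[-1]                  # y at the right endpoint is the next initial value

error = max(abs(y - math.exp(t)) for t, y in zip(t_grid, y_vals))
print(f"solution on [0, {t0}], max error vs e^t: {error:.2e}")
```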

In the exercises you will see that the conditions in the theorem are important. If they fail, the equation may have more than one solution, or a solution defined only on a bounded interval.


1. Solve the initial value problem

$$y' = 1 + y^2, \quad y(0) = 0$$

and show that the solution is only defined on the interval $[0, \pi/2)$.

2. Show that the functions

$$y(t) = \begin{cases} 0 & \text{if } 0 \leq t \leq a \\ (t - a)^{3/2} & \text{if } t > a \end{cases}$$

where $a \geq 0$ are all solutions of the initial value problem

$$y' = \frac{3}{2}\,y^{1/3}, \quad y(0) = 0$$

Remember to check that the differential equation is satisfied at $t = a$.

3. In this problem we shall sketch how the theorem in this section can be used to study higher order systems. Assume we have a second order initial value problem

$$u''(t) = g(t, u(t), u'(t)), \quad u(0) = a, \quad u'(0) = b \tag{$\ast$}$$

where g : [0,∞) × R² → R is a given function. Define a function f : [0,∞) × R² → R² by

$$f(t, u, v) = \begin{pmatrix} v \\ g(t, u, v) \end{pmatrix}$$

Show that if

$$y(t) = \begin{pmatrix} u(t) \\ v(t) \end{pmatrix}$$

is a solution of the initial value problem

$$y'(t) = f(t, y(t)), \quad y(0) = \begin{pmatrix} a \\ b \end{pmatrix},$$

then u is a solution of the original problem (∗).
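For readers who want to experiment with Exercise 3, here is a hedged sketch of the reduction to a first-order system. It uses plain Euler steps (the method discussed in Section 3.6) rather than Banach’s theorem, and the names `second_order_as_system` and `euler_system` are hypothetical. The test problem u'' = −u, u(0) = 0, u'(0) = 1 is our own choice; its solution is u(t) = sin t.

```python
import math

def second_order_as_system(g):
    """Return f(t, (u, v)) = (v, g(t, u, v)), the first-order system of Exercise 3."""
    def f(t, y):
        u, v = y
        return (v, g(t, u, v))
    return f

def euler_system(f, y0, t_end, n):
    """Plain Euler steps for a vector system y' = f(t, y) on [0, t_end] with n steps."""
    dt = t_end / n
    t, y = 0.0, tuple(y0)
    for _ in range(n):
        y = tuple(yi + dt * fi for yi, fi in zip(y, f(t, y)))
        t += dt
    return y

# u'' = -u, u(0) = 0, u'(0) = 1 has the solution u(t) = sin t (and u'(t) = cos t).
f = second_order_as_system(lambda t, u, v: -u)
u1, v1 = euler_system(f, (0.0, 1.0), 1.0, 10000)
print(u1, math.sin(1.0), v1, math.cos(1.0))
```

Only the first component u of the system answers the original problem; the second component v tracks u'.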

3.5 Compact subsets of C(X,Rᵐ)

The compact subsets of Rᵐ are easy to describe: they are just the closed and bounded sets. This characterization is extremely useful, as it is much easier to check that a set is closed and bounded than to check that it satisfies the definition of compactness. In the present section we shall prove a similar kind of characterization of compact sets in C(X,Rᵐ) when X is a compact metric space: we shall show that a subset of C(X,Rᵐ) is compact if and only if it is closed, bounded and equicontinuous. This is known as the Arzela-Ascoli Theorem. But before we turn to it, we have a question of independent interest to deal with. We have already encountered the notion of a dense set in Section 2.7, but repeat it here:

Definition 3.5.1 Let (X, d) be a metric space and assume that A is a subset of X. We say that A is dense in X if for each x ∈ X there is a sequence from A converging to x.

Recall (Proposition 2.7.2) that dense sets can also be described in a slightly different way: A subset D of a metric space X is dense if and only if for each x ∈ X and each δ > 0, there is a y ∈ D such that d(x, y) ≤ δ.

We know that Q is dense in R: we may, e.g., approximate a real number by longer and longer parts of its decimal expansion. For x = √2 this would mean the approximating sequence

$$a_1 = 1.4 = \frac{14}{10},\quad a_2 = 1.41 = \frac{141}{100},\quad a_3 = 1.414 = \frac{1414}{1000},\quad a_4 = 1.4142 = \frac{14142}{10000},\ \ldots$$
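The truncated decimal expansions above can be generated exactly with integer arithmetic, which makes it visible that every $a_k$ is a rational number within $10^{-k}$ of √2. A minimal sketch (the function name is hypothetical):

```python
from fractions import Fraction
from math import isqrt

def decimal_truncations(n, count):
    """First `count` truncated decimal expansions of sqrt(n), as exact rationals.

    a_k = floor(sqrt(n) * 10**k) / 10**k, computed as isqrt(n * 10**(2k)) / 10**k,
    so every a_k is rational and |sqrt(n) - a_k| < 10**-k.
    """
    return [Fraction(isqrt(n * 10 ** (2 * k)), 10 ** k) for k in range(1, count + 1)]

approx = decimal_truncations(2, 4)
print([float(a) for a in approx])   # [1.4, 1.41, 1.414, 1.4142]
```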

Recall that Q is countable, but that R is not. Still, every element in the uncountable set R can be approximated arbitrarily well by elements in the much smaller set Q. This property turns out to be so useful that it deserves a name.


Definition 3.5.2 A metric space (X, d) is called separable if it has a countable, dense subset A.

Our first result is a simple but rather surprising connection between separability and compactness.

Proposition 3.5.3 All compact metric spaces (X, d) are separable. We can choose the countable dense set A in such a way that for any δ > 0, there is a finite subset $A_\delta$ of A such that all elements of X are within distance less than δ of $A_\delta$, i.e. for all x ∈ X there is an $a \in A_\delta$ such that d(x, a) < δ.

Proof: We use that a compact space X is totally bounded (recall Theorem 2.5.12). This means that for all n ∈ N, there is a finite number of balls of radius 1/n that cover X. The centers of all these balls form a countable subset A of X (to get a listing of A, first list the centers of the balls of radius 1, then the centers of the balls of radius 1/2, etc.). We shall prove that A is dense in X.

Let x be an element of X. To find a sequence $\{a_n\}$ from A converging to x, we first pick the center $a_1$ of (one of) the balls of radius 1 that x belongs to, then we pick the center $a_2$ of (one of) the balls of radius 1/2 that x belongs to, etc. Since $d(x, a_n) < \frac{1}{n}$, $\{a_n\}$ is a sequence from A converging to x.

To find the set $A_\delta$, just choose m ∈ N so big that $\frac{1}{m} < \delta$, and let $A_\delta$ consist of the centers of the balls of radius $\frac{1}{m}$. □
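To see the construction in a concrete case, take X = [0, 1] (our choice of example). The balls of radius 1/n centred at the points k/n, k = 0, ..., n, cover [0, 1], so these centres can serve as the finite sets in the proof, and $A_\delta$ can be taken to be the centres for any n with 1/n < δ. The sketch lists A in the order described in the proof and picks $a_n$; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import islice

def centres(n):
    """Centres k/n (k = 0, ..., n) of finitely many balls of radius 1/n covering [0, 1]."""
    return [Fraction(k, n) for k in range(n + 1)]

def listing_of_A():
    """List A: centres for radius 1, then radius 1/2, then 1/3, ... (repeats are harmless)."""
    n = 1
    while True:
        yield from centres(n)
        n += 1

def a_n(x, n):
    """A centre of a radius-1/n ball containing x (the nearest one), so |x - a_n| < 1/n."""
    return min(centres(n), key=lambda c: abs(x - c))

print(list(islice(listing_of_A(), 5)))
```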

We are now ready to turn to C(X,Rᵐ). First we recall the definition of equicontinuous sets of functions from Section 3.1.

Definition 3.5.4 Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and let $\mathcal{F}$ be a collection of functions f : X → Y. We say that $\mathcal{F}$ is equicontinuous if for all ε > 0, there is a δ > 0 such that for all $f \in \mathcal{F}$ and all x, y ∈ X with $d_X(x, y) < \delta$, we have $d_Y(f(x), f(y)) < \varepsilon$.

We begin with a lemma showing that for equicontinuous sequences, it suffices to check convergence on dense sets of the kind described above.

Lemma 3.5.5 Assume that $(X, d_X)$ is a compact and $(Y, d_Y)$ a complete metric space, and let $\{g_k\}$ be an equicontinuous sequence in C(X,Y). Assume that A ⊆ X is a dense set as described in Proposition 3.5.3 and that $\{g_k(a)\}$ converges for all a ∈ A. Then $\{g_k\}$ converges in C(X,Y).

Proof: Since C(X,Y) is complete, it suffices to prove that $\{g_k\}$ is a Cauchy sequence. Given an ε > 0, we must thus find an N ∈ N such that $\rho(g_n, g_m) < \varepsilon$ when n, m ≥ N. Since the sequence is equicontinuous, there exists a δ > 0 such that if $d_X(x, y) < \delta$, then $d_Y(g_k(x), g_k(y)) < \frac{\varepsilon}{4}$ for all k. Choose a finite subset $A_\delta$ of A such that any element in X is within less than δ of an element in $A_\delta$. Since the sequences $\{g_k(a)\}$, $a \in A_\delta$, converge, they are all Cauchy sequences, and we can find an N ∈ N such that when n, m ≥ N, $d_Y(g_n(a), g_m(a)) < \frac{\varepsilon}{4}$ for all $a \in A_\delta$ (here we are using that $A_\delta$ is finite). For any x ∈ X, we can find an $a \in A_\delta$ such that $d_X(x, a) < \delta$. But then for all n, m ≥ N,

$$d_Y(g_n(x), g_m(x)) \le d_Y(g_n(x), g_n(a)) + d_Y(g_n(a), g_m(a)) + d_Y(g_m(a), g_m(x)) < \frac{\varepsilon}{4} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \frac{3\varepsilon}{4}$$

Since this holds for any x ∈ X, we must have $\rho(g_n, g_m) \le \frac{3\varepsilon}{4} < \varepsilon$ for all n, m ≥ N, and hence $\{g_k\}$ is a Cauchy sequence and converges in the complete space C(X,Y). □

We are now ready to prove the hard part of the Arzela-Ascoli Theorem.

Proposition 3.5.6 Assume that (X, d) is a compact metric space, and let $\{f_n\}$ be a bounded and equicontinuous sequence in C(X,Rᵐ). Then $\{f_n\}$ has a subsequence converging in C(X,Rᵐ).

Proof: Since X is compact, there is a countable, dense subset

$$A = \{a_1, a_2, \ldots, a_n, \ldots\}$$

as in Proposition 3.5.3. According to the lemma, it suffices to find a subsequence $\{g_k\}$ of $\{f_n\}$ such that $\{g_k(a)\}$ converges for all a ∈ A.

We begin a little less ambitiously by showing that $\{f_n\}$ has a subsequence $\{f^{(1)}_n\}$ such that $\{f^{(1)}_n(a_1)\}$ converges (recall that $a_1$ is the first element in our listing of the countable set A). Next we show that $\{f^{(1)}_n\}$ has a subsequence $\{f^{(2)}_n\}$ such that both $\{f^{(2)}_n(a_1)\}$ and $\{f^{(2)}_n(a_2)\}$ converge. Continuing to take subsequences in this way, we shall for each j ∈ N find a sequence $\{f^{(j)}_n\}$ such that $\{f^{(j)}_n(a)\}$ converges for $a = a_1, a_2, \ldots, a_j$. Finally, we shall construct the sequence $\{g_k\}$ by combining all the sequences $\{f^{(j)}_n\}$ in a clever way.

Let us start by constructing $\{f^{(1)}_n\}$. Since the sequence $\{f_n\}$ is bounded, $\{f_n(a_1)\}$ is a bounded sequence in Rᵐ, and by the Bolzano-Weierstrass Theorem, it has a convergent subsequence $\{f_{n_k}(a_1)\}$. We let $\{f^{(1)}_n\}$ consist of the functions appearing in this subsequence. If we now apply $\{f^{(1)}_n\}$ to $a_2$, we get a new bounded sequence $\{f^{(1)}_n(a_2)\}$ in Rᵐ with a convergent subsequence. We let $\{f^{(2)}_n\}$ be the functions appearing in this subsequence. Note that $\{f^{(2)}_n(a_1)\}$ still converges as $\{f^{(2)}_n\}$ is a subsequence of $\{f^{(1)}_n\}$. Continuing in this way, we see that for each j ∈ N we have a sequence $\{f^{(j)}_n\}$ such that $\{f^{(j)}_n(a)\}$ converges for $a = a_1, a_2, \ldots, a_j$. In addition, each sequence $\{f^{(j)}_n\}$ is a subsequence of the previous ones.


We are now ready to construct a sequence $\{g_k\}$ such that $\{g_k(a)\}$ converges for all a ∈ A. We do it by a diagonal argument, putting $g_1$ equal to the first element in the first sequence $\{f^{(1)}_n\}$, $g_2$ equal to the second element in the second sequence $\{f^{(2)}_n\}$, etc. In general, the k-th term in the g-sequence equals the k-th term in the k-th f-sequence $\{f^{(k)}_n\}$, i.e. $g_k = f^{(k)}_k$. Note that except for the first j − 1 elements, $\{g_k\}$ is a subsequence of $\{f^{(j)}_n\}$: for k ≥ j, each $g_k$ comes from $\{f^{(k)}_n\}$, which is itself a subsequence of $\{f^{(j)}_n\}$, and the terms $g_j, g_{j+1}, \ldots$ appear there in increasing order. This means that $\{g_k(a)\}$ converges for all a ∈ A, and the proof is complete. □

As a simple consequence of this result we get:

Corollary 3.5.7 If (X, d) is a compact metric space, all bounded, closed and equicontinuous sets K in C(X,Rᵐ) are compact.

Proof: According to the proposition, any sequence in K has a convergent subsequence. Since K is closed, the limit must be in K, and hence K is compact. □

As already mentioned, the converse of this result is also true, but before we prove it, we need a technical lemma that is quite useful also in other situations:

Lemma 3.5.8 Assume that $(X, d_X)$ and $(Y, d_Y)$ are metric spaces and that $\{f_n\}$ is a sequence of continuous functions from X to Y which converges uniformly to f. If $\{x_n\}$ is a sequence in X converging to a, then $\{f_n(x_n)\}$ converges to f(a).

Remark: This lemma is not as obvious as it may seem: it is not true if we replace uniform convergence by pointwise convergence!

Proof of Lemma 3.5.8: Given ε > 0, we must show how to find an N ∈ N such that $d_Y(f_n(x_n), f(a)) < \varepsilon$ for all n ≥ N. Since we know from Proposition 3.2.4 that f is continuous, there is a δ > 0 such that $d_Y(f(x), f(a)) < \frac{\varepsilon}{2}$ when $d_X(x, a) < \delta$. Since $\{x_n\}$ converges to a, there is an $N_1 \in \mathbb{N}$ such that $d_X(x_n, a) < \delta$ when $n \ge N_1$. Also, since $\{f_n\}$ converges uniformly to f, there is an $N_2 \in \mathbb{N}$ such that if $n \ge N_2$, then $d_Y(f_n(x), f(x)) < \frac{\varepsilon}{2}$ for all x ∈ X. If we choose $N = \max\{N_1, N_2\}$, we see that if n ≥ N,

$$d_Y(f_n(x_n), f(a)) \le d_Y(f_n(x_n), f(x_n)) + d_Y(f(x_n), f(a)) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$

and the lemma is proved. □
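The remark before the proof can be illustrated with an example of our own (not from the notes): the continuous functions $f_n(x) = nxe^{-nx}$ on [0, 1] converge pointwise to f = 0, but not uniformly, since $f_n(1/n) = e^{-1}$ for every n. Taking $x_n = 1/n \to 0$, we get $f_n(x_n) = e^{-1}$, which does not converge to f(0) = 0. The short check below evaluates this numerically; the helper name `f_n` is hypothetical.

```python
import math

def f_n(n, x):
    """f_n(x) = n x exp(-n x) on [0, 1]: continuous, and f_n(x) -> 0 for every fixed x."""
    return n * x * math.exp(-n * x)

# Pointwise limit f = 0, but along x_n = 1/n -> 0 the values stay at e^{-1}, not f(0) = 0.
for n in (10, 100, 1000):
    print(n, f_n(n, 0.3), f_n(n, 1 / n))
```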

We are finally ready to prove the main theorem:


Theorem 3.5.9 (Arzela-Ascoli’s Theorem) Let $(X, d_X)$ be a compact metric space. A subset K of C(X,Rᵐ) is compact if and only if it is closed, bounded and equicontinuous.

Proof: It remains to prove that a compact set K in C(X,Rᵐ) is closed, bounded and equicontinuous. Since compact sets are always closed and bounded according to Proposition 2.5.4, it suffices to prove that K is equicontinuous. We argue by contradiction: We assume that the compact set K is not equicontinuous and show that this leads to a contradiction.

Since K is not equicontinuous, there must be an ε > 0 which cannot be matched by any δ; i.e. for any δ > 0, there is a function f ∈ K and points x, y ∈ X such that $d_X(x, y) < \delta$, but $d_{\mathbb{R}^m}(f(x), f(y)) \ge \varepsilon$. If we put $\delta = \frac{1}{n}$, we get a function $f_n \in K$ and points $x_n, y_n \in X$ such that $d_X(x_n, y_n) < \frac{1}{n}$, but $d_{\mathbb{R}^m}(f_n(x_n), f_n(y_n)) \ge \varepsilon$. Since K is compact, there is a subsequence $\{f_{n_k}\}$ of $\{f_n\}$ which converges (uniformly) to a function f ∈ K. Since X is compact, the corresponding subsequence $\{x_{n_k}\}$ of $\{x_n\}$ has a subsequence $\{x_{n_{k_j}}\}$ converging to a point a ∈ X. Since $d_X(x_{n_{k_j}}, y_{n_{k_j}}) < \frac{1}{n_{k_j}}$, the corresponding sequence $\{y_{n_{k_j}}\}$ of y’s also converges to a.

Since $\{f_{n_{k_j}}\}$ converges uniformly to f, and $\{x_{n_{k_j}}\}$, $\{y_{n_{k_j}}\}$ both converge to a, the lemma tells us that

$$f_{n_{k_j}}(x_{n_{k_j}}) \to f(a) \quad\text{and}\quad f_{n_{k_j}}(y_{n_{k_j}}) \to f(a)$$

But this is impossible since $d_{\mathbb{R}^m}(f_{n_{k_j}}(x_{n_{k_j}}), f_{n_{k_j}}(y_{n_{k_j}})) \ge \varepsilon$ for all j. Hence we have our contradiction, and the theorem is proved. □


1. Show that Rⁿ is separable for all n.

2. Show that a subset A of a metric space (X, d) is dense if and only if all open balls B(a; r), a ∈ X, r > 0, contain elements from A.

3. Assume that (X, d) is a complete metric space, and that A is a dense subset of X. We let A have the subset metric $d_A$.

a) Assume that f : A → R is uniformly continuous. Show that if $\{a_n\}$ is a sequence from A converging to a point x ∈ X, then $\{f(a_n)\}$ converges. Show that the limit is the same for all such sequences $\{a_n\}$ converging to the same point x.

b) Define $\bar f : X \to \mathbb{R}$ by putting $\bar f(x) = \lim_{n\to\infty} f(a_n)$, where $\{a_n\}$ is a sequence from A converging to x. We call $\bar f$ the continuous extension of f to X. Show that $\bar f$ is uniformly continuous.

c) Let f : Q → R be defined by

$$f(q) = \begin{cases} 0 & \text{if } q < \sqrt{2} \\ 1 & \text{if } q > \sqrt{2} \end{cases}$$


Show that f is continuous on Q (we are using the usual metric $d_{\mathbb{Q}}(q, r) = |q - r|$). Is f uniformly continuous?

d) Show that f does not have a continuous extension to R.

4. Let K be a compact subset of Rⁿ. Let $\{f_n\}$ be a sequence of contractions of K. Show that $\{f_n\}$ has a uniformly convergent subsequence.

5. A function f : [−1, 1] → R is called Lipschitz continuous with Lipschitz constant K ∈ R if

$$|f(x) - f(y)| \le K|x - y|$$

for all x, y ∈ [−1, 1]. Let $\mathcal{K}$ be the set of all Lipschitz continuous functions with Lipschitz constant K such that f(0) = 0. Show that $\mathcal{K}$ is a compact subset of C([−1, 1],R).

6. Assume that $(X, d_X)$ and $(Y, d_Y)$ are two metric spaces, and let σ : [0,∞) → [0,∞) be a nondecreasing, continuous function such that σ(0) = 0. We say that σ is a modulus of continuity for a function f : X → Y if

$$d_Y(f(u), f(v)) \le \sigma(d_X(u, v))$$

for all u, v ∈ X.

a) Show that a family of functions with the same modulus of continuity is equicontinuous.

b) Assume that $(X, d_X)$ is compact, and let $x_0 \in X$. Show that if σ is a modulus of continuity, then the set

$$K = \{f : X \to \mathbb{R}^n : f(x_0) = 0 \text{ and } \sigma \text{ is a modulus of continuity for } f\}$$

is compact.

c) Show that all functions in C([a, b],Rᵐ) have a modulus of continuity.

7. A metric space (X, d) is called locally compact if for each point a ∈ X, there is a closed ball $\overline{B}(a; r)$ with r > 0 centered at a that is compact. (Recall that $\overline{B}(a; r) = \{x \in X : d(a, x) \le r\}$.) Show that Rᵐ is locally compact, but that C([0, 1],R) is not.

3.6 Differential equations revisited

In Section 3.4, we used Banach’s Fixed Point Theorem to study initial value problems of the form

$$y'(t) = f(t, y(t)), \quad y(0) = y_0 \tag{3.6.1}$$

or equivalently

$$y(t) = y_0 + \int_0^t f(s, y(s))\,ds \tag{3.6.2}$$

In this section we shall see how Arzela-Ascoli’s Theorem can be used to prove existence of solutions under weaker conditions than before. But in the new approach we shall also lose something: we can only prove that the solutions exist in small intervals, and we can no longer guarantee uniqueness.

The starting point is Euler’s method for finding approximate solutions to differential equations. If we want to approximate the solution starting at $y_0$ at time t = 0, we begin by partitioning time into discrete steps of length ∆t; hence we work with the time line

$$T = \{t_0, t_1, t_2, t_3, \ldots\}$$

where $t_0 = 0$ and $t_{i+1} - t_i = \Delta t$. We start the approximate solution $\hat y$ at $y_0$ and move in the direction of the derivative $f(t_0, y_0)$, i.e. we put

$$\hat y(t) = y_0 + f(t_0, y_0)(t - t_0)$$

for $t \in [t_0, t_1]$. Once we reach $t_1$, we change directions and move in the direction of the new derivative $f(t_1, \hat y(t_1))$ so that we have

$$\hat y(t) = \hat y(t_1) + f(t_1, \hat y(t_1))(t - t_1)$$

for $t \in [t_1, t_2]$. If we insert the expression for $\hat y(t_1)$, we get:

$$\hat y(t) = y_0 + f(t_0, y_0)(t_1 - t_0) + f(t_1, \hat y(t_1))(t - t_1)$$

If we continue in this way, changing directions at each point in T, we get

$$\hat y(t) = y_0 + \sum_{i=0}^{k-1} f(t_i, \hat y(t_i))(t_{i+1} - t_i) + f(t_k, \hat y(t_k))(t - t_k)$$

for $t \in [t_k, t_{k+1}]$. If we observe that

$$f(t_i, \hat y(t_i))(t_{i+1} - t_i) = \int_{t_i}^{t_{i+1}} f(t_i, \hat y(t_i))\,ds,$$

we can rewrite this expression as

$$\hat y(t) = y_0 + \sum_{i=0}^{k-1} \int_{t_i}^{t_{i+1}} f(t_i, \hat y(t_i))\,ds + \int_{t_k}^{t} f(t_k, \hat y(t_k))\,ds$$

If we also introduce the notation

$$\underline{s} = \text{the largest } t_i \in T \text{ such that } t_i \le s,$$

we may express this more compactly as

$$\hat y(t) = y_0 + \int_0^t f(\underline{s}, \hat y(\underline{s}))\,ds$$

Note that we can also write this as

$$\hat y(t) = y_0 + \int_0^t f(s, \hat y(s))\,ds + \int_0^t \big(f(\underline{s}, \hat y(\underline{s})) - f(s, \hat y(s))\big)\,ds$$

(observe that there is one $\underline{s}$ term and one s term in the last integral), where the last term measures how much $\hat y$ “deviates” from being a solution of equation (3.6.2).
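The Euler construction above translates directly into code. The sketch below is a scalar (m = 1) version with hypothetical names; it returns the node values $\hat y(t_k)$ and a function that evaluates the polygon $\hat y$ between the nodes. For y' = y, y(0) = 1 the node values are $(1 + \Delta t)^k$, so with $\Delta t = 1/n$ the value at t = 1 should approach e as n grows.

```python
import math

def euler_polygon(f, y0, a, n):
    """Euler approximation y_hat on [0, a] with n steps of length dt = a / n (scalar case).

    Returns the node values y_hat(t_0), ..., y_hat(t_n) and a function evaluating
    y_hat(t) = y_hat(t_k) + f(t_k, y_hat(t_k)) (t - t_k) for t in [t_k, t_{k+1}].
    """
    dt = a / n
    ts = [k * dt for k in range(n + 1)]
    ys = [y0]
    for k in range(n):
        ys.append(ys[k] + f(ts[k], ys[k]) * dt)   # move along the slope taken at t_k

    def y_hat(t):
        # index of the grid point t_k (the "s underline") below t; clamped at t = a.
        # Rounding may pick a neighbouring interval at a node, but the polygon is continuous.
        k = min(int(t / dt), n - 1)
        return ys[k] + f(ts[k], ys[k]) * (t - ts[k])

    return ys, y_hat

# y' = y, y(0) = 1: the node values are (1 + 1/n)^k, approaching e at t = 1.
ys, y_hat = euler_polygon(lambda t, y: y, 1.0, 1.0, 1000)
print(ys[-1], math.e)
```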

Intuitively, one would think that the approximate solution $\hat y$ will converge to a real solution y when the step size ∆t goes to zero. To be more specific, if we let $\hat y_n$ be the approximate solution we get when we choose $\Delta t = \frac{1}{n}$, we would expect the sequence $\{\hat y_n\}$ to converge to a solution of (3.6.2). It turns out that in the most general case we cannot quite prove this, but we can instead use the Arzela-Ascoli Theorem to find a subsequence converging to a solution.

Before we turn to the proof, it will be useful to see how integrals of the form

$$I_k(t) = \int_0^t f(s, y_k(s))\,ds$$

behave when the functions $y_k$ converge uniformly to a limit y.

Lemma 3.6.1 Let f : [0,∞) × Rᵐ → Rᵐ be a continuous function, and assume that $\{y_k\}$ is a sequence of continuous functions $y_k : [0, a] \to \mathbb{R}^m$ converging uniformly to a function y. Then the integral functions

$$I_k(t) = \int_0^t f(s, y_k(s))\,ds$$

converge uniformly to

$$I(t) = \int_0^t f(s, y(s))\,ds$$

on [0, a].

Proof: Since the sequence $\{y_k\}$ converges uniformly, it is bounded, and hence there is a constant K such that $|y_k(t)| \le K$ for all k ∈ N and all t ∈ [0, a] (prove this!); letting k → ∞, we also get $|y(t)| \le K$. The continuous function f is uniformly continuous on the compact set $[0, a] \times [-K, K]^m$, and hence for every ε > 0, there is a δ > 0 such that if $|y - y'| < \delta$, then $|f(s, y) - f(s, y')| < \frac{\varepsilon}{a}$ for all s ∈ [0, a] and all $y, y' \in [-K, K]^m$. Since $\{y_k\}$ converges uniformly to y, there is an N ∈ N such that if n ≥ N, $|y_n(s) - y(s)| < \delta$ for all s ∈ [0, a]. But then

$$|I_n(t) - I(t)| = \left|\int_0^t \big(f(s, y_n(s)) - f(s, y(s))\big)\,ds\right| \le \int_0^t \big|f(s, y_n(s)) - f(s, y(s))\big|\,ds < \int_0^a \frac{\varepsilon}{a}\,ds = \varepsilon$$

for all t ∈ [0, a], and hence $\{I_k\}$ converges uniformly to I. □

We are now ready for the main result.

Theorem 3.6.2 Assume that f : [0,∞) × Rᵐ → Rᵐ is a continuous function and that $y_0 \in \mathbb{R}^m$. Then there exists a positive real number a and a function y : [0, a] → Rᵐ such that $y(0) = y_0$ and

$$y'(t) = f(t, y(t)) \quad \text{for all } t \in [0, a]$$

Remark: Note that there is no uniqueness statement (the problem may have more than one solution), and that the solution is only guaranteed to exist on a bounded interval (it may disappear to infinity after finite time).
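The lack of uniqueness is easy to observe with the equation from Exercise 2 of Section 3.4, $y' = \frac{3}{2}y^{1/3}$, y(0) = 0, where f is continuous but not Lipschitz near y = 0. In the sketch below (helper names are hypothetical), every Euler polygon starting at 0 stays identically 0, so the compactness argument in the proof produces the solution y ≡ 0; yet $y(t) = t^{3/2}$ (the case a = 0 of that exercise) solves the same problem, as the residual check confirms.

```python
def rhs(t, y):
    """f(t, y) = (3/2) y^(1/3), using the real cube root; continuous, not Lipschitz at y = 0."""
    cube_root = abs(y) ** (1 / 3)
    return 1.5 * (cube_root if y >= 0 else -cube_root)

def euler_nodes(f, y0, a, n):
    """Node values of the Euler polygon on [0, a] with n steps."""
    dt = a / n
    ys = [y0]
    for k in range(n):
        ys.append(ys[k] + f(k * dt, ys[k]) * dt)
    return ys

# Starting exactly at y0 = 0 every slope is f(t, 0) = 0, so each Euler polygon is
# identically zero and the limit produced by the proof is the solution y = 0 ...
zero_path = euler_nodes(rhs, 0.0, 1.0, 1000)

# ... although y(t) = t^(3/2) also solves y' = (3/2) y^(1/3), y(0) = 0.
def other_solution(t):
    return t ** 1.5

def derivative_of_other(t):
    return 1.5 * t ** 0.5

residuals = [abs(derivative_of_other(t) - rhs(t, other_solution(t))) for t in (0.1, 0.5, 1.0)]
print(max(zero_path), max(residuals))
```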

Proof of Theorem 3.6.2: Choose a big, compact subset $C = [0, R] \times [-R, R]^m$ of $[0,\infty) \times \mathbb{R}^m$ containing $(0, y_0)$ in its interior (relative to $[0,\infty) \times \mathbb{R}^m$). By the Extreme Value Theorem, the components of f have a maximum value on C, and hence there exists a number M ∈ R such that $|f_i(t, y)| \le M$ for all $(t, y) \in C$ and all i = 1, 2, ..., m. If the initial value has components

$$y_0 = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{pmatrix}$$

we choose a ∈ R so small that the set

$$A = [0, a] \times [Y_1 - Ma, Y_1 + Ma] \times [Y_2 - Ma, Y_2 + Ma] \times \cdots \times [Y_m - Ma, Y_m + Ma]$$

is contained in C. This may seem mysterious, but the point is that our approximate solutions of the differential equation can never leave the area

$$[Y_1 - Ma, Y_1 + Ma] \times [Y_2 - Ma, Y_2 + Ma] \times \cdots \times [Y_m - Ma, Y_m + Ma]$$

while t ∈ [0, a], since all the derivatives are bounded by M.

Let $\hat y_n$ be the approximate solution obtained by using Euler’s method on the interval [0, a] with time step $\frac{a}{n}$. The sequence $\{\hat y_n\}$ is bounded since $(t, \hat y_n(t)) \in A$, and it is equicontinuous since the components of f are bounded by M: each component of $\hat y_n$ is piecewise linear with slopes of absolute value at most M, so that $|\hat y_n(t) - \hat y_n(t')| \le \sqrt{m}\,M|t - t'|$. By Proposition 3.5.6, $\{\hat y_n\}$ has a subsequence $\{\hat y_{n_k}\}$ converging uniformly to a function y. If we can prove that y solves the integral equation

$$y(t) = y_0 + \int_0^t f(s, y(s))\,ds$$

for all t ∈ [0, a], we shall have proved the theorem.


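From the calculations at the beginning of the section, we know that

$$\hat y_{n_k}(t) = y_0 + \int_0^t f(s, \hat y_{n_k}(s))\,ds + \int_0^t \big(f(\underline{s}, \hat y_{n_k}(\underline{s})) - f(s, \hat y_{n_k}(s))\big)\,ds \tag{3.6.3}$$

and according to the lemma

$$\int_0^t f(s, \hat y_{n_k}(s))\,ds \to \int_0^t f(s, y(s))\,ds \quad \text{uniformly for } t \in [0, a]$$

If we can also prove that

$$\int_0^t \big(f(\underline{s}, \hat y_{n_k}(\underline{s})) - f(s, \hat y_{n_k}(s))\big)\,ds \to 0 \tag{3.6.4}$$

we will get

$$y(t) = y_0 + \int_0^t f(s, y(s))\,ds$$

by letting k → ∞ in (3.6.3) (the left-hand side converges to y(t)), and the theorem will be proved.

To prove (3.6.4), observe that since A is a compact set, f is uniformly continuous on A. Given an ε > 0, we thus find a δ > 0 such that $|f(s, y) - f(s', y')| < \frac{\varepsilon}{a}$ when $|(s, y) - (s', y')| < \delta$ (we are measuring the distance in the ordinary $\mathbb{R}^{m+1}$-metric). Since

$$|(\underline{s}, \hat y_{n_k}(\underline{s})) - (s, \hat y_{n_k}(s))| \le |(\Delta t, M\Delta t, \ldots, M\Delta t)| = \sqrt{1 + mM^2}\,\Delta t,$$

we can clearly get $|(\underline{s}, \hat y_{n_k}(\underline{s})) - (s, \hat y_{n_k}(s))| < \delta$ by choosing k large enough (and hence $\Delta t = \frac{a}{n_k}$ small enough). For such k we then have

$$\left|\int_0^t \big(f(\underline{s}, \hat y_{n_k}(\underline{s})) - f(s, \hat y_{n_k}(s))\big)\,ds\right| < \int_0^a \frac{\varepsilon}{a}\,ds = \varepsilon$$

and hence

$$\int_0^t \big(f(\underline{s}, \hat y_{n_k}(\underline{s})) - f(s, \hat y_{n_k}(s))\big)\,ds \to 0$$

as k → ∞. As already observed, this completes the proof. □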

Remark: An obvious question at this stage is why we didn’t extend our solution beyond the interval [0, a] as we did in the proof of Theorem 3.4.2. The reason is that in the present case we do not have control over the length of our intervals, and hence the second interval may be very small compared to the first one, the third one even smaller, and so on. Even if we add an infinite number of intervals, we may still only cover a finite part of the real line. There are good reasons for this: the differential equation may only have solutions that survive for a finite amount of time. A typical example is the equation

$$y' = 1 + y^2, \quad y(0) = 0$$

where the (unique) solution y(t) = tan t goes to infinity when $t \to \frac{\pi}{2}^-$.
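A quick way to watch this blow-up is to run Euler’s method for y' = 1 + y² and compare with tan t. The sketch below (function name hypothetical) prints the Euler value at a few end times T approaching π/2 ≈ 1.5708; both the Euler values and tan T grow without bound. Since the solution is increasing and convex on [0, π/2) and the right-hand side increases with y for y ≥ 0, the Euler polygon stays below it.

```python
import math

def euler_endpoint(f, y0, a, n):
    """Node value of the Euler polygon at t = a (n steps of length a / n)."""
    dt, y = a / n, y0
    for k in range(n):
        y += f(k * dt, y) * dt
    return y

f = lambda t, y: 1 + y * y
for T in (1.0, 1.5, 1.55):
    print(T, euler_endpoint(f, 0.0, T, 10000), math.tan(T))
```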

The proof above is a simple but typical example of a wide class of compactness arguments in the theory of differential equations. In such arguments one usually starts with a sequence of approximate solutions and then uses compactness to extract a subsequence converging to a solution. Compactness methods are strong in the sense that they can often prove local existence of solutions under very general conditions, but they are weak in the sense that they give very little information about the nature of the solution. But just knowing that a solution exists is often a good starting point for further explorations.


1. Prove that if $f_n : [a, b] \to \mathbb{R}^m$ are continuous functions converging uniformly to a function f, then the sequence $\{f_n\}$ is bounded in the sense that there is a constant K ∈ R such that $|f_n(t)| \le K$ for all n ∈ N and all t ∈ [a, b] (this property is used in the proof of Lemma 3.6.1).

2. Go back to exercises 1 and 2 in Section 3.4. Show that the differential equations satisfy the conditions of Theorem 3.6.2. Comment.

3. It is occasionally useful to have a slightly more general version of Theorem 3.6.2 where the solution doesn’t just start at a given point, but passes through it:

Theorem Assume that f : R × Rᵐ → Rᵐ is a continuous function. For any $t_0 \in \mathbb{R}$ and $y_0 \in \mathbb{R}^m$, there exists a positive real number a and a function $y : [t_0 - a, t_0 + a] \to \mathbb{R}^m$ such that $y(t_0) = y_0$ and

$$y'(t) = f(t, y(t)) \quad \text{for all } t \in [t_0 - a, t_0 + a]$$

Prove this theorem by modifying the proof of Theorem 3.6.2 (run Euler’s method “backwards” on the interval $[t_0 - a, t_0]$).

3.7 Polynomials are dense in C([a, b],R)

From calculus we know that many continuous functions can be approximated by their Taylor polynomials, but to have Taylor polynomials of all orders, a function f has to be infinitely differentiable, i.e. the higher order derivatives $f^{(k)}$ have to exist for all k. Most continuous functions are not differentiable at all, and the question is whether they can still be approximated by polynomials. In this section we shall prove:

Theorem 3.7.1 (Weierstrass’ Theorem) The polynomials are dense in C([a, b],R) for all a, b ∈ R, a < b. In other words, for each continuous function f : [a, b] → R, there is a sequence of polynomials $\{p_n\}$ converging uniformly to f.


The proof I shall give (due to the Russian mathematician Sergei Bernstein (1880-1968)) is quite surprising; it uses probability theory to establish the result for the interval [0, 1], and then a straightforward scaling argument to extend it to all closed and bounded intervals.

The idea is simple: Assume that you are tossing a biased coin which has probability x of coming up “heads”. If you toss it more and more times, you expect the proportion of times it comes up “heads” to stabilize around x. If somebody has promised you an award of f(X) dollars, where X is the actual proportion of “heads” you have had during your (say) first 1000 tosses, you would expect your award to be close to f(x). If the number of tosses was increased to 10 000, you would feel even more certain.

Let us formalize this: Let $Y_i$ be the outcome of the i-th toss in the sense that $Y_i$ has the value 0 if the coin comes up “tails” and 1 if it comes up “heads”. The proportion of “heads” in the first N tosses is then given by

$$X_N = \frac{1}{N}(Y_1 + Y_2 + \cdots + Y_N)$$

Each $Y_i$ is Bernoulli distributed (i.e. binomially distributed with a single toss) with mean $E(Y_i) = x$ and variance $\mathrm{Var}(Y_i) = x(1-x)$. We thus have

$$E(X_N) = \frac{1}{N}\big(E(Y_1) + E(Y_2) + \cdots + E(Y_N)\big) = x$$

and (using that the $Y_i$’s are independent)

$$\mathrm{Var}(X_N) = \frac{1}{N^2}\big(\mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + \cdots + \mathrm{Var}(Y_N)\big) = \frac{1}{N}x(1-x)$$

(if you don’t remember these formulas from probability theory, we shall derive them by analytic methods in the exercises). As N goes to infinity, we would expect $X_N$ to converge to x with probability 1. If the “award function” f is continuous, we would also expect our average award $E(f(X_N))$ to converge to f(x).
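The two formulas for $E(X_N)$ and $\mathrm{Var}(X_N)$ can also be checked exactly for particular values of N and x, using rational arithmetic and the probability $\binom{N}{k}x^k(1-x)^{N-k}$ that $X_N = k/N$. The sketch below is only a sanity check of the formulas for a few cases, not a proof; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_weights(N, x):
    """P(X_N = k/N) = C(N, k) x^k (1 - x)^(N - k) for k = 0, ..., N."""
    return [comb(N, k) * x**k * (1 - x)**(N - k) for k in range(N + 1)]

def mean_and_variance(N, x):
    """Total probability, E(X_N) and Var(X_N), computed exactly from the weights."""
    w = binomial_weights(N, x)
    mean = sum(Fraction(k, N) * wk for k, wk in enumerate(w))
    var = sum((Fraction(k, N) - mean) ** 2 * wk for k, wk in enumerate(w))
    return sum(w), mean, var

x = Fraction(1, 3)
for N in (1, 5, 20):
    total, mean, var = mean_and_variance(N, x)
    print(N, total, mean, var, x * (1 - x) / N)
```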

To see what this has to do with polynomials, let us compute the average award $E(f(X_N))$. Since the probability of getting exactly k heads in N tosses is $\binom{N}{k}x^k(1-x)^{N-k}$, we get

$$E(f(X_N)) = \sum_{k=0}^{N} f\left(\frac{k}{N}\right)\binom{N}{k}x^k(1-x)^{N-k}$$

Our expectation that $E(f(X_N)) \to f(x)$ as N → ∞ can therefore be rephrased as

$$\sum_{k=0}^{N} f\left(\frac{k}{N}\right)\binom{N}{k}x^k(1-x)^{N-k} \to f(x) \quad \text{as } N \to \infty$$

If we expand the parentheses $(1-x)^{N-k}$, we see that the expressions on the left-hand side are just polynomials in x, and hence we have arrived at the hypothesis that the polynomials

$$p_N(x) = \sum_{k=0}^{N} f\left(\frac{k}{N}\right)\binom{N}{k}x^k(1-x)^{N-k}$$

converge to f(x). We shall prove that this is indeed the case, and that the convergence is uniform.

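Before we turn to the proof, we need some notation and a lemma. For any random variable X with expectation x and any δ > 0, we shall write

$$1_{\{|x - X| < \delta\}} = \begin{cases} 1 & \text{if } |x - X| < \delta \\ 0 & \text{otherwise} \end{cases}$$

and oppositely for $1_{\{|x - X| \ge \delta\}}$.

Lemma 3.7.2 (Chebyshev’s Inequality) For a bounded random variable X with mean x,

$$E(1_{\{|x - X| \ge \delta\}}) \le \frac{1}{\delta^2}\mathrm{Var}(X)$$

Proof: Since $\delta^2 1_{\{|x - X| \ge \delta\}} \le (x - X)^2$, we have

$$\delta^2 E(1_{\{|x - X| \ge \delta\}}) \le E((x - X)^2) = \mathrm{Var}(X)$$

Dividing by δ², we get the lemma. □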
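For the coin-tossing variables $X_N$ the two sides of Chebyshev’s inequality can be compared exactly. The sketch below (hypothetical names, rational arithmetic) computes the left-hand side as a finite sum of binomial probabilities and the right-hand side as $x(1-x)/(N\delta^2)$; the bound is usually far from sharp, but it is all the proof needs.

```python
from fractions import Fraction
from math import comb

def tail_probability(N, x, delta):
    """Exact E(1_{|x - X_N| >= delta}) = P(|x - X_N| >= delta) for the coin-tossing X_N."""
    return sum(comb(N, k) * x**k * (1 - x)**(N - k)
               for k in range(N + 1) if abs(x - Fraction(k, N)) >= delta)

def chebyshev_bound(N, x, delta):
    """Right-hand side Var(X_N) / delta^2 = x(1 - x) / (N delta^2)."""
    return x * (1 - x) / (N * delta**2)

x, delta = Fraction(1, 3), Fraction(1, 10)
for N in (10, 100, 400):
    print(N, float(tail_probability(N, x, delta)), float(chebyshev_bound(N, x, delta)))
```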

We are now ready to prove that the Bernstein polynomials converge.

Proposition 3.7.3 If f : [0, 1] → R is a continuous function, the Bernstein polynomials

$$p_N(x) = \sum_{k=0}^{N} f\left(\frac{k}{N}\right)\binom{N}{k}x^k(1-x)^{N-k}$$

converge uniformly to f on [0, 1].

Proof: Given ε > 0, we must show how to find an N such that $|f(x) - p_n(x)| < \varepsilon$ for all n ≥ N and all x ∈ [0, 1]. Since f is continuous on the compact set [0, 1], it has to be uniformly continuous, and hence we can find a δ > 0 such that $|f(u) - f(v)| < \frac{\varepsilon}{2}$ whenever |u − v| < δ. Since $p_n(x) = E(f(X_n))$, we have

$$|f(x) - p_n(x)| = |f(x) - E(f(X_n))| = |E(f(x) - f(X_n))| \le E(|f(x) - f(X_n)|)$$

We split the last expectation into two parts: the cases where $|x - X_n| < \delta$ and the rest:

$$E(|f(x) - f(X_n)|) = E(1_{\{|x - X_n| < \delta\}}|f(x) - f(X_n)|) + E(1_{\{|x - X_n| \ge \delta\}}|f(x) - f(X_n)|)$$

The idea is that the first term is always small due to the choice of δ, and that the second term will be small when n is large because $X_n$ is then unlikely to deviate much from x. Here are the details:

By choice of δ, we have for the first term

$$E(1_{\{|x - X_n| < \delta\}}|f(x) - f(X_n)|) \le E\left(1_{\{|x - X_n| < \delta\}}\frac{\varepsilon}{2}\right) \le \frac{\varepsilon}{2}$$

For the second term, we first note that since f is a continuous function on a compact interval, it must be bounded by a constant M, so that $|f(x) - f(X_n)| \le 2M$. Hence by Chebyshev’s inequality

$$E(1_{\{|x - X_n| \ge \delta\}}|f(x) - f(X_n)|) \le 2M\,E(1_{\{|x - X_n| \ge \delta\}}) \le \frac{2M}{\delta^2}\mathrm{Var}(X_n) = \frac{2Mx(1-x)}{\delta^2 n} \le \frac{M}{2\delta^2 n}$$

where we in the last step used that $\frac{1}{4}$ is the maximal value of x(1 − x) on [0, 1]. If we now choose $N > \frac{M}{\delta^2\varepsilon}$, we see that we get

$$E(1_{\{|x - X_n| \ge \delta\}}|f(x) - f(X_n)|) < \frac{\varepsilon}{2}$$

for all n ≥ N. Combining all the inequalities above, we see that if n ≥ N, we have for all x ∈ [0, 1]

$$|f(x) - p_n(x)| \le E(|f(x) - f(X_n)|) = E(1_{\{|x - X_n| < \delta\}}|f(x) - f(X_n)|) + E(1_{\{|x - X_n| \ge \delta\}}|f(x) - f(X_n)|) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$

and hence the Bernstein polynomials $p_n$ converge uniformly to f. □
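Bernstein’s construction is explicit enough to try numerically. The sketch below (hypothetical names, floating-point arithmetic) evaluates $p_N$ for the continuous but non-differentiable function f(x) = |x − 1/2| and estimates the uniform error by taking the maximum over a grid, which only approximates the true supremum. The error should decrease as N grows, although slowly for this f.

```python
from math import comb

def bernstein(f, N, x):
    """p_N(x) = sum_k f(k/N) C(N, k) x^k (1 - x)^(N - k)."""
    return sum(f(k / N) * comb(N, k) * x**k * (1 - x)**(N - k) for k in range(N + 1))

def sup_error(f, N, grid=401):
    """max |f(x) - p_N(x)| over an evenly spaced grid in [0, 1] (a proxy for the sup norm)."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - bernstein(f, N, x)) for x in xs)

f = lambda x: abs(x - 0.5)   # continuous, but not differentiable at 1/2
for N in (10, 40, 160):
    print(N, sup_error(f, N))
```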

To get Weierstrass’ result, we just have to move functions from arbitrary intervals [a, b] to [0, 1] and back. The function

$$T(x) = \frac{x - a}{b - a}$$

maps [a, b] bijectively to [0, 1], and the inverse function

$$T^{-1}(y) = a + (b - a)y$$

maps [0, 1] back to [a, b]. If f is a continuous function on [a, b], the function $\bar f = f \circ T^{-1}$ is a continuous function on [0, 1] taking exactly the same values in the same order. If $\{q_n\}$ is a sequence of polynomials converging uniformly to $\bar f$ on [0, 1], then the functions $p_n = q_n \circ T$ converge uniformly to f on [a, b], as $\sup_{x \in [a,b]}|p_n(x) - f(x)| = \sup_{y \in [0,1]}|q_n(y) - \bar f(y)|$. Moreover, since

$$p_n(x) = q_n\left(\frac{x - a}{b - a}\right),$$

the $p_n$’s are polynomials, and hence Weierstrass’ theorem is proved.
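The scaling step can be sketched in the same style. Here `approximant_on` (a hypothetical name) builds $\bar f = f \circ T^{-1}$ on [0, 1], approximates it by its Bernstein polynomial $q_N$, and returns $p_N = q_N \circ T$. The test function |x| on [−1, 2] is our own choice; Bernstein polynomials reproduce linear functions exactly, which gives a simple consistency check.

```python
from math import comb

def bernstein_01(g, N, y):
    """Bernstein polynomial q_N of a function g on [0, 1], evaluated at y."""
    return sum(g(k / N) * comb(N, k) * y**k * (1 - y)**(N - k) for k in range(N + 1))

def approximant_on(f, a, b, N):
    """p_N = q_N o T with T(x) = (x - a)/(b - a), where q_N approximates f o T^{-1}."""
    f_bar = lambda y: f(a + (b - a) * y)     # f o T^{-1}, a function on [0, 1]
    return lambda x: bernstein_01(f_bar, N, (x - a) / (b - a))

p = approximant_on(abs, -1.0, 2.0, 200)
print(max(abs(abs(x) - p(x)) for x in [-1 + 3 * i / 300 for i in range(301)]))
```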

Remark: Weierstrass’ theorem is important because many mathematical arguments are easier to perform on polynomials than on continuous functions in general. If the property we study is preserved under uniform limits (i.e. if the limit function f of a uniformly convergent sequence of functions $\{f_n\}$ always inherits the property from the $f_n$’s), we can use Weierstrass’ Theorem to extend the argument from polynomials to all continuous functions. There is an extension of the result, called the Stone-Weierstrass Theorem, which carries it over to much more general settings.


1. Show that there is no sequence of polynomials that converges uniformly to the continuous function $f(x) = \frac{1}{x}$ on (0, 1).

2. Show that there is no sequence of polynomials that converges uniformly to the function $f(x) = e^x$ on R.

3. In this problem

$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$

a) Show that if x ≠ 0, then the n-th derivative has the form

$$f^{(n)}(x) = e^{-1/x^2}\,\frac{P_n(x)}{x^{N_n}}$$

where $P_n$ is a polynomial and $N_n \in \mathbb{N}$.

b) Show that $f^{(n)}(0) = 0$ for all n.

c) Show that the Taylor polynomials of f at 0 do not converge to f except at the point 0.

4. Assume that f : [a, b] → R is a continuous function such that $\int_a^b f(x)x^n\,dx = 0$ for all n = 0, 1, 2, 3, ....

a) Show that $\int_a^b f(x)p(x)\,dx = 0$ for all polynomials p.

b) Use Weierstrass’ theorem to show that $\int_a^b f(x)^2\,dx = 0$. Conclude that f(x) = 0 for all x ∈ [a, b].

5. In this exercise we shall show that C([a, b],R) is a separable metric space.

a) Assume that (X, d) is a metric space, and that S ⊆ T are subsets of X. Show that if S is dense in $(T, d_T)$ and T is dense in (X, d), then S is dense in (X, d).

b) Show that for any polynomial p, there is a sequence $\{q_n\}$ of polynomials with rational coefficients that converges uniformly to p on [a, b].

c) Show that the polynomials with rational coefficients are dense in C([a, b],R).

d) Show that C([a, b],R) is separable.

6. In this problem we shall reformulate Bernstein’s proof in purely analytic terms, avoiding concepts and notation from probability theory. You should keep the Binomial Formula

$$(a + b)^N = \sum_{k=0}^{N}\binom{N}{k}a^k b^{N-k}$$

and the definition $\binom{N}{k} = \frac{N(N-1)(N-2)\cdot\ldots\cdot(N-k+1)}{1\cdot 2\cdot 3\cdot\ldots\cdot k}$ in mind.

a) Show that $\sum_{k=0}^{N}\binom{N}{k}x^k(1-x)^{N-k} = 1$.

b) Show that $\sum_{k=0}^{N}\frac{k}{N}\binom{N}{k}x^k(1-x)^{N-k} = x$ (this is the analytic version of $E(X_N) = \frac{1}{N}(E(Y_1) + E(Y_2) + \cdots + E(Y_N)) = x$).

c) Show that $\sum_{k=0}^{N}\left(\frac{k}{N} - x\right)^2\binom{N}{k}x^k(1-x)^{N-k} = \frac{1}{N}x(1-x)$ (this is the analytic version of $\mathrm{Var}(X_N) = \frac{1}{N}x(1-x)$). Hint: Write $\left(\frac{k}{N} - x\right)^2 = \frac{1}{N^2}\big(k(k-1) + (1 - 2xN)k + N^2x^2\big)$ and use points b) and a) on the second and third term in the sum.

d) Show that if $p_n$ is the n-th Bernstein polynomial, then

$$|f(x) - p_n(x)| \le \sum_{k=0}^{n}|f(x) - f(k/n)|\binom{n}{k}x^k(1-x)^{n-k}$$

e) Given ε > 0, explain why there is a δ > 0 such that |f(u) − f(v)| < ε/2 for all u, v ∈ [0, 1] such that |u − v| < δ. Explain why

$$\begin{aligned} |f(x) - p_n(x)| &\le \sum_{k:\,|k/n - x| < \delta}|f(x) - f(k/n)|\binom{n}{k}x^k(1-x)^{n-k} + \sum_{k:\,|k/n - x| \ge \delta}|f(x) - f(k/n)|\binom{n}{k}x^k(1-x)^{n-k} \\ &\le \frac{\varepsilon}{2} + \sum_{k:\,|k/n - x| \ge \delta}|f(x) - f(k/n)|\binom{n}{k}x^k(1-x)^{n-k} \end{aligned}$$

f) Show that there is a constant M such that |f(x)| ≤ M for all x ∈ [0, 1]. Explain all the steps in the calculation:

$$\begin{aligned} \sum_{k:\,|k/n - x| \ge \delta}|f(x) - f(k/n)|\binom{n}{k}x^k(1-x)^{n-k} &\le 2M\sum_{k:\,|k/n - x| \ge \delta}\binom{n}{k}x^k(1-x)^{n-k} \\ &\le 2M\sum_{k=0}^{n}\left(\frac{k/n - x}{\delta}\right)^2\binom{n}{k}x^k(1-x)^{n-k} \le \frac{2M}{n\delta^2}x(1-x) \le \frac{M}{2n\delta^2} \end{aligned}$$

g) Explain why we can get |f(x) − p_n(x)| < ε by choosing n large enough, and explain why this proves Proposition 3.7.3.

3.8 Baire’s Category Theorem

Recall that a subset A of a metric space (X, d) is dense if for all x ∈ X there is a sequence from A converging to x. An equivalent definition is that all balls in X contain elements from A. To show that a set S is not dense, we thus have to find an open ball that does not intersect S. Obviously, a set can fail to be dense in parts of X, and still be dense in other parts. If G is a nonempty, open subset of X, we say that A is dense in G if every ball B(x; r) ⊆ G contains elements from A. The following definition captures our intuition of a set that is not dense anywhere.

Definition 3.8.1 A subset S of a metric space (X, d) is said to be nowhere dense if it isn’t dense in any nonempty, open set G. In other words, for all nonempty, open sets G ⊆ X, there is a ball B(x; r) ⊆ G that does not intersect S.

This definition simply says that no matter how much we restrict our attention, we shall never find an area in X where S is dense.

Example 1. N is nowhere dense in R. ♣

Nowhere dense sets are sparse in an obvious way. The following definitionindicates that even countable unions of nowhere dense sets are unlikely tobe very large.

Definition 3.8.2 A set is called meager if it is a countable union of nowheredense sets. The complement of a meager set is called comeager.1

Example 2. Q is a meager set in R as it can be written as a countable unionQ =

⋃a∈Qa of the nowhere dense singletons a. By the same argument,

Q is also meager in Q.

The last part of the example shows that a meager set can fill up a metricspace. However, in complete spaces the meager sets are always “meager” inthe following sense:

¹Most books refer to meager sets as “sets of first category” while comeager sets are called “residual sets”. Sets that are not of first category are said to be of “second category”. Although this is the original terminology of René-Louis Baire (1874-1932), who introduced the concepts, it is in my opinion so nondescriptive that it should be abandoned in favor of more evocative terms.


Theorem 3.8.3 (Baire’s Category Theorem) Assume that M is a meager subset of a complete metric space (X, d). Then M does not contain any open balls, i.e. $M^c$ is dense in X.

Proof: Since M is meager, it can be written as a union $M = \bigcup_{k\in\mathbb{N}} N_k$ of nowhere dense sets $N_k$. Given a ball B(a; r), our task is to find an element x ∈ B(a; r) which does not belong to M.

We first observe that since $N_1$ is nowhere dense, there is a ball $B(a_1; r_1)$ inside B(a; r) which does not intersect $N_1$. By shrinking the radius $r_1$ slightly if necessary, we may assume that the closed ball $\overline{B}(a_1; r_1)$ is contained in B(a; r), does not intersect $N_1$, and has radius less than 1. Since $N_2$ is nowhere dense, there is a ball $B(a_2; r_2)$ inside $B(a_1; r_1)$ which does not intersect $N_2$. By shrinking the radius $r_2$ if necessary, we may assume that the closed ball $\overline{B}(a_2; r_2)$ is contained in $B(a_1; r_1)$, does not intersect $N_2$, and has radius less than $\frac{1}{2}$. Continuing in this way, we get a sequence $\{\overline{B}(a_k; r_k)\}$ of closed balls, each contained in the previous, such that $\overline{B}(a_k; r_k)$ has radius less than $\frac{1}{k}$ and does not intersect $N_k$.

Since the balls are nested and the radii shrink to zero, the centers $a_k$ form a Cauchy sequence. Since X is complete, the sequence converges to a point x. Since each ball $\overline{B}(a_k; r_k)$ is closed, and the “tail” $\{a_n\}_{n=k}^\infty$ of the sequence belongs to $\overline{B}(a_k; r_k)$, the limit x also belongs to $\overline{B}(a_k; r_k)$. This means that for all k, $x \notin N_k$, and hence $x \notin M$. Since $\overline{B}(a_1; r_1) \subseteq B(a; r)$, we see that x ∈ B(a; r), and the theorem is proved. □

As an immediate consequence we have:

Corollary 3.8.4 A complete metric space is not a countable union of nowhere dense sets.

Baire’s Category Theorem is a surprisingly strong tool for proving theorems about sets and families of functions. Before we take a look at some examples, we shall prove the following lemma which gives a simpler description of closed, nowhere dense sets.

Lemma 3.8.5 A closed set F is nowhere dense if and only if it does not contain any open balls.

Proof: If F contains an open ball, it obviously isn’t nowhere dense. We therefore assume that F does not contain an open ball, and prove that it is nowhere dense. Given a nonempty, open set G, we know that F cannot contain all of G as G contains open balls and F does not. Pick an element x in G that is not in F. Since F is closed, there is a ball $B(x; r_1)$ around x that does not intersect F. Since G is open, there is a ball $B(x; r_2)$ around x that is contained in G. If we choose $r = \min\{r_1, r_2\}$, the ball B(x; r) is contained in G and does not intersect F, and hence F is nowhere dense. □

Remark: Without the assumption that F is closed, the lemma is false (for instance, Q contains no open balls in R, but it is dense in R), but it is still possible to prove a related result: A (general) set S is nowhere dense if and only if its closure $\overline{S}$ doesn’t contain any open balls. See Exercise 5.

We are now ready to take a look at our first application.

Definition 3.8.6 Let (X, d) be a metric space. A family F of functions f : X → R is called pointwise bounded if for each x ∈ X, there is a constant $M_x \in \mathbb{R}$ such that $|f(x)| \leq M_x$ for all f ∈ F.

Note that the constant $M_x$ may vary from point to point, and that there need not be a constant M such that |f(x)| ≤ M for all f and all x (a simple example is F = {f : R → R | f(x) = kx for some k ∈ [−1, 1]}, where $M_x = |x|$). The next result shows that although we cannot guarantee boundedness on all of X, we can under reasonable assumptions guarantee boundedness on a part of X.

Proposition 3.8.7 Let (X, d) be a complete metric space, and assume that F is a pointwise bounded family of continuous functions f : X → R. Then there exists an open, nonempty set G and a constant M ∈ R such that |f(x)| ≤ M for all f ∈ F and all x ∈ G.

Proof: For each n ∈ N and f ∈ F, the set $f^{-1}([-n, n])$ is closed as it is the inverse image of a closed set under a continuous function (recall Proposition 2.3.10). As intersections of closed sets are closed (Proposition 2.3.12),

$$A_n = \bigcap_{f\in F} f^{-1}([-n, n])$$

is also closed. Since F is pointwise bounded, $X = \bigcup_{n\in\mathbb{N}} A_n$, and Corollary 3.8.4 tells us that not all $A_n$ can be nowhere dense. If $A_{n_0}$ is not nowhere dense, it contains an open set G by the lemma above. By definition of $A_{n_0}$, we see that $|f(x)| \leq n_0$ for all f ∈ F and all $x \in A_{n_0}$ (and hence all x ∈ G). □

You may doubt the usefulness of this proposition as we only know that the result holds for some open set G, but the point is that if we have extra information on the family F, the mere existence of such a set may be exactly what we need to pull through a more complex argument. In functional analysis, there is a famous (and most useful) example of this called the Banach-Steinhaus Theorem (see Exercise 4.7.11).


For our next application, we first observe that although $\mathbb{R}^n$ is not compact, it can be written as a countable union of compact sets:

$$\mathbb{R}^n = \bigcup_{k\in\mathbb{N}} [-k, k]^n$$

We shall show that this is not the case for C([0, 1], R): this space cannot be written as a countable union of compact sets. We need a lemma.

Lemma 3.8.8 A compact subset K of C([0, 1],R) is nowhere dense.

Proof: Since compact sets are closed, it suffices (by the previous lemma) to show that each ball B(f; ε) contains elements that are not in K. By the Arzelà-Ascoli Theorem, we know that compact sets are equicontinuous, and hence we need only prove that B(f; ε) contains a family of functions that is not equicontinuous. We shall produce such a family by perturbing f by functions that are very steep on small intervals.

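For each n ∈ N, let $g_n$ be the function

$$g_n(x) = \begin{cases} nx & \text{for } x \leq \frac{\varepsilon}{2n} \\[2pt] \frac{\varepsilon}{2} & \text{for } x \geq \frac{\varepsilon}{2n} \end{cases}$$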

Then $f + g_n$ is in B(f; ε) (since $0 \leq g_n(x) \leq \frac{\varepsilon}{2}$ for all x), but since the family $\{f + g_n\}$ is not equicontinuous (see Exercise 9 for help to prove this), not all of these functions can be in K, and hence B(f; ε) contains elements that are not in K. □
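The Python sketch below illustrates the two properties of $g_n$ that the proof relies on: every $g_n$ has sup norm ε/2 (so $f + g_n$ stays inside B(f; ε)), while $g_n$ climbs by ε/2 on an interval of length ε/(2n). Since that length tends to 0, no single δ can serve all n in the definition of equicontinuity. The value ε = 0.1, the sampling grid and the function name `g` are illustrative choices.

```python
def g(n, eps, x):
    """g_n from the proof of Lemma 3.8.8: slope n on [0, eps/(2n)], constant eps/2 afterwards."""
    return n * x if x <= eps / (2 * n) else eps / 2

eps = 0.1
grid = [i / 1000 for i in range(1001)]  # sample points in [0, 1]
for n in (1, 10, 100, 1000):
    width = eps / (2 * n)                     # length of the steep part
    rise = g(n, eps, width) - g(n, eps, 0.0)  # how much g_n climbs over [0, width]
    sup_norm = max(g(n, eps, x) for x in grid)
    print(f"n={n:4d}  sup|g_n| ~ {sup_norm:.3f}  rise={rise:.3f}  width={width:.1e}")
```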

Proposition 3.8.9 C([0, 1],R) is not a countable union of compact sets.

Proof: Since C([0, 1], R) is complete, it is not the countable union of nowhere dense sets by Corollary 3.8.4. Since the lemma tells us that all compact sets are nowhere dense, the proposition follows. □

Remark: The basic idea in the proof above is that the compact sets are nowhere dense since we can obtain arbitrarily steep functions by perturbing a given function just a little. The same basic idea can be used to prove more sophisticated results, e.g. that the set of nowhere differentiable functions is comeager in C([0, 1], R). The key idea is that starting with any continuous function, we can perturb it into functions with arbitrarily large derivatives by using small, but rapidly oscillating functions. With a little bit of technical work, this implies that the set of functions that are differentiable at at least one point is meager.


Exercises for section 3.8

1. Show that N is a nowhere dense subset of R.

2. Show that the set A = {g ∈ C([0, 1], R) | g(0) = 0} is nowhere dense in C([0, 1], R).

3. Show that a subset of a nowhere dense set is nowhere dense and that a subset of a meager set is meager.

4. Show that a subset S of a metric space X is nowhere dense if and only if for each open ball $B(a_0; r_0) \subseteq X$, there is a ball $B(x; r) \subseteq B(a_0; r_0)$ that does not intersect S.

5. Recall that the closure $\overline{N}$ of a set N consists of N plus all its boundary points.

a) Show that if N is nowhere dense, so is $\overline{N}$.

b) Find an example of a meager set M such that $\overline{M}$ is not meager.

c) Show that a set N is nowhere dense if and only if $\overline{N}$ does not contain any open balls.

6. Show that a countable union of meager sets is meager.

7. Show that if $N_1, N_2, \ldots, N_k$ are nowhere dense, so is $N_1 \cup N_2 \cup \ldots \cup N_k$.

8. Prove that S is nowhere dense if and only if $S^c$ contains an open, dense subset.

9. In this problem we shall prove that the set $\{f + g_n\}$ in the proof of Lemma 3.8.8 is not equicontinuous.

a) Show that the set $\{g_n : n \in \mathbb{N}\}$ is not equicontinuous.

b) Show that if $\{h_n\}$ is an equicontinuous family of functions $h_n : [0, 1] \to \mathbb{R}$ and $k : [0, 1] \to \mathbb{R}$ is continuous, then $\{h_n + k\}$ is equicontinuous.

c) Prove that the set $\{f + g_n\}$ in the lemma is not equicontinuous. (Hint: Assume that the sequence is equicontinuous, and use part b) with $h_n = f + g_n$ and k = −f to get a contradiction with a).)

10. Let N have the discrete metric. Show that N is complete and that $\mathbb{N} = \bigcup_{n\in\mathbb{N}}\{n\}$. Why doesn’t this contradict Baire’s Category Theorem?

11. Show that in a complete space, a closed set is meager if and only if it is nowhere dense.

12. Let (X, d) be a metric space.

a) Show that if G ⊆ X is open and dense, then $G^c$ is nowhere dense.

b) Assume that (X, d) is complete. Show that if $\{G_n\}$ is a countable collection of open, dense subsets of X, then $\bigcap_{n\in\mathbb{N}} G_n$ is dense in X.

13. Assume that a sequence $\{f_n\}$ of continuous functions $f_n : [0, 1] \to \mathbb{R}$ converges pointwise to f. Show that f must be bounded on a subinterval of [0, 1]. Find an example which shows that f need not be bounded on all of [0, 1].


14. In this problem we shall study sequences $\{f_n\}$ of functions converging pointwise to 0.

a) Show that if the functions $f_n$ are continuous, then there exists a nonempty subinterval (a, b) of [0, 1] and an N ∈ N such that for n ≥ N, $|f_n(x)| \leq 1$ for all x ∈ (a, b).

b) Find a sequence of functions $\{f_n\}$ converging to 0 on [0, 1] such that for each nonempty subinterval (a, b) there is for each N ∈ N an x ∈ (a, b) such that $f_N(x) > 1$.

15. Let (X, d) be a metric space. A point x ∈ X is called isolated if there is an ε > 0 such that B(x; ε) = {x}.

a) Show that if x ∈ X, the singleton {x} is nowhere dense if and only if x is not an isolated point.

b) Show that if X is a complete metric space without isolated points, thenX is uncountable.

We shall now prove:

Theorem: The unit interval [0, 1] cannot be written as a countable, disjoint union of closed, proper subintervals $I_n = [a_n, b_n]$.

c) Assume for contradiction that [0, 1] can be written as such a union. Show that the set of all endpoints, $F = \{a_n, b_n \mid n \in \mathbb{N}\}$, is a closed subset of [0, 1], and that so is $F_0 = F \setminus \{0, 1\}$. Explain that since $F_0$ is countable and complete in the subspace metric, $F_0$ must have an isolated point, and use this to force a contradiction.


Chapter 4

Series of functions

In this chapter we shall see how the theory in the previous chapters can be used to study functions. We shall be particularly interested in how general functions can be written as sums of series of simple functions such as power functions and trigonometric functions. This will take us to the theories of power series and Fourier series.

4.1 lim sup and lim inf

In this section we shall take a look at a useful extension of the concept of limit. Many sequences do not converge, but still have a rather regular asymptotic behavior as n goes to infinity: they may, for instance, oscillate between an upper set of values and a lower set. The notions of limit superior, lim sup, and limit inferior, lim inf, are helpful to describe such behavior. They also have the advantage that they always exist (provided we allow them to take the values ±∞).

We start with a sequence $\{a_n\}$ of real numbers, and define two new sequences $\{M_n\}$ and $\{m_n\}$ by

$$M_n = \sup\{a_k \mid k \geq n\}$$

and

$$m_n = \inf\{a_k \mid k \geq n\}$$

We allow that $M_n = \infty$ and that $m_n = -\infty$, as may well occur. Note that the sequence $\{M_n\}$ is decreasing (as we are taking suprema over smaller and smaller sets), and that $\{m_n\}$ is increasing (as we are taking infima over smaller and smaller sets). Since the sequences are monotone, the limits

$$\lim_{n\to\infty} M_n \quad\text{and}\quad \lim_{n\to\infty} m_n$$



clearly exist, but they may be ±∞. We now define the limit superior of the original sequence $\{a_n\}$ to be

$$\limsup_{n\to\infty} a_n = \lim_{n\to\infty} M_n$$

and the limit inferior to be

$$\liminf_{n\to\infty} a_n = \lim_{n\to\infty} m_n$$

The intuitive idea is that as n goes to infinity, the sequence $\{a_n\}$ may oscillate and not converge to a limit, but the oscillations will be asymptotically bounded by lim sup $a_n$ above and lim inf $a_n$ below.
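To see the definitions at work, the Python sketch below computes tail maxima and minima for a finite piece of the sequence $a_n = (-1)^n(1+\frac1n)$, an example chosen here only for illustration. A computer can only look at finitely many terms, so the values are truncated approximations of $M_n$ and $m_n$ (unreliable near the end of the list), but the monotonicity and the approach to lim sup = 1 and lim inf = −1 are already visible.

```python
def tail_sup_inf(a):
    """Suffix maxima and minima of a finite list: M[i] = max(a[i:]), m[i] = min(a[i:])."""
    M, m = list(a), list(a)
    for i in range(len(a) - 2, -1, -1):  # scan from the end towards the start
        M[i] = max(a[i], M[i + 1])
        m[i] = min(a[i], m[i + 1])
    return M, m

# a[i] holds a_n for n = i + 1, with a_n = (-1)^n (1 + 1/n)
a = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]
M, m = tail_sup_inf(a)
print(M[0], M[10], M[1000])  # nonincreasing, approaching lim sup a_n = 1
print(m[0], m[10], m[1000])  # nondecreasing, approaching lim inf a_n = -1
```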

The following relationship should be no surprise:

Proposition 4.1.1 Let $\{a_n\}$ be a sequence of real numbers. Then

$$\lim_{n\to\infty} a_n = b$$

if and only if

$$\limsup_{n\to\infty} a_n = \liminf_{n\to\infty} a_n = b$$

(we allow b to be a real number or ±∞.)

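Proof: Assume first that $\limsup_{n\to\infty} a_n = \liminf_{n\to\infty} a_n = b$. Since $m_n \leq a_n \leq M_n$, and

$$\lim_{n\to\infty} m_n = \liminf_{n\to\infty} a_n = b, \qquad \lim_{n\to\infty} M_n = \limsup_{n\to\infty} a_n = b,$$

we clearly have $\lim_{n\to\infty} a_n = b$ by “squeezing”.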

We now assume that $\lim_{n\to\infty} a_n = b$ where b ∈ R (the cases b = ±∞ are left to the reader). Given an ε > 0, there exists an N ∈ N such that $|a_n - b| < \varepsilon$ for all n ≥ N. In other words

$$b - \varepsilon < a_n < b + \varepsilon$$

for all n ≥ N. But then

$$b - \varepsilon \leq m_n < b + \varepsilon$$

and

$$b - \varepsilon < M_n \leq b + \varepsilon$$

for n ≥ N. Since this holds for all ε > 0, we have $\limsup_{n\to\infty} a_n = \liminf_{n\to\infty} a_n = b$. □


Exercises for section 4.1

1. Let $a_n = (-1)^n$. Find $\limsup_{n\to\infty} a_n$ and $\liminf_{n\to\infty} a_n$.

2. Let $a_n = \cos\frac{n\pi}{2}$. Find $\limsup_{n\to\infty} a_n$ and $\liminf_{n\to\infty} a_n$.

3. Let $a_n = \arctan(n)\sin(\frac{n\pi}{2})$. Find $\limsup_{n\to\infty} a_n$ and $\liminf_{n\to\infty} a_n$.

4. Complete the proof of Proposition 4.1.1 for the case b = ∞.

5. Show that

$$\limsup_{n\to\infty}(a_n + b_n) \leq \limsup_{n\to\infty} a_n + \limsup_{n\to\infty} b_n$$

and

$$\liminf_{n\to\infty}(a_n + b_n) \geq \liminf_{n\to\infty} a_n + \liminf_{n\to\infty} b_n$$

and find examples which show that we do not in general have equality. State and prove a similar result for the product $\{a_nb_n\}$ of two positive sequences.

6. Assume that the sequence $\{a_n\}$ is nonnegative and converges to a, and that $b = \limsup b_n$ is finite and positive. Show that $\limsup_{n\to\infty} a_nb_n = ab$ (the result holds without the condition that b is positive, but the proof becomes messy). What happens if the sequence $\{a_n\}$ is negative?

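7. We shall see how we can define lim sup and lim inf for functions f : R → R. Let a ∈ R, and define

$$M_\varepsilon = \sup\{f(x) \mid x \in (a-\varepsilon, a+\varepsilon)\}$$

$$m_\varepsilon = \inf\{f(x) \mid x \in (a-\varepsilon, a+\varepsilon)\}$$

for ε > 0 (we allow $M_\varepsilon = \infty$ and $m_\varepsilon = -\infty$).

a) Show that $M_\varepsilon$ decreases and $m_\varepsilon$ increases as ε → 0.

b) Show that the limits $\limsup_{x\to a} f(x) = \lim_{\varepsilon\to 0^+} M_\varepsilon$ and $\liminf_{x\to a} f(x) = \lim_{\varepsilon\to 0^+} m_\varepsilon$ exist (we allow ±∞ as values).

c) Show that $\lim_{x\to a} f(x) = b$ if and only if $\limsup_{x\to a} f(x) = \liminf_{x\to a} f(x) = b$.

d) Find $\liminf_{x\to 0}\sin\frac{1}{x}$ and $\limsup_{x\to 0}\sin\frac{1}{x}$.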

4.2 Integrating and differentiating sequences

Assume that we have a sequence of functions $\{f_n\}$ converging to a limit function f. If we integrate the functions $f_n$, will the integrals converge to the integral of f? And if we differentiate the $f_n$’s, will the derivatives converge to f′?

In this section, we shall see that without any further restrictions, the answer to both questions is no, but that it is possible to put conditions on the sequences that turn the answers into yes.

Let us start with integration and the following example.

Example 1: Let fn : [0, 1]→ R be the function in the figure.


[Figure 1: the graph of $f_n$, a triangle over [0, 1/n] with height n]

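It is given by the formula

$$f_n(x) = \begin{cases} 2n^2x & \text{if } 0 \leq x < \frac{1}{2n} \\[2pt] -2n^2x + 2n & \text{if } \frac{1}{2n} \leq x < \frac{1}{n} \\[2pt] 0 & \text{if } \frac{1}{n} \leq x \leq 1 \end{cases}$$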

but it is much easier just to work from the picture. The sequence $\{f_n\}$ converges pointwise to 0 (we have $f_n(0) = 0$ for all n, and for x > 0 we have $f_n(x) = 0$ as soon as $\frac{1}{n} \leq x$), but the integrals do not converge to 0. In fact, $\int_0^1 f_n(x)\,dx = \frac{1}{2}$ since the value of the integral equals the area under the function graph, i.e. the area of a triangle with base $\frac{1}{n}$ and height n. ♣

The example above shows that if the functions $f_n$ converge pointwise to a function f on an interval [a, b], the integrals $\int_a^b f_n(x)\,dx$ need not converge to $\int_a^b f(x)\,dx$. The reason is that with pointwise convergence, the difference between f and $f_n$ may be very large on small sets, so large that the integrals of $f_n$ do not converge to the integral of f. If the convergence is uniform, this cannot happen (note that the result below is actually a special case of Lemma 3.6.1):

Proposition 4.2.1 Assume that $\{f_n\}$ is a sequence of continuous functions converging uniformly to f on the interval [a, b]. Then the functions

$$F_n(x) = \int_a^x f_n(t)\,dt$$

converge uniformly to

$$F(x) = \int_a^x f(t)\,dt$$

on [a, b].

Proof: We must show that for a given ε > 0, we can always find an N ∈ N such that $|F(x) - F_n(x)| < \varepsilon$ for all n ≥ N and all x ∈ [a, b]. Since $\{f_n\}$ converges uniformly to f, there is an N ∈ N such that $|f(t) - f_n(t)| < \frac{\varepsilon}{b-a}$ for all n ≥ N and all t ∈ [a, b]. For n ≥ N, we then have for all x ∈ [a, b]:

$$|F(x) - F_n(x)| = \Big|\int_a^x (f(t) - f_n(t))\,dt\Big| \leq \int_a^x |f(t) - f_n(t)|\,dt \leq$$

$$\leq \int_a^x \frac{\varepsilon}{b-a}\,dt \leq \int_a^b \frac{\varepsilon}{b-a}\,dt = \varepsilon$$

This shows that $\{F_n\}$ converges uniformly to F on [a, b]. □

In applications it is often useful to have the result above with a flexible lower limit.

Corollary 4.2.2 Assume that $\{f_n\}$ is a sequence of continuous functions converging uniformly to f on the interval [a, b]. For any $x_0 \in [a, b]$, the functions

$$F_n(x) = \int_{x_0}^x f_n(t)\,dt$$

converge uniformly to

$$F(x) = \int_{x_0}^x f(t)\,dt$$

on [a, b].

Proof: Recall that

$$\int_a^x f_n(t)\,dt = \int_a^{x_0} f_n(t)\,dt + \int_{x_0}^x f_n(t)\,dt$$

regardless of the order of the numbers $a, x_0, x$, and hence

$$\int_{x_0}^x f_n(t)\,dt = \int_a^x f_n(t)\,dt - \int_a^{x_0} f_n(t)\,dt$$

The first integral on the right converges uniformly to $\int_a^x f(t)\,dt$ according to the proposition, and the second integral converges (as a sequence of numbers) to $\int_a^{x_0} f(t)\,dt$. Hence $\int_{x_0}^x f_n(t)\,dt$ converges uniformly to

$$\int_a^x f(t)\,dt - \int_a^{x_0} f(t)\,dt = \int_{x_0}^x f(t)\,dt$$

as was to be proved. □

Let us reformulate this result in terms of series. Recall that a series of functions $\sum_{n=0}^\infty v_n(x)$ converges pointwise/uniformly to a function f on an interval I if and only if the sequence $\{s_n\}$ of partial sums $s_n(x) = \sum_{k=0}^n v_k(x)$ converges pointwise/uniformly to f on I.


Corollary 4.2.3 Assume that $\{v_n\}$ is a sequence of continuous functions such that the series $\sum_{n=0}^\infty v_n(x)$ converges uniformly on the interval [a, b]. Then for any $x_0 \in [a, b]$, the series $\sum_{n=0}^\infty \int_{x_0}^x v_n(t)\,dt$ converges uniformly and

$$\sum_{n=0}^\infty \int_{x_0}^x v_n(t)\,dt = \int_{x_0}^x \sum_{n=0}^\infty v_n(t)\,dt$$

The corollary tells us that if the series $\sum_{n=0}^\infty v_n(x)$ converges uniformly, we can integrate it term by term to get

$$\int_{x_0}^x \sum_{n=0}^\infty v_n(t)\,dt = \sum_{n=0}^\infty \int_{x_0}^x v_n(t)\,dt$$

This formula may look obvious, but it does not in general hold for series that only converge pointwise. As we shall see later, interchanging integrals and infinite sums is quite a tricky business.

To use the corollary efficiently, we need to be able to determine when a series of functions converges uniformly. The following simple test is often helpful:

Proposition 4.2.4 (Weierstrass’ M-test) Let $\{v_n\}$ be a sequence of continuous functions on the interval [a, b], and assume that there is a convergent series $\sum_{n=0}^\infty M_n$ of positive numbers such that $|v_n(x)| \leq M_n$ for all n ∈ N and all x ∈ [a, b]. Then the series $\sum_{n=0}^\infty v_n(x)$ converges uniformly on [a, b].

Proof: Since (C([a, b], R), ρ) is complete, we only need to check that the partial sums $s_n(x) = \sum_{k=0}^n v_k(x)$ form a Cauchy sequence. Since the series $\sum_{n=0}^\infty M_n$ converges, we know that its partial sums $S_n = \sum_{k=0}^n M_k$ form a Cauchy sequence. Since for all x ∈ [a, b] and all m > n,

$$|s_m(x) - s_n(x)| = \Big|\sum_{k=n+1}^m v_k(x)\Big| \leq \sum_{k=n+1}^m |v_k(x)| \leq \sum_{k=n+1}^m M_k = |S_m - S_n|,$$

this implies that $\{s_n\}$ is a Cauchy sequence. □

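Example 1: Consider the series $\sum_{n=1}^\infty \frac{\cos nx}{n^2}$. Since $\big|\frac{\cos nx}{n^2}\big| \leq \frac{1}{n^2}$, and $\sum_{n=1}^\infty \frac{1}{n^2}$ converges, the original series $\sum_{n=1}^\infty \frac{\cos nx}{n^2}$ converges uniformly to a function f on any closed and bounded interval [a, b]. Hence we may integrate termwise to get

$$\int_0^x f(t)\,dt = \sum_{n=1}^\infty \int_0^x \frac{\cos nt}{n^2}\,dt = \sum_{n=1}^\infty \frac{\sin nx}{n^3}$$

♣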
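The termwise integration in this example can be sanity-checked numerically. The Python sketch below truncates both series after N = 200 terms (by the M-test estimate, the neglected part of $\sum\frac{\cos nt}{n^2}$ is at most $\sum_{n>N}\frac{1}{n^2} < \frac{1}{N}$, uniformly in t) and integrates the truncated sum with Simpson's rule; the truncation level, the step count and the function names are arbitrary choices for illustration.

```python
import math

N = 200  # number of terms kept in both series

def f_trunc(t):
    """Partial sum of sum_{n>=1} cos(n t)/n^2 with N terms."""
    return sum(math.cos(n * t) / n**2 for n in range(1, N + 1))

def simpson(g, a, b, steps=2000):
    """Composite Simpson's rule for the integral of g over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def termwise(x):
    """Partial sum of sum_{n>=1} sin(n x)/n^3 with N terms."""
    return sum(math.sin(n * x) / n**3 for n in range(1, N + 1))

for x in (1.0, 2.5):
    print(x, simpson(f_trunc, 0.0, x), termwise(x))
```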


Let us now turn to differentiation of sequences. This is a much trickier business than integration as integration often helps to smoothen functions while differentiation tends to make them more irregular. Here is a simple example.

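Example 2: The sequence (not series!) $\{\frac{\sin nx}{n}\}$ obviously converges uniformly to 0 (since $|\frac{\sin nx}{n}| \leq \frac{1}{n}$), but the sequence of derivatives $\{\cos nx\}$ does not converge to 0, and at many points it does not converge at all (at x = π, for instance, $\cos n\pi = (-1)^n$). ♣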

The example shows that even if a sequence $\{f_n\}$ of differentiable functions converges uniformly to a differentiable function f, the derivatives $f_n'$ need not converge to the derivative f′ of the limit function. If you draw the graphs of the functions $f_n$, you will see why: although they live in an increasingly narrow strip around the x-axis, they all wriggle equally much, and the derivatives do not converge.

To get a theorem that works, we have to put the conditions on the derivatives. The following result may look ugly and unsatisfactory, but it gives us the information we shall need.

Proposition 4.2.5 Let $\{f_n\}$ be a sequence of differentiable functions on the interval [a, b]. Assume that the derivatives $f_n'$ are continuous and that they converge uniformly to a function g on [a, b]. Assume also that there is a point $x_0 \in [a, b]$ such that the sequence $\{f_n(x_0)\}$ converges. Then the sequence $\{f_n\}$ converges uniformly on [a, b] to a differentiable function f such that f′ = g.

Proof: The proposition is just Corollary 4.2.2 in a convenient disguise. If we apply that corollary to the sequence $\{f_n'\}$, we see that the integrals $\int_{x_0}^x f_n'(t)\,dt$ converge uniformly to $\int_{x_0}^x g(t)\,dt$. By the Fundamental Theorem of Calculus, we get

$$f_n(x) - f_n(x_0) \to \int_{x_0}^x g(t)\,dt \quad \text{uniformly on } [a, b]$$

Since $\{f_n(x_0)\}$ converges to a limit $y_0$, this means that $f_n(x)$ converges uniformly to the function $f(x) = y_0 + \int_{x_0}^x g(t)\,dt$. Since g is continuous (as a uniform limit of continuous functions), the Fundamental Theorem of Calculus again shows that f′(x) = g(x). □

Also in this case it is useful to have a reformulation in terms of series:

Corollary 4.2.6 Let $\sum_{n=0}^\infty u_n(x)$ be a series where the functions $u_n$ are differentiable with continuous derivatives on the interval [a, b]. Assume that the series of derivatives $\sum_{n=0}^\infty u_n'(x)$ converges uniformly on [a, b]. Assume also that there is a point $x_0 \in [a, b]$ where the series $\sum_{n=0}^\infty u_n(x_0)$ converges. Then the series $\sum_{n=0}^\infty u_n(x)$ converges uniformly on [a, b], and

$$\Big(\sum_{n=0}^\infty u_n(x)\Big)' = \sum_{n=0}^\infty u_n'(x)$$

The corollary tells us that under rather strong conditions, we can differentiate the series $\sum_{n=0}^\infty u_n(x)$ term by term.

Example 3: Summing a geometric series, we see that

$$\frac{1}{1 - e^{-x}} = \sum_{n=0}^\infty e^{-nx} \quad \text{for } x > 0 \tag{4.2.1}$$

If we can differentiate term by term on the right hand side, we shall get (after changing the sign on both sides)

$$\frac{e^{-x}}{(1 - e^{-x})^2} = \sum_{n=1}^\infty ne^{-nx} \quad \text{for } x > 0 \tag{4.2.2}$$

To check that this is correct, we must check the convergence of the differentiated series (4.2.2). Choose an interval [a, b] where a > 0; then $ne^{-nx} \leq ne^{-na}$ for all x ∈ [a, b]. Using, e.g., the ratio test, it is easy to see that the series $\sum_{n=0}^\infty ne^{-na}$ converges, and hence $\sum_{n=0}^\infty ne^{-nx}$ converges uniformly on [a, b] by Weierstrass’ M-test. Since the series (4.2.1) also converges on [a, b], the corollary now tells us that we may differentiate (4.2.1) term by term on [a, b], i.e.

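$$\frac{e^{-x}}{(1 - e^{-x})^2} = \sum_{n=1}^\infty ne^{-nx} \quad \text{for } x \in [a, b]$$

Since [a, b] is an arbitrary subinterval of (0, ∞), we have

$$\frac{e^{-x}}{(1 - e^{-x})^2} = \sum_{n=1}^\infty ne^{-nx} \quad \text{for all } x > 0$$

♣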

Exercises for section 4.2

1. Show that $\sum_{n=0}^\infty \frac{\cos(nx)}{n^2+1}$ converges uniformly on R.

2. Does the series $\sum_{n=0}^\infty ne^{-nx}$ in Example 3 converge uniformly on (0, ∞)?

3. Let $f_n : [0, 1] \to \mathbb{R}$ be defined by $f_n(x) = nx(1 - x^2)^n$. Show that $f_n(x) \to 0$ for all x ∈ [0, 1], but that $\int_0^1 f_n(x)\,dx \to \frac{1}{2}$.

4. Explain in detail how Corollary 4.2.3 follows from Corollary 4.2.2.

5. Explain in detail how Corollary 4.2.6 follows from Proposition 4.2.5.


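6. a) Show that the series $\sum_{n=1}^\infty \frac{\cos\frac{x}{n}}{n^2}$ converges uniformly on R.

b) Show that $\sum_{n=1}^\infty \frac{\sin\frac{x}{n}}{n}$ converges to a continuous function f, and that

$$f'(x) = \sum_{n=1}^\infty \frac{\cos\frac{x}{n}}{n^2}$$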

7. One can show that

$$x = \sum_{n=1}^\infty \frac{2(-1)^{n+1}}{n}\sin(nx) \quad \text{for } x \in (-\pi, \pi)$$

If we differentiate term by term, we get

$$1 = \sum_{n=1}^\infty 2(-1)^{n+1}\cos(nx) \quad \text{for } x \in (-\pi, \pi)$$

Is this a correct formula?

8. a) Show that the series $\sum_{n=1}^\infty \frac{1}{n^x}$ converges uniformly on all intervals [a, ∞) where a > 1.

b) Let $f(x) = \sum_{n=1}^\infty \frac{1}{n^x}$ for x > 1. Show that $f'(x) = -\sum_{n=1}^\infty \frac{\ln n}{n^x}$.

4.3 Power series

Recall that a power series is a function of the form

$$f(x) = \sum_{n=0}^\infty c_n(x-a)^n$$

where a is a real number and $\{c_n\}$ is a sequence of real numbers. It is defined for the x-values that make the series converge. We define the radius of convergence of the series to be the number R such that

$$\frac{1}{R} = \limsup_{n\to\infty}\sqrt[n]{|c_n|}$$

with the interpretation that R = 0 if the limit is infinite, and R = ∞ if the limit is 0. To justify this terminology, we need the following result.

Proposition 4.3.1 If R is the radius of convergence of the power series $\sum_{n=0}^\infty c_n(x-a)^n$, the series converges for |x − a| < R and diverges for |x − a| > R. If 0 < r < R, the series converges uniformly on [a − r, a + r].

Proof: Let us first assume that |x − a| > R. This means that $\frac{1}{|x-a|} < \frac{1}{R}$, and since $\limsup_{n\to\infty}\sqrt[n]{|c_n|} = \frac{1}{R}$, there must be arbitrarily large values of n such that $\sqrt[n]{|c_n|} > \frac{1}{|x-a|}$. Hence $|c_n(x-a)^n| > 1$ for these n, and consequently the series must diverge as the terms do not tend to zero.


To prove the (uniform) convergence, assume that r is a number between 0 and R. Since $\frac{1}{r} > \frac{1}{R}$, we can pick a positive number b < 1 such that $\frac{b}{r} > \frac{1}{R}$. Since $\limsup_{n\to\infty}\sqrt[n]{|c_n|} = \frac{1}{R}$, there must be an N ∈ N such that $\sqrt[n]{|c_n|} < \frac{b}{r}$ when n ≥ N. This means that $|c_nr^n| < b^n$ for n ≥ N, and hence that $|c_n(x-a)^n| < b^n$ for all x ∈ [a − r, a + r]. Since $\sum_{n=N}^\infty b^n$ is a convergent, geometric series, Weierstrass’ M-test tells us that the series $\sum_{n=N}^\infty c_n(x-a)^n$ converges uniformly on [a − r, a + r]. Since only the tail of a series counts for convergence, the full series $\sum_{n=0}^\infty c_n(x-a)^n$ also converges uniformly on [a − r, a + r]. Since r is an arbitrary number less than R, we see that the series must converge on the open interval (a − R, a + R), i.e. whenever |x − a| < R. □

Remark: When we want to find the radius of convergence, it is occasionally convenient to compute a slightly different limit such as $\limsup_{n\to\infty}\sqrt[n+1]{|c_n|}$ or $\limsup_{n\to\infty}\sqrt[n-1]{|c_n|}$ instead of $\limsup_{n\to\infty}\sqrt[n]{|c_n|}$. This corresponds to finding the radius of convergence of the power series we get by either multiplying or dividing the original one by (x − a), and gives the correct answer as multiplying or dividing a series by a nonzero number doesn’t change its convergence properties.

The proposition above does not tell us what happens at the endpoints a ± R of the interval of convergence, but we know from calculus that a series may converge at both, one or neither endpoint. Although the convergence is uniform on all subintervals [a − r, a + r], it is not in general uniform on (a − R, a + R).

Corollary 4.3.2 Assume that the power series $f(x) = \sum_{n=0}^\infty c_n(x-a)^n$ has radius of convergence R larger than 0. Then the function f is continuous and differentiable on the open interval (a − R, a + R) with

$$f'(x) = \sum_{n=1}^\infty nc_n(x-a)^{n-1} = \sum_{n=0}^\infty (n+1)c_{n+1}(x-a)^n \quad \text{for } x \in (a-R, a+R)$$

and

$$\int_a^x f(t)\,dt = \sum_{n=0}^\infty \frac{c_n}{n+1}(x-a)^{n+1} = \sum_{n=1}^\infty \frac{c_{n-1}}{n}(x-a)^n \quad \text{for } x \in (a-R, a+R)$$

Proof: Since the power series converges uniformly on each subinterval [a − r, a + r], the sum is continuous on each such interval according to Proposition 3.2.4. Since each x in (a − R, a + R) is contained in the interior of some of the subintervals [a − r, a + r], we see that f must be continuous on the full interval (a − R, a + R). The formula for the integral follows immediately by applying Corollary 4.2.3 on each subinterval [a − r, a + r] in a similar way.


To get the formula for the derivative, we shall apply Corollary 4.2.6. To use this result, we need to know that the differentiated series $\sum_{n=0}^\infty (n+1)c_{n+1}(x-a)^n$ has the same radius of convergence as the original series; i.e. that

$$\limsup_{n\to\infty}\sqrt[n+1]{|(n+1)c_{n+1}|} = \limsup_{n\to\infty}\sqrt[n]{|c_n|} = \frac{1}{R}$$

(note that by the remark above, we may use the (n+1)-st root on the left hand side instead of the n-th root). Since $\lim_{n\to\infty}\sqrt[n+1]{n+1} = 1$, this is not hard to show (see Exercise 6). Applying Corollary 4.2.6 on each subinterval [a − r, a + r], we now get the formula for the derivative at each point x ∈ [a − r, a + r]. Since each point in (a − R, a + R) belongs to the interior of some of the subintervals, the formula for the derivative must hold at all points x ∈ (a − R, a + R). □

A function that is the sum of a power series is called a real analytic function. Such functions have derivatives of all orders.

Corollary 4.3.3 Let $f(x) = \sum_{n=0}^\infty c_n(x-a)^n$ for x ∈ (a − R, a + R). Then f is k times differentiable in (a − R, a + R) for any k ∈ N, and $f^{(k)}(a) = k!\,c_k$. Hence $\sum_{n=0}^\infty c_n(x-a)^n$ is the Taylor series

$$f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n$$

Proof: Using the previous corollary, we get by induction that $f^{(k)}$ exists on (a − R, a + R) and that

$$f^{(k)}(x) = \sum_{n=k}^\infty n(n-1)\cdot\ldots\cdot(n-k+1)\,c_n(x-a)^{n-k}$$

Putting x = a, we get $f^{(k)}(a) = k!\,c_k$, and the corollary follows. □

Exercises for section 4.3

1. Find power series with radius of convergence 0, 1, 2, and ∞.

2. Find power series with radius of convergence 1 that converge at both, one, and neither of the endpoints.

3. Show that for any nonzero polynomial P, $\lim_{n\to\infty}\sqrt[n]{|P(n)|} = 1$.

4. Use the result in Exercise 3 to find the radius of convergence:

a) $\sum_{n=0}^\infty \frac{2^nx^n}{n^3+1}$

b) $\sum_{n=0}^\infty \frac{2n^2+n-1}{3n+4}x^n$

c) $\sum_{n=0}^\infty nx^{2n}$

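5. a) Explain that $\frac{1}{1-x^2} = \sum_{n=0}^\infty x^{2n}$ for |x| < 1.

b) Show that $\frac{2x}{(1-x^2)^2} = \sum_{n=1}^\infty 2nx^{2n-1}$ for |x| < 1.

c) Show that $\frac{1}{2}\ln\Big|\frac{1+x}{1-x}\Big| = \sum_{n=0}^\infty \frac{x^{2n+1}}{2n+1}$ for |x| < 1.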

6. Let $\sum_{n=0}^\infty c_n(x-a)^n$ be a power series.

a) Show that the radius of convergence is given by

$$\frac{1}{R} = \limsup_{n\to\infty}\sqrt[n+k]{|c_n|}$$

for any integer k.

b) Show that $\lim_{n\to\infty}\sqrt[n+1]{n+1} = 1$ (write $\sqrt[n+1]{n+1} = (n+1)^{\frac{1}{n+1}}$).

c) Prove the formula

$$\limsup_{n\to\infty}\sqrt[n+1]{|(n+1)c_{n+1}|} = \limsup_{n\to\infty}\sqrt[n]{|c_n|} = \frac{1}{R}$$

in the proof of Corollary 4.3.2.

4.4 Abel’s Theorem

We have seen that the sum $f(x) = \sum_{n=0}^\infty c_n(x-a)^n$ of a power series is continuous in the interior (a − R, a + R) of its interval of convergence. But what happens if the series converges at an endpoint a ± R? It turns out that the sum is also continuous at the endpoint, but that this is surprisingly intricate to prove.

Before we turn to the proof, we need a lemma that can be thought of asa discrete version of integration by parts.

Lemma 4.4.1 (Abel’s Summation Formula) Let $\{a_n\}_{n=0}^\infty$ and $\{b_n\}_{n=0}^\infty$ be two sequences of real numbers, and let $s_n = \sum_{k=0}^n a_k$. Then

$$\sum_{n=0}^N a_nb_n = s_Nb_N + \sum_{n=0}^{N-1} s_n(b_n - b_{n+1}).$$

If the series $\sum_{n=0}^\infty a_n$ converges, and $b_n \to 0$ as n → ∞, then

$$\sum_{n=0}^\infty a_nb_n = \sum_{n=0}^\infty s_n(b_n - b_{n+1})$$

in the sense that either the two series both diverge or they converge to the same limit.


Proof: Note that $a_n = s_n - s_{n-1}$ for n ≥ 1, and that this formula even holds for n = 0 if we define $s_{-1} = 0$. Hence

$$\sum_{n=0}^N a_nb_n = \sum_{n=0}^N (s_n - s_{n-1})b_n = \sum_{n=0}^N s_nb_n - \sum_{n=0}^N s_{n-1}b_n$$

Changing the index of summation and using that $s_{-1} = 0$, we see that $\sum_{n=0}^N s_{n-1}b_n = \sum_{n=0}^{N-1} s_nb_{n+1}$. Putting this into the formula above, we get

$$\sum_{n=0}^N a_nb_n = \sum_{n=0}^N s_nb_n - \sum_{n=0}^{N-1} s_nb_{n+1} = s_Nb_N + \sum_{n=0}^{N-1} s_n(b_n - b_{n+1})$$

and the first part of the lemma is proved. The second follows by letting N → ∞, since $s_Nb_N \to 0$ when $\{s_N\}$ converges and $b_N \to 0$. □

We are now ready to prove:

Theorem 4.4.2 (Abel’s Theorem) The sum of a power series $f(x) = \sum_{n=0}^\infty c_n(x-a)^n$ is continuous in its entire interval of convergence. This means in particular that if R is the radius of convergence, and the power series converges at the right endpoint a + R, then $\lim_{x\uparrow a+R} f(x) = f(a+R)$, and if the power series converges at the left endpoint a − R, then $\lim_{x\downarrow a-R} f(x) = f(a-R)$.¹

Proof: We already know that f is continuous in the open interval (a − R, a + R), so we only need to check the endpoints. To keep the notation simple, we shall assume that a = 0 and concentrate on the right endpoint R. Thus we want to prove that $\lim_{x\uparrow R} f(x) = f(R)$.

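Note that $f(x) = \sum_{n=0}^\infty c_nR^n\big(\frac{x}{R}\big)^n$. If we assume that |x| < R, we may apply the second version of Abel’s summation formula with $a_n = c_nR^n$ and $b_n = \big(\frac{x}{R}\big)^n$ to get

$$f(x) = \sum_{n=0}^\infty f_n(R)\Big(\Big(\frac{x}{R}\Big)^n - \Big(\frac{x}{R}\Big)^{n+1}\Big) = \Big(1 - \frac{x}{R}\Big)\sum_{n=0}^\infty f_n(R)\Big(\frac{x}{R}\Big)^n$$

where $f_n(R) = \sum_{k=0}^n c_kR^k$. Summing a geometric series, we see that we also have

$$f(R) = \Big(1 - \frac{x}{R}\Big)\sum_{n=0}^\infty f(R)\Big(\frac{x}{R}\Big)^n$$

Hence

$$|f(x) - f(R)| = \Big|\Big(1 - \frac{x}{R}\Big)\sum_{n=0}^\infty (f_n(R) - f(R))\Big(\frac{x}{R}\Big)^n\Big|$$

¹I use $\lim_{x\uparrow b}$ and $\lim_{x\downarrow b}$ for one-sided limits, also denoted by $\lim_{x\to b^-}$ and $\lim_{x\to b^+}$.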


Given an $\varepsilon > 0$, we must find a $\delta > 0$ such that this quantity is less than $\varepsilon$ when $R - \delta < x < R$. This may seem obvious due to the factor $(1 - x/R)$, but the problem is that the infinite series may go to infinity when $x \to R$. Hence we need to control the tail of the series before we exploit the factor $(1 - x/R)$. Fortunately, this is not difficult: Since $f_n(R) \to f(R)$, we first pick an $N \in \mathbb{N}$ such that $|f_n(R) - f(R)| < \frac{\varepsilon}{2}$ for $n \ge N$. Then, for $0 < x < R$,

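$$\begin{aligned}
|f(x) - f(R)| &\le \left(1 - \frac{x}{R}\right)\sum_{n=0}^{N-1} |f_n(R) - f(R)|\left(\frac{x}{R}\right)^n + \left(1 - \frac{x}{R}\right)\sum_{n=N}^\infty |f_n(R) - f(R)|\left(\frac{x}{R}\right)^n \\
&\le \left(1 - \frac{x}{R}\right)\sum_{n=0}^{N-1} |f_n(R) - f(R)|\left(\frac{x}{R}\right)^n + \left(1 - \frac{x}{R}\right)\sum_{n=0}^\infty \frac{\varepsilon}{2}\left(\frac{x}{R}\right)^n \\
&= \left(1 - \frac{x}{R}\right)\sum_{n=0}^{N-1} |f_n(R) - f(R)|\left(\frac{x}{R}\right)^n + \frac{\varepsilon}{2}
\end{aligned}$$

where we have summed a geometric series. Now the sum is finite, and the first term clearly converges to 0 when $x \uparrow R$. Hence there is a $\delta > 0$ such that this term is less than $\frac{\varepsilon}{2}$ when $R - \delta < x < R$, and consequently $|f(x) - f(R)| < \varepsilon$ for such values of $x$. □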

Let us take a look at a famous example.

Example 1: Summing a geometric series, we clearly have

$$\frac{1}{1+x^2} = \sum_{n=0}^\infty (-1)^n x^{2n} \quad \text{for } |x| < 1$$

Integrating, we get

$$\arctan x = \sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1} \quad \text{for } |x| < 1$$

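Using the Alternating Series Test, we see that the series converges even for $x = 1$. By Abel’s Theorem

$$\frac{\pi}{4} = \arctan 1 = \lim_{x\uparrow 1} \arctan x = \lim_{x\uparrow 1}\sum_{n=0}^\infty (-1)^n \frac{x^{2n+1}}{2n+1} = \sum_{n=0}^\infty (-1)^n \frac{1}{2n+1}$$

Hence we have proved

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$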


This is often called Leibniz’ or Gregory’s formula for π, but it was actually first discovered by the Indian mathematician Madhava (ca. 1350 – ca. 1425). ♣
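Abel’s Theorem can be watched at work numerically. The sketch below (plain Python; the helper name `arctan_series` is ours) evaluates partial sums of the arctangent series at x = 1, where convergence is slow, and for x slightly below 1, where the terms decay geometrically and the sums agree with `math.atan`. This illustrates the result but of course proves nothing.

```python
import math

def arctan_series(x, terms):
    """Partial sum  sum_{n=0}^{terms-1} (-1)^n x^(2n+1) / (2n+1)."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

# At x = 1 the series converges slowly; the alternating series test bounds
# the error by the first omitted term, 1/(2*10000 + 1).
err = abs(arctan_series(1.0, 10_000) - math.pi / 4)
print(err < 1 / 20_001)  # True

# Just below x = 1 the sums match arctan(x), and arctan(x) -> pi/4 as x -> 1.
for x in (0.9, 0.99, 0.999):
    print(x, arctan_series(x, 20_000), math.atan(x))
print(math.pi / 4)
```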

This example is rather typical; the most interesting information is often obtained at an endpoint, and we need Abel’s Theorem to secure it.

It is natural to think that Abel’s Theorem must have a converse saying that if $\lim_{x\uparrow a+R}\sum_{n=0}^\infty c_n (x-a)^n$ exists, then the series converges at the right endpoint $x = a + R$. This, however, is not true, as the following simple example shows.

Example 2: Summing a geometric series, we have

$$\frac{1}{1+x} = \sum_{n=0}^\infty (-x)^n \quad \text{for } |x| < 1$$

Obviously, $\lim_{x\uparrow 1}\sum_{n=0}^\infty (-x)^n = \lim_{x\uparrow 1}\frac{1}{1+x} = \frac{1}{2}$, but the series does not converge for $x = 1$. ♣
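The failure of the converse in Example 2 is easy to see numerically. In the sketch below (helper names are ours), the partial sums at x = 1 oscillate between 1 and 0, while truncated sums for x just below 1 are close to 1/(1 + x), which tends to 1/2.

```python
def partial_sums_at_one(count):
    """First `count` partial sums of sum_{n>=0} (-1)^n, i.e. the series at x = 1."""
    sums, s = [], 0
    for n in range(count):
        s += (-1) ** n
        sums.append(s)
    return sums

def truncated_sum(x, terms=100_000):
    """sum_{n=0}^{terms-1} (-x)^n; for |x| < 1 this approximates 1/(1 + x)."""
    return sum((-x) ** n for n in range(terms))

print(partial_sums_at_one(6))                  # [1, 0, 1, 0, 1, 0]
for x in (0.9, 0.99, 0.999):
    print(x, truncated_sum(x), 1 / (1 + x))    # both columns approach 1/2
```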

It is possible to put extra conditions on the coefficients of the series to ensure convergence at the endpoint; see Exercise 2.


1. a) Explain why $\frac{1}{1+x} = \sum_{n=0}^\infty (-1)^n x^n$ for $|x| < 1$.

b) Show that $\ln(1+x) = \sum_{n=0}^\infty (-1)^n \frac{x^{n+1}}{n+1}$ for $|x| < 1$.

c) Show that $\ln 2 = \sum_{n=0}^\infty (-1)^n \frac{1}{n+1}$.

2. In this problem we shall prove the following partial converse of Abel’s Theorem:

Tauber’s Theorem Assume that $s(x) = \sum_{n=0}^\infty c_n x^n$ is a power series with radius of convergence 1. Assume that $s = \lim_{x\uparrow 1}\sum_{n=0}^\infty c_n x^n$ is finite. If in addition $\lim_{n\to\infty} n c_n = 0$, then the power series converges for $x = 1$ and $s = s(1)$.

a) Explain why, if we can prove that the power series converges for $x = 1$, the rest of the theorem will follow from Abel’s Theorem.

b) Show that $\lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^N n|c_n| = 0$.

c) Let $s_N = \sum_{n=0}^N c_n$. Explain that

$$s(x) - s_N = -\sum_{n=0}^N c_n (1 - x^n) + \sum_{n=N+1}^\infty c_n x^n$$

d) Show that $1 - x^n \le n(1 - x)$ for $|x| < 1$.


e) Let $N_x$ be the integer such that $N_x \le \frac{1}{1-x} < N_x + 1$. Show that

$$\sum_{n=0}^{N_x} c_n (1 - x^n) \le (1 - x)\sum_{n=0}^{N_x} n|c_n| \le \frac{1}{N_x}\sum_{n=0}^{N_x} n|c_n| \to 0$$

as $x \uparrow 1$.

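f) Show that

$$\left|\sum_{n=N_x+1}^\infty c_n x^n\right| \le \sum_{n=N_x+1}^\infty n|c_n|\frac{x^n}{n} = \frac{d_x}{N_x}\sum_{n=0}^\infty x^n$$

where $d_x \to 0$ as $x \uparrow 1$. Show that $\sum_{n=N_x+1}^\infty c_n x^n \to 0$ as $x \uparrow 1$.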

g) Prove Tauber’s theorem.

4.5 Normed spaces

In a later chapter we shall continue our study of how general functions can be expressed as series of simpler functions. This time the “simple functions” will be trigonometric functions and not power functions, and the series will be called Fourier series and not power series. Before we turn to Fourier series, we shall take a look at normed spaces and inner product spaces. Strictly speaking, it is not necessary to know about such spaces to study Fourier series, but a basic understanding will make it much easier to appreciate the basic ideas and put them into a wider framework.

In Fourier analysis, one studies vector spaces of functions, so let me begin by reminding you that a vector space is just a set where you can add elements and multiply them by numbers in a reasonable way. More precisely:

Definition 4.5.1 Let K be either R or C, and let V be a nonempty set. Assume that V is equipped with two operations:

• Addition which to any two elements u, v ∈ V assigns an element u + v ∈ V.

• Scalar multiplication which to any element u ∈ V and any number α ∈ K assigns an element αu ∈ V.

We call V a vector space over K if the following axioms are satisfied:

(i) u + v = v + u for all u,v ∈ V .

(ii) (u + v) + w = u + (v + w) for all u,v,w ∈ V .

(iii) There is a zero vector 0 ∈ V such that u + 0 = u for all u ∈ V .

(iv) For each u ∈ V , there is an element −u ∈ V such that u + (−u) = 0.


(v) α(u + v) = αu + αv for all u,v ∈ V and all α ∈ K.

(vi) (α + β)u = αu + βu for all u ∈ V and all α, β ∈ K.

(vii) α(βu) = (αβ)u for all u ∈ V and all α, β ∈ K.

(viii) 1u = u for all u ∈ V .

To make it easier to distinguish, we sometimes refer to elements in V as vectors and elements in K as scalars.

I’ll assume that you are familiar with the basic consequences of these axioms as presented in a course on linear algebra. Recall in particular that a nonempty subset U ⊆ V is a vector space (a subspace of V) if it is closed under addition and scalar multiplication, i.e. if u + v, αu ∈ U whenever u, v ∈ U and α ∈ K.

To measure the size of an element in a vector space, we introduce norms:

Definition 4.5.2 If V is a vector space over K, a norm on V is a function || · || : V → R such that:

(i) ||u|| ≥ 0 with equality if and only if u = 0.

(ii) ||αu|| = |α| ||u|| for all α ∈ K and all u ∈ V.

(iii) ||u + v|| ≤ ||u||+ ||v|| for all u,v ∈ V .

Example 1: The classical example of a norm on a real vector space is the euclidean norm on $\mathbb{R}^n$ given by

$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

where $x = (x_1, x_2, \dots, x_n)$. The corresponding norm on the complex vector space $\mathbb{C}^n$ is

$$\|z\| = \sqrt{|z_1|^2 + |z_2|^2 + \cdots + |z_n|^2}$$

where $z = (z_1, z_2, \dots, z_n)$. ♣
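A small numerical spot check of Example 1, not a proof: `norm` below computes the euclidean norm for real or complex entries (Python’s `abs` gives the modulus of a complex number), and the checks test properties (ii) and (iii) of Definition 4.5.2 on randomly generated vectors in C⁵. All names and data are illustrative.

```python
import math
import random

def norm(z):
    """Euclidean norm sqrt(|z_1|^2 + ... + |z_n|^2); works for real or complex entries."""
    return math.sqrt(sum(abs(c) ** 2 for c in z))

random.seed(0)

def random_vector(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

u, v = random_vector(5), random_vector(5)
alpha = complex(2, -3)

print(norm([3, 4]))                                                       # 5.0
print(math.isclose(norm([alpha * c for c in u]), abs(alpha) * norm(u)))  # True: (ii)
print(norm([a + b for a, b in zip(u, v)]) <= norm(u) + norm(v))           # True: (iii)
```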

The spaces above are the most common vector spaces and norms in linear algebra. More relevant for our purposes in this chapter are:

Example 2: Let (X, d) be a compact metric space, and let V = C(X, R) be the set of all continuous, real valued functions on X. Then V is a vector space over R and

$$\|f\| = \sup\{|f(x)| : x \in X\}$$

is a norm on V. To get a complex example, let V = C(X, C) and define the norm by the same formula as before. ♣
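For Example 2 the supremum usually has to be found by calculus, but a grid estimate makes a handy sanity check. The sketch below (a hypothetical helper, taking X = [0, 1]) samples |f| at equally spaced points; sampling can only under-estimate the true supremum.

```python
import math

def sup_norm_estimate(f, a=0.0, b=1.0, points=10_001):
    """Estimate ||f|| = sup{|f(x)| : x in [a, b]} from equally spaced samples.
    The result is a lower bound for the true supremum."""
    step = (b - a) / (points - 1)
    return max(abs(f(a + k * step)) for k in range(points))

f = lambda x: x * (1 - x)                # true supremum on [0, 1]: 1/4, at x = 1/2
g = lambda x: math.cos(2 * math.pi * x)  # true supremum: 1

print(sup_norm_estimate(f))              # approximately 0.25
print(sup_norm_estimate(g))              # 1.0 (attained at x = 0)
# Triangle inequality (iii) for the sampled values:
print(sup_norm_estimate(lambda x: f(x) + g(x))
      <= sup_norm_estimate(f) + sup_norm_estimate(g))  # True
```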

From a norm we can always get a metric in the following way:


Proposition 4.5.3 Assume that V is a vector space over K and that || · || is a norm on V. Then

d(u,v) = ||u− v||

is a metric on V .

Proof: We have to check the three properties of a metric:

Positivity: Since d(u,v) = ||u − v||, we see from part (i) of the definition above that d(u,v) ≥ 0 with equality if and only if u − v = 0, i.e. if and only if u = v.

Symmetry: Since

||u − v|| = ||(−1)(v − u)|| = |−1| ||v − u|| = ||v − u||

by part (ii) of the definition above, we see that d(u,v) = d(v,u).

Triangle inequality: By part (iii) of the definition above, we see that for all u, v, w ∈ V:

d(u,v) = ||u − v|| = ||(u − w) + (w − v)|| ≤ ||u − w|| + ||w − v|| = d(u,w) + d(w,v)

□

Whenever we refer to notions such as convergence, continuity, openness, closedness, completeness, compactness etc. in a normed vector space, we shall be referring to these notions with respect to the metric defined by the norm. In practice, this means that we continue as before, but write ||u − v|| instead of d(u,v) for the distance between the points u and v.

Remark: The inverse triangle inequality (recall Proposition 2.1.4)

|d(x, y)− d(x, z)| ≤ d(y, z) (4.5.1)

is a useful tool in metric spaces. In normed spaces, it is most conveniently expressed as

| ||u|| − ||v|| | ≤ ||u− v|| (4.5.2)

(use formula (4.5.1) with x = 0, y = u and z = v).

Note that if $\{u_n\}_{n=1}^\infty$ is a sequence of elements in a normed vector space, we define the infinite sum $\sum_{n=1}^\infty u_n$ as the limit of the partial sums $s_n = \sum_{k=1}^n u_k$ provided this limit exists; i.e.

$$\sum_{n=1}^\infty u_n = \lim_{n\to\infty}\sum_{k=1}^n u_k$$


When the limit exists, we say that the series converges.

Remark: The notation $u = \sum_{n=1}^\infty u_n$ is rather treacherous: it seems to be a purely algebraic relationship, but it does, in fact, depend on which norm we are using. If we have two different norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on the same space V, we may have $u = \sum_{n=1}^\infty u_n$ with respect to $\|\cdot\|_1$, but not with respect to $\|\cdot\|_2$, as $\|u - s_n\|_1 \to 0$ does not necessarily imply $\|u - s_n\|_2 \to 0$. This phenomenon is actually quite common, and we shall meet it on several occasions later in the book.
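Here is a concrete instance of the phenomenon, using two norms from this section on C([0, 1], R): the supremum norm of Example 2 and the integral norm ||f|| = ∫₀¹ |f(t)| dt of Exercise 5 below. The series with terms u₁(x) = x and uₙ(x) = xⁿ − xⁿ⁻¹ for n ≥ 2 has partial sums sₙ(x) = xⁿ, which tend to the zero function in the integral norm (||sₙ|| = 1/(n + 1)) but not in the supremum norm (||sₙ|| = 1). The sketch only approximates both norms numerically; the grid and quadrature sizes are arbitrary choices.

```python
def sup_norm(f, points=100_001):
    """Grid estimate of sup{|f(x)| : 0 <= x <= 1}."""
    return max(abs(f(k / (points - 1))) for k in range(points))

def integral_norm(f, points=100_000):
    """Midpoint-rule estimate of the integral of |f(t)| over [0, 1]."""
    h = 1.0 / points
    return h * sum(abs(f((k + 0.5) * h)) for k in range(points))

for n in (1, 10, 100):
    s_n = lambda x, n=n: x ** n            # n-th partial sum of the series
    print(n, integral_norm(s_n), 1 / (n + 1), sup_norm(s_n))
# The integral norm behaves like 1/(n+1) -> 0, while the sup norm stays 1.0.
```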

Recall from linear algebra that a vector space V is finite dimensional if there is a finite set $\{e_1, e_2, \dots, e_n\}$ of elements in V such that each element x ∈ V can be written as a linear combination $x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n$ in a unique way. We call $\{e_1, e_2, \dots, e_n\}$ a basis for V, and say that V has dimension n. A space that is not finite dimensional is called infinite dimensional. Most of the spaces we shall be working with are infinite dimensional, and we shall now extend the notion of basis to (some) such spaces.

Definition 4.5.4 Let $\{e_n\}_{n=1}^\infty$ be a sequence of elements in a normed vector space V. We say that $\{e_n\}$ is a basis² for V if for each x ∈ V there is a unique sequence $\{\alpha_n\}_{n=1}^\infty$ from K such that

$$x = \sum_{n=1}^\infty \alpha_n e_n$$

Not all normed spaces have a basis; there are, e.g., spaces so big that not all elements can be reached from a countable set of basis elements. Let us take a look at an infinite dimensional space with a basis.

Example 3: Let $c_0$ be the set of all sequences $x = \{x_n\}_{n\in\mathbb{N}}$ of real numbers such that $\lim_{n\to\infty} x_n = 0$. It is not hard to check that $c_0$ is a vector space and that

$$\|x\| = \sup\{|x_n| : n \in \mathbb{N}\}$$

is a norm on $c_0$. Let $e_n = (0, 0, \dots, 0, 1, 0, \dots)$ be the sequence that is 1 on element number n and 0 elsewhere. Then $\{e_n\}_{n\in\mathbb{N}}$ is a basis for $c_0$ with $x = \sum_{n=1}^\infty x_n e_n$. ♣
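The basis property in Example 3 comes down to the identity ||x − Σ_{n=1}^N xₙeₙ|| = sup{|xₙ| : n > N}, which tends to 0 precisely because xₙ → 0. The sketch below (a hypothetical helper that represents a sequence as a Python function of the index n) shows this for xₙ = 1/n, and contrasts it with the constant sequence (1, 1, 1, …), which is not in c₀ and whose truncations never get closer than distance 1.

```python
def truncation_error(x, N, horizon=10_000):
    """Estimate ||x - (x_1 e_1 + ... + x_N e_N)|| = sup{|x_n| : n > N}.
    Only indices N < n <= horizon are inspected (an assumption made to keep
    the loop finite), so in general this is a lower bound for the supremum."""
    return max(abs(x(n)) for n in range(N + 1, horizon + 1))

x = lambda n: 1 / n        # an element of c0
for N in (1, 10, 100, 1000):
    print(N, truncation_error(x, N))    # equals 1/(N + 1), which tends to 0

ones = lambda n: 1.0       # the constant sequence (1, 1, 1, ...) is not in c0
print(truncation_error(ones, 1000))     # 1.0 for every N: no convergence
```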

If a normed vector space is complete, we call it a Banach space. The next theorem provides an efficient method for checking that a normed space is complete. We say that a series $\sum_{n=1}^\infty u_n$ in V converges absolutely if $\sum_{n=1}^\infty \|u_n\|$ converges (note that $\sum_{n=1}^\infty \|u_n\|$ is a series of nonnegative numbers).

²Strictly speaking, there are two notions of basis for an infinite dimensional space. The type we are introducing here is sometimes called a Schauder basis and only works in normed spaces where we can give meaning to infinite sums. There is another kind of basis called a Hamel basis which does not require the space to be normed, but which is less practical for applications.

Proposition 4.5.5 A normed vector space V is complete if and only if every absolutely convergent series converges.

Proof: Assume first that V is complete and that the series $\sum_{n=0}^\infty u_n$ converges absolutely. We must show that the series converges in the ordinary sense. Let $S_n = \sum_{k=0}^n \|u_k\|$ and $s_n = \sum_{k=0}^n u_k$ be the partial sums of the two series. Since the series converges absolutely, the sequence $\{S_n\}$ is a Cauchy sequence, and given an ε > 0, there must be an N ∈ N such that $|S_n - S_m| < \varepsilon$ when n, m ≥ N. Without loss of generality, we may assume that m > n. By the triangle inequality

$$\|s_m - s_n\| = \left\|\sum_{k=n+1}^m u_k\right\| \le \sum_{k=n+1}^m \|u_k\| = |S_m - S_n| < \varepsilon$$

when n, m ≥ N, and hence $\{s_n\}$ is a Cauchy sequence. Since V is complete, the series $\sum_{n=0}^\infty u_n$ converges.

For the converse, assume that all absolutely convergent series converge, and let $\{x_n\}$ be a Cauchy sequence. We must show that $\{x_n\}$ converges. Since $\{x_n\}$ is a Cauchy sequence, we can find an increasing sequence $\{n_i\}$ in N such that $\|x_n - x_m\| < \frac{1}{2^i}$ for all $n, m \ge n_i$. In particular $\|x_{n_{i+1}} - x_{n_i}\| < \frac{1}{2^i}$, and clearly $\sum_{i=1}^\infty \|x_{n_{i+1}} - x_{n_i}\|$ converges. This means that the series $\sum_{i=1}^\infty (x_{n_{i+1}} - x_{n_i})$ converges absolutely, and by assumption it converges in the ordinary sense to some element s ∈ V. The partial sums of this series are

$$s_N = \sum_{i=1}^N (x_{n_{i+1}} - x_{n_i}) = x_{n_{N+1}} - x_{n_1}$$

(the sum is “telescoping” and almost all terms cancel), and as they converge to s, we see that $x_{n_{N+1}}$ must converge to $s + x_{n_1}$. This means that a subsequence of the Cauchy sequence $\{x_n\}$ converges, and thus the sequence itself converges according to Lemma 2.5.5. □


1. Check that the norms in Example 1 really are norms (i.e. that they satisfy the conditions in Definition 4.5.2).

2. Check that the norms in Example 2 really are norms (i.e. that they satisfy the conditions in Definition 4.5.2).

3. Let V be a normed vector space over K. Assume that $\{u_n\}$, $\{v_n\}$ are sequences in V converging to u and v, respectively, and that $\{\alpha_n\}$, $\{\beta_n\}$ are sequences in K converging to α and β, respectively.


a) Show that $\{u_n + v_n\}$ converges to u + v.

b) Show that $\{\alpha_n u_n\}$ converges to αu.

c) Show that $\{\alpha_n u_n + \beta_n v_n\}$ converges to αu + βv.

4. Let V be a normed vector space over K.

a) Prove the inverse triangle inequality | ||u|| − ||v|| | ≤ ||u − v|| for all u, v ∈ V.

b) Assume that $\{u_n\}$ is a sequence in V converging to u. Show that $\{\|u_n\|\}$ converges to ||u||.

5. Show that

$$\|f\| = \int_0^1 |f(t)|\,dt$$

is a norm on C([0, 1], R).

6. Prove that the set $\{e_n\}_{n\in\mathbb{N}}$ in Example 3 really is a basis for $c_0$.

7. Let V ≠ {0} be a vector space, and let d be the discrete metric on V. Show that d is not generated by a norm (i.e. there is no norm on V such that d(x,y) = ||x − y||).

8. Let V ≠ {0} be a normed vector space. Show that V is complete if and only if the unit sphere S = {x ∈ V : ||x|| = 1} is complete.

9. Show that if a normed vector space V has a basis (as defined in Definition 4.5.4), then it is separable (i.e. it has a countable, dense subset).

10. $l^1$ is the set of all sequences $x = \{x_n\}_{n\in\mathbb{N}}$ of real numbers such that $\sum_{n=1}^\infty |x_n|$ converges.

a) Show that

$$\|x\| = \sum_{n=1}^\infty |x_n|$$

is a norm on $l^1$.

b) Show that the set $\{e_n\}_{n\in\mathbb{N}}$ in Example 3 is a basis for $l^1$.

c) Show that $l^1$ is complete.

4.6 Inner product spaces

The usual (euclidean) norm in $\mathbb{R}^n$ can be defined in terms of the scalar (dot) product:

$$\|x\| = \sqrt{x \cdot x}$$

This relationship is extremely important as it connects length (defined by the norm) and orthogonality (defined by the scalar product), and it is the key to many generalizations of geometric arguments from $\mathbb{R}^2$ and $\mathbb{R}^3$ to $\mathbb{R}^n$. In this section we shall see how we can extend this generalization to certain infinite dimensional spaces called inner product spaces.

The basic observation is that some norms on infinite dimensional spaces can be defined in terms of an inner product just as the euclidean norm is defined in terms of the scalar product. Let us begin by taking a look at such products. As in the previous section, we assume that all vector spaces are over K which is either R or C. As we shall be using complex spaces in our study of Fourier series, it is important that you don’t neglect the complex case.

Definition 4.6.1 An inner product 〈·, ·〉 on a vector space V over K is a function 〈·, ·〉 : V × V → K such that:

(i) $\langle u, v\rangle = \overline{\langle v, u\rangle}$ for all u, v ∈ V (the bar denotes complex conjugation; if the vector space is real, we just have 〈u,v〉 = 〈v,u〉).

(ii) 〈u + v,w〉 = 〈u,w〉+ 〈v,w〉 for all u,v,w ∈ V .

(iii) 〈αu,v〉 = α〈u,v〉 for all α ∈ K, u,v ∈ V .

(iv) For all u ∈ V, 〈u,u〉 ≥ 0 with equality if and only if u = 0 (by (i), 〈u,u〉 is always a real number).³

As immediate consequences of (i)-(iv), we have

(v) 〈u,v + w〉 = 〈u,v〉+ 〈u,w〉 for all u,v,w ∈ V .

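(vi) $\langle u, \alpha v\rangle = \overline{\alpha}\langle u, v\rangle$ for all α ∈ K, u, v ∈ V (note the complex conjugate!).

(vii) $\langle \alpha u, \alpha v\rangle = |\alpha|^2\langle u, v\rangle$ (combine (iii) and (vi) and recall that for complex numbers $|\alpha|^2 = \alpha\overline{\alpha}$).

Example 1: The classical examples are the dot products in $\mathbb{R}^n$ and $\mathbb{C}^n$. If $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ are two real vectors, we define

$$\langle x, y\rangle = x \cdot y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

If $z = (z_1, z_2, \dots, z_n)$ and $w = (w_1, w_2, \dots, w_n)$ are two complex vectors, we define

$$\langle z, w\rangle = z \cdot w = z_1\overline{w_1} + z_2\overline{w_2} + \cdots + z_n\overline{w_n}$$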

Before we look at the next example, we need to extend integration to complex valued functions. If a, b ∈ R, a < b, and f, g : [a, b] → R are continuous functions, we get a complex valued function h : [a, b] → C by letting

$$h(t) = f(t) + i\,g(t)$$

We define the integral of h in the natural way:

$$\int_a^b h(t)\,dt = \int_a^b f(t)\,dt + i\int_a^b g(t)\,dt$$

i.e., we integrate the real and imaginary parts separately.

³Strictly speaking, we are defining positive definite inner products, but they are the only inner products we have use for.

Example 2: Again we look at the real and complex case separately. For the real case, let V be the set of all continuous functions f : [a, b] → R, and define the inner product by

$$\langle f, g\rangle = \int_a^b f(t)g(t)\,dt$$

For the complex case, let V be the set of all continuous, complex valued functions h : [a, b] → C as described above, and define

$$\langle h, k\rangle = \int_a^b h(t)\overline{k(t)}\,dt$$

Then 〈·, ·〉 is an inner product on V. Note that these inner products may be thought of as natural extensions of the products in Example 1; we have just replaced discrete sums by continuous ones (integrals).
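The inner products of Example 2 can be approximated by any quadrature rule. The sketch below uses the midpoint rule on [a, b] = [0, 1] (our choice) for the complex case; note the complex conjugate on the second factor. As test functions we take eₙ(t) = e^{2πint}, an illustrative choice that anticipates Fourier series: numerically, 〈e₁, e₁〉 ≈ 1 and 〈e₁, e₂〉 ≈ 0. The helper names are hypothetical.

```python
import cmath
import math

def inner(h, k, a=0.0, b=1.0, points=20_000):
    """Midpoint-rule approximation of <h, k> = integral_a^b h(t) * conj(k(t)) dt."""
    step = (b - a) / points
    total = 0j
    for j in range(points):
        t = a + (j + 0.5) * step
        total += h(t) * complex(k(t)).conjugate()
    return total * step

def e(n):
    """The function t -> exp(2 pi i n t) (illustrative choice)."""
    return lambda t: cmath.exp(2j * math.pi * n * t)

print(inner(e(1), e(1)))                       # approximately 1
print(abs(inner(e(1), e(2))))                  # approximately 0
print(inner(lambda t: t, lambda t: 1.0).real)  # approximately 1/2 (real case)
```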

Given an inner product 〈·, ·〉, we define || · || : V → [0,∞) by

||u|| =√〈u,u〉

in analogy with the norm and the dot product in $\mathbb{R}^n$ and $\mathbb{C}^n$. For simplicity, we shall refer to || · || as a norm, although at this stage it is not at all clear that it is a norm in the sense of Definition 4.5.2.

On our way to proving that || · || really is a norm, we shall pick up a few results of a geometric nature that will be useful later. We begin by defining two vectors u, v ∈ V to be orthogonal if 〈u,v〉 = 0. Note that if this is the case, we also have 〈v,u〉 = 0 since $\langle v, u\rangle = \overline{\langle u, v\rangle} = \overline{0} = 0$.

With these definitions, we can prove the following generalization of the Pythagorean theorem:

Proposition 4.6.2 (Pythagorean Theorem) For all pairwise orthogonal $u_1, u_2, \dots, u_n$ in V,

$$\|u_1 + u_2 + \cdots + u_n\|^2 = \|u_1\|^2 + \|u_2\|^2 + \cdots + \|u_n\|^2$$

Proof: We have

$$\|u_1 + \cdots + u_n\|^2 = \langle u_1 + \cdots + u_n, u_1 + \cdots + u_n\rangle = \sum_{1\le i,j\le n}\langle u_i, u_j\rangle = \|u_1\|^2 + \|u_2\|^2 + \cdots + \|u_n\|^2$$

where we have used that by orthogonality, $\langle u_i, u_j\rangle = 0$ whenever $i \ne j$. □

Two nonzero vectors u, v are said to be parallel if there is a number α ∈ K such that u = αv. As in $\mathbb{R}^n$, the projection of u on v is the vector p parallel with v such that u − p is orthogonal to v. Figure 1 shows the idea.

[Figure 1: The projection p of u on v, showing the vectors u, v, p and u − p]

Proposition 4.6.3 Assume that u and v are two nonzero elements of V. Then the projection p of u on v is given by:

$$p = \frac{\langle u, v\rangle}{\|v\|^2}\,v$$

The norm of the projection is $\|p\| = \frac{|\langle u, v\rangle|}{\|v\|}$.

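Proof: Since p is parallel to v, it must be of the form p = αv. To determine α, we note that in order for u − p to be orthogonal to v, we must have 〈u − p, v〉 = 0. Hence α is determined by the equation

$$0 = \langle u - \alpha v, v\rangle = \langle u, v\rangle - \langle \alpha v, v\rangle = \langle u, v\rangle - \alpha\|v\|^2$$

Solving for α, we get $\alpha = \frac{\langle u, v\rangle}{\|v\|^2}$, and hence $p = \frac{\langle u, v\rangle}{\|v\|^2}v$.

To calculate the norm, note that

$$\|p\|^2 = \langle p, p\rangle = \langle \alpha v, \alpha v\rangle = |\alpha|^2\langle v, v\rangle = \frac{|\langle u, v\rangle|^2}{\|v\|^4}\langle v, v\rangle = \frac{|\langle u, v\rangle|^2}{\|v\|^2}$$

(recall property (vii) just after Definition 4.6.1). □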
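Proposition 4.6.3 translates directly into code for vectors in Cⁿ with the dot product of Example 1. In the sketch below (helper names are ours), the coefficient 〈u, v〉/||v||² is computed with the conjugate on the second argument, and the check confirms that u − p is orthogonal to v and that ||p|| = |〈u, v〉|/||v||, up to rounding. The vectors are arbitrary test data.

```python
def inner(u, v):
    """Dot product on C^n: sum of u_i * conj(v_i)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))

def project(u, v):
    """Projection of u on the nonzero vector v: (<u, v> / ||v||^2) v."""
    alpha = inner(u, v) / inner(v, v).real
    return [alpha * c for c in v]

u = [1 + 2j, 3 - 1j, 0.5]
v = [2 - 1j, 1j, 1]
p = project(u, v)
residual = [a - b for a, b in zip(u, p)]

print(abs(inner(residual, v)))                     # approximately 0
print(abs(inner(p, p)) ** 0.5,                     # ||p|| ...
      abs(inner(u, v)) / inner(v, v).real ** 0.5)  # ... equals |<u, v>| / ||v||
```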

We can now extend Cauchy-Schwarz’ inequality to general inner products:

Proposition 4.6.4 (Cauchy-Schwarz’ inequality) For all u, v ∈ V,

|〈u,v〉| ≤ ||u|| ||v||

with equality if and only if u and v are parallel or at least one of them is zero.

Proof: The proposition clearly holds with equality if one of the vectors is zero. If they are both nonzero, we let p be the projection of u on v, and note that by the Pythagorean theorem

$$\|u\|^2 = \|u - p\|^2 + \|p\|^2 \ge \|p\|^2$$

with equality if and only if u = p, i.e. when u and v are parallel. Since $\|p\| = \frac{|\langle u, v\rangle|}{\|v\|}$ by Proposition 4.6.3, we have

$$\|u\|^2 \ge \frac{|\langle u, v\rangle|^2}{\|v\|^2}$$

and the proposition follows. □

We may now prove:

Proposition 4.6.5 (Triangle inequality for inner products) For all u,v ∈ V

||u + v|| ≤ ||u||+ ||v||

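Proof: We have (recall that Re(z) refers to the real part a of a complex number z = a + ib):

$$\begin{aligned}
\|u + v\|^2 &= \langle u + v, u + v\rangle = \langle u, u\rangle + \langle u, v\rangle + \langle v, u\rangle + \langle v, v\rangle \\
&= \langle u, u\rangle + \langle u, v\rangle + \overline{\langle u, v\rangle} + \langle v, v\rangle = \langle u, u\rangle + 2\,\mathrm{Re}(\langle u, v\rangle) + \langle v, v\rangle \\
&\le \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2
\end{aligned}$$

where we have used that according to Cauchy-Schwarz’ inequality, we have $\mathrm{Re}(\langle u, v\rangle) \le |\langle u, v\rangle| \le \|u\|\|v\|$. □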
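Cauchy-Schwarz’ inequality and the triangle inequality can be spot-checked on random complex vectors with the dot product of Example 1. This is only a numerical illustration (the small tolerance 1e-12 absorbs rounding error, and all names are ours); the final check shows that equality holds when v is a multiple of u, as Proposition 4.6.4 predicts.

```python
import math
import random

def inner(u, v):
    """Dot product on C^n: sum of u_i * conj(v_i)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

random.seed(1)

def random_vector(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

ok = True
for _ in range(1000):
    u, v = random_vector(4), random_vector(4)
    w = [a + b for a, b in zip(u, v)]
    ok = ok and abs(inner(u, v)) <= norm(u) * norm(v) + 1e-12   # Cauchy-Schwarz
    ok = ok and norm(w) <= norm(u) + norm(v) + 1e-12            # triangle inequality
print(ok)  # True

# Equality case of Cauchy-Schwarz: v is parallel to u.
u = random_vector(4)
v = [(2 - 1j) * c for c in u]
print(math.isclose(abs(inner(u, v)), norm(u) * norm(v)))  # True
```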

We are now ready to prove that || · || really is a norm:

Proposition 4.6.6 If 〈·, ·〉 is an inner product on a vector space V, then

$$\|u\| = \sqrt{\langle u, u\rangle}$$

defines a norm on V, i.e.

(i) ||u|| ≥ 0 with equality if and only if u = 0.

(ii) ||αu|| = |α| ||u|| for all α ∈ K and all u ∈ V.

(iii) ||u + v|| ≤ ||u|| + ||v|| for all u, v ∈ V.

Proof: (i) follows directly from the definition of inner products, and (iii) is just the triangle inequality. We have actually proved (ii) on our way to Cauchy-Schwarz’ inequality, but let us repeat the proof here:

$$\|\alpha u\|^2 = \langle \alpha u, \alpha u\rangle = |\alpha|^2\|u\|^2$$

where we have used property (vii) just after Definition 4.6.1. □

The proposition above means that we can think of an inner product space as a metric space with metric defined by

d(x,y) = ||x− y|| =√〈x− y,x− y〉

Example 3: Returning to Example 2, we see that the metric in the real as well as the complex case is given by

$$d(f, g) = \left(\int_a^b |f(t) - g(t)|^2\,dt\right)^{1/2}$$

The next proposition tells us that we can move limits and infinite sums in and out of inner products.

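Proposition 4.6.7 Let V be an inner product space.

(i) If $\{u_n\}$ is a sequence in V converging to u, then the sequence $\{\|u_n\|\}$ of norms converges to ||u||.

(ii) If the series $\sum_{n=0}^\infty w_n$ converges in V, then

$$\left\|\sum_{n=0}^\infty w_n\right\| = \lim_{N\to\infty}\left\|\sum_{n=0}^N w_n\right\|$$

(iii) If $\{u_n\}$ is a sequence in V converging to u, then the sequence $\{\langle u_n, v\rangle\}$ of inner products converges to 〈u,v〉 for all v ∈ V. In symbols, $\lim_{n\to\infty}\langle u_n, v\rangle = \langle \lim_{n\to\infty} u_n, v\rangle$ for all v ∈ V.

(iv) If the series $\sum_{n=1}^\infty w_n$ converges in V, then

$$\left\langle \sum_{n=1}^\infty w_n, v\right\rangle = \sum_{n=1}^\infty \langle w_n, v\rangle$$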


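Proof: (i) follows directly from the inverse triangle inequality

| ||u|| − ||u_n|| | ≤ ||u − u_n||

(ii) follows immediately from (i) if we let $u_n = \sum_{k=0}^n w_k$.

(iii) Assume that $u_n \to u$. To show that $\langle u_n, v\rangle \to \langle u, v\rangle$, it suffices to prove that $\langle u_n, v\rangle - \langle u, v\rangle = \langle u_n - u, v\rangle \to 0$. But by Cauchy-Schwarz’ inequality

$$|\langle u_n - u, v\rangle| \le \|u_n - u\|\,\|v\| \to 0$$

since $\|u_n - u\| \to 0$ by assumption.

(iv) We use (iii) with $u = \sum_{n=1}^\infty w_n$ and $u_n = \sum_{k=1}^n w_k$. Then

$$\left\langle\sum_{n=1}^\infty w_n, v\right\rangle = \langle u, v\rangle = \lim_{n\to\infty}\langle u_n, v\rangle = \lim_{n\to\infty}\left\langle\sum_{k=1}^n w_k, v\right\rangle = \lim_{n\to\infty}\sum_{k=1}^n\langle w_k, v\rangle = \sum_{n=1}^\infty\langle w_n, v\rangle$$

□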

We shall now generalize some notions from linear algebra to our new setting. If $\{u_1, u_2, \dots, u_n\}$ is a finite set of elements in V, we define the span

$$\mathrm{Sp}\{u_1, u_2, \dots, u_n\}$$

of $u_1, u_2, \dots, u_n$ to be the set of all linear combinations

$$\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n, \quad \text{where } \alpha_1, \alpha_2, \dots, \alpha_n \in \mathbb{K}$$

A set A ⊆ V is said to be orthonormal if it consists of orthogonal elements of length one, i.e. if for all a, b ∈ A, we have

$$\langle a, b\rangle = \begin{cases} 0 & \text{if } a \ne b \\ 1 & \text{if } a = b \end{cases}$$

If $\{e_1, e_2, \dots, e_n\}$ is an orthonormal set and u ∈ V, we define the projection of u on $\mathrm{Sp}\{e_1, e_2, \dots, e_n\}$ by

$$P_{e_1, e_2, \dots, e_n}(u) = \langle u, e_1\rangle e_1 + \langle u, e_2\rangle e_2 + \cdots + \langle u, e_n\rangle e_n$$

This terminology is justified by the following result.

Proposition 4.6.8 Let $\{e_1, e_2, \dots, e_n\}$ be an orthonormal set in V. For every u ∈ V, the projection $P_{e_1, e_2, \dots, e_n}(u)$ is the element in $\mathrm{Sp}\{e_1, e_2, \dots, e_n\}$ closest to u. Moreover, $u - P_{e_1, e_2, \dots, e_n}(u)$ is orthogonal to all elements in $\mathrm{Sp}\{e_1, e_2, \dots, e_n\}$.


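Proof: We first prove the orthogonality. It suffices to prove that

$$\langle u - P_{e_1, e_2, \dots, e_n}(u), e_i\rangle = 0 \qquad (4.6.1)$$

for each i = 1, 2, ..., n, as we then have

$$\begin{aligned}
&\langle u - P_{e_1, e_2, \dots, e_n}(u), \alpha_1 e_1 + \cdots + \alpha_n e_n\rangle \\
&\quad = \overline{\alpha_1}\langle u - P_{e_1, e_2, \dots, e_n}(u), e_1\rangle + \cdots + \overline{\alpha_n}\langle u - P_{e_1, e_2, \dots, e_n}(u), e_n\rangle = 0
\end{aligned}$$

for all $\alpha_1 e_1 + \cdots + \alpha_n e_n \in \mathrm{Sp}\{e_1, e_2, \dots, e_n\}$. To prove formula (4.6.1), just observe that for each $e_i$

$$\begin{aligned}
\langle u - P_{e_1, e_2, \dots, e_n}(u), e_i\rangle &= \langle u, e_i\rangle - \langle P_{e_1, e_2, \dots, e_n}(u), e_i\rangle \\
&= \langle u, e_i\rangle - \big(\langle u, e_1\rangle\langle e_1, e_i\rangle + \langle u, e_2\rangle\langle e_2, e_i\rangle + \cdots + \langle u, e_n\rangle\langle e_n, e_i\rangle\big) \\
&= \langle u, e_i\rangle - \langle u, e_i\rangle = 0
\end{aligned}$$

To prove that the projection is the element in $\mathrm{Sp}\{e_1, e_2, \dots, e_n\}$ closest to u, let $w = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n$ be another element in $\mathrm{Sp}\{e_1, e_2, \dots, e_n\}$. Then $P_{e_1, e_2, \dots, e_n}(u) - w$ is in $\mathrm{Sp}\{e_1, e_2, \dots, e_n\}$, and hence orthogonal to $u - P_{e_1, e_2, \dots, e_n}(u)$ by what we have just proved. By the Pythagorean theorem

$$\|u - w\|^2 = \|u - P_{e_1, e_2, \dots, e_n}(u)\|^2 + \|P_{e_1, e_2, \dots, e_n}(u) - w\|^2 > \|u - P_{e_1, e_2, \dots, e_n}(u)\|^2$$

□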
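Proposition 4.6.8 can be illustrated in R³ with an orthonormal set chosen for the purpose: e₁ = (1, 1, 0)/√2 and e₂ = (0, 0, 1). For u = (3, 1, 2) the projection is P(u) = (4/√2)e₁ + 2e₂ = (2, 2, 2), at distance √2 from u. The sketch (helper names are ours) computes P(u), confirms that u − P(u) is orthogonal to e₁ and e₂, and checks that none of 10,000 randomly chosen elements of the span is closer to u.

```python
import math
import random

def inner(u, v):
    """Real dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def projection(u, basis):
    """P(u) = <u, e_1> e_1 + ... + <u, e_n> e_n for an orthonormal list `basis`."""
    p = [0.0] * len(u)
    for e in basis:
        c = inner(u, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

r = 1 / math.sqrt(2)
basis = [[r, r, 0.0], [0.0, 0.0, 1.0]]   # an orthonormal set in R^3
u = [3.0, 1.0, 2.0]
p = projection(u, basis)                 # approximately [2, 2, 2]
best = dist(u, p)                        # approximately sqrt(2)

# u - P(u) is orthogonal to each e_i ...
print([abs(inner([a - b for a, b in zip(u, p)], e)) < 1e-12 for e in basis])
# ... and no randomly chosen element of the span is closer to u.
random.seed(2)
closer = 0
for _ in range(10_000):
    a1, a2 = random.uniform(-5, 5), random.uniform(-5, 5)
    w = [a1 * x + a2 * y for x, y in zip(*basis)]   # w = a1 e_1 + a2 e_2
    closer += dist(u, w) < best - 1e-12
print(p, best, closer)                   # closer is 0
```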

As an immediate consequence of the proposition above, we get:

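Corollary 4.6.9 (Bessel’s inequality) Let $e_1, e_2, \dots, e_n, \dots$ be an orthonormal sequence in V. For any u ∈ V,

$$\sum_{i=1}^\infty |\langle u, e_i\rangle|^2 \le \|u\|^2$$

Proof: Since $u - P_{e_1, e_2, \dots, e_n}(u)$ is orthogonal to $P_{e_1, e_2, \dots, e_n}(u)$, we get by the Pythagorean theorem that for any n

$$\|u\|^2 = \|u - P_{e_1, e_2, \dots, e_n}(u)\|^2 + \|P_{e_1, e_2, \dots, e_n}(u)\|^2 \ge \|P_{e_1, e_2, \dots, e_n}(u)\|^2$$

Using the Pythagorean Theorem again, we see that

$$\begin{aligned}
\|P_{e_1, e_2, \dots, e_n}(u)\|^2 &= \|\langle u, e_1\rangle e_1 + \langle u, e_2\rangle e_2 + \cdots + \langle u, e_n\rangle e_n\|^2 \\
&= \|\langle u, e_1\rangle e_1\|^2 + \|\langle u, e_2\rangle e_2\|^2 + \cdots + \|\langle u, e_n\rangle e_n\|^2 \\
&= |\langle u, e_1\rangle|^2 + |\langle u, e_2\rangle|^2 + \cdots + |\langle u, e_n\rangle|^2
\end{aligned}$$

and hence

$$\|u\|^2 \ge |\langle u, e_1\rangle|^2 + |\langle u, e_2\rangle|^2 + \cdots + |\langle u, e_n\rangle|^2$$

for all n. Letting $n \to \infty$, the corollary follows. □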
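Bessel’s inequality can be observed on a concrete example, chosen here only for illustration: on C([0, 1], R) with the inner product of Example 2, the functions eₙ(t) = √2 sin(nπt) form an orthonormal sequence, and for u(t) = t one computes ||u||² = 1/3 and 〈u, eₙ〉 = √2(−1)ⁿ⁺¹/(nπ) by integration by parts. The sketch checks one coefficient with the midpoint rule and then verifies that the partial sums of |〈u, eₙ〉|² = 2/(nπ)² never exceed 1/3. (They in fact tend to 1/3, since Σ 1/n² = π²/6.)

```python
import math

def inner(f, g, points=20_000):
    """Midpoint-rule approximation of <f, g> = integral_0^1 f(t) g(t) dt."""
    h = 1.0 / points
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(points))

def e(n):
    """e_n(t) = sqrt(2) sin(n pi t), an orthonormal sequence on [0, 1]."""
    return lambda t: math.sqrt(2) * math.sin(n * math.pi * t)

u = lambda t: t
norm_sq = 1 / 3          # ||u||^2 = integral of t^2 over [0, 1]

# Numerical coefficient versus the closed form sqrt(2) (-1)^(n+1) / (n pi), n = 3
print(inner(u, e(3)), math.sqrt(2) / (3 * math.pi))

# Bessel: every partial sum of |<u, e_n>|^2 = 2 / (n pi)^2 stays below ||u||^2
partial = 0.0
bessel_holds = True
for n in range(1, 1001):
    partial += 2 / (n * math.pi) ** 2
    bessel_holds = bessel_holds and partial <= norm_sq
print(bessel_holds, partial, norm_sq)   # True, then two numbers close to 1/3
```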

We have now reached the main result of this section. Recall from Definition 4.5.4 that $\{e_i\}$ is a basis for V if any element u in V can be written as a linear combination $u = \sum_{i=1}^\infty \alpha_i e_i$ in a unique way. The theorem tells us that if the basis is orthonormal, the coefficients $\alpha_i$ are easy to find; they are simply given by $\alpha_i = \langle u, e_i\rangle$.

Theorem 4.6.10 (Parseval’s Theorem) If $\{e_1, e_2, \dots, e_n, \dots\}$ is an orthonormal basis for V, then for all u ∈ V, we have $u = \sum_{i=1}^\infty \langle u, e_i\rangle e_i$ and $\|u\|^2 = \sum_{i=1}^\infty |\langle u, e_i\rangle|^2$.

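Proof: Since $\{e_1, e_2, \dots, e_n, \dots\}$ is a basis, we know that there is a unique sequence $\alpha_1, \alpha_2, \dots, \alpha_n, \dots$ from K such that $u = \sum_{n=1}^\infty \alpha_n e_n$. This means that $\|u - \sum_{n=1}^N \alpha_n e_n\| \to 0$ as $N \to \infty$. Since the projection $P_{e_1, e_2, \dots, e_N}(u) = \sum_{n=1}^N \langle u, e_n\rangle e_n$ is the element in $\mathrm{Sp}\{e_1, e_2, \dots, e_N\}$ closest to u, we have

$$\left\|u - \sum_{n=1}^N \langle u, e_n\rangle e_n\right\| \le \left\|u - \sum_{n=1}^N \alpha_n e_n\right\| \to 0 \quad \text{as } N \to \infty$$

and hence $u = \sum_{n=1}^\infty \langle u, e_n\rangle e_n$. To prove the second part, observe that since $u = \sum_{n=1}^\infty \langle u, e_n\rangle e_n = \lim_{N\to\infty}\sum_{n=1}^N \langle u, e_n\rangle e_n$, we have (recall Proposition 4.6.7(ii) and the Pythagorean theorem)

$$\|u\|^2 = \lim_{N\to\infty}\left\|\sum_{n=1}^N \langle u, e_n\rangle e_n\right\|^2 = \lim_{N\to\infty}\sum_{n=1}^N |\langle u, e_n\rangle|^2 = \sum_{n=1}^\infty |\langle u, e_n\rangle|^2$$

□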

The coefficients $\langle u, e_n\rangle$ in the arguments above are often called (abstract) Fourier coefficients. By Parseval’s theorem, they are square summable in the sense that $\sum_{n=1}^\infty |\langle u, e_n\rangle|^2 < \infty$. A natural question is whether we can reverse this procedure: Given a square summable sequence $\{\alpha_n\}$ of elements in K, does there exist an element u in V with Fourier coefficients $\alpha_n$, i.e. such that $\langle u, e_n\rangle = \alpha_n$ for all n? The answer is affirmative provided V is complete.

Proposition 4.6.11 Let V be a complete inner product space over K with an orthonormal basis $\{e_1, e_2, \dots, e_n, \dots\}$. Assume that $\{\alpha_n\}_{n\in\mathbb{N}}$ is a sequence from K which is square summable in the sense that $\sum_{n=1}^\infty |\alpha_n|^2$ converges. Then the series $\sum_{n=1}^\infty \alpha_n e_n$ converges to an element u ∈ V, and $\langle u, e_n\rangle = \alpha_n$ for all n ∈ N.


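Proof: We must prove that the partial sums $s_n = \sum_{k=1}^n \alpha_k e_k$ form a Cauchy sequence. If m > n, we have (by the Pythagorean theorem)

$$\|s_m - s_n\|^2 = \left\|\sum_{k=n+1}^m \alpha_k e_k\right\|^2 = \sum_{k=n+1}^m |\alpha_k|^2$$

Since $\sum_{n=1}^\infty |\alpha_n|^2$ converges, we can get this expression less than any ε > 0 by choosing n, m large enough. Hence $\{s_n\}$ is a Cauchy sequence, and since V is complete, the series $\sum_{n=1}^\infty \alpha_n e_n$ converges to some element u ∈ V. By Proposition 4.6.7,

$$\langle u, e_i\rangle = \left\langle\sum_{n=1}^\infty \alpha_n e_n, e_i\right\rangle = \sum_{n=1}^\infty \langle \alpha_n e_n, e_i\rangle = \alpha_i$$

□

Completeness is necessary in the proposition above: if V is not complete, there will always be a square summable sequence $\{\alpha_n\}$ such that $\sum_{n=1}^\infty \alpha_n e_n$ does not converge (see Exercise 13).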

A complete inner product space is called a Hilbert space.


1. Show that the inner products in Example 1 really are inner products (i.e. that they satisfy Definition 4.6.1).

2. Show that the inner products in Example 2 really are inner products.

3. Prove formula (v) just after Definition 4.6.1.

4. Prove formula (vi) just after Definition 4.6.1.

5. Prove formula (vii) just after Definition 4.6.1.

6. Show that if A is a symmetric (real) matrix with strictly positive eigenvalues, then

〈u,v〉 = (Au) · v

is an inner product on $\mathbb{R}^n$.

7. If h(t) = f(t) + i g(t) is a complex valued function where f and g are differentiable, define h′(t) = f′(t) + i g′(t). Prove that the integration by parts formula

$$\int_a^b u(t)v'(t)\,dt = \Big[u(t)v(t)\Big]_a^b - \int_a^b u'(t)v(t)\,dt$$

holds for complex valued functions.

8. Assume that $\{u_n\}$ and $\{v_n\}$ are two sequences in an inner product space converging to u and v, respectively. Show that $\langle u_n, v_n\rangle \to \langle u, v\rangle$.


9. Show that if the norm || · || is defined from an inner product by $\|u\| = \langle u, u\rangle^{1/2}$, we have the parallelogram law

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$

for all u, v ∈ V. Show that the norms on $\mathbb{R}^2$ defined by $\|(x, y)\| = \max\{|x|, |y|\}$ and $\|(x, y)\| = |x| + |y|$ do not come from inner products.

10. Let $\{e_1, e_2, \dots, e_n\}$ be an orthonormal set in an inner product space V. Show that the projection $P = P_{e_1, e_2, \dots, e_n}$ is linear in the sense that P(αu) = αP(u) and P(u + v) = P(u) + P(v) for all u, v ∈ V and all α ∈ K.

11. In this problem we prove the polarization identities for real and complex inner products. These identities are useful as they express the inner product in terms of the norm.

a) Show that if V is an inner product space over R, then

$$\langle u, v\rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right)$$

b) Show that if V is an inner product space over C, then

$$\langle u, v\rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2 + i\|u + iv\|^2 - i\|u - iv\|^2\right)$$

12. If S is a nonempty subset of an inner product space V, let

$$S^\perp = \{u \in V : \langle u, s\rangle = 0 \text{ for all } s \in S\}$$

a) Show that $S^\perp$ is a closed subspace of V.

b) Show that if S ⊆ T, then $S^\perp \supseteq T^\perp$.

13. Let $l^2$ be the set of all real sequences $x = \{x_n\}_{n\in\mathbb{N}}$ such that $\sum_{n=1}^\infty x_n^2 < \infty$.

a) Show that if $x = \{x_n\}_{n\in\mathbb{N}}$ and $y = \{y_n\}_{n\in\mathbb{N}}$ are in $l^2$, then the series $\sum_{n=1}^\infty x_n y_n$ converges. (Hint: For each N,

$$\sum_{n=1}^N x_n y_n \le \left(\sum_{n=1}^N x_n^2\right)^{1/2}\left(\sum_{n=1}^N y_n^2\right)^{1/2}$$

by Cauchy-Schwarz’ inequality.)

b) Show that $l^2$ is a vector space.

c) Show that $\langle x, y\rangle = \sum_{n=1}^\infty x_n y_n$ is an inner product on $l^2$.

d) Show that $l^2$ is complete.

e) Let $e_n$ be the sequence where the n-th component is 1 and all the other components are 0. Show that $\{e_n\}_{n\in\mathbb{N}}$ is an orthonormal basis for $l^2$.

f) Let V be an inner product space with an orthonormal basis $\{v_1, v_2, \dots, v_n, \dots\}$. Assume that for every square summable sequence $\{\alpha_n\}$, there is an element u ∈ V such that $\langle u, v_i\rangle = \alpha_i$ for all i ∈ N. Show that V is complete.


4.7 Linear operators

In linear algebra the important functions are the linear maps. The same holds for infinite dimensional spaces, but the maps are then usually referred to as linear operators or linear transformations:

Definition 4.7.1 Assume that V and W are two vector spaces over K. A function A : V → W is called a linear operator if it satisfies:

(i) A(αu) = αA(u) for all α ∈ K and u ∈ V .

(ii) A(u + v) = A(u) +A(v) for all u,v ∈ V .

Combining (i) and (ii), we see that

A(αu + βv) = αA(u) + βA(v)

Using induction, this can be generalized to

$$A(\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n) = \alpha_1 A(u_1) + \alpha_2 A(u_2) + \cdots + \alpha_n A(u_n) \qquad (4.7.1)$$

It is also useful to observe that A(0) = 0 for all linear operators, since by (i) with α = 0 we get A(0) = A(0 · 0) = 0 · A(0) = 0.

As K may be regarded as a vector space over itself, the definition above covers the case where W = K. The map is then usually referred to as a (linear) functional.

Example 1: Let V = C([a, b], R) be the space of continuous functions from the interval [a, b] to R. The function A : V → R defined by

$$A(u) = \int_a^b u(x)\,dx$$

is a linear functional, while the function B : V → V defined by

$$B(u)(x) = \int_a^x u(t)\,dt$$

is a linear operator. ♣

Example 2: Just as integration, differentiation is a linear operation, but as the derivative of a differentiable function is not necessarily differentiable, we have to be careful which spaces we work with. A function f : (a, b) → R is said to be infinitely differentiable if it has derivatives of all orders at all points in (a, b), i.e. if $f^{(n)}(x)$ exists for all n ∈ N and all x ∈ (a, b). Let U be the space of all infinitely differentiable functions, and define D : U → U by Du(x) = u′(x). Then D is a linear operator. ♣

We shall mainly be interested in linear operators between normed spaces, and the following notion is of central importance:


Definition 4.7.2 Assume that $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ are two normed spaces. A linear operator A : V → W is bounded if there is a constant M ∈ R such that $\|A(u)\|_W \le M\|u\|_V$ for all u ∈ V.

Remark: The terminology here is rather treacherous as a bounded operator is not a bounded function in the sense of, e.g., the Extremal Value Theorem. To see this, note that if A(u) ≠ 0, we can get $\|A(\alpha u)\|_W = |\alpha|\,\|A(u)\|_W$ as large as we want by increasing the size of α.

The best (i.e. smallest) value of the constant M in the definition above is denoted by ||A|| and is given by

$$\|A\| = \sup\left\{\frac{\|A(u)\|_W}{\|u\|_V} : u \neq 0\right\}$$

An alternative formulation (see Exercise 4) is

$$\|A\| = \sup\{\|A(u)\|_W : \|u\|_V = 1\} \qquad (4.7.2)$$

We call ||A|| the operator norm of A. The name is justified in Exercise 7. It’s instructive to take a new look at the operators in Examples 1 and 2:

Example 3: The operators A and B in Example 1 are bounded if we use the (usual) supremum norm on V. To see this for B, note that

$$|B(u)(x)| = \left|\int_a^x u(t)\,dt\right| \le \int_a^x |u(t)|\,dt \le \int_a^x \|u\|\,dt = \|u\|(x-a) \le \|u\|(b-a)$$

which implies that ||B(u)|| ≤ (b − a)||u|| for all u ∈ V. ♣
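A rough numerical check of the estimate ||B(u)|| ≤ (b − a)||u|| can be sketched in Python. The grid size, the trapezoidal rule and the test functions below are our own choices, and the supremum norms are only approximated by maxima over grid points. The constant function u = 1 gives B(u)(x) = x − a, so the bound is attained and in fact ||B|| = b − a.

```python
import math

def integral_operator_bound(u, a, b, n=10_000):
    """Compare sup|B(u)| with (b - a) sup|u| on a uniform grid over [a, b].

    B(u)(x) = integral of u from a to x is approximated by the cumulative
    trapezoidal rule; sup norms are approximated by maxima over grid points.
    """
    h = (b - a) / n
    xs = [a + k * h for k in range(n + 1)]
    us = [u(x) for x in xs]
    running, sup_Bu = 0.0, 0.0
    for k in range(n):
        running += h * (us[k] + us[k + 1]) / 2   # approximates B(u)(x_{k+1})
        sup_Bu = max(sup_Bu, abs(running))
    return sup_Bu, (b - a) * max(abs(v) for v in us)

a, b = 0.0, 2.0
for name, u in [("1", lambda t: 1.0), ("sin t", math.sin), ("t^2 - 1", lambda t: t * t - 1)]:
    lhs, rhs = integral_operator_bound(u, a, b)
    print(f"u = {name}: ||B(u)|| ~ {lhs:.4f} <= (b - a)||u|| ~ {rhs:.4f}")
```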

Example 4: If we let U have the supremum norm, the operator D in Example 2 is not bounded. If we let $u_n(x) = \sin nx$, we have $\|u_n\| \le 1$, but $\|D(u_n)\| = \|n\cos nx\| \to \infty$ as n → ∞. That D is an unbounded operator is the source of a lot of trouble, e.g. the rather unsatisfactory conditions we had to enforce in our treatment of differentiation of series in Proposition 4.2.5. ♣
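The unboundedness of D can also be seen numerically. In this Python sketch, which is an illustration only, we take (a, b) = (0, 2π), approximate supremum norms by maxima over interior grid points, and replace D by a central difference quotient with a step size h of our own choosing; the ratio ||D(u_n)||/||u_n|| then comes out close to n.

```python
import math

def sup_on_grid(f, a, b, m=20_000):
    """Approximate sup |f| on the open interval (a, b) by a maximum over interior grid points."""
    return max(abs(f(a + (b - a) * k / m)) for k in range(1, m))

def D(f, h=1e-6):
    """Central-difference approximation to the derivative operator D."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

def norm_ratio(n, a=0.0, b=2 * math.pi):
    """Approximate ||D(u_n)|| / ||u_n|| for u_n(x) = sin(nx) in the supremum norm."""
    u_n = lambda x: math.sin(n * x)
    return sup_on_grid(D(u_n), a, b) / sup_on_grid(u_n, a, b)

for n in (1, 4, 16, 64):
    print(n, round(norm_ratio(n), 3))   # the ratio grows roughly like n
```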

We shall end this section with a brief study of the connection between boundedness and continuity. One way is easy:

Lemma 4.7.3 A bounded linear operator A is uniformly continuous.

Proof: If ||A|| = 0, A is constant zero and there is nothing to prove. If ||A|| ≠ 0, we may, for a given ε > 0, choose $\delta = \frac{\varepsilon}{\|A\|}$. For $\|u - v\|_V < \delta$, we then have

$$\|A(u)-A(v)\|_W = \|A(u-v)\|_W \le \|A\|\,\|u-v\|_V < \|A\|\cdot\frac{\varepsilon}{\|A\|} = \varepsilon$$

which shows that A is uniformly continuous. □

The result in the opposite direction is perhaps more surprising:

Lemma 4.7.4 If a linear map A is continuous at 0, it is bounded.

Proof: We argue contrapositively; i.e. we assume that A is not bounded and prove that A is not continuous at 0. Since A is not bounded, there must for each n ∈ N exist a $u_n$ such that $\frac{\|A(u_n)\|_W}{\|u_n\|_V} = M_n \ge n$. If we put $v_n = \frac{u_n}{M_n\|u_n\|_V}$, we see that $v_n \to 0$ (since $\|v_n\|_V = \frac{1}{M_n} \le \frac{1}{n}$), while $A(v_n)$ does not converge to $A(0) = 0$ since

$$\|A(v_n)\|_W = \left\|A\left(\frac{u_n}{M_n\|u_n\|_V}\right)\right\|_W = \frac{\|A(u_n)\|_W}{M_n\|u_n\|_V} = \frac{M_n\|u_n\|_V}{M_n\|u_n\|_V} = 1$$

By Proposition 2.2.5, this means that A is not continuous at 0. □

Let us sum up the two lemmas in a theorem:

Theorem 4.7.5 For linear operators A : V → W between normed spaces, the following are equivalent:

(i) A is bounded.

(ii) A is uniformly continuous.

(iii) A is continuous at 0.

Proof: It suffices to prove (i)⟹(ii)⟹(iii)⟹(i). As (ii)⟹(iii) is obvious, we just have to observe that (i)⟹(ii) by Lemma 4.7.3 and (iii)⟹(i) by Lemma 4.7.4. □
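In finite dimensions, formula (4.7.2) can be explored numerically (compare Exercise 8). The Python sketch below is only an illustration under assumptions of our own: we take V = W = R² with the Euclidean norm and an arbitrarily chosen 2 × 2 matrix. Sampling the unit circle gives a lower estimate of sup{||Mu|| : ||u|| = 1}, which we compare with the standard closed form for the Euclidean case, the square root of the largest eigenvalue of MᵀM.

```python
import math

def apply(M, v):
    """Multiply the 2x2 matrix M (a list of rows) by the vector v in R^2."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def euclidean_norm(v):
    return math.hypot(v[0], v[1])

def sampled_operator_norm(M, samples=100_000):
    """Lower estimate of ||M|| = sup{||Mu|| : ||u|| = 1}, sampling u on the unit circle."""
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        best = max(best, euclidean_norm(apply(M, (math.cos(t), math.sin(t)))))
    return best

def exact_operator_norm(M):
    """For the Euclidean norm, ||M|| is the square root of the largest eigenvalue of M^T M."""
    p = M[0][0] ** 2 + M[1][0] ** 2            # M^T M = [[p, q], [q, s]]
    q = M[0][0] * M[0][1] + M[1][0] * M[1][1]
    s = M[0][1] ** 2 + M[1][1] ** 2
    largest = (p + s) / 2 + math.sqrt(((p - s) / 2) ** 2 + q ** 2)
    return math.sqrt(largest)

M = [[1.0, 2.0], [0.0, 1.0]]   # an arbitrarily chosen example
print(sampled_operator_norm(M), exact_operator_norm(M))   # both close to 1 + sqrt(2)
```

Sampling only ever produces values of ||Mu|| for particular unit vectors u, so it can underestimate but never overestimate ||M||; this is exactly the supremum in (4.7.2) seen from below.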


1. Prove Formula (4.7.1).

2. Check that the operator A in Example 1 is a linear functional and that B is a linear operator.

3. Check that the operator D in Example 2 is a linear operator.


5. Define F : C([0, 1], R) → R by F(u) = u(0). Show that F is a linear functional. Is F continuous?

6. Assume that $(U, \|\cdot\|_U)$, $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ are three normed vector spaces over R. Show that if A : U → V and B : V → W are bounded, linear operators, then C = B ∘ A is a bounded, linear operator. Show that ||C|| ≤ ||A|| ||B|| and find an example where we have strict inequality (it is possible to find simple, finite dimensional examples).

7. Assume that $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ are two normed vector spaces over R, and let B(V, W) be the set of all bounded, linear operators from V to W.


a) Show that if A,B ∈ B(V,W ), then A+B ∈ B(V,W ).

b) Show that if A ∈ B(V,W ) and α ∈ R, then αA ∈ B(V,W ).

c) Show that B(V,W ) is a vector space.

d) Show that

$$\|A\| = \sup\left\{\frac{\|A(u)\|_W}{\|u\|_V} : u \neq 0\right\}$$

is a norm on B(V, W).

8. Assume that $(W, \|\cdot\|_W)$ is a normed vector space. Show that all linear operators $A : \mathbb R^d \to W$ are bounded.

9. In this problem we shall give another characterization of boundedness for functionals. We assume that V is a normed vector space over K and let A : V → K be a linear functional. The kernel of A is defined by

$$\ker(A) = \{v \in V : A(v) = 0\} = A^{-1}(\{0\})$$

a) Show that if A is bounded, ker(A) is closed. (Hint: Recall Proposition 2.3.10.)

We shall use the rest of the problem to prove the converse: If ker(A) is closed, then A is bounded. As this is obvious when A is identically zero, we may assume that there is an element a in $\ker(A)^c$. Let $b = \frac{a}{A(a)}$ (since A(a) is a nonzero number, this makes sense).

b) Show that A(b) = 1 and that there is a ball B(b; r) around b contained in $\ker(A)^c$.

c) Show that if u ∈ B(0; r) (where r is as in b) above), then |A(u)| ≤ 1. (Hint: Assume for contradiction that u ∈ B(0; r), but |A(u)| > 1, and show that $A\left(b - \frac{u}{A(u)}\right) = 0$ although $b - \frac{u}{A(u)} \in B(b; r)$.)

d) Use a) and c) to prove:

Theorem: Assume that $(V, \|\cdot\|_V)$ is a normed space over K. A linear functional A : V → K is bounded if and only if ker(A) is closed.

10. Let (V, ⟨·, ·⟩) be a complete inner product space over R with an orthonormal basis $\{e_n\}$.

a) Show that for each y ∈ V, the map B(x) = ⟨x, y⟩ is a bounded linear functional.

b) Assume now that A : V → R is a bounded linear functional, and let $\beta_n = A(e_n)$. Show that $A\left(\sum_{i=1}^n \beta_i e_i\right) = \sum_{i=1}^n \beta_i^2$ and conclude that $\left(\sum_{i=1}^\infty \beta_i^2\right)^{1/2} \le \|A\|$.

c) Show that the series $\sum_{i=1}^\infty \beta_i e_i$ converges in V.

d) Let $y = \sum_{i=1}^\infty \beta_i e_i$. Show that A(x) = ⟨x, y⟩ for all x ∈ V, and that $\|A\| = \|y\|_V$. (Note: This is a special case of the Riesz-Fréchet Representation Theorem, which says that all bounded linear functionals A on a Hilbert space H are of the form A(x) = ⟨x, y⟩ for some y ∈ H. The assumption that V has an orthonormal basis is not needed for the theorem to be true.)


11. Assume that $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ are two normed vector spaces over R, and let $\{A_n\}$ be a sequence of bounded, linear operators from V to W. Assume that $\lim_{n\to\infty} A_n(v)$ exists for all v ∈ V, and define $A(v) = \lim_{n\to\infty} A_n(v)$.

a) Show that A is a linear operator.

b) Assume from now on that V is complete, and show that there is a closed ball B(a; r), r > 0, and a constant M ∈ R such that $\|A_n(u)\|_W \le M$ for all u ∈ B(a; r) and all n ∈ N. (Hint: Use Proposition 3.8.7.)

c) Show that there is a number K ∈ R such that $\|A_n(u)\|_W \le K\|u\|_V$ for all u ∈ V and all n ∈ N. (Hint: $a + \frac{r}{\|u\|_V}u \in B(a; r)$ for all nonzero u ∈ V.)

d) Show that the linear operator A is bounded. (Note: This result is often referred to as the Banach-Steinhaus Theorem.)

Chapter 5

Measure and integration

In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid. In probability theory and statistics you have learned how to compute the size of other kinds of sets: the probability that certain events happen or do not happen.

In this chapter we shall develop a general theory for the size of sets, a theory that covers all the examples above and many more. Just as the concept of a metric space gave us a general setting for discussing the notion of distance, the concept of a measure space will provide us with a general setting for discussing the notion of size.

In calculus we use integration to calculate the size of sets. In this chapter we turn the situation around: We first develop a theory of size and then use it to define integrals of a new and more general kind. As we shall sometimes wish to compare the two theories, we shall refer to integration as taught in calculus as Riemann integration in honor of the German mathematician Bernhard Riemann (1826-1866) and the new theory developed here as Lebesgue integration in honor of the French mathematician Henri Lebesgue (1875-1941).

Let us begin by taking a look at what we might wish for in a theory of size. Assume that we want to measure the size of subsets of a set X (if you need something concrete to concentrate on, you may let X = R² and think of the area of subsets of R², or let X = R³ and think of the volume of subsets of R³). What properties do we want such a measure to have?

Well, if µ(A) denotes the size of a subset A of X, we would expect

(i) µ(∅) = 0

as nothing can be smaller than the empty set. In addition, it seems reasonable to expect:



(ii) If $A_1, A_2, A_3, \ldots$ is a disjoint sequence of sets, then

$$\mu\Big(\bigcup_{n\in\mathbb N} A_n\Big) = \sum_{n=1}^\infty \mu(A_n)$$

These two conditions are, in fact, all we need to develop a reasonable theory of size, except for one complication: It turns out that we cannot in general expect to measure the size of all subsets of X; some subsets are just so irregular that we cannot assign a size to them in a meaningful way. This means that before we impose conditions (i) and (ii) above, we need to decide which properties the measurable sets (those we are able to assign a size to) should have. If we call the collection of all measurable sets $\mathcal A$, the statement $A \in \mathcal A$ is just a shorthand for “A is measurable”.

The first condition is simple; since we have already agreed that µ(∅) = 0, we must surely want to impose

(iii) $\emptyset \in \mathcal A$

For the next condition, assume that $A \in \mathcal A$. Intuitively, this means that we should be able to assign a size µ(A) to A. If the size µ(X) of the entire space is finite, we ought to have $\mu(A^c) = \mu(X) - \mu(A)$, and hence $A^c$ should be measurable. We shall impose this condition even when X has infinite size:

(iv) If $A \in \mathcal A$, then $A^c \in \mathcal A$.

For the third and last condition, assume that $\{A_n\}$ is a sequence of disjoint sets in $\mathcal A$. In view of condition (ii), it is natural to assume that $\bigcup_{n\in\mathbb N} A_n$ is in $\mathcal A$. We shall impose this condition even when the sequence is not disjoint (there are arguments for this that I don’t want to get involved in at the moment):

(v) If $\{A_n\}_{n\in\mathbb N}$ is a sequence of sets in $\mathcal A$, then $\bigcup_{n\in\mathbb N} A_n \in \mathcal A$.

When we now begin to develop the theory systematically, we shall take the five conditions above as our starting point.

5.1 Measure spaces

Assume that X is a nonempty set. A collection $\mathcal A$ of subsets of X that satisfies conditions (iii)-(v) above is called a σ-algebra. More succinctly:

Definition 5.1.1 Assume that X is a nonempty set. A collection $\mathcal A$ of subsets of X is called a σ-algebra if the following conditions are satisfied:


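(i) $\emptyset \in \mathcal A$

(ii) If $A \in \mathcal A$, then $A^c \in \mathcal A$.

(iii) If $\{A_n\}_{n\in\mathbb N}$ is a sequence of sets in $\mathcal A$, then $\bigcup_{n\in\mathbb N} A_n \in \mathcal A$.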

If $\mathcal A$ is a σ-algebra of subsets of X, we call the pair $(X, \mathcal A)$ a measurable space.

As already mentioned, the intuitive idea is that the sets in $\mathcal A$ are those that are so regular that we can measure their size.
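For a finite set X, Definition 5.1.1 can be checked mechanically: a sequence of sets drawn from a finite collection contains only finitely many distinct sets, so condition (iii) reduces to closure under finite (and hence pairwise) unions. The following Python sketch, with a helper name `is_sigma_algebra` of our own, encodes this observation; it does not apply to infinite X.

```python
from itertools import combinations

def is_sigma_algebra(X, collection):
    """Check Definition 5.1.1 for a collection of subsets of a FINITE set X.

    For finite X, condition (iii) reduces to closure under pairwise unions,
    since a sequence from a finite collection contains only finitely many
    distinct sets (and closure under pairwise unions gives all finite unions).
    """
    X = frozenset(X)
    family = {frozenset(A) for A in collection}
    if not all(A <= X for A in family):
        raise ValueError("every member of the collection must be a subset of X")
    if frozenset() not in family:                              # condition (i)
        return False
    if any((X - A) not in family for A in family):             # condition (ii)
        return False
    return all((A | B) in family for A, B in combinations(family, 2))  # condition (iii)

X = {0, 1, 2}
print(is_sigma_algebra(X, [set(), {0, 1}, {2}, X]))   # True
print(is_sigma_algebra(X, [set(), {0}, X]))           # False: {1, 2} is missing
```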

Before we introduce measures, we take a look at some simple consequences of the definition above:

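Proposition 5.1.2 Assume that $\mathcal A$ is a σ-algebra on X. Then

a) $X \in \mathcal A$.

b) If $\{A_n\}_{n\in\mathbb N}$ is a sequence of sets in $\mathcal A$, then $\bigcap_{n\in\mathbb N} A_n \in \mathcal A$.

c) If $A_1, A_2, \ldots, A_n \in \mathcal A$, then $A_1\cup A_2\cup\cdots\cup A_n \in \mathcal A$ and $A_1\cap A_2\cap\cdots\cap A_n \in \mathcal A$.

d) If $A, B \in \mathcal A$, then $A \setminus B \in \mathcal A$.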

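Proof: a) By conditions (i) and (ii) in the definition, $X = \emptyset^c \in \mathcal A$.

b) By condition (ii), each $A_n^c$ is in $\mathcal A$, and hence $\bigcup_{n\in\mathbb N} A_n^c \in \mathcal A$ by condition (iii). By one of De Morgan’s laws,

$$\Big(\bigcap_{n\in\mathbb N} A_n\Big)^c = \bigcup_{n\in\mathbb N} A_n^c$$

and hence $\left(\bigcap_{n\in\mathbb N} A_n\right)^c$ is in $\mathcal A$. Using condition (ii) again, we see that $\bigcap_{n\in\mathbb N} A_n$ is in $\mathcal A$.

c) If we extend the finite sequence $A_1, A_2, \ldots, A_n$ to an infinite one $A_1, A_2, \ldots, A_n, \emptyset, \emptyset, \ldots$ (i.e. $A_k = \emptyset$ for $k > n$), we see that

$$A_1\cup A_2\cup\cdots\cup A_n = \bigcup_{k\in\mathbb N} A_k \in \mathcal A$$

by condition (iii). A similar trick works for intersections, but we have to extend the sequence $A_1, A_2, \ldots, A_n$ to $A_1, A_2, \ldots, A_n, X, X, \ldots$ instead of $A_1, A_2, \ldots, A_n, \emptyset, \emptyset, \ldots$ (and use b) instead of condition (iii)). The details are left to the reader.

d) We have $A \setminus B = A \cap B^c$, which is in $\mathcal A$ by condition (ii) and c) above. □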


It is time to turn to measures. Before we look at the definition, there is a small detail we have to take care of. As you know from calculus, there are sets of infinite size: curves of infinite length, surfaces of infinite area, solids of infinite volume. We shall use the symbol ∞ to indicate that sets have infinite size. This does not mean that we think of ∞ as a number; it is just a symbol to indicate that something has size bigger than can be specified by a number.

A measure µ assigns a value µ(A) (“the size of A”) to each set A in the σ-algebra $\mathcal A$. The value is either ∞ or a nonnegative number. If we let

$$\overline{\mathbb R}_+ = [0,\infty) \cup \{\infty\}$$

be the set of extended, nonnegative real numbers, µ is a function from $\mathcal A$ to $\overline{\mathbb R}_+$. In addition, µ has to satisfy conditions (i) and (ii) above, i.e.:

Definition 5.1.3 Assume that $(X, \mathcal A)$ is a measurable space. A measure on $(X, \mathcal A)$ is a function $\mu : \mathcal A \to \overline{\mathbb R}_+$ such that

(i) µ(∅) = 0

(ii) (Countable additivity) If $A_1, A_2, A_3, \ldots$ is a disjoint sequence of sets from $\mathcal A$, then

$$\mu\Big(\bigcup_{n=1}^\infty A_n\Big) = \sum_{n=1}^\infty \mu(A_n)$$

(We treat infinite terms in the obvious way: if some of the terms $\mu(A_n)$ in the sum equal ∞, then the sum itself also equals ∞.)

The triple (X,A, µ) is then called a measure space.

Let us take a look at some examples.

Example 1: Let $X = \{x_1, x_2, \ldots, x_n\}$ be a finite set, and let $\mathcal A$ be the collection of all subsets of X. For each set A ⊆ X, let

µ(A) = |A| = the number of elements in A

Then µ is called the counting measure on X, and $(X, \mathcal A, \mu)$ is a measure space. ♣

The next two examples show two simple modifications of counting measures.

Example 2: Let X and $\mathcal A$ be as in Example 1. For each element x ∈ X, let m(x) be a nonnegative, real number (the weight of x). For A ⊆ X, let

$$\mu(A) = \sum_{x\in A} m(x)$$


Then (X,A, µ) is a measure space. ♣
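The measures in Examples 1 and 2 are easy to implement on a computer. In this Python sketch (the names `weighted_measure` and `powerset` are our own), a measure is represented as a function on frozensets, and we verify finite additivity over all disjoint pairs of subsets of a three-element set. On a finite X this is enough for countable additivity, since a disjoint sequence can contain only finitely many nonempty sets. Exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    xs = list(X)
    return [frozenset(c) for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def weighted_measure(weights):
    """Return mu(A) = sum of the weights m(x) for x in A (Example 2)."""
    return lambda A: sum((weights[x] for x in A), Fraction(0))

X = {"x1", "x2", "x3"}
counting = weighted_measure({x: Fraction(1) for x in X})      # Example 1: all weights 1
mu = weighted_measure({"x1": Fraction(1, 2), "x2": Fraction(0), "x3": Fraction(3)})

subsets = powerset(X)
for m in (counting, mu):
    assert m(frozenset()) == 0
    for A in subsets:
        for B in subsets:
            if not A & B:                       # disjoint pair
                assert m(A | B) == m(A) + m(B)  # finite additivity
print(counting(frozenset(X)), mu(frozenset({"x1", "x3"})))   # 3 7/2
```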

Example 3: Let $X = \{x_1, x_2, \ldots, x_n, \ldots\}$ be a countable set, and let $\mathcal A$ be the collection of all subsets of X. For each set A ⊆ X, let

µ(A) = the number of elements in A

where we put µ(A) = ∞ if A has infinitely many elements. Again µ is called the counting measure on X, and $(X, \mathcal A, \mu)$ is a measure space. ♣

The next example is also important, but rather special.

Example 4: Let X be any set, and let $\mathcal A$ be the collection of all subsets of X. Choose an element a ∈ X, and define

$$\mu(A) = \begin{cases} 1 & \text{if } a \in A\\ 0 & \text{if } a \notin A\end{cases}$$

Then (X,A, µ) is a measure space, and µ is called the point measure at a. ♣

The examples we have looked at so far are important special cases, but rather untypical of the theory; they are too simple to really need the full power of measure theory. The next examples are much more typical, but at this stage we cannot define them precisely, only give an intuitive description of their most important properties.

Example 5: In this example X = R, $\mathcal A$ is a σ-algebra containing all open and closed sets (we shall describe it more precisely later), and µ is a measure on $(X, \mathcal A)$ such that

$$\mu([a,b]) = b - a$$

whenever a ≤ b. This measure is called the Lebesgue measure on R, and we can think of it as an extension of the notion of length to more general sets. The sets in $\mathcal A$ are those that can be assigned a generalized “length” µ(A) in a systematic way. ♣

Originally, measure theory was the theory of the Lebesgue measure, and it remains one of the most important examples. It is not at all obvious that such a measure exists, and one of our main tasks in the next chapter is to show that it does.

Lebesgue measure can be extended to higher dimensions:

Example 6: In this example X = R², $\mathcal A$ is a σ-algebra containing all open and closed sets, and µ is a measure on $(X, \mathcal A)$ such that

µ([a, b]× [c, d]) = (b− a)(d− c)


whenever a ≤ b and c ≤ d (this just means that the measure of a rectangle equals its area). This measure is called the Lebesgue measure on R², and we can think of it as an extension of the notion of area to more general sets. The sets in $\mathcal A$ are those that can be assigned a generalized “area” µ(A) in a systematic way.

There are obvious extensions of this example to higher dimensions: The three-dimensional Lebesgue measure assigns value

µ([a, b]× [c, d]× [e, f ]) = (b− a)(d− c)(f − e)

to all rectangular boxes and is a generalization of the notion of volume. The n-dimensional Lebesgue measure assigns value

$$\mu([a_1,b_1]\times[a_2,b_2]\times\cdots\times[a_n,b_n]) = (b_1-a_1)(b_2-a_2)\cdots(b_n-a_n)$$

to all n-dimensional, rectangular boxes and represents n-dimensional volume. ♣

Although we have not yet constructed the Lebesgue measures, we shall feel free to use them in examples and exercises. Let us finally take a look at two examples from probability theory.

Example 7: Assume we want to study coin tossing, and that we plan to toss the coin N times. If we let H denote “heads” and T “tails”, the possible outcomes can be represented as all sequences of H’s and T’s of length N. If the coin is fair, all such sequences have probability $\frac{1}{2^N}$.

To fit this into the framework of measure theory, let X be the set of all sequences of H’s and T’s of length N, let $\mathcal A$ be the collection of all subsets of X, and let µ be given by

$$\mu(A) = \frac{|A|}{2^N}$$

where |A| is the number of elements in A. Hence µ(A) is the probability of the event A. It is easy to check that µ is a measure on $(X, \mathcal A)$. ♣
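The measure in Example 7 can be computed by brute-force enumeration of the 2^N outcomes. The Python sketch below represents an event A by a predicate on outcomes (a representation we have chosen for convenience); it is only practical for small N.

```python
from fractions import Fraction
from itertools import product

def prob(event, N):
    """mu(A) = |A| / 2^N for the event A = {outcomes w with event(w) true} in N fair tosses."""
    outcomes = product("HT", repeat=N)   # all 2^N sequences of 'H'/'T' as tuples
    return Fraction(sum(1 for w in outcomes if event(w)), 2 ** N)

at_least_one_head = lambda w: "H" in w
exactly_two_heads = lambda w: w.count("H") == 2
print(prob(at_least_one_head, 4), prob(exactly_two_heads, 4))   # 15/16 3/8
```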

In probability theory it is usual to call the underlying space Ω (instead of X) and the measure P (instead of µ), and we shall often refer to probability spaces as $(\Omega, \mathcal A, P)$.

Example 8: We are still studying coin tosses, but this time we don’t know beforehand how many tosses we are going to make, and hence we have to consider all sequences of H’s and T’s of infinite length, that is all sequences

$$\omega = (\omega_1, \omega_2, \omega_3, \ldots, \omega_n, \ldots)$$


where each $\omega_i$ is either H or T. We let Ω be the collection of all such sequences.

To describe the σ-algebra and the measure, we first need to introduce the so-called cylinder sets: If $a = (a_1, a_2, \ldots, a_n)$ is a finite sequence of H’s and T’s, we let

$$C_a = \{\omega \in \Omega \mid \omega_1 = a_1, \omega_2 = a_2, \ldots, \omega_n = a_n\}$$

and call it the cylinder set generated by a. Note that $C_a$ consists of all sequences of coin tosses beginning with the sequence $a_1, a_2, \ldots, a_n$. Since the probability of starting a sequence of coin tosses with $a_1, a_2, \ldots, a_n$ is $\frac{1}{2^n}$, we want a measure such that $P(C_a) = \frac{1}{2^n}$.

The measure space $(\Omega, \mathcal A, P)$ of infinite coin tossing consists of Ω, a σ-algebra $\mathcal A$ containing all cylinder sets, and a measure P such that $P(C_a) = \frac{1}{2^n}$ for all cylinder sets of length n. It is not at all obvious that such a measure space exists, but it does (as we shall prove in the next chapter), and it is the right setting for the study of coin tossing of unrestricted length. ♣
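Although the construction of P is postponed to the next chapter, one can at least check that the requirement $P(C_a) = \frac{1}{2^n}$ is consistent with the finite models of Example 7: for every N ≥ n, exactly $2^{N-n}$ of the $2^N$ sequences of length N begin with a. A Python sketch of this count (the function name is our own):

```python
from fractions import Fraction
from itertools import product

def cylinder_probability(a, N):
    """P(C_a) computed in the N-toss model of Example 7; requires N >= len(a)."""
    if N < len(a):
        raise ValueError("need at least len(a) tosses")
    prefix = tuple(a)
    hits = sum(1 for w in product("HT", repeat=N) if w[:len(prefix)] == prefix)
    return Fraction(hits, 2 ** N)

# The value does not depend on N: it is always 1/2^3 for the cylinder generated by HTH.
print([str(cylinder_probability("HTH", N)) for N in range(3, 8)])
```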

Let us return to Definition 5.1.3 and derive some simple, but extremely useful consequences. Note that all these properties are properties we would expect of a measure.

Proposition 5.1.4 Assume that (X,A, µ) is a measure space.

a) (Finite additivity) If $A_1, A_2, \ldots, A_m$ are disjoint sets in $\mathcal A$, then

$$\mu(A_1\cup A_2\cup\cdots\cup A_m) = \mu(A_1) + \mu(A_2) + \cdots + \mu(A_m)$$

b) (Monotonicity) If A,B ∈ A and B ⊆ A, then µ(B) ≤ µ(A).

c) If A,B ∈ A, B ⊆ A, and µ(A) <∞, then µ(A \B) = µ(A)− µ(B).

d) (Countable subadditivity) If $A_1, A_2, \ldots, A_n, \ldots$ is a (not necessarily disjoint) sequence of sets from $\mathcal A$, then

$$\mu\Big(\bigcup_{n\in\mathbb N} A_n\Big) \le \sum_{n=1}^\infty \mu(A_n)$$

Proof: a) We fill out the sequence with empty sets to get an infinite sequence

$$A_1, A_2, \ldots, A_m, A_{m+1}, A_{m+2}, \ldots$$

where $A_n = \emptyset$ for n > m. Then clearly

$$\mu(A_1\cup A_2\cup\cdots\cup A_m) = \mu\Big(\bigcup_{n\in\mathbb N} A_n\Big) = \sum_{n=1}^\infty \mu(A_n) = \mu(A_1) + \mu(A_2) + \cdots + \mu(A_m)$$

where we have used the two parts of Definition 5.1.3.

b) We write A = B ∪ (A \ B), where the union is disjoint. By Proposition 5.1.2d), $A \setminus B \in \mathcal A$, and hence by part a) above

µ(A) = µ(B) + µ(A \B) ≥ µ(B)

c) By the argument in part b),

µ(A) = µ(B) + µ(A \B)

Since µ(A) is finite, so is µ(B), and we may subtract µ(B) on both sides of the equation to get the result.

d) Define a new, disjoint sequence of sets B1, B2, . . . by:

$$B_1 = A_1,\quad B_2 = A_2\setminus A_1,\quad B_3 = A_3\setminus(A_1\cup A_2),\quad B_4 = A_4\setminus(A_1\cup A_2\cup A_3),\ \ldots$$

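Note that $\bigcup_{n\in\mathbb N} B_n = \bigcup_{n\in\mathbb N} A_n$ (make a drawing). Hence

$$\mu\Big(\bigcup_{n\in\mathbb N} A_n\Big) = \mu\Big(\bigcup_{n\in\mathbb N} B_n\Big) = \sum_{n=1}^\infty \mu(B_n) \le \sum_{n=1}^\infty \mu(A_n)$$

where we have applied part (ii) of Definition 5.1.3 to the disjoint sequence $\{B_n\}$ and in addition used that $\mu(B_n) \le \mu(A_n)$ by part b) above. □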
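The disjoint sequence $\{B_n\}$ in the proof of d) is easy to produce for finite sets. The following Python sketch (the name `disjointify` is ours) builds $B_1, B_2, \ldots$ from a list of finite sets; the counting measure then illustrates the inequality in d).

```python
def disjointify(sets):
    """Return B_1 = A_1, B_n = A_n minus (A_1 u ... u A_{n-1}) for a list of finite sets."""
    covered = set()
    result = []
    for A in sets:
        result.append(set(A) - covered)
        covered |= set(A)
    return result

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 6}]
B = disjointify(A)
print(B)                                                  # [{1, 2, 3}, {4}, {5}, {6}]
# Counting measure: |union| = sum of |B_n| <= sum of |A_n|
print(len(set().union(*A)), sum(len(b) for b in B), sum(len(a) for a in A))   # 6 6 10
```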

The next properties are a little more complicated, but not unexpected. They are often referred to as continuity of measures:

Proposition 5.1.5 Let $\{A_n\}_{n\in\mathbb N}$ be a sequence of measurable sets in a measure space $(X, \mathcal A, \mu)$.

a) If the sequence is increasing (i.e. $A_n \subseteq A_{n+1}$ for all n), then

$$\mu\Big(\bigcup_{n\in\mathbb N} A_n\Big) = \lim_{n\to\infty} \mu(A_n)$$

b) If the sequence is decreasing (i.e. $A_n \supseteq A_{n+1}$ for all n), and $\mu(A_1)$ is finite, then

$$\mu\Big(\bigcap_{n\in\mathbb N} A_n\Big) = \lim_{n\to\infty} \mu(A_n)$$

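Proof: a) If we put $E_1 = A_1$ and $E_n = A_n \setminus A_{n-1}$ for n > 1, the sequence $\{E_n\}$ is disjoint, and $\bigcup_{k=1}^n E_k = A_n$ for all n (make a drawing). Hence

$$\mu\Big(\bigcup_{n\in\mathbb N} A_n\Big) = \mu\Big(\bigcup_{n\in\mathbb N} E_n\Big) = \sum_{n=1}^\infty \mu(E_n) = \lim_{n\to\infty}\sum_{k=1}^n \mu(E_k) = \lim_{n\to\infty}\mu\Big(\bigcup_{k=1}^n E_k\Big) = \lim_{n\to\infty}\mu(A_n)$$

where we have used the additivity of µ twice.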

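b) We first observe that $\{A_1 \setminus A_n\}_{n\in\mathbb N}$ is an increasing sequence of sets with union $A_1 \setminus \bigcap_{n\in\mathbb N} A_n$. By part a), we thus have

$$\mu\Big(A_1 \setminus \bigcap_{n\in\mathbb N} A_n\Big) = \lim_{n\to\infty}\mu(A_1\setminus A_n)$$

Applying part c) of the previous proposition on both sides, we get

$$\mu(A_1) - \mu\Big(\bigcap_{n\in\mathbb N} A_n\Big) = \lim_{n\to\infty}\big(\mu(A_1)-\mu(A_n)\big) = \mu(A_1) - \lim_{n\to\infty}\mu(A_n)$$

Since $\mu(A_1)$ is finite, we get $\mu\left(\bigcap_{n\in\mathbb N} A_n\right) = \lim_{n\to\infty}\mu(A_n)$, as we set out to prove. □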

Remark: The finiteness condition in part b) may look like an unnecessary consequence of a clumsy proof, but it is actually needed, as the following example shows: Let X = N, let $\mathcal A$ be the set of all subsets of X, and let µ(A) = |A| (the number of elements in A). If $A_n = \{n, n+1, \ldots\}$, then $\mu(A_n) = \infty$ for all n, but $\mu\left(\bigcap_{n\in\mathbb N} A_n\right) = \mu(\emptyset) = 0$. Hence $\lim_{n\to\infty}\mu(A_n) \neq \mu\left(\bigcap_{n\in\mathbb N} A_n\right)$.

The properties we have proved in this section are the basic tools we need to handle measures. The next section will take care of a more technical issue.


1. Verify that the space (X,A, µ) in Example 1 is a measure space.





6. Describe a measure space that is suitable for modeling tossing a die N times.

7. Show that if µ and ν are two measures on the same measurable space $(X, \mathcal A)$, then for all positive numbers α, β ∈ R, the function $\lambda : \mathcal A \to \overline{\mathbb R}_+$ given by

λ(A) = αµ(A) + βν(A)

is a measure.


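8. Assume that $(X, \mathcal A, \mu)$ is a measure space and that $A \in \mathcal A$. Define $\mu_A : \mathcal A \to \overline{\mathbb R}_+$ by

$$\mu_A(B) = \mu(A \cap B) \quad\text{for all } B \in \mathcal A$$

Show that $\mu_A$ is a measure.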

9. Let X be an uncountable set, and define

$$\mathcal A = \{A \subseteq X \mid A \text{ or } A^c \text{ is countable}\}$$

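Show that $\mathcal A$ is a σ-algebra. Define $\mu : \mathcal A \to \overline{\mathbb R}_+$ by

$$\mu(A) = \begin{cases} 0 & \text{if } A \text{ is countable}\\ 1 & \text{if } A^c \text{ is countable}\end{cases}$$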

Show that µ is a measure.

10. Assume that $(X, \mathcal A)$ is a measurable space, and let f : X → Y be any function from X to a set Y. Show that

$$\mathcal B = \{B \subseteq Y \mid f^{-1}(B) \in \mathcal A\}$$

is a σ-algebra.

11. Assume that $(X, \mathcal A)$ is a measurable space, and let f : Y → X be any function from a set Y to X. Show that

$$\mathcal B = \{f^{-1}(A) \mid A \in \mathcal A\}$$

is a σ-algebra.

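12. Let X be a set and $\mathcal A$ a collection of subsets of X such that:

a) $\emptyset \in \mathcal A$

b) If $A \in \mathcal A$, then $A^c \in \mathcal A$

c) If $\{A_n\}_{n\in\mathbb N}$ is a sequence of sets from $\mathcal A$, then $\bigcap_{n\in\mathbb N} A_n \in \mathcal A$.

Show that $\mathcal A$ is a σ-algebra.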

13. A measure space $(X, \mathcal A, \mu)$ is called atomless if µ({x}) = 0 for all x ∈ X. Show that in an atomless space, all countable sets have measure 0.

14. Assume that µ is a measure on R such that $\mu\left(\left[-\frac{1}{n}, \frac{1}{n}\right]\right) = 1 + \frac{2}{n}$ for each n ∈ N. Show that µ({0}) = 1.

15. Assume that a measure space $(X, \mathcal A, \mu)$ contains sets of arbitrarily large finite measure, i.e. for each N ∈ N, there is a set $A \in \mathcal A$ such that N ≤ µ(A) < ∞. Show that there is a set $B \in \mathcal A$ such that µ(B) = ∞.

16. Assume that µ is a measure on R such that µ([a, b]) = b − a for all closed intervals [a, b], a < b. Show that µ((a, b)) = b − a for all open intervals. Conversely, show that if µ is a measure on R such that µ((a, b)) = b − a for all open intervals (a, b), then µ([a, b]) = b − a for all closed intervals.

17. Let X be a set. An algebra is a collection $\mathcal A$ of subsets of X such that


(i) $\emptyset \in \mathcal A$

(ii) If $A \in \mathcal A$, then $A^c \in \mathcal A$.

(iii) If $A, B \in \mathcal A$, then $A \cup B \in \mathcal A$.

Show that if A is an algebra, then:

a) If $A_1, A_2, \ldots, A_n \in \mathcal A$, then $A_1\cup A_2\cup\cdots\cup A_n \in \mathcal A$ (use induction on n).

b) If $A_1, A_2, \ldots, A_n \in \mathcal A$, then $A_1\cap A_2\cap\cdots\cap A_n \in \mathcal A$.

c) Put X = N and define A by

$$\mathcal A = \{A \subseteq \mathbb N \mid A \text{ or } A^c \text{ is finite}\}$$

Show that A is an algebra, but not a σ-algebra.

d) Assume that $\mathcal A$ is an algebra closed under disjoint, countable unions (i.e., $\bigcup_{n\in\mathbb N} A_n \in \mathcal A$ for all disjoint sequences $\{A_n\}$ of sets from $\mathcal A$). Show that $\mathcal A$ is a σ-algebra.

18. Let $(X, \mathcal A, \mu)$ be a measure space and assume that $\{A_n\}$ is a sequence of sets from $\mathcal A$ such that $\sum_{n=1}^\infty \mu(A_n) < \infty$. Let

$$A = \{x \in X \mid x \text{ belongs to infinitely many of the sets } A_n\}$$

Show that $A \in \mathcal A$ and that µ(A) = 0.

5.2 Complete measures

Assume that $(X, \mathcal A, \mu)$ is a measure space, and that $A \in \mathcal A$ with µ(A) = 0. It is natural to think that if N ⊆ A, then N must also be measurable and have measure 0, but there is nothing in the definition of a measure that says so, and, in fact, it is not difficult to find measure spaces where this property does not hold. This is often a nuisance, and we shall now see how it can be cured.

First some definitions:

Definition 5.2.1 Assume that $(X, \mathcal A, \mu)$ is a measure space. A set N ⊆ X is called a null set if N ⊆ A for some $A \in \mathcal A$ with µ(A) = 0. The collection of all null sets is denoted by $\mathcal N$. If all null sets belong to $\mathcal A$, we say that the measure space is complete.

Note that if N is a null set that happens to belong to $\mathcal A$, then µ(N) = 0 by Proposition 5.1.4b).

Our purpose in this section is to show that any measure space $(X, \mathcal A, \mu)$ can be extended to a complete space (i.e. we can find a complete measure space $(X, \bar{\mathcal A}, \bar\mu)$ such that $\mathcal A \subseteq \bar{\mathcal A}$ and $\bar\mu(A) = \mu(A)$ for all $A \in \mathcal A$).

We begin with a simple observation:


Lemma 5.2.2 If $N_1, N_2, \ldots$ are null sets, then $\bigcup_{n\in\mathbb N} N_n$ is a null set.

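Proof: For each n, there is a set $A_n \in \mathcal A$ such that $\mu(A_n) = 0$ and $N_n \subseteq A_n$. Since $\bigcup_{n\in\mathbb N} N_n \subseteq \bigcup_{n\in\mathbb N} A_n$ and

$$\mu\Big(\bigcup_{n\in\mathbb N} A_n\Big) \le \sum_{n=1}^\infty \mu(A_n) = 0$$

by Proposition 5.1.4d), $\bigcup_{n\in\mathbb N} N_n$ is a null set. □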

The next lemma tells us how we can extend a σ-algebra to include the null sets.

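Lemma 5.2.3 If $(X, \mathcal A, \mu)$ is a measure space, then

$$\bar{\mathcal A} = \{A \cup N \mid A \in \mathcal A \text{ and } N \in \mathcal N\}$$

is the smallest σ-algebra containing $\mathcal A$ and $\mathcal N$ (in the sense that if $\mathcal B$ is any other σ-algebra containing $\mathcal A$ and $\mathcal N$, then $\bar{\mathcal A} \subseteq \mathcal B$).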

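Proof: If we can only prove that $\bar{\mathcal A}$ is a σ-algebra, the rest will be easy: Any σ-algebra $\mathcal B$ containing $\mathcal A$ and $\mathcal N$ must necessarily contain all sets of the form A ∪ N and hence be larger than $\bar{\mathcal A}$, and since ∅ belongs to both $\mathcal A$ and $\mathcal N$, we have $\mathcal A \subseteq \bar{\mathcal A}$ and $\mathcal N \subseteq \bar{\mathcal A}$.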

To prove that $\bar{\mathcal A}$ is a σ-algebra, we need to check the three conditions in Definition 5.1.1. Since ∅ belongs to both $\mathcal A$ and $\mathcal N$, condition (i) is obviously satisfied, and condition (iii) follows from the identity

$$\bigcup_{n\in\mathbb N}(A_n \cup N_n) = \bigcup_{n\in\mathbb N} A_n \cup \bigcup_{n\in\mathbb N} N_n$$

and the preceding lemma.

It remains to prove condition (ii), and this is the tricky part. Given a set $A \cup N \in \bar{\mathcal A}$, we must prove that $(A \cup N)^c \in \bar{\mathcal A}$. Observe first that we can assume that A and N are disjoint; if not, we just replace N by N \ A. Next observe that since N is a null set, there is a set $B \in \mathcal A$ such that N ⊆ B and µ(B) = 0. We may also assume that A and B are disjoint; if not, we just replace B by B \ A. Since

$$(A \cup N)^c = (A \cup B)^c \cup (B \setminus N)$$

(see Figure 1), where $(A \cup B)^c \in \mathcal A$ and $B \setminus N \in \mathcal N$, we see that $(A \cup N)^c \in \bar{\mathcal A}$ and the lemma is proved. □


Figure 1: $(A \cup N)^c = (A \cup B)^c \cup (B \setminus N)$ [diagram of the sets A, N and B inside X, with the regions $(A \cup B)^c$ and $B \setminus N$ marked]

The next step is to extend µ to a measure on $\bar{\mathcal A}$. Here is the key observation:

Lemma 5.2.4 If $A_1, A_2 \in \mathcal A$ and $N_1, N_2 \in \mathcal N$ are such that $A_1 \cup N_1 = A_2 \cup N_2$, then $\mu(A_1) = \mu(A_2)$.

Proof: Let $B_2$ be a set in $\mathcal A$ such that $N_2 \subseteq B_2$ and $\mu(B_2) = 0$. Then $A_1 \subseteq A_1 \cup N_1 = A_2 \cup N_2 \subseteq A_2 \cup B_2$, and hence

$$\mu(A_1) \le \mu(A_2 \cup B_2) \le \mu(A_2) + \mu(B_2) = \mu(A_2)$$

Interchanging the roles of $A_1$ and $A_2$, we get the opposite inequality $\mu(A_2) \le \mu(A_1)$, and hence we must have $\mu(A_1) = \mu(A_2)$. □

We are now ready for the main result. It shows that we can always extend a measure space to a complete measure space in a controlled manner. The measure space $(X, \bar{\mathcal A}, \bar\mu)$ in the theorem below is called the completion of the original measure space $(X, \mathcal A, \mu)$.

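Theorem 5.2.5 Assume that $(X, \mathcal A, \mu)$ is a measure space, let

$$\bar{\mathcal A} = \{A \cup N \mid A \in \mathcal A \text{ and } N \in \mathcal N\}$$

and define $\bar\mu : \bar{\mathcal A} \to \overline{\mathbb R}_+$ by

$$\bar\mu(A \cup N) = \mu(A)$$

for all $A \in \mathcal A$ and all $N \in \mathcal N$. Then $(X, \bar{\mathcal A}, \bar\mu)$ is a complete measure space, and $\bar\mu$ is an extension of µ, i.e. $\bar\mu(A) = \mu(A)$ for all $A \in \mathcal A$.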

Proof: We already know that $\bar{\mathcal A}$ is a σ-algebra, and by the lemma above, the definition

$$\bar\mu(A \cup N) = \mu(A)$$

is legitimate (i.e. it only depends on the set A ∪ N and not on the sets $A \in \mathcal A$, $N \in \mathcal N$ we use to represent it). Also, we clearly have $\bar\mu(A) = \mu(A)$ for all $A \in \mathcal A$.

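To prove that $\bar\mu$ is a measure, observe that since obviously $\bar\mu(\emptyset) = 0$, we just need to check that if $\{B_n\}$ is a disjoint sequence of sets in $\bar{\mathcal A}$, then

$$\bar\mu\Big(\bigcup_{n\in\mathbb N} B_n\Big) = \sum_{n=1}^\infty \bar\mu(B_n)$$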

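For each n, pick sets $A_n \in \mathcal A$, $N_n \in \mathcal N$ such that $B_n = A_n \cup N_n$. Then the $A_n$’s are clearly disjoint since the $B_n$’s are, and since $\bigcup_{n\in\mathbb N} B_n = \bigcup_{n\in\mathbb N} A_n \cup \bigcup_{n\in\mathbb N} N_n$, where the last union is a null set by Lemma 5.2.2, we get

$$\bar\mu\Big(\bigcup_{n\in\mathbb N} B_n\Big) = \mu\Big(\bigcup_{n\in\mathbb N} A_n\Big) = \sum_{n=1}^\infty \mu(A_n) = \sum_{n=1}^\infty \bar\mu(B_n)$$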

It remains to check that $\bar\mu$ is complete. Assume that C ⊆ D, where $\bar\mu(D) = 0$; we must show that $C \in \bar{\mathcal A}$. Since $\bar\mu(D) = 0$, D is of the form D = A ∪ N, where A is in $\mathcal A$ with µ(A) = 0, and N is in $\mathcal N$. By definition of $\mathcal N$, there is a $B \in \mathcal A$ such that N ⊆ B and µ(B) = 0. But then C ⊆ A ∪ B, where µ(A ∪ B) = 0, and hence C is in $\mathcal N$ and hence in $\bar{\mathcal A}$. □
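For a finite measure space the completion in Theorem 5.2.5 can be computed directly from the definitions. The Python sketch below (helper names and the example space are our own) represents a measure as a dictionary from frozensets to numbers, one entry for each set of the σ-algebra, forms the null sets as all subsets of sets of measure zero, and builds $\bar{\mathcal A}$ and $\bar\mu$; the consistency check mirrors Lemma 5.2.4.

```python
from itertools import chain, combinations

def subsets(S):
    """All subsets of the finite set S, as frozensets."""
    s = list(S)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def completion(mu):
    """Completion of a finite measure space, following Theorem 5.2.5.

    mu: dict mapping every set of the sigma-algebra (a frozenset) to its measure.
    Returns mu_bar as a dict whose keys form the completed sigma-algebra.
    """
    # Null sets: subsets of measurable sets of measure 0 (Definition 5.2.1).
    null_sets = {N for A, m in mu.items() if m == 0 for N in subsets(A)}
    mu_bar = {}
    for A, m in mu.items():
        for N in null_sets:
            E = A | N
            # By Lemma 5.2.4 the value cannot depend on the representation E = A u N.
            if E in mu_bar and mu_bar[E] != m:
                raise ValueError("inconsistent values; is mu really a measure?")
            mu_bar[E] = m
    return mu_bar

X = frozenset({1, 2, 3, 4, 5})
mu = {frozenset(): 0, frozenset({1, 2}): 0, frozenset({3, 4, 5}): 1, X: 1}
mu_bar = completion(mu)
print(len(mu_bar), mu_bar[frozenset({1, 3, 4, 5})])   # 8 1
```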

In Lemma 5.2.3 we proved that $\bar{\mathcal A}$ is the smallest σ-algebra containing $\mathcal A$ and $\mathcal N$. This is an instance of a more general phenomenon: Given a collection $\mathcal B$ of subsets of X, there is always a smallest σ-algebra $\mathcal A$ containing $\mathcal B$. It is called the σ-algebra generated by $\mathcal B$ and is often designated by $\sigma(\mathcal B)$. The proof that $\sigma(\mathcal B)$ exists is not difficult, but quite abstract:

Proposition 5.2.6 Let X be a non-empty set and $\mathcal B$ a collection of subsets of X. Then there exists a smallest σ-algebra $\sigma(\mathcal B)$ containing $\mathcal B$ (in the sense that if $\mathcal C$ is any other σ-algebra containing $\mathcal B$, then $\sigma(\mathcal B) \subseteq \mathcal C$).

Proof: Observe that there is at least one σ-algebra containing $\mathcal B$, namely the σ-algebra of all subsets of X. This guarantees that the following definition makes sense:

$$\sigma(\mathcal B) = \{A \subseteq X \mid A \text{ belongs to all σ-algebras containing } \mathcal B\}$$

It suffices to show that $\sigma(\mathcal B)$ is a σ-algebra, as it then clearly must be the smallest σ-algebra containing $\mathcal B$.

We must check the three conditions in Definition 5.1.1. For (i), just observe that since ∅ belongs to all σ-algebras, it belongs to $\sigma(\mathcal B)$. For (ii), observe that if $A \in \sigma(\mathcal B)$, then A belongs to all σ-algebras containing $\mathcal B$. Since σ-algebras are closed under complements, $A^c$ belongs to the same σ-algebras, and hence to $\sigma(\mathcal B)$. The argument for (iii) is similar: Assume that


the sets $A_n$, n ∈ N, belong to $\sigma(\mathcal B)$. Then they belong to all σ-algebras containing $\mathcal B$, and since σ-algebras are closed under countable unions, the union $\bigcup_{n\in\mathbb N} A_n$ belongs to the same σ-algebras and hence to $\sigma(\mathcal B)$. □
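On a finite set X, $\sigma(\mathcal B)$ can also be computed by a different, constructive route: start with $\mathcal B$, ∅ and X, and keep adding complements and pairwise unions until nothing new appears. This is not the intersection argument used in the proof above; it is a swapped-in technique that gives the same smallest σ-algebra when X is finite, since every added set must lie in any σ-algebra containing $\mathcal B$, and the final family is closed under complements and finite unions. A Python sketch (the function name is our own):

```python
def generated_sigma_algebra(X, generators):
    """Smallest sigma-algebra on a FINITE set X containing the given subsets.

    Closes the family under complements and pairwise unions until it stops
    growing (a constructive alternative to the intersection in Proposition 5.2.6).
    """
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(G) for G in generators}
    while True:
        new = {X - A for A in family} | {A | B for A in family for B in family}
        if new <= family:          # nothing new: closed under both operations
            return family
        family |= new

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2, 3}])
print(len(sigma))   # 8: all unions of the "atoms" {1}, {2, 3}, {4}
print(sorted(sorted(A) for A in sigma))
```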

In many applications, the underlying set X is also a metric space (e.g., X = R^d for the Lebesgue measure). In this case the σ-algebra $\sigma(\mathcal G)$ generated by the collection $\mathcal G$ of open sets is called the Borel σ-algebra, a measure defined on $\sigma(\mathcal G)$ is called a Borel measure, and the sets in $\sigma(\mathcal G)$ are called Borel sets. Most useful measures on metric spaces are either Borel measures or completions of Borel measures.

We can now use the results and terminology of this section to give a more detailed description of the Lebesgue measure on R^d. It turns out (as we shall prove in the next chapter) that there is a unique measure on the Borel σ-algebra $\sigma(\mathcal G)$ such that

$$\mu([a_1,b_1]\times[a_2,b_2]\times\cdots\times[a_d,b_d]) = (b_1-a_1)(b_2-a_2)\cdots(b_d-a_d)$$

whenever $a_1 < b_1, a_2 < b_2, \ldots, a_d < b_d$ (i.e. µ assigns the “right” value to all rectangular boxes). The completion of this measure is the Lebesgue measure on R^d.

We can give a similar description of the space of all infinite sequences of coin tosses in Example 8 of Section 5.1. In this setting one can prove that there is a unique measure on the σ-algebra $\sigma(\mathcal C)$ generated by the cylinder sets such that $P(C_a) = \frac{1}{2^n}$ for all cylinder sets of length n, and the completion of this measure is the one used to model coin tossing.


1. Let X = {0, 1, 2} and let $\mathcal A = \{\emptyset, \{0, 1\}, \{2\}, X\}$.

a) Show that A is a σ-algebra.

b) Define $\mu : \mathcal A \to \overline{\mathbb R}_+$ by: µ(∅) = µ({0, 1}) = 0, µ({2}) = µ(X) = 1. Show that µ is a measure.

c) Show that µ is not complete, and describe the completion $(X, \bar{\mathcal A}, \bar\mu)$ of $(X, \mathcal A, \mu)$.

2. Redo Problem 1 for X = {0, 1, 2, 3}, $\mathcal A = \{\emptyset, \{0, 1\}, \{2, 3\}, X\}$, and µ(∅) = µ({0, 1}) = 0, µ({2, 3}) = µ(X) = 1.

3. Let $(X, \mathcal A, \mu)$ be a complete measure space. Assume that $A, B \in \mathcal A$ with µ(A) = µ(B) < ∞. Show that if A ⊆ C ⊆ B, then $C \in \mathcal A$.

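4. Let $\mathcal A$ and $\mathcal B$ be two collections of subsets of X. Assume that any set in $\mathcal A$ belongs to $\sigma(\mathcal B)$ and that any set in $\mathcal B$ belongs to $\sigma(\mathcal A)$. Show that $\sigma(\mathcal A) = \sigma(\mathcal B)$.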

5. Assume that X is a metric space, and let $\mathcal G$ be the collection of all open sets and $\mathcal F$ the collection of all closed sets. Show that $\sigma(\mathcal G) = \sigma(\mathcal F)$.


6. Let X be a set. An algebra is a collection $\mathcal A$ of subsets of X such that

(i) $\emptyset \in \mathcal A$

(ii) If $A \in \mathcal A$, then $A^c \in \mathcal A$.

(iii) If $A, B \in \mathcal A$, then $A \cup B \in \mathcal A$.

Show that if $\mathcal B$ is a collection of subsets of X, there is a smallest algebra $\mathcal A$ containing $\mathcal B$.

7. Let X be a set. A monotone class is a collection M of subsets of X such that

(i) If {An} is an increasing sequence of sets from M, then $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{M}$.

(ii) If {An} is a decreasing sequence of sets from M, then $\bigcap_{n\in\mathbb{N}} A_n \in \mathcal{M}$.

Show that if B is a collection of subsets of X, there is a smallest monotone class M containing B.

5.3 Measurable functions

One of the main purposes of measure theory is to provide a richer and more flexible foundation for integration theory, but before we turn to integration, we need to look at the functions we hope to integrate, the measurable functions. As functions taking the values ∞ and −∞ will occur naturally as limits of sequences of ordinary functions, we choose to include them from the beginning; hence we shall study functions

$$f\colon X \to \overline{\mathbb{R}}$$

where (X, A, µ) is a measure space and R̄ = R ∪ {−∞, ∞} is the set of extended real numbers. Don’t spend too much effort on trying to figure out what −∞ and ∞ “really” are; they are just convenient symbols for describing divergence.

To some extent we may extend ordinary algebra to R̄; e.g., we shall let

$$\infty + \infty = \infty, \qquad -\infty - \infty = -\infty$$

and

$$\infty\cdot\infty = \infty, \qquad (-\infty)\cdot\infty = -\infty, \qquad (-\infty)\cdot(-\infty) = \infty.$$

If r ∈ R, we similarly let

$$\infty + r = \infty, \qquad -\infty + r = -\infty.$$

For products, we have to take the sign of r into account, hence

$$\infty\cdot r = \begin{cases} \infty & \text{if } r > 0\\ -\infty & \text{if } r < 0\end{cases}$$


and similarly for (−∞) · r.

All the rules above are natural and intuitive. Expressions that do not have an intuitive interpretation are usually left undefined; e.g., ∞ − ∞ is not defined. There is one exception to this rule: it turns out that in measure theory (but not in other parts of mathematics!) it is convenient to define 0 · ∞ = ∞ · 0 = 0.

Since algebraic expressions with extended real numbers are not always defined, we need to be careful and always check that our expressions make sense.
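Python floats also contain infinities, but IEEE 754 arithmetic follows different rules from the ones above: it gives 0 · ∞ = NaN and ∞ − ∞ = NaN. The following minimal sketch (the helper names ext_add and ext_mul are our own, and math.inf stands for ∞) shows one way to encode the conventions of this section on top of floats: sums of infinities of opposite sign are rejected as undefined, and products follow the measure-theoretic rule 0 · ∞ = 0.

```python
import math

INF = math.inf  # stands for the extended real number +infinity


def ext_add(a: float, b: float) -> float:
    """Sum of two extended reals; infinity - infinity is left undefined."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("inf + (-inf) is undefined")
    return a + b


def ext_mul(a: float, b: float) -> float:
    """Product of two extended reals, using the measure-theory convention 0 * inf = 0."""
    if a == 0 or b == 0:
        return 0.0  # checked first: plain floats give inf * 0 == nan
    return a * b  # the sign rules for inf * r and inf * inf already match IEEE 754


# Plain floating-point arithmetic disagrees with the convention:
assert math.isnan(INF * 0)
assert ext_mul(INF, 0) == 0 and ext_mul(-INF, 3) == -INF
```

The zero test must come before the multiplication, since the float product itself would already be nan.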

We are now ready to define measurable functions:

Definition 5.3.1 Let (X, A, µ) be a measure space. A function f : X → R̄ is measurable (with respect to A) if

$$f^{-1}([-\infty, r)) \in \mathcal{A}$$

for all r ∈ R. In other words, the set

$$\{x \in X : f(x) < r\}$$

must be measurable for all r ∈ R.

The half-open intervals in the definition are just a convenient starting point for showing that the inverse images of open and closed sets are measurable, but to prove this, we need a little lemma:

Lemma 5.3.2 Any non-empty, open set G in R is a countable union of open intervals.

Proof: Call an open interval (a, b) rational if the endpoints a, b are rational numbers. As there are only countably many rational numbers, there are only countably many rational intervals. It is not hard to check that G is the union of those rational intervals that are contained in G. □

Proposition 5.3.3 If f : X → R̄ is measurable, then f⁻¹(I) ∈ A for all intervals I = (s, r), I = (s, r], I = [s, r), I = [s, r] where s, r ∈ R. In fact, f⁻¹(A) ∈ A for all open and closed sets A ⊆ R.

Proof: We use that inverse images commute with intersections, unions and complements. First observe that for any r ∈ R,

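$$f^{-1}([-\infty, r]) = f^{-1}\Big(\bigcap_{n\in\mathbb{N}} \big[-\infty, r+\tfrac{1}{n}\big)\Big) = \bigcap_{n\in\mathbb{N}} f^{-1}\big(\big[-\infty, r+\tfrac{1}{n}\big)\big) \in \mathcal{A}$$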


which shows that the inverse images of closed intervals [−∞, r] are measurable. Taking complements, we see that the inverse images of intervals of the form [s, ∞] and (s, ∞] are measurable:

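$$f^{-1}([s,\infty]) = f^{-1}\big([-\infty,s)^c\big) = \big(f^{-1}([-\infty,s))\big)^c \in \mathcal{A}$$

and

$$f^{-1}((s,\infty]) = f^{-1}\big([-\infty,s]^c\big) = \big(f^{-1}([-\infty,s])\big)^c \in \mathcal{A}$$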

To show that the inverse images of finite intervals are measurable, we just take intersections, e.g.,

$$f^{-1}((s,r)) = f^{-1}\big([-\infty,r)\cap(s,\infty]\big) = f^{-1}([-\infty,r))\cap f^{-1}((s,\infty]) \in \mathcal{A}$$

If A is open, we know from the lemma above that it is a countable union $A = \bigcup_{n\in\mathbb{N}} I_n$ of open intervals. Hence

$$f^{-1}(A) = f^{-1}\Big(\bigcup_{n\in\mathbb{N}} I_n\Big) = \bigcup_{n\in\mathbb{N}} f^{-1}(I_n) \in \mathcal{A}$$

Finally, to prove the proposition for closed sets A, we are going to use that the complement (in R) of a closed set is an open set. We have to be a little careful, however, as complements in R̄ are not the same as complements in R. Note that if O = R \ A is the complement of A in R, then O is open, and A = Oᶜ ∩ R, where Oᶜ is the complement of O in R̄. Hence

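$$f^{-1}(A) = f^{-1}(O^c\cap\mathbb{R}) = f^{-1}(O)^c \cap f^{-1}(\mathbb{R}) \in \mathcal{A}$$

(note that R is itself an open set, so f⁻¹(R) ∈ A by what we have already shown). □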

It is sometimes convenient to use other kinds of intervals than those in the definition to check that a function is measurable:

Proposition 5.3.4 Let (X, A, µ) be a measure space and consider a function f : X → R̄. If either

(i) f−1([−∞, r]) ∈ A for all r ∈ R, or

(ii) f−1([r,∞]) ∈ A for all r ∈ R, or

(iii) f−1((r,∞]) ∈ A for all r ∈ R,

then f is measurable.

Proof: In each case we just have to check that f⁻¹([−∞, r)) ∈ A for all r ∈ R. This can be done by the techniques in the previous proof. The details are left to the reader. □

The next result tells us that there are many measurable functions. Recall the definition of Borel measures and completed Borel measures from the end of Section 5.2.


Proposition 5.3.5 Let (X, d) be a metric space and let µ be a Borel or a completed Borel measure on X. Then all continuous functions f : X → R are measurable.

Proof: Since f is continuous and takes values in R,

$$f^{-1}([-\infty, r)) = f^{-1}((-\infty, r))$$

is an open set by Proposition 2.3.9, and hence measurable, since the Borel σ-algebra is generated by the open sets (and its completion contains all Borel sets). □

We shall now prove a series of results showing how we can obtain new measurable functions from old ones. These results are not very exciting, but they are necessary for the rest of the theory. Note that the functions in the next two propositions take values in R and not R̄.

Proposition 5.3.6 Let (X, A, µ) be a measure space. If f : X → R is measurable, then φ ∘ f is measurable for all continuous functions φ : R → R. In particular, f² is measurable.

Proof: We have to check that

$$(\varphi\circ f)^{-1}((-\infty, r)) = f^{-1}\big(\varphi^{-1}((-\infty, r))\big)$$

is measurable. Since φ is continuous, φ⁻¹((−∞, r)) is open, and consequently f⁻¹(φ⁻¹((−∞, r))) is measurable by Proposition 5.3.3. To see that f² is measurable, apply the first part of the proposition to the function φ(x) = x². □

Proposition 5.3.7 Let (X, A, µ) be a measure space. If the functions f, g : X → R are measurable, so are f + g, f − g, and fg.

Proof: To prove that f + g is measurable, observe first that f(x) + g(x) < r means that f(x) < r − g(x). Since the rational numbers are dense, it follows that there is a rational number q such that f(x) < q < r − g(x). Hence

$$(f+g)^{-1}([-\infty, r)) = \{x\in X \mid f(x)+g(x) < r\} = \bigcup_{q\in\mathbb{Q}} \big(\{x\in X \mid f(x) < q\} \cap \{x\in X \mid g(x) < r-q\}\big)$$

which is measurable since Q is countable and a countable union of measurable sets is measurable. Since −g = φ ∘ g with φ(x) = −x is measurable by Proposition 5.3.6, it follows that f − g = f + (−g) is measurable as well.

To prove that fg is measurable, note that by Proposition 5.3.6 and what we have already proved, f², g², and (f + g)² are measurable, and hence

$$fg = \tfrac{1}{2}\big((f+g)^2 - f^2 - g^2\big)$$

is measurable (check the details). □

We would often like to apply the result above to functions taking values in the extended real numbers, but the problem is that the expressions need not make sense. As we shall mainly be interested in functions that are finite except on a set of measure zero, there is a way out of the problem. Let us start with the terminology.

Definition 5.3.8 Let (X, A, µ) be a measure space. We say that a measurable function f : X → R̄ is finite almost everywhere if the set {x ∈ X : f(x) = ±∞} has measure zero. We say that two measurable functions f, g : X → R̄ are equal almost everywhere if the set {x ∈ X : f(x) ≠ g(x)} has measure zero. We usually abbreviate “almost everywhere” by “a.e.”.

If the measurable functions f and g are finite a.e., we can modify them to get measurable functions f′ and g′ which take values in R and are equal a.e. to f and g, respectively (see Exercise 13). By Proposition 5.3.7, f′ + g′, f′ − g′ and f′g′ are measurable, and for many purposes they are good representatives for f + g, f − g and fg.

Let us finally see what happens to limits of sequences.

Proposition 5.3.9 Let (X, A, µ) be a measure space. If {fn} is a sequence of measurable functions fn : X → R̄, then $\sup_{n\in\mathbb{N}} f_n(x)$, $\inf_{n\in\mathbb{N}} f_n(x)$, $\limsup_{n\to\infty} f_n(x)$ and $\liminf_{n\to\infty} f_n(x)$ are measurable. If the sequence converges pointwise, then $\lim_{n\to\infty} f_n(x)$ is a measurable function.

Proof: To see that $f(x) = \sup_{n\in\mathbb{N}} f_n(x)$ is measurable, we use Proposition 5.3.4(iii). For any r ∈ R,

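$$f^{-1}((r,\infty]) = \Big\{x\in X : \sup_{n\in\mathbb{N}} f_n(x) > r\Big\} = \bigcup_{n\in\mathbb{N}} \{x\in X : f_n(x) > r\} = \bigcup_{n\in\mathbb{N}} f_n^{-1}((r,\infty]) \in \mathcal{A}$$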

and hence f is measurable by Proposition 5.3.4(iii). A similar argument can be used for $\inf_{n\in\mathbb{N}} f_n(x)$.

To show that $\limsup_{n\to\infty} f_n(x)$ is measurable, first observe that the functions

$$g_k(x) = \sup_{n\ge k} f_n(x)$$

are measurable by what we have already shown. Since

$$\limsup_{n\to\infty} f_n(x) = \lim_{k\to\infty} g_k(x) = \inf_{k\in\mathbb{N}} g_k(x),$$

(for the last equality, use that the sequence {gk(x)} is decreasing) the measurability of $\limsup_{n\to\infty} f_n(x)$ follows. A completely similar proof can be used to prove that $\liminf_{n\to\infty} f_n(x)$ is measurable. Finally, if the sequence converges pointwise, then $\lim_{n\to\infty} f_n(x) = \limsup_{n\to\infty} f_n(x)$, which is hence measurable. □

The results above are quite important. Mathematical analysis abounds in limit arguments, and knowing that the limit function is measurable is often a key ingredient in these arguments.


1. Show that if f : X → R̄ is measurable, the sets f⁻¹({∞}) and f⁻¹({−∞}) are measurable.

2. Complete the proof of Proposition 5.3.3 by showing that f⁻¹ of the intervals (−∞, r), (−∞, r], [r, ∞), (r, ∞), (−∞, ∞), where r ∈ R, are measurable.


4. Fill in the details in the proof of Lemma 5.3.2. Explain in particular why there is only a countable number of rational intervals and why the open set G is the union of the rational intervals contained in it.

5. Show that if f1, f2, . . . , fn are measurable functions with values in R, then f1 + f2 + · · · + fn and f1 f2 · · · fn are measurable.

6. The indicator function of a set A ⊆ X is defined by

$$\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x\in A\\ 0 & \text{otherwise}\end{cases}$$

a) Show that 1A is a measurable function if and only if A ∈ A.

b) A simple function is a function f : X → R of the form

$$f(x) = \sum_{i=1}^n a_i\mathbf{1}_{A_i}(x)$$

where a1, a2, . . . , an ∈ R and A1, A2, . . . , An ∈ A. Show that all simple functions are measurable.

7. Show that if f : X → R̄ is measurable, then f⁻¹(B) ∈ A for all Borel sets B (it may help to take a look at Exercise 5.1.10).

8. Let {En} be a disjoint sequence of measurable sets such that $\bigcup_{n=1}^\infty E_n = X$, and let {fn} be a sequence of measurable functions. Show that the function defined by

$$f(x) = f_n(x) \quad\text{when } x\in E_n$$

is measurable.

9. Fill in the details of the proof of the fg part of Proposition 5.3.7. You may want to prove first that if h : X → R is measurable, then so is h/2.

10. Prove the inf- and the lim inf-part of Proposition 5.3.9.


11. Let us write f ∼ g to denote that f and g are two measurable functions which are equal a.e. Show that ∼ is an equivalence relation, i.e.:

(i) f ∼ f.

(ii) If f ∼ g, then g ∼ f.

(iii) If f ∼ g and g ∼ h, then f ∼ h.

12. Let (X,A, µ) be a measure space.

a) Assume that the measure space is complete. Show that if f : X → R̄ is measurable and g : X → R̄ equals f almost everywhere, then g is measurable.

b) Show by example that the result in a) does not hold without the completeness condition. You may, e.g., use the measure space in Exercise 5.2.1.

13. Assume that the measurable function f : X → R̄ is finite a.e. Define a new function f′ : X → R by

$$f'(x) = \begin{cases} f(x) & \text{if } f(x) \text{ is finite}\\ 0 & \text{otherwise}\end{cases}$$

Show that f′ is measurable and equal to f a.e.

14. A sequence {fn} of measurable functions is said to converge almost everywhere to f if there is a set A of measure 0 such that fn(x) → f(x) for all x ∉ A.

a) Show that if the measure space is complete, then f is necessarily measurable.

b) Show by example that the result in a) doesn’t hold without the completeness assumption (take a look at Problem 12 above).

15. Let X be a set and F a collection of functions f : X → R. Show that there is a smallest σ-algebra A on X such that all the functions f ∈ F are measurable with respect to A (this is called the σ-algebra generated by F). Show that if X is a metric space and all the functions in F are continuous, then A ⊆ B, where B is the Borel σ-algebra.

5.4 Integration of simple functions

We are now ready to look at integration. The integrals we shall work with are of the form

$$\int f\,d\mu$$

where f is a measurable function and µ is a measure, and the theory is at the same time a refinement and a generalization of the classical theory of Riemann integration that you know from calculus.

It is a refinement because if we choose µ to be the one-dimensional Lebesgue measure, the new integral ∫ f dµ equals the traditional Riemann integral ∫ f(x) dx for all Riemann integrable functions, but is defined for many more functions. The same holds in higher dimensions: If µ is n-dimensional Lebesgue measure, then ∫ f dµ equals the Riemann integral ∫ f(x1, . . . , xn) dx1 . . . dxn for all Riemann integrable functions, but is defined for many more functions. The theory is also a vast generalization of the old one, as it will allow us to integrate functions on all measure spaces and not only on Rⁿ.

One of the advantages of the new (Lebesgue) theory is that it will allow us to interchange limits and integrals:

$$\lim_{n\to\infty}\int f_n\,d\mu = \int \lim_{n\to\infty} f_n\,d\mu$$

in much greater generality than before. Such interchanges are of great importance in many arguments, but are problematic for the Riemann integral, as there is in general no reason why the limit function $\lim_{n\to\infty} f_n$ should be Riemann integrable even when the individual functions fn are. According to Proposition 5.3.9, $\lim_{n\to\infty} f_n$ is measurable whenever the fn’s are, and this makes it much easier to establish limit theorems for the new kind of integrals.

We shall develop integration theory in three steps: In this section we shall look at integrals of so-called simple functions, which are generalizations of step functions; in the next section we shall introduce integrals of nonnegative measurable functions; and in Section 5.6 we shall extend the theory to functions taking both positive and negative values.

Throughout this section we shall be working with a measure space (X, A, µ). If A is a subset of X, we define its indicator function by

$$\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x\in A\\ 0 & \text{otherwise}\end{cases}$$

The indicator function is measurable if and only if A is measurable.

A measurable function f : X → R is called a simple function if it takes only finitely many different values a1, a2, . . . , an. We may then write

$$f(x) = \sum_{i=1}^n a_i\mathbf{1}_{A_i}(x)$$

where the sets Ai = {x ∈ X | f(x) = ai} are disjoint and measurable. Note that if one of the ai’s is zero, the term does not contribute to the sum, and it is occasionally convenient to drop it.

If we instead start with measurable sets B1, B2, . . . , Bm and real numbers b1, b2, . . . , bm, then

$$g(x) = \sum_{i=1}^m b_i\mathbf{1}_{B_i}(x)$$


is measurable and takes only finitely many values, and hence is a simple function. The difference between f and g is that the sets A1, A2, . . . , An in f are disjoint with union X, and that the numbers a1, a2, . . . , an are distinct. The same need not be the case for g. We say that the simple function f is in standard form, while g is not (unless, of course, the bi’s happen to be distinct and the sets Bi are disjoint and make up all of X).
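On a finite set X, passing from an arbitrary representation to the standard form can be made completely concrete: evaluate g pointwise and group the points by value, which is exactly how the sets Ai = {x ∈ X | f(x) = ai} were defined. A sketch in Python, where the helper name standard_form and the encoding of subsets as Python sets are our own assumptions:

```python
from collections import defaultdict


def standard_form(points, terms):
    """Standard form of g = sum_i b_i * 1_{B_i} on a finite set X.

    points -- the finite set X
    terms  -- list of pairs (b_i, B_i), each B_i a subset of X (the sets may overlap)
    Returns a dict {a: A} whose keys are the distinct values a of g and whose
    values are the level sets A = {x : g(x) = a}; these are disjoint with union X.
    """
    level_sets = defaultdict(set)
    for x in points:
        value = sum(b for b, B in terms if x in B)  # g(x)
        level_sets[value].add(x)
    return dict(level_sets)


# g = 1_{0,1} + 1_{1,2} on X = {0, 1, 2, 3}: overlapping sets, so not in standard form.
# Its standard form has value 1 on {0, 2}, value 2 on {1} and value 0 on {3}.
result = standard_form({0, 1, 2, 3}, [(1, {0, 1}), (1, {1, 2})])
```

The value 0 appears as a key here; as remarked above, such a term may be dropped without changing the function.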

You may think of a simple function as a generalized step function. The difference is that step functions are constant on intervals (in R), rectangles (in R²), or boxes (in higher dimensions), while a simple function need only be constant on much more complicated (but still measurable) sets.

We can now define the integral of a nonnegative simple function.

Definition 5.4.1 Assume that

$$f(x) = \sum_{i=1}^n a_i\mathbf{1}_{A_i}(x)$$

is a nonnegative simple function in standard form. Then the integral of f with respect to µ is defined by

$$\int f\,d\mu = \sum_{i=1}^n a_i\,\mu(A_i)$$

Recall that we are using the convention that 0 · ∞ = 0, and hence aiµ(Ai) = 0 if ai = 0 and µ(Ai) = ∞.
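The definition, including the convention 0 · ∞ = 0, can be sketched in Python for a toy measure given by point masses on a finite set. The function integral_simple, the dict encoding {ai: Ai} of the standard form, and the particular point masses are illustrative assumptions only; the test a != 0 is what implements the convention, since the floating-point product 0.0 * math.inf would be nan.

```python
import math


def integral_simple(standard_form, measure):
    """Integral of a nonnegative simple function in standard form (Definition 5.4.1).

    standard_form -- dict {a_i: A_i} with distinct values a_i >= 0 and disjoint sets A_i
    measure       -- function returning mu(A) in [0, inf] for a set A
    """
    total = 0.0
    for a, A in standard_form.items():
        if a < 0:
            raise ValueError("Definition 5.4.1 only covers nonnegative simple functions")
        if a != 0:  # convention 0 * inf = 0: terms with a_i = 0 contribute nothing
            total += a * measure(A)
    return total


# A toy measure on X = {"p", "q", "r"} given by point masses; "r" carries infinite mass.
weights = {"p": 2.0, "q": 0.5, "r": math.inf}


def mu(A):
    return sum(weights[x] for x in A)


f = {3.0: {"p"}, 1.0: {"q"}, 0.0: {"r"}}  # f = 3 on p, 1 on q, 0 on r
print(integral_simple(f, mu))  # 3*2 + 1*0.5 + 0*inf = 6.5
```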

Note that the integral of an indicator function is

$$\int \mathbf{1}_A\,d\mu = \mu(A)$$

To see that the definition is reasonable, assume that you are in R². Since µ(Ai) measures the area of the set Ai, the product aiµ(Ai) measures in an intuitive way the volume of the solid with base Ai and height ai.

We need to know that the formula in the definition also holds when the simple function is not in standard form. The first step is the following simple lemma:

Lemma 5.4.2 If

$$g(x) = \sum_{j=1}^m b_j\mathbf{1}_{B_j}(x)$$

is a nonnegative simple function where the Bj’s are disjoint and $X = \bigcup_{j=1}^m B_j$, then

$$\int g\,d\mu = \sum_{j=1}^m b_j\,\mu(B_j)$$


Proof: The problem is that the values b1, b2, . . . , bm need not be distinct, but this is easily fixed: If c1, c2, . . . , ck are the distinct values taken by g, let $b_{i_1}, b_{i_2}, \dots, b_{i_{n_i}}$ be the bj’s that are equal to ci, and let $C_i = B_{i_1}\cup B_{i_2}\cup\dots\cup B_{i_{n_i}}$ (make a drawing!). Since the Bj’s are disjoint, $\mu(C_i) = \mu(B_{i_1}) + \mu(B_{i_2}) + \dots + \mu(B_{i_{n_i}})$, and hence

$$\sum_{j=1}^m b_j\,\mu(B_j) = \sum_{i=1}^k c_i\,\mu(C_i)$$

Since $g(x) = \sum_{i=1}^k c_i\mathbf{1}_{C_i}(x)$ is the standard form representation of g, we have

$$\int g\,d\mu = \sum_{i=1}^k c_i\,\mu(C_i)$$

and the lemma is proved. □

The next step is also easy:

Proposition 5.4.3 Assume that f and g are two nonnegative simple functions, and let c be a nonnegative real number. Then

(i) $\int cf\,d\mu = c\int f\,d\mu$

(ii) $\int (f+g)\,d\mu = \int f\,d\mu + \int g\,d\mu$

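Proof: (i) is left to the reader. To prove (ii), let

$$f(x) = \sum_{i=1}^n a_i\mathbf{1}_{A_i}(x), \qquad g(x) = \sum_{j=1}^m b_j\mathbf{1}_{B_j}(x)$$

be standard form representations of f and g, and define Ci,j = Ai ∩ Bj. The sets Ci,j are disjoint with union X, and f = ai and g = bj on Ci,j. By the lemma above

$$\int f\,d\mu = \sum_{i,j} a_i\,\mu(C_{i,j}) \qquad\text{and}\qquad \int g\,d\mu = \sum_{i,j} b_j\,\mu(C_{i,j})$$

and also

$$\int (f+g)\,d\mu = \sum_{i,j} (a_i+b_j)\,\mu(C_{i,j})$$

since the value of f + g on Ci,j is ai + bj. Splitting the last sum into two gives (ii). □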


Remark: Using induction, we can extend part (ii) above to longer sums:

$$\int (f_1+f_2+\dots+f_n)\,d\mu = \int f_1\,d\mu + \int f_2\,d\mu + \dots + \int f_n\,d\mu$$

for all nonnegative simple functions f1, f2, . . . , fn.

We can now prove that the formula in Definition 5.4.1 holds for all representations of simple functions, and not only the standard ones:

Corollary 5.4.4 If $f(x) = \sum_{i=1}^n a_i\mathbf{1}_{A_i}(x)$ is a simple function with ai ≥ 0 for all i, then

$$\int f\,d\mu = \sum_{i=1}^n a_i\,\mu(A_i)$$

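Proof: By the results above

$$\int f\,d\mu = \int \sum_{i=1}^n a_i\mathbf{1}_{A_i}\,d\mu = \sum_{i=1}^n \int a_i\mathbf{1}_{A_i}\,d\mu = \sum_{i=1}^n a_i\int \mathbf{1}_{A_i}\,d\mu = \sum_{i=1}^n a_i\,\mu(A_i)$$

□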
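Corollary 5.4.4 can be checked numerically on a small example. In the sketch below (a hypothetical finite measure space with point masses 1/2, 1/3 and 1, chosen by us; exact fractions avoid rounding), the sets in the representation overlap, yet the formula Σ ai µ(Ai) gives the same value as first rewriting f in standard form and then applying Definition 5.4.1.

```python
from fractions import Fraction

# Point masses defining a measure on the finite set X = {0, 1, 2} (an assumption for the demo).
weight = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1)}


def mu(A):
    return sum((weight[x] for x in A), Fraction(0))


def integral_via_representation(terms):
    """sum_i a_i * mu(A_i) for an arbitrary representation (the formula of Corollary 5.4.4)."""
    return sum((a * mu(A) for a, A in terms), Fraction(0))


def integral_via_standard_form(terms):
    """Rewrite f in standard form pointwise, then apply Definition 5.4.1."""
    level_sets = {}
    for x in weight:
        value = sum(a for a, A in terms if x in A)  # f(x)
        level_sets.setdefault(value, set()).add(x)
    return sum((v * mu(A) for v, A in level_sets.items()), Fraction(0))


# Overlapping sets: f = 2*1_{0,1} + 3*1_{1,2} + 1*1_{0,1,2}, so f = 3, 6, 4 at the points 0, 1, 2.
terms = [(2, {0, 1}), (3, {1, 2}), (1, {0, 1, 2})]
print(integral_via_representation(terms), integral_via_standard_form(terms))  # 15/2 15/2
```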

We need to prove yet another almost obvious result. We write g ≤ f to say that g(x) ≤ f(x) for all x.

Proposition 5.4.5 Assume that f and g are two nonnegative simple functions. If g ≤ f, then

$$\int g\,d\mu \le \int f\,d\mu$$

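Proof: We use the same trick as in the proof of Proposition 5.4.3: Let

$$f(x) = \sum_{i=1}^n a_i\mathbf{1}_{A_i}(x), \qquad g(x) = \sum_{j=1}^m b_j\mathbf{1}_{B_j}(x)$$

be standard form representations of f and g, and define Ci,j = Ai ∩ Bj. If Ci,j ≠ ∅, then ai ≥ bj since g ≤ f, and if Ci,j = ∅, then µ(Ci,j) = 0. Hence

$$\int f\,d\mu = \sum_{i,j} a_i\,\mu(C_{i,j}) \ge \sum_{i,j} b_j\,\mu(C_{i,j}) = \int g\,d\mu$$

□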

We shall end this section with a key result on limits of integrals, but first we need some notation. Observe that if $f = \sum_{i=1}^n a_i\mathbf{1}_{A_i}$ is a simple function and B is a measurable set, then $\mathbf{1}_B f = \sum_{i=1}^n a_i\mathbf{1}_{A_i\cap B}$ is also a simple function. We shall write

$$\int_B f\,d\mu = \int \mathbf{1}_B f\,d\mu$$

and call this the integral of f over B. The lemma below may seem obvious, but it is the key to many later results.

Lemma 5.4.6 Assume that B is a measurable set, b a nonnegative real number, and {fn} an increasing sequence of nonnegative simple functions such that $\lim_{n\to\infty} f_n(x) \ge b$ for all x ∈ B. Then $\lim_{n\to\infty}\int_B f_n\,d\mu \ge b\,\mu(B)$.

Proof: Observe first that we may assume that b > 0 and µ(B) > 0, as otherwise the conclusion obviously holds. Let a be any positive number less than b, and define

$$A_n = \{x\in B \mid f_n(x) \ge a\}$$

Since {fn(x)} is increasing with limit at least b > a for all x ∈ B, we see that the sequence {An} is increasing and that

$$B = \bigcup_{n=1}^\infty A_n$$

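By continuity of measure (Proposition 5.1.5 a)), $\mu(B) = \lim_{n\to\infty}\mu(A_n)$, and hence for any positive number m less than µ(B), we can find an N ∈ N such that µ(An) > m when n ≥ N. Since $\mathbf{1}_B f_n \ge a\mathbf{1}_{A_n}$, Proposition 5.4.5 gives

$$\int_B f_n\,d\mu \ge \int_{A_n} a\,d\mu = a\,\mu(A_n) \ge am$$

whenever n ≥ N. Since this holds for any number a less than b and any number m less than µ(B), we must have $\lim_{n\to\infty}\int_B f_n\,d\mu \ge b\,\mu(B)$ (the limit exists since, by Proposition 5.4.5, the integrals increase with n). □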

To get the result we need, we extend the lemma to simple functions:

Proposition 5.4.7 Let g be a nonnegative simple function and assume that {fn} is an increasing sequence of nonnegative simple functions such that $\lim_{n\to\infty} f_n(x) \ge g(x)$ for all x. Then

$$\lim_{n\to\infty}\int f_n\,d\mu \ge \int g\,d\mu$$

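Proof: Let $g(x) = \sum_{i=1}^m b_i\mathbf{1}_{B_i}(x)$ be the standard form of g. If any of the bi’s is zero, we may just drop that term in the sum, so that we from now on assume that all the bi’s are nonzero. Since the Bi’s are disjoint, $\mathbf{1}_{B_1\cup\dots\cup B_m} = \mathbf{1}_{B_1}+\dots+\mathbf{1}_{B_m}$, and by Proposition 5.4.3(ii) (in the extended form of the Remark following it) we have

$$\begin{aligned}\int_{B_1\cup B_2\cup\dots\cup B_m} f_n\,d\mu &= \int (\mathbf{1}_{B_1}+\mathbf{1}_{B_2}+\dots+\mathbf{1}_{B_m})\,f_n\,d\mu = \int (\mathbf{1}_{B_1}f_n+\mathbf{1}_{B_2}f_n+\dots+\mathbf{1}_{B_m}f_n)\,d\mu\\ &= \int_{B_1} f_n\,d\mu + \int_{B_2} f_n\,d\mu + \dots + \int_{B_m} f_n\,d\mu = \sum_{i=1}^m \int_{B_i} f_n\,d\mu\end{aligned}$$

By the lemma, $\lim_{n\to\infty}\int_{B_i} f_n\,d\mu \ge b_i\,\mu(B_i)$, and hence (using Proposition 5.4.5 for the first inequality)

$$\begin{aligned}\lim_{n\to\infty}\int f_n\,d\mu &\ge \lim_{n\to\infty}\int_{B_1\cup B_2\cup\dots\cup B_m} f_n\,d\mu = \lim_{n\to\infty}\sum_{i=1}^m\int_{B_i} f_n\,d\mu\\ &= \sum_{i=1}^m \lim_{n\to\infty}\int_{B_i} f_n\,d\mu \ge \sum_{i=1}^m b_i\,\mu(B_i) = \int g\,d\mu\end{aligned}$$

□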

We are now ready to extend the integral to all nonnegative, measurable functions. This will be the topic of the next section.


1. Show that if f is a measurable function, then the level set

$$A_a = \{x\in X \mid f(x) = a\}$$

is measurable for all a ∈ R.

2. Check that according to Definition 5.4.1, ∫ 1A dµ = µ(A) for all A ∈ A.

3. Prove part (i) of Proposition 5.4.3.

4. Show that if f1, f2, . . . , fn are simple functions, then so are

$$h(x) = \max\{f_1(x), f_2(x), \dots, f_n(x)\}$$

and

$$h(x) = \min\{f_1(x), f_2(x), \dots, f_n(x)\}$$

5. Let µ be Lebesgue measure, and define A = Q ∩ [0, 1]. The function 1A is not integrable in the Riemann sense. What is ∫ 1A dµ?

6. Let f be a nonnegative simple function on a measure space (X, A, µ). Show that

$$\nu(B) = \int_B f\,d\mu$$

defines a measure ν on (X, A).


5.5 Integrals of nonnegative functions

We are now ready to define the integral of a general, nonnegative, measurable function. Throughout the section, (X, A, µ) is a measure space.

Definition 5.5.1 If f : X → [0, ∞] is measurable, we define

$$\int f\,d\mu = \sup\Big\{\int g\,d\mu \;\Big|\; g \text{ is a nonnegative simple function, } g\le f\Big\}$$

Remark: Note that if f is a simple function, we now have two definitions of ∫ f dµ: the original one in Definition 5.4.1 and a new one in the definition above. It follows from Proposition 5.4.5 that the two definitions agree.

The definition above is natural, but also quite abstract, and we shall work toward a reformulation that is often easier to handle.

Proposition 5.5.2 Let f : X → [0, ∞] be a measurable function, and assume that {hn} is an increasing sequence of nonnegative simple functions converging pointwise to f. Then

$$\lim_{n\to\infty}\int h_n\,d\mu = \int f\,d\mu$$

Proof: Since the sequence {∫ hn dµ} is increasing by Proposition 5.4.5, the limit clearly exists (it may be ∞), and since ∫ hn dµ ≤ ∫ f dµ for all n, we must have

$$\lim_{n\to\infty}\int h_n\,d\mu \le \int f\,d\mu$$

To get the opposite inequality, it suffices to show that

$$\lim_{n\to\infty}\int h_n\,d\mu \ge \int g\,d\mu$$

for each nonnegative simple function g ≤ f, but this follows from Proposition 5.4.7. □

The proposition above would lose much of its power if there weren’t any increasing sequences of simple functions converging to f. The next result tells us that there always are. Pay attention to the argument; it is the key to why the theory works.

Proposition 5.5.3 If f : X → [0, ∞] is measurable, there is an increasing sequence {hn} of nonnegative simple functions converging pointwise to f. Moreover, for each n and each x ∈ X, either f(x) − 1/2ⁿ < hn(x) ≤ f(x) or hn(x) = 2ⁿ.

Proof: To construct the simple function hn, we cut the interval [0, 2ⁿ) into half-open subintervals of length 1/2ⁿ, i.e., intervals

$$I_k = \Big[\frac{k}{2^n}, \frac{k+1}{2^n}\Big)$$

where 0 ≤ k < 2²ⁿ, and then let

$$A_k = f^{-1}(I_k)$$

We now define

$$h_n(x) = \sum_{k=0}^{2^{2n}-1} \frac{k}{2^n}\mathbf{1}_{A_k}(x) + 2^n\,\mathbf{1}_{\{x \,\mid\, f(x)\ge 2^n\}}(x)$$

By definition, hn is a simple function no greater than f. Since the intervals get narrower and narrower and cover more and more of [0, ∞), it is easy to see that hn converges pointwise to f. To see why the sequence increases, note that each time we increase n by one, we split each of the former intervals Ik in two, and this will cause the new simple function to equal the old one for some x’s and jump one step upwards for others (make a drawing).

The last statement follows directly from the construction. □
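Since hn(x) depends on x only through the value y = f(x), the construction amounts to a rounding rule on [0, ∞]. A minimal Python sketch (the name h and the use of floats, with math.inf for ∞, are our own choices):

```python
import math


def h(n: int, y: float) -> float:
    """Value of h_n at a point x where f(x) = y, for y in [0, inf].

    [0, 2**n) is cut into the intervals I_k = [k/2**n, (k+1)/2**n), 0 <= k < 2**(2n);
    y is rounded down to the left endpoint of the interval containing it,
    and values y >= 2**n (including y = inf) are replaced by 2**n.
    """
    if y >= 2 ** n:
        return float(2 ** n)
    k = math.floor(y * 2 ** n)  # the index k with y in I_k
    return k / 2 ** n


# For y = pi the values increase with n and stay within 1/2**n of pi once 2**n > pi:
print([h(n, math.pi) for n in range(1, 7)])
# [2.0, 3.0, 3.125, 3.125, 3.125, 3.140625]
```

This also shows why measurability of f is all that is needed: the level sets of hn are the sets Ak = f⁻¹(Ik) together with {x | f(x) ≥ 2ⁿ}.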

Remark: You should compare the partitions in the proof above to the partitions you have previously seen in Riemann integration. When we integrate a function of one variable in calculus, we partition an interval [a, b] on the x-axis and use this partition to approximate the original function by a step function. In the proof above, we instead partitioned the y-axis into intervals and used this partition to approximate the original function by a simple function. The latter approach gives us much better control over what is going on; the partition controls the oscillations of the function. The price we have to pay is that we get simple functions instead of step functions, and to use simple functions for integration, we need measure theory. This observation may look like a curiosity, but it is really the key to the success of Lebesgue integration.

Let us combine the last two results in a handy corollary:

Corollary 5.5.4 If f : X → [0, ∞] is measurable, there is an increasing sequence {hn} of nonnegative simple functions converging pointwise to f, and

$$\int f\,d\mu = \lim_{n\to\infty}\int h_n\,d\mu$$

Let us take a look at some properties of the integral.

Proposition 5.5.5 Assume that f, g : X → [0, ∞] are measurable functions and that c is a nonnegative real number. Then:

(i) $\int cf\,d\mu = c\int f\,d\mu$.

(ii) $\int (f+g)\,d\mu = \int f\,d\mu + \int g\,d\mu$.

(iii) If g ≤ f, then $\int g\,d\mu \le \int f\,d\mu$.

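Proof: (iii) is immediate from the definition, and (i) is left to the reader. To prove (ii), let {fn} and {gn} be two increasing sequences of nonnegative simple functions converging to f and g, respectively (such sequences exist by Proposition 5.5.3). Then {fn + gn} is an increasing sequence of nonnegative simple functions converging to f + g, and by Proposition 5.5.2

$$\begin{aligned}\int (f+g)\,d\mu &= \lim_{n\to\infty}\int (f_n+g_n)\,d\mu = \lim_{n\to\infty}\Big(\int f_n\,d\mu + \int g_n\,d\mu\Big)\\ &= \lim_{n\to\infty}\int f_n\,d\mu + \lim_{n\to\infty}\int g_n\,d\mu = \int f\,d\mu + \int g\,d\mu\end{aligned}$$

where we have used Proposition 5.4.3(ii) to go from ∫(fn + gn) dµ to ∫ fn dµ + ∫ gn dµ. □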

One of the great advantages of the Lebesgue integration theory we are now developing is that it is much better behaved with respect to limits than the Riemann theory you are used to. Here is a typical example:

Theorem 5.5.6 (Monotone Convergence Theorem) If {fn} is an increasing sequence of nonnegative, measurable functions such that $f(x) = \lim_{n\to\infty} f_n(x)$ for all x, then

$$\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu$$

In other words,

$$\lim_{n\to\infty}\int f_n\,d\mu = \int \lim_{n\to\infty} f_n\,d\mu$$

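Proof: We know from Proposition 5.3.9 that f is measurable, and hence the integral ∫ f dµ is defined. Since fn ≤ f, we have ∫ fn dµ ≤ ∫ f dµ for all n, and hence (the limit exists since, by Proposition 5.5.5(iii), the integrals increase with n)

$$\lim_{n\to\infty}\int f_n\,d\mu \le \int f\,d\mu$$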

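To prove the opposite inequality, we approximate each fn by simple functions as in the proof of Proposition 5.5.3; in fact, let hn be the n-th approximation to fn. The sequence {hn} is increasing, since the approximations in that proof increase both with n and with the function being approximated. Assume that we can prove that the sequence {hn} converges to f; then

$$\lim_{n\to\infty}\int h_n\,d\mu = \int f\,d\mu$$

by Proposition 5.5.2. Since fn ≥ hn, this would give us the desired inequality

$$\lim_{n\to\infty}\int f_n\,d\mu \ge \int f\,d\mu$$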


It remains to show that hn(x) → f(x) for all x. From Proposition 5.5.3 we know that for all n, either fn(x) − 1/2ⁿ < hn(x) ≤ fn(x) or hn(x) = 2ⁿ. If hn(x) = 2ⁿ for infinitely many n, then hn(x) goes to ∞, and hence to f(x) (recall that hn(x) ≤ f(x)). If hn(x) = 2ⁿ for only finitely many n, then we eventually have fn(x) − 1/2ⁿ < hn(x) ≤ fn(x), and hence hn(x) converges to f(x) since fn(x) does. □

We would really have liked the formula

$$\lim_{n\to\infty}\int f_n\,d\mu = \int \lim_{n\to\infty} f_n\,d\mu \tag{5.5.1}$$

to hold in general, but as the following example shows, this is not the case.

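Example 1: Let µ be the counting measure on N, and define the sequence {fn} by

$$f_n(x) = \begin{cases} 1 & \text{if } x = n\\ 0 & \text{otherwise}\end{cases}$$

Then $\lim_{n\to\infty} f_n(x) = 0$ for all x, but ∫ fn dµ = 1 for all n. Hence

$$\lim_{n\to\infty}\int f_n\,d\mu = 1 \qquad\text{but}\qquad \int \lim_{n\to\infty} f_n\,d\mu = 0$$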

♣
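The phenomenon in Example 1 is that the unit of mass carried by fn escapes to infinity. A small Python check makes both halves of the example explicit; it is a sketch in which integration against counting measure is modelled by summing over a finite set outside which the function vanishes (an assumption that holds for each fn), and the helper names are our own.

```python
def f(n: int, x: int) -> int:
    """f_n from Example 1: the indicator function of the single point n."""
    return 1 if x == n else 0


def counting_integral(g, support) -> int:
    """Integral of g with respect to counting measure, assuming g = 0 outside `support`."""
    return sum(g(x) for x in support)


# Each f_n has integral 1 ...
integrals = [counting_integral(lambda x, n=n: f(n, x), range(n + 1)) for n in range(1, 20)]
# ... but at any fixed point x the values f_n(x) are 0 for every n > x.
tail_at_5 = [f(n, 5) for n in range(6, 20)]
print(set(integrals), set(tail_at_5))  # {1} {0}
```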

There are many results in measure theory giving conditions for (5.5.1) to hold, but there is no ultimate theorem covering all others. There is, however, a simple inequality that always holds.

Theorem 5.5.7 (Fatou’s Lemma) Assume that {fn} is a sequence of nonnegative, measurable functions. Then

$$\liminf_{n\to\infty}\int f_n\,d\mu \ge \int \liminf_{n\to\infty} f_n\,d\mu$$

Proof: Let $g_k(x) = \inf_{n\ge k} f_n(x)$. Then {gk} is an increasing sequence of nonnegative, measurable functions, and by the Monotone Convergence Theorem

$$\lim_{k\to\infty}\int g_k\,d\mu = \int \lim_{k\to\infty} g_k\,d\mu = \int \liminf_{n\to\infty} f_n\,d\mu$$


where we have used the definition of lim inf in the last step. Since fk ≥ gk, we have ∫ fk dµ ≥ ∫ gk dµ, and hence

$$\liminf_{k\to\infty}\int f_k\,d\mu \ge \lim_{k\to\infty}\int g_k\,d\mu = \int \liminf_{n\to\infty} f_n\,d\mu$$

and the result is proved. □

Fatou’s Lemma is often a useful tool in establishing more sophisticated results; see Exercise 17 for a typical example.

Just as for simple functions, we define integrals over measurable subsets A of X by the formula

$$\int_A f\,d\mu = \int \mathbf{1}_A f\,d\mu$$

So far we have allowed our integrals to be infinite, but we are mainly interested in situations where ∫ f dµ is finite:

Definition 5.5.8 A function f : X → [0, ∞] is said to be integrable if it is measurable and ∫ f dµ < ∞.

Comparison with Riemann integration

We shall end this section with a quick comparison between the integral we have now developed and the Riemann integral you learned in calculus. Let us begin with a quick review of the Riemann integral¹.

Assume that [a, b] is a closed and bounded interval, and let f : [a, b] → R be a nonnegative, bounded function. Recall that a partition P of the interval [a, b] is a finite set {x0, x1, . . . , xn} such that

$$a = x_0 < x_1 < x_2 < \dots < x_n = b$$

The lower and upper values of f over the interval (xi−1, xi] are

$$m_i = \inf\{f(x) \mid x\in(x_{i-1},x_i]\}$$

and

$$M_i = \sup\{f(x) \mid x\in(x_{i-1},x_i]\}$$

respectively, and the lower and upper sums of the partition P are

$$L(\mathcal{P}) = \sum_{i=1}^n m_i(x_i - x_{i-1})$$

¹The approach to Riemann integration that I describe here is actually due to the French mathematician Gaston Darboux (1842-1917).


and

$$U(\mathcal{P}) = \sum_{i=1}^n M_i(x_i - x_{i-1})$$

The function f is Riemann integrable if the lower integral

$$\underline{\int_a^b} f(x)\,dx = \sup\{L(\mathcal{P}) \mid \mathcal{P} \text{ is a partition of } [a,b]\}$$

and the upper integral

$$\overline{\int_a^b} f(x)\,dx = \inf\{U(\mathcal{P}) \mid \mathcal{P} \text{ is a partition of } [a,b]\}$$

coincide, in which case we define the Riemann integral $\int_a^b f(x)\,dx$ to be the common value.

We are now ready to compare the Riemann integral $\int_a^b f(x)\,dx$ and the Lebesgue integral $\int_{[a,b]} f\,d\mu$ (µ is now the Lebesgue measure). Observe first that if we define simple functions

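$$\phi_{\mathcal{P}} = \sum_{i=1}^n m_i\mathbf{1}_{(x_{i-1},x_i]} \qquad\text{and}\qquad \Phi_{\mathcal{P}} = \sum_{i=1}^n M_i\mathbf{1}_{(x_{i-1},x_i]}$$

we have

$$\int \phi_{\mathcal{P}}\,d\mu = \sum_{i=1}^n m_i(x_i-x_{i-1}) = L(\mathcal{P})$$

and

$$\int \Phi_{\mathcal{P}}\,d\mu = \sum_{i=1}^n M_i(x_i-x_{i-1}) = U(\mathcal{P})$$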
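The lower and upper sums are easy to compute when f is continuous and increasing, since then the infimum and supremum over (xi−1, xi] are f(xi−1) and f(xi). The sketch below assumes exactly that (the function darboux_sums and the uniform partition are our own choices, and exact fractions avoid rounding); by the identities above, the two sums it returns are ∫ φP dµ and ∫ ΦP dµ.

```python
from fractions import Fraction


def darboux_sums(f, a, b, n):
    """Lower and upper sums L(P), U(P) for the uniform partition of [a, b] into n pieces.

    Assumes f is continuous and increasing on [a, b]: then on (x_{i-1}, x_i] the
    infimum m_i equals f(x_{i-1}) (approached from the right) and the supremum M_i
    equals f(x_i). For other functions m_i and M_i must be computed differently.
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))  # integral of phi_P
    upper = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))      # integral of Phi_P
    return lower, upper


lower, upper = darboux_sums(lambda x: x * x, Fraction(0), Fraction(1), 10)
print(lower, upper)  # 57/200 77/200, squeezing the value 1/3 of the integral
```

For f(x) = x² the difference U(P) − L(P) telescopes to 1/n, so both sums converge to 1/3 = ∫₀¹ x² dx as the partition is refined.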

Theorem 5.5.9 Assume that f : [a, b] → [0, ∞) is a bounded, Riemann integrable function on [a, b]. Then f is measurable and the Riemann and the Lebesgue integral coincide:

$$\int_a^b f(x)\,dx = \int_{[a,b]} f\,d\mu$$

Proof: Since f is Riemann integrable, we can pick a sequence {Pn} of partitions such that the sequence {φPn} of lower step functions is increasing, the sequence {ΦPn} of upper step functions is decreasing, and

$$\lim_{n\to\infty} L(\mathcal{P}_n) = \lim_{n\to\infty} U(\mathcal{P}_n) = \int_a^b f(x)\,dx$$


(see Exercise 10 for help), or in other words

$$\lim_{n\to\infty}\int \phi_{\mathcal{P}_n}\,d\mu = \lim_{n\to\infty}\int \Phi_{\mathcal{P}_n}\,d\mu = \int_a^b f(x)\,dx$$

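This means that

$$\lim_{n\to\infty}\int (\Phi_{\mathcal{P}_n} - \phi_{\mathcal{P}_n})\,d\mu = 0$$

and since the functions $\Phi_{\mathcal{P}_n} - \phi_{\mathcal{P}_n}$ are nonnegative, Fatou’s lemma gives

$$0 \le \int \lim_{n\to\infty}(\Phi_{\mathcal{P}_n} - \phi_{\mathcal{P}_n})\,d\mu \le \liminf_{n\to\infty}\int (\Phi_{\mathcal{P}_n} - \phi_{\mathcal{P}_n})\,d\mu = 0$$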

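(the limit exists since the sequence {ΦPn − φPn} is decreasing). By Exercise 4b, this means that $\lim_{n\to\infty}\phi_{\mathcal{P}_n} = \lim_{n\to\infty}\Phi_{\mathcal{P}_n}$ a.e., and since

$$\lim_{n\to\infty}\phi_{\mathcal{P}_n} \le f \le \lim_{n\to\infty}\Phi_{\mathcal{P}_n},$$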

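f must be measurable, as it is squeezed between two measurable functions that are equal almost everywhere (here we use that Lebesgue measure is complete; compare Exercise 5.3.12). Also, since $f = \lim_{n\to\infty}\phi_{\mathcal{P}_n}$ a.e., the Monotone Convergence Theorem (we are actually using the slightly extended version in Exercise 13) tells us that

$$\int_{[a,b]} f\,d\mu = \lim_{n\to\infty}\int \phi_{\mathcal{P}_n}\,d\mu = \lim_{n\to\infty} L(\mathcal{P}_n) = \int_a^b f(x)\,dx$$

□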

The theorem above can be extended in many directions. Exactly the same proof works for Riemann integrals over rectangular boxes in Rᵈ, and once we have introduced integrals of functions taking both positive and negative values in the next section, it is easy to extend the theorem above to that situation. There are some subtleties concerning improper integrals, but we shall not touch on these here. Our basic message is: Lebesgue integration is just like Riemann integration, only better (because more functions are integrable and we can integrate in completely new contexts; all we need is a measure)!


1. Assume f : X → [0, ∞] is a nonnegative simple function. Show that the two definitions of ∫ f dµ given in Definitions 5.4.1 and 5.5.1 coincide.

2. Prove Proposition 5.5.5(i).

3. Show that if f : X → [0,∞] is measurable, then

$$\mu(\{x\in X \mid f(x)\ge a\}) \le \frac{1}{a}\int f\,d\mu$$

for all positive real numbers a.


4. In this problem, f, g : X → [0,∞] are measurable functions.

a) Show that if f = 0 a.e., then ∫ f dµ = 0.

b) Show that if ∫ f dµ = 0, then f = 0 a.e. (Hint: Argue contrapositively: Assume that f is not equal to 0 almost everywhere and use that since $\{x\in X \mid f(x) > 0\} = \bigcup_{n\in\mathbb{N}}\{x\in X \mid f(x) > \frac{1}{n}\}$, there has to be an n ∈ N such that $\mu(\{x\in X \mid f(x) > \frac{1}{n}\}) > 0$.)

c) Show that if f = g a.e., then∫f dµ =

∫g dµ.

d) Show that if∫Ef dµ =

∫Eg dµ for all measurable sets E, then f = g

a.e.

5. Assume that (X, A, µ) is a measure space and that f : X → [0,∞] is a nonnegative, measurable function.

a) Show that if A, B are measurable sets with A ⊆ B, then $\int_A f\,d\mu \le \int_B f\,d\mu$.

b) Show that if A, B are disjoint, measurable sets, then $\int_{A\cup B} f\,d\mu = \int_A f\,d\mu + \int_B f\,d\mu$.

c) Define $\nu : \mathcal{A} \to [0,\infty]$ by

$$\nu(A) = \int_A f\,d\mu$$

Show that ν is a measure.

6. Show that if f : X → [0,∞] is integrable, then f is finite a.e.

7. Let µ be Lebesgue measure on R and assume that f : R → R+ is a nonnegative, measurable function. Show that

$$\lim_{n\to\infty}\int_{[-n,n]} f\,d\mu = \int f\,d\mu$$

8. Let µ be Lebesgue measure on R. Show that for all measurable sets A ⊆ R

$$\lim_{n\to\infty}\int_A \sum_{k=0}^{n} \frac{x^{2k}}{k!}\,d\mu = \int_A e^{x^2}\,d\mu$$

9. Let f : R → R be the function

$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational} \\ 0 & \text{otherwise} \end{cases}$$

and for each n ∈ N, let $f_n : \mathbb{R} \to \mathbb{R}$ be the function

$$f_n(x) = \begin{cases} 1 & \text{if } x = \frac{p}{q} \text{ where } p \in \mathbb{Z},\ q \in \mathbb{N},\ q \le n \\ 0 & \text{otherwise} \end{cases}$$

a) Show that $\{f_n(x)\}$ is an increasing sequence converging to f(x) for all x ∈ R.

b) Show that each $f_n$ is Riemann integrable over [0, 1] with $\int_0^1 f_n(x)\,dx = 0$ (this is integration as taught in calculus courses).

c) Show that f is not Riemann integrable over [0, 1].

d) Show that the one-dimensional Lebesgue integral $\int_{[0,1]} f\,d\mu$ exists and find its value.

10. In this problem we shall sketch how one may construct the sequence $\{P_n\}$ of partitions in the proof of Theorem 5.5.9.

a) Call a partition P of [a, b] finer than another partition Q if Q ⊆ P, and show that if P is finer than Q, then $\phi_P \ge \phi_Q$ and $\Phi_P \le \Phi_Q$.

b) Show that if f is as in Theorem 5.5.9, there are sequences of partitions $\{Q_n\}$ and $\{R_n\}$ such that

$$\lim_{n\to\infty} L(Q_n) = \int_a^b f(x)\,dx \quad\text{and}\quad \lim_{n\to\infty} U(R_n) = \int_a^b f(x)\,dx$$

c) For each n, let $P_n$ be the common refinement of all partitions $Q_k$ and $R_k$, k ≤ n, i.e.

$$P_n = \bigcup_{k=1}^{n} (Q_k \cup R_k)$$

Show that $P_n$ satisfies the requirements in the proof of Theorem 5.5.9.

11. a) Let $\{u_n\}$ be a sequence of positive, measurable functions. Show that

$$\int \sum_{n=1}^\infty u_n\,d\mu = \sum_{n=1}^\infty \int u_n\,d\mu$$

b) Assume that f is a nonnegative, measurable function and that $\{B_n\}$ is a disjoint sequence of measurable sets with union B. Show that

$$\int_B f\,d\mu = \sum_{n=1}^\infty \int_{B_n} f\,d\mu$$

12. Assume that f is a nonnegative, measurable function and that $\{A_n\}$ is an increasing sequence of measurable sets with union A. Show that

$$\int_A f\,d\mu = \lim_{n\to\infty}\int_{A_n} f\,d\mu$$

13. Show the following generalization of the Monotone Convergence Theorem: Assume that f is a measurable function and that $\{f_n\}$ is an increasing sequence of nonnegative, measurable functions such that $f(x) = \lim_{n\to\infty} f_n(x)$ almost everywhere (i.e. for all x outside a set N of measure zero). Then

$$\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu$$


14. Let µ be the Lebesgue measure on R. Find a decreasing sequence $\{f_n\}$ of measurable functions $f_n : \mathbb{R} \to [0,\infty)$ converging pointwise to zero such that $\lim_{n\to\infty}\int f_n\,d\mu \ne 0$.

15. Assume that f : X → [0,∞] is a measurable function, and that $\{f_n\}$ is a sequence of nonnegative, measurable functions converging pointwise to f. Show that if $f_n \le f$ for all n, then

$$\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu$$

16. Assume that $\{f_n\}$ is a sequence of nonnegative, measurable functions converging pointwise to f. Show that if

$$\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu < \infty,$$

then

$$\lim_{n\to\infty}\int_E f_n\,d\mu = \int_E f\,d\mu$$

for all measurable E ⊆ X.

17. Assume that g : X → [0,∞] is an integrable function (i.e. g is measurable and $\int g\,d\mu < \infty$) and that $\{f_n\}$ is a sequence of nonnegative, measurable functions converging pointwise to a function f. Show that if $f_n \le g$ for all n, then

$$\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu$$

Hint: Apply Fatou’s Lemma to both sequences $\{f_n\}$ and $\{g - f_n\}$.

18. Let (X, A) be a measurable space, and let $\mathcal{M}^+$ be the set of all nonnegative, measurable functions f : X → R+. Assume that $I : \mathcal{M}^+ \to \mathbb{R}_+$ satisfies the following three conditions:

(i) I(αf) = αI(f) for all α ∈ [0,∞) and all $f \in \mathcal{M}^+$.

(ii) I(f + g) = I(f) + I(g) for all $f, g \in \mathcal{M}^+$.

(iii) If $\{f_n\}$ is an increasing sequence from $\mathcal{M}^+$ converging pointwise to f, then $\lim_{n\to\infty} I(f_n) = I(f)$.

a) Show that $I(f_1 + f_2 + \dots + f_n) = I(f_1) + I(f_2) + \dots + I(f_n)$ for all n ∈ N and all $f_1, f_2, \dots, f_n \in \mathcal{M}^+$.

b) Show that if $f, g \in \mathcal{M}^+$ and f(x) ≤ g(x) for all x ∈ X, then I(f) ≤ I(g).

c) Show that

$$\mu(E) = I(\mathbf{1}_E) \quad\text{for } E \in \mathcal{A}$$

defines a measure on (X, A).

d) Show that $I(f) = \int f\,d\mu$ for all nonnegative simple functions f.

e) Show that $I(f) = \int f\,d\mu$ for all $f \in \mathcal{M}^+$.

5.6. INTEGRABLE FUNCTIONS 167

5.6 Integrable functions

So far we only know how to integrate nonnegative functions, but it is not difficult to extend the theory to general functions. We have, however, to be a little more careful with the size of the functions we integrate. If a nonnegative function f is too big, we may just put the integral $\int f\,d\mu$ equal to ∞. If the function can take negative values as well as positive ones, however, there may be infinite contributions of opposite signs that are difficult to balance. For this reason we shall only define the integral for a class of integrable functions where this problem does not occur.

Given a function f : X → R, we first observe that $f = f^+ - f^-$, where $f^+$ and $f^-$ are the nonnegative functions

$$f^+(x) = \begin{cases} f(x) & \text{if } f(x) > 0 \\ 0 & \text{otherwise} \end{cases}$$

and

$$f^-(x) = \begin{cases} -f(x) & \text{if } f(x) < 0 \\ 0 & \text{otherwise} \end{cases}$$

Note that $f^+$ and $f^-$ are measurable if f is.

Recall that a nonnegative, measurable function f is integrable if $\int f\,d\mu < \infty$.

Definition 5.6.1 A function f : X → R is called integrable if it is measurable, and $f^+$ and $f^-$ are integrable. We define the integral of f by

$$\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu$$

The definition illustrates our point above: if both $\int f^+\,d\mu$ and $\int f^-\,d\mu$ are infinite, there is no natural way to define the difference $\int f^+\,d\mu - \int f^-\,d\mu$.

The next lemma gives a useful characterization of integrable functions.

Lemma 5.6.2 A measurable function f is integrable if and only if its absolute value |f| is integrable, i.e. if and only if $\int |f|\,d\mu < \infty$.

Proof: Note that $|f| = f^+ + f^-$. Hence

$$\int |f|\,d\mu = \int f^+\,d\mu + \int f^-\,d\mu$$

by Proposition 5.5.5(ii). We see that $\int |f|\,d\mu$ is finite if and only if both $\int f^+\,d\mu$ and $\int f^-\,d\mu$ are finite. □
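The identities $f = f^+ - f^-$ and $|f| = f^+ + f^-$ from the proof above can be checked on a toy example. In the sketch below we assume a hypothetical discrete measure on X = {0, ..., 4}, given by point weights, so that $\int f\,d\mu = \sum_x f(x)\mu(\{x\})$. On such a finite space every real-valued function is integrable, so the code only illustrates the algebra, not the integrability condition.

```python
from fractions import Fraction

# Hypothetical discrete measure on X = {0, 1, 2, 3, 4}: mu({x}) = weight[x].
# The point 3 has measure zero.
weight = {0: Fraction(1), 1: Fraction(1, 2), 2: Fraction(2), 3: Fraction(0), 4: Fraction(3)}

def integral(f):
    """Integral of f with respect to the discrete measure: sum of f(x) * mu({x})."""
    return sum(f(x) * w for x, w in weight.items())

def positive_part(f):
    return lambda x: f(x) if f(x) > 0 else 0

def negative_part(f):
    return lambda x: -f(x) if f(x) < 0 else 0

values = [3, -2, Fraction(1, 2), 100, -1]
f = lambda x: values[x]
f_plus, f_minus = positive_part(f), negative_part(f)

assert all(f(x) == f_plus(x) - f_minus(x) for x in weight)        # f = f+ - f-
assert all(abs(f(x)) == f_plus(x) + f_minus(x) for x in weight)   # |f| = f+ + f-
print(integral(f_plus), integral(f_minus), integral(lambda x: abs(f(x))))
```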


The next lemma is another useful technical tool. It tells us that if we split f as a difference f = g − h of two nonnegative, integrable functions, we always get $\int f\,d\mu = \int g\,d\mu - \int h\,d\mu$ (so far we only know this for $g = f^+$ and $h = f^-$).

Lemma 5.6.3 Assume that g : X → [0,∞] and h : X → [0,∞] are two integrable, nonnegative functions, and that f(x) = g(x) − h(x) at all points where the difference is defined. Then f is integrable and

$$\int f\,d\mu = \int g\,d\mu - \int h\,d\mu$$

Proof: Note that since g and h are integrable, they are finite a.e., and hence f = g − h a.e. Modifying g and h on a set of measure zero does not change their integrals, so we may assume that f(x) = g(x) − h(x) for all x. Since |f(x)| = |g(x) − h(x)| ≤ |g(x)| + |h(x)|, it follows from the lemma above that f is integrable.

As

$$f(x) = f^+(x) - f^-(x) = g(x) - h(x)$$

we have

$$f^+(x) + h(x) = g(x) + f^-(x)$$

where both sides are sums of nonnegative functions. By Proposition 5.5.5(ii), we get

$$\int f^+\,d\mu + \int h\,d\mu = \int g\,d\mu + \int f^-\,d\mu$$

Rearranging the integrals (they are all finite), we get

$$\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu = \int g\,d\mu - \int h\,d\mu$$

and the lemma is proved. □
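Lemma 5.6.3 says that the value of the integral does not depend on how we split f into a difference of nonnegative, integrable functions. Here is a minimal sketch on an assumed four-point discrete measure. We build a second split $g = f^+ + k$, $h = f^- + k$ with an arbitrarily chosen nonnegative k. We then compare $\int g\,d\mu - \int h\,d\mu$ with $\int f^+\,d\mu - \int f^-\,d\mu$.

```python
from fractions import Fraction

# Hypothetical discrete measure: points 0..3 with these weights.
weight = [Fraction(2), Fraction(1, 3), Fraction(1), Fraction(5)]

def integral(values):
    """Integral of a function given by its list of values at the points 0..3."""
    return sum(v * w for v, w in zip(values, weight))

f = [Fraction(4), Fraction(-3), Fraction(0), Fraction(-1, 5)]
f_plus = [max(v, 0) for v in f]
f_minus = [max(-v, 0) for v in f]

# Another split f = g - h with g, h >= 0: add the same nonnegative k to both parts.
k = [Fraction(7), Fraction(1), Fraction(2, 3), Fraction(10)]
g = [p + c for p, c in zip(f_plus, k)]
h = [m + c for m, c in zip(f_minus, k)]

assert all(gv - hv == fv for gv, hv, fv in zip(g, h, f))
print(integral(f_plus) - integral(f_minus), integral(g) - integral(h))
```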

We are now ready to prove that the integral behaves the way we expect:

Proposition 5.6.4 Assume that f, g : X → R are integrable functions, and that c is a constant. Then f + g and cf are integrable, and

(i) $\int cf\,d\mu = c\int f\,d\mu$.

(ii) $\int (f+g)\,d\mu = \int f\,d\mu + \int g\,d\mu$.

(iii) If g ≤ f, then $\int g\,d\mu \le \int f\,d\mu$.

Proof: (i) is left to the reader (treat positive and negative c’s separately). To prove (ii), first note that since f and g are integrable, the sum f(x) + g(x) is defined a.e. By changing f and g on a set of measure zero (this doesn’t change their integrals), we may assume that f(x) + g(x) is defined everywhere. Since

$$|f(x) + g(x)| \le |f(x)| + |g(x)|,$$

f + g is integrable. Obviously,

$$f + g = (f^+ - f^-) + (g^+ - g^-) = (f^+ + g^+) - (f^- + g^-)$$

and hence by the lemma above and Proposition 5.5.5(ii)

$$\begin{aligned}
\int (f+g)\,d\mu &= \int (f^+ + g^+)\,d\mu - \int (f^- + g^-)\,d\mu \\
&= \int f^+\,d\mu + \int g^+\,d\mu - \int f^-\,d\mu - \int g^-\,d\mu \\
&= \int f^+\,d\mu - \int f^-\,d\mu + \int g^+\,d\mu - \int g^-\,d\mu \\
&= \int f\,d\mu + \int g\,d\mu
\end{aligned}$$

To prove (iii), note that f − g is a nonnegative function and hence by (i) and (ii):

$$\int f\,d\mu - \int g\,d\mu = \int f\,d\mu + \int (-1)g\,d\mu = \int (f - g)\,d\mu \ge 0$$

Consequently, $\int f\,d\mu \ge \int g\,d\mu$ and the proposition is proved. □

We can now extend our limit theorems to integrable functions taking both signs. The following result is probably the most useful of all limit theorems for integrals, as it is quite strong and at the same time easy to use. It tells us that if a convergent sequence of functions is dominated by an integrable function, then

$$\lim_{n\to\infty}\int f_n\,d\mu = \int \lim_{n\to\infty} f_n\,d\mu$$

Theorem 5.6.5 (Lebesgue’s Dominated Convergence Theorem) Assume that g : X → R is a nonnegative, integrable function and that $\{f_n\}$ is a sequence of measurable functions converging pointwise to f. If $|f_n| \le g$ for all n, then

$$\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu$$


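Proof: First observe that since |f| ≤ g, f is integrable. Next note that $\{g - f_n\}$ and $\{g + f_n\}$ are two sequences of nonnegative, measurable functions, so Fatou’s Lemma gives

$$\liminf_{n\to\infty}\int (g - f_n)\,d\mu \ge \int \liminf_{n\to\infty}(g - f_n)\,d\mu = \int (g - f)\,d\mu = \int g\,d\mu - \int f\,d\mu$$

and

$$\liminf_{n\to\infty}\int (g + f_n)\,d\mu \ge \int \liminf_{n\to\infty}(g + f_n)\,d\mu = \int (g + f)\,d\mu = \int g\,d\mu + \int f\,d\mu$$

On the other hand,

$$\liminf_{n\to\infty}\int (g - f_n)\,d\mu = \int g\,d\mu - \limsup_{n\to\infty}\int f_n\,d\mu$$

and

$$\liminf_{n\to\infty}\int (g + f_n)\,d\mu = \int g\,d\mu + \liminf_{n\to\infty}\int f_n\,d\mu$$

Combining the two expressions for $\liminf_{n\to\infty}\int (g - f_n)\,d\mu$, we see that

$$\int g\,d\mu - \limsup_{n\to\infty}\int f_n\,d\mu \ge \int g\,d\mu - \int f\,d\mu$$

and hence (since $\int g\,d\mu$ is finite)

$$\limsup_{n\to\infty}\int f_n\,d\mu \le \int f\,d\mu$$

Combining the two expressions for $\liminf_{n\to\infty}\int (g + f_n)\,d\mu$, we similarly get

$$\liminf_{n\to\infty}\int f_n\,d\mu \ge \int f\,d\mu$$

Hence

$$\limsup_{n\to\infty}\int f_n\,d\mu \le \int f\,d\mu \le \liminf_{n\to\infty}\int f_n\,d\mu$$

which means that $\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu$. The theorem is proved. □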

Remark: It is easy to check that we can relax the conditions above somewhat. If $f_n(x)$ converges to f(x) a.e., and the inequality $|f_n(x)| \le g(x)$ fails only on a set of measure zero, the conclusion still holds (see Exercise 7 for the precise statement).
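To see what the domination hypothesis buys us, here is a numerical sketch on [0, 1] with Lebesgue measure. All functions involved are Riemann integrable, so by Theorem 5.5.9 and Exercise 5.6.9 their Lebesgue integrals agree with the Riemann integrals. We approximate those with a plain midpoint rule, which is our own numerical stand-in and not part of the theory.

- The sequence $f_n(x) = x^n$ is dominated by the integrable function g = 1. Its integrals 1/(n + 1) tend to 0, the integral of the a.e. limit.
- The sequence $f_n = n\mathbf{1}_{(0,1/n)}$ also converges pointwise to 0, but every integral equals 1. It cannot be dominated by an integrable function, since $\sup_n f_n(x) \ge 1/x - 1$ on (0, 1).

```python
def midpoint_integral(func, a, b, m=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / m
    return h * sum(func(a + (i + 0.5) * h) for i in range(m))

ns = (1, 10, 100, 1000)

# Dominated: f_n(x) = x**n on [0, 1] with |f_n| <= g = 1, and g is integrable on [0, 1].
dominated = [midpoint_integral(lambda x, n=n: x ** n, 0.0, 1.0) for n in ns]

# Not dominated: f_n = n * 1_(0, 1/n) tends to 0 pointwise, yet every integral is 1.
undominated = [midpoint_integral(lambda x, n=n: n if 0 < x < 1 / n else 0.0, 0.0, 1.0) for n in ns]

for n, d, u in zip(ns, dominated, undominated):
    print(n, d, 1 / (n + 1), u)
```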

Let us take a look at a typical application of the theorem:

Proposition 5.6.6 Let f : R × X → R be a function which is

(i) continuous in the first variable, i.e. for each y ∈ X, the function x ↦ f(x, y) is continuous

(ii) measurable in the second variable, i.e. for each x ∈ R, the function y ↦ f(x, y) is measurable

(iii) uniformly bounded by an integrable function, i.e. there is an integrable function g : X → [0,∞] such that |f(x, y)| ≤ g(y) for all x ∈ R and all y ∈ X.

Then the function

$$h(x) = \int f(x,y)\,d\mu(y)$$

is continuous. Here the expression $\int f(x,y)\,d\mu(y)$ means that for each fixed x we integrate f(x, y) as a function of y.

Proof: According to Proposition 2.2.5 it suffices to prove that if $\{a_n\}$ is a sequence converging to a point a, then $h(a_n)$ converges to h(a). Observe that

$$h(a_n) = \int f(a_n, y)\,d\mu(y) \quad\text{and}\quad h(a) = \int f(a, y)\,d\mu(y)$$

Observe also that since f is continuous in the first variable, $f(a_n, y) \to f(a, y)$ for all y. Hence $\{f(a_n, y)\}$ is a sequence of functions which is dominated by the integrable function g and which converges pointwise to f(a, y). By Lebesgue’s Dominated Convergence Theorem,

$$\lim_{n\to\infty} h(a_n) = \lim_{n\to\infty}\int f(a_n, y)\,d\mu(y) = \int f(a, y)\,d\mu(y) = h(a)$$

and the proposition is proved. □
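As a concrete instance of Proposition 5.6.6, take X = [0, ∞) with Lebesgue measure and f(x, y) = cos(xy)e^{−y}, which is dominated by the integrable function g(y) = e^{−y}. The same example also illustrates differentiation under the integral sign, as in Exercise 8 below. The classical closed form h(x) = 1/(1 + x²) is continuous. Its derivative −2x/(1 + x²)² is what we get by differentiating under the integral sign, since the x-derivative −y sin(xy)e^{−y} is bounded by the integrable function y e^{−y}. The sketch truncates the y-integral at y = 40 and uses a midpoint rule. Both are numerical assumptions of ours, and the helper names are hypothetical.

```python
import math

def integrate_y(func, upper=40.0, m=40_000):
    """Midpoint-rule approximation of the integral of func over [0, upper],
    used here as a stand-in for the integral over [0, infinity)."""
    step = upper / m
    return step * sum(func((j + 0.5) * step) for j in range(m))

def h(x):
    # f(x, y) = cos(x y) e^{-y}: continuous in x, and |f(x, y)| <= g(y) = e^{-y}
    return integrate_y(lambda y: math.cos(x * y) * math.exp(-y))

def h_prime(x):
    # differentiate under the integral sign: -y sin(x y) e^{-y} is bounded by y e^{-y}
    return integrate_y(lambda y: -y * math.sin(x * y) * math.exp(-y))

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, h(x), 1 / (1 + x * x), h_prime(x), -2 * x / (1 + x * x) ** 2)
```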

As before, we define $\int_A f\,d\mu = \int f\mathbf{1}_A\,d\mu$ for measurable sets A. We say that f is integrable over A if $f\mathbf{1}_A$ is integrable.


1. Show that if f is measurable, so are f+ and f−.

2. Show that if an integrable function f is zero a.e., then∫f dµ = 0.

3. Prove Proposition 5.6.4(i). You may want to treat positive and negative c’s separately.

4. Assume that f : X → R is a measurable function.

a) Show that if f is integrable over a measurable set A, and $\{A_n\}$ is an increasing sequence of measurable sets with union A, then

$$\lim_{n\to\infty}\int_{A_n} f\,d\mu = \int_A f\,d\mu$$

b) Assume that $\{B_n\}$ is a decreasing sequence of measurable sets with intersection B. Show that if f is integrable over $B_1$, then

$$\lim_{n\to\infty}\int_{B_n} f\,d\mu = \int_B f\,d\mu$$

5. Show that if f : X → R is integrable over a measurable set A, and $\{A_n\}$ is a disjoint sequence of measurable sets with union A, then

$$\int_A f\,d\mu = \sum_{n=1}^\infty \int_{A_n} f\,d\mu$$

6. Let f : R → R be an integrable function, and define

$$A_n = \{x \in \mathbb{R} \mid f(x) \ge n\}$$

Show that

$$\lim_{n\to\infty}\int_{A_n} f\,d\mu = 0$$

7. Prove the following slight extension of the Dominated Convergence Theorem:

Theorem: Assume that g : X → R is a nonnegative, integrable function and that $\{f_n\}$ is a sequence of measurable functions converging a.e. to a measurable function f. If $|f_n(x)| \le g(x)$ a.e. for each n, then

$$\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu$$

8. Assume that g : R × X → R is continuous in the first variable and that y ↦ g(x, y) is integrable for each x. Assume also that the partial derivative $\frac{\partial g}{\partial x}(x, y)$ exists for all x and y, and that there is an integrable function h : X → [0,∞] such that

$$\left|\frac{\partial g}{\partial x}(x,y)\right| \le h(y)$$

for all x, y. Show that the function

$$f(x) = \int g(x,y)\,d\mu(y)$$

is differentiable at all points x and

$$f'(x) = \int \frac{\partial g}{\partial x}(x,y)\,d\mu(y)$$

This is often referred to as “differentiation under the integral sign”.

9. Let µ be the Lebesgue measure on R. Show that if a, b ∈ R, a < b, and f : [a, b] → R is a bounded, Riemann integrable function, then f is integrable over [a, b] and

$$\int_a^b f(x)\,dx = \int_{[a,b]} f\,d\mu$$

(Hint: Since f is bounded, there is a constant M such that f + M is nonnegative. Apply Theorem 5.5.9 to this function.)

5.7. L1(X,A, µ) AND L2(X,A, µ) 173

5.7 L1(X,A, µ) and L2(X,A, µ)

In this section we shall connect integration theory to the theory of normed spaces in Chapter 4. Recall from Definition 4.5.2 that a norm on a real vector space V is a function || · || : V → [0,∞) satisfying

(i) ||u|| ≥ 0 for all u ∈ V, with equality if and only if u = 0.

(ii) ||αu|| = |α| ||u|| for all α ∈ R and all u ∈ V.

(iii) ||u + v|| ≤ ||u|| + ||v|| for all u, v ∈ V.

Let us now put

$$\mathcal{L}^1(X,\mathcal{A},\mu) = \{f : X \to \mathbb{R} : f \text{ is integrable}\}$$

and define $\|\cdot\|_1 : \mathcal{L}^1(X,\mathcal{A},\mu) \to [0,\infty)$ by

$$\|f\|_1 = \int |f|\,d\mu$$

It is not hard to see that $\mathcal{L}^1(X,\mathcal{A},\mu)$ is a vector space (see Exercise 1). The function $\|\cdot\|_1$ satisfies the three axioms above with one exception: $\|f\|_1$ may be zero even when f is not zero. In fact, $\|f\|_1 = 0$ if and only if f = 0 a.e. (see Exercise 5.5.4).

The usual way to fix this is to consider two functions f and g to be equal if they are equal almost everywhere. To be more precise, let us write f ∼ g if f and g are equal a.e.², and define the equivalence class of f to be the set

$$[f] = \{g \in \mathcal{L}^1(X,\mathcal{A},\mu) \mid g \sim f\}$$

Note that two such equivalence classes [f] and [g] are either equal (if f equals g a.e.) or disjoint (if f is not equal to g a.e.). If we let $L^1(X,\mathcal{A},\mu)$ be the collection of all equivalence classes, we can organize $L^1(X,\mathcal{A},\mu)$ as a normed vector space by defining

$$\alpha[f] = [\alpha f], \qquad [f] + [g] = [f + g], \qquad |[f]|_1 = \|f\|_1$$

The advantage of the space $(L^1(X), |\cdot|_1)$ compared to $(\mathcal{L}^1(X,\mathcal{A},\mu), \|\cdot\|_1)$ is that it is a normed space, so all the theorems we have proved about such spaces apply. The disadvantage is that the elements are no longer functions, but equivalence classes of functions. In practice, there is very little difference between $(L^1(X), |\cdot|_1)$ and $(\mathcal{L}^1(X,\mathcal{A},\mu), \|\cdot\|_1)$, and mathematicians tend to blur the distinction between the two spaces. They pretend to work in $L^1(X,\mathcal{A},\mu)$, but still consider the elements as functions. We shall follow this practice here. It is totally harmless as long as you remember that whenever we talk about an element of $L^1(X,\mathcal{A},\mu)$ as a function, we are really choosing a representative from an equivalence class (Exercise 3 gives a more thorough and systematic treatment of $L^1(X,\mathcal{A},\mu)$).

² What we are really doing here is introducing an equivalence relation ∼, but we don’t want to stress the formalities. You may check them yourself in Exercise 3.
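The difference between the seminorm $\|\cdot\|_1$ and the norm $|\cdot|_1$ already shows up on a three-point space. In this sketch we assume a hypothetical discrete measure in which one point has measure zero. A function that is nonzero only at that point has $\|f\|_1 = 0$, and two functions that differ only there lie in the same equivalence class. For a discrete measure, "equal a.e." simply means "equal at every point of positive weight", which is what `equivalent` checks.

```python
from fractions import Fraction

# Hypothetical measure on X = {"a", "b", "c"}; the point "c" has measure zero.
mu = {"a": Fraction(1), "b": Fraction(1, 2), "c": Fraction(0)}

def norm1(f):
    """The seminorm ||f||_1 = integral of |f| d(mu) for f given as a dict of values."""
    return sum(abs(f[x]) * w for x, w in mu.items())

def equivalent(f, g):
    """f ~ g iff f = g a.e.; for this discrete measure that means f and g agree
    at every point of positive measure."""
    return all(f[x] == g[x] for x, w in mu.items() if w > 0)

zero = {"a": 0, "b": 0, "c": 0}
spike = {"a": 0, "b": 0, "c": 7}        # not the zero function, but zero a.e.
f = {"a": 2, "b": -4, "c": 1}
g = {"a": 2, "b": -4, "c": -100}        # another representative of the class [f]

print(norm1(spike), equivalent(spike, zero))
print(norm1(f), norm1(g), equivalent(f, g))
```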

The most important fact about $(L^1(X), |\cdot|_1)$ is that it is complete. In many ways, this is the most impressive success of the theory of measures and integration. We have seen in previous chapters how important completeness is, and it is a great advantage to work with a theory of integration where the space of integrable functions is naturally complete. Before we turn to the proof, you may want to remind yourself of Proposition 4.5.5, which will be our main tool.

Theorem 5.7.1 $(L^1(X), |\cdot|_1)$ is complete.

Proof: Assume that $\{u_n\}$ is a sequence of functions in $L^1(X,\mathcal{A},\mu)$ such that the series $\sum_{n=1}^\infty u_n$ converges absolutely, i.e. such that $\sum_{n=1}^\infty |u_n|_1 < \infty$. According to Proposition 4.5.5, it suffices to show that the series $\sum_{n=1}^\infty u_n$ converges in $L^1(X,\mathcal{A},\mu)$.

We first use the absolute convergence to prove that the series $\sum_{n=1}^\infty |u_n(x)|$ converges to an integrable function:

$$\begin{aligned}
\int \sum_{n=1}^\infty |u_n|\,d\mu &= \int \lim_{N\to\infty}\sum_{n=1}^N |u_n|\,d\mu = \lim_{N\to\infty}\int \sum_{n=1}^N |u_n|\,d\mu \\
&= \lim_{N\to\infty}\sum_{n=1}^N \int |u_n|\,d\mu = \lim_{N\to\infty}\sum_{n=1}^N |u_n|_1 = \sum_{n=1}^\infty |u_n|_1 < \infty
\end{aligned}$$

where we have used the Monotone Convergence Theorem to move the limit inside the integral sign. This means that the function

$$g(x) = \sum_{n=1}^\infty |u_n(x)|$$

is integrable. We shall use g as the dominating function in the Dominated Convergence Theorem.

Let us first observe that since $g(x) = \sum_{n=1}^\infty |u_n(x)|$ is integrable, the series converges a.e. Hence the series $\sum_{n=1}^\infty u_n(x)$ (without the absolute values) converges absolutely a.e., and hence it converges a.e. in the ordinary sense. Let $f(x) = \sum_{n=1}^\infty u_n(x)$, and put f(x) = 0 on the null set where the series does not converge. It remains to prove that the series converges to f in $L^1$-sense, i.e. that $|f - \sum_{n=1}^N u_n|_1 \to 0$ as N → ∞. By definition of f, we know that $\lim_{N\to\infty}\bigl(f(x) - \sum_{n=1}^N u_n(x)\bigr) = 0$ a.e. Since $|f(x) - \sum_{n=1}^N u_n(x)| = |\sum_{n=N+1}^\infty u_n(x)| \le g(x)$ a.e., it follows from the Dominated Convergence Theorem (actually from the slight extension in Exercise 5.6.7) that

$$\Bigl|f - \sum_{n=1}^N u_n\Bigr|_1 = \int \Bigl|f - \sum_{n=1}^N u_n\Bigr|\,d\mu \to 0$$

The theorem is proved. □
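The proof of Theorem 5.7.1 is easy to watch in action. As an assumed example, take Lebesgue measure on [0, 1] and $u_n(x) = (-x)^n/(n+1)$, indexed from n = 0 for convenience. Then $|u_n|_1 = 1/(n+1)^2$, so the series converges absolutely in $L^1$, and its pointwise sum is f(x) = ln(1 + x)/x. The sketch approximates $|f - \sum_{n\le N} u_n|_1$ with a midpoint rule, our numerical stand-in for the integral. It compares the result with the tail $\sum_{n>N} |u_n|_1$, which bounds it; the tail is truncated at 10 000 terms.

```python
import math

def midpoint(func, m=20_000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / m
    return h * sum(func((i + 0.5) * h) for i in range(m))

def u(n, x):
    # u_n(x) = (-x)^n / (n + 1), n >= 0, so that |u_n|_1 = 1 / (n + 1)^2
    return (-x) ** n / (n + 1)

def f(x):
    # pointwise sum of the series on (0, 1]: sum_{n >= 0} (-x)^n / (n + 1) = ln(1 + x) / x
    return math.log1p(x) / x

results = []
for N in (1, 5, 20):
    l1_error = midpoint(lambda x: abs(f(x) - sum(u(n, x) for n in range(N + 1))))
    tail = sum(1 / (n + 1) ** 2 for n in range(N + 1, 10_000))   # truncated tail of sum |u_n|_1
    results.append((N, l1_error, tail))
    print(N, l1_error, tail)
```

The $L^1$-distance to the limit shrinks as N grows and always stays below the tail of the norm series, which is the mechanism the proof exploits.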

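It turns out that $\mathcal{L}^1(X,\mathcal{A},\mu)$ is just one of infinitely many spaces of the same kind. In fact, for any real number p ≥ 1, we may let

$$\mathcal{L}^p(X,\mathcal{A},\mu) = \{f : X \to \mathbb{R} : |f|^p \text{ is integrable}\}$$

and define $\|\cdot\|_p : \mathcal{L}^p(X,\mathcal{A},\mu) \to [0,\infty)$ by

$$\|f\|_p = \left(\int |f|^p\,d\mu\right)^{1/p}$$

It turns out that $\mathcal{L}^p(X,\mathcal{A},\mu)$ is a vector space, and that $\|\cdot\|_p$ is a norm on $\mathcal{L}^p(X,\mathcal{A},\mu)$, except that $\|f\|_p = 0$ whenever f = 0 a.e. (not only when f is identically zero). If we consider functions as equal if they are equal a.e., we can turn $(\mathcal{L}^p(X), \|\cdot\|_p)$ into a normed space $(L^p(X), |\cdot|_p)$ just as we did with $\mathcal{L}^1(X,\mathcal{A},\mu)$.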
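The text states without proof that $\|\cdot\|_p$ satisfies the triangle inequality (this is Minkowski’s inequality). The sketch below does not prove it. It only spot-checks the inequality on random functions, for an assumed discrete measure with random point weights, where $\|f\|_p = (\sum_x |f(x)|^p\mu(\{x\}))^{1/p}$. With all weights equal to 1 (counting measure) and p = 2 we recover the Euclidean norm, in line with Exercise 4 below.

```python
import random

def p_norm(values, weights, p):
    """||f||_p = (integral of |f|^p)^(1/p) for a discrete measure with the given point weights."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1 / p)

random.seed(0)
weights = [random.uniform(0.0, 2.0) for _ in range(6)]   # hypothetical point masses

checks = []
for p in (1, 1.5, 2, 3):
    for _ in range(200):
        f = [random.uniform(-5.0, 5.0) for _ in weights]
        g = [random.uniform(-5.0, 5.0) for _ in weights]
        lhs = p_norm([a + b for a, b in zip(f, g)], weights, p)
        rhs = p_norm(f, weights, p) + p_norm(g, weights, p)
        checks.append(lhs <= rhs + 1e-12)   # triangle inequality, allowing for rounding

print(all(checks))
print(p_norm([3, 4], [1, 1], 2))   # counting measure, p = 2: the Euclidean length of (3, 4)
```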

We shall not pursue the general theory of $L^p$-spaces here, but we shall take a closer look at the case p = 2, i.e. the space

$$\mathcal{L}^2(X,\mathcal{A},\mu) = \{f : X \to \mathbb{R} : |f|^2 \text{ is integrable}\}$$

with the norm

$$\|f\|_2 = \left(\int |f|^2\,d\mu\right)^{1/2}$$

This space is particularly important, as it turns out to be an inner product space with inner product

$$\langle f, g\rangle = \int fg\,d\mu$$

But let us begin at the beginning. To prove that $\mathcal{L}^2(X,\mathcal{A},\mu)$ is a vector space, we need a simple lemma:

Lemma 5.7.2 For all real numbers a, b

$$(a + b)^2 \le 2a^2 + 2b^2$$

Proof:

$$2a^2 + 2b^2 - (a + b)^2 = a^2 + b^2 - 2ab = (a - b)^2 \ge 0$$

□

It is now easy to prove that $\mathcal{L}^2(X,\mathcal{A},\mu)$ is a vector space:


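Proposition 5.7.3 $\mathcal{L}^2(X,\mathcal{A},\mu)$ is a vector space, i.e.

(i) If $f \in \mathcal{L}^2(X,\mathcal{A},\mu)$, then $cf \in \mathcal{L}^2(X,\mathcal{A},\mu)$ for all c ∈ R.

(ii) If $f, g \in \mathcal{L}^2(X,\mathcal{A},\mu)$, then $f + g \in \mathcal{L}^2(X,\mathcal{A},\mu)$.

Proof: Part (i) is easy, and part (ii) follows from the lemma since

$$\int (f + g)^2\,d\mu \le \int (2f^2 + 2g^2)\,d\mu = 2\int f^2\,d\mu + 2\int g^2\,d\mu$$

□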

We are now ready to prove that

$$\langle f, g\rangle = \int fg\,d\mu$$

is almost an inner product on $\mathcal{L}^2(X,\mathcal{A},\mu)$.

Proposition 5.7.4 If $f, g \in \mathcal{L}^2(X,\mathcal{A},\mu)$, then fg is integrable and

$$\langle f, g\rangle = \int fg\,d\mu$$

satisfies

(i) ⟨f, g⟩ = ⟨g, f⟩ for all $f, g \in \mathcal{L}^2(X,\mathcal{A},\mu)$.

(ii) ⟨f + g, h⟩ = ⟨f, h⟩ + ⟨g, h⟩ for all $f, g, h \in \mathcal{L}^2(X,\mathcal{A},\mu)$.

(iii) ⟨cf, g⟩ = c⟨f, g⟩ for all c ∈ R, $f, g \in \mathcal{L}^2(X,\mathcal{A},\mu)$.

(iv) For all $f \in \mathcal{L}^2(X,\mathcal{A},\mu)$, ⟨f, f⟩ ≥ 0 with equality if and only if f = 0 a.e.

Proof: To see that fg is integrable, note that

$$fg = \frac{1}{2}\left((f + g)^2 - f^2 - g^2\right)$$

and hence

$$\int |fg|\,d\mu \le \frac{1}{2}\left(\int (f + g)^2\,d\mu + \int f^2\,d\mu + \int g^2\,d\mu\right) < \infty$$

where we have used the previous proposition. Properties (i)-(iv) are easy consequences of properties we have already proved and are left to the reader. □
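Here is a small check on an assumed discrete measure with one null point. The sketch below verifies the pointwise identity $fg = \frac{1}{2}((f+g)^2 - f^2 - g^2)$ used in the proof, the symmetry and homogeneity of ⟨·, ·⟩, and why (iv) only gives f = 0 a.e. A function that vanishes everywhere except at the null point has ⟨f, f⟩ = 0.

```python
from fractions import Fraction

# Hypothetical discrete measure on four points; the last point is a null set.
weights = [Fraction(1), Fraction(2), Fraction(1, 3), Fraction(0)]

def inner(f, g):
    """<f, g> = integral of f*g with respect to the discrete measure."""
    return sum(a * b * w for a, b, w in zip(f, g, weights))

f = [Fraction(1), Fraction(-2), Fraction(3), Fraction(5)]
g = [Fraction(4), Fraction(1, 2), Fraction(0), Fraction(-8)]

# The identity used in the proof: fg = ((f + g)^2 - f^2 - g^2) / 2, pointwise.
assert all(a * b == ((a + b) ** 2 - a ** 2 - b ** 2) / 2 for a, b in zip(f, g))

spike = [Fraction(0), Fraction(0), Fraction(0), Fraction(9)]   # zero a.e., not identically zero
print(inner(f, g), inner(g, f), inner(spike, spike))
```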

Note that ⟨·, ·⟩ would have been an inner product if instead of (iv) we had had

(iv)′ For all $f \in \mathcal{L}^2(X,\mathcal{A},\mu)$, ⟨f, f⟩ ≥ 0 with equality if and only if f(x) = 0 for all x ∈ X.

To turn ⟨·, ·⟩ into an inner product, we use the same trick as for $\mathcal{L}^1(X,\mathcal{A},\mu)$. We say that two functions $f, g \in \mathcal{L}^2(X,\mathcal{A},\mu)$ are equivalent if they are equal a.e., and we let $L^2(X,\mathcal{A},\mu)$ be the set of all equivalence classes. As before, we let [f] denote the equivalence class of f, and define

$$\langle [f], [g]\rangle = \langle f, g\rangle = \int fg\,d\mu$$

for all $[f], [g] \in L^2(X,\mathcal{A},\mu)$. You should check that this definition makes sense, i.e. that it is independent of the representatives f and g we pick from the equivalence classes [f] and [g].

It follows from the proposition above and the theory in Section 4.6 that $L^2(X,\mathcal{A},\mu)$ is an inner product space with norm

$$|[f]|_2 = \langle [f], [f]\rangle^{1/2} = \left(\int f^2\,d\mu\right)^{1/2}$$

It is usual to blur the distinction between $\mathcal{L}^2(X,\mathcal{A},\mu)$ and $L^2(X,\mathcal{A},\mu)$, just as one blurs the distinction between $\mathcal{L}^1(X,\mathcal{A},\mu)$ and $L^1(X,\mathcal{A},\mu)$. We shall follow this tradition and refer to elements in $L^2(X,\mathcal{A},\mu)$ as if they were functions and not equivalence classes of functions.

We have the same main result for $L^2(X)$ as for $L^1(X)$:

Theorem 5.7.5 $(L^2(X), |\cdot|_2)$ is complete.

Proof: This is almost a copy of the proof that $L^1(X)$ is complete. In fact, once it has been proved that all the $L^p$-norms really are norms, the same argument can be used to prove that all $L^p$-spaces, p ≥ 1, are complete.

We begin by assuming that $\{u_n\}$ is a sequence of functions in $L^2(X)$ such that the series $\sum_{n=1}^\infty u_n$ converges absolutely, i.e. that $\sum_{n=1}^\infty |u_n|_2 < \infty$. According to Proposition 4.5.5, it suffices to show that the series $\sum_{n=1}^\infty u_n$ converges in $L^2(X)$.

Observe first that by the Monotone Convergence Theorem

$$\int \Bigl(\sum_{n=1}^\infty |u_n(x)|\Bigr)^2 d\mu = \int \lim_{N\to\infty}\Bigl(\sum_{n=1}^N |u_n(x)|\Bigr)^2 d\mu = \lim_{N\to\infty}\int \Bigl(\sum_{n=1}^N |u_n(x)|\Bigr)^2 d\mu$$

Taking square roots, we get

$$\Bigl|\sum_{n=1}^\infty |u_n|\Bigr|_2 = \lim_{N\to\infty}\Bigl|\sum_{n=1}^N |u_n|\Bigr|_2$$

The next step is to use this equality and the absolute convergence to prove that the series $\sum_{n=1}^\infty |u_n(x)|$ converges to an $L^2$-function:

$$\Bigl|\sum_{n=1}^\infty |u_n|\Bigr|_2 = \lim_{N\to\infty}\Bigl|\sum_{n=1}^N |u_n|\Bigr|_2 \le \lim_{N\to\infty}\sum_{n=1}^N |u_n|_2 = \sum_{n=1}^\infty |u_n|_2 < \infty$$

where the inequality is the triangle inequality for $|\cdot|_2$. This means that the function

$$g(x) = \sum_{n=1}^\infty |u_n(x)|$$

is in $L^2(X)$. We shall use g as the dominating function in the Dominated Convergence Theorem.

Let us first observe that since $g(x) = \sum_{n=1}^\infty |u_n(x)|$ is in $L^2(X)$, the series converges a.e. Hence the series $\sum_{n=1}^\infty u_n(x)$ (without the absolute values) converges absolutely a.e., and hence it converges a.e. in the ordinary sense. Let $f(x) = \sum_{n=1}^\infty u_n(x)$, and put f(x) = 0 on the null set where the series does not converge. It remains to prove that the series converges to f in $L^2$-sense, i.e. that $|f - \sum_{n=1}^N u_n|_2 \to 0$ as N → ∞. By definition of f, we know that $\lim_{N\to\infty}\bigl(f(x) - \sum_{n=1}^N u_n(x)\bigr) = 0$ a.e. Since $|f(x) - \sum_{n=1}^N u_n(x)| = |\sum_{n=N+1}^\infty u_n(x)| \le g(x)$ a.e., we have $\bigl(f - \sum_{n=1}^N u_n\bigr)^2 \le g^2$ a.e. Since $g^2$ is integrable, it follows from the Dominated Convergence Theorem that

$$\Bigl|f - \sum_{n=1}^N u_n\Bigr|_2 = \left(\int \Bigl(f - \sum_{n=1}^N u_n\Bigr)^2 d\mu\right)^{1/2} \to 0$$

The theorem is proved. □


1. Show that $\mathcal{L}^1(X,\mathcal{A},\mu)$ is a vector space. Since the set of all functions from X to R is a vector space, it suffices to show that $\mathcal{L}^1(X,\mathcal{A},\mu)$ is a subspace, i.e. that cf and f + g are in $\mathcal{L}^1(X,\mathcal{A},\mu)$ whenever $f, g \in \mathcal{L}^1(X,\mathcal{A},\mu)$ and c ∈ R.

2. Show that $\|\cdot\|_1$ satisfies the following conditions:

(i) $\|f\|_1 \ge 0$ for all f, and $\|0\|_1 = 0$ (here 0 is the function that is constantly 0).

(ii) $\|cf\|_1 = |c|\,\|f\|_1$ for all $f \in \mathcal{L}^1(X,\mathcal{A},\mu)$ and all c ∈ R.

(iii) $\|f + g\|_1 \le \|f\|_1 + \|g\|_1$ for all $f, g \in \mathcal{L}^1(X,\mathcal{A},\mu)$.

This means that $\|\cdot\|_1$ is a seminorm.

3. If $f, g \in \mathcal{L}^1(X,\mathcal{A},\mu)$, we write f ∼ g if f = g a.e.

a) Show that ∼ is an equivalence relation.

b) Show that if f ∼ f′ and g ∼ g′, then f + g ∼ f′ + g′. Show also that cf ∼ cf′ for all c ∈ R.

c) Show that if f ∼ g, then $\|f - g\|_1 = 0$ and $\|f\|_1 = \|g\|_1$.

d) Show that the set $L^1(X,\mathcal{A},\mu)$ of all equivalence classes is a normed space if we define scalar multiplication, addition and norm by:

(i) c[f] = [cf] for all c ∈ R, $f \in \mathcal{L}^1(X,\mathcal{A},\mu)$.

(ii) [f] + [g] = [f + g] for all $f, g \in \mathcal{L}^1(X,\mathcal{A},\mu)$.

(iii) $|[f]|_1 = \|f\|_1$ for all $f \in \mathcal{L}^1(X,\mathcal{A},\mu)$.

Why do we need to establish the results in a), b), and c) before we can make these definitions?

4. Let X = {1, 2, 3, ..., d}, let A be the collection of all subsets of X, and let µ be the counting measure, i.e. µ({i}) = 1 for all i. Show that $|f|_2 = \bigl(\sum_{i=1}^d f(i)^2\bigr)^{1/2}$, and explain that $L^2(X,\mathcal{A},\mu)$ is essentially the same as $\mathbb{R}^d$ with the usual metric.

5. Let X = N, let A be the collection of all subsets of X, and let µ be the counting measure, i.e. µ({i}) = 1 for all i. Show that $\mathcal{L}^1(X,\mathcal{A},\mu)$ consists of all functions f such that the series $\sum_{n=1}^\infty f(n)$ converges absolutely. Show also that $|f|_1 = \sum_{n=1}^\infty |f(n)|$. Give a similar description of $\mathcal{L}^2(X,\mathcal{A},\mu)$ and $|f|_2$.

6. Prove (i)-(iv) in Proposition 5.7.4.

7. In this problem (X, A, µ) is a finite measure space (i.e. µ(X) < ∞) and all functions are measurable functions from X to R. We shall use the abbreviated notation

$$\{f > M\} = \{x \in X : f(x) > M\}$$

a) Assume that f is nonnegative. Show that f is integrable if and only if there is a number M ∈ R such that

$$\int_{\{f > M\}} f\,d\mu < \infty$$

b) Assume that f is nonnegative and integrable. Show that

$$\lim_{M\to\infty}\int_{\{f > M\}} f\,d\mu = 0$$

c) Assume that $\{f_n\}$ is a sequence of nonnegative, integrable functions converging pointwise to f. Let M ∈ R. Show that

$$\liminf_{n\to\infty} \mathbf{1}_{\{f_n > M\}} f_n(x) \ge \mathbf{1}_{\{f > M\}} f(x)$$

d) Let $f_n$, f and M be as above. Show that if

$$\int_{\{f_n > M\}} f_n(x)\,d\mu \le \alpha$$

for all n, then

$$\int_{\{f > M\}} f(x)\,d\mu \le \alpha$$

A sequence $\{f_n\}$ of nonnegative functions is called uniformly integrable if

$$\lim_{M\to\infty}\Bigl(\sup_{n\in\mathbb{N}}\int_{\{f_n > M\}} f_n\,d\mu\Bigr) = 0$$

(compare this to part b)).

e) Assume that $\{f_n\}$ is a uniformly integrable sequence of nonnegative functions converging pointwise to f. Show that f is integrable.

f) Let $\{f_n\}$ and f be as in part e). Show that $f_n$ converges to f in $L^1$-norm, i.e.,

$$\|f - f_n\|_{L^1(\mu)} = \int |f - f_n|\,d\mu \to 0 \quad\text{as } n \to \infty$$

Chapter 6

Constructing measures

So far we have been taking measures for granted. Except for a few, almost trivial examples, we have not shown that they exist. In this chapter we shall develop a powerful technique for constructing measures with the properties we want, e.g. the Lebesgue measures and the coin tossing measure described in Section 5.1.

When we construct a measure, we usually start by knowing how we want it to behave on a family of simple sets. For the Lebesgue measure on R, we want the intervals (a, b) to have measure b − a, and for the coin tossing measure we know what we want the measure of a cylinder set to be. The art is to extend such “pre-measures” to full-blown measures.

We shall use a three-step procedure to construct measures. We start with a collection $\mathcal{R}$ of subsets of our space X and a function $\rho : \mathcal{R} \to \mathbb{R}_+$. The idea is that the sets R in $\mathcal{R}$ are the sets whose size ρ(R) we “know”. If, e.g., we want to construct the Lebesgue measure, $\mathcal{R}$ could be the collection of all open intervals, and ρ would then be given by ρ((a, b)) = b − a.

From $\mathcal{R}$ and ρ, we first construct an “outer measure” µ* which assigns a “size” µ*(A) to all subsets A of X. The problem with µ* is that it usually fails to be countably additive, i.e. the crucial equality

$$\mu^*\Bigl(\bigcup_{n\in\mathbb{N}} A_n\Bigr) = \sum_{n=1}^\infty \mu^*(A_n)$$

doesn’t hold for all disjoint sequences $\{A_n\}$ of subsets of X. The second step in our procedure is therefore to identify a σ-algebra $\mathcal{A}$ such that countable additivity holds when the disjoint sets $A_n$ belong to $\mathcal{A}$. The restriction of µ* to $\mathcal{A}$ will then be our desired measure µ. The final step in the procedure is to check that µ really is an extension of ρ, i.e. that $\mathcal{R} \subseteq \mathcal{A}$ and µ(R) = ρ(R) for all $R \in \mathcal{R}$. This is not always the case; it requires special properties of $\mathcal{R}$ and ρ.

We begin by constructing outer measures.



6.1 Outer measure

To construct outer measures, we don’t need to require much of R and ρ:

Definition 6.1.1 Assume that X is a nonempty set and that $\mathcal{R}$ is a collection of subsets of X such that

(i) $\emptyset \in \mathcal{R}$

(ii) There is a collection $\{R_n\}_{n\in\mathbb{N}}$ of sets in $\mathcal{R}$ such that $X = \bigcup_{n\in\mathbb{N}} R_n$.

We also assume that $\rho : \mathcal{R} \to \mathbb{R}_+$ is a function such that ρ(∅) = 0.

Assume that B is a subset of X. A covering¹ of B is a countable collection $\mathcal{C} = \{C_n\}_{n\in\mathbb{N}}$ of sets from $\mathcal{R}$ such that

$$B \subseteq \bigcup_{n\in\mathbb{N}} C_n$$

Note that by 6.1.1(ii), all sets B ⊆ X have at least one covering. We define the size of the covering $\mathcal{C}$ to be

$$|\mathcal{C}| = \sum_{n=1}^\infty \rho(C_n)$$

We are now ready to define the outer measure µ* generated by $\mathcal{R}$ and ρ. For all B ⊆ X, we set

$$\mu^*(B) = \inf\{|\mathcal{C}| : \mathcal{C} \text{ is a covering of } B\}$$

We see why µ* is called an outer measure: it is obtained by approximating sets from the outside by unions of sets in $\mathcal{R}$.
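For a finite set X and a finite collection $\mathcal{R}$, the infimum defining µ* is a minimum that can be found by brute force. Since ρ ≥ 0 and ρ(∅) = 0, repeating a set in a covering never lowers its size. It is therefore enough to minimise over finite subfamilies of $\mathcal{R}$, padded with copies of ∅. The sketch below does exactly that; the function name `outer_measure` and the data X = {1, 2, 3}, $\mathcal{R}$ = {∅, {1}, {2, 3}, X} are our own hypothetical choices.

```python
from itertools import chain, combinations

def outer_measure(B, rho):
    """Brute-force mu*(B) when R (the keys of rho, as frozensets) is finite and rho >= 0.

    Repeating a set in a covering never lowers its size, and rho(empty set) = 0, so the
    infimum over countable coverings equals the minimum over finite subfamilies of R.
    """
    sets = list(rho)
    families = chain.from_iterable(combinations(sets, r) for r in range(len(sets) + 1))
    best = float("inf")
    for family in families:
        if B <= frozenset().union(*family):          # the family covers B
            best = min(best, sum(rho[S] for S in family))
    return best

# Hypothetical data: X = {1, 2, 3}, R = {empty, {1}, {2, 3}, X}.
X = frozenset({1, 2, 3})
rho = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 2, X: 5}

for B in (frozenset(), frozenset({2}), frozenset({1, 3}), X):
    print(sorted(B), outer_measure(B, rho))
```

Here µ*(X) = 3 < 5 = ρ(X), because the covering {1}, {2, 3} is cheaper than X itself (compare Exercise 1 below). The search is exponential in the size of $\mathcal{R}$, so this is only a toy.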

The essential properties of outer measures are not hard to establish:

Proposition 6.1.2 The outer measure µ* satisfies:

(i) µ*(∅) = 0.

(ii) (Monotonicity) If B ⊆ C, then µ*(B) ≤ µ*(C).

(iii) (Countable subadditivity) If $\{B_n\}_{n\in\mathbb{N}}$ is a sequence of subsets of X, then

$$\mu^*\Bigl(\bigcup_{n=1}^\infty B_n\Bigr) \le \sum_{n=1}^\infty \mu^*(B_n)$$

¹ Note that we are here using the term “covering” in a slightly different sense than in Section 2.6: the coverings are now countable, and they do not consist of open sets, but of sets in $\mathcal{R}$.


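Proof: (i) Since $\mathcal{C} = \{\emptyset, \emptyset, \emptyset, \dots\}$ is a covering of ∅ and ρ(∅) = 0, we get µ*(∅) = 0.

(ii) Since any covering of C is a covering of B, we have µ*(B) ≤ µ*(C).

(iii) If µ*(B_n) = ∞ for some n ∈ N, there is nothing to prove, and we may hence assume that µ*(B_n) < ∞ for all n. Let ε > 0 be given. For each n ∈ N, we can find a covering $C^{(n)}_1, C^{(n)}_2, \dots$ of $B_n$ such that

$$\sum_{k=1}^\infty \rho(C^{(n)}_k) < \mu^*(B_n) + \frac{\varepsilon}{2^n}$$

The collection $\{C^{(n)}_k\}_{k,n\in\mathbb{N}}$ of all sets in all the coverings is a countable covering of $\bigcup_{n=1}^\infty B_n$, and

$$\sum_{k,n\in\mathbb{N}} \rho(C^{(n)}_k) = \sum_{n=1}^\infty \Bigl(\sum_{k=1}^\infty \rho(C^{(n)}_k)\Bigr) \le \sum_{n=1}^\infty \Bigl(\mu^*(B_n) + \frac{\varepsilon}{2^n}\Bigr) = \sum_{n=1}^\infty \mu^*(B_n) + \varepsilon$$

(if you feel unsure about these manipulations, take a look at Exercise 5). This means that

$$\mu^*\Bigl(\bigcup_{n\in\mathbb{N}} B_n\Bigr) \le \sum_{n=1}^\infty \mu^*(B_n) + \varepsilon$$

and since ε is any positive number, we must have

$$\mu^*\Bigl(\bigcup_{n\in\mathbb{N}} B_n\Bigr) \le \sum_{n=1}^\infty \mu^*(B_n)$$

□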

Remark: Note that property (iii) in the proposition above also holds for finite sums, i.e.

$$\mu^*\Bigl(\bigcup_{n=1}^N B_n\Bigr) \le \sum_{n=1}^N \mu^*(B_n)$$

(to see this, just apply (iii) to the sequence $B_1, B_2, \dots, B_N, \emptyset, \emptyset, \dots$). In particular, we always have µ*(A ∪ B) ≤ µ*(A) + µ*(B).

We have now completed the first part of our program: we have constructed the outer measure and described its fundamental properties. The next step is to define the measurable sets, to prove that they form a σ-algebra, and to show that µ* is a measure when we restrict it to this σ-algebra.



1. Show that µ*(R) ≤ ρ(R) for all $R \in \mathcal{R}$.

2. Let X = {1, 2}, $\mathcal{R}$ = {∅, {1}, {1, 2}}, and define $\rho : \mathcal{R} \to \mathbb{R}$ by ρ(∅) = 0, ρ({1}) = 2, ρ({1, 2}) = 1. Show that µ*({1}) < ρ({1}).

3. Assume that $X = \mathbb{R}$, and let $\mathcal{R}$ consist of ∅, $\mathbb{R}$, plus all open intervals (a, b), where a, b ∈ $\mathbb{R}$. Define $\rho : \mathcal{R} \to \mathbb{R}_+$ by ρ(∅) = 0, ρ($\mathbb{R}$) = ∞, ρ((a, b)) = b − a.

a) Show that if I = [c, d] is a closed and bounded interval, and $\mathcal{C} = \{C_n\}$ is a covering of I, then there is a finite number $C_{i_1}, C_{i_2}, \dots, C_{i_n}$ of sets from $\mathcal{C}$ that covers I (i.e., such that $I \subseteq C_{i_1} \cup C_{i_2} \cup \dots \cup C_{i_n}$). (Hint: Compactness.)

b) Show that µ*([c, d]) = d − c for all closed and bounded intervals [c, d].

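4. Assume that $\mathcal{R}$ is a σ-algebra and that ρ is a measure on $\mathcal{R}$. Let $(X, \bar{\mathcal{R}}, \bar{\mu})$ be the completion of $(X, \mathcal{R}, \rho)$. Show that $\mu^*(A) = \bar{\mu}(A)$ for all $A \in \bar{\mathcal{R}}$.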

5. Let $\{a_{n,k}\}_{n,k\in\mathbb{N}}$ be a collection of nonnegative, real numbers, and let A be the supremum over all finite sums of distinct elements in this collection, i.e.

$$A = \sup\Bigl\{\sum_{i=1}^I a_{n_i,k_i} : I \in \mathbb{N} \text{ and all pairs } (n_1,k_1), \dots, (n_I,k_I) \text{ are different}\Bigr\}$$

a) Assume that $\{b_m\}_{m\in\mathbb{N}}$ is a sequence which contains each element in the set $\{a_{n,k}\}_{n,k\in\mathbb{N}}$ exactly once. Show that $\sum_{m=1}^\infty b_m = A$.

b) Show that $\sum_{n=1}^\infty \bigl(\sum_{k=1}^\infty a_{n,k}\bigr) = A$.

c) Comment on the proof of Proposition 6.1.2(iii).

6.2 Measurable sets

The definition of a measurable set is not difficult, but it is rather mysterious in the sense that it is not easy to see why it captures the essence of measurability.

Definition 6.2.1 A subset E of X is called µ∗-measurable if

µ∗(A ∩ E) + µ∗(A ∩ Ec) = µ∗(A)

for all A ⊆ X. The collection of all measurable sets is denoted byM. Whenit is obvious which outer measure we have in mind (and it usually is!), weshall drop the reference to µ∗ and just talk of measurable sets2.

² A note on terminology is probably helpful at this stage, as we may seem to use the word “measurable” in two different ways. If we have a measure space $(X, \mathcal{A}, \mu)$, a measurable set is just a set in the σ-algebra $\mathcal{A}$. In this section, we do not have a σ-algebra to begin with, but define the ($\mu^*$-)measurable sets in terms of the outer measure. As it will turn out that the $\mu^*$-measurable sets always form a σ-algebra, there is no real contradiction between the two usages.


This approach to measurability was introduced by the Greek mathematician Constantin Caratheodory (1873-1950) and replaced more cumbersome (but easier to motivate) earlier approaches (see Exercise 6.3.4 for one such). As already mentioned, it is not at all easy to explain why it captures the intuitive notion of measurability. The best explanation I can offer is that the reason why some sets are impossible to measure (and hence nonmeasurable) is that they have very irregular boundaries. The definition above says that a set is measurable if we can use it to cut any other set in two parts without introducing any further irregularities; hence all parts of its boundary must be reasonably regular. I admit that this explanation is rather vague, and a better argument may simply be to show that the definition works. So let us get started.

Let us first of all make a very simple observation. Since $A = (A \cap E) \cup (A \cap E^c)$, subadditivity (recall Proposition 6.1.2(iii)) tells us that we always have

$$\mu^*(A \cap E) + \mu^*(A \cap E^c) \ge \mu^*(A)$$

Hence to prove that a set $E$ is measurable, we only need to prove that

$$\mu^*(A \cap E) + \mu^*(A \cap E^c) \le \mu^*(A)$$

for all $A \subseteq X$.
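To make Definition 6.2.1 concrete, here is a small brute-force sketch on a finite set. The set $X$, the family $\mathcal{R}$ and the values of $\rho$ below are a hypothetical example chosen for illustration; they do not come from the text. On a finite $X$ an optimal covering can be taken to be a finite subfamily of $\mathcal{R}$ without repetitions (repeating a set only adds a nonnegative value, and padding with $\emptyset$ costs nothing), so $\mu^*$ becomes a minimum over subfamilies, and measurability is tested with the inequality above for every $A \subseteq X$.

```python
from itertools import chain, combinations

def subsets(xs):
    """All subsets of the finite collection xs, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

# Hypothetical example: X = {1, 2, 3} with a small family R and a set function rho on it.
X = frozenset({1, 2, 3})
rho = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 2, X: 3}

def outer(A):
    """mu*(A): the cheapest total rho-value of a subfamily of R that covers A."""
    best = float("inf")
    for family in subsets(rho):                 # every subfamily of R
        if A <= frozenset().union(*family):
            best = min(best, sum(rho[C] for C in family))
    return best

def is_measurable(E):
    """Caratheodory's criterion, tested against every A ⊆ X."""
    return all(outer(A & E) + outer(A - E) <= outer(A) for A in subsets(X))

measurable = [set(E) for E in subsets(X) if is_measurable(E)]
print(measurable)   # [set(), {1}, {2, 3}, {1, 2, 3}]
```

In this toy example $\mu^*(\{2\}) + \mu^*(\{3\}) = 4 > 2 = \mu^*(\{2, 3\})$, so $\{2\}$ cuts the test set $A = \{2, 3\}$ badly and fails the criterion, while the sets built from $\{1\}$ and $\{2, 3\}$ pass.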

Our first result is easy.

Lemma 6.2.2 If $E$ has outer measure 0, then $E$ is measurable. In particular, $\emptyset \in \mathcal{M}$.

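Proof: If $E$ has outer measure 0, so has $A \cap E$ since $A \cap E \subseteq E$. Hence

$$\mu^*(A \cap E) + \mu^*(A \cap E^c) = \mu^*(A \cap E^c) \le \mu^*(A)$$

for all $A \subseteq X$, where the last step uses monotonicity of $\mu^*$. □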

Next we have:

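Proposition 6.2.3 $\mathcal{M}$ satisfies:³

(i) $\emptyset \in \mathcal{M}$.

(ii) If $E \in \mathcal{M}$, then $E^c \in \mathcal{M}$.

(iii) If $E_1, E_2, \ldots, E_n \in \mathcal{M}$, then $E_1 \cup E_2 \cup \ldots \cup E_n \in \mathcal{M}$.

(iv) If $E_1, E_2, \ldots, E_n \in \mathcal{M}$, then $E_1 \cap E_2 \cap \ldots \cap E_n \in \mathcal{M}$.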

³ As you probably know from Chapter 5, a family of sets satisfying (i)-(iii) is usually called an algebra. As (iv) is a consequence of (ii) and (iii) using one of De Morgan’s laws, the proposition simply states that $\mathcal{M}$ is an algebra, but in the present context (iv) is in many ways the crucial property.


Proof: We have already proved (i), and (ii) is obvious from the definition of measurable sets (the defining condition is symmetric in $E$ and $E^c$). Since $E_1 \cup E_2 \cup \ldots \cup E_n = (E_1^c \cap E_2^c \cap \ldots \cap E_n^c)^c$ by De Morgan’s laws, (iii) follows from (ii) and (iv). Hence it only remains to prove (iv).

To prove (iv) it suffices to prove that if $E_1, E_2 \in \mathcal{M}$, then $E_1 \cap E_2 \in \mathcal{M}$, as we can then add more sets by induction. If we first use the measurability of $E_1$, we see that for any set $A \subseteq X$

$$\mu^*(A) = \mu^*(A \cap E_1) + \mu^*(A \cap E_1^c)$$

Using the measurability of $E_2$, we get

$$\mu^*(A \cap E_1) = \mu^*((A \cap E_1) \cap E_2) + \mu^*((A \cap E_1) \cap E_2^c)$$

Combining these two expressions and rearranging the parentheses, we have

$$\mu^*(A) = \mu^*(A \cap (E_1 \cap E_2)) + \mu^*(A \cap E_1 \cap E_2^c) + \mu^*(A \cap E_1^c)$$

Observe that (draw a picture!)

$$(A \cap E_1 \cap E_2^c) \cup (A \cap E_1^c) = A \cap (E_1 \cap E_2)^c$$

and hence, by subadditivity,

$$\mu^*(A \cap E_1 \cap E_2^c) + \mu^*(A \cap E_1^c) \ge \mu^*(A \cap (E_1 \cap E_2)^c)$$

Putting this into the expression for $\mu^*(A)$ above, we get

$$\mu^*(A) \ge \mu^*(A \cap (E_1 \cap E_2)) + \mu^*(A \cap (E_1 \cap E_2)^c)$$

which means that $E_1 \cap E_2 \in \mathcal{M}$. □

We would like to extend parts (iii) and (iv) in the proposition above to countable unions and intersections. For this we need the following lemma:

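Lemma 6.2.4 If $E_1, E_2, \ldots, E_n$ is a disjoint collection of measurable sets, then for all $A \subseteq X$

$$\mu^*(A \cap (E_1 \cup E_2 \cup \ldots \cup E_n)) = \mu^*(A \cap E_1) + \mu^*(A \cap E_2) + \ldots + \mu^*(A \cap E_n)$$

Putting $A = X$, we get in particular that

$$\mu^*(E_1 \cup E_2 \cup \ldots \cup E_n) = \mu^*(E_1) + \mu^*(E_2) + \ldots + \mu^*(E_n)$$

Proof: It suffices to prove the lemma for two sets $E_1$ and $E_2$, as we can then extend it by induction. Using the measurability of $E_1$, we see that

$$\mu^*(A \cap (E_1 \cup E_2)) = \mu^*((A \cap (E_1 \cup E_2)) \cap E_1) + \mu^*((A \cap (E_1 \cup E_2)) \cap E_1^c) = \mu^*(A \cap E_1) + \mu^*(A \cap E_2)$$

where the last step uses that $(E_1 \cup E_2) \cap E_1 = E_1$ and, since $E_1$ and $E_2$ are disjoint, $(E_1 \cup E_2) \cap E_1^c = E_2$. □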

We can now prove that M is closed under countable unions.


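Lemma 6.2.5 If $A_n \in \mathcal{M}$ for each $n \in \mathbb{N}$, then $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{M}$.

Proof: Define a new sequence $\{E_n\}$ of sets by $E_1 = A_1$ and

$$E_n = A_n \cap (E_1 \cup E_2 \cup \ldots \cup E_{n-1})^c$$

for $n > 1$, and note that $E_n \in \mathcal{M}$ since $\mathcal{M}$ is an algebra. The sets $E_n$ are disjoint and have the same union as the sets $A_n$ (make a drawing!), and hence it suffices to prove that $\bigcup_{n \in \mathbb{N}} E_n \in \mathcal{M}$, i.e.

$$\mu^*(A) \ge \mu^*\Big(A \cap \bigcup_{n=1}^{\infty} E_n\Big) + \mu^*\Big(A \cap \Big(\bigcup_{n=1}^{\infty} E_n\Big)^c\Big)$$

for all $A \subseteq X$. Since $\bigcup_{n=1}^{N} E_n \in \mathcal{M}$ for all $N \in \mathbb{N}$, we have:

$$\mu^*(A) = \mu^*\Big(A \cap \bigcup_{n=1}^{N} E_n\Big) + \mu^*\Big(A \cap \Big(\bigcup_{n=1}^{N} E_n\Big)^c\Big) \ge \sum_{n=1}^{N} \mu^*(A \cap E_n) + \mu^*\Big(A \cap \Big(\bigcup_{n=1}^{\infty} E_n\Big)^c\Big)$$

where we in the last step have used the lemma above plus the observation that $\big(\bigcup_{n=1}^{\infty} E_n\big)^c \subseteq \big(\bigcup_{n=1}^{N} E_n\big)^c$. Since this inequality holds for all $N \in \mathbb{N}$, we get

$$\mu^*(A) \ge \sum_{n=1}^{\infty} \mu^*(A \cap E_n) + \mu^*\Big(A \cap \Big(\bigcup_{n=1}^{\infty} E_n\Big)^c\Big)$$

By subadditivity, we have $\sum_{n=1}^{\infty} \mu^*(A \cap E_n) \ge \mu^*\big(\bigcup_{n=1}^{\infty} (A \cap E_n)\big) = \mu^*\big(A \cap \bigcup_{n=1}^{\infty} E_n\big)$, and hence

$$\mu^*(A) \ge \mu^*\Big(A \cap \bigcup_{n=1}^{\infty} E_n\Big) + \mu^*\Big(A \cap \Big(\bigcup_{n=1}^{\infty} E_n\Big)^c\Big)$$

□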
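The disjointification step $E_1 = A_1$, $E_n = A_n \cap (E_1 \cup \ldots \cup E_{n-1})^c$ used in the proof is purely set-theoretic, so it can be tried out on finite sets. The minimal sketch below uses a hypothetical helper `disjointify` and hypothetical sample sets; it shows that the new sets are pairwise disjoint and have the same union as the old ones.

```python
def disjointify(sets):
    """Return E_1, E_2, ... with E_1 = A_1 and E_n = A_n minus (E_1 ∪ ... ∪ E_{n-1})."""
    covered = set()          # E_1 ∪ ... ∪ E_{n-1} (which equals A_1 ∪ ... ∪ A_{n-1})
    result = []
    for A in sets:
        E = set(A) - covered
        result.append(E)
        covered |= E
    return result

A = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
E = disjointify(A)
print(E)   # [{1, 2, 3}, {4}, {5}, {6}]
```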

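Proposition 6.2.3 and Lemma 6.2.5 tell us that $\mathcal{M}$ is a σ-algebra. We may restrict $\mu^*$ to $\mathcal{M}$ to get a function

$$\mu : \mathcal{M} \to \overline{\mathbb{R}}_+$$

defined by⁴

$$\mu(A) = \mu^*(A) \quad \text{for all } A \in \mathcal{M}$$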

We can now complete the second part of our program:

⁴ As $\mu$ is just the restriction of $\mu^*$ to a smaller domain, it may seem a luxury to introduce a new symbol for it, but in some situations it is important to be able to distinguish easily between $\mu$ and $\mu^*$.


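Theorem 6.2.6 $\mathcal{M}$ is a σ-algebra, and $\mu$ is a complete measure on $\mathcal{M}$.

Proof: We already know that $\mathcal{M}$ is a σ-algebra, and if we prove that $\mu$ is a measure, the completeness will follow from Lemma 6.2.2 (a subset of a set of measure zero has outer measure zero by monotonicity, and is therefore measurable). As we already know that $\mu(\emptyset) = \mu^*(\emptyset) = 0$, we only need to prove that

$$\mu\Big(\bigcup_{n=1}^{\infty} E_n\Big) = \sum_{n=1}^{\infty} \mu(E_n) \qquad (6.2.1)$$

for all disjoint sequences $\{E_n\}$ of sets from $\mathcal{M}$. By Proposition 6.1.2(iii), we already know that

$$\mu\Big(\bigcup_{n=1}^{\infty} E_n\Big) \le \sum_{n=1}^{\infty} \mu(E_n)$$

To get the opposite inequality, we use Lemma 6.2.4 with $A = X$ to see that

$$\sum_{n=1}^{N} \mu(E_n) = \mu\Big(\bigcup_{n=1}^{N} E_n\Big) \le \mu\Big(\bigcup_{n=1}^{\infty} E_n\Big)$$

Since this holds for all $N \in \mathbb{N}$, we must have

$$\sum_{n=1}^{\infty} \mu(E_n) \le \mu\Big(\bigcup_{n=1}^{\infty} E_n\Big)$$

Hence we have both inequalities, and (6.2.1) is proved. □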

We have now completed the second part of our program: we have shown how we can turn an outer measure $\mu^*$ into a measure $\mu$ by restricting it to the measurable sets. This is an interesting result in itself, but we still have some work to do; we need to compare the measure $\mu$ to the original function $\rho$.


Exercises for Section 6.2

1. Explain in detail why Proposition 6.2.3(iii) follows from (ii) and (iv).

2. Carry out the induction step in the proof of Proposition 6.2.3(iv).

3. Explain the equality $(A \cap E_1 \cap E_2^c) \cup (A \cap E_1^c) = A \cap (E_1 \cap E_2)^c$ in the proof of Proposition 6.2.3.

4. Carry out the induction step in the proof of Lemma 6.2.4.

5. Explain why the sets $E_n$ in the proof of Lemma 6.2.5 are disjoint and have the same union as the sets $A_n$. Explain in detail why the sets $E_n$ belong to $\mathcal{M}$.


6.3 Caratheodory’s Theorem

In the generality we have been working in so far, there isn’t much that can be said about the relationship between the original set function $\rho$ and the measure $\mu$ generated by the outer measure construction. Since $\{R, \emptyset, \emptyset, \ldots\}$ is a covering of $R$, we always have $\mu^*(R) \le \rho(R)$ for all $R \in \mathcal{R}$, but this is not enough. What we really want is that $\mathcal{R} \subseteq \mathcal{M}$ and that $\mu(R) = \rho(R)$ for all $R \in \mathcal{R}$. When this is the case, we call $\mu$ a measure extension of $\rho$.

In addition to our standing assumption $\rho(\emptyset) = 0$, there is one condition that clearly has to be satisfied if there shall be any hope of constructing a measure extension: if $A_1, A_2, A_3, \ldots, A_n, \ldots$ are disjoint sets in $\mathcal{R}$ whose union $\bigcup_{n=1}^{\infty} A_n$ happens to be in $\mathcal{R}$, then

$$\rho\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} \rho(A_n) \qquad (6.3.1)$$

The reason is simply that the corresponding condition has to hold for the measure $\mu$, and if $\mu$ and $\rho$ coincide on $\mathcal{R}$, we get the equality above. Let us give the condition a name:

Definition 6.3.1 We say that $\rho$ is a premeasure if (6.3.1) holds whenever $A_1, A_2, A_3, \ldots, A_n, \ldots$ are disjoint sets in $\mathcal{R}$ whose union $\bigcup_{n=1}^{\infty} A_n$ happens to be in $\mathcal{R}$.

Note the use of the word “happens”: in general, there is no reason why $\bigcup_{n=1}^{\infty} A_n$ should be in $\mathcal{R}$, but when it happens, (6.3.1) must hold.

Being a premeasure isn’t in itself enough to guarantee a measure extension; we also have to require a certain regularity of the family $\mathcal{R}$. We shall prove two versions of the main theorem: first we shall prove that a premeasure $\rho$ always has a measure extension when $\mathcal{R}$ is an algebra, and then we shall strengthen the result by showing that it suffices to assume that $\mathcal{R}$ is a so-called semi-algebra. There is a certain ironic contrast between the two versions: $\mathcal{R}$ being an algebra is a natural-looking condition that doesn’t show up very often in practice, while $\mathcal{R}$ being a semi-algebra is an unnatural-looking condition that occurs all the time.

Let us begin by recalling what an algebra is:

Definition 6.3.2 $\mathcal{R}$ is called an algebra of sets on $X$ if the following conditions are satisfied:

(i) $\emptyset \in \mathcal{R}$.

(ii) If $R \in \mathcal{R}$, then $R^c \in \mathcal{R}$.

(iii) If $R, S \in \mathcal{R}$, then $R \cup S \in \mathcal{R}$.


We are now ready for the first version of the main result of this section.

Theorem 6.3.3 (Caratheodory’s Extension Theorem) Assume that $\mathcal{R}$ is an algebra and that $\rho$ is a premeasure on $\mathcal{R}$. Then the measure $\mu$ generated by the outer measure construction is a complete measure extending $\rho$.

Proof: We know that the outer measure construction generates a σ-algebra $\mathcal{M}$ of measurable sets and a complete measure $\mu$ on $\mathcal{M}$, and it suffices to show that all sets in $\mathcal{R}$ are measurable and that $\mu(R) = \rho(R)$ for all $R \in \mathcal{R}$.

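Let us first prove that any set $R \in \mathcal{R}$ is measurable, i.e. that

$$\mu^*(A) \ge \mu^*(A \cap R) + \mu^*(A \cap R^c)$$

for any set $A \subseteq X$. Given $\varepsilon > 0$, we can find a covering $\mathcal{C} = \{C_n\}_{n \in \mathbb{N}}$ of $A$ such that $\sum_{n=1}^{\infty} \rho(C_n) < \mu^*(A) + \varepsilon$. Since $\mathcal{R}$ is an algebra, $\{C_n \cap R\}$ and $\{C_n \cap R^c\}$ are coverings of $A \cap R$ and $A \cap R^c$, respectively, by sets in $\mathcal{R}$. Since $\rho$ is a premeasure, it is in particular additive on pairs of disjoint sets in $\mathcal{R}$ whose union is in $\mathcal{R}$, so $\rho(C_n) = \rho(C_n \cap R) + \rho(C_n \cap R^c)$, and hence

$$\mu^*(A) + \varepsilon > \sum_{n=1}^{\infty} \rho(C_n) = \sum_{n=1}^{\infty} \big(\rho(C_n \cap R) + \rho(C_n \cap R^c)\big) = \sum_{n=1}^{\infty} \rho(C_n \cap R) + \sum_{n=1}^{\infty} \rho(C_n \cap R^c) \ge \mu^*(A \cap R) + \mu^*(A \cap R^c)$$

Since $\varepsilon > 0$ is arbitrary, this means that $\mu^*(A) \ge \mu^*(A \cap R) + \mu^*(A \cap R^c)$, and hence $R$ is measurable.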

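It remains to prove that $\mu(R) = \mu^*(R) = \rho(R)$ for all $R \in \mathcal{R}$. As we already know that $\mu^*(R) \le \rho(R)$, it suffices to prove the opposite inequality. For any $\varepsilon > 0$, there is a covering $\mathcal{C} = \{C_n\}_{n \in \mathbb{N}}$ of $R$ such that $\sum_{n=1}^{\infty} \rho(C_n) < \mu^*(R) + \varepsilon$. Since $\mathcal{R}$ is an algebra, the sets $C_n' = R \cap \big(C_n \setminus \bigcup_{k=1}^{n-1} C_k\big)$ are disjoint elements of $\mathcal{R}$ whose union is $R$, and hence

$$\rho(R) = \sum_{n=1}^{\infty} \rho(C_n') \le \sum_{n=1}^{\infty} \rho(C_n) < \mu^*(R) + \varepsilon$$

since $\rho$ is a premeasure (the middle inequality uses that $\rho(C_n) = \rho(C_n') + \rho(C_n \setminus C_n') \ge \rho(C_n')$, as both sets lie in the algebra $\mathcal{R}$). As $\varepsilon > 0$ is arbitrary, this means that $\rho(R) \le \mu^*(R)$, and since we already have the opposite inequality, we have proved that $\mu^*(R) = \rho(R)$. □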

In general, there is more than one measure extending a given premeasure $\rho$, but for most spaces that occur in practice, there isn’t too much freedom (you should, however, see Exercise 3 for an extreme case). A measure space $(X, \mathcal{M}, \mu)$ is called σ-finite if $X$ is a countable union of sets of finite measure (i.e. $X = \bigcup_{n \in \mathbb{N}} B_n$ where $\mu(B_n) < \infty$ for all $n$).


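Proposition 6.3.4 Let $\rho$ be a premeasure on an algebra $\mathcal{R}$, and let $(X, \mathcal{M}, \mu)$ be the measure space obtained by the outer measure construction. If $\nu$ is another measure extension of $\rho$ defined on a σ-algebra $\mathcal{B}$, then $\nu(A) \le \mu(A)$ for all $A \in \mathcal{M} \cap \mathcal{B}$, with equality if $\mu(A) < \infty$. If $\mu$ is σ-finite, it is the only measure extension of $\rho$ to $\mathcal{M}$.

Proof: Assume $A \in \mathcal{M} \cap \mathcal{B}$ and let $\mathcal{C} = \{C_n\}$ be a covering of $A$. Then

$$\nu(A) \le \nu\Big(\bigcup_{n \in \mathbb{N}} C_n\Big) \le \sum_{n=1}^{\infty} \nu(C_n) = \sum_{n=1}^{\infty} \rho(C_n) = |\mathcal{C}|$$

and since $\mu(A) = \mu^*(A) = \inf\{|\mathcal{C}| : \mathcal{C} \text{ is a covering of } A\}$, we see that $\nu(A) \le \mu(A)$.

Now assume that $\mu(A) < \infty$. There clearly exists a covering $\mathcal{C} = \{C_n\}$ of $A$ such that $|\mathcal{C}| < \infty$. Replacing $C_n$ by $C_n \setminus (C_1 \cup \ldots \cup C_{n-1})$ if necessary, we may assume that the sets $C_n$ are disjoint. If we put $C = \bigcup_{n \in \mathbb{N}} C_n$, we then have $\mu(C) = \nu(C) = \sum_{n=1}^{\infty} \rho(C_n) < \infty$. Hence

$$\nu(A) + \nu(C \setminus A) = \nu(C) = \mu(C) = \mu(A) + \mu(C \setminus A)$$

and since we already know that $\nu(A) \le \mu(A)$ and $\nu(C \setminus A) \le \mu(C \setminus A)$, this is only possible if $\nu(A) = \mu(A)$ (and $\nu(C \setminus A) = \mu(C \setminus A)$).

The final statement is left to the reader (see Exercise 2). □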

Theoretically, Caratheodory’s theorem is a very natural and satisfactory result, but it is a little inconvenient to use in practice, as we seldom start with a premeasure defined on an algebra of sets; experience shows that we usually start from something slightly weaker called a semi-algebra. We shall now extend Caratheodory’s result to deal with this situation, and we begin with the definition of a semi-algebra.

Definition 6.3.5 Let $X$ be a non-empty set and $\mathcal{R}$ a non-empty collection of subsets of $X$. We call $\mathcal{R}$ a semi-algebra if the following conditions are satisfied:

(i) If $R, S \in \mathcal{R}$, then $R \cap S \in \mathcal{R}$.

(ii) If $R \in \mathcal{R}$, then $R^c$ is a finite, disjoint union of sets in $\mathcal{R}$.

Observation: The empty set $\emptyset$ belongs to all semi-algebras. To see this, pick a set $R \in \mathcal{R}$. According to (ii), $R^c = S_1 \cup S_2 \cup \ldots \cup S_n$ for disjoint sets $S_1, S_2, \ldots, S_n$ in $\mathcal{R}$. But then $\emptyset = R \cap S_1 \in \mathcal{R}$ by condition (i).
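Definition 6.3.5 can be checked mechanically for a finite family. The sketch below is only an illustration built on a hypothetical example: $X = \{1, 2, 3, 4\}$ and $\mathcal{R}$ consisting of the "discrete intervals" $\{i, i+1, \ldots, j\}$ together with $\emptyset$, a finite analogue of the half-open intervals used later. The helper names are hypothetical, and the search is brute force, which is fine for a family of eleven sets. It confirms that this $\mathcal{R}$ is a semi-algebra but not an algebra.

```python
from itertools import combinations

# Hypothetical example: the "intervals" {i, ..., j} of X = {1, 2, 3, 4}, plus the empty set.
X = frozenset(range(1, 5))
R = {frozenset()} | {frozenset(range(i, j + 1)) for i in X for j in X if i <= j}

def is_disjoint_union_of(S, family):
    """True if S is a finite, disjoint union of members of family (brute-force search)."""
    members = [F for F in family if F and F <= S]
    for r in range(len(members) + 1):
        for combo in combinations(members, r):
            # equal total size and equal union <=> a pairwise disjoint cover of S
            if sum(len(F) for F in combo) == len(S) and frozenset().union(*combo) == S:
                return True
    return False

def is_semi_algebra(family, X):
    closed_under_intersection = all(A & B in family for A in family for B in family)
    complements_ok = all(is_disjoint_union_of(X - A, family) for A in family)
    return closed_under_intersection and complements_ok

def is_algebra(family, X):
    return (all(X - A in family for A in family)
            and all(A | B in family for A in family for B in family))

print(len(R), is_semi_algebra(R, X), is_algebra(R, X))   # 11 True False
```

The failure of the algebra test comes, for instance, from $\{2\}^c = \{1, 3, 4\}$, which is not an interval but is the disjoint union $\{1\} \cup \{3, 4\}$, exactly as condition (ii) allows.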

Starting with a semi-algebra, it is not hard to build an algebra:


Lemma 6.3.6 Assume that $\mathcal{R}$ is a semi-algebra on a set $X$ and let $\mathcal{A}$ consist of all finite, disjoint unions of sets in $\mathcal{R}$. Then $\mathcal{A}$ is the algebra generated by $\mathcal{R}$.

Proof: As all sets in $\mathcal{A}$ clearly have to be in any algebra containing $\mathcal{R}$, we only need to show that $\mathcal{A}$ is an algebra, and for this it suffices to show that $\mathcal{A}$ is closed under complements and finite intersections.

Let us start with the intersections. Assume that $A, B$ are two nonempty sets in $\mathcal{A}$, i.e. that

$$A = R_1 \cup R_2 \cup \ldots \cup R_n$$

$$B = Q_1 \cup Q_2 \cup \ldots \cup Q_m$$

are disjoint unions of sets from $\mathcal{R}$. Then

$$A \cap B = \bigcup_{i,j} (R_i \cap Q_j)$$

is clearly a disjoint, finite union of sets in $\mathcal{R}$, and hence $A \cap B \in \mathcal{A}$. By induction, we see that any finite intersection of sets in $\mathcal{A}$ is in $\mathcal{A}$.

Turning to complements, assume that $A = R_1 \cup R_2 \cup \ldots \cup R_n$ is a set in $\mathcal{A}$. By one of De Morgan’s laws

$$A^c = R_1^c \cap R_2^c \cap \ldots \cap R_n^c$$

Since $\mathcal{R}$ is a semi-algebra, each $R_i^c$ is a disjoint union of sets in $\mathcal{R}$ and hence belongs to $\mathcal{A}$. Since we have already proved that $\mathcal{A}$ is closed under finite intersections, $A^c \in \mathcal{A}$. □

Our plan is as follows: given a premeasure $\lambda$ on a semi-algebra $\mathcal{R}$, we shall extend it to a premeasure $\rho$ on the algebra $\mathcal{A}$ generated by $\mathcal{R}$, and then apply Caratheodory’s theorem to $\rho$.

The next lemma will guarantee that it is always possible to extend a premeasure from a semi-algebra $\mathcal{R}$ to the algebra it generates.

Lemma 6.3.7 Assume that $\lambda$ is a premeasure on a semi-algebra $\mathcal{R}$. If a set $A \subseteq X$ can be written as a disjoint, countable union of sets in $\mathcal{R}$ in two different ways, $A = \bigcup_{i \in \mathbb{N}} R_i$ and $A = \bigcup_{j \in \mathbb{N}} S_j$, then

$$\sum_{i=1}^{\infty} \lambda(R_i) = \sum_{j=1}^{\infty} \lambda(S_j)$$

As usual, the equality still holds if one or both unions are finite (just add copies of the empty set to get countable unions).


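Proof: Observe that since $\mathcal{R}$ is a semi-algebra, the intersections $R_i \cap S_j$ belong to $\mathcal{R}$. For fixed $i$, the sets $R_i \cap S_j$, $j \in \mathbb{N}$, are disjoint with union $R_i \in \mathcal{R}$, and hence, since $\lambda$ is a premeasure,

$$\lambda(R_i) = \sum_{j=1}^{\infty} \lambda(R_i \cap S_j)$$

and, by the same argument,

$$\lambda(S_j) = \sum_{i=1}^{\infty} \lambda(R_i \cap S_j)$$

Since all terms are nonnegative, we may interchange the order of summation (compare Exercise 5 in Section 6.1), and this means that

$$\sum_{i=1}^{\infty} \lambda(R_i) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \lambda(R_i \cap S_j) = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} \lambda(R_i \cap S_j) = \sum_{j=1}^{\infty} \lambda(S_j)$$

which is what we wanted to prove. □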
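The proof of Lemma 6.3.7 works through the common refinement $\{R_i \cap S_j\}$. As a hypothetical numerical illustration with finitely many half-open intervals $(a, b]$ on the real line and $\lambda$ = length, the sketch below builds the table of values $\lambda(R_i \cap S_j)$ and checks that its row sums are the $\lambda(R_i)$ and its column sums are the $\lambda(S_j)$. The helper names are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction as Fr

def length(iv):
    """lambda((a, b]) = b - a if a < b; an empty intersection gets 0."""
    a, b = iv
    return max(b - a, 0)

def meet(iv1, iv2):
    """(a1, b1] ∩ (a2, b2] = (max a, min b], which may be empty."""
    return (max(iv1[0], iv2[0]), min(iv1[1], iv2[1]))

# Two ways of writing A = (0, 3] as a disjoint union of half-open intervals.
Rs = [(Fr(0), Fr(1)), (Fr(1), Fr(3))]
Ss = [(Fr(0), Fr(2)), (Fr(2), Fr(5, 2)), (Fr(5, 2), Fr(3))]

table = [[length(meet(R, S)) for S in Ss] for R in Rs]
row_sums = [sum(row) for row in table]
col_sums = [sum(table[i][j] for i in range(len(Rs))) for j in range(len(Ss))]

print(row_sums == [length(R) for R in Rs])   # True
print(col_sums == [length(S) for S in Ss])   # True
print(sum(row_sums), sum(col_sums))          # 3 3
```

With finitely many pieces the interchange of summation is trivial; the point of the lemma is that the same bookkeeping survives for countably many pieces because all terms are nonnegative.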

We are now ready to extend premeasures on semi-algebras to premeasures on the algebras they generate.

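Lemma 6.3.8 If $\lambda$ is a premeasure on a semi-algebra $\mathcal{R}$, there is a premeasure $\rho$ on the algebra $\mathcal{A}$ generated by $\mathcal{R}$ such that

$$\rho(A) = \sum_{i=1}^{n} \lambda(R_i)$$

whenever $A$ is a disjoint union $A = \bigcup_{i=1}^{n} R_i$ of sets $R_i \in \mathcal{R}$.

Proof: As any $A$ in $\mathcal{A}$ is a finite, disjoint union $A = \bigcup_{i=1}^{n} R_i$ of sets in $\mathcal{R}$, we define $\rho : \mathcal{A} \to \overline{\mathbb{R}}_+$ by

$$\rho(A) = \sum_{i=1}^{n} \lambda(R_i)$$

This definition is valid as the previous lemma tells us that if $A = \bigcup_{j=1}^{m} S_j$ is another way of writing $A$ as a disjoint union of sets in $\mathcal{R}$, then $\sum_{i=1}^{n} \lambda(R_i) = \sum_{j=1}^{m} \lambda(S_j)$.

To show that $\rho$ is a premeasure, we need to check that

$$\rho(A) = \sum_{n=1}^{\infty} \rho(A_n)$$

whenever $A$ and $A_1, A_2, \ldots$ are in $\mathcal{A}$ and $A$ is the disjoint union of $A_1, A_2, \ldots$. Since $A$ and the $A_n$’s are in $\mathcal{A}$, they can be written as finite, disjoint unions of sets in $\mathcal{R}$:

$$A = \bigcup_{j=1}^{M} R_j$$

and

$$A_n = \bigcup_{k=1}^{N_n} S_{n,k}$$

Since $A = \bigcup_{n \in \mathbb{N}} A_n = \bigcup_{n,k} S_{n,k}$, the previous lemma tells us that

$$\sum_{j=1}^{M} \lambda(R_j) = \sum_{n,k} \lambda(S_{n,k})$$

The rest is easy: by definition of $\rho$

$$\rho(A) = \sum_{j=1}^{M} \lambda(R_j) = \sum_{n,k} \lambda(S_{n,k}) = \sum_{n=1}^{\infty} \sum_{k=1}^{N_n} \lambda(S_{n,k}) = \sum_{n=1}^{\infty} \rho(A_n)$$

□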

Here is the promised extension of Caratheodory’s theorem to semi-algebras:

Theorem 6.3.9 (Caratheodory’s Theorem for Semi-Algebras) Assume that $\lambda : \mathcal{R} \to \overline{\mathbb{R}}_+$ is a premeasure on a semi-algebra $\mathcal{R}$. Then $\lambda$ has an extension to a complete measure $\mu$ on a σ-algebra $\mathcal{M}$ containing $\mathcal{R}$. If $\mu$ is σ-finite, it is the only measure extension of $\lambda$ to $\mathcal{M}$.

Proof: We just apply the original version of Caratheodory’s Extension Theorem 6.3.3 to the premeasure $\rho$ described in the lemma above. The uniqueness follows from Proposition 6.3.4. □

Remark: Note that the outer measures generated by $\lambda$ and $\rho$ are the same: since any set in $\mathcal{A}$ is a finite, disjoint union of sets in $\mathcal{R}$, any covering by sets in $\mathcal{A}$ can be replicated as a covering (of the same size) by sets in $\mathcal{R}$. This means that when we apply Caratheodory’s theorem for semi-algebras, we may assume that the outer measure is generated by the semi-algebra.

We now have the machinery we need to construct measures, and in the next section we shall use it to construct Lebesgue measure on $\mathbb{R}$.


Exercises for Section 6.3

1. Prove the first statement in the Remark just after Theorem 6.3.9 (that $\lambda$ and $\rho$ generate the same outer measure).

2. In this problem we shall prove the last part of Proposition 6.3.4.

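a) Show that if $(X, \mathcal{A}, \mu)$ is σ-finite, there is an increasing family $\{E_n\}_{n \in \mathbb{N}}$ of sets in $\mathcal{A}$ such that $X = \bigcup_{n \in \mathbb{N}} E_n$ and $\mu(E_n) < \infty$ for all $n \in \mathbb{N}$.

b) Show that if the measure $\mu$ in Proposition 6.3.4 is σ-finite, then it is the only extension of $\rho$ to a measure on $\mathcal{M}$. (Hint: Use that for any $A \in \mathcal{M}$, $A = \bigcup_{n \in \mathbb{N}} (A \cap E_n)$ where $\mu(A \cap E_n) < \infty$.)

c) Show that the Lebesgue measure on $\mathbb{R}^d$ is σ-finite.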

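3. Let $X = \mathbb{Q}$ and let $\mathcal{R}$ be the collection of subsets of $X$ consisting of

(i) All bounded, half-open, rational intervals $(r, s]_{\mathbb{Q}} = \{q \in \mathbb{Q} \mid r < q \le s\}$ where $r, s \in \mathbb{Q}$.

(ii) All unbounded, half-open, rational intervals $(-\infty, s]_{\mathbb{Q}} = \{q \in \mathbb{Q} \mid q \le s\}$ and $(r, \infty)_{\mathbb{Q}} = \{q \in \mathbb{Q} \mid r < q\}$ where $r, s \in \mathbb{Q}$.

a) Show that $\mathcal{R}$ is a semi-algebra.

b) Show that the σ-algebra $\mathcal{M}$ generated by $\mathcal{R}$ is the collection of all subsets of $X$.

Define $\rho : \mathcal{R} \to \overline{\mathbb{R}}_+$ by $\rho(\emptyset) = 0$, $\rho(R) = \infty$ otherwise.

c) Show that $\rho$ is a premeasure.

d) Show that there are infinitely many extensions of $\rho$ to a measure $\nu$ on $\mathcal{M}$. (Hint: Which values may $\nu(\{q\})$ have?)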

4. In this problem we shall take a look at a different (and perhaps more intuitive) approach to measurability. We assume that $\mathcal{R}$ is an algebra and that $\rho$ is a premeasure on $\mathcal{R}$. We also assume that $\rho(X) < \infty$. Define the inner measure of a subset $E$ of $X$ by

$$\mu_*(E) = \mu^*(X) - \mu^*(E^c)$$

We call a set $E$ $*$-measurable if $\mu_*(E) = \mu^*(E)$. (Why is this a natural condition?)

a) Show that $\mu_*(E) \le \mu^*(E)$ for all $E \subseteq X$.

b) Show that if $E$ is measurable, then $E$ is $*$-measurable. (Hint: Use Definition 6.2.1 with $A = X$.)

c) Show that if $E$ is $*$-measurable, then for every $\varepsilon > 0$ there are measurable sets $D, F$ such that $D \subseteq E \subseteq F$ and

$$\mu^*(D) > \mu_*(E) - \frac{\varepsilon}{2} \quad \text{and} \quad \mu^*(F) < \mu^*(E) + \frac{\varepsilon}{2}$$

d) Show that $\mu^*(F \setminus D) < \varepsilon$ and explain that $\mu^*(F \setminus E) < \varepsilon$ and $\mu^*(E \setminus D) < \varepsilon$.

e) Explain that for any set $A \subseteq X$

$$\mu^*(A \cap F) = \mu^*(A \cap D) + \mu^*(A \cap (F \setminus D)) < \mu^*(A \cap D) + \varepsilon$$

and

$$\mu^*(A \cap D^c) = \mu^*(A \cap F^c) + \mu^*(A \cap (F \setminus D)) < \mu^*(A \cap F^c) + \varepsilon$$

and use this to show that $\mu^*(A \cap D) > \mu^*(A \cap E) - \varepsilon$ and $\mu^*(A \cap F^c) > \mu^*(A \cap E^c) - \varepsilon$.

f) Explain that for any set $A \subseteq X$

$$\mu^*(A) \ge \mu^*(A \cap (F^c \cup D)) = \mu^*(A \cap F^c) + \mu^*(A \cap D) \ge \mu^*(A \cap E^c) + \mu^*(A \cap E) - 2\varepsilon$$

and use it to show that if $E$ is $*$-measurable, then $E$ is measurable. The notions of measurable and $*$-measurable hence coincide when $\mu^*(X)$ is finite.

6.4 Lebesgue measure on R

In this section we shall use the theory in the previous section to construct the Lebesgue measure on $\mathbb{R}$. Essentially the same argument can be used to construct Lebesgue measure on $\mathbb{R}^d$ for $d > 1$, but as the geometric considerations become a little more complicated, we shall restrict ourselves to the one-dimensional case. In Section 6.7 we shall see how we can obtain higher-dimensional Lebesgue measure by a different method.

Recall that the one-dimensional Lebesgue measure is a generalization of the notion of length: we know how long intervals are and want to extend this notion of size to a full-blown measure. For technical reasons, it is convenient to work with half-open intervals $(a, b]$.

Definition 6.4.1 In this section, $\mathcal{R}$ consists of the following subsets of $\mathbb{R}$:

(i) $\emptyset$

(ii) All finite, half-open intervals $(a, b]$, where $a, b \in \mathbb{R}$, $a < b$.

(iii) All infinite intervals of the form $(-\infty, b]$ and $(a, \infty)$ where $a, b \in \mathbb{R}$.

The advantage of working with half-open intervals becomes clear when we check what happens when we take intersections and complements:

Proposition 6.4.2 $\mathcal{R}$ is a semi-algebra of sets.

Proof: I leave it to the reader to check the various cases. The crucial observation is that the complement of a half-open interval is either a half-open interval or the union of two half-open intervals; e.g., we have $(a, b]^c = (-\infty, a] \cup (b, \infty)$. □
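The case check left to the reader can be sketched in code. In the sketch below (a hypothetical representation, not the text’s), a pair `(a, b)` with `a < b` stands for the interval $(a, b]$, where `a` may be `-inf` (giving $(-\infty, b]$) and `b` may be `inf` (giving $(a, \infty)$), and `None` stands for $\emptyset$; the pair `(-inf, inf)` is not used, since $\mathbb{R}$ itself is not a member of $\mathcal{R}$. Intersections come back as members of $\mathcal{R}$, and complements as lists of at most two disjoint members, which is exactly the semi-algebra property.

```python
import math

def intersect(I, J):
    """Intersection of two members of R; None stands for the empty set."""
    if I is None or J is None:
        return None
    a, b = max(I[0], J[0]), min(I[1], J[1])
    return (a, b) if a < b else None

def complement(I):
    """Complement of a member of R, as a list of disjoint members of R."""
    if I is None:                       # complement of ∅: R = (-inf, 0] ∪ (0, inf)
        return [(-math.inf, 0.0), (0.0, math.inf)]
    a, b = I
    pieces = []
    if a > -math.inf:
        pieces.append((-math.inf, a))   # (-inf, a]
    if b < math.inf:
        pieces.append((b, math.inf))    # (b, inf)
    return pieces

def lam(I):
    """lambda(I) = length of I; unbounded intervals get infinite length."""
    return 0.0 if I is None else I[1] - I[0]

print(complement((0.0, 1.0)))   # [(-inf, 0.0), (1.0, inf)]
```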

We define $\lambda : \mathcal{R} \to \overline{\mathbb{R}}_+$ simply by letting $\lambda(R)$ be the length of the interval $R$ (and, of course, $\lambda(\emptyset) = 0$).

Lemma 6.4.3 $\lambda$ is a premeasure on $\mathcal{R}$.


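Proof: We must show that if a set $R \in \mathcal{R}$ is a disjoint, countable union $R = \bigcup_{i \in \mathbb{N}} R_i$ of sets in $\mathcal{R}$, then

$$\lambda(R) = \sum_{i=1}^{\infty} \lambda(R_i)$$

We first note that for any finite $N$, the nonoverlapping intervals $R_1, R_2, \ldots, R_N$ can be ordered from left to right, and obviously make up a part of $R$ in such a way that $\lambda(R) \ge \sum_{n=1}^{N} \lambda(R_n)$. Since this holds for all finite $N$, we must have $\lambda(R) \ge \sum_{n=1}^{\infty} \lambda(R_n)$.

The opposite inequality is more subtle, and we need to use a compactness argument. Let us first assume that $R$ is a finite interval $(a, b]$. Given an $\varepsilon > 0$, we extend each interval $R_n = (a_n, b_n]$ to an open interval $\tilde{R}_n = (a_n, b_n + \frac{\varepsilon}{2^n})$ (here $\lambda$ of an open interval simply means its length). These open intervals cover the compact interval $[a + \varepsilon, b]$, and by Theorem 2.6.6 there is a finite subcover $\tilde{R}_{n_1}, \tilde{R}_{n_2}, \ldots, \tilde{R}_{n_k}$. Since this finite set of intervals covers an interval of length $b - a - \varepsilon$, we clearly have $\sum_{j=1}^{k} \lambda(\tilde{R}_{n_j}) \ge b - a - \varepsilon$. But since $\lambda(\tilde{R}_n) = \lambda(R_n) + \frac{\varepsilon}{2^n}$, this means that $\sum_{j=1}^{k} \lambda(R_{n_j}) \ge b - a - 2\varepsilon$. Consequently, $\sum_{n=1}^{\infty} \lambda(R_n) \ge b - a - 2\varepsilon = \lambda(R) - 2\varepsilon$, and since $\varepsilon$ is an arbitrary positive number, we must have $\sum_{n=1}^{\infty} \lambda(R_n) \ge \lambda(R)$.

It remains to see what happens if $R$ is an interval of infinite length, i.e. when $\lambda(R) = \infty$. For each $n \in \mathbb{N}$, the intervals $\{R_i \cap (-n, n]\}_{i \in \mathbb{N}}$ are disjoint and have union $R \cap (-n, n]$. By what we have already proved, $\lambda(R \cap (-n, n]) = \sum_{i=1}^{\infty} \lambda(R_i \cap (-n, n]) \le \sum_{i=1}^{\infty} \lambda(R_i)$. Since $\lim_{n \to \infty} \lambda(R \cap (-n, n]) = \infty$, we see that $\sum_{i=1}^{\infty} \lambda(R_i) = \infty = \lambda(R)$. □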
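A concrete instance of Lemma 6.4.3, given as a hypothetical illustration rather than as part of the proof: $(0, 1]$ is the disjoint union of the half-open intervals $(\frac{1}{n+1}, \frac{1}{n}]$, $n \in \mathbb{N}$, and the partial sums of their lengths telescope to $1 - \frac{1}{N+1}$, which tends to $1 = \lambda((0, 1])$. The sketch below checks the telescoping identity with exact fractions; `partial_length` is a hypothetical helper name.

```python
from fractions import Fraction as Fr

def partial_length(N):
    """Sum of the lengths of (1/(n+1), 1/n] for n = 1, ..., N."""
    return sum(Fr(1, n) - Fr(1, n + 1) for n in range(1, N + 1))

for N in (1, 10, 1000):
    assert partial_length(N) == 1 - Fr(1, N + 1)   # the sum telescopes
print(partial_length(1000))   # 1000/1001
```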

Before we construct the Lebesgue measure, we also need to check that the σ-algebra generated by $\mathcal{R}$ is big enough.

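Lemma 6.4.4 The σ-algebra $\sigma(\mathcal{R})$ generated by $\mathcal{R}$ is the Borel σ-algebra, i.e. the σ-algebra $\mathcal{B}$ generated by the open sets.

Proof: Since the intervals in $\mathcal{R}$ are Borel sets, $\sigma(\mathcal{R}) \subseteq \mathcal{B}$. To prove the opposite inclusion, it suffices to prove that all open sets are in $\sigma(\mathcal{R})$. To this end, first observe that all open intervals $(a, b)$ are in $\sigma(\mathcal{R})$ since $(a, b) = \bigcup_{n \in \mathbb{N}} (a, b - \frac{1}{n}]$ (similarly, $(-\infty, b) = \bigcup_{n \in \mathbb{N}} (-\infty, b - \frac{1}{n}]$, while $(a, \infty) \in \mathcal{R}$). Since all non-empty, open sets are countable unions of open intervals according to Lemma 5.3.2, this means that all open sets are in $\sigma(\mathcal{R})$. □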

We have reached our goal: the lemmas above tell us that we can apply Caratheodory’s theorem for semi-algebras to the semi-algebra $\mathcal{R}$ and the function $\lambda$. The resulting measure $\mu$ is the Lebesgue measure on $\mathbb{R}$. Let us sum up the essential points.


Theorem 6.4.5 The Lebesgue measure $\mu$ is the unique, completed Borel measure on $\mathbb{R}$ such that $\mu(I) = |I|$ for all intervals $I$.

Proof: We apply Caratheodory’s theorem for semi-algebras to $\mathcal{R}$ and $\lambda$. This gives us a measure $\mu$ such that the $\mu$-measure of a half-open interval is equal to its length, and by continuity of measure (Proposition 5.1.5), the same must hold for open and closed intervals. Since $\mu$ is σ-finite, it is unique, and by Lemma 6.4.4 above, it is defined on a σ-algebra containing the Borel sets. □

Sometimes we need to use Lebesgue measure on only a part of $\mathbb{R}$. If, e.g., we want to study functions defined on an interval $[a, b]$, it is natural to restrict $\mu$ to subsets of $[a, b]$. This is unproblematic as the Lebesgue measurable subsets of $[a, b]$ form a σ-algebra on $[a, b]$. Formally, we should give $\mu$ a new name such as $\mu_{[a,b]}$ when we restrict it to subsets of $[a, b]$, but the tradition is to keep the same name. Hence we shall use notations such as $L^1([a, b], \mu)$ to denote spaces of functions defined on an interval $[a, b]$. The norm is then given by

$$\|f\|_1 = \int |f| \, d\mu$$

where we are only integrating over the set $[a, b]$.

An important property of the Lebesgue measure is that it is translation invariant: if we move a set a distance to the left or to the right, it keeps its measure. To formulate this mathematically, let $E$ be a subset of $\mathbb{R}$ and $a$ a real number, and write

$$E + a = \{e + a \mid e \in E\}$$

for the set we obtain by moving all points in $E$ a distance $a$.

Proposition 6.4.6 If $E \subseteq \mathbb{R}$ is measurable, so is $E + a$ for all $a \in \mathbb{R}$, and $\mu(E + a) = \mu(E)$.

Proof: We shall leave this to the reader (see Exercise 2). The key observation is that $\mu^*(E + a) = \mu^*(E)$ holds for the outer measure since intervals keep their length when we translate them. □

One of the reasons why we had to develop the rather complicated machinery of σ-algebras is that we cannot in general expect to define a measure on all subsets of our measure space $X$: some sets are just so complicated that they are nonmeasurable. We shall now take a look at such a set. Before we begin, we need to modify the notion of translation so that it works on the interval $[0, 1)$. If $x, y \in [0, 1)$, we first define

$$x \dotplus y = \begin{cases} x + y & \text{if } x + y \in [0, 1) \\ x + y - 1 & \text{otherwise} \end{cases}$$

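If $E \subseteq [0, 1)$ and $y \in [0, 1)$, let

$$E \dotplus y = \{e \dotplus y \mid e \in E\}$$

Note that $E \dotplus y$ is the set obtained by first translating $E$ by $y$ and then moving the part that sticks out to the right of $[0, 1)$ one unit to the left so that it fills up the empty part of $[0, 1)$. In other words, $E \dotplus y = \big((E \cap [0, 1 - y)) + y\big) \cup \big((E \cap [1 - y, 1)) + (y - 1)\big)$, a disjoint union of translates of two pieces of $E$. It follows from translation invariance that $E \dotplus y$ is measurable if $E$ is, and that $\mu(E) = \mu(E \dotplus y)$.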
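The operation $\dotplus$ is addition modulo 1 on $[0, 1)$. The sketch below implements it with exact rationals (the helper names and the sample sets are hypothetical) and checks two things on small finite examples: that $x \mapsto x \dotplus y$ permutes the grid $\{k/12\}$, and that $E \dotplus y$ is the union of the two translated pieces $(E \cap [0, 1-y)) + y$ and $(E \cap [1-y, 1)) + (y-1)$.

```python
from fractions import Fraction as Fr

def dotplus(x, y):
    """x ∔ y for x, y in [0, 1): ordinary addition, wrapped back into [0, 1)."""
    s = x + y
    return s if s < 1 else s - 1

def dotplus_set(E, y):
    return {dotplus(e, y) for e in E}

N = 12
grid = {Fr(k, N) for k in range(N)}
y = Fr(5, 12)
assert dotplus_set(grid, y) == grid          # x -> x ∔ y is a bijection of the grid

E = {Fr(1, 12), Fr(1, 2), Fr(3, 4)}
low = {e + y for e in E if e < 1 - y}        # (E ∩ [0, 1-y)) + y
high = {e + y - 1 for e in E if e >= 1 - y}  # (E ∩ [1-y, 1)) + (y - 1)
assert dotplus_set(E, y) == low | high
print(sorted(dotplus_set(E, y)))   # [Fraction(1, 6), Fraction(1, 2), Fraction(11, 12)]
```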

Example 1: A nonmeasurable set. We start by introducing an equivalence relation $\sim$ on the interval $[0, 1)$:

$$x \sim y \iff x - y \text{ is rational}$$

Next, we let $E$ be a set obtained by picking one element from each equivalence class.⁵ We shall work with the sets $E \dotplus q$ for all rational numbers $q$ in the interval $[0, 1)$, i.e., for all $q \in \mathcal{Q} = \mathbb{Q} \cap [0, 1)$.

First observe that if $q_1 \neq q_2$, then $(E \dotplus q_1) \cap (E \dotplus q_2) = \emptyset$. If not, we could write the common element $x$ as both $x = e_1 \dotplus q_1$ and $x = e_2 \dotplus q_2$ for some $e_1, e_2 \in E$. The equality $e_1 \dotplus q_1 = e_2 \dotplus q_2$ implies that $e_1 - e_2$ is rational, and by definition of $E$ this is only possible if $e_1 = e_2$. Hence $q_1 = q_2$ (since $q \mapsto e_1 \dotplus q$ is injective on $[0, 1)$), contradicting the assumption that $q_1 \neq q_2$.

The next observation is that $[0, 1) = \bigcup_{q \in \mathcal{Q}} (E \dotplus q)$. To see this, pick an arbitrary $x \in [0, 1)$. By definition, there is an $e$ in $E$ that belongs to the same equivalence class as $x$, i.e. such that $q = x - e$ is rational. If $q \in [0, 1)$, then $x \in E \dotplus q$; if $q < 0$ (the only other possibility), we have $x \in E \dotplus (q + 1)$ (check this!). In either case, $x \in \bigcup_{q \in \mathcal{Q}} (E \dotplus q)$.

Assume for contradiction that $E$ is Lebesgue measurable. Then, as already observed, $E \dotplus q$ is Lebesgue measurable with $\mu(E \dotplus q) = \mu(E)$ for $q \in \mathcal{Q}$. But by countable additivity

$$\mu([0, 1)) = \sum_{q \in \mathcal{Q}} \mu(E \dotplus q)$$

and since $\mu([0, 1)) = 1$, this is impossible: a sum of countably many equal, nonnegative numbers is either $\infty$ (if the numbers are positive) or 0 (if the numbers are 0).

⁵ Here we are using a principle from set theory called the Axiom of Choice, which allows us to make a new set by picking one element from each set in an infinite family. It has been proved that it is not possible to construct a nonmeasurable subset of $\mathbb{R}$ without using a principle of this kind.


Note that this argument works not only for the Lebesgue measure, but for any translation invariant measure on $\mathbb{R}$ that gives $[0, 1)$ a finite, positive measure. This means that it is impossible to find a translation invariant measure on $\mathbb{R}$ that makes all sets measurable and gives $[0, 1)$ a finite, positive measure. ♣

The existence of nonmeasurable sets is not a surprise (there is no reason to expect that all sets should be measurable), but it is a nuisance which complicates many arguments. As we have seen in this chapter, the hard part is often to find the right class of measurable sets and prove that it is a σ-algebra.


Exercises for Section 6.4

1. Complete the proof of Proposition 6.4.2.

2. Prove Proposition 6.4.6 by:

a) Showing that if $E \subseteq \mathbb{R}$ and $a \in \mathbb{R}$, then $\mu^*(E + a) = \mu^*(E)$.

b) Showing that if $E \subseteq \mathbb{R}$ is measurable, then so is $E + a$ for any $a \in \mathbb{R}$.

c) Explaining how to obtain the proposition from a) and b).

3. Show that if $f : \mathbb{R} \to \mathbb{R}$ is Lebesgue measurable, then $f_a(x) = f(x + a)$ is Lebesgue measurable for all $a \in \mathbb{R}$.

4. Check that the equivalence relation in Example 1 really is an equivalence relation.

5. If $A$ is a subset of $\mathbb{R}$ and $r$ is a positive, real number, let

$$rA = \{ra \mid a \in A\}$$

Show that if $A$ is measurable, then so is $rA$, and $\mu(rA) = r\mu(A)$.

6. Use Proposition 6.4.6 to prove that if $E \subseteq [0, 1)$ is Lebesgue measurable, then $E \dotplus y$ is Lebesgue measurable with $\mu(E \dotplus y) = \mu(E)$ for all $y \in [0, 1)$.

6.5 Approximation results

Measurable sets and functions can be quite complicated, and it is often useful to know that they can be approximated by sets and functions that are easier to grasp. In this section we shall see how Lebesgue measurable sets can be approximated by open, closed and compact sets and how measurable functions can be approximated by continuous functions. Throughout the section, $\mu$ is the Lebesgue measure on $\mathbb{R}$.

Proposition 6.5.1 Assume that $A \subseteq \mathbb{R}$ is a measurable set. For each $\varepsilon > 0$, there is an open set $G \supseteq A$ such that $\mu(G \setminus A) < \varepsilon$, and a closed set $F \subseteq A$ such that $\mu(A \setminus F) < \varepsilon$.


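Proof: We begin with the open sets. Assume first that $A$ has finite measure. Then for every $\varepsilon > 0$ there is a covering $\{C_n\}$ of $A$ by half-open intervals $C_n = (a_n, b_n]$ such that

$$\sum_{n=1}^{\infty} |C_n| < \mu(A) + \frac{\varepsilon}{2}$$

If we replace the half-open intervals $C_n = (a_n, b_n]$ by the open intervals $B_n = (a_n, b_n + \frac{\varepsilon}{2^{n+1}})$, we get an open set $G = \bigcup_{n=1}^{\infty} B_n$ containing $A$ with

$$\mu(G) \le \sum_{n=1}^{\infty} \mu(B_n) = \sum_{n=1}^{\infty} \Big(|C_n| + \frac{\varepsilon}{2^{n+1}}\Big) < \mu(A) + \varepsilon$$

and hence

$$\mu(G \setminus A) = \mu(G) - \mu(A) < \varepsilon$$

by Lemma 5.1.4c).

If $\mu(A)$ is infinite, we slice $A$ into pieces of finite measure $A_n = A \cap (n, n + 1]$ for all $n \in \mathbb{Z}$, and use what we have already proved to find an open set $G_n$ such that $A_n \subseteq G_n$ and $\mu(G_n \setminus A_n) < \frac{\varepsilon}{2^{|n|+2}}$. Then $G = \bigcup_{n \in \mathbb{Z}} G_n$ is an open set which contains $A$, and since $G \setminus A \subseteq \bigcup_{n \in \mathbb{Z}} (G_n \setminus A_n)$, we get

$$\mu(G \setminus A) \le \sum_{n \in \mathbb{Z}} \mu(G_n \setminus A_n) < \sum_{n \in \mathbb{Z}} \frac{\varepsilon}{2^{|n|+2}} = \frac{3\varepsilon}{4} < \varepsilon,$$

proving the statement about approximation by open sets.

To prove the statement about closed sets, just note that if we apply the first part of the proposition to $A^c$, we get an open set $G \supseteq A^c$ such that $\mu(G \setminus A^c) < \varepsilon$. This means that $F = G^c$ is a closed set such that $F \subseteq A$, and since $A \setminus F = G \setminus A^c$, we have $\mu(A \setminus F) < \varepsilon$. □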
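The first step of the proof compares $\mu(G)$, where $G$ is a union of possibly overlapping intervals, with the bound $\sum \mu(B_n)$. For finitely many bounded intervals, the measure of the union can be computed exactly by sorting and merging. The sketch below does this with a hypothetical helper `union_length` and exact rationals, for a made-up two-interval covering enlarged by $\varepsilon/2^{n+1}$ as in the proof.

```python
from fractions import Fraction as Fr

def union_length(intervals):
    """Lebesgue measure of a finite union of bounded intervals given as pairs (a, b), a < b.

    Endpoints do not affect the measure, so open, closed and half-open
    intervals are all handled the same way."""
    total, current = Fr(0), None
    for a, b in sorted(intervals):
        if current is None or a > current[1]:       # a gap: close the current block
            if current is not None:
                total += current[1] - current[0]
            current = [a, b]
        else:                                        # overlapping or touching: extend it
            current[1] = max(current[1], b)
    if current is not None:
        total += current[1] - current[0]
    return total

eps = Fr(1, 10)
C = [(Fr(0), Fr(1)), (Fr(1, 2), Fr(2))]             # hypothetical covering (a_n, b_n]
B = [(a, b + eps / 2 ** (n + 1)) for n, (a, b) in enumerate(C, start=1)]
mu_G = union_length(B)
bound = sum(b - a for a, b in B)
print(mu_G, bound)   # 161/80 203/80
```

Overlaps make $\mu(G)$ strictly smaller than $\sum \mu(B_n)$ here, which is harmless: the proof only needs the inequality.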

When the set $A$ has finite measure, we can replace the closed sets in the proposition above by compact sets:

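Corollary 6.5.2 Assume that $A \subseteq \mathbb{R}$ is a measurable set and that $\mu(A) < \infty$. For each $\varepsilon > 0$, there is a compact set $K \subseteq A$ such that $\mu(A \setminus K) < \varepsilon$.

Proof: By the proposition, there is a closed set $F \subseteq A$ such that $\mu(A \setminus F) < \varepsilon$. The sets $K_n = F \cap [-n, n]$ are compact with union $F$, and hence $A \setminus F = \bigcap_{n \in \mathbb{N}} (A \setminus K_n)$, where the sets $A \setminus K_n$ decrease. By continuity of measures (here we are using that $\mu(A) < \infty$), $\lim_{n \to \infty} \mu(A \setminus K_n) = \mu(A \setminus F) < \varepsilon$, and hence $\mu(A \setminus K_n) < \varepsilon$ for sufficiently large $n$. □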

We now turn to functions, and first prove a useful lemma that holds for all metric spaces.


Lemma 6.5.3 Assume that K ⊆ O where K is a compact and O an opensubset of a metric space X. Then there is a continuous function f : X →[0, 1] such that f(x) = 1 for all x ∈ K and f(x) = 0 for all x ∈ Oc.

Proof: Assume we can prove that there is a number α > 0 such that d(x, y) ≥α whenever x ∈ K and y ∈ Oc. Then the continuous function g : X → Rdefined by g(x) = infd(x, a) | a ∈ K would have value 0 on K and value αor greater Oc, and hence

f(x) = max0, 1− g(x)

α

would have the properties we are looking for (see Exercise 3 for help to provethat g and f really are continuous)

To prove that such an $\alpha$ exists, we use a compactness argument. For each $x \in K$ there is a ball $B(x; r_x)$ which is contained in $O$. The balls $\{B(x; \frac{r_x}{2})\}_{x\in K}$ (note that we have shrunk the balls to half their original size) form a covering of $K$, and must have a finite subcovering

$$B\Big(x_1; \frac{r_{x_1}}{2}\Big), B\Big(x_2; \frac{r_{x_2}}{2}\Big), \dots, B\Big(x_k; \frac{r_{x_k}}{2}\Big)$$

Choose $\alpha$ to be the smallest of the numbers $\frac{r_{x_1}}{2}, \frac{r_{x_2}}{2}, \dots, \frac{r_{x_k}}{2}$, and assume that $x, y$ are two points in $X$ such that $x \in K$ and $d(x, y) < \alpha$. Then there must be an $x_i$ such that $x \in B(x_i; \frac{r_{x_i}}{2})$, and hence $d(x_i, y) \le d(x_i, x) + d(x, y) < \frac{r_{x_i}}{2} + \alpha \le r_{x_i}$. Consequently, $y \in B(x_i; r_{x_i}) \subseteq O$. This means that if $y \in O^c$, then $d(x, y) \ge \alpha$ for all $x \in K$. □
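The construction in the proof can be carried out by hand in a simple case. The sketch below makes these assumptions, for illustration only:

- $X = \mathbb{R}^2$ with the Euclidean metric.
- $K$ is a finite set of points, so it is compact and the finite subcovering is immediate.
- $O^c = F$ is a nonempty finite set disjoint from $K$.

The function name `urysohn_sketch` is hypothetical. It computes $\alpha$ as half the smallest radius $r_x = d(x, F)$, as in the shrunken-ball argument, and returns $f = \max\{0, 1 - g/\alpha\}$.

```python
import math

def urysohn_sketch(K, F):
    """Sketch of the construction in Lemma 6.5.3 for X = R^2 with the Euclidean metric.

    K is a finite (hence compact) list of points, F a nonempty finite (hence closed)
    list of points disjoint from K, and O is the open set R^2 minus F.
    """
    # For x in K, the open ball B(x; r_x) with r_x = dist(x, F) is contained in O.
    radii = [min(math.dist(x, p) for p in F) for x in K]
    # K is finite, so the half-size balls B(x; r_x / 2) already form a finite subcover.
    alpha = min(radii) / 2

    def g(y):
        # g(y) = inf{d(y, a) : a in K}
        return min(math.dist(y, a) for a in K)

    def f(y):
        # f = 1 on K, and f = 0 wherever g >= alpha, in particular on F
        return max(0.0, 1.0 - g(y) / alpha)

    return f, alpha

K = [(0.0, 0.0), (1.0, 0.0)]
F = [(0.0, 2.0), (3.0, 0.0)]
f, alpha = urysohn_sketch(K, F)
print(alpha, [f(p) for p in K], [f(p) for p in F], f((0.5, 0.5)))
```

For a general compact $K$, it is the finite subcovering that makes $\alpha$ positive. With finitely many points this is automatic.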

We can now approximate indicator functions by continuous functions.

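Lemma 6.5.4 Assume that $A \subseteq \mathbb{R}$ is measurable with finite measure. For every $\varepsilon > 0$, there is a continuous function $f : \mathbb{R} \to [0,1]$ such that $f = \mathbf{1}_A$ except on a set of measure less than $\varepsilon$.

Proof: By Proposition 6.5.1 and Corollary 6.5.2 we can find a compact set $K$ and an open set $O$ such that $K \subseteq A \subseteq O$ and $\mu(O \setminus K) < \varepsilon$. We can now use the function $f$ from the previous lemma: since $f = 1$ on $K$ and $f = 0$ on $O^c$, it agrees with $\mathbf{1}_A$ outside $O \setminus K$. □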

That simple functions are dense in $L^1(\mu)$ and $L^2(\mu)$ is an almost immediate consequence of the definition of integrals.

Lemma 6.5.5 The simple functions are dense in $L^1(\mu)$ and $L^2(\mu)$.

Proof: I prove the $L^1(\mu)$-case and leave the slightly harder $L^2(\mu)$-case to the reader (see Exercise 4 for help). Assume that $f$ is in $L^1(\mu)$ and that $\varepsilon > 0$. We must find a simple function $h$ such that $\|f - h\|_1 < \varepsilon$. Split $f$ into a positive and a negative part, $f = f^+ - f^-$, in the usual way. By the definition of the integral, there are nonnegative simple functions $h^+ \le f^+$ and $h^- \le f^-$ such that

$$\|f^+ - h^+\|_1 = \int f^+\,d\mu - \int h^+\,d\mu < \frac{\varepsilon}{2} \quad\text{and}\quad \|f^- - h^-\|_1 = \int f^-\,d\mu - \int h^-\,d\mu < \frac{\varepsilon}{2}.$$

Thus $h = h^+ - h^-$ is a simple function with

$$\|f - h\|_1 \le \|f^+ - h^+\|_1 + \|f^- - h^-\|_1 < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

□
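To see the definition of the integral at work, take $f(x) = x$ on $[0,1]$ with Lebesgue measure. This choice is an assumption made only for this illustration. Use the dyadic lower approximations $h_n(x) = \lfloor 2^n x\rfloor/2^n$, which are nonnegative simple functions with $h_n \le f$. Since $0 \le h_n \le f$, $\|f - h_n\|_1 = \int f\,d\mu - \int h_n\,d\mu$, and an exact computation in Python gives $2^{-(n+1)}$. The helper names are hypothetical.

```python
from fractions import Fraction

def integral_of_h(n):
    """Exact integral over [0, 1] of h_n(x) = floor(2^n x) / 2^n."""
    step = Fraction(1, 2 ** n)
    # h_n takes the value k * step on [k * step, (k + 1) * step), an interval of length step
    return sum(k * step * step for k in range(2 ** n))

def l1_error(n):
    """||f - h_n||_1 for f(x) = x on [0, 1]; since 0 <= h_n <= f this is a difference of integrals."""
    return Fraction(1, 2) - integral_of_h(n)

for n in range(5):
    print(n, l1_error(n))
```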

Combining the results above, we now get:

Theorem 6.5.6 The set of continuous functions is dense in both $L^1(\mu)$ and $L^2(\mu)$.

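Proof: We prove the $L^1(\mu)$-case and leave the (similar) $L^2(\mu)$-case to the reader (see Exercise 5). Given $f \in L^1(\mu)$ and an $\varepsilon > 0$, we must show that there is a continuous function $g$ such that $\|f - g\|_1 < \varepsilon$. By Lemma 6.5.5, we know that there is a simple function $h = \sum_{i=1}^n a_i \mathbf{1}_{A_i}$ such that $\|f - h\|_1 < \frac{\varepsilon}{2}$. We may assume that all $a_i \neq 0$; since $h$ is integrable, each $A_i$ then has finite measure. If we put $M = \max\{|a_i| : i = 1, 2, \dots, n\}$, Lemma 6.5.4 tells us that for each $i$ there is a continuous function $g_i : \mathbb{R} \to [0,1]$ such that $g_i = \mathbf{1}_{A_i}$ except on a set of measure less than $\frac{\varepsilon}{2Mn}$. Since $|\mathbf{1}_{A_i} - g_i| \le 1$ everywhere, this means that $\|\mathbf{1}_{A_i} - g_i\|_1 < \frac{\varepsilon}{2Mn}$. If we put $g = \sum_{i=1}^n a_i g_i$, then $g$ is continuous and

$$\|h - g\|_1 = \Big\|\sum_{i=1}^n a_i(\mathbf{1}_{A_i} - g_i)\Big\|_1 \le \sum_{i=1}^n |a_i|\,\|\mathbf{1}_{A_i} - g_i\|_1 \le nM\frac{\varepsilon}{2Mn} = \frac{\varepsilon}{2}$$

Hence

$$\|f - g\|_1 \le \|f - h\|_1 + \|h - g\|_1 < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$$

and the theorem is proved. □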
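On the real line the proof can be imitated concretely when $h$ is a step function whose sets $A_i$ are bounded closed intervals. This is an assumption made for this illustration. Lemma 6.5.4 can then use $K = A_i$ and the open $\delta$-neighbourhood of $A_i$ as $O$, and $g_i$ is the piecewise linear ramp produced by Lemma 6.5.3 with $\alpha = \delta$.

The numerical sketch below compares $\|h - g\|_1$ with the bound $\sum_i |a_i|\,\|\mathbf{1}_{A_i} - g_i\|_1$ used in the proof. The helper names are hypothetical, and the integral is only a midpoint-rule estimate.

```python
def ramp(lo, hi, delta):
    """Continuous g with values in [0, 1]: 1 on [lo, hi], 0 off (lo - delta, hi + delta), linear in between."""
    def g(x):
        d = max(lo - x, 0.0, x - hi)          # distance from x to [lo, hi]
        return max(0.0, 1.0 - d / delta)
    return g

def step_function(pieces):
    """h = sum of c * 1_[lo, hi] over the triples (c, lo, hi) in pieces."""
    return lambda x: sum(c for c, lo, hi in pieces if lo <= x <= hi)

def continuous_version(pieces, delta):
    """g = sum of c * g_i, where g_i is the ramp approximating 1_[lo, hi]."""
    ramps = [(c, ramp(lo, hi, delta)) for c, lo, hi in pieces]
    return lambda x: sum(c * gi(x) for c, gi in ramps)

def l1_distance(u, v, lo, hi, steps=100_000):
    """Midpoint-rule estimate of the integral of |u - v| over [lo, hi]."""
    dx = (hi - lo) / steps
    return dx * sum(abs(u(lo + (i + 0.5) * dx) - v(lo + (i + 0.5) * dx)) for i in range(steps))

pieces = [(2.0, 0.0, 1.0), (-3.0, 2.0, 2.5)]
delta = 0.05
h = step_function(pieces)
g = continuous_version(pieces, delta)
error = l1_distance(h, g, -1.0, 4.0)
# Each ramp differs from its indicator by two triangles of total area delta,
# so the bound sum |c| * ||1_[lo,hi] - g_i||_1 equals delta * sum |c|.
bound = delta * sum(abs(c) for c, _, _ in pieces)
print(error, bound)
```

Here the $\delta$-neighbourhoods of the two intervals do not overlap, so the estimate essentially attains the bound. In general the triangle inequality only gives $\le$.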


Exercises for Section 6.5

1. Explain why $A \setminus F = G \setminus A^c$ at the end of the proof of Proposition 6.5.1.

2. A subset of $\mathbb{R}$ is called a $G_\delta$-set if it is the intersection of countably many open sets, and it is called an $F_\sigma$-set if it is the union of countably many closed sets.

a) Explain why all Gδ- and Fσ-sets are measurable.

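b) Show that if $A \subseteq \mathbb{R}$ is measurable, there is a $G_\delta$-set $G$ such that $A \subseteq G$ and $\mu(G \setminus A) = 0$.

c) Show that if $A \subseteq \mathbb{R}$ is measurable, there is an $F_\sigma$-set $F$ such that $F \subseteq A$ and $\mu(A \setminus F) = 0$.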

3. Let g be the function in the proof of Lemma 6.5.3.

a) Show that |g(x)− g(y)| ≤ d(x, y) for all x, y ∈ X.


b) Show that g is continuous.

c) Show that $f(x) = \max\{0, 1 - g(x)/\alpha\}$ is continuous.

4. a) Assume that $f$ is a nonnegative function in $L^2(\mu)$, and that $\{f_n\}$ is an increasing sequence of simple functions converging pointwise to $f$. Show that $\|f - f_n\|_2 \to 0$ (you may want to use Lebesgue's Dominated Convergence Theorem).

b) Prove the L2(µ)-part of Lemma 6.5.5.

5. Prove the L2-part of Theorem 6.5.6 (you should do Exercise 4 first).

6. Show that the polynomials are dense in $L^1([a,b], \mu)$ and $L^2([a,b], \mu)$ for all $a, b \in \mathbb{R}$, $a < b$. (Hint: Recall Weierstrass' theorem 3.7.1.)

6.6 The coin tossing measure

As a second example of how to construct measures, we shall construct the natural probability measure on the space of infinite sequences of unbiased coin tosses (recall Example 8 of Section 5.1). Be aware that this section is rather sketchy: it is more like a structured sequence of exercises than an ordinary section. The results here will not be used in the sequel.

Let us recall the setting. The underlying space $\Omega$ consists of all infinite sequences

$$\omega = (\omega_1, \omega_2, \dots, \omega_n, \dots)$$

where each $\omega_n$ is either H (for heads) or T (for tails). If $a = a_1, a_2, \dots, a_n$ is a finite sequence of H's and T's, we let

$$C_a = \{\omega \in \Omega \mid \omega_1 = a_1, \omega_2 = a_2, \dots, \omega_n = a_n\}$$

and call it the cylinder set generated by $a$. We call $n$ the length of $C_a$. Let $\mathcal{R}$ be the collection of all cylinder sets (of all possible lengths) plus the empty set $\emptyset$.

Lemma 6.6.1 $\mathcal{R}$ is a semi-algebra.

Proof: The intersection of two cylinder sets $C_a$ and $C_b$ is either equal to one of them (if one of the sequences $a$, $b$ is an extension of the other) or empty. The complement of a cylinder set is the union of all other cylinder sets of the same length. □
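The two facts used in this proof are easy to check mechanically if a cylinder set $C_a$ is represented by the finite tuple $a$. That representation is chosen only for this sketch, and `intersect` and `complement` are hypothetical helper names. The value `None` stands for $\emptyset$.

```python
from itertools import product

def intersect(a, b):
    """C_a ∩ C_b for cylinders given by finite tuples of "H"/"T"; None stands for the empty set."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    # The intersection is the longer cylinder if its sequence extends the shorter one, else empty.
    return longer if longer[:len(shorter)] == shorter else None

def complement(a):
    """The complement of C_a as the list of the other 2^n - 1 cylinders of the same length n."""
    return [c for c in product("HT", repeat=len(a)) if c != a]

print(intersect(("H",), ("H", "T")), intersect(("H", "H"), ("H", "T")))
print(complement(("H", "T")))
```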

We define a function $\lambda : \mathcal{R} \to [0,1]$ by putting $\lambda(\emptyset) = 0$ and

$$\lambda(C_a) = \frac{1}{2^n}$$

where $n$ is the length of $C_a$. (There are $2^n$ cylinder sets of length $n$, and they correspond to $2^n$ equally probable events.)

We first check that λ behaves the right way under finite unions:


Lemma 6.6.2 If $A_1, A_2, \dots, A_N$ are disjoint sets in $\mathcal{R}$ whose union $\bigcup_{n=1}^N A_n$ belongs to $\mathcal{R}$, then

$$\lambda\Big(\bigcup_{n=1}^N A_n\Big) = \sum_{n=1}^N \lambda(A_n)$$

Proof: Left to the reader (see Exercise 1). □
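One way to see Lemma 6.6.2 is to refine everything to a common length $K$. A cylinder of length $n \le K$ splits into $2^{K-n}$ cylinders of length $K$, each with $\lambda$-value $2^{-K}$.

The following Python sketch checks this on one decomposition of $C_{(H)}$. It uses the same tuple representation of cylinders, which is an assumption of the sketch, and `lam` and `refine` are hypothetical names. It illustrates the idea rather than proving the lemma.

```python
from fractions import Fraction
from itertools import product

def lam(a):
    """λ(C_a) = 1/2^n, where n is the length of the finite H/T tuple a."""
    return Fraction(1, 2 ** len(a))

def refine(a, K):
    """Split C_a (of length n <= K) into its 2^(K - n) disjoint sub-cylinders of length K."""
    return [a + tail for tail in product("HT", repeat=K - len(a))]

# C_(H) written as a disjoint union of cylinders of different lengths.
whole = ("H",)
parts = [("H", "H"), ("H", "T", "H"), ("H", "T", "T")]
K = max(len(p) for p in parts)
# At the common length K, the refined parts list every sub-cylinder of C_(H) exactly once ...
assert sorted(c for p in parts for c in refine(p, K)) == sorted(refine(whole, K))
# ... and each cylinder's λ-value is the sum over its refinement, so λ adds up.
print(sum(lam(p) for p in parts), lam(whole))
```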

To prove that $\lambda$ is a premeasure, we must extend the result above to countable unions, i.e., we need to show that if $\{A_n\}$ is a disjoint sequence of cylinder sets whose union is a cylinder set, then

$$\lambda\Big(\bigcup_{n\in\mathbb{N}} A_n\Big) = \sum_{n=1}^{\infty} \lambda(A_n)$$

The next lemma tells us that this condition is trivially satisfied because the situation never occurs: a cylinder set is never a disjoint union of infinitely many nonempty cylinder sets! (Empty sets in the union contribute nothing on either side, so they can be ignored.) As this is the difficult part of the construction, I spell out the details. The argument is actually a compactness argument in disguise and corresponds to Lemma 6.4.3 in the construction of Lebesgue measure.

Before we begin, we need some notation and terminology. For each $k \in \mathbb{N}$, we write $\omega \sim_k \tilde\omega$ if $\omega_i = \tilde\omega_i$ for $i = 1, 2, \dots, k$, i.e., if the first $k$ coin tosses in the sequences $\omega$ and $\tilde\omega$ are equal. We say that a subset $A$ of $\Omega$ is determined at $k \in \mathbb{N}$ if whenever $\omega \sim_k \tilde\omega$, then either both $\omega$ and $\tilde\omega$ belong to $A$ or none of them do (intuitively, this means that you can decide whether $\omega$ belongs to $A$ by looking at the first $k$ coin tosses). Note that a set which is determined at $k$ is also determined at every $k' \ge k$. A cylinder set of length $k$ is obviously determined at $k$. We say that a set $A$ is finitely determined if it is determined at some $k \in \mathbb{N}$.

Lemma 6.6.3 A cylinder set $A$ is never an infinite, countable, disjoint union of nonempty cylinder sets.

Proof: Assume for contradiction that $\{A_n\}_{n\in\mathbb{N}}$ is a disjoint sequence of nonempty cylinder sets whose union $A = \bigcup_{n\in\mathbb{N}} A_n$ is also a cylinder set. Since the $A_n$'s are disjoint and nonempty, the set $B_N = A \setminus \bigcup_{n=1}^N A_n$ is nonempty for all $N \in \mathbb{N}$ (it contains $A_{N+1}$). Note that the sets $B_N$ are finitely determined since $A$ and the $A_n$'s are.

Since the sets $B_N$ are decreasing and nonempty, there must either be strings $\omega = (\omega_1, \omega_2, \dots)$ starting with $\omega_1 = H$ in all the sets $B_N$ or strings starting with $\omega_1 = T$ in all the sets $B_N$ (or both, in which case we just choose H). Call the appropriate symbol $\hat\omega_1$. Arguing in the same way, we see that there must either be strings starting with $(\hat\omega_1, H)$ or strings starting with $(\hat\omega_1, T)$ in all the sets $B_N$ (or both, in which case we just choose H). Call the appropriate symbol $\hat\omega_2$. Continuing in this way, we get an infinite sequence $\hat\omega = (\hat\omega_1, \hat\omega_2, \hat\omega_3, \dots)$ such that for each $k \in \mathbb{N}$, there is a sequence starting with $(\hat\omega_1, \hat\omega_2, \dots, \hat\omega_k)$ in each $B_N$. Since $B_N$ is finitely determined, this means that $\hat\omega \in B_N$ for all $N$ (just choose $k$ so large that $B_N$ is determined at $k$). But this implies that $\hat\omega \in A \setminus \bigcup_{n\in\mathbb{N}} A_n$, which is a contradiction since $A = \bigcup_{n\in\mathbb{N}} A_n$ by assumption. □

We are now ready for the main result:

Theorem 6.6.4 There is a complete measure $P$ on $\Omega$ such that $P(A) = \lambda(A)$ for all finitely determined sets $A$. The measure is unique on the σ-algebra of measurable sets.

Proof: The three lemmas above give us exactly what we need to apply Carathéodory's theorem for semi-algebras. The details are left to the reader. □


Exercises for Section 6.6

1. Prove Lemma 6.6.2. (Hint: If the cylinder sets $A_1, A_2, \dots, A_N$ have lengths $K_1, K_2, \dots, K_N$, respectively, then this is really a statement about finite sequences of length $K = \max\{K_1, K_2, \dots, K_N\}$.)

2. Fill in the details in the proof of Theorem 6.6.4.

3. Let $\{B_n\}_{n\in\mathbb{N}}$ be a decreasing sequence of nonempty, finitely determined sets. Show that $\bigcap_{n\in\mathbb{N}} B_n \neq \emptyset$.

4. Let $E = \{\omega \in \Omega \mid \omega \text{ contains infinitely many H's}\}$. Show that $E$ is not finitely determined.

5. Let H $= 1$, T $= 0$. Show that

$$E = \Big\{\omega \in \Omega \;\Big|\; \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n \omega_i = \frac{1}{2}\Big\}$$

is not finitely determined.

6. Show that a set is in the algebra generated by $\mathcal{R}$ if and only if it is finitely determined. Show that if $E$ is a measurable set, then for any $\varepsilon > 0$ there is a finitely determined set $D$ such that $P(E \,\triangle\, D) < \varepsilon$.

7. (You should do Exercise 6 before you attempt this one.) Assume that $I$ is a nonempty subset of $\mathbb{N}$. Define an equivalence relation $\sim_I$ on $\Omega$ by

$$\omega \sim_I \tilde\omega \iff \omega_i = \tilde\omega_i \text{ for all } i \in I$$

We say that a set $B \subseteq \Omega$ is $I$-determined if whenever $\omega \sim_I \tilde\omega$, then either $\omega$ and $\tilde\omega$ are both in $B$ or both in $B^c$ (intuitively, this means that we can determine whether $\omega$ belongs to $B$ by looking at the coin tosses $\omega_i$ where $i \in I$).


a) Let $\mathcal{A}$ be the σ-algebra of all measurable sets, and define

$$\mathcal{A}_I = \{A \in \mathcal{A} \mid A \text{ is } I\text{-determined}\}$$

Show that $\mathcal{A}_I$ is a σ-algebra.

b) Assume that $A \in \mathcal{A}_I$ and that $C$ is a finitely determined set such that $A \subseteq C$. Show that there is a finitely determined $\tilde C \in \mathcal{A}_I$ such that $A \subseteq \tilde C \subseteq C$.

c) Assume that $A \in \mathcal{A}_I$. Show that for any $\varepsilon > 0$, there is a finitely determined $B \in \mathcal{A}_I$ such that $P(A \,\triangle\, B) < \varepsilon$.

d) Assume that $I, J \subseteq \mathbb{N}$ are disjoint. Show that if $B \in \mathcal{A}_I$ and $D \in \mathcal{A}_J$ are finitely determined, then $P(B \cap D) = P(B)P(D)$. In the language of probability theory, $B$ and $D$ are independent events. (Hint: This is just finite combinatorics and has nothing to do with measures. Note that finitely determined sets are in the algebra generated by $\mathcal{R}$, and hence their measures are given directly in terms of $\lambda$.)

e) We still assume that $I, J \subseteq \mathbb{N}$ are disjoint. Show that if $A \in \mathcal{A}_I$ and $C \in \mathcal{A}_J$, then $P(A \cap C) = P(A)P(C)$. (Hint: Combine c) and d).)

f) Let $I_n = \{n, n+1, n+2, \dots\}$. A set $E \subseteq \Omega$ is called a tail event if $E \in \mathcal{A}_{I_n}$ for all $n \in \mathbb{N}$. Show that the sets $E$ in Exercises 4 and 5 are tail events.

g) Assume that $E$ is a tail event and that $A$ is finitely determined. Show that $P(A \cap E) = P(A)P(E)$.

h) Assume that $E$ is a tail event, and let $E_n$ be finitely determined sets such that $P(E \,\triangle\, E_n) < \frac{1}{n}$ (such sets exist by Exercise 6). Show that $P(E \cap E_n) \to P(E)$ as $n \to \infty$.

i) Show that on the other hand $P(E \cap E_n) = P(E_n)P(E) \to P(E)^2$. Conclude that $P(E) = P(E)^2$, which means that $P(E) = 0$ or $P(E) = 1$. We have proved Kolmogorov's 0-1 law: A tail event can only have probability 0 or 1.

6.7 Product measures

In calculus you learned how to compute double integrals by iterated integration: If $R = [a,b] \times [c,d]$ is a rectangle in the plane, then

$$\iint_R f(x,y)\,dx\,dy = \int_c^d \Big[\int_a^b f(x,y)\,dx\Big]dy = \int_a^b \Big[\int_c^d f(x,y)\,dy\Big]dx$$

There is a similar result in measure theory that we are now going to look at. Starting with two measure spaces $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$, we shall first construct a product measure space $(X \times Y, \mathcal{A} \otimes \mathcal{B}, \mu \times \nu)$ and then prove that

$$\int f\,d(\mu\times\nu) = \int\Big[\int f(x,y)\,d\mu(x)\Big]d\nu(y) = \int\Big[\int f(x,y)\,d\nu(y)\Big]d\mu(x)$$


(I am putting in $x$'s and $y$'s to indicate which variables we are integrating with respect to.) The guiding light in the construction of the product measure $\mu \times \nu$ is that we want it to satisfy the natural product rule

$$\mu\times\nu(A\times B) = \mu(A)\nu(B)$$

for all $A \in \mathcal{A}$ and all $B \in \mathcal{B}$ (think of the formula for the area of an ordinary rectangle).

As usual, we shall apply Carathéodory's theorem for semi-algebras. If we define measurable rectangles to be subsets of $X \times Y$ of the form $R = A \times B$, where $A \in \mathcal{A}$ and $B \in \mathcal{B}$, we first observe that the class of all such sets forms a semi-algebra.

Lemma 6.7.1 The collection R of measurable rectangles is a semi-algebra.

Proof: Observe first that since

$$(R_1 \times S_1) \cap (R_2 \times S_2) = (R_1 \cap R_2) \times (S_1 \cap S_2),$$

the intersection of two measurable rectangles is a measurable rectangle. Moreover, since

$$(A \times B)^c = (A^c \times B^c) \cup (A^c \times B) \cup (A \times B^c), \tag{6.7.1}$$

the complement of a measurable rectangle is a finite, disjoint union of measurable rectangles, and hence $\mathcal{R}$ is a semi-algebra. □
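The two set identities in this proof can be sanity-checked on small finite sets. This is a spot check, not a proof, and the sets below are arbitrary choices for illustration.

```python
def rect(A, B):
    """The rectangle A × B as a set of pairs."""
    return {(x, y) for x in A for y in B}

X, Y = set(range(5)), {"a", "b", "c", "d"}
A, B = {0, 2}, {"a", "b", "c"}
Ac, Bc = X - A, Y - B

# Formula (6.7.1): the complement of A × B in X × Y is the union of three rectangles.
pieces = [rect(Ac, Bc), rect(Ac, B), rect(A, Bc)]
rect_complement = rect(X, Y) - rect(A, B)
covers = set().union(*pieces) == rect_complement
disjoint = sum(len(p) for p in pieces) == len(rect_complement)   # no overlaps, given that they cover

# Intersection formula for two rectangles.
R1, S1, R2, S2 = {0, 1, 2}, {"a", "b"}, {1, 2, 3}, {"b", "c"}
meets = (rect(R1, S1) & rect(R2, S2)) == rect(R1 & R2, S1 & S2)
print(covers, disjoint, meets)
```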

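The next step is to define a function $\lambda : \mathcal{R} \to \mathbb{R}_+$ by

$$\lambda(A \times B) = \mu(A)\nu(B)$$

To show that $\lambda$ is a premeasure, we must prove that if

$$A \times B = \bigcup_{n\in\mathbb{N}} (C_n \times D_n)$$

is a disjoint union, then $\lambda(A \times B) = \sum_{n=1}^\infty \lambda(C_n \times D_n)$, or in other words

$$\mu(A)\nu(B) = \sum_{n=1}^\infty \mu(C_n)\nu(D_n) \tag{6.7.2}$$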

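Observe that since $\mu(C) = \int \mathbf{1}_C(x)\,d\mu(x)$ and $\nu(D) = \int \mathbf{1}_D(y)\,d\nu(y)$, we have

$$\mu(C)\nu(D) = \int \mathbf{1}_C(x)\,d\mu(x) \int \mathbf{1}_D(y)\,d\nu(y) = \int\Big[\int \mathbf{1}_C(x)\mathbf{1}_D(y)\,d\mu(x)\Big]d\nu(y) = \int\Big[\int \mathbf{1}_{C\times D}(x,y)\,d\mu(x)\Big]d\nu(y)$$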


for any two sets $C \in \mathcal{A}$ and $D \in \mathcal{B}$. If $A \times B = \bigcup_{n\in\mathbb{N}} (C_n \times D_n)$ is a disjoint union, the Monotone Convergence Theorem 5.5.6 thus tells us that

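$$\begin{aligned}
\sum_{n=1}^\infty \mu(C_n)\nu(D_n) &= \lim_{N\to\infty} \sum_{n=1}^N \mu(C_n)\nu(D_n) \\
&= \lim_{N\to\infty} \sum_{n=1}^N \int\Big[\int \mathbf{1}_{C_n\times D_n}(x,y)\,d\mu(x)\Big]d\nu(y) \\
&= \lim_{N\to\infty} \int\Big[\int \sum_{n=1}^N \mathbf{1}_{C_n\times D_n}(x,y)\,d\mu(x)\Big]d\nu(y) \\
&= \int\Big[\lim_{N\to\infty} \int \sum_{n=1}^N \mathbf{1}_{C_n\times D_n}(x,y)\,d\mu(x)\Big]d\nu(y) \\
&= \int\Big[\int \lim_{N\to\infty} \sum_{n=1}^N \mathbf{1}_{C_n\times D_n}(x,y)\,d\mu(x)\Big]d\nu(y) \\
&= \int\Big[\int \sum_{n=1}^\infty \mathbf{1}_{C_n\times D_n}(x,y)\,d\mu(x)\Big]d\nu(y) \\
&= \int\Big[\int \mathbf{1}_{A\times B}(x,y)\,d\mu(x)\Big]d\nu(y) = \mu(A)\nu(B)
\end{aligned}$$

which proves equation (6.7.2).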
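Suppose $\mu$ and $\nu$ are given by finitely many point masses, an assumption made only for this illustration. Then (6.7.2) for a finite disjoint decomposition is just a rearrangement of a finite sum. The countable case is where the Monotone Convergence Theorem is needed. Here is an exact-arithmetic sketch with arbitrarily chosen weights; `measure` and `lam_rect` are hypothetical names.

```python
from fractions import Fraction

mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(2)}   # point masses on X = {0, 1, 2}
nu = {"a": Fraction(3), "b": Fraction(1, 4)}                  # point masses on Y = {"a", "b"}

def measure(weights, S):
    """Measure of the finite set S under the given point masses."""
    return sum((weights[s] for s in S), Fraction(0))

def lam_rect(C, D):
    """λ(C × D) = µ(C)ν(D)."""
    return measure(mu, C) * measure(nu, D)

A, B = {0, 1, 2}, {"a", "b"}
pieces = [({0}, {"a", "b"}), ({1, 2}, {"a"}), ({1}, {"b"}), ({2}, {"b"})]

# The pieces lie in A × B, no point is repeated, and all |A||B| points occur: a disjoint cover.
cells = [(x, y) for C, D in pieces for x in C for y in D]
assert len(cells) == len(set(cells)) == len(A) * len(B)

print(sum(lam_rect(C, D) for C, D in pieces), lam_rect(A, B))
```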

We are now ready to prove the main theorem. Remember that a measure space is σ-finite if it is a countable union of sets of finite measure.

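Theorem 6.7.2 Assume that $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are two measure spaces and let $\mathcal{A}\otimes\mathcal{B}$ be the σ-algebra generated by the measurable rectangles $A \times B$, $A \in \mathcal{A}$, $B \in \mathcal{B}$. Then there exists a measure $\mu\times\nu$ on $\mathcal{A}\otimes\mathcal{B}$ such that

$$\mu\times\nu(A\times B) = \mu(A)\nu(B) \quad\text{for all } A \in \mathcal{A},\ B \in \mathcal{B}$$

If $\mu$ and $\nu$ are σ-finite, this measure is unique and is called the product measure of $\mu$ and $\nu$.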

Proof: Apply Carathéodory's theorem for semi-algebras. □

Product measures can be used to construct Lebesgue measure in higher dimensions. If $\mu$ is Lebesgue measure on $\mathbb{R}$, the completion of the product $\mu\times\mu$ is Lebesgue measure on $\mathbb{R}^2$. To get Lebesgue measure on $\mathbb{R}^3$, take a new product $(\mu\times\mu)\times\mu$ and complete it. Continuing in this way, we get Lebesgue measure in all dimensions.



Exercises for Section 6.7

1. Check that the formula $(R_1 \times S_1) \cap (R_2 \times S_2) = (R_1 \cap R_2) \times (S_1 \cap S_2)$ in the proof of Lemma 6.7.1 is correct.

2. Check that formula (6.7.1) is correct and that the union is disjoint.

3. Assume that $\mu$ is the counting measure on $\mathbb{N}$. Show that $\mu\times\mu$ is counting measure on $\mathbb{N}^2$.

4. Show that any open set in $\mathbb{R}^d$ is a countable union of open boxes of the form

$$(a_1, b_1) \times (a_2, b_2) \times \dots \times (a_d, b_d)$$

where $a_1 < b_1, a_2 < b_2, \dots, a_d < b_d$ (this can be used to show that Lebesgue measure on $\mathbb{R}^d$ is a completed Borel measure).

5. In this problem we shall generalize Proposition 6.5.1 from $\mathbb{R}$ to $\mathbb{R}^2$. Let $\mu$ be Lebesgue measure on $\mathbb{R}$ and let $\lambda = \mu\times\mu$.

a) Show that if D,E are open sets in R, then D × E is open in R2.

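b) Assume that $A \times B$ is a measurable rectangle with $\mu(A), \mu(B) < \infty$. Show that for any $\varepsilon > 0$ there are open sets $D, E \subseteq \mathbb{R}$ such that $A \times B \subseteq D \times E$ and $\lambda(D \times E) - \lambda(A \times B) < \varepsilon$.

c) Assume that $Z \subseteq \mathbb{R}^2$ is measurable with $\lambda(Z) < \infty$. Show that for any $\varepsilon > 0$, there is an open set $G \subseteq \mathbb{R}^2$ such that $Z \subseteq G$ and $\lambda(G) - \lambda(Z) < \varepsilon$. Explain why this means that $\lambda(G \setminus Z) < \varepsilon$.

d) Assume that $Z \subseteq \mathbb{R}^2$ is measurable. Show that for any $\varepsilon > 0$, there is an open set $G \subseteq \mathbb{R}^2$ such that $Z \subseteq G$ and $\lambda(G \setminus Z) < \varepsilon$.

e) Assume that $Z \subseteq \mathbb{R}^2$ is measurable. Show that for any $\varepsilon > 0$, there is a closed set $F \subseteq \mathbb{R}^2$ such that $F \subseteq Z$ and $\lambda(Z \setminus F) < \varepsilon$.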

6.8 Fubini’s Theorem

In this section we shall see how we can integrate with respect to a product measure, i.e., we shall prove the formulas

$$\int f\,d(\mu\times\nu) = \int\Big[\int f(x,y)\,d\mu(x)\Big]d\nu(y) = \int\Big[\int f(x,y)\,d\nu(y)\Big]d\mu(x) \tag{6.8.1}$$

mentioned in the previous section. As one might expect, these formulas do not hold for all measurable functions, and part of the challenge is to find the right conditions. We shall prove two theorems: one (Tonelli's Theorem) which only works for nonnegative functions, but doesn't need any additional conditions, and one (Fubini's Theorem) which works for functions taking both signs, but which has integrability conditions that might be difficult to check. Often the two theorems are used in combination: we use Tonelli's Theorem to show that the conditions for Fubini's Theorem are satisfied.

We have to begin with some technicalities. For formula (6.8.1) to make sense, the functions $x \mapsto f(x,y)$ and $y \mapsto f(x,y)$ we get by fixing one of the variables in the function $f(x,y)$ have to be measurable. To simplify notation, we write $f^y(x)$ for the function $x \mapsto f(x,y)$ and $f_x(y)$ for the function $y \mapsto f(x,y)$. Similarly for subsets of $X \times Y$:

$$E^y = \{x \in X \mid (x,y) \in E\} \quad\text{and}\quad E_x = \{y \in Y \mid (x,y) \in E\}$$

These functions and sets are called sections of $f$ and $E$, respectively (make a drawing).
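For finite sets the sections can be computed directly. Here is a toy sketch in Python; the helper names and the sample set $E$ and function $f$ are hypothetical.

```python
def x_section(E, x):
    """E_x = {y : (x, y) in E}, a subset of Y."""
    return {y for (u, y) in E if u == x}

def y_section(E, y):
    """E^y = {x : (x, y) in E}, a subset of X."""
    return {x for (x, v) in E if v == y}

def x_section_of_function(f, x):
    """f_x, the function y -> f(x, y)."""
    return lambda y: f(x, y)

E = {(0, "a"), (0, "b"), (1, "b"), (2, "c")}
print(x_section(E, 0), y_section(E, "b"))

def f(x, y):
    return 10 * x + y

f_2 = x_section_of_function(f, 2)   # the function y -> f(2, y)
print(f_2(7))
```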

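Lemma 6.8.1 Assume that $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are two measure spaces, and let $\mathcal{A}\otimes\mathcal{B}$ be the product σ-algebra.

(i) For any $E \in \mathcal{A}\otimes\mathcal{B}$, we have $E^y \in \mathcal{A}$ and $E_x \in \mathcal{B}$ for all $y \in Y$ and $x \in X$.

(ii) For any $\mathcal{A}\otimes\mathcal{B}$-measurable $f : X \times Y \to \mathbb{R}$, the sections $f^y$ and $f_x$ are $\mathcal{A}$- and $\mathcal{B}$-measurable, respectively, for all $y \in Y$ and $x \in X$.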

Proof: I only prove the lemma for the $y$-sections, and leave the $x$-sections to the reader.

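(i) Fix $y \in Y$. Since $\mathcal{A}\otimes\mathcal{B}$ is the smallest σ-algebra containing the measurable rectangles, it clearly suffices to show that

$$\mathcal{C} = \{E \subseteq X \times Y \mid E^y \in \mathcal{A}\}$$

is a σ-algebra containing the measurable rectangles. That the measurable rectangles are in $\mathcal{C}$ follows from the observation

$$(A \times B)^y = \begin{cases} A & \text{if } y \in B \\ \emptyset & \text{if } y \notin B \end{cases}$$

To show that $\mathcal{C}$ is closed under complements, just note that if $E \in \mathcal{C}$, then $E^y \in \mathcal{A}$, and hence $(E^c)^y = (E^y)^c \in \mathcal{A}$ (check this!), which means that $E^c \in \mathcal{C}$. Similarly for countable unions: If $E_n \in \mathcal{C}$ for each $n \in \mathbb{N}$, then $(E_n)^y \in \mathcal{A}$ for all $n$, and hence

$$\bigcup_{n\in\mathbb{N}} (E_n)^y = \Big(\bigcup_{n\in\mathbb{N}} E_n\Big)^y \in \mathcal{A} \quad\text{(check this!)}$$

which means that $\bigcup_{n\in\mathbb{N}} E_n \in \mathcal{C}$.

(ii) We need to check that $(f^y)^{-1}(I) \in \mathcal{A}$ for all intervals $I$ of the form $[-\infty, r)$. But this follows from (i) and the measurability of $f$ since

$$(f^y)^{-1}(I) = (f^{-1}(I))^y$$

(check this!) □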

Remark: The proof above illustrates a useful technique. To prove that all sets in the σ-algebra $\mathcal{F}$ generated by a family $\mathcal{R}$ satisfy a certain property $P$, we prove

(i) All sets in $\mathcal{R}$ have property $P$.

(ii) The sets with property $P$ form a σ-algebra $\mathcal{G}$.

Then $\mathcal{F} \subseteq \mathcal{G}$ (since $\mathcal{F}$ is the smallest σ-algebra containing $\mathcal{R}$), and hence all sets in $\mathcal{F}$ have property $P$.

There is another measurability problem in formula (6.8.1): We need to know that the integrated sections

$$y \mapsto \int f(x,y)\,d\mu(x) = \int f^y(x)\,d\mu(x)$$

and

$$x \mapsto \int f(x,y)\,d\nu(y) = \int f_x(y)\,d\nu(y)$$

are measurable. This is a more complicated question, and we shall need a quite useful and subtle result known as the Monotone Class Theorem. A monotone class of subsets of a set $Z$ is a collection $\mathcal{M}$ of subsets of $Z$ that is closed under increasing countable unions and decreasing countable intersections. More precisely:

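(i) If $E_1 \subseteq E_2 \subseteq \dots \subseteq E_n \subseteq \dots$ are in $\mathcal{M}$, then $\bigcup_{n\in\mathbb{N}} E_n \in \mathcal{M}$.

(ii) If $E_1 \supseteq E_2 \supseteq \dots \supseteq E_n \supseteq \dots$ are in $\mathcal{M}$, then $\bigcap_{n\in\mathbb{N}} E_n \in \mathcal{M}$.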

All σ-algebras are monotone classes, but a monotone class need not be a σ-algebra (see Exercise 4). If $\mathcal{R}$ is a collection of subsets of $Z$, there is (by the usual argument) a smallest monotone class containing $\mathcal{R}$. It is called the monotone class generated by $\mathcal{R}$.

Theorem 6.8.2 (Monotone Class Theorem) Assume that $Z$ is a nonempty set and that $\mathcal{A}$ is an algebra of subsets of $Z$. Then the σ-algebra and the monotone class generated by $\mathcal{A}$ coincide.

Proof: Let $\mathcal{C}$ be the σ-algebra and $\mathcal{M}$ the monotone class generated by $\mathcal{A}$. Since all σ-algebras are monotone classes, we must have $\mathcal{M} \subseteq \mathcal{C}$.

To prove the opposite inclusion, we show that $\mathcal{M}$ is a σ-algebra. Observe that it suffices to prove that $\mathcal{M}$ is an algebra, as closure under countable unions will then take care of itself: If $\{E_n\}$ is a sequence from $\mathcal{M}$, the sets $F_n = E_1 \cup E_2 \cup \dots \cup E_n$ are in $\mathcal{M}$ since $\mathcal{M}$ is an algebra, and hence $\bigcup_{n\in\mathbb{N}} E_n = \bigcup_{n\in\mathbb{N}} F_n \in \mathcal{M}$ since $\mathcal{M}$ is closed under increasing countable unions.

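To prove that $\mathcal{M}$ is an algebra, we use a trick. For each $M \in \mathcal{M}$ define

$$\mathcal{M}(M) = \{F \in \mathcal{M} \mid F \setminus M,\ M \setminus F,\ F \cap M \in \mathcal{M}\}$$

It is not hard to check that since $\mathcal{M}$ is a monotone class, so is $\mathcal{M}(M)$. Note also that by symmetry, $N \in \mathcal{M}(M) \iff M \in \mathcal{M}(N)$.

Our aim is to prove that $\mathcal{M}(M) = \mathcal{M}$ for all $M \in \mathcal{M}$. This would mean that the intersection and difference of any two sets in $\mathcal{M}$ are in $\mathcal{M}$, and since $Z \in \mathcal{M}$ (because $Z \in \mathcal{A} \subseteq \mathcal{M}$), we may conclude that $\mathcal{M}$ is an algebra.

To show that $\mathcal{M}(M) = \mathcal{M}$ for all $M \in \mathcal{M}$, pick a set $A \in \mathcal{A}$. Since $\mathcal{A}$ is an algebra, we have $\mathcal{A} \subseteq \mathcal{M}(A)$. Since $\mathcal{M}$ is the smallest monotone class containing $\mathcal{A}$, this means that $\mathcal{M}(A) = \mathcal{M}$, and hence $M \in \mathcal{M}(A)$ for all $M \in \mathcal{M}$. By symmetry, $A \in \mathcal{M}(M)$ for all $M \in \mathcal{M}$. Since $A$ was an arbitrary element of $\mathcal{A}$, we have $\mathcal{A} \subseteq \mathcal{M}(M)$ for all $M \in \mathcal{M}$, and using the minimality of $\mathcal{M}$ again, we see that $\mathcal{M}(M) = \mathcal{M}$ for all $M \in \mathcal{M}$. □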

The advantage of the Monotone Class Theorem is that it is often much easier to prove that families are closed under monotone unions and intersections than under arbitrary unions and intersections, especially when there is a measure involved in the definition. The next lemma is a typical case. Note the σ-finiteness condition; Exercise 9 shows that the result does not always hold without it.

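Lemma 6.8.3 Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two σ-finite measure spaces, and assume that $E \subseteq X \times Y$ is $\mathcal{A}\otimes\mathcal{B}$-measurable. Then the functions $x \mapsto \nu(E_x)$ and $y \mapsto \mu(E^y)$ are $\mathcal{A}$- and $\mathcal{B}$-measurable, respectively, and

$$\int \nu(E_x)\,d\mu(x) = \int \mu(E^y)\,d\nu(y) = \mu\times\nu(E)$$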

Proof: We shall prove the part about the $x$-sections $E_x$ and leave the (similar) proof for $y$-sections to the reader. We shall first carry out the proof for finite measure spaces, i.e., we assume that $\mu(X), \nu(Y) < \infty$.

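Let

$$\mathcal{C} = \Big\{E \in \mathcal{A}\otimes\mathcal{B} \;\Big|\; x \mapsto \nu(E_x) \text{ is } \mathcal{A}\text{-measurable and } \int \nu(E_x)\,d\mu(x) = \mu\times\nu(E)\Big\}$$

If we can show that $\mathcal{C}$ is a monotone class containing the algebra generated by the measurable rectangles $\mathcal{R}$, the Monotone Class Theorem will tell us that $\mathcal{A}\otimes\mathcal{B} \subseteq \mathcal{C}$ (since $\mathcal{A}\otimes\mathcal{B}$ is the smallest σ-algebra, and hence the smallest monotone class, containing the algebra generated by $\mathcal{R}$). This obviously suffices to prove the finite case of the lemma.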

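To show that any measurable rectangle $E = A \times B$ belongs to $\mathcal{C}$, just observe that

$$\nu(E_x) = \begin{cases} \nu(B) & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

and that $\int \nu(E_x)\,d\mu(x) = \int_A \nu(B)\,d\mu = \mu(A)\nu(B) = \mu\times\nu(E)$.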


A set $F$ in the algebra generated by the measurable rectangles is a disjoint union $F = \bigcup_{i=1}^n E_i$ of measurable rectangles, and since $\nu(F_x) = \sum_{i=1}^n \nu((E_i)_x)$, the function $x \mapsto \nu(F_x)$ is $\mathcal{A}$-measurable (a sum of measurable functions is measurable) and

$$\int \nu(F_x)\,d\mu(x) = \int \sum_{i=1}^n \nu((E_i)_x)\,d\mu(x) = \sum_{i=1}^n \mu\times\nu(E_i) = \mu\times\nu(F).$$

Hence $F \in \mathcal{C}$.

To show that $\mathcal{C}$ is a monotone class, assume that $\{E_n\}$ is an increasing sequence of sets in $\mathcal{C}$. Let $E = \bigcup_{n\in\mathbb{N}} E_n$, and note that $E_x = \bigcup_{n=1}^\infty (E_n)_x$. By continuity of measure, $\nu(E_x) = \lim_{n\to\infty} \nu((E_n)_x)$, and hence $x \mapsto \nu(E_x)$ is measurable as the limit of a sequence of measurable functions. Moreover, by the Monotone Convergence Theorem and continuity of measures,

$$\int \nu(E_x)\,d\mu(x) = \int \lim_{n\to\infty} \nu((E_n)_x)\,d\mu(x) = \lim_{n\to\infty} \int \nu((E_n)_x)\,d\mu(x) = \lim_{n\to\infty} \mu\times\nu(E_n) = \mu\times\nu(E)$$

This means that $E \in \mathcal{C}$.

We must also check monotone intersections. Assume that $\{E_n\}$ is a decreasing sequence of sets in $\mathcal{C}$. Let $E = \bigcap_{n\in\mathbb{N}} E_n$, and note that $E_x = \bigcap_{n=1}^\infty (E_n)_x$. By continuity of measure (here we are using that $\nu$ is a finite measure), $\nu(E_x) = \lim_{n\to\infty} \nu((E_n)_x)$, and hence $x \mapsto \nu(E_x)$ is measurable. Moreover, using the Dominated Convergence Theorem (since the measure space is finite, we can use the constant function $\nu(Y)$ as the dominating function), and continuity of the finite measure $\mu\times\nu$ in the last step, we see that

$$\int \nu(E_x)\,d\mu(x) = \int \lim_{n\to\infty} \nu((E_n)_x)\,d\mu(x) = \lim_{n\to\infty} \int \nu((E_n)_x)\,d\mu(x) = \lim_{n\to\infty} \mu\times\nu(E_n) = \mu\times\nu(E)$$

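and hence $E \in \mathcal{C}$.

This shows that $\mathcal{C}$ is a monotone class containing the algebra generated by $\mathcal{R}$, and hence, by the Monotone Class Theorem, $\mathcal{A}\otimes\mathcal{B} \subseteq \mathcal{C}$. We have thus proved the lemma for finite measure spaces. To extend it to σ-finite spaces, let $\{X_n\}$ and $\{Y_n\}$ be increasing sequences of measurable subsets of $X$ and $Y$ of finite measure such that $X = \bigcup_{n\in\mathbb{N}} X_n$ and $Y = \bigcup_{n\in\mathbb{N}} Y_n$. If $E$ is an $\mathcal{A}\otimes\mathcal{B}$-measurable subset of $X \times Y$, it follows from what we have already proved (but with some work, see Exercise 11) that $x \mapsto \nu((E \cap (X_n \times Y_n))_x)$ is measurable and

$$\int \nu((E \cap (X_n \times Y_n))_x)\,d\mu(x) = \mu\times\nu(E \cap (X_n \times Y_n))$$

The lemma for σ-finite spaces now follows from the Monotone Convergence Theorem and continuity of measures. □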
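Suppose $X$ and $Y$ are finite and $\mu$, $\nu$ are given by point masses. Then every subset of $X \times Y$ is a finite disjoint union of rectangles $\{x\}\times\{y\}$, so $\mu\times\nu(E) = \sum_{(x,y)\in E}\mu(\{x\})\nu(\{y\})$. In that case Lemma 6.8.3 reduces to summing a finite double sum in two different orders.

Below is a sketch with arbitrarily chosen weights. This discrete setting is an assumption of the illustration, and it cannot exhibit the failure without σ-finiteness described in Exercise 9.

```python
from fractions import Fraction

mu = {0: Fraction(1, 2), 1: Fraction(2), 2: Fraction(1, 3)}        # point masses on X
nu = {"a": Fraction(1), "b": Fraction(3, 4), "c": Fraction(5)}     # point masses on Y
E = {(0, "a"), (0, "c"), (1, "b"), (2, "a"), (2, "b")}             # a subset of X x Y

def nu_of_x_section(x):
    """ν(E_x), where E_x = {y : (x, y) in E}."""
    return sum((nu[y] for (u, y) in E if u == x), Fraction(0))

def mu_of_y_section(y):
    """µ(E^y), where E^y = {x : (x, y) in E}."""
    return sum((mu[x] for (x, v) in E if v == y), Fraction(0))

by_x = sum(nu_of_x_section(x) * mu[x] for x in mu)      # ∫ ν(E_x) dµ(x)
by_y = sum(mu_of_y_section(y) * nu[y] for y in nu)      # ∫ µ(E^y) dν(y)
product_measure = sum(mu[x] * nu[y] for (x, y) in E)    # µ × ν(E) in this discrete setting
print(by_x, by_y, product_measure)
```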

We are now ready to prove the first of our main theorems.


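Theorem 6.8.4 (Tonelli's Theorem) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two σ-finite measure spaces, and assume that $f : X \times Y \to \mathbb{R}_+$ is a nonnegative, $\mathcal{A}\otimes\mathcal{B}$-measurable function. Then the functions $x \mapsto \int f(x,y)\,d\nu(y)$ and $y \mapsto \int f(x,y)\,d\mu(x)$ are $\mathcal{A}$- and $\mathcal{B}$-measurable, respectively, and

$$\int f\,d(\mu\times\nu) = \int\Big[\int f(x,y)\,d\nu(y)\Big]d\mu(x) = \int\Big[\int f(x,y)\,d\mu(x)\Big]d\nu(y)$$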

Proof: We prove the first equality and leave the second to the reader. Notice first that if $f = \mathbf{1}_E$ is an indicator function, then by the lemma

$$\int \mathbf{1}_E\,d(\mu\times\nu) = \mu\times\nu(E) = \int \nu(E_x)\,d\mu(x) = \int\Big[\int \mathbf{1}_E(x,y)\,d\nu(y)\Big]d\mu(x)$$

which proves the theorem for indicator functions. By linearity, it also holds for nonnegative simple functions.

For the general case, let $\{f_n\}$ be an increasing sequence of nonnegative simple functions converging pointwise to $f$. The functions $x \mapsto \int f_n(x,y)\,d\nu(y)$ increase to $x \mapsto \int f(x,y)\,d\nu(y)$ by the Monotone Convergence Theorem. Hence the latter function is measurable (as the limit of a sequence of measurable functions), and using the Monotone Convergence Theorem again, we get

$$\begin{aligned}
\int f\,d(\mu\times\nu) &= \lim_{n\to\infty} \int f_n\,d(\mu\times\nu) = \lim_{n\to\infty} \int\Big[\int f_n(x,y)\,d\nu(y)\Big]d\mu(x) \\
&= \int\Big[\lim_{n\to\infty} \int f_n(x,y)\,d\nu(y)\Big]d\mu(x) = \int\Big[\int \lim_{n\to\infty} f_n(x,y)\,d\nu(y)\Big]d\mu(x) \\
&= \int\Big[\int f(x,y)\,d\nu(y)\Big]d\mu(x)
\end{aligned}$$

□

Fubini’s Theorem is now an easy application of Tonelli’s Theorem.

Theorem 6.8.5 (Fubini's Theorem) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two σ-finite measure spaces, and assume that $f : X \times Y \to \mathbb{R}$ is $\mu\times\nu$-integrable. Then the functions $f_x$ and $f^y$ are integrable for almost all $x$ and $y$, and the integrated functions $x \mapsto \int f(x,y)\,d\nu(y)$ and $y \mapsto \int f(x,y)\,d\mu(x)$ are $\mu$- and $\nu$-integrable, respectively. Moreover,

$$\int f\,d(\mu\times\nu) = \int\Big[\int f(x,y)\,d\nu(y)\Big]d\mu(x) = \int\Big[\int f(x,y)\,d\mu(x)\Big]d\nu(y)$$

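Proof: Since $f$ is integrable, it splits as the difference $f = f^+ - f^-$ between two nonnegative, integrable functions. Applying Tonelli's Theorem to $|f|$, we get

$$\int\Big[\int |f(x,y)|\,d\nu(y)\Big]d\mu(x) = \int\Big[\int |f(x,y)|\,d\mu(x)\Big]d\nu(y) = \int |f|\,d(\mu\times\nu) < \infty$$

which implies the integrability statements for $f_x$, $f^y$, $x \mapsto \int f(x,y)\,d\nu(y)$ and $y \mapsto \int f(x,y)\,d\mu(x)$ (note that $\big|\int f(x,y)\,d\nu(y)\big| \le \int |f(x,y)|\,d\nu(y)$, and similarly for the other variable). Applying Tonelli's Theorem to $f^+$ and $f^-$ separately, we get

$$\int f^+\,d(\mu\times\nu) = \int\Big[\int f^+(x,y)\,d\nu(y)\Big]d\mu(x) = \int\Big[\int f^+(x,y)\,d\mu(x)\Big]d\nu(y)$$

and

$$\int f^-\,d(\mu\times\nu) = \int\Big[\int f^-(x,y)\,d\nu(y)\Big]d\mu(x) = \int\Big[\int f^-(x,y)\,d\mu(x)\Big]d\nu(y)$$

and subtracting the second from the first (all the outer integrals involved are finite), we get Fubini's Theorem. □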

Remark: The integrability condition in Fubini's Theorem is occasionally a nuisance: The natural way to show that $f$ is integrable is by calculating the iterated integrals, but this presupposes that $f$ is integrable! The solution is often first to apply Tonelli's Theorem to the absolute value $|f|$, and use the iterated integrals there to show that $\int |f|\,d(\mu\times\nu)$ is finite. This means that $f$ is integrable, and we are ready to apply Fubini's Theorem.

Even when the original measures $\mu$ and $\nu$ are complete, the product measure $\mu\times\nu$ rarely is. A natural response is to take its completion $\overline{\mu\times\nu}$ (as we did with higher dimensional Lebesgue measures), but the question is whether Fubini's Theorem still holds. This is not obvious since there are more $\overline{\mu\times\nu}$-measurable functions than $\mu\times\nu$-measurable ones, but fortunately the answer is yes. I just state the result without proof (see Exercise 12).

Theorem 6.8.6 Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two complete, σ-finite measure spaces, and assume that $f : X \times Y \to \mathbb{R}$ is $\overline{\mu\times\nu}$-measurable, where $\overline{\mu\times\nu}$ is the completion of the product measure. Then the functions $f_x$ and $f^y$ are measurable for almost all $x$ and $y$, and the integrated functions $x \mapsto \int f(x,y)\,d\nu(y)$ and $y \mapsto \int f(x,y)\,d\mu(x)$ are measurable as well. Moreover:

(i) (Tonelli's Theorem for Completed Measures) If $f$ is nonnegative,

$$\int f\,d\overline{\mu\times\nu} = \int\Big[\int f(x,y)\,d\nu(y)\Big]d\mu(x) = \int\Big[\int f(x,y)\,d\mu(x)\Big]d\nu(y)$$

(ii) (Fubini's Theorem for Completed Measures) If $f$ is integrable, the functions $f_x$ and $f^y$ are integrable for almost all $x$ and $y$, and the integrated functions $x \mapsto \int f(x,y)\,d\nu(y)$ and $y \mapsto \int f(x,y)\,d\mu(x)$ are $\mu$- and $\nu$-integrable, respectively. Moreover,

$$\int f\,d\overline{\mu\times\nu} = \int\Big[\int f(x,y)\,d\nu(y)\Big]d\mu(x) = \int\Big[\int f(x,y)\,d\mu(x)\Big]d\nu(y)$$



Exercises for Section 6.8

1. Show that $(E^c)^y = (E^y)^c$ (here $c$ is referring to complements).

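2. Show that $\big(\bigcup_{n\in\mathbb{N}} E_n\big)^y = \bigcup_{n\in\mathbb{N}} E_n^y$.

3. Show that $(f^y)^{-1}(I) = (f^{-1}(I))^y$.

4. Show that $\mathcal{M} = \{M \subseteq \mathbb{R} \mid 0 \in M\}$ is a monotone class, but not a σ-algebra.

5. Show that the sets $\mathcal{M}(M)$ in the proof of the Monotone Class Theorem really are monotone classes.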

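6. In this problem $\mu$ is Lebesgue measure on $\mathbb{R}$, while $\nu$ is counting measure on $\mathbb{N}$. Let $\lambda = \mu\times\nu$ be the product measure and let $f : \mathbb{R} \times \mathbb{N} \to \mathbb{R}$ be given by

$$f(x,n) = \frac{1}{1 + (2^n x)^2}$$

Compute $\int f\,d\lambda$. Remember that $\int \frac{1}{1+u^2}\,du = \arctan u + C$.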

7. Let $f : [0,1] \times [0,1] \to \mathbb{R}$ be defined by $f(x,y) = \frac{x-y}{(x+y)^3}$ (the expression doesn't make sense for $x = y = 0$, and you may give the function whatever value you want at that point). Show by computing the integrals that

$$\int_0^1 \Big[\int_0^1 f(x,y)\,dx\Big]dy = -\frac{1}{2}$$

and

$$\int_0^1 \Big[\int_0^1 f(x,y)\,dy\Big]dx = \frac{1}{2}$$

(you may want to use that $f(x,y) = \frac{1}{(x+y)^2} - \frac{2y}{(x+y)^3}$ in the first integral and argue by symmetry in the second one). Let $\mu$ be Lebesgue measure on $[0,1]$ and $\lambda = \mu\times\mu$. Is $f$ integrable with respect to $\lambda$?

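8. Define $f : [0,\infty) \times [0,\infty) \to \mathbb{R}$ by $f(x,y) = x e^{-x^2(1+y^2)}$.

a) Show by performing the integrations that

$$\int_0^\infty \Big[\int_0^\infty f(x,y)\,dx\Big]dy = \frac{\pi}{4}$$

b) Use Tonelli's theorem to show that $\int_0^\infty \big[\int_0^\infty f(x,y)\,dy\big]dx = \frac{\pi}{4}$.

c) Make the substitution $u = xy$ in the inner integral of

$$\int_0^\infty \Big[\int_0^\infty f(x,y)\,dy\Big]dx = \int_0^\infty \Big[\int_0^\infty x e^{-x^2(1+y^2)}\,dy\Big]dx$$

and show that $\int_0^\infty \big[\int_0^\infty f(x,y)\,dy\big]dx = \big(\int_0^\infty e^{-u^2}\,du\big)^2$.

d) Conclude that $\int_0^\infty e^{-u^2}\,du = \frac{\sqrt{\pi}}{2}$.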

9. Let $X = Y = [0,1]$, let $\mu$ be Lebesgue measure on $X$ (this is just the restriction of Lebesgue measure on $\mathbb{R}$ to $[0,1]$) and let $\nu$ be counting measure. Let $E = \{(x,y) \in X \times Y \mid x = y\}$, and show that

$$\int \nu(E_x)\,d\mu(x), \quad \int \mu(E^y)\,d\nu(y), \quad\text{and}\quad \mu\times\nu(E)$$

are all different (compare Lemma 6.8.3).

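10. Assume that $X = Y = \mathbb{N}$ and that $\mu = \nu$ is the counting measure. Let $f : X \times Y \to \mathbb{R}$ be defined by

$$f(x,y) = \begin{cases} 1 & \text{if } x = y \\ -1 & \text{if } x = y + 1 \\ 0 & \text{otherwise} \end{cases}$$

Show that $\int |f|\,d(\mu\times\nu) = \infty$, but that the iterated integrals $\int\big[\int f(x,y)\,d\mu(x)\big]d\nu(y)$ and $\int\big[\int f(x,y)\,d\nu(y)\big]d\mu(x)$ are both finite, but unequal.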

11. Prove the formula

$$\int \nu((E \cap (X_n \times Y_n))_x)\,d\mu(x) = \mu\times\nu(E \cap (X_n \times Y_n))$$

in the proof of Lemma 6.8.3. (Hint: It may be useful to introduce new measures $\mu_n$ and $\nu_n$ by $\mu_n(A) = \mu(A \cap X_n)$ and $\nu_n(B) = \nu(B \cap Y_n)$ and consider their product measure $\mu_n \times \nu_n$. From the finite case, you know that

$$\int \nu_n(E_x)\,d\mu_n(x) = \mu_n \times \nu_n(E)$$

and you need to derive the formula above from this one.)

12. In this exercise, we shall sketch the proof of Theorem 6.8.6, and we assume that $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are as in that theorem.

a) Assume that $E \in \mathcal{A}\otimes\mathcal{B}$ and that $\mu\times\nu(E) = 0$. Show that $\mu(E^y) = 0$ for $\nu$-almost all $y$ and that $\nu(E_x) = 0$ for $\mu$-almost all $x$.

b) Assume that $N$ is a $\mu\times\nu$-null set, i.e., there is an $E \in \mathcal{A}\otimes\mathcal{B}$ such that $N \subseteq E$ and $\mu\times\nu(E) = 0$. Show that for $\nu$-almost all $y$, $N^y$ is $\mu$-measurable and $\mu(N^y) = 0$. Show also that for $\mu$-almost all $x$, $N_x$ is $\nu$-measurable and $\nu(N_x) = 0$. (Here you need to use that the original measure spaces are complete.)

c) Assume that $D \subseteq X \times Y$ is in the completion of $\mathcal{A}\otimes\mathcal{B}$ with respect to $\mu\times\nu$. Show that for $\nu$-almost all $y$, $D^y$ is $\mu$-measurable, and that for $\mu$-almost all $x$, $D_x$ is $\nu$-measurable (use that by Theorem 5.2.5, $D$ can be written as a disjoint union $D = E \cup N$, where $E \in \mathcal{A}\otimes\mathcal{B}$ and $N$ is a null set).

d) Let $D$ be as above. Show that the functions $x \mapsto \nu(D_x)$ and $y \mapsto \mu(D^y)$ are $\mathcal{A}$- and $\mathcal{B}$-measurable, respectively (define $\nu(D_x)$ and $\mu(D^y)$ arbitrarily on the sets of measure zero where $D_x$ and $D^y$ fail to be measurable), and show that

$$\int \nu(D_x)\,d\mu(x) = \int \mu(D^y)\,d\nu(y) = \overline{\mu\times\nu}(D)$$

e) Prove Theorem 6.8.6 (this is just checking that the arguments that got us from Lemma 6.8.3 to Theorems 6.8.4 and 6.8.5 still work in the new setting).
