Analysis II: Topology and Diﬀerential Calculus of Several Variables · 2020-05-27 · Topology...

Analysis II:

Topology and Differential Calculus

of Several Variables

Peter Philip∗

Lecture Notes

Created for the Class of Spring Semester 2016 at LMU Munich

May 27, 2020

Contents

1 Topology, Metric, Norm

1.1 Motivation, Definitions, Convergence

1.2 Open Sets, Closed Sets, and Related Notions

1.3 Construction of Topological Spaces

1.3.1 Bases, Subbases

1.3.2 Subspaces

1.4 Topics Particular to Metric and Normed Spaces

1.4.1 Basic Inequalities

1.4.2 Completeness

1.4.3 Inner Products and Hilbert Space

1.4.4 Equivalence of Metrics and Equivalence of Norms

2 Limits and Continuity of Functions

2.1 Definitions and Properties

2.2 Banach Fixed Point Theorem

2.3 Homeomorphisms, Norm-Preserving and Isometric Maps, Embeddings

3 Further Topologic Properties

∗E-Mail: [email protected]


3.1 Separation

3.2 Compactness

3.3 Connectedness

4 Differential Calculus

4.1 Partial Derivatives and Gradients

4.2 The Jacobian

4.3 Higher Order Partials and the Spaces C^k

4.4 Interlude: Graphical Representation in Two Dimensions

4.5 The Total Derivative and the Notion of Differentiability

4.6 Higher Order Total Derivatives as Multilinear Maps

4.7 The Chain Rule

4.8 The Mean Value Theorem

4.9 Directional Derivatives

4.10 Taylor’s Theorem

4.11 Implicit Function Theorem

5 Extreme Values, Stationary Points, Optimization

5.1 Definitions of Extreme Values

5.2 Quadratic Forms

5.3 Extreme Values and Stationary Points of Differentiable Functions

5.4 Constrained Optimization, Lagrange Multipliers

A Set-Theoretic Rules for Cartesian Products

B Box Topology

C Uniform Continuity and Lipschitz Continuity

D Viewing C^n as R^{2n}

E Pseudometrics and Seminorms

F Initial and Final Topologies, Quotient Spaces

G Separation: More Counterexamples


H Compactness

H.1 Intersections of Compact Sets

H.2 Unit Balls in Normed Vector Spaces

H.3 Proof of Tychonoff’s Theorem

I Topological Invariants

J Multilinear Maps

K Differential Calculus

K.1 Bounded Derivatives Imply Lipschitz Continuity

K.2 Surjectivity of Directional Derivatives

References


1 Topology, Metric, Norm

1.1 Motivation, Definitions, Convergence

One major goal of this class is to study convergence in more general contexts than the sets R or C. We have already encountered the convergence of K-valued functions in [Phi16, Sec. 8], where we considered the distinct concepts of pointwise and uniform convergence. The notion of topology provides an abstract concept suitable for all the abovementioned convergences as well as for convergence in countless other situations of interest to us (convergence in K^n being just one important example).

In Analysis I, we said that a sequence (z_k)_{k∈N} in K converged to z ∈ K if, and only if, every neighborhood of z contains almost all z_k (cf. [Phi16, Rem. 7.8]). As it turns out, this is the concept that one can still use in the most abstract situation – one merely needs a suitable abstract notion of neighborhood. Recall from [Phi16, Def. 7.7(a)] that a neighborhood of a point z ∈ K is a set U ⊆ K, containing an open ε-ball with center z. In the situation of an abstract topological space X, to be defined in Def. 1.1 below, one specifies all the open subsets of X, calling U ⊆ X a neighborhood of x ∈ X if, and only if, there is an open set O ⊆ X such that x ∈ O ⊆ U.

Definition 1.1. Let X be a set and T ⊆ P(X) a set of subsets of X. Then T is called a topology on X if, and only if, the following three conditions are satisfied:

(i) ∅ ∈ T and X ∈ T .

(ii) T is closed under finite intersections, i.e. the intersection of finitely many sets in T is again in T: If n ∈ N and O_i ∈ T for each i = 1, . . . , n, then

\[ \bigcap_{i=1}^{n} O_i \in T. \]

(iii) T is closed under arbitrary unions, i.e. the union of arbitrarily many (i.e. of finitely or infinitely many) sets in T is again in T: If I is an arbitrary index set and O_i ∈ T for each i ∈ I, then

\[ \bigcup_{i \in I} O_i \in T. \]

If T constitutes a topology on X, then the pair (X, T) is called a topological space. Moreover, a set O ⊆ X is called T-open or open with respect to T if, and only if, O ∈ T. One simply calls O open in case the topology is understood. Given x ∈ X, a set U ⊆ X is called a neighborhood of x if, and only if, there is an open set O ∈ T such that x ∈ O ⊆ U (note that U does not have to be in T). The set of all neighborhoods of x is denoted by U(x); it is also called the neighborhood system or the neighborhood filter of x. If T_1 and T_2 are both topologies on X such that T_1 ⊆ T_2, then we call T_1 smaller or coarser than T_2, and we call T_2 bigger or finer than T_1.
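For finite sets, the conditions of Def. 1.1 can be checked mechanically. The following Python sketch is merely an illustration and not part of the formal development; the names `is_topology` and `is_neighborhood` as well as the sample families are hypothetical choices made for this example.

```python
from itertools import combinations

def is_topology(X, T):
    """Check conditions (i)-(iii) of Def. 1.1 for a finite family T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(O) for O in T}
    if any(not O <= X for O in T):
        return False                      # T must consist of subsets of X
    if frozenset() not in T or X not in T:
        return False                      # condition (i)
    # For a finite family, closure under pairwise intersections and unions
    # already yields closure under all finite intersections and all unions.
    return all(A & B in T and A | B in T for A, B in combinations(T, 2))

def is_neighborhood(T, U, x):
    """U is a neighborhood of x iff some open set O satisfies x in O and O subset of U."""
    U = frozenset(U)
    return any(x in O and frozenset(O) <= U for O in T)

X = {1, 2, 3}
T_good = [set(), {1}, {1, 2}, X]
T_bad = [set(), {1}, {2}, X]       # the union {1, 2} of {1} and {2} is missing
```

Here `T_good` passes the check, whereas `T_bad` violates (iii). The set {1, 3} is a neighborhood of 1 with respect to `T_good` without being open itself.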


Lemma 1.2. Let (X, T) be a topological space. Then O ⊆ X is open if, and only if, for each x ∈ O, there exists an open set O_x ∈ T such that x ∈ O_x ⊆ O.

Proof. If O ∈ T and x ∈ O, then one can just choose O_x := O. Conversely, if, for each x ∈ O, there exists an open set O_x ∈ T such that x ∈ O_x ⊆ O, then

\[ O = \bigcup_{x \in O} O_x, \]

proving O ∈ T by Def. 1.1(iii). ∎

Example 1.3. (a) For each set X, T := P(X) constitutes a topology on X, called the discrete topology on X.

(b) For each set X, T := {∅, X} constitutes a topology on X, called the indiscrete or trivial topology on X.

(c) For each set X,

\[ T := \{O \subseteq X : O = \emptyset \text{ or } O^c \text{ is finite}\} \]

(where O^c := X \ O denotes the complement) constitutes a topology on X, called the cofinite topology on X (clearly, ∅, X ∈ T; if O_1^c and O_2^c are both finite, then (O_1 ∩ O_2)^c = O_1^c ∪ O_2^c is finite; if O_i^c, i ∈ I, are all finite, then (⋃_{i∈I} O_i)^c = ⋂_{i∈I} O_i^c is finite).

(d) We call O ⊆ K open if, and only if,

\[ \forall_{z \in O} \; \exists_{\epsilon \in R^+} \quad B_\epsilon(z) \subseteq O, \]

where, as in [Phi16, Def. 7.7(a)], B_ε(z) := {w ∈ K : |w − z| < ε}. Then T := {O ⊆ K : O open} constitutes a topology on K (cf. Th. 1.16 below).

Proposition 1.4. Arbitrary intersections of topologies yield again topologies: Let X be a set, let I be a nonempty index set, and let (T_i)_{i∈I} be a family of topologies on X. Then

\[ T := \bigcap_{i \in I} T_i \]

is again a topology on X.

Proof. Since ∅ and X are in each T_i, they are also in T. If O_1, O_2 ∈ T and i ∈ I, then O_1, O_2 ∈ T_i. Thus, O_1 ∩ O_2 ∈ T_i as well, proving O_1 ∩ O_2 ∈ T. Similarly, if J is an index set and O_j ∈ T for each j ∈ J, then O_j ∈ T_i for each i ∈ I and each j ∈ J. Thus, O := ⋃_{j∈J} O_j ∈ T_i for each i ∈ I, showing O ∈ T. ∎
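Prop. 1.4 can be illustrated on a three-point set. The Python sketch below is a hypothetical example (the helper `is_topology` and the families `T1`, `T2` are not from the text). As a side remark that is not claimed in the proposition, it also shows that the union of two topologies can fail to be a topology.

```python
from itertools import combinations

def is_topology(X, T):
    """Finite check of Def. 1.1: contains empty set and X, closed under pairwise cap and cup."""
    if frozenset() not in T or X not in T:
        return False
    return all(A & B in T and A | B in T for A, B in combinations(T, 2))

X = frozenset({1, 2, 3})
T1 = {frozenset(), frozenset({1}), X}
T2 = {frozenset(), frozenset({2}), X}

intersection = T1 & T2   # the indiscrete topology {∅, X}
union = T1 | T2          # contains {1} and {2}, but not {1, 2}
```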

As in K, topologies often arise from so-called metrics or norms, which we define next:

Definition 1.5. Let X be a set. A function d : X × X −→ R_0^+ is called a metric on X if, and only if, the following three conditions are satisfied:


(i) d is positive definite, i.e., for each (x, y) ∈ X × X, d(x, y) = 0 if, and only if, x = y.

(ii) d is symmetric, i.e., for each (x, y) ∈ X × X, d(y, x) = d(x, y).

(iii) d satisfies the triangle inequality, i.e., for each (x, y, z) ∈ X^3, d(x, z) ≤ d(x, y) + d(y, z).

If d constitutes a metric on X, then the pair (X, d) is called a metric space. One then often refers to the elements of X as points and to the number d(x, y) as the d-distance between the points x and y. If the metric d on X is understood, one also refers to X itself as a metric space.

Remark 1.6. The requirement that a metric be nonnegative is included in Def. 1.5 merely for emphasis. Nonnegativity actually follows from the remaining properties of a metric: For each x, y ∈ X, one computes

\[ 0 \overset{\text{Def. 1.5(i)}}{=} d(x, x) \overset{\text{Def. 1.5(iii)}}{\le} d(x, y) + d(y, x) \overset{\text{Def. 1.5(ii)}}{=} 2\, d(x, y), \tag{1.1} \]

showing d(x, y) ≥ 0.
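On a finite set of points, the three conditions of Def. 1.5 can be tested by brute force. The following Python sketch is only an illustration; the function name `is_metric` and the tolerance `tol` are assumptions made for this example. In the spirit of Rem. 1.6, the check does not test nonnegativity separately.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Brute-force check of Def. 1.5(i)-(iii) on a finite collection of points."""
    pts = list(points)
    for x, y in product(pts, repeat=2):
        if (abs(d(x, y)) <= tol) != (x == y):
            return False                  # (i) positive definiteness fails
        if abs(d(x, y) - d(y, x)) > tol:
            return False                  # (ii) symmetry fails
    # (iii) triangle inequality for all triples
    return all(d(x, z) <= d(x, y) + d(y, z) + tol
               for x, y, z in product(pts, repeat=3))

absolute = lambda x, y: abs(x - y)        # the usual distance on R
squared = lambda x, y: (x - y) ** 2       # violates the triangle inequality
```

For instance, the squared distance fails because (0 − 2)^2 = 4 exceeds (0 − 1)^2 + (1 − 2)^2 = 2.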

Definition 1.7. Let X be a vector space over the field K. Then a function ‖·‖ : X −→ R_0^+ is called a norm on X if, and only if, the following three conditions are satisfied:

(i) ‖·‖ is positive definite, i.e.

\[ \bigl( \|x\| = 0 \;\Leftrightarrow\; x = 0 \bigr) \quad \text{for each } x \in X. \]

(ii) ‖·‖ is homogeneous of degree 1, i.e.

\[ \|\lambda x\| = |\lambda| \|x\| \quad \text{for each } \lambda \in K,\ x \in X. \]

(iii) ‖·‖ satisfies the triangle inequality, i.e.

\[ \|x + y\| \le \|x\| + \|y\| \quad \text{for each } x, y \in X. \]

If ‖·‖ constitutes a norm on X, then the pair (X, ‖·‖) is called a normed vector space or just normed space. If the norm ‖·‖ on X is understood, then one also refers to X itself as a normed space.

Lemma 1.8. If (X, ‖ · ‖) is a normed space, then the function

\[ d : X \times X \longrightarrow R_0^+, \quad d(x, y) := \|x - y\|, \tag{1.2} \]

constitutes a metric on X: One also calls d the metric induced by the norm ‖·‖. Thus, the induced metric d makes X into a metric space.


Proof. Consider x, y ∈ X. If x = y, then d(x, y) = ‖x − y‖ = ‖0‖ = 0. Conversely, if 0 = d(x, y) = ‖x − y‖, then x − y = 0, i.e. x = y. Symmetry is verified by the computation

\[ d(y, x) = \|y - x\| = |-1| \, \|x - y\| = d(x, y). \]

Finally, for the triangle inequality, one lets x, y, z ∈ X and estimates

\[ d(x, y) = \|x - y\| = \|x - z + z - y\| \le \|x - z\| + \|z - y\| = d(x, z) + d(z, y), \]

which establishes the case. ∎

Remark 1.9. Throughout this class, a multitude of notions will be introduced for metric spaces (X, d), including open sets, balls, closed sets, etc. Subsequently, we will then also use these notions in normed spaces (X, ‖·‖), always implicitly assuming that they are meant with respect to the metric space (X, d), where d is the metric induced by the norm ‖·‖, i.e. where d is given by (1.2).

Definition 1.10. Let (X, d) be a metric space. Given x ∈ X and r ∈ R^+, define

\[ B_r(x) := \{y \in X : d(x, y) < r\}, \tag{1.3a} \]

\[ \overline{B}_r(x) := \{y \in X : d(x, y) \le r\}, \tag{1.3b} \]

\[ S_r(x) := \{y \in X : d(x, y) = r\}. \tag{1.3c} \]

The set B_r(x) is called the open ball with center x and radius r, also known as the r-ball with center x. The set B̄_r(x) is called the closed ball with center x and radius r. The set S_r(x) is called the sphere with center x and radius r. A set U ⊆ X is called a neighborhood of x if, and only if, there is ε ∈ R^+ such that B_ε(x) ⊆ U. We call O ⊆ X open if, and only if,

\[ \forall_{x \in O} \; \exists_{\epsilon \in R^+} \quad B_\epsilon(x) \subseteq O. \]

Definition 1.11. For n ∈ N, p ∈ [1,∞[, the function

\[ \|\cdot\|_p : K^n \longrightarrow R_0^+, \quad \|z\|_p := \Bigl( \sum_{j=1}^{n} |z_j|^p \Bigr)^{1/p}, \tag{1.4} \]

is called the p-norm on K^n (here, and in the following, we write vectors z ∈ K^n in the form z = (z_1, . . . , z_n), where z_1, . . . , z_n ∈ K are the coordinates of z). For p = 2 and K = R, one also speaks of the Euclidean norm.

—

We want to show that the p-norms are, indeed, norms in the sense of Def. 1.7. Before we can do that in Cor. 1.14 below, we need to establish some important inequalities:

Theorem 1.12 (Hölder inequality). If n ∈ N and p, q > 1 such that 1/p + 1/q = 1, then

\[ \Bigl| \underbrace{\sum_{j=1}^{n} a_j b_j}_{=:\, a \cdot b} \Bigr| \le \|a\|_p \|b\|_q \quad \text{for each } a, b \in K^n. \tag{1.5} \]


Proof. If a = 0 or b = 0, then there is nothing to prove. So let a ≠ 0 and b ≠ 0. For each j ∈ {1, . . . , n}, apply the inequality between the weighted arithmetic mean and the weighted geometric mean [Phi16, (9.43)], i.e.

\[ x_1^{\lambda_1} \cdots x_n^{\lambda_n} \le \lambda_1 x_1 + \dots + \lambda_n x_n, \]

in the case of two factors, with λ_1 = 1/p, λ_2 = 1/q, x_1 = |a_j|^p / ‖a‖_p^p and x_2 = |b_j|^q / ‖b‖_q^q, to get

\[ \frac{|a_j| |b_j|}{\|a\|_p \|b\|_q} \le \frac{1}{p} \frac{|a_j|^p}{\|a\|_p^p} + \frac{1}{q} \frac{|b_j|^q}{\|b\|_q^q}. \tag{1.6a} \]

Summing (1.6a) over j ∈ {1, . . . , n} yields 1/p + 1/q = 1 on the right-hand side, and, thus,

\[ |a \cdot b| = \Bigl| \sum_{j=1}^{n} a_j b_j \Bigr| \le \sum_{j=1}^{n} |a_j| |b_j| \overset{\text{summed (1.6a)}}{\le} \|a\|_p \|b\|_q, \tag{1.6b} \]

proving (1.5). ∎
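A quick numerical sanity check of (1.5) can be carried out as follows. The Python sketch is merely illustrative; the helper names `p_norm` and `holder_gap` and the sample vectors are assumptions made for this example, and floating-point arithmetic can, of course, only confirm the inequality up to rounding.

```python
def p_norm(z, p):
    """The p-norm of (1.4) for a finite sequence z of real or complex numbers."""
    return sum(abs(c) ** p for c in z) ** (1.0 / p)

def holder_gap(a, b, p):
    """Right-hand side minus left-hand side of (1.5), with q := p / (p - 1)."""
    q = p / (p - 1.0)                     # conjugate exponent, 1/p + 1/q = 1
    lhs = abs(sum(x * y for x, y in zip(a, b)))
    return p_norm(a, p) * p_norm(b, q) - lhs

gap_real = holder_gap([1, 2, 3], [4, 5, 6], 3)       # p = 3, q = 3/2
gap_complex = holder_gap([1 + 1j, 2], [1j, -1], 2)   # p = q = 2
gap_equal = holder_gap([1, 2], [1, 2], 2)            # a = b: equality for p = 2
```

The gap is nonnegative in each case and vanishes (up to rounding) for a = b with p = q = 2.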

Theorem 1.13 (Minkowski inequality). For each p ≥ 1, z, w ∈ K^n, n ∈ N, one has

\[ \|z + w\|_p \le \|z\|_p + \|w\|_p. \tag{1.7} \]

Proof. For p = 1, (1.7) follows directly from the triangle inequality for the absolute value in K. It remains to consider the case p > 1. In that case, define q := p/(p − 1), i.e. 1/p + 1/q = 1. Also define a ∈ R^n by letting a_j := |z_j + w_j|^{p−1} ∈ R_0^+ for each j ∈ {1, . . . , n}, and notice

\[ |z_j + w_j|^p = |z_j + w_j| \, a_j \le |z_j| \, a_j + |w_j| \, a_j. \tag{1.8a} \]

Summing (1.8a) over j ∈ {1, . . . , n} and applying the Hölder inequality (1.5), one obtains

\[ \|z + w\|_p^p \le (|z_1|, \dots, |z_n|) \cdot a + (|w_1|, \dots, |w_n|) \cdot a \le \|z\|_p \|a\|_q + \|w\|_p \|a\|_q. \tag{1.8b} \]

As q(p − 1) = p, it is a_j^q = |z_j + w_j|^p, and, thus

\[ \|a\|_q = \Bigl( \sum_{j=1}^{n} |z_j + w_j|^p \Bigr)^{\frac{1}{p} \frac{p}{q}} = \|z + w\|_p^{p-1}, \tag{1.8c} \]

where p/q = p − 1 was used in the last step. Finally, combining (1.8b) with (1.8c) yields ‖z + w‖_p^p ≤ (‖z‖_p + ‖w‖_p) ‖z + w‖_p^{p−1}. For z + w = 0, (1.7) holds trivially; otherwise, dividing by ‖z + w‖_p^{p−1} > 0 yields (1.7). ∎

Corollary 1.14. For each n ∈ N, p ∈ [1,∞[, the p-norm on K^n constitutes, indeed, a norm on K^n.

Proof. If z = 0, then ‖z‖_p = 0 follows directly from (1.4). If z ≠ 0, then there is j ∈ {1, . . . , n} such that |z_j| > 0. Then (1.4) provides ‖z‖_p ≥ |z_j| > 0. If λ ∈ K and z ∈ K^n, then

\[ \|\lambda z\|_p = \Bigl( \sum_{j=1}^{n} |\lambda z_j|^p \Bigr)^{1/p} = \Bigl( |\lambda|^p \sum_{j=1}^{n} |z_j|^p \Bigr)^{1/p} = |\lambda| \|z\|_p. \]

The proof is concluded by noticing that the triangle inequality is the same as the Minkowski inequality (1.7). ∎
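The following Python sketch illustrates Cor. 1.14 numerically and shows why the restriction p ≥ 1 matters. It is only a sketch under stated assumptions: the names `p_norm`, `induced_metric`, and `minkowski_holds` as well as the tolerance are hypothetical choices for this example. The fact that the formula (1.4) violates the triangle inequality for p = 1/2 is a standard observation that is not taken from the text above.

```python
import random

def p_norm(z, p):
    """The expression (1.4); a norm only for p >= 1."""
    return sum(abs(c) ** p for c in z) ** (1.0 / p)

def induced_metric(p):
    """The metric d(x, y) := ||x - y||_p induced via (1.2)."""
    return lambda x, y: p_norm([a - b for a, b in zip(x, y)], p)

def minkowski_holds(z, w, p, tol=1e-9):
    """Check (1.7) for the given vectors, up to a rounding tolerance."""
    s = [a + b for a, b in zip(z, w)]
    return p_norm(s, p) <= p_norm(z, p) + p_norm(w, p) + tol

random.seed(0)
samples = [([random.uniform(-1, 1) for _ in range(4)],
            [random.uniform(-1, 1) for _ in range(4)],
            random.uniform(1, 6)) for _ in range(200)]
all_ok = all(minkowski_holds(z, w, p) for z, w, p in samples)

# For p = 1/2, the unit vectors give (1 + 1)^2 = 4 > 1 + 1.
fails_for_small_p = not minkowski_holds([1, 0], [0, 1], 0.5)
```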


Example 1.15. Let S ≠ ∅ be an otherwise arbitrary set. According to Linear Algebra, the set F(S,K) of all K-valued functions on S is a vector space over K if vector addition and scalar multiplication are defined pointwise via

\[ (f + g) : S \longrightarrow K, \quad (f + g)(x) := f(x) + g(x), \]

\[ (\lambda \cdot f) : S \longrightarrow K, \quad (\lambda \cdot f)(x) := \lambda \cdot f(x) \quad \text{for each } \lambda \in K. \]

Now consider the subset B(S,K) of F(S,K), consisting of all bounded K-valued functions on S, where we call a K-valued function f bounded if, and only if, the set {|f(s)| : s ∈ S} ⊆ R_0^+ is a bounded subset of R. Define

\[ \|f\|_\infty := \|f\|_{\sup} := \sup\{|f(s)| : s \in S\} \in R_0^+ \quad \text{for each } f \in B(S,K). \tag{1.9} \]

We will show that B(S,K) constitutes a vector space over K and ‖·‖_sup provides a norm on B(S,K) (i.e. (B(S,K), ‖·‖_sup) is a normed vector space). To verify that B(S,K) constitutes a vector space over K, it suffices to show it is a subspace of the vector space F(S,K), which is equivalent to showing f, g ∈ B(S,K) and λ ∈ K imply f + g ∈ B(S,K) and λf ∈ B(S,K).

If f, g ∈ B(S,K), then

\[ \forall_{s \in S} \quad |f(s) + g(s)| \le |f(s)| + |g(s)| \le \|f\|_{\sup} + \|g\|_{\sup} \in R_0^+, \tag{1.10a} \]

showing f + g ∈ B(S,K) and that ‖·‖_sup satisfies the triangle inequality

\[ \forall_{f, g \in B(S,K)} \quad \|f + g\|_{\sup} \le \|f\|_{\sup} + \|g\|_{\sup}. \tag{1.10b} \]

If f ∈ B(S,K), λ ∈ K, then

\[ \forall_{s \in S} \quad |\lambda f(s)| = |\lambda| \, |f(s)| \le |\lambda| \, \|f\|_{\sup} \in R_0^+ \tag{1.11a} \]

implies λf ∈ B(S,K), completing the proof that B(S,K) is a subspace of F(S,K). Moreover,

\[ \|\lambda f\|_{\sup} = \sup\{|\lambda f(s)| : s \in S\} = \sup\{|\lambda| |f(s)| : s \in S\} \overset{\text{[Phi16, (4.6c)]}}{=} |\lambda| \sup\{|f(s)| : s \in S\} = |\lambda| \|f\|_{\sup}, \tag{1.11b} \]

proving ‖·‖_sup is homogeneous of degree 1. To see that ‖·‖_sup constitutes a norm on B(S,K), it merely remains to show positive definiteness. To this end, we notice that the zero element f = 0 of the vector space B(S,K) is the function f ≡ 0, which vanishes identically. Thus, f = 0 if, and only if, ‖f‖_sup := sup{|f(s)| : s ∈ S} = 0, showing ‖·‖_sup is positive definite, and completing the proof that ‖·‖_sup is a norm, making B(S,K) into a normed vector space.
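For a finite set S, every K-valued function on S is bounded and the supremum in (1.9) is a maximum, so the norm properties can be observed directly. The Python sketch below is an illustration only; the set `S`, the functions `f`, `g`, and the scalar `lam` are hypothetical sample data.

```python
def sup_norm(f, S):
    """The sup-norm (1.9) of f on a finite, nonempty set S (where sup = max)."""
    return max(abs(f(s)) for s in S)

S = range(-3, 4)
f = lambda s: s * s - 4          # real-valued
g = lambda s: 1j * s             # complex-valued
lam = -2 + 1j

f_plus_g = lambda s: f(s) + g(s)         # pointwise sum
lam_f = lambda s: lam * f(s)             # pointwise scalar multiple
```

With this data, ‖f‖_sup = 5 and ‖g‖_sup = 3, and the triangle inequality (1.10b) as well as the homogeneity (1.11b) can be confirmed numerically.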

—


In generalization of Ex. 1.3(d), every metric (in particular, every norm) induces a topology:

Theorem 1.16. Let (X, d) be a metric space. Then T := {O ⊆ X : O open} constitutes a topology on X: One also calls T the topology induced by the metric d, making each metric space into a topological space.

Proof. Clearly, ∅ ∈ T and X ∈ T. Now consider finitely many open sets O_1, . . . , O_N ∈ T, N ∈ N, and let O := ⋂_{j=1}^{N} O_j. We have to prove that O is open. Hence, let x ∈ O. Then x ∈ O_j for each j ∈ {1, . . . , N}. Since each O_j is open, for each j ∈ {1, . . . , N}, there is ε_j > 0 such that B_{ε_j}(x) ⊆ O_j. If we let ε := min{ε_j : j ∈ {1, . . . , N}}, then ε > 0 and B_ε(x) ⊆ B_{ε_j}(x) ⊆ O_j for each j ∈ {1, . . . , N}, i.e. B_ε(x) ⊆ O, showing O is open. Now let I be an arbitrary index set. For each j ∈ I, let O_j ∈ T. We have to verify that O := ⋃_{j∈I} O_j is open. Let x ∈ O. Then there is j ∈ I such that x ∈ O_j. Since O_j is open, there is ε > 0 such that B_ε(x) ⊆ O_j ⊆ O, showing O to be open. ∎

Definition and Remark 1.17. A topological space (X, T) is called metrizable if, and only if, there exists a metric d on X such that T is induced by d. Not every topology is metrizable: For example, the indiscrete topology on a set with at least two distinct elements can never be metrizable (see Rem. 1.42(a) below). Other examples of nonmetrizable topologies are the cofinite topology on an uncountable set (see Rem. 1.39(b) below) and the topology of pointwise convergence (see Ex. 1.53(c) below).

Example 1.18. Let X be a set. We will show that the discrete topology on X (cf. Ex. 1.3(a)) is always metrizable: The corresponding discrete metric is defined by

\[ d : X \times X \longrightarrow \{0, 1\}, \quad d(x, y) := \begin{cases} 0 & \text{for } x = y, \\ 1 & \text{for } x \ne y. \end{cases} \tag{1.12} \]

We verify that d constitutes a metric on X: Since d(x, y) = 0 holds if, and only if, x = y, d is positive definite. Symmetry, i.e. d(x, y) = d(y, x), is immediate from (1.12). We still need to show

\[ \forall_{x, y, z \in X} \quad d(x, y) \le d(x, z) + d(z, y). \]

For x = y, it is d(x, y) = 0, and we are done. For x ≠ y, it is d(x, y) = 1, and we have to show the right-hand side cannot be 0. If z = x, then z ≠ y and d(z, y) = 1. If z ≠ x, then d(x, z) = 1. Now that we have seen that d is a metric on X, we still need to prove it induces the discrete topology, i.e. that every subset of X is open with respect to d. Thus, let O ⊆ X and x ∈ O. Then B_1(x) = {x} ⊆ O, which already shows O to be open.
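The argument of Ex. 1.18 can be replayed on a small finite set. The Python sketch below is merely an illustration; the names `discrete_metric`, `open_ball`, and `balls_fit` as well as the sample set `X` are hypothetical.

```python
from itertools import combinations

def discrete_metric(x, y):
    """The discrete metric (1.12)."""
    return 0 if x == y else 1

def open_ball(X, d, x, r):
    """The open ball B_r(x) of (1.3a) inside the finite set X."""
    return {y for y in X if d(x, y) < r}

def balls_fit(X, d, O, eps):
    """True iff B_eps(x) is contained in O for each x in O (so that O is open)."""
    return all(open_ball(X, d, x, eps) <= O for x in O)

X = {"a", "b", "c"}
all_subsets = [set(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]
every_subset_open = all(balls_fit(X, discrete_metric, O, 1) for O in all_subsets)
```

As in the text, the radius 1 works for every point, since B_1(x) = {x}.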

—

We now want to proceed to introduce convergence on topological spaces (in particular on metric spaces, including K^n). In K we were able to characterize continuity of functions as well as the closedness of a set in terms of convergence of sequences. While this will still be possible in metric spaces, in general topological spaces, the convergence of sequences no longer suffices for such characterizations. For this reason, we will introduce the more general (and more powerful) notion of net convergence¹ (also referred to as Moore-Smith convergence in the literature).

Definition 1.19. (a) A directed set (I,≤) consists of a nonempty set I and a relation ≤ on I that is reflexive, transitive,² and has the additional property that every set consisting of precisely two elements has an upper bound, i.e.

\[ \forall_{i, j \in I} \; \exists_{M \in I} \quad \bigl( i \le M \;\wedge\; j \le M \bigr). \tag{1.13} \]

(b) Let X be a set. A net in X is a family (x_i)_{i∈I} in X (i.e. a function from I into X, cf. [Phi16, Def. 2.15(a)]) indexed by a directed set I. If A ⊆ X, then we say that the net (x_i)_{i∈I} in X is eventually in A if, and only if,

\[ \exists_{i \in I} \; \forall_{j \ge i} \quad x_j \in A. \tag{1.14} \]

Example 1.20. (a) If I is any nonempty set and ≤ is a total order on I, then (I,≤) is a directed set. In particular, N with its usual order is a directed set, and, thus, every sequence is a net.

(b) If I is any nonempty set, then (I, I × I) is a directed set, but I × I is not a partial order if I has at least two distinct elements. On the other hand, {(0, 0), (1, 1)} is a partial order on {0, 1} that does not make {0, 1} into a directed set.

(c) Let (X, T) be a topological space, x ∈ X. Then the neighborhood system U(x) is made into a directed set by defining

\[ \forall_{U, V \in U(x)} \quad \bigl( U \le V \;:\Leftrightarrow\; V \subseteq U \bigr). \tag{1.15} \]

Clearly, ≤ is reflexive and transitive (it is even a partial order), and (1.13) is satisfied, since U, V ∈ U(x) implies U ∩ V ∈ U(x) (one still obtains a directed set when using U ⊆ V in (1.15), but this example turns out to be less useful).

(d) If [a, b] ⊆ R is an interval, a, b ∈ R, then the set Π of partitions of [a, b] (cf. [Phi16, Def. 10.3]) is turned into a directed set by setting ∆ ≤ ∆′ if, and only if, ∆′ is a refinement of ∆ (cf. [Phi16, Def. 10.8(a)]).
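For finite index sets, the conditions of Def. 1.19(a) can be verified by enumeration. The following Python sketch is an illustration only; the function `is_directed` and the encoding of a relation as a set of pairs are assumptions made for this example. It reproduces the two relations on {0, 1} from Ex. 1.20(b).

```python
def is_directed(I, rel):
    """Check Def. 1.19(a) for a finite set I and a relation given as a set of pairs."""
    I = set(I)
    if not I:
        return False                                  # I must be nonempty
    leq = lambda a, b: (a, b) in rel
    reflexive = all(leq(i, i) for i in I)
    transitive = all(leq(a, c) for a in I for b in I for c in I
                     if leq(a, b) and leq(b, c))
    # (1.13): every pair of elements has an upper bound
    upper_bounds = all(any(leq(i, m) and leq(j, m) for m in I)
                       for i in I for j in I)
    return reflexive and transitive and upper_bounds

I = {0, 1}
full = {(a, b) for a in I for b in I}                 # the relation I x I
diagonal = {(0, 0), (1, 1)}                           # partial order, not directed
usual = {(a, b) for a in range(3) for b in range(3) if a <= b}
```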

—

The reason one does not require the relation in Def. 1.19(a) to be a partial order is that it is not necessary – it would clutter the notion of a directed set without any additional gain for the theory.

¹ Alternatively, one can also use the similar, but different, concept of filter convergence. In this class, we only consider net convergence, which seems slightly easier to explain as well as more common in introductory Analysis texts.

² A relation that is reflexive and transitive is sometimes called a preorder.


Definition 1.21. Let (X, T) be a topological space. The net (x_i)_{i∈I} in X is said to be convergent with limit x ∈ X if, and only if, for every neighborhood U of x, the net is eventually in U, i.e. if, and only if,

\[ \forall_{U \in U(x)} \; \exists_{i \in I} \; \forall_{j \ge i} \quad x_j \in U. \tag{1.16} \]

If (x_i)_{i∈I} converges to x, then we write lim_{i∈I} x_i = x. If I = N (i.e. if the net is a sequence), then we also still write lim_{i→∞} x_i = x instead of lim_{i∈N} x_i = x. A net is called divergent if, and only if, it is not convergent.

Example 1.22. (a) Let (X, T) be a topological space, where T is induced by the metric d on X. Let (x_i)_{i∈I} be a net in X, and x ∈ X. Since every ball B_ε(x), ε > 0, is a neighborhood of x and, conversely, every U ∈ U(x) contains some ball B_ε(x) ⊆ U, ε > 0, we have the equivalence

\[ \lim_{i \in I} x_i = x \quad \Leftrightarrow \quad \forall_{\epsilon \in R^+} \; \exists_{i \in I} \; \forall_{j \ge i} \quad d(x_j, x) < \epsilon. \tag{1.17a} \]

In particular, if I = N and the net is a sequence, then (x_i)_{i∈N} converges to x if, and only if, the real sequence of distances (d(x_i, x))_{i∈N} converges to 0 (in the sense of Analysis I):

\[ \lim_{i \to \infty} x_i = x \quad \Leftrightarrow \quad \lim_{i \to \infty} d(x_i, x) = 0. \tag{1.17b} \]

(b) Let (X, T) be an arbitrary topological space and x ∈ X. Consider U(x) with the partial order of Ex. 1.20(c). Moreover, let (x_U)_{U∈U(x)} be a net such that x_U ∈ U for each U ∈ U(x). Then, clearly, lim_{U∈U(x)} x_U = x.

(c) We consider K^n, n ∈ N, with the 2-norm. Let (z^k)_{k∈I} be a net in K^n (for example, a sequence). Here, z^k = (z_1^k, . . . , z_n^k), i.e. the z_j^k are the coordinates of the vector z^k. As in the present example, we will subsequently use upper indices for indices in sequences and nets in situations where we also need to denote coordinates; different coordinates will be referred to via lower indices. For each coordinate j ∈ {1, . . . , n}, we have the coordinate net (z_j^k)_{k∈I} in K. We now claim that (z^k)_{k∈I} converges with respect to (the topology induced by) the 2-norm to a = (a_1, . . . , a_n) ∈ K^n if, and only if, each coordinate net (z_j^k)_{k∈I} converges to a_j in K, j ∈ {1, . . . , n}:

\[ \lim_{k \in I} z^k = a \quad \Leftrightarrow \quad \forall_{j \in \{1, \dots, n\}} \quad \lim_{k \in I} z_j^k = a_j. \tag{1.18} \]

In particular, a sequence in K^n converges (with respect to the 2-norm) if, and only if, each coordinate sequence converges in K. Remark: In Th. 1.72 below, we will see that every norm on K^n induces the same topology, i.e. the validity of (1.18) does actually not depend on the norm. In preparation for the proof of (1.18), we observe that, for each z ∈ K^n, one has the following estimates:

\[ \forall_{j \in \{1, \dots, n\}} \quad |z_j| \le \underbrace{\|z\|_2}_{\sqrt{|z_1|^2 + \dots + |z_n|^2}} \le \underbrace{\|z\|_1}_{|z_1| + \dots + |z_n|}. \tag{1.19} \]

Let ≤ denote the relation that makes I into a directed set. If (z^k)_{k∈I} converges to a, then, according to (1.17a), given ε ∈ R^+, there is N ∈ I such that, for each k ≥ N,

\[ \|z^k - a\|_2 < \epsilon. \]

By (1.19), this implies

\[ \forall_{j \in \{1, \dots, n\}} \quad |z_j^k - a_j| \le \|z^k - a\|_2 < \epsilon, \]

proving lim_{k∈I} z_j^k = a_j. Conversely, if (z_j^k)_{k∈I} converges to a_j for each j ∈ {1, . . . , n}, then, given ε ∈ R^+, (1.17a) yields, for each j, an index N_j ∈ I such that |z_j^k − a_j| < ε/n for each k ≥ N_j. Applying (1.13) finitely many times, we obtain N ∈ I with N ≥ N_j for each j ∈ {1, . . . , n}, such that, for each k ≥ N and each j ∈ {1, . . . , n},

\[ |z_j^k - a_j| < \frac{\epsilon}{n}. \]

Since, by (1.19), this implies

\[ \|z^k - a\|_2 \le \sum_{j=1}^{n} |z_j^k - a_j| < n \, \frac{\epsilon}{n} = \epsilon, \]

(z^k)_{k∈I} converges to a.

(d) We can interpret the function limit lim_{z→ζ} f(z) = η of [Phi16, Def. 8.17] as a net limit in C: Let M ⊆ C, f : M −→ C, and let ζ ∈ C be a cluster point of M. We make the set I := M \ {ζ} into a directed set by defining

\[ \forall_{z, w \in I} \quad \bigl( z \le w \;:\Leftrightarrow\; |z - \zeta| \ge |w - \zeta| \bigr) \]

(i.e. “bigger” elements are closer to ζ). It is then an exercise to show that, for η ∈ C,

\[ \lim_{z \to \zeta} f(z) = \eta \quad \Leftrightarrow \quad \lim_{z \in I} f(z) = \eta. \]

(e) We can interpret the Riemann integral ∫_a^b f of a bounded function f : [a, b] −→ R, a < b, as a net limit in R: As in Ex. 1.20(d), we turn the set Π of tagged partitions of [a, b] into a directed set by setting ∆ ≤ ∆′ if, and only if, ∆′ is a refinement of ∆. Then ∫_a^b f exists if, and only if, lim_{∆∈Π} ρ(∆, f) exists (where ρ(∆, f) denotes the Riemann sum of [Phi16, (10.7c)]), and, in that case,

\[ \int_a^b f = \lim_{\Delta \in \Pi} \rho(\Delta, f) \]

(see [Wal02, Sec. 5.6]).
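The estimates (1.19) behind the coordinatewise convergence in Ex. 1.22(c) can be observed numerically. The Python sketch below is merely an illustration: the sequence `z` in C^2 with limit `a` is a hypothetical example, and the small tolerance only accounts for floating-point rounding.

```python
import math

def norm1(z):
    return sum(abs(c) for c in z)

def norm2(z):
    return math.sqrt(sum(abs(c) ** 2 for c in z))

def z(k):
    """Hypothetical sequence in C^2 converging to a = (0, 1)."""
    return (1.0 / k, complex(1.0, 1.0 / k ** 2))

a = (0.0, 1.0)

def diff(k):
    """The vector z^k - a."""
    return tuple(zj - aj for zj, aj in zip(z(k), a))

# (1.19) for the vectors z^k - a: max_j |z_j| <= ||z||_2 <= ||z||_1
estimates_hold = all(
    max(abs(c) for c in diff(k)) <= norm2(diff(k)) + 1e-12
    and norm2(diff(k)) <= norm1(diff(k)) + 1e-12
    for k in range(1, 200))
```

Both coordinate sequences converge (to 0 and to 1, respectively), and, accordingly, the 2-norm distance to a becomes small for large k.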

—

In [Phi16, Sec. 7.1], we studied subsequences and reorderings of sequences (cf. [Phi16, Sec. 7.21]). We found that subsequences and reorderings of convergent sequences in C are also convergent with the same limit ([Phi16, Prop. 7.23]). Moreover, in [Phi16, Prop. 7.26], we showed that z ∈ K is a cluster point of the sequence (z_k)_{k∈N} in K if, and only if, the sequence has a subsequence converging to z. We will now study related notions on general nets, where we will find that the mentioned results extend only partially to the most general situation. Reorderings turn out to be much less useful on general nets than on sequences. Subnets generalize subsequences, but the concept of a subnet is somewhat more complicated and, in consequence, noticeably more versatile than that of a subsequence.

Definition 1.23. Let (I,≤) and (J,≤) be directed sets, let φ : J −→ I, let X be an arbitrary nonempty set, and let σ : I −→ X be a net in X.

(a) φ is called final if, and only if,

\[ \forall_{i \in I} \; \exists_{j_0 \in J} \; \forall_{j \ge j_0} \quad \phi(j) \ge i. \tag{1.20} \]

(b) The net (σ ◦ φ) : J −→ X is called a subnet of σ if, and only if, φ is final.

(c) For I = J (with the same relation ≤), the net (σ ◦ φ) : I −→ X is called a reordering of σ if, and only if, φ is bijective.

Example 1.24. Using the identity for φ shows that every net is a subnet of itself. Moreover, every subsequence is a subnet, since, clearly, if φ : N −→ N is strictly increasing, then φ is final. However, sequences can have subnets that are not subsequences. One reason is that final maps do not need to be monotone: For example, (2, 1, 4, 3, 6, 5, . . . ) is a subnet of (1, 2, 3, . . . ). The other reason is that a final map does not need to be injective. For example, if σ : N −→ X is a sequence in X, then (σ ◦ φ) : R −→ X,

\[ \phi : R \longrightarrow N, \quad \phi(s) := \min\{k \in N : k \ge s\}, \]

is a subnet of σ.
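The two final maps of Ex. 1.24 can be written down explicitly. In the Python sketch below, the names `swap` and `ceil_index` are hypothetical, and N is assumed to start at 1, as suggested by the sequence (1, 2, 3, . . . ). Finality in the sense of (1.20) is, naturally, only tested on a finite window of indices.

```python
import math

def swap(j):
    """Final, non-monotone map N -> N producing (2, 1, 4, 3, 6, 5, ...)."""
    return j + 1 if j % 2 == 1 else j - 1

def ceil_index(s):
    """The map phi(s) := min{k in N : k >= s} on R, assuming N = {1, 2, ...}."""
    return max(1, math.ceil(s))

first_values = [swap(j) for j in range(1, 7)]

# (1.20) on a finite window: j0 := i + 1 works for swap, and j0 := i for ceil_index
swap_final = all(swap(j) >= i for i in range(1, 50) for j in range(i + 1, 200))
ceil_final = all(ceil_index(i + t / 4) >= i for i in range(1, 50) for t in range(0, 40))
```

The map `ceil_index` is not injective, e.g. it sends both 2.1 and 2.9 to 3.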

Proposition 1.25. Let (X, T) be a topological space, x ∈ X.

(a) Let σ : I −→ X be a net in X. If lim_{i∈I} σ(i) = x, then every subnet of σ is also convergent with limit x.

(b) Let σ : N −→ X be a sequence in X. If lim_{i→∞} σ(i) = x, then every reordering of σ is also convergent with limit x.

Proof. (a): Let σ ◦ φ be a subnet of σ, where φ : J −→ I is a final map. Moreover, let U ∈ U(x) be a neighborhood of x. Since lim_{i∈I} σ(i) = x, there exists i_0 ∈ I such that σ(i) ∈ U for each i ≥ i_0. As φ is final, there exists j_0 ∈ J such that

\[ \forall_{j \ge j_0} \quad \phi(j) \ge i_0. \]

Thus, for each j ≥ j_0, we have

\[ (\sigma \circ \phi)(j) = \sigma(\phi(j)) \in U, \]

due to φ(j) ≥ i_0, proving lim_{j∈J} (σ ◦ φ)(j) = x as desired.

(b): The proof is analogous to the corresponding part of the proof of [Phi16, Prop. 7.23]: Let σ ◦ φ be a reordering of σ, where φ : I −→ I is bijective. Let U and i_0 be as before (now with I := N). Define

\[ M := \max\{\phi^{-1}(i) : i \le i_0\}. \]

As φ is bijective, it is φ(i) > i_0 for each i > M. Then, for each i > M, one has

\[ (\sigma \circ \phi)(i) = \sigma(\phi(i)) \in U \]

due to φ(i) > i_0, proving lim_{i→∞} (σ ◦ φ)(i) = x as desired. ∎

Caveat: Reorderings of general convergent nets do not necessarily converge, see Ex. 1.30 below.

Definition 1.26. (a) Let X be a set. If A ⊆ X, then we say that the net (x_i)_{i∈I} in X is frequently or cofinally in A if, and only if,

\[ \forall_{i \in I} \; \exists_{j \ge i} \quad x_j \in A. \tag{1.21} \]

(b) Let (X, T) be a topological space, x ∈ X. The net (x_i)_{i∈I} in X is said to have x as a cluster point (or accumulation point) if, and only if, for every neighborhood U of x, the net is frequently in U, i.e. if, and only if,

\[ \forall_{U \in U(x)} \; \forall_{i \in I} \; \exists_{j \ge i} \quad x_j \in U. \tag{1.22} \]

Proposition 1.27. Let (X, T) be a topological space, x ∈ X. The net σ : I −→ X in X has x as a cluster point if, and only if, it has a subnet that converges to x.

Proof. Let σ ◦ φ be a subnet of σ, where φ : J −→ I is a final map. Assume that lim_{j∈J} (σ ◦ φ)(j) = x and let U ∈ U(x). If i ∈ I, then, as φ is final, there exists j_0 ∈ J such that

\[ \forall_{j \ge j_0} \quad \phi(j) \ge i. \]

On the other hand, there is j_1 ∈ J such that

\[ \forall_{j \ge j_1} \quad \sigma(\phi(j)) \in U. \]

Since J is a directed set, there exists j_2 ∈ J such that j_2 ≥ j_0 and j_2 ≥ j_1, implying φ(j_2) ≥ i and σ(φ(j_2)) ∈ U. As we have, thus, shown σ to be frequently in U, x is a cluster point of σ. Conversely, let x be a cluster point of σ. We have to construct a subnet that converges to x. To this end, we let

\[ J := \bigl\{ (i, U) \in I \times U(x) : \sigma(i) \in U \bigr\} \]

and define

\[ \forall_{(i, U), (j, V) \in J} \quad \Bigl( (i, U) \le (j, V) \;:\Leftrightarrow\; \bigl( i \le j \;\wedge\; U \supseteq V \bigr) \Bigr). \]

Then ≤ is clearly reflexive and transitive on J. Let (i, U), (j, V) ∈ J and note U ∩ V ∈ U(x). As I is a directed set, there is M ∈ I such that M ≥ i and M ≥ j. Moreover, the assumption that σ is frequently in U ∩ V implies there is k ∈ I such that

\[ k \ge M \ge i \;\wedge\; k \ge M \ge j \;\wedge\; \sigma(k) \in U \cap V, \]

and, thus,

\[ (k, U \cap V) \in J \;\wedge\; (k, U \cap V) \ge (i, U) \;\wedge\; (k, U \cap V) \ge (j, V), \]

proving (J,≤) to be a directed set. The projection map

\[ \pi : J \longrightarrow I, \quad \pi(i, U) := i, \]

is final: Indeed, if i ∈ I, then (i, X) ∈ J and (j, U) ≥ (i, X) implies π(j, U) = j ≥ i. In consequence, σ ◦ π is a subnet of σ. It merely remains to show that σ ◦ π converges to x. Thus, let U ∈ U(x). Since σ is frequently in U, there must be i ∈ I with σ(i) ∈ U. Then (i, U) ∈ J and if (j, V) ≥ (i, U), then

\[ (\sigma \circ \pi)(j, V) = \sigma(j) \in V \subseteq U, \]

proving lim_{(j,U)∈J} (σ ◦ π)(j, U) = x. ∎

In [Phi16, Prop. 7.10(b)], we obtained the result that every convergent sequence in C is bounded. We will see in Prop. 1.29(d) below that this result remains true for sequences in metric spaces (but not for nets, see Ex. 1.30). In general topological spaces, however, one no longer has the concept of boundedness.

Definition 1.28. Let (X, d) be a metric space.

(a) The set A ⊆ X is called bounded if, and only if, A = ∅ or A ≠ ∅ and the set {d(x, y) : x, y ∈ A} is bounded in R; A ⊆ X is called unbounded if, and only if, A is not bounded. For each A ⊆ X, the number

\[ \operatorname{diam} A := \begin{cases} 0 & \text{for } A = \emptyset, \\ \sup\{d(x, y) : x, y \in A\} & \text{for } \emptyset \ne A \text{ bounded}, \\ \infty & \text{for } A \text{ unbounded}, \end{cases} \tag{1.23} \]

is called the diameter of A. Thus, diam A ∈ [0,∞] := R_0^+ ∪ {∞} and A is bounded if, and only if, diam A < ∞.

(b) The net (x_i)_{i∈I} in X is called bounded if, and only if, the set {x_i : i ∈ I} is bounded in the sense of (a).

Proposition 1.29. Let (X, d) be a metric space.


(a) A ⊆ X is bounded if, and only if, there is r > 0 and x ∈ X such that A ⊆ B_r(x) (in particular, Def. 1.28(a) is consistent with [Phi16, Def. 7.42(a)]).

(b) Every finite subset of X is bounded.

(c) The union of two bounded subsets of X is bounded.

(d) If the sequence (xk)k∈N in X is convergent, then it is bounded.

Proof. (a): If A is bounded, then diam A < ∞. Let r be any real number bigger than diam A, e.g. 1 + diam A. Choose any point x ∈ A. Then, by the definition of diam A, for each y ∈ A, it is d(x, y) ≤ diam A < r, showing that A ⊆ B_r(x). Conversely, if r > 0 and x ∈ X such that A ⊆ B_r(x), then, by the definition of B_r(x), one has d(x, y) < r for each y ∈ A. Now, if y, z ∈ A, then d(z, y) ≤ d(z, x) + d(x, y) < 2r, showing diam A ≤ 2r < ∞, i.e. A is bounded.

(b): Let A be a finite subset of X and a ∈ A. Set r := 1 + max{d(a, x) : x ∈ A}. Then 1 ≤ r < ∞, since A is finite. Moreover, A ⊆ B_r(a), showing that A is bounded.

(c): Let A and B be bounded subsets of X. Then there are x, y ∈ X and r > 0 such that A ⊆ B_r(x) and B ⊆ B_r(y). Define α := d(x, y) and ε := r + α. Then A ⊆ B_r(x) ⊆ B_ε(x). If b ∈ B, then d(b, x) ≤ d(b, y) + d(y, x) < r + α = ε, showing B ⊆ B_ε(x), and, thus, A ∪ B ⊆ B_ε(x), establishing that A ∪ B is bounded.

(d) (cf. the proof for sequences in K in [Phi16, Prop. 7.10(b)]): If lim_{k→∞} x_k = a ∈ X, then there is N ∈ N such that x_k ∈ B_1(a) for each k > N, i.e. {x_k : k > N} is bounded. Moreover, the finite set {x_k : k ≤ N} is bounded. Therefore, {x_k : k ∈ N} is the union of two bounded sets, and, hence, bounded. ∎

Example 1.30. The net $(e^{-k})_{k\in\mathbb{Z}}$ in R, which clearly converges to 0, shows that convergent nets do not need to be bounded. Moreover, using the bijective map φ : Z −→ Z, φ(k) := −k, we see that the divergent net $(e^{k})_{k\in\mathbb{Z}}$ is a reordering of $(e^{-k})_{k\in\mathbb{Z}}$.

—

We conclude the present section by extending the 1-dimensional Bolzano-Weierstrass theorem of [Phi16, Th. 7.27] to sequences in $K^n$:

Theorem 1.31 (Bolzano-Weierstrass). Let $(z^k)_{k\in\mathbb{N}}$ be a sequence in $K^n$ that is bounded with respect to the 2-norm (in fact, we will see as a consequence of Th. 1.72 below that the property of boundedness in $K^n$ does not depend on the chosen norm). Then $(z^k)_{k\in\mathbb{N}}$ has a subsequence that converges in $K^n$.

Proof. If $(z^k)_{k\in\mathbb{N}}$ is bounded with respect to the 2-norm, then, due to (1.19), each coordinate sequence $(z^k_j)_{k\in\mathbb{N}}$, j ∈ {1, . . . , n}, is bounded in K. We prove by induction over {1, . . . , n} that, for each j ∈ {1, . . . , n}, there is a subsequence $(y^{k,j})_{k\in\mathbb{N}}$ of $(z^k)_{k\in\mathbb{N}}$ such that the coordinate sequences $(y^{k,j}_\alpha)_{k\in\mathbb{N}}$ converge for each α ∈ {1, . . . , j}.

Base Case (j = 1): Since $(z^k_1)_{k\in\mathbb{N}}$ is a bounded sequence in K, the Bolzano-Weierstrass theorem for sequences in K (cf. [Phi16, Prop. 7.26, Th. 7.27]) yields the existence of a convergent subsequence of $(z^k_1)_{k\in\mathbb{N}}$. This provides us with the needed subsequence $(y^{k,1})_{k\in\mathbb{N}}$ of $(z^k)_{k\in\mathbb{N}}$.

Induction Step: Now suppose that 1 < j ≤ n. By induction, we already have a subsequence $(y^{k,j-1})_{k\in\mathbb{N}}$ of $(z^k)_{k\in\mathbb{N}}$ such that the coordinate sequences $(y^{k,j-1}_\alpha)_{k\in\mathbb{N}}$ converge for each α ∈ {1, . . . , j − 1}. As $(y^{k,j-1}_j)_{k\in\mathbb{N}}$ is a subsequence of the bounded K-valued sequence $(z^k_j)_{k\in\mathbb{N}}$, by the Bolzano-Weierstrass theorem for sequences in K, it has a convergent subsequence. This provides us with the needed subsequence $(y^{k,j})_{k\in\mathbb{N}}$ of $(y^{k,j-1})_{k\in\mathbb{N}}$, which is then also a subsequence of $(z^k)_{k\in\mathbb{N}}$. Moreover, for each α ∈ {1, . . . , j − 1}, $(y^{k,j}_\alpha)_{k\in\mathbb{N}}$ is a subsequence of the convergent sequence $(y^{k,j-1}_\alpha)_{k\in\mathbb{N}}$, and, thus, also convergent. In consequence, $(y^{k,j}_\alpha)_{k\in\mathbb{N}}$ converges for each α ∈ {1, . . . , j} as required.

Finally, one observes that $(y^{k,n})_{k\in\mathbb{N}}$ is a subsequence of $(z^k)_{k\in\mathbb{N}}$ such that all coordinate sequences $(y^{k,n}_\alpha)_{k\in\mathbb{N}}$, α ∈ {1, . . . , n}, converge. Let $a_\alpha := \lim_{k\to\infty} y^{k,n}_\alpha$ for each α ∈ {1, . . . , n} and $a := (a_1, \dots, a_n)$. Then, by (1.18), $\lim_{k\to\infty} y^{k,n} = a$, thereby establishing the case. ∎

1.2 Open Sets, Closed Sets, and Related Notions

Definition 1.32. Let (X, T ) be a topological space, A ⊆ X, and x ∈ X.

(a) A is called open if, and only if, A ∈ T (of course, we already defined this in Def. 1.1; here, it is only repeated for the sake of completeness).

(b) A is called closed if, and only if, $A^c = X \setminus A$ is open (where $A^c = X \setminus A$ denotes the complement of A, cf. [Phi16, Def. 1.24(c)]).

(c) The point x is called an interior point of A if, and only if, A ∈ U(x), i.e. if, and only if, there exists O ∈ T such that x ∈ O ⊆ A. Note: An interior point of A is always in A. The set of all interior points of A is called the interior of A. It is denoted by A◦ or by int A.

(d) The point x is called a boundary point of A if, and only if, each O ∈ U(x) contains at least one point from A and at least one point from $A^c$ (A ∩ O ≠ ∅ and $A^c$ ∩ O ≠ ∅). Note: A boundary point of A is not necessarily in A. The set of all boundary points of A is called the boundary of A. It is denoted by ∂A. The set A ∪ ∂A is called the closure of A. It is denoted by $\overline{A}$ or by cl A. The set A is called dense in X if, and only if, $\overline{A} = X$; (X, T ) is called separable if, and only if, X has a countable dense subset.

(e) The point x is called a cluster point or accumulation point of A if, and only if, every neighborhood of x has a nonempty intersection with A \ {x}, i.e. if, and only if,

\[
\underset{U \in \mathcal{U}(x)}{\forall} \quad U \cap (A \setminus \{x\}) \neq \emptyset
\]

(cf. [Phi16, Def. 7.33(a)] and Rem. 1.42(a) below). Note: A cluster point of A is not necessarily in A.


(f) The point x is called an isolated point of A if, and only if, it is in A and not a cluster point of A, i.e. if, and only if, there exists U ∈ U(x) such that U ∩ A = {x} (cf. [Phi16, Def. 7.33(b)]). Note: An isolated point of A is always in A.
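On a finite topological space, the notions of Def. 1.32 can be evaluated by brute force. The sketch below is an illustration only: the three-point space `X`, the topology `T`, and all function names are assumptions chosen for the example. It uses the fact that a condition on all neighborhoods of x only needs to be tested on the open sets containing x, since every neighborhood contains such an open set.

```python
# Illustrative finite topological space (assumption, not from the notes).
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def open_nbhds(x):
    """Open sets containing x; they suffice to test neighborhood conditions."""
    return [O for O in T if x in O]

def interior(A):
    # x is an interior point iff some open O satisfies x in O and O subset of A.
    return frozenset(x for x in X if any(O <= A for O in open_nbhds(x)))

def boundary(A):
    # x is a boundary point iff every open O containing x meets A and A^c.
    Ac = X - A
    return frozenset(x for x in X
                     if all((O & A) and (O & Ac) for O in open_nbhds(x)))

def closure(A):
    return A | boundary(A)

def cluster_points(A):
    # Every open O containing x must meet A \ {x}.
    return frozenset(x for x in X if all(O & (A - {x}) for O in open_nbhds(x)))

def isolated_points(A):
    return A - cluster_points(A)

A = frozenset({2})
```

For this choice, the point 3 is a cluster point of A that is not in A, while 2 is an isolated point of A, matching the notes in (e) and (f).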

Example 1.33. Let (X, T ) be a topological space, where T is induced by a metric d on X. Then, given x ∈ X and r ∈ R+, the open ball $B_r(x)$ is an open set and the closed ball $\overline{B}_r(x)$ is a closed set: That $B_r(x)$ ∈ T is immediate from the definition of T . To see that $\overline{B}_r(x)$ is closed, we need to show X \ $\overline{B}_r(x)$ is open. To this end, let y ∈ X \ $\overline{B}_r(x)$, ε := d(x, y) − r > 0. If z ∈ $B_\epsilon(y)$, then d(y, z) < d(x, y) − r and d(x, y) ≤ d(x, z) + d(z, y). Thus, d(x, z) ≥ d(x, y) − d(z, y) > r, showing $B_\epsilon(y)$ ⊆ X \ $\overline{B}_r(x)$, i.e. X \ $\overline{B}_r(x)$ is open.

Example 1.34. Consider (X, T ), where X = K and T is induced by the metric $d : X \times X \to \mathbb{R}_0^+$, d(z, w) := |z − w|.

(a) Let A := ]0, 1] and K = R. Then A◦ = ]0, 1[, ∂A = {0, 1}, $\overline{A}$ = [0, 1].

(b) Let A := ]0, 1] and K = C. Then A◦ = ∅, ∂A = $\overline{A}$ = [0, 1].

(c) Let A := Q. In this case, there is no difference between K = R and K = C: A◦ = ∅, ∂A = $\overline{A}$ = R.

(d) Let A := {1/n : n ∈ N}. Once again, there is no difference between K = R and K = C: Every element of A is an isolated point. In particular, A◦ = ∅. The unique cluster point of A is 0, and ∂A = $\overline{A}$ = A ∪ {0}.

Proposition 1.35. Let (X, T ) be a topological space, A ⊆ X.

(a) A is open if, and only if, $A^c$ is closed.

(b) The empty set ∅ and the entire space X are both open and closed. Such sets are sometimes called clopen.

(c) Intersections of arbitrarily many closed sets are closed (cf. [Phi16, Prop. 7.44(b)]). The union of finitely many closed sets is closed (cf. [Phi16, Prop. 7.44(a)]).

(d) X is the disjoint union of A◦, ∂A, and (X \ A)◦.

(e) A is dense in X if, and only if, for every nonempty O ∈ T , one has O ∩ A ≠ ∅.

Proof. (a): According to Def. 1.32(b), $A^c$ is closed if, and only if, $(A^c)^c$ is open. However, $(A^c)^c = X \setminus A^c = X \setminus (X \setminus A) = A$.

(b) is immediate, since ∅, X ∈ T .

(c): Let I ≠ ∅ be a (finite or infinite) index set. For each j ∈ I, let $C_j$ ⊆ X be closed. We have to verify that $C := \bigcap_{j\in I} C_j$ is closed. According to the set-theoretic law [Phi16, Prop. 1.39(e)],

\[
C^c = \Big(\bigcap_{j\in I} C_j\Big)^c = \bigcup_{j\in I} C_j^c .
\]

Now, as we know that $C_j$ is closed, we know that $C_j^c$ is open. According to Def. 1.1(iii), that means that $C^c$ is open, showing that C is closed. Similarly, if we consider finitely many closed sets $C_1, \dots, C_N$, N ∈ N, and let $C := \bigcup_{j=1}^N C_j$, then the set-theoretic law [Phi16, Prop. 1.39(f)] yields

\[
C^c = \Big(\bigcup_{j=1}^N C_j\Big)^c = \bigcap_{j=1}^N C_j^c .
\]

Since each $C_j$ is closed, each $C_j^c$ is open, and, by Def. 1.1(ii), $C^c$ is open, hence C closed.

(d): One has to show four parts: X = A◦ ∪ ∂A ∪ (X \ A)◦, A◦ ∩ ∂A = ∅, ∂A ∩ (X \ A)◦ = ∅, and A◦ ∩ (X \ A)◦ = ∅. X = A◦ ∪ ∂A ∪ (X \ A)◦: Suppose x ∈ X \ (A◦ ∪ ∂A). Since x ∉ ∂A, there exists U ∈ U(x) such that U ⊆ A or U ⊆ X \ A. As x ∉ A◦, it must be U ⊆ X \ A, i.e. x ∈ (X \ A)◦. A◦ ∩ ∂A = ∅: If x ∈ A◦, then there is U ∈ U(x) such that U ⊆ A, thus, x ∉ ∂A. ∂A ∩ (X \ A)◦ = ∅: Since ∂A = ∂(X \ A), this follows from A◦ ∩ ∂A = ∅. A◦ ∩ (X \ A)◦ = ∅ holds as A◦ ⊆ A, (X \ A)◦ ⊆ X \ A, and A ∩ (X \ A) = ∅.

(e): If A is dense in X, then X = $\overline{A}$ = A ∪ ∂A. Let O ∈ T be nonempty and x ∈ O. If x ∈ A, then x ∈ O ∩ A. If x ∈ ∂A, then O ∩ A ≠ ∅ also holds, as O ∈ U(x). Conversely, suppose O ∩ A ≠ ∅ for each nonempty O ∈ T . Then, for x ∈ X \ A and x ∈ O ∈ T , both O ∩ A ≠ ∅ and O ∩ $A^c$ ≠ ∅ (the latter since x ∈ O ∩ $A^c$), showing x ∈ ∂A. Thus, X = A ∪ ∂A = $\overline{A}$. ∎

Example 1.36. Due to Prop. 1.35(e), Q is dense in R (endowed with the topology given by | · |). More generally, $\mathbb{Q}^n$ is dense in $\mathbb{R}^n$, n ∈ N, and

\[
A := \big\{ (x_1 + i y_1, \dots, x_n + i y_n) \in \mathbb{C}^n : x_j, y_j \in \mathbb{Q},\ j \in \{1, \dots, n\} \big\}
\]

is dense in $\mathbb{C}^n$ (endowed with the topology given by ‖ · ‖₂). As $\mathbb{Q}^n$ and A are countable, $\mathbb{R}^n$ and $\mathbb{C}^n$ are separable in the norm topology. On the other hand, if X is uncountable and (X, T ) is discrete, then the space can never be separable (since $\overline{A} = A$ for each A ⊆ X). As every discrete space is metrizable (by the discrete metric), this shows that not every metric space is separable.

Theorem 1.37. Let (X, T ) be a topological space, A ⊆ X.

(a) The interior A◦ is the union of all open subsets of A. In particular, A◦ is open. In other words, A◦ is the largest open set contained in A.

(b) The closure $\overline{A}$ is the intersection of all closed supersets of A. In particular, $\overline{A}$ is closed. In other words, $\overline{A}$ is the smallest closed set containing A.

(c) The boundary ∂A is closed.

Proof. (a): Let O be the union of all open subsets of A. Then O is open by Def. 1.1(iii). If x ∈ A◦, then x is an interior point of A, i.e. there is U ∈ U(x) and $O_1$ ∈ T such that x ∈ $O_1$ ⊆ U ⊆ A (cf. Def. 1.1). Since $O_1$ is an open subset of A, x ∈ O. Conversely, if x ∈ O, then, as O is open, O ∈ U(x), showing that x is an interior point of A, i.e. x ∈ A◦.


(b): According to (a), $(A^c)^\circ$ is the union of all open subsets of $A^c$, i.e.

\[
(A^c)^\circ = \bigcup_{O \in \{S \subseteq A^c :\, S \text{ open}\}} O .
\]

Then, by [Phi16, Prop. 1.39(f)],

\[
\big((A^c)^\circ\big)^c = \bigcap_{O \in \{S \subseteq A^c :\, S \text{ open}\}} O^c = \bigcap_{C \in \{S \supseteq A :\, S \text{ closed}\}} C
\]

is the intersection of all closed supersets of A (note that C is a closed superset of A if, and only if, $C^c$ is an open subset of $A^c$). As, by Prop. 1.35(d),

\[
\big((A^c)^\circ\big)^c = \partial(A^c) \cup A^\circ = \partial A \cup A^\circ = \partial A \cup A = \overline{A},
\]

$\overline{A}$ is the intersection of all closed supersets of A as claimed.

(c): According to Prop. 1.35(d), it is ∂A = X \ (A◦ ∪ (X \ A)◦). Since A◦ and (X \ A)◦ are open, ∂A is closed. ∎
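The characterizations of Th. 1.37 can be checked exhaustively on a small finite space. The following sketch is illustrative only: the space `X`, the topology `T`, and the helper names are assumptions made for this example. It computes the interior as a union of open subsets, the closure as an intersection of closed supersets, and the boundary as their difference (which agrees with ∂A by Prop. 1.35(d)).

```python
from itertools import combinations

# Illustrative finite topological space (assumption, not from the notes).
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
closed_sets = {X - O for O in T}

def interior(A):
    # Th. 1.37(a): union of all open subsets of A.
    return frozenset().union(*(O for O in T if O <= A))

def closure(A):
    # Th. 1.37(b): intersection of all closed supersets of A.
    return X.intersection(*(C for C in closed_sets if A <= C))

def boundary(A):
    # closure(A) is the disjoint union of interior(A) and the boundary.
    return closure(A) - interior(A)

def subsets(S):
    s = list(S)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Interior is open, closure and boundary are closed, for every A.
theorem_holds = all(
    interior(A) in T and closure(A) in closed_sets and boundary(A) in closed_sets
    for A in subsets(X)
)
```

The check runs over all eight subsets of X; it illustrates the theorem on one example and is, of course, no substitute for the proof.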

We will now proceed to study some relations between cluster points of a set A, the closure of a set A, and convergent nets in A. Sequences suffice instead of nets for topological spaces that are first countable, a notion provided by the following definition:

Definition 1.38. Let (X, T ) be a topological space, x ∈ X. A set ℬ ⊆ U(x) is called a local base at x or a neighborhood base at x if, and only if, every U ∈ U(x) contains some B ∈ ℬ, i.e. if, and only if,

\[
\underset{U \in \mathcal{U}(x)}{\forall} \quad \underset{B \in \mathcal{B}}{\exists} \quad B \subseteq U .
\]

Moreover, X is called first countable (sometimes also called a C1-space) if, and only if, at every x ∈ X there exists a countable local base.

Remark 1.39. (a) Indiscrete spaces, as well as metric spaces, are first countable: If (X, T ) is indiscrete, then U(x) = {X} for every x ∈ X. If (X, T ) is metric (i.e. T is induced by a metric d on X), then, clearly, for each x ∈ X,

\[
\mathcal{B}(x) := \{ B_\epsilon(x) : \epsilon \in \mathbb{Q}^+ \}
\]

constitutes a countable local base at x.

(b) Let X be a set endowed with the cofinite topology T of Ex. 1.3(c). The space is first countable if, and only if, X is countable: If X is countable, then the set of finite subsets of X is countable, i.e. T is countable and, for each x ∈ X, T ∩ U(x) is a countable local base at x. Conversely, assume there exists a countable local base ℬ at x ∈ X. It holds that

\[
\bigcap_{B \in \mathcal{B}} B = \bigcap_{U \in \mathcal{U}(x)} U = \{x\} : \tag{1.24a}
\]

The inclusions ⊇ in (1.24a) are immediate; ⊆ at the first equality holds, as every U ∈ U(x) contains a B from ℬ as a subset; ⊆ at the second equality holds, since, for each y ≠ x, one has $\{y\}^c$ ∈ U(x), i.e. $y \notin \bigcap_{U \in \mathcal{U}(x)} U$. Taking complements in (1.24a), we have

\[
X \setminus \{x\} = \bigcup_{B \in \mathcal{B}} B^c . \tag{1.24b}
\]

Since ℬ is countable and each $B^c$ is finite (as B contains a nonempty open set), (1.24b) proves X to be countable. In particular, for uncountable X, (X, T ) is not metrizable. Another example of a topology that is not first countable is given by the topology of pointwise convergence, see Ex. 1.53(c) below.

—

First countable spaces are typically much easier to handle due to the following result:

Proposition 1.40. Let (X, T ) be a topological space, x ∈ X. Assume there exists a countable local base at x. Then each net $(x_i)_{i\in I}$ in X, converging to x, contains a sequence $(x_{i_k})_{k\in\mathbb{N}}$ (which is not necessarily a subnet!) such that $\lim_{k\to\infty} x_{i_k} = x$ (in particular, for each A ⊆ X, there is a net in A converging to x if, and only if, there is a sequence in A converging to x).

Proof. Assume there exists a countable local base ℬ at x. Without loss of generality, we may assume the elements of ℬ to be open. Let $B_1, B_2, \dots$ be an enumeration of the elements of ℬ (not necessarily injective). Define, for each k ∈ N,

\[
B_{0,k} := \bigcap_{j=1}^{k} B_j .
\]

Then, since $x \in B_{0,k} \subseteq B_k$ and each $B_{0,k}$ is open, $\mathcal{B}_0 := \{B_{0,k} : k \in \mathbb{N}\}$ is still a countable local base at x, but with the additional property that $B_{0,k+1} \subseteq B_{0,k}$ for each k ∈ N. Now suppose $(x_i)_{i\in I}$ is a net in X, converging to x. We have to show that the net contains a sequence that converges to x as well. Since $\lim_{i\in I} x_i = x$, given k ∈ N, there exists $i_k$ ∈ I such that $x_{i_k} \in B_{0,k}$. We claim $\lim_{k\to\infty} x_{i_k} = x$. Indeed, if U ∈ U(x), then there is N ∈ N with $B_{0,N} \subseteq U$. Thus,

\[
\underset{k \ge N}{\forall} \quad x_{i_k} \in B_{0,k} \subseteq B_{0,N} \subseteq U,
\]

proving $\lim_{k\to\infty} x_{i_k} = x$ as desired. ∎

In Ex. 1.53(c) below, we will see that, if there does not exist a countable local base at x, then it can happen that there is a net in some set A, converging to x, but there is no sequence in A converging to x.

Lemma 1.41. Let (X, T ) be a topological space, A ⊆ X. Then x ∈ X is a cluster point of A if, and only if, there is a net $(a_i)_{i\in I}$ in A \ {x} such that $\lim_{i\in I} a_i = x$ (by Prop. 1.40, we may replace “net” by “sequence” if X is first countable).

Proof. Let $(a_i)_{i\in I}$ be a net in A \ {x} such that $\lim_{i\in I} a_i = x$. If U ∈ U(x), then there must be i ∈ I with $a_i$ ∈ U. Since $a_i$ ∈ A and $a_i$ ≠ x, this already shows U ∩ (A \ {x}) ≠ ∅, i.e. x is a cluster point of A. Conversely, if x is a cluster point of A, for each U ∈ U(x), there exists $x_U$ ∈ U ∩ (A \ {x}). Then $(x_U)_{U\in\mathcal{U}(x)}$ is a net in A \ {x} such that $\lim_{U\in\mathcal{U}(x)} x_U = x$. ∎

Remark 1.42. Let (X, T ) be a topological space, A ⊆ X, x ∈ X.

(a) If T is induced by a metric d on X, then x is a cluster point of A if, and only if, every U ∈ U(x) contains infinitely many distinct points from A (exercise) (thus, our new definition of cluster point is consistent with the definition in [Phi16, Def. 7.33(a)]). On the other hand, we now consider a set X with at least two distinct elements and with the indiscrete topology. Let x, y ∈ X, x ≠ y, A := {y}. Then x is a cluster point of A, even though X (the only neighborhood of x) contains only one point of A distinct from x. In particular, we see that X with the indiscrete topology cannot be metrizable.

(b) If x is an isolated point of A and $(a_i)_{i\in I}$ is a net in A converging to x, then the net must be finally constant with value x, i.e.

\[
\underset{N \in I}{\exists} \quad \underset{i \ge N}{\forall} \quad a_i = x : \tag{1.25}
\]

Indeed, since there exists some U ∈ U(x) with U ∩ A = {x} and $(a_i)_{i\in I}$ converges to x, (1.25) must hold.

Theorem 1.43. Let (X, T ) be a topological space, A ⊆ X. Let H(A) denote the set of cluster points of A, and let L(A) denote the set of limits of nets in A, i.e. L(A) consists of all x ∈ X such that there is a net $(x_i)_{i\in I}$ in A satisfying $\lim_{i\in I} x_i = x$ (by Prop. 1.40, L(A) consists of all limits of sequences in A if X is first countable).

It then holds that $\overline{A}$ = L(A) = A ∪ H(A).

Proof. It suffices to show that L(A) ⊆ $\overline{A}$ ⊆ A ∪ H(A) ⊆ L(A).

“L(A) ⊆ $\overline{A}$”: Suppose x ∉ $\overline{A}$. Since X \ $\overline{A}$ is open, X \ $\overline{A}$ ∈ U(x), showing that no net in A can converge to x, i.e. x ∉ L(A).

“$\overline{A}$ ⊆ A ∪ H(A)”: Let x ∈ $\overline{A}$ \ A. We need to show that x ∈ H(A). As $\overline{A}$ = A ∪ ∂A and x ∉ A, we have x ∈ ∂A. Thus, if U ∈ U(x), then there must exist $x_U$ ∈ A ∩ U. Since x ∉ A, we have $x_U$ ≠ x, showing x to be a cluster point of A.

“A ∪ H(A) ⊆ L(A)”: If a ∈ A, then the constant sequence (a, a, . . . ) converges to a, implying a ∈ L(A). If a ∈ H(A), then a ∈ L(A) according to Lem. 1.41. ∎

Corollary 1.44. Let (X, T ) be a topological space, A ⊆ X. Then the following statements are equivalent:

(i) A is closed.

(ii) A = $\overline{A}$.

(iii) A contains all cluster points of A.

(iv) A contains all limits of nets in A that are convergent in X (by Prop. 1.40, we may replace “nets” by “sequences” if X is first countable³, also cf. [Phi16, Def. 7.42(b)]).

In particular, if A does not have any cluster points, then A is closed.

Proof. The equivalence of (i) and (ii) is due to Th. 1.37(b) ($\overline{A}$ is the smallest closed set containing A). The equivalences of (ii), (iii), and (iv) are due to Th. 1.43: Using the notation L(A) and H(A) from Th. 1.43, one has that A = $\overline{A}$ implies A = A ∪ H(A), i.e. H(A) ⊆ A, i.e. (ii) implies (iii). If H(A) ⊆ A, then L(A) = A ∪ H(A) = A, i.e. (iii) implies (iv). If A = L(A), then A = $\overline{A}$, i.e. (iv) implies (ii). ∎

Example 1.45. Let p, q ∈ N and consider the metric spaces given by $K^p$, $K^q$, $K^{p+q}$, each endowed with the topology induced by the respective 2-norm (cf. Ex. 1.22(c)). Let A ⊆ $K^p$, B ⊆ $K^q$.

(a) If A and B are closed, then A × B is closed in $K^{p+q} = K^p \times K^q$: Let $(c^k)_{k\in\mathbb{N}}$ be a convergent sequence in A × B with $\lim_{k\to\infty} c^k = c \in K^{p+q}$. Then, for each k ∈ N, $c^k = (a^k, b^k)$ with $a^k$ ∈ A, $b^k$ ∈ B. Moreover, c = (a, b) with a ∈ $K^p$ and b ∈ $K^q$. According to (1.18), one has $a = \lim_{k\to\infty} a^k$ and $b = \lim_{k\to\infty} b^k$. Since A and B are closed, from Cor. 1.44(iv), we know that a ∈ A and b ∈ B, i.e. c = (a, b) ∈ A × B, showing that A × B is closed.

(b) If A and B are open, then A × B is open in $K^{p+q} = K^p \times K^q$: It suffices to show that $(A \times B)^c = K^{p+q} \setminus (A \times B)$ is closed. To that end, note

\[
(A \times B)^c = (A^c \times K^q) \cup (K^p \times B^c) : \tag{1.26}
\]

For a point (z, w) ∈ $K^p \times K^q = K^{p+q}$, one reasons as follows:

\begin{align*}
(z, w) \in (A \times B)^c
&\Leftrightarrow (z, w) \notin A \times B \\
&\Leftrightarrow \big(z \notin A \text{ and } w \in K^q\big) \text{ or } \big(z \in K^p \text{ and } w \notin B\big) \\
&\Leftrightarrow (z, w) \in A^c \times K^q \text{ or } (z, w) \in K^p \times B^c \\
&\Leftrightarrow (z, w) \in (A^c \times K^q) \cup (K^p \times B^c),
\end{align*}

thereby proving (1.26). One now observes that $A^c$ and $B^c$ are closed, as A and B are open. As $K^p$ and $K^q$ are also closed, by (a), $A^c \times K^q$ and $K^p \times B^c$ are closed, and, thus, by (1.26), so is $(A \times B)^c$. In consequence, A × B is open as claimed.

(c) Intervals in $\mathbb{R}^n$, n ∈ N: A subset I of $\mathbb{R}^n$ is called an n-dimensional interval if, and only if, I has the form $I = I_1 \times \dots \times I_n$, where $I_1, \dots, I_n$ are intervals in R. The lengths $|I_1|, \dots, |I_n| \in \mathbb{R}_0^+ \cup \{\infty\}$ are called the edge lengths of I. An interval I is called a (hyper)cube if, and only if, all its edge lengths are equal. For x, y ∈ $\mathbb{R}^n$, we write x < y (resp. x ≤ y) if, and only if, $x_j < y_j$ (resp. $x_j \le y_j$) for each j ∈ {1, . . . , n} (clearly, ≤ is a partial order on $\mathbb{R}^n$, but not a total order for n > 1; for example, neither (0, 1) ≤ (1, 0), nor (1, 0) ≤ (0, 1)). If x, y ∈ $\mathbb{R}^n$, x < y, then we define the following intervals:

\begin{align*}
]x, y[ &:= \{z \in \mathbb{R}^n : x < z < y\} = ]x_1, y_1[ \times \dots \times ]x_n, y_n[ && \text{open interval},\\
[x, y] &:= \{z \in \mathbb{R}^n : x \le z \le y\} = [x_1, y_1] \times \dots \times [x_n, y_n] && \text{closed interval},\\
[x, y[ &:= \{z \in \mathbb{R}^n : x \le z < y\} = [x_1, y_1[ \times \dots \times [x_n, y_n[ && \text{halfopen interval},\\
]x, y] &:= \{z \in \mathbb{R}^n : x < z \le y\} = ]x_1, y_1] \times \dots \times ]x_n, y_n] && \text{halfopen interval}.
\end{align*}

Due to (a), (b), open intervals in $\mathbb{R}^n$ are open sets and closed intervals in $\mathbb{R}^n$ are closed sets (with respect to the 2-norm).

³ One calls A sequentially closed if, and only if, A contains all limits of sequences in A. Thus, for first countable X, A is closed if, and only if, A is sequentially closed.
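The componentwise order on $\mathbb{R}^n$ and the resulting n-dimensional intervals translate directly into code. The sketch below is only an illustration; all function names are assumptions introduced for this example, with points represented as tuples of numbers.

```python
def leq(x, y):
    """x <= y componentwise (a partial order on R^n)."""
    return all(a <= b for a, b in zip(x, y))

def lt(x, y):
    """x < y componentwise (strict in every coordinate)."""
    return all(a < b for a, b in zip(x, y))

def in_closed_interval(z, x, y):
    """z in [x, y] = [x_1, y_1] x ... x [x_n, y_n]."""
    return leq(x, z) and leq(z, y)

def in_open_interval(z, x, y):
    """z in ]x, y[ = ]x_1, y_1[ x ... x ]x_n, y_n[."""
    return lt(x, z) and lt(z, y)

def edge_lengths(x, y):
    return [b - a for a, b in zip(x, y)]

def is_cube(x, y):
    """An interval is a (hyper)cube iff all edge lengths are equal."""
    return len(set(edge_lengths(x, y))) == 1

# (0, 1) and (1, 0) are incomparable, so the order is not total for n > 1.
incomparable = not leq((0, 1), (1, 0)) and not leq((1, 0), (0, 1))
```

For instance, the point (0, 0.5) lies in the closed interval [(0, 0), (1, 1)] but not in the corresponding open interval, since its first coordinate sits on the boundary.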

1.3 Construction of Topological Spaces

1.3.1 Bases, Subbases

One can often build a topology T on X from smaller sets ℬ ⊆ T by taking unions (then ℬ is called a base of T ) or from 𝒮 ⊆ T by taking unions of finite intersections (then 𝒮 is called a subbase of T ). This can be useful, since one might be able to prove something about T by merely proving it for the (smaller) base or subbase (cf. Cor. 1.50 below).

Definition 1.46. Let (X, T ) be a topological space.

(a) A set ℬ ⊆ T is called a base of the topology T if, and only if, every O ∈ T is a union of sets from ℬ, i.e. if, and only if, for each O ∈ T , there exists an index set I (where I can be empty, finite, or infinite) and sets $O_i$ ∈ ℬ, i ∈ I, such that

\[
O = \bigcup_{i \in I} O_i .
\]

X is called second countable (sometimes also called a C2-space) if, and only if, there exists a countable base of T (recall the definition of first countable from Def. 1.38).

(b) A set 𝒮 ⊆ T is called a subbase of T if, and only if, {X} united with all finite intersections of sets from 𝒮 forms a base of T , i.e. if, and only if,

\[
\mathcal{B} := \beta(\mathcal{S}) := \{X\} \cup \Big\{ \bigcap_{i=1}^{n} O_i : O_1, \dots, O_n \in \mathcal{S},\ n \in \mathbb{N} \Big\}
\]

forms a base of T .⁴

⁴ Some authors require a subbase to have the additional property of being a cover of X (i.e. every x ∈ X must be in some element of 𝒮), which leads to a similar, but nonequivalent, notion.


Lemma 1.47. Let (X, T ) be a topological space. Then ℬ ⊆ T is a base of T if, and only if, for each x ∈ X, ℬ(x) := {B ∈ ℬ : x ∈ B} is a local base at x (cf. Def. 1.38).

Proof. Let ℬ ⊆ T . If ℬ is a base of T , x ∈ X, U ∈ U(x), then there is O ∈ T with x ∈ O ⊆ U. Since ℬ is a base, there must be B ∈ ℬ with x ∈ B ⊆ O ⊆ U, showing ℬ(x) to be a local base at x. Conversely, assume, for each x ∈ X, ℬ(x) is a local base at x. If O ∈ T , then, for each x ∈ O, there is $O_x$ ∈ ℬ(x) with x ∈ $O_x$ ⊆ O. Then $O = \bigcup_{x\in O} O_x$, proving ℬ to be a base of T . ∎

In general, not every set of subsets of X is the base of some topology on X. One has the following result:

Proposition 1.48. Let X be a set and ℬ ⊆ P(X). Then ℬ is the base of some topology T on X if, and only if, ℬ satisfies the following two conditions:

(i) ℬ is a cover of X, i.e.

\[
X = \bigcup_{B \in \mathcal{B}} B .
\]

(ii) For each $B_1, B_2$ ∈ ℬ and each x ∈ $B_1 \cap B_2$, there exists $B_3$ ∈ ℬ such that

\[
x \in B_3 \subseteq B_1 \cap B_2 . \tag{1.27}
\]

Moreover, if ℬ satisfies (i) and (ii), then

\[
\mathcal{T} = \Big\{ \bigcup_{i \in I} O_i : O_i \in \mathcal{B},\ I \text{ some index set} \Big\} . \tag{1.28}
\]

In particular, the topology T is uniquely determined by ℬ; it is the coarsest topology on X containing ℬ, i.e.

\[
\mathcal{T} = \min\big\{ \mathcal{M} \subseteq \mathcal{P}(X) : \mathcal{B} \subseteq \mathcal{M} \,\wedge\, \mathcal{M} \text{ is topology on } X \big\} \tag{1.29}
\]

where P(P(X)) is endowed with the partial order given by ⊆.

Proof. Suppose ℬ satisfies (i) and (ii). We now define T by (1.28) and show it constitutes a topology on X: Choosing I := ∅ shows ∅ ∈ T . As an immediate consequence of (i), X ∈ T also holds. Now let $O_1, O_2$ ∈ T and assume $O_1 \cap O_2 \neq \emptyset$. If x ∈ $O_1 \cap O_2$, then, by (1.28), there exist sets $B_1, B_2$ ∈ ℬ such that $B_1 \subseteq O_1$, $B_2 \subseteq O_2$, and x ∈ $B_1 \cap B_2$. According to (ii), there exists $B_x$ ∈ ℬ such that $x \in B_x \subseteq B_1 \cap B_2 \subseteq O_1 \cap O_2$. Thus, setting $I := O_1 \cap O_2$,

\[
O_1 \cap O_2 = \bigcup_{x \in I} B_x \in \mathcal{T},
\]

showing T is closed under finite intersections. To see that T is closed under arbitrary unions, let I be an index set and $O_i$ ∈ T for each i ∈ I. Then

\[
\underset{i \in I}{\forall} \quad O_i = \bigcup_{j \in J_i} B_j
\]

with some index sets $J_i$ and each $B_j$ ∈ ℬ. Thus, letting $J := \bigcup_{i\in I} J_i$,

\[
\bigcup_{i \in I} O_i = \bigcup_{j \in J} B_j \in \mathcal{T},
\]

showing T to be closed under arbitrary unions. It merely remains to note that it is immediate from the definition of T that ℬ is a base of T .

Conversely, assume T to be an arbitrary topology on X such that ℬ is a base. Since X ∈ T , (i) must hold. If $B_1, B_2$ ∈ ℬ, then ℬ ⊆ T and T being a topology imply $B_1 \cap B_2$ ∈ T . Thus, as ℬ is a base, $B_1 \cap B_2$ must be a union of sets from ℬ. In particular, if x ∈ $B_1 \cap B_2$, then there must be $B_3$ ∈ ℬ satisfying $x \in B_3 \subseteq B_1 \cap B_2$, proving (ii) to hold. Now let $\tilde{\mathcal{T}}$ denote the right-hand side of (1.28). Then $\tilde{\mathcal{T}} \subseteq \mathcal{T}$, since ℬ ⊆ T and T is closed under arbitrary unions. Conversely, $\mathcal{T} \subseteq \tilde{\mathcal{T}}$, since ℬ is a base of T , i.e. every O ∈ T is a union of sets from ℬ. Thus, we obtain $\mathcal{T} = \tilde{\mathcal{T}}$. Similarly, if we let

\[
\mathfrak{T} := \big\{ \mathcal{M} \subseteq \mathcal{P}(X) : \mathcal{B} \subseteq \mathcal{M} \,\wedge\, \mathcal{M} \text{ is topology on } X \big\},
\]

then $\mathcal{T} \in \mathfrak{T}$ and Prop. 1.4 implies

\[
\tau := \bigcap_{\mathcal{M} \in \mathfrak{T}} \mathcal{M}
\]

to be a topology. Then τ ⊆ T , and (1.28) implies T ⊆ τ (as ℬ ⊆ τ and τ is closed under arbitrary unions), proving (1.29). ∎
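For a finite set X, conditions (i) and (ii) of Prop. 1.48 can be tested mechanically, and the topology (1.28) can be enumerated by forming all unions. The following is a minimal sketch under that finiteness assumption; the names `is_base` and `topology_from_base` and the sample families are illustrative and not part of the notes.

```python
from itertools import combinations

def is_base(X, B):
    """Check conditions (i) and (ii) of Prop. 1.48 for a finite family B."""
    covers = frozenset().union(*B) == X                       # condition (i)
    refines = all(                                            # condition (ii)
        any(x in B3 and B3 <= (B1 & B2) for B3 in B)
        for B1 in B for B2 in B for x in B1 & B2
    )
    return covers and refines

def topology_from_base(B):
    """All unions of subfamilies of B, cf. (1.28); the empty union gives the empty set."""
    B = list(B)
    return {frozenset().union(*fam)
            for r in range(len(B) + 1)
            for fam in combinations(B, r)}

# Illustrative data (assumption): one family that is a base, one that is not.
X = frozenset({1, 2, 3})
good = [frozenset({1}), frozenset({2}), X]
bad = [frozenset({1, 2}), frozenset({2, 3})]   # {2} = {1,2} ∩ {2,3} is not a union of members

T = topology_from_base(good)
```

The family `bad` covers X but violates (ii) at the point 2; the enumeration of all subfamilies is exponential in the size of the base and is meant for small examples only.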

Proposition 1.49. Let X be a set. Then every 𝒮 ⊆ P(X) forms a subbase of some topology on X. More precisely, the set ℬ := β(𝒮) of Def. 1.46(b) forms a base of a topology on X. According to Prop. 1.48, this topology must be unique; we denote it by τ(𝒮) and call it the topology generated by 𝒮. Then τ(𝒮) is the coarsest topology containing 𝒮.

Proof. One merely needs to check that ℬ satisfies (i) and (ii) of Prop. 1.48. However, (i) holds, as X ∈ β(𝒮); and (ii) holds, as β(𝒮) is clearly closed under finite intersections. ∎
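The two-step construction of Prop. 1.49 (finite intersections, then arbitrary unions) can be carried out explicitly when X is finite. The sketch below is an illustration under that assumption; the function names `beta` and `tau` mirror the notation β(𝒮) and τ(𝒮), and the sample subbase is made up for the example.

```python
from itertools import combinations

def beta(X, S):
    """{X} together with all finite intersections of sets from S, cf. Def. 1.46(b)."""
    S = list(S)
    base = {frozenset(X)}
    for r in range(1, len(S) + 1):
        for fam in combinations(S, r):
            base.add(frozenset(X).intersection(*fam))
    return base

def tau(X, S):
    """Topology generated by S: all unions of sets from beta(S), cf. (1.28)."""
    base = list(beta(X, S))
    return {frozenset().union(*fam)
            for r in range(len(base) + 1)
            for fam in combinations(base, r)}

# Illustrative data (assumption): S does not cover X, since 4 lies in no member.
X = frozenset({1, 2, 3, 4})
S = [frozenset({1, 2}), frozenset({2, 3})]

topology = tau(X, S)
```

In this example, the point 4 lies in no element of 𝒮, which shows why {X} is included in β(𝒮): without it, condition (i) of Prop. 1.48 would fail.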

Corollary 1.50. Let (X, T ) be a topological space, x ∈ X. Let ℬ be a base, 𝒮 be a subbase for T , and define ℬ(x) := {B ∈ ℬ : x ∈ B}, 𝒮(x) := {S ∈ 𝒮 : x ∈ S}.

(a) A net $(x_i)_{i\in I}$ in X converges to x ∈ X if, and only if,

\[
\underset{S \in \mathcal{S}(x)}{\forall} \quad \underset{i \in I}{\exists} \quad \underset{j \ge i}{\forall} \quad x_j \in S . \tag{1.30}
\]

(b) The point x ∈ X is a cluster point of A ⊆ X if, and only if,

\[
\underset{B \in \mathcal{B}(x)}{\forall} \quad B \cap (A \setminus \{x\}) \neq \emptyset .
\]

(c) The point x ∈ X is a closure point of A ⊆ X (i.e. x ∈ $\overline{A}$) if, and only if,

\[
\underset{B \in \mathcal{B}(x)}{\forall} \quad B \cap A \neq \emptyset .
\]

(d) A set A ⊆ X is dense if, and only if, for every nonempty B ∈ ℬ, one has B ∩ A ≠ ∅.

Proof. (a): Since 𝒮(x) ⊆ U(x), $\lim_{i\in I} x_i = x$ implies (1.30). Conversely, assume (1.30) and let U ∈ U(x). Then there exists n ∈ N and $S_1, \dots, S_n$ ∈ 𝒮(x) such that $x \in \bigcap_{k=1}^n S_k \subseteq U$. Due to (1.30), for each k ∈ {1, . . . , n}, there is $i_k$ ∈ I such that, for each j ≥ $i_k$, $x_j \in S_k$. Since I is a directed set, there is i ∈ I with i ≥ $i_k$ for each k ∈ {1, . . . , n}. Then, for each j ≥ i, one has $x_j \in \bigcap_{k=1}^n S_k \subseteq U$, proving $\lim_{i\in I} x_i = x$.

(b): Since ℬ(x) ⊆ U(x), if x is a cluster point of A, then the condition in (b) holds by Def. 1.32(e). Conversely, assume that, for every B ∈ ℬ(x), one has B ∩ (A \ {x}) ≠ ∅, and let U ∈ U(x). Since there is B ∈ ℬ(x) with B ⊆ U and B ∩ (A \ {x}) ≠ ∅, we have U ∩ (A \ {x}) ≠ ∅ as well, showing x to be a cluster point of A.

(c) follows from (b), since, according to Th. 1.43, $\overline{A}$ is the union of A with its cluster points.

(d) is now immediate from (c), since A ⊆ X is dense if, and only if, $\overline{A}$ = X. ∎

Caveat: At the end of Ex. 1.53(c) below, we will see that, in general, one can not replace the base ℬ in Cor. 1.50(b),(c),(d) by the subbase 𝒮.

Proposition 1.51. Let (X, T ) be a topological space.

(a) If X is second countable, then it is separable.

(b) If T is induced by a metric d on X, then X is second countable if, and only if, it is separable.

Proof. (a): Suppose X is second countable. Then there is a countable base ℬ for T . For each nonempty B ∈ ℬ, choose $x_B$ ∈ B. Then A := {$x_B$ : B ∈ ℬ, B ≠ ∅} is countable. We claim that A is also dense in X. Indeed, if x ∈ X and U ∈ U(x), then there is B ∈ ℬ with x ∈ B ⊆ U. Then $x_B$ ∈ A ∩ U, i.e. A is dense by Prop. 1.35(e).

(b): By (a), it remains to show that if X is separable, then it has a countable base. Thus, let A ⊆ X be a countable dense set. Then ℬ := {$B_r(a)$ : r ∈ Q+, a ∈ A} is countable. We claim ℬ is also a base of T . Indeed, let O ∈ T and x ∈ O. Then there exists r ∈ Q+ such that $B_r(x)$ ⊆ O. Moreover, since A is dense, there exists some a ∈ A ∩ $B_{r/2}(x)$. Consider B := $B_{r/2}(a)$. Then B ∈ ℬ, x ∈ B, and, due to

\[
\underset{y \in B}{\forall} \quad d(y, x) \le d(y, a) + d(a, x) < \frac{r}{2} + \frac{r}{2} = r,
\]

also B ⊆ $B_r(x)$ ⊆ O, proving ℬ to be a countable base of T . ∎

Example 1.52. (a) The set of all open n-dimensional intervals (cf. Ex. 1.45(c)) is a base for ($\mathbb{R}^n$, T ), n ∈ N, if T is induced by the 2-norm; the set of all open n-dimensional intervals having only endpoints with rational coordinates is a countable base of T . Moreover, the set of all unbounded open intervals is a subbase of T (since every bounded open interval is a finite intersection of unbounded open intervals).

(b) Let X be a nonempty set with a total order ≤. As on R, one can define open intervals on X by letting, for a, b ∈ X with a < b, $I_{a,b}$ := {x ∈ X : a < x < b}, $I_{<b}$ := {x ∈ X : x < b}, $I_{>a}$ := {x ∈ X : a < x}. We consider X itself to be an open interval as well. Then, clearly, the set ℬ of all open intervals satisfies conditions (i) and (ii) of Prop. 1.48 and, thus, forms a base for a topology T on X, called the order topology induced by ≤ on X. The set 𝒮 := {$I_{<b}$ : b ∈ X} ∪ {$I_{>a}$ : a ∈ X} is a subbase for T . On R, the order topology is just the normal topology. Another example is given by X := $\overline{\mathbb{R}}$ := R ∪ {−∞, ∞}, letting

\[
\underset{x, y \in X}{\forall} \quad x \le y \;:\Leftrightarrow\;
\begin{cases}
x = -\infty & \text{or}\\
y = \infty & \text{or}\\
x, y \in \mathbb{R},\ x \le y.
\end{cases}
\]

In X, $I_{<b}$ = [−∞, b[ and $I_{>a}$ = ]a, ∞]. In contrast to R, {$I_{a,b}$ : a, b ∈ R, a < b} is not a base for the order topology on X. Also note that a sequence (even a net, actually) in R converges in X if, and only if, it converges to x ∈ R or it diverges to either ∞ or −∞.

(c) If (X, T ) is a discrete topological space, then ℬ := {{x} : x ∈ X} is a base for T and no strict subset of ℬ is a base for T .

(d) Consider R and ℬ := {[a, b[ : a, b ∈ R, a < b}. Then, clearly, ℬ satisfies conditions (i) and (ii) of Prop. 1.48 and, thus, forms a base for a topology T on R. This topology is called the Sorgenfrey topology or the right half-open interval topology on R. This topology has a number of interesting properties: It is strictly finer than the usual topology $\mathcal{T}_0$ on R (the one generated by the open intervals), i.e. $\mathcal{T}_0 \subsetneq \mathcal{T}$: Intervals of the form [a, b[, a < b, are not in $\mathcal{T}_0$, but

\[
]a, b[ \;=\; \bigcup_{n \in \mathbb{N}} [a + 1/n,\, b[ \;\in \mathcal{T} .
\]

Moreover, T is an example of a topology that is separable and first countable, but not second countable (thus, by Prop. 1.51, T can not be metrizable): Q is dense, since every [a, b[, a < b, contains a rational number. T is first countable, since, for each x ∈ R, ℬ(x) := {[x, x + 1/n[ : n ∈ N} is a countable local base at x. Now let C be an arbitrary base for T , x ∈ R. Since x ∈ $O_x$ := [x, x + 1[ and $O_x$ is open, there exists $B_x$ ∈ C such that x ∈ $B_x$ ⊆ $O_x$, implying x = min $B_x$. In particular, if x, y ∈ R, x ≠ y, then $B_x \neq B_y$, showing that C cannot be countable.

Example 1.53. Let I be a nonempty index set and, for each i ∈ I, let ($X_i$, $\mathcal{T}_i$) be a topological space. Moreover, for each i ∈ I, let $\mathcal{B}_i$ be a base of $\mathcal{T}_i$ and let $\mathcal{S}_i$ be a subbase of $\mathcal{T}_i$. We consider the Cartesian product $X := \prod_{i\in I} X_i$ (cf. [Phi16, Def. 2.15(c)]). If $x = (x_i)_{i\in I}$ ∈ X, then we call $x_i$ the ith coordinate of x. Moreover, for each j ∈ I, we call the map

\[
\pi_j : X \longrightarrow X_j, \quad \pi_j\big((x_i)_{i\in I}\big) := x_j, \tag{1.31}
\]

the projection on the jth coordinate.


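(a) Consider the set

\begin{align*}
\mathcal{B}_p &:= \Big\{ \prod_{i\in I} O_i : \Big( \underset{i\in I}{\forall}\ O_i \in \mathcal{T}_i \Big) \wedge \#\{i \in I : O_i \neq X_i\} < \infty \Big\} \\
&= \Big\{ \bigcap_{j\in J} \pi_j^{-1}(O_j) : J \subseteq I,\ 0 < \#J < \infty,\ \underset{j\in J}{\forall}\ O_j \in \mathcal{T}_j \Big\} .
\end{align*}

Then $\mathcal{B}_p$ satisfies conditions (i) and (ii) of Prop. 1.48 (since X ∈ $\mathcal{B}_p$ and since $\mathcal{B}_p$ is closed under finite intersections by (A.1) of Appendix A and Def. 1.1(ii)) and, thus, forms a base for a topology $\mathcal{T}_p$ on X, called the product topology on X. Another base for $\mathcal{T}_p$ is

\begin{align*}
\mathcal{B}_p^* &:= \Big\{ \prod_{i\in I} B_i : \Big( \underset{i\in I}{\forall}\ B_i \in \mathcal{B}_i \cup \{X_i\} \Big) \wedge \#\{i \in I : B_i \neq X_i\} < \infty \Big\} \\
&= \Big\{ \bigcap_{j\in J} \pi_j^{-1}(B_j) : J \subseteq I,\ 0 < \#J < \infty,\ \underset{j\in J}{\forall}\ B_j \in \mathcal{B}_j \Big\} \cup \{X\} :
\end{align*}

Let ∅ ≠ J ⊆ I be finite, $O_j \in \mathcal{T}_j$. Since $\mathcal{B}_j$ is a base for $\mathcal{T}_j$, there exist $I_j$ and $B_{ji} \in \mathcal{B}_j$, i ∈ $I_j$, such that $O_j = \bigcup_{i\in I_j} B_{ji}$. Let $K := \prod_{j\in J} I_j$. Then

\[
\prod_{j\in J} O_j = \bigcup_{f \in K} \prod_{j\in J} B_{j,f(j)},
\]

showing that every element of $\mathcal{B}_p$ is the union of elements from $\mathcal{B}_p^*$. A subbase for $\mathcal{T}_p$ is

\begin{align*}
\mathcal{S}_p &:= \Big\{ \prod_{i\in I} O_i : \Big( \underset{i\in I}{\forall}\ O_i \in \mathcal{T}_i \Big) \wedge \#\{i \in I : O_i \neq X_i\} \le 1 \Big\} \\
&= \big\{ \pi_i^{-1}(O_i) : i \in I,\ O_i \in \mathcal{T}_i \big\} .
\end{align*}

Another subbase for $\mathcal{T}_p$ is

\begin{align*}
\mathcal{S}_p^* &:= \Big\{ \prod_{i\in I} S_i : \Big( \underset{i\in I}{\forall}\ S_i \in \mathcal{S}_i \cup \{X_i\} \Big) \wedge \#\{i \in I : S_i \neq X_i\} \le 1 \Big\} \\
&= \big\{ \pi_i^{-1}(S_i) : i \in I,\ S_i \in \mathcal{S}_i \big\} \cup \{X\}
\end{align*}

(clearly, since $\mathcal{S}_i$ is a subbase for $\mathcal{T}_i$, each element of $\mathcal{S}_p$ is a union of finite intersections of elements from $\mathcal{S}_p^*$). It is an exercise to show that, if I = {$i_1, i_2, \dots$} is countable, and $\mathcal{T}_{i_k}$ is induced by the metric $d_k$ on $X_{i_k}$, then the product topology on X is metrizable via the metric

\[
d : X \times X \longrightarrow \mathbb{R}_0^+, \quad
d\big( (x_{i_k})_{k\in\mathbb{N}}, (y_{i_k})_{k\in\mathbb{N}} \big) := \sum_{k=1}^{\infty} \frac{d_k(x_{i_k}, y_{i_k})}{2^k \big(1 + d_k(x_{i_k}, y_{i_k})\big)} \tag{1.32}
\]

(however, we will see in (c) below that, e.g., the product topology on $K^{\mathbb{R}}$ is not metrizable).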


(b) An important special case of the product topology of (a) one obtains for I = {1, . . . , n}, n ∈ N, and $X_i$ = K, in which case X = $K^n$. If we endow K with the usual topology (given by | · |), then, we claim that the product topology on $K^n$ is precisely the one induced by the 2-norm (and, by Th. 1.72 below, this is the same topology induced by any other norm on $K^n$): Let $\mathcal{T}_2$ be the topology induced by the 2-norm on $K^n$. We have to show $\mathcal{T}_2 = \mathcal{T}_p$. Let O ∈ $\mathcal{T}_p$, $z = (z_1, \dots, z_n)$ ∈ O. Then there is ε > 0 such that

\[
B := \prod_{j=1}^{n} B_\epsilon(z_j) \subseteq O .
\]

According to (1.19), we obtain $B_{\epsilon,\|\cdot\|_2}(z) \subseteq B$, showing O ∈ $\mathcal{T}_2$ and $\mathcal{T}_2 \supseteq \mathcal{T}_p$. For the remaining inclusion, let $z = (z_1, \dots, z_n)$ ∈ $K^n$ and ε > 0. We need to find O ∈ $\mathcal{T}_p$ such that $z \in O \subseteq B_{\epsilon,\|\cdot\|_2}(z)$. However, again, due to (1.19), we can simply choose

\[
O := \prod_{j=1}^{n} B_{\epsilon/n}(z_j) .
\]

This shows $\mathcal{T}_2 \subseteq \mathcal{T}_p$ and, thus, $\mathcal{T}_2 = \mathcal{T}_p$ as desired.

(c) Another important special case of the product topology of (a) one obtains for I = R and $X_i$ = K for each i ∈ I (as in (b) with each $\mathcal{T}_i$ given by | · | on K). Then

\[
X = \prod_{i\in I} X_i = K^{\mathbb{R}} = \mathcal{F}(\mathbb{R}, K)
\]

is the set of functions from R into K. Then the product topology $\mathcal{T}_p$ is the so-called topology of pointwise convergence. Indeed, we can show that a net $(f_j)_{j\in J}$ in X converges to f ∈ X pointwise, i.e.

\[
\underset{s \in \mathbb{R}}{\forall} \quad \lim_{j\in J} f_j(s) = f(s), \tag{1.33}
\]

if, and only if, $\lim_{j\in J} f_j = f$ with respect to $\mathcal{T}_p$: Suppose (1.33) holds. If f ∈ S ∈ $\mathcal{S}_p$ (as defined in (a)), then there exists $s_0$ ∈ R and an open set O ⊆ K such that $f(s_0)$ ∈ O, $S = \pi_{s_0}^{-1}(O)$. Since $\lim_{j\in J} f_j(s_0) = f(s_0)$,

\[
\underset{i \in J}{\exists} \quad \underset{j \ge i}{\forall} \quad f_j(s_0) \in O, \tag{1.34}
\]

implying $f_j$ ∈ S for each j ≥ i. Thus, using Cor. 1.50(a), $\lim_{j\in J} f_j = f$ with respect to $\mathcal{T}_p$. Conversely, assume $\lim_{j\in J} f_j = f$ with respect to $\mathcal{T}_p$ and fix $s_0$ ∈ R. Let O ⊆ K be open and such that $f(s_0)$ ∈ O. Defining $S = \pi_{s_0}^{-1}(O)$ as above, we see that (1.34) must hold again, proving (1.33), i.e. pointwise convergence.

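Next, we will see that $\mathcal{T}_p$ is not first countable (in particular, not metrizable): Define

$$A := \left\{ (f : \mathbb{R} \to \mathbb{K}) : \exists_{J \subseteq \mathbb{R} \text{ finite}} \ f(s) = \begin{cases} 0 & \text{for } s \in J, \\ 1 & \text{for } s \notin J \end{cases} \right\} \subseteq X,$$

$f_0 \in X$, $f_0 \equiv 0$. Then $f_0 \in \overline{A}$: Indeed, if $f_0 \in B \in \mathcal{B}_p$, then $B = \bigcap_{s \in J} \pi_s^{-1}(O_s)$, where $J \subseteq \mathbb{R}$ is finite and, for each $s \in J$, $O_s \subseteq \mathbb{K}$ is open with $0 \in O_s$. Then

$$f \in X, \quad f(s) = \begin{cases} 0 & \text{for } s \in J, \\ 1 & \text{for } s \notin J \end{cases}$$

is in $A \cap B$, showing $A \cap B \neq \emptyset$, proving $f_0 \in \overline{A}$. However, no sequence in $A$ converges to $f_0$: If $(f_k)_{k \in \mathbb{N}}$ is a sequence in $A$, then $J := \{s \in \mathbb{R} : f_k(s) = 0 \text{ for some } k \in \mathbb{N}\}$ is countable. As $\mathbb{R}$ is uncountable, there exists $s \notin J$, and, for each such $s$, $f_k(s) \to 1 \neq 0$, showing $f_k \not\to f_0$. Thus, according to Th. 1.43 and Prop. 1.40, $f_0$ cannot have a countable local base (and an analogous argument actually shows that no $f \in X$ can have a countable local base).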

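Somewhat surprisingly, $\mathcal{T}_p$ turns out to be separable: Let $(d_1, d_2, \dots)$ be an enumeration of a dense subset of $\mathbb{K}$ (e.g. of $\mathbb{Q}$ for $\mathbb{K} = \mathbb{R}$ and of $\mathbb{Q} \times \mathbb{Q}$ for $\mathbb{K} = \mathbb{C}$), and let

$$\mathcal{K} := \big\{ (I_1, \dots, I_k, n_1, \dots, n_k) : n_\alpha \in \mathbb{N};\ I_\alpha = [r_{\alpha 1}, r_{\alpha 2}];\ r_{\alpha 1}, r_{\alpha 2} \in \mathbb{Q};\ I_\alpha \cap I_\beta = \emptyset \text{ for } \alpha \neq \beta;\ k \in \mathbb{N} \big\}$$

($\mathcal{K}$ is the set of pairs of $k$-tuples, where the first $k$-tuple consists of disjoint closed intervals with rational endpoints and the second $k$-tuple consists of natural numbers). Then, clearly, $\mathcal{K}$ is countable (since $\mathbb{Q}$ and $\mathbb{N}$ are countable, and countable unions of countable sets are countable). For each $\tau := (I_1, \dots, I_k, n_1, \dots, n_k) \in \mathcal{K}$ define $f_\tau \in X$ by setting

$$f_\tau : \mathbb{R} \to \mathbb{K}, \quad f_\tau(s) := \begin{cases} d_{n_\alpha} & \text{if } s \in I_\alpha, \\ d_1 & \text{otherwise.} \end{cases}$$

As $\mathcal{K}$ is countable, so is $A := \{f_\tau : \tau \in \mathcal{K}\} \subseteq X$. We show $A$ to be dense as well: Let $\emptyset \neq B \in \mathcal{B}_p$. Then there exist $\{s_1, \dots, s_k\} \subseteq \mathbb{R}$, $k \in \mathbb{N}$, and $\emptyset \neq O_\alpha \subseteq \mathbb{K}$ open, $\alpha \in \{1, \dots, k\}$, such that

$$B = \bigcap_{\alpha=1}^{k} \pi_{s_\alpha}^{-1}(O_\alpha).$$

For each $\alpha \in \{1, \dots, k\}$, there is $d_{n_\alpha} \in O_\alpha$, $n_\alpha \in \mathbb{N}$. Then there is

$$\tau := (I_1, \dots, I_k, n_1, \dots, n_k) \in \mathcal{K}$$

such that $s_\alpha \in I_\alpha$ for each $\alpha \in \{1, \dots, k\}$. Then $f_\tau \in B$, since $f_\tau(s_\alpha) = d_{n_\alpha} \in O_\alpha$ for each $\alpha \in \{1, \dots, k\}$, proving $A$ to be dense and $\mathcal{T}_p$ to be separable.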

Finally, we can use the present example to show that, in Cor. 1.50(b),(c),(d), one cannot, in general, replace the base $\mathcal{B}$ by the subbase $\mathcal{S}$: Define

$$A_1 := \left\{ (f : \mathbb{R} \to \mathbb{K}) : \exists_{s_0 \in \mathbb{R}} \ f(s) = \begin{cases} 0 & \text{for } s = s_0, \\ 1 & \text{for } s \neq s_0 \end{cases} \right\} \subseteq X,$$

$f_0 \in X$, $f_0 \equiv 0$. Then $f_0$ is not a cluster point of $A_1$ (and also $f_0 \notin \overline{A_1}$, as $f_0 \notin A_1$): Let $U := B_{1/2}(0) \subseteq \mathbb{K}$. Then $O := \pi_0^{-1}(U) \cap \pi_1^{-1}(U)$ is an open neighborhood of $f_0$, but $A_1 \cap O = \emptyset$. On the other hand, if $f_0 \in S \in \mathcal{S}_p$, then, clearly, $S \cap A_1 \neq \emptyset$, showing the base $\mathcal{B}$ in Cor. 1.50(b),(c) cannot be replaced by the subbase $\mathcal{S}$. To see that the same holds for Cor. 1.50(d), one observes that $A_1$ is not dense in $Y := A_1 \cup \{f_0\}$ if $Y$ is considered as a subspace of $X$ (cf. Prop. 1.54(a) below).

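If, in Ex. 1.53, instead of using the base $\mathcal{B}_p$ to obtain the product topology on $X$, one uses the base

$$\mathcal{B}_b := \left\{ \prod_{i \in I} O_i : \Big( \forall_{i \in I} \ O_i \in \mathcal{T}_i \Big) \right\} = \left\{ \bigcap_{i \in I} \pi_i^{-1}(O_i) : \forall_{i \in I} \ O_i \in \mathcal{T}_i \right\},$$

then one obtains the so-called box topology on $X$ (see Appendix B). In general, it has different properties that turn out to be less useful.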

1.3.2 Subspaces

Proposition 1.54. Let (X, T ) be a topological space, M ⊆ X.

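(a) We call

$$\mathcal{T}_M := \big\{ A \subseteq M : \exists_{O \in \mathcal{T}} \ A = O \cap M \big\} \tag{1.35}$$

the relative topology on $M$. It is also called the subspace or trace topology on $M$. It is, indeed, a topology. In this context, we also call $O \in \mathcal{T}$ $X$-open and $O \in \mathcal{T}_M$ $M$-open or relatively open.

(b) A set $A \subseteq M$ is $M$-closed (i.e. closed with respect to $\mathcal{T}_M$, also called relatively closed) if, and only if, there exists an $X$-closed $C \subseteq X$ such that $A = C \cap M$.

(c) If $\mathcal{B}$ is a base for $\mathcal{T}$, $\mathcal{S}$ is a subbase for $\mathcal{T}$, and

$$\mathcal{B}_M := \big\{ A \subseteq M : \exists_{B \in \mathcal{B}} \ A = B \cap M \big\}, \tag{1.36a}$$

$$\mathcal{S}_M := \big\{ A \subseteq M : \exists_{S \in \mathcal{S}} \ A = S \cap M \big\}, \tag{1.36b}$$

then $\mathcal{B}_M$ is a base for $\mathcal{T}_M$ and $\mathcal{S}_M$ is a subbase for $\mathcal{T}_M$. Moreover, if $a \in M$ and $\mathcal{B}(a)$ is a local base at $a$ for $\mathcal{T}$, then

$$\mathcal{B}_M(a) := \big\{ A \subseteq M : \exists_{B \in \mathcal{B}(a)} \ A = B \cap M \big\} \tag{1.36c}$$

is a local base at $a$ for $\mathcal{T}_M$.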


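Proof. (a): We have $\emptyset = M \cap \emptyset \in \mathcal{T}_M$, $M = M \cap X \in \mathcal{T}_M$. If $A_1, A_2 \in \mathcal{T}_M$, then $A_1 = O_1 \cap M$, $A_2 = O_2 \cap M$ with $O_1, O_2 \in \mathcal{T}$. Then $A_1 \cap A_2 = (O_1 \cap O_2) \cap M$, showing $A_1 \cap A_2 \in \mathcal{T}_M$. If $I$ is an index set and $A_i \in \mathcal{T}_M$ for each $i \in I$, then there are $O_i \in \mathcal{T}$ such that $A_i = O_i \cap M$. Then

$$A := \bigcup_{i \in I} A_i = \bigcup_{i \in I} (O_i \cap M) \overset{\text{[Phi16, Prop. 1.39(d)]}}{=} \Big( \bigcup_{i \in I} O_i \Big) \cap M,$$

showing $A \in \mathcal{T}_M$, i.e. $\mathcal{T}_M$ is a topology.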

(b): Let $A \subseteq M$. If $A$ is $M$-closed, then $M \setminus A$ is $M$-open, i.e. there is an $X$-open set $O \subseteq X$ such that $M \setminus A = M \cap O$. Then $C := X \setminus O$ is an $X$-closed set and $M \cap C = M \cap (X \setminus O) = M \setminus (M \cap O) = M \setminus (M \setminus A) = A$. Conversely, if there is an $X$-closed set $C \subseteq X$ with $A = C \cap M$, then $O := X \setminus C$ is an $X$-open set satisfying $O \cap M = M \cap (X \setminus C) = M \setminus (C \cap M) = M \setminus A$, showing $M \setminus A$ is $M$-open, i.e. $A$ is $M$-closed.

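(c): If $A \in \mathcal{T}_M$, then $A = O \cap M$ with $O \in \mathcal{T}$. Thus, $O = \bigcup_{i \in I} B_i$ with each $B_i \in \mathcal{B}$. In consequence,

$$A = \Big( \bigcup_{i \in I} B_i \Big) \cap M \overset{\text{[Phi16, Prop. 1.39(d)]}}{=} \bigcup_{i \in I} (B_i \cap M).$$

Since each $B_i \cap M \in \mathcal{B}_M$, this shows $\mathcal{B}_M$ to be a base for $\mathcal{T}_M$. We now assume the base $\mathcal{B}$ to be the set of finite intersections of sets from $\mathcal{S}$. If $A \in \mathcal{B}_M$, then $A = B \cap M$ with $B \in \mathcal{B}$. Moreover, $B = \bigcap_{i=1}^{n} S_i$, $n \in \mathbb{N}$, each $S_i \in \mathcal{S}$. Thus,

$$A = \Big( \bigcap_{i=1}^{n} S_i \Big) \cap M \overset{\text{[Phi16, Prop. 1.39(a)]}}{=} \bigcap_{i=1}^{n} (S_i \cap M).$$

Since each $S_i \cap M \in \mathcal{S}_M$, this shows $\mathcal{S}_M$ to be a subbase for $\mathcal{T}_M$. Now let $U \subseteq M$ be a $\mathcal{T}_M$-neighborhood of $a$. Then there exists $O \in \mathcal{T}$ such that $a \in O \cap M \subseteq U$. Thus, there exists $B \in \mathcal{B}(a)$ such that $a \in B \subseteq O$. Then $a \in B \cap M \subseteq O \cap M \subseteq U$, and, since $B \cap M \in \mathcal{B}_M(a)$, this shows $\mathcal{B}_M(a)$ to be a local base at $a$ for $\mathcal{T}_M$. ∎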

One has to use care when working with a subspace $(M, \mathcal{T}_M)$ of a topological space $(X, \mathcal{T})$: The following Prop. 1.55 already shows that some properties of $(X, \mathcal{T})$ are inherited by the subspace (e.g. first and second countable, metrizable), but others are not (e.g. separable, complete metric, and compact, the latter two to be studied later). Moreover, a set that is $M$-open might not be $X$-open and a set that is $M$-closed might not be $X$-closed (see Ex. 1.56 below).

Proposition 1.55. Let (X, T ) be a topological space and (M, TM ) a subspace.

(a) If $(a_i)_{i \in I}$ is a net in $M$, $a \in M$, then the net converges to $a$ with respect to $\mathcal{T}_M$ if, and only if, it converges to $a$ with respect to $\mathcal{T}$ (caveat: $(\frac{1}{k})_{k \in \mathbb{N}}$ converges to $0$ in $\mathbb{R}$, but does not converge in $]0, 1]$; $(k)_{k \in \mathbb{N}}$ converges to $\infty$ in $\mathbb{R} \cup \{\infty\}$, but does not converge in $\mathbb{R}$).

(b) If $a \in M$ and there is a countable local base at $a$ with respect to $\mathcal{T}$, then there is a countable local base at $a$ with respect to $\mathcal{T}_M$. In particular, if $\mathcal{T}$ is first countable, then so is $\mathcal{T}_M$.

(c) If $\mathcal{T}$ is second countable, then so is $\mathcal{T}_M$.

(d) If $\mathcal{T}$ is metrizable by the metric $d : X \times X \to \mathbb{R}_0^+$, then $\mathcal{T}_M$ is metrizable by $d{\restriction}_{M \times M}$ (we then call $(M, d)$ a metric subspace of $(X, d)$).

(e) Somewhat surprisingly, every topological space $(X, \mathcal{T})$ is the subspace of some separable topological space $(Y, \mathcal{T}_Y)$, where $Y \setminus X$ is even countable (in particular, in general, not every subspace of a separable space is separable).

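Proof. (a): Suppose $(a_i)_{i \in I}$ converges to $a$ with respect to $\mathcal{T}$, and let $U \subseteq M$ be a $\mathcal{T}_M$-neighborhood of $a$. Then there is $O \in \mathcal{T}$ such that $a \in O \cap M \subseteq U$. Since $O$ is a $\mathcal{T}$-neighborhood of $a$,

$$\exists_{i \in I} \ \forall_{j \ge i} \quad a_j \in O.$$

Then $a_j \in O \cap M \subseteq U$, showing $(a_i)_{i \in I}$ converges to $a$ with respect to $\mathcal{T}_M$. Conversely, assume $\lim_{i \in I} a_i = a$ holds with respect to $\mathcal{T}_M$, and let $U \subseteq X$ be a $\mathcal{T}$-neighborhood of $a$. Again there is $O \in \mathcal{T}$ such that $a \in O \cap M \subseteq U$. Since $O \cap M$ is a $\mathcal{T}_M$-neighborhood of $a$,

$$\exists_{i \in I} \ \forall_{j \ge i} \quad a_j \in O \cap M \subseteq U,$$

showing $(a_i)_{i \in I}$ converges to $a$ with respect to $\mathcal{T}$.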

(b) follows directly from the local base part of Prop. 1.54(c).

(c) follows directly from the base part of Prop. 1.54(c).

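(d): We note that $d{\restriction}_{M \times M}$ is, indeed, a metric on $M$ (since $d$ satisfies the laws (i) – (iii) from Def. 1.5 for all $x, y, z \in X$, in particular, $d$ satisfies the same laws for all $x, y, z \in M \subseteq X$). Since $\mathcal{T}$ is metrizable, $\mathcal{B} := \{B_r(x) : x \in X, r \in \mathbb{R}^+\}$ is a base of $\mathcal{T}$. We have to show that $\mathcal{C} := \{B_{r,M}(a) : a \in M, r \in \mathbb{R}^+\}$, where

$$\forall_{a \in M} \ \forall_{r \in \mathbb{R}^+} \quad B_{r,M}(a) := M \cap B_r(a) = \{x \in M : d(a, x) < r\} \tag{1.37}$$

are the open balls with respect to $d{\restriction}_{M \times M}$, constitutes a base for $\mathcal{T}_M$. If $A \in \mathcal{T}_M$, then $A = O \cap M$ with $O \in \mathcal{T}$. Thus,

$$\forall_{a \in A} \ \exists_{r \in \mathbb{R}^+} \quad B_r(a) \subseteq O.$$

Intersecting with $M$ yields $B_{r,M}(a) = B_r(a) \cap M \subseteq O \cap M = A$, proving $\mathcal{C}$ to be a base for $\mathcal{T}_M$ as desired.

(e): See [Sie52, p. 49]. ∎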

Example 1.56. (a) If $(M, \mathcal{T}_M)$ is a topological subspace of $(X, \mathcal{T})$, then $M$ is always both $M$-open and $M$-closed (irrespective of $M$ being $X$-open or $X$-closed).

(b) Let $X = \mathbb{R}$ with the usual metric, i.e. $d(x, y) = |x - y|$ for each $x, y \in \mathbb{R}$. Let $M = [0, 1]$. According to (a), $M$ is both $M$-closed and $M$-open, even though $[0, 1]$ is not open in $X$. If $M = {]0, 1]}$, then, again, $M$ is both $M$-closed and $M$-open, even though $]0, 1]$ is neither closed nor open in $X$. Moreover, $]0, \frac{1}{2}]$ is $M$-closed (but not $X$-closed) and $]\frac{1}{2}, 1]$ is $M$-open (but not $X$-open).

1.4 Topics Particular to Metric and Normed Spaces

1.4.1 Basic Inequalities

Lemma 1.57. (a) The following law holds in every metric space (X, d):

|d(x, y)− d(x′, y′)| ≤ d(x, x′) + d(y, y′) for each x, x′, y, y′ ∈ X. (1.38a)

(b) The following law holds in every normed vector space $(X, \|\cdot\|)$:

$$\big| \|x\| - \|y\| \big| \le \|x - y\| \quad \text{for each } x, y \in X. \tag{1.38b}$$

This law is sometimes referred to as the inverse triangle inequality.

Proof. (a): First, note d(x, y) ≤ d(x, x′) + d(x′, y′) + d(y′, y), i.e.

d(x, y)− d(x′, y′) ≤ d(x, x′) + d(y′, y). (1.39a)

Second, d(x′, y′) ≤ d(x′, x) + d(x, y) + d(y, y′), i.e.

d(x′, y′)− d(x, y) ≤ d(x′, x) + d(y, y′). (1.39b)

Taken together, (1.39a) and (1.39b) complete the proof of (1.38a).

(b): Let $d(x, y) := \|x - y\|$ be the induced metric on $X$. Applying (a) to $d$ yields the estimate

$$\big| \|x\| - \|y\| \big| = |d(x, 0) - d(y, 0)| \le d(x, y) + d(0, 0) = \|x - y\|,$$

thereby establishing (1.38b). ∎


1.4.2 Completeness

Definition 1.58. Let $(X, d)$ be a metric space. The sequence $(x^k)_{k \in \mathbb{N}}$ in $X$ is said to be a Cauchy sequence if, and only if, for each $\epsilon > 0$, there is $N \in \mathbb{N}$ such that $d(x^k, x^l) < \epsilon$ for each $k, l > N$.

Proposition 1.59. Let (X, d) be a metric space and let (xk)k∈N be a sequence in X.

(a) If (xk)k∈N is convergent, then it is a Cauchy sequence.


(b) If $X = \mathbb{K}^n$, $n \in \mathbb{N}$, and $d$ is induced by the 2-norm on $\mathbb{K}^n$, then the converse of (a) also holds: If $(x^k)_{k \in \mathbb{N}}$ is a Cauchy sequence, then it is convergent in $\mathbb{K}^n$.

Proof. (a) (cf. the proof for sequences in $\mathbb{K}$ in [Phi16, Th. 7.29]): If $\lim_{k \to \infty} x^k = a \in X$, then, given $\epsilon > 0$, there is $N \in \mathbb{N}$ such that $x^k \in B_{\epsilon/2}(a)$ for each $k > N$. If $k, l > N$, then $d(x^k, x^l) \le d(x^k, a) + d(a, x^l) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$, establishing that $(x^k)_{k \in \mathbb{N}}$ is a Cauchy sequence.

(b): As $(x^k)_{k \in \mathbb{N}}$ is Cauchy, given $\epsilon \in \mathbb{R}^+$, there is $N \in \mathbb{N}$ such that, for each $k, l > N$,

$$\|x^k - x^l\|_2 < \epsilon.$$

Then, by (1.19), for each $k, l > N$,

$$\forall_{j \in \{1, \dots, n\}} \quad |x_j^k - x_j^l| \le \|x^k - x^l\|_2 < \epsilon,$$

implying, according to [Phi16, Def. 7.28], that $(x_j^k)_{k \in \mathbb{N}}$ is a Cauchy sequence in $\mathbb{K}$ for each $j \in \{1, \dots, n\}$. Then [Phi16, Th. 7.29] yields each $(x_j^k)_{k \in \mathbb{N}}$ to be convergent in $\mathbb{K}$ with some limit $a_j \in \mathbb{K}$. Thus, finally, $(x^k)_{k \in \mathbb{N}}$ converges to $a := (a_1, \dots, a_n) \in \mathbb{K}^n$ by Ex. 1.22(c). ∎

While we just saw that, in $\mathbb{K}^n$, each Cauchy sequence converges, this is not true for all metric spaces, as simple examples show. Take, e.g., $X = \mathbb{Q}$ or $X = {]0, 1]}$ with $d$ being given by the absolute value. A less trivial example is the following:

Example 1.60. Let $X$ be the vector space over $\mathbb{K}$ of sequences in $\mathbb{K}$ that are finally constant and equal to $0$. Thus, the sequence $z = (z_n)_{n \in \mathbb{N}}$, $z_n \in \mathbb{K}$ for each $n \in \mathbb{N}$, is in $X$ if, and only if, there exists $N \in \mathbb{N}$ such that $z_n = 0$ for each $n \ge N$. Clearly, $X$ endowed with the norm $\|\cdot\|_{\sup}$ is a subspace of the normed vector space $B(S, \mathbb{K})$ of Example 1.15 with $S := \mathbb{N}$. Defining, for each $n, k \in \mathbb{N}$,

$$z_n^k := \begin{cases} 1/n & \text{for } 1 \le n \le k, \\ 0 & \text{for } n > k, \end{cases} \tag{1.40}$$

one sees that $(z^k)_{k \in \mathbb{N}}$ is a Cauchy sequence in $X$ (i.e. with respect to $\|\cdot\|_{\sup}$), but it is not convergent in $X$ (its limit, the sequence $(1/n)_{n \in \mathbb{N}}$, is not finally constant and, thus, not in $X$).
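To see why the sequence of Example 1.60 is Cauchy, one can compute the sup-norm distances exactly. The sketch below uses exact rational arithmetic; the function names `z` and `sup_dist` are illustrative only and not part of the notes. For $k < l$, the entries of $z^k$ and $z^l$ differ precisely at the positions $k < n \le l$, so the distance should be $1/(k+1)$.

```python
from fractions import Fraction

def z(k, length):
    """First `length` entries of z^k from (1.40): 1/n for n <= k, else 0."""
    return [Fraction(1, n) if n <= k else Fraction(0) for n in range(1, length + 1)]

def sup_dist(k, l):
    """Sup-norm distance between z^k and z^l (all entries past max(k, l) vanish)."""
    length = max(k, l)
    return max(abs(a - b) for a, b in zip(z(k, length), z(l, length)))
```

In particular, for $k, l > N$ the distance stays below $1/N$, which is the Cauchy condition of Def. 1.58.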

Definition 1.61. A metric space $(X, d)$ and its metric $d$ are both called complete if, and only if, every Cauchy sequence in $X$ converges. A normed space is called a Banach space if, and only if, the metric induced by the norm is complete. In that case, one also says that the normed space and the norm itself are complete. We call the complete metric space $(Y, d_Y)$ a completion of $(X, d)$ if, and only if, there exists an injective map $\varphi : X \to Y$ that is isometric, i.e.

$$\forall_{x_1, x_2 \in X} \quad d_Y\big(\varphi(x_1), \varphi(x_2)\big) = d(x_1, x_2),$$

and $\varphi(X)$ is dense in $Y$ (according to Th. 2.35 below, a completion always exists and, in a certain sense, it is even unique).

—


Prop. 1.59(b) means that Kn is a Banach space.

1.4.3 Inner Products and Hilbert Space

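Definition 1.62. Let $X$ be a vector space over $\mathbb{K}$. A function $\langle \cdot, \cdot \rangle : X \times X \to \mathbb{K}$ is called an inner product or a scalar product on $X$ if, and only if, the following three conditions are satisfied:

(i) $\langle x, x \rangle \in \mathbb{R}^+$ for each $0 \neq x \in X$.

(ii) $\langle \lambda x + \mu y, z \rangle = \lambda \langle x, z \rangle + \mu \langle y, z \rangle$ for each $x, y, z \in X$ and each $\lambda, \mu \in \mathbb{K}$ (i.e. an inner product is $\mathbb{K}$-linear in its first argument).

(iii) $\langle x, y \rangle = \overline{\langle y, x \rangle}$ for each $x, y \in X$ (i.e. an inner product is conjugate-symmetric, even symmetric for $\mathbb{K} = \mathbb{R}$).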

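Lemma 1.63. For each inner product $\langle \cdot, \cdot \rangle$ on a vector space $X$ over $\mathbb{K}$, the following formulas are valid:

(a) $\langle x, \lambda y + \mu z \rangle = \overline{\lambda} \langle x, y \rangle + \overline{\mu} \langle x, z \rangle$ for each $x, y, z \in X$ and each $\lambda, \mu \in \mathbb{K}$, i.e. $\langle \cdot, \cdot \rangle$ is conjugate-linear (also called antilinear) in its second argument, even linear for $\mathbb{K} = \mathbb{R}$. Together with Def. 1.62(ii), this means that $\langle \cdot, \cdot \rangle$ is a sesquilinear form, even a bilinear form for $\mathbb{K} = \mathbb{R}$.

(b) $\langle 0, x \rangle = \langle x, 0 \rangle = 0$ for each $x \in X$.

Proof. (a): One computes, for each $x, y, z \in X$ and each $\lambda, \mu \in \mathbb{K}$,

$$\langle x, \lambda y + \mu z \rangle \overset{\text{Def. 1.62(iii)}}{=} \overline{\langle \lambda y + \mu z, x \rangle} \overset{\text{Def. 1.62(ii)}}{=} \overline{\lambda \langle y, x \rangle + \mu \langle z, x \rangle} = \overline{\lambda}\, \overline{\langle y, x \rangle} + \overline{\mu}\, \overline{\langle z, x \rangle} \overset{\text{Def. 1.62(iii)}}{=} \overline{\lambda} \langle x, y \rangle + \overline{\mu} \langle x, z \rangle.$$

(b): One computes, for each $x \in X$,

$$\langle x, 0 \rangle \overset{\text{Def. 1.62(iii)}}{=} \overline{\langle 0, x \rangle} = \overline{\langle 0x, x \rangle} \overset{\text{Def. 1.62(ii)}}{=} \overline{0 \langle x, x \rangle} = 0,$$

thereby completing the proof of the lemma. ∎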

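Theorem 1.64. The following Cauchy-Schwarz inequality (1.41) holds for each inner product $\langle \cdot, \cdot \rangle$ on a vector space $X$ over $\mathbb{K}$:

$$|\langle x, y \rangle| \le \|x\| \, \|y\| \quad \text{for each } x, y \in X, \tag{1.41}$$

where

$$\|x\| := \sqrt{\langle x, x \rangle}, \quad \|y\| := \sqrt{\langle y, y \rangle}. \tag{1.42}$$

Moreover, equality in (1.41) holds if, and only if, $x$ and $y$ are linearly dependent, i.e. if, and only if, $y = 0$ or there exists $\lambda \in \mathbb{K}$ such that $x = \lambda y$.

Proof. If $y = 0$, then it is immediate that both sides of (1.41) vanish. If $x = \lambda y$ with $\lambda \in \mathbb{K}$, then $|\langle x, y \rangle| = |\lambda \langle y, y \rangle| = |\lambda| \|y\|^2 = \sqrt{\lambda \overline{\lambda} \langle y, y \rangle} \, \|y\| = \|x\| \, \|y\|$, showing that (1.41) holds with equality. If $x$ and $y$ are linearly independent, then $y \neq 0$ and $x - \lambda y \neq 0$ for each $\lambda \in \mathbb{K}$, i.e.

$$0 < \langle x - \lambda y, x - \lambda y \rangle = \langle x, x - \lambda y \rangle - \lambda \langle y, x - \lambda y \rangle = \langle x, x \rangle - \overline{\lambda} \langle x, y \rangle - \lambda \langle y, x \rangle + \lambda \overline{\lambda} \langle y, y \rangle = \|x\|^2 - \overline{\lambda} \langle x, y \rangle - \lambda \overline{\langle x, y \rangle} + |\lambda|^2 \|y\|^2. \tag{1.43}$$

Since (1.43) is valid for each $\lambda \in \mathbb{K}$, one can set $\lambda := \langle x, y \rangle / \|y\|^2$ (using $y \neq 0$) to get

$$0 < \|x\|^2 - \frac{2 \langle x, y \rangle \overline{\langle x, y \rangle}}{\|y\|^2} + \frac{\langle x, y \rangle \overline{\langle x, y \rangle}}{\|y\|^2} = \frac{\|x\|^2 \|y\|^2 - \langle x, y \rangle \overline{\langle x, y \rangle}}{\|y\|^2}, \tag{1.44}$$

or $\langle x, y \rangle \overline{\langle x, y \rangle} < \|x\|^2 \|y\|^2$. Finally, taking the square root on both sides shows that (1.41) holds with strict inequality. ∎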
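As a numerical illustration of Th. 1.64, the following sketch evaluates the gap between the two sides of (1.41) for vectors in $\mathbb{C}^n$, using the inner product $\sum_j z_j \overline{w_j}$ (linear in the first argument, as in Def. 1.62). The names `inner`, `norm` and `cs_gap` are illustrative and not part of the notes; floating-point rounding means the gap is only nonnegative up to a small tolerance.

```python
import math
import random

def inner(z, w):
    """Inner product on K^n: linear in the first, conjugate-linear in the second argument."""
    return sum(zj * complex(wj).conjugate() for zj, wj in zip(z, w))

def norm(z):
    """Norm induced by `inner`, as in (1.42)."""
    return math.sqrt(inner(z, z).real)

def cs_gap(x, y):
    """Right-hand side minus left-hand side of (1.41); nonnegative up to rounding."""
    return norm(x) * norm(y) - abs(inner(x, y))
```

For linearly dependent vectors, e.g. `x = [lam * yj for yj in y]`, the gap should vanish up to rounding, in line with the equality statement of the theorem.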

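Proposition 1.65. If $X$ is a vector space over $\mathbb{K}$ with an inner product $\langle \cdot, \cdot \rangle$, then the map

$$\|\cdot\| : X \to \mathbb{R}_0^+, \quad \|x\| := \sqrt{\langle x, x \rangle}, \tag{1.45}$$

defines a norm on $X$. One calls this the norm induced by the inner product.

Proof. If $x = 0$, then $\langle x, x \rangle = 0$ and $\|x\| = 0$ as well. Conversely, if $x \neq 0$, then $\langle x, x \rangle > 0$ and $\|x\| > 0$ as well, showing that $\|\cdot\|$ is positive definite. For $\lambda \in \mathbb{K}$ and $x \in X$, one has $\|\lambda x\| = \sqrt{\lambda \overline{\lambda} \langle x, x \rangle} = \sqrt{|\lambda|^2 \langle x, x \rangle} = |\lambda| \|x\|$, showing that $\|\cdot\|$ is homogeneous of degree 1. Finally, if $x, y \in X$, then

$$\|x + y\|^2 = \langle x + y, x + y \rangle = \|x\|^2 + \langle x, y \rangle + \langle y, x \rangle + \|y\|^2 \overset{(1.41)}{\le} \|x\|^2 + 2 \|x\| \, \|y\| + \|y\|^2 = \big( \|x\| + \|y\| \big)^2,$$

establishing that $\|\cdot\|$ satisfies the triangle inequality. In conclusion, we have shown that $\|\cdot\|$ constitutes a norm on $X$. ∎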

Definition 1.66. Let $X$ be a vector space over $\mathbb{K}$. If $\langle \cdot, \cdot \rangle$ is an inner product on $X$, then $(X, \langle \cdot, \cdot \rangle)$ is called an inner product space or a pre-Hilbert space. An inner product space is called a Hilbert space if, and only if, $(X, \|\cdot\|)$ is a Banach space, where $\|\cdot\|$ is the induced norm, i.e. $\|x\| := \sqrt{\langle x, x \rangle}$. Frequently, the inner product on $X$ is understood and $X$ itself is referred to as an inner product space or Hilbert space.

Example 1.67. On the space $\mathbb{K}^n$, $n \in \mathbb{N}$, we define an inner product by letting, for each $z = (z_1, \dots, z_n) \in \mathbb{K}^n$, $w = (w_1, \dots, w_n) \in \mathbb{K}^n$:

$$z \cdot w := \sum_{j=1}^{n} z_j \overline{w_j} \tag{1.46}$$

(called the Euclidean inner product for $\mathbb{K} = \mathbb{R}$). Let us verify that (1.46), indeed, defines an inner product in the sense of Def. 1.62: If $z \neq 0$, then there is $j_0 \in \{1, \dots, n\}$ such that $z_{j_0} \neq 0$. Thus, $z \cdot z = \sum_{j=1}^{n} |z_j|^2 \ge |z_{j_0}|^2 > 0$, i.e. Def. 1.62(i) is satisfied. Next, let $z, w, u \in \mathbb{K}^n$ and $\lambda, \mu \in \mathbb{K}$. One computes

$$(\lambda z + \mu w) \cdot u = \sum_{j=1}^{n} (\lambda z_j + \mu w_j) \overline{u_j} = \sum_{j=1}^{n} \lambda z_j \overline{u_j} + \sum_{j=1}^{n} \mu w_j \overline{u_j} = \lambda (z \cdot u) + \mu (w \cdot u), \tag{1.47a}$$

i.e. Def. 1.62(ii) is satisfied. For Def. 1.62(iii), merely note that

$$z \cdot w = \sum_{j=1}^{n} z_j \overline{w_j} = \overline{\sum_{j=1}^{n} w_j \overline{z_j}} = \overline{w \cdot z}. \tag{1.47b}$$

Hence, we have shown that (1.46) defines an inner product according to Def. 1.62. Since the 2-norm of Def. 1.11 is the same as the norm induced by the inner product of (1.46), this also proves that the 2-norm satisfies the Cauchy-Schwarz inequality (1.41). Due to Prop. 1.59(b), the 2-norm is complete, i.e. $\mathbb{K}^n$ with the inner product of (1.46) is a Hilbert space.

Definition 1.68. If $(X, \langle \cdot, \cdot \rangle)$ is an inner product space, then $x, y \in X$ are called orthogonal or perpendicular (denoted $x \perp y$) if, and only if, $\langle x, y \rangle = 0$. A unit vector is $x \in X$ such that $\|x\| = 1$, where $\|\cdot\|$ is the induced norm. An orthogonal system is a family $(x_i)_{i \in I}$, $x_i \in X$, $I$ being some index set, such that $\langle x_i, x_j \rangle = 0$ for each $i, j \in I$ with $i \neq j$. An orthogonal system is called an orthonormal system if, and only if, it consists entirely of unit vectors.

Remark 1.69. If $(X, \langle \cdot, \cdot \rangle)$ is an inner product space, then one has Pythagoras' theorem, namely that for each $x, y \in X$ with $x \perp y$:

$$\|x + y\|^2 = \|x\|^2 + \langle x, y \rangle + \langle y, x \rangle + \|y\|^2 = \|x\|^2 + \|y\|^2. \tag{1.48}$$

1.4.4 Equivalence of Metrics and Equivalence of Norms

The $p$-norms of Def. 1.11 provide an uncountable number of different norms on $\mathbb{K}^n$. It is an important result that they all generate the same open sets (i.e. the same topology, the so-called norm topology) on $\mathbb{K}^n$; one says that all norms on $\mathbb{K}^n$ are equivalent. Before we prove this result in Th. 1.72 below, we introduce the notion of equivalence for metrics and norms. We will also see that, even though all norms on $\mathbb{K}^n$ are equivalent, norms on other normed vector spaces are not necessarily equivalent (see Example 1.75 below).

Definition 1.70. (a) Let $d_1$ and $d_2$ be metrics on a set $X$. Then $d_1$ and $d_2$ are said to be equivalent if, and only if, the topologies $\mathcal{T}_1$ and $\mathcal{T}_2$ on $X$, induced by $d_1$ and $d_2$, respectively, are identical, i.e. if, and only if, for each $A \subseteq X$, the following holds:

$$A \text{ is } d_1\text{-open} \iff A \text{ is } d_2\text{-open}.$$

(b) Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be norms on a vector space $X$ over $\mathbb{K}$. Then $\|\cdot\|_1$ and $\|\cdot\|_2$ are said to be equivalent if, and only if, there exist positive constants $\alpha, \beta \in \mathbb{R}^+$ such that

$$\alpha \|x\|_1 \le \|x\|_2 \le \beta \|x\|_1 \quad \text{for each } x \in X. \tag{1.49}$$

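Proposition 1.71. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be norms on a vector space $X$ over $\mathbb{K}$, and let $d_1$ and $d_2$ be the respective induced metrics on $X$. Then $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent norms if, and only if, $d_1$ and $d_2$ are equivalent metrics.

Proof. If $X = \{0\}$, then there is nothing to show. Thus, assume that there exists some $v \in X \setminus \{0\}$. Let $\mathcal{T}_1, \mathcal{T}_2$ denote the topologies induced by $d_1, d_2$, respectively. First, assume (1.49) holds, i.e. the norms are equivalent. If $A \in \mathcal{T}_1$ and $x \in A$, then there exists $\epsilon > 0$ such that $B_{\epsilon,d_1}(x) \subseteq A$. Thus, for each $y \in B_{\delta,d_2}(x)$ with $\delta := \alpha \epsilon$, one obtains

$$d_1(x, y) \le \frac{1}{\alpha} \|x - y\|_2 < \frac{\delta}{\alpha} = \epsilon,$$

showing $B_{\delta,d_2}(x) \subseteq B_{\epsilon,d_1}(x) \subseteq A$ and that $A \in \mathcal{T}_2$. Now assume $A \in \mathcal{T}_2$. If $x \in A$, then there exists $\epsilon > 0$ such that $B_{\epsilon,d_2}(x) \subseteq A$. Then, for each $y \in B_{\delta,d_1}(x)$ with $\delta := \epsilon / \beta$, it holds that

$$d_2(x, y) \le \beta \|x - y\|_1 < \beta \delta = \epsilon,$$

showing $B_{\delta,d_1}(x) \subseteq B_{\epsilon,d_2}(x) \subseteq A$. Hence, $A \in \mathcal{T}_1$. So far, we have proved that the validity of (1.49) implies $\mathcal{T}_1 = \mathcal{T}_2$ (i.e. $d_1$ and $d_2$ are equivalent).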

Conversely, assume that the induced metrics $d_1$ and $d_2$ are equivalent. According to Def. 1.70(a), $0 \in X$ has to be a $d_1$-interior point of both the open $d_1$-ball $B_{1,d_1}(0)$ and the open $d_2$-ball $B_{1,d_2}(0)$. Moreover, $0$ also has to be a $d_2$-interior point of both open balls. We claim that the set $M := \{\|x\|_2 : \|x\|_1 = 1\} \subseteq \mathbb{R}_0^+$ is bounded. Proceeding by contraposition, assume that $M$ is unbounded (from above, as it is always bounded from below by $0$). Then there exists a sequence $(x^k)_{k \in \mathbb{N}}$ in $X$ such that $\|x^k\|_1 = 1$ for each $k \in \mathbb{N}$ and $\lim_{k \to \infty} \|x^k\|_2 = \infty$. Define $\eta_k := \|x^k\|_2$ and $y^k := x^k / \eta_k$ (note $\eta_k \neq 0$, since $\|x^k\|_1 = 1$). Then $\|y^k\|_2 = 1$ for each $k \in \mathbb{N}$. Moreover, $\|y^k\|_1 = 1/\eta_k$, showing $\lim_{k \to \infty} d_1(0, y^k) = \lim_{k \to \infty} \|y^k\|_1 = 0$. Thus, for each $\epsilon > 0$, $B_{\epsilon,d_1}(0)$ contains elements $y^k$ with $\|y^k\|_2 = 1$, i.e. $0$ is not a $d_1$-interior point of $B_{1,d_2}(0)$. Thus, if $0$ is a $d_1$-interior point of $B_{1,d_2}(0)$, then $M$ must be bounded. Letting

$$\beta := \sup\{\|x\|_2 : \|x\|_1 = 1\} \in \mathbb{R}^+$$

(indeed, $\beta > 0$, as $\|x\|_1 = 1$ implies $x \neq 0$ and $\|x\|_2 > 0$), one has

$$\forall_{x \in X \setminus \{0\}} \quad \|x\|_2 = \left\| \|x\|_1 \frac{x}{\|x\|_1} \right\|_2 \le \beta \|x\|_1.$$

We have therefore found a constant $\beta > 0$ such that the corresponding part of (1.49) is satisfied. One can now proceed completely analogously to show that the hypothesis of $0$ being a $d_2$-interior point of $B_{1,d_1}(0)$ implies that the set $\{\|x\|_1 : \|x\|_2 = 1\}$ is bounded and

$$\gamma := \sup\{\|x\|_1 : \|x\|_2 = 1\} \in \mathbb{R}^+$$

satisfies $\|x\|_1 \le \gamma \|x\|_2$ for each $x \in X$. Finally, letting $\alpha := \gamma^{-1}$ completes the proof of the equivalence of $\|\cdot\|_1$ and $\|\cdot\|_2$. ∎

Theorem 1.72. All norms on Kn, n ∈ N, are equivalent.

Proof. It suffices to show that every norm on $\mathbb{K}^n$ is equivalent to the 2-norm on $\mathbb{K}^n$. So let $\|\cdot\|_2$ denote the 2-norm on $\mathbb{K}^n$ and let $\|\cdot\|$ denote an arbitrary norm on $\mathbb{K}^n$. Clearly, every $z \in \mathbb{K}^n$ can be written as $z = \sum_{j=1}^{n} z_j e_j$, where

$$e_1 := (1, 0, \dots, 0), \quad e_2 := (0, 1, \dots, 0), \quad \dots, \quad e_n := (0, \dots, 0, 1)$$

are the standard unit vectors of $\mathbb{K}^n$. Moreover, the 2-norm satisfies the Cauchy-Schwarz inequality (1.41) (cf. Ex. 1.67), which can be exploited to get

$$\|z\| = \Big\| \sum_{j=1}^{n} z_j e_j \Big\| \le \sum_{j=1}^{n} |z_j| \, \|e_j\| = \big( |z_1|, \dots, |z_n| \big) \cdot \big( \|e_1\|, \dots, \|e_n\| \big) \overset{(1.41)}{\le} \|z\|_2 \, \big\| \big( \|e_1\|, \dots, \|e_n\| \big) \big\|_2, \tag{1.50}$$

that means, with $\beta := \sqrt{\sum_{j=1}^{n} \|e_j\|^2} > 0$,

$$\|z\| \le \beta \|z\|_2 \quad \text{for each } z \in \mathbb{K}^n. \tag{1.51}$$

We claim that there is also $\alpha > 0$ such that

$$\alpha \|z\|_2 \le \|z\| \quad \text{for each } z \in \mathbb{K}^n. \tag{1.52}$$

Seeking a contradiction, assume that there is no $\alpha > 0$ satisfying (1.52). Then there is a sequence $(z^k)_{k \in \mathbb{N}}$ in $\mathbb{K}^n$ such that, for each $k \in \mathbb{N}$, $\frac{1}{k} \|z^k\|_2 > \|z^k\|$. Letting $w^k := z^k / \|z^k\|_2$, one gets $\frac{1}{k} \|w^k\|_2 > \|w^k\|$ and $\|w^k\|_2 = 1$ for each $k \in \mathbb{N}$. The Bolzano-Weierstrass Th. 1.31 yields a subsequence $(u^k)_{k \in \mathbb{N}}$ of $(w^k)_{k \in \mathbb{N}}$ that converges with respect to $\|\cdot\|_2$ to some $u \in \mathbb{K}^n$. We use the inverse triangle inequality to obtain

$$\big| \|u\|_2 - 1 \big| = \big| \|u\|_2 - \|u^k\|_2 \big| \overset{(1.38b)}{\le} \|u - u^k\|_2 \to 0 \quad \text{for } k \to \infty,$$

implying $\|u\|_2 = 1$, and, in particular, $u \neq 0$. On the other hand, using (1.51), one has

$$\big| \|u\| - \|u^k\| \big| \overset{(1.38b)}{\le} \|u - u^k\| \overset{(1.51)}{\le} \beta \|u^k - u\|_2 \to 0 \quad \text{for } k \to \infty,$$

implying $\|u\| = \lim_{k \to \infty} \|u^k\| \le \lim_{k \to \infty} \frac{1}{k} \|u^k\|_2 = \lim_{k \to \infty} \frac{1}{k} = 0$, i.e. $u = 0$, in contradiction to $u \neq 0$. Thus, the assumption that there is no $\alpha > 0$ satisfying (1.52) must have been wrong, i.e. (1.52) must hold for some $\alpha > 0$. The proof is concluded by the observation that (1.51) together with (1.52) is precisely the statement that $\|\cdot\|_2$ and $\|\cdot\|$ are equivalent. ∎
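The constant $\beta$ of (1.51) is explicit, so the upper estimate can be illustrated numerically. The sketch below works over $\mathbb{R}$ and takes the 1-norm and the max-norm as sample choices for the arbitrary norm $\|\cdot\|$; all function names are illustrative and not part of the notes. The constant $\alpha$ of (1.52) is obtained nonconstructively in the proof and is therefore not computed here.

```python
import math
import random

def norm2(z):
    """The 2-norm on R^n."""
    return math.sqrt(sum(abs(zj) ** 2 for zj in z))

def norm1(z):
    """The 1-norm, used as a sample 'arbitrary' norm."""
    return sum(abs(zj) for zj in z)

def norm_max(z):
    """The max-norm, used as a second sample norm."""
    return max(abs(zj) for zj in z)

def beta(norm, n):
    """Constant from (1.51): the 2-norm of (||e_1||, ..., ||e_n||)."""
    units = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    return math.sqrt(sum(norm(e) ** 2 for e in units))

def worst_ratio(norm, n, trials=1000, seed=0):
    """Largest observed ||z|| / ||z||_2 over random samples; (1.51) bounds it by beta."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        z = [rng.uniform(-1, 1) for _ in range(n)]
        best = max(best, norm(z) / norm2(z))
    return best
```

For the 1-norm on $\mathbb{R}^n$, each $\|e_j\|_1 = 1$, so $\beta = \sqrt{n}$, which recovers the familiar estimate $\|z\|_1 \le \sqrt{n}\, \|z\|_2$.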

Definition 1.73. According to Th. 1.72, there exists a unique topology on $\mathbb{K}^n$, $n \in \mathbb{N}$, that is induced by the norms on $\mathbb{K}^n$. We call this topology the norm topology on $\mathbb{K}^n$.

Caveat 1.74. Even though it follows from Th. 1.72 and Prop. 1.71 that all metrics on $\mathbb{K}^n$ induced by norms on $\mathbb{K}^n$ are equivalent, there exist nonequivalent metrics on $\mathbb{K}^n$ (examples?).

—

The following Ex. 1.75 shows that, in general, there can be norms on a vector space $X$ that are not equivalent.

Example 1.75. As in Ex. 1.60 before, let $X$ be the vector space over $\mathbb{K}$ consisting of the sequences in $\mathbb{K}$ that are finally constant and equal to zero. Then

$$\big\| (z_n)_{n \in \mathbb{N}} \big\|_1 := \sum_{n=1}^{\infty} |z_n| \quad \text{and} \tag{1.53a}$$

$$\big\| (z_n)_{n \in \mathbb{N}} \big\|_{\sup} := \max\{ |z_n| : n \in \mathbb{N} \} \tag{1.53b}$$

define norms on $X$ ($\|\cdot\|_{\sup}$ is the same norm that was considered in the earlier example). Clearly, the sequence $(z^k)_{k \in \mathbb{N}}$ in $X$ defined by

$$z_n^k := \begin{cases} 1/k & \text{for } 1 \le n \le k, \\ 0 & \text{for } n > k, \end{cases} \tag{1.54}$$

converges to $(0, 0, \dots) \in X$ with respect to $\|\cdot\|_{\sup}$; however, the sequence does not converge in $X$ with respect to $\|\cdot\|_1$ (exercise), proving $\|\cdot\|_1$ and $\|\cdot\|_{\sup}$ are not equivalent.
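The two norms of the sequence from (1.54) can be computed exactly, which hints at why the norms cannot be equivalent. This is only a sketch with illustrative names (`z`, `norm_sup`, `norm_1`); it represents $z^k$ by its $k$ nonzero entries and does not replace the exercise, since it only compares $z^k$ with the zero sequence.

```python
from fractions import Fraction

def z(k):
    """Nonzero part of z^k from (1.54): k entries equal to 1/k (all later entries are 0)."""
    return [Fraction(1, k)] * k

def norm_sup(seq):
    """Sup-norm (1.53b) of a finitely supported sequence given by its nonzero part."""
    return max(abs(t) for t in seq)

def norm_1(seq):
    """1-norm (1.53a) of a finitely supported sequence given by its nonzero part."""
    return sum(abs(t) for t in seq)
```

One finds $\|z^k\|_{\sup} = 1/k \to 0$, whereas $\|z^k\|_1 = 1$ for every $k$, so no constant $\beta$ with $\|x\|_1 \le \beta \|x\|_{\sup}$ for each $x \in X$ can exist.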

—

We remarked before that boundedness, which is a useful concept in metric spaces, is not a topological concept. To emphasize this further, we show in the following Prop. 1.76 that every metric is equivalent to a bounded metric.

Proposition 1.76. Let $(X, d)$ be a metric space. Then

$$d_1 : X \times X \to \mathbb{R}_0^+, \quad d_1(x, y) := \min\{1, d(x, y)\}, \tag{1.55}$$

defines a metric on $X$ that is equivalent to $d$ (in particular, every metric is equivalent to a bounded metric).

Proof. We verify that $d_1$ is a metric. Let $x, y, z \in X$. We have

$$d_1(x, y) = 0 \iff d(x, y) = 0 \iff x = y,$$

showing $d_1$ to be positive definite. The symmetry of $d$ immediately implies the symmetry of $d_1$. For the triangle inequality, we estimate

$$d_1(x, z) = \min\{1, d(x, z)\} \le L := \min\{1, d(x, y) + d(y, z)\} \overset{(*)}{\le} d_1(x, y) + d_1(y, z) = \min\{1, d(x, y)\} + \min\{1, d(y, z)\} =: R, \tag{1.56}$$


where we still need to prove the inequality at $(*)$. If $d(x, y) \le 1$ and $d(y, z) \le 1$, then $R = d(x, y) + d(y, z)$ and $L \le R$ is clear. If $d(x, y) > 1$ or $d(y, z) > 1$, then $L = 1 \le R$, finishing the proof of $(*)$. Thus, $d_1$ is a metric, and it remains to show that $d$ and $d_1$ are equivalent. Let $O \subseteq X$. If $O$ is $d$-open and $x \in O$, then there is $r > 0$ such that $B_{r,d}(x) \subseteq O$. Let $\epsilon := \min\{r, 1\}$. Then $B_{\epsilon,d_1}(x) = B_{\epsilon,d}(x) \subseteq B_{r,d}(x) \subseteq O$, showing $O$ to be $d_1$-open. Similarly, if $O$ is $d_1$-open and $x \in O$, then there is $1 \ge r > 0$ such that $B_{r,d_1}(x) \subseteq O$. Since $r \le 1$, $B_{r,d}(x) = B_{r,d_1}(x) \subseteq O$, showing $O$ to be $d$-open. ∎
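The truncation in (1.55) is easy to experiment with. The following sketch takes $X = \mathbb{R}$ with $d(x, y) = |x - y|$ as an illustrative choice of metric space (an assumption made only for this example) and checks the triangle inequality for $d_1$ on a finite sample; the names `d`, `d1` and `triangle_holds` are not part of the notes.

```python
import itertools
import random

def d(x, y):
    """Usual metric on R (an illustrative choice for the metric d)."""
    return abs(x - y)

def d1(x, y):
    """Truncated metric from (1.55)."""
    return min(1.0, d(x, y))

def triangle_holds(points):
    """Check d1(x, z) <= d1(x, y) + d1(y, z) for all triples taken from `points`."""
    return all(d1(x, z) <= d1(x, y) + d1(y, z) + 1e-12
               for x, y, z in itertools.product(points, repeat=3))
```

Note that $d_1$ never exceeds $1$, while it agrees with $d$ whenever $d(x, y) \le 1$; this agreement on small distances is what the equivalence argument in the proof uses.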

2 Limits and Continuity of Functions

2.1 Definitions and Properties

In the following Definitions 2.1(a) and 2.3, we will generalize the notion of continuity [Phi16, Def. 7.31] to topological spaces, and the notions of uniform continuity [Phi16, (10.39)] and Lipschitz continuity [Phi16, Def. and Rem. 10.17] to metric spaces. Even though it is less obvious, we will also see that the limit definition of Def. 2.1(b) is a generalization of [Phi16, Def. 8.17].

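Definition 2.1. Let $(X, \mathcal{T}_X)$ and $(Y, \mathcal{T}_Y)$ be topological spaces, $M \subseteq X$, $f : M \to Y$.

(a) $f$ is called continuous in $\xi \in M$ if, and only if,

$$\forall_{U \in \mathcal{U}(f(\xi))} \ \exists_{V \in \mathcal{U}(\xi)} \quad f(V \cap M) \subseteq U. \tag{2.1}$$

(b) If $\xi \in X$ is a cluster point of $M$, then $f$ is said to tend to $\eta \in Y$ (or to have the limit $\eta \in Y$) for $x \to \xi$ (denoted by $\lim_{x \to \xi} f(x) = \eta$) if, and only if,

$$\forall_{U \in \mathcal{U}(\eta)} \ \exists_{V \in \mathcal{U}(\xi)} \quad f\big( (V \cap M) \setminus \{\xi\} \big) \subseteq U. \tag{2.2}$$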

Remark 2.2. (a) The reason that $\xi$ is removed from $V$ in (2.2) is that one wants to allow the situation $f(\xi) \neq \lim_{x \to \xi} f(x)$, i.e. the value of a function in $\xi$ is allowed to differ from the function's limit for $x \to \xi$. Thus, for a cluster point $\xi$ of $M$ with $\xi \in M$, one of three distinct cases will always occur: (i) $\lim_{x \to \xi} f(x)$ does not exist, (ii) $f(\xi) \neq \lim_{x \to \xi} f(x)$, (iii) $f(\xi) = \lim_{x \to \xi} f(x)$.

(b) If $\xi \in M$ is a cluster point of $M$, then it is an immediate consequence of (2.1) and (2.2) that $f$ is continuous in $\xi$ if, and only if, $f(\xi) = \lim_{x \to \xi} f(x)$.

(c) According to Def. 2.1(a), $f : M \to Y$ is continuous in $\xi \in M$ with respect to the topology $\mathcal{T}_X$ on $X$ if, and only if, $f$ is continuous in $\xi$ with respect to the topology $\mathcal{T}_M$ on $M$, where $\mathcal{T}_M$ is the relative topology on $M$ induced by $\mathcal{T}_X$ (and analogously for Def. 2.1(b)).

(d) Recall that, if the topology $\mathcal{T}$ is induced by the metric $d$, then $U \in \mathcal{U}(x)$ if, and only if, there exists $\epsilon > 0$ with $x \in B_\epsilon(x) \subseteq U$. Thus, if $\mathcal{T}_X$ and $\mathcal{T}_Y$ are induced by metrics $d_X$ on $X$ and $d_Y$ on $Y$, respectively, then (2.1) is equivalent to

$$\forall_{\epsilon \in \mathbb{R}^+} \ \exists_{\delta \in \mathbb{R}^+} \quad f\big( B_\delta(\xi) \cap M \big) \subseteq B_\epsilon\big( f(\xi) \big), \tag{2.3a}$$

and (2.2) is equivalent to

$$\forall_{\epsilon \in \mathbb{R}^+} \ \exists_{\delta \in \mathbb{R}^+} \quad f\big( (B_\delta(\xi) \cap M) \setminus \{\xi\} \big) \subseteq B_\epsilon(\eta). \tag{2.3b}$$

Definition 2.3. Let $(X, \mathcal{T}_X)$ and $(Y, \mathcal{T}_Y)$ be topological spaces, $M \subseteq X$, $f : M \to Y$.

(a) $f$ is called continuous in $M$ if, and only if, $f$ is continuous in each $\xi \in M$. The set of all continuous functions from $M$ into $Y$ is denoted by $C(M, Y)$.

(b) Suppose $\mathcal{T}_X$ and $\mathcal{T}_Y$ are induced by metrics $d_X$ on $X$ and $d_Y$ on $Y$, respectively. Then $f$ is called uniformly continuous in $M$ if, and only if, for each $\epsilon > 0$, there is $\delta > 0$ such that:

$$\forall_{x, y \in M} \quad d_X(x, y) < \delta \ \Rightarrow \ d_Y\big( f(x), f(y) \big) < \epsilon \tag{2.4}$$

(note that, here, $\delta$ must not depend on $x$ and $y$). Moreover, $f$ is called Lipschitz continuous in $M$ with Lipschitz constant $L \in \mathbb{R}_0^+$ if, and only if,

$$\forall_{x, y \in M} \quad d_Y\big( f(x), f(y) \big) \le L \, d_X(x, y). \tag{2.5}$$

—

Of course, if continuity is used in a metric space, it is meant with respect to the induced topology; if uniform continuity or Lipschitz continuity are used in a normed space, then they are meant with respect to the induced metric.

Lemma 2.4. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, $M \subseteq X$, $f : M \to Y$. If $f$ is Lipschitz continuous in $M$, then $f$ is uniformly continuous in $M$. If $f$ is uniformly continuous in $M$, then $f$ is continuous in $M$.

Proof. If $f$ is Lipschitz continuous, then there is $L \in \mathbb{R}_0^+$ such that $d_Y\big( f(x), f(y) \big) \le L \, d_X(x, y)$ for each $x, y \in M$. Thus, given $\epsilon > 0$, choose $\delta := \epsilon$ for $L = 0$ and $\delta := \epsilon / L$ for $L > 0$. Let $x, y \in M$ such that $d_X(x, y) < \delta$. If $L = 0$, then $d_Y\big( f(x), f(y) \big) = 0 < \epsilon$. If $L > 0$, then $d_Y\big( f(x), f(y) \big) \le L \, d_X(x, y) < L \epsilon / L = \epsilon$, showing that $f$ is uniformly continuous. If $f$ is uniformly continuous, then, given $\epsilon > 0$, there is $\delta(\epsilon) > 0$ such that, for each $x, y \in M$ with $d_X(x, y) < \delta(\epsilon)$, one has $d_Y\big( f(x), f(y) \big) < \epsilon$. Fix $x \in M$. Since $f\big( B_{\delta(\epsilon)}(x) \cap M \big) \subseteq B_\epsilon\big( f(x) \big)$, $f$ is continuous in $x$. As $x$ was arbitrary, $f$ is continuous. ∎

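Example 2.5. Consider $X = \mathbb{R}$ with the usual metric given by the absolute value function, $M := \mathbb{R}^+$.

(a) $f : M \to \mathbb{R}$, $f(x) := 1/x$, is continuous, but not uniformly continuous: For each $\xi \in \mathbb{R}^+$ and each $\delta > 0$, one has

$$f(\xi) - f(\xi + \delta) = \frac{1}{\xi} - \frac{1}{\xi + \delta} = \frac{\delta}{\xi (\xi + \delta)}.$$

Thus, for a fixed $\delta > 0$ and $\epsilon > 0$, one has, for each $\xi \in \mathbb{R}^+$ that is chosen smaller than $\delta/2$ and also smaller than $1/(2\epsilon)$,

$$f(\xi) - f(\xi + \delta/2) = \frac{\delta}{2 \xi (\xi + \delta/2)} \overset{\xi < \delta/2}{>} \frac{1}{2 \xi} \overset{\xi < 1/(2\epsilon)}{>} \epsilon,$$

i.e. $x := \xi$ and $y := \xi + \delta/2$ are points such that $|x - y| = \delta/2 < \delta$, but

$$|f(x) - f(y)| = \frac{\delta}{2 \xi (\xi + \delta/2)} > \epsilon,$$

showing $f$ is not uniformly continuous.

(b) $g : M \to \mathbb{R}$, $g(x) := x^2$, is continuous, but not uniformly continuous: For each $\xi \in \mathbb{R}^+$ and each $\delta > 0$, one has

$$g(\xi + \delta) - g(\xi) = (\xi + \delta)^2 - \xi^2 = 2 \xi \delta + \delta^2.$$

Thus, for a fixed $\delta > 0$ and $\epsilon > 0$, one has, for each $\xi \in \mathbb{R}^+$ that is chosen bigger than $\epsilon / \delta$,

$$g(\xi + \delta/2) - g(\xi) = \xi \delta + \delta^2/4 > \xi \delta > \epsilon,$$

i.e. $x := \xi$ and $y := \xi + \delta/2$ are points such that $|x - y| = \delta/2 < \delta$, but

$$|g(x) - g(y)| = \xi \delta + \frac{\delta^2}{4} > \epsilon,$$

showing $g$ is not uniformly continuous.

(c) $h : M \to \mathbb{R}$, $h(x) := \sqrt{x}$, is uniformly continuous, but not Lipschitz continuous: To show that $h$ is uniformly continuous is left as an exercise. If $h$ were Lipschitz continuous, then there would have to be $L \ge 0$ such that

$$\sqrt{\xi + \delta} - \sqrt{\xi} \le L \delta \tag{2.6}$$

for each $\xi \in \mathbb{R}^+$, $\delta > 0$. However, since

$$\frac{\sqrt{\xi + \delta} - \sqrt{\xi}}{\delta} = \frac{\delta}{\delta \big( \sqrt{\xi + \delta} + \sqrt{\xi} \big)} = \frac{1}{\sqrt{\xi + \delta} + \sqrt{\xi}}, \tag{2.7}$$

by choosing $\xi$ and $\delta$ sufficiently small, one can always make the expression in (2.7) larger than any given $L$, showing that $h$ is not Lipschitz continuous.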
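The choices of $\xi$ made in (a) and (b) can be turned into explicit witnesses. The sketch below follows the recipe of the text for $f(x) = 1/x$ and $g(x) = x^2$; the function names are illustrative and not part of the notes, and the specific factors $0.5$ and $2$ are just one admissible way to satisfy the stated inequalities for $\xi$.

```python
def witness_for_reciprocal(delta, eps):
    """Return (x, y) in R+ with |x - y| < delta and |1/x - 1/y| > eps, as in Ex. 2.5(a)."""
    # Choose xi below both delta/2 and 1/(2*eps), as required in the text.
    xi = 0.5 * min(delta / 2, 1 / (2 * eps))
    return xi, xi + delta / 2

def witness_for_square(delta, eps):
    """Return (x, y) in R+ with |x - y| < delta and |x^2 - y^2| > eps, as in Ex. 2.5(b)."""
    # Any xi > eps/delta works; here xi = 2*eps/delta.
    xi = 2 * eps / delta
    return xi, xi + delta / 2
```

For every pair $(\delta, \epsilon)$, such witnesses exist, which is exactly the negation of (2.4): no single $\delta$ works for all points of $M$.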


Example 2.6. (a) According to Lem. 1.57(b), the norm ‖ ·‖ on a normed vector spaceX satisfies the inverse triangle inequality

∣∣‖x‖ − ‖y‖

∣∣ ≤ ‖x− y‖ for each x, y ∈ X,

i.e. the norm is Lipschitz continuous with Lipschitz constant 1.

(b) Let (X, d) be a metric space. According to Lem. 1.57(a), we have

\[ \forall_{x, x', y, y' \in X} \quad |d(x, y) - d(x', y')| \le d(x, x') + d(y, y'). \]

In particular, this yields the Lipschitz continuity of d : X² −→ R₀⁺ (with Lipschitz constant 1) with respect to the metric d₁ on X² defined by

\[ d_1 : X^2 \times X^2 \to \mathbb{R}_0^+, \quad d_1\big((x, y), (x', y')\big) = d(x, x') + d(y, y'). \tag{2.8} \]

Further consequences are the continuity and even uniform continuity of d, and also the continuity of d in both components. If X is nonempty, then, for each ∅ ≠ A, B ⊆ X, we define the distance between A and B by

\[ \operatorname{dist}(A, B) := \inf\{ d(a, b) : a \in A,\ b \in B \} \in [0, \infty[, \tag{2.9} \]

and

\[ \forall_{x \in X} \quad \operatorname{dist}(x, B) := \operatorname{dist}(\{x\}, B), \quad \operatorname{dist}(A, x) := \operatorname{dist}(A, \{x\}). \tag{2.10} \]

It is an exercise to show that, if A ⊆ X and A ≠ ∅, then the functions

\[ \delta, \tilde{\delta} : X \to \mathbb{R}_0^+, \quad \delta(x) := \operatorname{dist}(x, A), \quad \tilde{\delta}(x) := \operatorname{dist}(A, x), \tag{2.11} \]

are both Lipschitz continuous with Lipschitz constant 1 (in particular, they are both continuous and even uniformly continuous).
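The Lipschitz bound for x ↦ dist(x, A) can be checked empirically. The sketch below is an illustration only, assuming X = R² with the Euclidean metric and a finite set A (so that the infimum in (2.9) is a minimum); the helper names are hypothetical. It samples random pairs and records the largest observed ratio |dist(x, A) − dist(y, A)| / d(x, y).

```python
import math
import random

def dist_point_set(x, A):
    """dist(x, A) for a finite nonempty A in R^2: the minimum of d(x, a) over a in A."""
    return min(math.dist(x, a) for a in A)

def max_lipschitz_ratio(A, samples=1000, seed=0):
    """Largest observed |dist(x, A) - dist(y, A)| / d(x, y) over random pairs x, y."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        d_xy = math.dist(x, y)
        if d_xy > 0:
            ratio = abs(dist_point_set(x, A) - dist_point_set(y, A)) / d_xy
            worst = max(worst, ratio)
    return worst

A = [(0.0, 0.0), (1.0, 2.0), (-3.0, 1.0)]
print(max_lipschitz_ratio(A))  # should not exceed 1, up to rounding
```

A sampled ratio never proves the bound, of course; it merely fails to contradict the Lipschitz constant 1 claimed in the exercise.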

Theorem 2.7. Let (X, T_X) and (Y, T_Y) be topological spaces (for example, metric or normed spaces), f : X −→ Y. Let S_Y be a subbase for T_Y. Then the following four statements are equivalent:

(i) f is continuous.

(ii) For each open set O ⊆ Y (i.e. each O ∈ T_Y), the preimage f⁻¹(O) = {x ∈ X : f(x) ∈ O} is open in X (i.e. f⁻¹(O) ∈ T_X).

(iii) For each S ∈ S_Y, the preimage f⁻¹(S) is open in X (i.e. f⁻¹(S) ∈ T_X).

(iv) For each closed set C ⊆ Y, the preimage f⁻¹(C) is closed in X (i.e. X \ f⁻¹(C) ∈ T_X).


Proof. “(i) ⇒ (ii)”: Assume f is continuous and consider O ⊆ Y open. Let ξ ∈ f⁻¹(O) and η := f(ξ). As f is continuous in ξ and O is a neighborhood of η, there is O_ξ ∈ T_X such that ξ ∈ O_ξ and f(O_ξ) ⊆ O, i.e. O_ξ ⊆ f⁻¹(O). Thus,

\[ f^{-1}(O) = \bigcup_{\xi \in f^{-1}(O)} O_\xi \]

is a union of open sets and, hence, open.

“(ii) ⇒ (i)”: Assume that, for each open set O ⊆ Y, f⁻¹(O) is open in X. Let ξ ∈ X and, once again, write η := f(ξ). Consider O ∈ T_Y with η ∈ O. We have to find U ∈ T_X such that ξ ∈ U and f(U) ⊆ O. However, since f⁻¹(O) ∈ T_X, we simply choose U := f⁻¹(O), showing the continuity of f in ξ. As ξ was arbitrary, f is continuous.

Proving “(ii) ⇔ (iii)” and “(ii) ⇔ (iv)” is left as an exercise. ∎

As we have seen in [Phi16, Sec. 7.2.2] in the one-dimensional context, it is often more convenient to use sequences rather than neighborhoods in order to check if functions have limits or are continuous. For functions between metric spaces (in particular, between normed spaces), it is possible to generalize [Phi16, (8.31)] and [Phi16, Th. 7.37] to use sequences in that way; in general topological spaces, one has to use nets rather than sequences.

Theorem 2.8. Let (X, T_X) and (Y, T_Y) be topological spaces, M ⊆ X, f : M −→ Y, ξ ∈ M. Then the following statements (i) and (ii) are equivalent:

(i) f is continuous in ξ.

(ii) For each net (x_i)_{i∈I} in M with lim_{i∈I} x_i = ξ, the net (f(x_i))_{i∈I} in Y converges to f(ξ), i.e.

\[ \lim_{i \in I} x_i = \xi \quad \Rightarrow \quad \lim_{i \in I} f(x_i) = f(\xi). \tag{2.12} \]

If (X, T_X) has a countable local base at ξ (e.g. if (X, T_X) is metrizable), then (i) and (ii) are also equivalent to the following statement:

(iii) For each sequence (x_k)_{k∈N} in M with lim_{k→∞} x_k = ξ, the sequence (f(x_k))_{k∈N} in Y converges to f(ξ), i.e.

\[ \lim_{k \to \infty} x_k = \xi \quad \Rightarrow \quad \lim_{k \to \infty} f(x_k) = f(\xi). \tag{2.13} \]

Proof. “(i) ⇔ (ii)” (the proof is analogous to the proof of [Phi16, Th. 7.37]): If ξ ∈ M is not a cluster point of M, then there is U ∈ U(ξ) such that M ∩ U = {ξ} (i.e. ξ is an isolated point of M). Then every f : M −→ Y is continuous in ξ. On the other hand, every net (x_i)_{i∈I} in M converging to ξ must be eventually equal to ξ, in the sense that

\[ \exists_{i_0 \in I} \ \forall_{i \ge i_0} \quad x_i = \xi. \tag{2.14} \]

Thus, (2.12) is trivially valid for every f : M −→ Y, i.e. the assertion of the theorem holds if ξ ∈ M is not a cluster point of M. Now let ξ ∈ M be a cluster point of M. Assume that f is continuous in ξ and (x_i)_{i∈I} is a net in M with lim_{i∈I} x_i = ξ. As f is continuous in ξ, (2.1) holds, i.e.

\[ \forall_{U \in \mathcal{U}(f(\xi))} \ \exists_{V \in \mathcal{U}(\xi)} \quad f(V \cap M) \subseteq U. \]

Fix U ∈ U(f(ξ)) and a corresponding V ∈ U(ξ). Since lim_{i∈I} x_i = ξ, there is i ∈ I such that, for each j ≥ i, x_j ∈ V ∩ M. Thus, for each j ≥ i, f(x_j) ∈ U, proving lim_{i∈I} f(x_i) = f(ξ). Conversely, assume that f is not continuous in ξ. We have to construct a net (x_i)_{i∈I} in M with lim_{i∈I} x_i = ξ such that (f(x_i))_{i∈I} does not converge to f(ξ). Since f is not continuous in ξ, there must be some U ∈ U(f(ξ)) such that, for each V ∈ U(ξ), there exists at least one x_V ∈ M ∩ V with f(x_V) ∉ U. As in Ex. 1.20(c) and Ex. 1.22(b), we let I := U(ξ) with

\[ \forall_{V_1, V_2 \in \mathcal{U}(\xi)} \quad V_1 \le V_2 \ :\Leftrightarrow\ V_2 \subseteq V_1. \]

Then, clearly, lim_{V∈U(ξ)} x_V = ξ, but (f(x_V))_{V∈U(ξ)} does not converge to f(ξ).

Now let (X, T_X) have a countable local base at ξ. Since every sequence is a net, it is immediate that (ii) implies (iii). It remains to show that (iii) implies (i): We have shown above that, if f is not continuous in ξ, then there is a net (x_i)_{i∈I} in M with lim_{i∈I} x_i = ξ such that (f(x_i))_{i∈I} does not converge to f(ξ), as there is U ∈ U(f(ξ)) such that f(x_i) ∉ U for each i ∈ I. Now, since (X, T_X) has a countable local base at ξ, by Prop. 1.40, (x_i)_{i∈I} contains a sequence (x_{i_k})_{k∈N} such that lim_{k→∞} x_{i_k} = ξ. But then (f(x_{i_k}))_{k∈N} still does not converge to f(ξ). ∎

Corollary 2.9. Let (X, T_X) and (Y, T_Y) be topological spaces, M ⊆ X, f : M −→ Y. Let ξ ∈ X be a cluster point of M and let η ∈ Y. Then the following statements (i) and (ii) are equivalent:

(i) lim_{x→ξ} f(x) = η.

(ii) For each net (x_i)_{i∈I} in M with lim_{i∈I} x_i = ξ, the net (f(x_i))_{i∈I} in Y converges to η, i.e.

\[ \lim_{i \in I} x_i = \xi \quad \Rightarrow \quad \lim_{i \in I} f(x_i) = \eta. \]

If (X, T_X) has a countable local base at ξ (e.g. if (X, T_X) is metrizable), then (i) and (ii) are also equivalent to the following statement:

(iii) For each sequence (x_k)_{k∈N} in M with lim_{k→∞} x_k = ξ, the sequence (f(x_k))_{k∈N} in Y converges to η, i.e.

\[ \lim_{k \to \infty} x_k = \xi \quad \Rightarrow \quad \lim_{k \to \infty} f(x_k) = \eta. \]

Proof. According to Rem. 2.2(b), the map

\[ g : M \cup \{\xi\} \to Y, \quad g(x) := \begin{cases} f(x) & \text{for } x \ne \xi, \\ \eta & \text{for } x = \xi, \end{cases} \]

is continuous in ξ if, and only if, (i) holds. Thus, everything is an immediate consequence of Th. 2.8. ∎


Caveat 2.10. Consider the situation of Th. 2.8. If (X, T) is not first countable, then it can happen that f satisfies Th. 2.8(iii)⁵, but f is not continuous in ξ: The following construction is quite general: Let A ⊆ X and let ξ ∈ $\overline{A}$ (the closure of A) be such that there is no sequence in A converging to ξ (such a ξ can only exist if (X, T) is not first countable, cf. Cor. 1.44(iv); a concrete example was given in Ex. 1.53(c) and we will come back to that shortly). Now let M := A ∪ {ξ} and define

\[ f : M \to \mathbb{R}, \quad f(x) := \begin{cases} 0 & \text{for } x \in A, \\ 1 & \text{for } x = \xi. \end{cases} \]

Then f satisfies Th. 2.8(iii) (trivially, since every sequence in M converging to ξ must be finally constant with value ξ). However, there exists a net (a_i)_{i∈I} in A converging to ξ. Then (f(a_i))_{i∈I} is constant and equal to 0, i.e. lim_{i∈I} f(a_i) = 0 ≠ 1 = f(ξ), showing that f is not continuous in ξ. For a concrete example, recall that, in Ex. 1.53(c), we considered X := F(R, K) = K^R with the product topology (i.e. with the topology of pointwise convergence) and the subset

\[ A := \left\{ (x : \mathbb{R} \to \mathbb{K}) : \ \exists_{J \subseteq \mathbb{R} \text{ finite}} \quad x(s) = \begin{cases} 0 & \text{for } s \in J, \\ 1 & \text{for } s \notin J \end{cases} \right\}. \]

It was shown in Ex. 1.53(c) that ξ := x_0 with x_0(s) = 0 for each s ∈ R is in $\overline{A}$, but no sequence in A converges to ξ.

Theorem 2.11. Let (X, T_X), (Y, T_Y), (Z, T_Z) be topological spaces, D_f ⊆ X, f : D_f −→ Y, D_g ⊆ Y, g : D_g −→ Z, f(D_f) ⊆ D_g. If f is continuous in ξ ∈ D_f and g is continuous in f(ξ) ∈ D_g, then g ∘ f : D_f −→ Z is continuous in ξ. In consequence, if f and g are both continuous, then the composition g ∘ f is also continuous.

Proof. Let ξ ∈ D_f and assume that f is continuous in ξ and g is continuous in f(ξ). If (x_i)_{i∈I} is a net in D_f such that lim_{i∈I} x_i = ξ, then the continuity of f in ξ implies that lim_{i∈I} f(x_i) = f(ξ). Then the continuity of g in f(ξ) implies that lim_{i∈I} g(f(x_i)) = g(f(ξ)), thereby establishing the continuity of g ∘ f in ξ by Th. 2.8. ∎

Example 2.12. (a) Constant functions f : X −→ Y are always continuous, since X and ∅ are the only preimages, and both are always open.

(b) As in Ex. 1.53, consider topological spaces (X_i, T_i), i ∈ I, and X := ∏_{i∈I} X_i with the product topology T (X = Kⁿ, n ∈ N, with the norm topology is, of course, a particularly simple special case). Recall the projections

\[ \forall_{j \in I} \quad \pi_j : X \to X_j, \quad \pi_j\big((x_i)_{i \in I}\big) := x_j. \]

(i) Each projection π_j is continuous, since π_j⁻¹(O) ∈ T if O ∈ T_j by the definition of the product topology.

⁵ A function that satisfies Th. 2.8(iii) is sometimes called sequentially continuous in ξ – thus, in general, a function can be sequentially continuous without being continuous.


(ii) Each projection π_j is a so-called open map, i.e.

\[ \forall_{O \in \mathcal{T}} \quad \pi_j(O) \in \mathcal{T}_j: \tag{2.15} \]

Let x_j ∈ π_j(O). Then there is x = (x_i)_{i∈I} ∈ O such that π_j(x) = x_j. Since O ∈ T, there is j ∈ J ⊆ I finite and O_i ∈ T_i for each i ∈ J such that x ∈ B := ⋂_{i∈J} π_i⁻¹(O_i) ⊆ O. Then x_j = π_j(x) ∈ O_j = π_j(B) ⊆ π_j(O), which already proves π_j(O) to be open.

(c) Let X be as in (b). If Y is a set and f : Y −→ X, then the functions f_i := π_i ∘ f, i ∈ I, are called the coordinate functions of f. Then

\[ f_i : Y \to X_i, \quad f_i(y) = \pi_i(f(y)), \quad f(y) = \big(f_i(y)\big)_{i \in I}. \tag{2.16} \]

If, as in (b), T is the product topology on X, T_Y is a topology on Y, and y ∈ Y, then the following statements are equivalent:

(i) f is continuous in y.

(ii) For each i ∈ I, the coordinate function f_i is continuous in y.

Indeed, if f is continuous in y, then each f_i, i ∈ I, is continuous in y by (b) and Th. 2.11. For the converse, assume each f_i, i ∈ I, to be continuous in y. Let (y_j)_{j∈J} be a net in Y such that lim_{j∈J} y_j = y. Let O ∈ T_i, i ∈ I, such that f(y) ∈ π_i⁻¹(O), i.e. such that f_i(y) ∈ O. Then the continuity of f_i in y implies lim_{j∈J} f_i(y_j) = f_i(y). Thus,

\[ \exists_{j_0 \in J} \ \forall_{j \ge j_0} \quad \big( f_i(y_j) \in O, \text{ i.e. } f(y_j) \in \pi_i^{-1}(O) \big), \]

implying lim_{j∈J} f(y_j) = f(y) by Cor. 1.50(a) and, thus, the continuity of f in y.

(d) We are staying in the setting of (c) with the additional assumption that X_i = C for each i ∈ I. Then f_i is continuous in y if, and only if, both Re f_i and Im f_i are continuous in y: This is actually merely a corollary of (c), since C = R², | · | on C is precisely ‖ · ‖₂ on R², and Re f_i and Im f_i are just the coordinate functions of f_i : Y −→ R².

Remark 2.13. Let X ≠ ∅ be an arbitrary nonempty set, f, g : X −→ K, and λ ∈ K. In [Phi16, Not. 6.2], we defined the functions f + g, λf, fg, f/g, |f|, and, for K = R, also max(f, g), min(f, g), f⁺, f⁻. If Y is an arbitrary vector space over K and f, g : X −→ Y, then we can generalize the definition of f + g and λf by letting

\[ (f + g) : X \to Y, \quad (f + g)(x) := f(x) + g(x), \tag{2.17a} \]

\[ (\lambda f) : X \to Y, \quad (\lambda f)(x) := \lambda f(x). \tag{2.17b} \]

It turns out that this makes the set of functions from X into Y, F(X, Y), into a vector space over K with zero element f ≡ 0 (exercise). Finally, for f : X −→ Cⁿ, f(x) = (f_1(x), . . . , f_n(x)), n ∈ N, we define

\[ \operatorname{Re} f : X \to \mathbb{R}^n, \quad \operatorname{Re} f(x) := (\operatorname{Re} f_1(x), \dots, \operatorname{Re} f_n(x)), \tag{2.18a} \]

\[ \operatorname{Im} f : X \to \mathbb{R}^n, \quad \operatorname{Im} f(x) := (\operatorname{Im} f_1(x), \dots, \operatorname{Im} f_n(x)), \tag{2.18b} \]

\[ \bar{f} : X \to \mathbb{C}^n, \quad \bar{f}(x) := \big(\overline{f_1(x)}, \dots, \overline{f_n(x)}\big), \tag{2.18c} \]

such that

\[ f = \operatorname{Re} f + i \operatorname{Im} f, \tag{2.19a} \]

\[ \bar{f} = \operatorname{Re} f - i \operatorname{Im} f. \tag{2.19b} \]

Lemma 2.14. Let (X, ‖ · ‖) be a normed vector space, and let (x_k)_{k∈N} and (y_k)_{k∈N} be sequences in X with lim_{k→∞} x_k = x ∈ X and lim_{k→∞} y_k = y ∈ X. Then the following holds:

\[ \lim_{k \to \infty} (x_k + y_k) = x + y, \tag{2.20a} \]

\[ \lim_{k \to \infty} (\lambda x_k) = \lambda x \quad \text{for each } \lambda \in \mathbb{K}. \tag{2.20b} \]

Proof. Since lim_{k→∞} ‖x_k − x‖ = 0 and lim_{k→∞} ‖y_k − y‖ = 0, it follows from ‖x_k + y_k − x − y‖ ≤ ‖x_k − x‖ + ‖y_k − y‖ that also lim_{k→∞} ‖x_k + y_k − x − y‖ = 0. For each λ ∈ K, one has lim_{k→∞} ‖λx_k − λx‖ = lim_{k→∞} (|λ| ‖x_k − x‖) = |λ| lim_{k→∞} ‖x_k − x‖ = 0. ∎

Theorem 2.15. Let X be a metric space (e.g. a normed space), let Y be a normed vector space, and assume that f, g : X −→ Y are continuous in ξ ∈ X. Then f + g and λf are continuous in ξ for each λ ∈ K (in particular, C(X, Y) constitutes a subspace of the vector space F(X, Y) over K). Moreover, if Y = Cⁿ, n ∈ N, then Re f, Im f, and $\bar{f}$ are all continuous in ξ; if Y = K, then fg, f/g for g ≠ 0, and |f| are all continuous in ξ; if Y = R, then max(f, g), min(f, g), f⁺, and f⁻ are all continuous in ξ as well.

Proof. Let (x_k)_{k∈N} be a sequence in X such that lim_{k→∞} x_k = ξ. Then the continuity of f and g in ξ yields lim_{k→∞} f(x_k) = f(ξ) and lim_{k→∞} g(x_k) = g(ξ). Lemma 2.14 then yields lim_{k→∞} (f + g)(x_k) = (f + g)(ξ) and lim_{k→∞} (λf)(x_k) = (λf)(ξ). For Y = Cⁿ, n ∈ N, (1.18) together with [Phi16, (7.2)] and [Phi16, (7.11f)] shows lim_{k→∞} Re f(x_k) = Re f(ξ), lim_{k→∞} Im f(x_k) = Im f(ξ), and lim_{k→∞} $\bar{f}$(x_k) = $\bar{f}$(ξ), providing the continuity of Re f, Im f, and $\bar{f}$ at ξ. For Y = K, the rules for the limits of sequences in K [Phi16, Th. 7.13(a)] yield lim_{k→∞} (fg)(x_k) = (fg)(ξ), lim_{k→∞} (f/g)(x_k) = (f/g)(ξ) for g ≠ 0, and lim_{k→∞} |f|(x_k) = |f|(ξ). This provides the continuity of f + g, λf, fg, f/g, and |f| at ξ. Moreover, for Y = R, [Phi16, Th. 7.13(b)] implies lim_{k→∞} max(f, g)(x_k) = max(f, g)(ξ) and lim_{k→∞} min(f, g)(x_k) = min(f, g)(ξ), proving the continuity of max(f, g), min(f, g), f⁺, and f⁻ at ξ. ∎

Example 2.16. Each K-linear function A : Kⁿ −→ Kᵐ, (n, m) ∈ N², is continuous: Using the standard unit vectors e_j, for each z ∈ Kⁿ, one has

\[ A(z) = A\Big( \sum_{j=1}^n z_j e_j \Big) = \sum_{j=1}^n z_j A(e_j). \]

Thus, one can build A by summing the functions A_j : Kⁿ −→ Kᵐ, A_j(z) := z_j A(e_j) for each j ∈ {1, . . . , n}. Since lim_{k→∞} z^k = z implies lim_{k→∞} z^k_j = z_j, which implies lim_{k→∞} z^k_j A(e_j) = z_j A(e_j), all A_j are continuous, and, thus, A is continuous by Th. 2.15.

Example 2.17. The function f : R+ × C −→ C, f(x, z) := x^z = exp(z ln x), is continuous: With the projections π_1, π_2 : C² −→ C, we can write f = exp ∘ (π_2 (ln ∘ π_1)), where π_2 (ln ∘ π_1) denotes the pointwise product (note π_1 is R+-valued on R+ × C). Since π_1 and π_2 are continuous by Example 2.12(b), ln ∘ π_1 is continuous by Th. 2.11 and π_2 (ln ∘ π_1) is continuous by Th. 2.15. Finally, f = exp ∘ (π_2 (ln ∘ π_1)) is continuous by Th. 2.11.

—

In [Phi16, Ex. 7.40(b),(c)], we had shown that 1-dimensional polynomials and rational functions are continuous (where they are defined). We will now extend [Phi16, Ex. 7.40(b),(c)] to n-dimensional polynomials and rational functions:

Definition 2.18. Let n ∈ N. An element p = (p_1, . . . , p_n) ∈ (N_0)ⁿ is called a multi-index; |p| := p_1 + · · · + p_n is called the degree of the multi-index. If x = (x_1, . . . , x_n) ∈ Kⁿ and p = (p_1, . . . , p_n) is a multi-index, then we define

\[ x^p := x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}. \tag{2.21} \]

Each function from Kⁿ into K, x ↦ x^p, is called a monomial; the degree of p is called the degree of the monomial. A function P from Kⁿ into K is called a polynomial if, and only if, it is a linear combination of monomials, i.e. if, and only if, P has the form

\[ P : \mathbb{K}^n \to \mathbb{K}, \quad P(x) = \sum_{|p| \le k} a_p x^p, \quad k \in \mathbb{N}_0, \ a_p \in \mathbb{K}. \tag{2.22} \]

The degree of P, still denoted deg(P), is the largest number d ≤ k such that there is p with |p| = d and a_p ≠ 0. If all a_p = 0, i.e. if P ≡ 0, then P is the (n-dimensional) zero polynomial and, as for n = 1, its degree is defined to be −1. A rational function is once again a quotient of two polynomials.

Example 2.19. Writing x, y, z instead of x_1, x_2, x_3, the functions xy³z, x²y², x²y, x², y, 1 are examples of monomials of degree 5, 4, 3, 2, 1, and 0, respectively, P(x, y) := 5x²y − 3x² + y − 1 and Q(x, y, z) := xy³z − 2x²y² + 1 are polynomials of degree 3 and 5, respectively, and P(x, y)/Q(x, y, z) is a rational function defined for each (x, y, z) ∈ K³ such that Q(x, y, z) ≠ 0.
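The multi-index notation translates directly into code. The following Python sketch is merely one possible encoding (a polynomial stored as a dictionary from multi-indices to coefficients; all names are hypothetical): it evaluates x^p as in (2.21), computes the degree with the convention −1 for the zero polynomial, and encodes P and Q from Example 2.19.

```python
from math import prod

def monomial(x, p):
    """Evaluate x^p = x_1^{p_1} * ... * x_n^{p_n} for a multi-index p, cf. (2.21)."""
    if len(x) != len(p):
        raise ValueError("x and p must have the same length")
    return prod(xi ** pi for xi, pi in zip(x, p))

def degree(coeffs):
    """Degree of a polynomial given as {multi-index: coefficient}; -1 if all coefficients vanish."""
    return max((sum(p) for p, a in coeffs.items() if a != 0), default=-1)

def evaluate(coeffs, x):
    """Evaluate the polynomial sum of a_p * x^p at the point x, cf. (2.22)."""
    return sum(a * monomial(x, p) for p, a in coeffs.items())

# P(x, y) = 5x^2 y - 3x^2 + y - 1 and Q(x, y, z) = x y^3 z - 2 x^2 y^2 + 1
P = {(2, 1): 5, (2, 0): -3, (0, 1): 1, (0, 0): -1}
Q = {(1, 3, 1): 1, (2, 2, 0): -2, (0, 0, 0): 1}

print(degree(P), degree(Q))
print(evaluate(P, (1, 2)), evaluate(Q, (1, 1, 1)))
```

Since Q(1, 1, 1) = 0, the rational function P/Q from the example is not defined at points (x, y, z) = (1, 1, 1).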

Theorem 2.20. Each polynomial P : Kⁿ −→ K, n ∈ N, is continuous and each rational function P/Q is continuous at each z ∈ Kⁿ such that Q(z) ≠ 0.

Proof. Let

\[ P : \mathbb{K}^n \to \mathbb{K}, \quad P(z) = \sum_{|p| \le k} a_p z^p, \quad k \in \mathbb{N}_0, \ p = (p_1, \dots, p_n) \in (\mathbb{N}_0)^n, \]

\[ |p| = p_1 + \dots + p_n, \quad z^p = z_1^{p_1} z_2^{p_2} \cdots z_n^{p_n}, \quad a_p \in \mathbb{K}. \]

First, from Ex. 2.12(b), we know that the projections π_j : Kⁿ −→ K, π_j(z) := z_j, j ∈ {1, . . . , n}, are continuous. An induction and Th. 2.15 then show the monomials z ↦ a_p z^p to be continuous, and another induction then shows P to be continuous. Applying Th. 2.15 once more finally shows that each rational function P/Q is continuous at each z ∈ Kⁿ such that Q(z) ≠ 0. ∎


Example 2.21. For n ∈ N, let M(n, K) denote the set of n × n matrices over K, which is the same as K^{n²} and, thus, can be considered as a normed vector space in the usual way. From Linear Algebra, recall the determinant function det : M(n, K) −→ K.

(a) From Linear Algebra, we know that the determinant det is a polynomial on M(n, K) (i.e. on K^{n²}), i.e. det is continuous as a consequence of Th. 2.20.

(b) From Linear Algebra, we also know that A ∈ M(n, K) is invertible if, and only if, det(A) ≠ 0. Using (a) and Th. 2.7(ii), this implies that GL(n, K) := det⁻¹(K \ {0}) is an open subset of M(n, K) (in Linear Algebra, GL(n, K) is known as the general linear group of degree n over K). Moreover, we claim the map

\[ \operatorname{inv} : GL(n, \mathbb{K}) \to GL(n, \mathbb{K}), \quad \operatorname{inv}(A) := A^{-1}, \]

is continuous: Indeed, according to another Linear Algebra result, all the coordinate maps inv_{kl} (i.e. the entries of the inverse matrix) are rational functions on M(n, K) (i.e. on K^{n²}), i.e. they are continuous as a consequence of Th. 2.20, i.e. inv is continuous by Ex. 2.12(c).

Theorem 2.22. For a K-linear function A : X −→ Y between normed vector spaces (X, ‖ · ‖_X) and (Y, ‖ · ‖_Y) over K, the following statements are equivalent:

(i) A is continuous.

(ii) There exists ξ ∈ X such that A is continuous in ξ.

(iii) A is Lipschitz continuous.

Proof. Exercise. ∎

We will now see two examples that show that, in contrast to linear maps between finite-dimensional spaces as considered in Example 2.16 above, linear maps between infinite-dimensional spaces can be discontinuous.

Example 2.23. (a) Once again, consider the space X from Example 1.60, consisting of all sequences in K that are finally constant and equal to zero, endowed with the norm ‖ · ‖_sup. The function

\[ A : X \to \mathbb{K}, \quad A\big((z_n)_{n \in \mathbb{N}}\big) := \sum_{n=1}^{\infty} z_n, \tag{2.23} \]

is clearly linear. However, we will see that A is not continuous: The sequence (z^k)_{k∈N} defined by

\[ z^k_n := \begin{cases} 1/k & \text{for } 1 \le n \le k, \\ 0 & \text{for } n > k, \end{cases} \tag{2.24} \]

converges to 0 = (0, 0, . . . ) ∈ X with respect to ‖ · ‖_sup, since ‖z^k‖_sup = 1/k. However, for each k ∈ N, A(z^k) = ∑_{n=1}^{k} (1/k) = 1, i.e. lim_{k→∞} A(z^k) = 1 ≠ 0 = A(0), showing that A is not continuous at 0.


(b) Let X be the normed vector space consisting of all bounded and differentiable functions f : R −→ R, endowed with the sup-norm. Then the function d : X −→ R, d(f) := f′(0), is linear, but not continuous (exercise).
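Part (a) can be replayed with exact rational arithmetic. In the following Python sketch (an illustration only; the representation of a finally-zero sequence by the finite list of its leading entries is an assumption made here for convenience), the sup-norm of z^k shrinks like 1/k while A(z^k) stays equal to 1.

```python
from fractions import Fraction

def z(k):
    """The element z^k of (2.24): k entries equal to 1/k; the trailing zeros are omitted."""
    return [Fraction(1, k)] * k

def sup_norm(seq):
    """Sup-norm of a finally-zero sequence given by its finitely many leading entries."""
    return max((abs(t) for t in seq), default=Fraction(0))

def A(seq):
    """The linear functional (2.23); the series reduces to a finite sum."""
    return sum(seq, Fraction(0))

for k in (1, 10, 100, 1000):
    print(k, sup_norm(z(k)), A(z(k)))
```

The arguments z^k tend to 0 in the sup-norm, yet the values A(z^k) do not tend to A(0) = 0, which is the discontinuity at 0 described in (a).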

—

A notion related to, but different from, continuity is componentwise continuity (see Def. 2.24). Both notions have to be distinguished carefully, as componentwise continuity does not imply continuity (see Example 2.26).

Definition 2.24. Let (Y, T) be a topological space and let ζ = (ζ_1, . . . , ζ_n) ∈ Kⁿ, n ∈ N. A function f : Kⁿ −→ Y is called continuous in ζ with respect to the jth component, j ∈ {1, . . . , n}, if, and only if, the function

\[ \phi : \mathbb{K} \to Y, \quad \phi(\alpha) := f(\zeta_1, \dots, \zeta_{j-1}, \alpha, \zeta_{j+1}, \dots, \zeta_n), \tag{2.25} \]

is continuous in α = ζ_j.

Lemma 2.25. Let (Y, T) be a topological space and let ζ = (ζ_1, . . . , ζ_n) ∈ Kⁿ, n ∈ N. If f is continuous in ζ, then f is continuous in ζ with respect to all components.

Proof. Let j ∈ {1, . . . , n} and let (α_k)_{k∈N} be a sequence in K with lim_{k→∞} α_k = ζ_j. Then (z^k)_{k∈N} with z^k := (ζ_1, . . . , ζ_{j−1}, α_k, ζ_{j+1}, . . . , ζ_n) is a sequence in Kⁿ with lim_{k→∞} z^k = ζ. Thus, the continuity of f yields lim_{k→∞} f(z^k) = f(ζ). If φ is defined as in (2.25), then φ(α_k) = f(z^k), showing lim_{k→∞} φ(α_k) = f(ζ) = φ(ζ_j), i.e. φ is continuous in ζ_j. We have, hence, shown, for each j ∈ {1, . . . , n}, that f is continuous in ζ with respect to the jth component. ∎

Example 2.26. A function can be continuous with respect to all components at a point ζ without being continuous at ζ: Consider the function

\[ f : \mathbb{K}^2 \to \mathbb{K}, \quad f(z, w) := \begin{cases} 0 & \text{for } zw = 0, \\ 1 & \text{for } zw \ne 0. \end{cases} \tag{2.26} \]

Let φ_1, φ_2 : K −→ K, φ_1(α) := f(α, 0), φ_2(α) := f(0, α). Then both φ_1 and φ_2 are identically 0 and, in particular, continuous at α = 0. However, f is not continuous at (0, 0), since, for example,

\[ (z_k, w_k) := \begin{cases} (1/k, 0) & \text{for } k \text{ even}, \\ (1/k, 1/k) & \text{for } k \text{ odd} \end{cases} \tag{2.27} \]

yields a sequence that converges to (0, 0), but f(z_k, w_k) = 0 if k is even and f(z_k, w_k) = 1 if k is odd, i.e. the sequence (f(z_k, w_k))_{k∈N} does not converge.
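The behaviour in Example 2.26 is easy to observe directly. The short Python sketch below is an illustration with hypothetical names; it uses exact fractions so that the test zw = 0 is not affected by floating-point underflow. It evaluates f along the sequence (2.27) and along both coordinate axes.

```python
from fractions import Fraction

def f(z, w):
    """The function (2.26): 0 where zw = 0 (the coordinate axes), 1 elsewhere."""
    return 0 if z * w == 0 else 1

def point(k):
    """The k-th term (z_k, w_k) of the sequence (2.27), k = 1, 2, ..."""
    if k % 2 == 0:
        return (Fraction(1, k), Fraction(0))
    return (Fraction(1, k), Fraction(1, k))

values = [f(*point(k)) for k in range(1, 9)]
print(values)  # alternates between 1 (k odd) and 0 (k even)

# The partial functions phi_1 and phi_2 vanish identically.
axis_values = [(f(a, 0), f(0, a)) for a in (Fraction(1, 3), 2, -5)]
print(axis_values)
```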


2.2 Banach Fixed Point Theorem a.k.a. Contraction Mapping Principle

Definition 2.27. Let ∅ ≠ A be a subset of a metric space (X, d), ϕ : A −→ A.

(a) The map ϕ is called a contraction if, and only if, there exists 0 ≤ L < 1 satisfying

\[ d\big(\varphi(x), \varphi(y)\big) \le L\, d(x, y) \quad \text{for each } x, y \in A. \tag{2.28} \]

(b) x∗ ∈ A is called a fixed point of ϕ if, and only if, ϕ(x∗) = x∗.

Remark 2.28. According to Def. 2.27, ϕ : A −→ A is a contraction if, and only if, ϕ is Lipschitz continuous with Lipschitz constant L < 1.

The following Th. 2.29 constitutes the Banach fixed point theorem. It is also known as the contraction mapping principle. Its proof is surprisingly simple, e.g. about an order of magnitude easier than the proof of the Brouwer fixed point theorem.

Theorem 2.29 (Banach Fixed Point Theorem). Let ∅ ≠ A be a closed subset of a complete metric space (X, d) (for example, a Banach space). If ϕ : A −→ A is a contraction with Lipschitz constant 0 ≤ L < 1, then ϕ has a unique fixed point x∗ ∈ A. Moreover, for each initial value x_0 ∈ A, the sequence (x_n)_{n∈N_0}, defined by

\[ x_{n+1} := \varphi(x_n) \quad \text{for each } n \in \mathbb{N}_0, \tag{2.29} \]

converges to x∗:

\[ \lim_{n \to \infty} \varphi^n(x_0) = x_*. \tag{2.30} \]

Furthermore, for each such sequence, we have the error estimate

\[ d(x_n, x_*) \le \frac{L}{1 - L}\, d(x_n, x_{n-1}) \le \frac{L^n}{1 - L}\, d(x_1, x_0) \tag{2.31} \]

for each n ∈ N.

Proof. We start with uniqueness: Let x∗, x∗∗ ∈ A be fixed points of ϕ. Then

\[ d(x_*, x_{**}) = d\big(\varphi(x_*), \varphi(x_{**})\big) \le L\, d(x_*, x_{**}), \tag{2.32} \]

which implies 1 ≤ L for d(x∗, x∗∗) > 0. Thus, L < 1 implies d(x∗, x∗∗) = 0 and x∗ = x∗∗.

Next, we turn to existence. A simple induction on m − n shows

\[ d(x_{m+1}, x_m) \le L\, d(x_m, x_{m-1}) \le L^{m-n}\, d(x_{n+1}, x_n) \quad \text{for each } m, n \in \mathbb{N}_0,\ m > n. \tag{2.33} \]

This, in turn, allows us to estimate, for each n, k ∈ N_0:

\[ d(x_{n+k}, x_n) \le \sum_{m=n}^{n+k-1} d(x_{m+1}, x_m) \overset{(2.33)}{\le} \sum_{m=n}^{n+k-1} L^{m-n}\, d(x_{n+1}, x_n) \le \frac{1}{1 - L}\, d(x_{n+1}, x_n) \overset{(2.33)}{\le} \frac{L^n}{1 - L}\, d(x_1, x_0) \to 0 \quad \text{for } n \to \infty, \tag{2.34} \]

establishing that (x_n)_{n∈N_0} constitutes a Cauchy sequence. Since X is complete, this Cauchy sequence must have a limit x∗ ∈ X, and since the sequence is in A and A is closed, x∗ ∈ A. The continuity of ϕ allows one to take limits in (2.29), resulting in x∗ = ϕ(x∗), showing that x∗ is a fixed point and proving existence.

Finally, the error estimate (2.31) follows from (2.34) by fixing n and taking the limit for k → ∞. ∎

Example 2.30. Suppose we are looking for a fixed point of the map ϕ(x) = cos x (or, equivalently, for a zero of f(x) = cos x − x). To apply the Banach fixed point theorem, we need to restrict ϕ to a set A such that ϕ(A) ⊆ A. This is the case for A := [0, 1]. Moreover, ϕ : A −→ A is a contraction, due to sin 1 < 1 and the mean value theorem providing, for x ≠ y, some τ ∈ ]0, 1[ satisfying

\[ \big| \varphi(x) - \varphi(y) \big| = \big| \varphi'(\tau) \big|\, |x - y| \le (\sin 1)\, |x - y| \tag{2.35} \]

for each x, y ∈ A. Since R is complete and A is closed in R, the Banach fixed point theorem yields the existence of a unique fixed point x∗ ∈ [0, 1] and lim_{n→∞} ϕⁿ(x_0) = x∗ for each x_0 ∈ [0, 1].
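The iteration (2.29) together with the a posteriori part of the error estimate (2.31) gives a practical stopping rule. The following Python sketch is a minimal illustration, not a definitive implementation: the function name, the tolerance, and the iteration cap are assumptions made here, and the metric is taken to be d(x, y) = |x − y| on R. It is applied to ϕ = cos on A = [0, 1] with L = sin 1 as in Example 2.30.

```python
import math

def fixed_point_iteration(phi, x0, L, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = phi(x_n) until the a posteriori bound from (2.31),
    L / (1 - L) * |x_n - x_{n-1}|, drops below tol.
    Returns (x_n, bound, n). Assumes phi is a contraction with constant L."""
    if not 0 <= L < 1:
        raise ValueError("need a contraction constant 0 <= L < 1")
    x_prev = x0
    for n in range(1, max_iter + 1):
        x = phi(x_prev)
        bound = L / (1 - L) * abs(x - x_prev)
        if bound < tol:
            return x, bound, n
        x_prev = x
    raise RuntimeError("tolerance not reached within max_iter steps")

L = math.sin(1.0)  # Lipschitz constant of cos on [0, 1], cf. (2.35)
x_star, bound, steps = fixed_point_iteration(math.cos, 0.5, L)
print(x_star, bound, steps)
```

The returned bound is a guaranteed upper estimate for |x_n − x∗| (up to rounding), which is the advantage of the first inequality in (2.31) over merely watching successive differences.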

2.3 Homeomorphisms, Norm-Preserving and Isometric Maps, Embeddings

Definition 2.31. (a) Given topological spaces (X, T_X) and (Y, T_Y), a function f : X −→ Y is called a homeomorphism (and (X, T_X) and (Y, T_Y) are called homeomorphic) if, and only if, f is bijective and both f and f⁻¹ are continuous. If f is injective, then it is called an embedding of (the topological space) X into Y if, and only if, f is a homeomorphism onto its image (f(X), T_{f(X)}).

(b) Given metric spaces (X, d_X) and (Y, d_Y), a function f : X −→ Y is called distance-preserving or isometric or an isometry if, and only if,

\[ d_Y\big(f(x), f(y)\big) = d_X(x, y) \quad \text{for each } x, y \in X. \tag{2.36} \]

(c) Given normed vector spaces (X, ‖·‖_X) and (Y, ‖·‖_Y) over K, a function f : X −→ Y is called norm-preserving if, and only if,

\[ \big\| f(x) \big\|_Y = \|x\|_X \quad \text{for each } x \in X. \tag{2.37} \]

Moreover, f is called an embedding of (the normed space) X into Y if, and only if, it is norm-preserving and linear.

—

In general, a structure-preserving bijective map is called an isomorphism. Thus, a homeomorphism is an isomorphism of topological spaces, a bijective isometry is an isomorphism of metric spaces, and a norm-preserving linear isomorphism is an isomorphism of normed spaces. Properties preserved by homeomorphisms are called topological invariants (e.g., separability, being first or second countable etc. – see Prop. I.1 of the Appendix for a more extensive list).


Lemma 2.32. (a) If (X, T_X) and (Y, T_Y) are topological spaces, then f : X −→ Y is an embedding if, and only if, f is injective and both f and f⁻¹ : f(X) −→ X are continuous.

(b) Each isometric map between metric spaces is continuous.

(c) Each isometric map between metric spaces is injective.

(d) If (X, ‖ · ‖_X) and (Y, ‖ · ‖_Y) are normed vector spaces over K, then a K-linear function f : X −→ Y is norm-preserving if, and only if, it is isometric with respect to the induced metrics.


The following Ex. 2.33(a) shows that the assertion of Lem. 2.32(d) becomes false if the word “linear” is omitted: In general, a norm-preserving map is not isometric and not even injective or continuous. On the other hand, Ex. 2.33(b) shows that an isometric map does not need to be norm-preserving, and Ex. 2.33(c) shows that a homeomorphism is not necessarily isometric.

Example 2.33. (a) Let (X, ‖·‖_X) be a normed vector space over K, and f : X −→ K, f(x) := ‖x‖_X. If we take ‖ · ‖_Y to be the usual norm on K, i.e. ‖y‖_Y := |y|, then, for each x ∈ X, ‖f(x)‖_Y = |‖x‖_X| = ‖x‖_X, i.e. f is norm-preserving. However, if dim X > 0 (i.e. if X ≠ {0}), then f is not isometric with respect to the induced metrics: Take any 0 ≠ x ∈ X. One computes ‖f(x) − f(−x)‖_Y = |‖x‖_X − ‖x‖_X| = 0 ≠ ‖x − (−x)‖_X = 2‖x‖_X. Moreover, for x ≠ 0, one has x ≠ −x, but f(x) = ‖x‖_X = f(−x), i.e. f is not injective. Similarly, if y ∈ X, y ≠ 0, then g : X −→ X, g(x) := x for x ≠ y, g(y) := −y is norm-preserving, but not continuous in y (also not injective, since g(y) = g(−y)). The map h : R −→ R, h(x) = x for x ∈ Q, h(x) = −x for x ∉ Q, is norm-preserving, but nowhere continuous, except in 0 (this example was pointed out by Charlotte Dietze).

(b) Consider (X, ‖ · ‖_X), (Y, ‖ · ‖_Y), where X = Y = K and ‖x‖_X = ‖x‖_Y = |x| for each x ∈ K. Then f : X −→ Y, f(x) := 1 + x, is isometric due to |f(x) − f(y)| = |1 + x − (1 + y)| = |x − y|, but f is not norm-preserving, since 0 = |0| ≠ |f(0)| = 1.

(c) We know that all norms on Kⁿ, n ∈ N, are equivalent, i.e. they all induce the same topology on Kⁿ. Thus, if we consider Kⁿ with two distinct norms, then the identity Id : Kⁿ −→ Kⁿ is a homeomorphism, but neither norm-preserving nor isometric.

Remark 2.34. If (X, ‖·‖) is a normed space, d is the induced metric, and M ⊆ X, then (M, d) can be considered as the metric subspace of (X, d) according to Prop. 1.55(d). Thus, every subset of a normed space is turned into a metric space in a natural way. It is quite remarkable that actually every metric space arises in this way. That means, given any metric space (M, d), there exists a normed space (X, ‖ · ‖) and an isometric (in particular, injective) function f : M −→ X: One can choose X as the vector space over R of bounded functions from M into R with the sup-norm (for F ∈ X, define ‖F‖ := sup{|F(x)| : x ∈ M} – this is the same space we considered in Ex. 1.15) and f : M −→ X, f(x) = f_x, where f_x : M −→ R, f_x(y) = d(x, y) − d(x_0, y), with some fixed x_0 ∈ M (the details are left as an exercise). However, the normed space X can be very large (i.e. much larger than M), and, thus, in practice, it is not always useful to study X in order to learn more about the metric space M.
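For a finite metric space, the embedding x ↦ f_x of Remark 2.34 can be tabulated and the isometry property checked directly. The following Python sketch is an illustration under the assumption that M is finite (so that each f_x is a tuple of values and the sup-norm is a maximum); the function names and the sample space are hypothetical.

```python
import itertools

def embed(points, d, x0):
    """Tabulate f_x(y) = d(x, y) - d(x0, y) for y in points, for each x in points."""
    return {x: tuple(d(x, y) - d(x0, y) for y in points) for x in points}

def sup_dist(F, G):
    """Sup-norm distance between two functions on a finite set, given as tuples."""
    return max(abs(a - b) for a, b in zip(F, G))

# Hypothetical sample space: four integers with the usual metric on R.
M = [0, 1, 4, 9]
d = lambda x, y: abs(x - y)
emb = embed(M, d, x0=M[0])

is_isometric = all(sup_dist(emb[x], emb[y]) == d(x, y)
                   for x, y in itertools.combinations(M, 2))
print(is_isometric)
```

The reason the check succeeds is the one behind the exercise: |d(x, z) − d(y, z)| ≤ d(x, y) for every z by the inverse triangle inequality, with equality for z = y.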

—

We conclude this section by showing that each metric space has a completion:

Theorem 2.35. Let (X, d_X) be a metric space. Then X has a completion in the sense of Def. 1.61, i.e. there exists a complete metric space (Y, d_Y) and an isometry φ : X −→ Y such that φ(X) is dense in Y. Moreover, the completion is unique in the sense that if (Z, d_Z) is another completion of X, where ψ : X −→ Z is an isometry and ψ(X) is dense in Z, then there exists a bijective isometry f : Y −→ Z such that f ∘ φ = ψ.

Proof. The idea is to construct Y as a set of equivalence classes of Cauchy sequences in X (analogously to the construction that yields R from Q): In a first step, let X′ be the set of Cauchy sequences in X, and define

\[ d' : X' \times X' \to \mathbb{R}_0^+, \quad d'\big((x_k)_{k \in \mathbb{N}}, (y_k)_{k \in \mathbb{N}}\big) := \lim_{k \to \infty} d_X(x_k, y_k). \]

To see that d′ is well-defined, we need to show that the limit in its definition exists: To this end, it suffices to show that (d_X(x_k, y_k))_{k∈N} is a Cauchy sequence in R. Indeed, given ε > 0, let N ∈ N be such that

\[ \forall_{k, l > N} \quad \Big( d_X(x_k, x_l) < \frac{\varepsilon}{2} \ \text{ and } \ d_X(y_k, y_l) < \frac{\varepsilon}{2} \Big). \]

Then

\[ \forall_{k, l > N} \quad |d_X(x_k, y_k) - d_X(x_l, y_l)| \overset{(1.38a)}{\le} d_X(x_k, x_l) + d_X(y_k, y_l) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]

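thereby establishing the case. If τ = σ, then, clearly, d′(τ, σ) = 0; d′ is symmetric, since, clearly, d′(τ, σ) = d′(σ, τ); and d′ also satisfies the triangle inequality:

\[ d'\big((x_k)_{k \in \mathbb{N}}, (y_k)_{k \in \mathbb{N}}\big) = \lim_{k \to \infty} d_X(x_k, y_k) \le \lim_{k \to \infty} d_X(x_k, z_k) + \lim_{k \to \infty} d_X(z_k, y_k) = d'\big((x_k)_{k \in \mathbb{N}}, (z_k)_{k \in \mathbb{N}}\big) + d'\big((z_k)_{k \in \mathbb{N}}, (y_k)_{k \in \mathbb{N}}\big). \]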

Altogether, we have shown that d′ constitutes a so-called pseudometric on X′ (cf. Def. E.1 of the Appendix). Unfortunately, one cannot expect d′ to be a metric on X′, since it can happen that d′(τ, σ) = 0, even though τ ≠ σ. However, according to Th. E.8,

\[ \sigma \sim \tau \ :\Leftrightarrow\ d'(\sigma, \tau) = 0 \]

defines an equivalence relation on X′ and, if one lets Y := {[σ] : σ ∈ X′} be the set of the corresponding equivalence classes, then

\[ d_Y : Y \times Y \to \mathbb{R}_0^+, \quad d_Y([\sigma], [\tau]) := d'(\sigma, \tau), \]

defines a metric on Y. Define

\[ \phi : X \to Y, \quad \phi(x) := [(x)_{k \in \mathbb{N}}]. \]

Then, for each x, y ∈ X,

\[ d_Y\big(\phi(x), \phi(y)\big) = d_Y\big([(x)_{k \in \mathbb{N}}], [(y)_{k \in \mathbb{N}}]\big) = d'\big((x)_{k \in \mathbb{N}}, (y)_{k \in \mathbb{N}}\big) = \lim_{k \to \infty} d_X(x, y) = d_X(x, y), \]

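showing φ to be an isometry. Next, we show that, for each Cauchy sequence (x_k)_{k∈N} in X, we have

\[ \lim_{l \to \infty} \phi(x_l) = [(x_k)_{k \in \mathbb{N}}] \quad \Big( \text{i.e. } \lim_{l \to \infty} d_Y\big(\phi(x_l), [(x_k)_{k \in \mathbb{N}}]\big) = 0 \Big) \]

(in particular, this implies φ(X) to be dense in Y). Given ε > 0, let N ∈ N be such that, for each k, l > N, we have d_X(x_k, x_l) < ε/2. Then

\[ \forall_{l > N} \quad d_Y\big(\phi(x_l), [(x_k)_{k \in \mathbb{N}}]\big) = \lim_{k \to \infty} d_X(x_l, x_k) \le \frac{\varepsilon}{2} < \varepsilon, \]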

proving lim_{l→∞} φ(x_l) = [(x_k)_{k∈N}]. We can now show that (Y, d_Y) is complete: Let (y_n)_{n∈N} be a Cauchy sequence in Y. As φ(X) is dense in Y, for each n ∈ N, there exists x_n ∈ X such that d_Y(y_n, φ(x_n)) < 1/n. Then (x_n)_{n∈N} is a Cauchy sequence: Given ε > 0, choose k ∈ N such that 1/k < ε/3 and d_Y(y_n, y_m) < ε/3 for each n, m > k. Then

\[ \forall_{n, m > k} \quad d_X(x_n, x_m) = d_Y(\phi(x_n), \phi(x_m)) \le d_Y(\phi(x_n), y_n) + d_Y(y_n, y_m) + d_Y(y_m, \phi(x_m)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon, \]

showing (x_n)_{n∈N} is Cauchy and, as we have shown above, lim_{n→∞} φ(x_n) = y := [(x_k)_{k∈N}]. Then lim_{n→∞} y_n = y as well: Given ε > 0, choose N ∈ N such that 1/N < ε/2 and d_Y(y, φ(x_n)) < ε/2 for each n > N. Then

\[ \forall_{n > N} \quad d_Y(y, y_n) \le d_Y(y, \phi(x_n)) + d_Y(\phi(x_n), y_n) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]

showing lim_{n→∞} y_n = y and completing the proof that (Y, d_Y) is complete. Finally, let (Z, d_Z) and ψ : X −→ Z be as in the statement of the theorem. Define f : Y −→ Z as follows: Given y ∈ Y, there is a sequence (x_k)_{k∈N} in X such that y = lim_{k→∞} φ(x_k). Set f(y) := lim_{k→∞} ψ(x_k). We need to verify that f is well-defined. Since y = lim_{k→∞} φ(x_k) and φ is an isometry, (x_k)_{k∈N} must be a Cauchy sequence, and (ψ(x_k))_{k∈N} must, indeed, converge in Z. If (a_k)_{k∈N} is another sequence in X with y = lim_{k→∞} φ(a_k), then

\[ \lim_{k \to \infty} d_Z(\psi(a_k), \psi(x_k)) = \lim_{k \to \infty} d_X(a_k, x_k) = \lim_{k \to \infty} d_Y(\phi(a_k), \phi(x_k)) = 0. \]

Letting z_1 := lim_{k→∞} ψ(x_k), z_2 := lim_{k→∞} ψ(a_k), this, together with

\[ \forall_{k \in \mathbb{N}} \quad d_Z(z_1, z_2) \le d_Z(z_1, \psi(x_k)) + d_Z(\psi(x_k), \psi(a_k)) + d_Z(\psi(a_k), z_2) \to 0 \]

shows z_1 = z_2 and the independence of f(y) from the chosen sequence. Next, for each x ∈ X, f(φ(x)) = ψ(x), i.e. f ∘ φ = ψ as desired. To see that f is isometric, let y_1, y_2 ∈ Y and let (x_k)_{k∈N}, (a_k)_{k∈N} be sequences with y_1 = lim_{k→∞} φ(x_k), y_2 = lim_{k→∞} φ(a_k). Then f(y_1) = lim_{k→∞} ψ(x_k), f(y_2) = lim_{k→∞} ψ(a_k), and we need to show d_Z(f(y_1), f(y_2)) = d_Y(y_1, y_2). Since

\[ \forall_{k \in \mathbb{N}} \quad \big| d_Y(y_1, y_2) - d_Y(\phi(x_k), \phi(a_k)) \big| \le d_Y(y_1, \phi(x_k)) + d_Y(y_2, \phi(a_k)), \]

\[ \big| d_Z(f(y_1), f(y_2)) - d_Z(\psi(x_k), \psi(a_k)) \big| \le d_Z(f(y_1), \psi(x_k)) + d_Z(f(y_2), \psi(a_k)), \]

we know

\[ d_Z(f(y_1), f(y_2)) = \lim_{k \to \infty} d_Z(\psi(x_k), \psi(a_k)) = \lim_{k \to \infty} d_X(x_k, a_k) = \lim_{k \to \infty} d_Y(\phi(x_k), \phi(a_k)) = d_Y(y_1, y_2) \]

as required. Finally, f is also surjective: If z ∈ Z, then, as ψ(X) is dense in Z, there is a sequence (x_k)_{k∈N} in X such that z = lim_{k→∞} ψ(x_k). Then (x_k)_{k∈N} is Cauchy and we can let y := lim_{k→∞} φ(x_k). Then f(y) = z, showing f to be surjective. ∎

3 Further Topologic Properties

3.1 Separation

Separation properties (also called separation axioms) are important properties of topological spaces that are closely related to the uniqueness of limits and to the existence of continuous maps. In general, separation properties are rather subtle and there exists an abundance of different such properties in the literature. We only consider some of the most important ones.

Definition 3.1. Let (X, T ) be a topological space.

(a) (X, T ) is called a T1 space if, and only if, points can be separated, i.e. if, and only if,

\[ \forall_{x,y\in X}\ \Big( x \ne y \ \Rightarrow\ \exists_{O_y\in\mathcal{T}}\ \big( y \in O_y \wedge x \notin O_y \big) \Big), \]

i.e. if, and only if, for each x ∈ X, the set {x} is closed (somewhat lax, one also says that T1 means “points are closed”).

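(b) (X, T ) is called a T2 space or a Hausdorff space if, and only if, points can be separated by disjoint open sets, i.e. if, and only if,

\[ \forall_{x,y\in X}\ \Big( x \ne y \ \Rightarrow\ \exists_{O_x,O_y\in\mathcal{T}}\ \big( x \in O_x \wedge y \in O_y \wedge O_x \cap O_y = \emptyset \big) \Big). \]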


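(c) (X, T ) is called a T3 space if, and only if, points and closed sets can be separated by disjoint open sets, i.e. if, and only if,

\[ \forall_{x\in X}\ \forall_{C\subseteq X,\ C\ \text{closed}}\ \Big( x \notin C \ \Rightarrow\ \exists_{O_x,O_C\in\mathcal{T}}\ \big( x \in O_x \wedge C \subseteq O_C \wedge O_x \cap O_C = \emptyset \big) \Big). \]

(X, T ) is called regular if, and only if, it is both T1 and T3.⁶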

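(d) (X, T ) is called a T4 space if, and only if, disjoint closed sets can be separated by disjoint open sets, i.e. if, and only if,

\[ \forall_{C_1,C_2\subseteq X,\ C_1,C_2\ \text{closed}}\ \Big( C_1 \cap C_2 = \emptyset \ \Rightarrow\ \exists_{O_1,O_2\in\mathcal{T}}\ \big( C_1 \subseteq O_1 \wedge C_2 \subseteq O_2 \wedge O_1 \cap O_2 = \emptyset \big) \Big). \]

(X, T ) is called normal if, and only if, it is both T1 and T4.⁷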

Lemma 3.2. Let (X, T ) be a topological space. Then we have the following implications of separation properties:

(a) T2 implies T1.

(b) Regular implies T1, T2, T3.

(c) Normal implies T1, T2, T3, T4.

Proof. (a) is immediate and so are (b) and (c) (since points are closed in regular spaces as well as in normal spaces). □

Proposition 3.3. Let (X, T ) be a topological space. Then the following statements are equivalent:

(i) (X, T ) is a T2 space.

(ii) Limits of nets in X are unique, i.e. for each net (xi)i∈I in X it holds that if (xi)i∈I converges to both x ∈ X and y ∈ X, then x = y.


Example 3.4. (a) Clearly, indiscrete spaces (X, T ) with at least two distinct points are not T1. However, every indiscrete space is both T3 and T4 (since ∅ and X are the only closed subsets of X, the conditions of Def. 3.1(c),(d) are trivially satisfied).

(b) Let (X, T ) be a cofinite space. Then it is always T1, since, for each x ∈ X, X \ {x} is open, hence {x} is closed. However, if X is infinite, then (X, T ) is not T2 (exercise).

⁶ Caveat: Unfortunately, about half the literature switches the meaning of regular and T3.

⁷ Caveat: Unfortunately, about half the literature switches the meaning of normal and T4.


(c) Every metric space is normal (in particular, as a consequence of Prop. 3.3, limits in metric spaces are unique): If (X, T ) is a topological space, where T is induced by the metric d on X, then (X, T ) is T1 and T4: Let x, y ∈ X with x ≠ y. Then r := d(x, y) > 0 and y ∉ Br(x), x ∉ Br(y), showing that (X, T ) is T1. To show that (X, T ) is T4, let A, B ⊆ X be closed with A ∩ B = ∅. Recalling the continuous distance functions from Ex. 2.6(b), define dA := dist(·, A), dB := dist(·, B),

\[ O_A := \{x \in X : d_A(x) < d_B(x)\}, \qquad O_B := \{x \in X : d_B(x) < d_A(x)\}. \]

We claim that OA and OB are open sets that separate A and B: Indeed, OA ∩ OB = ∅ is immediate. Suppose x ∈ Ac. Since A is closed, Ac is open and Br(x) ⊆ Ac for some r > 0. Thus, dA(x) ≥ r > 0. Analogously, dB(x) > 0 for each x ∈ Bc. Since A ⊆ Bc and dA vanishes on A, we obtain dA(x) = 0 < dB(x) for each x ∈ A and, likewise, dB(x) = 0 < dA(x) for each x ∈ B. In consequence, A ⊆ OA and B ⊆ OB. It remains to show that OA, OB ∈ T . We can write

\[ O_A = \bigcup_{s\in\mathbb{R}^+} A_s, \qquad O_B = \bigcup_{s\in\mathbb{R}^+} B_s, \]

where

\[ \forall_{s\in\mathbb{R}^+}\quad A_s := \{x \in X : d_A(x) < s < d_B(x)\} = d_A^{-1}\big(]-\infty,s[\big) \cap d_B^{-1}\big(]s,\infty[\big), \]

\[ B_s := \{x \in X : d_B(x) < s < d_A(x)\} = d_B^{-1}\big(]-\infty,s[\big) \cap d_A^{-1}\big(]s,\infty[\big). \]

Due to the continuity of dA and dB, we have As, Bs ∈ T for each s ∈ R+, proving OA, OB ∈ T as well.

—

Further counterexamples regarding implications for separation properties are provided in Appendix Sec. G.
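On a finite set, the conditions of Def. 3.1 can be checked by brute force. The following Python sketch is not part of the original notes; the function names and the three sample topologies on {0, 1} are our own choices for illustration. It encodes a topology as a set of frozensets and tests T1, T2, T3, T4 literally as stated in Def. 3.1.

```python
from itertools import combinations

def is_topology(X, T):
    """Check the topology axioms for a finite family T of frozensets on X."""
    X = frozenset(X)
    if frozenset() not in T or X not in T:
        return False
    return all((U | V) in T and (U & V) in T for U in T for V in T)

def is_T1(X, T):
    # Def. 3.1(a): for x != y there is an open set containing y but not x.
    return all(any(y in O and x not in O for O in T)
               for x in X for y in X if x != y)

def is_T2(X, T):
    # Def. 3.1(b): distinct points have disjoint open neighborhoods.
    return all(any(x in U and y in V and not (U & V) for U in T for V in T)
               for x, y in combinations(X, 2))

def is_T3(X, T):
    # Def. 3.1(c): a point and a closed set not containing it can be separated.
    X = frozenset(X)
    closed = [X - O for O in T]
    return all(any(x in U and C <= V and not (U & V) for U in T for V in T)
               for C in closed for x in X if x not in C)

def is_T4(X, T):
    # Def. 3.1(d): disjoint closed sets can be separated.
    X = frozenset(X)
    closed = [X - O for O in T]
    return all(any(C1 <= U and C2 <= V and not (U & V) for U in T for V in T)
               for C1 in closed for C2 in closed if not (C1 & C2))

# Hypothetical sample spaces on the two-point set {0, 1}.
X = {0, 1}
indiscrete = {frozenset(), frozenset(X)}
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset(X)}
sierpinski = {frozenset(), frozenset({1}), frozenset(X)}

for name, T in [("indiscrete", indiscrete), ("discrete", discrete),
                ("sierpinski", sierpinski)]:
    print(name, is_T1(X, T), is_T2(X, T), is_T3(X, T), is_T4(X, T))
```

For the indiscrete topology, the check reproduces Ex. 3.4(a): the space is T3 and T4, but not T1.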

Proposition 3.5. (a) T1, T2, T3 are inherited by subspaces (but cf. Ex. G.1(d)): Let (X, T ) be a topological space, M ⊆ X. Let TM denote the relative topology on M . If (X, T ) is Tn, where n ∈ {1, 2, 3}, then (M, TM ) is Tn as well.

(b) T1, T2, T3 are inherited by product spaces (but cf. Ex. G.1(e)): Consider nonempty topological spaces (Xi, Ti), i ∈ I, and X := ∏i∈I Xi with the product topology T (cf. Ex. 1.53). Then (X, T ) is Tn, where n ∈ {1, 2, 3}, if, and only if, each (Xi, Ti), i ∈ I, is Tn.

Proof. (a): Let (X, T ) be T1 (resp. T2), x, y ∈ M with x ≠ y. Then there are Ox, Oy ∈ T such that x ∈ Ox, y ∈ Oy, and x ∉ Oy, y ∉ Ox (resp. Ox ∩ Oy = ∅). Let Mx := M ∩ Ox, My := M ∩ Oy. Then Mx, My ∈ TM , x ∈ Mx, y ∈ My, and x ∉ My, y ∉ Mx (resp. Mx ∩ My = ∅), showing (M, TM ) is T1 (resp. T2). Now assume (X, T ) to be T3, let x ∈ M and let A ⊆ X be closed, satisfying x ∉ AM := A ∩ M . Then there are Ox, OA ∈ T such that x ∈ Ox, A ⊆ OA, Ox ∩ OA = ∅. Let Mx := M ∩ Ox, MA := M ∩ OA. Then Mx, MA ∈ TM , x ∈ Mx, AM ⊆ MA, and Mx ∩ MA = ∅, showing (M, TM ) is T3.

(b): Since X ≠ ∅, there exists x = (xi)i∈I ∈ X. Fix j ∈ I. Let M := ∏i∈I Ai, where Ai := {xi} for i ≠ j and Aj := Xj . Then (M, TM ) is homeomorphic to (Xj, Tj): If ι : M −→ X is the identity inclusion map and πj : X −→ Xj is the projection on Xj, then, clearly, f := πj ◦ ι is continuous and bijective. Moreover, {M ∩ π_j^{-1}(O) : O ∈ Tj} is a subbase of TM and, for each O ∈ Tj, f(M ∩ π_j^{-1}(O)) = O, showing f^{-1} to be continuous and f to be a homeomorphism. Thus, if (X, T ) is Tn, n ∈ {1, 2, 3}, then (Xj, Tj) is Tn by (a) and Prop. I.1(h).

Conversely, let each (Xi, Ti) be T1 (resp. T2), x, y ∈ X with x ≠ y. Then there exists j ∈ I with xj ≠ yj, and Oj,x, Oj,y ∈ Tj such that xj ∈ Oj,x, yj ∈ Oj,y, and xj ∉ Oj,y, yj ∉ Oj,x (resp. Oj,x ∩ Oj,y = ∅). Let Ox := π_j^{-1}(Oj,x), Oy := π_j^{-1}(Oj,y). Then Ox, Oy ∈ T , x ∈ Ox, y ∈ Oy, and x ∉ Oy, y ∉ Ox (resp. Ox ∩ Oy = ∅), showing (X, T ) is T1 (resp. T2). Now assume each (Xi, Ti) to be T3, let x ∈ X and let A ⊆ X be closed, satisfying x ∉ A. Since x ∈ O := X \ A and O ∈ T , there is a finite J ⊆ I such that x ∈ B := ⋂j∈J π_j^{-1}(Oj) ⊆ O, each Oj ∈ Tj. Then, for each j ∈ J , xj ∉ Aj := Xj \ Oj and each Aj is closed. As each (Xj, Tj) is T3, there are Oj,x, Oj,A ∈ Tj such that xj ∈ Oj,x, Aj ⊆ Oj,A, Oj,x ∩ Oj,A = ∅. Let Ox := ⋂j∈J π_j^{-1}(Oj,x), OA := ⋃j∈J π_j^{-1}(Oj,A). Then Ox, OA ∈ T , x ∈ Ox, A ⊆ OA (a ∈ A implies a ∉ B, which implies aj0 ∈ Aj0 for some j0 ∈ J , which implies a ∈ OA), and Ox ∩ OA = ∅, showing (X, T ) is T3. □

Theorem 3.6 (Tietze-Urysohn). Let (X, T ) be a topological space. Then the following statements are equivalent:

(i) (X, T ) is T4.

(ii) If ∅ ≠ A, B ⊆ X are arbitrary closed sets with A ∩ B = ∅, and if a, b ∈ R, a < b, then there exists a continuous function f : X −→ [a, b] such that f↾A ≡ a and f↾B ≡ b.

(iii) If ∅ ≠ A ⊆ X is an arbitrary closed set, a, b ∈ R, a < b, and f : A −→ [a, b] is continuous, then f can be continuously extended to X, i.e. there exists a continuous g : X −→ [a, b] such that g↾A = f .

Proof. See, e.g., [Pre75, Th. 4.5.2, Th. 4.5.4] or [RF10, Sec. 12.1]. □
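In the special case of a metric space (X, d), a function as in Th. 3.6(ii) can be written down explicitly. The following sketch is not taken from the cited proofs; it merely reuses the continuous distance functions dA = dist(·, A) and dB = dist(·, B) from Ex. 3.4(c), for nonempty disjoint closed sets A, B ⊆ X:

```latex
% Explicit separating function in a metric space (illustration only).
% Since A and B are closed and disjoint, d_A(x) = 0 iff x \in A and
% d_B(x) = 0 iff x \in B, so the denominator never vanishes.
\[
  f : X \longrightarrow [a,b], \qquad
  f(x) := a + (b-a)\,\frac{d_A(x)}{d_A(x) + d_B(x)} .
\]
% f is continuous as a quotient of continuous functions with positive
% denominator, and
\[
  x \in A \ \Rightarrow\ f(x) = a, \qquad
  x \in B \ \Rightarrow\ f(x) = a + (b-a) = b .
\]
```

In a general T4 space, no metric is available, which is why the proofs cited above need a different construction.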

3.2 Compactness

In [Phi16, Def. 7.42(c)], we defined a subset C of K to be compact if, and only if, C was closed and bounded; and we saw in [Phi16, Th. 7.48] that compactness of C was equivalent to every sequence in C having a convergent subsequence. The appropriate definition of compactness in general topological spaces looks quite different, at least at first glance (see Def. 3.7 below). It does turn out to be equivalent to our old definition in K with its standard topology. Even in Kn, a set is still compact if, and only if, it is closed and bounded (see Cor. 3.16(c)). However, in infinite-dimensional normed vector spaces, this is no longer true (see Th. 3.18). In general metric spaces, it is at least still true that a set C is compact if, and only if, every sequence in C has a convergent subsequence (see Th. 3.14). In general topological spaces this still remains true if one replaces sequences with nets (see Th. 3.8); however, sequences, in general, no longer suffice (see Caveat 3.15).


Definition 3.7. Let (X, T ) be a topological space, C ⊆ X. We call a family of open sets (Oi)i∈I , Oi ∈ T , an open cover of C if, and only if,

\[ C \subseteq \bigcup_{i\in I} O_i. \tag{3.1} \]

We call C compact if, and only if, every open cover of C has a finite subcover, i.e. if (Oi)i∈I is an open cover of C, then there exist i1, . . . , iN ∈ I, N ∈ N, such that C ⊆ ⋃_{k=1}^{N} O_{i_k}.
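As a simple illustration of the definition (our own example, not part of the original notes), the open interval ]0, 1[ with the usual topology of R is not compact, since the following open cover admits no finite subcover:

```latex
% An open cover of ]0,1[ without finite subcover (illustration only).
\[
  O_n := \left] \tfrac{1}{n},\, 1 \right[ \quad (n \in \mathbb{N},\ n \ge 2),
  \qquad
  \bigcup_{n \ge 2} O_n = \left] 0, 1 \right[ .
\]
% For finitely many indices n_1, \dots, n_N with maximum M, the sets are nested:
\[
  \bigcup_{k=1}^{N} O_{n_k} = \left] \tfrac{1}{M},\, 1 \right[
  \ \not\ni\ \tfrac{1}{M} \in \left] 0, 1 \right[ .
\]
```

Note that the same family does not cover [0, 1], as 0 lies in none of the On; this is consistent with [0, 1] being compact.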

Theorem 3.8. Let (X, T ) be a topological space, C ⊆ X. Then the following statements are equivalent:

(i) C is compact.

(ii) C has the finite intersection property, i.e. if (Ai)i∈I is a family of closed subsets of X such that C ∩ ⋂i∈I Ai = ∅, then there exist i1, . . . , iN ∈ I, N ∈ N, such that C ∩ ⋂_{k=1}^{N} A_{i_k} = ∅.

(iii) Every net in C has a subnet that converges in C.

Proof. We show “(i)⇔(ii)”, “(ii)⇒(iii)”, and “(iii)⇒(i)”.

“(i)⇔(ii)”: (Oi)i∈I is an open cover of C if, and only if, Ai := X \ Oi are closed sets satisfying

\[ X \setminus C \supseteq \bigcap_{i\in I} A_i \quad\Leftrightarrow\quad C \cap \bigcap_{i\in I} A_i = \emptyset. \]

Thus, the (Oi)i∈I have a finite subcover (Oi1 , . . . , OiN ) of C if, and only if, there are i1, . . . , iN ∈ I such that C ∩ ⋂_{k=1}^{N} A_{i_k} = ∅.

“(ii)⇒(iii)”: Let (ci)i∈I be a net in C. For each i ∈ I, let Ai := cl{cj : j ≥ i}. Consider i1, . . . , iN ∈ I, N ∈ N. Then, since I is a directed set, there is i0 ∈ I satisfying i0 ≥ ik for each k = 1, . . . , N , implying

\[ c_{i_0} \in C \cap \bigcap_{k=1}^{N} A_{i_k}, \quad\text{i.e.}\quad C \cap \bigcap_{k=1}^{N} A_{i_k} \ne \emptyset. \]

As we assume (ii), this now implies there exists

\[ c \in C \cap \bigcap_{i\in I} A_i. \tag{3.2} \]

We proceed to construct a subnet of (ci)i∈I that converges to c: We define J := {(U, i) : U ∈ U(c), ci ∈ U}. Due to (3.2), for each i ∈ I, c is in the closure of {cj : j ≥ i}, implying, for each U ∈ U(c), the existence of cU ∈ U ∩ {cj : j ≥ i}, also showing J ≠ ∅. We make J into a directed set by letting

\[ (U,i) \le (V,j) \ :\Leftrightarrow\ \big( U \supseteq V \wedge i \le j \big). \]

Clearly, ≤ is reflexive and transitive on J . Given (U, i), (V, j) ∈ J , let M ∈ I be such that M ≥ i, j. Then there is M0 ≥ M such that cM0 ∈ U ∩ V ∩ {cj : j ≥ M}. Thus, (U ∩ V, M0) ∈ J and (U ∩ V, M0) ≥ (U, i), (V, j), proving J to be a directed set. Then φ : J −→ I, φ(U, i) := i, is final, since, for each i ∈ I and U ∈ U(c), there exists i0 ≥ i such that (U, i0) ∈ J . Then

\[ \forall_{(V,j)\ge(U,i_0)}\quad \phi(V,j) = j \ge i_0 \ge i, \]

proving φ to be final. Thus (cφ(U,i))(U,i)∈J is a subnet of (ci)i∈I . It merely remains to verify lim(U,i)∈J cφ(U,i) = c. However, if U ∈ U(c) and (U, i) ∈ J , then, for each (V, j) ∈ J with (V, j) ≥ (U, i), one has cφ(V,j) = cj ∈ V ⊆ U , which establishes the case.

“(iii)⇒(i)”: Seeking a contradiction, let (Oi)i∈I be an open cover of C that does not admit a finite subcover. If J denotes the set of all finite subsets of I, then J is directed by ⊆. If, for each K ∈ J , we choose cK ∈ C \ ⋃i∈K Oi, then (cK)K∈J defines a net in C. According to (iii), (cK)K∈J has a subnet (cφ(l))l∈L, where φ : L −→ J is final, and liml∈L cφ(l) = c ∈ C. Since (Oi)i∈I is an open cover of C, there exists i0 ∈ I such that Oi0 ∈ U(c). Then, for each l ∈ L such that φ(l) ≥ {i0}, one has cφ(l) ∉ Oi0 , in contradiction to liml∈L cφ(l) = c. □

Proposition 3.9. Let (X, T ) be a topological space, C ⊆ X.

(a) If C is compact and A ⊆ C is closed, then A is compact.

(b) If C is compact and X is T2, then C is closed.

Proof. (a): If (xi)i∈I is a net in A, then (xi)i∈I is a net in C. Since C is compact, it must have a subnet that converges to some c ∈ C. However, as A is closed, c must be in A, showing that (xi)i∈I has a subnet that converges to some c ∈ A, i.e. A is compact.

(b): Let (xi)i∈I be a net in C that converges in X, i.e. limi∈I xi = x ∈ X. Since C is compact, (xi)i∈I must have a subnet that converges to some c ∈ C. Since the subnet also converges to x and since limits are unique in T2 spaces by Prop. 3.3, x = c ∈ C, showing C is closed. □

Example 3.10. (a) Clearly, finite sets are always compact.

(b) If (X, T ) is a cofinite topological space, then every C ⊆ X is compact (if C is infinite and C ≠ X, then it is not closed, showing that a compact subset of a T1 space does not need to be closed): Let (Oi)i∈I be an open cover of C, where we may assume C ≠ ∅, and let i0 ∈ I be such that Oi0 ≠ ∅. Then A := C \ Oi0 is finite. For each a ∈ A, there is ia ∈ I such that a ∈ Oia . Thus, letting J := {i0} ∪ {ia : a ∈ A}, (Oi)i∈J is a finite subcover of (Oi)i∈I .

Proposition 3.11. Let (X, T ) be a topological space.

(a) Unions of finitely many compact subsets of X are compact.

(b) If X is T2, then arbitrary intersections of compact subsets of X are compact.


Proof. (a): It suffices to consider two compact sets C1, C2 ⊆ X (then the general case follows by induction). Let (Oi)i∈I be an open cover of C := C1 ∪ C2. Then (Oi)i∈I constitutes an open cover of both C1 and C2. As C1, C2 are compact, there are finite K, L ⊆ I such that (Oi)i∈K still covers C1 and (Oi)i∈L still covers C2. Then (Oi)i∈K∪L forms a finite cover of C.

(b): If (Ci)i∈I , I ≠ ∅, is a family of compact subsets of a T2 space, then each Ci is closed by Prop. 3.9(b). Thus, C := ⋂i∈I Ci is a closed subset of a compact set and, thus, compact by Prop. 3.9(a). □

N (with the discrete topology) already shows that infinite unions of compact sets need not be compact. If (X, T ) is not T2, then, in general, not even intersections of two compact sets need to be compact (see Ex. H.1 of the Appendix).

Theorem 3.12. If (X, TX) and (Y, TY ) are topological spaces, C ⊆ X is compact, and f : C −→ Y is continuous, then f(C) is compact.

Proof. If (yi)i∈I is a net in f(C), then, for each i ∈ I, there is some xi ∈ C such that f(xi) = yi. As C is compact, there is a subnet (aj)j∈J of (xi)i∈I with limj∈J aj = a for some a ∈ C. Then (f(aj))j∈J is a subnet of (yi)i∈I and the continuity of f yields limj∈J f(aj) = f(a) ∈ f(C), showing that (yi)i∈I has a convergent subnet with limit in f(C). We have therefore established that f(C) is compact. □

Definition 3.13. A subset A of a metric space (X, d) is called precompact or totally bounded if, and only if, for each ε > 0, A can be covered by finitely many ε-balls, i.e. if, and only if, there exist finitely many points a1, . . . , aN ∈ A, N ∈ N, such that

\[ A \subseteq \bigcup_{j=1}^{N} B_\epsilon(a_j). \tag{3.3} \]

Theorem 3.14. For a subset C of a metric space (X, d), the following statements are equivalent:

(i) C is compact as defined in Def. 3.7.

(ii) Every sequence in C has a subsequence that converges to some limit c ∈ C.

(iii) C is precompact (i.e. totally bounded) as defined in Def. 3.13 and complete, i.e. every Cauchy sequence in C converges to a limit in C.

Proof. We show (i) ⇒ (ii) ⇒ (iii) ⇒ (i).

“(i) ⇒ (ii)”: Assume C is compact. Seeking a contradiction, assume there exists a sequence (cn)n∈N in C such that no subsequence of (cn)n∈N converges to a limit in C. According to Lem. 1.41, no c ∈ C can be a cluster point of A := {cn : n ∈ N}. Thus, by Rem. 1.42(a), for each c ∈ C, there exists εc > 0 such that Bεc(c) contains only finitely many of the cn. Since C ⊆ ⋃c∈C Bεc(c), the family (Bεc(c))c∈C constitutes an open cover of C. As C is compact, there exist finitely many points a1, . . . , aN ∈ C, N ∈ N, such that C ⊆ ⋃_{j=1}^{N} B_{ε_{a_j}}(a_j), i.e. C contains only finitely many of the cn, in contradiction to (cn)n∈N being a sequence in C.

“(ii) ⇒ (iii)”: Let (cn)n∈N be a Cauchy sequence in C. By (ii), (cn)n∈N has a subsequence (cnj )j∈N such that limj→∞ cnj = c ∈ C. Given ε > 0 choose K ∈ N such that, for each m, n ≥ K, d(cm, cn) < ε/2, and such that, for each nj ≥ K, d(cnj , c) < ε/2. Then, fixing some nj ≥ K,

\[ \forall_{n\ge K}\quad d(c_n,c) \le d(c_n,c_{n_j}) + d(c_{n_j},c) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \]

showing limn→∞ cn = c and the completeness of C. We now show C to be also totally bounded. We proceed by contraposition and assume C not to be totally bounded, i.e. there exists ε > 0 such that C is not contained in any finite union of ε-balls. Inductively, we construct a sequence (cn)n∈N in C such that

\[ \forall_{m,n\in\mathbb{N},\ m\ne n}\quad d(c_m,c_n) \ge \epsilon. \tag{3.4} \]

To start with, we note C ≠ ∅ and choose some arbitrary c1 ∈ C. Assuming c1, . . . , ck ∈ C, k ∈ N, have already been constructed such that d(cm, cn) ≥ ε holds for each m, n ∈ {1, . . . , k} with m ≠ n, there must be

\[ c \in C \setminus \bigcup_{j=1}^{k} B_\epsilon(c_j). \tag{3.5} \]

Choosing ck+1 := c, (3.5) guarantees (3.4) now holds for each m, n ∈ {1, . . . , k + 1}. Due to (3.4), no subsequence of (cn)n∈N can be a Cauchy sequence, i.e. (cn)n∈N does not have a convergent subsequence, proving (ii) does not hold.

“(iii) ⇒ (i)”: Assume C to be precompact and complete. For each k ∈ N, the precompactness yields points c^k_1, . . . , c^k_{N_k} ∈ C, N_k ∈ N, such that

\[ C \subseteq \bigcup_{j=1}^{N_k} B_{1/k}(c^k_j). \tag{3.6} \]

Seeking a contradiction, assume there exists an open cover (Oj)j∈I of C which does not have a finite subcover. Inductively, we construct a decreasing sequence of subsets Ck of C, C ⊇ C1 ⊇ C2 ⊇ . . . , such that no Ck can be covered by a finite subcover of (Oj)j∈I and such that

\[ \forall_{k\in\mathbb{N}}\ \exists_{j\in\{1,\dots,N_k\}}\quad C_k \subseteq B_{1/k}(c^k_j). \]

To start out, we note that (3.6) implies at least one of the finitely many sets C ∩ B_1(c^1_1), . . . , C ∩ B_1(c^1_{N_1}) cannot be covered by a finite subcover of (Oj)j∈I , say, C ∩ B_1(c^1_{j_1}). Define C1 := C ∩ B_1(c^1_{j_1}). Then, given C1, . . . , Ck have already been constructed for some k ∈ N, since Ck cannot be covered by a finite subcover of (Oj)j∈I and

\[ C_k \subseteq C \subseteq \bigcup_{j=1}^{N_{k+1}} B_{1/(k+1)}(c^{k+1}_j), \]

there exists j_{k+1} ∈ {1, . . . , N_{k+1}} such that Ck ∩ B_{1/(k+1)}(c^{k+1}_{j_{k+1}}) cannot be covered by a finite subcover of (Oj)j∈I , either. Define C_{k+1} := Ck ∩ B_{1/(k+1)}(c^{k+1}_{j_{k+1}}). For each k ∈ N, choose some sk ∈ Ck (note Ck ≠ ∅, as it cannot be covered by finitely many Oj). Given ε > 0, there is K ∈ N such that 2/K < ε. If k, l ≥ K, then sk, sl ∈ CK ⊆ B_{1/K}(c^K_j) for some suitable j ∈ {1, . . . , N_K}. In particular, d(sk, sl) < 2/K < ε, showing (sk)k∈N is a Cauchy sequence. As (sk)k∈N is a Cauchy sequence in C and C is complete, there exists c ∈ C such that limk→∞ sk = c. However, then there must exist some j ∈ I such that c ∈ Oj and, since Oj is open, there is ε > 0 with Bε(c) ⊆ Oj, and Bε(c) must contain almost all of the sk. Choose k sufficiently large such that 1/k < ε/4 and d(sk, c) < ε/2. Then, since sk ∈ Ck ⊆ B_{1/k}(c^k_j), one has

\[ \forall_{x\in B_{1/k}(c^k_j)}\quad d(x,c) \le d(x,s_k) + d(s_k,c) < \frac{2}{k} + \frac{\epsilon}{2} < \frac{2\epsilon}{4} + \frac{\epsilon}{2} = \epsilon, \]

showing Ck ⊆ B_{1/k}(c^k_j) ⊆ Bε(c) ⊆ Oj, in contradiction to Ck not being coverable by finitely many Oj. □
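The inductive construction around (3.4) and (3.5) can be tried out on a finite sample of points. The following Python sketch is our own illustration, not part of the original notes: the function name `greedy_eps_net`, the random sample in the unit square, and the value of ε are assumptions. A point becomes a new center if, and only if, it lies in none of the ε-balls around the centers chosen so far, so the centers satisfy (3.4), and, once the finite sample is exhausted, the ε-balls around the centers cover the sample as in (3.3).

```python
import math
import random

def greedy_eps_net(points, eps, dist=math.dist):
    """Greedy version of the construction in (3.4), (3.5):
    a point is added as a center iff it has distance >= eps
    from all centers chosen so far."""
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

# Hypothetical sample: 500 pseudo-random points in the unit square.
random.seed(0)  # fixed seed, only to make the illustration reproducible
sample = [(random.random(), random.random()) for _ in range(500)]
eps = 0.25
net = greedy_eps_net(sample, eps)

# Centers are pairwise at distance >= eps, cf. (3.4).
separated = all(math.dist(a, b) >= eps
                for i, a in enumerate(net) for b in net[i + 1:])
# Every sample point lies in some open eps-ball around a center, cf. (3.3).
covered = all(any(math.dist(p, c) < eps for c in net) for p in sample)
print(len(net), separated, covered)
```

On a set that is not totally bounded, the same procedure would keep producing new centers forever, which is precisely the sequence without Cauchy subsequence used in the proof.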

Caveat 3.15. A subset C of a topological space is defined to be sequentially compact if, and only if, every sequence in C has a convergent subsequence. Using this terminology, one can rephrase the equivalence between (ii) and (i) in Th. 3.14 by stating that a metric space is sequentially compact if, and only if, it is compact. However, in general topological spaces, neither implication remains true ((iii) of Th. 3.14 does not even make sense in general topological spaces, as the concepts of boundedness, total boundedness, and Cauchy sequences are, in general, not available): For an example of a topological space that is compact, but not sequentially compact, see, e.g. [Pre75, 7.2.10(a)]; for an example of a topological space that is sequentially compact, but not compact, see, e.g. [Pre75, 7.2.10(c)].

Corollary 3.16. (a) Let (X, d) be a metric space, C, A ⊆ X. If C is compact, A is closed, and A ∩ C = ∅, then dist(C, A) > 0.

(b) A compact subset C of a metric space is closed and bounded.

(c) Heine-Borel Theorem: A subset C of Kn, n ∈ N, is compact if, and only if, C is closed and bounded.

Proof. (a): Proceeding by contraposition, we show that dist(C, A) = 0 implies A ∩ C ≠ ∅. If dist(C, A) = 0, then there exists a sequence ((ck, ak))k∈N in C × A such that limk→∞ d(ck, ak) = 0. As C is compact, we may assume (after passing to a subsequence) limk→∞ ck = c ∈ C, also implying

\[ \lim_{k\to\infty} a_k = c, \quad\text{since}\quad \forall_{k\in\mathbb{N}}\ \ d(a_k,c) \le d(a_k,c_k) + d(c_k,c). \tag{3.7} \]

Since A is closed, (3.7) yields c ∈ A, i.e. c ∈ A ∩ C.

(b): Let C be a compact subset of a metric space X. Since X is T2 by Ex. 3.4(c), C is closed by Prop. 3.9(b). Since C is totally bounded by Th. 3.14(iii), clearly, C is also bounded.


(c): If C is compact, then it is closed and bounded by (b). If C is closed and bounded, and (xk)k∈N is a sequence in C, then the boundedness and the Bolzano-Weierstrass Th. 1.31 yield a subsequence that converges to some x ∈ Kn. However, since C is closed, x ∈ C, showing that C is compact. □

The following Ex. 3.17 and Th. 3.18 show that, in general, sets in metric spaces can be closed and bounded without being compact.

Example 3.17. If (X, d) is a noncomplete metric space, then it contains a Cauchy sequence that does not converge. It is not hard to see that such a sequence cannot have a convergent subsequence, either. This shows that no noncomplete metric space can be compact. Moreover, the closure of every bounded subset of X that contains such a nonconvergent Cauchy sequence is an example of a closed and bounded set that is noncompact. Concrete examples are given by Q ∩ [a, b] for each a, b ∈ R with a < b (these sets are Q-closed, but not R-closed!) and ]a, b[ for each a, b ∈ R with a < b, in each case endowed with the usual metric d(x, y) := |x − y|.

—

In the previous example, the compactness of the closed and bounded sets failed due to noncompleteness. However, even in Banach spaces, there can be closed and bounded sets that are noncompact. In fact, according to the following Th. 3.18, the closed unit ball in a normed vector space X is compact if, and only if, X is finite-dimensional.

Theorem 3.18. A normed vector space (X, ‖ · ‖) over K is finite-dimensional if, and only if, its closed unit ball B1(0) is compact.

Proof. The proof is provided in Sec. H.2 of the Appendix. □

Theorem 3.19. If (X, T ) is a topological space, C ⊆ X is compact, and f : C −→ R is continuous, then f assumes its max and its min, i.e. there are xm ∈ C and xM ∈ C such that f(xm) ≤ f(x) ≤ f(xM) for each x ∈ C.

Proof. Since C is compact and f is continuous, f(C) ⊆ R is compact according to Th. 3.12. Then, by [Phi16, Lem. 7.53], f(C) contains a smallest element m and a largest element M . This, in turn, implies that there are xm, xM ∈ C such that f(xm) = m and f(xM) = M . □

Theorem 3.20. If (X, dX) and (Y, dY ) are metric spaces, C ⊆ X is compact, and f : C −→ Y is continuous, then f is uniformly continuous.

Proof. If f is not uniformly continuous, then there must be some ε > 0 such that, for each k ∈ N, there exist x^k, y^k ∈ C satisfying dX(x^k, y^k) < 1/k and dY (f(x^k), f(y^k)) ≥ ε. Since C is compact, there is a ∈ C and a subsequence (a^k)k∈N of (x^k)k∈N such that a = limk→∞ a^k. Then there is a corresponding subsequence (b^k)k∈N of (y^k)k∈N such that dX(a^k, b^k) < 1/k and dY (f(a^k), f(b^k)) ≥ ε for all k ∈ N. Using the compactness of C again, there is b ∈ C and a subsequence (v^k)k∈N of (b^k)k∈N such that b = limk→∞ v^k. Now there is a corresponding subsequence (u^k)k∈N of (a^k)k∈N such that dX(u^k, v^k) < 1/k and dY (f(u^k), f(v^k)) ≥ ε for all k ∈ N. Note that we still have a = limk→∞ u^k. Given α > 0, there is N ∈ N such that, for each k > N , one has dX(a, u^k) < α/3, dX(b, v^k) < α/3, and dX(u^k, v^k) < 1/k < α/3. Thus, dX(a, b) ≤ dX(a, u^k) + dX(u^k, v^k) + dX(b, v^k) < α, implying dX(a, b) = 0 and a = b. Finally, the continuity of f implies f(a) = limk→∞ f(u^k) = limk→∞ f(v^k) in contradiction to dY (f(u^k), f(v^k)) ≥ ε. □
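The compactness hypothesis in Th. 3.20 cannot be dropped. As a standard illustration (our own addition, not part of the original notes), consider the noncompact set ]0, 1] and the continuous function x ↦ 1/x; the sequences below play exactly the role of (x^k) and (y^k) in the proof, written here with lower indices since they are sequences of real numbers:

```latex
% A continuous function on ]0,1] that is not uniformly continuous
% (illustration only).
\[
  f : \left]0,1\right] \longrightarrow \mathbb{R}, \quad f(x) := \frac{1}{x},
  \qquad x_k := \frac{1}{k}, \quad y_k := \frac{1}{k+1} \quad (k \in \mathbb{N}).
\]
\[
  |x_k - y_k| = \frac{1}{k(k+1)} < \frac{1}{k},
  \qquad
  |f(x_k) - f(y_k)| = |k - (k+1)| = 1 .
\]
```

Thus, the condition of uniform continuity fails for ε = 1. In contrast to the situation of the proof, (x_k) has no subsequence converging in ]0, 1], since its only possible limit 0 is missing from the set.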

Theorem 3.21 (Lebesgue Number). Let (X, d) be a metric space and C ⊆ X. If C is compact and (Oj)j∈I is an open cover of C, then there exists a Lebesgue number δ for the open cover, i.e. some δ > 0 such that, for each A ⊆ C with diam A < δ, there exists j0 ∈ I with A ⊆ Oj0 .

Proof. Seeking a contradiction, assume there is no Lebesgue number for the open cover (Oj)j∈I . Then there are sequences (xk)k∈N in C and (Ak)k∈N in P(C) such that

\[ \forall_{k\in\mathbb{N}}\ \Big( x_k \in A_k,\quad \operatorname{diam} A_k < \frac{1}{k},\quad\text{and}\quad \forall_{j\in I}\ A_k \not\subseteq O_j \Big). \tag{3.8} \]

As C is compact, we may assume that limk→∞ xk = c ∈ C. Then there must be Oj such that c ∈ Oj and ε > 0 such that Bε(c) ⊆ Oj. If k ∈ N is such that 1/k < ε/2 and d(xk, c) < ε/2, then, for each a ∈ Ak, we have d(a, c) ≤ d(a, xk) + d(xk, c) < ε/2 + ε/2 = ε, implying the contradiction Ak ⊆ Bε(c) ⊆ Oj. □
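For a concrete cover, a Lebesgue number can often be found by hand. Take C = [0, 1] with the cover O1 := ]−1, 2/3[, O2 := ]1/3, 2[ (our own example, not from the original notes). Then δ = 1/3 works: if A ⊆ C with diam A < 1/3 contains a point ≥ 2/3, then all points of A are > 1/3 and A ⊆ O2; otherwise, A ⊆ [0, 2/3[ ⊆ O1. The set {1/3, 2/3} shows that no δ > 1/3 works. The following Python sketch merely tests the condition on two-point subsets of a finite grid with exact rational arithmetic, so a positive result is supporting evidence rather than a proof; the function name and the grid are assumptions made for illustration.

```python
from fractions import Fraction as F

def is_lebesgue_number_on_grid(delta, cover, grid):
    """Test the condition of Th. 3.21 for all sets {x, y} of grid points
    with |x - y| < delta. cover is a list of open intervals (lo, hi)."""
    def inside(x, interval):
        lo, hi = interval
        return lo < x < hi
    return all(
        any(inside(x, O) and inside(y, O) for O in cover)
        for x in grid for y in grid
        if abs(x - y) < delta
    )

# Hypothetical data: the cover O1 = ]-1, 2/3[, O2 = ]1/3, 2[ of [0, 1]
# and the grid {0, 1/30, 2/30, ..., 1}.
cover = [(F(-1), F(2, 3)), (F(1, 3), F(2))]
grid = [F(k, 30) for k in range(31)]

print(is_lebesgue_number_on_grid(F(1, 3), cover, grid))
print(is_lebesgue_number_on_grid(F(2, 5), cover, grid))
```

The second call fails because of the pair 1/3, 2/3, which has distance 1/3 < 2/5 but lies in neither O1 nor O2.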

Theorem 3.22. If (X, TX) and (Y, TY ) are topological spaces, C ⊆ X is compact, (Y, TY ) is T2, and f : C −→ Y is continuous and one-to-one, then f−1 : f(C) −→ C is continuous (i.e. f : C −→ f(C) is a homeomorphism).

Proof. By Th. 2.7(iv), it suffices to show f(A) is closed in f(C) for each C-closed A ⊆ C. If A ⊆ C is closed in C, then A is compact by Prop. 3.9(a). Then f(A) is compact by Th. 3.12 and, thus, closed in f(C) by Prop. 3.9(b) (since Y and, hence, f(C) are T2). □

Example 3.23. (a) The following example shows that the statement of Th. 3.22 does not hold without the hypothesis that C be compact: Let C := [0, 2π[ and Y := ℂ with the usual topologies, f : C −→ Y , f(t) := e^{it}. Then f is continuous due to the continuity of the exponential function. Moreover, f is injective with f(C) = S1(0) by [Phi16, Cor. 8.30]. However, f−1 : f(C) −→ C is not continuous: Consider the sequence (zn)n∈N, where

\[ z_n := \begin{cases} e^{i/n} & \text{for $n$ even}, \\ e^{i(2\pi - 1/n)} & \text{for $n$ odd}. \end{cases} \]

Then, clearly, (zn)n∈N is a sequence in f(C) with limn→∞ zn = 1. On the other hand,

\[ f^{-1}(z_n) = \begin{cases} \frac{1}{n} & \text{for $n$ even}, \\ 2\pi - \frac{1}{n} & \text{for $n$ odd}, \end{cases} \]

showing that (f−1(zn))n∈N does not converge.

(b) The following example shows that the statement of Th. 3.22 does not hold without the hypothesis that (Y, TY ) be T2: Let X := Y := [0, 1], let TX be the usual (metric) topology, and let TY be the cofinite topology. Then TY ⊊ TX (since finite sets are TX-closed, but, e.g., ]0, 1/2[ ∉ TY ). Then (X, TX) is compact, Id : X −→ Y is bijective and continuous, but Id : Y −→ X is not continuous.
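The sequence of Ex. 3.23(a) is easy to inspect numerically. The following Python sketch is our own illustration; the helper names `f`, `f_inv`, `z` are hypothetical, and `f_inv` realizes the inverse of t ↦ e^{it} on the unit circle by taking the argument in [0, 2π[. The printed values show z_n approaching 1, while f−1(z_n) alternates between values near 0 and values near 2π.

```python
import cmath
import math

def f(t):
    """f(t) = e^{it} for t in [0, 2*pi[."""
    return cmath.exp(1j * t)

def f_inv(z):
    """Inverse of f on the unit circle: the argument of z taken in [0, 2*pi[."""
    return cmath.phase(z) % (2 * math.pi)

def z(n):
    """The sequence from Ex. 3.23(a)."""
    return f(1 / n) if n % 2 == 0 else f(2 * math.pi - 1 / n)

for n in (10, 11, 1000, 1001):
    # columns: n, distance of z_n to 1, value of the inverse at z_n
    print(n, abs(z(n) - 1), f_inv(z(n)))
```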

Proposition 3.24. Let (X, T ) be a topological space. If (X, T ) is both compact and T2, then it is normal.


Theorem 3.25 (Tychonoff). Let (Xi, Ti) be topological spaces, i ∈ I. If X = ∏i∈I Xi is endowed with the product topology T and each Xi is compact, then X is compact.

Proof. The proof is provided in Sec. H.3 of the Appendix. □

3.3 Connectedness

Definition 3.26. Let (X, T ) be a topological space, C ⊆ X.

(a) X is called connected if, and only if, there are no nonempty disjoint open sets O1, O2 ∈ T such that X = O1 ∪ O2. The subset C is called connected if, and only if, it is connected with respect to the subspace topology TC .

(b) A path in X is a continuous map φ : [0, 1] −→ X. We say that x, y ∈ X are connected by the path φ if, and only if, φ(0) = x and φ(1) = y. We call C path-connected if, and only if, for each x, y ∈ C, there exists a path in C, connecting x and y.
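The simplest paths are straight line segments. As an illustration that is not part of the original notes, assume X is a normed vector space with norm ‖·‖ and C ⊆ X is convex; then every two points of C are connected by the following path in C, so convex sets are path-connected:

```latex
% Straight-line path between x and y in a convex subset C of a normed
% vector space (illustration only).
\[
  \phi : [0,1] \longrightarrow C, \qquad
  \phi(t) := (1-t)\,x + t\,y = x + t\,(y - x),
  \qquad \phi(0) = x, \quad \phi(1) = y .
\]
% Continuity (even Lipschitz continuity) follows from
\[
  \forall_{s,t\in[0,1]}\quad
  \|\phi(s) - \phi(t)\| = |s - t|\,\|y - x\| ,
\]
% and \phi([0,1]) \subseteq C holds by the convexity of C.
```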

Lemma 3.27. For a topological space (X, T ), the following statements are equivalent:

(i) (X, T ) is connected.

(ii) The only clopen (i.e. closed and open) subsets of X are X and ∅.

(iii) If X = A1 ∪ A2, A1 ∩ A2 = ∅, and A1 and A2 are both open (or both closed), then A1 = ∅ or A2 = ∅.

(iv) If f : X −→ {0, 1} is continuous (with respect to the discrete topology on {0, 1}), then f is constant.

Proof. The equivalences of (i) – (iii) are immediate from Def. 3.26(a).

(ii) ⇒ (iv): If f : X −→ {0, 1} is continuous and nonconstant, then O1 := f−1({0}) and O2 := f−1({1}) are disjoint nonempty clopen subsets of X.

(iv) ⇒ (i): If X = O1 ∪ O2 with O1, O2 disjoint, nonempty, and open, then f : X −→ {0, 1}, f(x) := 0 for x ∈ O1, f(x) := 1 for x ∈ O2, defines a continuous nonconstant map. □


Example 3.28. It is immediate that a discrete space with at least two points is never connected and that an indiscrete space is always connected.
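On a finite set, the criterion of Lem. 3.27(ii) can be checked mechanically. The following Python sketch is our own illustration and not part of the original notes; the function names and the sample topologies on {0, 1} are assumptions. It lists the clopen sets of a finite topology (encoded as a set of frozensets) and confirms the two claims of Ex. 3.28 for a two-point set; the third sample, the topology {∅, {1}, {0, 1}}, is connected without being indiscrete.

```python
def clopen_sets(X, T):
    """All sets that are open (in T) and closed (complement in T)."""
    X = frozenset(X)
    return {O for O in T if (X - O) in T}

def is_connected(X, T):
    """Lem. 3.27(ii): connected iff the only clopen sets are the empty set and X."""
    return clopen_sets(X, T) <= {frozenset(), frozenset(X)}

# Hypothetical sample topologies on the two-point set {0, 1}.
X = {0, 1}
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset(X)}
indiscrete = {frozenset(), frozenset(X)}
sierpinski = {frozenset(), frozenset({1}), frozenset(X)}

print(is_connected(X, discrete))
print(is_connected(X, indiscrete))
print(is_connected(X, sierpinski))
```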

Proposition 3.29. Let (X, T ) be a topological space.

(a) If A, B ⊆ X such that A ⊆ B ⊆ Ā, then A connected implies B connected.

(b) Let (Ai)i∈I , I ≠ ∅, be a family of subsets of X such that B := ⋂i∈I Ai ≠ ∅. If each Ai is connected (resp. path-connected), then C := ⋃i∈I Ai is connected (resp. path-connected).

Proof. (a): Let M1, M2 be closed subsets of X such that B is the disjoint union B = (M1 ∩ B) ∪ (M2 ∩ B). Since A is connected and A is the disjoint union A = (M1 ∩ A) ∪ (M2 ∩ A), we have M1 ∩ A = ∅ or M2 ∩ A = ∅, say M1 ∩ A = ∅. Then A ⊆ M2. Since M2 is closed, this implies Ā ⊆ M2 and B ⊆ M2. Thus, B = M2 ∩ B and M1 ∩ B = ∅, proving B to be connected.

(b): Let U, V be disjoint subsets of C that are open in C and satisfy C = U ∪ V . Moreover, let x ∈ B. Then either x ∈ U or x ∈ V , say x ∈ U . Then, for each i ∈ I, x ∈ U ∩ Ai. Since Ai = (U ∩ Ai) ∪ (V ∩ Ai) and both U ∩ Ai and V ∩ Ai are open in Ai, the connectedness of Ai implies V ∩ Ai = ∅. As this holds for each i ∈ I, we have V ∩ C = ∅, showing that C is connected. The path-connected version is left as an exercise. □

Example 3.34(b) below shows that Prop. 3.29(a) does not hold with “connected” replaced by “path-connected”.

Definition and Remark 3.30. Let (X, T ) be a topological space.

(a) Proposition 3.29(b) allows us to define, for each x ∈ X, the connected component Cx (resp. the path-component Px) of x as the union of all connected (resp. path-connected) subsets of X that contain x. Then these components are the largest connected (resp. path-connected) subsets of X that contain x. Then, clearly, if one defines x ∼ y if, and only if, y ∈ Cx (resp. y ∈ Px), then ∼ constitutes an equivalence relation on X. The equivalence classes of ∼ are called the connected components (resp. the path-components) of X. Thus, each topological space is the disjoint union of its connected components and also the disjoint union of its path-components.

(b) As a consequence of Prop. 3.29(a), connected components are always closed, whereas Ex. 3.34(b) below shows that path-components are not necessarily closed.

Theorem 3.31. If (X, TX) and (Y, TY ) are topological spaces, X is connected (resp. path-connected), and f : X −→ Y is continuous, then f(X) is connected (resp. path-connected).

Proof. If B := f(X) is not connected, then there are nonempty disjoint sets O1, O2 ⊆ B that are open in B, such that B = O1 ∪ O2. Then U1 := f−1(O1), U2 := f−1(O2) are nonempty disjoint open subsets of X such that X = U1 ∪ U2, showing X is not connected. Now assume X to be path-connected. If y1, y2 ∈ B, then there are x1, x2 ∈ X such that f(x1) = y1 and f(x2) = y2. If φ : [0, 1] −→ X is a path in X connecting x1 and x2, then f ◦ φ is a path in B connecting y1 and y2. □


Theorem 3.32. If A ⊆ R, then A is connected if, and only if, A is an interval.

Proof. If A is not an interval, then there are a, b, c ∈ R such that a < c < b, a, b ∈ A, c ∉ A. Then O1 := ]−∞, c[ and O2 := ]c, ∞[ are disjoint open subsets of R such that A = (O1 ∩ A) ∪ (O2 ∩ A) and both O1 ∩ A and O2 ∩ A are nonempty. It remains to show that, if A is an interval, then A is connected. We may assume that A consists of more than one point. Moreover, let O1, O2 be disjoint subsets of A that are open in A and such that A = O1 ∪ O2. Seeking a contradiction, assume O1 ≠ ∅ and O2 ≠ ∅. Thus, there are s ∈ O1, t ∈ O2, and, without loss of generality, s < t (otherwise, switch the names of O1 and O2). Let a := sup{x ∈ O1 : x < t}. Then a ∈ O1 since O1 is closed in A (note a ∈ A since a ≤ t and A is an interval). In particular a < t. However, since O1 is also open in the interval A, there must be an open interval I such that a ∈ I, I ⊆ O1, showing that O1 contains elements between a and t, in contradiction to the definition of a. □

Corollary 3.33. If the topological space (X, T ) is path-connected, then it is connected.

Proof. Assume (X, T ) is path-connected and let x ∈ X. If y ∈ X, then there is a path φy : [0, 1] −→ X, connecting x and y. Since [0, 1] is connected by Th. 3.32, so is Ay := φy([0, 1]). Since X = ⋃y∈X Ay, X is connected by Prop. 3.29(b). □

Example 3.34. (a) In general, neither unions nor intersections of (even just two) connected sets are connected: Let (X, T ) be a topological space. Then, for each x ∈ X, {x} is connected (even path-connected), but, if (X, T ) is T2 and x, y ∈ X with x ≠ y, then {x, y} is not connected. Now we consider

\[ C_1 := \{ e^{it} : t \in [0,\pi] \}, \qquad C_2 := \{ e^{it} : t \in [\pi, 2\pi] \}. \]

Then, as the continuous images of intervals, both C1 and C2 are connected (even path-connected) subsets of ℂ. However, C1 ∩ C2 = {−1, 1} is not a connected subset of ℂ.

(b) The following example shows that a connected set is not necessarily path-connected, and that the closure of a path-connected set is not necessarily path-connected: Let X := R2 with the norm topology. Let

\[ G := \big\{ (t, \sin(1/t)) : t \in \mathbb{R}^+ \big\}, \qquad A := \big( \{0\} \times [-1,1] \big) \cup G. \]

Then A is connected but not path-connected: As the image of the connected set R+ under the continuous map t ↦ (t, sin(1/t)), the set G is connected. We claim A = Ḡ: Let s ∈ [−1, 1]. By the intermediate value theorem, there is a sequence (tn)n∈N in R+ such that limn→∞ tn = 0 and s = sin(1/tn) for each n ∈ N. Then ((tn, s))n∈N is a sequence in G with limn→∞(tn, s) = (0, s), showing A ⊆ Ḡ. On the other hand, let ((tn, sn))n∈N be an arbitrary sequence in G with limn→∞(tn, sn) = (t, s). If t ∈ R+, then s = sin(1/t) and (t, s) ∈ G. Otherwise, limn→∞ tn = 0 and s ∈ [−1, 1], since each sn ∈ [−1, 1] and [−1, 1] is closed, showing A ⊇ Ḡ.

4 DIFFERENTIAL CALCULUS 75

Now A = G implies A is connected by Prop. 3.29(a). It remains to verify Ais not path-connected. Proceeding by contradiction, assume there were a pathλ : [0, 1] −→ A connecting (0, 0) and (1, sin 1). Since λ([0, 1]) must be connected,one obtains {(t, sin(1

t)) : t ∈]0, 1]} ⊆ λ([0, 1]). Let s := sup{t ∈ [0, 1] : λ(t) ∈

{0} × [0, 1]}. Then λ(s) ∈ {0} × [0, 1], since {0} × [0, 1] is closed. In particular,s < 1. Moreover, with respect to the max-norm on R2, there is ǫ ∈ R+ suchthat λ(Bǫ(s)) ⊆ B 1

2

(λ(s)). Let s+ := min{s + ǫ2, s+1

2}. By the definition of s

there is t0 ∈]0, 1] such that λ(s+) = (t0, sin(1t0)). As λ(Bǫ(s)) must be connected,

one has C := {(t, sin(1t)) : t ∈]0, t0]} ⊆ λ(Bǫ(s)), which is in contradiction to

λ(Bǫ(s)) ⊆ B 1

2

(λ(s)) (since diam(B 1

2

(λ(s))) ≤ 1 and diam(C) ≥ |1 − (−1)| = 2.This finishes the proof that A is not path-connected.

Theorem 3.35. An open subset of a normed vector space (e.g. an open subset of K^n) is connected if, and only if, it is path-connected.

Proof. See, e.g., [Heu08, Th. 161.4]. ∎

Theorem 3.36. Let (X_i, T_i) be topological spaces, i ∈ I. If X = ∏_{i∈I} X_i is endowed with the product topology T and each X_i is connected (resp. path-connected), then X is connected (resp. path-connected).


4 Differential Calculus

4.1 Partial Derivatives and Gradients

The goal of the following is to generalize the notion of derivative from one-dimensional functions to functions f : G −→ K, where G ⊆ R^n with n ∈ N. Later we will also allow functions with values in K^m. For ξ ∈ G, G ⊆ R^n, we will define a function f : G −→ K to have a so-called partial derivative (or just partial for short) at ξ with respect to the variable x_j if, and only if, the one-dimensional function that results from keeping all but the jth variable fixed, namely

$$x_j \mapsto \phi(x_j) := f(\xi_1, \dots, \xi_{j-1}, x_j, \xi_{j+1}, \dots, \xi_n),$$

is differentiable at x_j = ξ_j in the usual sense for one-dimensional functions. The partial derivative of f at ξ with respect to x_j is then identified with φ′(ξ_j). This leads to the following definition:

Definition 4.1. Let G ⊆ R^n, n ∈ N, f : G −→ K, ξ ∈ G, j ∈ {1, …, n}. If there is ε > 0 such that ξ + h e_j ∈ G for each h ∈ ]−ε, ε[ (this condition is trivially satisfied if ξ is an interior point of G), then f is said to have a partial derivative at ξ with respect to the variable x_j (or a jth partial for short) if, and only if, the limit

$$\lim_{h \to 0} \frac{f(\xi + h e_j) - f(\xi)}{h} \qquad \bigl(0 \neq h \in {]-\epsilon, \epsilon[}\bigr) \tag{4.1}$$

exists in K. In that case, the limit is defined to be the jth partial of f at ξ, and it is denoted by one of the symbols

$$\partial_j f(\xi), \quad \partial_{x_j} f(\xi), \quad \frac{\partial f(\xi)}{\partial x_j}, \quad f_{x_j}(\xi), \quad D_j f(\xi).$$

If ξ is a boundary point of G and there is ε > 0 such that, for each h ∈ ]0, ε[, ξ + h e_j ∈ G and ξ − h e_j ∉ G (resp. ξ − h e_j ∈ G and ξ + h e_j ∉ G), then, instead of the limit in (4.1), one uses the one-sided limit

$$\lim_{h \downarrow 0} \frac{f(\xi + h e_j) - f(\xi)}{h} \qquad \left(\text{resp. } \lim_{h \uparrow 0} \frac{f(\xi + h e_j) - f(\xi)}{h}\right) \tag{4.2}$$

in the above definition of the jth partial at ξ. If all the partials of f exist in ξ, then the vector

$$\nabla f(\xi) := \bigl(\partial_1 f(\xi), \dots, \partial_n f(\xi)\bigr) \tag{4.3}$$

is called the gradient of f at ξ (the symbol ∇ is called nabla; the corresponding operator is sometimes called del). It is customary to consider the gradient as a row vector. If the jth partial ∂_j f(ξ) exists for each ξ ∈ G, then the function

$$\partial_j f : G \longrightarrow \mathbb{K}, \quad \xi \mapsto \partial_j f(\xi), \tag{4.4}$$

is also called the jth partial of f.

Example 4.2. The following example shows that, in general, the existence of partial derivatives does not imply continuity: Consider the function

$$f : \mathbb{R}^2 \longrightarrow \mathbb{R}, \quad f(x, y) := \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{for } (x, y) \neq (0, 0), \\ 0 & \text{for } (x, y) = (0, 0). \end{cases}$$

Using the quotient rule for (x, y) ≠ (0, 0) and the fact that f(x, 0) = f(0, y) = 0 for all (x, y) ∈ R^2, one obtains

$$\nabla f : \mathbb{R}^2 \longrightarrow \mathbb{R}^2, \quad \nabla f(x, y) = \begin{cases} \left( \dfrac{y(y^2 - x^2)}{(x^2 + y^2)^2}, \dfrac{x(x^2 - y^2)}{(x^2 + y^2)^2} \right) & \text{for } (x, y) \neq (0, 0), \\ (0, 0) & \text{for } (x, y) = (0, 0). \end{cases}$$

In particular, both partials ∂_x f and ∂_y f exist everywhere in R^2. However, f is not continuous in (0, 0): For k ∈ N, let x_k := 1/k, y_k := 1/k. Then lim_{k→∞} (x_k, y_k) = (0, 0), but

$$f(x_k, y_k) = \frac{\frac{1}{k^2}}{\frac{1}{k^2} + \frac{1}{k^2}} = \frac{1}{2}$$

for each k ∈ N. In particular, lim_{k→∞} f(x_k, y_k) = 1/2 ≠ 0 = f(0, 0), showing that f is not continuous in (0, 0).
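The behavior described in Example 4.2 can also be observed numerically. The following is only an illustrative sketch (the helper name `partial_at_origin` is our own choice and not part of the notes): it evaluates the difference quotients from (4.1) at the origin and the values of f along the diagonal sequence (1/k, 1/k).

```python
def f(x, y):
    """Function of Example 4.2: xy / (x^2 + y^2), extended by 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


def partial_at_origin(g, j, h):
    """Difference quotient (g(h e_j) - g(0)) / h, cf. (4.1), for j in {1, 2}."""
    point = (h, 0.0) if j == 1 else (0.0, h)
    return (g(*point) - g(0.0, 0.0)) / h


# Difference quotients at the origin for h = 10^-1, ..., 10^-5 and j = 1, 2.
quotients = [partial_at_origin(f, j, 10.0 ** -k) for j in (1, 2) for k in range(1, 6)]

# Values of f along the sequence (1/k, 1/k), which converges to (0, 0).
diagonal_values = [f(1.0 / k, 1.0 / k) for k in range(1, 6)]

print(quotients)
print(diagonal_values)
```

All difference quotients vanish, in agreement with ∇ f(0, 0) = (0, 0), whereas the values along the diagonal stay at 1/2 and, thus, do not approach f(0, 0) = 0.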

Remark 4.3. The problem in Example 4.2 is the discontinuity of the partials in (0, 0). We will see in Th. 4.29 below that, if all partials of f exist and are continuous in some neighborhood of a point ξ, then f is continuous (and even differentiable) in ξ.


4.2 The Jacobian

If f : G −→ K^m, where G ⊆ R^n, then we can compute partials for each of the coordinate functions f_j of f (provided the partials exist).

Definition 4.4. Let G ⊆ R^n, f : G −→ K^m, (n, m) ∈ N^2, ξ ∈ G. If, for each l ∈ {1, …, m}, the coordinate function f_l = π_l ∘ f (recall that f = (f_1, …, f_m)) has all partials ∂_k f_l at ξ, then these m · n partials form an m × n matrix, namely

$$J_f(\xi) := \frac{\partial(f_1, \dots, f_m)}{\partial(x_1, \dots, x_n)}(\xi) := \begin{pmatrix} \partial_1 f_1(\xi) & \dots & \partial_n f_1(\xi) \\ \vdots & & \vdots \\ \partial_1 f_m(\xi) & \dots & \partial_n f_m(\xi) \end{pmatrix} = \begin{pmatrix} \nabla f_1(\xi) \\ \vdots \\ \nabla f_m(\xi) \end{pmatrix}, \tag{4.5}$$

called the Jacobian matrix of f at ξ. In the case that m = n, the Jacobian matrix J_f(ξ) is square and one can compute its determinant det J_f(ξ). This determinant is then called the Jacobian determinant of f at ξ. Both the Jacobian matrix and the Jacobian determinant are sometimes referred to as the Jacobian. One then has to determine from the context which of the two is meant.

Remark 4.5. In many situations, it does not matter if you interpret z ∈ K^n as a column vector or a row vector, and the same is true for the gradient. However, in the context of matrix multiplications, it is important to work with a consistent interpretation of such vectors. We will therefore adhere to the following agreement: In the context of matrix multiplications, we always interpret x ∈ R^n and f(x) ∈ K^m for K^m-valued functions f as column vectors, whereas we always interpret the gradients ∇ g(x) of K-valued functions g as row vectors.

Example 4.6. (a) Let A be an m × n matrix over K,

$$A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}.$$

Then the map x ↦ Ax, A : R^n −→ K^m, is R-linear for K = R, and it is the restriction to R^n of the C-linear map A on C^n for K = C (note that, due to the agreement from Rem. 4.5, Ax can be interpreted as a matrix multiplication in the usual way). Thus, if we denote the coordinate functions π_l ∘ A by A_l, l ∈ {1, …, m}, then

$$A_l(x) = \sum_{k=1}^n a_{lk} x_k \quad \text{and} \quad \partial_k A_l(x) = \frac{\partial A_l(x)}{\partial x_k} = a_{lk}.$$

Thus, J_A(x) = A for each x ∈ R^n.

(b) Consider (f, g) : R^3 −→ C^2, (f(x, y, z), g(x, y, z)) := (ixyz^2, ix + yz). Then one computes the following Jacobian:

$$J_{(f,g)}(x, y, z) = \begin{pmatrix} \nabla f(x, y, z) \\ \nabla g(x, y, z) \end{pmatrix} = \begin{pmatrix} iyz^2 & ixz^2 & 2ixyz \\ i & z & y \end{pmatrix}.$$

(c) Consider (f, g) : R^2 −→ C^2, (f(x, y), g(x, y)) := (e^{ixy}, x + 2y). Then one computes the following Jacobian determinant:

$$\det J_{(f,g)}(x, y) = \begin{vmatrix} iy e^{ixy} & ix e^{ixy} \\ 1 & 2 \end{vmatrix} = i e^{ixy} (2y - x).$$
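As a plausibility check for Example 4.6(b), one can compare the stated Jacobian with central difference quotients. The following is a minimal sketch under our own assumptions: the names `fg`, `jacobian`, and `exact_jacobian`, the step size, and the evaluation point are chosen for illustration only.

```python
def fg(x, y, z):
    """The map (f, g) of Example 4.6(b), with values in C^2."""
    return (1j * x * y * z ** 2, 1j * x + y * z)


def jacobian(F, p, h=1e-6):
    """Approximate the Jacobian matrix (4.5) of F at p by central differences."""
    m = len(F(*p))
    columns = []
    for k in range(len(p)):
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        fp, fm = F(*plus), F(*minus)
        columns.append([(fp[l] - fm[l]) / (2 * h) for l in range(m)])
    # Entry (l, k) of the result approximates the partial d_k F_l(p).
    return [[columns[k][l] for k in range(len(p))] for l in range(m)]


def exact_jacobian(x, y, z):
    """The Jacobian as computed in Example 4.6(b)."""
    return [[1j * y * z ** 2, 1j * x * z ** 2, 2j * x * y * z],
            [1j, z, y]]


p = (1.0, 2.0, 3.0)
approx = jacobian(fg, p)
exact = exact_jacobian(*p)
max_error = max(abs(approx[l][k] - exact[l][k]) for l in range(2) for k in range(3))
print(max_error)
```

The row index l runs over the coordinate functions and the column index k over the variables, matching the convention that the rows of the Jacobian are the gradients.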

Remark 4.7. The linearity of forming the derivative of one-dimensional functions directly implies the linearity of forming partial derivatives, gradients, and Jacobians (provided they exist). More precisely, if G ⊆ R^n, f, g : G −→ K^m, (n, m) ∈ N^2, ξ ∈ G, and λ ∈ K, then, for each (l, k) ∈ {1, …, m} × {1, …, n},

$$\partial_k (f + g)_l(\xi) = \partial_k f_l(\xi) + \partial_k g_l(\xi), \qquad \partial_k (\lambda f)_l(\xi) = \lambda\, \partial_k f_l(\xi), \tag{4.6a}$$

$$\nabla (f + g)_l(\xi) = \nabla f_l(\xi) + \nabla g_l(\xi), \qquad \nabla (\lambda f)_l(\xi) = \lambda\, \nabla f_l(\xi), \tag{4.6b}$$

$$J_{f+g}(\xi) = J_f(\xi) + J_g(\xi), \qquad J_{\lambda f}(\xi) = \lambda\, J_f(\xi), \tag{4.6c}$$

where, in each case, the assumed existence of the objects on the right-hand side of the equation implies the existence of the object on the left-hand side.

4.3 Higher Order Partials and the Spaces C^k

Partial derivatives can, in turn, have partial derivatives themselves and so on. For example, a function f : R^3 −→ K might have the following partial derivative of 6th order: ∂_1∂_3∂_2∂_1∂_2∂_2 f. We will see that, in general, it is important in which order the different partial derivatives are carried out (see Example 4.9). If all partial derivatives are continuous, then the situation is much better and the result is the same, no matter what order is used for the partial derivatives (continuous partials commute, see Th. 4.12). We start with the definition of higher order partials:

Definition 4.8. Let G ⊆ R^n, f : G −→ K, ξ ∈ G. Fix k ∈ N. For each element p = (p_1, …, p_k) ∈ {1, …, n}^k, define the following partial derivative of kth order provided that it exists:

$$\partial_p f(\xi) := \frac{\partial^k f(\xi)}{\partial x_{p_1} \dots \partial x_{p_k}} := \partial_{p_1} \dots \partial_{p_k} f(\xi). \tag{4.7}$$

One also defines f itself to be its own partial derivative of order 0. Analogous to Def. 4.4, if f : G −→ K^m, m ∈ N, then one defines the higher order partials for each coordinate function f_l, l = 1, …, m, i.e. one uses f_l instead of f in (4.7).

Example 4.9. The following example shows that, in general, partial derivatives do not commute: Consider the function

$$f : \mathbb{R}^2 \longrightarrow \mathbb{R}, \quad f(x, y) := \begin{cases} \dfrac{xy^3}{x^2 + y^2} & \text{for } (x, y) \neq (0, 0), \\ 0 & \text{for } (x, y) = (0, 0). \end{cases}$$

Analogous to Example 4.2, using the quotient rule for (x, y) ≠ (0, 0) and the fact that f(x, 0) = f(0, y) = 0 for all (x, y) ∈ R^2, one obtains

$$\nabla f : \mathbb{R}^2 \longrightarrow \mathbb{R}^2, \quad \nabla f(x, y) = \bigl(\partial_1 f(x, y), \partial_2 f(x, y)\bigr) = \bigl(\partial_x f(x, y), \partial_y f(x, y)\bigr) = \begin{cases} \left( \dfrac{y^3 (y^2 - x^2)}{(x^2 + y^2)^2}, \dfrac{x y^2 (3x^2 + y^2)}{(x^2 + y^2)^2} \right) & \text{for } (x, y) \neq (0, 0), \\ (0, 0) & \text{for } (x, y) = (0, 0). \end{cases}$$

In particular, we have ∂_1 f(0, y) = ∂_x f(0, y) = y for each y ∈ R and ∂_2 f(x, 0) = ∂_y f(x, 0) = 0 for each x ∈ R. Thus, ∂_y∂_x f(0, y) ≡ 1 and ∂_x∂_y f(x, 0) ≡ 0. Evaluating at (0, 0) yields ∂_2∂_1 f(0, 0) = ∂_y∂_x f(0, 0) = 1 ≠ 0 = ∂_x∂_y f(0, 0) = ∂_1∂_2 f(0, 0).
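The failure of commutativity in Example 4.9 can be made visible by nesting difference quotients. The following sketch is only an illustration: the helper names `dx`, `dy` and the step sizes `h`, `k` are our own assumptions, and the results are approximations, not exact values.

```python
def f(x, y):
    """Function of Example 4.9: x y^3 / (x^2 + y^2), extended by 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y ** 3 / (x * x + y * y)


def dx(g, x, y, h=1e-6):
    """Central difference approximation of the partial of g with respect to x."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def dy(g, x, y, h=1e-6):
    """Central difference approximation of the partial of g with respect to y."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


k = 1e-3  # outer step, chosen much larger than the inner step h

# Approximates the y-partial of the x-partial at (0, 0); the exact value is 1.
dy_dx = (dx(f, 0.0, k) - dx(f, 0.0, -k)) / (2 * k)

# Approximates the x-partial of the y-partial at (0, 0); the exact value is 0.
dx_dy = (dy(f, k, 0.0) - dy(f, -k, 0.0)) / (2 * k)

print(dy_dx, dx_dy)
```

The inner step is taken much smaller than the outer one so that the inner quotients are good approximations of the first-order partials before the outer quotient is formed; this mirrors the order in which the limits are taken in the definition.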

—

As in Ex. 4.2, the problem in Ex. 4.9 lies in the discontinuity of the partials in (0, 0). As mentioned above, if all partials are continuous, then they do commute. To prove this result is our next goal. We will accomplish this in several steps. We start with a preparatory lemma that provides a variant of the mean value theorem in two dimensions.

Lemma 4.10. Let a, ã, b, b̃ ∈ R, a ≠ ã, b ≠ b̃, and consider the rectangle I = [a, ã] × [b, b̃] (which constitutes a closed interval in R^2). Suppose f : I −→ R, (x, y) ↦ f(x, y), and set

$$\Delta_I(f) := f(a, b) + f(\tilde a, \tilde b) - f(a, \tilde b) - f(\tilde a, b).$$

If ∂_x f and ∂_y∂_x f exist everywhere in I, then there is some point (ξ, η) ∈ I° (i.e. with ξ ∈ ]a, ã[ and η ∈ ]b, b̃[), satisfying

$$\Delta_I(f) = (\tilde a - a)(\tilde b - b)\, \partial_y \partial_x f(\xi, \eta).$$

Proof. Since the function g : [a, ã] −→ R, g(x) := f(x, b̃) − f(x, b), is differentiable, the one-dimensional mean value theorem [Phi16, Th. 9.18] yields the existence of some ξ ∈ ]a, ã[ satisfying

$$\Delta_I(f) = g(\tilde a) - g(a) = (\tilde a - a)\, g'(\xi) = (\tilde a - a) \bigl( \partial_x f(\xi, \tilde b) - \partial_x f(\xi, b) \bigr). \tag{4.8a}$$

Since the function G : [b, b̃] −→ R, G(y) := ∂_x f(ξ, y), is differentiable, the one-dimensional mean value theorem [Phi16, Th. 9.18] yields the existence of some η ∈ ]b, b̃[ satisfying

$$\partial_x f(\xi, \tilde b) - \partial_x f(\xi, b) = G(\tilde b) - G(b) = (\tilde b - b)\, G'(\eta) = (\tilde b - b)\, \partial_y \partial_x f(\xi, \eta). \tag{4.8b}$$

Combining (4.8a) and (4.8b) proves the lemma. ∎

Theorem 4.11 (Schwarz). Let G be an open subset of R^2. Suppose that f : G −→ K, (x, y) ↦ f(x, y), has partial derivatives ∂_x f, ∂_y f, and ∂_y∂_x f everywhere in G. If ∂_y∂_x f is continuous in (a, b) ∈ G, then ∂_x∂_y f(a, b) exists and ∂_x∂_y f(a, b) = ∂_y∂_x f(a, b) (in particular, ∂_y∂_x f = ∂_x∂_y f if all the functions f, ∂_x f, ∂_y f, ∂_y∂_x f are continuous).

Proof. We first note that it suffices to prove the theorem for K = R, as one can then apply the result to both Re f and Im f to obtain the case K = C. Thus, for the remainder of the proof, we assume f to be R-valued. Given ε > 0, since ∂_y∂_x f is continuous in (a, b) and since G is open, there exists δ > 0 such that I := [a − δ, a + δ] × [b − δ, b + δ] ⊆ G and

$$\underset{(x,y) \in I}{\forall} \quad \bigl| \partial_y \partial_x f(x, y) - \partial_y \partial_x f(a, b) \bigr| < \epsilon. \tag{4.9}$$

Let (h, k) ∈ R^2 \ {(0, 0)} with 0 < |h|, |k| < δ. Since

$$\frac{f(a + h, b + k) + f(a, b) - f(a, b + k) - f(a + h, b)}{hk} = \frac{1}{h} \left( \frac{f(a + h, b + k) - f(a + h, b)}{k} - \frac{f(a, b + k) - f(a, b)}{k} \right), \tag{4.10}$$

Lem. 4.10 together with (4.9) implies

$$\left| \frac{1}{h} \left( \frac{f(a + h, b + k) - f(a + h, b)}{k} - \frac{f(a, b + k) - f(a, b)}{k} \right) - \partial_y \partial_x f(a, b) \right| < \epsilon. \tag{4.11}$$

Taking the limit for k → 0 in (4.11) yields

$$\left| \frac{\partial_y f(a + h, b) - \partial_y f(a, b)}{h} - \partial_y \partial_x f(a, b) \right| \le \epsilon.$$

Since ε > 0 is arbitrary, we have shown ∂_x∂_y f(a, b) = ∂_y∂_x f(a, b) as desired. ∎

Using the combinatorial result that one can achieve an arbitrary permutation by a finite sequence of permutations of precisely two juxtaposed elements (cf. [Phi16, Th. B.7(b)]), one can easily extend Th. 4.11 to partial derivatives of order k > 2.

Theorem 4.12. Let G be an open subset of R^n, n ∈ N, and let k ∈ N. Suppose that for f : G −→ K all partial derivatives of order less than or equal to k exist in G and are continuous in ξ ∈ G. Then the value of each partial derivative of f of order k in ξ is independent of the order in which the individual partial derivatives are carried out. In other words, if p = (p_1, …, p_k) ∈ {1, …, n}^k and q = (q_1, …, q_k) ∈ {1, …, n}^k such that there exists a permutation (i.e. a bijective map) π : {1, …, k} −→ {1, …, k} satisfying q = (p_{π(1)}, …, p_{π(k)}), then ∂_p f(ξ) = ∂_q f(ξ). If f : G −→ K^m, m ∈ N, then the same holds with respect to each coordinate function f_j of f, j ∈ {1, …, m}.

Proof. For k = 1, there is nothing to prove. So let k > 1. For l ∈ {1, …, k − 1}, let τ_l : {1, …, k} −→ {1, …, k} be the transposition that interchanges l and l + 1 and leaves all other elements fixed (i.e. τ_l(l) = l + 1, τ_l(l + 1) = l, τ_l(α) = α for each α ∈ {1, …, k} \ {l, l + 1}) and let T := {τ_1, …, τ_{k−1}}. Then Th. 4.11 directly implies that the theorem holds for π = τ for each τ ∈ T. For a general permutation π : {1, …, k} −→ {1, …, k}, the abovementioned combinatorial result provides a finite sequence (τ^1, …, τ^N), N ∈ N, of elements of T such that π = τ^N ∘ ⋯ ∘ τ^1. Thus, as we already know that the theorem holds for N = 1, the case N > 1 follows by induction. ∎


Now that we have seen that functions with continuous partials are particularly benign, we introduce some special notation dedicated to such functions:

Definition 4.13. Let G ⊆ R^n, f : G −→ K, k ∈ N_0. If all partials of f up to order k exist everywhere in G, and if f and all its partials up to order k are continuous on G, then f is said to be of class C^k (one also says that f has continuous partials up to order k). The set of all K-valued functions of class C^k is denoted by C^k(G, K) (in particular, C^0(G, K) = C(G, K)). If f has continuous partials of all orders, then f is said to be of class C^∞, i.e. C^∞(G, K) := ⋂_{k=0}^∞ C^k(G, K). For R-valued functions, we introduce the shorter notation C^k(G) := C^k(G, R) for each k ∈ N_0 ∪ {∞}. Finally, for f : G −→ K^m, we say that f is of class C^k if, and only if, each coordinate function f_j, j ∈ {1, …, m}, is of class C^k. The set of all such functions is denoted by C^k(G, K^m).

Notation 4.14. For two vectors u = (u_1, u_2, u_3) ∈ K^3, v = (v_1, v_2, v_3) ∈ K^3, the cross product is an element of K^3 defined as follows:

$$u \times v := \bigl( u_2 v_3 - u_3 v_2,\; u_3 v_1 - u_1 v_3,\; u_1 v_2 - u_2 v_1 \bigr). \tag{4.12}$$

Definition 4.15. Let G ⊆ R^n, n ∈ N, ξ ∈ G.

(a) If f : G −→ K^n and the partials ∂_j f_j(ξ) exist for each j ∈ {1, …, n}, then the divergence of f in ξ is defined as

$$\operatorname{div} f(\xi) := \sum_{j=1}^n \partial_j f_j(\xi) = \frac{\partial f_1(\xi)}{\partial x_1} + \dots + \frac{\partial f_n(\xi)}{\partial x_n}. \tag{4.13}$$

If div f(ξ) exists for all ξ ∈ G, then div f : G −→ K. Sometimes, one defines the del operator ∇ = (∂_1, …, ∂_n) and then writes div f = ∇ · f, using the analogy between (4.13) and the definition of the Euclidean scalar product. Also note that div f(ξ) is precisely the trace of the corresponding Jacobian matrix, div f(ξ) = tr J_f(ξ).

(b) If f : G −→ K has second-order partials at ξ, then one defines the Laplacian (also known as the Laplace operator) of f in ξ by

$$\Delta f(\xi) := \operatorname{div} \nabla f(\xi) = \sum_{j=1}^n \partial_j \partial_j f(\xi) = \partial_1^2 f(\xi) + \dots + \partial_n^2 f(\xi). \tag{4.14}$$

If ∆f(ξ) exists for all ξ ∈ G, then ∆f : G −→ K.

(c) If n = 3 and f : G −→ K^3 has first-order partials at ξ, then one defines the curl of f in ξ by

$$\operatorname{curl} f(\xi) := \bigl( \partial_2 f_3(\xi) - \partial_3 f_2(\xi),\; \partial_3 f_1(\xi) - \partial_1 f_3(\xi),\; \partial_1 f_2(\xi) - \partial_2 f_1(\xi) \bigr) = \left( \frac{\partial f_3(\xi)}{\partial x_2} - \frac{\partial f_2(\xi)}{\partial x_3},\; \frac{\partial f_1(\xi)}{\partial x_3} - \frac{\partial f_3(\xi)}{\partial x_1},\; \frac{\partial f_2(\xi)}{\partial x_1} - \frac{\partial f_1(\xi)}{\partial x_2} \right). \tag{4.15}$$

If curl f(ξ) exists for all ξ ∈ G, then curl f : G −→ K^3. Again, one sometimes defines the del operator ∇ = (∂_1, ∂_2, ∂_3) and then writes curl f = ∇ × f, using the analogy between (4.15) and the definition of the cross product of two vectors in K^3.


Proposition 4.16. Let G ⊆ R^3, let f : G −→ K be a scalar-valued function and let v : G −→ K^3 be a vector-valued function.

(a) If ξ ∈ G is such that f and v have all partials of first order at ξ, then

curl(fv)(ξ) = f(ξ) curl v(ξ) +∇ f(ξ)× v(ξ).

(b) If G is open and f ∈ C2(G,K), then curl∇ f vanishes identically on G, i.e.

curl∇ f ≡ 0.

(c) If G is open and v ∈ C2(G,K3), then div curl v vanishes identically on G, i.e.

div curl v ≡ 0.
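The identities of Prop. 4.16(b),(c) can be checked numerically for concrete smooth functions. The sketch below is purely illustrative and rests on our own assumptions: the operators `grad`, `curl`, `div` are implemented via central differences with a fixed step `H`, the test functions `f` and `v` are arbitrary smooth examples, and the results vanish only up to discretization and rounding errors.

```python
import math

H = 1e-4  # step size of the central differences (an arbitrary choice)


def partial(g, j, p):
    """Central difference approximation of the j-th partial (j = 0, 1, 2) of g at p."""
    plus, minus = list(p), list(p)
    plus[j] += H
    minus[j] -= H
    return (g(tuple(plus)) - g(tuple(minus))) / (2 * H)


def grad(f):
    """Approximate gradient of a scalar-valued f, cf. (4.3)."""
    return lambda p: tuple(partial(f, j, p) for j in range(3))


def component(v, i):
    """The i-th coordinate function of a vector-valued v."""
    return lambda p: v(p)[i]


def curl(v):
    """Approximate curl according to (4.15), with indices shifted to start at 0."""
    def c(p):
        d = lambda j, i: partial(component(v, i), j, p)  # partial d_j v_i
        return (d(1, 2) - d(2, 1), d(2, 0) - d(0, 2), d(0, 1) - d(1, 0))
    return c


def div(v):
    """Approximate divergence according to (4.13)."""
    return lambda p: sum(partial(component(v, j), j, p) for j in range(3))


f = lambda p: math.sin(p[0]) * p[1] ** 2 + math.exp(p[2]) * p[0]
v = lambda p: (p[1] * p[2], math.cos(p[0]) * p[2], p[0] ** 2 * p[1])

point = (0.3, -0.7, 1.1)
curl_grad = curl(grad(f))(point)  # all three entries should be close to 0
div_curl = div(curl(v))(point)    # should be close to 0
print(curl_grad, div_curl)
```

Both identities reduce to the commutativity of second-order partials from Th. 4.11, which is why the C^2 hypothesis appears in Prop. 4.16(b),(c).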


4.4 Interlude: Graphical Representation in Two Dimensions

In this section, we will briefly address the problem of drawing graphs of functions f : D_f −→ R with D_f ⊆ R^2. If the function f is sufficiently benign (for example, if f ∈ C^1(R^2)), then the graph of f, namely the set {(x, y, z) ∈ R^3 : (x, y) ∈ D_f, z = f(x, y)} ⊆ R^3, will represent a two-dimensional surface in the three-dimensional space R^3. The two most important methods for depicting the graph of f as a picture in a two-dimensional plane (such as a sheet of paper or a board) are:

(a) The use of perspective.

(b) The use of level sets, in particular, level curves (also known as contour lines).

The Use of Perspective

Nowadays, this is most effectively accomplished by the use of computer graphics software. Widely used programs include commercial software such as MATLAB and Mathematica as well as the noncommercial software Gnuplot.

The Use of Level Sets

By a level set or an isolevel, we mean a set of the form f^{−1}{C} = {(x, y) ∈ D_f : f(x, y) = C} with C ∈ R. If f^{−1}{C} constitutes a curve in R^2, then we speak of a level curve or a contour line. Representation of functions depending on two variables by contour lines is well-known from everyday life. For example, contour lines are used to depict the height above sea level on hiking maps; on meteorological maps, isobars and isotherms are used to depict levels of equal pressure and equal temperature, respectively. Determining level sets and contour lines can be difficult, and the appropriate method depends on the function under consideration. In some cases, it is possible to determine the contour line corresponding to the level C ∈ f(D_f) by solving the equation C = f(x, y) for y (the difficulty is that an explicit solution of this equation cannot always be found). The following Example 4.17 provides some cases where C = f(x, y) can be solved explicitly:

Example 4.17. (a) For f : R^2 −→ R, f(x, y) := x^2 + y^2, and C ∈ R_0^+, one has

$$|y| = \sqrt{C - x^2} \quad \text{for } -\sqrt{C} \le x \le \sqrt{C}.$$

(b) For f : R^2 −→ R, f(x, y) := xy, and C ∈ R, one has

$$y = \frac{C}{x} \quad \text{for } x \neq 0.$$

For C = 0, one actually gets x = 0 or y = 0, which provides one additional contour line.

—

In some cases, it helps to write C = f(x, y) in different coordinates (e.g., polar coordinates). In general, the question if C = f(x, y) can be solved for y (or x) is related to the implicit function Th. 4.49 below.

4.5 The Total Derivative and the Notion of Differentiability

Roughly, a function f : G −→ K^m, G ⊆ K^n, will be called differentiable if, locally, it can be approximated by a K-affine function, i.e., if, for each ζ ∈ G, there exists a K-linear function L (depending on ζ) such that f(ζ + h) ≈ f(ζ) + L(h) for sufficiently small h ∈ K^n. Analogous to the treatment in the one-dimensional situation in [Phi16, Sec. 9], we will call a function f : G −→ C^m, G ⊆ R^n, R-differentiable if, and only if, both R^m-valued functions Re f and Im f are R-differentiable.

Definition 4.18. Let G be an open subset of K^n, n ∈ N, f : G −→ K^m, m ∈ N, ζ ∈ G. Then f is called K-differentiable (or just differentiable if the field K is understood) in ζ if, and only if, there exists a K-linear map L : K^n −→ K^m such that

$$\lim_{h \to 0} \frac{f(\zeta + h) - f(\zeta) - L(h)}{\|h\|_2} = 0. \tag{4.16a}$$

Note that, in general, L will depend on ζ. If f is differentiable in ζ, then L is called the total derivative or the total differential of f in ζ. In that case, one writes Df(ζ) instead of L. For G ⊆ R^n, we call f : G −→ C^m R-differentiable in ξ ∈ G if, and only if, both Re f and Im f are R-differentiable in ξ in the above sense. If f is R-differentiable in ξ, define Df(ξ) := D Re f(ξ) + i D Im f(ξ) to be the total derivative or the total differential of f in ξ. It is then an easy exercise to show

$$\lim_{h \to 0} \frac{f(\xi + h) - f(\xi) - Df(\xi)(h)}{\|h\|_2} = 0. \tag{4.16b}$$

Finally, f is called K-differentiable if, and only if, f is K-differentiable in every ζ ∈ G (a C-differentiable function is also called holomorphic; holomorphic functions are the central topic of the field of Complex Analysis).

Remark 4.19. (a) As the set G ⊆ K^n in Def. 4.18 is open, it is guaranteed that ζ + h ∈ G for ‖h‖_2 sufficiently small: There exists ε > 0 such that ‖h‖_2 < ε implies ζ + h ∈ G.

(b) As all norms on K^n are equivalent, instead of the Euclidean norm ‖·‖_2, one can use any other norm on K^n in (4.16a) without changing the definition.

(c) Since C^n ≅ R^{2n} and since addition in C^n is precisely addition in R^{2n} and multiplication by λ ∈ R in C^n is precisely multiplication by λ in R^{2n}, if G ⊆ C^n and f : G −→ C^m in Def. 4.18 is C-differentiable in ζ ∈ G, then it is also R-differentiable in ζ ∈ G (if L is C-linear as a map from C^n to C^m, then L is also R-linear as a map from R^{2n} to R^{2m}). However, C-differentiability is a much stronger condition than R-differentiability (cf. Rem. 4.23(b) and Rem. 4.27 below).

Lemma 4.20. Let G be an open subset of K^n, n ∈ N, ζ ∈ G. Then f : G −→ K^m, m ∈ N, is K-differentiable in ζ if, and only if, there exists a K-linear map L : K^n −→ K^m and another (not necessarily linear) map r : K^n −→ K^m such that

$$f(\zeta + h) - f(\zeta) = L(h) + r(h) \tag{4.17a}$$

for each h ∈ K^n with sufficiently small ‖h‖_2, and

$$\lim_{h \to 0} \frac{r(h)}{\|h\|_2} = 0. \tag{4.17b}$$

Proof. Suppose L, r are as above and satisfy (4.17). Then, for each 0 ≠ h ∈ K^n with sufficiently small ‖h‖_2, it holds that

$$\frac{f(\zeta + h) - f(\zeta) - L(h)}{\|h\|_2} = \frac{r(h)}{\|h\|_2}. \tag{4.18}$$

Thus, (4.17b) implies (4.16a), showing that f is differentiable. Conversely, if f is differentiable in ζ, then there exists a K-linear map L : K^n −→ K^m satisfying (4.16a). Choose ε > 0 such that B_{ε,‖·‖_2}(ζ) ⊆ G and define

$$r : \mathbb{K}^n \longrightarrow \mathbb{K}^m, \quad r(h) := \begin{cases} f(\zeta + h) - f(\zeta) - L(h) & \text{for } h \in B_{\epsilon, \|\cdot\|_2}(0), \\ 0 & \text{otherwise}. \end{cases} \tag{4.19}$$

Then (4.17a) is immediate. Since (4.18) also holds, (4.16a) implies (4.17b). ∎
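To get a feeling for the remainder term r of Lem. 4.20, one can tabulate ‖r(h)‖/‖h‖_2 for shrinking h in a simple case. The following is merely a sketch under our own assumptions: the function f(x, y) := x^2 y + y, the point ζ = (1, 2), the direction (1, 1), and the candidate map L(h) = 4 h_1 + 2 h_2 (the gradient of f at ζ applied to h) are all chosen for illustration.

```python
import math


def f(x, y):
    """Illustrative polynomial function R^2 -> R."""
    return x * x * y + y


ZETA = (1.0, 2.0)


def L(h):
    """Candidate linear map: gradient of f at ZETA, namely (4, 2), applied to h."""
    return 4.0 * h[0] + 2.0 * h[1]


def r(h):
    """Remainder from (4.17a): r(h) = f(zeta + h) - f(zeta) - L(h)."""
    return f(ZETA[0] + h[0], ZETA[1] + h[1]) - f(*ZETA) - L(h)


# Ratios |r(h)| / ||h||_2 along h = t (1, 1) for t = 10^-1, ..., 10^-5.
ratios = []
for k in range(1, 6):
    t = 10.0 ** -k
    h = (t, t)
    ratios.append(abs(r(h)) / math.hypot(*h))

print(ratios)
```

For this particular f, a short computation gives r(t, t) = 4t^2 + t^3, so the ratios decay roughly like 2.83 t, in line with (4.17b). Of course, testing a single direction does not prove differentiability; it only illustrates the condition.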

In the following Th. 4.21 and Cor. 4.22, we consider R-differentiability, then coming back to C-differentiability in Rem. 4.23 and Cor. 4.24.


Theorem 4.21. Let G be an open subset of R^n, n ∈ N, ξ ∈ G. If f : G −→ K is R-differentiable in ξ, then f is continuous in ξ, all partials at ξ, i.e. ∂_j f(ξ), j ∈ {1, …, n}, exist, and Df(ξ) = ∇ f(ξ) (that means, for each h = (h_1, …, h_n) ∈ R^n, one has Df(ξ)(h) = ∇ f(ξ) h = ∑_{j=1}^n ∂_j f(ξ) h_j). In particular, Df(ξ) is unique and, hence, well-defined.

Proof. Assume f is R-differentiable in ξ. We first consider the case K = R. Let the linear map L : R^n −→ R and r : R^n −→ R be as in Lem. 4.20. We already know from Example 2.16 that each linear map from R^n into R is continuous. In particular, L must be continuous. Now let (x_k)_{k∈N} be a sequence in G that converges to ξ, i.e. lim_{k→∞} ‖x_k − ξ‖_2 = 0. Then (h_k)_{k∈N} with h_k := x_k − ξ constitutes a sequence in R^n such that lim_{k→∞} ‖h_k‖_2 = 0. Note that (4.17b) implies that 0 ≤ |r(h)| ≤ ‖h‖_2 for ‖h‖_2 sufficiently small. Thus, lim_{k→∞} ‖h_k‖_2 = 0 implies lim_{k→∞} |r(h_k)| = 0. As the continuity of L also yields lim_{k→∞} |L(h_k)| = 0, (4.17a) provides

$$\lim_{k \to \infty} \bigl| f(x_k) - f(\xi) \bigr| = \lim_{k \to \infty} \bigl| f(\xi + h_k) - f(\xi) \bigr| \le \lim_{k \to \infty} \bigl| L(h_k) \bigr| + \lim_{k \to \infty} \bigl| r(h_k) \bigr| = 0, \tag{4.20}$$

establishing the continuity of f in ξ. To see that the partials exist and that L is given by the gradient, set l_j := L(e_j) for each j ∈ {1, …, n}. If h = t e_j with t ∈ R sufficiently close to 0, then (4.17a) yields

$$f(\xi + t e_j) - f(\xi) = t\, l_j + r(t e_j). \tag{4.21}$$

For t ≠ 0, we can divide by t. Letting t → 0, we see from (4.17b) that the right-hand side converges to l_j. But this means that the left-hand side must converge as well, and comparing with (4.1), we see that its limit is precisely ∂_j f(ξ), thereby proving l_j = ∂_j f(ξ) as claimed. We now consider the case K = C. From the case K = R, we know Re f and Im f are both continuous at ξ, such that, by Ex. 2.12(d), f must be continuous at ξ as well. Moreover, from the case K = R, we know ∂_j Re f(ξ) and ∂_j Im f(ξ) exist for each j ∈ {1, …, n}. Thus, ∂_j f(ξ) = ∂_j Re f(ξ) + i ∂_j Im f(ξ) exists as well by [Phi16, Rem. 9.2]. ∎

By applying Th. 4.21 to coordinate functions, we can immediately extend it to K^m-valued functions:

Corollary 4.22. Let G be an open subset of R^n, n ∈ N, ξ ∈ G. If f : G −→ K^m is R-differentiable in ξ, then f is continuous in ξ, all partials at ξ, i.e. ∂_k f_l(ξ), k ∈ {1, …, n}, l ∈ {1, …, m}, exist, and Df(ξ) = J_f(ξ): For each h = (h_1, …, h_n) ∈ R^n, one has

$$Df(\xi)(h) = J_f(\xi) \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} = \begin{pmatrix} \nabla f_1(\xi)(h) \\ \vdots \\ \nabla f_m(\xi)(h) \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^n \partial_k f_1(\xi) h_k \\ \vdots \\ \sum_{k=1}^n \partial_k f_m(\xi) h_k \end{pmatrix}.$$

In particular, Df(ξ) is unique and, hence, well-defined. ∎


We now proceed to further investigate the relation between R-differentiability and C-differentiability.

Remark 4.23. (a) If L : C −→ C is a C-linear map, then there exists a ∈ C such that L(z) = az. As C = R^2, using the definition of complex multiplication and letting a = α + iβ, z = x + iy, we can write this in matrix form as

$$L(z) = az = \begin{pmatrix} \alpha x - \beta y \\ \alpha y + \beta x \end{pmatrix} = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \tag{4.22}$$

Thus, a map L : R^{2n} −→ R^{2m} can be interpreted as a C-linear map L : C^n −→ C^m if, and only if, it is R-linear and each 2 × 2 block in its matrix representation has the form of (4.22).

(b) If G ⊆ C is open, f : G −→ C, ζ ∈ G, then combining (a) with Cor. 4.22 implies f to be C-differentiable in ζ if, and only if, f is R-differentiable in ζ with

$$Df(\zeta) = \begin{pmatrix} \partial_1 \operatorname{Re} f(\zeta) & \partial_2 \operatorname{Re} f(\zeta) \\ \partial_1 \operatorname{Im} f(\zeta) & \partial_2 \operatorname{Im} f(\zeta) \end{pmatrix}$$

being C-linear. According to (4.22), this means

$$\partial_1 \operatorname{Re} f(\zeta) = \partial_2 \operatorname{Im} f(\zeta), \qquad \partial_1 \operatorname{Im} f(\zeta) = -\partial_2 \operatorname{Re} f(\zeta). \tag{4.23}$$

The equations of (4.23) are known as the Cauchy-Riemann differential equations (they are partial differential equations, as they involve partial derivatives).
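The Cauchy-Riemann equations (4.23) lend themselves to a quick numerical test. The sketch below is an illustration under our own assumptions (function names, step size, tolerance, and test point are arbitrary): it approximates the real 2 × 2 Jacobian of a map f : C −→ C by central differences and checks the block structure of (4.22) for z ↦ z^2 and for complex conjugation.

```python
def real_jacobian(f, z, h=1e-6):
    """Approximate the real 2x2 Jacobian of f: C -> C at z (C identified with R^2)."""
    d1 = (f(z + h) - f(z - h)) / (2 * h)            # partial with respect to x = Re z
    d2 = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # partial with respect to y = Im z
    return ((d1.real, d2.real), (d1.imag, d2.imag))


def satisfies_cauchy_riemann(f, z, tol=1e-6):
    """Check (4.23) approximately: d1 Re f = d2 Im f and d1 Im f = -d2 Re f."""
    (a, b), (c, d) = real_jacobian(f, z)
    return abs(a - d) < tol and abs(c + b) < tol


z0 = 0.5 + 0.25j
square_ok = satisfies_cauchy_riemann(lambda z: z * z, z0)
conj_ok = satisfies_cauchy_riemann(lambda z: z.conjugate(), z0)
print(square_ok, conj_ok)
```

Complex conjugation is R-linear and, hence, R-differentiable, but its Jacobian has the diagonal entries 1 and −1, which violates the first equation of (4.23); this is a simple instance of R-differentiability not implying C-differentiability.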

Corollary 4.24. Let G be an open subset of C^n = R^{2n}, n ∈ N, ζ ∈ G. Then f : G −→ C^m = R^{2m} is C-differentiable in ζ if, and only if, f is R-differentiable in ζ and each 2 × 2 block of the real (2m) × (2n) matrix J_f(ζ) has the form of (4.22) (i.e. each C-valued component function f_l : G −→ C satisfies a set of Cauchy-Riemann equations as in (4.23) for each of its n complex input arguments). Thus, if f is C-differentiable in ζ, then each entry of the “complex Jacobian”, i.e. of the complex m × n matrix representing Df(ζ) : C^n −→ C^m, uniquely corresponds to a 2 × 2 block of partials in J_f(ζ). Each entry of the complex Jacobian can be seen as a complex partial ∂^c_k f_l, uniquely determined by the corresponding four real partials in J_f(ζ). In particular, f is continuous in ζ and Df(ζ) is unique (and, hence, well-defined).

Proof. The corollary merely combines Cor. 4.22 with Rem. 4.23. ∎

Example 4.25. (a) If G ⊆ K^n, n ∈ N, is open and f : G −→ K^m is constant (i.e. there is c ∈ K^m such that f(x) = c for each x ∈ G), then f is K-differentiable with Df ≡ 0: It suffices to notice that, for a constant f and L ≡ 0, the numerator in (4.16a) vanishes identically.

(b) If A : K^n −→ K^m is K-linear, then A is K-differentiable with DA(ζ) = A for each ζ ∈ K^n: If ζ, h ∈ K^n, then A(ζ + h) − A(ζ) − A(h) = 0, showing that, as in (a), the numerator in (4.16a) (with f = L = A) vanishes identically.


(c) Let us revisit the well-known case of an R-differentiable one-dimensional function (cf. [Phi16, Def. 9.1]), and compare this notion of differentiability with the more general one of Def. 4.18. Thus, let G be an open subset of R, ξ ∈ G, and f : G −→ K. We claim that f is differentiable at ξ in the sense of [Phi16, (9.1)] if, and only if, f is R-differentiable at ξ in the sense of Def. 4.18 with

$$Df(\xi) : \mathbb{R} \longrightarrow \mathbb{K}, \quad Df(\xi)(h) := f'(\xi)\, h. \tag{4.24}$$

As, in both situations, a C-valued f is R-differentiable at ξ if, and only if, both Re f and Im f are R-differentiable at ξ, it suffices to consider K = R. Thus, let f be R-valued. If f is differentiable at ξ as a one-dimensional function and we use the Df(ξ) according to (4.24) for the linear map L of Def. 4.18, then we get, for each 0 ≠ h ∈ R sufficiently close to 0,

$$\frac{f(\xi + h) - f(\xi) - L(h)}{\|h\|_2} = \frac{f(\xi + h) - f(\xi) - f'(\xi) h}{|h|} = \begin{cases} \dfrac{f(\xi + h) - f(\xi)}{h} - f'(\xi) & \text{for } h > 0, \\ f'(\xi) - \dfrac{f(\xi + h) - f(\xi)}{h} & \text{for } h < 0. \end{cases} \tag{4.25a}$$

Furthermore, f′(ξ) = lim_{h→0} (f(ξ + h) − f(ξ))/h by its definition, i.e.

$$\lim_{h \to 0} \left| \frac{f(\xi + h) - f(\xi)}{h} - f'(\xi) \right| = 0. \tag{4.25b}$$

Combining (4.25a) and (4.25b), one obtains

$$\lim_{h \to 0} \frac{f(\xi + h) - f(\xi) - L(h)}{\|h\|_2} = 0, \tag{4.25c}$$

showing that f is R-differentiable in ξ in the sense of Def. 4.18. Conversely, if f is R-differentiable in ξ in the sense of Def. 4.18, then, according to Th. 4.21, ∂_1 f(ξ) exists and Df(ξ)(h) = ∂_1 f(ξ) h. Thus, the one-dimensional differentiability of f at ξ as well as (4.24) follow by noticing that the definitions of ∂_1 f(ξ) and of f′(ξ) are identical.

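Example 4.26. We will show that if f : B_r(0) −→ C is represented by a power series, where r ∈ ]0, ∞] is its radius of convergence, B_r(0) ⊆ C (cf. [Phi16, Th. 8.9]), then f is always C-differentiable (i.e. holomorphic) and also C^∞. Then, in particular, this holds on all of C for each polynomial and for the functions exp, sin, and cos (as they all have radius of convergence r = ∞). Thus, let

$$f : B_r(0) \longrightarrow \mathbb{C}, \quad f(z) = \sum_{j=0}^\infty a_j z^j, \tag{4.26a}$$

where a_j ∈ C and r ∈ ]0, ∞] is the radius of convergence of the power series. Consider the power series

$$g : B_r(0) \longrightarrow \mathbb{C}, \quad g(z) = \sum_{j=1}^\infty j\, a_j z^{j-1}. \tag{4.26b}$$

Note that the radius of convergence of g is the same as for f, as

$$\limsup_{n \to \infty} \sqrt[n]{|n\, a_n|} = \lim_{n \to \infty} \sqrt[n]{n} \cdot \limsup_{n \to \infty} \sqrt[n]{|a_n|} = 1 \cdot \limsup_{n \to \infty} \sqrt[n]{|a_n|} = r^{-1}.$$

We have to show that

$$\underset{z \in B_r(0)}{\forall} \quad Df(z) : \mathbb{C} \longrightarrow \mathbb{C}, \quad Df(z)(h) = g(z)\, h.$$

Thus, for each z ∈ B_r(0), we need to prove

$$\lim_{h \to 0} \frac{f(z + h) - f(z) - g(z) h}{|h|} = 0. \tag{4.27}$$

To this end, given z ∈ B_r(0), choose ρ ∈ ]|z|, r[ and define, for sufficiently small h ≠ 0,

$$\delta(h) := \frac{f(z + h) - f(z) - g(z) h}{h} = \frac{\sum_{j=0}^\infty a_j \bigl( (z + h)^j - z^j \bigr)}{h} - \sum_{j=1}^\infty j\, a_j z^{j-1} = \sum_{j=1}^\infty a_j \left( \frac{(z + h)^j - z^j}{h} - j z^{j-1} \right) = \sum_{j=1}^\infty a_j w_j,$$

where

$$\underset{j \in \mathbb{N}}{\forall} \quad w_j := \frac{(z + h)^j - z^j}{h} - j z^{j-1}.$$

Then w_1 = 0 and, for each j ≥ 2,

$$w_j = h \sum_{k=1}^{j-1} k\, z^{k-1} (z + h)^{j-k-1},$$

as can be verified via induction over j (exercise). Thus, for each h ≠ 0 such that |z + h| < ρ, one has

$$\underset{j \ge 2}{\forall} \quad |w_j| \le |h| \sum_{k=1}^{j-1} k\, \rho^{k-1} \rho^{j-k-1} \overset{\text{[Phi16, Ex. 3.4]}}{\le} |h|\, \frac{j (j - 1)}{2}\, \rho^{j-2},$$

implying

$$|\delta(h)| \le |h| \sum_{j=2}^\infty j^2 |a_j|\, \rho^{j-2}. \tag{4.28}$$

As ρ < r, the series in (4.28) converges to some finite (nonnegative real) number, showing lim_{h→0} δ(h) = 0, i.e. (4.27) holds and Df(z)(h) = g(z) h. Since g is again a power series with radius of convergence r, the argument can be applied to g in place of f and then iterated, showing f to be C^∞.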
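The statement of Example 4.26 can be illustrated with the exponential series, i.e. with a_j = 1/j!. The following sketch uses truncated series and is subject to our own assumptions: the truncation after 40 terms, the point z, and the direction of h are arbitrary choices, and the computed quantities are floating-point approximations of the quotient in (4.27).

```python
def f(z, terms=40):
    """Partial sum of the exponential series: sum of z^j / j! for j < terms."""
    total, term = 0j, 1 + 0j  # term holds z^j / j!
    for j in range(terms):
        total += term
        term *= z / (j + 1)
    return total


def g(z, terms=40):
    """Partial sum of the termwise derivative (4.26b): sum of j z^(j-1) / j!."""
    total, term = 0j, 1 + 0j  # term holds z^(j-1) / (j-1)! = j a_j z^(j-1)
    for j in range(1, terms):
        total += term
        term *= z / j
    return total


z = 0.3 + 0.4j

# Quotients |f(z + h) - f(z) - g(z) h| / |h| from (4.27) for shrinking complex h.
errors = []
for k in range(1, 6):
    h = (1 + 1j) * 10.0 ** -k
    errors.append(abs(f(z + h) - f(z) - g(z) * h) / abs(h))

print(errors)
```

The quotients shrink roughly proportionally to |h|, as suggested by the estimate (4.28). For the exponential series, the termwise derivative reproduces the series itself, so that `g(z)` and `f(z)` agree up to rounding.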

Remark 4.27. It is an important result of Complex Analysis that the converse of Ex. 4.26 is also true: If G ⊆ C is open and f : G −→ C is holomorphic, then f is analytic, i.e. locally representable as a power series. More precisely, for each a ∈ G, there exists r > 0 and a sequence (c_j)_{j∈N_0} in C such that B_r(a) ⊆ G and f(z) = ∑_{j=0}^∞ c_j (z − a)^j for each z ∈ B_r(a) (see, e.g., [Kon04, Sec. 6.2] or [Rud87, Th. 10.16]). As a consequence, Ex. 4.26 implies that every holomorphic function is automatically C^∞.


Proposition 4.28. Forming the total derivative is a linear operation: Let G be an open subset of K^n, n ∈ N, ζ ∈ G.

(a) If f, g : G −→ K^m, m ∈ N, are both K-differentiable at ζ, then f + g is K-differentiable at ζ and D(f + g)(ζ) = Df(ζ) + Dg(ζ).

(b) If f : G −→ K^m, m ∈ N, is K-differentiable at ζ and λ ∈ K, then λf is K-differentiable at ζ and D(λf)(ζ) = λ Df(ζ).

Proof. (a): We note that, for each h ∈ K^n with 0 ≠ ‖h‖_2 sufficiently small,

$$\frac{(f + g)(\zeta + h) - (f + g)(\zeta) - Df(\zeta)(h) - Dg(\zeta)(h)}{\|h\|_2} = \frac{f(\zeta + h) - f(\zeta) - Df(\zeta)(h)}{\|h\|_2} + \frac{g(\zeta + h) - g(\zeta) - Dg(\zeta)(h)}{\|h\|_2}.$$

Thus, if the limit lim_{h→0} exists and equals 0 for both summands on the right-hand side, then the same must be true for the left-hand side.

(b): For λ ∈ K, one computes

$$\lim_{h \to 0} \frac{(\lambda f)(\zeta + h) - (\lambda f)(\zeta) - \lambda\, Df(\zeta)(h)}{\|h\|_2} = \lambda \lim_{h \to 0} \frac{f(\zeta + h) - f(\zeta) - Df(\zeta)(h)}{\|h\|_2} = 0,$$

thereby establishing the case. ∎

Even though we have seen in Example 4.2 that the existence of all partial derivatives does not even imply continuity, let alone differentiability, the next theorem and its corollary will show that if all partial derivatives exist and are continuous, then that does, indeed, imply differentiability.

Theorem 4.29. Let G be an open subset of R^n, n ∈ N, ξ ∈ G, and f : G −→ K. If all partials ∂_j f, j ∈ {1, …, n}, exist everywhere in G and are continuous in ξ, then f is R-differentiable in ξ, and, in particular, f is continuous in ξ.

Proof. As usual, the case K = C follows by applying the case K = R to Re f and Im f. We therefore proceed to treat the case K = R. We first consider the special case where ∂_j f(ξ) = 0 for each j ∈ {1, . . . , n}. In that case, we need to show

\[ \lim_{h\to 0} \frac{f(\xi+h) - f(\xi)}{\|h\|_1} = 0 \tag{4.29} \]

(noting that ‖·‖_1 and ‖·‖_2 are equivalent on R^n). Since G is open and since the ∂_j f are continuous in ξ, given ε > 0, there is δ > 0 such that, for each h ∈ R^n with ‖h‖_1 < δ, one has ξ + h ∈ G and |∂_j f(ξ + h)| < ε for every j ∈ {1, . . . , n}. Fix h ∈ R^n with ‖h‖_1 < δ. Then

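\[ \begin{aligned}
f(\xi+h) - f(\xi)
&= f(\xi_1+h_1, \dots, \xi_{n-1}+h_{n-1}, \xi_n+h_n) - f(\xi_1+h_1, \dots, \xi_{n-1}+h_{n-1}, \xi_n) \\
&\quad + f(\xi_1+h_1, \dots, \xi_{n-1}+h_{n-1}, \xi_n) - f(\xi_1+h_1, \dots, \xi_{n-1}, \xi_n) \\
&\quad + - \cdots + f(\xi_1+h_1, \xi_2, \dots, \xi_n) - f(\xi_1, \xi_2, \dots, \xi_n) \\
&= f(\xi+h) - f\Big(\xi + \sum_{k=1}^{n-1} h_k e_k\Big) + f\Big(\xi + \sum_{k=1}^{n-1} h_k e_k\Big) - f\Big(\xi + \sum_{k=1}^{n-2} h_k e_k\Big) \\
&\quad + - \cdots + f(\xi + h_1 e_1) - f(\xi) \\
&= \sum_{j=0}^{n-1} \bigg( f\Big(\xi + \sum_{k=1}^{n-j} h_k e_k\Big) - f\Big(\xi + \sum_{k=1}^{n-(j+1)} h_k e_k\Big) \bigg) \\
&= \sum_{j=0}^{n-1} \big( \phi_j(h_{n-j}) - \phi_j(0) \big),
\end{aligned} \tag{4.30} \]

where, for each j ∈ {0, . . . , n − 1},

\[ \phi_j : [0, h_{n-j}] \longrightarrow \mathbb{R}, \quad \phi_j(t) := f\Big(\xi + t\, e_{n-j} + \sum_{k=1}^{n-(j+1)} h_k e_k\Big). \]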

If h_{n−j} = 0, then set θ_j := 0. Otherwise, apply the one-dimensional mean value theorem [Phi16, Th. 9.18] to the one-dimensional function φ_j to get numbers θ_j ∈ ]0, h_{n−j}[ such that

\[ \phi_j(h_{n-j}) - \phi_j(0) = h_{n-j}\, \phi_j'(\theta_j) = h_{n-j}\, \partial_{n-j} f\Big(\xi + \theta_j e_{n-j} + \sum_{k=1}^{n-(j+1)} h_k e_k\Big). \tag{4.31} \]

Combining (4.30) with (4.31) yields

\[ f(\xi+h) - f(\xi) = \sum_{j=0}^{n-1} h_{n-j}\, \partial_{n-j} f\Big(\xi + \theta_j e_{n-j} + \sum_{k=1}^{n-(j+1)} h_k e_k\Big). \tag{4.32} \]

Noting that ‖h‖_1 < δ implies ‖θ_j e_{n−j} + ∑_{k=1}^{n−(j+1)} h_k e_k‖_1 < δ, we obtain from (4.32) that, for 0 ≠ h with ‖h‖_1 < δ,

\[ \frac{|f(\xi+h) - f(\xi)|}{\|h\|_1} < \frac{1}{\|h\|_1} \sum_{j=0}^{n-1} |h_{n-j}|\, \epsilon = \epsilon, \]

thereby proving (4.29) and establishing the case. It remains to consider a general f : G −→ R, without the restriction of a vanishing gradient. For such a general f, consider the modified function g : G −→ R, g(x) := f(x) − ∇f(ξ)(x) = f(x) − ∑_{j=1}^{n} ∂_j f(ξ) x_j.

For g, we then get ∂_j g(x) = ∂_j f(x) − ∂_j f(ξ) for each x ∈ G. In particular, the ∂_j g exist in G, are continuous at x = ξ, and vanish at x = ξ. Thus, the first part of the proof applies to g, showing that g is R-differentiable at ξ. Since f = g + ∇f(ξ) and both g and the linear map ∇f(ξ) are R-differentiable at ξ, so is f by Prop. 4.28(a). □

Corollary 4.30. Let G be an open subset of R^n, n ∈ N, ξ ∈ G, and f : G −→ K^m, m ∈ N. If all partials ∂_k f_l, k ∈ {1, . . . , n}, l ∈ {1, . . . , m}, exist everywhere in G and are continuous in ξ, then f is R-differentiable in ξ, and, in particular, f is continuous in ξ.

Proof. Applying Th. 4.29 to the coordinate functions f_l, l ∈ {1, . . . , m}, yields that each f_l is R-differentiable at ξ. However, since a K^m-valued function converges if, and only if, each of its coordinate functions converges, f must also be R-differentiable at ξ. □

4.6 Higher Order Total Derivatives as Multilinear Maps

Let G be an open subset of K^n, n ∈ N, f : G −→ K^m, m ∈ N. If f is K-differentiable in G, then Df : G −→ L(K^n, K^m), where L(K^n, K^m) ≅ K^{nm} denotes the vector space over K of all K-linear maps from K^n into K^m. The coordinate functions are the partial derivatives ∂_j f_l (as mentioned in Cor. 4.24, this also makes sense for K = C). If this function Df is K-differentiable in ζ ∈ G, then we call its derivative the second total derivative of f in ζ, denoted by D^2 f(ζ). It is an element of L(K^n, L(K^n, K^m)) ≅ K^{n^2 m}. Fortunately, L(K^n, L(K^n, K^m)) ≅ L^2(K^n, K^m), where L^2(K^n, K^m) is the vector space over K of all bilinear maps from K^n × K^n into K^m (cf. Sec. J, Th. J.3, in the Appendix). Thus,

\[ D^2 f(\zeta) : \mathbb{K}^n \times \mathbb{K}^n \longrightarrow \mathbb{K}^m, \quad D^2 f(\zeta)(h,k)_l = \sum_{j_1, j_2 = 1}^{n} \partial_{j_1} \partial_{j_2} f_l(\zeta)\, h_{j_1} k_{j_2}, \quad l \in \{1, \dots, m\}. \]

If f is twice K-differentiable in G, then D^2 f : G −→ L^2(K^n, K^m) and its coordinate functions are the second partials of f.

Inductively, one obtains that, if f is α times K-differentiable in G, α ∈ N, then D^α f : G −→ L^α(K^n, K^m) ≅ L(K^n, L^{α−1}(K^n, K^m)) ≅ K^{n^α m}, where L^α(K^n, K^m) is the vector space over K of all α times linear maps from (K^n)^α into K^m, L^0(K^n, K^m) := K^m (again, cf. Sec. J, Th. J.3). For each ζ ∈ G,

\[ D^\alpha f(\zeta) : (\mathbb{K}^n)^\alpha \longrightarrow \mathbb{K}^m, \]

\[ D^\alpha f(\zeta)(h^1, \dots, h^\alpha)_l = \sum_{j_1, \dots, j_\alpha = 1}^{n} \partial_{j_1} \cdots \partial_{j_\alpha} f_l(\zeta)\, h^1_{j_1} \cdots h^\alpha_{j_\alpha}, \quad l \in \{1, \dots, m\}, \tag{4.33} \]

i.e. the coordinate functions of D^α f are precisely the partials of order α of f.


4.7 The Chain Rule

As for one-dimensional differentiable functions, one can also prove a chain rule for vector-valued differentiable functions:

Theorem 4.31. Let m, n, p ∈ N. Let G_f ⊆ K^n be open, f : G_f −→ K^m, let G_g ⊆ K^m be open, g : G_g −→ K^p, f(G_f) ⊆ G_g. If f is K-differentiable at ζ ∈ G_f and g is K-differentiable at f(ζ) ∈ G_g, then g ∘ f : G_f −→ K^p is K-differentiable at ζ and, for the K-linear maps D(g ∘ f)(ζ) : K^n −→ K^p, Df(ζ) : K^n −→ K^m, and Dg(f(ζ)) : K^m −→ K^p, the following chain rule holds:

\[ D(g \circ f)(\zeta) = Dg\big(f(\zeta)\big) \circ Df(\zeta). \tag{4.34} \]

In particular, if both f and g are K-differentiable, then g ∘ f is K-differentiable.

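Proof. Since f is K-differentiable at ζ and g is K-differentiable at f(ζ), according to Lem. 4.20, there are functions r_f : K^n −→ K^m and r_g : K^m −→ K^p satisfying

\[ r_f(h) = f(\zeta+h) - f(\zeta) - Df(\zeta)(h), \tag{4.35a} \]

\[ r_g(h) = g\big(f(\zeta)+h\big) - g\big(f(\zeta)\big) - Dg\big(f(\zeta)\big)(h) \tag{4.35b} \]

for each h ∈ K^n (resp. each h ∈ K^m) such that ‖h‖_2 is sufficiently small, as well as

\[ \lim_{h\to 0} \frac{r_f(h)}{\|h\|_2} = 0, \qquad \lim_{h\to 0} \frac{r_g(h)}{\|h\|_2} = 0. \tag{4.36} \]

Defining r_{g∘f} : K^n −→ K^p by

\[ r_{g\circ f}(h) := \begin{cases} (g\circ f)(\zeta+h) - (g\circ f)(\zeta) - \big(Dg(f(\zeta)) \circ Df(\zeta)\big)(h) & \text{for } \zeta+h \in G_f, \\ 0 & \text{otherwise,} \end{cases} \tag{4.37} \]

it remains to show

\[ \lim_{h\to 0} \frac{r_{g\circ f}(h)}{\|h\|_2} = 0. \tag{4.38} \]

For each h ∈ K^n with ‖h‖_2 sufficiently small, we use (4.35) to compute

\[ \begin{aligned}
(g\circ f)(\zeta+h) &= g\big(f(\zeta) + Df(\zeta)(h) + r_f(h)\big) \\
&= g\big(f(\zeta)\big) + Dg\big(f(\zeta)\big)\big(Df(\zeta)(h) + r_f(h)\big) + r_g\big(Df(\zeta)(h) + r_f(h)\big),
\end{aligned} \]

implying

\[ r_{g\circ f}(h) = Dg\big(f(\zeta)\big)\big(r_f(h)\big) + r_g\big(Df(\zeta)(h) + r_f(h)\big). \]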

From Th. 2.22, we know that Dg(f(ζ)) is Lipschitz continuous with some Lipschitz constant L_g ∈ R_0^+. Thus, for each 0 ≠ h ∈ K^n,

\[ 0 \le \frac{\big\| Dg\big(f(\zeta)\big)\big(r_f(h)\big) \big\|_2}{\|h\|_2} \le L_g\, \frac{\| r_f(h) \|_2}{\|h\|_2}, \]

implying

\[ \lim_{h\to 0} \frac{\big\| Dg\big(f(\zeta)\big)\big(r_f(h)\big) \big\|_2}{\|h\|_2} = 0 \]

due to (4.36). Thus, to prove (4.38), it merely remains to show

\[ \lim_{h\to 0} \frac{\big\| r_g\big(Df(\zeta)(h) + r_f(h)\big) \big\|_2}{\|h\|_2} = 0. \tag{4.39} \]

To that end, we rewrite, for Df(ζ)(h) + r_f(h) ≠ 0,

\[ \frac{\big\| r_g\big(Df(\zeta)(h) + r_f(h)\big) \big\|_2}{\|h\|_2} = \frac{\| Df(\zeta)(h) + r_f(h) \|_2 \, \big\| r_g\big(Df(\zeta)(h) + r_f(h)\big) \big\|_2}{\|h\|_2 \, \| Df(\zeta)(h) + r_f(h) \|_2}. \tag{4.40a} \]

Next, note

\[ \lim_{h\to 0} \| Df(\zeta)(h) + r_f(h) \|_2 = 0 \quad \overset{(4.36)}{\Rightarrow} \quad \lim_{h\to 0} \frac{\big\| r_g\big(Df(\zeta)(h) + r_f(h)\big) \big\|_2}{\| Df(\zeta)(h) + r_f(h) \|_2} = 0. \tag{4.40b} \]

Once again, from Th. 2.22, we know that Df(ζ) is Lipschitz continuous with some Lipschitz constant L_f ∈ R_0^+, implying

\[ \frac{\| Df(\zeta)(h) + r_f(h) \|_2}{\|h\|_2} \le \frac{\| Df(\zeta)(h) \|_2 + \| r_f(h) \|_2}{\|h\|_2} \le L_f + 1 \tag{4.40c} \]

for 0 ≠ ‖h‖_2 sufficiently small. Combining (4.40a) – (4.40c) proves (4.39) and, thus, (4.38). Together with (4.37) and Lem. 4.20, this shows that g ∘ f is differentiable at ζ with D(g ∘ f)(ζ) = Dg(f(ζ)) ∘ Df(ζ). □
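The chain rule (4.34) can be checked numerically by comparing a finite-difference Jacobian of g ∘ f with the product of the finite-difference Jacobians of g and f. The following is only a sketch for K = R: the maps f and g, the point zeta, and the helper names jacobian and matmul are illustrative choices that do not occur in the notes, and central differences merely approximate the derivatives.

```python
import math

def jacobian(func, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of func at x (list of rows)."""
    fx = func(x)
    rows = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(len(fx)):
            rows[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return rows

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def f(x):  # illustrative map R^2 -> R^3 (assumption, not from the notes)
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def g(y):  # illustrative map R^3 -> R^2 (assumption, not from the notes)
    return [y[0] + y[1] * y[2], math.exp(y[0])]

def g_after_f(x):
    return g(f(x))

zeta = [0.7, -0.3]
lhs = jacobian(g_after_f, zeta)                          # approximates D(g o f)(zeta)
rhs = matmul(jacobian(g, f(zeta)), jacobian(f, zeta))    # approximates Dg(f(zeta)) o Df(zeta)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

Up to discretization and rounding error, max_gap should be close to zero, in agreement with (4.34).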

Example 4.32. In the setting of the chain rule of Th. 4.31, we consider the special case n = p = 1. Thus, we have an open subset G_f of R and f : G_f −→ R^m. The map g maps G_g into K and for h := g ∘ f : G_f −→ K, we have h(t) = g(f_1(t), . . . , f_m(t)).

In this case, one computes the one-dimensional function h by making a detour through the m-dimensional space R^m. If f is R-differentiable at ξ ∈ G_f and g is R-differentiable at f(ξ) ∈ G_g, the chain rule (4.34) now reads

\[ Dh(\xi) = D(g\circ f)(\xi) = Dg\big(f(\xi)\big) \circ Df(\xi) = \nabla g\big(f(\xi)\big)\, J_f(\xi) = \sum_{j=1}^{m} \partial_j g\big(f(\xi)\big)\, \partial_1 f_j(\xi), \tag{4.41} \]

where, for K = C, we applied the case K = R of the chain rule to Re g and Im g. Recall from Example 4.25(c) that, for one-dimensional functions such as h, the function Dh(ξ) : R −→ K corresponds to the number h′(ξ) ∈ K via (4.24). Also recall that, for one-dimensional functions such as f_j, the partial derivative ∂_1 f_j coincides with the one-dimensional derivative f_j′. Thus, (4.41) implies

\[ h'(\xi) = \sum_{j=1}^{m} \partial_j g\big(f(\xi)\big)\, f_j'(\xi). \tag{4.42} \]


Definition 4.33. Let G ⊆ R^m, m ∈ N. A differentiable path is an R-differentiable function φ : ]a, b[ −→ G, a, b ∈ R, a < b. The set G is called connected by differentiable paths if, and only if, for each x, y ∈ G, there exists some differentiable path φ : ]a, b[ −→ G such that φ(s) = x and φ(t) = y for suitable s, t ∈ ]a, b[.

Proposition 4.34. Let G ⊆ R^m be open, m ∈ N. If G is connected by differentiable paths and f : G −→ K is R-differentiable with ∇f ≡ 0, then f is constant.

Proof. Let x, y ∈ G, and let φ : ]a, b[ −→ G be a differentiable path connecting x and y, i.e. φ(s) = x and φ(t) = y for suitable s, t ∈ ]a, b[. Define the auxiliary function h : ]a, b[ −→ K, h = f ∘ φ. By the chain rule of Th. 4.31, h is R-differentiable and, using (4.42) and ∂_j f ≡ 0 for each j ∈ {1, . . . , m},

\[ h'(\xi) = \sum_{j=1}^{m} \partial_j f\big(\phi(\xi)\big)\, \phi_j'(\xi) = 0 \quad \text{for each } \xi \in\, ]a,b[. \]

As a one-dimensional function on an open interval with vanishing derivative, h must be constant (as both Re h and Im h must be constant by [Phi16, Cor. 9.19(b)]), implying f(x) = f(φ(s)) = h(s) = h(t) = f(φ(t)) = f(y), showing that f is constant as well. □

4.8 The Mean Value Theorem

Another application of the chain rule in several variables is the mean value theorem in several variables:

Theorem 4.35. Let G ⊆ R^n be open, n ∈ N, f : G −→ R. If f is R-differentiable on G and x, y ∈ G are such that the entire line segment connecting x and y is also contained in G, i.e. S_{x,y} := {x + t(y − x) : 0 < t < 1} ⊆ G, then there is ξ ∈ S_{x,y} satisfying

\[ f(y) - f(x) = Df(\xi)(y-x) = \nabla f(\xi)(y-x) = \sum_{j=1}^{n} \partial_j f(\xi)(y_j - x_j). \tag{4.43} \]

Proof. We merely need to combine the one-dimensional mean value theorem [Phi16, Th. 9.18] with the chain rule of Th. 4.31. A small problem arises from the fact that, in Th. 4.31, we required G_f to be open. We therefore note that the openness of G allows us to find some ε > 0 such that the small extension S_{x,y,ε} := {x + t(y − x) : −ε < t < 1 + ε} is still contained in G: S_{x,y,ε} ⊆ G. Consider the auxiliary functions

\[ \phi :\, ]-\epsilon, 1+\epsilon[\, \longrightarrow \mathbb{R}^n, \quad \phi(t) := x + t(y-x), \]

\[ h :\, ]-\epsilon, 1+\epsilon[\, \longrightarrow \mathbb{R}, \quad h(t) := (f\circ\phi)(t) = f\big(x + t(y-x)\big). \]

As the sum of a constant function and a linear function, φ is differentiable, and Dφ(t) : R −→ R^n, Dφ(t) = y − x (that means, for each α ∈ R, one has Dφ(t)(α) = α(y − x)). Thus, according to Th. 4.31, h is differentiable, and, using (4.41),

\[ Dh(t) = Df\big(\phi(t)\big) \circ D\phi(t) = \nabla f\big(\phi(t)\big)(y-x). \tag{4.44} \]

The one-dimensional mean value theorem [Phi16, Th. 9.18] provides θ ∈ ]0, 1[ such that

\[ f(y) - f(x) = h(1) - h(0) = h'(\theta). \tag{4.45} \]

As in Example 4.32, we recall from (4.24) that the real number h′(θ) represents the linear map Dh(θ) such that we can combine (4.44) and (4.45) to obtain

\[ f(y) - f(x) = h'(\theta) = \nabla f\big(\phi(\theta)\big)(y-x) = \nabla f(\xi)(y-x) \]

with ξ := φ(θ) = x + θ(y − x) ∈ S_{x,y}, concluding the proof of (4.43). □
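To see the proof at work on a concrete case, the following sketch locates a point ξ on the segment S_{x,y} for which (4.43) holds. The function f(x_1, x_2) := x_1^3 + x_2, the endpoints, and the use of bisection are illustrative assumptions, not taken from the notes; here h′(t) = 3t^2 + 1 and f(y) − f(x) = 2, so one expects θ = 1/√3.

```python
def f(p):  # illustrative function R^2 -> R (assumption)
    return p[0] ** 3 + p[1]

def grad_f(p):  # its gradient, computed by hand
    return [3 * p[0] ** 2, 1.0]

x = [0.0, 0.0]
y = [1.0, 1.0]
d = [y[0] - x[0], y[1] - x[1]]

def phi(t):
    """The path phi(t) = x + t (y - x) from the proof."""
    return [x[0] + t * d[0], x[1] + t * d[1]]

def gap(t):
    """h'(t) - (f(y) - f(x)), with h'(t) = grad f(phi(t)) . (y - x) as in (4.44)."""
    g = grad_f(phi(t))
    return g[0] * d[0] + g[1] * d[1] - (f(y) - f(x))

# Bisection on [0, 1]; gap(0) = -1 < 0 < 2 = gap(1) for this example.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if gap(mid) < 0:
        lo = mid
    else:
        hi = mid
theta = 0.5 * (lo + hi)
xi = phi(theta)
print(theta, xi)
```

Bisection is used only because the sign change of gap on [0, 1] is known for this particular example; the theorem itself guarantees existence of θ, not a method to compute it.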

Caveat 4.36. Unlike many other results of this class, Th. 4.35 does not extend to C-valued functions – actually, even the one-dimensional mean value theorem does not extend to C-valued functions. It is an exercise to find an explicit counterexample of a differentiable function f : R −→ C and x, y ∈ R, x < y, such that there does not exist ξ ∈ ]x, y[ satisfying f(y) − f(x) = f′(ξ)(y − x).

—

As an application of Th. 4.35, let us prove that differentiable maps with bounded partials are Lipschitz continuous on convex sets.

Definition 4.37. A set G ⊆ R^n, n ∈ N, is called convex if, and only if, for each x, y ∈ G, one has S_{x,y} := {x + t(y − x) : 0 < t < 1} ⊆ G.

Theorem 4.38. Let m, n ∈ N, let G ⊆ R^n be open, and let f : G −→ R^m be R-differentiable. Suppose there exists M ∈ R_0^+ such that |∂_j f_l(ξ)| ≤ M for each j ∈ {1, . . . , n}, each l ∈ {1, . . . , m}, and each ξ ∈ G. If G is convex, then f is Lipschitz continuous with Lipschitz constant L := mM with respect to the 1-norms on R^n and R^m and with Lipschitz constant cL, c > 0, with respect to arbitrary norms on R^n and R^m.

Proof. Fix l ∈ {1, . . . , m}. We first show that f_l is M-Lipschitz with respect to the 1-norm on R^n: Since f_l is differentiable and G is convex, given x, y ∈ G, we can apply Th. 4.35 to obtain ξ_l ∈ G such that

\[ |f_l(y) - f_l(x)| \overset{(4.43)}{\le} \sum_{j=1}^{n} |\partial_j f_l(\xi_l)|\, |y_j - x_j| \le M \|y - x\|_1, \]

showing that, with respect to the 1-norm, f_l is Lipschitz continuous with Lipschitz constant M. In consequence, we obtain, for each x, y ∈ G,

\[ \|f(y) - f(x)\|_1 = \sum_{l=1}^{m} |f_l(y) - f_l(x)| \le mM \|y - x\|_1, \]

showing that, with respect to the 1-norms on R^n and R^m, f is Lipschitz continuous with Lipschitz constant mM. Since all norms on R^n and R^m are equivalent, we also get that f is Lipschitz continuous with Lipschitz constant cL, c > 0, with respect to all other norms on R^n and R^m. □

For a variant of Th. 4.38, where the bound on the derivatives is the same as the resulting Lipschitz constant, see Th. K.2 of the Appendix.


4.9 Directional Derivatives

Given a real-valued function f, the partial derivatives ∂_j f (if they exist) describe the local change of f in the direction of the standard unit vector e_j. We would now like to generalize the notion of partial derivative in such a way that it allows us to study the change of f in an arbitrary direction e ∈ R^n. This leads to the following notion of directional derivatives.

Definition 4.39. Let G ⊆ R^n, n ∈ N, f : G −→ K, ξ ∈ G, e ∈ R^n. If there is ε > 0 such that ξ + he ∈ G for each h ∈ ]0, ε[ (this condition is trivially satisfied if ξ is an interior point of G), then f is said to have a directional derivative at ξ in the direction e if, and only if, the limit

\[ \lim_{h \downarrow 0} \frac{f(\xi + he) - f(\xi)}{h} \tag{4.46} \]

exists in K. In that case, this limit is identified with the corresponding directional derivative and denoted by (∂f/∂e)(ξ) or by δf(ξ, e). If the directional derivative of f in the direction e exists for each ξ ∈ G, then the function

\[ \frac{\partial f}{\partial e} : G \longrightarrow \mathbb{K}, \quad \xi \mapsto \frac{\partial f}{\partial e}(\xi), \tag{4.47} \]

is also called the directional derivative of f in the direction e.

Remark 4.40. Consider the setting of Def. 4.39 and suppose e = e_j for some j ∈ {1, . . . , n}. If ξ is an interior point of G, then the directional derivative (∂f/∂e)(ξ) coincides with the partial derivative ∂_j f(ξ) of Def. 4.1 if, and only if, both (∂f/∂e)(ξ) and (∂f/∂(−e))(ξ) exist and (∂f/∂e)(ξ) = −(∂f/∂(−e))(ξ): If ∂_j f(ξ) exists, then

\[ \begin{aligned}
\partial_j f(\xi) &= \lim_{h\to 0} \frac{f(\xi + he_j) - f(\xi)}{h} = \lim_{h\downarrow 0} \frac{f(\xi + he_j) - f(\xi)}{h} = \frac{\partial f}{\partial e}(\xi) \\
&= \lim_{h\uparrow 0} \frac{f(\xi + he_j) - f(\xi)}{h} = \lim_{h\downarrow 0} \frac{f(\xi - he_j) - f(\xi)}{-h} = -\lim_{h\downarrow 0} \frac{f(\xi + h(-e_j)) - f(\xi)}{h} = -\frac{\partial f}{\partial(-e)}(\xi).
\end{aligned} \tag{4.48} \]

On the other hand, if both (∂f/∂e)(ξ) and (∂f/∂(−e))(ξ) exist and (∂f/∂e)(ξ) = −(∂f/∂(−e))(ξ), then the corresponding equalities in (4.48) show that both one-sided partials exist at ξ and that their values agree, showing that ∂_j f(ξ) = (∂f/∂e)(ξ) exists.

—

We can now generalize Th. 4.21:

Theorem 4.41. Let G be an open subset of R^n, n ∈ N, ξ ∈ G. If f : G −→ K is R-differentiable in ξ, then, for each e = (ε_1, . . . , ε_n) ∈ R^n, the directional derivative (∂f/∂e)(ξ) exists and

\[ \frac{\partial f}{\partial e}(\xi) = \nabla f(\xi) \cdot e = \sum_{j=1}^{n} \epsilon_j\, \partial_j f(\xi). \tag{4.49} \]

Moreover, if we consider K = R and only allow normalized e ∈ R^n with ‖e‖_2 = 1, then the directional derivatives can take only values between α := ‖∇f(ξ)‖_2 and −α, where the largest value (i.e. α) is attained in the direction e_max := ∇f(ξ)/α and the smallest value (i.e. −α) is attained in the direction e_min := −e_max. For n = 1, e = ±1 are the only possible directions, yielding precisely the values α and −α. For n ≥ 2, all values in [−α, α] are attained.

Proof. Since G is open, there is ε > 0 such that ξ + he ∈ G for each h ∈ ]−ε, ε[. Similarly to the proof of Th. 4.35, consider auxiliary functions

\[ \phi :\, ]-\epsilon, \epsilon[\, \longrightarrow \mathbb{R}^n, \quad \phi(h) := \xi + he, \]

\[ g :\, ]-\epsilon, \epsilon[\, \longrightarrow \mathbb{K}, \quad g(h) := (f\circ\phi)(h) = f(\xi + he). \]

Theorem 4.31 yields the R-differentiability of g, and, as Dφ ≡ e (i.e., for each h ∈ ]−ε, ε[ and each α ∈ R, it is Dφ(h)(α) = αe), by (4.41), we have

\[ \forall_{h \in ]-\epsilon,\epsilon[} \quad g'(h) = D(f\circ\phi)(h) = Df\big(\phi(h)\big) \circ D\phi(h) = \nabla f(\xi + he) \cdot e \]

and

\[ \frac{\partial f}{\partial e}(\xi) = g'(0) = \nabla f(\xi) \cdot e, \]

proving (4.49). Applying the Cauchy-Schwarz inequality (1.41) to (4.49) yields

\[ \left| \frac{\partial f}{\partial e}(\xi) \right| = \big| \nabla f(\xi) \cdot e \big| \le \| \nabla f(\xi) \|_2\, \| e \|_2 = \alpha \|e\|_2. \tag{4.50} \]

Thus, for K = R and e ∈ R^n with ‖e‖_2 = 1, we have −α ≤ (∂f/∂e)(ξ) ≤ α. It remains to show that, for K = R and n ≥ 2, the map

\[ D : S_1(0) \longrightarrow [-\alpha, \alpha], \quad D(e) := \nabla f(\xi) \cdot e = \sum_{j=1}^{n} \epsilon_j\, \partial_j f(\xi), \]

is surjective. The details are a bit tedious and are carried out in App. K.2. □
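Formula (4.49) and the bound (4.50) can be illustrated numerically. The sketch below assumes a sample function f(x, y) := x^2 y + sin(y) and the point ξ = (1, 2), neither of which occurs in the notes; the one-sided difference quotient with a fixed small h only approximates the limit (4.46).

```python
import math

def f(p):  # illustrative function R^2 -> R (assumption)
    return p[0] ** 2 * p[1] + math.sin(p[1])

def grad_f(p):  # its gradient, computed by hand
    return [2 * p[0] * p[1], p[0] ** 2 + math.cos(p[1])]

def one_sided_quotient(p, e, h=1e-6):
    """Difference quotient from (4.46) for one small h > 0."""
    return (f([p[0] + h * e[0], p[1] + h * e[1]]) - f(p)) / h

xi = [1.0, 2.0]
grad = grad_f(xi)
alpha = math.hypot(grad[0], grad[1])            # alpha = ||grad f(xi)||_2
e_max = [grad[0] / alpha, grad[1] / alpha]      # direction of steepest ascent

# Directional derivatives over sampled unit directions, using (4.49).
values = []
for k in range(360):
    angle = 2 * math.pi * k / 360
    unit = [math.cos(angle), math.sin(angle)]
    values.append(grad[0] * unit[0] + grad[1] * unit[1])

# Comparison of (4.49) with the difference quotient for a non-normalized direction.
e = [3.0, -1.0]
exact = grad[0] * e[0] + grad[1] * e[1]
approx = one_sided_quotient(xi, e)
print(exact, approx, max(values), alpha)
```

All sampled values lie in [−α, α], and the value in direction e_max equals α up to rounding, as stated in Th. 4.41.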

The following example shows that the existence of all directional derivatives does not imply continuity, let alone differentiability.

Example 4.42. Consider the function

\[ f : \mathbb{R}^2 \longrightarrow \mathbb{K}, \quad f(x,y) := \begin{cases} 1 & \text{for } 0 < y < x^2, \\ 0 & \text{otherwise.} \end{cases} \]

The function is not continuous in (0, 0): Let x_n := 1/n and y_n := 1/n^3. Then lim_{n→∞} (x_n, y_n) = (0, 0). However, since y_n = 1/n^3 < 1/n^2 = x_n^2 for n > 1, one has

\[ \lim_{n\to\infty} f(x_n, y_n) = 1 \ne 0 = f(0,0). \]

We now claim that, for each e = (ε_x, ε_y) ∈ R^2, the directional derivative (∂f/∂e)(0, 0) exists and (∂f/∂e)(0, 0) = 0. For ε_y ≤ 0 this is immediate since, for each h ∈ R^+, f((0, 0) + h(ε_x, ε_y)) = f(hε_x, hε_y) = 0. Now assume ε_y > 0. If ε_x = 0, then f(hε_x, hε_y) = f(0, hε_y) = 0 for each h ∈ R^+, showing (∂f/∂e)(0, 0) = 0. It remains to consider the case where ε_y > 0 and ε_x ≠ 0. In that case, one obtains h^2 ε_x^2 < hε_y for each 0 < h < ε_y/ε_x^2. Thus, for such h, f(hε_x, hε_y) = 0, once again proving (∂f/∂e)(0, 0) = 0.
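Both claims of Example 4.42 can be observed directly. In the sketch below (for K = R), the list of sample directions and the step sizes are arbitrary illustrative choices; for ε_y > 0 and ε_x ≠ 0, the step h is taken below the threshold ε_y/ε_x^2 from the example.

```python
def f(x, y):
    """The function of Example 4.42 (real-valued version)."""
    return 1.0 if 0 < y < x ** 2 else 0.0

# Along the curve (1/n, 1/n^3), n > 1, the function is constantly 1,
# even though the points converge to (0, 0), where f vanishes.
curve_values = [f(1 / n, 1 / n ** 3) for n in range(2, 50)]

def quotient(ex, ey, h):
    """Difference quotient (f(h e) - f(0, 0)) / h in the direction e = (ex, ey)."""
    return (f(h * ex, h * ey) - f(0.0, 0.0)) / h

# Sample directions (illustrative choice).
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-2.0, 0.5), (1.0, -1.0), (3.0, 0.01)]
quotients = []
for ex, ey in directions:
    if ey > 0 and ex != 0:
        h = 0.5 * ey / ex ** 2   # any 0 < h < ey / ex^2 works
    else:
        h = 1e-3                 # any h > 0 works in the remaining cases
    quotients.append(quotient(ex, ey, h))

print(curve_values[:3], quotients)
```

Every difference quotient vanishes once h is small enough, while f stays equal to 1 along the curve approaching the origin.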

4.10 Taylor’s Theorem

We will now extend Taylor’s theorem to higher dimensions by means of the chain rule. First, we need to introduce some notation.

Notation 4.43. In the context of Taylor’s theorem, we need to consider directional derivatives of higher order. In this context, one often uses a slightly different notation than the one we used earlier. Let n ∈ N and h = (h_1, . . . , h_n) ∈ R^n. If G ⊆ R^n is open and f : G −→ K is differentiable at some ξ ∈ G, then, according to (4.49), we can compute the directional derivative

\[ (h\nabla)(f)(\xi) := \frac{\partial f}{\partial h}(\xi) = \sum_{j=1}^{n} h_j\, \partial_j f(\xi) = h_1 \partial_1 f(\xi) + \cdots + h_n \partial_n f(\xi). \tag{4.51} \]

The object h∇ is also called a differential operator. If f has all partials of second order at ξ, then we can apply h∇ again to the function in (4.51), obtaining

\[ (h\nabla)^2(f)(\xi) := (h\nabla)(h\nabla)(f)(\xi) = \sum_{j=1}^{n} (h\nabla)(h_j \partial_j f)(\xi) = \sum_{j,k=1}^{n} h_k h_j\, \partial_k \partial_j f(\xi). \tag{4.52} \]

Thus, if f has all partials of order k at ξ, k ∈ N, then an induction yields

\[ (h\nabla)^k(f)(\xi) = \sum_{j_1, \dots, j_k = 1}^{n} h_{j_k} \cdots h_{j_1}\, \partial_{j_k} \cdots \partial_{j_1} f(\xi). \tag{4.53a} \]

If f is k times R-differentiable, then comparing (4.53a) with (4.33) (for h^1 = · · · = h^k = h ∈ R^n) yields

\[ (h\nabla)^k(f)(\xi) = D^k f(\xi)(\underbrace{h, \dots, h}_{k \text{ times}}). \tag{4.53b} \]

Finally, it is also useful to define

\[ D^0 f(\xi) := (h\nabla)^0(f)(\xi) := f(\xi). \tag{4.54} \]

Theorem 4.44 (Taylor’s Theorem). Let G ⊆ R^n be open, n ∈ N, and f ∈ C^{m+1}(G, K) for some m ∈ N_0 (i.e. f : G −→ K and f has continuous partials up to order m + 1). Let ξ ∈ G and h ∈ R^n such that the line segment S_{ξ,ξ+h} between ξ and ξ + h is a subset of G. Then the following formula, also known as Taylor’s formula, holds:

\[ \begin{aligned}
f(\xi+h) &= \sum_{k=0}^{m} \frac{(h\nabla)^k(f)(\xi)}{k!} + R_m(\xi) = \sum_{k=0}^{m} \frac{D^k f(\xi)(\overbrace{h, \dots, h}^{k \text{ times}})}{k!} + R_m(\xi) \\
&= f(\xi) + \frac{(h\nabla)(f)(\xi)}{1!} + \frac{(h\nabla)^2(f)(\xi)}{2!} + \cdots + \frac{(h\nabla)^m(f)(\xi)}{m!} + R_m(\xi),
\end{aligned} \tag{4.55} \]

where, similar to the one-dimensional case,

\[ R_m(\xi) := \int_0^1 \frac{(1-t)^m}{m!}\, (h\nabla)^{m+1}(f)(\xi + th)\, dt \tag{4.56} \]

is the integral form of the remainder term. Also similar to the one-dimensional case, if K = R, then there is θ ∈ ]0, 1[ such that

\[ R_m(\xi) = \frac{(h\nabla)^{m+1}(f)(\xi + \theta h)}{(m+1)!}, \tag{4.57} \]

called the Lagrange form of the remainder term.

Proof. Since S_{ξ,ξ+h} ⊆ G and G is open, there is ε > 0 such that we can consider the auxiliary function

\[ \phi :\, ]-\epsilon, 1+\epsilon[\, \longrightarrow \mathbb{K}, \quad \phi(t) := f(\xi + th). \]

This definition immediately implies φ(0) = f(ξ) and φ(1) = f(ξ + h). We can apply the chain rule to get

\[ \phi'(t) = \nabla f(\xi + th) \cdot h = (h\nabla)(f)(\xi + th), \]

using the notation from (4.51). Since f ∈ C^{m+1}(G, K), we can use an induction to get, for each k ∈ {0, . . . , m + 1},

\[ \phi^{(k)}(t) = (h\nabla)^k(f)(\xi + th). \tag{4.58} \]

Applying the one-dimensional form of Taylor’s theorem [Phi16, Th. 10.27] with the remainder term in integral form to φ with x = 1 and a = 0 together with (4.58) yields

\[ \begin{aligned}
f(\xi+h) = \phi(1) &= \phi(0) + \phi'(0)(1-0) + \frac{\phi''(0)}{2!}(1-0)^2 + \cdots + \frac{\phi^{(m)}(0)}{m!}(1-0)^m + \int_0^1 \frac{(1-t)^m}{m!}\, \phi^{(m+1)}(t)\, dt \\
&= f(\xi) + \frac{(h\nabla)(f)(\xi)}{1!} + \frac{(h\nabla)^2(f)(\xi)}{2!} + \cdots + \frac{(h\nabla)^m(f)(\xi)}{m!} + \int_0^1 \frac{(1-t)^m}{m!}\, (h\nabla)^{m+1}(f)(\xi + th)\, dt,
\end{aligned} \tag{4.59} \]

which is precisely (4.55) with R_m(ξ) in the form (4.56).

To prove the Lagrange form of the remainder term, we restate (4.59), this time applying [Phi16, Th. 10.27] to φ with the remainder term in Lagrange form, yielding

\[ \begin{aligned}
f(\xi+h) &= \sum_{k=0}^{m} \frac{(h\nabla)^k(f)(\xi)}{k!} + \frac{\phi^{(m+1)}(\theta)}{(m+1)!}(1-0)^{m+1} \\
&= \sum_{k=0}^{m} \frac{(h\nabla)^k(f)(\xi)}{k!} + \frac{(h\nabla)^{m+1}(f)(\xi + \theta h)}{(m+1)!}
\end{aligned} \]

for some suitable θ ∈ ]0, 1[, thereby completing the proof. □

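Example 4.45. Let us write Taylor’s formula (4.55) explicitly for the function

\[ f : \mathbb{R}^2 \longrightarrow \mathbb{R}, \quad f(x,y) := \sin(xy) \]

for m = 1 and for ξ = (0, 0). Here, we have for the gradient

\[ \nabla f(x,y) = \big( y\cos(xy),\; x\cos(xy) \big) \]

and for the Hessian matrix of second order partials

\[ H_f(x,y) = \begin{pmatrix} \partial_x\partial_x f(x,y) & \partial_x\partial_y f(x,y) \\ \partial_y\partial_x f(x,y) & \partial_y\partial_y f(x,y) \end{pmatrix} = \begin{pmatrix} -y^2 \sin(xy) & \cos(xy) - xy\sin(xy) \\ \cos(xy) - xy\sin(xy) & -x^2 \sin(xy) \end{pmatrix}. \]

For h = (h_1, h_2) ∈ R^2, we obtain (noting that f(0, 0) = 0 and ∇f(0, 0) = (0, 0), such that only the remainder term (4.57) contributes)

\[ \begin{aligned}
f(h) &= \frac{-h_1^2 \theta^2 h_2^2 \sin(\theta^2 h_1 h_2) + 2 h_1 h_2 \cos(\theta^2 h_1 h_2) - 2 h_1^2 h_2^2 \theta^2 \sin(\theta^2 h_1 h_2) - h_2^2 \theta^2 h_1^2 \sin(\theta^2 h_1 h_2)}{2!} \\
&= -2 h_1^2 h_2^2 \theta^2 \sin(\theta^2 h_1 h_2) + h_1 h_2 \cos(\theta^2 h_1 h_2)
\end{aligned} \]

for some suitable 0 < θ < 1.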
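The existence of such a θ can be made tangible for a concrete increment. The following sketch fixes h = (1, 1), an arbitrary illustrative choice, and uses bisection to locate a θ ∈ ]0, 1[ for which the Lagrange form above reproduces f(h) = sin(1); the helper names lagrange_rhs and gap are hypothetical.

```python
import math

def f(x, y):
    return math.sin(x * y)

def lagrange_rhs(theta, h1, h2):
    """Right-hand side of the final formula of Example 4.45."""
    u = theta ** 2 * h1 * h2
    return -2 * h1 ** 2 * h2 ** 2 * theta ** 2 * math.sin(u) + h1 * h2 * math.cos(u)

h1, h2 = 1.0, 1.0
target = f(h1, h2)

def gap(theta):
    return lagrange_rhs(theta, h1, h2) - target

# For h = (1, 1): gap(0) = 1 - sin(1) > 0 and gap(1) = cos(1) - 3 sin(1) < 0,
# so bisection on [0, 1] finds a zero of gap.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if gap(mid) > 0:
        lo = mid
    else:
        hi = mid
theta = 0.5 * (lo + hi)
print(theta, lagrange_rhs(theta, h1, h2), target)
```

Taylor's theorem only asserts that some θ ∈ ]0, 1[ exists; the bisection works here because of the sign change of gap on [0, 1] for this particular h.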

4.11 Implicit Function Theorem

The implicit function Th. 4.49 below provides suitable hypotheses such that the equation f(x, y) = C can, locally, be solved for y (or x). To illustrate the situation, consider

\[ x^2 + y^2 = C, \quad x, y, C \in \mathbb{R}, \tag{4.60} \]

which, for C > 0, represents a circle with radius √C and center at (0, 0). This simple example already shows that one cannot expect such an equation to have a solution for each C, and, if it does have a solution, it need not be unique. However, if

\[ \xi^2 + \eta^2 - C = 0, \quad C > 0, \tag{4.61} \]

and η ≠ 0, then, in a neighborhood of (ξ, η), (4.60) can, indeed, uniquely be solved for y, namely

\[ \text{for } \eta > 0: \quad \forall_{(x,y)\in G_1} \Big( x^2 + y^2 = C \;\Leftrightarrow\; y = \sqrt{C - x^2} \Big), \quad G_1 :=\, ]-\sqrt{C}, \sqrt{C}[\, \times \mathbb{R}^+ \tag{4.62a} \]

(note that G_1 is, indeed, an open neighborhood of (ξ, η)), and

\[ \text{for } \eta < 0: \quad \forall_{(x,y)\in G_2} \Big( x^2 + y^2 = C \;\Leftrightarrow\; y = -\sqrt{C - x^2} \Big), \quad G_2 :=\, ]-\sqrt{C}, \sqrt{C}[\, \times \mathbb{R}^- \tag{4.62b} \]

(note that G_2 is, indeed, an open neighborhood of (ξ, η)). If η = 0, then, in each neighborhood of (ξ, η), (4.60) has two distinct solutions for y. However, in this case, there exists a neighborhood of (ξ, η), where (4.60) can, indeed, uniquely be solved for x (one merely has to switch the roles of x and y in the above considerations).

In the example

\[ |x| - |y| = 0, \quad x, y \in \mathbb{R}, \tag{4.63} \]

the equation cannot be solved uniquely for either x or y in any neighborhood of (0, 0).

The implicit function Th. 4.49 will show that a sufficient condition for f(x, y) = 0 to be uniquely solvable for y in a neighborhood of (ξ, η) is f to be continuously differentiable, with invertible derivative with respect to y in (ξ, η).

In preparation for the proof of the implicit function theorem, we provide the following proposition:

Proposition 4.46. Let ‖·‖ be some norm on R^n, n ∈ N. Moreover, let a ∈ R^n, r > 0, and let f : B_r(a) −→ R^n be defined on the open r-ball with center a with respect to ‖·‖. If A is an invertible n × n matrix over R such that

\[ \big\| A^{-1} f(a) \big\| < \frac{r}{2} \tag{4.64} \]

and such that the map

\[ F : B_r(a) \longrightarrow \mathbb{R}^n, \quad F(x) := x - A^{-1} f(x), \tag{4.65} \]

is Lipschitz continuous with Lipschitz constant L = 1/2, then f has a unique zero ξ ∈ B_r(a). Moreover, for each x_0 ∈ B_r(a), ξ is the limit of the sequence (x_k)_{k∈N_0}, recursively defined by

\[ \forall_{k \in \mathbb{N}_0} \quad x_{k+1} := F(x_k). \tag{4.66} \]

Proof. Let x_0 ∈ B_r(a) and set

\[ s_0 := \max\big\{ 2 \big\| A^{-1} f(a) \big\|,\; \| x_0 - a \| \big\} \overset{(4.64)}{\in} [0, r[. \]

The idea is to show that, for each s_0 < s < r, the Banach fixed point Th. 2.29 applies to the contraction

\[ F_s : \overline{B}_s(a) \longrightarrow \overline{B}_s(a), \quad F_s(x) := F(x), \]

defined on the closed s-ball B̄_s(a). We verify that F_s, indeed, maps B̄_s(a) into B̄_s(a): If x ∈ B̄_s(a), then

\[ \| F(x) - a \| \le \| F(x) - F(a) \| + \| F(a) - a \| \le \frac{1}{2} \| x - a \| + \big\| A^{-1} f(a) \big\| \le \frac{s}{2} + \frac{s_0}{2} < s, \]

showing F_s(x) ∈ B̄_s(a) (in particular, this shows the x_k are well-defined by (4.66)). As F is Lipschitz continuous with Lipschitz constant L = 1/2, so is F_s, i.e. F_s is, indeed, a contraction. As B̄_s(a) is closed, the Banach fixed point Th. 2.29 yields that F_s has a unique fixed point ξ and, moreover, ξ = lim_{k→∞} x_k. Since this holds for each s ∈ ]s_0, r[, ξ must also be the unique fixed point of F. The proof is concluded by noting

\[ \forall_{y \in B_r(a)} \quad f(y) = 0 \;\Leftrightarrow\; F(y) = y - A^{-1} f(y) = y, \]

that means y is a zero of f if, and only if, y is a fixed point of F. □

Remark 4.47. If the map f in Prop. 4.46 is differentiable with invertible derivatives Df(x), and if, instead of using a constant matrix A in the definition of (4.66), one uses (Df(x_k))^{-1}, then the iteration defined by (4.66) is known as Newton’s method (in n dimensions, cf. [Phi17, Sec. 6.3]). In consequence, if A ≈ (Df(x_k))^{-1} in (4.66), then the defined iteration is sometimes referred to as a simplified Newton’s method.

Notation 4.48. Let k, m, n ∈ N, let G ⊆ R^n × R^m be open, and consider a map f : G −→ R^k. If (ξ, η) ∈ G and f is R-differentiable at (ξ, η), then let D_y f(ξ, η) and D_x f(ξ, η) denote the R-linear maps

\[ D_y f(\xi,\eta) : \mathbb{R}^m \longrightarrow \mathbb{R}^k, \quad \big(D_y f(\xi,\eta)\big)(h) := \big(Df(\xi,\eta)\big)(0, h), \tag{4.67a} \]

\[ D_x f(\xi,\eta) : \mathbb{R}^n \longrightarrow \mathbb{R}^k, \quad \big(D_x f(\xi,\eta)\big)(h) := \big(Df(\xi,\eta)\big)(h, 0), \tag{4.67b} \]

respectively.

Theorem 4.49 (Implicit Function Theorem). Let m, n ∈ N, let G ⊆ R^n × R^m be open, and let f : G −→ R^m be continuously differentiable, i.e. f ∈ C^1(G, R^m). If (ξ, η) ∈ G is such that

\[ f(\xi,\eta) = 0 \quad \text{and} \quad A := D_y f(\xi,\eta) \text{ is invertible}, \tag{4.68} \]

then there exist open neighborhoods U_ξ ⊆ R^n of ξ and V_η ⊆ R^m of η, and a continuously differentiable map g : U_ξ −→ V_η such that the zeros of f in U_ξ × V_η are given precisely by the graph of g, i.e.

\[ (U_\xi \times V_\eta) \cap f^{-1}\{0\} = \{ (x, g(x)) : x \in U_\xi \}, \tag{4.69a} \]

which can be restated as

\[ \forall_{(x,y) \in U_\xi \times V_\eta} \quad \big( f(x,y) = 0 \;\Leftrightarrow\; y = g(x) \big). \tag{4.69b} \]

Moreover,

\[ \forall_{x \in U_\xi} \quad Dg(x) = -\big( D_y f(x, g(x)) \big)^{-1} D_x f(x, g(x)) \tag{4.70} \]

and, if f ∈ C^α(G, R^m), α ∈ N ∪ {∞}, then g ∈ C^α(U_ξ, R^m).


Proof. Fix some arbitrary norms on R^n and on the set M(m, R) of real m × m matrices (for readability’s sake, we will denote both norms by ‖·‖). On R^m, we will use the 1-norm ‖·‖_1 to apply Th. 4.38. According to the hypothesis, A is invertible. Thus, det(A) ≠ 0. Since the map B ↦ det(B) is continuous (cf. Ex. 2.21(a)), and the map D_y f : G −→ M(m, R) is continuous due to the assumed continuous differentiability of f, the set

\[ G_0 := \big\{ (x,y) \in G : \det\big(D_y f(x,y)\big) \ne 0 \big\} \subseteq G \]

is an open neighborhood of (ξ, η). Next, we consider the map

\[ F : G_0 \longrightarrow \mathbb{R}^m, \quad F(x,y) := y - A^{-1} f(x,y). \]

Then F is continuously differentiable with

\[ D_y F : G_0 \longrightarrow M(m, \mathbb{R}), \quad D_y F(x,y) = \mathrm{Id} - A^{-1} D_y f(x,y), \]

being continuous as well. Thus, since D_y F(ξ, η) = Id − A^{-1}A = 0, there exists r > 0 such that the open r-balls B_r(ξ) ⊆ R^n and B_r(η) ⊆ R^m satisfy

\[ (\xi,\eta) \in B_r(\xi) \times B_r(\eta) \subseteq \Big\{ (x,y) \in G_0 : \forall_{k,l = 1, \dots, m} \; \big| \partial_{y_k} F_l(x,y) \big| < \frac{1}{2m} \Big\} \subseteq G_0. \tag{4.71} \]

As we assume f to be continuous with f(ξ, η) = 0, there exists s ∈ ]0, r] such that

\[ B_s(\xi) \subseteq \Big\{ x \in B_r(\xi) : \big\| A^{-1} f(x,\eta) \big\|_1 < \frac{r}{2} \Big\} \subseteq B_r(\xi). \tag{4.72} \]

To construct the map g : B_s(ξ) −→ B_r(η), we fix x ∈ B_s(ξ) and apply Prop. 4.46 to the function

\[ f_x : B_r(\eta) \longrightarrow \mathbb{R}^m, \quad f_x(y) := f(x,y). \tag{4.73} \]

To verify that the hypotheses of Prop. 4.46 are satisfied, we observe that ‖A^{-1} f_x(η)‖_1 < r/2 holds due to x ∈ B_s(ξ) and (4.72), and that the map F_x : B_r(η) −→ R^m, F_x(y) := y − A^{-1} f_x(y) = F(x, y), is Lipschitz continuous with Lipschitz constant L = m · 1/(2m) = 1/2 due to (4.71) and Th. 4.38. Thus, according to Prop. 4.46, the function f_x has a unique zero g(x) in B_r(η), which defines the function g.

Note that, in the above argument, for each 0 < ρ < r, one can choose s(ρ) < s such that (4.72) holds with s replaced by s(ρ) and r replaced by ρ, then showing that g maps B_{s(ρ)}(ξ) into B_ρ(η). We now choose some arbitrary ρ ∈ ]0, r[ and set

\[ U_\xi := B_{s(\rho)}(\xi), \quad V_\eta := B_\rho(\eta) \]

for the desired neighborhoods of the theorem. We verify g to be continuous on U_ξ: Let x ∈ U_ξ and let (x_k)_{k∈N} be a sequence in U_ξ with lim_{k→∞} x_k = x. We have to show lim_{k→∞} g(x_k) = g(x). If lim_{k→∞} g(x_k) = g(x) does not hold, then, without loss of generality, we may assume that there exists ε > 0 such that ‖g(x_k) − g(x)‖ > ε for each k ∈ N (after having replaced (x_k)_{k∈N} with a suitable subsequence). Moreover, we may replace (x_k)_{k∈N} with another subsequence such that there exists y in the closed ball B̄_ρ(η) ⊆ B_r(η) satisfying y = lim_{k→∞} g(x_k) (this is due to the Bolzano-Weierstrass Th. 1.31, as g(x_k) ∈ B_ρ(η) for each k ∈ N). Then the continuity of f implies

\[ f(x,y) = \lim_{k\to\infty} f\big(x_k, g(x_k)\big) = 0, \]

showing g(x) = y = lim_{k→∞} g(x_k) (due to (4.69b) – here we need y ∈ B_r(η), which was the reason for choosing ρ < r), which is in contradiction to the choice of the x_k, and proves the continuity of g.

Next, we show that g is differentiable at each x ∈ U_ξ, where the derivative is given by (4.70): To this end, let x ∈ U_ξ and note the existence of (D_y f(x, g(x)))^{-1} due to (x, g(x)) ∈ U_ξ × V_η ⊆ G_0. According to Def. 4.18, we have to show

\[ \lim_{h\to 0} \frac{g(x+h) - g(x) + \big(D_y f(x, g(x))\big)^{-1} D_x f(x, g(x))\, h}{\|h\|} = 0. \tag{4.74} \]

Let 0 ≠ h ∈ R^n be sufficiently small such that x + h ∈ U_ξ. Using the notation of the mean value Th. 4.35, for each l ∈ {1, . . . , m}, there exist x_{h,l} ∈ S_{x,x+h} and y_{h,l} ∈ S_{g(x),g(x+h)} such that

\[ \begin{aligned}
0 &= f_l\big(x+h, g(x+h)\big) - f_l\big(x, g(x)\big) \\
&= f_l\big(x+h, g(x+h)\big) - f_l\big(x, g(x+h)\big) + f_l\big(x, g(x+h)\big) - f_l\big(x, g(x)\big) \\
&= D_x f_l\big(x_{h,l}, g(x+h)\big)(h) + D_y f_l\big(x, y_{h,l}\big)\big(g(x+h) - g(x)\big).
\end{aligned} \tag{4.75} \]

Note that the two derivatives occurring in (4.75) have the form of gradients, which, according to our usual convention, we can interpret as row vectors. Joining m row vectors into a matrix, we can write the m equations of (4.75) in matrix form as

\[ 0 = X_h h + Y_h \big(g(x+h) - g(x)\big), \tag{4.76} \]

where

\[ X_h := \begin{pmatrix} D_x f_1\big(x_{h,1}, g(x+h)\big) \\ \vdots \\ D_x f_m\big(x_{h,m}, g(x+h)\big) \end{pmatrix}, \quad Y_h := \begin{pmatrix} D_y f_1\big(x, y_{h,1}\big) \\ \vdots \\ D_y f_m\big(x, y_{h,m}\big) \end{pmatrix}. \]

As we already know g to be continuous, h → 0 implies g(x + h) → g(x). Thus, since y_{h,l} ∈ S_{g(x),g(x+h)}, h → 0 implies y_{h,l} → g(x) for each l ∈ {1, . . . , m}, and, as all partials of f are continuous as well, Y_h → D_y f(x, g(x)). Since the maps B ↦ det(B) and B ↦ ‖B^{-1}‖ are continuous (cf. Ex. 2.6(a) and Ex. 2.21(a),(b)), h → 0 implies det(Y_h) → det(D_y f(x, g(x))) ≠ 0 and Y_h is invertible for sufficiently small h with (Y_h)^{-1} → (D_y f(x, g(x)))^{-1}. For such sufficiently small h, we can rewrite (4.76) as

\[ g(x+h) - g(x) = -(Y_h)^{-1} X_h h. \]

Also, since x_{h,l} ∈ S_{x,x+h}, h → 0 implies x_{h,l} → x and, then, the continuity of g together with the continuity of the partials of f implies X_h → D_x f(x, g(x)). Thus, we can finish the proof of (4.70) by noting

\[ \begin{aligned}
&\lim_{h\to 0} \frac{\big\| g(x+h) - g(x) + \big(D_y f(x, g(x))\big)^{-1} D_x f(x, g(x))\, h \big\|_1}{\|h\|} \\
&\quad = \lim_{h\to 0} \frac{\big\| -(Y_h)^{-1} X_h h + \big(D_y f(x, g(x))\big)^{-1} D_x f(x, g(x))\, h \big\|_1}{\|h\|} = 0.
\end{aligned} \]

It remains to prove that g is C^α if f is C^α, α ∈ N ∪ {∞}. To this end, for α ∈ N, we will show by induction on β = 1, . . . , α that each partial derivative of g at x ∈ U_ξ of order β is a rational function of partials of f of order ≤ β, all taken at (x, g(x)), and of partials of g of order ≤ β − 1, all taken at x (in particular, the denominator of this rational function does not have any zeros in U_ξ): For β = 1, the claim follows from (4.70): The entries of D_x f(x, g(x)) are polynomials of first partials of f taken at (x, g(x)); the entries of (D_y f(x, g(x)))^{-1} are rational functions, where both the numerator and the denominator polynomial are polynomials of first partials of f taken at (x, g(x)) (in particular, the entries of the right-hand side of (4.70) do not involve any first partials of g). For the induction step, let 1 < β ≤ α. By induction, we know the partials of g of order β − 1 are rational functions of partials of f of order ≤ β − 1, all taken at (x, g(x)), and of partials of g of order ≤ β − 2, all taken at x. Taking the derivative of partials of g of order ≤ β − 2 evaluated at x, yields partials of g of order ≤ β − 1 still evaluated at x; according to the chain rule of Th. 4.31, taking the derivative of partials of f of order ≤ β − 1 evaluated at (x, g(x)), yields polynomials of partials of f of order ≤ β evaluated at (x, g(x)) and of first partials of g evaluated at x. Thus, applying the product and the quotient rule establishes the case. □
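Formula (4.70) can be compared with a direct computation for the circle equation (4.60). The sketch below takes f(x, y) := x^2 + y^2 − C with C = 4 and the upper branch g(x) = √(C − x^2) from (4.62a); the sample points and the finite-difference step are illustrative assumptions. Here D_x f = 2x and D_y f = 2y, so (4.70) predicts g′(x) = −x/g(x).

```python
import math

C = 4.0  # illustrative constant (assumption)

def f(x, y):
    return x * x + y * y - C

def Dx_f(x, y):
    return 2 * x

def Dy_f(x, y):
    return 2 * y

def g(x):
    """Upper branch from (4.62a), solving f(x, g(x)) = 0 with g(x) > 0."""
    return math.sqrt(C - x * x)

def dg_formula(x):
    """Right-hand side of (4.70) in the case n = m = 1."""
    y = g(x)
    return -Dx_f(x, y) / Dy_f(x, y)

def dg_numeric(x, eps=1e-6):
    """Central-difference approximation of g'(x)."""
    return (g(x + eps) - g(x - eps)) / (2 * eps)

samples = [-1.5, -0.5, 0.0, 0.8, 1.2]
gaps = [abs(dg_formula(x) - dg_numeric(x)) for x in samples]
residuals = [abs(f(x, g(x))) for x in samples]
print(max(gaps), max(residuals))
```

Near x = ±√C, where D_y f(x, g(x)) = 2g(x) tends to 0, both the formula and the numerical quotient blow up, matching the failure of hypothesis (4.68) at η = 0.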

Theorem 4.50 (Inverse Function Theorem). Let $n \in \mathbb{N}$, let $G \subseteq \mathbb{R}^n$ be open, and let $f : G \to \mathbb{R}^n$ be continuously differentiable, i.e. $f \in C^1(G, \mathbb{R}^n)$. If $\xi \in G$ is such that

$$Df(\xi) \text{ is invertible}, \tag{4.77}$$

then there exists an open neighborhood $U \subseteq G$ of $\xi$ such that $V := f(U)$ is open and the restriction $f : U \to V$ is bijective with continuously differentiable inverse function $f^{-1} : V \to U$. Moreover,

$$\forall_{y \in V} \quad D(f^{-1})(y) = \big(Df(f^{-1}(y))\big)^{-1} \tag{4.78}$$

and, if $f \in C^\alpha(U, \mathbb{R}^n)$, $\alpha \in \mathbb{N} \cup \{\infty\}$, then $f^{-1} \in C^\alpha(V, \mathbb{R}^n)$.

Proof. The idea is to apply the implicit function Th. 4.49 to the continuously differentiable map

$$F : G \times \mathbb{R}^n \to \mathbb{R}^n, \quad F(x, y) := f(x) - y.$$

Here, as compared to Th. 4.49, the roles of the variables $x$ and $y$ are switched. Letting $\eta := f(\xi)$, we have

$$F(\xi, \eta) = f(\xi) - \eta = 0, \quad \text{and } D_x F(\xi, \eta) = Df(\xi) \text{ is invertible}.$$


Thus, Th. 4.49 applies and yields an open neighborhood $\tilde{U} \subseteq G$ of $\xi$, an open neighborhood $V \subseteq \mathbb{R}^n$ of $\eta$, and a $C^1$ map $g : V \to \tilde{U}$ such that

$$\forall_{(x,y) \in \tilde{U} \times V} \quad \big( F(x, y) = f(x) - y = 0 \;\Leftrightarrow\; x = g(y) \big). \tag{4.79}$$

If we let $U := g(V)$, then $U \subseteq \tilde{U}$ is a neighborhood of $\xi = g(f(\xi))$, and (4.79) implies that $f : U \to V$ and $g : V \to U$ are inverse to each other; in particular, they are both bijective with $f^{-1} = g$. To verify that $U$ is open, consider the (still continuous) map $f : \tilde{U} \to \mathbb{R}^n$ and observe $U = f^{-1}(V)$. As $V$ is open, Th. 2.7(iii) implies the existence of $O \subseteq \mathbb{R}^n$ open with $U = O \cap \tilde{U}$. Since both $O$ and $\tilde{U}$ are open, $U$ must be open as well.

Using (4.70), we obtain, for each $y \in V$,

$$Dg(y) = -\big(D_x F(g(y), y)\big)^{-1} D_y F(g(y), y) = -\big(Df(g(y))\big)^{-1} (-\mathrm{Id}) = \big(Df(g(y))\big)^{-1},$$

proving (4.78). Finally, if $f$ is $C^\alpha$ on $U$, then $F$ is $C^\alpha$ on $U \times \mathbb{R}^n$, such that Th. 4.49 implies $g = f^{-1}$ to be $C^\alpha$ as well. ∎
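Formula (4.78) can be checked numerically. The following is only a minimal sketch, assuming NumPy is available; the map `f`, its explicit local inverse `f_inv`, the base point `xi`, and the helper `numerical_jacobian` are hypothetical choices made for illustration and are not taken from the notes.

```python
import numpy as np

def f(p):
    """Hypothetical example map f(x, y) = (e^x cos y, e^x sin y)."""
    x, y = p
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def Df(p):
    """Jacobian of f; its determinant is e^(2x) > 0, so it is always invertible."""
    x, y = p
    ex = np.exp(x)
    return np.array([[ex * np.cos(y), -ex * np.sin(y)],
                     [ex * np.sin(y),  ex * np.cos(y)]])

def f_inv(q):
    """Explicit local inverse of f, valid for points q = (u, v) with u > 0."""
    u, v = q
    return np.array([0.5 * np.log(u * u + v * v), np.arctan2(v, u)])

def numerical_jacobian(func, q, step=1e-6):
    """Central-difference approximation of the Jacobian of func at q."""
    q = np.asarray(q, dtype=float)
    cols = []
    for j in range(q.size):
        e = np.zeros_like(q)
        e[j] = step
        cols.append((func(q + e) - func(q - e)) / (2 * step))
    return np.column_stack(cols)

xi = np.array([0.3, 0.4])          # base point xi
eta = f(xi)                        # eta := f(xi)

lhs = numerical_jacobian(f_inv, eta)       # approximates D(f^{-1})(eta)
rhs = np.linalg.inv(Df(f_inv(eta)))        # (Df(f^{-1}(eta)))^{-1}
max_error = np.max(np.abs(lhs - rhs))
print(max_error)
```

The printed discrepancy should be small, limited only by the finite-difference approximation. Note that `f` is not globally injective (it is periodic in `y`), which is consistent with Th. 4.50 being a purely local statement.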

Corollary 4.51. Let $n \in \mathbb{N}$, let $G \subseteq \mathbb{R}^n$ be open, and let $f : G \to \mathbb{R}^n$ be continuously differentiable, i.e. $f \in C^1(G, \mathbb{R}^n)$. If $Df(x)$ is invertible for each $x \in G$, then $f$ maps open sets to open sets, i.e. if $O \subseteq G$ is open, then $f(O)$ is open as well.

Proof. Let $O \subseteq G$ be open. We have to show that each point $\eta \in f(O)$ is an interior point of $f(O)$. To this end, let $\eta \in f(O)$ and let $\xi \in O$ be such that $f(\xi) = \eta$. Since $Df(\xi)$ is invertible by hypothesis, we can apply the inverse function Th. 4.50 to the restriction of $f$ to $O$, obtaining open neighborhoods $U \subseteq O$ of $\xi$ and $V \subseteq f(O)$ of $\eta$ such that $f : U \to V$ is bijective. In particular, $\eta$ is an interior point of $f(O)$, proving $f(O)$ to be open. ∎

5 Extreme Values, Stationary Points, Optimization

5.1 Definitions of Extreme Values

The following Def. 5.1 is a generalization of [Phi16, Def. 7.50].

Definition 5.1. Let (X, d) be a metric space, M ⊆ X, and f : M −→ R.

(a) Given $x \in M$, $f$ has a (strict) global min at $x$ if, and only if, $f(x) \le f(y)$ ($f(x) < f(y)$) for each $y \in M \setminus \{x\}$. Analogously, $f$ has a (strict) global max at $x$ if, and only if, $f(x) \ge f(y)$ ($f(x) > f(y)$) for each $y \in M \setminus \{x\}$. Moreover, $f$ has a (strict) global extreme value at $x$ if, and only if, $f$ has a (strict) global min or a (strict) global max at $x$.

(b) Given $x \in M$, $f$ has a (strict) local min at $x$ if, and only if, there exists $\epsilon > 0$ such that $f(x) \le f(y)$ ($f(x) < f(y)$) for each $y \in \{y \in M : d(x, y) < \epsilon\} \setminus \{x\}$. Analogously, $f$ has a (strict) local max at $x$ if, and only if, there exists $\epsilon > 0$ such that $f(x) \ge f(y)$ ($f(x) > f(y)$) for each $y \in \{y \in M : d(x, y) < \epsilon\} \setminus \{x\}$. Moreover, $f$ has a (strict) local extreme value at $x$ if, and only if, $f$ has a (strict) local min or a (strict) local max at $x$.

Remark 5.2. In the context of Def. 5.1, it is immediate from the respective definitions that $f$ has a (strict) global min at $x \in M$ if, and only if, $-f$ has a (strict) global max at $x$. Moreover, the same holds if “global” is replaced by “local”. It is equally obvious that every (strict) global min/max is a (strict) local min/max.

—

In optimization problems, one often aims at finding global (or at least local) minima or maxima of real-valued functions. From Th. 3.19, we already know that continuous functions on compact topological spaces always have a global max and a global min. However, in general, one has no method to actually find such extrema. For differentiable functions defined on subsets of $\mathbb{R}^n$, the situation is somewhat better, even though finding extrema of a complicated function can still be very difficult. To prove sufficient conditions for extrema of differentiable functions of several variables, we will make use of Taylor’s Th. 4.44 and so-called quadratic forms.

5.2 Quadratic Forms

Before we get to the quadratic forms, we briefly need to consider the Euclidean norm of matrices.

Notation 5.3. Let $A = (a_{kl})_{(k,l) \in \{1,\dots,m\} \times \{1,\dots,n\}}$ be a real $m \times n$ matrix, $m, n \in \mathbb{N}$. We introduce the quantity

$$\|A\|_{\mathrm{HS}} := \sqrt{\sum_{k=1}^{m} \sum_{l=1}^{n} a_{kl}^2}, \tag{5.1}$$

called the Hilbert-Schmidt norm or the Frobenius norm of $A$. Thus, $\|A\|_{\mathrm{HS}}$ is the Euclidean norm of $A$ if we consider $A$ as an element of $\mathbb{R}^{mn}$. Caveat: For $m, n > 1$, the Hilbert-Schmidt norm is not the operator norm of $A$ with respect to the Euclidean norms on $\mathbb{R}^m$ and $\mathbb{R}^n$; it is actually not an operator norm at all (see [Phi17, Ex. B.9]). We could use the mentioned operator norm in the following and everything would work just the same (since (5.2) also holds for the operator norm). The reason we prefer the Hilbert-Schmidt norm here is that it is much easier to compute and, thus, less abstract.

Lemma 5.4. Let $A = (a_{kl})_{(k,l) \in \{1,\dots,m\} \times \{1,\dots,n\}}$ be a real $m \times n$ matrix, $m, n \in \mathbb{N}$. Then, for each $x \in \mathbb{R}^n$, it holds that

$$\|Ax\|_2 \le \|A\|_{\mathrm{HS}} \, \|x\|_2. \tag{5.2}$$

Proof. This follows easily from the Cauchy-Schwarz inequality. For each $k \in \{1, \dots, m\}$, let $a_k := (a_{k1}, \dots, a_{kn})$ denote the $k$th row vector of the matrix $A$. Then one computes

$$\|Ax\|_2 = \sqrt{\sum_{k=1}^{m} \Big( \sum_{l=1}^{n} a_{kl} x_l \Big)^2} \le \sqrt{\sum_{k=1}^{m} \|a_k\|_2^2 \, \|x\|_2^2} = \|A\|_{\mathrm{HS}} \, \|x\|_2,$$

thereby establishing the case. ∎
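The estimate (5.2) and the caveat of Notation 5.3 can be illustrated numerically. This is a minimal sketch assuming NumPy; the helper name `hilbert_schmidt_norm` and the random test data are hypothetical and serve only as an illustration. The operator norm with respect to the Euclidean norms is computed via `np.linalg.norm(A, 2)`, which returns the largest singular value.

```python
import numpy as np

def hilbert_schmidt_norm(A):
    """Hilbert-Schmidt (Frobenius) norm as in (5.1): root of the sum of squared entries."""
    A = np.asarray(A, dtype=float)
    return float(np.sqrt(np.sum(A ** 2)))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))    # arbitrary real 3 x 4 matrix
x = rng.standard_normal(4)         # arbitrary vector in R^4

hs = hilbert_schmidt_norm(A)
op = float(np.linalg.norm(A, 2))   # operator norm w.r.t. the Euclidean norms

lhs = float(np.linalg.norm(A @ x))         # ||Ax||_2
rhs = hs * float(np.linalg.norm(x))        # ||A||_HS ||x||_2

# The 2 x 2 identity shows that the two norms differ in general:
# its Hilbert-Schmidt norm is sqrt(2), while its operator norm is 1.
hs_identity = hilbert_schmidt_norm(np.eye(2))
op_identity = float(np.linalg.norm(np.eye(2), 2))

print(lhs <= rhs, op <= hs, hs_identity, op_identity)
```

In this sketch the operator norm never exceeds the Hilbert-Schmidt norm, which matches the remark that (5.2) also holds for the operator norm.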

Definition 5.5. Let $n \in \mathbb{N}$. A quadratic form is a map

$$Q_A : \mathbb{R}^n \to \mathbb{R}, \quad Q_A(x) := x^t A x = \sum_{k,l=1}^{n} a_{kl} x_k x_l, \tag{5.3}$$

where $x^t$ denotes the transpose of $x$, and $A = (a_{kl})_{k,l=1}^{n}$ is a symmetric real $n \times n$ matrix, i.e. a square real matrix with $a_{kl} = a_{lk}$.

Remark 5.6. Each quadratic form is a polynomial and, thus, continuous by Th. 2.20. Moreover, if $\lambda \in \mathbb{R}$ and $A$ and $B$ are symmetric real $n \times n$ matrices, then $\lambda A$ and $A + B$ are also symmetric real $n \times n$ matrices, and $Q_{\lambda A} = \lambda Q_A$ as well as $Q_{A+B} = Q_A + Q_B$, showing, in particular, that the symmetric real $n \times n$ matrices form a real vector space and that the quadratic forms also form a real vector space.

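Example 5.7. If $G \subseteq \mathbb{R}^n$ is open and $f : G \to \mathbb{R}$ is $C^2$, then, for each $\xi \in G$, the Hessian matrix

$$H_f(\xi) = \big( \partial_k \partial_l f(\xi) \big)_{k,l=1}^{n} \tag{5.4}$$

is symmetric, i.e. $Q_{H_f(\xi)} : \mathbb{R}^n \to \mathbb{R}$ is a quadratic form.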

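Lemma 5.8. Let $A = (a_{kl})_{k,l=1}^{n}$ be a symmetric real $n \times n$ matrix, $n \in \mathbb{N}$, and let $Q_A$ be the corresponding quadratic form.

(a) $Q_A$ is homogeneous of degree 2, i.e.

$$Q_A(\lambda x) = \lambda^2 Q_A(x) \quad \text{for each } x \in \mathbb{R}^n \text{ and each } \lambda \in \mathbb{R}.$$

(b) For each $\alpha \in \mathbb{R}$, the following statements are equivalent:

(i) $Q_A(x) \ge \alpha \|x\|_2^2$ for all $x \in \mathbb{R}^n$.

(ii) $Q_A(x) \ge \alpha$ for all $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$.

(c) For each $x \in \mathbb{R}^n$: $|Q_A(x)| \le \|A\|_{\mathrm{HS}} \, \|x\|_2^2$.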

Proof. (a) is an immediate consequence of (5.3).


(b): That (i) implies (ii) is trivial, since (ii) is a special case of (i). It remains to show that (ii) implies (i). For $x = 0$, one has $0 = Q_A(x) = \alpha \|x\|_2^2$, so let $x \ne 0$ and assume (ii). Then one obtains

$$Q_A(x) = Q_A\Big( \|x\|_2 \, \frac{x}{\|x\|_2} \Big) \overset{\text{(a)}}{=} \|x\|_2^2 \, Q_A\Big( \frac{x}{\|x\|_2} \Big) \ge \alpha \|x\|_2^2,$$

proving (i).

(c): Let $x \in \mathbb{R}^n$. Since $Q_A(x) = x \cdot (Ax)$, the Cauchy-Schwarz inequality yields $|Q_A(x)| \le \|Ax\|_2 \|x\|_2$, and (5.2) then implies (c). ∎

Definition 5.9. Let $A = (a_{kl})_{k,l=1}^{n}$ be a symmetric real $n \times n$ matrix, $n \in \mathbb{N}$, and let $Q_A$ be the corresponding quadratic form.

(a) $A$ and $Q_A$ are called positive definite if, and only if, $Q_A(x) > 0$ for every $0 \ne x \in \mathbb{R}^n$.

(b) $A$ and $Q_A$ are called positive semidefinite if, and only if, $Q_A(x) \ge 0$ for every $x \in \mathbb{R}^n$.

(c) $A$ and $Q_A$ are called negative definite if, and only if, $Q_A(x) < 0$ for every $0 \ne x \in \mathbb{R}^n$.

(d) $A$ and $Q_A$ are called negative semidefinite if, and only if, $Q_A(x) \le 0$ for every $x \in \mathbb{R}^n$.

(e) $A$ and $Q_A$ are called indefinite if, and only if, they are neither positive semidefinite nor negative semidefinite, i.e. if, and only if, there exist $a, b \in \mathbb{R}^n$ with $Q_A(a) > 0$ and $Q_A(b) < 0$.

Example 5.10. Let $n = 2$ and consider the real symmetric matrix $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$. One then obtains

$$Q_A : \mathbb{R}^2 \to \mathbb{R}, \quad Q_A(x, y) = a x^2 + 2 b x y + c y^2. \tag{5.5}$$

One can now use the value of $\det A = ac - b^2$, which is also called the discriminant of $Q_A$, to determine the definiteness of $A$. This is due to the following identity, which holds for each $(x, y) \in \mathbb{R}^2$:

$$a \, Q_A(x, y) = a (a x^2 + 2 b x y + c y^2) = (a x + b y)^2 + (\det A) \, y^2. \tag{5.6}$$

One obtains the following cases:

$\det A > 0$: This implies $a \ne 0$ (since $ac > b^2 \ge 0$). Then (5.6) provides:

a > 0 ⇔ QA positive definite,

a < 0 ⇔ QA negative definite.

detA < 0: In this case, we claim:

QA is indefinite.


To verify this claim, first consider $a > 0$. Then $Q_A(1, 0) = a > 0$ and, according to (5.6), $Q_A(-b/a, 1) = (\det A)/a < 0$, showing that $Q_A$ is indefinite. Now let $a < 0$. Then $Q_A(1, 0) = a < 0$ and, according to (5.6), $Q_A(-b/a, 1) = (\det A)/a > 0$, again showing that $Q_A$ is indefinite. Finally, let $a = 0$. Then $\det A < 0$ implies $b \ne 0$. If $c > 0$, then $Q_A(0, 1) = c > 0$ and $Q_A(1/(2b), -1/(2c)) = -1/(2c) + 1/(4c) = -1/(4c) < 0$, i.e. $Q_A$ is indefinite. If $c < 0$, then $Q_A(0, 1) = c < 0$ and $Q_A(1/(2b), -1/(2c)) = -1/(2c) + 1/(4c) = -1/(4c) > 0$, i.e. $Q_A$ is again indefinite. If $c = 0$, then $Q_A(1/(2b), 1) = 1$ and $Q_A(1/(2b), -1) = -1$, and $Q_A$ is indefinite also in this last case.

detA = 0: Here, we claim:

a > 0 or (a = 0 and c ≥ 0) ⇔ QA positive semidefinite,

a < 0 or (a = 0 and c ≤ 0) ⇔ QA negative semidefinite.

Once again, for the proof, we need to distinguish the different possible cases. If $a > 0$, then $Q_A(x, y) = (a x + b y)^2 / a \ge 0$, i.e. $Q_A$ is positive semidefinite. If $a < 0$, then $Q_A(x, y) = (a x + b y)^2 / a \le 0$, i.e. $Q_A$ is negative semidefinite. Now let $a = 0$. Then $\det A = 0$ implies $b = 0$. Thus, $Q_A(x, y) = c y^2$, i.e. $Q_A$ is positive semidefinite for $c \ge 0$ and negative semidefinite for $c \le 0$.
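The case distinction of Example 5.10 translates directly into a small decision procedure. The following is a sketch only; the function names `classify_2x2` and `q` are hypothetical, and the exact comparisons with zero assume exactly representable inputs (for example integers), since floating-point rounding could otherwise misclassify the borderline case $\det A = 0$.

```python
def q(a, b, c, x, y):
    """Quadratic form Q_A(x, y) = a x^2 + 2 b x y + c y^2 from (5.5)."""
    return a * x * x + 2 * b * x * y + c * y * y

def classify_2x2(a, b, c):
    """Classify A = [[a, b], [b, c]] via det A = a c - b^2, following Example 5.10."""
    det = a * c - b * b
    if det > 0:
        # det A > 0 forces a != 0; the sign of a decides.
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    # det A == 0: only semidefiniteness is possible.
    if a > 0 or (a == 0 and c > 0):
        return "positive semidefinite"
    if a < 0 or (a == 0 and c < 0):
        return "negative semidefinite"
    # a == b == c == 0: the zero form is both positive and negative semidefinite.
    return "positive and negative semidefinite"

print(classify_2x2(2, 1, 3))    # det = 5, a > 0
print(classify_2x2(1, 2, 1))    # det = -3
print(classify_2x2(1, 1, 1))    # det = 0, a > 0
```

For the indefinite matrix with $a = 1$, $b = 2$, $c = 1$, the witness point $(-b/a, 1) = (-2, 1)$ used in the example gives `q(1, 2, 1, -2, 1) == -3`, which equals $(\det A)/a$.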

Proposition 5.11. Let $A = (a_{kl})_{k,l=1}^{n}$ be a symmetric real $n \times n$ matrix, $n \in \mathbb{N}$, and let $Q_A$ be the corresponding quadratic form.

(a) $A$ and $Q_A$ are positive definite if, and only if, there exists $\alpha > 0$ such that

$$Q_A(x) \ge \alpha > 0 \quad \text{for each } x \in \mathbb{R}^n \text{ with } \|x\|_2 = 1. \tag{5.7a}$$

Analogously, $A$ and $Q_A$ are negative definite if, and only if, there exists $\alpha < 0$ such that

$$Q_A(x) \le \alpha < 0 \quad \text{for each } x \in \mathbb{R}^n \text{ with } \|x\|_2 = 1. \tag{5.7b}$$

(b) If $A$ and $Q_A$ are positive definite (respectively negative definite, or indefinite), then there exists $\epsilon > 0$ such that each symmetric real $n \times n$ matrix $B$ with $\|A - B\|_{\mathrm{HS}} < \epsilon$ is also positive definite (respectively negative definite, or indefinite).

(c) If $A$ and $Q_A$ are indefinite, then there exist $\epsilon > 0$ and $a, b \in \mathbb{R}^n$ with $\|a\|_2 = \|b\|_2 = 1$ such that, for each symmetric real $n \times n$ matrix $B$ with $\|A - B\|_{\mathrm{HS}} < \epsilon$ and each $0 \ne \lambda \in \mathbb{R}$, it holds that $Q_B(\lambda a) > 0$ and $Q_B(\lambda b) < 0$.

Proof. (a): We consider the positive definite case; the negative definite case is proved completely analogously. First note that (5.7a) implies that $A$ and $Q_A$ are positive definite according to Lem. 5.8(b). Conversely, assume that $A$ and $Q_A$ are positive definite. The 1-sphere $S_1(0) = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$ is a closed and bounded subset of $\mathbb{R}^n$ and, hence, compact. Since $Q_A$ is continuous, it must assume its min on $S_1(0)$ according to Th. 3.19, i.e. there is $\alpha \in \mathbb{R}$ and $x_\alpha \in S_1(0)$ such that $Q_A(x_\alpha) = \alpha$ and $Q_A(x) \ge \alpha$ for each $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$. Since $Q_A$ is positive definite, $\alpha > 0$, proving (5.7a).

(b) and (c): We begin by employing (5.7a) to show (b) for $A$ and $Q_A$ being positive definite (employing (5.7b), the case of $A$ and $Q_A$ being negative definite can be treated completely analogously). If $A$ and $Q_A$ are positive definite, then there is $\alpha > 0$ such that (5.7a) holds. Choose $\epsilon := \alpha/2$. If $B$ is a symmetric real $n \times n$ matrix with $\|A - B\|_{\mathrm{HS}} < \epsilon$, then, using Lem. 5.8(c), for each $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$:

$$|Q_A(x) - Q_B(x)| = |Q_{A-B}(x)| \le \|A - B\|_{\mathrm{HS}} < \epsilon = \frac{\alpha}{2}. \tag{5.8}$$

Since $Q_A(x) \ge \alpha > 0$, this implies $Q_B(x) \ge \alpha/2 > 0$ for each $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$. Due to (a), this proves that $B$ is positive definite. Now consider the case that $A$ and $Q_A$ are indefinite. Then there are $0 \ne a, b \in \mathbb{R}^n$ such that $Q_A(a) > 0$ and $Q_A(b) < 0$. By normalizing and using Lem. 5.8(a), one can even additionally assume $\|a\|_2 = \|b\|_2 = 1$. Set $\alpha := \min\{Q_A(a), |Q_A(b)|\}$. Then $\alpha > 0$. If $\epsilon := \alpha/2$ and $B$ is a symmetric real $n \times n$ matrix with $\|A - B\|_{\mathrm{HS}} < \epsilon$, then, as above, (5.8) holds for each $x \in \mathbb{R}^n$ with $\|x\|_2 = 1$. In particular, $Q_B(a) \ge \alpha/2 > 0$ and $Q_B(b) \le -\alpha/2 < 0$, showing that $Q_B$ is indefinite, concluding the proof of (b). To complete the proof of (c) as well, it merely remains to remark that, for each $0 \ne \lambda \in \mathbb{R}$, one has $Q_B(\lambda a) \ge \lambda^2 \alpha/2 > 0$ and $Q_B(\lambda b) \le -\lambda^2 \alpha/2 < 0$. ∎
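The stability statement of Prop. 5.11(b) can be observed numerically for $n = 2$. This is a minimal sketch assuming NumPy; the matrix `A`, the random perturbation `E`, and the sampling of the unit circle are hypothetical choices. It also uses the standard fact (not proved in these notes) that the min of $Q_A$ on the unit sphere equals the smallest eigenvalue of $A$, which is how $\alpha$ is obtained here.

```python
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                     # symmetric and positive definite

# alpha as in (5.7a); assumption: min of Q_A on the unit sphere = smallest eigenvalue.
alpha = float(np.linalg.eigvalsh(A)[0])
eps = alpha / 2                                # the choice made in the proof

rng = np.random.default_rng(1)
E = rng.standard_normal((2, 2))
E = (E + E.T) / 2                              # symmetric perturbation
E *= 0.9 * eps / np.linalg.norm(E, "fro")      # scaled so that ||E||_HS = 0.9 * eps < eps
B = A + E

# Sample Q_B on the unit circle.
thetas = np.linspace(0.0, 2.0 * np.pi, 721)
unit = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
q_B = np.einsum("ij,jk,ik->i", unit, B, unit)  # Q_B(x) = x^t B x for each sample x
min_q_B = float(q_B.min())

print(alpha, eps, min_q_B)
```

By (5.8), every sampled value of $Q_B$ must be at least $\alpha/2$, so the perturbed matrix `B` is still positive definite.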

5.3 Extreme Values and Stationary Points of Differentiable Functions

Definition 5.12. Let $G \subseteq \mathbb{R}^n$, $n \in \mathbb{N}$, $f : G \to \mathbb{K}$, and let $\xi$ be an interior point of $G$. If all first partials of $f$ exist at $\xi$, then $\xi$ is called a stationary or critical point of $f$ if, and only if,

$$\nabla f(\xi) = 0. \tag{5.9}$$

—

The following Th. 5.13 generalizes [Phi16, Th. 9.15] to functions defined on subsets of $\mathbb{R}^n$:

Theorem 5.13. Let $G \subseteq \mathbb{R}^n$, $n \in \mathbb{N}$, $f : G \to \mathbb{R}$, and let $\xi$ be an interior point of $G$. If all first partials of $f$ exist at $\xi$ and $f$ has a local min or max at $\xi$, then $\xi$ is a stationary point of $f$, i.e. $\nabla f(\xi) = 0$.

Proof. Since $\xi$ is an interior point of $G$ and since $f$ has a local min or max at $\xi$, there is $\epsilon > 0$ such that $B_\epsilon(\xi) \subseteq G$ and such that $f(\xi) \le f(x)$ for each $x \in B_\epsilon(\xi)$ or such that $f(\xi) \ge f(x)$ for each $x \in B_\epsilon(\xi)$. Let $j \in \{1, \dots, n\}$. Then there is $\delta > 0$ such that $(\xi_1, \dots, \xi_{j-1}, t, \xi_{j+1}, \dots, \xi_n) \in B_\epsilon(\xi)$ for each $t \in \,]\xi_j - \delta, \xi_j + \delta[$. Thus, the one-dimensional function $g : \,]\xi_j - \delta, \xi_j + \delta[ \, \to \mathbb{R}$, $g(t) := f(\xi_1, \dots, \xi_{j-1}, t, \xi_{j+1}, \dots, \xi_n)$, has a local min or max at $\xi_j$, and, since $\partial_j f(\xi)$ exists, $g$ is differentiable at $\xi_j$, implying $0 = g'(\xi_j) = \partial_j f(\xi)$ according to [Phi16, Th. 9.15]. Since $j \in \{1, \dots, n\}$ was arbitrary, $\nabla f(\xi) = 0$. ∎

One already knows from simple one-dimensional examples such as $f : \mathbb{R} \to \mathbb{R}$, $f(x) := x^3$ and $\xi = 0$ that $\nabla f(\xi) = 0$ is not a sufficient condition for $f$ to have a local extreme value at $\xi$. However, the following Th. 5.14 does provide such sufficient conditions.


Theorem 5.14. Let $G \subseteq \mathbb{R}^n$ be open, $n \in \mathbb{N}$, $f : G \to \mathbb{R}$, $f \in C^2(G)$, and let $\xi \in G$ be a stationary point of $f$. Then, in the following cases, one can use the Hessian matrix $H_f(\xi)$ to determine if $f$ has a local extreme value at $\xi$:

Hf (ξ) positive definite ⇒ f has a strict local min at ξ, (5.10a)

Hf (ξ) negative definite ⇒ f has a strict local max at ξ, (5.10b)

Hf (ξ) indefinite ⇒ f does not have a local extreme value at ξ. (5.10c)

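Proof. Since $G$ is open, there is $\epsilon > 0$ such that $\xi + h \in G$ for each $h \in \mathbb{R}^n$ with $\|h\|_2 < \epsilon$. For each such $h$, by an application of Taylor’s Th. 4.44 with $m = 1$, we obtain the existence of $\theta \in \,]0, 1[$ satisfying (using $\nabla f(\xi) = 0$ in the second equality)

$$f(\xi + h) = f(\xi) + h \cdot \nabla f(\xi) + \frac{1}{2} \sum_{k,l=1}^{n} \partial_k \partial_l f(\xi + \theta h) \, h_k h_l = f(\xi) + \frac{h^t H_f(\xi + \theta h) \, h}{2} = f(\xi) + \frac{Q_{H_f(\xi + \theta h)}(h)}{2}. \tag{5.11}$$

Rewriting (5.11), one gets

$$f(\xi + h) - f(\xi) = \frac{Q_{H_f(\xi + \theta h)}(h)}{2}. \tag{5.12}$$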

Note that the assumed continuity of the functions $\partial_k \partial_l f : G \to \mathbb{R}$ ($k, l \in \{1, \dots, n\}$) implies the continuity of $H_f : G \to \mathbb{R}^{n^2}$, $x \mapsto H_f(x)$ (the $\partial_k \partial_l f$ are the coordinate functions of $H_f$). Thus, if $H_f(\xi)$ is positive definite, then, by Prop. 5.11(b), there is $\delta > 0$ such that $\|h\|_2 < \epsilon$ and $\|H_f(\xi) - H_f(\xi + \theta h)\|_{\mathrm{HS}} < \delta$ imply that $H_f(\xi + \theta h)$ is also positive definite. Moreover, the continuity of $H_f$ means that there exists $0 < \alpha < \epsilon$ such that $\|h\|_2 < \alpha$ implies $\|H_f(\xi) - H_f(\xi + \theta h)\|_{\mathrm{HS}} < \delta$ for each $\theta \in \,]0, 1[$. For such $h \ne 0$, the right-hand side of (5.12) must be positive, showing that $f$ has a strict local min at $\xi$ ($f(\xi) < f(x)$ for each $x \in B_{\alpha, \|\cdot\|_2}(\xi) \setminus \{\xi\}$). For $H_f(\xi)$ being negative definite, an analogous argument shows that $f$ has a strict local max at $\xi$. Similarly, if $H_f(\xi)$ is indefinite, then, by Prop. 5.11(c), there are $\delta > 0$ and $a, b \in \mathbb{R}^n$ with $\|a\|_2 = \|b\|_2 = 1$ such that $\|h\|_2 < \epsilon$ and $\|H_f(\xi) - H_f(\xi + \theta h)\|_{\mathrm{HS}} < \delta$ imply that $Q_{H_f(\xi + \theta h)}(\lambda a) > 0$ and $Q_{H_f(\xi + \theta h)}(\lambda b) < 0$ for each $0 \ne \lambda \in \mathbb{R}$. The continuity of $H_f$ provides some $0 < \alpha < \epsilon$ such that $\|h\|_2 < \alpha$ implies $\|H_f(\xi) - H_f(\xi + \theta h)\|_{\mathrm{HS}} < \delta$ for each $\theta \in \,]0, 1[$. For each $0 < \lambda < \alpha$, we get $\|\lambda a\|_2 < \alpha$ and $\|\lambda b\|_2 < \alpha$, such that (5.12) implies $f(\xi + \lambda b) < f(\xi) < f(\xi + \lambda a)$, i.e. $f$ has neither a local min nor a local max at $\xi$. ∎

Example 5.15. Consider the case $n = 2$, i.e. the case of a $C^2$ function $f : G \to \mathbb{R}$, $G$ being an open subset of $\mathbb{R}^2$. Let $(x_0, y_0) \in G$ be a stationary point of $f$. Then, according to Example 5.10, the definiteness of the Hessian matrix $H_f(x_0, y_0)$ is determined by the sign of

$$\det H_f(x_0, y_0) = \partial_x \partial_x f(x_0, y_0) \, \partial_y \partial_y f(x_0, y_0) - \big( \partial_x \partial_y f(x_0, y_0) \big)^2 \tag{5.13}$$

(which, by definition, is the same as the discriminant of the corresponding quadratic form $Q_{H_f(x_0, y_0)}$). If $\det H_f(x_0, y_0) > 0$, then Th. 5.14 tells us that $f$ has a strict local extreme value at $(x_0, y_0)$: If $\partial_x \partial_x f(x_0, y_0) > 0$, then, by Example 5.10, $H_f(x_0, y_0)$ is positive definite and $f$ has a strict local min at $(x_0, y_0)$; if $\partial_x \partial_x f(x_0, y_0) < 0$, then, by Example 5.10, $H_f(x_0, y_0)$ is negative definite and $f$ has a strict local max at $(x_0, y_0)$. If $\det H_f(x_0, y_0) < 0$, then $H_f(x_0, y_0)$ is indefinite according to Example 5.10, and Th. 5.14 yields that $f$ has neither a local max nor a local min at $(x_0, y_0)$. Such a stationary point, where $\det H_f(x_0, y_0) < 0$, is called a saddle point: in a neighborhood of such a point, the graph of $f$ is shaped like a saddle. In the remaining case, namely $\det H_f(x_0, y_0) = 0$, one knows from Example 5.10 that $H_f(x_0, y_0)$ is positive semidefinite or negative semidefinite. In this case, Th. 5.14 does not provide any information, i.e., without further investigation, one cannot say if $f$ does or does not have an extreme value at $(x_0, y_0)$.

Let us look at two concrete cases:

(a) Consider $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) := x^2 + y^2$. Then $\nabla f(x, y) = (2x, 2y)$ and $H_f(x, y) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$. Thus, $(0, 0)$ is the only stationary point of $f$. Since $\det H_f(0, 0) = 4 > 0$ and $\partial_x \partial_x f(0, 0) = 2 > 0$, $f$ has a strict local min at $(0, 0)$, and this is the only point where $f$ has a local extreme value. Moreover, since $f(x, y) > 0$ for $(x, y) \ne (0, 0)$, $f$ also has a strict global min at $(0, 0)$.

(b) Consider $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) := x^2 - y^2$. Then $\nabla f(x, y) = (2x, -2y)$ and $H_f(x, y) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}$. Thus, $(0, 0)$ is the only stationary point of $f$. Here, one has $\det H_f(0, 0) = -4 < 0$, i.e. $f$ does not have a local min or max at $(0, 0)$ (or anywhere else). Thus, $(0, 0)$ is an example of a saddle point.
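The determinant test of Example 5.15 can be sketched as a short routine. The function name `hessian_det_test` and the additional test function $f(x, y) = x^3 - 3x + y^2$ are hypothetical illustrations that do not appear in the notes; the gradient and Hessian of that function are coded by hand.

```python
def hessian_det_test(fxx, fxy, fyy):
    """Apply Example 5.15 to the second partials at a stationary point (n = 2)."""
    det = fxx * fyy - fxy ** 2       # det H_f as in (5.13)
    if det > 0:
        return "strict local min" if fxx > 0 else "strict local max"
    if det < 0:
        return "saddle point"
    return "no information"          # Th. 5.14 does not apply if det H_f = 0

# The two concrete cases (a) and (b) of the notes:
case_a = hessian_det_test(2, 0, 2)   # f(x, y) = x^2 + y^2 at (0, 0)
case_b = hessian_det_test(2, 0, -2)  # f(x, y) = x^2 - y^2 at (0, 0)

# Hypothetical further example: f(x, y) = x^3 - 3x + y^2.
def grad(x, y):
    return (3 * x * x - 3, 2 * y)

def second_partials(x, y):
    return (6 * x, 0, 2)             # (f_xx, f_xy, f_yy)

stationary_points = [(1, 0), (-1, 0)]            # the solutions of grad f = 0
results = {p: hessian_det_test(*second_partials(*p)) for p in stationary_points}
print(case_a, case_b, results)
```

In the hypothetical example, the test reports a strict local min at $(1, 0)$ and a saddle point at $(-1, 0)$. The local min is not global, since $x^3 - 3x \to -\infty$ for $x \to -\infty$.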

—

Let us summarize the general strategy for determining extreme values of differentiable functions $f$ defined on a set $G$: One starts by seeking all stationary points of $f$, that means the points $\xi$ where $\nabla f(\xi) = 0$. Every min or max of $f$ that lies in the interior of $G$ must be included in the set of stationary points. To investigate if a stationary point is, indeed, a max or a min, one will compute the Hessian matrix $H_f$ at this point, and one will determine the definiteness properties of $H_f$. Then one can use Th. 5.14 to decide if the stationary point is a max, a min, or neither, except for cases where $H_f$ is only (positive or negative) semidefinite, in which case Th. 5.14 does not help and one has to resort to other means (which can be difficult). As is already known from functions defined on $G \subseteq \mathbb{R}$, one also has to investigate the behavior of $f$ at the boundary of $G$ if one wants to find out if one of the local extrema is actually a global extremum. Moreover, if $f$ is defined on $\partial G$, then $\partial G$ might contain further local extrema of $f$.

5.4 Constrained Optimization, Lagrange Multipliers

Constrained optimization can be seen as restricting the function $f : A \to \mathbb{R}$ to be minimized or maximized to some subset of $A$, determined by the constraints. Constraints can be given explicitly or implicitly. If $A = \mathbb{R}$, then an explicit constraint might be to seek nonnegative solutions. Implicit constraints can be given in the form of equation constraints: for example, if $A = \mathbb{R}^n$, then one might want to minimize $f$ on the set of all solutions to $Ma = b$, where $M$ is some real $m \times n$ matrix.

If one is seeking extrema of a differentiable function $f : G \to \mathbb{R}$, $G \subseteq \mathbb{R}^{n+m}$, under the constraint $g(x) = 0$, where $g : G \to \mathbb{R}^m$, then, under suitable hypotheses, one can obtain necessary conditions using the trick of introducing additional (auxiliary) variables. These additional variables are called Lagrange multipliers. The proof is an application of the implicit function Th. 4.49.

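Theorem 5.16. Let $m, n \in \mathbb{N}$, let $G \subseteq \mathbb{R}^n \times \mathbb{R}^m$ be open, $f \in C^1(G, \mathbb{R})$, and $g \in C^1(G, \mathbb{R}^m)$. Suppose $(\xi, \eta) \in G$ and, using the notation of (4.67), suppose $D_y g(\xi, \eta)$ is invertible. If $f$ has a local min or local max at $(\xi, \eta)$ under the constraint $g = 0$ (that means, more precisely, $f\restriction_{\{(x,y) \in G :\, g(x,y) = 0\}}$ has a local min or local max at $(\xi, \eta)$), then there exists $\mu = (\mu_1, \dots, \mu_m) \in \mathbb{R}^m$ such that

$$H : G \times \mathbb{R}^m \to \mathbb{R}, \quad H(x, y, \lambda) := f(x, y) + \lambda \cdot g(x, y) = f(x, y) + \sum_{k=1}^{m} \lambda_k g_k(x, y), \tag{5.14}$$

has a stationary point at $(\xi, \eta, \mu)$, i.e. $\nabla H(\xi, \eta, \mu) = 0$, i.e.

$$\forall_{j \in \{1,\dots,n\}} \quad \partial_{x_j} H(\xi, \eta, \mu) = \partial_{x_j} f(\xi, \eta) + \sum_{k=1}^{m} \mu_k \, \partial_{x_j} g_k(\xi, \eta) = 0, \tag{5.15a}$$

$$\forall_{j \in \{1,\dots,m\}} \quad \partial_{y_j} H(\xi, \eta, \mu) = \partial_{y_j} f(\xi, \eta) + \sum_{k=1}^{m} \mu_k \, \partial_{y_j} g_k(\xi, \eta) = 0, \tag{5.15b}$$

$$\forall_{j \in \{1,\dots,m\}} \quad \partial_{\lambda_j} H(\xi, \eta, \mu) = g_j(\xi, \eta) = 0. \tag{5.15c}$$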

The additional variables λ1, . . . , λm are called Lagrange multipliers.

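Proof. According to the hypotheses, the implicit function Th. 4.49 applies to $g$ at $(\xi, \eta)$. Thus, there exist $U \subseteq \mathbb{R}^n$ open and $V \subseteq \mathbb{R}^m$ open such that $\xi \in U$, $\eta \in V$, and such that there exists a continuously differentiable $h : U \to V$, satisfying

$$\forall_{(x,y) \in U \times V} \quad \big( g(x, y) = 0 \;\Leftrightarrow\; y = h(x) \big)$$

as well as

$$\forall_{x \in U} \quad Dh(x) = -\big( D_y g(x, h(x)) \big)^{-1} D_x g(x, h(x)).$$

If $f$ has a local min or local max at $(\xi, \eta)$ under the constraint $g = 0$, then $\varphi : U \to \mathbb{R}$, $\varphi(x) := f(x, h(x))$, has a local min or local max at $\xi$ (with no constraint). Thus, according to Th. 5.13, $\nabla \varphi(\xi) = 0$. On the other hand, the chain rule of Th. 4.31 yields (recall $h(\xi) = \eta$)

$$\nabla \varphi(\xi) = Df(\xi, h(\xi)) \begin{pmatrix} \mathrm{Id}_n \\ Dh(\xi) \end{pmatrix} = \begin{pmatrix} D_x f(\xi, \eta) & D_y f(\xi, \eta) \end{pmatrix} \begin{pmatrix} \mathrm{Id}_n \\ Dh(\xi) \end{pmatrix} = D_x f(\xi, \eta) - D_y f(\xi, \eta) \big( D_y g(\xi, \eta) \big)^{-1} D_x g(\xi, \eta) = 0. \tag{5.16}$$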


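Letting $\mu := -D_y f(\xi, \eta) \big( D_y g(\xi, \eta) \big)^{-1} \in \mathbb{R}^m$ (the sign is chosen to match the definition $H = f + \lambda \cdot g$ in (5.14)), (5.16) reads

$$D_x f(\xi, \eta) + \mu \cdot D_x g(\xi, \eta) = 0,$$

i.e. (5.15a) holds. On the other hand,

$$D_y f(\xi, \eta) + \mu \cdot D_y g(\xi, \eta) = D_y f(\xi, \eta) - D_y f(\xi, \eta) \big( D_y g(\xi, \eta) \big)^{-1} D_y g(\xi, \eta) = 0,$$

showing that (5.15b) holds as well. As (5.15c) holds simply due to $g(\xi, \eta) = 0$, the proof is complete. ∎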

In the formulation of Th. 5.16, there is an a priori distinction between the variables $y$ such that $D_y g(\xi, \eta)$ is invertible and the remaining variables $x$. In practice, however, there is often no such a priori distinction. Thus, we provide the following reformulation of Th. 5.16:

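Corollary 5.17. Let $m, n \in \mathbb{N}$, $m < n$, let $G \subseteq \mathbb{R}^n$ be open, $f \in C^1(G, \mathbb{R})$, and $g \in C^1(G, \mathbb{R}^m)$. Suppose $\xi \in G$ and $Dg(\xi)$ has rank $m$. If $f$ has a local min or local max at $\xi$ under the constraint $g = 0$ (that means, more precisely, $f\restriction_{\{x \in G :\, g(x) = 0\}}$ has a local min or local max at $\xi$), then there exists $\mu = (\mu_1, \dots, \mu_m) \in \mathbb{R}^m$ such that

$$H : G \times \mathbb{R}^m \to \mathbb{R}, \quad H(x, \lambda) := f(x) + \lambda \cdot g(x) = f(x) + \sum_{k=1}^{m} \lambda_k g_k(x), \tag{5.17}$$

has a stationary point at $(\xi, \mu)$, i.e. $\nabla H(\xi, \mu) = 0$, i.e.

$$\forall_{j \in \{1,\dots,n\}} \quad \partial_{x_j} H(\xi, \mu) = \partial_{x_j} f(\xi) + \sum_{k=1}^{m} \mu_k \, \partial_{x_j} g_k(\xi) = 0, \tag{5.18a}$$

$$\forall_{j \in \{1,\dots,m\}} \quad \partial_{\lambda_j} H(\xi, \mu) = g_j(\xi) = 0. \tag{5.18b}$$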

Proof. Since $Dg(\xi)$ has rank $m < n$, it has $m$ linearly independent columns, i.e. there exist $j_1, \dots, j_m \in \{1, \dots, n\}$ such that one can let $y := (x_{j_1}, \dots, x_{j_m})$ (with $D_y g(\xi)$ invertible) and apply Th. 5.16. ∎
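To see the necessary conditions (5.18) at work on a small case, consider the following sketch. The problem (extremize $f(x, y) = x + y$ on the unit circle $g(x, y) = x^2 + y^2 - 1 = 0$) is a hypothetical example not taken from the notes; here $m = 1 < n = 2$ and $Dg = (2x, 2y)$ has rank 1 on the circle. The candidate solutions were found by hand and are merely verified by the code, together with a brute-force comparison along a parametrization of the circle.

```python
import math

def grad_H(x, y, lam):
    """Gradient of H(x, y, lam) = f(x, y) + lam * g(x, y) for the example above."""
    return (1 + 2 * lam * x,          # d/dx:   (5.18a) with j = 1
            1 + 2 * lam * y,          # d/dy:   (5.18a) with j = 2
            x * x + y * y - 1)        # d/dlam: (5.18b), the constraint itself

s = 1 / math.sqrt(2)
# Hand-computed solutions (xi_1, xi_2, mu) of grad H = 0:
candidates = [(s, s, -s), (-s, -s, s)]
residuals = [max(abs(c) for c in grad_H(*p)) for p in candidates]

# Brute force: evaluate f on the circle via (cos t, sin t) on a fine grid.
values = [math.cos(t) + math.sin(t)
          for t in (2 * math.pi * k / 3600 for k in range(3600))]

constrained_max = max(values)         # attained at t = pi/4, i.e. at (s, s)
constrained_min = min(values)         # attained at t = 5*pi/4, i.e. at (-s, -s)
print(residuals, constrained_max, constrained_min)
```

Both candidates satisfy $\nabla H = 0$ up to rounding, and the brute-force extrema $\pm\sqrt{2}$ agree with the values of $f$ at the two candidates. As in the corollary, the condition is only necessary: deciding which candidate is the max and which is the min requires a separate argument.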

Example 5.18. Let us maximize

$$f : [0, \pi]^n \to \mathbb{R}, \quad f(x) := \sum_{j=1}^{n} \sin x_j,$$

for $n \in \mathbb{N}$, $n \ge 3$, under the constraint that $g(x) = 0$, where⁸

$$g : [0, \pi]^n \to \mathbb{R}, \quad g(x) := -2\pi + \sum_{j=1}^{n} x_j.$$

⁸Geometrically, $f(x)$ is twice the area of an $n$-gon with vertices on the unit circle, where the $j$th vertex $P_j \in \mathbb{R}^2$ has coordinates $(\cos s_j, \sin s_j)$, $s_j = \sum_{k=1}^{j-1} x_k$: Here, $\sin x_j$ is twice the area of the triangle $P_j 0 P_{j+1}$, since, if one uses $0P_j$ as the base, then $\sin x_j$ is the height of the triangle ($x_j$ is the size of the angle between $0P_j$ and $0P_{j+1}$). Thus, we are aiming at maximizing the area among all $n$-gons with vertices on the unit circle.


Then $f$ and $g$ are defined on $\overline{G}$ with $G := (]0, \pi[)^n$. Clearly, $f$ and $g$ are continuously differentiable on $G$ and

$$\forall_{j \in \{1,\dots,n\}} \; \forall_{x \in G} \quad \partial_j g(x) = 1,$$

i.e. each $D_{x_j} g(x)$ is invertible. Thus, the hypotheses of Cor. 5.17 are satisfied and, to apply Cor. 5.17, we define

$$H : G \times \mathbb{R} \to \mathbb{R}, \quad H(x, \lambda) := f(x) + \lambda g(x) = -2\pi\lambda + \sum_{j=1}^{n} (\sin x_j + \lambda x_j).$$

According to (5.18a), for $f$ to have a local min or max at $\xi \in G$, under the constraint $g = 0$, it is a necessary condition that

$$\partial_{x_j} H(\xi, \mu) = \cos \xi_j + \mu = 0,$$

for some suitable $\mu \in \mathbb{R}$. As each $\xi_j \in I := \,]0, \pi[$ and the range of $\cos$ on $I$ is $]-1, 1[$, $\mu$ must be in $]-1, 1[$. Moreover, since $\cos$ is injective on $I$, there exists $\alpha \in I$ such that $\xi_j = \alpha$ for each $j \in \{1, \dots, n\}$. Thus,

$$0 = g(\xi) = -2\pi + n\alpha$$

implies $\alpha = \frac{2\pi}{n}$. We can now show that, under the constraint $g = 0$, $f : \overline{G} \to \mathbb{R}$ actually has its global max⁹ at $\xi$: Since $C := \{x \in \overline{G} : g(x) = 0\}$ is compact and $f$ is continuous, $f$ must have a global max on $C$. As every global max is a local max, if it is not at $\xi$, then it must occur at some $x^0 \in C \cap \partial G$. The value of $f$ at $\xi$ is

$$A_n := f(\xi) = n \sin \frac{2\pi}{n} = 2\pi \, \phi\Big( \frac{2\pi}{n} \Big),$$

where

$$\phi : \mathbb{R}^+ \to \mathbb{R}, \quad \phi(t) := \frac{\sin t}{t}. \tag{5.19}$$

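If we can show $\phi$ to be strictly decreasing on $I$, then

$$\forall_{n \ge 3} \quad A_n < A_{n+1}. \tag{5.20}$$

Using (5.20), we are in a position to show that the global max of $f$ on $C$ is at $\xi$ via induction on $n \ge 3$: Note

$$x^0 \in \partial G \;\Rightarrow\; \exists_{j \in \{1,\dots,n\}} \quad x^0_j \in \{0, \pi\}. \tag{5.21}$$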

For $n = 3$, if some $x^0_j = 0$, then the remaining coordinates of $x^0$ must equal $\pi$ (due to $g(x^0) = 0$). Thus $f(x^0) = 0 < f(\xi)$. If some $x^0_j = \pi$, then, as $\sin(x^0_j) = 0$,

$$f(x^0) \le 2 < \frac{3\sqrt{3}}{2} = 3 \sin \frac{2\pi}{3} = f(\xi).$$

⁹Geometrically, we have then shown that among all $n$-gons with vertices on the unit circle, the regular $n$-gon maximizes the area.


Now let $n > 3$. If some $x^0_j = 0$, then the sum of the remaining coordinates of $x^0$ must equal $2\pi$ and, by induction,

$$f(x^0) \overset{\text{ind. hyp.}}{\le} A_{n-1} \overset{(5.20)}{<} A_n,$$

showing $f$ does not have a global max at $x^0$. If some $x^0_j = \pi$, then the sum of the remaining coordinates of $x^0$ must equal $\pi$, implying

$$f(x^0) = \sum_{k=1,\, k \ne j}^{n} \sin(x^0_k) \le \sum_{k=1,\, k \ne j}^{n} x^0_k = \pi < 4 = A_4 \overset{(5.20)}{\le} f(\xi),$$

again showing $f$ does not have a global max at $x^0$. So it only remains to verify that the function $\phi$ of (5.19) is strictly decreasing on $I$. To this end, note

$$\phi' : \mathbb{R}^+ \to \mathbb{R}, \quad \phi'(t) = \frac{t \cos t - \sin t}{t^2},$$

i.e. $\phi'(t) < 0$ for $t \in I$: $t < \tan t$ for $0 < t < \frac{\pi}{2}$, $\phi'(\frac{\pi}{2}) = -\frac{4}{\pi^2} < 0$, and both $\cos t$ and $-\sin t$ are negative for $\frac{\pi}{2} < t < \pi$.
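The conclusion of Example 5.18 can be sanity-checked numerically. The following is only a sketch: the names `f`, `A`, and `random_feasible_point` are hypothetical helpers, and the random sampling merely illustrates the result; it does not replace the proof.

```python
import math
import random

def f(x):
    """Objective of Example 5.18: sum of sin x_j."""
    return sum(math.sin(t) for t in x)

def A(n):
    """A_n = n sin(2 pi / n), the value of f at xi = (2 pi / n, ..., 2 pi / n)."""
    return n * math.sin(2 * math.pi / n)

def random_feasible_point(n, rng):
    """Sample x in [0, pi]^n with x_1 + ... + x_n = 2 pi (rejection sampling)."""
    while True:
        w = [rng.random() for _ in range(n)]
        total = sum(w)
        x = [2 * math.pi * wi / total for wi in w]
        if max(x) <= math.pi:
            return x

rng = random.Random(0)

# No sampled feasible point should beat the regular n-gon value A_n.
never_exceeded = all(
    f(random_feasible_point(n, rng)) <= A(n) + 1e-12
    for n in range(3, 8)
    for _ in range(200)
)

# Monotonicity (5.20) for the first few n.
increasing = all(A(n) < A(n + 1) for n in range(3, 20))

print(never_exceeded, increasing, A(3), A(4))
```

The values $A_3 = 3\sqrt{3}/2$ and $A_4 = 4$ used in the induction argument can be read off directly from `A(3)` and `A(4)`.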

A Set-Theoretic Rules for Cartesian Products

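Proposition A.1. Let $I$ be a nonempty index set and $(X_i)_{i \in I}$ a family of sets. We consider the Cartesian product $X := \prod_{i \in I} X_i$ (cf. [Phi16, Def. 2.15(c)]). Let $J$ also be a nonempty index set and, for each $i \in I$, let $(A_{ij})_{j \in J}$ be a family of subsets of $X_i$. We then have the following rules:

$$\bigcap_{j \in J} \Big( \prod_{i \in I} A_{ij} \Big) = \prod_{i \in I} \Big( \bigcap_{j \in J} A_{ij} \Big), \tag{A.1}$$

$$\bigcup_{j \in J} \Big( \prod_{i \in I} A_{ij} \Big) \subseteq \prod_{i \in I} \Big( \bigcup_{j \in J} A_{ij} \Big). \tag{A.2}$$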

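Proof. Define

$$\forall_{j \in J} \quad A_j := \prod_{i \in I} A_{ij} \subseteq X. \tag{A.3}$$

Consider $x := (x_i)_{i \in I} \in X$, i.e. $x_i \in X_i$ for each $i \in I$. Then

$$x \in \bigcap_{j \in J} A_j \;\Leftrightarrow\; \forall_{j \in J} \; x \in A_j \;\Leftrightarrow\; \forall_{j \in J} \; \forall_{i \in I} \; x_i \in A_{ij} \;\Leftrightarrow\; \forall_{i \in I} \; x_i \in \bigcap_{j \in J} A_{ij} \;\Leftrightarrow\; x \in \prod_{i \in I} \Big( \bigcap_{j \in J} A_{ij} \Big), \tag{A.4}$$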


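proving (A.1). Moreover,

$$x \in \bigcup_{j \in J} A_j \;\Leftrightarrow\; \exists_{j \in J} \; x \in A_j \;\Leftrightarrow\; \exists_{j \in J} \; \forall_{i \in I} \; x_i \in A_{ij} \;\Rightarrow\; \forall_{i \in I} \; \exists_{j \in J} \; x_i \in A_{ij} \;\Leftrightarrow\; \forall_{i \in I} \; x_i \in \bigcup_{j \in J} A_{ij} \;\Leftrightarrow\; x \in \prod_{i \in I} \Big( \bigcup_{j \in J} A_{ij} \Big), \tag{A.5}$$

proving (A.2). ∎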

Example A.2. We give an example that shows that, in general, equality does not hold in (A.2): Let $X_1 := X_2 := \{1, 2\}$, $A_1 := \{1\}$, $A_2 := \{2\}$, $Y_1 := A_1 \times A_2 = \{(1, 2)\}$, $Y_2 := A_2 \times A_1 = \{(2, 1)\}$. Then

$$Y_1 \cup Y_2 = \{(1, 2), (2, 1)\} \ne \{1, 2\} \times \{1, 2\} = (A_1 \cup A_2) \times (A_2 \cup A_1). \tag{A.6}$$
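The rules (A.1) and (A.2) can be replayed on the finite sets of Example A.2. This is a minimal sketch using only the Python standard library; the variable names are hypothetical.

```python
from itertools import product

A1, A2 = {1}, {2}

Y1 = set(product(A1, A2))                    # A1 x A2 = {(1, 2)}
Y2 = set(product(A2, A1))                    # A2 x A1 = {(2, 1)}

# (A.2): the union of products is contained in the product of unions.
union_of_products = Y1 | Y2
product_of_unions = set(product(A1 | A2, A2 | A1))

# (A.1): the intersection of products equals the product of intersections.
intersection_of_products = Y1 & Y2
product_of_intersections = set(product(A1 & A2, A2 & A1))

print(sorted(union_of_products), sorted(product_of_unions))
print(intersection_of_products == product_of_intersections)
```

Here the inclusion in (A.2) is proper: the pairs $(1, 1)$ and $(2, 2)$ lie in the product of the unions, but in neither $Y_1$ nor $Y_2$.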

B Box Topology

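Example B.1. Let $(X_i, \mathcal{T}_i)$, $i \in I$, be arbitrary topological spaces, and let $X := \prod_{i \in I} X_i$. Consider the set

$$\mathcal{B}_b := \Big\{ \prod_{i \in I} O_i : \big( \forall_{i \in I} \; O_i \in \mathcal{T}_i \big) \Big\} = \Big\{ \bigcap_{i \in I} \pi_i^{-1}(O_i) : \forall_{i \in I} \; O_i \in \mathcal{T}_i \Big\},$$

where

$$\pi_j : X \to X_j, \quad \pi_j\big( (x_i)_{i \in I} \big) := x_j,$$

denote the projections. Analogous to the base $\mathcal{B}_p$ of the product topology of Ex. 1.53(a), $\mathcal{B}_b$ also satisfies conditions (i) and (ii) of Prop. 1.48 (since $X \in \mathcal{B}_b$ and since $\mathcal{B}_b$ is closed under finite intersections by (A.1) of Appendix A and Def. 1.1(iii)) and, thus, also forms a base for a topology $\mathcal{T}_b$ on $X$, called the box topology on $X$. Analogous to Ex. 1.53(a), another base for $\mathcal{T}_b$ is

$$\mathcal{B}_b^* := \Big\{ \prod_{i \in I} B_i : \big( \forall_{i \in I} \; B_i \in \mathcal{B}_i \big) \Big\} = \Big\{ \bigcap_{i \in I} \pi_i^{-1}(B_i) : \forall_{i \in I} \; B_i \in \mathcal{B}_i \Big\}.$$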

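Let us compare the box topology on $X$ with the product topology of Ex. 1.53(a): Since $\mathcal{B}_p \subseteq \mathcal{B}_b$, we always have $\mathcal{T}_p \subseteq \mathcal{T}_b$. If $I$ is finite, then $\mathcal{B}_p = \mathcal{B}_b$ and $\mathcal{T}_p = \mathcal{T}_b$. The same is true if one $X_i = \emptyset$ (then $X = \emptyset$) or if all, but finitely many, of the $\mathcal{T}_i$ are indiscrete. In all other cases, $\mathcal{T}_b$ is strictly finer than $\mathcal{T}_p$: Each set $A := \bigcap_{i \in I} \pi_i^{-1}(O_i)$, each $O_i \in \mathcal{T}_i$, is in $\mathcal{T}_b$, but not in $\mathcal{T}_p$ if infinitely many $O_i \ne X_i$ (since, in that case, $A$ does not contain a set from $\mathcal{B}_p$ as a subset). Moreover, in this case, the sets $\mathcal{S}_p$ and $\mathcal{S}_p^*$ of Ex. 1.53(a) are not subbases of $\mathcal{T}_b$, since $A$ is not a finite intersection of sets of the form $\pi_i^{-1}(O_i)$.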

$\mathcal{T}_b$ is not separable if $I$ is not finite and each $\mathcal{T}_i$ contains two disjoint open sets $O_{i,1}$ and $O_{i,2}$ (in particular, $\mathbb{K}^{\mathbb{R}}$ is not separable in the box topology): If $f : I \to \{0, 1\}$, then $O_f := \prod_{i \in I} O_{i,f(i)} \in \mathcal{T}_b$. Then the elements of $\mathcal{O} := \{O_f : (f : I \to \{0, 1\})\}$ are pairwise disjoint. Since $f \mapsto O_f$ is an injective map from $2^I$ to $\mathcal{O}$ and $2^I$ is not countable if $I$ is not finite, $\mathcal{O}$ is not countable, i.e. no countable subset of $X$ can be dense.

Next, we prove that the box topology on $X = \mathbb{K}^{\mathbb{R}}$ is not first countable (in particular, not metrizable), a property shared with the (coarser) product topology on $X = \mathbb{K}^{\mathbb{R}}$ (note that there is no easy way to infer one from the other, as both the indiscrete (the coarsest) and the discrete (the finest) topology on $X$ are first countable): Seeking a contradiction, let $f \in X$ and let $\mathcal{N}$ be a countable local base at $f$, where $(N_k)_{k \in \mathbb{N}}$ is an enumeration of the elements of $\mathcal{N}$. For each $k \in \mathbb{N}$, consider $\pi_k(N_k)$, which, as it is open in $\mathbb{K}$, must contain an open interval $I_k$ with $f(k) \in I_k$, i.e. we can choose $a_k, b_k \in I_k$ with $a_k < f(k) < b_k$. Letting $O_k := \,]a_k, b_k[$, we have $O_k \subsetneq I_k$ and $O := \bigcap_{k \in \mathbb{N}} \pi_k^{-1}(O_k)$ is a box topology neighborhood of $f$. However, $O$ does not contain any of the $N_k$ (since $\pi_k(O) = O_k$ is strictly contained in $\pi_k(N_k)$), a contradiction to $\mathcal{N}$ being a local base at $f$ (note that the argument even shows that $\mathcal{T}_b$ on $X = \mathbb{K}^{\mathbb{N}}$ is not first countable).

To finish this example, let us check explicitly that the box topology on $X = \mathbb{K}^{\mathbb{R}}$ is not the topology of pointwise convergence. Since $\mathcal{T}_p \subseteq \mathcal{T}_b$, every sequence in $X$ converging with respect to $\mathcal{T}_b$ must also converge with respect to $\mathcal{T}_p$. However, we claim that the sequence $(f_k)_{k \in \mathbb{N}}$ in $X$, where $f_k \equiv \frac{1}{k}$, which, clearly, converges pointwise (even uniformly) to $f \equiv 0$, does not converge to $f \equiv 0$ in the box topology: For each $0 \ne s \in \mathbb{R}$, let $O_s := B_s(0) \subseteq \mathbb{K}$, $O_0 := \mathbb{K}$, $O := \prod_{s \in \mathbb{R}} O_s$. Then $O \in \mathcal{B}_b$; in particular, $O$ is open with respect to the box topology. However, $O$ contains none of the $f_k$: $f_k(s) \notin O_s$ for each $0 < s < \frac{1}{k}$.

C Uniform Continuity and Lipschitz Continuity

This section provides some additional important results regarding uniformly continuous functions and Lipschitz continuous functions (see Def. 2.3(b)). We start with an auxiliary result:

Lemma C.1. If $f, g$ are real-valued functions on a set $X$, i.e. if $f, g : X \to \mathbb{R}$, then, for each $x, y \in X$,

$$\big| \max(f, g)(x) - \max(f, g)(y) \big| \le \max\big\{ |f(x) - f(y)|, \, |g(x) - g(y)| \big\}, \tag{C.1a}$$

$$\big| \min(f, g)(x) - \min(f, g)(y) \big| \le \max\big\{ |f(x) - f(y)|, \, |g(x) - g(y)| \big\}. \tag{C.1b}$$

Proof. By possibly switching the names of $f$ and $g$, one can assume, without loss of generality, that $\max(f, g)(x) = f(x)$, i.e. $g(x) \le f(x)$. If $g(y) \le f(y)$ as well, then $\big| \max(f, g)(x) - \max(f, g)(y) \big| = |f(x) - f(y)|$ and $\big| \min(f, g)(x) - \min(f, g)(y) \big| = |g(x) - g(y)|$, i.e. (C.1) is true. If $g(y) > f(y)$, then

$$\big| \max(f, g)(x) - \max(f, g)(y) \big| = |f(x) - g(y)| \; \begin{cases} \le |g(x) - g(y)| & \text{for } f(x) \le g(y), \\ < f(x) - f(y) & \text{for } f(x) > g(y), \end{cases} \tag{C.2a}$$

$$\big| \min(f, g)(x) - \min(f, g)(y) \big| = |g(x) - f(y)| \; \begin{cases} < |g(x) - g(y)| & \text{for } g(x) \le f(y), \\ \le f(x) - f(y) & \text{for } g(x) > f(y), \end{cases} \tag{C.2b}$$

showing that (C.1) holds in all cases. ∎
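The inequalities (C.1a) and (C.1b) only involve the four real numbers $f(x)$, $g(x)$, $f(y)$, $g(y)$, so they can be spot-checked on random inputs. This is a minimal sketch with a hypothetical helper `check_max_min_bound`; a small tolerance absorbs floating-point rounding.

```python
import random

def check_max_min_bound(fx, gx, fy, gy, tol=1e-12):
    """Check (C.1a) and (C.1b) for the values f(x), g(x), f(y), g(y)."""
    bound = max(abs(fx - fy), abs(gx - gy))
    max_ok = abs(max(fx, gx) - max(fy, gy)) <= bound + tol   # (C.1a)
    min_ok = abs(min(fx, gx) - min(fy, gy)) <= bound + tol   # (C.1b)
    return max_ok and min_ok

rng = random.Random(0)
all_ok = all(
    check_max_min_bound(*(rng.uniform(-5.0, 5.0) for _ in range(4)))
    for _ in range(10_000)
)

# A case with g(x) <= f(x) but g(y) > f(y), as in (C.2a) and (C.2b):
crossing_ok = check_max_min_bound(1.0, 0.0, 0.0, 2.0)
print(all_ok, crossing_ok)
```

In the crossing case, the left-hand sides are $|1 - 2| = 1$ and $|0 - 0| = 0$, while the bound is $\max\{1, 2\} = 2$.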

Theorem C.2. Let $(X, d)$ be a metric space (e.g. a normed space), $(Y, \|\cdot\|)$ a normed vector space over $\mathbb{K}$, and assume that $f, g : X \to Y$ are uniformly continuous. Then $f + g$ and $\lambda f$ are uniformly continuous for each $\lambda \in \mathbb{K}$, i.e. the set of all uniformly continuous functions from $X$ into $Y$ constitutes a subspace of the vector space $\mathcal{F}(X, Y)$ over $\mathbb{K}$. Moreover, if $Y = \mathbb{K} = \mathbb{R}$, then $\max(f, g)$, $\min(f, g)$, $f^+$, $f^-$, $|f|$ are all uniformly continuous.

Proof. As f and g are uniformly continuous, given ε > 0, there exist $\delta_f > 0$ and $\delta_g > 0$ such that, for each x, y ∈ X,

$$d(x,y) < \delta_f \ \Rightarrow\ \|f(x) - f(y)\| < \epsilon/2, \tag{C.3a}$$

$$d(x,y) < \delta_g \ \Rightarrow\ \|g(x) - g(y)\| < \epsilon/2. \tag{C.3b}$$

Thus, if $d(x,y) < \min\{\delta_f, \delta_g\}$, then

$$\|(f+g)(x) - (f+g)(y)\| \le \|f(x) - f(y)\| + \|g(x) - g(y)\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \tag{C.3c}$$

showing that f + g is uniformly continuous. Next, if λ = 0, then λf ≡ 0, which is obviously uniformly continuous. For λ ≠ 0, choose δ > 0 such that d(x, y) < δ implies ‖f(x) − f(y)‖ < ε/|λ|. Then

$$\|(\lambda f)(x) - (\lambda f)(y)\| = |\lambda|\,\|f(x) - f(y)\| < |\lambda|\,\frac{\epsilon}{|\lambda|} = \epsilon, \tag{C.3d}$$

showing that λf is uniformly continuous. If Y = K = R, then $d(x,y) < \min\{\delta_f, \delta_g\}$ together with Lem. C.1 implies

$$\big|\max(f,g)(x) - \max(f,g)(y)\big| < \epsilon/2 < \epsilon, \tag{C.3e}$$

$$\big|\min(f,g)(x) - \min(f,g)(y)\big| < \epsilon/2 < \epsilon, \tag{C.3f}$$

showing the uniform continuity of max(f, g) and min(f, g) and, in turn, also of f⁺, f⁻, and |f|. □

Theorem C.3. Let (X, d) be a metric space (e.g. a normed space), (Y, ‖ · ‖) a normed vector space over K, and assume that f, g : X −→ Y are Lipschitz continuous. Then f + g and λf are Lipschitz continuous for each λ ∈ K, i.e. the set Lip(X, Y) constitutes a subspace of the vector space F(X, Y) over K. Moreover, if Y = K = R, then max(f, g), min(f, g), f⁺, f⁻, |f| are all Lipschitz continuous.


Proof. As f and g are Lipschitz continuous, there exist $L_f \ge 0$ and $L_g \ge 0$ such that, for each x, y ∈ X,

$$\|f(x) - f(y)\| \le L_f\, d(x,y), \tag{C.4a}$$

$$\|g(x) - g(y)\| \le L_g\, d(x,y). \tag{C.4b}$$

Thus,

$$\|(f+g)(x) - (f+g)(y)\| \le \|f(x) - f(y)\| + \|g(x) - g(y)\| \le L_f\, d(x,y) + L_g\, d(x,y) = (L_f + L_g)\, d(x,y), \tag{C.4c}$$

showing that f + g is Lipschitz continuous with Lipschitz constant $L_f + L_g$. Next, for λ ∈ K,

$$\|(\lambda f)(x) - (\lambda f)(y)\| = |\lambda|\,\|f(x) - f(y)\| \le |\lambda|\, L_f\, d(x,y), \tag{C.4d}$$

showing that λf is Lipschitz continuous with Lipschitz constant $|\lambda| L_f$. For Y = K = R, Lem. C.1 shows max(f, g) and min(f, g) are Lipschitz continuous with Lipschitz constant $\max\{L_f, L_g\}$, f⁺ and f⁻ are Lipschitz continuous with Lipschitz constant $L_f$, and |f| is Lipschitz continuous with Lipschitz constant $2L_f$. □
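The Lipschitz constants from Th. C.3 can be compared with difference quotients on a sample grid. The sketch below is only an illustration under assumed choices: f(x) = 2x and g(x) = sin(3x) on [−3, 3] (so that L_f = 2 and L_g = 3 are admissible constants), and the helper name `lipschitz_estimate` is hypothetical. A sampled estimate can only bound the true constant from below, so it must not exceed the constants given in the theorem.

```python
import math
from itertools import combinations

def lipschitz_estimate(h, points):
    """Largest difference quotient |h(x) - h(y)| / |x - y| over all sample pairs."""
    return max(abs(h(x) - h(y)) / abs(x - y) for x, y in combinations(points, 2))

def f(x):
    return 2.0 * x            # Lipschitz with constant L_f = 2

def g(x):
    return math.sin(3.0 * x)  # Lipschitz with constant L_g = 3

points = [i / 50 for i in range(-150, 151)]  # distinct grid points in [-3, 3]
est_sum = lipschitz_estimate(lambda x: f(x) + g(x), points)      # at most L_f + L_g = 5
est_max = lipschitz_estimate(lambda x: max(f(x), g(x)), points)  # at most max{L_f, L_g} = 3
est_abs = lipschitz_estimate(lambda x: abs(f(x)), points)        # at most 2 L_f = 4
print(est_sum, est_max, est_abs)
```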

Caveat C.4. Products and quotients of uniformly continuous functions are not necessarily uniformly continuous; products and quotients of Lipschitz continuous functions are not necessarily Lipschitz continuous: Even though f ≡ 1 and g(x) = x are Lipschitz continuous, it was shown in Examples 2.5(a),(b), respectively, that f/g and g² are not even uniformly continuous on R⁺.
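For g², the failure of uniform continuity can be made concrete: for every δ > 0 there are points x, y with |x − y| < δ and |x² − y²| ≥ 1. The following sketch (with hypothetical helper names, using exact rational arithmetic) picks y = x + δ/2 and x = 1/δ, for which x² and y² differ by 1 + δ²/4.

```python
from fractions import Fraction

def gap_of_square(x, delta):
    """|y^2 - x^2| for y = x + delta/2, so that |x - y| = delta/2 < delta."""
    y = x + delta / 2
    return abs(y * y - x * x)

def violating_point(delta, eps=1):
    """A point x > 0 with gap_of_square(x, delta) >= eps, namely x = eps/delta."""
    return Fraction(eps) / delta

for delta in (Fraction(1, 10), Fraction(1, 1000)):
    x = violating_point(delta)
    # The gap stays at least 1, no matter how small delta is.
    print(delta, x, gap_of_square(x, delta))
```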

D Viewing Cn as R2n

Remark D.1. Recall that the set of complex numbers C is defined to be R², where the imaginary unit is i := (0, 1) ∈ R², which allows one to write each z = (x, y) ∈ C = R² as z = x + iy, where x = Re z and y = Im z. This, for each n ∈ N, gives rise to the R-linear bijective map

$$I : \mathbb{C}^n \to \mathbb{R}^{2n}, \quad I\big((x_1,y_1),\dots,(x_n,y_n)\big) := (x_1,y_1,\dots,x_n,y_n), \tag{D.1}$$

allowing one to canonically identify $\mathbb{C}^n$ with $\mathbb{R}^{2n}$.

—

The identification (D.1) allows the identification of metric structures on $\mathbb{C}^n$ and $\mathbb{R}^{2n}$ due to the following general result:

Proposition D.2. Let X, Y be sets, let $d : X \times X \to \mathbb{R}_0^+$ be a metric on X, and let I : X −→ Y be bijective. Then

$$d_Y : Y \times Y \to \mathbb{R}_0^+, \quad d_Y(x,y) := d\big(I^{-1}(x), I^{-1}(y)\big), \tag{D.2}$$

defines a metric on Y such that (X, d) and $(Y, d_Y)$ are isometric (with the map I providing the isometry).


Proof. Let x, y, z ∈ Y. Then

$$d_Y(x,y) = 0 \ \Leftrightarrow\ d\big(I^{-1}(x), I^{-1}(y)\big) = 0 \ \Leftrightarrow\ I^{-1}(x) = I^{-1}(y) \ \Leftrightarrow\ x = y, \tag{D.3}$$

showing that $d_Y$ is positive definite. Moreover,

$$d_Y(x,y) = d\big(I^{-1}(x), I^{-1}(y)\big) = d\big(I^{-1}(y), I^{-1}(x)\big) = d_Y(y,x), \tag{D.4}$$

showing $d_Y$ is symmetric. Finally,

$$d_Y(x,z) = d\big(I^{-1}(x), I^{-1}(z)\big) \le d\big(I^{-1}(x), I^{-1}(y)\big) + d\big(I^{-1}(y), I^{-1}(z)\big) = d_Y(x,y) + d_Y(y,z), \tag{D.5}$$

proving the triangle inequality for $d_Y$ and completing the proof that $d_Y$ constitutes a metric. That I provides an isometry between (X, d) and $(Y, d_Y)$ is immediate from (D.2). □
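The construction (D.2) is easy to mirror in code: a metric is transported along a bijection by composing with its inverse. The sketch below is only an illustration; it uses the map I from (D.1) on tuples of Python complex numbers, and the names `transport_metric`, `I_inv`, and the sample metric `d_max` are assumptions made for this example.

```python
def transport_metric(d, inverse):
    """Return d_Y(x, y) := d(inverse(x), inverse(y)), cf. (D.2)."""
    return lambda x, y: d(inverse(x), inverse(y))

def I(z):
    """The map (D.1): ((x1, y1), ..., (xn, yn)) -> (x1, y1, ..., xn, yn)."""
    out = []
    for c in z:
        out.extend((c.real, c.imag))
    return tuple(out)

def I_inv(v):
    """Inverse of I: regroup consecutive pairs of reals into complex numbers."""
    return tuple(complex(v[i], v[i + 1]) for i in range(0, len(v), 2))

def d_max(z, w):
    """A sample metric on C^n: the largest componentwise distance."""
    return max(abs(a - b) for a, b in zip(z, w))

d_r = transport_metric(d_max, I_inv)  # the induced metric on R^(2n)

z = (1 + 2j, 3 - 1j)
w = (0 + 0j, 3 + 3j)
# I is an isometry: distances agree before and after applying I.
print(d_max(z, w), d_r(I(z), I(w)))
```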

Corollary D.3. Let n ∈ N, let $d : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{R}_0^+$ be a metric, and let I be the map from (D.1). Then

$$d_r : \mathbb{R}^{2n} \times \mathbb{R}^{2n} \to \mathbb{R}_0^+, \quad d_r(x,y) := d\big(I^{-1}(x), I^{-1}(y)\big), \tag{D.6}$$

defines a metric on $\mathbb{R}^{2n}$ such that $(\mathbb{C}^n, d)$ and $(\mathbb{R}^{2n}, d_r)$ are isometric (with the map I providing the isometry). Moreover, the map $d \mapsto d_r$ is bijective between the set of metrics on $\mathbb{C}^n$ and the set of metrics on $\mathbb{R}^{2n}$. □

Proposition D.4. Let n ∈ N. If ‖ · ‖ constitutes a norm on the vector space $\mathbb{C}^n$ over C, then

$$\|\cdot\|_r : \mathbb{R}^{2n} \to \mathbb{R}_0^+, \quad \|(x_1,y_1,\dots,x_n,y_n)\|_r := \big\|\big((x_1,y_1),\dots,(x_n,y_n)\big)\big\| \tag{D.7}$$

defines a norm on the vector space $\mathbb{R}^{2n}$ over R such that $(\mathbb{C}^n, \|\cdot\|)$ and $(\mathbb{R}^{2n}, \|\cdot\|_r)$ are isometric (with the map I from (D.1) providing the isometry; even more precisely, if d and $d_r$ denote the respective induced metrics, then the relation between d and $d_r$ is given by (D.6)).


Example D.5. Let n ∈ N, p ∈ [1,∞], and let ‖ · ‖ denote the p-norm on the vector space $\mathbb{R}^n$ over R, i.e. $\|x\| := \big(\sum_{j=1}^n |x_j|^p\big)^{1/p}$ for p < ∞ and $\|x\| = \max\{|x_j| : j = 1,\dots,n\}$ for p = ∞. Then it is an exercise to show

$$\|\cdot\|_c : \mathbb{C}^n \to \mathbb{R}_0^+, \quad \|(z_1,\dots,z_n)\|_c := \big\|(|z_1|,\dots,|z_n|)\big\| \tag{D.8}$$

defines a norm on the vector space $\mathbb{C}^n$ over C.
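The definition (D.8) can be evaluated directly. The following sketch is merely illustrative: the function names `p_norm` and `norm_c` are hypothetical, and the final checks only spot-test homogeneity and the triangle inequality on two sample vectors, which is of course no substitute for the exercise.

```python
import math

def p_norm(x, p):
    """The p-norm on R^n for p in [1, inf]."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def norm_c(z, p):
    """The norm (D.8) on C^n: the p-norm of the vector of absolute values."""
    return p_norm([abs(c) for c in z], p)

z = (3 + 4j, 0 + 0j, -5 + 12j)   # absolute values 5, 0, 13
w = (1 - 1j, 2 + 0j, 0 + 1j)
lam = 2 - 1j

for p in (1, 2, math.inf):
    scaled = tuple(lam * c for c in z)
    summed = tuple(a + b for a, b in zip(z, w))
    homogeneous = math.isclose(norm_c(scaled, p), abs(lam) * norm_c(z, p))
    triangle = norm_c(summed, p) <= norm_c(z, p) + norm_c(w, p) + 1e-12
    print(p, norm_c(z, p), homogeneous, triangle)
```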


Remark D.6. As a consequence of Th. 1.72, every norm on the vector space $\mathbb{C}^n$ over C generates precisely the same open subsets of $\mathbb{C}^n$; in other words, there is only one norm topology on $\mathbb{C}^n$. Analogously, there is only one norm topology on $\mathbb{R}^n$, as every norm on the vector space $\mathbb{R}^n$ over R generates precisely the same open subsets of $\mathbb{R}^n$. Moreover, Prop. D.4 shows that the open sets of the norm topology on $\mathbb{C}^n$ are actually precisely the same as the open sets of the norm topology on $\mathbb{R}^{2n}$.

Theorem D.7. Let n ∈ N, $A \subseteq \mathbb{C}^n$. Then A is bounded in the normed vector space $\mathbb{C}^n$ over C if, and only if, A is bounded in the normed vector space $\mathbb{R}^{2n}$ over R.


E Pseudometrics and Seminorms

If one omits the requirement of definiteness in the definitions of metric and norm, then one obtains what is called a pseudometric and a seminorm, respectively. In spite of the possible loss of definiteness, one can still carry out many parts of the theory analogously. In particular, a pseudometric still induces a topology; however, one can no longer expect this topology to be Hausdorff (not even T1, actually), as it can happen that there are points x, y such that every open set that contains x also contains y (i.e. $y \in \overline{\{x\}}$).

Definition E.1. Let X be a set. A function $d : X \times X \to \mathbb{R}_0^+$ is called a pseudometric or semimetric on X if, and only if, the following three conditions are satisfied:

(i) For each x ∈ X, one has d(x, x) = 0.

(ii) d is symmetric, i.e., for each (x, y) ∈ X ×X, d(y, x) = d(x, y).

(iii) d satisfies the triangle inequality, i.e., for each (x, y, z) ∈ X³, d(x, z) ≤ d(x, y) + d(y, z).

If d constitutes a pseudometric on X, then the pair (X, d) is called a pseudometric space.

Definition E.2. Let X be a vector space over the field K. Then a function $\|\cdot\| : X \to \mathbb{R}_0^+$ is called a seminorm on X if, and only if, the following three conditions are satisfied:

(i) ‖0‖ = 0.

(ii) ‖ · ‖ is homogeneous of degree 1, i.e.

‖λx‖ = |λ|‖x‖ for each λ ∈ K, x ∈ X.

(iii) ‖ · ‖ satisfies the triangle inequality, i.e.

‖x+ y‖ ≤ ‖x‖+ ‖y‖ for each x, y ∈ X.


If ‖ · ‖ constitutes a seminorm on X, then the pair (X, ‖ · ‖) is called a seminormed vector space or just seminormed space.

Remark E.3. The proof of Lem. 1.8 shows that, if (X, ‖ · ‖) is a seminormed space, then

$$d : X \times X \to \mathbb{R}_0^+, \quad d(x,y) := \|x - y\|, \tag{E.1}$$

constitutes a pseudometric on X. One calls d the pseudometric induced by the seminorm ‖ · ‖.

Remark E.4. Let (X, d) be a pseudometric space. Given x ∈ X and r ∈ R⁺, the open ball $B_r(x)$, the closed ball $\overline{B}_r(x)$, and the sphere $S_r(x)$ are still defined precisely as in Def. 1.10. And analogous to Def. 1.10, we define a set U ⊆ X to be a neighborhood of x if, and only if, there is ε ∈ R⁺ such that $B_\epsilon(x) \subseteq U$. We call O ⊆ X open if, and only if,

$$\forall_{x\in O}\ \exists_{\epsilon\in\mathbb{R}^+}\ B_\epsilon(x) \subseteq O.$$

Theorem E.5. Let (X, d) be a pseudometric space. Then T := {O ⊆ X : O open} constitutes a topology on X. One also calls T the topology induced by the pseudometric d, making each pseudometric space into a topological space.

Proof. Clearly, ∅ ∈ T and X ∈ T. Now consider finitely many open sets $O_1,\dots,O_N \in T$, N ∈ N, and let $O := \bigcap_{j=1}^N O_j$. We have to prove that O is open. Hence, let x ∈ O. Then $x \in O_j$ for each j ∈ {1, . . . , N}. Since each $O_j$ is open, for each j ∈ {1, . . . , N}, there is $\epsilon_j > 0$ such that $B_{\epsilon_j}(x) \subseteq O_j$. If we let $\epsilon := \min\{\epsilon_j : j \in \{1,\dots,N\}\}$, then ε > 0 and $B_\epsilon(x) \subseteq B_{\epsilon_j}(x) \subseteq O_j$ for each j ∈ {1, . . . , N}, i.e. $B_\epsilon(x) \subseteq O$, showing O is open. Now let I be an arbitrary index set. For each j ∈ I, let $O_j \in T$. We have to verify that $O := \bigcup_{j\in I} O_j$ is open. Let x ∈ O. Then there is j ∈ I such that $x \in O_j$. Since $O_j$ is open, there is ε > 0 such that $B_\epsilon(x) \subseteq O_j \subseteq O$, showing O to be open. □

Definition E.6. A topological space (X, T) is called pseudometrizable if, and only if, there exists a pseudometric d on X such that T is induced by d.

Remark E.7. Let (X, T) be a topological space, where T is induced by the pseudometric d on X.

(a) As for metric spaces, one can still characterize the convergence in pseudometric spaces via the convergence of distances: Let $(x_i)_{i\in I}$ be a net in X, and x ∈ X. Since every ball $B_\epsilon(x)$, ε > 0, is a neighborhood of x and, conversely, every U ∈ U(x) contains some ball $B_\epsilon(x) \subseteq U$, ε > 0, we have the equivalence

$$\lim_{i\in I} x_i = x \ \Leftrightarrow\ \forall_{\epsilon\in\mathbb{R}^+}\ \exists_{i\in I}\ \forall_{j\ge i}\ d(x_j, x) < \epsilon.$$

(b) Example 1.33 still works exactly the same for pseudometric spaces: Given x ∈ X and r ∈ R⁺, the open ball $B_r(x)$ is an open set and the closed ball $\overline{B}_r(x)$ is a closed set.


(c) As for metric spaces, one still has that pseudometric spaces are first countable: For each x ∈ X,

$$\mathcal{B}(x) := \big\{B_\epsilon(x) : \epsilon \in \mathbb{Q}^+\big\}$$

constitutes a countable local base at x.

(d) Proposition 1.55(d) still works exactly the same for pseudometric spaces, showing that, for M ⊆ X, the subspace topology $T_M$ on M is pseudometrizable by $d{\restriction}_{M\times M}$.

(e) Lemma 1.57 still works exactly the same for pseudometric spaces, i.e.

$$|d(x,y) - d(x',y')| \le d(x,x') + d(y,y') \quad \text{for each } x, x', y, y' \in X, \tag{E.2}$$

and, if (X, ‖ · ‖) is a seminormed space, then

$$\big|\|x\| - \|y\|\big| \le \|x - y\| \quad \text{for each } x, y \in X. \tag{E.3}$$

In particular, seminorms are still continuous (and even Lipschitz continuous, if one extends this notion to pseudometric spaces).

—

We will now see how to obtain metric spaces from pseudometric spaces (and normed spaces from seminormed spaces) by identifying points with d(x, y) = 0:

Theorem E.8. Let (X, d) be a pseudometric space. Define an equivalence relation on X by letting

$$x \sim y \ :\Leftrightarrow\ d(x,y) = 0. \tag{E.4}$$

Let Y := {[x] : x ∈ X} be the set of the corresponding equivalence classes and define

$$\rho : Y \times Y \to \mathbb{R}_0^+, \quad \rho([x],[y]) := d(x,y). \tag{E.5}$$

Then ρ is a metric on Y; f : X −→ Y, f(x) := [x], is surjective and continuous.

Proof. We start by verifying that ∼ is, indeed, an equivalence relation on X: x ∼ x, since d(x, x) = 0; x ∼ y implies y ∼ x, since d(x, y) = 0 implies d(y, x) = 0; x ∼ y and y ∼ z implies x ∼ z, since d(x, y) = 0 = d(y, z) implies 0 ≤ d(x, z) ≤ d(x, y) + d(y, z) = 0, i.e. d(x, z) = 0. Next, we show ρ to be well-defined: If $[x] = [\tilde{x}]$ and $[y] = [\tilde{y}]$, then $x \sim \tilde{x}$ and $y \sim \tilde{y}$, i.e. $d(x,\tilde{x}) = d(y,\tilde{y}) = 0$. Thus,

$$\big|\rho([x],[y]) - \rho([\tilde{x}],[\tilde{y}])\big| = \big|d(x,y) - d(\tilde{x},\tilde{y})\big| \overset{\text{(E.2)}}{\le} d(x,\tilde{x}) + d(y,\tilde{y}) = 0,$$

showing $\rho([x],[y]) = \rho([\tilde{x}],[\tilde{y}])$ as desired. In the next step, we verify ρ to be a metric: ρ([x], [x]) = d(x, x) = 0. If ρ([x], [y]) = 0, then d(x, y) = 0, i.e. x ∼ y and [x] = [y], showing ρ to be positive definite. As one also has ρ([x], [y]) = d(x, y) = d(y, x) = ρ([y], [x]) and ρ([x], [y]) = d(x, y) ≤ d(x, z) + d(z, y) = ρ([x], [z]) + ρ([z], [y]), ρ is a metric. That f is surjective is immediate. For the continuity, let $(x_k)_{k\in\mathbb{N}}$ be a sequence in X such that $\lim_{k\to\infty} x_k = x \in X$. Then

$$\lim_{k\to\infty} \rho(f(x_k), f(x)) = \lim_{k\to\infty} \rho([x_k],[x]) = \lim_{k\to\infty} d(x_k, x) = 0,$$

proving $\lim_{k\to\infty} f(x_k) = f(x) \in Y$. Since (X, d) is first countable, this shows f to be continuous, completing the proof. □
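The passage from a pseudometric to a metric on equivalence classes can be tried out on a finite sample. The sketch below assumes, purely for illustration, the pseudometric d(p, q) = |p₁ − q₁| on a handful of points of R² (it ignores the second coordinate, so it is not definite); the names `equivalence_classes` and `rho` are hypothetical.

```python
from itertools import product

def d(p, q):
    """A pseudometric on R^2 that only sees the first coordinate."""
    return abs(p[0] - q[0])

def equivalence_classes(points, d):
    """Group points by the relation x ~ y :<=> d(x, y) = 0, cf. (E.4)."""
    classes = []
    for p in points:
        for cls in classes:
            if d(p, cls[0]) == 0:
                cls.append(p)
                break
        else:
            classes.append([p])
    return [tuple(c) for c in classes]

def rho(c1, c2):
    """The metric (E.5), evaluated on arbitrary representatives."""
    return d(c1[0], c2[0])

points = [(0, 0), (0, 1), (0, 5), (1, 0), (1, 2), (3, 3)]
classes = equivalence_classes(points, d)

# rho does not depend on the chosen representatives ...
well_defined = all(d(p, q) == rho(c1, c2)
                   for c1, c2 in product(classes, repeat=2) for p in c1 for q in c2)
# ... and, unlike d, it is positive definite on the classes.
definite = all((rho(c1, c2) == 0) == (c1 == c2)
               for c1, c2 in product(classes, repeat=2))
print(len(classes), well_defined, definite)
```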

The corresponding result for seminorms and norms is analogous:

Theorem E.9. Let (X, N) be a seminormed vector space over K. Then

$$V := N^{-1}\{0\} \tag{E.6}$$

is a subspace (over K) of X. Let Y := X/V be the corresponding factor space and define

$$\|\cdot\| : Y \to \mathbb{R}_0^+, \quad \|V + x\| := N(x). \tag{E.7}$$

Then ‖ · ‖ is a norm on Y; f : X −→ Y, f(x) := V + x, is linear, surjective, and continuous.

Proof. We start by verifying that V is, indeed, a subspace: 0 ∈ V, since N(0) = 0; if x, y ∈ V, then N(x) = N(y) = 0, implying 0 ≤ N(x + y) ≤ N(x) + N(y) = 0, i.e. N(x + y) = 0 and x + y ∈ V; if x ∈ V and λ ∈ K, then N(λx) = |λ|N(x) = 0, i.e. λx ∈ V. Next, we show ‖ · ‖ to be well-defined: If $V + x = V + \tilde{x}$, then $x - \tilde{x} \in V$, i.e. $N(x - \tilde{x}) = 0$. Thus,

$$\big|N(x) - N(\tilde{x})\big| \overset{\text{(E.3)}}{\le} N(x - \tilde{x}) = 0,$$

showing $\|V + x\| = \|V + \tilde{x}\|$ as desired. In the next step, we verify ‖ · ‖ to be a norm: ‖V + 0‖ = N(0) = 0. If ‖V + x‖ = 0, then N(x) = 0, i.e. x ∈ V and V + x = V + 0, showing ‖ · ‖ to be positive definite. As one also has, for each λ ∈ K, ‖V + λx‖ = N(λx) = |λ|N(x) = |λ| ‖V + x‖ and ‖V + x + y‖ = N(x + y) ≤ N(x) + N(y) = ‖V + x‖ + ‖V + y‖, ‖ · ‖ is a norm. That f is linear and surjective is clear, since Y = X/V and f is the corresponding canonical epimorphism. For the continuity, let $(x_k)_{k\in\mathbb{N}}$ be a sequence in X such that $\lim_{k\to\infty} x_k = x \in X$. Then

$$\lim_{k\to\infty} \|f(x_k) - f(x)\| = \lim_{k\to\infty} \|V + x_k - x\| = \lim_{k\to\infty} N(x_k - x) = 0,$$

proving $\lim_{k\to\infty} f(x_k) = f(x) \in Y$. Since (X, N) is first countable, this shows f to be continuous, completing the proof. □

F Initial and Final Topologies, Quotient Spaces

In this section, we will briefly study two very general construction principles that are useful in topology as well as in some other branches of mathematics (one of the construction principles we have actually already used when constructing subspace and product topologies, cf. Ex. F.4(a),(b) below).


Definition F.1. Let X be a set and let $(X_i, T_i)_{i\in I}$ be a family of topological spaces, I ≠ ∅.

(a) Given a family of functions $(f_i)_{i\in I}$, $f_i : X \to X_i$, the initial or weak topology on X with respect to the family $(f_i)_{i\in I}$ is the coarsest topology T on X that makes all $f_i$ continuous (i.e. T is the intersection of all topologies that make all $f_i$ continuous; this intersection is well-defined, since the discrete topology on X always makes all $f_i$ continuous). The name initial topology stems from the $f_i$ being initially in X.

(b) Given a family of functions $(f_i)_{i\in I}$, $f_i : X_i \to X$, the final topology on X with respect to the family $(f_i)_{i\in I}$ is

$$T := \Big\{ O \subseteq X : \forall_{i\in I}\ f_i^{-1}(O) \in T_i \Big\}. \tag{F.1}$$

We will see in Lem. F.2(b) below that T is, indeed, a topology, and that it is the finest topology on X that makes all $f_i$ continuous. The name final topology stems from the $f_i$ being finally in X.¹⁰

Lemma F.2. Let X be a set and let $(X_i, T_i)_{i\in I}$ be a family of topological spaces, I ≠ ∅.

(a) Given a family of functions $(f_i)_{i\in I}$, $f_i : X \to X_i$, the set

$$S := \big\{ f_i^{-1}(O_i) : O_i \in T_i,\ i \in I \big\} \tag{F.2}$$

is a subbase of the initial topology T on X with respect to the family $(f_i)_{i\in I}$.

(b) Given a family of functions $(f_i)_{i\in I}$, $f_i : X_i \to X$, the final topology T on X with respect to the family $(f_i)_{i\in I}$ as defined in (F.1) is, indeed, a topology on X, and it is the finest topology on X that makes all $f_i$ continuous.

Proof. (a): Let τ(S) be the topology on X generated by S, and let T′ be an arbitrary topology on X that makes all $f_i$ continuous. Then, clearly, S ⊆ T′, also implying τ(S) ⊆ T′. Thus, τ(S) ⊆ T. On the other hand, by the definition of S, τ(S) also has the property of making every $f_i$ continuous, proving τ(S) = T.

(b): We verify that T is a topology. Fix i ∈ I. Then $f_i^{-1}(\emptyset) = \emptyset \in T_i$ and $f_i^{-1}(X) = X_i \in T_i$, showing ∅, X ∈ T. If $O_1, O_2 \in T$, then $f_i^{-1}(O_1 \cap O_2) = f_i^{-1}(O_1) \cap f_i^{-1}(O_2) \in T_i$, showing $O_1 \cap O_2 \in T$. If $O_j \in T$, j ∈ J, then $f_i^{-1}\big(\bigcup_{j\in J} O_j\big) = \bigcup_{j\in J} f_i^{-1}(O_j) \in T_i$, showing $\bigcup_{j\in J} O_j \in T$. Thus, T is a topology. It is immediate from (F.1) that every $f_i$, i ∈ I, is continuous with respect to T. To see that T is the finest topology on X with this property, we still need to show that every topology A on X making all $f_i$ continuous is contained in T. To this end, let A be such a topology on X. If O ∈ A, then, for each i ∈ I, $f_i^{-1}(O) \in T_i$, i.e. O ∈ T, showing A ⊆ T. □

¹⁰ In the language of so-called Category Theory, we can say that the category of topological spaces has initial and final objects; in Analysis III, we will see that the category of measurable spaces has that same property.


Proposition F.3. Let X be a set and let $(X_i, T_i)_{i\in I}$ be a family of topological spaces, I ≠ ∅.

(a) Given a family of functions $(f_i)_{i\in I}$, $f_i : X \to X_i$, let T denote the initial topology on X with respect to the family $(f_i)_{i\in I}$. Then T has the property that each map g : Z −→ X from a topological space $(Z, T_Z)$ into X is continuous if, and only if, each map $(f_i \circ g) : Z \to X_i$ is continuous. Moreover, T is the only topology on X with this property.

(b) Given a family of functions $(f_i)_{i\in I}$, $f_i : X_i \to X$, let T denote the final topology on X with respect to the family $(f_i)_{i\in I}$. Then T has the property that each map g : X −→ Z from X into a topological space $(Z, T_Z)$ is continuous if, and only if, each map $(g \circ f_i) : X_i \to Z$ is continuous. Moreover, T is the only topology on X with this property.

Proof. (a): If g is continuous, then each composition $f_i \circ g$, i ∈ I, is also continuous. For the converse, let z ∈ Z and assume each $f_i \circ g$, i ∈ I, to be continuous in z. Let $(z_j)_{j\in J}$ be a net in Z such that $\lim_{j\in J} z_j = z$. Let $O \in T_i$, i ∈ I, such that $g(z) \in f_i^{-1}(O)$, i.e. such that $(f_i \circ g)(z) \in O$. Then the continuity of $f_i \circ g$ in z implies $\lim_{j\in J} (f_i \circ g)(z_j) = (f_i \circ g)(z)$. Thus,

$$\exists_{j_0\in J}\ \forall_{j\ge j_0}\ \Big( (f_i \circ g)(z_j) \in O, \ \text{i.e. } g(z_j) \in f_i^{-1}(O) \Big),$$

implying $\lim_{j\in J} g(z_j) = g(z)$ by Lem. F.2(a) and Cor. 1.50(a). Thus, we obtain the continuity of g in z. Now let A be an arbitrary topology on X with the property stated in the hypothesis. Letting $(Z, T_Z) := (X, A)$ and $g := \mathrm{Id}_X$, we see that each $f_i$ is continuous with respect to A, implying T ⊆ A. Now let T′ be an arbitrary topology on X that makes all $f_i$ continuous. Letting $(Z, T_Z) := (X, T')$, we see that $g := \mathrm{Id}_X$ is T′-A continuous (since each $f_i = f_i \circ \mathrm{Id}_X$ is T′-$T_i$ continuous), i.e., for each O ∈ A, we have $g^{-1}(O) = O \in T'$, showing A ⊆ T′ and A ⊆ T, also completing the proof of A = T.

(b): If g is continuous, then each composition $g \circ f_i$, i ∈ I, is also continuous. For the converse, assume each $g \circ f_i$, i ∈ I, to be continuous. If $O \in T_Z$, then, for each i ∈ I, $f_i^{-1}(g^{-1}(O)) \in T_i$, showing $g^{-1}(O) \in T$ according to (F.1). Thus, g is continuous. Now let A be an arbitrary topology on X with the property stated in the hypothesis. Letting $(Z, T_Z) := (X, A)$ and $g := \mathrm{Id}_X$, we see that each $f_i$ is continuous with respect to A, implying A ⊆ T. Now let T′ be an arbitrary topology on X that makes all $f_i$ continuous. Letting $(Z, T_Z) := (X, T')$, we see that $g := \mathrm{Id}_X$ is A-T′ continuous (since each $f_i = \mathrm{Id}_X \circ f_i$ is $T_i$-T′ continuous), i.e., for each O ∈ T′, we have $g^{-1}(O) = O \in A$, showing T′ ⊆ A and T ⊆ A, also completing the proof of A = T. □

Example F.4. (a) The product topology on $X = \prod_{i\in I} X_i$ (cf. Ex. 1.53) is the initial topology with respect to the projections $(\pi_i)_{i\in I}$, $\pi_i : X \to X_i$ (as is clear from Lem. F.2(a)).

(b) The subspace topology on M ⊆ X, where (X, T) is a topological space (cf. Prop. 1.54), is the initial topology with respect to the inclusion map ι : M −→ X, ι(x) := x: This is also clear from Lem. F.2(a), since

$$T_M = \big\{ O \cap M : O \in T \big\} = \big\{ \iota^{-1}(O) : O \in T \big\}.$$

(c) An important example of a final topology is given by the quotient topology: Let (X, T) be a topological space and let ∼ be an equivalence relation on X. Moreover, let Y := X/∼ = {[x] : x ∈ X} be the corresponding quotient set (i.e. the set of corresponding equivalence classes). Then the quotient topology on Y with respect to ∼, denoted T/∼, is defined as the final topology with respect to the canonical projection

$$\pi : X \to Y, \quad \pi(x) := [x].$$

Thus, by (F.1),

$$T/{\sim} = \big\{ O \subseteq Y : \pi^{-1}(O) \in T \big\}.$$

It is an exercise to show that $S_1(0) \subseteq \mathbb{R}^2$, i.e. the unit sphere in R², endowed with the subspace topology, is homeomorphic to Y := (R ∪ {∞, −∞})/∼, where ∼ identifies ∞ and −∞, and where Y is endowed with the corresponding quotient topology.
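On finite sets, the final topology (F.1) and, in particular, a quotient topology can be computed by brute force over all subsets. The sketch below is only an illustration: the three-point space, the identification of the points 1 and 2, and the helper names `powerset`, `preimage`, `final_topology` are all assumptions made for this example.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def preimage(f, O):
    """Preimage of O under the map f, where f is given as a dict."""
    return frozenset(x for x, fx in f.items() if fx in O)

def final_topology(X, maps):
    """(F.1): all O in P(X) whose preimage under every f_i is open in X_i.

    maps is a list of pairs (f_i, T_i), with f_i a dict X_i -> X and T_i a set of frozensets.
    """
    return {O for O in powerset(X) if all(preimage(f, O) in T for f, T in maps)}

# A topology on {0, 1, 2} and the canonical projection identifying 1 and 2.
T1 = {frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})}
pi = {0: "a", 1: "b", 2: "b"}
Y = {"a", "b"}

quotient = final_topology(Y, [(pi, T1)])
print(sorted(sorted(O) for O in quotient))
```

Here {"b"} fails to be open in the quotient, since its preimage {1, 2} is not open in the original space.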

G Separation: More Counterexamples

Example G.1. (a) R² with a double origin is an example of a topological space that is T2, but not T3 (see [SS95, Sec. 74]).

(b) For examples of spaces that are regular, but not T4, cf. Ex. G.1(d) and Ex. G.1(e) below.

(c) Let X := {0, 1}, T := {∅, {1}, X}. Clearly, (X, T) is a topological space (it is known as the Sierpinski space). The space is T4, since {0} and X are the only nonempty closed sets and these are not disjoint. However, the space is not T1, T2, T3: Due to Lem. 3.2(c) it suffices to see it is not T1. It is not T1, since {1}, {0} are disjoint; {0} is closed, but {1}, {0} can not be separated by open sets.

(d) The following simple example shows that a subspace of a T4 space does not need to be a T4 space: Let X := {0, 1, 2, 3},

$$T := \big\{ \emptyset, \{3\}, \{1,3\}, \{2,3\}, \{1,2,3\}, X \big\}.$$

Clearly, T is a topology on X. The closed subsets of X are precisely ∅, {0, 1, 2}, {0, 2}, {0, 1}, {0}, X. We see that, if A, B are closed disjoint subsets of X, then A = ∅ or B = ∅, showing (X, T) to be T4. However, if we consider M := {1, 2, 3}, then $T_M = T \setminus \{X\}$, and the closed sets in M are precisely ∅, {1, 2}, {2}, {1}, M. Now {2}, {1} are closed subsets of M that can not be separated, showing that $(M, T_M)$ is not T4.

Finding spaces that are normal, but have subspaces that are not T4 is not so easy, but they do exist: For example $[0,1]^{[0,1]}$ with the product topology (see [SS95, Sec. 105]) or the so-called Tychonoff plank (see [SS95, Sec. 86,87]). Each subspace of a normal space that is not T4 provides an example of a space that is regular (as a subspace of a regular space), but not T4.

(e) The following example shows that the product of normal spaces does not need to be T4: If $(S, T_S)$ is R with the Sorgenfrey topology of Ex. 1.52(d), then $(S, T_S)$ is normal (see [SS95, Sec. 51]). If X := S × S with the corresponding product topology $T_X$, then $(X, T_X)$ is called the Sorgenfrey plane. The Sorgenfrey plane is not T4 (see [SS95, Sec. 84]). On the other hand, $(X, T_X)$ must be regular, since it is a product of regular spaces.
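The finite examples in Ex. G.1(c),(d) can be verified by exhaustive search. The following sketch checks the topology axioms and the T4 property (here in the sense used above: disjoint closed sets can be separated by disjoint open sets) for the four-point space and its subspace M; the function names are hypothetical.

```python
from itertools import product

def is_topology(X, T):
    """Check the topology axioms for a finite family T of subsets of X."""
    if frozenset() not in T or X not in T:
        return False
    return all((U | V) in T and (U & V) in T for U, V in product(T, repeat=2))

def is_T4(X, T):
    """True if any two disjoint closed sets have disjoint open neighborhoods."""
    closed = {X - O for O in T}
    for A, B in product(closed, repeat=2):
        if A & B:
            continue  # only disjoint pairs need to be separated
        if not any(A <= U and B <= V and not (U & V) for U, V in product(T, repeat=2)):
            return False
    return True

def subspace_topology(M, T):
    """The subspace topology {O ∩ M : O in T}."""
    return {O & M for O in T}

X = frozenset({0, 1, 2, 3})
T = {frozenset(s) for s in [(), (3,), (1, 3), (2, 3), (1, 2, 3), (0, 1, 2, 3)]}
M = frozenset({1, 2, 3})
T_M = subspace_topology(M, T)

print(is_topology(X, T), is_T4(X, T), is_T4(M, T_M))  # prints: True True False
```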

H Compactness

H.1 Intersections of Compact Sets

We provide an example that shows that in spaces that are not T2, it can happen that the intersection of two compact sets is not compact:

Example H.1. Let a, b ∉ N, a ≠ b. Consider X := N ∪ {a, b}. Define

$$T := \mathcal{P}(\mathbb{N}) \cup \{X\} \cup \big\{\mathbb{N} \cup \{a\}\big\} \cup \big\{\mathbb{N} \cup \{b\}\big\}. \tag{H.1}$$

Clearly, T is a topology on X. Moreover, $C_1 := \mathbb{N} \cup \{a\}$ and $C_2 := \mathbb{N} \cup \{b\}$ are compact: Each open cover of $C_1$ must have X or $C_1$ as a member (they are the only open sets containing a), providing a finite subcover (the analogous argument shows $C_2$ to be compact). However, $C_1 \cap C_2 = \mathbb{N}$ and, as the subspace topology on N is discrete, N is not compact.

H.2 Unit Balls in Normed Vector Spaces

The goal of this section is to prove Th. 3.18, i.e. that the closed unit ball in a normed vector space X is compact if, and only if, X is finite-dimensional. In preparation, we show that finite-dimensional subspaces of normed vector spaces are always closed:

Theorem H.2. Let (X, ‖ · ‖) be a normed vector space over K. If U ⊆ X is a subspace such that dim U = n ∈ N, then U is closed.

Proof. Let $(b_1,\dots,b_n)$ be a basis of U. Then, clearly,

$$A : U \to \mathbb{K}^n, \quad A\Big(\sum_{k=1}^n \alpha_k b_k\Big) := (\alpha_1,\dots,\alpha_n), \tag{H.2}$$

defines a linear isomorphism. We define a norm on $\mathbb{K}^n$ by letting

$$\|\cdot\| : \mathbb{K}^n \to \mathbb{R}_0^+, \quad \|z\| := \big\|A^{-1}(z)\big\|. \tag{H.3}$$

Indeed, (H.3) defines a norm: $\|0\| = \|A^{-1}(0)\| = \|0\| = 0$; if $z \in \mathbb{K}^n$ and $\|z\| = \|A^{-1}(z)\| = 0$, then $A^{-1}(z) = 0$, i.e. z = 0, showing ‖ · ‖ to be positive definite. Moreover,

$$\forall_{z\in\mathbb{K}^n}\ \forall_{\lambda\in\mathbb{K}}\quad \|\lambda z\| = \big\|A^{-1}(\lambda z)\big\| = |\lambda|\,\big\|A^{-1}(z)\big\| = |\lambda|\,\|z\|,$$

showing ‖ · ‖ to be homogeneous of degree 1. Finally,

$$\forall_{z,w\in\mathbb{K}^n}\quad \|z + w\| = \big\|A^{-1}(z + w)\big\| \le \big\|A^{-1}(z)\big\| + \big\|A^{-1}(w)\big\| = \|z\| + \|w\|,$$

showing the triangle inequality to hold for ‖ · ‖.

Let $(u_k)_{k\in\mathbb{N}}$ be a sequence in U such that $\lim_{k\to\infty} u_k = x \in X$. Then $(u_k)_{k\in\mathbb{N}}$ is a Cauchy sequence and, as A is norm-preserving in consequence of (H.3), $(Au_k)_{k\in\mathbb{N}}$ is a Cauchy sequence in $\mathbb{K}^n$. Since $\mathbb{K}^n$ is complete, there is $z \in \mathbb{K}^n$ such that $\lim_{k\to\infty} Au_k = z$ and $\lim_{k\to\infty} u_k = A^{-1}z$, showing $x = A^{-1}z \in U$, i.e. U is closed. □

Proof of Th. 3.18. Let X be finite-dimensional. If $(b_1,\dots,b_n)$ denotes a basis of X, then (H.2) defines a linear isomorphism $A : X \to \mathbb{K}^n$. If we define a norm on $\mathbb{K}^n$ via (H.3), then $A^{-1}$ becomes norm-preserving and, in particular, continuous. Then the closed unit ball $\overline{B}_1(0)$ in X must be compact as the continuous image (under $A^{-1}$) of $\overline{B}_1(0)$ in $\mathbb{K}^n$.

Conversely, let X be infinite-dimensional. To show that $\overline{B}_1(0)$ is not compact, we construct, via recursion, a sequence $(x_k)_{k\in\mathbb{N}}$ in $\overline{B}_1(0)$ (actually in the sphere $S_1(0)$) that does not have a convergent subsequence: Fix n ∈ N and assume $(x_1,\dots,x_n)$ to be already constructed such that

$$\forall_{k\in\{1,\dots,n\}}\quad \|x_k\| = 1, \tag{H.4a}$$

$$\forall_{k,l\in\{1,\dots,n\},\ k\neq l}\quad \|x_k - x_l\| \ge \frac{1}{2}. \tag{H.4b}$$

Let $U := \operatorname{span}\{x_1,\dots,x_n\}$. Since X is infinite-dimensional, we have U ≠ X. Let x ∈ X \ U. Since U is closed by Th. H.2, we have

$$d := \inf\big\{\|x - u\| : u \in U\big\} > 0.$$

Moreover, there exists $u_0 \in U$ such that $\|x - u_0\| \le 2d$. Set

$$x_{n+1} := \frac{x - u_0}{\|x - u_0\|}.$$

Then $\|x_{n+1}\| = 1$ and, for each u ∈ U, one has $\|x - u_0\|\,u + u_0 \in U$, implying

$$\|u - x_{n+1}\| = \frac{\big\| \|x - u_0\|\,u - x + u_0 \big\|}{\|x - u_0\|} \ge \frac{d}{\|x - u_0\|} \ge \frac{1}{2}.$$

Thus, (H.4) holds with n replaced by n + 1, where (H.4b) means that $(x_k)_{k\in\mathbb{N}}$ can not have a convergent subsequence. □


H.3 Proof of Tychonoff’s Theorem

Proof of Th. 3.25. When using net convergence, the proof of the theorem can be carried out rather elegantly. A standard method in the literature is to first show that every net has a so-called universal subnet. Once this is established, Tychonoff's theorem is a simple corollary. The following proof is essentially the one given in [Che92], which avoids the use of universal nets. Let $\nu := (x^j)_{j\in J}$ be a net in X. We have to show that ν has a convergent subnet. By Prop. 1.27, it suffices to show that ν has a cluster point. The idea is to show this by an application of Zorn's lemma. By definition, each x ∈ X is a function defined on I and, for each K ⊆ I, $y \in X_K := \prod_{i\in K} X_i$ is a function on K. Thus, x ∈ X implies $x{\restriction}_K \in X_K$ and, in particular, $\nu{\restriction}_K := (x^j{\restriction}_K)_{j\in J}$ is a net in $X_K$. As in [Che92], for each K ⊆ I, we call $y \in X_K$ a partial cluster point of ν if, and only if, y is a cluster point of $\nu{\restriction}_K$. Let P be the set of all partial cluster points of ν. Then P ≠ ∅, since ∅ ∈ P: The empty function ∅ is a cluster point (in fact the limit) of the constant net $\nu{\restriction}_\emptyset = (\emptyset)_{j\in J}$. Even if you are not fond of the empty set, you need not be concerned, as we will now show

$$\forall_{y\in P\cap X_K}\ \Big( K \neq I \ \Rightarrow\ \exists_{i_0\in I\setminus K}\ \exists_{z\in P\cap X_{K\cup\{i_0\}}}\ z{\restriction}_K = y \Big): \tag{H.5}$$

Let $y \in P \cap X_K$, K ⊊ I. Then y is a cluster point of $\nu{\restriction}_K$. Thus, $\nu{\restriction}_K$ has a subnet $(x^a{\restriction}_K)_{a\in A}$ such that $\lim_{a\in A} x^a{\restriction}_K = y$. Let $i_0 \in I \setminus K$, $L := K \cup \{i_0\}$. Since $X_{i_0}$ is compact, the net $(x^a_{i_0})_{a\in A}$ has a subnet $(x^b_{i_0})_{b\in B}$ that converges to some $z_{i_0} \in X_{i_0}$. Define

$$z \in \prod_{i\in L} X_i, \quad z(i) := \begin{cases} y(i) & \text{for } i \in K,\\ z_{i_0} & \text{for } i = i_0. \end{cases}$$

Then $z{\restriction}_K = y$ and it remains to show z ∈ P, i.e. z is a cluster point of $\nu{\restriction}_L$. Indeed, the subnet $(x^b{\restriction}_L)_{b\in B}$ of $\nu{\restriction}_L$ converges to z: Let O ∈ U(z) and suppose $O = \pi_i^{-1}(O_i)$ with i ∈ L and $O_i \in T_i$. If i ∈ K, then $\lim_{b\in B} x^b{\restriction}_K = y$ implies there is $b_0 \in B$ such that, for each $b \ge b_0$, one has $x^b{\restriction}_L \in O$. If $i = i_0$, then $\lim_{b\in B} x^b_{i_0} = z_{i_0}$ implies there is $b_0 \in B$ such that, for each $b \ge b_0$, one has $x^b{\restriction}_L \in O$. According to Cor. 1.50(a), this shows $\lim_{b\in B} x^b{\restriction}_L = z$ and establishes (H.5). We now define a partial order on P by setting

$$\forall_{y,z\in P}\quad y \le z \ :\Leftrightarrow\ \Big( y \in X_{K_y},\ z \in X_{K_z},\ K_y \subseteq K_z \subseteq I,\ z{\restriction}_{K_y} = y \Big).$$

To apply Zorn's lemma, we have to show that each totally ordered subset of P has an upper bound. Let $Q = \{y \in \prod_{i\in K_y} X_i\}$ be a totally ordered subset of P. Let $K := \bigcup_{y\in Q} K_y$ and define

$$z \in \prod_{i\in K} X_i, \quad z(i) := y(i) \ \text{ for } i \in K_y$$

(note that z is well-defined since Q is totally ordered). To see that z ∈ P, let $K_0 \subseteq K$ be finite and $O := \bigcap_{i\in K_0} \pi_i^{-1}(O_i)$, $O_i \in T_i$. If $K_0 = \{i_1,\dots,i_N\}$, N ∈ N, then there are $y_1,\dots,y_N \in Q$ such that $i_l \in K_{y_l}$ for each l ∈ {1, . . . , N}. If $y := \max\{y_1,\dots,y_N\}$, then y is a cluster point of $\nu{\restriction}_{K_y}$ and $\nu{\restriction}_K$ is frequently in O, showing z is a cluster point of $\nu{\restriction}_K$. Clearly, z is an upper bound for Q. In consequence, Zorn's lemma applies and P must contain a maximal element c. Due to (H.5), c must be defined on all of I, i.e. c is a cluster point of ν as desired. □

I Topological Invariants

The following Prop. I.1 lists topological invariants (i.e. properties preserved under home-omorphisms) that are relevant to this class:

Proposition I.1. Let $(X, T_X)$ and $(Y, T_Y)$ be topological spaces, let f : X −→ Y be a homeomorphism, A ⊆ X, x ∈ X.

(a) A is open if, and only if, f(A) is open; A is closed if, and only if, f(A) is closed; A ∈ U(x) if, and only if, f(A) ∈ U(f(x)).

(b) The net $(x_i)_{i\in I}$ in X converges to x (has x as a cluster point) if, and only if, the net $(f(x_i))_{i\in I}$ in Y converges to f(x) (has f(x) as a cluster point).

(c) $(X, T_X)$ is (pseudo)metrizable if, and only if, $(Y, T_Y)$ is (pseudo)metrizable.

(d) x is an interior point (a boundary point, a point in the closure, a cluster point, an isolated point) of A if, and only if, f(x) is an interior point (a boundary point, a point in the closure, a cluster point, an isolated point) of f(A).

(e) A is dense in X if, and only if, f(A) is dense in Y; $(X, T_X)$ is separable if, and only if, $(Y, T_Y)$ is separable.

(f) $\mathcal{A} \subseteq T_X$ is a local base at x (a base of $T_X$, a subbase of $T_X$) if, and only if, $f(\mathcal{A}) := \{f(A) : A \in \mathcal{A}\}$ is a local base at f(x) (a base of $T_Y$, a subbase of $T_Y$); $(X, T_X)$ is first (second) countable if, and only if, $(Y, T_Y)$ is first (second) countable.

(g) If M ⊆ X, then A ⊆ M is M-open (M-closed) if, and only if, f(A) is f(M)-open (f(M)-closed).

(h) For each n ∈ {1, 2, 3, 4}, $(X, T_X)$ is $T_n$ if, and only if, $(Y, T_Y)$ is $T_n$.

(i) A is compact if, and only if, f(A) is compact.

(j) A is connected (resp. path-connected) if, and only if, f(A) is connected (resp. path-connected). Moreover, A is a connected component (resp. a path-component) of X if, and only if, f(A) is a connected component (resp. a path-component) of Y.

Proof. Since f is a homeomorphism if, and only if, $f^{-1}$ is a homeomorphism, it always suffices to prove one direction of the claimed equivalences.

(a): If A is open, then f(A) is open by Th. 2.7(ii), since $f^{-1}$ is continuous. If A is closed, then f(A) is closed by Th. 2.7(iv), since $f^{-1}$ is continuous. If A ∈ U(x), then there exists $O \in T_X$ such that x ∈ O ⊆ A. Then f(x) ∈ f(O) ⊆ f(A). Since $f(O) \in T_Y$, this shows f(A) ∈ U(f(x)).

(b): Let $(x_i)_{i\in I}$ be a net in X, converging to x. Let U ∈ U(f(x)). Then, by (a), $f^{-1}(U) \in U(x)$ and $(x_i)_{i\in I}$ is eventually in $f^{-1}(U)$. Then $(f(x_i))_{i\in I}$ is eventually in $U = f(f^{-1}(U))$, showing $(f(x_i))_{i\in I}$ to converge to f(x). If x is a cluster point of $(x_i)_{i\in I}$, then there is a subnet $(x_j)_{j\in J}$ of $(x_i)_{i\in I}$ such that $\lim_{j\in J} x_j = x$. Then $(f(x_j))_{j\in J}$ is a subnet of $(f(x_i))_{i\in I}$ such that $\lim_{j\in J} f(x_j) = f(x)$, showing f(x) to be a cluster point of $(f(x_i))_{i\in I}$.

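(c): Let $T_X$ be induced by the (pseudo)metric $d_X$ on X. Then, clearly,

$$d_Y : Y \times Y \to \mathbb{R}_0^+, \quad d_Y(y_1,y_2) := d_X\big(f^{-1}(y_1), f^{-1}(y_2)\big), \tag{I.1}$$

defines a (pseudo)metric on Y. We show that $d_Y$ induces $T_Y$: We have the equivalences

$$O \in T_Y \ \Leftrightarrow\ f^{-1}(O) \in T_X \ \Leftrightarrow\ \forall_{x\in f^{-1}(O)}\ \exists_{\epsilon\in\mathbb{R}^+}\ B_\epsilon(x) \subseteq f^{-1}(O) \ \Leftrightarrow\ \forall_{y\in O}\ \exists_{\epsilon\in\mathbb{R}^+}\ B_\epsilon(y) \subseteq O,$$

establishing the case.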

(d): If x is an interior point of A, then there is $O \in T_X$ such that x ∈ O ⊆ A. Then $f(O) \in T_Y$ and f(x) ∈ f(O) ⊆ f(A), showing f(x) to be an interior point of f(A). If x ∈ ∂A and U ∈ U(f(x)), then $O := f^{-1}(U) \in U(x)$, O ∩ A ≠ ∅ and $O \cap A^c \neq \emptyset$. Thus, f(O) ∩ f(A) = U ∩ f(A) ≠ ∅ and $f(O) \cap f(A^c) = U \cap (f(A))^c \neq \emptyset$, showing f(x) ∈ ∂f(A). If $x \in \overline{A} = A \cup \partial A$, then $f(x) \in f(A) \cup f(\partial A) = f(A) \cup \partial(f(A)) = \overline{f(A)}$. If x is a cluster point of A, then there is a net $(a_i)_{i\in I}$ in A \ {x} such that $\lim_{i\in I} a_i = x$. Then $(f(a_i))_{i\in I}$ is a net in f(A) \ {f(x)} such that $\lim_{i\in I} f(a_i) = f(x)$ (by (b)), showing f(x) to be a cluster point of f(A). Finally, the set P of isolated points of A is P = A \ H(A) (where H(A) is the set of cluster points). Then f(P) = f(A) \ f(H(A)) = f(A) \ H(f(A)), i.e. f(P) is the set of isolated points of f(A).

(e): According to (d), $\overline{A} = X$ if, and only if, $\overline{f(A)} = Y$. The claim regarding separability then also follows, as $A$ is countable if, and only if, $f(A)$ is countable.

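(f): If $\mathcal{A}$ is a local base at $x$ and $U \in \mathcal{U}(f(x))$, then $f^{-1}(U) \in \mathcal{U}(x)$. Thus, there is $B \in \mathcal{A}$ such that $x \in B \subseteq f^{-1}(U)$. Then $f(x) \in f(B) \subseteq U$, showing that $f(\mathcal{A})$ is a local base at $f(x)$. Now let $\mathcal{A}$ be a base of $\mathcal{T}_X$ and let $O \in \mathcal{T}_Y$. Then $f^{-1}(O) \in \mathcal{T}_X$, i.e. $f^{-1}(O) = \bigcup_{i\in I} B_i$ with suitable $I$ and $B_i \in \mathcal{A}$. Then $O = \bigcup_{i\in I} f(B_i)$, proving $f(\mathcal{A})$ to be a base of $\mathcal{T}_Y$. Since $f$ is bijective, $\mathcal{A}$ is countable if, and only if, $f(\mathcal{A})$ is countable. Finally, let $\mathcal{A}$ be a subbase of $\mathcal{T}_X$. Then $\beta(\mathcal{A})$, the set of finite intersections of sets in $\mathcal{A}$, is a base of $\mathcal{T}_X$. Then $f(\beta(\mathcal{A}))$ is a base of $\mathcal{T}_Y$. If $B \in \beta(\mathcal{A})$, then

$$B = \bigcap_{i=1}^{n} A_i, \quad \forall_{i\in\{1,\dots,n\}}\ A_i \in \mathcal{A}, \quad n \in \mathbb{N},$$

and, as $f$ is injective,

$$f(B) = \bigcap_{i=1}^{n} f(A_i),$$

showing $f(\mathcal{A})$ to be a subbase of $\mathcal{T}_Y$.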

(g): If $A \subseteq M \subseteq X$ and $A$ is $M$-open ($M$-closed), then there is $B \subseteq X$ such that $B$ is $X$-open ($X$-closed) and $A = M \cap B$. Then $f(A) = f(M) \cap f(B)$, and, since $f(B)$ is $Y$-open ($Y$-closed), this shows $f(A)$ to be $f(M)$-open ($f(M)$-closed).

(h): Suppose $(X, \mathcal{T}_X)$ is $T_1$ (resp. $T_2$) and let $y_1, y_2 \in Y$ such that $y_1 \neq y_2$. Then $x_1 := f^{-1}(y_1) \neq x_2 := f^{-1}(y_2)$ and there are open $O_1 \in \mathcal{U}(x_1)$ and open $O_2 \in \mathcal{U}(x_2)$ such that $x_2 \notin O_1$ and $x_1 \notin O_2$ (resp. $O_1 \cap O_2 = \emptyset$). Then $U_1 := f(O_1)$ is open, $U_2 := f(O_2)$ is open, $y_1 \in U_1$, $y_2 \in U_2$, and $y_2 \notin U_1$ as well as $y_1 \notin U_2$ (resp. $U_1 \cap U_2 = \emptyset$), showing $(Y, \mathcal{T}_Y)$ to be $T_1$ (resp. $T_2$). Now suppose $(X, \mathcal{T}_X)$ is $T_3$, let $y \in Y$, and let $B \subseteq Y$ be closed such that $y \notin B$. Then $x := f^{-1}(y) \notin A := f^{-1}(B)$, $A$ is closed, and there are open $O_1 \in \mathcal{U}(x)$ and open $O_2 \subseteq X$ such that $A \subseteq O_2$ and $O_1 \cap O_2 = \emptyset$. Then $U_1 := f(O_1)$ is open, $U_2 := f(O_2)$ is open, $y \in U_1$, $B \subseteq U_2$, and $U_1 \cap U_2 = \emptyset$, showing $(Y, \mathcal{T}_Y)$ to be $T_3$. Finally, suppose $(X, \mathcal{T}_X)$ is $T_4$ and let $B_1, B_2 \subseteq Y$ be closed such that $B_1 \cap B_2 = \emptyset$. Then $A_1 := f^{-1}(B_1)$ and $A_2 := f^{-1}(B_2)$ are closed with $A_1 \cap A_2 = \emptyset$, and there are open $O_1, O_2 \subseteq X$ such that $A_1 \subseteq O_1$, $A_2 \subseteq O_2$ and $O_1 \cap O_2 = \emptyset$. Then $U_1 := f(O_1)$ is open, $U_2 := f(O_2)$ is open, $B_1 \subseteq U_1$, $B_2 \subseteq U_2$, and $U_1 \cap U_2 = \emptyset$, showing $(Y, \mathcal{T}_Y)$ to be $T_4$.

(i): Suppose $A$ is compact and let $(O_i)_{i\in I}$ be an open cover of $f(A)$. Letting $U_i := f^{-1}(O_i)$ for each $i \in I$, we see that $(U_i)_{i\in I}$ is an open cover of $A$. Since $A$ is compact, there exists a finite $J \subseteq I$ such that $(U_i)_{i\in J}$ is still a cover of $A$. But then $(O_i)_{i\in J}$ is a finite subcover of $(O_i)_{i\in I}$ that still covers $f(A)$, proving $f(A)$ to be compact.

(j): For the first part, without loss of generality, we may assume $A = X$. Assume $Y$ is not connected and let $O_1, O_2 \in \mathcal{T}_Y$ such that $O_1 \cap O_2 = \emptyset$, $Y = O_1 \cup O_2$, $O_1, O_2 \neq \emptyset$. If $U_1 := f^{-1}(O_1)$, $U_2 := f^{-1}(O_2)$, then $U_1, U_2 \in \mathcal{T}_X$, $U_1 \cap U_2 = \emptyset$, $X = U_1 \cup U_2$, $U_1, U_2 \neq \emptyset$, showing $X$ is not connected. If $x, y \in A$ and $\phi : [0, 1] \longrightarrow A$ is a path in $A$ connecting $x$ and $y$, then $f \circ \phi$ is a path in $f(A)$ connecting $f(x)$ and $f(y)$. Thus, if $A$ is path-connected, so is $f(A)$. If $A$ is a connected component (resp. a path-component) of $X$ and $x \in A$, then $A$ is the union of all connected (resp. path-connected) sets containing $x$. Then $f(A)$ is the union of all connected (resp. path-connected) sets containing $f(x)$, i.e. $f(A)$ is a connected component (resp. a path-component) of $Y$. ∎
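The transported metric (I.1) from part (c) of the preceding proof can be explored numerically. The following Python sketch is an illustration only: the choice of $X = \mathbb{R}$ with the usual metric, $Y = ]-1, 1[$, and the homeomorphism $f(x) = x/(1+|x|)$ are assumptions made for this example and do not occur in the notes.

```python
import itertools


def f(x: float) -> float:
    """Assumed example homeomorphism from R onto the open interval ]-1, 1[."""
    return x / (1.0 + abs(x))


def f_inv(y: float) -> float:
    """Inverse of f, mapping ]-1, 1[ back onto R."""
    return y / (1.0 - abs(y))


def d_X(x1: float, x2: float) -> float:
    """The usual metric on X = R."""
    return abs(x1 - x2)


def d_Y(y1: float, y2: float) -> float:
    """Metric on Y transported via f, following (I.1)."""
    return d_X(f_inv(y1), f_inv(y2))


# Sample points of Y on which the metric axioms are checked numerically.
points = [-0.9, -0.5, 0.0, 0.25, 0.75]

symmetric = all(
    d_Y(a, b) == d_Y(b, a) for a, b in itertools.product(points, repeat=2)
)
triangle = all(
    d_Y(a, c) <= d_Y(a, b) + d_Y(b, c) + 1e-12
    for a, b, c in itertools.product(points, repeat=3)
)
# By construction, f is an isometry from (X, d_X) onto (Y, d_Y).
isometric = all(
    abs(d_Y(f(a), f(b)) - d_X(a, b)) < 1e-9
    for a, b in itertools.product([-3.0, -1.0, 0.0, 2.0, 5.0], repeat=2)
)
```

Note that $d_Y$ differs from the usual metric on $]-1, 1[$ (for instance, it is unbounded), but, by (c), both induce the same topology on $Y$.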

J Multilinear Maps

We are mostly interested in vector spaces over the fields $F = \mathbb{R}$ and $F = \mathbb{C}$. However, the following considerations hold for an arbitrary field $F$.

Definition J.1. Let $X$ and $Y$ be vector spaces over the field $F$, $\alpha \in \mathbb{N}$. We call a map

$$L : X^\alpha \longrightarrow Y \tag{J.1}$$

multilinear (more precisely, $\alpha$ times linear) if, and only if, it is linear in each component, i.e., for each $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_\alpha, v, w \in X$, $i \in \{1, \dots, \alpha\}$ and each $\lambda, \mu \in F$:

$$\begin{aligned}
&L(x_1, \dots, x_{i-1}, \lambda v + \mu w, x_{i+1}, \dots, x_\alpha) \\
&\quad = \lambda L(x_1, \dots, x_{i-1}, v, x_{i+1}, \dots, x_\alpha) + \mu L(x_1, \dots, x_{i-1}, w, x_{i+1}, \dots, x_\alpha).
\end{aligned} \tag{J.2}$$

We denote the set of all $\alpha$ times linear maps from $X^\alpha$ into $Y$ by $\mathcal{L}^\alpha(X, Y)$. We also set $\mathcal{L}^0(X, Y) := Y$.

Remark J.2. In the situation of Def. J.1, each $\mathcal{L}^\alpha(X, Y)$, $\alpha \in \mathbb{N}_0$, constitutes a vector space over $F$: It is a subspace of the vector space over $F$ of all functions from $X^\alpha$ into $Y$, since, clearly, if $K, L : X^\alpha \longrightarrow Y$ are both $\alpha$ times linear and $\lambda, \mu \in F$, then $\lambda K + \mu L$ is also $\alpha$ times linear.

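Theorem J.3. Let $X$ and $Y$ be vector spaces over the field $F$, $\alpha \in \mathbb{N}$. Then, as vector spaces over $F$, $\mathcal{L}(X, \mathcal{L}^{\alpha-1}(X, Y))$ and $\mathcal{L}^\alpha(X, Y)$ are isomorphic via the isomorphism

$$\Phi : \mathcal{L}(X, \mathcal{L}^{\alpha-1}(X, Y)) \longrightarrow \mathcal{L}^\alpha(X, Y), \quad \Phi(L)(x_1, \dots, x_\alpha) := L(x_1)(x_2, \dots, x_\alpha). \tag{J.3}$$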

Proof. Since $L$ is linear and $L(x_1)$ is $(\alpha - 1)$ times linear, $\Phi(L)$ is, indeed, an element of $\mathcal{L}^\alpha(X, Y)$, showing that $\Phi$ is well-defined by (J.3). Next, we verify $\Phi$ to be linear: If $\lambda \in F$ and $K, L \in \mathcal{L}(X, \mathcal{L}^{\alpha-1}(X, Y))$, then

$$\Phi(\lambda L)(x_1, \dots, x_\alpha) = (\lambda L)(x_1)(x_2, \dots, x_\alpha) = \lambda\bigl(L(x_1)(x_2, \dots, x_\alpha)\bigr) = \lambda\,\Phi(L)(x_1, \dots, x_\alpha)$$

and

$$\begin{aligned}
\Phi(K + L)(x_1, \dots, x_\alpha) &= (K + L)(x_1)(x_2, \dots, x_\alpha) = \bigl(K(x_1) + L(x_1)\bigr)(x_2, \dots, x_\alpha) \\
&= K(x_1)(x_2, \dots, x_\alpha) + L(x_1)(x_2, \dots, x_\alpha) \\
&= \Phi(K)(x_1, \dots, x_\alpha) + \Phi(L)(x_1, \dots, x_\alpha) \\
&= \bigl(\Phi(K) + \Phi(L)\bigr)(x_1, \dots, x_\alpha),
\end{aligned}$$

proving $\Phi$ to be linear. Now we show $\Phi$ to be injective. To this end, we show that, if $L \neq 0$, then $\Phi(L) \neq 0$. If $L \neq 0$, then there exist $x_1, \dots, x_\alpha \in X$ such that $L(x_1)(x_2, \dots, x_\alpha) \neq 0$, showing that $\Phi(L) \neq 0$ as needed. To verify $\Phi$ is also surjective, let $K \in \mathcal{L}^\alpha(X, Y)$. Define $L : X \longrightarrow \mathcal{L}^{\alpha-1}(X, Y)$ by letting

$$L(x_1)(x_2, \dots, x_\alpha) := K(x_1, \dots, x_\alpha). \tag{J.4}$$

Then, clearly, for each $x_1 \in X$, $L(x_1) \in \mathcal{L}^{\alpha-1}(X, Y)$. Moreover, $L$ is linear, i.e. $L \in \mathcal{L}(X, \mathcal{L}^{\alpha-1}(X, Y))$. Comparing (J.4) with (J.3) shows $\Phi(L) = K$, i.e. $\Phi$ is surjective, completing the proof. ∎
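In programming terms, the isomorphism $\Phi$ of (J.3) is uncurrying, and its inverse, given by (J.4), is currying in the first argument. The following Python sketch illustrates this under assumptions that are not part of the notes: vectors are represented as tuples of numbers, and the function names `phi` and `phi_inverse` as well as the bilinear example map `K` (the determinant on $\mathbb{R}^2$) are hypothetical choices made for the illustration.

```python
def phi(L):
    """Uncurry: Phi(L)(x1, ..., x_alpha) := L(x1)(x2, ..., x_alpha), cf. (J.3)."""
    def uncurried(x1, *rest):
        return L(x1)(*rest)
    return uncurried


def phi_inverse(K):
    """Curry in the first argument: L(x1)(x2, ...) := K(x1, x2, ...), cf. (J.4)."""
    def curried(x1):
        def partial(*rest):
            return K(x1, *rest)
        return partial
    return curried


def K(x, y):
    """Assumed example: the determinant, a bilinear map on R^2 (alpha = 2)."""
    return x[0] * y[1] - x[1] * y[0]


# L is a map from X into the (alpha - 1) times linear maps.
L = phi_inverse(K)

x, y = (1, 2), (3, 4)
value_via_K = K(x, y)            # direct evaluation of the bilinear map
value_via_phi = phi(L)(x, y)     # evaluation after currying and uncurrying
```

Both `value_via_K` and `value_via_phi` evaluate to the same number, reflecting $\Phi(L) = K$ from the surjectivity part of the proof.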

Remark J.4. For simplicity, we will now restrict ourselves to finite-dimensional $X$. Suppose $\dim X = n$, $n \in \mathbb{N}$. Moreover, let $\{b_1, \dots, b_n\}$ be a basis of $X$ over $F$. If $x_1, \dots, x_\alpha \in X$, then there are $x^j_i \in F$, $j \in \{1, \dots, \alpha\}$, $i \in \{1, \dots, n\}$, such that $x_j = \sum_{i=1}^{n} x^j_i\, b_i$. Thus, if $L \in \mathcal{L}^\alpha(X, Y)$, then

$$L(x_1, \dots, x_\alpha) = \sum_{i_1, \dots, i_\alpha = 1}^{n} x^1_{i_1} \cdots x^\alpha_{i_\alpha}\, L(b_{i_1}, \dots, b_{i_\alpha}), \tag{J.5}$$

showing $L$ is uniquely determined by its values $L(b_{i_1}, \dots, b_{i_\alpha})$, $(i_1, \dots, i_\alpha) \in \{1, \dots, n\}^\alpha$. Conversely, if, for each $(i_1, \dots, i_\alpha) \in \{1, \dots, n\}^\alpha$, one is given a vector $y_{i_1, \dots, i_\alpha} \in Y$, then

$$L(x_1, \dots, x_\alpha) = \sum_{i_1, \dots, i_\alpha = 1}^{n} x^1_{i_1} \cdots x^\alpha_{i_\alpha}\, y_{i_1, \dots, i_\alpha}, \tag{J.6}$$

clearly, defines an element $L \in \mathcal{L}^\alpha(X, Y)$.
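Formula (J.6) translates directly into a procedure that builds a multilinear map from prescribed values on basis tuples. The following Python sketch is a minimal illustration under simplifying assumptions not made in the notes: $F = Y = \mathbb{R}$, vectors are given by their coordinate tuples with respect to $b_1, \dots, b_n$, indices are 0-based, and the function name `multilinear_from_basis_values` is hypothetical.

```python
from itertools import product


def multilinear_from_basis_values(values, n, alpha):
    """Return the alpha times linear map L given by (J.6).

    values maps index tuples (i_1, ..., i_alpha) with entries in 0..n-1
    to the prescribed real numbers y_{i_1,...,i_alpha}; missing index
    tuples are treated as 0.
    """
    def L(*xs):
        if len(xs) != alpha:
            raise ValueError("expected exactly alpha arguments")
        total = 0.0
        for idx in product(range(n), repeat=alpha):
            # Product of coordinates x^1_{i_1} * ... * x^alpha_{i_alpha}.
            coeff = 1.0
            for j, i in enumerate(idx):
                coeff *= xs[j][i]
            total += coeff * values.get(idx, 0.0)
        return total
    return L


# Assumed example: the values y_{01} = 1, y_{10} = -1 yield the determinant on R^2.
det = multilinear_from_basis_values({(0, 1): 1.0, (1, 0): -1.0}, n=2, alpha=2)
det_value = det((1, 2), (3, 4))

# Linearity in the first component, cf. (J.2), with lambda = 2 and mu = 3:
lhs = det((2 * 1 + 3 * 0, 2 * 2 + 3 * 5), (3, 4))
rhs = 2 * det((1, 2), (3, 4)) + 3 * det((0, 5), (3, 4))
```

The sum in `L` runs over all $n^\alpha$ index tuples, so the cost of a single evaluation grows exponentially in $\alpha$; the sketch is meant to mirror (J.6), not to be efficient.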

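Theorem J.5. Let $X$ and $Y$ be vector spaces over the field $F$, $\alpha \in \mathbb{N}$. Moreover, let $\dim X = n$, $n \in \mathbb{N}$, let $\{b_1, \dots, b_n\}$ be a basis of $X$ over $F$, and let $B$ be a basis of $Y$ over $F$. For each $(i_1, \dots, i_\alpha) \in I := \{1, \dots, n\}^\alpha$ and each $b \in B$, define

$$L_{i_1, \dots, i_\alpha, b}(b_{j_1}, \dots, b_{j_\alpha}) := \begin{cases} b & \text{for } (j_1, \dots, j_\alpha) = (i_1, \dots, i_\alpha), \\ 0 & \text{otherwise}. \end{cases} \tag{J.7}$$

According to Rem. J.4, (J.7) uniquely defines an element of $\mathcal{L}^\alpha(X, Y)$. Then $\mathcal{B} := \{L_{i_1, \dots, i_\alpha, b} : (i_1, \dots, i_\alpha) \in I,\ b \in B\}$ constitutes a basis of $\mathcal{L}^\alpha(X, Y)$ over $F$ and, in particular, if $\dim Y = m$, then $\dim \mathcal{L}^\alpha(X, Y) = n^\alpha \cdot m$.

Proof. We verify that the elements of $\mathcal{B}$ are linearly independent: Let $M, N \in \mathbb{N}$. Let $(i^1_1, \dots, i^1_\alpha), \dots, (i^N_1, \dots, i^N_\alpha) \in I$ be distinct and let $b^1, \dots, b^M \in B$ be distinct as well. Assume $\lambda_{lk} \in F$ to be such that

$$L := \sum_{l=1}^{M} \sum_{k=1}^{N} \lambda_{lk}\, L_{i^k_1, \dots, i^k_\alpha, b^l} = 0.$$

Let $k \in \{1, \dots, N\}$. Then

$$0 = L(b_{i^k_1}, \dots, b_{i^k_\alpha}) = \sum_{l=1}^{M} \sum_{\kappa=1}^{N} \lambda_{l\kappa}\, L_{i^\kappa_1, \dots, i^\kappa_\alpha, b^l}(b_{i^k_1}, \dots, b_{i^k_\alpha}) = \sum_{l=1}^{M} \lambda_{lk}\, b^l$$

implies $\lambda_{1k} = \dots = \lambda_{Mk} = 0$ due to the linear independence of the $b^l$. As this holds for each $k \in \{1, \dots, N\}$, we have established the linear independence of $\mathcal{B}$. It remains to verify that $\mathcal{B}$ spans $\mathcal{L}^\alpha(X, Y)$. According to Rem. J.4, if $L \in \mathcal{L}^\alpha(X, Y)$, then $L$ has the form (J.6), where $y_{i_1, \dots, i_\alpha} = L(b_{i_1}, \dots, b_{i_\alpha}) \in Y$ for each $(i_1, \dots, i_\alpha) \in I$. Thus, if, for each $(i_1, \dots, i_\alpha) \in I$,

$$L_{i_1, \dots, i_\alpha}(b_{j_1}, \dots, b_{j_\alpha}) := \begin{cases} y_{i_1, \dots, i_\alpha} & \text{for } (j_1, \dots, j_\alpha) = (i_1, \dots, i_\alpha), \\ 0 & \text{otherwise}, \end{cases}$$

then

$$L = \sum_{(i_1, \dots, i_\alpha) \in I} L_{i_1, \dots, i_\alpha}.$$

It merely remains to write each $L_{i_1, \dots, i_\alpha}$ as a linear combination of elements of $\mathcal{B}$. To this end, write

$$y_{i_1, \dots, i_\alpha} = \sum_{k=1}^{N} \lambda_k b_k,$$

where $N \in \mathbb{N}$, $\lambda_k \in F$, $b_k \in B$. Then

$$L_{i_1, \dots, i_\alpha} = \sum_{k=1}^{N} \lambda_k\, L_{i_1, \dots, i_\alpha, b_k}:$$

Let $K := \sum_{k=1}^{N} \lambda_k L_{i_1, \dots, i_\alpha, b_k}$. Then, for each $(j_1, \dots, j_\alpha) \in I$,

$$K(b_{j_1}, \dots, b_{j_\alpha}) = \begin{cases} y_{i_1, \dots, i_\alpha} & \text{for } (j_1, \dots, j_\alpha) = (i_1, \dots, i_\alpha), \\ 0 & \text{otherwise}, \end{cases}$$

i.e. $K$ and $L_{i_1, \dots, i_\alpha}$ agree on all tuples of basis vectors and, thus, $K = L_{i_1, \dots, i_\alpha}$ by Rem. J.4, thereby completing the proof. ∎

In Th. J.5, if $X$ is infinite-dimensional, then the set corresponding to $\mathcal{B}$ is still linearly independent, but, if $Y \neq \{0\}$, then $\mathcal{B}$ no longer generates $\mathcal{L}^\alpha(X, Y)$ and, in particular, it is no longer a basis of $\mathcal{L}^\alpha(X, Y)$.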
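The basis of Th. J.5 and the dimension formula $n^\alpha \cdot m$ can be made concrete for small parameters. In the following Python sketch, a multilinear map is represented solely by its table of values on tuples of basis vectors (which suffices by Rem. J.4). The concrete choices $n = 2$, $\alpha = 2$, $Y = \mathbb{R}^3$ with the standard basis, 0-based indices, and all function names are assumptions made for the illustration.

```python
from itertools import product

n, alpha, m = 2, 2, 3
index_set = list(product(range(n), repeat=alpha))  # plays the role of I
zero = tuple(0.0 for _ in range(m))


def unit(k):
    """k-th standard basis vector of Y = R^m."""
    return tuple(1.0 if r == k else 0.0 for r in range(m))


def basis_map(idx, k):
    """Value table of the basis element L_{idx, b_k}, cf. (J.7)."""
    return {j: (unit(k) if j == idx else zero) for j in index_set}


# One basis element per pair (index tuple, basis vector of Y).
basis = {(idx, k): basis_map(idx, k) for idx in index_set for k in range(m)}


def combine(coeffs):
    """Value table of the linear combination sum coeffs[key] * basis[key]."""
    table = {j: [0.0] * m for j in index_set}
    for key, lam in coeffs.items():
        for j in index_set:
            for r in range(m):
                table[j][r] += lam * basis[key][j][r]
    return {j: tuple(v) for j, v in table.items()}


# An arbitrary target map, given by its values on basis tuples.
target = {idx: tuple(float(sum(idx) + r) for r in range(m)) for idx in index_set}

# Its coordinates with respect to the basis are just the components of its values.
coeffs = {(idx, k): target[idx][k] for idx in index_set for k in range(m)}
reconstructed = combine(coeffs)
```

Here `len(basis)` equals $n^\alpha \cdot m$, and `reconstructed` coincides with `target`, mirroring the spanning argument of the proof.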

K Differential Calculus

K.1 Bounded Derivatives Imply Lipschitz Continuity

It is sometimes useful if the bound on the derivatives is the same as the resulting Lipschitz constant (which, for $m > 1$, is not the case in Th. 4.38). The following Th. K.2 provides a variant, where the constants are the same, formulated for functions $f : I \longrightarrow \mathbb{R}^n$, defined on open intervals $I \subseteq \mathbb{R}$, and making use of the Euclidean norm $\|\cdot\|_2$ on $\mathbb{R}^n$. We will start with some auxiliary results regarding the Euclidean norm and the Euclidean inner product:

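Proposition K.1. Let $I \subseteq \mathbb{R}$ be an open interval, and let $g, h : I \longrightarrow \mathbb{R}^n$ be differentiable, $n \in \mathbb{N}$.

(a) The function

$$f : I \longrightarrow \mathbb{R}, \quad f(x) := g(x) \bullet h(x) = \sum_{j=1}^{n} g_j(x) h_j(x), \tag{K.1}$$

is differentiable and

$$f' : I \longrightarrow \mathbb{R}, \quad f'(x) = g'(x) \bullet h(x) + g(x) \bullet h'(x). \tag{K.2}$$

(b) The function

$$\alpha : I \longrightarrow \mathbb{R}, \quad \alpha(x) := \|g(x)\|_2 = \sqrt{g(x) \bullet g(x)}, \tag{K.3}$$

is differentiable at each $x \in I$ such that $g(x) \neq 0$. Moreover,

$$\forall_{x\in I,\ g(x)\neq 0} \quad \alpha'(x) = \frac{g(x) \bullet g'(x)}{\alpha(x)} = \frac{g(x) \bullet g'(x)}{\|g(x)\|_2}. \tag{K.4}$$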

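Proof. (a) is immediate from the product rule, applied to each summand $g_j h_j$.

(b) is an easy consequence of (a) and the chain rule: Applying (a) with $h := g$ shows $g \bullet g$ to be differentiable with derivative $2\, g \bullet g'$. As the square root is differentiable on $\mathbb{R}^+$, $\alpha$ is differentiable at each $x \in I$ such that $g(x) \neq 0$, and

$$\forall_{x\in I,\ g(x)\neq 0} \quad \alpha'(x) = \frac{2\, g(x) \bullet g'(x)}{2\sqrt{g(x) \bullet g(x)}} = \frac{g(x) \bullet g'(x)}{\alpha(x)}, \tag{K.5}$$

completing the proof. ∎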
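Formula (K.4) can be checked numerically by comparing it with a central difference quotient of $\alpha$. The following Python sketch does this for one sample curve; the curve $g(x) = (\cos x, x^2, 1 + x)$, the step size, and the helper names are assumptions chosen for the illustration.

```python
import math


def g(x):
    """Assumed sample curve g: R -> R^3."""
    return (math.cos(x), x * x, 1.0 + x)


def g_prime(x):
    """Componentwise derivative of g."""
    return (-math.sin(x), 2.0 * x, 1.0)


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def alpha(x):
    """alpha(x) = ||g(x)||_2, cf. (K.3)."""
    return math.sqrt(dot(g(x), g(x)))


def alpha_prime(x):
    """Derivative of alpha according to (K.4); requires g(x) != 0."""
    return dot(g(x), g_prime(x)) / alpha(x)


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


sample_points = (-0.5, 0.3, 2.0)
errors = [abs(alpha_prime(x) - central_difference(alpha, x)) for x in sample_points]
```

For the chosen sample points, the entries of `errors` are expected to be small (limited by the discretization and rounding errors of the difference quotient), in agreement with (K.4).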

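Theorem K.2. Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : ]a, b[ \longrightarrow \mathbb{R}^n$ be differentiable with uniformly bounded derivative, i.e. with

$$\exists_{M\in\mathbb{R}_0^+}\ \forall_{x\in]a,b[} \quad \|f'(x)\|_2 = \sqrt{\sum_{j=1}^{n} |f_j'(x)|^2} \le M. \tag{K.6}$$

Then $f$ is $M$-Lipschitz, i.e.

$$\forall_{x_1, x_2\in]a,b[} \quad \|f(x_1) - f(x_2)\|_2 \le M |x_1 - x_2|. \tag{K.7}$$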

Proof. For $x_1 = x_2$, there is nothing to prove. Thus, assume $x_1 \neq x_2$ and define the auxiliary function

$$g : [0, 1] \longrightarrow \mathbb{R}^n, \quad g(t) := f\bigl(x_1 + t(x_2 - x_1)\bigr) - f(x_1). \tag{K.8}$$

According to the chain rule of Th. 4.31, $g$ is differentiable on $]0, 1[$ and

$$\forall_{t\in]0,1[} \quad g'(t) = (x_2 - x_1)\, f'\bigl(x_1 + t(x_2 - x_1)\bigr), \tag{K.9}$$

implying

$$\forall_{t\in]0,1[} \quad \|g'(t)\|_2 \le M |x_1 - x_2|. \tag{K.10}$$

We now introduce another auxiliary function, namely

$$\alpha : [0, 1] \longrightarrow \mathbb{R}, \quad \alpha(t) := \|g(t)\|_2. \tag{K.11}$$


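Then $\alpha$ is continuous (as $f$ and the norm are both continuous), satisfying $\alpha(0) = \|g(0)\|_2 = 0$ and $\alpha(1) = \|g(1)\|_2 = \|f(x_2) - f(x_1)\|_2$. If $\alpha(1) = 0$, then (K.7) is trivially true, and, thus, we proceed to assume $\alpha(1) > 0$. Then the continuity of $\alpha$ implies

$$s := \sup\bigl\{t \in [0, 1] : \alpha(t) = 0\bigr\} < 1, \quad \alpha(s) = 0. \tag{K.12}$$

In consequence, $\alpha$ is positive on $]s, 1[$ and, thus, differentiable on $]s, 1[$ by Prop. K.1(b). The mean value theorem [Phi16, Th. 9.18] implies the existence of $\sigma \in ]s, 1[$ such that

$$\begin{aligned}
\alpha(1) = \alpha(1) - \alpha(s) &= (1 - s)\,\alpha'(\sigma) \overset{\text{(K.4)}}{=} (1 - s)\, \frac{g(\sigma) \bullet g'(\sigma)}{\alpha(\sigma)} \\
&\overset{\text{(1.41)}}{\le} (1 - s)\, \frac{\|g(\sigma)\|_2\, \|g'(\sigma)\|_2}{\|g(\sigma)\|_2} \overset{\text{(K.10)}}{\le} (1 - s)\, M |x_1 - x_2| \\
&\le M |x_1 - x_2|,
\end{aligned} \tag{K.13}$$

which, due to $\alpha(1) = \|f(x_2) - f(x_1)\|_2$, proves (K.7). ∎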
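As a plausibility check of (K.7), one can compare difference quotients of a concrete curve with the bound on its derivative. The following Python sketch uses the unit circle parametrization $f(x) = (\cos x, \sin x)$ on $]0, 10[$, for which $\|f'(x)\|_2 = 1$ for each $x$, so that $M = 1$ is admissible in (K.6). The curve, the interval, and the sampling grid are assumptions made for the illustration; a finite sample can, of course, only illustrate the theorem, not prove it.

```python
import math


def f(x):
    """Assumed sample curve: unit circle parametrization in R^2."""
    return (math.cos(x), math.sin(x))


def f_prime(x):
    """Derivative of f; its Euclidean norm equals 1 for each x."""
    return (-math.sin(x), math.cos(x))


def norm2(v):
    """Euclidean norm."""
    return math.sqrt(sum(c * c for c in v))


a, b = 0.0, 10.0
# Interior sample points of the open interval ]a, b[.
grid = [a + (b - a) * k / 200 for k in range(1, 200)]

# Largest sampled norm of the derivative, cf. (K.6).
M_sampled = max(norm2(f_prime(x)) for x in grid)

# Largest sampled quotient ||f(x1) - f(x2)||_2 / |x1 - x2|, cf. (K.7).
worst_ratio = max(
    norm2(tuple(p - q for p, q in zip(f(x1), f(x2)))) / abs(x1 - x2)
    for x1 in grid
    for x2 in grid
    if x1 != x2
)
```

In this example, `worst_ratio` stays below `M_sampled` (up to rounding), i.e. the Lipschitz constant is not larger than the bound on $\|f'\|_2$, which is the point of Th. K.2.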


K.2 Surjectivity of Directional Derivatives

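We finish the proof of Th. 4.41 by showing that, for $n \ge 2$, the map

$$D : S_1(0) \longrightarrow [-\alpha, \alpha], \quad D(e) := \nabla f(\xi) \cdot e = \sum_{j=1}^{n} \epsilon_j\, \partial_j f(\xi), \quad \alpha = \|\nabla f(\xi)\|_2, \tag{K.14}$$

is surjective (we already know from (4.50) that $D(e) \in [-\alpha, \alpha]$ for each $e \in S_1(0)$). We also recall $e_{\max} = \nabla f(\xi)/\alpha$, $e_{\min} = -e_{\max}$, $D(e_{\max}) = \alpha$, $D(e_{\min}) = -\alpha$.

The idea is to rotate $e_{\max}$ into $e_{\min}$. This can be achieved using a suitable function

$$\rho : [0, \pi] \longrightarrow S_1(0) \subseteq \mathbb{R}^n, \quad \rho = (\rho_1, \dots, \rho_n).$$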

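We have to define $\rho$ differently, depending on $n \ge 2$ being even or odd. To this end, let $(\epsilon_1, \dots, \epsilon_n) := e_{\max}$. If $n$ is even, then define

$$\forall_{j\in\{1,\dots,n\}} \quad \rho_j : [0, \pi] \longrightarrow [-1, 1], \quad \rho_j(\theta) := \begin{cases} \epsilon_j \cos\theta + \epsilon_{j+1} \sin\theta & \text{if $j$ is odd}, \\ -\epsilon_{j-1} \sin\theta + \epsilon_j \cos\theta & \text{if $j$ is even}; \end{cases} \tag{K.15a}$$

if $n$ is odd (note $n \ge 3$ in this case), then define

$$\forall_{j\in\{1,\dots,n\}} \quad \rho_j : [0, \pi] \longrightarrow [-1, 1], \quad \rho_j(\theta) := \begin{cases} \epsilon_j \cos\theta + \epsilon_{j+1} \sin\theta & \text{if $j < n-2$ is odd}, \\ -\epsilon_{j-1} \sin\theta + \epsilon_j \cos\theta & \text{if $j < n-2$ is even}, \\ \epsilon_{n-2} \cos\theta + \sqrt{\epsilon_{n-1}^2 + \epsilon_n^2}\, \sin\theta & \text{if $j = n-2$}, \\ \epsilon_{n-1} \cos\theta - \dfrac{\epsilon_{n-2}\, \epsilon_{n-1}}{\sqrt{\epsilon_{n-1}^2 + \epsilon_n^2}}\, \sin\theta & \text{if $j = n-1$}, \\ \epsilon_n \cos\theta - \dfrac{\epsilon_{n-2}\, \epsilon_n}{\sqrt{\epsilon_{n-1}^2 + \epsilon_n^2}}\, \sin\theta & \text{if $j = n$}. \end{cases} \tag{K.15b}$$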


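For the sake of readability, we assumed $\epsilon_{n-1} \neq 0$ or $\epsilon_n \neq 0$ in (K.15b). There is always at least one $j_0 \in \{1, \dots, n\}$ such that $\epsilon_{j_0} \neq 0$. If $j_0 \notin \{n-1, n\}$, then one merely needs to interchange the roles of $j_0$ and $n$ in (K.15b).

Clearly, for every $n \ge 2$, each $\rho_j$ is continuous, i.e. $\rho$ is continuous.

Next, we verify that $\rho$, indeed, maps into $S_1(0)$ (which, in particular, implies each $\rho_j$ maps into $[-1, 1]$): If $n \ge 2$ is even, then, for each odd $j \le n - 1$, one has

$$\begin{aligned}
\forall_{\theta\in[0,\pi]} \quad &(\rho_j(\theta))^2 + (\rho_{j+1}(\theta))^2 \\
&= (\epsilon_j \cos\theta + \epsilon_{j+1} \sin\theta)^2 + (-\epsilon_j \sin\theta + \epsilon_{j+1} \cos\theta)^2 \\
&= \epsilon_j^2 \cos^2\theta + 2\epsilon_j \epsilon_{j+1} \cos\theta \sin\theta + \epsilon_{j+1}^2 \sin^2\theta \\
&\quad + \epsilon_j^2 \sin^2\theta - 2\epsilon_j \epsilon_{j+1} \cos\theta \sin\theta + \epsilon_{j+1}^2 \cos^2\theta \\
&= \epsilon_j^2 (\cos^2\theta + \sin^2\theta) + \epsilon_{j+1}^2 (\cos^2\theta + \sin^2\theta) = \epsilon_j^2 + \epsilon_{j+1}^2,
\end{aligned} \tag{K.16}$$

implying

$$\forall_{\theta\in[0,\pi]} \quad \|\rho(\theta)\|_2^2 = \sum_{j=1}^{n} (\rho_j(\theta))^2 = \sum_{j=1}^{n} \epsilon_j^2 = 1. \tag{K.17}$$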

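If $n \ge 3$ is odd, then (K.16) still holds for each odd $j \le n - 4$. Additionally,

$$\begin{aligned}
\forall_{\theta\in[0,\pi]} \quad &(\rho_{n-2}(\theta))^2 + (\rho_{n-1}(\theta))^2 + (\rho_n(\theta))^2 \\
&= \epsilon_{n-2}^2 \cos^2\theta + 2\epsilon_{n-2} \sqrt{\epsilon_{n-1}^2 + \epsilon_n^2}\, \sin\theta \cos\theta + (\epsilon_{n-1}^2 + \epsilon_n^2) \sin^2\theta \\
&\quad + \epsilon_{n-1}^2 \cos^2\theta - 2\, \frac{\epsilon_{n-2}\, \epsilon_{n-1}^2}{\sqrt{\epsilon_{n-1}^2 + \epsilon_n^2}}\, \sin\theta \cos\theta + \frac{\epsilon_{n-2}^2\, \epsilon_{n-1}^2}{\epsilon_{n-1}^2 + \epsilon_n^2}\, \sin^2\theta \\
&\quad + \epsilon_n^2 \cos^2\theta - 2\, \frac{\epsilon_{n-2}\, \epsilon_n^2}{\sqrt{\epsilon_{n-1}^2 + \epsilon_n^2}}\, \sin\theta \cos\theta + \frac{\epsilon_{n-2}^2\, \epsilon_n^2}{\epsilon_{n-1}^2 + \epsilon_n^2}\, \sin^2\theta \\
&= (\epsilon_{n-2}^2 + \epsilon_{n-1}^2 + \epsilon_n^2) \cos^2\theta + \frac{2\epsilon_{n-2} (\epsilon_{n-1}^2 + \epsilon_n^2 - \epsilon_{n-1}^2 - \epsilon_n^2)}{\sqrt{\epsilon_{n-1}^2 + \epsilon_n^2}}\, \sin\theta \cos\theta \\
&\quad + \left( \epsilon_{n-1}^2 + \epsilon_n^2 + \frac{\epsilon_{n-2}^2 (\epsilon_{n-1}^2 + \epsilon_n^2)}{\epsilon_{n-1}^2 + \epsilon_n^2} \right) \sin^2\theta \\
&= (\epsilon_{n-2}^2 + \epsilon_{n-1}^2 + \epsilon_n^2)(\cos^2\theta + \sin^2\theta) = \epsilon_{n-2}^2 + \epsilon_{n-1}^2 + \epsilon_n^2,
\end{aligned} \tag{K.18}$$

i.e. (K.17) is true also for each $n \ge 3$ odd.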

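Clearly, $D$ is also continuous and, thus, so is $D \circ \rho : [0, \pi] \longrightarrow [-\alpha, \alpha]$. Moreover, as $\sin(0) = \sin(\pi) = 0$, $\cos(0) = 1$, $\cos(\pi) = -1$, we obtain

$$\forall_{n\ge 2}\ \forall_{j\in\{1,\dots,n\}} \quad \bigl( \rho_j(0) = \epsilon_j \ \wedge\ \rho_j(\pi) = -\epsilon_j \bigr), \tag{K.19}$$

implying

$$\forall_{n\ge 2} \quad \bigl( \rho(0) = e_{\max} \ \wedge\ (D \circ \rho)(0) = \alpha \ \wedge\ \rho(\pi) = e_{\min} \ \wedge\ (D \circ \rho)(\pi) = -\alpha \bigr). \tag{K.20}$$

The continuity of $D \circ \rho$ and the intermediate value theorem [Phi16, Th. 7.57] imply $D \circ \rho$ to be surjective, i.e. $D$ must be surjective as well.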
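The construction (K.15a), (K.15b) and the surjectivity argument can be traced numerically. The following Python sketch implements $\rho$ with 0-based indices and uses bisection to find, for a prescribed value in $[-\alpha, \alpha]$, an angle $\theta$ at which $D \circ \rho$ attains it. The bisection is merely a numerical stand-in for the intermediate value theorem; the gradient `[1.0, 2.0, 2.0]`, the target value, and the function names are assumptions made for the illustration. As in (K.15b), the odd case assumes that the last two components of $e_{\max}$ do not both vanish.

```python
import math


def rho(e, theta):
    """Rotate the unit vector e by theta following (K.15a)/(K.15b)."""
    n = len(e)
    c, s = math.cos(theta), math.sin(theta)
    out = [0.0] * n
    # Number of leading components that are rotated in pairs.
    paired = n if n % 2 == 0 else n - 3
    for j in range(0, paired, 2):
        out[j] = e[j] * c + e[j + 1] * s
        out[j + 1] = -e[j] * s + e[j + 1] * c
    if n % 2 == 1:
        # Last three components, cf. the cases j = n-2, n-1, n of (K.15b).
        p, q, t = e[n - 3], e[n - 2], e[n - 1]
        r = math.hypot(q, t)
        if r == 0.0:
            raise ValueError("last two components must not both vanish")
        out[n - 3] = p * c + r * s
        out[n - 2] = q * c - p * q / r * s
        out[n - 1] = t * c - p * t / r * s
    return out


def D(grad, e):
    """Directional derivative D(e) = grad . e, cf. (K.14)."""
    return sum(gj * ej for gj, ej in zip(grad, e))


def solve_theta(grad, target, iterations=60):
    """Bisection for theta in [0, pi] with D(rho(theta)) close to target."""
    alpha = math.sqrt(sum(gj * gj for gj in grad))
    e_max = [gj / alpha for gj in grad]
    lo, hi = 0.0, math.pi  # D(rho(lo)) >= target >= D(rho(hi))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if D(grad, rho(e_max, mid)) >= target:
            lo = mid
        else:
            hi = mid
    return lo


grad = [1.0, 2.0, 2.0]                  # assumed gradient with alpha = 3
e_max = [gj / 3.0 for gj in grad]
theta = solve_theta(grad, 1.0)          # target value 1 in [-3, 3]
attained = D(grad, rho(e_max, theta))
```

For the assumed data, `attained` agrees with the target value up to the bisection tolerance, and `rho(e_max, theta)` is a unit vector realizing this directional derivative.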


References

[Che92] P.R. Chernoff. A Simple Proof of Tychonoff's Theorem Via Nets. Amer. Math. Monthly 99 (1992), No. 10, 932–934.

[Heu08] Harro Heuser. Lehrbuch der Analysis Teil 2, 14th ed. Vieweg+Teubner, Wiesbaden, Germany, 2008 (German).

[Kon04] Konrad Königsberger. Analysis 2, 5th ed. Springer-Verlag, Berlin, 2004 (German).

[Phi16] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2015/2016, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_Analysis1.pdf.

[Phi17] P. Philip. Numerical Analysis I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2016/2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_NumericalAnalysis1.pdf.

[Pre75] G. Preuß. Allgemeine Topologie, 2nd ed. Springer-Verlag, Berlin, 1975 (German).

[RF10] Halsey Royden and Patrick Fitzpatrick. Real Analysis, 4th ed. Pearson Education, Boston, USA, 2010.

[Rud87] W. Rudin. Real and Complex Analysis, 3rd ed. McGraw-Hill Book Company, New York, 1987.

[Sie52] Wacław Sierpiński. General Topology. Mathematical Expositions, Vol. 7, University of Toronto Press, Toronto, Canada, 1952.

[SS95] Lynn Arthur Steen and J. Arthur Seebach, Jr. Counterexamples in Topology. Dover Publications, New York, 1995.

[Wal02] Wolfgang Walter. Analysis 2, 5th ed. Springer-Verlag, Berlin, 2002 (German).
