
Linear Algebra I

Peter Philip∗

Lecture Notes

Created for the Class of Winter Semester 2018/2019 at LMU Munich

November 1, 2019

Contents

1 Foundations: Mathematical Logic and Set Theory 4

1.1 Introductory Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Propositional Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.1 Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.2 Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.3 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.3 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.4 Predicate Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Functions and Relations 22

2.1 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.2 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.2.1 Definition and Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.2.2 Order Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.2.3 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3 Natural Numbers, Induction, and the Size of Sets 38

3.1 Induction and Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.2 Cardinality: The Size of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.2.1 Definition and General Properties . . . . . . . . . . . . . . . . . . . . . 44

3.2.2 Finite Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.2.3 Countable Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

∗E-Mail: [email protected]



4 Basic Algebraic Structures 52

4.1 Magmas and Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.2 Rings and Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5 Vector Spaces 78

5.1 Vector Spaces and Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5.2 Linear Independence, Basis, Dimension . . . . . . . . . . . . . . . . . . . . . . . 84

6 Linear Maps 98

6.1 Basic Properties and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

6.2 Quotient Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

6.3 Vector Spaces of Linear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

7 Matrices 113

7.1 Definition and Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

7.2 Matrices as Representations of Linear Maps . . . . . . . . . . . . . . . . . . . . 118

7.3 Rank and Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

7.4 Special Types of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

7.5 Blockwise Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 131

8 Linear Systems 132

8.1 General Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

8.2 Abstract Solution Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

8.3 Finding Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

8.3.1 Echelon Form, Back Substitution . . . . . . . . . . . . . . . . . . . . . . 136

8.3.2 Elementary Row Operations, Variable Substitution . . . . . . . . . . . . 141

8.3.3 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

8.4 LU Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

8.4.1 Definition and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 146

8.4.2 Elementary Row Operations Via Matrix Multiplication . . . . . . . . . 147

8.4.3 Existence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

8.4.4 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

A Axiomatic Set Theory 161

A.1 Motivation, Russell’s Antinomy . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

A.2 Set-Theoretic Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161


A.3 The Axioms of Zermelo-Fraenkel Set Theory . . . . . . . . . . . . . . . . . . . . 163

A.3.1 Existence, Extensionality, Comprehension . . . . . . . . . . . . . . . . . 164

A.3.2 Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

A.3.3 Pairing, Union, Replacement . . . . . . . . . . . . . . . . . . . . . . . . 167

A.3.4 Infinity, Ordinals, Natural Numbers . . . . . . . . . . . . . . . . . . . . 171

A.3.5 Power Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

A.3.6 Foundation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

A.4 The Axiom of Choice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

A.5 Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

B Associativity and Commutativity 191

B.1 Associativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

B.2 Commutativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

C Groups 196

D Number Theory 197

E Vector Spaces 200

E.1 Cardinality and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

E.2 Cartesian Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

References 203


1 Foundations: Mathematical Logic and Set Theory

1.1 Introductory Remarks

The task of mathematics is to establish the truth or falsehood of (formalizable) statements using rigorous logic, and to provide methods for the solution of classes of (e.g. applied) problems, ideally including rigorous logical proofs verifying the validity of the methods (proofs that the method under consideration will, indeed, provide a correct solution).

The topic of this class is linear algebra, a subfield of the field of algebra. Algebra in the sense of this class is also called abstract algebra and constitutes the study of mathematical objects and rules for combining them (one sometimes says that these rules form a structure on the underlying set of objects). An important task of algebra is the solution of equations, where the equations are formulated in some set of objects with a given structure. In linear algebra, the sets of objects are so-called vector spaces (the objects being called vectors) and the structure consists of an addition, assigning two vectors v, w their sum vector v + w, together with a scalar multiplication, assigning a scalar (from a so-called scalar field, more about this later) λ and a vector v the product vector λv. In linear algebra, one is especially interested in solving linear equations, i.e. equations of the form A(x) = b, where A is a linear function, i.e. a function satisfying

A(λv + µw) = λA(v) + µA(w)

for all vectors v, w and all scalars λ, µ. Before we can properly define and study vector spaces and linear equations, some preparatory work is still needed. In modern mathematics, the objects under investigation are almost always so-called sets. So one aims at deriving (i.e. proving) true (and interesting and useful) statements about sets from other statements about sets known or assumed to be true. Such a derivation or proof means applying logical rules that guarantee the truth of the derived (i.e. proved) statement.

However, unfortunately, a proper definition of the notion of set is not easy, and neither is an appropriate treatment of logic and proof theory. Here, we will only be able to briefly touch on the bare necessities from logic and set theory needed to proceed to the core matter of this class. We begin with logic in Sec. 1.2, followed by set theory in Sec. 1.3, combining both in Sec. 1.4. The interested student can find an introductory presentation of axiomatic set theory in Appendix A and he/she should consider taking a separate class on set theory, logic, and proof theory at a later time.
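As a small concrete illustration of the linearity property A(λv + µw) = λA(v) + µA(w) described above, the following sketch (in Python, used here purely as a calculator; the particular map A and the sample vectors are hypothetical choices, not taken from these notes) checks the property for a map on pairs of rationals.

```python
from fractions import Fraction

# A hypothetical linear map on pairs of rationals (a stand-in for R^2):
# A(x) = (2*x1 + x2, x2).  This choice is purely illustrative.
def A(x):
    return (2 * x[0] + x[1], x[1])

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

def scale(lam, v):
    return (lam * v[0], lam * v[1])

# Check A(lam*v + mu*w) == lam*A(v) + mu*A(w) on a few sample inputs.
samples = [(Fraction(1), Fraction(2)), (Fraction(-3), Fraction(5))]
scalars = [Fraction(2), Fraction(-1, 2)]
for v in samples:
    for w in samples:
        for lam in scalars:
            for mu in scalars:
                lhs = A(add(scale(lam, v), scale(mu, w)))
                rhs = add(scale(lam, A(v)), scale(mu, A(w)))
                assert lhs == rhs
print("linearity holds on all samples")
```

Of course, checking finitely many samples is not a proof; the point of the class is to establish such properties rigorously for all vectors and scalars.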

1.2 Propositional Calculus

1.2.1 Statements

Mathematical logic is a large field in its own right and, as indicated above, a thorough introduction is beyond the scope of this class – the interested reader may refer to [EFT07], [Kun12], and references therein. Here, we will just introduce some basic concepts using common English (rather than formal symbolic languages – a concept touched on in Sec. A.2 of the Appendix and more thoroughly explained in books like [EFT07]).

As mentioned before, mathematics establishes the truth or falsehood of statements. By a statement or proposition we mean any sentence (any sequence of symbols) that can reasonably be assigned a truth value, i.e. a value of either true, abbreviated T, or false, abbreviated F. The following example illustrates the difference between statements and sentences that are not statements:

Example 1.1. (a) Sentences that are statements:

Every dog is an animal. (T)

Every animal is a dog. (F)

The number 4 is odd. (F)

2 + 3 = 5. (T)

√2 < 0. (F)

x + 1 > 0 holds for each natural number x. (T)

(b) Sentences that are not statements:

Let’s study calculus!

Who are you?

3 · 5 + 7.

x + 1 > 0.

All natural numbers are green.

The fourth sentence in Ex. 1.1(b) is not a statement, as it cannot be said to be either true or false without further knowledge of x. The fifth sentence in Ex. 1.1(b) is not a statement, as it lacks any meaning and can, hence, not be either true or false. It would become a statement if given a definition of what it means for a natural number to be green.

1.2.2 Logical Operators

The next step now is to combine statements into new statements using logical operators, where the truth value of the combined statements depends on the truth values of the original statements and on the type of logical operator facilitating the combination.

The simplest logical operator is negation, denoted ¬. It is actually a so-called unary operator, i.e. it does not combine statements, but is merely applied to one statement. For example, if A stands for the statement “Every dog is an animal.”, then ¬A stands for the statement “Not every dog is an animal.”; and if B stands for the statement “The number 4 is odd.”, then ¬B stands for the statement “The number 4 is not odd.”, which can also be expressed as “The number 4 is even.”


To completely understand the action of a logical operator, one usually writes what is known as a truth table. For negation, the truth table is

A  ¬A
T   F
F   T

(1.1)

that means if the input statement A is true, then the output statement ¬A is false; if the input statement A is false, then the output statement ¬A is true.

We now proceed to discuss binary logical operators, i.e. logical operators combining precisely two statements. The following four operators are essential for mathematical reasoning:

Conjunction: A and B, usually denoted A ∧ B.

Disjunction: A or B, usually denoted A ∨ B.

Implication: A implies B, usually denoted A ⇒ B.

Equivalence: A is equivalent to B, usually denoted A ⇔ B.

Here is the corresponding truth table:

A  B  A ∧ B  A ∨ B  A ⇒ B  A ⇔ B
T  T    T      T      T      T
T  F    F      T      F      F
F  T    F      T      T      F
F  F    F      F      T      T

(1.2)

When first seen, some of the assignments of truth values in (1.2) might not be completely intuitive, due to the fact that logical operators are often used somewhat differently in common English. Let us consider each of the four logical operators of (1.2) in sequence:
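Truth table (1.2) can also be generated mechanically. The following sketch (Python, used here only as a convenient calculator) encodes the four binary operators as functions on truth values and prints every row; the function names implies and iff are ad-hoc choices, not standard library functions.

```python
from itertools import product

# Binary logical operators on Python booleans; "implies" and "iff"
# are defined exactly as in truth table (1.2).
def implies(a, b):
    return (not a) or b

def iff(a, b):
    return a == b

print("A B  A∧B  A∨B  A⇒B  A⇔B")
for a, b in product([True, False], repeat=2):
    row = [a and b, a or b, implies(a, b), iff(a, b)]
    fmt = lambda v: "T" if v else "F"
    print(fmt(a), fmt(b), "  " + "    ".join(fmt(v) for v in row))
```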

For use in subsequent examples, let A1, . . . , A6 denote the six statements from Ex. 1.1(a).

Conjunction: Most likely the easiest of the four, basically identical to common language use: A ∧ B is true if, and only if, both A and B are true. For example, using Ex. 1.1(a), A1 ∧ A4 is the statement “Every dog is an animal and 2 + 3 = 5.”, which is true since both A1 and A4 are true. On the other hand, A1 ∧ A3 is the statement “Every dog is an animal and the number 4 is odd.”, which is false, since A3 is false.

Disjunction: The disjunction A ∨ B is true if, and only if, at least one of the statements A, B is true. Here one already has to be a bit careful – A ∨ B defines the inclusive or, whereas “or” in common English is often understood to mean the exclusive or (which is false if both input statements are true). For example, using Ex. 1.1(a), A1 ∨ A4 is the statement “Every dog is an animal or 2 + 3 = 5.”, which is true since both A1 and A4 are true. The statement A1 ∨ A3, i.e. “Every dog is an animal or the number 4 is odd.”, is also true, since A1 is true. However, the statement A2 ∨ A5, i.e. “Every animal is a dog or √2 < 0.”, is false, as both A2 and A5 are false.


As you will have noted in the above examples, logical operators can be applied to combine statements that have no obvious contents relation. While this might seem strange, introducing contents-related restrictions is unnecessary as well as undesirable, since it is often not clear which seemingly unrelated statements might suddenly appear in a common context in the future. The same occurs when considering implications and equivalences, where it might seem even more obscure at first.

Implication: Instead of A implies B, one also says if A then B, B is a consequence of A, B is concluded or inferred from A, A is sufficient for B, or B is necessary for A. The implication A ⇒ B is always true, except if A is true and B is false. At first glance, it might be surprising that A ⇒ B is defined to be true for A false and B true; however, there are many examples of incorrect statements implying correct statements. For instance, squaring the (false) equality of integers −1 = 1 implies the (true) equality of integers 1 = 1. However, as with conjunction and disjunction, it is perfectly valid to combine statements without any obvious context relation: For example, using Ex. 1.1(a), the statement A1 ⇒ A6, i.e. “Every dog is an animal implies x + 1 > 0 holds for each natural number x.”, is true, since A6 is true, whereas the statement A4 ⇒ A2, i.e. “2 + 3 = 5 implies every animal is a dog.”, is false, as A4 is true and A2 is false.

Of course, the implication A ⇒ B is not really useful in situations where the truth values of both A and B are already known. Rather, in a typical application, one tries to establish the truth of A to prove the truth of B (a strategy that will fail if A happens to be false).

Example 1.2. Suppose we know Sasha to be a member of a group of children. Then the statement A “Sasha is a girl.” implies the statement B “There is at least one girl in the group.” A priori, we might not know if Sasha is a girl or a boy, but if we can establish Sasha to be a girl, then we also know B to be true. If we find Sasha to be a boy, then we do not know whether B is true or false.

Equivalence: A ⇔ B means A is true if, and only if, B is true. Once again, using input statements from Ex. 1.1(a), we see that A1 ⇔ A4, i.e. “Every dog is an animal is equivalent to 2 + 3 = 5.”, is true, as well as A2 ⇔ A3, i.e. “Every animal is a dog is equivalent to the number 4 is odd.”. On the other hand, A4 ⇔ A5, i.e. “2 + 3 = 5 is equivalent to √2 < 0.”, is false.

Analogous to the situation of implications, A ⇔ B is not really useful if the truth values of both A and B are known a priori, but it can be a powerful tool to prove B to be true or false by establishing the truth value of A. It is obviously more powerful than the implication, as illustrated by the following example (compare with Ex. 1.2):

Example 1.3. Suppose we know Sasha is the tallest member of a group of children. Then the statement A “Sasha is a girl.” is equivalent to the statement B “The tallest kid in the group is a girl.” As in Ex. 1.2, if we can establish Sasha to be a girl, then we also know B to be true. However, in contrast to Ex. 1.2, if we find Sasha to be a boy, we know B to be false.


Remark 1.4. In computer science, the truth value T is often coded as 1 and the truth value F is often coded as 0.

1.2.3 Rules

Note that the expressions in the first row of the truth table (1.2) (e.g. A ∧ B) are not statements in the sense of Sec. 1.2.1, as they contain the statement variables (also known as propositional variables) A or B. However, the expressions become statements if all statement variables are substituted with actual statements. We will call expressions of this form propositional formulas. Moreover, if a truth value is assigned to each statement variable of a propositional formula, then this uniquely determines the truth value of the formula. In other words, the truth value of the propositional formula can be calculated from the respective truth values of its statement variables – a first justification for the name propositional calculus.

Example 1.5. (a) Consider the propositional formula (A ∧ B) ∨ (¬B). Suppose A is true and B is false. The truth value of the formula is obtained according to the following truth table:

A  B  A ∧ B  ¬B  (A ∧ B) ∨ (¬B)
T  F    F     T         T

(1.3)

(b) The propositional formula A ∨ (¬A), also known as the law of the excluded middle, has the remarkable property that its truth value is T for every possible choice of truth values for A:

A  ¬A  A ∨ (¬A)
T   F      T
F   T      T

(1.4)

Formulas with this property are of particular importance.

Definition 1.6. A propositional formula is called a tautology or universally true if, and only if, its truth value is T for all possible assignments of truth values to all the statement variables it contains.
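Definition 1.6 suggests a brute-force test: a formula in n statement variables is a tautology exactly if it evaluates to T under all 2ⁿ assignments. A minimal sketch in Python (the helper name is_tautology is an ad-hoc choice):

```python
from itertools import product

def is_tautology(formula, n):
    """Check a propositional formula, given as a function of n booleans,
    by evaluating it under all 2**n truth assignments."""
    return all(formula(*vals) for vals in product([True, False], repeat=n))

# The law of the excluded middle, A ∨ (¬A), from Example 1.5(b):
assert is_tautology(lambda a: a or (not a), 1)

# A ∧ B is not a tautology (it is false whenever, e.g., B is false):
assert not is_tautology(lambda a, b: a and b, 2)
print("checks passed")
```

Note that this exhaustive check is feasible only because each formula has finitely many statement variables.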

Notation 1.7. We write φ(A1, . . . , An) if, and only if, the propositional formula φ contains precisely the n statement variables A1, . . . , An.

Definition 1.8. The propositional formulas φ(A1, . . . , An) and ψ(A1, . . . , An) are called equivalent if, and only if, φ(A1, . . . , An) ⇔ ψ(A1, . . . , An) is a tautology.

Lemma 1.9. The propositional formulas φ(A1, . . . , An) and ψ(A1, . . . , An) are equivalent if, and only if, they have the same truth value for all possible assignments of truth values to A1, . . . , An.

Proof. If φ(A1, . . . , An) and ψ(A1, . . . , An) are equivalent and Ai is assigned the truth value ti, i = 1, . . . , n, then φ(A1, . . . , An) ⇔ ψ(A1, . . . , An) being a tautology implies it has truth value T. From (1.2) we see that either φ(A1, . . . , An) and ψ(A1, . . . , An) both have truth value T or they both have truth value F.

If, on the other hand, we know φ(A1, . . . , An) and ψ(A1, . . . , An) have the same truth value for all possible assignments of truth values to A1, . . . , An, then, given such an assignment, either φ(A1, . . . , An) and ψ(A1, . . . , An) both have truth value T or both have truth value F, i.e. φ(A1, . . . , An) ⇔ ψ(A1, . . . , An) has truth value T in each case, showing it is a tautology. □

For all logical purposes, two equivalent formulas are exactly the same – it does not matter if one uses one or the other. The following theorem provides some important equivalences of propositional formulas. As too many parentheses tend to make formulas less readable, we first introduce some precedence conventions for logical operators:

Convention 1.10. ¬ takes precedence over ∧, ∨, which take precedence over ⇒, ⇔. So, for example,

(A ∨ ¬B ⇒ ¬B ∧ ¬A) ⇔ ¬C ∧ (A ∨ ¬D)

is the same as

((A ∨ (¬B)) ⇒ ((¬B) ∧ (¬A))) ⇔ ((¬C) ∧ (A ∨ (¬D))).

Theorem 1.11. (a) (A ⇒ B) ⇔ ¬A ∨ B. This means one can actually define implication via negation and disjunction.

(b) (A ⇔ B) ⇔ ((A ⇒ B) ∧ (B ⇒ A)), i.e. A and B are equivalent if, and only if, A is both necessary and sufficient for B. One also calls the implication B ⇒ A the converse of the implication A ⇒ B. Thus, A and B are equivalent if, and only if, both A ⇒ B and its converse hold true.

(c) Commutativity of Conjunction: A ∧ B ⇔ B ∧ A.

(d) Commutativity of Disjunction: A ∨ B ⇔ B ∨ A.

(e) Associativity of Conjunction: (A ∧ B) ∧ C ⇔ A ∧ (B ∧ C).

(f) Associativity of Disjunction: (A ∨ B) ∨ C ⇔ A ∨ (B ∨ C).

(g) Distributivity I: A ∧ (B ∨ C) ⇔ (A ∧ B) ∨ (A ∧ C).

(h) Distributivity II: A ∨ (B ∧ C) ⇔ (A ∨ B) ∧ (A ∨ C).

(i) De Morgan’s Law I: ¬(A ∧ B) ⇔ ¬A ∨ ¬B.

(j) De Morgan’s Law II: ¬(A ∨ B) ⇔ ¬A ∧ ¬B.

(k) Double Negative: ¬¬A ⇔ A.

(l) Contraposition: (A ⇒ B) ⇔ (¬B ⇒ ¬A).
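By Lem. 1.9, each rule of Th. 1.11 can be verified mechanically by comparing truth values under every assignment. A Python sketch for a few of the rules (the helper names equivalent and implies are ad-hoc choices):

```python
from itertools import product

def equivalent(phi, psi, n):
    """Lem. 1.9: phi and psi (functions of n booleans) are equivalent iff
    they agree under all 2**n assignments of truth values."""
    return all(phi(*v) == psi(*v) for v in product([True, False], repeat=n))

implies = lambda a, b: (not a) or b

# (a) (A ⇒ B) ⇔ ¬A ∨ B
assert equivalent(lambda a, b: implies(a, b), lambda a, b: (not a) or b, 2)
# (i) De Morgan's Law I: ¬(A ∧ B) ⇔ ¬A ∨ ¬B
assert equivalent(lambda a, b: not (a and b),
                  lambda a, b: (not a) or (not b), 2)
# (l) Contraposition: (A ⇒ B) ⇔ (¬B ⇒ ¬A)
assert equivalent(lambda a, b: implies(a, b),
                  lambda a, b: implies(not b, not a), 2)
print("all listed rules verified")
```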


Proof. Each equivalence is proved by providing a truth table and using Lem. 1.9.

(a):

A  B  ¬A  A ⇒ B  ¬A ∨ B
T  T   F    T      T
T  F   F    F      F
F  T   T    T      T
F  F   T    T      T

(b) – (h): Exercise.

(i):

A  B  ¬A  ¬B  A ∧ B  ¬(A ∧ B)  ¬A ∨ ¬B
T  T   F   F    T       F         F
T  F   F   T    F       T         T
F  T   T   F    F       T         T
F  F   T   T    F       T         T

(j): Exercise.

(k):

A  ¬A  ¬¬A
T   F    T
F   T    F

(l):

A  B  ¬A  ¬B  A ⇒ B  ¬B ⇒ ¬A
T  T   F   F    T       T
T  F   F   T    F       F
F  T   T   F    T       T
F  F   T   T    T       T

Having checked all the rules completes the proof of the theorem. □

The importance of the rules provided by Th. 1.11 lies in their providing proof techniques, i.e. methods for establishing the truth of statements from statements known or assumed to be true. The rules of Th. 1.11 will be used frequently in proofs throughout this class.

Remark 1.12. Another important proof technique is the so-called proof by contradiction, also called indirect proof. It is based on the observation, called the principle of contradiction, that A ∧ ¬A is always false:

A  ¬A  A ∧ ¬A
T   F     F
F   T     F

(1.5)

Thus, one possibility of proving a statement B to be true is to show ¬B ⇒ A ∧ ¬A for some arbitrary statement A. Since the right-hand side of the implication is false, the left-hand side must also be false, proving B is true.
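That this proof schema is itself sound can be checked by brute force: (¬B ⇒ (A ∧ ¬A)) ⇒ B is a tautology. A quick Python sketch:

```python
from itertools import product

implies = lambda p, q: (not p) or q

# (¬B ⇒ (A ∧ ¬A)) ⇒ B holds under all four truth assignments,
# confirming that deriving a contradiction from ¬B establishes B.
assert all(
    implies(implies(not b, a and (not a)), b)
    for a, b in product([True, False], repeat=2)
)
print("indirect proof schema is a tautology")
```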


Two more rules we will use regularly in subsequent proofs are the so-called transitivity of implication and the transitivity of equivalence (we will encounter equivalence again in the context of relations in Sec. 2.2.3 below). In preparation for the transitivity rules, we generalize implication to propositional formulas:

Definition 1.13. In generalization of the implication operator defined in (1.2), we say the propositional formula φ(A1, . . . , An) implies the propositional formula ψ(A1, . . . , An) (denoted φ(A1, . . . , An) ⇒ ψ(A1, . . . , An)) if, and only if, each assignment of truth values to the A1, . . . , An that makes φ(A1, . . . , An) true, makes ψ(A1, . . . , An) true as well.

Theorem 1.14. (a) Transitivity of Implication: (A ⇒ B) ∧ (B ⇒ C) ⇒ (A ⇒ C).

(b) Transitivity of Equivalence: (A ⇔ B) ∧ (B ⇔ C) ⇒ (A ⇔ C).

Proof. According to Def. 1.13, the rules can be verified by providing truth tables that show that, for all possible assignments of truth values to the propositional formulas on the left-hand side of the implications, either the left-hand side is false or both sides are true.

(a):

A  B  C  A ⇒ B  B ⇒ C  (A ⇒ B) ∧ (B ⇒ C)  A ⇒ C
T  T  T    T      T            T             T
T  F  T    F      T            F             T
F  T  T    T      T            T             T
F  F  T    T      T            T             T
T  T  F    T      F            F             F
T  F  F    F      T            F             F
F  T  F    T      F            F             T
F  F  F    T      T            T             T

(b):

A  B  C  A ⇔ B  B ⇔ C  (A ⇔ B) ∧ (B ⇔ C)  A ⇔ C
T  T  T    T      T            T             T
T  F  T    F      F            F             T
F  T  T    F      T            F             F
F  F  T    T      F            F             F
T  T  F    T      F            F             F
T  F  F    F      T            F             F
F  T  F    F      F            F             T
F  F  F    T      T            T             T

Having checked both rules, the proof is complete. □
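Implication between formulas in the sense of Def. 1.13 is likewise checkable by exhausting all assignments; the Python sketch below (the helper name formula_implies is an ad-hoc choice) confirms both transitivity rules of Th. 1.14.

```python
from itertools import product

implies = lambda p, q: (not p) or q

def formula_implies(phi, psi, n):
    """Def. 1.13: every assignment making phi true must also make psi true."""
    return all(
        (not phi(*v)) or psi(*v) for v in product([True, False], repeat=n)
    )

# (a) Transitivity of Implication
assert formula_implies(
    lambda a, b, c: implies(a, b) and implies(b, c),
    lambda a, b, c: implies(a, c),
    3,
)
# (b) Transitivity of Equivalence
assert formula_implies(
    lambda a, b, c: (a == b) and (b == c),
    lambda a, b, c: a == c,
    3,
)
print("both transitivity rules verified")
```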

Definition and Remark 1.15. A proof of the statement B is a finite sequence of statements A1, A2, . . . , An such that A1 is true; for 1 ≤ i < n, Ai implies Ai+1, and An implies B. If there exists a proof for B, then Th. 1.14(a) guarantees that B is true.


Remark 1.16. Principle of Duality: In Th. 1.11, there are several pairs of rules that have an analogous form: (c) and (d), (e) and (f), (g) and (h), (i) and (j). These analogies are due to the general law called the principle of duality: If φ(A1, . . . , An) ⇒ ψ(A1, . . . , An) and only the operators ∧, ∨, ¬ occur in φ and ψ, then the reverse implication Φ(A1, . . . , An) ⇐ Ψ(A1, . . . , An) holds, where one obtains Φ from φ and Ψ from ψ by replacing each ∧ with ∨ and each ∨ with ∧. In particular, if, instead of an implication, we start with an equivalence (as in the examples from Th. 1.11), then we obtain another equivalence.

1.3 Set Theory

In the previous section, we have had a first glance at statements and corresponding truth values. In the present section, we will move our focus to the objects such statements are about. Reviewing Example 1.1(a), and recalling that this is a mathematics class rather than one in zoology, the first two statements of Example 1.1(a) are less relevant for us than statements 3–6. As in these examples, we will nearly always be interested in statements involving numbers, or collections of numbers, or collections of such collections, etc.

In modern mathematics, the term one usually uses instead of “collection” is “set”. In 1895, Georg Cantor defined a set as “any collection into a whole M of definite and separate objects m of our intuition or our thought”. The objects m are called the elements of the set M. As explained in Appendix A, without restrictions and refinements, Cantor’s set theory is not free of contradictions and, thus, not viable to be used in the foundation of mathematics. Axiomatic set theory provides these necessary restrictions and refinements, and an introductory treatment can also be found in Appendix A. However, it is possible to follow and understand the rest of this class without having studied Appendix A.

Notation 1.17. We write m ∈ M for the statement “m is an element of the set M”.

Definition 1.18. The sets M and N are equal, denoted M = N, if, and only if, M and N have precisely the same elements.

Definition 1.18 means we know everything about a set M if, and only if, we know all its elements.

Definition 1.19. The set with no elements is called the empty set; it is denoted by the symbol ∅.

Example 1.20. For finite sets, we can simply write down all their elements, for example, A := {0}, B := {0, 17.5}, C := {5, 1, 5, 3}, D := {3, 5, 1}, E := {2, √2, −2}, where the symbolism “:=” is to be read as “is defined to be equal to”.

Note C = D, since both sets contain precisely the same elements. In particular, the order in which the elements are written down plays no role, and a set does not change if an element is written down more than once.


If a set has many elements, instead of writing down all its elements, one might use abbreviations such as F := {−4, −2, . . . , 20, 22, 24}, where one has to make sure the meaning of the dots is clear from the context.

Definition 1.21. The set A is called a subset of the set B (denoted A ⊆ B and also referred to as the inclusion of A in B) if, and only if, every element of A is also an element of B (one sometimes also calls B a superset of A and writes B ⊇ A). Please note that A = B is allowed in the above definition of a subset. If A ⊆ B and A ≠ B, then A is called a strict subset of B, denoted A ⊊ B.

If B is a set and P(x) is a statement about an element x of B (i.e., for each x ∈ B, P(x) is either true or false), then we can define a subset A of B by writing

A := {x ∈ B : P(x)}. (1.6)

This notation is supposed to mean that the set A consists precisely of those elements of B such that P(x) is true (has the truth value T in the language of Sec. 1.2).

Example 1.22. (a) For each set A, one has A ⊆ A and ∅ ⊆ A.

(b) If A ⊆ B, then A = {x ∈ B : x ∈ A}.

(c) We have {3} ⊆ {6.7, 3, 0}. Letting A := {−10, −8, . . . , 8, 10}, we have {−2, 0, 2} = {x ∈ A : x³ ∈ A} and ∅ = {x ∈ A : x + 21 ∈ A}.
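The comprehension notation (1.6) corresponds directly to set comprehensions in many programming languages. As a sketch, Example 1.22(c) redone in Python:

```python
# A := {−10, −8, ..., 8, 10}, the even integers from −10 to 10.
A = set(range(-10, 11, 2))

# {x ∈ A : x³ ∈ A}
assert {x for x in A if x**3 in A} == {-2, 0, 2}

# {x ∈ A : x + 21 ∈ A} = ∅  (x + 21 is odd for even x, hence never in A)
assert {x for x in A if x + 21 in A} == set()

# {3} ⊆ {6.7, 3, 0}
assert {3}.issubset({6.7, 3, 0})
print("Example 1.22(c) confirmed")
```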

Remark 1.23. As a consequence of Def. 1.18, the sets A and B are equal if, and only if, one has both inclusions, namely A ⊆ B and B ⊆ A. Thus, when proving the equality of sets, one often divides the proof into two parts, first proving one inclusion, then the other.

Definition 1.24. (a) The intersection of the sets A and B, denoted A ∩ B, consists of all elements that are in A and in B. The sets A, B are said to be disjoint if, and only if, A ∩ B = ∅.

(b) The union of the sets A and B, denoted A ∪ B, consists of all elements that are in A or in B (as in the logical disjunction in (1.2), the or is meant nonexclusively). If A and B are disjoint, one sometimes writes A ∪̇ B and speaks of the disjoint union of A and B.

(c) The difference of the sets A and B, denoted A \ B (read “A minus B” or “A without B”), consists of all elements of A that are not elements of B, i.e. A \ B := {x ∈ A : x ∉ B}. If B is a subset of a given set A (sometimes called the universe in this context), then A \ B is also called the complement of B with respect to A. In that case, one also writes Bᶜ := A \ B (note that this notation suppresses the dependence on A).

Example 1.25. (a) Examples of Intersections:

{1, 2, 3} ∩ {3, 4, 5} = {3}, (1.7a)

{√2} ∩ {1, 2, . . . , 10} = ∅, (1.7b)

{−1, 2,−3, 4, 5} ∩ {−10,−9, . . . ,−1} ∩ {−1, 7,−3} = {−1,−3}. (1.7c)


(b) Examples of Unions:

{1, 2, 3} ∪ {3, 4, 5} = {1, 2, 3, 4, 5}, (1.8a)

{1, 2, 3} ∪̇ {4, 5} = {1, 2, 3, 4, 5}, (1.8b)

{−1, 2,−3, 4, 5} ∪ {−99,−98, . . . ,−1} ∪ {−1, 7,−3} = {−99,−98, . . . ,−2,−1, 2, 4, 5, 7}. (1.8c)

(c) Examples of Differences:

{1, 2, 3} \ {3, 4, 5} = {1, 2}, (1.9a)

{1, 2, 3} \ {3, 2, 1,√5} = ∅, (1.9b)

{−10,−9, . . . , 9, 10} \ {0} = {−10,−9, . . . ,−1} ∪ {1, 2, . . . , 9, 10}. (1.9c)

With respect to the universe {1, 2, 3, 4, 5}, we have

{1, 2, 3}ᶜ = {4, 5}; (1.9d)

with respect to the universe {0, 1, . . . , 20}, we have

{1, 2, 3}ᶜ = {0} ∪ {4, 5, . . . , 20}. (1.9e)
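Python's built-in set type implements the operations of Def. 1.24 directly (`&` for ∩, `|` for ∪, `-` for \), so the identities above can be replayed. This is only an illustration using the finite sets from (1.7)–(1.9):

```python
assert {1, 2, 3} & {3, 4, 5} == {3}              # intersection, cf. (1.7a)
assert {1, 2, 3} | {3, 4, 5} == {1, 2, 3, 4, 5}  # union, cf. (1.8a)
assert {1, 2, 3} - {3, 4, 5} == {1, 2}           # difference, cf. (1.9a)

U = {1, 2, 3, 4, 5}         # the universe from (1.9d)
complement = U - {1, 2, 3}  # complement of {1,2,3} with respect to U
assert complement == {4, 5}
```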

As mentioned earlier, it will often be unavoidable to consider sets of sets. Here are first examples:

{∅, {0}, {0, 1}}, {{0, 1}, {1, 2}}.

Definition 1.26. Given a set A, the set of all subsets of A is called the power set of A, denoted P(A) (for reasons explained later (cf. Prop. 2.18), the power set is sometimes also denoted as 2^A).

Example 1.27. Examples of Power Sets:

P(∅) = {∅}, (1.10a)

P({0}) = {∅, {0}}, (1.10b)

P(P({0})) = P({∅, {0}}) = {∅, {∅}, {{0}}, P({0})}. (1.10c)
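A power set can be enumerated with the standard library. The helper below is our own sketch (not from the text); it returns P(s) as a set of frozensets, since ordinary Python sets are unhashable and could not themselves be elements of another set:

```python
from itertools import combinations

def power_set(s):
    """Return the power set of s as a set of frozensets."""
    elems = list(s)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

assert power_set(set()) == {frozenset()}                # (1.10a)
assert power_set({0}) == {frozenset(), frozenset({0})}  # (1.10b)
assert len(power_set(power_set({0}))) == 4              # (1.10c) has 2² elements
```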

So far, we have restricted our set-theoretic examples to finite sets. However, not surprisingly, many sets of interest to us will be infinite (we will have to postpone a mathematically precise definition of finite and infinite to Sec. 3.2). We will now introduce the simplest infinite set.

Definition 1.28. The set N := {1, 2, 3, . . . } is called the set of natural numbers (for a more rigorous construction of N, based on the axioms of axiomatic set theory, see Sec. A.3.4 of the Appendix, where Th. A.46 shows N to be, indeed, infinite). Moreover, we define N₀ := {0} ∪ N.


The following theorem compiles important set-theoretic rules:

Theorem 1.29. Let A,B,C, U be sets.

(a) Commutativity of Intersections: A ∩ B = B ∩ A.

(b) Commutativity of Unions: A ∪B = B ∪ A.

(c) Associativity of Intersections: (A ∩ B) ∩ C = A ∩ (B ∩ C).

(d) Associativity of Unions: (A ∪ B) ∪ C = A ∪ (B ∪ C).

(e) Distributivity I: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(f) Distributivity II: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

(g) De Morgan’s Law I: U \ (A ∩ B) = (U \ A) ∪ (U \B).

(h) De Morgan’s Law II: U \ (A ∪B) = (U \ A) ∩ (U \B).

(i) Double Complement: If A ⊆ U , then U \ (U \ A) = A.

Proof. In each case, the proof results from the corresponding rule of Th. 1.11:

(a):

x ∈ A ∩ B ⇔ x ∈ A ∧ x ∈ B ⇔ x ∈ B ∧ x ∈ A ⇔ x ∈ B ∩ A,

where the middle equivalence holds by Th. 1.11(c).

(g): Under the general assumption of x ∈ U , we have the following equivalences:

x ∈ U \ (A ∩ B) ⇔ ¬(x ∈ A ∩ B) ⇔ ¬(x ∈ A ∧ x ∈ B)
⇔ ¬(x ∈ A) ∨ ¬(x ∈ B) (by Th. 1.11(i))
⇔ x ∈ U \ A ∨ x ∈ U \ B ⇔ x ∈ (U \ A) ∪ (U \ B).

The proofs of the remaining rules are left as an exercise. □
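The rules of Th. 1.29 can at least be spot-checked by brute force on concrete sets; the universe and sets below are sample data of our own:

```python
U = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 5, 6, 7}

assert A & (B | C) == (A & B) | (A & C)  # Distributivity I, Th. 1.29(e)
assert U - (A & B) == (U - A) | (U - B)  # De Morgan's Law I, Th. 1.29(g)
assert U - (A | B) == (U - A) & (U - B)  # De Morgan's Law II, Th. 1.29(h)
assert U - (U - A) == A                  # Double complement (A ⊆ U), Th. 1.29(i)
```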

Remark 1.30. The correspondence between Th. 1.11 and Th. 1.29 is no coincidence. One can actually prove that, starting with an equivalence of propositional formulas φ(A1, . . . , An) ⇔ ψ(A1, . . . , An), where both formulas contain only the operators ∧, ∨, ¬, one obtains a set-theoretic rule (stating an equality of sets) by reinterpreting all statement variables A1, . . . , An as variables for sets, all subsets of a universe U , and replacing ∧ by ∩, ∨ by ∪, and ¬ by U\ (if there are no multiple negations, then we do not need the hypothesis that A1, . . . , An are subsets of U). The procedure also works in the opposite direction – one can start with a set-theoretic formula for an equality of sets and translate it into two equivalent propositional formulas.


1.4 Predicate Calculus

Now that we have introduced sets in the previous section, we have to return to the subject of mathematical logic once more. As it turns out, propositional calculus, which we discussed in Sec. 1.2, does not quite suffice to develop the theory of calculus (nor most other mathematical theories). The reason is that we need to consider statements such as

x+ 1 > 0 holds for each natural number x. (T) (1.11a)

All real numbers are positive. (F) (1.11b)

There exists a natural number bigger than 10. (T) (1.11c)

There exists a real number x such that x2 = −1. (F) (1.11d)

For all natural numbers n, there exists a natural number bigger than n. (T) (1.11e)

That means we are interested in statements involving universal quantification via the quantifier “for all” (one also often uses “for each” or “for every” instead), existential quantification via the quantifier “there exists”, or both. The quantifier of universal quantification is denoted by ∀ and the quantifier of existential quantification is denoted by ∃. Using these symbols as well as N and R to denote the sets of natural and real numbers, respectively, we can restate (1.11) as

∀x∈N x + 1 > 0. (T) (1.12a)

∀x∈R x > 0. (F) (1.12b)

∃n∈N n > 10. (T) (1.12c)

∃x∈R x² = −1. (F) (1.12d)

∀n∈N ∃m∈N m > n. (T) (1.12e)

Definition 1.31. A universal statement has the form

∀x∈A P(x), (1.13a)

whereas an existential statement has the form

∃x∈A P(x). (1.13b)

In (1.13), A denotes a set and P(x) is a sentence involving the variable x, a so-called predicate of x, that becomes a statement (i.e. becomes either true or false) if x is substituted with any concrete element of the set A (in particular, P(x) is allowed to contain further quantifiers, but it must not contain any other quantifier involving x – one says x must be a free variable in P(x), not bound by any quantifier in P(x)).

The universal statement (1.13a) has the truth value T if, and only if, P(x) has the truth value T for all elements x ∈ A; the existential statement (1.13b) has the truth value T if, and only if, P(x) has the truth value T for at least one element x ∈ A.


Remark 1.32. Some people prefer to write ⋀x∈A instead of ∀x∈A and ⋁x∈A instead of ∃x∈A. Even though this notation has the advantage of emphasizing that the universal statement can be interpreted as a big logical conjunction and the existential statement can be interpreted as a big logical disjunction, it is significantly less common. So we will stick to ∀ and ∃ in this class.

Remark 1.33. According to Def. 1.31, the existential statement (1.13b) is true if, and only if, P(x) is true for at least one x ∈ A. So if there is precisely one such x, then (1.13b) is true; and if there are several different x ∈ A such that P(x) is true, then (1.13b) is still true. Uniqueness statements are often of particular importance, and one sometimes writes

∃!x∈A P(x) (1.14)

for the statement “there exists a unique x ∈ A such that P(x) is true”. This notation can be defined as an abbreviation for

∃x∈A (P(x) ∧ ∀y∈A (P(y) ⇒ x = y)). (1.15)

Example 1.34. Here are some examples of uniqueness statements:

∃!n∈N n > 10. (F) (1.16a)

∃!n∈N 12 > n > 10. (T) (1.16b)

∃!n∈N 11 > n > 10. (F) (1.16c)

∃!x∈R x² = −1. (F) (1.16d)

∃!x∈R x² = 1. (F) (1.16e)

∃!x∈R x² = 0. (T) (1.16f)
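Over a finite domain, the abbreviation (1.15) can be evaluated by counting. The helper `exists_unique` is our own sketch, with `range(1, 1000)` standing in for N in (1.16a)–(1.16c):

```python
def exists_unique(domain, pred):
    """∃! over a finite domain: exactly one element satisfies pred."""
    return sum(1 for x in domain if pred(x)) == 1

N = range(1, 1000)  # finite stand-in for the natural numbers

assert not exists_unique(N, lambda n: n > 10)       # (1.16a): many such n
assert exists_unique(N, lambda n: 12 > n > 10)      # (1.16b): only n = 11
assert not exists_unique(N, lambda n: 11 > n > 10)  # (1.16c): no such n
```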

Remark 1.35. As for propositional calculus, we also have some important rules for predicate calculus:

(a) Consider the negation of a universal statement, ¬∀x∈A P(x), which is true if, and only if, P(x) does not hold for each x ∈ A, i.e. if, and only if, there exists at least one x ∈ A such that P(x) is false (such that ¬P(x) is true). We have just proved the rule

¬∀x∈A P(x) ⇔ ∃x∈A ¬P(x). (1.17a)

Similarly, consider the negation of an existential statement. We claim the corresponding rule is

¬∃x∈A P(x) ⇔ ∀x∈A ¬P(x). (1.17b)

Indeed, we can prove (1.17b) from (1.17a):

¬∃x∈A P(x) ⇔ ¬∃x∈A ¬¬P(x) ⇔ ¬¬∀x∈A ¬P(x) ⇔ ∀x∈A ¬P(x), (1.18)

where the first and last equivalences hold by Th. 1.11(k) and the middle one by (1.17a).


One can interpret (1.17) as a generalization of De Morgan's laws Th. 1.11(i),(j).

One can actually generalize (1.17) even a bit more: If a statement starts with several quantifiers, then one negates the statement by replacing each ∀ with ∃ and vice versa, plus negating the predicate after the quantifiers (see the example in (1.21e) below).

(b) If A, B are sets and P(x, y) denotes a predicate of both x and y, then ∀x∈A ∀y∈B P(x, y) and ∀y∈B ∀x∈A P(x, y) both hold true if, and only if, P(x, y) holds true for each x ∈ A and each y ∈ B, i.e. the order of two consecutive universal quantifiers does not matter:

∀x∈A ∀y∈B P(x, y) ⇔ ∀y∈B ∀x∈A P(x, y). (1.19a)

In the same way, we obtain the following rule:

∃x∈A ∃y∈B P(x, y) ⇔ ∃y∈B ∃x∈A P(x, y). (1.19b)

If A = B, one also uses abbreviations of the form

∀x,y∈A P(x, y) for ∀x∈A ∀y∈A P(x, y), (1.20a)

∃x,y∈A P(x, y) for ∃x∈A ∃y∈A P(x, y). (1.20b)

Generalizing rules (1.19), we can always commute identical quantifiers. Caveat: Quantifiers that are not identical must not be commuted (see Ex. 1.36(d) below).

Example 1.36. (a) Negation of universal and existential statements:

Negation of (1.12a): ∃x∈N x + 1 ≤ 0, i.e. ¬(x + 1 > 0). (F) (1.21a)

Negation of (1.12b): ∃x∈R x ≤ 0, i.e. ¬(x > 0). (T) (1.21b)

Negation of (1.12c): ∀n∈N n ≤ 10, i.e. ¬(n > 10). (F) (1.21c)

Negation of (1.12d): ∀x∈R x² ≠ −1, i.e. ¬(x² = −1). (T) (1.21d)

Negation of (1.12e): ∃n∈N ∀m∈N m ≤ n, i.e. ¬(m > n). (F) (1.21e)
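On finite domains, rules (1.17) are exactly the duality of Python's `all` and `any`. A small check, with the domain as our own finite stand-in for N and the predicate n > 10 from (1.12c):

```python
D = range(1, 100)       # finite stand-in for N (our own sample domain)
P = lambda n: n > 10    # the predicate from (1.12c)

# (1.17a): ¬ ∀x P(x) ⇔ ∃x ¬P(x)
assert (not all(P(n) for n in D)) == any(not P(n) for n in D)
# (1.17b): ¬ ∃x P(x) ⇔ ∀x ¬P(x)
assert (not any(P(n) for n in D)) == all(not P(n) for n in D)
```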

(b) As a more complicated example, consider the negation of the uniqueness statement


(1.14), i.e. of (1.15):

¬∃!x∈A P(x) ⇔ ¬∃x∈A (P(x) ∧ ∀y∈A (P(y) ⇒ x = y))

⇔ ∀x∈A ¬(P(x) ∧ ∀y∈A (¬P(y) ∨ x = y))   (by (1.17b), Th. 1.11(a))

⇔ ∀x∈A (¬P(x) ∨ ¬∀y∈A (¬P(y) ∨ x = y))   (by Th. 1.11(i))

⇔ ∀x∈A (¬P(x) ∨ ∃y∈A ¬(¬P(y) ∨ x = y))   (by (1.17a))

⇔ ∀x∈A (¬P(x) ∨ ∃y∈A (P(y) ∧ x ≠ y))   (by Th. 1.11(j),(k))

⇔ ∀x∈A (P(x) ⇒ ∃y∈A (P(y) ∧ x ≠ y)).   (by Th. 1.11(a)) (1.22)

So how does one decode the expression we have obtained at the end? It states that if P(x) holds for some x ∈ A, then there must be at least a second, different element y ∈ A such that P(y) is true. This is, indeed, precisely the negation of ∃!x∈A P(x).

(c) Identical quantifiers commute:

∀x∈R ∀n∈N x²ⁿ ≥ 0 ⇔ ∀n∈N ∀x∈R x²ⁿ ≥ 0, (1.23a)

∀x∈R ∃y∈R ∃n∈N ny > x² ⇔ ∀x∈R ∃n∈N ∃y∈R ny > x². (1.23b)

(d) The following example shows that different quantifiers do, in general, not commute (i.e. do not yield equivalent statements when commuted): While the statement

∀x∈R ∃y∈R y > x (1.24a)

is true (for each real number x, there is a bigger real number y, e.g. y := x + 1 will do the job), the statement

∃y∈R ∀x∈R y > x (1.24b)

is false (for example, since y > y is false). In particular, (1.24a) and (1.24b) are not equivalent.
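The phenomenon of (1.24) can be observed on a finite domain, too, though not with the predicate y > x itself (a finite set has a maximum, so either both orders hold or neither does). As a sketch with our own sample domain, the predicate "y = x" separates the two quantifier orders:

```python
D = range(5)  # small finite domain (our own sample); predicate P(x, y): "y = x"

forall_exists = all(any(y == x for y in D) for x in D)  # ∀x ∃y  y = x
exists_forall = any(all(y == x for x in D) for y in D)  # ∃y ∀x  y = x

assert forall_exists      # true: given x, choose y := x
assert not exists_forall  # false: no single y equals every x
```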

(e) Even though (1.14) provides useful notation, it is better not to think of ∃! as a quantifier. It is really just an abbreviation for (1.15), and it behaves very differently from ∃ and ∀: The following examples show that, in general, ∃! commutes neither with ∃, nor with itself:

∃n∈N ∃!m∈N m < n ⇎ ∃!m∈N ∃n∈N m < n

(the statement on the left is true, as one can choose n = 2, but the statement on the right is false, as ∃n∈N m < n holds for every m ∈ N). Similarly,

∃!n∈N ∃!m∈N m < n ⇎ ∃!m∈N ∃!n∈N m < n


(the statement on the left is still true and the statement on the right is still false (there is no m ∈ N such that ∃!n∈N m < n)).

Remark 1.37. One can make the following observations regarding the strategy for proving universal and existential statements:

(a) To prove that ∀x∈A P(x) is true, one must check the truth of P(x) for every element x ∈ A – examples are not enough!

(b) To prove that ∀x∈A P(x) is false, it suffices to find one x ∈ A such that P(x) is false – such an x is then called a counterexample, and one counterexample is always enough to prove ∀x∈A P(x) is false!

(c) To prove that ∃x∈A P(x) is true, it suffices to find one x ∈ A such that P(x) is true – such an x is then called an example, and one example is always enough to prove ∃x∈A P(x) is true!

The subfield of mathematical logic dealing with quantified statements is called predicate calculus. In general, one does not restrict the quantified variables to range only over elements of sets (as we have done above). Again, we refer to [EFT07] for a deeper treatment of the subject.

As an application of quantified statements, let us generalize the notion of union and intersection:

Definition 1.38. Let I ≠ ∅ be a nonempty set, usually called an index set in the present context. For each i ∈ I, let Ai denote a set (some or all of the Ai can be identical).

(a) The intersection

⋂i∈I Ai := {x : ∀i∈I x ∈ Ai} (1.25a)

consists of all elements x that belong to every Ai.

(b) The union

⋃i∈I Ai := {x : ∃i∈I x ∈ Ai} (1.25b)

consists of all elements x that belong to at least one Ai. The union is called disjoint if, and only if, for each i, j ∈ I, i ≠ j implies Ai ∩ Aj = ∅.

Proposition 1.39. Let I ≠ ∅ be an index set, let M denote a set, and, for each i ∈ I, let Ai denote a set. The following set-theoretic rules hold:

(a) (⋂i∈I Ai) ∩ M = ⋂i∈I (Ai ∩ M).


(b) (⋃i∈I Ai) ∪ M = ⋃i∈I (Ai ∪ M).

(c) (⋂i∈I Ai) ∪ M = ⋂i∈I (Ai ∪ M).

(d) (⋃i∈I Ai) ∩ M = ⋃i∈I (Ai ∩ M).

(e) M \ ⋂i∈I Ai = ⋃i∈I (M \ Ai).

(f) M \ ⋃i∈I Ai = ⋂i∈I (M \ Ai).

Proof. We prove (c) and (e) and leave the remaining proofs as an exercise.

(c):

x ∈ (⋂i∈I Ai) ∪ M ⇔ x ∈ M ∨ ∀i∈I x ∈ Ai ⇔(∗) ∀i∈I (x ∈ Ai ∨ x ∈ M) ⇔ x ∈ ⋂i∈I (Ai ∪ M).

To justify the equivalence at (∗), we make use of Th. 1.11(b) and verify ⇒ and ⇐. For ⇒, note that the truth of x ∈ M implies x ∈ Ai ∨ x ∈ M is true for each i ∈ I. If x ∈ Ai is true for each i ∈ I, then x ∈ Ai ∨ x ∈ M is still true for each i ∈ I. To verify ⇐, note that the existence of i ∈ I such that x ∈ M implies the truth of x ∈ M ∨ ∀i∈I x ∈ Ai. If x ∈ M is false for each i ∈ I, then x ∈ Ai must be true for each i ∈ I, showing x ∈ M ∨ ∀i∈I x ∈ Ai is true also in this case.

(e):

x ∈ M \ ⋂i∈I Ai ⇔ x ∈ M ∧ ¬∀i∈I x ∈ Ai ⇔ x ∈ M ∧ ∃i∈I x ∉ Ai ⇔ ∃i∈I x ∈ M \ Ai ⇔ x ∈ ⋃i∈I (M \ Ai),

completing the proof. □
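Rules (e) and (f) of Prop. 1.39 can be checked on a small family of sets; the index set and the sets below are our own sample data:

```python
from functools import reduce

M = set(range(10))
family = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {3, 4, 5}}  # (A_i), I = {1, 2, 3}

big_cap = reduce(set.intersection, family.values())  # ⋂ A_i = {3}
big_cup = reduce(set.union, family.values())         # ⋃ A_i = {1,...,5}

# (e): M \ ⋂ A_i = ⋃ (M \ A_i)
assert M - big_cap == reduce(set.union, (M - A for A in family.values()))
# (f): M \ ⋃ A_i = ⋂ (M \ A_i)
assert M - big_cup == reduce(set.intersection, (M - A for A in family.values()))
```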

Example 1.40. We have the following identities of sets:

⋂x∈R N = N, (1.26a)

⋂n∈N {1, 2, . . . , n} = {1}, (1.26b)

⋃x∈R N = N, (1.26c)


⋃n∈N {1, 2, . . . , n} = N, (1.26d)

N \ ⋃n∈N {2n} = {1, 3, 5, . . . } = ⋂n∈N (N \ {2n}). (1.26e)

Comparing with the notation of Def. 1.38, in (1.26a), for example, we have I = R and Ai = N for each i ∈ I (where, in (1.26a), we have written x instead of i). Similarly, in (1.26b), we have I = N and An = {1, 2, . . . , n} for each n ∈ I.

2 Functions and Relations

2.1 Functions

Definition 2.1. Let A, B be sets. Given x ∈ A, y ∈ B, the set

(x, y) := {{x}, {x, y}} (2.1)

is called the ordered pair (often shortened to just pair) consisting of x and y. The set of all such pairs is called the Cartesian product A × B, i.e.

A × B := {(x, y) : x ∈ A ∧ y ∈ B}. (2.2)

Example 2.2. Let A be a set.

A× ∅ = ∅ × A = ∅, (2.3a)

{1, 2} × {1, 2, 3} = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)} (2.3b)

≠ {1, 2, 3} × {1, 2} = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}. (2.3c)

Also note that, for x ≠ y,

(x, y) = {{x}, {x, y}} ≠ {{y}, {x, y}} = (y, x). (2.4)

Definition 2.3. Given sets A, B, a function or map f is an assignment rule that assigns to each x ∈ A a unique y ∈ B. One then also writes f(x) for the element y. The set A is called the domain of f , denoted D(f), and B is called the range of f , denoted R(f). The information about a map f can be concisely summarized by the notation

f : A −→ B, x ↦ f(x), (2.5)

where x ↦ f(x) is called the assignment rule for f , f(x) is called the image of x, and x is called a preimage of f(x) (the image must be unique, but there might be several preimages). The set

graph(f) := {(x, y) ∈ A × B : y = f(x)} (2.6)


is called the graph of f (not to be confused with pictures visualizing the function f , which are also called graph of f). If one wants to be completely precise, then one identifies the function f with the ordered triple (A, B, graph(f)).

The set of all functions with domain A and range B is denoted by F(A, B) or B^A, i.e.

F(A, B) := B^A := {(f : A −→ B) : A = D(f) ∧ B = R(f)}. (2.7)

Caveat: Some authors reserve the word map for continuous functions, but we use function and map synonymously.

Definition 2.4. Let A,B be sets and f : A −→ B a function.

(a) If T is a subset of A, then

f(T ) := {f(x) ∈ B : x ∈ T} (2.8)

is called the image of T under f .

(b) If U is a subset of B, then

f−1(U) := {x ∈ A : f(x) ∈ U} (2.9)

is called the preimage or inverse image of U under f .

(c) f is called injective or one-to-one if, and only if, every y ∈ B has at most one preimage, i.e. if, and only if, the preimage of {y} has at most one element:

f injective ⇔ ∀y∈B (f⁻¹{y} = ∅ ∨ ∃!x∈A f(x) = y)
⇔ ∀x1,x2∈A (x1 ≠ x2 ⇒ f(x1) ≠ f(x2)). (2.10)

(d) f is called surjective or onto if, and only if, every element of the range of f has a preimage:

f surjective ⇔ ∀y∈B ∃x∈A y = f(x) ⇔ ∀y∈B f⁻¹{y} ≠ ∅. (2.11)

(e) f is called bijective if, and only if, f is injective and surjective.

Example 2.5. Examples of Functions:

f : {1, 2, 3, 4, 5} −→ {1, 2, 3, 4, 5}, f(x) := −x + 6, (2.12a)

g : N −→ N, g(n) := 2n, (2.12b)

h : N −→ {2, 4, 6, . . . }, h(n) := 2n, (2.12c)

h̃ : N −→ {2, 4, 6, . . . }, h̃(n) := n for n even, n + 1 for n odd, (2.12d)

G : N −→ R, G(n) := n/(n + 1), (2.12e)

F : P(N) −→ P(P(N)), F(A) := P(A). (2.12f)


Instead of f(x) := −x + 6 in (2.12a), one can also write x ↦ −x + 6, and analogously in the other cases. Also note that, in the strict sense, the functions g and h̃ are different, since their ranges are different (however, using Def. 2.4(a), they have the same image in the sense that g(N) = h̃(N)). Furthermore,

f({1, 2}) = {5, 4} = f⁻¹({1, 2}), h̃⁻¹({2, 4, 6}) = {1, 2, 3, 4, 5, 6}, (2.13)

f is bijective; g is injective, but not surjective; h is bijective; h̃ is surjective, but not injective. Can you figure out if G and F are injective and/or surjective?
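For maps between finite sets, Def. 2.4(c),(d) can be tested mechanically. The helpers below are our own sketch, applied to f and g from (2.12) (with g restricted to a finite initial segment of N, a restriction that is ours):

```python
def is_injective(f, domain):
    """Distinct arguments give distinct images."""
    images = [f(x) for x in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, codomain):
    """Every element of the codomain has a preimage."""
    return {f(x) for x in domain} == set(codomain)

A = {1, 2, 3, 4, 5}
f = lambda x: -x + 6  # (2.12a)
assert is_injective(f, A) and is_surjective(f, A, A)  # f is bijective

D = range(1, 21)                              # initial segment of N (ours)
g = lambda n: 2 * n                           # (2.12b)
assert is_injective(g, D)
assert not is_surjective(g, D, range(1, 41))  # odd numbers lack preimages
```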

Example 2.6. (a) For each nonempty set A, the map Id : A −→ A, Id(x) := x, is called the identity on A. If one needs to emphasize that Id operates on A, then one also writes IdA instead of Id. The identity is clearly bijective.

(b) Let A, B be nonempty sets. A map f : A −→ B is called constant if, and only if, there exists c ∈ B such that f(x) = c for each x ∈ A. In that case, one also writes f ≡ c, which can be read as “f is identically equal to c”. If f ≡ c, ∅ ≠ T ⊆ A, and U ⊆ B, then

f(T) = {c}, f⁻¹(U) = A for c ∈ U, f⁻¹(U) = ∅ for c ∉ U. (2.14)

f is injective if, and only if, A consists of precisely one element; f is surjective if, and only if, B = {c}.

(c) Given A ⊆ X, the map

ι : A −→ X, ι(x) := x, (2.15)

is called inclusion (also embedding or imbedding). An inclusion is always injective; it is surjective if, and only if, A = X, i.e. if, and only if, it is the identity on A.

(d) Given A ⊆ X and a map f : X −→ B, the map g : A −→ B, g(x) = f(x), is called the restriction of f to A; f is called the extension of g to X. In this situation, one also uses the notation f↾A for g (some authors prefer the notation f|A).

Theorem 2.7. Let f : A −→ B be a map, let ∅ ≠ I be an index set, and assume S, T, Si, i ∈ I, are subsets of A, whereas U, V, Ui, i ∈ I, are subsets of B. Then we have the following rules concerning functions and set-theoretic operations:

f(S ∩ T) ⊆ f(S) ∩ f(T), (2.16a)

f(⋂i∈I Si) ⊆ ⋂i∈I f(Si), (2.16b)

f(S ∪ T) = f(S) ∪ f(T), (2.16c)

f(⋃i∈I Si) = ⋃i∈I f(Si), (2.16d)

f⁻¹(U ∩ V) = f⁻¹(U) ∩ f⁻¹(V), (2.16e)

f⁻¹(⋂i∈I Ui) = ⋂i∈I f⁻¹(Ui), (2.16f)


f⁻¹(U ∪ V) = f⁻¹(U) ∪ f⁻¹(V), (2.16g)

f⁻¹(⋃i∈I Ui) = ⋃i∈I f⁻¹(Ui), (2.16h)

f(f⁻¹(U)) ⊆ U, f⁻¹(f(S)) ⊇ S, (2.16i)

f⁻¹(U \ V) = f⁻¹(U) \ f⁻¹(V). (2.16j)

Proof. We prove (2.16b) (which includes (2.16a) as a special case) and the second part of (2.16i), and leave the remaining cases as exercises.

For (2.16b), one argues

y ∈ f(⋂i∈I Si) ⇔ ∃x∈A ∀i∈I (x ∈ Si ∧ y = f(x)) ⇒ ∀i∈I y ∈ f(Si) ⇔ y ∈ ⋂i∈I f(Si).

The observation

x ∈ S ⇒ f(x) ∈ f(S) ⇔ x ∈ f⁻¹(f(S))

establishes the second part of (2.16i). □

It is an exercise to find counterexamples that show one cannot, in general, replace the four subset symbols in (2.16) by equalities (it is possible to find examples with sets that have at most 2 elements).
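One such two-element counterexample (our own; it does not spoil the exercise of finding the others) shows that the inclusion in (2.16a) can be strict: a map that collapses two points makes f(S ∩ T) smaller than f(S) ∩ f(T).

```python
# f : {1, 2} -> {0} with f(1) = f(2) = 0, encoded as a dict.
f = {1: 0, 2: 0}
S, T = {1}, {2}

image = lambda X: {f[x] for x in X}  # f(X) as in Def. 2.4(a)

assert image(S & T) == set()      # S ∩ T = ∅, hence f(S ∩ T) = ∅
assert image(S) & image(T) == {0} # but f(S) ∩ f(T) = {0}
```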

Definition 2.8. The composition of maps f and g with f : A −→ B, g : C −→ D, and f(A) ⊆ C is defined to be the map

g ◦ f : A −→ D, (g ◦ f)(x) := g(f(x)). (2.17)

The expression g ◦ f is read as “g after f” or “g composed with f”.

Example 2.9. Consider the maps

f : N −→ R, n ↦ n², (2.18a)

g : N −→ R, n ↦ 2n. (2.18b)

We obtain f(N) = {1, 4, 9, . . . } ⊆ D(g), g(N) = {2, 4, 6, . . . } ⊆ D(f), and the compositions

(g ◦ f) : N −→ R, (g ◦ f)(n) = g(n²) = 2n², (2.19a)

(f ◦ g) : N −→ R, (f ◦ g)(n) = f(2n) = 4n², (2.19b)

showing that composing functions is, in general, not commutative, even if the involved functions have the same domain and the same range.
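The compositions (2.19) can be written out directly; the lambda names are our own:

```python
f = lambda n: n ** 2  # (2.18a)
g = lambda n: 2 * n   # (2.18b)

g_after_f = lambda n: g(f(n))  # (g ∘ f)(n) = 2n², cf. (2.19a)
f_after_g = lambda n: f(g(n))  # (f ∘ g)(n) = 4n², cf. (2.19b)

assert g_after_f(3) == 18
assert f_after_g(3) == 36
# g ∘ f and f ∘ g disagree somewhere, so composition is not commutative:
assert any(g_after_f(n) != f_after_g(n) for n in range(1, 10))
```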

Proposition 2.10. Consider maps f : A −→ B, g : C −→ D, h : E −→ F , satisfying f(A) ⊆ C and g(C) ⊆ E.


(a) Associativity of Compositions:

h ◦ (g ◦ f) = (h ◦ g) ◦ f. (2.20)

(b) One has the following law for forming preimages:

∀W∈P(D) (g ◦ f)⁻¹(W) = f⁻¹(g⁻¹(W)). (2.21)

Proof. (a): Both h ◦ (g ◦ f) and (h ◦ g) ◦ f map A into F . So it just remains to prove (h ◦ (g ◦ f))(x) = ((h ◦ g) ◦ f)(x) for each x ∈ A. One computes, for each x ∈ A,

(h ◦ (g ◦ f))(x) = h((g ◦ f)(x)) = h(g(f(x))) = (h ◦ g)(f(x)) = ((h ◦ g) ◦ f)(x), (2.22)

establishing the case.

(b): Exercise. □

Definition 2.11. A function g : B −→ A is called a right inverse (resp. left inverse) of a function f : A −→ B if, and only if, f ◦ g = IdB (resp. g ◦ f = IdA). Moreover, g is called an inverse of f if, and only if, it is both a right and a left inverse. If g is an inverse of f , then one also writes f⁻¹ instead of g. The map f is called (right, left) invertible if, and only if, there exists a (right, left) inverse for f .

Example 2.12. (a) Consider the map

f : N −→ N, f(n) := 2n. (2.23a)

The maps

g1 : N −→ N, g1(n) := n/2 if n even, 1 if n odd, (2.23b)

g2 : N −→ N, g2(n) := n/2 if n even, 2 if n odd, (2.23c)

both constitute left inverses of f . It follows from Th. 2.13(c) below that f does not have a right inverse.

(b) Consider the map

f : N −→ N, f(n) := n/2 for n even, (n + 1)/2 for n odd. (2.24a)

The maps

g1 : N −→ N, g1(n) := 2n, (2.24b)

g2 : N −→ N, g2(n) := 2n − 1, (2.24c)

both constitute right inverses of f . It follows from Th. 2.13(c) below that f does not have a left inverse.


(c) The map

f : N −→ N, f(n) := n − 1 for n even, n + 1 for n odd, (2.25a)

is its own inverse, i.e. f⁻¹ = f . For the map

g : N −→ N, g(n) := 2 for n = 1, 3 for n = 2, 1 for n = 3, n for n ∉ {1, 2, 3}, (2.25b)

the inverse is

g⁻¹ : N −→ N, g⁻¹(n) := 3 for n = 1, 1 for n = 2, 2 for n = 3, n for n ∉ {1, 2, 3}. (2.25c)

While Examples 2.12(a),(b) show that left and right inverses are usually not unique, they are unique provided f is bijective (see Th. 2.13(c)).

Theorem 2.13. Let A, B be nonempty sets.

(a) f : A −→ B is right invertible if, and only if, f is surjective (where the implication “⇐” makes use of the axiom of choice (AC), see Appendix A.4).

(b) f : A −→ B is left invertible if, and only if, f is injective.

(c) f : A −→ B is invertible if, and only if, f is bijective. In this case, the right inverse and the left inverse are unique and both identical to the inverse.

Proof. (a): If f is surjective, then, for each y ∈ B, there exists xy ∈ f⁻¹{y} such that f(xy) = y. By AC, we can define the choice function

g : B −→ A, g(y) := xy. (2.26)

Then, for each y ∈ B, f(g(y)) = y, showing g is a right inverse of f . Conversely, if g : B −→ A is a right inverse of f , then, for each y ∈ B, one has y = f(g(y)), showing that g(y) ∈ A is a preimage of y, i.e. f is surjective.

(b): Fix a ∈ A. If f is injective, then, for each y ∈ B with f⁻¹{y} ≠ ∅, let xy denote the unique element in A satisfying f(xy) = y. Define

g : B −→ A, g(y) := xy for f⁻¹{y} ≠ ∅, a otherwise. (2.27)


Then, for each x ∈ A, g(f(x)) = x, showing g is a left inverse of f . Conversely, if g : B −→ A is a left inverse of f and x1, x2 ∈ A with f(x1) = f(x2) = y, then x1 = (g ◦ f)(x1) = g(f(x1)) = g(f(x2)) = (g ◦ f)(x2) = x2, showing y has precisely one preimage and f is injective.

The first part of (c) follows immediately by combining (a) and (b) (and, actually, without using AC, since, if f is both injective and surjective, then, for each y ∈ B, the element xy ∈ f⁻¹{y} is unique, and (2.26) can be defined without AC). It merely remains to verify the uniqueness of right and left inverse for bijective maps. So let g be a left inverse of f , let h be a right inverse of f , and let f⁻¹ be an inverse of f . Then, for each y ∈ B,

g(y) = (g ◦ (f ◦ f⁻¹))(y) = ((g ◦ f) ◦ f⁻¹)(y) = f⁻¹(y), (2.28a)

h(y) = ((f⁻¹ ◦ f) ◦ h)(y) = (f⁻¹ ◦ (f ◦ h))(y) = f⁻¹(y), (2.28b)

thereby proving the uniqueness of left and right inverse for bijective maps. □

Theorem 2.14. Consider maps f : A −→ B, g : B −→ C. If f and g are both injective (resp. both surjective, both bijective), then so is g ◦ f . Moreover, in the bijective case, one has

(g ◦ f)⁻¹ = f⁻¹ ◦ g⁻¹. (2.29)

Proof. Exercise. □

Definition 2.15. (a) Given an index set I and a set A, a map f : I −→ A is sometimes called a family (of elements in A), and is denoted in the form f = (ai)i∈I with ai := f(i). When using this representation, one often does not even specify f and A, especially if the ai are themselves sets.

(b) A sequence in a set A is a family of elements in A, where the index set is the set of natural numbers N. In this case, one writes (an)n∈N or (a1, a2, . . . ). More generally, a family is called a sequence, given a bijective map between the index set I and a subset of N.

(c) Given a family of sets (Ai)i∈I , we define the Cartesian product of the Ai to be the set of functions

∏i∈I Ai := {(f : I −→ ⋃j∈I Aj) : ∀i∈I f(i) ∈ Ai}. (2.30)

If I has precisely n elements with n ∈ N, then the elements of the Cartesian product ∏i∈I Ai are called (ordered) n-tuples, (ordered) triples for n = 3.

Example 2.16. (a) Using the notion of family, we can now say that the intersection ⋂i∈I Ai and union ⋃i∈I Ai as defined in Def. 1.38 are the intersection and union of the family of sets (Ai)i∈I , respectively. As a concrete example, let us revisit (1.26b), where we have

(An)n∈N, An := {1, 2, . . . , n}, ⋂n∈N An = {1}. (2.31)


(b) Examples of Sequences:

Sequence in {0, 1} : (1, 0, 1, 0, 1, 0, . . . ), (2.32a)

Sequence in N : (n²)n∈N = (1, 4, 9, 16, 25, . . . ), (2.32b)

Sequence in R : ((−1)ⁿ√n)n∈N = (−1, √2, −√3, . . . ), (2.32c)

Sequence in R : (1/n)n∈N = (1, 1/2, 1/3, . . . ), (2.32d)

Finite Sequence in P(N) : ({3, 2, 1}, {2, 1}, {1}, ∅). (2.32e)

(c) The Cartesian product ∏i∈I Ai, where all sets Ai = A, is the same as A^I, the set of all functions from I into A. So, for example, ∏n∈N R = R^N is the set of all sequences in R. If I = {1, 2, . . . , n} with n ∈ N, then

∏i∈I A = A^{1,2,...,n} =: ∏_{i=1}^{n} A =: A^n (2.33)

is the set of all n-tuples with entries from A.

In the following, we explain the common notation 2^A for the power set P(A) of a set A. It is related to a natural identification between subsets and their corresponding characteristic functions.

Definition 2.17. Let A be a set and let B ⊆ A be a subset of A. Then

χB : A −→ {0, 1}, χB(x) := 1 if x ∈ B, 0 if x ∉ B, (2.34)

is called the characteristic function of the set B (with respect to the universe A). One also finds the notations 1B and 𝟙B instead of χB (note that all the notations suppress the dependence of the characteristic function on the universe A).

Proposition 2.18. Let A be a set. Then the map

χ : P(A) −→ {0, 1}A, χ(B) := χB, (2.35)

is bijective (recall that P(A) denotes the power set of A and {0, 1}A denotes the set ofall functions from A into {0, 1}).

Proof. χ is injective: Let B, C ∈ P(A) with B ≠ C. By possibly switching the names of B and C, we may assume there exists x ∈ B such that x ∉ C. Then χ_B(x) = 1, whereas χ_C(x) = 0, showing χ(B) ≠ χ(C), proving χ is injective.

χ is surjective: Let f : A −→ {0, 1} be an arbitrary function and define B := {x ∈ A : f(x) = 1}. Then χ(B) = χ_B = f, proving χ is surjective. �

Proposition 2.18 allows one to identify the sets P(A) and {0, 1}^A via the bijective map χ. This fact, together with the common practice of set theory to identify the number 2 with the set {0, 1}, explains the notation 2^A for P(A).
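For a small finite universe, the bijection χ of Prop. 2.18 can be verified by brute force. The following Python sketch (the helper names chi and subset_from are ours, not from the notes) encodes each χ_B as a dict and recovers B from it:

```python
from itertools import combinations, product

A = {1, 2, 3}
# all 2^3 = 8 subsets of A
subsets = [set(c) for r in range(len(A) + 1) for c in combinations(sorted(A), r)]

def chi(B):
    """Characteristic function of B with respect to the universe A, as a dict."""
    return {x: (1 if x in B else 0) for x in A}

def subset_from(f):
    """Inverse of chi: recover the subset {x in A : f(x) = 1}."""
    return {x for x in A if f[x] == 1}

# every function A -> {0, 1}, listed explicitly
all_functions = [dict(zip(sorted(A), bits)) for bits in product((0, 1), repeat=len(A))]
```

Here χ is injective (distinct subsets give distinct dicts) and surjective (every 0/1-valued dict arises), matching the proof of Prop. 2.18.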


2.2 Relations

2.2.1 Definition and Properties

Definition 2.19. Given sets A and B, a relation is a subset R of A × B (if one wants to be completely precise, a relation is an ordered triple (A, B, R), where R ⊆ A × B). If A = B, then we call R a relation on A. One says that a ∈ A and b ∈ B are related according to the relation R if, and only if, (a, b) ∈ R. In this context, one usually writes a R b instead of (a, b) ∈ R.

Example 2.20. (a) The relations we are probably most familiar with are = and ≤. The relation R of equality, usually denoted =, makes sense on every nonempty set A:

R := ∆(A) := {(x, x) ∈ A × A : x ∈ A}. (2.36)

The set ∆(A) is called the diagonal of the Cartesian product, i.e., as a subset of A × A, the relation of equality is identical to the diagonal:

x = y ⇔ x R y ⇔ (x, y) ∈ R = ∆(A). (2.37)

Similarly, the relation ≤ on R is identical to the set

R_≤ := {(x, y) ∈ R^2 : x ≤ y}. (2.38)

(b) Every function f : A −→ B is a relation, namely the relation

R_f = {(x, y) ∈ A × B : y = f(x)} = graph(f). (2.39)

Conversely, if B ≠ ∅, then every relation R ⊆ A × B uniquely corresponds to the function

f_R : A −→ P(B), f_R(x) = {y ∈ B : x R y}. (2.40)
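In concrete computations, a relation R ⊆ A × B is conveniently stored as a set of pairs; the correspondence (2.40) then simply collects, for each x, all elements related to x. A small Python sketch (the names R and f_R are ours):

```python
A = {1, 2, 3}
B = {1, 4, 9}

# the graph of x |-> x^2, viewed as a relation R, cf. (2.39)
R = {(x, y) for x in A for y in B if y == x * x}

def f_R(x):
    """The induced function A -> P(B) from (2.40)."""
    return {y for y in B if (x, y) in R}
```

Since this particular R is the graph of a function, each value f_R(x) is a singleton.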

Definition 2.21. Let R be a relation on the set A.

(a) R is called reflexive if, and only if,

∀_{x∈A} x R x, (2.41)

i.e. if, and only if, every element is related to itself.

(b) R is called symmetric if, and only if,

∀_{x,y∈A} (x R y ⇒ y R x), (2.42)

i.e. if, and only if, each x is related to y if, and only if, y is related to x.

(c) R is called antisymmetric if, and only if,

∀_{x,y∈A} ((x R y ∧ y R x) ⇒ x = y), (2.43)

i.e. if, and only if, the only possibility for x to be related to y at the same time that y is related to x is in the case x = y.


(d) R is called transitive if, and only if,

∀_{x,y,z∈A} ((x R y ∧ y R z) ⇒ x R z), (2.44)

i.e. if, and only if, the relatedness of x and y together with the relatedness of y and z implies the relatedness of x and z.

Example 2.22. The relations = and ≤ on R (or N) are reflexive, antisymmetric, and transitive; = is also symmetric, whereas ≤ is not; < is antisymmetric (since x < y ∧ y < x is always false) and transitive, but neither reflexive nor symmetric. The relation

R := {(x, y) ∈ N^2 : (x, y are both even) ∨ (x, y are both odd)} (2.45)

on N is not antisymmetric, but reflexive, symmetric, and transitive. The relation

S := {(x, y) ∈ N^2 : y = x^2} (2.46)

is not transitive (for example, 2 S 4 and 4 S 16, but not 2 S 16), not reflexive, not symmetric; it is only antisymmetric.
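For relations on a finite set, the defining properties of Def. 2.21 can be checked mechanically. The following Python sketch (the predicate names are ours) tests the relations (2.45) and (2.46) on an initial segment of N:

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (u, z) in R if u == y)

N20 = range(1, 21)                                        # finite stand-in for N
R = {(x, y) for x in N20 for y in N20 if x % 2 == y % 2}  # relation (2.45)
S = {(x, y) for x in N20 for y in N20 if y == x * x}      # relation (2.46)
```

The checks reproduce the claims of Example 2.22 on this finite segment (e.g. (2, 4), (4, 16) ∈ S but (2, 16) ∉ S witnesses the failure of transitivity).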

2.2.2 Order Relations

Definition 2.23. A relation R on a set A is called a partial order if, and only if, R is reflexive, antisymmetric, and transitive. If R is a partial order, then one usually writes x ≤ y instead of x R y. A partial order ≤ is called a total or linear order if, and only if, for each x, y ∈ A, one has x ≤ y or y ≤ x.

Notation 2.24. Given a (partial or total) order ≤ on A ≠ ∅, we write x < y if, and only if, x ≤ y and x ≠ y, calling < the strict order corresponding to ≤ (note that the strict order is never a partial order).

Definition 2.25. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A.

(a) x ∈ A is called a lower (resp. upper) bound for B if, and only if, x ≤ b (resp. b ≤ x) for each b ∈ B. Moreover, B is called bounded from below (resp. from above) if, and only if, there exists a lower (resp. upper) bound for B; B is called bounded if, and only if, it is bounded from above and from below.

(b) x ∈ B is called minimum or just min (resp. maximum or max) of B if, and only if, x is a lower (resp. upper) bound for B. One writes x = min B if x is a minimum and x = max B if x is a maximum.

(c) A maximum of the set of lower bounds of B (i.e. a largest lower bound) is called infimum of B, denoted inf B; a minimum of the set of upper bounds of B (i.e. a smallest upper bound) is called supremum of B, denoted sup B.


Example 2.26. (a) For each A ⊆ R, the usual relation ≤ defines a total order on A. For A = R, we see that N has 0 and 1 as lower bounds with 1 = min N = inf N. On the other hand, N is unbounded from above. The set M := {1, 2, 3} is bounded with min M = 1, max M = 3. The positive real numbers R_+ := {x ∈ R : x > 0} have inf R_+ = 0, but they do not have a minimum (if x > 0, then 0 < x/2 < x).

(b) Consider A := N × N. Then

(m_1, m_2) ≤ (n_1, n_2) ⇔ m_1 ≤ n_1 ∧ m_2 ≤ n_2 (2.47)

defines a partial order on A that is not a total order (for example, neither (1, 2) ≤ (2, 1) nor (2, 1) ≤ (1, 2)). For the set

B := {(1, 1), (2, 1), (1, 2)}, (2.48)

we have inf B = min B = (1, 1), B does not have a max, but sup B = (2, 2) (if (m, n) ∈ A is an upper bound for B, then (2, 1) ≤ (m, n) implies 2 ≤ m and (1, 2) ≤ (m, n) implies 2 ≤ n, i.e. (2, 2) ≤ (m, n); since (2, 2) is clearly an upper bound for B, we have proved sup B = (2, 2)).

A different order on A is the so-called lexicographic order defined by

(m_1, m_2) ≤ (n_1, n_2) ⇔ m_1 < n_1 ∨ (m_1 = n_1 ∧ m_2 ≤ n_2). (2.49)

In contrast to the order from (2.47), the lexicographic order does define a total order on A.
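Both orders of Example 2.26(b) are easy to compare experimentally. The following Python sketch (the function names are ours) confirms that (2.47) leaves (1, 2) and (2, 1) incomparable, that the lexicographic order (2.49) is total on a finite grid, and that (2, 2) is the least upper bound of B from (2.48):

```python
def leq_cw(p, q):   # componentwise order (2.47)
    return p[0] <= q[0] and p[1] <= q[1]

def leq_lex(p, q):  # lexicographic order (2.49)
    return p[0] < q[0] or (p[0] == q[0] and p[1] <= q[1])

grid = [(m, n) for m in range(1, 6) for n in range(1, 6)]
B = [(1, 1), (2, 1), (1, 2)]  # the set from (2.48)

# upper bounds of B in the grid, with respect to the componentwise order
upper_bounds = [p for p in grid if all(leq_cw(b, p) for b in B)]
```

That (2, 2) is an upper bound below every other upper bound, yet not an element of B, mirrors the statement sup B = (2, 2) while max B does not exist.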

Lemma 2.27. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. Then the relation ≥, defined by

x ≥ y ⇔ y ≤ x, (2.50)

is also a partial order on A. Moreover, using obvious notation, we have, for each x ∈ A,

x ≤-lower bound for B ⇔ x ≥-upper bound for B, (2.51a)

x ≤-upper bound for B ⇔ x ≥-lower bound for B, (2.51b)

x = min_≤ B ⇔ x = max_≥ B, (2.51c)

x = max_≤ B ⇔ x = min_≥ B, (2.51d)

x = inf_≤ B ⇔ x = sup_≥ B, (2.51e)

x = sup_≤ B ⇔ x = inf_≥ B. (2.51f)

Proof. Reflexivity, antisymmetry, and transitivity of ≤ clearly imply the same properties for ≥, respectively. Moreover,

x ≤-lower bound for B ⇔ ∀_{b∈B} x ≤ b ⇔ ∀_{b∈B} b ≥ x ⇔ x ≥-upper bound for B,


proving (2.51a). Analogously, we obtain (2.51b). Next, (2.51c) and (2.51d) are implied by (2.51a) and (2.51b), respectively. Finally, (2.51e) is proved by

x = inf_≤ B ⇔ x = max_≤ {y ∈ A : y ≤-lower bound for B} ⇔ x = min_≥ {y ∈ A : y ≥-upper bound for B} ⇔ x = sup_≥ B,

and (2.51f) follows analogously. �

Proposition 2.28. Let ≤ be a partial order on A ≠ ∅, ∅ ≠ B ⊆ A. The elements max B, min B, sup B, inf B are all unique, provided they exist.

Proof. Exercise. �

Definition 2.29. Let A, B be nonempty sets with partial orders, both denoted by ≤ (even though they might be different). A function f : A −→ B is called (strictly) isotone, order-preserving, or increasing if, and only if,

∀_{x,y∈A} (x < y ⇒ f(x) ≤ f(y) (resp. f(x) < f(y))); (2.52a)

f is called (strictly) antitone, order-reversing, or decreasing if, and only if,

∀_{x,y∈A} (x < y ⇒ f(x) ≥ f(y) (resp. f(x) > f(y))). (2.52b)

Functions that are (strictly) isotone or antitone are called (strictly) monotone.

Proposition 2.30. Let A, B be nonempty sets with partial orders, both denoted by ≤.

(a) A (strictly) isotone function f : A −→ B becomes a (strictly) antitone function and vice versa if precisely one of the relations ≤ is replaced by ≥.

(b) If the order ≤ on A is total and f : A −→ B is strictly isotone or strictly antitone, then f is one-to-one.

(c) If the order ≤ on A is total and f : A −→ B is invertible and strictly isotone (resp. antitone), then f^{-1} is also strictly isotone (resp. antitone).

Proof. (a) is immediate from (2.52).

(b): Due to (a), it suffices to consider the case that f is strictly isotone. If f is strictly isotone and x ≠ y, then x < y or y < x, since the order on A is total. Thus, f(x) < f(y) or f(y) < f(x), i.e. f(x) ≠ f(y) in every case, showing f is one-to-one.

(c): Again, due to (a), it suffices to consider the isotone case. If u, v ∈ B are such that u < v, then u = f(f^{-1}(u)), v = f(f^{-1}(v)), and the strict isotonicity of f imply f^{-1}(u) < f^{-1}(v) (we are using that the order on A is total; otherwise, f^{-1}(u) and f^{-1}(v) need not be comparable). �

Example 2.31. (a) f : N −→ N, f(n) := 2n, is strictly increasing; every constant map on N is both increasing and decreasing, but not strictly increasing or decreasing. All maps occurring in (2.25) are neither increasing nor decreasing.


(b) The map f : R −→ R, f(x) := −2x, is invertible and strictly decreasing, and so is f^{-1} : R −→ R, f^{-1}(x) := −x/2.

(c) The following counterexamples show that the assertions of Prop. 2.30(b),(c) are no longer correct if one does not assume the order on A is total. Let A be the set from (2.48) (where it had been called B) with the (nontotal) order from (2.47). The map

f : A −→ N, f(1, 1) := 1, f(1, 2) := 2, f(2, 1) := 2,

is strictly isotone, but not one-to-one. The map

f : A −→ {1, 2, 3}, f(1, 1) := 1, f(1, 2) := 2, f(2, 1) := 3,

is strictly isotone and invertible, however f^{-1} is not isotone (since 2 < 3, but f^{-1}(2) = (1, 2) and f^{-1}(3) = (2, 1) are not comparable, i.e. f^{-1}(2) ≤ f^{-1}(3) is not true).
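The first counterexample of Example 2.31(c) can be replayed directly in Python (the helper strictly_isotone is ours): on the nontotal order (2.47), the incomparable pair (1, 2), (2, 1) may share the value 2 without violating strict isotonicity.

```python
def leq_cw(p, q):  # the (nontotal) componentwise order (2.47)
    return p[0] <= q[0] and p[1] <= q[1]

A = [(1, 1), (2, 1), (1, 2)]            # the set from (2.48)
f = {(1, 1): 1, (1, 2): 2, (2, 1): 2}   # strictly isotone, but not one-to-one

def strictly_isotone(f, A, leq):
    """Check (2.52a) with strict inequalities over all comparable pairs p < q."""
    return all(f[p] < f[q] for p in A for q in A if p != q and leq(p, q))
```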

2.2.3 Equivalence Relations

Definition 2.32. Let R be a relation on a set A.

(a) R is called an equivalence relation if, and only if, R is reflexive, symmetric, and transitive. If R is an equivalence relation, then one often writes x ∼ y instead of x R y.

(b) Let ∼ := R be an equivalence relation on A. For each x ∈ A, define

[x] := {y ∈ A : x ∼ y} (2.53)

and call [x] the equivalence class of x. Moreover, each y ∈ [x] is called a representative of [x]. The set of all equivalence classes A/∼ := {[x] : x ∈ A} is called the quotient set of A by ∼, and the map

π : A −→ A/∼, x ↦ [x], (2.54)

is called the corresponding quotient map, canonical map, or canonical projection.

The following Th. 2.33 shows that the equivalence classes of an equivalence relation on a nonempty set A decompose A into disjoint sets and, conversely, that a given decomposition of a nonempty set A into disjoint nonempty sets A_i, i ∈ I, gives rise to a unique equivalence relation ∼ on A such that the A_i are precisely the equivalence classes corresponding to ∼:


Theorem 2.33. Let A be a nonempty set.

(a) Given a disjoint union A = ⋃̇_{i∈I} A_i with every A_i ≠ ∅ (a so-called decomposition of A), an equivalence relation on A is defined by

x ∼ y :⇔ ∃_{i∈I} (x ∈ A_i ∧ y ∈ A_i). (2.55)

Moreover, for the equivalence classes given by ∼, one then has

∀_{x∈A} ∀_{i∈I} (x ∈ A_i ⇔ A_i = [x]). (2.56)

(b) Given an equivalence relation ∼ on a nonempty set A, the equivalence classes given by ∼ form a decomposition of A: One has

∀_{x,y∈A} (([x] = [y] ⇔ x ∼ y) ∧ ([x] ∩ [y] = ∅ ⇔ ¬(x ∼ y))) (2.57)

and

A = ⋃_{i∈I} A_i, (2.58)

where I := A/∼ is the quotient set of Def. 2.32(b) and A_i := i for each i ∈ I.

Proof. (a): That ∼ is symmetric is immediate from (2.55). If x ∈ A, then, as A is the union of the A_i, there exists i ∈ I with x ∈ A_i, showing x ∼ x, i.e. ∼ is reflexive. If x, y, z ∈ A with x ∼ y and y ∼ z, then there exist i, j ∈ I with x, y ∈ A_i and y, z ∈ A_j. Then y ∈ A_i ∩ A_j, implying i = j (as the union is disjoint) and x ∼ z, showing ∼ to be transitive as well. Thus, we have shown ∼ to be an equivalence relation. Now consider x ∈ A and i ∈ I. If A_i = [x], then x ∼ x implies x ∈ [x] = A_i. Conversely, assume x ∈ A_i. Then, by (2.55),

y ∈ A_i ⇔ x ∼ y ⇔ y ∈ [x],

proving A_i = [x]. Hence, we have verified (2.56) and (a).

(b): Let x, y ∈ A. If [x] = [y], then (as y ∼ y) y ∈ [y] = [x], implying x ∼ y. Conversely, assume x ∼ y. Then z ∈ [y] implies y ∼ z, implying x ∼ z (since ∼ is transitive) and, thus, z ∈ [x] and [y] ⊆ [x]. From this, and the symmetry of ∼, we also obtain that x ∼ y implies y ∼ x, which implies [x] ⊆ [y]. Altogether, we have [x] = [y]. Thus, we have established the first equivalence of (2.57). In consequence, we also have

[x] ≠ [y] ⇔ ¬(x ∼ y).

To prove the second equivalence of (2.57), we now show [x] ≠ [y] ⇔ [x] ∩ [y] = ∅: If [x] ∩ [y] = ∅, then [x] = [y] could only hold for [x] = [y] = ∅. However, x ∈ [x] and y ∈ [y], showing [x] ≠ [y]. For the converse, we argue via contraposition and assume z ∈ [x] ∩ [y]. Then x ∼ z and y ∼ z and, by symmetry and transitivity of ∼, x ∼ y and [x] = [y], proving the second equivalence of (2.57). It remains to verify (2.58). From (2.57), we know the elements of A/∼ to be disjoint. On the other hand, if x ∈ A, then x ∈ [x] ∈ A/∼, showing A to be the union of the A_i, thereby proving (2.58) and the theorem. �


Example 2.34. (a) The equality relation = is an equivalence relation on each A ≠ ∅, where, for each x ∈ A, one has [x] = {x}.

(b) The relation R defined in (2.45) is an equivalence relation on N. Here, R yields precisely two equivalence classes, one consisting of all even numbers and one consisting of all odd numbers.
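Theorem 2.33(b) in action: the following Python sketch (the helper name quotient is ours) computes the equivalence classes of the parity relation (2.45) on a finite segment of N and checks that they form a decomposition.

```python
def quotient(A, related):
    """Return the quotient set A/~ as a list of equivalence classes, cf. Def. 2.32(b)."""
    classes = []
    for x in A:
        cls = frozenset(y for y in A if related(x, y))
        if cls not in classes:
            classes.append(cls)
    return classes

N10 = range(1, 11)
parity_classes = quotient(N10, lambda x, y: x % 2 == y % 2)
```

As predicted by Example 2.34(b), exactly two classes appear: the even and the odd numbers.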

Remark 2.35. If ∼ is an equivalence relation on a nonempty set A, then, clearly, the quotient map π : A −→ A/∼, x ↦ [x], is always surjective. It is injective (and, thus, bijective) if, and only if, every equivalence class has precisely one element, i.e. if, and only if, ∼ is the equality relation =.

Example 2.36. (a) An important application of equivalence relations and quotient sets is the construction of the set Z = {. . . , −2, −1, 0, 1, 2, . . . } of integers from the set N0 of natural numbers (including 0) of Def. 1.28: One actually defines Z as the quotient set with respect to the following equivalence relation ∼ on N0 × N0, where the relation ∼ on N0 × N0 is defined¹ by

(a, b) ∼ (c, d) :⇔ a + d = b + c. (2.59)

We verify that (2.59) does, indeed, define an equivalence relation on N0 × N0: If a, b ∈ N0, then a + b = b + a shows (a, b) ∼ (a, b), proving ∼ to be reflexive. If a, b, c, d ∈ N0, then

(a, b) ∼ (c, d) ⇒ a + d = b + c ⇒ c + b = d + a ⇒ (c, d) ∼ (a, b),

proving ∼ to be symmetric. If a, b, c, d, e, f ∈ N0, then

(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ a + d = b + c ∧ c + f = d + e ⇒ a + d + c + f = b + c + d + e ⇒ a + f = b + e ⇒ (a, b) ∼ (e, f),

proving ∼ to be transitive and an equivalence relation. Thus, we can, indeed, define

Z := (N0 × N0)/∼ = {[(a, b)] : (a, b) ∈ N0 × N0}. (2.60)

To simplify notation, in the following, we will write

[a, b] := [(a, b)] (2.61)

for the equivalence class of (a, b) with respect to ∼. The map

ι : N0 −→ Z, ι(n) := [n, 0], (2.62)

¹In (2.59), we employ the usual addition on N0. Its mathematically precise definition is actually somewhat tedious and, at this stage, we do not even have all the necessary prerequisites in place, yet. Still, it should be possible to follow the present example, using one's intuitive, informal understanding of the addition on N0. The precise definition can be found in [Phi16, Sec. D.1] and interested readers might want to study the definition in [Phi16, Sec. D.1] once we have introduced the notions of induction and recursion.


is injective (since ι(m) = [m, 0] = ι(n) = [n, 0] implies m + 0 = 0 + n, i.e. m = n). It is customary to identify N0 with ι(N0), as it usually does not cause any confusion. One then just writes n instead of [n, 0] and −n instead of [0, n] = −[n, 0] (we will come back to the addition on Z later and then this equation will make more sense, cf. Th. 4.15).
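The relation (2.59) and the identification of [n, 0] with n and [0, n] with −n can be simulated in Python; the helper canonical (our name, not from the notes) picks the representative of the form (n, 0) or (0, n) from each class.

```python
def related(p, q):
    """(a, b) ~ (c, d) iff a + d = b + c, cf. (2.59)."""
    (a, b), (c, d) = p, q
    return a + d == b + c

def canonical(p):
    """The representative of [a, b] of the form (n, 0) or (0, n)."""
    a, b = p
    return (a - b, 0) if a >= b else (0, b - a)
```

For instance, (5, 2) and (3, 0) lie in the same class, corresponding to the integer 3, while (2, 5) corresponds to −3.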

(b) Having constructed the set of integers Z in (a), in a next step, one can perform a similar construction to obtain the set of rational numbers Q. One defines Q as the quotient set with respect to the following equivalence relation ∼ on Z × (Z \ {0}), where the relation ∼ on Z × (Z \ {0}) is defined² by

(a, b) ∼ (c, d) :⇔ a · d = b · c. (2.63)

Noting that (2.63) is precisely the same as (2.59) if + is replaced by ·, the proof from (a) also shows that (2.63) does, indeed, define an equivalence relation on Z × (Z \ {0}): One merely replaces each + with · and each N0 with Z or Z \ {0}, respectively. The only modification needed occurs for 0 ∈ {a, c, e} in the proof of transitivity (in this case, the proof of (a) yields adcf = 0 = bcde, which does not imply af = be), where one now argues, for a = 0,

(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ ad = 0 = bc ∧ cf = de ⇒ c = 0 (as b ≠ 0) ⇒ e = 0 (as d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f),

for c = 0,

(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ ad = 0 = bc ∧ cf = 0 = de ⇒ a = e = 0 (as d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f),

and, for e = 0,

(a, b) ∼ (c, d) ∧ (c, d) ∼ (e, f) ⇒ ad = bc ∧ cf = 0 = de ⇒ c = 0 (as f ≠ 0) ⇒ a = 0 (as d ≠ 0) ⇒ af = 0 = be ⇒ (a, b) ∼ (e, f).

Thus, we can, indeed, define

Q := (Z × (Z \ {0}))/∼ = {[(a, b)] : (a, b) ∈ Z × (Z \ {0})}. (2.64)

As is common, we will write

a/b := [(a, b)] (2.65)

²In (2.63), we employ the usual multiplication on Z, as we used addition on N0 in (2.59) above, this time appealing to the reader's intuitive, informal understanding of multiplication on Z. We will actually provide a mathematically precise definition of multiplication on Z later in this class in Ex. 4.32, making use of the definition of Z given in (a) (readers who do not want to wait can, e.g., consult [Phi16, Def. D.16]).


for the equivalence class of (a, b) with respect to ∼. The map

ι : Z −→ Q, ι(k) := k/1, (2.66)

is injective (since ι(k) = k/1 = ι(l) = l/1 implies k · 1 = l · 1, i.e. k = l). It is customary to identify Z with ι(Z), as it usually does not cause any confusion. One then just writes k instead of k/1.
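Analogously to (a), the relation (2.63) can be explored in Python; reducing a pair to lowest terms with a positive second component (helper canonical, our name) picks one representative per class, just as fractions are usually written in lowest terms.

```python
from math import gcd

def related(p, q):
    """(a, b) ~ (c, d) iff a*d = b*c, cf. (2.63); b and d must be nonzero."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def canonical(p):
    """Reduce (a, b) to lowest terms with positive second component."""
    a, b = p
    g = gcd(a, b) if b > 0 else -gcd(a, b)
    return (a // g, b // g)
```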

While the set of real numbers R can now also be constructed from the set Q, making use of the notions of equivalence relation and quotient set, the construction is more complicated and it also makes use of additional notions from the field of Analysis. Thus, this construction is not within the scope of this class and the interested reader is referred to [Phi16, Sec. D.5].

3 Natural Numbers, Induction, and the Size of Sets

3.1 Induction and Recursion

One of the most useful proof techniques is the method of induction: it is used in situations where one needs to verify the truth of statements φ(n) for each n ∈ N, i.e. the truth of the statement

∀_{n∈N} φ(n). (3.1)

Induction is based on the fact that N satisfies the so-called Peano axioms:

P1: N contains a special element called one, denoted 1.

P2: There exists an injective map S : N −→ N \ {1}, called the successor function (for each n ∈ N, S(n) is called the successor of n).

P3: If a subset A of N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A, then A is equal to N. Written as a formula, the third axiom is:

∀_{A∈P(N)} (1 ∈ A ∧ S(A) ⊆ A ⇒ A = N).

Remark 3.1. In Def. 1.28, we had introduced the natural numbers N := {1, 2, 3, . . . }. The successor function is S(n) = n + 1. In axiomatic set theory, one starts with the Peano axioms and shows that the axioms of set theory allow the construction of a set N which satisfies the Peano axioms. One then defines 2 := S(1), 3 := S(2), . . . , n + 1 := S(n). The interested reader can find more details in [Phi16, Sec. D.1].

Theorem 3.2 (Principle of Induction). Suppose, for each n ∈ N, φ(n) is a statement (i.e. a predicate of n in the language of Def. 1.31). If (a) and (b) both hold, where


(a) φ(1) is true,

(b) ∀_{n∈N} (φ(n) ⇒ φ(n + 1)),

then (3.1) is true, i.e. φ(n) is true for every n ∈ N.

Proof. Let A := {n ∈ N : φ(n)}. We have to show A = N. Since 1 ∈ A by (a), and

n ∈ A ⇒ φ(n) ⇒ φ(n + 1) ⇒ S(n) = n + 1 ∈ A (3.2)

(the second implication by (b)), i.e. S(A) ⊆ A, the Peano axiom P3 implies A = N. �

Remark 3.3. Proving some φ(n) for each n ∈ N by induction according to Th. 3.2 consists of the following two steps:

(a) Prove φ(1), the so-called base case.

(b) Perform the inductive step, i.e. prove that φ(n) (the induction hypothesis) implies φ(n + 1).

Example 3.4. We use induction to prove the statement

∀_{n∈N} (1 + 2 + · · · + n = n(n + 1)/2) (3.3)

(the statement in parentheses being φ(n)):

Base Case (n = 1): 1 = (1 · 2)/2, i.e. φ(1) is true.

Induction Hypothesis: Assume φ(n), i.e. 1 + 2 + · · · + n = n(n + 1)/2 holds.

Induction Step: One computes, using φ(n),

1 + 2 + · · · + n + (n + 1) = n(n + 1)/2 + n + 1 = (n(n + 1) + 2n + 2)/2 = (n^2 + 3n + 2)/2 = (n + 1)(n + 2)/2, (3.4)

i.e. φ(n + 1) holds and the induction is complete.
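A quick numerical sanity check of (3.3) is easy to run (it is, of course, no substitute for the induction proof, which covers all n ∈ N at once):

```python
def gauss(n):
    """Closed form n(n+1)/2 from (3.3)."""
    return n * (n + 1) // 2

# compare against the literal sum 1 + 2 + ... + n for the first 200 values
checked = all(sum(range(1, n + 1)) == gauss(n) for n in range(1, 201))
```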

Corollary 3.5. Theorem 3.2 remains true if (b) is replaced by

∀_{n∈N} ((∀_{1≤m≤n} φ(m)) ⇒ φ(n + 1)). (3.5)

Proof. If, for each n ∈ N, we use ψ(n) to denote ∀_{1≤m≤n} φ(m), then (3.5) is equivalent to ∀_{n∈N} (ψ(n) ⇒ ψ(n + 1)), i.e. to Th. 3.2(b) with φ replaced by ψ. Thus, Th. 3.2 implies ψ(n) holds true for each n ∈ N, i.e. φ(n) holds true for each n ∈ N. �


Corollary 3.6. Let I be an index set. Suppose, for each i ∈ I, φ(i) is a statement. If there is a bijective map f : N −→ I and (a) and (b) both hold, where

(a) φ(f(1)) is true,

(b) ∀_{n∈N} (φ(f(n)) ⇒ φ(f(n + 1))),

then φ(i) is true for every i ∈ I.

Finite Induction: The above assertion remains true if f : {1, . . . , m} −→ I is bijective for some m ∈ N and N in (b) is replaced by {1, . . . , m − 1}.

Proof. If, for each n ∈ N, we use ψ(n) to denote φ(f(n)), then Th. 3.2 shows ψ(n) is true for every n ∈ N. Given i ∈ I, we have n := f^{-1}(i) ∈ N with f(n) = i, showing that φ(i) = φ(f(n)) = ψ(n) is true.

For the finite induction, let ψ(n) denote (n ≤ m ∧ φ(f(n))) ∨ n > m. Then, for 1 ≤ n < m, we have ψ(n) ⇒ ψ(n + 1) due to (b). For n ≥ m, we also have ψ(n) ⇒ ψ(n + 1) due to n ≥ m ⇒ n + 1 > m. Thus, Th. 3.2 shows ψ(n) is true for every n ∈ N. Given i ∈ I, we have n := f^{-1}(i) ∈ {1, . . . , m} with f(n) = i. Since n ≤ m ∧ ψ(n) ⇒ φ(f(n)), we obtain that φ(i) is true. �

Apart from providing a widely employable proof technique, the most important application of Th. 3.2 is the possibility to define sequences (i.e. functions with domain N, cf. Def. 2.15(b)) inductively, using so-called recursion:

Theorem 3.7 (Recursion Theorem). Let A be a nonempty set and x ∈ A. Given a sequence of functions (f_n)_{n∈N}, where f_n : A^n −→ A, there exists a unique sequence (x_n)_{n∈N} in A satisfying the following two conditions:

(i) x_1 = x.

(ii) ∀_{n∈N} x_{n+1} = f_n(x_1, . . . , x_n).

The same holds if N is replaced by an index set I as in Cor. 3.6.

Proof. To prove uniqueness, let (x_n)_{n∈N} and (y_n)_{n∈N} be sequences in A, both satisfying (i) and (ii), i.e.

x_1 = y_1 = x and (3.6a)

∀_{n∈N} (x_{n+1} = f_n(x_1, . . . , x_n) ∧ y_{n+1} = f_n(y_1, . . . , y_n)). (3.6b)

We prove by induction (in the form of Cor. 3.5) that (x_n)_{n∈N} = (y_n)_{n∈N}, i.e. that

∀_{n∈N} x_n = y_n, (3.7)

where we denote the statement x_n = y_n by φ(n):


Base Case (n = 1): φ(1) is true according to (3.6a).

Induction Hypothesis: Assume φ(m) for each m ∈ {1, . . . , n}, i.e. x_m = y_m holds for each m ∈ {1, . . . , n}.

Induction Step: One computes

x_{n+1} = f_n(x_1, . . . , x_n) = f_n(y_1, . . . , y_n) = y_{n+1} (3.8)

(the first and third equalities by (3.6b), the second by φ(1), . . . , φ(n)), i.e. φ(n + 1) holds and the induction is complete.

To prove existence, we have to show that there is a function F : N −→ A such that the following two conditions hold:

F(1) = x, (3.9a)

∀_{n∈N} F(n + 1) = f_n(F(1), . . . , F(n)). (3.9b)

To this end, let

𝓕 := {B ⊆ N × A : (1, x) ∈ B ∧ ∀_{n∈N, (1,a_1),...,(n,a_n)∈B} (n + 1, f_n(a_1, . . . , a_n)) ∈ B} (3.10)

and

G := ⋂_{B∈𝓕} B. (3.11)

Note that G is well-defined, as N × A ∈ 𝓕. Also, clearly, G ∈ 𝓕. We would like to define F such that G = graph(F). For this to be possible, we will show, by induction,

∀_{n∈N} ∃!_{x_n∈A} (n, x_n) ∈ G (3.12)

(the statement ∃!_{x_n∈A} (n, x_n) ∈ G being φ(n)).

Base Case (n = 1): From the definition of G, we know (1, x) ∈ G. If (1, a) ∈ G with a ≠ x, then H := G \ {(1, a)} ∈ 𝓕, implying G ⊆ H in contradiction to (1, a) ∉ H. This shows a = x and proves φ(1).

Induction Hypothesis: Assume φ(m) for each m ∈ {1, . . . , n}.

Induction Step: From the induction hypothesis, we know

∃!_{(x_1,...,x_n)∈A^n} (1, x_1), . . . , (n, x_n) ∈ G.

Thus, if we let x_{n+1} := f_n(x_1, . . . , x_n), then (n + 1, x_{n+1}) ∈ G by the definition of G. If (n + 1, a) ∈ G with a ≠ x_{n+1}, then H := G \ {(n + 1, a)} ∈ 𝓕 (using the uniqueness of the (1, x_1), . . . , (n, x_n) ∈ G), implying G ⊆ H in contradiction to (n + 1, a) ∉ H. This shows a = x_{n+1}, proves φ(n + 1), and completes the induction.

Due to (3.12), we can now define F : N −→ A, F(n) := x_n, and the definition of G then guarantees the validity of (3.9). �


Example 3.8. In many applications of Th. 3.7, one has functions g_n : A −→ A and uses

∀_{n∈N} (f_n : A^n −→ A, f_n(x_1, . . . , x_n) := g_n(x_n)). (3.13)

Here are some important concrete examples:

(a) The factorial function F : N0 −→ N, n ↦ n!, is defined recursively by

0! := 1, 1! := 1, ∀_{n∈N} (n + 1)! := (n + 1) · n!, (3.14a)

i.e. we have A = N and g_n(x) := (n + 1) · x. So we obtain

(n!)_{n∈N0} = (1, 1, 2, 6, 24, 120, . . . ). (3.14b)
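The recursion (3.14a) translates directly into code; the following Python sketch mirrors the step g_n(x) = (n + 1) · x from the text:

```python
def factorial(n):
    """n! via the recursion 0! = 1, (n+1)! = (n+1) * n!, cf. (3.14a)."""
    result = 1                 # base case 0! = 1
    for k in range(1, n + 1):  # step: apply g_{k-1}(x) = k * x
        result = k * result
    return result
```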

(b) Summation Symbol: On A = R (or, more generally, on every set A, where an addition + : A × A −→ A is defined), define recursively, for each given (possibly finite) sequence (a_1, a_2, . . . ) in A:

∑_{i=1}^{1} a_i := a_1, ∑_{i=1}^{n+1} a_i := a_{n+1} + ∑_{i=1}^{n} a_i for n ≥ 1, (3.15a)

i.e. we have

g_n : A −→ A, g_n(x) := x + a_{n+1}. (3.15b)

In (3.15a), one can also use other symbols for i, except a and n; for a finite sequence, n needs to be less than the maximal index of the finite sequence.

More generally, if I is an index set and φ : {1, . . . , n} −→ I a bijective map, then define

∑_{i∈I} a_i := ∑_{i=1}^{n} a_{φ(i)}. (3.15c)

The commutativity of addition implies that the definition in (3.15c) is actually independent of the chosen bijective map φ (cf. Th. B.4 in the Appendix). Also define

∑_{i∈∅} a_i := 0 (3.15d)

(for a general A, 0 is meant to be an element such that a + 0 = 0 + a = a for each a ∈ A and we can even define this if 0 ∉ A).

(c) Product Symbol: On A = R (or, more generally, on every set A, where a multiplication · : A × A −→ A is defined), define recursively, for each given (possibly finite) sequence (a_1, a_2, . . . ) in A:

∏_{i=1}^{1} a_i := a_1, ∏_{i=1}^{n+1} a_i := a_{n+1} · ∏_{i=1}^{n} a_i for n ≥ 1, (3.16a)

i.e. we have

g_n : A −→ A, g_n(x) := a_{n+1} · x. (3.16b)

In (3.16a), one can also use other symbols for i, except a and n; for a finite sequence, n needs to be less than the maximal index of the finite sequence.

More generally, if I is an index set and φ : {1, . . . , n} −→ I a bijective map, then define

∏_{i∈I} a_i := ∏_{i=1}^{n} a_{φ(i)}. (3.16c)

The commutativity of multiplication implies that the definition in (3.16c) is actually independent of the chosen bijective map φ (cf. Th. B.4 in the Appendix); however, we will see later that, for a general multiplication on a set A, commutativity will not always hold (an important example will be matrix multiplication), and, in that case, the definition in (3.16c) does, in general, depend on the chosen bijective map φ. Also define

∏_{i∈∅} a_i := 1 (3.16d)

(for a general A, 1 is meant to be an element such that a · 1 = 1 · a = a for each a ∈ A and we can even define this if 1 ∉ A).

Example 3.9. As an (academic) example, where, in each step, the recursive definition does depend on all previously computed values, consider the sequence (x_n)_{n∈N}, defined by

x_1 := 1, ∀_{n∈N} x_{n+1} := (1/n) ∏_{i=1}^{n} x_i,

i.e. by setting A := Q (note that the values do not remain in N) and

f_n : A^n −→ A, f_n(x_1, . . . , x_n) := (1/n) ∏_{i=1}^{n} x_i.

One obtains

x_1 = 1, x_2 = f_1(1) = 1, x_3 = f_2(1, 1) = 1/2, x_4 = f_3(1, 1, 1/2) = 1/6, x_5 = f_4(1, 1, 1/2, 1/6) = 1/48, x_6 = f_5(1, 1, 1/2, 1/6, 1/48) = 1/2880, . . .

In the above recursive definitions, we have always explicitly specified A and the g_n or f_n. However, in the literature as well as in the rest of this class, most of the time, the g_n or f_n are not provided explicitly.
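The sequence of Example 3.9 can be reproduced with exact rational arithmetic; we use Python's fractions module (the function name first_terms is ours):

```python
from fractions import Fraction

def first_terms(count):
    """First `count` terms of x_1 = 1, x_{n+1} = (1/n) * (x_1 * ... * x_n)."""
    xs = [Fraction(1)]
    while len(xs) < count:
        n = len(xs)
        prod = Fraction(1)
        for x in xs:           # product of all previously computed values
            prod *= x
        xs.append(prod / n)    # f_n(x_1, ..., x_n)
    return xs
```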


3.2 Cardinality: The Size of Sets

Cardinality measures the size of sets. For a finite set A, it is precisely the number of elements in A. For an infinite set, it classifies the set's degree or level of infinity (it turns out that not all infinite sets have the same size).

3.2.1 Definition and General Properties

Definition 3.10. (a) The sets A, B are defined to have the same cardinality or the same size if, and only if, there exists a bijective map ϕ : A −→ B. According to Th. 3.11 below, this defines an equivalence relation on every set of sets.

(b) The cardinality of a set A is n ∈ N (denoted #A = n) if, and only if, there exists a bijective map ϕ : A −→ {1, . . . , n}. The cardinality of ∅ is defined as 0, i.e. #∅ := 0. A set A is called finite if, and only if, there exists n ∈ N0 such that #A = n; A is called infinite if, and only if, A is not finite, denoted #A = ∞ (in the strict sense, this is an abuse of notation, since ∞ is not a cardinality: for example, #N = ∞ and #P(N) = ∞, but N and P(N) do not have the same cardinality, since the power set P(A) is always strictly bigger than A (see Th. 3.16 below); #A = ∞ is merely an abbreviation for the statement "A is infinite"). The interested student finds additional material regarding characterizations of infinite sets in Th. A.53 of the Appendix.

(c) The set A is called countable if, and only if, A is finite or A has the same cardinality as N. Otherwise, A is called uncountable.

Theorem 3.11. Let M be a set of sets. Then the relation ∼ on M, defined by

A ∼ B :⇔ A and B have the same cardinality,

constitutes an equivalence relation on M.

Proof. According to Def. 2.32, we have to prove that ∼ is reflexive, symmetric, and transitive. According to Def. 3.10(a), A ∼ B holds for A,B ∈ M if, and only if, there exists a bijective map f : A −→ B. Thus, since the identity Id : A −→ A is bijective, A ∼ A, showing ∼ is reflexive. If A ∼ B, then there exists a bijective map f : A −→ B, and f⁻¹ is a bijective map f⁻¹ : B −→ A, showing B ∼ A and that ∼ is symmetric. If A ∼ B and B ∼ C, then there are bijective maps f : A −→ B and g : B −→ C. Then, according to Th. 2.14, the composition (g ◦ f) : A −→ C is also bijective, proving A ∼ C and that ∼ is transitive. �

Theorem 3.12 (Schröder–Bernstein). Let A,B be sets. The following statements are equivalent (even without assuming the axiom of choice):

(i) The sets A and B have the same cardinality (i.e. there exists a bijective map φ : A −→ B).

(ii) There exist an injective map f : A −→ B and an injective map g : B −→ A.

We will base the proof of the Schröder–Bernstein theorem on the following lemma (for an alternative proof, see [Phi16, Th. A.55]):

Lemma 3.13. Let A be a set. Consider P(A) to be endowed with the partial order given by set inclusion, i.e., for each X, Y ∈ P(A), X ≤ Y if, and only if, X ⊆ Y. If F : P(A) −→ P(A) is isotone with respect to that order, then F has a fixed point, i.e. F(X0) = X0 for some X0 ∈ P(A).

Proof. Define

   𝒜 := {X ∈ P(A) : F(X) ⊆ X},   X0 := ⋂_{X ∈ 𝒜} X

(X0 is well-defined, since F(A) ⊆ A). Suppose X ∈ 𝒜, i.e. F(X) ⊆ X and X0 ⊆ X. Then F(X0) ⊆ F(X) ⊆ X due to the isotonicity of F. Thus, F(X0) ⊆ X for every X ∈ 𝒜, i.e. F(X0) ⊆ X0. Using the isotonicity of F again shows F(F(X0)) ⊆ F(X0), implying F(X0) ∈ 𝒜 and X0 ⊆ F(X0), i.e. F(X0) = X0 as desired. �

Proof of Th. 3.12. (i) trivially implies (ii), as one can simply set f := φ and g := φ⁻¹. It remains to show (ii) implies (i). Thus, let f : A −→ B and g : B −→ A be injective. To apply Lem. 3.13, define

   F : P(A) −→ P(A),   F(X) := A \ g(B \ f(X)),

and note

   X ⊆ Y ⊆ A ⇒ f(X) ⊆ f(Y) ⇒ B \ f(Y) ⊆ B \ f(X) ⇒ g(B \ f(Y)) ⊆ g(B \ f(X)) ⇒ F(X) ⊆ F(Y).

Thus, by Lem. 3.13, F has a fixed point X0. We claim that a bijection is obtained via setting

   φ : A −→ B,   φ(x) :=
      f(x)    for x ∈ X0,
      g⁻¹(x)  for x ∉ X0.

First, φ is well-defined, since x ∉ X0 = F(X0) implies x ∈ g(B \ f(X0)). To verify that φ is injective, let x, y ∈ A, x ≠ y. If x, y ∈ X0, then φ(x) ≠ φ(y), as f is injective. If x, y ∈ A \ X0, then φ(x) ≠ φ(y), as g⁻¹ is well-defined. If x ∈ X0 and y ∉ X0, then φ(x) ∈ f(X0) and φ(y) ∈ B \ f(X0), once again, implying φ(x) ≠ φ(y). It remains to prove surjectivity. If b ∈ f(X0), then φ(f⁻¹(b)) = b. If b ∈ B \ f(X0), then g(b) ∉ X0 = F(X0), i.e. φ(g(b)) = b, showing φ to be surjective. �
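The fixed-point construction in the proof above is entirely explicit, so for finite sets it can be carried out mechanically. The following Python sketch (the sets A, B and the injections f, g are arbitrary toy choices, not part of the theorem) computes the least fixed point X0 of F by iteration — which, for finite A, coincides with the intersection from Lem. 3.13 — and then assembles the bijection φ:

```python
# A finite illustration of the proof of Th. 3.12 (toy data, chosen freely).
A = {0, 1, 2, 3}
B = {'a', 'b', 'c', 'd'}
f = {0: 'a', 1: 'b', 2: 'c', 3: 'd'}   # injective f : A -> B
g = {'a': 1, 'b': 2, 'c': 3, 'd': 0}   # injective g : B -> A

def F(X):
    """F(X) = A \\ g(B \\ f(X)); isotone with respect to set inclusion."""
    return A - {g[b] for b in B - {f[x] for x in X}}

# For finite A, iterating the isotone F from the empty set reaches its least
# fixed point, which is the X0 of Lem. 3.13.
X0 = set()
while F(X0) != X0:
    X0 = F(X0)

g_inv = {v: k for k, v in g.items()}   # well-defined since g is injective
phi = {x: f[x] if x in X0 else g_inv[x] for x in A}
assert sorted(phi.values()) == sorted(B)   # phi : A -> B is a bijection
```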

Theorem 3.14. Let A,B be nonempty sets. Then the following statements are equivalent (where the implication “(ii) ⇒ (i)” makes use of the axiom of choice (AC)).

(i) There exists an injective map f : A −→ B.

(ii) There exists a surjective map g : B −→ A.

Proof. According to Th. 2.13(b), (i) is equivalent to f having a left inverse g : B −→ A (i.e. g ◦ f = IdA), which is equivalent to g having a right inverse, which, according to Th. 2.13(a), is equivalent to (ii) (AC is used in the proof of Th. 2.13(a) to show each surjective map has a right inverse). �

Corollary 3.15. Let A,B be nonempty sets. Using AC, we can expand the two equivalent statements of Th. 3.12 to the following list of equivalent statements:

(i) The sets A and B have the same cardinality (i.e. there exists a bijective map φ : A −→ B).

(ii) There exist an injective map f : A −→ B and an injective map g : B −→ A.

(iii) There exist a surjective map f : A −→ B and a surjective map g : B −→ A.

(iv) There exist an injective map f1 : A −→ B and a surjective map f2 : A −→ B.

(v) There exist an injective map g1 : B −→ A and a surjective map g2 : B −→ A.

Proof. The equivalences are an immediate consequence of combining Th. 3.12 with Th. 3.14. �

Theorem 3.16. Let A be a set. There can never exist a surjective map from A onto P(A) (in this sense, the size of P(A) is always strictly bigger than the size of A; in particular, A and P(A) can never have the same size).

Proof. If A = ∅, then there is nothing to prove. For nonempty A, the idea is to conduct a proof by contradiction. To this end, assume there does exist a surjective map f : A −→ P(A) and define

   B := {x ∈ A : x ∉ f(x)}.

Now B is a subset of A, i.e. B ∈ P(A), and the assumption that f is surjective implies the existence of a ∈ A such that f(a) = B. If a ∈ B, then a ∉ f(a) = B, i.e. a ∈ B implies a ∈ B ∧ ¬(a ∈ B), so that the principle of contradiction tells us a ∉ B must be true. However, a ∉ B implies a ∈ f(a) = B, i.e., this time, the principle of contradiction tells us a ∈ B must be true. In conclusion, we have shown our original assumption that there exists a surjective map f : A −→ P(A) implies a ∈ B ∧ ¬(a ∈ B), i.e., according to the principle of contradiction, no surjective map from A into P(A) can exist. �

3.2.2 Finite Sets

While many results concerning cardinalities of finite sets seem intuitively clear, as a mathematician, one still has to provide a rigorous proof. Providing such proofs in the present section also provides us with the opportunity to see more examples of induction proofs. We begin by showing finite cardinalities to be uniquely determined. The key is the following theorem:

Theorem 3.17. If m,n ∈ N and the map f : {1, . . . ,m} −→ {1, . . . , n} is bijective, then m = n.

Proof. We conduct the proof via induction on m.

Base Case (m = 1): If m = 1, then the surjectivity of f implies n = 1.

Induction Step: Let m > 1. From the bijective map f, we define the map

   g : {1, . . . ,m} −→ {1, . . . , n},   g(x) :=
      n      for x = m,
      f(m)   for x = f⁻¹(n),
      f(x)   otherwise.

Then g is bijective, since it is the composition g = h ◦ f of the bijective map f with the bijective map

   h : {f(m), n} −→ {f(m), n},   h(f(m)) := n,   h(n) := f(m)

(extended to all of {1, . . . , n} by the identity). Thus, the restriction g↾{1,...,m−1} : {1, . . . ,m−1} −→ {1, . . . , n−1} must also be bijective, such that the induction hypothesis yields m − 1 = n − 1, which, in turn, implies m = n as desired. �

Corollary 3.18. Let m,n ∈ N and let A be a set. If #A = m and #A = n, then m = n.

Proof. If #A = m, then, according to Def. 3.10(b), there exists a bijective map f : A −→ {1, . . . ,m}. Analogously, if #A = n, then there exists a bijective map g : A −→ {1, . . . , n}. In consequence, we have the bijective map (g ◦ f⁻¹) : {1, . . . ,m} −→ {1, . . . , n}, such that Th. 3.17 yields m = n. �

Theorem 3.19. Let A ≠ ∅ be a finite set.

(a) If B ⊆ A with A ≠ B, then B is finite with #B < #A.

(b) If a ∈ A, then #(A \ {a}) = #A − 1.

Proof. For #A = n ∈ N, we use induction to prove (a) and (b) simultaneously, i.e. we show, for each n ∈ N, the statement

   φ(n):   #A = n ⇒ ∀B∈P(A)\{A} ∀a∈A ( #B ∈ {0, . . . , n − 1} ∧ #(A \ {a}) = n − 1 ).

Base Case (n = 1): In this case, A has precisely one element, i.e. B = A \ {a} = ∅, and #∅ = 0 = n − 1 proves φ(1).

Induction Step: For the induction hypothesis, we assume φ(n) to be true, i.e. we assume (a) and (b) hold for each A with #A = n. We have to prove φ(n + 1), i.e., we consider A with #A = n + 1. From #A = n + 1, we conclude the existence of a bijective map ϕ : A −→ {1, . . . , n + 1}. We have to construct a bijective map ψ : A \ {a} −→ {1, . . . , n}. To this end, set k := ϕ(a) and define the auxiliary function

   f : {1, . . . , n + 1} −→ {1, . . . , n + 1},   f(x) :=
      n + 1   for x = k,
      k       for x = n + 1,
      x       for x ∉ {k, n + 1}.

Then f ◦ ϕ : A −→ {1, . . . , n + 1} is bijective by Th. 2.14, and

   (f ◦ ϕ)(a) = f(ϕ(a)) = f(k) = n + 1.

Thus, the restriction ψ := (f ◦ ϕ)↾A\{a} is the desired bijective map ψ : A \ {a} −→ {1, . . . , n}, proving #(A \ {a}) = n. It remains to consider the strict subset B of A. Since B is a strict subset of A, there exists a ∈ A \ B. Thus, B ⊆ A \ {a} and, as we have already shown #(A \ {a}) = n, the induction hypothesis applies and yields B is finite with #B ≤ #(A \ {a}) = n, i.e. #B ∈ {0, . . . , n}, proving φ(n + 1), thereby completing the induction. �

Theorem 3.20. For #A = #B = n ∈ N and f : A −→ B, the following statements are equivalent:

(i) f is injective.

(ii) f is surjective.

(iii) f is bijective.

Proof. Exercise. �

Lemma 3.21. For each finite set A (i.e. #A = n ∈ N0) and each B ⊆ A, one has #(A \ B) = #A − #B.

Proof. For B = ∅, the assertion is true since #(A \B) = #A = #A− 0 = #A−#B.

For B ≠ ∅, the proof is conducted over the size of B, i.e. as a finite induction (cf. Cor. 3.6) over the set {1, . . . , n}, showing, for each m ∈ {1, . . . , n}, the statement

   φ(m):   #B = m ⇒ #(A \ B) = #A − #B.

Base Case (m = 1): φ(1) is precisely the statement provided by Th. 3.19(b).

Induction Step: For the induction hypothesis, we assume φ(m) with 1 ≤ m < n. To prove φ(m + 1), consider B ⊆ A with #B = m + 1. Fix an element b ∈ B and set B1 := B \ {b}. Then #B1 = m by Th. 3.19(b), A \ B = (A \ B1) \ {b}, and we compute

   #(A \ B) = #((A \ B1) \ {b}) = #(A \ B1) − 1   (by Th. 3.19(b))
            = #A − #B1 − 1                        (by φ(m))
            = #A − #B,

proving φ(m + 1) and completing the induction. �

Theorem 3.22. If A,B are finite sets, then #(A ∪B) = #A+#B −#(A ∩ B).

Proof. The assertion is clearly true if A or B is empty. If A and B are nonempty, then there exist m,n ∈ N such that #A = m and #B = n, i.e. there are bijective maps f : A −→ {1, . . . ,m} and g : B −→ {1, . . . , n}.

We first consider the case A ∩ B = ∅. We need to construct a bijective map h : A ∪ B −→ {1, . . . ,m + n}. To this end, we define

   h : A ∪ B −→ {1, . . . ,m + n},   h(x) :=
      f(x)       for x ∈ A,
      g(x) + m   for x ∈ B.

The bijectivity of f and g clearly implies the bijectivity of h, proving #(A ∪ B) = m + n = #A + #B.

Finally, we consider the case of arbitrary A,B. Since A ∪ B = A ∪ (B \ A) with this union being disjoint, and B \ A = B \ (A ∩ B), we can compute

   #(A ∪ B) = #(A ∪ (B \ A)) = #A + #(B \ A)
            = #A + #(B \ (A ∩ B)) = #A + #B − #(A ∩ B)   (by Lem. 3.21),

thereby establishing the case. �
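The inclusion–exclusion formula of Th. 3.22 is easy to spot-check numerically; the following Python snippet (the ranges and random choices are purely illustrative) verifies it on many random pairs of finite sets:

```python
import random

# Spot-check of Th. 3.22: #(A ∪ B) = #A + #B - #(A ∩ B).
random.seed(0)
for _ in range(100):
    A = set(random.sample(range(20), random.randint(0, 10)))
    B = set(random.sample(range(20), random.randint(0, 10)))
    assert len(A | B) == len(A) + len(B) - len(A & B)
```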

Theorem 3.23. If (A1, . . . , An), n ∈ N, is a finite sequence of finite sets, then

   # ∏_{i=1}^n Ai = #(A1 × · · · × An) = ∏_{i=1}^n #Ai.   (3.17)

Proof. If at least one Ai is empty, then (3.17) is true, since both sides are 0.

The case where all Ai are nonempty is proved by induction over n, i.e. we know ki := #Ai ∈ N for each i ∈ {1, . . . , n} and show by induction, for each n ∈ N, the statement

   φ(n):   # ∏_{i=1}^n Ai = ∏_{i=1}^n ki.

Base Case (n = 1): # ∏_{i=1}^1 Ai = #A1 = k1 = ∏_{i=1}^1 ki, i.e. φ(1) holds.

Induction Step: From the induction hypothesis φ(n), we obtain a bijective map ϕ : A −→ {1, . . . , N}, where A := ∏_{i=1}^n Ai and N := ∏_{i=1}^n ki. To prove φ(n + 1), we need to construct a bijective map h : A × An+1 −→ {1, . . . , N · kn+1}. Since #An+1 = kn+1, there exists a bijective map f : An+1 −→ {1, . . . , kn+1}. We define

   h : A × An+1 −→ {1, . . . , N · kn+1},   h(a1, . . . , an, an+1) := (f(an+1) − 1) · N + ϕ(a1, . . . , an).

Since ϕ and f are bijective, and since every m ∈ {1, . . . , N · kn+1} has a unique representation in the form m = a · N + r with a ∈ {0, . . . , kn+1 − 1} and r ∈ {1, . . . , N} (see Th. D.1 in the Appendix), h is also bijective. This proves φ(n + 1) and completes the induction. �
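The map h from the induction step is a mixed-radix encoding, and for small sets it can be checked directly. In the Python sketch below the sets A1, A2, A3 are arbitrary examples; phi enumerates A1 × A2 and f enumerates A3, and h is formed exactly as in the proof:

```python
from itertools import product

# Arbitrary toy sets; phi and f are bijections built by enumeration.
A1, A2, A3 = ['x', 'y'], [0, 1, 2], ['p', 'q']
phi = {t: i for i, t in enumerate(product(A1, A2), start=1)}  # onto {1,...,6}
f = {c: i for i, c in enumerate(A3, start=1)}                 # onto {1, 2}
N = len(phi)

# h(a1, a2, a3) := (f(a3) - 1) * N + phi(a1, a2), as in the induction step
h = {ab + (c,): (f[c] - 1) * N + phi[ab] for ab in phi for c in A3}
assert sorted(h.values()) == list(range(1, N * len(A3) + 1))  # h is bijective
```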

Theorem 3.24. For each finite set A with #A = n ∈ N0, one has #P(A) = 2^n.

Proof. The proof is conducted by induction by showing, for each n ∈ N0, the statement

   φ(n):   #A = n ⇒ #P(A) = 2^n.

Base Case (n = 0): For n = 0, we have A = ∅, i.e. P(A) = {∅}. Thus, #P(A) = 1 = 2^0, proving φ(0).

Induction Step: Assume φ(n) and consider A with #A = n + 1. Then A contains at least one element a. For B := A \ {a}, we then know #B = n from Th. 3.19(b). Moreover, setting M := {C ∪ {a} : C ∈ P(B)}, we have the disjoint decomposition P(A) = P(B) ∪ M. As the map ϕ : P(B) −→ M, ϕ(C) := C ∪ {a}, is clearly bijective, P(B) and M have the same cardinality. Thus,

   #P(A) = #P(B) + #M        (by Th. 3.22)
         = #P(B) + #P(B)     (by φ(n))
         = 2 · 2^n = 2^(n+1),

thereby proving φ(n + 1) and completing the induction. �
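The disjoint decomposition used in the induction step translates directly into a recursive construction of the power set, which one can test in Python (the function name power_set is of course our own choice):

```python
# Recursive power set mirroring the proof's decomposition
# P(A) = P(B) ∪ {C ∪ {a} : C ∈ P(B)} for B = A \ {a}.
def power_set(elements):
    elements = list(elements)
    if not elements:
        return [set()]          # P(∅) = {∅}
    a, rest = elements[0], elements[1:]
    PB = power_set(rest)
    return PB + [C | {a} for C in PB]   # the two disjoint halves

for n in range(6):
    assert len(power_set(range(n))) == 2 ** n   # Th. 3.24: #P(A) = 2^n
```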

3.2.3 Countable Sets

In this section, we present a number of important results regarding the natural numbers and countability.

Theorem 3.25. (a) Every nonempty finite subset of a totally ordered set has a minimum and a maximum.

(b) Every nonempty subset of N has a minimum.

Proof. Exercise. �

Proposition 3.26. Every subset A of N is countable.

Proof. Since ∅ is countable, we may assume A ≠ ∅. From Th. 3.25(b), we know that every nonempty subset of N has a min. We recursively define a sequence in A by

   a1 := min A,   an+1 :=
      min An   if An := A \ {ai : 1 ≤ i ≤ n} ≠ ∅,
      an       if An = ∅.

This sequence is the same as the function f : N −→ A, f(n) = an. An easy induction shows that, for each n ∈ N, an ≠ an+1 implies the restriction f↾{1,...,n+1} is injective. Thus, if there exists n ∈ N such that an = an+1, then f↾{1,...,k} : {1, . . . , k} −→ A is bijective, where k := min{n ∈ N : an = an+1}, showing A is finite, i.e. countable. If there does not exist n ∈ N with an = an+1, then f is injective. Another easy induction shows that, for each n ∈ N, f({1, . . . , n}) ⊇ {k ∈ A : k ≤ n}, showing f is also surjective, proving A is countable. �
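For a finite set, the recursion in this proof simply lists the elements in increasing order; the following Python sketch makes that concrete (the set A below is an arbitrary finite example standing in for a subset of N):

```python
# The recursion a1 := min A, an+1 := min An of Prop. 3.26, run on a toy set.
A = {3, 7, 1, 10, 42}

enumeration = []          # the values a1, a2, ...
remaining = set(A)        # the current An
while remaining:          # an+1 := min An as long as An is nonempty
    a = min(remaining)
    enumeration.append(a)
    remaining.remove(a)

assert enumeration == sorted(A)   # the an enumerate A in increasing order
```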

Proposition 3.27. For each set A ≠ ∅, the following three statements are equivalent:

(i) A is countable.

(ii) There exists an injective map f : A −→ N.

(iii) There exists a surjective map g : N −→ A.

Proof. Directly from the definition of countable in Def. 3.10(c), one obtains (i)⇒(ii) and (i)⇒(iii). To prove (ii)⇒(i), let f : A −→ N be injective. Then f : A −→ f(A) is bijective, and, since f(A) ⊆ N, f(A) is countable by Prop. 3.26, proving A is countable as well. To prove (iii)⇒(i), let g : N −→ A be surjective. Then g has a right inverse f : A −→ N. One can obtain this from Th. 2.13(a), but, here, we can actually construct f without the axiom of choice: For a ∈ A, let f(a) := min g⁻¹({a}) (recall Th. 3.25(b)). Then, clearly, g ◦ f = IdA. But this means g is a left inverse for f, showing f is injective according to Th. 2.13(b). Then A is countable by an application of (ii). �

Theorem 3.28. If (A1, . . . , An), n ∈ N, is a finite family of countable sets, then ∏_{i=1}^n Ai is countable.

Proof. We first consider the special case n = 2 with A1 = A2 = N and show the map

   ϕ : N × N −→ N,   ϕ(m,n) := 2^m · 3^n,

is injective: If ϕ(m,n) = ϕ(p, q), then 2^m · 3^n = 2^p · 3^q. Moreover m ≤ p or p ≤ m. If m ≤ p, then 3^n = 2^(p−m) · 3^q. Since 3^n is odd, 2^(p−m) · 3^q must also be odd, implying p − m = 0, i.e. m = p. Moreover, we now have 3^n = 3^q, implying n = q, showing (m,n) = (p, q), i.e. ϕ is injective.

We now come back to the general case stated in the theorem. If at least one of the Ai is empty, then the product ∏_{i=1}^n Ai is empty. So it remains to consider the case where all Ai are nonempty. The proof is conducted by induction by showing, for each n ∈ N, the statement

   φ(n):   ∏_{i=1}^n Ai is countable.

Base Case (n = 1): φ(1) is merely the hypothesis that A1 is countable.

Induction Step: Assuming φ(n), Prop. 3.27(ii) provides injective maps f1 : ∏_{i=1}^n Ai −→ N and f2 : An+1 −→ N. To prove φ(n + 1), we provide an injective map h : ∏_{i=1}^{n+1} Ai −→ N: Define

   h : ∏_{i=1}^{n+1} Ai −→ N,   h(a1, . . . , an, an+1) := ϕ( f1(a1, . . . , an), f2(an+1) ).

The injectivity of f1, f2, and ϕ clearly implies the injectivity of h, thereby proving φ(n + 1) and completing the induction. �
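The key device in this proof is the prime-power map ϕ(m,n) = 2^m · 3^n, whose injectivity rests on unique prime factorization. A quick Python check on a finite patch of N × N (the ranges are arbitrary) confirms that no two pairs collide:

```python
# phi(m, n) = 2^m * 3^n from the proof of Th. 3.28; unique factorization
# makes it injective, tested here on a finite patch of N x N.
def phi(m, n):
    return 2 ** m * 3 ** n

values = [phi(m, n) for m in range(1, 25) for n in range(1, 25)]
assert len(values) == len(set(values))   # no two pairs produce the same value
```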

Theorem 3.29. If (Ai)i∈I is a countable family of countable sets (i.e. ∅ ≠ I is countable and each Ai, i ∈ I, is countable), then the union A := ⋃_{i∈I} Ai is also countable (this result makes use of AC, cf. Rem. 3.30 below).

Proof. It suffices to consider the case that all Ai are nonempty. Moreover, according to Prop. 3.27(iii), it suffices to construct a surjective map ϕ : N −→ A. Also according to Prop. 3.27(iii), the countability of I and the Ai provides us with surjective maps f : N −→ I and gi : N −→ Ai (here AC is used to select each gi from the set of all surjective maps from N onto Ai). Define

   F : N × N −→ A,   F(m,n) := g_{f(m)}(n).

Then F is surjective: Given x ∈ A, there exists i ∈ I such that x ∈ Ai. Since f is surjective, there is m ∈ N satisfying f(m) = i. Moreover, since gi is surjective, there exists n ∈ N with gi(n) = x. Then F(m,n) = gi(n) = x, verifying that F is surjective. As N × N is countable by Th. 3.28, there exists a surjective map h : N −→ N × N. Thus, F ◦ h is the desired surjective map from N onto A. �
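The surjection F(m,n) = g_{f(m)}(n) can be imitated for a finite toy family; in the Python sketch below the index set, the sets Ai, and the particular enumerations f and gi are all arbitrary choices made for illustration:

```python
# A finite toy version of the surjection F from the proof of Th. 3.29.
I = ['u', 'v']
family = {'u': [1, 2, 3], 'v': [3, 4]}

def f(m):                  # surjection N -> I: cycle through the index set
    return I[m % len(I)]

def g(i, n):               # surjection N -> Ai: cycle through the list of Ai
    Ai = family[i]
    return Ai[n % len(Ai)]

def F(m, n):               # F(m, n) = g_{f(m)}(n)
    return g(f(m), n)

union = set().union(*family.values())
assert {F(m, n) for m in range(4) for n in range(6)} == union
```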

Remark 3.30. The axiom of choice is, indeed, essential for the proof of Th. 3.29. It is shown in [Jec73, Th. 10.6] that it is consistent with the axioms of ZF (i.e. with the axioms of Sec. A.3 of the Appendix) that, e.g., the uncountable sets P(N) and R (cf. [Phi16, Th. F.2]) are countable unions of countable sets.

4 Basic Algebraic Structures

We are now in a position to conduct some actual algebra, that means we will start to study sets A that come with a composition law ◦ : A × A −→ A, (a, b) ↦ a ◦ b, assumed to satisfy a number of rules, so-called axioms. One goal is to prove further rules as a consequence of the axioms. A main benefit of this method is the fact that these proved rules are then known to hold in every structure satisfying the axioms. Perhaps the most simple structure that still occurs in a vast number of interesting places throughout mathematics is that of a group (see Def. 4.5(b) below), examples including the set of real numbers R with the usual addition as well as R \ {0} with the usual multiplication. So every rule we will prove as a consequence of the group axioms will hold in these two groups as well as in countless other groups. We note that vector spaces, the structures of central interest to linear algebra, are, in particular, always groups, so that everything we prove about groups in the following is, in particular, true in every vector space.

4.1 Magmas and Groups

Definition 4.1. Let A be a nonempty set. A map

   ◦ : A × A −→ A,   (x, y) ↦ x ◦ y,   (4.1)

is called a composition on A; the pair (A, ◦) is then called a magma. It is also common to call a composition a multiplication and to write x · y or just xy instead of x ◦ y. In this situation, the product symbol ∏ is defined as in Ex. 3.8(c). If the composition is called an addition and x + y is written instead of x ◦ y, then it is assumed that it is commutative, that means it satisfies the law of Def. 4.2(a) below (a multiplication might or might not satisfy commutativity). For an addition, the summation symbol ∑ is defined as in Ex. 3.8(b).

Definition 4.2. Let (A, ◦) be a magma.

(a) ◦ is called commutative or abelian if, and only if, x ◦ y = y ◦ x holds for all x, y ∈ A.

(b) ◦ is called associative if, and only if, x ◦ (y ◦ z) = (x ◦ y) ◦ z holds for all x, y, z ∈ A.

(c) An element e ∈ A is called left neutral (resp. right neutral) if, and only if,

   ∀x∈A   e ◦ x = x (resp. x ◦ e = x).   (4.2)

e ∈ A is called neutral if, and only if, it is both left neutral and right neutral.

If associativity holds for each triple of elements, then it also holds for each finite tuple of elements; if the composition is associative and commutativity holds for each pair of elements, then it also holds for each finite tuple of elements (see Th. B.3 and Th. B.4 in the Appendix).

Lemma 4.3. Let (A, ◦) be a magma with l, r ∈ A. If l is left neutral and r is right neutral, then l = r (in particular, A contains at most one neutral element).

Proof. As l is left neutral and r is right neutral, we obtain r = l ◦ r = l. �

Notation 4.4. If a composition on a set A is written as a multiplication ·, then it is common to write a neutral element as 1 and to also call it a unit or one. If a (commutative) composition on a set A is written as an addition +, then it is common to write a neutral element as 0 and to also call it zero.

Definition 4.5. Let (A, ◦) be a magma.

(a) (A, ◦) is called a semigroup if, and only if, ◦ is associative.

(b) (A, ◦) is called a group if, and only if, it is a semigroup with the additional property that A contains a right neutral³ element e ∈ A and, for each x ∈ A, there exists an inverse element x′ ∈ A, i.e. an element x′ ∈ A such that

   x ◦ x′ = e.   (4.3)

A group is called commutative or abelian if, and only if, ◦ is commutative.

³ We will see in Th. 4.6(a) that the remaining properties of a group then force this right neutral element to be a neutral element as well. However, not requiring e to be neutral here has the advantage that we know right away that we only need to prove the existence of a right neutral element to show some structure is a group.

Before considering examples of magmas (A, ◦) in Ex. 4.9 below, we prove some first rules and introduce some more notation.

Theorem 4.6. Let (G, ◦) be a group with right neutral element e ∈ G. Then the following statements and rules are valid in G:

(a) e is a neutral element (thus, in particular, it is uniquely determined according to Lem. 4.3).

(b) If x, a ∈ G, then x ◦ a = e if, and only if, a ◦ x = e (i.e. an element is right inverse if, and only if, it is left inverse). Moreover, inverse elements are unique (for each x ∈ G, the unique inverse is then denoted by x⁻¹).

(c) (x⁻¹)⁻¹ = x holds for each x ∈ G.

(d) y⁻¹ ◦ x⁻¹ = (x ◦ y)⁻¹ holds for each x, y ∈ G.

(e) Cancellation Laws: For each x, y, a ∈ G, one has

x ◦ a = y ◦ a ⇒ x = y, (4.4a)

a ◦ x = a ◦ y ⇒ x = y. (4.4b)

Proof. (a): Let x ∈ G. By Def. 4.5(b), there exists y ∈ G such that x ◦ y = e and, in turn, z ∈ G such that y ◦ z = e. Thus, as e is right neutral,

   e ◦ z = (x ◦ y) ◦ z = x ◦ (y ◦ z) = x ◦ e = x,

implying

   x = e ◦ z = (e ◦ e) ◦ z = e ◦ (e ◦ z) = e ◦ x,

showing e to be left neutral as well.

(b): Given a, choose b ∈ G such that a ◦ b = e. If x ◦ a = e, then

   e = a ◦ b = (a ◦ e) ◦ b = (a ◦ (x ◦ a)) ◦ b = a ◦ ((x ◦ a) ◦ b) = a ◦ (x ◦ (a ◦ b)) = a ◦ (x ◦ e) = a ◦ x.

Interchanging the roles of x and a now proves the converse. To prove uniqueness, let a, b be inverses to x. Then a = a ◦ e = a ◦ x ◦ b = e ◦ b = b.

(c): x⁻¹ ◦ x = e holds according to (b) and shows that x is the inverse to x⁻¹. Thus, (x⁻¹)⁻¹ = x as claimed.

(d) is due to y⁻¹ ◦ x⁻¹ ◦ x ◦ y = y⁻¹ ◦ e ◦ y = e.

(e): If x ◦ a = y ◦ a, then x = x ◦ a ◦ a⁻¹ = y ◦ a ◦ a⁻¹ = y as claimed; if a ◦ x = a ◦ y, then x = a⁻¹ ◦ a ◦ x = a⁻¹ ◦ a ◦ y = y as well. �
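The rules of Th. 4.6 can be observed concretely in a finite group. The Python sketch below uses the permutations of {0, 1, 2} under composition (this is the symmetric group of Ex. 4.9(b) below; representing permutations as tuples is our own encoding choice) and checks that inverses are two-sided and that (x ◦ y)⁻¹ = y⁻¹ ◦ x⁻¹:

```python
from itertools import permutations

# Permutations of {0, 1, 2} as tuples p (the map i -> p[i]), composed by
# (p ∘ q)(i) = p(q(i)).
def compose(p, q):
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0, 0, 0]
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

e = (0, 1, 2)   # the identity, i.e. the neutral element
G = list(permutations(range(3)))
for x in G:
    # inverses are two-sided, as in Th. 4.6(b)
    assert compose(x, inverse(x)) == e and compose(inverse(x), x) == e
    for y in G:
        # Th. 4.6(d): (x ∘ y)^(-1) = y^(-1) ∘ x^(-1)
        assert inverse(compose(x, y)) == compose(inverse(y), inverse(x))
```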

Notation 4.7. Exponentiation with Integer Exponents: Let (A, ·) be a magma with neutral element 1 ∈ A. Define recursively for each x ∈ A and each n ∈ N0:

   x^0 := 1,   x^(n+1) := x · x^n for each n ∈ N0.   (4.5a)

Moreover, if (A, ·) constitutes a group, then also define for each x ∈ A and each n ∈ N:

   x^(−n) := (x⁻¹)^n.   (4.5b)

Theorem 4.8. Exponentiation Rules: Let (G, ·) be a semigroup with neutral element 1 ∈ G. Let x, y ∈ G. Then the following rules hold for each m,n ∈ N0. If (G, ·) is a group, then the rules even hold for every m,n ∈ Z.

(a) x^(m+n) = x^m · x^n.

(b) (x^m)^n = x^(mn).

(c) If the composition is also commutative, then x^n y^n = (xy)^n.

Proof. (a): First, we fix n ∈ N0 and prove the statement for each m ∈ N0 by induction: The base case (m = 0) is x^n = x^n, which is true. For the induction step, we compute

   x^(m+1+n) = x · x^(m+n) = x · x^m · x^n = x^(m+1) x^n   (using (4.5a) and the induction hypothesis),

completing the induction step. Now assume G to be a group. Consider m ≥ 0 and n < 0. If m + n ≥ 0, then, using what we have already shown,

   x^m x^n = x^m (x⁻¹)^(−n) = x^(m+n) x^(−n) (x⁻¹)^(−n) = x^(m+n)   (using (4.5b)).

Similarly, if m + n < 0, then

   x^m x^n = x^m (x⁻¹)^(−n) = x^m (x⁻¹)^m (x⁻¹)^(−n−m) = x^(m+n)   (using (4.5b)).

The case m < 0, n ≥ 0 is treated completely analogously. It just remains to consider m < 0 and n < 0. In this case,

   x^(m+n) = x^(−(−m−n)) = (x⁻¹)^(−m−n) = (x⁻¹)^(−m) · (x⁻¹)^(−n) = x^m · x^n   (using (4.5b)).

(b): First, we prove the statement for each n ∈ N0 by induction (for m < 0, we assume G to be a group): The base case (n = 0) is (x^m)^0 = 1 = x^0, which is true. For the induction step, we compute

   (x^m)^(n+1) = x^m · (x^m)^n = x^m · x^(mn) = x^(mn+m) = x^(m(n+1))   (using (4.5a), the induction hypothesis, and (a)),

completing the induction step. Now, let G be a group and n < 0. We already know (x^m)⁻¹ = x^(−m). Thus, using what we have already shown,

   (x^m)^n = ((x^m)⁻¹)^(−n) = (x^(−m))^(−n) = x^((−m)(−n)) = x^(mn)   (using (4.5b)).

(c): For n ∈ N0, the statement is proved by induction: The base case (n = 0) is x^0 y^0 = 1 = (xy)^0, which is true. For the induction step, we compute

   x^(n+1) y^(n+1) = x · x^n · y · y^n = xy · (xy)^n = (xy)^(n+1)   (using (4.5a), commutativity, and the induction hypothesis),

completing the induction step. If G is a group and n < 0, then, using what we have already shown,

   x^n y^n = (x⁻¹)^(−n) (y⁻¹)^(−n) = (x⁻¹ y⁻¹)^(−n) = ((xy)⁻¹)^(−n) = (xy)^n   (using (4.5b) and Th. 4.6(d)),

which completes the proof. �
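The recursive powers of Notation 4.7 and the rules of Th. 4.8 can be checked in a concrete commutative group. The Python sketch below uses the nonzero residues modulo the prime 7 under multiplication (the modulus is an arbitrary choice; the inverse is computed via Fermat's little theorem, which is standard for prime moduli but not part of the notes):

```python
# Powers per (4.5a)/(4.5b), checked in the multiplicative group mod P = 7.
P = 7

def power(x, n):
    if n < 0:
        return power(pow(x, P - 2, P), -n)   # x^(-1) = x^(P-2) mod P (Fermat)
    return 1 if n == 0 else (x * power(x, n - 1)) % P   # recursion (4.5a)

for x in range(1, P):
    for y in range(1, P):
        for m in range(-3, 4):
            for n in range(-3, 4):
                assert power(x, m + n) == (power(x, m) * power(x, n)) % P  # (a)
                assert power(power(x, m), n) == power(x, m * n)            # (b)
                assert (power(x, n) * power(y, n)) % P == power((x * y) % P, n)  # (c)
```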

Example 4.9. (a) Let M be a set. Then (P(M),∩) and (P(M),∪) are magmas with neutral elements, where M is neutral for ∩ and ∅ is neutral for ∪. From Th. 1.29, we also know ∩ and ∪ to be associative and commutative, showing (P(M),∩) and (P(M),∪) to form abelian semigroups. However, if M ≠ ∅, then neither structure forms a group, due to the lack of inverse elements: If N ⊊ M, then there does not exist B ⊆ M such that N ∩ B = M; if ∅ ≠ N ⊆ M, then there does not exist B ⊆ M such that N ∪ B = ∅. If M = ∅, then P(M) = {∅} and (P(M),∩) and (P(M),∪) both constitute the group with one element (∅ being inverse to itself in both cases).

(b) Let M be a set and define

   SM := { (π : M −→ M) : π is bijective },   (4.6)

where the elements of SM are also called permutations on M. Then (SM, ◦), where ◦ is given by composition of maps according to Def. 2.8, forms a group, the so-called symmetric group or permutation group on M. We verify (SM, ◦) to be a group: ◦ is well-defined on SM, since the composition of bijective maps is bijective by Th. 2.14. Moreover, ◦ is associative by Th. 2.10(a). The neutral element is given by the identity map Id : M −→ M, Id(a) := a, and, if π ∈ SM, then its inverse map π⁻¹ is also its inverse element in the group SM. We claim that ◦ is commutative on SM if, and only if, #M ≤ 2: While commutativity is clear for #M ≤ 2, if {a, b, c} ⊆ M with #{a, b, c} = 3 and we define f, g ∈ SM by letting

   f(x) := a for x = a,   c for x = b,   b for x = c,   x for x ∉ {a, b, c};
   g(x) := b for x = a,   a for x = b,   c for x = c,   x for x ∉ {a, b, c},   (4.7)

then (f ◦ g)(a) = c ≠ b = (g ◦ f)(a), showing ◦ is not commutative.

The case M = {1, . . . , n}, n ∈ N, is often of particular interest and, in this case, one also writes Sn for SM.
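The two permutations f, g of (4.7) can be written out directly; in the Python sketch below, the sample set M contains a, b, c plus one extra element playing the role of x ∉ {a, b, c}:

```python
# The permutations f, g of (4.7) on a sample set M = {a, b, c, d}.
M = ['a', 'b', 'c', 'd']
f = {'a': 'a', 'b': 'c', 'c': 'b', 'd': 'd'}   # swaps b and c
g = {'a': 'b', 'b': 'a', 'c': 'c', 'd': 'd'}   # swaps a and b

fg = {x: f[g[x]] for x in M}   # f ∘ g
gf = {x: g[f[x]] for x in M}   # g ∘ f
assert fg['a'] == 'c' and gf['a'] == 'b'   # (f ∘ g)(a) ≠ (g ∘ f)(a)
```

So already three moved points suffice to destroy commutativity, which is why only S1 and S2 are abelian.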

(c) (N0,+) and (N0, ·), where + and · denote the usual addition and multiplication on N0, respectively (see [Phi16, Def. D.1] for a definition via recursion), form commutative semigroups (see [Phi16, Th. D.2]), but no groups due to the lack of inverse elements. From N0, one can then construct the sets of integers Z (see Ex. 2.36(a)), rational numbers Q (see Ex. 2.36(b)), real numbers R (see [Phi16, Def. D.32]), and complex numbers C (see [Phi16, Def. 5.1]). One can extend + and · to each of these sets (see Th. 4.15, Ex. 4.32, and Ex. 4.38 below, [Phi16, Def. D.32], and [Phi16, Def. 5.1]) and show that (Z,+), (Q,+), (R,+), (C,+) are commutative groups (see Th. 4.15, Ex. 4.32(b), and Ex. 4.38 below, [Phi16, Th. D.35], and [Phi16, Th. 5.2]) as well as (Q \ {0}, ·), (R \ {0}, ·), (C \ {0}, ·) (see Ex. 4.38 below, [Phi16, Th. D.35], and [Phi16, Th. 5.2]).

(d) Let (A, ◦) be a magma. We define another magma (P(A), ◦) by letting

   ∀B,C∈P(A)   B ◦ C := {b ◦ c : b ∈ B ∧ c ∈ C} ⊆ A.   (4.8)

The case where B or C is a singleton set is of particular interest and one then uses the following simplified notation:

   ∀B∈P(A) ∀a∈A   ( a ◦ B := {a} ◦ B,   B ◦ a := B ◦ {a} ).   (4.9)

Sets of the form a ◦ B are called left cosets, sets of the form B ◦ a are called right cosets. If (A, ◦) is commutative, then B ◦ C = C ◦ B for each B,C ∈ P(A), since a = b ◦ c with b ∈ B, c ∈ C if, and only if, a = c ◦ b. In the same manner, one sees that, if ◦ is associative on A, then it is also associative on P(A). If e ∈ A is left (resp. right) neutral, then, clearly, {e} is left (resp. right) neutral for ◦ on P(A). In general, one cannot expect (P(A), ◦) to be a group, even if (A, ◦) is a group, due to the lack of inverse elements: For example, while (Z,+) is a group, (P(Z),+) is not, since B := {0, 1} has no inverse: If C were inverse to B, then −1 ∈ C, implying −1 = 0 + (−1) ∈ B + C, i.e. B + C ≠ {0}, in contradiction to C being inverse to B.

(e) Let (A, ·) be a magma and let M be a set. We consider the set F(M,A) = A^M of functions mapping M into A. We make A^M into a magma by defining · pointwise on A^M: Define

   ∀f,g∈A^M   f · g : M −→ A,   (f · g)(x) := f(x) · g(x).   (4.10)

We then have

   (A, ·) commutative ⇒ (A^M, ·) commutative,   (4.11a)
   (A, ·) associative ⇒ (A^M, ·) associative,   (4.11b)
   e ∈ A left/right neutral ⇒ f ∈ A^M, f ≡ e left/right neutral,   (4.11c)
   (A, ·) group ⇒ (A^M, ·) group,   (4.11d)

where (4.11a) – (4.11c) are immediate from (4.10). To verify (4.11d), let (A, ·) be a group with e ∈ A being neutral. If f ∈ A^M, then let

   g : M −→ A,   g(x) := (f(x))⁻¹.

Then

   ∀x∈M   (f · g)(x) = f(x) · (f(x))⁻¹ = e,

showing g to be inverse to f in A^M, i.e. (A^M, ·) forms a group.

The, perhaps, most important notion when studying structures is that of the composition-respecting map. The technical term for such a map is homomorphism, which we define next:

Definition 4.10. Let (A, ◦) and (B, ◦) be magmas (caveat: to simplify notation, we denote both compositions by ◦; however, they might even denote different compositions on the same set A = B). A map φ : A −→ B is called homomorphism if, and only if,

∀x,y∈A : φ(x ◦ y) = φ(x) ◦ φ(y). (4.12)

If φ is a homomorphism, then it is called monomorphism if, and only if, it is injective; epimorphism if, and only if, it is surjective; isomorphism if, and only if, it is bijective; endomorphism if, and only if, (A, ◦) = (B, ◦); automorphism if, and only if, it is both endomorphism and isomorphism. Moreover, (A, ◦) and (B, ◦) are called isomorphic (denoted (A, ◦) ∼= (B, ◦)) if, and only if, there exists an isomorphism φ : A −→ B. If A and B have neutral elements e ∈ A, e′ ∈ B, then a homomorphism φ : A −→ B is called unital if, and only if, φ(e) = e′.

If (A, ◦) and (B, ◦) are isomorphic, then, from an algebraic perspective, the two structures are the same, except that the elements of the underlying set have been “renamed” (where the isomorphism provides the assignment rule for obtaining the new names).

Proposition 4.11. Let (A, ◦), (B, ◦), (C, ◦) be magmas.

(a) If φ : A −→ B and ψ : B −→ C are homomorphisms, then so is ψ ◦ φ. If φ, ψ are unital, so is ψ ◦ φ.

(b) If φ : A −→ B is an isomorphism, then φ−1 is an isomorphism as well.

(c) Let φ : A −→ B be an epimorphism. Then the following implications hold:

(A, ◦) commutative ⇒ (B, ◦) commutative,

(A, ◦) semigroup ⇒ (B, ◦) semigroup,

(A, ◦) group ⇒ (B, ◦) group.

If φ is even an isomorphism, then the above implications become equivalences.


(d) Let φ : A −→ B be an epimorphism. If e ∈ A is neutral, then B contains a neutral element and φ is unital.

(e) If (A, ◦), (B, ◦) are groups and φ : A −→ B is a homomorphism, then φ is unital and φ(a⁻¹) = (φ(a))⁻¹ for each a ∈ A.

Proof. (a): If φ and ψ are homomorphisms and x, y ∈ A, then

(ψ ◦ φ)(x ◦ y) = ψ(φ(x) ◦ φ(y)) = ψ(φ(x)) ◦ ψ(φ(y)),

showing ψ ◦ φ to be a homomorphism. If e ∈ A, e′ ∈ B, e′′ ∈ C are neutral and φ and ψ are unital, then

(ψ ◦ φ)(e) = ψ(e′) = e′′,

showing ψ ◦ φ to be unital.

(b): If φ is an isomorphism and x, y ∈ B, then

φ⁻¹(x ◦ y) = φ⁻¹( φ(φ⁻¹(x)) ◦ φ(φ⁻¹(y)) ) = φ⁻¹( φ(φ⁻¹(x) ◦ φ⁻¹(y)) ) = φ⁻¹(x) ◦ φ⁻¹(y),

showing φ⁻¹ to be an isomorphism.

(c): Exercise.

(d): It suffices to show e′ := φ(e) is neutral. Let b ∈ B and a ∈ A such that φ(a) = b. Then

b ◦ e′ = φ(a) ◦ φ(e) = φ(a ◦ e) = φ(a) = b,

e′ ◦ b = φ(e) ◦ φ(a) = φ(e ◦ a) = φ(a) = b,

proving e′ to be neutral as desired.

(e): Let (A, ◦), (B, ◦) be groups with neutral elements e ∈ A, e′ ∈ B, and let φ : A −→ B be a homomorphism. Then

φ(e) ◦ e′ = φ(e) = φ(e ◦ e) = φ(e) ◦ φ(e).

Applying (φ(e))⁻¹ to both sides of the above equality proves φ(e) = e′. Moreover, for each a ∈ A,

φ(a⁻¹) ◦ φ(a) = φ(a⁻¹ ◦ a) = φ(e) = e′,

proving (e). □

Example 4.12. (a) If (A, ◦) is a magma, then it is immediate that the identity Id : A −→ A is an automorphism.

(b) If (A, ◦), (B, ◦) are magmas, where e ∈ B is neutral, then the constant map φ ≡ e is a homomorphism: Indeed,

∀x,y∈A : φ(x ◦ y) = e = e ◦ e = φ(x) ◦ φ(y).


(c) If (A, ·) is a semigroup with neutral element 1 ∈ A, then, for each fixed a ∈ A,

φa : N0 −→ A, φa(n) := aⁿ, (4.13)

is a homomorphism from (N0,+) into (A, ·); if (A, ·) is a group, we may extend φa to Z, and it becomes a homomorphism from (Z,+) into (A, ·): These statements are immediate from Th. 4.8(a). Note that, if A = N0, Z, Q, R, C and (A, ·) = (A,+), then φa(n) = na.

(d) If (G, ·) is an abelian group, then for each fixed k ∈ Z,

φk : G −→ G, φk(a) := aᵏ, (4.14)

is an endomorphism due to Th. 4.8(c) (the case k = −1, where a ↦ a⁻¹, is of particular interest). Note that, if G = N0, Z, Q, R, C and (G, ·) = (G,+), then φk(a) = ka.

Notation 4.13. A composition ◦ on a nonempty finite set A can be defined by means of a composition table (also known as a multiplication table or Cayley table): For each element a ∈ A, the table has a row labeled by a and it has a column labeled by a; if a, b ∈ A, then the entry at the intersection of the row labeled a and the column labeled b is a ◦ b (see Ex. 4.14 below for examples).

Example 4.14. (a) The most trivial groups are the group with one element and the group with two elements, which can be defined by the following Cayley tables (where e, a are distinct elements):

◦ | e        ◦ | e a
e | e        e | e a
             a | a e

Both structures are abelian groups, which is immediate for the group with one element and is easily verified for the group with two elements. We will see later that they are isomorphic to the groups we will denote Z1 and Z2 (which, in light of Prop. 4.11(c), shows again that the structure with two elements is a group).
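One can verify the group axioms for such a Cayley table by brute force. Here is a small sketch (the helper `is_group` is ours, not from the notes), checking the two-element table above:

```python
from itertools import product

def is_group(elems, op):
    """Brute-force check of a Cayley table `op` (a dict (a, b) -> a.b):
    associativity, existence of a neutral element, and inverses."""
    assoc = all(op[op[a, b], c] == op[a, op[b, c]]
                for a, b, c in product(elems, repeat=3))
    neutrals = [e for e in elems if all(op[e, a] == a == op[a, e] for a in elems)]
    if not (assoc and neutrals):
        return False
    e = neutrals[0]
    return all(any(op[a, b] == e == op[b, a] for b in elems) for a in elems)

# Cayley table of the two-element group: e is neutral, a . a = e.
elems = ["e", "a"]
op = {("e", "e"): "e", ("e", "a"): "a", ("a", "e"): "a", ("a", "a"): "e"}
assert is_group(elems, op)
```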

(b) The following examples show that, in non-groups, left and right neutral elements need not be unique: For a ≠ b, let

◦ | a b      ◦ | a b
a | a b      a | a a
b | a b      b | b b

One easily verifies these structures to be associative, but they are not groups: The magma on the left has no right neutral element, the magma on the right does not have inverses to both a and b.


Theorem 4.15. Let (A,+) be an abelian semigroup with neutral element 0 ∈ A and assume (A,+) to satisfy the cancellation laws (4.4) (with ◦ replaced by +). Consider the quotient set

G := (A × A)/∼ = { [(a, b)] : (a, b) ∈ A × A } (4.15)

with respect to the equivalence relation defined by

(a, b) ∼ (c, d) :⇔ a+ d = b+ c. (4.16)

Introducing the simplified notation [a, b] := [(a, b)], we define addition and subtraction on G by

+ : G × G −→ G, ([a, b], [c, d]) ↦ [a, b] + [c, d] := [a + c, b + d], (4.17a)

− : G × G −→ G, ([a, b], [c, d]) ↦ [a, b] − [c, d] := [a, b] + [d, c]. (4.17b)

Then addition and subtraction on G are well-defined, (G,+) forms an abelian group, where [0, 0] is the neutral element, [b, a] is the inverse element of [a, b] for each a, b ∈ A, and, denoting the inverse element of [a, b] by −[a, b] in the usual way, [a, b] − [c, d] = [a, b] + (−[c, d]) for each a, b, c, d ∈ A. Moreover, the map

ι : A −→ G, ι(a) := [a, 0], (4.18)

is a monomorphism, where it is customary to identify A with ι(A). One then just writes a instead of [a, 0] and −a instead of [0, a] = −[a, 0]. The most important application is the case A = N0, which yields that addition on Z = G forms an abelian group (cf. Ex. 2.36(a), [Phi16, Th. D.2], [Phi16, Th. D.7(b)]).

Proof. If one reexamines Ex. 2.36(a), replacing N0 by A and Z by G, then one sees it shows ∼ to constitute an equivalence relation, where the proof makes use of commutativity, associativity, and the cancellation law. To verify + and − are well-defined on G, we need to check the above definitions do not depend on the chosen representatives, i.e.

∀a,b,c,d,a′,b′,c′,d′∈A ( [a, b] = [a′, b′] ∧ [c, d] = [c′, d′] ⇒ [a + c, b + d] = [a′ + c′, b′ + d′] ) (4.19a)

and

∀a,b,c,d,a′,b′,c′,d′∈A ( [a, b] = [a′, b′] ∧ [c, d] = [c′, d′] ⇒ [a, b] − [c, d] = [a′, b′] − [c′, d′] ). (4.19b)

Indeed, [a, b] = [a′, b′] means a + b′ = b + a′, and [c, d] = [c′, d′] means c + d′ = d + c′, implying a + c + b′ + d′ = b + d + a′ + c′, i.e. [a + c, b + d] = [a′ + c′, b′ + d′], proving (4.19a). Now (4.19b) is just (4.17b) combined with (4.19a). To verify commutativity and associativity on G, let a, b, c, d, e, f ∈ A. Then

[a, b] + [c, d] = [a + c, b + d] = [c + a, d + b] = [c, d] + [a, b],


proving commutativity, and

[a, b] + ([c, d] + [e, f ]) = [a, b] + [c + e, d + f ] = [a + (c + e), b + (d + f )]
= [(a + c) + e, (b + d) + f ] = [a + c, b + d] + [e, f ]
= ([a, b] + [c, d]) + [e, f ],

proving associativity. For every a, b ∈ A, one obtains [a, b] + [0, 0] = [a + 0, b + 0] = [a, b], proving neutrality of [0, 0], whereas [a, b] + [b, a] = [a + b, b + a] = [a + b, a + b] = [0, 0] (since (a + b, a + b) ∼ (0, 0)) shows [b, a] = −[a, b]. Now [a, b] − [c, d] = [a, b] + (−[c, d]) is immediate from (4.17b). The map ι is injective, as, for each a, b ∈ A, ι(a) = [a, 0] = ι(b) = [b, 0] implies a + 0 = 0 + b, i.e. a = b. The map ι is a homomorphism, as

∀a,b∈A : ι(a + b) = [a + b, 0] = [a, 0] + [b, 0] = ι(a) + ι(b),

completing the proof of the theorem. □
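The construction of Th. 4.15 can be mimicked concretely for A = N0, picking from each class [a, b] the canonical representative with at least one component equal to 0 (the helper names `normalize`, `add`, `sub` are ours):

```python
def normalize(p):
    """Canonical representative of the class [a, b]: subtract min(a, b),
    so (a, b) ~ (a - b, 0) if a >= b, and (a, b) ~ (0, b - a) otherwise."""
    a, b = p
    m = min(a, b)
    return (a - m, b - m)

def add(p, q):
    """[a, b] + [c, d] := [a + c, b + d], cf. (4.17a)."""
    return normalize((p[0] + q[0], p[1] + q[1]))

def sub(p, q):
    """[a, b] - [c, d] := [a, b] + [d, c], cf. (4.17b)."""
    return add(p, (q[1], q[0]))

# (2, 0) plays the role of 2, (0, 5) the role of -5:
assert add((2, 0), (0, 5)) == (0, 3)   # 2 + (-5) = -3
assert sub((0, 3), (0, 5)) == (2, 0)   # -3 - (-5) = 2
assert add((4, 1), (1, 4)) == (0, 0)   # [b, a] is inverse to [a, b]
```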

Definition 4.16. Let (G, ◦) be a group, ∅ ≠ U ⊆ G. We call U a subgroup of G (denoted U ≤ G) if, and only if, (U, ◦) forms a group, where the composition on U is the restriction of the composition on G, i.e. ◦↾U×U.

Theorem 4.17. Let (G, ◦) be a group, ∅ ≠ U ⊆ G. Then U ≤ G if, and only if, (i) and (ii) hold, where

(i) For each u, v ∈ U , one has u ◦ v ∈ U .

(ii) For each u ∈ U, one has u⁻¹ ∈ U.

If U is finite, then U ≤ G is already equivalent to (i).

Proof. Exercise. □
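For finite U inside a small ambient group, the two conditions of Th. 4.17 can be checked mechanically. A sketch in Z6 = {0, …, 5} with addition mod 6 (the helper `is_subgroup` is ours):

```python
def is_subgroup(U, n):
    """Th. 4.17 test inside (Z_n, + mod n): closure (i) and inverses (ii).
    For finite U, Th. 4.17 says closure (i) alone would already suffice."""
    return bool(U) and all((u + v) % n in U for u in U for v in U) \
        and all((-u) % n in U for u in U)

assert is_subgroup({0, 2, 4}, 6)       # 2Z_6 is a subgroup of Z_6
assert is_subgroup({0, 3}, 6)          # 3Z_6 is a subgroup of Z_6
assert not is_subgroup({0, 1}, 6)      # not closed: 1 + 1 = 2 is missing
```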

Example 4.18. (a) If (G, ◦) is an arbitrary group with neutral element e ∈ G, then it is immediate from Th. 4.17 that {e} and G are always subgroups.

(b) We use Th. 4.17 to verify that, for each k ∈ Z, kZ = {kl : l ∈ Z} is a subgroup of (Z,+): As 0 ∈ kZ, kZ ≠ ∅. If l1, l2 ∈ Z, then kl1 + kl2 = k(l1 + l2) ∈ kZ. If l ∈ Z, then −kl = k(−l) ∈ kZ. Thus, Th. 4.17 applies, showing kZ ≤ Z.

(c) As N is not a subgroup of Z, even though Th. 4.17(i) holds for N, we see that, in general, Th. 4.17(ii) cannot be omitted for infinite subsets.

(d) If (G, ◦) is a group with neutral element e ∈ G, I ≠ ∅ is an index set, and (Ui)i∈I is a family of subgroups, then U := ⋂i∈I Ui is a subgroup as well: Indeed, e ∈ U, since e ∈ Ui for each i ∈ I. If a, b ∈ U, then a, b ∈ Ui for each i ∈ I, implying a ◦ b ∈ Ui for each i ∈ I, i.e. a ◦ b ∈ U. Moreover, a⁻¹ ∈ Ui for each i ∈ I as well, showing a⁻¹ ∈ U. Thus, U is a subgroup by Th. 4.17.


(e) In contrast to intersections, unions of subgroups are not necessarily subgroups: We know from (b) that 2Z and 3Z are subgroups of (Z,+). However, 2 ∈ 2Z, 3 ∈ 3Z, but 5 = 2 + 3 ∉ (2Z) ∪ (3Z), showing (2Z) ∪ (3Z) is not a subgroup of (Z,+).

(f) One can show that every group G is isomorphic to a subgroup of the permutation group SG (see Sec. C of the Appendix, in particular, Cayley's Th. C.2).

Definition 4.19. Let G and H be groups and let φ : G −→ H be a homomorphism. Let e′ ∈ H be neutral. Define the sets

kerφ := φ⁻¹{e′} = {a ∈ G : φ(a) = e′}, (4.20a)

Imφ := φ(G) = {φ(a) : a ∈ G}, (4.20b)

where kerφ is called the kernel of φ and Imφ is called the image of φ.

Theorem 4.20. Let (G, ·) and (H, ·) be groups and let φ : G −→ H be a homomorphism. Let e ∈ G and e′ ∈ H be the respective neutral elements. Then the following statements hold true:

(a) kerφ ≤ G.

(b) Imφ ≤ H.

(c) φ is a monomorphism (i.e. injective) if, and only if, kerφ = {e}.

(d) It holds true that

∀a∈G : φ⁻¹({φ(a)}) = a(kerφ) = (kerφ)a,

i.e. the nonempty preimages of elements of H are cosets of the kernel of φ (cf. Ex. 4.9(d)).

Proof. (a) – (c): Exercise.

(d): Fix a ∈ G. Suppose b ∈ φ⁻¹({φ(a)}). Then φ(a) = φ(b), implying φ(a⁻¹b) = (φ(a))⁻¹φ(b) = e′ and a⁻¹b ∈ kerφ. Thus, b ∈ a(kerφ) and φ⁻¹({φ(a)}) ⊆ a(kerφ). Similarly, φ(ba⁻¹) = φ(b)(φ(a))⁻¹ = e′, showing b ∈ (kerφ)a and φ⁻¹({φ(a)}) ⊆ (kerφ)a. Conversely, suppose b ∈ a(kerφ) (resp. b ∈ (kerφ)a). Then a⁻¹b ∈ kerφ (resp. ba⁻¹ ∈ kerφ), implying

e′ = φ(a⁻¹b) = φ(a⁻¹)φ(b)  (resp. e′ = φ(ba⁻¹) = φ(b)φ(a⁻¹)).

In both cases, we conclude φ(a) = φ(b), i.e. b ∈ φ⁻¹({φ(a)}). Thus a(kerφ) ⊆ φ⁻¹({φ(a)}) as well as (kerφ)a ⊆ φ⁻¹({φ(a)}), thereby establishing the case. □

Example 4.21. According to Ex. 4.12(c), if k ∈ Z, then the map φk : Z −→ Z, φk(l) := kl, is a homomorphism of (Z,+) into itself. Clearly, Imφk = kZ, showing, once again, kZ ≤ Z (cf. Ex. 4.18(b)). For another example, let G := {e, a} be the group with two elements of Ex. 4.14(a). According to Ex. 4.12(c), the map φ : Z −→ G, φ(n) := aⁿ, is a homomorphism. Clearly, φ is an epimorphism with kerφ = 2Z (i.e. the set of all even numbers).
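Kernels and images can be computed by brute force in small groups. Below is a sketch for the endomorphisms φk of (Z6, +) in the spirit of Ex. 4.12(d) (the helper names are ours):

```python
n = 6  # work in (Z_6, +)

def phi(k, a):
    """The endomorphism phi_k(a) = k*a of (Z_n, +), cf. Ex. 4.12(d)."""
    return (k * a) % n

kernel = {a for a in range(n) if phi(2, a) == 0}
image = {phi(2, a) for a in range(n)}
assert kernel == {0, 3}      # ker phi_2 = 3Z_6
assert image == {0, 2, 4}    # Im phi_2 = 2Z_6
# |ker| * |Im| = |G|, consistent with the isomorphism theorem Th. 4.26(b):
assert len(kernel) * len(image) == n
```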


In Ex. 4.9(d), we have defined cosets for a given subset of a magma. For a group G, cosets of subgroups are of particular interest: We will see in Prop. 4.22 that, if U ≤ G, then the cosets of U form a partition of G, giving rise to an equivalence relation on G according to Th. 2.33(a). Thus, the cosets form the quotient set with respect to this equivalence relation and, if the subgroup is sufficiently benign (namely what we will call a normal subgroup in Def. 4.23 below), then we can make this quotient set itself into another group, the so-called quotient group or factor group G/U (cf. Def. 4.25 and Th. 4.26(a) below).

Proposition 4.22. Let (G, ·) be a group, U ≤ G. Then

G = ⋃a∈G aU,  ∀a,b∈G ( aU = bU ∨ aU ∩ bU = ∅ ), (4.21a)

G = ⋃a∈G Ua,  ∀a,b∈G ( Ua = Ub ∨ Ua ∩ Ub = ∅ ), (4.21b)

i.e. both the left cosets and the right cosets of U form decompositions of G.

Proof. We conduct the proof for left cosets – the proof for right cosets is completely analogous. For each a ∈ G, we have a ∈ aU (since e ∈ U), already showing G = ⋃a∈G aU. Now let a, b, x ∈ G with x ∈ aU ∩ bU. We need to prove aU = bU. From x ∈ aU ∩ bU, we obtain

∃u1,u2∈U : x = au1 = bu2. (4.22)

Now let au ∈ aU (i.e. u ∈ U). As (4.22) implies a = bu2u1⁻¹, we obtain au = bu2u1⁻¹u ∈ bU, using u2u1⁻¹u ∈ U, due to U being a subgroup. This shows aU ⊆ bU. Analogously, if bu ∈ bU, then bu = au1u2⁻¹u ∈ aU, where we used b = au1u2⁻¹ due to (4.22) and u1u2⁻¹u ∈ U. This shows bU ⊆ aU, completing the proof. □

Definition 4.23. Let (G, ·) be a group with subgroup U ≤ G. Then U is called normal if, and only if,

∀a∈G : aU = Ua, (4.23)

i.e. if, and only if, left and right cosets with respect to U are the same.

Example 4.24. (a) If G is an abelian group, then every subgroup of G is normal.

(b) Let (G, ·) and (H, ·) be groups and let φ : G −→ H be a homomorphism. Then kerφ is a normal subgroup of G as a consequence of Th. 4.20(d).

(c) In Ex. 4.9(b), we defined the symmetric groups S2 and S3. Clearly, the map

φ : S2 −→ S3, f ↦ φ(f), where φ(f)(n) := f(n) for n ∈ {1, 2} and φ(f)(3) := 3,

constitutes a monomorphism. By identifying S2 with φ(S2), we may consider S2 as a subgroup of S3. We claim that S2 is not a normal subgroup of S3. Indeed, one checks that

(23)S2 = (23){Id, (12)} = {(23), (132)} ≠ S2(23) = {(23), (123)},


where we made use of a notation commonly used for permutations (we will need to study it more thoroughly later): (12) is the map that permutes 1 and 2 and leaves 3 fixed, (23) permutes 2 and 3 and leaves 1 fixed, (132) maps 1 into 3, 3 into 2, 2 into 1, (123) maps 1 into 2, 2 into 3, 3 into 1.
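The failed normality of Ex. 4.24(c) can be confirmed by direct computation, representing a permutation f of {1, 2, 3} as the tuple (f(1), f(2), f(3)) and composing via (f ◦ g)(x) = f(g(x)); the helper names are ours:

```python
def comp(f, g):
    """Composition (f . g)(x) = f(g(x)) of permutations of {1, 2, 3},
    each stored as the tuple (f(1), f(2), f(3))."""
    return tuple(f[g[x - 1] - 1] for x in (1, 2, 3))

ident = (1, 2, 3)
t12 = (2, 1, 3)            # the transposition (12)
t23 = (1, 3, 2)            # the transposition (23)
S2 = [ident, t12]          # S_2 embedded into S_3 as in Ex. 4.24(c)

left = {comp(t23, f) for f in S2}   # the left coset (23)S_2
right = {comp(f, t23) for f in S2}  # the right coset S_2(23)
assert comp(t23, t12) == (3, 1, 2)  # the 3-cycle (132): 1->3, 3->2, 2->1
assert comp(t12, t23) == (2, 3, 1)  # the 3-cycle (123): 1->2, 2->3, 3->1
assert left != right                # S_2 is not normal in S_3
```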

Definition 4.25. Let (G, ·) be a group and let N ≤ G be a normal subgroup. According to Prop. 4.22, the cosets of N form a partition of G. According to Th. 2.33(a), the cosets are precisely the equivalence classes of the equivalence relation ∼ on G, defined by

a ∼ b :⇔ aN = bN.

Define

G/N := G/∼,

i.e. G/N is the set of all cosets of N. We define a composition on G/N by letting

∀a,b∈G : (aN) · (bN) := abN (4.24)

(note that we already know the forming of cosets to be associative by Ex. 4.9(d)). We call (G/N, ·) the quotient group of G by N or the factor group of G with respect to N (cf. Th. 4.26(a) below).

Theorem 4.26. (a) Let (G, ·) be a group and let N ≤ G be a normal subgroup. Then (4.24) well-defines a composition on G/N that makes G/N into a group with neutral element N. Moreover, the map

φN : G −→ G/N, φN(a) := aN, (4.25)

is an epimorphism, called the canonical or natural homomorphism from G onto G/N (comparing with Def. 2.32(b), we see that φN is precisely the quotient map with respect to the equivalence relation ∼ of Def. 4.25).

(b) Isomorphism Theorem: If (G, ·) and (H, ·) are groups and φ : G −→ H is a homomorphism, then

G/ kerφ ∼= Imφ. (4.26)

More precisely, the map

f : G/ kerφ −→ Imφ, f(a kerφ) := φ(a), (4.27)

is well-defined and constitutes an isomorphism. If fe : G −→ G/ kerφ denotes the natural epimorphism according to (a) and ι : Imφ −→ H, ι(a) := a, denotes the embedding, then fm : G/ kerφ −→ H, fm := ι ◦ f, is a monomorphism such that

φ = fm ◦ fe. (4.28)

Proof. (a): To see that (4.24) well-defines a composition on G/N, we need to show that the definition is independent of the chosen coset representatives a, b: To this end, suppose a, b, a′, b′ ∈ G are such that aN = a′N and bN = b′N. We need to show


abN = a′b′N. There exist n_a, n_b ∈ N such that a′ = an_a, b′ = bn_b. Since N is a normal subgroup of G, we have bN = Nb and there exists n ∈ N such that n_a b = bn. Then, as nn_bN = N, we obtain

a′b′N = an_a bn_bN = abnn_bN = abN,

as needed. That φN is surjective is immediate from (4.25). Moreover, the computation

∀a,b∈G : φN(ab) = abN = (aN) · (bN) = φN(a) · φN(b)

verifies φN to be a homomorphism. That (G/N, ·) forms a group is now an immediate consequence of Prop. 4.11(c),(d).

(b): We conduct the proof by showing f(a kerφ) does not depend on the chosen coset representative a ∈ G: Let e′ ∈ H be neutral and suppose a′ ∈ G is such that a′ kerφ = a kerφ. Then there exists x ∈ kerφ such that a′ = ax, implying

f(a′ kerφ) = φ(a′) = φ(ax) = φ(a)φ(x) = φ(a) · e′ = φ(a) = f(a kerφ),

as desired. To prove f to be a homomorphism, let a, b ∈ G. Then

f((a kerφ) · (b kerφ)) = f(ab kerφ) = φ(ab) = φ(a)φ(b) = f(a kerφ)f(b kerφ),

i.e. f is a homomorphism. If x ∈ Imφ, then there exists a ∈ G with x = φ(a). Thus, f(a kerφ) = φ(a) = x, showing f to be surjective. Now suppose a ∈ G is such that f(a kerφ) = e′. Then φ(a) = e′, i.e. a ∈ kerφ. Thus, a kerφ = kerφ, showing ker f = {kerφ}, i.e. f is injective, completing the proof that f is an isomorphism. Since f and ι are monomorphisms, so is fm = ι ◦ f. Moreover, if a ∈ G, then

(fm ◦ fe)(a) = fm(a kerφ) = f(a kerφ) = φ(a),

thereby proving (4.28). □

Example 4.27. (a) Consider (Z,+) and let n ∈ N. We know from Ex. 4.18(b) that nZ ≤ Z. As (Z,+) is also commutative, nZ is a normal subgroup of Z and we can form the quotient group

Zn := Z/nZ. (4.29)

The elements of Zn are the cosets r + nZ, r ∈ Z. For r ∈ Z, we have

r̄ := r + nZ = {r + mn : m ∈ Z}.

Note that r + mn + nZ = r + nZ. For integers k, l ∈ Z, one says k is congruent to l modulo n if, and only if, there exists m ∈ Z such that k − l = mn and, in this case, one writes k ≡ l (mod n). If k = r + mn with m ∈ Z, then k − r = mn, implying

r + nZ = {k ∈ Z : k ≡ r (mod n)}.

For this reason, one also calls the elements of Zn congruence classes. If k = r + mn as above, one also says that r is the residue (or remainder) when dividing k by n


and, thus, one also calls the elements of Zn residue classes. Here, the canonical homomorphism of Th. 4.26(a) takes the form

φ : Z −→ Zn, φ(r) := r̄ = r + nZ.

We also note that

Z = ⋃_{r=0}^{n−1} r̄. (4.30)

Now consider G := {0, . . . , n − 1} and define an addition ⊕ on G by letting

∀r,s∈G : r ⊕ s := r + s for r + s ≤ n − 1,  r ⊕ s := r + s − n for n ≤ r + s ≤ 2n − 2,

which is commutative due to (Z,+) being commutative. We claim (G,⊕) ∼= (Zn,+) due to the isomorphism f : G −→ Zn, f(r) := r̄. If r, s ∈ G, then

f(r ⊕ s) = (r + s) + nZ = (r + nZ) + (s + nZ) = f(r) + f(s) for r + s ≤ n − 1,

f(r ⊕ s) = (r + s − n) + nZ = (r + s) + nZ = (r + nZ) + (s + nZ) = f(r) + f(s) for n ≤ r + s ≤ 2n − 2,

showing f to be a homomorphism. Moreover, if r ≠ s for r, s ∈ G, then r̄ ≠ s̄, i.e. f is injective, while (4.30) shows f to be surjective, completing the proof that f is an isomorphism. In particular, (G,⊕) is a group, due to Prop. 4.11(c). Even though it constitutes a slight abuse of notation, we will tend to use the isomorphism f to identify G with Zn, where we write simply + instead of ⊕ and write r instead of r̄, identifying r̄ with its standard representative r ∈ r̄. The Cayley tables for (Z1,+) and (Z2,+) are

+ | 0        + | 0 1
0 | 0        0 | 0 1
             1 | 1 0

respectively. Comparing with Ex. 4.14(a), we see that φ1 : Z1 −→ {e}, φ1(0) := e, and φ2 : Z2 −→ {e, a}, φ2(0) := e, φ2(1) := a, are isomorphisms.
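The piecewise addition ⊕ on G = {0, …, n − 1} above is just addition mod n in disguise, which one can confirm numerically (the helper name `oplus` is ours):

```python
def oplus(r, s, n):
    """The addition of Ex. 4.27(a) on G = {0, ..., n-1}:
    r + s if the sum stays below n, else r + s - n."""
    return r + s if r + s <= n - 1 else r + s - n

n = 5
for r in range(n):
    for s in range(n):
        assert oplus(r, s, n) == (r + s) % n  # agrees with mod-n addition

# the Cayley table of (Z_2, +) from the text:
assert [[oplus(r, s, 2) for s in range(2)] for r in range(2)] == [[0, 1], [1, 0]]
```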

(b) Let m, n ∈ N. As an example of an application of the isomorphism theorem of Th. 4.26(b), we show

mZ/mnZ ∼= Zn :

Consider the map

φ : mZ −→ Zn, φ(mr) := r + nZ.

Then φ is a homomorphism, since

∀r,s∈Z : φ(mr + ms) = φ(m(r + s)) = (r + s) + nZ = (r + nZ) + (s + nZ) = φ(mr) + φ(ms).

Clearly, φ is surjective, i.e. Imφ = Zn. Moreover,

mr ∈ kerφ ⇔ r ∈ nZ ⇔ ∃k∈Z : r = kn ⇔ mr = mkn ∈ mnZ,

showing kerφ = mnZ. In consequence, mZ/mnZ ∼= Zn holds due to Th. 4.26(b).


4.2 Rings and Fields

While groups are already sufficiently rich to give rise to the vast algebraic discipline called group theory, before we can define vector spaces, we still need to consider structures called rings and fields (where fields are rings that are especially benign). Rings are richer and, in general, more complicated than groups, as they always come with two compositions instead of just one.

Definition 4.28. Let R be a nonempty set with two composition maps

+ : R × R −→ R, (x, y) ↦ x + y,
· : R × R −→ R, (x, y) ↦ x · y (4.31)

(+ is called addition and · is called multiplication; as before, one often writes xy instead of x · y). Then (R,+, ·) (or just R, if + and · are understood) is called a ring if, and only if, the following three conditions are satisfied:

(i) (R,+) is a commutative group (its neutral element is denoted by 0).

(ii) Multiplication is associative.

(iii) Distributivity:

∀x,y,z∈R : x · (y + z) = x · y + x · z, (4.32a)

∀x,y,z∈R : (y + z) · x = y · x + z · x. (4.32b)

A ring R is called commutative if, and only if, its multiplication is commutative. Moreover, a ring is called a ring with unity if, and only if, R contains a neutral element with respect to multiplication (i.e. there is 1 ∈ R such that 1 · x = x · 1 = x for each x ∈ R) – some authors always require a ring to have a neutral element with respect to multiplication. Finally, (R,+, ·) is called a field if, and only if, it is a ring and, in addition, (R \ {0}, ·) constitutes a commutative group (its neutral element is denoted by 1). For each x, y in a ring R, define y − x := y + (−x), where −x is the additive inverse of x. If R is a field, then, for x ≠ 0, also define the fractions y/x := yx⁻¹ with numerator y and denominator x, where x⁻¹ denotes the multiplicative inverse of x.

Like group theory, ring theory and field theory are vast and important subdisciplines of algebra. Here we will merely give a brief introduction to some elementary notions and results, before proceeding to vector spaces, which are defined building on the notion of a field. Before we come to examples of rings and fields, we will prove a number of basic rules. However, it might be useful to already know that, with the usual addition and multiplication, Z is a ring (but not a field), and Q, R, C all are fields.

Theorem 4.29. The following statements and rules are valid in every ring (R,+, ·) (let 0 denote the additive neutral element and let x, y, z ∈ R):


(a) x · 0 = 0 = 0 · x.

(b) x(−y) = −(xy) = (−x)y.

(c) (−x)(−y) = xy.

(d) x(y − z) = xy − xz, (y − z)x = yx− zx.

Proof. (a): One computes

x · 0 + x · 0 = x · (0 + 0) = x · 0 = 0 + x · 0,

where the first equality is due to (4.32a); thus, x · 0 = 0 follows, since (R,+) is a group. The second equality follows analogously, using (4.32b).

(b): xy + x(−y) = x(y − y) = x · 0 = 0, where we used (4.32a) and (a). This shows x(−y) is the additive inverse to xy. The second equality follows analogously using (4.32b).

(c): xy = −(−(xy)) = −(x(−y)) = (−x)(−y), where (b) was used twice.

(d): x(y − z) = x(y + (−z)) = xy + x(−z) = xy − xz and (y − z)x = (y + (−z))x = yx + (−z)x = yx − zx. □

Theorem 4.30. The following statement and rules are valid in every field (F,+, ·):

(a) xy = 0 ⇒ x = 0 ∨ y = 0.

(b) Rules for Fractions:

a/c + b/d = (ad + bc)/(cd),  (a/c) · (b/d) = (ab)/(cd),  (a/c)/(b/d) = (ad)/(bc),

where all denominators are assumed ≠ 0.

Proof. (a): If xy = 0 and x ≠ 0, then y = 1 · y = x⁻¹xy = x⁻¹ · 0 = 0.

(b): One computes

a/c + b/d = ac⁻¹ + bd⁻¹ = add⁻¹c⁻¹ + bcc⁻¹d⁻¹ = (ad + bc)(cd)⁻¹ = (ad + bc)/(cd)

and

(a/c) · (b/d) = ac⁻¹bd⁻¹ = ab(cd)⁻¹ = (ab)/(cd)

and

(a/c)/(b/d) = ac⁻¹(bd⁻¹)⁻¹ = ac⁻¹b⁻¹d = ad(bc)⁻¹ = (ad)/(bc),

completing the proof. □
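In the field Q, the fraction rules of Th. 4.30(b) can be spot-checked exactly with Python's `fractions.Fraction` type:

```python
from fractions import Fraction

a, b, c, d = 3, 5, 7, 11  # arbitrary nonzero sample values

# a/c + b/d = (ad + bc)/(cd)
assert Fraction(a, c) + Fraction(b, d) == Fraction(a * d + b * c, c * d)
# (a/c) * (b/d) = ab/(cd)
assert Fraction(a, c) * Fraction(b, d) == Fraction(a * b, c * d)
# (a/c) / (b/d) = ad/(bc)
assert Fraction(a, c) / Fraction(b, d) == Fraction(a * d, b * c)
```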


Definition and Remark 4.31. Let (R,+, ·) be a ring and a ∈ R. Then a is called a left (resp. right) zero divisor if, and only if, there exists x ∈ R \ {0} such that ax = 0 (resp. xa = 0). If a ≠ 0 is a zero divisor, then it is called a nonzero or nontrivial zero divisor. According to Th. 4.29(a), 0 is always a zero divisor, except for R = {0}. According to Th. 4.30(a), in a field, there do not exist any nontrivial zero divisors. However, in general, rings can have nontrivial zero divisors (see, e.g., Ex. 4.37 and Ex. 4.41(a) below).

Example 4.32 (Ring of Integers Z). Even though we have occasionally already used multiplication on Z in examples, so far, we have not provided a mathematically precise definition. The present example will remedy this situation. Recall from Ex. 2.36(a) and Th. 4.15 the definition of Z as a set of equivalence classes of elements of N0 × N0, with addition (and subtraction) on Z defined according to (4.17). To obtain the expected laws of arithmetic, multiplication on Z needs to be defined such that (a − b) · (c − d) = (ac + bd) − (ad + bc), which leads to the following definition: Multiplication on Z is defined by

· : Z × Z −→ Z, ([a, b], [c, d]) ↦ [a, b] · [c, d] := [ac + bd, ad + bc]. (4.33)

(a) It is an exercise to show the definition in (4.33) does not depend on the chosen representatives, i.e.

∀a,b,c,d,a′,b′,c′,d′∈N0 ( [a, b] = [a′, b′] ∧ [c, d] = [c′, d′] ⇒ [ac + bd, ad + bc] = [a′c′ + b′d′, a′d′ + b′c′] ). (4.34)

(b) It is an exercise to show (Z,+, ·) is a commutative ring with unity, where [1, 0] is the neutral element of multiplication, and there are no nontrivial zero divisors, i.e.

∀a,b,c,d∈N0 ( [a, b] · [c, d] = [ac + bd, ad + bc] = [0, 0] ⇒ [a, b] = [0, 0] ∨ [c, d] = [0, 0] ). (4.35)

(c) (Z,+, ·) is not a field, since, e.g., there is no multiplicative inverse for 2 ∈ Z: While 0 · 2 = 0, one has n · 2 ≥ 2 for each n ∈ N and −n · 2 ≤ −2 for each n ∈ N, showing k · 2 ≠ 1 for each k ∈ Z.
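Definition (4.33) can be sanity-checked against ordinary integer arithmetic: the class [a, b] represents a − b, so [a, b] · [c, d] must represent (a − b)(c − d). A sketch (helper names ours):

```python
def mul(p, q):
    """[a, b] . [c, d] := [ac + bd, ad + bc], cf. (4.33)."""
    a, b = p
    c, d = q
    return (a * c + b * d, a * d + b * c)

def value(p):
    """The integer a - b represented by the class [a, b]."""
    return p[0] - p[1]

for p in [(3, 1), (0, 2), (5, 5), (1, 0)]:
    for q in [(4, 0), (0, 7), (2, 9)]:
        assert value(mul(p, q)) == value(p) * value(q)
assert mul((3, 1), (0, 2)) == (2, 6)  # represents 2 * (-2) = -4
assert mul((1, 0), (2, 9)) == (2, 9)  # [1, 0] is the unity
```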

Definition 4.33. (a) Let (R,+, ·) be a ring (resp. a field), ∅ ≠ U ⊆ R. We call U a subring (resp. a subfield) of R if, and only if, (U,+, ·) forms a ring (resp. a field), where the compositions on U are the respective restrictions of the compositions on R, i.e. +↾U×U and ·↾U×U.

(b) Let (R,+, ·) and (S,+, ·) be rings (caveat: to simplify notation, we use the same symbols to denote the compositions on R and S, respectively; however, they might even denote different compositions on the same set R = S). A map φ : R −→ S is called ring homomorphism if, and only if, φ is a homomorphism in the sense of Def. 4.10 with respect to both + and · (caveat: Ex. 4.35(a) below shows that a ring homomorphism is not necessarily unital with respect to multiplication). The notions ring monomorphism, epimorphism, isomorphism, endomorphism, automorphism are then defined as in Def. 4.10. Moreover, (R,+, ·) and (S,+, ·) are called isomorphic (denoted (R,+, ·) ∼= (S,+, ·)) if, and only if, there exists a ring isomorphism φ : R −→ S.

Theorem 4.34. (a) Let (R,+, ·) be a ring, ∅ ≠ U ⊆ R. Then U is a subring of R if, and only if, (i) and (ii) hold, where

(i) (U,+) is a subgroup of (R,+).

(ii) For each a, b ∈ U , one has ab ∈ U .

(b) Let (F,+, ·) be a field, ∅ ≠ U ⊆ F. Then U is a subfield of F if, and only if, (i) and (ii) hold, where

(i) (U,+) is a subgroup of (F,+).

(ii) (U \ {0}, ·) is a subgroup of (F \ {0}, ·).

Proof. In both cases, it is clear that (i) and (ii) are necessary. That (i) and (ii) are also sufficient in both cases is due to the fact that, if associativity (resp. commutativity, resp. distributivity) holds on R (resp. F), then it is immediate that the same property holds on U. □

Example 4.35. (a) Clearly, the trivial ring ({0},+, ·) (∼= (Z1,+, ·), see Ex. 4.37 below) is a subring of every ring (it is not a field, since {0} \ {0} = ∅ is not a group). If R and S are arbitrary rings, then, clearly, the constant map φ0 : R −→ S, φ0 ≡ 0, is always a ring homomorphism (this shows that a ring homomorphism is not necessarily unital with respect to multiplication). Also note that Th. 4.29(a) implies that any ring that has 0 = 1 (in the sense that the neutral elements for addition and multiplication are the same) necessarily has precisely one element, i.e. it is (isomorphic to) ({0},+, ·).

(b) Let n ∈ N. It is clear from Th. 4.34(a) that nZ is a subring of Z (note that, for n > 1, nZ is not a ring with unity). Moreover, for n > 1, φ : Z −→ nZ, φ(k) := nk, is not a ring homomorphism, since, e.g., φ(1 · 1) = n ≠ n² = φ(1)φ(1).

(c) If R, S are arbitrary rings and φ : R −→ S is a ring homomorphism, then it is clear from Th. 4.34(a) that Imφ is a subring of S. Moreover, kerφ := φ⁻¹({0}) is a subring of R: This is also due to Th. 4.34(a), since (kerφ,+) ≤ (R,+) and, if a, b ∈ kerφ, then φ(ab) = φ(a)φ(b) = 0, showing ab ∈ kerφ. However, if R and S are rings with unity and φ(1) = 1, then kerφ is a ring with unity if, and only if, S = {0}: Indeed, if 1 ∈ kerφ, then 0 = φ(1) = 1 in S.

(d) Let (R,+, ·) be a ring (resp. a ring with unity, resp. a field), let I ≠ ∅ be an index set, and let (Ui)i∈I be a family of subrings (resp. subrings with unity, resp. subfields). It is then an exercise to show U := ⋂i∈I Ui is a subring (resp. a subring with unity, resp. a subfield).

(e) In contrast to intersections, unions of rings are not necessarily rings: We know from (b) that 2Z and 3Z are subrings of Z. But we already noted in Ex. 4.18(e) that


(2Z) ∪ (3Z) is not even a subgroup of (Z,+). To show that the union of two fields is not necessarily a field (not even a ring, actually) needs slightly more work: We use that Q is a subfield of R (cf. Rem. 4.39 below) and show that, for each x ∈ R with x² ∈ Q, Q(x) := {r + sx : r, s ∈ Q} is a subfield of R (clearly, Q(x) = Q if, and only if, x ∈ Q): Setting r = s = 0 shows 0 ∈ Q(x). Setting r = 1, s = 0 shows 1 ∈ Q(x). If r1, s1, r2, s2 ∈ Q, then

r1 + s1x + r2 + s2x = r1 + r2 + (s1 + s2)x ∈ Q(x),

(r1 + s1x)(r2 + s2x) = r1r2 + s1s2x² + (s1r2 + r1s2)x ∈ Q(x),

−(r1 + s1x) = −r1 − s1x ∈ Q(x),

(r1 + s1x)⁻¹ = (r1 − s1x)/(r1² − s1²x²) ∈ Q(x),

showing Q(x) to be a subfield of R by Th. 4.34(b). However, e.g., U := Q(√2) ∪ Q(√3) is not even a ring, since α := √2 + √3 ∉ U: α ∉ Q(√2) since, otherwise, there are r, s ∈ Q with α = r + s√2, i.e. √3 = r + (s − 1)√2, and 3 = r² + 2r(s − 1)√2 + 2(s − 1)², implying the false statement

3 = 0 ∨ √3 ∈ Q ∨ √(3/2) ∈ Q ∨ √2 ∈ Q.

Switching the roles of √2 and √3 shows α ∉ Q(√3), i.e. α ∉ U.

We can extend Prop. 4.11(c) to rings and fields:

Proposition 4.36. Let A be a nonempty set with compositions + and ·, and let B be a nonempty set with compositions + and · (caveat: to simplify notation, we use the same symbols to denote the compositions on A and B, respectively; they might even denote different compositions on the same set A = B). If φ : A −→ B is an epimorphism with respect to both + and ·, then

(A,+, ·) satisfies (4.32a)  ⇒  (B,+, ·) satisfies (4.32a),
(A,+, ·) satisfies (4.32b)  ⇒  (B,+, ·) satisfies (4.32b),
(A,+, ·) ring  ⇒  (B,+, ·) ring,
(A,+, ·) ring with unity  ⇒  (B,+, ·) ring with unity,
(A,+, ·) field  ⇒  (B,+, ·) field.

If φ is even an isomorphism with respect to both + and ·, then the above implications become equivalences.

Proof. Let b1, b2, b3 ∈ B with preimages a1, a2, a3 ∈ A, respectively. If (A,+, ·) satisfies (4.32a), then

b1 · (b2 + b3) = φ(a1) · (φ(a2) + φ(a3)) = φ(a1 · (a2 + a3)) = φ(a1 · a2 + a1 · a3)
= φ(a1) · φ(a2) + φ(a1) · φ(a3) = b1 · b2 + b1 · b3,


showing (B,+, ·) satisfies (4.32a) as well. If (A,+, ·) satisfies (4.32b), then

(b2 + b3) · b1 = (φ(a2) + φ(a3)) · φ(a1) = φ((a2 + a3) · a1) = φ(a2 · a1 + a3 · a1)
= φ(a2) · φ(a1) + φ(a3) · φ(a1) = b2 · b1 + b3 · b1,

showing (B,+, ·) satisfies (4.32b) as well. If (A,+, ·) is a ring, then we know from Prop. 4.11(c) that (B,+) is a commutative group and that (B, ·) is associative, showing that (B,+, ·) is a ring as well. If (A,+, ·) is a ring with unity, then the above and Prop. 4.11(d) imply (B,+, ·) to be a ring with unity as well. If (A,+, ·) is a field, then (B \ {0}, ·) must also be a group, i.e. (B,+, ·) is a field. Finally, if φ is an isomorphism, then the implications become equivalences, as both φ and φ⁻¹ are epimorphisms by Prop. 4.11(b) (and since φ(0) = 0). �

Example 4.37. Let n ∈ N. In Ex. 4.27(a), we considered the group (Zn,+), where Zn was defined as the quotient group Zn := Z/nZ. We now want to define a multiplication on Zn that makes Zn into a ring with unity (and into a field if, and only if, n is prime, cf. Def. D.2(b)). This is accomplished by letting

∀ r, s ∈ Z :  r̄ · s̄ = (r + nZ) · (s + nZ) := rs + nZ.  (4.36)

First, we check that the definition does not depend on the chosen representatives: Let r1, r2, s1, s2 ∈ Z. If r̄1 = r̄2 and s̄1 = s̄2, then there exist mr, ms ∈ Z such that r1 − r2 = mr n and s1 − s2 = ms n, implying

r̄1 · s̄1 = r1s1 + nZ = (r2 + mr n)(s2 + ms n) + nZ = r2s2 + (mr s2 + ms r2 + mr ms n) n + nZ = r2s2 + nZ = r̄2 · s̄2,

as desired. We already know the canonical homomorphism

φ : Z −→ Zn,  φ(r) = r̄ = r + nZ,

to be a homomorphism with respect to addition. Since

∀ r, s ∈ Z :  φ(rs) = rs + nZ = r̄ · s̄ = φ(r) · φ(s),

it is also a homomorphism with respect to the above-defined multiplication. Since φ is also surjective, (Zn,+, ·) is a ring with unity by Prop. 4.36. Now suppose that n is not prime. If n = 1, then #Z1 = 1, i.e. it is (isomorphic to) the trivial ring of Ex. 4.35(a). If n = ab with a, b ∈ N, 1 < a, b < n, then

ā · b̄ = n̄ = 0̄,

showing ā and b̄ to be nontrivial divisors of 0 (in Zn). In particular, (Zn,+, ·) is not a field. Now consider the case that n is prime. If r ∈ Z and r̄ ≠ 0̄, then gcd(r, n) = 1 (cf. Def. D.2(c)) and, by (D.6) of the Appendix, there exist x, y ∈ Z with xn + yr = 1. Then

ȳ · r̄ = yr + nZ = 1 − xn + nZ = 1̄,


showing ȳ to be the multiplicative inverse of r̄. Thus, (Zn \ {0̄}, ·) is a group and (Zn,+, ·) is a field. For the rest of the example, we, once again, allow an arbitrary n ∈ N. In Ex. 4.27(a), we showed that, letting G := {0, . . . , n − 1}, we had (G,+) ≅ (Zn,+). We now want to extend this isomorphism to a ring isomorphism. To this end, define a multiplication ⊗ on G by letting

∀ r, s ∈ G :  r ⊗ s := p, where rs = qn + p with q, p ∈ N0 and 0 ≤ p < n (cf. Th. D.1),

which is commutative due to multiplication on Z being commutative. We claim the additive isomorphism f : G −→ Zn, f(r) := r̄, of Ex. 4.27(a) to be a multiplicative isomorphism as well: If r, s ∈ G, then, using q, p as above,

f(r ⊗ s) = f(p) = p̄ = (rs − qn) + nZ = rs + nZ = r̄ · s̄ = f(r)f(s),

showing f to be a multiplicative homomorphism. As we know f to be bijective, (G,+,⊗) is a ring with unity by Prop. 4.36, yielding (G,+,⊗) ≅ (Zn,+, ·). As stated in Ex. 4.27(a) in the context of (Zn,+), one tends to use the isomorphism f to identify G with Zn. One does the same in the context of (Zn,+, ·), also simply writing · for the multiplication on G.
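The arithmetic in Zn above is easy to experiment with. The following small Python sketch (an illustration added here, not part of the notes; all names are ad hoc) models Zn on the representatives {0, . . . , n − 1} via the operation ⊗ and confirms both phenomena: nontrivial zero divisors for composite n, and multiplicative inverses for every nonzero element when n is prime:

```python
# Sketch (not from the notes): model (Z_n, +, ⊗) on representatives {0, ..., n-1}.

def mul(r, s, n):
    """r ⊗ s: the unique p with rs = qn + p and 0 <= p < n."""
    return (r * s) % n

def zero_divisors(n):
    """Pairs of nontrivial zero divisors: a, b != 0 with a ⊗ b = 0."""
    return [(a, b) for a in range(1, n) for b in range(1, n) if mul(a, b, n) == 0]

def inverses(n):
    """Multiplicative inverses, where they exist."""
    return {r: y for r in range(1, n) for y in range(1, n) if mul(r, y, n) == 1}

# n = 6 = 2 * 3 is not prime: 2 ⊗ 3 = 0, so Z_6 has zero divisors and is not a field.
assert (2, 3) in zero_divisors(6)
# n = 5 is prime: every nonzero element is invertible, so Z_5 is a field.
assert sorted(inverses(5)) == [1, 2, 3, 4]
```
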

Example 4.38 (Field of Rational Numbers Q). In Ex. 2.36(b), we defined the set of rational numbers Q as the quotient set of Z × (Z \ {0}) with respect to the equivalence relation defined by

(a, b) ∼ (c, d) :⇔ a · d = b · c.

We now want to define “the usual” addition and multiplication on Q and then verify that these make Q into a field (recall the notation a/b := [(a, b)]):

Addition on Q is defined by

+ : Q × Q −→ Q,  (a/b, c/d) ↦ a/b + c/d := (ad + bc)/(bd).  (4.37)

Multiplication on Q is defined by

· : Q × Q −→ Q,  (a/b, c/d) ↦ (a/b) · (c/d) := (ac)/(bd).  (4.38)

We will now show that (Q,+, ·) forms a field, where 0/1 and 1/1 are the neutral elements with respect to addition and multiplication, respectively, (−a/b) is the additive inverse to a/b, whereas b/a is the multiplicative inverse to a/b with a ≠ 0. We already know from Ex. 2.36(b) that the map

ι : Z −→ Q,  ι(k) := k/1,

is injective. We will now also show it is a unital ring monomorphism, i.e. it satisfies ι(1) = 1/1,

∀ k, l ∈ Z :  ι(k + l) = ι(k) + ι(l),  (4.39a)
∀ k, l ∈ Z :  ι(kl) = ι(k) · ι(l).  (4.39b)


We begin by showing that the definitions in (4.37) and (4.38) do not depend on the chosen representatives, i.e.

∀ a, c, ã, c̃ ∈ Z  ∀ b, d, b̃, d̃ ∈ Z \ {0} :  ( a/b = ã/b̃ ∧ c/d = c̃/d̃  ⇒  (ad + bc)/(bd) = (ãd̃ + b̃c̃)/(b̃d̃) )  (4.40)

and

∀ a, c, ã, c̃ ∈ Z  ∀ b, d, b̃, d̃ ∈ Z \ {0} :  ( a/b = ã/b̃ ∧ c/d = c̃/d̃  ⇒  (ac)/(bd) = (ãc̃)/(b̃d̃) ).  (4.41)

Furthermore, the results of both addition and multiplication are always elements of Q.

(4.40) and (4.41): a/b = ã/b̃ means ab̃ = ãb, c/d = c̃/d̃ means cd̃ = c̃d, implying

(ad + bc)b̃d̃ = bd(ãd̃ + b̃c̃),  i.e.  (ad + bc)/(bd) = (ãd̃ + b̃c̃)/(b̃d̃),

and

acb̃d̃ = bdãc̃,  i.e.  (ac)/(bd) = (ãc̃)/(b̃d̃).

That the results of both addition and multiplication are always elements of Q follows from (4.35), i.e. from the fact that Z has no nontrivial zero divisors. In particular, if b, d ≠ 0, then bd ≠ 0, showing (ad + bc)/(bd) ∈ Q and (ac)/(bd) ∈ Q.

Next, we verify + and · to be commutative and associative on Q: Let a, c, e ∈ Z and b, d, f ∈ Z \ {0}. Then, using commutativity on Z, we compute

c/d + a/b = (cb + da)/(db) = (ad + bc)/(bd) = a/b + c/d,
(c/d) · (a/b) = (ca)/(db) = (ac)/(bd) = (a/b) · (c/d),

showing commutativity on Q. Using associativity and distributivity on Z, we compute

a/b + (c/d + e/f) = a/b + (cf + de)/(df) = (adf + b(cf + de))/(bdf) = ((ad + bc)f + bde)/(bdf)
= (ad + bc)/(bd) + e/f = (a/b + c/d) + e/f,

(a/b) · ((c/d) · (e/f)) = (a(ce))/(b(df)) = ((ac)e)/((bd)f) = ((a/b) · (c/d)) · (e/f),

showing associativity on Q. We proceed to checking distributivity on Q: Using commutativity, associativity, and distributivity on Z, we compute

(a/b) · (c/d + e/f) = (a(cf + de))/(bdf) = (acf + dae)/(bdf) = (acbf + bdae)/(bdbf) = (ac)/(bd) + (ae)/(bf) = (a/b) · (c/d) + (a/b) · (e/f),

proving distributivity on Q. We now check the claims regarding neutral and inverse


elements:

a/b + 0/1 = (a · 1 + b · 0)/(b · 1) = a/b,
a/b + (−a)/b = (ab + b(−a))/b² = ((a − a)b)/b² = 0/b² = 0/1,
(a/b) · (1/1) = (a · 1)/(b · 1) = a/b,
(a/b) · (b/a) = (ab)/(ba) = 1/1.

Thus, (Q,+, ·) is a ring and (Q \ {0}, ·) is a group, implying Q to be a field. Finally,

ι(k) + ι(l) = k/1 + l/1 = (k · 1 + 1 · l)/1 = (k + l)/1 = ι(k + l),
ι(k) · ι(l) = (k/1) · (l/1) = (kl)/1 = ι(kl),

proving (4.39).
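The verification above can be mirrored computationally. The sketch below (an illustration, not part of the notes) represents a/b by the pair (a, b), tests equality of equivalence classes by cross-multiplication ad = bc, and checks (4.37), (4.38) and the claimed neutral and inverse elements on sample values:

```python
# Sketch: rational numbers as pairs (a, b), b != 0, with a/b ~ c/d iff ad = bc.

def eq(x, y):                       # equality of equivalence classes
    (a, b), (c, d) = x, y
    return a * d == b * c

def add(x, y):                      # addition, as in (4.37)
    (a, b), (c, d) = x, y
    return (a * d + b * c, b * d)

def mul(x, y):                      # multiplication, as in (4.38)
    (a, b), (c, d) = x, y
    return (a * c, b * d)

x, y, z = (1, 2), (2, 3), (-3, 5)
assert eq(add(x, y), add(y, x))                          # commutativity
assert eq(add(add(x, y), z), add(x, add(y, z)))          # associativity
assert eq(mul(x, add(y, z)), add(mul(x, y), mul(x, z)))  # distributivity
assert eq(add(x, (0, 1)), x) and eq(mul(x, (1, 1)), x)   # neutral elements
assert eq(add(x, (-1, 2)), (0, 1))                       # additive inverse of 1/2
assert eq(mul(x, (2, 1)), (1, 1))                        # multiplicative inverse of 1/2
```

Note that the operations are computed on representatives, exactly as in (4.40) and (4.41); the `eq` test is what makes the comparison independent of the chosen representative.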

Remark 4.39. With the usual addition and multiplication, (R,+, ·) and (C,+, ·) are fields (see [Phi16, Th. D.35] and [Phi16, Th. 5.2], respectively). In particular, R is a subfield of C and Q is a subfield of both R and C.

Definition and Remark 4.40. Let (R,+, ·) be a ring with unity. We call x ∈ R invertible if, and only if, there exists x̄ ∈ R such that xx̄ = x̄x = 1. We call (R∗, ·), where R∗ := {x ∈ R : x invertible}, the group of invertible elements of R. We verify (R∗, ·) to be a group: If x, y ∈ R∗, then (xy)(ȳx̄) = 1 and (ȳx̄)(xy) = 1, showing xy ∈ R∗. Moreover, R∗ inherits associativity from R, 1 ∈ R∗, and each x ∈ R∗ has a multiplicative inverse by definition of R∗, proving (R∗, ·) to be a group. We also note that, if R ≠ {0}, then 0 ∉ R∗ and (R∗,+) is not a group (in particular, R∗ is then not a subring of R).
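For R = Zn, the Bézout argument of Ex. 4.37 (via (D.6)) characterizes the invertible elements: r̄ ∈ (Zn)∗ exactly when gcd(r, n) = 1. The following sketch (an illustration, not part of the notes) computes (Zn)∗ this way and checks closure under multiplication, i.e. the first step of the group verification above:

```python
# Sketch: the group of invertible elements (Z_n)* = {r : gcd(r, n) = 1}.
from math import gcd

def units(n):
    """Representatives of the invertible elements of Z_n."""
    return [r for r in range(n) if gcd(r, n) == 1]

def closed_under_mul(n):
    """Closure of (Z_n)* under multiplication mod n (xy invertible for x, y invertible)."""
    us = set(units(n))
    return all((a * b) % n in us for a in us for b in us)

assert units(8) == [1, 3, 5, 7]   # 0, 2, 4, 6 are not invertible in Z_8
assert closed_under_mul(8) and closed_under_mul(12)
```
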

Example 4.41. (a) Let (R,+, ·) be a ring and let M be a set. As in Ex. 4.9(e), we can define + and · pointwise on R^M, i.e., for each f, g ∈ R^M,

f + g : M −→ R,  (f + g)(x) := f(x) + g(x),
f · g : M −→ R,  (f · g)(x) := f(x) · g(x).

As with commutativity and associativity, it is immediate that R^M inherits distributivity from R. Thus, (R^M,+, ·) is a ring as well; if (R,+, ·) is a ring with unity, then so is (R^M,+, ·), where f ≡ 1 is neutral for · in R^M, and, using the notation of Def. and Rem. 4.40, (R^M)∗ = (R∗)^M. We point out that, if M and R both have at least two distinct elements, then R^M always has nontrivial divisors of 0, even if R has none: Let x, y ∈ M with x ≠ y, and 0 ≠ a ∈ R. Define

fx, fy : M −→ R,  fx(z) := { a if z = x, 0 if z ≠ x },  fy(z) := { a if z = y, 0 if z ≠ y }.

Then fx ≢ 0, fy ≢ 0, but fx · fy ≡ 0.
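The zero-divisor construction above can be made concrete; the sketch below (an illustration, not part of the notes) takes M = {0, 1} and R = Z, encoding a function f : M −→ R as the tuple of its values, so that the pointwise ring R^M becomes componentwise arithmetic on tuples:

```python
# Sketch: pointwise ring R^M with M = {0, 1}, R = Z; functions encoded as value tuples.

def f_add(f, g):
    return tuple(a + b for a, b in zip(f, g))

def f_mul(f, g):
    return tuple(a * b for a, b in zip(f, g))

a = 1
fx = (a, 0)    # f_x: value a at x = 0, value 0 elsewhere
fy = (0, a)    # f_y: value a at y = 1, value 0 elsewhere
zero = (0, 0)  # the zero function

# Neither factor is the zero function, yet the product is:
assert fx != zero and fy != zero and f_mul(fx, fy) == zero
```
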


(b) Let (G,+) be an abelian group and let End(G) denote the set of (group) endomorphisms on G. Define addition and multiplication on End(G) by letting, for each f, g ∈ End(G),

(f + g) : G −→ G, (f + g)(a) := f(a) + g(a), (4.42a)

(f · g) : G −→ G, (f · g)(a) := f(g(a)). (4.42b)

We claim that (End(G),+, ·) forms a ring with unity (the so-called ring of endomorphisms on G): First, we check + to be well-defined (which is clear for ·): For each a, b ∈ G, we compute, using commutativity of + on G,

(f+g)(a+b) = f(a+b)+g(a+b) = f(a)+f(b)+g(a)+g(b) = (f+g)(a)+(f+g)(b),

showing f + g ∈ End(G). As we then know (End(G),+) ≤ (F(G,G),+), we know · to be associative, and Id is clearly neutral for ·, it remains to check distributivity: To this end, let f, g, h ∈ End(G). Then, for each a ∈ G,

(f · (g + h))(a) = f(g(a) + h(a)) = f(g(a)) + f(h(a)) = (fg)(a) + (fh)(a) = (fg + fh)(a),
((g + h) · f)(a) = (g + h)(f(a)) = (gf)(a) + (hf)(a) = (gf + hf)(a),

proving distributivity and that (End(G),+, ·) forms a ring with unity. Moreover, the set Aut(G) := End(G) ∩ SG of automorphisms on G is a subgroup of the symmetric group SG and, comparing with Def. and Rem. 4.40, we also have Aut(G) = End(G)∗.

Definition and Remark 4.42. Let (F,+, ·) be a field. Recall the homomorphism from Ex. 4.12(c), which we now apply to the group (F,+) with a := 1. Then

φ1 : N0 −→ F,  φ1(n) = ∑_{i=1}^{n} 1 =: n · 1.  (4.43)

We say that the field F has characteristic 0 (denoted char F = 0) if, and only if, n · 1 ≠ 0 for each n ∈ N. Otherwise, we define the field’s characteristic to be the number

char F := min{n ∈ N : n · 1 = 0}.

With this definition, we have char Q = char R = char C = 0 and, for each prime number p, char Zp = p.

The examples above show that 0 and each prime number occur as the characteristic of a field. It is not difficult to prove that no other numbers can occur:

Theorem 4.43. Let F be a field. If char F ≠ 0, then p := char F is a prime number.

Proof. Suppose p = ab with a, b ∈ N. For clarity of notation, we use the homomorphisms φa of Ex. 4.12(c) (in particular φ1 as in Def. and Rem. 4.42) and use · only for the multiplication in F (to not get confused when reading the following computation, one


should bear in mind that φa and exponentiation are used for the additive group (F,+)). We obtain

0 = φ1(p) = φ1(ab) = φ_{φ1(a)}(b)  (by Th. 4.8(b))
= ∑_{i=1}^{b} φ1(a) = ∑_{i=1}^{b} (φ1(a) · 1) = φ1(a) · ∑_{i=1}^{b} 1 = φ1(a) · φ1(b).

As the field F has no nontrivial zero divisors, we must have φ1(a) = 0 or φ1(b) = 0. But p = min{n ∈ N : φ1(n) = 0}, implying a = p or b = p, showing p to be prime. �
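The definition of char F as the minimal n with n · 1 = 0 is directly computable in the fields Zp; the following sketch (an illustration, not part of the notes) simply adds 1 to itself in Zn until 0 is reached:

```python
# Sketch: char F = min{n in N : n·1 = 0}, computed in Z_n by repeated addition of 1.

def char_Zn(n):
    """Smallest k in N with k·1 = 0 in Z_n; for Z_n this is n itself."""
    total, k = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        k += 1
    return k

# For the fields Z_p (p prime), the characteristic is p, cf. Def. and Rem. 4.42:
assert [char_Zn(p) for p in (2, 3, 5, 7)] == [2, 3, 5, 7]
# For composite n, Z_n is a ring but not a field; the same minimum is the composite n,
# consistent with Th. 4.43: a composite value cannot occur as the characteristic of a field.
assert char_Zn(6) == 6
```
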

5 Vector Spaces

5.1 Vector Spaces and Subspaces

Definition 5.1. Let F be a field and let V be a nonempty set with two maps

+ : V × V −→ V,  (x, y) ↦ x + y,
· : F × V −→ V,  (λ, x) ↦ λ · x  (5.1)

(+ is called (vector) addition and · is called scalar multiplication; as with other multiplications before, one often writes λx instead of λ · x. Take care not to confuse the vector addition on V with the addition on F and, likewise, not to confuse the scalar multiplication with the multiplication on F: the symbol + is used for both additions and · is used for both multiplications, but one can always determine from the context which addition or multiplication is meant). Then V is called a vector space or a linear space over F (sometimes also an F-vector space) if, and only if, the following conditions are satisfied:

(i) (V,+) is a commutative group. The neutral element with respect to + is denoted 0 (do not confuse 0 ∈ V with 0 ∈ F – once again, the same symbol is used for different objects (both objects only coincide for F = V)).

(ii) Distributivity:

∀ λ ∈ F  ∀ x, y ∈ V :  λ(x + y) = λx + λy,  (5.2a)
∀ λ, µ ∈ F  ∀ x ∈ V :  (λ + µ)x = λx + µx.  (5.2b)

(iii) Compatibility between Multiplication on F and Scalar Multiplication:

∀ λ, µ ∈ F  ∀ x ∈ V :  (λµ)x = λ(µx).  (5.3)


(iv) The neutral element with respect to the multiplication on F is also neutral withrespect to the scalar multiplication:

∀ x ∈ V :  1x = x.  (5.4)

If V is a vector space over F , then one calls the elements of V vectors and the elementsof F scalars.

Example 5.2. (a) Every field F is a vector space over itself if one uses the field addition in F as the vector addition and the field multiplication in F as the scalar multiplication (as important special cases, we obtain that R is a vector space over R and C is a vector space over C): All the vector space laws are immediate consequences of the corresponding field laws: Def. 5.1(i) holds as every field is a commutative group with respect to addition; Def. 5.1(ii) follows from field distributivity and multiplicative commutativity on F; Def. 5.1(iii) is merely the multiplicative associativity on F; and Def. 5.1(iv) holds, since scalar multiplication coincides with field multiplication on F.

(b) The reasoning in (a) actually shows that every field F is a vector space over every subfield E ⊆ F. In particular, R is a vector space over Q.

(c) If A is any nonempty set, F is a field, and Y is a vector space over the field F, then we can make V := F(A, Y) = Y^A (the set of functions from A into Y) into a vector space over F by defining for each f, g : A −→ Y:

(f + g) : A −→ Y,  (f + g)(x) := f(x) + g(x),  (5.5a)
(λ · f) : A −→ Y,  (λ · f)(x) := λ · f(x) for each λ ∈ F.  (5.5b)

To verify that (V,+, ·) is, indeed, a vector space, we begin by checking Def. 5.1(i), i.e. by showing (V,+) is a commutative group: Since (5.5a) is precisely the pointwise definition of Ex. 4.9(e), we already know (V,+) to be a commutative group due to (4.11d).

To verify Def. 5.1(ii), one computes

∀ λ ∈ F  ∀ f, g ∈ V  ∀ x ∈ A :  (λ(f + g))(x) = λ(f(x) + g(x)) (∗)= λf(x) + λg(x) = (λf + λg)(x),

where distributivity in the vector space Y was used at (∗), proving λ(f + g) = λf + λg and, thus, (5.2a). Similarly,

∀ λ, µ ∈ F  ∀ f ∈ V  ∀ x ∈ A :  ((λ + µ)f)(x) = (λ + µ)f(x) (∗)= λf(x) + µf(x) = (λf + µf)(x),

where, once more, distributivity in the vector space Y was used at (∗), proving (λ + µ)f = λf + µf and, thus, (5.2b).

where, once more, distributivity in the vector space Y was used at (∗), proving(λ+ µ)f = λf + µf and, thus, (5.2b).


The proof of Def. 5.1(iii) is given by

∀ λ, µ ∈ F  ∀ f ∈ V  ∀ x ∈ A :  ((λµ)f)(x) = (λµ)f(x) (∗)= λ(µf(x)) = (λ(µf))(x),

where the validity of Def. 5.1(iii) in the vector space Y was used at (∗). Finally, to prove Def. 5.1(iv), one computes

where the validity of Def. 5.1(iii) in the vector space Y was used at (∗).Finally, to prove Def. 5.1(iv), one computes

∀ f ∈ V  ∀ x ∈ A :  (1 · f)(x) = 1 · f(x) (∗)= f(x),

where the validity of Def. 5.1(iv) in the vector space Y was used at (∗).

(d) Let F be a field (e.g. F = R or F = C) and n ∈ N. For x = (x1, . . . , xn) ∈ F^n, y = (y1, . . . , yn) ∈ F^n, and λ ∈ F, define componentwise addition

x + y := (x1 + y1, . . . , xn + yn),  (5.6a)

and componentwise scalar multiplication

λx := (λx1, . . . , λxn).  (5.6b)

Then (F^n,+, ·) constitutes a vector space over F. The validity of Def. 5.1(i) – Def. 5.1(iv) can easily be verified directly, but (F^n,+, ·) can also be seen as a special case of (c) with A = {1, . . . , n} and Y = F: Recall that, according to Ex. 2.16(c), F^n = F^{1,...,n} = F({1, . . . , n}, F) is the set of functions from {1, . . . , n} into F. Then x = (x1, . . . , xn) ∈ F^n is the same as the function f : {1, . . . , n} −→ F, f(j) = xj. Thus, (5.6a) is, indeed, the same as (5.5a), and (5.6b) is, indeed, the same as (5.5b).
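The componentwise operations (5.6a) and (5.6b) and the vector space laws (5.2a)–(5.4) can be checked on sample data; the sketch below (an illustration, not part of the notes) does this for F = Q, using Python's exact rational arithmetic so that the checks are not affected by rounding:

```python
# Sketch: componentwise operations (5.6a), (5.6b) on F^n with F = Q, n = 3.
from fractions import Fraction as Fr

def vadd(x, y):
    """Componentwise vector addition, (5.6a)."""
    return tuple(a + b for a, b in zip(x, y))

def smul(lam, x):
    """Componentwise scalar multiplication, (5.6b)."""
    return tuple(lam * a for a in x)

x, y = (Fr(1), Fr(2), Fr(3)), (Fr(4), Fr(5), Fr(6))
lam, mu = Fr(2), Fr(-1, 3)

assert smul(lam, vadd(x, y)) == vadd(smul(lam, x), smul(lam, y))   # (5.2a)
assert smul(lam + mu, x) == vadd(smul(lam, x), smul(mu, x))        # (5.2b)
assert smul(lam * mu, x) == smul(lam, smul(mu, x))                 # (5.3)
assert smul(Fr(1), x) == x                                         # (5.4)
```
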

Proposition 5.3. The following rules hold in each vector space V over the field F (here, even though we will usually use the same symbol for both objects, we write 0 for the additive neutral element in F and 0⃗ for the additive neutral element in V):

(a) λ · 0⃗ = 0⃗ for each λ ∈ F.

(b) 0 · v = 0⃗ for each v ∈ V.

(c) λ · (−v) = (−λ) · v = −(λ · v) for each λ ∈ F and each v ∈ V.

(d) If λ · v = 0⃗ with λ ∈ F and v ∈ V, then λ = 0 or v = 0⃗.

Proof. Exercise. �

Definition 5.4. Let (V,+, ·) be a vector space over the field F, ∅ ≠ U ⊆ V. We call U a subspace of V if, and only if, (U,+, ·) forms a vector space over F with respect to the operations + and · it inherits from V, i.e. with respect to +↾U×U and ·↾F×U.

Theorem 5.5. Let (V,+, ·) be a vector space over the field F, ∅ ≠ U ⊆ V. Then the following statements are equivalent:


(i) U is a subspace of V .

(ii) One has

∀ x, y ∈ U :  x + y ∈ U,  (5.7a)
∀ λ ∈ F  ∀ x ∈ U :  λx ∈ U.  (5.7b)

(iii) One has

∀ λ, µ ∈ F  ∀ x, y ∈ U :  λx + µy ∈ U.  (5.8)

Proof. From the definition of a vector space, it is immediate that (i) implies (ii) and (iii).

“(iii)⇒(ii)”: One obtains (5.7a) from (5.8) by setting λ := µ := 1, and one obtains (5.7b) from (5.8) by setting y := 0 (and µ arbitrary, e.g., µ := 1).

“(ii)⇒(i)”: Clearly, U inherits commutativity and distributivity as well as the validity of (5.3) and (5.4) from V. Moreover, if x ∈ U, then, setting λ := −1 in (5.7b), shows −x ∈ U. This, together with (5.7a), proves (U,+) to be a subgroup of (V,+), thereby completing the proof of (i). �

Example 5.6. (a) As a consequence of Th. 5.5, if V is a vector space over the field F, then {0} ⊆ V is always a subspace of V (sometimes called the trivial or the zero vector space over F).

(b) Q is not a subspace of R if R is considered as a vector space over R (for example, √2 · 2 ∉ Q). However, Q is a subspace of R if R is considered as a vector space over Q.

(c) Consider the vector space V := R³ over R and let

U := {(x, y, z) ∈ V : 3x− y + 7z = 0}.

We use Th. 5.5 to show U is a subspace of V: Since (0, 0, 0) ∈ U, we have U ≠ ∅. Moreover, if u1 = (x1, y1, z1) ∈ U, u2 = (x2, y2, z2) ∈ U, and λ, µ ∈ R, then

λu1 + µu2 = (λx1 + µx2, λy1 + µy2, λz1 + µz2)

and

3(λx1 + µx2) − (λy1 + µy2) + 7(λz1 + µz2) = λ(3x1 − y1 + 7z1) + µ(3x2 − y2 + 7z2) = λ · 0 + µ · 0 = 0,

using u1, u2 ∈ U in the last-but-one step, showing λu1 + µu2 ∈ U and proving U to be a subspace of V.
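The closure computation above can be checked on concrete vectors; the sketch below (an illustration, not part of the notes; the sample vectors are ad hoc) tests membership in U via the defining equation and forms a linear combination with exact rational scalars:

```python
# Sketch: U = {(x, y, z) : 3x - y + 7z = 0} is closed under linear combinations.
from fractions import Fraction as Fr

def in_U(v):
    """Membership test via the defining linear equation."""
    x, y, z = v
    return 3 * x - y + 7 * z == 0

def comb(lam, u1, mu, u2):
    """The linear combination λu1 + µu2, computed componentwise."""
    return tuple(lam * a + mu * b for a, b in zip(u1, u2))

u1, u2 = (Fr(1), Fr(3), Fr(0)), (Fr(0), Fr(7), Fr(1))   # both satisfy 3x - y + 7z = 0
assert in_U(u1) and in_U(u2)
assert in_U(comb(Fr(5), u1, Fr(-2, 3), u2))   # λu1 + µu2 ∈ U, as proved above
```
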


(d) It is customary to write K if K may stand for R or C, and, from now on, we will occasionally use this notation. From Ex. 5.2(c), we know that, for each A ≠ ∅, F(A,K) constitutes a vector space over K. Thus, as a consequence of Th. 5.5, a subset of F(A,K) is a vector space over K if, and only if, it is closed under addition and scalar multiplication. By using results from Calculus, we obtain the following examples:

(i) The set P(K) of all polynomials mapping from K into K is a vector space over K by [Phi16, Rem. 6.4]; for each n ∈ N0, the set Pn(K) of all such polynomials of degree at most n is also a vector space over K by [Phi16, Rem. (6.4a),(6.4b)].

(ii) If ∅ ≠ M ⊆ C, then the set of continuous functions from M into K, i.e. C(M,K), is a vector space over K by [Phi16, Th. 7.38].

(iii) If a, b ∈ R ∪ {−∞,∞} and a < b, then the set of differentiable functions from I := ]a, b[ into K is a vector space over K by [Phi16, Th. 9.7(a),(b)]. Moreover, [Phi16, Th. 9.7(a),(b)] also implies that, for each k ∈ N, the set of k times differentiable functions from I into K is a vector space over K, and so is each set C^k(I,K) of k times continuously differentiable functions ([Phi16, Th. 7.38] is also needed for the last conclusion). The set C^∞(I,K) := ⋂_{k∈N} C^k(I,K) is also a vector space over K by Th. 5.7(a) below.

Theorem 5.7. Let V be a vector space over the field F .

(a) Let I ≠ ∅ be an index set and (Ui)i∈I a family of subspaces of V. Then the intersection U := ⋂_{i∈I} Ui is also a subspace of V.

(b) In contrast to intersections, unions of subspaces are almost never subspaces. More precisely, if U1 and U2 are subspaces of V, then

U1 ∪ U2 is a subspace of V  ⇔  ( U1 ⊆ U2 ∨ U2 ⊆ U1 ).  (5.9)

Proof. (a): Since 0 ∈ U, we have U ≠ ∅. If x, y ∈ U and λ, µ ∈ F, then λx + µy ∈ Ui for each i ∈ I, implying λx + µy ∈ U. Thus, U is a subspace of V by Th. 5.5.

(b): Exercise. �

While the union of two subspaces U1, U2 is typically not a subspace, Th. 5.7(a) implies there is always a smallest subspace containing U1 ∪ U2, and we will see below that this subspace is always the so-called sum U1 + U2 of the two subspaces (this is a special case of Prop. 5.11). As it turns out, the sum of vector spaces is closely connected with two other essential notions of vector space theory, namely the notions of linear combination and span, which we define first:

Definition 5.8. Let V be a vector space over the field F .


(a) Let n ∈ N and v1, . . . , vn ∈ V. A vector v ∈ V is called a linear combination of v1, . . . , vn if, and only if, there exist λ1, . . . , λn ∈ F (often called coefficients in this context) such that

v = ∑_{i=1}^{n} λi vi.  (5.10)

(b) Let A ⊆ V, and

U := {U ∈ P(V) : A ⊆ U ∧ U is a subspace of V}.  (5.11)

Then the set

〈A〉 := span A := ⋂_{U∈U} U  (5.12)

Proposition 5.9. Let V be a vector space over the field F and A ⊆ V .

(a) 〈A〉 is a subspace of V , namely the smallest subspace of V containing A.

(b) If A = ∅, then 〈A〉 = {0}; if A ≠ ∅, then 〈A〉 is the set of all linear combinations of elements from A, i.e.

〈A〉 = { ∑_{i=1}^{n} λi ai : n ∈ N ∧ λ1, . . . , λn ∈ F ∧ a1, . . . , an ∈ A }.  (5.13)

(c) If A ⊆ B ⊆ V , then 〈A〉 ⊆ 〈B〉.

(d) A = 〈A〉 if, and only if, A is a subspace of V .

(e) 〈〈A〉〉 = 〈A〉.

Proof. (a) is immediate from Th. 5.7(a).

(b): For the case A = ∅, note that {0} is a subspace of V, and that {0} is contained in every subspace of V. For A ≠ ∅, let W denote the right-hand side of (5.13), and recall from (5.12) that 〈A〉 is the intersection of all subspaces U of V that contain A. If U is a subspace of V and A ⊆ U, then W ⊆ U, since U is closed under vector addition and scalar multiplication, showing W ⊆ 〈A〉. On the other hand, W is clearly a subspace of V that contains A, showing 〈A〉 ⊆ W, completing the proof of 〈A〉 = W.

(c) is immediate from (b).

(d): If A = 〈A〉, then A is a subspace by (a). For the converse, while it is clear that A ⊆ 〈A〉 always holds, if A is a subspace, then A ∈ U, where U is as in (5.11), implying 〈A〉 ⊆ A.

(e) now follows by combining (d) with (a). �


Definition 5.10. Let V be a vector space over the field F, let I be an index set and let (Ui)i∈I be a family of subspaces of V. For I = ∅, define

∑_{i∈I} Ui := {0}.  (5.14a)

If #I = n ∈ N, then let φ : {1, . . . , n} −→ I be a bijection and define

∑_{i∈I} Ui := ∑_{k=1}^{n} Uφ(k) := { ∑_{k=1}^{n} uk : uk ∈ Uφ(k) for each k ∈ {1, . . . , n} },  (5.14b)

where the commutativity of addition guarantees the definition is independent of the chosen bijection (note that this definition is consistent with the definition in Ex. 4.9(d)). For a general, not necessarily finite I, let J := {J ⊆ I : J is finite} and define

∑_{i∈I} Ui := ⋃_{J∈J} ∑_{j∈J} Uj.  (5.14c)

In each case, we call ∑_{i∈I} Ui the sum of the subspaces Ui, i ∈ I.

Proposition 5.11. Let V be a vector space over the field F, let I be an index set and let (Ui)i∈I be a family of subspaces of V. Then

∑_{i∈I} Ui = 〈 ⋃_{i∈I} Ui 〉  (5.15)

(in particular, ∑_{i∈I} Ui is always a subspace of V).

Proof. Let A := ∑_{i∈I} Ui, B := 〈⋃_{i∈I} Ui〉. If I = ∅, then both sides of (5.15) are {0}, i.e. the statement holds in this case. Now assume I ≠ ∅. If v ∈ A, then there are i1, . . . , in ∈ I, n ∈ N, such that v = ∑_{k=1}^{n} u_{ik}, u_{ik} ∈ U_{ik}, showing v ∈ B, since B is a subspace of V. Thus, A ⊆ B. To prove B ⊆ A, it suffices to show A is a subspace of V that contains each Ui, i ∈ I. If i ∈ I and v ∈ Ui, then v ∈ A is clear from (5.14c) (set J := {i}). Let v, w ∈ A, where v is as above and there exist j1, . . . , jm ∈ I, m ∈ N, such that w = ∑_{k=1}^{m} w_{jk}, w_{jk} ∈ U_{jk}. Then v + w = ∑_{k=1}^{n} u_{ik} + ∑_{k=1}^{m} w_{jk} ∈ A and, for each λ ∈ F, λv = ∑_{k=1}^{n} (λu_{ik}) ∈ A, proving A to be a subspace of V. Thus, B ⊆ A, thereby concluding the proof. �

5.2 Linear Independence, Basis, Dimension

The notions introduced in the present section are of central importance to the theory of vector spaces. In particular, we will see that the structure of many interesting vector spaces (e.g. K^n) is particularly simple, as every vector can be written as a so-called linear combination of a fixed finite set of linearly independent vectors (we will see this is the case if, and only if, the vector space is finite-dimensional, cf. Th. 5.23 below).


Definition 5.12. Let V be a vector space over the field F .

(a) A vector v ∈ V is called linearly dependent on a subset U of V (or on the vectors in U) if, and only if, v = 0 or there exist n ∈ N and u1, . . . , un ∈ U such that v is a linear combination of u1, . . . , un. Otherwise, v is called linearly independent of U.

(b) A subset U of V is called linearly independent if, and only if, whenever 0 ∈ V is written as a linear combination of distinct elements of U, then all coefficients must be 0 ∈ F, i.e. if, and only if,

( n ∈ N ∧ W ⊆ U ∧ #W = n ∧ ∑_{u∈W} λu u = 0 ∧ ∀ u ∈ W : λu ∈ F )  ⇒  ∀ u ∈ W : λu = 0.  (5.16a)

Occasionally, one also wants to have the notion available for families of vectors instead of sets, and one calls a family (ui)i∈I of vectors in V linearly independent if, and only if,

( n ∈ N ∧ J ⊆ I ∧ #J = n ∧ ∑_{j∈J} λj uj = 0 ∧ ∀ j ∈ J : λj ∈ F )  ⇒  ∀ j ∈ J : λj = 0.  (5.16b)

Sets and families that are not linearly independent are called linearly dependent.

Example 5.13. Let V be a vector space over the field F .

(a) ∅ is linearly independent: Indeed, if U = ∅, then the left side of the implication in (5.16a) is always false (since W ⊆ U means #W = 0), i.e. the implication is true. Moreover, by Def. 5.12(a), v ∈ V is linearly dependent on ∅ if, and only if, v = 0 (this is also consistent with ∑_{u∈∅} λu u = 0, cf. (3.15d)).

(b) If 0 ∈ U ⊆ V, then U is linearly dependent (in particular, {0} is linearly dependent), due to 1 · 0 = 0.

(c) If 0 ≠ v ∈ V and λ ∈ F with λv = 0, then λ = 0 by Prop. 5.3(d), showing {v} to be linearly independent. However, the family (v, v) is always linearly dependent, since 1v + (−1)v = 0 (also for v = 0). Moreover, 1v = v also shows that every v ∈ V is linearly dependent on itself.

(d) The following example is of some general importance: Let I be a set and V = F^I (cf. Ex. 5.2(c),(d)). Define

∀ i ∈ I :  ei : I −→ F,  ei(j) := δij := { 1 if i = j, 0 if i ≠ j }.  (5.17)


Then δ is known as the Kronecker delta (which can be seen as a function δ : I × I −→ {0, 1}). If n ∈ N and V = F^n, i.e. I = {1, . . . , n}, then one often writes the ei in the form e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en−1 = (0, . . . , 0, 1, 0), en = (0, . . . , 0, 1); for example, in R³ (or, more generally, in F³), e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). Returning to the case of a general I, we check that the set

B := {ei : i ∈ I}  (5.18)

is linearly independent: Assume

∑_{j∈J} λj ej = 0 ∈ V,

where J ⊆ I, #J = n ∈ N, and λj ∈ F for each j ∈ J. Recalling that 0 ∈ V is the function that is constantly equal to 0 ∈ F, we obtain

∀ k ∈ J :  0 = ( ∑_{j∈J} λj ej )(k) = ∑_{j∈J} λj ej(k) = ∑_{j∈J} λj δjk = λk,

thereby proving the linear independence of B.
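The key step in the computation above is that the k-th entry of ∑ λj ej is exactly λk. The following sketch (an illustration, not part of the notes) reproduces this for the standard vectors in F^n with n = 4:

```python
# Sketch: standard vectors e_i in F^n via the Kronecker delta, and their linear combinations.

def e(i, n):
    """The i-th standard vector of F^n (0-indexed): e_i(j) = δ_ij."""
    return tuple(1 if i == j else 0 for j in range(n))

def lincomb(coeffs, vecs):
    """The linear combination sum_i coeffs[i] * vecs[i], computed entrywise."""
    return tuple(sum(c * v[j] for c, v in zip(coeffs, vecs))
                 for j in range(len(vecs[0])))

n = 4
basis = [e(i, n) for i in range(n)]
coeffs = [0, -2, 5, 7]
# The k-th entry of sum λ_j e_j is exactly λ_k, as in the computation above:
assert lincomb(coeffs, basis) == (0, -2, 5, 7)
# Hence the only combination giving the zero vector is the trivial one:
assert lincomb([0] * n, basis) == (0,) * n
```
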

(e) For a Calculus application, we let

∀ k ∈ N0 :  fk : R −→ K,  fk(x) := e^{kx},

and show that

U := {fk : k ∈ N0}

is linearly independent as a subset of the vector space V := C(R,K) over K ofdifferentiable functions from R into K (cf. Ex. 5.6(d)(iii)), by showing each setUn := {fk : k ∈ {0, . . . , n}}, n ∈ N0, is linearly independent, using an inductionon n: Since, for n = 0, fn ≡ 1, the base case holds. Now let n ≥ 1 and assumeλ0, . . . , λn ∈ K are such that

∀ x ∈ R :  ∑_{k=0}^{n} λk e^{kx} = 0.  (5.19)

Multiplying (5.19) by n yields

∀ x ∈ R :  n ∑_{k=0}^{n} λk e^{kx} = 0,  (5.20)

whereas differentiating (5.19) with respect to x yields

∀ x ∈ R :  ∑_{k=0}^{n} k λk e^{kx} = 0.  (5.21)


By subtracting (5.21) from (5.20), we then obtain

∀ x ∈ R:   ∑_{k=0}^{n−1} (n − k) λ_k e^{kx} = 0,

implying λ_0 = · · · = λ_{n−1} = 0 due to the induction hypothesis. Using this in (5.19) then yields

∀ x ∈ R:   λ_n e^{nx} = 0

and, thus, λ_n = 0 as well, completing the induction proof.
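The independence of the exponentials can also be cross-checked numerically (a sketch under our own choice of sample points, not part of the proof): sampling f_0, f_1, f_2 at three distinct points gives a matrix that is Vandermonde in t = e^x, whose determinant is nonzero, so no nontrivial combination can vanish even at just those three points.

```python
import math

# Sample f_k(x) = e^(k x), k = 0, 1, 2, at the points x = 0, 1, 2.
# The row for x is (1, e^x, e^(2x)): a Vandermonde row in t = e^x.
xs = [0.0, 1.0, 2.0]
M = [[math.exp(k * x) for k in range(3)] for x in xs]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Nonzero determinant: the only solution of sum_k lam_k f_k(x_i) = 0
# for i = 0, 1, 2 is lam_0 = lam_1 = lam_2 = 0.
d = det3(M)
```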

Proposition 5.14. Let V be a vector space over the field F and U ⊆ V.

(a) U is linearly dependent if, and only if, there exists u_0 ∈ U such that u_0 is linearly dependent on U \ {u_0}.

(b) If U is linearly dependent and U ⊆ M ⊆ V, then M is linearly dependent as well.

(c) If U is linearly independent and M ⊆ U, then M is linearly independent as well.

(d) Let U be linearly independent. If U_1, U_2 ⊆ U and V_1 := 〈U_1〉, V_2 := 〈U_2〉, then V_1 ∩ V_2 = 〈U_1 ∩ U_2〉.

Proof. (a): Suppose U is linearly dependent. Then there exists W ⊆ U, #W = n ∈ N, such that ∑_{u∈W} λ_u u = 0 with λ_u ∈ F and there exists u_0 ∈ W with λ_{u_0} ≠ 0. Then

u_0 = −λ_{u_0}^{−1} ∑_{u∈W\{u_0}} λ_u u = ∑_{u∈W\{u_0}} (−λ_{u_0}^{−1} λ_u) u,

showing u_0 to be linearly dependent on U \ {u_0}. Conversely, if u_0 ∈ U is linearly dependent on U \ {u_0}, then u_0 = 0, in which case U is linearly dependent according to Ex. 5.13(b), or u_0 ≠ 0, in which case there exist n ∈ N, distinct u_1, . . . , u_n ∈ U \ {u_0}, and λ_1, . . . , λ_n ∈ F such that

u_0 = ∑_{i=1}^{n} λ_i u_i   ⇒   −u_0 + ∑_{i=1}^{n} λ_i u_i = 0,

showing U to be linearly dependent, since the coefficient of u_0 is −1 ≠ 0.

(b) and (c) are now both immediate from (a).

(d): Since V_1 ∩ V_2 is a vector space with U_1 ∩ U_2 ⊆ V_1 ∩ V_2, we already have 〈U_1 ∩ U_2〉 ⊆ V_1 ∩ V_2. Now let u ∈ V_1 ∩ V_2. Then there exist distinct z_1, . . . , z_k ∈ U_1 ∩ U_2, distinct x_{k+1}, . . . , x_{k+n} ∈ U_1 \ U_2, and distinct y_{k+1}, . . . , y_{k+m} ∈ U_2 \ U_1 with k ∈ N_0, m, n ∈ N, as well as λ_1, . . . , λ_{k+n}, μ_1, . . . , μ_{k+m} ∈ F such that

u = ∑_{i=1}^{k} λ_i z_i + ∑_{i=k+1}^{k+n} λ_i x_i = ∑_{i=1}^{k} μ_i z_i + ∑_{i=k+1}^{k+m} μ_i y_i,


implying

0 = ∑_{i=1}^{k} (λ_i − μ_i) z_i + ∑_{i=k+1}^{k+n} λ_i x_i − ∑_{i=k+1}^{k+m} μ_i y_i.

The linear independence of U then yields λ_1 = μ_1, . . . , λ_k = μ_k as well as λ_{k+1} = · · · = λ_{k+n} = μ_{k+1} = · · · = μ_{k+m} = 0. Thus, in particular, u ∈ 〈U_1 ∩ U_2〉, proving V_1 ∩ V_2 ⊆ 〈U_1 ∩ U_2〉 as desired. □

Definition 5.15. Let V be a vector space over the field F and B ⊆ V. Then B is called a basis of V if, and only if, B is a generating set for V (i.e. V = 〈B〉) that is also linearly independent.

Example 5.16. (a) Due to Ex. 5.13(a), in each vector space V over a field F, we have that ∅ is linearly independent with 〈∅〉 = {0}, i.e. ∅ is a basis of the trivial space {0}.

(b) If one considers a field F as a vector space over itself, then, clearly, every {λ} with 0 ≠ λ ∈ F is a basis.

(c) We continue Ex. 5.13(d), where we showed that, given a field F and a set I, B = {e_i : i ∈ I} with the e_i defined by (5.17) constitutes a linearly independent subset of the vector space F^I over F. We will now show that 〈B〉 = F^I_fin (i.e. B is a basis of F^I_fin), where F^I_fin denotes the set of functions f : I −→ F such that there exists a finite set I_f ⊆ I, satisfying

f(i) = 0 for each i ∈ I \ I_f,   (5.22a)
f(i) ≠ 0 for each i ∈ I_f   (5.22b)

(then F^I_fin = F^I if, and only if, I is finite (for example F^n_fin = F^n for n ∈ N); but, in general, F^I_fin is a strict subset of F^I). We first show that F^I_fin is always a subspace of F^I: Indeed, if f, g ∈ F^I_fin and λ ∈ F, then I_{λf} = I_f for λ ≠ 0, I_{λf} = ∅ for λ = 0, and I_{f+g} ⊆ I_f ∪ I_g, i.e. λf ∈ F^I_fin and f + g ∈ F^I_fin. We also note B ⊆ F^I_fin, since I_{e_i} = {i} for each i ∈ I. To see 〈B〉 = F^I_fin, we merely note that

∀ f ∈ F^I_fin:   f = ∑_{i∈I_f} f(i) e_i.   (5.23)

Thus, B is a basis of F^I_fin as claimed – B is often referred to as the standard basis of F^I_fin. In particular, we have shown that, for each n ∈ N, the set {e_j : j = 1, . . . , n}, where e_1 = (1, 0, . . . , 0), e_2 = (0, 1, 0, . . . , 0), . . . , e_n = (0, . . . , 0, 1), forms a basis of F^n (of R^n if F = R and of C^n if F = C). If I = N, then F^I_fin is the space of all sequences in F that have only finitely many nonzero entries. We will see later that, actually, every vector space is isomorphic to some suitable F^I_fin (cf. Rem. 5.25 and Th. 6.9).
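A natural machine model of F^I_fin (our encoding, not from the notes) stores f only via its finite support I_f, as the dict {i: f(i) for i in I_f}. The subspace properties and the expansion (5.23) then become a few lines of Python.

```python
# Elements of F^I_fin encoded as dicts {i: f(i)} over the support I_f;
# keys with value 0 are never stored, matching (5.22b).

def add(f, g):
    """Pointwise sum; note I_{f+g} is contained in I_f union I_g."""
    h = {i: f.get(i, 0) + g.get(i, 0) for i in set(f) | set(g)}
    return {i: v for i, v in h.items() if v != 0}

def scale(lam, f):
    """Scalar multiple; I_{lam f} = I_f for lam != 0 and empty for lam = 0."""
    return {i: lam * v for i, v in f.items()} if lam != 0 else {}

def from_basis(f):
    """Rebuild f as sum over i in I_f of f(i) * e_i, cf. (5.23)."""
    out = {}
    for i, v in f.items():
        out = add(out, scale(v, {i: 1}))  # the dict {i: 1} encodes e_i
    return out

f = {"a": 3, "b": -2}
g = {"b": 2, "c": 5}
```

Here add(f, g) drops the index "b", whose values cancel, so the stored support stays exactly the set of nonzero coordinates.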

(d) In Ex. 5.6(d)(i), we noted that the set P(K) of polynomials with coefficients in K is a vector space over K. We show the set B := {e_j : j ∈ N_0} of monomials


e_j : K −→ K, e_j(x) := x^j, to be a basis of P(K): While 〈B〉 = P(K) is immediate from the definition of P(K), linear independence of B can be shown similarly to the linear independence of the exponential functions in Ex. 5.13(e): As in that example, we use the differentiability of the considered functions (here, the e_j) to show each set U_n := {e_j : j ∈ {0, . . . , n}}, n ∈ N_0, is linearly independent, using an induction on n: Since, for n = 0, e_0 ≡ 1, the base case holds. Now let n ≥ 1 and assume λ_0, . . . , λ_n ∈ K are such that

∀ x ∈ R:   ∑_{j=0}^{n} λ_j x^j = 0.   (5.24)

Differentiating (5.24) with respect to x yields

∀ x ∈ R:   ∑_{j=1}^{n} j λ_j x^{j−1} = 0,

implying λ_1 = · · · = λ_n = 0 due to the induction hypothesis. Using this in (5.24) then yields λ_0 = λ_0 x^0 = 0, completing the induction proof. We will see later that P(K) is, actually, isomorphic to K^N_fin.
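The differentiation argument has a well-known numerical counterpart (an alternative check, not the proof given in the notes): if ∑_j λ_j x^j vanished for all x, it would vanish at the n + 1 distinct points 0, . . . , n; the coefficient matrix (x_i^j) is then a Vandermonde matrix whose determinant, the product of the differences of the points, is nonzero, forcing all λ_j = 0.

```python
from fractions import Fraction

def vandermonde_det(points):
    """Determinant of the matrix (points[i] ** j) via the classical
    product formula: product over i < j of (points[j] - points[i])."""
    d = Fraction(1)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d *= points[j] - points[i]
    return d

# Exact arithmetic for n = 3 and the sample points 0, 1, 2, 3; the
# nonzero value shows the monomials 1, x, x^2, x^3 are independent.
pts = [Fraction(k) for k in range(4)]
d = vandermonde_det(pts)
```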

Theorem 5.17. Let V be a vector space over the field F and B ⊆ V. Then the following statements (i) – (iii) are equivalent:

(i) B is a basis of V.

(ii) B is a maximal linearly independent subset of V, i.e. B is linearly independent and each set A ⊆ V with B ⊊ A is linearly dependent.

(iii) B is a minimal generating set for V, i.e. 〈B〉 = V and 〈A〉 ⊊ V for each A ⊊ B.

Proof. “(i)⇒(ii)”: Let B ⊆ A ⊆ V and a ∈ A \ B. Since 〈B〉 = V, there exist λ_1, . . . , λ_n ∈ F, b_1, . . . , b_n ∈ B, n ∈ N, such that

a = ∑_{i=1}^{n} λ_i b_i   ⇒   (−1)a + ∑_{i=1}^{n} λ_i b_i = 0,

showing A to be linearly dependent.

“(ii)⇒(i)”: Suppose B to be a maximal linearly independent subset of V. We need to show 〈B〉 = V. Since B ⊆ 〈B〉, let v ∈ V \ B. Then A := {v} ∪ B is linearly dependent, i.e.

∃ W ⊆ A:   ( #W = n ∈ N  ∧  ∑_{u∈W} λ_u u = 0  ∧  ∀ u ∈ W: λ_u ∈ F  ∧  ∃ u ∈ W: λ_u ≠ 0 ),   (5.25)

where the linear independence of B implies v ∈ W and λ_v ≠ 0. Thus,

v = −∑_{u∈W\{v}} λ_v^{−1} λ_u u,


showing v ∈ 〈B〉, since W \ {v} ⊆ B.

“(i)⇒(iii)”: Let A ⊊ B and b ∈ B \ A. Since B is linearly independent, Prop. 5.14(a) implies b not to be a linear combination of elements in A, showing 〈A〉 ⊊ V.

“(iii)⇒(i)”: We need to show that a minimal generating set for V is linearly independent. Arguing by contraposition, we let B be a generating set for V that is linearly dependent and show B is not minimal: Since B is linearly dependent, (5.25) holds with A replaced by B and there exists b_0 ∈ W with λ_{b_0} ≠ 0, yielding

b_0 = −∑_{u∈W\{b_0}} λ_{b_0}^{−1} λ_u u.   (5.26)

We show A := B \ {b_0} to be a generating set for V: If v ∈ V, since 〈B〉 = V, there exist λ_0, λ_1, . . . , λ_n ∈ F and b_1, . . . , b_n ∈ A, n ∈ N, such that

v = λ_0 b_0 + ∑_{i=1}^{n} λ_i b_i = −∑_{u∈W\{b_0}} λ_0 λ_{b_0}^{−1} λ_u u + ∑_{i=1}^{n} λ_i b_i   (by (5.26)).

Since each u ∈ W \ {b_0} lies in A and b_1, . . . , b_n ∈ A, this proves A to be a generating set for V, i.e. B is not minimal. □

Proposition 5.18. Let V be a vector space over the field F and U ⊆ V a finite subset, #U = n ∈ N, u_0 ∈ U, v_0 := ∑_{u∈U} λ_u u with λ_u ∈ F for each u ∈ U and λ_{u_0} ≠ 0, and let Ũ := {v_0} ∪ (U \ {u_0}).

(a) If 〈U〉 = V, then 〈Ũ〉 = V.

(b) If U is linearly independent, then so is Ũ.

(c) If U is a basis of V, then so is Ũ.

Proof. Under the hypothesis, we have

u_0 = λ_{u_0}^{−1} v_0 − λ_{u_0}^{−1} ∑_{u∈U\{u_0}} λ_u u.   (5.27)

(a): If v ∈ V, then there exist μ_u ∈ F, u ∈ U, such that

v = ∑_{u∈U} μ_u u = μ_{u_0} u_0 + ∑_{u∈U\{u_0}} μ_u u = μ_{u_0} λ_{u_0}^{−1} v_0 + ∑_{u∈U\{u_0}} (μ_u − μ_{u_0} λ_{u_0}^{−1} λ_u) u   (by (5.27)),

showing 〈Ũ〉 = V.

(b): Suppose μ_0 ∈ F and μ_u ∈ F, u ∈ U \ {u_0}, are such that

0 = μ_0 v_0 + ∑_{u∈U\{u_0}} μ_u u = μ_0 λ_{u_0} u_0 + ∑_{u∈U\{u_0}} (μ_0 λ_u + μ_u) u   (by (5.27)).


Since U is linearly independent, we obtain μ_0 λ_{u_0} = 0 = μ_0 λ_u + μ_u for each u ∈ U \ {u_0}. Since λ_{u_0} ≠ 0, we have μ_0 = 0, then also implying μ_u = 0 for each u ∈ U \ {u_0}, showing Ũ to be linearly independent.

(c) is now immediate from combining (a) and (b). □
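The exchange step of Prop. 5.18 can be made concrete in F^3 with F = Q (the example vectors below are ours): starting from the standard basis U = {e_1, e_2, e_3} and v_0 = 2e_1 + 3e_2 (so λ_{e_1} = 2 ≠ 0), the exchanged set Ũ = {v_0, e_2, e_3} still spans, because (5.27) recovers e_1.

```python
from fractions import Fraction as Fr

e1, e2, e3 = (Fr(1), Fr(0), Fr(0)), (Fr(0), Fr(1), Fr(0)), (Fr(0), Fr(0), Fr(1))

def lin_comb(coeffs, vecs):
    """Componentwise sum of coeffs[k] * vecs[k] in Q^3."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vecs)) for i in range(3))

# v0 = 2 e1 + 3 e2, so lambda_{e1} = 2 != 0 and e1 may be exchanged.
v0 = lin_comb([Fr(2), Fr(3), Fr(0)], [e1, e2, e3])

# (5.27) with u0 = e1: e1 = (1/2) v0 - (1/2) * 3 * e2.
e1_recovered = lin_comb([Fr(1, 2), Fr(-3, 2), Fr(0)], [v0, e2, e3])
```

Since e_1 lies in the span of {v_0, e_2, e_3}, so does every vector, matching part (a).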

Theorem 5.19 (Coordinates). Let V be a vector space over the field F and assume B ⊆ V is a basis of V. Then each vector v ∈ V has unique coordinates with respect to the basis B, i.e., for each v ∈ V, there exists a unique finite subset B_v of B and a unique map c : B_v −→ F \ {0} such that

v = ∑_{b∈B_v} c(b) b.   (5.28)

Note that, for v = 0, one has B_v = ∅, c is the empty map, and (5.28) becomes 0 = ∑_{b∈∅} c(b) b.

Proof. The existence of B_v and the map c follows from the fact that the basis B is a generating set, 〈B〉 = V. For the uniqueness proof, consider finite sets B_v, B̃_v ⊆ B and maps c : B_v −→ F \ {0}, c̃ : B̃_v −→ F \ {0} such that

v = ∑_{b∈B_v} c(b) b = ∑_{b∈B̃_v} c̃(b) b.   (5.29)

Extend both c and c̃ to A := B_v ∪ B̃_v by letting c(b) := 0 for b ∈ B̃_v \ B_v and c̃(b) := 0 for b ∈ B_v \ B̃_v. Then

0 = ∑_{b∈A} ( c(b) − c̃(b) ) b,   (5.30)

such that the linear independence of A implies c(b) = c̃(b) for each b ∈ A, which, in turn, implies B_v = B̃_v and c = c̃. □

Remark 5.20. If the basis B of V has finitely many elements, then one often enumerates the elements B = {b_1, . . . , b_n}, n = #B ∈ N, and writes λ_i := c(b_i) for b_i ∈ B_v, λ_i := 0 for b_i ∉ B_v, such that (5.28) takes the, perhaps more familiar looking, form

v = ∑_{i=1}^{n} λ_i b_i.   (5.31)

In Ex. 5.16(c), we have seen that the vector space F^n, n ∈ N, over F has the finite basis {e_1, . . . , e_n} and, more generally, that, for each set I, B = {e_i : i ∈ I} is a basis for the vector space F^I_fin over F. Since, clearly, B and I have the same cardinality (via the bijective map i ↦ e_i), this shows that, for each set I, there exists a vector space whose basis has the same cardinality as I. In Th. 5.23 below, we will show one of the fundamental results of vector space theory, namely that every vector space has a basis and that two bases of the same vector space always have the same cardinality (which is referred to as the dimension of the vector space, cf. Def. 5.24).


As with some previous results of this class, the proof of Th. 5.23 makes use of the axiom of choice (AC) of Appendix A.4, which postulates, for each nonempty set ℳ, whose elements are all nonempty sets, the existence of a choice function, that means a function that assigns, to each M ∈ ℳ, an element m ∈ M. Somewhat surprisingly, AC can not be proved (or disproved) from ZF, i.e. from the remaining standard axioms of set theory (see Appendix A.4 for a reference). Previously, when AC was used, choice functions were employed directly. However, in the proof of Th. 5.23, AC enters in the form of Zorn’s lemma. While Zorn’s lemma turns out to be equivalent to AC, the equivalence is not obvious (see Th. A.52(iii)). However, Zorn’s lemma provides an important technique for conducting existence proofs that is encountered frequently throughout mathematics, and the proof of existence of a basis in a vector space is the standard place to first encounter and learn this technique (as it turns out, the existence of bases in vector spaces is, actually, also equivalent to AC – see the end of the proof of Th. A.52 for a reference).

The following Def. 5.21 prepares the statement of Zorn’s lemma:

Definition 5.21 (Same as Part of Appendix Def. A.50). Let X be a set and let ≤ be a partial order on X.

(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists no x ∈ X such that m < x (note that a maximal element does not have to be a max and that a maximal element is not necessarily unique).

(b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by ≤.

Theorem 5.22 (Zorn’s Lemma). Let X be a nonempty partially ordered set. If every chain C ⊆ X (i.e. every nonempty totally ordered subset of X) has an upper bound in X (such chains with upper bounds are sometimes called inductive), then X contains a maximal element (cf. Def. 5.21(a)).

Proof. According to Th. A.52(iii), Zorn’s lemma is equivalent to the axiom of choice. □

Theorem 5.23 (Bases). Let V be a vector space over the field F.

(a) If U ⊆ V is linearly independent, then there exists a basis of V that contains U.

(b) V has a basis B ⊆ V.

(c) Bases of V have a unique cardinality, i.e. if B ⊆ V and B̃ ⊆ V are both bases of V, then there exists a bijective map φ : B −→ B̃. In particular, if #B = n ∈ N_0, then #B̃ = n.

(d) If B is a basis of V and U ⊆ V is linearly independent, then there exists C ⊆ B such that B̃ := U ∪ C is a basis of V. In particular, if #B = n ∈ N, B = {b_1, . . . , b_n}, then #U = m ∈ N with m ≤ n, and, if U = {u_1, . . . , u_m}, then there exist distinct u_{m+1}, . . . , u_n ∈ B such that B̃ = {u_1, . . . , u_n} is a basis of V.


Proof. If V is generated by finitely many vectors, then the proof does not need Zorn’s lemma and, thus, we treat this case first: If V = {0}, then (a) – (d) clearly hold. Thus, we also assume V ≠ {0}. To prove (d), assume B = {b_1, . . . , b_n} is a basis of V with #B = n ∈ N. We first show, by induction on m ∈ {1, . . . , n}, that, if U is linearly independent, #U = m, U = {u_1, . . . , u_m}, then there exist distinct u_{m+1}, . . . , u_n ∈ B such that B̃ = {u_1, . . . , u_n} is a basis of V: If m = 1, we write u_1 = ∑_{i=1}^{n} λ_i b_i with λ_i ∈ F. Since u_1 ≠ 0, there exists i_0 ∈ {1, . . . , n} such that λ_{i_0} ≠ 0. Then B̃ := {u_1} ∪ (B \ {b_{i_0}}) is a basis of V by Prop. 5.18(c), thereby establishing the base case. If 1 < m ≤ n, then, by induction hypothesis, there exist distinct c_m, c_{m+1}, . . . , c_n ∈ B such that {u_1, . . . , u_{m−1}} ∪ {c_m, c_{m+1}, . . . , c_n} forms a basis of V. Then there exist μ_i ∈ F such that

u_m = ∑_{i=1}^{m−1} μ_i u_i + ∑_{i=m}^{n} μ_i c_i.

There then must exist i_0 ∈ {m, . . . , n} such that μ_{i_0} ≠ 0, since, otherwise, U were linearly dependent. Thus, again by Prop. 5.18(c), B̃ := U ∪ ({c_m, c_{m+1}, . . . , c_n} \ {c_{i_0}}) forms a basis of V, completing the induction. Now suppose m > n. Then the above shows {u_1, . . . , u_n} to form a basis of V, in contradiction to the linear independence of U. This then also implies that U can not be infinite. Thus, we always have #U = m ≤ n and #(B̃ \ {u_{m+1}, . . . , u_n}) = m, completing the proof of (d). Now if U is not merely linearly independent, but itself a basis of V, then we can switch the roles of B and U, obtaining m = n, proving (c). Since we assume V to be generated by finitely many vectors, there exists C ⊆ V with 〈C〉 = V and #C = k ∈ N. Then C is either a basis or it is not minimal, in which case there exists c ∈ C such that C̃ := C \ {c} still generates V, #C̃ = k − 1. Proceeding inductively, we can remove vectors until we obtain a basis of V, proving (b). Combining (b) with (d) now proves (a).

General Case: Here, we prove (a) first, making use of Zorn’s lemma: Given a linearly independent set U ⊆ V, define

ℳ := {M ⊆ V : U ⊆ M and M is linearly independent}   (5.32)

and note that set inclusion ⊆ constitutes a partial order on ℳ. Moreover, U ∈ ℳ, i.e. ℳ ≠ ∅. Let 𝒞 ⊆ ℳ be a chain, i.e. assume 𝒞 ≠ ∅ to be totally ordered by ⊆. Define

C_0 := ⋃_{C∈𝒞} C.   (5.33)

It is then immediate that C ∈ 𝒞 implies C ⊆ C_0, i.e. C_0 is an upper bound for 𝒞. Moreover, C_0 ∈ ℳ: Indeed, if C ∈ 𝒞, then U ⊆ C ⊆ C_0. To verify C_0 is linearly independent, let c_1, . . . , c_n ∈ C_0 be distinct and λ_1, . . . , λ_n ∈ F, n ∈ N, with

0 = ∑_{i=1}^{n} λ_i c_i.

Then, for each i ∈ {1, . . . , n}, there exists C_i ∈ 𝒞 with c_i ∈ C_i. Since

𝒟 := {C_1, . . . , C_n} ⊆ 𝒞


is totally ordered by ⊆, by Th. 3.25(a), there exists i_0 ∈ {1, . . . , n} such that C_{i_0} = max 𝒟, i.e. C_i ⊆ C_{i_0} for each i ∈ {1, . . . , n}, implying c_i ∈ C_{i_0} for each i ∈ {1, . . . , n} as well. As C_{i_0} is linearly independent, λ_1 = · · · = λ_n = 0, showing C_0 is linearly independent, as desired. Having, thus, verified all hypotheses of Zorn’s lemma, Th. 5.22 provides a maximal element B ∈ ℳ. By Th. 5.17(ii), B is a basis of V. As U ⊆ B also holds, this proves (a) and, in particular, (b).

To prove (c), let B_1, B_2 be bases of V. As we have shown above that (c) holds, provided B_1 or B_2 is finite, we now assume B_1 and B_2 both to be infinite. We show the existence of an injective map φ_1 : B_2 −→ B_1: To this end, for each v ∈ B_1, let B_{2,v} be the finite subset of B_2 given by Th. 5.19, consisting of all b ∈ B_2 such that the coordinate of v with respect to b is nonzero. Let ψ_v : B_{2,v} −→ {1, . . . , #B_{2,v}} be bijective. Also define

E := ⋃_{v∈B_1} B_{2,v}.

Then, since V = 〈B_1〉, we also have V = 〈E〉. Thus, as E ⊆ B_2 and the basis B_2 is a minimal generating set, we obtain E = B_2. In particular, for each b ∈ B_2, we may choose some v(b) ∈ B_1 such that b ∈ B_{2,v(b)}. The map

f : B_2 −→ B_1 × N,   f(b) := ( v(b), ψ_{v(b)}(b) ),

is then, clearly, well-defined and injective. Moreover, Th. A.56 of the Appendix provides us with a bijective map φ_{B_1} : B_1 × N −→ B_1. In consequence, φ_1 : B_2 −→ B_1, φ_1 := φ_{B_1} ◦ f, is injective as well. Since B_1, B_2 were arbitrary bases, we can interchange the roles of B_1 and B_2 to also obtain an injective map φ_2 : B_1 −→ B_2. According to the Schröder–Bernstein Th. 3.12, there then also exists a bijective map φ : B_1 −→ B_2, completing the proof of (c).

To prove (d), let B be a basis of V and let U ⊆ V be linearly independent. Analogously to the proof of (a), apply Zorn’s lemma, but this time with the set

ℳ := {M ⊆ V : M = U ∪ C, C ⊆ B, M is linearly independent},   (5.34)

such that the maximal element B̃ ∈ ℳ, obtained from Th. 5.22, has the form B̃ = U ∪ C with C ⊆ B. We claim B̃ to be a basis of V: Linear independence is clear, since B̃ ∈ ℳ, and it only remains to check 〈B̃〉 = V. Due to the maximality of B̃, if b ∈ B, there exist a finite set B_b ⊆ B̃ and λ_{bu} ∈ F, u ∈ B_b, such that

b = ∑_{u∈B_b} λ_{bu} u.   (5.35)

Now let v ∈ V be arbitrary. As B is a basis of V, there exist b_1, . . . , b_n ∈ B and λ_1, . . . , λ_n ∈ F, n ∈ N, such that

v = ∑_{i=1}^{n} λ_i b_i = ∑_{i=1}^{n} ∑_{u∈B_{b_i}} λ_i λ_{b_i u} u   (by (5.35)),

proving 〈B̃〉 = V as desired. □


Definition 5.24. According to Th. 5.23, for each vector space V over a field F, the cardinality of its bases is unique. This unique cardinality is called the dimension of V and is denoted dim V. If dim V < ∞ (i.e. dim V ∈ N_0), then V is called finite-dimensional, otherwise infinite-dimensional.

Remark 5.25. Let F be a field. In Ex. 5.16(c), we saw that B = {e_i : i ∈ I} with

∀ i ∈ I:   e_i : I −→ F,   e_i(j) = δ_ij := { 1 if i = j;  0 if i ≠ j }

is a basis of F^I_fin. Since #B = #I, we have shown

dim F^I_fin = #I,   (5.36)

and, in particular, for I = {1, . . . , n}, n ∈ N,

dim F^n = dim R^n = dim C^n = n.   (5.37)

We will see in Th. 6.9 below that, in a certain sense, F^I_fin is the only vector space of dimension #I over F. In particular, for n ∈ N, one can think of F^n as the standard model of an n-dimensional vector space over F.

Remark 5.26. Bases of vector spaces are especially useful if they are much smaller than the vector space itself, as in the case of finite-dimensional vector spaces over infinite fields, R^n and C^n being among the most important examples. For infinite-dimensional vector spaces V, one often has dim V = #V (cf. Th. E.1 and Cor. E.2 in the Appendix), such that vector space bases are not as helpful in such situations. On the other hand, infinite-dimensional vector spaces might come with some additional structure, providing a more useful notion of basis (e.g., in infinite-dimensional Hilbert spaces, one typically uses so-called orthonormal bases rather than vector space bases).

In the rest of the section, we investigate relations between bases and subspaces, and we introduce the related notions of direct complement and direct sum.

Theorem 5.27. Let V be a vector space over the field F and let U be a subspace of V .

(a) If BU is a basis of U , then there exists a basis B of V with BU ⊆ B.

(b) dim U ≤ dim V, i.e. if B_U is a basis of U and B is a basis of V, then there exists an injective map φ : B_U −→ B.

(c) There exists a subspace U ′ of V such that V = U + U ′ and U ∩ U ′ = {0}.

(d) If dimV = n ∈ N0, then

dimU = dimV ⇔ U = V.


Proof. (a): Let B_U be a basis of U. Since B_U is a linearly independent subset of V, we can employ Th. 5.23(a) to obtain C ⊆ V such that B := B_U ∪ C is a basis of V.

(b): Let B_U and B be bases of U and V, respectively. As in the proof of (a), choose C ⊆ V such that B̃ := B_U ∪ C is a basis of V. According to Th. 5.23(c), there exists a bijective map φ_V : B̃ −→ B. Then the map φ := φ_V ↾ B_U is still injective.

(c): Once again, let B_U be a basis of U and choose C ⊆ V such that B := B_U ∪ C is a basis of V. Define U′ := 〈C〉. Since B = B_U ∪ C is a basis of V, it is immediate that V = U + U′. Now suppose u ∈ U ∩ U′. Then there exist λ_1, . . . , λ_m, μ_1, . . . , μ_n ∈ F with m, n ∈ N, as well as distinct b_1, . . . , b_m ∈ B_U and distinct c_1, . . . , c_n ∈ C such that

u = ∑_{i=1}^{m} λ_i b_i = ∑_{i=1}^{n} μ_i c_i   ⇒   0 = ∑_{i=1}^{m} λ_i b_i − ∑_{i=1}^{n} μ_i c_i.

Since {b_1, . . . , b_m, c_1, . . . , c_n} ⊆ B is linearly independent, this implies 0 = λ_1 = · · · = λ_m = μ_1 = · · · = μ_n and u = 0.

(d): Let dim U = dim V = n ∈ N_0. If B_U is a basis of U, then #B_U = n and, by (a), B_U extends to a basis B of V; since #B = n = #B_U, this forces B = B_U, so B_U is also a basis of V, implying U = 〈B_U〉 = V. □

Definition 5.28. Let V be a vector space over the field F and let U, W be subspaces of V. Then W is called a direct complement of U if, and only if, V = U + W and U ∩ W = {0}. In that case, we also write V = U ⊕ W and we say that V is the direct sum of U and W.

Caveat: While set-theoretic complements A \ B are uniquely determined by A and B, nontrivial direct complements of vector subspaces are never unique, as shown in the following proposition:

Proposition 5.29. Let V be a vector space over the field F and let U, W be subspaces of V such that V = U ⊕ W. If there exist 0 ≠ u ∈ U and 0 ≠ w ∈ W, then there exists a subspace W̃ ≠ W of V with V = U ⊕ W̃.

Proof. According to Th. 5.23(d), there exist bases B_U of U and B_W of W with u ∈ B_U and w ∈ B_W. Then v := u + w ∉ W, since, otherwise, u = (u + w) − w ∈ W, in contradiction to U ∩ W = {0}. Let B̃ := {v} ∪ (B_W \ {w}) and W̃ := 〈B̃〉. Suppose x ∈ U ∩ W̃. Then there exist distinct w_1, . . . , w_n ∈ B_W \ {w} and λ_0, λ_1, . . . , λ_n ∈ F, n ∈ N, such that

x = λ_0 (u + w) + ∑_{i=1}^{n} λ_i w_i.

Since x ∈ U, we also obtain x − λ_0 u ∈ U. Also x − λ_0 u = λ_0 w + ∑_{i=1}^{n} λ_i w_i ∈ W, implying

0 = x − λ_0 u = λ_0 w + ∑_{i=1}^{n} λ_i w_i.


However, w, w_1, . . . , w_n ∈ B_W are linearly independent, yielding 0 = λ_0 = · · · = λ_n and, thus, x = 0, showing U ∩ W̃ = {0}. To obtain V = U ⊕ W̃, it remains to show w ∈ U + W̃ (since V = U ⊕ W, U ⊆ U + W̃, and B_W \ {w} ⊆ W̃ ⊆ U + W̃). Since w = −u + (u + w) ∈ U + W̃, the proof is complete. □
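The non-uniqueness is already visible in R^2 (our concrete instance of the proof's construction): U = 〈{u}〉 with u = (1, 0) has the complement W = 〈{w}〉 with w = (0, 1), but W̃ = 〈{u + w}〉 works just as well; both pairs are bases, as nonzero 2 × 2 determinants confirm.

```python
def det2(a, b):
    """Determinant of the 2x2 matrix with columns a and b."""
    return a[0] * b[1] - b[0] * a[1]

u, w = (1, 0), (0, 1)
w_tilde = (u[0] + w[0], u[1] + w[1])  # u + w = (1, 1), the proof's new basis vector

# {u, w} and {u, u + w} are both bases of R^2, so span{w} and
# span{u + w} are two distinct direct complements of span{u}.
dets = (det2(u, w), det2(u, w_tilde))
```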

Theorem 5.30. Let V be a vector space over the field F and let U, W be subspaces of V. Moreover, let B_U, B_W, B_+, B_∩ be bases of U, W, U + W, U ∩ W, respectively.

(a) U + W = 〈B_U ∪ B_W〉.

(b) If U ∩ W = {0}, then B_U ∪ B_W is a basis of U ⊕ W.

(c) It holds that

dim(U + W) + dim(U ∩ W) = dim U + dim W,   (5.38)

by which we mean that there exists a bijective map

φ : (B_U × {0}) ∪ (B_W × {1}) −→ (B_+ × {0}) ∪ (B_∩ × {1}).

(d) Suppose B := B_U ∪ B_W is a basis of V, where the union is disjoint (B_U ∩ B_W = ∅). Then V = U ⊕ W.

Proof. (a): Let B := B_U ∪ B_W. If v = u + w with u ∈ U, w ∈ W, then there exist distinct u_1, . . . , u_n ∈ B_U, distinct w_1, . . . , w_m ∈ B_W (m, n ∈ N), and λ_1, . . . , λ_n, μ_1, . . . , μ_m ∈ F such that

v = u + w = ∑_{i=1}^{n} λ_i u_i + ∑_{i=1}^{m} μ_i w_i,   (5.39)

showing U + W = 〈B〉.

(b): Let B := B_U ∪ B_W. We know U ⊕ W = 〈B〉 from (a). Now consider (5.39) with u + w = 0. Then u = −w ∈ U ∩ W = {0}, i.e. u = 0. Then w = 0 as well and 0 = λ_1 = · · · = λ_n = μ_1 = · · · = μ_m, due to the linear independence of the u_1, . . . , u_n and the linear independence of the w_1, . . . , w_m. In consequence, B is linearly independent as well and, thus, a basis of U ⊕ W.

(c): According to Th. 5.27(a), we may choose B_U and B_W such that B_∩ ⊆ B_U and B_∩ ⊆ B_W. Let B_0 := B_U \ B_∩, U_0 := 〈B_0〉, B_1 := B_W \ B_∩, U_1 := 〈B_1〉. Then

B := B_U ∪ B_W = B_0 ∪ B_1 ∪ B_∩

is linearly independent: Consider, once again, the situation of the proof of (b), with v = 0 and, this time, with u_1, . . . , u_n ∈ B_0. Again, we obtain u = −w ∈ W, i.e.

u ∈ U_0 ∩ (U ∩ W) = 〈B_0〉 ∩ 〈B_∩〉 = 〈B_0 ∩ B_∩〉 = {0}   (by Prop. 5.14(d)).

Thus, λ_1 = · · · = λ_n = 0, due to the linear independence of B_0. This then implies w = 0 and 0 = μ_1 = · · · = μ_m due to the linear independence of B_W, yielding the


claimed linear independence of B. According to (a), we also have U + W = 〈B〉, i.e. B is a basis of U + W. Now we are in a position to define

φ : (B_U × {0}) ∪ (B_W × {1}) −→ (B × {0}) ∪ (B_∩ × {1}),

φ(b, α) := { (b, 0) for b ∈ B_U and α = 0;  (b, 0) for b ∈ B_1 and α = 1;  (b, 1) for b ∈ B_∩ and α = 1, }

which is, clearly, bijective. Bearing in mind Th. 5.23(c), this proves (c).

(d): As B = B_U ∪ B_W is a basis of V, we can write each v ∈ V as v = u + w with u ∈ U and w ∈ W, showing V = U + W. On the other hand,

U ∩ W = 〈B_U〉 ∩ 〈B_W〉 = 〈B_U ∩ B_W〉 = 〈∅〉 = {0}   (by Prop. 5.14(d)),

proving V = U ⊕ W. □
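For coordinate subspaces of F^n, formula (5.38) collapses to inclusion–exclusion for index sets, which makes it easy to check by machine (our toy instance, not from the notes): with U, W spanned by the standard vectors indexed by A and B, the sum and intersection are the coordinate subspaces for A ∪ B and A ∩ B.

```python
# U = span{e_i : i in A}, W = span{e_i : i in B} inside F^4.
A = {0, 1}        # dim U = 2
B = {1, 2, 3}     # dim W = 3

dim_U, dim_W = len(A), len(B)
dim_sum = len(A | B)   # dim(U + W): indices in A union B
dim_cap = len(A & B)   # dim(U int W): indices in A intersect B

# An instance of (5.38): dim(U+W) + dim(U int W) = dim U + dim W.
balance = dim_sum + dim_cap == dim_U + dim_W
```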

6 Linear Maps

6.1 Basic Properties and Examples

In previous sections, we have studied structure-preserving maps, i.e. homomorphisms, in the context of magmas (in particular, groups) and rings (in particular, fields), cf. Def. 4.10 and Def. 4.33(b). In the context of vector spaces, the homomorphisms are so-called linear maps, which we proceed to define and study next.

Definition 6.1. Let V and W be vector spaces over the field F. A map A : V −→ W is called vector space homomorphism or F-linear (or merely linear if the field F is understood) if, and only if,

∀ v_1, v_2 ∈ V:   A(v_1 + v_2) = A(v_1) + A(v_2),   (6.1a)
∧  ∀ λ ∈ F, v ∈ V:   A(λv) = λA(v)   (6.1b)

or, equivalently, if, and only if,

∀ λ, μ ∈ F, v_1, v_2 ∈ V:   A(λv_1 + μv_2) = λA(v_1) + μA(v_2)   (6.2)

(note that, in general, vector addition and scalar multiplication will be different on the left-hand sides and right-hand sides of the above equations). Thus, A is linear if, and only if, it is a group homomorphism with respect to addition and also satisfies (6.1b). In particular, if A is linear, then ker A and Im A are defined by Def. 4.19, i.e.

ker A = A^{−1}{0} = {v ∈ V : A(v) = 0},   (6.3a)
Im A = A(V) = {A(v) : v ∈ V}.   (6.3b)


We denote the set of all F-linear maps from V into W by L(V, W). The notions vector space (or linear) monomorphism, epimorphism, isomorphism, endomorphism, automorphism are then defined as in Def. 4.10. Moreover, V and W are called isomorphic (denoted V ≅ W) if, and only if, there exists a vector space isomorphism A : V −→ W.

Before providing examples of linear maps in Ex. 6.7 below, we first establish some basic properties of such maps.

Notation 6.2. For the composition of linear maps A and B, we often write BA instead of B ◦ A. For the application of a linear map, we also often write Av instead of A(v).

Proposition 6.3. Let V, W, X be vector spaces over the field F.

(a) If A : V −→ W and B : W −→ X are linear, then so is BA.

(b) If A : V −→ W is a linear isomorphism, then A^{−1} is a linear isomorphism as well (i.e. A^{−1} is not only bijective, but also linear).

(c) If A ∈ L(V, W), then ker A is a subspace of V and Im A is a subspace of W.

(d) A ∈ L(V, W) is injective if, and only if, ker A = {0}.

Proof. (a): We note BA to be a homomorphism with respect to addition by Prop. 4.11(a). Moreover, if v ∈ V and λ ∈ F, then

(BA)(λv) = B(A(λv)) = B(λ(Av)) = λ(B(Av)) = λ((BA)(v)),

proving the linearity of BA.

(b): A^{−1} is a homomorphism with respect to addition by Prop. 4.11(b). Moreover, if w ∈ W and λ ∈ F, then

A^{−1}(λw) = A^{−1}( λ(A(A^{−1}w)) ) = A^{−1}( A(λ(A^{−1}w)) ) = λ(A^{−1}w),

proving the linearity of A^{−1}.

(c): According to Th. 4.20(a),(b), (ker A, +) is a subgroup of (V, +) and (Im A, +) is a subgroup of (W, +). Let λ ∈ F. If v ∈ ker A, then

A(λv) = λ(Av) = λ · 0 = 0,

showing λv ∈ ker A and that ker A is a subspace of V. If w ∈ Im A, then there exists v ∈ V with Av = w. Thus,

A(λv) = λ(Av) = λw,

showing λw ∈ Im A and that Im A is a subspace of W.

(d) is merely a restatement of Th. 4.20(c) for the current situation. □
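Prop. 6.3(c),(d) for a concrete linear map (our example, not from the notes): the projection A : R^2 −→ R^2, A(x, y) = (x, 0), has the y-axis as kernel, so ker A ≠ {0} and, consistently with (d), A is not injective.

```python
def A(v):
    """The linear projection A(x, y) = (x, 0) on R^2."""
    x, y = v
    return (x, 0.0)

def in_kernel(v):
    """Membership test for ker A = A^{-1}{0}, cf. (6.3a)."""
    return A(v) == (0.0, 0.0)

# A nonzero kernel element is a witness against injectivity:
witness = (0.0, 5.0)
not_injective = A(witness) == A((0.0, 0.0)) and witness != (0.0, 0.0)
```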


Proposition 6.4. Let V be a vector space over the field F and let W be a set with compositions + : W × W −→ W and · : F × W −→ W. If A : V −→ W is an epimorphism with respect to + that also satisfies (6.1b), then W (with + and ·) is a vector space over F as well.

Proof. Exercise. □

Proposition 6.5. Let V and W be vector spaces over the field F, and let A : V −→ W be linear.

(a) A is injective if, and only if, for each linearly independent subset S of V, A(S) is a linearly independent subset of W.

(b) A is surjective if, and only if, for each generating subset S of V, A(S) is a generating subset of W.

(c) A is bijective if, and only if, for each basis B of V, A(B) is a basis of W.

Proof. Exercise. □

Theorem 6.6. Let V and W be vector spaces over the field F. Then each linear map A : V −→ W is uniquely determined by its values on a basis of V. More precisely, if B is a basis of V, (w_b)_{b∈B} is a family in W, and, for each v ∈ V, B_v and c_v : B_v −→ F \ {0} are as in Th. 5.19 (we now write c_v instead of c to underline the dependence of c on v), then the map

Ã : V −→ W,   Ã(v) = Ã( ∑_{b∈B_v} c_v(b) b ) := ∑_{b∈B_v} c_v(b) w_b,   (6.4)

is linear, and A ∈ L(V, W) with

∀ b ∈ B:   A(b) = w_b   (6.5)

implies A = Ã.

Proof. We first verify Ã is linear. Let v ∈ V and λ ∈ F. If λ = 0, then Ã(λv) = Ã(0) = 0 = λÃ(v). If λ ≠ 0, then B_{λv} = B_v, c_{λv} = λc_v, and

Ã(λv) = Ã( ∑_{b∈B_{λv}} c_{λv}(b) b ) = ∑_{b∈B_v} λ c_v(b) w_b = λ Ã( ∑_{b∈B_v} c_v(b) b ) = λÃ(v).   (6.6a)

Now let u, v ∈ V. If u = 0, then Ã(u + v) = Ã(v) = 0 + Ã(v) = Ã(u) + Ã(v), and analogously if v = 0. So assume u, v ≠ 0. If u + v = 0, then v = −u and


A(u+v) = A(0) = 0 = A(u)+A(−u) = A(u)+A(v). If u+v 6= 0, then Bu+v ⊆ Bu∪Bv

and

A(u+ v) = A

b∈Bu+v

cu+v(b) b

=∑

b∈Bu+v

cu+v(b)wb

=∑

b∈Bu

cu(b)wb +∑

b∈Bv

cv(b)wb = A(u) + A(v). (6.6b)

If v ∈ V and Bv and cv are as before, then the linearity of A and (6.5) imply

A(v) = A

(∑

b∈Bv

cv(b) b

)

A∈L(V,W )=

b∈Bv

cv(b) A(b) =∑

b∈Bv

cv(b)wb = A(v). (6.7)

Since (6.7) establishes A = A, the proof is complete. �
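The extension-by-linearity of Th. 6.6 can be illustrated computationally: a linear map is stored via the images wb of the basis vectors, and Ã(v) is evaluated from the finitely supported coordinates of v. The following Python sketch is only an illustration (all names are hypothetical; V = F^3 and W = F^2 with standard bases, exact rational arithmetic via fractions):

```python
from fractions import Fraction

# Sketch of Th. 6.6: a linear map V -> W is determined by the images w_b
# of the basis vectors b. Here a vector v is given by its coefficient
# dict {basis index: c_v(b)}, and extend_linearly returns
# A~(v) = sum over b in B_v of c_v(b) * w_b.

def add(u, v):
    return tuple(x + y for x, y in zip(u, v))

def scale(c, u):
    return tuple(c * x for x in u)

def extend_linearly(images, coeffs, zero):
    """images: dict b -> w_b; coeffs: dict b -> c_v(b) (finite support)."""
    result = zero
    for b, c in coeffs.items():
        result = add(result, scale(c, images[b]))
    return result

# Images of the three standard basis vectors of F^3 in W = F^2:
images = {0: (Fraction(1), Fraction(0)),
          1: (Fraction(0), Fraction(1)),
          2: (Fraction(1), Fraction(1))}

# v = 2*b_0 - 3*b_2, so A~(v) = 2*(1,0) - 3*(1,1) = (-1,-3):
v_coeffs = {0: Fraction(2), 2: Fraction(-3)}
print(extend_linearly(images, v_coeffs, (Fraction(0), Fraction(0))))
```

The dict-of-coefficients representation mirrors the finite sets Bv of Th. 5.19: only basis vectors with nonzero coefficient are stored.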

Example 6.7. (a) Let V and W be finite-dimensional vector spaces over the field F, dimV = n, dimW = m, where m, n ∈ N0. From Th. 6.6, we obtain a very useful characterization of the linear maps between V and W: If V = {0} or W = {0}, then A ≡ 0 is the only element of L(V,W). Now let b1, . . . , bn ∈ V form a basis of V and let c1, . . . , cm ∈ W form a basis of W. Given A ∈ L(V,W), define

∀ i ∈ {1, . . . , n} : wi := A(bi).

Then, by Th. 5.19, there exists a unique family (aji)(j,i)∈{1,...,m}×{1,...,n} in F such that

∀ i ∈ {1, . . . , n} : wi = A(bi) = ∑_{j=1}^m aji cj.  (6.8)

Thus, given v = ∑_{i=1}^n λi bi ∈ V, λ1, . . . , λn ∈ F, we find

A(v) = A( ∑_{i=1}^n λi bi ) = ∑_{i=1}^n λi wi = ∑_{i=1}^n ∑_{j=1}^m λi aji cj = ∑_{j=1}^m ( ∑_{i=1}^n aji λi ) cj = ∑_{j=1}^m µj cj,  where µj := ∑_{i=1}^n aji λi.  (6.9)

Thus, from Th. 6.6, we conclude that the map, assigning to each vector v = ∑_{i=1}^n λi bi ∈ V the vector ∑_{j=1}^m µj cj with µj := ∑_{i=1}^n aji λi, is precisely the unique linear map A : V −→ W satisfying (6.8).
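The coordinate formula µj = ∑_i aji λi of (6.9) is exactly matrix–vector multiplication. As a hedged illustration (the coefficient family (aji) below is a made-up example, the helper name is hypothetical), in Python:

```python
# Sketch of Ex. 6.7(a): given the coordinate family (a_ji) of A with
# respect to bases (b_1,...,b_n) of V and (c_1,...,c_m) of W, the
# coordinates (mu_1,...,mu_m) of A(v) are obtained from the coordinates
# (la_1,...,la_n) of v via mu_j = sum_i a_ji * la_i, as in (6.9).

def apply_coordinates(a, la):
    m, n = len(a), len(a[0])
    assert len(la) == n
    return [sum(a[j][i] * la[i] for i in range(n)) for j in range(m)]

a = [[1, 0, 2],
     [0, 1, -1]]        # hypothetical family (a_ji), n = 3, m = 2
la = [3, 4, 5]          # v = 3 b_1 + 4 b_2 + 5 b_3
print(apply_coordinates(a, la))  # coordinates of A(v) -> [13, -1]
```
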

(b) Let V be a vector space over the field F, let B be a basis of V. Given v ∈ V, let Bv and cv : Bv −→ F \ {0} be as in Th. 5.19. For each c ∈ B, define

πc : V −→ F,  πc(v) := { cv(c)  for c ∈ Bv,
                          0      for c ∉ Bv,

calling πc the projection onto the coordinate with respect to c. Comparing with (6.4), we see that πc is precisely the linear map A : V −→ W := F, determined by the family (wb)b∈B in F with

∀ b ∈ B : wb := δbc := { 1  for b = c,
                          0  for b ≠ c,   (6.10)

as this yields

∀ v ∈ V : A(v) = A( ∑_{b∈Bv} cv(b) b ) = ∑_{b∈Bv} cv(b) wb = ∑_{b∈Bv} cv(b) δbc = { cv(c)  for c ∈ Bv,
                                                                                     0      for c ∉ Bv.

In particular, πc is linear. Moreover, for each c ∈ B, we have

Im πc = F,  ker πc = 〈B \ {c}〉,  V = 〈{c}〉 ⊕ ker πc.  (6.11)

(c) Let I be a nonempty set and let Y be a vector space over the field F. We then know from Ex. 5.2(c) that V := F(I, Y) = Y^I is a vector space over F. For each i ∈ I, define

πi : V −→ Y,  πi(f) := f(i),

calling πi the projection onto the ith coordinate (note that, for I = {1, . . . , n}, n ∈ N, Y = F, one has V = F^n, πi(v1, . . . , vn) = vi). We verify πi to be linear: Let λ ∈ F and f, g ∈ V. Then

πi(λf) = (λf)(i) = λ(f(i)) = λπi(f),
πi(f + g) = (f + g)(i) = f(i) + g(i) = πi(f) + πi(g),

proving πi to be linear. Moreover, for each i ∈ I, we have

Im πi = Y,  ker πi = {(f : I −→ Y) : f(i) = 0} ∼= Y^{I\{i}},  V = Y ⊕ ker πi,  (6.12)

where, for the last equality, we identified

Y ∼= {(f : I −→ Y) : f(j) = 0 for each j ≠ i}.

A generalization of the present example to general Cartesian products of vector spaces can be found in Ex. E.4 of the Appendix. To investigate the relation between the present projections and the projections of (b), let Y = F. We know from Ex. 5.16(c) that B = {ei : i ∈ I} with

∀ i ∈ I : ei : I −→ F,  ei(j) = δij := { 1  if i = j,
                                          0  if i ≠ j,

is a basis of F^I_fin, which is a subspace of F^I, such that the πi are defined on F^I_fin. Now, if f = ∑_{j=1}^n λ_{ij} e_{ij} with λ_{i1}, . . . , λ_{in} ∈ F; i1, . . . , in ∈ I; n ∈ N, and π_{e_{ik}} is defined as in (b) (k ∈ {1, . . . , n}), then

π_{ik}(f) = f(ik) = ∑_{j=1}^n λ_{ij} e_{ij}(ik) = ∑_{j=1}^n λ_{ij} δjk = λ_{ik} = π_{e_{ik}}(f),  (6.13)

showing π_{ik} ↾ F^I_fin = π_{e_{ik}}.
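The identity (6.13) can be checked mechanically if an element of F^I_fin is stored as a finitely supported coefficient map. A small Python sketch (names hypothetical; coefficients over the rationals represented as integers):

```python
# Sketch of (6.13): an element f of F^I_fin is stored as a dict
# {i: f(i)} with finite support. The projection pi_i is evaluation
# at i, and it agrees with the coordinate projection with respect
# to the basis {e_i}: absent keys correspond to coefficient 0.

def pi(i, f):
    """Projection onto the ith coordinate: pi_i(f) = f(i)."""
    return f.get(i, 0)

# f = 2*e_1 + 5*e_7 (coefficients with respect to the basis {e_i}):
f = {1: 2, 7: 5}
print(pi(1, f), pi(7, f), pi(3, f))  # -> 2 5 0
```
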


(d) We consider the set of complex numbers C as a vector space over R with basis {1, i}. Then the map of complex conjugation

A : C −→ C,  A(z) = A(x + iy) = z̄ = x − iy,

is R-linear, since, for each z = x + iy, w = u + iv ∈ C,

A(z + w) = x + u − i(y + v) = A(z) + A(w)  ∧  A(zw) = xu − yv − (xv + yu)i = (x − iy)(u − iv) = A(z)A(w),

and A(z) = z for z ∈ R (so that A(λz) = A(λ)A(z) = λA(z) for λ ∈ R). As A is bijective with A = A^{-1}, A is even an automorphism. However, complex conjugation is not C-linear (now considering C as a vector space over itself), since, e.g., A(i · 1) = −i ≠ i · A(1) = i.
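The distinction between R-linearity and C-linearity of conjugation can also be observed numerically. A short Python check (using Python's built-in complex type; the sample values are arbitrary):

```python
# Conjugation is additive and commutes with real scalars, but fails
# homogeneity for the complex scalar i, so it is R-linear, not C-linear.

z, w = 3 + 4j, 1 - 2j
conj = lambda u: u.conjugate()

assert conj(z + w) == conj(z) + conj(w)     # additivity
assert conj(2.5 * z) == 2.5 * conj(z)       # homogeneity for real scalars
assert conj(1j * z) != 1j * conj(z)         # fails for the scalar i
print("conjugation is R-linear but not C-linear")
```

Here conj(i·z) = −4 − 3i while i·conj(z) = 4 + 3i, matching the counterexample in the text.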

(e) By using results from Calculus, we obtain the following examples:

(i) Let V be the set of convergent sequences in K. According to [Phi16, Th. 7.13(a)], scalar multiples of convergent sequences are convergent and sums of convergent sequences are convergent, showing V to be a subspace of K^N, the vector space over K of all sequences in K. If (zn)n∈N and (wn)n∈N are elements of V and λ ∈ K, then, using [Phi16, Th. 7.13(a)] once again, we know

lim_{n→∞} (λzn) = λ lim_{n→∞} zn,  lim_{n→∞} (zn + wn) = lim_{n→∞} zn + lim_{n→∞} wn,

showing the map

A : V −→ K,  A((zn)n∈N) := lim_{n→∞} zn,  (6.14)

to be linear. Moreover, we have

Im A = K,  ker A = {(zn)n∈N ∈ V : lim_{n→∞} zn = 0},  V = 〈(1)n∈N〉 ⊕ ker A,  (6.15)

where the last equality is due to the fact that, if (zn)n∈N ∈ V with lim_{n→∞} zn = λ, then lim_{n→∞}(zn − λ · 1) = 0, i.e. (zn)n∈N − λ(1)n∈N ∈ ker A.

(ii) Let a, b ∈ R, a < b, I := ]a, b[, and let V := {(f : I −→ K) : f is differentiable}. According to [Phi16, Th. 9.7], scalar multiples of differentiable functions are differentiable and sums of differentiable functions are differentiable, showing V to be a subspace of K^I, the vector space over K of all functions from I into K. Using [Phi16, Th. 9.7] again, we know the map

A : V −→ K^I,  A(f) := f′,  (6.16)

to be linear. While ker A = {f ∈ V : f constant} (e.g. due to the fundamental theorem of calculus in the form [Phi16, Th. 10.20(b)]), Im A is not so simple to characterize (cf. [Phi17b, Th. 2.11]).

(iii) Let a, b ∈ R, a ≤ b, I := [a, b], and let W := R(I, K) be the set of all K-valued Riemann integrable functions on I. According to [Phi16, Th. 10.11(a)], scalar multiples of Riemann integrable functions are Riemann integrable and sums of Riemann integrable functions are Riemann integrable, showing W to be another subspace of K^I. Using [Phi16, Th. 10.11(a)] again, we know the map

B : W −→ K,  B(f) := ∫_I f,  (6.17)

to be linear. Moreover, for a < b, we have

Im B = K,  ker B = { f ∈ W : ∫_I f = 0 },  W = 〈1〉 ⊕ ker B,  (6.18)

where the last equality is due to the fact that, if f ∈ W with ∫_I f = λ, then ∫_I (f − (λ/(b−a)) · 1) = 0, i.e. f − (λ/(b−a)) · 1 ∈ ker B.

For some of the linear maps investigated in Ex. 6.7 above, we have provided the respective kernels, images, and direct complements of the kernels. The examples suggest relations between the dimensions of domain, image, and kernel of a linear map, which are stated and proved in the following theorem:

Theorem 6.8 (Dimension Formulas). Let V and W be vector spaces over the field F, and let A : V −→ W be linear. Moreover, let BV be a basis of V, let Bker be a basis of ker A, let BIm be a basis of Im A, and let BW be a basis of W.

(a) If U is a direct complement of ker A (i.e. if V = U ⊕ ker A), then A↾U : U −→ Im A is bijective. In consequence, dimV = dim ker A + dim Im A, i.e. there exists a bijective map φ : BV −→ (Bker × {0}) ∪ (BIm × {1}).

(b) dim Im A ≤ dimV, i.e. there exists an injective map from BIm into BV.

(c) dim Im A ≤ dimW, i.e. there exists an injective map from BIm into BW.

Proof. (a): If u ∈ ker A↾U, then u ∈ U ∩ ker A, i.e. u = 0, i.e. A↾U is injective. If w ∈ Im A and Av = w, then v = v0 + u with v0 ∈ ker A and u ∈ U. Then Au = A(v) − A(v0) = w − 0 = w, showing A↾U is surjective. According to Th. 5.30(b), if BU is a basis of U, then B := Bker ∪ BU is a basis of V. Moreover, by Prop. 6.5(c), A(BU) is a basis of Im A. Thus, by Th. 5.23(c), it suffices to show there exists a bijective map ψ : B −→ (Bker × {0}) ∪ (A(BU) × {1}). If we define

ψ : B −→ (Bker × {0}) ∪ (A(BU) × {1}),  ψ(b) := { (b, 0)     for b ∈ Bker,
                                                   (A(b), 1)  for b ∈ BU,

then ψ is bijective due to the bijectivity of A↾U.

(b) is immediate from (a).

(c) is due to Th. 5.27(b), since Im A is a subspace of W. �
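Over the finite field F2 = {0, 1}, the formula dimV = dim ker A + dim Im A of (a) can be verified by brute force, since dimensions translate into cardinalities via |V| = 2^{dimV}, so the formula becomes |V| = |ker A| · |Im A|. The following Python sketch (the matrix A is a made-up example) enumerates F2^3:

```python
from itertools import product

# Brute-force check of Th. 6.8(a) over F_2: for a linear map
# A : F_2^3 -> F_2^2 given by a 2x3 matrix, enumerate all 8 vectors
# and verify |V| = |ker A| * |Im A|.

A = [[1, 0, 1],
     [0, 1, 1]]   # a hypothetical 2x3 matrix over F_2

def apply(A, v):
    return tuple(sum(a * x for a, x in zip(row, v)) % 2 for row in A)

V = list(product([0, 1], repeat=3))
kernel = [v for v in V if apply(A, v) == (0, 0)]
image = {apply(A, v) for v in V}
assert len(V) == len(kernel) * len(image)   # 8 = 2 * 4
print(len(kernel), len(image))
```

Here ker A = {(0,0,0), (1,1,1)} has dimension 1 and Im A = F2^2 has dimension 2, so 3 = 1 + 2, as the theorem predicts.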


The following theorem is one of the central results of Linear Algebra: It says that vector spaces are essentially determined by the size of their bases:

Theorem 6.9. Let V and W be vector spaces over the field F. Then V ∼= W (i.e. V and W are isomorphic) if, and only if, dimV = dimW.

Proof. Suppose dimV = dimW. If BV is a basis of V and BW is a basis of W, then there exists a bijective map φ : BV −→ BW. According to Th. 6.6, φ defines a unique linear map A : V −→ W with A(b) = φ(b) for each b ∈ BV. More precisely, letting, once again, for each v ∈ V, Bv and cv : Bv −→ F \ {0} be as in Th. 5.19 (writing cv instead of c to underline the dependence of c on v),

∀ v ∈ V : A(v) = A( ∑_{b∈Bv} cv(b) b ) = ∑_{b∈Bv} cv(b) φ(b).  (6.19)

It remains to show A is bijective. If v ≠ 0, then Bv ≠ ∅ and A(v) = ∑_{b∈Bv} cv(b) φ(b) ≠ 0, since cv(b) ≠ 0 and {φ(b) : b ∈ Bv} ⊆ BW is linearly independent, showing A is injective by Prop. 6.3(d). If w ∈ W, then there exists a finite set Bw ⊆ BW and cw : Bw −→ F \ {0} such that w = ∑_{b∈Bw} cw(b) b. Then

A( ∑_{b∈Bw} cw(b) φ^{-1}(b) ) = ∑_{b∈Bw} cw(b) A(φ^{-1}(b)) = ∑_{b∈Bw} cw(b) φ(φ^{-1}(b)) = ∑_{b∈Bw} cw(b) b = w,  (6.20)

where the first equality holds as A ∈ L(V,W) and the second since φ^{-1}(b) ∈ BV, showing Im A = W, completing the proof that A is bijective.

If A : V −→ W is a linear isomorphism and B is a basis for V, then, by Prop. 6.5(c), A(B) is a basis for W. As A is bijective, so is A↾B, showing dimV = #B = #A(B) = dimW, as claimed. �

Theorem 6.10. Let V and W be finite-dimensional vector spaces over the field F, dimV = dimW = n ∈ N0. Then, given A ∈ L(V,W), the following three statements are equivalent:

(i) A is an isomorphism.

(ii) A is an epimorphism.

(iii) A is a monomorphism.

Proof. It suffices to prove the equivalence between (ii) and (iii), since (i) holds if, and only if, both (ii) and (iii) hold.

“(ii)⇒(iii)”: If A is an epimorphism, then W = Im A, implying, by dimV < ∞ and Th. 6.8(a),

dim ker A = dimV − dim Im A = dimV − dimW = n − n = 0,

showing ker A = {0}, i.e. A is a monomorphism.

“(iii)⇒(ii)”: If A is a monomorphism, then ker A = {0}. Thus, by Th. 6.8(a), n = dimW = dimV = dim ker A + dim Im A = 0 + dim Im A = dim Im A. From Th. 5.27(d), we then obtain W = Im A, i.e. A is an epimorphism. �

Example 6.11. The present example shows that the analogue of Th. 6.10 does not hold for infinite-dimensional vector spaces: Let F be a field and V := F^N (the vector space of sequences in F – the example actually still works in precisely the same way over the simpler space F^N_fin). Define the maps

R : V −→ V,  R(λ1, λ2, . . . ) := (0, λ1, λ2, . . . ),
L : V −→ V,  L(λ1, λ2, . . . ) := (λ2, λ3, . . . )

(R is called the right shift operator, L is called the left shift operator). Clearly, R and L are linear and

L ◦ R = Id,

i.e. L is a left inverse for R, R is a right inverse for L. Thus, according to Th. 2.13, R is injective and L is surjective (which is also easily seen directly from the respective definitions of R and L). However,

Im R = {(f : N −→ F) : f(1) = 0} ≠ V  (e.g. (1, 0, 0, . . . ) ∉ Im R),
ker L = {(f : N −→ F) : f(k) = 0 for k ≠ 1} ≠ {0}  (e.g. 0 ≠ (1, 0, 0, . . . ) ∈ ker L),

showing R is not surjective and L is not injective.
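Truncating sequences to finitely many stored terms, the behavior of R and L can be illustrated in Python (a sketch only – genuine sequences are infinite, so the lists below only model initial segments):

```python
# The shift operators of Ex. 6.11 on an initial segment of a sequence:

def R(seq):                 # right shift: (l1, l2, ...) -> (0, l1, l2, ...)
    return [0] + seq

def L(seq):                 # left shift: (l1, l2, ...) -> (l2, l3, ...)
    return seq[1:]

s = [5, 7, 11]
assert L(R(s)) == s         # L o R = Id: R injective, L surjective
assert R(L(s)) != s         # R o L loses the first term
print(R(L(s)))              # -> [0, 7, 11]
```
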

6.2 Quotient Spaces

In Def. 4.25, we defined the quotient group G/N of a group G with respect to a normal subgroup N. Now, if V is a vector space over a field F, then, in particular, (V, +) is a commutative group, so every subgroup of V is normal. Thus, if U is a subspace of V, then we can always form the quotient group V/U. We will see below that we can even make V/U into a vector space over F, called the quotient space or factor space of V with respect to U. As we write (V, +) as an additive group, we write the respective cosets (i.e. the elements of V/U) as v + U, v ∈ V.

Theorem 6.12. Let V be a vector space over the field F and let U be a subspace of V .

(a) The compositions

+ : V/U × V/U −→ V/U, (v + U) + (w + U) := v + w + U, (6.21a)

· : F × V/U −→ V/U, λ · (v + U) := λv + U, (6.21b)

are well-defined, i.e. the results do not depend on the chosen representatives of the respective cosets.


(b) The natural (group) epimorphism of Th. 4.26(a),

φU : V −→ V/U, φU(v) := v + U, (6.22)

satisfies

∀ v, w ∈ V : φU(v + w) = φU(v) + φU(w),  (6.23a)

∀ λ ∈ F ∀ v ∈ V : φU(λv) = λφU(v).  (6.23b)

(c) V/U with the compositions of (a) forms a vector space over F and φU of (b) constitutes a linear epimorphism.

Proof. (a): The composition + is well-defined by Th. 4.26(a). To verify that · is well-defined as well, let v, w ∈ V, λ ∈ F, and assume v + U = w + U. We need to show λv + U = λw + U. If x ∈ λv + U, then there exists u1 ∈ U such that x = λv + u1. Since v + U = w + U, there then exists u2 ∈ U such that v + u1 = w + u2 and v = w + u2 − u1. Thus, x = λv + u1 = λw + λ(u2 − u1) + u1. Since U is a subspace of V, λ(u2 − u1) + u1 ∈ U, showing x ∈ λw + U and λv + U ⊆ λw + U. As we can switch the roles of v and w in the above argument, we also have λw + U ⊆ λv + U and λv + U = λw + U, as desired.

(b): (6.23a) holds, as φU is a homomorphism with respect to + by Th. 4.26(a). To verify (6.23b), let λ ∈ F, v ∈ V. Then

φU(λv) = λv + U = λ(v + U) = λφU(v)

by (6.21b), establishing the case.

(c) is now immediate from (b) and Prop. 6.4. �
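The well-definedness asserted in Th. 6.12(a) can be checked exhaustively in a small example. The following Python sketch (names hypothetical) takes V = F2^2 and U = 〈{(1, 0)}〉 and verifies that coset addition does not depend on the chosen representatives:

```python
from itertools import product

# Check of Th. 6.12(a) by brute force: in V = F_2^2 with subspace
# U = {(0,0), (1,0)}, the operation (v+U) + (w+U) := (v+w) + U gives
# the same coset no matter which representatives v1 of v+U and w1 of
# w+U are used.

U = {(0, 0), (1, 0)}
V = list(product([0, 1], repeat=2))

def coset(v):
    return frozenset(((v[0] + u[0]) % 2, (v[1] + u[1]) % 2) for u in U)

def vadd(v, w):
    return ((v[0] + w[0]) % 2, (v[1] + w[1]) % 2)

for v, w in product(V, V):
    for v1, w1 in product(coset(v), coset(w)):
        assert coset(vadd(v1, w1)) == coset(vadd(v, w))
print("coset addition is well-defined on V/U")
```

V/U here has exactly two elements, U itself and (0, 1) + U, matching dimV/U = dimV − dimU = 1 from Cor. 6.13(c) below.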

One can sometimes obtain information about a vector space V by studying a subspace U and the corresponding quotient space V/U, both of which are “smaller” spaces in the sense of the following Cor. 6.13:

Corollary 6.13. Let V be a vector space over the field F and let U be a subspace of V. Let φ : V −→ V/U, φ(v) := v + U, denote the natural epimorphism.

(a) If B is a basis of V and BU ⊆ B is a basis of U, then B/ := {b + U : b ∈ B \ BU} is a basis of V/U.

(b) Let BU be a basis of U and let W be another subspace of V with basis BW. If φ↾W : W −→ V/U is bijective, then BU ∪ BW forms a basis of V.

(c) dimV = dimU + dimV/U (cf. Th. 6.8(a)).

Proof. (a): Letting W := 〈B \ BU〉, we have V = U ⊕ W by Th. 5.30(d). Since U = ker φ, we have V = ker φ ⊕ W and φ↾W : W −→ V/U is bijective by Th. 6.8(a) and, thus, B/ = φ↾W(B \ BU) is a basis of V/U.


(b): Let v ∈ V. Then there exists w ∈ W with v + U = φ(v) = φ(w) = w + U, implying v − w ∈ U. Thus, v = w + (v − w) ∈ W + U, showing V = W + U. On the other hand, if v ∈ U ∩ W, then φ(v) = v + U = U = φ(0) and, thus, v = 0, showing V = W ⊕ U. Then BU ∪ BW forms a basis of V by Th. 5.30(b).

(c): Since U = ker φ and V/U = Im φ, (c) is immediate from Th. 6.8(a). �

From Def. 4.25 and Th. 4.26(a), we already know that the elements of V/U are precisely the equivalence classes of the equivalence relation defined by

∀ v, w ∈ V : ( v ∼ w :⇔ v + U = w + U ⇔ v − w ∈ U ).  (6.24)

Here, in the context of vector spaces, it turns out that the equivalence relations on V that define quotient spaces are precisely the so-called linear equivalence relations:

Definition 6.14. Let V be a vector space over the field F and let R ⊆ V × V be a relation on V. Then R is called linear if, and only if, it has the following two properties:

(i) If v, w, x, y ∈ V with vRw and xRy, then (v + x)R(w + y).

(ii) If v, w ∈ V and λ ∈ F with vRw, then (λv)R(λw).

Proposition 6.15. Let V be a vector space over the field F .

(a) If U is a subspace of V , then the equivalence relation defined by (6.24) is linear.

(b) Let ∼ be a linear equivalence relation on V. Then U := {v ∈ V : v ∼ 0} is a subspace of V and ∼ satisfies (6.24).

Proof. (a): The linearity of ∼ defined by (6.24) is precisely the assertion that the compositions + and · of Th. 6.12(a) are well-defined.

(b): We have 0 ∈ U, as ∼ is reflexive. If v, w ∈ U and λ ∈ F, then the linearity of ∼ implies v + w ∼ 0 + 0 = 0 and λv ∼ λ0 = 0, i.e. v + w ∈ U and λv ∈ U, showing U to be a subspace. Now suppose v, w ∈ V with v ∼ w. Then the linearity of ∼ yields v − w ∼ 0, i.e. v − w ∈ U. If x = v + u with u ∈ U, then x = w + (x − w) = w + (v + u − w) ∈ w + U, showing v + U ⊆ w + U. As ∼ is symmetric, we then have w + U ⊆ v + U as well. Conversely, if v, w ∈ V with v + U = w + U, then v − w ∈ U, i.e. v − w ∼ 0, and the linearity of ∼ implies v ∼ w, proving ∼ satisfies (6.24). �

In Th. 4.26(b), we proved the isomorphism theorem for groups: If G and H are groups and φ : G −→ H is a homomorphism, then G/ ker φ ∼= Im φ. Since vector spaces V and W are, in particular, groups, if A : V −→ W is linear, then V/ ker A ∼= Im A as groups, and it is natural to ask whether V/ ker A and Im A are isomorphic as vector spaces. In Th. 6.16(a) below we see this, indeed, to be the case.

Theorem 6.16 (Isomorphism Theorem). Let V and X be vector spaces over the field F, let U, W be subspaces of V.


(a) If A : V −→ X is linear, then

V/ kerA ∼= ImA. (6.25)

More precisely, the map

f : V/ ker A −→ Im A,  f(v + ker A) := A(v),  (6.26)

is well-defined and constitutes a linear isomorphism. If fe : V −→ V/ ker A denotes the natural epimorphism and ι : Im A −→ X, ι(v) := v, denotes the embedding, then fm : V/ ker A −→ X, fm := ι ◦ f, is a linear monomorphism such that

A = fm ◦ fe.  (6.27)

(b) One has

(U + W)/W ∼= U/(U ∩ W).  (6.28)

(c) If U is a subspace of W , then

(V/U)/(W/U) ∼= V/W. (6.29)

Proof. (a): All assertions, except the linearity assertions, were already proved in Th. 4.26(b). Moreover, fe is linear by Th. 6.12(b). Thus, it merely remains to show

f( λ(v + ker A) ) = λ f(v + ker A)

for each λ ∈ F and each v ∈ V (then fm is linear, as both f and ι are linear). Indeed, if λ ∈ F and v ∈ V, then

f( λ(v + ker A) ) = f(λv + ker A) = A(λv) = λA(v) = λ f(v + ker A),

as desired.

(b): According to (a), it suffices to show that

A : U + W −→ U/(U ∩ W),  A(u + w) := u + (U ∩ W)  for u ∈ U, w ∈ W,  (6.30)

well-defines a linear map with ker A = W. To verify that A is well-defined, let u, u1 ∈ U and w, w1 ∈ W with u + w = u1 + w1. Then u − u1 = w1 − w ∈ U ∩ W and, thus,

A(u + w) = u + (U ∩ W) = u1 + w1 − w + (U ∩ W) = u1 + (U ∩ W) = A(u1 + w1),

proving A to be well-defined by (6.30). To prove linearity, we no longer assume u + w = u1 + w1 and calculate

A( (u + w) + (u1 + w1) ) = A( (u + u1) + (w + w1) ) = u + u1 + (U ∩ W) = ( u + (U ∩ W) ) + ( u1 + (U ∩ W) ) = A(u + w) + A(u1 + w1),

as well as, for each λ ∈ F,

A( λ(u + w) ) = A(λu + λw) = λu + (U ∩ W) = λ( u + (U ∩ W) ) = λA(u + w),

as needed. If w ∈ W, then A(w) = A(0 + w) = U ∩ W, showing w ∈ ker A and W ⊆ ker A. Conversely, let u + w ∈ ker A, u ∈ U, w ∈ W. Then A(u + w) = u + (U ∩ W) = U ∩ W, i.e. u ∈ U ∩ W and u + w ∈ W, showing ker A ⊆ W.

(c): Exercise. �

6.3 Vector Spaces of Linear Maps

Definition 6.17. Let V and W be vector spaces over the field F. We define an addition and a scalar multiplication on L(V,W) by

(A+B) : V −→ W, (A+ B)(x) := A(x) + B(x), (6.31a)

(λ · A) : V −→ W, (λ · A)(x) := λ · A(x) for each λ ∈ F . (6.31b)

Theorem 6.18. Let V and W be vector spaces over the field F. The addition and scalar multiplication on L(V,W) given by (6.31) are well-defined in the sense that, if A, B ∈ L(V,W) and λ ∈ F, then A + B ∈ L(V,W) and λA ∈ L(V,W). Moreover, with the operations defined in (6.31), L(V,W) forms a vector space over F.

Proof. Exercise. �

Theorem 6.19. Let V and W be vector spaces over the field F, let BV and BW be bases of V and W, respectively. Using Th. 6.6, define maps Avw ∈ L(V,W) by letting

∀ (v, w) ∈ BV × BW ∀ ṽ ∈ BV : Avw(ṽ) := { w  for ṽ = v,
                                            0  for ṽ ≠ v.   (6.32)

Define

B := { Avw : (v, w) ∈ BV × BW }.  (6.33)

(a) B is linearly independent.

(b) If V is finite-dimensional, dimV = n ∈ N, BV = {v1, . . . , vn}, then B constitutes a basis for L(V,W). If, in addition, dimW = m ∈ N, BW = {w1, . . . , wm}, then we can write

dim L(V,W) = dimV · dimW = n · m.  (6.34)

(c) If dimV = ∞ and dimW ≥ 1, then 〈B〉 ⊊ L(V,W) and, in particular, B is not a basis of L(V,W).


Proof. (a): Let N, M ∈ N. Let v1, . . . , vN ∈ BV be distinct and let w1, . . . , wM ∈ BW be distinct as well. If λji ∈ F, (j, i) ∈ {1, . . . ,M} × {1, . . . , N}, are such that

A := ∑_{j=1}^M ∑_{i=1}^N λji Aviwj = 0,

then, for each k ∈ {1, . . . , N},

0 = A(vk) = ∑_{j=1}^M ∑_{i=1}^N λji Aviwj(vk) = ∑_{j=1}^M ∑_{i=1}^N λji δik wj = ∑_{j=1}^M λjk wj

implies λ1k = · · · = λMk = 0 due to the linear independence of the wj. As this holds for each k ∈ {1, . . . , N}, we have established the linear independence of B.

(b): According to (a), it remains to show 〈B〉 = L(V,W). Let A ∈ L(V,W) and i ∈ {1, . . . , n}. Then there exists a finite set Bi ⊆ BW such that Avi = ∑_{w∈Bi} λw w with λw ∈ F. Now let w1, . . . , wM, M ∈ N, be an enumeration of the finite set B1 ∪ · · · ∪ Bn. Then there exist aji ∈ F, (j, i) ∈ {1, . . . ,M} × {1, . . . , n}, such that

∀ i ∈ {1, . . . , n} : A(vi) = ∑_{j=1}^M aji wj.

Letting L := ∑_{j=1}^M ∑_{i=1}^n aji Aviwj, we claim A = L. Indeed,

∀ k ∈ {1, . . . , n} : L(vk) = ∑_{j=1}^M ∑_{i=1}^n aji Aviwj(vk) = ∑_{j=1}^M ∑_{i=1}^n aji δik wj = ∑_{j=1}^M ajk wj = A(vk),

proving L = A by Th. 6.6. Since L ∈ 〈B〉, the proof of (b) is complete.

(c): As dimW ≥ 1, there exists w ∈ BW. If A ∈ 〈B〉, then {v ∈ BV : Av ≠ 0} is finite. Thus, if BV is infinite, then the map A ∈ L(V,W) with A(v) := w for each v ∈ BV is not in 〈B〉, proving (c). �

Lemma 6.20. Let V, W, X be vector spaces over the field F. Recalling Not. 6.2, we have:

(a) If A ∈ L(W,X) and B,C ∈ L(V,W ), then

A(B + C) = AB + AC. (6.35)

(b) If A,B ∈ L(W,X) and C ∈ L(V,W ), then

(A+ B)C = AC +BC. (6.36)


Proof. (a): For each v ∈ V, we calculate

( A(B + C) )v = A(Bv + Cv) = A(Bv) + A(Cv) = (AB)v + (AC)v = (AB + AC)v,

proving (6.35).

(b): For each v ∈ V, we calculate

( (A + B)C )v = (A + B)(Cv) = A(Cv) + B(Cv) = (AC)v + (BC)v = (AC + BC)v,

proving (6.36). �

Theorem 6.21. Let V be a vector space over the field F. Then R := L(V, V) constitutes a ring with unity with respect to pointwise addition and map composition as multiplication. If dimV > 1, then R is not commutative and it has nontrivial divisors of 0.

Proof. Since (V, +) is a commutative group, R is a subring (with unity Id ∈ R) of the ring with unity of group endomorphisms End(V) of Ex. 4.41(b) (that R constitutes a ring can, alternatively, also be obtained from Lem. 6.20 with V = W = X). Now let v1, v2 ∈ V be linearly independent and B a basis of V with v1, v2 ∈ B. Define A1, A2 ∈ R by letting, for v ∈ B,

A1v := { v2  for v = v1,        A2v := { v1  for v = v2,
         0   otherwise,                  0   otherwise.     (6.37)

Then A1A2v1 = 0, but A2A1v1 = A2v2 = v1, showing A1A2 ≠ A2A1. Moreover, A1²v1 = A1v2 = 0 and A2²v2 = A2v1 = 0, showing A1² ≡ 0 and A2² ≡ 0. In particular, both A1 and A2 are nontrivial divisors of 0. �
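For V = F^2, the maps A1, A2 of (6.37) can be written as 2×2 matrices (columns holding the images of the basis vectors; matrices are only formally introduced in Sec. 7, so this is an illustration ahead of time, with a hypothetical helper matmul):

```python
# The endomorphisms A1, A2 of (6.37) on V = F^2 with basis (v1, v2):
A1 = [[0, 0],
      [1, 0]]    # A1 v1 = v2, A1 v2 = 0
A2 = [[0, 1],
      [0, 0]]    # A2 v2 = v1, A2 v1 = 0

def matmul(A, B):
    return [[sum(A[j][k] * B[k][i] for k in range(2)) for i in range(2)]
            for j in range(2)]

zero = [[0, 0], [0, 0]]
assert matmul(A1, A2) != matmul(A2, A1)   # L(V,V) is not commutative
assert matmul(A1, A1) == zero             # A1^2 = 0: divisor of 0
assert matmul(A2, A2) == zero             # A2^2 = 0: divisor of 0
print(matmul(A1, A2), matmul(A2, A1))
```
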

Definition 6.22. Let V be a vector space over the field F .

(a) A linear endomorphism A ∈ L(V, V) is called regular if, and only if, it is an automorphism (i.e. if, and only if, it is bijective/invertible). Otherwise, A is called singular.

(b) The set GL(V) := {A ∈ L(V, V) : A is regular} = L(V, V)* (cf. Def. and Rem. 4.40) is called the general linear group of V (cf. Cor. 6.23 below).

Corollary 6.23. Let V be a vector space over the field F. Then the general linear group GL(V) is, indeed, a group with respect to map composition. For V ≠ {0}, it is not a group with respect to addition (note 0 ∉ GL(V) in this case). If dimV ≥ 3, or if F ≠ {0, 1} and dimV ≥ 2, then the group GL(V) is not commutative.

Proof. Since L(V, V) is a ring with unity, we know GL(V) = L(V, V)* to be a group by Def. and Rem. 4.40 – since the composition of linear maps is linear, GL(V) is also a subgroup of the symmetric group SV (cf. Ex. 4.9(b)). If dimV ≥ 3, then the noncommutativity of GL(V) follows as in Ex. 4.9(b), where the elements a, b, c of (4.7) are now taken as distinct elements of a basis of V. Now let v1, v2 ∈ V be linearly independent and B a basis of V with v1, v2 ∈ B. Define A1, A2 ∈ L(V, V) by letting, for v ∈ B,

A1v := { v2  for v = v1,        A2v := { −v1  for v = v1,
         v1  for v = v2,                 v2   for v = v2,
         v   otherwise,                  v    otherwise.     (6.38)

Clearly, A1, A2 ∈ GL(V) with A1^{-1} = A1 and A2^{-1} = A2. Moreover, A1A2v1 = −A1v1 = −v2 and A2A1v1 = A2v2 = v2. If F ≠ {0, 1}, then v2 ≠ −v2, showing A1A2 ≠ A2A1. �

7 Matrices

7.1 Definition and Arithmetic

Matrices provide a convenient representation for linear maps A between finite-dimensional vector spaces V and W. Recall the basis { Aviwj : (j, i) ∈ {1, . . . ,m} × {1, . . . , n} } of L(V,W) that, in Th. 6.19, was shown to arise from bases {v1, . . . , vn} and {w1, . . . , wm} of V and W, respectively; m, n ∈ N. Thus, each A ∈ L(V,W) can be written in the form

A = ∑_{j=1}^m ∑_{i=1}^n aji Aviwj,  (7.1)

with coordinates (aji)(j,i)∈{1,...,m}×{1,...,n} in F (also cf. Ex. 6.7(a)). This motivates the following definition of matrices, where, however, instead of F, we allow an arbitrary nonempty set S, as it turns out to be sometimes useful to have matrices with entries that are not necessarily elements of a field.

Definition 7.1. Let S be a nonempty set and m, n ∈ N. A family (aji)(j,i)∈{1,...,m}×{1,...,n} in S is called an m-by-n or an m × n matrix over S, where m × n is called the size, dimension, or type of the matrix. The aji are called the entries or elements of the matrix. One also writes just (aji) instead of (aji)(j,i)∈{1,...,m}×{1,...,n} if the size of the matrix is understood. One usually thinks of the m × n matrix (aji) as the rectangular array

         ( a11  . . .  a1n )
(aji) =  ( ...         ... )   (7.2)
         ( am1  . . .  amn )

with m rows and n columns. One therefore also calls 1 × n matrices row vectors and m × 1 matrices column vectors, and one calls n × n matrices quadratic. One calls the elements akk the (main) diagonal elements of (aji) and one also says that these elements lie on the (main) diagonal of the matrix and that they form the (main) diagonal of the matrix. The set of all m × n matrices over S is denoted by M(m, n, S), and for the set of all quadratic n × n matrices, one uses the abbreviation M(n, S) := M(n, n, S).


Definition 7.2 (Matrix Arithmetic). Let F be a field and m, n, l ∈ N. Let S be a ring or a vector space over F (e.g. S = F).

(a) Matrix Addition: For m × n matrices (aji) and (bji) over S, define the sum

(aji) + (bji) := (aji + bji) ∈ M(m, n, S).  (7.3)

(b) Scalar Multiplication: Let (aji) be an m × n matrix over S. If S is a ring, then let λ ∈ S; if S is a vector space over F, then let λ ∈ F. Define

λ (aji) := (λ aji) ∈ M(m, n, S).  (7.4)

(c) Matrix Multiplication: Let S be a ring. For each m × n matrix (aji) and each n × l matrix (bji) over S, define the product

(aji)(bji) := ( ∑_{k=1}^n ajk bki )_{(j,i)∈{1,...,m}×{1,...,l}} ∈ M(m, l, S),  (7.5)

i.e. the product of an m × n matrix and an n × l matrix is an m × l matrix (cf. Th. 7.13 below).

Example 7.3. As an example of matrix multiplication, we compute

( 1 −1  0 )   ( −3  0 )
( 2  0 −2 ) · (  1  1 )  =  ( 1·(−3) + (−1)·1 + 0·0    1·0 + (−1)·1 + 0·(−1) )  =  ( −4 −1 )
              (  0 −1 )     ( 2·(−3) + 0·1 + (−2)·0    2·0 + 0·1 + (−2)·(−1) )     ( −6  2 )
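Formula (7.5) translates directly into code. The following Python sketch (the helper name matmul is hypothetical) implements the entry formula ∑_k ajk bki and reproduces the computation of Ex. 7.3:

```python
# Matrix multiplication exactly as in (7.5): the (j,i) entry of the
# product is sum_k a_jk * b_ki; an m x n matrix times an n x l matrix
# gives an m x l matrix.

def matmul(A, B):
    m, n, l = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A)   # sizes must be compatible
    return [[sum(A[j][k] * B[k][i] for k in range(n)) for i in range(l)]
            for j in range(m)]

A = [[1, -1, 0],
     [2, 0, -2]]
B = [[-3, 0],
     [1, 1],
     [0, -1]]
print(matmul(A, B))  # -> [[-4, -1], [-6, 2]], as in Ex. 7.3
```
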

Remark 7.4. Let S be a nonempty set and m, n ∈ N.

(a) An m × n matrix A := (aji)(j,i)∈{1,...,m}×{1,...,n} is defined as a family in S, i.e., recalling Def. 2.15(a), A is defined as the function A : {1, . . . ,m} × {1, . . . , n} −→ S, A(j, i) = aji; and M(m, n, S) = F({1, . . . ,m} × {1, . . . , n}, S) (so we notice that matrices are nothing new in terms of objects, but just a new way of thinking about functions from {1, . . . ,m} × {1, . . . , n} into S, that turns out to be convenient in certain contexts). Thus, if F is a field and S = Y is a vector space over F, then the operations defined in Def. 7.2(a),(b) are precisely the same operations that were defined in (5.5), and M(m, n, Y) is a vector space according to Ex. 5.2(c). Clearly, the map

I : M(m, n, Y) −→ Y^{m·n},  (aji) ↦ (y1, . . . , ym·n),  where yk = aji if, and only if, k = (j − 1) · n + i,  (7.6)

constitutes a linear isomorphism. For Y = F, other important linear isomorphisms between M(m, n, F) and vector spaces of linear maps will be provided in Th. 7.10(a) below.


(b) Let A := (aji)(j,i)∈{1,...,m}×{1,...,n} be an m × n matrix over S. Then, for each i ∈ {1, . . . , n}, the ith column of A,

               ( a1i )
ci := c^A_i := ( ... )
               ( ami ),

can be considered both as an m × 1 matrix and as an element of S^m. In particular, if S = F is a field, then one calls ci the ith column vector of A. Analogously, for each j ∈ {1, . . . ,m}, the jth row of A,

rj := r^A_j := ( aj1 . . . ajn ),

can be considered both as a 1 × n matrix and as an element of S^n. In particular, if S = F is a field, then one calls rj the jth row vector of A.

It can sometimes be useful to think of matrix multiplication in terms of the matrix columns and rows of Rem. 7.4(b) in a number of different ways, as compiled in the following lemma:

Lemma 7.5. Let S be a ring, let l,m, n ∈ N, let A := (aji) be an m × n matrix overS, and let B := (bji) be an n× l matrix over S. Moreover, let cA1 , . . . , c

An and rA1 , . . . , r

Am

denote the columns and rows of A, respectively, let cB1 , . . . , cBl and rB1 , . . . , r

Bn denote the

columns and rows of B (cf. Rem. 7.4(b)), i.e.

A = (aji) =(cA1 . . . cAn

)=

rA1...rAm

, B = (bji) =

(cB1 . . . cBl

)=

rB1...rBn

.

(a) Consider

v :=

v1...vn

∈ M(n, 1, S) ∼= Sn, w :=

(w1 . . . wn

)∈ M(1, n, S) ∼= Sn.

Then

Av =n∑

k=1

cAk vk ∈ M(m, 1, S), wB =n∑

k=1

wk rBk ∈ M(1, l, S),

where we wrote cAk vk, as we do not assume S to be a commutative ring – if S iscommutative (e.g. a field), then the more familiar form vk c

Ak is also admissible.

(b) Columnwise Matrix Multiplication: One has

∀i∈{1,...,l}

cABi = AcBi .

Page 116: LinearAlgebraI - math.lmu.de · LinearAlgebraI Peter Philip∗ Lecture Notes Created for the Class of Winter Semester 2018/2019 at LMU Munich June 18, 2019 Contents 1 Foundations:

7 MATRICES 116

(c) Rowwise Matrix Multiplication: One has

∀j∈{1,...,m}

rABj = rAj B.

Proof. (a): Let $x := Av$, $y := wB$. We then have
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}, \quad
c_k^A = \begin{pmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{pmatrix}, \quad
y = \begin{pmatrix} y_1 & \dots & y_l \end{pmatrix}, \quad
r_k^B = \begin{pmatrix} b_{k1} & \dots & b_{kl} \end{pmatrix},
\]
yielding
\[
\forall_{j\in\{1,\dots,m\}} \quad x_j = \sum_{k=1}^n a_{jk} v_k = \left( \sum_{k=1}^n c_k^A v_k \right)_{\!j}, \qquad
\forall_{i\in\{1,\dots,l\}} \quad y_i = \sum_{k=1}^n w_k b_{ki} = \left( \sum_{k=1}^n w_k\, r_k^B \right)_{\!i},
\]
thereby proving (a).

(b): For each $i \in \{1,\dots,l\}$, $j \in \{1,\dots,m\}$, one computes
\[
(c_i^{AB})_j = (AB)_{ji} = \sum_{k=1}^n a_{jk} b_{ki} = (A\, c_i^B)_j,
\]
proving (b).

(c): For each $j \in \{1,\dots,m\}$, $i \in \{1,\dots,l\}$, one computes
\[
(r_j^{AB})_i = (AB)_{ji} = \sum_{k=1}^n a_{jk} b_{ki} = (r_j^A B)_i,
\]
proving (c). □

Example 7.6. Using the matrices from Ex. 7.3, we have
\[
\begin{pmatrix} -4 \\ -6 \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 0 \\ 2 & 0 & -2 \end{pmatrix}
\begin{pmatrix} -3 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot (-3)
+ \begin{pmatrix} -1 \\ 0 \end{pmatrix} \cdot 1
+ \begin{pmatrix} 0 \\ -2 \end{pmatrix} \cdot 0
\]
and
\[
\begin{pmatrix} -4 & -1 \end{pmatrix}
= \begin{pmatrix} 1 & -1 & 0 \end{pmatrix}
\begin{pmatrix} -3 & 0 \\ 1 & 1 \\ 0 & -1 \end{pmatrix}
= 1 \cdot \begin{pmatrix} -3 & 0 \end{pmatrix}
+ (-1) \cdot \begin{pmatrix} 1 & 1 \end{pmatrix}
+ 0 \cdot \begin{pmatrix} 0 & -1 \end{pmatrix}.
\]
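Lemma 7.5 and the example above are easy to check numerically; the following is a minimal sketch with NumPy, using the matrices of Ex. 7.3/7.6:

```python
import numpy as np

A = np.array([[1, -1, 0],
              [2,  0, -2]])          # the 2x3 matrix from Ex. 7.3
v = np.array([-3, 1, 0])             # column vector
w = np.array([1, -1, 0])             # row vector
B = np.array([[-3, 0],
              [ 1, 1],
              [ 0, -1]])             # 3x2 matrix

# Lemma 7.5(a): Av is the v_k-weighted sum of the columns of A
Av = sum(A[:, k] * v[k] for k in range(3))
assert np.array_equal(Av, A @ v)     # Av = (-4, -6)

# wB is the w_k-weighted sum of the rows of B
wB = sum(w[k] * B[k, :] for k in range(3))
assert np.array_equal(wB, w @ B)     # wB = (-4, -1)

# Lemma 7.5(b)/(c): columns and rows of the product AB
AB = A @ B
for i in range(2):
    assert np.array_equal(AB[:, i], A @ B[:, i])   # columnwise
for j in range(2):
    assert np.array_equal(AB[j, :], A[j, :] @ B)   # rowwise
```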

Theorem 7.7. Let S be a ring.


(a) Matrix multiplication of matrices over $S$ is associative whenever all relevant multiplications are defined. More precisely, if $A = (a_{ji})$ is an $m \times n$ matrix, $B = (b_{ji})$ is an $n \times l$ matrix, and $C = (c_{ji})$ is an $l \times p$ matrix, then
\[
(AB)C = A(BC). \tag{7.7}
\]

(b) Matrix multiplication of matrices over $S$ is distributive whenever all relevant multiplications are defined. More precisely, if $A = (a_{ji})$ is an $m \times n$ matrix, $B = (b_{ji})$ and $C = (c_{ji})$ are $n \times l$ matrices, and $D = (d_{ji})$ is an $l \times p$ matrix, then
\[
A(B + C) = AB + AC, \tag{7.8a}
\]
\[
(B + C)D = BD + CD. \tag{7.8b}
\]

(c) For each $n \in \mathbb{N}$, $M(n,S)$ is a ring. If $S$ is a ring with unity, then $M(n,S)$ is a ring with unity as well, where the additive neutral element is the zero matrix $0 \in M(n,S)$ (all entries are zero) and the multiplicative neutral element of $M(n,S)$ is the so-called identity matrix $\mathrm{Id}_n := (\delta_{ji})$,
\[
\delta_{ji} := \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \neq j, \end{cases}
\qquad
\mathrm{Id}_n = \begin{pmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{pmatrix}, \tag{7.9}
\]
where the usual meaning of such notation is that all omitted entries are 0. We call $\mathrm{GL}_n(S) := M(n,S)^*$, i.e. the group of invertible matrices over $S$, the general linear group (we will see in Th. 7.10(a) below that, if $S = F$ is a field and $V$ is a vector space of dimension $n$ over $F$, then $\mathrm{GL}_n(F)$ is isomorphic to the general linear group $\mathrm{GL}(V)$ of Def. 6.22(b)). Analogous to Def. 6.22(a), we call the matrices in $\mathrm{GL}_n(S)$ regular and the matrices in $M(n,S) \setminus \mathrm{GL}_n(S)$ singular.

Proof. (a): One has $m \times p$ matrices $(AB)C = (d_{ji})$ and $A(BC) = (e_{ji})$, where, using associativity and distributivity in $S$, one computes
\[
d_{ji} = \sum_{\alpha=1}^l \left( \sum_{k=1}^n a_{jk} b_{k\alpha} \right) c_{\alpha i}
= \sum_{\alpha=1}^l \sum_{k=1}^n a_{jk} b_{k\alpha} c_{\alpha i}
= \sum_{k=1}^n a_{jk} \left( \sum_{\alpha=1}^l b_{k\alpha} c_{\alpha i} \right)
= e_{ji},
\]
thereby proving (7.7).

(b): Exercise.

(c): While $M(n,S)$ is a ring due to (a), (b), and Ex. 4.9(e) (which says that $(M(n,S),+)$ is a commutative group), we check the neutrality of $\mathrm{Id}_n$: For each $A = (a_{ji}) \in M(n,S)$, we obtain
\[
\mathrm{Id}_n A = \left( \sum_{k=1}^n \delta_{jk} a_{ki} \right) = (a_{ji}) = A = \left( \sum_{k=1}^n a_{jk} \delta_{ki} \right) = A\, \mathrm{Id}_n,
\]
thereby establishing the case. □


Caveat 7.8. Matrix multiplication of matrices over a ring $S$ is, in general, not commutative, even if $S$ is commutative: If $A$ is an $m \times n$ matrix and $B$ is an $n \times l$ matrix with $m \neq l$, then $BA$ is not even defined. If $m = l$, but $m \neq n$, then $AB$ has dimension $m \times m$, but $BA$ has different dimension, namely $n \times n$. And even if $m = n = l > 1$, then commutativity is, in general, not true; for example, if $S$ is a ring with unity and $0 \neq 1 \in S$, then
\[
\begin{pmatrix}
1 & 1 & \dots & 1 \\
0 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \dots & 0
\end{pmatrix}
\begin{pmatrix}
1 & 0 & \dots & 0 \\
1 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \dots & 0
\end{pmatrix}
=
\begin{pmatrix}
\lambda & 0 & \dots & 0 \\
0 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \dots & 0
\end{pmatrix}, \tag{7.10a}
\]
\[
\begin{pmatrix}
1 & 0 & \dots & 0 \\
1 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \dots & 0
\end{pmatrix}
\begin{pmatrix}
1 & 1 & \dots & 1 \\
0 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \dots & 0
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 & \dots & 1 \\
1 & 1 & \dots & 1 \\
\vdots & \vdots & & \vdots \\
1 & 1 & \dots & 1
\end{pmatrix}. \tag{7.10b}
\]
Note that $\lambda = m$ for $S = \mathbb{R}$ or $S = \mathbb{Z}$, but, in general, $\lambda$ will depend on $S$; e.g. for $S = \{0,1\}$, one obtains $\lambda = m \bmod 2$.
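A quick numerical illustration of this caveat; here $m = 3$ and the entries are taken in $S = \mathbb{Z}$, so $\lambda = 3$ (a sketch, with the reduction modulo 2 mimicking $S = \{0,1\}$):

```python
import numpy as np

m = 3
A = np.zeros((m, m), dtype=int); A[0, :] = 1   # first row all ones, as in (7.10a)
B = np.zeros((m, m), dtype=int); B[:, 0] = 1   # first column all ones

print(A @ B)       # lambda = m = 3 in the top-left corner, zeros elsewhere
print(B @ A)       # the all-ones matrix of (7.10b)
assert (A @ B != B @ A).any()                  # multiplication is not commutative

# over S = {0,1} with addition mod 2, lambda = m mod 2
print((A @ B) % 2)                              # top-left entry is 3 mod 2 = 1
```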

7.2 Matrices as Representations of Linear Maps

Remark 7.9. Coming back to the situation discussed at the beginning of the previous section above, resulting in (7.1), let $v = \sum_{i=1}^n \lambda_i v_i \in V$ with $\lambda_1,\dots,\lambda_n \in F$. Then (also cf. the calculation in (6.9))
\[
A(v) = \sum_{i=1}^n \lambda_i A(v_i)
= \sum_{i=1}^n \lambda_i \sum_{j=1}^m \sum_{k=1}^n a_{jk}\, A_{v_k w_j}(v_i)
\overset{(6.32)}{=} \sum_{i=1}^n \lambda_i \sum_{j=1}^m a_{ji} w_j
= \sum_{j=1}^m \left( \sum_{i=1}^n a_{ji} \lambda_i \right) w_j. \tag{7.11}
\]
Thus, if we represent $v$ by a column vector $v$ (an $n \times 1$ matrix) containing its coordinates $\lambda_1,\dots,\lambda_n$ with respect to the basis $\{v_1,\dots,v_n\}$ and $A(v)$ by a column vector $w$ (an $m \times 1$ matrix) containing its coordinates with respect to the basis $\{w_1,\dots,w_m\}$, then (7.11) shows
\[
w = Mv = M \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix},
\quad \text{where } M := (a_{ji}) \in M(m,n,F). \tag{7.12}
\]

For finite-dimensional vector spaces, the precise relationship between linear maps, bases, and matrices is provided by the following theorem:


Theorem 7.10. Let $V$ and $W$ be finite-dimensional vector spaces over the field $F$, let $B_V := \{v_1,\dots,v_n\}$ and $B_W := \{w_1,\dots,w_m\}$ be bases of $V$ and $W$, respectively; $m,n \in \mathbb{N}$, $n = \dim V$, $m = \dim W$.

(a) The map $I := I(B_V, B_W)$,
\[
I : L(V,W) \longrightarrow M(m,n,F), \quad A \mapsto (a_{ji}), \tag{7.13}
\]
where the $a_{ji}$ are given by (7.1), i.e. by
\[
A = \sum_{j=1}^m \sum_{i=1}^n a_{ji}\, A_{v_i w_j},
\]
constitutes a linear isomorphism. Moreover, in the special case $V = W$, $v_i = w_i$ for $i \in \{1,\dots,n\}$, the map $I$ also constitutes a ring isomorphism $I : L(V,V) \cong M(n,F)$; its restriction to $\mathrm{GL}(V)$ then constitutes a group isomorphism $I : \mathrm{GL}(V) \cong \mathrm{GL}_n(F)$.

(b) Let $a_{ji} \in F$, $j \in \{1,\dots,m\}$, $i \in \{1,\dots,n\}$. Moreover, let $A \in L(V,W)$. Then (7.1) holds if, and only if,
\[
\forall_{i\in\{1,\dots,n\}} \quad A v_i = \sum_{j=1}^m a_{ji} w_j. \tag{7.14}
\]

Proof. (a): According to Th. 6.19, $\bigl\{ A_{v_i w_j} : (j,i) \in \{1,\dots,m\} \times \{1,\dots,n\} \bigr\}$ forms a basis of $L(V,W)$. Thus, to every family of coordinates $\bigl\{ a_{ji} : (j,i) \in \{1,\dots,m\} \times \{1,\dots,n\} \bigr\}$ in $F$, (7.1) defines a unique element of $L(V,W)$, i.e. $I$ is bijective. It remains to verify that $I$ is linear. To this end, let $\lambda, \mu \in F$ and $A, B \in L(V,W)$ with
\[
A = \sum_{j=1}^m \sum_{i=1}^n a_{ji}\, A_{v_i w_j}, \quad (a_{ji}) = I(A) \in M(m,n,F),
\]
\[
B = \sum_{j=1}^m \sum_{i=1}^n b_{ji}\, A_{v_i w_j}, \quad (b_{ji}) = I(B) \in M(m,n,F).
\]
Then
\[
\lambda A + \mu B
= \lambda \sum_{j=1}^m \sum_{i=1}^n a_{ji}\, A_{v_i w_j} + \mu \sum_{j=1}^m \sum_{i=1}^n b_{ji}\, A_{v_i w_j}
= \sum_{j=1}^m \sum_{i=1}^n (\lambda a_{ji} + \mu b_{ji})\, A_{v_i w_j},
\]
showing
\[
I(\lambda A + \mu B) = (\lambda a_{ji} + \mu b_{ji}) = \lambda (a_{ji}) + \mu (b_{ji}) = \lambda I(A) + \mu I(B),
\]
proving the linearity of $I$. Now let $V = W$ and $v_i = w_i$ for each $i \in \{1,\dots,n\}$. We claim
\[
BA = \sum_{j=1}^n \sum_{i=1}^n c_{ji}\, A_{v_i v_j}, \quad \text{where } c_{ji} = \sum_{k=1}^n b_{jk} a_{ki}: \tag{7.15}
\]


Indeed, for each $l \in \{1,\dots,n\}$, one calculates
\[
(BA) v_l
= B \left( \sum_{k=1}^n \sum_{i=1}^n a_{ki}\, A_{v_i v_k} v_l \right)
= B \left( \sum_{k=1}^n a_{kl} v_k \right)
= \sum_{j=1}^n \sum_{i=1}^n b_{ji}\, A_{v_i v_j} \sum_{k=1}^n a_{kl} v_k
= \sum_{j=1}^n \sum_{k=1}^n b_{jk} a_{kl} v_j
= \sum_{j=1}^n c_{jl} v_j
= \sum_{j=1}^n \sum_{i=1}^n c_{ji}\, A_{v_i v_j} v_l,
\]
thereby proving (7.15). Thus,
\[
I(BA) = \left( \sum_{k=1}^n b_{jk} a_{ki} \right) = (b_{ji})(a_{ji}) = I(B) I(A),
\]
proving $I$ to be a ring isomorphism. Since, clearly, $I(\mathrm{GL}(V)) = \mathrm{GL}_n(F)$, the proof of (a) is complete.

(b): If (7.1) holds, then the calculation (7.11) (with $\lambda_k := \delta_{ik}$) proves (7.14). Conversely, assume (7.14). It was then shown in the proof of Th. 6.19(b) that (7.1) must hold. □

Definition and Remark 7.11. In the situation of Th. 7.10, for each $A \in L(V,W)$, one calls the matrix $I(A) = (a_{ji}) \in M(m,n,F)$ the (transformation) matrix corresponding to $A$ with respect to the basis $\{v_1,\dots,v_n\}$ of $V$ and the basis $\{w_1,\dots,w_m\}$ of $W$. If the bases are understood, then one often tends to identify the map with its corresponding matrix.

However, as $I(A)$ depends on the bases, identifying $A$ and $I(A)$ is only admissible as long as one keeps the bases of $V$ and $W$ fixed! Moreover, if one represents matrices as rectangular arrays as in (7.2) (which one usually does), then one actually considers the basis vectors of $\{v_1,\dots,v_n\}$ and $\{w_1,\dots,w_m\}$ as ordered from 1 to $n$ (resp. $m$), i.e. $I(A)$ actually depends on the so-called ordered bases $(v_1,\dots,v_n)$ and $(w_1,\dots,w_m)$ (ordered bases are tuples rather than sets, and the matrix corresponding to $A$ changes if the order of the basis vectors changes).

Similarly, we had seen in (7.12) that it can be useful to identify a vector $v = \sum_{i=1}^n \lambda_i v_i$ with its coordinates $(\lambda_1,\dots,\lambda_n)$, typically represented as an $n \times 1$ matrix (a column vector, as in (7.12)) or a $1 \times n$ matrix (a row vector). Obviously, this identification is also only admissible as long as the basis $\{v_1,\dots,v_n\}$ and its order is kept fixed.

Example 7.12. Let $V := \mathbb{Q}^4$ with ordered basis $B_V := (v_1, v_2, v_3, v_4)$ and let $W := \mathbb{Q}^2$ with ordered basis $B_W := (w_1, w_2)$. Moreover, assume $A \in L(V,W)$ satisfies
\[
A v_1 = w_1, \quad
A v_2 = 2 w_1 - w_2, \quad
A v_3 = 3 w_1, \quad
A v_4 = 4 w_1 - 2 w_2.
\]
According to Th. 7.10(a),(b), the coefficients on the right-hand side of the above equations provide the columns of the matrix representing $A$: The matrix $I(A)$ of $A$ with respect to $B_V$ and $B_W$ ($I$ according to Th. 7.10(a)) is
\[
I(A) = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -1 & 0 & -2 \end{pmatrix}.
\]
If we switch $v_1$ and $v_4$ in $B_V$ to obtain $B_V' := (v_4, v_2, v_3, v_1)$, and we switch $w_1$ and $w_2$ in $B_W$ to obtain $B_W' := (w_2, w_1)$, then the matrix $I'(A)$ of $A$ with respect to $B_V'$ and $B_W'$ is obtained from $I(A)$ by switching columns 1 and 4 as well as rows 1 and 2, resulting in
\[
I'(A) = \begin{pmatrix} -2 & -1 & 0 & 0 \\ 4 & 2 & 3 & 1 \end{pmatrix}.
\]
Suppose we want to determine $Av$ for the vector $v := -v_1 + 3 v_2 - 2 v_3 + v_4$. Then, according to (7.12), we can do that via the matrix multiplication
\[
I(A) \begin{pmatrix} -1 \\ 3 \\ -2 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -1 & 0 & -2 \end{pmatrix}
\begin{pmatrix} -1 \\ 3 \\ -2 \\ 1 \end{pmatrix}
= \begin{pmatrix} 3 \\ -5 \end{pmatrix},
\]
obtaining $Av = 3 w_1 - 5 w_2$.
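The coordinate computations of Ex. 7.12 can be reproduced numerically; a minimal sketch, where the arrays hold coordinates with respect to the chosen ordered bases:

```python
import numpy as np

# matrix of A w.r.t. B_V = (v1,v2,v3,v4) and B_W = (w1,w2)
IA = np.array([[1, 2, 3, 4],
               [0, -1, 0, -2]])

# coordinates of v = -v1 + 3 v2 - 2 v3 + v4
v = np.array([-1, 3, -2, 1])
print(IA @ v)                       # coordinates (3, -5), i.e. Av = 3 w1 - 5 w2

# switching v1 <-> v4 and w1 <-> w2 permutes columns 1,4 and rows 1,2
IA_prime = IA[::-1, [3, 1, 2, 0]]
print(IA_prime)                     # the matrix I'(A) of Ex. 7.12
```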

The following Th. 7.13 is the justification for defining matrix multiplication according to Def. 7.2(c).

Theorem 7.13. Let $F$ be a field, let $n,m,l \in \mathbb{N}$, and let $V, W, X$ be finite-dimensional vector spaces over $F$, $\dim V = n$, $\dim W = m$, $\dim X = l$, with ordered bases $B_V := (v_1,\dots,v_n)$, $B_W := (w_1,\dots,w_m)$, and $B_X := (x_1,\dots,x_l)$, respectively. If $A \in L(V,W)$, $B \in L(W,X)$, $M = (a_{ji}) \in M(m,n,F)$ is the matrix corresponding to $A$ with respect to $B_V$ and $B_W$, and $N = (b_{ji}) \in M(l,m,F)$ is the matrix corresponding to $B$ with respect to $B_W$ and $B_X$, then $NM = \bigl( \sum_{k=1}^m b_{jk} a_{ki} \bigr) \in M(l,n,F)$ is the matrix corresponding to $BA$ with respect to $B_V$ and $B_X$.

Proof. For each $i \in \{1,\dots,n\}$, one computes
\[
(BA)(v_i) = B\bigl( A(v_i) \bigr)
= B \left( \sum_{k=1}^m a_{ki} w_k \right)
= \sum_{k=1}^m a_{ki} B(w_k)
= \sum_{k=1}^m a_{ki} \sum_{j=1}^l b_{jk} x_j
= \sum_{j=1}^l \sum_{k=1}^m b_{jk} a_{ki} x_j
= \sum_{j=1}^l \left( \sum_{k=1}^m b_{jk} a_{ki} \right) x_j, \tag{7.16}
\]
proving $NM = \bigl( \sum_{k=1}^m b_{jk} a_{ki} \bigr)$ is the matrix corresponding to $BA$ with respect to the bases $\{v_1,\dots,v_n\}$ and $\{x_1,\dots,x_l\}$. □

As we have seen, given $A \in L(V,W)$, the matrix $I(A)$ representing $A$ depends on the chosen (ordered) bases of $V$ and $W$. It will be one goal of Linear Algebra II to find methods to determine bases of $V$ and $W$ such that the form of $I(A)$ becomes particularly simple. In the following Th. 7.14, we show how $I(A)$ changes for given basis transitions for $V$ and $W$, respectively.

Theorem 7.14. Let $V$ and $W$ be finite-dimensional vector spaces over the field $F$, $m,n \in \mathbb{N}$, $\dim V = n$, $\dim W = m$. Let $B_V := (v_1,\dots,v_n)$ and $B_W := (w_1,\dots,w_m)$ be ordered bases of $V$ and $W$, respectively. Moreover, let $B_V' := (v_1',\dots,v_n')$ and $B_W' := (w_1',\dots,w_m')$ also be ordered bases of $V$ and $W$, respectively, and let $c_{ji}, d_{ji}, f_{ji} \in F$ be such that
\[
\forall_{i\in\{1,\dots,n\}} \quad v_i' = \sum_{j=1}^n c_{ji} v_j, \tag{7.17a}
\]
\[
\forall_{i\in\{1,\dots,m\}} \quad w_i' = \sum_{j=1}^m f_{ji} w_j, \qquad w_i = \sum_{j=1}^m d_{ji} w_j', \tag{7.17b}
\]
where $(d_{ji}) = (f_{ji})^{-1} \in \mathrm{GL}_m(F)$ (we then call $(c_{ji}) \in \mathrm{GL}_n(F)$ and $(f_{ji}) \in \mathrm{GL}_m(F)$ the transition matrices corresponding to the basis transitions from $B_V$ to $B_V'$ and from $B_W$ to $B_W'$, respectively). If $A \in L(V,W)$ has $I(B_V, B_W)(A) = (a_{ji}) \in M(m,n,F)$ with $I(B_V, B_W)$ as in Th. 7.10(a), then
\[
I(B_V', B_W')(A) = (e_{ji}), \quad \text{where } (e_{ji}) = (d_{ji})(a_{ji})(c_{ji}) = (f_{ji})^{-1}(a_{ji})(c_{ji}). \tag{7.18}
\]
In particular, in the special case, where $V = W$, $v_i = w_i$, $v_i' = w_i'$ for each $i \in \{1,\dots,n\}$, one has
\[
I(B_V', B_V')(A) = (e_{ji}), \quad \text{where } (e_{ji}) = (c_{ji})^{-1}(a_{ji})(c_{ji}). \tag{7.19}
\]

Proof. For each $i \in \{1,\dots,n\}$, we compute
\[
A v_i' = A \left( \sum_{l=1}^n c_{li} v_l \right)
= \sum_{l=1}^n c_{li} A v_l
\overset{(7.14)}{=} \sum_{l=1}^n c_{li} \sum_{k=1}^m a_{kl} w_k
= \sum_{l=1}^n c_{li} \sum_{k=1}^m a_{kl} \sum_{j=1}^m d_{jk} w_j'
= \sum_{j=1}^m \left( \sum_{l=1}^n \left( \sum_{k=1}^m d_{jk} a_{kl} \right) c_{li} \right) w_j',
\]
i.e. (7.18) holds in consequence of Th. 7.10(b),(a). □

Example 7.15. We explicitly write the transformations and matrices of Th. 7.14 for the situation of Ex. 7.12: There we had $V := \mathbb{Q}^4$ with ordered bases $B_V := (v_1, v_2, v_3, v_4)$, $B_V' := (v_4, v_2, v_3, v_1)$, and $W := \mathbb{Q}^2$ with ordered bases $B_W := (w_1, w_2)$, $B_W' := (w_2, w_1)$. Thus,
\[
v_1' = v_4, \quad v_2' = v_2, \quad v_3' = v_3, \quad v_4' = v_1,
\qquad
(c_{ji}) = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}
\]
and
\[
w_1' = w_2, \quad w_2' = w_1,
\qquad
(f_{ji}) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\]
and
\[
w_1 = w_2', \quad w_2 = w_1',
\qquad
(d_{ji}) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
Moreover, $A \in L(V,W)$ in Ex. 7.12 was such that
\[
(a_{ji}) = I(B_V, B_W)(A) = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -1 & 0 & -2 \end{pmatrix}.
\]
Thus, according to (7.18),
\[
I(B_V', B_W')(A)
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -1 & 0 & -2 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 4 & 2 & 3 & 1 \\ -2 & -1 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} -2 & -1 & 0 & 0 \\ 4 & 2 & 3 & 1 \end{pmatrix},
\]
which is precisely $I'(A)$ of Ex. 7.12.
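The product (7.18) from Ex. 7.15 is again easy to verify numerically (a sketch):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [0, -1, 0, -2]])        # I(B_V, B_W)(A)
c = np.array([[0, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0]])          # transition matrix from B_V to B_V'
d = np.array([[0, 1],
              [1, 0]])                # (d_ji) = (f_ji)^{-1}

e = d @ a @ c                          # I(B_V', B_W')(A) according to (7.18)
print(e)
```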

7.3 Rank and Transpose

Definition 7.16. Let $F$ be a field and $m,n \in \mathbb{N}$.

(a) Let $V$ and $W$ be vector spaces over $F$ and $A \in L(V,W)$. Then $\mathrm{rk}(A) := \dim \mathrm{Im}\, A$ is called the rank of $A$.

(b) Let $(a_{ji})_{(j,i)\in\{1,\dots,m\}\times\{1,\dots,n\}} \in M(m,n,F)$ and consider the column vectors and row vectors of $(a_{ji})$ as elements of the vector spaces $F^m$ and $F^n$, respectively (cf. Rem. 7.4(b)). Then
\[
\mathrm{rk}(a_{ji}) := \dim \left\langle \left\{ \begin{pmatrix} a_{1i} \\ \vdots \\ a_{mi} \end{pmatrix} : i \in \{1,\dots,n\} \right\} \right\rangle \tag{7.20a}
\]
is called the column rank or just the rank of $(a_{ji})$. Analogously,
\[
\mathrm{rrk}(a_{ji}) := \dim \left\langle \left\{ \begin{pmatrix} a_{j1} & \dots & a_{jn} \end{pmatrix} : j \in \{1,\dots,m\} \right\} \right\rangle \tag{7.20b}
\]
is called the row rank of $(a_{ji})$.

A main goal of the present section is to show that, if $V$ and $W$ are finite-dimensional vector spaces, $A \in L(V,W)$, and $(a_{ji})$ is the matrix representing $A$ with respect to chosen ordered bases of $V$ and $W$, respectively, then $\mathrm{rk}(A) = \mathrm{rk}(a_{ji}) = \mathrm{rrk}(a_{ji})$ (cf. Th. 7.23 below).


Theorem 7.17. Let $F$ be a field, $m,n \in \mathbb{N}$.

(a) Let $V, W$ be finite-dimensional vector spaces over $F$, $\dim V = n$, $\dim W = m$, where $B_V := (v_1,\dots,v_n)$ is an ordered basis of $V$ and $B_W := (w_1,\dots,w_m)$ is an ordered basis of $W$. If $A \in L(V,W)$ and $(a_{ji}) \in M(m,n,F)$ is the matrix corresponding to $A$ with respect to $B_V$ and $B_W$, then
\[
\mathrm{rk}(A) = \mathrm{rk}(a_{ji}).
\]

(b) If the matrices $(c_{ji}) \in M(n,n,F)$ and $(d_{ji}) \in M(m,m,F)$ are regular and $(a_{ji}) \in M(m,n,F)$, then
\[
\mathrm{rk}(a_{ji}) = \mathrm{rk}\bigl( (d_{ji})(a_{ji})(c_{ji}) \bigr).
\]

Proof. (a) is basically due to Lem. 7.5(a) and Th. 6.9: Let $B : W \longrightarrow F^m$ be the linear isomorphism with $B(w_j) = e_j$, where $\{e_1,\dots,e_m\}$ is the standard basis of $F^m$ (cf. Ex. 5.16(c)). Then
\[
\mathrm{rk}(A) = \dim \mathrm{Im}\, A = \dim B(\mathrm{Im}\, A) = \dim \mathrm{Im}(BA) = \mathrm{rk}(BA). \tag{7.21}
\]
Since
\[
\forall_{i\in\{1,\dots,n\}} \quad (BA) v_i = B \left( \sum_{j=1}^m a_{ji} w_j \right) = \begin{pmatrix} a_{1i} \\ \vdots \\ a_{mi} \end{pmatrix} = c_i^A,
\]
we have $\mathrm{Im}(BA) = \langle \{ c_1^A, \dots, c_n^A \} \rangle$, which, together with (7.21), proves (a).

(b): Let $B_V := (v_1,\dots,v_n)$ and $B_W := (w_1,\dots,w_m)$ be ordered bases of $V := F^n$ and $W := F^m$, respectively. Moreover, let $A \in L(V,W)$ be defined by (7.14), i.e. by
\[
\forall_{i\in\{1,\dots,n\}} \quad A v_i = \sum_{j=1}^m a_{ji} w_j.
\]
Then $(a_{ji})$ is the matrix representing $A$ with respect to $B_V$ and $B_W$ and $\mathrm{rk}(A) = \mathrm{rk}(a_{ji})$ by (a). Now let $B_V' := (v_1',\dots,v_n')$ and $B_W' := (w_1',\dots,w_m')$ also be ordered bases of $V$ and $W$, respectively, such that (7.17) holds, i.e.
\[
\forall_{i\in\{1,\dots,n\}} \quad v_i' = \sum_{j=1}^n c_{ji} v_j, \qquad
\forall_{i\in\{1,\dots,m\}} \quad w_i = \sum_{j=1}^m d_{ji} w_j'
\]
(such bases exist, since $(c_{ji})$ and $(d_{ji})$ are regular). Then, according to Th. 7.14, the matrix $(e_{ji}) := (d_{ji})(a_{ji})(c_{ji})$ represents $A$ with respect to $B_V'$ and $B_W'$. Thus, (a) yields $\mathrm{rk}(e_{ji}) = \mathrm{rk}(A) = \mathrm{rk}(a_{ji})$, as desired. □
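Th. 7.17(b) can be sanity-checked numerically: multiplying by regular matrices on either side leaves the rank unchanged. A sketch (the concrete matrices are arbitrary choices):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [0, -1, 0, -2]])                 # rank 2
c = np.array([[0, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0]])                   # regular 4x4 (a permutation matrix)
d = np.array([[2, 1],
              [1, 1]])                         # regular 2x2 (det = 1)

rk = np.linalg.matrix_rank
assert rk(a) == rk(d @ a @ c) == 2             # rank is invariant under d, c
```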

Theorem 7.18. Let $V, W$ be finite-dimensional vector spaces over the field $F$, $\dim V = n$, $\dim W = m$, $(m,n) \in \mathbb{N}^2$, $A \in L(V,W)$ with $r := \mathrm{rk}(A)$. Then there exist ordered bases $B_V := (v_1,\dots,v_n)$ and $B_W := (w_1,\dots,w_m)$ of $V$ and $W$, respectively, such that the matrix $(a_{ji}) \in M(m,n,F)$ corresponding to $A$ with respect to $B_V$ and $B_W$ satisfies
\[
a_{ji} = \begin{cases} 1 & \text{for } j = i,\ 1 \le j \le r, \\ 0 & \text{otherwise}, \end{cases}
\qquad \text{i.e.} \quad
(a_{ji}) = \begin{pmatrix} \mathrm{Id}_r & 0 \\ 0 & 0 \end{pmatrix}.
\]

Proof. According to Th. 6.8(a), we know
\[
n = \dim V = \dim \ker A + \dim \mathrm{Im}\, A = \dim \ker A + r.
\]
Thus, we can choose an ordered basis $B_V = (v_1,\dots,v_n)$ of $V$, such that $(v_{r+1},\dots,v_n)$ is a basis of $\ker A$. For each $i \in \{1,\dots,r\}$, let $w_i := A v_i$. Then $(w_1,\dots,w_r)$ is a basis of $\mathrm{Im}\, A$ and there exist $w_{r+1},\dots,w_m \in W$ such that $B_W = (w_1,\dots,w_m)$ constitutes a basis of $W$. Since
\[
\forall_{i\in\{1,\dots,n\}} \quad A v_i = \begin{cases} w_i & \text{for } 1 \le i \le r, \\ 0 & \text{for } i > r, \end{cases}
\]
$(a_{ji})$ has the desired form. □

As a tool for showing $\mathrm{rk}(a_{ji}) = \mathrm{rrk}(a_{ji})$, we will now introduce the transpose of a matrix, which is also of general interest. While, here, we are mostly interested in using the transpose of a matrix representing a linear map, the concept makes sense in the more general situation of Def. 7.1:

Definition 7.19. Let $S$ be a nonempty set, $A := (a_{ji})_{(j,i)\in\{1,\dots,m\}\times\{1,\dots,n\}} \in M(m,n,S)$, and $m,n \in \mathbb{N}$. Then we define the transpose of $A$, denoted $A^t$, by
\[
A^t := (a_{ji}^t)_{(j,i)\in\{1,\dots,n\}\times\{1,\dots,m\}} \in M(n,m,S), \quad \text{where} \quad \forall_{(j,i)\in\{1,\dots,n\}\times\{1,\dots,m\}} \quad a_{ji}^t := a_{ij}. \tag{7.22}
\]
Thus, if $A$ is an $m \times n$ matrix, then its transpose is an $n \times m$ matrix, where one obtains $A^t$ from $A$ by switching rows and columns: $A$ is the map $f : \{1,\dots,m\} \times \{1,\dots,n\} \longrightarrow S$, $f(j,i) = a_{ji}$, and its transpose $A^t$ is the map $f^t : \{1,\dots,n\} \times \{1,\dots,m\} \longrightarrow S$, $f^t(j,i) = a_{ji}^t = f(i,j) = a_{ij}$.

Example 7.20. As examples of forming transposes, consider
\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}^{\!t} = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -1 & 0 & -2 \end{pmatrix}^{\!t}
= \begin{pmatrix} 1 & 0 \\ 2 & -1 \\ 3 & 0 \\ 4 & -2 \end{pmatrix}.
\]

Remark 7.21. Let $S$ be a nonempty set, $m,n \in \mathbb{N}$. It is immediate from Def. 7.19 that the map $A \mapsto A^t$ is bijective from $M(m,n,S)$ onto $M(n,m,S)$, where
\[
\forall_{A\in M(m,n,S)} \quad (A^t)^t = A. \tag{7.23}
\]

Theorem 7.22. Let m,n, l ∈ N.


(a) Let $S$ be a commutative ring. If $A \in M(m,n,S)$ and $B \in M(n,l,S)$, then
\[
(AB)^t = B^t A^t. \tag{7.24}
\]
If $A \in \mathrm{GL}_n(S)$, then $A^t \in \mathrm{GL}_n(S)$ and
\[
(A^t)^{-1} = (A^{-1})^t. \tag{7.25}
\]

(b) Let $F$ be a field. Then the map
\[
I : M(m,n,F) \longrightarrow M(n,m,F), \quad A \mapsto A^t, \tag{7.26}
\]
constitutes a linear isomorphism.

Proof. Exercise. □
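The proof is left as an exercise in the notes; the identities (7.24) and (7.25) themselves are easy to test numerically (a sketch, with arbitrary matrices over $\mathbb{Z}$ resp. $\mathbb{R}$):

```python
import numpy as np

A = np.array([[1, -1, 0],
              [2,  0, -2]])
B = np.array([[-3, 0],
              [ 1, 1],
              [ 0, -1]])

# (7.24): the transpose reverses the order of the factors
assert np.array_equal((A @ B).T, B.T @ A.T)

# (7.25): transposing commutes with inversion (here over R, via floats)
C = np.array([[1., 2.],
              [0., 1.]])               # invertible, det = 1, not symmetric
assert np.allclose(np.linalg.inv(C.T), np.linalg.inv(C).T)
```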

Theorem 7.23. Let $V, W$ be finite-dimensional vector spaces over the field $F$, $\dim V = n$, $\dim W = m$, with ordered bases $B_V := (v_1,\dots,v_n)$ and $B_W := (w_1,\dots,w_m)$, respectively. If $A \in L(V,W)$ and $(a_{ji}) \in M(m,n,F)$ is the matrix corresponding to $A$ with respect to $B_V$ and $B_W$, then
\[
\mathrm{rk}(A) = \mathrm{rk}(a_{ji}) = \mathrm{rrk}(a_{ji}).
\]

Proof. As the first equality was already shown in Th. 7.17(a), it merely remains to show $\mathrm{rk}(a_{ji}) = \mathrm{rrk}(a_{ji})$. Let $r := \mathrm{rk}(A)$. According to Th. 7.18 and Th. 7.14, there exist regular matrices $(x_{ji}) \in \mathrm{GL}_m(F)$, $(y_{ji}) \in \mathrm{GL}_n(F)$ such that
\[
(x_{ji})(a_{ji})(y_{ji}) = \begin{pmatrix} \mathrm{Id}_r & 0 \\ 0 & 0 \end{pmatrix}
\quad \overset{(7.24)}{\Rightarrow} \quad
(y_{ji})^t (a_{ji})^t (x_{ji})^t = \begin{pmatrix} \mathrm{Id}_r & 0 \\ 0 & 0 \end{pmatrix}.
\]
Since $(x_{ji})^t$ and $(y_{ji})^t$ are regular by Th. 7.22(a), we may use Th. 7.17(b) to obtain
\[
\mathrm{rrk}(a_{ji}) = \mathrm{rk}\bigl( (a_{ji})^t \bigr) = \mathrm{rk}\bigl( (y_{ji})^t (a_{ji})^t (x_{ji})^t \bigr) = \mathrm{rk} \begin{pmatrix} \mathrm{Id}_r & 0 \\ 0 & 0 \end{pmatrix} = r,
\]
as desired. □
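Numerically, Th. 7.23 says that a matrix and its transpose have the same rank; a quick check with NumPy (a sketch with a random integer matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(4, 6))   # random 4x6 integer matrix

rk = np.linalg.matrix_rank
assert rk(A) == rk(A.T)                # column rank equals row rank
```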

Remark 7.24. Let $V, W$ be finite-dimensional vector spaces over the field $F$, $\dim V = n$, $\dim W = m$, where $m,n \in \mathbb{N}$. Moreover, let $B_V = (v_1,\dots,v_n)$ and $B_W = (w_1,\dots,w_m)$ be ordered bases of $V$ and $W$, respectively, let $A \in L(V,W)$, and let $(a_{ji}) \in M(m,n,F)$ be the matrix corresponding to $A$ with respect to $B_V$ and $B_W$. Using the transpose matrix $(a_{ji})^t \in M(n,m,F)$, one can now also define a transpose $A^t$ of the map $A$. However, a subtlety arises related to basis transitions and, given $\lambda_1,\dots,\lambda_m \in F$, representing coordinates of $w \in W$ with respect to $B_W$, one can not simply define $A^t$ by applying $(a_{ji})^t$ to the column vector containing $\lambda_1,\dots,\lambda_m$, as the result would, in general, depend on $B_V$ and $B_W$. Instead, one obtains $A^t$ by left-multiplying $(a_{ji})$ with the row vector containing $\lambda_1,\dots,\lambda_m$:
\[
A^t w := \begin{pmatrix} \lambda_1 & \dots & \lambda_m \end{pmatrix} (a_{ji}).
\]
Even though the resulting coordinate values are the same that one obtains from computing $(a_{ji})^t (\lambda_1 \dots \lambda_m)^t$, the difference appears when considering basis transitions. In consequence, algebraically, one obtains the transpose $A^t$ to be a map from the dual $W'$ of $W$ into the dual $V'$ of $V$. Here, we will not go into further details, as we do not want to explain the concept of dual spaces at this time.


7.4 Special Types of Matrices

In the present section, we investigate types of matrices that possess particularly simple structures, often, but not always, due to many entries being 0. For many of the sets that we introduce in the following, there does not seem to be a standard notation, at least not within Linear Algebra. However, as these sets do appear very frequently, it seems inconvenient and cumbersome not to introduce suitable notation.

Definition 7.25. Let $n \in \mathbb{N}$ and let $S$ be a set containing elements denoted 0 and 1 (usually, $S$ will be a ring with unity or even a field, but we want to emphasize that the present definition makes sense without any structure on the set $S$).

(a) We call a matrix $A = (a_{ji}) \in M(n,S)$ diagonal if, and only if, $a_{ji} = 0$ for $j \neq i$, i.e. if, and only if, all nondiagonal entries of $A$ are 0. We define
\[
D_n(S) := \{ A \in M(n,S) : A \text{ is diagonal} \}.
\]
If $(s_1,\dots,s_n) \in S^n$, then we define
\[
\mathrm{diag}(s_1,\dots,s_n) := D = (d_{ji}) \in D_n(S), \qquad
d_{ji} := \begin{cases} s_j & \text{for } j = i, \\ 0 & \text{for } j \neq i. \end{cases}
\]

(b) A matrix $A = (a_{ji}) \in M(n,S)$ is called upper triangular or right triangular (resp. lower triangular or left triangular) if, and only if, $a_{ji} = 0$ for each $i,j \in \{1,\dots,n\}$ such that $j > i$ (resp. $j < i$), i.e. if, and only if, all nonzero entries of $A$ are above/right (resp. below/left) of the diagonal. A triangular matrix $A$ is called strict if, and only if, $a_{ii} = 0$ for each $i \in \{1,\dots,n\}$; it is called unipotent if, and only if, $a_{ii} = 1$ for each $i \in \{1,\dots,n\}$. We define⁴
\[
\mathrm{BU}_n(S) := \{ A \in M(n,S) : A \text{ is upper triangular} \},
\]
\[
\mathrm{BL}_n(S) := \{ A \in M(n,S) : A \text{ is lower triangular} \},
\]
\[
\mathrm{BU}_n^0(S) := \{ A \in M(n,S) : A \text{ is strict upper triangular} \},
\]
\[
\mathrm{BL}_n^0(S) := \{ A \in M(n,S) : A \text{ is strict lower triangular} \},
\]
\[
\mathrm{BU}_n^1(S) := \{ A \in \mathrm{BU}_n(S) : A \text{ is unipotent} \},
\]
\[
\mathrm{BL}_n^1(S) := \{ A \in \mathrm{BL}_n(S) : A \text{ is unipotent} \}.
\]

(c) A matrix $A = (a_{ji}) \in M(n,S)$ is called symmetric if, and only if, $A^t = A$, i.e. if $a_{ji} = a_{ij}$ for each $i,j \in \{1,\dots,n\}$. We define
\[
\mathrm{Sym}_n(S) := \{ A \in M(n,S) : A \text{ is symmetric} \}.
\]

Proposition 7.26. Let $S$ be a ring and $m,n \in \mathbb{N}$. Let $s_1,\dots,s_{\max\{m,n\}} \in S$.

⁴The 'B' in the notation comes from the theory of so-called Lie algebras, where one finds, at least for suitable $S$, that $\mathrm{BU}_n(S)$ and $\mathrm{BL}_n(S)$ form so-called Borel subalgebras of the algebra $M(n,S)$.


(a) Let $D_1 := \mathrm{diag}(s_1,\dots,s_m) \in D_m(S)$, $D_2 := \mathrm{diag}(s_1,\dots,s_n) \in D_n(S)$, $A = (a_{ji}) \in M(m,n,S)$. Then left multiplication of $A$ by $D_1$ multiplies the $j$th row of $A$ by $s_j$; right multiplication of $A$ by $D_2$ multiplies the $i$th column of $A$ by $s_i$:
\[
D_1 A = \mathrm{diag}(s_1,\dots,s_m) \begin{pmatrix} r_1^A \\ \vdots \\ r_m^A \end{pmatrix}
= \begin{pmatrix} s_1\, r_1^A \\ \vdots \\ s_m\, r_m^A \end{pmatrix}, \tag{7.27a}
\]
\[
A D_2 = \begin{pmatrix} c_1^A & \dots & c_n^A \end{pmatrix} \mathrm{diag}(s_1,\dots,s_n)
= \begin{pmatrix} c_1^A s_1 & \dots & c_n^A s_n \end{pmatrix}. \tag{7.27b}
\]

(b) $D_n(S)$ is a subring of $M(n,S)$ (a subring with unity if $S$ is a ring with unity). If $S$ is a field, then $D_n(S)$ is a vector subspace of $M(n,S)$.

(c) If $S$ is a ring with unity, then $D := \mathrm{diag}(s_1,\dots,s_n) \in \mathrm{GL}_n(S)$ if, and only if, each $s_i$ is invertible, i.e. if, and only if, $s_i \in S^*$ for each $i \in \{1,\dots,n\}$. In that case, one has $D^{-1} = \mathrm{diag}(s_1^{-1},\dots,s_n^{-1}) \in \mathrm{GL}_n(S)$. Moreover, $D_n(S^*) = D_n(S) \cap \mathrm{GL}_n(S)$ is a subgroup of $\mathrm{GL}_n(S)$.

Proof. (a): If $(d_{ji}) := D_1$, $(e_{ji}) := D_2$, $(x_{ji}) := D_1 A$ and $(y_{ji}) := A D_2$, then, for each $(j,i) \in \{1,\dots,m\} \times \{1,\dots,n\}$,
\[
x_{ji} = \sum_{k=1}^m d_{jk} a_{ki} = s_j a_{ji}, \qquad
y_{ji} = \sum_{k=1}^n a_{jk} e_{ki} = a_{ji} s_i.
\]

(b): As a consequence of (a), we have, for $a_1,\dots,a_n, b_1,\dots,b_n \in S$,
\[
\mathrm{diag}(a_1,\dots,a_n)\, \mathrm{diag}(b_1,\dots,b_n) = \mathrm{diag}(a_1 b_1,\dots,a_n b_n) \in D_n(S). \tag{7.28}
\]
As $A, B \in D_n(S)$, clearly, implies $A + B \in D_n(S)$ and $-A \in D_n(S)$, we have shown $D_n(S)$ to be a subring of $M(n,S)$, where, if $1 \in S$, then $\mathrm{Id}_n \in D_n(S)$. If $\lambda \in S$, $A \in D_n(S)$, then, clearly, $\lambda A \in D_n(S)$, showing that, for $S$ being a field, $D_n(S)$ is a vector subspace of $M(n,S)$.

(c) follows from (7.28) and the fact that $(S^*, \cdot)$ is a group according to Def. and Rem. 4.40. □
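A small NumPy illustration of (7.27): left multiplication by a diagonal matrix scales the rows, right multiplication scales the columns (a sketch):

```python
import numpy as np

A  = np.array([[1, 2, 3],
               [4, 5, 6]])
D1 = np.diag([10, 100])        # scales the rows of A, as in (7.27a)
D2 = np.diag([1, 0, -1])       # scales the columns of A, as in (7.27b)

print(D1 @ A)   # rows multiplied by 10 and 100
print(A @ D2)   # columns multiplied by 1, 0, -1
```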

Proposition 7.27. Let S be a ring and n ∈ N.

(a) Let $A := (a_{ji}) \in M(n,S)$ and $B := (b_{ji}) \in M(n,S)$. If $A, B \in \mathrm{BU}_n(S)$ or $A, B \in \mathrm{BL}_n(S)$, and $C := (c_{ji}) = AB$, then $c_{ii} = a_{ii} b_{ii}$ for each $i \in \{1,\dots,n\}$.

(b) $\mathrm{BU}_n(S), \mathrm{BL}_n(S), \mathrm{BU}_n^0(S), \mathrm{BL}_n^0(S)$ are subrings of $M(n,S)$ ($\mathrm{BU}_n(S), \mathrm{BL}_n(S)$ are subrings with unity if $S$ is a ring with unity). If $S$ is a field, then $\mathrm{BU}_n(S)$, $\mathrm{BL}_n(S)$, $\mathrm{BU}_n^0(S)$, $\mathrm{BL}_n^0(S)$ are vector subspaces of $M(n,S)$.


(c) Let $S$ be a ring with unity. If $B := (b_{ji}) \in \mathrm{BU}_n(S) \cap \mathrm{GL}_n(S)$ and $A := (a_{ji}) = B^{-1}$, then $A \in \mathrm{BU}_n(S)$ with
\[
a_{ji} = \begin{cases}
0 & \text{for } i < j, \\
b_{ii}^{-1} & \text{for } i = j, \\
-\left( \sum_{k=1}^{i-1} a_{jk} b_{ki} \right) b_{ii}^{-1} & \text{recursively for } i > j.
\end{cases} \tag{7.29a}
\]
If $B := (b_{ji}) \in \mathrm{BL}_n(S) \cap \mathrm{GL}_n(S)$ and $A := (a_{ji}) = B^{-1}$, then $A \in \mathrm{BL}_n(S)$ with
\[
a_{ji} = \begin{cases}
0 & \text{for } j < i, \\
b_{ii}^{-1} & \text{for } j = i, \\
-\left( \sum_{k=i+1}^{j} a_{jk} b_{ki} \right) b_{ii}^{-1} & \text{recursively for } j > i.
\end{cases} \tag{7.29b}
\]

(d) If $S$ is a ring with unity and $A := (a_{ji}) \in \mathrm{BU}_n(S) \cup \mathrm{BL}_n(S)$, then $A \in \mathrm{GL}_n(S)$ if, and only if, each $a_{ii}$ is invertible. Moreover, $\mathrm{BU}_n^1(S), \mathrm{BL}_n^1(S)$ as well as $\mathrm{BU}_n(S) \cap \mathrm{GL}_n(S)$, $\mathrm{BL}_n(S) \cap \mathrm{GL}_n(S)$ are subgroups of $\mathrm{GL}_n(S)$.

Proof. (a): For $A, B \in \mathrm{BU}_n(S)$, we compute
\[
c_{ii} = \sum_{k=1}^{i-1} \overbrace{a_{ik}}^{=0} b_{ki} + a_{ii} b_{ii} + \sum_{k=i+1}^{n} a_{ik} \overbrace{b_{ki}}^{=0} = a_{ii} b_{ii},
\]
while, for $A, B \in \mathrm{BL}_n(S)$, we compute
\[
c_{ii} = \sum_{k=1}^{i-1} a_{ik} \overbrace{b_{ki}}^{=0} + a_{ii} b_{ii} + \sum_{k=i+1}^{n} \overbrace{a_{ik}}^{=0} b_{ki} = a_{ii} b_{ii}.
\]

(b): Let $A, B \in \mathrm{BU}_n(S)$ and $C := (c_{ji}) = AB$. We obtain $C \in \mathrm{BU}_n(S)$, since, if $j > i$, then
\[
c_{ji} = \sum_{k=1}^{i} a_{jk} b_{ki} + \sum_{k=i+1}^{n} a_{jk} b_{ki} = 0
\]
(the first sum equals 0 since $k \le i < j$ and $A$ is upper triangular, the second sum equals 0 since $k > i$ and $B$ is upper triangular). Now let $A, B \in \mathrm{BL}_n(S)$ and $C := (c_{ji}) := AB$. We obtain $C \in \mathrm{BL}_n(S)$, since, if $j < i$, then
\[
c_{ji} = \sum_{k=1}^{i-1} a_{jk} b_{ki} + \sum_{k=i}^{n} a_{jk} b_{ki} = 0
\]
(the first sum equals 0 since $k < i$ and $B$ is lower triangular, the second sum equals 0 since $j < i \le k$ and $A$ is lower triangular). Now let $\mathcal{M} \in \{\mathrm{BU}_n(S), \mathrm{BL}_n(S), \mathrm{BU}_n^0(S), \mathrm{BL}_n^0(S)\}$. Then the above and (a) show $AB \in \mathcal{M}$ for $A, B \in \mathcal{M}$. As $A, B \in \mathcal{M}$, clearly, implies $A + B \in \mathcal{M}$ and $-A \in \mathcal{M}$, we have shown $\mathcal{M}$ to be a subring of $M(n,S)$, where, if $1 \in S$, then $\mathrm{Id}_n \in \mathrm{BU}_n(S) \cap \mathrm{BL}_n(S)$. If $\lambda \in S$, $A \in \mathcal{M}$, then, clearly, $\lambda A \in \mathcal{M}$, showing that, for $S$ being a field, $\mathcal{M}$ is a vector subspace of $M(n,S)$.

(c): If $B \in \mathrm{BU}_n(S) \cap \mathrm{GL}_n(S)$, then each $b_{ii}$ is invertible by (a) and (b). Now let $A := (a_{ji})$ be defined by (7.29a), $C := (c_{ji}) := AB$. We already know $C \in \mathrm{BU}_n(S)$ according to (b). Moreover, $c_{ii} = 1$ follows from (a) and, for $j < i$,
\[
c_{ji} = \sum_{k=j}^{i} a_{jk} b_{ki} = a_{ji} b_{ii} + \sum_{k=j}^{i-1} a_{jk} b_{ki} = 0
\]
by the recursive part of (7.29a), completing the proof of $C = \mathrm{Id}_n$. If $B \in \mathrm{BL}_n(S) \cap \mathrm{GL}_n(S)$, then each $b_{ii}$ is invertible by (a) and (b). Now let $A := (a_{ji})$ be defined by (7.29b), $C := (c_{ji}) := AB$. We already know $C \in \mathrm{BL}_n(S)$ according to (b). Moreover, $c_{ii} = 1$ follows from (a) and, for $j > i$,
\[
c_{ji} = \sum_{k=i}^{j} a_{jk} b_{ki} = a_{ji} b_{ii} + \sum_{k=i+1}^{j} a_{jk} b_{ki} = 0
\]
by the recursive part of (7.29b), completing the proof of $C = \mathrm{Id}_n$.

(d) is now merely a corollary of suitable parts of (a), (b), (c). □

Example 7.28. Let
\[
B := \begin{pmatrix} 1 & 0 & 0 \\ 4 & 2 & 0 \\ 6 & 5 & 3 \end{pmatrix}.
\]
We use (7.29b) to compute $A := (a_{ji}) = B^{-1}$:
\[
a_{11} = 1, \quad a_{22} = \frac{1}{2}, \quad a_{33} = \frac{1}{3},
\]
\[
a_{32} = -a_{33} b_{32} / b_{22} = -\frac{5}{3} \cdot \frac{1}{2} = -\frac{5}{6},
\]
\[
a_{21} = -a_{22} b_{21} / b_{11} = -\frac{4}{2} = -2,
\]
\[
a_{31} = -(a_{32} b_{21} + a_{33} b_{31}) / b_{11} = -\left( \frac{-5 \cdot 4}{6} + \frac{6}{3} \right) = \frac{4}{3}.
\]
Thus,
\[
AB = \begin{pmatrix} 1 & 0 & 0 \\ -2 & \frac{1}{2} & 0 \\ \frac{4}{3} & -\frac{5}{6} & \frac{1}{3} \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 4 & 2 & 0 \\ 6 & 5 & 3 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ -2 + \frac{4}{2} & 1 & 0 \\ \frac{4}{3} - \frac{5 \cdot 4}{6} + \frac{6}{3} & -\frac{5 \cdot 2}{6} + \frac{5}{3} & 1 \end{pmatrix}
= \mathrm{Id}_3.
\]
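The recursion (7.29b) is straightforward to implement; the following is a minimal sketch (the function name `lower_tri_inv` is our own), using exact rational arithmetic and checked against the matrix $B$ of Ex. 7.28:

```python
from fractions import Fraction

def lower_tri_inv(B):
    """Invert a lower triangular matrix via the recursion (7.29b).

    B is a square list of lists with entries convertible to Fraction;
    the diagonal entries must be nonzero (i.e. invertible in the field).
    """
    n = len(B)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = Fraction(1) / Fraction(B[i][i])      # a_ii = b_ii^{-1}
    # a_ji for j > i, computed recursively: for fixed j, i runs downward,
    # so every a_jk with k > i needed in the sum is already known
    for j in range(n):
        for i in range(j - 1, -1, -1):
            s = sum(A[j][k] * Fraction(B[k][i]) for k in range(i + 1, j + 1))
            A[j][i] = -s / Fraction(B[i][i])
    return A

B = [[1, 0, 0],
     [4, 2, 0],
     [6, 5, 3]]
A = lower_tri_inv(B)
# entries 1, -2, 1/2, 4/3, -5/6, 1/3 as computed in Ex. 7.28
print(A)
```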

Proposition 7.29. Let F be a field and n ∈ N.

(a) $\mathrm{Sym}_n(F)$ is a vector subspace of $M(n,F)$.


(b) If $A \in \mathrm{Sym}_n(F) \cap \mathrm{GL}_n(F)$, then $A^{-1} \in \mathrm{Sym}_n(F) \cap \mathrm{GL}_n(F)$.

Proof. (a): If $A, B \in \mathrm{Sym}_n(F)$ and $\lambda \in F$, then $(A+B)^t = A^t + B^t = A + B$ and $(\lambda A)^t = \lambda A^t = \lambda A$, showing $A + B \in \mathrm{Sym}_n(F)$ and $\lambda A \in \mathrm{Sym}_n(F)$, i.e. $\mathrm{Sym}_n(F)$ is a vector subspace of $M(n,F)$.

(b): If $A \in \mathrm{Sym}_n(F) \cap \mathrm{GL}_n(F)$, then $(A^{-1})^t = (A^t)^{-1} = A^{-1}$, proving $A^{-1} \in \mathrm{Sym}_n(F) \cap \mathrm{GL}_n(F)$. □

7.5 Blockwise Matrix Multiplication

It is sometimes useful that one can carry out matrix multiplication in a blockwise fashion, i.e. by partitioning the entries of a matrix into submatrices and then performing a matrix multiplication for new matrices that have the submatrices as their entries, see Th. 7.31 below. To formulate and prove Th. 7.31, we need to define matrices and their multiplication, where the entries are allowed to be indexed by more general index sets:

Definition 7.30. Let $S$ be a ring. Let $J, I$ be finite index sets, $\#J, \#I \in \mathbb{N}$. We then call each family $(a_{ji})_{(j,i)\in J \times I}$ in $S$ a $J \times I$ matrix over $S$, denoting the set of all such matrices by $M(J,I,S)$. If $K$ is another index set with $\#K \in \mathbb{N}$, then, for each $J \times K$ matrix $(a_{ji})$ and each $K \times I$ matrix $(b_{ji})$ over $S$, define the product
\[
(a_{ji})(b_{ji}) := \left( \sum_{k \in K} a_{jk} b_{ki} \right)_{(j,i)\in J \times I}. \tag{7.30}
\]

Theorem 7.31. Let $S$ be a ring. Let $J, I, K$ be finite index sets, $\#J, \#I, \#K \in \mathbb{N}$. Now assume we have disjoint partitions of $J, I, K$ into nonempty sets:
\[
J = \bigcup_{\alpha\in\{1,\dots,M\}} J_\alpha, \qquad
I = \bigcup_{\beta\in\{1,\dots,N\}} I_\beta, \qquad
K = \bigcup_{\gamma\in\{1,\dots,L\}} K_\gamma, \tag{7.31}
\]
where $M, N, L \in \mathbb{N}$,
\[
\forall_{\alpha\in\{1,\dots,M\}} \; m_\alpha := \#J_\alpha \in \mathbb{N}, \qquad
\forall_{\beta\in\{1,\dots,N\}} \; n_\beta := \#I_\beta \in \mathbb{N}, \qquad
\forall_{\gamma\in\{1,\dots,L\}} \; l_\gamma := \#K_\gamma \in \mathbb{N}.
\]
Let $A := (a_{ji})$ be a $J \times K$ matrix over $S$ and let $B := (b_{ji})$ be a $K \times I$ matrix over $S$. Define the following submatrices of $A$ and $B$, respectively:
\[
\forall_{(\alpha,\gamma)\in\{1,\dots,M\}\times\{1,\dots,L\}} \quad A_{\alpha\gamma} := (a_{ji})_{(j,i)\in J_\alpha \times K_\gamma}, \tag{7.32a}
\]
\[
\forall_{(\gamma,\beta)\in\{1,\dots,L\}\times\{1,\dots,N\}} \quad B_{\gamma\beta} := (b_{ji})_{(j,i)\in K_\gamma \times I_\beta}. \tag{7.32b}
\]
Then, for each $(\alpha,\gamma,\beta) \in \{1,\dots,M\}\times\{1,\dots,L\}\times\{1,\dots,N\}$, $A_{\alpha\gamma}$ is a $J_\alpha \times K_\gamma$ matrix and $B_{\gamma\beta}$ is a $K_\gamma \times I_\beta$ matrix. Thus, we can define
\[
(C_{\alpha\beta}) := (A_{\alpha\gamma})(B_{\gamma\beta}) := \left( \sum_{\gamma=1}^{L} A_{\alpha\gamma} B_{\gamma\beta} \right)_{(\alpha,\beta)\in\{1,\dots,M\}\times\{1,\dots,N\}}. \tag{7.32c}
\]
We claim that
\[
C := (c_{ji}) := AB = (C_{\alpha\beta}) \tag{7.33}
\]
in the sense that
\[
\forall_{(\alpha,\beta)\in\{1,\dots,M\}\times\{1,\dots,N\}} \quad \forall_{(j,i)\in J_\alpha \times I_\beta} \quad c_{ji} = (C_{\alpha\beta})_{ji}. \tag{7.34}
\]

Proof. Let $(\alpha,\beta) \in \{1,\dots,M\}\times\{1,\dots,N\}$ and $(j,i) \in J_\alpha \times I_\beta$. Then, according to (7.32) and since $K$ is the disjoint union of the $K_\gamma$, we have
\[
(C_{\alpha\beta})_{ji} = \sum_{\gamma=1}^{L} \sum_{k\in K_\gamma} a_{jk} b_{ki} = \sum_{k\in K} a_{jk} b_{ki} = c_{ji},
\]
proving (7.34) and the theorem. □

Example 7.32. Let S be a ring with unity and A, B, C, D ∈ GL₂(S). Then we can use Th. 7.31 to perform the following multiplication of a 4 × 6 matrix by a 6 × 4 matrix:
\[
\begin{pmatrix} A & B & 0 \\ 0 & C & D \end{pmatrix}
\begin{pmatrix} A^{-1} & 0 \\ B^{-1} & C^{-1} \\ 0 & D^{-1} \end{pmatrix}
= \begin{pmatrix} 2\,\mathrm{Id}_2 & BC^{-1} \\ CB^{-1} & 2\,\mathrm{Id}_2 \end{pmatrix}.
\]
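As a sanity check of the blockwise product of Th. 7.31 (an illustration of ours, not part of the notes), the following Python sketch multiplies two integer matrices blockwise over a partition of the inner index set K and compares the result with the ordinary matrix product; the helper names `matmul`, `block`, `madd` are hypothetical.

```python
# Blockwise multiplication (Th. 7.31): assembling the products of
# submatrices agrees with the ordinary matrix product.

def matmul(A, B):
    """Ordinary product of two list-of-lists matrices."""
    m, n, p = len(A), len(B[0]), len(B)
    return [[sum(A[j][k] * B[k][i] for k in range(p)) for i in range(n)]
            for j in range(m)]

def block(A, rows, cols):
    """Submatrix of A with the given row and column index sets."""
    return [[A[j][i] for i in cols] for j in rows]

def madd(A, B):
    """Entrywise sum of two matrices of equal size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]          # a 2 x 4 matrix
B = [[1, 0],
     [2, 1],
     [0, 3],
     [1, 1]]                 # a 4 x 2 matrix

# Partition the inner index set K = {0,1,2,3} into K1 and K2 (0-based):
K1, K2 = [0, 1], [2, 3]
C_blockwise = madd(matmul(block(A, [0, 1], K1), block(B, K1, [0, 1])),
                   matmul(block(A, [0, 1], K2), block(B, K2, [0, 1])))
assert C_blockwise == matmul(A, B)
```

The check succeeds for any choice of the partition of K, which is exactly the content of (7.34).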

8 Linear Systems

8.1 General Setting

Let F be a field. A linear equation over F has the form
\[
\sum_{k=1}^{n} a_k\, x_k = b,
\]
where n ∈ ℕ; b ∈ F and the a₁, ..., aₙ ∈ F are given; and one is interested in determining the “unknowns” x₁, ..., xₙ ∈ F such that the equation holds. The equation is called linear, as b is desired to be a linear combination of the a₁, ..., aₙ. A linear system is now a finite set of linear equations that the x₁, ..., xₙ ∈ F need to satisfy simultaneously:
\[
\forall_{j\in\{1,\dots,m\}}\quad \sum_{k=1}^{n} a_{jk}\, x_k = b_j, \tag{8.1}
\]
where m, n ∈ ℕ; b₁, ..., bₘ ∈ F and the $a_{ji}$ ∈ F, j ∈ {1, ..., m}, i ∈ {1, ..., n}, are given. One observes that (8.1) can be concisely written as Ax = b, using matrix multiplication, giving rise to the following definition:


Definition 8.1. Let F be a field. Given a matrix A ∈ M(m, n, F), m, n ∈ ℕ, and b ∈ M(m, 1, F) ≅ Fᵐ, the equation
\[
Ax = b \tag{8.2}
\]
is called a linear system for the unknown x ∈ M(n, 1, F) ≅ Fⁿ. The matrix one obtains by adding b as the (n + 1)th column to A is called the augmented matrix of the linear system. It is denoted by (A|b). The linear system (8.2) is called homogeneous for b = 0 and inhomogeneous for b ≠ 0. By
\[
L(A|b) := \{x \in F^n : Ax = b\}, \tag{8.3}
\]
we denote the set of solutions to (8.2).

Example 8.2. While linear systems arise from many applications, we merely provide a few examples, illustrating the importance of linear systems for problems from inside the subject of linear algebra.

(a) Let v₁ := (1, 2, 0, 3), v₂ := (0, 3, 2, 1), v₃ := (1, 1, 1, 1). The question if the set M := {v₁, v₂, v₃} is a linearly dependent subset of ℝ⁴ is equivalent to asking if there exist x₁, x₂, x₃ ∈ ℝ, not all 0, such that x₁v₁ + x₂v₂ + x₃v₃ = 0, which is equivalent to the question if the linear system
\[
\begin{aligned}
x_1 + x_3 &= 0, \\
2x_1 + 3x_2 + x_3 &= 0, \\
2x_2 + x_3 &= 0, \\
3x_1 + x_2 + x_3 &= 0
\end{aligned}
\]
has a solution x = (x₁, x₂, x₃) ≠ (0, 0, 0). The linear system can be written as
\[
\begin{pmatrix} 1 & 0 & 1 \\ 2 & 3 & 1 \\ 0 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
\]
The augmented matrix of the linear system is
\[
(A|b) := \left( \begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 2 & 3 & 1 & 0 \\ 0 & 2 & 1 & 0 \\ 3 & 1 & 1 & 0 \end{array} \right).
\]

(b) Let b₁ := (1, 2, 2, 1). The question if b₁ ∈ ⟨{v₁, v₂, v₃}⟩ with v₁, v₂, v₃ as in (a) is equivalent to the question whether the linear system
\[
Ax = \begin{pmatrix} 1 \\ 2 \\ 2 \\ 1 \end{pmatrix}, \qquad
A := \begin{pmatrix} 1 & 0 & 1 \\ 2 & 3 & 1 \\ 0 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix},
\]
has a solution.


(c) Let n ∈ ℕ. The problem of finding an inverse to an n × n matrix A ∈ M(n, n, F) is equivalent to solving the n linear systems
\[
Av_1 = e_1, \quad \dots, \quad Av_n = e_n, \tag{8.4}
\]
where e₁, ..., eₙ are the standard (column) basis vectors. Then the $v_k$ are obviously the column vectors of A⁻¹.
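The principle in (c) can be checked in miniature with a Python sketch (ours, using a hand-picked 2 × 2 matrix whose inverse was computed by hand, and exact `fractions.Fraction` arithmetic): the kth column of A⁻¹ solves $Av_k = e_k$.

```python
# Ex. 8.2(c) in miniature: the columns of the inverse solve A v_k = e_k.
from fractions import Fraction as F

A = [[F(2), F(1)],
     [F(1), F(1)]]
Ainv = [[F(1), F(-1)],
        [F(-1), F(2)]]      # inverse of A, found by hand

def matvec(M, v):
    """Matrix-vector product over the rationals."""
    return [sum(M[j][i] * v[i] for i in range(len(v))) for j in range(len(M))]

for k in range(2):
    e_k = [F(int(j == k)) for j in range(2)]   # k-th standard basis vector
    v_k = [Ainv[j][k] for j in range(2)]       # k-th column of A^{-1}
    assert matvec(A, v_k) == e_k
```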

8.2 Abstract Solution Theory

Before describing in Sec. 8.3 below how one can systematically determine the set of solutions to a linear system, we apply some of our general results from the theory of vector spaces, linear maps, and matrices to obtain criteria for the existence and uniqueness of solutions to linear systems.

Remark 8.3. Let F be a field and m, n ∈ ℕ. If A ∈ M(m, n, F), then A can be interpreted as the linear map
\[
L_A : F^n \longrightarrow F^m, \qquad L_A(x) := Ax, \tag{8.5}
\]
where x ∈ Fⁿ is identified with a column vector in M(n, 1, F) and the column vector Ax ∈ M(m, 1, F) is identified with an element of Fᵐ. In view of this fact, given an n-dimensional vector space V over F, an m-dimensional vector space W over F, a linear map L ∈ L(V, W), and b ∈ W, we call
\[
Lx = b \tag{8.6}
\]
a linear system or merely a linear equation for the unknown vector x ∈ V and we write L(L|b) for its set of solutions. If V = Fⁿ, W = Fᵐ, and the matrix A ∈ M(m, n, F) corresponds to L with respect to the respective standard bases of Fⁿ and Fᵐ (cf. Ex. 5.16(c)), then (8.6) is identical to (8.2) and
\[
L(A|b) = L(L|b) = L^{-1}\{b\}, \tag{8.7}
\]
i.e. the set of solutions is precisely the preimage of the set {b} under L. In consequence, we then have
\[
L(A|0) = L(L|0) = \ker L, \tag{8.8}
\]
\[
\dim \ker L \overset{\text{Th. 6.8(a)}}{=} \dim F^n - \dim \operatorname{Im} L = n - \operatorname{rk}(A), \tag{8.9}
\]
and, more generally,
\[
b \notin \operatorname{Im} L \quad\Rightarrow\quad L(A|b) = L(L|b) = \emptyset, \tag{8.10a}
\]
\[
b \in \operatorname{Im} L \quad\overset{\text{Th. 4.20(d)}}{\Rightarrow}\quad \forall_{x_0\in L(L|b)}\quad L(A|b) = L(L|b) = x_0 + \ker L = x_0 + L(L|0). \tag{8.10b}
\]
One can also express (8.10b) by saying one obtains all solutions to the inhomogeneous system Lx = b by adding a particular solution of the inhomogeneous system to the set of all solutions to the homogeneous system Lx = 0.
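The affine structure of (8.10b) can be seen in a tiny Python sketch (ours): for the map $L_A(x_1, x_2) = x_1 + x_2$ and b = 2, a particular solution plus any kernel element again solves the system.

```python
# The affine solution structure (8.10b) in a minimal example:
# every solution of Ax = b is x0 plus an element of ker L_A.
A = [[1, 1]]            # L_A : R^2 -> R^1, (x1, x2) |-> x1 + x2
b = [2]
x0 = (2, 0)             # a particular solution
kernel_basis = (1, -1)  # ker L_A = span{(1, -1)}

for t in [-2, 0, 1, 5]:
    x = (x0[0] + t * kernel_basis[0], x0[1] + t * kernel_basis[1])
    assert x[0] + x[1] == b[0]   # x solves the system for every t
```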


Theorem 8.4 (Existence of Solutions). Let F be a field and m, n ∈ ℕ, let L ∈ L(Fⁿ, Fᵐ), and let A ∈ M(m, n, F) correspond to L with respect to the respective standard bases of Fⁿ and Fᵐ.

(a) Given b ∈ Fᵐ, the following statements are equivalent:

(i) L(A|b) = L(L|b) ≠ ∅.

(ii) b ∈ Im L.

(iii) b ∈ ⟨{$c^A_1$, ..., $c^A_n$}⟩, where the $c^A_i$ denote the column vectors of A.

(iv) rk(A) = rk(A|b).

(b) The following statements are equivalent:

(i) L(A|b) = L(L|b) ≠ ∅ holds for each b ∈ Fᵐ.

(ii) L is surjective.

(iii) rk(A) = m.

Proof. (a): (i) implies (ii) by (8.7). Since A corresponds to L with respect to the respective standard bases of Fⁿ and Fᵐ, (ii) implies (iii) by Lem. 7.5(a). (iii) implies (iv), since
\[
\operatorname{rk}(A|b) = \dim\langle\{c^A_1, \dots, c^A_n, b\}\rangle \overset{\text{(iii)}}{=} \dim\langle\{c^A_1, \dots, c^A_n\}\rangle = \operatorname{rk}(A).
\]
(iv) implies (i): If rk(A) = rk(A|b), then b is linearly dependent on $c^A_1$, ..., $c^A_n$, which, by Lem. 7.5(a), yields the existence of x ∈ Fⁿ with $Ax = \sum_{k=1}^{n} x_k\, c^A_k = b$.

(b): The equivalence between (i) and (ii) is merely the definition of L being surjective. Moreover, the equivalences
\[
\operatorname{rk}(A) = m \;\Leftrightarrow\; \dim \operatorname{Im} L = m \;\Leftrightarrow\; \operatorname{Im} L = F^m \;\Leftrightarrow\; L \text{ surjective}
\]
prove the equivalence between (ii) and (iii). □

Theorem 8.5 (Uniqueness of Solutions). Consider the situation of Th. 8.4. Given b ∈ Im L, the following statements are equivalent:

(i) #L(A|b) = #L(L|b) = 1, i.e. Lx = b has a unique solution.

(ii) L(A|0) = L(L|0) = {0}, i.e. the homogeneous system Lx = 0 has only the so-called trivial solution x = 0.

(iii) rk(A) = n.

Proof. Since b ∈ Im L, there exists x₀ ∈ Fⁿ with L(x₀) = b, i.e. x₀ ∈ L(L|b).

“(i)⇔(ii)”: Since L(L|b) = x₀ + ker L by (8.10b), (i) is equivalent to L(A|0) = ker L = {0}.

“(ii)⇔(iii)”: According to Th. 6.8(a), we know
\[
\operatorname{rk}(A) = \dim \operatorname{Im} L = \dim F^n - \dim \ker L = n - \dim \ker L.
\]
Thus, L(A|0) = ker L = {0} is equivalent to rk(A) = n. □

Corollary 8.6. Consider the situation of Th. 8.4 with m = n. Then the following statements are equivalent:

(i) There exists b ∈ Fⁿ such that Lx = b has a unique solution.

(ii) L(A|b) = L(L|b) ≠ ∅ holds for each b ∈ Fⁿ.

(iii) The homogeneous system Lx = 0 has only the so-called trivial solution x = 0.

(iv) rk(A) = n.

(v) A and L are invertible.

Proof. “(iii)⇔(iv)”: The equivalence is obtained by using b := 0 in Th. 8.5.

“(iii)⇔(v)”: Since m = n, L is bijective if, and only if, it is injective by Th. 6.10.

“(ii)⇔(v)”: Since m = n, L is bijective if, and only if, it is surjective by Th. 6.10.

“(i)⇔(iii)” is another consequence of Th. 8.5. □

8.3 Finding Solutions

While the results of Sec. 8.2 provide some valuable information regarding the solutions of linear systems, in general, they do not help much to solve concrete systems, especially if the systems consist of many equations with many variables. To remedy this situation, in the present section, we will investigate methods to systematically solve linear systems.

8.3.1 Echelon Form, Back Substitution

We recall from (8.1) and Def. 8.1 that, given a field F and m, n ∈ ℕ, we are considering linear systems
\[
\forall_{j\in\{1,\dots,m\}}\quad \sum_{k=1}^{n} a_{jk}\, x_k = b_j,
\]
which we can write in the form
\[
Ax = b
\]
with $A = (a_{ji})$ ∈ M(m, n, F) and b ∈ M(m, 1, F) ≅ Fᵐ, where x and b are interpreted as column vectors, and (A|b) ∈ M(m, n+1, F), where b is added as a last column to A, is called the augmented matrix of the linear system. The goal is to determine the set of solutions L(A|b) = {x ∈ Fⁿ : Ax = b}. We first investigate a situation where determining L(A|b) is particularly easy, namely where (A|b) has so-called echelon form:


Definition 8.7. Let S be a set with 0 ∈ S and $A = (a_{ji})$ ∈ M(m, n, S), m, n ∈ ℕ. For each row, i.e. for each j ∈ {1, ..., m}, let ν(j) ∈ {1, ..., n} be the smallest index k such that $a_{jk}$ ≠ 0 and ν(j) := n + 1 if the jth row consists entirely of zeros. Then A is said to be in (row) echelon form if, and only if, for each j ∈ {2, ..., m}, one has ν(j) > ν(j − 1) or ν(j) = n + 1. Thus, A is in echelon form if, and only if, it looks as follows:
\[
A = \begin{pmatrix}
0 & \cdots & 0 & \boxed{\ast} & \ast & \cdots & \cdots & \cdots & \cdots & \ast \\
0 & \cdots & 0 & 0 & \cdots & 0 & \boxed{\ast} & \ast & \cdots & \ast \\
0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \boxed{\ast} & \cdots & \ast \\
 & & & & & & & & \ddots &
\end{pmatrix} \tag{8.11}
\]
The first nonzero elements in each row are called pivot elements (in (8.11), the positions of the pivot elements are marked by squares). The columns $c^A_k$ containing a pivot element are called pivot columns and k ∈ {1, ..., n} is then called a pivot index; an index k ∈ {1, ..., n} that is not a pivot index is called a free index. Let $I^A_p, I^A_f \subseteq \{1, \dots, n\}$ denote the sets of pivot indices and free indices, respectively. If A represents a linear system, then the variables corresponding to pivot columns are called pivot variables. All remaining variables are called free variables.

Example 8.8. The following matrix over ℝ is in echelon form:
\[
A := \begin{pmatrix}
0 & 0 & 3 & 3 & 0 & -1 & 0 & 3 \\
0 & 0 & 0 & 4 & 0 & 2 & -3 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}.
\]
It remains in echelon form if one adds zero rows at the bottom. For the linear system Ax = b, the variables x₃, x₄, x₇ are pivot variables, whereas x₁, x₂, x₅, x₆, x₈ are free variables.

Theorem 8.9. Let F be a field and m, n ∈ ℕ. Let A ∈ M(m, n, F), b ∈ M(m, 1, F) ≅ Fᵐ, and consider the linear system Ax = b. Assume the augmented matrix (A|b) to be in echelon form.

(a) Then the following statements are equivalent:

(i) Ax = b has at least one solution, i.e. L(A|b) ≠ ∅.

(ii) rk(A) = rk(A|b).

(iii) The final column b in the augmented matrix (A|b) contains no pivot elements (i.e. if there is a zero row in A, then the corresponding entry of b also vanishes).

(b) Let $L_A : F^n \longrightarrow F^m$ be the linear map associated with A according to Rem. 8.3. Noting A to be in echelon form as well and using the notation from Def. 8.7, one has
\[
\dim \ker L_A = \dim L(A|0) = \# I^A_f, \qquad \dim \operatorname{Im} L_A = \operatorname{rk} A = \# I^A_p,
\]
i.e. the dimension of the kernel of A is given by the number of free variables, whereas the dimension of the image of A is given by the number of pivot variables.


Proof. (a): The equivalence between (i) and (ii) was already shown in Th. 8.4.

“(ii)⇔(iii)”: As both A and (A|b) are in echelon form, their respective ranks are, clearly, given by their respective number of nonzero rows. However, A and (A|b) have the same number of nonzero rows if, and only if, b contains no pivot elements.

(b): As A is in echelon form, its number of nonzero rows (i.e. its rank) is precisely the number of pivot indices, proving $\dim \operatorname{Im} L_A = \operatorname{rk} A = \# I^A_p$. Since $\# I^A_p + \# I^A_f = n = \dim \ker L_A + \dim \operatorname{Im} L_A$, the claimed $\dim \ker L_A = \# I^A_f$ now also follows. □
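The pivot/free bookkeeping of Def. 8.7 and the dimension count of Th. 8.9(b) can be sketched in Python for the matrix of Ex. 8.8 (a sketch of ours; column indices are 0-based here, so pivot columns 2, 3, 6 correspond to the pivot variables x₃, x₄, x₇):

```python
# Pivot and free indices of an echelon-form matrix (Def. 8.7); by
# Th. 8.9(b), rk A = #pivot indices and dim ker L_A = #free indices.
A = [[0, 0, 3, 3, 0, -1, 0, 3],
     [0, 0, 0, 4, 0, 2, -3, 2],
     [0, 0, 0, 0, 0, 0, 1, 1]]

pivot_cols = []
for row in A:
    for k, entry in enumerate(row):
        if entry != 0:
            pivot_cols.append(k)   # column of the first nonzero entry
            break

free_cols = [k for k in range(len(A[0])) if k not in pivot_cols]
assert pivot_cols == [2, 3, 6]     # rk A = 3
assert len(free_cols) == 5         # dim ker L_A = 5
```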

If the augmented matrix (A|b) of the linear system Ax = b is in echelon form and such that its final column b contains no pivot element, then the linear system can be solved using so-called back substitution: One obtains a parameterized representation of L(A|b) as follows: Starting at the bottom with the first nonzero row, one solves each row for the corresponding pivot variable, in each step substituting the expressions for pivot variables that were obtained in previous steps. The free variables are treated as parameters. A particular element of L(A|b) can be obtained by setting all free variables to 0. We now provide a precise formulation of this algorithm:

Algorithm 8.10 (Back Substitution). Let F be a field and m, n ∈ ℕ. Let $0 \neq A = (a_{ji})$ ∈ M(m, n, F) and b ∈ M(m, 1, F) ≅ Fᵐ and consider the linear system Ax = b. Assume the augmented matrix (A|b) to be in echelon form and assume its final column b to contain no pivot element. As before, let $I^A_p, I^A_f \subseteq \{1, \dots, n\}$ denote the sets of pivot indices and free indices, respectively. Also note $I^A_p \neq \emptyset$, whereas $I^A_f$ might be empty. Let $i_1 < \dots < i_N$ be an enumeration of the elements of $I^A_p$, where N ∈ {1, ..., n}. As A has echelon form, for each k ∈ {1, ..., N}, $a_{k\,i_k}$ is the pivot element occurring in pivot column $c^A_{i_k}$ (i.e. the pivot element is in the k-th row and this also implies N ≤ m).

The algorithm of back substitution defines a family $(\alpha_{kl})_{(k,l)\in J\times I}$ in F, $J := I^A_p$, $I := \{0\} \cup I^A_f$, such that
\[
\forall_{k\in\{1,\dots,N\},\ 0<l<i_k}\quad \alpha_{i_k l} = 0, \tag{8.12a}
\]
\[
\forall_{k\in\{1,\dots,N\}}\quad \Big( x_{i_k} = \alpha_{i_k 0} + \sum_{l\in I:\ l>i_k} \alpha_{i_k l}\, x_l \ \wedge\ \alpha_{i_k 0} = a_{k\,i_k}^{-1}\, b_k \Big), \tag{8.12b}
\]
where, for $l > i_k$, the $\alpha_{i_k l}$ are defined recursively over k = N, ..., 1 as follows: Assuming the $\alpha_{i_j l}$ have already been defined for each j > k such that (8.12b) holds, solve the k-th equation of Ax = b for $x_{i_k}$:
\[
\sum_{i=i_k}^{n} a_{k i}\, x_i = b_k \quad\Rightarrow\quad x_{i_k} = a_{k\,i_k}^{-1}\, b_k - a_{k\,i_k}^{-1} \sum_{i=i_k+1}^{n} a_{k i}\, x_i. \tag{8.13}
\]
All variables $x_i$ in the last equation have indices $i > i_k$ and, inductively, we use (8.12b) for $i_j > i_k$ to replace all pivot variables in this equation for $x_{i_k}$. The resulting expression for $x_{i_k}$ has the form (8.12b) with $\alpha_{i_k l} \in F$, as desired.


From $(\alpha_{kl})$, we now define column vectors
\[
s := \begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix}, \qquad
\forall_{i\in I^A_f}\quad v_i := \begin{pmatrix} v_{1i} \\ \vdots \\ v_{ni} \end{pmatrix},
\]
where
\[
\forall_{j\in\{1,\dots,n\}}\quad s_j := \begin{cases} \alpha_{i_k 0} & \text{for } j = i_k \in I^A_p, \\ 0 & \text{for } j \in I^A_f, \end{cases}
\qquad
\forall_{(j,i)\in\{1,\dots,n\}\times I^A_f}\quad v_{ji} := \begin{cases} \alpha_{i_k i} & \text{for } j = i_k \in I^A_p, \\ 1 & \text{for } j = i, \\ 0 & \text{for } j \in I^A_f \setminus \{i\}. \end{cases}
\]

Theorem 8.11. Consider the situation of Alg. 8.10 and let $L_A : F^n \longrightarrow F^m$ be the linear map associated with A according to Rem. 8.3.

(a) Let $x = (x_1, \dots, x_n)^t \in F^n$ be a column vector. Then x ∈ L(A|b) if, and only if, the $x_{i_k}$, k ∈ {1, ..., N}, satisfy the recursion over k = N, ..., 1, given by (8.13) or, equivalently, by (8.12b). In particular, x ∈ ker $L_A$ = L(A|0) if, and only if, the $x_{i_k}$ satisfy the respective recursion with b₁ = ... = b_N = 0.

(b) The set
\[
B := \{ v_i : i \in I^A_f \} \tag{8.14}
\]
forms a basis of ker $L_A$ = L(A|0) and
\[
L(A|b) = s + \ker L_A. \tag{8.15}
\]

Proof. (a) holds, as the implication in (8.13) is, clearly, an equivalence.

(b): As we already know $\dim \ker L_A = \# I^A_f$ from Th. 8.9(b), to verify B is a basis of ker $L_A$, it suffices to show B is linearly independent and B ⊆ ker $L_A$. If $I^A_f = \emptyset$, then there is nothing to prove. Otherwise, if $(\lambda_i)_{i\in I^A_f}$ is a family in F such that
\[
0 = \sum_{i\in I^A_f} \lambda_i\, v_i,
\]
then
\[
\forall_{i\in I^A_f}\quad 0 = \sum_{j\in I^A_f} \lambda_j\, v_{ji} = \sum_{j\in I^A_f} \lambda_j\, \delta_{ji} = \lambda_i,
\]
showing the linear independence of B. To show B ⊆ ker $L_A$, fix $i \in I^A_f$. To prove $v_i \in \ker L_A$, according to (a), we need to show
\[
\forall_{k\in\{1,\dots,N\}}\quad v_{i_k i} = \sum_{l\in I^A_f:\ l>i_k} \alpha_{i_k l}\, v_{li}.
\]
Indeed, we have
\[
v_{i_k i} = \alpha_{i_k i} = \sum_{l\in I^A_f:\ l>i_k} \alpha_{i_k l}\, v_{li} = \sum_{l\in I^A_f:\ l>i_k} \alpha_{i_k l}\, \delta_{li},
\]
which holds, since, if $I_k := \{l \in I^A_f : l > i_k\} = \emptyset$, then the sum is empty and, thus, 0 with $v_{i_k i} = 0$ as well, due to $i < i_k$, and, if $I_k \neq \emptyset$, then, due to the Kronecker δ, the sum evaluates to $\alpha_{i_k i}$ as well. To prove (8.15), according to (8.10b), it suffices to show s ∈ L(A|b) and, by (a), this is equivalent to
\[
\forall_{k\in\{1,\dots,N\}}\quad s_{i_k} = \alpha_{i_k 0} + \sum_{l\in I^A_f:\ l>i_k} \alpha_{i_k l}\, s_l.
\]
However, the validity of these equations is immediate from the definition of s, which, in particular, implies the sums to vanish ($s_l = 0$ for each $l \in I^A_f$). □

Example 8.12. Consider the linear system Ax = b, where
\[
A := \begin{pmatrix}
1 & 2 & -1 & 3 & 0 & 1 \\
0 & 0 & 1 & 1 & -1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix} \in M(4, 6, \mathbb{R}), \qquad
b := \begin{pmatrix} 1 \\ 2 \\ 3 \\ 0 \end{pmatrix} \in \mathbb{R}^4.
\]
Since (A|b) is in echelon form and b does not have any pivot elements, the set of solutions L(A|b) is nonempty, and it can be obtained using back substitution according to Alg. 8.10: Using the same notation as in Alg. 8.10, we have
\[
m = 4, \qquad n = 6, \qquad I^A_p = \{1, 3, 5\}, \qquad N = 3, \qquad I^A_f = \{2, 4, 6\}.
\]
In particular, we have $I^A_p = \{i_1, i_2, i_3\}$ with $i_1 = 1$, $i_2 = 3$, $i_3 = 5$ and the recursion for the $x_{i_k}$, $\alpha_{i_k l}$ is
\[
\begin{aligned}
x_5 &= 3 - x_6, \\
x_3 &= 2 - x_6 + x_5 - x_4 = 5 - 2x_6 - x_4, \\
x_1 &= 1 - x_6 - 3x_4 + x_3 - 2x_2 = 6 - 3x_6 - 4x_4 - 2x_2.
\end{aligned}
\]
Thus, the $\alpha_{i_k l}$ are given by
\[
\begin{aligned}
\alpha_{50} &= 3, & \alpha_{52} &= 0, & \alpha_{54} &= 0, & \alpha_{56} &= -1, \\
\alpha_{30} &= 5, & \alpha_{32} &= 0, & \alpha_{34} &= -1, & \alpha_{36} &= -2, \\
\alpha_{10} &= 6, & \alpha_{12} &= -2, & \alpha_{14} &= -4, & \alpha_{16} &= -3,
\end{aligned}
\]
and, from Th. 8.11(b), we obtain
\[
L(A|b) = s + \ker L_A = \left\{
\begin{pmatrix} 6 \\ 0 \\ 5 \\ 0 \\ 3 \\ 0 \end{pmatrix}
+ x_2 \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
+ x_4 \begin{pmatrix} -4 \\ 0 \\ -1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ x_6 \begin{pmatrix} -3 \\ 0 \\ -2 \\ 0 \\ -1 \\ 1 \end{pmatrix}
: x_2, x_4, x_6 \in \mathbb{R} \right\}.
\]
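The back-substitution recursion (8.13), together with the construction of the particular solution s and the kernel basis vectors $v_i$, can be sketched in Python and run on the data of Ex. 8.12. Exact `fractions.Fraction` arithmetic stands in for the field F; all names are ours, and the sketch assumes the input is already in echelon form with no pivot in b.

```python
from fractions import Fraction as F

def back_substitution(A, b):
    """Sketch of Alg. 8.10: A in echelon form, b without pivot entries.
    Returns a particular solution s and a kernel basis (cf. Th. 8.11)."""
    m, n = len(A), len(A[0])
    pivots = []                          # (row, pivot column), 0-based
    for j, row in enumerate(A):
        for k in range(n):
            if row[k] != 0:
                pivots.append((j, k))
                break
    pivot_cols = {k for _, k in pivots}
    free_cols = [k for k in range(n) if k not in pivot_cols]

    def solve(rhs, free_values):
        """Back substitution (8.13) with the free variables fixed."""
        x = [F(0)] * n
        for k, i in enumerate(free_cols):
            x[i] = free_values[k]
        for j, p in reversed(pivots):    # bottom-up over the pivot rows
            x[p] = (rhs[j] - sum(A[j][i] * x[i]
                                 for i in range(p + 1, n))) / A[j][p]
        return x

    s = solve(b, [F(0)] * len(free_cols))   # all free variables set to 0
    basis = []
    for k in range(len(free_cols)):         # one basis vector per free index
        e = [F(0)] * len(free_cols)
        e[k] = F(1)
        basis.append(solve([F(0)] * m, e))
    return s, basis

# The data of Ex. 8.12:
A = [[F(x) for x in row] for row in
     [[1, 2, -1, 3, 0, 1],
      [0, 0, 1, 1, -1, 1],
      [0, 0, 0, 0, 1, 1],
      [0, 0, 0, 0, 0, 0]]]
b = [F(1), F(2), F(3), F(0)]
s, basis = back_substitution(A, b)
assert s == [6, 0, 5, 0, 3, 0]
assert basis == [[-2, 1, 0, 0, 0, 0],    # free variable x2
                 [-4, 0, -1, 1, 0, 0],   # free variable x4
                 [-3, 0, -2, 0, -1, 1]]  # free variable x6
```

The computed s and basis vectors coincide with the hand computation in Ex. 8.12.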


8.3.2 Elementary Row Operations, Variable Substitution

As seen in the previous section, if the augmented matrix (A|b) of the linear system Ax = b is in echelon form, then the set of solutions can be obtained via back substitution. Thus, it is desirable to transform the augmented matrix of a linear system into echelon form without changing the set of solutions. This can be accomplished in finitely many steps by the so-called Gaussian elimination algorithm of Alg. 8.17 below (cf. Th. 8.19), using elementary row operations and variable substitutions, where variable substitutions actually turn out to constitute particular elementary row operations.

Definition 8.13. Let F be a field, m, n ∈ ℕ, and A ∈ M(m, n, F). Let r₁, ..., rₘ denote the rows of A. The following three operations, which transform A into another m × n matrix $A_{\mathrm{er}}$ over F, are known as elementary row operations:

(a) Row Switching: Switching two rows $r_i$ and $r_j$, where i, j ∈ {1, ..., m}.

(b) Row Multiplication: Replacing a row $r_i$ by some nonzero multiple $\lambda r_i$ of that row, i ∈ {1, ..., m}, λ ∈ F \ {0}.

(c) Row Addition: Replacing a row $r_i$ by the sum $r_i + \lambda r_j$ of that row and a multiple of another row, (i, j) ∈ {1, ..., m}², i ≠ j, λ ∈ F.

Definition and Remark 8.14. Let F be a field. Consider a linear system
\[
\forall_{j\in\{1,\dots,m\}}\quad \sum_{k=1}^{n} a_{jk}\, x_k = b_j,
\]
where m, n ∈ ℕ; b₁, ..., bₘ ∈ F; and $a_{ji}$ ∈ F, j ∈ {1, ..., m}, i ∈ {1, ..., n}. For $a_{ji} \neq 0$, one has
\[
x_i = a_{ji}^{-1}\, b_j - a_{ji}^{-1} \sum_{k=1,\,k\neq i}^{n} a_{jk}\, x_k, \tag{8.16}
\]
and a variable substitution means replacing the l-th equation $\sum_{k=1}^{n} a_{lk}\, x_k = b_l$ of the system by the equation, where the variable $x_i$ has been substituted using (8.16) with some j ≠ l, i.e. by the equation
\[
\sum_{k=1}^{i-1} a_{lk}\, x_k + a_{li} \Big( a_{ji}^{-1}\, b_j - a_{ji}^{-1} \sum_{k=1,\,k\neq i}^{n} a_{jk}\, x_k \Big) + \sum_{k=i+1}^{n} a_{lk}\, x_k = b_l,
\]
which, after combining the coefficients of the same variables, reads
\[
\sum_{k=1}^{i-1} (a_{lk} - a_{li}\, a_{ji}^{-1}\, a_{jk})\, x_k + 0 \cdot x_i + \sum_{k=i+1}^{n} (a_{lk} - a_{li}\, a_{ji}^{-1}\, a_{jk})\, x_k = b_l - a_{li}\, a_{ji}^{-1}\, b_j. \tag{8.17}
\]
Comparing with Def. 8.13(c), we see that variable substitution is a particular case of row addition: One obtains (8.17) by replacing row $r_l$ of (A|b) by the sum $r_l - a_{li}\, a_{ji}^{-1}\, r_j$.


Theorem 8.15. Let F be a field and m, n ∈ ℕ. Let $A = (a_{ji})$ ∈ M(m, n, F) and b ∈ M(m, 1, F) ≅ Fᵐ and consider the linear system Ax = b. Applying elementary row operations (and, in particular, variable substitutions as in Def. and Rem. 8.14) to the augmented matrix (A|b) of the linear system Ax = b does not change the system's set of solutions, i.e. if $(A_{\mathrm{er}}|b_{\mathrm{er}})$ is the new matrix obtained from applying an elementary row operation according to Def. 8.13 to (A|b), then $L(A|b) = L(A_{\mathrm{er}}|b_{\mathrm{er}})$.

Proof. For each of the three elementary row operations, it is immediate from the arithmetic laws holding in the field F that, for each x ∈ Fⁿ, one has x ∈ L(A|b) if, and only if, x ∈ $L(A_{\mathrm{er}}|b_{\mathrm{er}})$ (where the inverse operation of row switching is merely switching rows $r_i$ and $r_j$ again, the inverse operation of row multiplication by λ ∈ F \ {0} is row multiplication by λ⁻¹, and the inverse operation of row addition $r_i \mapsto r_i + \lambda r_j$, i ≠ j, is $r_i \mapsto r_i - \lambda r_j$). □

Corollary 8.16. Let F be a field, $A = (a_{ji})$ ∈ M(m, n, F), m, n ∈ ℕ. Applying elementary row operations to A does not change the rank of A, i.e. if $A_{\mathrm{er}}$ is the new matrix obtained from applying an elementary row operation according to Def. 8.13 to A, then $\operatorname{rk} A = \operatorname{rk} A_{\mathrm{er}}$.

Proof. Let $L_A : F^n \longrightarrow F^m$ and $L_{A_{\mathrm{er}}} : F^n \longrightarrow F^m$ be the linear maps associated with A and $A_{\mathrm{er}}$, respectively, according to Rem. 8.3. According to Th. 8.15, we have $\ker L_A = L(A|0) = L(A_{\mathrm{er}}|0) = \ker L_{A_{\mathrm{er}}}$. Thus, according to (8.9), we have $\operatorname{rk} A = n - \dim \ker L_A = n - \dim \ker L_{A_{\mathrm{er}}} = \operatorname{rk} A_{\mathrm{er}}$. □

8.3.3 Gaussian Elimination

Algorithm 8.17 (Gaussian Elimination). Let F be a field and m, n ∈ ℕ. Let $A = (a_{ji})$ ∈ M(m, n, F). The Gaussian elimination algorithm is the following procedure that, starting with A, recursively applies elementary row operations of Def. 8.13:

Let $A^{(1)} := A$, r(1) := 1. For each k ≥ 1, as long as r(k) < m and k ≤ n, the Gaussian elimination algorithm transforms $A^{(k)} = (a^{(k)}_{ji}) \in M(m, n, F)$ into $A^{(k+1)} = (a^{(k+1)}_{ji}) \in M(m, n, F)$ and r(k) ∈ ℕ into r(k+1) ∈ ℕ by performing precisely one of the following actions:

(a) If $a^{(k)}_{r(k),k} \neq 0$, then, for each i ∈ {r(k)+1, ..., m}, replace the ith row by the ith row plus $-a^{(k)}_{ik}/a^{(k)}_{r(k),k}$ times the r(k)th row⁵. Set r(k+1) := r(k) + 1.

(b) If $a^{(k)}_{r(k),k} = 0$, and there exists i ∈ {r(k)+1, ..., m} such that $a^{(k)}_{ik} \neq 0$, then one chooses such an i ∈ {r(k)+1, ..., m} and switches the ith with the r(k)th row. One then proceeds as in (a).

⁵Comparing with Def. and Rem. 8.14, we observe this to be a variable substitution (in the case where the matrix A represents the augmented matrix of a linear system), substituting the variable $x_k$ in the ith row, using the r(k)th row to replace $x_k$.


(c) If $a^{(k)}_{ik} = 0$ for each i ∈ {r(k), ..., m}, then set $A^{(k+1)} := A^{(k)}$ and r(k+1) := r(k).
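A direct Python transcription of Alg. 8.17 might look as follows (a sketch of ours; exact `fractions.Fraction` arithmetic stands in for the field F, the 0-based loop variable `r` plays the role of r(k), and the input is any m × n matrix, e.g. an augmented matrix):

```python
from fractions import Fraction as F

def gaussian_elimination(A):
    """Sketch of Alg. 8.17; returns an echelon-form copy of A."""
    A = [[F(x) for x in row] for row in A]
    m, n = len(A), len(A[0])
    r = 0                                  # current pivot row, 0-based
    for k in range(n):                     # current pivot column
        if r >= m:
            break
        # (b): if needed, switch a row with a nonzero entry into position r
        if A[r][k] == 0:
            for i in range(r + 1, m):
                if A[i][k] != 0:
                    A[r], A[i] = A[i], A[r]
                    break
        # (c): column k is entirely zero from row r downward
        if A[r][k] == 0:
            continue
        # (a): eliminate all entries below the pivot
        for i in range(r + 1, m):
            lam = -A[i][k] / A[r][k]
            A[i] = [x + lam * y for x, y in zip(A[i], A[r])]
        r += 1
    return A

# Run on the augmented matrix of the linear system of Ex. 8.22 below:
Ab = gaussian_elimination([[6, 0, 1, 4],
                           [0, 2, 0, 6],
                           [1, -1, -1, 0]])
assert Ab == [[6, 0, 1, 4],
              [0, 2, 0, 6],
              [0, 0, F(-7, 6), F(7, 3)]]
```

The result reproduces the echelon form computed by hand in Ex. 8.22.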

Remark 8.18. Note that the Gaussian elimination algorithm of Alg. 8.17 stops after at most n steps. Moreover, in its kth step, it can only manipulate elements that have row number at least r(k) and column number at least k (the claim with respect to the column numbers follows since $a^{(k)}_{ij} = 0$ for each (i, j) ∈ {r(k), ..., m} × {1, ..., k−1}, cf. the proof of the following Th. 8.19).

Theorem 8.19. Let F be a field and m, n ∈ ℕ. Let $A = (a_{ji})$ ∈ M(m, n, F). The Gaussian elimination algorithm of Alg. 8.17 yields a matrix $\tilde{A} \in M(m, n, F)$ in echelon form. More precisely, if r(k) = m or k = n + 1, then $\tilde{A} := A^{(k)}$ is in echelon form. Moreover, $\operatorname{rk} \tilde{A} = \operatorname{rk} A$ and, if A = (B|b) and $\tilde{A} = (\tilde{B}|\tilde{b})$ represent the augmented matrices of the linear systems Bx = b and $\tilde{B}x = \tilde{b}$, respectively, then $L(B|b) = L(\tilde{B}|\tilde{b})$.

Proof. Let N ∈ {1, ..., n+1} be the maximal k occurring during the Gaussian elimination algorithm, i.e. $\tilde{A} := A^{(N)}$. To prove that $\tilde{A}$ is in echelon form, we show by induction on k ∈ {1, ..., N} that the first k−1 columns of $A^{(k)}$ are in echelon form as well as the first r(k) rows of $A^{(k)}$, with $a^{(k)}_{ij} = 0$ for each (i, j) ∈ {r(k), ..., m} × {1, ..., k−1}. For k = 1, the assertion is trivially true. By induction, we assume the assertion for k < N and prove it for k+1 (for k = 1, we do not assume anything). As already stated in Rem. 8.18 above, the kth step of Alg. 8.17 does not change the first r(k)−1 rows of $A^{(k)}$ and it does not change the first k−1 columns of $A^{(k)}$, since (by induction hypothesis) $a^{(k)}_{ij} = 0$ for each (i, j) ∈ {r(k), ..., m} × {1, ..., k−1}. Moreover, if we are in the case of Alg. 8.17(a), then
\[
\forall_{i\in\{r(k)+1,\dots,m\}}\quad a^{(k+1)}_{ik} = a^{(k)}_{ik} - \frac{a^{(k)}_{ik}}{a^{(k)}_{r(k),k}}\, a^{(k)}_{r(k),k} = 0.
\]
In each case, after the application of (a), (b), or (c) of Alg. 8.17, $a^{(k+1)}_{ij} = 0$ for each (i, j) ∈ {r(k+1), ..., m} × {1, ..., k}. We have to show that the first r(k+1) rows of $A^{(k+1)}$ are in echelon form. For Alg. 8.17(c), it is r(k+1) = r(k) and there is nothing to prove. For Alg. 8.17(a),(b), we know $a^{(k+1)}_{r(k+1),j} = 0$ for each j ∈ {1, ..., k}, while $a^{(k+1)}_{r(k),k} \neq 0$, showing that the first r(k+1) rows of $A^{(k+1)}$ are in echelon form in each case. Thus, we know that the first r(k+1) rows of $A^{(k+1)}$ are in echelon form, and all elements in the first k columns of $A^{(k+1)}$ below row r(k+1) are zero, showing that the first k columns of $A^{(k+1)}$ are also in echelon form. As, at the end of the Gaussian elimination algorithm, r(k) = m or k = n+1, we have shown $\tilde{A}$ to be in echelon form. That $\operatorname{rk} \tilde{A} = \operatorname{rk} A$ is an immediate consequence of Cor. 8.16 combined with a simple induction. If A = (B|b) and $\tilde{A} = (\tilde{B}|\tilde{b})$ represent the augmented matrices of the linear systems Bx = b and $\tilde{B}x = \tilde{b}$, respectively, then $L(B|b) = L(\tilde{B}|\tilde{b})$ is an immediate consequence of Th. 8.15 combined with a simple induction. □

Remark 8.20. To avoid mistakes, especially when performing the Gaussian elimination algorithm manually, it is advisable to check the row sums after each step. It obviously suffices to consider how row sums are changed if (a) of Alg. 8.17 is carried out. Moreover, let i ∈ {r(k)+1, ..., m}, as only rows with row numbers i ∈ {r(k)+1, ..., m} can be changed by (a) in the kth step. If $s^{(k)}_i = \sum_{j=1}^{n} a^{(k)}_{ij}$ is the sum of the ith row before (a) has been carried out in the kth step, then
\[
s^{(k+1)}_i = \sum_{j=1}^{n} a^{(k+1)}_{ij} = \sum_{j=1}^{n} \Big( a^{(k)}_{ij} - \frac{a^{(k)}_{ik}}{a^{(k)}_{r(k),k}}\, a^{(k)}_{r(k),j} \Big) = s^{(k)}_i - \frac{a^{(k)}_{ik}}{a^{(k)}_{r(k),k}}\, s^{(k)}_{r(k)} \tag{8.18}
\]
must be the sum of the ith row after (a) has been carried out in the kth step.
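The row-sum check (8.18) for a single step of Alg. 8.17(a) can be sketched in Python (ours), here with the rows of the first elimination step of Ex. 8.22 below:

```python
# Row-sum check (8.18): after replacing row i by row i + lam * row r,
# the new row sum equals s_i + lam * s_r.
from fractions import Fraction as F

row_r = [F(6), F(0), F(1), F(4)]    # pivot row
row_i = [F(1), F(-1), F(-1), F(0)]  # row to be changed
lam = -row_i[0] / row_r[0]          # = -1/6

new_row = [x + lam * y for x, y in zip(row_i, row_r)]
assert sum(new_row) == sum(row_i) + lam * sum(row_r)   # -1 - 11/6 = -17/6
```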

Remark 8.21. Theorem 8.19 shows that the Gaussian elimination algorithm of Alg. 8.17 provides a reliable systematic way of solving linear systems (when combined with the back substitution Alg. 8.10). It has a few simple rules, which make it easy to implement as computer code. However, the following Ex. 8.22 illustrates that Alg. 8.17 is not always the most efficient way to obtain a solution and, especially when solving small linear systems manually, it often makes sense to deviate from it. In the presence of roundoff errors, there can also be good reasons to deviate from Alg. 8.17, which are typically addressed in classes on Numerical Analysis (see, e.g., [Phi17a, Sec. 5]).

Example 8.22. Over the field ℝ, consider the linear system
\[
Ax = \begin{pmatrix} 6 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} 4 \\ 6 \\ 0 \end{pmatrix} =: b, \tag{8.19}
\]
which we can also write in the form
\[
\begin{aligned}
6x_1 + x_3 &= 4, \\
2x_2 &= 6, \\
x_1 - x_2 - x_3 &= 0.
\end{aligned}
\]
One can quickly solve this linear system without making use of Alg. 8.17 and Alg. 8.10: The second equation yields x₂ = 3, which, plugged into the third equation, yields x₁ − x₃ = 3. Adding this to the first equation yields 7x₁ = 7, i.e. x₁ = 1. Thus, the first equation then provides x₃ = 4 − 6 = −2 and we obtain
\[
L(A|b) = \left\{ \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix} \right\}. \tag{8.20}
\]

To illustrate Alg. 8.17 and Alg. 8.10, we solve (8.19) once again, first using Alg. 8.17 to transform the system to echelon form, then using Alg. 8.10 to obtain the solution: For each k ∈ {1, 2, 3}, we provide $(A^{(k)}|b^{(k)})$, r(k), and the row sums $s^{(k)}_i$, i ∈ {1, 2, 3}:
\[
(A^{(1)}|b^{(1)}) := (A|b) = \left( \begin{array}{ccc|c} 6 & 0 & 1 & 4 \\ 0 & 2 & 0 & 6 \\ 1 & -1 & -1 & 0 \end{array} \right), \qquad r(1) := 1,
\]
\[
s^{(1)}_1 = 11, \qquad s^{(1)}_2 = 8, \qquad s^{(1)}_3 = -1.
\]


For k = 2, we apply Alg. 8.17(a), where we replace the third row of $(A^{(1)}|b^{(1)})$ by the third row plus $(-\frac{1}{6})$ times the first row to obtain
\[
(A^{(2)}|b^{(2)}) := \left( \begin{array}{ccc|c} 6 & 0 & 1 & 4 \\ 0 & 2 & 0 & 6 \\ 0 & -1 & -\frac{7}{6} & -\frac{2}{3} \end{array} \right), \qquad r(2) := r(1) + 1 = 2,
\]
\[
s^{(2)}_1 = 11, \qquad s^{(2)}_2 = 8, \qquad s^{(2)}_3 = s^{(1)}_3 - \frac{1}{6}\, s^{(1)}_1 = -1 - \frac{11}{6} = -\frac{17}{6}.
\]

For k = 3, we apply Alg. 8.17(a) again, where we replace the third row of $(A^{(2)}|b^{(2)})$ by the third row plus $\frac{1}{2}$ times the second row to obtain
\[
(A^{(3)}|b^{(3)}) := \left( \begin{array}{ccc|c} 6 & 0 & 1 & 4 \\ 0 & 2 & 0 & 6 \\ 0 & 0 & -\frac{7}{6} & \frac{7}{3} \end{array} \right), \qquad r(3) := r(2) + 1 = 3,
\]
\[
s^{(3)}_1 = 11, \qquad s^{(3)}_2 = 8, \qquad s^{(3)}_3 = s^{(2)}_3 + \frac{1}{2}\, s^{(2)}_2 = -\frac{17}{6} + 4 = \frac{7}{6}.
\]
Since r(3) = 3, Alg. 8.17 is done and $(\tilde{A}|\tilde{b}) := (A^{(3)}|b^{(3)})$ is in echelon form. In particular, we see $3 = \operatorname{rk} \tilde{A} = \operatorname{rk}(\tilde{A}|\tilde{b}) = \operatorname{rk}(A|b)$, i.e. Ax = b has a unique solution, which we now determine via the back substitution Alg. 8.10: We obtain
\[
x_3 = -\frac{7}{3}\cdot\frac{6}{7} = -2, \qquad x_2 = \frac{6}{2} = 3, \qquad x_1 = \frac{1}{6}(4 - x_3) = 1,
\]
recovering (8.20).

Example 8.23. (a) In Ex. 8.2(b), we saw that, in ℝ⁴, the question if b ∈ ⟨v₁, v₂, v₃⟩ with b := (1, 2, 2, 1) and v₁ := (1, 2, 0, 3), v₂ := (0, 3, 2, 1), v₃ := (1, 1, 1, 1) is equivalent to determining if the linear system Ax = b with augmented matrix
\[
(A|b) := \left( \begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 2 & 3 & 1 & 2 \\ 0 & 2 & 1 & 2 \\ 3 & 1 & 1 & 1 \end{array} \right)
\]
has a solution. Let us answer this question by transforming the system to echelon form via elementary row operations: Replacing the 2nd row by the 2nd row plus (−2) times the 1st row, and replacing the 4th row by the 4th row plus (−3) times the 1st row yields
\[
\left( \begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 3 & -1 & 0 \\ 0 & 2 & 1 & 2 \\ 0 & 1 & -2 & -2 \end{array} \right).
\]
Switching rows 2 and 4 yields
\[
\left( \begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 1 & -2 & -2 \\ 0 & 2 & 1 & 2 \\ 0 & 3 & -1 & 0 \end{array} \right).
\]
Replacing the 3rd row by the 3rd row plus (−2) times the 2nd row, and replacing the 4th row by the 4th row plus (−3) times the 2nd row yields
\[
\left( \begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 1 & -2 & -2 \\ 0 & 0 & 5 & 6 \\ 0 & 0 & 5 & 6 \end{array} \right).
\]
Replacing the 4th row by the 4th row plus (−1) times the 3rd row yields
\[
(\tilde{A}|\tilde{b}) := \left( \begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 1 & -2 & -2 \\ 0 & 0 & 5 & 6 \\ 0 & 0 & 0 & 0 \end{array} \right).
\]
Thus, $3 = \operatorname{rk} A = \operatorname{rk} \tilde{A} = \operatorname{rk}(\tilde{A}|\tilde{b}) = \operatorname{rk}(A|b)$, showing Ax = b to have a unique solution and b ∈ ⟨v₁, v₂, v₃⟩.

(b) In Ex. 8.2(a), we saw that the question, if the set M := {v₁, v₂, v₃} with v₁, v₂, v₃ as in (a) is a linearly dependent subset of ℝ⁴, can be equivalently posed as the question, if the linear system Ax = 0, with the same matrix A as in (a), has a solution x = (x₁, x₂, x₃) ≠ (0, 0, 0). From (a), we know $\operatorname{rk} A = \operatorname{rk} \tilde{A} = 3$, implying L(A|0) = {0} and M is linearly independent.

8.4 LU Decomposition

8.4.1 Definition and Motivation

If one needs to solve linear systems Ax = b with the same matrix A, but varying b (as, e.g., in the problem of finding an inverse to an n × n matrix A, cf. Ex. 8.2(c)), then it is not efficient to separately transform each (A|b) into echelon form. A more efficient way aims at decomposing A into simpler matrices L and U, i.e. A = LU, then facilitating the simpler solution of Ax = LUx = b when varying b (i.e. without having to transform (A|b) into echelon form for each new b), cf. Rem. 8.25 below.

Definition 8.24. Let S be a ring with unity and A ∈ M(m, n, S) with m, n ∈ ℕ. A decomposition
\[
A = L\tilde{A} \tag{8.21a}
\]
is called an LU decomposition of A if, and only if, $L \in \mathrm{BL}^1_m(S)$ (i.e. L is unipotent lower triangular) and $\tilde{A} \in M(m, n, S)$ is in echelon form. If A is an m × m matrix, then $\tilde{A} \in \mathrm{BU}_m(S)$ (i.e. $\tilde{A}$ is upper triangular) and (8.21a) is an LU decomposition in the strict sense, which is emphasized by writing $U := \tilde{A}$:
\[
A = LU. \tag{8.21b}
\]


Remark 8.25. Let $F$ be a field. Suppose the goal is to solve linear systems $Ax = b$ with fixed matrix $A \in \mathcal{M}(m,n,F)$ and varying vector $b \in F^m$. Moreover, suppose one knows an LU decomposition $A = L\tilde A$. Then
$$x \in \mathcal{L}(A|b) \;\Leftrightarrow\; Ax = L\tilde A x = b \;\Leftrightarrow\; \tilde A x = L^{-1}b \;\Leftrightarrow\; x \in \mathcal{L}(\tilde A|L^{-1}b). \tag{8.22}$$
Thus, solving $Ax = L\tilde A x = b$ is equivalent to first solving $Lz = b$ for $z$ and then solving $\tilde A x = z = L^{-1}b$ for $x$. Since $\tilde A$ is in echelon form, $\tilde A x = z = L^{-1}b$ can be solved via the back substitution Alg. 8.10; and, since $L$ is lower triangular, $Lz = b$ can be solved via an analogous version of Alg. 8.10, called forward substitution. To obtain the entire set $\mathcal{L}(A|b)$, one uses $\mathcal{L}(A|0) = \mathcal{L}(\tilde A|0)$ and $\mathcal{L}(A|b) = x_0 + \mathcal{L}(A|0)$ for each $x_0 \in \mathcal{L}(A|b)$, where one finds $x_0$ as described above and $\mathcal{L}(\tilde A|0)$ via the back substitution Alg. 8.10. Even though the described strategy works fine in many situations, it can fail in the presence of roundoff errors (we will come back to this issue in Rem. 8.36 below).
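The two-stage strategy just described is easy to sketch in code. The following Python fragment (an illustration, not part of the notes; the $2 \times 2$ matrices and right-hand side are made up) performs forward substitution for a unipotent lower triangular $L$ and back substitution for an invertible upper triangular $U$:

```python
from fractions import Fraction as F

def forward_sub(L, b):
    # Solve L z = b for unipotent lower triangular L (1's on the diagonal).
    n = len(b)
    z = [F(0)] * n
    for i in range(n):
        z[i] = b[i] - sum(L[i][j] * z[j] for j in range(i))
    return z

def back_sub(U, z):
    # Solve U x = z for invertible upper triangular U.
    n = len(z)
    x = [F(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Made-up example with A = L U:
L = [[F(1), F(0)], [F(2), F(1)]]   # unipotent lower triangular
U = [[F(1), F(1)], [F(0), F(3)]]   # upper triangular
b = [F(3), F(9)]
z = forward_sub(L, b)              # L z = b
x = back_sub(U, z)                 # U x = z, hence (L U) x = b
```

Only the two triangular solves have to be repeated when $b$ changes; the factorization itself is reused.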

Example 8.26. Unfortunately, not every matrix has an LU decomposition: Consider, for example,
$$A = (a_{ji}) := \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \in \mathcal{M}(2,2,\mathbb{R}).$$
Clearly, $\operatorname{rk} A = 2$, i.e. $A$ is regular. Suppose $A$ has an LU decomposition, i.e. $A = LU$ with $L = (l_{ji})$ unipotent lower triangular and $U = (u_{ji})$ upper triangular. But then $0 = a_{11} = l_{11}u_{11} = u_{11}$, showing the first column of $U$ to be $0$, i.e. $U$ is singular, in contradiction to $A$ being regular.
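Operationally, the obstruction shows up as a division by zero: plain elimination without row switches must divide by the $(1,1)$ entry in its first step. A small Python sketch (not part of the notes):

```python
def lu_no_pivot(A):
    # Naive LU factorization WITHOUT row switches; fails whenever a
    # pivot position holds a zero.
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        L[k][k] = 1.0
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]   # ZeroDivisionError if the pivot is 0
            U[i] = [U[i][j] - L[i][k] * U[k][j] for j in range(n)]
    return L, U

try:
    lu_no_pivot([[0.0, 1.0], [1.0, 1.0]])   # the matrix from Ex. 8.26
    pivot_failure = False
except ZeroDivisionError:
    pivot_failure = True
```

Row switches (pivoting), introduced below, repair exactly this failure.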

Even though the previous example shows that not every matrix has an LU decomposition, we will see in Th. 8.34 below that every matrix does have one, provided one is first allowed to permute its rows in a suitable manner.

8.4.2 Elementary Row Operations Via Matrix Multiplication

The LU decomposition of Th. 8.34 below can be obtained, with virtually no extraeffort, by performing the Gaussian elimination Alg. 8.17. The proof will be foundedon the observation that each elementary row operation can be accomplished via leftmultiplication with a suitable matrix: For row multiplication, one multiplies from theleft with a diagonal matrix, for row addition, one multiplies from the left with a so-called Frobenius matrix or with the transpose of such a matrix (cf. Cor. 8.29 below);for row switching, one multiplies from the left with a so-called permutation matrix (cf.Cor. 8.33(a) below) – where, for Alg. 8.17 and the proof of Th. 8.34, we only need rowswitching and row addition by means of a Frobenius matrix (not its transpose).

Definition 8.27. Let $n \in \mathbb{N}$ and let $S$ be a ring with unity. We call a unipotent lower triangular matrix $A = (a_{ji}) \in \mathrm{BL}^1_n(S)$ a Frobenius matrix of index $k$, $k \in \{1,\dots,n\}$, if, and only if, $a_{ji} = 0$ for each $i \notin \{j,k\}$, i.e. if, and only if, it has the following form:
$$A = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 1 & & & \\
& & a_{k+1,k} & 1 & & \\
& & a_{k+2,k} & & 1 & \\
& & \vdots & & & \ddots \\
& & a_{n,k} & & & & 1
\end{pmatrix}. \tag{8.23}$$
We define
$$\mathrm{Fro}^k_n(S) := \big\{A \in \mathcal{M}(n,S) : A \text{ is Frobenius of index } k\big\}.$$

Proposition 8.28. Let S be a ring with unity, n ∈ N.

(a) If $m \in \mathbb{N}$, $A = (a_{ji}) \in \mathcal{M}(m,n,S)$, $B = (b_{ji}) \in \mathrm{Fro}^k_m(S)$, $k \in \{1,\dots,m\}$,
$$B = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 1 & & & \\
& & b_{k+1,k} & 1 & & \\
& & b_{k+2,k} & & 1 & \\
& & \vdots & & & \ddots \\
& & b_{m,k} & & & & 1
\end{pmatrix},$$
and $C = (c_{ji}) = BA \in \mathcal{M}(m,n,S)$, then
$$c_{ji} = \sum_{\alpha=1}^{m} b_{j\alpha} a_{\alpha i} = \begin{cases} a_{ji} & \text{for each } j \in \{1,\dots,k\},\\ b_{jk} a_{ki} + a_{ji} & \text{for each } j \in \{k+1,\dots,m\}. \end{cases} \tag{8.24}$$

(b) Let $F := S$ be a field and $A = (a_{ji}) \in \mathcal{M}(m,n,F)$, $m \in \mathbb{N}$. Moreover, let $A^{(k)} \mapsto A^{(k+1)}$ be the matrix transformation occurring in the $k$th step of the Gaussian elimination Alg. 8.17 and assume the $k$th step is being performed by Alg. 8.17(a), i.e. by, for each $i \in \{r(k)+1,\dots,m\}$, replacing the $i$th row by the $i$th row plus $-a^{(k)}_{ik}/a^{(k)}_{r(k),k}$ times the $r(k)$th row. Then
$$A^{(k+1)} = L_k\, A^{(k)}, \qquad L_k \in \mathrm{Fro}^{r(k)}_m(F), \tag{8.25a}$$
where
$$L_k = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 1 & & & \\
& & -l_{r(k)+1,r(k)} & 1 & & \\
& & -l_{r(k)+2,r(k)} & & 1 & \\
& & \vdots & & & \ddots \\
& & -l_{m,r(k)} & & & & 1
\end{pmatrix}, \qquad l_{i,r(k)} := a^{(k)}_{ik}/a^{(k)}_{r(k),k}. \tag{8.25b}$$


(c) Let $k \in \{1,\dots,n\}$. If $L = (l_{ji}) \in \mathrm{Fro}^k_n(S)$, then $L^{-1} \in \mathrm{Fro}^k_n(S)$. More precisely,
$$L = \begin{pmatrix}
1 & & & \\
& \ddots & & \\
& & 1 & \\
& & l_{k+1,k} & 1 \\
& & \vdots & & \ddots \\
& & l_{n,k} & & & 1
\end{pmatrix} \quad\Rightarrow\quad L^{-1} = \begin{pmatrix}
1 & & & \\
& \ddots & & \\
& & 1 & \\
& & -l_{k+1,k} & 1 \\
& & \vdots & & \ddots \\
& & -l_{n,k} & & & 1
\end{pmatrix}.$$

(d) If
$$L = (l_{ji}) = \begin{pmatrix}
1 & 0 & \dots & 0 \\
l_{21} & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
l_{n1} & l_{n2} & \dots & 1
\end{pmatrix} \in \mathrm{BL}^1_n(S)$$
is an arbitrary unipotent lower triangular matrix, then it is the product of the following $n-1$ Frobenius matrices:
$$L = L_1 \cdots L_{n-1},$$
where
$$L_k := \begin{pmatrix}
1 & & & \\
& \ddots & & \\
& & 1 & \\
& & l_{k+1,k} & 1 \\
& & \vdots & & \ddots \\
& & l_{n,k} & & & 1
\end{pmatrix} \in \mathrm{Fro}^k_n(S) \quad\text{for each } k \in \{1,\dots,n-1\}.$$

Proof. (a): Formula (8.24) is immediate from the definition of matrix multiplication.

(b) follows by applying (a) with B := Lk and A := A(k).

(c): Applying (a) with $A := L$ and $B := L^{-1}$, where $L^{-1}$ is defined as in the statement of (c), we can use (8.24) to compute $C = BA$:
$$c_{ji} \overset{(8.24)}{=} \begin{cases}
a_{ji} = \begin{cases} 0 & \text{for each } j \in \{1,\dots,k\},\ i \neq j,\\ 1 & \text{for each } j \in \{1,\dots,k\},\ i = j, \end{cases}\\[2ex]
b_{jk} a_{ki} + a_{ji} = \begin{cases} -l_{jk} \cdot 0 + 0 = 0 & \text{for each } j \in \{k+1,\dots,n\},\ i \neq j,k,\\ -l_{jk} \cdot 1 + l_{jk} = 0 & \text{for each } j \in \{k+1,\dots,n\},\ i = k,\\ -l_{jk} \cdot 0 + 1 = 1 & \text{for each } j \in \{k+1,\dots,n\},\ i = j, \end{cases}
\end{cases}$$
showing $C = \mathrm{Id}$, thereby establishing the case.


(d): We prove by induction on $k = n-1,\dots,1$ that
$$\prod_{\alpha=k}^{n-1} L_\alpha = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 1 & & & \\
& & l_{k+1,k} & 1 & & \\
& & l_{k+2,k} & l_{k+2,k+1} & \ddots & \\
& & \vdots & \vdots & \ddots & 1 \\
& & l_{n,k} & l_{n,k+1} & \dots & l_{n,n-1} & 1
\end{pmatrix} =: A_k,$$
i.e. $A_k$ is unipotent lower triangular, with the subdiagonal entries of columns $k,\dots,n-1$ given by the respective entries of $L_k,\dots,L_{n-1}$ and all other subdiagonal entries equal to $0$. For $k = n-1$, there is nothing to prove, since $L_{n-1} = A_{n-1}$ by definition. So let $1 \le k < n-1$. We have to show that $C := L_k A_{k+1} = A_k$. Letting $A := A_{k+1}$ and $B := L_k$, we can use (8.24) to compute $C = BA$:
$$c_{ji} \overset{(8.24)}{=} \begin{cases}
a_{ji} = \begin{cases} 0 & \text{for each } j \in \{1,\dots,k\},\ j \neq i,\\ 1 & \text{for each } j \in \{1,\dots,k\},\ j = i, \end{cases}\\[2ex]
b_{jk} a_{ki} + a_{ji} = \begin{cases} l_{jk} \cdot 0 + a_{ji} = a_{ji} & \text{for each } j \in \{k+1,\dots,n\},\ i \neq k,\\ l_{jk} \cdot 1 + 0 = l_{jk} & \text{for each } j \in \{k+1,\dots,n\},\ i = k, \end{cases}
\end{cases}$$
showing $C = A_k$, thereby establishing the case. $\square$
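Prop. 8.28(c) lends itself to a mechanical check. The following Python sketch (not part of the notes; size and entries chosen arbitrarily) builds a Frobenius matrix and the claimed inverse, and multiplies them out:

```python
def matmul(A, B):
    # Standard matrix product of two square matrices given as lists of rows.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def frobenius(n, k, entries):
    # Identity matrix with the given entries placed below the diagonal
    # in column k (0-based indices).
    M = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, v in enumerate(entries, start=k + 1):
        M[i][k] = v
    return M

# Frobenius matrix of index 2 (1-based) in M(4, Z), entries chosen arbitrarily:
Lf = frobenius(4, 1, [5, -3])
# Claimed inverse per Prop. 8.28(c): same matrix with negated column entries.
Lf_inv = frobenius(4, 1, [-5, 3])
Id4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
```

Multiplying `Lf` and `Lf_inv` in either order yields the identity, as the proposition asserts.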

Corollary 8.29. Consider the situation of Def. 8.13, i.e. let $F$ be a field, $m,n \in \mathbb{N}$, and $A \in \mathcal{M}(m,n,F)$ with rows $r_1,\dots,r_m$.

(a) The elementary row operation of row multiplication, i.e. of replacing row $r_i$ by some nonzero multiple $\lambda r_i$ of that row, $i \in \{1,\dots,m\}$, $\lambda \in F \setminus \{0\}$, can be accomplished by left-multiplying $A$ by the diagonal matrix $D_\lambda := \operatorname{diag}(d_1,\dots,d_m) \in \mathcal{D}_m(F)$, where
$$\forall_{j \in \{1,\dots,m\}}\colon \quad d_j := \begin{cases} \lambda & \text{for } j = i,\\ 1 & \text{for } j \neq i. \end{cases}$$

(b) The elementary row operation of row addition, i.e. of replacing row $r_i$ by the sum $r_i + \lambda r_k$ of that row and a multiple of another row, $(i,k) \in \{1,\dots,m\}^2$, $i \neq k$, $\lambda \in F$, can be accomplished, for $i > k$, by left-multiplying $A$ by the Frobenius matrix $F_\lambda \in \mathrm{Fro}^k_m(F)$,
$$F_\lambda = \begin{pmatrix}
1 & & & \\
& \ddots & & \\
& & 1 & \\
& & f_{k+1,k} & 1 \\
& & \vdots & & \ddots \\
& & f_{m,k} & & & 1
\end{pmatrix}, \qquad \forall_{j \in \{k+1,\dots,m\}}\colon \quad f_{j,k} := \begin{cases} \lambda & \text{for } j = i,\\ 0 & \text{for } j \neq i, \end{cases}$$
and, for $i < k$, by left-multiplying $A$ by the transpose of the Frobenius matrix $F_\lambda \in \mathrm{Fro}^i_m(F)$, where $F_\lambda$ is as above, except with $i$ and $k$ switched.

Proof. (a) is immediate from (7.27a).

(b): For $i > k$, we apply Prop. 8.28(a) with $B := F_\lambda$, obtaining for $(c_{ji}) := F_\lambda A$:
$$c_{jl} = \begin{cases}
a_{jl} & \text{for each } j \in \{1,\dots,k\},\\
f_{jk} a_{kl} + a_{jl} = \lambda a_{kl} + a_{jl} & \text{for } j = i,\\
f_{jk} a_{kl} + a_{jl} = a_{jl} & \text{for each } j \in \{k+1,\dots,m\},\ j \neq i,
\end{cases}$$
thereby establishing the case. For $i < k$, we obtain, for $(b_{ji}) := (F_\lambda)^t$ and $(c_{ji}) := (F_\lambda)^t A$:
$$c_{jl} = \sum_{\alpha=1}^{m} b_{j\alpha} a_{\alpha l} = \begin{cases}
a_{jl} & \text{for each } j \in \{1,\dots,m\},\ j \neq i,\\
a_{jl} + b_{jk} a_{kl} = a_{jl} + \lambda a_{kl} & \text{for } j = i,
\end{cases}$$
once again establishing the case. $\square$

Definition 8.30. Let $n \in \mathbb{N}$ and recall the definition of the symmetric group $S_n$ from Ex. 4.9(b). Moreover, let $F$ be a field (actually, the following definition still makes sense as long as $F$ is a set containing elements $0$ and $1$). For each permutation $\pi \in S_n$, we define an $n \times n$ matrix
$$P_\pi := \begin{pmatrix} e^t_{\pi(1)} & e^t_{\pi(2)} & \dots & e^t_{\pi(n)} \end{pmatrix} \in \mathcal{M}(n,F), \tag{8.26}$$
where the $e_i$ denote the standard unit (row) vectors of $F^n$. The matrix $P_\pi$ is called the permutation matrix (in $\mathcal{M}(n,F)$) associated with $\pi$. We define
$$\mathrm{Per}_n(F) := \big\{P_\pi \in \mathcal{M}(n,F) : \pi \in S_n\big\}.$$

Proposition 8.31. Let n ∈ N and let F be a field.

(a) If $\pi \in S_n$ and the columns of $P_\pi \in \mathrm{Per}_n(F)$ are given according to (8.26), then the rows of $P_\pi$ are given according to the inverse permutation $\pi^{-1}$:
$$P_\pi = \begin{pmatrix} e_{\pi^{-1}(1)} \\ e_{\pi^{-1}(2)} \\ \vdots \\ e_{\pi^{-1}(n)} \end{pmatrix}, \qquad (P_\pi)^t = P_{\pi^{-1}}. \tag{8.27}$$

(b) P := (pji) ∈ M(n, F ) is a permutation matrix if, and only if, each row and eachcolumn of P have precisely one entry 1 and all other entries of P are 0.

(c) Left multiplication of a matrix $A \in \mathcal{M}(n,m,F)$, $m \in \mathbb{N}$, by a permutation matrix $P_\pi \in \mathrm{Per}_n(F)$ permutes the rows of $A$ according to $\pi$, in the sense that the $j$th row of $A$ becomes the $\pi(j)$th row of $P_\pi A$:
$$P_\pi \begin{pmatrix} \text{---}\; r_1 \;\text{---} \\ \text{---}\; r_2 \;\text{---} \\ \vdots \\ \text{---}\; r_n \;\text{---} \end{pmatrix} = \begin{pmatrix} \text{---}\; r_{\pi^{-1}(1)} \;\text{---} \\ \text{---}\; r_{\pi^{-1}(2)} \;\text{---} \\ \vdots \\ \text{---}\; r_{\pi^{-1}(n)} \;\text{---} \end{pmatrix},$$
which follows from the special case
$$P_\pi\, e^t_i = \begin{pmatrix} e_{\pi^{-1}(1)} \\ e_{\pi^{-1}(2)} \\ \vdots \\ e_{\pi^{-1}(n)} \end{pmatrix} e^t_i = \begin{pmatrix} e_{\pi^{-1}(1)} \cdot e^t_i \\ e_{\pi^{-1}(2)} \cdot e^t_i \\ \vdots \\ e_{\pi^{-1}(n)} \cdot e^t_i \end{pmatrix} = e^t_{\pi(i)},$$
which holds for each $i \in \{1,\dots,n\}$.

(d) Right multiplication of a matrix $A \in \mathcal{M}(m,n,F)$, $m \in \mathbb{N}$, by a permutation matrix $P_\pi \in \mathrm{Per}_n(F)$ permutes the columns of $A$ according to $\pi^{-1}$, in the sense that the $j$th column of $A$ becomes the $\pi^{-1}(j)$th column of $A P_\pi$:
$$\begin{pmatrix} | & | & \dots & | \\ c_1 & c_2 & \dots & c_n \\ | & | & \dots & | \end{pmatrix} P_\pi = \begin{pmatrix} | & | & \dots & | \\ c_{\pi(1)} & c_{\pi(2)} & \dots & c_{\pi(n)} \\ | & | & \dots & | \end{pmatrix},$$
which follows from the special case
$$e_i\, P_\pi = e_i \begin{pmatrix} e^t_{\pi(1)} & e^t_{\pi(2)} & \dots & e^t_{\pi(n)} \end{pmatrix} = \begin{pmatrix} e_i \cdot e^t_{\pi(1)} & e_i \cdot e^t_{\pi(2)} & \dots & e_i \cdot e^t_{\pi(n)} \end{pmatrix} = e_{\pi^{-1}(i)},$$
which holds for each $i \in \{1,\dots,n\}$.

(e) For each $\pi, \sigma \in S_n$, one has
$$P_\pi P_\sigma = P_{\pi \circ \sigma}, \tag{8.28}$$
such that
$$I : S_n \longrightarrow \mathrm{Per}_n(F), \qquad I(\pi) := P_\pi,$$
constitutes a group isomorphism. In particular, $\mathrm{Per}_n(F)$ is a subgroup of $\mathrm{GL}_n(F)$ and
$$\forall_{P \in \mathrm{Per}_n(F)}\colon \quad P^{-1} = P^t. \tag{8.29}$$

Proof. (a): Consider $(p_{ji}) := P_\pi$. By (8.26), $p_{ji} = 1$ if, and only if, $j = \pi(i)$ (since $p_{ji}$ belongs to the $i$th column). Thus, $p_{ji} = 1$ if, and only if, $i = \pi^{-1}(j)$, which means that the $j$th row is $e_{\pi^{-1}(j)}$.

(b): Suppose $P = P_\pi$, $\pi \in S_n$, $n \in \mathbb{N}$, is a permutation matrix. Then (8.26) implies that each column of $P$ has precisely one entry $1$ and all other entries $0$; (8.27) implies that each row of $P$ has precisely one entry $1$ and all other entries $0$.

Conversely, suppose that $P \in \mathcal{M}(n,F)$ is such that each row and each column of $P$ have precisely one entry $1$ and all other entries of $P$ are $0$. Define a map $\pi : \{1,\dots,n\} \longrightarrow \{1,\dots,n\}$ by letting $\pi(i)$ be such that $p_{\pi(i),i} = 1$. Then
$$P = \begin{pmatrix} e^t_{\pi(1)} & e^t_{\pi(2)} & \dots & e^t_{\pi(n)} \end{pmatrix},$$
and it just remains to show that $\pi$ is a permutation. It suffices to show that $\pi$ is surjective. To that end, let $k \in \{1,\dots,n\}$. Then there is a unique $i \in \{1,\dots,n\}$ such that $p_{ki} = 1$. But then, according to the definition of $\pi$, $\pi(i) = k$, showing that $\pi$ is surjective and $P = P_\pi$.

(c): Note that, for $A = (a_{ji}) \in \mathcal{M}(n,m,F)$,
$$A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ji}\, \big(0\; \dots\; \underbrace{e^t_j}_{i\text{th column}}\; \dots\; 0\big).$$
Thus, by linearity and the special case $P_\pi e^t_j = e^t_{\pi(j)}$,
$$P_\pi A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ji}\, P_\pi \big(0\; \dots\; \underbrace{e^t_j}_{i\text{th column}}\; \dots\; 0\big) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ji}\, \big(0\; \dots\; \underbrace{e^t_{\pi(j)}}_{i\text{th column}}\; \dots\; 0\big),$$
i.e. the entry $a_{ji}$ moves from position $(j,i)$ to position $(\pi(j),i)$, which establishes the case.

(d): We have
$$\left( \begin{pmatrix} c_1 & \cdots & c_n \end{pmatrix} P_\pi \right)^t = (P_\pi)^t \begin{pmatrix} c_1^t \\ \vdots \\ c_n^t \end{pmatrix} \overset{(8.27)}{=} P_{\pi^{-1}} \begin{pmatrix} c_1^t \\ \vdots \\ c_n^t \end{pmatrix} \overset{(c)}{=} \begin{pmatrix} c_{\pi(1)}^t \\ \vdots \\ c_{\pi(n)}^t \end{pmatrix} = \left( \begin{pmatrix} c_{\pi(1)} & \cdots & c_{\pi(n)} \end{pmatrix} \right)^t,$$
proving (d).

(e): For $\pi, \sigma \in S_n$, we compute
$$P_\pi P_\sigma = \begin{pmatrix} e_{\pi^{-1}(1)} \\ e_{\pi^{-1}(2)} \\ \vdots \\ e_{\pi^{-1}(n)} \end{pmatrix} \begin{pmatrix} e^t_{\sigma(1)} & e^t_{\sigma(2)} & \dots & e^t_{\sigma(n)} \end{pmatrix} \overset{(*)}{=} \begin{pmatrix} e^t_{(\pi\circ\sigma)(1)} & e^t_{(\pi\circ\sigma)(2)} & \dots & e^t_{(\pi\circ\sigma)(n)} \end{pmatrix} = P_{\pi\circ\sigma},$$
showing (8.28), where, at $(*)$, we used that
$$\forall_{i,j \in \{1,\dots,n\}}\colon \quad \pi^{-1}(i) = \sigma(j) \;\Leftrightarrow\; i = (\pi \circ \sigma)(j).$$
Both surjectivity and injectivity of $I$ are clear from Def. 8.30 and, then, $\mathrm{Per}_n(F)$ must be a group by Prop. 4.11(c). Combining (8.28) with (b) proves (8.29) and, thus, also shows $\mathrm{Per}_n(F)$ to be a subgroup of $\mathrm{GL}_n(F)$. $\square$
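The isomorphism property (8.28) and the inversion formula (8.29) can also be verified numerically. In the Python sketch below (not part of the notes), a permutation $\pi \in S_n$ is encoded $0$-based as a tuple, with `pi[i]` the image of `i`:

```python
def perm_matrix(pi):
    # P_pi as in (8.26): column i is the standard basis column vector e_{pi(i)},
    # i.e. entry (j, i) equals 1 exactly when j = pi(i) (0-based indices).
    n = len(pi)
    return [[1 if j == pi[i] else 0 for i in range(n)] for j in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

pi = (1, 2, 0)       # pi: 0 -> 1, 1 -> 2, 2 -> 0
sigma = (1, 0, 2)    # a transposition
composed = tuple(pi[sigma[i]] for i in range(3))   # (pi o sigma)(i) = pi(sigma(i))
```

Here `matmul(perm_matrix(pi), perm_matrix(sigma))` agrees with `perm_matrix(composed)`, mirroring (8.28), and the transpose of `perm_matrix(pi)` is the matrix of the inverse permutation, mirroring (8.27) and (8.29).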

Definition and Remark 8.32. Let $n \in \mathbb{N}$ and let $F$ be a field. A permutation matrix $P_\tau \in \mathrm{Per}_n(F)$ corresponding to a transposition $\tau = (ji) \in S_n$ ($\tau$ permutes $i$ and $j$ and leaves all other elements fixed) is called a transposition matrix and is denoted by $P_{ji} := P_\tau$. The transposition matrix $P_{ji}$ has the form
$$P_{ji} = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 0 & & 1 & & \\
& & & \ddots & & & \\
& & 1 & & 0 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}, \tag{8.30}$$
i.e. $P_{ji}$ agrees with the identity matrix, except that the diagonal entries in rows $i$ and $j$ are $0$, while the entries at positions $(i,j)$ and $(j,i)$ are $1$. Since every permutation is the composition of a finite number of transpositions (which we will prove in Linear Algebra II), it is implied by Prop. 8.31(e) that every permutation matrix is the product of a finite number of transposition matrices. It is immediate from Prop. 8.31(c),(d) that left multiplication of a matrix $A \in \mathcal{M}(n,m,F)$, $m \in \mathbb{N}$, by $P_{ji} \in \mathrm{Per}_n(F)$ switches the $i$th and $j$th rows of $A$, whereas right multiplication of $A \in \mathcal{M}(m,n,F)$ by $P_{ji} \in \mathrm{Per}_n(F)$ switches the $i$th and $j$th columns of $A$.

Corollary 8.33. Consider the situation of Def. 8.13, i.e. let F be a field, m,n ∈ N,and A ∈ M(m,n, F ) with rows r1, . . . , rm.

(a) The elementary row operation of row switching, i.e. of switching rows $r_i$ and $r_j$, where $i,j \in \{1,\dots,m\}$, can be accomplished by left-multiplying $A$ by the transposition matrix $P_{ij} \in \mathrm{Per}_m(F)$.

(b) Let $A^{(k)} \mapsto A^{(k+1)}$ be the matrix transformation occurring in the $k$th step of the Gaussian elimination Alg. 8.17 and assume the $k$th step is being performed by Alg. 8.17(b), i.e. by first switching the $i$th row with the $r(k)$th row and then, for each $i \in \{r(k)+1,\dots,m\}$, replacing the (new) $i$th row by the (new) $i$th row plus $-a^{(k)}_{ik}/a^{(k)}_{r(k),k}$ times the (new) $r(k)$th row. Then
$$A^{(k+1)} = L_k\, P_{i,r(k)}\, A^{(k)} \tag{8.31}$$
with transposition matrix $P_{i,r(k)} \in \mathrm{Per}_m(F)$ and with $L_k \in \mathrm{Fro}^{r(k)}_m(F)$ being the Frobenius matrix of index $r(k)$ defined in (8.25b).

Proof. (a) is immediate from Prop. 8.31(c) (cf. Def. and Rem. 8.32).

(b) is due to (a) combined with Prop. 8.28(b). $\square$

8.4.3 Existence

Theorem 8.34. Let $F$ be a field and $A \in \mathcal{M}(m,n,F)$ with $m,n \in \mathbb{N}$. Then there exists a permutation matrix $P \in \mathrm{Per}_m(F)$ such that $PA$ has an LU decomposition in the sense of Def. 8.24. More precisely, if $\tilde A \in \mathcal{M}(m,n,F)$ is the matrix in echelon form resulting at the end of the Gaussian elimination Alg. 8.17 applied to $A$, then there exist a permutation matrix $P \in \mathrm{Per}_m(F)$ and a unipotent lower triangular matrix $L \in \mathrm{BL}^1_m(F)$ such that
$$PA = L\tilde A. \tag{8.32}$$
It is an LU decomposition in the strict sense (i.e. with $\tilde A \in \mathrm{BU}_m(F)$) for $A$ quadratic (i.e. $m \times m$). If no row switches occurred during the application of Alg. 8.17, then $P = \mathrm{Id}_m$ and $A$ itself has an LU decomposition. If $A \in \mathrm{GL}_n(F)$ has an LU decomposition, then its LU decomposition is unique.

Proof. Let $N \in \{0,\dots,n\}$ be the final number of steps needed to perform the Gaussian elimination Alg. 8.17. If $N = 0$, then $A$ consists of precisely one row and there is nothing to prove (set $L := P := (1)$). If $N \ge 1$, then let $A^{(1)} := A$ and, recursively, for each $k \in \{1,\dots,N\}$, let $A^{(k+1)}$ be the matrix obtained in the $k$th step of the Gaussian elimination algorithm. If Alg. 8.17(b) is used in the $k$th step, then, according to Cor. 8.33(b),
$$A^{(k+1)} = L_k\, P_k\, A^{(k)}, \tag{8.33}$$
where $P_k := P_{i,r(k)} \in \mathrm{Per}_m(F)$ is the transposition matrix that switches rows $r(k)$ and $i$, while $L_k \in \mathrm{Fro}^{r(k)}_m(F)$ is the Frobenius matrix of index $r(k)$ given by (8.25b). If (a) or (c) of Alg. 8.17 is used in the $k$th step, then (8.33) also holds, namely with $P_k := \mathrm{Id}_m$ for (a) and with $L_k := P_k := \mathrm{Id}_m$ for (c). By induction, (8.33) implies
$$\tilde A = A^{(N+1)} = L_N P_N \cdots L_1 P_1\, A. \tag{8.34}$$

To show that we can transform (8.34) into (8.32), we first rewrite the right-hand side of (8.34), taking into account $P_k^{-1} = P_k$ for each of the transposition matrices $P_k$ (i.e. inserting factors $P_k P_k = \mathrm{Id}_m$ between consecutive terms):
$$\tilde A = L_N\,(P_N L_{N-1} P_N)\,(P_N P_{N-1} L_{N-2} P_{N-1} P_N)\cdots(P_N \cdots P_2\, L_1\, P_2 \cdots P_N)\; P_N P_{N-1} \cdots P_2 P_1\, A. \tag{8.35}$$
Defining
$$L'_N := L_N, \qquad L'_k := P_N P_{N-1} \cdots P_{k+1}\, L_k\, P_{k+1} \cdots P_{N-1} P_N \quad \text{for each } k \in \{1,\dots,N-1\}, \tag{8.36}$$
(8.35) takes the form
$$\tilde A = L'_N \cdots L'_1\, P_N P_{N-1} \cdots P_2 P_1\, A. \tag{8.37}$$

We now observe that the $L'_k$ are still Frobenius matrices of index $r(k)$, except that the entries $l_{j,r(k)}$ of $L_k$ with $j > r(k)$ have been permuted according to $P_N P_{N-1} \cdots P_{k+1}$: This follows since
$$P_{ji} \begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
& \vdots & \ddots & & \\
& l_{i,r(k)} & & 1 & \\
& \vdots & & & \ddots \\
& l_{j,r(k)} & & & & 1
\end{pmatrix} P_{ji} = \begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
& \vdots & \ddots & & \\
& l_{j,r(k)} & & 1 & \\
& \vdots & & & \ddots \\
& l_{i,r(k)} & & & & 1
\end{pmatrix},$$
as left multiplication by $P_{ji}$ switches the $i$th and $j$th rows, switching $l_{i,r(k)}$ and $l_{j,r(k)}$ and moving the corresponding $1$'s off the diagonal, whereas right multiplication by $P_{ji}$ switches the $i$th and $j$th columns, moving both $1$'s back onto the diagonal while leaving the $r(k)$th column unchanged.

Finally, using that each $(L'_k)^{-1}$, according to Prop. 8.28(c), exists and is also a Frobenius matrix, (8.37) becomes
$$L\tilde A = PA \tag{8.38}$$
with
$$P := P_N P_{N-1} \cdots P_2 P_1, \qquad L := (L'_1)^{-1} \cdots (L'_N)^{-1}. \tag{8.39}$$
Comparing with Prop. 8.28(d) shows that $L = (L'_1)^{-1} \cdots (L'_N)^{-1}$ is a unipotent lower triangular matrix, thereby establishing (8.32). Note: From Prop. 8.28(d) and the definition of the $L_k$, we actually know all entries of $L$ explicitly: Every nonzero subdiagonal entry of the $r(k)$th column is given by (8.25b). This will be used when formulating the LU decomposition algorithm in Sec. 8.4.4 below.

If $A$ is invertible, then so is $U := \tilde A = L^{-1}A$. If $A$ is invertible and we have LU decompositions
$$L_1 U_1 = A = L_2 U_2,$$
then we obtain $U_1 U_2^{-1} = L_1^{-1} L_2 =: E$. As both the upper triangular and the unipotent lower triangular matrices are closed under matrix multiplication and inversion, the matrix $E$ is unipotent and both upper and lower triangular, showing $E = \mathrm{Id}_m$. This, in turn, yields $U_1 = U_2$ as well as $L_1 = L_2$, i.e. the claimed uniqueness of the LU decomposition for invertible $A$. $\square$

Remark 8.35. Note that, even for invertible $A \in \mathrm{GL}_n(F)$, the triple $(P, L, \tilde A)$ of (8.32) is, in general, not unique. For example,
$$\underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{P_1} \underbrace{\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}}_{A} = \underbrace{\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}}_{L_1} \underbrace{\begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}}_{\tilde A_1}, \qquad \underbrace{\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}}_{P_2} \underbrace{\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}}_{A} = \underbrace{\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}}_{L_2} \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{\tilde A_2}.$$

Remark 8.36. Let $F$ be a field. Suppose the goal is to solve linear systems $Ax = b$ with fixed matrix $A \in \mathcal{M}(m,n,F)$ and varying vector $b \in F^m$. Moreover, suppose one knows an LU decomposition $PA = L\tilde A$ with permutation matrix $P \in \mathrm{Per}_m(F)$. One can then proceed as in Rem. 8.25 above, except with $A$ replaced by $PA$ and $b$ replaced by $Pb$. As already indicated in Rem. 8.25, in the presence of roundoff errors, such errors might be magnified by the use of an LU decomposition. If one works over the fields $\mathbb{R}$ or $\mathbb{C}$, then there are other decompositions (e.g. the so-called QR decomposition) that tend to behave more benignly in the presence of roundoff errors (cf., e.g., [Phi17a, Rem. 5.16(c)] and [Phi17a, Sec. 5.4.3]).

8.4.4 Algorithm

Let us crystallize the proof of Th. 8.34 into an algorithm to actually compute the matrices $L$, $P$, and $\tilde A$ occurring in decomposition (8.32) of a given $A \in \mathcal{M}(m,n,F)$. As in the proof of Th. 8.19, let $N \in \{1,\dots,n+1\}$ be the maximal $k$ occurring during the Gaussian elimination Alg. 8.17 applied to $A$.

(a) Algorithm for $\tilde A$: Starting with $A^{(1)} := A$, the final step of Alg. 8.17 yields the matrix $\tilde A := A^{(N)} \in \mathcal{M}(m,n,F)$ in echelon form.

(b) Algorithm for $P$: Starting with $P^{(1)} := \mathrm{Id}_m$, in the $k$th step of Alg. 8.17, define $P^{(k+1)} := P_k P^{(k)}$, where $P_k := P_{i,r(k)}$ if rows $r(k)$ and $i$ are switched according to Alg. 8.17(b), and $P_k := \mathrm{Id}_m$ otherwise. In the last step, one obtains $P := P^{(N)}$.

(c) Algorithm for $L$: Starting with $L^{(1)} := 0$ (zero matrix), we obtain $L^{(k+1)}$ from $L^{(k)}$ in the $k$th step of Alg. 8.17 as follows: For Alg. 8.17(c), set $L^{(k+1)} := L^{(k)}$. If rows $r(k)$ and $i$ are switched according to Alg. 8.17(b), then we first switch rows $r(k)$ and $i$ in $L^{(k)}$ to obtain some $\bar L^{(k)}$ (this conforms to the definition of the $L'_k$ in (8.36); we will come back to this point below). For Alg. 8.17(a) and for the elimination step of Alg. 8.17(b), we first copy all elements of $L^{(k)}$ (resp. $\bar L^{(k)}$) to $L^{(k+1)}$, but then change the elements of the $r(k)$th column according to (8.25b): Set $l^{(k+1)}_{i,r(k)} := a^{(k)}_{ik}/a^{(k)}_{r(k),k}$ for each $i \in \{r(k)+1,\dots,m\}$. In the last step, we obtain $L$ from $L^{(N)}$ by setting all elements on the diagonal to $1$ (postponing this step to the end avoids worrying about the diagonal elements when switching rows earlier).

To see that this procedure does, indeed, provide the correct $L$, we go back to the proof of Th. 8.34: From (8.38), (8.39), and (8.36), it follows that the $r(k)$th column of $L$ is precisely the $r(k)$th column of $L_k^{-1}$, permuted according to $P_N \cdots P_{k+1}$. This is precisely what is described in (c) above: We start with the $r(k)$th column of $L_k^{-1}$ by setting $l^{(k+1)}_{i,r(k)} := a^{(k)}_{ik}/a^{(k)}_{r(k),k}$ and then apply $P_{k+1},\dots,P_N$ during the remaining steps.

Remark 8.37. Note that in the $k$th step of the algorithm for $\tilde A$, we eliminate all elements of the $k$th column below the row with number $r(k)$, while in the $k$th step of the algorithm for $L$, we populate elements of the $r(k)$th column below the diagonal for the first time. Thus, when implementing the algorithm in practice, one can save memory capacity by storing the new elements for $L$ at the locations of the previously eliminated elements of $\tilde A$. This strategy is sometimes called compact storage.
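Steps (a)–(c) above can be combined into a single routine. The following Python sketch (not part of the notes) implements them for a square matrix over $\mathbb{Q}$, assuming the matrix is regular, so that each column yields a pivot (i.e. $r(k) = k$) after at most one row switch:

```python
from fractions import Fraction as Fr

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lu_decompose(A):
    """Return (P, L, U) with P A = L U, following Sec. 8.4.4 (a)-(c).

    Sketch for square regular A; entries are converted to exact fractions.
    """
    n = len(A)
    U = [[Fr(x) for x in row] for row in A]
    L = [[Fr(0)] * n for _ in range(n)]
    P = [[Fr(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:                    # row switch, Alg. 8.17(b)
            p = next(i for i in range(k + 1, n) if U[i][k] != 0)
            U[k], U[p] = U[p], U[k]
            P[k], P[p] = P[p], P[k]
            L[k], L[p] = L[p], L[k]         # permute earlier multipliers, cf. (8.36)
        for i in range(k + 1, n):           # elimination, Alg. 8.17(a)
            L[i][k] = U[i][k] / U[k][k]     # store the multiplier, cf. (8.25b)
            U[i] = [U[i][j] - L[i][k] * U[k][j] for j in range(n)]
    for i in range(n):                      # finally, put 1's on the diagonal
        L[i][i] = Fr(1)
    return P, L, U

A = [[1, 4, 2, 3], [0, 0, 1, 4], [2, 6, 3, 1], [1, 2, 1, 0]]
P, L, U = lu_decompose(A)
```

For this $4 \times 4$ matrix, the routine reproduces the matrices $P$, $L$, $\tilde A$ computed by hand in Ex. 8.38 below.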

Example 8.38. Let us determine the LU decomposition (8.32) (i.e. the matrices $P$, $L$, and $\tilde A$) for the matrix
$$A := \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & 0 & 1 & 4 \\ 2 & 6 & 3 & 1 \\ 1 & 2 & 1 & 0 \end{pmatrix} \in \mathcal{M}(4,4,\mathbb{R}).$$
According to the algorithm described above, we start by initializing
$$A^{(1)} := A, \qquad P^{(1)} := \mathrm{Id}_4, \qquad L^{(1)} := 0, \qquad r(1) := 1$$
(where the function $r$ is the one introduced in the Gaussian elimination Alg. 8.17). We use Alg. 8.17(a) to eliminate in the first column, obtaining
$$P^{(2)} = P^{(1)} = \mathrm{Id}_4, \qquad L^{(2)} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix},$$
$$A^{(2)} = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & 0 & 1 & 4 \\ 0 & -2 & -1 & -5 \\ 0 & -2 & -1 & -3 \end{pmatrix}, \qquad r(2) = r(1) + 1 = 2.$$
We now need to apply Alg. 8.17(b) and we switch rows 2 and 3 using $P_{23}$:
$$P^{(3)} = P_{23} P^{(2)} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
$$P_{23} L^{(2)} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad P_{23} A^{(2)} = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & -2 & -1 & -5 \\ 0 & 0 & 1 & 4 \\ 0 & -2 & -1 & -3 \end{pmatrix}.$$
Eliminating the second column yields
$$L^{(3)} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix}, \qquad A^{(3)} = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & -2 & -1 & -5 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 2 \end{pmatrix}, \qquad r(3) = r(2) + 1 = 3.$$
Accidentally, elimination of the third column does not require any additional work and we have
$$P = P^{(4)} = P^{(3)} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad L^{(4)} = L^{(3)}, \qquad L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix},$$
$$\tilde A = A^{(4)} = A^{(3)} = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & -2 & -1 & -5 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 2 \end{pmatrix}, \qquad r(4) = r(3) + 1 = 4.$$
Recall that $L$ is obtained from $L^{(4)}$ by setting the diagonal values to $1$. One checks that, indeed, $PA = L\tilde A$.

Example 8.39. We remain in the situation of the previous Ex. 8.38. We see that $\operatorname{rk} A = \operatorname{rk} \tilde A = 4$, showing $A$ to be invertible. In Ex. 8.2(c), we observed that we obtain the columns $v_k$ of $A^{-1}$ by solving the linear systems
$$Av_1 = e_1, \;\dots,\; Av_n = e_n,$$
where $n = 4$ in the present case and where $e_1,\dots,e_n$ denote the standard (column) basis vectors. We now determine the $v_k$ using the LU decomposition of Ex. 8.38 together with the strategies described in Rem. 8.36 and Rem. 8.25: First, with
$$L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix},$$
we solve
$$Lz_1 = Pe_1 = e_1, \qquad Lz_2 = Pe_2 = e_3, \qquad Lz_3 = Pe_3 = e_2, \qquad Lz_4 = Pe_4 = e_4,$$
using forward substitution:
$$z_{11} = 1, \quad z_{12} = -2z_{11} = -2, \quad z_{13} = 0, \quad z_{14} = -z_{11} - z_{12} = -1 + 2 = 1,$$
$$z_{21} = 0, \quad z_{22} = 0, \quad z_{23} = 1, \quad z_{24} = 0,$$
$$z_{31} = 0, \quad z_{32} = 1 - 2z_{31} = 1, \quad z_{33} = 0, \quad z_{34} = -z_{31} - z_{32} = -1,$$
$$z_{41} = 0, \quad z_{42} = 0, \quad z_{43} = 0, \quad z_{44} = 1,$$
obtaining the column vectors
$$z_1 = \begin{pmatrix} 1 \\ -2 \\ 0 \\ 1 \end{pmatrix}, \qquad z_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad z_3 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \end{pmatrix}, \qquad z_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$
Next, with
$$\tilde A = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & -2 & -1 & -5 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 2 \end{pmatrix},$$
we solve
$$\tilde A v_1 = z_1, \qquad \tilde A v_2 = z_2, \qquad \tilde A v_3 = z_3, \qquad \tilde A v_4 = z_4,$$
using back substitution:
$$v_{14} = \tfrac{1}{2}, \quad v_{13} = 0 - 4v_{14} = -2, \quad v_{12} = \left(-\tfrac{1}{2}\right)(-2 + v_{13} + 5v_{14}) = \tfrac{3}{4}, \quad v_{11} = 1 - 4v_{12} - 2v_{13} - 3v_{14} = \tfrac{1}{2},$$
$$v_{24} = 0, \quad v_{23} = 1 - 4v_{24} = 1, \quad v_{22} = \left(-\tfrac{1}{2}\right)(0 + v_{23} + 5v_{24}) = -\tfrac{1}{2}, \quad v_{21} = 0 - 4v_{22} - 2v_{23} - 3v_{24} = 0,$$
$$v_{34} = -\tfrac{1}{2}, \quad v_{33} = 0 - 4v_{34} = 2, \quad v_{32} = \left(-\tfrac{1}{2}\right)(1 + v_{33} + 5v_{34}) = -\tfrac{1}{4}, \quad v_{31} = 0 - 4v_{32} - 2v_{33} - 3v_{34} = -\tfrac{3}{2},$$
$$v_{44} = \tfrac{1}{2}, \quad v_{43} = 0 - 4v_{44} = -2, \quad v_{42} = \left(-\tfrac{1}{2}\right)(0 + v_{43} + 5v_{44}) = -\tfrac{1}{4}, \quad v_{41} = 0 - 4v_{42} - 2v_{43} - 3v_{44} = \tfrac{7}{2},$$
obtaining the column vectors
$$v_1 = \begin{pmatrix} \tfrac{1}{2} \\ \tfrac{3}{4} \\ -2 \\ \tfrac{1}{2} \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 0 \\ -\tfrac{1}{2} \\ 1 \\ 0 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} -\tfrac{3}{2} \\ -\tfrac{1}{4} \\ 2 \\ -\tfrac{1}{2} \end{pmatrix}, \qquad v_4 = \begin{pmatrix} \tfrac{7}{2} \\ -\tfrac{1}{4} \\ -2 \\ \tfrac{1}{2} \end{pmatrix}.$$
Thus,
$$A^{-1} = \frac{1}{4} \begin{pmatrix} 2 & 0 & -6 & 14 \\ 3 & -2 & -1 & -1 \\ -8 & 4 & 8 & -8 \\ 2 & 0 & -2 & 2 \end{pmatrix}.$$
One verifies
$$AA^{-1} = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 0 & 0 & 1 & 4 \\ 2 & 6 & 3 & 1 \\ 1 & 2 & 1 & 0 \end{pmatrix} \cdot \frac{1}{4} \begin{pmatrix} 2 & 0 & -6 & 14 \\ 3 & -2 & -1 & -1 \\ -8 & 4 & 8 & -8 \\ 2 & 0 & -2 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
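The verification above can be replayed with exact rational arithmetic; a short Python sketch (not part of the notes):

```python
from fractions import Fraction as Fr

# The matrix of Ex. 8.38 and the inverse computed in Ex. 8.39:
A = [[1, 4, 2, 3], [0, 0, 1, 4], [2, 6, 3, 1], [1, 2, 1, 0]]
Ainv = [[Fr(x, 4) for x in row] for row in
        [[2, 0, -6, 14], [3, -2, -1, -1], [-8, 4, 8, -8], [2, 0, -2, 2]]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

identity = [[Fr(int(i == j)) for j in range(4)] for i in range(4)]
```

Since `Fraction` arithmetic is exact, the product `matmul(A, Ainv)` equals the identity matrix with no rounding involved.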


A Axiomatic Set Theory

A.1 Motivation, Russell’s Antinomy

As it turns out, naive set theory, founded on the definition of a set according to Cantor (as stated at the beginning of Sec. 1.3), is not suitable to be used in the foundation of mathematics. The problem lies in the possibility of obtaining contradictions such as Russell's antinomy, named after Bertrand Russell, who described it in 1901.

Russell's antinomy is obtained when considering the set $X$ of all sets that do not contain themselves as an element: When asking whether $X \in X$, one obtains the contradiction that $X \in X \Leftrightarrow X \notin X$:

Suppose $X \in X$. Then $X$ is a set that contains itself. But $X$ was defined to contain only sets that do not contain themselves, i.e. $X \notin X$.

So suppose $X \notin X$. Then $X$ is a set that does not contain itself. Thus, by the definition of $X$, $X \in X$.

Perhaps you think Russell's construction is rather academic, but it is easily translated into a practical situation. Consider a library. The catalog $C$ of the library should contain all the library's books. Since the catalog itself is a book of the library, it should occur as an entry in the catalog. So there can be catalogs such as $C$ that have themselves as an entry and there can be other catalogs that do not have themselves as an entry. Now one might want to have a catalog $X$ of all catalogs that do not have themselves as an entry. As in Russell's antinomy, one is led to the contradiction that the catalog $X$ must have itself as an entry if, and only if, it does not have itself as an entry.

One can construct arbitrarily many versions, which we will not do. Just one more: Consider a small town with a barber who, each day, shaves all inhabitants who do not shave themselves. The poor barber now faces a terrible dilemma: He will have to shave himself if, and only if, he does not shave himself.

To avoid contradictions such as Russell's antinomy, axiomatic set theory restricts the construction of sets via so-called axioms, as we will see below.

A.2 Set-Theoretic Formulas

The contradiction of Russell's antinomy is related to Cantor's sets not being hierarchical. Another source of contradictions in naive set theory is the imprecise nature of informal languages such as English. In (1.6), we said that
$$A := \{x \in B : P(x)\}$$
defines a subset of $B$ if $P(x)$ is a statement about an element $x$ of $B$. Now take $B := \mathbb{N} := \{1, 2, \dots\}$ to be the set of the natural numbers and let
$$P(x) := \text{``The number } x \text{ can be defined by fifty English words or less''}. \tag{A.1}$$


Then $A$ is a finite subset of $\mathbb{N}$, since there are only finitely many English words (if you think there might be infinitely many English words, just restrict yourself to the words contained in some concrete dictionary). Then there is a smallest natural number $n$ that is not in $A$. But then $n$ is the smallest natural number that cannot be defined by fifty English words or less, which, actually, defines $n$ by less than fifty English words, in contradiction to $n \notin A$.

To avoid contradictions of this type, we require $P(x)$ to be a so-called set-theoretic formula.

Definition A.1. (a) The language of set theory consists precisely of the following symbols: $\wedge, \neg, \exists, (, ), \in, =, v_j$, where $j = 1, 2, \dots$.

(b) A set-theoretic formula is a finite string of symbols from the above language of set theory that can be built using the following recursive rules:

(i) $v_i \in v_j$ is a set-theoretic formula for $i, j = 1, 2, \dots$.

(ii) $v_i = v_j$ is a set-theoretic formula for $i, j = 1, 2, \dots$.

(iii) If $\phi$ and $\psi$ are set-theoretic formulas, then $(\phi) \wedge (\psi)$ is a set-theoretic formula.

(iv) If $\phi$ is a set-theoretic formula, then $\neg(\phi)$ is a set-theoretic formula.

(v) If $\phi$ is a set-theoretic formula, then $\exists v_j(\phi)$ is a set-theoretic formula for $j = 1, 2, \dots$.

Example A.2. Examples of set-theoretic formulas are $(v_3 \in v_5) \wedge (\neg(v_2 = v_3))$ and $\exists v_1(\neg(v_1 = v_1))$; examples of symbol strings that are not set-theoretic formulas are $v_1 \in v_2 \in v_3$, $\exists\exists\neg$, and $\in v_3 \exists$.

Remark A.3. It is noted that, for a given finite string of symbols, a computer can, in principle, check in finitely many steps whether the string constitutes a set-theoretic formula or not. The symbols that can occur in a set-theoretic formula are to be interpreted as follows: The variables $v_1, v_2, \dots$ are variables for sets. The symbols $\wedge$ and $\neg$ are to be interpreted as the logical operators of conjunction and negation as described in Sec. 1.2.2. Similarly, $\exists$ stands for an existential quantifier as in Sec. 1.4: The statement $\exists v_j(\phi)$ means "there exists a set $v_j$ that has the property $\phi$". Parentheses $($ and $)$ are used to make clear the scope of the logical symbols $\exists, \wedge, \neg$. Where the symbol $\in$ occurs, it is interpreted to mean that the set to the left of $\in$ is contained as an element in the set to the right of $\in$. Similarly, $=$ is interpreted to mean that the sets occurring to the left and to the right of $=$ are equal.
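The mechanical checkability just noted can be illustrated with a toy recognizer (a Python sketch, not part of the notes) that tries the five formation rules of Def. A.1 recursively:

```python
import re

VAR = r"v[1-9][0-9]*"   # the set variables v1, v2, ...

def is_formula(s):
    # Rules (i), (ii): atomic formulas  vi∈vj  and  vi=vj .
    if re.fullmatch(VAR + "[∈=]" + VAR, s):
        return True
    # Rule (iv): ¬(φ).
    if s.startswith("¬(") and s.endswith(")") and is_formula(s[2:-1]):
        return True
    # Rule (v): ∃vj(φ).
    m = re.fullmatch("∃(" + VAR + r")\((.*)\)", s)
    if m and is_formula(m.group(2)):
        return True
    # Rule (iii): (φ)∧(ψ) -- try every position for the middle ")∧(".
    if s.startswith("(") and s.endswith(")"):
        for i in range(1, len(s) - 1):
            if s[i:i + 3] == ")∧(" and is_formula(s[1:i]) and is_formula(s[i + 3:-1]):
                return True
    return False
```

Each recursive call operates on a strictly shorter string, so the check terminates; the recognizer accepts the two formulas of Ex. A.2 and rejects the three non-formulas listed there.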

Remark A.4. A disadvantage of set-theoretic formulas as defined in Def. A.1 is that they quickly become lengthy and unreadable (at least to the human eye). To make formulas more readable and concise, one introduces additional symbols and notation. Formally, additional symbols and notation are always to be interpreted as abbreviations or transcriptions of actual set-theoretic formulas. For example, we use the rules of Th. 1.11 to define the additional logical symbols ∨, ⇒, ⇔ as abbreviations:

(φ) ∨ (ψ) is short for ¬((¬(φ)) ∧ (¬(ψ))) (cf. Th. 1.11(j)), (A.2a)

(φ) ⇒ (ψ) is short for (¬(φ)) ∨ (ψ) (cf. Th. 1.11(a)), (A.2b)

(φ) ⇔ (ψ) is short for ((φ) ⇒ (ψ)) ∧ ((ψ) ⇒ (φ)) (cf. Th. 1.11(b)). (A.2c)

Similarly, we use (1.17a) to define the universal quantifier:

∀vj(φ) is short for ¬(∃vj(¬(φ))). (A.2d)

Further abbreviations and transcriptions are obtained from omitting parentheses if it is clear from the context and/or from Convention 1.10 where to put them in, by writing variables bound by quantifiers under the respective quantifiers (as in Sec. 1.4), and by using other symbols than vj for set variables. For example,

∀x (φ ⇒ ψ) transcribes ¬(∃v1(¬((¬(φ)) ∨ (ψ)))).

Moreover,

vi ≠ vj is short for ¬(vi = vj); vi ∉ vj is short for ¬(vi ∈ vj). (A.2e)

Remark A.5. Even though axiomatic set theory requires the use of set-theoretic formulas as described above, the systematic study of formal symbolic languages is the subject of the field of mathematical logic and is beyond the scope of this class (see, e.g., [EFT07]). In Def. and Rem. 1.15, we defined a proof of statement B from statement A1 as a finite sequence of statements A1, A2, . . . , An such that, for 1 ≤ i < n, Ai implies Ai+1, and An implies B. In the field of proof theory, also beyond the scope of this class, such proofs are formalized via a finite set of rules that can be applied to (set-theoretic) formulas (see, e.g., [EFT07, Sec. IV], [Kun12, Sec. II]). Once proofs have been formalized in this way, one can, in principle, mechanically check if a given sequence of symbols does, indeed, constitute a valid proof (without even having to understand the actual meaning of the statements). Indeed, several different computer programs have been devised that can be used for automatic proof checking, for example Coq [Wik15a], HOL Light [Wik15b], and Isabelle [Wik15c] to name just a few.

A.3 The Axioms of Zermelo-Fraenkel Set Theory

Axiomatic set theory seems to provide a solid and consistent foundation for conducting mathematics, and most mathematicians have accepted it as the basis of their everyday work. However, there do remain some deep, difficult, and subtle philosophical issues regarding the foundation of logic and mathematics (see, e.g., [Kun12, Sec. 0, Sec. III]).

Definition and Remark A.6. An axiom is a statement that is assumed to be true without any formal logical justification. The most basic axioms (for example, the standard axioms of set theory) are taken to be justified by common sense or some underlying philosophy. However, on a less fundamental (and less philosophical) level, it is a common mathematical strategy to state a number of axioms (for example, the axioms defining the mathematical structure called a group), and then to study the logical consequences of these axioms (for example, group theory studies the statements that are true for all groups as a consequence of the group axioms). For a given system of axioms, the question if there exists an object satisfying all the axioms in the system (i.e. if the system of axioms is consistent, i.e. free of contradictions) can be extremely difficult to answer.

We are now in a position to formulate and discuss the axioms of axiomatic set theory. More precisely, we will present the axioms of Zermelo-Fraenkel set theory, usually abbreviated as ZF, which are Axiom 0 – Axiom 8 below. While there exist various set theories in the literature, each set theory defined by some collection of axioms, the axioms of ZFC, consisting of the axioms of ZF plus the axiom of choice (Axiom 9, see Sec. A.4 below), are used as the foundation of mathematics currently accepted by most mathematicians.

A.3.1 Existence, Extensionality, Comprehension

Axiom 0 Existence:

∃X (X = X).

Recall that this is just meant to be a more readable transcription of the set-theoretic formula ∃v1(v1 = v1). The axiom of existence states that there exists (at least one) set X.

In Def. 1.18 two sets are defined to be equal if, and only if, they contain precisely the same elements. In axiomatic set theory, this is guaranteed by the axiom of extensionality:

Axiom 1 Extensionality:

∀X ∀Y (∀z (z ∈ X ⇔ z ∈ Y) ⇒ X = Y).

Following [Kun12], we assume that the substitution property of equality is part of the underlying logic, i.e. if X = Y, then X can be substituted for Y and vice versa without changing the truth value of a (set-theoretic) formula. In particular, this yields the converse to extensionality:

∀X ∀Y (X = Y ⇒ ∀z (z ∈ X ⇔ z ∈ Y)).

Before we discuss further consequences of extensionality, we would like to have the existence of the empty set. However, Axioms 0 and 1 do not suffice to prove the existence of an empty set (see [Kun12, I.6.3]). This, rather, needs the additional axiom of comprehension. More precisely, in the case of comprehension, we do not have a single axiom, but a scheme of infinitely many axioms, one for each set-theoretic formula. Its formulation makes use of the following definition:


Definition A.7. One obtains the universal closure of a set-theoretic formula φ, by writing ∀vj in front of φ for each variable vj that occurs as a free variable in φ (recall from Def. 1.31 that vj is free in φ if, and only if, it is not bound by a quantifier in φ).

Axiom 2 Comprehension Scheme: For each set-theoretic formula φ, not containing Y as a free variable, the universal closure of

∃Y ∀x (x ∈ Y ⇔ (x ∈ X ∧ φ))

is an axiom. Thus, the comprehension scheme states that, given the set X, there exists (at least one) set Y, containing precisely the elements of X that have the property φ.

Remark A.8. Comprehension does not provide uniqueness. However, if both Y and Y′ are sets containing precisely the elements of X that have the property φ, then

∀x (x ∈ Y ⇔ (x ∈ X ∧ φ) ⇔ x ∈ Y′),

and, then, extensionality implies Y = Y′. Thus, due to extensionality, the set Y given by comprehension is unique, justifying the notation

{x : x ∈ X ∧ φ} := {x ∈ X : φ} := Y (A.3)

(this is the axiomatic justification for (1.6)).

Theorem A.9. There exists a unique empty set (which we denote by ∅ or by 0 – it is common to identify the empty set with the number zero in axiomatic set theory).

Proof. Axiom 0 provides the existence of a set X. Then comprehension allows us to define the empty set by

0 := ∅ := {x ∈ X : x ≠ x},

where, as explained in Rem. A.8, extensionality guarantees uniqueness. �
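As an illustrative aside (not part of the notes): for finite sets and decidable properties, the set given by comprehension, {x ∈ X : φ}, has a direct computational counterpart. The helper name `comprehend` below is ours; this is only a sketch of the idea, not part of the axiomatic development.

```python
# Illustrative sketch: comprehension over a finite set X with a
# decidable property phi yields the set {x in X : phi(x)}.

def comprehend(X, phi):
    """Return {x ∈ X : phi(x)} for a finite set X."""
    return frozenset(x for x in X if phi(x))

X = frozenset({1, 2, 3, 4})

# {x ∈ X : x is even}
assert comprehend(X, lambda x: x % 2 == 0) == frozenset({2, 4})

# The empty set of Th. A.9 as {x ∈ X : x ≠ x}:
assert comprehend(X, lambda x: x != x) == frozenset()
```

Note that, exactly as in Axiom 2, the construction always filters an already given set X; it never produces a set {x : φ} out of thin air.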

Remark A.10. In Rem. A.4 we said that every formula with additional symbols and notation is to be regarded as an abbreviation or transcription of a set-theoretic formula as defined in Def. A.1(b). Thus, formulas containing symbols for defined sets (e.g. 0 or ∅ for the empty set) are to be regarded as abbreviations for formulas without such symbols. Some logical subtleties arise from the fact that there is some ambiguity in the way such abbreviations can be resolved: For example, 0 ∈ X can abbreviate either

ψ : ∃y (φ(y) ∧ y ∈ X) or χ : ∀y (φ(y) ⇒ y ∈ X), where φ(y) stands for ∀v (v ∉ y).

Then ψ and χ are equivalent if ∃y! φ(y) is true (e.g., if Axioms 0 – 2 hold), but they can be nonequivalent, otherwise (see discussion between Lem. 2.9 and Lem. 2.10 in [Kun80]).


At first glance, the role played by the free variables in φ, which are allowed to occur in Axiom 2, might seem a bit obscure. So let us consider examples to illustrate that allowing free variables (i.e. set parameters) in comprehension is quite natural:

Example A.11. (a) Suppose φ in comprehension is the formula x ∈ Z (having Z as a free variable), then the set given by the resulting axiom is merely the intersection of X and Z:

X ∩ Z := {x ∈ X : φ} = {x ∈ X : x ∈ Z}.

(b) Note that it is even allowed for φ in comprehension to have X as a free variable, so one can let φ be the formula ∃u (x ∈ u ∧ u ∈ X) to define the set

X∗ := {x ∈ X : ∃u (x ∈ u ∧ u ∈ X)}.

Then, if 0 := ∅, 1 := {0}, 2 := {0, 1}, we obtain

2∗ = {0} = 1.
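As an illustrative aside (not part of the notes), the computation 2∗ = 1 can be replayed by modeling the hereditarily finite sets 0, 1, 2 as nested Python frozensets; the helper name `star` is ours.

```python
# Sketch of Example A.11(b): X* = {x in X : there is u with x in u and u in X}.

def star(X):
    """Return {x ∈ X : ∃u (x ∈ u ∧ u ∈ X)} for a finite set of sets X."""
    return frozenset(x for x in X if any(x in u for u in X))

ZERO = frozenset()           # 0 := ∅
ONE = frozenset({ZERO})      # 1 := {0}
TWO = frozenset({ZERO, ONE}) # 2 := {0, 1}

# 0 lies in 1 ∈ 2, but 1 lies in no element of 2, hence 2* = {0} = 1:
assert star(TWO) == ONE
```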

It is a consequence of extensionality that the mathematical universe consists of sets and only of sets: Suppose there were other objects in the mathematical universe, for example a cow C and a monkey M (or any other object without elements, other than the empty set) – this would be equivalent to allowing a cow or a monkey (or any other object without elements, other than the empty set) to be considered a set, which would mean that our set-theoretic variables vj were allowed to be a cow or a monkey as well. However, extensionality then implies the false statement C = M = ∅, thereby excluding cows and monkeys from the mathematical universe.

Similarly, {C} and {M} (or any other object that contains a non-set), can not be inside the mathematical universe. Indeed, otherwise we had

∀x (x ∈ {C} ⇔ x ∈ {M})

(as C and M are non-sets) and, by extensionality, {C} = {M} were true, in contradiction to a set with a cow inside not being the same as a set with a monkey inside. Thus, we see that all objects of the mathematical universe must be so-called hereditary sets, i.e. sets all of whose elements (thinking of the elements as being the children of the sets) are also sets.

A.3.2 Classes

As we need to avoid contradictions such as Russell’s antinomy, we must not require the existence of a set {x : φ} for each set-theoretic formula φ. However, it can still be useful to think of a “collection” of all sets having the property φ. Such collections are commonly called classes:


Definition A.12. (a) If φ is a set-theoretic formula, then we call {x : φ} a class, namely the class of all sets that have the property φ (typically, φ will have x as a free variable).

(b) If φ is a set-theoretic formula, then we say the class {x : φ} exists (as a set) if, and only if,

∃X (∀x (x ∈ X ⇔ φ)) (A.4)

is true. Then X is actually unique by extensionality and we identify X with the class {x : φ}. If (A.4) is false, then {x : φ} is called a proper class (and the usual interpretation is that the class is in some sense “too large” to be a set).

Example A.13. (a) Due to Russell’s antinomy of Sec. A.1, we know that R := {x : x ∉ x} forms a proper class.

(b) The universal class of all sets, V := {x : x = x}, is a proper class. Once again, this is related to Russell’s antinomy: If V were a set, then

R = {x : x ∉ x} = {x : x = x ∧ x ∉ x} = {x : x ∈ V ∧ x ∉ x}

would also be a set by comprehension. However, this is in contradiction to R being a proper class by (a).

Remark A.14. From the perspective of formal logic, statements involving proper classes are to be regarded as abbreviations for statements without proper classes. For example, it turns out that the class G of all sets forming a group is a proper class. But we might write G ∈ G as an abbreviation for the statement “The set G is a group.”

A.3.3 Pairing, Union, Replacement

Axioms 0 – 2 are still consistent with the empty set being the only set in existence (see [Kun12, I.6.13]). The next axiom provides the existence of nonempty sets:

Axiom 3 Pairing:

∀x ∀y ∃Z (x ∈ Z ∧ y ∈ Z).

Thus, the pairing axiom states that, for all sets x and y, there exists a set Z that contains x and y as elements.

In consequence of the pairing axiom, the sets

0 := ∅, (A.5a)

1 := {0}, (A.5b)

2 := {0, 1} (A.5c)

all exist. More generally, we may define:


Definition A.15. If x, y are sets and Z is given by the pairing axiom, then we call

(a) {x, y} := {u ∈ Z : u = x ∨ u = y} the unordered pair given by x and y,

(b) {x} := {x, x} the singleton set given by x,

(c) (x, y) := {{x}, {x, y}} the ordered pair given by x and y (cf. Def. 2.1).
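As an illustrative aside (not part of the notes), the Kuratowski pair of Def. A.15(c) can be modeled by nesting Python frozensets; the helper name `kpair` is ours. The assertions replay, on small examples, the behavior established in Lem. A.16 below.

```python
# Sketch of Def. A.15(c): the ordered pair (x, y) := {{x}, {x, y}}.

def kpair(x, y):
    """Kuratowski pair of two (hashable) sets x and y."""
    return frozenset({frozenset({x}), frozenset({x, y})})

a = frozenset()            # the set 0
b = frozenset({a})         # the set 1

assert kpair(a, b) == kpair(a, b)
assert kpair(a, b) != kpair(b, a)                  # order matters
assert kpair(a, a) == frozenset({frozenset({a})})  # (x, x) = {{x}}
```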

We can now show that ordered pairs behave as expected:

Lemma A.16. The following holds true:

∀x,y,x′,y′ ((x, y) = (x′, y′) ⇔ (x = x′) ∧ (y = y′)).

Proof. “⇐” is merely

(x, y) = {{x}, {x, y}} = {{x′}, {x′, y′}} = (x′, y′),

where the middle equality uses x = x′ and y = y′.

“⇒” is done by distinguishing two cases: If x = y, then

{{x}} = (x, y) = (x′, y′) = {{x′}, {x′, y′}}.

Next, by extensionality, we first get {x} = {x′} = {x′, y′}, followed by x = x′ = y′, establishing the case. If x ≠ y, then

{{x}, {x, y}} = (x, y) = (x′, y′) = {{x′}, {x′, y′}},

where, by extensionality, {x} ≠ {x, y} ≠ {x′}. Thus, using extensionality again, {x} = {x′} and x = x′. Next, we conclude

{x, y} = {x′, y′} = {x, y′}

and a last application of extensionality yields y = y′. �

While we now have the existence of the infinitely many different sets 0, {0}, {{0}}, . . . , we are not, yet, able to form sets containing more than two elements. This is remedied by the following axiom:

Axiom 4 Union:

∀M ∃Y ∀x ∀X ((x ∈ X ∧ X ∈ M) ⇒ x ∈ Y).

Thus, the union axiom states that, for each set of sets M, there exists a set Y containing all elements of elements of M.


Definition A.17. (a) If M is a set and Y is given by the union axiom, then define

⋃M := ⋃_{X∈M} X := {x ∈ Y : ∃X∈M x ∈ X}.

(b) If X and Y are sets, then define

X ∪ Y := ⋃{X, Y}.

(c) If x, y, z are sets, then define

{x, y, z} := {x, y} ∪ {z}.
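As an illustrative aside (not part of the notes), the constructions of Def. A.17 can be replayed on finite sets with Python frozensets; the helper names `big_union` and `union` are ours.

```python
# Sketch of Def. A.17: the union over a set of sets M collects the
# elements of elements of M; X ∪ Y is then the union over {X, Y}.

def big_union(M):
    """Return ⋃M for a finite set of sets M."""
    out = frozenset()
    for X in M:
        out |= X
    return out

def union(X, Y):
    """Return X ∪ Y = ⋃{X, Y} (Def. A.17(b))."""
    return big_union(frozenset({X, Y}))

A = frozenset({1, 2})
B = frozenset({2, 3})

assert big_union(frozenset({A, B})) == frozenset({1, 2, 3})
assert union(A, B) == frozenset({1, 2, 3})
assert big_union(frozenset()) == frozenset()   # ⋃∅ = ∅, cf. Rem. A.18(c)
```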

Remark A.18. (a) The definition of set-theoretic unions as

⋃_{i∈I} Ai := {x : ∃i∈I x ∈ Ai}

in (1.25b) will be equivalent to the definition in Def. A.17(a) if we are allowed to form the set

M := {Ai : i ∈ I}.

If I is a set and Ai is a set for each i ∈ I, then M as above will be a set by Axiom 5 below (the axiom of replacement).

(b) In contrast to unions, intersections can be obtained directly from comprehension without the introduction of an additional axiom: For example,

X ∩ Y := {x ∈ X : x ∈ Y}, ⋂_{i∈I} Ai := {x ∈ Ai0 : ∀i∈I x ∈ Ai},

where i0 ∈ I ≠ ∅ is an arbitrary fixed element of I.

(c) The union

⋃∅ = ⋃_{X∈∅} X = ⋃_{i∈∅} Ai = ∅

is the empty set – in particular, a set. However,

⋂∅ = {x : ∀X∈∅ x ∈ X} = V = {x : ∀i∈∅ x ∈ Ai} = ⋂_{i∈∅} Ai,

i.e. the intersection over the empty set is the class of all sets – in particular, a proper class and not a set.


Definition A.19. We define the successor function

x ↦ S(x) := x ∪ {x} (for each set x).

Thus, recalling (A.5), we have 1 = S(0), 2 = S(1); and we can define 3 := S(2), . . . In general, we call the set S(x) the successor of the set x.
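As an illustrative aside (not part of the notes), the successor function and the von Neumann numerals 0, 1, 2, 3 become executable when hereditarily finite sets are modeled as Python frozensets; the name `successor` is ours.

```python
# Sketch of Def. A.19: S(x) = x ∪ {x}, applied to the sets of (A.5).

def successor(x: frozenset) -> frozenset:
    """Return S(x) = x ∪ {x}."""
    return x | frozenset({x})

ZERO = frozenset()        # 0 := ∅
ONE = successor(ZERO)     # 1 = S(0) = {0}
TWO = successor(ONE)      # 2 = S(1) = {0, 1}
THREE = successor(TWO)    # 3 = S(2) = {0, 1, 2}

assert ONE == frozenset({ZERO})
assert TWO == frozenset({ZERO, ONE})
assert THREE == frozenset({ZERO, ONE, TWO})
```

Note how each numeral is literally the set of all smaller numerals, which is exactly the picture behind the ordinals studied below.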

In Def. 2.3 and Def. 2.19, respectively, we define functions and relations in the usual manner, making use of the Cartesian product A × B of two sets A and B, which, according to (2.2), consists of all ordered pairs (x, y), where x ∈ A and y ∈ B. However, Axioms 0 – 4 are not sufficient to justify the existence of Cartesian products. To obtain Cartesian products, we employ the axiom of replacement. Analogous to the axiom of comprehension, the following axiom of replacement actually consists of a scheme of infinitely many axioms, one for each set-theoretic formula:

Axiom 5 Replacement Scheme: For each set-theoretic formula φ, not containing Y as a free variable, the universal closure of

(∀x∈X ∃y! φ) ⇒ (∃Y ∀x∈X ∃y∈Y φ)

is an axiom. Thus, the replacement scheme states that if, for each x ∈ X, there exists a unique y having the property φ (where, in general, φ will depend on x), then there exists a set Y that, for each x ∈ X, contains this y with property φ. One can view this as obtaining Y by replacing each x ∈ X by the corresponding y = y(x).

Theorem A.20. If A and B are sets, then the Cartesian product of A and B, i.e. the class

A × B := {x : ∃a∈A ∃b∈B x = (a, b)}

exists as a set.

Proof. For each a ∈ A, we can use replacement with X := B and φ := φa being the formula y = (a, x) to obtain the existence of the set

{a} × B := {(a, x) : x ∈ B} (A.6a)

(in the usual way, comprehension and extensionality were used as well). Analogously, using replacement again with X := A and φ being the formula y = {x} × B, we obtain the existence of the set

M := {{x} × B : x ∈ A}. (A.6b)

In a final step, the union axiom now shows

⋃M = ⋃_{a∈A} {a} × B = A × B (A.6c)

to be a set as well. �
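As an illustrative aside (not part of the notes), the two replacement steps and the final union step of the proof above can be mirrored on finite sets; the helper names `kpair` and `product` are ours.

```python
# Sketch of the proof of Th. A.20, using Kuratowski pairs (x, y) = {{x}, {x, y}}.

def kpair(x, y):
    return frozenset({frozenset({x}), frozenset({x, y})})

def product(A, B):
    """Return A × B, built in the same steps as the proof of Th. A.20."""
    # replacement (A.6a)/(A.6b): form the set M = {{a} × B : a ∈ A}
    slices = frozenset(frozenset(kpair(a, b) for b in B) for a in A)
    # union axiom (A.6c): ⋃M = A × B
    out = frozenset()
    for s in slices:
        out |= s
    return out

A = frozenset({1, 2})
B = frozenset({3})

assert product(A, B) == frozenset({kpair(1, 3), kpair(2, 3)})
```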


A.3.4 Infinity, Ordinals, Natural Numbers

The following axiom of infinity guarantees the existence of infinite sets (e.g., it will allow us to define the set of natural numbers N, which is infinite by Th. A.46 below).

Axiom 6 Infinity:

∃X (0 ∈ X ∧ ∀x∈X (x ∪ {x} ∈ X)).

Thus, the infinity axiom states the existence of a set X containing ∅ (identified with the number 0), and, for each of its elements x, its successor S(x) = x ∪ {x}.

In preparation for our official definition of N in Def. A.41 below, we will study so-called ordinals, which are special sets also of further interest to the field of set theory (the natural numbers will turn out to be precisely the finite ordinals). We also need some notions from the theory of relations, in particular, order relations (cf. Def. 2.19 and Def. 2.23).

Definition A.21. Let R be a relation on a set X.

(a) R is called asymmetric if, and only if,

∀x,y∈X (xRy ⇒ ¬(yRx)), (A.7)

i.e. if x is related to y only if y is not related to x.

(b) R is called a strict partial order if, and only if, R is asymmetric and transitive. It is noted that this is consistent with Not. 2.24, since, recalling the notation ∆(X) := {(x, x) : x ∈ X}, R is a partial order on X if, and only if, R \ ∆(X) is a strict partial order on X. We extend the notions lower/upper bound, min, max, inf, sup of Def. 2.25 to strict partial orders R by applying them to R ∪ ∆(X): We call x ∈ X a lower bound of Y ⊆ X with respect to R if, and only if, x is a lower bound of Y with respect to R ∪ ∆(X), and analogously for the other notions.

(c) A strict partial order R is called a strict total order or a strict linear order if, andonly if, for each x, y ∈ X, one has x = y or xRy or yRx.

(d) R is called a (strict) well-order if, and only if, R is a (strict) total order and every nonempty subset of X has a min with respect to R (for example, the usual ≤ constitutes a well-order on N (see, e.g., [Phi16, Th. D.5]), but not on R (e.g., R+ does not have a min)).

(e) If Y ⊆ X, then the relation on Y defined by

xSy :⇔ xRy

is called the restriction of R to Y, denoted S = R↾Y (usually, one still writes R for the restriction).


Lemma A.22. Let R be a relation on a set X and Y ⊆ X.

(a) If R is transitive, then R↾Y is transitive.

(b) If R is reflexive, then R↾Y is reflexive.

(c) If R is antisymmetric, then R↾Y is antisymmetric.

(d) If R is asymmetric, then R↾Y is asymmetric.

(e) If R is a (strict) partial order, then R↾Y is a (strict) partial order.

(f) If R is a (strict) total order, then R↾Y is a (strict) total order.

(g) If R is a (strict) well-order, then R↾Y is a (strict) well-order.

Proof. (a): If a, b, c ∈ Y with aRb and bRc, then aRc, since a, b, c ∈ X and R is transitive on X.

(b): If a ∈ Y , then a ∈ X and aRa, since R is reflexive on X.

(c): If a, b ∈ Y with aRb and bRa, then a = b, since a, b ∈ X and R is antisymmetric on X.

(d): If a, b ∈ Y with aRb, then ¬bRa, since a, b ∈ X and R is asymmetric on X.

(e) follows by combining (a) – (d).

(f): If a, b ∈ Y with a ≠ b and ¬aRb, then bRa, since a, b ∈ X and R is total on X. Combining this with (e) yields (f).

(g): Due to (f), it merely remains to show that every nonempty subset Z ⊆ Y has a min. However, since Z ⊆ X and R is a well-order on X, there is m ∈ Z such that m is a min for R on X, implying m to be a min for R on Y as well. �

Remark A.23. Since the universal class V is not a set, ∈ is not a relation in the sense of Def. 2.19. It can be considered as a “class relation”, i.e. a subclass of V × V, but it is a proper class. However, ∈ does constitute a relation in the sense of Def. 2.19 on each set X (recalling that each element of X must be a set as well). More precisely, if X is a set, then so is

R∈ := {(x, y) ∈ X × X : x ∈ y}. (A.8a)

Then

∀x,y∈X ((x, y) ∈ R∈ ⇔ x ∈ y). (A.8b)

Definition A.24. A set X is called transitive if, and only if, every element of X is also a subset of X:

∀x∈X x ⊆ X. (A.9a)

Clearly, (A.9a) is equivalent to

∀x,y (x ∈ y ∧ y ∈ X ⇒ x ∈ X). (A.9b)


Lemma A.25. If X, Y are transitive sets, then X ∩ Y is a transitive set.

Proof. If x ∈ X ∩ Y and y ∈ x, then y ∈ X (since X is transitive) and y ∈ Y (since Y is transitive). Thus y ∈ X ∩ Y, showing X ∩ Y is transitive. �

Definition A.26. (a) A set α is called an ordinal number or just an ordinal if, and only if, α is transitive and ∈ constitutes a strict well-order on α. An ordinal α is called a successor ordinal if, and only if, there exists an ordinal β such that α = S(β), where S is the successor function of Def. A.19. An ordinal α ≠ 0 is called a limit ordinal if, and only if, it is not a successor ordinal. We denote the class of all ordinals by ON (it is a proper class by Cor. A.33 below).

(b) We define

∀α,β∈ON (α < β :⇔ α ∈ β), (A.10a)

∀α,β∈ON (α ≤ β :⇔ α < β ∨ α = β). (A.10b)

Example A.27. Using (A.5), 0 = ∅ is an ordinal, and 1 = S(0), 2 = S(1) are both successor ordinals (in Prop. A.43, we will identify N0 as the smallest limit ordinal). Even though X := {1} and Y := {0, 2} are well-ordered by ∈, they are not ordinals, since they are not transitive sets: 1 ∈ X, but 1 ⊈ X (since 0 ∈ 1, but 0 ∉ X); similarly, 1 ∈ 2 ∈ Y, but 1 ∉ Y.
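As an illustrative aside (not part of the notes), the transitivity test of Def. A.24 is directly computable for hereditarily finite sets modeled as frozensets; the helper name `is_transitive` is ours, and the assertions replay Example A.27.

```python
# Sketch of Def. A.24: X is transitive iff every element of X is a subset of X.

def is_transitive(X):
    """Check ∀x∈X x ⊆ X; for frozensets, x ⊆ X is x <= X."""
    return all(x <= X for x in X)

ZERO = frozenset()           # 0
ONE = frozenset({ZERO})      # 1 = {0}
TWO = frozenset({ZERO, ONE}) # 2 = {0, 1}

assert is_transitive(TWO)                         # ordinals are transitive
assert not is_transitive(frozenset({ONE}))        # X = {1}: 0 ∈ 1, but 0 ∉ X
assert not is_transitive(frozenset({ZERO, TWO}))  # Y = {0, 2}: 1 ∈ 2, but 1 ∉ Y
```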

Lemma A.28. No ordinal contains itself, i.e.

∀α∈ON α ∉ α.

Proof. If α is an ordinal, then ∈ is a strict order on α. Due to asymmetry of strict orders, x ∈ x can not be true for any element of α, implying that α ∈ α can not be true. �

Proposition A.29. Every element of an ordinal is an ordinal, i.e.

∀α∈ON (X ∈ α ⇒ X ∈ ON)

(in other words, ON is a transitive class).

Proof. Let α ∈ ON and X ∈ α. Since α is transitive, we have X ⊆ α. As ∈ is a strict well-order on α, it must also be a strict well-order on X by Lem. A.22(g). In consequence, it only remains to prove that X is transitive as well. To this end, let x ∈ X. Then x ∈ α, as α is transitive. If y ∈ x, then, using transitivity of α again, y ∈ α. Now y ∈ X, as ∈ is transitive on α, proving x ⊆ X, i.e. X is transitive. �

Proposition A.30. If α, β ∈ ON, then X := α ∩ β ∈ ON (we will see in Th. A.35(a) below that, actually, α ∩ β = min{α, β}).


Proof. X is transitive by Lem. A.25, and, since X ⊆ α, ∈ is a strict well-order on X by Lem. A.22(g). �

Proposition A.31. On the class ON, the relation ≤ (as defined in (A.10)) is the same as the relation ⊆, i.e.

∀α,β∈ON (α ≤ β ⇔ α ⊆ β ⇔ (α ∈ β ∨ α = β)). (A.11)

Proof. Let α, β ∈ ON.

Assume α ≤ β. If α = β, then α ⊆ β. If α ∈ β, then α ⊆ β, since β is transitive.

Conversely, assume α ⊆ β and α ≠ β. We have to show α ∈ β. To this end, we set X := β \ α. Then X ≠ ∅ and, as ∈ well-orders β, we can let m := min X. We will show m = α (note that this will complete the proof, due to α = m ∈ X ⊆ β). If µ ∈ m, then µ ∈ β (since m ∈ β and β is transitive) and µ ∉ X (since m = min X), implying µ ∈ α (since X = β \ α) and, thus, m ⊆ α. Seeking a contradiction, assume m ≠ α. Then there must be some γ ∈ α \ m ⊆ α ⊆ β. In consequence, γ, m ∈ β. As γ ∉ m and ∈ is a total order on β, we must have either m = γ or m ∈ γ. However, m ≠ γ, since γ ∈ α and m ∉ α (as m ∈ X). So it must be m ∈ γ ∈ α, implying m ∈ α, as α is transitive. This contradiction proves m = α and establishes the proposition. �

Theorem A.32. The class ON is well-ordered by ∈, i.e.

(i) ∈ is transitive on ON:

∀α,β,γ∈ON (α < β ∧ β < γ ⇒ α < γ).

(ii) ∈ is asymmetric on ON:

∀α,β∈ON (α < β ⇒ ¬(β < α)).

(iii) Ordinals are always comparable:

∀α,β∈ON (α < β ∨ β < α ∨ α = β).

(iv) Every nonempty set of ordinals has a min.

Proof. (i) is clear, as γ is a transitive set.

(ii): If α, β ∈ ON, then α ∈ β ∈ α implies α ∈ α by (i), which is a contradiction to Lem. A.28.

(iii): Let γ := α ∩ β. Then γ ∈ ON by Prop. A.30. Thus, by Prop. A.31,

γ ⊆ α ∧ γ ⊆ β ⇒ (γ ∈ α ∨ γ = α) ∧ (γ ∈ β ∨ γ = β). (A.12)


If γ ∈ α and γ ∈ β, then γ ∈ α ∩ β = γ, in contradiction to Lem. A.28. Thus, by (A.12), γ = α or γ = β. If γ = α, then α ⊆ β. If γ = β, then β ⊆ α, completing the proof of (iii).

(iv): Let X be a nonempty set of ordinals and consider α ∈ X. If α = min X, then we are already done. Otherwise, Y := α ∩ X = {β ∈ X : β ∈ α} ≠ ∅. Since α is well-ordered by ∈, there is m := min Y. If β ∈ X, then either β < α or α ≤ β by (iii). If β < α, then β ∈ Y and m ≤ β. If α ≤ β, then m < α ≤ β. Thus, m = min X, proving (iv). �

Corollary A.33. ON is a proper class (i.e. there is no set containing all the ordinals).

Proof. If there is a set X containing all ordinals, then, by comprehension, β := ON = {α ∈ X : α is an ordinal} must be a set as well. But then Prop. A.29 says that the set β is transitive and Th. A.32 yields that the set β is well-ordered by ∈, implying β to be an ordinal, i.e. β ∈ β in contradiction to Lem. A.28. �

Corollary A.34. For each set X of ordinals, we have:

(a) X is well-ordered by ∈.

(b) X is an ordinal if, and only if, X is transitive. Note: A transitive set of ordinals X is sometimes called an initial segment of ON, since, here, transitivity can be restated in the form

∀α∈ON ∀β∈X (α < β ⇒ α ∈ X). (A.13)

Proof. (a) is a simple consequence of Th. A.32(i)-(iv).

(b) is immediate from (a). �

Theorem A.35. Let X be a nonempty set of ordinals.

(a) Then γ := ⋂X is an ordinal, namely γ = min X. In particular, if α, β ∈ ON, then min{α, β} = α ∩ β.

(b) Then δ := ⋃X is an ordinal, namely δ = sup X. In particular, if α, β ∈ ON, then max{α, β} = α ∪ β.

Proof. (a): Let m := min X. Then γ ⊆ m, since m ∈ X. Conversely, if α ∈ X, then m ≤ α implies m ⊆ α by Prop. A.31, i.e. m ⊆ γ. Thus, m = γ, proving (a).

(b): To show δ ∈ ON, we need to show δ is transitive (then δ is an ordinal by Cor. A.34(b)). If α ∈ δ, then there is β ∈ X such that α ∈ β. Thus, if γ ∈ α, then γ ∈ β, since β is transitive. As γ ∈ β implies γ ∈ δ, we see that δ is transitive, as needed. It remains to show δ = sup X. If α ∈ X, then α ⊆ δ, i.e. α ≤ δ, showing δ to be an upper bound for X. Now let u ∈ ON be an arbitrary upper bound for X, i.e.

∀α∈X α ⊆ u.

Thus, δ ⊆ u, i.e. δ ≤ u, proving δ = sup X. �
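As an illustrative aside (not part of the notes), Th. A.35 can be checked on finite von Neumann ordinals: the minimum of a nonempty set of ordinals is its intersection, and the supremum is its union. The name `successor` is ours.

```python
# Sketch of Th. A.35 for finite ordinals modeled as frozensets.
from functools import reduce

def successor(x):
    return x | frozenset({x})

ZERO = frozenset()
ONE = successor(ZERO)
TWO = successor(ONE)
THREE = successor(TWO)

X = frozenset({ONE, THREE})

# min X = ⋂X and sup X = ⋃X:
assert reduce(frozenset.__and__, X) == ONE    # min{1, 3} = 1 ∩ 3 = 1
assert reduce(frozenset.__or__, X) == THREE   # sup{1, 3} = 1 ∪ 3 = 3
```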


Next, we obtain some results regarding the successor function of Def. A.19 in the context of ordinals.

Lemma A.36. We have

∀α∈ON (x, y ∈ S(α) ∧ x ∈ y ⇒ x ≠ α).

Proof. Seeking a contradiction, we assume x = α and reason as follows: since α ∉ α, it follows that y ≠ α; since y ∈ S(α), this yields y ∈ α; as α is transitive, we obtain y ⊆ α; and, finally, x ∈ y with x = α implies α ∈ α.

This contradiction to α ∉ α yields x ≠ α, concluding the proof. �

Proposition A.37. For each α ∈ ON, the following holds:

(a) S(α) ∈ ON.

(b) α < S(α).

(c) For each ordinal β, β < S(α) holds if, and only if, β ≤ α.

(d) For each ordinal β, if β < α, then S(β) < S(α).

(e) For each ordinal β, if S(β) < S(α), then β < α.

Proof. (a): Due to Prop. A.29, S(α) is a set of ordinals. Thus, by Cor. A.34(b), it merely remains to prove that S(α) is transitive. Let x ∈ S(α). If x = α, then x = α ⊆ α ∪ {α} = S(α). If x ≠ α, then x ∈ α and, since α is transitive, this implies x ⊆ α ⊆ S(α), showing S(α) to be transitive, thereby completing the proof of (a).

(b) holds, as α ∈ S(α) holds by the definition of S(α).

(c) is clear, since, for each ordinal β,

β < S(α) ⇔ β ∈ S(α) ⇔ β ∈ α ∨ β = α ⇔ β ≤ α.

(d): If β < α, then S(β) = β ∪ {β} ⊆ α, i.e. S(β) ≤ α < S(α).

(e) follows from (d) using contraposition: If ¬(β < α), then β = α or α < β, implying S(β) = S(α) or S(α) < S(β), i.e. ¬(S(β) < S(α)). �

We now proceed to define the natural numbers:

Definition A.38. An ordinal n is called a natural number if, and only if,

n ≠ 0 ∧ ∀m∈ON (m ≤ n ⇒ m = 0 ∨ m is a successor ordinal).

Proposition A.39. If n = 0 or n is a natural number, then S(n) is a natural number and every element of n is a natural number or 0.

Page 177: LinearAlgebraI - math.lmu.de · LinearAlgebraI Peter Philip∗ Lecture Notes Created for the Class of Winter Semester 2018/2019 at LMU Munich June 18, 2019 Contents 1 Foundations:

A AXIOMATIC SET THEORY 177

Proof. Suppose n is 0 or a natural number. If m ∈ n, then m is an ordinal by Prop. A.29. Suppose m ≠ 0 and k ∈ m. Then k ∈ n, since n is transitive. Since n is a natural number, k = 0 or k is a successor ordinal. Thus, m is a natural number. It remains to show that S(n) is a natural number. By definition, S(n) = n ∪ {n} ≠ 0. Moreover, S(n) ∈ ON by Prop. A.37(a), and, thus, S(n) is a successor ordinal. If m ∈ S(n), then m ≤ n, implying m = 0 or m is a successor ordinal, completing the proof that S(n) is a natural number. �

Theorem A.40 (Principle of Induction). If X is a set satisfying

0 ∈ X ∧ ∀x∈X S(x) ∈ X, (A.14)

then X contains 0 and all natural numbers.

Proof. Let X be a set satisfying (A.14). Then 0 ∈ X is immediate. Let n be a natural number and, seeking a contradiction, assume n ∉ X. Consider N := S(n) \ X. According to Prop. A.39, S(n) is a natural number and all nonzero elements of S(n) are natural numbers. Since N ⊆ S(n) and 0 ∈ X, 0 ∉ N and all elements of N must be natural numbers. As n ∈ N, N ≠ 0. Since S(n) is well-ordered by ∈ and 0 ≠ N ⊆ S(n), N must have a min m ∈ N, 0 ≠ m ≤ n. Since m is a natural number, there must be k such that m = S(k). Then k < m, implying k ∉ N. On the other hand,

k < m ∧ m ≤ n ⇒ k ≤ n ⇒ k ∈ S(n).

Thus, k ∈ X, implying m = S(k) ∈ X, in contradiction to m ∈ N. This contradiction proves n ∈ X, thereby establishing the case. �

Definition A.41. If the set X is given by the axiom of infinity, then we use comprehension to define the set

N0 := {n ∈ X : n = 0 ∨ n is a natural number}

and note N0 to be unique by extensionality. We also denote N := N0 \ {0}. In set theory, it is also very common to use the symbol ω for the set N0.

Corollary A.42. N0 is the set of all natural numbers and 0, i.e.

∀n (n ∈ N0 ⇔ n = 0 ∨ n is a natural number).

Proof. “⇒” is clear from Def. A.41 and “⇐” is due to Th. A.40. �

Proposition A.43. ω = N0 is the smallest limit ordinal.

Proof. Since ω is a set of ordinals and ω is transitive by Prop. A.39, ω is an ordinal by Cor. A.34(b). Moreover, ω ≠ 0, since 0 ∈ ω; and ω is not a successor ordinal (if ω = S(n) = n ∪ {n}, then n ∈ ω and S(n) ∈ ω by Prop. A.39, in contradiction to ω = S(n)), implying it is a limit ordinal. To see that ω is the smallest limit ordinal, let α ∈ ON, α < ω. Then α ∈ ω, that means α = 0 or α is a natural number (in particular, a successor ordinal). �


In the following Th. A.44, we will prove that N satisfies the Peano axioms P1 – P3 of Sec. 3.1 (if one prefers, one can show the same for N0, where 0 takes over the role of 1).

Theorem A.44. The set of natural numbers N satisfies the Peano axioms P1 – P3 of Sec. 3.1.

Proof. For P1 and P2, we have to show that, for each n ∈ N, one has S(n) ∈ N \ {1} and that S(m) ≠ S(n) for each m,n ∈ N, m ≠ n. Let n ∈ N. Then S(n) ∈ N by Prop. A.39. If S(n) = 1, then n < S(n) = 1 by Prop. A.37(b), i.e. n = 0, in contradiction to n ∈ N. If m,n ∈ N with m ≠ n, then S(m) ≠ S(n) is due to Prop. A.37(d). To prove P3, suppose A ⊆ N has the property that 1 ∈ A and S(n) ∈ A for each n ∈ A. We need to show A = N (i.e. N ⊆ A, as A ⊆ N is assumed). Let X := A ∪ {0}. Then X satisfies (A.14) and Th. A.40 yields N0 ⊆ X. Thus, if n ∈ N, then n ∈ X \ {0} = A, showing N ⊆ A. �

Notation A.45. For each n ∈ N0, we introduce the notation n + 1 := S(n) (more generally, one also defines α + 1 := S(α) for each ordinal α).
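The von Neumann construction above can be made concrete on a computer. The following sketch (using Python frozensets; all names are ad hoc, not from the notes) builds the first few naturals via 0 := ∅ and n + 1 := S(n) = n ∪ {n}:

```python
# Sketch of the von Neumann naturals: 0 = ∅ and S(n) = n ∪ {n} (Not. A.45).
def successor(n: frozenset) -> frozenset:
    """Return S(n) = n ∪ {n}."""
    return n | frozenset({n})

zero = frozenset()          # 0 = ∅
one = successor(zero)       # 1 = {0}
two = successor(one)        # 2 = {0, 1}
three = successor(two)      # 3 = {0, 1, 2}

# Each natural number is the set of all smaller ones, so m < n ⇔ m ∈ n,
# and n has exactly n elements.
assert zero in one and one in two and two in three
assert zero in three and one in three
assert len(two) == 2 and len(three) == 3
```

In particular, the order relation < coincides with membership ∈ on these sets, which is exactly the content of Prop. A.37(b).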

Theorem A.46. Let n ∈ N0. Then A := N0 \ n is infinite (see Def. 3.10(b)). In particular, N0 and N = N0 \ {0} = N0 \ 1 are infinite.

Proof. Since n ∉ n, we have n ∈ A ≠ ∅. Thus, if A were finite, then there would be a bijection f : A −→ Am := {1, . . . , m} = {k ∈ N : k ≤ m} for some m ∈ N. However, we will show by induction on m ∈ N that there is no injective map f : A −→ Am. Since S(n) ∉ n, we have S(n) ∈ A. Thus, if f : A −→ A1 = {1}, then f(n) = f(S(n)), showing that f is not injective and proving the case m = 1. For the induction step, we proceed by contraposition and show that the existence of an injective map f : A −→ Am+1, m ∈ N, (cf. Not. A.45) implies the existence of an injective map g : A −→ Am. To this end, let m ∈ N and f : A −→ Am+1 be injective. If m + 1 ∉ f(A), then f itself is an injective map into Am. If m + 1 ∈ f(A), then there is a unique a ∈ A such that f(a) = m + 1. Define

g : A −→ Am,   g(k) := { f(k)      for k < a,
                          f(k + 1)  for a ≤ k.   (A.15)

Then g is well-defined: If k ∈ A and a ≤ k, then k + 1 ∈ A \ {a}, and, since f is injective, g does, indeed, map into Am. We verify g to be injective: If k, l ∈ A, k < l, then also k < l + 1 and k + 1 ≠ l + 1 (by Peano axiom P2; k + 1 < l + 1 then also follows, but we do not make use of that here). In each case, g(k) ≠ g(l), proving g to be injective. �
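The induction step above is an explicit "skip over a" construction. The following toy sketch (with ad hoc names and hypothetical toy data, not from the notes) shows the map g of (A.15) in action on a finite window: the value m + 1 is removed from the range while injectivity is preserved.

```python
# Sketch of the map g from (A.15): given injective f into {1, ..., m+1}
# and the unique a with f(a) = m + 1, g skips over a.
def make_g(f, a):
    """Return g with g(k) = f(k) for k < a and g(k) = f(k + 1) for a <= k."""
    def g(k):
        return f(k) if k < a else f(k + 1)
    return g

# Toy data: f injective into {1, 2, 3, 4}, with f(1) = 4 = m + 1, i.e. a = 1.
f = {0: 2, 1: 4, 2: 1, 3: 3}
g = make_g(f.__getitem__, 1)

values = [g(k) for k in range(3)]  # g on the window {0, 1, 2}
assert values == [2, 1, 3]         # still injective ...
assert 4 not in values             # ... and the value m + 1 = 4 is avoided
```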

For more basic information regarding ordinals see, e.g., [Kun12, Sec. I.8].

A.3.5 Power Set

There is one more basic construction principle for sets that is not covered by Axioms 0 – 6, namely the formation of power sets. This needs another axiom:


Axiom 7 Power Set:

∀X ∃M ∀Y (Y ⊆ X ⇒ Y ∈ M).

Thus, the power set axiom states that, for each set X, there exists a set M that contains all subsets Y of X as elements.

Definition A.47. If X is a set and M is given by the power set axiom, then we call

P(X) := {Y ∈ M : Y ⊆ X}

the power set of X. Another common notation for P(X) is 2^X (cf. Prop. 2.18).
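For finite sets, the power set and the correspondence behind the notation 2^X (each subset corresponds to its characteristic function, cf. Prop. 2.18) can be sketched directly; the helper names below are ad hoc:

```python
# Finite sketch of P(X) and the correspondence P(X) ↔ {0,1}^X.
from itertools import combinations

def power_set(X):
    """Return all subsets of the finite set X as frozensets."""
    xs = list(X)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

X = {1, 2, 3}
P = power_set(X)
assert len(P) == 2 ** len(X)   # #P(X) = 2^#X for finite X

# A subset B corresponds to its characteristic function χ_B : X → {0, 1}.
B = frozenset({1, 3})
chi_B = {x: int(x in B) for x in X}
assert chi_B == {1: 1, 2: 0, 3: 1}
assert frozenset(x for x in X if chi_B[x] == 1) == B   # χ is invertible
```

The counting identity #P(X) = 2^#X for finite X is what motivates the exponential notation in the infinite case.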

A.3.6 Foundation

Foundation is, perhaps, the least important of the axioms in ZF. It basically cleanses the mathematical universe of unnecessary “clutter”, i.e. of certain pathological sets that are of no importance to standard mathematics anyway.

Axiom 8 Foundation:

∀X (∃x (x ∈ X) ⇒ ∃x∈X ¬∃z (z ∈ x ∧ z ∈ X)).

Thus, the foundation axiom states that every nonempty set X contains an element x that is disjoint from X.

Theorem A.48. Due to the foundation axiom, the ∈ relation can have no cycles, i.e. there do not exist sets x1, x2, . . . , xn, n ∈ N, such that

x1 ∈ x2 ∈ · · · ∈ xn ∈ x1. (A.16a)

In particular, sets cannot be members of themselves:

¬∃x (x ∈ x).   (A.16b)

Proof. If there were sets x1, x2, . . . , xn, n ∈ N, such that (A.16a) were true, then, by using the pairing axiom and the union axiom, we could form the set

X := {x1, . . . , xn}.

Then, in contradiction to the foundation axiom, X ∩ xi ≠ ∅ for each i = 1, . . . , n: Indeed, xn ∈ X ∩ x1, and xi−1 ∈ X ∩ xi for each i = 2, . . . , n. �

For a detailed explanation of why “sets” forbidden by foundation do not occur in standard mathematics anyway, see, e.g., [Kun12, Sec. I.14].


A.4 The Axiom of Choice

In addition to the axioms of ZF discussed in the previous section, there is one more axiom, namely the axiom of choice (AC), that, together with ZF, makes up ZFC, the axiom system at the basis of current standard mathematics. Even though AC is used and accepted by most mathematicians, it does have the reputation of being somewhat less “natural”. Thus, many mathematicians try to avoid the use of AC, where possible, and it is often pointed out explicitly if a result depends on the use of AC (but this practice is by no means consistent, neither in the literature nor in this class, and one might sometimes be surprised which seemingly harmless result does actually depend on AC in some subtle, nonobvious way). We will now state the axiom:

Axiom 9 Axiom of Choice (AC):

∀M (∅ ∉ M ⇒ ∃ f : M −→ ⋃_{N∈M} N (∀M∈M f(M) ∈ M)).

Thus, the axiom of choice postulates, for each nonempty set M, whose elements are all nonempty sets, the existence of a choice function, that means a function that assigns, to each M ∈ M, an element m ∈ M.

Example A.49. For example, the axiom of choice postulates, for each nonempty set A, the existence of a choice function on P(A) \ {∅} that assigns each subset of A one of its elements.
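When the base set carries a well-order, no appeal to AC is needed: one can write the choice function down explicitly. A sketch (ad hoc names) for subsets of N, where "take the minimum" is such an explicit rule; for P(R) \ {∅}, by contrast, no such definable rule can be given without AC, as remarked below:

```python
# Sketch of an explicit choice function on P(A) \ {∅} for A ⊆ N: pick min.
def choice(M):
    """Pick an element of the nonempty set M, here its minimum."""
    assert M, "a choice function is only defined on nonempty sets"
    return min(M)

subsets = [frozenset({3, 7}), frozenset({5}), frozenset({2, 4, 6})]
picks = [choice(M) for M in subsets]
assert picks == [3, 5, 2]                            # the chosen elements
assert all(p in M for p, M in zip(picks, subsets))   # f(M) ∈ M, as required
```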

The axiom of choice is remarkable since, at first glance, it seems so natural that one can hardly believe it is not provable from the axioms in ZF. However, one can actually show that it is neither provable nor disprovable from ZF (see, e.g., [Jec73, Th. 3.5, Th. 5.16]; such a result is called an independence proof, see [Kun80] for further material). If you want to convince yourself that the existence of choice functions is, indeed, a tricky matter, try to define a choice function on P(R) \ {∅} without AC (but do not spend too much time on it; one can show this is actually impossible to accomplish).

Theorem A.52 below provides several important equivalences of AC. Its statement and proof need some preparation. We start by introducing some more relevant notions from the theory of partial orders:

Definition A.50. Let X be a set and let ≤ be a partial order on X.

(a) An element m ∈ X is called maximal (with respect to ≤) if, and only if, there exists no x ∈ X such that m < x (note that a maximal element does not have to be a max and that a maximal element is not necessarily unique).

(b) A nonempty subset C of X is called a chain if, and only if, C is totally ordered by ≤. Moreover, a chain C is called maximal if, and only if, no strict superset Y of C (i.e. no Y ⊆ X such that C ⊊ Y) is a chain.


The following lemma is a bit technical and will be used to prove the implication AC ⇒ (ii) in Th. A.52 (other proofs in the literature often make use of so-called transfinite recursion, but that would mean further developing the theory of ordinals, and we will not pursue this route in this class).

Lemma A.51. Let X be a set and let ∅ ≠ M ⊆ P(X) be a nonempty set of subsets of X. We let M be partially ordered by inclusion, i.e. setting A ≤ B :⇔ A ⊆ B for each A,B ∈ M. Moreover, define

∀S⊆M   ⋃S := ⋃_{S∈S} S   (A.17)

and assume

∀C⊆M (C is a chain ⇒ ⋃C ∈ M).   (A.18)

If the function g : M −→ M has the property that

∀M∈M (M ⊆ g(M) ∧ #(g(M) \ M) ≤ 1),   (A.19)

then g has a fixed point, i.e.

∃M∈M g(M) = M.   (A.20)

Proof. Fix some arbitrary M0 ∈ M. We call T ⊆ M an M0-tower if, and only if, T satisfies the following three properties:

(i) M0 ∈ T.

(ii) If C ⊆ T is a chain, then ⋃C ∈ T.

(iii) If M ∈ T, then g(M) ∈ T.

Let T := {T ⊆ M : T is an M0-tower}. If T1 := {M ∈ M : M0 ⊆ M}, then, clearly, T1 is an M0-tower and, in particular, T ≠ ∅. Next, we note that the intersection of all M0-towers, i.e. T0 := ⋂_{T∈T} T, is also an M0-tower. Clearly, no strict subset of T0 can be an M0-tower and

M ∈ T0 ⇒ M ∈ T1 ⇒ M0 ⊆ M.   (A.21)

The main work of the rest of the proof consists of showing that T0 is a chain. To show T0 to be a chain, define

Γ := {M ∈ T0 : ∀N∈T0 (M ⊆ N ∨ N ⊆ M)}.   (A.22)

We intend to show that Γ = T0 by verifying that Γ is an M0-tower. As an intermediate step, we define

∀M∈Γ   Φ(M) := {N ∈ T0 : N ⊆ M ∨ g(M) ⊆ N}


and also show each Φ(M) to be an M0-tower. Actually, Γ and each Φ(M) satisfy (i) due to (A.21). To verify Γ satisfies (ii), let C ⊆ Γ be a chain and U := ⋃C. Then U ∈ T0, since T0 satisfies (ii). If N ∈ T0, and C ⊆ N for each C ∈ C, then U ⊆ N. If N ∈ T0, and there is C ∈ C such that C ⊄ N, then N ⊆ C (since C ∈ Γ), i.e. N ⊆ U, showing U ∈ Γ and Γ satisfying (ii). Now, let M ∈ Γ. To verify Φ(M) satisfies (ii), let C ⊆ Φ(M) be a chain and U := ⋃C. Then U ∈ T0, since T0 satisfies (ii). If U ⊆ M, then U ∈ Φ(M) as desired. If U ⊄ M, then there is x ∈ U such that x ∉ M. Thus, there is C ∈ C such that x ∈ C and g(M) ⊆ C (since C ∈ Φ(M)), i.e. g(M) ⊆ U, showing U ∈ Φ(M) also in this case, and Φ(M) satisfies (ii). We will verify that Φ(M) satisfies (iii) next. For this purpose, fix N ∈ Φ(M). We need to show g(N) ∈ Φ(M). We already know g(N) ∈ T0, as T0 satisfies (iii). As N ∈ Φ(M), we can now distinguish three cases. Case 1: N ⊊ M. In this case, we cannot have M ⊊ g(N) (otherwise, #(g(N) \ N) ≥ 2 in contradiction to (A.19)). Thus, g(N) ⊆ M (since M ∈ Γ), showing g(N) ∈ Φ(M). Case 2: N = M. Then g(N) = g(M) ∈ Φ(M) (since g(M) ∈ T0 and g(M) ⊆ g(M)). Case 3: g(M) ⊆ N. Then g(M) ⊆ g(N) by (A.19), again showing g(N) ∈ Φ(M). Thus, we have verified that Φ(M) satisfies (iii) and, therefore, is an M0-tower. Then, by the definition of T0, we have T0 ⊆ Φ(M). As we also have Φ(M) ⊆ T0 (from the definition of Φ(M)), we have shown

∀M∈Γ   Φ(M) = T0.

As a consequence, if N ∈ T0 and M ∈ Γ, then N ∈ Φ(M) and this means N ⊆ M ⊆ g(M) or g(M) ⊆ N, i.e. each N ∈ T0 is comparable to g(M), showing g(M) ∈ Γ and Γ satisfying (iii), completing the proof that Γ is an M0-tower. As with the Φ(M) above, we conclude Γ = T0, as desired. To conclude the proof of the lemma, we note Γ = T0 implies T0 is a chain. We claim that

M := ⋃T0

satisfies (A.20): Indeed, M ∈ T0, since T0 satisfies (ii). Then g(M) ∈ T0, since T0 satisfies (iii). We then conclude g(M) ⊆ M from the definition of M. As we always have M ⊆ g(M) by (A.19), we have established g(M) = M and proved the lemma. �

Theorem A.52 (Equivalences to the Axiom of Choice). The following statements (i) – (v) are equivalent to the axiom of choice (as stated as Axiom 9 above).

(i) Every Cartesian product ∏_{i∈I} Ai of nonempty sets Ai, where I is a nonempty index set, is nonempty (cf. Def. 2.15(c)).

(ii) Hausdorff’s Maximality Principle: Every nonempty partially ordered set X contains a maximal chain (i.e. a maximal totally ordered subset).

(iii) Zorn’s Lemma: Let X be a nonempty partially ordered set. If every chain C ⊆ X (i.e. every nonempty totally ordered subset of X) has an upper bound in X (such chains with upper bounds are sometimes called inductive), then X contains a maximal element (cf. Def. A.50(a)).

(iv) Zermelo’s Well-Ordering Theorem: Every set can be well-ordered (recall the definition of a well-order from Def. A.21(d)).


(v) Every vector space V over a field F has a basis B ⊆ V .

Proof. “(i) ⇔ AC”: Assume (i). Given a nonempty set of nonempty sets M, let I := M and, for each M ∈ M, let AM := M. If f ∈ ∏_{M∈I} AM, then, according to Def. 2.15(c), for each M ∈ I = M, one has f(M) ∈ AM = M, proving AC holds. Conversely, assume AC. Consider a family (Ai)i∈I such that I ≠ ∅ and each Ai ≠ ∅. Let M := {Ai : i ∈ I}. Then, by AC, there is a map g : M −→ ⋃_{N∈M} N = ⋃_{j∈I} Aj such that g(M) ∈ M for each M ∈ M. Then we can define

f : I −→ ⋃_{j∈I} Aj,   f(i) := g(Ai) ∈ Ai,

to prove (i).

Next, we will show AC ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ AC.

“AC ⇒ (ii)”: Assume AC and let X be a nonempty partially ordered set. Let M be the set of all chains in X (i.e. the set of all nonempty totally ordered subsets of X). Then ∅ ∉ M and M ≠ ∅ (since X ≠ ∅ and {x} ∈ M for each x ∈ X). Moreover, M satisfies the hypothesis of Lem. A.51, since, if C ⊆ M is a chain of totally ordered subsets of X, then ⋃C is a totally ordered subset of X, i.e. in M (here we have used the notation of (A.17); also note that we are dealing with two different types of chains here, namely those with respect to the order on X and those with respect to the order given by ⊆ on M). Let f : P(X) \ {∅} −→ X be a choice function given by AC, i.e. such that

∀Y∈P(X)\{∅}   f(Y) ∈ Y.

As an auxiliary notation, we set

∀M∈M   M* := {x ∈ X \ M : M ∪ {x} ∈ M}.

With the intention of applying Lem. A.51, we define

g : M −→ M,   g(M) := { M ∪ {f(M*)}  if M* ≠ ∅,
                         M             if M* = ∅.

Since g clearly satisfies (A.19), Lem. A.51 applies, providing an M ∈ M such that g(M) = M. Thus, M* = ∅, i.e. M is a maximal chain, proving (ii).

“(ii) ⇒ (iii)”: Assume (ii). To prove Zorn’s lemma, let X be a nonempty set, partially ordered by ≤, such that every chain C ⊆ X has an upper bound. Due to Hausdorff’s maximality principle, we can assume C ⊆ X to be a maximal chain. Let m ∈ X be an upper bound for the maximal chain C. We claim that m is a maximal element: Indeed, if there were x ∈ X such that m < x, then x ∉ C (since m is an upper bound for C) and C ∪ {x} would constitute a strict superset of C that is also a chain, contradicting the maximality of C.

“(iii) ⇒ (iv)”: Assume (iii) and let X be a nonempty set. We need to construct a well-order on X. Let W be the set of all well-orders on subsets of X, i.e.

W := {(Y,W) : Y ⊆ X ∧ W ⊆ Y × Y ⊆ X × X is a well-order on Y}.


We define a partial order ≤ on W by setting

∀(Y,W),(Y′,W′)∈W   ((Y,W) ≤ (Y′,W′) :⇔ Y ⊆ Y′ ∧ W = W′↾Y ∧ (y ∈ Y, y′ ∈ Y′, y′W′y ⇒ y′ ∈ Y))

(recall the definition of the restriction of a relation from Def. A.21(e)). To apply Zorn’s lemma to (W, ≤), we need to check that every chain C ⊆ W has an upper bound. To this end, if C ⊆ W is a chain, let

UC := (YC, WC),   where YC := ⋃_{(Y,W)∈C} Y,   WC := ⋃_{(Y,W)∈C} W.

We need to verify UC ∈ W: If a WC b, then there is (Y,W) ∈ C such that a W b. In particular, (a, b) ∈ Y × Y ⊆ YC × YC, showing WC to be a relation on YC. Clearly, WC is a total order on YC (one just uses that, if a, b ∈ YC, then, as C is a chain, there is (Y,W) ∈ C such that a, b ∈ Y and W = WC↾Y is a total order on Y). To see that WC is a well-order on YC, let ∅ ≠ A ⊆ YC. If a ∈ A, then there is (Y,W) ∈ C such that a ∈ Y. Since W = WC↾Y is a well-order on Y, we can let m := min(Y ∩ A). We claim that m = min A as well: Let b ∈ A. Then there is (B,U) ∈ C such that b ∈ B. If B ⊆ Y, then b ∈ Y ∩ A and m W b. If Y ⊆ B, then m, b ∈ B. If m U b, then we are done. If b U m, then b ∈ Y (since (Y,W) ≤ (B,U)), i.e., again, b ∈ Y ∩ A and m W b (actually m = b in this case), proving m = min A. This completes the proof that WC is a well-order on YC and, thus, shows UC ∈ W. Next, we check UC to be an upper bound for C: If (Y,W) ∈ C, then Y ⊆ YC and W = WC↾Y are immediate. If y ∈ Y, y′ ∈ YC, and y′ WC y, then y′ ∈ Y (otherwise, y′ ∈ A with (A,U) ∈ C, (Y,W) ≤ (A,U), y′ U y, in contradiction to y′ ∉ Y). Thus, (Y,W) ≤ UC, showing UC to be an upper bound for C. By Zorn’s lemma, we conclude that W contains a maximal element (M,WM). But then M = X and WM is the desired well-order on X: Indeed, if there is x ∈ X \ M, then we can let Y := M ∪ {x} and

∀a,b∈Y   (a W b :⇔ (a, b ∈ M ∧ a WM b) ∨ b = x).

Then (Y,W) ∈ W with (M,WM) < (Y,W), in contradiction to the maximality of (M,WM).

“(iv) ⇒ AC”: Assume (iv). Given a nonempty set of nonempty sets M, let X := ⋃_{M∈M} M. By (iv), there exists a well-order R on X. Then every nonempty Y ⊆ X has a unique min. As every M ∈ M is a nonempty subset of X, we can define a choice function

f : M −→ X,   f(M) := min M ∈ M,

proving AC.

“(v) ⇔ AC”: That every vector space has a basis is proved in Th. 5.23 by use of Zorn’s lemma. That, conversely, (v) implies AC was first shown in [Bla84], but the proof needs more algebraic tools than we have available in this class. �


A.5 Cardinality

The following theorem provides two interesting, and sometimes useful, characterizations of infinite sets:

Theorem A.53. Let A be a set. Using the axiom of choice (AC) of Sec. A.4, the following statements (i) – (iii) are equivalent. More precisely, (ii) and (iii) are equivalent even without AC (a set A is sometimes called Dedekind-infinite if, and only if, it satisfies (iii)), (iii) implies (i) without AC, but AC is needed to show (i) implies (ii), (iii).

(i) A is infinite.

(ii) There exists M ⊆ A and a bijective map f : M −→ N.

(iii) There exists a strict subset B ⊊ A and a bijective map g : A −→ B.

One sometimes expresses the equivalence between (i) and (ii) by saying that a set is infinite if, and only if, it contains a copy of the natural numbers. The property stated in (iii) might seem strange at first, but infinite sets are, indeed, precisely those identical in size to some of their strict subsets (as an example, think of the natural bijection n ↦ 2n between all natural numbers and the even numbers).
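The closing example can be checked mechanically on an initial segment; a small sketch (ad hoc names):

```python
# Sketch of the remark: n ↦ 2n maps N bijectively onto its strict subset
# of even numbers, exhibiting property (iii) for A = N.
def g(n: int) -> int:
    """The bijection n ↦ 2n from N onto the even numbers."""
    return 2 * n

sample = range(1, 11)
image = [g(n) for n in sample]
assert image == [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
assert len(set(image)) == len(image)      # injective on the sample
assert all(m % 2 == 0 for m in image)     # image lies in the even numbers
assert 1 not in image                     # 1 witnesses that the subset is strict
```

Of course, no finite set admits such a map, which is exactly the content of the implication "(iii) ⇒ (i)" below.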

Proof. We first prove, without AC, the equivalence between (ii) and (iii).

“(ii) ⇒ (iii)”: Let E denote the even numbers. Then E ⊊ N and h : N −→ E, h(n) := 2n, is a bijection, showing that (iii) holds for the natural numbers. According to (ii), there exists M ⊆ A and a bijective map f : M −→ N. Define B := (A \ M) ∪ f^{−1}(E) and

h̃ : A −→ B,   h̃(x) := { x                       for x ∈ A \ M,
                          (f^{−1} ∘ h ∘ f)(x)     for x ∈ M.   (A.23)

Then B ⊊ A, since B does not contain the elements of M that are mapped to odd numbers under f. Still, h̃ is bijective, since h̃↾A\M = IdA\M and h̃↾M = f^{−1} ∘ h ∘ f is the composition of the bijective maps f, h, and f^{−1}↾E : E −→ f^{−1}(E).

“(iii) ⇒ (ii)”: As (iii) is assumed, there exist B ⊆ A, a ∈ A \ B, and a bijective map g : A −→ B. Set

M := {a_n := g^n(a) : n ∈ N}.

We show that a_n ≠ a_m for each m,n ∈ N with m ≠ n: Indeed, suppose m,n ∈ N with n > m and a_n = a_m. Then, since g is bijective, we can apply g^{−1} m times to a_n = a_m to obtain

a = (g^{−1})^m(a_m) = (g^{−1})^m(a_n) = g^{n−m}(a).

Since l := n − m ≥ 1, we have a = g(g^{l−1}(a)), in contradiction to a ∈ A \ B. Thus, all the a_n ∈ M are distinct and we can define f : M −→ N, f(a_n) := n, which is clearly bijective, proving (ii).

“(iii) ⇒ (i)”: The proof is conducted by contraposition, i.e. we assume A to be finite and prove that (iii) does not hold. If A = ∅, then there is nothing to prove. If ∅ ≠ A is finite,


then, by Def. 3.10(b), there exists n ∈ N and a bijective map f : A −→ {1, . . . , n}. If B ⊊ A, then, according to Th. 3.19(a), there exists m ∈ N0, m < n, and a bijective map h : B −→ {1, . . . , m}. If there were a bijective map g : A −→ B, then h ∘ g ∘ f^{−1} would be a bijective map from {1, . . . , n} onto {1, . . . , m} with m < n, in contradiction to Th. 3.17.

“(i) ⇒ (ii)”: Inductively, we construct a strictly increasing sequence M1 ⊆ M2 ⊆ . . . of subsets Mn of A, n ∈ N, and a sequence of functions fn : Mn −→ {1, . . . , n} satisfying

∀n∈N   fn is bijective,   (A.24a)

∀m,n∈N   (m ≤ n ⇒ fn↾Mm = fm):   (A.24b)

Since A ≠ ∅, there exists m1 ∈ A. Set M1 := {m1} and f1 : M1 −→ {1}, f1(m1) := 1. Then M1 ⊆ A and f1 bijective are trivially clear. Now let n ∈ N and suppose M1, . . . , Mn and f1, . . . , fn satisfying (A.24) have already been constructed. Since A is infinite, there must be mn+1 ∈ A \ Mn (otherwise Mn = A and the bijectivity of fn : Mn −→ {1, . . . , n} shows A is finite with #A = n; AC is used to select the mn+1 ∈ A \ Mn). Set Mn+1 := Mn ∪ {mn+1} and

fn+1 : Mn+1 −→ {1, . . . , n + 1},   fn+1(x) := { fn(x)   for x ∈ Mn,
                                                  n + 1   for x = mn+1.   (A.25)

Then the bijectivity of fn implies the bijectivity of fn+1, and, since fn+1↾Mn = fn holds by definition of fn+1, the implication

m ≤ n + 1 ⇒ fn+1↾Mm = fm

holds true as well. An induction also shows Mn = {m1, . . . , mn} and fn(mn) = n for each n ∈ N. We now define

M := ⋃_{n∈N} Mn = {mn : n ∈ N},   f : M −→ N,   f(mn) := fn(mn) = n.   (A.26)

Clearly, M ⊆ A, and f is bijective with f^{−1} : N −→ M, f^{−1}(n) = mn. �

We now proceed to prove some rules regarding cardinality:

Theorem A.54. Let A,B be sets. Then

#A ≤ #B ∨ #B ≤ #A, (A.27)

i.e. there exists a bijective map φA : A −→ N with N ⊆ B or a bijective map φB : M −→ B, M ⊆ A (this result makes use of AC).

Proof. To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the set

M := {(M,N,f) : M ⊆ A, N ⊆ B, f : M −→ N is bijective}   (A.28)


by letting

(M,N,f) ≤ (U,V,g) :⇔ M ⊆ U, g↾M = f.   (A.29)

Then M contains the empty map, (∅, ∅, ∅) ∈ M, i.e. M ≠ ∅. Every chain C ⊆ M has an upper bound, namely (MC, NC, fC) with MC := ⋃_{(M,N,f)∈C} M and fC(x) := f(x), where (M,N,f) ∈ C is chosen such that x ∈ M (since C is a chain, the value of fC(x) does not actually depend on the choice of (M,N,f) ∈ C and is, thus, well-defined). Clearly, MC ⊆ A and NC := ⋃_{(M,N,f)∈C} N ⊆ B. We need to verify fC : MC −→ NC to be bijective. If x, y ∈ MC with x ≠ y, then, since C is a chain, there exists (M,N,f) ∈ C with x, y ∈ M. As f is injective, we have fC(x) = f(x) ≠ f(y) = fC(y), showing fC to be injective as well. If z ∈ NC, then z ∈ N for some (M,N,f) ∈ C. Since f : M −→ N is surjective, there exists x ∈ M ⊆ MC with fC(x) = f(x) = z, showing fC to be surjective as well, proving (MC, NC, fC) ∈ M. To see (MC, NC, fC) to be an upper bound for C, note that the definition of (MC, NC, fC) immediately implies M ⊆ MC for each (M,N,f) ∈ C and fC↾M = f for each (M,N,f) ∈ C. Thus, Zorn’s lemma applies, yielding a maximal element (Mmax, Nmax, fmax) ∈ M. If a ∈ A \ Mmax and b ∈ B \ Nmax, then

f̃ : {a} ∪ Mmax −→ {b} ∪ Nmax,   f̃(x) := { b          for x = a,
                                            fmax(x)    for x ∈ Mmax

is a bijective extension of fmax. Thus, the maximality of (Mmax, Nmax, fmax) implies Mmax = A or Nmax = B, completing the proof. �

Theorem A.55. Let A,B be sets, let A be infinite, and assume there exists an injective map φB : B −→ A. Then

#(A ∪ B) = #A,   (A.30)

i.e. there exists a bijective map, mapping A onto A ∪ B (this result makes use of AC).

Proof. Since the map a ↦ a always maps A injectively into A ∪ B, it remains to show the existence of an injective map φ : A ∪ B −→ A (then (A.30) holds due to the Schröder–Bernstein Th. 3.12). Possibly replacing B with B \ A, we may also assume A and B to be disjoint, without loss of generality. Thus, let M := A ∪ B. We apply Zorn’s lemma of Th. A.52(iii) to

M := {A ⊆ P(A) : (∀X∈A #X = #N) ∧ (∀X,Y∈A X ≠ Y ⇒ X ∩ Y = ∅)},   (A.31)

partially ordered by set inclusion ⊆ (each element of M constitutes a partition of some subset of A into infinite, countable subsets). Then M ≠ ∅ by Th. A.53(ii). In the usual way, if C ⊆ M is a chain, then the union of all elements of C is an element of M that constitutes an upper bound for C. Thus, Zorn’s lemma applies, yielding a maximal element Amax ∈ M. The maximality of Amax then implies

F := A \ ⋃_{C∈Amax} C


to be finite. Replacing some fixed C ∈ Amax with C ∪ F, we may assume Amax to be a partition of A. For each X ∈ Amax, we have X ⊆ A and a bijective map φX : N −→ X. Thus, the map

φ0 : N × Amax −→ A,   φ0(n,X) := φX(n),

is bijective as well. Letting N0 := {n ∈ N : n even} and N1 := {n ∈ N : n odd}, we obtain

N × Amax = (N0 × Amax) ∪ (N1 × Amax).

Moreover, there exist bijective maps ψ0 : N × Amax −→ N0 × Amax and ψ1 : N × Amax −→ N1 × Amax. Thus, we can define

φ : M = A ∪ B −→ A,   φ(x) := { (φ0 ∘ ψ0 ∘ φ0^{−1})(x)        for x ∈ A,
                                 (φ0 ∘ ψ1 ∘ φ0^{−1} ∘ φB)(x)   for x ∈ B,

which is, clearly, injective. �

Theorem A.56. Let A,B be nonempty sets, let A be infinite, and assume there exists an injective map φB : B −→ A. Then

#(A × B) = #A,   (A.32)

i.e. there exists a bijective map, mapping A onto A × B (this result makes use of AC).

Proof. Since the map (a, b) ↦ (a, φB(b)) is injective from A × B into A × A, and the map a ↦ (a, a) is injective from A into A × A, it suffices to show the existence of an injective map from A × A into A (then (A.32) holds due to the Schröder–Bernstein Th. 3.12). However, we will actually directly prove the existence of a bijective map from A onto A × A: To apply Zorn’s lemma of Th. A.52(iii), we define a partial order on the set

M := {(D,f) : D ⊆ A, f : D −→ D × D is bijective}   (A.33)

by letting

(D,f) ≤ (E,g) :⇔ D ⊆ E, g↾D = f.   (A.34)

Then M ≠ ∅, since there exists a bijective map f : N −→ N × N by Th. 3.28 and a bijective map between some D ⊆ A and N by Th. A.53(ii). Every chain C ⊆ M has an upper bound, namely (DC, fC) with DC := ⋃_{(D,f)∈C} D and fC(x) := f(x), where (D,f) ∈ C is chosen such that x ∈ D (since C is a chain, the value of fC(x) does not actually depend on the choice of (D,f) ∈ C and is, thus, well-defined). Clearly, DC ⊆ A, and we merely need to verify fC : DC −→ DC × DC to be bijective. If x, y ∈ DC with x ≠ y, then, since C is a chain, there exists (D,f) ∈ C with x, y ∈ D. As f is injective, we have fC(x) = f(x) ≠ f(y) = fC(y), showing fC to be injective as well. If (x, y) ∈ DC × DC and (D,f) ∈ C with x, y ∈ D as before, then the surjectivity of f : D −→ D × D yields z ∈ D ⊆ DC with fC(z) = f(z) = (x, y), showing fC to be surjective as well, proving (DC, fC) ∈ M. To see (DC, fC) to be an upper bound for C, note that the definition of (DC, fC) immediately implies D ⊆ DC for each (D,f) ∈ C and fC↾D = f for each (D,f) ∈ C. Thus, Zorn’s lemma applies, yielding a maximal element (Dmax, fmax) ∈ M.


We would like to prove Dmax = A. However, in general, this cannot be expected to hold (for example, if A = N and N \ D is finite, then (D,f) ∈ M is always maximal: since (E × E) \ (D × D) is infinite for each D ⊊ E ⊆ A, f does not have a bijective extension g : E −→ E × E). What we can prove, though, is the existence of a bijective map φ : Dmax −→ A, which suffices, since the map

φ̃ : A −→ A × A,   φ̃ := (φ, φ) ∘ fmax ∘ φ^{−1},

is then also bijective. It remains to establish the existence of the bijective map φ : Dmax −→ A. Arguing via contraposition, we assume the nonexistence of a bijective φ and show (Dmax, fmax) is not maximal: Let E := A \ Dmax. Then A = Dmax ∪ E. According to Th. A.55, there does not exist an injective map from E into Dmax (otherwise, Th. A.55 would yield #A = #(Dmax ∪ E) = #Dmax, providing a bijective φ : Dmax −→ A after all). Then, according to Th. A.54, there exists a bijective map φD : Dmax −→ N, N ⊆ E. Define

P := (N × N) ∪ (N × Dmax) ∪ (Dmax × N).

Since φD : Dmax −→ N is bijective and fmax : Dmax −→ Dmax × Dmax is bijective, we conclude

#(N × N) = #(N × Dmax) = #(Dmax × N) = #Dmax = #N.

Thus, #P = #Dmax = #N by Th. A.55 and, in particular, there exists a bijection fN : N −→ P. In consequence, we can combine the bijections fmax : Dmax −→ Dmax × Dmax and fN : N −→ P to obtain a bijection

f : N ∪ Dmax −→ P ∪ (Dmax × Dmax) = (Dmax ∪ N) × (Dmax ∪ N),

which is a bijective extension of fmax, i.e. (N ∪ Dmax, f) ∈ M and (Dmax, fmax) is not maximal. Thus, the maximality of (Dmax, fmax) implies the existence of a bijective φ : Dmax −→ A as desired. �
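The proof above invokes a bijection N −→ N × N (Th. 3.28). One standard candidate for such a map (an illustrative choice, not necessarily the one used in the notes) is the Cantor pairing function, which enumerates pairs diagonal by diagonal:

```python
# Sketch of a bijection N0 × N0 → N0: the Cantor pairing function.
def cantor_pair(m: int, n: int) -> int:
    """Enumerate pairs along the diagonals m + n = 0, 1, 2, ..."""
    return (m + n) * (m + n + 1) // 2 + n

# Listing the pairs diagonal by diagonal (n ascending within each diagonal),
# the pairing function hits 0, 1, 2, ... exactly once, in order.
pairs = [(s - n, n) for s in range(20) for n in range(s + 1)]
values = [cantor_pair(m, n) for (m, n) in pairs]
assert values == list(range(len(values)))
```

Its inverse is likewise computable, which makes the equipotence of N and N × N entirely explicit, in contrast to the purely existential Zorn's-lemma argument needed for general A.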

Corollary A.57. Let n ∈ N and let A be an infinite set. Then

#A^n = #A,   (A.35)

i.e. there exists a bijective map, mapping A onto A^n (this result makes use of AC).

Proof. The claimed rule (A.35) follows via a simple induction from Th. A.56, since, for each n ∈ N with n ≥ 2, the map

φ : A^n −→ A^{n−1} × A,   φ(a_1, . . . , a_n) := ((a_1, . . . , a_{n−1}), a_n),

is, clearly, bijective. �

Theorem A.58. Let A be a set and let Pfin(A) denote the set of finite subsets of A.

(a) One has

#Pfin(A) = #2^A_fin,   (A.36)

where 2^A_fin := {0,1}^A_fin is defined as in Ex. 5.16(c) and a bijection between Pfin(A) and {0,1}^A_fin is given by restricting the map

χ : P(A) −→ {0,1}^A,   χ(B) := χ_B,

of Prop. 2.18 to Pfin(A).


(b) If A is infinite, then

∀n∈N   #Pn(A) = #Pfin(A) = #A,   (A.37)

i.e. there exists a bijective map, mapping A onto Pfin(A), as well as a bijective map, mapping A onto Pn(A), where Pn(A) denotes the set of subsets of A with precisely n elements (both results of (b) make use of AC).

Proof. (a): χ↾Pfin(A) is injective, since χ is injective. If f ∈ {0,1}^A_fin, then

B_f := {a ∈ A : f(a) ≠ 0} ∈ Pfin(A)

with f = χ_{B_f}, showing χ↾Pfin(A) to be surjective onto {0,1}^A_fin.

(b): Let n ∈ N and let a_1, . . . , a_{n+1} ∈ A be n + 1 distinct elements of A. Define

ι_n : A −→ Pn(A),   ι_n(a) := { {a} ∪ {a_1, . . . , a_{n−1}}   for a ∉ {a_1, . . . , a_{n−1}},
                                 {a_1, . . . , a_{n+1}} \ {a}   for a ∈ {a_1, . . . , a_{n−1}}.

Then, clearly, each ι_n is injective as a map into Pn(A) and, due to Pn(A) ⊆ Pfin(A), also as a map into Pfin(A). Thus, it remains to show the existence of injective maps φ_n : Pn(A) −→ A and φ : Pfin(A) −→ A (then (A.37) holds due to the Schröder–Bernstein Th. 3.12). Fix n ∈ N. Due to Cor. A.57, it suffices to define an injective map φ̃_n : Pn(A) −→ A^n. For each B ∈ Pn(A), let φ_B : {1, . . . , n} −→ B be bijective and define

φ̃_n : Pn(A) −→ A^n,   φ̃_n(B) := φ_B.

If B,C ∈ Pn(A) with x ∈ B \ C, then x ∈ φ_B({1, . . . , n}), but x ∉ φ_C({1, . . . , n}), showing φ_B ≠ φ_C, i.e. φ̃_n is injective. To obtain φ, it suffices to define an injective map φ̃ : Pfin(A) −→ A × N, as there exists a bijective map from A × N onto A by Th. A.56. Since

Pfin(A) = ⋃_{n∈N} Pn(A),   (A.38)

we may define

φ̃ : Pfin(A) −→ A × N,   φ̃(B) := (φ̃_n(B), n) for B ∈ Pn(A),

which is, clearly, injective, due to (A.38) and the injectivity of φ̃_n. �

Theorem A.59. Let A, B be sets. If 2 ≤ #A (i.e. there exists an injective map ι_2 : {0, 1} −→ A), #A ≤ #B (i.e. there exists an injective map ι_A : A −→ B), and B is infinite, then

#A^B = #2^B = #P(B), (A.39a)

#A^B_fin = #2^B_fin = #P_fin(B) = #B, (A.39b)

where the last equality in (A.39b) holds by (A.37), i.e. there exist bijective maps φ_1 : A^B −→ {0, 1}^B and φ_2 : {0, 1}^B −→ P(B), as well as bijective maps φ_{1,f} : A^B_fin −→ {0, 1}^B_fin and φ_{2,f} : {0, 1}^B_fin −→ P_fin(B) (this result makes use of AC). For the purposes of (A.39b), we introduce the notation 0 := ι_2(0) ∈ A, so that both A^B_fin and 2^B_fin make sense according to the definition in Ex. 5.16(c).


Proof. The existence of φ_2 was already established in Prop. 2.18, the existence of φ_{2,f} in Th. A.58(a). Thus, according to the Schröder-Bernstein Th. 3.12, for the existence of φ_1, it suffices to show the existence of injective maps f_1 : {0, 1}^B −→ A^B, f_2 : A^B −→ B^B, f_3 : B^B −→ P(B × B), as well as a bijective map g : P(B × B) −→ P(B), which can be stated concisely as

#2^B ≤ #A^B ≤ #B^B ≤ #P(B × B) = #P(B). (A.40a)

Analogously, for the existence of φ_{1,f}, it suffices to show the existence of injective maps f_{1,f} : {0, 1}^B_fin −→ A^B_fin, f_{2,f} : A^B_fin −→ B^B_fin, f_{3,f} : B^B_fin −→ P_fin(B × B), as well as a bijective map g_f : P_fin(B × B) −→ P_fin(B), which can be stated concisely as

#2^B_fin ≤ #A^B_fin ≤ #B^B_fin ≤ #P_fin(B × B) = #P_fin(B) (A.40b)

(for B^B_fin to make sense according to the definition in Ex. 5.16(c), we now also introduce the notation 0 := ι_A(ι_2(0)) ∈ B). Clearly, the maps

f_1 : {0, 1}^B −→ A^B, f_1(α)(b) := ι_2(α(b)),

f_2 : A^B −→ B^B, f_2(α)(b) := ι_A(α(b)),

f_3 : B^B −→ P(B × B), f_3(α) := {(x, y) ∈ B × B : y = α(x)},

are, indeed, injective. Then the restrictions

f_{1,f} : {0, 1}^B_fin −→ A^B_fin, f_{1,f} := f_1↾{0,1}^B_fin, f_{2,f} : A^B_fin −→ B^B_fin, f_{2,f} := f_2↾A^B_fin,

are well-defined and injective as well. While the restriction of f_3 does not work, as it does not map into P_fin(B × B), we can use the injective map

f_{3,f} : B^B_fin −→ P_fin(B × B), f_{3,f}(α) := {(x, y) ∈ B × B : y = α(x) ≠ 0}.

From Th. A.56, we know the existence of a bijective map ψ : B × B −→ B, implying

g : P(B × B) −→ P(B), g(C) := ψ(C) = {ψ(x, y) : (x, y) ∈ C},

to be bijective as well, thereby proving (A.40a) and (A.39a). Then the restriction

g_f : P_fin(B × B) −→ P_fin(B), g_f := g↾P_fin(B×B),

is well-defined and also bijective, proving (A.40b) and (A.39b). □

B General Forms of the Laws of Associativity and Commutativity

B.1 Associativity

In the literature, the general law of associativity is often stated in the form that a_1 a_2 · · · a_n gives the same result "for every admissible way of inserting parentheses into a_1 a_2 · · · a_n", but a completely precise formulation of what that actually means seems to be rare. As a warm-up, we first prove a special case of the general law:


Proposition B.1. Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). Then

∀ n ∈ N, n ≥ 2, ∀ a_1, . . . , a_n ∈ A, ∀ k ∈ {2, . . . , n} : (∏_{i=k}^{n} a_i) (∏_{i=1}^{k−1} a_i) = ∏_{i=1}^{n} a_i, (B.1)

where the product symbol is defined according to (3.16a).

Proof. The assumed associativity means the validity of

∀ a, b, c ∈ A : (ab)c = a(bc). (B.2)

If k = n, then (B.1) is immediate from (3.16a). For 2 ≤ k < n, we prove (B.1) by induction on n: For the base case, n = 2, there is nothing to prove. For n > 2, one computes

(∏_{i=k}^{n} a_i) (∏_{i=1}^{k−1} a_i)
= (a_n · ∏_{i=k}^{n−1} a_i) (∏_{i=1}^{k−1} a_i)    (by (3.16a))
= a_n · ((∏_{i=k}^{n−1} a_i) (∏_{i=1}^{k−1} a_i))    (by (B.2))
= a_n · ∏_{i=1}^{n−1} a_i    (by the induction hypothesis)
= ∏_{i=1}^{n} a_i,    (by (3.16a))    (B.3)

completing the induction and the proof of the proposition. □

The difficulty in stating the general form of the law of associativity lies in giving a precise definition of what one means by "an admissible way of inserting parentheses into a_1 a_2 · · · a_n". So how does one actually proceed to calculate the value of a_1 a_2 · · · a_n, given that parentheses have been inserted in an admissible way? The answer is that one does it in n − 1 steps, where, in each step, one combines two juxtaposed elements, consistent with the inserted parentheses. There can still be some ambiguity: For example, for (a_1 a_2)(a_3 (a_4 a_5)), one has the freedom of first combining a_1, a_2, or of first combining a_4, a_5. In consequence, our general law of associativity will show that, for each admissible sequence of n − 1 directives for combining two juxtaposed elements, the final result is the same (under the hypothesis that (B.2) holds). This still needs some preparatory work.

In the following, one might see it as a slight notational inconvenience that we have defined ∏_{i=1}^{n} a_i as a_n · · · a_1 rather than a_1 · · · a_n. For this reason, we will enumerate the elements to be combined by composition from right to left rather than from left to right.

Definition and Remark B.2. Let A be a nonempty set with a composition · : A × A −→ A, let n ∈ N, n ≥ 2, and let I be a totally ordered index set, #I = n, I = {i_1, . . . , i_n} with i_1 < · · · < i_n. Moreover, let F := (a_{i_n}, . . . , a_{i_1}) be a family of n elements of A.

(a) An admissible composition directive (for combining two juxtaposed elements of the family) is an index i_k ∈ I with 1 ≤ k ≤ n − 1. It transforms the family F into the family G := (a_{i_n}, . . . , a_{i_{k+1}} a_{i_k}, . . . , a_{i_1}). In other words, G = (b_j)_{j∈J}, where J := I \ {i_{k+1}}, b_j = a_j for each j ∈ J \ {i_k}, and b_{i_k} = a_{i_{k+1}} a_{i_k}. We can write this transformation as two maps

F ↦ δ^{(1)}_{i_k}(F) := G = (a_{i_n}, . . . , a_{i_{k+1}} a_{i_k}, . . . , a_{i_1}) = (b_j)_{j∈J}, (B.4a)

I ↦ δ^{(2)}_{i_k}(I) := J = I \ {i_{k+1}}. (B.4b)

Thus, an application of an admissible composition directive reduces the length of the family and the number of indices by one.

(b) Recursively, we define (finite) sequences of families, index sets, and indices as follows:

F_n := F, I_n := I, (B.5a)

∀ α ∈ {2, . . . , n} : F_{α−1} := δ^{(1)}_{j_α}(F_α), I_{α−1} := δ^{(2)}_{j_α}(I_α), where j_α ∈ I_α \ {max I_α}. (B.5b)

The corresponding sequence of indices D := (j_n, . . . , j_2) in I is called an admissible evaluation directive. Clearly,

∀ α ∈ {1, . . . , n} : #I_α = α, i.e. F_α has length α. (B.6)

In particular, I_1 = {j_2} = {i_1} (where the second equality follows from (B.4b)), F_1 = (a), and we call

D(F) := a (B.7)

the result of the admissible evaluation directive D applied to F.
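Def. and Rem. B.2 is concrete enough to simulate. The following Python sketch is not part of the notes (the names evaluate and all_directives are invented for illustration); it models a family as a dict over the index set and uses string concatenation as the composition, which is associative but not commutative, so the common value of all directives is visible as the concatenation a_{i_n} · · · a_{i_1}:

```python
def evaluate(family, directive):
    """Apply an admissible evaluation directive D = (j_n, ..., j_2) to a family.

    `family` maps each index i to the element a_i; each directive j combines
    the element at j with the one at the next-larger surviving index, as in
    (B.4a)/(B.4b): b_j := a_{j'} * a_j, where j' is then removed.
    """
    fam = dict(family)
    for j in directive:
        j_above = min(i for i in fam if i > j)  # the index i_{k+1}
        fam[j] = fam[j_above] + fam[j]          # b_{i_k} := a_{i_{k+1}} . a_{i_k}
        del fam[j_above]                        # J := I \ {i_{k+1}}
    (result,) = fam.values()                    # F_1 = (a)
    return result

def all_directives(indices):
    """Enumerate every admissible evaluation directive for a sorted index list."""
    if len(indices) == 1:
        yield ()
        return
    for pos, j in enumerate(indices[:-1]):      # any j_alpha except max I_alpha
        rest = indices[:pos + 1] + indices[pos + 2:]
        for tail in all_directives(rest):
            yield (j,) + tail

family = {1: "a", 2: "b", 3: "c", 4: "d"}
results = {evaluate(family, D) for D in all_directives([1, 2, 3, 4])}
print(results)  # every directive yields the product a_4 a_3 a_2 a_1, i.e. "dcba"
```

Since concatenation is associative, all (n − 1)! admissible directives (here 3! = 6 of them) return the same string, as the upcoming Th. B.3 asserts.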

Theorem B.3 (General Law of Associativity). Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). Let n ∈ N, n ≥ 2, and let I be a totally ordered index set, #I = n, I = {i_1, . . . , i_n} with i_1 < · · · < i_n. Moreover, let F := (a_{i_n}, . . . , a_{i_1}) be a family of n elements of A. Then, for each admissible evaluation directive as defined in Def. and Rem. B.2(b), the result is the same, namely

D(F) = ∏_{k=1}^{n} a_{i_k}. (B.8)

Proof. We conduct the proof via induction on n. For n = 3, there are only two possible directives and (B.2) guarantees that they yield the same result. For the induction step, let n > 3. As in Def. and Rem. B.2(b), we write D = (j_n, . . . , j_2) and obtain some I_2 = {i_1, i_m}, 1 < m ≤ n, as the corresponding penultimate index set. Depending on i_m, we partition (j_n, . . . , j_3) as follows: Set

J_1 := {k ∈ {3, . . . , n} : j_k < i_m}, J_2 := {k ∈ {3, . . . , n} : j_k ≥ i_m}. (B.9)

Then, for k ∈ J_1, j_k is a composition directive to combine two elements to the right of a_{i_m} and, for k ∈ J_2, j_k is a composition directive to combine two elements to the left of a_{i_m}. Moreover, J_1 and J_2 might or might not be the empty set: If J_1 = ∅, then j_k ≠ i_1 for each k ∈ {3, . . . , n}, implying i_m = i_2; if J_2 = ∅, then, in each of the n − 2 steps to obtain I_2, an i_k with k < m was removed from I, implying i_m = i_n (in particular, as n ≠ 2, J_1 and J_2 cannot both be empty). If J_1 ≠ ∅, then D_1 := (j_k)_{k∈J_1} is an admissible evaluation directive for (a_{i_{m−1}}, . . . , a_{i_1}) – this follows from

j_k ∈ K ⊆ {i_1, . . . , i_{m−1}} ⇒ δ^{(2)}_{j_k}(K) ⊆ K ⊆ {i_1, . . . , i_{m−1}}. (B.10)

Since m − 1 < n, the induction hypothesis applies and yields

D_1(a_{i_{m−1}}, . . . , a_{i_1}) = ∏_{k=1}^{m−1} a_{i_k}. (B.11)

Analogously, if J_2 ≠ ∅, then D_2 := (j_k)_{k∈J_2} is an admissible evaluation directive for (a_{i_n}, . . . , a_{i_m}) – this follows from

j_k ∈ K ⊆ {i_m, . . . , i_n} ⇒ δ^{(2)}_{j_k}(K) ⊆ K ⊆ {i_m, . . . , i_n}. (B.12)

Since m > 1, the induction hypothesis applies and yields

D_2(a_{i_n}, . . . , a_{i_m}) = ∏_{k=m}^{n} a_{i_k}. (B.13)

Thus, if J_1 ≠ ∅ and J_2 ≠ ∅, then we obtain, using j_2 = i_1 for the first equality and Prop. B.1 for the last,

D(F) = D_2(a_{i_n}, . . . , a_{i_m}) · D_1(a_{i_{m−1}}, . . . , a_{i_1}) = (∏_{k=m}^{n} a_{i_k}) (∏_{k=1}^{m−1} a_{i_k}) = ∏_{k=1}^{n} a_{i_k}, (B.14)

as desired. If J_1 = ∅, then, as explained above, i_m = i_2. Thus, in this case, again using j_2 = i_1 and Prop. B.1,

D(F) = D_2(a_{i_n}, . . . , a_{i_2}) · a_{i_1} = (∏_{k=2}^{n} a_{i_k}) · a_{i_1} = ∏_{k=1}^{n} a_{i_k}, (B.15)

as needed. Finally, if J_2 = ∅, then, as explained above, i_m = i_n. Thus, in this case, using j_2 = i_1,

D(F) = a_{i_n} · D_1(a_{i_{n−1}}, . . . , a_{i_1}) = ∏_{k=1}^{n} a_{i_k}, (B.16)

again, as desired, and completing the induction. □

B.2 Commutativity

In the present section, we will generalize the law of commutativity ab = ba to a finite number of factors, provided the composition is also associative.


Theorem B.4 (General Law of Commutativity). Let (A, ·) be a semigroup (i.e. · : A × A −→ A is an associative composition on A). If the composition is commutative, i.e. if

∀ a, b ∈ A : ab = ba, (B.17)

then

∀ n ∈ N, ∀ π ∈ S_n, ∀ a_1, . . . , a_n ∈ A : ∏_{i=1}^{n} a_i = ∏_{i=1}^{n} a_{π(i)}, (B.18)

where S_n is the set of bijective maps on {1, . . . , n} (cf. Ex. 4.9(b)).

Proof. We conduct the proof via induction on n: For n = 1, there is nothing to prove and, for n = 2, (B.18) is the same as (B.17). So let n > 2. As π is bijective, we may define k := π^{−1}(n). Then

∏_{i=1}^{n} a_{π(i)} = (∏_{i=k+1}^{n} a_{π(i)}) · a_{π(k)} · (∏_{i=1}^{k−1} a_{π(i)})    (by Th. B.3)
= a_{π(k)} · (∏_{i=k+1}^{n} a_{π(i)}) · (∏_{i=1}^{k−1} a_{π(i)}).    (by (B.17))    (B.19)

Define the bijective map

ϕ : {1, . . . , n − 1} −→ {1, . . . , n − 1}, ϕ(j) := { π(j) for 1 ≤ j ≤ k − 1; π(j + 1) for k ≤ j ≤ n − 1 } (B.20)

(where the bijectivity of π implies the bijectivity of ϕ). Then, noting a_{π(k)} = a_n,

∏_{i=1}^{n} a_{π(i)} = a_n · (∏_{i=k}^{n−1} a_{ϕ(i)}) · (∏_{i=1}^{k−1} a_{ϕ(i)})    (by (B.19))
= a_n · ∏_{i=1}^{n−1} a_{ϕ(i)}    (by Th. B.3)
= a_n · ∏_{i=1}^{n−1} a_i    (by the induction hypothesis)
= ∏_{i=1}^{n} a_i, (B.21)

completing the induction proof. □

The following example shows that, if the composition is not associative, then, in general, (B.17) does not imply (B.18):

Example B.5. Let A := {a, b, c} with #A = 3 (i.e. the elements a, b, c are all distinct). Let the composition · on A be defined according to the composition table

·   a  b  c
a   b  b  b
b   b  b  a
c   b  a  a

Then, clearly, · is commutative. However, · is not associative, since, e.g.,

(cb)a = aa = b ≠ a = cb = c(ba), (B.22)

and (B.18) does not hold, since, e.g.,

a(bc) = aa = b ≠ a = cb = c(ba). (B.23)
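Both claims about the table can be checked by brute force; the following Python sketch (ours, not part of the notes) encodes the table as a dict and tests all pairs and triples:

```python
# Brute-force check of the composition table of Ex. B.5.
table = {
    ("a", "a"): "b", ("a", "b"): "b", ("a", "c"): "b",
    ("b", "a"): "b", ("b", "b"): "b", ("b", "c"): "a",
    ("c", "a"): "b", ("c", "b"): "a", ("c", "c"): "a",
}
A = ["a", "b", "c"]

def op(x, y):
    return table[(x, y)]

commutative = all(op(x, y) == op(y, x) for x in A for y in A)
associative = all(op(op(x, y), z) == op(x, op(y, z))
                  for x in A for y in A for z in A)
print(commutative, associative)  # True False: commutative but not associative
```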

C Groups

In Ex. 4.9(b), we defined the symmetric group S_M of bijective maps from M into M (i.e. of so-called permutations). It is a remarkable result that every group G is isomorphic to a subgroup of the permutation group S_G, and the proof is surprisingly simple (cf. Th. C.2 below). On the other hand, this result seems to have surprisingly few useful applications, partly due to the fact that S_G is usually much bigger (and, thus, usually more difficult to study) than G (cf. Prop. C.4 below).

We start with a preparatory lemma:

Lemma C.1. Let M, N be sets and let ψ : M −→ N be bijective. Then the symmetric groups S_M and S_N are isomorphic.

Proof. Exercise. □

Theorem C.2 (Cayley). Let (G, ·) be a group. Then G is isomorphic to a subgroup of the symmetric group (S_G, ◦) of permutations on G. In particular, every finite group with n elements is isomorphic to a subgroup of S_n.

Proof. Exercise. Hint: Show that

φ : G −→ S_G, a ↦ f_a, f_a(x) := ax, (C.1)

defines a monomorphism and use Lem. C.1. □
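The hint's map is easy to observe on a concrete finite group. The following Python sketch (ours, not part of the notes) takes the cyclic group (Z_4, +), represents each f_a by its tuple of images, and checks that φ is an injective homomorphism:

```python
# Cayley embedding of the cyclic group (Z_4, +) into S_{Z_4}: a -> f_a, f_a(x) = a + x.
n = 4
G = range(n)
# represent the permutation f_a by its tuple of images (f_a(0), ..., f_a(n-1))
f = {a: tuple((a + x) % n for x in G) for a in G}

# phi(a) := f_a is a homomorphism: f_a o f_b = f_{a+b}
for a in G:
    for b in G:
        composed = tuple(f[a][f[b][x]] for x in G)
        assert composed == f[(a + b) % n]

# phi is injective (a monomorphism): distinct elements give distinct permutations
assert len(set(f.values())) == n
print("Z_4 embeds into a subgroup of S_4")
```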

Notation C.3. Let M,N be sets. Define

S(M,N) := {(f : M −→ N) : f bijective}.

Proposition C.4. Let M, N be sets with #M = #N = n ∈ N_0 and S := S(M, N) (cf. Not. C.3). Then #S = n!; in particular, #S_M = n!.

Proof. We conduct the proof via induction: If n = 0, then S contains precisely the empty map (i.e. the empty set) and #S = 1 = 0! is true. If n = 1 and M = {a}, N = {b}, then S contains precisely the map f : M −→ N, f(a) = b, and #S = 1 = 1! is true. For the induction step, fix n ∈ N and assume #M = #N = n + 1. Let a ∈ M and

A := ⋃_{b∈N} S(M \ {a}, N \ {b}). (C.2)

Since the union in (C.2) is finite and disjoint, one has

#A = ∑_{b∈N} #S(M \ {a}, N \ {b}) = ∑_{b∈N} n! = (n + 1) · n! = (n + 1)!, (C.3)

where the second equality holds by the induction hypothesis. Thus, it suffices to show that

φ : S −→ A, φ(f) : M \ {a} −→ N \ {f(a)}, φ(f) := f↾M\{a}, (C.4)

is well-defined and bijective. If f : M −→ N is bijective, then f↾M\{a} : M \ {a} −→ N \ {f(a)} is bijective as well, i.e. φ is well-defined. Suppose f, g ∈ S with f ≠ g. If f(a) ≠ g(a), then φ(f) ≠ φ(g), as they have different ranges. If f(a) = g(a), then there exists x ∈ M \ {a} with f(x) ≠ g(x), implying φ(f)(x) = f(x) ≠ g(x) = φ(g)(x), i.e., once again, φ(f) ≠ φ(g). Thus, φ is injective. Now let h ∈ S(M \ {a}, N \ {b}) for some b ∈ N. Letting

f : M −→ N, f(x) := { b for x = a; h(x) for x ≠ a },

we have φ(f) = h, showing φ to be surjective as well. □
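For small sets, #S = n! can be confirmed by simply enumerating all bijections; a quick Python sketch (ours, not part of the notes), where fixing an order on M lets each permutation of N determine one bijection M −→ N:

```python
from itertools import permutations
from math import factorial

M = ["p", "q", "r", "s"]
N = [1, 2, 3, 4]
# each permutation of N, paired with M in a fixed order, is one bijection M -> N
bijections = [dict(zip(M, images)) for images in permutations(N)]
print(len(bijections), factorial(len(M)))  # 24 24, i.e. #S = 4!
```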

D Number Theory

The algebraic discipline called number theory studies properties of the ring of integers (and similar algebraic structures).

Theorem D.1 (Remainder Theorem). For each pair of numbers (a, b) ∈ N × N, there exists a unique pair of numbers (q, r) ∈ N_0 × N_0 satisfying the two conditions a = qb + r and 0 ≤ r < b.

Proof. Existence: Define

q := max{n ∈ N_0 : nb ≤ a}, (D.1a)

r := a − qb. (D.1b)

Then q ∈ N_0 by definition and (D.1b) immediately yields a = qb + r as well as r ∈ Z. Moreover, from (D.1a), qb ≤ a = qb + r, i.e. 0 ≤ r, in particular, r ∈ N_0. Since (D.1a) also implies (q + 1)b > a = qb + r, we also have b > r as required.

Uniqueness: Suppose (q_1, r_1) ∈ N_0 × N_0 satisfies the two conditions a = q_1 b + r_1 and 0 ≤ r_1 < b. Then q_1 b = a − r_1 ≤ a and (q_1 + 1)b = a − r_1 + b > a, showing q_1 = max{n ∈ N_0 : nb ≤ a} = q. This, in turn, implies r_1 = a − q_1 b = a − qb = r, thereby establishing the case. □
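For natural numbers, the unique pair (q, r) of Th. D.1 is exactly what Python's built-in divmod returns (a quick illustration of ours, not part of the notes):

```python
a, b = 23, 5
q, r = divmod(a, b)  # q = max{n in N_0 : n*b <= a}, r = a - q*b
print(q, r)          # 4 3: indeed 23 = 4*5 + 3 with 0 <= 3 < 5
assert a == q * b + r and 0 <= r < b
```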

Definition D.2. (a) Let a, k ∈ Z. We define a to be a divisor of k (and also say that a divides k, denoted a | k) if, and only if, there exists b ∈ Z such that k = ab. If a is no divisor of k, then we write a ∤ k.


(b) A number p ∈ N, p ≥ 2, is called prime if, and only if, 1 and p are its only divisors.

(c) Let M be a nonempty set of integers. If M ≠ {0}, then the number

gcd(M) := max{n ∈ N : ∀ k ∈ M : n | k} (D.2)

is called the greatest common divisor of the numbers in M. If M = {a, b}, then one also writes gcd(a, b) := gcd(M). The numbers in M are called relatively prime or mutually prime if, and only if, gcd(M) = 1. If M is finite and 0 ∉ M, then the number

lcm(M) := min{n ∈ N : ∀ k ∈ M : k | n} (D.3)

is called the least common multiple of the numbers in M. If M = {a, b}, then one also writes lcm(a, b) := lcm(M).

Remark D.3. Let M be a nonempty set of integers. If M ≠ {0}, then gcd(M) is well-defined, since, for each k ∈ Z, 1 | k, and, if 0 ≠ k ∈ M, then gcd(M) ≤ |k|. If M is finite and 0 ∉ M, then lcm(M) is well-defined, since max{|k| : k ∈ M} ≤ lcm(M) ≤ ∏_{k∈M} |k|. Some examples are

gcd(Z) = 1, gcd(3Z) = 3, gcd{8, 12, 20} = 4,
lcm(2, 3) = 6, lcm(4, 8) = 8, lcm{8, 12, 20} = 120.
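The examples above are easy to confirm with math.gcd and a two-argument lcm reduced over the set (a sketch of ours, not part of the notes, using the identity lcm(x, y) · gcd(x, y) = x · y for x, y ∈ N):

```python
from functools import reduce
from math import gcd

def lcm(x, y):
    # for x, y in N: lcm(x, y) * gcd(x, y) = x * y
    return x * y // gcd(x, y)

print(reduce(gcd, [8, 12, 20]), reduce(lcm, [8, 12, 20]))  # 4 120
assert lcm(2, 3) == 6 and lcm(4, 8) == 8
```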

Theorem D.4 (Bézout's Lemma). If a, b ∈ Z with {a, b} ≠ {0} and d := gcd(a, b), then

{xa + yb : x, y ∈ Z} = dZ. (D.4)

In particular,

∃ x, y ∈ Z : xa + yb = d, (D.5)

which is known as Bézout's identity. An important special case is

gcd(a, b) = 1 ⇒ ∃ x, y ∈ Z : xa + yb = 1. (D.6)

Proof. Clearly, it suffices to prove (D.4). To prove (D.4), we show

S := {xa + yb : x, y ∈ Z} ∩ N = dN. (D.7)

By setting y := 0 and x := ±1, we see that S contains |a| if a ≠ 0; analogously, S contains |b| if b ≠ 0. As {a, b} ≠ {0}, this shows S ≠ ∅. Let d := min S. Then there exist s, t ∈ Z with sa + tb = d. It remains to show d = gcd(a, b). We apply Th. D.1 to obtain (q, r) ∈ N_0 × N_0 with

|a| = qd + r ∧ 0 ≤ r < d.

Then, letting s̃ := s for a = |a| and s̃ := −s for a = −|a| (so that |a|s̃ + bt = d), we have

r = |a| − qd = |a| − q(|a|s̃ + bt) = |a|(1 − qs̃) − bqt ∈ S ∪ {0}.

Since d = min S and r < d, this yields r = 0 and d | a. Using b instead of a in the above reasoning yields d | b as well, showing d to be a common divisor of a and b. Now let c ∈ N be an arbitrary common divisor of a and b. Then there exist u, v ∈ Z such that a = cu and b = cv, implying

d = as + bt = cus + cvt = c(us + vt).

In consequence, c | d, implying c ≤ d, showing d = gcd(a, b), as desired. □
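The theorem only asserts the existence of the Bézout coefficients x, y; they can be computed by the extended Euclidean algorithm. The recursive Python sketch below is ours, not part of the notes:

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) and x*a + y*b = d, for (a, b) != (0, 0)."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    # if x'*b + y'*(a mod b) = d, then y'*a + (x' - (a // b)*y')*b = d
    d, x, y = extended_gcd(b, a % b)
    return (d, y, x - (a // b) * y)

d, x, y = extended_gcd(240, 46)
print(d, x, y)  # d = gcd(240, 46) = 2 and x*240 + y*46 = 2
assert x * 240 + y * 46 == d == 2
```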

Theorem D.5 (Euclid's Lemma). Let a, b ∈ Z and n ∈ N.

(a) If n and a are relatively prime (i.e. gcd(n, a) = 1), then

n | ab ⇒ n | b.

(b) If n is prime, then

n | ab ⇒ (n | a ∨ n | b).

Proof. (a): As gcd(n, a) = 1, from (D.6), we obtain x, y ∈ Z such that xn + ya = 1. Multiplying this equation by b yields

xnb + yab = b.

Moreover, by hypothesis, there exists c ∈ Z with ab = nc, implying n(xb + yc) = b, i.e. n | b as claimed.

(b): Suppose n is prime, n | ab, and n ∤ a. As n is prime, the only divisors of n are 1 and n. As n ∤ a, we know gcd(n, a) = 1, implying n | b by (a). □

Theorem D.6 (Fundamental Theorem of Arithmetic). Every n ∈ N, n ≥ 2, has a unique factorization into prime numbers (unique, except for the order of the primes). More precisely, given n as above, there exists a unique function k_n : P_n −→ N such that P_n ⊆ {p ∈ N : p prime} and

n = ∏_{p∈P_n} p^{k_n(p)} (D.8)

(then, clearly, P_n is necessarily finite).

Proof. Existence: This is a simple induction proof: The base case is n = 2. As 2 is clearly prime, we can let P_2 := {2} and k_2(2) := 1. For the induction step, let n > 2. If n is prime, then, as in the base case, let P_n := {n} and k_n(n) := 1. If n is not prime, then there exist a, b ∈ N with 1 < a ≤ b < n and n = ab. By induction hypothesis, there exist functions k_a : P_a −→ N and k_b : P_b −→ N such that P_a, P_b ⊆ {p ∈ N : p prime} and

a = ∏_{p∈P_a} p^{k_a(p)}, b = ∏_{p∈P_b} p^{k_b(p)}.

Letting P_n := P_a ∪ P_b and

k_n : P_n −→ N, k_n(p) := { k_a(p) for p ∈ P_a \ P_b; k_b(p) for p ∈ P_b \ P_a; k_a(p) + k_b(p) for p ∈ P_a ∩ P_b },

we obtain

n = ab = ∏_{p∈P_a} p^{k_a(p)} ∏_{p∈P_b} p^{k_b(p)} = ∏_{p ∈ P_a\P_b} p^{k_a(p)} ∏_{p ∈ P_b\P_a} p^{k_b(p)} ∏_{p ∈ P_a∩P_b} p^{k_a(p)+k_b(p)} = ∏_{p∈P_n} p^{k_n(p)},

thereby completing the induction proof.

Uniqueness: Here we will make use of Th. D.5(b). As for existence, the proof is via induction on n. Since n = 2 has precisely the divisors 1 and 2, uniqueness for the base case is clear. In the same way, uniqueness is clear for every prime n > 2. Thus, for the induction step, it only remains to consider the case where n > 2 is not prime. Suppose we have P, Q ⊆ {p ∈ N : p prime} and k : P −→ N, l : Q −→ N such that

n = ∏_{p∈P} p^{k(p)} = ∏_{q∈Q} q^{l(q)}.

Let p_0 ∈ P. Since p_0 is prime and p_0 | n = ∏_{q∈Q} q^{l(q)}, Th. D.5(b) implies p_0 | q_0 for some q_0 ∈ Q. As p_0 and q_0 are both prime, this implies p_0 = q_0 and

m := n/p_0 = p_0^{k(p_0)−1} ∏_{p ∈ P\{p_0}} p^{k(p)} = p_0^{l(p_0)−1} ∏_{q ∈ Q\{p_0}} q^{l(q)}.

Since n is not prime, we know m ≥ 2. As we know m < n, we may use the induction hypothesis to conclude P = Q, k(p) = l(p) for each p ∈ P \ {p_0}, and k(p_0) − 1 = l(p_0) − 1, i.e. k = l as desired. □
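For small n, the function k_n of (D.8) can be computed by plain trial division; the following Python sketch is ours, not part of the notes:

```python
def prime_factorization(n):
    """Return the map k_n as a dict {p: k_n(p)} with n = prod p**k_n(p), for n >= 2."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:            # divide out p as often as possible
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                        # any leftover factor > 1 is prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(prime_factorization(360))  # {2: 3, 3: 2, 5: 1}, i.e. 360 = 2^3 * 3^2 * 5
```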

Theorem D.7 (Euclid's Theorem). The set of prime numbers is infinite.

Proof. We show that no finite set of primes contains every prime: Let ∅ ≠ P be a finite set of prime numbers. Define p̄ := ∏_{q∈P} q and p := p̄ + 1. Then p ∉ P, since p > q for each q ∈ P. Thus, if p is prime, we are already done. If p is not prime, then there exists a prime n such that n | p (this is immediate from Th. D.6, but also follows easily without Th. D.6, since p can only have finitely many factors a with 1 < a < p). If n ∈ P, then there is a ∈ N such that na = p̄. As there is also b ∈ N such that nb = p, we have 1 = p − p̄ = nb − na = n(b − a), implying n = b − a = 1, in contradiction to 1 ∉ P. Thus, n ∉ P, completing the proof. □

E Vector Spaces

E.1 Cardinality and Dimension

Theorem E.1. Let V be a vector space over the field F and let B be a basis of V.


(a) #V = #F^B_fin, i.e. there exists a bijective map φ : V −→ F^B_fin, where F^B_fin is defined as in Ex. 5.16(c).

(b) If B is infinite and #F ≤ #B (i.e. there exists an injective map φ_F : F −→ B), then

#V = #F^B_fin = #2^B_fin = #P_fin(B) = #B, (E.1)

i.e. there exist bijective functions

φ_1 : V −→ F^B_fin, φ_2 : F^B_fin −→ {0, 1}^B_fin, φ_3 : {0, 1}^B_fin −→ P_fin(B), φ_4 : P_fin(B) −→ B.

Proof. (a): Given v ∈ V and b ∈ B, let c_v(b) denote the coordinate of v with respect to b according to Th. 5.19. Then

φ : V −→ F^B_fin, φ(v)(b) := c_v(b),

is well-defined and bijective by Th. 5.19.

(b): The map φ_1 exists according to (a), whereas φ_2, φ_3, φ_4 exist according to Th. A.59. □

Corollary E.2. Let F be a field and let M be an infinite set. If #F ≤ #M (i.e. there exists an injective map φ_F : F −→ M), then

dim F^M = #2^M (as vector space over F), (E.2)

i.e. if B is a basis of F^M, then there exists a bijective map φ : B −→ P(M). In particular, we have

dim R^R = #2^R (as vector space over R), (E.3a)

dim (Z_2)^N = #2^N = #R (as vector space over Z_2), (E.3b)

where the last equality in (E.3b) holds by [Phi16, Th. F.2].

Proof. Let B be a basis of F^M. Since F^M_fin is a subspace of F^M and dim F^M_fin = #M by (5.36), there exists an injective map φ_M : M −→ B by Th. 5.27(b). In consequence, the map (φ_M ◦ φ_F) : F −→ B is injective as well. Thus, Th. E.1(b) applies and proves (E.2). For (E.3a), we apply (E.2) with F = M = R; for (E.3b), we apply (E.2) with F = Z_2 and M = N. □

E.2 Cartesian Products

Example E.3. In generalization of Ex. 5.2(c), if I is a nonempty index set, F is a field, and (Y_i)_{i∈I} is a family of vector spaces over F, then we make the Cartesian product V := ∏_{i∈I} Y_i into a vector space over F by defining, for each a := (a_i)_{i∈I}, b := (b_i)_{i∈I} ∈ V:

(a + b)_i := a_i + b_i, (E.4a)

(λa)_i := λa_i for each λ ∈ F. (E.4b)

To verify that (V, +, ·) is, indeed, a vector space, we begin by checking Def. 5.1(i), i.e. by showing (V, +) is a commutative group: Let n = (n_i)_{i∈I} ∈ V, n_i := 0 ∈ Y_i. Then

∀ a = (a_i)_{i∈I} ∈ V, ∀ i ∈ I : (a + n)_i = a_i + 0 = a_i,

i.e. a + n = a, showing n is the additive neutral element of V. Next, given a = (a_i)_{i∈I} ∈ V, define ā = (ā_i)_{i∈I}, ā_i := −a_i. Then

∀ i ∈ I : (a + ā)_i = a_i + (−a_i) = 0,

i.e. a + ā = n = 0, showing ā to be the additive inverse element of a (as usual, one then writes −a instead of ā). Associativity is verified since

∀ (a_i)_{i∈I}, (b_i)_{i∈I}, (c_i)_{i∈I} ∈ V, ∀ i ∈ I : ((a + b) + c)_i = (a_i + b_i) + c_i (∗)= a_i + (b_i + c_i) = (a + (b + c))_i,

where the associativity of addition in the vector space Y_i was used at (∗). Likewise, commutativity is verified since

∀ (a_i)_{i∈I}, (b_i)_{i∈I} ∈ V, ∀ i ∈ I : (a + b)_i = a_i + b_i (∗)= b_i + a_i = (b + a)_i,

where the commutativity of addition in the vector space Y_i was used at (∗). This completes the proof that (V, +) is a commutative group.

To verify Def. 5.1(ii), one computes

∀ λ ∈ F, ∀ (a_i)_{i∈I}, (b_i)_{i∈I} ∈ V, ∀ i ∈ I : (λ(a + b))_i = λ(a_i + b_i) (∗)= λa_i + λb_i = (λa + λb)_i,

where distributivity in the vector space Y_i was used at (∗), proving λ(a + b) = λa + λb. Similarly,

∀ λ, µ ∈ F, ∀ (a_i)_{i∈I} ∈ V, ∀ i ∈ I : ((λ + µ)a)_i = (λ + µ)a_i (∗)= λa_i + µa_i = (λa + µa)_i,

where, once more, distributivity in the vector space Y_i was used at (∗), proving (λ + µ)a = λa + µa.

The proof of Def. 5.1(iii) is given by

∀ λ, µ ∈ F, ∀ (a_i)_{i∈I} ∈ V, ∀ i ∈ I : ((λµ)a)_i = (λµ)a_i (∗)= λ(µa_i) = (λ(µa))_i,

where the validity of Def. 5.1(iii) in the vector space Y_i was used at (∗).

Finally, to prove Def. 5.1(iv), one computes

∀ (a_i)_{i∈I} ∈ V, ∀ i ∈ I : (1 · a)_i = 1 · a_i (∗)= a_i,

where the validity of Def. 5.1(iv) in the vector space Y_i was used at (∗).
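For a finite index set, the componentwise operations (E.4) can be written out directly. The Python sketch below is ours, not part of the notes; it models a family (a_i)_{i∈I} as a dict over I, with each Y_i taken to be Q via Fraction:

```python
from fractions import Fraction

I = ["x", "y", "z"]                       # a finite index set

def add(a, b):
    return {i: a[i] + b[i] for i in I}    # (a + b)_i := a_i + b_i   (E.4a)

def scale(lam, a):
    return {i: lam * a[i] for i in I}     # (lam a)_i := lam * a_i   (E.4b)

a = {"x": Fraction(1), "y": Fraction(2), "z": Fraction(3)}
b = {"x": Fraction(4), "y": Fraction(5), "z": Fraction(6)}
zero = {i: Fraction(0) for i in I}        # the neutral element n

assert add(a, zero) == a                              # a + n = a
assert add(a, scale(Fraction(-1), a)) == zero         # a + (-a) = n
assert scale(Fraction(2), add(a, b)) == add(scale(Fraction(2), a),
                                            scale(Fraction(2), b))
print("vector space axioms hold componentwise")
```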


Example E.4. In generalization of Ex. 6.7(c), we consider general projections defined on a general Cartesian product of vector spaces, as defined in Ex. E.3 above: Let I be a nonempty index set, let (Y_i)_{i∈I} be a family of vector spaces over the field F, and let V := ∏_{i∈I} Y_i be the Cartesian product vector space as defined in Ex. E.3. For each ∅ ≠ J ⊆ I, define

π_J : V −→ V_J := ∏_{i∈J} Y_i, π_J((a_i)_{i∈I}) := (a_i)_{i∈J},

calling π_J the projection onto the coordinates in J. We verify π_J to be linear: Let λ ∈ F and a := (a_i)_{i∈I}, b := (b_i)_{i∈I} ∈ V. Then

π_J(λa) = (λa_i)_{i∈J} = λ(a_i)_{i∈J} = λπ_J(a),

π_J(a + b) = (a_i + b_i)_{i∈J} = (a_i)_{i∈J} + (b_i)_{i∈J} = π_J(a) + π_J(b),

proving π_J to be linear. Moreover, for each ∅ ≠ J ⊆ I, we have

Im π_J = V_J, ker π_J = {(a_i)_{i∈I} ∈ V : a_i = 0 for each i ∈ J} ≅ ∏_{i∈I\J} Y_i, V = V_J ⊕ ker π_J, (E.5)

where, for the last equality, we identified

V_J ≅ {(a_i)_{i∈I} ∈ V : a_i = 0 for each i ∉ J}.
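A finite-index sketch of the projections (ours, not part of the notes): π_J drops the coordinates outside J, linearity reduces to a componentwise identity, and each family splits into its V_J-part plus its kernel part, mirroring V = V_J ⊕ ker π_J:

```python
from fractions import Fraction

I = ["x", "y", "z"]
J = ["x", "z"]

def add(a, b):
    return {i: a[i] + b[i] for i in a}    # componentwise addition as in Ex. E.3

def project(a, J):
    return {i: a[i] for i in J}           # pi_J((a_i)_{i in I}) := (a_i)_{i in J}

a = {"x": Fraction(1), "y": Fraction(2), "z": Fraction(3)}
b = {"x": Fraction(4), "y": Fraction(5), "z": Fraction(6)}

# linearity: pi_J(a + b) = pi_J(a) + pi_J(b)
assert project(add(a, b), J) == add(project(a, J), project(b, J))

# V = V_J (+) ker pi_J: split a into a part supported on J and one vanishing on J
v_part = {i: (a[i] if i in J else Fraction(0)) for i in I}
k_part = {i: (Fraction(0) if i in J else a[i]) for i in I}
assert add(v_part, k_part) == a
assert project(k_part, J) == {i: Fraction(0) for i in J}
print("pi_J is linear and splits V")
```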

References

[Bla84] A. Blass. Existence of Bases Implies the Axiom of Choice. Contemporary Mathematics 31 (1984), 31–33.

[EFT07] H.-D. Ebbinghaus, J. Flum, and W. Thomas. Einführung in die mathematische Logik, 5th ed. Spektrum Akademischer Verlag, Heidelberg, 2007 (German).

[Jec73] T. Jech. The Axiom of Choice. North-Holland, Amsterdam, 1973.

[Kun80] Kenneth Kunen. Set Theory. Studies in Logic and the Foundations of Mathematics, Vol. 102, North-Holland, Amsterdam, 1980.

[Kun12] Kenneth Kunen. The Foundations of Mathematics. Studies in Logic, Vol. 19, College Publications, London, 2012.

[Phi16] P. Philip. Analysis I: Calculus of One Real Variable. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2015/2016, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_Analysis1.pdf.

[Phi17a] P. Philip. Numerical Analysis I. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2016/2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_NumericalAnalysis1.pdf.

[Phi17b] P. Philip. Functional Analysis. Lecture Notes, Ludwig-Maximilians-Universität, Germany, 2017, available in PDF format at http://www.math.lmu.de/~philip/publications/lectureNotes/philipPeter_FunctionalAnalysis.pdf.

[Wik15a] Wikipedia. Coq — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/Coq. Online; accessed Sep-01-2015.

[Wik15b] Wikipedia. HOL Light — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/HOL_Light. Online; accessed Sep-01-2015.

[Wik15c] Wikipedia. Isabelle (proof assistant) — Wikipedia, The Free Encyclopedia. 2015, https://en.wikipedia.org/wiki/Isabelle_(proof_assistant). Online; accessed Sep-01-2015.

