Schofield
Springer Texts in Business and Economics
Mathematical Methods in Economics and Social Choice
Second Edition
In recent years, the usual optimization techniques, which have proved so useful in microeconomic theory, have been extended to incorporate more powerful topological and differential methods, and these methods have led to new results on the qualitative behavior of general economic and political systems. These developments have necessarily resulted in an increase in the degree of formalism in the publications in the academic journals. This formalism can often deter graduate students. The progression of ideas presented in this book will familiarize the student with the geometric concepts underlying these topological methods, and, as a result, make mathematical economics, general equilibrium theory, and social choice theory more accessible.
2nd Ed.
Norman Schofield
Business / Economics
isbn 978-3-642-39817-9
Metadata of the book and chapters that will be visualized online
Book series name: Springer Texts in Business and Economics
Book title: Mathematical Methods in Economics and Social Choice
Book copyright year: 2014
Book copyright holder: Springer-Verlag Berlin Heidelberg
Corresponding Author: Norman Schofield
Division: Center in Political Economy
Organization: Washington University in Saint Louis
Address: Saint Louis, MO, USA
Chapter 1
Chapter title Sets, Relations, and Preferences
Abstract Chapter 1 introduces elementary set theory and the notation to be used throughout the book. We also define the notions of a binary relation, of a function, as well as the axioms of a group and field. Finally we discuss the idea of an individual and social preference relation, and mention some of the concepts of social choice and welfare economics.
Chapter 2
Chapter title Linear Spaces and Transformations
Abstract Chapter 2 surveys material on linear or vector spaces, and introduces the idea of a linear transformation between vector spaces, and shows how a linear transformation can be represented by a matrix. We also prove the dimension theorem, which shows how a linear transformation can be represented in a simple form, given by its kernel and image. We also introduce the notion of an eigenvalue and eigenvector of a matrix.
Chapter 3
Chapter title Topology and Convex Optimisation
Abstract Chapter 3 covers topology and convex optimization. In Chap. 2 we introduced the notion of the scalar product of two vectors in $\Re^n$. More generally, if a scalar product is defined on some space, then this permits the definition of a norm, or length, associated with a vector, and this in turn allows us to define the distance between two vectors. A distance function or metric may be defined on a space, X, even when X admits no norm. More general than the notion of a metric is that of a topology. This notion allows us to define the idea of continuity of a function as well as analogous ideas for a correspondence. We then introduce three powerful theorems: the Brouwer Fixed Point Theorem for a function, Michael’s Selection Theorem, and the Browder Fixed Point Theorem for a correspondence.
Chapter 4
Chapter title Differential Calculus and Smooth Optimisation
Abstract In this chapter we develop the ideas of the differential calculus. Under certain conditions a continuous function $f:\Re^n\to\Re^m$ can be approximated at each point x in $\Re^n$ by a linear function $df(x):\Re^n\to\Re^m$, known as the differential of f at x. In the same way the differential df may be approximated by a bilinear map $d^2f(x)$. When all differentials are continuous, f is called smooth. For a smooth function f, Taylor’s Theorem gives a relationship between the differentials at a point x and the value of f in a neighbourhood of that point. This in turn allows us to characterise maximum points of the function by features of the first and second differentials. For a real-valued function whose preference correspondence is convex, we can virtually identify critical points (where $df(x)=0$) with the maxima of the function. We use calculus to derive important results in economic theory, namely conditions for the existence of a price equilibrium for an economy, and the Welfare Theorem for an exchange economy.
Chapter 5
Chapter title Singularity Theory and General Equilibrium
Abstract In Chap. 5 we introduce the fundamental result in singularity theory, that the set of singularity points of a smooth preference profile almost always has a particular geometric structure. We then go on to use this result to discuss the Debreu-Smale Theorem on the generic existence of regular economies. Section 5.4 uses an example of Scarf (Int. Econ. Rev. 1:157–172, 1960) to illustrate the idea of an excess demand function for an exchange economy. The example provides a general way to analyse a smooth adjustment process leading to a Walrasian equilibrium. Sections 5.5 and 5.6 introduce the more abstract topological ideas of structural stability and chaos in dynamical systems.
Chapter 6
Chapter title Topology and Social Choice
Abstract In this chapter we apply earlier results to the study of social choice and modelling elections. In Chap. 3 we showed the Nakamura Theorem, that a social choice could be guaranteed as long as the dimension of the space did not exceed k(σ) − 2. We now consider what can happen in dimension above k(σ) − 1. We then go on to consider “probabilistic” social choice, where there is some uncertainty over voters’ preferences, by constructing an empirical model of the 2008 U.S. presidential election.
Chapter 7
Chapter title Review Exercises
Abstract ??????
AUTHOR’S PROOF
Book ID: 74899_2_En, Date: 2013-09-30, Proof No: 1
Springer Texts in Business and Economics
For further volumes: www.springer.com/series/10099
Norman Schofield
Mathematical Methods in Economics and Social Choice
Second Edition
Norman Schofield
Center in Political Economy
Washington University in Saint Louis
Saint Louis, MO, USA
ISSN 2192-4333 ISSN 2192-4341 (electronic)
Springer Texts in Business and Economics
ISBN 978-3-642-39817-9 ISBN 978-3-642-39818-6 (eBook)
DOI 10.1007/978-3-642-39818-6
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: ???
© Springer-Verlag Berlin Heidelberg 2004, 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Dedicated to the memory of Jeffrey Banks and Richard McKelvey
Foreword
The use of mathematics in the social sciences is expanding both in breadth and depth at an increasing rate. It has made its way from economics into the other social sciences, often accompanied by the same controversy that raged in economics in the 1950s. And its use has deepened from calculus to topology and measure theory to the methods of differential topology and functional analysis.
The reasons for this expansion are several. First, and perhaps foremost, mathematics makes communication between researchers succinct and precise. Second, it helps make assumptions and models clear; this bypasses arguments in the field that are a result of different implicit assumptions. Third, proofs are rigorous, so mathematics helps avoid mistakes in the literature. Fourth, its use often provides more insights into the models. And finally, the models can be applied to different contexts without repeating the analysis, simply by renaming the symbols.
Of course, the formulation of social science questions must precede the construction of models and the distillation of these models down to mathematical problems, for otherwise the assumptions might be inappropriate.
A consequence of the pervasive use of mathematics in our research is a change in the level of mathematics training required of our graduate students. We need reference and graduate textbooks that address applications of advanced mathematics to a widening range of social sciences. This book fills that need.
Many years ago, Bill Riker introduced me to Norman Schofield’s work and then to Norman. He is unique in his ability to span the social sciences and apply integrative mathematical reasoning to them all. The emphasis of his work and his book is on smooth models and techniques, while the motivating examples for presentation of the mathematics are drawn primarily from economics and political science. The reader is taken from basic set theory to the mathematics used to solve problems at the cutting edge of research. Students in every social science will find exposure to this mode of analysis useful; it elucidates the common threads in different fields. Speculations at the end of Chap. 5 provide students and researchers with many open research questions related to the content of the first four chapters. The answers are in these chapters. When the first edition appeared in 2002, I wrote in my Foreword that a goal of the reader should be to write Chap. 6. For the second edition of the book, Norman himself has accomplished this open task.
Marcus Berliant
St. Louis, Missouri, USA
2013
Preface to the Second Edition
For the second edition, I have added a new chapter six. This chapter continues with the model presented in Chap. 3 by developing the idea of dynamical social choice. In particular, the chapter considers the possibility of cycles enveloping the set of social alternatives.
A theorem of Saari (1997) shows that for any non-collegial set, D, of decisive or winning coalitions, if the dimension of the policy space is sufficiently large, then the choice is empty under D for all smooth profiles in a residual subspace of $C^r(W,\Re^n)$. In other words, the choice is generically empty.
However, we then define a social solution concept, known as the heart. When regarded as a correspondence, the heart is lower hemi-continuous. In general the heart is centrally located with respect to the distribution of voter preferences, and is guaranteed to be non-empty. Two examples are given to show how the heart is determined by the symmetry of the voter distribution.
Finally, to be able to use survey data of voter preferences, the chapter introduces the idea of stochastic social choice. In situations where voter choice is given by a probability vector, we can model the choice by assuming that candidates choose policies to maximise their vote shares. In general the equilibrium vote-maximising positions can be shown to be at the electoral mean. The necessary and sufficient condition for this is given by the negative definiteness of the candidate vote Hessians. In an empirical example, a multinomial logit model of the 2008 Presidential election is presented, based on the American National Election Survey, and the parameters of this model are used to calculate the Hessians of the vote functions for both candidates. According to this example, both candidates should have converged to the electoral mean.
Norman Schofield
Saint Louis, Missouri, USA
June 13, 2013
Author’s Preface
In recent years, the optimisation techniques, which have proved so useful in microeconomic theory, have been extended to incorporate more powerful topological and differential methods. These methods have led to new results on the qualitative behaviour of general economic and political systems. However, these developments have also led to an increase in the degree of formalism in published work. This formalism can often deter graduate students. My hope is that the progression of ideas presented in these lecture notes will familiarise the student with the geometric concepts underlying these topological methods, and, as a result, make mathematical economics, general equilibrium theory, and social choice theory more accessible.
The first chapter of the book introduces the general idea of mathematical structure and representation, while the second chapter analyses linear systems and the representation of transformations of linear systems by matrices. In the third chapter, topological ideas and continuity are introduced and used to solve convex optimisation problems. These techniques are also used to examine the existence of a “social equilibrium.” Chapter four then goes on to study calculus techniques using a linear approximation, the differential, of a function to study its “local” behaviour.
The book is not intended to cover the full extent of mathematical economics or general equilibrium theory. However, in the last sections of the third and fourth chapters I have introduced some of the standard tools of economic theory, namely the Kuhn-Tucker Theorem, together with some elements of convex analysis and procedures using the Lagrangian. Chapter four provides examples of consumer and producer optimisation. The final section of the chapter also discusses, in a heuristic fashion, the smooth or critical Pareto set and the idea of a regular economy. The fifth and final chapter is somewhat more advanced, and extends the differential calculus of a real-valued function to the analysis of a smooth function between “local” vector spaces, or manifolds. Modern singularity theory is the study and classification of all such smooth functions, and the purpose of the final chapter is to use this perspective to obtain a generic or typical picture of the Pareto set and the set of Walrasian equilibria of an exchange economy.
Since the underlying mathematics of this final section is rather difficult, I have not attempted rigorous proofs, but rather have sought to lay out the natural path of development from elementary differential calculus to the powerful tools of singularity theory. In the text I have referred to work of Debreu, Balasko, Smale, and Saari, among others who, in the last few years, have used the tools of singularity theory to develop a deeper insight into the geometric structure of both the economy and the polity. These ideas are at the heart of recent notions of “chaos.” Some speculations on this profound way of thinking about the world are offered in Sect. 5.6. Review exercises are provided at the end of the book.
I thank Annette Milford for typing the manuscript and Diana Ivanov for the preparation of the figures.
I am also indebted to my graduate students for the pertinent questions they asked during the courses on mathematical methods in economics and social choice, which I have given at Essex University, the California Institute of Technology, and Washington University in St. Louis.
In particular, while I was at the California Institute of Technology I had the privilege of working with Richard McKelvey and of discussing ideas in social choice theory with Jeff Banks. It is a great loss that they have both passed away. This book is dedicated to their memory.
Norman Schofield
Saint Louis, Missouri, USA
Contents
1 Sets, Relations, and Preferences . . . 1
1.1 Elements of Set Theory . . . 1
1.1.1 A Set Theory . . . 2
1.1.2 A Propositional Calculus . . . 4
1.1.3 Partitions and Covers . . . 6
1.1.4 The Universal and Existential Quantifiers . . . 7
1.2 Relations, Functions and Operations . . . 7
1.2.1 Relations . . . 7
1.2.2 Mappings . . . 8
1.2.3 Functions . . . 11
1.3 Groups and Morphisms . . . 12
1.4 Preferences and Choices . . . 24
1.4.1 Preference Relations . . . 24
1.4.2 Rationality . . . 25
1.4.3 Choices . . . 27
1.5 Social Choice and Arrow’s Impossibility Theorem . . . 32
1.5.1 Oligarchies and Filters . . . 33
1.5.2 Acyclicity and the Collegium . . . 35
References . . . 38
2 Linear Spaces and Transformations . . . 39
2.1 Vector Spaces . . . 39
2.2 Linear Transformations . . . 45
2.2.1 Matrices . . . 46
2.2.2 The Dimension Theorem . . . 49
2.2.3 The General Linear Group . . . 53
2.2.4 Change of Basis . . . 55
2.2.5 Examples . . . 59
2.3 Canonical Representation . . . 62
2.3.1 Eigenvectors and Eigenvalues . . . 63
2.3.2 Examples . . . 66
2.3.3 Symmetric Matrices and Quadratic Forms . . . 67
2.3.4 Examples . . . 71
2.4 Geometric Interpretation of a Linear Transformation . . . 73
3 Topology and Convex Optimisation . . . 77
3.1 A Topological Space . . . 77
3.1.1 Scalar Product and Norms . . . 77
3.1.2 A Topology on a Set . . . 82
3.2 Continuity . . . 88
3.3 Compactness . . . 93
3.4 Convexity . . . 99
3.4.1 A Convex Set . . . 99
3.4.2 Examples . . . 100
3.4.3 Separation Properties of Convex Sets . . . 104
3.5 Optimisation on Convex Sets . . . 110
3.5.1 Optimisation of a Convex Preference Correspondence . . . 110
3.6 Kuhn-Tucker Theorem . . . 115
3.7 Choice on Compact Sets . . . 118
3.8 Political and Economic Choice . . . 125
References . . . 132
4 Differential Calculus and Smooth Optimisation . . . 135
4.1 Differential of a Function . . . 135
4.2 $C^r$-Differentiable Functions . . . 142
4.2.1 The Hessian . . . 142
4.2.2 Taylor’s Theorem . . . 145
4.2.3 Critical Points of a Function . . . 149
4.3 Constrained Optimisation . . . 155
4.3.1 Concave and Quasi-concave Functions . . . 155
4.3.2 Economic Optimisation with Exogenous Prices . . . 162
4.4 The Pareto Set and Price Equilibria . . . 171
4.4.1 The Welfare and Core Theorems . . . 171
4.4.2 Equilibria in an Exchange Economy . . . 180
References . . . 187
5 Singularity Theory and General Equilibrium . . . 189
5.1 Singularity Theory . . . 189
5.1.1 Regular Points: The Inverse and Implicit Function Theorem . . . 189
5.1.2 Singular Points and Morse Functions . . . 196
5.2 Transversality . . . 200
5.3 Generic Existence of Regular Economies . . . 203
5.4 Economic Adjustment and Excess Demand . . . 207
5.5 Structural Stability of a Vector Field . . . 210
5.6 Speculations on Chaos . . . 221
References . . . 227
6 Topology and Social Choice . . . 231
6.1 Existence of a Choice . . . 231
6.2 Dynamical Choice Functions . . . 233
6.3 Stochastic Choice . . . 238
6.3.1 The Model Without Activist Valence Functions . . . 244
References . . . 248
7 Review Exercises . . . 251
7.1 Exercises to Chap. 1 . . . 251
7.2 Exercises to Chap. 2 . . . 251
7.3 Exercises to Chap. 3 . . . 253
7.4 Exercises to Chap. 4 . . . 255
7.5 Exercises to Chap. 5 . . . 255
Subject Index . . . 257
Author Index . . . 261
1 Sets, Relations, and Preferences
In this chapter we introduce elementary set theory and the notation to be used throughout the book. We also define the notions of a binary relation and of a function, as well as the axioms of a group and a field. Finally we discuss the idea of an individual and social preference relation, and mention some of the concepts of social choice and welfare economics.
1.1 Elements of Set Theory
Let U be a collection of objects, which we shall call the domain of discourse, the universal set, or universe. A set B in this universe (namely a subset of U) is a subcollection of objects from U. B may be defined either explicitly by enumerating the objects, for example by writing
B = {Tom,Dick,Harry}, or
$B = \{x_1, x_2, x_3, \ldots\}$.
Alternatively B may be defined implicitly by reference to some property P(B), which characterises the elements of B, thus
B = {x : x satisfies P(B)}.
For example:
B = {x : x is an integer satisfying 1≤ x ≤ 5}
is a satisfactory definition of the set B, where the universal set could be the collection of all integers. If B is a set, write x ∈ B to mean that the element x is a member of B. Write {x} for the set which contains only one element, x.
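As an illustration only (the text itself uses no code), the explicit and implicit definitions of B can be mimicked with Python sets. The sketch assumes a finite universe, the integers from −10 to 10, since the set of all integers cannot be enumerated; the predicate `P_B` is a hypothetical stand-in for the property P(B).

```python
# Assumption: a finite stand-in for the universe of all integers.
U = set(range(-10, 11))

# Explicit definition: enumerate the members of B.
B_explicit = {1, 2, 3, 4, 5}

def P_B(x):
    """Hypothetical predicate for the property P(B): 1 <= x <= 5."""
    return 1 <= x <= 5

# Implicit definition: B = {x : x satisfies P(B)}.
B_implicit = {x for x in U if P_B(x)}

print(B_implicit == B_explicit)   # True: both definitions give the same set
print(3 in B_implicit)            # x ∈ B for x = 3: True
print(6 in B_implicit)            # False
singleton = {3}                   # the set {x} containing only x = 3
print(len(singleton))             # 1
```

The implicit form is the one that generalises: it only needs a test for membership, not a list of members.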
N. Schofield, Mathematical Methods in Economics and Social Choice, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6_1, © Springer-Verlag Berlin Heidelberg 2014
If A, B are two sets, write A ∩ B for the intersection: that is, the set which contains only those elements which are in both A and B. Write A ∪ B for the union: that is, the set whose elements are in A or in B (or in both). The null set or empty set, Φ, is that subset of U which contains no elements of U.
Finally, if A is a subset of U, define the negation of A, or complement of A in U, to be the set $U \setminus A = \bar{A} = \{x : x \text{ is in } U \text{ but not in } A\}$.
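A minimal Python sketch of these operations, assuming a small finite universe U = {1, ..., 10} and two illustrative sets A and B chosen here (they do not come from the text). Python's `&`, `|` and `-` operators on `set` objects play the roles of intersection, union and the complement of A in U.

```python
U = set(range(1, 11))              # assumed universe {1, ..., 10}
A = {x for x in U if x % 2 == 0}   # illustrative: the even numbers
B = {x for x in U if x <= 4}       # illustrative: numbers up to 4

intersection = A & B               # A ∩ B: elements in both A and B
union = A | B                      # A ∪ B: elements in A or in B (or both)
complement_A = U - A               # U \ A: elements of U not in A
empty = set()                      # the null set Φ

print(sorted(intersection))        # [2, 4]
print(sorted(union))               # [1, 2, 3, 4, 6, 8, 10]
print(sorted(complement_A))        # [1, 3, 5, 7, 9]
print((A & complement_A) == empty) # True
print((A | complement_A) == U)     # True
```

The complement is always taken relative to U, which is why the universe has to be fixed before it can be computed.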
1.1.1 A Set Theory
Now let Γ be a family of subsets of U, where Γ includes both U and Φ, i.e., Γ = {U, Φ, A, B, . . .}.
If A is a member of Γ, then write A ∈ Γ. Note that in this case Γ is a collection or family of sets.
Suppose that Γ satisfies the following properties:
1. for any $A \in \Gamma$, $\bar{A} \in \Gamma$,
2. for any A, B in Γ, A ∪ B is in Γ,
3. for any A, B in Γ, A ∩ B is in Γ.
Then we say that Γ satisfies closure with respect to (−, ∪, ∩), and we call Γ a set theory.
For example, let $2^U$ be the set of all subsets of U, including both U and Φ. Clearly $2^U$ satisfies closure with respect to (−, ∪, ∩).
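The closure conditions can be checked mechanically on a finite universe. In the sketch below (an illustration, not part of the text), $2^U$ is built with `itertools.combinations`, subsets are stored as `frozenset` so that they can themselves be members of a set, and the helper names `power_set` and `is_set_theory` are hypothetical.

```python
from itertools import combinations

def power_set(universe):
    """Hypothetical helper: 2^U as a set of frozensets (hashable, so they can be members)."""
    items = sorted(universe)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def is_set_theory(family, universe):
    """Hypothetical helper: does `family` contain U and Φ and satisfy closure
    with respect to complement, union and intersection?"""
    U = frozenset(universe)
    if U not in family or frozenset() not in family:
        return False
    for A in family:
        if (U - A) not in family:                                  # property 1
            return False
        for B in family:
            if (A | B) not in family or (A & B) not in family:    # properties 2, 3
                return False
    return True

U = {"Tom", "Dick", "Harry"}
family = power_set(U)
print(len(family))                 # 8, i.e. 2**3 subsets
print(is_set_theory(family, U))    # True

# Not closed: the complement of {"Tom"} is missing from this family.
small = {frozenset(), frozenset(U), frozenset({"Tom"})}
print(is_set_theory(small, U))     # False
```

The second family fails only the complement condition, which shows that the three closure properties are genuinely separate requirements.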
We shall call a set theory Γ that satisfies the following axioms a Boolean algebra.
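Axioms
S1. Zero element: $A \cup \Phi = A$, $A \cap \Phi = \Phi$
S2. Identity element: $A \cup U = U$, $A \cap U = A$
S3. Idempotency: $A \cup A = A$, $A \cap A = A$
S4. Negativity: $A \cup \bar{A} = U$, $A \cap \bar{A} = \Phi$, $\bar{\bar{A}} = A$
S5. Commutativity: $A \cup B = B \cup A$, $A \cap B = B \cap A$
S6. De Morgan Rule: $\overline{A \cup B} = \bar{A} \cap \bar{B}$, $\overline{A \cap B} = \bar{A} \cup \bar{B}$
S7. Associativity: $A \cup (B \cup C) = (A \cup B) \cup C$, $A \cap (B \cap C) = (A \cap B) \cap C$
S8. Distributivity: $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$, $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.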
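On a finite universe the axioms S1 to S8 can be confirmed for $2^U$ by brute force, testing every triple of subsets. The sketch below assumes U = {1, 2, 3}, so it checks 8^3 = 512 triples; this is evidence for this particular universe, not a proof for arbitrary U, and the names `comp`, `AXIOMS` and `failures` are hypothetical.

```python
from itertools import combinations, product

U = frozenset({1, 2, 3})           # assumed small universe
PHI = frozenset()                  # the null set Φ
family = [frozenset(c) for r in range(len(U) + 1)
          for c in combinations(sorted(U), r)]   # all 8 members of 2^U

def comp(A):
    """Complement of A in U."""
    return U - A

AXIOMS = {
    "S1": lambda A, B, C: (A | PHI) == A and (A & PHI) == PHI,
    "S2": lambda A, B, C: (A | U) == U and (A & U) == A,
    "S3": lambda A, B, C: (A | A) == A and (A & A) == A,
    "S4": lambda A, B, C: ((A | comp(A)) == U and (A & comp(A)) == PHI
                           and comp(comp(A)) == A),
    "S5": lambda A, B, C: (A | B) == (B | A) and (A & B) == (B & A),
    "S6": lambda A, B, C: (comp(A | B) == (comp(A) & comp(B))
                           and comp(A & B) == (comp(A) | comp(B))),
    "S7": lambda A, B, C: ((A | (B | C)) == ((A | B) | C)
                           and (A & (B & C)) == ((A & B) & C)),
    "S8": lambda A, B, C: ((A | (B & C)) == ((A | B) & (A | C))
                           and (A & (B | C)) == ((A & B) | (A & C))),
}

def failures():
    """Names of axioms violated by at least one triple A, B, C from 2^U."""
    return [name for name, holds in AXIOMS.items()
            if not all(holds(A, B, C) for A, B, C in product(family, repeat=3))]

print(len(family))   # 8
print(failures())    # []
```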
We can illustrate each of the axioms by Venn diagrams in the following way. Let the square on the page represent the universal set U. A subset B of points within U can then represent the set B. Given two subsets A, B, the union is the hatched area, while the intersection is the double-hatched area.
We shall use ⊂ to mean “included in”. Thus “A ⊂ B” means that every element in A is also an element of B. Thus:
Fig. 1.1 ???
Fig. 1.2 ???
Suppose now that P(A) is the property that characterizes A, or that
A= {x : x satisfies P(A)}.
The symbol ≡ means “identical to”, so that [x ∈ A] ≡ “x satisfies P(A)”.
Associated with any set theory is a propositional calculus which satisfies properties analogous to those of a Boolean algebra, except that we use ∧ and ∨ instead of the symbols ∩ and ∪ for “and” and “or”.
For example:
A∪B = {x : “x satisfies P(A)”∨ “x satisfies P(B)”}
A∩B = {x : “x satisfies P(A)”∧ “x satisfies P(B)”}.
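The correspondence between ∪, ∩ and ∨, ∧ can be seen directly when sets are defined by predicates. In this sketch the properties P(A) (“x is even”) and P(B) (“x is divisible by 3”) and the universe {1, ..., 20} are illustrative choices, not taken from the text; Python's `or` and `and` stand in for ∨ and ∧.

```python
U = set(range(1, 21))              # assumed universe {1, ..., 20}

def P_A(x):
    """Illustrative property P(A): x is even."""
    return x % 2 == 0

def P_B(x):
    """Illustrative property P(B): x is divisible by 3."""
    return x % 3 == 0

A = {x for x in U if P_A(x)}
B = {x for x in U if P_B(x)}

# A ∪ B = {x : P(A) ∨ P(B)} and A ∩ B = {x : P(A) ∧ P(B)}
union_by_property = {x for x in U if P_A(x) or P_B(x)}
intersection_by_property = {x for x in U if P_A(x) and P_B(x)}

print(union_by_property == A | B)          # True
print(intersection_by_property == A & B)   # True
print(sorted(intersection_by_property))    # [6, 12, 18]
```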
The analogue of “⊂” is “if . . . then” or “implies”, which is written ⇒. Thus A ⊂ B ≡ [“x satisfies P(A)” ⇒ “x satisfies P(B)”].
The analogue of “=” in set theory is the symbol “⇐⇒”, which means “if and only if”, generally written “iff”. For example,
[A= B] = [“x satisfies P(A)”⇐⇒ “x satisfies P(B)”].
Hence
[A = B] ≡ [“x ∈ A” ⇐⇒ “x ∈ B”] ≡ [A ⊂ B and B ⊂ A].
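A small sketch of the identities [A ⊂ B] ≡ [“x ∈ A” ⇒ “x ∈ B”] and [A = B] ≡ [A ⊂ B and B ⊂ A], using Python sets on an assumed universe {1, ..., 12}. Python's `<=` tests inclusion allowing equality, which matches the use of ⊂ here as “included in”; the implication p ⇒ q is evaluated as (not p) or q. The helper `same_set` is hypothetical.

```python
U = set(range(1, 13))              # assumed universe {1, ..., 12}
A = {x for x in U if x % 4 == 0}   # P(A): x is divisible by 4 (illustrative)
B = {x for x in U if x % 2 == 0}   # P(B): x is divisible by 2 (illustrative)

# A ⊂ B  ≡  for every x, "x satisfies P(A)" ⇒ "x satisfies P(B)",
# with the implication p ⇒ q evaluated as (not p) or q.
implies = all((x not in A) or (x in B) for x in U)
print(implies, A <= B)             # True True

def same_set(X, Y):
    """Hypothetical helper: [X = Y] ≡ [X ⊂ Y and Y ⊂ X]."""
    return X <= Y and Y <= X

print(same_set(A, B))              # False: 2 is in B but not in A
print(same_set(A, {4, 8, 12}))     # True
```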
1.1.2 A Propositional Calculus
Let $\{U, \Phi, P_1, \ldots, P_i, \ldots\}$ be a family of simple propositions. U is the universal proposition and always true, whereas Φ is the null proposition and always false. Two propositions $P_1, P_2$ can be combined to give a proposition $P_1 \wedge P_2$ (i.e., $P_1$ and $P_2$), which is true iff both $P_1$ and $P_2$ are true, and a proposition $P_1 \vee P_2$ (i.e., $P_1$ or $P_2$), which is true iff at least one of $P_1$ and $P_2$ is true. For a proposition P, the complement $\bar{P}$ in U is true iff P is false, and is false iff P is true.
Now extend the family of simple propositions to a family $\mathcal{P}$ by including in $\mathcal{P}$ any propositional sentence $S(P_1, \ldots, P_i, \ldots)$ which is made up of simple propositions combined under −, ∨, ∧. Then $\mathcal{P}$ satisfies closure with respect to (−, ∨, ∧) and is called a propositional calculus.
Let T be the truth function, which assigns to any simple proposition $P_i$ the value 0 if $P_i$ is false and 1 if $P_i$ is true. Then T extends to sentences in the obvious way, following the rules of logic, to give a truth function $T : \mathcal{P} \to \{0, 1\}$. If $T(S_1) = T(S_2)$ for all truth values of the constituent simple propositions of the sentences $S_1$ and $S_2$, then $S_1 = S_2$ (i.e., $S_1$ and $S_2$ are identical propositions).
For example, the truth values of the propositions $P_1 \vee P_2$ and $P_2 \vee P_1$ are given by the table:
T(P1)   T(P2)   T(P1 ∨ P2)   T(P2 ∨ P1)
  0       0         0            0
  0       1         1            1
  1       0         1            1
  1       1         1            1
Since T(P1 ∨ P2) = T(P2 ∨ P1) for all truth values, it must be the case that P1 ∨ P2 = P2 ∨ P1.
Similarly, the truth tables for P1 ∧ P2 and P2 ∧ P1 are:
T(P1)   T(P2)   T(P1 ∧ P2)   T(P2 ∧ P1)
  0       0         0            0
  0       1         0            0
  1       0         0            0
  1       1         1            1
Thus P1 ∧ P2 = P2 ∧ P1.
The propositional calculus therefore satisfies commutativity of ∧ and ∨. Using truth tables, the other properties of a Boolean algebra can be shown to hold as well.
For example:
(i) P ∨ Φ = P, P ∧ Φ = Φ.

T(P)   T(Φ)   T(P ∨ Φ)   T(P ∧ Φ)
 0      0        0          0
 1      0        1          0

(ii) P ∨ U = U, P ∧ U = P.

T(P)   T(U)   T(P ∨ U)   T(P ∧ U)
 0      1        1          0
 1      1        1          1

(iii) Negation is given by reversing the truth value. Hence the complement of P̄ is P itself.

T(P)   T(P̄)   T(complement of P̄)
 0      1            0
 1      0            1

(iv) P ∨ P̄ = U, P ∧ P̄ = Φ.

T(P)   T(P̄)   T(P ∨ P̄)   T(P ∧ P̄)
 0      1        1          0
 1      0        1          0
Example 1.1 Truth tables can be used to show that a propositional calculus 𝒫 = (U, Φ, P1, P2, . . .) with the operators (−, ∨, ∧) is a Boolean algebra.
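Example 1.1 lends itself to a mechanical check. The sketch below is an illustration in Python, not part of the original text: truth values are encoded as 0 and 1, the helper names NOT, OR, AND and identical are hypothetical, and two sentences count as identical when they agree on all 2ⁿ rows of their truth table. It checks commutativity, properties (i) to (iv) and one distributive law.

```python
from itertools import product

# Truth values are 0 (false) and 1 (true); the operators act on these values.
def NOT(p):
    return 1 - p

def OR(p, q):
    return max(p, q)

def AND(p, q):
    return min(p, q)

U, PHI = 1, 0  # the universal (always true) and null (always false) propositions

def identical(s1, s2, n):
    """True if sentences s1, s2 in n simple propositions agree on every row of the truth table."""
    return all(s1(*row) == s2(*row) for row in product((0, 1), repeat=n))

checks = {
    "P1 or P2 = P2 or P1": identical(lambda p, q: OR(p, q), lambda p, q: OR(q, p), 2),
    "P1 and P2 = P2 and P1": identical(lambda p, q: AND(p, q), lambda p, q: AND(q, p), 2),
    "(i) P or PHI = P": identical(lambda p: OR(p, PHI), lambda p: p, 1),
    "(i) P and PHI = PHI": identical(lambda p: AND(p, PHI), lambda p: PHI, 1),
    "(ii) P or U = U": identical(lambda p: OR(p, U), lambda p: U, 1),
    "(ii) P and U = P": identical(lambda p: AND(p, U), lambda p: p, 1),
    "(iii) double negation": identical(lambda p: NOT(NOT(p)), lambda p: p, 1),
    "(iv) P or not-P = U": identical(lambda p: OR(p, NOT(p)), lambda p: U, 1),
    "(iv) P and not-P = PHI": identical(lambda p: AND(p, NOT(p)), lambda p: PHI, 1),
    "distributivity": identical(lambda a, b, c: OR(a, AND(b, c)),
                                lambda a, b, c: AND(OR(a, b), OR(a, c)), 3),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Comparing complete truth tables is exactly the criterion T(S1) = T(S2) used in the text; the cost grows as 2ⁿ, which is harmless for the handful of propositions involved here.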
Suppose now that S1(A1, . . . , An) is a compound set (or sentence) which is made up of the sets A1, . . . , An together with the operators {∪, ∩, −}.
For example suppose that
S1(A1, A2, A3) = A1 ∪ (A2 ∩ A3),
and let P(A1), P(A2), P(A3) be the propositions that characterise A1, A2, A3. Then
S1(A1, A2, A3) = {x : x satisfies “S1(P(A1), P(A2), P(A3))”}.
Here S1(P(A1), P(A2), P(A3)) has precisely the same form as S1(A1, A2, A3) except that P(Ai) is substituted for Ai, and (∧, ∨) are substituted for (∩, ∪).
In the example
S1(P(A1), P(A2), P(A3)) = P(A1) ∨ (P(A2) ∧ P(A3)).
Since 𝒫 is a Boolean algebra, we know [by distributivity] that
P(A1) ∨ (P(A2) ∧ P(A3)) = (P(A1) ∨ P(A2)) ∧ (P(A1) ∨ P(A3)) = S2(P(A1), P(A2), P(A3)), say.
Hence the propositions S1(P(A1), P(A2), P(A3)) and S2(P(A1), P(A2), P(A3)) are identical, and the sentence
S1(A1, A2, A3) = {x : x satisfies (P(A1) ∨ P(A2)) ∧ (P(A1) ∨ P(A3))}
= (A1 ∪ A2) ∩ (A1 ∪ A3)
= S2(A1, A2, A3).
Consequently, if Γ = (U, Φ, A1, A2, . . .) is a set theory, then by exactly this procedure Γ can be shown to be a Boolean algebra.
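The same identity can be checked directly on sets. A minimal sketch in Python, assuming a small hypothetical universal set U = {0, 1, 2, 3} and trying every triple of its subsets; this is an exhaustive check on one finite example, not a proof.

```python
from itertools import combinations

# A small, hypothetical universal set U; every subset of U is tried as A1, A2, A3.
U = frozenset(range(4))
subsets = [frozenset(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]

def S1(A1, A2, A3):
    return A1 | (A2 & A3)            # A1 ∪ (A2 ∩ A3)

def S2(A1, A2, A3):
    return (A1 | A2) & (A1 | A3)     # (A1 ∪ A2) ∩ (A1 ∪ A3)

all_equal = all(S1(a, b, c) == S2(a, b, c)
                for a in subsets for b in subsets for c in subsets)
print(len(subsets), all_equal)       # 16 True
```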
Suppose now that Γ is a set theory with universal set U, and X is a subset of U. Let Γ_X = (X, Φ, A1 ∩ X, A2 ∩ X, . . .). Since Γ is a set theory on U, Γ_X must be a set theory on X, and thus Γ_X is itself a Boolean algebra.
To see this consider the following:
1. If A ∈ Γ, then Ā ∈ Γ. Now let A_X = A ∩ X. To define the complement or negation (let us call it Ā_X) of A_X in Γ_X, we have Ā_X = {x : x is in X but not in A} = X ∩ Ā. As we noted previously, this is also often written X − A, or X\A. But this must be the same as the complement of A ∩ X in X, i.e.,
$$\overline{(A\cap X)}\cap X = (\bar A\cup\bar X)\cap X = (\bar A\cap X)\cup(\bar X\cap X) = \bar A\cap X.$$
2. If A, B ∈ Γ then (A ∩ B) ∩ X = (A ∩ X) ∩ (B ∩ X). (The reader should examine the behaviour of union.)
A notion that is very close to that of a set theory is that of a topology. Say that a family Γ = (U, Φ, A1, A2, . . .) is a topology on U iff
T1. If A1, A2 ∈ Γ then A1 ∩ A2 ∈ Γ;
T2. If Aj ∈ Γ for all j belonging to some index set J (possibly infinite), then ∪{Aj : j ∈ J} ∈ Γ;
T3. Both U and Φ belong to Γ.
Axioms T1 and T2 may be interpreted as saying that Γ is closed under finite intersection (by repeated use of T1) and under arbitrary, possibly infinite, union.
Let X be any subset of U. Then the relative topology Γ_X induced from the topology Γ on U is defined by
Γ_X = (X, Φ, A1 ∩ X, . . .),
where any set of the form A ∩ X, for A ∈ Γ, belongs to Γ_X.
Example 1.2 We can show that Γ_X is a topology. If U1, U2 ∈ Γ_X then there must exist sets A1, A2 ∈ Γ such that Ui = Ai ∩ X, for i = 1, 2. But then
U1 ∩ U2 = (A1 ∩ X) ∩ (A2 ∩ X)
= (A1 ∩ A2) ∩ X.
Since Γ is a topology, A1 ∩ A2 ∈ Γ. Thus U1 ∩ U2 ∈ Γ_X. Union follows similarly, since ∪{Aj ∩ X : j ∈ J} = (∪{Aj : j ∈ J}) ∩ X. Finally X = U ∩ X and Φ = Φ ∩ X both belong to Γ_X.
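For a finite family the axioms T1 to T3 can be checked by brute force. The sketch below is illustrative only: the helper names is_topology and relative_topology and the chain topology on U = {1, 2, 3, 4} are hypothetical. For a finite family, closure under pairwise union and intersection is enough, since every union or intersection of members is then a finite one.

```python
from itertools import combinations

def is_topology(family, U):
    """Check T1-T3 for a finite family of frozensets on U (pairwise closure suffices here)."""
    fam = set(family)
    if frozenset() not in fam or frozenset(U) not in fam:   # T3
        return False
    for A, B in combinations(fam, 2):
        if A & B not in fam or A | B not in fam:             # T1 and T2
            return False
    return True

def relative_topology(family, X):
    """Gamma_X = {A ∩ X : A in Gamma}."""
    return {A & X for A in family}

U = frozenset({1, 2, 3, 4})
gamma = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3}), U}
X = frozenset({2, 3, 4})
gamma_X = relative_topology(gamma, X)
print(is_topology(gamma, U), is_topology(gamma_X, X))   # True True
print(sorted(sorted(s) for s in gamma_X))               # [[], [2], [2, 3], [2, 3, 4]]
```

Example 1.2 guarantees the second check succeeds whenever the first does; the code only confirms this on one concrete family.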
1.1.3 Partitions and Covers
If X is a set, a cover for X is a family Γ = (A1, A2, . . . , Aj, . . .), where j belongs to an index set J (possibly infinite), such that
X =∪{Aj : j ∈ J }.
A partition for X is a cover which is disjoint, i.e., Aj ∩ Ak = Φ for any distinct j, k ∈ J.
If Γ_X is a cover for X, and Y is a subset of X, then Γ_Y = {Aj ∩ Y : j ∈ J} is the induced cover on Y.
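Covers and partitions of a finite set can be tested directly. The Python sketch below uses hypothetical helper names and example families: is_cover checks that the union of the family is X, is_partition adds pairwise disjointness, and induced_cover intersects each member with Y.

```python
def is_cover(family, X):
    """A family of sets covers X when the union of its members is exactly X."""
    return set().union(*family) == set(X)

def is_partition(family, X):
    """A partition is a cover whose members are pairwise disjoint."""
    family = list(family)
    disjoint = all(not (family[j] & family[k])
                   for j in range(len(family)) for k in range(j + 1, len(family)))
    return is_cover(family, X) and disjoint

def induced_cover(family, Y):
    """Gamma_Y = {A_j ∩ Y : j in J}."""
    return [set(A) & set(Y) for A in family]

X = {1, 2, 3, 4, 5}
cover = [{1, 2, 3}, {3, 4}, {5}]        # hypothetical cover: 3 lies in two members
partition = [{1, 2}, {3, 4}, {5}]       # hypothetical partition
print(is_cover(cover, X), is_partition(cover, X), is_partition(partition, X))
print(induced_cover(cover, {2, 3, 5}))
```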
1.1.4 The Universal and Existential Quantifiers
Two operators which may be used to construct propositions are the universal and existential quantifiers.
For example, “for all x in A it is the case that x satisfies P(A).” The term “for all” is the universal quantifier, and is generally written as ∀.
On the other hand we may say “there exists some x in A such that x satisfies P(A).” The term “there exists” is the existential quantifier, generally written ∃.
Note that these have negations as follows:
not[∃ x s.t. x satisfies P] ≡ [∀ x : x does not satisfy P]
not[∀ x : x satisfies P] ≡ [∃ x s.t. x does not satisfy P].
We use s.t. to mean “such that”.
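Over a finite set the quantifiers correspond to Python's built-in all() and any(), so the two negation rules can be tested directly. The set A and the properties P and Q below are hypothetical examples chosen so that one property holds everywhere and the other only somewhere.

```python
A = range(-3, 4)   # a hypothetical finite set

def P(x):
    return x * x >= 0    # satisfied by every x in A

def Q(x):
    return x > 2         # satisfied by some, but not all, x in A

for prop in (P, Q):
    # not[∃ x s.t. x satisfies prop]  ≡  [∀ x : x does not satisfy prop]
    assert (not any(prop(x) for x in A)) == all(not prop(x) for x in A)
    # not[∀ x : x satisfies prop]  ≡  [∃ x s.t. x does not satisfy prop]
    assert (not all(prop(x) for x in A)) == any(not prop(x) for x in A)

print(all(P(x) for x in A), any(Q(x) for x in A), all(Q(x) for x in A))   # True True False
```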
1.2 Relations, Functions and Operations
1.2.1 Relations
If X, Y are two sets, the Cartesian product set X × Y is the set of ordered pairs (x, y) such that x ∈ X and y ∈ Y.
For example if we let ℝ be the set of real numbers, then ℝ × ℝ or ℝ² is the set
{(x, y) : x ∈ ℝ, y ∈ ℝ},
namely the plane. Similarly ℝⁿ = ℝ × · · · × ℝ (n times) is the set of n-tuples of real numbers, defined by induction, i.e., ℝⁿ = ℝ × (ℝ × (ℝ × · · ·)).
A subset of the Cartesian product Y × X is called a relation, P, on Y × X. If (y, x) ∈ P then we sometimes write yPx and say that y stands in relation P to x. If it is not the case that (y, x) ∈ P then write (y, x) ∉ P or not (yPx). X is called the domain of P, and Y is called the target or codomain of P.
If V is a relation on Y × X and W is a relation on Z × Y, then define the relation W ◦ V to be the relation on Z × X given by (z, x) ∈ W ◦ V iff for some y ∈ Y, (z, y) ∈ W and (y, x) ∈ V. The new relation W ◦ V on Z × X is called the composition of W and V.
The identity relation (or diagonal) e_X on X × X is
e_X = {(x, x) : x ∈ X}.
If P is a relation on Y × X, its inverse, P⁻¹, is the relation on X × Y defined by
P⁻¹ = {(x, y) ∈ X × Y : (y, x) ∈ P}.
Note that:
P⁻¹ ◦ P = {(z, x) ∈ X × X : ∃ y ∈ Y s.t. (z, y) ∈ P⁻¹ and (y, x) ∈ P}.
Suppose that the domain of P is X, i.e., for every x ∈ X there is some y ∈ Y s.t. (y, x) ∈ P. In this case, for every x ∈ X there exists y ∈ Y such that (x, y) ∈ P⁻¹, and so (x, x) ∈ P⁻¹ ◦ P for any x ∈ X. Hence e_X ⊂ P⁻¹ ◦ P. In the same way
P ◦ P⁻¹ = {(t, y) ∈ Y × Y : ∃ x ∈ X s.t. (t, x) ∈ P and (x, y) ∈ P⁻¹},
and so, provided every y ∈ Y stands in relation P to some x ∈ X (i.e., Y is the image of P), e_Y ⊂ P ◦ P⁻¹.
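The definitions of composition, inverse and diagonal translate directly into set comprehensions over pairs. In this illustrative Python sketch a relation on Y × X is a set of pairs (y, x); the helper names compose, inverse, diagonal and the small relation P are hypothetical.

```python
def compose(W, V):
    """W ∘ V = {(z, x) : (z, y) ∈ W and (y, x) ∈ V for some y}."""
    return {(z, x) for (z, y1) in W for (y2, x) in V if y1 == y2}

def inverse(P):
    """P⁻¹ = {(x, y) : (y, x) ∈ P}."""
    return {(x, y) for (y, x) in P}

def diagonal(S):
    """The identity relation e_S = {(s, s) : s ∈ S}."""
    return {(s, s) for s in S}

# A hypothetical relation on Y × X, written as pairs (y, x).
X, Y = {1, 2, 3}, {"a", "b"}
P = {("a", 1), ("b", 1), ("a", 2), ("b", 3)}

domain = {x for (_, x) in P}   # every x ∈ X is related to some y
image = {y for (y, _) in P}    # every y ∈ Y is related to some x
print(domain == X, diagonal(X) <= compose(inverse(P), P))   # True True
print(image == Y, diagonal(Y) <= compose(P, inverse(P)))    # True True
```

The `<=` operator on Python sets is the subset test, so it expresses the inclusions e_X ⊂ P⁻¹ ◦ P and e_Y ⊂ P ◦ P⁻¹ directly.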
1.2.2 Mappings
A relation P on Y × X defines an assignment or mapping from X to Y, which is called φ_P and is given by
φ_P(x) = {y : (y, x) ∈ P}.
In general we write φ : X → Y for a mapping which assigns to each element x of X the set, φ(x), of elements in Y. As above, the set Y is called the codomain of φ.
The domain of a mapping φ is the set {x ∈ X : ∃ y ∈ Y s.t. y ∈ φ(x)}, and the image of φ is {y ∈ Y : ∃ x ∈ X s.t. y ∈ φ(x)}.
Suppose now that V, W are relations on Y × X and Z × Y respectively. We have defined the composite relation W ◦ V on Z × X. This defines a mapping φ_{W◦V} : X → Z by z ∈ φ_{W◦V}(x) iff ∃ y ∈ Y such that (y, x) ∈ V and (z, y) ∈ W. This in turn means that y ∈ φ_V(x) and z ∈ φ_W(y).
If φ : X → Y and ψ : Y → Z are two mappings, then define their composition ψ ◦ φ : X → Z by
(ψ ◦ φ)(x) = ψ[φ(x)] = ∪{ψ(y) : y ∈ φ(x)}.
Clearly z ∈ φ_{W◦V}(x) iff z ∈ φ_W[φ_V(x)]. Thus φ_{W◦V}(x) = φ_W[φ_V(x)] = (φ_W ◦ φ_V)(x) for all x ∈ X. We therefore write
φ_{W◦V} = φ_W ◦ φ_V.
For example suppose V and W are given by
V = {(2,3), (3,2), (1,2), (4,4), (4,1)}
and
W = {(1,4), (4,4), (4,1), (2,1), (2,3), (3,2)}
with mappings
φ_V : 1 ↦ {4}, 2 ↦ {1, 3}, 3 ↦ {2}, 4 ↦ {4};
φ_W : 1 ↦ {2, 4}, 2 ↦ {3}, 3 ↦ {2}, 4 ↦ {1, 4}.
Then the composite mapping φ_W ◦ φ_V = φ_{W◦V} is
1 ↦ {1, 4}, 2 ↦ {2, 4}, 3 ↦ {3}, 4 ↦ {1, 4},
with relation
W ◦ V = {(3,3), (2,2), (4,2), (1,4), (4,4), (1,1), (4,1)}.
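The example can be reproduced mechanically. In the sketch below (Python, illustrative; helper names are hypothetical) a mapping is stored as a dictionary sending x to the set φ(x), built from a relation by φ_R(x) = {y : (y, x) ∈ R}; it recomputes W ◦ V and confirms φ_{W◦V} = φ_W ◦ φ_V for the relations V and W above.

```python
def mapping(R):
    """φ_R(x) = {y : (y, x) ∈ R}, returned as a dict from x to a set of y."""
    phi = {}
    for (y, x) in R:
        phi.setdefault(x, set()).add(y)
    return phi

def compose_maps(psi, phi):
    """(ψ ∘ φ)(x) = ∪{ψ(y) : y ∈ φ(x)}; any y outside the domain of ψ contributes nothing."""
    return {x: set().union(*(psi.get(y, set()) for y in ys)) for x, ys in phi.items()}

def compose(W, V):
    """W ∘ V = {(z, x) : (z, y) ∈ W and (y, x) ∈ V for some y}."""
    return {(z, x) for (z, y1) in W for (y2, x) in V if y1 == y2}

V = {(2, 3), (3, 2), (1, 2), (4, 4), (4, 1)}
W = {(1, 4), (4, 4), (4, 1), (2, 1), (2, 3), (3, 2)}

WV = compose(W, V)
print(sorted(WV))
print(mapping(WV) == compose_maps(mapping(W), mapping(V)))   # φ_{W∘V} = φ_W ∘ φ_V
```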
Given a mapping φ : X → Y, the reverse procedure to the above gives a relation, called the graph of φ, or graph(φ), where
graph(φ) = ∪{φ(x) × {x} : x ∈ X} ⊂ Y × X.
In the obvious way, if φ : X → Y and ψ : Y → Z are mappings with composition ψ ◦ φ : X → Z, then graph(ψ ◦ φ) = graph(ψ) ◦ graph(φ).
Suppose now that P is a relation on Y × X, with inverse P⁻¹ on X × Y, and let φ_P : X → Y be the mapping defined by P. Then the mapping φ_{P⁻¹} : Y → X is defined as follows:
φ_{P⁻¹}(y) = {x : (x, y) ∈ P⁻¹}
= {x : (y, x) ∈ P}
= {x : y ∈ φ_P(x)}.
More generally, if φ : X → Y is a mapping then the inverse mapping φ⁻¹ : Y → X is given by
φ⁻¹(y) = {x : y ∈ φ(x)}.
Thus
φ_{P⁻¹} = (φ_P)⁻¹ : Y → X.
For example let Z4 be the first four positive integers and let P be the relation on Z4 × Z4 given by
P = {(2,3), (3,2), (1,2), (4,4), (4,1)}.
Then the mapping φ_P and its inverse φ_{P⁻¹} are given by:
φ_P : 1 ↦ {4}, 2 ↦ {1, 3}, 3 ↦ {2}, 4 ↦ {4};
φ_{P⁻¹} : 1 ↦ {2}, 2 ↦ {3}, 3 ↦ {2}, 4 ↦ {1, 4}.
If we compose P⁻¹ and P as above then we obtain
P⁻¹ ◦ P = {(1,1), (1,4), (4,1), (4,4), (2,2), (3,3)},
with mapping
φ_{P⁻¹} ◦ φ_P : 1 ↦ {1, 4}, 2 ↦ {2}, 3 ↦ {3}, 4 ↦ {1, 4}.
Note that P⁻¹ ◦ P contains the identity or diagonal relation e = {(1,1), (2,2), (3,3), (4,4)} on Z4 = {1, 2, 3, 4}. Moreover φ_{P⁻¹} ◦ φ_P = φ_{P⁻¹◦P}.
The mapping id_X : X → X defined by id_X(x) = x is called the identity mapping on X. Clearly if e_X is the identity relation, then φ_{e_X} = id_X and graph(id_X) = e_X.
If φ, ψ are two mappings X → Y then write ψ ⊂ φ iff for each x ∈ X, ψ(x) ⊂ φ(x).
As we have seen, e_X ⊂ P⁻¹ ◦ P and so
φ_{e_X} = id_X ⊂ φ_{P⁻¹◦P} = φ_{P⁻¹} ◦ φ_P = (φ_P)⁻¹ ◦ φ_P.
(This is only precisely true when X is the domain of P, i.e., when for every x ∈ X there exists some y ∈ Y such that (y, x) ∈ P.)
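A similar sketch confirms φ_{P⁻¹} = (φ_P)⁻¹ and id_X ⊂ φ_{P⁻¹} ◦ φ_P for the relation P of this example. It is an illustration in Python with hypothetical helper names, and it re-declares the small helpers so that it runs on its own.

```python
def mapping(R):
    """φ_R(x) = {y : (y, x) ∈ R}."""
    phi = {}
    for (y, x) in R:
        phi.setdefault(x, set()).add(y)
    return phi

def inverse(R):
    """R⁻¹ = {(x, y) : (y, x) ∈ R}."""
    return {(x, y) for (y, x) in R}

def inverse_mapping(phi):
    """φ⁻¹(y) = {x : y ∈ φ(x)}."""
    inv = {}
    for x, ys in phi.items():
        for y in ys:
            inv.setdefault(y, set()).add(x)
    return inv

P = {(2, 3), (3, 2), (1, 2), (4, 4), (4, 1)}
phi_P = mapping(P)
phi_Pinv = mapping(inverse(P))
print(phi_Pinv == inverse_mapping(phi_P))            # φ_{P⁻¹} = (φ_P)⁻¹

# (φ_{P⁻¹} ∘ φ_P)(x) = ∪{φ_{P⁻¹}(y) : y ∈ φ_P(x)}
comp = {x: set().union(*(phi_Pinv[y] for y in ys)) for x, ys in phi_P.items()}
print(all(x in comp[x] for x in {1, 2, 3, 4}))       # id_X ⊂ φ_{P⁻¹} ∘ φ_P
```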
1.2.3 Functions
If for all x in the domain of φ there is exactly one y such that y ∈ φ(x), then φ is called a function. In this case we generally write f : X → Y, and sometimes $x \xrightarrow{f} y$ to indicate that f(x) = y. Consider the function f, its inverse f⁻¹ and the composite f⁻¹ ◦ f given by
f : 1 ↦ 4, 2 ↦ 3, 3 ↦ 2, 4 ↦ 4;
f⁻¹ : 4 ↦ {1, 4}, 3 ↦ {2}, 2 ↦ {3};
f⁻¹ ◦ f : 1 ↦ {1, 4}, 2 ↦ {2}, 3 ↦ {3}, 4 ↦ {1, 4}.
Clearly f⁻¹ is not a function, since it maps 4 to both 1 and 4, i.e., the graph of f⁻¹ is {(1,4), (4,4), (2,3), (3,2)}. In this case id_X is contained in f⁻¹ ◦ f but is not identical to f⁻¹ ◦ f. Suppose that f⁻¹ is in fact a function. Then it is necessary that for each y in the image there be at most one x such that f(x) = y. Equivalently, if f(x1) = f(x2) then it must be the case that x1 = x2. In this case f is called 1-1 or injective. Then f⁻¹ is a function and
id_X = f⁻¹ ◦ f on the domain X of f,
id_Y = f ◦ f⁻¹ on the image Y of f.
A mapping φ : X → Y is said to be surjective (or called a surjection) iff every y ∈ Y belongs to the image of φ; that is, ∃ x ∈ X s.t. y ∈ φ(x).
A function f : X → Y which is both injective and surjective is said to be bijective.
Example 1.3 Consider
π : 1 ↦ 4, 4 ↦ 2, 2 ↦ 3, 3 ↦ 1,
π⁻¹ : 4 ↦ 1, 2 ↦ 4, 3 ↦ 2, 1 ↦ 3.
In this case the domain and image of π coincide (both are {1, 2, 3, 4}) and π is known as a permutation.
Consider the possibilities where φ is a mapping ℝ → ℝ, with graph(φ) ⊂ ℝ². (Remember ℝ is the set of real numbers.) There are three cases:
(i) φ is a mapping:
(ii) φ is a non injective function:
(iii) φ is an injective function:
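The properties defined in Sect. 1.2.3 can be tested for mappings on a finite set. In this Python sketch (helper names hypothetical) a mapping is a dictionary from x to the set φ(x); it classifies the function f of Sect. 1.2.3, its inverse f⁻¹ and the permutation π of Example 1.3, taking {1, 2, 3, 4} as the codomain.

```python
# A mapping on a finite set is stored as a dict from x to the set φ(x).
def is_function(phi):
    return all(len(ys) == 1 for ys in phi.values())

def is_injective(phi):
    if not is_function(phi):
        return False
    values = [next(iter(ys)) for ys in phi.values()]
    return len(values) == len(set(values))      # no two x share the same f(x)

def is_surjective(phi, Y):
    return set().union(*phi.values()) == set(Y)

def is_bijective(phi, Y):
    return is_injective(phi) and is_surjective(phi, Y)

f = {1: {4}, 2: {3}, 3: {2}, 4: {4}}      # the function f of Sect. 1.2.3
pi = {1: {4}, 4: {2}, 2: {3}, 3: {1}}     # the permutation π of Example 1.3
f_inv = {4: {1, 4}, 3: {2}, 2: {3}}       # f⁻¹, defined on the image of f

for name, phi in (("f", f), ("pi", pi), ("f_inv", f_inv)):
    print(name, is_function(phi), is_injective(phi), is_bijective(phi, {1, 2, 3, 4}))
```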
1.3 Groups and Morphisms
We earlier defined the composition of two mappings φ : X → Y and ψ : Y → Z to be ψ ◦ φ : X → Z given by (ψ ◦ φ)(x) = ψ[φ(x)] = ∪{ψ(y) : y ∈ φ(x)}. In the case of functions f : X → Y and g : Y → Z this translates to
(g ◦ f)(x) = g[f(x)] = {g(y) : y = f(x)}.
Since both f, g are functions the set on the right is a singleton set, and so g ◦ f is a function. Write F(A, B) for the set of functions from A to B. Thus the composition operator, ◦, may be regarded as a function:
◦ : F(X, Y) × F(Y, Z) → F(X, Z), (f, g) ↦ g ◦ f.
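Example 1.4 To illustrate, consider the function (or matrix) F given by
$$\begin{pmatrix} a & b\\ c & d\end{pmatrix}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} ax_1 + bx_2\\ cx_1 + dx_2\end{pmatrix}.$$
This can be regarded as a function F : ℝ² → ℝ², since it maps (x1, x2) ↦ (ax1 + bx2, cx1 + dx2) ∈ ℝ².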
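Now let
$$F = \begin{pmatrix} a & b\\ c & d\end{pmatrix},\qquad H = \begin{pmatrix} e & f\\ g & h\end{pmatrix}.$$
H ◦ F (first apply F, then H) is represented by
$$\begin{pmatrix} x_1\\ x_2\end{pmatrix} \xrightarrow{F} \begin{pmatrix} ax_1+bx_2\\ cx_1+dx_2\end{pmatrix} \xrightarrow{H} \begin{pmatrix} e(ax_1+bx_2)+f(cx_1+dx_2)\\ g(ax_1+bx_2)+h(cx_1+dx_2)\end{pmatrix}.$$
Thus
$$(H\circ F)\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} ea+fc & eb+fd\\ ga+hc & gb+hd\end{pmatrix}\begin{pmatrix} x_1\\ x_2\end{pmatrix}$$
or
$$H\circ F = \begin{pmatrix} e & f\\ g & h\end{pmatrix}\circ\begin{pmatrix} a & b\\ c & d\end{pmatrix} = \begin{pmatrix} ea+fc & eb+fd\\ ga+hc & gb+hd\end{pmatrix}.$$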
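The entries of H ◦ F can be checked numerically: applying F and then H to a vector should agree with applying the single matrix computed above. The Python sketch below uses hypothetical integer entries and hypothetical helper names apply and compose; it also shows that reversing the order generally gives a different matrix.

```python
def apply(M, x):
    """Apply the 2×2 matrix M = ((a, b), (c, d)) to the column vector x = (x1, x2)."""
    (a, b), (c, d) = M
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])

def compose(H, F):
    """Matrix of H ∘ F (first F, then H), using the entries derived in Example 1.4."""
    (e, f), (g, h) = H
    (a, b), (c, d) = F
    return ((e * a + f * c, e * b + f * d),
            (g * a + h * c, g * b + h * d))

F = ((1, 2), (3, 4))      # hypothetical entries a, b, c, d
H = ((0, 1), (-1, 5))     # hypothetical entries e, f, g, h
x = (7, -2)
print(apply(H, apply(F, x)) == apply(compose(H, F), x))   # True
print(compose(H, F), compose(F, H))                       # ((3, 4), (14, 18)) ((-2, 11), (-4, 23))
```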
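The identity E is the function
$$E\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} a & b\\ c & d\end{pmatrix}\begin{pmatrix} x_1\\ x_2\end{pmatrix} = \begin{pmatrix} x_1\\ x_2\end{pmatrix}.$$
Since this must be true for all x1, x2, it follows that a = d = 1 and c = b = 0. Thus
$$E = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}.$$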
Suppose that the mapping F⁻¹ : ℝ² → ℝ² is actually a matrix. Then it is certainly a function, and by Sect. 1.2.3, F⁻¹ ◦ F must be equal to the identity function on ℝ², which here we call E. To determine F⁻¹, proceed as follows.
Let $F^{-1} = \begin{pmatrix} e & f\\ g & h\end{pmatrix}$. We know $F^{-1}\circ F = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}$.
Thus
ea + fc = 1,   eb + fd = 0,
ga + hc = 0,   gb + hd = 1.
If a ≠ 0 and b ≠ 0 then e = −fd/b = (1 − fc)/a.
Now let |F| = (ad − bc), where |F| is called the determinant of F. Equating the two expressions for e gives −f(ad − bc) = b, so clearly if |F| ≠ 0, then f = −b/|F|. More generally, if |F| ≠ 0 then we can solve the equations to obtain:
$$F^{-1} = \begin{pmatrix} e & f\\ g & h\end{pmatrix} = \frac{1}{|F|}\begin{pmatrix} d & -b\\ -c & a\end{pmatrix}.$$
If |F| = 0, then what we have called F⁻¹ is not defined. This suggests that when |F| = 0, the inverse F⁻¹ cannot be represented by a matrix, and in particular that F⁻¹ is not a function. In this case we shall call F singular. When |F| ≠ 0 we shall call F non-singular, and in this case F⁻¹ can be represented by a matrix, and is thus a function. Let M(2) stand for the set of 2 × 2 matrices, and let M*(2) be the subset of M(2) consisting of non-singular matrices.
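The inverse formula can be checked with exact rational arithmetic. In this Python sketch (illustrative; the matrices and helper names are hypothetical) inverse returns None for a singular matrix, mirroring the remark that F⁻¹ is then not defined.

```python
from fractions import Fraction

def det(F):
    (a, b), (c, d) = F
    return a * d - b * c

def inverse(F):
    """Return F⁻¹ = (1/|F|)·((d, -b), (-c, a)), or None when F is singular (|F| = 0)."""
    (a, b), (c, d) = F
    D = det(F)
    if D == 0:
        return None
    D = Fraction(D)   # exact rational arithmetic, no rounding
    return ((d / D, -b / D), (-c / D, a / D))

def compose(H, F):
    (e, f), (g, h) = H
    (a, b), (c, d) = F
    return ((e * a + f * c, e * b + f * d), (g * a + h * c, g * b + h * d))

F = ((2, 1), (5, 3))      # |F| = 1, non-singular
S = ((1, 2), (2, 4))      # |S| = 0, singular
print(compose(inverse(F), F) == ((1, 0), (0, 1)))   # True: F⁻¹ ∘ F = E
print(inverse(S))                                   # None
```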
We have here defined a composition operation:
◦ : M(2) × M(2) → M(2), (H, F) ↦ H ◦ F.
Suppose we compose E with F; then
$$E\circ F = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix} a & b\\ c & d\end{pmatrix} = \begin{pmatrix} a & b\\ c & d\end{pmatrix} = F.$$
Finally, for any F ∈ M*(2) there exists a unique matrix F⁻¹ ∈ M(2) such that
F⁻¹ ◦ F = E.
Indeed, if we compute the inverse (F⁻¹)⁻¹ of F⁻¹, then we see that (F⁻¹)⁻¹ = F. Thus F⁻¹ itself belongs to M*(2).
M*(2) is an example of what is called a group.
More generally, a binary operation, ◦, on a set G is a function
◦ : G × G → G, (x, y) ↦ x ◦ y.
Definition 1.1 A group G is a set G together with a binary operation ◦ : G × G → G which
1. is associative: (x ◦ y) ◦ z = x ◦ (y ◦ z) for all x, y, z in G;
2. has an identity e: e ◦ x = x ◦ e = x ∀ x ∈ G;
3. has for each x ∈ G an inverse x⁻¹ ∈ G such that x ◦ x⁻¹ = x⁻¹ ◦ x = e.
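For a finite set, the three conditions of Definition 1.1 (together with the requirement that ◦ maps G × G into G) can be checked exhaustively. The Python sketch below is illustrative, with a hypothetical helper is_group; it uses addition and multiplication modulo 2, anticipating Example 1.6.

```python
from itertools import product

def is_group(G, op):
    """Check Definition 1.1 for a finite set G with binary operation op(x, y)."""
    G = list(G)
    if any(op(x, y) not in G for x, y in product(G, G)):          # op maps G × G into G
        return False
    if any(op(op(x, y), z) != op(x, op(y, z)) for x, y, z in product(G, G, G)):
        return False                                              # 1. associativity
    identities = [e for e in G if all(op(e, x) == x == op(x, e) for x in G)]
    if not identities:                                            # 2. identity
        return False
    e = identities[0]
    return all(any(op(x, y) == e == op(y, x) for y in G) for x in G)   # 3. inverses

def add_mod2(x, y):
    return (x + y) % 2

def mul_mod2(x, y):
    return (x * y) % 2

print(is_group([0, 1], add_mod2))   # True
print(is_group([0, 1], mul_mod2))   # False: 0 has no inverse
print(is_group([1], mul_mod2))      # True: the trivial group
```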
When G is a group with operation ◦, write (G, ◦) to signify this.
Associativity simply means that the way a sequence of compositions is bracketed is irrelevant (although the order of the terms may still matter). For example consider the integers, Z, under addition. Clearly a + (b + c) = (a + b) + c, where the left hand side means add b to c, and then add
a to this, while the right hand side is obtained by adding a to b, and then adding c to this. Under addition, the identity is that element e ∈ Z such that a + e = a. This is usually written 0. Finally the additive inverse of an integer a ∈ Z is (−a), since a + (−a) = 0. Thus (Z, +) is a group.
However, consider the integers under multiplication, which we shall write as “·”. Again we have associativity since
a · (b · c) = (a · b) · c.
Clearly 1 is the identity since 1 · a = a. However the inverse of a is that object a⁻¹ such that a · a⁻¹ = 1. Of course if a = 0, then no such inverse exists. For a ≠ 0, a⁻¹ is more commonly written 1/a. When a is non-zero and different from ±1, then 1/a is not an integer. Thus (Z, ·) is not a group. Consider the set Q of rationals, i.e., a ∈ Q iff a = p/q, where both p and q are integers (with q ≠ 0). Clearly 1 ∈ Q. Moreover, if a = p/q with p ≠ 0, then a⁻¹ = q/p and so belongs to Q. Although zero does not have an inverse, we can regard (Q\{0}, ·) as a group.
Lemma 1.1 If (G, ◦) is a group, then the identity e is unique and for each x ∈ G the inverse x⁻¹ is unique. By definition e⁻¹ = e. Also (x⁻¹)⁻¹ = x for any x ∈ G.
Proof
1. Suppose there exist two distinct identities, e, f. Then e ◦ x = x = f ◦ x for every x. Thus (e ◦ x) ◦ x⁻¹ = (f ◦ x) ◦ x⁻¹. This is true because the composition operation
((e ◦ x), x⁻¹) ↦ (e ◦ x) ◦ x⁻¹
gives a unique answer. By associativity (e ◦ x) ◦ x⁻¹ = e ◦ (x ◦ x⁻¹), and similarly for f. Thus e ◦ (x ◦ x⁻¹) = f ◦ (x ◦ x⁻¹). But x ◦ x⁻¹ = e, where e is the identity used to define x⁻¹.
Since e is an identity, e ◦ e = f ◦ e, and so e = f. Since e ◦ e = e it must be the case that e⁻¹ = e.
2. In the same way suppose x has two distinct inverses, y, z, so x ◦ y = x ◦ z = e. Then, using y ◦ x = e,
y ◦ (x ◦ y) = y ◦ (x ◦ z)
(y ◦ x) ◦ y = (y ◦ x) ◦ z
e ◦ y = e ◦ z
y = z.
3. Finally consider the inverse of x⁻¹. Since x ◦ x⁻¹ = x⁻¹ ◦ x = e, x is an inverse of x⁻¹; by definition (x⁻¹)⁻¹ is also an inverse of x⁻¹, so by part (2) it must be the case that (x⁻¹)⁻¹ = x. □
We can now construct some interesting groups.
Lemma 1.2 The set M*(2) of 2 × 2 non-singular matrices forms a group under matrix composition, ◦.
Proof We have already shown that there exists an identity matrix E in M*(2). Clearly |E| = 1 and so E has inverse E.
As we saw in Example 1.4, when we solved H ◦ F = E we found that
$$H = F^{-1} = \frac{1}{|F|}\begin{pmatrix} d & -b\\ -c & a\end{pmatrix}.$$
By Lemma 1.1, (F⁻¹)⁻¹ = F and so F⁻¹ must have an inverse, i.e., |F⁻¹| ≠ 0, and so F⁻¹ is non-singular. Suppose now that the two matrices H, F belong to M*(2). Let
$$F = \begin{pmatrix} a & b\\ c & d\end{pmatrix}\quad\text{and}\quad H = \begin{pmatrix} e & f\\ g & h\end{pmatrix}.$$
As in Example 1.4,
$$|H\circ F| = \begin{vmatrix} ea+fc & eb+fd\\ ga+hc & gb+hd\end{vmatrix} = (ea+fc)(gb+hd) - (ga+hc)(eb+fd) = (eh-gf)(ad-bc) = |H|\,|F|.$$
Since both H and F are non-singular, |H| ≠ 0 and |F| ≠ 0, and so |H ◦ F| ≠ 0. Thus H ◦ F belongs to M*(2), and so matrix composition is a binary operation M*(2) × M*(2) → M*(2).
Finally the reader may like to verify that matrix composition on M*(2) is associative. That is to say, if F, G, H are non-singular 2 × 2 matrices then
H ◦ (G ◦ F) = (H ◦ G) ◦ F.
As a consequence (M*(2), ◦) is a group. □
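Lemma 1.2 can be spot-checked on random integer matrices. This Python sketch tests |H ◦ F| = |H||F|, closure of M*(2) and associativity on a few hundred randomly drawn non-singular matrices; the helper names are hypothetical, it uses a fixed random seed, and it is evidence rather than a proof.

```python
import random

def det(F):
    (a, b), (c, d) = F
    return a * d - b * c

def compose(H, F):
    (e, f), (g, h) = H
    (a, b), (c, d) = F
    return ((e * a + f * c, e * b + f * d), (g * a + h * c, g * b + h * d))

def random_nonsingular(rng):
    """Draw a random 2×2 integer matrix with non-zero determinant."""
    while True:
        F = tuple(tuple(rng.randint(-5, 5) for _ in range(2)) for _ in range(2))
        if det(F) != 0:
            return F

rng = random.Random(0)   # fixed seed so the check is reproducible
for _ in range(200):
    F, G, H = (random_nonsingular(rng) for _ in range(3))
    assert det(compose(H, F)) == det(H) * det(F)                    # |H ∘ F| = |H||F|
    assert det(compose(H, F)) != 0                                  # closure in M*(2)
    assert compose(H, compose(G, F)) == compose(compose(H, G), F)   # associativity
print("all checks passed")
```

Integer entries keep every computation exact, so the equalities are tested without rounding error.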
Example 1.5 For a second example consider the addition operation on M(2) defined by
$$\begin{pmatrix} e & f\\ g & h\end{pmatrix} + \begin{pmatrix} a & b\\ c & d\end{pmatrix} = \begin{pmatrix} a+e & f+b\\ g+c & h+d\end{pmatrix}.$$
Fig. 1.3 ???
Clearly the identity for addition is the zero matrix $\begin{pmatrix} 0 & 0\\ 0 & 0\end{pmatrix}$, and the inverse of F is
$$-F = -\begin{pmatrix} a & b\\ c & d\end{pmatrix} = \begin{pmatrix} -a & -b\\ -c & -d\end{pmatrix}.$$
Thus (M(2), +) is a group.
Finally consider those matrices which represent rotations in ℝ². If we rotate the point (1, 0) in the plane through an angle θ in the anticlockwise direction then the result is the point (cos θ, sin θ), while the point (0, 1) is transformed to (−sin θ, cos θ). As we shall see later, this rotation can be represented by the matrix
$$\begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$$
which we will call e^{iθ}.
Let Θ be the set of all matrices of this form, where θ can be any angle between 0 and 360°. If e^{iθ} and e^{iψ} are rotations by θ, ψ respectively, and we rotate by θ first and then by ψ, then the result should be identical to a rotation by ψ + θ. To see this:
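$$\begin{pmatrix} \cos\psi & -\sin\psi\\ \sin\psi & \cos\psi\end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} = \begin{pmatrix} \cos\psi\cos\theta - \sin\psi\sin\theta & -\cos\psi\sin\theta - \sin\psi\cos\theta\\ \sin\psi\cos\theta + \cos\psi\sin\theta & -\sin\psi\sin\theta + \cos\psi\cos\theta\end{pmatrix}$$
$$= \begin{pmatrix} \cos(\psi+\theta) & -\sin(\psi+\theta)\\ \sin(\psi+\theta) & \cos(\psi+\theta)\end{pmatrix} = e^{i(\theta+\psi)}.$$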
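Note that |e^{iθ}| = cos²θ + sin²θ = 1. Thus
$$(e^{i\theta})^{-1} = \begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix} = \begin{pmatrix} \cos(-\theta) & -\sin(-\theta)\\ \sin(-\theta) & \cos(-\theta)\end{pmatrix} = e^{i(-\theta)}.$$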
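The rotation identities can be checked in floating point. In this Python sketch (helper names hypothetical) angles are given in degrees and converted to radians, and matrices are compared up to a small tolerance, since cos and sin are computed only approximately; the particular angles are arbitrary.

```python
import math

def rot(theta):
    """The matrix e^{iθ} for an anticlockwise rotation by θ radians."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def compose(H, F):
    (e, f), (g, h) = H
    (a, b), (c, d) = F
    return ((e * a + f * c, e * b + f * d), (g * a + h * c, g * b + h * d))

def close(M, N, tol=1e-12):
    """Entrywise comparison of two 2×2 matrices up to an absolute tolerance."""
    return all(math.isclose(m, n, abs_tol=tol) for rm, rn in zip(M, N) for m, n in zip(rm, rn))

theta, psi = math.radians(30), math.radians(100)
print(close(compose(rot(psi), rot(theta)), rot(psi + theta)))              # e^{iψ} ∘ e^{iθ} = e^{i(ψ+θ)}
print(close(compose(rot(-theta), rot(theta)), rot(0)))                     # (e^{iθ})⁻¹ = e^{-iθ}
print(close(compose(rot(psi), rot(theta)), compose(rot(theta), rot(psi))))  # rotations commute
```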
Hence the inverse to e^{iθ} is a rotation by (−θ), that is to say by θ but in the opposite direction. Clearly E = e^{i0}, a rotation through a zero angle. Thus (Θ, ◦) is a group. Moreover Θ is a subset of M*(2), since each rotation has a non-singular matrix. Thus Θ is a subgroup of M*(2).
A subset Θ of a group (G, ◦) is a subgroup of G iff the composition operation, ◦, restricted to Θ is “closed”, and Θ is a group in its own right. That is to say (i) if x, y ∈ Θ then x ◦ y ∈ Θ, (ii) the identity e belongs to Θ and (iii) for each x in Θ the inverse, x⁻¹, also belongs to Θ.
Definition 1.2 Let (X, ◦) and (Y, ·) be two sets with binary operations ◦, ·, respectively. A function f : X → Y is called a morphism (with respect to (◦, ·)) iff f(x ◦ y) = f(x) · f(y) for all x, y ∈ X. If moreover f is bijective as a function, then it is called an isomorphism. If (X, ◦), (Y, ·) are groups then f is called a homomorphism.
A binary operation on a set X is one form of mathematical structure that the set may possess. When an isomorphism exists between two sets X and Y then, mathematically speaking, their structures are identical.
For example let Rot be the set of all rotations in the plane. If rot(θ) and rot(ψ) are rotations by θ, ψ respectively then we can combine them to give a rotation rot(ψ + θ), i.e.,
rot(ψ) ◦ rot(θ) = rot(ψ + θ).
Here ◦ means do one rotation then the other. To the rotation rot(θ) let f assign the 2 × 2 matrix called e^{iθ} as above. Thus
f : (Rot, ◦) → (Θ, ◦),
where f(rot(θ)) = e^{iθ}. Moreover
f(rot(ψ) ◦ rot(θ)) = f(rot(ψ + θ)) = e^{i(ψ+θ)} = e^{iψ} ◦ e^{iθ} = f(rot(ψ)) ◦ f(rot(θ)).
Clearly the identity rotation is rot(0), which corresponds to the identity matrix E = e^{i0}, while the inverse rotation to rot(θ) is rot(−θ), corresponding to e^{−iθ}. Thus f is a morphism.
Here we have a collection of geometric objects, called rotations, with their own structure, and we have found another set of “mathematical” objects, namely 2 × 2 matrices of a certain type, which has an identical structure.
Lemma 1.3 If f : (X,◦)→ (Y, ·) is a morphism between groups then
(1) f(e_X) = e_Y, where e_X, e_Y are the identities in X, Y;
(2) for each x in X, f(x⁻¹) = [f(x)]⁻¹.
Proof
1. Since f is a morphism, f(x) · f(e_X) = f(x ◦ e_X) = f(x) = f(x) · e_Y. Composing on the left with [f(x)]⁻¹ gives f(e_X) = e_Y, which by Lemma 1.1 is the unique identity in Y.
2. f(x) · f(x⁻¹) = f(x ◦ x⁻¹) = f(e_X) = e_Y. Thus f(x⁻¹) is an inverse of f(x); by Lemma 1.1, [f(x)]⁻¹ is unique, and so f(x⁻¹) = [f(x)]⁻¹. □
As an example, consider the determinant function det : M(2) → ℝ.
From the proof of Lemma 1.2, we know that for any 2 × 2 matrices H and F, it is the case that |H ◦ F| = |H| |F|. Thus det : (M(2), ◦) → (ℝ, ·) is a morphism with respect to matrix composition, ◦, in M(2) and multiplication, ·, in ℝ.
Note also that if F is non-singular then det(F) = |F| ≠ 0, and so det : M*(2) → ℝ\{0}.
It should be clear that (ℝ\{0}, ·) is a group.
Hence det : (M*(2), ◦) → (ℝ\{0}, ·) is a homomorphism between these two groups. This should indicate why those matrices in M(2) which have zero determinant are those without an inverse in M(2).
From Example 1.4, the identity in M*(2) is E, while the multiplicative identity in ℝ is 1. By Lemma 1.3, det(E) = 1.
Moreover |F|⁻¹ = 1/|F| and so, by Lemma 1.3, |F⁻¹| = 1/|F|. This is easy to check since
$$|F^{-1}| = \left|\frac{1}{|F|}\begin{pmatrix} d & -b\\ -c & a\end{pmatrix}\right| = \frac{da-bc}{|F|^2} = \frac{|F|}{|F|^2} = \frac{1}{|F|}.$$
However the determinant det : M*(2) → ℝ\{0} is not injective, since it is clearly possible to find two matrices H, F such that |H| = |F| although H and F are different (for example, E and any rotation e^{iθ} with θ ≠ 0 both have determinant 1).
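The consequences of Lemma 1.3 for det can be checked with exact arithmetic. In this Python sketch the matrix F and the helper names are hypothetical, and R is the rotation e^{iθ} with θ = 90°, which has the same determinant as E.

```python
from fractions import Fraction

def det(F):
    (a, b), (c, d) = F
    return a * d - b * c

def inverse(F):
    """F⁻¹ = (1/|F|)·((d, -b), (-c, a)) for a non-singular F."""
    (a, b), (c, d) = F
    D = Fraction(det(F))
    return ((d / D, -b / D), (-c / D, a / D))

E = ((1, 0), (0, 1))
F = ((3, 1), (4, 2))                                 # hypothetical matrix with |F| = 2
print(det(E))                                        # 1: the identity is sent to the identity
print(det(inverse(F)) == 1 / Fraction(det(F)))       # True: |F⁻¹| = 1/|F|

R = ((0, -1), (1, 0))                                # rotation through 90°
print(R != E and det(R) == det(E))                   # True: det is not injective
```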
Example 1.6 It is clear that the real numbers form a group (ℝ, +) under addition with identity 0, and inverse (to a) equal to −a. Similarly the reals form a group (ℝ\{0}, ·) under multiplication, as long as we exclude 0.
Now let Z2 be the numbers {0, 1} and define “addition modulo 2,” written +, on Z2, by 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0, and “multiplication modulo 2,” written ·, on Z2, by 0 · 0 = 0, 0 · 1 = 1 · 0 = 0, 1 · 1 = 1.
Under “addition modulo 2,” 0 is the identity, and 1 has inverse 1. Associativity is clearly satisfied, and so (Z2, +) is a group. Under multiplication, 1 is the identity and inverse to itself, but 0 has no inverse. Thus (Z2, ·) is not a group. Note that (Z2\{0}, ·) is a group, namely the trivial group containing only one element. Let Z be the integers, and consider the function
f : Z → Z2,
defined by f(x) = 0 if x is even, 1 if x is odd.
We see that this is a morphism f : (Z, +) → (Z2, +):
1. if x and y are both even then f(x) = f(y) = 0; since x + y is even, f(x + y) = 0.
2. if x is even and y odd, f(x) = 0, f(y) = 1 and f(x) + f(y) = 1. But x + y is odd, so f(x + y) = 1 (the case x odd, y even is the same).
3. if x and y are both odd, then f(x) = f(y) = 1, and so f(x) + f(y) = 0. But x + y is even, so f(x + y) = 0.
Since (Z, +) and (Z2, +) are both groups, f is a homomorphism. Thus, by Lemma 1.3, f(−a) is the inverse of f(a) in Z2, and since each element of Z2 is its own additive inverse, f(−a) = f(a).
On the other hand consider
f : (Z, ·) → (Z2, ·);
1. if x and y are both even then f(x) = f(y) = 0 and so f(x) · f(y) = 0 = f(xy).
2. if x is even and y odd, then f(x) = 0, f(y) = 1 and f(x) · f(y) = 0. But xy is even so f(xy) = 0.
3. if x and y are both odd, f(x) = f(y) = 1 and so f(x) · f(y) = 1. But xy is odd, and f(xy) = 1.
Hence f is a morphism. However, neither (Z, ·) nor (Z2, ·) is a group, and so f is not a homomorphism.
A computer, since it is essentially a “finite” machine, is able to compute in binary arithmetic, using the two groups (Z2, +), (Z2\{0}, ·) rather than with the groups (ℝ, +), (ℝ\{0}, ·).
This is essentially because the additive and multiplicative groups based on Z2 form what is called a field.
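The two morphism properties of the parity map f in Example 1.6 can be spot-checked over a finite range of integers. The Python sketch below relies on Python's % operator, which returns 0 or 1 for x % 2 even when x is negative; the helper names add2 and mul2 are hypothetical, and checking a finite sample illustrates, but does not prove, the case analysis above.

```python
def f(x):
    """Parity map Z → Z2: 0 if x is even, 1 if x is odd."""
    return x % 2

def add2(u, v):
    return (u + v) % 2    # addition modulo 2

def mul2(u, v):
    return (u * v) % 2    # multiplication modulo 2

sample = range(-20, 21)
additive = all(f(x + y) == add2(f(x), f(y)) for x in sample for y in sample)
multiplicative = all(f(x * y) == mul2(f(x), f(y)) for x in sample for y in sample)
print(additive, multiplicative)                 # True True
print(all(f(-a) == f(a) for a in sample))       # True: f(-a) = f(a)
```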
Definition 1.3
1. A group (G, ◦) is commutative or abelian iff for all a, b ∈ G, a ◦ b = b ◦ a.
2. A field (𝔽, +, ·) is a set together with two operations called addition (+) and multiplication (·) such that (𝔽, +) is an abelian group with zero, or identity, 0, and (𝔽\{0}, ·) is an abelian group with identity 1. For convenience the additive inverse of an element a ∈ 𝔽 is written (−a) and the multiplicative inverse of a non-zero a ∈ 𝔽 is written a⁻¹ or 1/a.
Moreover, multiplication is distributive over addition, i.e., for all a, b, c in 𝔽, a · (b + c) = a · b + a · c.
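To give an indication of the notion of abelian group, consider M*(2) again. As we have seen
$$H\circ F = \begin{pmatrix} e & f\\ g & h\end{pmatrix}\circ\begin{pmatrix} a & b\\ c & d\end{pmatrix} = \begin{pmatrix} ea+fc & eb+fd\\ ga+hc & gb+hd\end{pmatrix}.$$
However,
$$F\circ H = \begin{pmatrix} a & b\\ c & d\end{pmatrix}\circ\begin{pmatrix} e & f\\ g & h\end{pmatrix} = \begin{pmatrix} ea+bg & af+bh\\ ce+dg & cf+dh\end{pmatrix}.$$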
Thus H ◦ F ≠ F ◦ H in general, and so M*(2) is non-abelian. However, if we consider two rotations e^{iθ}, e^{iψ} then e^{iψ} ◦ e^{iθ} = e^{i(ψ+θ)} = e^{iθ} ◦ e^{iψ}. Thus the group (Θ, ◦) is abelian.
Lemma 1.4 Both (ℝ, +, ·) and (Z2, +, ·) are fields.
Proof Consider (Z2, +, ·) first of all. As we have seen, (Z2, +) and (Z2\{0}, ·) are groups. (Z2, +) is obviously abelian since 0 + 1 = 1 + 0 = 1, while (Z2\{0}, ·) is abelian since it has one element.
To check for distributivity, note that
1 · (1 + 1) = 1 · 0 = 0 = 1 + 1 = 1 · 1 + 1 · 1,
and the remaining cases, in which some term is 0, are immediate.
Finally to see that (ℝ, +, ·) is a field, we note that for any real numbers a, b, c, a · (b + c) = ab + ac. □
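The field axioms of Definition 1.3 can be checked exhaustively for arithmetic modulo n. The Python sketch below is illustrative: is_abelian_group and is_field_mod are hypothetical helpers, n = 2 reproduces Lemma 1.4 for Z2, and n = 4 is included only as a contrasting case in which 2 · 2 = 0, so the non-zero elements are not closed under multiplication.

```python
from itertools import product

def is_abelian_group(G, op):
    """Closure, associativity, commutativity, identity and inverses on a finite set G."""
    G = list(G)
    closed = all(op(x, y) in G for x, y in product(G, G))
    assoc = all(op(op(x, y), z) == op(x, op(y, z)) for x, y, z in product(G, G, G))
    comm = all(op(x, y) == op(y, x) for x, y in product(G, G))
    ids = [e for e in G if all(op(e, x) == x for x in G)]
    inverses = bool(ids) and all(any(op(x, y) == ids[0] for y in G) for x in G)
    return closed and assoc and comm and inverses

def is_field_mod(n):
    """Check Definition 1.3 for ({0, ..., n-1}, + mod n, · mod n)."""
    elems = range(n)

    def add(x, y):
        return (x + y) % n

    def mul(x, y):
        return (x * y) % n

    distributive = all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
                       for a, b, c in product(elems, repeat=3))
    return (is_abelian_group(elems, add)
            and is_abelian_group(range(1, n), mul)
            and distributive)

print(is_field_mod(2))   # True, as in Lemma 1.4
print(is_field_mod(4))   # False: 2 · 2 = 0 modulo 4
```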
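Given a field (𝔽, +, ·) we define a new object called 𝔽ⁿ, where n is a positive integer, as follows. Any element x ∈ 𝔽ⁿ is of the form
$$x = \begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix}$$
where x1, . . . , xn all belong to 𝔽.
F1. If α ∈ 𝔽 and x ∈ 𝔽ⁿ, define αx ∈ 𝔽ⁿ by
$$\alpha\begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix} \alpha x_1\\ \vdots\\ \alpha x_n\end{pmatrix}.$$
F2. Define addition in 𝔽ⁿ by
$$x + y = \begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} + \begin{pmatrix} y_1\\ \vdots\\ y_n\end{pmatrix} = \begin{pmatrix} x_1+y_1\\ \vdots\\ x_n+y_n\end{pmatrix}.$$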
Since 𝔽, by definition, is an abelian additive group, it follows that
$$x + y = \begin{pmatrix} x_1+y_1\\ \vdots\\ x_n+y_n\end{pmatrix} = \begin{pmatrix} y_1+x_1\\ \vdots\\ y_n+x_n\end{pmatrix} = y + x.$$
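Now let
$$0 = \begin{pmatrix} 0\\ \vdots\\ 0\end{pmatrix}.$$
Clearly
$$x + 0 = \begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} + \begin{pmatrix} 0\\ \vdots\\ 0\end{pmatrix} = x.$$
Hence 0 belongs to 𝔽ⁿ and is an additive identity in 𝔽ⁿ.
Suppose we define
$$(-x) = -\begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix} -x_1\\ \vdots\\ -x_n\end{pmatrix}.$$
Clearly
$$x + (-x) = \begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} + \begin{pmatrix} -x_1\\ \vdots\\ -x_n\end{pmatrix} = \begin{pmatrix} x_1-x_1\\ \vdots\\ x_n-x_n\end{pmatrix} = 0.$$
Thus for each x ∈ 𝔽ⁿ there is an inverse, (−x), in 𝔽ⁿ.
Finally, since 𝔽 is an additive group,
$$x + (y+z) = \begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} + \begin{pmatrix} y_1+z_1\\ \vdots\\ y_n+z_n\end{pmatrix} = \begin{pmatrix} x_1+y_1\\ \vdots\\ x_n+y_n\end{pmatrix} + \begin{pmatrix} z_1\\ \vdots\\ z_n\end{pmatrix} = (x+y)+z.$$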
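Thus (Fⁿ, +) is an abelian group, with zero 0.
The fact that it is possible to multiply an element x ∈ Fⁿ by a scalar α ∈ F endows Fⁿ with further structure. To see this, consider the example of ℝ².
1. If α ∈ ℝ and both x, y belong to ℝ², then
$$\alpha\left[\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right] = \alpha\begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix} = \begin{pmatrix} \alpha x_1 + \alpha y_1 \\ \alpha x_2 + \alpha y_2 \end{pmatrix}$$
by F2, F1 and distributivity in ℝ, and
$$= \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \end{pmatrix} + \begin{pmatrix} \alpha y_1 \\ \alpha y_2 \end{pmatrix} = \alpha\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \alpha\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$$
by F2 and F1. Thus α(x + y) = αx + αy.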
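2.
$$(\alpha + \beta)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} (\alpha + \beta)x_1 \\ (\alpha + \beta)x_2 \end{pmatrix}$$
by F1,
$$= \begin{pmatrix} \alpha x_1 + \beta x_1 \\ \alpha x_2 + \beta x_2 \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \end{pmatrix} + \begin{pmatrix} \beta x_1 \\ \beta x_2 \end{pmatrix}$$
by distributivity in ℝ and F2,
$$= \alpha\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \beta\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
by F1. Therefore (α + β)x = αx + βx.
3.
$$(\alpha\beta)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} (\alpha\beta)x_1 \\ (\alpha\beta)x_2 \end{pmatrix}$$
by F1,
$$= \alpha\begin{pmatrix} \beta x_1 \\ \beta x_2 \end{pmatrix}$$
by associativity and F1, and = α(βx) by F1. Thus (αβ)x = α(βx).
4.
$$1\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \cdot x_1 \\ 1 \cdot x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
Therefore 1x = x.
These four properties characterise what is known as a vector space.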
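Over a finite field the four properties can also be verified by brute force. This sketch assumes F = ℤ₂ (arithmetic mod 2) and n = 3. `vadd` and `smul` are illustrative implementations of F2 and F1, and any prime P may be substituted for 2.

```python
from itertools import product

P = 2   # arithmetic in F = Z_P; P must be prime for Z_P to be a field
n = 3   # dimension of F^n

def vadd(x, y):
    """F2: componentwise addition in F^n."""
    return tuple((a + b) % P for a, b in zip(x, y))

def smul(alpha, x):
    """F1: componentwise scalar multiplication."""
    return tuple((alpha * a) % P for a in x)

scalars = range(P)
vectors = list(product(range(P), repeat=n))

def satisfies_vector_space_axioms():
    for alpha, beta, x in product(scalars, scalars, vectors):
        # 2. (alpha + beta) x = alpha x + beta x
        if smul((alpha + beta) % P, x) != vadd(smul(alpha, x), smul(beta, x)):
            return False
        # 3. (alpha beta) x = alpha (beta x)
        if smul((alpha * beta) % P, x) != smul(alpha, smul(beta, x)):
            return False
    for alpha, x, y in product(scalars, vectors, vectors):
        # 1. alpha (x + y) = alpha x + alpha y
        if smul(alpha, vadd(x, y)) != vadd(smul(alpha, x), smul(alpha, y)):
            return False
    # 4. 1 x = x
    return all(smul(1, x) == x for x in vectors)

print(len(vectors), satisfies_vector_space_axioms())   # 8 True
```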
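Finally, consider the operation of a matrix F on the set of elements in ℝ². By definition
$$F(x + y) = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\left[\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\right] = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix}$$
$$= \begin{pmatrix} a(x_1 + y_1) + b(x_2 + y_2) \\ c(x_1 + y_1) + d(x_2 + y_2) \end{pmatrix} = \begin{pmatrix} ax_1 + bx_2 \\ cx_1 + dx_2 \end{pmatrix} + \begin{pmatrix} ay_1 + by_2 \\ cy_1 + dy_2 \end{pmatrix}$$
by F2,
$$= \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = F(x) + F(y).$$
Hence F : (ℝ², +) → (ℝ², +) is a morphism from the abelian group (ℝ², +) into itself.
By Lemma 1.3, we know that F(0) = 0, and for any element x ∈ ℝ²,
$$F(-x) = F(-1(x)) = -F(x) = -1F(x).$$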
Fig. 1.4
A morphism between vector spaces is called a linear transformation. Vector spaces and linear transformations are discussed in Chap. 2.
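Here is a quick numerical check of the morphism property for a matrix acting on ℝ². This Python sketch uses an arbitrary 2 × 2 matrix and two vectors chosen for illustration; none of these values come from the text. Arithmetic is exact and rational, so the equality tests are not disturbed by floating-point rounding.

```python
from fractions import Fraction as Fr

def apply(F, x):
    """Multiply the 2x2 matrix F = ((a, b), (c, d)) by the column vector x = (x1, x2)."""
    (a, b), (c, d) = F
    x1, x2 = x
    return (a * x1 + b * x2, c * x1 + d * x2)

def vadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

def neg(x):
    return (-x[0], -x[1])

# Illustrative matrix and vectors (hypothetical values).
F = ((Fr(1), Fr(2)), (Fr(-3), Fr(1, 2)))
x = (Fr(1, 3), Fr(5))
y = (Fr(-2), Fr(7, 4))
zero = (Fr(0), Fr(0))

assert apply(F, vadd(x, y)) == vadd(apply(F, x), apply(F, y))   # F(x + y) = F(x) + F(y)
assert apply(F, zero) == zero                                    # F(0) = 0
assert apply(F, neg(x)) == neg(apply(F, x))                      # F(-x) = -F(x)
print(apply(F, x))   # (Fraction(31, 3), Fraction(3, 2))
```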
1.4 Preferences and Choices
1.4.1 Preference Relations
A binary relation P on X is a subset of X × X; more simply, P is called a relation on X. For example, let X ≡ ℝ (the real line) and let P be “>”, meaning strictly greater than. The relation “>” clearly satisfies the following properties:
1. it is never the case that x > x;
2. it is never the case that x > y and y > x;
3. it is always the case that x > y and y > z implies x > z.
These properties can be considered more abstractly. A relation P on X is:
1. symmetric iff xPy ⇒ yPx;
   asymmetric iff xPy ⇒ not (yPx);
   antisymmetric iff xPy and yPx ⇒ x = y;
2. reflexive iff xPx ∀ x ∈ X;
   irreflexive iff not (xPx) ∀ x ∈ X;
3. transitive iff xPy and yPz ⇒ xPz;
4. connected iff for any x, y ∈ X either xPy or yPx.
By analogy with the relation “>”, a relation P which is both irreflexive and asymmetric is called a strict preference relation.
Given a strict preference relation P on X, we can define two new relations, called I for indifference and R for weak preference, as follows:
1. xIy iff not (xPy) and not (yPx);
2. xRy iff xPy or xIy.
By de Morgan’s rule, xIy iff not (xPy ∨ yPx). Thus for any x, y ∈ X, either xIy or xPy or yPx. Since P is asymmetric, it cannot be the case that both xPy and yPx are true. Thus the propositions “xPy”, “yPx”, “xIy” are disjoint, and hence form a partition of the universal proposition U.
Note that (xPy ∨ xIy) ≡ not (yPx), since these three propositions form a (disjoint) partition. Thus xRy iff not (yPx).
In the case that P is the strict preference relation “>”, it should be clear that indifference is identical to “=” and weak preference to “> or =”, usually written “≥”.
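On a finite set these properties can be tested mechanically. The sketch below represents a relation as a Python set of ordered pairs. It uses the five-point set {0, 1, 2, 3, 4} as a stand-in for the real line, an assumption made only so the check is finite. It confirms that “>” is a strict preference relation whose derived I and R coincide with “=” and “≥”.

```python
from itertools import product

X = range(5)   # finite stand-in for the real line
P = {(x, y) for x, y in product(X, X) if x > y}   # the strict relation ">"

def irreflexive(R):
    return all((x, x) not in R for x in X)

def asymmetric(R):
    return all((y, x) not in R for (x, y) in R)

def symmetric(R):
    return all((y, x) in R for (x, y) in R)

def transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def connected(R):
    return all((x, y) in R or (y, x) in R for x, y in product(X, X))

def indifference(P):
    """xIy iff not (xPy) and not (yPx)."""
    return {(x, y) for x, y in product(X, X) if (x, y) not in P and (y, x) not in P}

def weak_preference(P):
    """xRy iff xPy or xIy."""
    return P | indifference(P)

I, R = indifference(P), weak_preference(P)
assert irreflexive(P) and asymmetric(P) and transitive(P)
assert I == {(x, y) for x, y in product(X, X) if x == y}            # I is "="
assert R == {(x, y) for x, y in product(X, X) if x >= y}            # R is ">="
assert R == {(x, y) for x, y in product(X, X) if (y, x) not in P}   # xRy iff not (yPx)
assert symmetric(I) and connected(R) and not connected(P)
```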
Lemma 1.5 If P is a strict preference relation then indifference (I) is reflexive and symmetric, while weak preference (R) is reflexive and connected. Moreover, if xRy and yRx then xIy.
Proof
1. Since not (xPx), this must imply xIx, so I is reflexive.
2. xIy ⟺ not (xPy) ∧ not (yPx) ⟺ not (yPx) ∧ not (xPy) ⟺ yIx. Hence I is symmetric.
3. xRy ⟺ xPy or xIy. Thus xIx ⇒ xRx, so R is reflexive.
4. xRy ⟺ xPy or xIy, and yRx ⟺ yPx or yIx. Hence not (xRy ∨ yRx) ⟺ not (xPy ∨ yPx ∨ xIy). But xPy ∨ yPx ∨ xIy is always true, since these three propositions form a partition of the universal set. Thus not (xRy ∨ yRx) is always false, and so xRy ∨ yRx is always true. Thus R is connected.
5. Clearly
xRy and yRx ⟺ (xPy ∧ yPx) ∨ xIy ⟺ xIy,
since the mixed terms xPy ∧ yIx and xIy ∧ yPx are false (the three propositions are disjoint), and xPy ∧ yPx is always false by asymmetry. □
In the case that P corresponds to “>”, x ≥ y and y ≥ x ⇒ x = y, so “≥” is antisymmetric.
Suppose that P is a strict preference relation on X, and there exists a function u : X → ℝ, called a utility function, such that xPy iff u(x) > u(y). Therefore
xRy iff u(x) ≥ u(y),
xIy iff u(x) = u(y).
The order relation “>” on the real line is transitive (since x > y > z ⇒ x > z). Therefore P must be transitive when it is representable by a utility function.
We therefore have reason to consider “rationality” properties, such as transitivity, of a strict preference relation.
1.4.2 Rationality
Lemma 1.6 If P is irreflexive and transitive on X then it is asymmetric.
Proof To show that A ∧ B ⇒ C we need only show that B ∧ not (C) ⇒ not (A). Therefore suppose that P is transitive but fails asymmetry. By the latter assumption there exist x, y ∈ X such that xPy and yPx. By transitivity this gives xPx, which violates irreflexivity. □
Call a strict preference relation P on X negatively transitive iff, for all x, y, z ∈ X, not (xPy) ∧ not (yPz) ⇒ not (xPz). Note that xRy ⟺ not (yPx). Thus the negative transitivity of P is equivalent to the property
yRx ∧ zRy ⇒ zRx.
Hence R must be transitive.
Lemma 1.7 If P is a strict preference relation that is negatively transitive then P, I, R are all transitive.
Proof
1. By the previous observation, R is transitive.
2. To prove P is transitive, suppose otherwise, i.e., that there exist x, y, z such that xPy, yPz but not (xPz). By definition not (xPz) ⟺ zRx. Moreover yPz or yIz ⟺ yRz, so yPz ⇒ yRz. By transitivity of R, yRz and zRx give yRx, that is, not (xPy). But we assumed xPy. By contradiction we must have xPz.
3. To show I is transitive, suppose xIy, yIz but not (xIz). Suppose xPz, say. Then xRz. Because of the two indifferences we may write zRy and yRx. By transitivity of R, zRx. But zRx and xRz imply xIz (Lemma 1.5), a contradiction. In the same way, if zPx, then zRx, while xRy and yRz give xRz, and again xIz. Thus I must be transitive. □
Note that this lemma also implies that P, I and R combine transitively. For example, if xRy and yPz then xPz.
To show this, suppose, in contradiction, that not (xPz). This is equivalent to zRx. If xRy, then by transitivity of R we obtain zRy, and so not (yPz). Thus xRy and not (xPz) ⇒ not (yPz). But yPz and not (yPz) cannot both hold. Thus xRy and yPz ⇒ xPz. Clearly we also obtain, for example, xIy and yPz ⇒ xPz.
When P is a negatively transitive strict preference relation on X, we call it a weak order on X. Let O(X) be the set of weak orders on X. If P is a transitive strict preference relation on X, then we call it a strict partial order. Let T(X) be the set of strict partial orders on X. By Lemma 1.7, O(X) ⊂ T(X).
Finally, call a preference relation acyclic if, for any finite sequence $x_1, \dots, x_r$ of points in X with $x_j P x_{j+1}$ for $j = 1, \dots, r-1$, it cannot be the case that $x_r P x_1$.
Let A(X) be the set of acyclic strict preference relations on X. To see that T(X) ⊂ A(X), suppose that P is transitive but cyclic, i.e., that there exists a finite cycle $x_1 P x_2 \cdots P x_r P x_1$. By transitivity, $x_{r-1} P x_r P x_1$ gives $x_{r-1} P x_1$, and by repetition we obtain $x_2 P x_1$. But we also have $x_1 P x_2$, which violates asymmetry.
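A three-element set is small enough to enumerate every strict preference relation. Each unordered pair is in one of three states, so there are 3³ = 27 irreflexive, asymmetric relations. That makes it possible to check Lemmas 1.5 and 1.7 and the inclusions O(X) ⊂ T(X) ⊂ A(X) exhaustively. This is a brute-force sketch, not a proof. Acyclicity is tested through the transitive closure, which is one convenient choice among several.

```python
from itertools import combinations, product

X = (0, 1, 2)
PAIRS = list(combinations(X, 2))

def strict_preferences():
    """Yield every irreflexive, asymmetric relation on X as a frozenset of pairs."""
    for states in product((None, "forward", "backward"), repeat=len(PAIRS)):
        P = set()
        for (a, b), s in zip(PAIRS, states):
            if s == "forward":
                P.add((a, b))
            elif s == "backward":
                P.add((b, a))
        yield frozenset(P)

def transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def negatively_transitive(P):
    return all((x, z) not in P
               for x, y, z in product(X, repeat=3)
               if (x, y) not in P and (y, z) not in P)

def acyclic(P):
    """On a finite set, P is acyclic iff its transitive closure is irreflexive."""
    closure = set(P)
    while True:
        new = {(x, z) for (x, y) in closure for (w, z) in closure if y == w}
        if new <= closure:
            break
        closure |= new
    return all((x, x) not in closure for x in X)

def indifference(P):
    return {(x, y) for x, y in product(X, X) if (x, y) not in P and (y, x) not in P}

O, T, A = set(), set(), set()
for P in strict_preferences():
    I = indifference(P)
    R = set(P) | I
    # Lemma 1.5
    assert all((x, x) in I for x in X)                                 # I reflexive
    assert all((y, x) in I for (x, y) in I)                            # I symmetric
    assert all((x, y) in R or (y, x) in R for x, y in product(X, X))   # R connected
    assert all((x, y) in I for (x, y) in R if (y, x) in R)             # xRy, yRx => xIy
    if negatively_transitive(P):
        O.add(P)
        assert transitive(P) and transitive(I) and transitive(R)       # Lemma 1.7
    if transitive(P):
        T.add(P)
    if acyclic(P):
        A.add(P)

print(len(O), len(T), len(A))   # 13 19 25
assert O <= T <= A
```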
1.4.3 Choices
As we noted previously, if P is a strict preference relation on a set X, then a maximal element, or choice, on X is an element x such that for no y ∈ X is it the case that yPx. We can express this another way. Since P ⊂ X × X, there is a mapping
φP : X → 2^X where φP(x) = {y : yPx}.
We shall call φP the preference correspondence of P. The choice of P on X is the set CP(X) = {x : φP(x) = Φ}. Suppose now that P is a strict preference relation on X. For each subset Y of X, let
CP(Y) = {x ∈ Y : φP(x) ∩ Y = Φ}.
This defines a choice correspondence CP : 2^X → 2^X from 2^X, the set of all subsets of X, into itself.
An important question in social choice and welfare economics concerns the existence of a “social” choice correspondence, CP, which guarantees the non-emptiness of the social choice CP(Y) for each feasible set Y in X, and an appropriate social preference P.
Lemma 1.8 If P is an acyclic strict preference relation on a finite set X, then CP(Y) is non-empty for each non-empty subset Y of X.
Proof Suppose that $X = \{x_1, \dots, x_r\}$. If all elements in X are indifferent, then clearly CP(X) = X.
So we can assume that, when the cardinality |Y| of Y is at least 2, $x_2 P x_1$ for some $x_2, x_1$. We proceed by induction on the cardinality of Y.
If $Y = \{x_1\}$ then obviously $C_P(Y) = \{x_1\}$. If $Y = \{x_1, x_2\}$ then either $x_1 P x_2$, $x_2 P x_1$, or $x_1 I x_2$, in which case $C_P(Y) = \{x_1\}$, $\{x_2\}$ or $\{x_1, x_2\}$ respectively. Suppose CP(Y) ≠ Φ whenever the cardinality |Y| of Y is 2, and consider $Y' = \{x_1, x_2, x_3\}$.
Without loss of generality suppose that $x_2 \in C_P(\{x_1, x_2\})$, but that neither $x_1$ nor $x_2$ belongs to $C_P(Y')$. There are two possibilities. (i) If $x_2 P x_1$ then, by asymmetry of P, not $(x_1 P x_2)$. Since $x_2 \notin C_P(Y')$, we have $x_3 P x_2$, so not $(x_2 P x_3)$. Suppose that $C_P(Y') = Φ$. Then $x_3 \notin C_P(Y')$, and since not $(x_2 P x_3)$ we must have $x_1 P x_3$, which gives a cycle $x_1 P x_3 P x_2 P x_1$. This contradicts acyclicity, so $x_3 \in C_P(Y')$. (ii) If $x_2 I x_1$ and $x_3 \notin C_P(Y')$, then either $x_1 P x_3$ or $x_2 P x_3$. But neither $x_1$ nor $x_2$ belongs to $C_P(Y')$, so $x_3 P x_1$ and $x_3 P x_2$. This contradicts asymmetry of P. Consequently $x_3 \in C_P(Y')$.
It is clear that this argument can be generalised to the case when |Y| = k and Y′ is a superset of Y (i.e., Y ⊂ Y′ with |Y′| = k + 1). So suppose CP(Y) ≠ Φ. To show CP(Y′) ≠ Φ when $Y' = Y \cup \{x_{k+1}\}$, suppose, to the contrary, that $[C_P(Y) \cup \{x_{k+1}\}] \cap C_P(Y') = Φ$. Then there must exist some x ∈ Y such that $x P x_{k+1}$.
If x ∈ CP(Y) then $x_{k+1} P x$, since x ∉ CP(Y′) and zPx for no z ∈ Y. Hence we obtain $x P x_{k+1} P x$, which violates asymmetry. On the other hand, if x ∈ Y\CP(Y), then there must exist a chain $x_r P x_{r-1} P \cdots x_1 P x$ with r < k, such that $x_r \in C_P(Y)$. Since $x_r \notin C_P(Y')$, it must be the case that $x_{k+1} P x_r$. This gives a cycle $x P x_{k+1} P x_r P \cdots x$, contradicting acyclicity. By contradiction, CP(Y′) ≠ Φ.
By induction, if CP(Y) ≠ Φ then CP(Y′) ≠ Φ for any superset Y′ of Y. Since CP(Y) ≠ Φ whenever |Y| = 2, it is evident that CP(Y) ≠ Φ for any non-empty finite subset Y of X. □
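The definitions of φP and CP(Y) translate directly into code. In this sketch the alternatives and the two relations are hypothetical examples. The first relation is acyclic but not transitive, and every non-empty subset has a non-empty choice, as Lemma 1.8 asserts. The second is a three-cycle, and its choice on the cycle is empty.

```python
from itertools import combinations

def phi(P, x, X):
    """Preference correspondence: phi_P(x) = {y in X : yPx}."""
    return {y for y in X if (y, x) in P}

def choice(P, Y, X):
    """C_P(Y) = {x in Y : phi_P(x) and Y are disjoint}."""
    return {x for x in Y if not (phi(P, x, X) & set(Y))}

def nonempty_subsets(X):
    for r in range(1, len(X) + 1):
        yield from combinations(X, r)

X = ("a", "b", "c", "d")
acyclic_P = {("b", "a"), ("c", "b"), ("d", "c")}   # b P a, c P b, d P c (not transitive)
cyclic_P = {("b", "a"), ("c", "b"), ("a", "c")}    # b P a, c P b, a P c (a cycle)

assert all(choice(acyclic_P, Y, X) for Y in nonempty_subsets(X))
print(choice(acyclic_P, X, X))                # {'d'}
print(choice(cyclic_P, ("a", "b", "c"), X))   # set()
```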
If P is a strict preference relation on X and P is representable by a utility function u : X → ℝ, then it must be the case that all of P, I, R are transitive. To see this, we note the following:
1. xRy and yRz iff u(x) ≥ u(y) ≥ u(z). Since “≥” on ℝ is transitive, it follows that u(x) ≥ u(z), and so xRz.
2. xIy and yIz iff u(x) = u(y) = u(z), and thus xIz.
In this case indifference, I, is reflexive, symmetric and transitive. Such a relation on X is called an equivalence relation.
For any point x in X, let [x] be the equivalence class of x in X, i.e., [x] = {y : yIx}.
Every point in X belongs to exactly one equivalence class. To see this, suppose that x ∈ [y] and x ∈ [z]; then xIy and xIz. By symmetry zIx, and by transitivity zIy. Thus [y] = [z].
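When P comes from a utility function, grouping points by utility level produces the equivalence classes, and the choice is the set of maximisers of u. The utility values below are hypothetical. The sketch checks that the induced P, R and I are transitive and that the classes partition X.

```python
from collections import defaultdict

X = ["w", "x", "y", "z"]
u = {"w": 1, "x": 3, "y": 3, "z": 2}   # hypothetical utility values

P = {(a, b) for a in X for b in X if u[a] > u[b]}    # aPb iff u(a) > u(b)
R = {(a, b) for a in X for b in X if u[a] >= u[b]}   # aRb iff u(a) >= u(b)
I = {(a, b) for a in X for b in X if u[a] == u[b]}   # aIb iff u(a) = u(b)

def transitive(rel):
    return all((a, c) in rel for (a, b) in rel for (b2, c) in rel if b == b2)

assert transitive(P) and transitive(R) and transitive(I)

# Equivalence classes [a] = {b : bIa}, grouped by utility level.
classes = defaultdict(list)
for a in X:
    classes[u[a]].append(a)
quotient = [classes[level] for level in sorted(classes)]
print(quotient)   # [['w'], ['z'], ['x', 'y']]

# Every point lies in exactly one class.
assert sorted(a for cls in quotient for a in cls) == sorted(X)

# The choice C_P(X): points to which nothing is strictly preferred, i.e. the maximisers of u.
choice = [a for a in X if not any((b, a) in P for b in X)]
print(choice)     # ['x', 'y']
assert choice == [a for a in X if u[a] == max(u.values())]
```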
The set of equivalence classes in X under an equivalence relation I is written X/I. Clearly, if u : X → ℝ is a utility function, then an equivalence class [x] is of the form
[x] = {y ∈ X : u(x) = u(y)},
which we may also write as u⁻¹[u(x)].
If X is a finite set, and P is representable by a utility function, then
CP(X) = {x ∈ X : u(x) = s},
where s = max[u(y) : y ∈ X] is the maximum value of u on X.
Social choice theory is concerned with the existence of a choice under a social preference relation P which in some sense aggregates individual preferences for all members of a society M = {1, …, i, …, m}. Typically the social preference relation cannot be represented by a “social” utility function. For example, suppose a society consists of m individuals, each one of whom has a preference relation $P_i$ on the feasible set X.
Define a social preference relation P on X by xPy iff $xP_iy$ for all i ∈ M (P is called the strict Pareto rule).
It is clear that if each $P_i$ is transitive, then so must be P. As a result, P must be acyclic. If X is a finite set, then by Lemma 1.8, there exists a choice CP(X) on X.
The same conclusion follows if we define xQy iff $xR_jy$ ∀ j ∈ M, and $xP_iy$ for some i ∈ M.
If we assume that each individual has negatively transitive preferences, then Q will be transitive, and will again have a choice. Q is called the weak Pareto rule. Note that a point x belongs to CQ(X) iff it is impossible to move to another point y which makes nobody “worse off” but makes some members of the society “better off”. The set CQ(X) is called the Pareto set. Although the social preference relation Q has a choice, there is no social utility function which represents Q. To see this, suppose the society consists of two individuals 1, 2 with transitive preferences $xP_1yP_1z$ and $zP_2xP_2y$.
By definition xQy, since both individuals prefer x to y. However, the conflict of preference between y and z, and between x and z, gives yIz and xIz, where I is the social indifference relation associated with Q. Since yIz and zIx but xQy, I is not transitive, and there is no “social utility function” which represents Q. Moreover, the elements of X cannot be partitioned into disjoint indifference equivalence classes.
Fig. 1.5
To see the same phenomenon geometrically, define a preference relation P on ℝ² by
$$(x_1, x_2)\,P\,(y_1, y_2) \iff x_1 > y_1 \wedge x_2 > y_2.$$
From Fig. 1.5, $(x_1, x_2)\,P\,(y_1, y_2)$. However, $(x_1, x_2)\,I\,(z_1, z_2)$ and $(y_1, y_2)\,I\,(z_1, z_2)$. Again there is no social utility function representing the preference relation P. Intuitively it should be clear that when the feasible set is “bounded” in some way in ℝ², then the preference relation has a choice. We shall show this more generally in a later chapter (see Lemma 3.9 below).
In Fig. 1.5, we have represented the preference in ℝ² by drawing the set preferred to the point $(y_1, y_2)$, say, as a subset of ℝ².
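The two-person example can be checked directly. This sketch encodes each individual's strict preference as a best-first ranking, which is an assumed encoding. It implements the weak Pareto rule Q as defined above, shows that social indifference fails transitivity, and computes the Pareto set.

```python
X = ["x", "y", "z"]
# Strict preferences as best-first rankings: x P1 y P1 z and z P2 x P2 y.
rankings = {1: ["x", "y", "z"], 2: ["z", "x", "y"]}

def prefers(i, a, b):
    """a P_i b."""
    return rankings[i].index(a) < rankings[i].index(b)

def weakly_prefers(i, a, b):
    """a R_i b, i.e. not (b P_i a)."""
    return not prefers(i, b, a)

def Q(a, b):
    """Weak Pareto rule: a R_j b for every j, and a P_i b for some i."""
    return (all(weakly_prefers(i, a, b) for i in rankings)
            and any(prefers(i, a, b) for i in rankings))

def social_I(a, b):
    return not Q(a, b) and not Q(b, a)

assert Q("x", "y")
assert social_I("y", "z") and social_I("z", "x")
assert not social_I("y", "x")   # yIz and zIx, yet xQy: indifference is not transitive
pareto_set = [a for a in X if not any(Q(b, a) for b in X)]
print(pareto_set)   # ['x', 'z']
```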
An alternative way to describe the preference is by the graph of φP. For example, suppose X is the unit interval [0, 1] in ℝ, and let the horizontal axis be the domain of φP and the vertical axis be the co-domain of φP. In Fig. 1.6, the preference P is identical to the relation > on the interval (so yPx iff y > x). The graph of φP is then the shaded set in the figure. Note that yPx iff $xP^{-1}y$. Because P is irreflexive, no point of the diagonal eX = {(x, x) : x ∈ X} can belong to graph(φP). To find graph($\phi_P^{-1}$) we simply “reflect” graph(φP) in the diagonal.
The shaded set in Fig. 1.7 represents graph($\phi_P^{-1}$). Because P is asymmetric, it is impossible for both yPx and xPy to be true. This means that graph(φP) ∩ graph($\phi_P^{-1}$) = Φ (the empty set). This can be seen by superimposing Figs. 1.6 and 1.7. A preference of the kind illustrated in Fig. 1.6 is often called monotonic, since increasing values of x are preferred.
Fig. 1.6
Fig. 1.7
Fig. 1.8
To illustrate a transitive, but non-monotonic, strict preference, consider Fig. 1.8, which represents the preference yPx iff x < y < 1 − x, for x ≤ 1/2, or 1 − x < y < x, for x > 1/2. For example, if x = 1/4, then φP(x) is the interval (1/4, 3/4), namely all points between 1/4 and 3/4, excluding the end points.
It is obvious that P is represented by the utility function
$$u(x) = \begin{cases} x & \text{if } x \le \tfrac12, \\ 1 - x & \text{if } \tfrac12 < x \le 1. \end{cases}$$
Fig. 1.9
Clearly the choice of P is CP(X) = {1/2}. Such a most preferred point is often called a “bliss point” for the preference. Indeed, a preference of this kind is usually called “Euclidean”, since the preference is induced from the distance from the bliss point. In other words, yPx iff |y − 1/2| < |x − 1/2|. Note again that this preference is transitive, and of course acyclic. The fact that P is asymmetric can be seen from noting that the shaded set in Fig. 1.8 (graph φP) and the shaded set in Fig. 1.9 (graph $\phi_P^{-1}$) do not intersect.
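Exact rational arithmetic makes it easy to check the Euclidean preference on a finite grid. The grid of sixteenths is an assumption chosen for illustration. On it, the utility function represents P, the only maximal point is the bliss point 1/2, and φP(1/4) consists of the grid points strictly between 1/4 and 3/4.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)

def u(x):
    """The utility function representing the preference of Fig. 1.8."""
    return x if x <= HALF else 1 - x

def prefers(y, x):
    """Euclidean preference: yPx iff y is strictly closer than x to the bliss point 1/2."""
    return abs(y - HALF) < abs(x - HALF)

grid = [Fr(k, 16) for k in range(17)]   # illustrative finite grid on [0, 1]
assert all(prefers(y, x) == (u(y) > u(x)) for x in grid for y in grid)

choice = [x for x in grid if not any(prefers(y, x) for y in grid)]
print(choice)   # [Fraction(1, 2)]

phi_quarter = [y for y in grid if prefers(y, Fr(1, 4))]
assert phi_quarter == [Fr(k, 16) for k in range(5, 12)]   # the open interval (1/4, 3/4)
```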
Figure 1.10 represents a more complicated asymmetric preference. Here
$$\phi_P(x) = \begin{cases} \left(x, x + \tfrac12\right) & \text{if } x \le \tfrac12, \\ \left(\tfrac12, x\right) \cup \left(0, x - \tfrac12\right) & \text{if } x > \tfrac12. \end{cases}$$
Clearly there is a cycle, say 1/4 P 1/8 P 11/16 P 1/4. Moreover, the choice CP(X) is empty, since φP(x) is non-empty for every x. This example illustrates that when acyclicity fails, it is possible for the choice to be empty.
To give an example where P is acyclic on the interval, yet no choice exists, consider Fig. 1.11. Define φP(x) = (x, x + 1/2) if x ≤ 1/2 and φP(x) = (1/2, x) if x > 1/2.
P is still asymmetric, but we cannot construct a cycle. For example, if x = 1/4 then yPx for y ∈ (1/4, 3/4), but if zPy then z > 1/4. Note, however, that φP(x) is non-empty for every x (for instance φP(1/2) = (1/2, 1)), so CP(X) = Φ.
This example shows that Lemma 1.8 cannot be extended directly to the case that X is the interval. In Chap. 3 below we show that we have to impose “continuity” on P to obtain an analogous result to Lemma 1.8.
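The two interval examples can be probed with exact rationals. This sketch encodes the correspondences of Figs. 1.10 and 1.11 as predicates; the function names are illustrative. It verifies the cycle for Fig. 1.10. For Fig. 1.11 it exhibits, for every sampled x, a strictly better point. It also shows that on a finite grid the same acyclic preference does have a choice, in line with Lemma 1.8.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)

def prefers_fig_1_10(y, x):
    """yPx iff y lies in phi_P(x) for the preference of Fig. 1.10."""
    if x <= HALF:
        return x < y < x + HALF
    return HALF < y < x or 0 < y < x - HALF

def prefers_fig_1_11(y, x):
    """yPx iff y lies in phi_P(x) for the preference of Fig. 1.11."""
    if x <= HALF:
        return x < y < x + HALF
    return HALF < y < x

# The cycle 1/4 P 1/8 P 11/16 P 1/4 of Fig. 1.10.
a, b, c = Fr(1, 4), Fr(1, 8), Fr(11, 16)
assert prefers_fig_1_10(a, b) and prefers_fig_1_10(b, c) and prefers_fig_1_10(c, a)

# Fig. 1.11 on the interval: every x has a better point (a point inside phi_P(x)).
def better_point(x):
    return x + HALF / 2 if x <= HALF else (HALF + x) / 2

samples = [Fr(k, 100) for k in range(101)]
assert all(prefers_fig_1_11(better_point(x), x) for x in samples)

# Restricted to a finite grid, the same preference does have a choice.
grid = [Fr(k, 16) for k in range(17)]
choice = [x for x in grid if not any(prefers_fig_1_11(y, x) for y in grid)]
print(choice)   # [Fraction(9, 16)]
```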
Fig. 1.10
Fig. 1.11
1.5 Social Choice and Arrow’s Impossibility Theorem
The discussion following Lemma 1.8 showed that even the weak Pareto rule, Q, did not give rise to transitive indifference, and thus could not be represented by a “social utility function”. However, Q does give rise to transitive strict preference. We shall show that any rule that gives transitive strict preference must be “oligarchic” in the same way that Q is oligarchic. In other words, any rule that gives transitive strict preference must be based on the Pareto (or unanimity) choice of some subset, say Θ, of the society M. Arrow’s Theorem (Arrow 1951) shows that if it is desired that indifference be transitive, then the rule must be “dictatorial”, in the sense that it obeys the preference of a single individual.
1.5.1 Oligarchies and Filters
The literature on social choice theory is very extensive and technical, and this section will not attempt to address its many subtleties. The general idea is to examine the possible rationality properties of a “social choice rule”
$$\sigma : S(X)^M \longrightarrow S(X).$$
Here S(X) stands for the set of strict preference relations on the set X, and M = {1, …, i, …} is a society. Usually M is a finite set of cardinality m. Sometimes, however, M will be identified with the set of integers ℤ. $S(X)^M$ is the set of strict preference profiles for this society. For example, if |M| = m, then a profile $\pi = (P_1, \dots, P_m)$ is a list of preferences for the members of the society. We use $A(X)^M$, $T(X)^M$, $O(X)^M$ for profiles whose individual preferences are acyclic, strict partial orders, or weak orders, respectively. Social choice theory is based on binary comparisons. This means that if two profiles π1 and π2 agree on a pair of alternatives {x, y}, say, then the social preferences σ(π1) and σ(π2) also agree on {x, y}. A key idea is that of a decisive coalition. Say a subset A ⊂ M is decisive under the rule σ iff for any profile $\pi = (P_1, \dots, P_m)$ such that $xP_iy$ for all i ∈ A, we have x(σ(π))y.
That is to say, whenever A is decisive and its members agree that x is preferred to y, the social preference chooses x over y. The set of decisive coalitions under the rule is written Dσ, or more simply, D. To illustrate this idea, suppose M = {1, 2, 3} and Dσ comprises any coalition with at least two members. It is easy to construct a profile $\pi \in A(X)^M$ such that σ(π) is not even acyclic. For example, choose a profile π on the alternatives {x, y, z} such that
$$xP_1yP_1z, \qquad yP_2zP_2x, \qquad zP_3xP_3y.$$
Since both 1 and 2 prefer y to z, we must have yσ(π)z. But in the same way we find that zσ(π)x and xσ(π)y, giving a social preference cycle on {x, y, z}. In general, restricting the image of σ so that it lies in A(X), T(X), or O(X) imposes constraints on Dσ. We now examine these constraints.
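The cycle can be reproduced mechanically. This sketch implements the rule in which any coalition of at least two of the three voters is decisive, and lists the resulting social preferences for the profile above. The best-first string encoding of each individual's ranking is an assumption of the sketch.

```python
M = (1, 2, 3)
# Best-first rankings: x P1 y P1 z,  y P2 z P2 x,  z P3 x P3 y.
profile = {1: "xyz", 2: "yzx", 3: "zxy"}

def decisive(coalition):
    """Majority rule on three voters: any coalition with at least two members."""
    return len(coalition) >= 2

def social_prefers(a, b):
    supporters = {i for i in M if profile[i].index(a) < profile[i].index(b)}
    return decisive(supporters)

edges = [(a, b) for a in "xyz" for b in "xyz" if a != b and social_prefers(a, b)]
print(edges)   # [('x', 'y'), ('y', 'z'), ('z', 'x')]
```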
Lemma 1.9 If $\sigma : T(X)^M \longrightarrow T(X)$, and M, A, B all belong to Dσ, then A ∩ B ∈ Dσ.
Outline of Proof Partition M into the four sets $V_1 = A \cap B$, $V_2 = A \setminus B$, $V_3 = B \setminus A$, $V_4 = M \setminus (A \cup B)$, and suppose that each individual has preferences on the alternatives {x, y, z} as follows:
i ∈ V1: $zP_ixP_iy$
i ∈ V2: $xP_iy$, with preferences for z unspecified
i ∈ V3: $zP_ix$, with preferences for y unspecified
i ∈ V4: completely unspecified.
Now A\B = {i ∈ A : i ∉ B}, so V1 ∪ V2 = A. Since A is decisive and every individual in A prefers x to y, we obtain xσ(π)y. In the same way V1 ∪ V3 = B, and B is decisive, so zσ(π)x. Since we require σ(π) to be transitive, it is necessary that zσ(π)y. Since individual preferences are assumed to belong to T(X), transitivity gives $zP_iy$ for all i ∈ V1. We have not, however, specified the preferences of the rest of the society on {y, z}. Thus V1 = A ∩ B must be decisive for {y, z}, in the sense that V1 can choose between y and z independently of the rest of the society. But this must be true for every pair of alternatives. Thus A ∩ B ∈ Dσ. □
In general it could be possible for Dσ to be empty. However, it is usual to assume that σ satisfies the strict Pareto rule. That is to say, for any x, y, if $xP_iy$ for all i ∈ M, then xσ(π)y. This simply means that M ∈ Dσ. Moreover, this implies that Φ ∉ Dσ. To see this, suppose that Φ ∈ Dσ and consider a profile with $xP_iy$ ∀ i ∈ M. Every member of the empty coalition (vacuously) prefers y to x, and the empty set is decisive, so we obtain yσ(π)x. But by the Pareto rule, we have xσ(π)y. We assume, however, that σ(π) is always a strict preference relation, so yσ(π)x and xσ(π)y cannot both occur. Finally, if A ∈ Dσ, then any set B which contains A must also be decisive. Thus Lemma 1.9 can be interpreted in the following way.
Lemma 1.10 If $\sigma : T(X)^M \longrightarrow T(X)$ and σ satisfies the strict Pareto rule, then Dσ satisfies the following conditions:
D1. (monotonicity) A ⊂ B and A ∈ Dσ implies B ∈ Dσ.
D2. (identity) M ∈ Dσ and Φ ∉ Dσ.
D3. (closed under intersection) A, B ∈ Dσ implies A ∩ B ∈ Dσ.
A collection D of subsets of M which satisfies D1, D2, and D3 is called a filter.
Note also that when M is finite, Dσ must also be finite. By repeating Lemma 1.9 for each pair of coalitions, we find that Θ = ∩{Ai : Ai ∈ Dσ} must be non-empty and also decisive. This set Θ is usually called the oligarchy. In the case that σ is simply the strict Pareto rule, the oligarchy is the whole society, M.
However, any rule that gives a transitive strict preference relation must be equivalent to the Pareto rule based on some oligarchy, possibly a strict subset of M.
For example, majority rule for the society M = {1, 2, 3} defines the decisive coalitions {1, 2}, {1, 3}, {2, 3}. These three coalitions have empty intersection, so there is no oligarchy. We can immediately infer that this rule cannot be transitive. In fact, as we have seen, it is not even acyclic. Below, we explore this further. If we require the rule always to satisfy the condition of negative transitivity, then the oligarchy will consist of a single individual when M is finite.
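Conditions D1 to D3 and the oligarchy can be computed for small societies. This sketch takes M = {1, 2, 3} and compares three families: the majority family, the Pareto family {M}, and a family with oligarchy {1, 2}. The third family is a hypothetical example.

```python
from itertools import combinations

M = frozenset({1, 2, 3})
SUBSETS = [frozenset(c) for r in range(len(M) + 1) for c in combinations(sorted(M), r)]

def is_filter(D):
    monotone = all(B in D for A in D for B in SUBSETS if A <= B)   # D1
    proper = M in D and frozenset() not in D                       # D2
    closed = all((A & B) in D for A in D for B in D)               # D3
    return monotone and proper and closed

def oligarchy(D):
    """Intersection of all decisive coalitions."""
    return frozenset.intersection(*D)

majority = {A for A in SUBSETS if len(A) >= 2}
pareto = {M}
oligarchic = {A for A in SUBSETS if {1, 2} <= A}   # hypothetical rule with oligarchy {1, 2}

for name, D in [("majority", majority), ("pareto", pareto), ("oligarchic", oligarchic)]:
    print(name, is_filter(D), sorted(oligarchy(D)))
# majority False []
# pareto True [1, 2, 3]
# oligarchic True [1, 2]
```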
Lemma 1.11 If $\sigma : T(X)^M \longrightarrow O(X)$ and M ∈ Dσ and A ⊂ M and A ∉ Dσ, then M\A ∈ Dσ.
Proof Since A ∉ Dσ, we can find a profile π and a pair {x, y} such that $yP_ix$ for all i ∈ A, yet not (yσ(π)x). Let us write this latter condition as xRy, where R stands for weak social preference.
Suppose now there is an alternative z such that $xP_iz$ for all i ∈ M\A, and that $yP_iz$ for all i ∈ M. By the Pareto condition (M ∈ Dσ) we obtain yPz (where P stands for σ(π)). By negative transitivity of P, xRy and yPz imply xPz (see Lemma 1.7). However, we have not specified the preferences of A on {x, z}. But we have shown that if the members of M\A prefer x to z, then so does the society. It then follows that M\A must be decisive. □
It follows from this lemma that if $\sigma : T(X)^M \longrightarrow O(X)$ and M ∈ Dσ, then whenever A ∈ Dσ has at least two members, there is some proper subset B (such that B ⊂ A yet B ≠ A) with B ∈ Dσ. To see this, consider any non-empty proper subset C of A. If C ∈ Dσ, we may take B = C. Otherwise, by Lemma 1.11, M\C ∈ Dσ. Since O(X) is contained in T(X), we can use property D3 of Lemma 1.10 to infer that A ∩ (M\C) ∈ Dσ. But A ∩ (M\C) = A\C, and since C is a non-empty proper subset of A, A\C is also a non-empty proper subset of A. Hence we may take B = A\C. In the case that M has finite cardinality, we can repeat this argument to show that there must be some individual i such that {i} ∈ Dσ. But then i is a dictator in the sense that, for any x, y, if π is a profile with $xP_iy$ then xσ(π)y.
Arrow’s Impossibility Theorem If $\sigma : O(X)^M \longrightarrow O(X)$ and M ∈ Dσ, with |M| finite, then there is a dictator i, say, such that {i} ∈ Dσ.
It is obvious that if Dσ is non-empty, then all the coalitions in Dσ must contain the dictator i: if some A ∈ Dσ did not contain i, then A ∩ {i} = Φ would belong to Dσ by D3, contradicting D2. In particular {i} = ∩{A : A ∈ Dσ}, and {i} ∈ Dσ.
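For a small finite society the argument can be confirmed by brute force. Among all families of subsets of M = {1, 2, 3}, those satisfying D1 to D3 together with the conclusion of Lemma 1.11 are exactly the families generated by a single individual. This is an exhaustive check over 2⁸ = 256 families, offered as an illustration and not as a proof for general finite M.

```python
from itertools import combinations

M = frozenset({1, 2, 3})
SUBSETS = [frozenset(c) for r in range(len(M) + 1) for c in combinations(sorted(M), r)]

def is_filter(D):
    return (all(B in D for A in D for B in SUBSETS if A <= B)   # D1
            and M in D and frozenset() not in D                  # D2
            and all((A & B) in D for A in D for B in D))         # D3

def complement_property(D):
    """Conclusion of Lemma 1.11: if A is not decisive then M - A is decisive."""
    return all((M - A) in D for A in SUBSETS if A not in D)

arrow_families = []
for mask in range(2 ** len(SUBSETS)):
    D = {S for k, S in enumerate(SUBSETS) if mask >> k & 1}
    if is_filter(D) and complement_property(D):
        arrow_families.append(D)

dictators = []
for D in arrow_families:
    (i,) = frozenset.intersection(*D)           # the intersection is a single individual
    assert D == {A for A in SUBSETS if i in A}  # D consists of the coalitions containing i
    dictators.append(i)
print(sorted(dictators))   # [1, 2, 3]
```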
A somewhat similar result holds when M is a “space” rather than a finite set. In this case there need not be a dictator in M. However, the filter Dσ then defines an “invisible dictator”. That is to say, we can imagine the coalitions in Dσ becoming smaller and smaller, so that they define the invisible dictator in the limit.
1.5.2 Acyclicity and the Collegium
As Lemma 1.8 demonstrated, if P is an acyclic preference relation on a finite set, then the choice for P will be non-empty. Given Arrow’s Theorem, it is therefore useful to examine the properties of a social choice rule that are compatible with acyclicity. To this end we introduce the notion of the Nakamura number for a rule σ (Nakamura 1979).
Definition 1.4 Let D be a family of subsets of the finite set M. The collegium K(D) is the intersection
∩{Ai : Ai ∈ D}.
That is to say, K(D) is the largest set in M such that K(D) ⊂ A for all A ∈ D. If K(D) is empty then D is said to be non-collegial. Otherwise D is collegial.
If σ is a social choice rule, and Dσ its family of decisive coalitions, then K(σ) = K(Dσ) is the collegium for σ. Again, σ is called collegial or non-collegial depending on whether K(σ) is non-empty or empty. The Nakamura number of a non-collegial family D is written k(D) and is the cardinality of the smallest non-collegial subfamily of D. That is, there exists some subfamily D′ of D with |D′| = k(D) such that K(D′) = Φ. Moreover, if D″ is a subfamily of D with |D″| ≤ k(D) − 1, then K(D″) ≠ Φ.
In the case that D is collegial, define k(D) = ∞. For a social choice rule define k(σ) = k(Dσ), where Dσ is the family of decisive coalitions for σ.
Example 1.7
(i) To illustrate this definition, suppose D consists of the four coalitions {A1, A2, A3, A4}, where A1 = {2, 3, 4}, A2 = {1, 3, 4}, A3 = {1, 2, 4, 5} and A4 = {1, 2, 3, 5}. Of course D will be monotonic, so supersets of these coalitions will be decisive. It is evident that A1 ∩ A2 ∩ A3 = {4}, and so if D′ = {A1, A2, A3} then K(D′) ≠ Φ. However, K(D′) ∩ A4 = Φ, and so K(D) = Φ. The remaining triples are also collegial (A1 ∩ A2 ∩ A4 = {3}, A1 ∩ A3 ∩ A4 = {2}, A2 ∩ A3 ∩ A4 = {1}), so no subfamily of three or fewer coalitions is non-collegial. Thus k(D) = 4.
(ii) An especially interesting case is that of a q-majority rule, where each individual has one vote, and any coalition with at least q voters (out of m) is decisive. In this case it is easy to show that k(σ) = 2 + [q/(m − q)], where [q/(m − q)] is the greatest integer strictly less than q/(m − q). In the case that m = 4 and q = 3, we find that [3/1] = 2, so k(σ) = 4.
On the other hand, for all other simple majority rules, where m = 2s + 1 or m = 2s and q = s + 1 (with s an integer), we have
$$\left[\frac{q}{m-q}\right] = \left[\frac{s+1}{s}\right] \quad\text{or}\quad \left[\frac{s+1}{s-1}\right],$$
depending on whether m is odd or even. In both cases [q/(m − q)] = 1. Thus k(σ) = 3 for any simple majority rule with m ≥ 3 and m ≠ 4.
(iii) Finally, observe that for any simple majoritarian rule, if A1, A2 both belong to D, then A1 ∩ A2 ≠ Φ. So in general, any non-collegial subfamily of D must include at least three coalitions. Consequently any majoritarian rule σ has k(σ) ≥ 3.
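The Nakamura number of a finite family can be computed by brute force. Only the minimal coalitions need be listed. Replacing a coalition in a non-collegial subfamily by a decisive subset of it keeps the intersection empty and does not enlarge the subfamily. This sketch checks Example 1.7(i) and compares the q-rule with the formula in (ii). The helper names are illustrative.

```python
from itertools import combinations

def nakamura(D):
    """Size of the smallest subfamily of D with empty intersection (inf if D is collegial)."""
    D = [frozenset(A) for A in D]
    for k in range(1, len(D) + 1):
        for sub in combinations(D, k):
            if not frozenset.intersection(*sub):
                return k
    return float("inf")

def q_rule(m, q):
    """Minimal decisive coalitions of the q-rule on M = {1, ..., m}: all q-member subsets."""
    return [set(c) for c in combinations(range(1, m + 1), q)]

def formula(m, q):
    """2 + [q/(m - q)], where [t] is the greatest integer strictly less than t (0 < q < m)."""
    return 2 + (q - 1) // (m - q)

example = [{2, 3, 4}, {1, 3, 4}, {1, 2, 4, 5}, {1, 2, 3, 5}]
print(nakamura(example))                       # 4, as in Example 1.7(i)
print(nakamura(q_rule(4, 3)), formula(4, 3))   # 4 4
print(nakamura(q_rule(5, 3)), formula(5, 3))   # 3 3
```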
The Nakamura number allows us to construct social preference cycles.
Nakamura Lemma Suppose that σ is a non-collegial voting rule, with Nakamuranumber k(σ )= k. Then there exists an acyclic profile π = (P1, . . . ,Pn) for the so-ciety M , on a set X = {x1, . . . , xk} of cardinality k, such that σ(π) is cyclic on W ,and the choice of σ(π) is empty.
Proof We wish to construct a cycle by considering k different decisive coalitions,A1, . . . ,Ak and assigning preferences to the members of each coalition such that
Book ID: 74899_2_En, Date: 2013-09-30, Proof No: 1, UNCORRECTED PROOF
1.5 Social Choice and Arrow’s Impossibility Theorem 37
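$$\begin{aligned}
x_1 P_i x_2 &\quad \text{for all } i \in A_1\\
&\;\;\vdots\\
x_{k-1} P_i x_k &\quad \text{for all } i \in A_{k-1}\\
x_k P_i x_1 &\quad \text{for all } i \in A_k.
\end{aligned}$$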
We now construct such a profile π. Let Dk = {A1, …, Ak−1}. By the definition of the Nakamura number, this subfamily must be collegial. Hence there exists some individual, k say, with k ∈ A1 ∩ ⋯ ∩ Ak−1. We can assign k the acyclic preference x1 Pk x2 Pk ⋯ Pk xk.
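In the same way, for each subfamily Dj = {A1, …, Aj−1, Aj+1, …, Ak} there exists a collegium containing an individual, j say, to whom we assign the preference
$$x_{j+1} P_j x_{j+2} P_j \cdots P_j x_k P_j x_1 P_j \cdots P_j x_{j-1} P_j x_j.$$
Because A1 ∩ ⋯ ∩ Ak = Φ, individual j does not belong to Aj, so j need not prefer xj to xj+1. This ordering supports every other link of the cycle.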
We may continue this process to assign acyclic preferences to each member of the various collegia of subfamilies of D, so as to give the required cyclic social preference. □
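The construction can be traced on the q-rule with m = 4 and q = 3, for which k(σ) = 4. This sketch is an illustration under our own assumptions. We take Aj = M\{j}, so that, as in the proof, individual j belongs to every coalition except Aj. Individual j gets the ordering xj+1 Pj ⋯ Pj xk Pj x1 Pj ⋯ Pj xj. Alternative xt is encoded as the integer t, and the helper names are hypothetical.

```python
def ranking_for(j, k):
    """Individual j's ordering x_{j+1} P x_{j+2} ... P x_k P x_1 ... P x_j,
    listed best first, with alternative x_t encoded as the integer t."""
    return [((j + t) % k) + 1 for t in range(k)]

def q_rule_beats(profile, a, b, q):
    """a sigma(pi) b under the q-rule: at least q voters rank a above b."""
    support = sum(1 for order in profile if order.index(a) < order.index(b))
    return support >= q

m = k = 4   # four voters, Nakamura number 4
q = 3
profile = [ranking_for(j, k) for j in range(1, m + 1)]

# every link x_i sigma x_{i+1} (and x_k sigma x_1) holds: a social cycle
cycle = all(q_rule_beats(profile, i, i % k + 1, q) for i in range(1, k + 1))
# the choice: alternatives that no other alternative beats
choice = [x for x in range(1, k + 1)
          if not any(q_rule_beats(profile, y, x, q) for y in range(1, k + 1) if y != x)]

print(profile)        # [[2, 3, 4, 1], [3, 4, 1, 2], [4, 1, 2, 3], [1, 2, 3, 4]]
print(cycle, choice)  # True []
```

Every individual ordering is linear, and hence acyclic, yet the social preference cycles and the choice is empty, as the lemma asserts.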
Lemma 1.12 A necessary condition for a social choice rule σ to be acyclic on the finite set X of cardinality at least m = |M|, for each acyclic profile π on X, is that σ be collegial.
Proof Suppose σ is not collegial. Then the Nakamura number k(σ) does not exceed m. Since K(Dσ) = Φ, each individual i is excluded from some decisive coalition, and these (at most m) coalitions have empty intersection. By the Nakamura Lemma there is an acyclic profile π on a set of k(σ) ≤ m alternatives in X such that σ(π) is cyclic. Thus acyclicity implies that σ must be collegial. □
Note that this lemma emphasizes the size of the set of alternatives. It is worth observing here that in the previous proofs of Arrow’s Theorem, the cardinality of the set of alternatives was implicitly assumed to be at least 3.
These techniques using the Nakamura number can be used to show that a simple social rule will be acyclic whenever it is collegial. Say a social choice rule is simple iff, whenever xσ(π)y for the profile π, then xPiy for all i in some coalition that is decisive for σ.
Note that a social choice rule need not, in general, be simple. If σ is simple, then all the information necessary to analyse the rule is contained in Dσ.
Lemma 1.13 Let σ be a simple choice rule on a finite set X:
(i) If σ is dictatorial, then σ(π) ∈ O(X) for all π ∈ O(X)^M.
(ii) If σ is oligarchic, then σ(π) ∈ T(X) for all π ∈ T(X)^M.
(iii) If σ is collegial, then σ(π) ∈ A(X) for all π ∈ A(X)^M.
Proof (i) If i is a dictator, then xIiy implies that x and y are socially indifferent. Because Pi belongs to O(X), so must σ(π).
(ii) In the same way, if Θ is the oligarchy and xσ(π)y, then xPiy for all i in Θ. Each Pi is transitive and Θ is decisive. So xσ(π)y and yσ(π)z give xPiz for all i ∈ Θ, and hence xσ(π)z. Thus σ(π) must be transitive.
(iii) Suppose there is a cycle x1σ(π)x2, …, xkσ(π)x1. Then each of these social preferences must be supported by a decisive coalition. The collegium is non-empty, so some individual, i say, belongs to all of these coalitions and therefore has such a cyclic preference. This contradicts the assumption that π ∈ A(X)^M. □
Another way of expressing part (iii) of this lemma is to introduce the idea of a prefilter. Say D is a prefilter if and only if it satisfies D1 (monotonicity) and D2 (identity), introduced earlier, and also has non-empty intersection (so K(D) ≠ Φ). Clearly, if σ is simple and Dσ is a prefilter, then σ is acyclic and consistent with the Pareto rule.
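Lemma 1.13(iii) can be checked exhaustively in a very small case. The sketch below assumes three voters, three alternatives and strict preferences only. It compares two rules. The first is a hypothetical simple rule whose decisive coalitions are those that contain voter 1 and have at least two members (collegium {1}). The second is simple majority rule (empty collegium). A simple rule is implemented as in the text: x σ(π) y iff the coalition preferring x to y is decisive. All names are illustrative.

```python
from itertools import permutations, product

X = ("a", "b", "c")
M = (1, 2, 3)
ORDERS = list(permutations(X))  # all strict linear orders, best first

def social_relation(profile, decisive):
    """Simple rule: x sigma y iff the coalition preferring x to y is decisive."""
    rel = set()
    for x, y in permutations(X, 2):
        coalition = frozenset(i for i, order in zip(M, profile)
                              if order.index(x) < order.index(y))
        if decisive(coalition):
            rel.add((x, y))
    return rel

def is_acyclic(rel):
    """Repeatedly remove alternatives that no remaining alternative beats."""
    remaining = set(X)
    while remaining:
        unbeaten = [x for x in remaining
                    if not any((y, x) in rel for y in remaining)]
        if not unbeaten:
            return False  # every remaining alternative is beaten: a cycle
        remaining -= set(unbeaten)
    return True

def collegial(A):   # decisive coalitions all contain voter 1
    return 1 in A and len(A) >= 2

def majority(A):    # any two voters are decisive; K(D) is empty
    return len(A) >= 2

profiles = list(product(ORDERS, repeat=len(M)))   # 6**3 = 216 profiles
print(all(is_acyclic(social_relation(p, collegial)) for p in profiles))    # True
print(any(not is_acyclic(social_relation(p, majority)) for p in profiles)) # True
```

The second line detects the familiar Condorcet profile. With only three alternatives this is consistent with Lemma 1.12, since k = 3 for majority rule with m = 3.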
In Chap. 3 we shall further develop the notion of social choice using the notion of the Nakamura number, in the situation where X has a geometric structure.
References
The first version of the impossibility theorem can be found in
Arrow, K. J. (1951). Social choice and individual values. New York: Wiley.
The ideas of the filter and Nakamura number are in
Kirman, A. P., & Sondermann, D. (1972). Arrow’s Impossibility Theorem, many agents and invisible dictators. Journal of Economic Theory, 5, 267–278.
Nakamura, K. (1979). The vetoers in a simple game with ordinal preferences. International Journal of Game Theory, 8, 55–61.
2 Linear Spaces and Transformations
2.1 Vector Spaces
We showed in Sect. 1.3 that when F was a field, the n-fold product set F^n had an additional operation defined on it, induced from addition in F, so that (F^n, +) became an abelian group with zero 0. Moreover we were able to define a product · : F × F^n → F^n which takes (α, x) to a new element of F^n called (αx). Elements of F^n are known as vectors, and elements of F as scalars. The properties that we discovered in F^n characterise a vector space. A vector space is also known as a linear space.
Definition 2.1 A vector space (V, +) is an abelian additive group with zero 0, together with a field (F, +, ·) with zero 0 and identity 1. An element of V is called a vector and an element of F a scalar. Moreover, for any α ∈ F, v ∈ V there is a scalar multiplication (α, v) → αv ∈ V which satisfies the following properties:
V1. α(v1 + v2) = αv1 + αv2, for any α ∈ F, v1, v2 ∈ V.
V2. (α + β)v = αv + βv, for any α, β ∈ F, v ∈ V.
V3. (αβ)v = α(βv), for any α, β ∈ F, v ∈ V.
V4. 1 · v = v, for 1 ∈ F, and for any v ∈ V.
Call V a vector space over the field F. From the previous discussion the set ℜ^n becomes an abelian group (ℜ^n, +) under addition. We shall frequently write
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$
N. Schofield, Mathematical Methods in Economics and Social Choice, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6_2, © Springer-Verlag Berlin Heidelberg 2014
Fig. 2.1 ???
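for a vector in ℜ^n, where x1, …, xn are called the coordinates of x. Vector addition is then defined by
$$x + y = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix}.$$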
A vector space over ℜ is called a real vector space.
For example, (Z2, +, ·) is a field, and so (Z2)^n is a vector space over the field Z2.
It may not be possible to represent each vector in a vector space by a list of coordinates. For example, consider the set of all functions with domain X and values in ℜ, and call this set ℜ^X. If f, g ∈ ℜ^X, define f + g to be the function which maps x ∈ X to f(x) + g(x). Clearly there is a zero function 0 defined by 0(x) = 0, and each f has an inverse (−f) defined by (−f)(x) = −(f(x)). Finally, for α ∈ ℜ and f ∈ ℜ^X, define αf : X → ℜ by (αf)(x) = α(f(x)). Thus ℜ^X is a vector space over ℜ.
Definition 2.2 Let (V, +) be a vector space over a field F. A subset V′ of V is called a vector subspace of V if and only if
1. v1, v2 ∈ V′ ⇒ v1 + v2 ∈ V′, and
2. if α ∈ F and v ∈ V′, then αv ∈ V′.
Lemma 2.1 If (V, +) is a vector space with zero 0 and V′ is a vector subspace, then for each v ∈ V′ the inverse (−v) ∈ V′, and 0 ∈ V′, so (V′, +) is a subgroup of (V, +).
Proof Suppose v ∈ V′. Since F is a field, there is an identity 1 with additive inverse −1. By V2, (1 − 1)v = 1 · v + (−1)v, and (1 − 1)v = 0 · v, since 1 − 1 = 0. Now, by V2 and V4, v = (1 + 0)v = 1 · v + 0 · v = v + 0 · v, and so 0 · v = 0. Hence v + (−1)v = 0, that is, (−1)v = (−v). Since V′ is a vector subspace, (−1)v ∈ V′, and so (−v) ∈ V′. But then v + (−v) = 0, and so 0 ∈ V′. □
From now on we shall simply write V for a vector space, and refer to the field only on occasion.
Definition 2.3 Let V′ = {v1, …, vr} be a set of vectors in the vector space V. A vector v is called a linear combination of the set V′ iff v can be written in the form
$$v = \sum_{i=1}^{r} \lambda_i v_i,$$
where each λi, i = 1, …, r, belongs to the field F. The span of V′, written Span(V′), is the set of vectors which are linear combinations of the set V′. If V′′ = Span(V′), then V′ is said to span V′′.
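For example, suppose
$$V' = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \end{pmatrix} \right\}.$$
We can solve the equation
$$\begin{pmatrix} x \\ y \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$
for any (x, y) ∈ ℜ^2 by setting α = (1/3)(2y − x) and β = (1/3)(2x − y), so it is clear that V′ is a span for ℜ^2.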
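The coefficients can be verified in exact rational arithmetic with Python's `fractions` module. This is only a numerical check of the formulas α = (1/3)(2y − x) and β = (1/3)(2x − y) at a few sample points. The function names are illustrative.

```python
from fractions import Fraction

def coords_in_basis(x, y):
    """Coefficients (alpha, beta) with (x, y) = alpha*(1, 2) + beta*(2, 1)."""
    alpha = Fraction(2 * y - x, 3)
    beta = Fraction(2 * x - y, 3)
    return alpha, beta

def combine(alpha, beta):
    """Rebuild alpha*(1, 2) + beta*(2, 1) componentwise."""
    return (alpha * 1 + beta * 2, alpha * 2 + beta * 1)

# the formulas reproduce every sample vector exactly
for x, y in [(0, 0), (1, 0), (0, 1), (5, -7), (3, 3)]:
    a, b = coords_in_basis(x, y)
    assert combine(a, b) == (x, y)

print(coords_in_basis(1, 0))  # (Fraction(-1, 3), Fraction(2, 3))
```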
Lemma 2.2 If V′ is a finite set of vectors in the vector space V, then Span(V′) is a vector subspace of V.
Proof We seek to show that for any α, β ∈ F and any u, w ∈ Span(V′), αu + βw ∈ Span(V′). By definition, if V′ = {v1, …, vr}, then $u = \sum_{i=1}^{r} \eta_i v_i$ and $w = \sum_{i=1}^{r} \mu_i v_i$, where ηi, μi ∈ F for i = 1, …, r. But then
$$\alpha u + \beta w = \alpha \sum_{i=1}^{r} \eta_i v_i + \beta \sum_{i=1}^{r} \mu_i v_i = \sum_{i=1}^{r} \lambda_i v_i,$$
where λi = αηi + βμi ∈ F for i = 1, …, r. Thus αu + βw ∈ Span(V′). Note that, by this lemma, the zero vector 0 belongs to Span(V′). □
Definition 2.4 Let V′ = {v1, …, vr} be a set of vectors in V. V′ is called a frame iff $\sum_{i=1}^{r} \alpha_i v_i = 0$ implies that αi = 0 for i = 1, …, r. (Here each αi belongs to the field F.) In this case the set V′ is called a linearly independent set. If V′ is not a frame, the vectors in V′ are said to be linearly dependent. Say a vector v is linearly dependent on V′ = {v1, …, vr} iff v ∈ Span(V′).
Note that if V′ is a frame, then
1. 0 ∉ V′, since α0 = 0 for every non-zero α ∈ F.
2. If v ∈ V′, then (−v) ∉ V′. Otherwise 1 · v + 1 · (−v) = 0 would be a linear relation among members of V′ with non-zero coefficients.
Lemma 2.3
1. V′ is not a frame iff there is some vector v ∈ V′ which is linearly dependent on V′\{v}.
2. If V′ is a frame, then any subset of V′ is a frame.
3. If V′ spans V′′, but V′ is not a frame, then there exists some vector v ∈ V′ such that V′′′ = V′\{v} spans V′′.
Proof Let V′ = {v1, …, vr} be the set of vectors in the vector space V.
1. Suppose V′ is not a frame. Then there exists an equation $\sum_{j=1}^{r} \alpha_j v_j = 0$ in which αk ≠ 0 for at least one k. But then
$$v_k = -\frac{1}{\alpha_k}\Big(\sum_{j \neq k} \alpha_j v_j\Big).$$
Let vk = v. Then v is linearly dependent on V′\{v}. On the other hand, suppose that v1, say, is linearly dependent on {v2, …, vr}. Then $v_1 = \sum_{j=2}^{r} \alpha_j v_j$, and so $0 = -v_1 + \sum_{j=2}^{r} \alpha_j v_j = \sum_{j=1}^{r} \alpha_j v_j$, where α1 = −1. Since α1 ≠ 0, V′ cannot be a frame.
2. Suppose V′′ is a subset of V′, but that V′′ is not a frame. For convenience let V′′ = {v1, …, vk}, where k ≤ r. Then there is a solution
$$\sum_{j=1}^{k} \alpha_j v_j = 0$$
with αj ≠ 0 for some j. Setting αj = 0 for j = k + 1, …, r gives a non-trivial linear combination of the members of V′ equal to 0, so V′ cannot be a frame. Thus if V′ is a frame, so is any subset V′′.
3. Suppose that V′ is not a frame, but that it spans V′′. By part (1), there exists a vector v1, say, in V′ such that v1 belongs to Span(V′\{v1}). Thus $v_1 = \sum_{j=2}^{r} \alpha_j v_j$. Since V′ is a span for V′′, any vector v in V′′ can be written
$$v = \sum_{j=1}^{r} \beta_j v_j = \beta_1 \Big(\sum_{j=2}^{r} \alpha_j v_j\Big) + \sum_{j=2}^{r} \beta_j v_j.$$
Thus v is a linear combination of V′\{v1}, and so V′′ = Span(V′\{v1}). Let V′′′ = V′\{v1} to complete the proof. □
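For vectors in ℜ^n with rational entries, parts (1) and (3) of Lemma 2.3 can be tested computationally. The sketch below uses Gauss–Jordan elimination, a standard technique that the text has not developed at this point, to count how many of the vectors are linearly independent. A set is a frame iff this count equals its size. A vector can be dropped without shrinking the span iff removing it leaves the count unchanged. The function names are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors, via Gauss-Jordan elimination over Q."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        # find a row at or below position r with a non-zero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_frame(vectors):
    """A finite set is a frame iff all of its vectors are independent."""
    return rank(vectors) == len(vectors)

def removable_vector(vectors):
    """Index of a vector lying in the span of the others (Lemma 2.3, part 1),
    or None if the set is a frame."""
    full = rank(vectors)
    for k in range(len(vectors)):
        if rank(vectors[:k] + vectors[k + 1:]) == full:  # span unchanged
            return k
    return None

W = [(1, 2), (2, 1), (3, 3)]
print(is_frame([(1, 2), (2, 1)]), is_frame(W))  # True False
print(removable_vector(W))                      # 0
```

In W the first vector satisfies (1, 2) = (3, 3) − (2, 1), so it can be removed and the remaining two vectors still span ℜ^2.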
Definition 2.5 A basis for a vector space V is a frame V′ which spans V.
For example, we previously considered
$$V' = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \end{pmatrix} \right\}$$
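and showed that any vector in ℜ^2 could be written as
$$\begin{pmatrix} x \\ y \end{pmatrix} = \Big(\frac{2y - x}{3}\Big)\begin{pmatrix} 1 \\ 2 \end{pmatrix} + \Big(\frac{2x - y}{3}\Big)\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \lambda_1 \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \lambda_2 \begin{pmatrix} 2 \\ 1 \end{pmatrix}.$$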
Thus V′ is a span for ℜ^2. Moreover, if (x, y) = (0, 0) then λ1 = λ2 = 0, and so V′ is a frame. Hence V′ is a basis for ℜ^2. If V′ = {v1, …, vn} is a basis for a vector space V, then any vector v ∈ V can be written
$$v = \sum_{j=1}^{n} \alpha_j v_j,$$
and the elements (α1, …, αn) are known as the coordinates of the vector v with respect to the basis V′.
For example, the natural basis for ℜ^n is the set V′ = {e1, …, en}, where ei = (0, …, 1, …, 0) with a 1 in the ith position.
Lemma 2.4 {e1, …, en} is a basis for ℜ^n.
Proof We can write any vector x in ℜ^n as (x1, …, xn). Clearly
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \cdots + x_n \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$
If x = 0 then x1 = ⋯ = xn = 0, and so {e1, …, en} is a frame as well as a span, and thus a basis for ℜ^n. □
However, a single vector x will have different coordinates depending on the basis chosen. For example, the vector (x, y) has coordinates (x, y) in the basis {e1, e2}, but coordinates ((2y − x)/3, (2x − y)/3) with respect to the basis
$$\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \end{pmatrix} \right\}.$$
Once the basis is chosen, the coordinates of any vector with respect to that basis are unique.
Lemma 2.5 Suppose V′ = {v1, …, vn} is a basis for V. Let $v = \sum_{i=1}^{n} \alpha_i v_i$. Then the coordinates (α1, …, αn), with respect to the basis, are unique.
Proof If the coordinates were not unique, then it would be possible to write $v = \sum_{i=1}^{n} \beta_i v_i = \sum_{i=1}^{n} \alpha_i v_i$ with βi ≠ αi for some i. But
$$0 = v - v = \sum_{i=1}^{n} \alpha_i v_i - \sum_{i=1}^{n} \beta_i v_i = \sum_{i=1}^{n} (\alpha_i - \beta_i) v_i.$$
Since V′ is a frame, αi − βi = 0 for i = 1, …, n. Thus αi = βi for all i, and so the coordinates are unique. □
Note in particular that, with respect to any basis {v1, …, vn} for V, the unique zero vector 0 always has coordinates (0, …, 0).
Definition 2.6 A space V is finitely generated iff there exists a span V′ for V which has a finite number of elements.
Lemma 2.6 If V is a finitely generated vector space, then it has a basis with a finite number of elements.
Proof Since V is finitely generated, there is a finite set V1 = {v1, …, vn} which spans V. If V1 is a frame, then it is a basis. If V1 is linearly dependent, then by Lemma 2.3(3) there is a vector v ∈ V1 such that Span(V2) = V, where V2 = V1\{v}. Again, if V2 is a frame, then it is a basis. If there were no subset Vr = {v1, …, vn−r+1} of V1 which was a frame, then V1 would have to be the empty set, implying that V was an empty set. But this contradicts 0 ∈ V. □
Lemma 2.7 If V is a finitely generated vector space and V1 is a frame, then there is a basis V2 for V which includes V1.
Proof Let V1 = {v1, …, vr}. If Span(V1) = V, then V1 is a basis. So suppose that Span(V1) ≠ V. Then there exists an element vr+1 ∈ V which does not belong to Span(V1). We seek to show that V2 = V1 ∪ {vr+1} is a frame. Consider $0 = \alpha_{r+1} v_{r+1} + \sum_{i=1}^{r} \alpha_i v_i$. If αr+1 = 0, then the linear independence of V1 implies that αi = 0 for i = 1, …, r. Thus V2 is a frame. If αr+1 ≠ 0, then
$$v_{r+1} = -\frac{1}{\alpha_{r+1}}\Big(\sum_{i=1}^{r} \alpha_i v_i\Big).$$
But this implies that vr+1 belongs to Span(V1), contradicting the choice of vr+1. Thus V2 is a frame. If V2 is a span for V, then it is a basis. If V2 is not a span, reiterate this process. Since V is finitely generated, there must be some frame Vn−r+1 = {v1, …, vr, vr+1, …, vn} which is a span, and thus a basis for V. □
These two lemmas show that if V is a finitely generated vector space and {v1, …, vm} is a span, then some subset {v1, …, vn}, with n ≤ m, is a basis. A basis is a minimal span.
On the other hand, if X = {v1, …, vr} is a frame but not a span, then elements may be added to X in such a way as to preserve linear independence, until this “superset” of X becomes a basis. Consequently a basis is a maximal frame. These two results can be combined into one theorem.
Exchange Theorem Suppose that V is a finitely generated vector space. Let X = {x1, …, xm} be a frame and Y = {y1, …, yn} a span. Then there is some subset Y′ of Y such that X ∪ Y′ is a basis for V.
Proof We argue by induction. Let Xs = {x1, …, xs} for each s = 1, …, m, and let X0 = Φ.
We know already from Lemma 2.6 that there is some subset Y0 of Y such that X0 ∪ Y0 is a basis for V. Suppose that, for some s < m, there is a subset Ys of Y such that Xs ∪ Ys is a basis.
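Let Ys = {y1, …, yt}. Now xs+1 ∈ Span(Xs ∪ Ys), since Xs ∪ Ys is a basis. Thus
$$x_{s+1} = \sum_{i=1}^{s} \alpha_i x_i + \sum_{i=1}^{t} \beta_i y_i.$$
But Xs+1 = {x1, …, xs+1} is a frame, since it is a subset of X. Thus at least one βj ≠ 0. Let Ys+1 = Ys\{yj}. Solving the equation above for yj shows that yj ∈ Span(Xs+1 ∪ Ys+1), so Xs+1 ∪ Ys+1 = {x1, …, xs+1} ∪ {y1, …, yj−1, yj+1, …, yt} spans V. It is also a frame. In any linear relation among its members, the coefficient of xs+1 must be zero. Otherwise, substituting for xs+1 would give a relation in the frame Xs ∪ Ys in which yj has a non-zero coefficient. All the remaining coefficients then vanish because Xs ∪ Ys is a frame. Hence Xs+1 ∪ Ys+1 is a basis for V.
Thus if there is some subset Ys of Y such that Xs ∪ Ys is a basis, there is a subset Ys+1 of Y such that Xs+1 ∪ Ys+1 is a basis. By induction, there is a subset Ym = Y′ of Y such that Xm ∪ Ym = X ∪ Y′ is a basis. □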
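The sketch below gives a constructive version of the theorem for vectors with rational entries. It is a greedy variant, not the inductive exchange argument of the proof. Starting from the frame X, it keeps each y ∈ Y that is not already in the span of the vectors chosen so far. The `rank` helper repeats the elimination idea. Like `extend_frame`, it is a hypothetical name.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors, via Gauss-Jordan elimination over Q."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_frame(frame, span):
    """Return Y', drawn from `span`, such that frame + Y' is a basis."""
    assert rank(frame) == len(frame), "first argument must be a frame"
    chosen = []
    for y in span:
        # keep y only if it is independent of everything kept so far
        if rank(frame + chosen + [y]) > len(frame) + len(chosen):
            chosen.append(y)
    return chosen

X = [(1, 1, 0)]
Y = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
Y_prime = extend_frame(X, Y)
print(Y_prime)  # [(1, 0, 0), (0, 0, 1)]
```

The vector (0, 1, 0) is skipped because it equals (1, 1, 0) − (1, 0, 0). The resulting basis X ∪ Y′ has three elements, as Lemma 2.9 requires for ℜ^3.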
Corollary 2.8 If X = {x1, …, xm} is a frame in a vector space V, and Y = {y1, …, yn} is a span for V, then m ≤ n.
Lemma 2.9 If V is a finitely generated vector space, then any two bases have the same number of vectors. This number is called the dimension of V and is written dim(V).
Proof Let X, Y be two bases with m and n elements respectively. Consider X as a frame and Y as a span. Thus m ≤ n. However, Y is also a frame and X a span. Thus n ≤ m. Hence m = n. □
If V′ is a vector subspace of a finitely generated vector space V, then any basis for V′ can be extended to give a basis for V. To see this, note that there must exist some finite set V′′ = {v1, …, vr} of vectors, all belonging to V′, such that Span(V′′) = V′; otherwise V could not be finitely generated. As before, eliminate members of V′′ until a frame is obtained. This gives a basis for V′. Clearly dim(V′) ≤ dim(V). Moreover, if V′ has a basis V′′′ = {v1, …, vr}, then further linearly independent vectors belonging to V\V′ can be added to V′′′ to give a basis for V.
As we showed in Lemma 2.4, the vector space ℜ^n has a basis {e1, …, en} consisting of n elements. Thus dim(ℜ^n) = n.
If V^m is a vector subspace of ℜ^n of dimension m, where of course m ≤ n, then in a certain sense V^m is identical to a copy of ℜ^m through the origin 0. We make this more explicit below.
2.2 Linear Transformations
In Chap. 1 we considered a morphism from the abelian group (ℜ^2, +) to itself. A morphism between vector spaces is called a linear transformation.
Definition 2.7 Let V, U be two vector spaces of dimension n, m respectively, over the same field F. Then a linear transformation T : V → U is a function from V to U with domain V, such that
1. for any α ∈ F and any v ∈ V, T(αv) = α(T(v));
2. for any v1, v2 ∈ V, T(v1 + v2) = T(v1) + T(v2).
Note that a linear transformation is simply a morphism between (V, +) and (U, +) which respects the operation of the field F. We shall show that any linear transformation T can be represented by an array of the form
$$M(T) = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$
consisting of n × m elements in F. An array such as this is called an n by m (or n × m) matrix. The set of n × m matrices we shall write as M(n, m).
2.2.1 Matrices
For convenience we shall consider finitely generated vector spaces over ℜ, so that we restrict attention to linear transformations between ℜ^n and ℜ^m, for any integers n and m. Now let V = {v1, …, vn} be a basis for ℜ^n and U = {u1, …, um} a basis for ℜ^m.
Since V is a basis for ℜ^n, any vector x ∈ ℜ^n can be written as $x = \sum_{j=1}^{n} x_j v_j$, with coordinates (x1, …, xn).
If T is a linear transformation, then T(αv1 + βv2) = T(αv1) + T(βv2) = αT(v1) + βT(v2). Therefore
$$T(x) = T\Big(\sum_{j=1}^{n} x_j v_j\Big) = \sum_{j=1}^{n} x_j T(v_j).$$
Since each T(vj) lies in ℜ^m, we can write $T(v_j) = \sum_{i=1}^{m} a_{ij} u_i$, where (a1j, a2j, …, amj) are the coordinates of T(vj) with respect to the basis U for ℜ^m. Thus
$$T(x) = \sum_{j=1}^{n} x_j \sum_{i=1}^{m} a_{ij} u_i = \sum_{i=1}^{m} y_i u_i,$$
where the ith coordinate, yi, of T(x) is equal to $\sum_{j=1}^{n} a_{ij} x_j$. We obtain a set of linear equations:
$$\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1j}x_j + \cdots + a_{1n}x_n\\
&\;\;\vdots\\
y_i &= a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{ij}x_j + \cdots + a_{in}x_n\\
&\;\;\vdots\\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mj}x_j + \cdots + a_{mn}x_n.
\end{aligned}$$
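Assume, for simplicity, the standard bases e1, …, en and e1, …, em. Then the jth column of coefficients is just T(ej), so the array (aij) can be built by evaluating T on the basis vectors. The sketch below does this for a hypothetical map T : ℜ^3 → ℜ^2 and checks that yi = Σj aij xj reproduces T(x).

```python
def matrix_of(T, n):
    """Rows (a_ij) of the matrix of a linear map T: R^n -> R^m in the standard
    bases; column j holds the coordinates of T(e_j)."""
    columns = [T([1 if k == j else 0 for k in range(n)]) for j in range(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def mat_vec(A, x):
    """y_i = sum_j a_ij x_j: the scalar product of row i with x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def T(x):
    """Hypothetical example map R^3 -> R^2."""
    x1, x2, x3 = x
    return [2 * x1 - x3, x1 + 3 * x2]

A = matrix_of(T, 3)
print(A)                                    # [[2, 0, -1], [1, 3, 0]]
print(mat_vec(A, [1, 2, 3]), T([1, 2, 3]))  # [-1, 7] [-1, 7]
```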
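This set of equations is more conveniently written
$$\begin{pmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n}\\ \vdots & & \vdots & & \vdots\\ a_{i1} & \cdots & a_{ij} & \cdots & a_{in}\\ \vdots & & \vdots & & \vdots\\ a_{m1} & \cdots & a_{mj} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1\\ \vdots\\ x_j\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} y_1\\ \vdots\\ y_i\\ \vdots\\ y_m \end{pmatrix},$$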
or as M(T)x = y, where M(T) is the n × m array whose ith row is (ai1, …, ain) and whose jth column is (a1j, …, amj). This matrix is commonly written as (aij), where it is understood that i = 1, …, m and j = 1, …, n.
Note that the operation of M(T) on x is as follows. To obtain the ith coordinate, yi, take the ith row vector (ai1, …, ain) and form its scalar product with the column vector (x1, …, xn), where this scalar product is defined to be $\sum_{j=1}^{n} a_{ij} x_j$.
The coefficients of T(vj) with respect to the basis (u1, …, um) are (a1j, …, amj), and these turn up as the jth column of the matrix. Thus we could write the matrix as
$$M(T) = \big(T(v_1)\ \cdots\ T(v_j)\ \cdots\ T(v_n)\big),$$
where T(vj) is the column of coordinates in ℜ^m.
Suppose now that W = {w1, …, wp} is a basis for ℜ^p and S : ℜ^m → ℜ^p is a linear transformation. To represent S as a matrix with respect to the two bases U and W, we need to know, for each i = 1, …, m,
$$S(u_i) = \sum_{k=1}^{p} b_{ki} w_k.$$
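Then, as before, S is represented by the matrix
$$M(S) = \begin{pmatrix} b_{11} & \cdots & b_{1i} & \cdots & b_{1m}\\ \vdots & & b_{ki} & & \vdots\\ b_{p1} & \cdots & b_{pi} & \cdots & b_{pm} \end{pmatrix},$$
where the ith column is the column of coordinates of S(ui) in ℜ^p. We can compute the composition
$$S \circ T : \Re^n \xrightarrow{\;T\;} \Re^m \xrightarrow{\;S\;} \Re^p.$$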
The question is how we should compose the two matrices M(S) and M(T) so that the result “corresponds” to the matrix M(S ∘ T) which represents S ∘ T.
First of all we show that S ∘ T : ℜ^n → ℜ^p is a linear transformation, so that we know that it can be represented by an (n × p) matrix.
Lemma 2.10 If T : ℜ^n → ℜ^m and S : ℜ^m → ℜ^p are linear transformations, then S ∘ T : ℜ^n → ℜ^p is a linear transformation.
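Proof Consider α, β ∈ ℜ and v1, v2 ∈ ℜ^n. Then
$$\begin{aligned}
(S\circ T)(\alpha v_1 + \beta v_2) &= S\big[T(\alpha v_1 + \beta v_2)\big]\\
&= S\big(\alpha T(v_1) + \beta T(v_2)\big) && \text{since } T \text{ is linear}\\
&= \alpha S\big(T(v_1)\big) + \beta S\big(T(v_2)\big) && \text{since } S \text{ is linear}\\
&= \alpha (S\circ T)(v_1) + \beta (S\circ T)(v_2).
\end{aligned}$$
Thus S ∘ T is linear.
By the previous analysis, S ∘ T can be represented by an (n × p) matrix whose jth column is (S ∘ T)(vj). Thus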
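$$\begin{aligned}
(S\circ T)(v_j) &= S\Big(\sum_{i=1}^{m} a_{ij} u_i\Big) = \sum_{i=1}^{m} a_{ij} S(u_i)\\
&= \sum_{i=1}^{m} a_{ij} \sum_{k=1}^{p} b_{ki} w_k = \sum_{k=1}^{p} \Big(\sum_{i=1}^{m} a_{ij} b_{ki}\Big) w_k.
\end{aligned}$$
Thus the kth entry in the jth column of M(S ∘ T) is $\sum_{i=1}^{m} b_{ki} a_{ij}$, and S ∘ T can be represented by the matrix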
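$$M(S\circ T) = \begin{pmatrix} & \vdots & \\ \cdots & \sum_{i=1}^{m} b_{ki} a_{ij} & \cdots \\ & \vdots & \end{pmatrix},$$
a matrix with p rows and n columns, whose displayed entry lies in the kth row and the jth column.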
The jth column in this matrix can be obtained more simply by operating the matrix M(S) on the jth column vector T(vj) in the matrix M(T). Thus
$$M(S\circ T) = \big(M(S)(T(v_1))\ \cdots\ M(S)(T(v_n))\big) = M(S) \circ M(T).$$
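$$\begin{pmatrix} & \vdots & \\ b_{k1} & \cdots\; b_{ki}\; \cdots & b_{km} \\ & \vdots & \end{pmatrix} \begin{pmatrix} \cdots & a_{1j} & \cdots \\ & \vdots & \\ \cdots & a_{ij} & \cdots \\ & \vdots & \\ \cdots & a_{mj} & \cdots \end{pmatrix} = M(S) \circ M(T).$$
Here M(S) has p rows and m columns, and M(T) has m rows and n columns. The entry in the kth row and jth column of the product is the scalar product of the kth row of M(S) with the jth column of M(T). □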
Thus the “natural” method of matrix composition corresponds to the composition of linear transformations.
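The identity M(S ∘ T) = M(S) ∘ M(T) can be checked on a concrete pair of maps. The sketch below assumes standard bases throughout and uses hypothetical maps T : ℜ^3 → ℜ^2 and S : ℜ^2 → ℜ^4. The function `compose_matrices` forms entry (k, j) as Σi bki aij, following the derivation above.

```python
def matrix_of(F, n):
    """Standard-basis matrix (list of rows) of a linear map F defined on R^n."""
    cols = [F([1 if k == j else 0 for k in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

def compose_matrices(B, A):
    """Entry (k, j) is sum_i b_ki * a_ij: row k of B against column j of A."""
    return [[sum(B[k][i] * A[i][j] for i in range(len(A)))
             for j in range(len(A[0]))] for k in range(len(B))]

def T(x):
    """Hypothetical map R^3 -> R^2."""
    return [2 * x[0] - x[2], x[0] + 3 * x[1]]

def S(y):
    """Hypothetical map R^2 -> R^4."""
    return [y[0], y[1], y[0] + y[1], 5 * y[0] - 2 * y[1]]

def S_after_T(x):
    return S(T(x))

lhs = matrix_of(S_after_T, 3)                            # M(S o T)
rhs = compose_matrices(matrix_of(S, 2), matrix_of(T, 3))  # M(S) o M(T)
print(lhs == rhs)  # True
```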
Now let L(ℜ^n, ℜ^n) stand for the set of linear transformations from ℜ^n to ℜ^n. As we have shown, if S, T belong to this set, then S ∘ T is also a linear transformation from ℜ^n to ℜ^n. Thus composition of functions (∘) is a binary operation L(ℜ^n, ℜ^n) × L(ℜ^n, ℜ^n) → L(ℜ^n, ℜ^n).
Let M : L(ℜ^n, ℜ^n) → M(n, n) be the mapping which assigns to any linear transformation T : ℜ^n → ℜ^n the matrix M(T) as above. Note that M depends on the choice of bases {v1, …, vn} and {u1, …, un} for the domain and codomain, ℜ^n. There is in general no reason why these two bases should be the same.
Now let ∘ be the method of matrix composition which we have just defined. Thus the mapping M satisfies
$$M(S \circ T) = M(S) \circ M(T)$$
for any two linear transformations S and T. Suppose now that we are given a linear transformation T ∈ L(ℜ^n, ℜ^n). Clearly the matrix M(T) which represents T with respect to the two bases is unique, and so M is a function.
On the other hand, suppose that T, S are both represented by the same matrix A = (aij). By definition, $T(v_j) = S(v_j) = \sum_{i=1}^{n} a_{ij} u_i$ for each j = 1, …, n. But then T(x) = S(x) for any x ∈ ℜ^n, and so T = S. Thus M is injective. Moreover, if A is any matrix, then it represents a linear transformation, and so M is surjective. Thus we have a bijective morphism
$$M : \big(L(\Re^n, \Re^n), \circ\big) \to \big(M(n,n), \circ\big).$$
As we saw in the case of 2 × 2 matrices, the subset of non-singular matrices in M(n, n) forms a group. We repeat the procedure for the more general case.
2.2.2 The Dimension Theorem
Let T : V → U be a linear transformation between the vector spaces V, U, of dimension n and m respectively, over a field F. The transformation is characterised by two subspaces, one of V and one of U.
Definition 2.8
1. The kernel of a transformation T : V → U is the set Ker(T) = {x ∈ V : T(x) = 0} in V.
2. The image of the transformation is the set Im(T) = {y ∈ U : ∃ x ∈ V s.t. T(x) = y}.
Both these sets are vector subspaces, of V and U respectively. To see this, suppose v1, v2 ∈ Ker(T) and α, β ∈ F. Then T(αv1 + βv2) = αT(v1) + βT(v2) = 0 + 0 = 0. Hence αv1 + βv2 ∈ Ker(T).
If u1, u2 ∈ Im(T), then there exist v1, v2 ∈ V such that T(v1) = u1 and T(v2) = u2. But then
$$\alpha u_1 + \beta u_2 = \alpha T(v_1) + \beta T(v_2) = T(\alpha v_1 + \beta v_2).$$
Since V is a vector space, αv1 + βv2 ∈ V, and so αu1 + βu2 ∈ Im(T).
By the exchange theorem there exists a basis k1, …, kp for Ker(T), where p = dim Ker(T), and a basis u1, …, us for Im(T), where s = dim(Im(T)). Here p is called the kernel rank of T, often written kr(T), and s is the rank of T, or rk(T).
The Dimension Theorem If T : V → U is a linear transformation between vector spaces over a field F, where dim(V) = n, then the dimensions of the kernel and image of T satisfy the relation
$$\dim\big(\mathrm{Im}(T)\big) + \dim\big(\mathrm{Ker}(T)\big) = n.$$
Proof Let {u1, …, us} be a basis for Im(T) and, for each i = 1, …, s, let vi be a vector in V such that T(vi) = ui.
Let v be any vector in V. Since T(v) ∈ Im(T),
$$T(v) = \sum_{i=1}^{s} \alpha_i u_i.$$
So
$$T(v) = \sum_{i=1}^{s} \alpha_i T(v_i) = T\Big(\sum_{i=1}^{s} \alpha_i v_i\Big), \quad\text{and}\quad T\Big(v - \sum_{i=1}^{s} \alpha_i v_i\Big) = 0,$$
the zero vector in U, i.e., $v - \sum_{i=1}^{s} \alpha_i v_i \in \mathrm{Ker}(T)$. Let {k1, …, kp} be the basis for Ker(T).
Then $v - \sum_{i=1}^{s} \alpha_i v_i = \sum_{j=1}^{p} \beta_j k_j$, or $v = \sum_{i=1}^{s} \alpha_i v_i + \sum_{j=1}^{p} \beta_j k_j$. Thus (v1, …, vs, k1, …, kp) is a span for V.
Suppose we consider
$$\sum_{i=1}^{s} \alpha_i v_i + \sum_{j=1}^{p} \beta_j k_j = 0. \qquad (*)$$
Then, since T(kj) = 0 for j = 1, …, p,
$$T\Big(\sum_{i=1}^{s} \alpha_i v_i + \sum_{j=1}^{p} \beta_j k_j\Big) = \sum_{i=1}^{s} \alpha_i T(v_i) + \sum_{j=1}^{p} \beta_j T(k_j) = \sum_{i=1}^{s} \alpha_i T(v_i) = \sum_{i=1}^{s} \alpha_i u_i = 0.$$
Now {u1, …, us} is a basis for Im(T), and hence these vectors are linearly independent. So αi = 0 for i = 1, …, s. Therefore (*) gives $\sum_{j=1}^{p} \beta_j k_j = 0$. However, {k1, …, kp} is a basis for Ker(T) and therefore a frame, so βj = 0 for j = 1, …, p. Hence {v1, …, vs, k1, …, kp} is a frame, and therefore a basis for V. By the exchange theorem the dimension of V is the unique number of vectors in a basis. Therefore s + p = n. □
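The theorem can be illustrated numerically, though not by the method of the proof. The sketch below takes a hypothetical map T : ℜ^4 → ℜ^3, given by a matrix with three rows and four columns, and reduces that matrix to reduced row echelon form over the rationals. The pivot columns count the rank, and each free column yields one kernel vector, so rank + kernel rank = n = 4. The helper names are illustrative.

```python
from fractions import Fraction

def rref(A):
    """Reduced row echelon form over Q; returns (rows, pivot_columns)."""
    R = [[Fraction(a) for a in row] for row in A]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: column c is free
        R[r], R[p] = R[p], R[r]
        R[r] = [a / R[r][c] for a in R[r]]  # scale the pivot to 1
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots

def kernel_basis(A):
    """One kernel vector per free (non-pivot) column, read off the RREF."""
    n = len(A[0])
    R, pivots = rref(A)
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        k = [Fraction(0)] * n
        k[f] = Fraction(1)
        for row, pc in enumerate(pivots):
            k[pc] = -R[row][f]
        basis.append(k)
    return basis

A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
rank = len(rref(A)[1])
kernel = kernel_basis(A)
assert all(sum(a * k for a, k in zip(row, vec)) == 0 for row in A for vec in kernel)
print(rank, len(kernel), rank + len(kernel))  # 2 2 4
```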
Note that this theorem is true for general vector spaces. We now specialise to the vector spaces ℜ^n and ℜ^m.
Suppose {v1, …, vn} is a basis for ℜ^n. The coordinates of vj with respect to this basis are (0, …, 1, …, 0), with 1 in the jth place. As we have noted, the image of vj under the transformation T can be represented by the jth column (a1j, …, amj) in the matrix M(T), with respect to the original basis (e1, …, em), say, for ℜ^m. Call the n different column vectors of this matrix a1, …, aj, …, an.
Then the equation M(T)(x) = y is identical to the equation $\sum_{j=1}^{n} x_j a_j = y$, where x = (x1, …, xn).
Clearly any vector y in the image of M(T) can be written as a linear combination of the columns A = {a1, …, an}. Thus Span(A) = Im(M(T)). Suppose now that A is not a frame. In this case an, say, can be written as a linear combination of {a1, …, an−1}, i.e., $\sum_{j=1}^{n} k_{1j} a_j = 0$ with k1n ≠ 0. Then the vector k1 = (k11, …, k1n) satisfies M(T)(k1) = 0. Thus k1 belongs to Ker(M(T)). Eliminate an, say, and proceed in this way. After p iterations we will have obtained p kernel vectors {k1, …, kp}, and the remaining column vectors {a1, …, an−p} will form a frame, and thus a basis for the image of M(T).
Consequently dim(Im(M(T))) = n − p = n − dim(Ker(M(T))). The number of linearly independent columns in the matrix M(T) is called the rank of M(T), and is clearly the dimension of the image of M(T). In particular, if M1(T) and M2(T) are two matrix representations of the linear transformation T with respect to different bases, then rank M1(T) = rank M2(T) = rank(T).
Thus rank(T) is an invariant, in the sense of being independent of the particular bases chosen for ℜ^n and ℜ^m.
In the same way, the kernel rank of T is an invariant; that is, for any matrix representation M(T) of T we have ker rank(M(T)) = ker rank(T).
In general, if y ∈ Im(T), x0 satisfies T(x0) = y, and k belongs to the kernel, then
$$T(x_0 + k) = T(x_0) + T(k) = y + 0 = y.$$
Thus if x0 is a solution to the equation T(x0) = y, the point x0 + k is also a solution. More generally, x0 + Ker(T) = {x0 + k : k ∈ Ker(T)} will be the set of solutions. Thus for a particular y ∈ Im(T), T^{-1}(y) = {x : T(x) = y} = x0 + Ker(T).
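The description T^{-1}(y) = x0 + Ker(T) can be checked at sample points. In the sketch below, the map T : ℜ^4 → ℜ^2 is hypothetical, and its two kernel vectors were found by hand (the test confirms that they are mapped to 0). For each pair of sample values (t1, t2), the code confirms that x0 + t1k1 + t2k2 solves T(x) = y. This is a spot check, not a proof.

```python
from fractions import Fraction
from itertools import product

A = [[1, 2, 3, 4],
     [1, 0, 1, 0]]                       # hypothetical T: R^4 -> R^2
kernel = [[-1, -1, 1, 0], [0, -2, 0, 1]]  # basis of Ker(T), found by hand

def T(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

x0 = [1, 1, 0, 0]
y = T(x0)

for t1, t2 in product([Fraction(-3, 2), 0, 1, 7], repeat=2):
    x = [xi + t1 * k1 + t2 * k2 for xi, k1, k2 in zip(x0, *kernel)]
    assert T(x) == y   # every sampled point of x0 + Ker(T) solves T(x) = y

print(y)  # [3, 1]
```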
By the dimension theorem dim Ker(T ) = n − rank(T ). Thus T −1(y) is a geo-metric object of “dimension” dim Ker(T )= n− rank(T ).
We defined T to be injective iff T (x0)= T (x) implies x0 = x. Thus T is injectiveiff Ker(T )= {0}. In this case, if there is a solution to the equation T (x0)= y, thenthis solution is unique.
Suppose that n ≤ m, and that the n different column vectors of the matrix arelinearly independent. In this case rank(T )= n and so dim Ker(T )= 0. Thus T mustbe injective. In particular if n < m then not every y ∈ �m belongs to the image ofT , and so not every equation T (x) = y has a solution. Suppose on the other handthat n > m. In this case the maximum possible rank is m (since n vectors cannot belinearly independent in �m when n > m). If rank(T ) = m, then there must exist akernel of dimension (n−m).
Moreover Im(T ) = �m, and so for every y ∈ �m there exists a solution to thisequation T (x)= y. Thus T is surjective. However the solution is not unique, sinceT −1(y)= x +Ker(T ) is of dimension (n−m) as before.
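The rank criteria in the last two paragraphs (injective iff $\mathrm{rank}(T) = n$, surjective iff $\mathrm{rank}(T) = m$) can be packaged as a small test. A minimal sketch, assuming exact rational arithmetic with `fractions.Fraction` and forward elimination to count pivots; the function names `rank` and `describe` are illustrative, not from the text.

```python
from fractions import Fraction


def rank(rows):
    """Rank by forward (Gaussian) elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def describe(rows):
    """Classify T: R^n -> R^m (m rows, n columns) using the dimension theorem."""
    m, n = len(rows), len(rows[0])
    r = rank(rows)
    return {"rank": r, "kernel_dim": n - r,
            "injective": r == n,          # Ker(T) = {0}
            "surjective": r == m}         # Im(T) = R^m


print(describe([[1, -1, 2], [5, 0, 3], [-1, -4, 5], [3, 2, -1]]))  # Example 2.1
print(describe([[2, 1, 1, 1, 1], [1, 2, -1, 1, 1]]))               # Example 2.2
```

The matrix of Example 2.1 comes out neither injective nor surjective (rank 2, with $n = 3$ and $m = 4$), while that of Example 2.2 is surjective with a three-dimensional kernel.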
Suppose now that $n = m$, and that $T : \Re^n \to \Re^n$ has maximal rank $n$. Then $T$ is both injective and surjective and thus an isomorphism. Indeed $T$ will have an inverse function $T^{-1} : \Re^n \to \Re^n$. Moreover $T^{-1}$ is linear. To see this note that if $x_1 = T^{-1}(y_1)$ and $x_2 = T^{-1}(y_2)$ then $T(x_1) = y_1$ and $T(x_2) = y_2$, so $T(x_1 + x_2) = y_1 + y_2$. Thus $T^{-1}(y_1 + y_2) = x_1 + x_2 = T^{-1}(y_1) + T^{-1}(y_2)$. Moreover if $x = T^{-1}(\alpha y)$ then $T(x) = \alpha y$. If $\alpha \neq 0$, then $\frac{1}{\alpha}T(x) = T(\frac{1}{\alpha}x) = y$, or $\frac{1}{\alpha}x = T^{-1}(y)$. Hence $x = \alpha T^{-1}(y)$. (If $\alpha = 0$ then $x = T^{-1}(0) = 0 = 0 \cdot T^{-1}(y)$, since $T$ is injective.) Thus $T^{-1}(\alpha y) = \alpha T^{-1}(y)$. Since $T^{-1}$ is linear it can be represented by a matrix $M(T^{-1})$. As we know, $M : (L(\Re^n, \Re^n), \circ) \to (M(n,n), \circ)$ is a bijective morphism, so $M$ maps the identity linear transformation, $\mathrm{Id}$, to the identity matrix
$$M(\mathrm{Id}) = I = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}.$$
When $T$ is an isomorphism with inverse $T^{-1}$, the representation $M(T^{-1})$ of $T^{-1}$ is $[M(T)]^{-1}$. We now show how to compute the inverse matrix $[M(T)]^{-1}$ of an isomorphism.
2.2.3 The General Linear Group
To compute the inverse of an $n \times n$ matrix $A$, we define, by induction, the determinant of $A$. For a $1 \times 1$ matrix $(a_{11})$ define $\det(a_{11}) = a_{11}$, and for a $2 \times 2$ matrix
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$
define $\det A = a_{11}a_{22} - a_{21}a_{12}$.

For an $n \times n$ matrix $A$ define the $(i,j)$th cofactor to be the determinant of the $(n-1) \times (n-1)$ matrix $A(i,j)$ obtained from $A$ by removing the $i$th row and $j$th column, multiplied by $(-1)^{i+j}$. Write this cofactor as $A_{ij}$. For example, in a $3 \times 3$ matrix the cofactor in the $(1,1)$ position is
$$A_{11} = \det\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix} = a_{22}a_{33} - a_{32}a_{23}.$$
The $n \times n$ matrix $(A_{ij})$ is called the cofactor matrix. The determinant of the $n \times n$ matrix $A$ is then $\sum_{j=1}^{n} a_{1j}A_{1j}$. The determinant is also often written as $|A|$. This procedure allows us to define the determinant of an $n \times n$ matrix. For example, if $A = (a_{ij})$ is a $3 \times 3$ matrix, then
$$|A| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
$$= a_{11}(a_{22}a_{33} - a_{32}a_{23}) - a_{12}(a_{21}a_{33} - a_{31}a_{23}) + a_{13}(a_{21}a_{32} - a_{31}a_{22}).$$
An alternative way of defining the determinant is as follows. A permutation of $n$ is a bijection $s : \{1, \dots, n\} \to \{1, \dots, n\}$, with degree $d(s)$ the number of exchanges needed to give the permutation. (This number is not unique, but its parity is, so $(-1)^{d(s)}$ is well defined.) Then
$$|A| = \sum_s (-1)^{d(s)}\prod_{i=1}^{n} a_{is(i)} = a_{11}a_{22}a_{33}\cdots + \cdots,$$
where the summation is over all permutations. The two definitions are equivalent, and it can be shown that
$$|A| = \sum_{j=1}^{n} a_{ij}A_{ij} \quad (\text{for any } i = 1, \dots, n) \qquad = \sum_{i=1}^{n} a_{ij}A_{ij} \quad (\text{for any } j = 1, \dots, n),$$
while
$$0 = \sum_{i=1}^{n} a_{ij}A_{ik} \quad \text{if } j \neq k, \qquad 0 = \sum_{j=1}^{n} a_{ij}A_{kj} \quad \text{if } i \neq k.$$
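The two definitions of the determinant can be compared directly. The sketch below assumes plain Python integers; `det_cofactor` expands along the first row as in the inductive definition, while `det_permutation` sums over all permutations, computing the parity of $d(s)$ by counting inversions (a standard equivalent of counting exchanges, assumed here rather than taken from the text). Both are exponential-time and meant only for small matrices; the function names are illustrative.

```python
from itertools import permutations
from math import prod


def det_cofactor(A):
    """Inductive definition: |A| = sum_j a_1j A_1j (expansion along row 1)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # delete row 1, column j
        total += (-1) ** j * A[0][j] * det_cofactor(minor)  # (-1)^(1+j) with 1-based j
    return total


def parity(s):
    """Parity of a permutation, via its number of inversions."""
    inversions = sum(1 for i in range(len(s))
                     for k in range(i + 1, len(s)) if s[i] > s[k])
    return inversions % 2


def det_permutation(A):
    """|A| = sum over permutations s of (-1)^d(s) * prod_i a_{i s(i)}."""
    n = len(A)
    return sum((-1) ** parity(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


Q = [[1, -1, 0, 0], [5, 0, 0, 0], [-1, -4, 1, 0], [3, 2, 0, 1]]  # Example 2.3
print(det_cofactor(Q), det_permutation(Q))
```

Both functions return $|Q| = 5$ for the matrix of Example 2.3.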
Thus
$$(a_{ij})(A_{jk})^t = \Big(\sum_{j=1}^{n} a_{ij}A_{kj}\Big) = \begin{pmatrix} |A| & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & |A| \end{pmatrix} = |A|I.$$
Here $(A_{jk})^t$ is the $n \times n$ matrix obtained by transposing the rows and columns of $(A_{jk})$. Now the matrix $A^{-1}$ satisfies $A \circ A^{-1} = I$, and if $A^{-1}$ exists then it is unique. Thus, provided $|A| \neq 0$, $A^{-1} = \frac{1}{|A|}(A_{ij})^t$.

Suppose that the matrix $A$ is non-singular, so $|A| \neq 0$. Then we can construct an inverse matrix $A^{-1}$. Moreover $A(x) = y$ if and only if $x = A^{-1}(y)$, which implies that $A$ is both injective and surjective. Thus $\mathrm{rank}(A) = n$ and the column vectors of $A$ must be linearly independent.

As we have noted, however, if $A$ is not injective, with $\mathrm{Ker}(A) \neq \{0\}$, then $\mathrm{rank}(A) < n$, and the column vectors of $A$ must be linearly dependent. In this case the inverse $A^{-1}$ is not a function and cannot therefore be represented by a matrix, and so we would expect $|A|$ to be zero.
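The adjugate formula $A^{-1} = \frac{1}{|A|}(A_{ij})^t$ translates directly into code. A minimal sketch, assuming integer or rational entries and exact arithmetic via `fractions.Fraction`; the helper names are illustrative. As a usage example it reproduces the cofactor matrix and inverse of the matrix $Q$ of Example 2.3.

```python
from fractions import Fraction


def minor(A, i, j):
    """A(i, j): delete row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    # cofactor expansion along the first row; the empty matrix has determinant 1
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def cofactor_matrix(A):
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
            for i in range(n)]


def inverse(A):
    """A^{-1} = (1/|A|) (A_ij)^t, defined only when |A| != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("|A| = 0: A is singular and has no inverse")
    C = cofactor_matrix(A)
    n = len(A)
    return [[Fraction(C[j][i]) / d for j in range(n)] for i in range(n)]


Q = [[1, -1, 0, 0], [5, 0, 0, 0], [-1, -4, 1, 0], [3, 2, 0, 1]]  # Example 2.3
print(cofactor_matrix(Q))
print(inverse(Q))
```

Raising an error when $|A| = 0$ mirrors the remark above: a singular matrix has no inverse to represent.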
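Lemma 2.11 If $A$ is an $n \times n$ matrix with $\mathrm{rank}(A) < n$ then $|A| = 0$.

Proof Let $A'$ be the matrix obtained from $A$ by adding a multiple ($\alpha$) of the $k$th column of $A$ to the $j$th column of $A$, where $k \neq j$. The $j$th column of $A'$ is therefore $a_j + \alpha a_k$. This operation leaves the $j$th column of the cofactor matrix unchanged, since the cofactors $A_{ij}$ do not involve the $j$th column of $A$. Thus
$$|A'| = \sum_{i=1}^{n} a'_{ij}A_{ij} = \sum_{i=1}^{n}(a_{ij} + \alpha a_{ik})A_{ij} = \sum_{i=1}^{n} a_{ij}A_{ij} + \alpha\sum_{i=1}^{n} a_{ik}A_{ij} = |A| + 0 = |A|.$$
Suppose now that the columns of $A$ are linearly dependent, and that $a_j = \sum_{k \neq j}\alpha_k a_k$, for example. Let $A'$ be the matrix obtained from $A$ by substituting $a'_j = a_j - \sum_{k \neq j}\alpha_k a_k = 0$ for the $j$th column. Applying the step above once for each $k \neq j$ gives $|A'| = |A|$, while expanding along the zero $j$th column gives $|A'| = \sum_{i=1}^{n} a'_{ij}A_{ij} = 0$. Hence $|A| = 0$. □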
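Both steps of the proof of Lemma 2.11 can be checked on matrices from this chapter. A sketch under the same assumptions as before (integer entries, first-row cofactor expansion); `add_column_multiple` is a hypothetical helper implementing the column operation $a_j \mapsto a_j + \alpha a_k$. The matrix of Example 2.6 satisfies $a_1 + 2a_2 + a_3 = 0$, so two such operations turn its third column into zero.

```python
def det(A):
    """Cofactor expansion along the first row (empty matrix -> 1)."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))


def add_column_multiple(A, j, k, alpha):
    """Return A' with column j replaced by a_j + alpha * a_k (k != j)."""
    return [row[:j] + [row[j] + alpha * row[k]] + row[j + 1:] for row in A]


Q = [[1, -1, 0, 0], [5, 0, 0, 0], [-1, -4, 1, 0], [3, 2, 0, 1]]   # |Q| = 5
print(det(Q), det(add_column_multiple(Q, 2, 0, 7)))                # unchanged

A = [[3, -1, -1], [1, 3, -7], [5, -3, 1]]                          # Example 2.6
A_prime = add_column_multiple(add_column_multiple(A, 2, 0, 1), 2, 1, 2)
print([row[2] for row in A_prime], det(A), det(A_prime))           # zero column
```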
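Suppose now that $A, B$ are two non-singular $n \times n$ matrices $(a_{ij})$, $(b_{ki})$. The composition is then $B \circ A = \big(\sum_{i=1}^{n} b_{ki}a_{ij}\big)$, with determinant
$$|B \circ A| = \sum_s (-1)^{d(s)}\prod_{k=1}^{n}\Big(\sum_{i=1}^{n} b_{ki}a_{is(k)}\Big).$$
This expression can be shown to be equal to
$$\Big(\sum_s (-1)^{d(s)}\prod_{i=1}^{n} a_{is(i)}\Big)\Big(\sum_s (-1)^{d(s)}\prod_{k=1}^{n} b_{ks(k)}\Big) = |B||A| \neq 0.$$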
Hence the composition $(B \circ A)$ has an inverse $(B \circ A)^{-1}$ given by $A^{-1} \circ B^{-1}$. Now let $(GL(\Re^n, \Re^n), \circ)$ be the set of invertible linear transformations, with $\circ$ composition of functions, and let $M^*(n,n)$ be the set of non-singular $n \times n$ matrices. Choice of bases $\{v_1, \dots, v_n\}$, $\{u_1, \dots, u_n\}$ for the domain and codomain defines a morphism
$$M : (GL(\Re^n, \Re^n), \circ) \to (M^*(n,n), \circ).$$
Suppose now that $T$ belongs to $GL(\Re^n, \Re^n)$. As we have seen, this is equivalent to $|M(T)| \neq 0$, so the image of $M$ is precisely $M^*(n,n)$. Moreover if $|M(T)| \neq 0$ then $|M(T^{-1})| = \frac{1}{|M(T)|}$ and $M(T^{-1})$ belongs to $M^*(n,n)$. On the other hand, if $S, T \in GL(\Re^n, \Re^n)$ then $S \circ T$ also has rank $n$, and has inverse $T^{-1} \circ S^{-1}$ with rank $n$. The matrix $M(S \circ T)$ representing $S \circ T$ has inverse
$$M(T^{-1} \circ S^{-1}) = M(T^{-1}) \circ M(S^{-1}) = [M(T)]^{-1} \circ [M(S)]^{-1}.$$
Thus $M$ is an isomorphism between the two groups $(GL(\Re^n, \Re^n), \circ)$ and $(M^*(n,n), \circ)$.

The group of invertible linear transformations is also called the general linear group.
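The group properties used above, $|B \circ A| = |B||A|$ and $(B \circ A)^{-1} = A^{-1} \circ B^{-1}$, can be spot-checked. This is a sketch with illustrative inputs only: it reuses the basis change matrices $P$ of Examples 2.4 and 2.5 (determinants 5 and 3), exact `fractions.Fraction` arithmetic, the permutation formula for $|A|$ and the adjugate formula for the inverse; `matmul`, `det` and `inverse` are hypothetical names.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def det(A):
    """Permutation expansion; parity taken from the inversion count."""
    n = len(A)

    def sign(s):
        inversions = sum(s[i] > s[k] for i in range(n) for k in range(i + 1, n))
        return -1 if inversions % 2 else 1

    return sum(sign(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


def matmul(X, Y):
    """(X o Y)_{ij} = sum_k x_ik y_kj."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse(A):
    """Adjugate formula A^{-1} = (1/|A|)(A_ij)^t."""
    n, d = len(A), det(A)

    def minor(i, j):
        return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

    return [[Fraction((-1) ** (i + j) * det(minor(j, i)), d) for j in range(n)]
            for i in range(n)]


A = [[1, 0, -3], [0, 1, 7], [0, 0, 5]]     # P of Example 2.4, |A| = 5
B = [[1, 2, 1], [-1, 1, -1], [1, 1, 2]]    # P of Example 2.5, |B| = 3
BA = matmul(B, A)
I3 = [[int(i == j) for j in range(3)] for i in range(3)]
print(det(BA), det(B) * det(A))                            # multiplicativity
print(matmul(matmul(inverse(A), inverse(B)), BA) == I3)    # (BA)^{-1} = A^{-1}B^{-1}
```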
2.2.4 Change of Basis
Let $L(\Re^n, \Re^m)$ stand for the set of linear transformations from $\Re^n$ to $\Re^m$, and let $M(n,m)$ stand for the set of $n \times m$ matrices. We have seen that the choice of bases for $\Re^n, \Re^m$ defines a function
$$M : L(\Re^n, \Re^m) \to M(n,m)$$
which takes a linear transformation $T$ to its representation $M(T)$. We now examine the relationship between two representations $M_1(T)$, $M_2(T)$ of a single linear transformation.
Basis Change Theorem Let $\{v_1, \dots, v_n\}$ and $\{u_1, \dots, u_m\}$ be bases for $\Re^n, \Re^m$ respectively. Let $T$ be a linear transformation which is represented by a matrix $A = (a_{ij})$ with respect to these bases. If $V' = \{v'_1, \dots, v'_n\}$, $U' = \{u'_1, \dots, u'_m\}$ are new bases for $\Re^n, \Re^m$ then $T$ is represented by the matrix $B = Q^{-1} \circ A \circ P$, where $P, Q$ are respectively $(n \times n)$ and $(m \times m)$ invertible matrices.

Proof For each $v'_k \in V' = \{v'_1, \dots, v'_n\}$ let $v'_k = \sum_{i=1}^{n} b_{ik}v_i$ and $b_k = (b_{1k}, \dots, b_{nk})$. Let $P = (b_1, \dots, b_n)$, where the $k$th column of $P$ is the coordinate vector $b_k$. With respect to the new basis $V'$, $v'_k$ has coordinates $e_k = (0, \dots, 1, \dots, 0)$ with a 1 in the $k$th place. But then $P(e_k) = b_k$, the coordinates of $v'_k$ with respect to $V$. Thus $P$ is the matrix that transforms coordinates with respect to $V'$ into coordinates with respect to $V$. Since $V'$ is a basis, the columns of $P$ are linearly independent, and so $\mathrm{rank}\,P = n$ and $P$ is invertible.

In the same way let $u'_k = \sum_{i=1}^{m} c_{ik}u_i$, $c_k = (c_{1k}, \dots, c_{mk})$, and let $Q = (c_1, \dots, c_m)$ be the matrix with columns of these coordinates. Hence $Q$ represents change of basis from $U'$ to $U$. Since $Q$ is an invertible $m \times m$ matrix it has inverse $Q^{-1}$, which represents change of basis from $U$ to $U'$. Thus we have the diagram
$$\begin{array}{ccc} \{v_1, \dots, v_n\} & \xrightarrow{\ A\ } & \{u_1, \dots, u_m\} \\ P\uparrow & & Q^{-1}\downarrow\ \uparrow Q \\ \{v'_1, \dots, v'_n\} & \xrightarrow{\ B\ } & \{u'_1, \dots, u'_m\} \end{array}$$
from which we see that the matrix $B$, representing the linear transformation $T : \Re^n \to \Re^m$ with respect to the new bases, is given by $B = Q^{-1} \circ A \circ P$. □
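The commutative diagram says that converting $V'$-coordinates to $V$-coordinates with $P$ and then applying $A$ gives the same point of $\Re^m$ as applying $B$ and then converting $U'$-coordinates back to $U$-coordinates with $Q$. A minimal sketch with made-up illustrative matrices (not from the text), using the $2 \times 2$ adjugate formula for $Q^{-1}$ and exact `Fraction` arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(M):
    """Adjugate formula for a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular basis change matrix")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


# Illustrative data: T: R^3 -> R^2 with matrix A in the standard bases.
A = [[1, 2, 0], [0, 1, 1]]
P = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]   # columns: the new basis V' of R^3 (|P| = 2)
Q = [[1, 1], [0, 1]]                    # columns: the new basis U' of R^2 (|Q| = 1)

B = matmul(inverse_2x2(Q), matmul(A, P))   # B = Q^{-1} o A o P
print(B)

# Check the diagram on one V'-coordinate vector x': A(P x') == Q(B x').
x_new = [[1], [2], [3]]
print(matmul(A, matmul(P, x_new)) == matmul(Q, matmul(B, x_new)))
```

Here $B$ has entries $\begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}$; the final check holds for any $x'$, since $QB = AP$.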
Isomorphism Theorem Any linear transformation $T : \Re^n \to \Re^m$ of rank $r$ can be represented, by suitable choice of bases for $\Re^n$ and $\Re^m$, by an $n \times m$ matrix
$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{where } I_r = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}$$
is the $(r \times r)$ identity matrix.

In particular
1. if $n < m$ and $T$ is injective then there is an isomorphism $S : \Re^m \to \Re^m$ such that $S \circ T(x_1, \dots, x_n) = (x_1, \dots, x_n, 0, \dots, 0)$, with $(m - n)$ zero entries, for any vector $(x_1, \dots, x_n)$ in $\Re^n$;
2. if $n \ge m$ and $T$ is surjective then there are isomorphisms $R : \Re^n \to \Re^n$, $S : \Re^m \to \Re^m$ such that $S \circ T \circ R(x_1, \dots, x_n) = (x_1, \dots, x_m)$. If $n = m$, then $S \circ T \circ R$ is the identity isomorphism.

Proof Of necessity $\mathrm{rank}(T) = r \le \min(n,m)$. If $r < n$, let $p = n - r$ and choose a basis $k_1, \dots, k_p$ for $\mathrm{Ker}(T)$. Let $V = \{v_1, \dots, v_n\}$ be the original basis for $\Re^n$. By the exchange theorem there exist $r = (n - p)$ different members $\{v_1, \dots, v_r\}$, say, of $V$ such that $V' = \{v_1, \dots, v_r, k_1, \dots, k_p\}$ is a basis for $\Re^n$ (if $r = n$, simply take $V' = V$).

Choose $V'$ as the new basis for $\Re^n$, and let $P$ be the basis change matrix whose columns are the column vectors in $V'$. As in the proof of the dimension theorem, the images of the vectors $v_1, \dots, v_r$ under $T$ provide a basis for the image of $T$. Let $U = \{u_1, \dots, u_m\}$ be the original basis of $\Re^m$. By the exchange theorem there exists some subset $U' = \{u_1, \dots, u_{m-r}\}$ of $U$ such that $U'' = \{T(v_1), \dots, T(v_r), u_1, \dots, u_{m-r}\}$ forms a basis for $\Re^m$. Note that $T(v_1), \dots, T(v_r)$ are represented by the $r$ linearly independent columns of the original matrix $A$ representing $T$. Now let $Q$ be the matrix whose columns are the members of $U''$. By the basis change theorem, $B = Q^{-1} \circ A \circ P$, where $B$ is the matrix representing $T$ with respect to these new bases. Thus we obtain
$$\begin{array}{ccc} \{v_1, \dots, v_n\} & \xrightarrow{\ A\ } & \{u_1, \dots, u_m\} \\ P\uparrow & & Q^{-1}\downarrow \\ \{v_1, \dots, v_r, k_1, \dots, k_p\} & \xrightarrow{\ B\ } & \{T(v_1), \dots, T(v_r), u_1, \dots, u_{m-r}\} \end{array}$$
With respect to these new bases, the matrix $B$ representing $T$ has the required form
$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}.$$
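1. If $n < m$ and $T$ is injective then $r = n$. Hence $P$ is the identity matrix, and so $B = Q^{-1} \circ A$. But $Q^{-1}$ is an $m \times m$ invertible matrix, and thus represents an isomorphism $\Re^m \to \Re^m$, while
$$B\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} I_n \\ 0 \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Write a vector $x = \sum_{i=1}^{n} x_i v_i$ as $(x_1, \dots, x_n)$, and let $S$ be the linear transformation $\Re^m \to \Re^m$ represented by the matrix $Q^{-1}$. Then $S \circ T(x_1, \dots, x_n) = (x_1, \dots, x_n, 0, \dots, 0)$.

2. If $n \ge m$ and $T$ is surjective then $\mathrm{rank}(T) = m$ and $\dim\mathrm{Ker}(T) = n - m$. Thus $B = (I_m \;\; 0) = Q^{-1} \circ A \circ P$. Let $S, R$ be the linear transformations represented by $Q^{-1}$ and $P$ respectively.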
Then $S \circ T \circ R(x_1, \dots, x_n) = (x_1, \dots, x_m)$. If $n = m$ then $S \circ T \circ R$ is the identity transformation. □
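The proof is constructive, and its steps can be followed mechanically. The sketch below is one possible implementation under stated assumptions: exact `Fraction` arithmetic; row reduction to find the independent (pivot) columns and a kernel basis; a greedy extension of $\{T(v_1), \dots, T(v_r)\}$ by standard basis vectors as a concrete stand-in for the exchange theorem; and Gauss-Jordan elimination for $Q^{-1}$ (rather than the cofactor method). All function names are illustrative. For the matrix of Example 2.1 it extends the image basis with $u_1, u_2$ rather than the $e'_3, e'_4$ chosen in Example 2.4; both choices are valid.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form over the rationals; returns (R, pivots)."""
    R = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def kernel_basis(rows):
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in range(n):
        if f in pivots:
            continue
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            x[pc] = -R[i][f]
        basis.append(x)
    return basis


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse(M):
    """Gauss-Jordan: row-reduce (M | I) to (I | M^{-1})."""
    n = len(M)
    R, _ = rref([list(row) + [int(i == j) for j in range(n)]
                 for i, row in enumerate(M)])
    return [row[n:] for row in R]


def unit(size, i):
    return [int(k == i) for k in range(size)]


def normal_form_bases(A):
    """Return (P, Q) with Q^{-1} A P = [[I_r, 0], [0, 0]], following the proof."""
    m, n = len(A), len(A[0])
    _, pivots = rref(A)
    # V': the basis vectors v_i for the independent columns, then a kernel basis
    V_new = [unit(n, p) for p in pivots] + kernel_basis(A)
    # U'': the images T(v_i) (= pivot columns of A), extended by standard vectors
    U_new = [[row[p] for row in A] for p in pivots]
    for i in range(m):
        candidate = U_new + [unit(m, i)]
        if len(rref(candidate)[1]) == len(candidate):   # still independent
            U_new = candidate
    P = [list(col) for col in zip(*V_new)]   # vectors become columns
    Q = [list(col) for col in zip(*U_new)]
    return P, Q


A = [[1, -1, 2], [5, 0, 3], [-1, -4, 5], [3, 2, -1]]   # Example 2.1
P, Q = normal_form_bases(A)
print(matmul(inverse(Q), matmul(A, P)))
```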
Suppose now that $V, U$ are the two bases for $\Re^n, \Re^m$ as in the basis change theorem. A linear transformation $T : \Re^n \to \Re^m$ is represented by a matrix $M_1(T)$ with respect to these bases. If $V', U'$ are two new bases, then $T$ will be represented by the matrix $M_2(T)$, and by the basis change theorem
$$M_2(T) = Q^{-1} \circ M_1(T) \circ P,$$
where $Q, P$ are non-singular $(m \times m)$ and $(n \times n)$ matrices respectively. Since $M_1(T)$ and $M_2(T)$ represent the same linear transformation, they are in some sense equivalent. We show this more formally.

Say the two matrices $A, B \in M(n,m)$ are similar iff there exist non-singular square matrices $P \in M^*(n,n)$ and $Q \in M^*(m,m)$ such that $B = Q^{-1} \circ A \circ P$, and in this case write $B \sim A$.

Lemma 2.12 The similarity relation ($\sim$) on $M(n,m)$ is an equivalence relation.

Proof
1. To show that $\sim$ is reflexive note that $A = I_m^{-1} \circ A \circ I_n$, where $I_m, I_n$ are respectively the $(m \times m)$ and $(n \times n)$ identity matrices.
2. To show that $\sim$ is symmetric we need to show that $B \sim A$ implies that $A \sim B$. Suppose therefore that $B = Q^{-1} \circ A \circ P$. Since $Q \in M^*(m,m)$ it has inverse $Q^{-1} \in M^*(m,m)$. Moreover $(Q^{-1})^{-1} \circ Q^{-1} = I_m$, and thus $Q = (Q^{-1})^{-1}$. Thus
$$A = (Q \circ Q^{-1}) \circ A \circ (P \circ P^{-1}) = Q \circ B \circ P^{-1} = (Q^{-1})^{-1} \circ B \circ P^{-1},$$
with $Q^{-1} \in M^*(m,m)$ and $P^{-1} \in M^*(n,n)$. Thus $A \sim B$.
3. To show $\sim$ is transitive, we seek to show that $C \sim B \sim A$ implies $C \sim A$. Suppose therefore that $C = R^{-1} \circ B \circ S$ and $B = Q^{-1} \circ A \circ P$, where $R, Q \in M^*(m,m)$ and $S, P \in M^*(n,n)$. Then
$$C = (R^{-1} \circ Q^{-1}) \circ A \circ (P \circ S) = (Q \circ R)^{-1} \circ A \circ (P \circ S).$$
Now $(M^*(m,m), \circ)$ and $(M^*(n,n), \circ)$ are both groups, and so $Q \circ R \in M^*(m,m)$ and $P \circ S \in M^*(n,n)$. Thus $C \sim A$. □
The isomorphism theorem shows that if there is a linear transformation $T : \Re^n \to \Re^m$ of rank $r$, then the $(n \times m)$ matrix $M_1(T)$ which represents $T$, with respect to some pair of bases, is similar to the $n \times m$ matrix
$$B = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{i.e., } M_1(T) \sim B.$$
If $S$ is a second linear transformation of rank $r$ then $M_1(S) \sim B$. By Lemma 2.12, $M_1(S) \sim M_1(T)$. Suppose now that $V', U'$ are a second pair of bases for $\Re^n, \Re^m$, and let $M_2(S), M_2(T)$ represent $S$ and $T$. Clearly $M_2(S) \sim M_2(T)$. Thus if $S, T$ are linear transformations $\Re^n \to \Re^m$, we may say that $S, T$ are equivalent iff for any choice of bases the matrices $M(S), M(T)$ which represent $S, T$ are similar.

For any linear transformation $T \in L(\Re^n, \Re^m)$ let $[T]$ be the equivalence class $\{S \in L(\Re^n, \Re^m) : S \sim T\}$. Alternatively, a linear transformation $S$ belongs to $[T]$ iff $\mathrm{rank}(S) = \mathrm{rank}(T)$. Consequently the equivalence relation partitions $L(\Re^n, \Re^m)$ into a finite number of distinct equivalence classes, where each class is characterised by its rank, and the rank runs from $0$ to $\min(n,m)$.
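The partition of $L(\Re^n, \Re^m)$ by rank can be seen on a small finite sample. As an illustration only, the sketch below enumerates every $2 \times 3$ matrix with entries in $\{0, 1\}$ (that is, maps $\Re^3 \to \Re^2$), groups them by rank, and prints the canonical representative of each class; it assumes the rank is computed by exact elimination, and the names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def normal_form(r, m, n):
    """The m x n block matrix [[I_r, 0], [0, 0]]."""
    return [[int(i == j and i < r) for j in range(n)] for i in range(m)]


m, n = 2, 3
classes = Counter()
for entries in product((0, 1), repeat=m * n):
    rows = [list(entries[i * n:(i + 1) * n]) for i in range(m)]
    classes[rank(rows)] += 1

for r in sorted(classes):
    print(r, classes[r], normal_form(r, m, n))
print(len(classes) == min(m, n) + 1)
```

The 64 sample matrices fall into exactly $\min(2, 3) + 1 = 3$ rank classes (of sizes 1, 21 and 42).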
2.2.5 Examples
Example 2.1 To illustrate the use of these procedures in the solution of linear equations, consider the case with $n < m$ and the equation $A(x) = y$, where
$$A = \begin{pmatrix} 1 & -1 & 2 \\ 5 & 0 & 3 \\ -1 & -4 & 5 \\ 3 & 2 & -1 \end{pmatrix} \quad\text{and}\quad y_1 = \begin{pmatrix} -1 \\ 1 \\ -1 \\ 1 \end{pmatrix}, \quad y_2 = \begin{pmatrix} 0 \\ 5 \\ -5 \\ 5 \end{pmatrix}.$$
To find $\mathrm{Im}(A)$, we first of all find $\mathrm{Ker}(A)$. The equation $A(x) = 0$ gives four equations
$$\begin{aligned} x_1 - x_2 + 2x_3 &= 0 \\ 5x_1 + 0 + 3x_3 &= 0 \\ -x_1 - 4x_2 + 5x_3 &= 0 \\ 3x_1 + 2x_2 - x_3 &= 0 \end{aligned}$$
with solution $k = (x_1, x_2, x_3) = (-3, 7, 5)$. Thus $\mathrm{Ker}(A) \supset \{\lambda k \in \Re^3 : \lambda \in \Re\}$. Hence $\dim\mathrm{Im}(A) \le 2$. Clearly the first two columns $(a_1, a_2)$ of $A$ are linearly independent and so $\dim\mathrm{Im}(A) = 2$. However $y_2 = a_1 + a_2$. Thus a particular solution to the equation $A(x) = y_2$ is $x_0 = (1, 1, 0)$.
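The full set of solutions to the equation is
$$x_0 + \mathrm{Ker}(A) = \{(1, 1, 0) + \lambda(-3, 7, 5) : \lambda \in \Re\}.$$
To see whether $y_1 \in \mathrm{Im}(A)$ we need only attempt to solve the equation $y_1 = \alpha a_1 + \beta a_2$. This gives
$$\begin{aligned} -1 &= \alpha - \beta \\ 1 &= 5\alpha \\ -1 &= -\alpha - 4\beta \\ 1 &= 3\alpha + 2\beta. \end{aligned}$$
From the first two equations $\alpha = \frac{1}{5}$, $\beta = \frac{6}{5}$, which is incompatible with the fourth equation (since then $3\alpha + 2\beta = 3 \neq 1$). Thus $y_1$ cannot belong to $\mathrm{Im}(A)$.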
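The computations of Example 2.1 can be verified with a few lines of exact arithmetic. A sketch, assuming Python's `fractions.Fraction`; `apply` is an illustrative helper for the matrix-vector product.

```python
from fractions import Fraction

A = [[1, -1, 2], [5, 0, 3], [-1, -4, 5], [3, 2, -1]]
a1 = [row[0] for row in A]
a2 = [row[1] for row in A]
k, x0 = [-3, 7, 5], [1, 1, 0]
y1, y2 = [-1, 1, -1, 1], [0, 5, -5, 5]


def apply(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


print(apply(A, k))                       # k spans Ker(A)
# every point x0 + lambda*k of the solution set solves A(x) = y2
print(all(apply(A, [u + lam * v for u, v in zip(x0, k)]) == y2
          for lam in range(-5, 6)))

# alpha, beta from the first two equations, then the leftover in y1
alpha = Fraction(1, 5)
beta = alpha + 1                         # from -1 = alpha - beta
residual = [y - alpha * p - beta * q for y, p, q in zip(y1, a1, a2)]
print(residual)                          # non-zero, so y1 is not in Im(A)
```

The residual works out to $(0, 0, 4, -2)$: the first two equations are satisfied by construction, but the last two fail.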
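Example 2.2 Consider now an example of the case $n > m$, where
$$A = \begin{pmatrix} 2 & 1 & 1 & 1 & 1 \\ 1 & 2 & -1 & 1 & 1 \end{pmatrix} : \Re^5 \to \Re^2.$$
Obviously the first two columns are linearly independent and so $\dim\mathrm{Im}(A) \ge 2$. Let $\{a_i : i = 1, \dots, 5\}$ be the five column vectors of the matrix and consider the equation
$$\begin{pmatrix} 2 \\ 1 \end{pmatrix} - \begin{pmatrix} 1 \\ 2 \end{pmatrix} - \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Thus $k_1 = (1, -1, -1, 0, 0)$ belongs to $\mathrm{Ker}(A)$. On the other hand
$$\begin{pmatrix} 2 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \end{pmatrix} - 3\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Thus $k_2 = (1, 1, 0, -3, 0)$ and $k_3 = (1, 1, 0, 0, -3)$ both belong to $\mathrm{Ker}(A)$. Consequently the rank of $A$ has its maximal value of 2, while the kernel is three-dimensional. Hence for any $y \in \Re^2$ there is a set of solutions of the form $x_0 + \mathrm{Span}\{k_1, k_2, k_3\}$ to the equation $A(x) = y$.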
Change the bases of $\Re^5$ and $\Re^2$ to
$$\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \\ -3 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ -3 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
respectively. Then
$$B = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}^{-1}\begin{pmatrix} 2 & 1 & 1 & 1 & 1 \\ 1 & 2 & -1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & -1 & 1 & 1 \\ 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 & -3 \end{pmatrix}$$
$$= \frac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 2 & 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}.$$
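A quick exact check of Example 2.2: the three kernel vectors, and the product $Q^{-1}AP$ with $P$ and $Q$ built from the new bases. A sketch assuming `fractions.Fraction` and the $2 \times 2$ adjugate formula for $Q^{-1}$; helper names are illustrative.

```python
from fractions import Fraction

A = [[2, 1, 1, 1, 1], [1, 2, -1, 1, 1]]
kernel = [[1, -1, -1, 0, 0], [1, 1, 0, -3, 0], [1, 1, 0, 0, -3]]
new_basis_R5 = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]] + kernel
new_basis_R2 = [[2, 1], [1, 2]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def columns(vectors):
    """Matrix whose columns are the given vectors."""
    return [list(c) for c in zip(*vectors)]


P = columns(new_basis_R5)
Q = columns(new_basis_R2)
(a, b), (c, d) = Q
det_Q = a * d - b * c                                  # = 3
Q_inv = [[Fraction(d, det_Q), Fraction(-b, det_Q)],
         [Fraction(-c, det_Q), Fraction(a, det_Q)]]

print(matmul(A, columns(kernel)))                      # all zeros
print(matmul(Q_inv, matmul(A, P)))                     # equals (I_2 0)
```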
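Example 2.3 Consider the matrix
$$Q = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 5 & 0 & 0 & 0 \\ -1 & -4 & 1 & 0 \\ 3 & 2 & 0 & 1 \end{pmatrix}.$$
Since $|Q| = 5$ we can compute its inverse. The cofactor matrix $(Q_{ij})$ of $Q$ is
$$\begin{pmatrix} 0 & -5 & -20 & 10 \\ 1 & 1 & 5 & -5 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix}$$
and thus
$$Q^{-1} = \frac{1}{|Q|}(Q_{ij})^t = \begin{pmatrix} 0 & \frac{1}{5} & 0 & 0 \\ -1 & \frac{1}{5} & 0 & 0 \\ -4 & 1 & 1 & 0 \\ 2 & -1 & 0 & 1 \end{pmatrix}.$$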
Example 2.4 Let $T : \Re^3 \to \Re^4$ be the linear transformation represented by the matrix $A$ of Example 2.1, with respect to the standard bases for $\Re^3, \Re^4$. We seek to change the bases so as to represent $T$ by a diagonal matrix
$$B = \begin{pmatrix} I_2 & 0 \\ 0 & 0 \end{pmatrix}.$$
By Example 2.1, the kernel is spanned by $(-3, 7, 5)$, and so we choose a new basis
$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad k = \begin{pmatrix} -3 \\ 7 \\ 5 \end{pmatrix}$$
with basis change matrix $P = (e_1, e_2, k)$. Note that $|P| = 5$, so $P$ is non-singular. Thus $\{e_1, e_2, k\}$ form a basis for $\Re^3$. Now $\mathrm{Im}(A)$ is spanned by the first two columns $a_1, a_2$ of $A$. Moreover $A(e_1) = a_1$ and $A(e_2) = a_2$. Thus choose
$$a_1 = \begin{pmatrix} 1 \\ 5 \\ -1 \\ 3 \end{pmatrix}, \quad a_2 = \begin{pmatrix} -1 \\ 0 \\ -4 \\ 2 \end{pmatrix}, \quad e'_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad e'_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$
as the new basis for $\Re^4$. Let $Q = (a_1, a_2, e'_3, e'_4)$ be the basis change matrix. The inverse $Q^{-1}$ is computed in Example 2.3. Thus we have $B = Q^{-1} \circ A \circ P$. To check that this is indeed the case we compute:
$$Q^{-1} \circ A \circ P = \begin{pmatrix} 0 & \frac{1}{5} & 0 & 0 \\ -1 & \frac{1}{5} & 0 & 0 \\ -4 & 1 & 1 & 0 \\ 2 & -1 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 & 2 \\ 5 & 0 & 3 \\ -1 & -4 & 5 \\ 3 & 2 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 7 \\ 0 & 0 & 5 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
as required.
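The product in Example 2.4 can be confirmed exactly. A sketch, assuming `fractions.Fraction` and taking $Q^{-1}$ as computed in Example 2.3; note that $A$ is the matrix of Example 2.1, whose first row is $(1, -1, 2)$.

```python
from fractions import Fraction

F = Fraction
A = [[1, -1, 2], [5, 0, 3], [-1, -4, 5], [3, 2, -1]]        # Example 2.1
P = [[1, 0, -3], [0, 1, 7], [0, 0, 5]]                     # columns e1, e2, k
Q = [[1, -1, 0, 0], [5, 0, 0, 0], [-1, -4, 1, 0], [3, 2, 0, 1]]
Q_inv = [[0, F(1, 5), 0, 0], [-1, F(1, 5), 0, 0],
         [-4, 1, 1, 0], [2, -1, 0, 1]]                     # Example 2.3


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


I4 = [[int(i == j) for j in range(4)] for i in range(4)]
print(matmul(Q, Q_inv) == I4)        # Q_inv really is the inverse of Q
print(matmul(Q_inv, matmul(A, P)))   # entries equal the block matrix (I_2 0; 0 0)
```

The product works because $AP = (a_1, a_2, 0)$ and $Q^{-1}$ sends $a_1, a_2$ to the first two standard basis vectors of $\Re^4$.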
2.3 Canonical Representation
When considering a linear transformation $T : \Re^n \to \Re^n$ it is frequently convenient to change the basis of $\Re^n$ to a new basis $V = \{v_1, \dots, v_n\}$ such that $T$ is now represented by a matrix
$$M_2(T) = P^{-1} \circ M_1(T) \circ P.$$
In this case, since the same basis change $P$ is applied to both domain and codomain, it is generally not possible to obtain $M_2(T)$ in the form
$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$$
as before.

Under certain conditions, however, $M_2(T)$ can be written in a diagonal form
$$\begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix},$$
where $\lambda_1, \dots, \lambda_n$ are known as the eigenvalues. More explicitly, a vector $x$ is called an eigenvector of the matrix $A$ iff there is a solution to the equation $A(x) = \lambda x$, where $\lambda$ is a real number. In this case, $\lambda$ is called the eigenvalue associated with the eigenvector $x$. (Note that we assume $x \neq 0$.)
2.3.1 Eigenvectors and Eigenvalues
Suppose that there are $n$ linearly independent eigenvectors $\{x_1, \dots, x_n\}$ for $A$, where (for each $i = 1, \dots, n$) $\lambda_i$ is the eigenvalue associated with $x_i$. Clearly the eigenvector $x_i$ belongs to $\mathrm{Ker}(A)$ iff $\lambda_i = 0$. If $\mathrm{rank}(A) = r$ then there would be a subset $\{x_1, \dots, x_r\}$ of eigenvectors (those with non-zero eigenvalues) which forms a basis for $\mathrm{Im}(A)$, while $\{x_1, \dots, x_n\}$ forms a basis for $\Re^n$. Now let $Q$ be the $(n \times n)$ matrix representing a basis change from the new basis to the original basis. That is to say, the $i$th column, $v_i$, of $Q$ is the coordinate vector of $x_i$ with respect to the original basis.

After this change of basis, $A$ becomes
$$Q^{-1} \circ A \circ Q = \mathrm{diag}(\lambda_1, \dots, \lambda_r, 0, \dots, 0) = \Lambda,$$
where $\mathrm{rank}\,\Lambda = \mathrm{rank}\,A = r$. In general we can perform this diagonalisation only if there are enough eigenvectors, as the following lemma indicates.

Lemma 2.13 If $A$ is an $n \times n$ matrix, then there exists a non-singular matrix $Q$ and a diagonal matrix $\Lambda$ such that $\Lambda = Q^{-1}AQ$ iff the eigenvectors of $A$ form a basis for $\Re^n$.

Proof
1. Suppose the eigenvectors form a basis, and let $Q$ be the eigenvector matrix. By definition, if $v_i$ is the $i$th column of $Q$, then $A(v_i) = \lambda_i v_i$, where $\lambda_i$ is real. Thus $AQ = Q\Lambda$. But since $\{v_1, \dots, v_n\}$ is a basis, $Q^{-1}$ exists and so $\Lambda = Q^{-1}AQ$.
2. On the other hand, if $\Lambda = Q^{-1}AQ$, where $Q$ is non-singular, then $AQ = Q\Lambda$. But this is equivalent to $A(v_i) = \lambda_i v_i$ for $i = 1, \dots, n$, where $\lambda_i$ is the $i$th diagonal entry in $\Lambda$ and $v_i$ is the $i$th column of $Q$. Since $Q$ is non-singular, the columns $\{v_1, \dots, v_n\}$ are linearly independent, and thus the eigenvectors form a basis for $\Re^n$. □

If there are $n$ distinct (real) eigenvalues then the corresponding eigenvectors give a basis, and thus a diagonalisation.

Lemma 2.14 If $\{v_1, \dots, v_m\}$ are eigenvectors corresponding to distinct eigenvalues $\{\lambda_1, \dots, \lambda_m\}$ of a linear transformation $T : \Re^n \to \Re^n$, then $\{v_1, \dots, v_m\}$ are linearly independent.

Proof Since $v_1$ is assumed to be an eigenvector, it is non-zero, and thus $\{v_1\}$ is a linearly independent set. Proceed by induction.
Suppose $V_k = \{v_1, \dots, v_k\}$, with $k < m$, is linearly independent. Let $v_{k+1}$ be another eigenvector and suppose
$$v = \sum_{r=1}^{k+1} a_r v_r = 0.$$
Then $0 = T(v) = \sum_{r=1}^{k+1} a_r T(v_r) = \sum_{r=1}^{k+1} a_r\lambda_r v_r$.

If $\lambda_{k+1} = 0$, then $\lambda_i \neq 0$ for $i = 1, \dots, k$ and, by the linear independence of $V_k$, $a_r\lambda_r = 0$, and thus $a_r = 0$ for $r = 1, \dots, k$.

Suppose $\lambda_{k+1} \neq 0$. Then
$$\sum_{r=1}^{k+1}\lambda_{k+1}a_r v_r = \lambda_{k+1}v = 0 = \sum_{r=1}^{k+1} a_r\lambda_r v_r.$$
Subtracting, $\sum_{r=1}^{k}(\lambda_{k+1} - \lambda_r)a_r v_r = 0$. By the linear independence of $V_k$, $(\lambda_{k+1} - \lambda_r)a_r = 0$ for $r = 1, \dots, k$. But the eigenvalues are distinct and so $a_r = 0$ for $r = 1, \dots, k$.

In either case $a_{k+1}v_{k+1} = 0$, and since $v_{k+1} \neq 0$, $a_r = 0$ for $r = 1, \dots, k+1$. Hence
$$V_{k+1} = \{v_1, \dots, v_{k+1}\}, \quad k < m,$$
is linearly independent. By induction $V_m$ is a linearly independent set. □
Having shown how the determination of the eigenvectors gives a diagonalisation, we proceed to compute eigenvalues.

Consider again the equation $A(x) = \lambda x$. This is equivalent to the equation $A'(x) = 0$, where
$$A' = \begin{pmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{pmatrix}.$$
For this equation to have a non-zero solution it is necessary and sufficient that $|A'| = 0$. Thus we obtain a polynomial equation (called the characteristic equation) of degree $n$ in $\lambda$, with $n$ roots $\lambda_1, \dots, \lambda_n$ (counted with multiplicity), not necessarily all real. In the $2 \times 2$ case, for example, this equation is $\lambda^2 - \lambda(a_{11} + a_{22}) + (a_{11}a_{22} - a_{21}a_{12}) = 0$. If the roots of this equation are $\lambda_1, \lambda_2$ then we obtain
$$(\lambda - \lambda_1)(\lambda - \lambda_2) = \lambda^2 - \lambda(\lambda_1 + \lambda_2) + \lambda_1\lambda_2.$$
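Hence
$$\lambda_1\lambda_2 = a_{11}a_{22} - a_{21}a_{12} = |A|, \qquad \lambda_1 + \lambda_2 = a_{11} + a_{22}.$$
The sum of the diagonal elements of a matrix $A$ is called the trace of $A$. In the $2 \times 2$ case therefore
$$\lambda_1\lambda_2 = |A|, \qquad \lambda_1 + \lambda_2 = a_{11} + a_{22} = \mathrm{trace}(A).$$
In the $3 \times 3$ case we find
$$\begin{aligned}(\lambda - \lambda_1)(\lambda - \lambda_2)(\lambda - \lambda_3) &= \lambda^3 - \lambda^2(\lambda_1 + \lambda_2 + \lambda_3) + \lambda(\lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3) - \lambda_1\lambda_2\lambda_3 \\ &= \lambda^3 - \lambda^2\,\mathrm{trace}(A) + \lambda(A_{11} + A_{22} + A_{33}) - |A| = 0,\end{aligned}$$
where $A_{ii}$ is the $i$th diagonal cofactor of $A$. Suppose all the roots are non-zero (this is equivalent to the non-singularity of the matrix $A$, since $|A| = \lambda_1\lambda_2\lambda_3$). Let
$$\Lambda = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}$$
be the diagonal eigenvalue matrix, with $|\Lambda| = \lambda_1\lambda_2\lambda_3$. The cofactor matrix of $\Lambda$ is then
$$\begin{pmatrix} \lambda_2\lambda_3 & 0 & 0 \\ 0 & \lambda_1\lambda_3 & 0 \\ 0 & 0 & \lambda_1\lambda_2 \end{pmatrix}.$$
Thus we see that the sums of the diagonal cofactors of $A$ and $\Lambda$ are identical. Moreover $\mathrm{trace}(A) = \mathrm{trace}(\Lambda)$ and $|\Lambda| = |A|$.

Now define a relation $\sim$ on the $n \times n$ matrices by $B \sim A$ iff there exist basis change matrices $P, Q$ and a diagonal matrix $\Lambda$ such that
$$\Lambda = P^{-1}AP = Q^{-1}BQ.$$
On the set of matrices which can be diagonalised, $\sim$ is an equivalence relation, and each class is characterised by $n$ invariants, namely the trace, the determinant, and $(n - 2)$ other numbers involving the cofactors.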
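For $3 \times 3$ matrices the three invariants above (trace, sum of diagonal cofactors, determinant) give the characteristic polynomial $\lambda^3 - \mathrm{trace}(A)\lambda^2 + (A_{11} + A_{22} + A_{33})\lambda - |A|$. A minimal sketch, assuming integer entries; it compares these coefficients with the elementary symmetric functions of the eigenvalues found in Examples 2.5 and 2.6. Function names are illustrative.

```python
def det2(a, b, c, d):
    return a * d - b * c


def char_coeffs(A):
    """(trace, sum of diagonal cofactors, determinant) of a 3 x 3 matrix."""
    trace = A[0][0] + A[1][1] + A[2][2]
    cof = (det2(A[1][1], A[1][2], A[2][1], A[2][2])     # A_11
           + det2(A[0][0], A[0][2], A[2][0], A[2][2])   # A_22
           + det2(A[0][0], A[0][1], A[1][0], A[1][1]))  # A_33
    det = (A[0][0] * det2(A[1][1], A[1][2], A[2][1], A[2][2])
           - A[0][1] * det2(A[1][0], A[1][2], A[2][0], A[2][2])
           + A[0][2] * det2(A[1][0], A[1][1], A[2][0], A[2][1]))
    return trace, cof, det


def symmetric_functions(l1, l2, l3):
    """(l1 + l2 + l3, l1 l2 + l1 l3 + l2 l3, l1 l2 l3)."""
    return l1 + l2 + l3, l1 * l2 + l1 * l3 + l2 * l3, l1 * l2 * l3


def char_poly(A, lam):
    t, c, d = char_coeffs(A)
    return lam ** 3 - t * lam ** 2 + c * lam - d


A5 = [[2, 1, -1], [0, 1, 1], [2, 0, -2]]     # Example 2.5, eigenvalues 0, 2, -1
A6 = [[3, -1, -1], [1, 3, -7], [5, -3, 1]]   # Example 2.6, eigenvalues 0, 0, 7
print(char_coeffs(A5), symmetric_functions(0, 2, -1))
print(char_coeffs(A6), symmetric_functions(0, 0, 7))
print([char_poly(A5, lam) for lam in (0, 2, -1)])
```

For Example 2.5 both triples are $(1, -2, 0)$, and for Example 2.6 both are $(7, 0, 0)$, so the characteristic polynomials are $\lambda^3 - \lambda^2 - 2\lambda$ and $\lambda^2(\lambda - 7)$.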
2.3.2 Examples
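Example 2.5 Let
$$A = \begin{pmatrix} 2 & 1 & -1 \\ 0 & 1 & 1 \\ 2 & 0 & -2 \end{pmatrix}.$$
The characteristic equation is
$$(2 - \lambda)\big[(1 - \lambda)(-2 - \lambda)\big] - 1(-2) - \big(-2(1 - \lambda)\big) = -\lambda(\lambda^2 - \lambda - 2) = -\lambda(\lambda - 2)(\lambda + 1) = 0.$$
Hence $(\lambda_1, \lambda_2, \lambda_3) = (0, 2, -1)$. Note that $\lambda_1 + \lambda_2 + \lambda_3 = \mathrm{trace}(A) = 1$ and
$$\lambda_2\lambda_3 = -2 = A_{11} + A_{22} + A_{33}.$$
Eigenvectors corresponding to these eigenvalues are
$$x_1 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}.$$
Let $P$ be the basis change matrix given by these three column vectors. The inverse can be readily computed, to give
$$P^{-1}AP = \begin{pmatrix} 1 & -1 & -1 \\ \frac{1}{3} & \frac{1}{3} & 0 \\ -\frac{2}{3} & \frac{1}{3} & 1 \end{pmatrix}\begin{pmatrix} 2 & 1 & -1 \\ 0 & 1 & 1 \\ 2 & 0 & -2 \end{pmatrix}\begin{pmatrix} 1 & 2 & 1 \\ -1 & 1 & -1 \\ 1 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$
Suppose we now compute $A^2 = A \circ A : \Re^3 \to \Re^3$. This can easily be seen to be
$$\begin{pmatrix} 2 & 3 & 1 \\ 2 & 1 & -1 \\ 0 & 2 & 2 \end{pmatrix}.$$
The characteristic function of $A^2$ is $(\lambda^3 - 5\lambda^2 + 4\lambda)$, with roots $\mu_1 = 0$, $\mu_2 = 4$, $\mu_3 = 1$.

In fact the eigenvectors of $A^2$ are $x_1, x_2, x_3$, the same as those of $A$, but with eigenvalues $\lambda_1^2, \lambda_2^2, \lambda_3^2$. In this case $\mathrm{Im}(A) = \mathrm{Im}(A^2)$ is spanned by $\{x_2, x_3\}$, and $\mathrm{Ker}(A) = \mathrm{Ker}(A^2)$ has basis $\{x_1\}$. More generally consider a linear transformation $A : \Re^n \to \Re^n$. If $x$ is an eigenvector with a non-zero eigenvalue $\lambda$, then $A^2(x) = A \circ A(x) = A[\lambda x] = \lambda A(x) = \lambda^2 x$, and so $x = A(\frac{1}{\lambda}x) = A^2(\frac{1}{\lambda^2}x) \in \mathrm{Im}(A) \cap \mathrm{Im}(A^2)$.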
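Example 2.5 can be replayed with exact arithmetic: $P^{-1}AP$ is diagonal, and the same eigenvectors diagonalise $A^2$ with squared eigenvalues. A sketch assuming `fractions.Fraction`, with $P^{-1}$ taken from the example; `matmul` is an illustrative helper.

```python
from fractions import Fraction

F = Fraction
A = [[2, 1, -1], [0, 1, 1], [2, 0, -2]]
P = [[1, 2, 1], [-1, 1, -1], [1, 1, 2]]          # columns x1, x2, x3
P_inv = [[1, -1, -1], [F(1, 3), F(1, 3), 0], [F(-2, 3), F(1, 3), 1]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


I3 = [[int(i == j) for j in range(3)] for i in range(3)]
A2 = matmul(A, A)
print(matmul(P_inv, P) == I3)
print(matmul(P_inv, matmul(A, P)))     # equals diag(0, 2, -1)
print(A2)                              # [[2, 3, 1], [2, 1, -1], [0, 2, 2]]
print(matmul(P_inv, matmul(A2, P)))    # equals diag(0, 4, 1)
```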
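If there exist $n$ distinct real roots to the characteristic equation of $A$, then a basis consisting of eigenvectors can be found. Then $A$ can be diagonalised, and $\mathrm{Im}(A) = \mathrm{Im}(A^2)$, $\mathrm{Ker}(A) = \mathrm{Ker}(A^2)$.

Example 2.6 Let
$$A = \begin{pmatrix} 3 & -1 & -1 \\ 1 & 3 & -7 \\ 5 & -3 & 1 \end{pmatrix}.$$
Then $\mathrm{Ker}(A)$ has basis $\{(1, 2, 1)\}$, and $\mathrm{Im}(A)$ has basis $\{(3, 1, 5), (-1, 3, -3)\}$. The eigenvalues of $A$ are $0, 0, 7$. Since the repeated eigenvalue $0$ has only the one-dimensional eigenspace $\mathrm{Ker}(A)$, we cannot find three linearly independent eigenvectors, and so $A$ cannot be diagonalised. Now
$$A^2 = \begin{pmatrix} 3 & -3 & 3 \\ -29 & 29 & -29 \\ 17 & -17 & 17 \end{pmatrix}$$
and thus $\mathrm{Im}(A^2)$ has basis $\{(3, -29, 17)\}$. Note that
$$\begin{pmatrix} 3 \\ -29 \\ 17 \end{pmatrix} = -2\begin{pmatrix} 3 \\ 1 \\ 5 \end{pmatrix} - 9\begin{pmatrix} -1 \\ 3 \\ -3 \end{pmatrix} \in \mathrm{Im}(A),$$
and so $\mathrm{Im}(A^2)$ is a subspace of $\mathrm{Im}(A)$. Moreover $\mathrm{Ker}(A^2)$ has basis $\{(1, 2, 1), (1, 1, 0)\}$, and so $\mathrm{Ker}(A)$ is a subspace of $\mathrm{Ker}(A^2)$.

This can be seen more generally. Suppose $f : \Re^n \to \Re^n$ is linear, and $x \in \mathrm{Ker}(f)$. Then $f^2(x) = f(f(x)) = f(0) = 0$, and so $x \in \mathrm{Ker}(f^2)$. Thus $\mathrm{Ker}(f) \subset \mathrm{Ker}(f^2)$. On the other hand, if $v \in \mathrm{Im}(f^2)$ then there exists $w \in \Re^n$ such that $f^2(w) = v$. But $f(w) \in \Re^n$ and so $f(f(w)) = v \in \mathrm{Im}(f)$. Thus $\mathrm{Im}(f^2) \subset \mathrm{Im}(f)$.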
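The nesting $\mathrm{Ker}(A) \subset \mathrm{Ker}(A^2)$ and $\mathrm{Im}(A^2) \subset \mathrm{Im}(A)$ in Example 2.6 can be checked directly. A sketch with integer arithmetic; `rank` uses exact elimination with `fractions.Fraction`, the kernel vectors tested are $(1, 2, 1)$ and $(1, 1, 0)$, and the helper names are illustrative.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def apply(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def rank(rows):
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


A = [[3, -1, -1], [1, 3, -7], [5, -3, 1]]
A2 = matmul(A, A)
print(A2)                                   # [[3, -3, 3], [-29, 29, -29], [17, -17, 17]]
print(rank(A), rank(A2))                    # dim Im(A) = 2, dim Im(A^2) = 1
print(apply(A, [1, 2, 1]), apply(A2, [1, 2, 1]), apply(A2, [1, 1, 0]))
# (3, -29, 17) = -2 a1 - 9 a2 lies in Im(A)
print([-2 * p - 9 * q for p, q in zip([3, 1, 5], [-1, 3, -3])])
```

By the dimension theorem the kernels then have dimensions $3 - 2 = 1$ and $3 - 1 = 2$.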
2.3.3 Symmetric Matrices and Quadratic Forms
Given two vectors $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ in $\Re^n$, let $\langle x, y\rangle = \sum_{i=1}^{n} x_i y_i \in \Re$ be the scalar product of $x$ and $y$. Note that $\langle\lambda x, y\rangle = \lambda\langle x, y\rangle = \langle x, \lambda y\rangle$ for any real $\lambda$. (We use $\langle -, - \rangle$ to distinguish the scalar product from a vector in $\Re^2$. However the notations $(x, y)$ or $x \cdot y$ are often used for the scalar product.)

An $n \times n$ matrix $A = (a_{ij})$ may be regarded as a map $A^* : \Re^n \times \Re^n \to \Re$, where $A^*(x, y) = \langle x, A(y)\rangle$. $A^*$ is linear in both $x$ and $y$ and is called bilinear. By definition $\langle x, A(y)\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i a_{ij} y_j$.

Call an $n \times n$ matrix $A$ symmetric iff $A = A^t$, where $A^t = (a_{ji})$ is obtained from $A$ by exchanging rows and columns.
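In this case
$$\langle A(x), y\rangle = \sum_{j=1}^{n}\Big(\sum_{i=1}^{n} a_{ji}x_i\Big)y_j = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i a_{ij} y_j,$$
since $a_{ij} = a_{ji}$ for all $i, j$. Hence $\langle A(x), y\rangle = \langle x, A(y)\rangle$ for any $x, y \in \Re^n$ whenever $A$ is symmetric.

Lemma 2.15 If $A$ is a symmetric $n \times n$ matrix, and $x, y$ are eigenvectors of $A$ corresponding to distinct eigenvalues, then $\langle x, y\rangle = 0$, i.e., $x$ and $y$ are orthogonal.

Proof Let $\lambda_1 \neq \lambda_2$ be the eigenvalues corresponding to the eigenvectors $x, y$ respectively. Now
$$\lambda_1\langle x, y\rangle = \langle\lambda_1 x, y\rangle = \langle A(x), y\rangle = \langle x, A(y)\rangle = \langle x, \lambda_2 y\rangle = \lambda_2\langle x, y\rangle.$$
Here $\langle A(x), y\rangle = \langle x, A(y)\rangle$ since $A$ is symmetric. Moreover $\langle x, \lambda y\rangle = \sum_{i=1}^{n} x_i(\lambda y_i) = \lambda\langle x, y\rangle$. Thus $(\lambda_1 - \lambda_2)\langle x, y\rangle = 0$. Since $\lambda_1 \neq \lambda_2$, $\langle x, y\rangle = 0$. □

Lemma 2.16 If there exist $n$ distinct eigenvalues of a symmetric $n \times n$ matrix $A$, then the eigenvectors $X = \{x_1, \dots, x_n\}$ form an orthogonal basis for $\Re^n$.

Proof Directly by Lemmas 2.14 and 2.15.

We may also give a brief direct proof of Lemma 2.16 by supposing that $\sum_{i=1}^{n}\alpha_i x_i = 0$. Then for each $j = 1, \dots, n$,
$$0 = \langle x_j, 0\rangle = \sum_{i=1}^{n}\alpha_i\langle x_j, x_i\rangle = \alpha_j\langle x_j, x_j\rangle.$$
But since $x_j \neq 0$, $\langle x_j, x_j\rangle > 0$ and so $\alpha_j = 0$ for each $j$. Thus $X$ is a frame of $n$ vectors in $\Re^n$. Since the vectors in $X$ are mutually orthogonal, $X$ is an orthogonal basis for $\Re^n$.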
For a symmetric matrix the roots of the characteristic equation will all be real.To see this in the 2× 2 case, consider the characteristic equation
(λ− λ1)(λ− λ2)= λ2 − λ(a11 + a22)= (a11a22 − a21a12).
The roots of this equation are −b+√
b2−4c2 with real roots iff b2 − 4c ≥ 0.
But this is equivalent to
(a11 + a22)2 − 4(a11a22 − a21a12)= (a11 − a22)
2 + 4(a12)2 ≥ 0,
since a12 = a21.Both terms in this expression are non-negative, and so λ1, λ2 are real.In the case of a symmetric matrix, A, let Eλ be the set of eigenvectors associ-
ated with a particular eigenvalue, λ, of A together with the zero vector. Supposex1, x2 belong to Eλ. Clearly A(x1+ x2)=A(x1)+A(x2)= λ(x1+ x2) and so x1+
$x_2 \in E_\lambda$. If $x \in E_\lambda$, then $A(\alpha x) = \alpha A(x) = \alpha(\lambda x) = \lambda(\alpha x)$, and so $\alpha x \in E_\lambda$ for each non-zero real number $\alpha$.
Since $E_\lambda$ also contains $0$ by definition, $E_\lambda$ is a vector subspace of $\mathbb{R}^n$. If $\lambda = \lambda_1 = \cdots = \lambda_r$ is a repeated root of the characteristic equation (of multiplicity $r$), then, for symmetric $A$, the eigenspace $E_\lambda$ is in fact of dimension $r$, and we can find $r$ mutually orthogonal vectors in $E_\lambda$ forming a basis for $E_\lambda$.
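The $2 \times 2$ argument can be turned into a small computation. The sketch below is plain Python, and the function name `sym2_eigen` is our own. It computes the eigenvalues of a symmetric matrix from the discriminant $(a_{11} - a_{22})^2 + 4a_{12}^2$, which is never negative, and returns an eigenvector for each root. When the roots are distinct these eigenvectors come out orthogonal, as Lemma 2.15 predicts.

```python
import math


def sym2_eigen(a11, a12, a22):
    """Eigenvalues and eigenvectors of the symmetric matrix [[a11, a12], [a12, a22]]."""
    if a12 == 0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        return (a11, a22), [(1.0, 0.0), (0.0, 1.0)]
    # (a11 + a22)^2 - 4(a11 a22 - a12^2) = (a11 - a22)^2 + 4 a12^2 >= 0,
    # so the square root below is always of a non-negative number.
    disc = (a11 - a22) ** 2 + 4 * a12 ** 2
    root = math.sqrt(disc)
    trace = a11 + a22
    lams = ((trace + root) / 2, (trace - root) / 2)
    # The first row of (A - lam I) v = 0 reads (a11 - lam) v1 + a12 v2 = 0,
    # which is solved by v = (a12, lam - a11).
    vecs = [(a12, lam - a11) for lam in lams]
    return lams, vecs


lams, (v1, v2) = sym2_eigen(1, 2, 1)
print(lams)                            # (3.0, -1.0)
print(v1[0] * v2[0] + v1[1] * v2[1])   # 0.0, i.e. v1 and v2 are orthogonal
```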
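Suppose now that $A$ is a symmetric $n \times n$ matrix. As we shall show, we may write $\Lambda = P^{-1}AP$. Here $P$ is the $n \times n$ basis change matrix whose columns are $n$ linearly independent eigenvectors $x_1, \ldots, x_n$ of $A$, chosen (by Lemma 2.15 and the remark above) to be mutually orthogonal. Now normalise each eigenvector $x_j$ by defining
$$z_j = \frac{1}{\|x_j\|}(x_{1j}, \ldots, x_{nj}), \quad\text{where}\quad \|x_j\| = \sqrt{\textstyle\sum_{k}(x_{kj})^2} = \sqrt{\langle x_j, x_j\rangle}$$
is called the norm of $x_j$. Let $Q = (z_1, \ldots, z_n)$ be the $n \times n$ matrix whose columns are $z_1, \ldots, z_n$.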
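Now
$$Q^tQ = \begin{pmatrix} z_{11} & z_{21} & \cdots & z_{n1}\\ \vdots & & & \vdots\\ z_{1n} & z_{2n} & \cdots & z_{nn}\end{pmatrix}\begin{pmatrix} z_{11} & \cdots & z_{1j} & \cdots & z_{1n}\\ z_{21} & \cdots & z_{2j} & \cdots & z_{2n}\\ \vdots & & \vdots & & \vdots\\ z_{n1} & \cdots & z_{nj} & \cdots & z_{nn}\end{pmatrix} = \begin{pmatrix}\langle z_1, z_1\rangle & \cdots & \langle z_1, z_n\rangle\\ \vdots & \ddots & \vdots\\ \langle z_n, z_1\rangle & \cdots & \langle z_n, z_n\rangle\end{pmatrix},$$
since the $(i,k)$th entry in $Q^tQ$ is $\sum_{r=1}^n z_{ri}z_{rk} = \langle z_i, z_k\rangle$. But
$$\langle z_i, z_k\rangle = \Big\langle \frac{x_i}{\|x_i\|}, \frac{x_k}{\|x_k\|}\Big\rangle = \frac{1}{\|x_i\|\,\|x_k\|}\langle x_i, x_k\rangle = 0 \quad\text{if } i \neq k.$$
On the other hand $\langle z_i, z_i\rangle = \frac{1}{\|x_i\|^2}\langle x_i, x_i\rangle = 1$, and so $Q^tQ = I_n$, the $n \times n$ identity matrix. Thus $Q^t = Q^{-1}$.
Since $\{z_1, \ldots, z_n\}$ are eigenvectors of $A$ with real eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$, we obtain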
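$$\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0) = Q^tAQ.$$
Here the eigenvectors are ordered so that $\lambda_1, \ldots, \lambda_r$ are the non-zero eigenvalues, with $r = \operatorname{rank}(A)$, and the last $(n - r)$ columns of $Q$ correspond to the kernel vectors of $A$.
When $A$ is a symmetric $n \times n$ matrix, the function $A^*: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by $A^*(x, y) = \langle x, A(y)\rangle$ is called a quadratic form. In matrix notation it is given by
$$(x_1, \ldots, x_n)\begin{pmatrix} & \vdots & \\ \cdots & a_{ij} & \cdots\\ & \vdots & \end{pmatrix}\begin{pmatrix} y_1\\ \vdots\\ y_n\end{pmatrix}.$$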
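Normalising the eigenvectors and assembling them as the columns of $Q$ can be checked directly. In this sketch the matrix $A = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}$ is an illustrative choice, not taken from the text. It has orthogonal eigenvectors $(1, 1)$ and $(1, -1)$ for the eigenvalues $3$ and $1$. The helper functions are our own minimal list-based matrix routines.

```python
import math


def normalise(v):
    """Return z = v / ||v||, where ||v|| = sqrt(<v, v>)."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(M, N):
    """Matrix product MN for matrices stored as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]


A = [[2.0, 1.0],
     [1.0, 2.0]]
z1 = normalise([1.0, 1.0])    # eigenvector for eigenvalue 3
z2 = normalise([1.0, -1.0])   # eigenvector for eigenvalue 1
Q = transpose([z1, z2])       # the columns of Q are z1 and z2
QtQ = matmul(transpose(Q), Q)              # approximately the identity I_2
Lam = matmul(matmul(transpose(Q), A), Q)   # approximately diag(3, 1)
```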
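Consider
$$A^*(x, x) = \langle x, A(x)\rangle = \langle x, Q\Lambda Q^t(x)\rangle = \langle Q^t(x), \Lambda Q^t(x)\rangle,$$
using $A = Q\Lambda Q^t$. Now $Q^t(x) = (x'_1, \ldots, x'_n)$ is the coordinate representation of the vector $x$ with respect to the new basis $\{z_1, \ldots, z_n\}$ for $\mathbb{R}^n$. Thus
$$A^*(x, x) = (x'_1, \ldots, x'_n)\,\operatorname{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0)\begin{pmatrix}x'_1\\ \vdots\\ x'_n\end{pmatrix} = \sum_{i=1}^{r}\lambda_i(x'_i)^2.$$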
Suppose that $\operatorname{rank} A = r$ and all eigenvalues of $A$ are non-negative. In this case, $A^*(x, x) = \sum_{i=1}^{r}\lambda_i(x'_i)^2 \geq 0$. Moreover if $x$ is a non-zero vector then $Q^t(x) \neq 0$, since $Q^t$ has rank $n$.
Define the nullity of $A^*$ to be $\{x : A^*(x, x) = 0\}$. Clearly if $x$ is a non-zero vector in $\operatorname{Ker}(A)$, then it is an eigenvector with eigenvalue $0$, and $A^*(x, x) = 0$. Thus the nullity of $A^*$ contains $\operatorname{Ker}(A)$, a vector subspace of $\mathbb{R}^n$ of dimension $n - r$, where $r = \operatorname{rank}(A)$. If the nullity of $A^*$ is $\{0\}$, then call $A^*$ non-degenerate. If all eigenvalues of $A$ are strictly positive (so that $A^*$ is non-degenerate), then $A^*(x, x) > 0$ for all non-zero $x \in \mathbb{R}^n$. In this case $A^*$ is called positive definite. If all eigenvalues of $A$ are non-negative but some are zero, then $A^*$ is called positive semi-definite. In this case $A^*(x, x) > 0$ for all non-zero $x$ in the $r$-dimensional subspace spanned by the eigenvectors with positive eigenvalues. Similarly, if $A^*$ is non-degenerate and all eigenvalues are strictly negative, then $A^*$ is called negative definite. If the eigenvalues are non-positive, but some are zero, then $A^*$ is called negative semi-definite.
The index of the quadratic form $A^*$ is the maximal dimension of a subspace on which $A^*$ is negative definite. Therefore $\operatorname{index}(A^*)$ is the number of strictly negative eigenvalues of $A$.
When $A$ has some eigenvalues which are strictly positive and some which are strictly negative, we call $A^*$ a saddle.
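The classification above depends only on the signs of the eigenvalues, so it can be sketched as a function of the eigenvalue list. This Python sketch assumes the eigenvalues of the symmetric matrix $A$ have already been computed. The tolerance `tol` for treating a floating-point eigenvalue as zero is our own choice, and so is the label used when every eigenvalue is zero. The flag `non_degenerate` is taken to mean that $A$ has no zero eigenvalue. This matches the description of Example 2.7 below as a non-degenerate saddle.

```python
def classify(eigenvalues, tol=1e-12):
    """Classify the quadratic form A*(x, x) = <x, A(x)> of a symmetric matrix A.

    Returns (label, index, non_degenerate), where index is the number of
    strictly negative eigenvalues.
    """
    pos = sum(1 for lam in eigenvalues if lam > tol)
    neg = sum(1 for lam in eigenvalues if lam < -tol)
    zero = len(eigenvalues) - pos - neg
    if pos and neg:
        label = "saddle"
    elif pos:
        label = "positive semi-definite" if zero else "positive definite"
    elif neg:
        label = "negative semi-definite" if zero else "negative definite"
    else:
        label = "zero form"   # every eigenvalue vanishes
    return label, neg, zero == 0


print(classify([1, 1, -1]))   # ('saddle', 1, True)
print(classify([2, 0]))       # ('positive semi-definite', 0, False)
```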
We have not yet shown that the characteristic equation of a symmetric $n \times n$ matrix has $n$ real roots. We can show, however, that any (symmetric) quadratic form can be diagonalised.
Let $A = (a_{ij})$ and $\langle x, A(x)\rangle = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j$. Suppose $a_{ii} = 0$ for all $i = 1, \ldots, n$ (and $A \neq 0$). Then it is possible to make a linear transformation of coordinates such that $a_{jj} \neq 0$ for some $j$. After relabelling coordinates we can take $a_{11} \neq 0$. In this case the quadratic form can be written
$$\langle x, A(x)\rangle = a_{11}x_1^2 + 2a_{12}x_1x_2 + \cdots = a_{11}\Big(x_1 + \frac{a_{12}}{a_{11}}x_2 + \cdots\Big)^2 + \Big(a_{22} - \frac{a_{12}^2}{a_{11}}\Big)(x_2 + \cdots)^2 + \cdots = \sum_{i=1}^n \alpha_i y_i^2.$$
Here each $y_i$ is a linear combination of $\{x_1, \ldots, x_n\}$. The transformation $x \to P(x) = y$ is composed of invertible changes of coordinates, so it is non-singular and has an inverse, $Q$ say.
Letting $x = Q(y)$, we see the quadratic form becomes
$$\langle x, A(x)\rangle = \langle Q(y), A \circ Q(y)\rangle = \langle y, Q^tAQ(y)\rangle = \langle y, D(y)\rangle,$$
where $D$ is a diagonal matrix with real diagonal entries $(\alpha_1, \ldots, \alpha_n)$. Note that $D = Q^tAQ$, and so $\operatorname{rank}(D) = \operatorname{rank}(A) = r$, say. Thus exactly $r$ of the diagonal entries are non-zero. Since the symmetric matrix $A$ can be diagonalised, not only are all its eigenvalues real, but its eigenvectors form a basis for $\mathbb{R}^n$. Consequently $\Lambda = P^{-1}AP$, where $P$ is the $n \times n$ basis change matrix whose columns are these eigenvectors. Moreover, if $\lambda$ is an eigenvalue with multiplicity $r$ (i.e., $\lambda$ occurs as a root of the characteristic equation $r$ times), then the eigenspace $E_\lambda$ has dimension $r$. □
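The completing-the-square procedure can be sketched as follows. This Python version rests on a simplifying assumption: every pivot $a_{11}, a_{22} - a_{12}^2/a_{11}, \ldots$ met along the way is non-zero, so no preliminary change of coordinates is needed. If a zero pivot appears, it raises an error. Exact rational arithmetic via `fractions.Fraction` keeps the $\alpha_i$ exact. The sample matrix is hypothetical.

```python
from fractions import Fraction


def complete_squares(A):
    """Write <x, A(x)> as sum_i alphas[i] * y_i**2 by completing squares.

    Row i of the returned L gives y_i = x_i + sum_{j > i} L[i][j] * x_j.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    alphas, L = [], []
    for i in range(n):
        pivot = M[i][i]
        if pivot == 0:
            raise ValueError("zero pivot: change coordinates first")
        alphas.append(pivot)
        row = [Fraction(0)] * n
        row[i] = Fraction(1)
        for j in range(i + 1, n):
            row[j] = M[i][j] / pivot
        L.append(row)
        # Subtract alpha_i * y_i**2: the remaining form in x_{i+1}, ..., x_n
        # has coefficients a_jk - a_ji * a_ik / a_ii (e.g. a22 - a12**2 / a11).
        for j in range(i + 1, n):
            for k in range(i + 1, n):
                M[j][k] -= M[j][i] * M[i][k] / pivot
    return alphas, L


def quad(A, x):
    """The quadratic form sum_{i,j} a_ij x_i x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def diag_form(alphas, L, x):
    """sum_i alphas[i] * y_i**2 with y = L x."""
    ys = [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]
    return sum(a * y * y for a, y in zip(alphas, ys))


A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 4]]
alphas, L = complete_squares(A)
print(alphas)   # [Fraction(2, 1), Fraction(5, 2), Fraction(18, 5)]
```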
2.3.4 Examples
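Example 2.7 To illustrate this procedure, consider the matrix
$$A = \begin{pmatrix}0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0\end{pmatrix}$$
representing the quadratic form $x_2^2 + 2x_1x_3$. Let
$$P_1(x) = \begin{pmatrix}0 & 1 & 0\\ 1 & 0 & 0\\ 1 & 0 & 1\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix}z_1\\ z_2\\ z_3\end{pmatrix}.$$
Since $x_2 = z_1$, $x_1 = z_2$ and $x_3 = z_3 - z_2$, this gives the quadratic form $z_1^2 - 2(z_2 - \frac{1}{2}z_3)^2 + \frac{1}{2}z_3^2$. Now let
$$P_2(z) = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & -\frac{1}{2}\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}z_1\\ z_2\\ z_3\end{pmatrix} = \begin{pmatrix}y_1\\ y_2\\ y_3\end{pmatrix}.$$
Then $\langle x, A(x)\rangle = \langle y, D(y)\rangle$, where
$$D = \begin{pmatrix}1 & 0 & 0\\ 0 & -2 & 0\\ 0 & 0 & \frac{1}{2}\end{pmatrix} = \begin{pmatrix}\alpha_1 & 0 & 0\\ 0 & \alpha_2 & 0\\ 0 & 0 & \alpha_3\end{pmatrix}, \quad\text{and}\quad A = P_1^tP_2^tDP_2P_1.$$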
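The factorisation $A = P_1^tP_2^tDP_2P_1$ in Example 2.7 can be confirmed with exact arithmetic. The sketch below simply multiplies out the matrices given in the example. The list-based `transpose` and `matmul` helpers are our own.

```python
from fractions import Fraction as F


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]


A = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
P1 = [[0, 1, 0], [1, 0, 0], [1, 0, 1]]           # z = P1 x
P2 = [[1, 0, 0], [0, 1, F(-1, 2)], [0, 0, 1]]    # y = P2 z
D = [[1, 0, 0], [0, -2, 0], [0, 0, F(1, 2)]]

M = matmul(P2, P1)                                # y = P2 P1 x
recovered = matmul(matmul(transpose(M), D), M)    # P1^t P2^t D P2 P1
print(recovered == A)                             # True
```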
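Consequently the matrix $A$ can be diagonalised. $A$ has characteristic equation $(1 - \lambda)(\lambda^2 - 1) = 0$, with eigenvalues $1, 1, -1$.
The normalised eigenvectors of $A$ are
$$\frac{1}{\sqrt{2}}\begin{pmatrix}1\\ 0\\ 1\end{pmatrix}, \quad \begin{pmatrix}0\\ 1\\ 0\end{pmatrix}, \quad \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ 0\\ -1\end{pmatrix},$$
corresponding to the eigenvalues $1, 1, -1$. Thus $A^*$ is a non-degenerate saddle of index 1. Let $Q$ be the basis change matrix
$$Q = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 0 & 1\\ 0 & \sqrt{2} & 0\\ 1 & 0 & -1\end{pmatrix}.$$
Then
$$Q^tAQ = \frac{1}{2}\begin{pmatrix}1 & 0 & 1\\ 0 & \sqrt{2} & 0\\ 1 & 0 & -1\end{pmatrix}\begin{pmatrix}0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0\end{pmatrix}\begin{pmatrix}1 & 0 & 1\\ 0 & \sqrt{2} & 0\\ 1 & 0 & -1\end{pmatrix} = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{pmatrix}.$$
As a quadratic form,
$$(x_1, x_2, x_3)A\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} = (x_1, x_2, x_3)\begin{pmatrix}x_3\\ x_2\\ x_1\end{pmatrix} = 2x_1x_3 + x_2^2.$$
We can also write this as
$$(x_1, x_2, x_3)\,\frac{1}{2}\begin{pmatrix}1 & 0 & 1\\ 0 & \sqrt{2} & 0\\ 1 & 0 & -1\end{pmatrix}\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{pmatrix}\begin{pmatrix}1 & 0 & 1\\ 0 & \sqrt{2} & 0\\ 1 & 0 & -1\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix}$$
$$= \frac{1}{2}\big(x_1 + x_3, \sqrt{2}x_2, x_1 - x_3\big)\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & -1\end{pmatrix}\begin{pmatrix}x_1 + x_3\\ \sqrt{2}x_2\\ x_1 - x_3\end{pmatrix} = \frac{1}{2}\big[(x_1 + x_3)^2 + 2x_2^2 - (x_1 - x_3)^2\big].$$
Note that $A^*$ is positive definite on the subspace $\{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 = x_3\}$ spanned by the first two eigenvectors.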
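The orthogonal diagonalisation in Example 2.7 can be checked in floating point. With the normalised eigenvectors as columns, $Q$ has the entry $\sqrt{2}/\sqrt{2} = 1$ in the middle, and $Q^tAQ$ should come out as $\operatorname{diag}(1, 1, -1)$. The sketch also evaluates both sides of the final identity at a sample point of our own choosing.

```python
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))]
            for i in range(len(M))]


s = 1 / math.sqrt(2)
A = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
Q = [[s, 0, s],
     [0, 1, 0],
     [s, 0, -s]]          # columns: the three normalised eigenvectors
Lam = matmul(matmul(transpose(Q), A), Q)   # approximately diag(1, 1, -1)


def form(x1, x2, x3):
    return 2 * x1 * x3 + x2 ** 2


def diagonal_form(x1, x2, x3):
    return 0.5 * ((x1 + x3) ** 2 + 2 * x2 ** 2 - (x1 - x3) ** 2)


print(form(1.5, -2.0, 0.25), diagonal_form(1.5, -2.0, 0.25))   # 4.75 4.75
```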
We can give a geometric interpretation of the behaviour of a matrix $A$ with both positive and negative eigenvalues. For example, the matrix
$$A = \begin{pmatrix}1 & 2\\ 2 & 1\end{pmatrix}$$
has eigenvectors
$$z_1 = \begin{pmatrix}1\\ 1\end{pmatrix}, \quad z_2 = \begin{pmatrix}1\\ -1\end{pmatrix}$$
corresponding to the eigenvalues $3$ and $-1$ respectively. Thus $A$ maps the vector $z_1$ to $3z_1$ and $z_2$ to $-z_2$. The second operation can be regarded as a reflection of the vector $z_2$ in the line $\{(x, y) : x - y = 0\}$. This line is spanned by $z_1$, the eigenvector associated with the first eigenvalue. The first operation, $z_1 \to 3z_1$, is a translation (that is, a stretch by the factor 3) of $z_1$ to $3z_1$. Consider now any point $x \in \mathbb{R}^2$. We can write $x = \alpha z_1 + \beta z_2$. Thus $A(x) = 3\alpha z_1 - \beta z_2$. In other words, $A$ may be decomposed into two operations: a translation in the direction $z_1$, followed by a reflection about $z_1$.
Fig. 2.2
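The decomposition $A(x) = 3\alpha z_1 - \beta z_2$ can be checked at a particular point. Because $z_1 = (1, 1)$ and $z_2 = (1, -1)$ are orthogonal, the coordinates of a point $(x, y)$ are $\alpha = (x + y)/2$ and $\beta = (x - y)/2$. The sample point $(5, 1)$ is our own choice.

```python
A = [[1, 2],
     [2, 1]]
z1, z2 = (1, 1), (1, -1)


def apply(A, v):
    """Return A(v) for a 2 x 2 matrix A."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])


def eigen_coords(v):
    """Solve v = alpha * z1 + beta * z2 for z1 = (1, 1), z2 = (1, -1)."""
    return (v[0] + v[1]) / 2, (v[0] - v[1]) / 2


x = (5, 1)
alpha, beta = eigen_coords(x)                    # 3.0, 2.0
direct = apply(A, x)                             # (7, 11)
# Stretch the z1-component by 3, reflect the z2-component.
via_eigen = (3 * alpha * z1[0] - beta * z2[0],
             3 * alpha * z1[1] - beta * z2[1])   # (7.0, 11.0)
print(direct, via_eigen)
```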
2.4 Geometric Interpretation of a Linear Transformation
More generally, suppose the characteristic equation of $A$ has real roots, and $A$ has eigenvectors $\{x_1, \ldots, x_s, z_1, \ldots, z_t, k_1, \ldots, k_p\}$ forming a basis for $\mathbb{R}^n$.
The first $s$ vectors correspond to positive eigenvalues, the next $t$ vectors to negative eigenvalues, and the final $p$ vectors belong to the kernel, with zero eigenvalues.
Then $A$ may be described in the following way:
1. collapse each kernel vector $k_l$ to $0$, so that the image of $A$ is spanned by $\{x_1, \ldots, x_s, z_1, \ldots, z_t\}$;
2. translate each $x_i$ to $\lambda_i x_i$;
3. reflect each $z_j$ to $-z_j$, and then translate to $-|\mu_j|\, z_j$ (where $\mu_j$ is the negative eigenvalue associated with $z_j$).
These operations completely describe a symmetric matrix, or more generally any diagonalisable matrix $A$. When $A$ is non-symmetric, it is possible for $A$ to have complex roots.
For example, consider the matrix
$$\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}.$$
Fig. 2.3
As we have seen, this corresponds to an anticlockwise rotation by $\theta$ in the plane $\mathbb{R}^2$. To determine the eigenvalues, note that the characteristic equation is $(\cos\theta - \lambda)^2 + \sin^2\theta = \lambda^2 - 2\lambda\cos\theta + (\cos^2\theta + \sin^2\theta) = 0$. But $\cos^2\theta + \sin^2\theta = 1$. Thus
$$\lambda = \frac{2\cos\theta \pm 2\sqrt{\cos^2\theta - 1}}{2} = \cos\theta \pm i\sin\theta.$$
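The complex eigenvalues of the rotation matrix can be computed from the quadratic formula using Python's `cmath` module. The function `char_roots` is our own helper for any real $2 \times 2$ matrix $\begin{pmatrix}a & b\\ c & d\end{pmatrix}$. The angle of $30^\circ$ is an arbitrary test value.

```python
import cmath
import math


def char_roots(a, b, c, d):
    """Roots of lam^2 - (a + d) lam + (ad - bc) = 0 for the matrix [[a, b], [c, d]]."""
    tr, det = a + d, a * d - b * c
    # Complex square root, so a negative discriminant is handled directly.
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


theta = math.radians(30)
r1, r2 = char_roots(math.cos(theta), -math.sin(theta),
                    math.sin(theta), math.cos(theta))
# r1 and r2 should agree with cos(theta) + i sin(theta) and cos(theta) - i sin(theta).
print(r1, cmath.exp(1j * theta))
```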
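More generally, a $2 \times 2$ matrix with complex roots $\lambda(\cos\theta \pm i\sin\theta)$ may be regarded, after a suitable change of basis, as a transformation $\lambda e^{i\theta}$. Here $\lambda$ corresponds to a translation by $\lambda$, and $e^{i\theta}$ corresponds to rotation by $\theta$.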
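Example 2.8 Consider $A = \begin{pmatrix}2 & -2\\ 2 & 2\end{pmatrix}$, with $\operatorname{trace}(A) = \operatorname{tr}(A) = 4$ and $|A| = 8$.
As we have seen, the characteristic equation for $A$ is $\lambda^2 - (\operatorname{trace}A)\lambda + |A| = 0$, with roots $\frac{\operatorname{trace}(A) \pm \sqrt{(\operatorname{trace}A)^2 - 4|A|}}{2}$. Thus the roots are
$$2 \pm \tfrac{1}{2}\sqrt{16 - 32} = 2 \pm 2i = 2\sqrt{2}\Big[\frac{1}{\sqrt{2}} \pm \frac{i}{\sqrt{2}}\Big],$$
where $\cos\theta = \sin\theta = \frac{1}{\sqrt{2}}$, and so $\theta = 45^\circ$. Thus
$$A: \begin{pmatrix}x\\ y\end{pmatrix} \to 2\sqrt{2}\begin{pmatrix}x\cos 45^\circ - y\sin 45^\circ\\ x\sin 45^\circ + y\cos 45^\circ\end{pmatrix}.$$
Consequently $A$ first sends $(x, y)$ by a translation to $(2\sqrt{2}x, 2\sqrt{2}y)$, and then rotates this vector through an angle of $45^\circ$.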
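For Example 2.8, both claims can be checked in a few lines. The roots $2 \pm 2i$ can be recomputed from the trace and determinant. The description of $A$ as a stretch by $2\sqrt{2}$ combined with a rotation through $45^\circ$ can be compared with $A$ entry by entry. A minimal sketch:

```python
import cmath
import math

A = [[2, -2],
     [2, 2]]
trace = A[0][0] + A[1][1]                        # 4
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # 8
disc = cmath.sqrt(trace ** 2 - 4 * det)          # sqrt(-16) = 4i
roots = ((trace + disc) / 2, (trace - disc) / 2)
print(roots)                                     # ((2+2j), (2-2j))

# Modulus 2*sqrt(2) and argument 45 degrees of the root 2 + 2i.
modulus, angle = abs(roots[0]), cmath.phase(roots[0])

t = math.radians(45)
R = [[math.cos(t), -math.sin(t)],
     [math.sin(t), math.cos(t)]]
scaled_rotation = [[2 * math.sqrt(2) * R[i][j] for j in range(2)]
                   for i in range(2)]            # approximately equal to A
```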
More abstractly, let $A$ be an $n \times n$ matrix with two complex conjugate eigenvalues $\lambda(\cos\theta + i\sin\theta)$ and $\lambda(\cos\theta - i\sin\theta)$. Then there exists a two-dimensional eigenspace $E_\theta$ such that $A(x) = \lambda e^{i\theta}(x)$ for all $x \in E_\theta$. Here $\lambda e^{i\theta}(x)$ means: rotate $x$ by $\theta$ within $E_\theta$, and then translate by $\lambda$.
In some cases a linear transformation, $A$, can be given a canonical form in terms of rotations, translations and reflections, together with a collapse onto the kernel. What this means is that there exist a number of distinct eigenspaces
$$\{E_1, \ldots, E_p, X_1, \ldots, X_s, K\}$$
where $A$ maps
1. $E_j$ to $E_j$ by rotating any vector in $E_j$ through an angle $\theta_j$;
2. $X_j$ to $X_j$ by translating a vector $x$ in $X_j$ to $\lambda_j x$, for some non-zero real number $\lambda_j$;
3. the kernel $K$ to $\{0\}$.
In the case that the dimensions of these eigenspaces sum to $n$, the canonical form of the matrix $A$ is
$$\begin{pmatrix}e^{i\theta} & 0 & 0\\ 0 & \Lambda & 0\\ 0 & 0 & 0\end{pmatrix}.$$
Here $e^{i\theta}$ consists of $p$ different $2 \times 2$ rotation matrices, $\Lambda$ is a diagonal $s \times s$ matrix, and $0$ is an $(n - r) \times (n - r)$ zero matrix, where $r = \operatorname{rank}(A) = 2p + s$.
However, even when all the roots of the characteristic equation are real, it need not be possible to obtain a diagonal, canonical form of the matrix.
To illustrate, in Example 2.6 it is easy to show that the eigenvalue $\lambda = 0$ occurs twice as a root of the characteristic equation for the non-symmetric matrix $A$, even though the kernel is of dimension 1. The eigenvalue $\lambda = 7$ occurs once. Moreover the vector $(3, -29, 17)$ clearly must be an eigenvector for $\lambda = 7$, and thus spans the image of $A^2$. However, it is also clear that the vector $(3, -29, 17)$ does not span the image of $A$. Thus the eigenspace $E_7$ does not provide a basis for the image of $A$, and so the matrix $A$ cannot be diagonalised.
However, as we have shown, for any symmetric matrix the dimensions of the eigenspaces sum to $n$, and the matrix can be expressed in canonical, diagonal form.
In Chap. 4 below we consider smooth functions and show that “locally” such a function can be analysed in terms of the canonical form of a particular symmetric matrix, known as the Hessian.
3 Topology and Convex Optimisation
3.1 A Topological Space
In the previous chapter we introduced the notion of the scalar product of two vectors in $\mathbb{R}^n$. More generally, if a scalar product is defined on some space, then this permits the definition of a norm, or length, associated with a vector. This in turn allows us to define the distance between two vectors. A distance function or metric may be defined on a space, $X$, even when $X$ admits no norm. For example, let $X$ be the surface of the earth. Clearly it is possible to say what the shortest distance, $d(x, y)$, is between two points, $x, y$, on the earth's surface, although it is not meaningful to talk of the “length” of a point on the surface. More general than the notion of a metric is that of a topology. Essentially, a topology on a space is a mathematical notion for making precise the idea of “nearness”. The notion of topology can then be used to define precisely the property of continuity of a function between two topological spaces. Finally, continuity of preferences can be used to prove the existence of a social choice and of an economic equilibrium in a world that is bounded.
3.1.1 Scalar Product and Norms
In Sect. 2.3 we defined the Euclidean scalar product of two vectors
$$x = \sum_{i=1}^n x_ie_i \quad\text{and}\quad y = \sum_{i=1}^n y_ie_i \quad\text{in } \mathbb{R}^n,$$
where $\{e_1, \ldots, e_n\}$ is the standard basis, to be
$$\langle x, y\rangle = \sum_{i=1}^n x_iy_i.$$
N. Schofield, Mathematical Methods in Economics and Social Choice, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6_3, © Springer-Verlag Berlin Heidelberg 2014
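More generally, suppose that $\{v_1, \ldots, v_n\}$ is a basis for $\mathbb{R}^n$, and $(x_1, \ldots, x_n)$, $(y_1, \ldots, y_n)$ are the coordinates of $x, y$ with respect to this basis. Then
$$\langle x, y\rangle = \sum_{i=1}^n\sum_{j=1}^n x_iy_j\langle v_i, v_j\rangle,$$
where $\langle v_i, v_j\rangle$ is the scalar product of $v_i$ and $v_j$. Thus
$$\langle x, y\rangle = (x_1, \ldots, x_n)\begin{pmatrix} & \vdots & \\ \cdots & a_{ij} & \cdots\\ & \vdots & \end{pmatrix}\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix},$$
where $A = (a_{ij}) = (\langle v_i, v_j\rangle)_{i=1,\ldots,n;\,j=1,\ldots,n}$. If we let $v_i = \sum_{k=1}^n v_{ik}e_k$, then clearly $\langle v_i, v_i\rangle = \sum_{k=1}^n (v_{ik})^2 > 0$. Moreover $\langle v_i, v_j\rangle = \langle v_j, v_i\rangle$. Thus the matrix $A$ is symmetric. Since $A$ must be of rank $n$, it can be diagonalised to give a matrix $\Lambda = Q^tAQ$, all of whose diagonal entries are positive. Each eigenvalue is positive because $(x_1, \ldots, x_n)A(x_1, \ldots, x_n)^t = \langle x, x\rangle > 0$ whenever $x \neq 0$. Here $Q$ represents the orthogonal basis change matrix, and $Q^t(x_1, \ldots, x_n) = (x'_1, \ldots, x'_n)$ gives the coordinates of $x$ with respect to the basis of eigenvectors of $A$. Hence
$$\langle x, y\rangle = \langle x, A(y)\rangle = \langle x, Q\Lambda Q^t(y)\rangle = \langle Q^t(x), \Lambda Q^t(y)\rangle = \sum_{i=1}^n \lambda_i x'_iy'_i.$$
Thus a scalar product is a non-degenerate positive definite quadratic form. Note that the scalar product is bilinear, since
$$\langle x_1 + x_2, y\rangle = \langle x_1, y\rangle + \langle x_2, y\rangle \quad\text{and}\quad \langle x, y_1 + y_2\rangle = \langle x, y_1\rangle + \langle x, y_2\rangle,$$
and symmetric, since
$$\langle x, y\rangle = \langle x, A(y)\rangle = \langle y, A(x)\rangle = \langle y, x\rangle.$$
We shall call the scalar product given by $\langle x, y\rangle = \sum_{i=1}^n x_iy_i$ the Euclidean scalar product.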
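The matrix $A = (\langle v_i, v_j\rangle)$ is usually called the Gram matrix of the basis. The sketch below uses a hypothetical basis $v_1 = (1, 0)$, $v_2 = (1, 2)$ of $\mathbb{R}^2$. It checks that evaluating $(x_1, \ldots, x_n)A(y_1, \ldots, y_n)^t$ on coordinates gives the same number as the Euclidean scalar product of the actual vectors.

```python
def dot(x, y):
    """Euclidean scalar product."""
    return sum(a * b for a, b in zip(x, y))


def gram(basis):
    """The symmetric matrix with entries a_ij = <v_i, v_j>."""
    return [[dot(vi, vj) for vj in basis] for vi in basis]


def from_coords(coords, basis):
    """The vector sum_i coords[i] * v_i."""
    return [sum(c * v[k] for c, v in zip(coords, basis)) for k in range(len(basis[0]))]


def scalar_from_coords(G, xc, yc):
    """(x_1, ..., x_n) G (y_1, ..., y_n)^t."""
    n = len(G)
    return sum(xc[i] * G[i][j] * yc[j] for i in range(n) for j in range(n))


basis = [(1, 0), (1, 2)]           # hypothetical basis v1, v2
G = gram(basis)                    # [[1, 1], [1, 5]]
xc, yc = (2, -1), (1, 3)           # coordinates of x and y in this basis
x, y = from_coords(xc, basis), from_coords(yc, basis)   # [1, -2] and [4, 6]
print(scalar_from_coords(G, xc, yc), dot(x, y))         # -8 -8
```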
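We define the Euclidean norm, $\|\ \|_E$, by $\|x\|_E = \sqrt{\langle x, x\rangle} = \sqrt{\sum_{i=1}^n x_i^2}$. Note that $\|x\|_E \geq 0$, and $\|x\|_E = 0$ if and only if $x = (0, \ldots, 0)$. Moreover, if $a \in \mathbb{R}$, then
$$\|ax\|_E = \sqrt{\sum_{i=1}^n a^2x_i^2} = |a|\,\|x\|_E.$$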
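Fig. 3.1
Lemma 3.1 If $x, y \in \mathbb{R}^n$, then $\|x + y\|_E \leq \|x\|_E + \|y\|_E$.
Proof For convenience write $\|x\|_E$ as $\|x\|$. Now
$$\|x + y\|^2 = \langle x + y, x + y\rangle = \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle.$$
But the scalar product is symmetric. Therefore
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\langle x, y\rangle.$$
Furthermore $(\|x\| + \|y\|)^2 = \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\|$. Thus $\|x + y\| \leq \|x\| + \|y\|$ iff $\langle x, y\rangle \leq \|x\|\,\|y\|$. To show this, note that $\sum_{i<j}(x_iy_j - x_jy_i)^2 \geq 0$. Thus
$$\sum_{i<j}\big(x_i^2y_j^2 + x_j^2y_i^2\big) \geq 2\sum_{i<j}x_iy_ix_jy_j.$$
Add $\sum_{i=1}^n x_i^2y_i^2$ to both sides. This gives $\big(\sum_{i=1}^n x_i^2\big)\big(\sum_{i=1}^n y_i^2\big) \geq \big(\sum_{i=1}^n x_iy_i\big)^2$. Therefore $\|x\|^2\|y\|^2 \geq \langle x, y\rangle^2$. For $x, y \neq 0$ this gives $\frac{\langle x, y\rangle}{\|x\|\,\|y\|} \leq 1$, and hence $\|x + y\| \leq \|x\| + \|y\|$. □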
In this lemma we have shown that
$$-1 \leq \frac{\langle x, y\rangle}{\|x\|\,\|y\|} \leq 1.$$
This ratio can be identified with $\cos\theta$, where $\theta$ is the angle between the two vectors $x, y$. In the case of unit vectors, $\langle x, y\rangle$ can be identified with the (signed) length of the perpendicular projection of $y$ onto $x$, as in Fig. 3.1.
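The angle interpretation can be sketched numerically: compute $\langle x, y\rangle/(\|x\|\,\|y\|)$ and take its arccosine. The clamp to $[-1, 1]$ is a standard floating-point precaution, since rounding can push the ratio slightly outside that interval. It is not part of the mathematics. The vectors are arbitrary examples.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def angle(x, y):
    """The angle theta between non-zero x and y, with cos(theta) = <x, y> / (||x|| ||y||)."""
    c = dot(x, y) / (norm(x) * norm(y))
    c = max(-1.0, min(1.0, c))   # guard against rounding just outside [-1, 1]
    return math.acos(c)


x, y = (3.0, -1.0, 2.0), (-1.0, 4.0, 0.5)
print(abs(dot(x, y)) <= norm(x) * norm(y))                        # Cauchy-Schwarz: True
print(norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y))   # triangle inequality: True
print(math.degrees(angle((1, 0), (1, 1))))                        # approximately 45
```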
The property $\|x + y\| \leq \|x\| + \|y\|$ is known as the triangle inequality (see Fig. 3.2).
Definition 3.1 Let $X$ be a vector space over the field $\mathbb{R}$. A norm, $\|\ \|$, on $X$ is a mapping $\|\ \|: X \to \mathbb{R}$ which satisfies the following three properties:
N1. $\|x\| \geq 0$ for all $x \in X$, and $\|x\| = 0$ iff $x = 0$.
N2. $\|ax\| = |a|\,\|x\|$ for all $x \in X$ and $a \in \mathbb{R}$.
N3. $\|x + y\| \leq \|x\| + \|y\|$ for all $x, y \in X$.
Fig. 3.2
There are many different norms on a vector space. For example, if $A$ is a non-degenerate positive definite symmetric matrix, we could define $\|\ \|_A$ by $\|x\|_A = \sqrt{\langle x, Ax\rangle}$.
The Cartesian norm is $\|x\|_C = \|(x_1, \ldots, x_n)\|_C = \max\{|x_1|, \ldots, |x_n|\}$. Clearly $\|x\|_C \geq 0$, and $\|x\|_C = 0$ iff $x_i = 0$ for all $i = 1, \ldots, n$. Moreover
$$\|ax\|_C = \max\{|ax_1|, \ldots, |ax_n|\} = \max\{|a|\,|x_1|, \ldots, |a|\,|x_n|\} = |a|\max\{|x_1|, \ldots, |x_n|\} = |a|\,\|x\|_C.$$
Finally, for some $i$,
$$\|x + y\|_C = |x_i + y_i| \leq |x_i| + |y_i| \leq \max\{|x_1|, \ldots, |x_n|\} + \max\{|y_1|, \ldots, |y_n|\} = \|x\|_C + \|y\|_C.$$
Define the city block norm $\|x\|_B$ to be $\|x\|_B = \sum_{i=1}^n |x_i|$. Clearly $\|x + y\|_B = \sum_{i=1}^n |x_i + y_i| \leq \sum_{i=1}^n (|x_i| + |y_i|) = \|x\|_B + \|y\|_B$.
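The three norms introduced so far are easy to write down and spot-check against N2 and N3. The sketch uses randomly generated vectors with a fixed seed. Passing such checks illustrates the axioms but of course proves nothing. The helper `check_axioms` and its parameters are our own.

```python
import math
import random


def norm_E(x):
    """Euclidean norm sqrt(sum x_i^2)."""
    return math.sqrt(sum(c * c for c in x))


def norm_C(x):
    """Cartesian norm max |x_i|."""
    return max(abs(c) for c in x)


def norm_B(x):
    """City block norm sum |x_i|."""
    return sum(abs(c) for c in x)


def check_axioms(norm, trials=1000, n=3, seed=1):
    """Spot-check N2 (homogeneity) and N3 (triangle inequality) on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(n)]
        y = [rng.uniform(-5, 5) for _ in range(n)]
        a = rng.uniform(-5, 5)
        if abs(norm([a * c for c in x]) - abs(a) * norm(x)) > 1e-9:
            return False
        if norm([p + q for p, q in zip(x, y)]) > norm(x) + norm(y) + 1e-12:
            return False
    return True


print(norm_E((3, -4)), norm_C((3, -4)), norm_B((3, -4)))   # 5.0 4 7
print(all(check_axioms(f) for f in (norm_E, norm_C, norm_B)))   # True
```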
If $\|\ \|$ is a norm on the vector space $X$, the distance function or metric $d$ on $X$ induced by $\|\ \|$ is the function $d: X \times X \to \mathbb{R}$ given by $d(x, y) = \|x - y\|$.
Note that $d(x, y) \geq 0$ for all $x, y \in X$, and that $d(x, y) = 0$ iff $x - y = 0$, i.e., $x = y$. Moreover,
$$d(x, y) + d(y, z) = \|x - y\| + \|y - z\| \geq \|(x - y) + (y - z)\| = \|x - z\| = d(x, z).$$
Hence $d(x, z) \leq d(x, y) + d(y, z)$. Also, by N2, $d(x, y) = \|(-1)(y - x)\| = d(y, x)$.
Definition 3.2 A metric on a set $X$ is a function $d: X \times X \to \mathbb{R}$ such that
D1. $d(x, y) \geq 0$ for all $x, y \in X$, and $d(x, y) = 0$ iff $x = y$;
D2. $d(x, z) \leq d(x, y) + d(y, z)$ for all $x, y, z \in X$;
D3. $d(x, y) = d(y, x)$ for all $x, y \in X$.
Note that a metric $d$ may be defined on a set $X$ even when $X$ is not a vector space. In particular, $d$ may be defined without reference to a particular norm. At the beginning of the chapter, for example, we mentioned that the surface of the earth, $S^2$, admits a metric $d$, where the distance between two points $x, y$ on the surface is measured along a great circle through $x, y$. A second metric on $S^2$ is obtained by defining $d(x, y)$ to be the angle, $\theta$, subtended at the centre of the earth by the two radii to $x, y$ (see Fig. 3.3).
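The second metric on $S^2$, the angle subtended at the centre, can be computed from the scalar product of the two unit vectors. Multiplying by the radius gives the great-circle distance. This Python sketch assumes that points on the unit sphere are given as 3-tuples. The `radius` parameter is our own; with the earth's radius, `great_circle` would return a distance along the surface.

```python
import math


def angle_metric(x, y):
    """Angle (in radians) subtended at the centre by unit vectors x and y in R^3."""
    c = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp guards against rounding


def great_circle(x, y, radius=1.0):
    """Length of the shorter great-circle arc between x and y on a sphere of the given radius."""
    return radius * angle_metric(x, y)


north, east, front = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
d = angle_metric
print(d(north, east))                                       # 1.5707963267948966, i.e. pi / 2
print(d(north, east) <= d(north, front) + d(front, east))   # D2: True
```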
Any set $X$ which admits a metric, $d$, we shall call metrisable, or a metric space. To draw attention to the metric, $d$, we shall sometimes write $(X, d)$ for a metric space.
In a metric space $(X, d)$, the open ball at $x$ of radius $r$ in $X$ is
$$B_d(x, r) = \{y \in X : d(x, y) < r\},$$
and the closed ball centre $x$ of radius $r$ is
$$\operatorname{Clos}B_d(x, r) = \{y \in X : d(x, y) \leq r\}.$$
The sphere of radius $r$ at $x$ is
$$S_d(x, r) = \{y \in X : d(x, y) = r\}.$$
In $\mathbb{R}^n$, the Euclidean sphere of radius $r$ is therefore
$$S(x, r) = \Big\{y : \sum_{i=1}^n (x_i - y_i)^2 = r^2\Big\}.$$
For convenience, a sphere in $\mathbb{R}^n$ is often written as $S^{n-1}$. Here the superscript is $n - 1$ because, as we shall see, the sphere in $\mathbb{R}^n$ is $(n - 1)$-dimensional, even though it is not a vector space.
If $(X, d)$ is a metric space, say a set $V$ in $X$ is d-open iff for any $x \in V$ there is some radius $r_x > 0$ (dependent on $x$) such that
$$B_d(x, r_x) \subset V.$$
Fig. 3.3
Lemma 3.2 Let $\Gamma_d$ be the family of all sets in $X$ which are d-open. Then $\Gamma_d$ satisfies the following properties:
T1. If $U, V \in \Gamma_d$, then $U \cap V \in \Gamma_d$.
T2. If $U_j \in \Gamma_d$ for all $j$ belonging to an index set $J$ (which is possibly infinite), then $\bigcup_{j \in J} U_j \in \Gamma_d$.
T3. Both $X$ and the empty set, $\Phi$, belong to $\Gamma_d$.
Proof Clearly $X$ and $\Phi$ are d-open. If $U, V \in \Gamma_d$ but $U \cap V = \Phi$, then $U \cap V$ is d-open. Suppose on the other hand that $x \in U \cap V$. Since both $U$ and $V$ are open, there exist $r_1, r_2 > 0$ such that $B_d(x, r_1) \subset U$ and $B_d(x, r_2) \subset V$. Let $r = \min(r_1, r_2)$. By definition
$$B_d(x, r) = B_d(x, r_1) \cap B_d(x, r_2) \subset U \cap V.$$
Thus there is an open ball, centre $x$, of radius $r$ contained in $U \cap V$.
Finally suppose $x \in \bigcup_{j \in J} U_j = U$. Then $x$ belongs to at least one $U_j$, say $U_1$. Since $U_1$ is open, there is an open ball $B = B_d(x, r_1)$ contained in $U_1$, and hence in $U$. Since this holds for every $x \in U$, $U$ is open. □
Note that by T1 the intersection of finitely many open sets is an open set, but an infinite intersection of open sets need not be open. To see this, consider a set of the form
$$I = (a, b) = \{x \in \mathbb{R} : a < x < b\}.$$
For any $x \in I$ it is possible to find an $\varepsilon > 0$ such that $a + \varepsilon < x < b - \varepsilon$. Hence the open ball $B(x, \varepsilon) = \{y : x - \varepsilon < y < x + \varepsilon\}$ belongs to $I$, and so $I$ is open.
Now consider the family $\{U_r : r = 1, \ldots, \infty\}$ of sets of the form $U_r = (-\frac{1}{r}, \frac{1}{r})$. Clearly the origin, $0$, belongs to each $U_r$, and so $0 \in \bigcap_r U_r = U$. Suppose that $U$ is open. Since $0 \in U$, there must be some open ball $B(0, \varepsilon)$ belonging to $U$. Let $r_0$ be an integer such that $r_0 > \frac{1}{\varepsilon}$, so that $\frac{1}{r_0} < \varepsilon$. But then $U_{r_0} = (-\frac{1}{r_0}, \frac{1}{r_0})$ is strictly contained in $(-\varepsilon, \varepsilon)$. Therefore the ball $B(0, \varepsilon) = \{y \in \mathbb{R} : |y| < \varepsilon\} = (-\varepsilon, \varepsilon)$ is not contained in $U_{r_0}$, and so cannot be contained in $U$. Hence $U$ is not open. (In fact $U = \{0\}$.)
3.1.2 A Topology on a Set
We may define a topology on a set $X$ to be any collection of sets in $X$ which satisfies the three properties T1′, T2′, T3′.
Definition 3.3 A topology $\Gamma$ on a set $X$ is a collection of sets in $X$ which satisfies the following properties:
T1′ If $U, V \in \Gamma$, then $U \cap V \in \Gamma$.
T2′ If $J$ is any index set and $U_j \in \Gamma$ for each $j \in J$, then $\bigcup_J U_j \in \Gamma$.
T3′ Both $X$ and the empty set belong to $\Gamma$.
A set $X$ which has a topology $\Gamma$ is called a topological space, and written $(X, \Gamma)$. The sets in $\Gamma$ are called $\Gamma$-open, or simply open. An open set in $\Gamma$ which contains a point $x$ is called a neighbourhood (or nbd.) of $x$.
A base $\mathcal{B}$ for a topology $\Gamma$ is a collection of $\Gamma$-open sets such that any member $U$ of $\Gamma$ can be written as a union of members of $\mathcal{B}$. Equivalently, if $x$ belongs to a $\Gamma$-open set $U$, then there is a member $V$ of the base such that $x \in V \subset U$.
By Lemma 3.2, the collection $\Gamma_d$ of sets which are open with respect to the metric $d$ comprises a topology, called the metric topology $\Gamma_d$ on $X$. We also said that a set $U$ belongs to $\Gamma_d$ iff each point $x \in U$ has a neighbourhood $B_d(x, r)$ which belongs to $U$. Thus the family of sets
$$\mathcal{B} = \{B_d(x, r) : x \in X, r > 0\}$$
forms a base for the metric topology $\Gamma_d$ on $X$.
Fig. 3.4
Consider again the metric topology on $\mathbb{R}$. As we have shown, any set of the form $(a, b)$ is open. Indeed a set of the form $(-\infty, a)$ or $(b, \infty)$ is also open.
In general, if $U$ is an open set (in the topology $\Gamma$ for the topological space $X$), then the complement $U^X = X\backslash U$ of $U$ in $X$ is called closed.
Thus in $\mathbb{R}$, the set $[a, b] = \{x \in \mathbb{R} : a \leq x \leq b\}$ is closed, since it is the complement of the open set $(-\infty, a) \cup (b, \infty)$. Note that the sets $(-\infty, a]$ and $[b, \infty)$ are also closed, since they are the complements of the open sets $(a, \infty)$ and $(-\infty, b)$ respectively.
If $A$ is any set in a topological space, $(X, \Gamma)$, then define the open set, $\operatorname{Int}(A)$, called the interior of $A$, by: $x \in \operatorname{Int}(A)$ iff $x$ is in $A$ and there exists an open set $G$ containing $x$ such that $G \subset A$.
Conversely, define the closed set, $\operatorname{Clos}(A)$, or closure of $A$, by: $x \in \operatorname{Clos}(A)$ iff $x$ is in $X$ and, for any open set $G$ containing $x$, $G \cap A$ is non-empty. Clearly $\operatorname{Int}(A) \subset A \subset \operatorname{Clos}(A)$. (See Exercise ?? at the end of the book.) The boundary of $A$, written $\delta A$, is $\operatorname{Clos}(A) \cap \operatorname{Clos}(A^X)$, where $A^X = X\backslash A$.
For example, if $A$ is an open set, the complement $A^X$ of $A$ in $X$ is a closed set containing $\delta A$ (i.e., $\operatorname{Clos}(A^X) = A^X$). The closure, $\operatorname{Clos}(A)$, on the other hand, is the closed set which intersects $A^X$ precisely in $\delta A$. Clearly if $x$ belongs to the boundary of $A$, then any neighbourhood of $x$ intersects both $A$ and its complement.
A point $x$ is an accumulation or limit point of a set $A$ if any open set $U$ containing $x$ also contains points of $A$ different from $x$. If $A$ is closed, then it contains its limit points. If $A$ is a subset of $X$ and $\operatorname{Clos}(A) = X$, then call $A$ dense in $X$. This means that any point $x \in X$ either belongs to $A$ or, if it belongs to $X\backslash A$, has the property that $U \cap A \neq \Phi$ for any neighbourhood $U$ of $x$. Thus if $x \in X\backslash A$, it is an accumulation point of $A$. If $A$ is dense in $X$, a point outside $A$ may be ‘approximated arbitrarily closely’ by points inside $A$.
For example, the set $\mathbb{R}\backslash\mathbb{Z}$ of non-integer real numbers is dense (as well as open) in $\mathbb{R}$. The set of rational numbers $\mathbb{Q} = \{\frac{p}{q} : p, q \in \mathbb{Z}, q \neq 0\}$ is also dense in $\mathbb{R}$, but not open, since any neighbourhood of a rational number must contain irrational numbers. For each rational $q \in \mathbb{Q}$, note that $\mathbb{R}\backslash\{q\}$ is open and dense. Thus
$$\mathbb{R}\backslash\mathbb{Q} = \bigcap_{q \in \mathbb{Q}}\big(\mathbb{R}\backslash\{q\}\big):$$
the set of irrationals is the “countable” intersection of open dense sets. Moreover, $\mathbb{R}\backslash\mathbb{Q}$ is itself dense. Such a set is called a residual set, and is in a certain sense “more dense” than a dense set.
Fig. 3.5
We now consider different topologies on a set $X$.
Definition 3.4
1. If $(X, \Gamma)$ is a topological space, and $Y$ is a subset of $X$, the relative topology $\Gamma_Y$ on $Y$ is the topology whose open sets are $\{U \cap Y : U \in \Gamma\}$.
2. If $(X, \Gamma)$ and $(Y, \mathcal{S})$ are topological spaces, the product topology $\Gamma \times \mathcal{S}$ on the set $X \times Y$ is the topology whose base consists of all sets of the form $\{U \times V : U \in \Gamma, V \in \mathcal{S}\}$.
We already introduced the relative topology in Sect. 1.1.2, and showed that it forms a topology. To show that $\Gamma \times \mathcal{S}$ is a topology for $X \times Y$, we need to show that the union and intersection properties are satisfied. Suppose that
$$W_i = U_i \times V_i$$
for $i = 1, 2$, where $U_i \in \Gamma$ and $V_i \in \mathcal{S}$. Now $W_1 \cap W_2 = (U_1 \times V_1) \cap (U_2 \times V_2) = (U_1 \cap U_2) \times (V_1 \cap V_2)$. But $U_1 \cap U_2 \in \Gamma$ and $V_1 \cap V_2 \in \mathcal{S}$, since $\Gamma$ and $\mathcal{S}$ are topologies. Thus $W_1 \cap W_2 \in \Gamma \times \mathcal{S}$. Suppose now that $x \in W_1 \cup W_2$. Then $x$ belongs either to $U_1 \times V_1$ or to $U_2 \times V_2$ (or both). In either case $x$ belongs to a member of the base for $\Gamma \times \mathcal{S}$.
Another way of expressing the product topology is that $W$ is open in the product topology iff for any $x \in W$ there exist open sets $U \in \Gamma$ and $V \in \mathcal{S}$ such that $x \in U \times V$ and $U \times V \subset W$.
For example, consider the metric topology $\Gamma$ induced by the norm $\|\ \|$ (absolute value) on $\mathbb{R}$. This gives the product topology $\Gamma^n$ on $\mathbb{R}^n$. Here $U$ is open in $\Gamma^n$ iff for each $x = (x_1, \ldots, x_n) \in U$ there exist open intervals $B(x_i, r_i)$ of radius $r_i$ about the coordinates $x_i$ such that
$$B(x_1, r_1) \times \cdots \times B(x_n, r_n) \subset U.$$
Consider now the Cartesian norm on $\mathbb{R}^n$, where
$$\|x\|_C = \max\{|x_1|, \ldots, |x_n|\}.$$
Fig. 3.6 The product topology in $\mathbb{R}^2$
Fig. 3.7 A Cartesian open ball of radius $r$ in $\mathbb{R}^2$
This induces a Cartesian metric
$$d_C(x, y) = \max\{|x_1 - y_1|, \ldots, |x_n - y_n|\}.$$
A Cartesian open ball of radius $r$ about $x$ is then the set
$$B_C(x, r) = \{y \in \mathbb{R}^n : |y_i - x_i| < r\ \forall i = 1, \ldots, n\}.$$
A set $U$ is open in the Cartesian topology $\Gamma_C$ for $\mathbb{R}^n$ iff for every $x \in U$ there exists some $r > 0$ such that the ball $B_C(x, r) \subset U$.
Suppose now that $U$ is an open set in the product topology $\Gamma^n$ for $\mathbb{R}^n$. At any point $x \in U$, there exist $r_1, \ldots, r_n$, all $> 0$, such that
$$B = B(x_1, r_1) \times \cdots \times B(x_n, r_n) \subset U.$$
Now let $r = \min(r_1, \ldots, r_n)$. Then clearly the Cartesian ball $B_C(x, r)$ belongs to the product ball $B$, and hence to $U$. Thus $U$ is open in the Cartesian topology.
On the other hand, suppose $U$ is open in the Cartesian topology. Then for each point $x$ in $U$, some Cartesian ball $B_C(x, r)$ belongs to $U$. But $B_C(x, r)$ is exactly the product ball $B(x_1, r) \times \cdots \times B(x_n, r)$, so this product ball also belongs to $U$. Hence $U$ is open in the product topology.
Fig. 3.8
We have therefore shown that a set $U$ is open in the product topology $\Gamma^n$ on $\mathbb{R}^n$ iff it is open in the Cartesian topology $\Gamma_C$ on $\mathbb{R}^n$. Thus the two topologies are identical.
We have also introduced the Euclidean and city block norms on $\mathbb{R}^n$. These norms induce metrics, and thus the Euclidean topology $\Gamma_E$ and the city block topology $\Gamma_B$ on $\mathbb{R}^n$. As before, a set $U$ is open in $\Gamma_E$ (resp. $\Gamma_B$) iff for any point $x \in U$ there is an open neighbourhood
$$B_E(x, r) = \Big\{y \in \mathbb{R}^n : \sum_{i=1}^n (y_i - x_i)^2 < r^2\Big\},$$
resp. $B_B(x, r) = \{y \in \mathbb{R}^n : \sum_i |y_i - x_i| < r\}$, of $x$ which belongs to $U$. (The reason we use the term “city block” should be obvious from the shape of a ball under this metric, displayed in Fig. 3.8.)
In fact these three topologies $\Gamma_C$, $\Gamma_E$ and $\Gamma_B$ on $\mathbb{R}^n$ are all identical. We shall show this in the case $n = 2$.
Lemma 3.3 The Cartesian, Euclidean and city block topologies on $\mathbb{R}^2$ are identical.
Proof Suppose that $U$ is an open set in the Euclidean topology $\Gamma_E$ for $\mathbb{R}^2$. Thus at $x \in U$ there is an $r > 0$ such that
$$B_E(x, r) = \Big\{y \in \mathbb{R}^2 : \sum_{i=1}^2 (y_i - x_i)^2 < r^2\Big\} \subset U.$$
From Fig. 3.9 (or since $\sqrt{\sum_i (y_i - x_i)^2} \leq \sum_i |y_i - x_i|$), the city block ball $B_B(x, r)$ is contained in $B_E(x, r)$, and thus in $U$. Hence $U$ is open in $\Gamma_B$.
Next suppose that $U$ is open in $\Gamma_B$, so that $B_B(x, r) \subset U$ for some $r > 0$. The Cartesian ball $B_C(x, \frac{r}{2})$ is contained in $B_B(x, r)$, and thus in $U$. Hence $U$ is open in $\Gamma_C$.
Finally suppose that $U$ is open in $\Gamma_C$, so that $B_C(x, r) \subset U$ for some $r > 0$. The Euclidean ball $B_E(x, r)$ is contained in $B_C(x, r)$, and thus in $U$. Hence $U$ is open in $\Gamma_E$.
Fig. 3.9
Thus U open in ΓE ⇒ U open in ΓB ⇒ U open in ΓC ⇒ U open in ΓE. Consequently all three topologies are identical on ℝ². □
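The three inclusions used in the proof of Lemma 3.3 can be spot-checked numerically. The Python sketch below samples displacement vectors v = y − x in ℝ². For each inclusion it counts any sampled point that lies in the smaller ball but not in the larger one. The helper names (euclid, city, cartesian, check_inclusions) are illustrative and do not come from the text. Random sampling can only fail to find counterexamples; it does not prove the inclusions, which rest on the norm inequalities given in the proof.

```python
import math
import random

# Norms on R^2 defining the three balls of Lemma 3.3 (illustrative helpers).
def euclid(v):
    return math.hypot(v[0], v[1])

def city(v):
    return abs(v[0]) + abs(v[1])

def cartesian(v):
    return max(abs(v[0]), abs(v[1]))

def check_inclusions(r=1.0, samples=100_000, seed=0):
    """Sample v = y - x and count counterexamples to
    B_B(x, r) ⊂ B_E(x, r),  B_C(x, r/2) ⊂ B_B(x, r),  B_E(x, r/2) ⊂ B_C(x, r/2)."""
    rng = random.Random(seed)
    failures = [0, 0, 0]
    for _ in range(samples):
        v = (rng.uniform(-2 * r, 2 * r), rng.uniform(-2 * r, 2 * r))
        if city(v) < r and not euclid(v) < r:
            failures[0] += 1
        if cartesian(v) < r / 2 and not city(v) < r:
            failures[1] += 1
        if euclid(v) < r / 2 and not cartesian(v) < r / 2:
            failures[2] += 1
    return failures

print(check_inclusions())   # [0, 0, 0]
```

The reverse inclusions fail. For example, v = (0.6, 0.6) has Euclidean length about 0.85 < 1 but city block length 1.2. Only the chain of inclusions between balls of suitably chosen radii is needed.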
Suppose that Γ1 and Γ2 are two topologies on a space X. If every open set U in Γ1 is also an open set in Γ2, then we say that Γ2 is as fine as Γ1 and write Γ1 ⊂ Γ2. If Γ1 ⊂ Γ2 and Γ2 ⊂ Γ1, then Γ1 and Γ2 are identical, and we write Γ1 = Γ2. If Γ1 ⊂ Γ2 but Γ2 contains an open set that is not open in Γ1, then we say Γ2 is finer than Γ1. We also say Γ1 is coarser than Γ2.
If d1 and d2 are two metrics on a space X, then under some conditions the topologies Γ1 and Γ2 induced by d1 and d2 are identical. Say the metrics d1 and d2 are equivalent iff for each ε > 0 there exist η1 > 0 and η2 > 0 such that d1(x, y) < η1 ⇒ d2(x, y) < ε, and d2(x, y) < η2 ⇒ d1(x, y) < ε. Another way of expressing this is that B1(x, η1) ⊂ B2(x, ε) and B2(x, η2) ⊂ B1(x, ε), where Bi(x, r) = {y : di(x, y) < r} for i = 1 or 2.
Just as in Lemma 3.3, the Cartesian, Euclidean and city block metrics on ℝⁿ are equivalent. We can use this to show that the induced topologies are identical.
We now show that equivalent metrics induce identical topologies. If f : X → ℝ is a function, and V is a set in X, define sup(f, V), the supremum (from the Latin, supremus) of f on V, to be the smallest number M ∈ ℝ such that f(x) ≤ M for all x ∈ V.
Similarly define inf(f, V), the infimum (again from the Latin, infimus) of f on V, to be the largest number m ∈ ℝ such that f(x) ≥ m for all x ∈ V.
Let d : X × X → ℝ be a metric on X. Consider a point x in X and a subset V of X. Then define the distance from x to V to be d(x, V) = inf(d(x, −), V), where d(x, −) : V → ℝ is the function d(x, −)(y) = d(x, y).
Suppose now that U is an open set in the topology Γ1 induced by the metric d1. For any point x ∈ U there exists r > 0 such that B1(x, r) ⊂ U, where B1(x, r) = {y ∈ X : d1(x, y) < r}. Since we assume the metrics d1 and d2 are equivalent, there must exist s > 0, say, such that
B2(x, s) ⊂ B1(x, r)
Fig. 3.10
where B2(x, s) = {y ∈ X : d2(x, y) < s}. Indeed one may choose s = d2(x, X\B1(x, r)), where X\B1(x, r) is the complement of B1(x, r) in X (see Fig. 3.10). Clearly the set U must be open in Γ2, and so Γ2 is as fine as Γ1. In the same way, if U is open in Γ2, so that B2(x, s) ⊂ U for some s > 0, there exists t > 0 such that B1(x, t) ⊂ B2(x, s), where again one may choose t = d1(x, X\B2(x, s)). Hence if U is open in Γ2 it is open in Γ1. Thus Γ1 is as fine as Γ2. As a consequence Γ1 and Γ2 are identical.
Thus we obtain the following lemma.
Lemma 3.4 The product topology Γⁿ, the Euclidean topology ΓE, the Cartesian topology ΓC and the city block topology ΓB are all identical on ℝⁿ.
As a consequence we may use, as convenient, any one of these three metrics, or any other equivalent metric, on ℝⁿ, knowing that topological results are unchanged.
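The choice of s and t in the argument for equivalent metrics can be made concrete. Take, as an illustrative assumption, X = ℝ², d1 the Euclidean metric and d2 the city block metric. Both norms are positively homogeneous, so the nearest points of the complement X\B1(x, r) lie on the sphere d1(x, y) = r. The sketch below therefore approximates s = d2(x, X\B1(x, r)) by sampling that sphere by direction, and approximates t in the same way. The function name dist_to_complement is hypothetical.

```python
import math

def euclid(v):
    """Euclidean norm on R^2 (playing the role of d1 in this illustration)."""
    return math.hypot(v[0], v[1])

def city(v):
    """City block norm on R^2 (playing the role of d2 in this illustration)."""
    return abs(v[0]) + abs(v[1])

def dist_to_complement(inner, outer, radius, steps=3600):
    """Approximate the outer-distance from x to the complement of the inner-ball
    of the given radius about x (translated so that x = 0).  Because both norms
    are positively homogeneous, it suffices to search the sphere
    inner(v) = radius, which we sample by direction."""
    best = math.inf
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        u = (math.cos(theta), math.sin(theta))
        scale = radius / inner(u)
        v = (scale * u[0], scale * u[1])     # inner(v) == radius
        best = min(best, outer(v))
    return best

r = 1.0
s = dist_to_complement(euclid, city, r)   # s = d2(x, X \ B1(x, r))
t = dist_to_complement(city, euclid, s)   # t = d1(x, X \ B2(x, s))
print(s, t)                               # close to 1 and 1/sqrt(2)
```

Analytically one expects s = r, attained on the coordinate axes, and t = s/√2, attained on the diagonals. Both are strictly positive, as the argument requires.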
3.2 Continuity
Suppose that (X, Γ) and (Y, S) are two topological spaces, and f : X → Y is a function between X and Y. Say that f is continuous (with respect to the topologies Γ and S) iff for any set U in S (i.e., U is S-open) the set f⁻¹(U) = {x ∈ X : f(x) ∈ U} is Γ-open.
This definition can be given an alternative form in the case when X and Y are metric spaces, with metrics d1, d2, say.
Consider a point x0 in the domain of f. For any ε > 0 the ball
$$B_2(f(x_0),\varepsilon)=\{y\in Y : d_2(f(x_0),y)<\varepsilon\}$$
is open. For continuity, we require that the inverse image of this ball be open. Since x0 belongs to this inverse image, there must then exist some δ > 0 such that the ball
$$B_1(x_0,\delta)=\{x\in X : d_1(x_0,x)<\delta\}$$
belongs to f⁻¹(B2(f(x0), ε)). Thus x ∈ B1(x0, δ) ⇒ f(x) ∈ B2(f(x0), ε). Therefore say f is continuous at x0 ∈ X iff for any ε > 0, ∃ δ > 0 such that
$$f\big(B_1(x_0,\delta)\big)\subset B_2\big(f(x_0),\varepsilon\big).$$
Fig. 3.11
In the case that X and Y have norms ‖ ‖X and ‖ ‖Y, we may say that f is continuous at x0 iff for any ε > 0, ∃ δ > 0 such that
$$\|x-x_0\|_X<\delta\ \Rightarrow\ \|f(x)-f(x_0)\|_Y<\varepsilon.$$
Then f is continuous on X iff f is continuous at every point x in its domain. If X, Y are vector spaces then we may check the continuity of a function f : X → Y by using the metric or norm form of the definition.
For example suppose f : ℝ → ℝ has the graph given in Fig. 3.11. Clearly f is not continuous. To see this formally, let f(x0) = y0 and choose ε such that |y1 − y0| > ε. If δ is small enough, then x ∈ (x0 − δ, x0) implies f(x) ∈ (y0 − ε, y0). However for any δ > 0 it is the case that x ∈ (x0, x0 + δ) implies f(x) > y0 + ε. Thus there exists no δ > 0 such that
$$x\in(x_0-\delta,x_0+\delta)\ \Rightarrow\ f(x)\in(y_0-\varepsilon,y_0+\varepsilon).$$
Hence f is not continuous at x0.
We can give an alternative definition of continuity. A sequence of points in a space X is a collection of points {xk : k ∈ Z}, indexed by the positive integers Z. The sequence is written (xk). The sequence (xk) in the metric space (X, d) has a limit x iff ∀ε > 0 ∃k0 ∈ Z such that k > k0 implies d(xk, x) < ε.
In this case write xk → x as k → ∞, or Lim_{k→∞} xk = x. Note that x is then an accumulation point of the sequence {x1, . . .}. More generally (xk) → x iff for any open set G containing x, all but a finite number of points in the sequence (xk) belong to G.
Thus say f is continuous at x0 iff, for every sequence (xk) in X,
xk → x0 implies f(xk) → f(x0).
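The sequential criterion can be illustrated with a function that jumps, in the spirit of Fig. 3.11. The function in the figure is not specified, so the sketch uses a hypothetical stand-in, step, which jumps at x0 = 0. It measures the largest gap |f(xk) − f(x0)| over the tail of a finite sequence xk → x0. This finite computation is only a heuristic, not a proof of continuity or discontinuity.

```python
def step(x):
    """Hypothetical function with a jump at x0 = 0: 0 for x <= 0, 1 for x > 0."""
    return 0.0 if x <= 0 else 1.0

def square(x):
    """A continuous function, for comparison."""
    return x * x

def sequence_test(f, x0, seq):
    """Largest |f(x_k) - f(x0)| over the second half of a finite sequence x_k -> x0."""
    tail = seq[len(seq) // 2:]
    return max(abs(f(xk) - f(x0)) for xk in tail)

xs = [1.0 / k for k in range(1, 10_001)]          # x_k = 1/k -> 0 from the right

print(sequence_test(step, 0.0, xs))               # 1.0: f(x_k) = 1 for all k, but f(0) = 0
print(sequence_test(square, 0.0, xs))             # about 4e-8, shrinking as k grows
print(sequence_test(step, 0.0, [-x for x in xs])) # 0.0: approaching from the left
```

The left-hand sequence xk = −1/k gives a gap of 0 for step. The criterion must hold for every sequence converging to x0, so one well-behaved sequence is not enough to establish continuity.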
Example 3.1 Consider the function f : ℝ → ℝ given by
$$f(x)=\begin{cases} x\sin\dfrac{1}{x} & \text{if } x\neq 0,\\[4pt] 0 & \text{if } x=0.\end{cases}$$
Now x sin(1/x) = (sin y)/y, where y = 1/x. Consider a sequence (xk) where Lim_{k→∞} xk = 0. We shall write this limit as x → 0. Then
$$\lim_{x\to 0}x\sin\frac{1}{x}=\lim_{|y|\to\infty}\frac{\sin y}{y}=0,$$
since |sin y| is bounded above by 1 and Lim_{|y|→∞} 1/|y| = 0. Thus Lim_{x→0} f(x) = 0. But f(0) = 0, and so xk → 0 implies f(xk) → 0. Hence f is continuous at 0.
On the other hand suppose g(x) = sin(1/x). Clearly g(x) has no limit as x → 0. To see this, observe that along the sequence xk = 1/(2πk) we have g(xk) = 0 for all k, while along xk = 1/(2πk + π/2) we have g(xk) = 1 for all k. Both sequences converge to 0. So there is no point y ∈ [−1, 1] such that, for every sequence (xk) → 0 and every neighborhood G of y, g(xk) ∈ G whenever k > k0.
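The contrast between f(x) = x sin(1/x) and g(x) = sin(1/x) in Example 3.1 can be seen numerically along the two sequences above, xk = 1/(2πk) and xk = 1/(2πk + π/2). The sketch uses floating-point arithmetic, so values such as sin(2πk) are only approximately 0.

```python
import math

def f(x):
    """f(x) = x sin(1/x) for x != 0, and f(0) = 0 (Example 3.1)."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0

def g(x):
    """g(x) = sin(1/x), defined for x != 0."""
    return math.sin(1.0 / x)

ks = range(1, 1001)
a = [1.0 / (2 * math.pi * k) for k in ks]                 # sin(1/a_k) = sin(2πk) = 0
b = [1.0 / (2 * math.pi * k + math.pi / 2) for k in ks]   # sin(1/b_k) = 1

# |f(x)| <= |x|, so f(x_k) -> 0 = f(0) along both sequences.
print(all(abs(f(x)) <= abs(x) for x in a + b))             # True
# g stays near 0 along a and near 1 along b, so g has no limit at 0.
print(max(abs(g(x)) for x in a), min(g(x) for x in b))
```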
Any linear function between finite-dimensional vector spaces is continuous. Thus the set of continuous functions contains the set of linear functions when the domain is finite-dimensional. To see this, suppose that f : V → W is a linear transformation between normed vector spaces. (Note that V and W may be infinite-dimensional.) Let ‖ ‖v and ‖ ‖w be the norms on V and W respectively.
Say that f is bounded iff ∃ B > 0 such that ‖f(x)‖w ≤ B‖x‖v for all x ∈ V. Suppose now that f is bounded. Given x0 ∈ V and ε > 0, we seek a δ > 0 such that ‖x − x0‖v < δ implies ‖f(x) − f(x0)‖w < ε. Now
$$\|f(x)-f(x_0)\|_w=\|f(x-x_0)\|_w\le B\|x-x_0\|_v,$$
since f is linear and bounded. Choose δ = ε/B. Then
$$\|x-x_0\|_v<\delta\ \Rightarrow\ \|f(x)-f(x_0)\|_w\le B\|x-x_0\|_v<B\delta=\varepsilon.$$
Thus if f is linear and bounded, it is continuous.
Lemma 3.5 Any linear transformation f : V → W is bounded, and thus continuous, if V is finite-dimensional (of dimension n).
Proof Use the Cartesian norm ‖ ‖c on V, and let ‖ ‖w be the norm on W. Let e1, . . . , en be a basis for V, write x = ∑ xi ei, and set
$$\|x\|_c=\sup\{|x_i| : i=1,\dots,n\},\qquad e=\sup\{\|f(e_i)\|_w : i=1,\dots,n\}.$$
Now f(x) = ∑_{i=1}^{n} xi f(ei). By the triangle inequality, and since ‖ay‖w = |a| ‖y‖w,
$$\|f(x)\|_w\le\sum_{i=1}^{n}\|x_i f(e_i)\|_w=\sum_{i=1}^{n}|x_i|\,\|f(e_i)\|_w\le n\,e\,\|x\|_c.$$
Thus f is bounded, and hence continuous with respect to the norms ‖ ‖c, ‖ ‖w. But for any other norm ‖ ‖v it can be shown that there exists B′ > 0 such that ‖x‖c ≤ B′‖x‖v, so that ‖f(x)‖w ≤ n e B′ ‖x‖v. Thus f is bounded, and hence continuous, with respect to the norms ‖ ‖v, ‖ ‖w. □
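The bound ‖f(x)‖ ≤ n e ‖x‖c from the proof of Lemma 3.5 can be checked for a concrete matrix. This sketch assumes, for illustration only, that W = ℝᵐ also carries the Cartesian (sup) norm. The map f is a hypothetical 2 × 3 matrix whose columns are f(e1), f(e2), f(e3). Random sampling should then never produce a ratio ‖f(x)‖c/‖x‖c above n e.

```python
import random

def apply(A, x):
    """Matrix-vector product; A is a list of m rows, each of length n."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def sup_norm(v):
    """Cartesian (sup) norm ||v||_c = max |v_i|."""
    return max(abs(c) for c in v)

def lemma_bound(A):
    """B = n * e with e = max_i ||f(e_i)||_c.  f(e_i) is the i-th column of A,
    so e is simply the largest absolute entry of A."""
    n = len(A[0])
    e = max(abs(a) for row in A for a in row)
    return n * e

# Hypothetical linear map f : R^3 -> R^2, given by its matrix.
A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
B = lemma_bound(A)                     # n = 3, e = 3.0, so B = 9.0

rng = random.Random(42)
worst = 0.0
for _ in range(10_000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    worst = max(worst, sup_norm(apply(A, x)) / sup_norm(x))
print(B, worst <= B)                   # 9.0 True
```

The bound n e = 9 is far from tight here: the sampled ratios stay at or below 4, the largest absolute row sum. Lemma 3.5 only needs some finite B, after which δ = ε/B works as in the argument above.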
Consider now the set L(ℝⁿ, ℝᵐ) of linear functions from ℝⁿ to ℝᵐ. Clearly if f, g ∈ L(ℝⁿ, ℝᵐ), then the sum f + g, defined by (f + g)(x) = f(x) + g(x), is also linear. Likewise, for any α ∈ ℝ, αf, defined by (αf)(x) = α(f(x)), is linear.
Hence L(ℝⁿ, ℝᵐ) is a vector space over ℝ. Since ℝⁿ is finite-dimensional, by Lemma 3.5 any member of L(ℝⁿ, ℝᵐ) is bounded. Therefore for any f ∈ L(ℝⁿ, ℝᵐ) we may define
$$\|f\|=\sup_{x\in\mathbb{R}^n}\Big\{\frac{\|f(x)\|}{\|x\|} : \|x\|\neq 0\Big\}.$$
Since f is bounded this is defined. Moreover ‖f‖ = 0 only if f is the zero function. By definition ‖f‖ is the smallest real number B such that ‖f(x)‖ ≤ B‖x‖ for all x. In particular ‖f(x)‖ ≤ ‖f‖ ‖x‖. If f, g ∈ L(ℝⁿ, ℝᵐ), then
$$\|(f+g)(x)\|=\|f(x)+g(x)\|\le\|f(x)\|+\|g(x)\|\le\|f\|\,\|x\|+\|g\|\,\|x\|=(\|f\|+\|g\|)\|x\|.$$
Thus ‖f + g‖ ≤ ‖f‖ + ‖g‖.
Hence ‖ ‖ on L(ℝⁿ, ℝᵐ) satisfies the triangle inequality, and so L(ℝⁿ, ℝᵐ) is a normed vector space. This in turn implies that L(ℝⁿ, ℝᵐ) has a metric and thus a topology. It is often useful to use the metrics
$$d_1(f,g)=\sup\{\|f(x)-g(x)\| : x\in\mathbb{R}^n,\ \|x\|\le 1\},$$
$$d_2(f,g)=\sup\{|f_i(x)-g_i(x)| : i=1,\dots,m,\ x\in\mathbb{R}^n,\ \|x\|\le 1\},$$
where f = (f1, . . . , fm) and g = (g1, . . . , gm). (The restriction to ‖x‖ ≤ 1 is needed: for linear f ≠ g the supremum over all of ℝⁿ would be infinite.) We write L1(ℝⁿ, ℝᵐ) and L2(ℝⁿ, ℝᵐ) for the set L(ℝⁿ, ℝᵐ) with the topologies induced by d1 and d2 respectively. Clearly these two topologies are identical.
Alternatively, choose bases for ℝⁿ and ℝᵐ and consider the matrix representation function
$$M : (L(\mathbb{R}^n,\mathbb{R}^m),+)\to(M(n,m),+).$$
On the right-hand side we add matrices element by element under the rule (aij) + (bij) = (aij + bij). With this operation M(n, m) is also a vector space. Clearly we may choose a basis for M(n, m) to be {Eij : i = 1, . . . , n; j = 1, . . . , m}, where Eij is the elementary matrix with 1 in the ith column and jth row and 0 elsewhere. Thus M(n, m) is a vector space of dimension nm. Since M is a linear bijection, L(ℝⁿ, ℝᵐ) is also of dimension nm.
A norm on M(n, m) is given by
$$\|A\|=\sup\{|a_{ij}| : i=1,\dots,n;\ j=1,\dots,m\},$$
where A = (aij). This in turn defines a metric and thus a topology on M(n, m). Finally this defines a topology on L(ℝⁿ, ℝᵐ) as follows. For any open set U in M(n, m), let V = M⁻¹(U) and call V open. The base for the topology on L(ℝⁿ, ℝᵐ) then consists of all sets of this form. One can show that the topology induced on L(ℝⁿ, ℝᵐ) in this way is independent of the choice of bases. We call this the induced topology on L(ℝⁿ, ℝᵐ) and write L(ℝⁿ, ℝᵐ) for this topological space. If we consider the
norm topology on M(n, m) and the induced topology L(ℝⁿ, ℝᵐ), then the representation map is continuous. Moreover the two topologies induced by the metrics d1 and d2 on L(ℝⁿ, ℝᵐ) are identical to the induced topology L(ℝⁿ, ℝᵐ). Thus M is also continuous when these metric topologies are used for its domain. (Exercise ?? at the end of the book is devoted to this observation.)
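The sketch below assumes the Cartesian (sup) norm on both ℝⁿ and ℝᵐ. Under that assumption, the norm ‖f‖ defined above equals the largest absolute row sum of the matrix of f, a standard fact. The sketch computes ‖f‖ both by that formula and by brute force over the vertices of the unit cube, and compares it with the entry-wise norm ‖A‖ = sup |aij| on M(n, m). The inequalities sup |aij| ≤ ‖f‖ ≤ n sup |aij| show that the two norms are equivalent. This is consistent with the statement above that the metric topologies and the induced topology coincide. The matrix is a hypothetical example.

```python
import itertools

def op_norm_sup(A):
    """||f|| = sup ||Ax||_inf / ||x||_inf, which for the sup norm equals the
    largest absolute row sum of A (A is a list of m rows of length n)."""
    return max(sum(abs(a) for a in row) for row in A)

def op_norm_bruteforce(A):
    """The same quantity by brute force: x -> ||Ax||_inf is convex, so its
    maximum over the unit cube ||x||_inf <= 1 is attained at a vertex."""
    n = len(A[0])
    best = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        Ax = [sum(a * s for a, s in zip(row, signs)) for row in A]
        best = max(best, max(abs(c) for c in Ax))
    return best

def max_entry(A):
    """The entry-wise norm ||A|| = sup |a_ij| on M(n, m)."""
    return max(abs(a) for row in A for a in row)

A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
n = len(A[0])
print(op_norm_sup(A), op_norm_bruteforce(A), max_entry(A))   # 4.0 4.0 3.0
# Equivalence of the two norms: max_entry <= operator norm <= n * max_entry.
assert max_entry(A) <= op_norm_sup(A) <= n * max_entry(A)
```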
If V is an infinite-dimensional vector space and f ∈ L(V, W), then f need not be continuous or bounded. However the subset of L(V, W) consisting of bounded, and thus continuous, maps admits a norm and thus a topology. So we may write this subset as L(V, W).
Now let C⁰(ℝⁿ, ℝᵐ) be the set of continuous functions from ℝⁿ to ℝᵐ. We now show that C⁰(ℝⁿ, ℝᵐ) is a vector space.
Lemma 3.6 C⁰(ℝⁿ, ℝᵐ) is a vector space over ℝ.
Proof Suppose that f and g are both continuous maps. At any x0 ∈ ℝⁿ, and for any ε > 0, ∃ δ1, δ2 > 0 such that
$$\|x-x_0\|<\delta_1\ \Rightarrow\ \|f(x)-f(x_0)\|<\tfrac{1}{2}\varepsilon,$$
$$\|x-x_0\|<\delta_2\ \Rightarrow\ \|g(x)-g(x_0)\|<\tfrac{1}{2}\varepsilon.$$
Let δ = min(δ1, δ2). Then ‖x − x0‖ < δ implies
$$\|(f+g)(x)-(f+g)(x_0)\|=\|f(x)-f(x_0)+g(x)-g(x_0)\|\le\|f(x)-f(x_0)\|+\|g(x)-g(x_0)\|<\tfrac{1}{2}\varepsilon+\tfrac{1}{2}\varepsilon=\varepsilon.$$
Thus f + g ∈ C⁰(ℝⁿ, ℝᵐ). Also for any α ∈ ℝ with α ≠ 0, and any ε > 0, ∃ δ > 0 such that
$$\|x-x_0\|<\delta\ \Rightarrow\ \|f(x)-f(x_0)\|<\frac{\varepsilon}{|\alpha|}.$$
Therefore ‖(αf)(x) − (αf)(x0)‖ = |α| ‖f(x) − f(x0)‖ < ε. (If α = 0 then αf is the zero function, which is trivially continuous.) Thus αf ∈ C⁰(ℝⁿ, ℝᵐ). Hence C⁰(ℝⁿ, ℝᵐ) is a vector space. □
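The choice δ = min(δ1, δ2) in the proof of Lemma 3.6 can be made explicit for two illustrative functions on ℝ: f(x) = 3x and g(x) = x². Each is given an explicit δ for the tolerance ε/2. The sketch then samples points strictly inside the δ-interval around x0 and checks that |(f + g)(x) − (f + g)(x0)| < ε there. This is a sanity check of the construction, not a proof.

```python
import random

def f(x):
    return 3.0 * x

def g(x):
    return x * x

def delta_f(eps):
    # |f(x) - f(x0)| = 3|x - x0|, so |x - x0| < eps/3 gives a gap below eps.
    return eps / 3.0

def delta_g(eps, x0):
    # If |x - x0| < 1 then |x + x0| <= 2|x0| + 1, so |x^2 - x0^2| < eps
    # as soon as also |x - x0| < eps / (2|x0| + 1).
    return min(1.0, eps / (2.0 * abs(x0) + 1.0))

def delta_sum(eps, x0):
    """delta = min(delta1, delta2), each chosen for eps/2, as in Lemma 3.6."""
    return min(delta_f(eps / 2), delta_g(eps / 2, x0))

def check(x0, eps, trials=10_000, seed=0):
    """Sample x with |x - x0| < delta and test |(f+g)(x) - (f+g)(x0)| < eps."""
    rng = random.Random(seed)
    d = delta_sum(eps, x0)
    for _ in range(trials):
        x = x0 + rng.uniform(-d, d) * 0.999999    # stay strictly inside
        if abs((f(x) + g(x)) - (f(x0) + g(x0))) >= eps:
            return False
    return True

print(check(2.0, 0.1), check(-5.0, 1e-3))   # True True
```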
Since L(ℝⁿ, ℝᵐ) is contained in C⁰(ℝⁿ, ℝᵐ) (by Lemma 3.5) and is closed under addition and scalar multiplication, it is a vector subspace of dimension nm of the vector space C⁰(ℝⁿ, ℝᵐ). Note however that C⁰(ℝⁿ, ℝᵐ) is an infinite-dimensional vector space (a function space).
Lemma 3.7 If (X, Γ), (Y, S) and (Z, R) are topological spaces, and C⁰((X, Γ), (Y, S)), C⁰((Y, S), (Z, R)) and C⁰((X, Γ), (Z, R)) are the sets of functions which are continuous with respect to these topologies, then the composition operator ∘ maps C⁰((X, Γ), (Y, S)) × C⁰((Y, S), (Z, R)) to C⁰((X, Γ), (Z, R)).
Proof Suppose f : (X, Γ) → (Y, S) and g : (Y, S) → (Z, R) are continuous. We seek to show that g ∘ f : (X, Γ) → (Z, R) is continuous. Choose any open set U in Z. By continuity of g, g⁻¹(U) is an S-open set in Y. But by the continuity of
f, f⁻¹(g⁻¹(U)) is a Γ-open set in X. However f⁻¹(g⁻¹(U)) = (g ∘ f)⁻¹(U). Thus g ∘ f is continuous. □
Note therefore that if f ∈ L(ℝⁿ, ℝᵐ) and g ∈ L(ℝᵐ, ℝᵏ), then g ∘ f ∈ L(ℝⁿ, ℝᵏ) will also be continuous.
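Lemma 3.7 uses nothing but preimages, so it can be checked mechanically on small finite topological spaces. In the sketch below the spaces, topologies and maps are hypothetical examples. Each topology is a set of frozensets, and a map is continuous iff the preimage of every open set is open.

```python
def preimage(h, domain, U):
    """h^{-1}(U) for a map h given as a dict on a finite domain."""
    return frozenset(x for x in domain if h[x] in U)

def is_continuous(h, domain, top_dom, top_cod):
    """h is continuous iff the preimage of every open set is open."""
    return all(preimage(h, domain, U) in top_dom for U in top_cod)

# Hypothetical finite topological spaces; a topology is a set of frozensets.
X = {1, 2, 3}
GammaX = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset(X)}
Y = {"a", "b"}
SY = {frozenset(), frozenset({"a"}), frozenset(Y)}
Z = {0, 1}
RZ = {frozenset(), frozenset({0}), frozenset(Z)}

f = {1: "a", 2: "a", 3: "b"}
g = {"a": 0, "b": 1}
gf = {x: g[f[x]] for x in X}                   # the composition g o f

print(is_continuous(f, X, GammaX, SY),
      is_continuous(g, Y, SY, RZ),
      is_continuous(gf, X, GammaX, RZ))        # True True True
# The identity used in the proof: (g o f)^{-1}(U) = f^{-1}(g^{-1}(U)).
print(all(preimage(gf, X, U) == preimage(f, X, preimage(g, Y, U)) for U in RZ))  # True
```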
3.3 Compactness
Let (X, Γ) be a topological space. An open cover for X is a family {Uj : j ∈ J} of Γ-open sets such that X = ⋃_{j∈J} Uj.
If U = {Uj : j ∈ J} is an open cover for X, a subcover of U is an open cover U′ of X where U′ = {Uj : j ∈ J′} and the index set J′ is a subset of J. The subcover is called finite if J′ is a finite set (i.e., |J′|, the number of elements of J′, is finite).
Definition 3.5 A topological space (X, Γ) is called compact iff any open cover of X has a finite subcover. If Y is a subset of the topological space (X, Γ), then Y is compact iff the topological space (Y, ΓY) is compact. Here ΓY is the topology induced on Y by Γ. (See Sect. 1.1.3.)
Say that a family CJ = {Cj : j ∈ J} of closed sets in a topological space (X, Γ) has the finite intersection property (FIP) iff, whenever J′ is a finite subset of J, ⋂_{j∈J′} Cj is non-empty.
Lemma 3.8 A topological space (X, Γ) is compact iff, whenever CJ is a family of closed sets with the finite intersection property, ⋂_{j∈J} Cj is non-empty.
Proof We establish first of all that UJ = {Uj : j ∈ J} is an open cover of X iff the family CJ = {Cj = X\Uj : j ∈ J} of closed sets has empty intersection. Now, by De Morgan's laws,
$$\bigcup_{J}U_j=\bigcup_{J}(X\setminus C_j)=X\setminus\Big(\bigcap_{J}C_j\Big).$$
Thus
$$\bigcup_{J}U_j=X\quad\text{iff}\quad\bigcap_{J}C_j=\Phi.$$
To establish necessity, suppose that X is compact and that the family CJ has the FIP. If CJ in fact has empty intersection, then UJ = {X\Cj : j ∈ J} must be an open cover. Since X is compact there exists a finite set J′ ⊂ J such that UJ′ is
a cover. But then CJ′ has empty intersection, contradicting the FIP. Thus CJ has non-empty intersection. To establish sufficiency, suppose that any family CJ satisfying the FIP has non-empty intersection. Let UJ = {Uj : j ∈ J} be an open cover for X. If UJ has no finite subcover, then for any finite J′ ⊂ J the family CJ′ = {X\Uj : j ∈ J′} must have non-empty intersection. Thus the family CJ = {X\Uj : j ∈ J} has the FIP, and so, by hypothesis, has non-empty intersection. By the equivalence established above, this contradicts the assumption that UJ is a cover. Thus (X, Γ) is compact. □
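The first step in the proof of Lemma 3.8 is De Morgan's law: the complements Uj = X\Cj cover X exactly when the Cj have empty intersection. The sketch checks this identity on a small finite set. Every family on a finite set is finite, so the example illustrates only the set identity, not compactness. The families are hypothetical.

```python
def union(sets):
    """Union of a finite family of sets."""
    out = set()
    for s in sets:
        out |= s
    return out

def intersection(sets, universe):
    """Intersection of a finite family; the empty family gives the universe."""
    out = set(universe)
    for s in sets:
        out &= s
    return out

X = set(range(6))
C = [{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 5}]   # hypothetical closed sets C_j
U = [X - c for c in C]                        # U_j = X \ C_j

print(union(U) == X - intersection(C, X))     # True (De Morgan)
print(union(U) == X, intersection(C, X))      # False {2, 3}

C2 = C + [{0, 1, 4, 5}]                       # now the C_j have empty intersection
U2 = [X - c for c in C2]
print(union(U2) == X, intersection(C2, X))    # True set()
```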
This lemma allows us to establish conditions under which a preference relation P on X has a non-empty choice CP(X). (See Lemma 1.8 for the case with X finite.) Say the preference relation P on the topological space X is lower demi-continuous (LDC) iff, for the inverse correspondence φP⁻¹ : X → X given by x ↦ {y ∈ X : xPy}, the set φP⁻¹(x) is open for all x in X.
Lemma 3.9 If X is a compact topological space and P is an acyclic and lower demi-continuous preference on X, then there exists a choice x of P in X.
Proof Suppose on the contrary that there is no choice. Then for all x ∈ X there exists some y ∈ X such that yPx, so that x ∈ φP⁻¹(y). Hence U = {φP⁻¹(y) : y ∈ X} is a cover for X. Moreover, for each y ∈ X, φP⁻¹(y) is open. Since X is compact, there exists a finite subcover of U. That is to say, there exists a finite set A in X such that U′ = {φP⁻¹(y) : y ∈ A} is a cover for X. In particular, for each x ∈ A there is some y ∈ A such that x ∈ φP⁻¹(y), or yPx. But then CP(A) = {x ∈ A : yPx for no y ∈ A} = Φ. Now P is acyclic on X, and thus acyclic on A. Hence, by the acyclicity of P and Lemma 1.8, CP(A) ≠ Φ. By this contradiction, U = {φP⁻¹(y) : y ∈ X} cannot be a cover. That is to say, there is some x ∈ X such that x ∈ φP⁻¹(y) for no y ∈ X. But then yPx for no y ∈ X, so x ∈ CP(X) and x is a choice on X. □
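On a finite set A, the choice CP(A) used in the proof of Lemma 3.9 can be computed directly. In the sketch, the relations are hypothetical examples stored as sets of pairs (y, x), read as yPx. It shows a non-empty choice for an acyclic P, as Lemma 1.8 guarantees, and an empty choice on a 3-cycle.

```python
def choice(A, P):
    """C_P(A) = {x in A : y P x for no y in A}.
    P is a set of ordered pairs (y, x), read as 'y P x' (y is preferred to x)."""
    return {x for x in A if not any((y, x) in P for y in A)}

A = {"a", "b", "c", "d"}
P_acyclic = {("a", "b"), ("b", "c"), ("a", "d")}   # a P b, b P c, a P d: no cycle
P_cyclic = {("a", "b"), ("b", "c"), ("c", "a")}    # a P b P c P a

print(choice(A, P_acyclic))                # {'a'}
print(choice({"a", "b", "c"}, P_cyclic))   # set()
```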
This lemma can be used to show that a continuous function f : X → ℝ from a compact topological space X into the reals attains its bounds. Remember that we defined the supremum and infimum of f on a set Y to be
1. sup(f, Y) = M such that f(x) ≤ M for all x ∈ Y, and if f(x) ≤ M′ for all x ∈ Y then M′ ≥ M;
2. inf(f, Y) = m such that f(x) ≥ m for all x ∈ Y, and if f(x) ≥ m′ for all x ∈ Y then m ≥ m′.
Say f attains its upper bound on Y iff there is some xs in Y such that f(xs) = sup(f, Y). Similarly say f attains its lower bound on Y iff there is some xi in Y such that f(xi) = inf(f, Y).
Given the function f : X → ℝ, define a preference P ⊂ X × X by xPy iff f(x) > f(y). Clearly P is acyclic, since > on ℝ is acyclic. Moreover for any x ∈ X,
$$\varphi_P^{-1}(x)=\{y : f(y)<f(x)\}$$
is open when f is continuous.
To see this let U = (−∞, f(x)). Clearly U is an open set in ℝ. Moreover f(y) belongs to the open interval (−∞, f(x)) iff y ∈ φP⁻¹(x). But f is continuous, and so f⁻¹(U) is open in X. Since y ∈ f⁻¹(U) iff y ∈ φP⁻¹(x), φP⁻¹(x) is open for any x ∈ X.
Weierstrass Theorem Let (X, Γ) be a topological space and f : X → ℝ a continuous real-valued function. If Y is a compact subset of X, then f attains its bounds on Y.
Proof As above, for each x ∈ Y define Ux = (−∞, f(x)). Then φP⁻¹(x) = {y ∈ Y : f(y) < f(x)} = f⁻¹(Ux) ∩ Y is an open set in the induced topology on Y. By Lemma 3.9 there exists a choice x in Y, that is, a point x such that φP(x) = {y ∈ Y : f(y) > f(x)} = Φ. But then f(y) > f(x) for no y ∈ Y. Hence f(y) ≤ f(x) for all y ∈ Y. Thus f(x) = sup(f, Y).
In the same way let Q be the relation on X given by xQy iff f(x) < f(y). Q is acyclic and, by the same argument with U = (f(x), ∞), lower demi-continuous. Then there is a choice x′ ∈ Y such that f(y) < f(x′) for no y ∈ Y. Hence f(y) ≥ f(x′) for all y ∈ Y, and so f(x′) = inf(f, Y). Thus f attains its bounds on Y. □
We can develop this result further.
Lemma 3.10 If f : (X, Γ) → (Z, S) is a continuous function between topological spaces, and Y is a compact subset of X, then
f(Y) = {f(y) ∈ Z : y ∈ Y}
is compact.
Proof Let {Wα} be an open cover for f(Y). Then each member Wα of this cover may be expressed in the form Wα = Uα ∩ f(Y), where Uα is an open set in Z. For each α, let Vα = f⁻¹(Uα) ∩ Y. Now each Vα is open in the induced topology on Y. Moreover, for each y ∈ Y there exists some Wα such that f(y) ∈ Wα, and then y ∈ Vα. Thus {Vα} is an open cover for Y. Since Y is compact, {Vα} has a finite subcover {Vα : α ∈ J}. Since f(Vα) ⊂ Wα, {Wα : α ∈ J} is a finite subcover of {Wα}. Thus f(Y) is compact. □
Now a real-valued continuous function f is bounded on a compact set Y (by the Weierstrass Theorem). So f(Y) will be contained in [f(x0), f(x1)], say, for some x0, x1 ∈ Y. Since f(Y) must also be compact, this suggests that a closed set of the form [a, b] must be compact.
For a set Y ⊂ ℝ, define sup(Y) = sup(id, Y), the supremum of Y, and inf(Y) = inf(id, Y), the infimum of Y. Here id : ℝ → ℝ is the identity on ℝ. The set Y ⊂ ℝ is bounded above (or below) iff its supremum (or infimum) is finite. The set is bounded iff it is both bounded above and bounded below. Thus a set of the form [a, b], say with −∞ < a < b < +∞, is bounded.
Heine Borel Theorem A closed bounded interval, [a, b], of the real line is compact.
Sketch of Proof Consider a family C = {[a, ci], [dj, b] : i ∈ I, j ∈ J} of subsets of [a, b] with the finite intersection property. Suppose that neither I nor J is empty. Let d = sup({dj : j ∈ J}), and suppose that for some k ∈ I, ck < d. Then there exists j ∈ J such that ck < dj, and so [a, ck] ∩ [dj, b] = Φ, contradicting the finite intersection property. Thus ci ≥ d for all i ∈ I, and so d ∈ [a, ci] ∩ [dj, b] for all i ∈ I and j ∈ J. Hence the family C has non-empty intersection. By Lemma 3.8, [a, b] is compact. □
Definition 3.6 A topological space (X, Γ) is called Hausdorff iff any two distinct points x, y ∈ X have Γ-open neighbourhoods Ux, Uy such that Ux ∩ Uy = Φ.
Lemma 3.11 If (X, d) is a metric space, then (X, Γd) is Hausdorff, where Γd is the topology on X induced by the metric d.
Proof For two points x ≠ y, let ε = d(x, y) ≠ 0. Define Ux = B(x, ε/3) and Uy = B(y, ε/3).
Clearly, by the triangle inequality, B(x, ε/3) ∩ B(y, ε/3) = Φ. Otherwise there would exist a point z such that d(x, z) < ε/3 and d(z, y) < ε/3, which would imply that d(x, y) ≤ d(x, z) + d(z, y) < 2ε/3 < ε. By this contradiction, the open balls of radius ε/3 do not intersect. Thus (X, Γd) is Hausdorff. □
A Hausdorff topological space is therefore a natural generalisation of a metric space.
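Here is a sampling check of the construction in Lemma 3.11, using the Euclidean metric on ℝ² as an illustrative choice. No sampled point should lie in both B(x, ε/3) and B(y, ε/3), where ε = d(x, y). With radius ε instead, the balls do overlap. Sampling cannot prove disjointness; the triangle-inequality argument above does. The helper no_common_point is hypothetical.

```python
import math
import random

def d(p, q):
    """Euclidean metric on R^2 (math.dist requires Python 3.8+)."""
    return math.dist(p, q)

def no_common_point(x, y, factor=1/3, samples=50_000, seed=0):
    """Sample points near x and y; return False if some sampled z lies in both
    B(x, factor*eps) and B(y, factor*eps), where eps = d(x, y)."""
    eps = d(x, y)
    rad = factor * eps
    rng = random.Random(seed)
    xr = (min(x[0], y[0]) - eps, max(x[0], y[0]) + eps)
    yr = (min(x[1], y[1]) - eps, max(x[1], y[1]) + eps)
    for _ in range(samples):
        z = (rng.uniform(*xr), rng.uniform(*yr))
        if d(x, z) < rad and d(z, y) < rad:
            return False
    return True

x, y = (0.0, 0.0), (1.0, 2.0)
print(no_common_point(x, y))               # True: the eps/3 balls are disjoint
print(no_common_point(x, y, factor=1.0))   # False: the eps balls overlap
```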
Lemma 3.12 If (X, Γ) is a Hausdorff topological space, then any compact subset Y of X is closed.
Proof We seek to show that X\Y is open. To do so we show that for any x ∈ X\Y there exist a neighbourhood G of x and an open set H containing Y such that G ∩ H = Φ. Let x ∈ X\Y, and consider any y ∈ Y. Since X is Hausdorff, there exist a neighbourhood V(y) of y and a neighbourhood U(y), say, of x such that V(y) ∩ U(y) = Φ. The family {V(y) : y ∈ Y} is an open cover of Y, and Y is compact, so there exists a finite subcover {V(y) : y ∈ A}, where A is a finite subset of Y.
Let H = ⋃_{y∈A} V(y) and G = ⋂_{y∈A} U(y). Suppose that G ∩ H ≠ Φ. Then there exists y ∈ A such that V(y) ∩ U(y) is non-empty, a contradiction. Thus G ∩ H = Φ. Since A is finite, G is open. Moreover Y is contained in H, so G ⊂ X\Y. Thus every point of X\Y has an open neighbourhood inside X\Y, so X\Y is open and Y is closed. □
Lemma 3.13 If (X, Γ) is a compact topological space and Y is a closed subset of X, then Y is compact.
Proof Let {Vα} be an open cover for Y in the induced topology, so that each Vα = Uα ∩ Y for some open set Uα ⊂ X. Since X\Y is open, {X\Y} ∪ {Uα} is an open cover for X.
Since X is compact there is a finite subcover. Adding X\Y if necessary, it has the form {X\Y} ∪ {Uj : j ∈ J} with J finite. Since X\Y contains no point of Y, the sets {Uj : j ∈ J} cover Y, and so {Vj : j ∈ J} is a finite subcover of {Vα} for Y. Hence Y is compact. □
Tychonoff's Theorem If (X, Γ) and (Y, S) are two compact topological spaces, then (X × Y, Γ × S), with the product topology, is compact.
Sketch of Proof Consider a cover for X × Y of the form {Uα × Vβ}, where {Uα} is an open cover for X and {Vβ} is an open cover for Y. Since both X and Y are compact, {Uα} and {Vβ} have finite subcovers {Uj}j∈J and {Vk}k∈K. Hence {Uj × Vk : (j, k) ∈ J × K} is a finite subcover for X × Y. (A general cover of X × Y by open rectangles need not be of this form; the full argument, usually made via the tube lemma, is omitted here.) □
As a corollary of this theorem, let Ik = [ak, bk] for k = 1, . . . , n be a family of closed bounded intervals in ℝ. Each interval is compact by the Heine Borel Theorem. Thus the closed cube Iⁿ = I1 × I2 × ⋯ × In in ℝⁿ is compact, by Tychonoff's Theorem (applied n − 1 times). Say that a set Y ⊂ ℝⁿ is bounded iff for each y ∈ Y there exists some finite number K(y) such that ‖x − y‖ < K(y) for all x ∈ Y. Here ‖ ‖ is any convenient norm on ℝⁿ. If Y is bounded, then clearly there exists some closed cube Iⁿ ⊂ ℝⁿ such that Y ⊂ Iⁿ. Thus we obtain:
Lemma 3.14 If Y is a closed bounded subset of ℝⁿ, then Y is compact.
Proof By the above, there is a closed cube Iⁿ such that Y ⊂ Iⁿ. But Iⁿ is compact by Tychonoff's Theorem. Since Y is a closed subset of Iⁿ, Y is compact, by Lemma 3.13. □
In ℝⁿ a compact set Y is one that is both closed and bounded. To see this, note that ℝⁿ is certainly a metric space, and therefore Hausdorff. By Lemma 3.12, if Y is compact then it must be closed. To see that it must be bounded, consider first the unbounded closed interval A = [0, ∞) in ℝ, and the open cover {Uk = (k − 2, k) : k = 1, 2, . . .}. Clearly {(−1, 1), (0, 2), (1, 3), . . .} covers [0, ∞). The members of a finite subfamily are bounded above by K, say (the largest index used), and so the point K does not belong to any of them. Hence [0, ∞) is non-compact. In the same way, if Y ⊂ ℝⁿ is unbounded, then the open balls {B(0, k) : k = 1, 2, . . .} cover Y but no finite subfamily does, so Y is non-compact.
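The cover of [0, ∞) used above can be written down explicitly. The sketch checks that each sampled t ≥ 0 lies in Uk for k = ⌊t⌋ + 1. It also checks that, for a finite subfamily, the point K = max k lies in none of the chosen Uk. The function names and the sample indices are illustrative.

```python
import math

def U(k):
    """The open interval U_k = (k - 2, k), for k = 1, 2, ..."""
    return (k - 2, k)

def covered(t, indices):
    """True if t lies in U_k for some k in indices."""
    return any(a < t < b for a, b in map(U, indices))

# Each t >= 0 lies in U_k with k = floor(t) + 1, so the whole family covers [0, inf).
sample = [0.0, 0.5, 1.0, 7.25, 1e6]
print(all(covered(t, [math.floor(t) + 1]) for t in sample))   # True

# A finite subfamily misses the point K = max(indices), which lies in [0, inf).
indices = [1, 5, 17, 100]
K = max(indices)
print(K, covered(K, indices))   # 100 False
```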
Lemma 3.15 A compact subset Y of ℝ contains its bounds.
Proof Let s = sup(id, Y) and i = inf(id, Y) be the supremum and infimum of Y. Here id : ℝ → ℝ is the identity function. By the discussion above, Y must be bounded, and so i and s must be finite. We seek to show that Y contains these bounds, i.e., that i ∈ Y and s ∈ Y. Suppose for example that s ∉ Y. By Lemma 3.12, Y must be closed, and hence ℝ\Y must be open. But then there exists a neighbourhood (s − ε, s + ε) of s in ℝ\Y. Now y ≤ s for all y ∈ Y, and no point of Y lies in (s − ε, s + ε). This implies that y ≤ s − ε for all y ∈ Y, which contradicts the assumption that s = sup(id, Y). Hence s ∈ Y. A similar argument shows that i ∈ Y. Thus Y ⊂ [i, s] with i, s ∈ Y, and so Y contains its bounds. □
Fig. 3.12
Lemma 3.16 Let (X, Γ) be a topological space and f : X → ℝ a continuous real-valued function. If Y ⊂ X is compact, then there exist points x0 and x1 in Y such that f(x0) ≤ f(y) ≤ f(x1) for all y ∈ Y.
Proof By Lemma 3.10, f(Y) is compact. By Lemma 3.15, f(Y) contains its infimum and supremum. Thus there exist x0, x1 ∈ Y such that
f(x0) ≤ f(y) ≤ f(x1) for all y ∈ Y.
Note that f(Y) must be bounded, and so f(x0) and f(x1) must be finite. □
We have here obtained a second proof of the Weierstrass Theorem that a continuous real-valued function on a compact set attains its bounds, and shown moreover that these bounds are finite. A useful application of this theorem is that if Y is a compact set in ℝⁿ and x ∉ Y, then there is some point y in Y which is nearest to x.
Remember that we defined the distance from a point x in a metric space (X, d) to a subset Y of X to be d(x, Y) = inf(fx, Y), where fx : Y → ℝ is defined by fx(y) = d(x, −)(y) = d(x, y).
Lemma 3.17 Suppose Y is a subset of a metric space (X, d) and x ∈ X. Then the function fx : Y → ℝ given by fx(y) = d(x, y) is continuous.
Proof Consider y1, y2 ∈ Y and suppose, without loss of generality, that d(x, y1) ≥ d(x, y2). Then |d(x, y1) − d(x, y2)| = d(x, y1) − d(x, y2).
By the triangle inequality, d(x, y1) ≤ d(x, y2) + d(y2, y1). Hence |d(x, y1) − d(x, y2)| ≤ d(y1, y2), and so d(y1, y2) < ε ⇒ |d(x, y1) − d(x, y2)| < ε, for any ε > 0.
Thus for any ε > 0, d(y1, y2) < ε ⇒ |fx(y1) − fx(y2)| < ε, and so fx is continuous. □
Lemma 3.18 If Y is a compact subset of a compact metric space X and x ∈ X, then there exists a point y0 ∈ Y such that d(x, Y) = d(x, y0) < ∞.
Proof By Lemma 3.17, the function d(x, −) : Y → ℝ is continuous. By Lemma 3.16, this function attains its lower and upper bounds on Y. Thus there exists y0 ∈ Y such that d(x, y0) = inf(d(x, −), Y) = d(x, Y), where d(x, y0) is finite. □
A point y0 in Y such that d(x, y0) = d(x, Y) is a nearest point in Y to x. Note of course that if x ∈ Y then d(x, Y) = 0. More importantly, when Y is compact, d(x, Y) = 0 if and only if x ∈ Y. To see the necessity of x ∈ Y, suppose that d(x, Y) = 0. Then by Lemma 3.18 there exists y0 ∈ Y such that d(x, y0) = 0. By the definition of a metric, d(x, y0) = 0 iff x = y0, and so x ∈ Y. The point y ∈ Y that is nearest to x depends, of course, on the metric, and need not be unique.
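For a finite set Y, which is compact, the nearest points to x can be found by direct search. The sketch below uses hypothetical points to show that the nearest point depends on the metric and need not be unique. An infinite compact Y would instead require an optimisation method, which is not shown here.

```python
import math

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def city(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def nearest(x, Y, d):
    """Return d(x, Y) and every point of the finite set Y attaining it."""
    dmin = min(d(x, y) for y in Y)
    return dmin, [y for y in Y if d(x, y) == dmin]

x = (0.0, 0.0)
Y = [(3.0, 0.0), (2.0, 2.0)]
print(nearest(x, Y, euclid))   # nearest point is (2.0, 2.0), at distance 2*sqrt(2)
print(nearest(x, Y, city))     # (3.0, [(3.0, 0.0)])
print(nearest(x, [(1.0, 0.0), (0.0, 1.0)], euclid))   # (1.0, [(1.0, 0.0), (0.0, 1.0)])
```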
3.4 Convexity
3.4.1 A Convex Set
If x, y are two points in a vector space X, then the arc [x, y] is the set {z ∈ X : ∃ λ ∈ [0, 1] s.t. z = λx + (1 − λ)y}. A point in the arc [x, y] is called a convex combination of x and y. If Y is a subset of X, then the convex hull con(Y) of Y is the smallest set in X that contains Y and, for every pair of points x, y in it, the arc [x, y]. The set Y is called convex iff con(Y) = Y. The set Y is strictly convex iff, for any distinct x, y ∈ Y, the combination λx + (1 − λ)y, for λ ∈ (0, 1), belongs to the interior of Y.
Note that if Y is a vector subspace of the real vector space X, then Y must be convex. For if x, y ∈ Y, then since λ and (1 − λ) both belong to ℝ, λx + (1 − λ)y ∈ Y.
Definition 3.7 Let Y be a real vector space, or a convex subset of a real vector space, and let f : Y → ℝ be a function. Then f is said to be
1. convex iff f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)
2. concave iff f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y)
3. quasi-concave iff f(λx + (1 − λ)y) ≥ min[f(x), f(y)]
for any x, y ∈ Y and any λ ∈ [0, 1].
Suppose now that f : Y → ℝ, and consider the preference P ⊂ Y × Y induced by f. For notational convenience, from now on we regard P as a correspondence P : Y → Y. That is, define P by
P(x) = {y ∈ Y : f(y) > f(x)}.
If f is quasi-concave then, when y1, y2 ∈ P(x),
$$f(\lambda y_1+(1-\lambda)y_2)\ \ge\ \min[f(y_1),f(y_2)]\ >\ f(x).$$
Hence λy1 + (1 − λ)y2 ∈ P(x). Thus for all x ∈ Y, P(x) is convex.
We shall call a preference correspondence P : Y → Y convex when Y is convex and P(x) is convex for all x ∈ Y.
When a function f : Y → ℝ is quasi-concave, the strict preference correspondence P defined by f is convex. Note also that the weak preference R : Y → Y given by
R(x) = {y ∈ Y : f(y) ≥ f(x)}
will also be convex.
If f : Y → ℝ is a concave function, then it is quasi-concave. To see this, consider x, y ∈ Y and suppose that f(x) ≤ f(y). By concavity,
$$\begin{aligned} f(\lambda x+(1-\lambda)y) &\ge \lambda f(x)+(1-\lambda)f(y)\\ &\ge \lambda f(x)+(1-\lambda)f(x)\\ &= f(x)=\min[f(x),f(y)]. \end{aligned}$$
Thus f is quasi-concave. Note however that a quasi-concave function need be neither convex nor concave. However if f is a linear function, then it is convex, concave and quasi-concave. There is a partial order > on ℝⁿ given by x > y iff xi > yi for all i, where x = (x1, . . . , xn) and y = (y1, . . . , yn). A function f : ℝⁿ → ℝ is weakly monotonically increasing iff f(x) ≥ f(y) for any x, y ∈ ℝⁿ such that x > y. A function f : ℝⁿ → ℝ has decreasing returns to scale iff f is weakly monotonically increasing and concave. A very standard assumption in economic theory is that feasible production of an output has decreasing returns to scale of inputs, and that consumers' utility or preference has decreasing returns to scale in consumption. We shall return to this point below.
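The implications in Definition 3.7 can be spot-checked on a grid. The sketch tests the concavity and quasi-concavity inequalities for √x on [0, 10], and for the increasing function x³, which is quasi-concave but not concave. A grid search can exhibit a violation but cannot prove that none exists. The function names, the grid and the weights λ are illustrative choices.

```python
import math

def violates(f, x, y, lam, kind, tol=1e-12):
    """True if the triple (x, y, lam) violates the stated inequality for f."""
    z = f(lam * x + (1 - lam) * y)
    if kind == "concave":
        return z < lam * f(x) + (1 - lam) * f(y) - tol
    if kind == "quasi-concave":
        return z < min(f(x), f(y)) - tol
    raise ValueError(f"unknown kind: {kind}")

def first_violation(f, points, kind, lams=(0.25, 0.5, 0.75)):
    """Search all pairs of grid points and weights; return the first violation or None."""
    for x in points:
        for y in points:
            for lam in lams:
                if violates(f, x, y, lam, kind):
                    return (x, y, lam)
    return None

def cube(t):
    return t ** 3          # increasing, hence quasi-concave, but not concave

grid = [i / 4 for i in range(41)]   # 0, 0.25, ..., 10

print(first_violation(math.sqrt, grid, "concave"))        # None
print(first_violation(math.sqrt, grid, "quasi-concave"))  # None
print(first_violation(cube, grid, "quasi-concave"))       # None
print(first_violation(cube, grid, "concave"))             # (0.0, 0.25, 0.25)
```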
3.4.2 Examples
Example 3.2 (i) Consider the set X1 = {(x1, x2) ∈ ℝ² : x2 ≥ x1}. Clearly if x2 ≥ x1 and x′2 ≥ x′1 then λx2 + (1 − λ)x′2 ≥ λx1 + (1 − λ)x′1 for λ ∈ [0, 1]. Thus λ(x1, x2) + (1 − λ)(x′1, x′2) = (λx1 + (1 − λ)x′1, λx2 + (1 − λ)x′2) ∈ X1. Hence X1 is convex.
Now consider the set X2 = {(x1, x2) ∈ ℝ² : x2 ≥ x1²}. As Fig. 3.13a indicates, this is a strictly convex set. However the set X3 = {(x1, x2) ∈ ℝ² : |x2| ≥ x1²} is not convex. To see this, take x1 ≠ 0 and x2 < 0 with (x1, x2) ∈ X3, so that x2 ≤ −x1². Then −x2 ≥ x1², so (x1, −x2) ∈ X3 as well. The point (x1, 0) = ½(x1, x2) + ½(x1, −x2) is a convex combination of these two points, yet (x1, 0) ∉ X3 because x1² > 0.
(ii) Consider now the set X4 = {(x1, x2) ∈ ℝ² : x2 > x1³}. As Fig. 3.13b shows, it is possible to choose (x1, x2) and (x′1, x′2) in X4, with x1 < 0, so that their convex combination does not belong to X4. For example, (−2, −7) and (1, 2) belong to X4, but their midpoint (−1/2, −5/2) does not, since −5/2 < (−1/2)³ = −1/8. However X5 = {(x1, x2) ∈ ℝ² : x2 ≥ x1³ and x1 ≥ 0} and X6 = {(x1, x2) ∈ ℝ² : x2 ≤ x1³ and x1 ≤ 0} are both convex sets.
[Fig. 3.13a] [Fig. 3.13b]
(iii) Now consider the set X7 = {(x1, x2) ∈ ℝ² : x1x2 ≥ 1}. From Fig. 3.13c it is clear that the restriction of X7 to the positive quadrant ℝ²₊ = {(x1, x2) ∈ ℝ² : x1 ≥ 0 and x2 ≥ 0} is strictly convex, as is the restriction of X7 to the negative quadrant ℝ²₋ = {(x1, x2) ∈ ℝ² : x1 ≤ 0 and x2 ≤ 0}. However if (x1, x2) ∈ X7 ∩ ℝ²₊ then (−x1, −x2) ∈ X7 ∩ ℝ²₋. The origin (0, 0) = ½(x1, x2) + ½(−x1, −x2) belongs to the convex hull of these two points, yet does not belong to X7. Thus X7 is not convex.
Finally a set of the form
$$X_8 = \{(x_1, x_2) \in \mathbb{R}^2_+ : x_2 \le x_1^{\alpha}\}, \quad \alpha \in (0, 1),$$
is also convex. See Fig. 3.13d.
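Each non-convexity argument in Example 3.2 exhibits two points of a set whose convex combination lies outside it. The following Python sketch checks such witnesses for X3, X4 and X7 with exact rational arithmetic. The particular points are illustrative choices of ours, and the helper names (`in_X3`, `witnesses_nonconvexity`, and so on) are hypothetical.

```python
from fractions import Fraction as Fr

# Membership predicates for the sets of Example 3.2.
def in_X3(p):
    x1, x2 = p
    return abs(x2) >= x1 ** 2

def in_X4(p):
    x1, x2 = p
    return x2 > x1 ** 3

def in_X7(p):
    x1, x2 = p
    return x1 * x2 >= 1

def midpoint(p, q):
    # the convex combination with weight 1/2, computed exactly
    return tuple((a + b) / 2 for a, b in zip(p, q))

def witnesses_nonconvexity(member, p, q):
    """True if p and q lie in the set but their midpoint does not."""
    return member(p) and member(q) and not member(midpoint(p, q))

print(witnesses_nonconvexity(in_X3, (Fr(1), Fr(-2)), (Fr(1), Fr(2))))    # midpoint (1, 0)
print(witnesses_nonconvexity(in_X4, (Fr(-2), Fr(-7)), (Fr(1), Fr(2))))   # midpoint (-1/2, -5/2)
print(witnesses_nonconvexity(in_X7, (Fr(1), Fr(1)), (Fr(-1), Fr(-1))))   # midpoint (0, 0)
```

Using `Fraction` avoids any rounding doubt at the boundary cases such as x1x2 = 1.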
[Fig. 3.13c] [Fig. 3.13d] [Fig. 3.13e]
Example 3.3 (i) Consider the set B = {(x1, x2) ∈ ℝ² : (x1 − a1)² + (x2 − a2)² ≤ r²}. See Fig. 3.13e. This is the closed ball of radius r centred on a = (a1, a2). Suppose that x, y ∈ B and z = λx + (1 − λ)y for λ ∈ [0, 1].
Let ‖ ‖ stand for the Euclidean norm. Then x, y both satisfy ‖x − a‖ ≤ r and ‖y − a‖ ≤ r. Since z − a = λ(x − a) + (1 − λ)(y − a), the triangle inequality gives ‖z − a‖ ≤ λ‖x − a‖ + (1 − λ)‖y − a‖. Thus ‖z − a‖ ≤ r and so z ∈ B. Hence B is convex. Moreover B is a closed and bounded subset of ℝ² and is thus compact. For a general norm on ℝⁿ, the closed ball B = {x ∈ ℝⁿ : ‖x − a‖ ≤ r}
will be compact and convex. In particular, if the Euclidean norm is used, then B is strictly convex.
[Fig. 3.14]
(ii) In the next section we define the hyperplane H(ρ, α) normal to a vector ρ in ℝⁿ to be {x ∈ ℝⁿ : ⟨ρ, x⟩ = α}, where α is some real number. Suppose that x, y ∈ H(ρ, α). Then, whenever λ ∈ [0, 1],
$$\langle \rho, \lambda x + (1-\lambda) y\rangle = \lambda\langle \rho, x\rangle + (1-\lambda)\langle \rho, y\rangle = \alpha.$$
Thus H(ρ, α) is a convex set. We also define the closed half-space H+(ρ, α) = {x ∈ ℝⁿ : ⟨ρ, x⟩ ≥ α}. Clearly if x, y ∈ H+(ρ, α) then ⟨ρ, λx + (1 − λ)y⟩ = λ⟨ρ, x⟩ + (1 − λ)⟨ρ, y⟩ ≥ α, and so H+(ρ, α) is also convex.
Notice that if B is a compact convex ball in ℝⁿ then there exist some ρ ∈ ℝⁿ and some α ∈ ℝ such that B ⊂ H+(ρ, α).
If A and B are two convex sets in ℝⁿ then A ∩ B must also be a convex set, while A ∪ B need not be. For example, the union of two disjoint closed balls is not convex.
We have called a function f : Y → ℝ convex on Y iff f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all x, y ∈ Y and λ ∈ [0, 1].
This is equivalent to the requirement that the set F = {(z, x) ∈ ℝ × Y : z ≥ f(x)} is convex. (See Fig. 3.14.)
To see this, suppose (z1, x1), (z2, x2) ∈ F. Then λ(z1, x1) + (1 − λ)(z2, x2) ∈ F iff λz1 + (1 − λ)z2 ≥ f(λx1 + (1 − λ)x2). But z1 ≥ f(x1) and z2 ≥ f(x2), and so, if f is convex, λz1 + (1 − λ)z2 ≥ λf(x1) + (1 − λ)f(x2) ≥ f(λx1 + (1 − λ)x2) for λ ∈ [0, 1]. Conversely, if F is convex then, since (f(x1), x1) and (f(x2), x2) belong to F, so does their convex combination, which is exactly the convexity inequality for f.
In the same way f is concave on Y iff G = {(z, x) ∈ ℝ × Y : z ≤ f(x)} is convex. If f : Y → ℝ is concave then the function (−f) : Y → ℝ, given by (−f)(x) = −f(x), is convex, and vice versa.
To see this note that z ≤ f(x) iff −z ≥ −f(x). The linear map (z, x) ↦ (−z, x) carries G = {(z, x) ∈ ℝ × Y : z ≤ f(x)} onto {(z, x) ∈ ℝ × Y : z ≥ (−f)(x)}, so one of these sets is convex iff the other is.
Finally f is quasi-concave on Y iff, for all z ∈ ℝ, the set G(z) = {x ∈ Y : z ≤ f(x)} is convex.
Notice that G(z) is the image of G ∩ ({z} × Y) under the projection mapping p : ℝ × Y → Y : (z, x) ↦ x. Since the intersection of two convex sets is convex, and the image of a convex set under a projection is convex, G(z) is convex for any z whenever G is convex. As we know already, this means that a concave function is quasi-concave. We now apply these observations.
Example 3.4 (i) Let f : ℝ → ℝ be given by x ↦ x².
As Example 3.2(i) showed, the set F = {(x, z) ∈ ℝ × ℝ : z ≥ f(x) = x²} is convex. Hence f is convex.
(ii) Now let f : ℝ → ℝ be given by x ↦ x³. Example 3.2(ii) showed that the set F = {(x, z) ∈ ℝ₊ × ℝ : z ≥ f(x) = x³} is convex, and so f is convex on the convex set ℝ₊ = {x ∈ ℝ : x ≥ 0}.
On the other hand G = {(x, z) ∈ ℝ₋ × ℝ : z ≤ f(x) = x³} is convex, and so f is concave on the convex set ℝ₋ = {x ∈ ℝ : x ≤ 0}.
(iii) Let f : ℝ\{0} → ℝ be given by x ↦ 1/x. By Example 3.2(iii) the set F = {(x, z) : x > 0, z ≥ f(x) = 1/x} is convex, and so f is convex on the positive reals. Similarly, f is concave on the negative reals.
(iv) Let f(x) = x^α, where 0 < α < 1. Then G = {(x, z) ∈ ℝ₊ × ℝ : z ≤ f(x) = x^α} is convex, and so f is concave on ℝ₊.
(v) Consider the exponential function exp : ℝ → ℝ : x ↦ eˣ. Figure 3.15a demonstrates that the exponential function is convex. Another way of showing this is to note that, by the inequality between geometric and arithmetic means, e^((x+y)/2) = √(eˣeʸ) ≤ (eˣ + eʸ)/2 for all x, y ∈ ℝ; a continuous function satisfying this midpoint inequality is convex.
On the other hand, as Fig. 3.15b shows, the function logₑ : (0, ∞) → ℝ, inverse to exp, is concave.
(vi) Consider now f : ℝ² → ℝ : (x, y) ↦ xy. Just as in Example 3.2(iii), for each t > 0 the set {(x, y) ∈ ℝ²₊ : xy ≥ t} = ℝ²₊ ∩ f⁻¹[t, ∞) is convex (for t ≤ 0 it is all of ℝ²₊), and so f is a quasi-concave function on ℝ²₊. Similarly f is quasi-concave on ℝ²₋. However f is not quasi-concave on ℝ²: for example f(1, 1) = f(−1, −1) = 1, while f(0, 0) = 0.
(vii) Let f : ℝ² → ℝ : (x1, x2) ↦ r² − (x1 − a1)² − (x2 − a2)². Since the function g(x) = x² is convex, (−g)(x) = −x² is concave, and so, as a sum of concave functions, f is concave. Moreover f attains its maximum on ℝ² at the point (x1, x2) = (a1, a2). By contrast, the functions in Examples 3.4(iv) to (vi) are monotonically increasing (in case (vi), on ℝ²₊).
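The convexity claims of Example 3.4 can be spot-checked with the midpoint inequality f((a + b)/2) ≤ (f(a) + f(b))/2 on a grid of sample points. This Python sketch is only a numerical sanity check under our own assumptions (grid spacing 0.1 on [−5, 5] without 0, tolerance 10⁻⁹, √x standing in for x^α): a single failure disproves convexity, but passing every sample proves nothing.

```python
import math

def midpoint_convex_on(func, points, tol=1e-9):
    """True if f((a + b)/2) <= (f(a) + f(b))/2 for every pair of sample points.
    A single failure disproves convexity; passing is only evidence."""
    return all(func((a + b) / 2) <= (func(a) + func(b)) / 2 + tol
               for a in points for b in points)

def midpoint_concave_on(func, points, tol=1e-9):
    # f is concave iff -f is convex
    return midpoint_convex_on(lambda x: -func(x), points, tol)

pos = [0.1 * k for k in range(1, 51)]   # samples from (0, 5]
neg = [-x for x in pos]                 # samples from [-5, 0)

checks = {
    "(i)   x^2 convex on R": midpoint_convex_on(lambda x: x * x, neg + pos),
    "(ii)  x^3 convex on R+": midpoint_convex_on(lambda x: x ** 3, pos),
    "(ii)  x^3 concave on R-": midpoint_concave_on(lambda x: x ** 3, neg),
    "(ii)  x^3 not convex on R": not midpoint_convex_on(lambda x: x ** 3, neg + pos),
    "(iii) 1/x convex for x > 0": midpoint_convex_on(lambda x: 1 / x, pos),
    "(iii) 1/x concave for x < 0": midpoint_concave_on(lambda x: 1 / x, neg),
    "(iv)  x^0.5 concave for x > 0": midpoint_concave_on(math.sqrt, pos),
    "(v)   exp convex": midpoint_convex_on(math.exp, neg + pos),
    "(v)   log concave for x > 0": midpoint_concave_on(math.log, pos),
}
for name, ok in checks.items():
    print(name, ok)
```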
3.4.3 Separation Properties of Convex Sets
Let X be a vector space of dimension n with a scalar product ⟨ , ⟩. Define H(ρ, α) = {x ∈ X : ⟨ρ, x⟩ = α} to be the hyperplane in X normal to the non-zero vector ρ ∈ X. It should be clear that H(ρ, α) is an (n − 1)-dimensional plane displaced some distance from the origin. To see this suppose that x = λρ belongs to H(ρ, α). Then ⟨ρ, λρ⟩ =
λ‖ρ‖² = α. Thus λ = α/‖ρ‖². Hence the length of x is ‖x‖ = |λ| ‖ρ‖ = |α|/‖ρ‖, and
$$x = \frac{\alpha}{\|\rho\|^2}\,\rho.$$
Now, with λ = α/‖ρ‖² as above, write any y ∈ H(ρ, α) as y = λρ + y0. Then α = ⟨ρ, y⟩ = ⟨ρ, λρ + y0⟩ = α + ⟨ρ, y0⟩, and so ⟨ρ, y0⟩ = 0.
Thus any vector y in H(ρ, α) can be written in the form y = λρ + y0, where y0 is orthogonal to ρ. Since there exist (n − 1) linearly independent vectors y1, ..., yn−1, all orthogonal to ρ, any vector y ∈ H(ρ, α) can be written in the form
$$y = \lambda\rho + \sum_{i=1}^{n-1} a_i y_i,$$
where λρ is a vector of length |α|/‖ρ‖. Now let {ρ⊥} = {x ∈ ℝⁿ : ⟨ρ, x⟩ = 0}. Clearly {ρ⊥} is a vector subspace of ℝⁿ, of dimension (n − 1), through the origin. Thus H(ρ, α) = λρ + {ρ⊥} has the form of an (n − 1)-dimensional vector subspace displaced a distance |α|/‖ρ‖ along the vector ρ. Clearly if ρ1 and ρ2 are collinear vectors (i.e., ρ2 = aρ1 for some non-zero a ∈ ℝ) then {ρ1⊥} = {ρ2⊥}.
[Fig. 3.15 (a), (b)]
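The geometry of H(ρ, α) reduces to a few inner products. The Python sketch below (illustrative only; the vector ρ = (3, 4), α = 10 and the test point are our own choices) computes the point of H(ρ, α) nearest the origin, x = (α/‖ρ‖²)ρ, its distance |α|/‖ρ‖, and the decomposition y = λρ + y0 with y0 orthogonal to ρ.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def foot_of_hyperplane(rho, alpha):
    """Point of H(rho, alpha) = {x : <rho, x> = alpha} nearest the origin,
    x = (alpha / ||rho||^2) * rho. Requires rho != 0."""
    lam = alpha / dot(rho, rho)
    return [lam * r for r in rho]

def distance_from_origin(rho, alpha):
    # |alpha| / ||rho||
    return abs(alpha) / math.sqrt(dot(rho, rho))

def decompose(y, rho):
    """Split y = lam * rho + y0 with <rho, y0> = 0 (orthogonal projection on rho)."""
    lam = dot(rho, y) / dot(rho, rho)
    y0 = [yi - lam * r for yi, r in zip(y, rho)]
    return lam, y0

rho, alpha = [3.0, 4.0], 10.0
print(foot_of_hyperplane(rho, alpha), distance_from_origin(rho, alpha))

# y = (6, -2) satisfies <rho, y> = 18 - 8 = 10, so y lies in H(rho, alpha).
lam, y0 = decompose([6.0, -2.0], rho)
print(lam, dot(rho, y0))   # lam equals alpha / ||rho||^2; the dot product is (numerically) zero

# Rescaling (rho, alpha) by a non-zero constant leaves the hyperplane unchanged.
print(foot_of_hyperplane([6.0, 8.0], 20.0))
```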
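Suppose that ρ1 and ρ2 are collinear and α1ρ1/‖ρ1‖² = α2ρ2/‖ρ2‖². Then H(ρ1, α1) and H(ρ2, α2) are translates of the same subspace {ρ1⊥} = {ρ2⊥} through the same point, and are thus identical. In particular H(ρ1, α1) = H(aρ1, aα1) for any non-zero a ∈ ℝ; for example H(ρ1, α1) = H(ρ1/‖ρ1‖, α1/‖ρ1‖).
The hyperplane H(ρ, α) separates X into two closed half-spaces:
$$H_+(\rho,\alpha) = \{x \in X : \langle\rho,x\rangle \ge \alpha\}, \qquad H_-(\rho,\alpha) = \{x \in X : \langle\rho,x\rangle \le \alpha\}.$$
We shall also write
$$H^0_+(\rho,\alpha) = \{x \in X : \langle\rho,x\rangle > \alpha\}, \qquad H^0_-(\rho,\alpha) = \{x \in X : \langle\rho,x\rangle < \alpha\}$$
for the open half-spaces formed by taking the interiors of H+(ρ, α) and H−(ρ, α), in the case ρ ≠ 0.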
Lemma 3.19 Let Y be a non-empty compact convex subset of a finite dimensional real vector space X, and let x be a point in X\Y. Then there is a hyperplane H(ρ, α) through a point y0 ∈ Y such that
$$\langle\rho,x\rangle < \alpha = \langle\rho,y_0\rangle \le \langle\rho,y\rangle \quad\text{for all } y \in Y.$$
Proof As in Lemma 3.17 let fx : Y → ℝ be the function fx(y) = ‖x − y‖, where ‖ ‖ is the norm induced by the scalar product ⟨ , ⟩ in X.
By Lemma 3.18 there exists a point y0 ∈ Y such that ‖x − y0‖ = inf(fx, Y) = d(x, Y). Thus ‖x − y0‖ ≤ ‖x − y‖ for all y ∈ Y. Now define ρ = y0 − x and α = ⟨ρ, y0⟩; since x ∉ Y, ρ ≠ 0. Then
$$\langle\rho,x\rangle = \langle\rho,y_0\rangle - \langle\rho, y_0 - x\rangle = \langle\rho,y_0\rangle - \|\rho\|^2 < \langle\rho,y_0\rangle.$$
Suppose now that there is a point y ∈ Y such that ⟨ρ, y0⟩ > ⟨ρ, y⟩. By convexity, w = λy + (1 − λ)y0 ∈ Y for every λ in the interval (0, 1). But
$$\|x-y_0\|^2 - \|x-w\|^2 = \|x-y_0\|^2 - \|x - \lambda y - (1-\lambda) y_0\|^2 = 2\lambda\langle\rho, y_0 - y\rangle - \lambda^2\|y-y_0\|^2.$$
Now ⟨ρ, y0⟩ > ⟨ρ, y⟩, and so, for sufficiently small λ > 0, the right hand side is positive. Thus there exists a point w in Y, close to y0, such that ‖x − y0‖ > ‖x − w‖. But this contradicts the choice of y0 as a nearest point in Y to x. Thus ⟨ρ, y⟩ ≥ ⟨ρ, y0⟩ for all y ∈ Y. Hence ⟨ρ, x⟩ < α = ⟨ρ, y0⟩ ≤ ⟨ρ, y⟩ for all y ∈ Y. ∎
Note that the point y0 belongs to the hyperplane H(ρ, α) and the set Y is contained in the closed half-space H+(ρ, α), while the point x belongs to the open half-space
$$H^0_-(\rho,\alpha) = \{z \in X : \langle\rho,z\rangle < \alpha\}.$$
Thus the hyperplane separates the point x from the compact convex set Y (see Fig. 3.16).
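The proof of Lemma 3.19 is constructive once the nearest point y0 is known. In the Python sketch below, Y is taken to be an axis-aligned box; this is an assumption made only because the Euclidean nearest-point map onto a box is coordinatewise clipping. The sketch then forms ρ = y0 − x and α = ⟨ρ, y0⟩ exactly as in the proof. The box, the point x and the helper names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_box(x, lower, upper):
    """Euclidean nearest point of the box prod [lower_i, upper_i] to x:
    clip each coordinate into its interval."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def separating_hyperplane(x, lower, upper):
    """Construction from the proof of Lemma 3.19 for a box Y:
    y0 = nearest point of Y to x, rho = y0 - x, alpha = <rho, y0>."""
    y0 = project_onto_box(x, lower, upper)
    rho = [a - b for a, b in zip(y0, x)]
    alpha = dot(rho, y0)
    return y0, rho, alpha

lower, upper = [0.0, 0.0], [1.0, 1.0]   # Y = unit square (compact, convex)
x = [2.0, 3.0]                          # a point outside Y
y0, rho, alpha = separating_hyperplane(x, lower, upper)

# A linear function attains its minimum over a box at a vertex, so checking the
# four corners verifies <rho, y> >= alpha for every y in Y.
corners = [[a, b] for a in (0.0, 1.0) for b in (0.0, 1.0)]
print(y0, rho, alpha)
print(dot(rho, x) < alpha, all(dot(rho, c) >= alpha for c in corners))
```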
[Fig. 3.16] [Fig. 3.17]
While convexity is necessary for the proof of this lemma, the compactness requirement may be weakened to Y being closed. Suppose however that Y is an open set. Then it is possible to choose a point x outside Y which is nonetheless an accumulation point of Y, so that d(x, Y) = 0 and no nearest point y0 ∈ Y exists.
On the other hand if Y is compact but not convex, then a situation such as Fig. 3.17 is possible. Clearly no hyperplane separates x from Y.
If A and B are two sets, and H = H(ρ, α) is a hyperplane such that A ⊆ H−(ρ, α) and B ⊆ H+(ρ, α), then say that H weakly separates A and B. If H is such that A ⊂ H0−(ρ, α) and B ⊂ H0+(ρ, α), then say that H strongly separates A and B. Note that in the latter case it is necessary that A ∩ B = Φ.
In Lemma 3.19 we found a hyperplane H(ρ, α) such that ⟨ρ, x⟩ < α. Clearly it is possible to find α− < α such that ⟨ρ, x⟩ < α−.
Thus the hyperplane H(ρ, α−) strongly separates x from the compact convex set Y.
If Y is convex but not compact, then it is possible to find a hyperplane H that weakly separates x from Y.
We now extend this result to the separation of convex sets.
Separating Hyperplane Theorem Suppose that A and B are two disjoint non-empty convex sets of a finite dimensional vector space X. Then there exists a hyperplane H that weakly separates A from B. If both A and B are compact then there exists a hyperplane H that strongly separates A from B.
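Proof Since A and B are convex, the set A − B = {w ∈ X : w = a − b where a ∈ A, b ∈ B} is also convex.
To see this suppose a1 − b1 and a2 − b2 ∈ A − B. Then λ(a1 − b1) + (1 − λ)(a2 − b2) = [λa1 + (1 − λ)a2] − [λb1 + (1 − λ)b2] ∈ A − B. By the same argument B − A = −(A − B) is convex.
Now A ∩ B = Φ. Thus there exists no point in both A and B, and so 0 ∉ B − A. By Lemma 3.19 (or, if B − A is not compact, by the weak form of it noted after that lemma), there exists a non-zero vector ρ such that the hyperplane H(ρ, 0) weakly separates 0 from B − A, i.e., ⟨ρ, 0⟩ ≤ ⟨ρ, w⟩ for all w ∈ B − A. But then ⟨ρ, a⟩ ≤ ⟨ρ, b⟩ for all a ∈ A, b ∈ B.
Choose
$$\alpha \in \Big[\sup_{a\in A}\langle\rho,a\rangle,\ \inf_{b\in B}\langle\rho,b\rangle\Big].$$
Then ⟨ρ, a⟩ ≤ α ≤ ⟨ρ, b⟩ for all a ∈ A, b ∈ B, and so H(ρ, α) weakly separates A and B. In the case that A, B are non-compact, it is possible that sup_{a∈A}⟨ρ, a⟩ = inf_{b∈B}⟨ρ, b⟩, so that α is uniquely determined.
Consider now the case when A and B are compact. Then B − A, the image of the compact set A × B under the continuous map (a, b) ↦ b − a, is compact, and so Lemma 3.19 applies to the point 0 and the compact convex set B − A. It gives ρ and α0 with ⟨ρ, 0⟩ < α0 ≤ ⟨ρ, b − a⟩, i.e., ⟨ρ, b⟩ − ⟨ρ, a⟩ ≥ α0 > 0 for all a ∈ A, b ∈ B. The function ρ∗ : X → ℝ given by ρ∗(x) = ⟨ρ, x⟩ is clearly continuous. By Lemma 3.16, since both A and B are compact, there exist points ā ∈ A and b̄ ∈ B such that
$$\langle\rho,\bar a\rangle = \sup_{a\in A}\langle\rho,a\rangle \quad\text{and}\quad \langle\rho,\bar b\rangle = \inf_{b\in B}\langle\rho,b\rangle.$$
Hence ⟨ρ, b̄⟩ − ⟨ρ, ā⟩ ≥ α0 > 0, and we can choose α such that
$$\langle\rho,a\rangle \le \langle\rho,\bar a\rangle < \alpha < \langle\rho,\bar b\rangle \le \langle\rho,b\rangle \quad\text{for all } a \in A,\ b \in B.$$
Thus H(ρ, α) strongly separates A and B. (See Fig. 3.18b.) ∎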
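For two disjoint compact convex sets, the vector b̄ − ā joining a nearest pair of points is the nearest point of B − A to the origin, so it can serve as ρ, with the hyperplane placed through the midpoint of ā and b̄. The Python sketch below illustrates this for two boxes (an assumption that makes each Euclidean projection a coordinatewise clip) and finds a nearest pair by alternating projections, a technique we have swapped in for illustration; it is not the construction used in the text. For closed convex sets with one of them bounded, alternating projections are known to approach a nearest pair, and here a fixed number of iterations is simply run.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def clip(x, lower, upper):
    # Euclidean projection onto the box prod [lower_i, upper_i]
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def nearest_pair(boxA, boxB, start, iters=200):
    """Alternating projections a -> P_B(a) -> P_A(b) -> ...; for two
    disjoint boxes the iterates approach a pair of nearest points."""
    a = clip(start, *boxA)
    for _ in range(iters):
        b = clip(a, *boxB)
        a = clip(b, *boxA)
    return a, clip(a, *boxB)

def strong_separator(boxA, boxB, start):
    a, b = nearest_pair(boxA, boxB, start)
    rho = [bi - ai for ai, bi in zip(a, b)]
    alpha = dot(rho, [(ai + bi) / 2 for ai, bi in zip(a, b)])  # through the midpoint
    return rho, alpha

def vertices(box):
    lower, upper = box
    return [[x, y] for x in (lower[0], upper[0]) for y in (lower[1], upper[1])]

A = ([0.0, 0.0], [1.0, 1.0])     # hypothetical box A = [0,1] x [0,1]
B = ([2.0, 0.5], [3.0, 1.5])     # hypothetical box B = [2,3] x [0.5,1.5]
rho, alpha = strong_separator(A, B, start=[0.5, 0.5])

# A linear function attains its sup/inf over a box at a vertex.
sup_A = max(dot(rho, v) for v in vertices(A))
inf_B = min(dot(rho, v) for v in vertices(B))
print(rho, alpha, sup_A < alpha < inf_B)
```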
Example 3.5 Hildenbrand and Kirman (1976) have applied this theorem to find a price vector which supports a final allocation of economic resources. Consider a society M = {1, ..., m} in which each individual i has an initial endowment ei = (ei1, ..., ein) ∈ ℝⁿ of n commodities. At price vector p = (p1, ..., pn), the budget set of individual i is
$$B_i(p) = \{x \in \mathbb{R}^n : \langle p, x\rangle \le \langle p, e_i\rangle\}.$$
Each individual has a preference relation Pi on ℝⁿ, and at the price vector p the demand Di(p) by i is the set
$$D_i(p) = \{x \in B_i(p) : y P_i x \text{ for no } y \in B_i(p)\}.$$
Fig. 3.18 (a) Weak separation. (b) Strong separation.
Let fi = (fi1, ..., fin) ∈ ℝⁿ be the final allocation to individual i, for i = 1, ..., m. Suppose there exists a price vector p = (p1, ..., pn) with the property
(∗) xPi fi ⇒ ⟨p, x⟩ > ⟨p, ei⟩.
Then no bundle preferred to fi is affordable, and so fi ∈ Bi(p) ⇒ fi ∈ Di(p). If property (∗) holds at some price vector p for each i, and each fi ∈ Bi(p), then fi ∈ Di(p) for each i.
To show existence of such a price vector, let
$$\pi_i = P_i(f_i) - e_i = \{x - e_i : x \in P_i(f_i)\} \subset \mathbb{R}^n.$$
Here as before Pi(fi) = {x ∈ ℝⁿ : xPi fi}. Suppose that there exists a hyperplane H(p, 0) strongly separating 0 from πi, with πi ⊂ H0+(p, 0). In this case 0 < ⟨p, x − ei⟩ for all x ∈ Pi(fi). But this is equivalent to ⟨p, x⟩ > ⟨p, ei⟩ for all x ∈ Pi(fi).
Let π = Con[⋃_{i∈M} πi] be the convex hull of the sets πi, i ∈ M. Clearly if 0 ∉ π and there is a hyperplane H(p, 0) strongly separating 0 from π, then p is a price vector which supports the final allocation f1, ..., fm.
3.5 Optimisation on Convex Sets
A key notion underlying economic theory is that of the maximisation of an objective function subject to one or a number of constraints. The most elementary case of such a problem is the one addressed by the Weierstrass Theorem: if f : X → ℝ is a continuous function and Y ⊂ X is a non-empty compact constraint set, then there exists some point y ∈ Y such that f(y) = sup(f, Y). Here y is a maximum point of f on Y.
Using the Separating Hyperplane Theorem we can extend this analysis to the optimisation of a convex preference correspondence on a compact convex constraint set.
3.5.1 Optimisation of a Convex Preference Correspondence
Suppose that Y is a compact, convex constraint set in ℝⁿ and P : ℝⁿ → ℝⁿ is a preference correspondence which is convex (i.e., P(x) is convex for all x ∈ ℝⁿ). A choice for P on Y is a point y ∈ Y such that P(y) ∩ Y = Φ.
We shall say that P is non-satiated in ℝⁿ iff for no y ∈ ℝⁿ is it the case that P(y) = Φ. A sufficient condition to ensure non-satiation, for example, is the assumption of monotonicity, i.e., x > y (where as before this means xi > yi for each of the coordinates xi, yi, i = 1, ..., n) implies that x ∈ P(y).
Say that P is locally non-satiated in ℝⁿ iff for each y ∈ ℝⁿ and any neighbourhood Uy of y in ℝⁿ, P(y) ∩ Uy ≠ Φ.
Clearly monotonicity implies local non-satiation, which in turn implies non-satiation.
Suppose that P is locally non-satiated and that y belongs to the interior of the compact constraint set Y. Then there is a neighbourhood Uy of y within Y. Consequently P(y) ∩ Uy ≠ Φ, and so y cannot be a choice from Y. On the other hand, since Y is compact it is closed, and so if y belongs to the boundary ∂Y of Y, it belongs to Y itself. By definition if y ∈ ∂Y then any neighbourhood Uy of y intersects ℝⁿ\Y. Thus when P(y) ⊂ ℝⁿ\Y, y will be a choice from Y. Alternatively, if y is a choice of P from Y, then y must belong to the boundary of Y.
Lemma 3.20 Let Y be a compact, convex constraint set in ℝⁿ and let P : ℝⁿ → ℝⁿ be a preference correspondence which is locally non-satiated, and is such that, for all x ∈ ℝⁿ, P(x) is open and convex. Then ȳ is a choice of P from Y iff there is a hyperplane H(p, α) through ȳ which separates Y from P(ȳ) in the sense that
$$\langle p, y\rangle \le \alpha = \langle p, \bar y\rangle < \langle p, x\rangle \quad\text{for all } y \in Y \text{ and all } x \in P(\bar y).$$
Proof Suppose that the hyperplane H(p, α) contains ȳ and separates Y from P(ȳ) in the above sense. Then ⟨p, y⟩ < ⟨p, x⟩ for all y ∈ Y, x ∈ P(ȳ). Thus Y ∩ P(ȳ) = Φ, and so ȳ is a choice of P from Y.
On the other hand, suppose that ȳ is a choice. Then P(ȳ) ∩ Y = Φ. Moreover the local non-satiation property, P(ȳ) ∩ Uȳ ≠ Φ for every neighbourhood Uȳ of ȳ in ℝⁿ, guarantees that ȳ belongs to the boundary of Y and to the closure of P(ȳ). Since Y and P(ȳ) are disjoint non-empty convex sets, the Separating Hyperplane Theorem gives a hyperplane H(p, α) which weakly separates them, with Y ⊆ H−(p, α) and P(ȳ) ⊆ H+(p, α). Because ȳ lies both in Y and in the closure of P(ȳ), we have ⟨p, ȳ⟩ = α, so that
$$\langle p, y\rangle \le \alpha = \langle p, \bar y\rangle \le \langle p, x\rangle$$
for all y ∈ Y and all x ∈ P(ȳ). But P(ȳ) is open, and so the last inequality can be written ⟨p, ȳ⟩ < ⟨p, x⟩. ∎
Note that if either the constraint set Y is strictly convex, or P is such that P(y) is strictly convex for all y ∈ ℝⁿ, then the choice is unique.
If f : ℝⁿ → ℝ is a continuous concave or quasi-concave function then application of this lemma to the preference correspondence P : ℝⁿ → ℝⁿ, where P(x) = {y ∈ ℝⁿ : f(y) > f(x)}, characterises the maximum point ȳ of f on Y. Here ȳ is a maximum point of f on Y if f(ȳ) = sup(f, Y). Note that local non-satiation of P requires that for any point x in ℝⁿ, and any neighbourhood Ux of x in ℝⁿ, there exists y ∈ Ux such that f(y) > f(x).
The vector p = (p1, ..., pn) which characterises the hyperplane H(p, α) is called in economic applications the vector of shadow prices. The reason for this will become clear in the following example.
Example 3.6 As an example suppose that optimal use of (n − 1) different inputs (x1, ..., xn−1) gives rise to an output y, say, where y = y(x1, ..., xn−1). Any n-vector (x1, ..., xn−1, xn) is feasible as long as xn ≤ y(x1, ..., xn−1). Here xn is the output. Write g(x1, ..., xn−1, xn) = y(x1, ..., xn−1) − xn. Then a vector x = (x1, ..., xn−1, xn) is feasible iff g(x) ≥ 0.
Suppose now that y = y(x1, ..., xn−1) is a concave function of x1, ..., xn−1. Then g is concave, and so the set G = {x ∈ ℝⁿ : g(x) ≥ 0} is convex. Now let
$$\pi(x_1, \dots, x_{n-1}, x_n) = -\sum_{i=1}^{n-1} p_i x_i + p_n x_n$$
be the profit function of the producer, when prices for inputs and outputs are given exogenously by (−p1, ..., −pn−1, pn). Again let P : ℝⁿ → ℝⁿ be the preference correspondence P(x) = {z ∈ ℝⁿ : π(z) > π(x)}. For each x, P(x) is an open half-space, so it is convex, and P is locally non-satiated. Hence at a choice x̄ from G (assuming one exists) there is a hyperplane H(ρ, α) separating P(x̄) from G.
Indeed it is clear from the construction that P(x̄) ⊂ H0+(ρ, α) and G ⊂ H−(ρ, α).
Moreover the hyperplane H(ρ, α) must coincide with the set of points {x ∈ ℝⁿ : π(x) = π(x̄)}. Thus, writing p = (−p1, ..., −pn−1, pn), the hyperplane H(ρ, α) has the form
$$\{x \in \mathbb{R}^n : \langle p, x\rangle = \pi(\bar x)\},$$
and so may be written H(p, π(x̄)). Note that the intercept on the xn axis is π(x̄)/pn, while the distance of the hyperplane from the origin is
$$\frac{\pi(\bar x)}{\|p\|} = \frac{\pi(\bar x)}{\sqrt{\sum_{i=1}^{n} p_i^2}}.$$
Thus the intercept gives the profit measured in units of output, while the distance from the
origin of the profit hyperplane gives the profit in terms of the normalised price vector p/‖p‖.
[Fig. 3.19]
Figure 3.19 illustrates the situation with one input (x1) and one output (x2).
Precisely the same analysis can be performed when optimal production is characterised by a general production function F : ℝⁿ → ℝ. Here x1, ..., xm are inputs, with prices −p1, ..., −pm, and xm+1, ..., xn are outputs, with prices pm+1, ..., pn. Let p = (−p1, ..., −pm, pm+1, ..., pn) ∈ ℝⁿ. Define F so that a vector x ∈ ℝⁿ is feasible iff F(x) ≥ 0. Note that we also need to restrict all inputs and outputs to be non-negative. Therefore define
ℝⁿ₊ = {x : xi ≥ 0 for i = 1, ..., n}. Assume that the feasible set (or production set)
$$G = \{x \in \mathbb{R}^n_+ : F(x) \ge 0\}$$
is convex.
As before let P : ℝⁿ → ℝⁿ, where P(x) = {z ∈ ℝⁿ : π(z) > π(x)}. Then the point x̄ is a choice of P from G iff x̄ maximises on G the profit function
$$\pi(x) = \sum_{j=1}^{n-m} p_{m+j}\, x_{m+j} - \sum_{j=1}^{m} p_j x_j.$$
By the previous example x̄ is a choice iff the hyperplane H(p, π(x̄)) separates P(x̄) and G, i.e., P(x̄) ⊂ H0+(p, π(x̄)) and G ⊂ H−(p, π(x̄)).
In the next chapter we shall use this optimality condition to characterise the choice x̄ more fully in the case when F is "smooth".
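A one-input, one-output instance makes the profit hyperplane concrete. The Python sketch below assumes a hypothetical concave production function y(x1) = √x1 and hypothetical prices 1 for the input and 2 for the output (none of these come from the text). It locates the profit-maximising plan by grid search, reports the intercept π/p_out and the distance π/‖p‖ discussed above, and checks on a grid that every feasible plan lies in the closed half-space H−(p, π(x̄)).

```python
import math

P_IN, P_OUT = 1.0, 2.0        # hypothetical input and output prices

def production(x1):
    # hypothetical concave production function y(x1) = sqrt(x1)
    return math.sqrt(x1)

def profit(x1, x2):
    # pi(x) = <p, x> with p = (-P_IN, P_OUT)
    return P_OUT * x2 - P_IN * x1

# Grid search along the efficient frontier x2 = y(x1), x1 in [0, 4].
grid = [k / 1000 for k in range(4001)]
x1_best = max(grid, key=lambda x1: profit(x1, production(x1)))
x2_best = production(x1_best)
pi_best = profit(x1_best, x2_best)
print(x1_best, x2_best, pi_best)   # closed form: x1 = (P_OUT / (2 * P_IN))**2
print("intercept on output axis:", pi_best / P_OUT)
print("distance from origin:", pi_best / math.hypot(P_IN, P_OUT))

# Separation check: feasible plans (x2 <= y(x1)) on a coarser grid satisfy
# <p, x> <= pi_best, i.e. they lie in H_-(p, pi_best).
feasible = [(a, b) for a in grid[::50] for b in grid[::50] if b <= production(a)]
print(all(profit(a, b) <= pi_best + 1e-12 for a, b in feasible))
```

With these prices the closed form gives x̄ = (1, 1) and π(x̄) = 1, so the hyperplane meets the output axis at 1/2.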
Example 3.7 Consider now the case of a consumer maximising a preference correspondence P : ℝⁿ → ℝⁿ subject to a budget constraint B(p) which is dependent on a set of exogenous prices p1, ..., pn.
[Fig. 3.20]
For example the consumer may hold an initial set of endowments e = (e1, ..., en), so let
$$I = \sum_{i=1}^{n} p_i e_i = \langle p, e\rangle.$$
The budget set is then
$$B(p) = \{x \in \mathbb{R}^n_+ : \langle p, x\rangle \le I\},$$
where for convenience we assume the consumer only buys a non-negative amount of each commodity. Suppose that P is monotonic, and P(x) is open and convex for all x ∈ ℝⁿ. As before x̄ is a choice from B(p) iff there is a hyperplane H(ρ, α) separating P(x̄) from B(p).
Under these conditions the choice must belong to the upper boundary of B(p) and so satisfy ⟨p, x̄⟩ = ⟨p, e⟩ = I. Thus the hyperplane has the form H(p, I), and so the optimality condition is P(x̄) ⊂ H0+(p, I) and B(p) ⊂ H−(p, I); i.e.,
$$\langle p, x\rangle \le I = \langle p, \bar x\rangle < \langle p, y\rangle \quad\text{for all } x \in B(p) \text{ and all } y \in P(\bar x).$$
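The optimality condition can be verified exactly for a small hypothetical consumer. The Python sketch below assumes two commodities, prices p = (1, 2), endowment e = (4, 1), and the utility u(x) = x1x2 (quasi-concave and monotonic on the interior of ℝ²₊, as in Example 3.4(vi)); none of these are from the text. For this u the demand spends half the income on each good. Using exact fractions on a grid of step 1/20, it checks that every strictly preferred bundle costs more than I and every affordable bundle is no better than x̄.

```python
from fractions import Fraction as Fr

p = (Fr(1), Fr(2))                          # hypothetical prices
e = (Fr(4), Fr(1))                          # hypothetical endowment
I = sum(pi * ei for pi, ei in zip(p, e))    # income <p, e>

def u(x):
    # hypothetical quasi-concave, monotonic utility on the positive quadrant
    return x[0] * x[1]

def cost(x):
    return sum(pi * xi for pi, xi in zip(p, x))

# For u = x1 * x2 the demand spends half of income on each good.
x_bar = (I / (2 * p[0]), I / (2 * p[1]))

# Exact grid of bundles with step 1/20 in (0, 10] x (0, 10].
grid = [Fr(k, 20) for k in range(1, 201)]
bundles = [(a, b) for a in grid for b in grid]
preferred = [y for y in bundles if u(y) > u(x_bar)]

print(I, x_bar, cost(x_bar) == I)
print(all(cost(y) > I for y in preferred))                    # P(x_bar) in H0+(p, I)
print(all(u(y) <= u(x_bar) for y in bundles if cost(y) <= I))  # no affordable improvement
```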
Figure 3.20 illustrates the situation with two commodities x1 and x2.
In the next chapter we use this optimality condition to characterise a choice when preference is given by a smooth utility function f : ℝⁿ → ℝ.
In the previous two examples we considered
1. optimisation of a profit function, which is determined by exogenously given prices, subject to a fixed production constraint, and
2. optimisation of a fixed preference correspondence, subject to a budget constraint, which is again determined by exogenous prices.
Clearly at a given price vector each producer will "demand" a certain input vector and supply a particular output vector, so that the combination is his choice in the environment determined by p. In the same way a consumer will respond to a price vector p by demanding optimal amounts of each commodity, and possibly supplying other commodities such as labor, or various endowments. In contrast to Example 3.7, regard all prices as positive, and consider a commodity xj demanded by an agent i as an input to be negative, and positive when supplied as an output.
Let xij(p) be the optimal demand or supply of commodity j by agent i at the price vector p, with m agents and n commodities. Then market equilibrium of supply and demand in the economy occurs when
$$\sum_{i=1}^{m} x_{ij}(p) = 0 \quad\text{for each } j = 1, \dots, n.$$
A price vector which leads to market equilibrium in demand and supply is called an equilibrium price vector.
Example 3.8 To give a simple example, consider two agents. The first agent controls a machine which makes use of labor, x, to produce a single output y. Regard x ∈ (−∞, 0] and consider a price vector p ∈ ℝ²₊, where p = (w, r), w is the price of labor, and r the price of the output. An output is feasible for Agent One iff F(x, y) ≥ 0.
Agent Two is the only supplier of labor, but is averse to working. His preference is described by a quasi-concave utility function f : ℝ² → ℝ and we restrict attention to the domain
$$D = \{(x, y) \in \mathbb{R}^2 : x \le 0,\ y \ge 0\}.$$
Assume that f is monotonic, i.e., if x1 < x2 and y1 < y2 then f(x1, y1) < f(x2, y2). The budget constraint of Agent Two at (w, r) is therefore
$$B(w, r) = \{(x, y_2) \in D : r y_2 \le w|x|\},$$
where |x| is the quantity of labor supplied, and y2 is the amount of commodity y consumed. Profit for Agent One is π(x, y) = ry − w|x| = ry + wx, and we shall assume that this agent then consumes an amount y1 = π(x, y)/r of commodity y.
For (x̄, ȳ) to be an equilibrium of supply and demand at prices (w, r) we require:
1. ȳ = y1 + y2;
2. (x̄, ȳ) maximises π(x, y) = ry + wx subject to F(x, y) ≥ 0;
3. (x̄, y2) maximises f(x, y2) subject to ry2 = w|x|.
At any point (x, y) ∈ D, and vector (w, r), define
$$P(x, y) = \{(x', y') \in D : f(x', y' - y_1) > f(x, y - y_1)\},$$
where, as above, y1 = (ry + wx)/r is the amount of commodity y consumed by the producer. Thus P(x, y) is the preference correspondence of Agent Two displaced by the distance y1 up the y-axis.
Figure 3.21 illustrates that, at an equilibrium point (x̄, ȳ), it is possible to find a vector (w, r) such that H = H((w, r), π(x̄, ȳ)) separates P(x̄, ȳ) and the production set
$$G = \{(x, y) \in D : F(x, y) \ge 0\}.$$
As in Example 3.6, the intercept of H with the y-axis is y1 = π(x̄, ȳ)/r, the consumption of y by Agent One.
The hyperplane H is the set
$$\{(x, y) : ry + wx = r y_1\}.$$
[Fig. 3.21]
Hence for all (x, y) ∈ H, the point (x, y − y1) satisfies r(y − y1) + wx = 0. Thus H is the boundary {(x, y2) : ry2 + wx = 0} of the second agent's budget set, displaced up the y-axis by y1. Consequently ȳ = y1 + y2, and so (x̄, ȳ) is a market equilibrium.
Note that the hyperplane separation satisfies
$$ry + wx \le \pi(\bar x, \bar y) < ry' + wx'$$
for all (x, y) ∈ G and all (x′, y′) ∈ P(x̄, ȳ).
As above, (x′, y′) ∈ P(x̄, ȳ) iff f(x′, y′ − y1) > f(x̄, y2). So the right hand inequality implies that if (x′, y′) ∈ P(x̄, ȳ), then ry′ + wx′ > π(x̄, ȳ) = ry1. Writing y′2 = y′ − y1, this gives ry′2 + wx′ > 0, i.e., (x′, y′ − y1) ∈ H0+((w, r), 0), and so (x′, y′ − y1) is infeasible for Agent Two.
Finally (x̄, ȳ) maximises π(x, y) subject to (x, y) ∈ G, and so (x̄, ȳ) results from optimising behaviour by both agents at the price vector (w, r).
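A fully specified instance shows the equilibrium conditions at work. The Python sketch below uses hypothetical functional forms that are not in the text: Agent One's technology is F(x, y) = √(−x) − y (so y ≤ √|x|), Agent Two's utility is f(x, y2) = ln(1 + x) + ln y2 for −1 < x ≤ 0, and the wage is normalised to w = 1. Writing L = |x|, the producer's labor demand is (r/2w)², while the worker's first-order condition gives a labor supply of 1/2 at every price. Bisection on r clears the labor market, after which the goods market condition y = y1 + y2 can be checked.

```python
import math

W = 1.0   # wage, normalised (only the ratio r / w matters)

def producer(r, w=W):
    """Agent One maximises pi = r*y + w*x subject to y <= sqrt(-x), x <= 0.
    With L = -x: maximising r*sqrt(L) - w*L gives L = (r / (2w))**2."""
    labor = (r / (2 * w)) ** 2
    y = math.sqrt(labor)
    profit = r * y - w * labor       # equals r*y + w*x with x = -labor
    return labor, y, profit

def worker(r, w=W):
    """Agent Two maximises ln(1 - L) + ln(y2) subject to r*y2 = w*L.
    Substituting y2 = w*L/r, the first-order condition gives L = 1/2."""
    labor = 0.5
    return labor, w * labor / r

def excess_labor_demand(r):
    return producer(r)[0] - worker(r)[0]

# Bisection on the output price r: labor demand rises with r, supply is fixed.
lo, hi = 1e-6, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if excess_labor_demand(mid) > 0:
        hi = mid
    else:
        lo = mid
r = (lo + hi) / 2

labor, y, profit = producer(r)
y1 = profit / r                      # Agent One consumes its profit
_, y2 = worker(r)
print("r / w:", r, "sqrt(2):", math.sqrt(2))
print("goods market: y =", y, " y1 + y2 =", y1 + y2)
```

Clearing the labor market alone is enough here: as the text's hyperplane argument suggests, the goods market then clears automatically, since r(y1 + y2) = π + w|x| = ry.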
As this example illustrates, it is possible to show existence of a market equilibrium in economies characterised by compact convex constraint sets and convex preference correspondences. To do this in a general context, however, requires more powerful mathematical tools, which we shall introduce in Sect. 3.8 below. Before this, however, we consider one further application of the hyperplane separation theorem, to a situation where we wish to optimise a concave objective function subject to a number of concave constraints.
3.6 Kuhn-Tucker Theorem
Here we consider a family of constraints in ℝⁿ. Let g = (g1, ..., gm) : ℝⁿ → ℝᵐ. As before let ℝᵐ₊ = {(y1, ..., ym) : yi ≥ 0 for i = 1, ..., m}. A point x is feasible iff x ∈ ℝⁿ₊ and g(x) ∈ ℝᵐ₊ (i.e., gi(x) ≥ 0 for i = 1, ..., m). Let f : ℝⁿ → ℝ be the
objective function. Say that x∗ ∈ ℝⁿ is an optimum of the constrained optimisation problem (f, g) iff x∗ solves the problem: maximise f(x) subject to the feasibility constraints g(x) ∈ ℝᵐ₊, x ∈ ℝⁿ₊.
Call the problem (f, g) solvable iff there is some x ∈ ℝⁿ₊ such that gi(x) > 0 for i = 1, ..., m. The Lagrangian to the problem (f, g) is:
$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) = f(x) + \langle \lambda, g(x)\rangle,$$
where x ∈ ℝⁿ₊ and λ = (λ1, ..., λm) ∈ ℝᵐ₊. The pair (x∗, λ∗) ∈ ℝⁿ⁺ᵐ₊ is called a global saddle point for the problem (f, g) iff
$$L(x, \lambda^*) \le L(x^*, \lambda^*) \le L(x^*, \lambda)$$
for all x ∈ ℝⁿ₊, λ ∈ ℝᵐ₊.
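A small worked instance may help fix the saddle point definition. The problem below is our own hypothetical example, not taken from the text: a concave objective with a single linear (hence concave) constraint. The problem is solvable, since x = (0, 0) gives g(x) = 1 > 0. Locating the stationary point uses elementary calculus, which the text develops only in the next chapter.

```latex
\begin{aligned}
&\text{maximise } f(x) = \ln(1+x_1) + \ln(1+x_2) \ \text{ subject to } g(x) = 1 - x_1 - x_2 \ge 0,\ x \in \mathbb{R}^2_+,\\
&L(x,\lambda) = \ln(1+x_1) + \ln(1+x_2) + \lambda\,(1 - x_1 - x_2),\\
&\frac{\partial L}{\partial x_i}(x,\lambda^*) = \frac{1}{1+x_i} - \lambda^* = 0 \ \text{ at } x_i^* = \tfrac12
  \;\Longrightarrow\; \lambda^* = \tfrac23,\\
&L(x,\lambda^*) \le L(x^*,\lambda^*) = 2\ln\tfrac32
  \quad \text{(} L(\cdot,\lambda^*) \text{ is concave with a stationary point at } x^* \text{)},\\
&L(x^*,\lambda) = 2\ln\tfrac32 + \lambda\cdot 0 = L(x^*,\lambda^*) \quad \text{for all } \lambda \ge 0 .
\end{aligned}
```

Thus ((1/2, 1/2), 2/3) is a global saddle point, and the constraint binds with a strictly positive multiplier.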
Kuhn-Tucker Theorem 1 Suppose f, g1, ..., gm : ℝⁿ → ℝ are concave functions on ℝⁿ₊. Then if x∗ is an optimum to the solvable problem (f, g) : ℝⁿ → ℝᵐ⁺¹, there exists λ∗ ∈ ℝᵐ₊ such that (x∗, λ∗) is a saddle point for (f, g).
Proof Let A = {y ∈ ℝᵐ⁺¹ : ∃ x ∈ ℝⁿ₊ such that y ≤ (f, g)(x)}, where (f, g)(x) = (f(x), g1(x), ..., gm(x)) and ≤ holds coordinatewise. Thus y = (y1, ..., ym+1) ∈ A iff ∃ x ∈ ℝⁿ₊ such that
y1 ≤ f(x),
yj+1 ≤ gj(x) for j = 1, ..., m.
Let x∗ be an optimum and
$$B = \{z = (z_1, \dots, z_{m+1}) \in \mathbb{R}^{m+1} : z_1 > f(x^*) \text{ and } (z_2, \dots, z_{m+1}) > 0\}.$$
Since f and g are concave, A is convex. To see this suppose y¹, y² ∈ A, with y¹ ≤ (f, g)(x1) and y² ≤ (f, g)(x2) for some x1, x2 ∈ ℝⁿ₊. Since f is concave, af(x1) + (1 − a)f(x2) ≤ f(ax1 + (1 − a)x2), and similarly for each gj, for any a ∈ [0, 1]. Thus
$$a y^1 + (1-a) y^2 \le a\,(f,g)(x_1) + (1-a)\,(f,g)(x_2) \le (f,g)(a x_1 + (1-a) x_2).$$
Since x1, x2 ∈ ℝⁿ₊, ax1 + (1 − a)x2 ∈ ℝⁿ₊, and so ay¹ + (1 − a)y² ∈ A.
Clearly B is convex: if z, z′ ∈ B and a ∈ [0, 1], then az1 + (1 − a)z′1 > f(x∗) and each remaining coordinate of az + (1 − a)z′ is positive.
To see that A ∩ B = Φ, take y ∈ A, with y ≤ (f, g)(x) for some x ∈ ℝⁿ₊. If gj(x) < 0 for some j, then yj+1 ≤ gj(x) < 0 < zj+1 for every z ∈ B, so y ∉ B. If g(x) ∈ ℝᵐ₊ then x is feasible; in this case y1 ≤ f(x) ≤ f(x∗) < z1 for every z ∈ B, so again y ∉ B.
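By the separating hyperplane theorem, there exist a non-zero $p = (p_1, \ldots, p_{m+1}) \in \mathbb{R}^{m+1}$ and $\alpha \in \mathbb{R}$ such that $H(p, \alpha) = \{w \in \mathbb{R}^{m+1} : \sum_{j=1}^{m+1} w_j p_j = \alpha\}$ separates $A$ and $B$, i.e.,
$$\sum_{j=1}^{m+1} p_j y_j \le \alpha \le \sum_{j=1}^{m+1} p_j z_j \quad \text{for any } y \in A \text{ and } z \in B.$$
Moreover $p \in \mathbb{R}^{m+1}_+$, since $B$ is unbounded above in every coordinate (if some $p_j < 0$, letting $z_j \to \infty$ would violate the right-hand inequality). By the definition of $A$, $(f, g)(x) \in A$ for every $x \in \mathbb{R}^n_+$.
Thus for any $x \in \mathbb{R}^n_+$ and any $z \in B$,
$$p_1 f(x) + \sum_{j=1}^{m} p_{j+1} g_j(x) \le \sum_{j=1}^{m+1} p_j z_j.$$
Since $(f(x^*), 0, \ldots, 0)$ belongs to the boundary of $B$,
$$p_1 f(x) + \sum_{j=1}^{m} p_{j+1} g_j(x) \le p_1 f(x^*).$$
Suppose $p_1 = 0$. Since $p \in \mathbb{R}^{m+1}_+$ is non-zero, $p_{j+1} > 0$ for some $j$. Since the problem is solvable, $\exists\, x \in \mathbb{R}^n_+$ such that $g_j(x) > 0$ for $j = 1, \ldots, m$. But this gives $\sum_{j=1}^{m} p_{j+1} g_j(x) > 0$, contradicting the previous inequality with $p_1 = 0$. Hence $p_1 > 0$.
Let $\lambda^*_j = p_{j+1}/p_1$ for $j = 1, \ldots, m$.
Then $L(x, \lambda^*) = f(x) + \sum_{j=1}^m \lambda^*_j g_j(x) \le f(x^*)$ for all $x \in \mathbb{R}^n_+$, where $\lambda^* = (\lambda^*_1, \ldots, \lambda^*_m) \in \mathbb{R}^m_+$. Since $x^*$ is feasible, $g(x^*) \in \mathbb{R}^m_+$, and $\langle \lambda^*, g(x^*) \rangle \ge 0$. But $f(x^*) + \langle \lambda^*, g(x^*) \rangle \le f(x^*)$, implying $\langle \lambda^*, g(x^*) \rangle \le 0$. Thus $\langle \lambda^*, g(x^*) \rangle = 0$. Clearly $\langle \lambda, g(x^*) \rangle \ge 0$ if $\lambda \in \mathbb{R}^m_+$. Thus $L(x, \lambda^*) \le L(x^*, \lambda^*) \le L(x^*, \lambda)$ for any $x \in \mathbb{R}^n_+$, $\lambda \in \mathbb{R}^m_+$. □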
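Kuhn-Tucker Theorem 2 If the pair $(x^*, \lambda^*)$ is a global saddle point for the problem $(f, g)$, then $x^*$ is an optimum.
Proof By the assumption
$$L(x, \lambda^*) \le L(x^*, \lambda^*) \le L(x^*, \lambda)$$
for all $x \in \mathbb{R}^n_+$, $\lambda \in \mathbb{R}^m_+$. Choosing $\lambda = \lambda^* + e_i$, where $e_i$ is the $i$th unit vector, the right-hand inequality gives $g_i(x^*) \ge 0$ for each $i$, so $x^*$ is feasible. Now choose $\lambda = (\lambda^*_1, \ldots, 2\lambda^*_i, \ldots, \lambda^*_m)$. Then $L(x^*, \lambda) \ge L(x^*, \lambda^*)$ implies $g_i(x^*)\lambda^*_i \ge 0$ for each $i$, and so $\langle \lambda^*, g(x^*) \rangle \ge 0$. On the other hand, $L(x^*, \lambda^*) \le L(x^*, 0)$ implies $\langle \lambda^*, g(x^*) \rangle \le 0$. Thus $\langle \lambda^*, g(x^*) \rangle = 0$. Hence $f(x) + \langle \lambda^*, g(x) \rangle \le f(x^*) \le f(x^*) + \langle \lambda, g(x^*) \rangle$. If $x$ is feasible, $g(x) \ge 0$ and so $\langle \lambda^*, g(x) \rangle \ge 0$. Thus $f(x) \le f(x) + \langle \lambda^*, g(x) \rangle \le f(x^*)$ for all $x \in \mathbb{R}^n_+$ with $g(x) \in \mathbb{R}^m_+$. Hence $x^*$ is an optimum for the problem $(f, g)$. Note that, by Theorems 1 and 2, for a solvable concave optimisation problem $(f, g)$, $x^*$ is an optimum for $(f, g)$ iff there is some $\lambda^* \in \mathbb{R}^m_+$ such that $(x^*, \lambda^*)$ is a global saddle point for the Lagrangian $L(x, \lambda)$.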
Moreover $(x^*, \lambda^*)$ are such that $\langle \lambda^*, g(x^*) \rangle$ minimises $\langle \lambda, g(x) \rangle$ for all $\lambda \in \mathbb{R}^m_+$, $x \in \mathbb{R}^n_+$ with $g(x) \in \mathbb{R}^m_+$. Since $\langle \lambda^*, g(x^*) \rangle = 0$ and each term $\lambda^*_i g_i(x^*)$ is non-negative, every term vanishes. This implies that if $g_i(x^*) > 0$ then $\lambda^*_i = 0$, and if $\lambda^*_i > 0$ then $g_i(x^*) = 0$. The coefficients $(\lambda^*_1, \ldots, \lambda^*_m)$ are called shadow prices. If the optimum is such that $g_i(x^*) > 0$ then the shadow price $\lambda^*_i = 0$. In other words, if the optimum does not lie in the boundary of the $i$th constraint set, $\partial B_i = \{x : g_i(x) = 0\}$, then this constraint is slack, with zero shadow price. If the shadow price is non-zero then the constraint cannot be slack, and the optimum lies on the boundary of the constraint set.
In the case of a single constraint, the assumption of non-satiation was sufficient to guarantee that the constraint was not slack. In this case
$$f(x) + \frac{p_2}{p_1} g(x) \le f(x^*) \le f(x^*) + \lambda g(x^*)$$
for any $x \in \mathbb{R}^n_+$ and $\lambda \in \mathbb{R}_+$, where $p_2/p_1 > 0$.
The Kuhn-Tucker theorem is of particular use when objective and constraint functions are smooth. In this case the Lagrangian permits computation of the optimal points of the problem. We deal with these procedures in Chap. 4. □
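As a hedged numerical illustration of Theorems 1 and 2 and of complementary slackness, the Python sketch below uses a small concave problem that is not taken from the text: maximise $f(x) = -(x_1-2)^2 - (x_2-2)^2$ subject to $g_1(x) = 2 - x_1 - x_2 \ge 0$ and $g_2(x) = 3 - x_1 \ge 0$. Solving the first-order conditions by hand suggests $x^* = (1, 1)$ and $\lambda^* = (2, 0)$. The code checks the two saddle-point inequalities on a finite grid of $\mathbb{R}^2_+$ (a check, not a proof), confirms $\langle \lambda^*, g(x^*) \rangle = 0$ with the binding constraint carrying the positive shadow price, and confirms that $x^*$ is the best feasible grid point.

```python
from itertools import product

# Assumed example problem (not from the text):
#   maximise f(x) = -(x1 - 2)^2 - (x2 - 2)^2
#   subject to g1(x) = 2 - x1 - x2 >= 0, g2(x) = 3 - x1 >= 0, x in R^2_+.

def f(x):
    return -(x[0] - 2) ** 2 - (x[1] - 2) ** 2

def g(x):
    return (2 - x[0] - x[1], 3 - x[0])

def lagrangian(x, lam):
    # L(x, lambda) = f(x) + <lambda, g(x)>
    return f(x) + sum(l * gi for l, gi in zip(lam, g(x)))

x_star = (1.0, 1.0)
lam_star = (2.0, 0.0)  # g1 binds (positive shadow price); g2 is slack (zero price)
L_star = lagrangian(x_star, lam_star)

grid = [i / 20 for i in range(81)]  # 0, 0.05, ..., 4 in each coordinate
tol = 1e-12

# Left inequality: L(x, lambda*) <= L(x*, lambda*) for grid points x in R^2_+.
left_ok = all(lagrangian(x, lam_star) <= L_star + tol for x in product(grid, grid))
# Right inequality: L(x*, lambda*) <= L(x*, lambda) for grid points lambda in R^2_+.
right_ok = all(L_star <= lagrangian(x_star, lam) + tol for lam in product(grid, grid))
# Complementary slackness: <lambda*, g(x*)> = 0.
slackness = sum(l * gi for l, gi in zip(lam_star, g(x_star)))
# Among feasible grid points, x* should maximise f (Theorem 2).
best = max((x for x in product(grid, grid) if min(g(x)) >= 0), key=f)

print(best, L_star, left_ok, right_ok, slackness)  # (1.0, 1.0) -2.0 True True 0.0
```

The grid only samples $\mathbb{R}^2_+$, so this is a sanity check of the inequalities for one assumed problem rather than a general solver; computing $x^*$ and $\lambda^*$ from smooth first-order conditions is the subject of Chap. 4.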
3.7 Choice on Compact Sets
In Lemma 3.9 we showed that when a preference relation $P$ is acyclic and lower demi-continuous (LDC) on a compact space $X$, then $P$ admits a choice. Lemma 3.20 gives a different kind of result, making use of compactness and convexity of the constraint set, and convexity and openness of the image $P(x)$ at each point $x$. We now present a class of related results using compactness, convexity and lower demi-continuity. These results are essentially all based on the Brouwer Fixed Point Theorem (Brouwer 1912) and provide the technology for proving existence of a market equilibrium. We first introduce the notions of retraction and contractibility, and then show that any continuous function from the closed ball in $\mathbb{R}^n$ to itself admits a fixed point.
Definition 3.8 Let $X$ be a topological space.
(i) Suppose $Y$ is a subset of $X$. Say $Y$ has the fixed point property iff whenever $f : Y \to X$ is a continuous function (with respect to the induced topology on $Y$) such that $f(Y) = \{f(y) \in X : y \in Y\} \subset Y$, then there exists a point $x \in Y$ such that $f(x) = x$.
(ii) If $Y$ is a topological space, and there exists a bijective continuous function $h : X \to Y$ such that $h^{-1}$ is also continuous, then $h$ is called a homeomorphism and $X, Y$ are said to be homeomorphic (see Sect. 1.2.3 for the definition of bijective).
(iii) If $Y$ is a topological space and there exist continuous functions $g : Y \to X$ and $h : X \to Y$ such that $h \circ g : Y \to X \to Y$ is the identity on $Y$, then $h$ is called an r-map (for $g$).
(iv) If $Y \subset Z \subset X$, $g : Y \to Z$ is the (continuous) inclusion map and $h : Z \to Y$ is an r-map for $g$, then $h$ is called a retraction (of $Z$ on $Y$) and $Y$ is called a retract of $Z$. Equivalently, $h$ is continuous and $h(y) = y$ for all $y \in Y$.
(v) If $Y \subset X$ and there exists a continuous function $f : Y \times [0,1] \to Y$ such that $f(y, 0) = y$ (so $f(\cdot, 0)$ is the identity on $Y$) and $f(y, 1) = y_0 \in Y$ for all $y \in Y$, then $Y$ is said to be contractible.
(vi) Suppose that $Y \subset Z \subset X$ and there exists a continuous function $f : Z \times [0,1] \to Z$ such that $f(z, 0) = z$ for all $z \in Z$, $f(y, t) = y$ for all $y \in Y$ and $t \in [0,1]$, and $f(z, 1) = h(z)$, where $h : Z \to Y$ is a retraction; then $f$ is called a strong retraction of $Z$ on $Y$, and $Y$ is called a deformation retract of $Z$.
To illustrate the idea of contractibility, observe that the closed Euclidean ball in $\mathbb{R}^n$ of radius $\xi$, centered at $x$, namely
$$B^n = \mathrm{clos}(B_d(x, \xi)) = \{y \in \mathbb{R}^n : d(x, y) \le \xi\},$$
is obviously strictly convex and compact. Moreover the center, $\{x\}$, is clearly a deformation retract of $B^n$. To see this let $f(y, t) = (1-t)y + tx$ for $y \in B^n$. Clearly $f(y, 1) = h(y) = x$, so $h : B^n \to \{x\}$ and $h(x) = x$. Since $f$ is continuous, this also implies that $B^n$ is contractible. A continuous function such as $f : B^n \times [0,1] \to B^n$, or more generally $f : Z \times [0,1] \to Z$, is often called a homotopy and written $f_t : Z \to Z$ where $f_t(z) = f(z, t)$. The homotopy $f_t$ between the identity and the retraction $f_1$ of $Z$ on $Y$ means that $Z$ and $Y$ are "topologically" equivalent (more precisely, homotopy equivalent). Thus the ball $B^n$ and the point $\{x\}$ are topologically equivalent in this sense. More generally, if $Y$ is contractible and there is a strong retraction of $Z$ on $Y$, then $Z$ is also contractible.
Lemma 3.21 Let $X$ be a topological space. If $Y$ is contractible and $Y \subset Z \subset X$ is such that $Y$ is a deformation retract of $Z$, then $Z$ is contractible.
Proof Let $g : Z \times [0,1] \to Z$ be the strong retraction of $Z$ on $Y$, and let $f : Y \times [0,1] \to Y$ be the contraction of $Y$ to $y_0 \in Y$. Define
$$r : Z \times [0, \tfrac{1}{2}] \to Z \text{ by } r(z, t) = g(z, 2t),$$
$$r' : Z \times [\tfrac{1}{2}, 1] \to Z \text{ by } r'(z, t) = f(g(z, 1), 2t - 1).$$
We use these to define a contraction $s$ of $Z$ onto $y_0$. Note that $r(z, \tfrac12) = r'(z, \tfrac12)$, since $g(z, 1) = f(g(z, 1), 0)$. This follows because $g$ is a strong retraction and so $g(z, 1) \in Y$, and $f(y, 0) = y$ if $y \in Y$. Clearly $s : Z \times [0,1] \to Z$ (defined by $s(z, t) = r(z, t)$ if $t < \tfrac12$, $s(z, t) = r'(z, t)$ if $t \ge \tfrac12$) is continuous, since $r$ and $r'$ agree on $Z \times \{\tfrac12\}$, and is the identity at $t = 0$. Finally if $t = 1$, then $s(z, 1) = f(g(z, 1), 1) = y_0$. □
Lemma 3.22 If $Z$ is a (non-empty) compact, convex set in $\mathbb{R}^n$, then it is contractible.
Proof For any $x$ in the interior of $Z$ there exists some $\xi > 0$ such that $B^n = \mathrm{clos}(B_d(x, \xi))$ is contained in $Z$. (Indeed $\xi$ can be chosen so that $B^n$ is contained in the interior of $Z$.) As observed in Example 3.3, the closed ball $B^n$ is both compact and strictly convex. By Lemma 3.17, the distance function $d(z, -) : B^n \to \mathbb{R}$ is continuous for each $z \in Z$, and so, since $B^n$ is compact, there exists a point $y(z)$ in $B^n$ such that $d(z, y(z)) \le d(z, y)$ for all $y \in B^n$. Then $d(z, y(z)) = d(z, B^n)$, the distance between $z$ and $B^n$. Indeed $d(z, y(z)) = 0$ iff $z \in B^n$. Moreover, by strict convexity of $B^n$, $y(z)$ is unique for each $z \in Z$, and the nearest-point map $z \mapsto y(z)$ is continuous. Define the function $f : Z \times [0,1] \to Z$ by $f(z, t) = (1-t)z + t\,y(z)$. Since $y(z) \in B^n \subset Z$ for each $z$, and $Z$ is convex, $f(z, t) \in Z$ for all $t \in [0,1]$. Clearly $f(z, 0) = z$ for all $z \in Z$; if $z \in B^n$ then $f(z, t) = z$ for all $t \in [0,1]$; and $f(-, 1) = h : Z \to B^n$ is a retraction. Thus $f$ is a strong retraction, and $B^n$ is a deformation retract of $Z$. Since $B^n$ is contractible, Lemma 3.21 implies that $Z$ is contractible. □
Note that compactness of $Z$ is not strictly necessary for the validity of this lemma.
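The constructions in Lemmas 3.21 and 3.22 can be made concrete in the plane. The Python sketch below is an illustration under assumptions that are not in the text: $Z = [-2, 2]^2$ (compact and convex) and $B^2$ is the closed unit disc centred at the origin, for which the nearest point $y(z)$ has the closed form $z/\|z\|$ when $\|z\| > 1$ and $z$ otherwise. The strong retraction is $g(z,t) = (1-t)z + t\,y(z)$, the contraction of the disc to its centre is $f(y,t) = (1-t)y$, and the contraction $s$ of $Z$ is assembled as in the proof of Lemma 3.21. All function names are illustrative.

```python
import math

def nearest_in_disc(z):
    """Nearest point y(z) of the closed unit disc to z (Euclidean distance)."""
    r = math.hypot(z[0], z[1])
    return z if r <= 1.0 else (z[0] / r, z[1] / r)

def g(z, t):
    """Strong retraction of Z = [-2, 2]^2 onto the disc: (1 - t) z + t y(z)."""
    y = nearest_in_disc(z)
    return ((1 - t) * z[0] + t * y[0], (1 - t) * z[1] + t * y[1])

def f(y, t):
    """Contraction of the disc to its centre y0 = (0, 0): (1 - t) y."""
    return ((1 - t) * y[0], (1 - t) * y[1])

def s(z, t):
    """Contraction of Z built as in Lemma 3.21: run g on [0, 1/2], then f on [1/2, 1]."""
    if t < 0.5:
        return g(z, 2 * t)
    return f(g(z, 1.0), 2 * t - 1)

def close(p, q, eps=1e-9):
    return math.dist(p, q) < eps

samples = [(-2 + i * 0.25, -2 + j * 0.25) for i in range(17) for j in range(17)]
ok_start = all(close(s(z, 0.0), z) for z in samples)                  # s(., 0) is the identity
ok_end = all(close(s(z, 1.0), (0.0, 0.0)) for z in samples)           # s(., 1) is the point y0
ok_glue = all(close(g(z, 1.0), f(g(z, 1.0), 0.0)) for z in samples)   # r and r' agree at t = 1/2
ok_fix = all(close(g(z, 0.7), z) for z in samples if math.hypot(*z) <= 1.0)  # disc stays fixed
ok_inside = all(max(abs(c) for c in s(z, t / 10)) <= 2.0 for z in samples for t in range(11))
print(ok_start, ok_end, ok_glue, ok_fix, ok_inside)
```

Note that the homotopy runs from the identity at $t = 0$ to the retraction at $t = 1$, matching Definition 3.8(vi); the checks are on sample points only.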
Lemma 3.23 If $Z$ is contractible to $z_0$ and $Y \subset Z$ is a retract of $Z$ by $h : Z \to Y$, then $Y$ is contractible to $h(z_0)$.
Proof Let $f : Z \times [0,1] \to Z$ be the contraction of $Z$ on $z_0$, and let $h : Z \to Y$ be the retraction. Then $h \circ f : Z \times [0,1] \to Z \to Y$ is continuous, and we restrict it to $Y \times [0,1]$. If $y \in Y$, then $f(y, 0) = y$ and $h(y) = y$ (because $h$ is a retraction). Thus $h \circ f(y, 0) = y$. Moreover $f(z, 1) = z_0$ for all $z \in Z$, so $h \circ f(z, 1) = h(z_0)$. □
Clearly being a deformation retract of $Z$ is a much stronger property than being a retract of $Z$. Both of these properties are useful in proving that any compact convex set has the fixed point property, and that the sphere is neither contractible nor has the fixed point property.
Remember that the sphere of radius $\xi$ in $\mathbb{R}^n$, with center $x$, is
$$S^{n-1} = \mathrm{Boundary}(\mathrm{clos}(B_d(x, \xi))) = \{y \in \mathbb{R}^n : d(x, y) = \xi\}.$$
Now let $x_0 \in S^{n-1}$ be the north pole of the sphere. We shall give an intuitive argument why $D = S^{n-1} \setminus \{x_0\}$ is contractible, but $S^{n-1}$ is not contractible.
Example 3.9 Let $D = S^{n-1} \setminus \{x_0\}$ and let $Z$ be a copy of $D$ which is flattened at the south pole. Let $D_0$ be the flattened disc round the South Pole, $x_S$. Clearly $D_0$ is homeomorphic to an $(n-1)$-dimensional ball $B^{n-1}$ centered on $x_S$. Then $Z$ is homeomorphic to the object $D_0 \times [0,1)$. There is obviously a strong retraction of $D$ onto $D_0$. This retraction may be thought of as the function that moves any point $z \in S^{n-1} \setminus \{x_0\}$ down the lines of longitude to $D_0$. Since $D_0$ is compact and convex, it is contractible to $x_S$ (Lemma 3.22) and thus, by Lemma 3.21, there is a contraction $f : D \times [0,1] \to D$ to $x_S$.
Fig. 3.22 The retraction of $B^n$ on $S^{n-1}$
To indicate why $S^{n-1}$ cannot be contractible, let us suppose, without loss of generality, that $g : S^{n-1} \times [0,1] \to S^{n-1}$ is a contraction of $S^{n-1}$ to $x_S$, and that $g$ extends the contraction $f : D \times [0,1] \to D$ (i.e., $g(z, t) = f(z, t)$ whenever $z \in D$). Now $f(y, t)$ maps each point $y \in D$ to a point further south on the longitudinal line through $y$. If we require $g(y, t) = f(y, t)$ for each $y \in D$, and we require $g$ to be continuous at $(x_0, 0)$, then it is necessary for $g(x_0, t)$ to be an equatorial circle in $S^{n-1}$. In other words, if $g$ is a function it must fail continuity at $(x_0, 0)$. While this is not a formal argument, the underlying idea is clear: the sphere $S^{n-1}$ contains a hole, and it is topologically different from the ball $B^n$.
Brouwer's Theorem Any compact, convex set in $\mathbb{R}^n$ has the fixed point property.
Proof We prove first that the ball $B^n \equiv B$ has the fixed point property. Suppose otherwise: that is, there exists a continuous function $f : B \to B$ with $x \ne f(x)$ for all $x \in B$.
Since $f(x) \ne x$, construct the ray from $f(x)$ through $x$ and extend it to the boundary of $B$. Now label the point where the ray and the boundary of $B$ intersect as $h(x)$. Since the boundary of $B$ is $S^{n-1}$, we have constructed a function $h : B \to S^{n-1}$. It is easy to see that $h$ is continuous (because $f$ is continuous). Moreover if $x' \in S^{n-1}$ then $h(x') = x'$. Since $S^{n-1} \subset B$, it is clear that $h : B \to S^{n-1}$ is a retraction. (See Fig. 3.22.) By Lemma 3.23, the contractibility of $B$ to its center, $x_0$ say, implies that $S^{n-1}$ is contractible to $h(x_0)$. But Example 3.9 indicates that $S^{n-1}$ is not contractible. The contradiction implies that any continuous function $f : B \to B$ has a fixed point.
Now let $Y$ be any compact convex set in $\mathbb{R}^n$. Then there exist some $\xi$, some $y_0 \in Y$ and a closed $\xi$-ball, centered at $y_0$, such that $Y$ is contained in $B = \mathrm{clos}(B_d(y_0, \xi))$. As in the proof of Lemma 3.22, there exists a strong retraction $g : B \times [0,1] \to B$, so $Y$ is a deformation retract of $B$. (See Fig. 3.23.) In particular $g(-, 1) = h : B \to Y$ is a retraction. If $f : Y \to Y$ is continuous, then $f \circ h : B \to Y \to Y \subset B$ is continuous and has a fixed point. Since the image of $f \circ h$ is in $Y$, this fixed point, $y_1$, belongs to $Y$. Hence $f \circ h(y_1) = y_1$ for some $y_1 \in Y$. But $h = \mathrm{id}$ (the identity) on $Y$, so $h(y_1) = y_1$ and thus $y_1 = f(y_1)$. Consequently $y_1 \in Y$ is a fixed point of $f$. Thus $Y$ has the fixed point property. □
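The map $h$ used in the first part of the proof can be written down explicitly when $B$ is the closed unit ball: $h(x) = x + s\,(x - f(x))$, where $s \ge 0$ is the non-negative root of $\|x + s(x - f(x))\|^2 = 1$. The Python sketch below illustrates this under that assumption; the map `rotate` is a hypothetical example (a rotation of the disc, whose only fixed point is the origin). Because every continuous $f$ has a fixed point, $h$ can only be evaluated where $f(x) \ne x$; the proof's point is that if there were no fixed points at all, this same formula would give a retraction of the ball onto the sphere.

```python
import math

def retraction_point(x, fx):
    """Point where the ray from f(x) through x leaves the closed unit ball.

    Solves |x + s d|^2 = 1 for the non-negative root s, where d = x - f(x) != 0.
    """
    d = [xi - fi for xi, fi in zip(x, fx)]
    a = sum(di * di for di in d)
    b = 2 * sum(xi * di for xi, di in zip(x, d))
    c = sum(xi * xi for xi in x) - 1.0  # c <= 0 because x lies in the ball
    s = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return [xi + s * di for xi, di in zip(x, d)]

def rotate(x, angle=0.5):
    """Hypothetical continuous map of the unit disc to itself (fixed point: the origin)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * x[0] - s * x[1], s * x[0] + c * x[1]]

interior = [0.3, -0.4]
h_int = retraction_point(interior, rotate(interior))
print(math.hypot(*h_int))  # approximately 1.0: h maps into the sphere S^1

boundary = [math.cos(1.2), math.sin(1.2)]
h_bd = retraction_point(boundary, rotate(boundary))
print(h_bd, boundary)  # approximately equal: h fixes points of S^1
```

On the sphere, $\langle x, x - f(x) \rangle = 1 - \langle x, f(x) \rangle > 0$ whenever $f(x) \ne x$, so the non-negative root is $s = 0$ and $h(x) = x$, which is exactly the retraction property used in the proof.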
Fig. 3.23 The strong retraction of $B$ on $Y$
Example 3.10 The standard compact, convex set in $\mathbb{R}^n$ is the $(n-1)$-simplex $\Delta^{n-1}$ defined by
$$\Delta = \Delta^{n-1} = \Big\{x = (x_1, \ldots, x_n) \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1, \text{ and } x_i \ge 0 \ \forall i\Big\}.$$
$\Delta$ has $n$ vertices $\{x^0_i\}$, where $x^0_i = (0, \ldots, 1, \ldots, 0)$ with a 1 in the $i$th entry. An edge between $x^0_i$ and $x^0_j$ is the arc $\langle\langle x^0_i, x^0_j \rangle\rangle$ or convex set of the form
$$\{x \in \mathbb{R}^n : x = \lambda x^0_i + (1-\lambda)x^0_j \text{ for } \lambda \in [0,1]\}.$$
An $s$-dimensional face of $\Delta$ is simply the convex hull of $(s+1)$ different vertices. Note in particular that there are $n$ different $(n-2)$-dimensional faces. Such a face is opposite the vertex $x^0_i$, so we may label this face $\Delta^{n-2}_i$. These $n$ different faces have empty intersection. However any subfamily, $F$, of this family of $n$ faces (where $F$ has cardinality at most $(n-1)$) does have a non-empty intersection. In fact if $F$ has cardinality $(n-1)$ then the intersection is a vertex.
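The intersection pattern of the $(n-2)$-dimensional faces can be checked combinatorially. In the Python sketch below (an illustration, not part of the text), a face of $\Delta^{n-1}$ is represented by its set of vertex indices. Because the vertices of a simplex are affinely independent, the intersection of faces is the face spanned by their common vertices, and it is empty exactly when they share no vertex. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb

def opposite_face(i, n):
    """Vertex set of the (n-2)-dimensional face Delta_i^{n-2} opposite vertex i."""
    return frozenset(range(n)) - {i}

def intersect_faces(faces, n):
    """Vertex set of the intersection of a family of faces (empty set: empty intersection)."""
    common = frozenset(range(n))
    for face in faces:
        common &= face
    return common

n = 4  # the 3-simplex Delta^3 in R^4
faces = [opposite_face(i, n) for i in range(n)]

# All n faces together: no common vertex, so the intersection is empty.
print(intersect_faces(faces, n))  # frozenset()

# Any n - 1 of them meet in exactly one vertex.
for family in combinations(faces, n - 1):
    print(sorted(intersect_faces(family, n)))

# Number of s-dimensional faces is C(n, s + 1).
print([comb(n, s + 1) for s in range(n)])  # [4, 6, 4, 1]
```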
Brouwer's Theorem allows one to derive further results on the existence of choice.
Lemma 3.24 Let $Q : \Delta \to \mathbb{R}^n$ be an LDC correspondence from the $(n-1)$-dimensional simplex to $\mathbb{R}^n$, such that $Q(x)$ is both non-empty and convex for each $x \in \Delta$. Then there exists a continuous selection, $f$, for $Q$, namely a continuous function $f : \Delta \to \mathbb{R}^n$ such that $f(x) \in Q(x)$ for all $x \in \Delta$.
Proof Since $Q(x) \ne \Phi$ for all $x \in \Delta$, for each $x \in \Delta$ we have $x \in Q^{-1}(y)$ for some $y \in \mathbb{R}^n$. Hence $\{Q^{-1}(y) : y \in \mathbb{R}^n\}$ is a cover for $\Delta$. Since $Q$ is LDC, $Q^{-1}(y)$ is open for all $y \in \mathbb{R}^n$, and thus the cover is an open cover. $\Delta$ is compact. As in the proof of Lemma 3.9, there is a finite index set $A = \{y_1, \ldots, y_k\}$ of points in $\mathbb{R}^n$ such that $\{Q^{-1}(y_i) : y_i \in A\}$ covers $\Delta$. Define $\alpha_i : \Delta \to \mathbb{R}$ by $\alpha_i(x) = d(x, \Delta \setminus Q^{-1}(y_i))$ for $i = 1, \ldots, k$, and let $g_i : \Delta \to \mathbb{R}$ be given by $g_i(x) = \alpha_i(x) / \sum_{j=1}^k \alpha_j(x)$. As before,
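$d$ is the distance operator, so $\alpha_i(x)$ is the distance from $x$ to $\Delta \setminus Q^{-1}(y_i)$. $\{g_i\}$ is known as a partition of unity for $Q$. Clearly $\sum_i g_i(x) = 1$ for all $x \in \Delta$ (the denominator is positive because the sets $Q^{-1}(y_i)$ cover $\Delta$), and $g_i(x) = 0$ iff $x \in \Delta \setminus Q^{-1}(y_i)$ (since $\Delta \setminus Q^{-1}(y_i)$ is closed and thus compact). Finally define $f : \Delta \to \mathbb{R}^n$ by $f(x) = \sum_{i=1}^k g_i(x) y_i$. By the construction, $g_i(x) = 0$ iff $y_i \notin Q(x)$; thus $f(x)$ is a convex combination of points all in $Q(x)$. Since $Q(x)$ is convex, $f(x) \in Q(x)$. By Lemma 3.17, each $\alpha_i$ is continuous. Thus $f$ is continuous. □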
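The partition-of-unity construction can be tried out in one dimension. The Python sketch below is a hedged illustration with assumptions that are not in the text: $\Delta$ is identified with the interval $[0, 1]$, the correspondence is $Q(x) = (x - r, x + r)$ with $r = 0.3$ (so $Q^{-1}(y) = (y - r, y + r)$ is open and $Q$ is LDC with convex values), and the finite index set is $A = \{0, 0.25, 0.5, 0.75, 1\}$. The names `alpha`, `partition_of_unity` and `selection` are illustrative.

```python
R = 0.3                           # Q(x) = (x - R, x + R), an open convex set
YS = [0.0, 0.25, 0.5, 0.75, 1.0]  # finite index set A; the sets Q^{-1}(y) cover [0, 1]

def alpha(x, y):
    """Distance from x to [0, 1] minus Q^{-1}(y) = (y - R, y + R)."""
    a, b = y - R, y + R
    if not a < x < b:
        return 0.0                # x already lies in the closed complement
    gaps = []
    if a >= 0.0:
        gaps.append(x - a)        # complement contains [0, a]; nearest point is a
    if b <= 1.0:
        gaps.append(b - x)        # complement contains [b, 1]; nearest point is b
    return min(gaps)              # non-empty here because 2 * R < 1

def partition_of_unity(x):
    weights = [alpha(x, y) for y in YS]
    total = sum(weights)          # positive because the open sets cover [0, 1]
    return [w / total for w in weights]

def selection(x):
    """f(x) = sum_i g_i(x) y_i, a convex combination of points of Q(x)."""
    return sum(gi * y for gi, y in zip(partition_of_unity(x), YS))

xs = [k / 200 for k in range(201)]
print(max(abs(selection(x) - x) for x in xs) < R)  # True: f(x) lies in Q(x)
```

Only the $y_i$ with $x \in Q^{-1}(y_i)$ receive positive weight, so $f(x)$ averages points of $Q(x)$, as in the proof; continuity of $f$ comes from continuity of the distance functions.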
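Lemma 3.25 Let $P : \Delta \to \Delta$ be an LDC correspondence such that for all $x \in \Delta$, $P(x)$ is convex and $x \notin \mathrm{Con}\, P(x)$, the convex hull of $P(x)$. Then the choice $C_P(\Delta)$ is non-empty.
Proof Suppose $C_P(\Delta) = \Phi$. Then $P(x) \ne \Phi$ for all $x \in \Delta$. By Lemma 3.24, there exists a continuous function $f : \Delta \to \Delta$ such that $f(x) \in P(x)$ for all $x \in \Delta$ (the selection takes values in $\Delta$ because $P(x) \subset \Delta$). By Brouwer's Theorem, there exists a fixed point $x_0 \in \Delta$ such that $x_0 = f(x_0) \in P(x_0)$. This contradicts $x \notin \mathrm{Con}\, P(x)$ for all $x \in \Delta$. Thus $C_P(\Delta) \ne \Phi$. □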
These two lemmas are stated for correspondences with domain the finite dimensional simplex, $\Delta$. Clearly they are valid for correspondences with domain a (finite dimensional) compact convex space. However both results can be extended to (infinite dimensional) topological vector spaces. The general version of Lemma 3.24 is known as Michael's Selection Theorem (Michael 1956). However it is necessary to impose conditions on the domain and codomain spaces. In particular it is necessary to be able to construct a partition of unity. For this purpose we can use a condition called "paracompactness" rather than compactness. A Hausdorff space $X$ is paracompact if every open cover $\{U_i\}$ of $X$ has an open refinement that is locally finite, i.e., each point $x \in X$ has an open set $U_x$ containing $x$ that intersects only finitely many of the sets of the refinement. To construct the continuous selection it is also necessary that the codomain $Y$ of the correspondence has a norm, and is complete (essentially this means that every Cauchy sequence of points in $Y$ converges to a point of $Y$). A complete normed topological vector space $Y$ is called a Banach space. We also need $Y$ to be "separable" (i.e., $Y$ contains a countable dense subset). If $Y$ is a separable Banach space we say it is admissible.
Michael's Selection Theorem employs a property called lower hemi-continuity.
Definition 3.9 A correspondence $Q : X \to Y$ between the topological spaces $X$ and $Y$ is lower hemi-continuous (LHC) if whenever $U$ is open in $Y$, then the set $\{x \in X : Q(x) \cap U \ne \Phi\}$ is open in $X$.
Michael's Selection Theorem Suppose $Q : X \to Y$ is a lower hemi-continuous correspondence from a paracompact, Hausdorff topological space $X$ into the admissible space $Y$, such that $Q(x)$ is non-empty, closed and convex for all $x \in X$. Then there exists a continuous selection $f : X \to Y$ for $Q$.
Lemma 3.24 also provides the proof for a useful intersection theorem known as the Knaster-Kuratowski-Mazurkiewicz (KKM) Theorem.
Before stating this theorem, consider an arbitrary collection $\{x_1, \ldots, x_k\}$ of distinct points in $\mathbb{R}^n$. Then (when the points are affinely independent) the convex hull, $\Delta$, of these points can be identified with a $(k-1)$-dimensional simplex. Let $S \subset \{1, \ldots, k\}$ be any index set, and let $\Delta_S$ be the $(s-1)$-dimensional simplex generated by the corresponding collection of $s$ points (where $s = |S|$).
KKM Theorem Let $R : X \to Y$ be a correspondence from a convex set $X$, contained in a Hausdorff topological vector space $Y$, into $Y$, such that $R(x) \ne \Phi$ for all $x \in X$. Suppose that for at least one point $x_0 \in X$, $R(x_0)$ is compact. Suppose further that $R(x)$ is closed for all $x \in X$. Finally, for any set $\{x_1, \ldots, x_k\}$ of points in $X$, suppose that
$$\mathrm{Con}\{x_1, \ldots, x_k\} \subset \bigcup_{i=1}^{k} R(x_i).$$
Then $\bigcap_{x \in X} R(x)$ is non-empty.
Proof By Lemma 3.8, since $R(x_0)$ is compact, $\bigcap_{x \in X} R(x)$ is non-empty iff $\bigcap_{i=1}^k R(x_i) \ne \Phi$ for any finite index set. So let $K = \{1, \ldots, k\}$ and let $\Delta$ be the $(k-1)$-dimensional simplex spanned by $\{x_1, \ldots, x_k\}$. Define $P : \Delta \to \Delta$ by $P(x) = \{y \in \Delta : x \in \Delta \setminus R(y)\}$ and define $Q : \Delta \to \Delta$ by $Q(x) = \mathrm{Con}\, P(x)$, the convex hull of $P(x)$.
But $P^{-1}(y) = \{x \in \Delta : y \in P(x)\} = \Delta \setminus R(y)$ is an open set in $\Delta$, and so $P$ is LDC. Thus $Q$ is LDC. Now suppose that $\bigcap_{i \in K} R(x_i) = \Phi$.
Thus for each $x \in \Delta$ there exists $x_i$ ($i \in K$) such that $x \notin R(x_i)$. But then $x \in \Delta \setminus R(x_i)$ and so $x \in P^{-1}(x_i)$. In particular, for each $x \in \Delta$, $P(x)$, and thus $Q(x)$, is non-empty. Moreover $\{Q^{-1}(x_i) : i \in K\}$ is an open cover for $\Delta$. As in the proof of Lemma 3.24, there is a partition of unity for $Q$. (We need $Y$ to be Hausdorff for this construction.) In particular there exists a continuous selection $f : \Delta \to \Delta$ for $Q$. By Brouwer's Theorem, $f$ has a fixed point $x_0 \in \Delta$. Thus $x_0 \in \mathrm{Con}\, P(x_0)$, and so $x_0 \in \mathrm{Con}\{y_1, \ldots, y_k\}$ where $y_i \in P(x_0)$ for $i \in K$. But then $x_0 \in \Delta \setminus R(y_i)$ for $i \in K$, and so $x_0 \notin R(y_i)$ for $i \in K$.
Hence
$$\mathrm{Con}\{y_1, \ldots, y_k\} \not\subset \bigcup_{i=1}^{k} R(y_i).$$
This contradicts the hypothesis of the Theorem. Consequently $\bigcap_{i \in K} R(x_i) \ne \Phi$ for any finite vertex set $K$. By compactness $\bigcap_{x \in X} R(x) \ne \Phi$. □
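For intuition, the Python sketch below checks the classical vertex-indexed form of the KKM lemma on the 2-simplex. This is an assumption-laden special case, not the general statement above: it uses the hypothetical closed sets $C_i = \{x \in \Delta^2 : x_i \ge 1/3\}$, one per vertex, verifies on a rational grid that every face spanned by the vertices in $S$ is covered by $\bigcup_{i \in S} C_i$, and then lists the grid points common to all three sets. Exact fractions avoid rounding at the boundary $x_i = 1/3$.

```python
from fractions import Fraction
from itertools import combinations

N = 30                    # grid resolution: points (i, j, k) / N with i + j + k = N
n = 3                     # vertices of the 2-simplex
THIRD = Fraction(1, 3)

def grid_points():
    for i in range(N + 1):
        for j in range(N + 1 - i):
            yield (Fraction(i, N), Fraction(j, N), Fraction(N - i - j, N))

def in_C(x, i):
    """Hypothetical closed set C_i = {x : x_i >= 1/3}."""
    return x[i] >= THIRD

def kkm_condition_holds():
    """Every grid point of the face spanned by S lies in the union of C_i, i in S."""
    points = list(grid_points())
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            for x in points:
                support = {k for k in range(n) if x[k] > 0}
                if support <= set(S) and not any(in_C(x, i) for i in S):
                    return False
    return True

common = [x for x in grid_points() if all(in_C(x, i) for i in range(n))]
print(kkm_condition_holds())  # True
print(common)                 # the barycentre (1/3, 1/3, 1/3), as Fractions
```

On a face spanned by $S$ the coordinates in $S$ sum to 1, so the largest is at least $1/|S| \ge 1/3$, which is why the covering condition holds; the theorem then predicts a common point, here the barycentre.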
We can immediately use the KKM theorem to prove a fixed point theorem for a correspondence on a compact convex subset $X$ of a Hausdorff topological vector space $Y$. In particular $X$ need not be finite dimensional.
Browder Fixed Point Theorem Let $Q : X \to X$ be a correspondence where $X$ is a compact convex subset of the Hausdorff topological vector space $Y$. Suppose further that $Q$ is LDC, and that $Q(x)$ is convex and non-empty for all $x \in X$. Then there exists $x_0 \in X$ such that $x_0 \in Q(x_0)$.
Proof Suppose that $x \notin Q(x)$ for all $x \in X$. Define $R : X \to X$ by $R(x) = X \setminus Q^{-1}(x)$. Then $x \in R(x)$, so $R(x) \ne \Phi$; and since $Q$ is LDC, $R(x)$ is closed and thus compact for all $x \in X$. To use KKM, we seek to show that $\mathrm{Con}\{x_1, \ldots, x_k\} \subset \bigcup_{i=1}^k R(x_i)$ for any finite index set $K = \{1, \ldots, k\}$. We proceed by contradiction. That is, suppose that there exists $x_0$ in $X$ with $x_0 \in \mathrm{Con}\{x_1, \ldots, x_k\}$ but $x_0 \notin R(x_i)$ for $i \in K$. Then $x_0 \in Q^{-1}(x_i)$, so $x_i \in Q(x_0)$ for all $i \in K$. But then $x_0 \in \mathrm{Con}\, Q(x_0)$. Since $Q(x)$ is convex for all $x \in X$, this implies that $x_0 \in Q(x_0)$, a contradiction. Consequently $x_0 \in R(x_i)$ for some $i \in K$. By the KKM Theorem, $\bigcap_{x \in X} R(x) \ne \Phi$.
Thus there exists $x_0 \in X$ with $x_0 \in R(x)$, and so $x_0 \in X \setminus Q^{-1}(x)$, for all $x \in X$. Thus $x_0 \notin Q^{-1}(x)$ and $x \notin Q(x_0)$ for all $x \in X$. This contradicts the assumption that $Q(x) \ne \Phi$ for all $x \in X$. Hence there exists $x_0 \in X$ with $x_0 \in Q(x_0)$. □
Ky Fan Theorem Let $P : X \to X$ be an LDC correspondence where $X$ is a compact convex subset of the Hausdorff topological vector space $Y$. If $x \notin \mathrm{Con}\, P(x)$ for all $x \in X$, then the choice $C_P(X_0) \ne \Phi$ for any compact convex subset $X_0$ of $X$.
Proof Define $Q : X \to X$ by $Q(x) = \mathrm{Con}\, P(x)$; $Q$ is LDC because $P$ is. If $Q(x) \ne \Phi$ for all $x \in X$, then by the Browder fixed point theorem, there exists $x_0 \in X$ with $x_0 \in Q(x_0)$. This contradicts $x \notin \mathrm{Con}\, P(x)$ for all $x \in X$. Hence $Q(x_0) = \Phi$ for some $x_0 \in X$. Thus $C_P(X) = \{x \in X : P(x) = \Phi\}$ is non-empty. The same inference is valid for any compact, convex subset $X_0$ of $X$. □
3.8 Political and Economic Choice
The results outlined in the previous section are based on the intersection property of a family of closed sets. With compactness, this result can be extended to the case of a correspondence $R : X \to X$ to show that $\bigcap_{x \in X} R(x) \ne \Phi$. If we regard $R$ as derived from an LDC correspondence $P : X \to X$ by $R(x) = X \setminus P^{-1}(x)$, then $R(x)$ can be interpreted as the set of points "no worse than" or "at least as good as" $x$. But then the choice $C_P(X) = \bigcap_{x \in X} R(x)$, since such a choice must be at least as good as any other point. The finite-dimensional version (Lemma 3.25) of the proof that the choice is non-empty is based simply on a fixed point argument using Brouwer's Theorem. To extend the result to an infinite dimensional topological vector space we reduced the problem to one on a finite dimensional simplex, $\Delta$, spanned by $\{x_1, \ldots, x_k : x_i \in X\}$, and then showed essentially that $\bigcap_{x \in \Delta} R(x)$ is non-empty. By compactness then $\bigcap_{x \in X} R(x)$ is non-empty. There is, in fact, a related infinite dimensional version of Brouwer's Fixed Point Theorem, known as Schauder's fixed point theorem, for a continuous function $f : X \to Y$, where $Y$ is a compact subset of the convex Banach space $X$.
Fig. 3.24
One technique for proving existence of an equilibrium price vector in an exchange economy (as discussed in Sect. 3.5) is to construct a continuous function $f : \Delta \to \Delta$, where $\Delta$ is the price simplex of feasible price vectors, and show that $f$ has a fixed point (using either the Brouwer or Schauder fixed point theorems).
An alternative technique is to use the Ky Fan Theorem to prove existence of an equilibrium price vector. This technique permits a proof even when preferences are not representable by utility functions. More importantly, perhaps, it can be used in the infinite dimensional case.
Example 3.11 To illustrate the Ky Fan Theorem with a simple example, consider Fig. 3.24, which reproduces Fig. 1.11 from Chap. 1.
It is evident that the inverse preference, $P^{-1}$, is not LDC: for example, $P^{-1}(\tfrac34) = (\tfrac14, \tfrac12] \cup (\tfrac34, 1]$, which is not open. As we saw earlier the choice of $P$ on the unit interval is empty. In fact, to ensure existence of a choice we can require simply that $P$ be lower hemi-continuous. This we can do by deleting the segment $(\tfrac12, 1)$ above the point $\tfrac12$. If the choice were indeed empty, then by Michael's Selection Theorem we could find a continuous selection $f : [0,1] \to [0,1]$ for $P$. By Brouwer's fixed point theorem $f$ has a fixed point, $x_0$ say, with $x_0 \in P(x_0)$. By inspection the fixed point must be $x_0 = \tfrac12$. If we require $P$ to be irreflexive (since it is a strict preference relation) then this means that $\tfrac12 \notin P(\tfrac12)$, and so the choice must be $C_P([0,1]) = \{\tfrac12\}$. Notice that the preference displayed in Fig. 3.24 cannot be represented by a utility function. This follows because the implicit indifference relation is intransitive.
The Ky Fan Theorem gives a proof of the existence of a choice for a "spatial voting game". Remember a social choice procedure $\sigma$ is simple iff it is defined by a family $\mathcal{D}$ of decisive coalitions, each a subset of the society $M$. In this case if $\pi = (P_1, \ldots, P_m)$ is a profile on the topological space $X$, then the social preference is given by
$$x\,\sigma(\pi)\,y \iff x \in \bigcup_{A \in \mathcal{D}} \bigcap_{i \in A} P_i(y) = P_{\mathcal{D}}(y).$$
Fig. 3.25 The graph of indifference
Here we use $P_i : X \to X$ to denote the preference correspondence of individual $i$. For coalition $A$, $x \in P_A(y) = \bigcap_{i \in A} P_i(y)$ means every member of $A$ prefers $x$ to $y$.
Thus $x \in \bigcup_{A \in \mathcal{D}} P_A(y)$ means that for some coalition $A \in \mathcal{D}$, all members of $A$ prefer $x$ to $y$. Finally we write $x \in P_{\mathcal{D}}(y)$ for the condition that $x$ is socially preferred to $y$. The choice for $\sigma(\pi)$ on $X$ is then
$$C_{\sigma(\pi)}(X) = \{x : P_{\mathcal{D}}(x) = \Phi\}.$$
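On a finite set of alternatives, $P_{\mathcal{D}}$ and $C_{\sigma(\pi)}(X)$ can be computed directly from these definitions. The Python sketch below is a hedged illustration with assumed data that is not in the text: three voters whose preferences are given by distance from hypothetical bliss points 0.2, 0.5 and 0.9 on a grid of $[0, 1]$, and $\mathcal{D}$ the coalitions of at least two voters. In this one-dimensional case the computed choice is the median bliss point.

```python
from itertools import combinations

VOTERS = (1, 2, 3)
# Decisive coalitions for majority rule: any coalition with at least two voters.
D = [set(c) for r in (2, 3) for c in combinations(VOTERS, r)]

BLISS = {1: 0.2, 2: 0.5, 3: 0.9}  # hypothetical ideal points on the line
X = [i / 10 for i in range(11)]   # finite grid of alternatives in [0, 1]

def prefers(i, x, y):
    """x in P_i(y): voter i strictly prefers x to y (x is closer to the bliss point)."""
    return abs(x - BLISS[i]) < abs(y - BLISS[i])

def P_D(y):
    """Socially preferred set: x such that every member of some A in D prefers x to y."""
    return [x for x in X if any(all(prefers(i, x, y) for i in A) for A in D)]

choice = [y for y in X if not P_D(y)]  # C = {y : P_D(y) is empty}
print(choice)  # [0.5]
```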
Nakamura Theorem Suppose $X$ is a compact convex subset of a topological vector space, with $X$ of dimension $n$. Suppose that $\pi = (P_1, \ldots, P_m)$ is a profile on $X$ such that each preference $P_i : X \to X$ is (i) LDC; and (ii) semi-convex, in the sense that $x \notin \mathrm{Con}\, P_i(x)$ for all $x \in X$. If $\sigma$ is simple and has Nakamura number $k(\sigma)$, and if $n \le k(\sigma) - 2$, then the choice $C_{\sigma(\pi)}(X)$ is non-empty.
Proof For any point $x$, $y \in P^{-1}_{\mathcal{D}}(x)$ means $x \in P_{\mathcal{D}}(y)$, and so $x \in P_i(y)$ for all $i \in A$, for some $A \in \mathcal{D}$. Thus $y \in P^{-1}_i(x)$ for all $i \in A$, i.e., $y \in \bigcap_{i \in A} P^{-1}_i(x)$, and so $P^{-1}_{\mathcal{D}}(x) = \bigcup_{A \in \mathcal{D}} \bigcap_{i \in A} P^{-1}_i(x)$. But each $P_i$ is LDC and so $P^{-1}_i(x)$ is open for all $x \in X$. Finite intersections of open sets are open, as are arbitrary unions, and so $P_{\mathcal{D}}$ is LDC.
Now suppose that $P_{\mathcal{D}}$ is not semi-convex (that is, $x \in \mathrm{Con}\, P_{\mathcal{D}}(x)$ for some $x \in X$). Since $X$ is $n$-dimensional and convex, this means (by Carathéodory's theorem) that it is possible to find a set of points $\{x_1, \ldots, x_{n+1}\}$ such that $x \in \mathrm{Con}\{x_1, \ldots, x_{n+1}\}$ and such that $x_j \in P_{\mathcal{D}}(x)$ for each $j = 1, \ldots, n+1$. This means there exists a subfamily $\mathcal{D}' = \{A_1, \ldots, A_{n+1}\}$ of $\mathcal{D}$ such that $x_j \in P_{A_j}(x)$. Now $n + 1 \le k(\sigma) - 1$, and by the definition of the Nakamura number, the collegium $K(\mathcal{D}')$ is non-empty. In particular there is some individual $i \in A_j$ for $j = 1, \ldots, n+1$. Hence $x_j \in P_i(x)$ for $j = 1, \ldots, n+1$. But then $x \in \mathrm{Con}\, P_i(x)$. This contradicts the semi-convexity of individual preference. Consequently $P_{\mathcal{D}}$ is semi-convex. The conditions of the Ky Fan Theorem are satisfied, and so $P_{\mathcal{D}}(x) = \Phi$ for some $x \in X$. Thus $C_{\sigma(\pi)}(X) \ne \Phi$. □
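For a quota rule, where $\mathcal{D}$ consists of all coalitions of at least $q$ of the $m$ voters, the Nakamura number can be found by brute force as the smallest number of decisive coalitions with empty intersection. The Python sketch below is an illustration, not part of the text. It searches only over coalitions of size exactly $q$, which suffices because replacing each coalition in a family by a $q$-element subset can only shrink the intersection, and it compares the result with $\lceil m/(m-q) \rceil$, which follows because $s$ complements of size $m-q$ can cover all $m$ voters iff $s(m-q) \ge m$. For simple majority with $m = 3$ it gives $k(\sigma) = 3$, so the theorem guarantees a choice when $n \le 1$ but not in $\mathbb{R}^2$, consistent with Example 3.12 below.

```python
from itertools import combinations

def nakamura_number(m, q):
    """Smallest number of coalitions of size >= q (out of m voters) with empty intersection.

    Assumes q < m; for q = m (unanimity) the rule is collegial and k is infinite.
    """
    voters = frozenset(range(m))
    minimal = [frozenset(c) for c in combinations(range(m), q)]
    for s in range(1, len(minimal) + 1):
        for family in combinations(minimal, s):
            if not voters.intersection(*family):
                return s
    raise ValueError("collegial rule")

def quota_formula(m, q):
    # Integer ceiling of m / (m - q).
    return (m + (m - q) - 1) // (m - q)

for m, q in [(3, 2), (4, 3), (5, 3), (5, 4), (6, 4), (7, 4)]:
    k = nakamura_number(m, q)
    print(m, q, k, quota_formula(m, q), "choice guaranteed for n <=", k - 2)
```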
It is worth observing that in finite dimensional spaces the Ky Fan Theorem is valid with the continuity property weakened to lower hemi-continuity (LHC). Note first that if $P$ is LDC then it is LHC; this follows because $\{x \in X : P(x) \cap V \ne \Phi\} = \bigcup_{y \in V} (P^{-1}(y) \cap X)$ is the union of open sets and thus open.
Moreover (as suggested in Example 3.11) if $P$ is LHC and the choice is empty, then the correspondence $x \mapsto \mathrm{Con}\, P(x)$ has a continuous selection $f$ (by Michael's Selection Theorem). By the Brouwer Fixed Point Theorem, there is a fixed point $x_0$ such that $x_0 \in \mathrm{Con}\, P(x_0)$. This violates semi-convexity of $P$. Thus the Nakamura Theorem is valid when preferences are LHC. The finite dimensional version of the Ky Fan Theorem can be used to show existence of a Nash equilibrium (Nash 1950).
Definition 3.10
(i) A Game $G = \{(P_i, X) : i \in M\}$ for society $M$ consists of a strategy space $X_i$ and a strict preference correspondence $P_i : X \to X$ for each $i \in M$, where $X = \prod_i X_i = X_1 \times \cdots \times X_m$ is the joint strategy space.
(ii) In a game $G$, the Nash improvement correspondence for individual $i$ is defined by $\hat{P}_i : X \to X$, where $y \in \hat{P}_i(x)$ iff $y \in P_i(x)$ and
$$y = (x_1, \ldots, x_{i-1}, x^*_i, x_{i+1}, \ldots, x_m), \qquad x = (x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_m),$$
that is, $y$ differs from $x$ only in the $i$th coordinate.
(iii) The overall Nash improvement correspondence is
$$\hat{P} = \bigcup_{i \in M} \hat{P}_i : X \to X.$$
(iv) A point $x \in X$ is a Nash Equilibrium for the game $G$ iff $\hat{P}(x) = \Phi$.
Bergstrom (1975, 1992) showed the following.
Bergstrom's Theorem Let $G = \{(P_i, X)\}$ be a game, and suppose each $X_i \subset \mathbb{R}^n$ is a non-empty compact, convex subset of $\mathbb{R}^n$. Suppose further that for all $i \in M$, $\hat{P}_i$ is both semi-convex and LHC. Then there exists a Nash equilibrium for $G$.
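Proof Since each $\hat{P}_i$ is LHC, it follows easily that $\hat{P} : X \to X$ is LHC. To see that $\hat{P}$ is semi-convex, suppose that $y \in \mathrm{Con}\, \hat{P}(x)$. Then $y = \sum_{i \in M} \lambda_i y^i$ with $\sum_{i \in M} \lambda_i = 1$, $\lambda_i \ge 0$ for all $i \in M$, and $y^i \in \mathrm{Con}\, \hat{P}_i(x)$. By the definition of $\hat{P}_i$, $y - x = \sum_{i \in M} \lambda_i z^i$ where $z^i = y^i - x$.
This follows because $y^i$ and $x$ only differ on the $i$th coordinate, and so $z^i = (0, \ldots, z^i_0, \ldots, 0)$, where $z^i_0 \in \mathbb{R}^n$ occupies the $i$th block. Moreover $z^i \ne 0$, because $\hat{P}_i$ is semi-convex and so $y^i \ne x$.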
Fig. 3.26 Failure of semi-convexity
Clearly $\{z^i : i \in M, \lambda_i \ne 0\}$ is a linearly independent set, since its elements are non-zero and occupy different coordinate blocks, so $y = x$ iff $\lambda_i = 0$ for all $i \in M$. But $\sum_{i \in M} \lambda_i = 1$, so this cannot happen. Thus $y \ne x$ and so $\hat{P}$ is semi-convex. By the Ky Fan Theorem, for $X$ finite dimensional, the choice of $\hat{P}$ on $X$ is non-empty. Thus the set of Nash equilibria is non-empty. □
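Definition 3.10(iv) can be checked numerically for a concrete game. The Python sketch below assumes a hypothetical Cournot duopoly that is not in the text: each player $i$ chooses $q_i \in X_i = [0, 12]$ (compact and convex) and has payoff $u_i(q) = q_i(12 - q_1 - q_2)$, with $P_i$ induced by $u_i$. Each payoff is continuous and concave in the player's own strategy, so the improvement sets are convex and exclude the current profile. Solving the first-order conditions by hand suggests $q^* = (4, 4)$; the code checks, over a grid of unilateral deviations, that the Nash improvement correspondence $\hat{P}(q^*)$ is empty and that it is non-empty at a non-equilibrium profile.

```python
A = 12.0  # hypothetical demand intercept; strategy spaces X_i = [0, A]

def payoff(i, q):
    """u_i(q) = q_i (A - q_1 - q_2) for the assumed Cournot duopoly."""
    return q[i] * (A - q[0] - q[1])

def nash_improvements(q, grid):
    """Grid approximation of P_hat(q): change one coordinate and raise that player's payoff."""
    improving = []
    for i in range(2):
        for qi in grid:
            y = list(q)
            y[i] = qi
            y = tuple(y)
            if payoff(i, y) > payoff(i, q) + 1e-12:
                improving.append(y)
    return improving

grid = [k / 100 for k in range(1201)]  # 0.00, 0.01, ..., 12.00
print(len(nash_improvements((4.0, 4.0), grid)))      # 0: no player can improve unilaterally
print(len(nash_improvements((3.0, 4.0), grid)) > 0)  # True: (3, 4) is not an equilibrium
```

Bergstrom's Theorem only asserts existence; the grid search is a check of one candidate for one assumed game, not a general method for locating equilibria.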
Although the Nakamura Theorem guarantees existence of a social choice for a social choice rule σ, for any semi-convex and LDC profile in dimension at most k(σ) − 2, it is easy to construct situations in higher dimensions with empty choice. The example we now present also describes a game with no Nash equilibrium.
Example 3.12 Consider a simple voting procedure with M = {1, 2, 3} and let D consist of any coalition with at least two voters. Let X be a compact convex set in $\mathbb{R}^2$, and construct preferences for each i in M as follows. Each i has a "bliss point" $x_i \in X$ and a preference $P_i$ on X such that, for $x, y \in X$, $x \in P_i(y)$ iff $\|x - x_i\| < \|y - x_i\|$. The preference is clearly LDC and semi-convex (since $P_i(x)$ is a convex set and $x \notin P_i(x)$ for all $x \in X$). Now let $\Delta = \mathrm{Con}\{x_1, x_2, x_3\}$ be the 2-dimensional simplex in X (for convenience suppose all $x_i$ are distinct, not collinear, and in the interior of X). For each $A \subset M$ let $P_A(x) = \bigcap_{i \in A} P_i(x)$ as before.
In particular the choice for the strict Pareto rule is $C_{P_M}(X) = C_M(X) = \Delta$. This can be seen by noting that $x \in C_M(X)$ iff there is no point $y \in X$ such that $\|y - x_i\| < \|x - x_i\|$ for all $i \in M$. Clearly this condition holds exactly at the points of Δ. Preferences of this kind, defined by Euclidean distance from a bliss point, are called "Euclidean preferences".
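The claim that the Pareto set for three Euclidean voters is exactly Δ can be checked numerically. The Python sketch below assumes three non-collinear bliss points in the plane; all helper names (`barycentric`, `pareto_improvement`, and so on) are illustrative and not from the text. It tests membership of Δ with barycentric coordinates and, for a point outside Δ, returns the nearest point of Δ, which every voter strictly prefers because projecting onto a convex set moves strictly closer to every point of that set.

```python
import math

def dist(a, b):
    """Euclidean distance between two points of R^2."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def barycentric(x, a, b, c):
    """Weights (w_a, w_b, w_c), summing to 1, with x = w_a*a + w_b*b + w_c*c."""
    u = (b[0] - a[0], b[1] - a[1])
    v = (c[0] - a[0], c[1] - a[1])
    w = (x[0] - a[0], x[1] - a[1])
    det = u[0] * v[1] - v[0] * u[1]        # non-zero because a, b, c are not collinear
    s = (w[0] * v[1] - v[0] * w[1]) / det
    t = (u[0] * w[1] - w[0] * u[1]) / det
    return (1 - s - t, s, t)

def in_simplex(x, bliss, tol=1e-12):
    """True iff x lies in Delta = Con{x_1, x_2, x_3}."""
    return all(weight >= -tol for weight in barycentric(x, *bliss))

def project_segment(x, a, b):
    """Closest point to x on the segment [a, b]."""
    ab = (b[0] - a[0], b[1] - a[1])
    t = ((x[0] - a[0]) * ab[0] + (x[1] - a[1]) * ab[1]) / (ab[0] ** 2 + ab[1] ** 2)
    t = max(0.0, min(1.0, t))
    return (a[0] + t * ab[0], a[1] + t * ab[1])

def pareto_improvement(x, bliss):
    """Return y that every voter strictly prefers to x, or None if x is in Delta."""
    if in_simplex(x, bliss):
        return None
    a, b, c = bliss
    # For x outside Delta, the nearest point of Delta lies on one of its edges.
    candidates = [project_segment(x, p, q) for p, q in ((a, b), (b, c), (c, a))]
    return min(candidates, key=lambda y: dist(x, y))

bliss = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]   # hypothetical bliss points
print(pareto_improvement((1.0, 1.0), bliss), pareto_improvement((3.0, 3.0), bliss))
```

A point inside Δ admits no unanimous improvement, while a point outside is beaten by its projection onto Δ.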
Now consider a point x in the interior of X. At x the preferred sets for the three minimal winning coalitions {1, 2}, {1, 3} and {2, 3} (collectively D′) do not satisfy the semi-convexity property. Figure 3.26 shows that
$$x \in \mathrm{Con}\big(P_{12}(x) \cup P_{13}(x) \cup P_{23}(x)\big).$$
While $P_{D'}$ is LDC, it violates semi-convexity. Thus the Ky Fan Theorem cannot be used to guarantee existence of a choice. To illustrate the connection with Walker's Theorem, note also that there is a voting cycle: 1 prefers a to b to c, 2 prefers b to c to a, and 3 prefers c to a to b. The reason the cycle can be constructed is that
the Nakamura number is 3 and dim(X) = 2, so n = k(σ) − 1, which exceeds the bound k(σ) − 2 of the Nakamura Theorem. In fact there is no social choice in X. This voting cycle also defines a game with no Nash equilibrium.
Fig. 3.27 Empty Nash equilibrium
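The voting cycle of Example 3.12 can be verified directly. This is a minimal Python sketch, assuming the three rankings stated in the text and the rule that any coalition of at least two voters is decisive; the function names are illustrative.

```python
# Rankings from Example 3.12, best alternative first.
RANKINGS = {
    1: ["a", "b", "c"],
    2: ["b", "c", "a"],
    3: ["c", "a", "b"],
}

def prefers(voter, x, y):
    """True iff `voter` ranks x above y."""
    ranking = RANKINGS[voter]
    return ranking.index(x) < ranking.index(y)

def majority_prefers(x, y):
    """True iff some coalition in D (at least two of the three voters) prefers x to y."""
    return sum(prefers(v, x, y) for v in RANKINGS) >= 2

def unbeaten(alternatives):
    """Alternatives that no other alternative beats by majority: the majority choice."""
    return [x for x in alternatives
            if not any(majority_prefers(y, x) for y in alternatives if y != x)]

print(majority_prefers("a", "b"), majority_prefers("b", "c"), majority_prefers("c", "a"))
print(unbeaten(["a", "b", "c"]))
```

Each alternative is beaten by the next one in the cycle, so the majority choice on {a, b, c} is empty.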
Let {1, 2} be the two parties, and let the "strategy spaces" be $X_1 = X_2 = \Delta$. Say party i adopts a winning position $y_i \in X_i$ over party j's position $y_j \in X_j$ iff $y_i \in P_{D'}(y_j)$, so that party i gains a majority over party j. This induces a Nash improvement correspondence $\hat P_j$, for j = 1, 2.
For example, $\hat P_1 : Y \to Y$, where $Y = X_1 \times X_2$, is defined by $(y_1^*, y_2) \in \hat P_1(y_1, y_2)$ iff $y_1^* \in P_{D'}(y_2)$, so $y_1^*$ is a winning position over $y_2$, but $y_1 \notin P_{D'}(y_2)$, so $y_1$ is not a winning position. In other words party 1 prefers to move from $y_1$ to $y_1^*$ so as to win. It is evident that if we choose $y_1, y_2$ and the points $\{z_1, z_2, z_3\}$ as in Fig. 3.27, then $z_i \in P_{D'}(y_2)$, so $(z_i, y_2) \in \hat P_1(y_1, y_2)$ for i = 1, 2, 3, and $y_1 \in \mathrm{Con}\{z_1, z_2, z_3\}$.
Thus $(y_1, y_2) \in \mathrm{Con}\,\hat P_1(y_1, y_2)$. Because of the failure of semi-convexity of both $\hat P_1$ and $\hat P_2$, a Nash equilibrium cannot be guaranteed. In fact the set of Nash equilibria is empty.
We now briefly indicate how the Ky Fan Theorem can be used to show existence of an equilibrium price vector in an exchange economy. First of all each individual i initially holds a vector of endowments $e_i \in \mathbb{R}^n$. A price vector p belongs to the (n − 1)-dimensional simplex $\Delta^{n-1}$: that is, $p = (p_1, \dots, p_n)$ with $p_j \ge 0$ for all j and
$$\sum_{j=1}^n p_j = 1.$$
An allocation is a point $x \in X = \prod_{i \in M} X_i \subset (\mathbb{R}^n)^m$, where $X_i$ is i's consumption set in $\mathbb{R}^n_+$ (here $x_i \in \mathbb{R}^n_+$ iff $x_{ij} \ge 0$ for $j = 1, \dots, n$). At the price vector p, the ith budget set is
$$B_i(p) = \{x_i \in \mathbb{R}^n_+ : \langle p, x_i \rangle \le \langle p, e_i \rangle\}.$$
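Budget-set membership, and the fact that scaling p by any λ > 0 leaves $B_i(p)$ unchanged, can be illustrated with a small Python sketch. The prices, endowment and bundles below are hypothetical numbers chosen for illustration; `normalise` rescales a non-negative, non-zero price vector onto the simplex.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def in_budget(p, x_i, e_i):
    """True iff x_i is in B_i(p): non-negative and <p, x_i> <= <p, e_i>."""
    return all(q >= 0 for q in x_i) and dot(p, x_i) <= dot(p, e_i)

def normalise(p):
    """Rescale non-negative prices (not all zero) so that they sum to 1."""
    total = sum(p)
    return [q / total for q in p]

p, e_i = [2.0, 1.0], [1.0, 1.0]            # hypothetical prices and endowment
affordable, too_dear = [0.5, 1.5], [1.5, 0.5]
for lam in (0.5, 3.0, 10.0):
    scaled = [lam * q for q in p]
    # Scaling prices scales both sides of the budget inequality by lam.
    print(lam, in_budget(scaled, affordable, e_i), in_budget(scaled, too_dear, e_i))
```

Because both sides of $\langle p, x_i\rangle \le \langle p, e_i\rangle$ scale by λ, the membership test gives the same answer for every positive multiple of p.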
At price p, the demand vector $x = (x_1, \dots, x_m)$ satisfies the optimality condition $\hat P_i(x) \cap \{x' \in X : x'_i \in B_i(p)\} = \Phi$ for each i.
As before, $\hat P_i : X \to X$ is the Nash improvement correspondence (as in Definition 3.10).
As we discussed earlier in Sect. 3.5.1, an equilibrium price vector is a price vector $p = (p_1, \dots, p_n)$ such that the demand vector x satisfies the optimality condition at p and such that total demand does not exceed supply. This latter condition
requires that $\sum_{i \in M} x_i \le \sum_{i \in M} e_i$ (the two terms are both vectors in $\mathbb{R}^n$). That is, if we use the second suffix j of $x_{ij}$ to denote commodity j, then $\sum_{i \in M}(x_{ij} - e_{ij}) \le 0$ for each $j = 1, \dots, n$.
Note also that a transformation $p \mapsto \lambda p$, for a real number λ > 0, does not change the budget set. This follows because $B_i(p) = \{x_i \in \mathbb{R}^n_+ : \langle p, x_i \rangle \le \langle p, e_i \rangle\} = B_i(\lambda p)$. Consequently if p is an equilibrium price vector, then so is λp. Without loss of generality, then, we can normalize the price vector p so that ‖p‖ = 1 for some norm on $\mathbb{R}^n$. We may do this, for example, by assuming that $\sum_{j=1}^n p_j = 1$ and that $p_j \ge 0$ for all j. For this reason we let $\Delta^{n-1}$ represent the set of all price vectors.
A further point is worth making. Since we assume only that $p_j \ge 0$ for all j, it is possible for commodity j to have zero price. But then the good must be in excess supply. To ensure this, we require the equilibrium price vector p and i's demand $x_i$ at p to satisfy the condition $\langle p, x_i \rangle = \langle p, e_i \rangle$. As we noted in Sect. 3.5.1, this can be ensured by assuming that preference is locally non-satiated in X. That is, if we let $\bar P_i(x) \subset X_i$ be the projection of $\hat P_i$ onto $X_i$ at x, then for any neighborhood U of $x_i$ in $X_i$ there exists $x'_i \in U$ such that $x'_i \in \bar P_i(x)$.
To show existence of a price equilibrium we need to define a price adjustment mechanism. To this end define:
$\hat P_0 : \Delta \times X \to \Delta$ by
$$\hat P_0(p, x) = \Big\{p' \in \Delta : \Big\langle p' - p, \sum_{i \in M}(x_i - e_i) \Big\rangle > 0\Big\}. \qquad (**)$$
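The price adjuster's improvement set in (**) has a simple structure: since $\langle p', z\rangle$ is linear in p′, its maximum over the simplex is the largest component of the aggregate excess demand $z = \sum_{i\in M}(x_i - e_i)$, so $\hat P_0(p, x)$ is empty exactly when $\langle p, z\rangle \ge \max_j z_j$. The Python sketch below checks this with hypothetical demands and endowments; the function names are illustrative.

```python
def excess_demand(x, e):
    """z = sum_i (x_i - e_i): aggregate excess demand, one entry per commodity."""
    n = len(e[0])
    return [sum(xi[j] - ei[j] for xi, ei in zip(x, e)) for j in range(n)]

def price_improvement(p, z, tol=1e-12):
    """Return some p' in the simplex with <p' - p, z> > 0, or None if none exists.

    <p', z> is linear in p', so over the simplex it is maximised at the vertex
    that puts all weight on a commodity with the largest excess demand.
    """
    current = sum(pj * zj for pj, zj in zip(p, z))
    j = max(range(len(z)), key=lambda k: z[k])
    if z[j] - current > tol:
        return [1.0 if k == j else 0.0 for k in range(len(z))]
    return None

x = [[2.0, 0.0], [1.0, 0.5]]     # hypothetical final demands of two individuals
e = [[1.0, 1.0], [1.0, 1.0]]     # hypothetical endowments
z = excess_demand(x, e)          # first good in excess demand, second in excess supply
print(z, price_improvement([0.5, 0.5], z))
```

The adjuster shifts weight toward goods in excess demand; at a zero-priced good in excess supply it has no profitable move.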
Now let $X^* = \Delta \times X$ and define $P_0^* : X^* \to X^*$ by $(p', x) \in P_0^*(p, y)$ iff $x = y$ and $p' \in \hat P_0(p, x)$.
In the same way, for each $i \in M$ extend $\hat P_i : X \to X$ to $P_i^* : X^* \to X^*$ by letting $(p', x) \in P_i^*(p, y)$ iff $p' = p$ and $x \in \hat P_i(y)$. This defines an exchange game $G_e = \{(P_i^*, X^*) : i = 0, \dots, m\}$.
Bergstrom (1992) has shown (under the appropriate conditions of semi-convexity, LHC and local monotonicity for each $\hat P_i$) that there is a Nash equilibrium to $G_e$. Note that $e \in (\mathbb{R}^n)^m$ is the initial vector of endowments. We can show that the Nash equilibria comprise a non-empty set $\{(p, x) \in \Delta \times X\}$, where $x = (x_1, \dots, x_m)$ is a vector of final demands for the members of M, and p is an equilibrium price vector.
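A Nash equilibrium $(p, x) \in \Delta \times X$ satisfies the following properties:
(i) Since $P_i^*(p, x) = \Phi$ for each $i \in M$, it follows that $x_i$ is i's demand. Moreover, by local monotonicity we have $\langle p, x_i \rangle = \langle p, e_i \rangle$ for all $i \in M$. Thus
$$\sum_{i \in M} \langle p, x_i - e_i \rangle = 0. \qquad (*)$$
(ii) Now $P_0^*(p, x) = \Phi$, so $\big\langle p' - p, \sum_{i \in M}(x_i - e_i) \big\rangle \le 0$ for all $p' \in \Delta$ (see (**)).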
Suppose that $\sum_{i \in M}(x_{ij} - e_{ij}) > 0$ for some commodity j. Then it is clearly possible to find $p' \in \Delta$ (for example the vertex with $p'_j = 1$) such that $\langle p', \sum_{i \in M}(x_i - e_i) \rangle > 0$. By (*), $\langle p, \sum_{i \in M}(x_i - e_i) \rangle = 0$, so $\langle p' - p, \sum_{i \in M}(x_i - e_i) \rangle > 0$, which violates (ii).
Consequently $\sum_{i \in M}(x_{ij} - e_{ij}) \le 0$ for $j = 1, \dots, n$. Thus $x \in (\mathbb{R}^n)^m$ satisfies the feasibility constraint $\sum_{i \in M} x_i \le \sum_{i \in M} e_i$.
Hence (p, x) is a free-disposal equilibrium, in the sense that total demand may be less than total supply. Bergstrom (1992) demonstrates how additional assumptions on individual preference are sufficient to guarantee equality of demand and supply. Section 4.4, below, discusses this more fully.
References
The reference for the Kuhn-Tucker Theorems is:
Kuhn, H. W., & Tucker, A. W. (1950). Non-linear programming. In Proceedings: 2nd Berkeley symposium on mathematical statistics and probability. Berkeley: University of California Press.
A useful further reference for economic applications is:
Heal, E. M. (1973). Theory of economic planning. Amsterdam: North Holland.
The classic references on fixed point theorems and the various corollaries of the theorems are:
Brouwer, L. E. J. (1912). Über Abbildung von Mannigfaltigkeiten. Mathematische Annalen, 71, 97–115.
Browder, F. E. (1967). A new generalization of the Schauder fixed point theorem. Mathematische Annalen, 174, 285–290.
Browder, F. E. (1968). The fixed point theory of multivalued mappings in topological vector spaces. Mathematische Annalen, 177, 283–301.
Fan, K. (1961). A generalization of Tychonoff's fixed point theorem. Mathematische Annalen, 142, 305–310.
Knaster, B., Kuratowski, K., & Mazurkiewicz, S. (1929). Ein Beweis des Fixpunktsatzes für n-dimensionale Simplexe. Fundamenta Mathematicae, 14, 132–137.
Michael, E. (1956). Continuous selections I. Annals of Mathematics, 63, 361–382.
Nash, J. F. (1950). Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 36, 48–49.
Schauder, J. (1930). Der Fixpunktsatz in Funktionalräumen. Studia Mathematica, 2, 171–180.
A useful discussion of the relationships between the theorems can be found in:
Border, K. (1985). Fixed point theorems with applications to economics and game theory. Cambridge: Cambridge University Press.
The proof of the Browder fixed point theorem by KKM is given in:
Yannelis, N., & Prabhakar, N. (1983). Existence of maximal elements and equilibria in linear topological spaces. Journal of Mathematical Economics, 12, 233–245.
Applications of the Fan Theorem to show existence of a price equilibrium can be found in:
Aliprantis, C., & Brown, D. (1983). Equilibria in markets with a Riesz space of commodities. Journal of Mathematical Economics, 11, 189–207.
Bergstrom, T. (1975). The existence of maximal elements and equilibria in the absence of transitivity (Typescript). Washington University in St. Louis.
Bergstrom, T. (1992). When non-transitive relations take maxima and competitive equilibrium can't be beat. In W. Neuefeind & R. Riezman (Eds.), Economic theory and international trade. Berlin: Springer.
Shafer, W. (1976). Equilibrium in economies without ordered preferences or free disposal. Journal of Mathematical Economics, 3, 135–137.
Shafer, W., & Sonnenschein, H. (1975). Equilibrium in abstract economies without ordered preferences. Journal of Mathematical Economics, 2, 345–348.
The application of the idea of the Nakamura number to existence of a voting equilibrium can be found in:
Greenberg, J. (1979). Consistent majority rules over compact sets of alternatives. Econometrica, 41, 285–297.
Schofield, N. (1984). Social equilibrium and cycles on compact sets. Journal of Economic Theory, 33, 59–71.
Strnad, J. (1985). The structure of continuous-valued neutral monotonic social functions. Social Choice and Welfare, 2, 181–195.
Finally there are results on existence of a joint political economic equilibrium, namely an outcome (p, t, x, y), where (p, x) is a market equilibrium, t is an equilibrium tax schedule voted on under a rule D, and y is an allocation of publicly provided goods.
Konishi, H. (1996). Equilibrium in abstract political economies: with an application to a public good economy with voting. Social Choice and Welfare, 13, 43–50.
4 Differential Calculus and Smooth Optimisation
Under certain conditions a continuous function $f : \mathbb{R}^n \to \mathbb{R}^m$ can be approximated at each point x in $\mathbb{R}^n$ by a linear function $df(x) : \mathbb{R}^n \to \mathbb{R}^m$, known as the differential of f at x. In the same way the differential df may be approximated by a bilinear map $d^2f(x)$. When all differentials are continuous, f is called smooth. For a smooth function f, Taylor's Theorem gives a relationship between the differentials at a point x and the value of f in a neighbourhood of that point. This in turn allows us to characterise maximum points of the function by features of the first and second differential. For a real-valued function whose preference correspondence is convex we can virtually identify critical points (where df(x) = 0) with maxima.
In the maximisation problem for a smooth function on a "smooth" constraint set, we seek critical points of the Lagrangian, introduced in the previous chapter. In particular, in economic situations with exogenous prices we may characterise optimum points for consumers and producers as points where the differential of the utility or production function is given by the price vector. Finally we use these results to show that for a society the set of Pareto optimal points belongs to a set of generalised critical points of a function which describes the preferences of the society.
4.1 Differential of a Function
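A function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x \in \mathbb{R}$ if
$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
exists (and is neither +∞ nor −∞). When this limit exists we shall write it as $\frac{df}{dx}\big|_x$. Another way of writing this is that, for every sequence $(x_n) \to x$ with $x_n \ne x$,
$$\frac{f(x_n) - f(x)}{x_n - x} \to \frac{df}{dx}\Big|_x,$$
the derivative of f at x. This means that there is a real number $\lambda(x) = \frac{df}{dx}\big|_x \in \mathbb{R}$ such that $f(x+h) - f(x) = \lambda(x)h + \varepsilon|h|$, where $\varepsilon \to 0$ as $h \to 0$.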
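The definition can be explored numerically. This Python sketch, using $f(x) = x^2$ at x = 3 as an illustrative choice, computes the difference quotient and the error term ε from $f(x+h) - f(x) = \lambda(x)h + \varepsilon|h|$; for this f the quotient equals 6 + h and ε equals |h| up to rounding, so both behave as the definition requires.

```python
def difference_quotient(f, x, h):
    """(f(x + h) - f(x)) / h, whose limit as h -> 0 is df/dx at x."""
    return (f(x + h) - f(x)) / h

def epsilon(f, x, h, derivative):
    """The epsilon in f(x + h) - f(x) = derivative * h + epsilon * |h|."""
    return (f(x + h) - f(x) - derivative * h) / abs(h)

def square(x):
    return x * x

for h in (1e-1, 1e-2, 1e-3, -1e-3):
    print(h, difference_quotient(square, 3.0, h), epsilon(square, 3.0, h, 6.0))
```

Very small h eventually lets floating-point rounding dominate, so this illustrates the limit rather than computing it.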
N. Schofield, Mathematical Methods in Economics and Social Choice, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6_4, © Springer-Verlag Berlin Heidelberg 2014
Fig. 4.1
Let df(x) be the linear function $\mathbb{R} \to \mathbb{R}$ given by $df(x)(h) = \lambda(x)h$. Then the map $\mathbb{R} \to \mathbb{R}$ given by
$$h \mapsto f(x) + df(x)(h) = g(x+h)$$
is a "first order approximation" to the map
$$h \mapsto f(x+h).$$
In other words the maps $h \mapsto g(x+h)$ and $h \mapsto f(x+h)$ are "tangent" to one another, where "tangent" means that
$$\frac{|f(x+h) - g(x+h)|}{|h|}$$
approaches 0 as $h \to 0$. Note that the map $h \mapsto f(x) + df(x)(h) = g(x+h)$ has a straight-line graph, and so df(x) is a "linear approximation" to the function f at x.
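Example 4.1
1. Suppose $f : \mathbb{R} \to \mathbb{R} : x \mapsto x^2$. Then
$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{2hx + h^2}{h} = 2x + \lim_{h \to 0} h = 2x.$$
Similarly if $f : \mathbb{R} \to \mathbb{R} : x \mapsto x^r$ then $df(x) = rx^{r-1}$.
2. Suppose $f : \mathbb{R} \to \mathbb{R} : x \mapsto \sin x$. Using $\cos h - 1 \approx -h^2/2$ and $\sin h \approx h$ for small h,
$$\lim_{h \to 0} \frac{\sin(x+h) - \sin x}{h} = \lim_{h \to 0} \frac{\sin x(\cos h - 1) + \cos x \sin h}{h} = \lim_{h \to 0} \frac{\sin x}{h}\Big(-\frac{h^2}{2}\Big) + \lim_{h \to 0} \frac{\cos x}{h}(h) = \cos x.$$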
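3. $f : \mathbb{R} \to \mathbb{R} : x \mapsto e^x$. Then
$$\lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = \lim_{h \to 0} \frac{e^x}{h}\Big[1 + h + \frac{h^2}{2} + \cdots - 1\Big] = e^x.$$
4. $f : \mathbb{R} \to \mathbb{R} : x \mapsto x^4$ if $x \ge 0$, $x^2$ if $x < 0$. Consider the limit as h approaches 0 from above (i.e., $h \to 0^+$). Then
$$\lim_{h \to 0^+} \frac{f(0+h) - f(0)}{h} = \lim_{h \to 0^+} \frac{h^4 - 0}{h} = \lim_{h \to 0^+} h^3 = 0.$$
The limit as h approaches 0 from below is
$$\lim_{h \to 0^-} \frac{f(0+h) - f(0)}{h} = \lim_{h \to 0^-} \frac{h^2 - 0}{h} = \lim_{h \to 0^-} h = 0.$$
Thus df(0) is defined and equal to 0.
5. $f : \mathbb{R} \to \mathbb{R}$, given by
$$f(x) = \begin{cases} -x^2 & x \le 0 \\ (x-1)^2 - 1 & 0 < x \le 1 \\ -x & x > 1. \end{cases}$$
Then
$$\lim_{x \to 0^-} f(x) = 0, \qquad \lim_{x \to 0^+} f(x) = 0,$$
so f is continuous at x = 0, and
$$\lim_{x \to 1^-} f(x) = -1, \qquad \lim_{x \to 1^+} f(x) = -1,$$
so f is continuous at x = 1. However,
$$\lim_{x \to 0^-} df(x) = \lim_{x \to 0^-} (-2x) = 0, \qquad \lim_{x \to 0^+} df(x) = \lim_{x \to 0^+} 2(x-1) = -2,$$
$$\lim_{x \to 1^-} df(x) = \lim_{x \to 1^-} 2(x-1) = 0, \qquad \lim_{x \to 1^+} df(x) = \lim_{x \to 1^+} (-1) = -1.$$
Hence df(x) is not continuous at x = 0 and x = 1; indeed, since the one-sided derivatives disagree there, f is not differentiable at these points.
To extend the definition to the higher dimension case, we proceed as follows: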
Fig. 4.2
Definition 4.1 Let X, Y be two normed vector spaces, with norms $\|\cdot\|_X$, $\|\cdot\|_Y$, and suppose $f, g : X \to Y$. Then say f and g are tangent at $x \in X$ iff
$$\lim_{\|h\|_X \to 0} \frac{\|f(x+h) - g(x+h)\|_Y}{\|h\|_X} = 0.$$
If there exists a linear map $df(x) : X \to Y$ such that the function $g : X \to Y$ given by
$$g(x+h) = f(x) + df(x)(h)$$
is tangent to f at x, then f is said to be differentiable at x, and df(x) is called the differential of f at x.
In other words df(x) is the differential of f at x iff there is a linear approximation df(x) to f at x, in the sense that
$$f(x+h) - f(x) = df(x)(h) + \|h\|_X\,\mu(h),$$
where $\mu : X \to Y$ and $\|\mu(h)\|_Y \to 0$ as $\|h\|_X \to 0$.
Note that since df(x) is a linear map from X to Y, its image is a vector subspace of Y, and df(x)(0) is the origin, 0, in Y.
Suppose now that f is defined on an open ball U in X. For some $x \in U$, consider an open neighborhood V of x in U. The image of the map
$$h \mapsto g(x+h), \quad \text{for each } h \text{ with } x + h \in V,$$
will be of the form $f(x) + df(x)(h)$; that is, it lies in a linear subspace of Y translated by the vector f(x) from the origin.
If f is differentiable at x, then we can regard df(x) as a linear map from X to Y, so $df(x) \in L(X, Y)$, the set of linear maps from X to Y. As we have shown in Sect. 3.2 of Chap. 3, L(X, Y) is a normed vector space when X is finite dimensional. For example, for $k \in L(X, Y)$ we can define ‖k‖ by
$$\|k\| = \sup\{\|k(x)\|_Y : x \in X \text{ s.t. } \|x\|_X = 1\}.$$
Let $\mathcal{L}(X, Y)$ denote L(X, Y) with the topology induced from this norm.
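For a linear map between finite-dimensional spaces the norm ‖k‖ can be estimated directly from its definition. The sketch below assumes Euclidean norms on both $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$ (the text allows any norms); under that assumption the supremum equals the largest singular value of the matrix, which NumPy returns as `np.linalg.norm(A, 2)`. The function name and sample sizes are illustrative.

```python
import numpy as np

def operator_norm_estimate(A, samples=20000, seed=0):
    """Approximate sup{||A x|| : ||x|| = 1} by sampling unit vectors (Euclidean norms)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(samples, A.shape[1]))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # each row is now a unit vector
    return np.linalg.norm(X @ A.T, axis=1).max()     # largest ||A x|| over the sample

A = np.array([[3.0, 0.0], [0.0, 1.0]])
print(operator_norm_estimate(A), np.linalg.norm(A, 2))
```

Sampling can only approach the supremum from below, so the estimate never exceeds the exact spectral norm.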
When $f : U \subset X \to Y$ is continuous we shall call f a $C^0$-map. If f is $C^0$ and df(x) is defined at x, then df(x) will be linear and thus continuous, in the sense that $df(x) \in L(X, Y)$.
Hence we can regard df as a map
$$df : U \to L(X, Y).$$
It is important to note here that though the map df(x) may be continuous, the map $df : U \to L(X, Y)$ need not be continuous at x. However when f is $C^0$ and the map
$$df : U \to L(X, Y)$$
is continuous at every $x \in U$, then we shall say that f is a $C^1$-differentiable map on U. Let $C^0(U, Y)$ be the set of maps from U to Y which are continuous on U, and let $C^1(U, Y)$ be the set of maps which are $C^1$-differentiable on U. Clearly $C^1(U, Y) \subset C^0(U, Y)$.
If f is a differentiable map, then df(x), since it is linear, can be represented by a matrix. Suppose therefore that $f : \mathbb{R}^n \to \mathbb{R}$, and let $\{e_1, \dots, e_n\}$ be the standard basis for $\mathbb{R}^n$. Then for any $h \in \mathbb{R}^n$, $h = \sum_{i=1}^n h_i e_i$, and so $df(x)(h) = \sum_{i=1}^n h_i\, df(x)(e_i) = \sum_{i=1}^n h_i \alpha_i$, say.
Consider the vector $(0, \dots, h_i, \dots, 0) = h_i e_i \in \mathbb{R}^n$. Then by the definitions
$$\alpha_i = df(x)(e_i) = \lim_{h_i \to 0} \left\{ \frac{f(x_1, \dots, x_i + h_i, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{h_i} \right\} = \frac{\partial f}{\partial x_i}\Big|_x,$$
where $\frac{\partial f}{\partial x_i}\big|_x$ is called the partial derivative of f at x with respect to the ith coordinate, $x_i$.
Thus the linear function $df(x) : \mathbb{R}^n \to \mathbb{R}$ can be represented by a "row vector" or matrix
$$Df(x) = \left( \frac{\partial f}{\partial x_1}\Big|_x, \dots, \frac{\partial f}{\partial x_n}\Big|_x \right).$$
Note that this representation is dependent on the particular choice of the basis for $\mathbb{R}^n$. This matrix Df(x) can also be regarded as a vector in $\mathbb{R}^n$, and is then called the direction gradient of f at x. The ith coordinate of Df(x) is the partial derivative of f with respect to $x_i$ at x.
If h is a vector in $\mathbb{R}^n$ with coordinates $(h_1, \dots, h_n)$ with respect to the standard basis, then
$$df(x)(h) = \sum_{i=1}^n h_i \frac{\partial f}{\partial x_i}\Big|_x = \langle Df(x), h \rangle,$$
where $\langle Df(x), h \rangle$ is the scalar product of h and the direction gradient Df(x).
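The identity $df(x)(h) = \langle Df(x), h\rangle$ can be checked with finite differences. This is a minimal Python sketch using central differences (a standard numerical approximation, not a method from the text) and a hypothetical function $f(x_1, x_2) = x_1^2 x_2 + 3x_2$, whose gradient at (1, 2) is (4, 4).

```python
def gradient(f, x, step=1e-6):
    """Central-difference estimate of Df(x) = (df/dx_1, ..., df/dx_n)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += step
        down[i] -= step
        grad.append((f(up) - f(down)) / (2 * step))
    return grad

def directional_derivative(f, x, h, t=1e-6):
    """Central-difference estimate of df(x)(h), the rate of change of f along h."""
    plus = [xi + t * hi for xi, hi in zip(x, h)]
    minus = [xi - t * hi for xi, hi in zip(x, h)]
    return (f(plus) - f(minus)) / (2 * t)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f(x):
    return x[0] ** 2 * x[1] + 3 * x[1]   # Df(x) = (2*x[0]*x[1], x[0]**2 + 3)

x, h = [1.0, 2.0], [0.5, -1.0]
print(gradient(f, x))                                            # close to [4, 4]
print(directional_derivative(f, x, h), dot(gradient(f, x), h))   # both close to -2
```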
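In the same way if $f : \mathbb{R}^n \to \mathbb{R}^m$ and f is differentiable at x, then df(x) can be represented by the m × n matrix (one row for each component $f_j$, one column for each coordinate $x_i$)
$$Df(x) = \left( \frac{\partial f_j}{\partial x_i} \right)_x, \quad i = 1, \dots, n;\ j = 1, \dots, m,$$
where $f(x) = f(x_1, \dots, x_n) = (f_1(x), \dots, f_j(x), \dots, f_m(x))$. This matrix is called the Jacobian of f at x. We may define the norm of Df(x) to be
$$\|Df(x)\| = \sup\left\{ \left| \frac{\partial f_j}{\partial x_i}\Big|_x \right| : i = 1, \dots, n;\ j = 1, \dots, m \right\}.$$
When f has domain $U \subset \mathbb{R}^n$, then continuity of $Df : U \to M(n, m)$, where M(n, m) is the set of such matrices, implies the continuity of each partial derivative
$$U \to \mathbb{R} : x \mapsto \frac{\partial f_j}{\partial x_i}\Big|_x.$$
Note that when $f : \mathbb{R} \to \mathbb{R}$, then $\frac{\partial f}{\partial x}\big|_x$ is written simply as $\frac{df}{dx}\big|_x$ and is a real number. Then the linear function $df(x) : \mathbb{R} \to \mathbb{R}$ is given by $df(x)(h) = \big(\frac{df}{dx}\big|_x\big)h$. To simplify notation we shall not distinguish between the linear function df(x) and the real number $\frac{df}{dx}\big|_x$ when $f : \mathbb{R} \to \mathbb{R}$.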
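Suppose now that $f : U \subset X \to Y$ and $g : V \subset Y \to Z$ are such that $g \circ f : U \subset X \to Z$ exists. If f is differentiable at x, and g is differentiable at f(x), then $g \circ f$ is differentiable at x and its differential is given by
$$d(g \circ f)(x) = dg(f(x)) \circ df(x).$$
In terms of Jacobian matrices this is $D(g \circ f)(x) = Dg(f(x)) \circ Df(x)$, or
$$\frac{\partial g_k}{\partial x_i} = \sum_{j=1}^m \frac{\partial g_k}{\partial f_j} \frac{\partial f_j}{\partial x_i},$$
i.e., the (k, i) entry of $D(g \circ f)(x)$ is the product of the kth row of Dg(f(x)) with the ith column of Df(x):
$$\left( \frac{\partial g_k}{\partial f_1} \ \cdots \ \frac{\partial g_k}{\partial f_m} \right) \begin{pmatrix} \frac{\partial f_1}{\partial x_i} \\ \vdots \\ \frac{\partial f_m}{\partial x_i} \end{pmatrix}.$$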
This is also known as the chain rule.
If $\mathrm{Id} : \mathbb{R}^n \to \mathbb{R}^n$ is the identity map then clearly the Jacobian matrix of Id must be the identity matrix. Suppose now that $f : U \subset \mathbb{R}^n \to V \subset \mathbb{R}^n$ is differentiable at x, and has an inverse $g = f^{-1}$ which is differentiable. Then $g \circ f = \mathrm{Id}$ and so $\mathrm{Id} = D(g \circ f)(x) = Dg(f(x)) \circ Df(x)$. Thus $D(f^{-1})(f(x)) = [Df(x)]^{-1}$.
In particular, for this to be the case Df(x) must be an n × n matrix of rank n. When f has a differentiable inverse in this way, f is called a diffeomorphism.
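The formula $D(f^{-1})(f(x)) = [Df(x)]^{-1}$ can be checked numerically. The sketch below uses a hypothetical map $f(x, y) = (e^x, x + 2y)$, whose inverse is $(u, v) \mapsto (\log u, (v - \log u)/2)$, and estimates Jacobians by central differences with NumPy; the step size and tolerances are illustrative choices.

```python
import numpy as np

def f(z):
    """Hypothetical map R^2 -> R^2: f(x, y) = (e^x, x + 2y)."""
    x, y = z
    return np.array([np.exp(x), x + 2.0 * y])

def f_inv(w):
    """Its inverse: x = log u, y = (v - log u) / 2, defined for u > 0."""
    u, v = w
    x = np.log(u)
    return np.array([x, (v - x) / 2.0])

def jacobian(F, z, step=1e-6):
    """Central-difference Jacobian: entry (j, i) approximates dF_j/dz_i."""
    z = np.asarray(z, dtype=float)
    columns = []
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = step
        columns.append((F(z + e) - F(z - e)) / (2 * step))
    return np.column_stack(columns)

z = np.array([0.3, -1.2])
lhs = jacobian(f_inv, f(z))            # D(f^{-1})(f(z))
rhs = np.linalg.inv(jacobian(f, z))    # [Df(z)]^{-1}
print(np.allclose(lhs, rhs, atol=1e-6))
```

The same `jacobian` helper applied to $f^{-1} \circ f$ should return approximately the identity matrix, in line with the chain rule.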
Fig. 4.3
On the other hand suppose $f : X \to \mathbb{R}$ and $g : Y \to \mathbb{R}$, where f is differentiable at $x \in X$ and g is differentiable at $y \in Y$.
Let $fg : X \times Y \to \mathbb{R} : (x, y) \mapsto f(x)g(y)$. From the chain rule, $d(fg)(x, y)(h, k) = f(x)\,dg(y)(k) + g(y)\,df(x)(h)$, and so $d(fg)(x, y) = f(x)\,dg(y) + g(y)\,df(x)$.
Hence fg is differentiable at the point $(x, y) \in X \times Y$. When $X = Y = \mathbb{R}$ and $(fg)(x) = f(x)g(x)$, then $d(fg)(x) = f(x)\,dg(x) + g(x)\,df(x)$; this is called the product rule.
Example 4.2 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $x \mapsto x^2 \sin\frac{1}{x}$ if $x \ne 0$; 0 if $x = 0$.
We first of all verify that f is continuous. Let $g(x) = x^2$ and $h(x) = \sin\frac{1}{x} = \rho[m(x)]$, where $m(x) = \frac{1}{x}$ and $\rho(y) = \sin(y)$. Since m is continuous at any non-zero point, h is continuous at any non-zero point, and g is continuous everywhere. Thus f is continuous at $x \ne 0$. (Compare with Example 3.1.)
Now $\lim_{x \to 0^+} x^2 \sin\frac{1}{x} = \lim_{y \to +\infty} \frac{\sin y}{y^2}$. But $-1 \le \sin y \le 1$, and so $\lim_{y \to +\infty} \frac{\sin y}{y^2} = 0$; the limit as $x \to 0^-$ is 0 in the same way. Hence $x_n \to 0$ implies $f(x_n) \to 0 = f(0)$. Thus f is also continuous at x = 0.
Consider now the differential of f. By the product rule, since f = gh,
$$d(gh)(x) = x^2\, dh(x) + \Big(\sin\frac{1}{x}\Big) dg(x).$$
Since $h(x) = \rho[m(x)]$, by the chain rule,
$$dh(x) = d\rho(m(x)) \cdot dm(x) = \cos(m(x)) \Big(-\frac{1}{x^2}\Big) = -\frac{1}{x^2} \cos\frac{1}{x}.$$
Thus
$$df(x) = d(gh)(x) = x^2 \Big[-\frac{1}{x^2} \cos\frac{1}{x}\Big] + 2x \sin\frac{1}{x} = -\cos\frac{1}{x} + 2x \sin\frac{1}{x},$$
for any $x \ne 0$. Clearly df(x) is defined and continuous at any $x \ne 0$. To determine if df(0) is defined, let $k = \frac{1}{h}$. Then
$$\lim_{h \to 0^+} \frac{f(0+h) - f(0)}{h} = \lim_{h \to 0^+} \frac{h^2 \sin\frac{1}{h}}{h} = \lim_{h \to 0^+} h \sin\frac{1}{h} = \lim_{k \to +\infty} \frac{\sin k}{k}.$$
But again $-1 \le \sin k \le 1$ for all k, and so $\lim_{k \to +\infty} \frac{\sin k}{k} = 0$. In the same way $\lim_{h \to 0^-} \frac{h^2 \sin\frac{1}{h}}{h} = 0$. Thus $\lim_{h \to 0} \frac{f(0+h) - f(0)}{h} = 0$, and so df(0) is defined and equal to zero.
On the other hand consider $(x_n) \to 0^+$. We show that $\lim_{x_n \to 0^+} df(x_n)$ does not exist. By the above, $\lim_{x \to 0^+} df(x) = \lim_{x \to 0^+} \big[2x \sin\frac{1}{x} - \cos\frac{1}{x}\big]$. While $\lim_{x \to 0^+} 2x \sin\frac{1}{x} = 0$, there is no limit for $\cos\frac{1}{x}$ as $x \to 0^+$ (see Example 3.1). Thus the function $df : \mathbb{R} \to L(\mathbb{R}, \mathbb{R})$ is not continuous at the point x = 0.
The reason for the discontinuity of the function df at x = 0 is that in any neighbourhood U of the origin there exist infinitely many non-zero points x′ such that df(x′) = 0. We return to this below.
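Example 4.2 can be explored numerically. In this Python sketch, the difference quotients of $f(x) = x^2\sin(1/x)$ at 0 shrink at least as fast as h, consistent with df(0) = 0, while df evaluated at the points $x = 1/(2k\pi)$ and $x = 1/((2k+1)\pi)$, which approach 0, stays near −1 and +1 respectively, so df has no limit at 0. The choice of sample points is illustrative.

```python
import math

def f(x):
    """f(x) = x^2 sin(1/x) for x != 0, and f(0) = 0."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def df(x):
    """df(x) = 2x sin(1/x) - cos(1/x) for x != 0 (product and chain rules)."""
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# Difference quotients at 0 are bounded by h in absolute value, so df(0) = 0 ...
for h in (1e-2, 1e-4, 1e-6):
    print(h, (f(h) - f(0.0)) / h)

# ... but df keeps taking values near -1 and +1 arbitrarily close to 0.
for k in (10, 100, 1000):
    print(df(1.0 / (2 * k * math.pi)), df(1.0 / ((2 * k + 1) * math.pi)))
```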
4.2 $C^r$-Differentiable Functions
4.2.1 The Hessian
Suppose that $f : X \to Y$ is a $C^1$-differentiable map, with domain $U \subset X$. Then, as we have seen, $df : U \to L(X, Y)$, where L(X, Y) is the topological vector space of linear maps from X to Y with the norm
$$\|k\| = \sup\{\|k(x)\|_Y : x \in X \text{ s.t. } \|x\|_X = 1\}.$$
Since both X and L(X, Y) are normed vector spaces, and df is continuous, df may itself be differentiable at a point $x \in U$. If df is differentiable, then its derivative at x is written $d^2f(x)$, and will itself be a linear approximation of the map df from X to L(X, Y). Since $d^2f(x)$ is linear, and hence continuous, $d^2f(x) \in L(X, L(X, Y))$; if df is $C^1$, then the map $x \mapsto d^2f(x)$ is also continuous.
When $d^2f : U \to L(X, L(X, Y))$ is itself continuous, and f is $C^1$-differentiable, then say f is $C^2$-differentiable. Let $C^2(U, Y)$ be the set of $C^2$-differentiable maps on U. In precisely the same way say that f is $C^r$-differentiable iff f is $C^{r-1}$-differentiable and the rth derivative $d^rf : U \to L(X, L(X, L(X, \dots)))$ is continuous.
The map is called smooth or $C^\infty$ if $d^rf$ is continuous for all r.
Now the second derivative $d^2f(x)$ satisfies $d^2f(x)(h)(k) \in Y$ for vectors $h, k \in X$. Moreover $d^2f(x)(h)$ is a linear map from X to Y, and $d^2f(x)$ is a linear map from X to L(X, Y). Thus $d^2f(x)$ is linear in both factors h and k. Hence $d^2f(x)$ may be regarded as a map
$$H(x) : X \times X \to Y,$$
where $H(x)(h, k) = d^2f(x)(h)(k) \in Y$, and H(x) is linear in both h and k. As in Sect. 2.3.3, we say H(x) is bilinear.
Let $L^2(X; Y)$ be the set of bilinear maps $X \times X \to Y$. Thus $H \in L^2(X; Y)$ iff
$$H(\alpha_1 h_1 + \alpha_2 h_2, k) = \alpha_1 H(h_1, k) + \alpha_2 H(h_2, k),$$
$$H(h, \beta_1 k_1 + \beta_2 k_2) = \beta_1 H(h, k_1) + \beta_2 H(h, k_2),$$
for any $\alpha_1, \alpha_2, \beta_1, \beta_2 \in \mathbb{R}$ and $h, h_1, h_2, k, k_1, k_2 \in X$.
Since X is a finite-dimensional normed vector space, so is $X \times X$, and thus the set of bilinear maps $L^2(X; Y)$ has a norm topology. Write $\mathcal{L}^2(X; Y)$ when the set of bilinear maps has this topology. The continuity of the second differential $d^2f : U \to L(X, L(X, Y))$ is equivalent to the continuity of the map $H : U \to \mathcal{L}^2(X; Y)$, and we may therefore regard $d^2f$ as a map $d^2f : U \to \mathcal{L}^2(X; Y)$. In the same way we may regard $d^rf$ as a map $d^rf : U \to \mathcal{L}^r(X; Y)$, where $\mathcal{L}^r(X; Y)$ is the set of maps $X^r \to Y$ which are linear in each component, endowed with the norm topology.
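Suppose now that $f : \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-map, and consider a point $x = (x_1, \dots, x_n)$, where the coordinates are chosen with respect to the standard basis. As we have seen, the differential $df : U \to L(\mathbb{R}^n, \mathbb{R})$ can be represented by a continuous function
$$Df : x \mapsto \left( \frac{\partial f}{\partial x_1}\Big|_x, \dots, \frac{\partial f}{\partial x_n}\Big|_x \right).$$
Now let $\partial f_j : U \to \mathbb{R}$ be the continuous function $x \mapsto \frac{\partial f}{\partial x_j}\big|_x$. Clearly the differential of $\partial f_j$ will be
$$x \mapsto \left( \frac{\partial}{\partial x_1}(\partial f_j)\Big|_x, \dots, \frac{\partial}{\partial x_n}(\partial f_j)\Big|_x \right);$$
write $\frac{\partial}{\partial x_i}(\partial f_j)\big|_x = \partial f_{ji} = \frac{\partial}{\partial x_i}\Big(\frac{\partial f}{\partial x_j}\Big)\Big|_x$.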
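Then the differential $d^2f(x)$ can be represented by the matrix array
$$(\partial f_{ji})_x = \begin{pmatrix} \frac{\partial}{\partial x_1}\big(\frac{\partial f}{\partial x_1}\big) & \cdots & \frac{\partial}{\partial x_1}\big(\frac{\partial f}{\partial x_n}\big) \\ \vdots & & \vdots \\ \frac{\partial}{\partial x_n}\big(\frac{\partial f}{\partial x_1}\big) & \cdots & \frac{\partial}{\partial x_n}\big(\frac{\partial f}{\partial x_n}\big) \end{pmatrix}_x.$$
This n × n matrix we shall also write as Hf(x) and call the Hessian matrix of f at x. Note that Hf(x) is dependent on the particular basis, or coordinate system, for X.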
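Since f is $C^2$, it is known from elementary calculus (Schwarz's theorem) that
$$\frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial x_j} \right)\Big|_x = \frac{\partial}{\partial x_j}\left( \frac{\partial f}{\partial x_i} \right)\Big|_x,$$
and so the matrix Hf(x) is symmetric. Consequently, as in Sect. 2.3.3, Hf(x) may be regarded as a quadratic form given by
$$D^2f(x)(h, k) = \langle h, Hf(x)(k) \rangle = (h_1, \dots, h_n) \left( \frac{\partial}{\partial x_i}\Big(\frac{\partial f}{\partial x_j}\Big) \right) \begin{pmatrix} k_1 \\ \vdots \\ k_n \end{pmatrix} = \sum_{i=1}^n \sum_{j=1}^n h_i \frac{\partial}{\partial x_i}\Big(\frac{\partial f}{\partial x_j}\Big) k_j.$$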
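As an illustration, if $f : \mathbb{R}^2 \to \mathbb{R}$ is $C^2$ then $D^2f(x) : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ is given by
$$D^2f(x)(h, h) = (h_1\ h_2) \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = \left( h_1^2 \frac{\partial^2 f}{\partial x_1^2} + 2h_1h_2 \frac{\partial^2 f}{\partial x_1 \partial x_2} + h_2^2 \frac{\partial^2 f}{\partial x_2^2} \right)\Big|_x.$$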
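The Hessian and its quadratic form can be approximated by finite differences. This Python sketch uses the standard central-difference stencil $[f(x + he_i + he_j) - f(x + he_i - he_j) - f(x - he_i + he_j) + f(x - he_i - he_j)]/4h^2$ (on the diagonal this is a second difference with step 2h), applied to the hypothetical function $f(x_1, x_2) = x_1^3 + x_1x_2^2$, whose Hessian at (1, 2) is [[6, 4], [4, 2]]. The stencil is symmetric in i and j by construction, so it does not test Schwarz's theorem; it only checks the entries and the quadratic form above.

```python
def hessian(f, x, step=1e-4):
    """Central-difference estimate of Hf(x); entry (i, j) approximates d/dx_i (df/dx_j)."""
    n = len(x)

    def f_shift(i, si, j, sj):
        y = list(x)
        y[i] += si * step
        y[j] += sj * step
        return f(y)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (f_shift(i, 1, j, 1) - f_shift(i, 1, j, -1)
                       - f_shift(i, -1, j, 1) + f_shift(i, -1, j, -1)) / (4 * step * step)
    return H

def quadratic_form(H, h, k):
    """D^2 f(x)(h, k) = sum_i sum_j h_i H_ij k_j."""
    n = len(H)
    return sum(h[i] * H[i][j] * k[j] for i in range(n) for j in range(n))

def f(x):
    return x[0] ** 3 + x[0] * x[1] ** 2

H = hessian(f, [1.0, 2.0])                               # analytic value [[6, 4], [4, 2]]
print(H, quadratic_form(H, [1.0, 1.0], [1.0, 1.0]))     # D^2 f(x)(h, h) with h = (1, 1)
```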
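In the case that $f : \mathbb{R} \to \mathbb{R}$ is $C^2$, then $\frac{\partial^2 f}{\partial x^2}\big|_x$ is simply written as $\frac{d^2f}{dx^2}\big|_x$, a real number. Consequently the second differential $D^2f(x)$ is given by
$$D^2f(x)(h, h) = h\left( \frac{d^2f}{dx^2}\Big|_x \right)h = h^2 \frac{d^2f}{dx^2}\Big|_x.$$
We shall not distinguish in this case between the bilinear map $D^2f(x) : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ and the real number $\frac{d^2f}{dx^2}\big|_x$.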
Fig. 4.4
4.2.2 Taylor’s Theorem
From the definition of the derivative of a function $f : X \to Y$, df(x) is the linear approximation to f in the sense that $f(x+h) - f(x) = df(x)(h) + \|h\|_X\,\mu(h)$, where the "error" $\|h\|_X\,\mu(h)$ approaches zero faster than $\|h\|_X$ as h approaches zero. Taylor's Theorem is concerned with the "accuracy" of this approximation for a small vector h, by using the higher order derivatives. Our principal tool in this is the following. If $f : U \subset X \to \mathbb{R}$ and the convex hull $[x, x+h]$ of the points x and x + h belongs to U, then there is some point $z \in [x, x+h]$ such that $f(x+h) = f(x) + df(z)(h)$.
To prove this result we proceed as follows.
Lemma 4.1 (Rolle's Theorem) Let $f : U \to \mathbb{R}$, where U is an open set in $\mathbb{R}$ containing the compact interval I = [a, b], and a < b. Suppose that f is continuous and differentiable on U, and that f(a) = f(b). Then there exists a point $c \in (a, b)$ such that df(c) = 0.
Proof From the Weierstrass Theorem (and Lemma 3.16) f attains its upper and lower bounds on the compact interval I. Thus there exist finite $m, M \in \mathbb{R}$ such that $m \le f(x) \le M$ for all $x \in I$, and there exist points $c, e \in I$ such that f(c) = M and f(e) = m.
If f is constant on I, so that m = f(x) = M for all $x \in I$, then clearly df(x) = 0 for all $x \in (a, b)$, and so any point c in the interior (a, b) of I satisfies df(c) = 0.
Suppose that f is not constant. If neither c nor e belongs to (a, b), then $\{c, e\} \subset \{a, b\}$, and since f(a) = f(b) we obtain M = m, so f is the constant function. Hence when f is not the constant function, either c or e belongs to the interior (a, b) of I.
1. Suppose $c \in (a, b)$. Clearly $f(c) - f(x) \ge 0$ for all $x \in I$. For $x \in I$ with x > c we therefore have $\frac{f(x) - f(c)}{x - c} \le 0$, while for $x \in I$ with x < c we have $\frac{f(x) - f(c)}{x - c} \ge 0$. Since f is differentiable at c,
$$df(c) = \lim_{x \to c^+} \frac{f(x) - f(c)}{x - c} = \lim_{x \to c^-} \frac{f(x) - f(c)}{x - c},$$
where the first limit is at most 0 and the second is at least 0. Hence df(c) = 0.
2. If $e \in (a, b)$ then we proceed in precisely the same way to show df(e) = 0.
Thus there exists some point $c \in (a, b)$ such that df(c) = 0. □
Book ID: 74899_2_En, Date: 2013-09-30, Proof No: 1, UNCORRECTED PROOF
146 4 Differential Calculus and Smooth Optimisation
Fig. 4.5 ???
Note that whenever $c$ or $e$ belongs to the interior of $I$, the corresponding maximum or minimum point of $f$ is a critical point, in the sense that the derivative is zero there.
Lemma 4.2 Let $f: \mathbb{R} \to \mathbb{R}$, where $f$ is continuous on the interval $I = [a,b]$ and $df$ is continuous on $(a,b)$. Then there exists a point $c \in (a,b)$ such that
$$df(c) = \frac{f(b)-f(a)}{b-a}.$$
Proof Let $k = \frac{f(b)-f(a)}{b-a}$ and $g(x) = f(b) - f(x) - k(b-x)$. Clearly $g(a) = g(b) = 0$. By Rolle’s Theorem, there exists some point $c \in (a,b)$ such that $dg(c) = 0$. But $dg(c) = k - df(c)$. Thus $df(c) = \frac{f(b)-f(a)}{b-a}$. □
Lemma 4.3 Let $f: \mathbb{R} \to \mathbb{R}$ be continuous and differentiable on an open set containing the interval $[a, a+h]$. Then there exists a number $t \in (0,1)$ such that
$$f(a+h) = f(a) + df(a+th)(h).$$
Proof Put $b = a+h$. By the previous lemma there exists $c \in (a, a+h)$ such that
$$df(c) = \frac{f(b)-f(a)}{b-a}.$$
Let $t = \frac{c-a}{b-a}$. Clearly $t \in (0,1)$ and $c = a + th$. But then $df(a+th): \mathbb{R} \to \mathbb{R}$ is the linear map given by $df(a+th)(h) = f(b) - f(a)$, and so $f(a+h) = f(a) + df(a+th)(h)$. □
Mean Value Theorem Let $f: U \subset X \to \mathbb{R}$ be a differentiable function on $U$, where $U$ is an open set in the normed vector space $X$. Suppose that the line segment
$$[x, x+h] = \{z : z = x + th \text{ where } t \in [0,1]\}$$
4.2 Cr -Differentiable Functions 147
belongs to $U$. Then there is some number $t \in (0,1)$ such that
$$f(x+h) = f(x) + df(x+th)(h).$$
Proof Define $g: [0,1] \to \mathbb{R}$ by $g(t) = f(x+th)$. Now $g$ is the composition of the function
$$\rho: [0,1] \to U : t \mapsto x + th$$
with $f: [x, x+h] \to \mathbb{R}$. Since both $\rho$ and $f$ are differentiable, so is $g$. By the chain rule,
$$dg(t) = df(\rho(t)) \circ d\rho(t) = df(x+th)(h).$$
By Lemma 4.3, there exists $t \in (0,1)$ such that $dg(t) = \frac{g(1)-g(0)}{1-0}$. But $g(1) = f(x+h)$ and $g(0) = f(x)$. Hence $df(x+th)(h) = f(x+h) - f(x)$. □
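The mean value theorem only asserts that a suitable $t$ exists. As a numerical sketch (not part of the original argument), the Python code below locates one by bisection along $g(t) = f(x+th)$, mirroring the reduction to one variable in the proof. The function $f$, its hand-coded gradient and the helper `mean_value_t` are hypothetical illustrations. Bisection needs $dg(t) - (g(1) - g(0))$ to change sign on $[0,1]$, which we simply assume; it holds here because this $g$ is strictly convex.

```python
import math

# Hypothetical example function (not from the text): f(x1, x2) = exp(x1) + x1*x2 + x2**2,
# together with its gradient Df, worked out by hand.
def f(x):
    return math.exp(x[0]) + x[0] * x[1] + x[1] ** 2

def grad_f(x):
    return (math.exp(x[0]) + x[1], x[0] + 2 * x[1])

def mean_value_t(f, grad_f, x, h, tol=1e-12):
    """Find t in (0, 1) with f(x + h) = f(x) + df(x + t h)(h).

    Uses g(t) = f(x + t h), whose derivative is dg(t) = <Df(x + t h), h>, and
    bisects phi(t) = dg(t) - (g(1) - g(0)).  Assumes phi(0) < 0 < phi(1), which
    holds when g is strictly convex on [0, 1], as in the example below.
    """
    def point(t):
        return tuple(xk + t * hk for xk, hk in zip(x, h))

    def dg(t):
        return sum(gk * hk for gk, hk in zip(grad_f(point(t)), h))

    increment = f(point(1.0)) - f(point(0.0))      # g(1) - g(0)
    lo, hi = 0.0, 1.0
    if not (dg(lo) - increment < 0.0 < dg(hi) - increment):
        raise ValueError("phi does not change sign from negative to positive on [0, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dg(mid) - increment < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x, h = (0.0, 0.0), (1.0, 1.0)
t = mean_value_t(f, grad_f, x, h)
z = tuple(xk + t * hk for xk, hk in zip(x, h))
x_plus_h = tuple(xk + hk for xk, hk in zip(x, h))
residual = f(x_plus_h) - f(x) - sum(gk * hk for gk, hk in zip(grad_f(z), h))
print(f"t = {t:.6f}, residual = {residual:.1e}")
```

Here $g(t) = e^t + 2t^2$, so the $t$ found solves $e^t + 4t = e + 1$, which gives $t \approx 0.512$.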
Lemma 4.4 Suppose $g: U \to \mathbb{R}$ is a $C^2$-map on an open set $U$ in $\mathbb{R}$ containing the interval $[0,1]$. Then there exists $\theta \in (0,1)$ such that
$$g(1) = g(0) + dg(0) + \tfrac{1}{2}\, d^2g(\theta).$$
Proof (Note here that we regard $dg(t)$ and $d^2g(t)$ as real numbers.) Now define
$$k(t) = g(t) - g(0) - t\,dg(0) - t^2\,[g(1) - g(0) - dg(0)].$$
Clearly $k(0) = k(1) = 0$, and so by Rolle’s Theorem there exists $\theta \in (0,1)$ such that $dk(\theta) = 0$. But $dk(t) = dg(t) - dg(0) - 2t\,[g(1) - g(0) - dg(0)]$. Hence $dk(0) = 0$.
Again by Rolle’s Theorem (now applied to $dk$ on $[0, \theta]$), there exists $\theta' \in (0, \theta)$ such that $d^2k(\theta') = 0$. But
$$d^2k(\theta') = d^2g(\theta') - 2\,[g(1) - g(0) - dg(0)].$$
Hence $g(1) - g(0) - dg(0) = \tfrac{1}{2}\, d^2g(\theta')$ for some $\theta' \in (0,1)$. □
Lemma 4.5 Let $f: U \subset X \to \mathbb{R}$ be a $C^2$-function on an open set $U$ in the normed vector space $X$. If the line segment $[x, x+h]$ belongs to $U$, then there exists $z \in (x, x+h)$ such that
$$f(x+h) = f(x) + df(x)(h) + \tfrac{1}{2}\, d^2f(z)(h,h).$$
Proof Let $g: [0,1] \to \mathbb{R}$ be given by $g(t) = f(x+th)$. As in the mean value theorem, $dg(t) = df(x+th)(h)$. Moreover $d^2g(t) = d^2f(x+th)(h,h)$.
By Lemma 4.4, $g(1) = g(0) + dg(0) + \tfrac{1}{2}\, d^2g(\theta')$ for some $\theta' \in (0,1)$.
Let $z = x + \theta' h$. Then $f(x+h) = f(x) + df(x)(h) + \tfrac{1}{2}\, d^2f(z)(h,h)$. □
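Lemma 4.5 says that the second-order expansion $f(x) + df(x)(h) + \tfrac12 d^2f(x)(h,h)$ misses $f(x+h)$ only because $d^2f$ is evaluated at $x$ rather than at an intermediate point $z$; for a $C^3$ function the resulting error is of order $\|h\|^3$. The Python sketch below checks this numerically for the hypothetical function $f(x_1,x_2) = e^{x_1}\cos x_2$ (our choice, with gradient and Hessian coded by hand) at $x = 0$ along the direction $(1, 1/2)$. Halving $h$ should divide the error by roughly 8, and error$/s^3$ should approach $1/24$, the third-order Taylor coefficient for this particular choice.

```python
import math

# Hypothetical test function (not from the text): f(x1, x2) = exp(x1) * cos(x2),
# with its gradient and Hessian written out by hand.
def f(x1, x2):
    return math.exp(x1) * math.cos(x2)

def grad(x1, x2):
    e = math.exp(x1)
    return (e * math.cos(x2), -e * math.sin(x2))

def hessian(x1, x2):
    e, c, s = math.exp(x1), math.cos(x2), math.sin(x2)
    return ((e * c, -e * s), (-e * s, -e * c))

def second_order_error(x, h):
    """f(x + h) - [f(x) + df(x)(h) + (1/2) d2f(x)(h, h)] for x, h in R^2."""
    g, H = grad(*x), hessian(*x)
    linear = g[0] * h[0] + g[1] * h[1]
    quadratic = sum(H[i][j] * h[i] * h[j] for i in range(2) for j in range(2))
    return f(x[0] + h[0], x[1] + h[1]) - (f(*x) + linear + 0.5 * quadratic)

x, direction = (0.0, 0.0), (1.0, 0.5)
for s in (0.1, 0.05, 0.025, 0.0125):
    err = second_order_error(x, (s * direction[0], s * direction[1]))
    print(f"s = {s:<6}  error = {err: .3e}  error / s**3 = {err / s ** 3: .5f}")
```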
Fig. 4.6 ???
Taylor’s Theorem Let $f: U \subset X \to \mathbb{R}$ be a smooth (or $C^\infty$-) function on an open set $U$ in the normed vector space $X$. If the line segment $[x, x+h]$ belongs to $U$, then
$$f(x+h) = f(x) + \sum_{r=1}^{n} \frac{1}{r!}\, d^r f(x)(h, \ldots, h) + R_n(h),$$
where the error term is
$$R_n(h) = \frac{1}{(n+1)!}\, d^{n+1} f(z)(h, \ldots, h) \quad \text{for some } z \in (x, x+h).$$
Proof By induction on Lemma 4.5, using the mean value theorem. □
The Taylor series of $f$ at $x$ to order $k$ is
$$[f(x)]_k = f(x) + \sum_{r=1}^{k} \frac{1}{r!}\, d^r f(x)(h, \ldots, h).$$
When $f$ is $C^\infty$, $[f(x)]_k$ exists for all $k$. In the case when $X = \mathbb{R}^n$ and the error term $R_k(h)$ approaches zero as $k \to \infty$, the Taylor series $[f(x)]_k$ will converge to $f(x+h)$.
In general, however, $[f(x)]_k$ need not converge, or if it does converge then it need not converge to $f(x+h)$.
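Example 4.3 To illustrate this, consider the flat function $f: \mathbb{R} \to \mathbb{R}$ given by
$$f(x) = \begin{cases} \exp\left(-\frac{1}{x^2}\right), & x \neq 0, \\ 0, & x = 0. \end{cases}$$
Now $Df(x) = \frac{2}{x^3} \exp\left(-\frac{1}{x^2}\right)$ for $x \neq 0$. Since $y^{3/2} e^{-y} \to 0$ as $y \to \infty$ (with $y = 1/x^2$, so that $|Df(x)| = 2y^{3/2}e^{-y}$), we obtain $Df(x) \to 0$ as $x \to 0$. Moreover
$$Df(0) = \lim_{h \to 0} \frac{f(0+h) - f(0)}{h} = \lim_{h \to 0} \frac{1}{h} \exp\left(-\frac{1}{h^2}\right) = 0.$$
Thus $f$ is both continuous and $C^1$ at $x = 0$. In the same way $f$ is $C^r$ and $D^r f(0) = 0$ for all $r \ge 1$. Thus the Taylor series is $[f(0)]_k = 0$ for every $k$. However, for every $h \neq 0$ it is evident that $f(0+h) = \exp(-1/h^2) > 0$. Hence the Taylor series cannot converge to $f(h)$.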
These remarks lead directly to classification theory in differential topology, which is beyond the scope of this work. The interested reader may refer to Chillingsworth (1976) for further discussion.
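A small numerical sketch of Example 4.3 in Python (double precision): the Taylor polynomials $[f(0)]_k$ are all zero, while $f(h)$ is strictly positive. For each fixed $k$ (the code uses the illustrative choice $k = 10$) the ratio $f(h)/h^k$ tends to zero, which is the mechanism behind $D^r f(0) = 0$. The last line shows a floating-point caveat: $\exp(-1/h^2)$ underflows to $0.0$ once $|h|$ is below about $0.037$.

```python
import math

def flat(x):
    """The flat function of Example 4.3: exp(-1/x^2) for x != 0, and 0 at x = 0."""
    return math.exp(-1.0 / (x * x)) if x != 0.0 else 0.0

# Every Taylor polynomial [f(0)]_k is identically 0, yet f(h) > 0 for h != 0.
# For a fixed power k (here k = 10), f(h) / h**k -> 0 as h -> 0.
k = 10
for h in (0.5, 0.25, 0.1, 0.05):
    print(f"h = {h:<5} f(h) = {flat(h):.3e}   f(h)/h**{k} = {flat(h) / h ** k:.3e}")

# Caveat: in double precision exp(-1/h^2) underflows to 0.0 once |h| is below
# about 0.037, so floating point cannot distinguish f from its Taylor series there.
print(flat(0.01))  # prints 0.0
```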
4.2.3 Critical Points of a Function
Suppose now that $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-map. Once a coordinate basis is chosen for $\mathbb{R}^n$, $D^2f(x)$ may be regarded as a quadratic form. In matrix notation this means that
$$D^2f(x)(h,h) = h^t\, Hf(x)\, h.$$
As we have seen in Chap. 2, if the Hessian matrix $Hf(x) = (\partial f_{ji})_x$ has all its eigenvalues positive, then $D^2f(x)(h,h) > 0$ for all nonzero $h \in \mathbb{R}^n$, and so $Hf(x)$ is positive definite.
Similarly, $Hf(x)$ is negative definite iff all its eigenvalues are strictly negative.
Lemma 4.6 If $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-map on $U$, and the Hessian matrix $Hf(x)$ is positive (negative) definite at $x$, then there is a neighbourhood $V$ of $x$ in $U$ such that $Hf(y)$ is positive (negative) definite for all $y \in V$.
Proof If $Hf(x)$ is positive definite then, as we have seen, there are $n$ different algebraic relationships between the partial derivatives $\partial f_{ji}(x)$, for $j = 1, \ldots, n$ and $i = 1, \ldots, n$, which characterise the roots $\lambda_1(x), \ldots, \lambda_n(x)$ of the characteristic equation
$$\left| Hf(x) - \lambda(x) I \right| = 0.$$
But since $f$ is $C^2$, the map $D^2f: U \to L^2(\mathbb{R}^n; \mathbb{R})$ is continuous. In particular, this implies that for each $i, j$ the map
$$x \mapsto \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right)\Big|_x = \partial f_{ji}(x)$$
is continuous. Thus if $\partial f_{ji}(x) > 0$, then there is a neighbourhood $V$ of $x$ in $U$ such that $\partial f_{ji}(y) > 0$ for all $y \in V$. Moreover, if
$$C(x) = C\left(\partial f_{ji}(x) : i = 1, \ldots, n;\ j = 1, \ldots, n\right)$$
is an algebraic (polynomial) expression in the $\partial f_{ji}(x)$ such that $C(x) > 0$, then again, by continuity, there is a neighbourhood $V$ of $x$ in $U$ such that $C(y) > 0$ for all $y \in V$.
Taking $V$ to be the intersection of the finitely many neighbourhoods obtained in this way, there is a neighbourhood $V$ of $x$ in $U$ such that $\lambda_i(x) > 0$ for $i = 1, \ldots, n$ implies $\lambda_i(y) > 0$ for $i = 1, \ldots, n$ and all $y \in V$. Hence, if $Hf(x)$ is positive definite at $x \in U$, then $Hf(y)$ is positive definite for all $y$ in some neighbourhood of $x$ in $U$. The same argument holds if $Hf(x)$ is negative definite at $x$. □
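The proof of Lemma 4.6 rests on positive definiteness being decided by finitely many strict inequalities between continuous functions of the second partial derivatives. One convenient choice of such inequalities (ours, via Sylvester's criterion, rather than the eigenvalue relations used above) is that all leading principal minors are positive. The Python sketch below checks them at random points near $x = 0$ for the hypothetical function $f(x_1,x_2) = e^{x_1} + x_1x_2 + x_2^2$, whose Hessian is $\begin{pmatrix} e^{x_1} & 1 \\ 1 & 2 \end{pmatrix}$. It illustrates the lemma but does not prove it.

```python
import math
import random

# Hypothetical C^2 function (not from the text): f(x1, x2) = exp(x1) + x1*x2 + x2**2.
# Its Hessian does not depend on x2.
def hessian(x1, x2):
    return ((math.exp(x1), 1.0), (1.0, 2.0))

def leading_minors_2x2(H):
    """Leading principal minors of a symmetric 2x2 matrix: polynomial
    expressions in the second partial derivatives, as in the proof."""
    return H[0][0], H[0][0] * H[1][1] - H[0][1] * H[1][0]

def is_positive_definite(H):
    # Sylvester's criterion: all leading principal minors strictly positive.
    return all(m > 0 for m in leading_minors_2x2(H))

x = (0.0, 0.0)
print("Hf(x) positive definite:", is_positive_definite(hessian(*x)))

# Sample points in the open ball of radius 0.5 about x.  Here the Hessian stays
# positive definite because exp(x1) > 1/2 whenever x1 > -ln 2 (about -0.693).
rng = random.Random(0)
radius = 0.5
samples = []
while len(samples) < 1000:
    y = (x[0] + rng.uniform(-radius, radius), x[1] + rng.uniform(-radius, radius))
    if math.hypot(y[0] - x[0], y[1] - x[1]) < radius:
        samples.append(y)
print("positive definite at all samples:",
      all(is_positive_definite(hessian(*y)) for y in samples))

# Far from x the conclusion can fail: at (-1, 0) the second minor is 2/e - 1 < 0.
print("Hf(-1, 0) positive definite:", is_positive_definite(hessian(-1.0, 0.0)))
```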
Definition 4.2 Let $f: U \subset \mathbb{R}^n \to \mathbb{R}$, where $U$ is an open set in $\mathbb{R}^n$. A point $x$ in $U$ is called
1. a local strict maximum of $f$ in $U$ iff there exists a neighbourhood $V$ of $x$ in $U$ such that $f(y) < f(x)$ for all $y \in V$ with $y \neq x$;
2. a local strict minimum of $f$ in $U$ iff there exists a neighbourhood $V$ of $x$ in $U$ such that $f(y) > f(x)$ for all $y \in V$ with $y \neq x$;
3. a local maximum of $f$ in $U$ iff there exists a neighbourhood $V$ of $x$ in $U$ such that $f(y) \le f(x)$ for all $y \in V$;
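4. a local minimum of $f$ in $U$ iff there exists a neighbourhood $V$ of $x$ in $U$ such that $f(y) \ge f(x)$ for all $y \in V$.
5. Similarly, a global strict maximum, global maximum, global strict minimum or global minimum of $f$ on $U$ is defined by requiring $f(y) < f(x)$, $f(y) \le f(x)$, $f(y) > f(x)$ or $f(y) \ge f(x)$, respectively, for all $y \in U$ with $y \neq x$.
6. If $f$ is $C^1$-differentiable, then $x$ is called a critical point iff $df(x) = 0$, the zero map from $\mathbb{R}^n$ to $\mathbb{R}$.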
Lemma 4.7 Suppose that $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-function on an open set $U$ in $\mathbb{R}^n$. Then $f$ has a local strict maximum (minimum) at $x$ if
1. $x$ is a critical point of $f$, and
2. the Hessian $Hf(x)$ is negative (positive) definite.
Proof Suppose that $x$ is a critical point and $Hf(x)$ is negative definite. By Lemma 4.5,
$$f(y) = f(x) + df(x)(h) + \tfrac{1}{2}\, d^2f(z)(h,h)$$
whenever the line segment $[x,y] \subset U$, $h = y - x$ and $z = x + \theta h$ for some $\theta \in (0,1)$.
Now, by assumption, $Hf(x)$ is negative definite with respect to a coordinate basis for $\mathbb{R}^n$. By Lemma 4.6, there is a neighbourhood $V$ of $x$ in $U$ such that $Hf(y)$ is negative definite for all $y$ in $V$. Let $N_\varepsilon(x) = \{x + h : \|h\| < \varepsilon\}$ be an $\varepsilon$-neighbourhood of $x$ contained in $V$. Let $S_{\varepsilon/2}(0) = \{h \in \mathbb{R}^n : \|h\| = \tfrac{1}{2}\varepsilon\}$. Clearly any vector $x + h$, where $h \in S_{\varepsilon/2}(0)$, belongs to $N_\varepsilon(x)$, and thus to $V$. Hence $Hf(z)$ is negative definite for any $z = x + \theta h$, where $h \in S_{\varepsilon/2}(0)$ and $\theta \in (0,1)$. Thus $d^2f(z)(h,h) < 0$ for any $z \in [x, x+h]$.
But also, by assumption, $df(x) = 0$, and so $df(x)(h) = 0$ for all $h \in \mathbb{R}^n$. Hence $f(x+h) = f(x) + \tfrac{1}{2}\, d^2f(z)(h,h)$, and so $f(x+h) < f(x)$ for $h \in S_{\varepsilon/2}(0)$. But the same argument is true for any nonzero $h$ satisfying $\|h\| < \tfrac{1}{2}\varepsilon$. Thus $f(y) < f(x)$ for all $y \neq x$ in the open ball of radius $\tfrac{1}{2}\varepsilon$ about $x$. Hence $x$ is a local strict maximum. The same argument, when $Hf(x)$ is positive definite, shows that $x$ must be a local strict minimum. □
In Sect. 2.3 we defined a quadratic form $A^*: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ to be non-degenerate iff the nullity of $A^*$, namely $\{x : A^*(x,y) = 0 \text{ for all } y \in \mathbb{R}^n\}$, is $\{0\}$; equivalently, the matrix representing $A^*$ is non-singular. If $x$ is a critical point of a $C^2$-function $f: U \subset \mathbb{R}^n \to \mathbb{R}$ such that $d^2f(x)$ is non-degenerate (when regarded as a quadratic form), then call $x$ a non-degenerate critical point.
The dimension $s$ of the largest subspace of $\mathbb{R}^n$ on which $d^2f(x)$ is negative definite is called the index of $f$ at $x$, and $x$ is called a critical point of index $s$.
If $x$ is a non-degenerate critical point, then, whatever coordinate system for $\mathbb{R}^n$ is chosen, the Hessian $Hf(x)$ will have $s$ negative eigenvalues and $n - s$ positive eigenvalues.
For example, if $f: \mathbb{R} \to \mathbb{R}$, then only three cases can occur at a critical point:
1. $d^2f(x) > 0$: $x$ is a local minimum;
2. $d^2f(x) < 0$: $x$ is a local maximum;
3. $d^2f(x) = 0$: $x$ is a degenerate critical point.
If $f: \mathbb{R}^2 \to \mathbb{R}$, then a number of different cases can occur. There are three non-degenerate cases:
1. $Hf(x) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, say, with respect to a suitable basis; $x$ is a local minimum since both eigenvalues are positive. Index $= 0$.
2. $Hf(x) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$; $x$ is a local maximum, since both eigenvalues are negative. Index $= 2$.
3. $Hf(x) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$; $x$ is a saddle point, that is, a non-degenerate critical point of index 1.
In the degenerate cases, at least one eigenvalue is zero, and so $\det(Hf(x)) = 0$.
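Example 4.4 Let $f: \mathbb{R}^2 \to \mathbb{R} : (x,y) \mapsto xy$. The differential at $(x,y)$ is $Df(x,y) = (y, x)$. Thus
$$H = Hf(x,y) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Clearly $(0,0)$ is the only critical point. Moreover $|H| = -1$, and so $(0,0)$ is non-degenerate. The eigenvalues $\lambda_1, \lambda_2$ of the Hessian satisfy $\lambda_1 + \lambda_2 = 0$ and $\lambda_1\lambda_2 = -1$. Thus $\lambda_1 = 1$, $\lambda_2 = -1$. Eigenvectors of $Hf(x,y)$ are $v_1 = (1,1)$ and $v_2 = (1,-1)$, respectively. Let
$$P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
be the normalized eigenvector (basis change) matrix, so $P^{-1} = P$. Then
$$\Lambda = P^{-1} H P = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Consider a vector $h = (h_1, h_2) \in \mathbb{R}^2$. In matrix notation, $h^t H h = h^t P \Lambda P^{-1} h$. Now
$$P(h) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} h_1 + h_2 \\ h_1 - h_2 \end{pmatrix}.$$
Thus $h^t H h = \tfrac{1}{2}\left[(h_1 + h_2)^2 - (h_1 - h_2)^2\right] = 2h_1h_2$.
It is clear that $D^3f(0,0) = 0$. Hence, from Taylor’s Theorem,
$$f(0 + h_1, 0 + h_2) = f(0) + Df(0)(h) + \tfrac{1}{2}\, D^2f(0)(h,h),$$
and so $f(h_1, h_2) = \tfrac{1}{2} h^t H h = h_1h_2$.
Suppose we make the basis change represented by $P$, and write $(h_1, h_2)$ for the coordinates of a point with respect to the new, normalized basis $\{v_1/\sqrt{2}, v_2/\sqrt{2}\}$. Then the point has original coordinates $(x,y) = \left(\tfrac{1}{\sqrt{2}}(h_1 + h_2), \tfrac{1}{\sqrt{2}}(h_1 - h_2)\right)$. Thus $f$ can be represented in a neighbourhood of the origin as
$$(h_1, h_2) \mapsto \tfrac{1}{\sqrt{2}}(h_1 + h_2) \cdot \tfrac{1}{\sqrt{2}}(h_1 - h_2) = \tfrac{1}{2}\left(h_1^2 - h_2^2\right).$$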
Notice that with respect to this new coordinate system
$$Df(h_1, h_2) = (h_1, -h_2) \quad \text{and} \quad Hf(h_1, h_2) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Fig. 4.7 ???
In the eigenspace $E_1 = \{(x,y) \in \mathbb{R}^2 : x = y\}$ the Hessian has eigenvalue 1, and so $f$ has a local minimum at 0 when restricted to $E_1$. Conversely, in the eigenspace $E_{-1} = \{(x,y) \in \mathbb{R}^2 : x + y = 0\}$, $f$ has a local maximum at 0.
More generally, when $f: \mathbb{R}^2 \to \mathbb{R}$ is a quadratic function in $x, y$, then at a non-degenerate critical point $f$ can be represented, after a suitable coordinate change (and subtraction of its value at the critical point), as one of
1. $(x,y) \mapsto x^2 + y^2$, an index 0 minimum;
2. $(x,y) \mapsto -x^2 - y^2$, an index 2 maximum;
3. $(x,y) \mapsto x^2 - y^2$, an index 1 saddle point.
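For $n = 2$ the classification above can be read off the Hessian mechanically. The Python sketch below (helper names are hypothetical) computes the eigenvalues of a symmetric $2 \times 2$ matrix in closed form, takes the index to be the number of negative eigenvalues, and checks the diagonalisation $P^{-1}HP = \Lambda$ of Example 4.4. The tolerance `tol` used to declare an eigenvalue zero is an assumption; near-degenerate Hessians are numerically delicate.

```python
import math

def eigenvalues_2x2(H):
    """Eigenvalues, in ascending order, of a symmetric 2x2 matrix ((a, b), (b, c))."""
    (a, b), (_, c) = H
    mean, radius = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def classify_critical_point(H, tol=1e-12):
    """Classify a critical point of f: R^2 -> R from its Hessian H.

    Returns (kind, index), where index = number of negative eigenvalues.
    An eigenvalue within tol of zero is reported as degenerate.
    """
    eigs = eigenvalues_2x2(H)
    if any(abs(lam) <= tol for lam in eigs):
        return "degenerate", None
    index = sum(1 for lam in eigs if lam < 0)
    kind = {0: "local strict minimum", 1: "saddle point", 2: "local strict maximum"}[index]
    return kind, index

def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

# Example 4.4: f(x, y) = xy has Hessian ((0, 1), (1, 0)) at the origin.
H = ((0.0, 1.0), (1.0, 0.0))
print(eigenvalues_2x2(H))           # (-1.0, 1.0)
print(classify_critical_point(H))   # ('saddle point', 1)

# P^{-1} H P should be diag(1, -1), with P = (1/sqrt 2)((1, 1), (1, -1)) and P^{-1} = P.
r = 1 / math.sqrt(2)
P = ((r, r), (r, -r))
print(matmul(matmul(P, H), P))      # approximately ((1, 0), (0, -1)), up to rounding
```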
Example 4.5 Let $f: \mathbb{R}^3 \to \mathbb{R} : (x,y,z) \mapsto x^2 + 2y^2 + 3z^2 + xy + xz$. Therefore $Df(x,y,z) = (2x + y + z,\ 4y + x,\ 6z + x)$.
Clearly $(0,0,0)$ is the only critical point, and
$$H = Hf(x,y,z) = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 4 & 0 \\ 1 & 0 & 6 \end{pmatrix}.$$
Since $|H| = 38 \neq 0$, $(0,0,0)$ is non-degenerate. Moreover, the leading principal minors of $H$ are $2$, $7$ and $38$, all strictly positive, so (by Sylvester's criterion) the eigenvalues of $H$ are strictly positive. Hence $H$ is positive definite and $(0,0,0)$ is a minimum of the function. Thus $f$ can be written in the form $(u,v,w) \mapsto au^2 + bv^2 + cw^2$, where $a, b, c > 0$ and $(u,v,w)$ are the new coordinates after a (linear) basis change.
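The positive definiteness in Example 4.5 can also be confirmed exactly in integer arithmetic. The Python sketch below computes the leading principal minors by cofactor expansion, then checks, with exact fractions, one explicit canonical form obtained by completing the square: with $u = x + \tfrac12 y + \tfrac12 z$, $v = y - \tfrac17 z$, $w = z$ one gets $f = u^2 + \tfrac74 v^2 + \tfrac{19}{7} w^2$. The coefficients are half the ratios of successive minors ($2/1$, $7/2$, $38/7$). This triangular change of coordinates is our illustrative choice; an eigenvector basis, as in Example 4.4, would give different (also positive) coefficients.

```python
from fractions import Fraction

# Hessian of f(x, y, z) = x^2 + 2y^2 + 3z^2 + xy + xz from Example 4.5.
H = [[2, 1, 1],
     [1, 4, 0],
     [1, 0, 6]]

def det(M):
    """Determinant by cofactor expansion along the first row (adequate for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

# Sylvester's criterion: H is positive definite iff all leading principal minors are > 0.
minors = [det([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]
print("leading principal minors:", minors)            # [2, 7, 38]
print("positive definite:", all(m > 0 for m in minors))

def f(x, y, z):
    return x * x + 2 * y * y + 3 * z * z + x * y + x * z

def canonical_form(x, y, z):
    # Completing the square (one possible linear change of coordinates):
    # u = x + y/2 + z/2, v = y - z/7, w = z, giving u^2 + (7/4) v^2 + (19/7) w^2.
    u = x + Fraction(1, 2) * y + Fraction(1, 2) * z
    v = y - Fraction(1, 7) * z
    w = z
    return u * u + Fraction(7, 4) * v * v + Fraction(19, 7) * w * w

points = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (3, -2, 5), (-7, 4, 1)]
print("identity holds at test points:", all(f(*p) == canonical_form(*p) for p in points))
```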
Notice that Lemma 4.7 does not assert that a local strict maximum (or minimum) of a $C^2$-function must be a critical point where the Hessian is negative (respectively positive) definite.
For example, consider the “flat” function $f: \mathbb{R} \to \mathbb{R}$ given by $f(x) = -\exp\left(-\frac{1}{x^2}\right)$ for $x \neq 0$, and $f(0) = 0$. As we showed in Example 4.3, $df(0) = d^2f(0) = 0$. Yet clearly $0 > -\exp\left(-\frac{1}{a^2}\right)$ for any $a \neq 0$. Thus 0 is a global strict maximum of $f$ on $\mathbb{R}$, although $d^2f(0)$ is not negative definite.
A local maximum or minimum must, however, be a critical point. If the point is a local maximum, for example, then the Hessian can have no positive eigenvalues. Consequently $D^2f(x)(h,h) \le 0$ for all $h$, and so the Hessian must be negative semi-definite. As the flat function indicates, the Hessian may be identically zero at the local maximum.
Lemma 4.8 Suppose that $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-function on an open set $U$ in $\mathbb{R}^n$. Then $f$ has a local maximum or minimum at $x$ only if $x$ is a critical point of $f$.
Proof Suppose that $df(x) \neq 0$. We seek to show that $x$ can be neither a local maximum nor a local minimum of $f$.
Since $df(x)$ is a non-zero linear map from $\mathbb{R}^n$ to $\mathbb{R}$, it is possible to find some vector $h \in \mathbb{R}^n$ such that $df(x)(h) > 0$. Now $f$ is $C^1$, and so $df: U \to L(\mathbb{R}^n, \mathbb{R})$ is continuous. In particular, since $df(x)(h) > 0$, there is some neighbourhood $V$ of $x$ in $U$ such that $df(y)(h) > 0$ for all $y \in V$ (see Lemma 4.18 for more discussion of this phenomenon). Replacing $h$ by $\alpha h$ for a sufficiently small $\alpha > 0$, which does not change the sign of $df(y)(h)$, we may also assume that the line segment $[x, x+h]$ belongs to $V$.
By the mean value theorem there exists $t \in (0,1)$ such that $f(x+h) = f(x) + df(x+th)(h)$. Since $x + th \in V$, we have $df(x+th)(h) > 0$, and hence $f(x+h) > f(x)$. As $\alpha$ may be taken arbitrarily small, every neighbourhood of $x$ contains such a point $x + h$. Consequently $x$ cannot be a local maximum.
But in precisely the same way, if $df(x) \neq 0$ then it is possible to find $h$ such that $df(x)(h) < 0$ (replace $h$ by $-h$). A similar argument then shows that $f(x+h) < f(x)$, and so $x$ cannot be a local minimum. Hence, if $x$ is either a local maximum or minimum of $f$, then it must be a critical point of $f$. □
Lemma 4.9 Suppose that $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-function on an open set $U$ in $\mathbb{R}^n$. If $f$ has a local maximum at $x$, then the Hessian $d^2f$ at the critical point $x$ must be negative semi-definite (i.e., $d^2f(x)(h,h) \le 0$ for all $h \in \mathbb{R}^n$).
Proof By Lemma 4.8 we may suppose that $x$ is a critical point. Suppose further that, for some coordinate basis at $x$ and vector $h \in \mathbb{R}^n$, $d^2f(x)(h,h) > 0$. As in Lemma 4.6, by the continuity of $d^2f$ there is a neighbourhood $V$ of $x$ in $U$ such that $d^2f(x')(h,h) > 0$ for all $x' \in V$.
Choose an $\varepsilon$-neighbourhood of $x$ in $V$, and choose $\alpha > 0$ such that $\|\alpha h\| = \tfrac{1}{2}\varepsilon$. Clearly $x + \alpha h \in V$. By Taylor’s Theorem (Lemma 4.5), there exists $z = x + \theta\alpha h$, $\theta \in (0,1)$, such that
$$f(x + \alpha h) = f(x) + df(x)(\alpha h) + \tfrac{1}{2}\, d^2f(z)(\alpha h, \alpha h).$$
But $d^2f(z)$ is bilinear, so $d^2f(z)(\alpha h, \alpha h) = \alpha^2\, d^2f(z)(h,h) > 0$, since $z \in V$. Moreover $df(x)(\alpha h) = 0$. Thus $f(x + \alpha h) > f(x)$.
Moreover, for any neighbourhood $W$ of $x$ it is possible to choose $\varepsilon$ sufficiently small so that $x + \alpha h$ belongs to $W$. Thus $x$ cannot be a local maximum. □
Similarly, if $f$ has a local minimum at $x$, then $x$ must be a critical point with positive semi-definite Hessian.
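Example 4.6 Let $f: \mathbb{R}^2 \to \mathbb{R} : (x,y) \mapsto x^2y^2 - 4yx^2 + 4x^2$. Then $Df(x,y) = (2xy^2 - 8xy + 8x,\ 2x^2y - 4x^2)$. Thus $(x,y)$ is a critical point when
1. $x(2y^2 - 8y + 8) = 0$ and
2. $2x^2(y - 2) = 0$.
Now $2y^2 - 8y + 8 = 2(y-2)^2$. Thus $(x,y)$ is a critical point either when $y = 2$ or when $x = 0$.
Let $S(f)$ be the set of critical points. Then $S(f) = V_1 \cup V_2$, where
$$V_1 = \{(x,y) \in \mathbb{R}^2 : x = 0\}, \qquad V_2 = \{(x,y) \in \mathbb{R}^2 : y = 2\}.$$
Now
$$Hf(x,y) = \begin{pmatrix} 2(y-2)^2 & -4x(2-y) \\ -4x(2-y) & 2x^2 \end{pmatrix},$$
and so when $(x,y) \in V_1$, $Hf(x,y) = \begin{pmatrix} \mu^2 & 0 \\ 0 & 0 \end{pmatrix}$, and when $(x,y) \in V_2$, $Hf(x,y) = \begin{pmatrix} 0 & 0 \\ 0 & \tau^2 \end{pmatrix}$, where $\mu^2 = 2(y-2)^2$ and $\tau^2 = 2x^2$.
Since $\det Hf(x,y) = 0$ on $S(f)$, every point in $S(f)$ is degenerate. On $V_1 \setminus \{(0,2)\}$ the Hessian has the positive eigenvalue $\mu^2$, so a critical point there is not negative semi-definite and, by Lemma 4.9, cannot be a local maximum. The same is true for a point on $V_2 \setminus \{(0,2)\}$.
Now $(0,2) \in V_1 \cap V_2$, and $Hf(0,2) = 0$, the zero matrix. Lemma 4.9 does not rule out $(0,2)$ as a local maximum. However, since $f(x,y) = x^2(y-2)^2 \ge 0$ and $f$ vanishes on $S(f)$, it should be obvious that $(0,2)$, like every other critical point (including the origin), is a local minimum.
Unlike Examples 4.4 and 4.5, no linear change of coordinate bases transforms the function into a quadratic canonical form.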
To find a local maximum point we therefore seek all critical points. Those which have a negative definite Hessian must be local maxima. Those remaining points which do not have a negative semi-definite Hessian cannot be local maxima, and may be rejected. The remaining critical points must then be examined.
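The screening procedure just described is easy to mechanise once the Hessian is available. The Python sketch below (helper names hypothetical, restricted to $2 \times 2$ Hessians, with an assumed zero tolerance) applies it to Example 4.6 at a point of $V_1$, a point of $V_2$, and the point $(0,2)$ where the two lines meet. The first two are rejected, while $(0,2)$ is left undecided, which is exactly the case in which the function itself must be examined (here $f = x^2(y-2)^2 \ge 0$).

```python
import math

def hessian(x, y):
    """Hessian of f(x, y) = x^2 y^2 - 4 y x^2 + 4 x^2 = x^2 (y - 2)^2 from Example 4.6.
    The off-diagonal entry -4x(2 - y) is written as 4x(y - 2)."""
    return ((2 * (y - 2) ** 2, 4 * x * (y - 2)),
            (4 * x * (y - 2), 2 * x ** 2))

def eigenvalues_2x2(H):
    """Eigenvalues, in ascending order, of a symmetric 2x2 matrix ((a, b), (b, c))."""
    (a, b), (_, c) = H
    mean, radius = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def triage_for_maximum(H, tol=1e-12):
    """Screen a critical point as a candidate local maximum, from its Hessian H."""
    eigs = eigenvalues_2x2(H)
    if all(lam < -tol for lam in eigs):
        return "local strict maximum"           # negative definite (Lemma 4.7)
    if any(lam > tol for lam in eigs):
        return "rejected"                       # not negative semi-definite (Lemma 4.9)
    return "undecided: examine further"         # negative semi-definite but degenerate

# One critical point on V1 (x = 0), one on V2 (y = 2), and the point (0, 2) in both.
for point in [(0.0, 5.0), (3.0, 2.0), (0.0, 2.0)]:
    print(point, triage_for_maximum(hessian(*point)))
```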
A $C^2$-function $f: \mathbb{R}^n \to \mathbb{R}$ with a non-degenerate Hessian at every critical point is called a Morse function. Below we shall show that any Morse function can be represented, near each critical point, in a canonical form such as we found in Example 4.4. For such a function, the local maxima are precisely those critical points of index $n$. Moreover, any smooth function with a degenerate critical point can be “approximated” by a Morse function.
Suppose now that we wish to maximise a $C^2$-function on a compact set $K$. As we know from the Weierstrass theorem, a maximum does exist. However, Lemmas 4.8 and 4.9 are no longer valid, and it is possible that a point on the boundary of $K$ will be a local or global maximum but not a critical point. However, Lemma 4.7 will still be valid at interior points of $K$, and a critical point in the interior of $K$ with negative definite Hessian will certainly be a local maximum.
A further difficulty arises since a local maximum need not be a global maximum. However, for concave functions, local and global maxima coincide. We discuss maximisation by smooth optimisation on compact convex sets in the next section.
4.3 Constrained Optimisation
4.3.1 Concave and Quasi-concave Functions
In the previous section we obtained necessary conditions, and sufficient conditions, for a critical point to be a local maximum on an open set $U$. When the set is not open, a local maximum need not be a critical point. Previously we have defined the differential of a function only on an open set. Suppose now that $Y \subset \mathbb{R}^n$ is compact, and therefore closed, and has a boundary $\partial Y$. If $df$ is continuous at each point in the interior $\operatorname{Int}(Y)$ of $Y$, then we may extend $df$ over $Y$ by defining $df(x)$, at each point $x$ in the boundary $\partial Y$ of $Y$, to be the limit of $df(x_k)$ along any sequence $(x_k)$ of points in $\operatorname{Int}(Y)$ which converges to $x$ (this limit being required not to depend on the sequence chosen). More generally, we shall say that a function $f: Y \subset \mathbb{R}^n \to \mathbb{R}$ is $C^1$ on the admissible set $Y$ if $df: Y \to L(\mathbb{R}^n, \mathbb{R})$ is defined and continuous in the above sense at each $x \in Y$. We now give an alternative definition of the differential of a $C^1$-function $f: Y \to \mathbb{R}$. Suppose that $Y$ is convex and both $x$ and $x+h$ belong to $Y$. Then the arc $[x, x+h] = \{z \in \mathbb{R}^n : z = x + \lambda h, \lambda \in [0,1]\}$ belongs to $Y$.
Now
$$df(x)(h) = \lim_{\lambda \to 0^+} \frac{f(x + \lambda h) - f(x)}{\lambda},$$
and thus $df(x)(h)$ is often called the directional derivative of $f$ at $x$ in the direction $h$.
Finding maxima of a function becomes comparatively simple when $f$ is a concave or quasi-concave function (see Sect. 3.4 for definitions of these terms). Our first result shows that if $f$ is a concave function then we may relate $df(x)(y-x)$ to $f(y)$ and $f(x)$.
Lemma 4.10 If $f: Y \subset \mathbb{R}^n \to \mathbb{R}$ is a concave $C^1$-function on a convex admissible set $Y$, then, for all $x, y \in Y$,
$$df(x)(y-x) \ge f(y) - f(x).$$
Proof Since $f$ is concave,
$$f(\lambda y + (1-\lambda)x) \ge \lambda f(y) + (1-\lambda) f(x)$$
for any $\lambda \in [0,1]$ whenever $x, y \in Y$. But then $f(x + \lambda(y-x)) - f(x) \ge \lambda[f(y) - f(x)]$, and so
$$df(x)(y-x) = \lim_{\lambda \to 0^+} \frac{f(x + \lambda(y-x)) - f(x)}{\lambda} \ge f(y) - f(x).$$ □
This enables us to show that, for a concave function $f$, a critical point of $f$ must be a global maximum when $Y$ is open.
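Lemma 4.10 says that the graph of a concave $C^1$ function lies below each of its tangent hyperplanes. The Python sketch below samples random pairs $x, y$ for the hypothetical concave function $f(x_1,x_2) = \log x_1 + \log x_2$ on the positive orthant (our choice, with gradient $(1/x_1, 1/x_2)$) and reports the smallest value of $df(x)(y-x) - (f(y) - f(x))$. For this $f$ the gap equals $\sum_i (t_i - 1 - \log t_i)$ with $t_i = y_i/x_i$, which is non-negative because $\log t \le t - 1$. A finite sample can only fail to find a counterexample; it does not prove the inequality.

```python
import math
import random

# Hypothetical concave C^1 function on the open positive orthant (not from the text):
# f(x1, x2) = log(x1) + log(x2), whose gradient is Df(x) = (1/x1, 1/x2).
def f(x):
    return math.log(x[0]) + math.log(x[1])

def grad_f(x):
    return (1.0 / x[0], 1.0 / x[1])

def gradient_gap(x, y):
    """df(x)(y - x) - (f(y) - f(x)); Lemma 4.10 says this is >= 0 when f is concave."""
    g = grad_f(x)
    return g[0] * (y[0] - x[0]) + g[1] * (y[1] - x[1]) - (f(y) - f(x))

rng = random.Random(1)

def random_point():
    return (rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0))

pairs = [(random_point(), random_point()) for _ in range(10_000)]
worst = min(gradient_gap(x, y) for x, y in pairs)
print(f"smallest gap over the sample: {worst:.3e}")   # non-negative up to rounding error
```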
First of all, call a function $f: Y \subset \mathbb{R}^n \to \mathbb{R}$ strictly quasi-concave iff $Y$ is convex and, for all $x, y \in Y$ with $x \neq y$,
$$f(\lambda y + (1-\lambda)x) > \min(f(x), f(y)) \quad \text{for all } \lambda \in (0,1).$$
Remember that $f$ is quasi-concave if, for all $x, y \in Y$,
$$f(\lambda y + (1-\lambda)x) \ge \min(f(x), f(y)) \quad \text{for all } \lambda \in [0,1].$$
As above, let $P(x;Y) = \{y \in Y : f(y) > f(x)\}$ be the preferred set of a function $f$ on the set $Y$. A point $x \in Y$ is a global maximum of $f$ on $Y$ iff $P(x;Y) = \Phi$. When there is no chance of misunderstanding we shall write $P(x)$ for $P(x;Y)$. As shown in Sect. 3.4, if $f$ is (strictly) quasi-concave then, for all $x \in Y$, the preferred set $P(x)$ is (strictly) convex.
Lemma 4.11
1. If $f: Y \subset \mathbb{R}^n \to \mathbb{R}$ is a concave or strictly quasi-concave function on a convex admissible set, then any point which is a local maximum of $f$ is also a global maximum.
2. If $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is a concave $C^1$-function, where $U$ is open and convex, then any critical point of $f$ is a global maximum on $U$.
Proof
1. Suppose that $f$ is concave or strictly quasi-concave, and that $x$ is a local maximum but not a global maximum on $Y$. Then there exists $y \in Y$ such that $f(y) > f(x)$.
Since $Y$ is convex, the line segment $[x,y]$ belongs to $Y$. For any neighbourhood $W$ of $x$ in $Y$ there exists some $\lambda^* \in (0,1)$ such that, for $\lambda \in (0, \lambda^*)$, $z = \lambda y + (1-\lambda)x \in W$. If $f$ is concave, then
$$f(z) \ge \lambda f(y) + (1-\lambda) f(x) > f(x).$$
Hence in any neighbourhood $W$ of $x$ in $Y$ there exists a point $z$ such that $f(z) > f(x)$, and so $x$ cannot be a local maximum. Similarly, if $f$ is strictly quasi-concave, then
$$f(z) > \min(f(x), f(y)) = f(x),$$
and so, again, $x$ cannot be a local maximum. By contradiction, a local maximum must be a global maximum.
2. If $f$ is $C^1$ and $U$ is open, then by Lemma 4.8 a local maximum must be a critical point. Conversely, let $x$ be a critical point. By Lemma 4.10, $df(x)(y-x) \ge f(y) - f(x)$ for all $y \in U$. Thus $df(x) = 0$ implies that $f(y) - f(x) \le 0$ for all $y \in U$. Hence $x$ is a global maximum of $f$ on $U$. □
Clearly, if $x$ is a critical point of a concave $C^2$-function on an open set, then the Hessian $d^2f(x)$ must be negative semi-definite. To see this, note that by Lemma 4.11 the critical point must be a global maximum, and thus a local maximum. By Lemma 4.9, $d^2f(x)$ must be negative semi-definite. The same is true if $f$ is quasi-concave, as the next lemma shows.
Lemma 4.12 If $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is a quasi-concave $C^2$-function on an open set, then at any critical point $x$, $d^2f(x)$ is negative semi-definite.
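Proof Suppose, on the contrary, that $df(x) = 0$ and $d^2f(x)(h,h) > 0$ for some $h \in \mathbb{R}^n$. As in Lemma 4.6, there is a neighbourhood $V$ of $x$ in $U$ such that $d^2f(z)(h,h) > 0$ for all $z$ in $V$.
Thus there is some $\lambda^* \in (0,1)$ such that, for all $\lambda \in (0, \lambda^*)$, the segment $[x - \lambda h, x + \lambda h]$ belongs to $V$ and, by Lemma 4.5, there are points $z, z'$ in $V$ such that
$$f(x + \lambda h) = f(x) + df(x)(\lambda h) + \tfrac{1}{2}\lambda^2\, d^2f(z)(h,h), \quad \text{and}$$
$$f(x - \lambda h) = f(x) + df(x)(-\lambda h) + \tfrac{1}{2}\lambda^2\, d^2f(z')(h,h).$$
Since $df(x) = 0$, it follows that $f(x + \lambda h) > f(x)$ and $f(x - \lambda h) > f(x)$. Now $x \in [x - \lambda h, x + \lambda h]$, and so, by quasi-concavity,
$$f(x) \ge \min\left(f(x + \lambda h), f(x - \lambda h)\right) > f(x),$$
which is impossible. By contradiction, $d^2f(x)(h,h) \le 0$ for all $h \in \mathbb{R}^n$. □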
For a concave function $f$ on a convex set $Y$, $d^2f(x)$ is negative semi-definite not just at critical points, but at every point in the interior of $Y$.
Lemma 4.13
1. If $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is a concave $C^2$-function on an open convex set $U$, then $d^2f(x)$ is negative semi-definite for all $x \in U$.
2. If $f: Y \subset \mathbb{R}^n \to \mathbb{R}$ is a $C^2$-function on an admissible convex set $Y$ and $d^2f(x)$ is negative semi-definite for all $x \in Y$, then $f$ is concave.
Proof
1. Suppose there exist $x \in U$ and $h \in \mathbb{R}^n$ such that $d^2f(x)(h,h) > 0$. By the continuity of $d^2f$, there is a neighbourhood $V$ of $x$ in $U$ such that $d^2f(z)(h,h) > 0$ for $z \in V$. Choose $\theta \in (0,1)$ small enough that the segment $[x, x + \theta h]$ lies in $V$. Then by Taylor’s theorem there exists $z \in (x, x + \theta h)$ such that
$$f(x + \theta h) = f(x) + df(x)(\theta h) + \tfrac{1}{2}\, d^2f(z)(\theta h, \theta h) > f(x) + df(x)(\theta h).$$
But then $df(x)(\theta h) < f(x + \theta h) - f(x)$. This contradicts Lemma 4.10, which gives $df(x)(y-x) \ge f(y) - f(x)$ for all $x, y$ in $U$. Thus $d^2f(x)$ is negative semi-definite.
2. Let $x, y \in Y$. Since $Y$ is convex, the segment $[x,y] \subset Y$. For any $u, w \in Y$, Taylor’s theorem gives some $\xi \in (u, w)$ such that
$$f(w) = f(u) + df(u)(w-u) + \tfrac{1}{2}\, d^2f(\xi)(w-u, w-u) \le f(u) + df(u)(w-u),$$
since $d^2f(\xi)$ is negative semi-definite. Now fix $\lambda \in (0,1)$ and let $z = \lambda x + (1-\lambda)y$. Taking $u = z$ and $w = x$ or $w = y$ gives $f(x) - f(z) \le df(z)(x-z)$ and $f(y) - f(z) \le df(z)(y-z)$. Hence
$$f(z) \ge \lambda\left[f(x) - df(z)(x-z)\right] + (1-\lambda)\left[f(y) - df(z)(y-z)\right].$$
Now $\lambda\, df(z)(x-z) + (1-\lambda)\, df(z)(y-z) = df(z)[\lambda x + (1-\lambda)y - z] = df(z)(0) = 0$, since $df(z)$ is linear.
Hence $f(z) \ge \lambda f(x) + (1-\lambda) f(y)$ for any $\lambda \in [0,1]$, and so $f$ is concave. □
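Lemma 4.13(2) can be illustrated with the hypothetical function $f(x_1,x_2) = -x_1^4 - x_2^2$, whose Hessian $\operatorname{diag}(-12x_1^2, -2)$ is negative semi-definite everywhere but degenerate on the line $x_1 = 0$, so that requiring negative definiteness would be too strong. The Python sketch below confirms the sign of the Hessian at sampled points and samples the concavity inequality. As with any sampling check, it can only fail to find a counterexample.

```python
import random

# Hypothetical example (not from the text): f(x1, x2) = -x1**4 - x2**2.
def f(x):
    return -x[0] ** 4 - x[1] ** 2

def hessian_eigenvalues(x):
    # The Hessian is diagonal, diag(-12 x1^2, -2), so its eigenvalues are the diagonal entries.
    return (-12.0 * x[0] ** 2, -2.0)

def concavity_gap(x, y, lam):
    """f(lam x + (1 - lam) y) - [lam f(x) + (1 - lam) f(y)]; >= 0 when f is concave."""
    z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
    return f(z) - (lam * f(x) + (1 - lam) * f(y))

rng = random.Random(2)

def random_point():
    return (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))

trials = [(random_point(), random_point(), rng.random()) for _ in range(10_000)]
nsd = all(max(hessian_eigenvalues(x)) <= 0 for x, _, _ in trials)
smallest_gap = min(concavity_gap(x, y, lam) for x, y, lam in trials)
print("Hessian negative semi-definite at every sampled x:", nsd)
print(f"smallest concavity gap: {smallest_gap:.3e}")
```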
We now extend the analysis to a quasi-concave function and characterise the preferred set $P(x;Y)$.
Lemma 4.14 Suppose $f: Y \subset \mathbb{R}^n \to \mathbb{R}$ is a quasi-concave $C^1$-function on the convex admissible set $Y$.
1. If $f(y) \ge f(x)$, then $df(x)(y-x) \ge 0$.
2. If $df(x)(y-x) > 0$, then there exists some $\lambda^* \in (0,1)$ such that $f(z) > f(x)$ for any $z = \lambda y + (1-\lambda)x$ where $\lambda \in (0, \lambda^*)$.
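Proof
1. By the definition of quasi-concavity, $f(\lambda y + (1-\lambda)x) \ge \min(f(x), f(y)) = f(x)$ for all $\lambda \in [0,1]$. But then, as in the analysis of a concave function,
$$f(x + \lambda(y-x)) - f(x) \ge 0,$$
and so
$$df(x)(y-x) = \lim_{\lambda \to 0^+} \frac{f(x + \lambda(y-x)) - f(x)}{\lambda} \ge 0.$$
2. Since
$$df(x)(y-x) = \lim_{\lambda \to 0^+} \frac{f(x + \lambda(y-x)) - f(x)}{\lambda} > 0,$$
there exists $\lambda^* \in (0,1)$ such that this difference quotient is strictly positive for all $\lambda \in (0, \lambda^*)$. That is, $f(z) > f(x)$ for every $z = \lambda y + (1-\lambda)x$ with $\lambda \in (0, \lambda^*)$. (In particular, it cannot be the case that $f(x) \ge f(z)$ for all $z$ in the line segment $[x,y]$.) □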
The property that $f(y) \ge f(x) \Rightarrow df(x)(y-x) \ge 0$ is often called pseudo-concavity.
We shall also say that $f: Y \subset \mathbb{R}^n \to \mathbb{R}$ is strictly pseudo-concave iff, for any $x, y$ in $Y$ with $y \neq x$, $f(y) \ge f(x)$ implies that $df(x)(y-x) > 0$. (Note that we do not require $Y$ to be convex, but we do require it to be admissible.)
Lemma 4.15 Suppose $f: Y \subset \mathbb{R}^n \to \mathbb{R}$ is a strictly pseudo-concave function on an admissible set $Y$.
1. Then $f$ is strictly quasi-concave when $Y$ is convex.
2. If $x$ is a critical point, then it is a global strict maximum.
Proof
1. Suppose that $f$ is not strictly quasi-concave. Then for some $x, y \in Y$ with $x \neq y$ and some $\lambda^* \in (0,1)$, the point $z^* = \lambda^* y + (1-\lambda^*)x$ satisfies $f(z^*) \le \min(f(x), f(y))$. Thus $f(x) \ge f(z^*)$ and $f(y) \ge f(z^*)$, where $x \neq z^*$ and $y \neq z^*$, so strict pseudo-concavity at $z^*$ requires $df(z^*)(x - z^*) > 0$ and $df(z^*)(y - z^*) > 0$. But $x - z^* = -\frac{\lambda^*}{1-\lambda^*}(y - z^*)$, so by linearity $df(z^*)(x - z^*) = -\frac{\lambda^*}{1-\lambda^*}\, df(z^*)(y - z^*) < 0$, a contradiction. Thus $f$ must be strictly quasi-concave.
2. If $df(x) = 0$, then $df(x)(y-x) = 0$ for all $y \in Y$. Hence, by strict pseudo-concavity, $f(y) < f(x)$ for all $y \in Y$ with $y \neq x$. Thus $x$ is a global strict maximum. □
As we have observed, when $f$ is a quasi-concave function on $Y$, the preferred set $P(x;Y)$ is a convex set in $Y$. Clearly, when $f$ is continuous, $P(x;Y)$ is open in $Y$. As we might expect from the Separating Hyperplane Theorem, $P(x;Y)$ will then belong to an “open half space”. To see this, note that Lemma 4.14 establishes (for a quasi-concave $C^1$ function $f$) that the weakly preferred set
$$R(x;Y) = \{y \in Y : f(y) \ge f(x)\}$$
belongs to the closed half-space
$$H(x;Y) = \{y \in Y : df(x)(y-x) \ge 0\}.$$
When $Y$ is open and convex (and $df(x) \neq 0$), the boundary of $H(x;Y)$ is the hyperplane $\{y \in Y : df(x)(y-x) = 0\}$, and $H(x;Y)$ has relative interior $\mathring{H}(x;Y) = \{y \in Y : df(x)(y-x) > 0\}$. Write $H(x)$, $R(x)$, $P(x)$ for $H(x;Y)$, $R(x;Y)$, $P(x;Y)$, etc., when $Y$ is understood.
Lemma 4.16 Suppose $f: U \subset \mathbb{R}^n \to \mathbb{R}$ is $C^1$, and $U$ is open and convex.
1. If $f$ is quasi-concave, with $df(x) \neq 0$, then $P(x) \subset \mathring{H}(x)$.
2. If $f$ is concave or strictly pseudo-concave, then $P(x) \subset \mathring{H}(x)$ for all $x \in U$. In particular, if $x$ is a critical point, then $P(x) = \mathring{H}(x) = \Phi$.
Proof
1. Suppose that $df(x) \neq 0$ but that $P(x) \not\subset \mathring{H}(x)$. Both $P(x)$ and $\mathring{H}(x)$ are open sets in $U$. By Lemma 4.14, $R(x) \subset H(x)$, and thus the closure of $P(x)$ belongs to the closure of $\mathring{H}(x)$ in $U$. Consequently there must exist a point $y$ which belongs to $P(x)$ yet satisfies $df(x)(y-x) = 0$, so that $y$ belongs to the boundary of $\mathring{H}(x)$. Since $P(x)$ is open, there exists a neighbourhood $V$ of $y$ contained in $P(x)$, and thus in $R(x)$. Since $y$ is a boundary point of $\mathring{H}(x)$ and $df(x) \neq 0$, the neighbourhood $V$ contains a point $z$ with $df(x)(z-x) < 0$, that is, $z \notin H(x)$. But this contradicts $R(x) \subset H(x)$. Hence $P(x) \subset \mathring{H}(x)$.
2. Since a concave or strictly pseudo-concave function on the convex set $U$ is quasi-concave (for the latter, by Lemma 4.15), (1) establishes that $P(x) \subset \mathring{H}(x)$ for all $x \in U$ such that $df(x) \neq 0$. By Lemmas 4.11 and 4.15, if $df(x) = 0$ then $x$ is a global maximum. Hence $P(x) = \mathring{H}(x) = \Phi$. □
Lemma 4.8 shows that if $\mathring{H}(x) \neq \Phi$ (that is, if $df(x) \neq 0$), then $x$ cannot be a local, and hence not a global, maximum on the open set $U$.
AU
TH
OR
’S P
RO
OF
Book ID: 74899_2_En, Date: 2013-09-30, Proof No: 1, UNCORRECTED PROOF
160 4 Differential Calculus and Smooth Optimisation
Fig. 4.8
Moreover, for a concave or strictly pseudo-concave function, $P(x)$ is empty if $\mathring{H}(x)$ is empty (i.e., when $df(x) = 0$).

Figure 4.8 illustrates these observations. Let $P(x)$ be the preferred set of a quasi-concave function $f$ (at a non-critical point $x$). Point $y_1$ satisfies $f(y_1) = f(x)$ and thus belongs to $R(x)$ and hence to $H(x)$. Point $y_2 \in H(x) \setminus P(x)$, but there exists an open interval $(x, z)$ belonging to $[x, y_2]$ and to $P(x)$.
We may identify the linear map $df(x) : \mathbb{R}^n \to \mathbb{R}$ with a vector $Df(x) \in \mathbb{R}^n$, where $df(x)(h) = \langle Df(x), h\rangle$ is the scalar product of $Df(x)$ with $h$. $Df(x)$ is the direction gradient; it is normal to the indifference surface at $x$, and therefore to the hyperplane $\partial H(x) = \{y \in Y : df(x)(y - x) = 0\}$.

To see this intuitively, note that the indifference surface $I(x) = \{y \in Y : f(y) = f(x)\}$ through $x$ and the hyperplane $\partial H(x)$ are tangent at $x$. Just as $df(x)$ is an approximation to the function $f$, so the hyperplane $\partial H(x)$ is an approximation to $I(x)$ near $x$.
As we shall see in Example 4.7, a quasi-concave function $f$ may have a critical point $x$ (so $\mathring{H}(x) = \emptyset$) yet $P(x) \neq \emptyset$. For example, if $Y$ is the unit interval, then $f$ may have a degenerate critical point with $P(x) \neq \emptyset$. Lemma 4.16 establishes that this cannot happen when $f$ is concave and $Y$ is an open set. The final lemma of this section extends Lemma 4.16 to the case when $Y$ is admissible.
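As a rough numerical illustration of Lemma 4.16 and of the remark above, the Python sketch below samples points of a hypothetical concave function $f$ on the box $[-3,3]^2$ and counts any $y$ with $f(y) > f(x)$ but $df(x)(y - x) \le 0$; none should appear. It then takes a monotone (hence quasi-concave, but not concave) function $g$ on $[0,1]$ with a degenerate critical point at $t = 0.5$ whose preferred set is nevertheless non-empty. Both functions are assumptions chosen for illustration, not taken from the text, and random sampling is evidence rather than proof.

```python
import random

def f(x):
    # Hypothetical concave C^1 function on R^2 (an illustrative choice, not from the text).
    return -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2

def Df(x):
    # Gradient of f, computed by hand.
    return (-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 0.5))

def count_violations(x, samples=20000, seed=1):
    """Count sampled y in P(x) = {f(y) > f(x)} that fail df(x)(y - x) > 0."""
    rng = random.Random(seed)
    d, fx = Df(x), f(x)
    bad = 0
    for _ in range(samples):
        y = (rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))
        in_P = f(y) > fx
        in_open_half_space = d[0] * (y[0] - x[0]) + d[1] * (y[1] - x[1]) > 0
        if in_P and not in_open_half_space:
            bad += 1
    return bad

def g(t):
    # Monotone, hence quasi-concave, on [0, 1]; not concave.
    return (t - 0.5) ** 3

def dg(t):
    return 3.0 * (t - 0.5) ** 2

print(count_violations((0.0, 0.0)))  # 0
print(dg(0.5), g(0.9) > g(0.5))      # 0.0 True
```

The second line shows a critical point ($dg(0.5) = 0$, so the open half-space is empty) at which the preferred set still contains $t = 0.9$.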
Lemma 4.17 Let $f : Y \subset \mathbb{R}^n \to \mathbb{R}$ be $C^1$, and let $Y$ be a convex admissible set.
1. If $f$ is quasi-concave on $Y$, then for all $x \in Y$, $\mathring{H}(x;Y) \neq \emptyset$ implies $P(x;Y) \neq \emptyset$. If $x$ is a local maximum of $f$, then $\mathring{H}(x;Y) = \emptyset$.
2. If $f$ is a strictly pseudo-concave function on an admissible set $Y$, and $x$ is a local maximum, then it is a global strict maximum.
Fig. 4.9
3. If $f$ is concave $C^1$ or strictly pseudo-concave on $Y$, then $P(x;Y) \subset \mathring{H}(x;Y)$ for all $x \in Y$. Hence $\mathring{H}(x;Y) = \emptyset \Rightarrow P(x;Y) = \emptyset$.
Proof
1. If $\mathring{H}(x;Y) \neq \emptyset$ then $df(x)(y - x) > 0$ for some $y \in Y$. Then by Lemma 4.14(2), in any neighbourhood $U$ of $x$ in $Y$ there exists $z$ such that $f(z) > f(x)$. Hence $x$ cannot be a local maximum, and indeed $P(x) \neq \emptyset$.
2. By Lemma 4.15, $f$ must be quasi-concave. By (1), if $x$ is a local maximum then $df(x)(y - x) \le 0$ for all $y \in Y$. By the definition of strict pseudo-concavity this implies that $f(y) < f(x)$ for all $y \in Y$ such that $y \neq x$. Thus $x$ is a global strict maximum.
3. If $f$ is concave and $y \in P(x;Y)$, then $f(y) > f(x)$. By Lemma 4.10, $df(x)(y - x) \ge f(y) - f(x) > 0$. Thus $y \in \mathring{H}(x;Y)$. If instead $f$ is strictly pseudo-concave, then $f(y) > f(x)$ implies $df(x)(y - x) > 0$, and so again $P(x;Y) \subset \mathring{H}(x;Y)$. □
Example 4.7 These results are illustrated in Fig. 4.9.
1. For the general function $f_1$, $b$ is a critical point and a local maximum, but not a global maximum (e). On the compact interval $[a, d]$, $d$ is a local maximum but not a global maximum.
2. For the quasi-concave function $f_2$, $a$ is a degenerate critical point but neither a local nor a global maximum, while $b$ is a degenerate critical point which is also a global maximum. Point $c$ is a critical point which is also a local maximum. However, on $[b, c]$, $c$ is not a global maximum.
3. For the concave function $f_3$, clearly $b$ is a degenerate (but negative semi-definite) critical point, which is also a local and global maximum. Moreover, on the interval $[a, c]$, $c$ is the local and global maximum, even though it is not a critical point. Note that $df_3(c)(a - c) < 0$.
Lemma 4.17 suggests that we call any point $x$ in an admissible set $Y$ a generalised critical point in $Y$ iff $\mathring{H}(x;Y) = \emptyset$. Of course, if $df(x) = 0$ then $\mathring{H}(x;Y) = \emptyset$, but the converse is not true when $x$ is a boundary point.

Lemma 4.17 shows that (i) for a quasi-concave $C^1$-function, global maximum $\Rightarrow$ local maximum $\Rightarrow$ generalised critical point; (ii) for a concave $C^1$- or strictly pseudo-concave function, critical point $\Rightarrow$ generalised critical point $\Rightarrow$ local maximum $\Rightarrow$ global maximum.
4.3.2 Economic Optimisation with Exogenous Prices
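Suppose now that we wish to find the maximum of a quasi-concave $C^1$-function $f : Y \to \mathbb{R}$ subject to a constraint $g(x) \ge 0$, where $g : Y \to \mathbb{R}$ is also a quasi-concave $C^1$-function.

As we know from the previous section, when $P_f(x) = \{y \in Y : f(y) > f(x)\}$ and $df(x) \neq 0$, then $P_f(x) \subset \mathring{H}_f(x)$, the interior of $H_f(x) = \{y \in Y : df(x)(y - x) \ge 0\}$.

Suppose now that $H_g(x) = \{y \in Y : dg(x)(y - x) \ge 0\}$ has the property that $H_g(x) \cap \mathring{H}_f(x) = \emptyset$, and $x$ satisfies $g(x) = 0$.

In this case there exists no point $y$ such that $g(y) \ge 0$ and $f(y) > f(x)$: any such $y$ would lie in $H_g(x)$ (by Lemma 4.14, since $g(y) \ge g(x)$) and also in $\mathring{H}_f(x)$.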
A condition that is sufficient for the disjointness of the two half-spaces $\mathring{H}_f(x)$ and $H_g(x)$ is clearly that $\lambda\, dg(x) + df(x) = 0$ for some $\lambda > 0$. In this case, if $df(x)(v) > 0$ then $dg(x)(v) < 0$, for any $v \in \mathbb{R}^n$.

Now let $L = L_\lambda(f,g)$ be the Lagrangian $f + \lambda g : Y \to \mathbb{R}$. A sufficient condition for $x$ to be a solution to the optimisation problem is that $dL(x) = 0$. Note, however, that this is not a necessary condition. As we know from the previous section, it might well be the case for some point $x$ on the boundary of the admissible set $Y$ that $dL(x) \neq 0$, yet there exists no $y \in Y$ such that

$$dg(x)(y - x) \ge 0 \quad\text{and}\quad df(x)(y - x) > 0.$$
Figure 4.10 illustrates such a case when

$$Y = \{(x, y) \in \mathbb{R}^2 : x \ge 0,\ y \ge 0\}.$$

We shall refer to this possibility as the boundary problem.

As we know, we may represent the linear maps $df(x), dg(x)$ by the direction gradients, or vectors normal to the indifference surfaces, labelled $Df(x), Dg(x)$. When the maps $df(x), dg(x)$ satisfy $df(x) + \lambda\, dg(x) = 0$, $\lambda > 0$, then the direction gradients are positively dependent and satisfy $Df(x) + \lambda Dg(x) = 0$.
Fig. 4.10
In the example, $Df(x)$ and $Dg(x)$ are not positively dependent, yet $x$ is a solution to the optimisation problem.

It is common to make some boundary assumption so that the solution does not belong to the boundary of the feasible or admissible set $Y$.

In the more general optimisation problem $(f, g) : Y \to \mathbb{R}^{m+1}$, the Kuhn–Tucker theorem implies that a global saddle point $(x^*, \lambda^*)$ of the Lagrangian $L_\lambda(f,g) = f + \sum_i \lambda_i g_i$ gives a solution $x^*$ to the optimisation problem. Aside from the boundary problem, we may find the global maxima of $L_\lambda(f,g)$ by finding the critical points of $L_\lambda(f,g)$.
Thus we must choose $x^*$ such that

$$df(x^*) + \sum_{i=1}^{m} \lambda_i\, dg_i(x^*) = 0.$$

Once a coordinate system is chosen, this is equivalent to finding $x^*$ and coefficients $\lambda_1, \ldots, \lambda_m$, all non-negative, such that

$$Df(x^*) + \sum_{i=1}^{m} \lambda_i\, Dg_i(x^*) = 0.$$
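The Kuhn–Tucker theorem also shows that if $x^*$ is such that $g_i(x^*) > 0$, then $\lambda_i = 0$; thus $\lambda_i > 0$ is possible only when $g_i(x^*) = 0$ (complementary slackness: $\lambda_i g_i(x^*) = 0$ for each $i$).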
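Example 4.8 Maximise the function $f : \mathbb{R} \to \mathbb{R}$ given by

$$f(x) = \begin{cases} x^2, & x \ge 0,\\ 0, & x < 0, \end{cases}$$

subject to $g_1(x) = x \ge 0$ and $g_2(x) = 1 - x \ge 0$. Now $L_\lambda(x) = x^2 + \lambda_1 x + \lambda_2(1 - x)$, and the equations

$$\frac{\partial L}{\partial x} = 2x + \lambda_1 - \lambda_2 = 0, \qquad \frac{\partial L}{\partial \lambda_1} = x = 0, \qquad \frac{\partial L}{\partial \lambda_2} = 1 - x = 0$$

clearly have no solution (they require both $x = 0$ and $x = 1$). By inspection the solution cannot satisfy $g_1(x) = 0$. Hence choose $\lambda_1 = 0$ and solve

$$L_\lambda(x) = x^2 + \lambda(1 - x).$$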
Fig. 4.11
Then $\dfrac{\partial L}{\partial x} = 2x - \lambda = 0$ and $\dfrac{\partial L}{\partial \lambda} = 1 - x = 0$. Thus $x = 1$ and $\lambda = 2$ is a solution.
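A brute-force check of Example 4.8 can be sketched in Python: search a grid of the feasible interval $[0,1]$ for the maximiser of $f$, confirm that $g_1$ is slack there (so $\lambda_1 = 0$), and recover the multiplier from $2x - \lambda = 0$. The grid resolution is an arbitrary choice.

```python
def f(x):
    # Objective of Example 4.8: x^2 for x >= 0 and 0 for x < 0.
    return x * x if x >= 0 else 0.0

def g1(x):
    return x          # constraint g1(x) >= 0

def g2(x):
    return 1.0 - x    # constraint g2(x) >= 0

# Brute-force search over a grid of the feasible set [0, 1].
grid = [i / 1000 for i in range(1001)]
feasible = [x for x in grid if g1(x) >= 0 and g2(x) >= 0]
x_best = max(feasible, key=f)

# g1 is slack at x_best, so lambda_1 = 0; then dL/dx = 2x - lambda = 0 fixes lambda.
lam = 2 * x_best
print(x_best, g1(x_best), g2(x_best), lam)   # 1.0 1.0 0.0 2.0
```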
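Suppose now that $f, g_1, \ldots, g_m$ are all concave functions on the convex admissible set $Y = \{x \in \mathbb{R}^n : x_i \ge 0,\ i = 1, \ldots, n\}$. If $z = \alpha y + (1 - \alpha)x$ with $\alpha \in [0,1]$, and each $\lambda_i \ge 0$, then

$$\begin{aligned} L_\lambda(f,g)(z) &= f(z) + \sum_{i=1}^{m} \lambda_i g_i(z)\\ &\ge \alpha f(y) + (1 - \alpha) f(x) + \sum_{i=1}^{m} \lambda_i\bigl[\alpha g_i(y) + (1 - \alpha) g_i(x)\bigr]\\ &= \alpha L_\lambda(f,g)(y) + (1 - \alpha) L_\lambda(f,g)(x). \end{aligned}$$

Thus $L_\lambda(f,g)$ is a concave function. By Lemma 4.11, $x^*$ is a global maximum of $L_\lambda(f,g)$ iff $dL_\lambda(f,g)(x^*) = 0$ (aside from the boundary problem).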
For more general functions, to find the global maximum of the Lagrangian $L_\lambda(f,g)$, and thus the optimum to the problem $(f, g)$, we find the critical points of $L_\lambda(f,g)$. Those critical points which have a negative definite Hessian will then be local maxima of $L_\lambda(f,g)$. However, we still have to examine the critical points at which the Hessian of the Lagrangian is only negative semi-definite, and compare the local maxima, to find the global maxima. Even in this general case, any solution $x^*$ to the problem $(f, g)$ must be a global maximum for a suitably chosen Lagrangian $L_\lambda(f,g)$, and thus must satisfy the first order condition

$$Df(x^*) + \sum_{i=1}^{m} \lambda_i\, Dg_i(x^*) = 0$$

(again, subject to the boundary problem).
Example 4.9 Maximise $f : \mathbb{R}^2 \to \mathbb{R} : (x, y) \mapsto xy$ subject to the constraint $g(x, y) = 1 - x^2 - y^2 \ge 0$. We seek a solution to the first order condition

$$DL(x,y) = Df(x,y) + \lambda Dg(x,y) = 0.$$

Thus $(y, x) + \lambda(-2x, -2y) = 0$, or $\lambda = \dfrac{y}{2x} = \dfrac{x}{2y}$, so $x^2 = y^2$.
Fig. 4.12
For $x = -y$, $\lambda < 0$ and so $Df(x,y) = |\lambda| Dg(x,y)$ (corresponding to a minimum of $f$ on the feasible set $g(x,y) \ge 0$).

Thus we choose $x = y$ and $\lambda = \tfrac{1}{2}$. For $(x, y)$ on the boundary of the constraint set we require $1 - x^2 - y^2 = 0$. Hence $x = y = \pm\tfrac{1}{\sqrt{2}}$.

The Lagrangian is therefore $L = xy + \tfrac{1}{2}(1 - x^2 - y^2)$, with differential (with respect to $x, y$)

$$DL(x,y) = (y - x,\ x - y)$$

and Hessian

$$HL(x,y) = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}.$$

The eigenvalues of $HL$ are $-2$ and $0$, corresponding to eigenvectors $(1,-1)$ and $(1,1)$ respectively. Hence $HL$ is negative semi-definite, and so, for example, the point $\bigl(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\bigr)$ is a local maximum for the Lagrangian.
As we have observed in Example 3.4, the function $f(x,y) = xy$ is not quasi-concave on $\mathbb{R}^2$, and hence it is not the case that $P_f \subset H_f$. However, on $\mathbb{R}^2_+ = \{(x,y) \in \mathbb{R}^2 : x \ge 0, y \ge 0\}$, $f$ is quasi-concave, and so the optimality condition $H_g(x,y) \cap \mathring{H}_f(x,y) = \emptyset$ is sufficient for an optimum.

Note also that $Df(x,y) = (y, x)$, and so the origin $(0,0)$ is a critical point of $f$. However, the constraint is slack at the origin ($g(0,0) = 1 > 0$), so the Kuhn–Tucker conditions require $\lambda = 0$. In this case

$$HL(x,y) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
Fig. 4.13
and, as in Example 4.4, $HL$ is non-degenerate with eigenvalues $+1, -1$. Hence $HL$ is not negative semi-definite, and so $(0,0)$ cannot be a local maximum for the Lagrangian.

However, if we were to maximise $f(x,y) = -xy$ on the feasible set $\mathbb{R}^2_+$ subject to the same constraint, then $L$ would be maximised at $(0,0)$ with $\lambda = 0$.
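The computations in Example 4.9 can be reproduced with a small Python sketch: the eigenvalues of the two Hessians come from the closed-form formula for a symmetric $2 \times 2$ matrix, and a grid search locates the maximum of $xy$ on the boundary circle. Restricting the search to $0 \le t \le \pi/2$ is our choice for illustration; the point $(-1/\sqrt{2}, -1/\sqrt{2})$ is an equally good solution.

```python
import math

def sym_eigs(a, b, c):
    """Eigenvalues, in ascending order, of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b * b)
    return (mean - rad, mean + rad)

print(sym_eigs(-1, 1, -1))   # (-2.0, 0.0): Hessian of xy + (1 - x^2 - y^2)/2
print(sym_eigs(0, 1, 0))     # (-1.0, 1.0): Hessian of xy alone (lambda = 0)

# Maximise xy on the boundary arc x = cos t, y = sin t with 0 <= t <= pi/2.
n = 100000
t_best = max((0.5 * math.pi * k / n for k in range(n + 1)),
             key=lambda t: math.cos(t) * math.sin(t))
print(math.cos(t_best), math.sin(t_best))   # both close to 1/sqrt(2)
```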
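Example 4.10 In Example 3.7 we examined the maximisation of a convex preference correspondence of a consumer subject to a budget constraint of the form $B(p) = \{x \in \mathbb{R}^n_+ : \langle p, x\rangle \le \langle p, e\rangle = I\}$, given by an exogenous price vector $p \in \mathbb{R}^n_+$ and initial endowment vector $e \in \mathbb{R}^n_+$.

Suppose now that the preference correspondence is given by a utility function

$$f : \mathbb{R}^2_+ \to \mathbb{R} : (x, y) \mapsto \beta \log x + (1 - \beta)\log y, \qquad 0 < \beta < 1.$$

Clearly $Df(x,y) = \left(\dfrac{\beta}{x}, \dfrac{1-\beta}{y}\right)$, and so

$$Hf(x,y) = \begin{pmatrix} -\dfrac{\beta}{x^2} & 0 \\ 0 & -\dfrac{1-\beta}{y^2} \end{pmatrix}$$

is negative definite. Thus $f$ is concave on the open positive quadrant $\{(x,y) : x > 0,\ y > 0\}$, where it is defined. The budget constraint is

$$g(x,y) = I - p_1 x - p_2 y \ge 0,$$

where $p_1, p_2$ are the given prices of commodities $x, y$. The first order condition on the Lagrangian is $Df(x,y) + \lambda Dg(x,y) = 0$, i.e.,

$$\left(\frac{\beta}{x}, \frac{1-\beta}{y}\right) + \lambda(-p_1, -p_2) = 0,$$

with $\lambda > 0$. Hence $\dfrac{p_1}{p_2} = \dfrac{\beta}{x}\cdot\dfrac{y}{1-\beta}$. See Fig. 4.13.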
Now $f$ is concave and has no critical point within the constraint set. Thus $(x,y)$ maximises $L_\lambda$ iff

$$y = x\left(\frac{p_1}{p_2}\right)\left(\frac{1-\beta}{\beta}\right) \quad\text{and}\quad g(x,y) = 0.$$
Thus $y = \dfrac{I - p_1 x}{p_2}$, and so $x = \dfrac{I\beta}{p_1}$, $y = \dfrac{I(1-\beta)}{p_2}$, and $\lambda = \dfrac{1}{I}$, which is the marginal utility of income.
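The closed-form demand of Example 4.10 can be checked in a few lines of Python. The prices, income and $\beta$ below are illustrative assumptions; the residuals of the first order condition $Df(x,y) = \lambda(p_1, p_2)$ and of the budget constraint should both be zero.

```python
def cobb_douglas_demand(p1, p2, income, beta):
    """Return (x, y, lam) maximising beta*log(x) + (1 - beta)*log(y) subject to p1*x + p2*y <= income."""
    x = income * beta / p1
    y = income * (1 - beta) / p2
    lam = 1 / income  # marginal utility of income
    return x, y, lam

p1, p2, income, beta = 2.0, 5.0, 100.0, 0.25   # illustrative values
x, y, lam = cobb_douglas_demand(p1, p2, income, beta)

# Residuals of the first order condition Df(x, y) = lam * (p1, p2) and of the budget constraint.
foc = (beta / x - lam * p1, (1 - beta) / y - lam * p2)
budget = income - (p1 * x + p2 * y)
print(x, y, lam)      # 12.5 15.0 0.01
print(foc, budget)
```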
Now consider a situation where prices vary. Then optimal consumption is $(x^*, y^*) = (d_1(p_1,p_2), d_2(p_1,p_2))$, where $d_i(p_1,p_2)$ is the demand for commodity $x$ or $y$. As we have just shown, $d_1(p_1,p_2) = \dfrac{I\beta}{p_1}$ and $d_2(p_1,p_2) = \dfrac{I(1-\beta)}{p_2}$.

Suppose that all prices are increased by the same proportion, i.e., $(p_1', p_2') = \alpha(p_1, p_2)$, $\alpha > 0$. In this exchange situation $I' = p_1' e_1 + p_2' e_2 = \alpha(p_1 e_1 + p_2 e_2) = \alpha I$. Thus

$$x' = \frac{I'\beta}{p_1'} = \frac{\alpha I\beta}{\alpha p_1} = x, \qquad\text{and similarly}\qquad y' = y.$$

Hence $d_i(\alpha p_1, \alpha p_2) = d_i(p_1, p_2)$ for $i = 1, 2$. The demand function is said to be homogeneous (of degree zero) in prices.
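To see this homogeneity numerically, the Python sketch below (with illustrative numbers, not from the text) computes the Cobb–Douglas demand when income is the value of an endowment $(e_1, e_2)$, and compares the result before and after all prices are tripled.

```python
def demand_from_endowment(p1, p2, e1, e2, beta):
    """Cobb-Douglas demand when income is the market value of the endowment (e1, e2)."""
    income = p1 * e1 + p2 * e2
    return income * beta / p1, income * (1 - beta) / p2

base = demand_from_endowment(2.0, 5.0, 10.0, 4.0, 0.25)
scaled = demand_from_endowment(3 * 2.0, 3 * 5.0, 10.0, 4.0, 0.25)   # alpha = 3
print(base, scaled)   # (5.0, 6.0) (5.0, 6.0)
```

Because income scales by the same factor $\alpha$ as prices, the factor cancels in each demand, which is exactly the argument given above.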
Suppose now that income is obtained from supplying labor at a wage rate $w$, say. Let the supply of labor by the consumer be $e = 1 - x_3$, where $x_3$ is leisure time and enters into the utility function. Then $f : \mathbb{R}^3 \to \mathbb{R} : (x_1, x_2, x_3) \mapsto \sum_{i=1}^{3} a_i \log x_i$, and the budget constraint is $p_1 x_1 + p_2 x_2 \le (1 - x_3)w$, or

$$g(x_1, x_2, x_3) = w - (p_1 x_1 + p_2 x_2 + w x_3) \ge 0.$$

The first order condition is

$$\left(\frac{a_1}{x_1}, \frac{a_2}{x_2}, \frac{a_3}{x_3}\right) = \lambda(p_1, p_2, w), \qquad \lambda > 0.$$

Clearly the demand function will again be homogeneous, since $d(p_1, p_2, w) = d(\alpha p_1, \alpha p_2, \alpha w)$.

For the general consumer optimisation problem we therefore normalise the price vector. In general, in an $n$-commodity exchange economy, let

$$\Delta = \{p \in \mathbb{R}^n_+ : \|p\| = 1\}$$

be the price simplex. Here $\|\cdot\|$ is a convenient norm on $\mathbb{R}^n$. If $f : \mathbb{R}^n_+ \to \mathbb{R}$ is the utility function, let

$$D^*f(x) = \frac{Df(x)}{\|Df(x)\|} \in \Delta.$$
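For the labor–leisure example above, solving $a_i/x_i = \lambda q_i$ with $q = (p_1, p_2, w)$ together with the binding budget gives $\lambda = (\sum_i a_i)/w$ and $x_i = a_i w / (q_i \sum_j a_j)$; this derivation is not spelled out in the text. The Python sketch below uses it to check that demand is unchanged when $(p_1, p_2, w)$ is rescaled to unit length, taking the Euclidean norm as the “convenient norm”. The weights and prices are illustrative assumptions.

```python
import math

def labor_leisure_demand(p1, p2, w, a):
    """Demand (x1, x2, x3) for f = sum_i a_i log x_i subject to p1 x1 + p2 x2 + w x3 <= w.

    From a_i / x_i = lam * q_i with q = (p1, p2, w) and a binding budget,
    lam = sum(a) / w and x_i = a_i * w / (sum(a) * q_i).
    """
    q = (p1, p2, w)
    total = sum(a)
    return tuple(ai * w / (total * qi) for ai, qi in zip(a, q))

a = (0.2, 0.3, 0.5)          # illustrative utility weights
p1, p2, w = 4.0, 2.0, 10.0   # illustrative prices and wage
d = labor_leisure_demand(p1, p2, w, a)

# Normalise the price vector with the Euclidean norm (one possible "convenient norm").
n = math.sqrt(p1 ** 2 + p2 ** 2 + w ** 2)
d_normalised = labor_leisure_demand(p1 / n, p2 / n, w / n, a)

print(d)              # approximately (0.5, 1.5, 0.5): leisure x3 = 0.5, labor supplied 0.5
print(d_normalised)   # the same bundle, up to rounding
```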
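Suppose then that $x^* \in \mathbb{R}^n$ is a maximum of $f : \mathbb{R}^n_+ \to \mathbb{R}$ subject to the budget constraint $\langle p, x\rangle \le I$.

As we have seen, the first order condition is

$$Df(x) + \lambda Dg(x) = 0,$$

where $Dg(x) = -p = (-p_1, \ldots, -p_n)$, $p \in \Delta$, and

$$Df(x) = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right).$$

Thus $Df(x) = \lambda(p_1, \ldots, p_n) = \lambda p \in \mathbb{R}^n_+$. But then $D^*f(x) = \dfrac{p}{\|p\|} \in \Delta$.

Subject to boundary problems, a necessary condition for optimal consumer behavior is that $D^*f(x) = \dfrac{p}{\|p\|}$.

As we have seen, the optimality condition is that

$$\frac{\partial f}{\partial x_i} \Big/ \frac{\partial f}{\partial x_j} = \frac{p_i}{p_j}$$

for the $i$th and $j$th commodities, where $\dfrac{\partial f}{\partial x_i}$ is often called the marginal utility of the $i$th commodity.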
Fig. 4.14
Now any point $y$ on the boundary of the budget set satisfies

$$\langle p, y\rangle = I = \frac{1}{\lambda}\bigl\langle Df(x^*), x^*\bigr\rangle.$$

Hence $y \in H(p, I)$, the hyperplane separating the budget set from the preferred set at the optimum $x^*$, iff $\langle Df(x^*), y - x^*\rangle = 0$.

Consider now the problem of maximisation of a profit function by a producer,

$$\pi(x_1, \ldots, x_m, x_{m+1}, \ldots, x_n) = \sum_{j=1}^{n-m} p_{m+j}\, x_{m+j} - \sum_{j=1}^{m} p_j x_j,$$

where $(x_1, \ldots, x_m) \in \mathbb{R}^m_+$ are inputs, $(x_{m+1}, \ldots, x_n)$ are outputs, and $p \in \mathbb{R}^n_+$ is a non-negative price vector.
As in Example ??, the set of feasible input-output combinations is given by the production set $G = \{x \in \mathbb{R}^n_+ : F(x) \ge 0\}$, where $F : \mathbb{R}^n_+ \to \mathbb{R}$ is a smooth function and $F(x) = 0$ when $x$ is on the upper boundary or frontier of the production set $G$.

At a point $x$ on the boundary, the vector which is normal to the surface $\{x : F(x) = 0\}$ is

$$DF(x) = \left(\frac{\partial F}{\partial x_1}, \ldots, \frac{\partial F}{\partial x_n}\right)\Big|_x.$$

The first order condition for the Lagrangian is that

$$D\pi(x) + \lambda DF(x) = 0,$$

or

$$(-p_1, \ldots, -p_m, p_{m+1}, \ldots, p_n) + \lambda\left(\frac{\partial F}{\partial x_1}, \ldots, \frac{\partial F}{\partial x_n}\right) = 0.$$

For example, with two inputs ($x_1$ and $x_2$) and one output ($x_3$), we might express the maximum possible output $y$ in terms of $x_1$ and $x_2$, i.e., $y = g(x_1, x_2)$. Then the feasible set is

$$\{x \in \mathbb{R}^3_+ : F(x_1, x_2, x_3) = g(x_1, x_2) - x_3 \ge 0\}.$$
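Then $(-p_1, -p_2, p_3) + \lambda\left(\dfrac{\partial g}{\partial x_1}, \dfrac{\partial g}{\partial x_2}, -1\right) = 0$, so that $\lambda = p_3$ and

$$p_1 = p_3 \frac{\partial g}{\partial x_1}, \qquad p_2 = p_3 \frac{\partial g}{\partial x_2}, \qquad\text{or}\qquad \frac{p_1}{p_2} = \frac{\partial g}{\partial x_1} \Big/ \frac{\partial g}{\partial x_2}.$$

Here $\dfrac{\partial g}{\partial x_j}$ is called the marginal product (with respect to commodity $j$, for $j = 1, 2$).

For fixed $x = (x_1, x_2)$, consider the locus of points in $\mathbb{R}^2_+$ along which $y = g(x)$ is constant. If $\dfrac{\partial g}{\partial x_2} \neq 0$ at $x$, then by the implicit function theorem (discussed in the next chapter) we can express $x_2$ as a function $x_2(x_1)$ of $x_1$ only, near $x$. In this case

$$\frac{\partial g}{\partial x_1} + \frac{dx_2}{dx_1}\,\frac{\partial g}{\partial x_2} = 0, \qquad\text{and so}\qquad \frac{\partial g}{\partial x_1} \Big/ \frac{\partial g}{\partial x_2}\Big|_x = -\frac{dx_2}{dx_1}\Big|_x = \frac{p_1}{p_2}.$$

The ratio $\dfrac{\partial g}{\partial x_1} \Big/ \dfrac{\partial g}{\partial x_2}\Big|_x$ is called the marginal rate of technical substitution of $x_2$ for $x_1$ at the point $(x_1, x_2)$.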
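The sign in $dx_2/dx_1 = -\bigl(\partial g/\partial x_1\bigr)/\bigl(\partial g/\partial x_2\bigr)$ can be checked numerically. The Python sketch below assumes a hypothetical Cobb–Douglas technology $g(x_1, x_2) = x_1^{0.3} x_2^{0.6}$ (not from the text), solves the constant-output locus explicitly for $x_2(x_1)$, and compares a central-difference slope with minus the ratio of marginal products.

```python
A_EXP, B_EXP = 0.3, 0.6   # hypothetical exponents of g(x1, x2) = x1**A_EXP * x2**B_EXP

def g(x1, x2):
    return x1 ** A_EXP * x2 ** B_EXP

def x2_on_locus(x1, y):
    # Solve g(x1, x2) = y for x2: the function x2(x1) of the implicit function theorem.
    return (y / x1 ** A_EXP) ** (1 / B_EXP)

x1, x2 = 2.0, 3.0
y = g(x1, x2)
h = 1e-6
slope = (x2_on_locus(x1 + h, y) - x2_on_locus(x1 - h, y)) / (2 * h)   # dx2/dx1 at constant output
mp1 = A_EXP * x1 ** (A_EXP - 1) * x2 ** B_EXP    # marginal product of x1
mp2 = B_EXP * x1 ** A_EXP * x2 ** (B_EXP - 1)    # marginal product of x2

print(slope, -mp1 / mp2)   # both approximately -0.75
```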
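Example 4.11 There are two inputs, $K$ (capital) and $L$ (labor), and one output, $Y$, say. Let

$$g(K, L) = \bigl[dK^{-\rho} + (1 - d)L^{-\rho}\bigr]^{-1/\rho}.$$

The feasibility constraint is

$$F(K, L, Y) = g(K, L) - Y \ge 0.$$

Let $v, w, p$ be the prices of capital, labor and the output, so that the profit gradient is $(-v, -w, p)$. For optimality we obtain

$$(-v, -w, p) + \lambda\left(\frac{\partial F}{\partial K}, \frac{\partial F}{\partial L}, \frac{\partial F}{\partial Y}\right) = 0.$$

On the production frontier $g(K, L) = Y$, and $p = -\lambda \dfrac{\partial F}{\partial Y} = \lambda$, since $\dfrac{\partial F}{\partial Y} = -1$.

Now let $X = dK^{-\rho} + (1 - d)L^{-\rho}$. Then

$$\frac{\partial F}{\partial K} = \left(-\frac{1}{\rho}\right)\bigl[-\rho d K^{-\rho-1}\bigr] X^{-\frac{1}{\rho}-1}.$$

Now $Y = X^{-1/\rho}$, so $Y^{1+\rho} = X^{-\frac{1}{\rho}-1}$. Thus

$$\frac{\partial F}{\partial K} = d\left(\frac{Y}{K}\right)^{1+\rho}.$$

Similarly $\dfrac{\partial F}{\partial L} = (1 - d)\left(\dfrac{Y}{L}\right)^{1+\rho}$. Thus

$$\frac{v}{w} = \frac{\partial F}{\partial K} \Big/ \frac{\partial F}{\partial L} = \frac{d}{1-d}\left(\frac{L}{K}\right)^{1+\rho}.$$

In the case of just a single output, where the production frontier is given by a function

$$x_{n+1} = g(x_1, \ldots, x_n)$$

and $(x_1, \ldots, x_n) \in \mathbb{R}^n_+$ is the input vector, the constraint set will clearly be convex if and only if $g$ is a concave function. (See Example 3.3.) In this case the solution to the Lagrangian will give an optimal solution. However, when the constraint set is not convex, some solutions to the Lagrangian problem may be local minima. See Fig. 4.15 for an illustration.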
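The derivative formulas of Example 4.11 can be spot-checked with central differences. In the Python sketch below the values of $K, L, d, \rho$ are arbitrary illustrative choices (with $\rho > 0$); each printed pair should agree to several decimal places.

```python
def ces(K, L, d, rho):
    """CES production function g(K, L) = [d K^-rho + (1 - d) L^-rho]^(-1/rho) of Example 4.11."""
    return (d * K ** (-rho) + (1 - d) * L ** (-rho)) ** (-1 / rho)

K, L, d, rho = 2.0, 5.0, 0.4, 0.5   # illustrative values
Y = ces(K, L, d, rho)               # output on the production frontier
h = 1e-6

# Central differences for dF/dK and dF/dL (F = g - Y, so these equal dg/dK and dg/dL).
dF_dK = (ces(K + h, L, d, rho) - ces(K - h, L, d, rho)) / (2 * h)
dF_dL = (ces(K, L + h, d, rho) - ces(K, L - h, d, rho)) / (2 * h)

print(dF_dK, d * (Y / K) ** (1 + rho))
print(dF_dL, (1 - d) * (Y / L) ** (1 + rho))
print(dF_dK / dF_dL, d / (1 - d) * (L / K) ** (1 + rho))   # the price ratio v / w at an optimum
```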
As with the consumer, the optimum point on the production frontier is unchanged if all prices are multiplied by a positive number.

For a general consumer let $d : \mathbb{R}^n_+ \to \mathbb{R}^n_+$ be the demand map, where

$$d(p_1, \ldots, p_n) = \bigl(x_1^*(p), \ldots, x_n^*(p)\bigr) = x^*(p)$$
Fig. 4.15
and $x^*(p)$ is any solution to the maximisation problem on the budget set

$$B(p_1, \ldots, p_n) = \{x \in \mathbb{R}^n_+ : \langle p, x\rangle \le I\}.$$

In general $d$ need not be single-valued, and so it need not be a function.

As we have seen, $d$ is homogeneous in prices, and so we may regard $d$ as a correspondence

$$d : \Delta \to \mathbb{R}^n_+.$$

Similarly, for a producer let $s : \Delta \to \mathbb{R}^n$, where $s(p_1, \ldots, p_n) = \bigl(-x_1^*(p), \ldots, -x_m^*(p), x_{m+1}^*(p), \ldots\bigr)$, be the supply correspondence. (Here the first $m$ values are negative because these are inputs.)

Now consider a society $\{1, \ldots, i, \ldots, m\}$ and commodities named $\{1, \ldots, j, \ldots, n\}$. Let $e_i \in \mathbb{R}^n_+$ be the initial endowment vector of agent $i$, and $e = \sum_{i=1}^{m} e_i$ the total endowment of the society. Then a price vector $p^* \in \Delta$ is a market-clearing price equilibrium when

$$e + \sum_{i=1}^{m} s_i(p^*) = \sum_{i=1}^{m} d_i(p^*),$$

where $s_i(p^*) \in \mathbb{R}^n$ belongs to the set of optimal input-output vectors at price $p^*$ for agent $i$, and $d_i(p^*)$ is an optimal demand vector for consumer $i$ at price vector $p^*$.
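As an illustration, consider a two-person, two-good exchange economy (without production), and let $e_{ij}$ be the initial endowment of good $j$ to agent $i$. Let $(f_1, f_2) : \mathbb{R}^2_+ \to \mathbb{R}^2$ be the $C^1$-utility functions of the two players.

At $(p_1, p_2) \in \Delta$, for optimality we have

$$\left(\frac{\partial f_i}{\partial x_{i1}}, \frac{\partial f_i}{\partial x_{i2}}\right) = \lambda_i(p_1, p_2), \qquad \lambda_i > 0.$$

But $x_{1j} + x_{2j} = e_{1j} + e_{2j}$ for $j = 1, 2$ in market equilibrium, so each utility may be regarded as a function of agent 1's bundle $(x_{11}, x_{12})$. Thus $\dfrac{\partial f_i}{\partial x_{ij}} = -\dfrac{\partial f_i}{\partial x_{kj}}$ when $i \neq k$. Hence

$$\frac{1}{\lambda_1}\left(\frac{\partial f_1}{\partial x_{11}}, \frac{\partial f_1}{\partial x_{12}}\right) = (p_1, p_2) = -\frac{1}{\lambda_2}\left(\frac{\partial f_2}{\partial x_{11}}, \frac{\partial f_2}{\partial x_{12}}\right),$$

or

$$\left(\frac{\partial f_1}{\partial x_{11}}, \frac{\partial f_1}{\partial x_{12}}\right) + \lambda\left(\frac{\partial f_2}{\partial x_{11}}, \frac{\partial f_2}{\partial x_{12}}\right) = 0$$

for some $\lambda > 0$ (namely $\lambda = \lambda_1/\lambda_2$). See Fig. 4.16.

As we shall see in the next section, this implies that the result $(x_{11}, x_{12}, x_{21}, x_{22})$ of optimal individual behaviour at the market-clearing price equilibrium is a Pareto optimal outcome under certain conditions.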
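For a concrete instance of the two-person exchange argument, the Python sketch below assumes Cobb–Douglas utilities $f_i = a_i \log x_{i1} + (1 - a_i)\log x_{i2}$ (an assumption, not the text's general $C^1$ utilities) and illustrative endowments. It normalises $p_2 = 1$ rather than using $\Delta$ (harmless, by homogeneity), solves the market-clearing condition for good 1, and checks that, in agent 1's coordinates, $Df_1 + \lambda Df_2 = 0$ with $\lambda = \lambda_1/\lambda_2 > 0$. For these utilities $\lambda_i = 1/I_i$, so $\lambda = I_2/I_1$.

```python
def exchange_equilibrium(a1, a2, e1, e2):
    """Two-agent, two-good exchange economy with u_i = a_i log x_i1 + (1 - a_i) log x_i2.

    Prices are normalised so that p2 = 1; clearing the market for good 1 gives
    p1 = (a1 e1[1] + a2 e2[1]) / ((1 - a1) e1[0] + (1 - a2) e2[0]).
    """
    p1 = (a1 * e1[1] + a2 * e2[1]) / ((1 - a1) * e1[0] + (1 - a2) * e2[0])
    incomes = (p1 * e1[0] + e1[1], p1 * e2[0] + e2[1])
    bundles = [(a * inc / p1, (1 - a) * inc) for a, inc in zip((a1, a2), incomes)]
    return p1, incomes, bundles

a1, a2 = 0.5, 0.25
e1, e2 = (4.0, 1.0), (2.0, 6.0)
p1, (I1, I2), ((x11, x12), (x21, x22)) = exchange_equilibrium(a1, a2, e1, e2)

# Gradients with respect to agent 1's bundle (x11, x12); agent 2 holds the remainder,
# so the chain rule gives a minus sign for f2.
Df1 = (a1 / x11, (1 - a1) / x12)
Df2 = (-a2 / x21, -(1 - a2) / x22)
lam = I2 / I1   # lambda_1 / lambda_2, since lambda_i = 1 / I_i for these utilities
residual = (Df1[0] + lam * Df2[0], Df1[1] + lam * Df2[1])

print(p1, x11 + x21, x12 + x22)   # both markets clear: totals 6.0 and 7.0 up to rounding
print(residual)                   # approximately (0.0, 0.0)
```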
Fig. 4.16
4.4 The Pareto Set and Price Equilibria
4.4.1 The Welfare and Core Theorems
Consider a society $M = \{1, \ldots, m\}$ of $m$ individuals, where the preference of the $i$th individual in $M$ on a convex admissible set $Y$ in $\mathbb{R}^n$ is given by a $C^1$-function $u_i : Y \subset \mathbb{R}^n \to \mathbb{R}$. Then $u = (u_1, \ldots, u_m) : Y \subset \mathbb{R}^n \to \mathbb{R}^m$ is called a $C^1$-profile for the society.

A point $y \in Y$ is said to be Pareto preferred (for the society $M$) to $x \in Y$ iff $u_i(y) > u_i(x)$ for all $i \in M$. In this case write $y \in P_M(x)$, and call $P_M : Y \to Y$ the Pareto correspondence. The (global) Pareto set for $M$ on $Y$ is the choice $P(u_1, \ldots, u_m) = \{y \in Y : P_M(y) = \emptyset\}$. We seek to characterise this set.

In the same way as before we shall let

$$H_i(x) = \{y \in Y : du_i(x)(y - x) > 0\},$$

where $H_i : Y \to Y$ for each $i = 1, \ldots, m$. (Notice that $H_i(x)$ is open for all $x \in Y$.) Given a correspondence $P : Y \to Y$, the inverse correspondence $P^{-1} : Y \to Y$ is defined by

$$P^{-1}(x) = \{y \in Y : x \in P(y)\}.$$

In Sect. 3.3 we said that a correspondence $P : Y \to Y$ (where $Y$ is a topological space) is lower demi-continuous (LDC) iff for all $x \in Y$, $P^{-1}(x)$ is open in $Y$.

Clearly if $u_i : Y \to \mathbb{R}$ is continuous, then the preference correspondence $P_i : Y \to Y$ given by $P_i(x) = \{y : u_i(y) > u_i(x)\}$ is LDC, since

$$P_i^{-1}(y) = \{x \in Y : u_i(y) > u_i(x)\}$$

is open.
We show now that when $u_i : Y \to \mathbb{R}$ is $C^1$, $H_i : Y \to Y$ is LDC, and as a consequence, if $H_i(x) \neq \emptyset$ then $P_i(x) \neq \emptyset$.

This implies that if $x$ is a global maximum of $u_i$ on $Y$ (so $P_i(x) = \emptyset$) then $H_i(x) = \emptyset$ (so $x$ is a generalised critical point).
Lemma 4.18 If $u_i : Y \to \mathbb{R}$ is a $C^1$-function on the convex admissible set $Y$, then $H_i : Y \to Y$ is lower demi-continuous, and if $H_i(x) \neq \emptyset$ then $P_i(x) \neq \emptyset$.

Proof Suppose that $H_i(x) \neq \emptyset$. Then there exists $y \in Y$ such that $du_i(x)(y - x) > 0$. Let $h = y - x$.

By the continuity of $du_i : Y \to L(\mathbb{R}^n, \mathbb{R})$ there exist a neighbourhood $U$ of $x$ in $Y$ and a neighbourhood $V$ of $h$ in $\mathbb{R}^n$ such that $du_i(z)(h') > 0$ for all $z \in U$ and all $h' \in V$. Since $y \in H_i(x)$, we have $x \in H_i^{-1}(y)$. Let

$$U' = \{x' \in U : y - x' \in V\},$$

an open neighbourhood of $x$ in $Y$. For all $x' \in U'$, $du_i(x')(y - x') > 0$. Thus $U' \subset H_i^{-1}(y)$. Hence $H_i^{-1}(y)$ is open. This is true at each $y \in Y$, and so $H_i$ is LDC.

Suppose now that $y \in H_i(x)$ and $h = y - x$. Choose $\lambda \in (0, 1)$ small enough that the segment $[x, x + \lambda h]$ lies in $U$ (it lies in $Y$ by convexity). By Taylor's Theorem,

$$u_i(x + \lambda h) = u_i(x) + du_i(z)(\lambda h),$$

for some $z \in (x, x + \lambda h)$, where $du_i(z)(h) > 0$. Thus $u_i(x + \lambda h) > u_i(x)$, and so $P_i(x) \neq \emptyset$. □
When $u = (u_1, \ldots, u_m) : Y \to \mathbb{R}^m$ is a $C^1$-profile, define the correspondence $H_M : Y \to Y$ by $H_M(x) = \bigcap_{i \in M} H_i(x)$, i.e., $y \in H_M(x)$ iff $du_i(x)(y - x) > 0$ for all $i \in M$.

Lemma 4.19 If $(u_1, \ldots, u_m) : Y \to \mathbb{R}^m$ is a $C^1$-profile, then $H_M : Y \to Y$ is lower demi-continuous. If $H_M(x) \neq \emptyset$ then $P_M(x) \neq \emptyset$.

Proof Suppose that $H_M(x) \neq \emptyset$. Then there exists $y$ with $y \in H_i(x)$ for each $i \in M$. Thus $x \in H_i^{-1}(y)$ for all $i \in M$. But each $H_i^{-1}(y)$ is open; hence there is an open neighbourhood $U_i$ of $x$ in $H_i^{-1}(y)$; let $U = \bigcap_{i \in M} U_i$. Then $x' \in U$ implies that $x' \in H_M^{-1}(y)$. Thus $H_M$ is LDC. As in the proof of Lemma 4.18, it is then possible to choose $h \in \mathbb{R}^n$ such that, for all $i$ in $M$,

$$u_i(x + h) = u_i(x) + du_i(z_i)(h),$$

where $z_i$ belongs to $U$ and $du_i(z_i)(h) > 0$. Thus $x + h \in P_M(x)$, and so $P_M(x) \neq \emptyset$. □
The set $\{x : H_M(x) = \emptyset\}$ is called the critical Pareto set, and is often written as $\Theta_M$, or $\Theta(u_1, \ldots, u_m)$. By Lemma 4.19, $\Theta(u_1, \ldots, u_m)$ contains the Pareto set $P(u_1, \ldots, u_m)$.

Moreover, $\Theta_M$ must be closed in $Y$. To see this, suppose that $H_M(x) \neq \emptyset$, and $y \in H_M(x)$. Thus $x \in H_M^{-1}(y)$. But $H_M$ is LDC, and so there is a neighbourhood $U$ of $x$ in $Y$ such that $x' \in H_M^{-1}(y)$ for all $x' \in U$. Then $y \in H_M(x')$ for all $x' \in U$, and so $H_M(x') \neq \emptyset$ for all $x' \in U$.

Hence the set $\{x \in Y : H_M(x) \neq \emptyset\}$ is open, and so the critical Pareto set is closed.

In the same way, the Pareto correspondence $P_M : Y \to Y$ is given by $P_M(x) = \bigcap_{i \in M} P_i(x)$, where $P_i(x) = \{y : u_i(y) > u_i(x)\}$ for each $i \in M$. Since each $P_i$ is LDC, so is $P_M$, and thus the Pareto set $P(u_1, \ldots, u_m)$ must also be closed.

Suppose now that $u_1, \ldots, u_m$ are all concave $C^1$- or strictly pseudo-concave functions on the convex set $Y$. By Lemma 4.17, for each $i \in M$, $P_i(x) \subset H_i(x)$ at each $x \in Y$. If $x \in \Theta(u_1, \ldots, u_m)$ then

$$\bigcap_{i \in M} P_i(x) \subset \bigcap_{i \in M} H_i(x) = \emptyset,$$

and so $x$ must also belong to the (global) Pareto set. Thus if $u = (u_1, \ldots, u_m)$ with each $u_i$ concave $C^1$ or strictly pseudo-concave, then the global Pareto set $P(u)$ and the critical Pareto set $\Theta(u)$ coincide. In this case we may more briefly say that the preference profile represented by $u$ is strictly convex.

A point in $P(u_1, \ldots, u_m)$ is the precise analogue, in the case of a family of functions, of a maximum point for a single function, while a point in $\Theta(u_1, \ldots, u_m)$ is the analogue of a critical point of a single function $u : Y \to \mathbb{R}$. In the case of a family or profile of functions, a point $x$ belongs to the critical Pareto set $\Theta_M(u)$ when a generalised Lagrangian $L(u_1, \ldots, u_m)$ has differential $dL(x) = 0$.

This allows us to define a Hessian for the family and determine which critical Pareto points are global Pareto points.

Suppose then that $u = (u_1, \ldots, u_m) : Y \to \mathbb{R}^m$, where $Y$ is a convex admissible set in $\mathbb{R}^n$ and each $u_i : Y \to \mathbb{R}$ is a $C^1$-function.

A generalised Lagrangian $L(\lambda, u)$ for $u$ is a semipositive combination $\sum_{i=1}^{m} \lambda_i u_i$, where each $\lambda_i \ge 0$ but not all $\lambda_i = 0$. For convenience let us write

$$\mathbb{R}^m_+ = \{x \in \mathbb{R}^m : x_i \ge 0 \text{ for } i \in M\}, \qquad \mathring{\mathbb{R}}^m_+ = \{x \in \mathbb{R}^m : x_i > 0 \text{ for } i \in M\},$$

and call the vectors in $\mathbb{R}^m_+ \setminus \{0\}$ semipositive.
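Thus $\lambda \in \mathbb{R}^m_+ \setminus \{0\}$ iff each $\lambda_i \ge 0$ but not all $\lambda_i = 0$. Since each $u_i : Y \to \mathbb{R}$ is a $C^1$-function, the differential at $x$ is a linear map $du_i(x) : \mathbb{R}^n \to \mathbb{R}$. Once a coordinate basis for $\mathbb{R}^n$ is chosen, $du_i(x)$ may be represented by the row vector

$$Du_i(x) = \left(\frac{\partial u_i}{\partial x_1}\Big|_x, \ldots, \frac{\partial u_i}{\partial x_n}\Big|_x\right).$$

Similarly, the profile $u : Y \to \mathbb{R}^m$ has differential at $x$ represented by the $(m \times n)$ Jacobian matrix

$$Du(x) = \begin{pmatrix} Du_1(x) \\ \vdots \\ Du_m(x) \end{pmatrix} : \mathbb{R}^n \to \mathbb{R}^m.$$

Suppose now that $\lambda \in \mathbb{R}^m$. Then define $\lambda \cdot Du(x) : \mathbb{R}^n \to \mathbb{R}$ by

$$\bigl(\lambda \cdot Du(x)\bigr)(v) = \bigl\langle \lambda, Du(x)(v)\bigr\rangle,$$

where $\langle \lambda, Du(x)(v)\rangle$ is the scalar product of the two vectors $\lambda$ and $Du(x)(v)$ in $\mathbb{R}^m$.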
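Lemma 4.20 The gradient vectors $\{Du_i(x) : i \in M\}$ satisfy the equation

$$\sum_{i=1}^{m} \lambda_i Du_i(x) = 0$$

iff $\lambda = (\lambda_1, \ldots, \lambda_m)$ belongs to $[\operatorname{Im} Du(x)]^\perp$. If, up to scalar multiples, this is the only linear relation among the gradients, then $[\operatorname{Im} Du(x)]^\perp$ is the subspace of $\mathbb{R}^m$ spanned by $\lambda$. Here $\lambda \in [\operatorname{Im} Du(x)]^\perp$ iff $\langle \lambda, w\rangle = 0$ for all $w \in \operatorname{Im} Du(x)$.

Proof

$$\begin{aligned} \lambda \in [\operatorname{Im} Du(x)]^\perp &\iff \langle \lambda, w\rangle = 0 \quad \forall\, w \in \operatorname{Im} Du(x)\\ &\iff \bigl\langle \lambda, Du(x)(v)\bigr\rangle = 0 \quad \forall\, v \in \mathbb{R}^n\\ &\iff \bigl(\lambda \cdot Du(x)\bigr)(v) = 0 \quad \forall\, v \in \mathbb{R}^n\\ &\iff \lambda \cdot Du(x) = 0. \end{aligned}$$

But $\lambda \cdot Du(x) = 0 \iff \sum_{i=1}^{m} \lambda_i Du_i(x) = 0$, where $\lambda = (\lambda_1, \ldots, \lambda_m)$. □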
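Theorem 4.21 If $u : Y \to \mathbb{R}^m$ is a $C^1$-profile on an admissible convex set and $x$ belongs to the interior of $Y$, then $x \in \Theta(u_1, \ldots, u_m)$ iff there exists $\lambda \in \mathbb{R}^m_+ \setminus \{0\}$ such that $dL(\lambda, u)(x) = 0$.

If $x$ belongs to the boundary of $Y$ and $dL(\lambda, u)(x) = 0$ for some $\lambda \in \mathbb{R}^m_+ \setminus \{0\}$, then $x \in \Theta(u_1, \ldots, u_m)$.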
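Proof Pick a coordinate basis for $\mathbb{R}^n$. Suppose that there exists $\lambda \in \mathbb{R}^m_+ \setminus \{0\}$ such that

$$L(\lambda, u)(x) = \sum_{i=1}^{m} \lambda_i u_i(x) \in \mathbb{R}$$

satisfies $\sum_{i=1}^{m} \lambda_i Du_i(x) = 0$ (that is to say, $DL(\lambda, u)(x) = 0$). By Lemma 4.20 this implies that

$$\lambda \in [\operatorname{Im}(Du(x))]^\perp.$$

However, suppose $x \notin \Theta(u_1, \ldots, u_m)$. Then there exists $v \in \mathbb{R}^n$ such that $Du(x)(v) = w \in \mathring{\mathbb{R}}^m_+$, i.e., $\langle Du_i(x), v\rangle = w_i > 0$ for all $i \in M$, where $w = (w_1, \ldots, w_m)$. But $w \in \operatorname{Im} Du(x)$ and $w \in \mathring{\mathbb{R}}^m_+$. Moreover, $\lambda \in \mathbb{R}^m_+ \setminus \{0\}$, and so $\langle \lambda, w\rangle > 0$ (since not all $\lambda_i = 0$). This contradicts $\lambda \in [\operatorname{Im}(Du(x))]^\perp$, since $\langle \lambda, w\rangle \neq 0$. Hence $x \in \Theta(u_1, \ldots, u_m)$.

Thus we have shown that for any $x \in Y$, if $DL(\lambda, u)(x) = 0$ for some $\lambda \in \mathbb{R}^m_+ \setminus \{0\}$, then $x \in \Theta(u_1, \ldots, u_m)$. Clearly $DL(\lambda, u)(x) = 0$ iff $dL(\lambda, u)(x) = 0$, so we have proved sufficiency.

To show necessity, let $x$ be an interior point of $Y$, and suppose first that $\{Du_i(x) : i \in M\}$ are linearly independent. Then $Du(x)$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$, so there is a vector $h \in \mathbb{R}^n$ with $\langle Du_i(x), h\rangle > 0$ for all $i \in M$, and for $\theta > 0$ sufficiently small the point $y = x + \theta h$ belongs to $Y$ and to $H_M(x)$. Thus $x \notin \Theta(u_1, \ldots, u_m)$.

So suppose that $DL(\lambda, u)(x) = 0$, where $\lambda \neq 0$ but neither $\lambda$ nor $-\lambda$ belongs to $\mathbb{R}^m_+$. Then $\lambda_i < 0$ for at least one $i$ and $\lambda_k > 0$ for at least one $k$. But then there exists a vector $w = (w_1, \ldots, w_m) \in \mathring{\mathbb{R}}^m_+$, with $w_i > 0$ for each $i \in M$, such that $\langle \lambda, w\rangle = 0$. By Lemma 4.20 (with $[\operatorname{Im} Du(x)]^\perp$ spanned by $\lambda$), $w \in \operatorname{Im}(Du(x))$. Hence there exists a vector $h \in \mathbb{R}^n$ such that $Du(x)(h) = w$.

But $w \in \mathring{\mathbb{R}}^m_+$, and so $\langle Du_i(x), h\rangle > 0$ for all $i \in M$. Since $x$ belongs to the interior of $Y$, there exists a point $y = x + \alpha h$ such that $y \in H_i(x)$ for all $i \in M$. Hence $x \notin \Theta(u_1, \ldots, u_m)$.

Consequently, if $x$ is an interior point of $Y$, then $x \in \Theta(u_1, \ldots, u_m)$ implies that $dL(\lambda, u)(x) = 0$ for some semipositive $\lambda$ in $\mathbb{R}^m_+$. □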
Example 4.12 To illustrate, we compute the Pareto set in $\mathbb{R}^2$ when the utility functions are

$$u_1(x_1, x_2) = x_1^{\alpha} x_2, \quad \text{where } \alpha \in (0, 1), \qquad u_2(x_1, x_2) = 1 - x_1^2 - x_2^2.$$

We maximise $u_1$ subject to the constraint $u_2(x_1, x_2) \ge 0$. As in Example 4.9, the first order condition is

$$\bigl(\alpha x_1^{\alpha-1} x_2,\ x_1^{\alpha}\bigr) + \lambda(-2x_1, -2x_2) = 0.$$
Fig. 4.17
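Hence
$$\lambda = \frac{\alpha x_1^{\alpha-1} x_2}{2x_1} = \frac{x_1^{\alpha}}{2x_2},$$
so $\alpha x_2^2 = x_1^2$, or $x_1 = \pm\sqrt{\alpha}\, x_2$.
If $x_1 = -\sqrt{\alpha}\, x_2$ then $\lambda = \dfrac{x_1^{\alpha}\sqrt{\alpha}}{2(-x_1)} < 0$, and so such a point does not belong to the critical Pareto set. Thus $(x_1,x_2) \in \Theta(u_1,u_2)$ iff $x_1 = \sqrt{\alpha}\, x_2$. Note that if $x_1 = x_2 = 0$ then the Lagrangian may be written as
$$L(\lambda,u)(0,0) = \lambda_1 u_1(0,0) + \lambda_2 u_2(0,0),$$
where $\lambda_1 = 0$ and $\lambda_2$ is any positive number. In the positive quadrant $\mathbb{R}^2_+$, the critical Pareto set and the global Pareto set coincide.
Finally, to maximise $u_1$ on the set $\{(x_1,x_2) : u_2(x_1,x_2) \geq 0\}$ we simply choose $\lambda$ such that $u_2(x_1,x_2) = 0$. Thus $x_1^2 + x_2^2 = \alpha x_2^2 + x_2^2 = 1$, or $x_2 = 1/\sqrt{1+\alpha}$, and so
$$(x_1,x_2) = \left(\frac{\sqrt{\alpha}}{\sqrt{1+\alpha}}, \frac{1}{\sqrt{1+\alpha}}\right).$$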
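As a numerical check on Example 4.12, the following Python sketch compares the closed-form maximiser with a brute-force search. It assumes $\alpha = 0.5$ (an arbitrary illustrative value) and searches only along the arc $x_1^2 + x_2^2 = 1$ in the positive quadrant: since $u_1$ is increasing in $x_2$ whenever $x_1 > 0$, the constraint $u_2 \geq 0$ must bind at the optimum. It also evaluates the two expressions for $\lambda$ obtained from the first order condition, which should agree and be positive.

```python
import math

def u1(x1, x2, alpha):
    """Agent 1's utility in Example 4.12: x1**alpha * x2."""
    return x1 ** alpha * x2

def closed_form_optimum(alpha):
    """Maximiser of u1 on {u2 >= 0} from the text: x1 = sqrt(alpha) * x2 on the unit circle."""
    s = math.sqrt(1 + alpha)
    return math.sqrt(alpha) / s, 1 / s

def multipliers(x1, x2, alpha):
    """The two expressions for lambda given by the two first order equations."""
    from_first = alpha * x1 ** (alpha - 1) * x2 / (2 * x1)
    from_second = x1 ** alpha / (2 * x2)
    return from_first, from_second

def grid_optimum(alpha, steps=100_000):
    """Brute-force search along the arc x1^2 + x2^2 = 1 in the open positive quadrant."""
    best_val, best_pt = -math.inf, None
    for k in range(1, steps):
        t = (math.pi / 2) * k / steps
        x1, x2 = math.cos(t), math.sin(t)
        val = u1(x1, x2, alpha)
        if val > best_val:
            best_val, best_pt = val, (x1, x2)
    return best_pt

alpha = 0.5                      # illustrative value, not taken from the text
x1, x2 = closed_form_optimum(alpha)
print("closed form:", (x1, x2))
print("grid search:", grid_optimum(alpha))
print("lambda from each equation:", multipliers(x1, x2, alpha))
```

The grid search is deliberately naive; it only serves to confirm the algebra, and its accuracy is limited by the step size along the arc.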
In the next chapter we shall examine the critical Pareto set $\Theta(u_1,\dots,u_m)$ and demonstrate that the set belongs to a singularity “manifold” which can be topologically characterised to be of “dimension” $m-1$. This then allows us to examine the price equilibria of an exchange economy.
Note that in the example of a two person exchange economy studied in Sect. 4.3.2, we showed that individual optimising behaviour led to an outcome, $x$, such that
$$Du_1(x) + \lambda Du_2(x) = 0,$$
where $\lambda > 0$. As we have shown here, this “market clearing equilibrium” must belong to the critical Pareto set $\Theta(u_1,u_2)$. Moreover, when both $u_1$ and $u_2$ represent strictly convex preferences, this outcome belongs to the global Pareto set. We develop this in the following theorem.
The Welfare Theorem for an Exchange Economy Consider an exchange economy where each individual in the society $M = \{1,\dots,i,\dots,m\}$ has initial endowment $e_i \in \mathbb{R}^n_+$.
1. Suppose that the demand $x_i^*(p) \in \mathbb{R}^n_+$ of agent $i$ at each price $p \in \Delta$ is such that $x_i^*(p)$ maximises the $C^1$-utility function $u_i : \mathbb{R}^n \to \mathbb{R}$ on the budget set $B_i(p) = \{x_i \in \mathbb{R}^n_+ : \langle p, x_i\rangle \leq \langle p, e_i\rangle\}$ and satisfies
(a) $Du_i(x_i^*(p)) = \lambda_i p$, $\lambda_i > 0$,
(b) $\langle p, x_i^*(p)\rangle = \langle p, e_i\rangle$.
2. Suppose further that $p^*$ is a market clearing price equilibrium in the sense that $\sum_{i=1}^m x_i^*(p^*) = \sum_{i=1}^m e_i \in \mathbb{R}^n$.
Then $x^* = (x_1^*(p^*),\dots,x_m^*(p^*)) \in \Theta(u_1,\dots,u_m)$.
Moreover, if each $u_i$ is either concave or strictly pseudo-concave, then $x^*$ belongs to the Pareto set $P(u_1,\dots,u_m)$.
Proof We need to define the set of outcomes first of all. An outcome, $x$, is a vector
$$x = (x_1,\dots,x_i,\dots,x_m) \in (\mathbb{R}^n_+)^m = \mathbb{R}^{nm}_+,$$
where $x_i = (x_{i1},\dots,x_{in}) \in \mathbb{R}^n_+$ is an allocation for agent $i$. However there are $n$ resource constraints
$$\sum_{i=1}^m x_{ij} = \sum_{i=1}^m e_{ij} = e_{\cdot j}, \quad j = 1,\dots,n,$$
where $e_i = (e_{i1},\dots,e_{in}) \in \mathbb{R}^n_+$ is the initial endowment of agent $i$. Thus the set, $Y$, of feasible allocations to the members of $M$ is a hyperplane of dimension $n(m-1)$ in $\mathbb{R}^{nm}_+$ through the point $(e_1,\dots,e_m)$. As coordinates for any point $x \in Y$ we may choose
$$x = (x_{11},\dots,x_{1n}, x_{21},\dots,x_{2n},\dots,x_{(m-1)1},\dots,x_{(m-1)n}),$$
where it is implicit that the bundle of commodities available to agent $m$ is
$$x_m = (x_{m1},\dots,x_{mn}), \quad \text{where } x_{mj} = e_{\cdot j} - \sum_{i=1}^{m-1} x_{ij}.$$
Now define $u_i^* : Y \to \mathbb{R}$, the extended utility function of $i$ on $Y$, by
$$u_i^*(x) = u_i(x_{i1},\dots,x_{in}).$$
For $i = 1,\dots,m-1$, it is clear that the direction gradient of $i$ on $Y$ is
$$Du_i^*(x) = \left(0,\dots,0,\frac{\partial u_i}{\partial x_{i1}},\dots,\frac{\partial u_i}{\partial x_{in}},0,\dots,0\right) = (0,\dots,0,Du_i(x),0,\dots,0).$$
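For agent $m$, $\dfrac{\partial u_m^*}{\partial x_{ij}} = -\dfrac{\partial u_m}{\partial x_{mj}}$ for $i = 1,\dots,m-1$; thus
$$Du_m^*(x) = -\left(\frac{\partial u_m}{\partial x_{m1}},\dots,\frac{\partial u_m}{\partial x_{mn}},\dots\right) = -(Du_m(x),\dots,Du_m(x)),$$
with $Du_m(x)$ repeated $m-1$ times.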
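If $p^*$ is a market-clearing price equilibrium, then by definition
$$\sum_{i=1}^m x_i^*(p^*) = \sum_{i=1}^m e_i.$$
Thus $x^*(p^*) = (x_1^*(p^*),\dots,x_{m-1}^*(p^*))$ belongs to $Y$. But each $x_i^*$ is a critical point of $u_i : \mathbb{R}^n_+ \to \mathbb{R}$ on the budget set $B_i(p^*)$ and $Du_i(x_i^*(p^*)) = \lambda_i p^*$. Thus the Jacobian for $u^* = (u_1^*,\dots,u_m^*) : Y \to \mathbb{R}^m$ at $x^*(p^*)$ is
$$Du^*(x^*) = \begin{bmatrix}
\lambda_1 p^* & 0 & \cdots & 0 \\
0 & \lambda_2 p^* & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & \lambda_{m-1} p^* \\
-\lambda_m p^* & -\lambda_m p^* & \cdots & -\lambda_m p^*
\end{bmatrix}$$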
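Hence, since each block of the following sum is $p^* - p^* = 0$,
$$\frac{1}{\lambda_1} Du_1^*(x^*) + \frac{1}{\lambda_2} Du_2^*(x^*) + \cdots + \frac{1}{\lambda_m} Du_m^*(x^*) = 0.$$
But $\lambda_i > 0$ for each $i = 1,\dots,m$. Then $dL(\mu,u^*)(x^*(p^*)) = 0$, where $L(\mu,u^*)(x) = \sum_{i=1}^m \mu_i u_i^*(x)$, $\mu_i = 1/\lambda_i$ and $\mu \in \mathbb{R}^m_+$. By Theorem 4.21, $x^*(p^*)$ belongs to the critical Pareto set.
Clearly, if for each $i$, $u_i : \mathbb{R}^n_+ \to \mathbb{R}$ is concave and $C^1$, or strictly pseudo-concave, then $u_i^* : Y \to \mathbb{R}$ will be also. By previous results, the critical and global Pareto sets will coincide, and so $x^*(p^*)$ will be Pareto optimal. □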
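A small numerical illustration of the Welfare Theorem follows, under assumptions that are not in the text: three agents with log-linear (Cobb-Douglas) utilities $u_i = \beta_i\log x_{i1} + (1-\beta_i)\log x_{i2}$, hypothetical endowments and parameters $\beta_i$, and prices normalised so that $p_1 = 1$. The sketch computes the competitive allocation, recovers $\lambda_i$ from $Du_i(x_i^*) = \lambda_i p^*$, builds the gradients $Du_i^*$ on $Y$ in the coordinates used in the proof, and checks that $\sum_i \mu_i Du_i^*(x^*) = 0$ with $\mu_i = 1/\lambda_i$.

```python
def equilibrium_p2(endow, beta):
    """Relative price p2 (with p1 = 1) that clears the market for good 1."""
    num = sum((1 - b) * e[0] for e, b in zip(endow, beta))
    den = sum(b * e[1] for e, b in zip(endow, beta))
    return num / den

def competitive_allocation(endow, beta):
    """Cobb-Douglas demands x_i = (b*I/p1, (1-b)*I/p2) at the clearing price."""
    p2 = equilibrium_p2(endow, beta)
    bundles = []
    for e, b in zip(endow, beta):
        income = e[0] + p2 * e[1]              # I_i = <p, e_i> with p1 = 1
        bundles.append((b * income, (1 - b) * income / p2))
    return p2, bundles

def extended_gradients(bundles, beta):
    """Du*_i on Y in coordinates (x_11, x_12, ..., x_(m-1)1, x_(m-1)2)."""
    m = len(bundles)
    grads = []
    for i, (x, b) in enumerate(zip(bundles, beta)):
        du = (b / x[0], (1 - b) / x[1])        # Du_i evaluated at x_i
        g = [0.0] * (2 * (m - 1))
        if i < m - 1:                          # agent i affects only its own block
            g[2 * i], g[2 * i + 1] = du
        else:                                  # agent m holds the residual bundle
            for k in range(m - 1):
                g[2 * k], g[2 * k + 1] = -du[0], -du[1]
        grads.append(g)
    return grads

endow = [(4.0, 1.0), (1.0, 3.0), (2.0, 2.0)]   # hypothetical endowments e_i
beta = [0.3, 0.6, 0.5]                         # hypothetical preference parameters
p2, bundles = competitive_allocation(endow, beta)
lambdas = [b / x[0] for x, b in zip(bundles, beta)]   # first component of Du_i divided by p1 = 1
mu = [1 / lam for lam in lambdas]
grads = extended_gradients(bundles, beta)
weighted = [sum(mu_i * g[k] for mu_i, g in zip(mu, grads)) for k in range(len(grads[0]))]
print("p2* =", p2)
print("sum_i mu_i Du*_i =", weighted)          # zero up to rounding
```

With these utilities $\lambda_i = 1/I_i$, so the weights $\mu_i$ are simply the incomes $I_i$; Example 4.13 exploits the same fact.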
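One can also show in a very simple way that the competitive allocation, $x^*(p^*) \in \mathbb{R}^{nm}_+$, constructed in this theorem is Pareto optimal. By definition $x^*(p^*)$ is characterised by the two properties:
1. $\sum_{i=1}^m x_i^*(p^*) = \sum_{i=1}^m e_i$ in $\mathbb{R}^n$ (feasibility);
2. if $u_i(x_i) > u_i(x_i^*(p^*))$ then $\langle p^*, x_i\rangle > \langle p^*, e_i\rangle$ (by the optimality condition for agent $i$).
But if $x^*(p^*)$ is not Pareto optimal, then there exists a feasible vector $x = (x_1,\dots,x_m) \in \mathbb{R}^{nm}_+$ such that $u_i(x_i) > u_i(x_i^*(p^*))$ for $i = 1,\dots,m$. By (2), $\langle p^*, x_i\rangle > \langle p^*, e_i\rangle$ for each $i$, and so
$$\sum_{i=1}^m \langle p^*, x_i\rangle = \left\langle p^*, \sum_{i=1}^m x_i\right\rangle > \left\langle p^*, \sum_{i=1}^m e_i\right\rangle.$$
But since $x$ is feasible, $\sum_{i=1}^m x_i \leq \sum_{i=1}^m e_i$, which implies $\langle p^*, \sum_{i=1}^m x_i\rangle \leq \langle p^*, \sum_{i=1}^m e_i\rangle$ (as $p^* \in \Delta$ has non-negative components).
By contradiction, $x^*(p^*)$ must belong to the Pareto optimal set.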
This observation extends immediately to a result on the existence of a core of an economy.
Definition 4.3 Let $e = (e_1,\dots,e_m) \in \mathbb{R}^{nm}_+$ be an initial endowment vector for a society $M$. Let $D$ be any family of subsets of $M$, and let $P = (P_1,\dots,P_m)$ be a profile of preferences for society $M$, where each $P_i : \mathbb{R}^n_+ \to \mathbb{R}^n_+$ is a preference correspondence for $i$ on the $i$th consumption space $X_i \subset \mathbb{R}^n_+$.
An allocation $x \in \mathbb{R}^{nm}_+$ is $S$-feasible (for $S \subset M$) iff $x = (x_{ij}) \in \mathbb{R}^{nm}_+$ and $\sum_{i\in S} x_{ij} = \sum_{i\in S} e_{ij}$ for each $j = 1,\dots,n$.
Given $e$ and $P$, an allocation $x \in \mathbb{R}^{nm}_+$ belongs to the $D$-core of $(e,P)$ iff $x = (x_1,\dots,x_m)$ is $M$-feasible and there exist no coalition $S \in D$ and allocation $y \in \mathbb{R}^{nm}_+$ such that $y$ is $S$-feasible and of the form $y = (y_1,\dots,y_m)$ with $y_i \in P_i(x_i)$ for all $i \in S$.
To clarify this definition somewhat, consider the set $Y$ from the proof of the welfare theorem. $Y = Y_M$ is a hyperplane of dimension $n(m-1)$ through the endowment point $e \in \mathbb{R}^{nm}_+$. For any coalition $S \in D$ of cardinality $s$, there is a hyperplane $Y_S$, say, of dimension $n(s-1)$ through the endowment point $e$, consisting of $S$-feasible trades among the members of $S$. Clearly $Y_S \subset Y_M$. If $x \in Y_M$ but there is some $y \in Y_S$ such that every member of $S$ prefers $y$ to $x$, then the members of $S$ could refuse to accept the allocation $x$. If there is no such coalition $S$ and point $y$, then $x$ is “unbeaten”, and belongs to the $D$-core of the economy described by $(e,P)$.
Core Theorem Let $D$ be any family of subsets of $M$. Suppose that $p^* \in \Delta$ is a market clearing price equilibrium for the economy $(e,P)$ and $x^*(p^*) \in \mathbb{R}^{nm}_+$ is the demand vector, where
$$x^*(p^*) = (x_1^*(p^*),\dots,x_m^*(p^*)) \in Y_M \quad\text{and}\quad P_i(x_i^*(p^*)) \cap B_i(p^*) = \emptyset \text{ for each } i \in M.$$
Then $x^*(p^*)$ belongs to the $D$-core for the economy $(e,P)$.
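Proof Suppose that $x^*(p^*)$ is not in the core. Then there is some coalition $S \in D$ and some $y \in Y_S$ such that $y = (y_i : i \in S)$ and $y_i \in P_i(x_i^*(p^*))$ for each $i \in S$. Now $x_i^*(p^*)$ is a most preferred point for $i$ on $B_i(p^*)$, so $\langle p^*, y_i\rangle > \langle p^*, e_i\rangle$ for all $i \in S$. Hence
$$\left\langle p^*, \sum_{i\in S} y_i\right\rangle = \sum_{i\in S}\langle p^*, y_i\rangle > \sum_{i\in S}\langle p^*, e_i\rangle.$$
However, if $y \in Y_S$, then $\sum_{i\in S} y_i = \sum_{i\in S} e_i \in \mathbb{R}^n_+$, which implies $\langle p^*, \sum_{i\in S}(y_i - e_i)\rangle = 0$. By contradiction, $x^*(p^*)$ must be in the core. □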
The Core Theorem shows that if a market clearing price equilibrium $p^*$ does exist, then the core is non-empty, even if a price mechanism is not used. This means, in essence, that the competitive allocation $x^*(p^*)$ is Pareto optimal for every coalition $S \in D$.
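The Core Theorem can be illustrated numerically in the simplest case, where $D$ consists of all subsets of $M = \{1,2\}$. The sketch below assumes two agents with log-linear utilities and hypothetical endowments, computes the competitive allocation, and then searches a grid over the Edgeworth box for an allocation that blocks it: a singleton coalition can only fall back on its own endowment, while the grand coalition may use any feasible split. A grid search can only fail to find a blocking allocation, so this is an illustration rather than a proof.

```python
import math

BETA = (0.3, 0.7)                   # hypothetical preference parameters
ENDOW = ((3.0, 1.0), (1.0, 2.0))    # hypothetical endowments e_1, e_2

def u(i, x):
    """Log-linear utility of agent i (agents indexed from 0)."""
    b = BETA[i]
    return b * math.log(x[0]) + (1 - b) * math.log(x[1])

def competitive_allocation():
    """Demands at the market clearing price p = (1, p2)."""
    p2 = (sum((1 - b) * e[0] for b, e in zip(BETA, ENDOW))
          / sum(b * e[1] for b, e in zip(BETA, ENDOW)))
    alloc = []
    for b, e in zip(BETA, ENDOW):
        income = e[0] + p2 * e[1]
        alloc.append((b * income, (1 - b) * income / p2))
    return alloc

def blocking_allocations(x_star, steps=400, tol=1e-12):
    """Grid search for coalitions S and S-feasible trades that every member of S strictly prefers."""
    found = []
    for i in range(2):                  # S = {i}: the only S-feasible bundle is e_i
        if u(i, ENDOW[i]) > u(i, x_star[i]) + tol:
            found.append(({i}, ENDOW[i]))
    total = (ENDOW[0][0] + ENDOW[1][0], ENDOW[0][1] + ENDOW[1][1])
    for a in range(1, steps):           # S = {1, 2}: interior grid points of the Edgeworth box
        for c in range(1, steps):
            y1 = (total[0] * a / steps, total[1] * c / steps)
            y2 = (total[0] - y1[0], total[1] - y1[1])
            if u(0, y1) > u(0, x_star[0]) + tol and u(1, y2) > u(1, x_star[1]) + tol:
                found.append(({0, 1}, (y1, y2)))
    return found

x_star = competitive_allocation()
print("competitive allocation:", x_star)
print("blocking allocations found:", blocking_allocations(x_star))  # an empty list is expected
```

The small tolerance guards against floating-point ties at grid points that happen to lie very close to $x^*(p^*)$.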
By the results of Sect. 3.8, a market clearing price equilibrium $p^*$ will exist under certain conditions on preference. In particular, suppose preference is representable by smooth utility functions that are concave or strictly pseudo-concave and monotonic in the individual consumption spaces. Then the conditions of the Welfare Theorem will be satisfied, and there will exist a market clearing price equilibrium, $p^*$, and a competitive allocation, $x^*(p^*)$, at $p^*$ which belongs to the Pareto set for the society $M$. Indeed, since the Core Theorem is valid when $D$ consists of all subsets of $M$, the two results imply that $x^*(p^*)$ will then belong to the critical Pareto set $\Theta_S$ associated with each coalition $S \subset M$. This in turn suggests that for any $S$, $x^* = x^*(p^*)$ is a solution to the Lagrangian problem $dL_S(\mu,u^*)(x^*) = 0$, where $L_S(\mu,u^*)(x) = \sum_{i\in S}\mu_i u_i^*(x)$ and $\mu_i \geq 0$ for all $i \in S$.
Here $u_i^* : Y_S \to \mathbb{R}$ is the extended utility function for $i$ on $Y_S$.
It is also possible to use the concept of a core in the more general context considered in Sect. 3.8, where preferences are defined on the full space $X = \prod_i X_i \subset \mathbb{R}^{nm}_+$. In this case, however, a price equilibrium may not exist if the induced social preference violates convexity or continuity. It is then possible for the $D$-core to be empty.
Note in particular that the model outlined in this section implicitly assumes that each economic agent chooses their demand so as to optimize a utility function on the budget set determined by the price vector. Thus prices are treated as exogenous variables. However, if agents treat prices as strategic variables then it may be rational for them to compute their effect on prices, and thus misrepresent their preferences. The economic game then becomes much more complicated than the one analyzed here.
A second consideration is whether the price equilibria are unique, or even locally unique. If there exists a continuum of pure equilibria, then prices may move around chaotically.
A third consideration concerns the attainment of the price equilibrium. In Sect. 3.8 we constructed an abstract preference correspondence for an “auctioneer” so as to adjust the price vector to increase the value of the excess supply of the commodities. We deal with these considerations in the next section and in Chap. 5.
4.4.2 Equilibria in an Exchange Economy
The Welfare Theorem gives an important insight into the nature of competitive allocations. The coefficients $\mu_i$ of the Lagrangian $L(\mu,u)$ of the social optimisation problem turn out to be inverse to the coefficients $\lambda_i$ in the individual optimisation problems, where $\lambda_i$ is equal to the marginal utility of income for the $i$th agent. This in turn suggests that it is possible for an agent to transform his utility function from $u_i$ to $u_i'$ in such a way as to decrease $\lambda_i$ and thus increase $\mu_i$, the “weight” of the $i$th agent in the social optimisation problem. This is called the problem of preference manipulation and is an interesting research problem with applications in trade theory.
Secondly, the weights $\mu_i$ can be regarded as functionally dependent on the initial endowment vector $(e_1,\dots,e_m) \in \mathbb{R}^{nm}_+$. Thus the question of market equilibrium could be examined in terms of the functions $\mu_i : \mathbb{R}^{nm}_+ \to \mathbb{R}$, $i = 1,\dots,m$.
It is possible that one or a number of agents could destroy or exchange commodities so as to increase their weights. This is termed the problem of resource manipulation or the transfer paradox (see Gale 1974, and Balasko 1978).
Example 4.13 To illustrate these observations consider a two person ($i = 1,2$) exchange economy with two commodities ($j = 1,2$).
As in Example 4.10, assume the preference of the $i$th agent is given by a utility function $f_i : \mathbb{R}^2_+ \to \mathbb{R}$, $f_i(x,y) = \beta_i \log x + (1-\beta_i)\log y$, where $0 < \beta_i < 1$.
Let the initial endowment vector of $i$ be $e_i = (e_{i1}, e_{i2})$. At the price vector $p = (p_1,p_2)$, demand by agent $i$ is
$$d_i(p_1,p_2) = \left(\frac{I_i\beta_i}{p_1}, \frac{I_i(1-\beta_i)}{p_2}\right),$$
where $I_i = p_1 e_{i1} + p_2 e_{i2}$ is the value at $p$ of the endowment. Thus agent $i$ “desires” to change his initial endowment from $e_i$ to $e_i'$:
$$(e_{i1}, e_{i2}) \to (e_{i1}', e_{i2}') = \left(\beta_i e_{i1} + \beta_i e_{i2}\frac{p_2}{p_1},\; (1-\beta_i)e_{i2} + (1-\beta_i)e_{i1}\frac{p_1}{p_2}\right).$$
Another way of interpreting this is that $i$ optimally divides expenditure between the first and second commodities in the ratio $\beta_i : (1-\beta_i)$. Thus agent $i$ offers to sell $(1-\beta_i)e_{i1}$ units of commodity 1 for $(1-\beta_i)e_{i1}p_1$ monetary units and buy $(1-\beta_i)e_{i1}\frac{p_1}{p_2}$ units of the second commodity, and offers to sell $\beta_i e_{i2}$ units of the second commodity and buy $\beta_i e_{i2}\frac{p_2}{p_1}$ units of the first commodity.
At the price vector $(p_1,p_2)$ the amount of the first commodity on offer is $(1-\beta_1)e_{11} + (1-\beta_2)e_{21}$ and the amount on request is $\beta_1 e_{12}p_2^* + \beta_2 e_{22}p_2^*$, where $p_2^*$ is the ratio $p_2 : p_1$ of relative prices. For $(p_1,p_2)$ to be a market-clearing price equilibrium we require
$$e_{11}(1-\beta_1) + e_{21}(1-\beta_2) = p_2^*(e_{12}\beta_1 + e_{22}\beta_2).$$
Clearly, if all endowments are increased by a multiple $\alpha > 0$, then the equilibrium relative price vector is unchanged. Moreover, $p_2^*$ is uniquely determined by this linear equation, and so the final allocations $(e_{11}', e_{12}')$, $(e_{21}', e_{22}')$ can be determined.
As we showed in Example 4.10, the coefficients $\lambda_i$ for the individual optimisation problems satisfy $\lambda_i = 1/I_i$, where $\lambda_i$ is the marginal utility of income for agent $i$.
By the previous analysis, the weights $\mu_i$ in the social optimisation problem satisfy $\mu_i = I_i$. After some manipulation of the price equilibrium equation we find
$$\frac{\mu_i}{\mu_k} = \frac{e_{i1}(e_{i2} + \beta_k e_{k2}) + e_{i2}(1-\beta_k)e_{k1}}{e_{k1}(e_{k2} + \beta_i e_{i2}) + e_{k2}(1-\beta_i)e_{i1}}.$$
Clearly, if agent $i$ can increase the ratio $\mu_i : \mu_k$, then the relative utility of $i$ vis-à-vis $k$ is increased. However, since the relative price equilibrium is uniquely determined in this example, it is not possible for agent $i$, say, to destroy some of the initial endowments $(e_{i1}, e_{i2})$ so as to bring about an advantageous final outcome. The interested reader is referred to Balasko (1978).
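The closed-form weight ratio of Example 4.13 can be checked numerically. In this sketch, prices are normalised so that $p_1 = 1$ (so $p_2^*$ is the relative price), the endowments and $\beta_i$ are hypothetical values, and agents are indexed from 0. It computes $p_2^*$ from the market-clearing equation and the weights $\mu_i = I_i$, compares $\mu_i/\mu_k$ with the formula above, and confirms that scaling all endowments leaves $p_2^*$ unchanged.

```python
def p2_star(e, beta):
    """Relative price p2/p1 solving e11(1-b1) + e21(1-b2) = p2*(e12 b1 + e22 b2)."""
    (e11, e12), (e21, e22) = e
    b1, b2 = beta
    return (e11 * (1 - b1) + e21 * (1 - b2)) / (e12 * b1 + e22 * b2)

def weights(e, beta):
    """mu_i = I_i, the value of agent i's endowment at prices (1, p2*)."""
    p2 = p2_star(e, beta)
    return [ei1 + p2 * ei2 for ei1, ei2 in e]

def ratio_formula(e, beta, i, k):
    """Closed-form mu_i / mu_k as stated in Example 4.13."""
    (ei1, ei2), (ek1, ek2) = e[i], e[k]
    bi, bk = beta[i], beta[k]
    num = ei1 * (ei2 + bk * ek2) + ei2 * (1 - bk) * ek1
    den = ek1 * (ek2 + bi * ei2) + ek2 * (1 - bi) * ei1
    return num / den

e = [(3.0, 1.0), (1.0, 2.0)]       # hypothetical endowments (e_11, e_12), (e_21, e_22)
beta = [0.3, 0.7]                  # hypothetical beta_1, beta_2
mu = weights(e, beta)
print(p2_star(e, beta))                              # 2.4 / 1.7
print(mu[0] / mu[1], ratio_formula(e, beta, 0, 1))   # both equal 7.5 / 6.5
scaled = [(2.0 * a, 2.0 * b) for a, b in e]
print(p2_star(scaled, beta))                         # unchanged by scaling
```

Because the ratio is homogeneous of degree zero in the endowments, the normalisation $p_1 = 1$ does not affect it.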
Fig. 4.18
In this example the (relative) price equilibrium is unique, but this need not always occur. Consider the two person, two commodity case illustrated in Fig. 4.18. As in the Welfare Theorem, the set of feasible outcomes $Y$ is the set of points $(x_{11},x_{12},x_{21},x_{22}) \in \mathbb{R}^4_+$ such that $x_{11} + x_{21} = e_{\cdot 1}$ and $x_{12} + x_{22} = e_{\cdot 2}$, and this is a two-dimensional hyperplane through the point $(e_{11},e_{12},e_{21},e_{22})$. Thus $Y$ can be represented in the usual two-dimensional Edgeworth box, where point $A$, the most preferred point for agent 1, satisfies $(x_{11},x_{12}) = (e_{\cdot 1},e_{\cdot 2})$.
The price ray $\tilde{p}$ is that ray through $(e_{11},e_{12})$ where $\tan\alpha = p_1/p_2$. Clearly $(p_1,p_2)$ is an equilibrium price vector if the price ray intersects the critical Pareto set $\Theta(f_1,f_2)$ at a point $(x_{11},x_{12})$ in $Y$ where $\tilde{p}$ is tangential to the indifference curves for $f_1$ and $f_2$ through $(x_{11},x_{12})$. At such a point we then have $Df_1(x_{11},x_{12}) + \mu Df_2(x_{11},x_{12}) = 0$.
As Fig. 4.18 indicates, there may well be a second price ray $\tilde{p}'$ which satisfies the tangency property. Indeed it is possible that there exists a family of such rays, or even an open set $V$ in the price simplex such that each $p$ in $V$ is a market clearing equilibrium. We now explore the question of local uniqueness of price equilibria.
To be more formal, let $X = \mathbb{R}^n_+$ be the commodity or consumption space. An initial endowment is a vector $e = (e_1,\dots,e_m) \in X^m$. A $C^r$ utility function is a $C^r$-function $u = (u_1,\dots,u_m) : X \to \mathbb{R}^m$.
Let $C^r(X,\mathbb{R}^m)$ be the set of $C^r$-profiles, and endow $C^r(X,\mathbb{R}^m)$ with a topology in the following way. (See the next chapter for a more complete discussion of the Whitney topology.)
A neighbourhood of $f \in C^r(X,\mathbb{R}^m)$ is a set
$$\left\{g \in C^r(X,\mathbb{R}^m) : \|d^k g(x) - d^k f(x)\| < \varepsilon(x) \text{ for } k = 0,\dots,r \text{ and all } x \in X\right\},$$
where $\varepsilon(x) > 0$ for all $x \in X$. (We use the notation that $d^0 g = g$ and $d^1 g = dg$.)
Henceforth $C^r(X,\mathbb{R}^m)$ denotes this space endowed with this topology. A property $K$ is called generic iff it is true for all profiles which belong to a residual set in $C^r(X,\mathbb{R}^m)$. Here residual means that the set is the countable intersection of open dense sets in $C^r(X,\mathbb{R}^m)$.
If a property is generic then we may say that almost all profiles in $C^r(X,\mathbb{R}^m)$ have that property.
A smooth exchange economy is a pair $(e,u) \in X^m \times C^r(X,\mathbb{R}^m)$. As before, the feasible outcome set is
$$Y = \left\{(x_1,\dots,x_m) \in X^m : \sum_{i=1}^m x_i = \sum_{i=1}^m e_i\right\},$$
and a price vector $p$ belongs to the simplex $\Delta = \{p \in X : \|p\| = 1\}$, where $\Delta$ is an object of dimension $n-1$.
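As in the welfare theorem, the demand $x_i^*(p)$ of agent $i$ at $p \in \Delta$ maximises $u_i$ on
$$B_i(p) = \{x_i \in X : \langle p, x_i\rangle \leq \langle p, e_i\rangle\}.$$
As we have observed, under appropriate boundary conditions, we may assume that for all $i \in M$, $x_i^*(p)$ satisfies
1. $\langle p, x_i^*(p)\rangle = \langle p, e_i\rangle$;
2. $D^*u_i(x_i^*(p)) = p \in \Delta$, where $D^*u_i$ denotes the normalised gradient of $u_i$.
Say $(x^*(p^*),p^*) = (x_1^*(p^*),\dots,x_m^*(p^*),p^*) \in X^m \times \Delta$ is a market or Walrasian equilibrium iff $x^*(p^*)$ is the competitive allocation at $p^*$ and satisfies
$$\sum_{i=1}^m x_i^*(p^*) = \sum_{i=1}^m e_i \in \mathbb{R}^n_+.$$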
The economy $(e,u)$ is regular iff the set of Walrasian equilibria of $(e,u)$ is finite.
Debreu-Smale Theorem on Generic Existence of Regular Economies There is a residual set $U$ in $C^r(X,\mathbb{R}^m)$ such that for every profile $u \in U$, there is a dense set $V \subset X^m$ with the property that $(e,u)$ is a regular economy whenever $(e,u) \in V \times U$.
The proof of this theorem is discussed in the next chapter (the interested readermight also consult Smale 1976). However we can give a flavor of the proof here.
Consider a point $(e,x,p) \in X^m \times X^m \times \Delta$. This space is of dimension $2nm + (n-1)$.
Now there are $n$ resource restrictions
$$\sum_{i=1}^m x_{ij} = \sum_{i=1}^m e_{ij}, \quad j = 1,\dots,n,$$
together with $(m-1)$ budget restrictions $\langle p, x_i\rangle = \langle p, e_i\rangle$ for $i = 1,\dots,m-1$.
Fig. 4.19
Note that the budget restriction for the $m$th agent is redundant, since it follows from the resource restrictions and the other $m-1$ budget restrictions. Let $\Gamma$ be the set of points $(e,x,p) \in X^m \times X^m \times \Delta$ satisfying these various restrictions. Then $\Gamma$ will be of dimension $2nm + (n-1) - [n + m - 1] = 2nm - m$.
However, we also have $m$ distinct vector equations $D^*u_i(x_i) = p$, $i = 1,\dots,m$. Since these vectors are normalised, each one consists of $(n-1)$ separate equations. Chapter 5 shows that singularity theory implies that for every profile $u$ in a residual set, each of these constraints is independent. Together these $m(n-1)$ constraints reduce the dimension of $\Gamma$ by $m(n-1)$. Hence the set of points in $\Gamma$ satisfying the first order optimality conditions is a smooth object $Z_u$ of dimension
$$2nm - m - m(n-1) = nm.$$
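The dimension count above is simple bookkeeping. The following sketch reproduces it for a few values of $n$ and $m$, assuming, as the text does for generic profiles, that all the constraints are independent.

```python
def dimension_count(n, m):
    """Dimensions of Gamma and Z_u for n commodities and m agents."""
    ambient = 2 * n * m + (n - 1)        # (e, x, p) in X^m x X^m x Delta
    resource = n                         # sum_i x_ij = sum_i e_ij, j = 1..n
    budget = m - 1                       # agent m's budget restriction is redundant
    gamma = ambient - (resource + budget)
    first_order = m * (n - 1)            # D*u_i(x_i) = p, each normalised: n - 1 equations
    z_u = gamma - first_order
    return gamma, z_u

for n, m in [(2, 2), (3, 5), (10, 4)]:
    gamma, z_u = dimension_count(n, m)
    print(f"n={n}, m={m}: dim Gamma = {gamma} (2nm - m = {2 * n * m - m}), "
          f"dim Z_u = {z_u} (nm = {n * m})")
```

The point of the count is that $Z_u$ and the endowment space $X^m$ have the same dimension $nm$, which is what makes the projection argument that follows possible.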
Now consider the projection
$$\mathrm{proj} : Z_u \subset X^m \times (X^m \times \Delta) \to X^m : (e,x,p) \mapsto e,$$
and note that both $Z_u$ and $X^m$ have dimension $nm$. A regular economy $(e,u)$ is one such that the projection map $\mathrm{proj} : Z_u \to X^m$ has differential with maximal rank $nm$ at every point of $\mathrm{proj}^{-1}(e)$. Call $e$ a regular value in this case. From singularity theory it is known that for all $u$ in a residual set $U$, the set of regular values of the projection map is dense in $X^m$. Thus when $u \in U$ and $e$ is regular, the set of Walrasian equilibria for $(e,u)$ will be finite. Figure 4.19 illustrates this. At $e_1$ there is only one Walrasian equilibrium, while at $e_3$ there are three. Moreover, in a neighbourhood of $e_3$ the Walrasian equilibria move continuously with $e$. At $e_4$ the Walrasian equilibrium set is one-dimensional. As $e$ moves from the right past $e_2$, the number of Walrasian equilibria drops suddenly from 3 to 1, and displays a discontinuity. Note that the points $(x,p)$ satisfying $(e,x,p) \in \mathrm{proj}^{-1}(e)$ need not be Walrasian equilibria in the classical sense, since we have considered only the first order conditions. It is clearly the case that if there is non-convex preference, then the first order conditions are not sufficient for equilibrium. However, Smale's theorem shows the existence of extended Walrasian equilibria. The same difficulty occurs in the proof that a Walrasian equilibrium gives a Pareto optimal point in $Y$.
Fig. 4.20
Let $\mathring{\Theta}(u) = \mathring{\Theta}(u_1,\dots,u_m)$ be the set of points satisfying the first order condition $dL(\lambda,u) = 0$ where $\lambda \in \mathbb{R}^m_+$. Suppose that we solve this with $\lambda_1 \neq 0$. Then we may write
$$Du_1(x) + \sum_{i=2}^m \frac{\lambda_i}{\lambda_1} Du_i(x) = 0.$$
Clearly there are $(m-1)$ degrees of freedom in this solution, and indeed $\mathring{\Theta}(u_1,\dots,u_m)$ can be shown to be a geometric object of dimension $(m-1)$ “almost always” (see Chap. 5). However $\mathring{\Theta}(u)$ will contain points that are the “social” equivalents of the minima of a real-valued function.
Note, by Lemma ??, that $\mathring{\Theta}(u)$ and the critical Pareto set $\Theta(u)$ coincide, except for boundary points. If the boundary of the space is smooth, then it is possible to define a Lagrangian which characterises the boundary points in $\Theta(u)$.
For example, consider Fig. 4.20, which depicts a two agent, two commodity exchange economy.
Agent 1 has non-convex preference, and the critical Pareto set consists of three components $ABC$, $ADC$ and $EFG$.
On $ADC$, although the utilities satisfy the first order condition, there exist nearby points that both agents prefer. For example, both agents prefer a nearby point $y$ to $x$. See Fig. 4.21.
In Fig. 4.22, from an initial endowment such as $e = (e_{11},e_{12})$, there exist three extended Walrasian equilibria, but at least one can be excluded. Note that if $e$ is the initial endowment vector, then the Walrasian equilibrium $B$, which is accessible by exchange along the price vector, may be Pareto inferior to a Walrasian equilibrium, $F$, which is not readily accessible.
Fig. 4.21
Fig. 4.22
Fig. 4.23
Example 4.14 Consider the example due to Smale (as in Fig. 4.23). Let $Y = \mathbb{R}^2$ and suppose
$$u_1(x,y) = y - x^2, \qquad u_2(x,y) = \frac{-y}{x^2+1}.$$
Then
$$Du_1(x,y) = (-2x, 1), \qquad Du_2(x,y) = \left(\frac{2xy}{(x^2+1)^2}, \frac{-1}{x^2+1}\right).$$
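Let
$$DL_\lambda(x,y) = \lambda_1(-2x,1) + \lambda_2\left(\frac{2xy}{(x^2+1)^2}, \frac{-1}{x^2+1}\right).$$
Clearly one solution will be $x = 0$, in which case $\lambda_1(0,1) + \lambda_2(0,-1) = 0$, so $\lambda_1 = \lambda_2$, which we may normalise to $\lambda_1 = \lambda_2 = 1$. The Hessian for $L$ at $x = 0$ is then
$$HL(0,y) = D^2u_1(0,y) + D^2u_2(0,y) = \begin{pmatrix} -2 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 2y & 0 \\ 0 & 0 \end{pmatrix},$$
which is negative semi-definite iff $2(y-1) \leq 0$, that is, $y \leq 1$.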
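A quick numerical check on Example 4.14, using central finite differences rather than the symbolic Hessian (the step sizes are arbitrary choices): with $\lambda_1 = \lambda_2 = 1$ the gradient of $L$ vanishes along $x = 0$, the $xx$-entry of the Hessian is $2(y-1)$, and the other entries vanish, so the sign of the Hessian changes at $y = 1$.

```python
def u1(x, y):
    return y - x ** 2

def u2(x, y):
    return -y / (x ** 2 + 1)

def lagrangian(x, y, lam1=1.0, lam2=1.0):
    """L = lam1*u1 + lam2*u2, with the weights normalised to 1 as in the text."""
    return lam1 * u1(x, y) + lam2 * u2(x, y)

def gradient(f, x, y, h=1e-6):
    """Central-difference gradient of f at (x, y)."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def hessian(f, x, y, h=1e-4):
    """Central finite-difference Hessian of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]

for y in (-1.0, 0.5, 1.0, 2.0):
    H = hessian(lagrangian, 0.0, y)
    print(f"y={y}: grad L = {gradient(lagrangian, 0.0, y)}, "
          f"H_xx = {H[0][0]:.6f}, 2(y-1) = {2 * (y - 1)}")
```

For $y > 1$ the entry $H_{xx}$ is positive, so the critical points on $x = 0$ with $y > 1$ behave like “social” minima rather than optima, as the surrounding discussion anticipates.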
References
For a lucid account of economic equilibrium theory see:
Arrow, K. J., & Hahn, F. H. (1971). General competitive analysis. Edinburgh: Oliver and Boyd.
Hildenbrand, W., & Kirman, A. P. (1976). Introduction to equilibrium analysis. Amsterdam: North-Holland.
For the ideas of preference or resource manipulation, see:
Balasko, Y. (1978). The transfer problem and the theory of regular economies. International Economic Review, 19, 687–694.
Gale, D. (1974). Exchange equilibrium and coalitions. Journal of Mathematical Economics, 1, 63–66.
Guesnerie, R., & Laffont, J.-J. (1978). Advantageous reallocations of initial resources. Econometrica, 46, 687–694.
Safra, Z. (1983). Manipulation by reallocating initial endowments. Journal of Mathematical Economics, 12, 1–17.
For a general introduction to the application of differential topology to economics see:
Smale, S. (1976). Dynamics in general equilibrium theory. American Economic Review, 66, 288–294. Reprinted in S. Smale (1980) The Mathematics of Time. Springer: Berlin.
5 Singularity Theory and General Equilibrium
In the last section of the previous chapter we introduced the critical Pareto set $\Theta(u)$ of a smooth profile for a society, and the notion of a regular economy. Both ideas relied on the concept of a singularity of a smooth function $f : X \to \mathbb{R}^m$, where a singularity is a point analogous to the critical point of a real-valued function.
In this chapter we introduce the fundamental result in singularity theory, that the set of singularity points of a smooth profile almost always has a particular geometric structure. We then go on to use this result to discuss the Debreu-Smale Theorem on the generic existence of regular economies. No attempt is made to prove these results in full generality. Instead the aim is to provide a geometric understanding of the ideas. Section 5.4 uses an example of Scarf (1960) to illustrate the idea of an excess demand function for an exchange economy. The example provides a general way to analyse a smooth adjustment process leading to a Walrasian equilibrium. Sections 5.5 and 5.6 introduce the more abstract topological ideas of structural stability and chaos in dynamical systems.
5.1 Singularity Theory
In Chap. 4 we showed that when $f : X \to \mathbb{R}$ was a $C^2$-function on a normed vector space, then knowledge of the first and second differential of $f$ at a critical point, $x$, gave information about the local behavior (near $x$) of the function. In this section we discuss the case of a differentiable function $f : X \to Y$ between general normed vector spaces, and consider regular points (where the differential has maximal rank) and singularity points (where the differential has non-maximal rank). For both kinds of points we can locally characterise the behavior of the function.
5.1.1 Regular Points: The Inverse and Implicit Function Theorem
Suppose that $f : X \to Y$ is a function between normed vector spaces and that for all $x'$ in a neighbourhood $U$ of the point $x$ the differential $df(x')$ is defined. By the results of Sect. 3.2, if $df(x')$ is bounded, or if $X$ is finite-dimensional, then $df(x')$ will be a continuous linear function. Suppose now that $X$ and $Y$ have the same finite dimension ($n$) and $df(x')$ has rank $n$ at all $x' \in U$. Then we know that $df(x')^{-1} : Y \to X$ is a linear map and thus continuous. We shall call $f$ a $C^2$-diffeomorphism on $U$ in this case.
N. Schofield, Mathematical Methods in Economics and Social Choice, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6_5, © Springer-Verlag Berlin Heidelberg 2014
In general, when $Y$ is an infinite-dimensional vector space, then even if $df(x')^{-1}$ exists it need not be continuous. However, when $X$ and $Y$ are complete normed vector spaces (and $df(x')$ itself is continuous), the existence of $df(x')^{-1}$ is sufficient to guarantee that $df(x')^{-1}$ is continuous.
Essentially, a normed vector space $X$ is complete iff any “convergent” sequence $(x_k)$ does indeed converge to a point in $X$. More formally, a Cauchy sequence is a sequence $(x_k)$ such that for any $\varepsilon > 0$ there exists some number $k(\varepsilon)$ such that $r, s > k(\varepsilon)$ implies that $\|x_r - x_s\| < \varepsilon$. If $(x_k)$ is a sequence with a limit $x_0$ in $X$, then clearly $(x_k)$ must be a Cauchy sequence. On the other hand, a Cauchy sequence need not in general converge to a point in the space $X$. If every Cauchy sequence has a limit in $X$, then $X$ is called complete. A complete normed vector space is called a Banach space. Clearly $\mathbb{R}^n$ is a Banach space.
Suppose now that $X, Y$ are normed vector spaces of the same dimension, and $f : U \subset X \to Y$ is a $C^r$-differentiable function on $U$, such that $df(x)$ has a continuous inverse $[df(x)]^{-1}$ at $x$. We call $f$ a $C^r$-diffeomorphism at $x$. Then we can show that $f$ has a local inverse $f^{-1}$ near $f(x)$, with differential $df^{-1}(f(x)) = [df(x)]^{-1}$. Moreover there exists a neighbourhood $V$ of $x$ in $U$ such that $f$ is a $C^r$-diffeomorphism on $V$. In particular this means that $f$ has an inverse $f^{-1} : f(V) \to V$ with continuous differential $df^{-1}(f(x')) = [df(x')]^{-1}$ for all $x' \in V$, and that $f^{-1}$ is $C^r$-differentiable on $f(V)$. To prove the theorem we need to ensure that $[df(x)]^{-1}$ is not only linear but continuous, and it is sufficient to assume $X$ and $Y$ are Banach spaces.
Inverse Function Theorem Suppose $f : U \subset X \to Y$ is $C^r$-differentiable, where $X$ and $Y$ are Banach spaces of dimension $n$. Suppose that the linear map $df(x) : X \to Y$, for $x \in U$, is an isomorphism with inverse $[df(x)]^{-1} : Y \to X$. Then there exist open neighbourhoods $V$ of $x$ in $U$ and $V'$ of $f(x)$ such that $f : V \to V'$ is a bijection with inverse $f^{-1} : V' \to V$. Moreover $f^{-1}$ is itself $C^r$-differentiable on $V'$, and for all $x' \in V$, $df^{-1}(f(x')) = [df(x')]^{-1}$. Then $f$ is called a local $C^r$-diffeomorphism at $x$.
Outline of Proof Let $t = df(x) : X \to Y$. Since $[df(x)]^{-1}$ exists and is continuous, $t^{-1} : Y \to X$ is linear and continuous.
It is possible to choose a neighbourhood $V'$ of $f(x)$ in $f(U)$ and a closed ball $V_x$ in $U$ centered at $x$ such that, for each $y \in V'$, the function $g_y : V_x \subset U \subset X \to V_x \subset X$ defined by $g_y(x') = x' - t^{-1}[f(x') - y]$ is continuous. By Brouwer's Theorem, each $g_y$ has a fixed point.
That is to say, for each $y \in V'$ there exists $x' \in V_x$ such that $g_y(x') = x'$. But then $t^{-1}[f(x') - y] = 0$. Since, by hypothesis, $t^{-1}$ is an isomorphism, its kernel is $\{0\}$, and so $f(x') = y$. Thus for each $y \in V'$ we establish that $g_y(x') = x'$ is equivalent to $f(x') = y$. Define $f^{-1}(y) = x'$, the fixed point of $g_y$, which gives the inverse function on $V'$.
To show $f^{-1}$ is differentiable, proceed as follows.
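Note that $dg_y(x') = \mathrm{Id} - t^{-1} \circ df(x')$ is independent of $y$. Now $dg_y(x')$ is a linear and continuous function from $X$ to $X$ and is thus bounded; moreover, since $dg_y(x) = 0$ and $df$ is continuous, $V_x$ can be chosen so that $\|dg_y(x')\| < 1$ for all $x' \in V_x$. Since $X$ is Banach, it is possible to show that $L(X,X)$, the topological space of linear and continuous maps from $X$ to $X$, is also Banach. Thus if $u \in L(X,X)$ with $\|u\| < 1$, then $(\mathrm{Id} - u)^{-1}$ exists and belongs to $L(X,X)$. This follows since the series $\sum_{k=0}^{\infty} u^k$ converges to an element of $L(X,X)$, and this element is $(\mathrm{Id} - u)^{-1}$.
Now $dg_y(x') \in L(X,X)$, and so $(\mathrm{Id} - dg_y(x'))^{-1} \in L(X,X)$. But $\mathrm{Id} - dg_y(x') = t^{-1} \circ df(x')$, so $t^{-1} \circ df(x')$ has a continuous linear inverse. Now $t^{-1} \circ df(x'): X \to Y \to X$, and $t^{-1}$ has a continuous linear inverse. Thus $df(x')$ has a continuous linear inverse, for all $x' \in V_x$. Let $V$ be the interior of $V_x$. By the construction, the inverse of $df(x')$, for $x' \in V$, has the required property. □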
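In finite dimensions $L(X,X)$ is just the space of $n \times n$ matrices, so the Neumann-series step can be checked directly. The following Python sketch is an illustration only: the matrix `u`, the tolerance and the helper name `neumann_inverse` are choices made here, not part of the text. It sums $\sum_{k \ge 0} u^k$ and compares the result with a direct inverse of $\mathrm{Id} - u$, assuming the operator norm of `u` is below 1.

```python
import numpy as np


def neumann_inverse(u, tol=1e-12, max_terms=10_000):
    """Approximate (Id - u)^{-1} by partial sums of the series sum_{k>=0} u^k.

    Convergence is guaranteed when the operator 2-norm of u is < 1,
    so larger norms are rejected.
    """
    u = np.asarray(u, dtype=float)
    if np.linalg.norm(u, 2) >= 1:
        raise ValueError("the series is only guaranteed to converge for ||u|| < 1")
    n = u.shape[0]
    total = np.eye(n)   # the k = 0 term
    term = np.eye(n)
    for _ in range(max_terms):
        term = term @ u  # term now holds u^k
        total += term
        if np.linalg.norm(term, 2) < tol:
            break
    return total


u = np.array([[0.2, 0.1], [-0.3, 0.4]])
approx = neumann_inverse(u)
exact = np.linalg.inv(np.eye(2) - u)
print(np.allclose(approx, exact))  # True
```

The norm check mirrors the hypothesis $\|u\| < 1$ that the argument needs; without it the partial sums can diverge.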
This is the fundamental theorem of differential calculus. Notice that the theorem asserts that if $f: \mathbb{R}^n \to \mathbb{R}^n$ and $df(x)$ has rank $n$ at $x$, then $df(x')$ has rank $n$ for all $x'$ in a neighbourhood of $x$.
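The fixed-point construction in the proof also gives a practical way to compute a local inverse. The sketch below assumes a hypothetical map $f(x_1, x_2) = (x_1 + 0.1x_2^2,\ x_2 + 0.1\sin x_1)$, chosen only because $df(0)$ is invertible; it freezes $t = df(x)$ at the base point and iterates $g_y(x') = x' - t^{-1}[f(x') - y]$. This is the simplified ("chord") Newton iteration, and it converges only when $y$ is close enough to $f(x)$ for $g_y$ to be a contraction.

```python
import math


def f(x):
    """Hypothetical smooth map R^2 -> R^2, used only for illustration."""
    x1, x2 = x
    return (x1 + 0.1 * x2 ** 2, x2 + 0.1 * math.sin(x1))


def df(x):
    """Jacobian of f at x, stored row by row."""
    x1, x2 = x
    return ((1.0, 0.2 * x2), (0.1 * math.cos(x1), 1.0))


def inv2(m):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("df(x) is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def local_inverse(y, x0, tol=1e-13, max_iter=200):
    """Find x' near x0 with f(x') = y by iterating g_y(x') = x' - t^{-1}[f(x') - y],
    where t = df(x0) is frozen at the base point, as in the proof."""
    t_inv = inv2(df(x0))
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        step = matvec(t_inv, (fx[0] - y[0], fx[1] - y[1]))
        x = (x[0] - step[0], x[1] - step[1])
        if max(abs(s) for s in step) < tol:
            return x
    raise RuntimeError("no convergence: y may be too far from f(x0)")


x0 = (0.0, 0.0)
y = (0.05, -0.03)
x_star = local_inverse(y, x0)
print(x_star, f(x_star))
```

Replacing the frozen $t$ by $df(x')$ at each step gives Newton's method, which converges faster but no longer matches the map $g_y$ used in the proof.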
Example 5.1 (i) For a simple example, consider the function $\exp: \mathbb{R} \to \mathbb{R}_+: x \mapsto e^x$. Clearly for any finite $x \in \mathbb{R}$, $d(\exp)(x) = e^x \neq 0$, and so the rank of the differential is 1. The inverse $\phi: \mathbb{R}_+ \to \mathbb{R}$ must satisfy
$$d\phi(y) = [d(\exp)(x)]^{-1} = \frac{1}{e^x},$$
where $y = \exp(x) = e^x$. Thus $d\phi(y) = \frac{1}{y}$. Clearly $\phi$ must be the function $\log_e: y \mapsto \log_e y$.
(ii) Consider $\sin: (0, 2\pi) \to [-1, +1]$. Now $d(\sin)(x) = \cos x \neq 0$ whenever $x \neq \frac{\pi}{2}, \frac{3\pi}{2}$. At any such $x$ there exist neighbourhoods $V$ of $x$ and $V'$ of $\sin x$ and an inverse $\phi: V' \to V$ such that
$$d\phi(y) = \frac{1}{\cos x} = \pm\frac{1}{\sqrt{1 - y^2}},$$
where the sign is that of $\cos x$.
This inverse $\phi$ is only locally a function. As Fig. 5.1 makes clear, for $y \in (-1, 0) \cup (0, 1)$ there exist distinct values $x_1, x_2$ in $(0, 2\pi)$ such that $\sin(x_1) = \sin(x_2) = y$. However $d(\sin)(x_1) \neq d(\sin)(x_2)$.
The figure also shows that there is a neighbourhood $V'$ of $y$ such that $\phi: V' \to V$ is single-valued and differentiable on $V'$. Suppose now $x = \frac{\pi}{2}$. Then $d(\sin)(\frac{\pi}{2}) = 0$. Moreover there is no neighbourhood $V$ of $\frac{\pi}{2}$ such that $\sin: (\frac{\pi}{2} - \varepsilon, \frac{\pi}{2} + \varepsilon) = V \to V'$ has an inverse function.
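A quick numerical check of Example 5.1, under the assumption that symmetric finite differences with step $h = 10^{-6}$ are accurate enough for these smooth functions: the derivative of $\log$ should be $1/y$, and each local inverse of $\sin$ should have derivative $1/\cos x$, which is $+1/\sqrt{1-y^2}$ on one branch and $-1/\sqrt{1-y^2}$ on the other. The branch functions `phi1` and `phi2` are named here only for illustration.

```python
import math


def central_diff(fun, y, h=1e-6):
    """Symmetric finite-difference approximation of fun'(y)."""
    return (fun(y + h) - fun(y - h)) / (2 * h)


# (i) exp has the inverse log, and the theorem predicts d(phi)(y) = 1/y.
print(central_diff(math.log, 2.5), 1 / 2.5)

# (ii) Two local inverses of sin through y = 0.5 on (0, 2*pi):
# phi1 near x1 = pi/6 (cos x1 > 0) and phi2 near x2 = 5*pi/6 (cos x2 < 0).
branches = {
    "phi1": lambda v: math.asin(v),
    "phi2": lambda v: math.pi - math.asin(v),
}
y = 0.5
for name, phi in branches.items():
    x = phi(y)
    numeric = central_diff(phi, y)
    predicted = 1 / math.cos(x)  # equals +/- 1/sqrt(1 - y^2), with the sign of cos x
    print(name, x, numeric, predicted)
```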
Note one further consequence of the theorem. For $h$ small, we may write
$$f(x+h) = f(x) + df(x) \circ [df(x)]^{-1}\big(f(x+h) - f(x)\big) = f(x) + df(x)\,\psi(h),$$
where $\psi(h) = [df(x)]^{-1}\big(f(x+h) - f(x)\big)$. Now by a linear change of coordinates (composing with $[df(x)]^{-1}$) we can reduce $df(x)$ to the identity, so that in the case $f = (f_1, \ldots, f_n): \mathbb{R}^n \to \mathbb{R}^n$ we can ensure $\frac{\partial f_i}{\partial x_j}\big|_x = \delta_{ij}$, where $\delta_{ij} = 1$ if $i = j$ and $0$ if $i \neq j$. Hence $f(x+h) = f(x) +$
192 5 Singularity Theory and General Equilibrium
Fig. 5.1
$(\psi_1(h), \ldots, \psi_n(h))$. There is therefore a $C^r$-diffeomorphic change of coordinates $\phi$ near $x$ such that $\phi(0) = x$ and
$$f\big(\phi(h_1, \ldots, h_n)\big) = f(x) + (h_1, \ldots, h_n).$$
In other words, by choosing coordinates appropriately, $f$ may be represented by its linear differential.
Suppose now that $f: U \subset \mathbb{R}^n \to \mathbb{R}^m$ is a $C^1$-function. The maximal rank of $df$ at a point $x$ in $U$ is $\min(n, m)$. If indeed $df(x)$ has maximal rank, then $x$ is called a regular point of $f$, and $f(x)$ a regular value. In this case we write $x \in S_0(f)$. The inverse function theorem showed that when $n = m$ and $x \in S_0(f)$, then $f$ could be regarded as an identity function near $x$.
In particular this means that there is a neighbourhood $U$ of $x$ such that $f^{-1}[f(x)] \cap U = \{x\}$ is an isolated point.
In the case that $n \neq m$ we use the inverse function theorem to characterise $f$ at regular points.
Implicit Function Theorem for Vector Spaces
1. (Surjective Version). Suppose that $f: U \subset \mathbb{R}^n \to \mathbb{R}^m$, $n \geq m$, and $\operatorname{rank}(df(x)) = m$, with $f(x) = 0$ for convenience. If $f$ is $C^r$-differentiable at $x$, then there exists a $C^r$-diffeomorphism $\phi: \mathbb{R}^n \to \mathbb{R}^n$ on a neighbourhood of the origin such that $\phi(0) = x$ and $f \circ \phi(h_1, \ldots, h_n) = (h_1, \ldots, h_m)$.
2. (Injective Version). If $f: U \subset \mathbb{R}^n \to \mathbb{R}^m$, $n \leq m$, $\operatorname{rank}(df(x)) = n$, with $x = 0$ and $f(0) = y$ for convenience, and $f$ is $C^r$-differentiable at $x$, then there exists a $C^r$-diffeomorphism $\psi: \mathbb{R}^m \to \mathbb{R}^m$ such that $\psi(y) = 0$ and
$$\psi \circ f(h_1, \ldots, h_n) = (h_1, \ldots, h_n, 0, \ldots, 0).$$
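Proof 1. Now $df(x) = [B\; C]$, with respect to some coordinate system, where $B$ is an $(m \times m)$ non-singular matrix and $C$ is an $m \times (n-m)$ matrix. Translating coordinates if necessary, take $x = 0$. Define $F: \mathbb{R}^n \to \mathbb{R}^n$ by
$$F(x_1, \ldots, x_n) = \big(f(x_1, \ldots, x_n), x_{m+1}, \ldots, x_n\big).$$
Clearly $dF(x) = \begin{pmatrix} B & C \\ 0 & I \end{pmatrix}$ has rank $n$, and by the inverse function theorem there exists an inverse $\phi$ to $F$ near $x$, with $\phi(0) = x$. Hence $F \circ \phi(h_1, \ldots, h_n) = (h_1, \ldots, h_n)$. But then $f \circ \phi(h_1, \ldots, h_n) = (h_1, \ldots, h_m)$.
2. Follows similarly. □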
As an application of the theorem, suppose $f: \mathbb{R}^m \times \mathbb{R}^{n-m} \to \mathbb{R}^m$. Write $x$ for a vector in $\mathbb{R}^m$ and $y$ for a vector in $\mathbb{R}^{n-m}$, and let $df(x,y) = [B\; C]$, where $B$ is an $m \times m$ matrix and $C$ is an $m \times (n-m)$ matrix. Suppose that $B$ is invertible at $(x,y)$ and that $f(x,y) = 0$. Then the implicit function theorem implies that there exists an open neighbourhood $U$ of $y$ in $\mathbb{R}^{n-m}$ and a differentiable function $g: U \to \mathbb{R}^m$ such that $g(y) = x$ and $f(g(y'), y') = 0$ for all $y' \in U$. To see this define
$$F: \mathbb{R}^m \times \mathbb{R}^{n-m} \to \mathbb{R}^m \times \mathbb{R}^{n-m}$$
by $F(x,y) = (f(x,y), y)$.
Clearly $dF(x,y) = \begin{pmatrix} B & C \\ O & I \end{pmatrix}$, and so $dF(x,y)$ is invertible. Thus there exists a neighbourhood $V$ of $(x,y)$ in $\mathbb{R}^n$ on which $F$ has a diffeomorphic inverse $G$. Now $F(x,y) = (0,y)$. So there is a neighbourhood $V'$ of $(0,y)$ and a neighbourhood $V$ of $(x,y)$ such that $G: V' \subset \mathbb{R}^n \to V \subset \mathbb{R}^n$ is a diffeomorphism. Let $g(y')$ be the $x$-coordinate of $G(0,y')$ for all $y'$ such that $(0,y') \in V'$. Clearly $g(y')$ satisfies $G(0,y') = (g(y'), y')$, and so
$$F \circ G(0,y') = F(g(y'), y') = \big(f(g(y'), y'), y'\big) = (0, y').$$
Now let $U = \{y' \in \mathbb{R}^{n-m} : (0,y') \in V'\}$; since $V'$ is open, $U$ is open in $\mathbb{R}^{n-m}$. Hence for all $y' \in U$, $g(y')$ satisfies $f(g(y'), y') = 0$. Since $G$ is differentiable, so must be $g: U \subset \mathbb{R}^{n-m} \to \mathbb{R}^m$. Hence $x' = g(y')$ solves $f(x', y') = 0$.
Example 5.2
1. Let $f: \mathbb{R}^3 \to \mathbb{R}^2$ where
$$f_1(x,y,z) = x^2 + y^2 + z^2 - 3, \qquad f_2(x,y,z) = x^3y^3z^3 - x + y - z.$$
At $(x,y,z) = (1,1,1)$, $f_1 = f_2 = 0$. We seek a function $g: \mathbb{R} \to \mathbb{R}^2$ such that $f(g_1(z), g_2(z), z) = 0$ for all $z$ in a neighbourhood of 1. Now
$$df(x,y,z) = \begin{pmatrix} 2x & 2y & 2z \\ 3x^2y^3z^3 - 1 & 3x^3y^2z^3 + 1 & 3x^3y^3z^2 - 1 \end{pmatrix}$$
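and so
$$df(1,1,1) = \begin{pmatrix} 2 & 2 & 2 \\ 2 & 4 & 2 \end{pmatrix}.$$
The matrix $\begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}$ is non-singular. Hence there exists a diffeomorphism $G: \mathbb{R}^3 \to \mathbb{R}^3$ such that $G(0,0,1) = (1,1,1)$, and $G(0,0,z') = (g_1(z'), g_2(z'), z')$ for $z'$ near 1.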
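The implicit function $g$ of Example 5.2(1) can be traced numerically. The sketch below uses a swapped-in technique, not the construction via $G$ in the text: it solves $f(x,y,z) = 0$ for $(x,y)$ by Newton's method in the two variables whose $2 \times 2$ block $B$ is non-singular. Differentiating $f(g(z), z) = 0$ predicts $g'(1) = -B^{-1}C$, where $C = \partial f/\partial z = (2,2)^T$ at $(1,1,1)$, which gives $g'(1) = (-1, 0)$; the finite-difference slope should agree.

```python
def f(x, y, z):
    """The constraint map of Example 5.2(1)."""
    return (x**2 + y**2 + z**2 - 3, x**3 * y**3 * z**3 - x + y - z)


def B(x, y, z):
    """The 2 x 2 block of df with respect to (x, y)."""
    return ((2 * x, 2 * y),
            (3 * x**2 * y**3 * z**3 - 1, 3 * x**3 * y**2 * z**3 + 1))


def solve2(m, r):
    """Solve the 2 x 2 linear system m s = r by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((r[0] * d - b * r[1]) / det, (a * r[1] - c * r[0]) / det)


def g(z, start=(1.0, 1.0), tol=1e-14, max_iter=50):
    """Solve f(x, y, z) = 0 for (x, y) near `start` by Newton's method in (x, y)."""
    x, y = start
    for _ in range(max_iter):
        r = f(x, y, z)
        dx, dy = solve2(B(x, y, z), r)
        x, y = x - dx, y - dy
        if abs(dx) + abs(dy) < tol:
            return x, y
    raise RuntimeError("Newton iteration did not converge")


h = 1e-5
gp, gm = g(1 + h), g(1 - h)
slope = ((gp[0] - gm[0]) / (2 * h), (gp[1] - gm[1]) / (2 * h))
print(g(1.0), slope)  # slope should be close to -B^{-1} C = (-1, 0)
```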
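2. In a simpler example consider $f: \mathbb{R}^2 \to \mathbb{R}: (x,y) \mapsto (x-a)^2 + (y-b)^2 - 25$ and the constraint $f(x,y) = 0$, with $df(x,y) = (2(x-a), 2(y-b))$.
Now let $F(x,y) = (x, f(x,y))$, where $F: \mathbb{R}^2 \to \mathbb{R}^2$, and suppose $y \neq b$. Then
$$dF(x,y) = \begin{pmatrix} 1 & 0 \\ \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 2(x-a) & 2(y-b) \end{pmatrix},$$
so that the local inverse $G$ of $F$ has differential
$$dG = [dF(x,y)]^{-1} = \frac{1}{2(y-b)} \begin{pmatrix} 2(y-b) & 0 \\ -2(x-a) & 1 \end{pmatrix}.$$
Define $g(x')$ to be the $y$-coordinate of $G(x',0)$. Then $F \circ G(x',0) = F(x', g(x')) = (x', f(x', g(x'))) = (x', 0)$, and so $y' = g(x')$ solves $f(x', y') = 0$ for $y'$ sufficiently close to $y$. Note also that
$$\frac{dg}{dx}\bigg|_{x'} = \frac{\partial G_2}{\partial x}\bigg|_{(x',0)} = -\frac{x'-a}{y'-b}.$$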
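For the circle of Example 5.2(2) the local solution can be written explicitly, which allows the slope formula to be checked. The centre $(a,b) = (1,2)$ below is a hypothetical choice, and only the upper branch ($y > b$) is used.

```python
import math

a, b = 1.0, 2.0  # hypothetical centre (a, b) of the circle of radius 5


def g(x):
    """Upper local solution y = g(x) of (x - a)^2 + (y - b)^2 - 25 = 0 (valid where y > b)."""
    return b + math.sqrt(25 - (x - a) ** 2)


def slope_formula(x):
    """dg/dx = -(x - a)/(y - b), evaluated at y = g(x)."""
    return -(x - a) / (g(x) - b)


x = 4.0  # here y = g(4) = 6, so y != b
h = 1e-6
numeric = (g(x + h) - g(x - h)) / (2 * h)
print(g(x), numeric, slope_formula(x))
```

Near the points where $y = b$ the slope formula breaks down, which is why the text switches to a local solution $x' = h(y')$ there.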
In Example 5.2(1), the "solution" $g(z) = (x,y)$ to the smooth constraint $f(x,y,z) = 0$ is, in a topological sense, a one-dimensional object (since it is given by two constraints in $\mathbb{R}^3$).
In the same way in Example 5.2(2) the solution $y = g(x)$ is a one-dimensional object (since it is given by a single constraint in $\mathbb{R}^2$).
More specifically, say that a set $V$ in $\mathbb{R}^n$ is an $r$-dimensional smooth manifold iff (locally) there exists a diffeomorphism
$$\phi: V \subset \mathbb{R}^n \to U \subset \mathbb{R}^r$$
onto an open set $U$.
When $f: \mathbb{R}^n \to \mathbb{R}^m$ and $\operatorname{rank}(df(x)) = m \leq n$, say that $f$ is a submersion at $x$. If $\operatorname{rank}(df(x)) = n \leq m$, say that $f$ is an immersion at $x$.
One way to interpret the implicit function theorem is as follows:
(1) When $f$ is a submersion at $x$, the inverse image $f^{-1}(f(x))$ of the point $f(x)$ "looks like", in the coordinates given by the implicit function theorem, an object of the form $\{(0, \ldots, 0, h_{m+1}, \ldots, h_n)\}$, and so is a smooth manifold in $\mathbb{R}^n$ of dimension $(n-m)$.
(2) When $f$ is an immersion at $x$, the image of an ($n$-dimensional) neighbourhood $U$ of $x$ "looks like" an $n$-dimensional manifold, $f(U)$, in $\mathbb{R}^m$. These observations can be generalized to the case when $f: X^n \to Y^m$ is "smooth", and $X, Y$ are themselves smooth manifolds of dimension $n, m$ respectively. Without going into the formal details, $X$ is a smooth manifold of dimension $n$ if it is a paracompact topological space and for any $x \in X$ there is a neighbourhood $V$ of $x$ and a smooth "chart" $\phi: V \subset X \to U \subset \mathbb{R}^n$. In particular if $x \in V_i \cap V_j$ for two open neighbourhoods $V_i, V_j$ of $x$, then
$$\phi_i \circ \phi_j^{-1}: \phi_j(V_i \cap V_j) \subset \mathbb{R}^n \to \phi_i(V_i \cap V_j) \subset \mathbb{R}^n$$
is a diffeomorphism. A smooth structure on $X$ is an atlas, namely a family $\{(\phi_i, V_i)\}$ of charts such that $\{V_i\}$ is an open cover of $X$. The purpose of this definition is that if $f: X^n \to Y^m$ then there is an induced function near a point $x$ given by
$$f_{ij} = \psi_i \circ f \circ \phi_j^{-1}: \mathbb{R}^n \supset \phi_j(V_j) \xrightarrow{\phi_j^{-1}} V_j \xrightarrow{f} Y \xrightarrow{\psi_i} \mathbb{R}^m.$$
Here $(\phi_j, V_j)$ is a chart at $x$, and $(\psi_i, V_i)$ is a chart at $f(x)$. If the induced functions $\{f_{ij}\}$ at every point $x$ are differentiable then $f$ is said to be differentiable, and the "induced" differential of $f$ is denoted by $df$. The charts thus provide a convenient way of representing the differential $df$ of $f$ at the point $x$. In particular, once $(\phi_j, V_j)$ and $(\psi_i, V_i)$ are chosen for $x$ and $f(x)$, $df(x)$ can be represented by the Jacobian matrix $Df(x) = (\partial f_{ij})$ of $f_{ij}$ at $\phi_j(x)$. As before $Df(x)$ will consist of $n$ columns and $m$ rows. Characteristics of the Jacobian, such as rank, will be independent of the choices of charts (and thus coordinates) at $x$ and $f(x)$. (See Chillingsworth (1976), for example, for the details.)
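As a concrete (illustrative) atlas, the circle $S^1 \subset \mathbb{R}^2$ is covered by the two stereographic charts below, one omitting the north pole and one omitting the south pole. These charts are a standard choice made here, not taken from the text. On the overlap the transition map $\phi_N \circ \phi_S^{-1}$ is $t \mapsto 1/t$ on $\mathbb{R} \setminus \{0\}$, which is a diffeomorphism, so the two charts define a smooth structure.

```python
def phi_N(x, y):
    """Chart on S^1 minus the north pole (0, 1): stereographic projection."""
    return x / (1 - y)


def phi_S(x, y):
    """Chart on S^1 minus the south pole (0, -1)."""
    return x / (1 + y)


def phi_S_inv(t):
    """Inverse of phi_S: R -> S^1 minus (0, -1)."""
    return (2 * t / (1 + t * t), (1 - t * t) / (1 + t * t))


def transition(t):
    """phi_N o phi_S^{-1}, defined on phi_S(V_N ∩ V_S) = R minus {0}."""
    return phi_N(*phi_S_inv(t))


for t in (-3.0, -0.5, 0.25, 2.0):
    print(t, transition(t), 1 / t)
```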
If the differential $df$ of a function $f: X^n \to Y^m$ is defined and continuous then $f$ is called $C^1$. Let $C^1(X,Y)$ be the collection of such $C^1$-maps. Analogous to the case of functions between real vector spaces, we may also write $C^r(X,Y)$ for the class of $C^r$-differentiable functions between $X$ and $Y$.
The implicit function theorem also holds for members of $C^1(X,Y)$.
Implicit Function Theorem for Manifolds Suppose that $f: X^n \to Y^m$ is a $C^1$-function between smooth manifolds of dimension $n, m$ respectively.
1. If $n \geq m$ and $f$ is a submersion at $x$ (i.e., $\operatorname{rank} df(x) = m$) then $f^{-1}(f(x))$ is (locally) a smooth manifold in $X$ of dimension $(n-m)$. Moreover, if $Z$ is a manifold in $Y^m$ of dimension $r$, and $f$ is a submersion at each point in $f^{-1}(Z)$, then $f^{-1}(Z)$ is a submanifold of $X$ of dimension $n - m + r$.
2. If $n \leq m$ and $f$ is an immersion at $x$ (i.e., $\operatorname{rank} df(x) = n$) then there is a neighbourhood $U$ of $x$ in $X$ such that $f(U)$ is an $n$-dimensional manifold in $Y$; in particular, when $n = m$, $f(U)$ is open in $Y$.
Fig. 5.2
The proof of this theorem is considerably beyond the scope of this book, but the interested reader should look at Golubitsky and Guillemin (1973, p. 9) or Hirsch (1976, p. 22). This theorem is a smooth analogue of the isomorphism theorem for linear functions given in Sect. 2.2. For a linear function $T: \mathbb{R}^n \to \mathbb{R}^m$ with $n \geq m$ and $T$ surjective, $T^{-1}(y)$ has the form $x_0 + K$, where $K$ is the $(n-m)$-dimensional kernel. Conversely if $T: \mathbb{R}^n \to \mathbb{R}^m$ with $n \leq m$ and $T$ injective, then $\operatorname{image}(T)$ is an $n$-dimensional subspace of $\mathbb{R}^m$. More particularly if $U$ is an $n$-dimensional open set in $\mathbb{R}^n$, then $T(U)$ is an $n$-dimensional open subset of $\operatorname{image}(T)$, and is open in $\mathbb{R}^m$ when $n = m$.
Example 5.3 To illustrate, consider Example 5.2(2) again. When $y \neq b$, $df$ has rank 1 and so there exists a "local" solution $y' = g(x')$ such that $f(x', g(x')) = 0$. In other words, near $(x,y)$,
$$f^{-1}(0) = \{(x', g(x')) \in \mathbb{R}^2 : x' \in U\},$$
which essentially is a copy of $U$ but deformed in $\mathbb{R}^2$. Thus $f^{-1}(0)$ is "locally" a one-dimensional manifold. Indeed the set $S^1 = \{(x,y) : f(x,y) = 0\}$ itself is a 1-dimensional manifold in $\mathbb{R}^2$.
If $y \neq b$ and $(x,y) \in S^1$, then there is a neighbourhood $U$ of $x$ such that $x' \mapsto (x', g(x'))$ is a diffeomorphism from $U$ onto a neighbourhood of $(x,y)$ in $S^1$, with inverse $(x', y') \mapsto x'$; this parametrises $S^1$ near $(x,y)$.
If $y = b$, then $\frac{\partial f}{\partial x} = 2(x-a) = \pm 10 \neq 0$, and we can do the same thing through a local solution $x' = h(y')$ satisfying $f(h(y'), y') = 0$.
5.1.2 Singular Points and Morse Functions
When $f: X^n \to Y^m$ is a $C^1$-function between smooth manifolds, and $\operatorname{rank} df(x)$ is maximal ($= \min(n,m)$), then as before write $x \in S_0(f)$. The set of singular points of $f$ is $S(f) = X \setminus S_0(f)$. Let $z = \min(n,m)$ and say that $x$ is a corank $r$ singularity, or $x \in S_r(f)$, if $\operatorname{rank}(df(x)) = z - r$. Clearly $S(f) = \bigcup_{r \geq 1} S_r(f)$.
In the next section we shall examine the corank $r$ singularity sets of a $C^1$-function and show that they have a nice geometric structure. In this section we consider the case $m = 1$.
In the case of a $C^2$-function $f: X^n \to \mathbb{R}$, either $x$ will be regular (in $S_0(f)$) or a critical point (in $S_1(f)$) where $df(x) = 0$. We call a critical point of $f$ non-degenerate iff $d^2f(x)$ is non-singular. A $C^2$-function all of whose critical points are non-degenerate is called a Morse function. A Morse function $f$, near a critical point, has a very simple representation.
A local system of coordinates at a point $x$ in $X$ is a smooth assignment
$$y \xrightarrow{\phi} (h_1, \ldots, h_n)$$
for every $y$ in some neighbourhood $U$ of $x$ in $X$.
Lemma 5.1 (Morse) If $f: X^n \to \mathbb{R}$ is $C^2$ and $x$ is a non-degenerate critical point of index $k$, then there exists a local system of coordinates (or chart $(\phi, V)$) at $x$ such that $f$ is given by
$$y \xrightarrow{\phi} (h_1, \ldots, h_n) \xrightarrow{g} f(x) - \sum_{i=1}^{k} h_i^2 + \sum_{i=k+1}^{n} h_i^2.$$
As before the index of the critical point is the number of negative eigenvalues of the Hessian $H_f$ at $x$. The $C^2$-function $g$ has Hessian
$$H_g(0) = \operatorname{diag}(\underbrace{-2, \ldots, -2}_{k}, \underbrace{2, \ldots, 2}_{n-k}),$$
with $k$ negative eigenvalues. Essentially the Morse lemma implies that when $x$ is a non-degenerate critical point of $f$, then $f$ is topologically equivalent to the function $g$ with a similar Hessian at the point.
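The index in the Morse lemma can be read off numerically by counting negative eigenvalues of the Hessian. The sketch below assumes NumPy is available; the tolerance used to flag a degenerate (singular) Hessian and the example $f(x,y) = x^2 + 3xy + y^2$ are choices made here for illustration.

```python
import numpy as np


def morse_index(hessian, tol=1e-9):
    """Number of negative eigenvalues of a symmetric Hessian.

    Raises ValueError if some eigenvalue is (numerically) zero, i.e. the
    critical point is degenerate and the Morse lemma does not apply.
    """
    eig = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if np.any(np.abs(eig) < tol):
        raise ValueError("degenerate critical point: Hessian is singular")
    return int(np.sum(eig < 0))


# The model Hessian H_g(0) with k = 2 negative entries in dimension n = 5.
k, n = 2, 5
H_g = np.diag([-2.0] * k + [2.0] * (n - k))
print(morse_index(H_g))  # 2

# f(x, y) = x^2 + 3xy + y^2 has a critical point at 0 with Hessian [[2, 3], [3, 2]],
# whose eigenvalues are 5 and -1, so the origin is a saddle of index 1.
print(morse_index([[2.0, 3.0], [3.0, 2.0]]))  # 1
```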
By definition, if $f$ is a Morse function then all its critical points are non-degenerate. Moreover if $x \in S_1(f)$ then there exists a neighbourhood $V$ of $x$ such that $x$ is the only critical point of $f$ in $V$.
To see this note that for $y \in V$,
$$df(y) = dg(h_1, \ldots, h_n) = (-2h_1, \ldots, -2h_k, 2h_{k+1}, \ldots, 2h_n) = 0$$
iff $h_1 = \cdots = h_n = 0$, or $y = x$. Thus each critical point of $f$ is isolated, and so $S_1(f)$ is a set of isolated points and thus a zero-dimensional object.
As we shall see, almost any smooth function can be approximated arbitrarily closely by a Morse function.
To examine the regular points of a differentiable function $f: X \to \mathbb{R}$, we can use the Sard Lemma.
Fig. 5.3
First of all, a set $V$ in a topological space $X$ is called nowhere dense if its closure, $\operatorname{clos}(V)$, contains no non-empty open set. Equivalently, $X \setminus \operatorname{clos}(V)$ is dense.
If $X$ is a complete metric space then the union of a countable collection of closed nowhere dense sets has empty interior (the Baire category theorem). This also means that a residual set (the intersection of a countable collection of open dense sets) is dense. (See Sect. 3.1.2.) A set $V$ is of measure zero in $X$ iff for any $\varepsilon > 0$ there exists a countable family of cubes, with total volume less than $\varepsilon$, covering $V$. If $V$ is closed and of measure zero, then it is nowhere dense.
Lemma 5.2 (Sard) If $f: X^n \to \mathbb{R}$ is a $C^r$-map where $r \geq n$, then the set $f(S_1(f))$ of critical values of $f$ has measure zero in $\mathbb{R}$. Thus the set of regular values of $f$, $\mathbb{R} \setminus f(S_1(f))$, is a countable intersection of open dense sets and thus is dense.
To illustrate this consider Fig. 5.3, where $f: \mathbb{R} \to \mathbb{R}$ is a quasi-concave $C^1$-function. The set of critical points of $f$, namely $S_1(f)$, clearly does not have measure zero, since $S_1(f)$ has a non-empty interior. Thus $f$ is not a Morse function. However $f(S_1(f))$ is an isolated point in the image.
Example 5.4 To illustrate the Morse lemma let $Z = S^1 \times S^1$ be the torus (the skin of a doughnut) and let $f: Z \to \mathbb{R}$ be the height function.
Point $s$, at the bottom of the torus, is a minimum of the function, and so the index of $s$ is 0. Let $f(s) = 0$.
Then near $s$, $f$ can be represented by
$$(h_1, h_2) \mapsto 0 + h_1^2 + h_2^2.$$
Note that the Hessian of $f$ at $s$ is $\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$, and so is positive definite.
The next critical point, $t$, is obviously a saddle, with index 1, and so we can write
$$(h_1, h_2) \mapsto f(t) + h_1^2 - h_2^2.$$
Clearly $H_f(t) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}$.
Suppose now that $a \in (f(s), f(t))$. Clearly $a$ is a regular value, and so any point $x \in Z$ satisfying $f(x) = a$ is a regular point, and $f$ is a submersion at $x$. By the implicit function theorem $f^{-1}(a)$ is a one-dimensional manifold. Indeed it is a single copy of the circle, $S^1$.
Fig. 5.4 Critical points on Z
The next critical point is the saddle $u$, near which $f$ is represented as
$$(h_1, h_2) \mapsto f(u) - h_1^2 + h_2^2.$$
Now for $b \in (f(t), f(u))$, $f^{-1}(b)$ is a one-dimensional manifold, but this time it is two copies of $S^1$. Finally $v$ is a local maximum and $f$ is represented near $v$ by $(h_1, h_2) \mapsto f(v) - h_1^2 - h_2^2$. Thus the index of $v$ is 2.
We can also use this example to introduce the idea of the Euler characteristic $\chi(X)$ of a manifold $X$. If $X$ has dimension $n$, let $c_i(X,f)$ be the number of critical points of index $i$ of the function $f: X \to \mathbb{R}$, and let
$$\chi(X,f) = \sum_{i=0}^{n} (-1)^i c_i(X,f).$$
For example the height function $f$ on the torus $Z$ has
(i) $c_0(Z,f) = 1$, since $s$ has index 0;
(ii) $c_1(Z,f) = 2$, since both $t$ and $u$ have index 1;
(iii) $c_2(Z,f) = 1$, since $v$ has index 2.
Thus $\chi(Z,f) = 1 - 2 + 1 = 0$. In fact, it can be shown that $\chi(X,f)$ is independent of $f$ when $X$ is a compact manifold and $f$ is a Morse function. It is an invariant of the smooth manifold $X$, labelled $\chi(X)$. Example 5.4 illustrates the fact that $\chi(Z) = 0$.
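The counts for the torus can be reproduced with an explicit parametrisation. In the sketch below, which assumes radii $R = 2$, $r = 1$ and places the torus "standing up" so that the height is $h(u,v) = (R + r\cos v)\sin u$ (a parametrisation chosen here, not given in the text), the four critical points are classified from their $2 \times 2$ Hessians and $\chi$ is tallied.

```python
import math

R, r = 2.0, 1.0  # hypothetical radii with R > r > 0


def height(u, v):
    """Height of the point with angles (u, v) on a torus standing on its rim."""
    return (R + r * math.cos(v)) * math.sin(u)


def gradient(u, v):
    return ((R + r * math.cos(v)) * math.cos(u), -r * math.sin(v) * math.sin(u))


def hessian(u, v):
    huu = -(R + r * math.cos(v)) * math.sin(u)
    hvv = -r * math.cos(v) * math.sin(u)
    huv = -r * math.sin(v) * math.cos(u)
    return huu, huv, hvv


def index_2x2(huu, huv, hvv):
    """Negative eigenvalues of [[huu, huv], [huv, hvv]], assuming it is non-singular."""
    det = huu * hvv - huv * huv
    if det < 0:
        return 1                  # eigenvalues of opposite sign: a saddle
    return 0 if huu > 0 else 2    # det > 0: both eigenvalues share the sign of huu


# Critical points: cos u = 0 and sin v = 0 (because R + r cos v > 0).
critical = [(u, v) for u in (math.pi / 2, 3 * math.pi / 2) for v in (0.0, math.pi)]
counts = {0: 0, 1: 0, 2: 0}
for u, v in critical:
    assert max(abs(c) for c in gradient(u, v)) < 1e-12  # sanity check: really critical
    idx = index_2x2(*hessian(u, v))
    counts[idx] += 1
    print(f"u={u:.4f} v={v:.4f} height={height(u, v):+.3f} index={idx}")

chi = counts[0] - counts[1] + counts[2]
print(counts, chi)  # {0: 1, 1: 2, 2: 1} 0
```

The minimum, the two saddles and the maximum found here play the roles of $s$, $t$, $u$ and $v$ in Example 5.4.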
Example 5.5
(1) The sphere $S^1$. It is clear that the height function $f: S^1 \to \mathbb{R}$ has an index 0 critical point at the bottom and an index 1 critical point at the top, so $\chi(S^1) = 1 - 1 = 0$.
Fig. 5.5 Critical points in $S^2$
(2) The sphere $S^2$ has an index 0 critical point at the bottom and an index 2 critical point at the top, so $\chi(S^2) = c_0 + c_2 = 1 + 1 = 2$.
It is possible to deform the sphere $S^2$ so as to induce a saddle, but this creates an additional index 0 critical point. In this case $c_0 = 2$, $c_1 = 1$ and $c_2 = 1$, as in Fig. 5.5. Thus $\chi(S^2) = 2 - 1 + 1 = 2$ again.
(3) More generally, $\chi(S^n) = 0$ if $n$ is odd and $\chi(S^n) = 2$ if $n$ is even.
(4) To compute $\chi(B^n)$ for the closed $n$-ball, take the sphere $S^n$ and delete the top hemisphere. The remaining bottom hemisphere is diffeomorphic to $B^n$. By this method we have removed the index $n$ critical point at the top of $S^n$.
For $n = 2k+1$ odd, we know
$$\chi(S^{2k+1}) = \sum_{i=0}^{2k} (-1)^i c_i(S^n) - c_n(S^n) = 0,$$
so $\chi(B^{2k+1}) = \chi(S^{2k+1}) + 1 = 1$. For $n = 2k$ even we have
$$\sum_{i=0}^{2k-1} (-1)^i c_i(S^n) + c_n(S^n) = 2, \quad\text{so}\quad \chi(B^{2k}) = \chi(S^{2k}) - 1 = 1.$$
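The bookkeeping in Examples 5.4 and 5.5 amounts to an alternating sum of critical-point counts. The sketch below (helper names are hypothetical) tabulates it, including the heuristic of deleting the index-$n$ maximum to pass from $S^n$ to $B^n$.

```python
def euler_characteristic(counts):
    """chi = sum_i (-1)^i c_i, where counts[i] is the number of index-i critical points."""
    return sum((-1) ** i * c for i, c in enumerate(counts))


examples = {
    "circle S^1": [1, 1],
    "sphere S^2": [1, 0, 1],
    "deformed S^2 (Fig. 5.5)": [2, 1, 1],
    "torus Z": [1, 2, 1],
}
for name, c in examples.items():
    print(name, euler_characteristic(c))


def height_counts_sphere(n):
    """Height function on S^n (n >= 1): one minimum (index 0), one maximum (index n)."""
    c = [0] * (n + 1)
    c[0] += 1
    c[n] += 1
    return c


def chi_ball(n):
    """chi(B^n) as in (4): drop the index-n maximum from the count for S^n."""
    c = height_counts_sphere(n)
    c[n] -= 1
    return euler_characteristic(c)


print([euler_characteristic(height_counts_sphere(n)) for n in range(1, 7)])
print([chi_ball(n) for n in range(1, 7)])
```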
5.2 Transversality
To examine the singularity set $S(f)$ of a smooth function $f: X \to Y$ we introduce the idea of transversality.
A linear manifold $V$ in $\mathbb{R}^n$ of dimension $v$ is of the form $x_0 + K$, where $K$ is a vector subspace of $\mathbb{R}^n$ of dimension $v$. Intuitively, if $V$ and $W$ are linear manifolds in $\mathbb{R}^n$ of dimension $v, w$, then typically they will not intersect if $v + w < n$.
On the other hand if $v + w \geq n$ then $V \cap W$ will typically be of dimension $v + w - n$.
For example two lines in $\mathbb{R}^2$ will typically intersect in a point, of dimension $1 + 1 - 2 = 0$.
Another way of expressing this is to define the codimension of $V$ in $\mathbb{R}^n$ to be $n - v$. Then the codimension of $V \cap W$ in $W$ will typically be $w - (v + w - n) = n - v$, the same codimension.
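The "typical" dimension count can be tested with randomly chosen subspaces, which are in general position with probability one. The NumPy sketch below is illustrative; the seed and the dimensions are arbitrary. It computes $\dim(K_1 \cap K_2) = \dim K_1 + \dim K_2 - \dim(K_1 + K_2)$ for subspaces, and decides whether two linear manifolds $x_0 + K_1$, $x_1 + K_2$ meet by checking whether $x_1 - x_0 \in K_1 + K_2$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are reproducible


def random_subspace(n, v):
    """n x v Gaussian matrix; its columns span a 'typical' v-dimensional subspace of R^n."""
    return rng.standard_normal((n, v))


def subspace_intersection_dim(K1, K2):
    """dim(K1 ∩ K2) = dim K1 + dim K2 - dim(K1 + K2)."""
    rank = np.linalg.matrix_rank
    return rank(K1) + rank(K2) - rank(np.hstack([K1, K2]))


def linear_manifolds_meet(x0, K1, x1, K2):
    """x0 + K1 and x1 + K2 intersect iff x1 - x0 lies in K1 + K2."""
    rank = np.linalg.matrix_rank
    S = np.hstack([K1, K2])
    return rank(np.column_stack([S, x1 - x0])) == rank(S)


n = 5
for v, w in [(2, 2), (3, 3), (4, 2), (3, 4)]:
    d = subspace_intersection_dim(random_subspace(n, v), random_subspace(n, w))
    print(f"n={n}, v={v}, w={w}: dim = {d}, predicted {max(v + w - n, 0)}")
```

For subspaces through the origin the prediction is $\max(v + w - n, 0)$, since both contain $0$; the statement that linear manifolds typically miss each other when $v + w < n$ concerns offsets $x_0, x_1$ in general position, which `linear_manifolds_meet` tests.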
Suppose now that $f: X^n \to Y^m$ where $X, Y$ are vector spaces of dimension $n, m$ respectively. Let $Z$ be a $z$-dimensional linear manifold in $Y$. Say that $f$ is transversal to $Z$ iff for all $x \in X$, either (i) $f(x) \notin Z$ or (ii) the image of $df(x)$, regarded as a vector subspace of $Y^m$, together with (the subspace parallel to) $Z$ spans $Y$. In this case write $f \pitchfork Z$. The same idea can be extended to the case when $X, Y, Z$ are all manifolds.
Whenever $f \pitchfork Z$ and $x \in f^{-1}(Z)$, the composition of $f$ with a local submersion $Y \to \mathbb{R}^{m-z}$ whose zero set is $Z$ near $f(x)$ will be a submersion at $x$, and so $f^{-1}(Z)$ will be a smooth manifold in $X$ of codimension equal to the codimension of $Z$ in $Y$. Another way of interpreting this is that the number of constraints which determine $Z$ in $Y$ will be equal to the number of constraints which determine $f^{-1}(Z)$ in $X$. Thus $\dim(f^{-1}(Z)) = n - (m - z)$.
In the previous chapter we put the Whitney $C^s$-topology on the set of $C^s$-differentiable maps $X^n \to Y^m$, and called this $C^s(X,Y)$. In this topological space a residual set is dense. The fundamental theorem of singularity theory is that transversal intersection is generic.
Thom Transversality Theorem Let $X^n, Y^m$ be manifolds and $Z^z$ a submanifold of $Y$. Then the set
$$\pitchfork(X,Y;Z) = \{f \in C^s(X,Y) : f \pitchfork Z\}$$
is a residual (and thus dense) set in the topological space $C^s(X,Y)$.
Note that if $f \in \pitchfork(X,Y;Z)$ then $f^{-1}(Z)$ will be a manifold in $X$ of dimension $n - m + z$.
Moreover if $g \in C^s(X,Y)$ but $g$ is not transversal to $Z$, then there exists some $C^s$-map, as near to $g$ as we please in the $C^s$ topology, which is transversal to $Z$. Thus transversal intersection is typical or generic.
Suppose now that $f: X^n \to Y^m$, and $\operatorname{corank} df(x) = r$, so $\operatorname{rank} df(x) = \min(n,m) - r$. In this case we said that $x \in S_r(f)$, the corank $r$ singularity set of $f$. We seek to show that $S_r(f)$ is a manifold in $X$, and compute its dimension.
Suppose $X^n, Y^m$ are vector spaces, with dimension $n, m$ respectively. As before $L(X,Y)$ is the normed vector space of linear maps from $X$ to $Y$. Let $L_r(X,Y)$ be the subset of $L(X,Y)$ consisting of linear maps with corank $r$.
Lemma 5.3 $L_r(X,Y)$ is a submanifold of $L(X,Y)$ of codimension $(n-z+r)(m-z+r)$, where $z = \min(n,m)$.
Proof Choose bases such that a corank $r$ linear map $S$ is represented by a matrix
$$S = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$
where $A$ is a $k \times k$ block with $\operatorname{rank}(A) = k = z - r$.
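Let $U$ be a neighbourhood of $S$ in $L(X,Y)$ such that for all $S' = \begin{pmatrix} A' & B' \\ C' & D' \end{pmatrix} \in U$, the $k \times k$ block $A'$ is invertible. Define $F: U \to L(\mathbb{R}^{n-k}, \mathbb{R}^{m-k})$ by $F(S') = D' - C'(A')^{-1}B'$. Now $S' \in F^{-1}(0)$ iff $\operatorname{rank}(S') = k$, that is, iff $S'$ has corank $r$. The codimension of $0$ in $L(\mathbb{R}^{n-k}, \mathbb{R}^{m-k})$ is $(n-k)(m-k)$. Since $F$ is a submersion (its derivative in the $D'$ direction is the identity), $F^{-1}(0) = L_r(X,Y) \cap U$ is of codimension $(n-z+r)(m-z+r)$. □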
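The proof of Lemma 5.3 rests on the Schur-complement identity $\operatorname{rank}(S') = k + \operatorname{rank}(D' - C'(A')^{-1}B')$, valid when $A'$ is invertible. The NumPy sketch below checks it on a random matrix of rank $k$ (the sizes and the seed are arbitrary choices) and confirms that $F$ takes values in a space of dimension $(n-k)(m-k)$. As in the text, $S: \mathbb{R}^n \to \mathbb{R}^m$ is represented by an $m \times n$ matrix.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed


def schur_F(S, k):
    """F(S) = D - C A^{-1} B, where A is the leading k x k block of S."""
    A, B = S[:k, :k], S[:k, k:]
    C, D = S[k:, :k], S[k:, k:]
    return D - C @ np.linalg.solve(A, B)


m, n, r = 4, 6, 2  # S : R^n -> R^m is an m x n matrix
z = min(n, m)
k = z - r          # rank of a corank-r map
S = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank k, corank r
print(np.linalg.matrix_rank(S), np.abs(schur_F(S, k)).max())

# Away from L_r: rank(S') = k + rank(F(S')) as long as A' stays invertible.
S_pert = S + 1e-3 * rng.standard_normal((m, n))
print(np.linalg.matrix_rank(S_pert), k + np.linalg.matrix_rank(schur_F(S_pert, k)))

# F takes values in the (m - k) x (n - k) matrices, a space of dimension (n - k)(m - k).
print(schur_F(S, k).shape, (n - k) * (m - k), (n - z + r) * (m - z + r))
```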
Now if $f \in C^s(X,Y)$, then $df \in C^{s-1}(X, L(X,Y))$. If $df(x) \in L_r(X,Y)$ then $x \in S_r(f)$. By the Thom Transversality Theorem, there is a residual set in $C^{s-1}(X, L(X,Y))$ such that $df$ is transversal to $L_r(X,Y)$. But then $S_r(f) = df^{-1}(L_r(X,Y))$ is a submanifold of $X$ of codimension $(n-z+r)(m-z+r)$. Thus we have:
The Singularity Theorem There is a residual set $V$ in $C^s(X,Y)$ such that for all $f \in V$, the corank $r$ singularity set of $f$ is a submanifold in $X$ of codimension $(n-z+r)(m-z+r)$.
If $f: X^n \to \mathbb{R}$ then $\operatorname{codim} S_1(f) = (n-1+1)(1-1+1) = n$. Hence generically the set of critical points will be zero-dimensional, and so critical points will be isolated. Now a Morse function has isolated, non-degenerate critical points. Let $M^s(X)$ be the set of Morse functions on $X$ with the topology induced from $C^s(X, \mathbb{R})$.
Morse Theorem The set $M^s(X)$ of $C^s$-differentiable Morse functions (with non-degenerate critical points) is an open dense set in $C^s(X, \mathbb{R})$. Moreover if $f \in M^s(X)$, then the critical points of $f$ are isolated.
More generally if $f: X^n \to Y^m$ with $m \leq n$, then in the generic case $S_1(f)$ is of codimension $(n-m+1)(m-m+1) = n-m+1$ in $X$, and so $S_1(f)$ will be of dimension $(m-1)$.
Suppose now that $n > 2m - 4$ and $n \geq m$. Then $2n - 2m + 4 > n$, and so $r(n-m+r) > n$ for $r \geq 2$. But generically $\operatorname{codim}(S_r(f)) = r(n-m+r)$, and since $\operatorname{codim}(S_r(f)) > \dim X$, $S_r(f) = \emptyset$ for $r \geq 2$.
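The dimension counts of the Singularity Theorem are easy to tabulate. The sketch below (function names are illustrative) lists the generically non-empty strata $S_r(f)$ together with their dimensions $n - (n-z+r)(m-z+r)$, and can be used to confirm that only $S_1(f)$ survives when $n > 2m - 4$ and $n \geq m$.

```python
def codim_Sr(n, m, r):
    """Generic codimension (n - z + r)(m - z + r) of S_r(f) for f : X^n -> Y^m, z = min(n, m)."""
    z = min(n, m)
    return (n - z + r) * (m - z + r)


def generic_strata(n, m):
    """Map r -> dim S_r(f) for the generically non-empty strata, r >= 1."""
    out = {}
    r = 1
    while r <= min(n, m):
        d = n - codim_Sr(n, m, r)
        if d < 0:
            break  # codimension increases with r, so all higher strata are empty too
        out[r] = d
        r += 1
    return out


# f : X^5 -> R, f : X^4 -> Y^3 (here n > 2m - 4) and f : X^6 -> Y^6 (here n <= 2m - 4).
print(generic_strata(5, 1), generic_strata(4, 3), generic_strata(6, 6))
# {1: 0} {1: 2} {1: 5, 2: 2}
```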
Submanifold Theorem If $Z^z$ is a submanifold of $Y^m$ and $z < m$, then $Z$ is nowhere dense in $Y$ (here $z = \dim(Z)$ and $m = \dim(Y)$).
In the case $n \geq m$, the singularity set $S(f)$ will generically consist of a union of the various corank $r$ singularity submanifolds, for $r \geq 1$. The highest dimension of these is $m - 1$. We shall call such an $S(f)$ a stratified manifold of dimension $(m-1)$. Note also that $S(f)$ will then be nowhere dense in $X$. We also require the following theorem.
Morse Sard Theorem If $f: X^n \to Y^m$ is a $C^s$-map, where $s \geq 1$ and $s > n - m$, then $f(S(f))$ has measure zero in $Y$, and $Y \setminus f(S(f))$, the set of regular values of $f$, is residual and therefore dense in $Y$.
We are now in a position to apply these results to the critical Pareto set. Suppose then that $u = (u_1, \ldots, u_m): X^n \to \mathbb{R}^m$ is a smooth profile on the manifold of feasible states $X$.
Say $x \in \overset{0}{\Theta}(u_1, \ldots, u_m) = \overset{0}{\Theta}(u)$ iff $dL(\lambda, u)(x) = 0$, where $L(\lambda, u) = \sum_{i=1}^m \lambda_i u_i$ and $\lambda \in \mathbb{R}^m_+$, $\lambda \neq 0$.
By Lemma ??, the critical Pareto set, $\Theta(u)$, contains $\overset{0}{\Theta}(u)$ but possibly also contains further points on the boundary of $X$. However we shall regard $\overset{0}{\Theta}$ as the differential analogue of the Pareto set. By Lemma 4.19, $\overset{0}{\Theta}(u)$ must be closed in $X$. Moreover when $n \geq m$ and $x \in \overset{0}{\Theta}(u)$, the differentials $\{du_i(x) : i \in M\}$ must be linearly dependent. Hence $\overset{0}{\Theta}(u)$ must belong to $S(u)$. But also, generically, $S(u)$ will be nowhere dense in $X$. Thus we obtain the following theorem (Smale 1973 and Debreu 1970).
Pareto Theorem There exists a residual set $U$ in $C^1(X, \mathbb{R}^m)$, for $\dim(X) \geq m$, such that for any profile $u \in U$, the closed critical Pareto set $\overset{0}{\Theta}(u)$ belongs to the nowhere dense stratified singularity manifold $S(u)$ of dimension $(m-1)$. Moreover if $\dim(X) > 2m - 4$, then $\Theta(u)$ is itself a manifold of dimension $(m-1)$, for all $u \in U$.
As we have already observed, this result implies that $\Theta(u)$ can generally be regarded as an $(m-1)$-dimensional object parametrised by the $(m-1)$ coefficients $\left(\frac{\lambda_2}{\lambda_1}, \ldots, \frac{\lambda_m}{\lambda_1}\right)$, say. Since points in $\overset{0}{\Theta}(u)$ are characterised by first order conditions alone, it is necessary to examine the Hessian of $L$ to find the Pareto optimal points.
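For intuition, consider a hypothetical profile (not from this section) of two agents in $\mathbb{R}^2$ with Euclidean utilities $u_i(x) = -\|x - x_i^*\|^2$. A point lies in $\overset{0}{\Theta}(u)$ iff $\lambda_1 Du_1(x) + \lambda_2 Du_2(x) = 0$ for some non-zero $\lambda \geq 0$, that is, iff one gradient vanishes or the two gradients are anti-parallel. The sketch below tests this condition; the set it picks out is the segment between the two ideal points, of dimension $m - 1 = 1$. Because these utilities are concave, the Hessian of $L$ is negative definite and every such critical point is in fact Pareto optimal.

```python
def grad_euclidean(x, ideal):
    """Gradient of the (hypothetical) Euclidean utility u(x) = -||x - ideal||^2."""
    return tuple(-2.0 * (xi - ci) for xi, ci in zip(x, ideal))


def is_critical_pareto_2(x, ideal1, ideal2, tol=1e-12):
    """Two agents in R^2: is lambda1 Du1(x) + lambda2 Du2(x) = 0 for some lambda >= 0, lambda != 0?"""
    g1 = grad_euclidean(x, ideal1)
    g2 = grad_euclidean(x, ideal2)
    n1 = g1[0] ** 2 + g1[1] ** 2
    n2 = g2[0] ** 2 + g2[1] ** 2
    if n1 < tol or n2 < tol:
        return True  # a vanishing gradient: put all the weight on that agent
    cross = g1[0] * g2[1] - g1[1] * g2[0]  # zero iff g1 and g2 are linearly dependent
    dot = g1[0] * g2[0] + g1[1] * g2[1]    # negative iff they point in opposite directions
    return abs(cross) < tol * (n1 * n2) ** 0.5 and dot < 0


ideal1, ideal2 = (0.0, 0.0), (4.0, 2.0)
for x in [(2.0, 1.0), (1.0, 0.5), (2.0, 2.0), (6.0, 3.0)]:
    print(x, is_critical_pareto_2(x, ideal1, ideal2))
```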
5.3 Generic Existence of Regular Economies
In this section we outline a proof of the Debreu-Smale Theorem on the Generic Existence of Regular Economies (see Debreu 1970, and Smale 1974a).
As in Sect. 4.4, let $u = (u_1, \ldots, u_m): X^m \to \mathbb{R}^m$ be a smooth profile, where $X = \mathbb{R}^n_+$, the commodity space facing each individual. Let $e = (e_1, \ldots, e_m) \in X^m$ be an initial endowment vector. Given $u$, define the Walras manifold to be the set
$$Z_u = \{(e, x, p) \in X^m \times X^m \times \Delta : (x,p) \text{ is a Walrasian equilibrium for the economy } (e,u)\},$$
where $\Delta$ is the price simplex. That is, $(x,p) = (x_1, \ldots, x_m, p) \in X^m \times \Delta$ satisfies:
1. individual optimality: $D^*u_i(x_i) = p$ for $i \in M$;
2. individual budget constraints: $\langle p, x_i \rangle = \langle p, e_i \rangle$ for $i \in M$;
3. social resource constraints: $\sum_{i=1}^m x_{ij} = \sum_{i=1}^m e_{ij}$ for each commodity $j = 1, \ldots, n$.
Note that we implicitly assume that each individual's utility function, $u_i$, is defined on a domain $X_i \equiv X \subset \mathbb{R}^n_+$, so that the differential $du_i(x_i)$ at $x_i \in X_i$ can be represented by a vector $Du_i(x_i) \in \mathbb{R}^n$. As we saw in Chap. 4, we may normalize
$Du_i$ and $p$ so the optimality condition for $i$ becomes $D^*u_i(x) = p$ for $p \in \Delta$. For the space of normalized price vectors, we may identify $\Delta$ with $\{p \in \mathbb{R}^n_+ : \|p\| = 1\}$. Observe that $\dim(\Delta) = n - 1$.
We seek to show that there is a residual set $U$ in $C^s(X, \mathbb{R}^m)$ such that the Walras manifold is a smooth manifold of dimension $mn$.
Now define the Debreu projection
$$\pi: Z_u \subset X^m \times X^m \times \Delta \to X^m: (e, x, p) \mapsto e.$$
Note that both $Z_u$ and $X^m$ will then have dimension $mn$. By the Morse Sard Theorem the set
$$V = \{e \in X^m : d\pi \text{ has rank } nm \text{ at every } (e,x,p) \in \pi^{-1}(e)\}$$
of regular values of $\pi$ is dense in $X^m$. Say the economy $(e,u)$ is regular if $e$ is a regular value of $\pi$, that is, $\operatorname{rank} d\pi = mn$ at every $(e,x,p) \in Z_u$ with $\pi(e,x,p) = e$.
When $e$ is a regular value of $\pi$, then by the inverse function theorem,
$$\pi^{-1}(e) = \{(e,x,p)_1, (e,x,p)_2, \ldots, (e,x,p)_k\}$$
is a zero-dimensional object, and thus consists of isolated points (finitely many, provided $\pi^{-1}(e)$ is compact). Thus for each $e \in V$, the Walrasian equilibria associated with $e$ will be finite in number. Moreover there will exist a neighbourhood $N$ of $e$ in $V$ such that the Walrasian equilibria move continuously with $e$ in $N$.
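To make conditions 1-3 concrete, the sketch below computes the Walrasian equilibrium of a hypothetical two-agent, two-good exchange economy with Cobb-Douglas utilities $u_i(x) = a_i \log x_1 + (1 - a_i)\log x_2$, defined on the interior of $\mathbb{R}^2_+$ (an assumption made here; the text works with general smooth profiles). Market clearing for good 1 is linear in $p$, so the equilibrium price ray is unique, consistent with a regular economy having finitely many equilibria. Prices are normalised so that $\|p\| = 1$, as in the identification of $\Delta$ above.

```python
import math


def cobb_douglas_equilibrium(alphas, endowments):
    """Walrasian equilibrium (p, x) of a hypothetical 2-good exchange economy
    with u_i(x) = a_i log x1 + (1 - a_i) log x2, prices normalised to ||p|| = 1."""
    # Clearing good 1: sum_i a_i <p, e_i> = p1 sum_i e_i1, which is linear in p and gives
    # p1 / p2 = sum_i a_i e_i2 / sum_i (1 - a_i) e_i1.
    p1 = sum(a * e[1] for a, e in zip(alphas, endowments))
    p2 = sum((1 - a) * e[0] for a, e in zip(alphas, endowments))
    norm = math.hypot(p1, p2)
    p = (p1 / norm, p2 / norm)
    allocation = []
    for a, e in zip(alphas, endowments):
        wealth = p[0] * e[0] + p[1] * e[1]  # <p, e_i>
        allocation.append((a * wealth / p[0], (1 - a) * wealth / p[1]))
    return p, allocation


def normalised_gradient(a, x):
    """D*u_i(x) = Du_i(x) / ||Du_i(x)|| for the Cobb-Douglas utility above."""
    g = (a / x[0], (1 - a) / x[1])
    norm = math.hypot(*g)
    return (g[0] / norm, g[1] / norm)


alphas = [0.3, 0.6]
endowments = [(1.0, 2.0), (3.0, 1.0)]
p, x = cobb_douglas_equilibrium(alphas, endowments)
print(p, x)
print([normalised_gradient(a, xi) for a, xi in zip(alphas, x)])  # each equals p
```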
Proof of the Generic Regularity of the Debreu Projection Define $\psi_u: X^m \times \Delta \to \Delta^{m+1}$, where $u \in C^r(X, \mathbb{R}^m)$, by $\psi_u(x,p) = (D^*u_1(x_1), \ldots, D^*u_m(x_m), p)$, where $x = (x_1, \ldots, x_m)$ and $u = (u_1, \ldots, u_m)$. Let $I$ be the diagonal $\{(p, \ldots, p)\}$ in $\Delta^{m+1}$. If $(x,p) \in \psi_u^{-1}(I)$ then for each $i$, $D^*u_i(x_i) = p$, and so the first order individual optimality conditions are satisfied. By the Thom Transversality Theorem there is a residual set (in fact an open dense set) of profiles $U$ such that $\psi_u$ is transversal to $I$ for each $u \in U$. But then the codimension of $\psi_u^{-1}(I)$ in $X^m \times \Delta$ equals the codimension of $I$ in $\Delta^{m+1}$.
Now $\Delta$ and $I$ are both of dimension $(n-1)$, and so the codimension of $I$ in $\Delta^{m+1}$ is $(m+1)(n-1) - (n-1) = m(n-1)$. Thus $\dim(X^m \times \Delta) - \dim(\psi_u^{-1}(I)) = m(n-1)$ and $\dim(\psi_u^{-1}(I)) = mn + (n-1) - m(n-1) = n + m - 1$, for all $u \in U$.
Now let $e \in X^m$ be the initial endowment vector and
$$Y(e) = \Big\{(x_1, \ldots, x_m) \in X^m : \sum_{i=1}^m x_i = \sum_{i=1}^m e_i\Big\}$$
be the set of feasible outcomes, the intersection of $\mathbb{R}^{nm}_+$ with a linear manifold of dimension $n(m-1)$. For each $i$, let $B_i(p) = \{x_i \in X : \langle p, x_i \rangle = \langle p, e_i \rangle\}$ be the hyperplane through the boundary of the $i$th budget set at the price vector $p$.
Define
$$\Sigma(e) = \{(x,p) \in X^m \times \Delta : x \in Y(e),\ x_i \in B_i(p)\ \forall i \in M\}, \quad\text{and}\quad \Gamma = \{(e,x,p) : e \in X^m,\ (x,p) \in \Sigma(e)\}.$$
As discussed in Chap. 4, $Y(e)$ is characterized by $n$ linear equations, while the budget restrictions induce a further $(m-1)$ linear restraints (the $m$th budget restraint is redundant). Thus the dimension of $\Gamma$ is $2mn + (n-1) - n - (m-1) = 2mn - m$. (In fact, if $\Delta$ is taken to be the $(n-1)$-dimensional simplex, then $\Gamma$ will be a linear manifold of dimension $2mn - m$.) More generally, $\Gamma$ will be a submanifold of $X^m \times X^m \times \Delta$ of dimension $2mn - m$. At each point the projection is a regular map (i.e., the rank of the differential of $(e,x,p) \mapsto (x,p)$ is maximal).
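To see this define $\phi: X^m \times X^m \times \Delta \to \mathbb{R}^n \times \mathbb{R}^{m-1}$ by
$$\phi(e,x,p) = \Big(\sum_{i=1}^m x_i - \sum_{i=1}^m e_i,\ \langle p, x_1 \rangle - \langle p, e_1 \rangle, \ldots, \langle p, x_{m-1} \rangle - \langle p, e_{m-1} \rangle\Big).$$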
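Clearly if $\phi(e,x,p) = 0$ then $x \in Y(e)$ and $x_i \in B_i(p)$ for each $i$. But $0$ is of codimension $n + m - 1$ in $\mathbb{R}^n \times \mathbb{R}^{m-1}$; thus $\phi^{-1}(0)$ is of the same codimension in $X^{2m} \times \Delta$. Thus $\dim(X^{2m} \times \Delta) - \dim \phi^{-1}(0) = n + m - 1$ and $\dim \phi^{-1}(0) = 2nm + (n-1) - (n+m-1) = 2mn - m$ (giving the dimension of $\Gamma$). In a similar fashion, with $e$ fixed, $(x,p) \in \Sigma(e)$ iff $\phi(e,x,p) = 0$, and so
$$\dim(X^m \times \Delta) - \dim \phi(e,\cdot,\cdot)^{-1}(0) = n + m - 1.$$
Thus $\Sigma(e)$ is a submanifold of $X^m \times \Delta$ of dimension
$$nm + (n-1) - (n+m-1) = mn - m.$$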
Finally define $Z_u=\{(e,x,p)\in\Gamma:\psi_u(x,p)\in I\}$. For each $u\in U$, $Z_u$ is a submanifold of $X^{2m}\times\Delta$ of dimension $mn$.
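To see this, let $f_u(e,x,p)=\psi_u(x,p)$. Then
$$f_u:\Gamma\to X^m\times X^m\times\Delta\to X^m\times\Delta\xrightarrow{\ \psi_u\ }\Delta^{m+1}.$$
As we observed, for all $u\in U$, $\psi_u$ is transversal to $I$ in $\Delta^{m+1}$. But the codimension of $I$ in $\Delta^{m+1}$ is $m(n-1)$. Since the projection $\Gamma\to X^m\times\Delta$ is a regular map, $f_u$ will also be transversal to $I$, and so
$$\dim(\Gamma)-\dim(f_u^{-1}(I))=m(n-1).$$
Hence $\dim(f_u^{-1}(I))=(2mn-m)-m(n-1)=mn$. Clearly $Z_u=f_u^{-1}(I)$. Thus for all $u\in U$, the Debreu projection $\pi:Z_u\to X^m$ will be a $C^1$-map between manifolds of dimension $mn$. The Morse-Sard Theorem gives the result. □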
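The dimension count in this proof is easy to get wrong, so here is a small Python sketch that recomputes each quantity from the counts stated in the text ($\dim X=n$, $\dim\Delta=n-1$, $n$ market-clearing equations and $m-1$ independent budget equations). The function name `regular_economy_dims` is our own, and the sketch only checks the arithmetic, not the transversality arguments.

```python
def regular_economy_dims(m: int, n: int) -> dict:
    """Recompute the dimensions used in the regular-economy proof.

    m agents, n goods; X is assumed n-dimensional and the price
    simplex Delta (n - 1)-dimensional, as in the text.
    """
    dim_x, dim_delta = n, n - 1
    constraints = n + (m - 1)  # market clearing + independent budgets
    dim_gamma = 2 * m * dim_x + dim_delta - constraints
    dim_sigma = m * dim_x + dim_delta - constraints
    # I is the diagonal of Delta^(m+1), a copy of Delta.
    codim_i = (m + 1) * dim_delta - dim_delta
    dim_z = dim_gamma - codim_i  # Z_u = f_u^{-1}(I), f_u transversal to I
    return {"Gamma": dim_gamma, "Sigma(e)": dim_sigma,
            "codim I": codim_i, "Z_u": dim_z, "X^m": m * dim_x}


for m in range(2, 6):
    for n in range(2, 6):
        d = regular_economy_dims(m, n)
        assert d["Gamma"] == 2 * m * n - m
        assert d["Sigma(e)"] == m * n - m
        assert d["Z_u"] == d["X^m"] == m * n  # domain and target of pi agree
```

The last assertion is the point of the argument: the Debreu projection $\pi:Z_u\to X^m$ is a map between manifolds of the same dimension $mn$, which is what makes the Morse-Sard Theorem applicable.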
Fig. 5.6 The Debreu map
Thus we have shown that for each smooth profile $u$ in an open dense set $U$, there exists an open dense set $V$ of initial endowments such that $(e,u)$ is a regular economy for all $e\in V$.
The result is also related to the existence of a demand function for an economy. A demand function for $i$ (with utility $u_i$) is the function
$$f_i:\Delta\times\mathbb{R}_+\to X$$
where $f_i(p,I)$ is that $x_i\in X$ which maximizes $u_i$ on
$$B_i(p,I)=\{x\in X:\langle p,x\rangle=I\}.$$
Now define $\varphi_i:X\to\Delta\times\mathbb{R}_+$ by $\varphi_i(x)=(D^*u_i(x),\langle D^*u_i(x),x\rangle)$. But the optimality condition is precisely that $D^*u_i(x)=p$ and $\langle D^*u_i(x),x\rangle=\langle p,x\rangle=I$. Thus when $\varphi_i$ has maximal rank, it is locally invertible (by the inverse function theorem) and so locally defines a demand function.
On the other hand, if $f_i$ is a $C^1$-function then $\varphi_i$ must be locally invertible (by $f_i$). If this is true for all the agents, then $\psi_u:X^m\times\Delta\to\Delta^{m+1}$ must be transversal to $I$. Consequently, if $u=(u_1,\dots,u_m)$ is such that each $u_i$ defines a $C^1$-demand function $f_i:\Delta\times\mathbb{R}_+\to X$, then $u\in U$, the open dense set of the regular economy theorem.
As a final note, suppose that $u\in U$ and $e$ is a critical value of the Debreu projection. Then it is possible that $\pi^{-1}(e)=\{e\}\times W$ where $W$ is a continuum of Walrasian equilibria. Another possibility is that there is a continuum of singular or catastrophic endowments $C$, so that as the endowment vector crosses $C$ the number of Walrasian equilibria changes suddenly. As we discussed in Sect. 4.4, at a “catastrophic” endowment, stable and unstable Walrasian equilibria may merge (see Balasko 1975).
Another question is whether for every smooth profile $u$ and every endowment vector $e$ there exists a Walrasian equilibrium $(x,p)$. This is equivalent to the requirement that for every $u$ the projection $Z_u\to X^m$ is onto $X^m$.
In this case for each $e\in X^m$ there will exist some $(e,x,p)\in Z_u$. This is clearly necessary if there is to exist a market clearing price equilibrium $p$ for the economy $(e,u)$.
The usual general equilibrium arguments to prove existence of a market clearing price equilibrium typically rely on convexity properties of preference (see Chap. 3). However, weaker assumptions on preference permit the use of topological arguments to show the existence of an extended Walrasian equilibrium (where only the first order conditions are satisfied).
When the market-clearing price equilibrium does exist, it is useful to consider a price adjustment process (or “auctioneer”) to bring about equilibrium.
5.4 Economic Adjustment and Excess Demand
To further develop the idea of a demand function and price adjustment process, we consider the following famous example of Scarf (1960).
Example 5.6 There are three individuals $i\in M=\{1,2,3\}$ and three commodities. Write $(i,j,k)$ for a cyclic ordering of the goods, so that $j$ is the good following $i$ in the order $1\to2\to3\to1$ and $k$ is the remaining good. Agent $i$ has utility $u_i(x^i_1,x^i_2,x^i_3)=\min(x^i_i,x^i_j)$, where $x^i=(x^i_1,x^i_2,x^i_3)\in\mathbb{R}^3_+$ is the $i$th commodity space. At income $I$ and price vector $p=(p_1,p_2,p_3)$, $i$ demands equal amounts of $x^i_i$ and $x^i_j$ and zero of $x^i_k$: thus $(p_i+p_j)x=I$, so $x^i_i=I(p_i+p_j)^{-1}=x^i_j$.
Suppose the initial endowment $e_i$ of agent $i$ is 1 unit of the $i$th good, and nothing of the $j$th and $k$th. Then $I=p_i$ and so $i$'s demand function $f_i$ has the form
$$f_i(p)=(f_{ii}(p),f_{ij}(p),f_{ik}(p))=\Big(\frac{p_i}{p_i+p_j},\frac{p_i}{p_i+p_j},0\Big)\in\mathbb{R}^3.$$
The excess demand function of $i$ is $\xi_i(p)=f_i(p)-e_i$. Since $e_i=(e_{ii},e_{ij},e_{ik})=(1,0,0)$ this gives
$$\xi_i(p)=\Big(\frac{-p_j}{p_i+p_j},\frac{p_i}{p_i+p_j},0\Big)=(\xi_{ii},\xi_{ij},\xi_{ik}).$$
Suppose now the other two consumers are described by cyclic permutation of subscripts, e.g., $j$ has 1 unit of the $j$th good and utility $u_j(x^j_j,x^j_k,x^j_i)=\min(x^j_j,x^j_k)$, etc. Then the total excess demand at $p$ is
$$\xi(p)=\sum_{i=1}^{3}\xi_i(p)\in\mathbb{R}^3.$$
For example, the excess demand in commodity $j$ is:
$$\xi_j=\xi_{1j}+\xi_{2j}+\xi_{3j}=\frac{p_i}{p_i+p_j}-\frac{p_k}{p_j+p_k}.$$
Since each $i$ chooses $f_i(p)$ to maximize utility subject to $\langle p,f_i(p)\rangle=I=\langle p,e_i\rangle$, we expect
$$\sum_{i=1}^{3}\langle p,f_i(p)-e_i\rangle=0.$$
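To see this note that
$$\langle p,\xi(p)\rangle=\Big\langle p,\Big(\frac{p_3}{p_3+p_1}-\frac{p_2}{p_1+p_2},\ \frac{p_1}{p_1+p_2}-\frac{p_3}{p_2+p_3},\ \frac{p_2}{p_2+p_3}-\frac{p_1}{p_1+p_3}\Big)\Big\rangle=0,$$
since the six terms of the scalar product cancel in pairs.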
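The identity $\langle p,\xi(p)\rangle=0$ (Walras' Law) and the equilibrium at equal prices can be checked with exact rational arithmetic. The following Python sketch indexes agents and goods 0, 1, 2 rather than 1, 2, 3, and its function names are our own; it builds $\xi(p)$ agent by agent from the demand $p_i/(p_i+p_j)$ derived above.

```python
from fractions import Fraction


def scarf_excess_demand(p):
    """Total excess demand in Scarf's example (goods/agents indexed 0, 1, 2).

    Agent i owns one unit of good i and has utility min(x_i, x_j) with
    j = i + 1 (mod 3), so it demands p_i / (p_i + p_j) of goods i and j.
    """
    xi = [0, 0, 0]
    for i in range(3):
        j = (i + 1) % 3
        demand = p[i] / (p[i] + p[j])
        xi[i] += demand - 1  # sells its endowment of good i
        xi[j] += demand
    return xi


def walras_value(p):
    """<p, xi(p)>, which Walras' Law says is identically zero."""
    return sum(pk * xk for pk, xk in zip(p, scarf_excess_demand(p)))


third = Fraction(1, 3)
print(scarf_excess_demand([third, third, third]))  # all three entries are 0
p = [Fraction(2, 3), Fraction(1, 6), Fraction(1, 6)]
print(scarf_excess_demand(p))  # [Fraction(0, 1), Fraction(3, 10), Fraction(-3, 10)]
print(walras_value(p))         # 0
```

Exact fractions are used so that the cancellation in Walras' Law shows up as an exact zero rather than as floating-point noise.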
The equation $\langle p,\xi(p)\rangle=0$ is known as Walras' Law. To interpret it, suppose we let $\Delta$ be the simplex in $\mathbb{R}^3$ of price vectors such that $\|p\|=1$ and $p_i\ge0$. Walras' Law says that the excess demand vector $\xi(p)$ is orthogonal to the vector $p$. In other words, $\xi(p)$ may be thought of as a tangent vector in $\Delta$. (This is easier to see if we identify $\Delta$ with the positive octant of the sphere $S^2$.)
We may therefore consider a price adjustment process, which changes the price vector $p(t)$ at time $t$ by the differential equation
$$\frac{dp(t)}{dt}=\xi(p).\qquad(*)$$
This adjustment process is a vector field on $\Delta$: that is, at every $p$ there exists a rule that changes $p(t)$ by the equation $dp(t)/dt=\xi(p)$.
If at a vector $p^*$ the excess demand $\xi(p^*)=0$, then $\frac{dp(t)}{dt}\big|_{p^*}=0$, and the price adjustment process has a stationary point. The flow on $\Delta$ can be obtained by integrating the differential equation. It is easy to see that if $p^*$ satisfies $p^*_1=p^*_2=p^*_3$ then $\xi(p^*)=0$, so there clearly is a price equilibrium where excess demand is zero.
The price adjustment equation (*) does not result in a flow into the price equilibrium.
To see this, compute the scalar product
$$\begin{aligned}\big\langle(p_2p_3,p_1p_3,p_1p_2),\xi(p)\big\rangle&=\frac{p_3(p_1^2-p_2^2)}{p_1+p_2}+\frac{p_2(p_3^2-p_1^2)}{p_1+p_3}+\frac{p_1(p_2^2-p_3^2)}{p_2+p_3}\\&=p_3(p_1-p_2)+p_2(p_3-p_1)+p_1(p_2-p_3)\\&=0.\end{aligned}$$
But if $\xi(p)=dp/dt$ then we obtain
$$p_2p_3\frac{dp_1}{dt}+p_1p_3\frac{dp_2}{dt}+p_1p_2\frac{dp_3}{dt}=0.$$
The left-hand side is precisely $\frac{d}{dt}\big(p_1(t)p_2(t)p_3(t)\big)$, so the obvious solution to this equation is that $p_1(t)p_2(t)p_3(t)=\text{constant}$. In other words, when the adjustment process satisfies (*), the price vector $p$ (regarded as a function of time $t$) satisfies the equation $p_1(t)p_2(t)p_3(t)=\text{constant}$. The flow through any point $p=(p_1,p_2,p_3)$, other than the equilibrium price vector $p^*$, is then homeomorphic to a circle $S^1$ inside $\Delta$.
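To see the closed orbits numerically, one can integrate (*) with a classical fourth-order Runge-Kutta step (our choice of integrator; the step size, horizon and starting point are arbitrary) and track $p_1p_2p_3$ along the path. Because $\langle p,\xi(p)\rangle=0$, the squared length $\|p\|^2$ should also be conserved, so the computed path stays on one circle in the positive octant of a sphere. The path is computed in $\mathbb{R}^3_+$ rather than projected onto $\Delta$, and the function names are our own.

```python
def scarf_xi(p):
    """Scarf excess demand (goods indexed 0, 1, 2), float version."""
    xi = [0.0, 0.0, 0.0]
    for i in range(3):
        j = (i + 1) % 3
        d = p[i] / (p[i] + p[j])
        xi[i] += d - 1.0
        xi[j] += d
    return xi


def rk4_step(f, p, h):
    """One classical Runge-Kutta step for dp/dt = f(p)."""
    def shifted(k, c):
        return [a + c * b for a, b in zip(p, k)]
    k1 = f(p)
    k2 = f(shifted(k1, h / 2))
    k3 = f(shifted(k2, h / 2))
    k4 = f(shifted(k3, h))
    return [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(p, k1, k2, k3, k4)]


def simulate(p0, h=0.01, steps=5000):
    """Integrate the price adjustment process (*) from p0 up to t = h * steps."""
    path = [list(p0)]
    for _ in range(steps):
        path.append(rk4_step(scarf_xi, path[-1], h))
    return path


path = simulate([0.5, 0.3, 0.2])
products = [p[0] * p[1] * p[2] for p in path]
norms = [sum(x * x for x in p) for p in path]
# Both spreads are tiny (integration error only): the orbit neither
# spirals in to p* nor out to the boundary.
print(max(products) - min(products), max(norms) - min(norms))
```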
Just to illustrate, consider a vector $p$ with $p_3=0$. Then
$$\xi(p)=\Big(\frac{-p_2}{p_1+p_2},\frac{p_1}{p_1+p_2},0\Big).$$
Because we have drawn the flow on the simplex $\Delta=\{p\in\mathbb{R}^3_+:\sum_i p_i=1\}$, the flow $\frac{dp}{dt}(t)=\xi(p)$ is discontinuous in $p$ at the three vertices of $\Delta$. However, in the interior of $\Delta$ the flow is essentially circular (and anti-clockwise).
See Fig. 5.7. To examine the nature of the flow given by the differential equation $\frac{dp}{dt}(t)=\xi(p)$, define a Lyapunov function at $p(t)$ to be $L(p(t))=\frac12\sum_{i=1}^{3}(p_i(t)-p^*_i)^2$, where $p^*=(p^*_1,p^*_2,p^*_3)$ is the equilibrium price vector satisfying $\xi(p^*)=0$. Since
Fig. 5.7 Flow on the price simplex
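$p^*\in\Delta$, we may choose $p^*=(\frac13,\frac13,\frac13)$. Then
$$\begin{aligned}\frac{dL}{dt}&=\sum_{i=1}^{3}\big(p_i(t)-p^*_i\big)\frac{dp_i}{dt}\\&=\sum_{i=1}^{3}\xi_i(p(t))\,p_i(t)-\sum_{i=1}^{3}p^*_i\,\xi_i(p(t))\\&=-\frac13\sum_{i=1}^{3}\xi_i(p(t)).\end{aligned}$$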
(This follows since $\langle p,\xi(p)\rangle=0$.) If $\xi_i(p(t))>0$ for $i=1,2,3$ then $dL/dt<0$ and so the Lyapunov distance $L(p(t))$ of $p(t)$ from $p^*$ decreases as $t\to\infty$. In other words, if $\delta p(t)=p(t)-p^*$ then the distance $\|\delta p(t)\|\to0$ as $t\to\infty$. The equilibrium $p^*$ is then said to be stable.
If on the contrary $\xi_i(p(t))<0$ for all $i$, then $dL/dt>0$ and $\|\delta p(t)\|$ increases as $t\to\infty$. In this case $p^*$ is called unstable.
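The identity $dL/dt=-\frac13\sum_i\xi_i$ can be checked exactly. The Python sketch below uses exact `Fraction` arithmetic with $p^*=(\frac13,\frac13,\frac13)$ as in the text; the sample price vectors (a point with equal second and third prices, and two points obtained from it by moving $\frac1{10}$ between goods 2 and 3) are our own choice. It computes $dL/dt=\langle p-p^*,\xi(p)\rangle$ directly, compares it with $-\frac13\sum_i\xi_i(p)$, and shows that its sign is not fixed along the flow.

```python
from fractions import Fraction as F

P_STAR = (F(1, 3), F(1, 3), F(1, 3))


def scarf_xi(p):
    """Scarf excess demand with goods indexed 0, 1, 2."""
    xi = [F(0)] * 3
    for i in range(3):
        j = (i + 1) % 3
        d = p[i] / (p[i] + p[j])
        xi[i] += d - 1
        xi[j] += d
    return xi


def dL_dt(p):
    """Time derivative of L = (1/2)|p - p*|^2 along dp/dt = xi(p)."""
    return sum((a - b) * x for a, b, x in zip(p, P_STAR, scarf_xi(p)))


samples = [
    (F(2, 3), F(1, 6), F(1, 6)),
    (F(2, 3), F(1, 15), F(4, 15)),
    (F(2, 3), F(4, 15), F(1, 15)),
]
for p in samples:
    assert dL_dt(p) == -sum(scarf_xi(p)) / 3  # uses <p, xi(p)> = 0
    print([float(x) for x in scarf_xi(p)], float(dL_dt(p)))
```

At the first point $dL/dt=0$, at the second it is positive and at the third negative, which is the oscillating behaviour of the Lyapunov distance discussed next.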
However, it is easy to show that the equilibrium point $p^*$ is neither stable nor unstable. To see this consider the price vector $p(t)=(\frac23,\frac16,\frac16)$. It is then easy to show that $\xi=(0,\frac{3}{10},-\frac{3}{10})$, so the flow through $p$ (locally) keeps the distance $L(p(t))$ constant. To see how $L(p(t))$ behaves near $p(t)$, consider the points $p(t-\delta t)=(\frac23,\frac16-\frac1{10},\frac16+\frac1{10})$ and $p(t+\delta t)=(\frac23,\frac16+\frac1{10},\frac16-\frac1{10})$. After some easy arithmetic we find that $\xi(p(t-\delta t))\approx(0.195,0.109,-0.514)$, so that $\|\delta p(t-\delta t)\|$ is increasing at $p(t-\delta t)$. On the other hand $\xi(p(t+\delta t))\approx(-0.195,0.514,-0.109)$, so $\|\delta p(t+\delta t)\|$ is decreasing at $p(t+\delta t)$. In other words, the total excess demand $\sum_{i=1}^{3}\xi_i(p(t))$ oscillates about zero as we traverse one of the closed orbits, so the distance $\|\delta p(t)\|$ increases then decreases.
The Scarf Example gives a way to think more abstractly about the process of
price adjustment in an economy.
As we have observed, the differential equation $dp/dt=\xi(p)$ on $\Delta$ defines a flow in $\Delta$. That is, if we consider any point $p_0\in\Delta$ and then solve the equation for $p$, we obtain an “orbit”
$$\Big\{p(t)\in\Delta,\ t\in(-\infty,\infty):p(0)=p_0\ \text{and}\ \frac{dp}{dt}=\xi(p(t))\Big\}$$
Fig. 5.8 Smoothing the Scarf profile
that commences at the point $p(0)=p_0$, and gives the past and future trajectory. Because the differential equation has a unique solution, any point $p_0$ can belong to only one orbit. As we saw, each orbit in the example satisfies the equation $p_1(t)\cdot p_2(t)\cdot p_3(t)=\text{constant}$. The phase portrait of the differential equation is its collection of orbits.
The differential equation $dp/dt=\xi(p)$ assigns to each point $p\in\Delta$ a vector $\xi(p)\in\mathbb{R}^n$, and so $\xi$ may be regarded as a function $\xi:\Delta\to\mathbb{R}^n$. In fact $\xi$ is a continuous map except at the vertices of $\Delta$. This discontinuity only occurs because $\Delta$ itself is not smooth at its vertices. If we ignore this boundary feature, then we may write $\xi\in C^0(\Delta,\mathbb{R}^n)$, where $C^0$ as before stands for the set of continuous maps. In fact, if we examine $\xi$ as a function of $p$ then it can be seen to be differentiable, so $\xi\in C^1(\Delta,\mathbb{R}^n)$. $C^1(\Delta,\mathbb{R}^n)$ has a natural metric, and therefore the set $C^1(\Delta,\mathbb{R}^n)$ can be given the $C^1$-topology. A differential equation $dp/dt=\xi(p)$ of this kind can thus be treated as an element of $C^1(\Delta,\mathbb{R}^n)$, in which case it is called a vector field. $C^1(\Delta,\mathbb{R}^n)$ with the $C^1$-topology is written $\mathcal{C}^1(\Delta,\mathbb{R}^n)$ or $V^1(\Delta)$. We shall also write $P(\Delta)$ for the collection of phase portraits on $\Delta$. Once the vector field $\xi$ is specified, this defines the phase portrait $\tau(\xi)$ of $\xi$.
In the example, $\xi$ was determined by the utility profile $u$ and endowment vector $e\in\mathbb{R}^{3\times3}$. As Fig. 5.8 illustrates, the profile $u$ can be smoothed by rounding each $u_i$ without changing the essence of the example.
5.5 Structural Stability of a Vector Field
More abstractly, then, we can view the excess demand function $\xi$ as a map from $C^s(X,\mathbb{R}^m)\times X^m$ to the metric space of vector fields on $\Delta$: that is,
$$\xi:C^s(X,\mathbb{R}^m)\times X^m\longrightarrow V^1(\Delta).$$
The genericity theorem given above implies that, in fact, there is an open dense set $U$ in $C^s(X,\mathbb{R}^m)$ such that $\xi$ is indeed a $C^1$ vector field on $\Delta$. Moreover, $\xi$ is an excess demand function obtained from the individual demand functions $\{f_i\}$ as described above.
Fig. 5.9 Dissimilar phase portraits
An obvious question to ask is how $\xi$ “changes” as the parameters $u\in C^s(X,\mathbb{R}^m)$ and $e\in X^m$ change. One way to do this is to consider small perturbations in a vector field $\xi$ and determine how the phase portrait of $\xi$ changes.
It should be clear from the Scarf example that small perturbations in the utility profile or in $e$ may be sufficient to change $\xi$ so that the orbits change in a qualitative way. If two vector fields $\xi_1$ and $\xi_2$ have phase portraits that are homeomorphic, then $\tau(\xi_1)$ and $\tau(\xi_2)$ are qualitatively identical (or similar). Thus we say $\xi_1$ and $\xi_2$ are similar vector fields if there is a homeomorphism $h:\Delta\to\Delta$ such that each orbit in the phase portrait $\tau(\xi_1)$ of $\xi_1$ is mapped by $h$ to an orbit in $\tau(\xi_2)$.
As we saw in the Scarf example, each of the orbits of the excess demand function, $\xi_1$ say, other than the equilibrium $p^*$ itself, is a closed orbit (homeomorphic to $S^1$). Now consider the vector field $\xi_2$ whose orbits approach an equilibrium price vector $p^*$. The phase portraits of $\xi_1$ and $\xi_2$ are given in Fig. 5.9.
The price equilibrium in Fig. 5.9(b) is stable since $\lim_{t\to\infty}p(t)=p^*$. Obviously each of the orbits of $\xi_2$ is homeomorphic to the half open interval $(-\infty,0]$. Moreover, $(-\infty,0]$ and $S^1$ are not homeomorphic, so $\xi_1$ and $\xi_2$ are not similar.
It is intuitively obvious that the vector field $\xi_2$ can be obtained from $\xi_1$ by a “small perturbation”, in the sense that $\|\xi_1-\xi_2\|<\delta$ for some small $\delta>0$. When there exists a small perturbation $\xi_2$ of $\xi_1$ such that $\xi_1$ and $\xi_2$ are dissimilar, then $\xi_1$ is called structurally unstable. On the other hand, it should be plausible that for any small perturbation $\xi_3$ of $\xi_2$, $\xi_3$ will have a phase portrait $\tau(\xi_3)$ homeomorphic to $\tau(\xi_2)$, so $\xi_2$ and $\xi_3$ will be similar. Then $\xi_2$ is called structurally stable. Notice that structural stability of $\xi_2$ is a much more general property than stability of the equilibrium point $p^*$ (where $\xi_2(p^*)=0$).
All that we have said on $\Delta$ can be generalised to the case of a smooth manifold $Y$. So let $V^1(Y)$ be the topological space of $C^1$-vector fields on $Y$ and $P(Y)$ the collection of phase portraits on $Y$.
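A minimal numerical illustration of structural instability uses the linear centre $\dot x=-y,\ \dot y=x$ as a stand-in for the closed orbits of $\xi_1$ (an assumption made for simplicity: this is not the Scarf field itself). Adding the perturbation $-\varepsilon(x,y)$, whose size on the unit disk is at most $|\varepsilon|$, turns every closed orbit into a spiral, since the radius then evolves as $e^{-\varepsilon t}$; so the perturbed portrait is not similar to the original however small $\varepsilon\neq0$ is. The RK4 integrator and the function name are our own choices.

```python
import math


def final_radius(eps, t_end=10.0, h=0.001):
    """Integrate dx/dt = -y - eps*x, dy/dt = x - eps*y from (1, 0) by RK4.

    eps = 0 is the unperturbed centre (closed circular orbits); eps != 0
    adds the perturbation -eps*(x, y). Returns the distance from the
    origin at time t_end.
    """
    def f(x, y):
        return (-y - eps * x, x - eps * y)

    x, y = 1.0, 0.0
    for _ in range(round(t_end / h)):
        k1 = f(x, y)
        k2 = f(x + h / 2 * k1[0], y + h / 2 * k1[1])
        k3 = f(x + h / 2 * k2[0], y + h / 2 * k2[1])
        k4 = f(x + h * k3[0], y + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return math.hypot(x, y)


for eps in (0.0, 0.05, -0.05):
    print(eps, final_radius(eps), math.exp(-eps * 10.0))  # numeric vs exact radius
```

With $\varepsilon>0$ the origin becomes an attractor and with $\varepsilon<0$ a repellor, mirroring the two perturbations of the Scarf field in Fig. 5.15.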
Definition 5.1
(1) Let $\xi_1,\xi_2\in V^1(Y)$. Then $\xi_1$ and $\xi_2$ are said to be similar (written $\xi_1\sim\xi_2$) iff there is a homeomorphism $h:Y\to Y$ such that an orbit $\sigma$ is in the phase portrait $\tau(\xi_1)$ of $\xi_1$ iff $h(\sigma)$ is in the phase portrait $\tau(\xi_2)$ of $\xi_2$.
(2) The vector field $\xi$ is structurally stable iff there exists an open neighborhood $V$ of $\xi$ in $V^1(Y)$ such that $\xi'\sim\xi$ for all $\xi'\in V$.
Fig. 5.10 A source
(3) A property $K$ of vector fields in $V^1(Y)$ is generic iff the set $\{\xi\in V^1(Y):\xi\ \text{satisfies}\ K\}$ is residual in $V^1(Y)$.
As before, a residual set $V$ is the countable intersection of open dense sets, and when $V^1(Y)$ is a “Baire” space, $V$ will itself be dense.
It was conjectured that structural stability is a generic property. This is true if the dimension of $Y$ is 2, but is false otherwise (Smale 1966; Peixoto 1962).
Before discussing the Peixoto-Smale Theorems, it will be useful to explore further how we can qualitatively “classify” the set of phase portraits on a manifold $Y$. The essential feature of this classification concerns the nature of the critical or singularity points of the vector field on $Y$ and how these are constrained by the topological nature of $Y$.
Example 5.7 Let us return to the example of the torus $Z=S^1\times S^1$ examined in Example 5.4. We defined a height function $f:Z\to\mathbb{R}$ and considered the four critical points $\{s,t,u,v\}$ of $f$. To remind the reader, $v$ was an index 2 critical point (a local maximum of $f$). Near $v$, $f$ could be represented as
$$f(h_1,h_2)=f(v)-h_1^2-h_2^2.$$
Now $f$ defines a gradient vector field $\xi$ where
$$\xi(h_1,h_2)=-df(h_1,h_2).$$
Looking down on $v$ we see the flow near $v$ induced by $\xi$ resembles Fig. 5.10. The field $\xi$ may be interpreted as the law of motion under a potential energy field $f$, so that the system flows from the “source” $v$ towards the “sink” $s$ at the bottom of $Z$.
Another way of characterizing the source $v$ is by what we can call the “degree” of $v$. Imagine a small ball $B^2$ around $v$ and consider how the map
$$g:S^1\to S^1:(h_1,h_2)\mapsto\frac{\xi(h_1,h_2)}{\|\xi(h_1,h_2)\|}$$
behaves as we consider points $(h_1,h_2)$ on the boundary $S^1$ of $B^2$. At point 1, $\xi(h^1_1,h^1_2)$ points “north”, so $1\mapsto1'$. Similarly at 2, $\xi(h^2_1,h^2_2)$ points east, so $2\mapsto2'$. As we traverse the circle once, so does $g$. The degree of $g$ is $+1$, and this is what we mean by saying that the degree of $v$ is $+1$. However, the saddle $u$ is of degree $-1$.
Fig. 5.11 ???
Fig. 5.12 A saddle
Fig. 5.13 A sink
At 1, the field points north, but at 2 the field points west, so as we traverse the circle on the left in a clockwise direction, we traverse the circle on the right in an anti-clockwise direction. Because of this change of orientation, the degree of $u$ is $-1$.
It can also easily be shown that the sink $s$ has degree $+1$. The rotation at $s$ induced by $g$ is clockwise. It can be shown that, in general, the Euler characteristic can be interpreted as the sum of the degrees of the critical points. Thus $\chi(Z)=1-1-1+1=0$, since the degree at each of the two saddle points is $-1$, and the degree at the source and sink is $+1$.
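The degree can be computed numerically as a winding number: walk once round a small circle, add up the changes in the angle of $\xi$, and divide by $2\pi$. The Python sketch below does this for the standard local models of a source, a saddle and a sink (we take $f=c\mp h_1^2\mp h_2^2$ and $\xi=-df$; the constants are immaterial, and we take $t$ and $u$ to be the two saddles), then adds up the four critical points of the torus. The helper name `winding_number` and the sampling parameters are our own.

```python
import math


def winding_number(field, center=(0.0, 0.0), radius=0.1, samples=720):
    """Degree of theta -> field/|field| on a circle (field assumed nonzero there)."""
    cx, cy = center
    total, prev = 0.0, None
    for k in range(samples + 1):
        t = 2 * math.pi * k / samples
        vx, vy = field(cx + radius * math.cos(t), cy + radius * math.sin(t))
        angle = math.atan2(vy, vx)
        if prev is not None:
            # wrap each increment into [-pi, pi) before accumulating
            total += (angle - prev + math.pi) % (2 * math.pi) - math.pi
        prev = angle
    return round(total / (2 * math.pi))


def source(h1, h2):  # f = f(v) - h1^2 - h2^2, xi = -df
    return (2 * h1, 2 * h2)


def saddle(h1, h2):  # f = f(u) - h1^2 + h2^2, xi = -df
    return (2 * h1, -2 * h2)


def sink(h1, h2):    # f = f(s) + h1^2 + h2^2, xi = -df
    return (-2 * h1, -2 * h2)


degrees = {"source v": winding_number(source),
           "saddle u": winding_number(saddle),
           "saddle t": winding_number(saddle),
           "sink s": winding_number(sink)}
print(degrees)                # {'source v': 1, 'saddle u': -1, 'saddle t': -1, 'sink s': 1}
print(sum(degrees.values()))  # 0, the Euler characteristic of the torus
```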
Fig. 5.14 ???
Example 5.8 It is obvious that the flow for the Scarf example is not induced by a gradient vector field. If there were a function $f:\Delta\to\mathbb{R}$ satisfying $\xi(p)=-df(p)$, then the orbits of $\xi$ would correspond to decreasing values of $f$. As we saw, however, there are circular orbits for $\xi$. It is clearly impossible to have a circular flow such that $f$ decreases round the circle. However, the Euler characteristic still determines the nature of the zeros (or singularities) of $\xi$.
First we compute the degree of the singularity $p^*$ where $\xi(p^*)=0$.
As we saw in Example 5.6, the orbits look like smoothed triangles homeomorphic to $S^1$. Let $S^1$ be a copy of the circle (see Fig. 5.14). At 1, $g$ points west, while at 2, $g$ points northwest. At 3, $g$ points northeast, and at 4, $g$ points east. Clearly the degree is $+1$ again.
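The same winding-number computation can be run on the Scarf field itself. In this Python sketch (our own construction) we walk round a circle of radius $0.1$ about $p^*=(\frac13,\frac13,\frac13)$ inside the plane of $\Delta$, using the orthonormal basis $u=(1,-1,0)/\sqrt2$, $w=(1,1,-2)/\sqrt6$ of that plane, and record the angle of the projection of $\xi(p)$ onto it. The projection cannot vanish when $\xi(p)\neq0$, because Walras' Law rules out $\xi(p)$ being a non-zero multiple of $(1,1,1)$ at a point of $\Delta$.

```python
import math


def scarf_xi(p):
    """Scarf excess demand with goods indexed 0, 1, 2."""
    xi = [0.0, 0.0, 0.0]
    for i in range(3):
        j = (i + 1) % 3
        d = p[i] / (p[i] + p[j])
        xi[i] += d - 1.0
        xi[j] += d
    return xi


def scarf_degree(radius=0.1, samples=720):
    """Winding number of xi, projected onto the plane of Delta, around p*."""
    u = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    w = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    total, prev = 0.0, None
    for k in range(samples + 1):
        t = 2 * math.pi * k / samples
        a, b = radius * math.cos(t), radius * math.sin(t)
        p = [1 / 3 + a * ui + b * wi for ui, wi in zip(u, w)]
        xi = scarf_xi(p)
        angle = math.atan2(sum(x * wi for x, wi in zip(xi, w)),
                           sum(x * ui for x, ui in zip(xi, u)))
        if prev is not None:
            total += (angle - prev + math.pi) % (2 * math.pi) - math.pi
        prev = angle
    return round(total / (2 * math.pi))


print(scarf_degree())  # 1
```

A degree of $+1$ is what one expects for a centre: the field rotates once round $p^*$ even though no orbit approaches it.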
As we showed, the Euler characteristic $\chi(\Delta)$ of the simplex is $+1$, and the degree of the only critical point $p^*$ of the vector field $\xi$ is $+1$. This suggests that again there is a relationship between the Euler characteristic $\chi(Y)$ of a manifold $Y$ and the sum of the degrees of the critical points of any vector field $\xi$ on $Y$. One technical point should be mentioned, concerning the nature of the flow on the boundary of $\Delta$. In the Scarf example the flow of $\xi$ was “along” the boundary $\partial\Delta$ of $\Delta$. In a real economy one would expect that as the price vector approaches the boundary $\partial\Delta$ (so that $p_i\to0$ for some price $p_i$), the excess demand $\xi_i$ for that commodity would rapidly increase ($\xi_i\to\infty$). This essentially implies that the vector field $\xi$ would point towards the interior of $\Delta$. So now consider perturbations of the Scarf example as in Fig. 5.15.
Figure 5.15(a) shows a perturbation in which one of the circular orbits (called $S$) is retained; flow commencing near the boundary approaches $S$, as does any flow starting near the zero $p^*$ where $\xi(p^*)=0$. In this case the boundary $\partial\Delta$ is a repellor, the closed orbit is an attractor, and the singularity point or zero $p^*$ is a source (or point repellor). In Fig. 5.15(b) the flow is reversed. The closed orbit $S$ is a repellor; $p^*$ is a sink (or attractor), while $\partial\Delta$ is an attractor. Now consider a copy $\Delta'$ of $\Delta$ inside $\Delta$ (given by the dotted line in Fig. 5.15(b)). On the boundary of $\Delta'$ the flow points outwards. Then $\chi(\Delta')$ is still 1 and the degree of $p^*$ is still 1. This illustrates the following theorem.
Poincaré-Hopf Theorem Let $Y$ be a compact smooth manifold with boundary $\partial Y$. Suppose $\xi\in V^1(Y)$ has only a finite number of singularities, and points outwards on $\partial Y$. Then $\chi(Y)$ is equal to the sum of the degrees of the singularities of $\xi$.
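As a numerical check of the theorem on the simplest compact manifold with boundary, the closed disk $D^2$ with $\chi(D^2)=1$, take the gradient field $\xi=\nabla f$ of $f(x,y)=x^4/4-x^2/2+y^2/2$ on the disk of radius 3. This example is ours, not the text's: $\xi=(x^3-x,\,y)$ vanishes at $(\pm1,0)$ (two sources) and at $(0,0)$ (a saddle), and on the boundary circle $\langle(x,y),\xi\rangle=(x^2-1)^2+8>0$, so $\xi$ points outwards there. The Python sketch computes the degree at each zero and the winding of $\xi$ round the boundary.

```python
import math


def winding_number(field, center=(0.0, 0.0), radius=0.1, samples=2000):
    """Degree of theta -> field/|field| on a circle (field assumed nonzero there)."""
    cx, cy = center
    total, prev = 0.0, None
    for k in range(samples + 1):
        t = 2 * math.pi * k / samples
        vx, vy = field(cx + radius * math.cos(t), cy + radius * math.sin(t))
        angle = math.atan2(vy, vx)
        if prev is not None:
            total += (angle - prev + math.pi) % (2 * math.pi) - math.pi
        prev = angle
    return round(total / (2 * math.pi))


def xi(x, y):
    """Gradient of f(x, y) = x**4/4 - x**2/2 + y**2/2."""
    return (x ** 3 - x, y)


def boundary_flux_min(radius=3.0, samples=2000):
    """Smallest value of <(x, y), xi(x, y)> on the circle of the given radius."""
    best = math.inf
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x, y = radius * math.cos(t), radius * math.sin(t)
        vx, vy = xi(x, y)
        best = min(best, x * vx + y * vy)
    return best


local = [winding_number(xi, center=z) for z in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]]
boundary = winding_number(xi, radius=3.0)
print(local, sum(local), boundary)  # [1, -1, 1] 1 1
print(boundary_flux_min() > 0)      # True: xi points outwards on the boundary
```

The boundary winding and the sum of the local degrees agree, and both equal $\chi(D^2)=1$.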
Fig. 5.15 ???
To apply this theorem, suppose $\xi$ is the vector field given by the excess demand function. Suppose that $\xi$ points towards the interior of $\Delta$. Then the vector field $(-\xi)$ points outward. By the Debreu-Smale Theorem, we can generically assume that $\xi$ has (at most) a finite number of singularities. Since $\chi(\Delta)=1$, there must be at least one singularity of $(-\xi)$ and thus of $\xi$. Unfortunately this theorem does not allow us to infer whether or not there exists a singularity $p^*$ which is stable (i.e., an attractor).
The Poincaré-Hopf Theorem can also be used to understand singularities of vector fields on manifolds without boundary. As we have suggested, the Euler characteristic of a sphere is 2 in the even dimensional case and 0 in the odd dimensional case. This gives the following result.
The “Hairy Ball” Theorem Any vector field $\xi$ on $S^{2n}$ (even dimension) must have a singularity. However, there exists a vector field $\xi$ on $S^{2n+1}$ (odd dimension) such that $\xi(p)=0$ for no $p\in S^{2n+1}$.
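The odd-dimensional half of the theorem has a standard explicit witness: on $S^{2n+1}\subset\mathbb{R}^{2n+2}$, rotate each coordinate pair, $\xi(x)=(-x_2,x_1,-x_4,x_3,\dots)$. Then $\langle x,\xi(x)\rangle=0$, so $\xi(x)$ is tangent to the sphere, and $\|\xi(x)\|=\|x\|=1$, so $\xi$ never vanishes. The Python sketch below checks both facts at random points of $S^3$; the sampling scheme (normalised Gaussian vectors) and the function names are our choices.

```python
import math
import random


def tangent_field(x):
    """xi(x) = (-x2, x1, -x4, x3, ...) for x in R^(2n+2)."""
    if len(x) % 2:
        raise ValueError("need an even number of coordinates")
    v = []
    for a, b in zip(x[0::2], x[1::2]):
        v.extend((-b, a))
    return v


def random_sphere_point(dim, rng):
    """Uniformly distributed point on the unit sphere in R^dim."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    r = math.sqrt(sum(t * t for t in x))
    return [t / r for t in x]


rng = random.Random(0)
for _ in range(1000):
    x = random_sphere_point(4, rng)          # a point of S^3
    v = tangent_field(x)
    assert abs(sum(a * b for a, b in zip(x, v))) < 1e-12        # tangent
    assert abs(math.sqrt(sum(t * t for t in v)) - 1.0) < 1e-12  # nonzero
```

No such formula is available on $S^2$: with an odd number of coordinates they cannot all be paired off, which is where the construction (and, by the theorem, any construction) breaks down.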
To illustrate, Fig. 5.16 shows a vector field on $S^2$ where the flow is circular on each of the circles of latitude, but both north and south poles are singularities. The flow is evidently non-gradient, since no potential function $f$ can increase around a circular orbit.
Example 5.9 As an application, we may consider a more general type of flow, by defining at each point $x\in Y$ a set $h(x)$ of vectors in the tangent space at $x$. As discussed in Chap. 4, $h$ could be induced by a family of utility functions $\{u_i:Y\to\mathbb{R},\ i\in M\}$, such that
$$h(x)=\{v\in\mathbb{R}^n:\langle du_i(x),v\rangle>0,\ \forall i\in M\}.$$
That is to say, $v\in h(x)$ iff each utility function increases in the direction $v$. We can interpret Lemma ?? to assert that $h(x)=\Phi$ whenever $x\in{}^0\Theta(u_1,\dots,u_m)$, the critical Pareto set.
Suppose that ${}^0\Theta=\Phi$. Then in general we can use a selection theorem to select from $h(x)$ a non-zero vector $\xi(x)$ at every $x\in Y$, such that $\xi$ is a continuous vector field. That is, $\xi\in V^1(Y)$, but $\xi$ has no singularities. However, if $\chi(Y)\neq0$ then any
Fig. 5.16 Zeros of a field on $S^2$
vector field $\xi$ on $Y$ has critical points whose degrees sum to $\chi(Y)$, subject to the boundary condition of the Poincaré-Hopf Theorem.
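For two agents the set $h(x)$ is easy to compute. The Python sketch below assumes Euclidean utilities $u_i(x)=-\|x-a_i\|^2$ on $\mathbb{R}^2$ with hypothetical ideal points $a_1=(0,0)$, $a_2=(2,0)$ (our choice, purely for illustration), for which the critical Pareto set is the segment $[a_1,a_2]$. When both gradients are non-zero and not opposite, the sum of the normalised gradients lies in $h(x)$, since $\langle du_1(x),v\rangle=\|du_1(x)\|(1+\cos\theta)>0$ where $\theta$ is the angle between the gradients (and similarly for $u_2$); otherwise $h(x)$ is empty. This particular selection only works for $m=2$.

```python
import math

IDEALS = ((0.0, 0.0), (2.0, 0.0))  # hypothetical ideal points a_1, a_2


def grad(x, a):
    """du_i(x) for u_i(x) = -|x - a|^2."""
    return (2 * (a[0] - x[0]), 2 * (a[1] - x[1]))


def select_direction(x, ideals=IDEALS, tol=1e-12):
    """A vector in h(x) for two agents, or None when h(x) is empty."""
    g1, g2 = (grad(x, a) for a in ideals)
    n1, n2 = math.hypot(*g1), math.hypot(*g2)
    if n1 < tol or n2 < tol:
        return None  # x is an ideal point: that agent cannot improve
    v = (g1[0] / n1 + g2[0] / n2, g1[1] / n1 + g2[1] / n2)
    if math.hypot(*v) < tol:
        return None  # gradients point in opposite directions
    return v


def in_h(x, v, ideals=IDEALS):
    """True iff every utility strictly increases in direction v at x."""
    return all(g[0] * v[0] + g[1] * v[1] > 0 for g in (grad(x, a) for a in ideals))


print(select_direction((1.0, 0.0)))  # None: (1, 0) lies on the segment [a_1, a_2]
print(select_direction((1.0, 1.0)))  # a direction both agents favour
```

Off the segment the selection is continuous and non-zero, which is exactly the kind of singularity-free field whose existence the Euler characteristic can rule out on a compact $Y$.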
Pareto Theorem If $\chi(Y)\neq0$ then ${}^0\Theta(u)\neq\Phi$ for any smooth profile $u$ on $Y$.
The Euler characteristic can be interpreted as an obstruction to the non-existence of equilibria, or of fixed points. For example, suppose that $\chi(Y)=0$. Then it is possible to construct a vector field $\xi$ on $Y$ without zeros. It is then possible to construct a function $f:Y\to Y$ which follows the trajectories of $\xi$ a small distance, $\varepsilon$ say. But then the function $f$ is homotopic to the identity.
That is to say, from each point $x$ construct a path $c_x:[0,1]\to Y$ with $c_x(0)=x$ and $c_x(1)=f(x)$ whose derivative $\frac{dc_x}{dt}(t)$ at time $t$ is given by the vector field $\xi$ at the point $x'=c_x(t)$. Say that $f$ is induced by the vector field $\xi$. The homotopy $F:[0,1]\times Y\to Y$ is then given by $F(0,x)=x$ and $F(t,x)=c_x(t)$. Since $c_x(t)$ is continuous in $t$ and, by continuous dependence of solutions on initial conditions, in $x$, the map $F$ is continuous. Thus $F$ is a homotopy between $f$ and the identity on $Y$. A function $f:Y\to Y$ which is homotopic to the identity is called a deformation of $Y$.
If $\chi(Y)=0$ then it is possible to find a vector field $\xi$ on $Y$ without singularities and then construct a deformation $f$ of $Y$ induced by $\xi$. Since $\xi(x)=0$ for no $x$, $f$ will not have a fixed point. Conversely, if $f$ is a deformation of $Y$ and $\chi(Y)\neq0$, then the homotopy between $f$ and the identity generates a vector field $\xi$. Were $f$ to have no fixed point, then $\xi$ would have no singularity. But if $f$, and thus $\xi$, have the right behavior on the boundary, then by the Poincaré-Hopf Theorem $\xi$ must have at least one singularity. This contradicts the fixed point free property of $f$.
Lefschetz Fixed Point Theorem If $Y$ is a manifold with $\chi(Y)=0$ then there exists a fixed point free deformation of $Y$. If $\chi(Y)\neq0$ then any deformation of $Y$ has a fixed point.
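The contrast between $\chi(Y)=0$ and $\chi(Y)\neq0$ can be seen on two familiar surfaces. The Python sketch below (our construction; $\varepsilon=0.1$ and slope $\sqrt2$ are arbitrary) takes the time-$\varepsilon$ map of the nowhere-zero constant field $(1,\sqrt2)$ on the torus $\mathbb{R}^2/\mathbb{Z}^2$, where $\chi=0$, and of the latitude field of Fig. 5.16 on $S^2$, where $\chi=2$. The torus deformation moves every point by the same positive distance, while the sphere deformation, a rotation about the polar axis, fixes both poles.

```python
import math


def torus_flow(x, eps=0.1, slope=math.sqrt(2)):
    """Time-eps map of the constant field (1, slope) on the torus R^2 / Z^2."""
    return ((x[0] + eps) % 1.0, (x[1] + eps * slope) % 1.0)


def torus_dist(x, y):
    """Distance on the flat torus R^2 / Z^2."""
    d = [abs(a - b) % 1.0 for a, b in zip(x, y)]
    return math.hypot(*(min(t, 1.0 - t) for t in d))


def sphere_flow(x, eps=0.1):
    """Time-eps map of the latitude field (-y, x, 0): rotation about the z-axis."""
    c, s = math.cos(eps), math.sin(eps)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1], x[2])


grid = [(i / 50, j / 50) for i in range(50) for j in range(50)]
print(min(torus_dist(x, torus_flow(x)) for x in grid))  # about 0.173: no fixed point
print(sphere_flow((0.0, 0.0, 1.0)), sphere_flow((0.0, 0.0, -1.0)))  # poles are fixed
```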
Note that the Lefschetz Fixed Point Theorem does not imply that any function $f:S^2\to S^2$ has a fixed point. For example, the “antipodal” map $f(x)=-x$, for $\|x\|=1$, is fixed point free. However, $f$ cannot be induced by a continuous vector field, and is therefore not a deformation. The Lefschetz fixed point theorem just stated is in fact a corollary of a deeper result: for any continuous map $f:Y\to Y$, with $Y$ compact, there is an “obstruction” called the Lefschetz number $\lambda(f)$. If $\lambda(f)\neq0$, then $f$ must have a fixed point. If $Y$ is “homotopy equivalent” to the compact ball then $\lambda(f)\neq0$ for every continuous function on the ball, so the ball has the fixed point property. On the other hand, if $f$ is homotopic to the identity $Id$ on $Y$ then $\lambda(f)=\lambda(Id)$, and it can be shown that $\lambda(Id)=\chi(Y)$, the Euler characteristic of $Y$. It therefore follows that $\chi(Y)$ is an obstruction to the existence of a fixed point free deformation of the compact manifold $Y$.
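For spheres the Lefschetz number has a closed form in terms of the degree: for a continuous $f:S^n\to S^n$, $\lambda(f)=1+(-1)^n\deg(f)$ (the same formula, with $n-1$ in place of $n$, is used in Example 5.12). The short Python sketch below checks the two facts used above: $\lambda(Id)=\chi(S^n)$, and the antipodal map, which has degree $(-1)^{n+1}$, has $\lambda=0$, consistent with its having no fixed point. The function names are our own.

```python
def lefschetz_sphere(n: int, degree: int) -> int:
    """Lefschetz number of a map S^n -> S^n of the given degree."""
    return 1 + (-1) ** n * degree


def antipodal_degree(n: int) -> int:
    """Degree of x -> -x on S^n, a composition of n + 1 reflections."""
    return (-1) ** (n + 1)


for n in range(1, 7):
    chi = 2 if n % 2 == 0 else 0               # Euler characteristic of S^n
    assert lefschetz_sphere(n, 1) == chi        # lambda(Id) = chi(S^n)
    assert lefschetz_sphere(n, antipodal_degree(n)) == 0
print([lefschetz_sphere(n, 1) for n in range(1, 7)])  # [0, 2, 0, 2, 0, 2]
```

Note also that a map of even degree always has odd, hence non-zero, Lefschetz number on any sphere.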
Example 5.10 To illustrate an application of this theorem in social choice, suppose that $Y$ is a compact manifold of dimension at most $k(\sigma)-2$, where $k(\sigma)$ is the Nakamura number of the social choice rule $\sigma$ (see Sect. 3.8). Suppose $(u_1,\dots,u_m)$ is a smooth profile on $Y$. Then it can be shown (Schofield 1984) that if the choice $C_{\sigma(\pi)}(Y)$ is empty, then there exists a fixed point free deformation on $Y$. Consequently, if $\chi(Y)\neq0$, then $C_{\sigma(\pi)}(Y)$ must be non-empty.
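The bound in Example 5.10 depends only on the Nakamura number. As an illustration, assume the usual definition (the smallest number of decisive coalitions with empty intersection) and take $\sigma$ to be a $q$-rule on $n$ voters, where a coalition is decisive iff it has at least $q$ members; the $q$-rule is our example, not the text's. The brute-force Python sketch below computes $k(\sigma)$ and compares it with the closed form $\lceil n/(n-q)\rceil$, which holds because a family of coalitions has empty intersection exactly when their complements cover all the voters. The resulting bound $\dim Y\le k(\sigma)-2$ is printed alongside.

```python
from itertools import combinations


def nakamura_number(n, q):
    """Smallest number of coalitions with >= q of n voters whose intersection is empty.

    Coalitions are bitmasks over voters 0..n-1; returns None when no such
    family exists (q == n, a collegial rule).
    """
    everyone = (1 << n) - 1
    decisive = [c for c in range(1, everyone + 1) if bin(c).count("1") >= q]
    for k in range(1, len(decisive) + 1):
        for family in combinations(decisive, k):
            common = everyone
            for c in family:
                common &= c
            if common == 0:
                return k
    return None


for n, q in [(3, 2), (5, 3), (6, 4), (7, 5)]:
    k = nakamura_number(n, q)
    assert k == -(-n // (n - q))  # ceil(n / (n - q))
    print(f"n={n}, q={q}: k(sigma)={k}, so dim Y <= {k - 2}")
```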
Example 5.11 It would be useful to be able to use the notion of the Lefschetz obstruction to obtain conditions under which the singularities of the excess demand function $\xi$ of an economy were stable. However, as Scarf's example showed, it is entirely possible for there to be a single attractor, or a single repellor (as in Fig. 5.15), or even a situation with an infinite number of closed orbits. However, consider a more general price adjustment process as follows. At each $p$ in the interior $\operatorname{Int}\Delta$ of $\Delta$ let
$$\xi^*(p)=\{v\in\mathbb{R}^n:\langle v,\xi(p)\rangle>0\}.$$
A vector field $v\in V^1(\Delta)$ is dual to $\xi$ iff $v(p)\in\xi^*(p)$ for all $p\in\operatorname{Int}\Delta$ with $\xi(p)\neq0$, and $v(p)=0$ iff $\xi(p)=0$ for $p\in\operatorname{Int}\Delta$. It may be possible to find a vector field $v$ dual to $\xi$ which has attractors. Suppose that $v$ is dual to $\xi$, and that $f:\Delta\to\Delta$ is induced by $v$. As we have seen, the Lefschetz number of $f$ gives information about the singularities of $v$.
Dierker (1972) essentially utilized the following hypothesis: there exists a dual vector field $v$ and a function $f:\Delta\to\Delta$ induced by $v$ such that $f=f_0$ is homotopic to the constant map $f_1:\Delta\to\{(\frac1n,\dots,\frac1n)\}$, and such that $\{p\in\Delta:f_t(p)=p\ \text{for}\ t\in[0,1]\}$ is compact. Under the assumption that the economy is regular (so the number of singularities of $\xi$ is finite), he showed that the number of such singularities must be odd. Moreover, if it is known that $\xi$ only has stable singularities, then there is only one. The proof of the first assertion follows by observing that $\lambda(f_0)=\lambda(f_1)$. But $f_1$ is the constant map on $\Delta$, so $\lambda(f_1)=1$. Moreover, $\lambda(f_0)$ is equal to the sum of the degrees of the singularities of $v$, and Dierker shows that at each singularity of $v$ the degree is $\pm1$. Since a sum of terms each equal to $\pm1$ can equal 1 only if the number of terms is odd, the number of singularities must be odd. Finally, if there are only stable singularities, each has degree $+1$, so there must be exactly one.
Fig. 5.17 ???
Example 5.12 As a further application of the Lefschetz obstruction, suppose, contrary to the usual assumption that negative prices are forbidden, that $p\in S^{n-1}$ rather than $\Delta$. It is natural to suppose that $\xi(p)=\xi(-p)$ for any $p\in S^{n-1}$. Suppose now that $\xi(p)=0$ for no $p\in S^{n-1}$. This defines an (even) spherical map $g:S^{n-1}\to S^{n-1}$ by $g(p)=\xi(p)/\|\xi(p)\|$. Thus $g(p)=g(-p)$. The degree $\deg(g)$ of such a $g$ can readily be seen to be an even integer, and the Lefschetz obstruction of $g$ is $\lambda(g)=1+(-1)^{n-1}\deg(g)$.
Clearly $\lambda(g)$ is odd, so $\lambda(g)\neq0$, and so $g$ has a fixed point $p$ such that $g(p)=p$. But then $\xi(p)=\alpha p$ for some $\alpha>0$. This violates Walras' Law, since $\langle p,\xi(p)\rangle=\alpha\|p\|^2\neq0$. This contradiction shows that $\xi(p)=0$ for some $p\in S^{n-1}$. Keenan (1992) goes on to develop some of the earlier work by Rader (1972) to show, in this extended context, that for generic, regular economies there must be an odd number of singularities.
The above examples have all considered flows on the simplex or the sphere. To return to the idea of structural stability, let us consider once again examples of a vector field on the torus.
Example 5.13 (1) For a more interesting deformation of the torus $Z=S^1\times S^1$, consider Fig. 5.17.
The closed orbit at the top of the torus is a repellor, $R$ say. Any flow starting near $R$ winds towards the bottom closed orbit, $A$, an attractor. There are no singularities, and the induced deformation is fixed point free.
(2) Not all flows on the torus Z need have closed orbits. Consider the flow on Z given in Fig. 5.18. If the tangent of the angle, θ, is rational, then the orbit through x is closed, and will consist of a specific number of turns round Z. However, suppose this flow is perturbed. In any neighborhood of θ there will be an angle whose tangent is irrational, and the orbits of such an irrational flow will not close up. To relate this to the Peixoto Theorem which follows, with rational flow there will be an infinite number of closed orbits. However, the phase portrait for rational flow cannot be homeomorphic to the portrait for irrational flow. Thus an arbitrarily small perturbation of rational flow can give a non-homeomorphic irrational flow. Clearly any vector field on the torus which gives rational flow is structurally unstable.
Fig. 5.18
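The distinction between rational and irrational flow can be illustrated with a short Python sketch. As a simplification of Fig. 5.18, it assumes the flow is the linear flow of constant slope $\tan\theta$ on the unit square with opposite edges identified. After k turns in the first direction, the orbit has then advanced $k\tan\theta$ (mod 1) in the second.

The function name and tolerance are illustrative choices. A finite search can only show that an orbit has not closed within `max_turns`. It cannot prove that a slope is irrational.

```python
import math
from fractions import Fraction


def closing_turns(slope, max_turns=10_000, tol=1e-9):
    """Smallest k <= max_turns after which the linear flow of the given slope
    on the unit torus returns to its starting point, or None if it has not
    closed up within max_turns turns."""
    for k in range(1, max_turns + 1):
        offset = (slope * k) % 1  # position on the second circle after k turns
        if min(offset, 1 - offset) < tol:
            return k
    return None


print(closing_turns(Fraction(3, 5)))  # rational slope: closes after 5 turns
print(closing_turns(math.sqrt(2)))    # irrational slope: prints None
```

Using `Fraction` for the rational case keeps the arithmetic exact, so the closure is detected without rounding error.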
Structural Stability Theorem
1. If dim Y = 2 and Y is compact, then structural stability of vector fields on Y is generic.
2. If dim Y ≥ 3, then structural stability is non-generic.
Peixoto (1962) proved part (1) by showing that structurally stable vector fields on compact Y (of dimension 2) must satisfy the following properties:
(1) there is a finite number of non-degenerate isolated singularities (that is, critical points which can be sources, sinks, or saddles);
(2) there is a finite number of attracting or repelling closed orbits;
(3) every orbit (other than closed orbits) starts at a source or saddle, or winds away from a repellor, and finishes at a saddle or sink, or winds towards an attractor;
(4) no orbit connects saddle points.
Peixoto showed that for any vector field ξ on Y and any neighborhood V of ξ in $V^1(Y)$ there was a vector field ξ′ in V that satisfied the above four conditions and thus was structurally stable.
Although we have not carefully defined the terms used above, they should be intuitively clear. To illustrate, Fig. 5.19(a) shows an orbit connecting saddles, while Fig. 5.19(b) shows that after perturbation a qualitatively different phase portrait is obtained.
In Fig. 5.19(a), A and B are connected saddles, C is a repellor (orbits starting near to C leave it) and D is a closed orbit. A small perturbation disconnects A and B as shown in Fig. 5.19(b), and orbits starting near to D (either inside or outside) approach D, so it is an attractor.
The excess demand function, ξ, of the Scarf example clearly has an infinite number of closed orbits (all homeomorphic to $S^1$). Since this violates condition (2), ξ cannot be structurally stable. From Peixoto's Theorem, small perturbations of ξ will destroy this feature. As we suggested, a small perturbation may change ξ so that $p^*$ becomes a stable equilibrium (an attractor) or an unstable equilibrium (a repellor).
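The effect of perturbing a field whose orbits are all closed can be seen in a minimal Python sketch. It does not use Scarf's actual excess demand function. As a hypothetical stand-in, it takes the linear field $dx/dt = -y + \varepsilon x$, $dy/dt = x + \varepsilon y$ in local coordinates centred at $p^*$.

For ε = 0 every orbit is a circle about $p^*$. For ε < 0 the singularity is an attractor, and for ε > 0 it is a repellor, since the exact solution has radius $r_0 e^{\varepsilon t}$. The fixed-step fourth-order Runge-Kutta integrator and the function names are illustrative choices.

```python
import math


def perturbed_center(x, y, eps):
    """Rotation about the origin plus a radial term of strength eps."""
    return -y + eps * x, x + eps * y


def radius_after(eps, r0=1.0, t_end=20.0, dt=0.01):
    """Integrate from (r0, 0) with classical RK4 and return the final
    distance from the singularity at the origin."""
    x, y = r0, 0.0
    for _ in range(int(round(t_end / dt))):
        k1x, k1y = perturbed_center(x, y, eps)
        k2x, k2y = perturbed_center(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, eps)
        k3x, k3y = perturbed_center(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, eps)
        k4x, k4y = perturbed_center(x + dt * k3x, y + dt * k3y, eps)
        x += dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += dt / 6.0 * (k1y + 2 * k2y + 2 * k3y + k4y)
    return math.hypot(x, y)


for eps in (-0.1, 0.0, 0.1):
    r = radius_after(eps)
    print(f"eps={eps:+.1f}: radius {r:.4f} (exact {math.exp(20.0 * eps):.4f})")
```

However small |ε| is, the sign of ε decides whether orbits spiral in or out. This is why the family of closed orbits at ε = 0 does not survive perturbation.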
Smale's (1966) proof that structural stability was non-generic in three or more dimensions was obtained by constructing a diffeomorphism $f : Y \to Y$ of the three-dimensional torus $Y = S^1 \times S^1 \times S^1$. This induced a vector field $\xi \in V^1(Y)$ that had the property that for a neighborhood V of ξ in $V^1(Y)$, no ξ′ in V was structurally stable. In other words, every ξ′ in V, when perturbed, led to a qualitatively different phase portrait. We could say that ξ was chaotic. Any attempt to model ξ by an approximation ξ′, say, results in an essentially different vector field. The possibility of chaotic flow and its ramifications will be discussed in general terms in the next section. The consequence for economic theory is immediate, however. It can be shown (by the Sonnenschein-Mantel-Debreu Theorem below) that essentially any vector field on Δ satisfying the appropriate boundary conditions can arise as the excess demand function ξ of some economy (u, e). It is therefore possible that the price adjustment process is chaotic.
Fig. 5.19
As we observed after Example 5.8, an economically realistic excess demand function ξ on Δ should point into the price simplex at any price vector on ∂Δ. This follows because if $p_i \to 0$ then $\xi_i$ would be expected to approach ∞. Let $V^1_0(\Delta)$ be the topological space of vector fields on Δ, of the form $\frac{dp}{dt}\big|_p = \xi(p)$, such that $\frac{dp}{dt}\big|_p$ points into the interior of Δ for p near ∂Δ.
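The following minimal Python sketch makes this boundary behaviour concrete for an exchange economy with Cobb-Douglas preferences. The utilities, the weights `alphas` and the endowments are all hypothetical choices, not taken from the text.

Suppose agent i maximises $\sum_j \alpha_{ij}\log x_j$ subject to $p \cdot x \le p \cdot e_i$. Then demand is $\alpha_{ij}(p \cdot e_i)/p_j$, so the excess demand satisfies Walras' Law $\langle p, \xi(p)\rangle = 0$. Moreover, $\xi_j$ grows without bound as $p_j \to 0$, provided each agent's wealth $p \cdot e_i$ stays positive.

```python
def excess_demand(p, alphas, endowments):
    """Aggregate excess demand xi(p) for a hypothetical Cobb-Douglas exchange
    economy: agent i spends the share alphas[i][j] of wealth p . e_i on good j.
    Each row of alphas is assumed positive and to sum to one."""
    xi = [0.0] * len(p)
    for a, e in zip(alphas, endowments):
        wealth = sum(pj * ej for pj, ej in zip(p, e))
        for j, pj in enumerate(p):
            xi[j] += a[j] * wealth / pj - e[j]
    return xi


def walras_value(p, xi):
    """Inner product <p, xi(p)>, which Walras' Law requires to be zero."""
    return sum(pj * xj for pj, xj in zip(p, xi))


alphas = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
endowments = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]

p_interior = [0.2, 0.3, 0.5]
print(walras_value(p_interior, excess_demand(p_interior, alphas, endowments)))

# Near the face p_1 = 0 of the simplex, excess demand for good 1 is huge,
# so the induced price adjustment pushes p_1 back into the interior.
p_edge = [1e-6, 0.5, 0.5 - 1e-6]
print(excess_demand(p_edge, alphas, endowments)[0])
```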
The Sonnenschein-Mantel-Debreu Theorem
The map
$$\xi : C^s(X, \Re^m) \times X^m \to V^1_0(\Delta)$$
is onto if $m \ge n$.
Suppose that there are at least as many economic agents (m) as commodities (n). Then, for any vector field in $V^1_0(\Delta)$, it is possible to construct a well-behaved economy (u, e) whose excess demand function generates that vector field. The economy has monotonic, strictly convex preferences induced from smooth utilities, and an endowment vector $e \in X^m$.
Versions of the theorem were presented in Sonnenschein (1972), Mantel (1974), and Debreu (1974). A more recent version can be found in Mas-Colell (1985). As we have discussed in this section, because the simplex Δ has $\chi(\Delta) = 1$, the “excess demand” vector field ξ will always have at least one singularity. In fact, from
the Debreu-Smale theorem, we expect ξ generically to exhibit only a finite number of singularities. Aside from these restrictions, ξ is essentially unconstrained. Suppose there are at least four commodities (and four agents), so that Δ has dimension at least three. Then it is always possible to construct (u, e) such that the vector field induced by excess demand is “chaotic”.
As we saw in Sect. 5.4, the vector field ξ of the Scarf example was structurally unstable, but a small generic perturbation of ξ led to a structurally stable field ξ′, say, with either an attracting or a repelling singularity. The situation with four commodities is potentially much more difficult to analyze. It is possible to find (u, e) such that the induced vector field ξ on Δ is chaotic: in some neighborhood V of ξ there is no structurally stable field. Any attempt to model ξ by ξ′, say, must necessarily incorporate some errors, and these errors will multiply in some fashion as we attempt to map the phase portrait. In particular, the flow generated by ξ through some point x ∈ Δ can be very different from the flow generated by ξ′ through x. This is closely related to the phenomenon that has been called “sensitive dependence on initial conditions,” in which trajectories of the same flow that start at nearby points diverge.
5.6 Speculations on Chaos
It is only in the last twenty years or so that the implications of “chaos” (or failure of structural stability in a profound way) have begun to be realized. In a recent book Kauffman commented on the failure of structural stability in the following way.
“One implication of the occurrence or non-occurrence of structural stability is that, in structurally stable systems, smooth walks in parameter space must (result in) smooth changes in dynamical behavior. By contrast, chaotic systems, which are not structurally stable, adapt on uncorrelated landscapes. Very small changes in the parameters pass through many interlaced bifurcation surfaces and so change the behavior of the system dramatically.”1
The whole point of the Debreu-Smale Theorem is that generically the Debreu map is regular. Thus there are open sets in the parameter space (of utility profiles and endowments) where the number, E(ξ), of singularities of ξ is finite and constant. As the Scarf example showed, however, even though E(ξ) may be constant in a neighborhood, the vector field ξ can be structurally unstable. The structurally unstable circular vector field of the Scarf example is not particularly surprising. After all, similar structurally unstable systems are common (the frictionless oscillator or pendulum is one example). These have the feature that, when perturbed, they become structurally stable. Thus the dynamical system of a pendulum with friction is structurally stable. Its phase portrait shows an attractor, which still persists as the friction is increased or decreased. Smale's Structural Instability Theorem together with the Sonnenschein-Mantel-Debreu Theorem suggests that the vector field generated by excess demand can indeed be chaotic when there are at least four commodities and agents. This does not necessarily mean that chaotic price adjustment processes are pervasive. As in Dierker's example, even though the vector field $\xi(p) = \frac{dp}{dt}$ can be chaotic, it may be possible to find a vector field, v, dual to ξ, which is structurally stable.
1 S. Kauffman, The Origins of Order (1993) Oxford University Press: Oxford.
This brief final section will discuss, in an informal fashion, whether or not it is plausible for economies to exhibit chaotic behavior.
It is worth mentioning that the idea of structural stability is not a new one, though the original discussion was not formalized in quite the way it is today. Newton's great work Philosophiae Naturalis Principia Mathematica (published in 1687) grew out of his work on gravitation and planetary motion. The laws of motion could be solved precisely, giving a vector field and the orbits (or phase portrait) in the case of a planet (a point mass) orbiting the sun. The solution accorded closely with Kepler's (1571–1630) empirical observations on planetary motion. However, the attempt to compute the planetary orbits for the solar system had to face the problem of perturbations. Would the perturbations induced in each orbit by the other planets cause the orbital computations to converge or diverge? With convergence, computing the orbit of Mars, say, can be done by approximating the effects of Jupiter, and perhaps Saturn, on the Mars orbit. The calculations would give a prediction very close to the actual orbit. Using the approximations, the planetary orbits could be computed far into the future, giving predictions as precise as calculating ability permitted. Without convergence, it would be impossible to make predictions with any degree of certainty. Laplace in his work “Mécanique Céleste” (published between 1799 and 1825) had argued that the solar system (viewed as a formal dynamical system) is structurally stable (in our terms). Consistent with his view was the use of successive approximations to predict the perihelion (the point nearest the sun) of Halley's comet in 1759, and to infer the existence and location of Neptune in 1846.
Structural stability in the three-body problem (of two planets and a sun) was the obvious first step in attempting to prove Laplace's assertion. In 1885 a prize was announced to celebrate the King of Sweden's birthday. Henri Poincaré submitted his entry “Sur le problème des trois corps et les Equations de la Dynamique.” This attempted to prove structural stability in a restricted three-body problem. The prize was won by Poincaré's entry, although it was later found to contain an error. Poincaré had obtained his doctorate in mathematics in Paris in 1878, had briefly taught at Caen, and later became professor at Paris. His work on differential equations in the 1880s and his later work on Celestial Mechanics in the 1890s developed new qualitative techniques (in what we now call differential topology) to study dynamical equations.
In passing, it is worth mentioning that since there is a natural periodicity to any rotating celestial system, the state space can in some sense be viewed as a product of circles (that is, a torus). Many of the examples mentioned in the previous section, such as periodic (rational) or aperiodic (irrational) flow on the torus, came up naturally in celestial mechanics.
One of the notions implicitly emphasized in the previous sections of this chapter is that of bifurcation: namely, a dynamical system on the boundary separating qualitatively different systems. At such a bifurcation, features of the system separate out
in pairs. For example, in the Debreu map, a bifurcation occurs when two of the price equilibria coalesce. This is clearly linked to the situation studied by Dierker, where the number of price equilibria (in Δ) is odd: since the number of regular equilibria stays odd, equilibria can only appear or disappear in pairs. At a bifurcation, two equilibria with opposite degrees coalesce. In a somewhat similar fashion Poincaré showed the following for the three-body problem. Suppose there is some value $\mu_0$ (of total mass, say) such that periodic solutions exist for $\mu \le \mu_0$ but not for $\mu > \mu_0$. Then two periodic solutions must have coalesced at $\mu_0$. However, Poincaré also discovered that the bifurcation could be associated with the appearance of a new solution with period double that of the original. This phenomenon is central to the existence of a period-doubling cascade as one of the characteristics of chaos. Near the end of his Celestial Mechanics, Poincaré writes of this phenomenon:
“Neither of the two curves must ever cut across itself, but it must bend back upon itself in a very complex manner an infinite number of times. . . . Nothing is more suitable for providing us with an idea of the complex nature of the three body problem.”2
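The pairwise coalescence of equilibria described above for the Debreu map can be seen in a one-dimensional caricature. The Python sketch below assumes a hypothetical excess demand $f(x;\mu) = -x^3 + x + \mu$. It uses the sign of $f'(x)$ at each zero as a one-dimensional stand-in for the degree of that equilibrium.

For $|\mu| < 2/(3\sqrt{3}) \approx 0.385$ there are three equilibria, and beyond that value only one. In both regimes the number is odd and the degrees sum to -1, because the two equilibria that disappear at the fold have opposite signs.

The root finder scans a grid for sign changes and refines each one by bisection. It is an illustrative choice and can miss roots lying very close together near the fold itself.

```python
def f(x, mu):
    """Hypothetical one-dimensional excess demand with parameter mu."""
    return -x ** 3 + x + mu


def df(x):
    """Derivative of f with respect to x (it does not depend on mu)."""
    return -3 * x ** 2 + 1


def equilibria(mu, lo=-3.0, hi=3.0, n=6001):
    """Zeros of f(., mu) in [lo, hi]: grid scan for sign changes, then bisection."""
    roots = []
    step = (hi - lo) / n
    for i in range(n):
        a, b = lo + i * step, lo + (i + 1) * step
        if f(a, mu) == 0.0:
            roots.append(a)
            continue
        if f(a, mu) * f(b, mu) < 0:
            for _ in range(60):
                m = 0.5 * (a + b)
                if f(a, mu) * f(m, mu) <= 0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
    return roots


def degrees(mu):
    """Sign of f' at each equilibrium: a one-dimensional stand-in for the degree."""
    return [1 if df(x) > 0 else -1 for x in equilibria(mu)]


for mu in (0.0, 0.3, 0.5, 1.0):
    d = degrees(mu)
    print(f"mu={mu}: {len(d)} equilibria, degrees {d}, sum {sum(d)}")
```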
Although Poincaré was led to the possibility of chaos in his investigations into the solar system, it appears that the system is in fact structurally stable. Arnol'd showed in 1963 that for a system with small planets, there is a set of initial conditions of positive measure leading to bounded orbits for all time. Computer simulations of the system far into time also suggest it is structurally stable.3 Even so, there are events in the system that affect us and appear to be chaotic (perhaps catastrophic would be a more appropriate term). The impact of large asteroids may have a dramatic effect on the biosphere of the earth, and such impacts have been suggested as a possible cause of mass extinction. The onset and behavior of the ice ages over the last 100,000 years are very possibly chaotic, and it is likely that there is a relationship between these violent climatic variations and the recent rapid evolution of human intelligence.4
More generally, evolution itself is often perceived as a gradient dynamical process, leading to increasing complexity. However, Stephen Jay Gould has argued over a number of years that evolution is far from gradient-like: increasing complexity coexists with simple forms of life, and past life has exhibited an astonishing variety.5
Evolution itself appears to proceed at a very uneven rate.6
2 My observations and quotations are taken from D. Goroff's introduction and the text of a recent edition of Poincaré's New Methods of Celestial Mechanics (1993) American Institute of Physics: New York.
3 See I. Peterson, Newton's Clock: Chaos in the Solar System (1993) Freeman: New York.
4 See W. H. Calvin, The Ascent of Mind. Bantam: New York.
5 S. J. Gould, Full House (1996) Harmony Books: New York; S. J. Gould, Wonderful Life (1989) Norton: New York.
6 N. Eldredge and S. J. Gould, “Punctuated Equilibria: An Alternative to Phyletic Gradualism,” in Models in Paleobiology (1972), T. J. M. Schopf, ed. Norton: New York.
Fig. 5.20 The butterfly
“Empirical” chaos was probably first discovered by Edward Lorenz in his efforts to solve numerically a system of equations representative of the behavior of weather.7 A very simple version is the non-linear vector equation
$$\frac{dx}{dt} = \begin{pmatrix} dx_1/dt \\ dx_2/dt \\ dx_3/dt \end{pmatrix} = \begin{pmatrix} -a_1(x_1 - x_2) \\ -x_1x_3 + a_2x_1 - x_2 \\ x_1x_2 - a_3x_3 \end{pmatrix}$$
which is chaotic for certain ranges of the three constants, $a_1, a_2, a_3$.
The resulting “butterfly” portrait winds a number of times about the left hole (A in Fig. 5.20), then about the right hole (B), then the left, etc. Thus the phase portrait can be described by a sequence of winding numbers $(w^1_l, w^1_k, w^2_l, w^2_k, \ldots)$.
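Sensitive dependence on initial conditions, and the left/right winding of the butterfly, can be reproduced with a short Python sketch. It assumes Lorenz's classical constants $a_1 = 10$, $a_2 = 28$, $a_3 = 8/3$, which the text does not fix. It integrates the Lorenz equation above with a fixed-step fourth-order Runge-Kutta scheme.

As a heuristic assumption, each local maximum of $x_3$ is counted as one circuit. A circuit is labelled left when $x_1 < 0$ (around A) and right when $x_1 > 0$ (around B). Runs of circuits in the same lobe then play the role of the winding numbers.

Two trajectories whose starting points differ by $10^{-9}$ are compared to show how quickly they separate. All function names are illustrative.

```python
def lorenz(state, a1=10.0, a2=28.0, a3=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classical constants assumed)."""
    x1, x2, x3 = state
    return (-a1 * (x1 - x2), -x1 * x3 + a2 * x1 - x2, x1 * x2 - a3 * x3)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))


def trajectory(x0, t_end=50.0, dt=0.01):
    """List of states from x0 up to time t_end."""
    states = [tuple(x0)]
    for _ in range(int(round(t_end / dt))):
        states.append(rk4_step(lorenz, states[-1], dt))
    return states


def winding_numbers(states):
    """Heuristic winding sequence: one circuit per local maximum of x3,
    labelled by the sign of x1, with consecutive equal labels counted."""
    runs = []
    for prev, cur, nxt in zip(states, states[1:], states[2:]):
        if prev[2] < cur[2] >= nxt[2]:
            lobe = "left" if cur[0] < 0 else "right"
            if runs and runs[-1][0] == lobe:
                runs[-1][1] += 1
            else:
                runs.append([lobe, 1])
    return [(lobe, count) for lobe, count in runs]


a = trajectory((1.0, 1.0, 1.0))
b = trajectory((1.0 + 1e-9, 1.0, 1.0))
separation = [max(abs(u - v) for u, v in zip(s, t)) for s, t in zip(a, b)]
print("separation at t = 0, 10, 30:", separation[0], separation[1000], separation[3000])
print("first winding numbers:", winding_numbers(a)[:8])
```

The two runs agree closely at first and then become unrelated, while both remain on the bounded butterfly.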
Changing the constants $a_1, a_2, a_3$ slightly changes the winding numbers.
Given that chaos can be found in such a simple meteorological system, it is worthwhile engaging in a thought experiment to see whether “climatic” chaos is a plausible phenomenon. Weather occurs on the surface of the earth, so the spatial context is $S^2 \times I$, where I is an interval corresponding to the depth of the atmosphere. As we know, $\chi(S^2) = \chi(S^2 \times I) = 2$, so we would expect singularities. Secondly, there are temporal periodicities, induced by the earth's orbit about the sun and its rotation. Thirdly, there are spatial periodicities or closed orbits. Chief among these must be the jet stream and the oceanic orbit of water from the southern hemisphere to the North Atlantic (the Gulf Stream) and back. The most interesting singularities are the hurricanes generated each year off the coast of Africa and channeled across the Atlantic to the Caribbean and the coast of the U.S.A. Hurricanes are self-sustaining heat machines that eventually dissipate if they cross land or cool water. It is fairly clear that their origin and trajectory are chaotic.
7 E. N. Lorenz, “The Statistical Prediction of Solutions of Dynamical Equations,” Proceedings Int. Symp. Num. Weather Pred (1962) Tokyo; E. N. Lorenz, “Deterministic Non Periodic Flow,” J. Atmos. Sci. (1963): 130–141.
Perhaps we can use this thought experiment to consider the global economy. First of all, there must be local periodicities due to climatic variation. Since hurricanes, monsoons, etc. affect the economy, one would expect small chaotic occurrences. More importantly, however, some of the behavior of economic agents will be based on their future expectations about the nature of economic growth, etc. Thus one would expect long-term expectations to affect large-scale “decisions” on matters such as fertility. The post-war “baby boom” is one such example. Large-scale periodicities of this kind might very well generate smaller chaotic effects (such as, for example, the oil crisis of the 1970s), which in turn may trigger responses of various kinds.
It is evident enough that the general equilibrium (GE) emphasis on the existence of price equilibria, while important, is probably an incomplete way to understand economic development. In particular, GE theory tends to downplay the formation of expectations by agents, and the possibility that this can lead to unsustainable “bubbles”.
Remember, it is a key assumption of GE that agents' preferences are defined on the commodity space alone. If, on the contrary, these are defined on commodities and prices, then it is not obvious that the assumptions of the Ky Fan Theorem (cf. Chap. 3) can be employed to show existence of a price equilibrium. Indeed, manipulation of the kind described in Chap. 4 may be possible. More generally, one can imagine energy engines (very like hurricanes) being generated in asset markets, and sustained by self-reinforcing beliefs about the trajectory of prices. It is true that modern decentralised economies are truly astonishing knowledge or data-processing mechanisms. From the perspective of today, the argument that a central planning authority can be as effective as the market in making “rational” investment decisions appears to have been lost. Hayek's case, the so-called “calculation” argument, with von Mises and against Lange and Schumpeter, was based on the observation that information is dispersed throughout the economy and is, in any case, predominantly subjective. He argued essentially that only a market, based on individual choices, can possibly “aggregate” this information.8
Recently, however, theorists have begun to probe the degree of consistency or convergence of beliefs in a market when it is viewed as a game. It would seem that when the agents “know enough about each other”, then convergence in beliefs is a consequence.9
8 See F. A. Hayek, “The Use of Knowledge in Society,” American Economic Review (1945) 35: 519–530, and the discussion in A. Gamble, Hayek: The Iron Cage of Liberty (1996) Westview: Boulder, Colorado.
9 See R. J. Aumann, “Agreeing to Disagree,” Annals of Statistics (1976) 1236–1239 and K. J. Arrow, “Rationality of Self and Others in an Economic System,” Journal of Business (1986) 59: S385–S399.
In fact the issue of the “truth-seeking” capability of human institutions is very old and dates back to the work of Condorcet.10 Nonetheless, it is possible for belief cascades or bubbles to occur under some circumstances.11 It is obvious enough that economists writing after the Great Crash of 1929 might be more willing than those writing today to consider the possibility of belief cascades and collapse. John Maynard Keynes' work on The General Theory of Employment, Interest and Money (1936) was very probably the most influential economic book of the century. What is interesting about this work is that it does appear to have grown out of work that Keynes did in the period 1906 to 1914 on the foundations of probability, which was eventually published as the Treatise on Probability (1921). In the Treatise, Keynes viewed probability as a degree of belief. He also wrote: “The old assumptions, that all quantity is numerical and that all quantitative characteristics are additive, can no longer be sustained. Mathematical reasoning now appears as an aid in its symbolic rather than its numerical character. I, at any rate, have not the same lively hope as Condorcet, or even as Edgeworth, ‘Éclairer les Sciences morales et politiques par le flambeau de l'Algèbre.’”12
Macro-economics as it is practiced today tends to put a heavy emphasis on the empirical relationships between economic aggregates. Keynes' views, as I infer from the Treatise, suggest that he was impressed neither by econometric relationships nor by algebraic manipulation. Moreover, his ideas on “speculative euphoria” and crashes13 would seem to be based on an understanding of the economy grounded not in econometrics or algebra but in the qualitative aspects of its dynamics.
Obviously I have in mind a dynamical representation of the economy somewhere in between macro-economics and general equilibrium theory. The laws of motion of such an economy would be derived from modeling individuals' “rational” behavior as they process information, update beliefs and locally optimise. At present it is not possible to construct such a micro-based macro-economy because the laws of motion are unknown. Nonetheless, just as simulation of global weather systems can be based on local physical laws, so may economic dynamics be built up from local “rationality” of individual agents. In my view, the qualitative theory of dynamical systems will have a major rôle in this enterprise. The applications of this theory, as outlined in the chapter, are intended only to give the reader a taste of how this theory might be developed.
10 See his work on the so-called Jury Theorem in his Essai of 1785. A discussion of Condorcet's work can be found in I. McLean and F. Hewitt, Condorcet: Foundations of Social Choice and Political Theory (1994) Edward Elgar: Aldershot, England.
11 See S. Bikhchandani, D. Hirshleifer and I. Welch, “A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades,” Journal of Political Economy (1992) 100: 992–1026.
12 John Maynard Keynes, Treatise on Probability (1921) Macmillan: London, p. 349. The two volumes by Robert Skidelsky on John Maynard Keynes (1986, 1992) are very useful in helping to understand Keynes' thinking in the Treatise and the General Theory.
13 See, for example, the work of Hyman Minsky, John Maynard Keynes (1975) Columbia University Press: New York, and Stabilizing an Unstable Economy (1986) Yale University Press: New Haven.
References
A very nice though brief survey of the applications of global analysis (or differential topology) to economics is:
Debreu, G. (1976). The application to economics of differential topology and global analysis: regular differentiable economies. The American Economic Review, 66, 280–287.
An advanced and detailed text on the use of differential topology in economics is:
Mas-Colell, A. (1985). The theory of general economic equilibrium. Cambridge: Cambridge University Press.
Background reading on differential topology and the ideas of transversality can be found in:
Chillingworth, D. R. J. (1976). Differential topology with a view to applications. London: Pitman.
Golubitsky, M., & Guillemin, V. (1973). Stable mappings and their singularities. Berlin: Springer.
Hirsch, M. (1976). Differential topology. Berlin: Springer.
For the Debreu-Smale Theorem see:
Balasko, Y. (1975). Some results on uniqueness and on stability of equilibrium in general equilibrium theory. Journal of Mathematical Economics, 2, 95–118.
Debreu, G. (1970). Economies with a finite number of equilibria. Econometrica, 38, 387–392.
Smale, S. (1974a). Global analysis and economics IV: finiteness and stability of equilibria with general consumption sets and production. Journal of Mathematical Economics, 1, 119–127.
Smale, S. (1974b). Global analysis and economics IIA: extension of a theorem of Debreu. Journal of Mathematical Economics, 1, 1–14.
The Smale-Pareto theorem on the generic structure of the Pareto set is in:
Smale, S. (1973). Global analysis and economics I: Pareto optimum and a generalization of Morse theory. In M. Peixoto (Ed.), Dynamical systems. New York: Academic Press.
Scarf's example and the use of the Euler characteristic are discussed in Dierker's notes on topological methods:
Dierker, E. (1972). Two remarks on the number of equilibria of an economy. Econometrica, 951–953.
Dierker, E. (1974). Lecture notes in economics and mathematical systems: Vol. 92. Topological methods in Walrasian economics. Berlin: Springer.
Scarf, H. (1960). Some examples of global instability of the competitive equilibrium. International Economic Review, 1, 157–172.
The Lefschetz fixed point theorem as an obstruction theory for the existence of fixed point free functions and continuous vector fields can be found in:
Brown, R. (1971). The Lefschetz fixed point theorem. Glenview: Scott and Foresman.
Some applications of these techniques in economics are in:
Keenan, D. (1992). Regular exchange economies with negative prices. In W. Neuefeind & R. Riezman (Eds.), Economic theory and international trade. Berlin: Springer.
Rader, T. (1972). Theory of general economic equilibrium. New York: Academic Press.
An application of this theorem to existence of a voting equilibrium is in:
Schofield, N. (1984). Existence of equilibrium on a manifold. Mathematics of Operations Research, 9, 545–557.
The results on structural stability are given in:
Peixoto, M. (1962). Structural stability on two-dimensional manifolds. Topology, 1, 101–120.
Smale, S. (1966). Structurally stable systems are not dense. American Journal of Mathematics, 88, 491–496.
A very nice and fairly elementary demonstration of Peixoto's Theorem in two dimensions, together with the much earlier classification results of Andronov and Pontrjagin, is given in:
Hubbard, J. H., & West, B. H. (1995). Differential equations: a dynamical systems approach. Berlin: Springer.
René Thom's early book applied these “topological” ideas to development:
Thom, R. (1975). Structural stability and morphogenesis. Reading: Benjamin.
The result that any excess demand function is possible can be found in:
Debreu, G. (1974). Excess demand functions. Journal of Mathematical Economics, 1, 15–21.
Mantel, R. (1974). On the characterization of aggregate excess demand. Journal of Economic Theory, 12, 197–201.
Sonnenschein, H. (1972). Market excess demand functions. Econometrica, 40, 549–563.
The idea of chaos has become rather fashionable recently. For a discussion see:
Gleick, J. (1987). Chaos: making a new science. New York: Viking.
An excellent background to the work of Poincaré, Birkhoff and Smale, and applications in meteorology is:
Lorenz, E. N. (1993). The essence of chaos. Seattle: University of Washington Press.
For applications of the idea of chaos in various economic and social choice contexts see:
Saari, D. (1985). Price dynamics, social choice, voting methods, probability and chaos. In D. Aliprantis, O. Burkinshaw, & N. J. Rothman (Eds.), Lecture notes in economics and mathematical systems (Vol. 244). Berlin: Springer.
6 Topology and Social Choice
In Chap. 3 we showed the Nakamura Theorem: a social choice can be guaranteed as long as the dimension of the space does not exceed k(σ) − 2. We now consider what can happen in dimension at least k(σ) − 1. We then go on to consider “probabilistic” social choice, where there is some uncertainty over voters' preferences.
6.1 Existence of a Choice
First we repeat some of the results presented in Chap. 3.
As we showed in that chapter, arguments for the existence of an equilibrium or choice are based on some version of Brouwer's fixed point theorem, which we can regard as a variant of the Fan Choice Theorem. Brouwer's theorem asserts that the finite dimensional ball, B, or indeed any compact convex set in $\Re^w$, has the fixed point property: any continuous function $f : B \to B$ has a fixed point.
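In one dimension Brouwer's theorem reduces to the intermediate value theorem, which also suggests a way of locating a fixed point. The Python sketch below is an illustration only; in higher dimensions other methods, such as Scarf's simplicial algorithm, are needed.

For a continuous $f : [0, 1] \to [0, 1]$, the sketch bisects on $g(x) = f(x) - x$. Since $g(0) \ge 0$ and $g(1) \le 0$, the sign change always brackets a fixed point. The example functions are hypothetical.

```python
import math


def fixed_point_1d(f, lo=0.0, hi=1.0, tol=1e-12):
    """Approximate a fixed point of a continuous f mapping [lo, hi] into itself.

    g(x) = f(x) - x satisfies g(lo) >= 0 and g(hi) <= 0, so bisection keeps a
    bracket [lo, hi] with g(lo) >= 0 >= g(hi) around a fixed point.
    """
    def g(x):
        return f(x) - x

    if g(lo) == 0:
        return lo
    if g(hi) == 0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = fixed_point_1d(math.cos)             # cos maps [0, 1] into [cos 1, 1]
y = fixed_point_1d(lambda t: 1 - t * t)  # fixed point is (sqrt(5) - 1) / 2
print(x, math.cos(x))
print(y, (math.sqrt(5) - 1) / 2)
```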
This section will consider the use of variants of the Brouwer theorem to prove existence of an equilibrium of a general social choice mechanism. We shall argue that the condition for existence of an equilibrium will be violated if there are cycles in the underlying mechanism.
Let $W \subset \Re^w$ be the set of alternatives, and let $2^W$ be the set of all subsets of W. A preference correspondence, P, on W assigns to each point $x \in W$ its preferred set $P(x)$. Write $P : W \rightrightarrows W$ to denote that the image of x under P is a set (possibly empty) in W. For any subset V of W, the restriction of P to V gives a correspondence $P_V : V \rightrightarrows V$. Define $P^{-1}_V : V \rightrightarrows V$ such that for each $x \in V$,
$$P^{-1}_V(x) = \{y : x \in P(y)\} \cap V.$$
The sets $P_V(x)$, $P^{-1}_V(x)$ are sometimes called the upper and lower preference sets of P on V. When there is no ambiguity we delete the suffix V. The choice of P from W is the set
$$C(W, P) = \{x \in W : P(x) = \varnothing\}.$$
N. Schofield, Mathematical Methods in Economics and Social Choice, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6_6, © Springer-Verlag Berlin Heidelberg 2014
Here ∅ is the empty set. The choice of P from a subset V of W is the set

$$C(V,P) = \{x \in V : P_V(x) = \emptyset\}.$$

Call C_P a choice function on W if C_P(V) = C(V,P) ≠ ∅ for every non-empty subset V of W. We now seek general conditions on W and P which are sufficient for C_P to be a choice function on W. Continuity properties of the preference correspondence are important, and so we require the set of alternatives W to be a topological space.
Definition 6.1 Let W, Y be two topological spaces. A correspondence P : W ↠ Y is
(i) Lower hemi-continuous (lhc) iff, for all x ∈ W and any open set U ⊂ Y such that P(x) ∩ U ≠ ∅, there exists an open neighborhood V of x in W such that P(x′) ∩ U ≠ ∅ for all x′ ∈ V.
(ii) Upper hemi-continuous (uhc) iff, for all x ∈ W and any open set U ⊂ Y such that P(x) ⊂ U, there exists an open neighborhood V of x in W such that P(x′) ⊂ U for all x′ ∈ V.
(iii) Lower demi-continuous (ldc) iff, for all x ∈ Y, the set

$$P^{-1}(x) = \{y \in W : x \in P(y)\}$$

is open (or empty) in W.
(iv) Upper demi-continuous (udc) iff, for all x ∈ W, the set P(x) is open (or empty) in Y.
(v) Continuous iff P is both ldc and udc.
(vi) Acyclic iff it is impossible to find a cycle x_t ∈ P(x_{t−1}), x_{t−1} ∈ P(x_{t−2}), …, x_1 ∈ P(x_t).
We shall use lower demi-continuity of a preference correspondence to prove existence of a choice. In some cases, however, it is possible to make use of lower hemi-continuity. Note that if P is ldc then it is lhc.
We shall now show that if W is compact and P : W ↠ W is an acyclic and ldc preference correspondence, then C(W,P) ≠ ∅. First of all, say a preference correspondence P : W ↠ W satisfies the finite maximality property (FMP) on W iff for every finite set V in W there exists x ∈ V such that P(x) ∩ V = ∅.
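Lemma 6.1 (Walker 1977) If W is a compact topological space and P is an ldc preference correspondence that satisfies FMP on W, then C(W,P) ≠ ∅.
This follows readily, using compactness to find a finite subcover, and then using FMP. In more detail, suppose C(W,P) = ∅. Then every x ∈ W has some y ∈ P(x), that is, x ∈ P^{-1}(y), so the sets {P^{-1}(y) : y ∈ W}, which are open because P is ldc, cover W. By compactness there is a finite subcover P^{-1}(y_1), …, P^{-1}(y_k). By FMP some y_j ∈ V = {y_1, …, y_k} satisfies P(y_j) ∩ V = ∅; but y_j ∈ P^{-1}(y_i) for some i, so y_i ∈ P(y_j) ∩ V, a contradiction.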
Corollary 6.1 If W is a compact topological space and P is an acyclic, ldc preference correspondence on W, then C(W,P) ≠ ∅.
As Walker (1977) noted, when W is compact and P is ldc, P is acyclic iff P satisfies FMP on W, and so either property can be used to show existence of a choice. An alternative method of proof to show that C_P is a choice function is to replace acyclicity by a convexity property of P.
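On a finite set of alternatives, the choice C(V,P), the finite maximality property and acyclicity can all be checked mechanically. The following Python sketch is illustrative only: it assumes P is stored as a dictionary mapping each alternative x to its upper preference set P(x), and the function names and the two small examples are hypothetical.

```python
def choice(V, P):
    """C(V, P): the alternatives x in V whose upper set P(x) misses V.

    P maps each alternative x to P(x), the set of alternatives
    strictly preferred to x; missing keys are treated as empty sets.
    """
    V = set(V)
    return {x for x in V if not (P.get(x, set()) & V)}


def has_fmp(W, P):
    """Finite maximality property: C(V, P) is non-empty for every non-empty V ⊆ W.

    Brute force over all non-empty subsets, so only suitable for small W.
    """
    W = list(W)
    for mask in range(1, 2 ** len(W)):
        V = {W[k] for k in range(len(W)) if (mask >> k) & 1}
        if not choice(V, P):
            return False
    return True


def is_acyclic(W, P):
    """True iff there is no cycle in which each element is preferred to the one before it.

    Depth-first search on the directed graph x -> P(x): meeting a vertex
    that is still on the current search path closes a cycle.
    """
    W = set(W)
    state = {x: "new" for x in W}

    def visit(x):
        state[x] = "active"
        for y in P.get(x, set()) & W:
            if state[y] == "active":
                return False
            if state[y] == "new" and not visit(y):
                return False
        state[x] = "done"
        return True

    return all(state[x] != "new" or visit(x) for x in W)


# Hypothetical examples on W = {a, b, c}.
cyclic = {"a": {"c"}, "b": {"a"}, "c": {"b"}}       # c beats a, a beats b, b beats c
linear = {"a": set(), "b": {"a"}, "c": {"a", "b"}}  # a > b > c
W = {"a", "b", "c"}
print(choice(W, cyclic), has_fmp(W, cyclic), is_acyclic(W, cyclic))  # set() False False
print(choice(W, linear), has_fmp(W, linear), is_acyclic(W, linear))  # {'a'} True True
```

On both examples the two tests agree, as Walker’s observation leads one to expect: the cycle among a, b and c is exactly what leaves the choice from {a, b, c} empty.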
Definition 6.2
(i) If W is a subset of a vector space, then the convex hull of W is the set Con[W] obtained by taking all convex combinations of points in W.
(ii) W is convex iff W = Con[W]. (The empty set is also convex.)
(iii) W is admissible iff W is a compact, convex subset of a topological vector space.
(iv) A preference correspondence P : W ↠ W on a convex set W is convex iff, for all x ∈ W, P(x) is convex.
(v) A preference correspondence P : W ↠ W is semi-convex iff, for all x ∈ W, it is the case that x ∉ Con(P(x)).
As we showed in Chap. 3, Fan (1961) has demonstrated that if W is admissible and P is ldc and semi-convex, then C(W,P) is non-empty.
Choice Theorem (Fan 1961; Bergstrom 1975) If W is an admissible subset of a Hausdorff topological vector space, and P : W ↠ W is a preference correspondence on W which is ldc and semi-convex, then C(W,P) ≠ ∅.
As demonstrated in Chap. 3, the proof uses the KKM lemma due to Knaster et al. (1929). There is a useful corollary to the Fan Choice Theorem. Say a preference correspondence on an admissible space W satisfies the convex maximality property (CMP) iff for any finite set V in W there exists x ∈ Con(V) such that P(x) ∩ Con(V) = ∅.
Corollary 6.2 Let W be admissible and P : W ↠ W be ldc and semi-convex. Then P satisfies the convex maximality property.
Numerous applications of the procedure have been made to show existence of economic equilibrium. Note, however, that these results all depend on semi-convexity of the preference correspondences.
6.2 Dynamical Choice Functions
We now consider a generalized preference field H : W ↠ TW on a smooth manifold W.
We use this notation to mean that at any x ∈ W, H(x) is a cone in the tangent space T_xW above x. That is, if a vector v ∈ H(x), then λv ∈ H(x) for any λ > 0. If there is a smooth curve c : [−1,1] → W such that the differential dc(t)/dt ∈ H(x) whenever c(t) = x, then c is called an integral curve of H. An integral curve of H from x = c(0) to y = lim_{t→1} c(t) is called an H-preference curve from x to y. The preference field H is called S-continuous iff, for any x and any v ∈ H(x) ≠ ∅, there is an integral curve c in a neighborhood of x with c(0) = x and dc(0)/dt = v. The choice C(W,H) of H on W is defined by

$$C(W,H) = \{x \in W : H(x) = \emptyset\}.$$
Say H is half open if at every x ∈ W, either H(x) = ∅ or there exists a vector v′ ∈ T_xW such that (v′ · v) > 0 for all v ∈ H(x). We can say in this case that there is, at x, a direction gradient d in the cotangent space T*_xW of linear maps from T_xW to ℝ such that d(v) > 0 for all v ∈ H(x). If H is S-continuous and half open, then there will exist such a continuous direction gradient d : V → T*V on a neighborhood V of x.
Choice Theorem If H is an S-continuous half open preference field on a finite-dimensional compact manifold W, then C(W,H) ≠ ∅. If H is not half open, then there exists an H-preference cycle through {x_1, x_2, x_3, …, x_r, x_1}: for each arc (x_s, x_{s+1}) there is an H-preference curve from x_s to x_{s+1}, with a final H-preference curve from x_r to x_1.
The Choice Theorem implies the existence of a singularity of the field H.
Existence of Nash Equilibrium Let {W_1, …, W_n} be a family of compact, contractible, smooth strategy spaces with each W_i ⊂ ℝ^w. A smooth profile is a function u : W^N = W_1 × W_2 × ⋯ × W_n → ℝ^n. Let H_i : W_i ↠ TW_i be the induced i-preference field in the tangent space over W_i. If each H_i is S-continuous and half open in TW_i, then there exists a critical Nash equilibrium z ∈ W^N such that H^N(z) = (H_1 × ⋯ × H_n)(z) = ∅.
This follows from the previous Choice Theorem because the product preference field H^N will be half open and S-continuous. Below we consider existence of local Nash equilibrium. With smooth utility functions, a local Nash equilibrium can be found by checking the second order conditions on the Hessians. We now repeat Example 3.12 from Chap. 3.
Example 6.1 To illustrate the Choice Theorem, consider the example due to Kramer (1973), with N = {1,2,3}. Let the preference relation P_D : W ↠ W be generated by the set of decisive coalitions D = {{1,2}, {1,3}, {2,3}}, so that y ∈ P_D(x) whenever at least two voters prefer y to x. Suppose further that the preferences of the voters are characterized by the direction gradients

$$\{du_i(x) : i = 1,2,3\}$$

as in Fig. 6.1.
As the figure makes evident, it is possible to find three points {a, b, c} in W such that

$$\begin{aligned} u_1(a) &> u_1(b) = u_1(x) > u_1(c)\\ u_2(b) &> u_2(c) = u_2(x) > u_2(a)\\ u_3(c) &> u_3(a) = u_3(x) > u_3(b). \end{aligned}$$
Fig. 6.1 Cycles near a point x
That is to say, preferences on {a, b, c} give rise to a Condorcet cycle. Note also that the set of points P_D(x) preferred to x under the voting rule forms the shaded “win sets” in the figure. Clearly x ∈ Con P_D(x), so P_D is not semi-convex at x. Indeed it should be clear that in any neighborhood V of x it is possible to find three points {a′, b′, c′} such that there is a local voting cycle, with a′ ∈ P_D(b′), b′ ∈ P_D(c′), c′ ∈ P_D(a′). We can write this as

$$a' \to c' \to b' \to a'.$$

Not only is there a voting cycle, but the Fan theorem fails, and we have no reason to believe that C(W,P_D) ≠ ∅.
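The cycle in Example 6.1 can be reproduced from the rankings implied by the utility inequalities above. In this Python sketch the alternatives are just the three points a, b and c (the comparison point x is left out), voters are numbered 1 to 3, and the names RANKINGS, DECISIVE and P_D are hypothetical. P_D(x) collects every y that all members of some decisive coalition prefer to x.

```python
# Rankings implied by the utility inequalities of Example 6.1, best first.
RANKINGS = {1: ["a", "b", "c"], 2: ["b", "c", "a"], 3: ["c", "a", "b"]}
# Decisive coalitions under majority rule with three voters.
DECISIVE = [{1, 2}, {1, 3}, {2, 3}]
ALTERNATIVES = ["a", "b", "c"]


def prefers(i, y, x):
    """True if voter i ranks y strictly above x."""
    return RANKINGS[i].index(y) < RANKINGS[i].index(x)


def P_D(x):
    """Alternatives that some decisive coalition unanimously prefers to x."""
    return {
        y
        for y in ALTERNATIVES
        if y != x and any(all(prefers(i, y, x) for i in M) for M in DECISIVE)
    }


for x in ALTERNATIVES:
    print(x, "is beaten by", sorted(P_D(x)))
# a is beaten by ['c']
# b is beaten by ['a']
# c is beaten by ['b']

# No alternative is unbeaten, so the choice from {a, b, c} is empty.
print([x for x in ALTERNATIVES if not P_D(x)])  # []
```

Each alternative is beaten by exactly one other, which is the cycle a → c → b → a in the arrow notation used above.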
We can translate this example into one on preference fields by writing

$$H_D(u) = \bigcup_{M \in D} H_M(u) : W \twoheadrightarrow TW,$$

where, for each M ∈ D,

$$H_M(u)(x) = \{v \in T_xW : (du_i(x) \cdot v) > 0 \ \ \forall i \in M\}.$$

Figure 6.2 shows the three different individual preference fields {H_i : i = 1,2,3} on W, as well as the intersections H_M for M = {1,2}, etc.
Obviously the joint preference field H_D : W ↠ TW fails the half open property at x. Although H_D is S-continuous, we cannot infer that C(W,H_D) ≠ ∅. If we define

$$\mathrm{Cycle}(W,H) = \{x \in W : H(x) \text{ is not half open}\},$$

then at any point in Cycle(W,H) it is possible to construct local cycles in the manner just described.
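In the plane, half-openness at a point can be checked approximately by sampling unit directions. The sketch below is a numerical illustration under stated assumptions, not part of the theory: the three gradients du_i(x) are hypothetical unit vectors at 90°, 210° and 330°, chosen to mimic the symmetric configuration of Fig. 6.2. A set of directions on the circle fits inside an open half-plane exactly when some angular gap between consecutive directions exceeds π, and that is the test applied to the sampled H_D(x).

```python
import math


def unit(degrees):
    """Unit vector at the given angle (used for hypothetical gradients)."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r))


def in_cone(v, grads, tol=1e-9):
    """v lies in H_M(x) = {v : du_i(x)·v > 0 for all i in M}; tol guards rounding."""
    return all(v[0] * g[0] + v[1] * g[1] > tol for g in grads)


def field_directions(grads, coalitions, n_samples=3600):
    """Angles of sampled unit directions lying in H_D(x), the union of the cones H_M(x)."""
    kept = []
    for k in range(n_samples):
        theta = 2 * math.pi * k / n_samples
        v = (math.cos(theta), math.sin(theta))
        if any(in_cone(v, [grads[i] for i in M]) for M in coalitions):
            kept.append(theta)
    return kept


def is_half_open(angles):
    """Directions fit in an open half-plane iff some angular gap between them exceeds pi."""
    if not angles:
        return True  # H(x) is empty, so x belongs to the choice C(W, H)
    a = sorted(angles)
    gaps = [q - p for p, q in zip(a, a[1:])]
    gaps.append(2 * math.pi - a[-1] + a[0])  # wrap-around gap
    return max(gaps) > math.pi


MAJORITY = [(0, 1), (0, 2), (1, 2)]        # decisive coalitions, voters indexed 0..2
KRAMER = [unit(90), unit(210), unit(330)]  # hypothetical gradients at x
ALIGNED = [unit(90), unit(90), unit(90)]   # all voters pull the same way

print(is_half_open(field_directions(KRAMER, MAJORITY)))   # False: x lies in Cycle(W, H)
print(is_half_open(field_directions(ALIGNED, MAJORITY)))  # True
```

With the symmetric gradients, H_D(x) consists of three 60° arcs separated by gaps of about 60°, so no half-plane contains it. Because the test rests on a finite sample of directions, a gap only marginally above or below π should be read with caution.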
Fig. 6.2 The failure of half-openness of a preference field
The Choice Theorem can then be interpreted to mean that, for any S-continuous field H on W,

$$\mathrm{Cycle}(W,H) \cup C(W,H) \neq \emptyset.$$

Chichilnisky (1995) has obtained similar results for markets, where the condition that the dual is non-empty was termed limited arbitrage, and defined in terms of global market co-cones associated with each player. Such a dual co-cone [H_i(u)]* is precisely the set of prices in the cotangent space that lie in the dual of the preferred cone [H_i(u)] of the agent. By analogy with the above, she identifies this condition, non-emptiness of the intersection of the family of co-cones, as one which is necessary and sufficient to guarantee an equilibrium.
The following theorem implies that Fig. 6.2 is “generic.” As in Chap. 4, a property is generic if it is true of all profiles in a residual set in C^r(W, ℝ^n), the space of smooth profiles for a society of size n on the policy space W, endowed with the Whitney topology. Now consider a non-collegial voting game D with Nakamura number κ(D). Then we have the following theorem by Saari (1997).
Saari Theorem For any non-collegial D, there exists an integer w(D) > κ(D) such that dim(W) > w(D) implies that C(W, H_D(u)) = ∅ for all u in a residual subset of C^r(W, ℝ^n).
This result was essentially proved by Saari (1997), building on earlier results by Banks (1995), McKelvey (1976, 1979), Kramer (1973), Plott (1969), Schofield (1978, 1983) and McKelvey and Schofield (1987). Although this result formally applies to voting rules, Schofield (2010) argues that it is applicable to any non-collegial social mechanism, and as a result can be interpreted to imply that chaos is a generic phenomenon in coalitional systems. Since an equilibrium may not exist, we now introduce a more general social solution concept, called the heart.
Fig. 6.3 The heart with a uniform electorate on the triangle
Definition 6.3 (The Heart)
(i) If W is a topological space, then x ∈ W is locally covered (under a preference correspondence Q) iff for any neighborhood Y of x in W there exists y ∈ Y such that (a) y ∈ Q(x), and (b) there exists a neighborhood Y′ of y, with Y′ ⊆ Y, such that Y′ ∩ Q(y) ⊂ Q(x).
(ii) The heart of Q, written H(Q), is the set of locally uncovered points in W.
This notion can be applied to a preference correspondence P_D or to the preference field H_D(u), in which case we write H(P_D) or H(H_D(u)). Schofield (1999) shows that the heart belongs to the Pareto set, and is lower hemi-continuous when regarded as a correspondence.
Example 6.2 To illustrate the heart, Fig. 6.3 gives a simple artificial example where the utility profile u is one where society has “Euclidean” preferences, based on distance, and the ideal points are uniformly distributed on the boundary of the equilateral triangle. Under majority rule D, the heart H(H_D(u)) is the star-shaped figure inside the equilateral triangle (the Pareto set), and contains the “yolk” (McKelvey 1986). The heart is generated by an infinite family of “median lines,” such as {M_1, M_2, …}. The shape of the heart reflects the asymmetry of the distribution. Inside the heart, voting trajectories can wander anywhere. Outside the heart the dual cones intersect, so any trajectory starting outside the heart converges to the heart. Thus the heart is an “attractor” of the voting process. Figure 6.4 gives a similar example, this time with a uniform electorate on the pentagon, and the heart is the small centrally located circle.
Fig. 6.4 The heart with a uniform electorate on the pentagon
6.3 Stochastic Choice
To construct a social preference field that allows for uncertainty over voters’ preferences, we first consider political choice on a compact space W of political proposals. This model is an extension of the standard multiparty stochastic model, modified by inducing asymmetries in terms of the preferences of voters.
We define a stochastic electoral model M(λ, μ, θ, α, β), which utilizes sociodemographic variables and voter perceptions of character traits. For this model we assume that the utility of voter i is given by the expression

$$\begin{aligned} u_{ij}(x_i, z_j) &= \lambda_j + \mu_j(z_j) + (\theta_j \cdot \eta_i) + (\alpha_j \cdot \tau_i) - \beta\|x_i - z_j\|^2 + \varepsilon_j && (6.1)\\ &= u^*_{ij}(x_i, z_j) + \varepsilon_j. && (6.2) \end{aligned}$$

The points {x_i ∈ W : i ∈ N} are the preferred policies of the set N of voters in the political or policy space W, and z = {z_j ∈ W : j ∈ Q} are the positions of the agents/candidates. The term ‖x_i − z_j‖ is simply the Euclidean distance between x_i and z_j. The error vector (ε_1, …, ε_j, …, ε_q) is distributed by the iid type I extreme value distribution, as assumed in empirical multinomial logit (MNL) estimation. The symbol θ denotes a set of k-vectors {θ_j : j ∈ Q} representing the effect of the k different sociodemographic parameters (class, domicile, education, income, religious orientation, etc.) on voting for agent j, while η_i is a k-vector denoting the ith individual’s relevant “sociodemographic” characteristics. The compositions {(θ_j · η_i)} are scalar products, called the sociodemographic valences for j. These scalar terms characterize the various types of the voters.
The terms {(α_j · τ_i)} are scalars giving voter i’s perceptions and beliefs. These can include perceptions of the character traits of candidate or agent j, or beliefs about the state of the economy, etc. We let α = (α_q, …, α_1). A trait score can be obtained by factor analysis from a set of survey questions asking respondents about the traits of the agent, including ‘moral’, ‘caring’, ‘knowledgeable’, ‘strong’, ‘honest’, ‘intelligent’, etc. The perception of traits can be augmented with voter perception of the state of the economy, etc., in order to examine how anticipated changes in the economy affect each agent’s electoral support.
The exogenous valence vector λ = (λ_1, λ_2, …, λ_q) gives the general perception of the quality of the various candidates {1, …, q}. This vector satisfies λ_q ≥ λ_{q−1} ≥ ⋯ ≥ λ_2 ≥ λ_1, where (1, …, q) label the candidates, and λ_j is the exogenous valence of agent or candidate j. In empirical multinomial logit models, the valence vector λ is given by the intercept terms for each agent. Finally, {μ_j(z_j)} represent the endogenous valences of the candidates. These valences depend on the positions {z_j ∈ W : j ∈ Q} of the agents.
In the model, the probability that voter i chooses candidate j, when party positions are given by z, is

$$\rho_{ij}(z) = \Pr\big[u_{ij}(x_i, z_j) > u_{il}(x_i, z_l) \ \text{ for all } l \neq j\big].$$

A strict local Nash equilibrium (SLNE) is a vector z such that each candidate j has chosen z_j to locally strictly maximize the expectation Σ_i ρ_ij(z).
The type I extreme value distribution Ψ has the closed-form cumulative distribution function

$$\Psi(h) = \exp[-\exp[-h]],$$

while its variance is π²/6.
With this distribution it follows, for each voter i and candidate j, that

$$\rho_{ij}(z) = \frac{\exp[u^*_{ij}(x_i, z_j)]}{\sum_{k=1}^{q} \exp[u^*_{ik}(x_i, z_k)]}. \qquad (6.3)$$
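Equation (6.3) is the familiar multinomial logit (softmax) formula. The Python sketch below computes it and, as a sanity check, compares it with a Monte Carlo simulation that adds independent type I extreme value errors drawn by inverting Ψ. The deterministic utilities in the example are an assumption for illustration: both candidates at a common position, so that only the exogenous valences of model (1) in Table 6.4 differ.

```python
import math
import random


def logit_probs(u_star):
    """Choice probabilities (6.3) for one voter, given the deterministic utilities u*_ik."""
    m = max(u_star)  # subtract the maximum for numerical stability
    w = [math.exp(u - m) for u in u_star]
    total = sum(w)
    return [x / total for x in w]


def gumbel(rng):
    """Draw from Psi(h) = exp(-exp(-h)) by inverting the CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) at the closed end of [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def simulated_probs(u_star, n_draws=200_000, seed=1):
    """Fraction of draws in which each candidate has the highest noisy utility."""
    rng = random.Random(seed)
    wins = [0] * len(u_star)
    for _ in range(n_draws):
        noisy = [u + gumbel(rng) for u in u_star]
        wins[noisy.index(max(noisy))] += 1
    return [w / n_draws for w in wins]


# Hypothetical example: [McCain, Obama] at a common position, so u* is the valence only.
u = [-0.84, 0.0]
print([round(p, 3) for p in logit_probs(u)])      # [0.302, 0.698]
print([round(p, 2) for p in simulated_probs(u)])  # close to [0.3, 0.7]
```

The simulated frequencies should match the closed form up to sampling error, which is the content of (6.3).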
This game is an example of what is known as a quantal response game (McKelvey and Palfrey 1995; Levine and Palfrey 2007). Note that the utility expressions {u*_ik(x_i, z_k)} can be estimated from surveys that include vote intentions. We can use the American National Election Studies (ANES) for 2008, which gave individual perceptions of the important political policy questions. As indicated in Table 6.1, we are able to use factor analysis of these responses to construct a two-dimensional policy space. ANES 2008 also gave voter perceptions of the character traits of the candidates, in terms of “moral”, “caring”, “knowledgeable”, “strong” and “honest”. We performed a factor analysis of these perceptions, as shown in Table 6.2. Further details of the model can be found in Schofield et al. (2011). Table 6.3 gives estimates of the average positions of voters supporting each of the two parties. We also obtained data on those voters who declared that they provided support for the candidates; these we designated as activists. Figure 6.5 gives an estimate of the voter ideal points as well as the positions of the two presidential candidates in 2008.
Table 6.1 Factor loadings for economic and social policy for the 2008 election
Question                         Economic policy   Social policy
Less Government services         0.53              0.12
Oppose Universal health care     0.51              0.22
Oppose Bigger Government         0.50              0.14
Prefer Market to Government      0.56
Decrease Welfare spending        0.24
Less government                  0.65
Worry more about Equality        0.14              0.37
Tax Companies Equally            0.28              0.10
Support Abortion                                   0.55
Decrease Immigration             0.12              0.25
Civil rights for gays                              0.60
Disagree Traditional values                        0.53
Gun access                                         0.36
Support Afr. Amer                0.14              0.45
Conservative v Liberal           0.30              0.60
Eigenvalue                       1.93              1.83
Table 6.2 Factor loadings for candidate trait scores, 2008
Question                   Obama traits   McCain traits
Obama Moral                0.72           −0.01
Obama Caring               0.71           −0.18
Obama Knowledgeable        0.61           −0.07
Obama Strong               0.69           −0.13
Obama Honest               0.68           −0.09
Obama Intelligent          0.61           0.08
Obama Optimistic           0.55           0.00
McCain Moral               −0.09          0.67
McCain Cares               −0.17          0.63
McCain Knowledgeable       −0.02          0.65
McCain Strong              −0.10          0.70
McCain Honest              −0.03          0.63
McCain Intelligent         0.11           0.68
McCain Optimistic          −0.07          0.57
Eigenvalue                 3.07           3.00
These survey data allow us to construct a spatial logit model of the election, as in Table 6.4. This table has Obama as the baseline candidate.
Table 6.3 Estimated voter, activist and candidate positions in 2008
                  Econ policy                          Social policy
                  Mean    s.e.   95 % C.I.             Mean    s.e.   95 % C.I.          n
Activists
  Democrats       −0.20   0.09   [−0.38, −0.02]        1.14    0.11   [0.92, 1.37]       80
  Republicans     1.41    0.13   [1.16, 1.66]          −0.82   0.09   [−0.99, −0.65]     40
Non-activists
  Democrats       −0.17   0.03   [−0.24, −0.11]        0.36    0.04   [0.29, 0.44]       449
  Republicans     0.72    0.06   [0.60, 0.84]          −0.56   0.05   [−0.65, −0.46]     219
Total                                                                                    788
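The reported intervals appear broadly consistent with the normal approximation mean ± 1.96 s.e., to within rounding of the reported means and standard errors; that reading is an assumption, since the method is not stated. Under it, the Republican activists’ economic-policy interval is [1.16, 1.66], which confirms that its endpoints had simply been printed in reverse order. A minimal Python sketch:

```python
def normal_ci(mean, se, z=1.96):
    """Approximate 95% interval mean ± z·se, rounded to two decimals."""
    return (round(mean - z * se, 2), round(mean + z * se, 2))


print(normal_ci(1.41, 0.13))   # (1.16, 1.66)   Republican activists, economic policy
print(normal_ci(-0.20, 0.09))  # (-0.38, -0.02) Democratic activists, economic policy
```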
Fig. 6.5 Distribution of voter ideal points and candidate positions in the 2008 presidential election
Once the voter probabilities over a given set z = {z_j : j ∈ Q} are computed, estimation procedures allow these probabilities to be computed for each possible vector z. This allows the determination and proof of existence of local Nash equilibrium (LNE), namely a vector z* = (z*_1, …, z*_j, …, z*_q) such that each candidate j chooses z*_j to locally maximize its expected vote share, given by the expectation V_j(z*) = (1/n) Σ_i ρ_ij(z*).
Table 6.4 Spatial logit models for USA 2008ᵃ
Variable              (1) Spatial   (2) Sp. & Traits   (3) Sp. & Dem.   (4) Full
McCain valence λ      −0.84∗∗∗      −1.08∗∗∗           −2.60∗∗          −3.58∗∗∗
                      (7.6)         (8.3)              (2.8)            (3.4)
Spatial β             0.85∗∗∗       0.78∗∗∗            0.86∗∗∗          0.83∗∗∗
                      (14.1)        (10.1)             (12.3)           (10.3)
McCain traits                       1.30∗∗∗                             1.36∗∗∗
                                    (7.6)                               (7.15)
Obama traits                        −1.02∗∗∗                            −1.16∗∗∗
                                    (6.8)                               (6.44)
Age                                                    −0.01            −0.01
                                                       (1.0)            (1.0)
Gender (F)                                             0.29             0.44
                                                       (1.26)           (0.26)
African American                                       −4.16∗∗∗         −3.79∗∗∗
                                                       (3.78)           (3.08)
Hispanic                                               −0.55            −0.23
                                                       (1.34)           (0.51)
Education                                              0.15∗            0.22∗∗∗
                                                       (2.5)            (3.66)
Income                                                 0.03             0.01
                                                       (1.5)            (0.50)
Working Class                                          −0.54∗           −0.70∗∗
                                                       (2.25)           (2.59)
South                                                  0.36             −0.02
                                                       (1.5)            (0.07)
Observations          788
log likelihood (LL)   −299          −243               −250             −207
AIC                   601           494                521              438
BIC                   611           513                567              494
ᵃObama is the baseline for this model

Schofield (2006) shows that the first order condition for an LNE is that the marginal electoral pull at z* = (z*_1, …, z*_j, …, z*_q) is zero. For candidate j, this is defined to be

$$\frac{dE^*_j}{dz_j}(z^*_j) = \big[z^{el}_j - z^*_j\big], \qquad \text{where} \qquad z^{el}_j \equiv \sum_{i=1}^{n} \varpi_{ij}\, x_i$$

is the weighted electoral mean of candidate j. Here the weights {ϖ_ij} are individual specific, and defined at the vector z* by

$$\varpi_{ij} = \frac{\rho_{ij}(z^*) - \rho_{ij}(z^*)^2}{\sum_{k \in N}\big[\rho_{kj}(z^*) - \rho_{kj}(z^*)^2\big]}. \qquad (6.4)$$
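The weights (6.4) and the weighted electoral mean are straightforward to compute once the probabilities ρ_ij(z*) are known. In this Python sketch the ideal points and probabilities are hypothetical inputs, and the function names are illustrative. It also shows the special case used below: equal probabilities give equal weights 1/n, so z^el_j reduces to the ordinary electoral mean.

```python
def electoral_weights(rho_j):
    """Weights (6.4): varpi_ij proportional to rho_ij - rho_ij**2, normalised to sum to 1."""
    raw = [r - r * r for r in rho_j]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_electoral_mean(ideal_points, rho_j):
    """z^el_j = sum_i varpi_ij x_i, with ideal points given as coordinate tuples."""
    weights = electoral_weights(rho_j)
    dim = len(ideal_points[0])
    return tuple(
        sum(w * x[d] for w, x in zip(weights, ideal_points)) for d in range(dim)
    )


points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]           # hypothetical ideal points
print(weighted_electoral_mean(points, [0.3, 0.3, 0.3]))  # ≈ (1.0, 1.0), the plain mean
print(electoral_weights([0.5, 0.1, 0.1]))                # most weight on the voter with rho = 0.5
```

Because ρ − ρ² peaks at ρ = 1/2, the voters who weigh most heavily in z^el_j are those closest to indifference about candidate j.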
Because the candidate utility functions {V_j : W → ℝ} are differentiable, the second order condition on the Hessian of each V_j at z* can then be used to determine whether z* is indeed an LNE. Proof of existence of such an LNE will then follow from some version of the Choice Theorem. For example, in the simpler model M(λ, β), without activists, all weights are equal to 1/n when all candidates adopt the same position, so the electoral mean x_0 = (1/n) Σ_i x_i satisfies the first order condition, as suggested by Hinich (1977). Schofield (2007) gives the necessary and sufficient second order conditions for an LNE at the mean.
Fig. 6.6 Optimal Republican position
The underlying idea of this model is that each candidate j will be attracted to the respective weighted electoral mean z^el_j, but will also be pulled away to a position preferred by the party activists. Figure 6.6 suggests the location of the weighted electoral mean for a Republican candidate. The contract curve in this figure is a heuristic estimate of the locus of preferred activist positions. See also Schofield et al. (2011).
We can compute the necessary and sufficient second order conditions for an LNE at the electoral mean. In the case that the activist valence functions, the sociodemographic terms and the trait terms are identically zero, we call this the pure spatial model, denoted M(λ, β).
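In this case, the first order condition is

$$\begin{aligned} \frac{dV_j(z)}{dz_j} &= \frac{1}{n}\sum_{i \in N} \frac{d\rho_{ij}}{dz_j} && (6.5)\\ &= \frac{1}{n}\sum_{i \in N} \{2\beta(x_i - z_j)\}\big[\rho_{ij} - \rho_{ij}^2\big] = 0. && (6.6) \end{aligned}$$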
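Suppose that all z_j are identical. Then all ρ_ij are independent of {x_i}, and thus of i, and ρ_ij may be written as ρ_j. Then for each fixed j, the first order condition is

$$\frac{dV_j(z)}{dz_j} = \frac{2\beta}{n}\big[\rho_j - \rho_j^2\big]\sum_{i \in N}(x_i - z_j) = 0, \qquad (6.7)$$

which, since β > 0 and 0 < ρ_j < 1, holds precisely when z_j = (1/n) Σ_i x_i, the electoral mean x_0.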
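A numerical check of (6.6) guards against sign errors in the derivative. The Python sketch below uses hypothetical ideal points whose mean is the origin; the valences and β of model (1) in Table 6.4 serve only as plausible magnitudes. It compares the analytic gradient (6.6) with a central finite difference of V_j, and confirms that the gradient vanishes when both candidates sit at the electoral mean.

```python
import math


def probs(x, z, lam, beta):
    """rho_ij for a voter with ideal point x in the pure spatial model M(lambda, beta)."""
    u = [lam[j] - beta * sum((a - b) ** 2 for a, b in zip(x, z[j])) for j in range(len(z))]
    m = max(u)
    e = [math.exp(v - m) for v in u]
    total = sum(e)
    return [v / total for v in e]


def vote_share(j, z, pts, lam, beta):
    """V_j(z) = (1/n) sum_i rho_ij(z)."""
    return sum(probs(x, z, lam, beta)[j] for x in pts) / len(pts)


def analytic_gradient(j, z, pts, lam, beta):
    """Right-hand side of (6.6): (1/n) sum_i 2 beta (x_i - z_j)(rho_ij - rho_ij^2)."""
    g = [0.0] * len(z[j])
    for x in pts:
        r = probs(x, z, lam, beta)[j]
        for d in range(len(g)):
            g[d] += 2 * beta * (x[d] - z[j][d]) * (r - r * r) / len(pts)
    return g


def numeric_gradient(j, z, pts, lam, beta, h=1e-6):
    """Central finite-difference approximation to dV_j/dz_j."""
    g = []
    for d in range(len(z[j])):
        zp = [list(p) for p in z]
        zm = [list(p) for p in z]
        zp[j][d] += h
        zm[j][d] -= h
        diff = vote_share(j, zp, pts, lam, beta) - vote_share(j, zm, pts, lam, beta)
        g.append(diff / (2 * h))
    return g


pts = [(-1.0, 0.0), (1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]  # hypothetical; mean at origin
lam, beta = [-0.84, 0.0], 0.85                           # [McCain, Obama]
at_mean = [(0.0, 0.0), (0.0, 0.0)]
moved = [(0.3, -0.2), (0.0, 0.0)]
print(analytic_gradient(0, at_mean, pts, lam, beta))  # [0.0, 0.0]
print(analytic_gradient(0, moved, pts, lam, beta))
print(numeric_gradient(0, moved, pts, lam, beta))     # should agree closely with the line above
```

Away from the mean the gradient is non-zero and points McCain back toward the origin, consistent with the joint mean satisfying the first order condition.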
The second order condition for an LNE at z* depends on the negative definiteness of the Hessian of the activist valence function. If the eigenvalues of these Hessians are negative at a balance solution, and of sufficient magnitude, then this will guarantee that a vector z* which satisfies the balance condition will be an SLNE. Indeed, this condition can ensure concavity of the vote share functions, and thus the existence of a pure strategy Nash equilibrium (PNE).
6.3.1 The Model Without Activist Valence Functions
We now apply this analysis to the pure spatial model M(λ, β), obtained by setting μ = θ = α ≡ 0.
As we have shown above, the joint electoral mean z_0 satisfies the first order condition for an LNE. We now consider the second order condition.
Definition 6.4 (The Convergence Coefficient of the Model M(λ, β)) Suppose the space W has dimension w.
(i) Define

$$\rho_1 = \Big[1 + \sum_{k=2}^{q} \exp[\lambda_k - \lambda_1]\Big]^{-1}. \qquad (6.8)$$
(ii) Let W be endowed with an orthogonal system of coordinate axes (1, …, s, …, t, …, w). For each coordinate axis let ξ_t = (x_{1t}, x_{2t}, …, x_{nt}) ∈ ℝ^n be the vector of the tth coordinates of the set of n voter ideal points. Let (ξ_s, ξ_t) ∈ ℝ denote the scalar product. The covariance between the sth and tth axes is denoted (σ_s, σ_t) = (1/n)(ξ_s, ξ_t), and σ_s² = (1/n)(ξ_s, ξ_s) is the electoral variance on the sth axis. Note that these variances and covariances are taken about the electoral means on each axis.
(iii) The symmetric w × w electoral covariance matrix ∇_0 is defined to be

$$\nabla_0 = \frac{1}{n}\big[(\xi_s, \xi_t)\big]_{s,t = 1,\dots,w}.$$

(iv) The electoral variance is

$$\sigma^2 = \sum_{s=1}^{w} \sigma_s^2 = \frac{1}{n}\sum_{s=1}^{w} (\xi_s, \xi_s) = \operatorname{trace}(\nabla_0).$$

(v) The w × w characteristic matrix of agent 1 is given by

$$C_1 = 2\beta(1 - 2\rho_1)\nabla_0 - I. \qquad (6.9)$$

(vi) The convergence coefficient of the model M(λ, β) is

$$c \equiv c(\lambda, \beta) = 2\beta[1 - 2\rho_1]\sigma^2. \qquad (6.10)$$
Observe that the β-parameter has dimension L^{−2}, so that c is dimensionless. We can therefore use c to compare different models.
Note also that agent 1 is by definition the agent with the lowest valence, and ρ_1, as defined above, is the probability that a generic voter will choose this agent when all agents are located at the origin (the electoral mean). As (6.8) shows, the estimate of the probability ρ_1 depends only on the valence differences {λ_k − λ_1}, and these can be estimated from the intercept terms of the logit model.
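Formula (6.8) depends only on valence differences, so it can be evaluated directly. A minimal Python sketch, assuming the valences are supplied as a list in any order (the hypothetical function rho_1 sorts them so that λ_1 is the lowest):

```python
import math


def rho_1(valences):
    """(6.8): probability that a voter picks the lowest-valence agent when all agents
    are at the electoral mean, [1 + sum_{k>=2} exp(lambda_k - lambda_1)]^(-1)."""
    lam = sorted(valences)  # lam[0] plays the role of lambda_1
    return 1.0 / (1.0 + sum(math.exp(lam_k - lam[0]) for lam_k in lam[1:]))


print(round(rho_1([0.0, -0.84]), 2))  # 0.3  (Obama, McCain valences of model (1), Table 6.4)
print(rho_1([0.5, 0.5, 0.5]))         # equal valences give 1/q
```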
The following result is proved in Schofield (2007).
The Mean Voter Valence Theorem
(i) The joint mean z_0 satisfies the first order condition to be an LNE for the model M(λ, β).
(ii) The necessary and sufficient second order condition for an SLNE at z_0 is that C_1 has negative eigenvalues.¹
(iii) A necessary condition for z_0 to be an SLNE for the model M(λ, β) is that c(λ, β) < w.
(iv) A sufficient condition for convergence to z_0 in the two-dimensional case is that c < 1.
Notice that (iii) follows from (ii), since the condition of negative eigenvalues means that

$$\operatorname{trace}(C_1) = 2\beta[1 - 2\rho_1]\sigma^2 - w < 0.$$

¹ In the usual way, the condition for an LNE, as opposed to an SLNE, is that C_1 be negative semi-definite, that is, that its eigenvalues be non-positive.
In the case c(λ, β) = w, trace(C_1) = 0, which means either that all eigenvalues are zero, or that at least one is positive. This degenerate situation requires examination of C_1. In two dimensions, the additional condition c < 1 is sufficient to guarantee that det(C_1) > 0, which, together with the negative trace, ensures that both eigenvalues are negative.
The expression for C_1 has a simple form because of the assumption of a single distance parameter β. It is possible to use a model with different coefficients β = {β_1, β_2, …, β_w} on each dimension. In this case, writing β also for the diagonal matrix diag(β_1, …, β_w), the characteristic matrix can readily be shown to be

$$C_1 = 2(1 - 2\rho_1)\beta\nabla_0\beta - \beta.$$

We require trace(C_1) < 0, or

$$2(1 - 2\rho_1)\operatorname{trace}(\beta\nabla_0\beta) < \beta_1 + \beta_2 + \cdots + \beta_w.$$

The convergence coefficient in this case is

$$c(\lambda, \beta) = \frac{2(1 - 2\rho_1)\operatorname{trace}(\beta\nabla_0\beta)}{\frac{1}{w}(\beta_1 + \beta_2 + \cdots + \beta_w)},$$

again giving the necessary condition c(λ, β) < w.
Note that if C_1 has negative eigenvalues, then the Hessians of the vote shares for all agents are negative definite at the joint mean z_0. When this is true, the joint mean is a candidate for a PNE, and this property can be verified by simulation.
When the convergence condition c(λ, β) < w is violated, the joint mean cannot be an SPNE.
In the degenerate case c(λ, β) = w it is again necessary to examine the characteristic matrix to determine whether the joint mean can be a PNE.
Model (1) in Table 6.4 shows the coefficients in 2008 for the pure spatial model M(λ, β) to be

$$(\lambda_{\text{Obama}}, \lambda_{\text{McCain}}, \beta) = (0, -0.84, 0.85).$$

As Table 6.4 indicates, the log likelihood, Akaike information criterion (AIC) and Bayesian information criterion (BIC) are all quite acceptable, and all coefficients are significant with probability < 0.001.
Note that these parameters are estimated when the candidates are located at their estimated positions. Again, λ_McCain is the negative exogenous valence of McCain relative to Obama, according to the pure spatial model M(λ, β). We assume that the parameters of the model remain close to these values as we modify the candidates’ positions in order to determine the equilibria of the model.
According to the model $M(\lambda, \beta)$, the probabilities that a voter chooses McCain or Obama when both are positioned at the electoral mean, $z_0$, are
$$(\rho_{\text{McCain}}, \rho_{\text{Obama}}) = \left(\frac{e^{0}}{e^{0} + e^{0.84}}, \frac{e^{0.84}}{e^{0} + e^{0.84}}\right) = (0.30, 0.70).$$
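As a check on the arithmetic, the following Python sketch recomputes these probabilities from the estimated valences. It assumes the multinomial logit form of the pure spatial model, in which the common distance term cancels when both candidates sit at $z_0$; the function name `logit_probabilities` is illustrative, not part of any package.

```python
import math

def logit_probabilities(valences):
    """Multinomial-logit choice probabilities at the joint mean.

    When every candidate sits at the electoral mean z0, the distance
    terms are identical across candidates and cancel, so each
    probability depends only on the exogenous valences lambda_j.
    """
    weights = {name: math.exp(lam) for name, lam in valences.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Model (1) estimates: lambda_Obama = 0 (baseline), lambda_McCain = -0.84.
rho = logit_probabilities({"Obama": 0.0, "McCain": -0.84})
print(round(rho["McCain"], 2), round(rho["Obama"], 2))
```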
Book ID: 74899_2_En, Date: 2013-09-30, Proof No: 1, UNCORRECTED PROOF
Now the covariance matrix can be estimated to be
$$\nabla_0 = \begin{bmatrix} 0.80 & -0.13 \\ -0.13 & 0.83 \end{bmatrix}.$$
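Thus from Table 6.4, we obtain
$$\begin{aligned}
C_{\text{McCain}} &= 2\beta(1 - 2\rho_{\text{McCain}})\nabla_0 - I = [2 \times 0.85 \times 0.4]\nabla_0 - I \\
&= (0.68)\nabla_0 - I = (0.68)\begin{bmatrix} 0.80 & -0.13 \\ -0.13 & 0.83 \end{bmatrix} - I = \begin{bmatrix} 0.54 & -0.09 \\ -0.09 & 0.56 \end{bmatrix} - I \\
&= \begin{bmatrix} -0.46 & -0.09 \\ -0.09 & -0.44 \end{bmatrix},
\end{aligned}$$
$$c = 2\beta(1 - 2\rho_{\text{McCain}})\operatorname{trace}\nabla_0 = 2(0.85)(0.4)(1.63) = 1.1.$$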
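The same calculation can be scripted. The sketch below, in plain Python with the hypothetical helper `single_beta_check`, forms $C_{\text{McCain}} = 2\beta(1 - 2\rho_{\text{McCain}})\nabla_0 - I$ and $c$ for a $2 \times 2$ covariance matrix, and obtains the eigenvalues from the trace and determinant. It is run at the point estimates and at the conservative bounds ($\rho_{\text{McCain}} = 0.26$, $\beta = 0.97$) used in the text.

```python
import math

def single_beta_check(beta, rho, cov):
    """Return (C, c, eigenvalues) for a single distance parameter beta.

    C = 2*beta*(1 - 2*rho)*cov - I and c = 2*beta*(1 - 2*rho)*trace(cov),
    where cov is a symmetric 2x2 covariance matrix.
    """
    k = 2.0 * beta * (1.0 - 2.0 * rho)
    C = [[k * cov[0][0] - 1.0, k * cov[0][1]],
         [k * cov[1][0], k * cov[1][1] - 1.0]]
    c = k * (cov[0][0] + cov[1][1])
    # Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
    tr = C[0][0] + C[1][1]
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return C, c, ((tr - disc) / 2.0, (tr + disc) / 2.0)

cov = [[0.80, -0.13], [-0.13, 0.83]]   # estimated covariance matrix nabla_0

# Point estimates: beta = 0.85, rho_McCain = 0.30.
C, c, eigs = single_beta_check(0.85, 0.30, cov)
print(round(c, 2), [round(e, 2) for e in eigs])

# Conservative bounds: rho_McCain = 0.26, beta = 0.97.
C_hi, c_hi, eigs_hi = single_beta_check(0.97, 0.26, cov)
print(round(c_hi, 2), [round(e, 2) for e in eigs_hi])
```

Without intermediate rounding, $c$ comes out at about 1.11 and 1.52 respectively; in both cases the two eigenvalues are negative.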
The determinant of $C_{\text{McCain}}$ is positive and the trace negative, so both eigenvalues are negative, showing that the mean is an LNE. The lower 95 % estimate for $\rho_{\text{McCain}}$ is 0.26, and the upper 95 % estimate for $\beta$ is 0.97, so a very conservative upper estimate for $\beta(1 - 2\rho_{\text{McCain}})$ is $0.97 \times 0.48 = 0.47$. The upper estimate for $c$ is then $2(0.47)(1.63) = 1.53$, giving an estimate for $C_{\text{McCain}}$ of
$$(0.94)\begin{bmatrix} 0.80 & -0.13 \\ -0.13 & 0.83 \end{bmatrix} - I = \begin{bmatrix} 0.75 & -0.12 \\ -0.12 & 0.78 \end{bmatrix} - I = \begin{bmatrix} -0.25 & -0.12 \\ -0.12 & -0.22 \end{bmatrix},$$
which still has negative eigenvalues.

We also considered a spatial model where the $x$ and $y$ axes had different coefficients, $\beta_1 = 0.80$, $\beta_2 = 0.92$. Using
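$$c(\lambda, \beta) = \frac{2(1 - 2\rho_{\text{McCain}})\operatorname{trace}(\beta\nabla_0\beta)}{\frac{1}{w}(\beta_1 + \beta_2 + \cdots + \beta_w)}$$
with $\frac{1}{2}(\beta_1 + \beta_2) = \frac{1}{2}(0.80 + 0.92) = 0.86$ and $\rho_{\text{McCain}} = 0.30$, we find
$$\begin{aligned}
c(\lambda, \beta) &= \frac{2(0.4)}{0.86}\operatorname{trace}\begin{bmatrix} (0.80)^2(0.80) & (0.80)(0.92)(-0.13) \\ (0.80)(0.92)(-0.13) & (0.92)^2(0.83) \end{bmatrix} \\
&= (0.93)\operatorname{trace}\begin{bmatrix} 0.51 & -0.09 \\ -0.09 & 0.70 \end{bmatrix} = (0.93)(1.21) = 1.13.
\end{aligned}$$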
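For the characteristic matrix,
$$\begin{aligned}
C_{\text{McCain}} &= 2(1 - 2\rho_{\text{McCain}})\beta\nabla_0\beta - \beta \\
&= 2(0.4)\begin{bmatrix} 0.51 & -0.09 \\ -0.09 & 0.70 \end{bmatrix} - \begin{bmatrix} 0.80 & 0 \\ 0 & 0.92 \end{bmatrix} \\
&= \begin{bmatrix} 0.41 & -0.07 \\ -0.07 & 0.56 \end{bmatrix} - \begin{bmatrix} 0.80 & 0 \\ 0 & 0.92 \end{bmatrix} \\
&= \begin{bmatrix} -0.39 & -0.07 \\ -0.07 & -0.36 \end{bmatrix}.
\end{aligned}$$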
The analysis showed that the Hessian for this case had negative eigenvalues, so again $z_0$ is an LNE. This model is essentially the same as the model with a single $\beta$. Since these models imply that the origin is an LNE but neither candidate is located there, we infer that activists exert influence to move the candidate positions into opposite quadrants of the policy space.
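A sketch of the multi-$\beta$ calculation, under the same assumptions as the text (two dimensions, $\beta = \operatorname{diag}(\beta_1, \beta_2)$, $\rho_{\text{McCain}} = 0.30$); `diag_beta_check` is a hypothetical helper name. It evaluates $c(\lambda, \beta)$ and $C_{\text{McCain}} = 2(1 - 2\rho_{\text{McCain}})\beta\nabla_0\beta - \beta$ without intermediate rounding and reports the eigenvalues of $C_{\text{McCain}}$.

```python
import math

def diag_beta_check(betas, rho, cov):
    """Return (C, c, eigenvalues) for B = diag(betas) in two dimensions.

    C = 2(1 - 2 rho) B cov B - B and
    c = 2(1 - 2 rho) trace(B cov B) / mean(betas).
    """
    k = 2.0 * (1.0 - 2.0 * rho)
    # (B cov B)_ij = beta_i * cov_ij * beta_j
    bcb = [[betas[i] * cov[i][j] * betas[j] for j in range(2)] for i in range(2)]
    C = [[k * bcb[i][j] - (betas[i] if i == j else 0.0) for j in range(2)]
         for i in range(2)]
    c = k * (bcb[0][0] + bcb[1][1]) / (sum(betas) / len(betas))
    # Eigenvalues of the symmetric 2x2 matrix C from trace and determinant.
    tr = C[0][0] + C[1][1]
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return C, c, ((tr - disc) / 2.0, (tr + disc) / 2.0)

cov = [[0.80, -0.13], [-0.13, 0.83]]
C, c, eigs = diag_beta_check([0.80, 0.92], 0.30, cov)
print(round(c, 2), [[round(v, 2) for v in row] for row in C],
      [round(e, 2) for e in eigs])
```

Unrounded, $c(\lambda, \beta) \approx 1.13$ and the diagonal of $C_{\text{McCain}}$ is approximately $(-0.39, -0.36)$, with both eigenvalues negative.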
References
Banks, J. S. (1995). Singularity theory and core existence in the spatial model. Journal of Mathematical Economics, 24, 523–536.
Chichilnisky, G. (1995). Limited arbitrage is necessary and sufficient for the existence of a competitive equilibrium with or without short sales. Economic Theory, 5, 79–107.
Hinich, M. J. (1977). Equilibrium in spatial voting: the median voter theorem is an artifact. Journal of Economic Theory, 16, 208–219.
Kramer, G. H. (1973). On a class of equilibrium conditions for majority rule. Econometrica, 41, 285–297.
Levine, D., & Palfrey, T. R. (2007). The paradox of voter participation. American Political Science Review, 101, 143–158.
Lin, T., Enelow, M. J., & Dorussen, H. (1999). Equilibrium in multicandidate probabilistic spatial voting. Public Choice, 98, 59–82.
McKelvey, R. D. (1976). Intransitivities in multidimensional voting models and some implications for agenda control. Journal of Economic Theory, 12, 472–482.
McKelvey, R. D. (1979). General conditions for global intransitivities in formal voting models. Econometrica, 47, 1085–1112.
McKelvey, R. D. (1986). Covering, dominance and institution free properties of social choice. American Journal of Political Science, 30, 283–314.
McKelvey, R. D., & Palfrey, T. R. (1995). Quantal response equilibria in normal form games. Games and Economic Behavior, 10, 6–38.
McKelvey, R. D., & Schofield, N. (1987). Generalized symmetry conditions at a core point. Econometrica, 55, 923–933.
Miller, G., & Schofield, N. (2003). Activists and partisan realignment in the US. American Political Science Review, 97, 245–260.
Plott, C. R. (1967). A notion of equilibrium and its possibility under majority rule. The American Economic Review, 57, 787–806.
Saari, D. (1997). The generic existence of a core for q-rules. Economic Theory, 9, 219–260.
Schofield, N. (1978). Instability of simple dynamic games. Review of Economic Studies, 45, 575–594.
Schofield, N. (1983). Generic instability of majority rule. Review of Economic Studies, 50, 695–705.
Schofield, N. (1999). The heart and the uncovered set. Journal of Economics. Supplementum, 8, 79–113.
Schofield, N. (2006). Equilibria in the spatial stochastic model of voting with party activists. Review of Economic Design, 10, 183–203.
Schofield, N. (2007). The Mean Voter Theorem: necessary and sufficient conditions for convergent equilibrium. Review of Economic Studies, 74, 965–980.
Schofield, N. (2010). Social orders. Social Choice and Welfare, 34, 503–536.
Schofield, N. (2013). The probability of a fit choice. Review of Economic Design, 17, 129–150.
Schofield, N., Claassen, C., & Ozdemir, U. (2011). Empirical and formal models of the US presidential elections in 2004 and 2008. In N. Schofield & G. Caballero (Eds.), The political economy of institutions, democracy and voting (pp. 217–258). Berlin: Springer.
7 Review Exercises
7.1 Exercises to Chap. 1
1.1. Consider the relations:
$$P = \{(2,3), (1,4), (2,1), (3,2), (4,4)\}$$
and $Q = \{(1,3), (4,2), (2,4), (4,1)\}$.
Compute $Q \circ P$, $P \circ Q$, $(P \circ Q)^{-1}$ and $(Q \circ P)^{-1}$. Let $\phi_Q$ and $\phi_P$ be the mappings associated with these two relations. Is either $\phi_Q$ or $\phi_P$ a function, and are they surjective and/or injective?
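A small Python sketch for checking hand computations in Exercise 1.1. It assumes the convention that $Q \circ P$ means "apply $P$ first, then $Q$", i.e. $x(Q \circ P)z$ iff $xPy$ and $yQz$ for some $y$; if the text's convention runs the other way, swap the arguments of `compose`. The helper names are illustrative.

```python
def compose(first, second):
    """Relational composition: apply `first`, then `second`.

    Returns {(x, z) : (x, y) in first and (y, z) in second for some y}.
    """
    return {(x, z) for (x, y) in first for (w, z) in second if y == w}

def inverse(rel):
    return {(y, x) for (x, y) in rel}

def is_function(rel, domain):
    """True if every x in `domain` is related to exactly one element."""
    return all(sum(1 for (a, _) in rel if a == x) == 1 for x in domain)

P = {(2, 3), (1, 4), (2, 1), (3, 2), (4, 4)}
Q = {(1, 3), (4, 2), (2, 4), (4, 1)}
X = {1, 2, 3, 4}

Q_after_P = compose(P, Q)   # Q o P under the "P first" convention
P_after_Q = compose(Q, P)   # P o Q
print(sorted(Q_after_P), sorted(P_after_Q))
print(sorted(inverse(P_after_Q)), sorted(inverse(Q_after_P)))
print(is_function(P, X), is_function(Q, X))
```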
1.2. Suppose that each member $i$ of a society $M = \{1, \ldots, m\}$ has weak and strict preferences $(R_i, P_i)$ on a finite set $X$ of feasible states. Define the weak Pareto rule, $Q$, on $X$ by $xQy$ iff $xR_iy$ for all $i \in M$, and $xP_jy$ for some $j \in M$. Show that if each $R_i$, $i \in M$, is transitive, then $Q$ is transitive. Hence show that the Pareto choice set $C_Q(X)$ is nonempty.
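Before writing the proof for Exercise 1.2, it can help to test the claim by brute force. The sketch below, an illustration rather than a proof, represents each $R_i$ by a utility assignment (so every $R_i$ is a transitive weak order), builds the weak Pareto rule $Q$, and checks that $Q$ is transitive and $C_Q(X)$ nonempty for every three-voter profile over three states, ties allowed. The helper names are illustrative.

```python
from itertools import product

def weak_pareto(X, utils):
    """x Q y iff x R_i y for every voter i and x P_j y for some voter j.

    Voter i's weak preference is given by a utility dict u, with
    x R_i y iff u[x] >= u[y]; this makes every R_i transitive.
    """
    return {(x, y) for x in X for y in X
            if all(u[x] >= u[y] for u in utils)
            and any(u[x] > u[y] for u in utils)}

def is_transitive(rel):
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)

def choice_set(X, rel):
    """C_Q(X): the states that no other state beats under Q."""
    return {x for x in X if not any((y, x) in rel for y in X)}

X = ["a", "b", "c"]
# 27 utility assignments, covering every weak order on X (some repeatedly).
voter_utils = [dict(zip(X, vals)) for vals in product(range(3), repeat=len(X))]

all_ok = True
for u1 in voter_utils:
    for u2 in voter_utils:
        for u3 in voter_utils:  # 27**3 = 19683 three-voter profiles
            Q = weak_pareto(X, [u1, u2, u3])
            if not is_transitive(Q) or not choice_set(X, Q):
                all_ok = False
print(all_ok)  # True
```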
1.3. Show that the set $\Theta = \{e^{i\theta} : 0 \le \theta \le 2\pi\}$, of all $2 \times 2$ matrices representing rotations, is a subgroup of $(M^*(2), \circ)$ under matrix composition, $\circ$.
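A numerical spot-check (not a proof) of the closure and inverse properties in Exercise 1.3, identifying $e^{i\theta}$ with the rotation matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$; the helper names and the sample angles are illustrative.

```python
import cmath
import math

def rotation(theta):
    """2x2 matrix of the rotation through angle theta, i.e. e^{i theta}."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(A, B, tol=1e-9):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))

a, b = 0.7, 2.9
# Closure: R(a) R(b) = R(a + b), mirroring e^{ia} e^{ib} = e^{i(a+b)}.
closed = close(matmul(rotation(a), rotation(b)), rotation(a + b))
# Inverse: R(a) R(2*pi - a) = I, and 2*pi - a lies in [0, 2*pi].
has_inverse = close(matmul(rotation(a), rotation(2 * math.pi - a)), rotation(0.0))
same_as_complex = abs(cmath.exp(1j * a) * cmath.exp(1j * b)
                      - cmath.exp(1j * (a + b))) < 1e-9
print(closed, has_inverse, same_as_complex)  # True True True
```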
7.2 Exercises to Chap. 2
2.1. With respect to the usual basis for $\mathbb{R}^3$, let $x_1 = (1, 1, 0)$, $x_2 = (0, 1, 1)$, $x_3 = (1, 0, 1)$. Show that $\{x_1, x_2, x_3\}$ are linearly independent.
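For Exercise 2.1, three vectors in $\mathbb{R}^3$ are linearly independent exactly when the matrix having them as rows has a nonzero determinant. A minimal check in Python, with an illustrative helper `det3`:

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

x1, x2, x3 = (1, 1, 0), (0, 1, 1), (1, 0, 1)
print(det3([x1, x2, x3]))  # 2: nonzero, so the vectors are independent
```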
2.2. Suppose $f : \mathbb{R}^5 \to \mathbb{R}^4$ is a linear transformation with a 2-dimensional kernel. Show that there exists some vector $z \in \mathbb{R}^4$ such that for any vector $y \in \mathbb{R}^4$ there exists a vector $y_0 \in \operatorname{Im}(f)$ with $y = y_0 + \lambda z$ for some $\lambda \in \mathbb{R}$.
N. Schofield, Mathematical Methods in Economics and Social Choice, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6_7, © Springer-Verlag Berlin Heidelberg 2014
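2.3. Find all solutions to the equations $A(x) = b_i$, for $i = 1, 2, 3$, where
$$A = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 3 & 1 & -1 & 1 \\ 1 & -1 & 4 & 6 \end{pmatrix}$$
and
$$b_1 = \begin{pmatrix} 7 \\ 3 \\ 4 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \quad \text{and} \quad b_3 = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}.$$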
2.4. Find all solutions to the equation $A(x) = b$ where
$$A = \begin{pmatrix} 6 & -1 & 1 & 4 \\ 1 & 1 & 3 & -1 \\ 3 & 4 & 1 & 2 \end{pmatrix}$$
and
$$b = \begin{pmatrix} 4 \\ 3 \\ 7 \end{pmatrix}.$$
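2.5. Let $F : \mathbb{R}^4 \to \mathbb{R}^2$ be the linear transformation represented by the matrix
$$\begin{pmatrix} 1 & 5 & -1 & 3 \\ -1 & 0 & -4 & 2 \end{pmatrix}.$$
Compute the set $F^{-1}(y)$, when $y = \begin{pmatrix} 4 \\ 1 \end{pmatrix}$.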
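Exercises 2.3 to 2.5 all ask for the full solution set of $A(x) = b$, which is a particular solution plus the kernel of $A$. The following sketch, a textbook Gauss-Jordan elimination in exact rational arithmetic via Python's `fractions` module, returns both pieces, or `None` when the system is inconsistent. `solve_affine` is an illustrative name, and the output is meant for checking hand calculations.

```python
from fractions import Fraction

def solve_affine(A, b):
    """Solve A x = b exactly; return (x0, kernel_basis) or None.

    Every solution has the form x0 + sum_k t_k * kernel_basis[k].
    """
    m, n = len(A), len(A[0])
    # Augmented matrix [A | b] with rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for col in range(n):
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        M[r], M[p] = M[p], M[r]
        lead = M[r][col]
        M[r] = [v / lead for v in M[r]]
        for i in range(m):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
        if r == m:
            break
    # A row reading 0 = nonzero means the system is inconsistent.
    if any(all(v == 0 for v in row[:n]) and row[n] != 0 for row in M):
        return None
    x0 = [Fraction(0)] * n
    for i, col in enumerate(pivots):
        x0[col] = M[i][n]                   # free variables set to zero
    kernel = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, col in enumerate(pivots):
            v[col] = -M[i][free]
        kernel.append(v)
    return x0, kernel

A3 = [[1, 4, 2, 3], [3, 1, -1, 1], [1, -1, 4, 6]]
problems = {
    "2.3, b1": (A3, [7, 3, 4]),
    "2.3, b2": (A3, [1, 1, 1]),
    "2.3, b3": (A3, [3, 2, 1]),
    "2.4": ([[6, -1, 1, 4], [1, 1, 3, -1], [3, 4, 1, 2]], [4, 3, 7]),
    "2.5": ([[1, 5, -1, 3], [-1, 0, -4, 2]], [4, 1]),
}
results = {name: solve_affine(A, b) for name, (A, b) in problems.items()}
for name, res in results.items():
    print(name, res)
```

Exact fractions avoid the rounding noise that floating-point elimination would introduce into the kernel vectors.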
2.6. Find the kernel and image of the linear transformation, $A$, represented by the matrix
$$\begin{pmatrix} 3 & 7 & 2 \\ 4 & 10 & 2 \\ 1 & -2 & 5 \end{pmatrix}.$$
Find new bases for the domain and codomain of $A$ so that $A$ can be represented as a matrix
$$\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}$$
with respect to these bases.
2.7. Find the kernel of the linear transformation, $A$, represented by the matrix
$$\begin{pmatrix} 1 & 3 & 1 \\ 2 & -1 & -5 \\ -1 & 1 & 3 \end{pmatrix}.$$
Use the dimension theorem to compute the image of $A$. Does the equation $A(x) = b$ have a solution when
$$b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}?$$
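For Exercises 2.6 and 2.7, the kernel, the rank and hence, by the dimension theorem, the dimension of the image can be read off a reduced row-echelon form. A sketch in exact arithmetic with illustrative helper names; the last line tests solvability of $A(x) = b$ by comparing the rank of $A$ with that of the augmented matrix $(A \mid b)$.

```python
from fractions import Fraction

def rref(rows):
    """Exact reduced row-echelon form of `rows` and its pivot columns."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for col in range(n):
        p = next((i for i in range(r, m) if M[i][col] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        M[r], M[p] = M[p], M[r]
        lead = M[r][col]
        M[r] = [v / lead for v in M[r]]
        for i in range(m):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        pivots.append(col)
        r += 1
        if r == m:
            break
    return M, pivots

def rank(A):
    return len(rref(A)[1])

def kernel_basis(A):
    """One basis vector of ker A for each free (non-pivot) column."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, col in enumerate(pivots):
            v[col] = -R[i][free]
        basis.append(v)
    return basis

A6 = [[3, 7, 2], [4, 10, 2], [1, -2, 5]]
A7 = [[1, 3, 1], [2, -1, -5], [-1, 1, 3]]
b7 = [1, 1, 1]
for A in (A6, A7):
    k = kernel_basis(A)
    # Dimension theorem: dim Im(A) = 3 - dim ker(A) = rank(A).
    print(rank(A), len(k), k)

# A(x) = b is solvable iff adding b as a column leaves the rank unchanged.
augmented = [row + [bi] for row, bi in zip(A7, b7)]
print(rank(augmented) == rank(A7))
```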
2.8. Find the eigenvalues and eigenvectors of
$$\begin{pmatrix} 2 & -1 \\ 1 & 4 \end{pmatrix}.$$
Is this matrix positive or negative definite or neither?
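A quick numerical companion to Exercise 2.8: the eigenvalues of a $2 \times 2$ matrix follow from its trace and determinant. Taking definiteness to refer to the quadratic form $x^T A x$ (an assumption, since this matrix is not symmetric), only the symmetric part $S = \tfrac{1}{2}(A + A^T)$ matters. The helper `eig2` is illustrative.

```python
import cmath

def eig2(M):
    """Eigenvalues of a real 2x2 matrix via trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)   # complex sqrt copes with tr^2 < 4 det
    return (tr - root) / 2, (tr + root) / 2

A = [[2, -1], [1, 4]]
print(eig2(A))

# An eigenvector for lam solves the first row of (A - lam I) v = 0;
# (A[0][1], lam - A[0][0]) does so whenever A[0][1] != 0.
lam = eig2(A)[0].real
eigvec = (A[0][1], lam - A[0][0])
Av = tuple(sum(A[i][k] * eigvec[k] for k in range(2)) for i in range(2))
print(lam, eigvec, Av)

# x^T A x = x^T S x for every x, where S = (A + A^T) / 2.
S = [[(A[i][j] + A[j][i]) / 2 for j in range(2)] for i in range(2)]
print(S, eig2(S))
```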
2.9. Diagonalize the matrix
$$\begin{pmatrix} 4 & 1 & 1 \\ 1 & 8 & 0 \\ 1 & 10 & 2 \end{pmatrix}.$$
2.10. Compute the eigenvalues and eigenvectors of
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
and thus diagonalize the matrix.
7.3 Exercises to Chap. 3
3.1. Show that if $A$ is a set in a topological space $(X, \mathcal{T})$ then the interior, $\operatorname{Int}(A)$, of $A$ is open and the closure, $\operatorname{Clos}(A)$, is closed. Show that $\operatorname{Int}(A) \subset A \subset \operatorname{Clos}(A)$. What are the interior and the closure of the set $[a, b)$ in $\mathbb{R}$, with the Euclidean topology? What is the boundary of $[a, b)$? Determine the limit points of $[a, b)$.
3.2. If two metrics $d_1, d_2$ on a space $X$ are equivalent, write $d_1 \sim d_2$. Show that $\sim$ is an equivalence relation on the set of all metrics on $X$. Thus show that the Cartesian, Euclidean and city block topologies on $\mathbb{R}^n$ are equivalent.
3.3. Show that the set, $L(\mathbb{R}^n, \mathbb{R}^m)$, of linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$ is a normed vector space with norm
$$\|f\| = \sup_{x \in \mathbb{R}^n}\left\{\frac{\|f(x)\|}{\|x\|} : \|x\| \neq 0\right\},$$
with respect to the Euclidean norms on $\mathbb{R}^n$ and $\mathbb{R}^m$. In particular verify that $\|\cdot\|_L$ satisfies the three norm properties. Describe an open neighbourhood of a member $f$ of $L(\mathbb{R}^n, \mathbb{R}^m)$ with respect to the induced topology on $L(\mathbb{R}^n, \mathbb{R}^m)$. Let $M(n, m)$ be the set of $n \times m$ matrices with the natural topology (see page 106), and let
$$M : L(\mathbb{R}^n, \mathbb{R}^m) \to M(n, m)$$
be the matrix representation with respect to bases for $\mathbb{R}^n$ and $\mathbb{R}^m$. Discuss the continuity of $M$ with respect to these topologies for $L(\mathbb{R}^n, \mathbb{R}^m)$ and $M(n, m)$.
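3.4. Determine, from first principles, whether the following functions are continuous on their domain:
1. $\mathbb{R}_+ \to \mathbb{R} : x \to \log_e x$;
2. $\mathbb{R} \to \mathbb{R}_+ : x \to x^2$;
3. $\mathbb{R} \to \mathbb{R}_+ : x \to e^x$;
4. $\mathbb{R} \to \mathbb{R} : x \to \cos x$;
5. $\mathbb{R} \to \mathbb{R} : x \to \cos\frac{1}{x}$.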
3.5. What is the image of the interval $[-1, 1]$ under the function $x \to \cos\frac{1}{x}$? Is the image compact?
3.6. Determine which of the following sets are convex:
1. $X_1 = \{(x_1, x_2) \in \mathbb{R}^2 : 3x_1^2 + 2x_2^2 \le 6\}$;
2. $X_2 = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 \le 2, x_2 \le 3\}$;
3. $X_3 = \{(x_1, x_2) \in \mathbb{R}^2_+ : x_1 x_2 \le 1\}$;
4. $X_4 = \{(x_1, x_2) \in \mathbb{R}^2_+ : x_2 - 3 \ge -x_1^2\}$.
3.7. In $\mathbb{R}^2$, let $B_C(x, r_1)$ be the Cartesian open ball of radius $r_1$ about $x$, and $B_E(y, r_2)$ the Euclidean ball of radius $r_2$ about $y$. Show that these two sets are convex. For fixed $x, y \in \mathbb{R}^2$ obtain necessary and sufficient restrictions on $r_1, r_2$ so that these two open balls may be strongly separated by a hyperplane.
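3.8. Determine whether the following functions are convex, quasi-concave, or concave:
1. $\mathbb{R} \to \mathbb{R}_+ : x \to e^x$;
2. $\mathbb{R} \to \mathbb{R} : x \to x^7$;
3. $\mathbb{R}^2 \to \mathbb{R} : (x, y) \to xy$;
4. $\mathbb{R} \to \mathbb{R} : x \to \frac{1}{x}$;
5. $\mathbb{R}^2 \to \mathbb{R} : (x, y) \to x^2 - y$.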
7.4 Exercises to Chap. 4
4.1. Suppose that $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^k$ are both $C^r$-differentiable. Is $g \circ f : \mathbb{R}^n \to \mathbb{R}^k$ a $C^r$-differentiable function? If so, why?
4.2. Find and classify the critical points of the following functions:
1. $\mathbb{R}^2 \to \mathbb{R} : (x, y) \to x^2 + xy + 2y^2 + 3$;
2. $\mathbb{R}^2 \to \mathbb{R} : (x, y) \to -x^2 + xy - y^2 + 2x + y$;
3. $\mathbb{R}^2 \to \mathbb{R} : (x, y) \to e^{2x} - 2x + 2y$.
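The first two functions in Exercise 4.2 are quadratic, so each can be written as $f(z) = \tfrac{1}{2} z^T H z + g^T z + \text{const}$ with a constant Hessian $H$; the critical point solves $Hz = -g$, and the signs of $\det H$ and of the leading entry classify it. A sketch with the illustrative helper `classify_quadratic`:

```python
from fractions import Fraction

def classify_quadratic(H, g):
    """Critical point of f(z) = (1/2) z^T H z + g^T z + const on R^2.

    grad f(z) = H z + g, so a critical point solves H z = -g; it is
    unique when det H != 0, and the constant Hessian H fixes its type.
    """
    (a, b), (c, d) = H
    det = Fraction(a * d - b * c)
    if det == 0:
        return None, "degenerate"
    # Cramer's rule for H z = -g.
    x = (-g[0] * d + b * g[1]) / det
    y = (-a * g[1] + c * g[0]) / det
    if det < 0:
        kind = "saddle"
    elif a > 0:
        kind = "minimum"
    else:
        kind = "maximum"
    return (x, y), kind

# 4.2(1): x^2 + xy + 2y^2 + 3  ->  H = [[2, 1], [1, 4]], g = (0, 0)
print(classify_quadratic([[2, 1], [1, 4]], (0, 0)))
# 4.2(2): -x^2 + xy - y^2 + 2x + y  ->  H = [[-2, 1], [1, -2]], g = (2, 1)
print(classify_quadratic([[-2, 1], [1, -2]], (2, 1)))
# 4.2(3) is not quadratic: its partial derivative in y is the constant 2,
# so the gradient never vanishes and there are no critical points.
```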
4.3. Determine the critical points, and the Hessian at these points, of the function $\mathbb{R}^2 \to \mathbb{R} : (x, y) \to x^2 y$.
Compute the eigenvalues and eigenvectors of the Hessian at critical points, and use this to determine the nature of the critical points.
4.4. Show that the origin is a critical point of the function:
$$\mathbb{R}^3 \to \mathbb{R} : (x, y, z) \to x^2 + 2y^2 + Zz^2 + xy + xz.$$
Determine the nature of this critical point by examining the Hessian.
4.5. Determine the set of critical points of the function
$$\mathbb{R}^2 \to \mathbb{R} : (x, y) \to -x^2 y^2 + 4xy - 4x^2.$$
4.6. Maximise the function $\mathbb{R}^2 \to \mathbb{R} : (x, y) \to x^2 y$ subject to the constraint $1 - x^2 - y^2 = 0$.
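For Exercise 4.6, the Lagrange conditions $2xy = 2\mu x$ and $x^2 = 2\mu y$, together with $x^2 + y^2 = 1$, give $f = 0$ when $x = 0$ and otherwise $x^2 = 2y^2$, so $y = \pm 1/\sqrt{3}$; the larger value of $x^2 y = \tfrac{2}{3} y$ occurs at $y = 1/\sqrt{3}$. The sketch below compares this candidate with a brute-force search over the constraint circle; it is a numerical check, not a substitute for the second-order analysis.

```python
import math

# Candidate from the Lagrange conditions: x^2 = 2/3, y = 1/sqrt(3).
y_star = 1 / math.sqrt(3)
f_star = (2 / 3) * y_star            # = 2 / (3 * sqrt(3)), about 0.385

# Brute-force check: parametrise the circle x^2 + y^2 = 1 as (cos t, sin t).
N = 100_000
best = max(math.cos(t) ** 2 * math.sin(t)
           for t in (2 * math.pi * k / N for k in range(N)))
print(f_star, best)
```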
4.7. Maximise the function $\mathbb{R}^2 \to \mathbb{R} : (x, y) \to a \log x + b \log y$, subject to the constraint $px + qy \le I$, where $p, q, I \in \mathbb{R}_+$.
7.5 Exercises to Chap. 5
5.1. Show that if $\dim(X) > m$, then for almost every smooth profile $u = (u_1, \ldots, u_m) : X \to \mathbb{R}^m$ it is the case that Pareto optimal points in the interior of $X$ can be parametrised by at most $(m-1)$ strictly positive coefficients $\{\lambda_1, \ldots, \lambda_{m-1}\}$.
5.2. Consider a two-agent, two-good exchange economy, where the initial endowment of good $j$ by agent $i$ is $e_{ij}$. Suppose that each agent, $i$, has utility function $u_i : (x_{i1}, x_{i2}) \to a \log x_{i1} + b \log x_{i2}$. Compute the critical Pareto set $\Theta$, within the feasible set
$$Y = \{(x_{11}, x_{12}, x_{21}, x_{22}) \in \mathbb{R}^4_+\},$$
Fig. 7.1 The butterfly singularity
where the coordinates of Y satisfy
$$x_{11} + x_{21} = e_{11} + e_{21} \quad \text{and} \quad x_{12} + x_{22} = e_{12} + e_{22}.$$
What is the dimension of $Y$ and what is the codimension of $\Theta$ in $Y$? Compute the market-clearing equilibrium.
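For Exercise 5.2, identical Cobb-Douglas utilities make the marginal rates of substitution $\frac{a}{b}\frac{x_{i2}}{x_{i1}}$ equal across agents exactly when $x_{12}/x_{11} = x_{22}/x_{21}$; combined with feasibility, this common ratio must equal $E_2/E_1$, where $E_j$ is the total endowment of good $j$. The sketch computes market-clearing prices (with $p_2 = 1$) and the equilibrium allocation in exact arithmetic. The function name and the parameter values ($a = 1$, $b = 2$, endowments $(3, 1)$ and $(1, 5)$) are hypothetical illustrations, not from the text.

```python
from fractions import Fraction

def cobb_douglas_equilibrium(a, b, e):
    """Market-clearing prices and allocation for two agents with identical
    utility a log x_i1 + b log x_i2; e[i][j] is agent i's endowment of good j.

    Normalising p2 = 1, clearing the market for good 1 gives
    p1 / p2 = (a * E2) / (b * E1), with E_j the total endowment of good j.
    """
    a, b = Fraction(a), Fraction(b)
    E1 = Fraction(e[0][0] + e[1][0])
    E2 = Fraction(e[0][1] + e[1][1])
    p1, p2 = a * E2 / (b * E1), Fraction(1)
    alloc = []
    for ei1, ei2 in e:
        wealth = p1 * ei1 + p2 * ei2
        # Cobb-Douglas demand: spend the share a / (a + b) of wealth on good 1.
        alloc.append((a * wealth / ((a + b) * p1), b * wealth / ((a + b) * p2)))
    return (p1, p2), alloc

prices, alloc = cobb_douglas_equilibrium(1, 2, [(3, 1), (1, 5)])
print(prices, alloc)
```

Each agent's bundle in the output has $x_{i2}/x_{i1} = E_2/E_1$, so the equilibrium lies on the critical Pareto set described above.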
5.3. Figure 7.1 shows a “butterfly singularity”, $A$, in $\mathbb{R}^2$. Compute the degree of this singularity. Show why such a singularity (though it is isolated) cannot be associated with a generic excess demand function on the two-dimensional price simplex.
Subject Index
A
Abelian group, 20
Accumulation point, 83
Acyclic relation, 26
Additive inverse, 15
Additive relation, 22
Admissible set, 158
Antisymmetric relation, 24
Arrow’s Impossibility Theorem, 33, 35
Associativity in a group, 15
Associativity of sets, 2
Asymmetric relation, 24
Attractor of a vector field, 214

B
Baire space, 212
Banach space, 123, 190
Base for a topology, 82
Basis of a vector space, 42
Bergstrom’s theorem, 128
Bijective function, 11
Bilinear map, 67
Binary operation, 14
Binary relation, 24
Bliss point of a preference, 31
Boolean algebra, 2
Boundary of a set, 83
Boundary problem, 162
Bounded function, 29
Brouwer Fixed Point Theorem, 118, 121
Browder Fixed Point Theorem, 125
Budget set, 113
Butterfly dynamical system, 224

C
Calculation argument on economic information, 225
Canonical form of a matrix, 75
Cartesian metric, 85
Cartesian norm, 80
Cartesian open ball, 85
Cartesian product, 7
Cartesian topology, 85
Chain rule, 141
Change of basis, 55
Chaos, 221, 224
Characteristic equation of a matrix, 68
Choice correspondence, 27
City block metric, 87
City block norm, 80
City block topology, 86
Closed set, 83
Closure of a set, 83
Coalition feasibility, 179
Codomain of a relation, 7
Cofactor matrix, 53
Collegial rule, 35
Collegium, 35
Commutative group, 20
Commutativity of sets, 2
Compact set, 95
Competitive allocation, 178
Complement of a set, 2
Complete vector space, 123, 190
Composition of mappings, 8
Composition of matrices, 14
Concave function, 100
Connected relation, 24
Constrained optimisation, 155
Consumer optimisation, 113
Continuous function, 94
Contractible space, 119
Convex function, 99
Convex preference, 99
Convex set, 99
Corank, 196
Corank r singularity, 196
Core Theorem for an exchange economy, 179
Cover for a set, 6, 93
Critical Pareto set, 173
Critical point, 150
N. Schofield, Mathematical Methods in Economics and Social Choice, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-39818-6, © Springer-Verlag Berlin Heidelberg 2014
D
Debreu projection, 204
Debreu-Smale Theorem, 183, 189, 203, 215, 221
Decisive coalition, 33
Deformation, 216
Deformation retract, 119
Degree of a singularity, 212
Demand, 114
Dense set, 83
Derivative of a function, 145
Determinant of a matrix, 14, 53
Diagonalisation of a matrix, 63, 64
Dictator, 35
Diffeomorphism, 140
Differentiable function, 135
Differential of a function, 138
Dimension of a vector space, 45
Dimension theorem, 50
Direction gradient, 139
Distributivity of a field, 21
Distributivity of sets, 2
Domain of a mapping, 8
Domain of a relation, 8

E
Economic optimisation, 162
Edgeworth box, 182
Eigenvalue, 62
Eigenvector, 62
Endowment, 108
Equilibrium prices, 114
Equivalence relation, 28
Euclidean norm, 78
Euclidean scalar product, 77
Euclidean topology, 86
Euler characteristic of simplex, 214
Euler characteristic of sphere and torus, 199
Excess demand function, 207
Exchange theorem, 44
Existential quantifier, 7

F
Fan theorem, 125, 225
Feasible input-output vector, 111
Field, 21
Filter, 34, 38
Fine topology, 87
Finite intersection property, 93
Finitely generated vector space, 44, 45
Fixed point property, 118
Frame, 41
Free-disposal equilibrium, 132
Function, 11
Function space, 92

G
Game, 128
General linear group, 55
Generic existence of regular economies, 183, 203
Generic property, 183, 201, 212
Global maximum (minimum) of a function, 154
Global saddlepoint of the Lagrangian, 117
Graph of a mapping, 9
Group, 14

H
Hairy Ball theorem, 215
Hausdorff space, 96
Heine-Borel Theorem, 95
Hessian, 142, 144
Homeomorphism, 118, 211
Homomorphism, 18

I
Identity mapping, 10
Identity matrix, 13, 52
Identity relation, 8
Image of a mapping, 8
Image of a transformation, 50
Immersion, 194
Implicit function theorem, 192, 195
Index of a critical point, 197
Index of a quadratic form, 70
Indifference, 24
Infimum of a function, 87
Injective function, 12
Interior of a set, 83
Intersection of sets, 2
Inverse element, 15
Inverse function, 11
Inverse function theorem, 190
Inverse matrix, 13, 54
Inverse relation, 8
Invisible dictator, 35
Irrational flow on torus, 218
Isomorphism, 18
Isomorphism theorem, 56

J
Jacobian of a function, 140

K
Kernel of a transformation, 50
Kernel rank, 50
Knaster-Kuratowski-Mazurkiewicz (KKM) Theorem, 123
Kuhn-Tucker theorems, 115–118

L
Lagrangian, 116
Lefschetz fixed point theorem, 216
Lefschetz obstruction, 217
Limit of a sequence, 94
Limit point, 83
Linear combination, 42
Linear dependence, 15
Linear transformation, 45
Linearly independent, 41
Local maximum (minimum), 154
Locally non-satiated preference, 110
Lower demi-continuity, 118
Lower hemi-continuity, 128
Lyapunov function, 208, 209

M
Majority rule, 34
Manifold, 194
Mapping, 8
Marginal rate of technical substitution, 169
Market equilibrium, 114
Matrix, 13, 46
Mean value theorem, 146
Measure zero, 198
Metric, 80
Metric topology, 82
Metrisable space, 80
Michael’s Selection Theorem, 123
Monotonic rule, 36
Morphism, 18
Morse function, 154, 197
Morse lemma, 197
Morse Sard theorem, 202
Morse theorem, 202

N
Nakamura Lemma, 36
Nakamura number, 35
Nakamura Theorem, 127
Nash equilibrium, 128
Negation of a set, 2
Negative definite form, 70
Negative of an element, 21
Negatively transitive, 26
Neighbourhood, 82
Non-degenerate critical point, 150
Non-degenerate form, 70
Non-satiated preference, 110
Norm of a vector, 69, 78
Norm of a vector space, 80
Normal hyperplane, 106
Nowhere dense, 198
Null set, 2
Nullity of a quadratic form, 70

O
Oligarchy, 34, 38
Open ball, 81
Open cover, 93
Open set, 82
Optimum, 116
Orthogonal vectors, 69

P
Pareto correspondence, 171
Pareto set, 29, 171
Pareto theorem, 203, 216
Partial derivative, 139
Partition, 7
Peixoto-Smale theorem, 212
Permutation, 11
Phase portrait, 210
Poincaré-Hopf Theorem, 214
Positive definite form, 70
Preference manipulation, 180
Preference relation, 24
Prefilter, 38
Price adjustment process, 208
Price equilibrium, 170, 225
Price equilibrium existence, 126
Price vector, 108, 112
Producer optimisation, 111
Product rule, 141
Product topology, 84
Production set, 114
Profit function, 112
Propositional calculus, 4
Pseudo-concave function, 158

Q
q-Majority, 36
Quadratic form, 69
Quasi-concave function, 100

R
Rank of a matrix, 51
Rank of a transformation, 50
Rationality, 25
Real vector space, 40
Reflexive relation, 24
Regular economy, 183, 206
Regular point, 192
Regular value, 192
Relation, 8
Relative topology, 6, 84
Repellor for a vector field, 214
Residual set, 83, 183
Resource manipulation, 181
Retract, 119
Retraction, 119
Rolle’s Theorem, 145, 147
Rotations, 17, 74

S
Saddle, 70
Saddle point, 151
Sard’s lemma, 198
Scalar, 22
Scalar product, 47, 77
Schauder’s fixed point theorem, 125
Separating hyperplane, 108
Separation of convex sets, 107
Set theory, 1–3, 6
Shadow prices, 111
Similar matrices, 58
Singular matrix, 14
Singular point, 196
Singularity set of a function, 200
Singularity theorem, 202
Smooth function, 143
Social utility function, 28
Sonnenschein-Mantel-Debreu Theorem, 220
Stratified manifold, 202
Strict Pareto rule, 28, 34
Strict partial order, 26, 33
Strict preference relation, 24
Strictly quasi-concave function, 156, 158
Structural stability of a vector field, 219, 222
Subgroup, 18
Submanifold, 202
Submanifold theorem, 202
Submersion, 194
Supremum of a function, 87
Surjective function, 11
Symmetric matrix, 68
Symmetric relation, 24

T
Tangent to a function, 138
Taylor’s theorem, 148
Thom transversality theorem, 201
Topological space, 82
Topology, 6
Torus, 198
Trace of a matrix, 65
Transfer paradox, 181
Transitive relation, 24
Transversality, 200
Triangle inequality, 79
Truth table, 4
Two party competition, 130
Tychonoff’s theorem, 97

U
Union of sets, 2
Universal quantifier, 7
Universal set, 1
Utility function, 25

V
Vector field, 211
Vector space, 40
Vector subspace, 40
Venn diagram, 2

W
Walras’ Law, 218
Walras manifold, 203
Walrasian equilibrium, 183
Weak monotone function, 100
Weak order, 26
Weak Pareto rule, 28
Weierstrass theorem, 95
Welfare theorem, 177
Whitney topology, 182, 236
Author Index
A
Aliprantis, C., 133
Aliprantis, D., 229
Arrow, K. J., 32, 35, 38, 187, 225
Aumann, R. J., 225

B
Balasko, Y., 181, 187, 206, 227
Banks, J. S., 236, 248
Bergstrom, T., 128, 131, 132, 133
Bikhchandani, S., 226
Border, K., 132
Brouwer, L. E. J., 118, 121, 132
Browder, F. E., 125, 132
Brown, D., 133
Brown, R., 228
Burkenshaw, O., 229

C
Caballero, G., 239, 243, 244, 249
Calvin, W. H., 223
Chichilnisky, G., 236, 248
Chillingsworth, D. R. J., 195, 227
Claassen, C., 239, 243, 244, 249
Condorcet, M. J. A. N., 226

D
Debreu, G., 183, 203, 204, 206, 215, 220, 223, 227, 228
Dierker, E., 217, 221, 227
Dorussen, H., 248

E
Eldredge, N., 223
Enelow, M. J., 248

F
Fan, K., 125, 132, 225

G
Gale, D., 181, 187
Gamble, A., 225
Gleick, J., 228
Golubitsky, M., 196, 227
Goroff, D., 223
Greenberg, J., 133
Guesnerie, R., 187
Guillemin, V., 196, 227

H
Hahn, F. H., 187
Hayek, F. A., 225
Heal, E. M., 132
Hewitt, F., 226
Hildenbrand, W., 108, 187
Hinich, M. J., 243, 248
Hirsch, M., 196, 227
Hirschleifer, D., 226
Hubbard, J. H., 228

K
Kauffman, S., 221
Keenan, D., 218, 228
Kepler, J., 222
Keynes, J. M., 226
Kirman, A. P., 38, 108, 187
Knaster, B., 123, 132
Konishi, H., 133
Kramer, G. H., 234, 236, 248
Kuhn, H. W., 115, 132
Kuratowski, K., 123, 132

L
Laffont, J.-J., 187
Lange, O., 225
Laplace, P. S., 222
Levine, D., 239, 248
Lin, T., 248
Lorenz, E. N., 224, 229
M
Mantel, R., 220, 221, 228
Mas-Colell, A., 220, 227
Mazurkiewicz, S., 132
McKelvey, R. D., 236, 237, 239, 248
McLean, I., 226
Michael, E., 123, 132
Miller, G., 248
Minsky, H., 226

N
Nakamura, K., 35, 36, 38, 127
Nash, J. F., 128, 132
Neuefeind, W., 128, 131, 132, 133, 218, 228
Newton, I., 222

O
Ozdemir, U., 239, 243, 244, 249

P
Palfrey, T. R., 239, 248
Peixoto, M., 203, 212, 219, 227, 228
Peterson, I., 223
Plott, C. R., 248
Poincaré, H., 214, 222, 223
Pontrjagin, L. S., 228
Prabhakar, N., 133

R
Rader, T., 218, 228
Riezman, R., 128, 131, 132, 133, 218, 228
Rothman, N. J., 229

S
Saari, D., 229, 236, 248
Safra, Z., 187
Scarf, H., 189, 207, 209, 217, 227
Schauder, J., 125, 132
Schofield, N., 133, 217, 228, 236, 237, 239, 241, 243–245, 248, 249
Schumpeter, J. A., 225
Shafer, W., 133
Skidelsky, R., 226
Smale, S., 183, 187, 203, 212, 215, 219, 221, 227, 228
Sondermann, D., 38
Sonnenschein, H., 133, 220, 221, 228
Strnad, J., 133

T
Thom, R., 201, 228
Tucker, A. W., 115, 132

V
von Mises, L., 225

W
Welsh, I., 226
West, B. H., 228

Y
Yannelis, N., 133