Luenberger, Optimization by Vector Space Methods

SERIES IN DECISION AND CONTROL
Ronald A. Howard

OPTIMIZATION BY VECTOR SPACE METHODS
by David G. Luenberger

INTRODUCTION TO DYNAMIC PROGRAMMING
by George L. Nemhauser

DYNAMIC PROBABILISTIC SYSTEMS (In Preparation)
by Ronald A. Howard

OPTIMIZATION BY VECTOR SPACE METHODS

David G. Luenberger
Stanford University, Stanford, California

John Wiley & Sons, Inc.
New York, London, Sydney, Toronto

Copyright 1969 by John Wiley & Sons, Inc. All rights reserved. No part of this book may be reproduced by any means, nor transmitted, nor translated into a machine language without the written permission of the publisher.

Library of Congress Catalog Card Number: 68-8716
SBN 471 55359 x

Printed in the United States of America

To Nancy

PREFACE

This book has evolved from a course on optimization that I have taught at Stanford University for the past five years. It is intended to be essentially self-contained and should be suitable for classroom work or self-study. As a text it is aimed at first- or second-year graduate students in engineering, mathematics, operations research, or other disciplines dealing with optimization theory.

The primary objective of the book is to demonstrate that a rather large segment of the field of optimization can be effectively unified by a few geometric principles of linear vector space theory. By use of these principles, important and complex infinite-dimensional problems, such as those generated by consideration of time functions, are interpreted and solved by methods springing from our geometric insight.
Concepts such as distance, orthogonality, and convexity play fundamental roles in this development. Viewed in these terms, seemingly diverse problems and techniques often are found to be intimately related.

The essential mathematical prerequisite is a familiarity with linear algebra, preferably from the geometric viewpoint. Some familiarity with elementary analysis including the basic notions of sets, convergence, and continuity is assumed, but deficiencies in this area can be corrected as one progresses through the book. More advanced concepts of analysis such as Lebesgue measure and integration theory, although referred to in a few isolated sections, are not required background for this book.

Imposing simple intuitive interpretations on complex infinite-dimensional problems requires a fair degree of mathematical sophistication. The backbone of the approach taken in this book is functional analysis, the study of linear vector spaces. In an attempt to keep the mathematical prerequisites to a minimum while not sacrificing completeness of the development, the early chapters of the book essentially constitute an introduction to functional analysis, with applications to optimization, for those having the relatively modest background described above. The mathematician or more advanced student may wish simply to scan Chapters 2, 3, 5, and 6 for review or for sections treating applications and then turn to the other chapters which deal explicitly with optimization theory.

The sequencing of the various sections is not necessarily inviolable. Even at the chapter level the reader may wish to alter his order of progress through the book. The course from which this text developed is two quarters (six months) in duration, but there is more material in the text than can be comfortably covered in that period.
By reading only the first few sections of Chapter 3, it is possible to go directly from Chapters 1 and 2 to Chapters 5, 7, 8, and 10 for a fairly comprehensive treatment of optimization which can be covered in about one semester. Alternatively, the material at the end of Chapter 6 can be combined with Chapters 3 and 4 for a unified introduction to Hilbert space problems. To help the reader make intelligent decisions regarding his order of progress through the book, sections of a specialized or digressive nature are indicated by an *.

The problems at the end of each chapter are of two basic varieties. The first consists of miscellaneous mathematical problems and proofs which extend and supplement the theoretical material in the text; the second consists of optimization problems which illustrate further areas of application and which hopefully will help the student formulate and solve practical problems. The problems represent a major component of the book, and the serious student will not pass over them lightly.

I have received help and encouragement from many people during the years of preparation of this book. Of great benefit were comments and suggestions of Pravin Varaiya, E. Bruce, and particularly Samuel Karlin, who read the entire manuscript and suggested several valuable improvements. I wish to acknowledge the Departments of Engineering-Economic Systems and Electrical Engineering at Stanford University for supplying much of the assistance. This effort was also partially supported by the Office of Naval Research and the National Science Foundation. Of particular benefit, of course, have been the faces of puzzled confusion or of elated understanding, the critical comments, and the sincere suggestions of the many students who have worked through this material as the book evolved.

DAVID G. LUENBERGER

Palo Alto, California
August 1968

CONTENTS

1 INTRODUCTION
  1.1 Motivation
  1.2 Applications
  1.3 The Main Principles
  1.4 Organization of the Book

2 LINEAR SPACES
  2.1 Introduction
  VECTOR SPACES
  2.2 Definition and Examples
  2.3 Linear Combinations and Linear Varieties
  2.4 Convexity and Cones
  2.5 Linear Independence and Dimension
  NORMED LINEAR SPACES
  2.6 Definition and Examples
  2.7 Open and Closed Sets
  2.8 Convergence
  2.9 Transformations and Continuity
  *2.10 The lp and Lp Spaces
  2.11 Banach Spaces
  2.12 Complete Subsets
  *2.13 Extreme Values of Functionals, and Compactness
  *2.14 Quotient Spaces
  *2.15 Denseness and Separability
  2.16 Problems
  References

3 HILBERT SPACE
  3.1 Introduction
  PRE-HILBERT SPACES
  3.2 Inner Products
  3.3 The Projection Theorem
  3.4 Orthogonal Complements
  3.5 The Gram-Schmidt Procedure
  APPROXIMATION
  3.6 The Normal Equations and Gram Matrices
  3.7 Fourier Series
  *3.8 Complete Orthonormal Sequences
  3.9 Approximation and Fourier Series
  OTHER MINIMUM NORM PROBLEMS
  3.10 The Dual Approximation Problem
  *3.11 A Control Problem
  3.12 Minimum Distance to a Convex Set
  3.13 Problems
  References

4 LEAST-SQUARES ESTIMATION
  4.1 Introduction
  4.2 Hilbert Space of Random Variables
  4.3 The Least-Squares Estimate
  4.4 Minimum-Variance Unbiased Estimate (Gauss-Markov Estimate)
  4.5 Minimum-Variance Estimate
  4.6 Additional Properties of Minimum-Variance Estimates
  4.7 Recursive Estimation
  4.8 Problems
  References

5 DUAL SPACES
  5.1 Introduction
  LINEAR FUNCTIONALS
  5.2 Basic Concepts
  5.3 Duals of Some Common Banach Spaces
  EXTENSION FORM OF THE HAHN-BANACH THEOREM
  5.4 Extension of Linear Functionals
  5.5 The Dual of C[a, b]
  5.6 The Second Dual Space
  5.7 Alignment and Orthogonal Complements
  5.8 Minimum Norm Problems
  5.9 Applications
  *5.10 Weak Convergence
  GEOMETRIC FORM OF THE HAHN-BANACH THEOREM
  5.11 Hyperplanes and Linear Functionals
  5.12 Hyperplanes and Convex Sets
  *5.13 Duality in Minimum Norm Problems
  5.14 Problems
  References

6 LINEAR OPERATORS AND ADJOINTS
  6.1 Introduction
  6.2 Fundamentals
  INVERSE OPERATORS
  6.3 Linearity of Inverses
  6.4 The Banach Inverse Theorem
  ADJOINTS
  6.5 Definition and Examples
  6.6 Relations between Range and Nullspace
  6.7 Duality Relations for Convex Cones
  *6.8 Geometric Interpretation of Adjoints
  OPTIMIZATION IN HILBERT SPACE
  6.9 The Normal Equations
  6.10 The Dual Problem
  6.11 Pseudoinverse Operators
  6.12 Problems
  References

7 OPTIMIZATION OF FUNCTIONALS
  7.1 Introduction
  LOCAL THEORY
  7.2 Gateaux and Frechet Differentials
  7.3 Frechet Derivatives
  7.4 Extrema
  *7.5 Euler-Lagrange Equations
  *7.6 Problems with Variable End Points
  7.7 Problems with Constraints
  GLOBAL THEORY
  7.8 Convex and Concave Functionals
  *7.9 Properties of the Set [f, C]
  7.10 Conjugate Convex Functionals
  7.11 Conjugate Concave Functionals
  7.12 Dual Optimization Problems
  *7.13 Min-Max Theorem of Game Theory
  7.14 Problems
  References

8 GLOBAL THEORY OF CONSTRAINED OPTIMIZATION
  8.1 Introduction
  8.2 Positive Cones and Convex Mappings
  8.3 Lagrange Multipliers
  8.4 Sufficiency
  8.5 Sensitivity
  8.6 Duality
  8.7 Applications
  8.8 Problems
  References

9 LOCAL THEORY OF CONSTRAINED OPTIMIZATION
  9.1 Introduction
  LAGRANGE MULTIPLIER THEOREMS
  9.2 Inverse Function Theorem
  9.3 Equality Constraints
  9.4 Inequality Constraints (Kuhn-Tucker Theorem)
  OPTIMAL CONTROL THEORY
  9.5 Basic Necessary Conditions
  *9.6 Pontryagin Maximum Principle
  9.7 Problems
  References

10 ITERATIVE METHODS OF OPTIMIZATION
  10.1 Introduction
  METHODS FOR SOLVING EQUATIONS
  10.2 Successive Approximation
  10.3 Newton's Method
  DESCENT METHODS
  10.4 General Philosophy
  10.5 Steepest Descent
  CONJUGATE DIRECTION METHODS
  10.6 Fourier Series
  *10.7 Orthogonalization of Moments
  10.8 The Conjugate Gradient Method
  METHODS FOR SOLVING CONSTRAINED PROBLEMS
  10.9 Projection Methods
  10.10 The Primal-Dual Method
  10.11 Penalty Functions
  10.12 Problems
  References

SYMBOL INDEX
SUBJECT INDEX

NOTATION

Sets

If x is a member of the set S, we write x ∈ S. The notation y ∉ S means y is not a member of S.

A set may be specified by listing its elements between braces, such as S = {1, 2, 3} for the set consisting of the first three positive integers. Alternatively, a set S may be specified as consisting of all elements of the set X which have the property P. This is written S = {x ∈ X : P(x)} or, if X is understood, S = {x : P(x)}.

The union of two sets S and T is denoted S ∪ T and consists of those elements that are in either S or T. The intersection of two sets S and T is denoted S ∩ T and consists of those elements that are in both S and T. Two sets are disjoint if their intersection is empty.

If S is defined as a subset of elements of X, the complement of S, denoted ~S, consists of those elements of X that are not in S.

A set S is a subset of the set T if every element of S is also an element of T. In this case we write S ⊂ T or T ⊃ S.
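The set operations defined above can be mirrored directly with Python's built-in set type; a small sketch, using arbitrarily chosen illustrative sets rather than any example from the text:

```python
# The set operations defined above, mirrored with Python's built-in sets.
# The particular sets X, S, T are arbitrary illustrative choices.
X = {1, 2, 3, 4, 5, 6}               # the underlying set X
S = {1, 2, 3}                        # S = {1, 2, 3}
T = {x for x in X if x % 2 == 0}     # T = {x in X : P(x)}, P(x): "x is even"

union = S | T            # S ∪ T: elements in either S or T
intersection = S & T     # S ∩ T: elements in both S and T
complement = X - S       # complement of S with respect to X
is_subset = S <= X       # S ⊂ X in the book's (non-strict) sense
is_proper = S < X        # proper subset: S ⊂ X and S ≠ X
```

Here `union` is {1, 2, 3, 4, 6} and `intersection` is {2}, in agreement with the definitions.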
If S ⊂ T and S is not equal to T, then S is said to be a proper subset of T.

Sets of Real Numbers

If a and b are real numbers, [a, b] denotes the set of real numbers x satisfying a ≤ x ≤ b; similarly, (a, b) denotes the set of x satisfying a < x < b.

Let g be a real-valued function of a real variable and let x₀ and S be real numbers. We write S = lim sup g(x) as x → x₀ if: (1) for every ε > 0 there is δ > 0 such that for all x satisfying |x − x₀| < δ, g(x) < S + ε, and (2) for every ε > 0 and δ > 0 there is an x such that |x − x₀| < δ and g(x) > S − ε. (See the corresponding definitions for sequences.)

If g is a real-valued function of a real variable, the notation g(x) = O(x) means that g(x)/x remains bounded as x → 0.

Sequences

A sequence x₁, x₂, …, xₙ, … is denoted by {xₙ} (n = 1, 2, …) or simply {xₙ} if the range of the indices is clear.

Let {xₙ} be an infinite sequence of real numbers and suppose that there is a real number S satisfying: (1) for every ε > 0 there is an N such that for all n > N, xₙ < S + ε, and (2) for every ε > 0 and M > 0 there is an n > M such that xₙ > S − ε. Then S is called the limit superior of {xₙ} and we write S = lim sup xₙ. If {xₙ} is not bounded above, we write lim sup xₙ = +∞. The limit inferior of xₙ is lim inf xₙ = −lim sup(−xₙ). If lim sup xₙ = lim inf xₙ = S, we write lim xₙ = S.

Matrices and Vectors

A vector x with n components is written x = (x₁, x₂, …, xₙ), but when used in matrix calculations it is represented as a column vector; the corresponding row vector is its transpose.

An m × n matrix A with entry a_ij in its i-th row and j-th column is written A = [a_ij]. If x = (x₁, x₂, …, xₙ), the product Ax is the vector y with components yᵢ = Σⱼ a_ij xⱼ, i = 1, 2, …, m, the sum running over j = 1 to n.

Let f(x₁, x₂, …, xₙ) be a function of the n real variables xᵢ. Then we write f_x for the row vector of partial derivatives (∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ). If F = (f₁, f₂, …, f_m) is a vector function of x = (x₁, …, xₙ), we write F_x for the m × n Jacobian matrix [∂fᵢ/∂xⱼ].

1

INTRODUCTION

1.1 Motivation

During the past twenty years mathematics and engineering have been increasingly directed towards problems of decision making in physical or organizational systems.
This trend has been inspired primarily by the significant economic benefits which often result from a proper decision concerning the distribution of expensive resources, and by the repeated demonstration that such problems can be realistically formulated and mathematically analyzed to obtain good decisions.

The arrival of high-speed digital computers has also played a major role in the development of the science of decision making. Computers have inspired the development of larger systems and the coupling of previously separate systems, thereby resulting in decision and control problems of correspondingly increased complexity. At the same time, however, computers have revolutionized applied mathematics and solved many of the complex problems they generated.

It is perhaps natural that the concept of best or optimal decisions should emerge as the fundamental approach for formulating decision problems. In this approach a single real quantity, summarizing the performance or value of a decision, is isolated and optimized (i.e., either maximized or minimized depending on the situation) by proper selection among the available alternatives. The resulting optimal decision is taken as the solution to the decision problem. This approach to decision problems has the virtues of simplicity, preciseness, elegance, and, in many cases, mathematical tractability. It also has obvious limitations due to the necessity of selecting a single objective by which to measure results. But optimization has proved its utility as a mode of analysis and is firmly entrenched in the field of decision making.

The classical theory of optimization, motivated primarily by problems of physics, is associated with such great mathematicians as Gauss, Lagrange, Euler, the Bernoullis, etc. During the recent development of optimization in decision problems, the classical techniques have been re-examined, extended, sometimes rediscovered, and applied to problems having quite different origins than those responsible for their earlier development.
New insights have been obtained and new techniques have been discovered. The computer has rendered many techniques obsolete while making other previously impractical methods feasible and efficient. These recent developments in optimization have been made by mathematicians, system engineers, economists, operations researchers, statisticians, numerical analysts, and others in a host of different fields.

The study of optimization as an independent topic must, of course, be regarded as a branch of applied mathematics. As such it must look to various areas of pure mathematics for its unification, clarification, and general foundation. One such area of particular relevance is functional analysis.

Functional analysis is the study of vector spaces resulting from a merging of geometry, linear algebra, and analysis. It serves as a basis for aspects of several important branches of applied mathematics including Fourier series, integral and differential equations, numerical analysis, and any field where linearity plays a key role. Its appeal as a unifying discipline stems primarily from its geometric character. Most of the principal results in functional analysis are expressed as abstractions of intuitive geometric properties of ordinary three-dimensional space.

Some readers may look with great expectation toward functional analysis, hoping to discover new powerful techniques that will enable them to solve important problems beyond the reach of simpler mathematical analysis. Such hopes are rarely realized in practice. The primary utility of functional analysis for the purposes of this book is its role as a unifying discipline, gathering a number of apparently diverse, specialized mathematical tricks into one or a few general geometric principles.

1.2 Applications

The main purpose of this section is to illustrate the variety of problems that can be formulated as optimization problems in vector space by introducing some specific examples that are treated in later chapters.
As a vehicle for this purpose, we classify optimization problems according to the role of the decision maker. We list the classification, briefly describe its meaning, and illustrate it with one problem that can be formulated in vector space and treated by the methods described later in the book. The classification is not intended to be necessarily complete nor, for that matter, particularly significant. It is merely representative of the classifications often employed when discussing optimization.

Although the formal definition of a vector space is not given until Chapter 2, we point out, in the examples that follow, how each problem can be regarded as formulated in some appropriate vector space. However, the details of the formulation must, in many cases, be deferred until later chapters.

1. Allocation. In allocation problems there is typically a collection of resources to be distributed in some optimal fashion. Almost any optimization problem can be placed in this broad category, but usually the term is reserved for problems in which the resources are distributed over space or among various activities.

A typical problem of this type is that faced by a manufacturer with an inventory of raw materials. He has certain processing equipment capable of producing n different kinds of goods from the raw materials. His problem is to allocate the raw materials among the possible products so as to maximize his profit.

In an idealized version of the problem, we assume that the production and profit model is linear. Assume that the selling price per unit of product j is p_j, j = 1, 2, …, n. If x_j denotes the amount of product j that is to be produced, b_i the amount of raw material i on hand, and a_ij the amount of material i in one unit of product j, the manufacturer seeks to maximize his profit

    p_1 x_1 + p_2 x_2 + … + p_n x_n

subject to the production constraints on the amount of raw materials

    a_11 x_1 + a_12 x_2 + … + a_1n x_n ≤ b_1
    a_21 x_1 + a_22 x_2 + … + a_2n x_n ≤ b_2
    . . . . . . . . . . . . . . . . . . . .
    a_m1 x_1 + a_m2 x_2 + … + a_mn x_n ≤ b_m

and

    x_1 ≥ 0, x_2 ≥ 0, …, x_n ≥ 0.

This type of problem, characterized by a linear objective function subject to linear inequality constraints, is a linear programming problem and is used to illustrate aspects of the general theory of optimization in later chapters.

We note that the problem can be regarded as formulated in ordinary n-dimensional vector space. The vector x with components x_j is the unknown. The constraints define a region in the vector space in which the selected vector must lie. The optimal vector is the vector in that region maximizing the objective.

The manufacturing problem can be generalized to allow for nonlinear objectives and more general constraints. Linearity is destroyed, which may make the solution more difficult to obtain, but the problem can still be regarded as one in ordinary Euclidean n-dimensional space.

2. Planning. Planning is the problem of determining an optimal procedure for attaining a set of objectives. In common usage, planning refers especially to those problems involving outlays of capital over a period of time, such as (1) planning a future investment in electric power generation equipment for a given geographic region or (2) determining the best hiring policy in order to complete a complex project at minimum expense.

As an example, consider a problem of production planning. A firm producing a certain product wishes to plan its production schedule over a period of time in an optimal fashion. It is assumed that a fixed demand function over the time interval is known and that this demand must be met. Excess inventory must be stored at a storage cost proportional to the amount stored. There is a production cost associated with a given rate of production.
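The cost structure just described, a production cost depending on the rate of production plus a storage cost proportional to the stock held, can be sketched in discrete time. The schedule, demand, and cost functions below are hypothetical stand-ins; the continuous-time formulation follows in the text.

```python
# Discrete-time sketch of the production-planning cost: the stock evolves as
# x <- x + (r_k - d_k) * dt, and each step incurs the production cost c(r_k)
# plus the holding cost h * x. All data here are hypothetical.

def plan_cost(r, d, c, h, x0, dt):
    """Total cost of production schedule r against demand d (Euler steps)."""
    x, total = x0, 0.0
    for rk, dk in zip(r, d):
        total += (c(rk) + h * x) * dt   # production cost rate + inventory cost rate
        x += (rk - dk) * dt             # stock changes by production minus demand
        if x < -1e-9:
            raise ValueError("schedule fails to meet demand (stock went negative)")
    return total

# Producing exactly to a constant demand keeps the stock (and holding cost) at zero.
cost = plan_cost([1.0] * 10, [1.0] * 10, lambda r: r * r, 0.5, 0.0, 0.1)
```

Overproducing builds stock and so incurs holding cost in addition to the higher production cost, which is precisely the trade-off the optimal schedule must balance.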
Thus, denoting x(t) as the stock held at time t, r(t) as the rate of production at time t, and d(t) as the demand at time t, the production system can be described by the equation¹

    ẋ(t) = r(t) − d(t),    x(0) given,

and one seeks the function r satisfying the inequality constraints

    x(t) ≥ 0,  r(t) ≥ 0    for 0 ≤ t ≤ T,

and minimizing the cost

    J = ∫₀ᵀ {c[r(t)] + h·x(t)} dt,

where c[r] is the production cost rate for the production level r and h·x is the inventory cost rate for inventory level x.

¹ ẋ(t) ≡ dx(t)/dt.

This problem can be regarded as defined on a vector space consisting of continuous functions on the interval [0, T] of the real line. The optimal production schedule r is then an element of the space. Again the constraints define a region in the space in which the solution r must lie while minimizing the cost.

3. Control (or Guidance). Problems of control are concerned with dynamic systems evolving in time. Control is quite similar to planning; indeed, as we shall see, it is often the source of a problem rather than its mathematical structure which determines its category.

Control or guidance usually refers to directed influence on a dynamic system to achieve desired performance. The system itself may be physical in nature, such as a rocket heading for Mars or a chemical plant processing acid, or it may be operational, such as a warehouse receiving and filling orders.

Often we seek feedback or so-called closed-loop control in which decisions of current control action are made continuously in time based on recent observations of system behavior. Thus, one may imagine himself as a controller sitting at the control panel watching meters and turning knobs or in a warehouse ordering new stock based on inventory and predicted demand. This is in contrast to the approach described for planning in which the whole series of control actions is predetermined.
Generally, the terms planning or control may refer to either possibility.

As an example of a control problem, we consider the launch of a rocket to a fixed altitude h in given time T while expending a minimum of fuel. For simplicity, we assume unit mass, a constant gravitational force, and the absence of aerodynamic forces. The motion of a rocket being propelled vertically is governed by the equation

    ÿ(t) = u(t) − g,

where y is the vertical height, u is the accelerating force, and g is the gravitational force. The optimal control function u is the one which forces y(T) = h while minimizing the fuel expenditure ∫₀ᵀ |u(t)| dt.

This problem too might be formulated in a vector space consisting of functions u defined on the interval [0, T]. The solution to this problem, however, is that u(t) consists of an impulse at t = 0 and, therefore, correct problem formulation and selection of an appropriate vector space are themselves interesting aspects of this example. Problems of this type, including this specific example, are discussed in Chapter 5.

4. Approximation. Approximation problems are motivated by the desire to approximate a general mathematical entity (such as a function) by one of simpler, specified form. A large class of such approximation problems is important in numerical analysis. For example, suppose we wish, because of storage limitations or for purposes of simplifying an analysis, to approximate a function, say x(t), over an interval [a, b] of the real line by a polynomial p(t) of degree n. The best approximating polynomial p minimizes the error e = x − p in the sense of some criterion. The choice of criterion determines the approximation. Often used criteria are:

    1. ∫ₐᵇ e²(t) dt
    2. max |e(t)| over a ≤ t ≤ b
    3. ∫ₐᵇ |e(t)| dt.

The problem is quite naturally viewed as formulated in a vector space of functions over the interval [a, b]. The problem is then viewed as finding a vector from a given class (polynomials) which is closest to a given vector.

5. Estimation.
Estimation problems are really a special class of approximation problems. We seek to estimate some quantity from imperfect observations of it or from observations which are statistically correlated but not deterministically related to it. Loosely speaking, the problem amounts to approximating the unobservable quantity by a combination of the observable ones. For example, the position of a randomly maneuvering airplane at some future time might reasonably be estimated by a linear combination of past measurements of its position.

Another example of estimation arises in connection with triangulation problems such as in the location of forest fires, ships at sea, or remote stars. Suppose there are three lookout stations, each of which measures the angle of the line from the station to the observed object. The situation is illustrated in Figure 1.1. Given these three angles, what is the best estimate of the object's location?

Figure 1.1 A triangulation problem

To formulate the problem completely, a criterion must be precisely prescribed and hypotheses specified regarding the nature of probable measurement errors and the probable location of the object. Approaches can be taken that result in a problem in vector space; such problems are discussed in Chapter 4.

6. Games. Many problems involving a competitive element can be regarded as games. In the usual formulation, involving two players or protagonists, there is an objective function whose value depends jointly on the actions employed by both players. One player attempts to maximize this objective while the other attempts to minimize it.

Often two problems from the categories discussed above can be competitively intermixed to produce a game. Combinations of categories that lead to interesting games include: allocation-allocation, allocation-control, control-control, and estimation-control.

As an example, consider a control-control game. Most problems of this type are of the pursuer-evader type, such as a fighter plane chasing a bomber.
Each player has a system he controls, but one is trying to maximize the objective (time to intercept, for instance) while the other is trying to minimize the objective.

As a simpler example, we consider a problem of advertising or campaigning which is essentially an allocation-allocation game.² Two opposing candidates, A and B, are running for office and must plan how to allocate their advertising resources (A and B dollars, respectively) among n distinct geographical areas. Let x_i and y_i denote, respectively, the resources committed to area i by candidates A and B. We assume that there are currently a total of u undecided votes of which there are u_i undecided votes in area i. The numbers of votes going to candidates A and B from area i are assumed to be

    u_i x_i / (x_i + y_i)    and    u_i y_i / (x_i + y_i),

respectively. The total difference between the number of votes received by A and by B is then

    Σ (from i = 1 to n) u_i (x_i − y_i) / (x_i + y_i).

Candidate A seeks to maximize this quantity while B seeks to minimize it. This problem is obviously finite dimensional and can be solved by ordinary calculus in a few lines. It is illustrative, however, of an interesting class of game problems.

² This problem is due to L. Friedman [57].

1.3 The Main Principles

The theory of optimization presented in this book is derived from a few simple, intuitive, geometric relations. The extension of these relations to infinite-dimensional spaces is the motivation for the mathematics of functional analysis which, in a sense, often enables us to extend our three-dimensional geometric insights to complex problems.

This is the conceptual utility of functional analysis. On the other hand, these simple geometric relations have great practical utility as well because a vast assortment of problems can be analyzed from this point of view. In this section, we briefly describe a few of the important geometric principles of optimization that are developed in detail in later chapters.

1. The Projection Theorem. This theorem is one of the simplest and nicest results of optimization theory.
In ordinary three-dimensional Euclidean space, it states that the shortest line from a point to a plane is furnished by the perpendicular from the point to the plane, as illustrated in Figure 1.2.

Figure 1.2 The projection theorem

This simple and seemingly innocuous result has direct extensions in spaces of higher dimension and in infinite-dimensional Hilbert space. In the generalized form, this optimization principle forms the basis of all least-squares approximation, control, and estimation procedures.

2. The Hahn-Banach Theorem. Of the many results and concepts in functional analysis, the one theorem dominating the theme of this book and embodying the essence of the simple geometric ideas upon which the theory is built is the Hahn-Banach theorem. The theorem takes several forms. One version extends the projection theorem to problems having nonquadratic objectives. In this manner the simple geometric interpretation is preserved for these more complex problems. Another version of the Hahn-Banach theorem states (in simplest form) that given a sphere and a point not in the sphere there is a hyperplane separating the point and the sphere. This version of the theorem, together with the associated notions of hyperplanes, is the basis for most of the theory beyond Chapter 5.

3. Duality. There are several duality principles in optimization theory that relate a problem expressed in terms of vectors in a space to a problem expressed in terms of hyperplanes in the space. This concept of duality is a recurring theme in this book.

Many of these duality principles are based on the geometric relation illustrated in Figure 1.3. The shortest distance from a point to a convex set is equal to the maximum of the distances from the point to a hyperplane separating the point from the convex set. Thus, the original minimization over vectors can be converted to maximization over hyperplanes.

Figure 1.3 Duality

4. Differentials.
Perhaps the most familiar optimization technique is the method of differential calculus: setting the derivative of the objective function equal to zero. The technique is discussed for a single or, perhaps, a finite number of variables in the most elementary courses on differential calculus. Its extension to infinite-dimensional spaces is straightforward and, in that form, it can be applied to a variety of interesting optimization problems. Much of the classical theory of the calculus of variations can be viewed as a consequence of this principle.

The geometric interpretation of the technique for one-dimensional problems is obvious. At a maximum or minimum the tangent to the graph of a function is horizontal. In higher dimensions the geometric interpretation is similar: at a maximum or minimum the tangent hyperplane to the graph is horizontal. Thus, again we are led to observe the fundamental role of hyperplanes in optimization.

1.4 Organization of the Book

Before our discussion of optimization can begin in earnest, certain fundamental concepts and results of linear vector space theory must be introduced. Chapter 2 is devoted to that task. The chapter consists of material that is standard, elementary functional analysis background and is essential for further pursuit of our objectives. Anyone having some familiarity with linear algebra and analysis should have little difficulty with this chapter.

Chapters 3 and 4 are devoted to the projection theorem in Hilbert space and its applications. Chapter 3 develops the general theory, illustrating it with some applications from Fourier approximation and optimal control theory. Chapter 4 deals solely with the applications of the projection theorem to estimation problems including the recursive estimation and prediction of time series as developed by Kalman.

Chapter 5 is devoted to the Hahn-Banach theorem.
It is in this chapter that we meet with full force the essential ingredients of the general theory of optimization: hyperplanes, duality, and convexity.

Chapter 6 discusses linear transformations on a vector space and is the last chapter devoted to the elements of linear functional analysis. The concept of duality is pursued in this chapter through the introduction of adjoint transformations and their relation to minimum norm problems. The pseudoinverse of an operator in Hilbert space is also discussed.

Chapters 7, 8, and 9 consider general optimization problems in linear spaces. Two basic approaches, the local theory leading to differential conditions and the global theory relying on convexity, are isolated and discussed in a parallel fashion. The techniques in these chapters are a direct outgrowth of the principles of earlier chapters, and geometric visualization is stressed wherever possible. In the course of the development, we treat problems from the calculus of variations, the Fenchel conjugate function theory, Lagrange multipliers, the Kuhn-Tucker theorem, and Pontryagin's maximum principle for optimal control problems.

Finally, Chapter 10 contains an introduction to iterative techniques for the solution of optimization problems. Some techniques in this chapter are quite different from those in previous chapters, but many are based on extensions of the same logic and geometrical considerations found to be so fruitful throughout the book. The methods discussed include successive approximation, Newton's method, steepest descent, conjugate gradients, the primal-dual method, and penalty functions.

2

LINEAR SPACES

2.1 Introduction

The first few sections of this chapter introduce the concept of a vector space and explore the elementary properties resulting from the basic definition. The notions of subspace, linear independence, convexity, and dimension are developed and illustrated by examples.
The material is largely review for most readers, since it duplicates the first part of standard courses in linear algebra.

The second part of the chapter discusses the basic properties of normed linear spaces. A normed linear space is a vector space having a measure of distance or length defined on it. With the introduction of a norm, it becomes possible to define analytical or topological properties such as convergence and open and closed sets. Therefore, that portion of the chapter introduces and explores these basic concepts which distinguish functional analysis from linear algebra.

VECTOR SPACES

2.2 Definition and Examples

Associated with every vector space is a set of scalars used to define scalar multiplication on the space. In the most abstract setting these scalars are required only to be elements of an algebraic field. However, in this book the scalars are always taken to be either the set of real numbers or the set of complex numbers. We sometimes distinguish between these possibilities by referring to a vector space as either a real or a complex vector space. In this book, however, the primary emphasis is on real vector spaces and, although occasional reference is made to complex spaces, many results are derived only for real spaces. In case of ambiguity, the reader should assume the space to be real.

Definition. A vector space X is a set of elements called vectors together with two operations. The first operation is addition, which associates with any two vectors x, y ∈ X a vector x + y ∈ X, the sum of x and y. The second operation is scalar multiplication, which associates with any vector x ∈ X and any scalar α a vector αx, the scalar multiple of x by α. The set X and the operations of addition and scalar multiplication are assumed to satisfy the following axioms:

1. x + y = y + x. (commutative law)
2. (x + y) + z = x + (y + z). (associative law)
3. There is a null vector θ in X such that x + θ = x for all x in X.
4. α(x + y) = αx + αy. (distributive laws)
5. (α + β)x = αx + βx.
6. (αβ)x = α(βx). (associative law)
7. 0x = θ, 1x = x.

For convenience the vector −1x is denoted −x and called the negative of the vector x. We have x + (−x) = (1 − 1)x = 0x = θ.

There are several elementary but important properties of vector spaces that follow directly from the axioms listed in the definition. For example, the following properties are easily deduced. The details are left to the reader.

Proposition 1. In any vector space:

1. x + y = x + z implies y = z. (cancellation laws)
2. αx = αy and α ≠ 0 imply x = y.
3. αx = βx and x ≠ θ imply α = β.
4. (α − β)x = αx − βx. (distributive laws)
5. α(x − y) = αx − αy.
6. αθ = θ.

Some additional properties are given as exercises at the end of the chapter.

Example 1. Perhaps the simplest example of a vector space is the set of real numbers. It is a real vector space with addition defined in the usual way and multiplication by (real) scalars defined as ordinary multiplication. The null vector is the real number zero. The properties of ordinary addition and multiplication of real numbers satisfy the axioms in the definition of a vector space. This vector space is called the one-dimensional real coordinate space or simply the real line. It is denoted by R^1 or simply R.

Example 2. An obvious extension of Example 1 is to n-dimensional real coordinate space. Vectors in the space consist of sequences (n-tuples) of n real numbers so that a typical vector has the form x = (ξ_1, ξ_2, ..., ξ_n). The real number ξ_k is referred to as the k-th component of the vector. Two vectors are equal if their corresponding components are equal. The null vector is defined as θ = (0, 0, ..., 0). If x = (ξ_1, ξ_2, ..., ξ_n) and y = (η_1, η_2, ..., η_n),
" the vector x + y is defined as the n-tuple whose component is + The vector lXX, where (% is a (real) scalar, is then-tup1e whose k-th component is The axioms in the definition areverified by checking for equality among components, For example, ifx = e2 , , the relation ek + 0 = ek implies x + (} =x.This space, n-dimensional real coordinate space, is denoted by Rn, Thecorresponding complex space consisting of n-tuples of complex numbersis denoted bye".At this point we are, strictly speaking, somewhat prematurely intro-ducing the term dimensionality. Later in this chapter the notion of dimen-sion is defined, and it is proved that these spaces are in fact n dimensional.Example 3. Several interesting vector spaces can be constructed withvectors consisting of infinite sequences of real numbers so that a typicalvector has the for.m x = (,1' '2"'" eb"') or, equivalently,x = Again addition and multiplication '.re defined componentwise as inExample 2. The collection of all infinite sequences of real numbers formsa vector space. A sequence{ek} is said to be bounded if there is a constantM such that < M for all k. The collection of all bounded infinite:' sequences forms a vector space since the sum of two bounded sequencesor the scalar multiple of a bounded sequence is again bounded. This spaceis referred to as the space of bounded real sequences.Example 4. The collection of all sequences of real numbers having only afinite number of terms not equal to zero is a vector space. (Differentmembers of the space may have different numbers of nonzero components.)This space is called the space of finitely nonzero sequences.Example 5. The collection of infinite sequences of real numbers whichconverge to zero is a vector space since the sum of two sequences con-verging to zero or the scalar multiple of a sequence converging to zero alsoconverges to zero.Example 6. Consider the interval [a, bJ on the real line. 
The collection of all real-valued continuous functions on this interval forms a vector space. Write x = y if x(t) = y(t) for all t ∈ [a, b]. The null vector θ is the function identically zero on [a, b]. If x and y are vectors in the space and α is a (real) scalar, write (x + y)(t) = x(t) + y(t) and (αx)(t) = αx(t). These are obviously continuous functions. This space is referred to as the vector space of real-valued continuous functions on [a, b].
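The pointwise operations of Example 6 can be sketched in code. The following is an illustrative sketch only (not from the text); the helper names `add`, `scale`, and `theta` are hypothetical:

```python
# Functions on [a, b] treated as vectors, with addition and scalar
# multiplication defined pointwise, as in Example 6.

def add(x, y):
    """(x + y)(t) = x(t) + y(t)"""
    return lambda t: x(t) + y(t)

def scale(alpha, x):
    """(alpha * x)(t) = alpha * x(t)"""
    return lambda t: alpha * x(t)

theta = lambda t: 0.0  # the null vector: the function identically zero

x = lambda t: t * t
y = lambda t: 1.0 - t

z = add(x, scale(2.0, y))            # the vector x + 2y
print(z(0.5))                        # x(0.5) + 2*y(0.5) = 0.25 + 1.0 = 1.25
print(add(x, theta)(0.25) == x(0.25))  # x + theta = x, pointwise
```

The sum and scalar multiple of continuous functions are again continuous, which is what makes this collection a vector space rather than merely a set of functions.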

2.6 Normed Linear Spaces

In this section we combine the algebraic structure of a vector space with a notion of distance by defining a norm on the space.

Definition. A normed linear vector space is a vector space X on which there is defined a real-valued function mapping each element x in X into a real number ||x|| called the norm of x. The norm satisfies the following axioms:

1. ||x|| ≥ 0 for all x ∈ X; ||x|| = 0 if and only if x = θ.
2. ||x + y|| ≤ ||x|| + ||y|| for each x, y ∈ X. (triangle inequality)
3. ||αx|| = |α| · ||x|| for all scalars α and each x ∈ X.

The norm is clearly an abstraction of our usual concept of length. The following useful inequality is a direct consequence of the triangle inequality.

Lemma 1. In a normed linear space, ||x|| − ||y|| ≤ ||x − y|| for any two vectors x, y.

Proof. ||x|| − ||y|| = ||x − y + y|| − ||y|| ≤ ||x − y|| + ||y|| − ||y|| = ||x − y||. ∎

By introduction of a suitable norm, many of our earlier examples of vector spaces can be converted to normed spaces.

Example 1. The normed linear space C[a, b] consists of the real-valued continuous functions on the interval [a, b] together with the norm ||x|| = max_{a≤t≤b} |x(t)|. This norm is clearly ≥ 0 and is zero only for the function which is identically zero. The triangle inequality follows from the relation

max |x(t) + y(t)| ≤ max [|x(t)| + |y(t)|] ≤ max |x(t)| + max |y(t)|.

Finally, the third axiom follows from the relation

max |αx(t)| = max |α| |x(t)| = |α| max |x(t)|.

Example 2. The normed linear space D[a, b] consists of all functions on the interval [a, b] which are continuous and have continuous derivatives on [a, b]. The norm on the space D[a, b] is defined as

||x|| = max_{a≤t≤b} |x(t)| + max_{a≤t≤b} |ẋ(t)|.

We leave it to the reader to verify that D[a, b] is a normed linear space.

Example 3. The space of finitely nonzero sequences together with the norm equal to the sum of the absolute values of the nonzero components is a normed linear space. Thus the element x = {ξ_1, ξ_2, ..., ξ_n, 0, 0, ...} has its norm defined as ||x|| = Σ_{i=1}^{n} |ξ_i|. We may easily verify the three required properties by inspection.

Example 4. The space of continuous functions on the interval [a, b] becomes a normed space with the norm of a function x defined as ||x|| = ∫_a^b |x(t)| dt. This is a different normed space than C[a, b].

Example 5. Euclidean n-space, denoted E^n, consists of n-tuples with the norm of an element x = (ξ_1, ξ_2, ..., ξ_n) defined as ||x|| = (Σ_i |ξ_i|²)^{1/2}. This definition obviously satisfies the first and third axioms for norms.
The triangle inequality for this norm is a well-known result from finite-dimensional vector spaces and is a special case of the Minkowski inequality discussed in Section 2.10. The space E^n can be chosen as a real or a complex space by considering real or complex n-tuples. We employ the same notation E^n for both because it is generally apparent from context which is meant.
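The norms in the examples above can be compared numerically. The following sketch (not from the text) uses simple grid-based approximations; the helper names are hypothetical:

```python
# Grid-based approximations of three of the norms above: the C[a,b]
# maximum norm, the integral norm of Example 4, and the Euclidean norm.
import math

def sup_norm(x, a=0.0, b=1.0, n=10001):
    ts = [a + (b - a) * i / (n - 1) for i in range(n)]
    return max(abs(x(t)) for t in ts)            # ||x|| = max |x(t)|

def l1_norm(x, a=0.0, b=1.0, n=10000):
    h = (b - a) / n                              # midpoint quadrature
    return sum(abs(x(a + (i + 0.5) * h)) for i in range(n)) * h

def euclidean_norm(xi):
    return math.sqrt(sum(v * v for v in xi))     # ||x|| = (Σ|ξi|²)^(1/2)

x = lambda t: t - 0.5
print(sup_norm(x))                  # 0.5
print(l1_norm(x))                   # ≈ 0.25: same function, different norm
print(euclidean_norm([3.0, 4.0]))   # 5.0
```

The same function x has different lengths under different norms, which is why C[a, b] and the space of Example 4 are different normed spaces even though they contain the same vectors.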

Example 6. We consider now the space BV[a, b] consisting of functions of bounded variation on the interval [a, b]. By a partition of the interval [a, b] we mean a finite set of points t_i ∈ [a, b], i = 0, 1, 2, ..., n, such that a = t_0 < t_1 < t_2 < ··· < t_n = b. A function x defined on [a, b] is said to be of bounded variation if there is a constant K so that for any partition of [a, b]

Σ_{i=1}^{n} |x(t_i) − x(t_{i−1})| ≤ K.

The total variation of x, denoted T.V.(x), is then defined as

T.V.(x) = sup Σ_{i=1}^{n} |x(t_i) − x(t_{i−1})|,

where the supremum is taken with respect to all partitions of [a, b]. A convenient and suggestive notation for the total variation is

T.V.(x) = ∫_a^b |dx(t)|.

The total variation of a constant function is zero, and the total variation of a monotonic function is the absolute value of the difference between the function values at the end points a and b. The space BV[a, b] is defined as the space of all functions of bounded variation on [a, b] together with the norm defined as

||x|| = |x(a)| + T.V.(x).

2.7 Open and Closed Sets

We come now to the concepts that are fundamental to the study of topological properties.

Definition. Let P be a subset of a normed space X. The point p ∈ P is said to be an interior point of P if there is an ε > 0 such that all vectors x satisfying ||x − p|| < ε are also members of P. The collection of all interior points of P is called the interior of P and is denoted P̊.

We introduce the notation S(x, ε) for the (open) sphere centered at x with radius ε; that is, S(x, ε) = {y : ||x − y|| < ε}. Thus, according to the above definition, a point x is an interior point of P if there is a sphere S(x, ε) centered at x and contained in P. A set may have an empty interior as, for example, a set consisting of a single point or a line in E².

Definition. A set P is said to be open if P = P̊.

The empty set is open since its interior is also empty. The entire space is an open set. The unit sphere consisting of all vectors x with ||x|| < 1 is an open set.
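Returning for a moment to Example 6: for a function sampled at the points of a single partition, the sum defining the total variation is immediate to compute. The sketch below (not from the text; helper names hypothetical) evaluates that sum and the BV[a, b] norm:

```python
# values[i] = x(t_i) at the partition points t_0 < t_1 < ... < t_n.
# The sum below is the variation over this one partition; T.V.(x) is
# the supremum of such sums over all partitions.

def total_variation(values):
    """Sum of |x(t_i) - x(t_{i-1})| over a given partition."""
    return sum(abs(values[i] - values[i - 1]) for i in range(1, len(values)))

def bv_norm(values):
    """||x|| = |x(a)| + T.V.(x), approximated on one partition."""
    return abs(values[0]) + total_variation(values)

# A monotonic function: the variation telescopes to |x(b) - x(a)|.
mono = [0.0, 1.0, 2.5, 4.0]
print(total_variation(mono))             # 4.0

# A constant function has total variation zero.
print(total_variation([3.0, 3.0, 3.0]))  # 0.0
print(bv_norm(mono))                     # |0.0| + 4.0 = 4.0
```

For monotonic or piecewise monotonic samples the single-partition sum already attains the supremum, which illustrates the remark above about constant and monotonic functions.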
We leave it to the reader to verify that P̊ is open.

Definition. A point x ∈ X is said to be a closure point of a set P if, given ε > 0, there is a point p ∈ P satisfying ||x − p|| < ε. The collection of all closure points of P is called the closure of P and is denoted P̄.

In other words, a point x is a closure point of P if every sphere centered at x contains a point of P. It is clear that P ⊂ P̄.

Definition. A set P is said to be closed if P = P̄.

The empty set and the whole space X are closed as well as open. The unit sphere consisting of all points x with ||x|| ≤ 1 is a closed set. A single point is a closed set. It is clear that the closure of P̄ is P̄ itself.

Proposition 1. The complement of an open set is closed, and the complement of a closed set is open.

Proof. Let P be an open set and P̃ = {x : x ∉ P} its complement. A point x in P is not a closure point of P̃, since there is a sphere about x disjoint from P̃. Thus P̃ contains all its closure points and is therefore closed.

Let S be a closed set. If x ∉ S, then x is not a closure point of S and, hence, there is a sphere about x which is disjoint from S. Therefore x is an interior point of the complement S̃. We conclude that S̃ is open. ∎

The proofs of the following two complementary results are left to the reader.

Proposition 2. The intersection of a finite number of open sets is open; the union of an arbitrary collection of open sets is open.

Proposition 3. The union of a finite number of closed sets is closed; the intersection of an arbitrary collection of closed sets is closed.

We now have two topological operations, taking closures and taking interiors, that can be applied to sets in a normed space. It is natural to investigate the effect of these operations on convexity, the fundamental algebraic concept of vector space.

Proposition 4. Let C be a convex set in a normed space. Then C̊ and C̄ are convex.

Proof. If C̄ is empty, it is convex. Suppose x₀, y₀ are points in C̄. Fix α, 0 < α < 1. We must show that z₀ = αx₀ + (1 − α)y₀ ∈ C̄.
Given ε > 0, let x, y be selected from C such that ||x − x₀|| < ε, ||y − y₀|| < ε. Then ||αx + (1 − α)y − αx₀ − (1 − α)y₀|| < ε and, hence, z₀ is within a distance ε of z = αx + (1 − α)y, which is in C. Since ε is arbitrary, it follows that z₀ is a closure point of C.

If C̊ is empty, it is convex. Suppose x₀, y₀ ∈ C̊ and fix α, 0 < α < 1. We must show that z₀ = αx₀ + (1 − α)y₀ ∈ C̊. Since x₀, y₀ ∈ C̊, there is an ε > 0 such that the open spheres S(x₀, ε), S(y₀, ε) are contained in C. It follows that all points of the form z₀ + w with ||w|| < ε are in C, since z₀ + w = α(x₀ + w) + (1 − α)(y₀ + w). Thus, z₀ is an interior point of C. ∎

Likewise, it can be shown that taking closures preserves subspaces, linear varieties, and cones.

Finally, we remark that all of the topological concepts discussed above can be defined relative to a given linear variety. Suppose that P is a set contained in a linear variety V. We say that p ∈ P is an interior point of P relative to V if there is an ε > 0 such that all vectors x ∈ V satisfying ||x − p|| < ε are also members of P. The set P is said to be open relative to V if every point in P is an interior point of P relative to V.

In case V is taken as the closed linear variety generated by P, i.e., the intersection of all closed linear varieties containing P, then x is simply referred to as a relative interior point of P if it is an interior point of P relative to the variety V. Similar meaning is given to relatively closed, etc.

2.8 Convergence

In order to prove the existence of a vector satisfying a desired property, it is common to establish an appropriate sequence of vectors converging to a limit. In many cases the limit can be shown to satisfy the required property. It is for this reason that the concept of convergence plays an important role in analysis.

Definition. In a normed linear space an infinite sequence of vectors {x_n} is said to converge to a vector x if the sequence {||x − x_n||} of real numbers converges to zero.
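As a small numerical illustration of this definition (a sketch, not from the text; helper names hypothetical), take x_n = (1/n, 1/n) in E², which converges to the null vector:

```python
# Convergence in norm: the real sequence {||x_n - x||} tends to zero.
import math

def norm(v):
    """Euclidean norm on E^2 (or E^n)."""
    return math.sqrt(sum(c * c for c in v))

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

x = (0.0, 0.0)  # the candidate limit (the null vector)
dists = [norm(diff((1.0 / n, 1.0 / n), x)) for n in (1, 10, 100, 1000)]
print(dists)    # a decreasing sequence of real numbers tending to 0
```

Note that convergence is always measured through the norm of the space; as the text shows next, a sequence can converge componentwise without converging in norm.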
In this case, we write x_n → x.

If x_n → x, it follows that ||x_n|| → ||x|| because, according to Lemma 1, Section 2.6, we have both ||x_n|| − ||x|| ≤ ||x_n − x|| and ||x|| − ||x_n|| ≤ ||x_n − x||, which implies that | ||x_n|| − ||x|| | ≤ ||x_n − x|| → 0.

In the space E^n of n-tuples, a sequence converges if and only if each component converges; however, in other spaces convergence is not always easy to characterize. In the space of finitely nonzero sequences, define the vectors e_i = (0, 0, ..., 0, 1, 0, ...), the i-th vector having each of its components zero except the i-th, which is 1. The sequence of vectors {e_i} (which is now a sequence of sequences) converges to zero componentwise, but the sequence does not converge to the null vector since ||e_i|| = 1 for all i.

An important observation is stated in the following proposition.

Proposition 1. If a sequence converges, its limit is unique.

Proof. Suppose x_n → x and x_n → y. Then

||x − y|| = ||x − x_n + x_n − y|| ≤ ||x − x_n|| + ||x_n − y|| → 0.

Thus, x = y. ∎

Another way to state the definition of convergence is in terms of spheres. A sequence {x_n} converges to x if and only if, given ε > 0, the sphere S(x, ε) contains x_n for all n greater than some number N.

The definition of convergence can be used to characterize closed sets and provides a useful alternative to the original definition of closed sets.

Proposition 2. A set F is closed if and only if every convergent sequence with elements in F has its limit in F.

Proof. The limit of a sequence from F is obviously a closure point of F and, therefore, must be contained in F if F is closed. Suppose now that F is not closed. Then there is a closure point x of F that is not in F. In each of the spheres S(x, 1/n) we may select a point x_n ∈ F since x is a closure point. The sequence {x_n} generated in this way converges to x ∉ F. ∎

2.9 Transformations and Continuity

The objects that make the study of linear spaces interesting and useful are transformations.

Definition.
Let X and Y be linear vector spaces and let D be a subset of X. A rule which associates with every element x ∈ D an element y ∈ Y is said to be a transformation from X to Y with domain D. If y corresponds to x under T, we write y = T(x).

Transformations on vector spaces become increasingly more important as we progress through this book; they are treated in some detail beginning with Chapter 6. It is the purpose of this section to introduce some common terminology that is convenient for describing the simple transformations encountered in the early chapters.

If a specific domain is not explicitly mentioned when discussing a transformation on a vector space X, it is understood that the domain is X itself. If for every y ∈ Y there is at most one x ∈ D for which T(x) = y, the transformation T is said to be one-to-one. If for every y ∈ Y there is at least one x ∈ D for which T(x) = y, T is said to be onto or, more precisely, to map D onto Y. This terminology, as the notion of a transformation itself, is of course an extension of the familiar notion of ordinary functions. A transformation is simply a function defined on one vector space X while
A transformation T mapping a vector space X into a vectorspace Y is said to be linear if for every Xl' Xl E Xand all scalars Cit, (X2 wehave T(a1x1 +0:2 x2) = C(t T(Xt) + a2 T(x2). IThe most familiar example of a linear transformation is supplied by arectangular.. m x n matrix mapping elements of Rn into R'". An exampleof a linear transformation mapping X = C [a; b] into X is the integraloperator T(x) = J ~ k(t, r)x(-r) dr where k(t, T) is a function continuous onthe square a < 1 :S b ~ a < r < b.Up to this point we have considered transformations mapping oneabstract space into another. If these spaces happen to be normed, it ispossible to define the notion of corttinuity.Definition. A transformation T mapping a llormed space X into a normedspace Y is continuous at XoE X if for every e > there is a l5 > 0 such thatIlx - xoll < b implies that llT(x) - T(xo)11 < e.Note that continuity depends on the norm in both the spaces X and Y.If T liS continuous at each point Xo E X, we sa.y that T is continuous every-where or, more simplY,that T is continuous.The following characterization of continuity is useful in many proofsinvolving continuous transformations.Proposition 1. A transformation T mapping a normed space X into a normedspace Y is continuous at the point Xo EX if and only If Xn ~ Xo impliesT(xtJ) ~ T(xo)Proof. The H if" portion of the statement is obvious; thus we needonly proof the H only if" portion. Let {xn} be a sequence such that oXn ---t Xo ,T(xn) -k T(xo). Then, for some > and every N there is an n > N such2.l0 THE Ip AND Lp SPACES 29that IIT(xn) - T(xo)II > 8. Since xn -4 xo, this implies that for every .5 > 0there is apoint x" with IlxlI - xoll < lJ and IIT(x") - T(x) II > 8. This provesthe" only if" portion by contraposition. I*2.10 The I., and LII SpacesIn this section we discuss some classical nonned spaces that are usefulthroughout the book.Definition. 
Let p be a real number, 1 ≤ p < ∞. The space l_p consists of all sequences of scalars {ξ_1, ξ_2, ...} for which

Σ_{i=1}^{∞} |ξ_i|^p < ∞.

The norm of an element x = {ξ_i} in l_p is defined as

||x||_p = (Σ_{i=1}^{∞} |ξ_i|^p)^{1/p}.

The space l_∞ consists of bounded sequences. The norm of an element x = {ξ_i} in l_∞ is defined as

||x||_∞ = sup_i |ξ_i|.

It is obvious that the norm on l_p satisfies ||αx|| = |α| ||x|| and that ||x|| > 0 for each x ≠ θ. In this section we establish two inequalities concerning the l_p norms, the second of which gives the triangle inequality for these norms. Therefore, the l_p norm indeed satisfies the three axioms required of a general norm. Incidentally, it follows from these properties of the norm that l_p is in fact a linear vector space because, if x = {ξ_i}, y = {η_i} are vectors in l_p, then for any scalars α, β we have ||αx + βy|| ≤ |α| ||x|| + |β| ||y|| < ∞, so αx + βy is a member of l_p. Since l_p is a vector space and the norm satisfies the three required axioms, we may justifiably refer to l_p as a normed linear vector space.

The following two theorems, although of fundamental importance for a study of the l_p spaces, are somewhat tangential to our main purpose. The reader will lose little by simply scanning the proofs.

Theorem 1. (The Hölder Inequality) If p and q are positive numbers, 1 ≤ p ≤ ∞, 1 ≤ q ≤ ∞, such that 1/p + 1/q = 1, and if x = {ξ_1, ξ_2, ...} ∈ l_p, y = {η_1, η_2, ...} ∈ l_q, then

Σ_{i=1}^{∞} |ξ_i η_i| ≤ ||x||_p ||y||_q.

Equality holds if and only if

(|ξ_i| / ||x||_p)^{1/q} = (|η_i| / ||y||_q)^{1/p}

for each i.

Proof. The cases p = 1, q = ∞ and p = ∞, q = 1 are straightforward and are left to the reader. Therefore it is assumed that 1 < p < ∞, 1 < q < ∞. We first prove the auxiliary inequality: for a ≥ 0, b ≥ 0, and 0 < λ < 1, we have

a^λ b^(1−λ) ≤ λa + (1 − λ)b,

with equality if and only if a = b.

For this purpose, consider the function f(t) = t^λ − λt + λ − 1 defined for t ≥ 0. Then f′(t) = λ(t^(λ−1) − 1).
Since 0 < λ < 1, we have f′(t) > 0 for 0 < t < 1 and f′(t) < 0 for t > 1. It follows that for t ≥ 0, f(t) ≤ f(1) = 0 with equality only for t = 1. Hence,

t^λ ≤ λt + 1 − λ,

with equality only for t = 1. If b ≠ 0, the substitution t = a/b gives the desired inequality, while for b = 0 the inequality is trivial.

Applying this inequality to the numbers

a = (|ξ_i| / ||x||_p)^p,  b = (|η_i| / ||y||_q)^q,  λ = 1/p,

we obtain for each i

|ξ_i η_i| / (||x||_p ||y||_q) ≤ (1/p)(|ξ_i| / ||x||_p)^p + (1/q)(|η_i| / ||y||_q)^q.

Summing this inequality over i, and noting that the right side then sums to 1/p + 1/q = 1, we obtain the Hölder inequality. ∎

2.11 Banach Spaces

A sequence {x_n} in a normed space is said to be a Cauchy sequence if ||x_n − x_m|| → 0 as n, m → ∞; that is, given ε > 0, there is an N such that ||x_n − x_m|| < ε for all n, m > N. In a normed space, every convergent sequence is a Cauchy sequence since, if x_n → x, then

||x_n − x_m|| = ||x_n − x + x − x_m|| ≤ ||x_n − x|| + ||x − x_m|| → 0.

In general, however, a Cauchy sequence may not be convergent. Normed spaces in which every Cauchy sequence is convergent are of particular interest in analysis; in such spaces it is possible to identify convergent sequences without explicitly identifying their limits. A space in which every Cauchy sequence has a limit (and is therefore convergent) is said to be complete.

Definition. A normed linear vector space X is complete if every Cauchy sequence from X has a limit in X. A complete normed linear vector space is called a Banach space.

We frequently take great care to formulate problems arising in applications as equivalent problems in Banach space rather than as problems in other, possibly incomplete, spaces. The advantage of Banach space in optimization problems is that, when seeking an optimal vector maximizing a given objective, we often construct a sequence of vectors, each member of which is superior to the preceding members; the desired optimal vector is then the limit of the sequence. In order that the scheme be effective, there must be available a test for convergence which can be applied when the limit is unknown. The Cauchy criterion for convergence meets this requirement provided the underlying space is complete.

We now consider examples of incomplete normed spaces.
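Before turning to those examples, the Hölder inequality proved above can be checked numerically for finite sequences (a special case of the l_p result, since a finite sequence extends with zeros). The following is an illustrative sketch only, with hypothetical helper names:

```python
# Numerical check of the Hölder inequality with p = 3, q = 3/2
# (so that 1/p + 1/q = 1).

def p_norm(x, p):
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [1.0, -2.0, 3.0, 0.5]
y = [0.5, 1.0, -1.5, 2.0]
p, q = 3.0, 1.5

lhs = sum(abs(a * b) for a, b in zip(x, y))
rhs = p_norm(x, p) * p_norm(y, q)
print(lhs <= rhs + 1e-12)   # True: sum |ξi ηi| <= ||x||_p ||y||_q

# Equality case: taking η_i proportional to |ξ_i|^(p-1) makes
# (|ξi|/||x||_p)^p = (|ηi|/||y||_q)^q for each i, since (p-1)q = p.
y_eq = [abs(a) ** (p - 1.0) for a in x]
lhs_eq = sum(abs(a * b) for a, b in zip(x, y_eq))
print(abs(lhs_eq - p_norm(x, p) * p_norm(y_eq, q)) < 1e-9)   # True
```

The second check illustrates the equality condition of Theorem 1: up to floating-point error, the two sides coincide exactly when the components are matched in this way.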
Example 1. Let X be the space of continuous functions on [0, 1] with norm defined by ||x|| = ∫_0^1 |x(t)| dt. One may readily verify that X is a normed linear space. Note that the space is not the space C[0, 1] since the norm is different. We show that X is incomplete. Define a sequence of elements in X by the equation

x_n(t) = 0 for 0 ≤ t ≤ 1/2 − 1/n,
x_n(t) = nt − n/2 + 1 for 1/2 − 1/n < t ≤ 1/2,
x_n(t) = 1 for 1/2 < t ≤ 1.

Each x_n is continuous, and it is easily verified that {x_n} is a Cauchy sequence in this norm. However, the sequence converges to no continuous function: any limit would have to be 0 for t < 1/2 and 1 for t > 1/2. Hence X is incomplete.

Example 2. The space C[0, 1] is complete. To demonstrate this, let {x_n} be a Cauchy sequence in C[0, 1]. For each fixed t, |x_n(t) − x_m(t)| ≤ ||x_n − x_m||, so {x_n(t)} is a Cauchy sequence of real numbers converging to a limit which we denote x(t). Given ε > 0, choose N such that ||x_n − x_m|| < ε/2 for n, m > N. Then for n > N,

|x_n(t) − x(t)| ≤ |x_n(t) − x_m(t)| + |x_m(t) − x(t)| ≤ ||x_n − x_m|| + |x_m(t) − x(t)|.

By choosing m sufficiently large (which may depend on t), each term on the right can be made smaller than ε/2, so |x_n(t) − x(t)| < ε for n > N. We must still prove that the function x is continuous and that the sequence {x_n} converges to x in the norm of C[0, 1]. To prove the continuity of x, fix ε > 0. For every δ, t, and n,

|x(t + δ) − x(t)| ≤ |x(t + δ) − x_n(t + δ)| + |x_n(t + δ) − x_n(t)| + |x_n(t) − x(t)|.

Choosing n > N makes the first and third terms on the right each less than ε; since x_n is continuous, there is a δ small enough that the middle term is less than ε. Hence x is continuous. Finally, since |x_n(t) − x(t)| < ε for every t when n > N, we have ||x_n − x|| ≤ ε for n > N, so x_n → x in the norm of C[0, 1].

Example 4. The space l_∞ is a Banach space. Let {x_n} be a Cauchy sequence from l_∞, where x_n = {ξ_1^n, ξ_2^n, ...}. For each k, |ξ_k^n − ξ_k^m| ≤ ||x_n − x_m||, so {ξ_k^n} is a Cauchy sequence of real numbers converging to a limit ξ_k. Furthermore, this convergence is uniform in k. Let x = {ξ_1, ξ_2, ...}. Since {x_n} is Cauchy, there is a constant M such that ||x_n|| ≤ M for all n. Therefore, for each k and each n, we have |ξ_k^n| ≤ ||x_n|| ≤ M, from which it follows that x ∈ l_∞ and ||x|| ≤ M. The convergence of x_n to x follows directly from the uniform convergence of ξ_k^n → ξ_k.

Example 5. L_p[0, 1], 1 ≤ p ≤ ∞, is a Banach space. We do not prove the completeness of the L_p spaces because the proof requires a fairly thorough familiarity with integration theory. Consider instead the space R_p consisting of all functions x on [0, 1] for which |x|^p is Riemann integrable, with norm defined as

||x|| = (∫_0^1 |x(t)|^p dt)^{1/p}.

The normed space R_p is incomplete. It may be completed by adjoining to it certain additional functions derived from Cauchy sequences in R_p. In this way, R_p is imbedded in a larger normed space which is complete. The smallest complete space containing R_p is L_p. A general method for completing a normed space is discussed in Problem 15.

Example 6.
Given two normed spaces X, Y, we consider the product space X × Y consisting of ordered pairs (x, y) as defined in Section 2.2. The space X × Y can be normed in several ways, such as ||(x, y)|| = ||x|| + ||y|| or ||(x, y)|| = max {||x||, ||y||} but, unless specifically noted otherwise, we define the product norm as ||(x, y)|| = ||x|| + ||y||. It is simple to show that if X and Y are Banach spaces, the product space X × Y with the product norm is also a Banach space.

2.12 Complete Subsets

The definition of completeness has an obvious extension to subsets of a normed space; a subset is complete if every Cauchy sequence from the subset converges to a limit in the subset. The following theorem states that completeness and closure are equivalent in a Banach space. This is not so in a general normed space since, for example, a normed space is always closed but not necessarily complete.

Theorem 1. In a Banach space a subset is complete if and only if it is closed.

Proof. A complete subset is obviously closed since every convergent (and hence Cauchy) sequence has a limit in the subset. A Cauchy sequence from a closed subset has a limit somewhere in the Banach space. By closure the limit must be in the subset. ∎

The following theorem is of great importance in many applications.

Theorem 2. In a normed linear space, any finite-dimensional subspace is complete.

Proof. The proof is by induction on the dimension of the subspace. A one-dimensional subspace is complete since, in such a subspace, all elements have the form x = αe, where α is an arbitrary scalar and e is a fixed vector. Convergence of a sequence α_n e is equivalent to convergence of the sequence of scalars {α_n} and, hence, completeness follows from the completeness of the scalars.

Assume that the theorem is true for subspaces of dimension N − 1. Let X be a normed space and M an N-dimensional subspace of X. We show that M is complete.

Let {e_1, e_2, ..., e_N} be a basis for M. For each k, define

δ_k = inf ||e_k − Σ_{j≠k} α_j e_j||,

where the infimum is taken over all choices of the scalars α_j, j ≠ k.
The number δ_k is the distance from the vector e_k to the subspace M_k generated by the remaining N − 1 basis vectors. The number δ_k is greater than zero because otherwise a sequence of vectors in the (N − 1)-dimensional subspace M_k could be constructed converging to e_k. Such a sequence cannot exist since M_k is complete by the induction hypothesis and is therefore closed.

Define δ > 0 as the minimum of the δ_k, k = 1, 2, ..., N. Suppose that {x_n} is a Cauchy sequence in M. Each x_n has a unique representation as

x_n = Σ_{k=1}^{N} λ_k^n e_k.

For arbitrary n, m,

||x_n − x_m|| = ||Σ_{k=1}^{N} (λ_k^n − λ_k^m) e_k|| ≥ δ |λ_k^n − λ_k^m|

for each k, 1 ≤ k ≤ N. Since ||x_n − x_m|| → 0, each |λ_k^n − λ_k^m| → 0. Thus, {λ_k^n} is a Cauchy sequence of scalars and hence convergent to a scalar λ_k. Let x = Σ_{k=1}^{N} λ_k e_k. Obviously, x ∈ M. We show that x_n → x. For all n, we have

||x_n − x|| = ||Σ_{k=1}^{N} (λ_k^n − λ_k) e_k|| ≤ N max_k |λ_k^n − λ_k| max_k ||e_k||,

but since |λ_k^n − λ_k| → 0 for all k, ||x_n − x|| → 0. Thus, {x_n} converges to x ∈ M. ∎

*2.13 Extreme Values of Functionals and Compactness

Optimization theory is largely concerned with the maximization or minimization of real functionals over a given subset; indeed, a major portion of this book is concerned with principles for finding the points at which a given functional attains its maximum. A more fundamental question, however, is whether a functional has a maximum on a given set. In many cases the answer to this is easily established by inspection, but in others it is by no means obvious.

In finite-dimensional spaces the well-known Weierstrass theorem, which states that a continuous function defined on a closed and bounded (compact) set has a maximum and a minimum, is of great utility. Usually, this theorem alone is sufficient to establish the existence of a solution to a given optimization problem.

In this section we generalize the Weierstrass theorem to compact sets in a normed space, thereby obtaining a simple and yet general result applicable to infinite-dimensional problems.
Unfortunately, however, the restriction to compact sets is so severe in infinite-dimensional normed spaces that the Weierstrass theorem can in fact be employed in only a minority of optimization problems. The theorem, however, deserves special attention in optimization theory if only because of the finite-dimensional version. The interested reader should also consult Section 5.10.

Actually, to prove the Weierstrass theorem it is not necessary to assume continuity of the functional but only upper semicontinuity. This added generality is often of great utility.

Definition. A (real-valued) functional f defined on a normed space X is said to be upper semicontinuous at x_0 if, given ε > 0, there is a δ > 0 such that f(x) − f(x_0) < ε for ||x − x_0|| < δ. A functional f is said to be lower semicontinuous at x_0 if −f is upper semicontinuous at x_0.

As the reader may verify, an equivalent definition is: f is upper semicontinuous at x_0 if lim sup_{x→x_0} f(x) ≤ f(x_0).³ Clearly, if f is both upper and lower semicontinuous, it is continuous.

Definition. A set K in a normed space X is said to be compact if, given an arbitrary sequence {x_i} in K, there is a subsequence {x_{i_n}} converging to an element x ∈ K.

In finite dimensions, compactness is equivalent to being closed and bounded but, as is shown below, this is not true in a general normed space. Note, however, that a compact set K must be complete, since any Cauchy sequence from K must have a limit in K.

Theorem 1. (Weierstrass) An upper semicontinuous functional on a compact subset K of a normed linear space X achieves a maximum on K.

Proof. Let M = sup_{x∈K} f(x) (we allow the possibility M = +∞). There is a sequence {x_i} from K such that f(x_i) → M. Since K is compact, there is a convergent subsequence x_{i_n} → x* ∈ K. Clearly, f(x_{i_n}) → M and, since f is upper semicontinuous, f(x*) ≥ lim_n f(x_{i_n}) = M. Thus, since f(x*) must be finite, we conclude that M < ∞ and that f(x*) = M. ∎

We offer now an example of a continuous functional on the unit sphere of C[0, 1] which does not attain a maximum (thus proving that the unit sphere is not compact).

Example 1.
Let the functional f be defined on C[0, 1] by

    f(x) = ∫₀^{1/2} x(t) dt − ∫_{1/2}^{1} x(t) dt.

It is easily verified that f is continuous since, in fact, |f(x)| ≤ ||x||. The supremum of f over the unit sphere in C[0, 1] is 1, but no continuous function of norm less than unity achieves this supremum. (If the problem were formulated in L∞[0, 1], the supremum would be achieved by a function discontinuous at t = 1/2.)

³ The lim sup of a functional on a normed space is the obvious extension of the corresponding definition for functions of a real variable.

Example 2. Suppose that in the above example we restrict our attention to those continuous functions in C[0, 1] within the unit sphere which are polynomials of degree n or less. The set of permissible elements in C[0, 1] is now a closed, bounded subset of the finite-dimensional space of n-th degree polynomials and is therefore compact. Thus Weierstrass's theorem guarantees the existence of a maximizing vector from this set.

*2.14 Quotient Spaces

Suppose we select a subspace M from a vector space X and generate linear varieties V in X by translations of M. The linear varieties obtained can be regarded as the elements of a new vector space called the quotient space of X modulo M and denoted X/M. If, for example, M is a plane through the origin in three-dimensional space, X/M consists of the family of planes parallel to M. We formalize this definition below.

Definition. Let M be a subspace of a vector space X. Two elements x1, x2 ∈ X are said to be equivalent modulo M if x1 − x2 ∈ M. In this case, we write x1 ≡ x2.

This equivalence relation partitions the space X into disjoint subsets, or classes, of equivalent elements: namely, the linear varieties that are distinct translates of the subspace M. These classes are often called the cosets of M. Given an arbitrary element x ∈ X, it belongs to a unique coset of M which we denote⁴ by [x].

Definition. Let M be a subspace of a vector space X.
The quotient space X/M consists of all cosets of M with addition and scalar multiplication defined by [x1] + [x2] = [x1 + x2] and α[x] = [αx].

Several things, which we leave to the reader, need to be verified in order to justify the above definition. It must be shown that the definitions for addition and scalar multiplication are independent of the choice of representative elements, and the axioms for a vector space must be verified. These matters are easily proved, however, and it is not difficult to see that addition of cosets [x1] and [x2] merely amounts to addition of the corresponding linear varieties regarded as sets in X. Likewise, multiplication of a coset by a scalar α (except for α = 0) amounts to multiplication of the corresponding linear variety by α.

⁴ The notation [x] is also used for the subspace generated by x, but the usage is always clear from context.

Suppose that X is a normed space and that M is a closed subspace of X. We define the norm of a coset [x] ∈ X/M by

    ||[x]|| = inf_{m ∈ M} ||x + m||;

i.e., ||[x]|| is the infimum of the norms of all elements in the coset [x]. The assumption that M is closed insures that ||[x]|| > 0 if [x] ≠ [θ]. Satisfaction of the other two axioms for a norm is easily verified. In the case of X being two dimensional, M one dimensional, and X/M consisting of parallel lines, the quotient norm of one of the lines is the minimum distance of the line from the origin.

Proposition 1. Let X be a Banach space, M a closed subspace of X, and X/M the quotient space with the quotient norm defined as above. Then X/M is also a Banach space.

The proof is left to the reader.

*2.15 Denseness and Separability

We conclude this chapter by introducing one additional topological concept, that of denseness.

Definition.
A set D is said to be dense in a normed space X if for each element x ∈ X and each ε > 0 there exists d ∈ D with ||x − d|| < ε.

If D is dense in X, there are points of D arbitrarily close to each x ∈ X. Thus, given x, a sequence can be constructed from D which converges to x. It follows that equivalent to the above definition is the statement: D is dense in X if D̄, the closure of D, is X.

The definition converse to the above is that of a nowhere dense set.

Definition. A set E is said to be nowhere dense in a normed space X if Ē contains no open set.

The classic example of a dense set is the set of rationals in the real line. Another example, provided by the well-known Weierstrass approximation theorem, is that the space of polynomials is dense in the space C[a, b].

Definition. A normed space is separable if it contains a countable dense set.

Most, but not all, of the spaces considered in this book are separable.

Example 1. The space Eⁿ is separable. The collection of vectors x = (ξ1, ξ2, …, ξn) having rational components is countable and dense in Eⁿ.

Example 2. The lₚ spaces, 1 ≤ p < ∞, are separable. To prove separability, let D be the set consisting of all finitely nonzero sequences with rational components. D is easily seen to be countable. Let x = {ξ1, ξ2, …} be an element of lₚ and let ε > 0. Since Σ_{k=1}^{∞} |ξk|^p < ∞, there is an N such that Σ_{k=N+1}^{∞} |ξk|^p < ε/2. For each k, 1 ≤ k ≤ N, let r_k be a rational such that |ξk − r_k|^p < ε/2N; let d = {r1, r2, …, rN, 0, 0, …}. Clearly, d ∈ D and ||x − d||^p < ε. Thus, D is dense in lₚ. The space l∞ is not separable.

Example 3. The space C[a, b] is separable. Indeed, the countable set consisting of all polynomials with rational coefficients is dense in C[a, b]. Given x ∈ C[a, b] and ε > 0, it follows from the well-known Weierstrass approximation theorem that a polynomial p can be found such that |x(t) − p(t)| < ε/2 for all t ∈ [a, b].
There is another polynomial r with rational coefficients such that |p(t) − r(t)| < ε/2 for all t ∈ [a, b] (r can be constructed by changing each coefficient of p by less than ε/2N, where N − 1 is the order of the polynomial p). Thus

    ||x − r|| = max_{t ∈ [a,b]} |x(t) − r(t)| ≤ max_{t ∈ [a,b]} |x(t) − p(t)| + max_{t ∈ [a,b]} |p(t) − r(t)| < ε.

Example 4. The Lₚ spaces, 1 ≤ p < ∞, are separable.

3.2 Inner Products

In a pre-Hilbert space the quantity ||x|| = √(x|x) satisfies ||x|| > 0 for x ≠ θ. It is shown in Proposition 1 that || · || satisfies the triangle inequality and, hence, defines a norm on the space.

Before proving the triangle inequality, it is first necessary to prove an important lemma which is fundamental throughout this chapter.

Lemma 1. (The Cauchy–Schwarz Inequality) For all x, y in an inner product space, |(x|y)| ≤ ||x|| ||y||. Equality holds if and only if x = λy or y = θ.

Proof. If y = θ, the inequality holds trivially. Therefore, assume y ≠ θ. For all scalars λ, we have

    0 ≤ (x − λy | x − λy) = (x|x) − λ(y|x) − λ̄(x|y) + |λ|²(y|y).

In particular, for λ = (x|y)/(y|y), we have

    0 ≤ (x|x) − |(x|y)|²/(y|y),

or

    |(x|y)| ≤ √((x|x)(y|y)) = ||x|| ||y||. ∎

Proposition 1. On a pre-Hilbert space X the function ||x|| = √(x|x) is a norm.

Proof. The only requirement for a norm which has not already been established is the triangle inequality. For any x, y ∈ X, we have

    ||x + y||² = (x + y | x + y) = (x|x) + (x|y) + (y|x) + (y|y) ≤ ||x||² + 2|(x|y)| + ||y||².

By the Cauchy–Schwarz inequality, this becomes

    ||x + y||² ≤ ||x||² + 2||x|| ||y|| + ||y||² = (||x|| + ||y||)².

The square root of the above inequality is the desired result. ∎

Several of the normed spaces that we have previously considered can be converted to pre-Hilbert spaces by introducing an appropriate inner product.

Example 1. The space Eⁿ consisting of n-tuples of real numbers is a pre-Hilbert space with the inner product of the vector x = (ξ1, ξ2, …, ξn) and the vector y = (η1, η2, …, ηn) defined as (x|y) = Σ_{i=1}^{n} ξᵢηᵢ. In this case it is clear that (x|y) = (y|x) and that (x|y) is linear in both entries. The norm defined as √(x|x) is (Σ_{i=1}^{n} ξᵢ²)^{1/2}, which is the Euclidean norm for Eⁿ.

Example 2.
The (real) space l₂ becomes a pre-Hilbert space with the inner product of the vectors x = {ξ1, ξ2, …} and y = {η1, η2, …} defined as (x|y) = Σ_{i=1}^{∞} ξᵢηᵢ. The Hölder inequality for l₂ (which becomes the Cauchy–Schwarz inequality) guarantees that |(x|y)| ≤ ||x|| ||y|| and, thus, the inner product has finite value. The norm defined by √(x|x) is the usual l₂ norm.

Example 3. The (real) space L₂[a, b] is a pre-Hilbert space with the inner product defined as (x|y) = ∫ₐᵇ x(t)y(t) dt. Again the Hölder inequality guarantees that (x|y) is finite.

Example 4. The space of polynomial functions on [a, b] with inner product (x|y) = ∫ₐᵇ x(t)y(t) dt is a pre-Hilbert space. Obviously, this space is a subspace of the space L₂[a, b].

There are various properties of inner products which are a direct consequence of the definition. Some of these are useful in later developments.

Lemma 2. In a pre-Hilbert space the statement (x|y) = 0 for all y implies that x = θ.

Proof. Putting y = x implies (x|x) = 0. ∎

Lemma 3. (The Parallelogram Law) In a pre-Hilbert space

    ||x + y||² + ||x − y||² = 2||x||² + 2||y||².

Proof. The proof is made by direct expansion of the norms in terms of the inner product. ∎

This last result is a generalization of a result for parallelograms in two-dimensional geometry: the sum of the squares of the lengths of the diagonals of a parallelogram is equal to twice the sum of the squares of two adjacent sides. See Figure 3.1.
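As a concrete check of the Cauchy–Schwarz inequality |(x|y)| ≤ ||x|| ||y|| of Lemma 1, one can evaluate both sides on truncations of two square-summable sequences from l₂. This is an illustrative numeric sketch; the particular sequences ξk = 1/k and ηk = 2⁻ᵏ are hypothetical choices, not from the text.

```python
# Numeric check of Cauchy-Schwarz for truncated l2 sequences (illustrative data).
N = 1000
x = [1.0 / k for k in range(1, N + 1)]        # xi_k = 1/k, square-summable
y = [1.0 / 2 ** k for k in range(1, N + 1)]   # eta_k = 2^(-k), square-summable

inner_xy = sum(a * b for a, b in zip(x, y))   # (x|y) = sum of xi_k * eta_k
norm_x = sum(a * a for a in x) ** 0.5         # ||x|| = sqrt((x|x))
norm_y = sum(b * b for b in y) ** 0.5

print(abs(inner_xy) <= norm_x * norm_y)       # True
```

Since neither sequence is a scalar multiple of the other, the inequality here is strict, in agreement with the equality condition of Lemma 1.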

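The parallelogram law can likewise be verified by direct computation with the Euclidean inner product of Example 1; the vectors below are arbitrary illustrative data, not from the text.

```python
# Numeric check of the parallelogram law (Lemma 3) in E^3 (illustrative data).
def inner(u, v):
    # (u|v) = sum of coordinate products, the Euclidean inner product of Example 1
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return inner(u, u) ** 0.5

x = [1.0, -2.0, 3.0]
y = [4.0, 0.5, -1.0]

# ||x + y||^2 + ||x - y||^2 versus 2||x||^2 + 2||y||^2
lhs = norm([a + b for a, b in zip(x, y)]) ** 2 + norm([a - b for a, b in zip(x, y)]) ** 2
rhs = 2 * norm(x) ** 2 + 2 * norm(y) ** 2
print(abs(lhs - rhs) < 1e-9)  # True
```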
Since a pre-Hilbert space is a special kind of normed linear space, the concepts of convergence, closure, completeness, etc., apply in these spaces.

Definition. A complete pre-Hilbert space is called a Hilbert space.

A Hilbert space, then, is a Banach space equipped with an inner product which induces the norm. The spaces Eⁿ, l₂, and L₂[a, b] are all Hilbert spaces. Inner products enjoy the following useful continuity property.

Figure 3.1 The parallelogram law

Lemma 4. (Continuity of the Inner Product) Suppose that xₙ → x and yₙ → y in a pre-Hilbert space. Then (xₙ|yₙ) → (x|y).

Proof. Since the sequence {xₙ} is convergent, it is bounded; say ||xₙ|| ≤ M. Now

    |(xₙ|yₙ) − (x|y)| = |(xₙ|yₙ) − (xₙ|y) + (xₙ|y) − (x|y)| ≤ |(xₙ|yₙ − y)| + |(xₙ − x|y)|.

Applying the Cauchy–Schwarz inequality, we obtain

    |(xₙ|yₙ) − (x|y)| ≤ ||xₙ|| ||yₙ − y|| + ||xₙ − x|| ||y||.

Since ||xₙ|| is bounded,

    |(xₙ|yₙ) − (x|y)| ≤ M ||yₙ − y|| + ||xₙ − x|| ||y|| → 0. ∎

3.3 The Projection Theorem

We get a lot of analytical mileage from the following definition.

Definition. In a pre-Hilbert space two vectors x, y are said to be orthogonal if (x|y) = 0. We symbolize this by x ⊥ y. A vector x is said to be orthogonal to a set S (written x ⊥ S) if x ⊥ s for each s ∈ S.

The concept of orthogonality has many of the consequences in pre-Hilbert spaces that it has in plane geometry. For example, the Pythagorean theorem is true in pre-Hilbert spaces.

Lemma 1. If x ⊥ y, then ||x + y||² = ||x||² + ||y||².

Proof.

    ||x + y||² = (x + y | x + y) = ||x||² + (x|y) + (y|x) + ||y||² = ||x||² + ||y||². ∎

We turn now to our first optimization problem and the projection theorem which characterizes its solution. We prove two slightly different versions of the theorem: one valid in an arbitrary
pre-Hilbert space and another, with a stronger conclusion, valid in Hilbert space.

The optimization problem considered is this: given a vector x in a pre-Hilbert space X and a subspace M in X, find the vector m ∈ M closest to x in the sense that it minimizes ||x − m||. Of course, if x itself lies in M, the solution is trivial. In general, however, three important questions must be answered for a complete solution to the problem. First, is there a vector m ∈ M which minimizes ||x − m||, or is there no m that is at least as good as all others? Second, is the solution unique? And third, what is the solution or how is it characterized? We answer these questions now.

Theorem 1. Let X be a pre-Hilbert space, M a subspace of X, and x an arbitrary vector in X. If there is a vector m0 ∈ M such that ||x − m0|| ≤ ||x − m|| for all m ∈ M, then m0 is unique. A necessary and sufficient condition that m0 ∈ M be a unique minimizing vector is that the error vector x − m0 be orthogonal to M.

… ≤ 0 for i = 1, 2, …, n, with equality if αᵢ > 0. Letting G be the Gram matrix of the yᵢ's and letting bᵢ = (x|yᵢ), we obtain the equation

(1)    Gα − b = z

for some vector z with components zᵢ ≥ 0. (Here α and b are vectors with components αᵢ and bᵢ, respectively.)
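When every αᵢ is strictly positive, equation (1) holds with z = θ and reduces to the normal equations Gα = b, whose solution gives the best approximation of x from the span of the yᵢ's. The sketch below solves this unconstrained case for hypothetical vectors in E³ (the data x, y1, y2 are illustrative assumptions, not from the text) and confirms that the resulting error is orthogonal to each generator, as the projection theorem requires.

```python
# Best approximation of x from span{y1, y2} via the normal equations G a = b
# (the z = 0 case of equation (1)); x, y1, y2 are hypothetical data in E^3.
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [1.0, 0.0, 0.0]
y1 = [1.0, 0.0, 1.0]
y2 = [0.0, 1.0, 1.0]

# Gram matrix G_ij = (y_i|y_j) and right-hand side b_i = (x|y_i)
G = [[inner(y1, y1), inner(y1, y2)],
     [inner(y2, y1), inner(y2, y2)]]
b = [inner(x, y1), inner(x, y2)]

# Solve the 2x2 system G a = b by Cramer's rule
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
a = [(b[0] * G[1][1] - b[1] * G[0][1]) / det,
     (b[1] * G[0][0] - b[0] * G[1][0]) / det]

# Best approximation m0 = a[0]*y1 + a[1]*y2 and its error x - m0
m0 = [a[0] * u + a[1] * v for u, v in zip(y1, y2)]
err = [xi - mi for xi, mi in zip(x, m0)]

# The projection theorem predicts the error is orthogonal to the subspace
print(abs(inner(err, y1)) < 1e-12 and abs(inner(err, y2)) < 1e-12)  # True
```

Note that the Gram matrix here plays exactly the role it does in equation (1): its positive definiteness (for independent yᵢ's) guarantees a unique solution α.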

