
The Nonlinear Geometry of Linear Programming
I. Affine and Projective Scaling Trajectories

by

D. A. Bayer
Columbia University
New York, New York

J. C. Lagarias
AT&T Bell Laboratories
Murray Hill, New Jersey
(June 24, 1986 revision)

ABSTRACT

This series of papers studies a geometric structure underlying Karmarkar’s projective scaling algorithm for solving linear programming problems. A basic feature of the projective scaling algorithm is a vector field, depending on the objective function, which is defined on the interior of the polytope of feasible solutions of the linear program. The geometric structure we study is the set of trajectories obtained by integrating this vector field, which we call P-trajectories. In order to study P-trajectories we also study a related vector field on the linear programming polytope, which we call the affine scaling vector field, and its associated trajectories, called A-trajectories. The affine scaling vector field is associated to another linear programming algorithm, the affine scaling algorithm. These affine and projective scaling vector fields are each defined for linear programs of a special form, called strict standard form and canonical form, respectively.

This paper defines and presents basic properties of P-trajectories and A-trajectories. It reviews the projective and affine scaling algorithms, defines the projective and affine scaling vector fields, and gives differential equations for P-trajectories and A-trajectories. It presents Karmarkar’s interpretation of A-trajectories as steepest descent paths of the objective function ⟨c , x⟩ with respect to the Riemannian geometry ds² = Σ_{i=1}^n dx_i dx_i / x_i² defined in the interior of the positive orthant. It establishes a basic relation connecting P-trajectories and A-trajectories, which is that P-trajectories of a Karmarkar canonical form linear program are radial projections of A-trajectories of an associated standard form linear program. As a consequence there is a polynomial time linear programming algorithm using the affine scaling vector field of this associated linear program: this algorithm is essentially Karmarkar’s algorithm.

These trajectories will be studied in subsequent papers by a nonlinear change of variables which we call Legendre transform coordinates. It will be shown that both P-trajectories and A-trajectories have two distinct geometric interpretations: parametrized one way they are algebraic curves, while parametrized another way they are geodesics (actually distinguished chords) of a geometry isometric to a Hilbert geometry on a suitable polytope or cone. A summary of the main results of this series of papers is included.

The Nonlinear Geometry of Linear Programming
I. Affine and Projective Scaling Trajectories

by

D. A. Bayer
Columbia University
New York, New York

J. C. Lagarias
AT&T Bell Laboratories
Murray Hill, New Jersey
(June 24, 1986 revision)

(Preliminary draft: May 1, 1986)

1. Introduction

In 1984 Narendra Karmarkar [K] introduced a new linear programming algorithm which is proved to run in polynomial time in the worst case. Computational experiments with this algorithm are very encouraging, suggesting that it will surpass the performance of the simplex algorithm on large linear programs which are sparse in a suitable sense. The basic algorithm has been extended to fractional linear programming [A] and convex quadratic programming [KV].

Karmarkar’s algorithm, which we call the projective scaling algorithm (this choice of name is explained in Section 4), is a piecewise linear algorithm defined in the relative interior of the polytope P of feasible solutions of a linear programming problem. The algorithm takes a series of (linear) steps, and the step direction is specified by a vector field v(x) defined at all points x in the relative interior of P. This vector field depends on the linear program constraints and on the objective function. The projective scaling algorithm uses projective transformations to compute this vector field (see Section 4).

Our viewpoint is that the fundamental mathematical object underlying the projective scaling algorithm is the set of trajectories obtained by following this vector field exactly. That is, given a projective scaling vector field v(x) and an initial point x_0 one obtains (parametrized) curves by integrating the vector field for all initial conditions:


dx/dt = v(x) (1.1a)
x(0) = x_0 . (1.1b)

A P-trajectory (or projective scaling trajectory) is an (unparametrized) point set specified by such a curve extended to the full range of t where a solution to the differential equation (1.1) exists.

In this series of papers our first object is to study the P-trajectories, to give several algebraic and geometric characterizations of them, and to prove facts about their behavior. We will show the P-trajectories are interesting in their own right. They have an extremely rich mathematical structure, involving connections to algebraic geometry, differential geometry, partial differential equations, classical mechanics and convexity theory. This structure can be exploited in several ways to give new linear programming algorithms, which we will discuss elsewhere.

Our results concerning P-trajectories are derived using their connection to another set of trajectories, which we call A-trajectories (or affine scaling trajectories), which are easier to study. Our second object is therefore to give several geometric characterizations of A-trajectories. The A-trajectories arise from integrating a vector field associated to another interior-point linear programming algorithm, which we call the affine scaling algorithm (the rationale for this name is given in Section 4). The affine scaling vector field has been discovered and studied by [B], [VMF], and many others. There is a simple relation, given in Section 6, between the P-trajectories of a linear programming problem and the A-trajectories of an associated linear program.

We mention some background and related work. The idea of following trajectories to solve nonlinear equations has a long history, and is a basic methodology in non-linear programming [FM], [GZ]. From this perspective Karmarkar’s projective scaling algorithm can be viewed as a homotopy restart method using the system of P-trajectories, as was observed by Nazareth [N] (see also [GZ], Sect. 15.4). One method of constructing trajectories is by means of a parameterized family of barrier functions, see [FM] Chapter 5. In this connection it is possible to relate A-trajectories and P-trajectories to trajectories defined using a parameterized family of logarithmic barrier functions. (See equation (2.12) following.) N. Megiddo [M2] studies trajectories obtained from other parameterized families of nonlinear optimization problems. The geometric behavior of A-trajectories is being studied by M. Shub [S]. Finally J. Renegar [R] has made use of P-trajectories together with new ideas to construct a new interior-point linear programming algorithm which uses Newton’s method to follow the central P-trajectory. Renegar’s algorithm runs in polynomial time and requires only O(√n L) iterations in the worst case. This improves on Karmarkar’s [K] worst-case bound of O(nL) iterations. Surveys of Karmarkar’s algorithm and recent developments appear in [H], [M1].

In Section 2 we first summarize the main results of this series of papers, and then summarize the contents of this paper in detail. Section 3 gives a brief description of the affine and projective scaling linear programming algorithms, which is independent of the rest of the paper.

We are indebted to many people for aid during this research. We wish to thank particularly Jim Reeds for conversations on convex analysis and references to Rockafellar’s work, and Peter Doyle for conversations on Riemannian geometry. We are indebted to Narendra Karmarkar for permission to include his steepest descent interpretation of A-trajectories in Section 5 of this paper.

2. Summary of results

In this section we give an overview of the main results of this series of papers, and then summarize the contents of this paper in more detail.

A. Main results — overview

We give two distinct geometric interpretations of the P-trajectories, corresponding to two different parameterizations of these trajectories. First, in terms of the coordinate system of the linear program, each P-trajectory is a piece of a (real) algebraic curve. The P-trajectory can then be naturally extended to the full (complex) algebraic curve of which it is a part. Viewed algebraically it is then a branched covering of the projective line P^1(C), while viewed analytically it is a Riemann surface. The objective function value gives a natural parametrization of the P-trajectory. Second, there is a metric d_H(·,·) defined on the interior of the polytope P such that each P-trajectory is an extremal (‘‘geodesic’’) with respect to this metric. The resulting geometry is isometric to Hilbert’s projective geometry defined on the interior of a polytope P* which is combinatorially dual to P (Hilbert’s geometry is defined in [H], Appendix 2). This geometry is a chord geometry in the sense of Busemann [Bu2] and the P-trajectories are the distinguished chords in the sense of Busemann-Phadke [BP]. The P-trajectory inherits an obvious parameterization from the metric d_H(·,·).

Our results about P-trajectories will be proved using their close connection to A-trajectories. Karmarkar’s P-trajectories are defined for linear programs of the following special form which we call canonical form:

minimize ⟨c , x⟩ (2.1a)
subject to
Ax = 0 , (2.1b)
e^T x = n , (2.1c)
x ≥ 0 , (2.1d)
with side conditions
Ae = 0 . (2.1e)

Here ⟨c , x⟩ = c^T x denotes the usual Euclidean inner product, and e = (1 , 1 , ... , 1)^T. There is a simple relation between P-trajectories of a canonical form linear program (2.1) and A-trajectories of the associated linear programming problem:

minimize ⟨c , x⟩ (2.2a)
subject to
Ax = 0 , (2.2b)
x ≥ 0 , (2.2c)
with side conditions
Ae = 0 . (2.2d)

The relation is that the radial projection of an A-trajectory onto the hyperplane e^T x = n is a P-trajectory (Theorem 6.1 of this paper). There is a second relation between P-trajectories and A-trajectories of the linear program (2.1) which we give later in this summary.

The A-trajectories also have several geometric interpretations. First, N. Karmarkar has observed that A-trajectories of a standard form linear program:

minimize ⟨c , x⟩
subject to
Ax = b ,
x ≥ 0 ,

having a feasible solution x with all x_i > 0 may be interpreted as steepest descent curves of ⟨c , x⟩ with respect to the Riemannian metric ds² = Σ_{i=1}^n dx_i dx_i / x_i² defined on the interior of the positive orthant Int(R_+^n) = {x : all x_i > 0}. We include a proof of this fact with his permission. Second, there is a metric d_E(·,·) defined on the relative interior Rel-Int(P) of the polytope P of feasible solutions such that the affine scaling curves are geodesics with respect to this metric. This metric is isometric to Euclidean geometry restricted to a cone. If P is a bounded polytope, then it is isometric to Euclidean geometry on R^k where k = dim(P). The A-trajectories are algebraic curves with respect to the metric parameter, and this metric parameter is algebraically related to the linear program coordinates, so that the A-trajectories are pieces of (real) algebraic curves in the linear program coordinates. Hence A-trajectories also extend to branched coverings of P^1(C) which are Riemann surfaces. Third, for a linear program in the homogeneous form (2.2) the A-trajectories also have a Hilbert geometry interpretation. The polytope P of feasible solutions to a homogeneous linear program (2.2) is a cone, and there is a pseudo-metric d̄_H(·,·) on Int(P) such that the geometry induced by this pseudo-metric is isometric to Hilbert’s projective geometry on the dual cone, and the A-trajectories are a set of distinguished chords. (A pseudo-metric satisfies the triangle inequality but may have d̄_H(x_1 , x_2) = 0 with x_1 ≠ x_2.)

Our results on P-trajectories and A-trajectories are obtained using a nonlinear change of coordinates. We call the new coordinate system we construct Legendre transform coordinates. This name is chosen because the coordinates are constructed using a Legendre transform mapping attached to a logarithmic barrier function, cf. Rockafellar [R2], Section 25. We now describe these Legendre transform coordinates in a special case. Consider a linear program in the following special form, which we call standard form:

minimize ⟨c , x⟩ (2.3a)
subject to
Ax = b (2.3b)
x ≥ 0 (2.3c)
with side conditions
AA^T is an invertible matrix . (2.3d)

We say such a linear program has strict standard form constraints if it has a feasible solution x = (x_1 , ... , x_n) with all x_i positive. The Legendre transform coordinates are determined by the constraints of the linear program and do not depend on the objective function. Let H denote the set of constraints of a strict standard form linear program, and let P_H be its associated polytope of feasible solutions. The relative interior Rel-Int(P_H) of the polytope of feasible solutions of this linear program then is nonempty and lies in the interior Int(R_+^n) of the positive orthant. We consider the logarithmic barrier function f_H : Int(R_+^n) → R defined by

f_H(x) = − Σ_{i=1}^n log x_i . (2.4)

This function has the gradient

∇f_H(x) = ( −1/x_1 , −1/x_2 , ... , −1/x_n )^T . (2.5)

The associated Legendre transform coordinate mapping φ_H maps Rel-Int(P_H) into the subspace

A^⊥ = {x : Ax = 0}

of R^n and is defined by

φ_H(x) = π_{A^⊥}(∇f_H(x)) , (2.6)

where π_{A^⊥} is the orthogonal projection operator onto the subspace A^⊥. This projection operator is given explicitly by the formula

π_{A^⊥} = I − A^T(AA^T)^{−1}A ,

whenever AA^T is invertible.
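For readers who want to experiment, the mapping (2.6) is a few lines of numpy. The sketch below is ours, not the paper’s: the helper names and the example constraint data are invented, and the matrix passed in is the full equality constraint matrix of H (for canonical form constraints this means A with the row e^T appended).

```python
import numpy as np

def proj_null(M):
    # Orthogonal projector onto {x : Mx = 0}; valid whenever MM^T is invertible.
    return np.eye(M.shape[1]) - M.T @ np.linalg.solve(M @ M.T, M)

def legendre_coords(M, x):
    # Legendre transform coordinates (2.6): project grad f_H(x) = (-1/x_1, ..., -1/x_n)
    # onto the null space of the equality constraint matrix M.
    return proj_null(M) @ (-1.0 / x)

# Hypothetical example: canonical form constraints Ax = 0, e^T x = n with Ae = 0,
# encoded by stacking the row e^T onto A.
A = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
M = np.vstack([A, np.ones(4)])
print(legendre_coords(M, np.ones(4)))  # ~0 at x = e: e is the center, cf. (2.7)
```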

We show, as a special case of theorems proved in part II, that for a strict standard form linear program whose polytope P_H of feasible solutions is bounded the Legendre transform mapping

φ_H : Rel-Int(P_H) → A^⊥

is a real-analytic diffeomorphism onto all of A^⊥. In particular it is one-to-one and onto, so there is a unique point x_H in Rel-Int(P_H) such that

φ_H(x_H) = 0 , (2.7)

and we call this point the center. We show in part II that this definition of center coincides with Karmarkar’s definition of center when the constraints H are in Karmarkar’s canonical form. The center has a geometric interpretation as that point in Rel-Int(P_H) that maximizes the function w_H(x) giving the product of the distances of x to each of the hyperplanes bounding the inequality constraints. For the constraints H we have

w_H(x) = Π_{i=1}^n x_i .

In part II we show that it is possible to define Legendre transform coordinates φ_H for any set H of linear program constraints, including both equality and inequality constraints. In the general case several extra complications appear. These include the facts that the range space of the Legendre transform coordinate mapping in the most general case is the interior of a polyhedral cone, that the mapping may be many-to-one, and that a center may not exist. We show that in all cases these coordinates transform contravariantly under an invertible affine transformation A : R^n → R^n. To describe this, let A(x) = Lx + c where L is an invertible linear map, and let L* = (L^T)^{−1} denote its dual map. Then the linear program constraints H are carried to the linear program constraints A(H). The contravariance property is expressed in the following commutative diagram:

Rel-Int(P_H)   --A-->   Rel-Int(P_A(H))
     |                        |
    φ_H                    φ_A(H)                (2.8)
     v                        v
    A^⊥      <--L̄*--       (L(A))^⊥

where L̄* = π_{A^⊥} ∘ L* is a vector space isomorphism.

The Legendre transform mapping is given by rational functions of the linear program coordinates x_i. This mapping is one-to-one for a strict standard form problem, hence it then has an inverse mapping φ_H^{−1} which is necessarily given by algebraic functions of the Legendre transform coordinate space. The logarithmic barrier function f_H(x) can be shown to be strictly convex on Rel-Int(P_H) in this case, and by the general theory of convex analysis we can construct its (Fenchel) conjugate function g_H : A^⊥ → R, which is defined for y ∈ A^⊥ by

g_H(y) = sup_{x ∈ Rel-Int(P_H)} ( ⟨x , y⟩ − f_H(x) ) , (2.9)

(see [F], [R2] 12.2.2). Then the Legendre transform duality theorem ([R2], Theorem 26.4) implies that φ_H^{−1} is given by

φ_H^{−1}(y) = ∇g_H(y) . (2.10)

At present we cannot directly use this explicit formula, because we do not know how to compute the Fenchel conjugate function except in special cases.

The Legendre transform mapping originally arose as a tool in studying ordinary and partial differential equations, cf. [CH], Vol. II, pp. 32-39. In particular it is used to convert the Lagrangian formulation of a classical mechanical system to the Hamiltonian formulation (see [A], pp. 59-65, [Ln]). This connection is not accidental: the second author will show elsewhere that there is an interpretation of A-trajectories arising from a new family of completely integrable Hamiltonian dynamical systems [L].

The utility of Legendre transform coordinates is established in part III of this series of papers, where it is shown that this mapping takes the set of A-trajectories of a strict standard form linear program with bounded feasible polyhedron to the complete set of parallel straight lines with slope

c′ = π_{A^⊥}(c) . (2.11)

The A-trajectories of a strict standard form linear program having an unbounded polyhedron of feasible solutions are mapped to a family of parallel half-lines or line segments having the same slope c′. Consequently each A-trajectory is an inverse image of part of a straight line under the Legendre transform mapping φ_H. Since this mapping is a rational map, each A-trajectory of a strict standard form linear program must be part of a real algebraic curve. Then since each P-trajectory of a canonical form linear program (2.1) is rationally related to an A-trajectory of the strict standard form linear program (2.2), it must also be part of a real algebraic curve.

We distinguish the particular P-trajectory of a canonical form linear program (2.1) which passes through the center x_H of its constraint set (2.1b)-(2.1d) and call it the central P-trajectory with objective function ⟨c , x⟩. We define the central A-trajectory of a strict standard form linear program having a center with objective function ⟨c , x⟩ analogously. We prove in part III for a canonical form linear program (2.1) that the central P-trajectory and central A-trajectory with objective function ⟨c , x⟩ coincide. This is a second relation between P-trajectories and A-trajectories. In particular it implies that the central P-trajectory is a straight line in Legendre transform coordinates.

The central P-trajectory (which is the central A-trajectory) plays a fundamental role in Karmarkar’s algorithm. In part III we give a number of other geometric characterizations of this trajectory, the most interesting of which is that it is the locus of centers of the linear programs obtained from the given standard form linear program by adding the extra equality constraint

⟨c , x⟩ = λ ,

where λ ranges over the possible values of the objective function on Rel-Int(P_H). Another related interpretation of the central P-trajectory of a standard form problem (2.3) is that it is described by the solution x(µ) = x(µ ; φ) of a family of non-linear fixed-point problems parametrized by µ. This family is given by:

minimize φ(⟨c , x⟩) − µ Σ_{i=1}^n log x_i (2.12a)
subject to
Ax = b , (2.12b)
x > 0 , (2.12c)

where φ(t) : R → R is any one-to-one, onto, monotonic increasing function and −∞ < µ < ∞. This representation describes the central P-trajectory as the set of solutions to a parametrized family of logarithmic barrier functions.
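As a concrete illustration of (2.12) with φ(t) = t, the sketch below traces points of the central trajectory of a tiny problem by approximately minimizing ⟨c , x⟩ − µ Σ log x_i over the feasible flat for decreasing µ. The projected-gradient iteration and all data are our own simplifications, not a method proposed in this paper.

```python
import numpy as np

def proj_null(M):
    return np.eye(M.shape[1]) - M.T @ np.linalg.solve(M @ M.T, M)

def barrier_point(A, x0, c, mu, iters=5000, lr=1e-2):
    # Approximately minimize <c,x> - mu*sum(log x_i) over {Ax = Ax0, x > 0}
    # by projected gradient descent with a positivity-preserving backtrack.
    P, x = proj_null(A), x0.astype(float).copy()
    for _ in range(iters):
        g = P @ (c - mu / x)        # objective gradient projected onto the flat
        step = lr
        while np.any(x - step * g <= 0):
            step /= 2
        x = x - step * g
    return x

A = np.ones((1, 3))                 # simplex constraint e^T x = 3 (data ours)
c = np.array([1.0, 2.0, 3.0])
for mu in [10.0, 1.0, 0.1]:         # as mu decreases, x(mu) moves toward the optimum
    print(mu, barrier_point(A, np.ones(3), c, mu).round(4))
```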

We also analyze the behavior of non-central P-trajectories. In part III we prove that every P-trajectory lies in a plane in Legendre transform coordinates. For a non-central P-trajectory this plane is determined by the line given by the central P-trajectory (for the same objective function) together with any point on the given non-central P-trajectory. No non-central P-trajectory is a straight line in Legendre transform coordinates. If the objective function ⟨c , x⟩ is normalized in Karmarkar’s sense, i.e. it takes the value 0 at the optimal solution of a canonical form linear program, then the non-central P-trajectories in Legendre transform coordinates for ⟨c , x⟩ asymptotically approach the central P-trajectory as x approaches the optimal point.

Any non-central P-trajectory of a canonical form linear program can be mapped to a central P-trajectory (in a different Legendre transform coordinate system) through a suitable projective transformation which transforms the linear program constraints H to a new set of linear program constraints H′ which are also in canonical form. This follows immediately from Karmarkar’s observation that a projective transformation exists taking an arbitrary point in Rel-Int(P_H) to the center of the transformed polytope.

In part I of this paper we define P-trajectories for canonical form linear programs and A-trajectories for strict standard form linear programs. Indeed the original definitions in terms of the projective scaling vector fields and affine scaling vector fields only make sense in this context; see Section 4 of this paper. In part II we define Legendre transform coordinates for any set of linear programming constraints. In part III we then take the characterization of A-trajectories as straight lines in Legendre transform coordinates with slope c′ given in (2.11) as a definition of A-trajectories valid for all linear programs. In part III we also use the relation between P-trajectories of the canonical form problem (2.1) and A-trajectories of the homogeneous standard form problem (2.2) to give a definition of P-trajectories valid for all linear programs. With these definitions, we prove that A-trajectories are preserved by invertible affine transformations of variables, and that P-trajectories are preserved by a (slightly restricted) set of projective transformations, which includes all invertible affine transformations.

In part III we also use Legendre transform coordinates to compute the power series expansion of the central P-trajectory. These power series coefficients assume a very simple form which is easy to compute. This leads to the possibility of practical linear programming algorithms based on power series approaches. This will be discussed elsewhere [BKL].

Now that we have established that P-trajectories and A-trajectories are parts of real algebraic curves, we can define them outside the polytopes Rel-Int(P_H) on which they were originally defined. These algebraic curves extend into other cells determined by the arrangement of hyperplanes obtained from the inequality constraints of the linear program by regarding them as equality constraints. It turns out that each extended A-trajectory (resp. P-trajectory) visits a cell of the arrangement at most once, and that in each cell it visits an extended A-trajectory (resp. P-trajectory) is an A-trajectory (resp. P-trajectory) for a linear program having that cell as its set of feasible solutions. These linear programs are obtained from the original linear program by reversing a suitable subset of the inequality constraints.

In part IV we use Legendre transform coordinates to show that P-trajectories are ‘‘geodesics’’ of a metric geometry isometric to Hilbert’s geometry on the interior of the dual polytope, as well as to prove an analogous result for A-trajectories of homogeneous standard form linear programs.

B. Results of this paper

In this paper we define and present basic properties of P-trajectories and A-trajectories. In Section 3 we briefly review the projective and affine scaling algorithms, in order to provide background and perspective on later developments. In Section 4 we derive the affine and projective scaling vector fields, and then obtain differential equations for A-trajectories and P-trajectories. The affine scaling vector field is calculated using an affine rescaling of coordinates, and the projective scaling vector field is calculated using a projective rescaling of coordinates. (This motivates our choice of names for these algorithms.) In order to apply these rescaling transformations the linear programs must be of special forms: strict standard form for the affine scaling algorithm, and the canonical form (2.1) for the projective scaling algorithm. Consequently A-trajectories are defined in part I only for standard form problems and P-trajectories only for canonical form problems. (In part III of this series of papers we will extend the definition of A-trajectory and P-trajectory to other linear programs.) A connection with fractional linear programming is also made in Section 4.

In Section 5 we give Karmarkar’s geometric interpretation of A-trajectories for standard form linear programs as steepest descent curves with respect to the Riemannian metric ds² = Σ_{i=1}^n dx_i dx_i / x_i². This Riemannian metric has a rather special property: it is invariant under projective transformations taking the positive orthant Int(R_+^n) into itself. The results of this section are not used elsewhere in these papers.

In Section 6 we derive a fundamental relation between P-trajectories and A-trajectories, which is that the P-trajectories of the canonical form linear program (2.1) are radial projections of the A-trajectories of the associated homogeneous strict standard form linear program obtained by dropping the inhomogeneous constraint ⟨e , x⟩ = n from (2.1). In particular these P-trajectories and A-trajectories are algebraically related.

In the final Section 7 a simple consequence of this relation is drawn. It is that a polynomial time linear programming algorithm for a canonical form linear program results from following the affine scaling vector field of the associated homogeneous standard form problem, which is:

minimize ⟨c , x⟩ (2.13a)
subject to
Ax = 0 (2.13b)
x ≥ 0 (2.13c)
with side conditions
Ae = 0 (2.13d)
AA^T is invertible . (2.13e)

The piecewise linear steps of the resulting ‘‘affine scaling’’ algorithm then radially project onto the piecewise linear steps of Karmarkar’s projective scaling algorithm, so this ‘‘affine scaling’’ algorithm is essentially Karmarkar’s projective scaling algorithm. We mention it because it is an example of a provably polynomial time linear programming algorithm based on the affine scaling vector field. A final observation is that this ‘‘affine scaling’’ algorithm is not solving the linear program (2.13), but rather is solving the fractional linear program with objective function ⟨c , x⟩/⟨e , x⟩ subject to the homogeneous standard form constraints (2.13b)-(2.13e). The results of Section 7 are perhaps best viewed as an interpretation of Karmarkar’s projective scaling algorithm as an ‘‘affine scaling’’ algorithm for a particular fractional linear programming problem. In this connection see [A].

3. Affine and projective scaling algorithms

In this section we briefly summarize Karmarkar’s projective scaling algorithm [K] and the affine scaling algorithm, described in [B] and [VMF]. We start with Karmarkar’s algorithm. Karmarkar’s projective scaling algorithm is a piecewise linear algorithm which proceeds in steps through the relative interior of the polytope of feasible solutions to the linear programming problem. It has the following main features: an initial starting point, a choice of step direction, a choice of step size at each step, and a stopping rule.

The initial starting point is supplied by the fact that the algorithm is defined only for linear programming problems whose constraints are of a special form, which we call (Karmarkar) canonical form, which comes with a particular initial feasible starting point which Karmarkar calls the center. Karmarkar’s algorithm also requires that the objective function z = ⟨c , x⟩ satisfy the special restriction that its value at the optimum point of the linear program is zero. We call such an objective function a normalized objective function. In order to obtain a general linear programming algorithm, Karmarkar [K, Section 5] shows how any linear programming problem may be converted to an associated linear programming problem in canonical form which has a normalized objective function. This conversion is done by combining the primal and dual problems, then adding slack variables and an artificial variable, and as a last step using a projective transformation. An optimal solution of the original linear programming problem can be easily recovered from an optimal solution of the associated linear program constructed in this way. The step direction is supplied by a vector field defined on the relative interior Rel-Int(P) of the polytope of feasible solutions of a canonical form linear program. Karmarkar’s vector field depends on both the constraints and the objective function. It can be defined for any objective function on a canonical form problem, whether or not this objective function is normalized. However Karmarkar only proves good convergence properties for the piecewise linear algorithm he obtains using a normalized objective function. Karmarkar’s vector field is defined implicitly in his paper [K], in which projective transformations serve as a means for its calculation. This is described in Section 4.

The step size in Karmarkar’s algorithm is computed using an auxiliary function g : Rel-Int(P) → R which he calls a potential function. In fact g : Int(R_+^n) → R is defined by

g(x) = n log(c^T x) − Σ_{i=1}^n log x_i .

It depends on the objective function c^T x and approaches −∞ at the optimal point on the boundary ∂P of the polytope P of feasible solutions, and approaches +∞ at all other boundary points. It is related to the objective function by the inequality

g(x) ≥ n log(c^T x) . (3.1)

If x_j is the starting point of the j-th step and R_+ v the step direction, then the step size is taken to arrive at that point x_{j+1} on the ray x_j + R_+ v which minimizes g(x) on this ray. If x_{j+1} is not an optimal point, then x_{j+1} remains in Rel-Int(P). Karmarkar proves that

g(x_{j+1}) ≤ g(x_j) − 1/5 (3.2)

provided that c^T x is a normalized objective function. Finally, the stopping rule is related to the input data and to the bound (3.2) on the potential function. If (3.2) fails to hold at any step, the original L.P. was infeasible or unbounded. If we start at the center x_0 = e then

g(x_0) = n log(c^T x_0) .

With (3.1) and (3.2) this implies for a normalized objective function that

c^T x_j / c^T x_0 ≤ e^{−j/5} . (3.3)

It is known that there is a bound L easily computable from the input data of a canonical form linear program with normalized objective function such that

c^T w ≥ 2^{−L}

for any non-optimal vertex w of the polytope. When e^{−j/5} ≤ 2^{−L} the algorithm is stopped, and one locates a vertex w of P with

c^T w ≤ c^T x_j , (3.4)

which is then guaranteed to be optimal. In practice one does not wait until the bound e^{−j/5} ≤ 2^{−L} is reached; instead every few iterates one derives a vertex w satisfying (3.4) and checks whether or not it is optimal.
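The potential function and the step-size rule just described are easy to state in code. The following sketch is ours: the step direction v is taken as given, and a crude grid line search stands in for an exact one-dimensional minimization of g along the ray.

```python
import numpy as np

def potential(c, x):
    # Karmarkar's potential g(x) = n log(c^T x) - sum_i log x_i.
    return len(x) * np.log(c @ x) - np.sum(np.log(x))

def potential_step(c, x, v, samples=2000):
    # Choose the point on the ray x + t*v, t > 0, minimizing g while staying
    # strictly inside the polytope (grid search is our simplification).
    neg = v < 0
    t_max = np.min(x[neg] / -v[neg]) if np.any(neg) else 1.0  # distance to boundary
    ts = np.linspace(1e-6, 0.999 * t_max, samples)
    vals = [potential(c, x + t * v) for t in ts]
    return x + ts[int(np.argmin(vals))] * v
```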

The affine scaling algorithm is similar to the projective scaling algorithm. It differs in the following respects. The input linear program is required to have constraints of a special form which we call strict standard form constraints. This form is less restricted than (Karmarkar) canonical form. It is described in detail in Section 4. The step direction is calculated using a different scaling transformation based on an affine change of variable; this justifies calling this algorithm the affine scaling algorithm. There are a number of different proposals for calculating the step size, one of which is to go a fixed fraction (say 95%) of the way to the boundary along the ray specified by the step direction. The stopping rule is the same as in Karmarkar’s algorithm. The affine scaling algorithm using the fixed fraction step size has been proved (in both [B] and [VMF]) to converge to an optimum solution under suitable nondegeneracy conditions. The affine scaling algorithm has not been proved to run in polynomial time in the worst case, and this may well not be true.

In Section 7 we show that a particular special case of the affine scaling algorithm does give a provably polynomial time algorithm for linear programming. This occurs, however, because the resulting algorithm is essentially identical to Karmarkar’s projective scaling algorithm.

4. Affine and projective scaling vector fields and differential equations

In this section we review the derivation of the affine and projective scaling vector fields as obtained by rescaling the coordinates of the positive orthant R_+^n.

A. Affine scaling vector field

We define the affine scaling vector field for linear programs of a special form which we call strict standard form. A standard form linear program is:

minimize ⟨c , x⟩ (4.1)
subject to
Ax = b (4.2a)
x ≥ 0 (4.2b)
with side condition
AA^T is invertible . (4.2c)

The invertibility condition (4.2c) guarantees that the projection operator π_{A^⊥}, which projects R^n onto the subspace A^⊥ = {x : Ax = 0}, is given by

π_{A^⊥} = I − A^T(AA^T)^{−1}A . (4.3)

We define standard form constraints to be the constraint conditions (4.2). We say that a set of linear programming constraints is in strict standard form if it is a set of standard form constraints and it has a feasible solution x = (x_1 , ... , x_n) such that all x_i > 0. The notion of strict standard form constraints H is a mathematical convenience introduced to make it easy to describe Rel-Int(P_H), which is then P_H ∩ Int(R_+^n), and to be able to give explicit formulae for the effect of affine scaling transformations (and for Legendre transform coordinates (2.6)). Note that any standard form linear program can be converted to one that is in strict standard form by dropping all variables x_i that are identically zero on P_H. A homogeneous strict standard form problem is a linear program having strict standard form constraints in which b = 0, and its constraints are homogeneous strict standard form constraints.

In defining the affine scaling vector field we first consider a strict standard form linear program having the point e = (1 , 1 , ... , 1)^T as a feasible point. We define the affine scaling direction v_A(e ; c) at the point e to be the steepest descent direction for ⟨c , x⟩ at x_0 = e, subject to the constraint Ax = b, so that

v_A(e ; c) = −π_{A^⊥}(c) . (4.4)

This may be obtained by Lagrange multipliers as a solution to the constrained minimization problem:

minimize ⟨c , x⟩ − ⟨c , e⟩ (4.5a)
subject to
⟨x − e , x − e⟩ = ε , (4.5b)
Ax = b , (4.5c)

for any ε > 0.

Now we define the affine scaling vector field v_A(d ; c) for an arbitrary strict standard form linear program at an arbitrary feasible point d = (d_1 , ... , d_n) in

Int(R_+^n) = {x : all x_i > 0} .

Let D = diag(d_1 , ... , d_n) be the diagonal matrix corresponding to d, so that d = De. We introduce new coordinates by the affine (scaling) transformation

y = Φ_D(x) = D^{−1}x

with inverse transformation

Φ_D^{−1}(y) = Dy = x .

Under this change of variables the standard form program (4.1)-(4.2) becomes the following standard form program:

minimize ⟨Dc , y⟩ (4.6)
subject to
ADy = b (4.7a)
y ≥ 0 (4.7b)
with side condition
AD²A^T is invertible . (4.7c)

Furthermore Φ_D(d) = e. By definition the affine scaling direction for this problem is −π_{(AD)^⊥}(Dc), and we define the affine scaling vector v_A(d ; c) as the pullback by Φ_D^{−1} of this vector, which yields

v_A(d ; c) = −Dπ_{(AD)^⊥}(Dc)
           = −D(I − DA^T(AD²A^T)^{−1}AD)Dc . (4.8)

We check that the affine scaling vector depends only on the component π_{A^⊥}(c) of c in the A^⊥ direction, and summarize the discussion so far as a lemma.

Lemma 4.1. The affine scaling vector field for a standard form problem (4.1)-(4.2) having a feasible solution x = (x_1 , ... , x_n) with all x_i > 0 is

v_A(d ; c) = −Dπ_{(AD)^⊥}(Dc) . (4.9)

In addition

v_A(d ; c) = v_A(d ; π_{A^⊥}(c)) . (4.10)

Proof. The formula (4.9) is just (4.8). Write

π_A(c) = A^T(AA^T)^{−1}Ac = A^T λ ;

then direct substitution in (4.9) yields

v_A(d ; π_A(c)) = −D²A^T λ + D²A^T(AD²A^T)^{−1}AD²A^T λ = 0 .

Since c = π_{A^⊥}(c) + π_A(c) and v_A(d ; c) is linear in c, this proves (4.10).
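Lemma 4.1 translates directly into numpy. The sketch below (function names and random data ours) evaluates (4.9) and checks (4.10), together with the fact that the direction is tangent to the feasible flat.

```python
import numpy as np

def proj_null(M):
    # Orthogonal projector onto {x : Mx = 0}, assuming MM^T is invertible.
    return np.eye(M.shape[1]) - M.T @ np.linalg.solve(M @ M.T, M)

def v_affine(A, d, c):
    # Affine scaling vector (4.9): v_A(d; c) = -D pi_{(AD)-perp}(Dc).
    D = np.diag(d)
    return -D @ proj_null(A @ D) @ (D @ c)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))
d = rng.random(5) + 0.5                    # interior point: all coordinates positive
c = rng.standard_normal(5)
v = v_affine(A, d, c)
print(np.allclose(v, v_affine(A, d, proj_null(A) @ c)))  # property (4.10)
print(np.allclose(A @ v, 0))               # direction satisfies A v = 0
```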

B. Projective scaling vector field

We define the projective scaling vector field for linear programs in the following form, which we call canonical form:

minimize ⟨c , x⟩ (4.11)
subject to
Ax = 0 , (4.12a)
e^T x = n , (4.12b)
x ≥ 0 , (4.12c)
with side conditions
Ae = 0 , (4.12d)
AA^T is invertible . (4.12e)

Note that a canonical form problem is always in strict standard form. We define canonical form constraints to be constraints satisfying (4.12).

The projective scaling vector field is more naturally associated with a canonical form fractional linear program, which is:

minimize ⟨c , x⟩/⟨b , x⟩ (4.13)
subject to
Ax = 0 , (4.14a)
⟨e , x⟩ = n , (4.14b)
x ≥ 0 , (4.14c)
with side conditions
Ae = 0 , (4.14d)
AA^T is invertible , (4.14e)

where the denominator vector b ≥ 0 is nonnegative, and is scaled so that ⟨b , e⟩ = 1. The condition (4.14d) guarantees that e is a feasible solution to this fractional linear program.

We define the fractional projective scaling vector v_FP(e ; c) of a canonical form fractional linear program at e to be the steepest descent direction of the numerator ⟨c , x⟩ of the fractional linear objective function, subject to the constraints Ax = 0 and e^T x = n, which is

v_FP(e ; c) = −π_{(A; e^T)^⊥}(c) , (4.15)

where π_{(A; e^T)^⊥} denotes the orthogonal projection onto the subspace {x : Ax = 0 and ⟨e , x⟩ = 0}. The fact that this definition does not take into account the denominator ⟨b , x⟩ of the FLP objective function may seem rather surprising. After defining the projective scaling vector field we will show however that it gives a reasonable search direction for minimizing a normalized objective function.

We obtain the projective scaling direction for a canonical form linear program (4.11)-(4.12) by identifying it with the fractional linear program having objective function ⟨c , x⟩/⟨e , x⟩. Observe that this FLP objective function is just the LP objective function ⟨c , x⟩ (up to the constant factor 1/n) everywhere on the constraint set, in view of the constraint ⟨e , x⟩ = n. We define the projective scaling vector v_P(e ; c) to be v_FP(e ; c), so that

v_P(e ; c) = −π_{(A; e^T)^⊥}(c) . (4.16)

Now we define the projective scaling vector field v_P(d ; c) for a canonical form problem at an arbitrary feasible point d in Rel-Int(S^{n−1}) = {x : ⟨e , x⟩ = n and x > 0}. We define new variables using the projective (scaling) transformation

y = Φ_D(x) = n D^{−1}x / (e^T D^{−1}x) (4.17)

with inverse transformation

Φ_D^{−1}(y) = n Dy / (e^T Dy) = x . (4.18)

Under this change of variables the canonical form fractional linear program (4.13)-(4.14) with objective function ⟨c , x⟩/⟨e , x⟩ becomes the following canonical form fractional linear program:

minimize ⟨Dc , y⟩/⟨De , y⟩ (4.19)
subject to
ADy = 0 , (4.20a)
⟨e , y⟩ = n , (4.20b)
y ≥ 0 , (4.20c)
with side conditions
ADe = 0 , (4.20d)
AD²A^T is invertible . (4.20e)

Note that the denominator ⟨De , y⟩ is scaled so that ⟨De , e⟩ = 1. Furthermore Φ_D(d) = e. By definition the (fractional) projective scaling direction for this point is

v_FP(e ; Dc) = −π_{(AD; e^T)^⊥}(Dc) . (4.21)

We define the projective scaling vector v_P(d ; c) to be the pullback under Φ_D^{−1} of this vector, i.e.

v_P(d ; c) = (Φ_D^{−1})_*(v_FP(e ; Dc)) . (4.22)

Now Φ_D^{−1} is a non-linear map, and a computation gives the formula

(Φ_D^{−1})_*(w) = Dw − (1/n)⟨De , w⟩De .

The last three formulae combine to yield

v_P(d ; c) = −Dπ_{(AD; e^T)^⊥}(Dc) + (1/n)⟨De , π_{(AD; e^T)^⊥}(Dc)⟩De . (4.23)

One motivation for this definition of the projective scaling direction is that it gives a ‘‘good’’ direction for fractional linear programs having a normalized objective function. To show this we use observations of Anstreicher [A]. We define a normalized objective function of an FLP to be one whose value at the optimum point is zero. This property depends only on the numerator ⟨c , x⟩ of the FLP objective function. The property of being normalized is preserved by the projective change of variable y = Φ_D(x) = n D^{−1}x / (e^T D^{−1}x). In fact the FLP (4.13)-(4.14) is normalized if and only if the transformed FLP (4.19)-(4.20) is normalized. Now consider the FLP (4.13)-(4.14) with an arbitrary objective function. Let x* denote the optimal solution vector of a fractional linear program of form (4.13)-(4.14), and let z* = ⟨c , x*⟩/⟨b , x*⟩ be the optimal objective function value. Define the auxiliary linear program with objective function

minimize ⟨c , x⟩ − z*⟨b , x⟩

and the same constraints (4.14) as the FLP. The point x* is easily checked to be an optimal solution of this auxiliary linear program, using the fact that ⟨c , x⟩/⟨b , x⟩ ≥ z* for all feasible x. In the special case z* = 0, which arises from a normalized FLP, the steepest descent direction for this auxiliary linear program is just the fractional projective scaling direction (4.15). Since normalization is preserved under the projective transformation y = Φ_D(x), this leads to the definition (4.23) of the projective scaling direction v_P(d ; c) for a canonical form linear program with a normalized objective function.

This discussion provides no justification for the claim that the projective scaling direction v_P(d ; c) given by (4.15) is an interesting search direction for minimizing a general objective function. The direction specified by v_P(d ; c) in the general case does, however, have one reasonable consequence: it leads to the simple relationship between affine scaling trajectories and projective scaling trajectories given in Theorem 6.1.

Now we obtain a simplified formula for the projective scaling direction v_P(d ; c), and also show that it depends only on the component π_{A^⊥}(c) of c in the A^⊥ direction. We summarize the facts in the following lemma.

Lemma 4.2. The projective scaling vector field for a canonical form linear program (4.11)-(4.12) is given by

v_P(d ; c) = −Dπ_{(AD)^⊥}(Dc) + (1/n)⟨De , π_{(AD)^⊥}(Dc)⟩De . (4.24)

In addition

v_P(d ; c) = v_P(d ; π_{A^⊥}(c)) . (4.25)

Before giving the proof we remark that v_P(d ; c) ≠ v_P(d ; π_{(A; e^T)^⊥}(c)) in general.

Proof. By construction v_P(d ; c) lies in A^⊥. To see that v_P(d ; c) also lies in (e^T)^⊥, we compute by (4.23) that

⟨e , v_P(d ; c)⟩ = −⟨De , π_{(AD; e^T)^⊥}(Dc)⟩ + (1/n)⟨De , π_{(AD; e^T)^⊥}(Dc)⟩⟨De , e⟩ = 0 ,

since ⟨De , e⟩ = ⟨d , e⟩ = n. Now we simplify (4.23) by observing that the feasibility of d gives

ADe = Ad = 0 .

Hence the projections π_{(AD)^⊥} and π_{(e^T)^⊥} commute with each other and

π_{(AD; e^T)^⊥} = π_{(e^T)^⊥} π_{(AD)^⊥} .

Next we observe that π_{(e^T)^⊥} = I − (1/n)J, where J = ee^T is the matrix with all entries one, and that Jw = ⟨e , w⟩e for all vectors w. Applying these facts to (4.23) we obtain

v_P(d ; c) = −Dπ_{(e^T)^⊥}(π_{(AD)^⊥}(Dc)) + λDe
           = −Dπ_{(AD)^⊥}(Dc) + (1/n)DJπ_{(AD)^⊥}(Dc) + λDe
           = −Dπ_{(AD)^⊥}(Dc) + µDe (4.26)

where λ and µ are scalars and

µ = (1/n)⟨De , π_{(AD; e^T)^⊥}(Dc)⟩ + (1/n)⟨e , π_{(AD)^⊥}(Dc)⟩ . (4.27)

Multiplying (4.26) by e^T, and using the identity ⟨e , v_P(d ; c)⟩ = 0, we derive an alternate expression for µ, which is

µ = (1/n)⟨De , π_{(AD)^⊥}(Dc)⟩ ,

and this proves (4.24).

To prove the remaining formula, start from

π_A(c) = A^T(AA^T)^{−1}Ac = A^T λ ,

where we define λ = (AA^T)^{−1}Ac. Then

π_{(AD)^⊥}(Dπ_A(c)) = (I − DA^T(AD²A^T)^{−1}AD)DA^T λ = 0 .

Substituting this in (4.24) yields

v_P(d ; π_A(c)) = 0 .

Since c = π_{A^⊥}(c) + π_A(c), the formula (4.25) follows.
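Formula (4.24) is equally direct to compute. In the sketch below (ours) the point d satisfies Ad = 0 and ⟨e , d⟩ = n, and the checks confirm that v_P(d ; c) is tangent to the canonical form feasible flat, i.e. Av_P = 0 and ⟨e , v_P⟩ = 0, as in the proof above.

```python
import numpy as np

def proj_null(M):
    return np.eye(M.shape[1]) - M.T @ np.linalg.solve(M @ M.T, M)

def v_projective(A, d, c):
    # Projective scaling vector (4.24):
    # v_P(d; c) = -D pi_{(AD)-perp}(Dc) + (1/n) <De, pi_{(AD)-perp}(Dc)> De.
    n, D = len(d), np.diag(d)
    w = proj_null(A @ D) @ (D @ c)
    return -D @ w + (d @ w / n) * d        # note De = d

A = np.array([[1.0, -1.0, 1.0, -1.0]])     # Ae = 0 (example data ours)
d = np.array([1.1, 1.2, 0.9, 0.8])         # Ad = 0 and <e, d> = n = 4
c = np.array([2.0, -1.0, 0.5, 3.0])
v = v_projective(A, d, c)
print(np.allclose(A @ v, 0), np.isclose(np.ones(4) @ v, 0))  # True True
```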

The projective scaling vector field v_P(d ; c) depends on the component of c in the e-direction. The requirement in Karmarkar’s algorithm that the objective function be normalized so that ⟨c , x_opt⟩ = 0 specifies the component of c in the e-direction and removes this ambiguity.

Lemma 4.3. Given a canonical form linear program H and an objective function c there is a unique normalized objective function c_N such that

(i) c_N lies in A^⊥.
(ii) π_{(A; e^T)^⊥}(c) = π_{(A; e^T)^⊥}(c_N) = π_{(e^T)^⊥}(c_N).

If c′ = π_{(A; e^T)^⊥}(c) then c_N is given by

c_N = c′ − (1/n)⟨c′ , x_opt⟩e . (4.28)

Proof. The condition Ae = 0 implies that A^⊥ = {x : Ax = 0, ⟨e , x⟩ = 0} ⊕ R⟨e⟩. Hence conditions (i) and (ii) imply that any normalized objective function satisfying (i) and (ii) has

c_N = c′ − µe

for some scalar µ. The normalization condition gives

⟨c_N , x_opt⟩ = ⟨c′ , x_opt⟩ − µ⟨e , x_opt⟩ = 0 .

Since a canonical form problem has ⟨e , x⟩ = n we have ⟨e , x_opt⟩ = n, so

µ = (1/n)⟨c′ , x_opt⟩

is unique.

C. Affine and Projective Scaling Differential Equations

The affine and projective scaling trajectories are found by integrating the affine and projective scaling vector fields, respectively. Now we give definitions.

For the affine scaling case, consider a strict standard form problem

minimize ⟨c , x⟩
subject to
Ax = b
x ≥ 0

having a feasible solution x = (x_1 , ... , x_n) with all x_i > 0. In that case the relative interior Rel-Int(P) of the polytope P of feasible solutions is

Rel-Int(P) = {x : Ax = b and x > 0} . (4.29)

Suppose that x_0 is in Rel-Int(P). We define the A-trajectory T_A(x_0 ; A , b , c) containing x_0 to be the point set given by the integral curve x(t) of the affine scaling differential equation:

dx/dt = −Xπ_{(AX)^⊥}(Xc) , (4.30a)
x(0) = x_0 , (4.30b)

in which X = X(t) is the diagonal matrix with diagonal elements x_1(t) , ... , x_n(t), so that x(t) = X(t)e.

This differential equation is obtained from the affine scaling vector field as defined in Lemma 4.1, together with the initial value x_0. The integral curve x(t) is defined for a range t_1(x_0 ; A , c) < t < t_2(x_0 ; A , c) which is chosen to be the maximal interval on which the solution exists. (Here t_1 = −∞ and t_2 = +∞ are allowable values. It turns out that finite values of t_1 or t_2 may occur, cf. equation (5.13).) An A-trajectory T_A(x_0 ; A , b , c) lies in Rel-Int(P) because the vector field in (4.30) is defined only for x(t) in Rel-Int(P).
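A rough way to trace an A-trajectory numerically is to integrate (4.30) with a fixed-step Euler scheme. The step size, iteration cap, and boundary tolerance below are arbitrary choices of ours, and a production integrator would use an adaptive method.

```python
import numpy as np

def proj_null(M):
    return np.eye(M.shape[1]) - M.T @ np.linalg.solve(M @ M.T, M)

def a_trajectory(A, c, x0, h=1e-3, steps=20000, eps=1e-8):
    # Forward-Euler integration of (4.30): dx/dt = -X pi_{(AX)-perp}(Xc).
    x, path = x0.astype(float).copy(), [x0.astype(float).copy()]
    for _ in range(steps):
        X = np.diag(x)
        x = x - h * (X @ proj_null(A @ X) @ (X @ c))
        if np.min(x) < eps:                # nearing the boundary of Rel-Int(P)
            break
        path.append(x.copy())
    return np.array(path)
```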

For the projective scaling case, consider a canonical form problem (4.11)-(4.12). In this case

Rel-Int(P) = {x : Ax = 0 , e^T x = n and x > 0} .

Suppose that x_0 is in Rel-Int(P). We define the P-trajectory T_P(x_0 ; A , c) containing x_0 to be the point set given by the integral curve x(t) of the projective scaling differential equation:

dx/dt = −Xπ_{(AX)^⊥}(Xc) + (1/n)⟨Xe , π_{(AX)^⊥}(Xc)⟩Xe , (4.31a)
x(0) = x_0 . (4.31b)

This differential equation is obtained from the projective scaling vector field as defined in Lemma 4.2, together with the initial value x_0.

We have defined the A-trajectories and P-trajectories as point sets. The solutions to the differential equations (4.30) and (4.31) specify these point sets as parametrized curves. An arbitrary scaling of the vector fields by an everywhere positive function ρ(x , t) leads to a differential equation whose solution will give the same trajectories with a different parametrization. Conversely, a reparametrization of the curve by a variable u = ψ(t) with ψ′(t) > 0 for all t leads to a similar differential equation with a rescaled vector field with ρ(x , t) = ψ′(t). If y(t) = x(ψ(t)) and y(0) = x_0 and x(t) satisfies the affine scaling differential equation, then y(t) satisfies:

dy/dt = −ψ′(t)Yπ_{(AY)^⊥}(Yc) , (4.32a)
y(0) = x_0 . (4.32b)

If x(t) satisfies the projective scaling differential equation instead, then y(t) satisfies:

dy/dt = −ψ′(t)[Yπ_{(AY)^⊥}(Yc) − (1/n)⟨Ye , π_{(AY)^⊥}(Yc)⟩Ye] , (4.33a)
y(0) = x_0 . (4.33b)

In part III we will give explicit parametrized forms for the A-trajectories and P-trajectories which allow us to characterize their geometric behavior.

5. The affine scaling vector field as a steepest descent vector field

In this section we present Karmarkar’s observation that the affine scaling vector field of a strict standard form linear program is a steepest descent vector field of the objective function ⟨c , x⟩ with respect to a particular Riemannian metric ds² defined on the relative interior of the polytope of feasible solutions of the linear program.

We first review the definition of steepest descent with respect to a Riemannian metric. Let

ds² = Σ_{i=1}^n Σ_{j=1}^n g_{ij}(x) dx_i dx_j (5.1)

be a Riemannian metric defined on an open subset Ω of R^n, i.e. we require that the matrix

G(x) = [g_{ij}(x)] (5.2)

be a positive-definite symmetric matrix on Ω. Let

f(x) : Ω → R (5.3)

be a differentiable function. The differential df_x at x is a linear map on the tangent space R^n at x,

df_x : R^n → R , (5.4)

given by

f(x + εv) = f(x) + ε df_x(v) + O(ε²) (5.5)

as ε → 0, for v ∈ R^n. The Riemannian metric ds² permits us to define the gradient vector field ∇_G f : Ω → R^n with respect to G(x): ∇_G f(x) is that direction in which f increases most steeply with respect to ds² at x. This is the direction of the maximum of f(x) on an infinitesimal unit ball of ds² (which is an ellipsoid) centered at x. Formally

∇_G f(x) = G(x)^{−1} ( ∂f/∂x_1 , ... , ∂f/∂x_n )^T . (5.6)

Note that if ds² is the Euclidean metric

ds² = Σ_{i=1}^n dx_i dx_i ,

then ∇_G f is the usual gradient ∇f. (See [Fl], p. 43.)

There is an analogous definition for the gradient vector field ∇_G f_F of a function f restricted to a flat F in R^n. Let the flat F be x_0 + V where V is an (n−k)-dimensional subspace of R^n given by

V = {x : Ax = 0} ,

in which A is a k × n matrix of full row rank k. Geometrically the gradient ∇_G f(x_0)_F is that direction in F that maximizes f(x) on an infinitesimal unit ball centered at x_0 of the metric ds²_F restricted to F. A computation with Lagrange multipliers given in Appendix A shows that

∇_G f(x_0)_F = (G^{−1} − G^{−1}A^T(AG^{−1}A^T)^{−1}AG^{−1}) ( ∂f/∂x_1 , ... , ∂f/∂x_n )^T |_{x_0} , (5.7)

where ds² has coefficient matrix G = G(x_0) at x_0.
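Formulas (5.6)-(5.7) are also directly computable; the sketch below (ours) evaluates (5.7) and checks that with G(x) = X^{−2} and f(x) = ⟨c , x⟩ it reproduces Xπ_{(AX)^⊥}(Xc), which is −v_A(x ; c) and is exactly the content of Theorem 5.1 below.

```python
import numpy as np

def grad_on_flat(G, A, grad_f):
    # Riemannian gradient of f restricted to the flat {x : Ax = b}, formula (5.7).
    Gi = np.linalg.inv(G)
    return (Gi - Gi @ A.T @ np.linalg.solve(A @ Gi @ A.T, A @ Gi)) @ grad_f

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))
x = rng.random(5) + 0.5
c = rng.standard_normal(5)
g = grad_on_flat(np.diag(1.0 / x**2), A, c)      # G(x) = X^{-2}, f(x) = <c, x>
X = np.diag(x)
P = np.eye(5) - (A @ X).T @ np.linalg.solve(A @ X @ X @ A.T, A @ X)
print(np.allclose(g, X @ P @ (X @ c)))           # True: equals -v_A(x; c)
```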

Now we consider a linear programming problem given in strict standard form:

minimize ⟨c , x⟩ (5.8)
subject to
Ax = b (5.8a)
x ≥ 0 (5.8b)
with side conditions
AA^T is nonsingular , (5.8c)

having a feasible solution x with all x_i > 0. Karmarkar’s steepest descent interpretation of the affine scaling vector field is as follows.

Theorem 5.1. (Karmarkar) The affine scaling vector field v_A(d ; c) of a strict standard form problem (5.8) is the steepest descent vector −∇_G(⟨c , x⟩)_F at x_0 = d with respect to the Riemannian metric obtained by restricting the metric

ds² = Σ_{i=1}^n dx_i dx_i / x_i² (5.9)

defined on Int(R_+^n) to the flat F = {x : Ax = b}.

Before proving this result (which is a simple computation) we discuss the metric (5.9). This is a very special Riemannian metric. It may be characterized as the unique Riemannian metric (up to a positive constant factor) on Int(R_+^n) which is invariant under the scaling transformations Φ_D : R_+^n → R_+^n given by

x_i → d_i x_i for 1 ≤ i ≤ n , (5.10)

with all d_i > 0 and D = diag(d_1 , ... , d_n), under the inverting transformations

I_i((x_1 , ... , x_i , ... , x_n)) = (x_1 , ... , 1/x_i , ... , x_n) (5.11)

for 1 ≤ i ≤ n, and under all permutations σ((x_1 , ... , x_n)) = (x_{σ(1)} , ... , x_{σ(n)}). The geometry induced by ds² on Int(R_+^n) is isometric to Euclidean geometry on R^n under the change of variables y_i = log x_i for 1 ≤ i ≤ n. All these facts are proved in Appendix B.

Proof of Theorem 5.1. The metric ds² = Σ_{i=1}^n (dx_i)²/x_i² induces a unique Riemannian metric ds²_F on the region

Rel-Int(P) = {x : Ax = b and x > 0}

inside the flat F = {x : Ax = b}. The matrix Ḡ(x) associated to ds² is the diagonal matrix

Ḡ(x) = diag(1/x_1² , ... , 1/x_n²) = X^{−2} ,

where X = diag(x_1 , ... , x_n). Using the definition (5.7) applied to the function ℓ_c(x) = ⟨c , x⟩ we obtain

∇_Ḡ(ℓ_c(x))_F = X(I − XA^T(AX²A^T)^{−1}AX)Xc .

The right side of this equation is −v_A(x ; c) by Lemma 4.1.

We now show by an example that these steepest descent curves are not geodesics of the metric ds²_F even in the simplest case. Consider the strict standard form problem with no equality constraints:

minimize ⟨c , x⟩
subject to
x ≥ 0 .

The affine scaling differential equation (4.30) becomes in this case

dx/dt = −X²c , (5.12a)
x(0) = (d_1 , ... , d_n) . (5.12b)

This is a decoupled set of Riccati equations

dx_i/dt = −x_i² c_i ,
x_i(0) = d_i ,

for 1 ≤ i ≤ n. Using the change of variables y_i = 1/x_i we easily find that

dy_i/dt = c_i ,
y_i(0) = 1/d_i ,

for 1 ≤ i ≤ n. From this we obtain

x(t) = ( 1/(1/d_1 + c_1 t) , ... , 1/(1/d_n + c_n t) ) . (5.13)

This trajectory is defined for $t_1 < t < t_2$ where

$$t_1 = \max\left\{-\frac{1}{c_i d_i} : c_i > 0\right\}, \tag{5.14a}$$
$$t_2 = \min\left\{-\frac{1}{c_i d_i} : c_i < 0\right\}, \tag{5.14b}$$

with the convention that $t_1 = -\infty$ if all $c_i \le 0$ and $t_2 = \infty$ if all $c_i \ge 0$. The geodesic curves of

$$ds^2 = \sum_{i=1}^n \frac{dx_i\,dx_i}{x_i^2}$$

are explicitly evaluated in Appendix B to be

$$\gamma(t) = (e^{a_1 t + b_1}, \ldots, e^{a_n t + b_n}),$$

where $\sum_{i=1}^n a_i^2 = 1$, for $-\infty < t < \infty$. It is easy to see these do not coincide with the curves (5.13) for $n \ge 2$, since $x(t)$ is a rational curve while $\gamma(t)$ satisfies no algebraic dependencies among its coordinates in general.
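The closed form (5.13) and the interval (5.14) are easy to check numerically. A small sketch (our addition, assuming NumPy; the data and step size are arbitrary choices) compares (5.13) against a crude Euler integration of $\dot x = -X^2 c$:

```python
import numpy as np

d = np.array([1.0, 2.0, 0.5])
c = np.array([1.0, -0.5, 0.0])

# Blow-up times of 1/d_i + c_i t, as in (5.14):
roots = -1.0 / (c[c != 0] * d[c != 0])
t1 = max(roots[roots < 0], default=-np.inf)    # from indices with c_i > 0
t2 = min(roots[roots > 0], default=np.inf)     # from indices with c_i < 0
print(t1, t2)                                  # -1.0, 1.0 for this data

def x_closed(t):                               # formula (5.13)
    return 1.0 / (1.0 / d + c * t)

t, x, h = 0.0, d.copy(), 1e-5                  # Euler steps for dx/dt = -X^2 c
while t < 0.5:
    x += h * (-(x ** 2) * c)
    t += h
print(np.max(np.abs(x - x_closed(t))))         # small discretization error
```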

6. Relations between P-trajectories and A-trajectories

There is a simple relationship between the P-trajectories of the canonical form linear program:

$$\text{minimize } \langle c, x\rangle \tag{6.1a}$$

subject to

$$Ax = \mathbf{0} \tag{6.1b}$$
$$\langle e, x\rangle = n \tag{6.1c}$$
$$x \ge \mathbf{0} \tag{6.1d}$$

with side conditions

$$Ae = \mathbf{0} \tag{6.1e}$$
$$AA^T \text{ is invertible}, \tag{6.1f}$$

and the A-trajectories of the associated homogeneous strict standard form linear program:

$$\text{minimize } \langle c, x\rangle \tag{6.2a}$$

subject to

$$Ax = \mathbf{0} \tag{6.2b}$$
$$x \ge \mathbf{0} \tag{6.2c}$$

with side conditions

$$Ae = \mathbf{0} \tag{6.2d}$$
$$AA^T \text{ is invertible}. \tag{6.2e}$$

It is as follows.

Theorem 6.1. If $T_A(x_0; A, \mathbf{0}, c)$ is an A-trajectory of the homogeneous strict standard form problem (6.2), then its radial projection

$$T = \left\{\frac{nx}{\langle e, x\rangle} : x \in T_A(x_0; A, \mathbf{0}, c)\right\} \tag{6.3}$$

is a P-trajectory of the associated canonical form linear program, which is given by

$$T = T_P\left(\frac{nx_0}{\langle e, x_0\rangle}\,;\ [A],\ c\right). \tag{6.4}$$

Proof. Geometrically, the radial projection accounts for the radial component by which the projective scaling vector field differs from the affine scaling vector field, as is evident on comparing Lemmas 4.1 and 4.2. The trajectory $T_A(x_0; A, \mathbf{0}, c)$ is parametrized by a solution $x(t)$ of the differential equation

$$\frac{dx}{dt} = -X\pi_{(AX)^\perp}(Xc), \qquad x(0) = x_0. \tag{6.5}$$

Now define

$$y(t) = \frac{n\,x(t)}{\langle e, x(t)\rangle}.$$

We verify directly that $y(t)$ satisfies a (scaled) version of the projective scaling differential equation. Let $Y(t) = \operatorname{diag}(y_1(t), \ldots, y_n(t))$ and note that $Y(t) = n\langle e, x(t)\rangle^{-1} X(t)$, so that

$$X\pi_{(AX)^\perp}(Xc) = n^{-2}\langle e, x(t)\rangle^2\, Y\pi_{(AY)^\perp}(Yc).$$

Using this fact and $Ye = n\langle e, x(t)\rangle^{-1} x$ we obtain

$$\frac{dy}{dt} = n\langle e, x(t)\rangle^{-1}\frac{dx}{dt} - n\langle e, x(t)\rangle^{-2}\left\langle e, \frac{dx}{dt}\right\rangle x$$
$$= -n\langle e, x(t)\rangle^{-1}\left(n^{-2}\langle e, x(t)\rangle^2\, Y\pi_{(AY)^\perp}(Yc) - n^{-3}\langle e, x(t)\rangle^2\,\langle e, Y\pi_{(AY)^\perp}(Yc)\rangle\, Ye\right)$$
$$= -\frac{1}{n}\langle e, x(t)\rangle\left(Y\pi_{(AY)^\perp}(Yc) - \frac{1}{n}\langle Ye, \pi_{(AY)^\perp}(Yc)\rangle\, Ye\right)$$
$$= \frac{1}{n}\langle e, x(t)\rangle\, v_P(y; c).$$

Since the time-rescaling factor $\psi'(t; x_0) = \frac{1}{n}\langle e, x(t)\rangle > 0$ because $x(t) \in \operatorname{Int}(\mathbf{R}^n_+)$, this is a reparametrized version of the projective scaling differential equation (4.33). This proves that (6.4) holds.
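The computation in the proof can be checked numerically. In the sketch below (our addition, assuming NumPy; the helper names are ours), we form the velocity of the radial projection $y = nx/\langle e, x\rangle$ directly from the affine scaling field and compare it componentwise with $v_P(y; c)$; the ratio is the constant $\frac{1}{n}\langle e, x\rangle$:

```python
import numpy as np

def proj_perp(M, z):
    return z - M.T @ np.linalg.solve(M @ M.T, M @ z)

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((2, n))
A -= A @ np.ones((n, n)) / n           # enforce the side condition Ae = 0
c = rng.standard_normal(n)
x = rng.uniform(0.5, 2.0, n)           # a point on an A-trajectory, x > 0

s = x.sum()                            # <e, x>
X = np.diag(x)
dx = -X @ proj_perp(A @ X, X @ c)      # affine scaling velocity (6.5)
dy = n * dx / s - n * dx.sum() * x / s**2   # velocity of y = n x / <e, x>

y = n * x / s
Y = np.diag(y)
p = proj_perp(A @ Y, Y @ c)
v_P = -(Y @ p - (y @ p / n) * y)       # projective scaling field at y

print(dy / v_P)                        # each entry equals <e, x>/n = s/n
print(s / n)
```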

As an example we apply Theorem 6.1 to the canonical form linear program with no extra equality constraints:

$$\text{minimize } \langle c, x\rangle$$

subject to

$$\langle e, x\rangle = n,$$
$$x \ge \mathbf{0}.$$

The feasible solutions to this problem form a regular simplex $S_{n-1}$. In this case the associated homogeneous standard form problem has no equality constraints:

$$\text{minimize } \langle c, x\rangle$$

subject to

$$x \ge \mathbf{0}.$$

Using the formula (5.13) parametrizing the affine scaling trajectories:

$$T_A(d; \emptyset, \emptyset, c) = \left\{\left(\frac{1}{\frac{1}{d_1} + c_1 t}\,,\ \ldots\ ,\ \frac{1}{\frac{1}{d_n} + c_n t}\right) : t_1 < t < t_2\right\},$$

we find that if $d$ lies in $\operatorname{Int}(S_{n-1})$ then the projective scaling trajectory given by Theorem 6.1 is

$$T_P(d; \emptyset, c) = \left\{n\left(\sum_{i=1}^n \frac{1}{\frac{1}{d_i} + c_i t}\right)^{-1}\left(\frac{1}{\frac{1}{d_1} + c_1 t}\,,\ \ldots\ ,\ \frac{1}{\frac{1}{d_n} + c_n t}\right) : t_1 < t < t_2\right\},$$

where $t_1$ and $t_2$ are given by (5.14). Notice that both $T_A(d; \emptyset, \emptyset, c)$ and $T_P(d; \emptyset, c)$ are rational curves in this example.

Since any canonical form problem is automatically a strict standard form problem, both P-trajectories and A-trajectories are defined for a canonical form problem. In general an A-trajectory is not a P-trajectory and vice-versa. However the A-trajectories and P-trajectories through the point $e$ do coincide, and we have the relation:

$$T_P(e; [A], c) = T_A\left(e;\ \begin{bmatrix} A \\ e^T \end{bmatrix},\ \begin{bmatrix} \mathbf{0} \\ n \end{bmatrix},\ c\right). \tag{6.6}$$

This is proved in [BL3]. We call the point $e$ the center (as does Karmarkar) and we call the trajectories (6.6) central trajectories.

7. The homogeneous affine scaling algorithm

Consider the homogeneous standard form linear program:

$$\text{minimize } \langle c, x\rangle \tag{7.1a}$$

subject to

$$Ax = \mathbf{0}, \tag{7.1b}$$
$$x \ge \mathbf{0}, \tag{7.1c}$$

with side conditions

$$Ae = \mathbf{0}, \tag{7.1d}$$
$$AA^T \text{ is invertible}. \tag{7.1e}$$

We define the homogeneous affine scaling algorithm to be a piecewise linear algorithm in which the starting value is given by $x^{(0)} = e$, the step direction is specified by the affine scaling vector field associated to (7.1), and the step size is chosen to minimize Karmarkar's "potential function"

$$g(x) = \sum_{i=1}^n \log\left(\frac{\langle c, x\rangle}{x_i}\right) \tag{7.2}$$

along the line segment inside the feasible solution polytope specified by the step direction. Let $x^{(0)}, x^{(1)}, x^{(2)}, \ldots$ denote the resulting sequence of interior points obtained using this algorithm. Consider the associated canonical form problem:

$$\text{minimize } \langle c, x\rangle \tag{7.3a}$$

subject to

$$Ax = \mathbf{0}, \tag{7.3b}$$
$$\langle e, x\rangle = n, \tag{7.3c}$$
$$x \ge \mathbf{0}, \tag{7.3d}$$

with side conditions

$$Ae = \mathbf{0}, \tag{7.3e}$$
$$AA^T \text{ is invertible}. \tag{7.3f}$$

We have the following result.

Theorem 7.1. If $\{x^{(k)} : 0 \le k < \infty\}$ are the homogeneous affine scaling algorithm iterates associated to the linear program (7.1) and if the $y^{(k)}$ are defined by

$$y^{(k)} = \frac{n\,x^{(k)}}{\langle e, x^{(k)}\rangle}, \tag{7.4}$$

then $\{y^{(k)} : 0 \le k < \infty\}$ are the projective scaling algorithm iterates of the canonical form problem (7.3).

Proof. We observe that Karmarkar's "potential function" is constant on rays through the origin:

$$g(\lambda x) = g(x) \quad\text{if } \lambda > 0. \tag{7.5}$$

Now we prove the theorem by induction on the iteration number $k$. It is true by definition for $k = 0$. If it is true for a given $k$, then the proof of Theorem 6.1 shows that the non-radial component of the affine scaling vector field agrees with the projective scaling vector field. Hence the radial projection of the homogeneous affine scaling step direction line segment inside $\mathbf{R}^n_+$ is the projective scaling step direction line segment inside $\mathbf{R}^n_+$. Since Karmarkar's potential function is constant on rays, the step size criterion for the homogeneous affine scaling algorithm causes (7.4) to hold for $k + 1$, completing the induction step.

Theorem 7.1 gives an interpretation of Karmarkar's projective scaling algorithm as a polynomial time linear programming algorithm using an affine scaling vector field. The homogeneous affine scaling algorithm can alternatively be regarded as an algorithm solving the fractional linear program with objective function

$$\text{minimize } \frac{\langle c, x\rangle}{\langle e, x\rangle}\,,$$

subject to the standard form constraints (7.1b)-(7.1e). If Karmarkar's stopping rule is used one obtains a polynomial time algorithm for solving this fractional linear program.
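For concreteness, here is a minimal sketch of the homogeneous affine scaling iteration (our addition, assuming NumPy; the crude grid line search, the interiority safeguard, and the random instance are ours, and Karmarkar's stopping rule is omitted):

```python
import numpy as np

def proj_perp(M, z):
    return z - M.T @ np.linalg.solve(M @ M.T, M @ z)

def potential(x, c):
    """Karmarkar's potential function (7.2); assumes <c, x> > 0."""
    return np.sum(np.log((c @ x) / x))

def ha_step(x, A, c, grid=2000):
    """One homogeneous affine scaling step: move along v_A(x; c),
    choosing the step size that minimizes the potential g."""
    X = np.diag(x)
    v = -X @ proj_perp(A @ X, X @ c)                       # step direction
    amax = 0.95 * min(-x[v < 0] / v[v < 0], default=1e3)   # stay interior
    alphas = np.linspace(0.0, min(amax, 1e3), grid)[1:]
    vals = [potential(x + a * v, c) for a in alphas]
    return x + alphas[np.argmin(vals)] * v

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((2, n))
A -= A @ np.ones((n, n)) / n              # side condition Ae = 0
c = np.abs(rng.standard_normal(n))        # keeps <c, x> > 0 on the orthant
x = np.ones(n)                            # start at the center e
for _ in range(10):
    x = ha_step(x, A, c)
    print((c @ x) / x.sum())              # fractional objective (typically decreasing)
```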

Appendix A. Steepest descent direction with respect to a Riemannian metric

We compute the steepest descent direction $\nabla_G f(x_0)_F$ of a function $f(x)$ defined on a flat $F = x_0 + \{x : Ax = \mathbf{0}\}$ with respect to a Riemannian metric $ds^2 = \sum_{i=1}^n \sum_{j=1}^n g_{ij}(x)\,dx_i\,dx_j$ at $x_0$. We may suppose without loss of generality that $x_0 = \mathbf{0}$, and set $G = [g_{ij}(\mathbf{0})]$.

The steepest descent direction is found by maximizing the linear functional

$$\langle df_{\mathbf{0}}, v\rangle = \left(\frac{\partial f}{\partial x_1}(\mathbf{0}), \ldots, \frac{\partial f}{\partial x_n}(\mathbf{0})\right) v \tag{A.1}$$

on the ellipsoid

$$\sum_{i=1}^n \sum_{j=1}^n g_{ij}\, v_i v_j = \varepsilon^2, \tag{A.2}$$

subject to the constraints

$$Av = \mathbf{0}. \tag{A.3}$$

Note that the direction obtained is independent of $\varepsilon$. We define

$$d \equiv \left(\frac{\partial f}{\partial x_1}(\mathbf{0}), \ldots, \frac{\partial f}{\partial x_n}(\mathbf{0})\right)^T,$$

and set this problem up as a Lagrange multiplier problem. We wish to find a stationary point of

$$L = d^T v - \lambda^T A v - \mu(v^T G v - \varepsilon^2). \tag{A.4}$$

The stationarity conditions are

$$\frac{\partial L}{\partial v} = d - A^T\lambda - \mu(G + G^T)v = \mathbf{0}, \tag{A.5}$$
$$\frac{\partial L}{\partial \lambda} = -Av = \mathbf{0}, \tag{A.6}$$
$$\frac{\partial L}{\partial \mu} = \varepsilon^2 - v^T G v = 0. \tag{A.7}$$

Using (A.5) and $G = G^T$ we find that

$$v = \frac{1}{2\mu}\,G^{-1}(d - A^T\lambda). \tag{A.8}$$

Substituting this into (A.6) yields

$$AG^{-1}A^T\lambda = AG^{-1}d.$$

Hence

$$\lambda = (AG^{-1}A^T)^{-1}AG^{-1}d. \tag{A.9}$$

Substituting this into (A.8) yields

$$v = \frac{1}{2\mu}\left(G^{-1} - G^{-1}A^T(AG^{-1}A^T)^{-1}AG^{-1}\right) d. \tag{A.10}$$

Now we check that the tangent vector

$$w = \left(G^{-1} - G^{-1}A^T(AG^{-1}A^T)^{-1}AG^{-1}\right) d \tag{A.11}$$

points in the maximizing direction. This corresponds to taking $\mu > 0$ in (A.10). To show this, we show that the linear functional $d^T v$ is nonnegative at $v = w$. Now recall that any positive definite symmetric matrix $M$ has a unique positive definite symmetric square root $M^{1/2}$. Using this fact on $G$ we have

$$d^T w = d^T G^{-1} d - d^T G^{-1}A^T(AG^{-1}A^T)^{-1}AG^{-1}d = (G^{-1/2}d)^T\left(I - G^{-1/2}A^T(AG^{-1}A^T)^{-1}AG^{-1/2}\right)(G^{-1/2}d).$$

Now $\pi_W = I - G^{-1/2}A^T(AG^{-1}A^T)^{-1}AG^{-1/2}$ is a projection operator onto the subspace $W = \{x : AG^{-1/2}x = \mathbf{0}\}$, and this gives

$$d^T w = (G^{-1/2}d)^T\,\pi_W\,(G^{-1/2}d) = \|\pi_W(G^{-1/2}d)\|^2 \ge 0,$$

where $\|\cdot\|$ denotes the Euclidean norm. There are two degenerate cases where $d^T w = 0$. The first is where $d = \mathbf{0}$, which corresponds to $\mathbf{0}$ being a stationary point of $f$, and the second is where $d \ne \mathbf{0}$ but $d^T w = 0$, in which case the linear functional $\langle df_{\mathbf{0}}, v\rangle = d^T v$ is constant on the flat $F$.

The vector (A.11) is the gradient vector field with respect to $G$. We obtain the analogue of a unit gradient field by using the Lagrange multiplier $\mu$ to scale the length of $v$. Substituting (A.10) into (A.7) yields

$$4\mu^2\varepsilon^2 = d^T G^{-1} d - d^T G^{-1}A^T(AG^{-1}A^T)^{-1}AG^{-1}d.$$

Hence

$$\mu = \pm\frac{1}{2\varepsilon}\left(d^T G^{-1} d - d^T G^{-1}A^T(AG^{-1}A^T)^{-1}AG^{-1}d\right)^{1/2}.$$

We obtain

$$\lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\,v = \theta(G, d)\left(G^{-1} - G^{-1}A^T(AG^{-1}A^T)^{-1}AG^{-1}\right) d, \tag{A.12}$$

where $\theta(G, d)$ is the scaling factor

$$\theta(G, d) = \left(d^T G^{-1} d - d^T G^{-1}A^T(AG^{-1}A^T)^{-1}AG^{-1}d\right)^{-1/2}.$$

Here $1/\theta(G, d)$ measures the length of the tangent vector $w$ with respect to the metric $ds^2$, so the right side of (A.12) is $w$ normalized to unit length. (As a check, note that for the Euclidean metric and $F = \mathbf{R}^n$ the formula (A.11) for $w$ gives the ordinary gradient and (A.12) gives the unit gradient.)
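As a numerical check of (A.11) and (A.12) (our addition, assuming NumPy; the random instance is ours), the sketch below verifies that $w$ is tangent to the flat, that $d^Tw \ge 0$, and that no random tangent vector of unit $G$-length gives the functional a larger value than the normalized vector $\theta(G,d)\,w$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 6, 2
B = rng.standard_normal((n, n))
G = B @ B.T + n * np.eye(n)               # positive definite metric matrix
A = rng.standard_normal((k, n))
d = rng.standard_normal(n)

Gi = np.linalg.inv(G)
w = (Gi - Gi @ A.T @ np.linalg.inv(A @ Gi @ A.T) @ A @ Gi) @ d   # (A.11)
theta = (d @ w) ** -0.5                   # scaling factor theta(G, d)
print(np.linalg.norm(A @ w), d @ w >= 0)  # approximately 0, and True

best = theta * (d @ w)                    # value of d^T v at v = theta * w
for _ in range(1000):
    v = rng.standard_normal(n)
    v -= A.T @ np.linalg.solve(A @ A.T, A @ v)   # make Av = 0
    v /= np.sqrt(v @ G @ v)                      # unit G-length
    assert d @ v <= best + 1e-9
print("no feasible unit direction beats theta * w")
```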

Appendix B. Invariant Riemannian metrics on the positive orthant $\mathbf{R}^n_+$.

We consider Riemannian metrics

$$ds^2 = \sum_{i=1}^n \sum_{j=1}^n g_{ij}(x)\,dx_i\,dx_j \tag{B.1}$$

with $g_{ij}(x) = g_{ji}(x)$ and all functions $g_{ij}(x)$ defined on the interior $\operatorname{Int}(\mathbf{R}^n_+)$ of the positive orthant $\mathbf{R}^n_+$, i.e., if $x = (x_1, \ldots, x_n)^T$ then

$$\operatorname{Int}(\mathbf{R}^n_+) = \{x : x_i > 0 \text{ for } 1 \le i \le n\}.$$

Let $D = \operatorname{diag}(d_1, \ldots, d_n)$ where all $d_i > 0$, let

$$\Phi_D(x) = Dx, \tag{B.2}$$

and let $G^n_+$ denote the (Lie) group of positive scaling transformations

$$G^n_+ = \{\Phi_D : d_i > 0 \text{ for } 1 \le i \le n\}. \tag{B.3}$$

Then $G^n_+$ acts transitively on $\operatorname{Int}(\mathbf{R}^n_+)$.

Theorem B.1. The Riemannian metrics defined on $\operatorname{Int}(\mathbf{R}^n_+)$ that are invariant under $G^n_+$ are exactly the metrics

$$ds^2 = \sum_{i=1}^n \sum_{j=1}^n \frac{c_{ij}}{x_i x_j}\,dx_i\,dx_j \tag{B.4}$$

where $C = [c_{ij}]$ is a positive definite symmetric matrix.

Proof. To study a general metric on $\operatorname{Int}(\mathbf{R}^n_+)$ we use the map $L : \operatorname{Int}(\mathbf{R}^n_+) \to \mathbf{R}^n$ given by

$$L(x) = (\log x_1, \ldots, \log x_n),$$

i.e. the new coordinates are $y_i = \log x_i$. Then (B.1) in the new coordinates is

$$ds^2 = \sum_{i=1}^n \sum_{j=1}^n \bar g_{ij}(y)\, e^{y_i + y_j}\, dy_i\, dy_j \tag{B.5}$$

where

$$\bar g_{ij}(y) = g_{ij}(e^{y_1}, \ldots, e^{y_n}). \tag{B.6}$$

Under this transformation the group $G^n_+$ becomes the group $T^n$ of translations on $\mathbf{R}^n$, i.e.

$$L(\Phi_D(x)) = L(x) + L(De)$$

where

$$L(De) = (\log d_1, \ldots, \log d_n).$$

Now a translation-invariant Riemannian metric on $\mathbf{R}^n$ is specified by its infinitesimal unit ball at the origin, i.e. it is a constant metric

$$ds^2 = \sum_{i=1}^n \sum_{j=1}^n c_{ij}\, dy_i\, dy_j, \tag{B.7}$$

where $C = [c_{ij}]$ is a fixed positive definite symmetric matrix. Substituting this in (B.5) yields $\bar g_{ij}(y) = c_{ij}\, e^{-(y_i + y_j)}$, so that by (B.6) we have

$$g_{ij}(x) = \frac{c_{ij}}{x_i x_j}\,.$$

Since the metrics (B.4) are all invariant under $G^n_+$ we have proved Theorem B.1.
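The invariance statement is easy to test directly. In the sketch below (our addition, assuming NumPy; the random data are ours), the squared length that (B.4) assigns to a tangent vector $v$ at $x$ is unchanged when both are pushed forward by $\Phi_D$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)              # positive definite matrix [c_ij]

def sq_length(x, v):
    """Squared length of a tangent vector v at x in the metric (B.4):
    sum_{i,j} c_ij (v_i / x_i)(v_j / x_j)."""
    return (v / x) @ C @ (v / x)

x = rng.uniform(0.5, 2.0, n)
v = rng.standard_normal(n)
dvec = rng.uniform(0.5, 2.0, n)          # the scaling map Phi_D(x) = Dx

print(np.isclose(sq_length(x, v), sq_length(dvec * x, dvec * v)))   # True
```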

Theorem B.2. The only Riemannian metrics defined on $\operatorname{Int}(\mathbf{R}^n_+)$ that are invariant under $G^n_+$ and under all inversions

$$I_k((x_1, \ldots, x_{k-1}, x_k, x_{k+1}, \ldots, x_n)) = (x_1, \ldots, x_{k-1}, \tfrac{1}{x_k}, x_{k+1}, \ldots, x_n)$$

for $1 \le k \le n$ are the metrics

$$ds^2 = \sum_{i=1}^n c_i\,\frac{dx_i\,dx_i}{x_i^2}\,, \tag{B.8}$$

where all $c_i > 0$. The only such metrics that are invariant under these transformations and also under all permutations $\sigma((x_1, \ldots, x_n)) = (x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ are those of the form

$$ds^2 = c\sum_{i=1}^n \frac{dx_i\,dx_i}{x_i^2} \tag{B.9}$$

where $c > 0$.

Proof. By Theorem B.1 we may assume that the metric has the form

$$ds^2 = \sum_{i=1}^n \sum_{j=1}^n \frac{c_{ij}}{x_i x_j}\,dx_i\,dx_j. \tag{B.10}$$

Now let $y_k = \frac{1}{x_k}$ and $y_j = x_j$ for $j \ne k$. Then we compute

$$ds^2 = \sum_{i=1}^n \sum_{j=1}^n \bar g_{ij}(y)\,dy_i\,dy_j$$

where

$$\bar g_{ij}(y) = \frac{c_{ij}}{x_i x_j}\,\frac{\partial x_i}{\partial y_i}\,\frac{\partial x_j}{\partial y_j}\,.$$

In particular for $j \ne k$ we have

$$\bar g_{kj}(y) = \frac{c_{kj}\,y_k}{y_j}\left(-\frac{1}{y_k^2}\right) = -\frac{c_{kj}}{y_j y_k}\,.$$

By the invariance hypothesis we must have

$$\bar g_{kj}(y) = \frac{c_{kj}}{y_j y_k}\,.$$

This implies that

$$c_{ij} = 0 \quad \text{if } i \ne j,$$

and (B.8) follows with $c_i = c_{ii}$.

For the second part, if $y_i = x_{\sigma(i)}$ then a computation gives

$$ds^2 = \sum_{i=1}^n \sum_{j=1}^n \frac{c_{\sigma(i)\sigma(j)}}{y_i y_j}\,dy_i\,dy_j\,,$$

hence comparison with (B.10) gives

$$c_{ii} = c_{\sigma(i)\sigma(i)}$$

for all permutations $\sigma$, and (B.9) follows.

Theorem B.3. The geodesics of

$$ds^2 = \sum_{i=1}^n \frac{dx_i\,dx_i}{x_i^2} \tag{B.11}$$

in $\operatorname{Int}(\mathbf{R}^n_+)$ are exactly the curves

$$\gamma(t) = \gamma_{a,b}(t) = (e^{a_1 t + b_1}, e^{a_2 t + b_2}, \ldots, e^{a_n t + b_n}), \quad -\infty < t < \infty, \tag{B.12}$$

where $\|a\|^2 = a_1^2 + a_2^2 + \cdots + a_n^2 = 1$.

Proof. The mapping $L : \operatorname{Int}(\mathbf{R}^n_+) \to \mathbf{R}^n$ given by

$$L(x) = (\log x_1, \ldots, \log x_n) \tag{B.13}$$

takes the metric (B.11) to the Euclidean metric

$$ds^2 = \sum_{i=1}^n dy_i\,dy_i$$

on $\mathbf{R}^n$. The geodesics of this metric are clearly

$$\gamma(t) = (a_1 t + b_1, \ldots, a_n t + b_n)$$

where $a = (a_1, \ldots, a_n)$ has $\|a\|^2 = 1$. The formula (B.12) follows using the inverse map

$$L^{-1}(y) = (e^{y_1}, \ldots, e^{y_n}).$$

The metric $\sum_{i=1}^n \frac{dx_i\,dx_i}{x_i^2}$ has Gaussian curvature 0 at every point of $\mathbf{R}^2_+$, i.e. it is a flat metric. This follows since the transformation (B.13) does not change Gaussian curvature, and the Euclidean metric is flat.
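Numerically, the isometry in the proof is easy to observe (our addition, assuming NumPy; the sample curve is ours): the (B.11)-length of the geodesic (B.12) from parameter $0$ to $t$ is $|t|$ when $\|a\| = 1$, matching the Euclidean distance between the logarithms of the endpoints:

```python
import numpy as np

a = np.array([0.6, -0.8])                  # ||a|| = 1
b = np.array([0.3, 1.1])
ts = np.linspace(0.0, 2.0, 20001)
g = np.exp(np.outer(ts, a) + b)            # the geodesic (B.12), sampled

# Riemannian length in ds^2 = sum_i (dx_i / x_i)^2, by a midpoint rule:
dx = np.diff(g, axis=0)
mid = (g[1:] + g[:-1]) / 2.0
length = np.sum(np.sqrt(np.sum((dx / mid) ** 2, axis=1)))

print(length)                                          # ~2.0
print(np.linalg.norm(np.log(g[-1]) - np.log(g[0])))    # 2.0: distance of the logs
```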


REFERENCES

[A] K. Anstreicher, A monotonic projective algorithm for fractional linear programming, preprint

(1985).

[Ar] V. I. Arnold, Mathematical Methods of Classical Mechanics, Springer-Verlag, New York 1978.

[B] E. R. Barnes, A variation on Karmarkar’s algorithm for solving linear programming problems,

preprint (1985).

[BKL] D. Bayer, N. Karmarkar, J. C. Lagarias, paper in preparation.

[BL2] D. Bayer and J. C. Lagarias, The nonlinear geometry of linear programming II. Legendre

transform coordinates, preprint.

[BL3] D. Bayer and J. C. Lagarias, The non-linear geometry of linear programming III. Central

trajectories, in preparation.

[BM] S. Bochner and W. T. Martin, Several Complex Variables, Princeton U. Press, Princeton, New

Jersey 1948.

[BP] H. Busemann, and B. B. Phadke, Beltrami’s theorem in the large, Pacific J. Math. 115 (1984),

299-315.

[Bu] H. Busemann, The geometry of geodesics, Academic Press, New York 1955.

[Bu2] H. Busemann, Spaces with distinguished shortest joins, in: A Spectrum of Mathematics, Auckland 1971, 108-120.

[CH] R. Courant and D. Hilbert, Methods of Mathematical Physics, Vols. I and II, Wiley, New York 1962.

[F] W. Fenchel, On Conjugate Convex Functions, Canad. J. Math. 1 (1949) 73-77.

[FM] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley and Sons, New York 1968.

[Fl] W. H. Fleming, Functions of Several Variables, Addison-Wesley, Reading, Mass. 1965.


[GZ] C. B. Garcia and W. I. Zangwill, Pathways to Solutions, Fixed Points and Equilibria, Prentice-Hall, Englewood Cliffs, N.J. 1981.

[H] D. Hilbert, Grundlagen der Geometrie, 7th Ed., Leipzig 1930. (English translation: Foundations of Geometry.)

[Ho] J. Hooker, The projective linear programming algorithm, Interfaces, 1986, to appear.

[K] N. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica 4 (1984) 373-395.

[KV] S. Kapoor and P. M. Vaidya, Fast algorithms for convex quadratic programming and

multicommodity flows, Proc. 18th ACM Symp. on Theory of Computing, 1986 (to appear).

[L4] J. C. Lagarias, The non-linear geometry of linear programming IV. Hilbert geometry, in

preparation.

[L] J. C. Lagarias, paper in preparation.

[Ln] C. Lanczos, The Variational Principles of Mechanics, Univ. of Toronto Press, Toronto 1949.

[M] N. Megiddo, On the complexity of linear programming, in: Advances in Economic Theory (T. Bewley, Ed.), Cambridge Univ. Press, 1986, to appear.

[M2] N. Megiddo, Pathways to the optimal set in linear programming, preprint 1986.

[N] J. L. Nazareth, Karmarkar’s method and homotopies with restarts, preprint 1985.

[Re] J. Renegar, paper in preparation.

[R1] R. T. Rockafellar, Conjugates and Legendre Transforms of Convex Functions, Canad. J. Math. 19

(1967) 200-205.

[R2] R. T. Rockafellar, Convex Analysis, Princeton U. Press, 1970.

[Sh] M. Shub, paper in preparation.

[SW] J. Stoer and C. Witzgall, Convexity and Optimization in Finite Dimensions I, Springer-Verlag,

New York 1970.


[VMF] R. J. Vanderbei, M. J. Meketon and B. A. Freedman, A modification of Karmarkar’s linear

programming algorithm, preprint (1985).

