Inequalities: With Applications to Engineering

Michael J. Cloud
Byron C. Drachman

Springer

Preface

We might wonder why it is necessary to study inequalities. Many applied science and engineering problems, for instance, can be pursued without their explicit mention. Nevertheless, a facility with inequalities seems to be necessary for an understanding of much of mathematics at intermediate and higher levels. Inequalities serve a natural purpose of comparison, and they sometimes afford us indirect routes of reasoning or problem solving when more direct routes might be inconvenient or unavailable.

This small guide to inequalities was originally written with engineers and other applied scientists in mind. Comments from those mathematicians who have seen the manuscript lead us to hope that some mathematicians will find some of the applications interesting, and that students of mathematics will also find the book useful. It is intended to help fill the gap between college-algebra treatments of inequalities and the formidable treatises on the subject that exist in the mathematics literature. Important techniques are all reinforced through the exercises that appear at the end of each chapter, and hints are included to expedite the reader's progress. We review a few topics from calculus, but make no attempt at a thorough review. In order to simplify the discussion, we use a stronger hypothesis than is necessary in some of the statements or proofs of theorems and in some of the exercises. For a review of calculus, we recommend the fine classic by Landau [37]. Among the many good books on analysis, we can recommend Stromberg [57].

We would like to thank Edward Rothwell of the Department of Electrical Engineering at Michigan State University, for encouragement during the early stages of writing this book. Thanks are also due to Beth Lannon-Cloud for comments and suggestions on the figures. We thank Catherine Friess and Tammy Hatfield for creating the first LaTeX version of this book. Thanks to Val Drachman for encouragement and support while we wrote the book. We owe a substantial debt to the staff at Springer-Verlag, especially Allan Abrams, Frank Ganz, Ina Lindemann, and Anne Fossella. Two anonymous reviewers were very helpful, proposing several topics that were not included in the preliminary version. We want to thank Glen Anderson, Mavina Vamanamurthy, and Matti Vuorinen for their generous help, in particular for pointing out the importance of l'Hôpital's monotone rule and for suggesting several related exercises. Finally, we wish to thank Carl Ganser for many helpful discussions and suggestions, and for generously agreeing to read the manuscript.

Contents

Preface

1 Basic Review of Inequalities
    1.1 Preliminaries
    1.2 Elementary Properties and Survival Rules
    1.3 Bounded Set Terminology
    1.4 Quadratic Inequalities
    1.5 Absolute Value and the Triangle Inequality
    1.6 Miscellaneous Examples
    1.7 Exercises

2 Methods from the Calculus
    2.1 Introduction
    2.2 Function Terminology and Facts
    2.3 Basic Results for Integrals
    2.4 Results from the Differential Calculus
    2.5 Some Applications
    2.6 Exercises

3 Some Standard Inequalities
    3.1 Introduction
    3.2 Bernoulli's Inequality
    3.3 Young's Inequality
    3.4 The Inequality of the Means
    3.5 Hölder's Inequality
    3.6 Minkowski's Inequality
    3.7 The Cauchy–Schwarz Inequality
    3.8 Chebyshev's Inequality
    3.9 Jensen's Inequality
    3.10 Exercises

4 Inequalities in Abstract Spaces
    4.1 Introduction
    4.2 Metric Spaces
    4.3 Iteration in a Metric Space
    4.4 Linear Spaces
    4.5 Orthogonal Projection and Expansion
    4.6 Exercises

5 Some Applications
    5.1 Introduction
    5.2 Estimation of Integrals
    5.3 Series Expansions
    5.4 Simpson's Rule
    5.5 Taylor's Method
    5.6 Special Functions of Mathematical Physics
    5.7 A Projectile Problem
    5.8 Geometric Shapes
    5.9 Electrostatic Fields and Capacitance
    5.10 Applications to Matrices
    5.11 Topics in Signal Analysis
    5.12 Dynamical System Stability and Control
    5.13 Some Inequalities of Probability
    5.14 Applications in Communication Systems
    5.15 Existence of Solutions
    5.16 A Duality Theorem and Cost Minimization
    5.17 Exercises

Appendix: Hints for Selected Exercises

References

Index

1 Basic Review of Inequalities

1.1 Preliminaries

A few set and logic symbols shall serve as convenient shorthand, including:

• ∈ for set membership;
• ⊆ for subset containment;
• ∪ for set union;
• ∩ for set intersection;
• R for the set of all real numbers;
• N for the natural numbers (positive integers);
• C for the complex numbers;
• ⇒ for logical implication;
• ⇔ for logical equivalence.

The set-builder notation

\[ S = \{ x \mid P(x) \} \]

specifies S as the set of all elements x such that proposition P(x) holds. For instance,

\[ S = \{ x \in \mathbb{R} \mid x^2 - 1 = 0 \} \]

defines S as the set of all real solutions of $x^2 - 1 = 0$.

Recall that z ∈ C can be written as z = x + iy, where i is the imaginary unit, $x = \Re[z]$ is the real part of z, and $y = \Im[z]$ is the imaginary part of z. In polar form, z = |z| exp(iφ), where |z| is the (nonnegative real) modulus of z and φ is the argument of z. We denote the complex conjugate of z by $\bar{z}$.


We agree to leave the fundamental relation “≤” (is less than or equal to) undefined; for present purposes, the reader's intuitive notion of what is meant by a ≤ b will suffice. We may then proceed to define the other basic inequality relations. Let a, b, c ∈ R. Then, for example,

• a ≥ b means that b ≤ a;

• a < b means that a ≤ b and a ≠ b;

• a > b means that b < a;

• a ≤ b ≤ c means that a ≤ b and b ≤ c.

Other compound inequality notations, such as a > b ≥ c, are given analogous definitions. We say that inequalities such as a < b and c > d have opposite sense. Inequalities such as a ≤ b are sometimes called weak or mixed, while those such as a < b (in which equality is precluded) are called strict.

Example 1.1. The various finite intervals along the real line are subsetsof R defined as:

• (a, b) = {x ∈ R | a < x < b};
• [a, b) = {x ∈ R | a ≤ x < b};
• (a, b] = {x ∈ R | a < x ≤ b};
• [a, b] = {x ∈ R | a ≤ x ≤ b}.

Infinite intervals are defined using:

• [a, ∞) = {x ∈ R | x ≥ a};
• (−∞, a) = {x ∈ R | x < a};

and so forth.

1.2 Elementary Properties and Survival Rules

The basic laws underlying inequality manipulations are the axioms [18] that distinguish the set R as an ordered field. For any a, b, c ∈ R,

(a) if a ≤ b and b ≤ c, then a ≤ c;

(b) we have a ≤ b and b ≤ a if and only if a = b;

(c) either a ≤ b or b ≤ a;

(d) if a ≤ b, then a+ c ≤ b+ c;


(e) if 0 ≤ a and 0 ≤ b, then 0 ≤ ab.

We recognize axiom (a) as the familiar transitive property. Axiom (b) is helpful when we want to show indirectly that two real numbers a and b are equal. Unlike R, the field C cannot be ordered (Exercise 1.8). However, |z| is real; hence any inequality established for real numbers can also be applied to the moduli of complex numbers, and vice versa.

In addition to the axioms, a number of useful order properties can be established. The following list, while by no means exhaustive, will serve as a review of some of the most important aspects of inequality manipulation. Suppose a, b, c, d ∈ R.

• One and only one of the following holds: a < b, a = b, a > b.

• If a ≤ b and b < c, then a < c.

• We have a ≤ b if and only if a + c ≤ b + c; likewise, a < b if and only if a + c < b + c.

• If a < b and c < 0, then ac > bc.

• We have a ≤ b if and only if a − b ≤ 0.

• If a ≤ b and c ≥ 0, then ac ≤ bc.

• If a ≤ 0 and b ≥ 0, then ab ≤ 0; if a ≤ 0 and b ≤ 0, then ab ≥ 0.

• We have $a^2 \ge 0$; furthermore, if a ≠ 0, then $a^2 > 0$.

• We have a < 0 if and only if 1/a < 0; a > 0 if and only if 1/a > 0.

• We have 0 < a < b if and only if 0 < 1/b < 1/a.

• If a > 0 and b > 0, then a/b > 0.

• If a > 1, then $a^2 > a$. If 0 < a < 1, then $a^2 < a$.

• For c ≥ 0 and 0 < a ≤ b, we have $a^c \le b^c$, with equality if and only if b = a or c = 0.

• If a is less than every positive real number ε, then a ≤ 0.


Generally, a term can be transposed from one side of an inequality to the other, provided that its algebraic sign is changed in the process. Inequalities can be added together, and inequalities between positive numbers can be multiplied. However, inequalities cannot in general be subtracted or divided. It is false, for instance, that for every a, b, c, d, the inequalities a < b and c < d imply a − c < b − d. Indeed, b − d > a − c is equivalent to (b − a) − (d − c) > 0, which cannot be guaranteed under the hypothesis that a < b and c < d alone. The other caution, about dividing inequalities, exists for similar reasons; however, it should suffice to note that dividing the inequality 1 < 2 by itself would yield the false result 1 < 1.
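The two cautions about subtracting and dividing inequalities can be made concrete with explicit counterexamples. The following Python sketch is illustrative only: the function names `subtraction_fails` and `division_fails` and the sample values are assumptions introduced here, not part of the text.

```python
from fractions import Fraction


def subtraction_fails(a, b, c, d):
    """Return True when a < b and c < d hold but a - c < b - d does not."""
    return a < b and c < d and not (a - c < b - d)


def division_fails(a, b, c, d):
    """Return True when 0 < a < b and 0 < c < d hold but a/c < b/d does not.

    Integer inputs are assumed so that the quotients are exact rationals.
    """
    if not (0 < a < b and 0 < c < d):
        return False
    return not (Fraction(a, c) < Fraction(b, d))


# 1 < 2 and 1 < 5, yet 1 - 1 = 0 is not less than 2 - 5 = -3.
print(subtraction_fails(1, 2, 1, 5))

# Dividing 1 < 2 by itself would give the false statement 1 < 1.
print(division_fails(1, 2, 1, 2))
```

A single counterexample is enough to refute a universal claim, so a search of this kind is a legitimate disproof, whereas no finite search could establish the rules that do hold.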

1.3 Bounded Set Terminology

Some additional points about the real numbers shall be useful later. Let S be a set of real numbers. If there is a number B such that s ≤ B for every s ∈ S, then S is bounded above and B is an upper bound for S. Of course, a set that is bounded above has many upper bounds. If there exists M such that M is an upper bound for S and no number less than M is an upper bound for S, then M is called the least upper bound or supremum for S, denoted by sup S. If sup S ∈ S, we may refer to sup S as the maximum of S, denoted by max S. These concepts are elementary, but the reader unfamiliar with them should not pass over them too quickly, and we offer an example for clarification.

Example 1.2. The interval $I_1 = [0, 1]$ is bounded above by 1 (and by any x > 1). No number less than 1 is an upper bound for $I_1$, because if y < 1, then there exists ε > 0 such that $y + \varepsilon \in I_1$. Hence $\sup I_1 = 1$. Moreover, $1 \in I_1$ so that $\max I_1 = 1$. The interval $I_2 = [0, 1)$, on the other hand, has no maximum value although it does have a supremum (viz., 1).

A fundamental property of the real numbers is that any nonempty set of real numbers that is bounded above has a supremum. Similarly, we can formulate definitions for the analogous concepts of bounded below, lower bound, and greatest lower bound or infimum. The infimum of S is symbolized as inf S, and always exists for sets that are bounded below. If inf S ∈ S, then S has a minimum denoted by min S. Again, the supremum and infimum concepts are convenient and important because for a bounded set these values always exist, whereas maximum and minimum values may not.

The supremum and infimum concepts may also be applied to functions. Let f be a real-valued function with domain D, and let S be a nonempty subset of D. The image of S under f is

\[ f(S) = \{ f(x) \mid x \in S \} \]

and we have, by definition,

\[ \sup_{x \in S} f(x) = \sup f(S) \quad \text{and} \quad \inf_{x \in S} f(x) = \inf f(S). \]

If f(S) is not bounded above or below, respectively, we write sup f(x) = ∞ or inf f(x) = −∞. Various properties of, and relations between, suprema and infima are given in Exercises 1.10 and 1.11.

1.4 Quadratic Inequalities

Consider the quadratic polynomial $g(x) = ax^2 + 2bx + c$, a ≠ 0, with discriminant $\Delta = b^2 - ac$. Completing the square, we have

\[ \frac{1}{a}\, g(x) = \left( x + \frac{b}{a} \right)^2 - \frac{\Delta}{a^2}. \]

Therefore, ∆ ≤ 0 implies (1/a)g(x) ≥ 0 for all x. Conversely, if (1/a)g(x) ≥ 0 for all x, then in particular letting x = −b/a gives ∆ ≤ 0. It is clear that

• (1/a)g(x) ≥ 0 for all x if and only if ∆ ≤ 0;
• (1/a)g(x) > 0 for all x if and only if ∆ < 0.

Geometrically, of course, the graph of g(x) is a parabola, with roots $(-b \pm \sqrt{\Delta})/a$ by the quadratic formula. If ∆ ≤ 0, then g(x) does not have two distinct real roots; hence its graph never crosses the real axis, and g(x) has the same sign everywhere. Conversely, if either g(x) ≥ 0 for all x or g(x) ≤ 0 for all x, then ∆ ≤ 0. If ∆ < 0, then g(x) has no real roots; hence its graph never touches the real axis, and g(x) is strictly positive or strictly negative. Conversely, if either g(x) > 0 for all x or g(x) < 0 for all x, then ∆ < 0.
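The discriminant test is easy to mechanize. The sketch below is a minimal illustration under the section's convention g(x) = ax² + 2bx + c (note the factor 2 on the linear term); the function names are hypothetical and chosen here only for the example.

```python
def discriminant(a, b, c):
    """Discriminant b^2 - ac of g(x) = a x^2 + 2 b x + c."""
    return b * b - a * c


def scaled_value(a, b, c, x):
    """Evaluate (1/a) g(x) through the completed square (x + b/a)^2 - D/a^2."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return (x + b / a) ** 2 - discriminant(a, b, c) / a ** 2


def keeps_one_sign(a, b, c):
    """True when D <= 0, i.e. when (1/a) g(x) >= 0 for every real x."""
    return discriminant(a, b, c) <= 0


# g(x) = x^2 + 2x + 1 = (x + 1)^2 never changes sign; g(x) = x^2 - 1 does.
print(keeps_one_sign(1, 1, 1), keeps_one_sign(1, 0, -1))
```

Evaluating through the completed square rather than the expanded form makes the role of ∆ visible: the squared term is never negative, so the sign question reduces to the sign of ∆.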

1.5 Absolute Value and the Triangle Inequality

The absolute value |x| of x ∈ R is given by

\[ |x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases} \]

Some useful properties hold for the absolute value. We have:

• |x| ≥ 0, with equality if and only if x = 0;
• |ab| = |a||b| and, if b ≠ 0, |a/b| = |a|/|b|;
• |x − a| < b if and only if −b < x − a < b;
• −|a| ≤ a ≤ |a|, and ab ≤ |a||b|;
• |a| ≤ |b| if and only if $a^2 \le b^2$.

Example 1.3. The set

\[ N_\varepsilon(x_0) = \{ x \in \mathbb{R} \mid |x - x_0| < \varepsilon \} \]

is called an ε-neighborhood of $x_0$ in R.

Example 1.4. The restriction |x − 3| < 1 is sufficient to ensure that the inequality $|x + 2|^{-1} < 1/4$ holds, because

\[ |x - 3| < 1 \;\Rightarrow\; -1 < x - 3 < 1 \;\Rightarrow\; 4 < x + 2 < 6 \;\Rightarrow\; |x + 2| > 4 \]

and we can then take reciprocals.

This is a convenient place to introduce sequences and convergence into our discussion. A sequence $\{a_n\}$ has limit A as n tends to ∞, written

\[ \lim_{n \to \infty} a_n = A \]

or $a_n \to A$ as n → ∞, if and only if for every positive number ε, there exists a corresponding number N such that the inequality $|a_n - A| < \varepsilon$ holds whenever n > N. The sequence is said to converge to A. The following simple fact about sequences will be crucial in our later work with inequalities, because it will provide a framework for passage from results for finite sums to corresponding results for series or integrals.

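Lemma 1.1. Let $a_n \to A$ and $b_n \to B$ as n → ∞, with $a_n \le b_n$ for all n. Then A ≤ B.

Proof. Suppose A > B, and let ε = (A − B)/2 > 0. Then there exist $N_a$ and $N_b$ such that

\[ n > N_a \;\Rightarrow\; |a_n - A| < \varepsilon, \]

\[ n > N_b \;\Rightarrow\; |b_n - B| < \varepsilon. \]

Choose $n > \max\{N_a, N_b\}$ and add the two inequalities

\[ -\varepsilon < b_n - B < \varepsilon, \]

\[ -\varepsilon < A - a_n < \varepsilon, \]

to obtain

\[ -2\varepsilon < b_n - a_n + (A - B) < 2\varepsilon, \]

or

\[ -2\varepsilon < b_n - a_n + 2\varepsilon < 2\varepsilon. \]

This implies that $b_n - a_n < 0$, a contradiction.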


Note, however, that strict inequality can be lost in such a limiting process. Consider, for instance, the case $a_n = 0$, $b_n = 1/n$; we have $a_n < b_n$ for all n, but A = B = 0.

In certain cases, absolute value signs can be cleared from an inequality (i.e., an equivalent inequality can be obtained) through the equivalence

\[ |a| \le |b| \;\Leftrightarrow\; a^2 \le b^2. \]

This can help us verify inequalities that involve several pairs of such signs.

Example 1.5. To establish

||a| − |b|| ≤ |a− b|

we can use the following chain of equivalences, eventually reaching an inequality that is self-evident:

\[ \bigl|\,|a| - |b|\,\bigr| \le |a - b| \;\Leftrightarrow\; (|a| - |b|)^2 \le (a - b)^2 \;\Leftrightarrow\; -2|a||b| \le -2ab \;\Leftrightarrow\; |a||b| \ge ab. \]

A pivotal inequality involving the absolute value is as follows:

Theorem 1.1 (Triangle Inequality). Let $z_1, \ldots, z_n$ be nonzero complex numbers. Then

\[ \left| \sum_{i=1}^{n} z_i \right| \le \sum_{i=1}^{n} |z_i|. \tag{1.1} \]

Equality holds if and only if the $z_i$ all have the same arguments.

Proof. The case n = 1 is trivial, so we examine n = 2 as the verification step for mathematical induction. Some elementary facts about complex numbers are needed here. For z ∈ C,

\[ \Re[z] = x \le \sqrt{x^2 + y^2} = |z|. \]

Similarly, $\Im[z] \le |z|$. It is also easily shown that

\[ |\bar{z}| = |z|, \qquad |z|^2 = z\bar{z}, \qquad \Re[z] = \tfrac{1}{2}(z + \bar{z}), \qquad \Im[z] = \tfrac{1}{2i}(z - \bar{z}). \]

We use these facts as follows:

\[ |z_1 + z_2|^2 = (z_1 + z_2)(\bar{z}_1 + \bar{z}_2) = |z_1|^2 + |z_2|^2 + 2\Re[z_1 \bar{z}_2]. \]

However,

\[ 2\Re[z_1 \bar{z}_2] \le 2|z_1 \bar{z}_2| = 2|z_1||z_2| \]

so that

\[ |z_1 + z_2|^2 \le |z_1|^2 + |z_2|^2 + 2|z_1||z_2| = (|z_1| + |z_2|)^2. \]

Taking a square root and noting that both sides are nonnegative, we obtain

\[ |z_1 + z_2| \le |z_1| + |z_2|. \tag{1.2} \]

In general, we have

\[ \left| \sum_{i=1}^{n+1} z_i \right| = \left| \sum_{i=1}^{n} z_i + z_{n+1} \right| \le \left| \sum_{i=1}^{n} z_i \right| + |z_{n+1}| \le \sum_{i=1}^{n} |z_i| + |z_{n+1}| = \sum_{i=1}^{n+1} |z_i|. \]

For a shorter proof and a discussion of the conditions for equality, see Exercise 4.9.

Similarly,

\[ |z_1 - z_2|^2 = |z_1|^2 + |z_2|^2 - 2\Re[z_1 \bar{z}_2] \ge |z_1|^2 + |z_2|^2 - 2|z_1||z_2| = (|z_1| - |z_2|)^2, \]

so that

\[ |z_1 - z_2| \ge \bigl|\,|z_1| - |z_2|\,\bigr| \]

and this can be combined with (1.2) in the convenient form

\[ \bigl|\,|z_1| - |z_2|\,\bigr| \le |z_1 \pm z_2| \le |z_1| + |z_2| \tag{1.3} \]

which provides both upper and lower bounds for the modulus of a sum or difference. Geometrically, the length of any side of a triangle can neither exceed the sum of the lengths of the two remaining sides, nor fall short of the difference in the lengths of the two remaining sides.

Opportunities to employ the triangle inequality range from direct applications to those requiring prior setup or possibly multiple, sequential applications of the inequality.
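Before turning to the examples, the two-sided bound (1.3) can be spot-checked numerically. The Python sketch below is only a sanity check on random complex samples, not a proof; the helper names, the sample range, and the tolerance `tol` (which absorbs floating-point rounding) are assumptions made for illustration.

```python
import random


def two_sided_bounds(z1, z2):
    """Return the lower and upper bounds of (1.3): ||z1| - |z2|| and |z1| + |z2|."""
    return abs(abs(z1) - abs(z2)), abs(z1) + abs(z2)


def satisfies_1_3(z1, z2, tol=1e-9):
    """Check that both |z1 + z2| and |z1 - z2| lie between the two bounds."""
    lower, upper = two_sided_bounds(z1, z2)
    return all(lower - tol <= abs(w) <= upper + tol for w in (z1 + z2, z1 - z2))


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [
    (complex(rng.uniform(-5, 5), rng.uniform(-5, 5)),
     complex(rng.uniform(-5, 5), rng.uniform(-5, 5)))
    for _ in range(1000)
]
print(all(satisfies_1_3(z1, z2) for z1, z2 in samples))
```

Python's built-in `abs` returns the modulus of a complex number, so the same helper serves for real and complex arguments alike.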

Example 1.6. We show that

\[ |a + b|^{1/2} \le |a|^{1/2} + |b|^{1/2} \]

for a, b ∈ R. Because both sides are nonnegative, we may square both sides to obtain the equivalent statement

\[ |a + b| \le |a| + |b| + 2|a|^{1/2}|b|^{1/2}. \]

But this is implied by the triangle inequality, completing the proof.


Example 1.7. The uniqueness of a sequence limit is easily established. We begin by supposing $\{a_n\}$ has limits $A_1$, $A_2$. Let ε > 0; then there exist $N_1$ and $N_2$ such that

\[ n > N_1 \;\Rightarrow\; |a_n - A_1| < \varepsilon, \]

\[ n > N_2 \;\Rightarrow\; |a_n - A_2| < \varepsilon. \]

We choose $n > \max\{N_1, N_2\}$ and write

\[ |A_1 - A_2| = |A_1 - a_n + a_n - A_2| \le |A_1 - a_n| + |A_2 - a_n| < \varepsilon + \varepsilon = 2\varepsilon. \]

Because ε is arbitrary, it follows that $|A_1 - A_2| = 0$ and hence $A_1 = A_2$.

Example 1.8. To show that the limit of a product of sequences is the product of the limits, the triangle inequality is employed only after some algebraic manipulation. Supposing $a_n \to A$ and $b_n \to B$ as n → ∞, we have

\[ |a_n b_n - AB| = |(a_n - A)(b_n - B) + A(b_n - B) + B(a_n - A)| \le |a_n - A||b_n - B| + |A||b_n - B| + |B||a_n - A|. \]

Each quantity on the right can now be made arbitrarily small for n sufficiently large, and the proof is easily completed.

Example 1.9. With z = x + iy, we have

\[ |\sinh y| \le |\sin z| \le \cosh y \]

because

\[ \frac{\bigl|\,|e^{ix}||e^{-y}| - |e^{-ix}||e^{y}|\,\bigr|}{|2i|} \le \left| \frac{e^{i(x+iy)} - e^{-i(x+iy)}}{2i} \right| \le \frac{|e^{ix}||e^{-y}| + |e^{-ix}||e^{y}|}{|2i|} \]

by (1.3), where $|e^{\pm ix}| = |i| = 1$.

Example 1.10. Here is a rough theorem for bounding polynomial zeros. Suppose that

\[ f(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_{n-1} z + a_n \]

($a_0 \ne 0$) where z is complex and where the coefficients $a_i$ may be real or complex. We assert and would like to show that all the zeros of f(z) have moduli less than or equal to the number

\[ \xi = 1 + \frac{n}{|a_0|} \Bigl( \max_{1 \le k \le n} |a_k| \Bigr). \tag{1.4} \]

We look for a disk such that outside the disk the leading term of the polynomial dominates; that is, we seek ξ such that

\[ |z| > \xi \;\Rightarrow\; |a_0 z^n| > \left| \sum_{i=0}^{n-1} a_{n-i} z^i \right|. \]


First note that if |z| ≥ 1 we have

\[ \sum_{i=0}^{n-1} |z|^i \le n|z|^{n-1} \]

and so

\[ \left| \sum_{i=0}^{n-1} a_{n-i} z^i \right| \le \sum_{i=0}^{n-1} |a_{n-i}||z|^i \le M \sum_{i=0}^{n-1} |z|^i \le nM|z|^{n-1}, \]

where $M = \max |a_k|$. We now choose ξ as in (1.4) so that |z| > ξ implies both |z| > 1 and $|z| > nM/|a_0|$. Then for |z| > ξ,

\[ |a_0||z|^n - nM|z|^{n-1} > 0 \]

and

\[ |f(z)| = \left| a_0 z^n + \sum_{i=0}^{n-1} a_{n-i} z^i \right| \ge |a_0 z^n| - \left| \sum_{i=0}^{n-1} a_{n-i} z^i \right| \ge |a_0 z^n| - nM|z|^{n-1} > 0 \]

by (1.3). Thus all zeros of f(z) are in the disk |z| ≤ ξ. (This argument is used in complex variable theory to prove the fundamental theorem of algebra via Rouché's theorem.)
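The bound (1.4) can be tried out on a polynomial whose zeros are known in advance. The sketch below builds the coefficients from a chosen list of roots and then evaluates ξ; the helper names `poly_from_roots` and `zero_bound`, the leading coefficient, and the sample roots are all hypothetical choices made for this illustration.

```python
def poly_from_roots(roots, leading=1):
    """Coefficients [a0, a1, ..., an] of leading * prod(z - r), highest power first."""
    coeffs = [complex(leading)]
    for r in roots:
        # Multiply the current polynomial by (z - r).
        shifted = coeffs + [0j]                    # coefficients of z * p(z)
        scaled = [0j] + [-r * c for c in coeffs]   # coefficients of -r * p(z)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


def zero_bound(coeffs):
    """xi = 1 + (n / |a0|) * max over 1 <= k <= n of |a_k|, as in (1.4)."""
    n = len(coeffs) - 1
    a0 = coeffs[0]
    if n < 1 or a0 == 0:
        raise ValueError("need degree >= 1 and a nonzero leading coefficient")
    return 1 + n * max(abs(a) for a in coeffs[1:]) / abs(a0)


roots = [2, -3, 1 + 4j, -0.5j]
xi = zero_bound(poly_from_roots(roots, leading=2))
print(all(abs(r) <= xi for r in roots))
```

The bound is deliberately rough: for these sample roots ξ is far larger than the largest modulus actually present, which is the price paid for a formula that needs only the coefficient magnitudes.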

1.6 Miscellaneous Examples

A number of powerful standard inequalities are available for application in addition to the triangle inequality, and these will be developed in Chapter 3. However, we can also “invent” useful inequalities in an ad hoc manner based on simple ideas. For instance, the obvious fact that

\[ \frac{1}{n^2} < \frac{1}{n(n-1)} \]

for all n > 1 is used in the following example:

Example 1.11. A sequence $\{a_n\}$ is increasing if for every n ∈ N, $a_n \le a_{n+1}$. It can be shown that every increasing sequence that is bounded above is convergent (Exercise 1.12). We can use this fact to prove convergence of the increasing sequence

\[ 1, \quad 1 + \frac{1}{4}, \quad 1 + \frac{1}{4} + \frac{1}{9}, \quad 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16}, \quad \cdots, \]

because its terms for n ≥ 2 are given by

\[ a_n = 1 + \sum_{k=1}^{n-1} \frac{1}{(k+1)^2} \le 1 + \sum_{k=1}^{n-1} \frac{1}{k(k+1)} = 1 - \sum_{k=1}^{n-1} \left( \frac{1}{k+1} - \frac{1}{k} \right) = 1 - \left( \frac{1}{n} - 1 \right) < 2. \]

Here we have illustrated a common practice: the use of an inequality as an alternative to an attempt at direct algebraic simplification of the original expression. The idea, of course, is to compare a difficult expression with one that is simpler. We also used partial fraction expansion and, to evaluate the summation, the telescoping property (Exercise 1.7).
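The comparison in Example 1.11 can be checked with exact rational arithmetic. In the sketch below, `partial_sum` and `telescoped_bound` are names invented for the illustration; the second returns the value 2 − 1/n that the telescoping sum produces.

```python
from fractions import Fraction


def partial_sum(n):
    """a_n = 1 + 1/4 + ... + 1/n^2 as an exact rational number."""
    return sum(Fraction(1, k * k) for k in range(1, n + 1))


def telescoped_bound(n):
    """The comparison value 1 - (1/n - 1) = 2 - 1/n from the telescoping sum."""
    return 2 - Fraction(1, n)


for n in (2, 5, 10, 50):
    a_n = partial_sum(n)
    # Each partial sum sits below the telescoped bound, which sits below 2.
    print(n, float(a_n), float(telescoped_bound(n)), a_n <= telescoped_bound(n) < 2)
```

Using `Fraction` rather than floating point keeps the comparison exact, so a reported `True` reflects the inequality itself and not rounding.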

Some inequalities may not be so obvious at first glance; however, they become clear upon revelation of the process by which they are obtained.

Example 1.12. If n ∈ N and n > 1, then

\[ \frac{1}{4n} < \frac{1}{(n+1)^2} + \frac{1}{(n+2)^2} + \cdots + \frac{1}{(n+n)^2} < \frac{1}{n}. \]

To see why, we note that of the n terms in the expression

\[ a_n = \frac{1}{(n+1)^2} + \frac{1}{(n+2)^2} + \cdots + \frac{1}{(n+n)^2} \]

the smallest is $1/(n+n)^2$ and the largest is $1/(n+1)^2$. Obviously

\[ n \left[ \frac{1}{(n+n)^2} \right] < a_n < n \left[ \frac{1}{(n+1)^2} \right] \]

or

\[ \frac{1}{4n} < a_n < \frac{n}{(n+1)^2}. \]

Reducing the denominator of the rightmost member (we give a little and replace n + 1 by n) yields the desired result.
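The chain of bounds in Example 1.12, including the intermediate value n/(n + 1)², can be verified exactly for a range of n. This is a minimal sketch with hypothetical helper names; it checks instances and does not replace the argument above.

```python
from fractions import Fraction


def tail_sum(n):
    """a_n = 1/(n+1)^2 + 1/(n+2)^2 + ... + 1/(n+n)^2 as an exact rational."""
    return sum(Fraction(1, (n + k) ** 2) for k in range(1, n + 1))


def bounds_hold(n):
    """Check 1/(4n) < a_n < n/(n+1)^2 < 1/n for the given n."""
    a_n = tail_sum(n)
    return Fraction(1, 4 * n) < a_n < Fraction(n, (n + 1) ** 2) < Fraction(1, n)


print(all(bounds_hold(n) for n in range(2, 100)))
# For n = 1 the sum has a single term equal to 1/4, so the strict lower bound fails.
print(bounds_hold(1))
```

The second call shows why the hypothesis n > 1 is needed: with only one term, the smallest and largest terms coincide and the strict inequalities collapse to equality.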

The give-a-little approach can also be combined with other techniques, such as mathematical induction.

Example 1.13. To show that for n ∈ N and 0 ≤ x ≤ 1,

\[ (1 + x)^n \le 1 + (2^n - 1)x, \]

we put n = k into the inequality and create a proposition P(k). P(1) is 1 + x ≤ 1 + x, and thus holds trivially. It remains to show that P(k) ⇒ P(k + 1). Hence we assume P(k) holds, and multiply both sides by the positive quantity 1 + x:

\[ (1 + x)^{k+1} \le (1 + x) + (1 + x)(2^k - 1)x \le (1 + x) + 2(2^k - 1)x = 1 + (2^{k+1} - 1)x. \]

This is P(k + 1), as required.

Other useful inequalities, such as

\[ (1 + x)^n > \frac{n(n-1)}{2}\, x^2 \]

for x > 0, n ∈ N, n > 1, may be obtained ad hoc from the binomial expansion

\[ (1 + x)^n = 1 + nx + \frac{n(n-1)}{2!}\, x^2 + \cdots. \tag{1.5} \]
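Both the induction result of Example 1.13 and the bound read off from the binomial expansion can be tested on a grid of exact rational values. The function names and the particular grid in the sketch below are assumptions chosen for illustration.

```python
from fractions import Fraction


def induction_bound_holds(n, x):
    """Check (1 + x)^n <= 1 + (2^n - 1) x, intended for 0 <= x <= 1."""
    return (1 + x) ** n <= 1 + (2 ** n - 1) * x


def binomial_bound_holds(n, x):
    """Check (1 + x)^n > n(n - 1)/2 * x^2, intended for x > 0 and n > 1."""
    return (1 + x) ** n > Fraction(n * (n - 1), 2) * x ** 2


unit_grid = [Fraction(k, 10) for k in range(0, 11)]       # x in [0, 1]
positive_grid = [Fraction(k, 4) for k in range(1, 41)]    # x in (0, 10]

print(all(induction_bound_holds(n, x) for n in range(1, 13) for x in unit_grid))
print(all(binomial_bound_holds(n, x) for n in range(2, 13) for x in positive_grid))
# Outside 0 <= x <= 1 the first inequality can fail, e.g. n = 2, x = 2: 9 > 7.
print(induction_bound_holds(2, Fraction(2)))
```

The last line illustrates where the restriction x ≤ 1 enters: the induction step replaces 1 + x by 2, which is only a valid overestimate on that interval.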

Example 1.14. To show that

\[ \lim_{n \to \infty} \frac{n}{a^n} = 0 \]

for a > 1, we can write

\[ \frac{n}{a^n} = \frac{n}{[1 + (a - 1)]^n} < \frac{n}{\frac{n(n-1)}{2}(a - 1)^2} = \frac{2}{(n - 1)(a - 1)^2}. \]

Then, for n > 1,

\[ 0 < \frac{n}{a^n} < \frac{2}{(n - 1)(a - 1)^2} \]

and as n → ∞ the nth term of the sequence is squeezed to zero. This squeezing idea will occur repeatedly throughout the book, and a rigorous justification is requested in Exercise 1.4.
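The squeeze in Example 1.14 can be watched numerically. The sketch below uses exact rationals and the sample value a = 3/2; the function names and that choice of a are assumptions for illustration only.

```python
from fractions import Fraction


def ratio(n, a):
    """The sequence term n / a^n."""
    return n / a ** n


def squeeze_bound(n, a):
    """The upper bound 2 / ((n - 1)(a - 1)^2), valid for n > 1 and a > 1."""
    if n <= 1 or a <= 1:
        raise ValueError("the bound requires n > 1 and a > 1")
    return 2 / ((n - 1) * (a - 1) ** 2)


a = Fraction(3, 2)
for n in (2, 5, 10, 20, 40):
    # The term stays strictly between 0 and a bound that shrinks like 1/n.
    print(n, float(ratio(n, a)), float(squeeze_bound(n, a)))
```

The bound decays only like 1/n while the term itself decays geometrically, so the estimate is far from sharp; it is nevertheless all that the squeeze argument requires.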

Many problems of physical interest rely on simple inequality concepts for their solution. We give two illustrations at this point, and will be able to offer many more after further stocking our arsenal with inequalities.

Example 1.15. Consider a person walking across flat, nonslippery terrain. One leg swings forward pendulum-like while the other foot is planted firmly on the ground. In a simple biomechanical model [2] for the body during the stride, the leg attached to the planted foot is represented by a straight, rigid member of length L, while the rest of the body is a point mass m on top of the leg (Figure 1.1). This mass describes a circular arc at some tangential velocity v, with centripetal acceleration $v^2/L$. This value of acceleration certainly cannot exceed the free-fall acceleration constant g, and we are led immediately to the inequality

\[ v \le \sqrt{gL}. \]

[FIGURE 1.1. Example on walking: a point mass m atop a rigid leg of length L, moving with tangential velocity v.]

We can use this to estimate the speed at which a transition from the walking gait to a running gait must occur for an average person with L ≈ 0.9 meters; since g = 9.8 m/s², the model suggests that the person must run to exceed a speed of 3 m/s.
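The arithmetic behind the 3 m/s figure is a one-line computation. The helper below is a sketch with a hypothetical name, using the values L ≈ 0.9 m and g = 9.8 m/s² quoted in the example.

```python
import math


def max_walking_speed(leg_length_m, g=9.8):
    """Upper bound sqrt(g * L) on walking speed in m/s, from v^2 / L <= g."""
    if leg_length_m <= 0 or g <= 0:
        raise ValueError("leg length and g must be positive")
    return math.sqrt(g * leg_length_m)


# sqrt(9.8 * 0.9) is just under 3 m/s.
print(round(max_walking_speed(0.9), 2))
```

Because the bound scales with the square root of L, doubling the leg length raises the limiting speed by a factor of about 1.41 rather than 2.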

Example 1.16. Electric current divides among parallel resistors in such a way that power dissipation is minimized. For suppose that resistances $R_1, \ldots, R_m$ are connected in parallel, let i be the total current entering the parallel combination, and let $i_n$ be the current through $R_n$. The resistors all share an identical voltage v. The power dissipated in $R_n$ is $i_n^2 R_n$, and the total power dissipated is given by

\[ P = \sum_{n=1}^{m} i_n^2 R_n. \]

Consider now what would happen if the total current i were distributed in some other way. Letting the current through $R_n$ be $i_n + \delta_n$ instead, the constraint

\[ \sum_{n=1}^{m} (i_n + \delta_n) = i \]

implies that the $\delta_n$ sum to 0. If P′ is the new dissipated power, then

\[ P' - P = \sum_{n=1}^{m} (i_n + \delta_n)^2 R_n - \sum_{n=1}^{m} i_n^2 R_n = 2v \sum_{n=1}^{m} \delta_n + \sum_{n=1}^{m} \delta_n^2 R_n, \]

where we have used $i_n R_n = v$. Hence

\[ P' - P = \sum_{n=1}^{m} \delta_n^2 R_n \ge 0 \]

and we conclude that P′ ≥ P.
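The identity for P′ − P in Example 1.16 can be confirmed on a small circuit. The sketch below computes the physical current split from the shared voltage and then perturbs it; the function names, the three resistor values, and the perturbation are hypothetical choices made for the illustration.

```python
from fractions import Fraction


def parallel_currents(total_current, resistances):
    """Physical split i_n = v / R_n, with common voltage v = i / sum(1 / R_n)."""
    conductance = sum(1 / r for r in resistances)
    v = total_current / conductance
    return [v / r for r in resistances]


def dissipated_power(currents, resistances):
    """Total power: the sum of i_n^2 * R_n."""
    return sum(i * i * r for i, r in zip(currents, resistances))


def power_excess(total_current, resistances, deltas):
    """P' - P when the current in R_n is changed by delta_n (deltas sum to zero)."""
    if abs(sum(deltas)) > 1e-12:
        raise ValueError("perturbations must sum to zero to conserve total current")
    base = parallel_currents(total_current, resistances)
    perturbed = [i + d for i, d in zip(base, deltas)]
    return dissipated_power(perturbed, resistances) - dissipated_power(base, resistances)


R = [Fraction(1), Fraction(2), Fraction(3)]
deltas = [Fraction(1, 2), Fraction(-1, 3), Fraction(-1, 6)]
excess = power_excess(Fraction(6), R, deltas)
# The excess should equal the sum of delta_n^2 * R_n, which is never negative.
print(excess, excess == sum(d * d * r for d, r in zip(deltas, R)))
```

With exact fractions the cross term 2v Σ δₙ vanishes identically, so the computed excess matches Σ δₙ² Rₙ with no rounding error.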


1.7 Exercises

1.1. Assuming n, m ∈ N, prove the following:

(a) x, y > 1 ⇒ x + y < 2xy.

(b) x > 0 ⇒ $x + x^{-1} \ge 2$.

(c) $(x + y)^2 \le 2(x^2 + y^2)$.

(d) n > 2 ⇒ $n! > 2^{n-1}$.

(e) $2^{n+1} > n^2$.

(f) a, b > 0 ⇒ $a^4 + b^4 \ge ab(a^2 + b^2)$.

(g) n > 1 ⇒ $\left( \dfrac{n^2}{n^2 - 1} \right)^n > 1 + \dfrac{1}{n}$.

(h) $(n + m)! \ge n!\,(n + 1)^m$.

(i) $\sum_{n=1}^{m} \dfrac{1}{\sqrt{n}} > 2(\sqrt{m + 1} - 1)$.

(j) sinh x ≤ cosh x.

(k) cosh x ≥ 1.

1.2. Simple physical applications:

(a) An ice skater spins with arms fully outstretched. Show that when she pulls in her arms, her angular frequency and rotational kinetic energy both increase.

(b) Show that the parallel combination of a set of electrical resistors gives an equivalent resistance that cannot exceed any of the individual resistances in the set.

1.3. Establish Weierstrass's inequalities:

(a) For positive real numbers $a_1, \ldots, a_n$,

\[ \prod_{i=1}^{n} (1 + a_i) \ge 1 + \sum_{i=1}^{n} a_i. \]

(b) For n ≥ 2 with $0 \le a_i < 1$,

\[ \prod_{i=1}^{n} (1 - a_i) > 1 - \sum_{i=1}^{n} a_i. \]

(For related inequalities, along with applications to the convergence of infinite products, see Bromwich [12].)

1.4. Problems on sequence limits:

(a) Prove the squeeze principle: if $a_n \to L$ and $c_n \to L$ as n → ∞ and in addition $a_n \le b_n \le c_n$ for n > N, then $b_n \to L$.

(b) Show that $n^{1/n} \to 1$ as n → ∞.

(c) Find the limit of $n!/n^n$ as n → ∞.


1.5. Given positive numbers $a_1, \ldots, a_m$, the numbers A, H, and G defined by

\[ A = \frac{1}{m} \sum_{n=1}^{m} a_n, \qquad H = \left( \frac{1}{m} \sum_{n=1}^{m} a_n^{-1} \right)^{-1}, \qquad G = \left( \prod_{n=1}^{m} a_n \right)^{1/m}, \]

are called the arithmetic, harmonic, and geometric means, respectively, of the set. Show that if the $a_n$ are not all equal, then each of the means lies between the minimum and maximum values of the $a_n$.

1.6. The Fibonacci numbers $f_n$ are defined by the recursion

\[ f_n = f_{n-1} + f_{n-2} \]

with $f_1 = f_2 = 1$. Show that

\[ f_n < 2^n \]

for n ∈ N.

1.7. Prove the telescoping property

\[ \sum_{n=1}^{m} (a_{n+1} - a_n) = a_{m+1} - a_1 \]

for finite summations. Note that if $b_1, \ldots, b_m$ is another set of numbers such that $b_n > a_{n+1} - a_n$, then the inequality

\[ \sum_{n=1}^{m} b_n > a_{m+1} - a_1 \]

may be asserted even if the sum on the left happens to be unavailable in closed form. Apply this idea with $a_n = \sqrt{n}$ to show that

\[ 2(\sqrt{m + 1} - \sqrt{m}) < \sum_{n=1}^{m} \frac{1}{\sqrt{n}}. \]

1.8. For a number system S to be ordered, S must have a subset P of positive numbers such that the following requirements are met. First, the sum and product of any pair of positive numbers must also be positive. Second, for every x ∈ S exactly one of the conditions x = 0, x ∈ P, −x ∈ P must hold. Use this definition to establish that the system C cannot be ordered.

1.9. Show that with z = x+ iy:

(a) |sinh y| ≤ |cos z| ≤ cosh y;

(b) sinh |x| ≤ |sinh z| ≤ cosh x.

1.10. Supply proofs for the following assertions about suprema and infima. Assume all sets are subsets of R.

(a) If sup S exists, then it is unique (likewise for inf S).

(b) If x is an upper bound for S and x ∈ S, then x = max S.

(c) We have s = sup S if and only if: (1) for all ε > 0, if x ∈ S, then x < s + ε; and (2) for all ε > 0, there exists y ∈ S such that y > s − ε.

(d) If sup S exists, then sup S = inf U, where U is the set of all upper bounds for S. If inf S exists, then inf S = sup L, where L is the set of all lower bounds for S.

(e) If S is nonempty, then inf S ≤ sup S, with equality if and only if S contains a single number.

(f) Let A ⊆ B. If sup A and sup B both exist, then sup A ≤ sup B. If inf A and inf B both exist, then inf A ≥ inf B.

(g) Let $S^- = \{ x \mid -x \in S \}$ where S is a bounded set. Then

\[ \sup S = -\inf S^-, \]

\[ \inf S = -\sup S^-. \]

(h) Let $S_c = \{ x \mid x/c \in S \}$ where S is bounded and c > 0. Then

\[ \sup S_c = c \sup S, \]

\[ \inf S_c = c \inf S. \]

1.11. Assume that f(x) and g(x) are defined over a common domain D, and establish the following relations. (Take all subsets to be nonempty.)

(a) If $S_1 \subseteq S_2 \subseteq D$, then

\[ \sup_{x \in S_1} f(x) \le \sup_{x \in S_2} f(x), \]

\[ \inf_{x \in S_1} f(x) \ge \inf_{x \in S_2} f(x). \]

(b) If f(x) ≤ g(x) for all x ∈ S ⊆ D, then

\[ \sup_{x \in S} f(x) \le \sup_{x \in S} g(x), \]

\[ \inf_{x \in S} f(x) \le \inf_{x \in S} g(x). \]

(c) For S ⊆ D,

\[ \sup_{x \in S} [f(x) + g(x)] \le \sup_{x \in S} f(x) + \sup_{x \in S} g(x), \]

\[ \inf_{x \in S} [f(x) + g(x)] \ge \inf_{x \in S} f(x) + \inf_{x \in S} g(x), \]

\[ \sup_{x \in S} [-f(x)] = -\inf_{x \in S} f(x), \]

\[ \inf_{x \in S} [-f(x)] = -\sup_{x \in S} f(x), \]

\[ \sup_{x \in S} [f(x) - g(x)] \le \sup_{x \in S} f(x) - \inf_{x \in S} g(x), \]

\[ \inf_{x \in S} [f(x) - g(x)] \ge \inf_{x \in S} f(x) - \sup_{x \in S} g(x). \]

(d) If f(x) ≥ 0 and g(x) ≥ 0 whenever x ∈ S ⊆ D, then

\[ \inf_{x \in S} [f(x) g(x)] \ge \inf_{x \in S} f(x) \, \inf_{x \in S} g(x), \]

\[ \sup_{x \in S} [f(x) g(x)] \le \sup_{x \in S} f(x) \, \sup_{x \in S} g(x), \]

\[ \sup_{x \in S} [1/f(x)] = 1 / \inf_{x \in S} f(x), \]

\[ \inf_{x \in S} [1/f(x)] = 1 / \sup_{x \in S} f(x). \]

1.12. Prove that every bounded monotonic sequence converges. (Recall that a sequence is monotonic if it is either increasing or decreasing; it is increasing if $a_m \ge a_n$ whenever m > n, or decreasing if $a_m \le a_n$ whenever m > n.)

1.13. By definition

\[ \lim_{x \to \infty} f(x) = L, \]

where L is finite, means that corresponding to ε > 0 there exists N > 0 such that |f(x) − L| < ε whenever x > N. Prove the following assertions:

(a) The limit, if it exists, is unique.

(b) The limit of a sum is the sum of the limits, and the limit of a product isthe product of the limits.

(c) If f(x) ≥ 0 for x sufficiently large, then L ≥ 0.


2 Methods from the Calculus

2.1 Introduction

Several topics studied in calculus are basic to working with inequalities. Facts involving function continuity, differentiation, and extrema are particularly important, as are certain facts about integration. We review the most crucial of these here, and then advance to some preliminary applications.

2.2 Function Terminology and Facts

Let I represent an interval. By the statement f(x) is bounded on I, we mean that there is a number B such that for every x ∈ I,

\[ |f(x)| \le B. \]

Example 2.1. The function sin x is bounded on any interval.

Two forms of notation are sometimes useful in comparing the behavior of functions of the same independent variable as that independent variable tends to a limit. Given two functions f(x) and g(x), we may write

\[ f(x) = O(g(x)) \quad \text{as } x \to \infty \]

if there exist positive numbers N and B such that

\[ |f(x)| \le B g(x) \tag{2.1} \]

whenever x > N. Similarly, f(x) = O(g(x)) as $x \to 0^+$ if there exist positive numbers δ and B such that (2.1) holds whenever 0 < x < δ. We understand statements of the form

\[ f(x) = g(x) + O(h(x)) \]

to mean that the function f(x) − g(x) is O(h(x)). If f(x)/g(x) → 0 as $x \to x_0$, we may write

\[ f(x) = o(g(x)) \quad \text{as } x \to x_0. \]

In these O and o statements, the gauge function g is usually chosen to have a simple form, such as $x^{-1}$, 1, or x.

We assume a working knowledge of function limits. One fact, however, is worthy of explicit mention. We omit the proof, which is analogous to the corresponding proof for sequences (Exercise 1.4).

Lemma 2.1 (Squeeze Principle). If g(x) ≤ f(x) ≤ h(x), and if

\[ \lim_{x \to a} g(x) = \lim_{x \to a} h(x) = L, \]

then

\[ \lim_{x \to a} f(x) = L. \]

We also take a moment to review function continuity. The statement f(x) is continuous at x = a means that for every ε > 0, there is a δ > 0 such that |f(x) − f(a)| < ε whenever |x − a| < δ. We write f(x) ∈ C(I) if f is continuous at every x ∈ I (suitable modifications having been made for continuity at endpoints of closed intervals). Continuity of f on [a, b] is denoted by f(x) ∈ C[a, b]. Two useful facts about continuity are the following:

Lemma 2.2 (Persistence of Sign). Suppose f(x) is continuous at $x = x_0$, and suppose that $f(x_0)$ is nonzero. Then there is an open interval containing $x_0$ such that f(x) is nonzero at every point of the interval.

Proof. Assume $f(x_0) > 0$. (Otherwise replace f by −f.) Let $\varepsilon = f(x_0)$. There exists δ > 0 such that $|x - x_0| < \delta$ implies $|f(x) - f(x_0)| < \varepsilon$, so if $x \in (x_0 - \delta, x_0 + \delta)$, then $-\varepsilon < f(x) - f(x_0) < \varepsilon$. Hence $f(x_0) - \varepsilon < f(x) < f(x_0) + \varepsilon$ or, since $f(x_0) = \varepsilon$, 0 < f(x).

Theorem 2.1 (Continuity and Convergence). A function f(x) is continuous at $x_0$ if and only if $f(x_n) \to f(x_0)$ whenever $x_n \to x_0$.

Proof. Suppose f is continuous at $x_0$ and $x_n \to x_0$. Let ε > 0. There exists δ > 0 such that $|x - x_0| < \delta$ implies $|f(x) - f(x_0)| < \varepsilon$. Choose N such that n > N implies $|x_n - x_0| < \delta$. For this N, n > N implies $|f(x_n) - f(x_0)| < \varepsilon$, and therefore $f(x_n) \to f(x_0)$. Conversely, suppose $x_n \to x_0$ implies $f(x_n) \to f(x_0)$. To show f is continuous at $x_0$, we suppose f is not continuous at $x_0$ and seek a contradiction. There exists ε > 0 such that for any δ > 0, there exists some x with $|x - x_0| < \delta$ but $|f(x) - f(x_0)| \ge \varepsilon$. In particular we may choose a sequence $\delta_i = 1/i$ and $x_i$ with $|x_i - x_0| < \delta_i$ but $|f(x_i) - f(x_0)| \ge \varepsilon$ for all i ∈ N. Then $x_i \to x_0$, but it is false that $f(x_i) \to f(x_0)$.

The consequences of continuity on a closed interval are particularly important. We state the following without proof, and refer the reader to any standard calculus text for more details. The first of these consequences is known as the intermediate value property.

Theorem 2.2. If f(x) ∈ C[a, b], then f(x) assumes every value between f(a) and f(b), f(x) is bounded on [a, b], and f(x) takes on its supremum and its infimum in [a, b].

Finally, we review concepts relating to function monotonicity and extrema. A function f(x) is increasing on I if $f(x_2) \ge f(x_1)$ whenever $x_2 > x_1$ for all $x_1, x_2$ in I. Similarly, f(x) is decreasing if $f(x_2) \le f(x_1)$ whenever $x_2 > x_1$. If strict inequality holds we use the terms strictly increasing or strictly decreasing, respectively. Let $x_0 \in I$. If $f(x_0) \ge f(x)$ for all x ∈ I, then f(x) has a maximum on I equal to $f(x_0)$. The definition of minimum is analogous.

2.3 Basic Results for Integrals

The formal definition of the Riemann integral appears in Exercise 2.8. It is helpful to keep in mind that f(x) is integrable on [a, b] if f(x) is continuous or monotonic on [a, b].

Several useful inequalities for integrals can be established by forming Riemann sums. Given an integral

\[ \int_a^b f(x)\,dx, \]

we use the notation $\Delta x = (b - a)/n$ and $x_i = a + i\,\Delta x$ for i = 0, …, n, and write the corresponding Riemann sum as

\[ \sum_{i=1}^{n} f(x_i)\,\Delta x. \]

Once an inequality is established for such a sum, we may let n → ∞ and apply Lemma 1.1 to obtain an inequality involving the integral.

Theorem 2.3. If f(x) and g(x) are integrable on [a, b] with f(x) ≤ g(x), then

\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]

Proof. With the notation described above, we form Riemann sums:

\[ \sum_{i=1}^{n} f(x_i)\,\Delta x \le \sum_{i=1}^{n} g(x_i)\,\Delta x. \]

The result follows as n → ∞ by Lemma 1.1.

Corollary 2.3.1 (Simple Estimate). If f(x) is integrable on [a, b] with m ≤ f(x) ≤ M, then

\[ m(b - a) \le \int_a^b f(x)\,dx \le M(b - a). \]

Corollary 2.3.2 (Modulus Inequality). If f(x) is integrable on [a, b], then

\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \]

The second corollary follows from the inequalities

\[ -|f(x)| \le f(x) \le |f(x)| \]

and plays the role of the triangle inequality for integrals.

If continuity is assumed in the integrand function f(x), the persistence of sign property leads to the next result.

Lemma 2.3. Let f ∈ C[a, b] and suppose that f(x) ≥ 0 on [a, b] with f(x) > 0 for some x ∈ [a, b]. Then

\[ \int_a^b f(x)\,dx > 0. \]

Proof. Suppose $f(x_0) > 0$ where $x_0 \in (a, b)$. There is an open interval about $x_0$ where f(x) > 0. Choose a smaller closed interval where f(x) > 0, say $I = [x_0 - \delta, x_0 + \delta]$. Let m be the minimum value of f(x) in I. Then

\[ \int_a^b f(x)\,dx \ge m(2\delta) > 0. \]

If $x_0$ is an endpoint, f(x) is also positive at an interior point so the argument still applies.

A class of results known as mean value theorems are also useful. We give two of these next, and refer the reader to Exercise 2.10 for other examples.

Theorem 2.4 (Second Mean Value Theorem for Integrals). If f ∈ C[a, b], and g(x) is integrable and never changes sign on [a, b], then for some ξ ∈ [a, b]

\[ \int_a^b f(x)g(x)\,dx = f(\xi)\int_a^b g(x)\,dx. \tag{2.2} \]

Proof. Assume that g(x) ≥ 0 on [a, b]; otherwise, replace g(x) by −g(x). Let M and m denote the maximum and minimum values, respectively, of f(x) on [a, b]. Then

\[ m\,g(x) \le f(x)g(x) \le M\,g(x) \]

for all x, hence

\[ m\int_a^b g(x)\,dx \le \int_a^b f(x)g(x)\,dx \le M\int_a^b g(x)\,dx. \]

If $\int_a^b g(x)\,dx = 0$ then any choice of ξ will do. Otherwise

\[ m \le \frac{\int_a^b f(x)g(x)\,dx}{\int_a^b g(x)\,dx} \le M. \]

By the intermediate value property applied to f,

\[ f(\xi) = \frac{\int_a^b f(x)g(x)\,dx}{\int_a^b g(x)\,dx} \]

for some ξ ∈ [a, b], and (2.2) follows.

Corollary 2.4.1 (First Mean Value Theorem for Integrals). If f ∈ C[a, b], then for some ξ ∈ [a, b]

\[ \int_a^b f(x)\,dx = f(\xi)(b - a). \]

Hence there is a point ξ ∈ [a, b] such that f(ξ) equals the average of f(x) taken over [a, b].

2.4 Results from the Differential Calculus

The notation $f \in C^n[I]$ means that the first n derivatives of f exist and are continuous on I. In the case n = 0 it means that f is continuous, and we omit the superscript.

We first recall the fundamental theorem of calculus. A proof may be found in any standard calculus text.

Theorem 2.5. If f ∈ C[a, b] and F′(x) = f(x), then

\[ \int_a^b f(x)\,dx = F(b) - F(a). \]

The next result is a source of series expansions that are useful in approximating functions and, as we shall see, in generating inequalities.

Theorem 2.6 (Taylor’s Theorem). Let x > a, $f \in C^n[a, x]$, and let $f^{(n+1)}$ exist on (a, x). Then

\[ f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - a)^{n+1} \]

for some ξ strictly between a and x.

Proof. The first n + 1 terms constitute the Taylor polynomial of degree n for f(x) about the point a; the last term is called the remainder term. To simplify the proof, assume $f \in C^{n+1}[a, x]$. By Theorem 2.5,

\[ f(x) - f(a) = \int_a^x f'(t)\,dt. \]

Integrate by parts with u = f′(t), du = f″(t) dt, v = −(x − t), and dv = dt; then

\[ \int_a^x f'(t)\,dt = f'(a)(x - a) + \int_a^x f''(t)(x - t)\,dt. \]

Repeat with u = f″(t), du = f‴(t) dt, $v = -(x - t)^2/2$, dv = (x − t) dt, and continue the process until

\[ f(x) = f(a) + f'(a)(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \frac{1}{n!}\int_a^x f^{(n+1)}(t)(x - t)^n\,dt. \]

Because $(x - t)^n$ never changes sign in the interval with endpoints a and x, by (2.2) the remainder term can be rewritten

\[ \frac{1}{n!}\int_a^x f^{(n+1)}(t)(x - t)^n\,dt = \frac{f^{(n+1)}(\xi)}{n!}\int_a^x (x - t)^n\,dt = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - a)^{n+1} \]

for some ξ between a and x.
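The remainder formula can be checked numerically. The following is a minimal sketch, assuming the illustrative choice f(x) = e^x and a = 0 (neither is taken from the text): since every derivative of e^x is e^x, and 0 < ξ < x, the remainder must lie strictly between x^(n+1)/(n+1)! and e^x x^(n+1)/(n+1)!. The helper names `taylor_exp` and `remainder_bounds` are hypothetical.

```python
import math

def taylor_exp(x: float, n: int) -> float:
    """Degree-n Taylor polynomial of exp about a = 0, evaluated at x."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def remainder_bounds(x: float, n: int) -> tuple:
    """Bounds on the remainder for x > 0, using 1 < e^xi < e^x."""
    base = x ** (n + 1) / math.factorial(n + 1)
    return base, math.exp(x) * base

x, n = 1.5, 4  # illustrative values, not from the text
remainder = math.exp(x) - taylor_exp(x, n)
low, high = remainder_bounds(x, n)
print(low < remainder < high)
```

The same pattern applies to any function whose (n+1)th derivative can be bounded on the interval of interest.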

The next two results, although both important in their own right, can be viewed as immediate consequences of Taylor’s theorem.

Corollary 2.6.1 (Mean Value Theorem). Let f ∈ C[a, b], with f(x) differentiable on (a, b). Then there exists ξ ∈ (a, b) such that

\[ f(b) = f(a) + f'(\xi)(b - a). \tag{2.3} \]

FIGURE 2.1. Mean value theorem.

Intuitively, there is a point in (a, b) such that the slope of the line tangent to f(x) at that point is equal to the slope of the secant line connecting the functional values at the endpoints of [a, b]. See Figure 2.1.

Corollary 2.6.2 (Rolle’s Theorem). If f(x) ∈ C[a, b], f(x) is differentiable on (a, b), and f(b) = f(a) = 0, then there exists ξ ∈ (a, b) such that f′(ξ) = 0.

Rolle’s theorem indicates that between every two zeros of a continuous, differentiable function the derivative has at least one zero.

Theorem 2.7 (Conditions for Monotonicity). If f ∈ C[a, b] and f(x) is differentiable on (a, b) with f′(x) ≥ 0, then f(x) is increasing on [a, b]. If f′(x) > 0, then f(x) is strictly increasing. Corresponding statements hold for decreasing functions, for which f′(x) ≤ 0.

Proof. We prove only the first part of the theorem, and leave the rest for the reader. Suppose $a \le x_1 < x_2 \le b$. By Corollary 2.6.1, there is a number $\xi \in (x_1, x_2)$ such that $f(x_2) - f(x_1) = f'(\xi)(x_2 - x_1)$. But f′(ξ) ≥ 0 by hypothesis and $x_2 - x_1 > 0$, so $f(x_2) - f(x_1) \ge 0$. Hence $f(x_2) \ge f(x_1)$ whenever $x_2 > x_1$ on [a, b], as required.

Theorem 2.8 (Cauchy’s Mean Value Theorem). Let f, g ∈ C[a, b] be differentiable on (a, b). Then there exists ξ ∈ (a, b) such that

\[ [f(b) - f(a)]\,g'(\xi) = [g(b) - g(a)]\,f'(\xi). \]

Proof. Call A = f(b) − f(a), B = −[g(b) − g(a)], C = −Bf(a) − Ag(a), and apply Rolle’s theorem to h(x) = Ag(x) + Bf(x) + C.

The following is useful for establishing the monotonicity of the ratio of two functions:


Theorem 2.9 (l’Hopital’s Monotone Rule). Let f, g ∈ C[a, b], with f and g differentiable on (a, b) such that g′(x) ≠ 0 on (a, b). Let f′(x)/g′(x) be increasing (or decreasing) on (a, b). Then the functions

\[ \frac{f(x) - f(a)}{g(x) - g(a)} \quad \text{and} \quad \frac{f(x) - f(b)}{g(x) - g(b)} \]

are also increasing (or decreasing) on (a, b). If f′(x)/g′(x) is strictly increasing (or decreasing), so are

\[ \frac{f(x) - f(a)}{g(x) - g(a)} \quad \text{and} \quad \frac{f(x) - f(b)}{g(x) - g(b)}. \]

Proof. (See [3].) We may assume g′(x) > 0 on (a, b). (If not, multiply f and g by −1.) Consider the case in which f′(x)/g′(x) is increasing. By Theorem 2.8, for x ∈ (a, b) there exists y ∈ (a, x) with

\[ \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(y)}{g'(y)} \le \frac{f'(x)}{g'(x)}, \quad \text{so} \quad f'(x) \ge g'(x)\,\frac{f(x) - f(a)}{g(x) - g(a)}. \]

Now use the quotient rule and the last expression to deduce that the derivative of [f(x) − f(a)]/[g(x) − g(a)] is nonnegative, hence Theorem 2.7 applies.

By l’Hopital’s rule, to evaluate a ratio of the indeterminate form 0/0 we differentiate both numerator and denominator and try to evaluate again. Theorem 2.9 is almost as easily remembered. To determine whether a ratio is monotonic on an interval, we verify that we get 0/0 at one of the endpoints, then differentiate numerator and denominator and try again (making sure the new denominator is nonzero on the open interval).

Theorem 2.10 (Second Derivative Test). Assume $f \in C^2(a, b)$. Let $x_0 \in (a, b)$, and suppose that $f'(x_0) = 0$ and $f''(x_0) > 0$. Then f(x) has a local minimum at $x_0$. That is, there is a positive δ such that if $0 < |x - x_0| < \delta$, then $f(x) > f(x_0)$.

Proof. Because $f''(x_0) > 0$, by Lemma 2.2 there is a positive δ such that f″(x) > 0 if $|x - x_0| < \delta$. Now let 0 < |Δx| < δ. By Theorem 2.6 there exists some ξ strictly between $x_0$ and $x_0 + \Delta x$ such that

\[ f(x_0 + \Delta x) = f(x_0) + f'(x_0)\,\Delta x + \tfrac{1}{2} f''(\xi)(\Delta x)^2. \tag{2.4} \]

Since $f'(x_0) = 0$ and f″(ξ) > 0 the result follows by inspection. Note that if we assume $f'(x_0) = 0$ and $f''(x_0) < 0$, then f(x) has a local maximum at $x_0$. We will state and prove the theorem for n variables later.

2.5 Some Applications

We begin by exploiting Corollary 2.6.1, the mean value theorem.

Example 2.2. We can verify the useful inequality

\[ \tan x > x \tag{2.5} \]

for 0 < x < π/2 by applying Corollary 2.6.1 with f(x) = tan x, a = 0, and b = x < π/2; i.e., by asserting that

\[ \tan x - \tan 0 = \frac{1}{\cos^2 \xi}\,(x - 0) \]

for some ξ ∈ (0, x), and simply noting that $0 < \cos^2 \xi < 1$. Similarly, Corollary 2.6.1 yields

\[ \sin x < x \]

whenever x > 0 (Exercise 2.4), so

\[ \sin x < x < \tan x \]

whenever 0 < x < π/2.
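A quick numerical spot check of the double inequality is sketched below. It is not a proof; it only samples the interval on an arbitrarily chosen grid, and the helper name `sandwich_holds` is hypothetical.

```python
import math

def sandwich_holds(x: float) -> bool:
    """True when sin x < x < tan x at the given point."""
    return math.sin(x) < x < math.tan(x)

# 99 interior sample points of (0, pi/2); the grid is an arbitrary choice.
samples = [k * math.pi / 200 for k in range(1, 100)]
print(all(sandwich_holds(x) for x in samples))
```

Outside the interval the right-hand inequality can fail; for instance tan 2 is negative, so `sandwich_holds(2.0)` is false.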

Example 2.3. Applying Corollary 2.6.1 to the natural log, we obtain

\[ \ln(1 + x) - \ln 1 = \frac{1}{\xi}\,[(1 + x) - 1] \]

for some ξ ∈ (1, 1 + x). Therefore

\[ \frac{x}{1 + x} < \ln(1 + x) < x \]

whenever x > 0. This is the logarithmic inequality. (The range of validity can be extended to include −1 < x < 0 as well.)
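As an illustration, the logarithmic inequality can be sampled at a few positive values. This is a sketch only; the sample points are arbitrary and `log_bounds` is a hypothetical helper name.

```python
import math

def log_bounds(x: float) -> tuple:
    """Return (x/(1+x), ln(1+x), x) for x > 0."""
    return x / (1.0 + x), math.log1p(x), x

samples = [0.01, 0.5, 1.0, 10.0, 1000.0]  # arbitrary positive test points
for x in samples:
    lower, middle, upper = log_bounds(x)
    print(x, lower < middle < upper)
```

`math.log1p` is used instead of `math.log(1 + x)` because it stays accurate when x is small, where the three quantities nearly coincide.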

The other differentiation theorems also provide ways to check many proposed inequalities. One plan is as follows. Suppose the proposed inequality is of the general form

\[ g(x) < h(x) \quad (x > x_0), \tag{2.6} \]

where $g(x_0) = h(x_0)$ and the functions g(x) and h(x) have known derivatives. Defining

\[ f(x) = h(x) - g(x) \]

we have $f(x_0) = 0$. If we can further show that f′(x) > 0 for $x > x_0$, then (2.6) is established.


FIGURE 2.2. Jordan’s inequality (curves y = sin x and y = 2x/π for 0 ≤ x ≤ π/2).

Example 2.4. We can prove

\[ x^r \le rx + (1 - r) \tag{2.7} \]

for x > 0 and 0 < r < 1 by this method. Defining

\[ f(x) = (1 - r) + rx - x^r \]

we find f(1) = 0 and

\[ f'(x) = r - r x^{r-1} = r\left(1 - \frac{1}{x^{1-r}}\right). \]

For x > 1, f′(x) > 0; for 0 < x < 1, f′(x) < 0. Hence (2.7) holds, with equality if and only if x = 1. Similarly,

\[ x^r \ge rx + (1 - r) \]

whenever x > 0 and r > 1.
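The behavior of f(x) in Example 2.4 can be seen numerically. The sketch below assumes the illustrative exponent r = 0.5 and an arbitrary grid on (0, 5]; the function name `gap` is hypothetical. It locates the grid point where f is smallest, which should be x = 1.

```python
def gap(x: float, r: float) -> float:
    """f(x) = (1 - r) + r*x - x**r from Example 2.4."""
    return (1.0 - r) + r * x - x**r

r = 0.5  # illustrative exponent with 0 < r < 1
xs = [0.01 * k for k in range(1, 501)]  # grid on (0, 5]
smallest = min(xs, key=lambda x: gap(x, r))
print(smallest, gap(smallest, r))
```

Calling `gap(x, 2.0)` on the same grid gives nonpositive values, matching the reversed inequality for r > 1.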

Example 2.5. For 0 < x < π/2,

\[ \frac{d}{dx}\left(\frac{\sin x}{x}\right) = \cos x \left(\frac{x - \tan x}{x^2}\right) < 0 \]

by (2.5), so sin x/x is strictly decreasing on the interval of interest. Because sin x/x → 2/π as x → π/2, we conclude that

\[ \sin x > \frac{2}{\pi}\,x \]

whenever 0 < x < π/2. This is Jordan’s inequality. It is useful and easily remembered (Figure 2.2).
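Both claims of Example 2.5, the positivity of sin x − 2x/π and the monotone decrease of sin x/x, can be spot-checked on a grid. This is a sketch on arbitrarily chosen sample points, with the hypothetical helper `jordan_gap`.

```python
import math

def jordan_gap(x: float) -> float:
    """sin x - 2x/pi, which Jordan's inequality says is positive on (0, pi/2)."""
    return math.sin(x) - 2.0 * x / math.pi

xs = [k * math.pi / 200 for k in range(1, 100)]  # interior points of (0, pi/2)
ratios = [math.sin(x) / x for x in xs]
print(all(jordan_gap(x) > 0 for x in xs))
print(all(later < earlier for earlier, later in zip(ratios, ratios[1:])))
```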


Another handy technique relies on the inspection of series expansions to establish inequalities. We saw in Chapter 1 how the binomial expansion could be used for this purpose. Taylor series are also useful in this regard.

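Example 2.6. From the Taylor series

\[ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} \]

we see that

\[ e^x > 1 + x + \frac{x^2}{2} \]

for all x > 0. Even more simply

\[ e^x > 1 + x, \]

but we can replace x by x/n and raise both sides to the nth power to get the less obvious result

\[ e^x > \left(1 + \frac{x}{n}\right)^n \]

for all x > 0 and n ∈ N.

Example 2.7. If z ∈ C, (1.1) yields

\[ |\sin z| = \left| \sum_{n=1}^{\infty} (-1)^{n-1} \frac{z^{2n-1}}{(2n-1)!} \right| \le \sum_{n=1}^{\infty} \left| (-1)^{n-1} \frac{z^{2n-1}}{(2n-1)!} \right| = \sum_{n=1}^{\infty} \frac{|z|^{2n-1}}{(2n-1)!} \]

and we have

\[ |\sin z| \le \sinh |z|. \]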

The basic integration properties are useful in estimating (i.e., putting bounds on) integrals.

Example 2.8. For any positive constants a, b, c, d with a < b,

\[ (b - a)\sqrt{ca^3 + d} < \int_a^b \sqrt{cx^3 + d}\,dx < (b - a)\sqrt{cb^3 + d}. \]

Example 2.9. Consider the integral

\[ I = \int_0^1 \frac{x^5}{(x + 25)^{1/2}}\,dx. \]


FIGURE 2.3. Overestimating an integral (graph of $x^p$).

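On the interval [0, 1] we have $x^5 \ge 0$; hence by Theorem 2.4, applied with $f(x) = (x + 25)^{-1/2}$ and $g(x) = x^5$ and using $\int_0^1 x^5\,dx = 1/6$, there exists ξ ∈ [0, 1] such that

\[ I = \frac{1}{6(\xi + 25)^{1/2}}. \]

Therefore

\[ \frac{1}{6\sqrt{26}} \le I \le \frac{1}{30}. \]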
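The two-sided estimate for I can be compared with a direct numerical approximation. The sketch below uses a composite midpoint rule with 2000 subintervals; both the rule and the subinterval count are choices made here for illustration, and `midpoint_rule` and `integrand` are hypothetical helper names.

```python
import math

def midpoint_rule(f, a: float, b: float, n: int) -> float:
    """Composite midpoint approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integrand(x: float) -> float:
    return x**5 / math.sqrt(x + 25.0)

approx = midpoint_rule(integrand, 0.0, 1.0, 2000)
lower, upper = 1.0 / (6.0 * math.sqrt(26.0)), 1.0 / 30.0
print(lower, approx, upper)
```

The bounds differ by only about 2 percent, which shows how sharp the mean value estimate is when the factor f varies slowly over the interval.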

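Example 2.10. A simple observation shows that

\[ \int_t^{\infty} \frac{e^{-x^2}}{x^{2n}}\,dx = \int_t^{\infty} \frac{x e^{-x^2}}{x^{2n+1}}\,dx \le \int_t^{\infty} \frac{x e^{-x^2}}{t^{2n+1}}\,dx = -\frac{1}{2t^{2n+1}} \int_t^{\infty} (-2x) e^{-x^2}\,dx = \frac{e^{-t^2}}{2t^{2n+1}}. \]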

Example 2.11. The average of a positive, increasing function is increasing. We can see this as follows. Let f(x) be increasing on [0, a]. Then for every x ∈ (0, a] we have

\[ f(x) \ge \max_{u \in [0,x]} f(u) = \left(\max_{u \in [0,x]} f(u)\right) \frac{1}{x} \int_0^x du \ge \frac{1}{x} \int_0^x f(u)\,du. \]

Hence

\[ f(x) - \frac{1}{x} \int_0^x f(u)\,du \ge 0 \]

or

\[ \frac{1}{x^2} \left( x f(x) - \int_0^x f(u)\,du \right) \ge 0. \]


FIGURE 2.4. Underestimating an integral (graph of $x^p$).

By the quotient rule for differentiation,

\[ \frac{d}{dx} \left( \frac{1}{x} \int_0^x f(u)\,du \right) \ge 0 \]

as required.

It is possible to obtain other inequalities involving integrals through an ad hoc consideration of areas bounded by various plane curves. This simple process, reminiscent of the integral test from calculus, is probably best described in an example.

Example 2.12. The function $f(x) = x^p$, where −1 < p < 0, is strictly decreasing. From Figures 2.3 and 2.4 it is apparent that

\[ \int_1^{n+1} x^p\,dx < \sum_{k=1}^{n} k^p < \int_0^n x^p\,dx. \]

Hence, after carrying out the integrations,

\[ \frac{(n + 1)^{p+1} - 1}{p + 1} < \sum_{k=1}^{n} k^p < \frac{n^{p+1}}{p + 1}. \]
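The closed-form bounds of Example 2.12 can be evaluated alongside the partial sum itself. The following sketch assumes the illustrative values p = −0.5 and n = 50; `sum_bounds` is a hypothetical helper name.

```python
def sum_bounds(p: float, n: int) -> tuple:
    """Lower bound, partial sum, and upper bound from Example 2.12 (-1 < p < 0)."""
    lower = ((n + 1) ** (p + 1) - 1.0) / (p + 1)
    total = sum(k**p for k in range(1, n + 1))
    upper = n ** (p + 1) / (p + 1)
    return lower, total, upper

lower, total, upper = sum_bounds(-0.5, 50)  # illustrative p and n
print(lower, total, upper)
```

For these values the sum of 1/√k over k = 1, …, 50 falls between 2(√51 − 1) and 2√50.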

Integration along a contour in the complex plane follows many rules that are analogous to those for real integration, with little modification. In particular, Corollary 2.3.2 extends to the complex case: if g(z) is integrable on contour C, then

\[ \left| \int_C g(z)\,dz \right| \le \int_C |g(z)|\,|dz|. \]

Example 2.13. Suppose C is of finite length L. If there is a number M > 0 such that for all z ∈ C the inequality

\[ |g(z)| < M \]

holds, then

\[ \left| \int_C g(z)\,dz \right| < ML \]

because

\[ \left| \int_C g(z)\,dz \right| \le \int_C |g(z)|\,|dz| < \int_C M\,|dz| = M \int_C |dz| = ML. \]

2.6 Exercises

2.1. Let p ∈ R, p > 0. Use the fact that

\[ \ln x = \int_1^x \frac{dt}{t} \]

and the squeeze principle to show that

\[ \lim_{x \to \infty} \frac{\ln x}{x^p} = 0. \]

2.2. Use differentiation to prove the following. Assume n ∈ N.

(a) $\ln x \le n(x^{1/n} - 1)$ for x > 0 with equality if and only if x = 1.

(b) $x^n + (n - 1) \ge x$ for x ≥ 0.

(c) 2 ln(sec x) < sin x tan x for 0 < x < π/2.

(d) sinh x ≥ x for x ≥ 0.

(e) $|x \ln x| \le e^{-1}$ for 0 ≤ x ≤ 1.

(f) $e^x < (1 - x)^{-1}$ for x < 1.

(g) $\pi^e < e^{\pi}$.

(h) $(s + t)^a \le s^a + t^a \le 2^{1-a}(s + t)^a$ for s, t > 0, 0 < a ≤ 1.

(i) $2^{1-b}(s + t)^b \le s^b + t^b \le (s + t)^b$ for s, t > 0, b ≥ 1.

2.3. Obtain the inequality

\[ e^x \ge \left( \frac{ex}{a} \right)^a \]

for x > a and a > 0, and determine the circumstances for equality to hold. (See Mitrinovic [45] for several related inequalities.)

2.4. Use Corollary 2.6.1 to derive the inequalities:

(a) sin x < x for x > 0.

(b) $x/(1 + x^2) < \tan^{-1} x < x$ for x > 0.

(c) $1 + \dfrac{x}{2\sqrt{1 + x}} < \sqrt{1 + x} < 1 + \dfrac{x}{2}$ for x > 0.

(d) $e^x(y - x) < e^y - e^x < e^y(y - x)$ for y > x.

(e) $(1 + x)^a \le 1 + ax(1 + x)^{a-1}$, for a > 1 and x > −1, with equality if and only if x = 0.

Also prove that if |f′(x)| ≤ B for some constant B, then f satisfies the Lipschitz condition

\[ |f(x_2) - f(x_1)| \le B\,|x_2 - x_1|. \]

2.5. Applications of l’Hopital’s monotone rule:

(a) For a > 1 and x > −1, x ≠ 0, define $h(x) = ((1 + x)^a - 1)/x$. Use l’Hopital’s rule to define h(0) = a. Use Theorem 2.9 to show that h(x) is increasing on [−1, ∞) and hence that

\[ (1 + x)^a \ge 1 + ax \]

with equality if and only if x = 0 (cf. Example 2.4).

(b) Show that

\[ h(x) = \frac{\ln \cosh x}{\ln((\sinh x)/x)} \]

is decreasing on (0, ∞).

(c) Prove that for x ∈ (0, 1),

\[ \pi < \frac{\sin \pi x}{x(1 - x)} \le 4. \]

(d) Prove that 1 > sin x/x > 2/π on (0, π/2) (cf. Example 2.5).

2.6. Use series expansions to establish the following inequalities:

(a) tan 2x ≥ tan x + tanh x for 0 ≤ x < π/4.

(b) |cos z| ≤ cosh |z|, z ∈ C.

(c) |ln(1 + x)| ≤ −ln(1 − |x|) if |x| < 1.

(d) $\prod_{n=1}^{\infty} (1 + a_n) \le \exp\left( \sum_{n=1}^{\infty} a_n \right)$ if $0 \le a_n < 1$ for all n.

2.7. Show that if n is an integer greater than 1 and a, b are positive with a > b, then

\[ b^{n-1} < \frac{a^n - b^n}{n(a - b)} < a^{n-1}. \]

Use this to prove that no positive real number can have more than one positive nth root.

2.8. The following exercises involve the definition of integration. Recall from calculus that f(x) is integrable on [a, b] and

\[ \int_a^b f(x)\,dx = I \]

means that given ε > 0 there exists some δ > 0 such that if

\[ a = x_0 < x_1 < \cdots < x_n = b \]

and if $x_i - x_{i-1} < \delta$ for i = 1, …, n and if $\xi_i \in [x_{i-1}, x_i]$ for i = 1, …, n, then

\[ \left| \sum_{i=1}^{n} f(\xi_i)(x_i - x_{i-1}) - I \right| < \varepsilon. \]

Note as a special case that if f(x) is integrable on [a, b], then

\[ \lim_{n \to \infty} \sum_{i=1}^{n} f(a + i\,\Delta x)\,\Delta x = \int_a^b f(x)\,dx, \]

where Δx = (b − a)/n.

(a) Show that if f(x) is integrable on [a, b], then f(x) is bounded on [a, b].

(b) Show that if f(x) is integrable on [a, b], then the function

\[ F(x) = \int_a^x f(t)\,dt \]

is continuous on [a, b].

(c) Define $f(x) = x^{-1/2}$ if 0 < x ≤ 1 and f(0) = 0. Does $\int_0^1 f(x)\,dx$ exist?

2.9. More exercises involving integration:

(a) Put simple lower and upper bounds on the family of integrals

\[ I(\alpha, \beta) = \int_0^1 \frac{dx}{(x^{\beta} + 1)^{\alpha}} \]

where α, β ≥ 0.

(b) Show that

\[ \int_0^{\pi/2} \ln(1/\sin t)\,dt < \infty. \]

(c) A function f(t) is of exponential order on [0, ∞) if there exist positive numbers b and C such that whenever t ≥ 0,

\[ |f(t)| \le C e^{bt}. \]

Show that the Laplace transform of f(t) given by

\[ F(s) = \int_0^{\infty} f(t) e^{-st}\,dt \]

exists if f(t) is of exponential order.

(d) Verify that

\[ \int_0^{\pi/2} (\sin x)^{2n+1}\,dx \le \int_0^{\pi/2} (\sin x)^{2n}\,dx \le \int_0^{\pi/2} (\sin x)^{2n-1}\,dx \]

and establish Wallis’s product

\[ \frac{\pi}{2} = \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdot \frac{6}{7} \cdots \frac{2m}{2m - 1} \cdot \frac{2m}{2m + 1} \cdots. \]

(e) Show that

\[ \lim_{T \to \infty} \int_0^T \frac{\sin x}{x}\,dx \]

exists and is between 1 and 3.

2.10. Prove the following statements. Parts (a) and (b) are challenging; according to Hobson [29], they were first given by Bonnet circa 1850.

(a) Let f(x) be a monotonic decreasing, nonnegative function on [a, b], and let g(x) be integrable on [a, b]. Then for some ξ with a ≤ ξ ≤ b,

\[ \int_a^b f(x)g(x)\,dx = f(a) \int_a^{\xi} g(x)\,dx. \]

(b) Let f(x) be a monotonic increasing, nonnegative function on [a, b], and let g(x) be integrable on [a, b]. Then for some η with a ≤ η ≤ b,

\[ \int_a^b f(x)g(x)\,dx = f(b) \int_{\eta}^{b} g(x)\,dx. \]

(c) Let f(x) be bounded and monotonic on [a, b], and let g(x) be integrable on [a, b]. Then for some ξ with a ≤ ξ ≤ b,

\[ \int_a^b f(x)g(x)\,dx = f(a) \int_a^{\xi} g(x)\,dx + f(b) \int_{\xi}^{b} g(x)\,dx. \]

This is also referred to as the second mean value theorem for integrals, particularly in older books.

(d) Let f(x) be a monotonic function integrable on [a, b], and suppose that f(a)f(b) ≥ 0 and |f(a)| ≥ |f(b)|. Then, if g is a real function integrable on [a, b],

\[ \left| \int_a^b f(x)g(x)\,dx \right| \le |f(a)| \max_{a \le \xi \le b} \left| \int_a^{\xi} g(x)\,dx \right|. \]

This is Ostrowski’s inequality for integrals.

2.11. Exercises using graphical approaches:

(a) Verify pictorially that

\[ \int_1^n \ln x\,dx < \ln(n!) < \int_1^{n+1} \ln x\,dx \]

for n ∈ N, n > 1.

(b) Sketch the curve y = 1/x for x > 0, and consider the area bounded by this curve, the x-axis, and the lines x = a and x = b (b > a). Compare this with the areas of two suitable trapezoids and obtain

\[ \frac{2(b - a)}{b + a} < \ln\left( \frac{b}{a} \right) < \frac{b^2 - a^2}{2ab}. \]


2.12. Euler’s constant C is defined by

\[ C = \lim_{n \to \infty} C_n = \lim_{n \to \infty} \left( \sum_{j=1}^{n} \frac{1}{j} - \ln n \right). \]

Verify that C exists and is positive by showing that $C_n$ is strictly decreasing with lower bound 1/2. (Preliminary hint: Use trapezoids as in Exercise 2.11.)

2.13. Show that a thin metal ring rolls more slowly down an incline than any solid circular disk of the same radius and same total mass.

2.14. Prove the following generalized version of Rolle’s theorem. Let $g \in C^n[a, b]$, and let $x_0 < x_1 < \cdots < x_n$ be n + 1 points in [a, b]. Suppose

\[ g(x_0) = g(x_1) = \cdots = g(x_n) = 0. \]

Then there exists ξ ∈ [a, b] such that $g^{(n)}(\xi) = 0$.

2.15. Prove that if g(x) ≥ 0, g(x) ∈ C[a, b], and

\[ \int_a^b g(x)\,dx = 0, \]

then g(x) ≡ 0 on [a, b].

2.16. A set A is said to be dense in a set B if every element of B is the limit of a sequence of elements belonging to A. Show that if f(x) and g(x) are continuous on B with f(x) ≤ g(x) for every x in some dense subset of B, then f(x) ≤ g(x) for all x ∈ B. Explain how this idea could be used to extend to real arguments an inequality proved for rational arguments.

2.17. (A simple caution.) Given a valid inequality between two functions, is it generally possible to obtain another valid inequality by direct differentiation? Is it true, for instance, that f′(x) > g′(x) whenever f(x) > g(x)? Note, however, that if f′(x) > g′(x) on [a, b], then we do have f(b) − f(a) > g(b) − g(a).

3 Some Standard Inequalities

3.1 Introduction

Here we examine some famous inequalities. Many of these are extremely powerful, and have left bold imprints on both pure and applied mathematics. We begin with a simple result called Bernoulli’s inequality, and proceed to examine results of a more advanced nature.

3.2 Bernoulli’s Inequality

Theorem 3.1 (Bernoulli’s Inequality). If n ∈ N and x ≥ −1, then

\[ (1 + x)^n \ge 1 + nx. \tag{3.1} \]

Equality holds if and only if n = 1 or x = 0. (See Exercise 2.5 for a generalization.)

Proof. We can give an elementary proof by induction. Let P(n) be the proposition x ≥ −1 ⇒ $(1 + x)^n \ge 1 + nx$ with equality if and only if n = 1 or x = 0. P(1) holds trivially. Now let n ∈ N and assume P(n) is true. Note that since n + 1 ≠ 1, the condition for equality in P(n + 1) is simply x = 0. Multiplying by the nonnegative number 1 + x, we have

\[ (1 + x)^{n+1} \ge (1 + x)(1 + nx) = 1 + (n + 1)x + nx^2 \ge 1 + (n + 1)x. \tag{3.2} \]

Equality holds in (3.2) if and only if $nx^2 = 0$, which holds if and only if x = 0.
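Bernoulli’s inequality and its equality condition can be checked exactly on a small grid. The sketch below uses rational arithmetic so that the equality cases are not blurred by rounding; the grid (x from −1 to 3 in steps of 1/10, n from 1 to 8) is an arbitrary choice and `bernoulli_sides` is a hypothetical helper name.

```python
from fractions import Fraction

def bernoulli_sides(x: Fraction, n: int) -> tuple:
    """Return ((1 + x)**n, 1 + n*x) computed exactly."""
    return (1 + x) ** n, 1 + n * x

violations = []
for n in range(1, 9):
    for k in range(-10, 31):
        x = Fraction(k, 10)  # x runs over -1, -0.9, ..., 3
        lhs, rhs = bernoulli_sides(x, n)
        equality_expected = (n == 1 or x == 0)
        # Record any point where (3.1) fails or the equality condition is wrong.
        if lhs < rhs or (lhs == rhs) != equality_expected:
            violations.append((n, x))
print(violations)
```

An empty list means that no grid point contradicts the theorem or its equality statement.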

FIGURE 3.1. Young’s inequality.

3.3 Young’s Inequality

Consider two continuous functions f and g, both strictly increasing and inverses of one another. Suppose further that both functions vanish at the origin when graphed as in Figure 3.1. Area $A_1 + A_2$ clearly exceeds the area of a rectangle of width w and height h (for any choice of positive numbers w, h), and we are led immediately to the following:

Theorem 3.2 (Young’s Inequality). Let f, g ∈ C be strictly increasing and inverse to one another for nonnegative argument, with f(0) = g(0) = 0. Then

\[ wh \le \int_0^w f(x)\,dx + \int_0^h g(x)\,dx. \tag{3.3} \]

Equality holds if and only if h = f(w).

An analytical proof is requested in Exercise 3.1.

3.4 The Inequality of the Means

We now present the celebrated arithmetic mean – geometric mean, or AM–GM, inequality.

Theorem 3.3 (Weighted AM–GM Inequality). Let $a_1, \ldots, a_n$ be positive numbers and let $\delta_1, \ldots, \delta_n$ be positive numbers (weights) such that $\delta_1 + \cdots + \delta_n = 1$. Then

\[ \delta_1 a_1 + \cdots + \delta_n a_n \ge a_1^{\delta_1} \cdots a_n^{\delta_n} \tag{3.4} \]

and equality holds if and only if the $a_i$ are all equal.

Proof. (See [21].) We begin with the fact that x − 1 − ln x ≥ 0 whenever x > 0, with equality if and only if x = 1 (Exercise 2.2). Call

\[ A = \sum_{k=1}^{n} \delta_k a_k. \]

For each i, $a_i/A - 1 - \ln(a_i/A) \ge 0$. Multiplying each such term by $\delta_i$ and summing over i, we have

\[ \sum_{i=1}^{n} (\delta_i a_i/A - \delta_i) - \sum_{i=1}^{n} \delta_i \ln(a_i/A) \ge 0. \tag{3.5} \]

Since

\[ \sum_{i=1}^{n} (\delta_i a_i/A - \delta_i) = 0 \]

we have

\[ \sum_{i=1}^{n} \delta_i \ln(a_i/A) \le 0. \]

Now by the fact that the exponential function is increasing (see Theorem 2.7) and the laws of exponents,

\[ \exp\left[ \sum_{i=1}^{n} \delta_i \ln(a_i/A) \right] \le \exp(0) = 1. \]

Hence $(a_1^{\delta_1} \cdots a_n^{\delta_n})/A \le 1$, and

\[ a_1^{\delta_1} \cdots a_n^{\delta_n} \le \delta_1 a_1 + \cdots + \delta_n a_n. \tag{3.6} \]

Equality holds in (3.6) if and only if it holds in (3.5). Because each summand is nonnegative, equality holds in (3.5) if and only if each summand is zero, which is equivalent to each $a_i/A = 1$. In other words, equality holds in (3.6) if and only if $a_1 = \cdots = a_n$.

For other proofs, see Exercises 3.5, 3.9, and 3.23.

The choice of weights δi = 1/n for all i leads to the next result.

Corollary 3.3.1 (AM–GM Inequality). If $a_1, \ldots, a_n$ are positive numbers, then

\[ \frac{a_1 + \cdots + a_n}{n} \ge (a_1 \cdots a_n)^{1/n}. \tag{3.7} \]

Equality holds if and only if the $a_i$ are all equal.

The left member is the ordinary arithmetic mean of the n numbers, while the right member is by definition the ordinary geometric mean.


Example 3.1. Application of (3.7) to the reciprocals $1/a_i$ gives

\[ \frac{n}{a_1^{-1} + \cdots + a_n^{-1}} \le (a_1 \cdots a_n)^{1/n}. \]

The left member is called the harmonic mean of the $a_i$ (recall Exercise 1.5). Thus the harmonic mean of positive numbers never exceeds the geometric mean.
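The ordering harmonic mean ≤ geometric mean ≤ arithmetic mean can be illustrated on sample data. This is a sketch with arbitrarily chosen numbers; `means` is a hypothetical helper name, and `math.prod` requires Python 3.8 or later.

```python
import math

def means(values):
    """Return (harmonic, geometric, arithmetic) means of positive numbers."""
    n = len(values)
    harmonic = n / sum(1.0 / v for v in values)
    geometric = math.prod(values) ** (1.0 / n)
    arithmetic = sum(values) / n
    return harmonic, geometric, arithmetic

h, g, a = means([1.0, 2.0, 4.0, 8.0])  # illustrative data
print(h, g, a)
```

When all the inputs are equal, for example `means([3.0, 3.0, 3.0])`, the three means coincide up to rounding, in line with the equality condition.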

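Example 3.2. A simple technique is to multiply by unity and then apply (3.7). Consider, for instance, the sequence

\[ a_n = \left( 1 + \frac{1}{n} \right)^n. \]

We have

\[ a_n = \left( 1 + \frac{1}{n} \right)^n \cdot 1 < \left[ \frac{n\left( 1 + \frac{1}{n} \right) + 1}{n + 1} \right]^{n+1} = \left( \frac{n + 2}{n + 1} \right)^{n+1} = \left( 1 + \frac{1}{n + 1} \right)^{n+1}. \]

Hence $a_n < a_{n+1}$, and $a_n$ is monotone increasing.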
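The monotonicity of this sequence can be confirmed exactly for the first several terms. The sketch below uses rational arithmetic and checks n = 1, …, 40, a range chosen only for illustration.

```python
from fractions import Fraction

def a(n: int) -> Fraction:
    """a_n = (1 + 1/n)**n as an exact rational number."""
    return Fraction(n + 1, n) ** n

terms = [a(n) for n in range(1, 41)]
print(all(earlier < later for earlier, later in zip(terms, terms[1:])))
print(float(terms[0]), float(terms[-1]))
```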

An integral form of the AM–GM inequality is introduced in Exercise 3.10.

3.5 Holder’s Inequality

This result can be obtained in one step from the weighted AM–GM inequality.

Theorem 3.4 (Holder’s Inequality). Suppose for each j, 1 ≤ j ≤ n, that $a_{j1}, \ldots, a_{jm}$ are nonzero numbers. Suppose $\delta_1, \ldots, \delta_n$ are positive numbers such that $\delta_1 + \cdots + \delta_n = 1$. For each j denote

\[ S_j = \sum_{i=1}^{m} |a_{ji}|. \]

Then

\[ \sum_{i=1}^{m} |a_{1i}|^{\delta_1} \cdots |a_{ni}|^{\delta_n} \le S_1^{\delta_1} \cdots S_n^{\delta_n}. \tag{3.8} \]

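Proof.

\[ \begin{aligned} \frac{\sum_{i=1}^{m} |a_{1i}|^{\delta_1} \cdots |a_{ni}|^{\delta_n}}{S_1^{\delta_1} \cdots S_n^{\delta_n}} &= \sum_{i=1}^{m} \left( \frac{|a_{1i}|}{S_1} \right)^{\delta_1} \cdots \left( \frac{|a_{ni}|}{S_n} \right)^{\delta_n} \\ &\le \sum_{i=1}^{m} \left( \delta_1 \frac{|a_{1i}|}{S_1} + \cdots + \delta_n \frac{|a_{ni}|}{S_n} \right) \\ &= \delta_1 + \cdots + \delta_n \\ &= 1 \end{aligned} \tag{3.9} \]

by the application of (3.4) to each summand.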

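With n = 2 write $\delta_1 = 1/p$, $\delta_2 = 1/q$, and let $a_{1i} = |a_i|^p$ and $a_{2i} = |b_i|^q$ for i = 1, …, m. Then (3.8) becomes

\[ \sum_{i=1}^{m} |a_i b_i| \le \left( \sum_{i=1}^{m} |a_i|^p \right)^{1/p} \left( \sum_{i=1}^{m} |b_i|^q \right)^{1/q}. \tag{3.10} \]

This special case is also commonly referred to as Holder’s inequality, and we can give another proof based on Young’s inequality. Putting $f(x) = x^{p-1}$ and $g(x) = x^{q-1}$ with complementary exponents that satisfy

\[ \frac{1}{p} + \frac{1}{q} = 1 \quad (1 < p < \infty) \]

we obtain from (3.3)

\[ wh \le \frac{w^p}{p} + \frac{h^q}{q}. \]

With two sets of m numbers $a_1, \ldots, a_m$ and $b_1, \ldots, b_m$, we form the quantities

\[ \alpha = \left( \sum_{j=1}^{m} |a_j|^p \right)^{1/p}, \quad \beta = \left( \sum_{j=1}^{m} |b_j|^q \right)^{1/q}. \]

Assuming that α, β are both nonzero, we have

\[ \frac{|a_i|}{\alpha} \frac{|b_i|}{\beta} \le \frac{1}{p} \frac{|a_i|^p}{\alpha^p} + \frac{1}{q} \frac{|b_i|^q}{\beta^q} \]

for each i = 1, …, m. Summation over i produces

\[ \frac{1}{\alpha\beta} \sum_{i=1}^{m} |a_i||b_i| \le \frac{1}{p\alpha^p} \sum_{i=1}^{m} |a_i|^p + \frac{1}{q\beta^q} \sum_{i=1}^{m} |b_i|^q = \frac{1}{p} + \frac{1}{q} = 1, \]

as required.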


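Taking m → ∞ we have, by Lemma 1.1,

\[ \sum_{i=1}^{\infty} |a_i b_i| \le \left( \sum_{i=1}^{\infty} |a_i|^p \right)^{1/p} \left( \sum_{i=1}^{\infty} |b_i|^q \right)^{1/q} \]

provided the series on the right both converge. The corresponding result for integrals, provided the integrals exist, is

\[ \int_a^b |f(x)g(x)|\,dx \le \left( \int_a^b |f(x)|^p\,dx \right)^{1/p} \left( \int_a^b |g(x)|^q\,dx \right)^{1/q}. \tag{3.11} \]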

See Exercise 3.14 for a derivation.

In order to discuss when equality holds in Holder’s inequality, we note that if $\alpha_i \ge 0$ for all i, then

\[ \sum_{i=1}^{m} \alpha_i = 0 \]

if and only if each $\alpha_i = 0$. If $\alpha_i \ge \beta_i$ for all i, then

\[ \sum_{i=1}^{m} \alpha_i = \sum_{i=1}^{m} \beta_i \]

if and only if $\alpha_i = \beta_i$ for all i. Thus equality holds in (3.9) if and only if for each i

\[ \left( \frac{|a_{1i}|}{S_1} \right)^{\delta_1} \cdots \left( \frac{|a_{ni}|}{S_n} \right)^{\delta_n} = \delta_1 \frac{|a_{1i}|}{S_1} + \cdots + \delta_n \frac{|a_{ni}|}{S_n}. \tag{3.12} \]

From the weighted AM–GM inequality, (3.12) holds for each i if and only if

\[ \frac{|a_{1i}|}{S_1} = \cdots = \frac{|a_{ni}|}{S_n}. \tag{3.13} \]

Hence equality holds in Holder’s inequality (3.8) if and only if (3.13) holds for all i. In the case n = 2, equality holds in (3.10) if and only if

\[ \frac{|a_i|^p}{\sum_{k=1}^{m} |a_k|^p} = \frac{|b_i|^q}{\sum_{k=1}^{m} |b_k|^q} \tag{3.14} \]

for all i.

It is convenient to remove the condition that each $a_{ji}$ be nonzero. If $a_{j1} = \cdots = a_{jm} = 0$ for some j, then (3.8) holds by inspection. Now suppose each set $a_{j1}, \ldots, a_{jm}$ contains at least one nonzero term. For each index i in (3.9) it is still true that

\[ \left( \frac{|a_{1i}|}{S_1} \right)^{\delta_1} \cdots \left( \frac{|a_{ni}|}{S_n} \right)^{\delta_n} \le \delta_1 \frac{|a_{1i}|}{S_1} + \cdots + \delta_n \frac{|a_{ni}|}{S_n} \]

(by (3.4) if each $a_{ji} \ne 0$, by inspection otherwise), so (3.9) and (3.10) are still valid. We summarize this discussion applied to (3.10) as follows:

Theorem 3.5 (Holder’s Inequality). Let p > 1 and q > 1 and $p^{-1} + q^{-1} = 1$. Let $a_1, \ldots, a_m$ and $b_1, \ldots, b_m$ be two sequences of real numbers. Then

\[ \sum_{i=1}^{m} |a_i b_i| \le \left( \sum_{i=1}^{m} |a_i|^p \right)^{1/p} \left( \sum_{i=1}^{m} |b_i|^q \right)^{1/q}. \]

Equality holds if and only if one of the sequences $a_i$ or $b_i$ consists entirely of zeros or else

\[ \frac{|a_i|^p}{\sum_{k=1}^{m} |a_k|^p} = \frac{|b_i|^q}{\sum_{k=1}^{m} |b_k|^q} \]

for all i.
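Theorem 3.5 and its equality condition can be illustrated numerically. In the sketch below the vectors and the exponent p = 3 are arbitrary choices, and `holder_sides` is a hypothetical helper name; the second call builds a case in which $|a_i|^p$ is proportional to $|b_i|^q$, so the two sides should agree up to rounding.

```python
import math

def holder_sides(a, b, p: float) -> tuple:
    """Return (sum |a_i b_i|, ||a||_p * ||b||_q) with 1/p + 1/q = 1, p > 1."""
    q = p / (p - 1.0)
    lhs = sum(abs(x * y) for x, y in zip(a, b))
    norm_a = sum(abs(x) ** p for x in a) ** (1.0 / p)
    norm_b = sum(abs(y) ** q for y in b) ** (1.0 / q)
    return lhs, norm_a * norm_b

lhs, rhs = holder_sides([1.0, -2.0, 3.0], [4.0, 0.5, -1.0], p=3.0)
print(lhs <= rhs)

# Equality case: |a_i|^p proportional to |b_i|^q, built here as a_i = b_i^(q/p).
b = [1.0, 2.0, 3.0]
a_eq = [y ** 0.5 for y in b]  # q/p = 1.5/3 = 0.5 when p = 3
eq_lhs, eq_rhs = holder_sides(a_eq, b, p=3.0)
print(math.isclose(eq_lhs, eq_rhs))
```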

3.6 Minkowski’s Inequality

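Theorem 3.6 (Minkowski’s Inequality). Assume that $a_1, \ldots, a_m$ and $b_1, \ldots, b_m$ are real numbers, and let p ≥ 1. Then

\[ \left( \sum_{i=1}^{m} |a_i + b_i|^p \right)^{1/p} \le \left( \sum_{i=1}^{m} |a_i|^p \right)^{1/p} + \left( \sum_{i=1}^{m} |b_i|^p \right)^{1/p}. \tag{3.15} \]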

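Proof. If p = 1 this follows from the triangle inequality. Now suppose p > 1, and choose q > 1 so that $p^{-1} + q^{-1} = 1$. Write Holder’s inequality as

\[ \sum_{i=1}^{m} |\alpha_i \beta_i| \le \left( \sum_{i=1}^{m} |\alpha_i|^p \right)^{1/p} \left( \sum_{i=1}^{m} |\beta_i|^q \right)^{1/q}. \]

Let $\alpha_i = |a_i|$ and $\beta_i = |a_i + b_i|^{p/q}$, and then let $\alpha_i = |b_i|$ and $\beta_i = |a_i + b_i|^{p/q}$ to get

\[ \sum_{i=1}^{m} |a_i||a_i + b_i|^{p/q} \le \left( \sum_{i=1}^{m} |a_i|^p \right)^{1/p} \left( \sum_{i=1}^{m} |a_i + b_i|^p \right)^{1/q} \tag{3.16} \]

and

\[ \sum_{i=1}^{m} |b_i||a_i + b_i|^{p/q} \le \left( \sum_{i=1}^{m} |b_i|^p \right)^{1/p} \left( \sum_{i=1}^{m} |a_i + b_i|^p \right)^{1/q}, \tag{3.17} \]

respectively. Since p = 1 + (p/q), for each i,

\[ |a_i + b_i|^p = |a_i + b_i||a_i + b_i|^{p/q} \le |a_i||a_i + b_i|^{p/q} + |b_i||a_i + b_i|^{p/q}. \tag{3.18} \]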


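Summing over the terms in (3.18), and using (3.17) and (3.16), we have

\[ \begin{aligned} \sum_{i=1}^{m} |a_i + b_i|^p &\le \left( \sum_{i=1}^{m} |a_i|^p \right)^{1/p} \left( \sum_{i=1}^{m} |a_i + b_i|^p \right)^{1/q} + \left( \sum_{i=1}^{m} |b_i|^p \right)^{1/p} \left( \sum_{i=1}^{m} |a_i + b_i|^p \right)^{1/q} \\ &= \left[ \left( \sum_{i=1}^{m} |a_i|^p \right)^{1/p} + \left( \sum_{i=1}^{m} |b_i|^p \right)^{1/p} \right] \left( \sum_{i=1}^{m} |a_i + b_i|^p \right)^{1/q}. \end{aligned} \]

We may assume that

\[ \sum_{i=1}^{m} |a_i + b_i|^p \ne 0 \]

(because (3.15) obviously holds otherwise). Dividing both sides by $\left( \sum_{i=1}^{m} |a_i + b_i|^p \right)^{1/q}$ and using 1 − 1/q = 1/p, the theorem is proved.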

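For conditions when equality holds, see Exercise 4.8.

Minkowski’s inequality can be extended to apply to infinite series, provided the series converge, as

\[ \left( \sum_{i=1}^{\infty} |a_i + b_i|^p \right)^{1/p} \le \left( \sum_{i=1}^{\infty} |a_i|^p \right)^{1/p} + \left( \sum_{i=1}^{\infty} |b_i|^p \right)^{1/p} \]

and to integrals, provided the integrals exist, as

\[ \left( \int_a^b |f(x) + g(x)|^p\,dx \right)^{1/p} \le \left( \int_a^b |f(x)|^p\,dx \right)^{1/p} + \left( \int_a^b |g(x)|^p\,dx \right)^{1/p}. \]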
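The finite form (3.15) says that the quantity $(\sum |v_i|^p)^{1/p}$ obeys a triangle inequality. A minimal sketch, with arbitrarily chosen vectors and exponents and the hypothetical helper `p_norm`:

```python
def p_norm(v, p: float) -> float:
    """(sum |v_i|^p)^(1/p) for p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

a = [1.0, -2.0, 3.0, 0.5]   # illustrative data
b = [-4.0, 0.5, 1.0, 2.0]
total = [x + y for x, y in zip(a, b)]
for p in (1.0, 1.5, 2.0, 4.0):
    print(p, p_norm(total, p) <= p_norm(a, p) + p_norm(b, p))
```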

3.7 The Cauchy–Schwarz Inequality

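Theorem 3.7 (Cauchy–Schwarz Inequality). Suppose $a_1, \ldots, a_m$ and $b_1, \ldots, b_m$ are nonnegative real numbers. Then

\[ \left( \sum_{i=1}^{m} a_i b_i \right)^2 \le \left( \sum_{i=1}^{m} a_i^2 \right) \left( \sum_{i=1}^{m} b_i^2 \right). \tag{3.19} \]

Equality holds if and only if all $a_i$ are zero, or all $b_i$ are zero, or

\[ a_j = \sqrt{\frac{\sum a_i^2}{\sum b_i^2}}\; b_j \]

for all j.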

Proof. Let p = q = 2 in Holder’s inequality. Alternatively, we can use our previous remarks on quadratic inequalities as follows. First, if $a_i = 0$ for all i, then (3.19) holds trivially. If $a_i \ne 0$ for at least one i, then for every x ∈ R certainly

\[ \sum_{i=1}^{m} (a_i x + b_i)^2 \ge 0 \]

or

\[ g(x) = \alpha x^2 + 2\beta x + \gamma \ge 0, \]

where

\[ \alpha = \sum_{i=1}^{m} a_i^2, \quad \beta = \sum_{i=1}^{m} a_i b_i, \quad \gamma = \sum_{i=1}^{m} b_i^2. \]

Hence $\beta^2 - \alpha\gamma = \Delta \le 0$, and (3.19) is established.

Provided the series on the right converge,

\[ \left( \sum_{i=1}^{\infty} a_i b_i \right)^2 \le \left( \sum_{i=1}^{\infty} a_i^2 \right) \left( \sum_{i=1}^{\infty} b_i^2 \right). \]

We can also write (3.19) for Riemann sums and apply Lemma 1.1 to obtain

\[ \left( \int_a^b f(x)g(x)\,dx \right)^2 \le \left( \int_a^b f^2(x)\,dx \right) \left( \int_a^b g^2(x)\,dx \right) \]

for functions f(x) and g(x), provided the integrals exist.

3.8 Chebyshev’s Inequality

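Theorem 3.8 (Chebyshev’s Inequality). Let $a_i$ and $b_i$ be similarly ordered such that either

\[ a_1 \le \cdots \le a_m, \quad b_1 \le \cdots \le b_m, \]

or

\[ a_1 \ge \cdots \ge a_m, \quad b_1 \ge \cdots \ge b_m. \]

Then

\[ \frac{1}{m} \sum_{i=1}^{m} a_i b_i \ge \left( \frac{1}{m} \sum_{i=1}^{m} a_i \right) \left( \frac{1}{m} \sum_{i=1}^{m} b_i \right) \]

with equality if and only if $a_1 = \cdots = a_m$ or $b_1 = \cdots = b_m$.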

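Proof. For either of the two cases it is evident that for any choice of $i, j$,

\[
(a_i - a_j)(b_i - b_j) \ge 0.
\]

Summation over both indices yields

\[
\sum_{i=1}^{m}\sum_{j=1}^{m} (a_i - a_j)(b_i - b_j) \ge 0
\]

and expansion gives

\[
\sum_{i=1}^{m} a_i b_i \sum_{j=1}^{m} (1) - \sum_{i=1}^{m} a_i \sum_{j=1}^{m} b_j - \sum_{j=1}^{m} a_j \sum_{i=1}^{m} b_i + \sum_{j=1}^{m} a_j b_j \sum_{i=1}^{m} (1) \ge 0
\]

or

\[
2m\sum_{i=1}^{m} a_i b_i - 2\sum_{i=1}^{m} a_i \sum_{i=1}^{m} b_i \ge 0,
\]

as required.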

Example 3.3. Choosing $b_i = a_i$ for all $i$, we see that the square of the arithmetic mean never exceeds the mean of the squares.

With functions $f(x)$ and $g(x)$, analogous operations yield

\[
\int_a^b f(x) g(x)\,dx \ge \frac{1}{b-a}\int_a^b f(x)\,dx \int_a^b g(x)\,dx
\]

if $f(x)$ and $g(x)$ are either both increasing or both decreasing on $[a, b]$. If one function is increasing and the other is decreasing, the inequality sign is reversed.
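The double-sum identity behind the proof of Theorem 3.8 is easy to check on small data. The following minimal sketch uses exact rational arithmetic; the sample sequences and the helper names are assumptions made for illustration.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean of a list of integers."""
    return Fraction(sum(values), len(values))


def chebyshev_gap(a, b):
    """Mean of the products minus the product of the means."""
    products = [x * y for x, y in zip(a, b)]
    return mean(products) - mean(a) * mean(b)


def double_sum(a, b):
    """The sum over i and j of (a_i - a_j)(b_i - b_j) used in the proof."""
    return sum((ai - aj) * (bi - bj) for ai, bi in zip(a, b) for aj, bj in zip(a, b))


a = [1, 2, 4, 7]
b = [0, 3, 3, 10]                                   # ordered like a
similar = chebyshev_gap(a, b)                       # nonnegative
opposite = chebyshev_gap(a, list(reversed(b)))      # sign reversed
constant = chebyshev_gap([5, 5, 5, 5], b)           # equality case
```

Here `double_sum(a, b)` equals $2m^2$ times `chebyshev_gap(a, b)`, which is the expansion carried out in the proof.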

3.9 Jensen’s Inequality

A function $f(x)$ is convex on the open interval $(a, b)$ if and only if the inequality

\[
f(p x_1 + (1-p) x_2) \le p f(x_1) + (1-p) f(x_2) \tag{3.20}
\]

holds for all $x_1, x_2 \in (a, b)$ and every $p \in (0, 1)$. In the case of strict inequality for $x_1 \ne x_2$, $f$ is strictly convex on $(a, b)$. We note that any $x_p \in (x_1, x_2)$ can be expressed as $x_p = x_1 + (1-p)(x_2 - x_1) = p x_1 + (1-p) x_2$ for some $p \in (0, 1)$. The straight line connecting the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ is

\[
f_s(x) = f(x_1) + \left[\frac{f(x_2) - f(x_1)}{x_2 - x_1}\right](x - x_1)
\]

so that $f_s(x_p) = p f(x_1) + (1-p) f(x_2)$. Geometrically then, the convexity condition prevents the graph of $f(x)$ from rising above the secant line connecting any two of its points (Figure 3.2).

FIGURE 3.2. Function convexity. (Axes $x$ and $y$; the curve $f(x)$ and the secant line $f_s(x)$, with marked abscissas $a$, $x_1$, $x_p$, $x_2$, $b$.)

Upon reflection it seems natural to associate convexity with the requirement that $f''(x) \ge 0$ on $(a, b)$. In fact, this requirement is equivalent to (3.20) for functions twice continuously differentiable on $(a, b)$ (Exercise 3.22). The reader should also be aware that other definitions of convexity exist. An example is the midpoint convexity definition requiring that

\[
f\left(\frac{x_1 + x_2}{2}\right) \le \frac{f(x_1) + f(x_2)}{2}
\]

for $x_1, x_2 \in (a, b)$. Here the geometric requirement is only that the midpoint of every secant line lie on or above the graph of $f$. For more detailed information on convexity, see Mitrinovic [45]. Our main result for convex functions is as follows:

Theorem 3.9 (Jensen’s Inequality). Let $f(x)$ be convex on $(a, b)$, and let $x_1, \ldots, x_m$ be $m$ points of $(a, b)$. Also let $c_1, \ldots, c_m$ be nonnegative constants such that $c_1 + \cdots + c_m = 1$. Then

\[
f\left(\sum_{i=1}^{m} c_i x_i\right) \le \sum_{i=1}^{m} c_i f(x_i). \tag{3.21}
\]

If $f$ is strictly convex and if additionally each $c_i > 0$, then equality holds if and only if $x_1 = \cdots = x_m$.

Proof. We first consider the case for which $c_m < 1$, and proceed by induction. With $m = 2$, (3.21) holds by convexity of $f$. If $x_1 = x_2$, then equality holds in (3.21) by inspection, and if $f$ is strictly convex, all $c_i > 0$, and equality holds in (3.21) we must have $x_1 = x_2$, for otherwise

\[
f(c_1 x_1 + c_2 x_2) < c_1 f(x_1) + c_2 f(x_2).
\]

Now assume the theorem is true for $m = k$ and suppose the numbers $c_1, \ldots, c_{k+1}$ sum to 1. We have

\[
\begin{aligned}
f\left(\sum_{i=1}^{k+1} c_i x_i\right)
&= f\left((1 - c_{k+1})\sum_{i=1}^{k} \frac{c_i}{1 - c_{k+1}}\, x_i + c_{k+1} x_{k+1}\right) \\
&\le (1 - c_{k+1})\, f\left(\sum_{i=1}^{k} \frac{c_i}{1 - c_{k+1}}\, x_i\right) + c_{k+1} f(x_{k+1})
\end{aligned}
\]

by convexity of $f$. Since the numbers $c_i/(1 - c_{k+1})$ for $1 \le i \le k$ sum to 1,

\[
f\left(\sum_{i=1}^{k} \frac{c_i}{1 - c_{k+1}}\, x_i\right) \le \frac{1}{1 - c_{k+1}}\sum_{i=1}^{k} c_i f(x_i), \tag{3.22}
\]

hence (with $m = k + 1$) (3.21) holds. If $x_1 = \cdots = x_{k+1}$, equality holds in (3.21) by inspection. Now suppose equality holds (with $m = k + 1$) in (3.21), $f$ is strictly convex, and all $c_i > 0$. Then equality also holds in (3.22), for if not then equality cannot hold in (3.21) either, contrary to hypothesis. Hence, since the theorem is assumed true for $k$ numbers, $x_1 = \cdots = x_k$. Putting this into (3.21), we obtain

\[
f\left(\left(\sum_{i=1}^{k} c_i\right) x_1 + c_{k+1} x_{k+1}\right) = \left(\sum_{i=1}^{k} c_i\right) f(x_1) + c_{k+1} f(x_{k+1}),
\]

so by the case $m = 2$, $x_{k+1} = x_1$ and hence the induction is complete. The other case, for which $c_m = 1$, is much easier; for then $c_1 = \cdots = c_{m-1} = 0$, whence (3.21) becomes simply $f(x_m) \le f(x_m)$.

Example 3.4. The function $f(x) = x^n$, $n \in \mathbb{N}$, is convex on $(0, \infty)$. Hence for all $x, y > 0$ we have

\[
\left(\frac{x + y}{2}\right)^n \le \frac{x^n + y^n}{2}.
\]

Example 3.5. Choosing $c_i = 1/m$ for $i = 1, \ldots, m$, we have

\[
f\left(\frac{1}{m}\sum_{i=1}^{m} x_i\right) \le \frac{1}{m}\sum_{i=1}^{m} f(x_i)
\]

for any convex $f$. If instead $f$ is “concave” such that $-f$ is convex, our inequality becomes

\[
f\left(\frac{1}{m}\sum_{i=1}^{m} x_i\right) \ge \frac{1}{m}\sum_{i=1}^{m} f(x_i).
\]

An example is $f(x) = \sin x$ on $(0, \pi)$, and we have

\[
\frac{1}{m}\sum_{i=1}^{m} \sin\theta_i \le \sin\left(\frac{1}{m}\sum_{i=1}^{m} \theta_i\right) \tag{3.23}
\]

for $0 < \theta_1 \le \cdots \le \theta_m < \pi$. In fact, $-\sin x$ is strictly convex on $(0, \pi)$ so that equality holds in (3.23) only if $\theta_1 = \cdots = \theta_m$. See Exercise 3.21 for an application.

An integral form of Jensen’s inequality is introduced in Exercise 3.24.
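Jensen’s inequality (3.21) and the two examples above can be spot-checked numerically. This is a minimal sketch under stated assumptions: the sample points, the weights, and the helper name `jensen_gap` are illustrative and do not come from the text.

```python
import math


def jensen_gap(f, points, weights):
    """Return sum c_i f(x_i) - f(sum c_i x_i); nonnegative when f is convex."""
    assert abs(sum(weights) - 1.0) < 1e-12 and all(c >= 0 for c in weights)
    mean_point = sum(c * x for c, x in zip(weights, points))
    return sum(c * f(x) for c, x in zip(weights, points)) - f(mean_point)


thetas = [0.3, 1.1, 2.0, 2.9]   # sample angles inside (0, pi)
uniform = [0.25] * 4            # c_i = 1/m as in Example 3.5

# -sin is convex on (0, pi), so this gap restates (3.23).
gap_neg_sin = jensen_gap(lambda t: -math.sin(t), thetas, uniform)

# Example 3.4 with n = 3, x = 0.5, y = 2.
gap_cube = jensen_gap(lambda x: x ** 3, [0.5, 2.0], [0.5, 0.5])

# Equal points give equality.
gap_equal = jensen_gap(lambda t: -math.sin(t), [1.0] * 4, uniform)
```

A positive gap for distinct points and a zero gap for equal points is consistent with the equality condition of Theorem 3.9.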

3.10 Exercises

3.1. Construct an analytical proof of Young’s inequality.

3.2. Assume a, b, c, d > 0 and prove the following:

(a) $a^2 + b^2 \ge 2ab$.

(b) $a^4 + b^4 \ge 2a^2b^2$.

(c) $a^2 + b^2 + c^2 \ge ab + bc + ca$.

(d) $a^4 + b^4 + c^4 + d^4 \ge 4abcd$.

(e) $(a + b)(b + c)(c + a) \ge 8abc$.

3.3. Use AM–GM to show that

\[
n! < \left(\frac{n+1}{2}\right)^n.
\]

Prove also that if $n > 1$ is a natural number, then [45]

\[
(2n-1)!! < n^n \quad\text{and}\quad (n+1)^n > (2n)!!,
\]

where

\[
\begin{aligned}
(2n-1)!! &= (2n-1)\cdot(2n-3)\cdot(2n-5)\cdots 5\cdot 3\cdot 1, \\
(2n)!! &= (2n)\cdot(2n-2)\cdot(2n-4)\cdots 4\cdot 2.
\end{aligned}
\]

3.4. Simple applications of the AM–GM inequality:

(a) Show that of all rectangles having a given area, a square has the least perimeter.

(b) A charge $q$ is removed from a given electric charge $Q$ to make two separate charges $q$ and $Q - q$. Determine $q$ so that repulsion between the charges at a given distance is maximized.

3.5. Prove the weighted AM–GM inequality by induction on n.

3.6. Show that if the product of N positive numbers equals 1, then the sum ofthose numbers cannot be less than N .


3.7. Show that the sequence $(1 - n^{-1})^n$ is monotone increasing.

3.8. Let $a_1, \ldots, a_N$ be positive numbers that sum to 1, and let $m$ be a positive integer. Show that

\[
\sum_{n=1}^{N} a_n^{-m} \ge N^{m+1}.
\]

3.9. Let $n \in \mathbb{N}$, and let $x_1, \ldots, x_n$ and $\delta_1, \ldots, \delta_n$ be positive numbers such that $\sum_{i=1}^{n} \delta_i = 1$. For any real number $t \ne 0$ define $g(t) = \left(\sum_{i=1}^{n} \delta_i x_i^t\right)^{1/t}$.

(a) Show that $g(t) \to \prod_{i=1}^{n} x_i^{\delta_i}$ as $t \to 0$. Hence we define $g(0) = \prod_{i=1}^{n} x_i^{\delta_i}$.

(b) Show that $g$ is increasing. Preliminary hint: Take the logarithm and use l’Hôpital’s monotone rule on $(0, \infty)$ and $(-\infty, 0)$.

(c) Note that $g(-1) \le g(0) \le g(1)$ gives

\[
\left(\sum_{i=1}^{n} \delta_i / x_i\right)^{-1} \le \prod_{i=1}^{n} x_i^{\delta_i} \le \sum_{i=1}^{n} \delta_i x_i
\]

(the weighted harmonic–geometric–arithmetic means inequality).

3.10. Let $f(x) \in C[a, b]$ and let $f(x) > 0$ on $[a, b]$. Prove

\[
\frac{b - a}{\int_a^b (1/f(x))\,dx} \le \exp\left[\frac{1}{b-a}\int_a^b \ln f(x)\,dx\right] \le \frac{1}{b-a}\int_a^b f(x)\,dx.
\]

This is the harmonic–geometric–arithmetic mean inequality for integrals.

3.11. Let $n \in \mathbb{N}$ and $x_1, \ldots, x_n$ be positive numbers. On $(0, \infty)$ define $h(t) = \left(\sum_{i=1}^{n} x_i^t\right)^{1/t}$. Show that $h$ is decreasing.

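3.12. Show that for any $m$ numbers $a_i$ that satisfy $0 < a_1 < \cdots < a_m$ and any $m$ positive numbers $\lambda_i$ that sum to 1, Kantorovich’s inequality

\[
\left(\sum_{i=1}^{m} \lambda_i a_i\right)\left(\sum_{i=1}^{m} \frac{\lambda_i}{a_i}\right) \le \left(\frac{A}{G}\right)^2
\]

holds, where $A = (a_1 + a_m)/2$ and $G = \sqrt{a_1 a_m}$.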

3.13. Suppose $p$ and $q$ are positive real numbers such that $p^{-1} + q^{-1} = 1$. Let $a_1, \ldots, a_m$ be nonzero numbers. Define $b_i = c|a_i|^{p-1}$ for $i = 1, \ldots, m$. Verify that

\[
\frac{|a_i|^p}{\sum_{i=1}^{m} |a_i|^p} = \frac{|b_i|^q}{\sum_{i=1}^{m} |b_i|^q} \tag{3.24}
\]

for $i = 1, \ldots, m$. By Theorem 3.5 equality must hold in Hölder’s inequality. Verify this by direct substitution. Conversely, show that (3.24) implies there is a $c > 0$ such that

\[
|b_i| = c|a_i|^{p-1} \tag{3.25}
\]

for $i = 1, \ldots, m$. Hence the condition for equality in Hölder’s inequality can be stated by (3.25).


3.14. Justify equation (3.11).

3.15. A function $f(x)$ is square integrable on $[a, b]$ if

\[
\int_a^b |f(x)|^2\,dx < \infty.
\]

Show that the sum of two square integrable functions is square integrable.

3.16. Prove that if $h(x) \ge 0$, then

\[
\left|\int_a^b f(x) g(x) h(x)\,dx\right|^2 \le \int_a^b f^2(x) h(x)\,dx \int_a^b g^2(x) h(x)\,dx.
\]

3.17. A particle undergoes rectilinear motion at speed $v$. Show that the temporal average of $v$ never exceeds its spatial average:

\[
\frac{1}{T}\int_0^T v(t)\,dt \le \frac{1}{X}\int_0^X v(x)\,dx.
\]

Under what condition does equality hold?

3.18. Show that if $c_i > 0$ for $i = 1, \ldots, n$, then

\[
n^2 \le \left(\sum_{i=1}^{n} c_i\right)\left(\sum_{i=1}^{n} \frac{1}{c_i}\right).
\]

3.19. Use the Chebyshev inequalities for integrals to derive the inequalities

\[
\int_a^b [f(x)]^2\,dx \ge \frac{1}{b-a}\left(\int_a^b f(x)\,dx\right)^2
\]

and

\[
\int_a^b f(x)\,dx \int_a^b \frac{dx}{f(x)} \ge (b-a)^2.
\]

3.20. Show that the sum of finitely many convex functions is convex.

3.21. Prove that of all $N$-sided polygons that can be inscribed in a circle of fixed radius, a regular polygon has the greatest area.

3.22. Show that the condition

\[
(1-p)(x_2 - x_1)^2 \int_0^1 \int_{(1-p)s}^{s} f''(x_1 + (x_2 - x_1)t)\,dt\,ds \ge 0
\]

is equivalent to (3.20) for functions $f$ that are twice continuously differentiable. Reason from this fact that $f''(x) \ge 0$ is necessary and sufficient for the convexity of such functions.

3.23. Use Jensen’s inequality with $f(x) = -\ln x$, $x > 0$, to deduce the weighted AM–GM inequality.


3.24. Let $g(t)$ and $p(t)$ be continuous and defined for $a \le t \le b$ such that $\alpha \le g(t) \le \beta$ and $p(t) > 0$. Let $f(u)$ be continuous and convex on the interval $\alpha \le u \le \beta$. Show that

\[
f\left(\frac{\int_a^b g(t) p(t)\,dt}{\int_a^b p(t)\,dt}\right) \le \frac{\int_a^b f(g(t))\, p(t)\,dt}{\int_a^b p(t)\,dt}.
\]

This is Jensen’s inequality for integrals.

4 Inequalities in Abstract Spaces

4.1 Introduction

Generality is gained by working in abstract spaces. For instance, all essential aspects of the topics of convergence and continuity can be studied in the context of a metric space. When we search for solutions to problems of physical interest, we must often search among the members of linear or vector spaces. Inequalities provide basic structure for abstract spaces like these, and we turn to a consideration of that topic in the present chapter. In doing so we present a few topics from functional analysis. Needless to say our coverage is neither broad nor deep: we hope only to catch a glimpse of inequalities in the kind of abstract setting that can help unify many of our previous results before we proceed to the final chapter on applications.

4.2 Metric Spaces

Let $M$ be a nonempty set of elements (points). Let $d(x, y)$ be a real-valued function defined for each $x, y \in M$ such that:

M1. $d(x, y) = 0$ if and only if $x = y$.

M2. $d(x, y) = d(y, x)$.

M3. $d(x, y) \le d(x, z) + d(z, y)$ for every $x, y, z \in M$.


Then $M$ taken together with $d$ is a metric space, and $d$ is the distance or metric for $M$. Property M3 is an abstract version of the familiar triangle inequality. Putting $y = x$ in M3 and using the other two properties, we get $2d(x, z) \ge 0$ so that distance as defined is never negative. The distance is also symmetric by M2, and we see that the properties assigned to the metric do satisfy our primary expectations about the distance concept.

Example 4.1. $\mathbb{R}$ and $\mathbb{C}$ are metric spaces, with distances each defined by the usual metric

\[
d(x, y) = |x - y|.
\]

More generally, $\mathbb{R}^n$ and $\mathbb{C}^n$, the spaces consisting of $n$-tuples of elements of $\mathbb{R}$ and $\mathbb{C}$, respectively, are metric spaces. A closed interval $[a, b]$ in $\mathbb{R}$ is also a metric space. The set $C[a, b]$ of all real-valued functions defined and continuous on $[a, b]$ can form a metric space if the metric is chosen as

\[
d(f, g) = \max_{x \in [a,b]} |f(x) - g(x)|. \tag{4.1}
\]

To verify this we check to see that $d(f, g)$ satisfies the required properties. We have $d(f, g) = 0$ if and only if $|f(x) - g(x)| = 0$ for all $x \in [a, b]$, verifying M1. Property M2 is obviously satisfied. For M3,

\[
|f(x) - g(x)| = |f(x) - h(x) + h(x) - g(x)| \le |f(x) - h(x)| + |h(x) - g(x)|,
\]

so that

\[
\max_{x \in [a,b]} |f(x) - g(x)| \le \max_{x \in [a,b]} |f(x) - h(x)| + \max_{x \in [a,b]} |h(x) - g(x)|,
\]

as desired. The metrics in $\mathbb{R}^n$ and $\mathbb{C}^n$ are given, respectively, by

\[
d(x, y) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}
\]

and

\[
d(z, w) = \sqrt{|z_1 - w_1|^2 + \cdots + |z_n - w_n|^2}.
\]

We now state some definitions important in the study of metric spaces. An $\varepsilon$-neighborhood of $x_0$ in $M$ is a set

\[
N_\varepsilon(x_0) = \{x \in M \mid d(x, x_0) < \varepsilon\}.
\]

This is a direct extension of the corresponding definition in $\mathbb{R}$ (recall Example 1.3). A set $S$ in $M$ is open if, given any $x_0 \in S$, there exists $\varepsilon > 0$ such that $N_\varepsilon(x_0)$ is contained in $S$. A set $S$ in $M$ is said to be closed if its complement in $M$ is open. A point $z \in M$ is a limit point of a set $S$ if every $\varepsilon$-neighborhood of $z$ contains at least one point of $S$ distinct from $z$. It can be shown that $S$ is closed if and only if it contains all its limit points. A sequence of points $\{x_n\}$ converges to the limit $x$ if, for every $\varepsilon > 0$, there exists $N$ such that $d(x_n, x) < \varepsilon$ whenever $n > N$. A sequence $\{x_n\}$ is a Cauchy sequence if, for every $\varepsilon > 0$, there exists $N$ such that for every pair of numbers $m, n$, the inequalities $m > N$ and $n > N$ together imply that $d(x_m, x_n) < \varepsilon$.


Theorem 4.1. If $\{x_n\}$ converges, then $\{x_n\}$ is a Cauchy sequence.

Proof. Let $x_n \to x$. Then by the triangle inequality,

\[
d(x_m, x_n) \le d(x_m, x) + d(x, x_n) < \varepsilon/2 + \varepsilon/2 = \varepsilon
\]

for sufficiently large $n, m$.

The converse of Theorem 4.1 is false, and a metric space $M$ is called complete if every Cauchy sequence in $M$ converges to a point in $M$.

Example 4.2. Let $M = C[a, b]$, with $d$ defined as in (4.1). Then $M$ is complete. Let $\{f_n\}$ be a Cauchy sequence in $M$. For each $x \in [a, b]$, $\{f_n(x)\}$ is a Cauchy sequence in $\mathbb{R}$ and hence has a limit which we denote by $\varphi(x)$. To show that $\varphi(x)$ is continuous at $x_1 \in [a, b]$ we use an $\varepsilon/3$ argument. For $x_2 \in [a, b]$

\[
|\varphi(x_1) - \varphi(x_2)| \le |\varphi(x_1) - f_n(x_1)| + |f_n(x_1) - f_n(x_2)| + |f_n(x_2) - \varphi(x_2)|.
\]

The first and third terms can be made less than $\varepsilon/3$ for a sufficiently large choice of $n$, independent of $x_1$ and $x_2$, and with $n$ fixed, the middle term can be made less than $\varepsilon/3$ by choosing $x_2$ sufficiently close to $x_1$. Also, if $K > 0$, then

\[
\{f \in C[a, b] \mid |f(x)| \le K \text{ for all } x \in [a, b]\}
\]

is a complete metric space by the previous argument and Lemma 1.1.

Next, we take $M_1, M_2$ to be two metric spaces with distance functions $d_1, d_2$, respectively, and denote by $F : M_1 \to M_2$ a mapping (i.e., function) from $M_1$ to $M_2$. If $F : M \to M$, then we say that $F$ is a mapping on $M$. A mapping $F : M_1 \to M_2$ is continuous at $x_0 \in M_1$ if for every $\varepsilon > 0$, there is a $\delta > 0$ such that $d_2(F(x), F(x_0)) < \varepsilon$ whenever $d_1(x, x_0) < \delta$.

Lemma 4.1 (Persistence of Sign). If $M$ is a metric space and $f : M \to \mathbb{R}$ is continuous at $x_0$ with $f(x_0) > 0$, then there exists a $\delta > 0$ such that $f(x) > 0$ whenever $d(x, x_0) < \delta$.

Proof. We leave it to the reader to modify the proof of Lemma 2.2.

The interrelationship between convergence and continuity in $\mathbb{R}$, noted in Theorem 2.1, extends to a general metric space.

Theorem 4.2. The mapping $F : M_1 \to M_2$ is continuous at $x_0 \in M_1$ if and only if $F(x_n) \to F(x_0)$ whenever $x_n \to x_0$.

Proof. We leave it to the reader to make the necessary modifications to the proof of Theorem 2.1.


Let $F$ be a mapping on $M$. $F$ is a contraction mapping on $M$ if there exists a number $\alpha \in [0, 1)$ such that

\[
d(F(x_1), F(x_2)) \le \alpha\, d(x_1, x_2) \tag{4.2}
\]

whenever $x_1, x_2 \in M$. The point $y$ is a fixed point of $F$ if $F(y) = y$.

4.3 Iteration in a Metric Space

An iterative process is a method of locating a fixed point of $F$, or, in other words, of solving the equation $y = F(y)$. The method is based on use of the recursion

\[
y_{n+1} = F(y_n)
\]

for $n = 0, 1, 2, \ldots$, to obtain successive approximations to the desired fixed point $y$. The construction of such a sequence is referred to as Picard iteration. If $F$ is a contraction, then repeated application of (4.2) gives, for any $n \in \mathbb{N}$,

\[
d(y_{n+1}, y_n) \le \alpha^n d(y_1, y_0).
\]

Because $0 \le \alpha < 1$, the successive approximations form a sequence of points $y_0, y_1, y_2, \ldots$ in the metric space that cluster together at a rate controlled by $\alpha$.

The reader may prove that if $F$ is a contraction mapping on $M$, then $F$ is continuous on $M$ (in the $\varepsilon$–$\delta$ definition of continuity, choose $\delta = \varepsilon$). The following is one of the most important theorems in all of mathematics:

Theorem 4.3 (Banach Contraction Mapping Theorem). Let $M$ be a complete metric space and let $F : M \to M$ be a contraction. Then $F$ has a unique fixed point.

Proof. Choose any initial point $y_0 \in M$. As above, let $y_{m+1} = F(y_m)$ for all $m$. For $m > n$,

\[
d(y_m, y_n) \le d(y_m, y_{m-1}) + d(y_{m-1}, y_{m-2}) + \cdots + d(y_{n+2}, y_{n+1}) + d(y_{n+1}, y_n),
\]

so that

\[
\begin{aligned}
d(y_m, y_n) &\le (\alpha^{m-1} + \alpha^{m-2} + \cdots + \alpha^{n+1} + \alpha^n)\, d(y_1, y_0) \\
&= \alpha^n (1 + \alpha + \cdots + \alpha^{m-n-2} + \alpha^{m-n-1})\, d(y_1, y_0) \\
&\le \left(\frac{\alpha^n}{1 - \alpha}\right) d(y_1, y_0).
\end{aligned}
\]

Now $\alpha^n/(1 - \alpha)$ can be made arbitrarily small by choosing $n$ large enough. Hence $\{y_m\}$ is a Cauchy sequence, so there is a point $Y \in M$ such that $y_m \to Y$ as $m \to \infty$. By continuity of $F$,

\[
Y = \lim_{m \to \infty} F(y_m) = F\left(\lim_{m \to \infty} y_m\right) = F(Y)
\]

and the existence of the fixed point is established. For uniqueness, suppose that $Y = F(Y)$ and $Z = F(Z)$. Then

\[
d(Y, Z) = d[F(Y), F(Z)] \le \alpha\, d(Y, Z).
\]

But $\alpha < 1$; hence $d(Y, Z) = 0$, and uniqueness is established.

We will encounter several applications of the contraction mapping theorem in Chapter 5.
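Picard iteration and the estimate $d(y_m, y_n) \le \alpha^n d(y_1, y_0)/(1 - \alpha)$ from the proof can be tried on a concrete mapping. The sketch below is only an illustration under stated assumptions: the choice $F(y) = \cos y$ on $M = [0, 1]$ with the usual metric, and the contraction constant $\alpha = \sin 1$ (from the mean value theorem), are not taken from the text.

```python
import math


def picard(F, y0, steps):
    """Return the Picard iterates y_0, y_1, ..., y_steps of y_{n+1} = F(y_n)."""
    iterates = [y0]
    for _ in range(steps):
        iterates.append(F(iterates[-1]))
    return iterates


# Assumed example: cos maps [0, 1] into itself and |cos u - cos v| <= sin(1) |u - v| there.
alpha = math.sin(1.0)
ys = picard(math.cos, 0.0, 60)
approx_fixed_point = ys[-1]          # y_60, close to the fixed point Y
first_step = abs(ys[1] - ys[0])      # d(y_1, y_0)

# Successive steps shrink at least geometrically: d(y_{n+1}, y_n) <= alpha^n d(y_1, y_0).
steps_shrink = all(
    abs(ys[n + 1] - ys[n]) <= alpha ** n * first_step + 1e-12 for n in range(40)
)

# Bound from the proof with m = 60: d(y_m, y_n) <= alpha^n / (1 - alpha) * d(y_1, y_0).
bounds_hold = all(
    abs(ys[60] - ys[n]) <= alpha ** n / (1 - alpha) * first_step + 1e-12
    for n in range(40)
)
```

In this example the iterates converge faster than the bound predicts; the estimate is a guarantee, not a sharp rate.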

4.4 Linear Spaces

A linear space over a field $F$ is a set $X$ whose elements are called vectors together with two operations called vector addition and scalar multiplication such that the following axioms are satisfied:

1. $X$ is closed with respect to the two operations. That is, $x + y \in X$ and $\alpha x \in X$ whenever $x, y \in X$ and $\alpha \in F$.

2. Addition is both commutative and associative.

3. There is an additive identity element (zero vector) in $X$. For each vector $x \in X$, there is an additive inverse $-x \in X$.

4. If $x, y \in X$ and $\alpha, \beta \in F$, then:

(a) $\alpha(x + y) = \alpha x + \alpha y$.

(b) $(\alpha + \beta)x = \alpha x + \beta x$.

(c) $(\alpha\beta)x = \alpha(\beta x)$.

(d) $1x = x$.

If the field of scalars is $\mathbb{R}$, then $X$ is a real linear space; if $F = \mathbb{C}$, then $X$ is a complex linear space.

Example 4.3. Rn and Cn are linear spaces. The collection of all real-valued continuous functions on [a, b] is also a linear space.

Some terms and concepts are important in the study of linear spaces. If $M$ is a nonempty subset of a linear space $X$, then $M$ is a subspace of $X$ if $M$ is itself a vector space under the same operations of addition and multiplication as $X$. A linear combination of the vectors $x_1, \ldots, x_n$ is a sum of the form $\alpha_1 x_1 + \cdots + \alpha_n x_n$ where the scalars $\alpha_1, \ldots, \alpha_n \in F$. A set of vectors $\{x_1, \ldots, x_n\}$ is linearly dependent if there exist $\alpha_1, \ldots, \alpha_n \in F$, not all zero, such that $\alpha_1 x_1 + \cdots + \alpha_n x_n = 0$. A set of vectors that is not linearly dependent is linearly independent. If every vector $x \in X$ can be expressed as a linear combination of the vectors from a set $S$, then $S$ is a spanning set for $X$. A linearly independent spanning set is called a basis. A basis is essentially a coordinate system. Any vector space which has a finite spanning set (i.e., any finite-dimensional space) contains a basis. All bases of such a space contain the same number of vectors, called the dimension of the space. If $X$ has dimension $n$, then any set of $n$ linearly independent vectors in $X$ is a basis of $X$.

Thus far, the issues of “magnitude and direction” normally associated with physical vectors have not been addressed. To introduce the notion of magnitude into a linear space, we define the vector norm. A norm is a real-valued function that assigns to each vector $x \in X$ a number $\|x\|$ such that:

N1. $\|x\| \ge 0$, and $\|x\| = 0$ if and only if $x = 0$.

N2. $\|\alpha x\| = |\alpha|\,\|x\|$.

N3. $\|x + y\| \le \|x\| + \|y\|$.

Example 4.4. In $\mathbb{R}^n$,

\[
\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}.
\]

A function space is an example of an infinite-dimensional linear space. Norms often used with function spaces include the supremum norm

\[
\|f\| = \max_{x \in [a,b]} |f(x)|
\]

and the $L^2$ norm

\[
\|f\| = \left(\int_a^b |f(x)|^2\,dx\right)^{1/2}.
\]

For a particular selection of norm, the function space is defined as the set of all $f$ with $\|f\| < \infty$.

X is now a normed linear space, and we have the following result:

Theorem 4.4 (Triangle Inequality). Assume that $x, y$ are two vectors in a normed linear space. Then

\[
\bigl|\|x\| - \|y\|\bigr| \le \|x - y\| \le \|x\| + \|y\|. \tag{4.3}
\]

Proof. By N3 with $x$ replaced by $x - y$,

\[
\|x\| - \|y\| \le \|x - y\|.
\]

Swapping $x$ and $y$ we have, by N2,

\[
\|y\| - \|x\| \le \|y - x\| = \|(-1)(x - y)\| = \|x - y\|.
\]

Therefore $\bigl|\|x\| - \|y\|\bigr| \le \|x - y\|$, and the use of N3 again yields the desired inequality.


It is often convenient to have a measure of the distance between arbitrary elements of the normed space. Hence we employ the norm to induce a metric using

\[
d(x, y) = \|x - y\|.
\]

The normed linear space is now a metric space also, and our previous concepts involving Cauchy sequences, convergence, and completeness apply. The following theorem, for instance, is useful in applications:

Theorem 4.5. In a normed linear space every Cauchy sequence is bounded.

Proof. If $\{x_n\}$ is a Cauchy sequence, then with $\varepsilon = 1$ there exists $N$ such that $\|x_n - x_m\| < 1$ whenever $n, m > N$. With $m = N + 1$, this reads $\|x_n - x_{N+1}\| < 1$ whenever $n > N$. For all $n > N$,

\[
\|x_n\| = \|x_n - x_{N+1} + x_{N+1}\| \le \|x_n - x_{N+1}\| + \|x_{N+1}\| < \|x_{N+1}\| + 1.
\]

Hence an upper bound for $\|x_n\|$ is given by

\[
B = \max\{\|x_1\|, \ldots, \|x_N\|, \|x_{N+1}\| + 1\}.
\]

To get the concept of angle (and hence of vector direction) into a linear space, we introduce the inner product. An inner product on a real linear space is a function which assigns to each pair of vectors $x, y$ a real number $\langle x, y\rangle$ such that:

I1. $\langle x, y\rangle = \langle y, x\rangle$.

I2. $\langle \alpha x, y\rangle = \alpha\langle x, y\rangle$.

I3. $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$.

I4. $\langle x, x\rangle \ge 0$, and $\langle x, x\rangle = 0$ if and only if $x = 0$.

To define an inner product on a complex linear space, we modify the above so that $\langle x, y\rangle \in \mathbb{C}$, and rewrite I1 as:

I1. $\langle x, y\rangle = \overline{\langle y, x\rangle}$.

A linear space furnished with an inner product is called an inner product space.

Example 4.5. In $\mathbb{R}^n$ and $\mathbb{C}^n$ we use, respectively, the inner product expressions

\[
\langle x, y\rangle = \sum_{i=1}^{n} x_i y_i, \qquad \langle x, y\rangle = \sum_{i=1}^{n} x_i \overline{y_i}.
\]

An inner product of the form

\[
\langle f(x), g(x)\rangle = \int_a^b f(x) g(x)\,dx
\]

is often used with functions. (Here we gloss over a technical point involving measure theory and Lebesgue integration; the reader is referred to Oden [47] for a more thorough treatment.)

With inner-product structure in a linear space, we can watch more familiar inequalities arise.

Theorem 4.6 (Cauchy–Schwarz Inequality). Let $x, y$ be two vectors in a complex inner product space. Then

\[
|\langle x, y\rangle| \le \sqrt{\langle x, x\rangle\langle y, y\rangle} \tag{4.4}
\]

and equality holds if and only if there is a scalar $\beta$ such that $x = \beta y$. Furthermore, in the case of equality with $y \ne 0$, $\langle x, y\rangle = \langle \beta y, y\rangle = \beta\langle y, y\rangle$ so $\beta = \langle x, y\rangle/\langle y, y\rangle$. Thus equality holds if and only if $x = 0$ or $y = 0$ or else $x = (\langle x, y\rangle/\langle y, y\rangle)\,y$.

Proof. By property I4, for every scalar $\alpha$, $0 \le \langle x + \alpha y, x + \alpha y\rangle$ with equality if and only if $x = -\alpha y = \beta y$. By the other properties this inequality can be manipulated into the equivalent form

\[
0 \le \langle x, x\rangle + \overline{\alpha}\langle x, y\rangle + \alpha\overline{\langle x, y\rangle} + \alpha\overline{\alpha}\langle y, y\rangle.
\]

To shorten the notation, we write $a = \langle x, x\rangle$, $b = \langle x, y\rangle$, $c = \langle y, y\rangle$ and have

\[
0 \le |\alpha|^2 c + 2\,\mathrm{Re}[\overline{\alpha} b] + a.
\]

Note that $a$ and $c$ are real and nonnegative. If $c \ne 0$, we may put $\alpha = -b/c$ and get $|b|^2 \le ac$ as desired. If $c = 0$ but $a \ne 0$, the roles of $x$ and $y$ may be reversed in the definitions of $a, b, c$ above to yield the same result. If $c$ and $a$ are both zero, then $x$ and $y$ are both zero by I4, and Cauchy–Schwarz holds trivially.

Example 4.6. Substituting the various inner product expressions given in Example 4.5, we may quickly generate the specific forms

\[
\left|\sum_{i=1}^{n} x_i y_i\right| \le \sqrt{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2},
\]

\[
\left|\sum_{i=1}^{n} x_i \overline{y_i}\right| \le \sqrt{\sum_{i=1}^{n} |x_i|^2 \sum_{i=1}^{n} |y_i|^2},
\]

\[
\left|\int_a^b f(x) g(x)\,dx\right| \le \sqrt{\int_a^b |f(x)|^2\,dx \int_a^b |g(x)|^2\,dx}.
\]

Note the economy and power of the abstract space approach.
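The $\mathbb{C}^n$ form of the Cauchy–Schwarz inequality, together with its equality condition $x = \beta y$, can be checked with Python’s built-in complex numbers. The vectors, the scalar `beta`, and the helper names below are hypothetical choices for illustration; the inner product is taken linear in the first argument, matching I2.

```python
def inner(x, y):
    """Inner product on C^n, linear in the first argument: sum of x_i * conj(y_i)."""
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))


def cauchy_schwarz_gap(x, y):
    """Return sqrt(<x,x><y,y>) - |<x,y>|, which should be nonnegative."""
    return (inner(x, x).real * inner(y, y).real) ** 0.5 - abs(inner(x, y))


x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 1 + 1j, -2j]
beta = 0.5 - 1.5j

gap_generic = cauchy_schwarz_gap(x, y)             # strictly positive here
parallel = [beta * yi for yi in y]                 # x = beta * y
gap_parallel = cauchy_schwarz_gap(parallel, y)     # zero up to rounding
recovered_beta = inner(parallel, y) / inner(y, y)  # beta = <x, y> / <y, y>
```

The last line reproduces the formula for $\beta$ stated in Theorem 4.6.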


For any two vectors $x, y$ in a real linear space, the Cauchy–Schwarz inequality can be written as

\[
\langle x, y\rangle^2 \le \langle x, x\rangle\langle y, y\rangle. \tag{4.5}
\]

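Theorem 4.7 (Minkowski Inequality). Suppose $x, y$ are two vectors in a linear space. Then

\[
\sqrt{\langle x + y, x + y\rangle} \le \sqrt{\langle x, x\rangle} + \sqrt{\langle y, y\rangle}. \tag{4.6}
\]

Proof.

\[
\begin{aligned}
\langle x + y, x + y\rangle &= \langle x, x\rangle + 2\langle x, y\rangle + \langle y, y\rangle \\
&\le \langle x, x\rangle + 2|\langle x, y\rangle| + \langle y, y\rangle \\
&\le \langle x, x\rangle + 2\sqrt{\langle x, x\rangle\langle y, y\rangle} + \langle y, y\rangle \\
&= \left(\sqrt{\langle x, x\rangle} + \sqrt{\langle y, y\rangle}\right)^2.
\end{aligned}
\]

Conditions for equality are treated in Exercise 4.8.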

Whereas the triangle inequality makes a statement about generalized distances, we see that the Cauchy–Schwarz and Minkowski inequalities, as stated above, are concerned with the properties of generalized angles. A norm can also be conveniently induced by the inner product using the equation

\[
\|x\|^2 = \langle x, x\rangle. \tag{4.7}
\]

The Cauchy–Schwarz and Minkowski inequalities can then be written as

\[
|\langle x, y\rangle| \le \|x\|\,\|y\|
\]

and

\[
\|x + y\| \le \|x\| + \|y\|,
\]

respectively. In this case we also have the following theorem:

Theorem 4.8 (Parallelogram Law). Let $x, y$ be vectors in a linear space in which the norm is induced by the inner product. Then

\[
\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.
\]

Proof. This follows from straightforward expansion and manipulation of the quantity $\langle x + y, x + y\rangle + \langle x - y, x - y\rangle$ using the basic inner product properties.
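The parallelogram law is specific to norms induced by an inner product. As a hedged illustration, the sketch below evaluates the defect $\|x + y\|^2 + \|x - y\|^2 - 2\|x\|^2 - 2\|y\|^2$ for the Euclidean norm of Example 4.4 and, for contrast, for the sum-of-absolute-values norm on $\mathbb{R}^2$. The second norm, the sample vectors, and all function names are assumptions introduced here, not taken from the text.

```python
def norm2(v):
    """Euclidean norm, induced by the inner product sum x_i y_i."""
    return sum(t * t for t in v) ** 0.5


def norm1(v):
    """Sum of absolute values: a norm, but not one induced by an inner product."""
    return sum(abs(t) for t in v)


def parallelogram_defect(norm, x, y):
    """Return ||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 for the given norm."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2


x, y = [1.0, 0.0], [0.0, 1.0]
defect_euclidean = parallelogram_defect(norm2, x, y)   # zero up to rounding
defect_sum_norm = parallelogram_defect(norm1, x, y)    # 4 + 4 - 2 - 2 = 4
```

A nonzero defect shows that the second norm cannot arise from (4.7) for any inner product.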


Note that in $\mathbb{R}^2$ the vectors $x, y$ represent adjacent sides of a parallelogram, while the vectors $x + y$, $x - y$ are the diagonals.

With a norm present, we may also employ concepts of convergence and completeness. A Hilbert space is a complete linear space equipped with an inner product. The following two items are also of some use in applications:

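Theorem 4.9 (Continuity of the Inner Product). Assume the norm is induced by the inner product, and suppose that $x_n \to x$ and $y_n \to y$. Then $\langle x_n, y_n\rangle \to \langle x, y\rangle$.

Proof. We use the triangle and Cauchy–Schwarz inequalities:

\[
\begin{aligned}
|\langle x_n, y_n\rangle - \langle x, y\rangle| &= |\langle x_n, y_n\rangle - \langle x_n, y\rangle + \langle x_n, y\rangle - \langle x, y\rangle| \\
&= |\langle x_n, y_n - y\rangle + \langle x_n - x, y\rangle| \\
&\le |\langle x_n, y_n - y\rangle| + |\langle x_n - x, y\rangle| \\
&\le \|x_n\|\,\|y_n - y\| + \|x_n - x\|\,\|y\|.
\end{aligned}
\]

Since $\{x_n\}$ is convergent it is bounded with $\|x_n\| \le B$ for some finite $B$. The other $n$-dependent quantities can be made as small as desired by choosing $n$ sufficiently large.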

Corollary 4.9.1 (Continuity of the Norm). If the norm is induced by the inner product, then $x_n \to x$ implies $\|x_n\| \to \|x\|$.

The notion of angle also leads to a generalized notion of “perpendicularity.” Two vectors $x, y$ are orthogonal if $\langle x, y\rangle = 0$. We say $\{x_1, x_2, \ldots\}$ is an orthogonal set if for all $i, j > 0$, $i \ne j$ implies $\langle x_i, x_j\rangle = 0$. An orthonormal set is an orthogonal set where for all $i$, $\|x_i\| = 1$. It can be shown that mutual orthogonality among the members of a finite set of nonzero vectors implies linear independence among those vectors. Through an algorithm called the Gram–Schmidt procedure, we can generate a mutually orthogonal set from any linearly independent set of vectors. A handy theorem, which follows directly from the inner product and orthogonality definitions, is the following:

Theorem 4.10 (Pythagorean Theorem). Let the vectors $x$ and $y$ be orthogonal. Then

\[
\|x + y\|^2 = \|x\|^2 + \|y\|^2.
\]
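The text names the Gram–Schmidt procedure without spelling it out. The following is a minimal sketch of the classical form of that procedure for real vectors, using exact rational arithmetic so that orthogonality and the Pythagorean identity can be checked exactly. The input vectors and the function names are illustrative assumptions.

```python
from fractions import Fraction


def dot(x, y):
    """Inner product on R^n: sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Return mutually orthogonal vectors spanning the same space as the
    linearly independent input vectors (classical Gram-Schmidt, no normalization)."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            coeff = dot(v, u) / dot(u, u)   # component of v along u
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        basis.append(w)
    return basis


vectors = [[Fraction(c) for c in row] for row in ([1, 1, 0], [1, 0, 1], [0, 1, 1])]
u1, u2, u3 = gram_schmidt(vectors)

# Pythagorean theorem for the orthogonal pair u1, u2.
s = [a + b for a, b in zip(u1, u2)]
pythagoras_holds = dot(s, s) == dot(u1, u1) + dot(u2, u2)
```

Dividing each resulting vector by its norm would give an orthonormal set; that step is omitted here to keep the arithmetic rational.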

4.5 Orthogonal Projection and Expansion

Given a vector $x$ in a Hilbert space $H$, and a subspace $M$ of $H$, some optimization schemes require a vector $m \in M$ that is “closest” to $x$ in the sense that $\|x - m\|$ is minimized. Such a vector $m_0$ is known as a minimizing vector. Assuming $M$ is closed, we can establish the existence and uniqueness of $m_0 \in M$. First, we wish to show that, corresponding to the given $x$, there exists $m_0 \in M$ such that for all $m \in M$,

\[
\|x - m\| \ge \|x - m_0\|.
\]

Let $x \in H$ be given. If $x \in M$, we simply choose $m_0 = x$. If $x \notin M$, we define

\[
\delta = \inf_{m \in M} \|x - m\|.
\]

Note that for any $m_i, m_j \in M$,

\[
\|m_j - m_i\|^2 = \|(m_j - x) + (x - m_i)\|^2
\]

and by Theorem 4.8

\[
\|(m_j - x) + (x - m_i)\|^2 + \|(m_j - x) - (x - m_i)\|^2 = 2\|x - m_j\|^2 + 2\|x - m_i\|^2.
\]

Hence

\[
\begin{aligned}
\|m_j - m_i\|^2 &= 2\|x - m_j\|^2 + 2\|x - m_i\|^2 - 4\left\|x - \frac{m_i + m_j}{2}\right\|^2 \\
&\le 2\|x - m_j\|^2 + 2\|x - m_i\|^2 - 4\delta^2.
\end{aligned} \tag{4.8}
\]

The inequality follows by definition of $\delta$, because $M$ is a subspace and therefore contains the vectors $(m_i + m_j)/2$. Now let $\{m_i\}$ be a sequence in $M$ such that $\|x - m_i\| \to \delta$. As $i, j \to \infty$ then, the squeeze principle gives $\|m_j - m_i\| \to 0$ so that $\{m_i\}$ is a Cauchy sequence in $M$ (and in $H$). Because $\{m_i\}$ is a Cauchy sequence and $M$ is closed, $\{m_i\}$ converges to a point $m_0 \in M$. By continuity of the norm, $\|x - m_0\| = \delta$. The minimizing vector is unique. For supposing that $m_{01}, m_{02} \in M$ are two minimizing vectors, then choosing $m_i = m_{01}$ and $m_j = m_{02}$ in (4.8) gives

\[
\begin{aligned}
\|m_{02} - m_{01}\|^2 &\le 2\|x - m_{02}\|^2 + 2\|x - m_{01}\|^2 - 4\delta^2 \\
&\le 2\delta^2 + 2\delta^2 - 4\delta^2 \\
&= 0,
\end{aligned}
\]

and hence $m_{02} = m_{01}$.

From an intuitive “best approximation” standpoint, it is not surprising

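(Figure 4.1) that $m_0$ is the unique minimizing vector if and only if the error vector $m_0 - x$ is orthogonal to every $m \in M$. For suppose there exists $m \in M$ that fails to be orthogonal to $m_0 - x$. We may assume (by dividing $m$ by its norm if necessary) that $m$ is a unit vector. We get a vector in $M$ closer to $x$ by subtracting the component of $m_0 - x$ along $m$ from $m_0$ as follows: denote $\langle m_0 - x, m\rangle = \alpha \ne 0$. The vector $m_0 - \alpha m$ satisfies

\[
\begin{aligned}
\|x - (m_0 - \alpha m)\|^2 &= \langle (x - m_0) + \alpha m, (x - m_0) + \alpha m\rangle \\
&= \langle x - m_0, x - m_0\rangle + \langle \alpha m, x - m_0\rangle + \langle x - m_0, \alpha m\rangle + \langle \alpha m, \alpha m\rangle \\
&= \|x - m_0\|^2 + 2\,\mathrm{Re}(\langle \alpha m, x - m_0\rangle) + |\alpha|^2\|m\|^2 \\
&= \|x - m_0\|^2 - 2|\alpha|^2 + |\alpha|^2 \\
&= \|x - m_0\|^2 - |\alpha|^2 \\
&< \|x - m_0\|^2.
\end{aligned} \tag{4.9}
\]

FIGURE 4.1. Minimizing vector. (Shows the vector $x$ in $H$, the subspace $M$, a vector $m \in M$, and the minimizing vector $m_0$.)

Because $\|x - (m_0 - \alpha m)\| < \|x - m_0\|$, $m_0$ cannot be a minimizing vector, a contradiction. The orthogonality of $x - m_0$ to every $m \in M$ is sufficient for uniqueness of $m_0$ because for any $m \in M$ we have

\[
\|x - m\|^2 = \|x - m_0 + m_0 - m\|^2 = \|x - m_0\|^2 + \|m - m_0\|^2,
\]

and hence $\|x - m\| > \|x - m_0\|$ unless $m = m_0$.

Abstract spaces also offer a concise viewpoint on the subject of Fourier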

series expansion. Let $f$ be an arbitrary vector in a Hilbert space, and $\{x_1, x_2, \ldots\}$ an orthonormal set in the space. For fixed $n \in \mathbb{N}$, form the subspace generated by functions

\[
g(x) = \sum_{i=1}^{n} c_i x_i.
\]

Using the properties of the inner product and orthonormality, it is easily shown that

\[
\langle f - g, f - g\rangle = \|f - g\|^2 = \|f\|^2 + \sum_{i=1}^{n} |c_i - \langle f, x_i\rangle|^2 - \sum_{i=1}^{n} |\langle f, x_i\rangle|^2.
\]

The terms $\|f\|^2$ and $\sum_{i=1}^{n} |\langle f, x_i\rangle|^2$ are fixed; hence, with $c_i = \langle f, x_i\rangle$ the function $f - g$ is minimized in the least squares sense and Bessel’s inequality

\[
\sum_{i=1}^{n} |\langle f, x_i\rangle|^2 \le \|f\|^2
\]

holds. These choices of $c_i$ are the Fourier coefficients of $f$. As $n \to \infty$, the resulting series on the left must converge; therefore, we deduce the Riemann lemma

\[
\lim_{i \to \infty} \langle f, x_i\rangle = 0.
\]

Consider the classical setting for Fourier series, where $H$ is the set of square-summable functions on $[0, 2\pi]$ using Lebesgue integration with the inner product given by

\[
\langle f, g\rangle = \frac{1}{2\pi}\int_0^{2\pi} f(x)\overline{g(x)}\,dx.
\]

With $n$ fixed, let $M$ be the subspace generated by all trigonometric polynomials of order $n$. That is, let

\[
\{x_1, \ldots, x_n\} = \{1, e^{\pm ix}, e^{\pm 2ix}, \ldots, e^{\pm mix}\}.
\]

The Fourier coefficients $c_i$ chosen as above make $g(x)$ the minimizing vector, the vector closest to a given $f$.
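The least-squares identity and Bessel’s inequality above can be verified in a small finite-dimensional setting. This sketch works in $\mathbb{R}^3$ with a hand-picked orthonormal pair; the vectors, the competing coefficients, and the helper names are assumptions chosen only so that the arithmetic stays exact.

```python
from fractions import Fraction as Fr


def dot(x, y):
    """Inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def combine(coeffs, basis):
    """Return g = sum c_i x_i for the given coefficients and basis vectors."""
    n = len(basis[0])
    return [sum(c * x[k] for c, x in zip(coeffs, basis)) for k in range(n)]


def squared_error(f, coeffs, basis):
    """Return ||f - g||^2 with g = sum c_i x_i."""
    g = combine(coeffs, basis)
    r = [a - b for a, b in zip(f, g)]
    return dot(r, r)


# An orthonormal pair in R^3 (assumed example) and a vector f to approximate.
basis = [[Fr(1), Fr(0), Fr(0)], [Fr(0), Fr(3, 5), Fr(4, 5)]]
f = [Fr(2), Fr(1), Fr(-1)]

fourier = [dot(f, x) for x in basis]                 # c_i = <f, x_i>
best_error = squared_error(f, fourier, basis)        # ||f||^2 - sum |<f, x_i>|^2
other_error = squared_error(f, [Fr(1), Fr(0)], basis)

# The residual f - g is orthogonal to every basis vector of the subspace.
g = combine(fourier, basis)
residual = [a - b for a, b in zip(f, g)]
residual_products = [dot(residual, x) for x in basis]
```

Any other choice of coefficients increases the squared error by $\sum |c_i - \langle f, x_i\rangle|^2$, in line with the displayed identity.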

4.6 Exercises

4.1. Prove that in any metric space:

(a) |d(x, y) − d(x, z)| ≤ d(y, z).

(b) d(x1, xn) ≤ d(x1, x2) + d(x2, x3) + · · · + d(xn−1, xn).

4.2. Prove that if a sequence converges in a metric space, then its limit is unique.

4.3. (The space $l^p$.) Let $p \ge 1$ be a fixed real number, let $X$ be the set of all real sequences of the form $x = \{\xi_1, \xi_2, \ldots\}$ such that $\sum_{i=1}^{\infty} |\xi_i|^p$ is convergent. For two points $x = \{\xi_1, \xi_2, \ldots\}$ and $y = \{\eta_1, \eta_2, \ldots\}$, let the distance be defined as

\[
d(x, y) = \left(\sum_{i=1}^{\infty} |\xi_i - \eta_i|^p\right)^{1/p}.
\]

Show that:

(a) the series defining $d(x, y)$ is convergent for all $x, y \in X$; and

(b) $X$ is a metric space.

4.4. Show that $C[a, b]$ with distance defined using

\[
d(f, g) = \int_a^b |f(x) - g(x)|\,dx
\]

is a metric space.

4.5. Show that the set of all bounded sequences $\{x_i\}$ with

\[
d(x, y) = \sup_{1 \le i < \infty} |x_i - y_i|
\]

is a metric space.

4.6. Show that equation (4.7) gives rise to a valid norm.

4.7. In a triangle $ABC$ let the points $\beta$ and $\alpha$ be the midpoints of sides $AC$ and $BC$, respectively. Show that if the side lengths of the triangle satisfy $AC > BC$, then the medians satisfy $A\alpha > B\beta$.

4.8. State and prove conditions for equality in Minkowski’s inequality.

4.9. Use the Cauchy–Schwarz inequality to prove Theorem 1.1.

5 Some Applications

5.1 Introduction

We now look at a small subset of inequality applications. The reader who has worked patiently through the mathematical content of the previous chapters should be comfortable dealing with these and many other areas of inequality application. The topics of this chapter are chosen for variety, and are presented in no particular order (just as we might encounter them in practice).

5.2 Estimation of Integrals

This idea was introduced in Chapter 2, and we now offer some additional examples. Note that the triangle, Cauchy–Schwarz, and Minkowski inequalities all provide upper bounds for an integral. A useful lower bound can often be obtained from the Chebyshev inequality.

Example 5.1. Consider the integral

\[
I = \int_0^1 \sqrt{1 + x^5}\,dx.
\]

Because the integrand takes extreme values of 1 and $\sqrt{2}$ on $[0, 1]$, the inequality $1 \le I \le \sqrt{2}$ is easily obtained. However, Cauchy–Schwarz with $g(x) \equiv 1$ gives an improved upper bound:

\[
\int_0^1 \left|\sqrt{1 + x^5}\right| dx \le \sqrt{\int_0^1 (1 + x^5)\,dx} = \sqrt{\frac{7}{6}} \approx 1.080.
\]

So $1 \le I \le \sqrt{7/6}$. (A numerical evaluation gives $I \approx 1.075$.)

Example 5.2. Given two functions $f(t)$ and $g(t)$ defined on $(-\infty, \infty)$, the convolution of $f(t)$ and $g(t)$, written $f(t) * g(t)$, is defined by

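\[ f(t) * g(t) = \int_{-\infty}^{\infty} f(x) g(t - x) \, dx \]

provided the integral exists. The function f ∗ g is bounded provided that both f(t) and g(t) are square integrable on (−∞,∞) (see Exercise 3.15). For by Cauchy–Schwarz we have

\[ |f(t) * g(t)|^2 = \left( \int_{-\infty}^{\infty} f(x) g(t - x) \, dx \right)^2 \le \int_{-\infty}^{\infty} |f(x)|^2 \, dx \int_{-\infty}^{\infty} |g(t - x)|^2 \, dx. \]

The substitution u = t − x shows that the last integral equals $\int_{-\infty}^{\infty} |g(u)|^2 \, du$, so the right-hand side is finite and does not depend on t. Hence |f(t) ∗ g(t)| < ∞ for all t, and f ∗ g is bounded.

Example 5.3. The integrand of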

\[ I = \int_2^5 e^x (x + 1) \, dx \]

is a product of functions that both increase on [2, 5]. Hence by the Chebyshev inequality

\[ I \ge \frac{1}{3} \int_2^5 e^x \, dx \int_2^5 (x + 1) \, dx = \frac{9}{2}(e^5 - e^2) \approx 635. \]

An upper bound can be obtained from Cauchy–Schwarz:

\[ I \le \sqrt{\int_2^5 e^{2x} \, dx \int_2^5 (x + 1)^2 \, dx} \approx 832. \]

The precise value of I is close to 727.
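The bounds of Examples 5.1 and 5.3 are easy to check numerically. The following is a minimal sketch, not part of the original development: the helper `simpson` and the choice of 1000 subintervals are our own assumptions, and any reasonable quadrature routine would serve equally well.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson approximation of the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Example 5.1: Cauchy-Schwarz upper bound with g(x) = 1.
I1 = simpson(lambda x: math.sqrt(1 + x**5), 0.0, 1.0)
upper1 = math.sqrt(simpson(lambda x: 1 + x**5, 0.0, 1.0))

# Example 5.3: Chebyshev lower bound and Cauchy-Schwarz upper bound on [2, 5].
a, b = 2.0, 5.0
f = math.exp
g = lambda x: x + 1
I3 = simpson(lambda x: f(x) * g(x), a, b)
lower3 = simpson(f, a, b) * simpson(g, a, b) / (b - a)
upper3 = math.sqrt(simpson(lambda x: f(x) ** 2, a, b) * simpson(lambda x: g(x) ** 2, a, b))

print(f"Example 5.1: 1 <= {I1:.4f} <= {upper1:.4f}")
print(f"Example 5.3: {lower3:.1f} <= {I3:.1f} <= {upper3:.1f}")
```

The exact value in Example 5.3 is $5e^5 - 2e^2$, since $x e^x$ is an antiderivative of $e^x(x+1)$; this gives an independent check on the quadrature.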

5.3 Series Expansions

Series of functions arise in many contexts. Suppose the functions $u_i(x)$, for i = 1, 2, 3, . . . , have a common domain D along the x-axis. The nth partial sum of the series $\sum u_i(x)$ is

\[ S_n(x) = \sum_{i=1}^{n} u_i(x). \]

For each fixed $x_0 \in D$ we could view $\sum u_i(x_0)$ as a numerical series and decide on its convergence or divergence via the standard tests from calculus. This would yield information about pointwise convergence of $\sum u_i(x)$. However, uniform convergence is more important. We say that $\sum u_i(x)$ converges uniformly to u(x) on D if and only if for every ε > 0 there is an N > 0 (dependent on ε but not on x) such that for every n and for every x in D, $n > N \Rightarrow |u(x) - S_n(x)| < \varepsilon$. Uniform convergence can settle the question whether a given series of functions can be integrated or differentiated termwise. The manipulation

\[ \int_a^b \sum_{i=1}^{\infty} u_i(x) \, dx = \sum_{i=1}^{\infty} \int_a^b u_i(x) \, dx \]

is valid provided that the functions $u_i(x)$ are integrable and $\sum u_i(x)$ converges uniformly on [a, b]. The manipulation

\[ \frac{d}{dx} \sum_{i=1}^{\infty} u_i(x) = \sum_{i=1}^{\infty} \frac{d}{dx} u_i(x) \]

is valid for all x ∈ [a, b] if the functions $u_i(x)$ have continuous derivatives in [a, b], $\sum u_i(x)$ converges uniformly on [a, b], and the differentiated series on the right converges uniformly in [a, b].

A useful lemma called the Weierstrass M-test provides sufficient conditions for uniform convergence. Suppose a convergent series of positive constants $\sum M_i$ can be found such that for all x in D and for all i, $|u_i(x)| \le M_i$. Call $u(x) = \sum u_i(x)$ and $M = \sum M_i$. Then

\[ |u(x) - S_n(x)| = \left| \sum_{i=n+1}^{\infty} u_i(x) \right| \le \sum_{i=n+1}^{\infty} |u_i(x)| \le \sum_{i=n+1}^{\infty} M_i = \left| M - \sum_{i=1}^{n} M_i \right|. \]

But $\sum M_i$ is convergent; hence, given ε > 0, we can choose N such that for n > N the last absolute value quantity is less than ε. Thus $\sum u_i(x)$ converges uniformly on D.
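The M-test estimate can be watched at work on a concrete series. The sketch below is an illustration under our own assumptions: the series $\sum \sin(ix)/i^2$ with $M_i = 1/i^2$ is chosen by us, a partial sum with 2000 terms stands in for the limit u(x), and the supremum over x is replaced by a maximum over a finite grid.

```python
import math

def partial_sum(x, n):
    """S_n(x) for the illustrative series sum_{i>=1} sin(i x) / i**2."""
    return sum(math.sin(i * x) / i**2 for i in range(1, n + 1))

def tail_bound(n, big_n):
    """Sum of M_i = 1/i**2 for n < i <= big_n; bounds |S_big_n(x) - S_n(x)| for every x."""
    return sum(1.0 / i**2 for i in range(n + 1, big_n + 1))

grid = [k * 2 * math.pi / 200 for k in range(201)]
big_n = 2000
reference = [partial_sum(x, big_n) for x in grid]   # stand-in for the limit u(x)

results = []
for n in (10, 100, 1000):
    worst = max(abs(r - partial_sum(x, n)) for x, r in zip(grid, reference))
    results.append((n, worst, tail_bound(n, big_n)))
    print(n, worst, tail_bound(n, big_n))
```

The point of the comparison is that the bound in the last column does not depend on x, which is exactly what uniform convergence requires.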

Example 5.4. Let f(x) be periodic with period 2π. The series

\[ a_0 + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx) \]

is the Fourier series of f(x) if

\[ a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) \, dx, \]

and for n ∈ N,

\[ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx \, dx, \qquad b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx \, dx. \]

Convergence (especially uniform convergence) of Fourier series has received a great deal of study, and a general treatment of the topic involves Lebesgue integration. However, it is instructive to see a simple set of convergence conditions established using the M test. If f(x) has continuous derivatives through order 2 for all x, then the trigonometric Fourier series of f(x) converges uniformly everywhere. We integrate the formula for $a_n$ by parts twice and make use of the periodicity of f(x) and its derivative to get

\[ a_n = -\frac{1}{n^2 \pi} \int_{-\pi}^{\pi} f''(x) \cos nx \, dx. \]

Now since f''(x) is continuous on [−π, π], it attains maximum and minimum values on that interval. Hence, for some B > 0,

\[ |a_n| \le \frac{1}{n^2 \pi} \int_{-\pi}^{\pi} |f''(x)| \, dx \le \frac{2B}{n^2}. \]

Similarly, $|b_n| \le 2B/n^2$. The desired conclusion follows from the M test and convergence of the numerical series $\sum n^{-p}$ for p > 1.

Another important type of expansion is the asymptotic expansion. In an asymptotic sequence of functions each term is dominated, in o fashion, by the previous term. Thus, the criterion for $w_n(x)$ to be an asymptotic sequence for $x \to x_0$ is $w_{n+1}(x) = o(w_n(x))$, or equivalently

\[ \lim_{x \to x_0} \frac{w_{n+1}(x)}{w_n(x)} = 0. \]

The weighted sum $\sum a_n w_n(x)$, where the $a_n$ are constants, might turn out to be a good approximation to some function f(x) when x is close to $x_0$. If

\[ f(x) - \sum_{n=1}^{m} a_n w_n(x) = o(w_m(x)) \qquad (x \to x_0), \tag{5.1} \]

then the summation is an asymptotic expansion to m terms of f(x) for $x \to x_0$, and we write

\[ f(x) \sim \sum_{n=1}^{m} a_n w_n(x) \qquad (x \to x_0). \]

(The special case m = 1 gives rise to a single-term “expansion” known as an asymptotic formula for f.) For fixed m, the difference between f and its asymptotic expansion approaches zero faster than the last term included in the expansion. Many functions have asymptotic expansions for large x of the form

\[ f(x) \sim \sum_{n=0}^{m} \frac{a_n}{x^n} \qquad (x \to \infty), \]

i.e., in inverse powers of x. If such a function can be written without approximation as

\[ f(x) = \sum_{n=0}^{m} \frac{a_n}{x^n} + R_m(x) \]

then suitable criteria on the remainder are that for any fixed m,

\[ R_m(x) \to 0, \qquad R_m = O\!\left( \frac{1}{x^{m+1}} \right) \qquad (x \to \infty). \tag{5.2} \]

The latter O requirement stipulates the existence of some finite B such that for sufficiently large x,

\[ |R_m| \le B/x^{m+1}. \]

Hence $x^m |R_m| \le B/x$, and the quantity $R_m/(1/x^m)$ is squeezed to zero as $x \to \infty$, implementing the o requirement in (5.1).

Example 5.5. Consider the function g(x) defined by the integral

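\[ g(x) = \int_x^{\infty} \frac{e^{x-t}}{t} \, dt. \]

An m-fold integration by parts yields

\[ g(x) = \frac{1}{x} - \frac{1}{x^2} + \frac{2!}{x^3} - \frac{3!}{x^4} + \cdots + (-1)^{m-1} \frac{(m-1)!}{x^m} + R_m(x), \]

where

\[ R_m(x) = (-1)^m m! \int_x^{\infty} \frac{e^{x-t}}{t^{m+1}} \, dt. \]

But we can give a little and write

\[ \int_x^{\infty} \frac{e^{x-t}}{t^{m+1}} \, dt \le \int_x^{\infty} \frac{e^{x-t}}{x^{m+1}} \, dt = \frac{1}{x^{m+1}}, \]

so that (5.2) is satisfied, and thus

\[ g(x) \sim \frac{1}{x} - \frac{1}{x^2} + \frac{2!}{x^3} - \frac{3!}{x^4} + \cdots + (-1)^{m-1} \frac{(m-1)!}{x^m}. \]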
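The remainder estimate $|R_m(x)| \le m!/x^{m+1}$ implied by the last inequality can be checked numerically. The following is only a sketch under stated assumptions: we rewrite g(x) with the substitution t = x + u, truncate the integral at u = 60 (where $e^{-u}$ is negligible), and use a composite Simpson sum; the value x = 10 and the function names are our own choices.

```python
import math

def g(x, upper=60.0, n=60000):
    """g(x) = integral from x to infinity of e^(x-t)/t dt, computed as the
    integral of e^(-u)/(x+u) over 0 <= u <= upper (truncation is an assumption)."""
    h = upper / n
    f = lambda u: math.exp(-u) / (x + u)
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def partial_sum(x, m):
    """First m terms of the expansion: sum over k = 1..m of (-1)^(k-1) (k-1)! / x^k."""
    return sum((-1) ** (k - 1) * math.factorial(k - 1) / x**k for k in range(1, m + 1))

x = 10.0
gx = g(x)
rows = []
for m in range(1, 8):
    err = abs(gx - partial_sum(x, m))
    bound = math.factorial(m) / x ** (m + 1)
    rows.append((m, err, bound))
    print(m, err, bound)
```

For fixed x the bound $m!/x^{m+1}$ eventually grows with m, which is the usual reminder that an asymptotic expansion need not converge.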


5.4 Simpson’s Rule

Suppose we need a numerical estimation of an integral of the form

\[ \int_a^b f(x) \, dx. \tag{5.3} \]

We assume that all derivatives of f(x) formed in the next discussion exist and are continuous. Interval [a, b] is partitioned into 2n subintervals each of length $\Delta x = (b - a)/(2n)$, and f(x) is approximated by a quadratic polynomial on the first two subintervals, another quadratic polynomial on the third and fourth subintervals, and so on. Of course, polynomials are easy to integrate, and the sum of the integrals of the approximating polynomials is used to approximate (5.3). In order to carry out the integration of the polynomials, we mention Lagrange interpolation: Let $x_0, x_1, \ldots, x_n$ be n + 1 distinct points. Define the function

\[ l_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}. \]

Then $l_i(x_i) = 1$ and $l_i(x_j) = 0$ if $j \ne i$. The polynomial

\[ p_n(x) = \sum_{i=0}^{n} f(x_i) l_i(x) \]

interpolates f(x) at $x_0, x_1, \ldots, x_n$; that is, $p_n(x) = f(x)$ at each $x_i$. We call $h = \Delta x$. We now restrict our attention to the first two intervals: $x_1 = x_0 + h$, $x_2 = x_0 + 2h$. On $[x_0, x_2]$

\[ p_2(x) = f(x_0) l_0(x) + f(x_1) l_1(x) + f(x_2) l_2(x). \]

The integral $\int_{x_0}^{x_2} p_2(x) \, dx$, which we denote by $S_{[x_0,x_2]}$, is after simplification

\[ S_{[x_0,x_2]} = \frac{h}{3} \left( f(x_0) + 4 f(x_1) + f(x_2) \right). \tag{5.4} \]

To find the difference (error) between the integral $\int_{x_0}^{x_2} f(x) \, dx$ and the approximation $S_{[x_0,x_2]}$ we expand both terms in Taylor series. Define $F(x) = \int_{x_0}^{x} f(t) \, dt$. By Theorem 2.5, F′(x) = f(x), F′′(x) = f′(x), etc.

\[ \begin{aligned} F(x_0 + 2h) &= F(x_0) + F'(x_0) 2h + \cdots + \frac{F^{(5)}(x_0)}{5!} (2h)^5 + O(h^6) \\ &= f(x_0) 2h + \cdots + \frac{f^{(4)}(x_0)}{5!} (2h)^5 + O(h^6), \\ f(x_0 + h) &= f(x_0) + f'(x_0) h + \cdots + \frac{f^{(4)}(x_0)}{4!} h^4 + O(h^5), \\ f(x_0 + 2h) &= f(x_0) + f'(x_0) 2h + \cdots + \frac{f^{(4)}(x_0)}{4!} (2h)^4 + O(h^5). \end{aligned} \]

Since

\[ \int_{x_0}^{x_2} f(x) \, dx = F(x_0 + 2h) \]

substituting the Taylor series for the various terms and simplifying, we get

\[ \int_{x_0}^{x_2} f(x) \, dx - S_{[x_0,x_2]} = \frac{-h^5}{90} f^{(4)}(x_0) + O(h^6). \]

Summing over all pairs of intervals, we get the difference

\[ \int_a^b f(x) \, dx - S_{[a,b]} = \frac{-h^5}{90} f^{(4)}(x_0) + \cdots + \frac{-h^5}{90} f^{(4)}(x_{2n-2}) + O(h^6) + \cdots + O(h^6), \tag{5.5} \]

where $S_{[a,b]}$ is the Simpson approximation given by

\[ \begin{aligned} S_{[a,b]} &= S_{[x_0,x_2]} + \cdots + S_{[x_{2n-2},x_{2n}]} \\ &= \frac{h}{3} \left( f(x_0) + 4 f(x_1) + 2 f(x_2) + \cdots + 4 f(x_{2n-1}) + f(x_{2n}) \right). \end{aligned} \]

Let M and m denote the maximum and minimum values, respectively, of $f^{(4)}(x)$ on [a, b]. Then

\[ nm \le f^{(4)}(x_0) + \cdots + f^{(4)}(x_{2n-2}) \le nM, \]

\[ m \le \frac{f^{(4)}(x_0) + \cdots + f^{(4)}(x_{2n-2})}{n} \le M. \]

So by the intermediate value property, for some ξ ∈ [a, b],

\[ \frac{f^{(4)}(x_0) + \cdots + f^{(4)}(x_{2n-2})}{n} = f^{(4)}(\xi). \]

Since b − a = 2nh, we can write the sum of the n terms

\[ \frac{-h^5}{90} f^{(4)}(x_0) + \cdots + \frac{-h^5}{90} f^{(4)}(x_{2n-2}) = \frac{-h^4}{180} f^{(4)}(\xi)(b - a). \]

Similarly, the terms

\[ O(h^6) + \cdots + O(h^6) = n O(h^6) = ((b - a)/2h) O(h^6) = O(h^5). \]

The error, $(-h^4/180) f^{(4)}(\xi)(b - a) + O(h^5)$, more simply put, is

\[ \int_a^b f(x) \, dx - S_{[a,b]} = O(h^4). \]


This expression for the error is of theoretical interest: if (b − a) and the higher derivatives of f(x) are not large, then Simpson’s rule is very accurate for small h. In practice, we seldom know the fourth derivative of a function, and instead keep doubling the number of partitions until the results converge (to within a specified tolerance). Note that the sum of function evaluations at the interior points at one iteration gives the sum of the function evaluations with even indices at the next iteration, and hence does not need to be recomputed. For other methods including Romberg’s method, see Patel [49].
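The doubling strategy just described can be sketched as follows. This is a minimal illustration rather than a production routine: the function name, the tolerance, the cap on the number of doublings, and the test integrand sin x on [0, π] are all our own assumptions.

```python
import math

def simpson_doubling(f, a, b, tol=1e-10, max_doublings=20):
    """Simpson's rule with repeated doubling of the number of subintervals.
    Stops when two successive estimates agree to within tol."""
    n = 2                                   # number of subintervals (always even)
    h = (b - a) / n
    ends = f(a) + f(b)
    odd = f(a + h)                          # sum of f at odd-indexed points
    even = 0.0                              # sum of f at even-indexed interior points
    estimate = h / 3 * (ends + 4 * odd + 2 * even)
    for _ in range(max_doublings):
        even += odd                         # every old interior point now has an even index
        n *= 2
        h = (b - a) / n
        odd = sum(f(a + i * h) for i in range(1, n, 2))   # only the new points are evaluated
        new_estimate = h / 3 * (ends + 4 * odd + 2 * even)
        if abs(new_estimate - estimate) < tol:
            return new_estimate, n
        estimate = new_estimate
    return estimate, n

value, n_used = simpson_doubling(math.sin, 0.0, math.pi)
print(value, n_used)
```

Each pass evaluates f only at the newly introduced midpoints, so the total work is about the same as a single Simpson sum on the finest partition.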

5.5 Taylor’s Method

Suppose we are given the initial-value problem

\[ y' = f(x, y) \qquad (y(a) = y_0). \tag{5.6} \]

Suppose y(x) is a solution to the problem and that a numerical estimate of y(b) is to be computed. We suppose $y(x) \in C^{p+1}[a, b]$. Assuming $y(x_n)$ has been computed (accurately), we want to compute y at the next value of x, at $x_{n+1} = x_n + h$. By Taylor’s theorem

\[ y(x_{n+1}) = y(x_n) + y'(x_n) h + y''(x_n) \frac{h^2}{2} + \cdots + y^{(p)}(x_n) \frac{h^p}{p!} + y^{(p+1)}(\xi) \frac{h^{p+1}}{(p+1)!}. \tag{5.7} \]

Since y(x) is an unknown function of x, we do not know y′(x), y′′(x), . . . , explicitly. By (5.6) we do know y′(x) in terms of x, y(x) and by the chain rule we can express y′′(x), y′′′(x), . . . in terms of x, y(x). To simplify notation, we write ∂f/∂x as $f_x$, ∂f/∂y as $f_y$, etc. Since

\[ y'(x) = f(x, y), \tag{5.8} \]

by the chain rule

\[ y''(x) = f'(x) = f_x(x, y) + f_y(x, y) y'(x) = f_x(x, y) + f_y(x, y) f(x, y). \]

We shorten the notation for the right-hand side so

\[ y''(x) = (f_x + f_y f)|_{(x,y)}. \tag{5.9} \]

Similarly,

\[ y''' = (f_{xx} + 2 f f_{xy} + f_{yy} f^2 + f_x f_y + f_y^2 f)|_{(x,y)}. \tag{5.10} \]

Similar but more complicated expressions hold for the higher derivatives. Fortunately, the tedium formerly involved in entering such expressions into a Fortran or C program has been overcome by the availability of symbolic manipulators, which permit us to cut and paste symbolic computations of derivatives into a program (see Exercise 5.5). We write Taylor’s series for $y(x_n + h)$ as

\[ y(x_{n+1}) = y(x_n) + h \Phi_p(x_n, y(x_n)) + y^{(p+1)}(\xi) \frac{h^{p+1}}{(p+1)!}, \]

where

\[ \Phi_p(x, y) = \left. \left( f + \frac{h}{2}(f_x + f_y f) + \cdots + \frac{h^{p-1}}{p!} f^{(p-1)} \right) \right|_{(x,y)}, \]

where the term $f^{(p-1)}(x, y(x))$ is the (p − 1)th derivative with respect to x and can be expanded as in (5.8), (5.9), etc. Taylor’s algorithm of order p uses the Taylor polynomial of degree p to get the next approximation to $y(x_{n+1})$. In other words, let $y_n$ denote the last computed approximation to the exact value $y(x_n)$. Then the next approximation is

\[ y_{n+1} = y_n + h \Phi_p(x_n, y_n). \]

The special case of p = 1 gives the familiar Euler method:

\[ y_{n+1} = y_n + h f(x_n, y_n). \]

Of course, the remainder term has been dropped, so at each step there is a (discretization) error caused by approximating the value of y by its Taylor series of order p. In order to simplify the discussion we ignore any errors due to roundoff in floating point arithmetic. A local error can grow along with a solution, and so at a later stage earlier discretization errors might have grown along with the solution. To see how the error can grow, call the difference at $x_n$ between the exact solution and the computed approximation $e_n = y(x_n) - y_n$. Then

\[ \begin{aligned} e_{n+1} - e_n &= y(x_{n+1}) - y_{n+1} - (y(x_n) - y_n) \\ &= y(x_{n+1}) - y(x_n) - (y_{n+1} - y_n) \\ &= h \Phi_p(x_n, y(x_n)) + y^{(p+1)}(\xi_n) \frac{h^{p+1}}{(p+1)!} - h \Phi_p(x_n, y_n). \end{aligned} \tag{5.11} \]

We now add the hypothesis that there exists a positive constant L such that for all u, v ∈ R and all x ∈ [a, b],

\[ |\Phi_p(x, u) - \Phi_p(x, v)| \le L |u - v|. \tag{5.12} \]

Since we have assumed $y(x) \in C^{p+1}[a, b]$ there is some constant Y such that

\[ |y^{(p+1)}(x)| \le Y \qquad (x \in [a, b]). \tag{5.13} \]

Then by (5.11), (5.12), and (5.13),

\[ |e_{n+1}| \le |e_n| + hL |y(x_n) - y_n| + Y \frac{h^{p+1}}{(p+1)!}, \]

hence

\[ |e_{n+1}| \le (1 + hL) |e_n| + Y \frac{h^{p+1}}{(p+1)!}. \tag{5.14} \]

To see how quickly $e_n$ can grow we construct, by replacing inequality with equality, a sequence $z_n$ that dominates $|e_n|$. In other words, assuming our initial condition $y(a) = y_0$ is correct, we define $z_0 = e_0 = 0$ and, for n = 0, 1, 2, . . . ,

\[ z_{n+1} = (1 + hL) z_n + Y \frac{h^{p+1}}{(p+1)!}. \]

Call

\[ B = Y h^{p+1}/(p+1)! \tag{5.15} \]

so

\[ \begin{aligned} z_1 &= B, \\ z_2 &= (1 + hL) z_1 + B = ((1 + hL) + 1) B, \\ &\;\;\vdots \\ z_n &= ((1 + hL)^{n-1} + \cdots + 1) B. \end{aligned} \tag{5.16} \]

Summing the geometric series, we get

\[ z_n = B \, \frac{(1 + hL)^n - 1}{1 + hL - 1} = \frac{(1 + hL)^n - 1}{hL} B. \]

By a Taylor series argument $1 + hL < e^{hL}$, so

\[ z_n \le \frac{e^{hLn} - 1}{hL} B. \]

Since our x values $x_n$ are in [a, b], nh ≤ (b − a), and, using (5.15), we have

\[ |e_n| \le z_n \le \frac{Y}{L} \left( e^{L(b-a)} - 1 \right) \frac{h^p}{(p+1)!} \to 0 \qquad (h \to 0). \]

Because Y and L are constants, at least in theory Taylor’s method converges to the exact solution as h → 0. The error bound, although comforting, is not useful in practice. However, we can compare the results at the same final point b for two different choices of h, say h and h/2, and (usually) safely assume that h has been made sufficiently small if the results agree to within a specified tolerance. See [49] for more sophisticated methods of choosing stepsize h appropriately.
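The $O(h^p)$ behavior of the global error is easy to observe by halving h, as suggested above. The following sketch uses a test problem of our own choosing, y′ = x + y with y(0) = 1, whose exact solution is $y = 2e^x - x - 1$; for this f we have $f_x + f_y f = 1 + f$, so $\Phi_2$ can be written down by hand. The function names are hypothetical.

```python
import math

def taylor_solve(phi, a, b, y0, n):
    """Advance y_{k+1} = y_k + h * phi(x_k, y_k, h) from x = a to x = b in n steps."""
    h = (b - a) / n
    x, y = a, y0
    for _ in range(n):
        y += h * phi(x, y, h)
        x += h
    return y

# Illustrative problem (an assumption, not from the text): y' = x + y, y(0) = 1.
f = lambda x, y: x + y
phi1 = lambda x, y, h: f(x, y)                              # p = 1 (Euler)
phi2 = lambda x, y, h: f(x, y) + (h / 2) * (1 + f(x, y))    # p = 2, using f_x + f_y f = 1 + f

exact = 2 * math.e - 2          # y(1) for the exact solution y = 2 e^x - x - 1
errors = {}
for name, phi in (("p=1", phi1), ("p=2", phi2)):
    errors[name] = [abs(exact - taylor_solve(phi, 0.0, 1.0, 1.0, n)) for n in (50, 100, 200)]
    print(name, errors[name])
```

Halving h should roughly halve the error for p = 1 and quarter it for p = 2, in line with the factor $h^p$ in the bound.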


5.6 Special Functions of Mathematical Physics

Applied science is replete with so-called special functions, many of which satisfy interesting inequalities [1, 4, 27, 58]. We provide a basic selection of examples in order to whet the reader’s appetite.

Example 5.6. If $\operatorname{Re}[z] > 0$, the gamma function Γ(z) is given by Euler’s integral of the second kind:

\[ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t} \, dt. \]

When z = x where x is real and positive, Γ(x) can be differentiated any number of times, with

\[ \frac{d^n \Gamma(x)}{dx^n} = \int_0^{\infty} t^{x-1} e^{-t} (\ln t)^n \, dt \]

for any x > 0. By the Cauchy–Schwarz inequality,

\[ |\Gamma'(x)|^2 \le \int_0^{\infty} \left( t^{(x-1)/2} e^{-t/2} \right)^2 dt \int_0^{\infty} \left( t^{(x-1)/2} e^{-t/2} \ln t \right)^2 dt \]

and we obtain the result

\[ |\Gamma'(x)|^2 \le \Gamma(x) \Gamma''(x). \]

In the complex-argument case,

\[ |\Gamma(x + iy)| = \left| \int_0^{\infty} t^{x-1} e^{-t} t^{iy} \, dt \right| \le \int_0^{\infty} |t^{x-1} e^{-t}| \, |t^{iy}| \, dt, \]

where $|t^{iy}| = |e^{iy \ln t}| = 1$, and hence

\[ |\Gamma(x + iy)| \le |\Gamma(x)|. \]

Of course, with positive integer arguments the gamma function reduces to the factorial function, with Γ(n) = (n − 1)! (see Exercise 5.6).

Example 5.7. For x > 0 and n = 0, 1, 2, . . . , a sequence of exponential integrals

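\[ E_n(x) = \int_1^{\infty} \frac{e^{-xt}}{t^n} \, dt \]

may be defined. The observation

\[ \begin{aligned} \left( \int_1^{\infty} \frac{e^{-xt}}{t^n} \, dt \right)^2 &= \left( \int_1^{\infty} \frac{e^{-xt/2}}{t^{(n-1)/2}} \, \frac{e^{-xt/2}}{t^{(n+1)/2}} \, dt \right)^2 \\ &\le \int_1^{\infty} \left( \frac{e^{-xt/2}}{t^{(n-1)/2}} \right)^2 dt \int_1^{\infty} \left( \frac{e^{-xt/2}}{t^{(n+1)/2}} \right)^2 dt \end{aligned} \]

leads to the inequality

\[ E_n^2(x) \le E_{n-1}(x) E_{n+1}(x) \]

for n = 1, 2, 3, . . . .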

Example 5.8. The function

\[ T_n(x) = \cos(n \cos^{-1} x) \]

for n ∈ N is the Chebyshev polynomial of the first kind, order n. Obviously,

\[ |T_n(x)| \le 1 \]

whenever −1 ≤ x ≤ 1. With x = cos p, differentiation gives

\[ \frac{dT_n(x)}{dx} = -\frac{1}{\sin p} \frac{dT_n(p)}{dp} = n \frac{\sin np}{\sin p}. \]

The maximum magnitude occurs in the limit p → 0, where sin np / sin p → n, and we obtain

\[ \left| \frac{dT_n(x)}{dx} \right| \le n^2 \]

whenever −1 ≤ x ≤ 1.

Example 5.9. The Bessel function of the first kind and integer order n may be defined for −∞ < x < ∞ by the series expansion

\[ J_n(x) = \sum_{m=0}^{\infty} \frac{(-1)^m (x/2)^{2m+n}}{m! \, (m+n)!}, \]

or by the integral representation

\[ J_n(x) = \frac{1}{\pi} \int_0^{\pi} \cos(nt - x \sin t) \, dt. \]

Immediately from the integral representation

\[ |J_n(x)| \le \frac{1}{\pi} \int_0^{\pi} |\cos(nt - x \sin t)| \, dt \le \frac{1}{\pi} \int_0^{\pi} (1) \, dt = 1. \]

Other useful properties of $J_n(x)$, such as

\[ J_{-n}(x) = (-1)^n J_n(x) \]

may be derived from the series expansion. Still others follow from the generating function relation

\[ \exp\left[ \frac{x}{2} \left( t - \frac{1}{t} \right) \right] = \sum_{n=-\infty}^{\infty} J_n(x) t^n, \]

among these the symmetry property

\[ J_n(-x) = (-1)^n J_n(x), \]

the fact that $J_0(0) = 1$, and the addition theorem

\[ J_n(x + y) = \sum_{m=-\infty}^{\infty} J_m(x) J_{n-m}(y). \]

Putting y = −x and n = 0 we obtain

\[ J_0(0) = 1 = \sum_{m=-\infty}^{\infty} J_m(x) J_{-m}(-x) \]

so that

\[ \begin{aligned} 1 &= J_0(x) J_0(-x) + \sum_{m=1}^{\infty} [J_{-m}(x) J_m(-x) + J_m(x) J_{-m}(-x)] \\ &= J_0^2(x) + 2 \sum_{m=1}^{\infty} J_m^2(x). \end{aligned} \]

Hence the bound

\[ |J_m(x)| \le 1/\sqrt{2} \]

for m = 1, 2, . . . .

It is also of interest to note the interlacing property of the zeros of consecutively ordered Bessel functions. We use Rolle’s theorem, along with the differentiation formulas

\[ [x^n J_n(x)]' = x^n J_{n-1}(x), \qquad [x^{-n} J_n(x)]' = -x^{-n} J_{n+1}(x), \]

which hold for any x > 0. Taking n = k in the second formula and n = k + 1 in the first, we obtain

\[ [x^{-k} J_k(x)]' = -x^{-k} J_{k+1}(x), \qquad [x^{k+1} J_{k+1}(x)]' = x^{k+1} J_k(x), \]

respectively. The first of these equations implies that between any two zeros of $J_k$, $J_{k+1}$ has at least one zero; but the second implies that between any two zeros of $J_{k+1}$, $J_k$ has at least one zero. Hence, each function has one and only one zero between each pair of consecutive positive zeros of the other, and the interlacing property is established.
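The identity $J_0^2(x) + 2\sum_{m \ge 1} J_m^2(x) = 1$ and the resulting bound can be checked directly from the series definition. The sketch below is illustrative only: the truncation at 60 series terms and at order 40, and the sample values of x, are our own assumptions and are adequate only for moderate |x|.

```python
import math

def bessel_j(n, x, terms=60):
    """J_n(x) for integer n >= 0, summed from the power series (moderate |x| only)."""
    return sum((-1) ** m * (x / 2) ** (2 * m + n)
               / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

checks = []
for x in (0.5, 2.0, 5.0, 10.0):
    values = [bessel_j(n, x) for n in range(40)]
    identity = values[0] ** 2 + 2 * sum(v * v for v in values[1:])
    largest = max(abs(v) for v in values[1:])     # largest |J_m(x)| with m >= 1
    checks.append((x, identity, largest))
    print(x, identity, largest)
```

The printed identity values should be very close to 1, and the last column should stay below $1/\sqrt{2} \approx 0.707$.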

Example 5.10. The Legendre polynomials $P_n(x)$, for n = 0, 1, 2, . . . , are solutions to a certain common ordinary differential equation; they are also given [27] by the Laplace integral formula

\[ P_n(x) = \frac{1}{\pi} \int_0^{\pi} \left[ x + \sqrt{x^2 - 1} \cos t \right]^n dt = \frac{1}{\pi} \int_0^{\pi} \left[ x + i \sqrt{1 - x^2} \cos t \right]^n dt \]

for |x| ≤ 1, and possess many other properties, among them the integral

\[ \int_{-1}^{1} x^m P_n(x) \, dx = 0 \qquad (0 \le m < n), \]

and the recursion formula

\[ (x^2 - 1) P_n'(x) = n [x P_n(x) - P_{n-1}(x)] \qquad (|x| < 1). \]

By the Laplace formula,

\[ \begin{aligned} \pi |P_n(x)| &\le \int_0^{\pi} \left| x + i \sqrt{1 - x^2} \cos t \right|^n dt \\ &= \int_0^{\pi} [x^2 + (1 - x^2) \cos^2 t]^{n/2} \, dt \\ &\le \int_0^{\pi} [x^2 + (1 - x^2)]^{n/2} \, dt. \end{aligned} \]

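Hence the upper bound $|P_n(x)| \le 1$ whenever |x| ≤ 1 and n = 0, 1, 2, . . . . Alternatively,

\[ \begin{aligned} \pi |P_n(x)| &\le \int_0^{\pi} [x^2 + (1 - x^2) \cos^2 t]^{n/2} \, dt \\ &= 2 \int_0^{\pi/2} [1 - (1 - x^2) \sin^2 t]^{n/2} \, dt \\ &\le 2 \int_0^{\pi/2} \left[ 1 - (1 - x^2) \left( \frac{2t}{\pi} \right)^2 \right]^{n/2} dt \\ &< 2 \int_0^{\pi/2} \left( \exp\left[ -\frac{4(1 - x^2) t^2}{\pi^2} \right] \right)^{n/2} dt. \end{aligned} \]

The third step uses the inequality sin t ≥ 2t/π on [0, π/2]. The last step follows from the fact that $e^{-x} > 1 - x$ for every x > 0. Then

\[ \begin{aligned} \pi |P_n(x)| &\le 2 \int_0^{\pi/2} \exp\left[ -\frac{2n(1 - x^2) t^2}{\pi^2} \right] dt \\ &\le 2 \int_0^{\infty} \exp\left[ -\frac{2n(1 - x^2) t^2}{\pi^2} \right] dt. \end{aligned} \]

The remaining integral exists in closed form for n ≠ 0 and gives us

\[ |P_n(x)| \le \sqrt{\frac{\pi}{2n(1 - x^2)}} \]

for |x| < 1 and n = 1, 2, 3, . . . . The recursion relation can be treated with the triangle inequality,

\[ \begin{aligned} |P_n'(x)| &= \frac{n |x P_n(x) - P_{n-1}(x)|}{|x^2 - 1|} \le n \frac{|x P_n(x)| + |P_{n-1}(x)|}{|x^2 - 1|} \\ &\le n \frac{|x| + 1}{|x^2 - 1|} = n \frac{|x| + 1}{\left| |x|^2 - 1 \right|}, \end{aligned} \]

giving the upper bound

\[ |P_n'(x)| \le \frac{n}{1 - |x|} \]

on the first derivatives whenever |x| < 1.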

We spend a bit more time on the Legendre polynomials.

Example 5.11. The Legendre polynomials are orthogonal polynomials. A family of polynomials $p_n(x)$ for n = 0, 1, 2, . . . is said to be orthogonal with respect to a weight function w(x) > 0 on [a, b] if for m ≠ n

\[ \int_a^b p_n(x) p_m(x) w(x) \, dx = 0. \]

One interesting fact about orthogonal polynomials concerns the ease with which a bound can be placed on the locations of their zeros. Putting m = 0 we have, for all n ≥ 1,

\[ \int_a^b p_n(x) w(x) \, dx = 0, \]

and it is evident that there exists at least one x ∈ (a, b) at which $p_n(x)$ has a sign change. If all such points are denoted by $x_1, \ldots, x_k$, then the quantity $p_n(x)(x - x_1) \cdots (x - x_k) w(x)$ never changes sign in [a, b]. However, k < n would imply that

\[ \int_a^b p_n(x)(x - x_1) \cdots (x - x_k) w(x) \, dx = 0, \]

because $p_n(x)$ is orthogonal to any polynomial of degree less than n. (This follows from the fact that any polynomial of degree less than n can be written uniquely as a linear combination of the polynomials $p_j(x)$ for j < n.) A vanishing integral is impossible for an integrand of constant sign that is not identically zero. So k ≥ n, hence k = n because $p_n(x)$ cannot have more than n zeros. Our conclusion: the zeros of $p_n(x)$ are all real, distinct, and located in (a, b).

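Example 5.12. Suppose that the Legendre polynomial $P_n(x)$, a polynomial of degree n, is divided through by its leading coefficient $c_n$ to give the related polynomial $\pi_n(x) = P_n(x)/c_n$. It turns out that $\pi_n(x)$ has a smaller norm on (−1, 1) than any other polynomial $f_n(x)$ of degree n with leading coefficient 1. To show this, we define the difference polynomial

\[ d_{n-1}(x) = f_n(x) - \pi_n(x) \]

of degree at most n − 1, and note that

\[ \begin{aligned} \|f_n(x)\|^2 &= \int_{-1}^{1} [d_{n-1}(x) + \pi_n(x)]^2 \, dx \\ &= \int_{-1}^{1} d_{n-1}^2(x) \, dx + 2 \int_{-1}^{1} d_{n-1}(x) \pi_n(x) \, dx + \int_{-1}^{1} \pi_n^2(x) \, dx \\ &= \|d_{n-1}(x)\|^2 + \|\pi_n(x)\|^2. \end{aligned} \]

The middle integral vanishes because $\pi_n(x)$ is orthogonal to every polynomial of degree less than n. Hence $\|f_n(x)\|^2 \ge \|\pi_n(x)\|^2$.

We can sometimes make direct use of power series expansions in establishing bounds for special functions.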

Example 5.13. The modified Bessel function of the first kind and zeroth order is given by

\[ I_0(x) = \sum_{n=0}^{\infty} \frac{x^{2n}}{(n! \, 2^n)^2}. \]

It is easily shown by induction that $(n! \, 2^n)^2 \ge (2n)!$ for any nonnegative integer n. Hence

\[ I_0(x) \le \sum_{n=0}^{\infty} \frac{x^{2n}}{(2n)!} = \cosh x. \]

See also Exercise 5.10.
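Both the coefficient inequality and the resulting bound of Example 5.13 are simple to spot-check. This is a small sketch under our own assumptions: the series is truncated at 60 terms, and only a handful of sample points are tested.

```python
import math

def modified_bessel_i0(x, terms=60):
    """I_0(x) summed from its power series (truncated; adequate for moderate |x|)."""
    return sum(x ** (2 * n) / (math.factorial(n) * 2 ** n) ** 2 for n in range(terms))

# Coefficient comparison (n! 2^n)^2 >= (2n)!, checked exactly in integer arithmetic.
coefficient_ok = all((math.factorial(n) * 2 ** n) ** 2 >= math.factorial(2 * n)
                     for n in range(30))

samples = [(x, modified_bessel_i0(x), math.cosh(x)) for x in (0.0, 0.5, 1.0, 3.0, 8.0)]
for x, i0, bound in samples:
    print(x, i0, bound)
```

Equality holds at x = 0, and the gap between the two sides widens quickly as x grows.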

We hasten to point out that bounds for interesting special functions are not always as easy to obtain as those we have seen here. For instance, much more work is needed to obtain a bound on the Hermite polynomials $H_n(x)$ that are important in quantum mechanics. The interested reader can see Indritz [30] for a treatment of these functions, including an outline of steps leading to the inequality

\[ |H_n(x)| \le (2^n n!)^{1/2} e^{x^2/2}. \]

5.7 A Projectile Problem

Suppose an object is thrown straight upward with initial speed $v_0$. If the drag due to air resistance is directly proportional to instantaneous speed, which part of the subsequent motion would take longer: the upward flight, or the return trip?

Newton’s second law dictates that the velocity $v_u(t)$ for the upward motion be described by

\[ m \frac{dv_u}{dt} = -mg - k v_u, \]

where m is the mass of the object, g is the free-fall acceleration constant, and k is the proportionality constant quantifying air resistance. With the given initial condition, this equation has solution

\[ v_u(t) = \gamma g [(1 + \alpha) e^{-t/\gamma} - 1], \]

where γ = m/k and $\alpha = v_0/(\gamma g)$. The time $t_u$ to complete the upward motion can be found from the condition $v_u(t_u) = 0$; it is

\[ t_u = \gamma \ln(1 + \alpha) \]

and the maximum height reached is

\[ h = \int_0^{t_u} v_u(t) \, dt = \gamma^2 g [\alpha - \ln(1 + \alpha)]. \]

Referring a new time origin to the start of the downward motion, the speed $v_d$ for the downward trip is governed by

\[ \frac{dv_d}{dt} + \frac{1}{\gamma} v_d = g. \]

When subjected to the initial condition $v_d(0) = 0$, the solution becomes

\[ v_d(t) = \gamma g (1 - e^{-t/\gamma}). \]

The time interval $t_d$ for completion of the downward motion must then satisfy

\[ \int_0^{t_d} v_d(t) \, dt = h \]

or

\[ \gamma g t_d + \gamma^2 g (e^{-t_d/\gamma} - 1) = \gamma^2 g [\alpha - \ln(1 + \alpha)]. \]

This is a transcendental equation for the unknown $t_d$; introducing the variable $T_d = t_d/\gamma$, we can write it as $F(T_d) = 0$ where

\[ F(x) = x + e^{-x} - (1 + \alpha) + \ln(1 + \alpha). \]

Because $F'(x) = 1 - e^{-x} > 0$ for x > 0, F(x) is strictly increasing there. Defining $T_u = t_u/\gamma$ we have

\[ F(T_u) = 2 \ln(1 + \alpha) - (1 + \alpha) + \frac{1}{1 + \alpha} = 2 \ln(1 + \alpha) - \alpha \left( \frac{2 + \alpha}{1 + \alpha} \right) < 0, \]

where the last inequality is easily verified by differentiation. This and the monotonicity of F are enough to conclude (Figure 5.1) that $T_u < T_d$. Hence, the object spends more time in its descent than it does in its ascent.

Of course, the physical reliability of this conclusion depends on the correctness of the model employed. It is well known that in many (if not most) situations drag is actually proportional to the square of the speed; see Glaister [26] for a treatment of this case. For a more general analysis with arbitrary air resistance, see de Alwis [16].
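The transcendental equation $F(T_d) = 0$ has no closed-form solution, but because F is increasing it yields readily to bisection. The sketch below is a minimal illustration: the function name, the bracketing interval, and the sample values of α are our own assumptions. The right end of the bracket works because $F(T_u + \alpha + 1) = 2\ln(1+\alpha) + e^{-(T_u+\alpha+1)} > 0$.

```python
import math

def ascent_descent_times(alpha):
    """Return (T_u, T_d), the dimensionless ascent and descent times t/gamma."""
    t_up = math.log(1 + alpha)
    F = lambda x: x + math.exp(-x) - (1 + alpha) + math.log(1 + alpha)
    lo, hi = t_up, t_up + alpha + 1.0        # F(lo) < 0 and F(hi) > 0
    for _ in range(200):                     # bisection on the increasing function F
        mid = 0.5 * (lo + hi)
        if F(mid) < 0:
            lo = mid
        else:
            hi = mid
    return t_up, 0.5 * (lo + hi)

times = {alpha: ascent_descent_times(alpha) for alpha in (0.1, 1.0, 10.0)}
for alpha, (t_up, t_down) in times.items():
    print(alpha, t_up, t_down)
```

For small α (weak drag relative to the launch speed) the two times are nearly equal, as they must be in the drag-free limit; the gap grows with α.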


FIGURE 5.1. Times for projectile motion. (Plot of y = F(x) against x, showing the negative value $F(T_u)$ and the zero of F at $x = T_d$.)

5.8 Geometric Shapes

It is worthwhile to examine a few applications of inequalities to simple geometrical objects.

Example 5.14. A polyhedron is a solid figure bounded by planes. Such a figure can be considered as a union of a (finite) number of polygonal faces. The faces are joined along line segments called edges, and at the two ends of each edge are points called vertices. Among the most beautiful of the polyhedra are the regular polyhedra, where the faces are all congruent regular polygons. That there exist only five of these (the Platonic solids) can be shown using simple arguments with inequalities. The proof is based on Euler’s formula, which states that for any simple polyhedron, the number of faces F, the number of edges E, and the number of vertices V are all related by the equation F − E + V = 2. For instance, the cube has F = 6, E = 12, V = 8, while the tetrahedron has F = 4, E = 6, V = 4. A precise definition of the term “simple” would require a topological digression that would send us too far afield; suffice it to say that we rule out shapes with holes in them, such as toroidal-shaped polyhedra [6].

Consider then a simple, regular polyhedron. Because the face polygons are all assumed identical, we may define a constant σ as the number of edges per face, and another constant v as the number of edges meeting at each vertex. It is readily apparent that σ ≥ 3, v ≥ 3. We may also assert that 2E = σF = vV, because each edge has two vertices and is shared by two faces. Elimination of F and V from Euler’s formula gives

\[ \frac{1}{\sigma} + \frac{1}{v} = \frac{1}{2} + \frac{1}{E}. \tag{5.17} \]

Now because

\[ \frac{1}{\sigma} + \frac{1}{v} > \frac{1}{2} \]

we must rule out the possibility that both σ > 3 and v > 3. Putting σ = 3 in (5.17) we get

\[ \frac{1}{v} - \frac{1}{6} = \frac{1}{E} > 0 \]

and hence the restriction 3 ≤ v ≤ 5; putting instead v = 3 we get

\[ \frac{1}{\sigma} - \frac{1}{6} = \frac{1}{E} > 0 \]

or 3 ≤ σ ≤ 5. Hence there are only five permissible combinations, and they can be tabulated as follows:

σ    v    E     F = 2E/σ    V = 2E/v    Object
3    3    6     4           4           tetrahedron
3    4    12    8           6           octahedron
3    5    30    20          12          icosahedron
4    3    12    6           8           cube
5    3    30    12          20          dodecahedron

These are the Platonic solids.
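The enumeration can also be carried out mechanically from (5.17). In the sketch below, exact rational arithmetic avoids any roundoff; the search limit of 20 for σ and v is an arbitrary choice of ours (the inequality above already shows that nothing beyond 5 can occur), and the variable names are hypothetical.

```python
from fractions import Fraction

solids = []
for sigma in range(3, 20):              # edges per face
    for v in range(3, 20):              # edges meeting at each vertex
        inv_E = Fraction(1, sigma) + Fraction(1, v) - Fraction(1, 2)   # equals 1/E by (5.17)
        if inv_E > 0:
            E = 1 / inv_E
            F, V = 2 * E / sigma, 2 * E / v
            # Keep only combinations for which E, F and V are whole numbers.
            if E.denominator == F.denominator == V.denominator == 1:
                solids.append((sigma, v, int(E), int(F), int(V)))

for row in solids:
    print(row)
```

Each printed tuple is (σ, v, E, F, V) and can be compared line by line with the table.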

An interesting class of inequalities called isoperimetric inequalities provides information about various extremal properties of geometric shapes. We encounter one of these inequalities in our next example (see [44]).

Example 5.15. Let f(x) be periodic with period L. We showed previously that under suitable restrictions it can be represented by a uniformly convergent Fourier series. The form of the series in this case is

\[ f(x) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n}{L} x + b_n \sin\frac{2\pi n}{L} x \right). \]

Similarly, for another function F(x) of the same period,

\[ F(x) = A_0 + \sum_{n=1}^{\infty} \left( A_n \cos\frac{2\pi n}{L} x + B_n \sin\frac{2\pi n}{L} x \right). \]

Integration of the product of these two functions over [0, L] gives Parseval’s identity

\[ \int_0^L f(x) F(x) \, dx = \frac{L}{2} \left[ 2 a_0 A_0 + \sum_{n=1}^{\infty} (a_n A_n + b_n B_n) \right]. \tag{5.18} \]

Consider a simple, smooth, closed plane curve C with known length L as in Figure 5.2. We ask what shape C must have in order that its enclosed area A be maximized. (This is an extremal problem, but not of the type normally encountered in calculus courses.) We choose a reference point P on C, and define a parameter s to measure arc length along C to the point (x, y) as shown. We assume that x(s) and y(s), each with period L, are sufficiently smooth (i.e., differentiable) so that we may represent them as Fourier series:

\[ x(s) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n}{L} s + b_n \sin\frac{2\pi n}{L} s \right), \]

\[ y(s) = A_0 + \sum_{n=1}^{\infty} \left( A_n \cos\frac{2\pi n}{L} s + B_n \sin\frac{2\pi n}{L} s \right). \]

FIGURE 5.2. Derivation of an isoperimetric inequality. (The curve C in the xy-plane, with arc length measured from s = 0 to the point $s_i$, and the increments ∆x, ∆y, ∆s.)

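Moreover, uniform convergence permits termwise differentiation:

\[ x'(s) = \sum_{n=1}^{\infty} \frac{2\pi n}{L} \left( b_n \cos\frac{2\pi n}{L} s - a_n \sin\frac{2\pi n}{L} s \right), \]

\[ y'(s) = \sum_{n=1}^{\infty} \frac{2\pi n}{L} \left( B_n \cos\frac{2\pi n}{L} s - A_n \sin\frac{2\pi n}{L} s \right). \]

The application of (5.18) to these series and addition of the results gives

\[ \int_0^L [x'^2(s) + y'^2(s)] \, ds = \frac{2\pi^2}{L} \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2 + A_n^2 + B_n^2). \]

But $x'^2(s) + y'^2(s) \equiv 1$ so that

\[ \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2 + A_n^2 + B_n^2) = \frac{L^2}{2\pi^2}. \]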

Referring again to the figure, we have for the enclosed area

\[ A \approx \sum_i x(s_i) \Delta y = \sum_i x(s_i) \frac{\Delta y}{\Delta s} \Delta s \]

or in the limit

\[ A = \int_0^L x(s) y'(s) \, ds = \pi \sum_{n=1}^{\infty} n (a_n B_n - A_n b_n) \]

by (5.18) and use of the differentiated series above. Then

\[ \begin{aligned} L^2 - 4\pi A &= 2\pi^2 \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2 + A_n^2 + B_n^2) - 4\pi^2 \sum_{n=1}^{\infty} n (a_n B_n - A_n b_n) \\ &= 2\pi^2 \sum_{n=1}^{\infty} [(n a_n - B_n)^2 + (n A_n + b_n)^2 + (n^2 - 1)(b_n^2 + B_n^2)] \end{aligned} \]

or, since the right member is nonnegative,

\[ A \le L^2/4\pi. \]

This is the desired isoperimetric inequality, and will answer our maximization question. It is apparent that equality holds if and only if: (1) all the Fourier coefficients vanish whenever n ≥ 2; and (2) $a_1 = B_1$ and $A_1 = -b_1$. Under these conditions

\[ x(s) = a_0 + a_1 \cos\frac{2\pi}{L} s + b_1 \sin\frac{2\pi}{L} s, \]

\[ y(s) = A_0 - b_1 \cos\frac{2\pi}{L} s + a_1 \sin\frac{2\pi}{L} s. \]

Squaring and adding to eliminate s, we obtain

\[ (x - a_0)^2 + (y - A_0)^2 = a_1^2 + b_1^2. \]

Hence of all closed curves of a given length, a circle encloses the greatest area.

Example 5.16. We can show that of all triangles having a given fixed perimeter, an equilateral triangle encloses the greatest area. The area A of a triangle is given by Heron’s formula

\[ A = \sqrt{s(s - a)(s - b)(s - c)}, \]

where a, b, c are the side lengths and s is one-half the perimeter p. By the AM–GM inequality,

\[ \sqrt[3]{\frac{A^2}{s}} = \sqrt[3]{(s - a)(s - b)(s - c)} \le \frac{(s - a) + (s - b) + (s - c)}{3} = \frac{s}{3}. \]

Hence

\[ A \le \frac{s^2}{3\sqrt{3}} = \frac{p^2}{12\sqrt{3}}. \]

Equality is attained only if the numbers s − a, s − b, s − c are all equal, that is, only if a = b = c.

5.9 Electrostatic Fields and Capacitance

Electrostatics is the study of stationary electric charges and their effects on one another. Of utmost importance in electrostatics is the conservative nature of the force field, which permits the vector electric intensity to be expressed as the gradient of a potential function. The electrostatic potential Φ(x, y, z) is produced by electric charge and satisfies Poisson’s equation

\[ \nabla^2 \Phi = \frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} + \frac{\partial^2 \Phi}{\partial z^2} = -\frac{\rho}{\varepsilon_0}, \]

where ρ = ρ(x, y, z) is the density of electric charge (coulombs per cubic meter) and $\varepsilon_0$ is the free-space permittivity (a positive constant). Φ is convenient because of its scalar nature, and we can study some of its most fundamental properties through basic manipulations with inequalities.

Example 5.17. Consider electric charge distributed with continuous density ρ(x, y, z) throughout a volume region V in unbounded free space. The resulting potential at points (x, y, z) external to V is given by

\[ \Phi(x, y, z) = \frac{1}{4\pi\varepsilon_0} \int_V \frac{\rho(x', y', z')}{R} \, dx' dy' dz', \]

where R is the distance from (x, y, z) to an element of electric charge at the location (x′, y′, z′) of the differential volume dx′dy′dz′. We inquire as to how Φ behaves at large distance from V. For simplicity we assume ρ(x, y, z) ≥ 0 in V; the following argument is easily modified if negative charge is present. Let the maximum and minimum values of R for a fixed point (x, y, z) be $R_M$ and $R_m$, respectively. Then

\[ \frac{1}{R_M} \le \frac{1}{R} \le \frac{1}{R_m} \]

and we obtain

\[ \int_V \frac{\rho}{R_M} \, dx' dy' dz' \le \int_V \frac{\rho}{R} \, dx' dy' dz' \le \int_V \frac{\rho}{R_m} \, dx' dy' dz' \]

or

\[ \frac{Q}{4\pi\varepsilon_0} \frac{R}{R_M} \le R\Phi \le \frac{Q}{4\pi\varepsilon_0} \frac{R}{R_m}, \]

where

\[ Q = \int_V \rho \, dx' dy' dz' \]

is the total charge in V. As R → ∞, $R/R_M$ and $R/R_m$ both approach 1 and the middle term RΦ is squeezed to $Q/4\pi\varepsilon_0$. Hence $\Phi = O(R^{-1})$ and the potential is said to be regular at infinity.

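Example 5.18. (See [60].) Consider a two-dimensional situation where

\[ \frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} = -\frac{\rho(x, y)}{\varepsilon_0} \tag{5.19} \]

holds within a bounded domain D of the xy-plane. Let the boundary curve of D be C. We investigate a property possessed by any continuous solution Φ(x, y) under the condition that ρ is strictly negative so that the forcing function for (5.19) is strictly positive. If Φ is continuous on D ∪ C, then it must attain a maximum on D ∪ C, at point $p_0 = (x_0, y_0)$ say. Now $p_0 \in D$ implies that simultaneously

\[ \left. \frac{\partial^2 \Phi}{\partial x^2} \right|_{p_0} \le 0, \qquad \left. \frac{\partial^2 \Phi}{\partial y^2} \right|_{p_0} \le 0, \]

so that the left side of (5.19) would be nonpositive at $p_0$ while the right side is strictly positive, a contradiction, and hence $p_0 \in C$. Under the given assumptions max Φ must occur on the boundary contour C.

Next, suppose that ρ is required only to be nonpositive in D. We take B to be an upper bound for Φ on C, and let

\[ \Phi(x, y) = \varphi(x, y) - \varepsilon (x^2 + y^2), \]

where φ is a new function and ε > 0 is arbitrary. Substitution into (5.19) gives

\[ \frac{\partial^2 \varphi}{\partial x^2} + \frac{\partial^2 \varphi}{\partial y^2} = -\frac{\rho(x, y)}{\varepsilon_0} + 4\varepsilon. \]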

Because φ satisfies an equation of the form (5.19) whose forcing function is strictly positive, we know that max φ occurs on C. Because Φ(x, y) ≤ φ(x, y),

\[ \begin{aligned} \max_{(x,y) \in D} \Phi \le \max_{(x,y) \in D} \varphi \le \max_{(x,y) \in C} \varphi &= \max_{(x,y) \in C} [\Phi + \varepsilon (x^2 + y^2)] \\ &\le B + \varepsilon \max_{(x,y) \in C} (x^2 + y^2) \end{aligned} \]

for every ε > 0. Letting ε → 0, we conclude that for every (x, y) ∈ D, Φ(x, y) ≤ B.

The two results obtained above, called maximum principles, yield prior knowledge about the behavior of all possible solutions of Poisson’s equation under certain prescribed conditions. The case ρ(x, y) ≡ 0 of (5.19) is the important Laplace equation

\[ \frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} = 0. \tag{5.20} \]


We suppose for (5.20) that there exist positive numbers b and B such that for every (x, y) ∈ C,

\[ b \le \Phi(x, y) \le B. \tag{5.21} \]

Then Φ(x, y) ≤ B in D by the maximum principle above. Moreover, −Φ also satisfies (5.20) and the condition −B ≤ −Φ(x, y) ≤ −b on C. Hence on D we have −Φ(x, y) ≤ −b and it follows that (5.21) holds there. In other words, both the maximum and minimum values of any solution to (5.20) in a bounded domain must occur on the boundary of the domain. The reader interested in pursuing this area further could consult Protter [51].

A quantity of interest in electrostatics is the capacitance of a metallic body. Consider a conducting solid with boundary surface $A$ and held at potential $\Phi_0$, and let $\Phi$ be the potential produced by the charge on the body. The capacitance of the body is defined as the ratio of the total charge it carries to the surface potential $\Phi_0$, and is given by

$$C = \frac{\varepsilon_0}{\Phi_0^2} \int_{V_e} |\nabla \Phi|^2 \, dV, \tag{5.22}$$

where volume integration is done over the space $V_e$ exterior to $A$. Based on this relation we can derive an inequality that provides a convenient upper bound on the capacitance. We introduce two new scalar fields $f$ and $\delta$ (not having any particular physical interpretation) such that

$$f(x, y, z) = \Phi(x, y, z) + \delta(x, y, z),$$

where $\delta = 0$ on the body $A$ and $\delta \to 0$ at large distance from $A$ so that $f$ satisfies the same boundary conditions as $\Phi$. Notice that

$$\begin{aligned}
\int_{V_e} |\nabla f|^2 \, dV &= \int_{V_e} |\nabla \Phi + \nabla \delta|^2 \, dV \\
&= \int_{V_e} |\nabla \Phi|^2 \, dV + 2 \int_{V_e} \nabla \delta \cdot \nabla \Phi \, dV + \int_{V_e} |\nabla \delta|^2 \, dV,
\end{aligned}$$

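so that by (5.22), Green's formula

$$\oint_S \phi \frac{\partial \psi}{\partial n} \, dS = \int_V \nabla \phi \cdot \nabla \psi \, dV + \int_V \phi \nabla^2 \psi \, dV$$

and Laplace's equation,

$$\begin{aligned}
\int_{V_e} |\nabla f|^2 \, dV &= \frac{\Phi_0^2 C}{\varepsilon_0} + 2 \left[ \oint_S \delta \frac{\partial \Phi}{\partial n} \, dS - \int_{V_e} \delta \nabla^2 \Phi \, dV \right] + \int_{V_e} |\nabla \delta|^2 \, dV \\
&= \frac{\Phi_0^2 C}{\varepsilon_0} + \int_{V_e} |\nabla \delta|^2 \, dV.
\end{aligned} \tag{5.23}$$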


FIGURE 5.3. Bound on capacitance. (Regions labeled in the figure: $V'$ and $V_{eL}$.)

Because the rightmost term is nonnegative, we have Dirichlet's principle

$$C \le \frac{\varepsilon_0}{\Phi_0^2} \int_{V_e} |\nabla f|^2 \, dV. \tag{5.24}$$

Equality is attained when $f = \Phi$, i.e., the actual potential; any other function that fits the same boundary conditions will overestimate $C$.

Example 5.19. The capacitance $C$ of a body is less than that of any other body that can completely surround it (Figure 5.3). For let $\Phi_L$ be the actual potential due to the larger body, and take $f = \Phi_L$ at points in the region $V_{eL}$ outside the larger body while taking $f = \Phi_0$ everywhere inside. Then $f$ satisfies the boundary conditions for both bodies and

$$C \le \frac{\varepsilon_0}{\Phi_0^2} \int_{V_e} |\nabla f|^2 \, dV = \frac{\varepsilon_0}{\Phi_0^2} \int_{V'} |\nabla \Phi_0|^2 \, dV + \frac{\varepsilon_0}{\Phi_0^2} \int_{V_{eL}} |\nabla \Phi_L|^2 \, dV = C_L.$$

Here the integral over $V'$ vanishes because $f = \Phi_0$ is constant there. For instance, because the capacitance of a sphere is elementary we could get a rough estimate of the capacitance of a cube by inscribing and circumscribing appropriate spheres.

The reader interested in more sophisticated uses of Dirichlet's principle is referred to Polya and Szego [50] who, for instance, show that the capacitance of a convex but otherwise arbitrarily shaped body cannot exceed the capacitance of a certain related prolate spheroid. The capacitance of a prolate spheroid of eccentricity $e$ is well known [56] and is given by

$$C = \frac{8 \pi \varepsilon_0 a e}{\ln[(1 + e)/(1 - e)]}, \tag{5.25}$$

where $a$ is the major semiaxis of the generating ellipse. The capacitance of a convex body can never exceed the capacitance of a prolate spheroid, the major and minor semiaxes of which are the mean radius and surface radius, respectively, of the body. These last terms are further defined in the reference, where this elegant result is used to attack the difficult problem of better estimating the capacitance of a cube.

Another approach to the estimation of electrical capacitance is based on a geometrical concept of symmetrization. Steiner symmetrization of a given solid $B$ with respect to a plane $P$ is an operation which changes $B$ into a new solid $B'$ such that:

• $B'$ is symmetric with respect to $P$;

• if $L$ is a straight line perpendicular to $P$, then $L$ intersects $B$ if and only if $L$ intersects $B'$, and both intersections have the same length;

• $L \cap B'$ is just one line segment, bisected by $P$ (or, is a point of $P$ in a degenerate case).

$P$ is known as the plane of symmetrization for the operation. For instance, suppose our original solid is the hemispherical ball

$$B = \left\{ (x, y, z) \;\middle|\; 0 < z < \sqrt{a^2 - x^2 - y^2} \right\}$$

and $P$ is the $z = 0$ plane. We see that the new solid

$$B' = \left\{ (x, y, z) \;\middle|\; |z| < \tfrac{1}{2} \sqrt{a^2 - x^2 - y^2} \right\}$$

satisfies the three conditions of the definition of symmetrization; hence, $B'$ is the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{a^2} + \frac{z^2}{(a/2)^2} = 1.$$

Symmetrization has useful properties. First, the solids $B$ and $B'$ have equal volumes (as is easily verified for the example above). Second, the operation does not increase surface area; that is, supposing $B$ has boundary area $S$ and $B'$ has boundary area $S'$, then $S' \le S$. A similar relation holds between the capacitances $C$ and $C'$ of metallic objects formed in the shapes of $B$ and $B'$, respectively, i.e., that

$$C' \le C.$$

Polya and Szego discuss an ingenious application of this idea to the calculation of capacitance of arbitrarily shaped bodies. The main ideas are as follows. We begin with an arbitrarily shaped body $B^{(0)}$ having known volume $V$ and unknown capacitance $C^{(0)}$. Now imagine symmetrizing the body repeatedly, with respect to any number of different successive planes. After the $n$th such symmetrization, we get a body $B^{(n)}$ whose volume is still $V$ but whose capacitance $C^{(n)}$ satisfies

$$C^{(n)} \le C^{(n-1)} \le \cdots \le C^{(0)}.$$


As $n \to \infty$ we should arrive at a sphere of volume $V$; letting $e \to 0$ in (5.25), the capacitance of this simple object is $4 \pi \varepsilon_0 a$ where $a$ is the radius. Since the volume is $V = 4 \pi a^3 / 3$, we can eliminate $a$ from the capacitance expression and assert that

$$C^{(0)} \ge 4 \pi \varepsilon_0 \sqrt[3]{\frac{3V}{4\pi}}$$

for the body $B^{(0)}$. For a metallic cube of edge length $L$, for instance, this would yield

$$C \ge 4 \pi \varepsilon_0 \sqrt[3]{\frac{3 L^3}{4\pi}} \approx 7.796 \, \varepsilon_0 L$$

as a lower bound.
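The bounds discussed for the cube can be compared numerically. The sketch below is only an illustration: the function names are our own, capacitances are expressed as multiples of $\varepsilon_0 L$, and the inscribed and circumscribed spheres of Example 5.19 are set beside the equal-volume sphere obtained from symmetrization. The limit $e \to 0$ of (5.25) is also checked.

```python
import math


def sphere_coeff(radius_over_L):
    """C / (eps0 * L) for a sphere whose radius is given in units of L: 4*pi*a."""
    return 4.0 * math.pi * radius_over_L


def prolate_spheroid_coeff(e):
    """C / (eps0 * a) from (5.25) for eccentricity 0 < e < 1."""
    return 8.0 * math.pi * e / math.log((1.0 + e) / (1.0 - e))


# Sphere inside the cube (radius L/2): a lower bound by Example 5.19.
inscribed = sphere_coeff(0.5)
# Sphere through the corners of the cube (radius sqrt(3) L / 2): an upper bound.
circumscribed = sphere_coeff(math.sqrt(3.0) / 2.0)
# Sphere with the same volume L^3 as the cube: the symmetrization lower bound.
equal_volume = sphere_coeff((3.0 / (4.0 * math.pi)) ** (1.0 / 3.0))

if __name__ == "__main__":
    print(f"inscribed sphere     : {inscribed:.3f} eps0 L")
    print(f"equal-volume sphere  : {equal_volume:.3f} eps0 L")
    print(f"circumscribed sphere : {circumscribed:.3f} eps0 L")
    print(f"spheroid, e -> 0     : {prolate_spheroid_coeff(1e-4):.6f} (4 pi = {4 * math.pi:.6f})")
```

The equal-volume bound is the sharper of the two lower bounds, which is why the symmetrization argument improves on the inscribed sphere.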

5.10 Applications to Matrices

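Matrix theory and linear algebra contain many references to inequalities. Given an $n \times n$ square matrix of complex elements

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix},$$

an important related matrix is the conjugate transpose $A^\dagger$ of $A$:

$$A^\dagger = \begin{pmatrix} \bar{a}_{11} & \cdots & \bar{a}_{n1} \\ \vdots & \ddots & \vdots \\ \bar{a}_{1n} & \cdots & \bar{a}_{nn} \end{pmatrix}.$$

In terms of inner products, if $x, y \in \mathbb{C}^n$, then $\langle x, Ay \rangle = \langle A^\dagger x, y \rangle$. The matrix $A$ is called Hermitian or self-adjoint if $A = A^\dagger$. Because $\langle x, Ay \rangle = \langle A^\dagger x, y \rangle$ for any vectors $x, y \in \mathbb{C}^n$ and any complex matrix $A$, the Hermitian matrix $A$ satisfies

$$\langle x, Ay \rangle = \langle Ax, y \rangle. \tag{5.26}$$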

Next, we derive an important inequality involving the eigenvalues of a square matrix $A$. Recall that the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$ are scalars satisfying $Ax = \lambda_i x$ for some nonzero column vectors $x$ (the corresponding eigenvectors). We first note that if $\lambda$ is an eigenvalue of the Hermitian matrix $A$, then $\lambda$ must be real. To see this, let $\lambda$ be an eigenvalue with corresponding eigenvector $x$. Then

$$\langle x, x \rangle \lambda = \langle x, \lambda x \rangle = \langle x, Ax \rangle = \langle Ax, x \rangle = \langle \lambda x, x \rangle = \bar{\lambda} \langle x, x \rangle.$$

Since $\langle x, x \rangle \ne 0$, $\lambda = \bar{\lambda}$. In case $A$ is a real matrix, Hermitian means symmetric: $a_{ij} = a_{ji}$ for all $i, j$. We now restrict our discussion to symmetric matrices. Now suppose $\lambda_1$ and $\lambda_2$ are two (distinct) eigenvalues of the symmetric matrix $A$ with corresponding eigenvectors $x_1, x_2$. Then $x_1$ and $x_2$ are orthogonal, i.e., $\langle x_1, x_2 \rangle = 0$. To see this, we write

$$\langle x_1, x_2 \rangle \lambda_2 = \langle x_1, \lambda_2 x_2 \rangle = \langle x_1, A x_2 \rangle = \langle A x_1, x_2 \rangle = \langle \lambda_1 x_1, x_2 \rangle = \lambda_1 \langle x_1, x_2 \rangle.$$

But $\lambda_1 \ne \lambda_2$, hence $\langle x_1, x_2 \rangle = 0$.

The following several theorems are useful:

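Theorem 5.1. Let $A$ be an $n \times n$ symmetric (real) matrix. Suppose the (real) eigenvalues $\lambda_i$ satisfy $\lambda_1 < \lambda_2 < \cdots < \lambda_n$. Define the quadratic form for $x \in \mathbb{R}^n$ by $Q(x) = \langle x, Ax \rangle$. Then, for any $x \in \mathbb{R}^n$,

$$\lambda_1 \|x\|^2 \le Q(x) \le \lambda_n \|x\|^2.$$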

Proof. Let $x_1, \ldots, x_n$ be corresponding eigenvectors. We may assume each $x_i$ satisfies $\|x_i\| = 1$. (Otherwise replace $x_i$ by $x_i / \|x_i\|$.) By our previous observation, the vectors $x_1, \ldots, x_n$ are an orthonormal set. Hence they are linearly independent and form a basis. Thus there exist coefficients $c_i$ such that

$$x = \sum_{i=1}^n c_i x_i.$$

Then

$$Q(x) = \left\langle \sum_{i=1}^n c_i x_i , \; A \sum_{j=1}^n c_j x_j \right\rangle = \left\langle \sum_{i=1}^n c_i x_i , \; \sum_{j=1}^n c_j \lambda_j x_j \right\rangle = \sum_{i=1}^n c_i^2 \lambda_i,$$

so that

$$Q(x) \le \lambda_n \sum_{i=1}^n c_i^2 = \lambda_n \|x\|^2.$$

Similarly, $\lambda_1 \|x\|^2 \le Q(x)$.
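Theorem 5.1 is easy to test on a small case. The following sketch assumes a $2 \times 2$ symmetric matrix $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$, for which the eigenvalues have a closed form; the matrix entries and the random sampling are illustrative choices, not part of the theorem.

```python
import math
import random


def eig_sym_2x2(a, b, d):
    """Eigenvalues (smallest first) of the real symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


def quadratic_form(a, b, d, x, y):
    """Q(x) = <x, A x> for the vector (x, y)."""
    return a * x * x + 2.0 * b * x * y + d * y * y


def bounds_hold(a, b, d, trials=1000, seed=0, tol=1e-9):
    """Check lambda_1 |x|^2 <= Q(x) <= lambda_n |x|^2 on random vectors."""
    lam_min, lam_max = eig_sym_2x2(a, b, d)
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(-5, 5), rng.uniform(-5, 5)
        norm_sq = x * x + y * y
        q = quadratic_form(a, b, d, x, y)
        if not (lam_min * norm_sq - tol <= q <= lam_max * norm_sq + tol):
            return False
    return True


if __name__ == "__main__":
    print(eig_sym_2x2(2.0, 1.0, 3.0))   # eigenvalues (5 -/+ sqrt(5)) / 2
    print(bounds_hold(2.0, 1.0, 3.0))
```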

Theorem 5.2 (Sylvester's Criterion). Let $A$ be a symmetric $n \times n$ real matrix. The quadratic form defined by $Q(x) = x^T A x = \langle x, Ax \rangle$ for $x \in \mathbb{R}^n$ is positive definite if and only if the determinants

$$\begin{vmatrix} a_{11} \end{vmatrix}, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \ldots, \quad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}$$

are all positive.

Proof. Recall from linear algebra that an $n \times n$ symmetric matrix $A$ is called positive definite if $x^T A x > 0$ whenever the vector $x \ne 0$. We discuss the theorem for $n = 2$ and refer the reader to Gelfand [25] for the general case. Suppose $Q(x)$ is positive definite. That is, $Q(x) > 0$ if $x \ne 0$. Choose $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Then $Q(x) = a_{11} > 0$. Now choose $x = \begin{pmatrix} x_1 \\ 1 \end{pmatrix}$. Because $x \ne 0$ we have $Q(x) = a_{11} x_1^2 + 2 a_{12} x_1 + a_{22} > 0$ for all $x_1$ so by our discussion of quadratic inequalities in Chapter 1,

$$\Delta = \begin{vmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{vmatrix} > 0.$$

The converse is proved in a similar way.
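Sylvester's criterion translates directly into a short test for positive definiteness. The sketch below is an illustration only: it computes determinants by cofactor expansion, which is adequate for the small matrices considered here but would be a poor choice for large $n$; the helper names and example matrices are our own.

```python
def determinant(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * determinant(minor)
    return total


def leading_principal_minors(a):
    """Determinants of the upper-left k-by-k blocks, k = 1, ..., n."""
    return [determinant([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def is_positive_definite(a):
    """Sylvester's criterion for a real symmetric matrix a."""
    return all(d > 0 for d in leading_principal_minors(a))


if __name__ == "__main__":
    tridiagonal = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    indefinite = [[1, 2], [2, 1]]
    print(leading_principal_minors(tridiagonal), is_positive_definite(tridiagonal))
    print(leading_principal_minors(indefinite), is_positive_definite(indefinite))
```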

Theorem 5.3 (Second Derivative Test for n Variables). Let $U$ be an open set in $\mathbb{R}^n$. Let $f(x) \in C^2(U)$. Let $x_0 \in U$ and suppose $f'(x_0) = 0$ and $f''(x_0)$ is positive definite. Then $f(x)$ has a local minimum at $x_0$. That is, there is a positive $\delta$ such that if $0 < \|x - x_0\| < \delta$, then $f(x) > f(x_0)$.

Proof. We first discuss some of the terms used. $f(x) \in C^2(U)$ means all first and second partial derivatives of $f$ with respect to $x_1, \ldots, x_n$ exist and are continuous in $U$. $f'(x_0)$ represents the $n$-tuple, or row vector $(\partial f(x_0)/\partial x_1, \ldots, \partial f(x_0)/\partial x_n)$. The second derivative $f''(x_0)$, or Hessian, is the $n \times n$ matrix whose $(i,j)$th entry is $\partial^2 f(x_0)/\partial x_i \partial x_j$. If $x$ is the column vector $\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$, then $x^T$ (the transpose) is the row vector $(x_1, \ldots, x_n)$. By Sylvester's criterion, the second derivative $f''(x_0)$ is positive definite if and only if the determinants

$$\begin{vmatrix} \dfrac{\partial^2 f}{\partial x_1^2} \end{vmatrix}, \quad \begin{vmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[2mm] \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \dfrac{\partial^2 f}{\partial x_2^2} \end{vmatrix}, \quad \ldots, \quad \begin{vmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{vmatrix}$$

are all positive at $x_0$. By persistence of sign applied to $n$ variables, if the determinants are positive at $x_0$, then they are positive nearby. Choose a positive $\delta$ such that $x \in U$ and all the determinants are positive at $x$ whenever $\|x - x_0\| < \delta$. Now let $0 < \|\Delta x\| < \delta$. The following generalizes equation (2.4) to $n$ variables:

$$f(x_0 + \Delta x) = f(x_0) + f'(x_0) \Delta x + \frac{1}{2} (\Delta x)^T f''(\xi) (\Delta x)$$

for some $\xi$ (on the line segment) strictly between $x_0$ and $x_0 + \Delta x$. (See [37, 17].) Because $f'(x_0) = 0$ and $f''(\xi)$ is positive definite the result follows by inspection.


Another useful concept is the trace of a square matrix $M$, denoted by $\operatorname{tr}[M]$ and defined as the sum of the diagonal elements of $M$. With $B = A^\dagger A$ we have $b_{ij} = \sum_{k=1}^n \bar{a}_{ki} a_{kj}$ and the trace of $B$ is

$$\sum_{m=1}^n b_{mm} = \sum_{m=1}^n \sum_{k=1}^n \bar{a}_{km} a_{km} = \sum_{m=1}^n \sum_{k=1}^n |a_{km}|^2 \ge 0.$$

Hence $\operatorname{tr}[A^\dagger A] \ge 0$, and equality holds if and only if $A$ is the zero matrix. We use this simple result to derive another inequality involving matrix eigenvalues. Because $\lambda x - Ax = \lambda I x - Ax = (\lambda I - A)x$ where $I$ is the identity matrix having the same dimension as $A$, the eigenvalues are conveniently computed as solutions of the characteristic equation $\det(\lambda I - A) = 0$. For any square matrix $S$ there is a unitary matrix $U$ (i.e., one having $U^{-1} = U^\dagger$) such that $U^\dagger S U$ is upper triangular. This upper triangular matrix $T = U^\dagger S U$ is said to be unitarily similar to $S$. Since

$$\begin{aligned}
\det(\lambda I - T) &= \det(\lambda U^\dagger I U - U^\dagger S U) \\
&= \det(U^{-1}) \det(\lambda I - S) \det(U) \\
&= \det(\lambda I - S)
\end{aligned}$$

it is apparent that $T$ has the same eigenvalues as does $S$; moreover, the eigenvalues of $T$ reside along the main diagonal. We use these facts as follows. Suppose $A$ is square with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then, for a suitable unitary $U$, $B = U^\dagger A U$ is upper triangular and

$$\begin{aligned}
B B^\dagger &= (U^\dagger A U)(U^\dagger A U)^\dagger \\
&= (U^\dagger A U)(U^\dagger (U^\dagger A)^\dagger) \\
&= (U^\dagger A U)(U^\dagger A^\dagger U) \\
&= U^\dagger A A^\dagger U,
\end{aligned}$$

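so that $B B^\dagger$ is unitarily similar to $A A^\dagger$. Because the trace of a matrix is the sum of its eigenvalues, $\operatorname{tr}[B B^\dagger] = \operatorname{tr}[A A^\dagger]$ or

$$\sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2 = \sum_{i=1}^n \sum_{j=1}^n |b_{ij}|^2.$$

But remembering that $B$ is upper triangular and unitarily similar to $A$, we have

$$\begin{aligned}
\sum_{i=1}^n \sum_{j=1}^n |b_{ij}|^2 &= \sum_{i=1}^n \left[ \sum_{j=1}^{i-1} |b_{ij}|^2 + |b_{ii}|^2 + \sum_{j=i+1}^n |b_{ij}|^2 \right] \\
&= \sum_{i=1}^n |\lambda_i|^2 + \sum_{i=1}^n \sum_{j=i+1}^n |b_{ij}|^2.
\end{aligned}$$

Hence

$$\sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2 \ge \sum_{i=1}^n |\lambda_i|^2.$$

This is Schur's inequality. Equality holds if and only if

$$\sum_{i=1}^n \sum_{j=i+1}^n |b_{ij}|^2 = 0,$$

i.e., if and only if $B$ is a diagonal matrix.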

Example 5.20. Schur's inequality can be applied to find a rough bound on the magnitudes of the individual eigenvalues. Since, for each fixed $i$,

$$|\lambda_i|^2 \le \sum_{k=1}^n |\lambda_k|^2 \le \sum_{k=1}^n \sum_{j=1}^n |a_{kj}|^2 \le n^2 \max |a_{kj}|^2$$

we have $|\lambda_i| \le n \max |a_{ij}|$ for $i = 1, \ldots, n$.

Example 5.21. Schur's inequality can also be used in conjunction with AM–GM to obtain a bound on the determinant of $A$. Recall that the determinant of a square matrix equals the product of its eigenvalues:

$$|\det A| = \left| \prod_{i=1}^n \lambda_i \right| = \prod_{i=1}^n |\lambda_i|.$$

Then

$$|\det A|^{2/n} = \sqrt[n]{\prod_{i=1}^n |\lambda_i|^2} \le \frac{1}{n} \sum_{i=1}^n |\lambda_i|^2 \le \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2 \le \frac{1}{n} n^2 \max |a_{ij}|^2,$$

and therefore

$$|\det A| \le n^{n/2} (\max |a_{ij}|)^n.$$
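Schur's inequality and the determinant bound of Example 5.21 can be checked on small matrices. The sketch below is restricted to $2 \times 2$ matrices so that the eigenvalues follow from the quadratic formula; the helper names and the two test matrices are illustrative assumptions. The second matrix is symmetric, hence unitarily similar to a diagonal matrix, and should give equality in Schur's inequality.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of det(lambda*I - A) = 0 for a 2-by-2 (possibly complex) matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def schur_quantities(a):
    """Return (sum |lambda_i|^2, sum |a_ij|^2, |det A|, n^(n/2) (max |a_ij|)^n)."""
    n = len(a)  # n = 2 in this sketch
    entries = [a[i][j] for i in range(n) for j in range(n)]
    sum_eig = sum(abs(lam) ** 2 for lam in eigenvalues_2x2(a))
    sum_entries = sum(abs(z) ** 2 for z in entries)
    det_abs = abs(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    det_bound = n ** (n / 2) * max(abs(z) for z in entries) ** n
    return sum_eig, sum_entries, det_abs, det_bound


if __name__ == "__main__":
    print(schur_quantities([[1, 2], [0, 3]]))   # strict inequality
    print(schur_quantities([[2, 1], [1, 2]]))   # equality: symmetric matrix
```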

Inequalities also arise in the discussion of vector and matrix norms. For a column vector $X = (x_1, \ldots, x_n)^T$, scalars called $L_p$ norms can be defined through the equation

$$\|X\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p},$$

where $p = 1, 2, \ldots, \infty$. In particular,

$$\|X\|_2 = \sqrt{\sum_{i=1}^n |x_i|^2}$$

is called the Euclidean or $L_2$ norm. This norm has many applications in systems engineering, as does the $L_1$ norm

$$\|X\|_1 = \sum_{i=1}^n |x_i|. \tag{5.27}$$

The definition for $p = \infty$ is interpreted as $\|X\|_\infty = \max |x_i|$.

Inequalities are available to relate the various vector norms to one another. For instance, the reader might wish to pause momentarily and verify the following:

$$\|X\|_2 \le \sqrt{n} \, \|X\|_\infty, \qquad (\|X\|_2)^2 \le \|X\|_1 \|X\|_\infty.$$

For the vector norm $\|X\|_p$, we define the induced matrix norm of the $n \times n$ matrix $A$ by

$$\|A\|_p = \max_{X \ne 0} \frac{\|AX\|_p}{\|X\|_p}. \tag{5.28}$$

Then $\|AX\|_p \le \|A\|_p \|X\|_p$ for all $X$, and $\|AX\|_p = \|A\|_p \|X\|_p$ for some $X$. Although $\|X\|_2$ is the most natural way to measure the length of a vector, $\|A\|_2$ is difficult to compute in general. For this and other reasons, $p = 1$ or $p = \infty$ are frequent choices. $\|A\|_1$ gives the max column sum, i.e.,

$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|$$

while $\|A\|_\infty$ gives the max row sum. Two other matrix norms that are commonly used are the Frobenius norm

$$\|A\|_F = \sqrt{\sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2} = \sqrt{\operatorname{tr}[A^\dagger A]}$$

and the cubic norm

$$\|A\|_C = n \max |a_{ij}|.$$

A property that must be satisfied by any valid matrix norm is

$$\|AB\| \le \|A\| \, \|B\|$$


for any two matrices $A, B$. It is easily verified that the Frobenius norm satisfies this general property; by the Cauchy–Schwarz inequality,

$$\begin{aligned}
\|AB\|_F &= \sqrt{\sum_{i=1}^n \sum_{j=1}^n \left| \sum_{k=1}^n a_{ik} b_{kj} \right|^2} \\
&\le \sqrt{\sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n |a_{ik}|^2 \sum_{m=1}^n |b_{mj}|^2} \\
&= \sqrt{\sum_{i=1}^n \sum_{k=1}^n |a_{ik}|^2 \sum_{j=1}^n \sum_{m=1}^n |b_{mj}|^2} \\
&= \|A\|_F \|B\|_F,
\end{aligned}$$

as desired. The same property is also easily verified for the cubic norm. Another property of interest is that

$$\|A\|_2 \le \|A\|_F \le \sqrt{n} \, \|A\|_2,$$

providing an estimate for the more difficult to compute $\|A\|_2$.

A matrix norm is said to be compatible with a given vector norm if the inequality

$$\|AX\| \le \|A\| \, \|X\|$$

holds everywhere. For example, $\|A\|_2$ and $\|A\|_F$ are both compatible with $\|X\|_2$. The inequality from (5.28)

$$\|AX\|_p \le \|A\|_p \|X\|_p$$

is sharp in the sense that equality holds for some $X \ne 0$. However, the Frobenius norm is not sharp (Exercise 5.15). We pay a penalty for having an easily computed matrix norm $\|A\|_F$; it overestimates the "size" of the matrix $A$ as an operator. Using any compatible norms, and taking $X$ to be an eigenvector belonging to $\lambda_i$, we can write

$$\|\lambda_i X\| = |\lambda_i| \, \|X\| = \|AX\| \le \|A\| \, \|X\|$$

and because the eigenvectors are nonzero,

$$|\lambda_i| \le \|A\|$$

for $i = 1, \ldots, n$. The spectral radius $\rho[A]$ of the matrix $A$ is defined to be the largest of the magnitudes of the eigenvalues of $A$. Obviously then,

$$\rho[A] \le \|A\|.$$

We have already met a specific instance of this inequality in Example 5.20, where the cubic norm was effectively used. See Theorem 6.9.2 of Stoer and Bulirsch [55] for more discussion on the spectral radius. See Marcus and Minc [42] for more discussion on matrix inequalities in general. Another good reference on matrices is Lutkepohl [40].
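The easily computed matrix norms of this section can be compared on a concrete pair of matrices. The following sketch uses plain Python lists; the function names and the matrices `A` and `B` are illustrative assumptions. It checks the product property $\|AB\| \le \|A\| \, \|B\|$ for each norm and the bound $\rho[A] \le \|A\|$, using the closed-form spectral radius of the particular $2 \times 2$ matrix chosen.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def norm_1(a):
    """Maximum absolute column sum."""
    n = len(a)
    return max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))


def norm_inf(a):
    """Maximum absolute row sum."""
    return max(sum(abs(z) for z in row) for row in a)


def norm_frobenius(a):
    return math.sqrt(sum(abs(z) ** 2 for row in a for z in row))


def norm_cubic(a):
    return len(a) * max(abs(z) for row in a for z in row)


ALL_NORMS = (norm_1, norm_inf, norm_frobenius, norm_cubic)

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 1], [-1, 2]]
    rho_A = (5 + math.sqrt(33)) / 2   # spectral radius of this particular A
    for norm in ALL_NORMS:
        print(norm.__name__, norm(A), norm(matmul(A, B)) <= norm(A) * norm(B),
              rho_A <= norm(A))
```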


5.11 Topics in Signal Analysis

Consider a periodic square wave w(t), given over one cycle by

$$w(t) = \begin{cases} -\pi/2 & \text{for } -\pi \le t < 0, \\ \pi/2 & \text{for } 0 \le t < \pi. \end{cases}$$

By expressions given earlier, $w(t)$ can be represented as the Fourier series

$$w(t) = 2 \sum_{n=1}^\infty \frac{\sin(2n - 1)t}{2n - 1}. \tag{5.29}$$

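Waveform $w(t)$ has a jump discontinuity at $t = 0$, and it is well known that a truncated version of the Fourier series will overshoot this jump (the Gibbs phenomenon). We now compute the amount of overshoot present (see [41]). Let us call the $m$th partial sum of the series $S_{wm}(t)$. Differentiation gives

$$\frac{dS_{wm}(t)}{dt} = 2 \sum_{n=1}^m \cos(2n - 1)t.$$

Using the identity

$$\sum_{n=1}^m \cos(2n - 1)t \equiv \frac{1}{2} \sin 2mt \, \csc t$$

and then integrating, we get

$$S_{wm}(t) = \int_0^t \frac{\sin 2m\tau}{\sin \tau} \, d\tau.$$

Defining

$$\Delta_m(t) = \left| S_{wm}(t) - \int_0^t \frac{\sin 2m\tau}{\tau} \, d\tau \right| \tag{5.30}$$

(the motivation becomes clear later) we see that

$$\begin{aligned}
\Delta_m(t) &= \left| \int_0^t \sin 2m\tau \, \frac{\tau}{\sin \tau} \left( \frac{1}{\tau} - \frac{\sin \tau}{\tau^2} \right) d\tau \right| \\
&\le \int_0^t |\sin 2m\tau| \left| \frac{\tau}{\sin \tau} \right| \left| \frac{1}{\tau} - \frac{\sin \tau}{\tau^2} \right| d\tau.
\end{aligned}$$

But $\sin \tau \ge 2\tau/\pi$ whenever $0 \le \tau \le \pi/2$ by Jordan's inequality; also,

$$\frac{1}{\tau} - \frac{\sin \tau}{\tau^2} = \frac{\tau}{3!} - \frac{\tau^3}{5!} + \frac{\tau^5}{7!} - \cdots,$$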


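so that for small positive $\tau$

$$0 < \frac{1}{\tau} - \frac{\sin \tau}{\tau^2} < \frac{\tau}{3!}$$

and

$$\Delta_m(t) \le \int_0^t \frac{\pi}{2} \cdot \frac{\tau}{3!} \, d\tau = \frac{\pi}{24} t^2.$$

Hence for any $\varepsilon > 0$, there is a $T > 0$ such that $\Delta_m(t) < \varepsilon$ whenever $0 \le t \le T$. For $m > \pi/2T$, we may choose in particular $t = \pi/2m$ and after a change of variables write (5.30) as

$$\left| S_{wm}\!\left( \frac{\pi}{2m} \right) - \int_0^\pi \frac{\sin \tau}{\tau} \, d\tau \right| < \varepsilon. \tag{5.31}$$

But

$$\int_0^\pi \frac{\sin \tau}{\tau} \, d\tau = \int_0^\infty \frac{\sin \tau}{\tau} \, d\tau - \int_\pi^\infty \frac{\sin \tau}{\tau} \, d\tau = \frac{\pi}{2} - \int_\pi^\infty \frac{\sin \tau}{\tau} \, d\tau$$

and (5.31) is

$$\left| \left[ S_{wm}\!\left( \frac{\pi}{2m} \right) - \frac{\pi}{2} \right] - \left[ -\int_\pi^\infty \frac{\sin \tau}{\tau} \, d\tau \right] \right| < \varepsilon.$$

So the difference between the series and $w(t)$ in the neighborhood to the right of the jump discontinuity at $t = 0$ is

$$-\int_\pi^\infty \frac{\sin \tau}{\tau} \, d\tau \approx 0.281.$$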

This is roughly 9% of the jump height $\pi$, and is independent of $m$. The fact that the Gibbs overshoot cannot be eliminated by a sufficiently large choice of $m$ is, of course, related to the fact that the convergence of the series of functions in (5.29) is not uniform.

For aperiodic signals $f(t)$, the Fourier transform

$$\mathcal{F}[f(t)] = F(\omega) = \int_{-\infty}^\infty f(t) e^{-i\omega t} \, dt$$

and its inverse

$$\mathcal{F}^{-1}[F(\omega)] = f(t) = \frac{1}{2\pi} \int_{-\infty}^\infty F(\omega) e^{i\omega t} \, d\omega$$

are used to study frequency content as a function of the continuous angular frequency variable $\omega$. By the differentiation property

$$\mathcal{F}\!\left[ \frac{d^n f(t)}{dt^n} \right] = (i\omega)^n F(\omega) \tag{5.32}$$

we have

$$|\omega^n F(\omega)| = \left| \frac{1}{i^n} \int_{-\infty}^\infty \frac{d^n f}{dt^n} e^{-i\omega t} \, dt \right| \le \int_{-\infty}^\infty \left| \frac{d^n f}{dt^n} \right| dt$$

with the resulting bounds

$$|F(\omega)| \le \frac{1}{|\omega^n|} \int_{-\infty}^\infty \left| \frac{d^n f}{dt^n} \right| dt$$

for $n = 0, 1, 2, \ldots$, on the spectrum of $f$. This inequality lends support to our intuitive notion that only signals that vary rapidly with time (i.e., having significant $n$th derivatives for large $n$) can have significant spectral content at high frequencies. A related fact is that short-duration time signals have broadband frequency spectra. In order to quantify this relationship, we use second-moment integrals to define the temporal duration of $f(t)$ as

$$D^2 = \int_{-\infty}^\infty t^2 f^2(t) \, dt$$

and the bandwidth of its spectrum as

$$B^2 = \int_{-\infty}^\infty \omega^2 |F(\omega)|^2 \, d\omega.$$

The uncertainty principle states that if $f(t) = o(t^{-1/2})$, then

$$DB \ge \sqrt{\pi/2}. \tag{5.33}$$

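To obtain (5.33) we write the Cauchy–Schwarz inequality for integrals as

$$\left| \int_{-\infty}^\infty [t f(t)] \left( \frac{df}{dt} \right) dt \right|^2 \le \int_{-\infty}^\infty [t f(t)]^2 \, dt \int_{-\infty}^\infty \left( \frac{df}{dt} \right)^2 dt.$$

But integration by parts gives

$$\int_{-\infty}^\infty [t f(t)] \left( \frac{df}{dt} \right) dt = \int_{-\infty}^\infty t f(t) \, df(t) = \left. t \frac{f^2(t)}{2} \right|_{-\infty}^\infty - \int_{-\infty}^\infty \frac{f^2(t)}{2} \, dt,$$

where the first term in the rightmost member vanishes by the $o$ condition on $f$. The integral

$$E = \int_{-\infty}^\infty f^2(t) \, dt$$

is called the normalized energy in the signal $f$; without loss of generality this can be set to unity to give

$$\frac{1}{4} \le D^2 \int_{-\infty}^\infty \left( \frac{df}{dt} \right)^2 dt.$$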


It only remains to invoke (5.32) and Parseval's identity in the form

$$2\pi \int_{-\infty}^\infty f^2(t) \, dt = \int_{-\infty}^\infty |F(\omega)|^2 \, d\omega$$

to obtain (5.33). It is easily shown (Exercise 5.16) that the minimum duration-bandwidth product is realized (i.e., equality is attained in (5.33)) when the signal $f$ is a Gaussian pulse.
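Two results of this section lend themselves to a quick numerical check. The sketch below first evaluates the partial sum of (5.29) at $t = \pi/2m$ to estimate the Gibbs overshoot, and then computes the duration-bandwidth product of a unit-energy Gaussian pulse. The bandwidth is evaluated in the time domain as $B^2 = 2\pi \int (df/dt)^2 \, dt$, which follows from (5.32) and Parseval's identity. The function names, the trapezoidal rule, the particular Gaussian $f(t) = \pi^{-1/4} e^{-t^2/2}$, and the truncation of the integrals to $[-10, 10]$ are all choices made for this illustration.

```python
import math


def partial_sum(m, t):
    """m-th partial sum of the Fourier series (5.29)."""
    return 2.0 * sum(math.sin((2 * n - 1) * t) / (2 * n - 1) for n in range(1, m + 1))


def gibbs_overshoot(m):
    """Amount by which the partial sum exceeds pi/2 at t = pi / (2m)."""
    return partial_sum(m, math.pi / (2 * m)) - math.pi / 2


def trapezoid(g, lo, hi, steps):
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, steps):
        total += g(lo + i * h)
    return total * h


def gaussian_duration_bandwidth():
    """D * B for f(t) = pi**(-1/4) * exp(-t**2 / 2), which has unit energy."""
    f = lambda t: math.pi ** -0.25 * math.exp(-t * t / 2)
    df = lambda t: -t * f(t)                      # derivative of f
    d_sq = trapezoid(lambda t: (t * f(t)) ** 2, -10.0, 10.0, 4000)
    b_sq = 2 * math.pi * trapezoid(lambda t: df(t) ** 2, -10.0, 10.0, 4000)
    return math.sqrt(d_sq * b_sq)


if __name__ == "__main__":
    for m in (10, 50, 200):
        print(m, gibbs_overshoot(m))
    print(gaussian_duration_bandwidth(), math.sqrt(math.pi / 2))
```

The overshoot values settle near 0.281 as $m$ grows rather than shrinking toward zero, and the Gaussian product agrees with the lower bound in (5.33).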

5.12 Dynamical System Stability and Control

A broad class of continuous-time systems can be modeled using an initial-value problem of the form

$$\frac{dx(t)}{dt} = f(x(t)) \quad (t \ge t_0), \qquad x(t_0) = x_0,$$

where $x(t)$ is the $N$-dimensional state vector of the system, and the system structure is reflected in function $f$. Such systems are unforced (i.e., no input signal). The set of all possible $x$ is the state space of the system, and solution curves in state space are known as system trajectories.

Stability theory deals with sensitivity to unwanted disturbances. Of special concern are disturbances tending to perturb the system from an equilibrium state, a value $x = x_e$ such that $f(x_e) = 0$ whenever $t \ge t_0$. Because such a state may always be transferred to the origin of state space by a suitable coordinate translation, it is customary to take $x_e = 0$. If $x_e$ is unstable, a slight perturbation could put the system on a trajectory leading away from $x_e$; on the other hand, the trajectory could stay within a small neighborhood of $x_e$, or it could lead back to $x_e$. Technically, there are several notions of stability. The origin $x_e = 0$ is . . .

• stable in the sense of Lyapunov if for every given $\varepsilon > 0$ there is a number $\delta(\varepsilon) > 0$ such that if $\|x(t_0)\| < \delta$, then the resulting trajectory satisfies $\|x(t)\| < \varepsilon$ for all $t > t_0$.

• asymptotically stable if it is stable in the sense of Lyapunov and, in addition, there exists $\gamma > 0$ such that whenever $\|x(t_0)\| < \gamma$ the resulting trajectory has $\|x(t)\| \to 0$ as $t \to \infty$.

• exponentially stable if there exist positive numbers $\alpha$, $\lambda$, such that for all $t > t_0$, $\|x(t)\| \le \alpha \|x_0\| e^{-\lambda t}$ whenever the initial state $x_0$ falls sufficiently close to $x_e$.

Lyapunov theory can yield conclusions about stability without explicit knowledge of $x(t)$. This theory is extensive, and here we can offer only a few preliminary remarks for the reader. A principal notion is that if a system having just one equilibrium state is dissipative, then the system will always return to that state after any perturbation from it. This equilibrium will be a point of minimum energy for the system, and as any trajectory is followed toward this point the system energy must continually decrease. Use is therefore made of a "generalized energy" function $V(x)$, called a Lyapunov function. Assume $F(x)$ is continuous with continuous partial derivatives in state space, and let $\Omega$ be a region about $x = 0$. We say that $F(x)$ is positive definite in $\Omega$ if $F(0) = 0$ and, for every nonzero $x \in \Omega$, $F(x) > 0$. $F(x)$ is negative semidefinite in $\Omega$ if $F(0) = 0$ and, for every nonzero $x \in \Omega$, $F(x) \le 0$. Similar definitions can be formulated for the terms positive semidefinite and negative definite.

We can now state a simple stability result. If a positive-definite function $V(x)$ can be determined for a system such that $dV/dt$ is negative semidefinite, then the equilibrium point $x = 0$ is stable in the sense of Lyapunov. Here $dV/dt$ means $dV(x(t))/dt$, which is also written as $dV(x)/dt$.

For let $\varepsilon > 0$ be given. Let $S_\varepsilon = \{x \mid \|x\| = \varepsilon\}$. Because $S_\varepsilon$ is closed and bounded, as in the case for one variable, $V(x)$ assumes a minimum value $m$ on $S_\varepsilon$. Note that $m > 0$ because $V(x)$ is positive definite about $x = 0$. Continuity of $V$ at $x = 0$ guarantees a $\delta > 0$ such that $\delta < \varepsilon$ and such that for all $x$ having $\|x\| \le \delta$, $V(x) \le m/2$. Also, because $dV(x)/dt$ is negative semidefinite we know that $V(x(t))$ is nonincreasing with respect to $t$. Hence, for an initial condition with $\|x(t_0)\| \le \delta$, we have $V(x(t)) \le V(x(t_0)) \le m/2$ whenever $t > t_0$. But this implies that $\|x(t)\| < \varepsilon$ whenever $t > t_0$. To see this, suppose to the contrary that $\|x(t)\| \ge \varepsilon$ at some time $t > t_0$. Because $\|x(t_0)\| \le \delta < \varepsilon$, at some intermediate time $t_0 < t' \le t$, $\|x(t')\| = \varepsilon$. But on $S_\varepsilon$, $V(x) \ge m$, contradicting $V(x(t)) \le m/2$ whenever $t > t_0$.

Moreover, if $dV(x)/dt$ is negative definite, then $x = 0$ is asymptotically stable. Geometrically, we may imagine a level contour or surface $V(x) = \text{constant} > 0$ (Figure 5.4). Let $x$ be on this contour or surface. Since $x \ne x_e$ we require $dV(x)/dt < 0$. By the chain rule,

$$\frac{dV(x)}{dt} = (\nabla V(x))^T f(x). \tag{5.34}$$

Here $\nabla V(x)$ represents an outward normal to the level contour or surface, and the vector field $f(x)$ provides tangent vectors along solutions. In engineering terminology $(\nabla V(x))^T f(x)$ is the dot product, the product of the magnitudes (norms) with the cosine of the angle between the two vectors. The fact that the cosine of the angle is negative means that the solution is directed inward, hence a solution starting in the interior of the level contour or surface can never leave the interior and so remains bounded for all time and, in fact, approaches $x_e$ as $t \to \infty$. The Lyapunov function $V(x)$ sometimes corresponds to actual physical energy, but not always.


FIGURE 5.4. Lyapunov function. (Labels in the figure: solution curve, $x_e$, level set $V(x) = \text{constant}$, $\nabla V(x)$, $x$, $f(x)$.)

Example 5.22. Consider the motion of a mass $m$ attached to a nonlinear spring with stiffness $k(x + x^3)$, where $x$ is the distance from the equilibrium position, and a dashpot (shock absorber) is attached to provide a damping force $c(dx/dt)$. The differential equation is

$$m x'' + c x' + k(x + x^3) = 0.$$

The equation is converted to a first-order system by substituting $x' = y$ so our system is

$$\frac{dx}{dt} = y, \qquad \frac{dy}{dt} = -\frac{k}{m}(x + x^3) - \frac{c}{m} y.$$

The kinetic and potential energies are given respectively by

$$KE = \frac{1}{2} m v^2 = \frac{1}{2} m y^2, \qquad PE = \int k(x + x^3) \, dx = k \left( \frac{x^2}{2} + \frac{x^4}{4} \right).$$

Hence the total energy should be a candidate for the Lyapunov function

$$V\begin{pmatrix} x \\ y \end{pmatrix} = k \left( \frac{x^2}{2} + \frac{x^4}{4} \right) + \frac{1}{2} m y^2. \tag{5.35}$$

By (5.34),

$$\frac{dV}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \left( \nabla V\begin{pmatrix} x \\ y \end{pmatrix} \right)^T f\begin{pmatrix} x \\ y \end{pmatrix} = -c y^2. \tag{5.36}$$

Equations (5.35) and (5.36) show that $V$ is positive definite and $dV/dt$ is negative semidefinite, so the system is stable. However, physical intuition tells us the damped system should be asymptotically stable. Unfortunately, for this choice of $V$ the derivative is zero along the $x$-axis, and we wanted

$$\frac{dV}{dt}\begin{pmatrix} x \\ y \end{pmatrix} < 0 \quad \text{if} \quad \begin{pmatrix} x \\ y \end{pmatrix} \ne \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

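After some fiddling, let

$$V\begin{pmatrix} x \\ y \end{pmatrix} = k \left( \frac{x^2}{2} + \frac{x^4}{4} \right) + \frac{1}{2} m y^2 + \beta \left( xy + \frac{c}{m} \frac{x^2}{2} \right).$$

Then, using (5.34) again,

$$\frac{dV}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = (-c + \beta) y^2 - \frac{\beta k}{m} (x^2 + x^4).$$

Thus, if we choose $0 < \beta < c$, then $dV/dt$ is negative definite. We want $V$ to be positive definite. Rewrite

$$V\begin{pmatrix} x \\ y \end{pmatrix} = \frac{k x^4}{4} + W\begin{pmatrix} x \\ y \end{pmatrix}.$$

Recognize

$$W\begin{pmatrix} x \\ y \end{pmatrix} = \left( \frac{k}{2} + \frac{\beta c}{2m} \right) x^2 + \beta x y + \frac{m}{2} y^2$$

as the quadratic form

$$W\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}^T \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix},$$

where $a_{11} = k/2 + \beta c/2m$, $a_{12} = a_{21} = \beta/2$, and $a_{22} = m/2$. Recall from Theorem 5.2 that if $A$ is symmetric and

$$a_{11} > 0 \quad \text{and the determinant} \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \tag{5.37}$$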

then $W$ is positive definite. It is easily verified that the choice $\beta = c/2$ guarantees that (5.37) is satisfied. Hence $W$ and therefore $V$ is positive definite. Because $0 < \beta < c$ we have guaranteed that $dV/dt$ is negative definite. In other words, the system is asymptotically stable at the origin. In this relatively simple example finding a Lyapunov function was not immediate. Before continuing we simplify matters by assuming $m = k = c = 1$ and $\beta = 1/2$. By Theorem 5.1 the quadratic form $W$ satisfies

$$\lambda_1 (x^2 + y^2) \le W \le \lambda_2 (x^2 + y^2),$$

where the eigenvalues of $A$ are

$$\lambda_1 = \frac{5 - \sqrt{5}}{8} \quad \text{and} \quad \lambda_2 = \frac{5 + \sqrt{5}}{8}.$$

Since $V = (x^4/4) + W$,

$$\frac{x^4}{4} + \lambda_1 (x^2 + y^2) \le V \le \frac{x^4}{4} + \lambda_2 (x^2 + y^2). \tag{5.38}$$

Now

$$\frac{dV}{dt} = -\frac{x^2 + y^2}{2} - \frac{x^4}{2}$$

so we may substitute for $x^2 + y^2$ to get

$$V \le \frac{x^4}{4} + \lambda_2 \left( -2 \frac{dV}{dt} - x^4 \right) \le -2 \lambda_2 \frac{dV}{dt},$$

which implies

$$V(t) \le V(0) e^{-t/2\lambda_2}.$$

We use (5.38) and observe

$$\lambda_1 x^2 \le \frac{x^4}{4} + \lambda_1 (x^2 + y^2) \le V,$$

hence

$$|x(t)| \le \sqrt{\frac{V(0)}{\lambda_1}} \, e^{-t/4\lambda_2}.$$

Similarly, $|y(t)|$ and therefore $\|(x, y)^T\|$ are bounded by decaying exponentials and the system is exponentially stable at the origin. The reader interested in pursuing Lyapunov theory further is invited to consult the references [11, 32, 35].
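The exponential bounds of Example 5.22 can be observed along a computed trajectory. The following sketch assumes $m = k = c = 1$ and $\beta = 1/2$ as in the example, and integrates the system with a classical fourth-order Runge–Kutta step; the integrator, step size, and initial state $(1, 0)$ are our own illustrative choices. Each returned sample contains $t$, $x$, $y$, and $V$, so that $V(t)$ and $|x(t)|$ can be compared with the bounds derived above.

```python
import math

LAM1 = (5 - math.sqrt(5)) / 8   # smaller eigenvalue of A
LAM2 = (5 + math.sqrt(5)) / 8   # larger eigenvalue of A


def f(state):
    """Right-hand side of the first-order system with m = k = c = 1."""
    x, y = state
    return (y, -(x + x ** 3) - y)


def V(state):
    """Modified Lyapunov function with beta = 1/2."""
    x, y = state
    return x ** 2 / 2 + x ** 4 / 4 + y ** 2 / 2 + 0.5 * (x * y + x ** 2 / 2)


def rk4_step(state, h):
    def shifted(k, scale):
        return (state[0] + scale * k[0], state[1] + scale * k[1])
    k1 = f(state)
    k2 = f(shifted(k1, h / 2))
    k3 = f(shifted(k2, h / 2))
    k4 = f(shifted(k3, h))
    return (state[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            state[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)


def simulate(x0, y0, t_end=10.0, h=0.01):
    """Return a list of samples (t, x, y, V) along the trajectory."""
    state = (x0, y0)
    samples = [(0.0, x0, y0, V(state))]
    for i in range(1, round(t_end / h) + 1):
        state = rk4_step(state, h)
        samples.append((i * h, state[0], state[1], V(state)))
    return samples


if __name__ == "__main__":
    samples = simulate(1.0, 0.0)
    v0 = samples[0][3]
    for t, x, y, v in samples[::200]:
        print(f"t={t:5.2f}  V={v:.3e}  bound={v0 * math.exp(-t / (2 * LAM2)):.3e}")
```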

When considering systems with nonzero input signals, other notions of stability must be employed. A crucial question is whether a bounded input signal will always give rise to a bounded output signal. If so, the system is said to have bounded-input, bounded-output (BIBO) stability. More precisely, a system is BIBO stable if there is a constant $I$ such that if the input $|u(t)| \le B$ for all $t$, then the output $|y(t)| \le BI$ for all $t$. One class of systems that covers many engineering situations is the linear, time-invariant (LTI) system, which can be modeled by an equation of the form $L[y] = u$ where operator $L$ is time-independent and linear, for instance, a linear constant-coefficient differential operator of the form

$$L = a_n \frac{d^n}{dt^n} + \cdots + a_1 \frac{d}{dt} + a_0.$$

An LTI system is relaxed if it has zero initial conditions. In this case, there exists a function $h(t)$ such that

$$y(t) = h(t) * u(t) = \int_0^\infty h(\tau) u(t - \tau) \, d\tau$$

for any input $u(t)$. Knowledge of the system's weighting function $h(t)$ is sufficient to determine the output corresponding to any given input. We take the system to be causal so that $h(t) \equiv 0$ whenever $t < 0$. For inputs bounded by $B_1$, i.e., $|u(t)| \le B_1$ for all $t$,

$$|y(t)| \le \int_0^\infty |h(\tau)| \, |u(t - \tau)| \, d\tau \le B_1 \int_0^\infty |h(\tau)| \, d\tau,$$

and thus a sufficient condition for BIBO stability is that $h(t)$ be absolutely integrable on $[0, \infty)$:

$$\int_0^\infty |h(t)| \, dt = B_2 < \infty.$$

Conversely, suppose the system has BIBO stability. In particular, if we choose $B = 1$ there exists a constant $M$ such that if the input is bounded by $B$, the output is bounded by $M$. We claim that

$$\int_0^\infty |h(t)| \, dt < \infty.$$

If not, choose $T$ so that

$$\int_0^T |h(t)| \, dt > M.$$

Define the bounded input

$$u(t) = \begin{cases} \dfrac{h(T - t)}{|h(T - t)|}, & 0 \le t \le T, \; h(T - t) \ne 0, \\[2mm] 0, & \text{otherwise}. \end{cases}$$

Then

$$y(T) = \int_0^T h(\tau) u(T - \tau) \, d\tau = \int_0^T |h(\tau)| \, d\tau > M,$$

which is a contradiction.

An alternative system description often employed for relaxed LTI systems is the transfer function $H(s)$, defined as the Laplace transform of $h(t)$:

$$H(s) = \int_0^\infty h(t) e^{-st} \, dt.$$

Again, this function should contain all necessary information about the internal structure of the system. As a function of the complex variable $s$, however, its interesting properties involve its singularities in the complex plane. Of these, the poles (the values of $s$ for which $|H(s)| \to \infty$) are of greatest importance. To see why, let $s_0$ be a point located either on the imaginary axis or in the right-half of the $s$-plane so that $\operatorname{Re}[s_0] \ge 0$. Then for any $t \ge 0$, $|e^{-s_0 t}| \le 1$ and

$$|H(s_0)| \le \int_0^\infty |h(t)| \, dt = B_2$$

if the system is BIBO stable. Hence stability implies that all the poles of $H(s)$ must lie in the left-half of the $s$-plane, a result familiar to electrical and mechanical engineers.

Like stability, the subject of control is huge. We give one example.

Example 5.23. The differential equation

$$\frac{d\omega(t)}{dt} + a \omega(t) = K v(t)$$

can model the angular shaft speed $\omega(t)$ of a fixed-field, armature-controlled dc motor. Here the forcing function $v(t)$ is the voltage applied to the armature, while $a$ and $K$ are positive constants that incorporate necessary data on the resistance of the motor windings, rotational inertia of the shaft and its load, frictional effects, back emf, etc. Application of Laplace transform methods, with zero initial conditions on the shaft speed, yields the corresponding weighting function $h(t) = K e^{-at}$ for $t > 0$, and hence by convolution with the input

$$\omega(t) = \int_0^t v(\lambda) h(t - \lambda) \, d\lambda = \int_0^t v(\lambda) K e^{-a(t - \lambda)} \, d\lambda$$

for all $t > 0$. A simple motor control question is this: What input $v(t)$ should be applied in order to bring the shaft from rest to some given speed $\omega_d$, in time $T$, while keeping the input energy integral

$$E_v = \int_0^T |v(t)|^2 \, dt$$

a minimum? By the Cauchy–Schwarz inequality,

$$\omega_d^2 \le E_v \int_0^T K^2 e^{-2a(T - \lambda)} \, d\lambda$$

and hence the energy required for the task satisfies

$$E_v \ge \frac{2a \omega_d^2}{K^2 (1 - e^{-2aT})}.$$

Equality is attained when

$$v(\lambda) = K_p e^{-a(T - \lambda)},$$

where the proportionality constant $K_p$ is determined by setting

$$\omega_d = \int_0^T K_p e^{-a(T - \lambda)} K e^{-a(T - \lambda)} \, d\lambda = \frac{K_p K}{2a} (1 - e^{-2aT}).$$

Hence the optimal driving voltage waveform is given for $t > 0$ by

$$v(t) = \frac{2a \omega_d}{K (1 - e^{-2aT})} e^{-a(T - t)}.$$

Putting this back into the convolution integral, we write the resulting shaft speed as

$$\omega(t) = \omega_d \left( \frac{e^{at} - e^{-at}}{e^{aT} - e^{-aT}} \right) = \omega_d \frac{\sinh at}{\sinh aT}.$$
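The optimality claim of Example 5.23 can be checked by direct numerical integration. In the sketch below the parameter values ($a = 2$, $K = 0.5$, $T = 1.5$, $\omega_d = 10$) are hypothetical, as are the function names and the use of a composite Simpson rule. We compare the optimal exponential waveform with a constant voltage that reaches the same final speed; the latter should require more energy than the Cauchy–Schwarz lower bound.

```python
import math


def simpson(g, lo, hi, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def optimal_voltage(t, a, K, T, omega_d):
    """Optimal driving waveform derived in Example 5.23."""
    return 2 * a * omega_d * math.exp(-a * (T - t)) / (K * (1 - math.exp(-2 * a * T)))


def shaft_speed(v, t, a, K):
    """Convolution of the input v with the weighting function K exp(-a t)."""
    return simpson(lambda lam: v(lam) * K * math.exp(-a * (t - lam)), 0.0, t)


def input_energy(v, T):
    return simpson(lambda t: v(t) ** 2, 0.0, T)


def energy_lower_bound(a, K, T, omega_d):
    return 2 * a * omega_d ** 2 / (K ** 2 * (1 - math.exp(-2 * a * T)))


if __name__ == "__main__":
    a, K, T, omega_d = 2.0, 0.5, 1.5, 10.0        # hypothetical motor data
    v_opt = lambda t: optimal_voltage(t, a, K, T, omega_d)
    v0 = a * omega_d / (K * (1 - math.exp(-a * T)))  # constant input reaching omega_d
    v_const = lambda t: v0
    print("lower bound     :", energy_lower_bound(a, K, T, omega_d))
    print("optimal input   :", input_energy(v_opt, T), shaft_speed(v_opt, T, a, K))
    print("constant input  :", input_energy(v_const, T), shaft_speed(v_const, T, a, K))
```

Note also that the weighting function $h(t) = K e^{-at}$ is absolutely integrable, with $\int_0^\infty |h(t)| \, dt = K/a$, so this motor model is BIBO stable in the sense discussed earlier in the section.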

5.13 Some Inequalities of Probability

Take a random variable $X \ge 0$. Then for any $t > 0$, the probability of the event $X \ge t$ satisfies

$$P(X \ge t) \le \frac{\mu_X}{t}, \tag{5.39}$$

where $\mu_X$ is the mean or expected value of $X$. This is the Markov inequality. To illustrate how it is developed, let us consider the case where $X$ is a discrete random variable having frequency function $f_X(x) = P(X = x) \ge 0$ (recall that probabilities are always nonnegative). We have

$$\mu_X = \sum_{x \ge 0} x f_X(x) = \sum_{0 \le x < t} x f_X(x) + \sum_{x \ge t} x f_X(x),$$

so that

$$\mu_X \ge \sum_{x \ge t} x f_X(x) \ge t \sum_{x \ge t} f_X(x) = t P(X \ge t),$$

and (5.39) follows. The development for a continuous random variable is similar.

The inequality

$$P(|X - \mu_X| \ge t) \le \frac{\sigma_X^2}{t^2}, \tag{5.40}$$

where $\sigma_X^2$ is the variance of $X$, is the Chebyshev inequality. To derive it from the Markov inequality, we start from the obvious fact that

$$(X - \mu_X)^2 \ge 0.$$

Then, for every $t > 0$,

$$P\left( (X - \mu_X)^2 \ge t^2 \right) \le \frac{E[(X - \mu_X)^2]}{t^2} = \frac{\sigma_X^2}{t^2}.$$

But $(X - \mu_X)^2 \ge t^2$ if and only if $|X - \mu_X| \ge t$, and (5.40) follows.


Example 5.24. The special case

$$P(|X - \mu_X| \ge n\sigma_X) \le 1/n^2$$

immediately leads to the interpretation that a random variable $X$ is likely to fall close to its mean.

Example 5.25. Binomially distributed random variables have mean $np$ and variance $np(1-p)$, where $n$ is the number of trials of the experiment and $p$ is the probability of “success” in each trial. Then $t = n\beta$ gives

$$P(|X - np| < n\beta) \ge 1 - \frac{p(1-p)}{n\beta^2}.$$

For instance, suppose a population contains an unknown proportion $p$ of defective objects. Let $X$ be the number of defectives in a sample of size $N$. Then for every $\beta > 0$,

$$P\left(\left|\frac{X}{N} - p\right| < \beta\right) \ge 1 - \frac{p(1-p)}{N\beta^2}.$$

Now $\max[p(1-p)] = 0.25$; hence for fixed $\beta$, $N$, the minimum probability that the observed proportion of defectives in the sample differs from the actual proportion $p$ by an amount less than $\beta$ is $1 - 0.25/(N\beta^2)$. Hence, to ensure that this probability meets or exceeds some value $P$, we need $N \ge 0.25/[(1-P)\beta^2]$.
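The sample-size rule at the end of Example 5.25 can be evaluated directly. This is a small sketch; the function name `chebyshev_sample_size` and the numerical inputs are illustrative, and exact rational arithmetic is used only to avoid rounding surprises when taking the ceiling.

```python
from fractions import Fraction
from math import ceil

def chebyshev_sample_size(beta, confidence):
    """Smallest integer N with N >= 0.25 / ((1 - P) * beta^2).

    beta and confidence are given as decimal strings (e.g. "0.05") so that
    they convert to exact fractions.
    """
    beta = Fraction(beta)
    confidence = Fraction(confidence)
    return ceil(Fraction(1, 4) / ((1 - confidence) * beta ** 2))

# Proportion within 0.05 of the truth with probability at least 0.95.
print(chebyshev_sample_size("0.05", "0.95"))
# A looser requirement: within 0.1 with probability at least 0.9.
print(chebyshev_sample_size("0.1", "0.9"))
```

Because the Chebyshev bound holds for every distribution, the sample sizes it prescribes are larger than those from distribution-specific estimates.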

The Chebyshev inequality is too conservative for some applications. A better estimate on the tail of a probability distribution is often reached through the more specialized Chernoff bound (see Lafrance [36] for a nice development).

Many important continuous random variables are normally distributed. Recall that the standard normal density has

$$f_X(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$$

for $-\infty < x < \infty$. It is often convenient to work with the related coerror function

$$Q(x) = P[X > x] = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\,dt$$

for which repeated integration by parts yields the asymptotic expansion [1]

$$Q(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}\left[1 - \frac{1}{x^2} + \frac{1\cdot 3}{x^4} + \cdots + (-1)^n\,\frac{1\cdot 3\cdot 5\cdots(2n-1)}{x^{2n}}\right] + (-1)^{n+1}\,\frac{1\cdot 3\cdot 5\cdots(2n+1)}{\sqrt{2\pi}} \int_x^\infty \frac{e^{-t^2/2}}{t^{2n+2}}\,dt$$

whenever $x > 0$. The inequalities

$$\frac{1}{\sqrt{2\pi}\,x}\left(1 - \frac{1}{x^2}\right)e^{-x^2/2} < Q(x) < \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2}$$

are thus apparent. Tighter bounds on $Q(x)$ are also available (see [10] and Exercise 5.17). This function is of interest in communication engineering, where it is often assumed that the system noise is Gaussian. We touch on some other aspects of communication systems in the next section.

[FIGURE 5.5. Pre-decision filter. Block labels: input $g_i(t) + n_i(t)$, filter $H(f)$, output $g_o(t) + n_o(t)$.]
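The two-sided bound on $Q(x)$ can be compared with values computed from the standard library. The sketch below assumes the identity $Q(x) = \tfrac12\,\mathrm{erfc}(x/\sqrt{2})$, which follows from the definitions by a change of variable; the sample points are arbitrary.

```python
import math

def q_function(x):
    """Coerror function Q(x) = P[X > x] for a standard normal X."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_bounds(x):
    """Lower and upper bounds on Q(x) from the first terms of the expansion (x > 0)."""
    upper = math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)
    lower = (1.0 - 1.0 / (x * x)) * upper
    return lower, upper

for x in (0.5, 1.0, 2.0, 3.0, 5.0):
    lower, upper = q_bounds(x)
    print(x, lower, q_function(x), upper)
```

For $x \le 1$ the lower bound is nonpositive and therefore trivial; the bounds tighten rapidly as $x$ grows.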

5.14 Applications in Communication Systems

The field of communications strives toward the accurate reception of information signals in the presence of noise. Noise phenomena can be typed according to their physical mode of production, or simply according to the shape of their power spectra. For instance, much noise is produced by electrons moving randomly in conductors; this noise, called thermal noise, is approximately white because power in the associated waveform tends to be distributed evenly across the frequency spectrum.

In binary communication, time is divided into successive bit intervals, of length $T$ seconds say, and during each interval at the receiver a known deterministic signal $g_i(t)$ is either present or absent (corresponding to binary 1 or binary 0). Of course, noise $n_i(t)$ is present in either case (subscript $i$ denotes waveforms at the receiver input, as in Figure 5.5). Since the form of $g_i(t)$ is known in advance, the receiver’s function is simply to make a presence/absence decision during each bit interval. The decision process is, of course, complicated by $n_i(t)$, which in many cases enters in additive fashion so that the received waveform is $g_i(t) + n_i(t)$. An error occurs if the receiver decides $g_i(t)$ is present when it was never transmitted, or vice versa. It can be shown that the probability of such an error is minimized if the decision is based on a sample taken from the received waveform at a time instant when the signal-to-noise power ratio $S/N$ is maximum. Hence we become interested in a system block that can enhance signal power at some instant of time while simultaneously reducing average noise power. Such a device, known as a matched filter, can be found as follows.

We assume additive white noise with power spectral density $N_0/2$ (watts per Hz), and seek an expression for the Fourier-domain transfer function $H(f)$. The signal output from the filter is given by Fourier inversion as

$$g_o(t) = \mathcal{F}^{-1}[G_o(f)] = \int_{-\infty}^{\infty} G_i(f)\,H(f)\,e^{j2\pi ft}\,df,$$

so that at the sample time $t = T$ its normalized power is

$$S = g_o^2(T) = \left|\int_{-\infty}^{\infty} G_i(f)\,H(f)\,e^{j2\pi fT}\,df\right|^2.$$

The power spectral density of the output noise is given by $(N_0/2)|H(f)|^2$ because the power gain of the linear system $H(f)$ at frequency $f$ is $|H(f)|^2$; hence the normalized output average noise power is

$$N = \int_{-\infty}^{\infty} (N_0/2)\,|H(f)|^2\,df$$

and the signal-to-noise ratio is

$$\frac{S}{N} = \frac{\left|\int_{-\infty}^{\infty} G_i(f)\,H(f)\,e^{j2\pi fT}\,df\right|^2}{\int_{-\infty}^{\infty} (N_0/2)\,|H(f)|^2\,df}.$$

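The integral in the numerator can be expressed in inner-product form (see Example 4.5), where the overbar denotes complex conjugation:

$$\int_{-\infty}^{\infty} G_i(f)\,H(f)\,e^{j2\pi fT}\,df = \int_{-\infty}^{\infty} H(f)\,\overline{\overline{G_i(f)}\,e^{-j2\pi fT}}\,df.$$

But by the Cauchy–Schwarz inequality (see Example 4.6),

$$\frac{S}{N} \le \frac{\int_{-\infty}^{\infty} |G_i(f)|^2\,df \int_{-\infty}^{\infty} |H(f)|^2\,df}{(N_0/2)\int_{-\infty}^{\infty} |H(f)|^2\,df}.$$

Equality is attained when

$$H(f) \propto \overline{G_i(f)}\,e^{-j2\pi fT}$$

leading, for real $g_i(t)$, to the choice

$$h(t) = \mathcal{F}^{-1}\big[\overline{G_i(f)}\,e^{-j2\pi fT}\big] = g_i(T - t)$$

for the matched filter. For more details on the matched filter, along with derivation of the optimal error rate expressions in terms of the coerror function $Q(x)$, the reader is referred to Couch [15].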
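A discrete-time analogue makes the optimality of the matched filter easy to see. The sketch below is an illustration under stated assumptions, not the continuous-time derivation of the text: the signal is a short sample sequence `g`, the noise is white with variance `noise_var` per sample, and the filter is an FIR filter with taps `h`. By the Cauchy–Schwarz inequality for sums, the output signal-to-noise ratio at the last sample cannot exceed $\sum g_k^2/\sigma^2$, and the time-reversed signal attains that value.

```python
import random

def output_snr(h, g, noise_var):
    """SNR at sample index n-1 when g (length n) plus white noise of variance
    noise_var per sample is passed through an FIR filter with taps h (length n)."""
    n = len(g)
    signal_sample = sum(h[k] * g[n - 1 - k] for k in range(n))
    noise_power = noise_var * sum(c * c for c in h)
    return signal_sample ** 2 / noise_power

g = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.5, 0.0]   # assumed example pulse
noise_var = 0.5                                  # assumed noise variance

# Cauchy-Schwarz upper bound on the output SNR.
bound = sum(x * x for x in g) / noise_var

# Matched filter: taps are the time-reversed signal, h[k] = g[n-1-k].
matched = g[::-1]
snr_matched = output_snr(matched, g, noise_var)

# Randomly chosen filters never beat the bound.
rng = random.Random(0)
snr_random = [output_snr([rng.uniform(-1.0, 1.0) for _ in g], g, noise_var)
              for _ in range(1000)]

print(bound, snr_matched, max(snr_random))
```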

The study of communications naturally involves some aspects of the broadly based mathematical discipline known as information theory, a subject replete with inequalities beginning at its most elementary level. To formally define such a nebulous concept as “information” must have been a substantial challenge to Claude Shannon and the other early workers in this area. To help make the idea precise, we can imagine a hypothetical machine called a discrete memoryless source (DMS). The DMS has an alphabet $\zeta$, which is just a discrete set of symbols $\zeta = \{S_1, \ldots, S_N\}$, and it periodically emits one of these symbols $S$ as a message to the outside world (Figure 5.6). The DMS is random in its methods of symbol selection, with assigned symbol probabilities $P(S = S_n) = p_n$ (with $\sum p_n = 1$), that are time independent and such that successive symbols are statistically independent (hence the term memoryless). By definition, the self-information associated with each source symbol is given by

$$I_n = \log_b\left(\frac{1}{p_n}\right) \qquad (n = 1, \ldots, N).$$

[FIGURE 5.6. An information source. Block labels: alphabet $\zeta = \{S_1, \ldots, S_N\}$, emitted symbol $S$.]

This definition passes some key intuitive tests. Because $0 \le p_n \le 1$, we have $I_n \ge 0$, and need not worry about the possibility of getting “negative information” from the source. As $p_n \to 1$, $I_n \to 0$; that is, a symbol that occurs with such stubborn repetitiveness as to be deterministic would never surprise an observer and should carry no information. We have $I_n > I_m$ whenever $p_n < p_m$; the monotonic behavior of the logarithm means that unlikely symbols carry more information than likely ones. Finally, the joint probability $P(S_n \text{ and } S_m) = p_n p_m$ for two successive independent messages, leading to a joint information quantity satisfying

$$I_{nm} = \log_b\left(\frac{1}{p_n p_m}\right) = \log_b\left(\frac{1}{p_n}\right) + \log_b\left(\frac{1}{p_m}\right) = I_n + I_m.$$

The logarithmic base $b$ is arbitrary, but does determine the unit of information. Commonly $b = 2$ is employed, giving information measured in bits. Because the self-information $I$ is a random variable taking possible values $I_1, \ldots, I_N$, we can compute its expected value as

$$H(\zeta) = \sum_{n=1}^{N} I_n p_n = \sum_{n=1}^{N} p_n \log_2\left(\frac{1}{p_n}\right).$$

This quantity, the average information per symbol, is the entropy of the DMS. Of special interest are bounds on $H(\zeta)$. Certainly

$$H(\zeta) \ge 0$$

and equality holds if and only if all the $p_n$ except one vanish (again the case of no uncertainty, no information). For an upper bound we may convert to the natural log via the formula $\log_2 x = K \ln x$ (the value of the positive constant $K = 1/\ln 2$ is immaterial) and use

$$\ln x \le x - 1$$

(Exercise 2.2), which is regarded as a fundamental inequality of information theory. We have

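$$\log_2 N = \log_2 N \sum_{n=1}^{N} p_n = \sum_{n=1}^{N} p_n \log_2 N$$

so that

$$H(\zeta) - \log_2 N = \sum_{n=1}^{N} p_n\left[\log_2\left(\frac{1}{p_n}\right) - \log_2 N\right] = \sum_{n=1}^{N} p_n \log_2\left(\frac{1}{Np_n}\right) = K\sum_{n=1}^{N} p_n \ln\left(\frac{1}{Np_n}\right) \le K\sum_{n=1}^{N} p_n\left(\frac{1}{Np_n} - 1\right).$$

The last quantity vanishes, hence

$$H(\zeta) \le \log_2 N.$$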

This upper bound is attained if and only if $1/(Np_n) = 1$ for all $n$, that is, when $p_n = 1/N$ for all $n$. The entropy of a DMS is therefore greatest when its output is least predictable on average, i.e., when all the message outputs are equally probable.

As usual, we could only touch on a few preliminary aspects of this fascinating subject here. The interested reader can consult the references [9, 36, 54] for more.
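The entropy bounds $0 \le H(\zeta) \le \log_2 N$ can be illustrated with a few lines of code. This is a minimal sketch; the function name `entropy_bits` and the example distributions are chosen here for illustration, and symbols with $p_n = 0$ are skipped under the usual convention that they contribute nothing.

```python
import math

def entropy_bits(probs):
    """Entropy H = sum p_n log2(1/p_n) in bits; zero-probability symbols contribute 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0.0)

uniform = [0.25, 0.25, 0.25, 0.25]       # least predictable source with N = 4
skewed = [0.5, 0.25, 0.125, 0.125]       # some symbols more likely than others
deterministic = [1.0, 0.0, 0.0, 0.0]     # no uncertainty at all

for probs in (uniform, skewed, deterministic):
    print(probs, entropy_bits(probs), math.log2(len(probs)))
```

The uniform source attains the upper bound of 2 bits, the deterministic source attains the lower bound of 0, and the skewed source falls in between.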

5.15 Existence of Solutions

Theorem 4.3, the contraction mapping theorem, is a powerful result that allows us to prove the existence of unique solutions to many equations of practical importance. In addition, the proofs yield practical methods such as Neumann series and Picard iteration for solving certain integral equations and differential equations, and Newton’s method for solving systems of nonlinear equations.

Integral equations, where the unknown is a function appearing underneath an integral sign, arise naturally in areas like mechanics, electromagnetics, control, and population dynamics. For example, the equation

$$\psi(x) = g(x) + \lambda\int_a^b K(x,t)\,\psi(t)\,dt \qquad (a \le x \le b), \tag{5.41}$$

where $\psi(x)$ is unknown, is called a Fredholm integral equation of the second kind. We assume that $g(x) \in C[a,b]$, and that the kernel $K(x,t)$ is continuous for both $a \le x \le b$ and $a \le t \le b$. We seek a condition under which the integral operator

$$F(\psi)(x) = g(x) + \lambda\int_a^b K(x,t)\,\psi(t)\,dt$$

is a contraction on $C[a,b]$. Now since $K(x,t)$ is continuous on a closed and bounded domain, we know that $K(x,t)$ is bounded (by $B$, say). Let $u(x)$ and $v(x)$ be arbitrary members of $C[a,b]$. Then

\begin{align*}
d(F(u), F(v)) &= \max_{x\in[a,b]} \left|\lambda\int_a^b K(x,t)[u(t) - v(t)]\,dt\right| \\
&\le \max_{x\in[a,b]} |\lambda| \int_a^b |K(x,t)|\,|u(t) - v(t)|\,dt \\
&\le B|\lambda| \max_{x\in[a,b]} \int_a^b |u(t) - v(t)|\,dt \\
&\le B|\lambda|(b-a) \max_{x\in[a,b]} |u(x) - v(x)| \\
&= B|\lambda|(b-a)\,d(u, v).
\end{align*}

For $F$ to be a contraction on $C[a,b]$ then, we require that

$$|\lambda| < \frac{1}{B(b-a)}.$$

Provided this condition is satisfied, we may iterate to solve (5.41) for $\psi(x)$ on $[a,b]$. Starting with an initial guess of $\psi^{(0)}(x) = g(x)$, the first iteration yields

$$\psi^{(1)}(x) = g(x) + \lambda\int_a^b K(x,t)\,\psi^{(0)}(t)\,dt = g + \lambda\Gamma[g],$$

where for notational convenience we have defined $\Gamma$ as the integral operator

$$\Gamma[\psi] = \int_a^b K(x,t)\,\psi(t)\,dt.$$

The second iteration is then

$$\psi^{(2)} = g + \lambda\Gamma[g] + \lambda^2\Gamma^2[g] = g + \sum_{i=1}^{2} \lambda^i\,\Gamma^i[g]$$

and, in general,

$$\psi^{(n)} = g + \sum_{i=1}^{n} \lambda^i\,\Gamma^i[g].$$

By Theorem 4.3, we can express the solution of (5.41) as

$$\psi = \lim_{n\to\infty} \psi^{(n)} = g + \sum_{i=1}^{\infty} \lambda^i\,\Gamma^i[g].$$

This is called the Neumann series for the integral equation.

Example 5.26. A specific instance of (5.41) is furnished by

$$\psi(x) = g(x) + \lambda\int_0^1 e^{x-t}\,\psi(t)\,dt \qquad (0 \le x \le 1). \tag{5.42}$$

In this case it is easily verified that $\Gamma^n[g] = \kappa e^x$ for $n = 1, 2, 3, \ldots$, where

$$\kappa = \int_0^1 e^{-t}\,g(t)\,dt.$$

Hence

$$\psi(x) = g(x) + \sum_{i=1}^{\infty} \lambda^i \kappa e^x = g(x) + \kappa e^x\,\frac{\lambda}{1-\lambda} \tag{5.43}$$

for $0 \le x \le 1$, provided that $|\lambda| < 1$. The validity of (5.43) as a solution to (5.42) is easily verified by direct substitution.
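The successive approximations for Example 5.26 can be carried out numerically and compared with the closed form (5.43). The sketch below rests on several assumptions made here for illustration: the forcing term is $g(x) = x$ (so that $\kappa = 1 - 2/e$), the integral is replaced by the trapezoidal rule on a uniform grid, and $\lambda = 0.3$, which satisfies the sufficient condition $|\lambda| < 1/[B(b-a)] = 1/e$ for this kernel.

```python
import math

def solve_fredholm(g, lam, n_grid=201, n_iter=60):
    """Iterate psi <- g + lam * Gamma[psi] for the kernel K(x, t) = e^{x-t} on [0, 1]."""
    h = 1.0 / (n_grid - 1)
    xs = [i * h for i in range(n_grid)]
    weights = [h] * n_grid
    weights[0] = weights[-1] = h / 2.0          # trapezoidal rule weights
    g_vals = [g(x) for x in xs]
    psi = g_vals[:]                              # initial guess psi^(0) = g
    for _ in range(n_iter):
        # The kernel is separable, so the integral is e^x times a single sum.
        s = sum(w * math.exp(-t) * p for w, t, p in zip(weights, xs, psi))
        psi = [gx + lam * math.exp(x) * s for gx, x in zip(g_vals, xs)]
    return xs, psi

lam = 0.3
xs, psi = solve_fredholm(lambda x: x, lam)

# Closed form (5.43) with g(x) = x, for which kappa = 1 - 2/e.
kappa = 1.0 - 2.0 / math.e
exact = [x + kappa * math.exp(x) * lam / (1.0 - lam) for x in xs]
max_error = max(abs(p - e) for p, e in zip(psi, exact))
print(max_error)
```

The remaining discrepancy comes from the quadrature rule rather than from the iteration, which converges geometrically with ratio $\lambda$.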

The reader interested in pursuing integral equations further could refer to Jerri [31]. Another application of Theorem 4.3 is to the proof of existence of solutions to differential equations.

Theorem 5.4 (Picard–Lindelöf Theorem). Suppose we are given the differential equation

$$\frac{dy}{dx} = f(x, y) \quad \text{with initial condition } y(x_0) = y_0. \tag{5.44}$$

We may assume $x_0 = y_0 = 0$ (by taking appropriate translations if necessary). Suppose $f$ is continuous in a rectangle $D = \{(x, y) \mid |x| \le a,\ |y| \le b\}$ where $a$ and $b$ are positive constants. Also suppose that $f$ satisfies a Lipschitz condition in $y$ in $D$, namely that there exists a positive constant $k$ such that

$$|f(x, y_1) - f(x, y_2)| \le k|y_1 - y_2| \quad \text{for all } (x, y_1) \text{ and } (x, y_2) \in D. \tag{5.45}$$

Then there exists a constant $\alpha > 0$ so that in the interval $I = \{x \mid |x| \le \alpha\}$ there exists a unique solution $y = \varphi(x)$ to the initial-value problem (5.44).

Proof. Before giving the proof we give some motivation. Suppose we already knew $\varphi(x)$ exists. Then

$$\varphi(x) = \int_0^x f(t, \varphi(t))\,dt \quad \text{for } x \in I \tag{5.46}$$

by Theorem 2.5. Let $M$ be the space of continuous functions on the closed interval $I$. $M$ is a complete metric space with metric

$$d(f, g) = \max_{x\in I} |f(x) - g(x)|.$$

Define $F$ from $M$ to itself as follows: if $\psi \in M$, define $F(\psi)$ by

$$F(\psi)(x) = \int_0^x f(t, \psi(t))\,dt \qquad (x \in I).$$

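So if $\varphi(x)$ exists, then it is a fixed point of $F$; i.e., $F(\varphi) = \varphi$. By Theorem 4.3 $F$ will have a unique fixed point if it is a contraction. Now let $\varphi_1, \varphi_2 \in M$. Suppose in addition that

$$|\varphi_1(t)| \le b \text{ and } |\varphi_2(t)| \le b \text{ for all } t \in I. \tag{5.47}$$

Then

\begin{align}
d(F(\varphi_1), F(\varphi_2)) &= \max_{x\in I} |F(\varphi_1)(x) - F(\varphi_2)(x)| \notag \\
&= \max_{x\in I} \left|\int_0^x f(t, \varphi_1(t))\,dt - \int_0^x f(t, \varphi_2(t))\,dt\right| \notag \\
&\le \max_{t,x\in I} |f(t, \varphi_1(t)) - f(t, \varphi_2(t))|\,|x - 0| \tag{5.48} \\
&\le k \max_{t\in I} |\varphi_1(t) - \varphi_2(t)|\,\alpha \tag{5.49} \\
&= \alpha k\,d(\varphi_1, \varphi_2). \tag{5.50}
\end{align}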

We will want to choose $\alpha$ so that $\alpha k < 1$ in order that $F$ be a contraction. Also we will want for any $\varphi_1$ and $\varphi_2 \in M$ that (5.47) be satisfied so that (5.45) will allow us to deduce (5.49) from (5.48). We are now ready to prove the theorem. Since $f(x, y)$ is continuous on $D$ there is a positive constant $Q$ such that $|f(x, y)| \le Q$ for all $(x, y) \in D$. Choose $\alpha$ sufficiently small so that $\alpha k = \lambda < 1$ and $\alpha < b/Q$. We now define $I = \{x \mid |x| \le \alpha\}$ and modify our original definition of $M$ so that

$$M = \{\varphi \mid \varphi \in C(I) \text{ and } |\varphi(t)| \le b \text{ for all } t \in I\}.$$

$M$ is a complete metric space. To see that $F$ maps $M$ into $M$, note that for $\varphi \in M$ and $x \in I$,

$$|F(\varphi)(x)| = \left|\int_0^x f(t, \varphi(t))\,dt\right| \le \alpha Q < b.$$

The derivation of the inequality (5.50) is now valid. Since $\alpha k < 1$, $F$ is a contraction on $M$ and therefore has a fixed point $\varphi$. We may choose $\varphi_0(x)$ as the (constant) zero function, and for all $i$ perform Picard iteration

$$\varphi_{i+1}(x) = F(\varphi_i)(x) = \int_0^x f(t, \varphi_i(t))\,dt \quad \text{for all } x \in I.$$

Then $\varphi_i(x) \to \varphi(x)$ as $i \to \infty$.

With one minor difference, we have given the same proof twice. In finding a solution to (5.41), $F$ is a contraction when $\lambda$ is sufficiently small; in finding a solution to (5.46), $F$ is a contraction when the interval of integration from 0 to $x$ is sufficiently small.
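Picard iteration can be carried out exactly when $f$ is a polynomial in $y$ and the iterates are polynomials in $x$. The sketch below uses an example chosen here for illustration, $y' = 1 + y$ with $y(0) = 0$, whose solution is $e^x - 1$; the iterates are stored as lists of rational coefficients (constant term first), and each step reproduces one more term of the Taylor series.

```python
import math
from fractions import Fraction

def picard_step(coeffs):
    """Return coefficients of the integral from 0 to x of (1 + phi_i(t)) dt."""
    integrand = list(coeffs)
    integrand[0] += 1                               # f(t, y) = 1 + y
    # Integrate term by term: c * t^k  ->  c * x^(k+1) / (k+1).
    return [Fraction(0)] + [c / Fraction(k + 1) for k, c in enumerate(integrand)]

def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(float(c) * x ** k for k, c in enumerate(coeffs))

phi = [Fraction(0)]                                 # phi_0 is the zero function
iterates = [phi]
for _ in range(10):
    phi = picard_step(phi)
    iterates.append(phi)

print(iterates[5])                                  # partial sum of e^x - 1
print(evaluate(iterates[10], 0.5), math.exp(0.5) - 1.0)
```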

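Recall from our discussion of the mean value theorem for derivatives that if $f : \mathbb{R} \to \mathbb{R}$ is $C^1$ and $x, x + \Delta x \in (a, b)$, then

$$f(x + \Delta x) = f(x) + f'(\eta)\,\Delta x \tag{5.51}$$

for some $\eta$ between $x$ and $x + \Delta x$. However, if $f : \mathbb{R}^n \to \mathbb{R}^n$ is $C^1$ so that

$$f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_n(x) \end{pmatrix}, \qquad f'(x) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_n}{\partial x_1} & \cdots & \dfrac{\partial f_n}{\partial x_n} \end{pmatrix},$$

then it is not true in general that given $x, \Delta x \in \mathbb{R}^n$ there exists $\eta$ between (in the line segment joining) $x$ and $x + \Delta x$ such that (5.51) holds. An example [17] is

$$f : \mathbb{R}^2 \to \mathbb{R}^2, \qquad f\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} e^{x_1} - x_2 \\ x_1^2 - 2x_2 \end{pmatrix}.$$

Our concern is as follows. If $f : [a, b] \to [a, b]$ satisfies $|f'(x)| \le \lambda < 1$ on $[a, b]$, then $f$ is a contraction on $[a, b]$ by Corollary 2.6.1:

$$|f(x) - f(y)| = |f'(\eta)(x - y)| \le \lambda|x - y|$$

for some $\eta$. We cannot extend this argument directly to $f : \mathbb{R}^n \to \mathbb{R}^n$. However

$$f(x + \Delta x) - f(x) = \int_x^{x+\Delta x} f'(z)\,dz = \int_0^1 f'(x + t\Delta x)\,\Delta x\,dt$$

by componentwise application of Theorem 2.5. This implies that

$$\|f(x + \Delta x) - f(x)\| \le M\|\Delta x\| \quad \text{where } M = \max_{x\in U} \|f'(x)\|,$$

where $U$ is a closed neighborhood containing $x$ and $x + \Delta x$. Using this, we can prove the following important theorem: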

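Theorem 5.5 (Implicit Function Theorem). Let $F \in C^1(U \times V, W)$ where $U, V, W$ are open subsets of $\mathbb{R}^n, \mathbb{R}^m, \mathbb{R}^n$, respectively. Let $(x_0, y_0) \in U \times V$ with $F(x_0, y_0) = 0$ and

$$D_xF(x_0, y_0) = \begin{pmatrix} \dfrac{\partial F_1}{\partial x_1} & \cdots & \dfrac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F_n}{\partial x_1} & \cdots & \dfrac{\partial F_n}{\partial x_n} \end{pmatrix}(x_0, y_0)$$

be nonsingular. Then there exists a neighborhood $U_1 \times V_1 \subset U \times V$ and a function $f : V_1 \to U_1$, $f \in C^1$, where $f(y_0) = x_0$ such that

$$F(x, y) = 0 \text{ for } (x, y) \in U_1 \times V_1 \iff x = f(y).$$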

Proof. The basic idea is that if we have more unknowns than equations, we may choose and rename surplus variables as $y_1, \ldots, y_m$. Then, holding these variables constant, we may solve the set of equations in suitable neighborhoods for $x_1, \ldots, x_n$ by Newton’s method. Staying nearby, as the values of $y_1, \ldots, y_m$ vary, so will the values of $x_1, \ldots, x_n$. Usually we do not know an explicit formula, but $x_1, \ldots, x_n$ are determined implicitly by $y_1, \ldots, y_m$. In the above, we use the standard notation

$$U \times V = \{(x, y) \mid x \in U \text{ and } y \in V\}.$$

We now give the proof. Since $D_xF(x_0, y_0)$ is nonsingular, there is a neighborhood of $(x_0, y_0)$ in which $D_xF$ is nonsingular (the determinant is a continuous function that is nonzero at a point, and hence in a neighborhood). By choosing $U$ and $V$ smaller if necessary, $(D_xF(x, y))^{-1}$ is defined on $U \times V$. Define $G : U \times V \to \mathbb{R}^n$ by $G(x, y) = x - (D_xF(x, y))^{-1}F(x, y)$. Then $G(x_0, y_0) = x_0$ and $D_xG(x_0, y_0) = I - I = 0$, where $I$ is the identity matrix. Since $G(x_0, y_0) = x_0$ and $D_xG(x_0, y_0) = 0$, and since $G \in C^1$, there exists a neighborhood $U_1 \times V_1$ of $(x_0, y_0)$ with $\|D_xG(x, y)\| \le \alpha < 1$ in $U_1 \times V_1$. Choose this neighborhood $U_1 \times V_1$ such that $G : U_1 \times V_1 \to U_1$ (see above). For each $y \in V_1$, $G(x, y) : U_1 \to U_1$ is a contraction, and hence has a unique fixed point which we denote by $f(y)$. To see that $f$ is smooth, and to see an illustrative example, consult Edwards [22].

A special case of the preceding proof guarantees convergence of Newton’s method (under reasonable hypotheses) if the initial guess is sufficiently close to the solution.

Theorem 5.6. Let $F \in C^1(U, W)$ with $U, W$ open in $\mathbb{R}^n$. Suppose $F(\xi) = 0$ and $F'(\xi)$ is nonsingular. Then there is a neighborhood $U_1$ of $\xi$ such that if $x_0 \in U_1$ and $x_{n+1} = x_n - (F'(x_n))^{-1}F(x_n)$ for all $n$, then the sequence $x_n$ converges to $\xi$.

Proof. Consider $m = 0$ and $\mathbb{R}^m$ as the empty set. That is, take $F(x)$ instead of $F(x, y)$. $G : U \to \mathbb{R}^n$ becomes $G(x) = x - (F'(x))^{-1}F(x)$ on sufficiently small $U_1$, with $G : U_1 \to U_1$ a contraction; hence if $x_0 \in U_1$, $x_{n+1} = G(x_n)$ converges to the fixed point $\xi$ of $G$ (where $F(\xi) = 0$).
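The iteration of Theorem 5.6 is short to write down for $n = 2$. The following sketch uses a system chosen here for illustration, $F(x, y) = (x^2 + y^2 - 2,\ x - y)$ with root $\xi = (1, 1)$, where the Jacobian is nonsingular; the $2 \times 2$ linear system in each step is solved by Cramer's rule, and the starting point is an assumption.

```python
def newton_2d(F, J, x, y, steps=20):
    """Newton iteration (x, y) <- (x, y) - J(x, y)^{-1} F(x, y) in the plane."""
    for _ in range(steps):
        f1, f2 = F(x, y)
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        # Solve [[a, b], [c, d]] [dx, dy]^T = [f1, f2]^T by Cramer's rule.
        dx = (f1 * d - b * f2) / det
        dy = (a * f2 - c * f1) / det
        x, y = x - dx, y - dy
    return x, y

def F(x, y):
    return (x * x + y * y - 2.0, x - y)

def J(x, y):
    return ((2.0 * x, 2.0 * y), (1.0, -1.0))

root = newton_2d(F, J, 2.0, 0.5)
print(root)
```

Starting too far from the root (for example on the line $x + y = 0$, where the Jacobian is singular) breaks the iteration, which is why the theorem is stated for a sufficiently small neighborhood $U_1$.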

For a treatment of the implicit function theorem, existence of solutions of differential equations, and related topics in greater generality, see Chow and Hale [14].


5.16 A Duality Theorem and Cost Minimization

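The proof of the duality theorem and an example are by Duffin [19, 20]. Suppose $c_1, \ldots, c_n$ is a sequence of positive constants and for each $i = 1, \ldots, n$ there is a sequence of real numbers $\alpha_{i1}, \ldots, \alpha_{ik}$. Suppose these are used to define for positive $t_1, \ldots, t_k$ the cost function

$$u(t_1, \ldots, t_k) = c_1 t_1^{\alpha_{11}} \cdots t_k^{\alpha_{1k}} + \cdots + c_n t_1^{\alpha_{n1}} \cdots t_k^{\alpha_{nk}}.$$

Denote

\begin{align*}
R_k^+ &= \{t = (t_1, \ldots, t_k) \mid \text{each } t_i > 0\}, \\
\Delta_n &= \Big\{\delta = (\delta_1, \ldots, \delta_n) \;\Big|\; \text{each } \delta_i > 0 \text{ and } \sum_{i=1}^{n} \delta_i = 1\Big\}, \\
\Delta_n^\alpha &= \Big\{\delta \;\Big|\; \delta \in \Delta_n \text{ and } \sum_{i=1}^{n} \alpha_{ij}\delta_i = 0 \text{ for } j = 1, \ldots, k\Big\}, \\
P_i(t) &= t_1^{\alpha_{i1}} \cdots t_k^{\alpha_{ik}} \quad \text{for } t \in R_k^+, \\
v(\delta) &= \prod_{i=1}^{n} \left(\frac{c_i}{\delta_i}\right)^{\delta_i} \quad \text{for } \delta \in \Delta_n.
\end{align*}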

Using the above notation with the positive constants $c_1, \ldots, c_n$ and the matrix of real numbers (the exponents in the cost function) $\alpha_{ij}$ fixed, we state two problems:

• Problem 1: Let $M = \inf_{t\in R_k^+} u(t)$. Find $M$.

• Problem 2 (The dual problem): Let $m = \sup_{\delta\in\Delta_n^\alpha} v(\delta)$. Find $m$.

Zero is clearly a lower bound for the set in Problem 1, hence $M$ exists. On the interval $(0, 1)$, $(1/\delta_i)^{\delta_i}$ is bounded by $e^{1/e}$, so $m$ exists for Problem 2.

To show how the two problems are related we apply the weighted AM–GM inequality to $u(t)$: for any $\delta \in \Delta_n$ and $t \in R_k^+$,

$$u(t) = \sum_{i=1}^{n} \delta_i\left(\frac{c_i P_i(t)}{\delta_i}\right) \ge \prod_{i=1}^{n} \left(\frac{c_i P_i(t)}{\delta_i}\right)^{\delta_i} = v(\delta)\,t_1^{D_1} \cdots t_k^{D_k},$$

where each

$$D_j = \sum_{i=1}^{n} \alpha_{ij}\delta_i.$$

If $\delta \in \Delta_n^\alpha$, then each $D_j = 0$. Thus $u(t) \ge v(\delta)$ for all $t \in R_k^+$ and $\delta \in \Delta_n^\alpha$.

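For $t \in R_k^+$ and $\delta \in \Delta_n$ define

$$Q(t, \delta) = u(t) - v(\delta)\,t_1^{D_1} \cdots t_k^{D_k}.$$

Then $Q(t, \delta) \ge 0$ with equality if and only if all $c_iP_i(t)/\delta_i$ are equal. Now suppose that $u(t)$ attains its infimum $M$ at a point $t^* \in R_k^+$. For each $i$ choose $\delta_i^* = c_iP_i(t^*)/M$. Then $\delta^* \in \Delta_n$ and all $c_iP_i(t^*)/\delta_i^*$ are equal, hence $Q(t^*, \delta^*) = 0$. Since we have assumed the cost function $u(t)$ attains its minimum at $t^*$ in the open set $R_k^+$, $\partial u/\partial t_j = 0$ for $j = 1, \ldots, k$ at this point. Since $Q(t, \delta)$ attains its minimum at $(t^*, \delta^*)$, all its partial derivatives, hence the first $k$, $\partial Q/\partial t_j$, vanish at $(t^*, \delta^*)$. But

$$\frac{\partial Q}{\partial t_j} = \frac{\partial}{\partial t_j}u - \frac{\partial}{\partial t_j}\,v(\delta)\,t_1^{D_1} \cdots t_k^{D_k}.$$

Since the first term is zero,

$$\frac{\partial}{\partial t_j}\,v(\delta)\,t_1^{D_1} \cdots t_k^{D_k} = D_j\,v(\delta)\,t_1^{D_1} \cdots t_j^{D_j - 1} \cdots t_k^{D_k} = 0.$$

The conditions $D_j = 0$ are exactly that $\delta \in \Delta_n^\alpha$. Thus $\delta^* \in \Delta_n^\alpha$ and $v(\delta^*) = M$. Since $v(\delta) \le u(t)$ for all $t \in R_k^+$ and $\delta \in \Delta_n^\alpha$, in particular $v(\delta) \le u(t^*) = M$ for all $\delta \in \Delta_n^\alpha$ with equality when $\delta = \delta^*$. Thus $M = m$ and

$$v(\delta) \le M \le u(t) \quad \text{for all } \delta \in \Delta_n^\alpha \text{ and } t \in R_k^+ \tag{5.52}$$

with equality at $t = t^*$ and $\delta = \delta^*$. Therefore the following has been proved: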

Theorem 5.7. If the cost function $u(t)$ in Problem 1 attains its infimum $M$ in $R_k^+$, then the dual function $v(\delta)$ in Problem 2 has maximum value $M$ in $\Delta_n^\alpha$.

Example 5.27. Suppose 400 yd³ of material must be ferried across a river in an open box of length $t_1$, width $t_2$, and height $t_3$. The bottom and sides cost \$10 per yd² and the ends cost \$20 per yd². Two runners costing \$2.50 per yd are needed for the box to slide on. Each round trip of the ferry costs \$0.10. Minimize the total cost

$$u(t_1, t_2, t_3) = \frac{40}{t_1t_2t_3} + 20t_1t_3 + 40t_2t_3 + 10t_1t_2 + 5t_1.$$

(Ignore the fact that a fraction of a trip does not make sense.)

The reader might wish to perform the following numerical experiment. Apply Newton’s method to the gradient of $u(t)$, hoping to find a minimum point. In order to force the constraints that $t_i > 0$, substitute $t_i = \varepsilon + x_i^2$ and estimate the minimum of $u(x)$ by applying Newton’s method to its gradient. (When using Newton’s method, use the Jacobian derivative of the gradient of $u$, which is the Hessian of $u$.) Set $\varepsilon = 0$, take initial guesses $t_i = x_i^2 = 1$, and find values of $t_1 \approx 1.54$, $t_2 \approx 1.11$, $t_3 \approx 0.557$. Take this value of $t$ as (a good approximation for) $t^*$ and define $\delta_i^* = c_iP_i(t^*)/u(t^*)$ for all $i$ and verify (to within a specified tolerance) that $\delta^* \in \Delta_n^\alpha$ and $u(t^*) = v(\delta^*) = 108.69$. Note that a typical application of Newton’s method yields only a local result. However, (5.52) tells us that we found $M = \$108.69$, the global minimum cost.
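One way to carry out such an experiment is sketched below. It departs from the substitution $t_i = \varepsilon + x_i^2$ suggested in the text: as an assumption made here, positivity is enforced through $t_i = e^{s_i}$, and a step-halving safeguard is added to Newton's method on the gradient. The helper names (`terms`, `solve_linear`, `minimize_cost`) are illustrative. The script then forms $\delta^*$, the exponents $D_j$, and the dual value $v(\delta^*)$.

```python
import math

# Cost terms of Example 5.27: u(t) = sum_i C[i] * t1^A[i][0] * t2^A[i][1] * t3^A[i][2]
C = [40.0, 20.0, 40.0, 10.0, 5.0]
A = [(-1, -1, -1), (1, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]

def terms(s):
    """Values c_i * P_i(t) at t = exp(s), taken componentwise."""
    return [c * math.exp(sum(a * x for a, x in zip(row, s))) for c, row in zip(C, A)]

def cost(s):
    return sum(terms(s))

def solve_linear(mat, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [list(row) + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def minimize_cost(s, iterations=50):
    """Newton's method on the gradient of u in the variables s = ln t, with step halving."""
    k = len(s)
    for _ in range(iterations):
        vals = terms(s)
        grad = [sum(v * row[j] for v, row in zip(vals, A)) for j in range(k)]
        hess = [[sum(v * row[j] * row[l] for v, row in zip(vals, A)) for l in range(k)]
                for j in range(k)]
        step = solve_linear(hess, [-g for g in grad])
        scale = 1.0
        while scale > 1e-12 and cost([x + scale * d for x, d in zip(s, step)]) > cost(s):
            scale *= 0.5
        s = [x + scale * d for x, d in zip(s, step)]
    return s

s_star = minimize_cost([0.0, 0.0, 0.0])             # start from t = (1, 1, 1)
t_star = [math.exp(x) for x in s_star]
u_star = cost(s_star)
delta = [v / u_star for v in terms(s_star)]          # delta_i* = c_i P_i(t*) / u(t*)
v_star = math.prod((c / d) ** d for c, d in zip(C, delta))
D = [sum(row[j] * d for row, d in zip(A, delta)) for j in range(3)]

print(t_star, u_star, v_star, D)
```

The agreement of `u_star` and `v_star`, together with vanishing `D`, is what certifies through (5.52) that the computed point is a global minimizer rather than merely a stationary point.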


5.17 Exercises

5.1. Applications of the Chebyshev inequality:

(a) Obtain an upper bound for the integral

$$I = \int_2^5 \frac{e^x}{x + 1}\,dx.$$

(b) Derive the inequality

$$(\sin^{-1} t)^2 < \frac{t}{2}\ln\left|\frac{t + 1}{t - 1}\right|.$$

5.2. Show that if $u(x)$ and every $u_n(x)$ are integrable on $[a, b]$, and if $u_n(x)$ converges uniformly to $u(x)$ on $[a, b]$, then

$$\int_a^b \lim_{n\to\infty} u_n(x)\,dx = \int_a^b u(x)\,dx = \lim_{n\to\infty} \int_a^b u_n(x)\,dx.$$

5.3. Let $u(x)$ and the sequence $u_n(x)$ be defined on $[a, b]$. We say that $u_n(x)$ converges in mean to $u(x)$ if and only if

$$\lim_{n\to\infty} \int_a^b [u(x) - u_n(x)]^2\,dx = 0.$$

(a) Show that uniform convergence implies mean convergence.

(b) Show that mean convergence implies

$$\lim_{n\to\infty} \int_a^b u_n^2(x)\,dx = \int_a^b u^2(x)\,dx.$$

5.4. Write a computer program using Simpson’s rule to estimate the integral $\int_0^1 e^{x^2}\,dx$. Keep doubling the number of partitions until convergence within a specified tolerance. Do not recompute any function evaluations.

5.5. Use Taylor’s method of order 5 to estimate $y(3)$ given that

$$y' = 1 - \frac{y}{x}, \qquad y(2) = 2.$$

The exact solution is $y = x/2 + 2/x$. Use a symbolic manipulator to form the derivatives in $\Phi_5$ and paste them into a standard language such as Fortran or C.

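5.6. Show that for $n \in \mathbb{N}$

$$2\,\Gamma\left(n + \tfrac{1}{2}\right) \le \Gamma\left(\tfrac{1}{2}\right)\Gamma(n + 1) \le 2^n\,\Gamma\left(n + \tfrac{1}{2}\right)$$

with strict inequality for $n > 1$. See [38] for an application to traffic flow.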

5.7. For the exponential integrals defined in the text, show that

En+1(x) < En(x)

for n = 1, 2, 3, . . . .


5.8. Show that for $x > 1$

$$\frac{x - 1}{x^2}\,e^{-x} < \int_x^\infty \frac{e^{-t}}{t}\,dt < \frac{e^{-x}}{x}.$$

5.9. The autocorrelation of a real-valued function $f$ is the function $f \star f$ defined by

$$f \star f(t) = \int_{-\infty}^{\infty} f(u)\,f(t + u)\,du.$$

Show that $f \star f$ reaches its maximum at $t = 0$.

5.10. Use the power series expansion for $J_n(x)$ to show that for $n \ge 0$,

$$|J_n(x)| \le \frac{|x|^n}{2^n\,(n!)}\,e^{x^2/4}.$$

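5.11. The complementary error function $\operatorname{erfc}(x)$ is defined by

$$\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty e^{-t^2}\,dt.$$

(a) Establish the following upper bound:

$$\operatorname{erfc}(\sqrt{x}) < \frac{e^{-x}}{\sqrt{\pi x}}.$$

This bound is useful for $x > 3$, but obviously not for small $x$.

(b) Show that for $0 \le x \le y$,

$$\operatorname{erfc}(\sqrt{y}) \le e^{-(y-x)}\operatorname{erfc}(\sqrt{x}).$$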

5.12. (See Anderson et al. [3]). Let $a, b$ be real, positive numbers. Denote the arithmetic and geometric means by $A(a, b) = (a + b)/2$ and $G(a, b) = \sqrt{ab}$, respectively. Define sequences starting with $a_0 = a$, $b_0 = b$ and, for all $n$, $a_{n+1} = A(a_n, b_n)$, $b_{n+1} = G(a_n, b_n)$.

(a) Use AM–GM and induction to show that

$$b_n \le b_{n+1} \le a_{n+1} \le a_n$$

for $n \ge 1$.

(b) Observe that $a_n$ is a decreasing sequence bounded below (by $b_0$), and hence has a limit. Similarly, $b_n$ has a limit.

(c) Show that the sequences $a_n$ and $b_n$ have a common limit. This common limit, the arithmetic–geometric mean of $a$ and $b$, is denoted by $AG(a, b)$.

(d) Defining the integral

$$T(a, b) = \int_0^{\pi/2} \frac{dx}{\sqrt{a^2\cos^2 x + b^2\sin^2 x}}$$

with $a_1$ and $b_1$ defined as above, show that $T(a, b) = T(a_1, b_1)$.

(e) Show that the sequence $(a_n^2\cos^2 x + b_n^2\sin^2 x)^{1/2}$ converges uniformly to $AG(a, b)$ on $[0, \pi/2]$.

(f) Show that

$$T(a, b) = \frac{\pi/2}{AG(a, b)}.$$

(g) Now let $0 < r < 1$ and set $a = 1$ and $b = \sqrt{1 - r^2}$ to deduce the stunning result of Gauss concerning Legendre’s elliptic integrals of the first kind:

$$\int_0^{\pi/2} \frac{dx}{\sqrt{1 - r^2\sin^2 x}} = \frac{\pi/2}{AG(1, \sqrt{1 - r^2})}.$$

(h) Let $r = 1/2$. Verify that you get six digits of accuracy in using $(\pi/2)/a_2$ to estimate the elliptic integral

$$\int_0^{\pi/2} \left(\sqrt{1 - r^2\sin^2 x}\right)^{-1} dx.$$

5.13. Show that if $\nabla^2\Phi = 0$ throughout a region bounded by a simple closed surface $S$, then

$$\oint_S \Phi\,\frac{\partial\Phi}{\partial n}\,dS \ge 0.$$

5.14. Two isolated conducting bodies carry electric charges. Show that if they are subsequently connected by a very thin wire, the total stored energy of the system is diminished.

5.15. Let $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Show that there is no $X \ne 0$ with $\|AX\|_2 = \|A\|_F\,\|X\|_2$.

5.16. Show that of all finite-energy signal forms, the Gaussian pulse of the form $f(t) = K_2\exp(K_1t^2/2)$, where $K_1, K_2$ are constants with $K_1 < 0$, has the minimum duration-bandwidth product.

5.17. Show that an improved lower bound on the coerror function is given for $x > 0$ by

$$Q(x) > \frac{1}{\sqrt{2\pi}}\,\frac{x}{x^2 + 1}\,e^{-x^2/2}.$$

5.18. Solve the differential equation $y' = 2y$, $y(0) = 1$ by Picard iteration.

5.19. Describe a suitable choice of the neighborhood $U_1$ when using Newton’s method (Theorem 5.6) for the case $n = 1$.


Appendix. Hints for Selected Exercises

1.1. (a) Add $1/x < 1$ to $1/y < 1$, and then multiply by $xy$. (b) Manipulate $(x - 1)^2 \ge 0$. (c) $(x - y)^2 \ge 0$. (d) $1\cdot 2\cdot 3\cdots n > 1\cdot 2\cdot 2\cdots 2$. (e) Verify by substitution for $n = 1, 2, 3$. Use induction for $n \ge 3$. (f) Manipulate the inequality algebraically into $(x - 1)(x^3 - 1) \ge 0$ where $x = b/a$. Equality holds if and only if $b = a$. (g) Apply long division, followed by the binomial theorem and then an ad hoc step. (h) An ad hoc result, obvious upon inspection. (j) Compare the expressions in terms of exponentials. (k) $\cosh x - 1 = (1/2)(e^x + e^{-x}) - 1 = (1/2)(e^{x/2} - e^{-x/2})^2 \ge 0$.

1.2. (b) For a parallel combination of $N$ resistances, the equivalent resistance $R_{eq}$ is given by $R_{eq}^{-1} = R_1^{-1} + \cdots + R_N^{-1} \ge R_n^{-1}$ for any $n$.

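1.3. (a) The inequality holds trivially for $n = 1$. For the induction step use

\begin{align*}
\prod_{i=1}^{n+1} (1 + a_i) &= (1 + a_{n+1})\prod_{i=1}^{n} (1 + a_i) = \prod_{i=1}^{n} (1 + a_i) + a_{n+1}\prod_{i=1}^{n} (1 + a_i) \\
&\ge 1 + \sum_{i=1}^{n} a_i + a_{n+1}\prod_{i=1}^{n} (1 + a_i) \ge 1 + \sum_{i=1}^{n} a_i + a_{n+1}.
\end{align*}

(b) First check $n = 2$. Then

\begin{align*}
\prod_{i=1}^{n+1} (1 - a_i) &= (1 - a_{n+1})\prod_{i=1}^{n} (1 - a_i) > (1 - a_{n+1})\left(1 - \sum_{i=1}^{n} a_i\right) \\
&= 1 - a_{n+1} - \sum_{i=1}^{n} a_i + a_{n+1}\sum_{i=1}^{n} a_i > 1 - \sum_{i=1}^{n+1} a_i.
\end{align*}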


1.4. (a) The proof hinges on $L - \varepsilon < a_n \le b_n \le c_n < L + \varepsilon$ which holds for sufficiently large $n$. (b) An equivalent statement is $d_n \to 0$ where $d_n = n^{1/n} - 1$; from the binomial expansion $n = (1 + d_n)^n > n(n - 1)d_n^2/2$ so that for $n > 1$, $0 < d_n < [2/(n - 1)]^{1/2}$. (c) The answer is 0; establish and apply $0 < n!/n^n \le 2/n^2$ for $n > 2$.

1.5. Sum the inequalities $\min a_n \le a_n \le \max a_n$, $1 \le n \le m$, then divide through by $m$ to get the result for $A$.

1.6. Let $P(n)$ be $f_n < 2^n$. $P(1)$ and $P(2)$ hold by inspection. The inequality $f_n = f_{n-1} + f_{n-2} < 2^{n-1} + 2^{n-2} = 3(2^{n-2}) < 4(2^{n-2}) = 2^n$ shows that the truth of $P(k)$ for $1 \le k < n$ implies $P(n)$. By the “strong” variation of the principle of induction, this is enough.

1.7. $a_{n+1} - a_n = \sqrt{n + 1} - \sqrt{n}$. Rationalize the numerator to write this as $a_{n+1} - a_n = 1/(\sqrt{n + 1} + \sqrt{n})$. Hence $a_{n+1} - a_n < 1/(2\sqrt{n}) = b_n$.

1.8. Assume $\mathbb{C}$ can be ordered by a suitable selection of set $P$. By definition, $i$ is the solution to the equation $z^2 + 1 = 0$; from this it is apparent that $i \ne 0$ and hence either $i \in P$ or $-i \in P$ (but not both). The assumption $i \in P$ leads to $i^2 = -1 \in P$ and therefore to $i(-1) = -i \in P$, a violation of trichotomy. Similarly, the assumption $-i \in P$ implies that $i \in P$.

1.10. (a) Suppose that $S$ has two distinct suprema $s_{1,2}$. Both $s_{1,2}$ are upper bounds for $S$; $s_1 \ge s_2$ because $s_2$ is a supremum for $S$, and likewise $s_2 \ge s_1$ because $s_1$ is a supremum for $S$. Hence $s_2 = s_1$, contradicting the supposition that $s_{1,2}$ are distinct. (d) Let $U$ be the set of all upper bounds for $S$, and show that the suppositions $\sup S > \inf U$ and $\sup S < \inf U$ both lead to contradictions. (f) Assume $\sup A > \sup B$, with $\sup A - \sup B = \delta > 0$. For every $\varepsilon > 0$, there is $x \in A$ such that $x > \sup A - \varepsilon$. Hence there exists $x_0 \in A$ such that $x_0 > \sup A - \delta/2$. We then have $x_0 > \delta + \sup B - \delta/2 = \sup B + \delta/2$. Because $\delta$ is arbitrary, $x_0 > \sup B$. But $x \in A \Rightarrow x \in B$ since $A \subseteq B$. Thus $\sup B$ would not be an upper bound for $B$, a contradiction.

1.11. (c) To show that $\sup_{x\in S} h(x) \le \sup_{x\in S} f(x) + \sup_{x\in S} g(x)$ where $h(x) = f(x) + g(x)$, we may assume the contrary: that $\sup h(S) - \sup f(S) - \sup g(S) = \delta > 0$. For every $x \in S$, $f(x) + g(x) \le \sup f(S) + \sup g(S)$. Also, there exists $x_0 \in S$ such that $h(x_0) = f(x_0) + g(x_0) > \sup h(S) - \delta/2$. Hence $\sup h(S) - \delta/2 < f(x_0) + g(x_0) \le \sup f(S) + \sup g(S)$, or $\delta < \delta/2$, a contradiction. Use this result and $\sup_{x\in S} [-f(x)] = -\inf_{x\in S} f(x)$ to obtain $\sup_{x\in S} [f(x) - g(x)] \le \sup_{x\in S} f(x) - \inf_{x\in S} g(x)$.

1.12. Suppose $a_n$ is increasing and bounded above. Then $S = \sup\{a_n\}$ exists and to any $\varepsilon > 0$ there corresponds $k$ such that $S - \varepsilon < a_k \le S$. Thus for $m > k$, $S - \varepsilon < a_k \le a_m \le S < S + \varepsilon$. Therefore, to any $\varepsilon > 0$ there corresponds $k$ such that $m > k \Rightarrow S - \varepsilon < a_m < S + \varepsilon \Rightarrow |a_m - S| < \varepsilon$.

1.13. (c) Given $\varepsilon > 0$ there exists $N$ such that for $x > N$, $L - \varepsilon < f(x) < L + \varepsilon$. Hence for sufficiently large $x$ we have $0 \le f(x) < L + \varepsilon$ or $L \ge -\varepsilon$ for any $\varepsilon > 0$. From this conclude that $L \ge 0$, as required.

2.1. Choose α < 1 with p+α > 1 (e.g., if p ≥ 1 let α = 1/2, if 0 1,

$$\ln x = \int_1^x (1/t)\,dt \le \int_1^x (1/t^\alpha)\,dt = (x^{1-\alpha} - 1)/(1 - \alpha)$$

hence $0 < \ln x/x^p \le (x^{1-\alpha-p} - x^{-p})/(1 - \alpha) \to 0$ as $x \to \infty$.

2.2. (g) Differentiate the function $f(x) = \ln x/x$ to show that its maximum is attained at $x = e$, not at $x = \pi$, and hence that $f(e) > f(\pi)$. (h) Define $f(x) = x^a + (1 - x)^a$ for $0 \le x \le 1$, $0 < a < 1$. Check that $f(0) = f(1) = 1$, $f'(x) = 0$ at $x = 1/2$ and $f(1/2) = 2^{1-a}$. Thus $1 \le x^a + (1 - x)^a \le 2^{1-a}$. Now substitute $x = s/(s + t)$. (i) Define $f(x) = x^b + (1 - x)^b$ for $0 \le x \le 1$, $b > 1$. Check that $f(0) = f(1) = 1$, $f'(x) = 0$ at $x = 1/2$ and $f(1/2) = 2^{1-b}$. Thus $2^{1-b} \le x^b + (1 - x)^b \le 1$. Now substitute $x = s/(s + t)$.

2.3. The result can be obtained via the inequality $\ln x \le x - 1$ (Exercise 2.2). Putting $x \to x/a$ and rearranging gives $x \ge a[1 + \ln(x/a)] = a[\ln e + \ln(x/a)] = \ln[(ex/a)^a]$. Now raise $e$ to the power of both sides. Equality holds if and only if $x = a$.

2.4. (b) Take $f(x) = \tan^{-1} x$. (c) $f(x) = \sqrt{x}$. (d) $f(x) = e^x$. (e) Using $f(x) = (1 + x)^a$ in the mean value theorem, if $x > 0$ there exists $\xi \in (0, x)$ with $((1 + x)^a - 1)/x = a(1 + \xi)^{a-1} < a(1 + x)^{a-1}$, and since $x > 0$, $(1 + x)^a - 1 < ax(1 + x)^{a-1}$. If $-1 < x < 0$ there exists $\xi$ with $x < \xi < 0$ such that $((1 + x)^a - 1)/x = a(1 + \xi)^{a-1} > a(1 + x)^{a-1}$, and since $x < 0$, $(1 + x)^a - 1 < ax(1 + x)^{a-1}$.

2.5. (a) Write $h(x) = f(x)/g(x)$ with $f(x) = (1 + x)^a - 1$ and $g(x) = x$. Then $f'(x)/g'(x) = a(1 + x)^{a-1}$ is increasing (since its derivative is $a(a - 1)(1 + x)^{a-2} > 0$). Now use $f(0) = g(0) = 0$ and l’Hopital’s monotone rule (LMR) for the cases $x > 0$ giving $h(x) > h(0) = a$ and $x < 0$ giving $h(x) < h(0) = a$.

(b)

$$\frac{\ln\cosh x}{\ln((\sinh x)/x)}\bigg|_{=0/0 \text{ at } x=0} \xrightarrow{\text{LMR}} \frac{x\tanh^2 x}{x - \tanh x}\bigg|_{=0/0 \text{ at } x=0} \xrightarrow{\text{LMR}} 1 + \frac{4x}{\sinh 2x}$$

which clearly decreases on $(0, \infty)$.

(c) Check that $h(x) = \sin\pi x/(x(1 - x))$ is symmetric about $x = 1/2$, so we restrict to $(0, 1/2) = (a, b)$. Now $h(x) \to \pi$ as $x \to 0^+$ by l’Hopital’s rule and $h(1/2) = 4$. To show $h$ is monotonic,

$$\frac{\sin\pi x}{x(1 - x)}\bigg|_{=0/0 \text{ at } x=0} \xrightarrow{\text{LMR}} \frac{\pi\cos\pi x}{1 - 2x}\bigg|_{=0/0 \text{ at } x=1/2} \xrightarrow{\text{LMR}} \frac{\pi^2\sin\pi x}{2}$$

is increasing on $(0, 1/2)$.

(d) $h(x) = \sin x/x$ on $(0, \pi/2]$ extends to $[0, \pi/2]$ with $h(0) = 1$ by l’Hopital’s rule, $h(\pi/2) = 2/\pi$. Now

$$\frac{\sin x}{x}\bigg|_{=0/0 \text{ at } x=0} \xrightarrow{\text{LMR}} \frac{\cos x}{1}$$

is strictly decreasing hence $1 > \sin x/x > 2/\pi$ on $(0, \pi/2)$.

2.6. (d) Apply $e^x \ge 1 + x$. Thus $1 + a_n \le \exp(a_n)$ and
\[ \prod_{n=1}^{N} (1 + a_n) \le \prod_{n=1}^{N} \exp(a_n) = \exp\Big(\sum_{n=1}^{N} a_n\Big). \]
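The bound in 2.6(d) is easy to check numerically. The following is a minimal sketch, assuming nonnegative terms $a_n$ (so that the factorwise inequalities can be multiplied); the function name `product_and_bound` and the sample sequence $a_n = 1/n^2$ are illustrative choices, not taken from the text.

```python
import math

def product_and_bound(a):
    """Return (prod(1 + a_n), exp(sum(a_n))) for a sequence of terms a_n >= 0."""
    product = math.prod(1.0 + term for term in a)   # left-hand side
    bound = math.exp(sum(a))                        # right-hand side
    return product, bound

# Illustrative data (an assumption, not from the text): a_n = 1/n^2, n = 1..50.
terms = [1.0 / n**2 for n in range(1, 51)]
lhs, rhs = product_and_bound(terms)
print(lhs, "<=", rhs)
```

Because each factor satisfies $1 + a_n \le e^{a_n}$, the printed product never exceeds the printed exponential, whatever nonnegative terms are supplied.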

2.7. Substitute $x = a/b > 1$ and use either differentiation or Corollary 2.6.1 to establish that $n(x-1) < x^n - 1 < nx^{n-1}(x-1)$. Next, suppose that $a^n = b^n$ with $a \ne b$; then $b^{n-1} < 0 < a^{n-1}$ (a contradiction, because $b$ is assumed positive).

2.8. (a) Fix $\varepsilon$ and $\delta$ and a partition $a = x_0 < x_1 < \cdots < x_n = b$ with $x_i - x_{i-1} < \delta$ for all $i$. If $f(x)$ is not bounded on $[a, b]$, then it is not bounded on some subinterval $[x_{k-1}, x_k]$. Choose $\xi_j$ for all $j \ne k$. There exists some $\xi_k$ with $|f(\xi_k)|$ sufficiently large to contradict $|\sum_{i=1}^{n} f(\xi_i)(x_i - x_{i-1}) - I| < \varepsilon$. (b) Since $f(x)$ is integrable it is bounded, i.e., there exists $M$ with $|f(x)| \le M$ on $[a, b]$. Let $\varepsilon > 0$ be given, and suppose $x_0 \in (a, b)$. Then
\[ |F(x) - F(x_0)| = \Big|\int_a^x f(t)\,dt - \int_a^{x_0} f(t)\,dt\Big| = \Big|\int_{x_0}^x f(t)\,dt\Big| \le \Big|\int_{x_0}^x |f(t)|\,dt\Big|. \]
Hence $|F(x) - F(x_0)| \le M|x - x_0|$ and we may choose $\delta = \varepsilon/M$. (c) No. $f(x)$ is not bounded on $[0, 1]$. However, the integral exists as the improper integral $\lim_{\varepsilon \to 0^+} \int_\varepsilon^1 x^{-1/2}\,dx$.

2.9. (a) $2^{-\alpha} \le I(\alpha, \beta) \le 1$. (b) Use Jordan’s inequality. (c) For $s > b$,
\[ \Big|\int_0^\infty f(t)e^{-st}\,dt\Big| \le \int_0^\infty |f(t)||e^{-st}|\,dt \le \int_0^\infty Ce^{bt}e^{-st}\,dt = \frac{C}{s - b}. \]
(d) Use suitable table lookup integrals to obtain
\[ 1 \le \frac{(2n-1)!!\,(2n+1)!!}{[(2n)!!]^2}\,\frac{\pi}{2} \le \frac{2n+1}{2n}. \]
As $n \to \infty$, the middle quantity is squeezed to 1. (e) Consider the alternating series with terms
\[ a_i = \int_{i\pi}^{(i+1)\pi} (\sin x)/x\,dx. \]
Since the alternating series has $|a_{i+1}| < |a_i|$ and $|a_i| \to 0$ as $i \to \infty$, the series converges. Write the series of positive terms
\[ \int_0^\infty (\sin x)/x\,dx = (a_0 + a_1) + (a_2 + a_3) + \cdots > a_0 + a_1. \]
Using Jordan’s inequality on $[0, \pi/2]$, $(\sin x)/x \ge 2/\pi$ so
\[ \int_0^{\pi/2} (\sin x)/x\,dx \ge 1. \]
Using symmetry and Figure 2.2, on $[\pi/2, \pi]$ $(\sin x)/x \ge 2/x - 2/\pi \ge 2/(\pi/2) - 2/\pi$, hence
\[ \int_{\pi/2}^{\pi} (\sin x)/x\,dx \ge 1. \]
Also
\[ \int_{\pi}^{2\pi} (\sin x)/x\,dx \ge -2/3. \]
(Sketch a box from $\pi$ to $2\pi$ touching the curve $(\sin x)/x$ at $3\pi/2$.) Hence
\[ \int_0^\infty (\sin x)/x\,dx > 4/3. \]
Now regroup and get $a_0$ plus a series of negative terms,
\[ \int_0^\infty (\sin x)/x\,dx = a_0 + (a_1 + a_2) + (a_3 + a_4) + \cdots < a_0. \]
Form a left Riemann sum
\[ \sum_{i=0}^{2} f(x_i)\,\Delta x > a_0, \]
where $\Delta x = \pi/3$, $x_i = i\,\Delta x$.

The integral from 0 to $\infty$ of the Fourier kernel $(\sin x)/x$ is computed using complex variables, with a crucial step invoking Jordan’s lemma, which in turn uses Jordan’s inequality. See [48]. The answer is $\pi/2$.

2.10. (a) [53] First assume $f(b) = 0$. Since $g$ is integrable on $[a, b]$ it is bounded, so we may choose a constant $c > 0$ with $g(x) + c \ge 0$ on $[a, b]$. Define the (continuous) function $G(\xi) = \int_a^\xi g(x)\,dx$. Let $M$ denote the maximum and $m$ the minimum of $G$ on $[a, b]$. Let $\Delta x = (b-a)/n$ and $x_i = a + i\,\Delta x$ for $i = 0, \ldots, n$. Then
\[ \int_a^b (g(x) + c)f(x)\,dx = \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} (g(x) + c)f(x)\,dx \]
\[ \le \sum_{k=1}^{n} f(x_{k-1}) \int_{x_{k-1}}^{x_k} (g(x) + c)\,dx \]
\[ = \sum_{k=1}^{n} f(x_{k-1}) \int_{x_{k-1}}^{x_k} g(x)\,dx + c\sum_{k=1}^{n} f(x_{k-1})(x_k - x_{k-1}). \]
In the last expression the first term can be rewritten
\[ \sum_{k=1}^{n} f(x_{k-1})(G(x_k) - G(x_{k-1})) = \sum_{k=1}^{n} G(x_k)(f(x_{k-1}) - f(x_k)) \le M\sum_{k=1}^{n} (f(x_{k-1}) - f(x_k)) = Mf(a). \]
As $n \to \infty$ the second term approaches $c\int_a^b f(x)\,dx$. Taking limits, by Lemma 1.1,
\[ \int_a^b (g(x) + c)f(x)\,dx \le Mf(a) + c\int_a^b f(x)\,dx. \]
Similarly,
\[ \int_a^b (g(x) + c)f(x)\,dx \ge mf(a) + c\int_a^b f(x)\,dx. \]
Now apply the intermediate value theorem to $G$. Finally, note that if $f(b) \ne 0$ we may redefine $f(x)$ to be 0 at $x = b$ without changing the integral $\int_a^b f(x)g(x)\,dx$. (c) If $f(x)$ is monotonic decreasing, replace $f(x)$ by $f(x) - f(b)$ in part (a). If $f(x)$ is monotonic increasing, replace $f(x)$ by $f(x) - f(a)$ in part (b). (d) Immediate from part (a).

2.11. (a) Note that $\ln(n!) = \sum_{k=1}^{n} \ln(k)$, and interpret this as a sum of rectangular areas (each of unit width). (b) Both trapezoids are bounded by the $x$-axis and the lines $x = a$, $x = b$. The fourth side of the smaller trapezoid is formed by the line tangent to $y = 1/x$ at the midpoint $x = (a+b)/2$ of the interval $(a, b)$. The fourth side of the larger trapezoid is formed by the secant line connecting the points $x = a$, $y = 1/a$, and $x = b$, $y = 1/b$.

2.12. We have $C_n - C_{n+1} = \ln(1 + 1/n) - 1/(n+1) > 0$ by the logarithmic inequality, so $C_n$ is decreasing. The lower bound of 1/2 would require that $\sum_{j=1}^{n} 1/j - 1/2 > \ln n = \int_1^n dx/x$. To show that this is indeed the case, construct suitable trapezoids to slightly overestimate the area under $1/x$ as
\[ A = \frac{1}{2}\sum_{j=1}^{n-1}\Big(\frac{1}{j} + \frac{1}{j+1}\Big) = \sum_{j=1}^{n}\frac{1}{j} - \frac{1}{2} - \frac{1}{2n}, \]
which makes it apparent that
\[ \sum_{j=1}^{n}\frac{1}{j} - \frac{1}{2} > \ln n + \frac{1}{2n} > \ln n. \]

2.13. The moment of inertia
\[ I = \int r^2\,dm \le r_{\max}^2 \int dm = m r_{\max}^2. \]
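As a numerical illustration of 2.12, the sketch below tabulates $C_n = \sum_{j=1}^{n} 1/j - \ln n$ and checks that it decreases while staying above 1/2. It assumes this is the definition of $C_n$ intended by the exercise, which is consistent with the difference $C_n - C_{n+1}$ quoted in the hint; the function name `euler_sequence` is hypothetical.

```python
import math

def euler_sequence(n_max):
    """Return [C_1, ..., C_{n_max}] with C_n = H_n - ln(n), H_n the n-th harmonic number."""
    harmonic = 0.0
    values = []
    for n in range(1, n_max + 1):
        harmonic += 1.0 / n
        values.append(harmonic - math.log(n))
    return values

c = euler_sequence(1000)
decreasing = all(c[k + 1] < c[k] for k in range(len(c) - 1))
above_half = all(value > 0.5 for value in c)
print(decreasing, above_half, c[-1])
```

The last printed value approximates Euler's constant from above, as the monotonicity argument predicts.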

2.14. Between each $x_i$ and $x_{i+1}$ there is a point where $g'(x)$ vanishes. Between every two adjacent such points there is a point where $g''(x)$ vanishes. Continue the pattern until reaching a point $\xi$ where $g^{(n)}(x)$ vanishes.

2.15. Assume the contrary and develop a contradiction based on Lemma 2.3.

2.16. Let $A$ be dense in $B$, and suppose $x_n \to x$ where $x \in B$ and $x_n$ is a sequence of points in $A$. By hypothesis $f(x_n) \le g(x_n)$ for all $n$; hence, by Lemma 1.1 we know that $\lim f(x_n) \le \lim g(x_n)$ and by continuity (Theorem 2.1) the result is proved. Also note that the rationals are dense in the reals.

2.17. No; think of two functions $f(x)$ and $g(x)$ (e.g., two straight lines defined on some interval of the $x$-axis) such that $f(x)$ is greater than $g(x)$ but the slope of $f(x)$ is less than that of $g(x)$.

3.1. Eggleston [23] gives a proof along the following lines. Let
\[ I_f = \int_0^w f(x)\,dx, \qquad I_g = \int_0^{f(w)} g(y)\,dy. \]
Suppose without loss of generality that $h \ge f(w)$. (Otherwise interchange $w$, $f$ with $h$, $g$.) First partition the interval $[0, w]$: let $\Delta x = w/n$ and $x_i = i\,\Delta x$ for $i = 0, \ldots, n$. Observe graphically that because $f$ is increasing
\[ I_f \ge \sum_{i=0}^{n-1} f(x_i)\,\Delta x = \sum_{i=0}^{n-1} f_i\,\Delta x, \]
where $f_i = f(x_i)$ for $i = 0, \ldots, n$. Similarly, because the numbers $f_i$ serve to (nonuniformly) partition the interval $[0, f(w)]$,
\[ I_g \ge \sum_{i=0}^{n-1} g(f_i)(f_{i+1} - f_i) = \sum_{i=0}^{n-1} x_i(f_{i+1} - f_i). \]
Hence
\[ I_f + I_g \ge \sum_{i=0}^{n-1} f_i\,\Delta x + \sum_{i=0}^{n-1} x_i(f_{i+1} - f_i) \]
\[ = \sum_{i=0}^{n-1} f_{i+1}\,\Delta x + \sum_{i=0}^{n-1} (f_i - f_{i+1})\,\Delta x + \sum_{i=0}^{n-1} x_i(f_{i+1} - f_i) \]
\[ = \sum_{i=0}^{n-1} [f_{i+1}(x_i + \Delta x) - x_i f_i] + \sum_{i=0}^{n-1} (f_i - f_{i+1})\,\Delta x \]
\[ = \sum_{i=0}^{n-1} (x_{i+1}f_{i+1} - x_i f_i) + \Delta x\sum_{i=0}^{n-1} (f_i - f_{i+1}) \]
\[ = x_n f_n - x_0 f_0 - \Delta x\,(f_n - f_0) \]
\[ = (w - \Delta x)f(w) \]
by the telescoping property. But $g(y) \ge g(f(w))$ if $y \ge f(w)$, so that
\[ \int_{f(w)}^{h} g(y)\,dy \ge \int_{f(w)}^{h} g(f(w))\,dy = [h - f(w)]\,w. \]
Upon addition to the previous inequality,
\[ \int_0^w f(x)\,dx + \int_0^h g(y)\,dy \ge wh - \Delta x\,f(w). \]
This holds for arbitrary $\Delta x > 0$, leading to the desired conclusion.

3.2. (a) Use AM–GM. (c) Use cyclic permutation on the result of part (a); i.e., add the inequalities $a^2 + b^2 \ge 2ab$, $b^2 + c^2 \ge 2bc$, and $c^2 + a^2 \ge 2ca$.

3.3. The summation formula $\sum_{k=1}^{n} k = n(n+1)/2$ is useful in this problem. For instance,
\[ [(2n)!!]^{1/n} < \frac{(2n) + (2n-2) + \cdots + 4 + 2}{n} = \frac{2\sum_{i=1}^{n} i}{n} = n + 1. \]

3.4. (a) Let the rectangle have length $L$, width $W$, perimeter $P$. Then $P/4 = (L+W)/2 \ge (LW)^{1/2}$. Equality occurs if and only if $L = W$. (b) $q = Q/2$.

3.5. The case $n = 1$ is trivial. For $n \ge 2$ consider $a_n$ as a variable by defining for $x > 0$,
\[ f(x) = (\delta_1 a_1 + \cdots + \delta_{n-1}a_{n-1} + \delta_n x)/(a_1^{\delta_1} \cdots a_{n-1}^{\delta_{n-1}} x^{\delta_n}), \]
which is rewritten as $f(x) = (s_{n-1} + \delta_n x)/(p_{n-1}x^{\delta_n})$. Show $f'(x) = 0$ at $x_m = s_{n-1}/(1 - \delta_n)$ and $f''(x_m) = (\delta_n/p_{n-1})\,x_m^{-\delta_n - 1}(1 - \delta_n) > 0$, so $f(x)$ has its minimum at $x_m$:
\[ f(x_m) = \big([\delta_1/(1-\delta_n)]a_1 + \cdots + [\delta_{n-1}/(1-\delta_n)]a_{n-1}\big)^{1-\delta_n}/(a_1^{\delta_1} \cdots a_{n-1}^{\delta_{n-1}}). \]
The weights $\delta_1/(1-\delta_n), \ldots, \delta_{n-1}/(1-\delta_n)$ add up to 1, so inductively if $a_1, \ldots, a_{n-1}$ are not all equal, then
\[ [\delta_1/(1-\delta_n)]a_1 + \cdots + [\delta_{n-1}/(1-\delta_n)]a_{n-1} > a_1^{\delta_1/(1-\delta_n)} \cdots a_{n-1}^{\delta_{n-1}/(1-\delta_n)} \]
so
\[ f(x) \ge f(x_m) > \big[a_1^{\delta_1/(1-\delta_n)} \cdots a_{n-1}^{\delta_{n-1}/(1-\delta_n)}\big]^{1-\delta_n}/(a_1^{\delta_1} \cdots a_{n-1}^{\delta_{n-1}}) = 1. \]
If $a_1 = \cdots = a_{n-1}$, then $x_m = [\delta_1/(1-\delta_n)]a_1 + \cdots + [\delta_{n-1}/(1-\delta_n)]a_{n-1} = a_1$ and $f(x_m) = 1$. For any other choice of $x$, $f(x) > f(x_m) = 1$.

3.6. Use AM–GM.

3.7. Multiply by 1 and then use AM–GM.

3.8. Use AM–GM twice:
\[ \sum_{n=1}^{N} a_n^{-m} \ge N\Big(\prod_{n=1}^{N} a_n^{-m}\Big)^{1/N} = N\Big[\frac{1}{(\prod_{n=1}^{N} a_n)^{1/N}}\Big]^m \ge N\Big[\frac{1}{(1/N)\sum_{n=1}^{N} a_n}\Big]^m. \]

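3.9. (a)
\[ \lim_{t \to 0} \ln g(t) = \lim_{t \to 0} \Big(\ln\Big(\sum \delta_i x_i^t\Big)\Big)/t = \lim_{t \to 0} \Big(\sum \delta_i x_i^t \ln x_i\Big)/\Big(\sum \delta_i x_i^t\Big) = \sum \delta_i \ln x_i \]
by l’Hopital’s rule and the fact that $\sum_{i=1}^{n} \delta_i = 1$. Hence
\[ g(t) \to \prod_{i=1}^{n} x_i^{\delta_i} \quad (t \to 0). \]
(b)
\[ \ln g(t) = \Big(\ln\Big(\sum \delta_i x_i^t\Big)\Big)/t\,\Big|_{0/0 \text{ at } t=0} \xrightarrow{\text{LMR}} \Big(\sum \delta_i x_i^t \ln x_i\Big)/\Big(\sum \delta_i x_i^t\Big). \]
The last expression is increasing since its derivative, using the quotient rule, has numerator
\[ \Big(\sum \delta_i x_i^t\Big)\Big(\sum \delta_i x_i^t \ln^2 x_i\Big) - \Big(\sum \delta_i x_i^t \ln x_i\Big)^2, \]
which is nonnegative because
\[ \Big(\sum \delta_i x_i^t \ln x_i\Big)^2 = \Big(\sum \delta_i^{1/2} x_i^{t/2}\,\delta_i^{1/2} x_i^{t/2} \ln x_i\Big)^2 \le \Big(\sum \delta_i x_i^t\Big)\Big(\sum \delta_i x_i^t \ln^2 x_i\Big) \]
by the Cauchy–Schwarz inequality (3.19). (This applies to $g$ on $(0, \infty)$ or $(-\infty, 0)$.)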

3.10. Choose a partition $x_0 = a$, $x_1 = a + \Delta x$, \ldots, $x_n = b$ where $\Delta x = (b-a)/n$. Form Riemann sums approximating each term with $a_1 = f(x_1), \ldots, a_n = f(x_n)$ so that
\[ (a_1 + \cdots + a_n)/n = (f(x_1) + \cdots + f(x_n))\,\Delta x/(b-a) \to (1/(b-a))\int_a^b f(x)\,dx, \]
\[ ((a_1^{-1} + \cdots + a_n^{-1})/n)^{-1} \to (b-a)\Big/\int_a^b (1/f(x))\,dx, \]
and
\[ (a_1 \cdots a_n)^{1/n} = \exp[(1/n)(\ln a_1 + \cdots + \ln a_n)] \]
\[ = \exp[(1/(b-a))\,\Delta x\,(\ln f(x_1) + \cdots + \ln f(x_n))] \]
\[ \to \exp\Big[(1/(b-a))\int_a^b \ln f(x)\,dx\Big] \]
as $n \to \infty$. Use the previous exercise (c) with each $\delta_i = 1/n$ and Lemma 1.1.

3.11. To simplify notation, denote $y_i = x_i^t$ and $s = \sum y_i$. Then $(d/dt)\ln h(t) = ((t/s)\sum y_i \ln x_i - \ln s)/t^2$ is nonpositive since $\ln s \ge (t/s)\sum y_i \ln x_i$. This is equivalent to $s\ln s \ge t\sum \ln x_i^{y_i}$, which follows from $\ln s^s \ge \sum t\ln x_i^{y_i} = \ln \prod y_i^{y_i}$. The last inequality holds because $s^s = s^{\sum y_i} = \prod s^{y_i} \ge \prod y_i^{y_i}$.

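3.12. Ptak [52] gives a proof using AM–GM as follows. From the given $a_i$, form a new set of numbers $b_i = a_i/\sqrt{a_1 a_m}$ ($i = 1, 2, \ldots, m$). Like the $a_i$, these are positive and satisfy $b_1 < b_2 < \cdots < b_m$; moreover, they have the additional property $b_m = a_m/\sqrt{a_1 a_m} = \sqrt{a_1 a_m}/a_1 = 1/b_1$ so that $b_i \le 1/b_1$ ($i = 1, 2, \ldots, m$). Manipulate this to get
\[ b_i - b_1 \le \frac{b_i - b_1}{b_i b_1} = \frac{1}{b_1} - \frac{1}{b_i} \quad\text{or}\quad b_i + \frac{1}{b_i} \le b_1 + \frac{1}{b_1} \quad (i = 1, 2, \ldots, m). \]
Hence
\[ \sum_{i=1}^{m} \lambda_i b_i + \sum_{i=1}^{m} \frac{\lambda_i}{b_i} \le \Big(b_1 + \frac{1}{b_1}\Big)\sum_{i=1}^{m} \lambda_i \quad\text{or}\quad \frac{1}{2}\Big[\sum_{i=1}^{m} \lambda_i b_i + \sum_{i=1}^{m} \frac{\lambda_i}{b_i}\Big] \le \frac{b_1 + b_m}{2}. \]
By AM–GM then,
\[ \Big(\sum_{i=1}^{m} \lambda_i b_i\Big)\Big(\sum_{i=1}^{m} \frac{\lambda_i}{b_i}\Big) \le \Big(\frac{b_1 + b_m}{2}\Big)^2. \]
When rewritten in terms of the $a_i$, this is the desired result.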

3.13. If (3.24) holds, then
\[ c = \Big(\sum_{i=1}^{m} |b_i|^q \Big/ \sum_{i=1}^{m} |a_i|^p\Big)^{1/q}. \]
If (3.25) holds, then for each $i$,
\[ |b_i|^q \Big/ \sum_{j=1}^{m} |b_j|^q = (c|a_i|^{p-1})^q \Big/ \sum_{j=1}^{m} (c|a_j|^{p-1})^q = |a_i|^p \Big/ \sum_{j=1}^{m} |a_j|^p. \]


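3.14. Let $a = x_0$, $x_1 = x_0 + \Delta x$, \ldots, $x_m = b$ where $\Delta x = (b-a)/m$. Calling $a_i = f(x_i)$, $b_i = g(x_i)$,
\[ \sum_{i=1}^{m} |a_i b_i|\,\Delta x \to \int_a^b |f(x)g(x)|\,dx, \]
\[ \Big(\sum_{i=1}^{m} |a_i|^p\,\Delta x\Big)^{1/p} \to \Big(\int_a^b |f(x)|^p\,dx\Big)^{1/p}, \]
\[ \Big(\sum_{i=1}^{m} |b_i|^q\,\Delta x\Big)^{1/q} \to \Big(\int_a^b |g(x)|^q\,dx\Big)^{1/q}, \]
as $m \to \infty$. Now use (3.10) and Lemma 1.1.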

3.15. Use
\[ \int_a^b |f(x) + g(x)|^2\,dx \le \int_a^b |f(x) + g(x)|^2\,dx + \int_a^b |f(x) - g(x)|^2\,dx \]
\[ = \int_a^b [f(x) + g(x)]^2\,dx + \int_a^b [f(x) - g(x)]^2\,dx \]
\[ = 2\int_a^b [f^2(x) + g^2(x)]\,dx. \]

3.16. Use Cauchy–Schwarz with the two functions $f\sqrt{h}$ and $g\sqrt{h}$.

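3.17. By Cauchy–Schwarz
\[ \Big(\frac{1}{T}\int_0^T v(t)\,dt\Big)^2 \le \frac{1}{T^2}\int_0^T (1)^2\,dt \int_0^T v^2(t)\,dt \]
\[ = \frac{1}{T}\int_0^T v^2(t)\,dt = \frac{1}{T}\int_0^T \Big(\frac{dx}{dt}\Big)^2 dt \]
\[ = \frac{1}{T}\int_0^X \frac{dx}{dt}\,dx = \frac{1}{T}\int_0^X v(x)\,dx. \]
Here $X = \int_0^T v(t)\,dt$. Equality holds if the particle speed is constant.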

3.18. Make a substitution into Cauchy–Schwarz.

3.21. Inscribe a polygon of $N$ sides in a circle of fixed radius $R$. With $\theta_n$ the central angle subtending the $n$th side of the polygon, the area of the polygon is $A$, where
\[ A = \sum_{n=1}^{N} \frac{R^2}{2}\sin\theta_n = \frac{NR^2}{2}\,\frac{1}{N}\sum_{n=1}^{N} \sin\theta_n \le \frac{NR^2}{2}\sin\Big(\frac{1}{N}\sum_{n=1}^{N} \theta_n\Big). \]
Thus $A \le (NR^2/2)\sin(2\pi/N)$, and equality holds only with all central angles equal.

3.23. Since $f''(x) > 0$, $f(x)$ is convex. Therefore $-\ln(\delta_1 a_1 + \cdots + \delta_n a_n) \le -\delta_1 \ln a_1 - \cdots - \delta_n \ln a_n$, which implies that $\delta_1 a_1 + \cdots + \delta_n a_n \ge a_1^{\delta_1} \cdots a_n^{\delta_n}$.


3.24. (By now this is old hat.) Let $\Delta t = (b-a)/n$, $t_i = a + i\,\Delta t$, $c_i = p(t_i)/\sum_{j=1}^{n} p(t_j)$, $x_i = g(t_i)$. Apply (3.21) and Lemma 1.1.

4.1. (a) The statement is equivalent to $-d(y, z) \le d(x, y) - d(x, z) \le d(y, z)$. The left inequality is equivalent to $d(x, z) \le d(x, y) + d(y, z)$, while the right is equivalent to $d(x, y) \le d(x, z) + d(z, y)$. But these are both occurrences of the triangle inequality. (b) Use induction to generalize the triangle inequality.

4.2. Assume $x$ and $y$ are distinct limits for $x_n$. Then for sufficiently large $n$, $d(x, y) \le d(x, x_n) + d(x_n, y) < \varepsilon/2 + \varepsilon/2 = \varepsilon$ by the triangle inequality. Because $\varepsilon$ is arbitrarily small, we must have $x = y$, a contradiction.

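4.3. (a) Use Minkowski’s inequality. (b) Verification of the first two metric space requirements is trivial. For the third, use the triangle and Minkowski’s inequalities as follows:
\[ d(\xi, \eta) = \Big(\sum_{i=1}^{\infty} |\xi_i - \zeta_i + \zeta_i - \eta_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^{\infty} [|\xi_i - \zeta_i| + |\zeta_i - \eta_i|]^p\Big)^{1/p} \]
\[ \le \Big(\sum_{i=1}^{\infty} |\xi_i - \zeta_i|^p\Big)^{1/p} + \Big(\sum_{i=1}^{\infty} |\zeta_i - \eta_i|^p\Big)^{1/p}. \]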

4.4. Employ the result of Exercise 2.15.

4.7. Let $x$ be the vector from $C$ to $\beta$, $y$ the vector from $C$ to $\alpha$. Then the median vector from $\alpha$ to $A$ is $2x - y$, and the median vector from $\beta$ to $B$ is $2y - x$. Compare the magnitudes of these median vectors by using the fact that the inner product of any vector with itself equals its magnitude squared.

4.8. By the proof of Theorem 4.7 equality holds if and only if we have $|\langle x, y\rangle| = \sqrt{\langle x, x\rangle\langle y, y\rangle}$ and $\langle x, y\rangle = |\langle x, y\rangle|$, i.e., $\langle x, y\rangle$ is real and nonnegative. Hence equality holds if and only if $x = 0$ or $y = 0$ or else $x = \beta y$ for some $\beta$ real and nonnegative.

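4.9. Consider each $z_j = a_j + i\,b_j \in \mathbb{C}$ as a point $w_j = (a_j, b_j)^T \in \mathbb{R}^2$. Then
\[ \Big|\sum_{i=1}^{n} z_i\Big| = \sqrt{\Big(\sum_{i=1}^{n} a_i\Big)^2 + \Big(\sum_{i=1}^{n} b_i\Big)^2} = \sqrt{\sum_{i=1}^{n} \|w_i\|^2 + 2\sum_{1 \le i < j \le n} \langle w_i, w_j\rangle}, \]
whereas
\[ \sum_{i=1}^{n} |z_i| = \sum_{i=1}^{n} \|w_i\|. \]
Squaring both sides, (1.1) is equivalent to
\[ \sum_{i=1}^{n} \|w_i\|^2 + \sum_{1 \le i < j \le n} 2\langle w_i, w_j\rangle \le \Big(\sum_{i=1}^{n} \|w_i\|\Big)^2 = \sum_{i=1}^{n} \|w_i\|^2 + \sum_{1 \le i < j \le n} 2\|w_i\|\,\|w_j\|. \]
By Cauchy–Schwarz for each $i, j$, $\langle w_i, w_j\rangle \le \|w_i\|\,\|w_j\|$, hence the inequality is established. Furthermore, equality holds in the sum if and only if $\langle w_i, w_j\rangle = |\langle w_i, w_j\rangle| = \|w_i\|\,\|w_j\|$ for all $i < j$. Hence equality holds if and only if for each $i < j$, $w_i = \beta_{ij}w_j$ for a constant $\beta_{ij} > 0$, i.e., $\arg(z_i) = \arg(z_j)$.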

5.1. (a) $I \le 32.58$ (actual value close to 28). (b) Start with
\[ \int_0^t \Big(\frac{1}{\sqrt{1 - x^2}}\Big)^2 dx > \frac{1}{t}\Big(\int_0^t \frac{dx}{\sqrt{1 - x^2}}\Big)^2. \]

5.2. If $u(x)$ and every $u_n(x)$ are integrable on $[a, b]$, then
\[ \Big|\int_a^b u_n(x)\,dx - \int_a^b u(x)\,dx\Big| = \Big|\int_a^b [u_n(x) - u(x)]\,dx\Big| \le \int_a^b |u_n(x) - u(x)|\,dx. \]
Hence if $u_n(x)$ converges uniformly to $u(x)$ on $[a, b]$, then for $\varepsilon > 0$ there exists $N$ such that
\[ \Big|\int_a^b u_n(x)\,dx - \int_a^b u(x)\,dx\Big| < (b - a)\varepsilon \]
whenever $n > N$.

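5.3. (a) Given $\varepsilon > 0$, take $N$ so large that $n > N$ implies $|u(x) - u_n(x)| < [\varepsilon/(b-a)]^{1/2}$ for all $x \in [a, b]$. Then for $n > N$,
\[ \int_a^b [u(x) - u_n(x)]^2\,dx < \int_a^b \varepsilon/(b-a)\,dx = \varepsilon. \]
(b) Use Minkowski to generate
\[ \sqrt{\int_a^b u^2\,dx} \le \sqrt{\int_a^b u_n^2\,dx} + \sqrt{\int_a^b (u - u_n)^2\,dx}, \]
\[ \sqrt{\int_a^b u_n^2\,dx} \le \sqrt{\int_a^b u^2\,dx} + \sqrt{\int_a^b (u - u_n)^2\,dx}, \]
which together are equivalent to the needed inequality
\[ \Big|\sqrt{\int_a^b u^2\,dx} - \sqrt{\int_a^b u_n^2\,dx}\Big| \le \sqrt{\int_a^b (u - u_n)^2\,dx}. \]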

5.4. Convert the following pseudo-code to your favorite language:

tol=.5E-3; a=0; b=1; f(x)=e^(x^2); n=2; h=(b-a)/n; ends=f(a)+f(b);
evens=0; odds=f(a+h); aold=(h/3)(ends+4*odds+2*evens);
DoLoop
    n=2*n;
    h=h/2;
    evens=evens+odds;
    odds=f(a+h)+f(a+3h)+...+f(a+(n-1)h);
    anew=(h/3)(ends+4*odds+2*evens)
    if |anew-aold|<=tol*|anew| then exit
    else aold=anew;
End Doloop
Print anew
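One possible Python rendering of the pseudo-code is sketched below. It follows the same bookkeeping (the old odd-indexed sum is folded into the even-indexed sum each time the step is halved); the function name `simpson_doubling` and the `max_doublings` safety cap are additions of this sketch and do not appear in the pseudo-code.

```python
import math

def simpson_doubling(f, a, b, tol=0.5e-3, max_doublings=30):
    """Simpson's rule with repeated interval doubling until the relative change is <= tol."""
    n = 2
    h = (b - a) / n
    ends = f(a) + f(b)
    evens = 0.0                 # sum of f at interior even-indexed nodes
    odds = f(a + h)             # sum of f at odd-indexed nodes
    a_old = (h / 3) * (ends + 4 * odds + 2 * evens)
    for _ in range(max_doublings):      # cap is an assumption, to avoid an endless loop
        n *= 2
        h /= 2
        evens += odds                   # old odd nodes become even nodes
        odds = sum(f(a + k * h) for k in range(1, n, 2))
        a_new = (h / 3) * (ends + 4 * odds + 2 * evens)
        if abs(a_new - a_old) <= tol * abs(a_new):
            return a_new
        a_old = a_new
    raise RuntimeError("tolerance not reached within max_doublings")

approx = simpson_doubling(lambda x: math.exp(x * x), 0.0, 1.0)
print(approx)
```

Reusing the previous function values means each doubling costs only the evaluations at the new odd-indexed nodes.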

5.5. A Fortran program:

      program taylor
      dimension yp(5)
      x=2
      y=2
      h=.1
      do i=1,10
c the next five lines were created by a
c symbolic manipulator and pasted in
        yp(1)=1-y/x
        yp(2)=(-x+2*y)/x**2
        yp(3)=3*(x-2*y)/x**3
        yp(4)=12*(-x+2*y)/x**4
        yp(5)=60*(x-2*y)/x**5
        Phi=yp(5)
        do k=5,2,-1
          Phi=(h/k)*Phi+yp(k-1)
        enddo
        y=y+h*Phi
        x=x+h
      enddo
      write(*,*) 'x,y=',x,y
      end
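For readers without a Fortran compiler, the same fifth-order Taylor step can be sketched in Python. The derivative formulas are copied from the program above, which corresponds to $y' = 1 - y/x$ with $y(2) = 2$. The closed form $y = x/2 + 2/x$ used for comparison is not stated in the hint; it is obtained here by solving $(xy)' = x$ and is offered only as a check. The function name `taylor5_solve` is hypothetical.

```python
def taylor5_solve(x=2.0, y=2.0, h=0.1, steps=10):
    """Advance y' = 1 - y/x with a fifth-order Taylor method, mirroring the Fortran loop."""
    for _ in range(steps):
        # yp[j] holds the (j+1)-th derivative of y at the current point
        yp = [
            1 - y / x,
            (-x + 2 * y) / x**2,
            3 * (x - 2 * y) / x**3,
            12 * (-x + 2 * y) / x**4,
            60 * (x - 2 * y) / x**5,
        ]
        # Horner-style evaluation of y' + (h/2) y'' + ... + (h^4/5!) y^(5)
        phi = yp[4]
        for k in range(5, 1, -1):
            phi = (h / k) * phi + yp[k - 2]   # Fortran yp(k-1) is Python yp[k-2]
        y += h * phi
        x += h
    return x, y

x_end, y_end = taylor5_solve()
print("x, y =", x_end, y_end)
print("closed form (assumed check):", x_end / 2 + 2 / x_end)
```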

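5.6. Some helpful formulas are [27]
\[ \Gamma\Big(\frac{1}{2}\Big) = \sqrt{\pi}, \qquad \Gamma(n) = (n-1)!, \qquad \Gamma\Big(n + \frac{1}{2}\Big) = \frac{\sqrt{\pi}}{2^n}(2n-1)!!. \]
Using these the given inequality can be put into the form
\[ 2 \le \frac{2^n n!}{(2n-1)!!} \le 2^n, \]
which is easily established by induction.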

5.7. $t > 1 \Rightarrow e^{-xt}/t^n > e^{-xt}/t^{n+1}$.

5.8. Integrate by parts.

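5.9. Use
\[ \Big(\int_{-\infty}^{\infty} f(u)f(t+u)\,du\Big)^2 \le \int_{-\infty}^{\infty} f^2(u)\,du \int_{-\infty}^{\infty} f^2(t+u)\,du = \Big(\int_{-\infty}^{\infty} f^2(u)\,du\Big)^2 = (f \star f(0))^2. \]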


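5.10. (See [59].) Starting with
\[ |J_n(x)| = \Big|\sum_{m=0}^{\infty} \frac{(-1)^m (x/2)^{2m+n}}{m!\,(m+n)!}\Big| \]
apply the triangle inequality to get
\[ |J_n(x)| \le \Big|\frac{x}{2}\Big|^n \sum_{m=0}^{\infty} \frac{(x^2/4)^m}{m!\,(m+n)!} = \frac{1}{n!}\Big|\frac{x}{2}\Big|^n \sum_{m=0}^{\infty} \frac{(x^2/4)^m}{m!\,(n+1)(n+2)\cdots(n+m)} \]
\[ \le \frac{1}{n!}\Big|\frac{x}{2}\Big|^n \sum_{m=0}^{\infty} \frac{(x^2/4)^m}{m!\,(n+1)^m} = \frac{1}{n!}\Big|\frac{x}{2}\Big|^n \exp\Big[\frac{x^2/4}{n+1}\Big] \]
\[ \le \frac{1}{n!}\Big|\frac{x}{2}\Big|^n \exp\Big[\frac{x^2}{4}\Big]. \]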
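The two bounds in 5.10 can be spot-checked against a partial sum of the defining series. This is only a sketch: the truncation at 60 terms and the function names `bessel_series` and `bessel_bound` are choices made here, and the partial sum is adequate only for small integer $n \ge 0$ and moderate $|x|$.

```python
import math

def bessel_series(n, x, terms=60):
    """Partial sum of the power series for J_n(x), n a nonnegative integer."""
    return sum(
        (-1) ** m * (x / 2) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
        for m in range(terms)
    )

def bessel_bound(n, x):
    """Sharper bound from the hint: |x/2|^n / n! * exp((x^2/4)/(n+1))."""
    return abs(x / 2) ** n / math.factorial(n) * math.exp(x * x / (4 * (n + 1)))

for n in range(4):
    for x in (0.5, 1.0, 2.5, 5.0):
        value = bessel_series(n, x)
        sharp = bessel_bound(n, x)
        weak = abs(x / 2) ** n / math.factorial(n) * math.exp(x * x / 4)
        print(n, x, abs(value) <= sharp <= weak)
```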

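5.11. (b) Letting $y = x + d$, we have
\[ \int_{\sqrt{y}}^{\infty} e^{-t^2}\,dt = \int_{\sqrt{x+d}}^{\infty} e^{-t^2}\,dt = \int_x^{\infty} \frac{e^{-(u+d)}}{2\sqrt{u+d}}\,du \le e^{-d}\int_x^{\infty} \frac{e^{-u}}{2\sqrt{u}}\,du = e^{-d}\int_{\sqrt{x}}^{\infty} e^{-t^2}\,dt. \]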

5.12. (c) For $n \ge 1$, $(a_{n+1} - b_{n+1}) = (1/2)((\sqrt{a_n} - \sqrt{b_n})/(\sqrt{a_n} + \sqrt{b_n}))(a_n - b_n) \le (1/2)(a_n - b_n)$ so $(a_2 - b_2) \le (1/2)(a_1 - b_1)$, \ldots, $(a_{n+1} - b_{n+1}) \le (1/2^n)(a_1 - b_1)$ and now use the squeeze principle on $0 \le a_{n+1} - b_{n+1} \le (1/2^n)(a_1 - b_1)$.

(d) Rewrite the integrand of $T(a, b)$ as $1/(\sin x\cos x\,((a+b)^2 + (a\cot x - b\tan x)^2)^{1/2})$. Sketch $a\cot x - b\tan x$ to see that we may substitute $\tan y = (a\cot x - b\tan x)/(2b_1)$ for $-\pi/2 < y < \pi/2$. Then $1/\cos y = ((2b_1)^2 + (a\cot x - b\tan x)^2)^{1/2}/(2b_1)$. Now using $b_1^2 = ab$ and a few steps of algebra, $2b_1/\cos y = a\cot x + b\tan x$. Since also $2b_1\tan y = a\cot x - b\tan x$, subtraction yields $b\tan x = b_1/\cos y - b_1\tan y$. Next, differentiate to get $(b/\cos^2 x)\,dx = (b_1(\sin y - 1)/\cos^2 y)\,dy$. Multiplication of both sides by $\cot x$ gives $b\,dx/(\sin x\cos x) = (b_1(\sin y - 1)/(\cos^2 y\tan x))\,dy$. Now use $1/\tan x = b\cos y/(b_1(1 - \sin y))$ to get $dx/(\sin x\cos x) = -dy/\cos y$, hence
\[ T(a, b) = \frac{1}{2}\int_{-\pi/2}^{\pi/2} \frac{\sec y\,dy}{\sqrt{a_1^2 + b_1^2\tan^2 y}} = \int_0^{\pi/2} \frac{dy}{\sqrt{a_1^2\cos^2 y + b_1^2\sin^2 y}}. \]
(e)
\[ b_n = \sqrt{b_n^2\cos^2 x + b_n^2\sin^2 x} \le \sqrt{a_n^2\cos^2 x + b_n^2\sin^2 x} \le \sqrt{a_n^2\cos^2 x + a_n^2\sin^2 x} = a_n. \]
Use the squeeze principle. (f) $T(a, b) = T(a_1, b_1)$. By induction, we have $T(a, b) = T(a_n, b_n)$ for all $n$. Since the sequence $(a_n^2\cos^2 x + b_n^2\sin^2 x)^{1/2}$ converges uniformly we may pass the limit inside the integral:
\[ T(a, b) = T(a_n, b_n) = \int_0^{\pi/2} \Big(\sqrt{a_n^2\cos^2 x + b_n^2\sin^2 x}\Big)^{-1} dx \to \int_0^{\pi/2} AG(a, b)^{-1}\,dx = (\pi/2)/AG(a, b). \]
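The identity $T(a, b) = (\pi/2)/AG(a, b)$ can be checked numerically. The sketch below assumes $T(a, b) = \int_0^{\pi/2} (a^2\cos^2 x + b^2\sin^2 x)^{-1/2}\,dx$, as the formulas in parts (d) through (f) indicate, iterates $a_{n+1} = (a_n + b_n)/2$, $b_{n+1} = \sqrt{a_n b_n}$, and compares against a midpoint-rule quadrature. The function names, the iteration cap, and the node count are assumptions of this sketch.

```python
import math

def agm(a, b, max_iter=60):
    """Arithmetic-geometric mean of a, b > 0 via a_{n+1} = (a_n+b_n)/2, b_{n+1} = sqrt(a_n b_n)."""
    for _ in range(max_iter):
        if abs(a - b) <= 1e-15 * max(a, b):
            break
        a, b = (a + b) / 2, math.sqrt(a * b)
    return (a + b) / 2

def t_integral(a, b, nodes=2000):
    """Midpoint-rule approximation of T(a, b) on [0, pi/2]."""
    h = (math.pi / 2) / nodes
    total = 0.0
    for k in range(nodes):
        x = (k + 0.5) * h
        total += h / math.sqrt((a * math.cos(x)) ** 2 + (b * math.sin(x)) ** 2)
    return total

print(t_integral(1.0, 2.0), (math.pi / 2) / agm(1.0, 2.0))
```

The two printed numbers should agree to many digits, and the arithmetic-geometric mean needs only a handful of iterations because the gap $a_n - b_n$ at least halves at every step.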


5.13. Obvious from $\oint_S \Phi(\partial\Phi/\partial n)\,dS = \int_V |\nabla\Phi|^2\,dV$.

5.14. Let the first body carry charge $Q_1$ at surface potential $\Phi_1$, the second body $Q_2$ at potential $\Phi_2$; the individual capacitances are then $C_1 = Q_1/\Phi_1$, $C_2 = Q_2/\Phi_2$. After the bodies are put in communication the new charges become $Q_1'$, $Q_2'$, and the shared potential $\Phi$, where $\Phi = Q_1'/C_1 = Q_2'/C_2$ and $Q_1' + Q_2' = Q = Q_1 + Q_2$. These equations yield $\Phi = Q/(C_1 + C_2)$ so that the overall capacitance is $C_1 + C_2$. Now it is straightforward to show that the energy stored by any conducting body is given by $W = Q^2/2C$, where $Q$ is its charge and $C$ is its capacitance. Assuming $Q_1$ and $Q_2$ are both positive, AM–GM gives $2Q_1Q_2C_1C_2 \le Q_2^2C_1^2 + Q_1^2C_2^2$, and suitable algebraic manipulation of this yields $(Q_1 + Q_2)^2/(C_1 + C_2) \le Q_1^2/C_1 + Q_2^2/C_2$, as desired. (In case $Q_2 = 0$, the desired inequality is $Q_1^2/(C_1 + C_2) \le Q_1^2/C_1$, which is obvious.)

5.16. Condition for equality in Cauchy–Schwarz:
\[ \frac{df}{dt} = K_1 t f(t) \;\Rightarrow\; \frac{1}{f}\frac{df}{dt} = \frac{d}{dt}[\ln f(t)] = K_1 t \;\Rightarrow\; \ln f(t) = \frac{K_1 t^2}{2}. \]

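5.17. After one integration by parts,
\[ Q(x) = \frac{1}{\sqrt{2\pi}\,x}e^{-x^2/2} - \frac{1}{\sqrt{2\pi}}\int_x^{\infty} \frac{e^{-t^2/2}}{t^2}\,dt. \]
But
\[ \frac{1}{\sqrt{2\pi}}\int_x^{\infty} \frac{e^{-t^2/2}}{t^2}\,dt < \frac{1}{x^2}\,\frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-t^2/2}\,dt = \frac{1}{x^2}Q(x). \]
Hence
\[ Q(x) > \frac{1}{\sqrt{2\pi}\,x}e^{-x^2/2} - \frac{1}{x^2}Q(x). \]
Now solve for $Q(x)$.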
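Solving the last inequality for $Q(x)$ gives $Q(x) > \frac{x}{1 + x^2}\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ for $x > 0$, while dropping the (positive) integral in the first display gives the upper bound $Q(x) < \frac{1}{\sqrt{2\pi}\,x}e^{-x^2/2}$. The sketch below compares both with $Q(x)$ evaluated through the standard identity $Q(x) = \frac{1}{2}\operatorname{erfc}(x/\sqrt{2})$; the function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), via Q(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_lower_bound(x):
    """Lower bound obtained by solving the hint's inequality for Q(x), x > 0."""
    return x / (1.0 + x * x) * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def q_upper_bound(x):
    """Upper bound from discarding the positive integral term, x > 0."""
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)

for x in (0.5, 1.0, 2.0, 4.0):
    print(x, q_lower_bound(x), q_function(x), q_upper_bound(x))
```

The two bounds pinch together as $x$ grows, which is why they are useful for estimating small error probabilities.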

5.18. Substitute $\omega = y - 1$ so $\omega' = 2\omega + 2$, $\omega(0) = 0$. For this shifted differential equation,
\[ \varphi_0(x) = 0, \]
\[ \varphi_1(x) = \int_0^x f(t, \varphi_0(t))\,dt = \int_0^x (2\varphi_0(t) + 2)\,dt = 2x, \]
\[ \vdots \]
\[ \varphi_n(x) = \frac{(2x)^n}{n!} + \cdots + \frac{(2x)^2}{2} + 2x. \]
Recognize $\varphi_n(x)$ as the Taylor polynomial of degree $n$ for $e^{2x} - 1$, hence $\varphi_n(x) \to e^{2x} - 1$. Therefore $y = e^{2x}$ solves the original differential equation.
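The Picard iterates for the shifted equation can be generated exactly with polynomial coefficients. In the sketch below a polynomial is stored as a list of coefficients in increasing powers of $x$, and each step forms $\int_0^x (2\varphi_k(t) + 2)\,dt$; the representation and the names `picard_step` and `picard` are choices made for this illustration.

```python
from fractions import Fraction

def picard_step(coeffs):
    """Map phi_k (coefficient list, lowest power first) to the integral of 2*phi_k(t) + 2 from 0 to x."""
    integrand = [2 * c for c in coeffs]
    integrand[0] += 2                       # add the constant term 2
    # integrate term by term; the constant of integration is 0 since phi(0) = 0
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(integrand)]

def picard(n):
    """Return the coefficients of phi_n, starting from phi_0 = 0."""
    phi = [Fraction(0)]
    for _ in range(n):
        phi = picard_step(phi)
    return phi

print(picard(4))   # coefficients of 2x + (2x)^2/2! + (2x)^3/3! + (2x)^4/4!
```

Each coefficient of $x^k$ equals $2^k/k!$, matching the Taylor polynomial of $e^{2x} - 1$.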

5.19. In general we want to choose a neighborhood $U_1$ of $\xi$ so that $\|G'(x)\| \le \alpha < 1$ for $x \in U_1$, so that $G$ is a contraction. For the case $n = 1$, $G(x) = x - F(x)/F'(x)$ and $G'(x) = F(x)F''(x)/(F'(x)^2)$. So if an initial guess $x$ is sufficiently close to $\xi$ so that $|F(x)F''(x)/(F'(x)^2)| \le \alpha < 1$, then convergence is guaranteed (at least in theory).
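For a concrete instance of the case $n = 1$, the sketch below takes $F(x) = x^2 - 2$ (an example chosen here, not taken from the exercise), evaluates the contraction factor $G'(x) = F(x)F''(x)/F'(x)^2$ at the starting guess, and runs the Newton iteration $x \leftarrow G(x)$. The function names are hypothetical.

```python
def newton_map_derivative(F, dF, d2F, x):
    """G'(x) = F(x) F''(x) / F'(x)^2 for the Newton map G(x) = x - F(x)/F'(x)."""
    return F(x) * d2F(x) / dF(x) ** 2

def newton(F, dF, x, steps=6):
    """Apply the Newton map a fixed number of times starting from x."""
    for _ in range(steps):
        x = x - F(x) / dF(x)
    return x

# Illustrative choice (an assumption): F(x) = x^2 - 2, whose positive root is sqrt(2).
F = lambda x: x * x - 2.0
dF = lambda x: 2.0 * x
d2F = lambda x: 2.0

x0 = 1.5
print("|G'(x0)| =", abs(newton_map_derivative(F, dF, d2F, x0)))
print("root estimate =", newton(F, dF, x0))
```

Because $G'(\xi) = 0$ at a simple root, the factor $|G'(x)|$ is small near $\xi$, which is what makes the contraction condition easy to satisfy for a good initial guess.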


References

[1] Abramowitz, M., and I. Stegun. Handbook of Mathematical Functions. New York: Dover, 1965.

[2] Alexander, N. Exploring Biomechanics: Animals in Motion. New York: Scientific American Library, 1992.

[3] Anderson, G., M. Vamanamurthy, and M. Vuorinen. Conformal Invariants, Inequalities, and Quasiconformal Maps. New York: Wiley, 1997.

[4] Andrews, L. Special Functions of Mathematics for Engineers. New York: McGraw-Hill, 1992.

[5] Arbel, B. “From ‘tricks’ to strategies for problem solving,” International Journal of Mathematical Education in Science and Technology, vol. 21, no. 3, 1990.

[6] Ballard, W. Geometry. Philadelphia: W.B. Saunders, 1970.

[7] Bazaraa, M., and C. Shetty. Nonlinear Programming. New York: Wiley, 1979.

[8] Beckenbach, E., and R. Bellman. An Introduction to Inequalities. New York: Random House, 1961.

[9] Blahut, R. Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.


[10] Borjesson, P., and C. Sundberg. “Simple approximations of the error function Q(x) for communications applications,” IEEE Transactions on Communications, vol. COM-27, no. 3, 1979.

[11] Brogan, W. Modern Control Theory. New York: Quantum, 1974.

[12] Bromwich, T. An Introduction to the Theory of Infinite Series. London: Macmillan, 1965.

[13] Chong, K. “A study of some inequalities involving the modulus signs,” International Journal of Mathematical Education in Science and Technology, vol. 12, no. 4, 1981.

[14] Chow, S., and J. Hale. Methods of Bifurcation Theory. New York: Springer-Verlag, 1982.

[15] Couch, L. Digital and Analog Communication Systems. New York: Macmillan, 1990.

[16] de Alwis, T. “Projectile motion with arbitrary resistance,” College Mathematics Journal, vol. 26, no. 5, 1995.

[17] Dennis, J., and R. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Englewood Cliffs, NJ: Prentice Hall, 1983.

[18] Dieudonne, J. Foundations of Modern Analysis. New York: Academic Press, 1960.

[19] Duffin, R. “Cost minimization problems treated by geometric means,” Operations Research, vol. 10, no. 5, pp. 668–675, 1962.

[20] Duffin, R. “Dual programs and minimum cost,” Journal of the Society for Industrial and Applied Mathematics, vol. 10, pp. 119–123, 1962.

[21] Duren, P. Theory of Hp Spaces. New York: Academic Press, 1970.

[22] Edwards, C. Advanced Calculus of Several Variables. New York: Academic Press, 1973.

[23] Eggleston, H. Elementary Real Analysis. Cambridge University Press, UK, 1962.

[24] Everitt, W. Inequalities: Fifty years on from Hardy, Littlewood, and Polya: Proceedings of the International Conference. New York: Dekker, 1991.

[25] Gelfand, I. Lectures on Linear Algebra. New York: Dover, 1989.

[26] Glaister, P. “Does what goes up take the same time to come down?,” College Mathematics Journal, vol. 24, no. 2, 1993.


[27] Gradshteyn, I., and I. Ryzhik. Table of Integrals, Series, and Products. Boston: Academic Press, 1994.

[28] Hardy, G., J. Littlewood, and G. Polya. Inequalities. Cambridge University Press, UK, 1952.

[29] Hobson, E. The Theory of Functions of a Real Variable and the Theory of Fourier’s Series, vol. I. New York: Dover, 1957.

[30] Indritz, J. Methods in Analysis. New York: Macmillan, 1963.

[31] Jerri, A. Introduction to Integral Equations with Applications. New York: Dekker, 1985.

[32] Jordan, D., and P. Smith. Nonlinear Ordinary Differential Equations. Clarendon Press, Oxford, UK, 1987.

[33] Kazarinoff, N. Geometric Inequalities. New York: Random House, 1961.

[34] Klamkin, M. (ed.) Problems in Applied Mathematics. Selections from SIAM Review. Philadelphia: SIAM, 1990.

[35] Knowles, J. “Energy decay estimates for damped oscillators,” International Journal of Mathematical Education in Science and Technology, vol. 28, no. 1, 1997.

[36] Lafrance, P. Fundamental Concepts in Communication. Englewood Cliffs, NJ: Prentice Hall, 1990.

[37] Landau, E. Differential and Integral Calculus (3rd ed.). New York: Chelsea, 1980.

[38] Lew, J., J. Frauenthal, and N. Keyfitz. “On the average distances in a circular disc,” in Mathematical Modeling: Classroom Notes in Applied Mathematics. Philadelphia: SIAM, 1987.

[39] Libeskind, S. “Summation of finite series — A unified approach,” Two-Year College Mathematics Journal, vol. 12, no. 1, 1981.

[40] Lutkepohl, H. Handbook of Matrices. Chichester: Wiley, 1996.

[41] Manley, R. Waveform Analysis. New York: Wiley, 1945.

[42] Marcus, M., and H. Minc. A Survey of Matrix Theory and Matrix Inequalities. New York: Dover, 1992.

[43] Marshall, A., and I. Olkin. Inequalities: Theory of Majorization and its Applications. New York: Academic Press, 1979.


[44] Meschkowski, H. Series Expansions for Mathematical Physicists. New York: Interscience, 1968.

[45] Mitrinovic, D. Analytic Inequalities. Berlin: Springer-Verlag, 1970.

[46] Mitrinovic, D., and Dragoslav, S. Recent Advances in Geometric Inequalities. Boston: Kluwer Academic, 1989.

[47] Oden, J. Applied Functional Analysis: A First Course for Students of Mechanics and Engineering Science. Englewood Cliffs, NJ: Prentice Hall, 1979.

[48] Papoulis, A. The Fourier Integral and its Applications. New York: McGraw-Hill, 1962.

[49] Patel, V. Numerical Analysis. New York: Saunders College, 1994.

[50] Polya, G., and G. Szego. Isoperimetric Inequalities in Mathematical Physics. Annals of Mathematics Studies No. 27. Princeton, NJ: Princeton University Press, 1951.

[51] Protter, M. Maximum Principles in Differential Equations. New York: Springer-Verlag, 1984.

[52] Ptak, V. “The Kantorovich inequality,” American Mathematical Monthly, vol. 102, no. 9, 1995.

[53] Rogosinski, W. Volume and Integral. New York: Interscience, 1952.

[54] Shannon, C. The Mathematical Theory of Communication. Urbana: University of Illinois Press, 1964.

[55] Stoer, J., and R. Bulirsch. Introduction to Numerical Analysis. New York: Springer-Verlag, 1980.

[56] Stratton, J. Electromagnetic Theory. New York: McGraw-Hill, 1941.

[57] Stromberg, K. An Introduction to Classical Real Analysis. Belmont, CA: Wadsworth International, 1981.

[58] Temme, N. Special Functions: An Introduction to the Classical Functions of Mathematical Physics. New York: Wiley, 1996.

[59] Watson, G. A Treatise on the Theory of Bessel Functions. Cambridge University Press, UK, 1944.

[60] Weinberger, H. A First Course in Partial Differential Equations with Complex Variables and Transform Methods. New York: Wiley, 1965.

Index

absolute value, 5
air resistance, 82
algebra of inequalities, 2
AM–GM inequality, 38, 49–51
arithmetic mean, 15, 39, 50
arithmetic–geometric mean, 124
asymptotic sequence, 70
axioms of order, 2

Bernoulli’s inequality, 37
Bessel’s inequality, 65
bound
    Chernoff, 111
    greatest lower, 4
    least upper, 4
bounded
    above, below, 4
    function, 19

capacitance, 90
Cauchy mean value theorem, 25
Cauchy sequence, 54, 59
Cauchy–Schwarz inequality, 44, 60, 61, 66
Chebyshev’s inequality, 45, 51, 110
communication, 112
complementary exponents, 41
complete, 55
continuity, 20, 62
contraction mapping, 56
    theorem, 56
control, 109
convergence, 6, 54
    in mean, 123
    pointwise, 69
    uniform, 69, 123
convolution, 68

dense, 36
DMS, 114
duality, 121

eigenvalues, 93
elliptic integral, 125
entropy, 114
equation
    Laplace, 89
    Poisson, 88
estimation of integrals, 29, 67
Euler method, 75


Euler’s constant, 36
Euler’s formula, 84
extrema, 21

Fibonacci numbers, 15
fixed point, 56
Fourier
    coefficients, 65
    series, 65, 69, 85
    transform, 101
Fredholm integral equation, 116
function
    autocorrelation, 124
    Bessel, 78, 82, 124
    bounded, 19
    coerror, 125
    complementary error, 124
    continuity, 20
    convex, 46
    convolution, 68
    decreasing, 21
    elliptic integral, 125
    exponential integral, 77, 123
    gamma, 77, 123
    increasing, 21
    Lyapunov, 104
    monotonic, 21, 35
    square integrable, 51
    terminology and facts, 19
    transfer, 108

geometric mean, 15, 39, 50
geometric shapes, 84
Gibb’s phenomenon, 100
give a little, 11
Green’s formula, 90

Holder’s inequality, 40, 42, 43, 50
harmonic mean, 15, 40, 50
harmonic–geometric–arithmetic means inequality, 50
Heron’s formula, 87
Hilbert space, 62

implicit function theorem, 119
inequality
    AM–GM, 38, 49–51
    Bernoulli, 37
    Bessel, 65
    Cauchy–Schwarz, 44, 60, 61, 66
    Chebyshev, 45, 51, 110
    convexity, 46
    Holder, 40, 42, 43, 50
    harmonic–geometric–arithmetic means, 50
        for integrals, 50
    integral, 21
    isoperimetric, 85
    Jensen, 46, 51, 52
    Jordan, 29
    Kantorovich, 50
    logarithmic, 27
    Markov, 110
    Minkowski, 43, 61, 66
    modulus, 22
    of the means, 38, 49–51
    Ostrowski, 35
    quadratic, 5
    Schur, 97
    triangle, 7, 58
    Weierstrass, 14
    Young, 38, 41, 49
infimum, 4
information, 113, 114
inner product, 59, 62
inner product space, 59
intermediate value property, 21
isoperimetric inequality, 85
iterative process, 56

Jensen’s inequality, 46, 51, 52
Jordan’s inequality, 29

Kantorovich’s inequality, 50

l’Hopital’s monotone rule, 25
Lagrange interpolation, 72
Laplace equation, 89


Laplace transform, 108
law
    parallelogram, 61
    transitive, 3
limit, 6
limit point, 54
linear space, 57
Lipschitz condition, 33, 117
logarithmic inequality, 27
Lyapunov function, 104

mapping, 55
    continuous, 55
    contraction, 56
Markov’s inequality, 110
matched filter, 112
matrix, 93
    conjugate transpose, 93
    eigenvalues, 93
    Hermitian, 93
    norm, 98
    symmetric, 94
    trace, 96
maximum, 4
mean
    arithmetic, 15, 39, 50
    arithmetic–geometric, 124
    geometric, 15, 39, 50
    harmonic, 15, 40, 50
mean value theorem, 24
    Cauchy, 25
    for integrals, 23, 35
metric, 54
metric space, 53
minimizing vector, 62
minimum, 4
Minkowski’s inequality, 43, 61, 66
modulus inequality, 22
monotonicity
    conditions for, 25
monotonicity, conditions for, 25
motor control, 109

neighborhood, 6, 54
Neumann series, 117
Newton’s method, 120, 125
noise, 112
norm, 58, 62
    L2, 58
    Lp, 97
    compatible, 99
    cubic, 98
    Frobenius, 98
    induced, 61
    matrix, 97
    supremum, 58

order
    exponential, 34
    symbols O, o, 19
orthogonal projection, 62
orthogonality, 62
orthonormal set, 62
Ostrowski’s inequality, 35

parallelogram law, 61
Parseval’s identity, 85
persistence of sign, 20, 55
Picard iteration, 56, 125
Picard–Lindelof theorem, 117
Platonic solids, 85
Poisson’s equation, 88
polyhedron, 84
polynomial, 82
    Chebyshev, 78
    Legendre, 79
    orthogonal, 81
    zeros of, 81
positive definite, 95, 104
principle
    Dirichlet, 91
    maximum, 89
    squeeze, 14, 20
    uncertainty, 102
projectile problem, 82
Pythagorean theorem, 62

quadratic form, 94
quadratic inequality, 5


results from differential calculus, 23
Riemann
    integral, 21
    lemma, 65
    sum, 21
Rolle’s theorem, 25

Schur’s inequality, 97
second derivative test, 26, 95
sequence, 6
    asymptotic, 70
    increasing, 10
    monotonic, 17
set
    closed, 54
    open, 54
Simpson’s rule, 72, 123
space
    lp, 65
    function, 58
    Hilbert, 62
    inner product, 59
    linear, 57
    metric, 53
        complete, 55
    normed linear, 58
spectral radius, 99
spring, nonlinear, 105
square integrable, 51, 68
squeeze principle, 14, 20
stability, 103, 107
    asymptotic, 103
    BIBO, 107
    exponential, 103
    Lyapunov, 103
Steiner symmetrization, 92
successive approximations, 56
supremum, 4
Sylvester’s criterion, 94

Taylor’s method, 74, 123
Taylor’s theorem, 24
telescoping property, 15
theorem
    contraction mapping, 56
    fundamental, of calculus, 23
    implicit function, 119
    l’Hopital’s monotone, 25
    mean value, 24
        Cauchy, 25
        for integrals, 23, 35
    Picard–Lindelof, 117
    Pythagorean, 62
    Rolle, 25
    Taylor, 24
transfer function, 108
transform
    Fourier, 101
    Laplace, 108
triangle inequality, 7, 58

Wallis’s product, 34
Weierstrass’s inequality, 14

Young’s inequality, 38, 41, 49

zeros of polynomials, bounds, 9
