
Math 133 Volume Stewart §5.2

Geometry of integrals. In this section, we will learn how to compute volumes using integrals defined by slice analysis. First, we recall from Calculus I how to compute areas. Given the region under a graph $y = f(x)$ and above an interval $[a, b]$ on the x-axis, we slice up the interval into $n$ increments (small parts) of width $\Delta x = \frac{b-a}{n}$, with division points:

$$a < a+\Delta x < a+2\Delta x < \cdots < a+n\Delta x = b,$$

and we take sample points $x_1, \ldots, x_n$, one in each increment. This slices the area into increments $\Delta A_1, \ldots, \Delta A_n$, each approximately a rectangle.

The area of an increment is approximately:

$$\Delta A_i \approx (\text{height}) \times (\text{width}) = f(x_i)\,\Delta x,$$

so the total area is the Riemann sum (see §4.1, I):

$$A = \sum_{i=1}^{n} \Delta A_i \approx \sum_{i=1}^{n} f(x_i)\,\Delta x.$$

Taking the limit of very many, very thin slices defines the integral, which computes the exact area:

$$A = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i)\,\Delta x = \int_a^b f(x)\,dx.$$

This only gives approximate numerical answers to our question, though, unless we can determine the limit, which is very difficult directly (§4.1, II).

So far, there is no really surprising idea here: for thousands of years, mathematicians used similar methods to compute areas, with very limited success. The genius of Newton shines in three ideas which solve the problem, easily computing almost any area (§4.3). The first idea is to make area into a function: $A(x) = \int_a^x f(t)\,dt$ is the area over a variable interval $t \in [a, x]$.

Second, we find that $A'(x) = f(x)$: amazingly, the area function is an antiderivative of $f(x)$, because the rate of change of area is the height of the graph. That is, as $x$ moves rightward, $A(x)$ increases quickly or slowly depending on how high $f(x)$ is: this is the First Fundamental Theorem.

Notes by Peter Magyar [email protected]

The third and clinching idea is the algebra of differentiation. If we can reverse the derivative rules to find another antiderivative, a known function $F(x)$ with $F'(x) = f(x)$, then the Uniqueness Theorem tells us that the two antiderivatives are the same, except for adding some constant: $A(x) = \int_a^x f(t)\,dt = F(x) + C$. Since $0 = A(a) = F(a) + C$, we must have $C = -F(a)$, so $A(x) = F(x) - F(a)$, the Second Fundamental Theorem:

$$A = A(b) = \int_a^b f(x)\,dx = F(b) - F(a).$$

That is, if $f(x)$ is the rate of change of $F(x)$, so that $\Delta F_i = f(x_i)\,\Delta x$ are the small changes (increments) in $F(x)$, then the integral is the total change in $F(x)$ over the interval $[a, b]$.∗

These three brilliant ideas, working perfectly together, make it easy to compute the areas of most shapes, provided we are skillful enough at reversing derivative rules to find indefinite integrals.

example: Find the area of the leftmost region enclosed below the curve $y = x\cos(x^2)$ and above the line $y = -x$.

The left corner of the region is at $x = 0$. To find the right corner, we set $x\cos(x^2) = -x$ and get $x(1+\cos(x^2)) = 0$, whose smallest positive solution is $x = \sqrt{\pi} \approx 1.77$.

To compute the area, we consider thin slices from the top curve to the bottom line, with height $x\cos(x^2) - (-x) = x\cos(x^2) + x$, and take a limit of Riemann sums to get:

$$A = \int_0^{\sqrt{\pi}} \left(x\cos(x^2) + x\right) dx = \int_0^{\sqrt{\pi}} x\cos(x^2)\,dx + \int_0^{\sqrt{\pi}} x\,dx.$$

We can evaluate the first term using the substitution $u = x^2$, $du = 2x\,dx$, so

$$\int x\cos(x^2)\,dx = \tfrac12 \int \cos(x^2)\,2x\,dx = \tfrac12 \int \cos(u)\,du = \tfrac12 \sin(u) = \tfrac12 \sin(x^2).$$

Thus:

$$A = \left[\tfrac12 \sin(x^2) + \tfrac12 x^2\right]_{x=0}^{x=\sqrt{\pi}} = \tfrac12 \pi \approx 1.57.$$

That looks about right from the picture. Not a result you could get from elementary geometry!
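As a numerical sanity check on this example, the following Python sketch approximates the same area by a Riemann sum. The helper name `riemann_sum` and the choice of midpoints as sample points are our own assumptions for illustration; any sample point in each increment would serve.

```python
import math

def riemann_sum(f, a, b, n):
    """Riemann sum of f over [a, b] with n equal increments (midpoint samples)."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def slice_height(x):
    # Height of a thin slice: top curve x*cos(x^2) minus bottom line -x.
    return x * math.cos(x ** 2) + x

area = riemann_sum(slice_height, 0.0, math.sqrt(math.pi), 10_000)
print(area, math.pi / 2)  # the two values should agree to several decimals
```

With more slices the sum moves closer to the exact value found from the antiderivative.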

∗The indefinite integral $\int f(x)\,dx = F(x) + C$ denotes the general antiderivative function, also called the primitive function. By contrast, the definite integral $\int_a^b f(x)\,dx = F(b) - F(a)$ is a number.

Solids of revolution. The integral is a very powerful tool to compute the total of many small parts, such as the thin slices which fill up an area. A similar method allows us to compute volumes. We start with a solid of revolution, which is formed by rotating the region under a graph $y = f(x)$ and above an interval $[a, b]$ around the x-axis, sweeping out a kind of barrel shape.

The total volume is the sum of $n$ disk slices at sample points $x_1, \ldots, x_n$ in $[a, b]$, each with radius $f(x_i)$ and thickness $\Delta x = \frac{b-a}{n}$.

The volume of each slice is approximately:

$$\Delta V_i \approx (\text{circle area}) \times (\text{thickness}) = \pi f(x_i)^2\,\Delta x,$$

and taking the limit as $n \to \infty$ gives the exact volume as an integral:

$$V = \lim_{n\to\infty} \sum_{i=1}^{n} \pi f(x_i)^2\,\Delta x = \int_a^b \pi f(x)^2\,dx.$$

Once we express our answer as an integral, we no longer consider its geometric motivation: finding an antiderivative and determining the value is a purely algebraic problem.

This analysis should not be surprising: we expect our answer to converge to an integral as soon as we can slice up the problem into small increments.

example: Find the volume of the trumpet solid obtained by rotating around the x-axis the region defined by: $0 \le y \le \frac1x$ and $1 \le x \le 2$. This is the solid of revolution of the curve $y = \frac1x$ over the interval $x \in [1, 2]$.

Applying our formula:

$$V = \int_1^2 \pi\left(\frac1x\right)^2 dx = \int_1^2 \pi x^{-2}\,dx = \left[-\pi x^{-1}\right]_{x=1}^{x=2} = (-\pi\, 2^{-1}) - (-\pi\, 1^{-1}) = \frac{\pi}{2}.$$
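The disk formula can be checked numerically before any antiderivative is found. The sketch below sums the volumes $\pi f(x_i)^2\,\Delta x$ of thin disks; the function name `disk_volume` and the midpoint sample points are illustrative assumptions, not part of the notes.

```python
import math

def disk_volume(f, a, b, n=100_000):
    """Approximate the volume of the solid of revolution of y = f(x) about the x-axis."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_i = a + (i + 0.5) * dx               # sample point in the i-th increment
        total += math.pi * f(x_i) ** 2 * dx    # (circle area) x (thickness)
    return total

trumpet = disk_volume(lambda x: 1 / x, 1.0, 2.0)
print(trumpet, math.pi / 2)
```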

example: Find the volume of the solid obtained by rotating the region defined by $x^2 \le y \le 2$ and $x \ge 1$, around the vertical axis $x = 1$.

The setup is different from the x-axis rotation, so we must repeat our slice analysis. Since we rotate around the vertical line $x = 1$, we should take horizontal slices positioned by their height $y$, and we should write our region as lying next to the interval $y \in [1, 2]$ between the curves $x = 1$ and $x = \sqrt{y}$.

Our slices are thin horizontal disks at sample heights $y_1, \ldots, y_n$ in the interval $y \in [1, 2]$. The thickness of each disk is $\Delta y$. The radius at height $y_i$ is the horizontal distance from the axis $x = 1$ to the curve $x = \sqrt{y}$; this distance is $\sqrt{y_i} - 1$. The volume of each disk is $\Delta V_i \approx \pi(\sqrt{y_i} - 1)^2\,\Delta y$, and:

$$V = \int_1^2 \pi(\sqrt{y} - 1)^2\,dy = \int_1^2 \pi(y - 2\sqrt{y} + 1)\,dy = \left[\pi\left(\tfrac12 y^2 - \tfrac43 y^{3/2} + y\right)\right]_{y=1}^{y=2} = \pi\left(\tfrac{23}{6} - \tfrac{8\sqrt{2}}{3}\right).$$

Solids with specified cross sections. Let us consider a solid rather like the sail of the man-of-war jellyfish. It has a base outlined by the circle $x^2 + y^2 = 1$, and its vertical cross section over each line $x = c$ is an isosceles right triangle, with height equal to half its base.

The slices are defined over $x \in [-1, 1]$, with each slice at $x = x_i$ having thickness $\Delta x$ and area $\tfrac12(\text{base}) \times (\text{height}) = \tfrac12\left(2\sqrt{1 - x_i^2}\right)\left(\sqrt{1 - x_i^2}\right)$. Thus:

$$V = \int_{-1}^{1} \tfrac12\left(2\sqrt{1 - x^2}\right)\left(\sqrt{1 - x^2}\right) dx = \int_{-1}^{1} (1 - x^2)\,dx = \left[x - \tfrac13 x^3\right]_{x=-1}^{x=1} = \tfrac43.$$

Method of slice analysis to compute size. We have now seen several cases where we compute the size or bulk of geometric objects. Let $S$ denote any measure of the size of an object: length, area, volume, mass, etc.

1. Cut the object into slices whose position is determined by some variable $x \in [a, b]$.

2. Mark off the interval $[a, b]$ into $n$ increments of width $\Delta x = \frac{b-a}{n}$, each with a sample point $x_i$. This splits the object into $n$ slices, and summing up their sizes gives the total size: $S = \sum_{i=1}^{n} \Delta S_i$.

3. Because the slice at $x_i$ is so thin, we can find a good approximation of its size by some simple formula of the form $\Delta S_i \approx f(x_i)\,\Delta x$.

4. Taking $n \to \infty$ and $\Delta x \to 0$, the approximation becomes exact:

$$S = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i)\,\Delta x = \int_a^b f(x)\,dx.$$

5. Having expressed $S = \int_a^b f(x)\,dx$, we evaluate this integral by algebraic or numerical techniques.
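The five steps translate almost directly into a short numerical routine. The sketch below is one possible rendering, not the only one: the name `total_size`, the default slice count, and the midpoint sample points are assumptions made for illustration. It is applied to the two volume examples above (rotation about $x = 1$, and the triangular cross sections over the unit circle).

```python
import math

def total_size(slice_formula, a, b, n=100_000):
    """Steps 2-4: add up f(x_i) * dx over n thin slices of [a, b]."""
    dx = (b - a) / n
    return sum(slice_formula(a + (i + 0.5) * dx) for i in range(n)) * dx

# Rotation about x = 1: disks of radius sqrt(y) - 1 for y in [1, 2].
v_rotation = total_size(lambda y: math.pi * (math.sqrt(y) - 1) ** 2, 1.0, 2.0)

# Triangular cross sections over the unit circle: slice area 1 - x^2.
v_sail = total_size(lambda x: 1 - x * x, -1.0, 1.0)

print(v_rotation, math.pi * (23 / 6 - 8 * math.sqrt(2) / 3))
print(v_sail, 4 / 3)
```

Only the slice formula changes from problem to problem; the summation itself is always the same.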

challenge problem: Consider the solid of revolution around the x-axis of the curve $y = f(x)$ over the interval $x \in [a, b]$. Explain the following formula for the surface area of this solid:

$$S = \int_a^b 2\pi f(x)\sqrt{1 + (f'(x))^2}\,dx.$$

Hint: $\sqrt{(\Delta x)^2 + (\Delta y)^2} = \sqrt{1 + \left(\frac{\Delta y}{\Delta x}\right)^2}\,\Delta x \approx \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,\Delta x$.

Math 133 Work Stewart §5.4

Energy and work. The concept of energy in physics unifies a variety of everyday concepts which can all be converted into each other by doing work.

• Kinetic energy is in the motion of a mass, and can do work by pushing.

• Potential energy lies in the position of an object pulled by a force field: water behind a dam can do work as gravity pulls it down.

• Chemical energy, in the ions of a battery, can work a machine; in the molecular bonds of our food, it powers our muscles.

• Heat energy, which is actually the kinetic energy of jiggling molecules, can do work in a steam engine.

• Nuclear energy in a uranium atom is the potential energy of protons whose positive electric charges explosively repel each other. (The protons are barely held together by the nuclear strong force.)

Conservation of energy is a universally confirmed law of physics: energy is never created or destroyed, only converted from one form to another. Some familiar units of energy are watt-hours of electricity, calories in food, and tons of TNT explosive.

Mechanical work is a change in kinetic and potential energy by applying force to an object. Formally, work = (force applied) times (distance moved):

$$W = F \cdot s$$

example: If a 5 pound weight is lifted 10 feet, what is the work done against gravity by the lifting force? The force of gravity is measured by weight, so:

$$W = F \cdot s = (5\ \text{lb}) \cdot (10\ \text{ft}) = 50\ \text{ft-lb}.$$

Here foot-pounds (ft-lb) is another unit of energy.∗

example: If a 5 kilogram weight is lifted 2 meters, what is the work done against gravity? This computation is more complicated, because in metric units we do not have the shortcut of using pounds as a measure for both mass (how hard it is to shove an object) and force (how hard gravity pulls the object). Rather:

(force of gravity) = (mass) · (acceleration of gravity)

$$F = (5\ \text{kg}) \cdot (9.8\ \text{m/sec}^2) = 49\ \text{kg-m/sec}^2 = 49\ \text{N}.$$

∗50 ft-lb ≈ 0.02 watt-hours, meaning it would take a 1 watt toy electric motor 0.02 hours, about a minute, to winch up the weight, not counting wasted energy. This is also about 0.02 food calories: it takes a lot of lifting to burn up that candy bar (though you do a lot of extra work in moving and heating your body).

Here newtons (N) are the metric unit of force, with $1\ \text{N} = 1\ \text{kg-m/sec}^2 \approx 0.22$ lb. Rather than going through this computation, Webwork problems will usually tell you the force on a mass in newtons, or the force density in newtons per meter of rope or per cubic meter of liquid.

Now we can find:

(work) = (force) · (distance)

$$W = F \cdot s = (49\ \text{N}) \cdot (2\ \text{m}) = 98\ \text{N-m} = 98\ \text{J}.$$

Here joules (J) are the metric unit of energy, with 1 joule = 1 newton-meter = 1 watt-sec ≈ 0.74 ft-lb.

Work against a spring force. Imagine a spring stretching along the x-axis, with its left end fixed at some negative point, and its natural length placing its right end at the equilibrium position $x = 0$. Hooke’s Law says that the force required to hold the spring in an arbitrary position $x$ is:

$$F(x) = kx,$$

where the spring constant $k > 0$ depends on the physical properties of the spring.†

Thus, if the spring is stretched out to a positive $x$, it requires a force in the positive direction to keep it from contracting; and if it is compressed to a negative $x$, it requires a force in the negative direction to keep it from bouncing back.

example: How much work is done in stretching a spring from $x = 2$ m to $x = 5$ m, assuming the initial condition $F(2) = 1$ N? First, since $1 = F(2) = (k)(2)$, we know that $k = \tfrac12$, and $F(x) = \tfrac12 x$ N.

Since the force varies with $x$, we must split up the work $W$ into small increments $\Delta W_1, \ldots, \Delta W_n$, whose sum approaches an integral (see Method of Slice Analysis, end of §5.2). That is, we slice the interval $x \in [2, 5]$ into $n$ increments of length $\Delta x$, and take sample points $x_1, \ldots, x_n$. Then:

(work increment) ≈ (force at $x_i$) · (distance increment)

$$\Delta W_i \approx F(x_i) \cdot \Delta x = \tfrac12 x_i\,\Delta x.$$

The total work is given by a Riemann sum approaching an integral:

$$W = \lim_{n\to\infty} \sum_{i=1}^{n} \Delta W_i = \lim_{n\to\infty} \sum_{i=1}^{n} \tfrac12 x_i\,\Delta x = \int_2^5 \tfrac12 x\,dx = \tfrac14 x^2\Big|_{x=2}^{x=5} = 5.25\ \text{J}.$$
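The work increments can also be added up directly. In the sketch below, `spring_force` and `work_done` are hypothetical names chosen for illustration, and the midpoint of each increment is used as its sample point.

```python
def spring_force(x, k=0.5):
    """Hooke's Law force in newtons at position x (meters), with k = 1/2 as found above."""
    return k * x

def work_done(force, a, b, n=100_000):
    """Sum the work increments F(x_i) * dx over [a, b]."""
    dx = (b - a) / n
    return sum(force(a + (i + 0.5) * dx) * dx for i in range(n))

spring_work = work_done(spring_force, 2.0, 5.0)
print(spring_work)  # close to the 5.25 J obtained from the integral
```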

†In practice, this law only holds for x not too far from 0.

Work against gravity. An underground tank is 6 ft × 6 ft wide and 10 ft deep, with its top at ground level. How much work is done in pumping all the water up to ground level?

Draw an axis with the bottom of the tank at $x = 0$, the top at $x = 10$, and slice the tank into $n$ thin horizontal slices, each with thickness $\Delta x$.

The force on each slice is its weight:

(force) = (weight) = (volume) · (density of water)

$$F = \left((6)(6)(\Delta x)\ \text{ft}^3\right) \cdot \left(62.4\ \text{lb/ft}^3\right) = c\,\Delta x\ \text{lb}, \quad \text{where } c = 2246.4.$$

The force $F$ is the same for each slice, but the distance to be lifted from height $x_i$ is $s_i = 10 - x_i$, so the increment of work is:

$$\Delta W_i = F \cdot s_i = (c\,\Delta x) \cdot (10 - x_i) = c\,(10 - x_i)\,\Delta x.$$

Thus the total work is:

$$W = \lim_{n\to\infty} \sum_{i=1}^{n} \Delta W_i = \lim_{n\to\infty} \sum_{i=1}^{n} c\,(10 - x_i)\,\Delta x = \int_0^{10} c\,(10 - x)\,dx = c\left(10x - \tfrac12 x^2\right)\Big|_{x=0}^{x=10} = 112{,}320\ \text{ft-lb}.$$
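The same slice-by-slice bookkeeping can be written out as a short program. This is only a sketch: `pumping_work` and its parameters are names we introduce here, with the density 62.4 lb/ft³ taken from the computation above.

```python
WATER_DENSITY = 62.4  # lb per cubic foot, as used in the text

def pumping_work(side, depth, n=100_000):
    """Work (ft-lb) to pump a full square tank of given side and depth up to its top."""
    dx = depth / n
    total = 0.0
    for i in range(n):
        x_i = (i + 0.5) * dx                           # height of the slice above the bottom
        weight = side * side * dx * WATER_DENSITY      # (volume) x (density)
        total += weight * (depth - x_i)                # (force) x (distance lifted)
    return total

tank_work = pumping_work(6.0, 10.0)
print(tank_work)  # close to 112,320 ft-lb
```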

Math 133 Inverse Functions Stewart §6.1

What is a function? Like other mathematical concepts, the idea of a function has four levels of meaning: physical (word problems), geometric (graphs), numerical (spreadsheets), and algebraic (formulas). We illustrate these with the following example.

• Physical. We throw a stone upward off a 25 m building, with an initial vertical speed of 20 m/sec; and we assume gravitational acceleration of 10 m/sec² downward. Let $s$ be the height of the stone in meters, $t$ sec after the throw. Thus, the function $s = f(t)$ is defined by looking to see the height at a given time. We limit our experiment to $t \in [0, 5]$.

• Geometric. The stone’s height is plotted approximately by:

• Numerical. We collect the following observations for height each second:

t           0    1    2    3    4    5
s = f(t)   25   40   45   40   25    0

This is only a rough model: we could take a smaller increment for the time inputs, or indeed imagine an infinite table to capture all inputs and outputs of $f(t)$.

• Algebraic: $s = f(t) = 25 + 20t - 5t^2$. This agrees with the physical version because the initial height is $f(0) = 25$; the initial velocity is $f'(0) = [20 - 10t]_{t=0} = 20$; and the constant acceleration is $f''(t) = -10$, negative meaning downward. This formula is only valid for $t \in [0, 5]$, since the stone stops at the ground, and like all physical models, it is slightly thrown off by subtle factors such as air resistance.

What is an inverse function? Continuing our example, we examine the inverse of the function $s = f(t)$ above. This means reversing the roles of input and output (the independent and dependent variables), so that time $t$ becomes a function of height $s$: in symbols $t = f^{-1}(s)$. That is, $f^{-1}$ is the rule that tells us at what time $t$ the stone reaches a given height $s$.

Here we encounter a problem: since the stone rises and then falls, it reaches a given height $s$ at two different times: for example, $f(1) = f(3) = 40$. Thus $f^{-1}(40) = 1$ and/or $3$, with no single output, so $f^{-1}$ is not a function. This happens because the original function $f$ is not one-to-one: instead of taking different inputs to different outputs, it takes two different inputs $t = 1, 3$ to the same output $s = 40$. In graphical terms, $s = f(t)$ fails the horizontal line test, meaning a horizontal line like $s = 40$ intersects the graph more than once.

To fix this problem, we must restrict the domain of our function: we will look at the stone only for $t \in [2, 5]$ (solid part of the graph). Technically, this defines a new function:

$$f : [2, 5] \to [0, 45],$$

meaning the only allowed inputs are $t \in [2, 5]$, which produce outputs covering the interval $s \in [0, 45]$. This restricted function is one-to-one, satisfying the horizontal line test, so we can get an inverse function:

$$f^{-1} : [0, 45] \to [2, 5],$$

which again has four meanings:

• Physical. The inverse function $t = f^{-1}(s)$ gives the unique time $t \in [2, 5]$ for which the stone is at height $s$.

• Geometric. The graph $t = f^{-1}(s)$ is the original graph $s = f(t)$ flipped diagonally, so as to switch the vertical and horizontal axes.

• Numerical. We restrict the $s = f(t)$ table to $t \in [2, 5]$, and switch the two rows so that $s$ is the input row, $t$ is the output row.

s              45   40   25    0
t = f^{-1}(s)   2    3    4    5

We can tidy this, rearranging and supplementing the data columns:

s               0     5    10    15    20    25    30    35    40    45
t = f^{-1}(s)  5.0   4.8   4.6   4.4   4.2   4.0   3.7   3.4   3.0   2.0

• Algebraic: We must go from $s = 25 + 20t - 5t^2$ to $t$ = formula in $s$. This just means to solve the original equation for $t$ in terms of $s$. The Quadratic Formula solves $ax^2 + bx + c = 0$ as $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$, so:

$$s = 25 + 20t - 5t^2 \iff (-5)t^2 + 20t + (25 - s) = 0$$

$$\iff t = \frac{-20 \pm \sqrt{20^2 - 4(-5)(25 - s)}}{2(-5)} = 2 \pm \sqrt{9 - \tfrac15 s}.$$

The ± sign gives a choice between the two values $t_1, t_2$ in the original domain $t \in [0, 5]$ which correspond to $s$. We want the larger choice, namely the + sign, to get $t \in [2, 5]$ as required:

$$t = 2 + \sqrt{9 - \tfrac15 s}.$$

Note: In the relationships $s = f(t)$ and $t = f^{-1}(s)$, the variables $t, s$ are merely suggestive, recalling the physical meaning of these functions. On the algebraic level, we don’t really care what the variables mean, and we sometimes change letters to make $x$ the input variable for every function:

$$f(x) = 25 + 20x - 5x^2, \qquad f^{-1}(x) = 2 + \sqrt{9 - \tfrac15 x}.$$
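The numerical and algebraic views of the inverse can be compared in a few lines of Python. The sketch below simply encodes the two formulas above (the names `f`, `f_inv`, and `table` are ours) and regenerates the tidied table, rounding times to one decimal place.

```python
import math

def f(t):
    """Height of the stone (meters) at time t, valid for t in [0, 5]."""
    return 25 + 20 * t - 5 * t ** 2

def f_inv(s):
    """Time in [2, 5] at which the stone is at height s, for s in [0, 45]."""
    return 2 + math.sqrt(9 - s / 5)

# Rebuild the table: heights 0, 5, ..., 45 and the corresponding times.
table = [(s, round(f_inv(s), 1)) for s in range(0, 50, 5)]
for s, t in table:
    print(s, t)
```

Feeding each output of `f_inv` back into `f` returns the original height, which is the content of the Inverse Theorem stated next.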

Formal definitions.

Definition: Consider a function $f : A \to B$ with inputs in the set $A$ and outputs covering the set $B$. Suppose $f$ is one-to-one, meaning if $x_1 \ne x_2$, then $f(x_1) \ne f(x_2)$. (The graph $y = f(x)$ satisfies the horizontal line test.)

Then we define the inverse function $f^{-1} : B \to A$ as $f^{-1}(b) = a$, where $a \in A$ is the unique value with $f(a) = b$.

Inverse Theorem: The function $f$ and its inverse $f^{-1}$ satisfy:

$$f^{-1}(f(a)) = a \quad \text{and} \quad f(f^{-1}(b)) = b$$

for all $a \in A$, $b \in B$. That is, $f$ and $f^{-1}$ undo each other under composition.

The proof is just applying the definitions: if $f(a) = b$, then $f^{-1}(b) = a$, which means $f^{-1}(f(a)) = f^{-1}(b) = a$; and similarly for the other equation.

In our previous example, this means:

$$f^{-1}(f(t)) = 2 + \sqrt{9 - \tfrac15(25 + 20t - 5t^2)} = t,$$

$$f(f^{-1}(s)) = 25 + 20\left(2 + \sqrt{9 - \tfrac15 s}\right) - 5\left(2 + \sqrt{9 - \tfrac15 s}\right)^2 = s.$$

These equations are not obvious, but they must hold after simplification.

Another Example. For the function $y = f(x) = \frac{x+1}{x+2}$, we can get its inverse by solving for $x$ in terms of $y$:

$$y = \frac{x+1}{x+2} \iff x + 1 = (x+2)y$$

$$\iff x + 1 = xy + 2y$$

$$\iff x - xy = 2y - 1$$

$$\iff x(1 - y) = 2y - 1$$

$$\iff x = f^{-1}(y) = \frac{2y - 1}{1 - y}.$$

Changing the input variable to $x$, we get: $f^{-1}(x) = \frac{2x - 1}{1 - x}$.

Note that the natural domain of $f(x)$, the set of inputs for which the formula $\frac{x+1}{x+2}$ makes sense, is all $x \ne -2$. Also, we can write $f(x) = \frac{x+2-1}{x+2} = 1 - \frac{1}{x+2}$, so the range of $f(x)$, the set of all outputs, is all $y \ne 1$. This is reversed for the inverse function: $f^{-1}(y)$ has domain $y \ne 1$ and range $x \ne -2$.

Derivatives of inverse functions.

Inverse Derivative Theorem. Suppose $f : A \to B$ has inverse $f^{-1} : B \to A$, that $f(a) = b$ for some values $a, b$, and that $f(x)$ is differentiable at $x = a$ with $f'(a) \ne 0$. Then $f^{-1}(y)$ is differentiable at $y = b$:

$$(f^{-1})'(b) = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}.$$

In Leibniz notation with $y = f(x)$ and $x = f^{-1}(y)$, we have:

$$\left.\frac{dx}{dy}\right|_{y=b} = \frac{1}{\left.\frac{dy}{dx}\right|_{x=a}}.$$

Proof: This is most clear geometrically, considering that in the graph $y = f(x)$, we have $f'(a) \approx \frac{\Delta y}{\Delta x}$, the rise-over-run for a small interval near $x = a$. In the inverse graph $x = f^{-1}(y)$, the vertical and horizontal increments (rise and run) are switched, so that $(f^{-1})'(b) \approx \frac{\Delta x}{\Delta y} \approx \frac{1}{f'(a)}$. Taking $\Delta x, \Delta y \to 0$ turns the approximations into exact equalities in the limit.

A different, algebraic proof comes from the equations in the Inverse Theorem. Taking $x$ as the input variable, we have $f(f^{-1}(x)) = x$. Applying the Chain Rule with $f^{-1}(x)$ as the inside function gives:

$$[f(f^{-1}(x))]' = (x)' \implies f'(f^{-1}(x)) \cdot (f^{-1})'(x) = 1 \implies (f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))},$$

which is the formula of the Theorem when $x = b$.

example: Let $y = f(x) = x^3 + x + 1$. Since $f'(x) = 3x^2 + 1 > 0$ for all $x$, we see that $f(x)$ is increasing everywhere, and different inputs $x_1 < x_2$ must go to different outputs $f(x_1) < f(x_2)$; thus $f(x)$ is one-to-one. Hence there is an inverse function $f^{-1}(x)$, even though there is no neat formula for it.

Nevertheless, we know $f(1) = 3$ and $f'(1) = 4$, so:

$$(f^{-1})'(3) = \frac{1}{f'(f^{-1}(3))} = \frac{1}{f'(1)} = \frac14.$$
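Even without a formula for $f^{-1}$, its values can be computed numerically, which gives an independent check of the answer $\tfrac14$. The sketch below inverts $f$ by bisection (our choice of method, relying only on $f$ being increasing) and then estimates the slope of $f^{-1}$ at $3$ by a difference quotient; the bracket $[-10, 10]$, the tolerance, and the step `h` are illustrative assumptions.

```python
def f(x):
    return x ** 3 + x + 1

def f_inv(y, lo=-10.0, hi=10.0, tol=1e-13):
    """Invert the increasing function f by bisection, assuming f(lo) < y < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = 1e-5
slope = (f_inv(3 + h) - f_inv(3 - h)) / (2 * h)  # rise over run for the inverse graph
print(f_inv(3), slope)
```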

Math 133 Natural Logarithms Stewart §6.2

Review of exponential and logarithm functions. We recall some facts from algebra, which we will later prove from a calculus point of view. In an expression of the form $a^p$, the number $a$ is called the base and the power $p$ is the exponent. An exponential function∗ is of the form $f(x) = a^x$. It is defined for rational $x = \frac{m}{n}$ by $a^{m/n} = \sqrt[n]{a \cdots a}$ ($m$ factors), $a^{-x} = 1/a^x$, and $a^0 = 1$; but this is much harder for irrational exponents like $a^{\sqrt{2}}$. We have addition and multiplication formulas: $a^{x_1} a^{x_2} = a^{x_1 + x_2}$ and $(a^x)^p = a^{px}$.

Given the exponential function $f(x) = a^x$, the logarithm function is the inverse $f^{-1}(x) = \log_a(x)$, as defined in §6.1.†

That is, $y = a^x$ is equivalent to $x = \log_a(y)$, and we have $\log_a(a^x) = x$ and $a^{\log_a(y)} = y$. Every fact about the exponential function corresponds to an inverse fact about the logarithm. Setting $y_1 = a^{x_1}$, $x_1 = \log(y_1)$; and $y_2 = a^{x_2}$, $x_2 = \log(y_2)$; the addition formula becomes:

$$a^{x_1} a^{x_2} = a^{x_1 + x_2} \implies y_1 y_2 = a^{\log(y_1) + \log(y_2)} \implies \log(y_1 y_2) = \log(y_1) + \log(y_2).$$

Setting $y = a^x$, $x = \log(y)$, the multiplication formula becomes:

$$(a^x)^p = a^{px} \implies y^p = a^{p \log(y)} \implies \log(y^p) = p \log(y).$$

example: Expand the expression $\log\left(\sqrt{\frac{x+1}{x-1}}\right)$ as much as possible into a sum of simple terms. Using the addition and multiplication formulas:

$$\log\left(\sqrt{\frac{x+1}{x-1}}\right) = \log\left(\left((x+1)(x-1)^{-1}\right)^{1/2}\right) = \tfrac12\left(\log(x+1) + (-1)\log(x-1)\right) = \tfrac12 \log(x+1) - \tfrac12 \log(x-1).$$

∗Do not confuse this with a power function of the form $f(x) = x^p$.

†If the base $a$ is understood, we write simply $\log(x)$. In science and engineering literature, if there is no base specified, we assume the base $a = 10$.

Natural exponential and logarithm. In the physical world, an exponential function $f(t) = a^t$ typically appears as the size of a population which is self-reproducing. This means the population growth rate, the number of births per unit time, is proportional to the current population size:

$$f'(t) = c\,f(t).$$

It is a fact (proven below) that any exponential function $f(t) = a^t$ satisfies this equation for some constant $c$.‡

The natural exponential function uses the unique choice of base $a = e = 2.71\cdots$ which makes the above constant $c = 1$. That is, if we write $f(x) = \exp(x) = e^x$, then:

$$f'(x) = f(x), \qquad \exp'(x) = \exp(x), \qquad (e^x)' = e^x.$$

The natural logarithm is the inverse function of $f(x) = \exp(x)$, namely $f^{-1}(x) = \ln(x) = \log_e(x)$. Applying the Inverse Derivative Theorem (§6.1) to $\ln(x)$, we get:

$$\ln'(x) = (f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))} = \frac{1}{\exp'(\ln(x))} = \frac{1}{\exp(\ln(x))} = \frac1x.$$

Amazingly, though the definition of $\ln(x)$ was complicated, its derivative is the extremely simple function $\frac1x$.

example: Find the derivative of $f(x) = \ln(x^2 + 1)$. Taking outside function $\ln(x)$ with $\ln'(x) = \frac1x$, and inside function $x^2 + 1$, the Chain Rule gives:

$$f'(x) = \ln'(x^2 + 1) \cdot (x^2 + 1)' = \frac{1}{x^2 + 1} \cdot (2x) = \frac{2x}{x^2 + 1}.$$

example: Find the derivative of $f(x) = \ln(\sin(x))$. From the Chain Rule:

$$f'(x) = \ln'(\sin(x)) \cdot \sin'(x) = \frac{1}{\sin(x)} \cdot \cos(x) = \cot(x).$$

example: Find the derivative of $f(x) = \frac{(x+1)^3 \sin^2(x)}{(2x+1)^5}$. We use the shortcut of logarithmic differentiation, in which we first take the logarithm of both sides, turning products into sums, then we differentiate:

$$\ln f(x) = 3\ln(x+1) + 2\ln(\sin(x)) - 5\ln(2x+1)$$

$$(\ln f(x))' = 3(\ln(x+1))' + 2(\ln(\sin(x)))' - 5(\ln(2x+1))'$$

$$\frac{1}{f(x)}\,f'(x) = 3\,\frac{1}{1+x} + 2\,\frac{1}{\sin(x)}\cos(x) - 5\,\frac{1}{2x+1}(2)$$

$$f'(x) = \frac{(x+1)^3 \sin^2(x)}{(2x+1)^5}\left(\frac{3}{1+x} + 2\cot(x) - \frac{10}{2x+1}\right).$$
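A formula produced by logarithmic differentiation is easy to test against a difference quotient. The sketch below does this at the sample point $x = 1$; the names `f`, `f_prime`, and `numeric_derivative`, the step size, and the test point are our own assumptions.

```python
import math

def f(x):
    return (x + 1) ** 3 * math.sin(x) ** 2 / (2 * x + 1) ** 5

def f_prime(x):
    """Derivative obtained above by logarithmic differentiation."""
    cot = math.cos(x) / math.sin(x)
    return f(x) * (3 / (1 + x) + 2 * cot - 10 / (2 * x + 1))

def numeric_derivative(g, x, h=1e-6):
    """Central difference quotient (g(x+h) - g(x-h)) / (2h)."""
    return (g(x + h) - g(x - h)) / (2 * h)

print(f_prime(1.0), numeric_derivative(f, 1.0))
```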

‡Mathematical laws in science are typically stated in such differential equations, in which an unknown function $f(t)$ has a specified relation with its rate of change $f'(t)$, its acceleration $f''(t)$, etc. For example, Newton’s law of universal gravitation is essentially $f''(t) = -c/f(t)^2$.

Logarithms and integrals. Reversing our new Basic Derivative $\ln'(x) = \frac1x$, we see $\ln(x)$ as an antiderivative of $\frac1x = x^{-1}$, a key function for which we previously knew no antiderivative. (The Basic Antiderivative for a power function $x^p$ would give the nonsense answer $\frac{1}{p+1}x^{p+1} = \frac10 x^0$.)

Hence the Second Fundamental Theorem of Calculus (§4.3) tells us:

$$\int_a^b \frac1x\,dx = \ln(b) - \ln(a).$$

Thus, we get $\int_1^x \frac1t\,dt = \ln(x) - \ln(1) = \ln(x)$.

Geometrically, $\ln(x)$ is the area under the curve $y = \frac1x$ and above the interval $[1, x]$.

This gives a way to approximate a natural logarithm as a Riemann sum: split the interval $[1, x]$ into $n$ increments of size $\Delta x = \frac{x-1}{n}$, take sample points $x_i = 1 + i\,\Delta x$ for $i = 1, \ldots, n$, and compute:

$$\ln(x) = \int_1^x \frac1t\,dt \approx \sum_{i=1}^{n} \frac{1}{x_i}\,\Delta x.$$

For example, for $x = 2$, $n = 100$, $\Delta x = 0.01$, we get:

$$\ln(2) \approx \tfrac{1}{1.01}(0.01) + \tfrac{1}{1.02}(0.01) + \cdots + \tfrac{1}{1.99}(0.01) + \tfrac{1}{2.00}(0.01) \approx 0.691,$$

which is a good approximation to the actual value $\ln(2) = 0.693\cdots$. Calculators and computers need some such approximation algorithm to compute values of $\ln(x)$: there is no “exact formula” in elementary terms.
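The Riemann sum above takes only a few lines to carry out. The following sketch uses the same right-endpoint sample points $x_i = 1 + i\,\Delta x$; the function name `ln_riemann` is our own.

```python
import math

def ln_riemann(x, n):
    """Approximate ln(x) by the Riemann sum of 1/t over [1, x] with n increments."""
    dx = (x - 1) / n
    return sum(1 / (1 + i * dx) for i in range(1, n + 1)) * dx

approx = ln_riemann(2.0, 100)
print(round(approx, 3), math.log(2))
```

Since $1/t$ is decreasing, right-endpoint samples slightly underestimate the area; increasing `n` shrinks the gap.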

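example: Compute $\int_a^b \frac{\ln(x)}{x}\,dx$. Although there does not appear to be any outside or inside function, we see that $\frac{\ln(x)}{x} = \ln(x) \cdot \frac1x = \ln(x) \cdot \ln'(x)$, so we can use the substitution $u = \ln(x)$, $du = \frac1x\,dx$, which changes the limits to $u = \ln(a)$ and $u = \ln(b)$:

$$\int_a^b \ln(x)\,\frac1x\,dx = \int_{\ln(a)}^{\ln(b)} u\,du = \left[\tfrac12 u^2\right]_{u=\ln(a)}^{u=\ln(b)} = \left[\tfrac12 \ln^2(x)\right]_{x=a}^{x=b}.$$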

example: A tricky integral: $\int \sec(x)\,dx$. There seems to be no convenient substitution, but with trig functions we can often use identities to transform our integral into a tractable form. In this case, we use a shrewd trick to introduce $\sec^2(x) = \tan'(x)$ and $\sec(x)\tan(x) = \sec'(x)$:

$$\int \sec(x)\,dx = \int \sec(x)\,\frac{\sec(x) + \tan(x)}{\sec(x) + \tan(x)}\,dx = \int \frac{1}{\sec(x) + \tan(x)} \cdot \left(\sec^2(x) + \sec(x)\tan(x)\right) dx.$$

Amazingly, this is perfectly set up for the substitution $u = \sec(x) + \tan(x)$, $du = (\sec^2(x) + \sec(x)\tan(x))\,dx$:

$$\int \frac{1}{\sec(x) + \tan(x)} \left(\sec^2(x) + \sec(x)\tan(x)\right) dx = \int \frac1u\,du = \ln|u| = \ln|\sec(x) + \tan(x)|.$$

Proofs. To formally prove all the basic facts about exponentials and logarithms, we base everything on the one connection between these complicated functions and an elementary function: $\ln'(x) = \frac1x$.

We now forget everything we previously stated about exponentials and logarithms, and build up our definitions from scratch, proving all properties.

Definition. For a given $x > 0$, we let: $\ln(x) = \int_1^x \frac1t\,dt$.

That is, having forgotten our previous definition of $\ln(x)$, we take the symbol $\ln(x)$ to mean the given integral, which we know how to compute with arbitrary accuracy. Given this, the First Fundamental Theorem (§4.3) immediately proves $\ln'(x) = \frac1x$. Next:

Theorem: (a) $\ln(x_1 x_2) = \ln(x_1) + \ln(x_2)$; (b) $\ln(x^p) = p \ln(x)$.

Proof. (a) For a constant $k > 0$, the derivative of $\ln(kx)$ is:

$$(\ln(kx))' = \ln'(kx) \cdot k = \frac{1}{kx} \cdot k = \frac1x = \ln'(x).$$

Since $\ln(kx)$ and $\ln(x)$ are both antiderivatives of $\frac1x$, we must have $\ln(kx) = \ln(x) + C$ for some constant $C$ (§3.9 Antiderivative Theorem). Setting $x = 1$, we get $\ln(k) = \ln(1) + C = C$, i.e. $C = \ln(k)$. Thus, $\ln(kx) = \ln(x) + \ln(k) = \ln(k) + \ln(x)$, which becomes the desired formula if we let $k = x_1$ and $x = x_2$.

(b) We use the same steps, starting from $(\ln(x^p))' = \frac{1}{x^p}\,p\,x^{p-1} = \frac{p}{x} = (p \ln(x))'$.

Definition: The function $f(x) = \ln(x)$ is one-to-one for $x > 0$, so it has an inverse function $f^{-1}(x) = \ln^{-1}(x)$. We name this inverse $\exp(x) = \ln^{-1}(x)$.

Indeed, since $\ln'(x) = \frac1x > 0$ for all $x > 0$, we know that $\ln(x)$ is increasing (that is, $x_1 < x_2$ guarantees $\ln(x_1) < \ln(x_2)$), so $\ln(x)$ is necessarily one-to-one. Thus the Inverse Theorem (§6.1) immediately proves $\exp(\ln(x)) = x$ and $\ln(\exp(x)) = x$.

Theorem: $\exp'(x) = \exp(x)$.

Proof. We apply the Inverse Derivative Theorem (§6.1) to $f(x) = \ln(x)$:

$$\exp'(x) = (f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))} = \frac{1}{\ln'(\exp(x))} = \frac{1}{\frac{1}{\exp(x)}} = \exp(x).$$

The exponential addition and multiplication formulas, $\exp(x_1)\exp(x_2) = \exp(x_1 + x_2)$ and $\exp(x)^p = \exp(px)$, follow from reversing our previous reasoning which proved the logarithm formulas from the exponential ones. We define the number $e = \exp(1)$, i.e. the unique number such that $\int_1^e \frac1t\,dt = 1$. Finally, we define a general exponential function as $a^x = \exp(\ln(a)x)$. The Chain Rule then gives $(a^x)' = \exp(\ln(a)x) \cdot \ln(a) = \ln(a)\,a^x$.

Math 133 Natural Exp and Log Stewart §6.3

Basic Properties. Here is pretty much all you need to know about the $\exp(x)$ and $\ln(x)$ functions.

• $\exp(x) = e^x$, $\quad \ln(x) = \log_e(x)$

• $e^{\ln(x)} = x$, $\quad \ln(e^x) = x$

• $e^0 = 1$, $\quad e^1 = e \approx 2.718$, $\quad \ln(1) = 0$, $\quad \ln(e) = 1$

• $e^{x_1} e^{x_2} = e^{x_1 + x_2}$, $\quad (e^x)^p = e^{px}$

• $\ln(x_1 x_2) = \ln(x_1) + \ln(x_2)$, $\quad \ln(x^p) = p \ln(x)$

• $(e^x)' = e^x$, $\quad \int e^x\,dx = e^x + C$, $\quad \ln'(x) = \frac1x$, $\quad \int \frac1x\,dx = \ln|x| + C$

We give some tricky examples, applying the basic facts and the Chain Rule.

example: Solve for $y$ in the equation: $\ln(y e^x) + 1 = 2x + \ln(y^2)$.
Strategy: Expand into a sum, move $y$’s to the left, all else to the right.

$$\ln(y) + \ln(e^x) + 1 = 2x + 2\ln(y)$$

$$\ln(y) - 2\ln(y) = 2x - x - 1$$

$$\ln(y) = 1 - x$$

$$y = e^{1-x}.$$

example: Differentiate $f(x) = \sin(e^{\tan(x)})$.
Strategy: Apply the Chain Rule with outer = $\sin(x)$, inner = $e^{\tan(x)}$.

$$\left(\sin(e^{\tan(x)})\right)' = \sin'(e^{\tan(x)}) \cdot (e^{\tan(x)})'$$

$$= \cos(e^{\tan(x)}) \cdot \exp'(\tan(x)) \cdot \tan'(x)$$

$$= \cos(e^{\tan(x)}) \cdot e^{\tan(x)} \cdot \sec^2(x).$$

example: Differentiate $f(x) = \int_2^{e^x} \ln(t \sin(t))\,dt$.

Strategy: Apply the Chain Rule with outer function $g(x) = \int_2^x \ln(t \sin(t))\,dt$. The First Fundamental Theorem (§4.3) says∗ that $g'(x) = \ln(x \sin(x))$. We are given a composition of functions $f(x) = g(e^x)$, so the Chain Rule applies:

$$f'(x) = g'(e^x) \cdot (e^x)' = \ln(e^x \sin(e^x)) \cdot e^x = x e^x + \ln(\sin(e^x))\,e^x.$$

∗That is, in the plane with $t$ and $y$ axes, $g(x)$ is the area between the curve $y = \ln(t \sin(t))$ and the interval $t \in [2, x]$. The rate of change of this area function, $g'(x)$, equals the level of the curve at $t = x$, the moving end of the interval: $\ln(x \sin(x))$. Algebraically, the derivative of the integral of a function gives back the original function.

example: Find $\frac{dy}{dx}$ by implicit differentiation, for $(x, y)$ satisfying:

$$e^y = \cos(x + y).$$

Specifically, find $\frac{dy}{dx}$ at the point $(x, y) = (0, 0)$.

Strategy: The equation defines some unknown curve containing the point $(x, y) = (0, 0)$, since $e^0 = \cos(0 + 0)$. We want the slope of the tangent line at that point. Assuming $y = y(x)$ is some function which satisfies the equation, we apply the Chain Rule to both sides, and solve for $y' = \frac{dy}{dx}$.

$$(e^{y(x)})' = (\cos(x + y(x)))'$$

$$\exp'(y(x)) \cdot y'(x) = \cos'(x + y(x)) \cdot (x + y(x))'$$

$$e^y \cdot y' = -\sin(x + y) \cdot (1 + y')$$

$$(e^y + \sin(x + y))\,y' = -\sin(x + y)$$

$$y' = \frac{-\sin(x + y)}{e^y + \sin(x + y)}.$$

Substituting $(x, y) = (0, 0)$ gives $y' = \left.\frac{dy}{dx}\right|_{x=0} = \frac{-\sin(0 + 0)}{e^0 + \sin(0 + 0)} = 0$. That is, the unknown curve has a horizontal tangent at the origin.
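Although the curve has no explicit formula, points on it can be found numerically, which lets us test the slope formula away from the origin as well. The sketch below solves $e^y = \cos(x + y)$ for $y$ near $0$ by Newton's method (our choice of solver, with a fixed number of steps assumed to be ample near the origin) and compares a difference quotient with the formula for $y'$. The names `y_on_curve` and `slope_formula` are hypothetical.

```python
import math

def y_on_curve(x, y0=0.0, steps=50):
    """Solve e^y = cos(x + y) for y by Newton's method, starting from y0."""
    y = y0
    for _ in range(steps):
        g = math.exp(y) - math.cos(x + y)     # we want g = 0
        dg = math.exp(y) + math.sin(x + y)    # derivative of g with respect to y
        y -= g / dg
    return y

def slope_formula(x, y):
    """dy/dx from implicit differentiation."""
    return -math.sin(x + y) / (math.exp(y) + math.sin(x + y))

x0, h = 0.1, 1e-5
numeric_slope = (y_on_curve(x0 + h) - y_on_curve(x0 - h)) / (2 * h)
print(slope_formula(0.0, y_on_curve(0.0)))          # horizontal tangent at the origin
print(numeric_slope, slope_formula(x0, y_on_curve(x0)))
```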

example: Find the derivative of $f(x) = a^x$ for any base $a > 0$.
Strategy: write $a^x$ in terms of the natural exponential, whose derivative is known. Specifically, solving $a = e^p$ by $p = \ln(a)$, we get $a = e^{\ln(a)}$, and $a^x = (e^{\ln(a)})^x = e^{\ln(a)x}$. Applying the Chain Rule:

$$(a^x)' = (e^{\ln(a)x})' = \exp'(\ln(a)x) \cdot (\ln(a)x)' = e^{\ln(a)x} \cdot \ln(a) = \ln(a)\,a^x.$$

Note that $\ln(a)$ is a constant, so $(\ln(a)x)' = \ln(a)$.†

†If we tried to apply the Product and Chain Rules, we would get:

$$(\ln(a)x)' = (\ln(a))' \cdot x + \ln(a) \cdot (x)' = \ln'(a) \cdot a' \cdot x + \ln(a) \cdot 1 = \tfrac1a \cdot 0 \cdot x + \ln(a) = \ln(a).$$

Math 133 General Exp and Log Stewart §6.4

Derivative of general exp. To compute with functions of arbitrary base, we will repeatedly apply:

Natural Base Principle: To deal with general exponentials and logarithms in calculus, write them in terms of the natural base $e$ functions $e^x$ and $\ln(x)$, which have $(e^x)' = e^x$ and $\ln'(x) = \frac1x$.

For example, we have $a = e^{\ln(a)}$, so:

$$(a^x)' = (e^{\ln(a)x})' = \exp'(\ln(a)x) \cdot (\ln(a)x)' = e^{\ln(a)x} \cdot \ln(a) = \ln(a)\,a^x.$$

Note that one factor is just our original function $a^x$, because differentiating the outside function $e^x$ has no effect. In the other factor, $\ln(a)$ is a (complicated) constant, so $(\ln(a)x)' = \ln(a)$.

Derivative of general log. Since $f(x) = a^x = e^{\ln(a)x}$, we can find the inverse function $f^{-1}(y) = \log_a(y)$ by solving $y = e^{\ln(a)x}$ to get: $\ln(y) = \ln(a)x$, and $x = \frac{\ln(y)}{\ln(a)}$. That is, $f^{-1}(y) = \log_a(y) = \frac{\ln(y)}{\ln(a)}$. Switching the input variable to $x$, we get the logarithm base change formula:

$$\log_a(x) = \frac{\ln(x)}{\ln(a)}.$$

Hence:

$$\log_a'(x) = \left(\frac{\ln(x)}{\ln(a)}\right)' = \frac{1}{\ln(a)}\,\ln'(x) = \frac{1}{\ln(a)\,x}.$$

Problems.

example: Differentiate $f(x) = 6^{x + \cos(x)}$. It is not helpful to factor: $f(x) = 6^x\,6^{\cos(x)}$. Instead, we have $6 = e^{\ln(6)}$, so:

$$f'(x) = \left(e^{\ln(6)(x + \cos(x))}\right)' = \exp'\big(\ln(6)(x + \cos(x))\big) \cdot \big(\ln(6)(x + \cos(x))\big)'$$

$$= e^{\ln(6)(x + \cos(x))} \cdot \ln(6)\,(1 - \sin(x))$$

$$= 6^{x + \cos(x)}\,\ln(6)\,(1 - \sin(x)).$$

Notice that the original function is again a factor of the derivative, because the derivative of the outside exp is itself.


example: Differentiate $f(x) = x^x$. Since $x = e^{\ln(x)}$, we have:

$$\begin{aligned}
f'(x) &= \left(e^{\ln(x)x}\right)' = \exp'(\ln(x)x) \cdot (\ln(x)x)' \\
&= \exp'(\ln(x)x) \cdot \big(\ln'(x)\,x + \ln(x)\,x'\big) \\
&= e^{\ln(x)x} \cdot \big(\tfrac{1}{x}\cdot x + \ln(x)\cdot 1\big) \\
&= x^x\,(1 + \ln(x)) .
\end{aligned}$$

Once again, the original function is a factor of the derivative.

Another approach is the logarithmic derivative, based on the formula:

$$\big(\ln(f(x))\big)' = \ln'(f(x))\, f'(x) = \frac{f'(x)}{f(x)} \implies f'(x) = f(x)\,\big(\ln(f(x))\big)'.$$

For our function, $\ln(f(x)) = \ln(x^x) = x\ln(x)$, and we quickly get the previous answer:

$$f'(x) = f(x)\,\big(\ln(f(x))\big)' = x^x\,(x\ln(x))' = x^x\,(1 + \ln(x)).$$
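A short numerical check of the result $(x^x)' = x^x(1+\ln(x))$ is sketched below. The test point $x = 2$ and the helper names are assumptions chosen for illustration.

```python
import math

def f(x):
    """The function x^x, defined here for x > 0."""
    return x ** x

def f_prime(x):
    """Closed-form derivative x^x * (1 + ln(x)) from the example."""
    return x ** x * (1 + math.log(x))

def central_difference(g, x, h=1e-6):
    """Symmetric finite-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

x0 = 2.0
numeric = central_difference(f, x0)
exact = f_prime(x0)              # 4 * (1 + ln 2)
print(numeric, exact)
```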

example: Find the indefinite integral∗ $\int x\, 6^{x^2}\, dx$.

We write in terms of natural functions, and do the substitution $u = \ln(6)x^2$:

$$\begin{aligned}
\int x\, 6^{x^2}\, dx &= \int x\, e^{\ln(6)x^2}\, dx = \frac{1}{2\ln(6)} \int e^{\ln(6)x^2}\, \ln(6)\, 2x\, dx \\
&= \frac{1}{2\ln(6)} \int e^u\, du = \frac{e^u}{2\ln(6)} = \frac{e^{\ln(6)x^2}}{2\ln(6)} = \frac{6^{x^2}}{2\ln(6)} .
\end{aligned}$$

∗The notation $\int f(x)\, dx$, with no limits of integration, is simply a shorthand for the general antiderivative, and is called the indefinite integral. Indeed, if we find the indefinite integral $\int f(x)\, dx = F(x) + C$, where $F'(x) = f(x)$, then we can evaluate the definite integral: $\int_a^b f(x)\, dx = [F(x)]_{x=a}^{x=b}$.

Math 133 Exponential Growth and Decay Stewart §6.5

Differential equations. An algebra equation involves a variable representing an unknown number, often denoted by $x$; and to solve the equation means to find the numerical values of $x$ which make the equation true. A differential equation (DE) involves an unknown function, often $y = f(x)$, and its derivatives $\frac{dy}{dx} = f'(x)$, $\frac{d^2y}{dx^2} = f''(x)$, etc. To solve the DE means to find the functions $f(x)$ which make the equation true. For example, the equation $f'(x) = 2x$ has solutions $f(x) = x^2 + C$ for any constant $C$.

Scientific laws attempt to give simple explanations of complex phenomena. Some, such as the principle of evolution through natural selection, are qualitative laws, stated in ordinary language. Those laws which are quantitative, precisely explaining numerical measurements, are usually stated in terms of simple differential equations: the complexity arises from the solutions of these equations. The theory of differential equations is one of the richest, most extensive fields of mathematics: in fact, the most important equations, such as those describing fluid flow, are each large areas of study all by themselves. Here we will consider only a few very elementary, easy-to-solve examples.

Equations solved by integration alone. The very simplest DE's are those of the form $f'(x) = g(x)$, where $y = f(x)$ is the unknown and $g(x)$ is some given, known function. The solution is just the indefinite integral (antiderivative):

$$f'(x) = g(x) \implies f(x) = \int g(x)\, dx = G(x) + C .$$

Here $G(x)$ is found by reversing the derivative rules; but if this is not possible, we can always write $G(x) = \int_a^x g(t)\, dt$, a definite integral which can be approximated by Riemann sums. The constant $C$ is often determined by initial conditions such as $f(0) = c$.

example: Suppose your car starts at a standstill, and you press the accelerator very slowly so that after 1 second you are gaining 1 mph each sec, after 2 seconds you are gaining 2 mph each sec, etc. How far have you traveled in $t$ seconds, and how many seconds until you travel 1000 ft?

Here the unknown function $y = f(t)$ is the distance traveled (the position past the starting point) in feet; the velocity is $\frac{dy}{dt} = f'(t)$; and the acceleration $\frac{d^2y}{dt^2} = f''(t)$ is given as $t$ mph per sec. Converting to consistent units, 1 mph $= \frac{5280\text{ ft}}{3600\text{ sec}} \approx 1.5$ ft/sec, so 1 mph per sec $\approx 1.5$ ft/sec$^2$. Thus:

$$f''(t) = 1.5\, t, \qquad f'(0) = 0, \qquad f(0) = 0.$$

The initial distance traveled is zero, and the initial velocity is zero because we start at a standstill.

To solve, we integrate twice:

$$f'(t) = \int 1.5\, t\, dt = 0.75\, t^2 + C_1 , \qquad f(t) = \int (0.75\, t^2 + C_1)\, dt = 0.25\, t^3 + C_1 t + C_2 .$$

The initial conditions determine the unknown constants $C_1, C_2$:

$$0 = f'(0) = 0.75(0^2) + C_1 = C_1 \quad\text{and}\quad 0 = f(0) = 0.25(0^3) + C_1(0) + C_2 = C_2 .$$

Therefore: $f(t) = 0.25\, t^3$. Finally, solving $f(t) = 0.25\, t^3 = 1000$ answers the original question: It takes $t = \sqrt[3]{4000} \approx 16$ sec to travel 1000 ft.
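The car example can be cross-checked in two ways: by evaluating the closed form $f(t) = 0.25\,t^3$ at $t = \sqrt[3]{4000}$, and by stepping the acceleration $f''(t) = 1.5\,t$ forward in small time increments. This is a rough sketch; the function names and the step count are assumptions made for illustration.

```python
def distance_ft(t):
    """Closed-form distance f(t) = 0.25 t^3 (feet) from the example."""
    return 0.25 * t ** 3

def simulate(t_end, steps=100_000):
    """Integrate f''(t) = 1.5 t twice, starting from rest at position 0."""
    dt = t_end / steps
    velocity = 0.0
    position = 0.0
    for i in range(steps):
        accel = 1.5 * (i + 0.5) * dt                  # acceleration at mid-step
        position += velocity * dt + 0.5 * accel * dt * dt
        velocity += accel * dt
    return position

t_1000 = 4000 ** (1 / 3)        # solves 0.25 t^3 = 1000
print(t_1000, distance_ft(t_1000), simulate(t_1000))
```

Both routes should give a distance very close to 1000 ft at a time of roughly 16 sec.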

Exponential growth equation. Steady self-reproduction means that on average, each individual produces a certain number of offspring per unit time. Translating into math, we say the rate of population growth is proportional to the population level. That is, if $P(t)$ is the population at time $t$, and $k$ is the constant reproduction rate per individual,

$$\frac{dP}{dt} = kP .$$

To predict the population, we must solve for the unknown function $P(t)$. This time, integrating both sides will not help, since the right side is just as unknown as the left side. However, it is easy to guess a solution function: $P(t) = e^{kt}$, with $P'(t) = e^{kt} \cdot (kt)' = P(t)\, k$. In fact, any multiple of this will clearly work:

$$P(t) = c\,e^{kt}.$$

I claim that this is the most general solution of the differential equation. In fact, if $P(t)$ is any solution function with $P'(t) = k\,P(t)$, then the Quotient Rule (§2.3) says:

$$\left(\frac{P(t)}{e^{kt}}\right)' = \frac{P'(t)\,e^{kt} - P(t)\,(e^{kt})'}{(e^{kt})^2} = \frac{k\,P(t)\,e^{kt} - P(t)\,k\,e^{kt}}{e^{2kt}} = 0.$$

Since $\frac{P(t)}{e^{kt}}$ has zero derivative, it must be a constant function (§3.2):

$$\left(\frac{P(t)}{e^{kt}}\right)' = 0 \implies \frac{P(t)}{e^{kt}} = c \implies P(t) = c\,e^{kt} .$$

That is, any solution $P(t)$ must have the desired form.

Exponential growth doubling problem. Here is a common type of problem which needs no calculus, apart from the general exponential formula found above. Let $P(t)$ grams be the population of bacteria in a tank at $t$ hours. Suppose the population doubles every 3 hours, and $P(1) = 2$. Find $P(t)$.

We must translate all the words of the problem (physical level) into equations (algebraic level). First, doubling in constant time means exponential growth: $P(t) = c\,e^{kt}$; but here it is easier to write $a = e^k$, so that $P(t) = c\,a^t$. We need only find the unknown constants $c, a$. For a given population $P(t)$, the population 3 hours later will be twice as much:

$$P(t+3) = 2P(t) \implies c\,a^{t+3} = 2c\,a^t \implies a^t a^3 = 2a^t \implies a = \sqrt[3]{2} \approx 1.26 .$$

The initial condition becomes: $P(1) = c\,a^1 = 2$, so that $c = 2/a = 2/\sqrt[3]{2} = 2^{2/3} \approx 1.59$.

$$P(t) = 2^{2/3}\, 2^{(1/3)t} = 2^{(t+2)/3}$$
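The two defining conditions of the doubling problem can be verified directly on the final formula. The sketch below is only a check; the function name `population` and the sample times are assumptions for illustration.

```python
import math

def population(t):
    """P(t) = 2^((t + 2) / 3), the solution found above (grams at t hours)."""
    return 2 ** ((t + 2) / 3)

# Initial condition P(1) = 2 and the doubling rule P(t + 3) = 2 P(t).
print(population(1))
for t in (0.0, 1.0, 4.5):
    print(t, population(t + 3) / population(t))
```

Each printed ratio should be 2 up to rounding, whatever starting time is chosen.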

Beware: this or any exponential model will break down when the population outgrows its available resources. After that time, our prediction is invalid.

The reciprocal of exponential growth is exponential decay: a process in which, instead of doubling in constant time, a quantity shrinks by half.

Separation of variables method. Certain easy DE's can be reduced to an integration problem by a simple trick. For some given functions $g(x), h(x)$, we want to find the unknown $f(x)$ satisfying:

$$f'(x) = \frac{g(x)}{h(f(x))} .$$

Here the denominator is some expression in the unknown $f(x)$. Clearing denominators puts $f(x)$ only on the left, and taking the integral of both sides gives:

$$\int h(f(x))\, f'(x)\, dx = \int g(x)\, dx.$$

The left-hand integral can be simplified by the substitution $y = f(x)$, giving $\int h(y)\, dy$. Assuming we can find antiderivatives $\int h(y)\, dy = H(y)$ and $\int g(x)\, dx = G(x) + C$, our equation becomes:

$$\int h(y)\, dy = \int g(x)\, dx \implies H(y) = H(f(x)) = G(x) + C \implies f(x) = H^{-1}(G(x) + C) ,$$

assuming we can find an inverse function $H^{-1}(x)$ which undoes $H(x)$.

This reasoning looks especially natural in Leibniz notation: letting $y = f(x)$:

$$\frac{dy}{dx} = \frac{g(x)}{h(y)} \implies h(y)\,\frac{dy}{dx} = g(x) \implies \int h(y)\,\frac{dy}{dx}\, dx = \int g(x)\, dx$$

$$\implies \int h(y)\, dy = \int g(x)\, dx \implies H(y) = G(x) + C \implies y = H^{-1}(G(x) + C).$$

example: Applying the method to the exponential growth equation from before:

$$\frac{dP}{dt} = kP \implies \int \frac{1}{P}\,\frac{dP}{dt}\, dt = \int k\, dt \implies \int \frac{1}{P}\, dP = \int k\, dt$$

$$\implies \ln|P| = kt + C \implies P = \pm e^{kt+C} = \pm e^C e^{kt} .$$

This is our previous answer, except the arbitrary coefficient is $\pm e^C$ instead of $c$.∗

example: Solve $f'(x) = \frac{\sin(x)}{f(x)}$. That is, the derivative of our function $y = f(x)$ is $\sin(x)$ divided by $f(x)$ itself. The method gives:

$$\frac{dy}{dx} = \frac{\sin(x)}{y} \implies \int y\,\frac{dy}{dx}\, dx = \int \sin(x)\, dx \implies \int y\, dy = \int \sin(x)\, dx$$

$$\implies \tfrac12 y^2 = -\cos(x) + C \implies y = \sqrt{2C - 2\cos(x)} .$$

We can check our solution by plugging it into the equation. Indeed, by the Chain Rule, $f'(x) = \tfrac12 (2C - 2\cos(x))^{-1/2} \cdot 2\sin(x) = \frac{\sin(x)}{f(x)}$.
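The same check can be done numerically: pick a constant, differentiate the candidate solution by finite differences, and compare with $\sin(x)/f(x)$. In this sketch the value $C = 2$ (which keeps $2C - 2\cos(x)$ positive for every $x$), the sample points, and the helper names are assumptions for illustration.

```python
import math

C = 2.0   # illustrative constant; 2C - 2cos(x) >= 2 > 0 for all x

def solution(x):
    """Candidate solution y = sqrt(2C - 2 cos(x))."""
    return math.sqrt(2 * C - 2 * math.cos(x))

def central_difference(g, x, h=1e-6):
    """Symmetric finite-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

def residual(x):
    """Left side minus right side of y' = sin(x) / y."""
    return central_difference(solution, x) - math.sin(x) / solution(x)

for x in (0.5, 1.0, 2.5):
    print(x, residual(x))     # each residual should be close to zero
```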

∗The form of the constant is irrelevant. For example, if we had put constants in both general antiderivatives, $\ln|P| + C_1 = kt + C_2$, it would lead to $P = \pm e^{C_2 - C_1} e^{kt}$, where $C_1, C_2$ are arbitrary constants. But this does not give a more general solution than $\pm e^C$ for a single arbitrary $C$, or just a simple constant coefficient $c$.

Newton's Law of Cooling. This is a toy example of how scientific laws are expressed by DE's. Newton proposed a simple mechanism to describe the temperature $T(t)$ of a hot body cooling down to environmental temperature $E$: the rate of cooling is proportional to the difference between the body's temperature and the environment. The DE is:

$$\frac{dT}{dt} = -k\,(T - E) ,$$

where $k > 0$, so a high temperature cools quickly. Separating variables gives:

$$\int \frac{1}{T-E}\,\frac{dT}{dt}\, dt = -\int k\, dt \implies \int \frac{1}{T-E}\, dT = -\int k\, dt$$

$$\implies \ln|T-E| = -kt + C \implies T = E \pm e^C e^{-kt} = E + c\,e^{-kt}.$$

That is, $T(t)$ approaches the horizontal asymptote $E$ by exponential decay.

example: In a room at 20°C, a cup of boiling tea (100°C) cools to 80°C in 1 minute. How long until it is sippable at 50°C? To answer, we assume $T(t) = E + c\,e^{-kt} = 20 + c\,e^{-kt}$ according to Newton's Law. The two initial conditions determine the parameters:

$$T(0) = 100 = 20 + c \implies c = 80.$$

$$T(1) = 80 = 20 + 80\,e^{-k} \implies e^{-k} = \tfrac{60}{80} = \tfrac34 .$$

Thus $T(t) = 20 + 80\left(\tfrac34\right)^t$, and solving $T(t) = 50$ gives $\left(\tfrac34\right)^t = \tfrac38$, so $t = \frac{\ln(3/8)}{\ln(3/4)} \approx 3.4$ min.
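The cooling time can be reproduced in a few lines. This sketch simply solves $20 + 80\,(3/4)^t = 50$ for $t$ with logarithms; the function and variable names are assumptions for illustration.

```python
import math

E = 20.0          # room temperature in deg C
c = 80.0          # from T(0) = 100 = E + c
decay = 0.75      # e^(-k), from T(1) = 80

def temperature(t):
    """T(t) = E + c * (3/4)^t, with t in minutes."""
    return E + c * decay ** t

# Solve E + c * decay^t = 50  =>  decay^t = 3/8  =>  t = ln(3/8) / ln(3/4).
t_sip = math.log((50.0 - E) / c) / math.log(decay)
print(t_sip, temperature(t_sip))
```

Plugging the computed time back into `temperature` should return 50, which confirms the algebra.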

A few more examples of differential equations are solved in the last lecture notes.

Math 133 Inverse Trigonometric Functions Stewart §6.6

Inverses and domains. Consider a hot-air balloon 20 feet in the air, tethered by a rope stretching 50 feet diagonally to the ground. What is the rope's angle of elevation?

Because sine = opposite/hypotenuse, the angle of elevation $\theta$ has $\sin(\theta) = \frac{20}{50} = \frac25$. To find $\theta$, we need the inverse function: $\theta = \sin^{-1}(\frac25) \approx 0.41$ rad $\approx 23.6^\circ$, using the inv sin or arcsin function on a calculator. However, the equation $\sin(\theta) = \frac25$ has infinitely many solutions:

If the initial solution is $\theta_0$, there is another solution at $\theta_1 = \pi - \theta_0$, and in general at $\theta_0 + 2n\pi$, $\theta_1 + 2n\pi$ for any integer $n$. In our problem, we clearly want an acute angle, so we restrict $0 \le \theta \le \frac{\pi}{2}$, making $\theta = \theta_0$ the unique acceptable solution.

A bit more generally, we restrict $\sin(x)$ to the domain $-\frac{\pi}{2} \le x \le \frac{\pi}{2}$ marked below, to make it a one-to-one function (so different inputs go to different outputs, and the graph satisfies the horizontal line test). We get a pair of inverse functions:

$$\sin : \left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right] \longrightarrow [-1, 1], \qquad \sin^{-1} : [-1, 1] \longrightarrow \left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right].$$

See the end of this section for graphs of inverse functions with standard domains.

An alternative notation is $\sin^{-1}(y) = \arcsin(y)$, meaning the arc (angle) whose sine is $y$.∗ Similarly $\tan^{-1}(y) = \arctan(y)$, etc. Watch out for an unfortunate ambiguity: $\sin^{-1}(x)$ could mean either $\arcsin(x)$, the inverse under composition of functions; or $\frac{1}{\sin(x)}$, the inverse under multiplication of functions. We will always write:

$$\sin^{-1}(x) = \arcsin(x), \qquad \sin(x)^{-1} = \frac{1}{\sin(x)} = \csc(x) .$$

Inverse functions and triangles. The Pythagorean relations between trig functions lead to relations among their inverses. Given $\theta = \sin^{-1}(y)$, i.e. $\sin(\theta) = y$, we set up the triangle at left below so that $\sin(\theta) = \text{opposite}/\text{hypotenuse} = y/1$. The adjacent side $x$ satisfies $x^2 + y^2 = 1$, so $x = \sqrt{1 - y^2}$, and we can compute:

$$\cos(\theta) = \frac{\text{adjacent}}{\text{hypotenuse}} = \frac{\sqrt{1 - y^2}}{1},$$

that is, $\cos(\sin^{-1}(y)) = \cos(\theta) = \sqrt{1 - y^2}$, and similarly for $\tan(\theta) = \tan(\sin^{-1}(y))$, etc.

In the right picture, we have $\theta = \tan^{-1}(y)$ since $\tan(\theta) = \text{opposite}/\text{adjacent} = y/1$, and we compute $\sin(\tan^{-1}(y)) = \sin(\theta) = \frac{y}{\sqrt{1+y^2}}$, etc.

$$\begin{array}{ll}
\theta = \sin^{-1}(y) & \theta = \tan^{-1}(y) \\[4pt]
\sin(\theta) = y & \sin(\theta) = \dfrac{y}{\sqrt{1 + y^2}} \\[8pt]
\cos(\theta) = \sqrt{1 - y^2} & \cos(\theta) = \dfrac{1}{\sqrt{1 + y^2}} \\[8pt]
\tan(\theta) = \dfrac{y}{\sqrt{1 - y^2}} & \tan(\theta) = y
\end{array}$$

∗Recall that the radian angle $\theta$ is defined as the length of an arc on the unit circle: the full circle has circumference $2\pi$, hence $2\pi$ radians.

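Derivatives of inverse functions. Recall the Inverse Derivative Formula from §6.1: if $y = f(\theta)$ and $\theta = f^{-1}(y)$, then:

$$(f^{-1})'(y) = \frac{1}{f'(\theta)} = \frac{1}{f'(f^{-1}(y))} .$$

Taking $f(\theta) = \sin(\theta)$ and $\theta = \sin^{-1}(y)$, we get:

$$(\sin^{-1})'(y) = \frac{1}{\sin'(\theta)} = \frac{1}{\cos(\theta)} = \frac{1}{\sqrt{1 - y^2}} .$$

Similarly, we conclude:

$$(\sin^{-1})'(y) = \frac{1}{\sqrt{1 - y^2}} \qquad (\cos^{-1})'(y) = -\frac{1}{\sqrt{1 - y^2}}$$

$$(\tan^{-1})'(y) = \frac{1}{1 + y^2} \qquad (\sec^{-1})'(y) = \frac{1}{y\sqrt{y^2 - 1}} .$$

The inverse secant is widely deprecated. Instead of $\int \frac{1}{y\sqrt{y^2-1}}\, dy = \sec^{-1}(y)$, many prefer:

$$\left(\tan^{-1}\sqrt{y^2 - 1}\right)' = \frac{1}{y\sqrt{y^2 - 1}} \implies \int \frac{1}{y\sqrt{y^2 - 1}}\, dy = \tan^{-1}\sqrt{y^2 - 1} .$$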

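Inverse functions and integrals. The above derivative formulas can be reversed to give antiderivatives (indefinite integrals). That is, $\int \frac{1}{\sqrt{1-y^2}}\, dy = \sin^{-1}(y) + C$, etc.

example: Find $\int \frac{1}{\sqrt{2-x^2}}\, dx$. The trick is to rewrite the integrand in the form of one of our derivatives, whichever is closest, in this case $\frac{1}{\sqrt{1-y^2}}$. With the substitution $y = \frac{x}{\sqrt2}$, $dy = \frac{1}{\sqrt2}\, dx$:

$$\begin{aligned}
\int \frac{1}{\sqrt{2 - x^2}}\, dx &= \int \frac{1}{\sqrt2}\, \frac{1}{\sqrt{1 - \frac{x^2}{2}}}\, dx = \int \frac{1}{\sqrt{1 - \left(\frac{x}{\sqrt2}\right)^2}}\, \frac{1}{\sqrt2}\, dx \\
&= \int \frac{1}{\sqrt{1 - y^2}}\, dy = \sin^{-1}(y) + C = \sin^{-1}\!\left(\tfrac{x}{\sqrt2}\right) + C .
\end{aligned}$$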

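example: Find $\int \frac{1}{\sqrt{1+x-x^2}}\, dx$. Again, we want to force the integrand into the form $\frac{1}{\sqrt{1-y^2}}$. Since we have a three-term quadratic, we complete the square,† writing:

$$1 + x - x^2 = \tfrac54 - \left(x - \tfrac12\right)^2 = \tfrac54\left(1 - \tfrac45\left(x - \tfrac12\right)^2\right) = \tfrac54\left(1 - \left(\tfrac{2}{\sqrt5}x - \tfrac{1}{\sqrt5}\right)^2\right).$$

Thus, we take $y = \frac{2}{\sqrt5}x - \frac{1}{\sqrt5}$, $dy = \frac{2}{\sqrt5}\, dx$, so that:

$$\begin{aligned}
\int \frac{1}{\sqrt{1 + x - x^2}}\, dx &= \int \frac{1}{\frac{\sqrt5}{2}\sqrt{1 - \left(\frac{2}{\sqrt5}x - \frac{1}{\sqrt5}\right)^2}}\, dx = \int \frac{1}{\sqrt{1 - \left(\frac{2}{\sqrt5}x - \frac{1}{\sqrt5}\right)^2}}\, \frac{2}{\sqrt5}\, dx \\
&= \int \frac{1}{\sqrt{1 - y^2}}\, dy = \sin^{-1}(y) + C = \sin^{-1}\!\left(\tfrac{2}{\sqrt5}x - \tfrac{1}{\sqrt5}\right) + C .
\end{aligned}$$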

An impressive integral!
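Since the answer is not obvious at a glance, a numerical comparison is reassuring. The sketch below integrates $\frac{1}{\sqrt{1+x-x^2}}$ over $[0, 1]$ with a midpoint Riemann sum and compares it with the difference of the antiderivative $\sin^{-1}\!\big(\frac{2x-1}{\sqrt5}\big)$ at the endpoints. The interval, the number of slices, and the function names are assumptions chosen for illustration.

```python
import math

def integrand(x):
    """1 / sqrt(1 + x - x^2), positive on the test interval [0, 1]."""
    return 1.0 / math.sqrt(1.0 + x - x * x)

def antiderivative(x):
    """arcsin((2x - 1) / sqrt(5)), the result of completing the square."""
    return math.asin((2.0 * x - 1.0) / math.sqrt(5.0))

def midpoint_rule(f, a, b, n=10_000):
    """Riemann sum using the midpoint of each of n equal slices."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

numeric = midpoint_rule(integrand, 0.0, 1.0)
exact = antiderivative(1.0) - antiderivative(0.0)
print(numeric, exact)
```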

Graphs of inverse functions

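†That is, rewrite: $x^2 + bx + c = x^2 + 2\left(\tfrac{b}{2}\right)x + \left(\tfrac{b}{2}\right)^2 - \left(\tfrac{b}{2}\right)^2 + c = \left(x + \tfrac{b}{2}\right)^2 - \tfrac{b^2 - 4c}{4}$. This is the computation which leads to the Quadratic Formula.

The strange standard domain for $\sec(\theta)$ is $\theta \in [0, \frac{\pi}{2}) \cup [\pi, \frac{3\pi}{2})$, which is chosen to make the signs work out in $(\sec^{-1})'(y) = \frac{1}{y\sqrt{y^2-1}}$. If we had instead chosen $\sec^{-1}(y) = \cos^{-1}(\frac{1}{y})$, we would get $(\sec^{-1})'(y) = \frac{1}{|y|\sqrt{y^2-1}}$.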

Math 133 Hyperbolic Functions Stewart §6.7

Definitions. Besides the algebraic functions defined by arithmetic operations, powers, and roots, we have seen several types of transcendental functions such as $e^x$, the trigonometric functions, and their inverse functions. Now we introduce the hyperbolic functions, a new class of transcendental functions which appear in some scientific and mathematical applications (though much less commonly than our previous functions).

Each hyperbolic function corresponds to a trigonometric function: to the ordinary sine function $\sin(x)$ there corresponds the hyperbolic sine $\sinh(x)$; to the ordinary tangent there corresponds the hyperbolic tangent $\tanh(x)$, etc.∗ These new functions are defined in terms of exponential functions:

$$\sinh(x) = \frac{e^x - e^{-x}}{2} \qquad \cosh(x) = \frac{e^x + e^{-x}}{2} \qquad \tanh(x) = \frac{\sinh(x)}{\cosh(x)}$$

$$\operatorname{sech}(x) = \frac{1}{\cosh(x)} \qquad \operatorname{csch}(x) = \frac{1}{\sinh(x)} \qquad \coth(x) = \frac{\cosh(x)}{\sinh(x)}$$

All these can be written in terms of exponentials, such as $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$, and their graphs are easy to picture from knowing $y = e^x$ and $y = e^{-x}$:

Notice that $\sinh(x)$ is an odd function like $\sin(x)$, meaning $f(-x) = -f(x)$; and $\cosh(x)$ is an even function like $\cos(x)$, meaning $f(-x) = f(x)$. Also, $e^x = \sinh(x) + \cosh(x)$, so we can think of the two main hyperbolic functions as the odd and even components of the exponential function.†

∗We pronounce sinh as "sinch", cosh as "kosh", tanh as "tanch", etc.

†The hyperbolic $e^x = \cosh(x) + \sinh(x)$ corresponds to Euler's formula $e^{ix} = \cos(x) + i\sin(x)$.

Geometric meaning. Why the trigonometric nomenclature? The most important geometric role of the trigonometric functions is to parametrize circular motion: $(x, y) = (\cos(t), \sin(t))$ traces out the unit circle for $t \in [0, 2\pi]$. This is because the identity $\cos^2(t) + \sin^2(t) = 1$ corresponds to the circle equation $x^2 + y^2 = 1$.

It turns out the hyperbolic functions $(x, y) = (\cosh(t), \sinh(t))$ for $t \in (-\infty, \infty)$ trace out a branch of the standard hyperbola defined by $x^2 - y^2 = 1$, because of the identity $\cosh^2(t) - \sinh^2(t) = 1$.

In fact, the shaded sector with corners $(0, 0)$, $(1, 0)$, $(\cosh(t), \sinh(t))$ has area $\frac12 t$; just as in the circle, the sector with corners $(0, 0)$, $(1, 0)$, $(\cos(t), \sin(t))$ has area $\frac12 t$.

The identity can easily be checked from the definitions:

$$\cosh^2(t) - \sinh^2(t) = \left(\frac{e^t + e^{-t}}{2}\right)^2 - \left(\frac{e^t - e^{-t}}{2}\right)^2 = \frac{(e^{2t} + 2 + e^{-2t}) - (e^{2t} - 2 + e^{-2t})}{4} = \frac44 = 1$$

Formulas. The analogy goes much further: almost every formula involving trigonometric functions has a hyperbolic counterpart, often with changes in the ± signs.

$$\cosh^2(x) - \sinh^2(x) = 1$$

$$\sinh(x+y) = \sinh(x)\cosh(y) + \cosh(x)\sinh(y)$$

$$\cosh(x+y) = \cosh(x)\cosh(y) + \sinh(x)\sinh(y)$$

$$\sinh'(x) = \cosh(x) \qquad \cosh'(x) = \sinh(x)$$

$$\tanh'(x) = \operatorname{sech}^2(x) \qquad \operatorname{sech}'(x) = -\tanh(x)\operatorname{sech}(x)$$

Each of these can be easily verified from the definitions via exponentials. For example:

$$\sinh'(x) = \left(\tfrac12(e^x - e^{-x})\right)' = \tfrac12\left(e^x - (-e^{-x})\right) = \cosh(x).$$

$$\tanh'(x) = \left(\frac{\sinh(x)}{\cosh(x)}\right)' = \frac{\sinh'(x)\cosh(x) - \sinh(x)\cosh'(x)}{\cosh^2(x)} = \frac{\cosh^2(x) - \sinh^2(x)}{\cosh^2(x)} = \frac{1}{\cosh^2(x)} = \operatorname{sech}^2(x).$$

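example: Find the derivative of $\ln(\sinh(x))$. Using the Chain Rule:

$$(\ln(\sinh(x)))' = \ln'(\sinh(x)) \cdot \sinh'(x) = \frac{1}{\sinh(x)} \cdot \cosh(x) = \coth(x).$$

example: Find the antiderivative $\int \frac{\sinh(x)}{\cosh^2(x)}\, dx$. Substitute $u = \cosh(x)$, $du = \sinh(x)\, dx$:

$$\int \frac{\sinh(x)}{\cosh^2(x)}\, dx = \int \frac{1}{\cosh^2(x)} \cdot \sinh(x)\, dx = \int \frac{1}{u^2}\, du = -\frac{1}{u} = -\frac{1}{\cosh(x)} = -\operatorname{sech}(x) .$$

Alternatively: $\int \frac{\sinh(x)}{\cosh^2(x)}\, dx = \int \tanh(x)\operatorname{sech}(x)\, dx = -\operatorname{sech}(x)$, directly from reversing our derivative table, since $\operatorname{sech}'(x) = -\tanh(x)\operatorname{sech}(x)$. (For brevity, we have neglected the arbitrary constant term $+C$.)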

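Integral table. We can define inverse hyperbolic functions and compute their derivatives just as we did for trig functions. The result gives several more antiderivative formulas, which we summarize here along with the trig versions. (We omit $+C$.)

$$\int \frac{1}{\sqrt{1 - x^2}}\, dx = \sin^{-1}(x)$$

$$\int \frac{1}{\sqrt{x^2 - 1}}\, dx = \cosh^{-1}(x) = \ln\!\left(x + \sqrt{x^2 - 1}\right)$$

$$\int \frac{1}{\sqrt{1 + x^2}}\, dx = \sinh^{-1}(x) = \ln\!\left(x + \sqrt{1 + x^2}\right)$$

$$\int \frac{1}{x\sqrt{x^2 - 1}}\, dx = \sec^{-1}(x)$$

$$\int \frac{1}{x\sqrt{1 - x^2}}\, dx = -\operatorname{sech}^{-1}(x) = \ln(x) - \ln\!\left(1 + \sqrt{1 - x^2}\right)$$

$$\int \frac{1}{x\sqrt{1 + x^2}}\, dx = -\operatorname{csch}^{-1}(x) = \ln(x) - \ln\!\left(1 + \sqrt{1 + x^2}\right)$$

$$\int \frac{1}{1 + x^2}\, dx = \tan^{-1}(x)$$

$$\int \frac{1}{1 - x^2}\, dx = \tanh^{-1}(x) = \tfrac12 \ln(1 + x) - \tfrac12 \ln(1 - x)$$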

We sometimes denote sin−1 as arcsin, and we may denote sinh−1 as argsinh, etc.
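The logarithm forms of the inverse hyperbolic functions in the table can be compared against Python's built-in `math.asinh`, `math.acosh`, and `math.atanh`. The sketch below is only a spot check at a few sample points; the wrapper names and the sample values are assumptions for illustration.

```python
import math

def asinh_log(x):
    """sinh^{-1}(x) = ln(x + sqrt(1 + x^2))."""
    return math.log(x + math.sqrt(1 + x * x))

def acosh_log(x):
    """cosh^{-1}(x) = ln(x + sqrt(x^2 - 1)), valid for x >= 1."""
    return math.log(x + math.sqrt(x * x - 1))

def atanh_log(x):
    """tanh^{-1}(x) = (1/2) ln(1 + x) - (1/2) ln(1 - x), valid for |x| < 1."""
    return 0.5 * math.log(1 + x) - 0.5 * math.log(1 - x)

print(asinh_log(0.5), math.asinh(0.5))
print(acosh_log(2.0), math.acosh(2.0))
print(atanh_log(0.5), math.atanh(0.5))
```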

Math 133 L’Hopital’s Rule Stewart §6.8

This technique evaluates limits which approach indeterminate forms like $\frac{0}{0}$ and $\frac{\infty}{\infty}$.

Theorem: For functions $f(x), g(x)$, suppose $f'(x), g'(x)$ exist and $g'(x) \ne 0$, on some interval $x \in (a-\delta, a+\delta)$. Suppose that either:

$$\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0 \quad\text{or}\quad \lim_{x\to a} |f(x)| = \lim_{x\to a} |g(x)| = \infty.$$

Then:

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)},$$

provided the right side limit exists, or equals $\infty$ or $-\infty$.

There is another version for limits as $x$ becomes very large:

Theorem: Let $f(x), g(x)$ be functions which are differentiable and $g'(x) \ne 0$, on a semi-infinite interval $x \in (c, \infty)$. Suppose that either:

$$\lim_{x\to\infty} f(x) = \lim_{x\to\infty} g(x) = 0 \quad\text{or}\quad \lim_{x\to\infty} |f(x)| = \lim_{x\to\infty} |g(x)| = \infty.$$

Then:

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \frac{f'(x)}{g'(x)},$$

provided the right side limit exists, or equals $\infty$ or $-\infty$.

The above also holds with $x \to \infty$ replaced with $x \to -\infty$.

Proof.∗ There is an easy and enlightening proof of the Theorem if we assume:

$$\lim_{x\to a} f(x) = f(a) = 0, \qquad \lim_{x\to a} g(x) = g(a) = 0,$$

$$\lim_{x\to a} f'(x) = f'(a), \qquad \lim_{x\to a} g'(x) = g'(a) \ne 0.$$

In this case:

$$\lim_{x\to a} \frac{f'(x)}{g'(x)} = \frac{f'(a)}{g'(a)} = \lim_{x\to a} \frac{\;\frac{f(x)-f(a)}{x-a}\;}{\frac{g(x)-g(a)}{x-a}} = \lim_{x\to a} \frac{f(x) - f(a)}{g(x) - g(a)} = \lim_{x\to a} \frac{f(x)}{g(x)} .$$

That is, the quotient on the left is approximately $\frac{\Delta f}{\Delta g}$. But if $f$ starts at $f(a) = 0$, then the change in $f(x)$ is just the value of $f(x)$: that is, $\Delta f = f(x) - f(a) = f(x)$; and similarly $\Delta g = g(x)$.

∗A more complete proof. Assume only that $\lim_{x\to a} f(x) = f(a) = 0$, $\lim_{x\to a} g(x) = g(a) = 0$ and $\lim_{x\to a} f'(x)/g'(x)$ exists. This means $f'(x), g'(x)$ are defined and $g'(x) \ne 0$ near $x = a$. If $g(x) = 0$ near $x = a$, the Mean Value Theorem (§3.2) would imply $g'(c) = 0$ for $c \in (a, x)$ or $(x, a)$, a contradiction; thus $g(x) \ne 0$ near $x = a$.

The Cauchy Mean Value Theorem (end of §3.2) says that if $f(x), g(x)$ are continuous on $[a, b]$, differentiable on $(a, b)$, then there is some $c \in (a, b)$ with $\frac{f(b)-f(a)}{g(b)-g(a)} = \frac{f'(c)}{g'(c)}$, provided the denominators are non-zero. Applying this to any sufficiently small interval $[a, x]$ or $[x, a]$ gives some $c_x \in (a, x)$ or $(x, a)$ with $f(x)/g(x) = f'(c_x)/g'(c_x)$. Now, as $x \to a$, also $c_x \to a$, and $f(x)/g(x) = f'(c_x)/g'(c_x)$ clearly approaches the same value as $f'(x)/g'(x)$.

example: $\lim_{x\to 2} \frac{x-2}{x^2-4}$. The top and bottom both approach zero, so the limit approaches the indeterminate form $\frac00$, and l'Hopital's Rule applies.

$$\lim_{x\to 2} \frac{x - 2}{x^2 - 4} \overset{\text{Hop}}{=} \lim_{x\to 2} \frac{(x - 2)'}{(x^2 - 4)'} = \lim_{x\to 2} \frac{1}{2x} = \frac14 .$$

In this simple case, we can also find the limit by cancelling vanishing factors in the numerator and denominator:

$$\lim_{x\to 2} \frac{x - 2}{x^2 - 4} = \lim_{x\to 2} \frac{x - 2}{(x - 2)(x + 2)} = \lim_{x\to 2} \frac{1}{x + 2} = \frac14 .$$

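example: $\lim_{x\to 0} \frac{e^x - 1 - x}{x^2}$. This approaches $\frac00$, so l'Hopital applies.

$$\lim_{x\to 0} \frac{e^x - 1 - x}{x^2} \overset{\text{Hop}}{=} \lim_{x\to 0} \frac{e^x - 0 - 1}{2x} .$$

This still approaches $\frac00$, so we can use l'Hopital again:

$$\lim_{x\to 0} \frac{e^x - 0 - 1}{2x} \overset{\text{Hop}}{=} \lim_{x\to 0} \frac{e^x}{2} = \frac{e^0}{2} = \frac12 .$$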

example: $\lim_{x\to 0^+} x\ln(x)$. (Here we use a one-sided limit $x \to 0^+$ because $\ln(x)$ is undefined for $x < 0$.) This approaches the indeterminate form $0 \cdot (-\infty)$, so it is a difficult limit, and we must manipulate it into a quotient to apply l'Hopital:

$$\lim_{x\to 0^+} x\ln(x) = \lim_{x\to 0^+} \frac{\ln(x)}{1/x}$$

Now top and bottom become infinite, the limit approaches $\frac{-\infty}{\infty}$ and l'Hopital applies.

$$\lim_{x\to 0^+} x\ln(x) = \lim_{x\to 0^+} \frac{\ln(x)}{1/x} \overset{\text{Hop}}{=} \lim_{x\to 0^+} \frac{1/x}{-1/x^2} = \lim_{x\to 0^+} (-x) = 0 .$$

example: $\lim_{x\to 0^+} x^x$. This approaches the indeterminate form $0^0$, but we can once again manipulate it into a limit we can handle:

$$\lim_{x\to 0^+} x^x = \lim_{x\to 0^+} e^{\ln(x)x} = \lim_{x\to 0^+} \exp(x\ln(x)) = \exp\!\Big(\lim_{x\to 0^+} x\ln(x)\Big).$$

We can move the limit inside the exp( ) because it is a continuous function (see §1.8 Composition Law). Now, applying the previous example, the limit becomes $\exp(0) = 1$.
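Evaluating $x\ln(x)$ and $x^x$ at smaller and smaller positive $x$ gives a feel for these two limits. This is a numerical illustration rather than a proof; the sample points and function names are assumptions.

```python
import math

def x_ln_x(x):
    """The product x * ln(x), defined for x > 0."""
    return x * math.log(x)

def x_to_the_x(x):
    """x^x written through the natural base: exp(x ln x)."""
    return math.exp(x_ln_x(x))

for x in (1e-1, 1e-2, 1e-4, 1e-8):
    print(x, x_ln_x(x), x_to_the_x(x))
```

The second column should shrink toward 0 (from below) and the third should approach 1, matching the limits computed above.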

example: $\lim_{x\to 0} \frac{\sin(x)}{e^x}$. The bottom does not approach 0, so this is not indeterminate at all, and l'Hopital does not apply here. Instead, this is an easy limit that can be evaluated by continuity:

$$\lim_{x\to 0} \frac{\sin(x)}{e^x} = \frac{\sin(0)}{e^0} = \frac01 = 0.$$

If we incorrectly try to apply l'Hopital when it is not valid, we get a wrong answer:

$$\lim_{x\to 0} \frac{\sin(x)}{e^x} \;\overset{\text{Hop}}{=}\,?? \;\lim_{x\to 0} \frac{\cos(x)}{e^x} = \frac{\cos(0)}{e^0} = 1 = \text{WRONG}.$$

example: $\lim_{x\to\infty} \frac{x^n}{e^x}$ for any integer $n > 0$. Here top and bottom go to $\infty$ as $x$ becomes very large, so the limit approaches $\frac{\infty}{\infty}$ and l'Hopital applies; in fact it applies $n$ times:

$$\lim_{x\to\infty} \frac{x^n}{e^x} \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac{n x^{n-1}}{e^x} \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac{n(n-1) x^{n-2}}{e^x} \overset{\text{Hop}}{=} \cdots \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac{n!\, x^0}{e^x} = 0 ,$$

since the top is the constant $n! = n(n-1)(n-2) \cdots (3)(2)(1)$ and the bottom goes to $\infty$. This means that the exponential growth on the bottom is much faster than the polynomial growth on the top, so the quotient gets smaller and smaller.
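To see how quickly the exponential overtakes a power, one can tabulate the quotient $x^n / e^x$ for growing $x$. The exponent $n = 5$, the sample points, and the function name `ratio` below are illustrative assumptions.

```python
import math

def ratio(x, n):
    """The quotient x^n / e^x from the example."""
    return x ** n / math.exp(x)

n = 5
for x in (10.0, 50.0, 100.0):
    print(x, ratio(x, n))

# After n applications of l'Hopital the numerator is the constant n!.
print(math.factorial(n) / math.exp(100.0))
```

The printed quotients should drop by many orders of magnitude between successive rows.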

example:

$$\lim_{x\to\infty} \frac{x^3 + x^2 + x + 1}{x^2 - x + 1} \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac{3x^2 + 2x + 1}{2x - 1} \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac{6x + 2}{2} = \infty$$

This means that the $x^3$ growth on top is much faster than the $x^2$ growth on the bottom. We can see this without l'Hopital if we divide top and bottom by the smaller leading term, namely $x^2$:

$$\lim_{x\to\infty} \frac{\frac{1}{x^2}(x^3 + x^2 + x + 1)}{\frac{1}{x^2}(x^2 - x + 1)} = \lim_{x\to\infty} \frac{x + 1 + \frac1x + \frac{1}{x^2}}{1 - \frac1x + \frac{1}{x^2}} .$$

Clearly, the top approaches $x + 1$, while the bottom approaches 1, so the quotient approaches $\infty$.

Math 133 Integration by Parts Stewart §7.1

Review of integrals. The definite integral gives the cumulative total of many small parts, such as the slivers which add up to the area under a graph. Numerically, it is a limit of Riemann sums:

$$\int_a^b f(x)\, dx = \lim_{n\to\infty} \sum_{i=1}^n f(x_i)\, \Delta x ,$$

where we divide the interval $x \in [a, b]$ into $n$ increments of size $\Delta x = \frac{b-a}{n}$ with division points $a < a+\Delta x < a+2\Delta x < \cdots < a+n\Delta x = b$, and $x_1, \ldots, x_n$ are sample points from each increment. This definition is not a theoretical curiosity: it is the reason integrals are relevant to physical problems, and it is the only way to evaluate most integrals: there is no algebraic way.

However, for sufficiently simple functions $f(x)$, we can evaluate integrals algebraically by the shortcut of the Second Fundamental Theorem of Calculus. This says that if $f(x)$ is the rate of change of some known antiderivative $F(x)$, then the integral of $f(x)$ is the cumulative total change of $F(x)$:

$$F'(x) = f(x) \implies \int_a^b f(x)\, dx = F(x)\Big|_{x=a}^{x=b} = F(b) - F(a) .$$

(The First Fundamental Theorem says that the definite integral gives an antiderivative even if there is no formula $F(x)$: defining $I(x) = \int_a^x f(t)\, dt$, we have $I'(x) = f(x)$.)

Algebraic integration is the process of finding antiderivative formulas, denoted as indefinite integrals $\int f(x)\, dx = F(x) + C$. The most direct method is to reverse Basic Derivatives, such as $(x^p)' = p\,x^{p-1}$ reversing to $\int x^p\, dx = \frac{x^{p+1}}{p+1}$. Our only other integration method so far is the Substitution Method, which reverses the Chain Rule:

$$\int f(g(x))\, g'(x)\, dx = \int f(u)\, du = F(u) + C \quad\text{where } u = g(x) \text{ and } F'(u) = f(u).$$

Reversing the Product Rule. Since we have:

$$(f(x)\, g(x))' = f(x)\, g'(x) + g(x)\, f'(x) ,$$

we can take the antiderivative of both sides to give:

$$f(x)\, g(x) = \int f(x)\, g'(x)\, dx + \int g(x)\, f'(x)\, dx ,$$

$$\int f(x)\, g'(x)\, dx = f(x)\, g(x) - \int g(x)\, f'(x)\, dx .$$

In Leibniz notation, taking $u = f(x)$, $du = f'(x)\, dx$ and $v = g(x)$, $dv = g'(x)\, dx$:

$$\int u\, dv = uv - \int v\, du .$$

This method transforms the integral of a product $f(x)\, g'(x)$ into $f(x)\, g(x)$ minus the integral of $g(x)\, f'(x)$, the other term in the Product Rule; we can think of lowering $f(x)$ to its derivative $f'(x)$ and raising $g'(x)$ to its antiderivative $g(x)$.


Method for Integration by Parts.

1. Given an indefinite integral $\int h(x)\, dx$, find a factor of the integrand $h(x)$ which you recognize as the derivative of a function $g(x)$: that is, write $h(x) = f(x) \cdot g'(x)$.

2. Taking $u = f(x)$, $dv = g'(x)\, dx$, transform the integral $\int h(x)\, dx = \int u\, dv$ into $uv - \int v\, du = f(x)g(x) - \int g(x)\, f'(x)\, dx$.

3. Simplify $g(x)\, f'(x)$, possibly using identities, and try to find its integral by other methods such as Substitution.

4. Sometimes you can repeat Steps 1 & 2 on $\int g(x)\, f'(x)\, dx$ with a different $u, v$.∗ This might result in a simpler integral which you can evaluate by other methods.

5. Instead of simplifying the integral, Step 3 or 4 might give an expression with the same integral you started with. Solve the resulting equation to find that integral.

Notice that Step 1 is the same as for the Method of Substitution, where you must find a factor of the integrand which is a known derivative $g'(x)$; but for Substitution, $g(x)$ must also appear as an inside function in the remaining factor: $h(x) = f(g(x)) \cdot g'(x)$.

example: Evaluate $\int x\cos(x)\, dx$. There are two obvious candidates for $u, v$. First, if we take $u = \cos(x)$, $dv = x\, dx$, we get $du = -\sin(x)\, dx$, $v = \frac12 x^2$, and:

$$\int u\, dv = uv - \int v\, du$$

$$\int \cos(x)\, x\, dx = \cos(x)\left(\tfrac12 x^2\right) - \int \tfrac12 x^2\, (-\sin(x))\, dx$$

Unfortunately, the new integral $\int x^2 \sin(x)\, dx$ is harder than the original $\int x\cos(x)\, dx$. We must make a wiser choice of $u, v$, so that the derivative $du$ will be simpler than the original $u$, while the antiderivative $v$ will be no worse than the original $dv$.

The other obvious choice will work: take $u = x$, $dv = \cos(x)\, dx$, so that $du = 1\, dx$ and $v = \sin(x)$. Then:†

$$\int u\, dv = uv - \int v\, du$$

$$\int x\cos(x)\, dx = x\sin(x) - \int \sin(x)\, 1\, dx = x\sin(x) + \cos(x).$$

Thus, Steps 1–3 were enough to integrate.

To check our answer, we reverse our Integration by Parts using the Product Rule:

$$(x\sin(x) + \cos(x))' = x\sin'(x) + \sin(x)\,(x)' + \cos'(x) = x\cos(x) + \sin(x) - \sin(x) = x\cos(x).$$

∗Repeating with the same factorization $\int v\, du$ would get back the original integral $\int u\, dv$.

†For brevity, we again neglect the arbitrary constant $+C$ in a general antiderivative, though you should write it on a test or quiz.

example: Evaluate $\int x^2 e^{-x}\, dx$. We should choose $u = x^2$, $dv = e^{-x}\, dx$, so that $du = 2x\, dx$ is simpler, but $v = -e^{-x}$ is no more complicated:

$$\int u\, dv = uv - \int v\, du$$

$$\int x^2 e^{-x}\, dx = x^2(-e^{-x}) - \int (-e^{-x})\, 2x\, dx = -x^2 e^{-x} + 2\int x e^{-x}\, dx$$

Going on to Step 4, we repeat the process for the integral on the right side, this time with $u = x$, $dv = e^{-x}\, dx$ and $du = dx$, $v = -e^{-x}$:

$$\int x e^{-x}\, dx = x(-e^{-x}) - \int (-e^{-x})\, dx = -x e^{-x} + \int e^{-x}\, dx = -x e^{-x} - e^{-x}$$

Putting these together:

$$\int x^2 e^{-x}\, dx = -x^2 e^{-x} + 2(-x e^{-x} - e^{-x}) = -(x^2 + 2x + 2)\, e^{-x}.$$

example: Evaluate $\int e^x \sin(x)\, dx$. Steps 1–4 give:

$$\begin{aligned}
\int e^x \sin(x)\, dx &= e^x \sin(x) - \int e^x \cos(x)\, dx, && u = \sin(x),\ v = e^x \\
&= e^x \sin(x) - \left(e^x \cos(x) - \int e^x (-\sin(x))\, dx\right), && u = \cos(x),\ v = e^x.
\end{aligned}$$

We conclude:

$$\int e^x \sin(x)\, dx = e^x \sin(x) - e^x \cos(x) - \int e^x \sin(x)\, dx.$$

Since our integral $\int e^x \sin(x)\, dx$ appears on both sides, we go to Step 5 and solve for it:

$$\int e^x \sin(x)\, dx = \tfrac12\left(e^x \sin(x) - e^x \cos(x)\right) .$$
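As with the earlier Product Rule check, the answer can be tested by differentiating it. The sketch below does this numerically for $\frac12 e^x(\sin(x) - \cos(x))$; the sample points, step size, and helper names are assumptions for illustration.

```python
import math

def antiderivative(x):
    """Candidate antiderivative (1/2) e^x (sin x - cos x)."""
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))

def integrand(x):
    """Original integrand e^x sin x."""
    return math.exp(x) * math.sin(x)

def central_difference(g, x, h=1e-6):
    """Symmetric finite-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (-1.0, 0.8, 2.0):
    print(x, central_difference(antiderivative, x), integrand(x))
```

The last two columns should agree at every sample point if the Step 5 algebra was done correctly.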

example: Evaluate $\int \ln(x)\, dx$. Here there does not seem to be any $dv$ factor, but we can always take $dv = 1\, dx$, so $v = x$:

$$\int \ln(x)\, 1\, dx = \ln(x)\, x - \int x\, \frac1x\, dx = x\ln(x) - x.$$

example: Evaluate $\int \sin^{-1}(x)\, dx$.‡ Again we must use $u = \sin^{-1}(x)$ and $dv = 1\, dx$, counting on the fact that $du$ is simpler than $u$:

$$\int \sin^{-1}(x)\, 1\, dx = \sin^{-1}(x)\, x - \int x\, \frac{1}{\sqrt{1-x^2}}\, dx$$

Continuing Step 3, we use the substitution $z = 1 - x^2$ on the right-hand integral:

$$\int x\, \frac{1}{\sqrt{1-x^2}}\, dx = -\frac12 \int \frac{1}{\sqrt{1-x^2}}\, (-2x)\, dx = -\frac12 \int \frac{1}{\sqrt z}\, dz = -\sqrt z = -\sqrt{1-x^2}.$$

Combining:

$$\int \sin^{-1}(x)\, dx = x\sin^{-1}(x) - \left(-\sqrt{1-x^2}\right) = x\sin^{-1}(x) + \sqrt{1-x^2}.$$

‡Notation: $\sin^{-1}(x) = \arcsin(x)$, but $\sin(x)^{-1} = \frac{1}{\sin(x)} = \csc(x)$.

Math 133 Trigonometric Integrals Stewart §7.2

Products by substitution. In this section, we develop several methods to find indefinite integrals (antiderivatives) of products of trig functions. The simplest method is a simple trig substitution which reduces the integral to a polynomial:

(a) $\int \sin^n(x) \cos^{2m+1}(x)\, dx$ for $m \ge 0$. Take $u = \sin(x)$, $du = \cos(x)\, dx$. Example:∗

$$\begin{aligned}
\int \sin^{-10}(x) \cos^3(x)\, dx &= \int \sin^{-10}(x) \cos^2(x) \cdot \cos(x)\, dx \\
&= \int \sin^{-10}(x)\,(1 - \sin^2(x)) \cdot \cos(x)\, dx \\
&= \int u^{-10}(1 - u^2)\, du \\
&= \int u^{-10} - u^{-8}\, du \\
&= \tfrac{1}{-9} u^{-9} - \tfrac{1}{-7} u^{-7} \\
&= -\tfrac19 \sin^{-9}(x) + \tfrac17 \sin^{-7}(x) \\
&= -\tfrac19 \csc^9(x) + \tfrac17 \csc^7(x).
\end{aligned}$$

(b) $\int \sin^{2n+1}(x) \cos^m(x)\, dx$ for $n \ge 0$. Take $u = \cos(x)$, $du = -\sin(x)\, dx$. Example:

$$\begin{aligned}
\int \sin^5(x) \cos^{20}(x)\, dx &= -\int \sin^4(x) \cos^{20}(x) \cdot (-\sin(x))\, dx \\
&= -\int (1 - \cos^2(x))^2 \cos^{20}(x) \cdot (-\sin(x))\, dx \\
&= -\int (1 - u^2)^2 u^{20}\, du \\
&= -\int (1 - 2u^2 + u^4)\, u^{20}\, du \\
&= -\tfrac{1}{21} u^{21} + \tfrac{2}{23} u^{23} - \tfrac{1}{25} u^{25} \\
&= -\tfrac{1}{21} \cos^{21}(x) + \tfrac{2}{23} \cos^{23}(x) - \tfrac{1}{25} \cos^{25}(x).
\end{aligned}$$

(c) $\int \tan^n(x) \sec^{2m+2}(x)\, dx$ for $m \ge 0$. Take $u = \tan(x)$, $du = \sec^2(x)\, dx$. Example:†

$$\begin{aligned}
\int \tan(x)^{-1} \sec^6(x)\, dx &= \int \tan(x)^{-1} \sec^4(x) \cdot \sec^2(x)\, dx \\
&= \int \tan(x)^{-1} (\tan^2(x) + 1)^2 \cdot \sec^2(x)\, dx \\
&= \int u^{-1}(u^2 + 1)^2\, du \\
&= \int u^{-1}(u^4 + 2u^2 + 1)\, du \\
&= \tfrac14 u^4 + u^2 + \ln|u| \\
&= \tfrac14 \tan^4(x) + \tan^2(x) + \ln|\tan(x)|.
\end{aligned}$$

(d) $\int \tan^{2n+1}(x) \sec^m(x)\, dx$ for $n \ge 0$. Take $u = \sec(x)$, $du = \tan(x)\sec(x)\, dx$.

∗For brevity, we again neglect the $+C$ in indefinite integrals, though you should write it on a test.

†Notation: $\tan(x)^{-1} = \frac{1}{\tan(x)} = \cot(x)$, but $\tan^{-1}(x) = \arctan(x)$.

Remaining cases. If a product of trig functions does not fit the above types, we have some strategies to make it tractable.

• Rewrite using trig function definitions, producing a good type.

$$\int \sin^3(x) \cos^{-6}(x)\, dx = \int \frac{\sin^3(x)}{\cos^3(x)}\, \frac{1}{\cos^3(x)}\, dx = \int \tan^3(x) \sec^3(x)\, dx = \text{type (d)}.$$

$$\int \sin^2(x) \cos(x) \tan^2(x) \csc(x)\, dx = \int \sin^2(x) \cos(x)\, \frac{\sin^2(x)}{\cos^2(x)}\, \frac{1}{\sin(x)}\, dx = \int \sin^3(x) \cos(x)^{-1}\, dx = \text{type (b)}.$$

• Rewrite using the identities: $\sin^2(x) = \frac12(1 - \cos(2x))$, $\cos^2(x) = \frac12(1 + \cos(2x))$, $\sin(x)\cos(x) = \frac12 \sin(2x)$. For example:

$$\begin{aligned}
\int \sin^6(x)\, dx &= \int \left(\tfrac12(1 - \cos(2x))\right)^3 dx \\
&= \tfrac18 \int 1 - 3\cos(2x) + 3\cos^2(2x) - \cos^3(2x)\, dx \\
&= \tfrac18 \int 1 - 3\cos(2x) + \tfrac32(1 + \cos(4x)) - \cos^3(2x)\, dx ,
\end{aligned}$$

where we used the binomial formula $(a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$. Now each term can be done as type (a) with the substitution $u = \sin(2x)$ or $u = \sin(4x)$.

Recalcitrant cases. For the really tough ones, we need tricks.

example: Recall this amazing trick from §6.2:‡

$$\begin{aligned}
\int \sec(x)\, dx &= \int \sec(x)\, \frac{\sec(x) + \tan(x)}{\sec(x) + \tan(x)}\, dx \\
&= \int \frac{1}{\sec(x) + \tan(x)} \cdot \left(\sec^2(x) + \sec(x)\tan(x)\right) dx \\
&= \int \frac1u\, du \qquad \text{for } u = \sec(x) + \tan(x),\ du = (\sec^2(x) + \sec(x)\tan(x))\, dx \\
&= \ln|u| = \ln\left|\sec(x) + \tan(x)\right| .
\end{aligned}$$

Another trick for this is to write $\int \sec(x)\, dx = \int \frac{1}{\cos^2(x)} \cos(x)\, dx$, and substitute $u = \sin(x)$ to get $\int \frac{1}{1-u^2}\, du$. We will see how to integrate such rational functions in §7.4.

‡The integral of secant was an important problem for map-makers in the 1600's, when Calculus was first developed. It calibrates the stretching in the Mercator projection, in which map directions match compass directions.

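example: Here is a tricky integration by parts, in which we get back to the same integral we started with:

$$\begin{aligned}
\int\sec^3(x)\,dx &= \int\sec(x)\bigl(1+\tan^2(x)\bigr)\,dx = \int\sec(x)\,dx + \int \underbrace{\tan(x)}_{u}\,\underbrace{\sec(x)\tan(x)\,dx}_{dv} \\
&= \ln|\sec(x)+\tan(x)| + \underbrace{\tan(x)}_{u}\,\underbrace{\sec(x)}_{v} - \int\underbrace{\sec(x)}_{v}\,\underbrace{\sec^2(x)\,dx}_{du} .
\end{aligned}$$

Since we have $\int\sec^3(x)\,dx$ on both sides, we can solve for it to get:

$$\int\sec^3(x)\,dx = \tfrac12\Bigl(\ln\bigl|\sec(x)+\tan(x)\bigr| + \tan(x)\sec(x)\Bigr).$$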
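A formula obtained by solving for the integral on both sides is worth a numerical sanity check. The sketch below (the function names and the interval [0, 1] are illustrative choices, not part of the notes) compares a midpoint Riemann sum of sec³(x) with the difference of the claimed antiderivative at the endpoints:

```python
import math

def sec3_antiderivative(x):
    """Claimed antiderivative of sec^3: (ln|sec x + tan x| + tan x sec x) / 2."""
    sec, tan = 1 / math.cos(x), math.tan(x)
    return 0.5 * (math.log(abs(sec + tan)) + tan * sec)

def midpoint_integral(f, a, b, slices=20000):
    """Midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / slices
    return sum(f(a + (i + 0.5) * dx) for i in range(slices)) * dx

a, b = 0.0, 1.0  # sec is continuous here, since 1 < pi/2
numeric = midpoint_integral(lambda x: 1 / math.cos(x) ** 3, a, b)
exact = sec3_antiderivative(b) - sec3_antiderivative(a)
print(numeric, exact)
```

The two printed numbers should agree to many decimal places; a sign slip in the integration by parts would show up immediately.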

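Integrals with inside coefficients. Recall the identities:

$$\begin{aligned}
\cos(a+b) &= \cos(a)\cos(b) - \sin(a)\sin(b) \\
\cos(a-b) &= \cos(a)\cos(b) + \sin(a)\sin(b) \\
\tfrac12\bigl(\cos(a+b)+\cos(a-b)\bigr) &= \cos(a)\cos(b).
\end{aligned}$$

This allows us to do integrals of the form (for $n \ne \pm m$):

$$\begin{aligned}
\int \cos(nx)\cos(mx)\,dx &= \int \tfrac12\bigl(\cos(nx+mx)+\cos(nx-mx)\bigr)\,dx \\
&= \tfrac{1}{2(n+m)}\sin\bigl((n+m)x\bigr) + \tfrac{1}{2(n-m)}\sin\bigl((n-m)x\bigr).
\end{aligned}$$

Some similarly useful identities:

$$\begin{aligned}
\tfrac12\bigl(\cos(a-b)-\cos(a+b)\bigr) &= \sin(a)\sin(b) \\
\tfrac12\bigl(\sin(a+b)+\sin(a-b)\bigr) &= \sin(a)\cos(b).
\end{aligned}$$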
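As a quick check of the product formula for cos(nx) cos(mx), the following sketch compares it with a midpoint Riemann sum. The values n = 3, m = 2 and the interval [0, 1] are arbitrary illustrative choices; the formula assumes n ≠ ±m.

```python
import math

def product_antiderivative(n, m, x):
    """sin((n+m)x)/(2(n+m)) + sin((n-m)x)/(2(n-m)); requires n != m and n != -m."""
    s, d = n + m, n - m
    return math.sin(s * x) / (2 * s) + math.sin(d * x) / (2 * d)

def midpoint_integral(f, a, b, slices=20000):
    """Midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / slices
    return sum(f(a + (i + 0.5) * dx) for i in range(slices)) * dx

n, m, a, b = 3, 2, 0.0, 1.0
numeric = midpoint_integral(lambda x: math.cos(n * x) * math.cos(m * x), a, b)
exact = product_antiderivative(n, m, b) - product_antiderivative(n, m, a)
print(numeric, exact)
```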

Math 133 Reverse Trig Substitution Stewart §7.3

Reducing to standard trig forms. To find an indefinite integral $\int f(x)\,dx$, we transform it by methods like Substitution and Integration by Parts until we reduce to an integral we recognize from before, a “standard form”. In the previous section §7.2, we were able to compute most integrals involving products of trig functions, so these are now standard forms to work toward.

A common type of difficult integral involves forms like $\sqrt{\pm x^2 \pm 1}$. We convert such forms into trigonometric integrals, which at first seems to complicate them. However, we take careful advantage of the Pythagorean identities $\cos^2(\theta) + \sin^2(\theta) = 1$ and $\tan^2(\theta) + 1 = \sec^2(\theta)$, so that the resulting trig formulas simplify to doable integrals.

Our first example is $\int\sqrt{1-x^2}\,dx$. This seems simple enough, but neither Substitution nor Integration by Parts will simplify it. Instead, imagine that we obtained this from a more complicated integral by a trig substitution $x = \sin(\theta)$: the current variable x is actually a function of a previous variable θ. The previous integral would be:

$$\int\sqrt{1-x^2}\,dx = \int\sqrt{1-\sin^2(\theta)}\cdot\cos(\theta)\,d\theta, \quad\text{where } x = \sin(\theta),\ dx = \cos(\theta)\,d\theta.$$

We did not choose this substitution at random: the trig form simplifies because $\sqrt{1-\sin^2(\theta)} = \cos(\theta)$, and we obtain a standard form from §7.2:

$$\int \sqrt{1-\sin^2(\theta)}\cdot\cos(\theta)\,d\theta = \int\cos^2(\theta)\,d\theta = \tfrac12\theta + \tfrac14\sin(2\theta) = \tfrac12\theta+\tfrac12\sin(\theta)\cos(\theta) .$$

Finally, we substitute back in terms of x: $\theta = \arcsin(x)$, $\sin(\theta) = x$, $\cos(\theta) = \sqrt{1-x^2}$:∗

$$\int\sqrt{1-x^2}\,dx = \tfrac12\arcsin(x) + \tfrac12 x\sqrt{1-x^2} .$$

Let’s check the area of the unit circle, i.e. twice the area under the graph $y = \sqrt{1-x^2}$:

$$2\int_{-1}^{1} \sqrt{1-x^2}\,dx = \Bigl[\arcsin(x) + x\sqrt{1-x^2}\Bigr]_{x=-1}^{x=1} = \arcsin(1)-\arcsin(-1) = \pi.$$
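The endpoint check above cannot detect a sign error in the $x\sqrt{1-x^2}$ term, because that term vanishes at x = ±1. A more sensitive test is to differentiate the antiderivative numerically at interior points. The sketch below does this with a central difference; the function names and sample points are illustrative choices.

```python
import math

def F(x):
    """Antiderivative of sqrt(1 - x^2): (arcsin x + x sqrt(1 - x^2)) / 2."""
    return 0.5 * math.asin(x) + 0.5 * x * math.sqrt(1 - x * x)

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# F'(x) should reproduce the integrand sqrt(1 - x^2) away from the endpoints.
for x in (-0.5, 0.0, 0.3, 0.8):
    assert abs(central_difference(F, x) - math.sqrt(1 - x * x)) < 1e-6

area = 2 * (F(1.0) - F(-1.0))  # area of the unit circle
print(area, math.pi)
```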

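Integrals with $\sqrt{\pm x^2 \pm a^2}$. We choose a reverse trig substitution depending on the signs in the expression, then use the corresponding Pythagorean identity to obtain a standard trig form:

$$\begin{array}{llll}
\text{form} & \text{substitution} & \text{differential} & \text{simplification} \\
\sqrt{a^2-x^2} & x = a\sin(\theta) & dx = a\cos(\theta)\,d\theta & \sqrt{a^2-a^2\sin^2(\theta)} = a\cos(\theta) \\
\sqrt{a^2+x^2} & x = a\tan(\theta) & dx = a\sec^2(\theta)\,d\theta & \sqrt{a^2+a^2\tan^2(\theta)} = a\sec(\theta) \\
\sqrt{x^2-a^2} & x = a\sec(\theta) & dx = a\tan(\theta)\sec(\theta)\,d\theta & \sqrt{a^2\sec^2(\theta)-a^2} = a\tan(\theta)
\end{array}$$

∗See §6.6 for inverse trig functions.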

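example: $\int \frac{1}{\sqrt{4-x^2}}\,dx$. Let $x = 2\sin(\theta)$, $dx = 2\cos(\theta)\,d\theta$, $\sqrt{4-(2\sin(\theta))^2} = 2\cos(\theta)$:

$$\int \frac{1}{\sqrt{4-x^2}}\,dx = \int \frac{1}{2\cos(\theta)}\cdot 2\cos(\theta)\,d\theta = \theta = \arcsin(\tfrac12 x) .$$

We could do this more directly by manipulating the integrand to the known derivative of $\arcsin(x)$ by the substitution $u = \tfrac12 x$:

$$\int \frac{1}{\sqrt{4-x^2}}\,dx = \int \frac{1}{\sqrt{1-(\tfrac12 x)^2}}\cdot\tfrac12\,dx = \int \frac{1}{\sqrt{1-u^2}}\,du = \arcsin(u) = \arcsin(\tfrac12 x).$$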

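example: $\int \frac{1}{\sqrt{9x^2+4}}\,dx$. Let $x = \tfrac23\tan(\theta)$, $dx = \tfrac23\sec^2(\theta)\,d\theta$, $\sqrt{9(\tfrac23\tan(\theta))^2+4} = 2\sec(\theta)$:

$$\begin{aligned}
\int \frac{1}{\sqrt{9x^2+4}}\,dx &= \int \frac{1}{2\sec(\theta)}\cdot\tfrac23\sec^2(\theta)\,d\theta = \tfrac13\int\sec(\theta)\,d\theta \\
&= \tfrac13\ln\bigl|\tan(\theta)+\sec(\theta)\bigr| = \tfrac13\ln\Bigl|\tfrac32 x + \tfrac12\sqrt{9x^2+4}\Bigr| .
\end{aligned}$$

We can even write this in terms of the inverse hyperbolic sine from §6.7:

$$\tfrac13\ln\Bigl|\tfrac32 x + \sqrt{\tfrac94 x^2+1}\Bigr| = \tfrac13\sinh^{-1}(\tfrac32 x) .$$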

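example: $\int \frac{\sqrt{x^2-25}}{x}\,dx$; $x = 5\sec(\theta)$, $dx = 5\tan(\theta)\sec(\theta)\,d\theta$, $\sqrt{(5\sec(\theta))^2-25} = 5\tan(\theta)$:

$$\begin{aligned}
\int \frac{\sqrt{x^2-25}}{x}\,dx &= \int \frac{5\tan(\theta)}{5\sec(\theta)}\cdot 5\tan(\theta)\sec(\theta)\,d\theta = 5\int\tan^2(\theta)\,d\theta \\
&= 5\int \sec^2(\theta)-1\,d\theta = 5\tan(\theta) - 5\theta = \sqrt{x^2-25} - 5\,\mathrm{arcsec}(\tfrac{x}{5}) .
\end{aligned}$$

Since we dislike arcsec (§6.6), we can also write this as: $\sqrt{x^2-25} - 5\arctan\bigl(\tfrac15\sqrt{x^2-25}\bigr)$.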

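example: $\int \frac{1}{(x^2-4)^{3/2}}\,dx$; $x = 2\sec(\theta)$, $dx = 2\tan(\theta)\sec(\theta)\,d\theta$, $((2\sec(\theta))^2-4)^{3/2} = 8\tan^3(\theta)$:

$$\begin{aligned}
\int \frac{1}{(x^2-4)^{3/2}}\,dx &= \int \frac{1}{8\tan^3(\theta)}\cdot 2\tan(\theta)\sec(\theta)\,d\theta = \tfrac14\int \sin^{-2}(\theta)\cos(\theta)\,d\theta \\
&= -\tfrac14\sin(\theta)^{-1} = -\tfrac14\,\frac{\tfrac12 x}{\sqrt{(\tfrac12 x)^2-1}} = -\frac{x}{4\sqrt{x^2-4}} .
\end{aligned}$$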

Extra topic: Geometric substitution. Here is some esoteric knowledge for the hard-core students. Consider any integral in which trig functions are combined by the four arithmetic operations, such as:

$$\int \frac{\cos^2(\theta)\sin(\theta) - 2\tan(\theta) + 1}{\sec^3(\theta) + \sin^3(\theta) + 3\cos(\theta)\sin(\theta) + 5}\,d\theta.$$

There is an amazing technique, the Tangent Half-Angle Substitution, which allows us to reduce any such problem to the integral of a rational function (a quotient of polynomials), which can then be done by Partial Fractions (see §7.4).

This substitution is motivated entirely by geometry. Recall that the basic trig functions are circular functions: the coordinates $(x, y) = (\cos(\theta), \sin(\theta))$ for $\theta \in [0, 2\pi]$ trace the points on a unit circle. There is another way to trace this circle, the rational parametrization: for any number $t \in (-\infty,\infty)$, draw the line L from the fixed point $(-1, 0)$ to the point $(0, t)$ on the y-axis: this line cuts the circle in exactly one other point $(x, y)$. As $(0, t)$ moves along the y-axis, the point $(x, y)$ moves around the entire circle, leaving out only the fixed point $(-1, 0)$.

To compute the coordinates (x, y) corresponding to a given t, we write the slope of line L using the two similar triangles between L and the x-axis:

$$\text{slope} = \frac{t}{1} = \frac{y}{x+1} \implies y = t(x+1).$$

The point (x, y) also satisfies the circle equation $x^2 + y^2 = 1$. Substituting for y gives:

$$x^2 + (t(x+1))^2 = 1 \implies x^2 + t^2x^2 + 2t^2x + t^2 - 1 = 0 .$$

Now, for any t, this equation is always satisfied by the fixed point with $x = -1$, so $x+1$ must be a factor of the last polynomial. Long division gives:

$$x^2 + t^2x^2 + 2t^2x + t^2 - 1 = (x+1)(t^2x + x + t^2 - 1) = 0 \implies x = \frac{1-t^2}{1+t^2} \ \text{ or }\ x = -1 .$$

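Plugging this into $y = t(x+1)$ to write y in terms of t, we get a new formula for the points of the circle, controlled by $t \in (-\infty,\infty)$:

$$(x, y) = \left(\frac{1-t^2}{1+t^2},\ \frac{2t}{1+t^2}\right).$$

Comparing this with our trig formula $(x, y) = (\cos(\theta), \sin(\theta))$ suggests the substitution:

$$\cos(\theta) = \frac{1-t^2}{1+t^2}, \qquad \sin(\theta) = \frac{2t}{1+t^2}, \qquad \tan(\theta) = \frac{2t}{1-t^2}, \quad\text{etc.}$$

We can think of this as a backward substitution, $\theta = \arcsin\bigl(\frac{2t}{1+t^2}\bigr)$, and compute:

$$\begin{aligned}
\sin(\theta) = \frac{2t}{1+t^2} &\implies \cos(\theta)\,d\theta = 2\,\frac{1-t^2}{(1+t^2)^2}\,dt \\
&\implies d\theta = 2\,\frac{1}{\cos(\theta)}\,\frac{1-t^2}{(1+t^2)^2}\,dt = 2\,\frac{1+t^2}{1-t^2}\,\frac{1-t^2}{(1+t^2)^2}\,dt = \frac{2}{1+t^2}\,dt .
\end{aligned}$$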

To restore the original variable θ at the end of the integration, we need to write $t = \frac{y}{x+1}$ in terms of trig functions. We can do this just by $(x, y) = (\cos(\theta), \sin(\theta))$; but also recall the theorem of elementary geometry which says the angle between L and the x-axis is $\frac{\theta}{2}$, giving another expression for the slope:

$$\text{slope} = t = \frac{y}{x+1} = \frac{\sin(\theta)}{\cos(\theta)+1} = \tan\left(\frac{\theta}{2}\right),$$

which is why we call it the Tangent Half-Angle Substitution.
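The rational parametrization is easy to test numerically. In the sketch below (the helper name `rational_point` and the sample values of t are illustrative), we pick t, set θ = 2 arctan(t) so that t = tan(θ/2), and confirm that the rational formulas reproduce (cos θ, sin θ) and land on the unit circle:

```python
import math

def rational_point(t):
    """Point of the unit circle given by the rational parametrization."""
    return ((1 - t * t) / (1 + t * t), 2 * t / (1 + t * t))

for t in (-3.0, -0.5, 0.0, 0.25, 2.0):
    theta = 2 * math.atan(t)  # chosen so that t = tan(theta / 2)
    x, y = rational_point(t)
    assert math.isclose(x, math.cos(theta), abs_tol=1e-12)
    assert math.isclose(y, math.sin(theta), abs_tol=1e-12)
    assert math.isclose(x * x + y * y, 1.0, abs_tol=1e-12)

print("rational parametrization matches (cos, sin) at the sampled values")
```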

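example: We carry out this substitution, then the Partial Fraction Method from §7.4:

$$\begin{aligned}
\int \frac{1}{\sin^2(\theta)+\cos(\theta)+2}\,d\theta
&= \int \frac{1}{\left(\frac{2t}{1+t^2}\right)^2 + \left(\frac{1-t^2}{1+t^2}\right) + 2}\cdot\frac{2}{1+t^2}\,dt = \int\frac{2(t^2+1)}{t^4+8t^2+3}\,dt \\
&= \int \frac{2(t^2+1)}{(t^2+4)^2-13}\,dt = \int \frac{1+\frac{3}{\sqrt{13}}}{t^2+4+\sqrt{13}} + \frac{1-\frac{3}{\sqrt{13}}}{t^2+4-\sqrt{13}}\,dt \\
&= \frac{(\sqrt{13}+3)\sqrt{4-\sqrt{13}}}{\sqrt{39}}\arctan\left(\frac{t}{\sqrt{4+\sqrt{13}}}\right) + \frac{(\sqrt{13}-3)\sqrt{4+\sqrt{13}}}{\sqrt{39}}\arctan\left(\frac{t}{\sqrt{4-\sqrt{13}}}\right) \\
&= \frac{(\sqrt{13}+3)\sqrt{4-\sqrt{13}}}{\sqrt{39}}\arctan\left(\frac{\tan(\theta/2)}{\sqrt{4+\sqrt{13}}}\right) + \frac{(\sqrt{13}-3)\sqrt{4+\sqrt{13}}}{\sqrt{39}}\arctan\left(\frac{\tan(\theta/2)}{\sqrt{4-\sqrt{13}}}\right).
\end{aligned}$$

Here each coefficient comes from $\int \frac{A}{t^2+a^2}\,dt = \frac{A}{a}\arctan\bigl(\frac{t}{a}\bigr)$, simplified using $\sqrt{4+\sqrt{13}}\cdot\sqrt{4-\sqrt{13}} = \sqrt{3}$.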

The point here is not the specific answer, which can be gotten by computer much more reliably than by hand. It is the principle that this is possible for any integral of this type, precisely because of the rational parametrization of the circle. This leads toward the theory of Lie groups, which generalizes the circle to highly symmetric geometric objects in higher dimensions, starting with the 3-dimensional sphere.†

†We should be able to produce a similar substitution for integrals involving matrix coefficients of any representation of a compact Lie group, composed with the exponential map on the Lie algebra.


Geometric substitution for circular functions. By now, only the hard-core students are reading, so here is some esoteric knowledge for you. Consider any integral in which trig functions are combined by the four arithmetic operations, such as:

$$\int \frac{1}{\cos^3(\theta) + \cos(\theta)\sin(\theta) + \sin^3(\theta)}\,d\theta.$$

There is an amazing substitution which allows us to reduce any such problem to the integral of a rational function (a quotient of polynomials), which can then be done by Partial Fractions (see §7.4).

The fascinating thing about this substitution is that the only way you would think of it is from geometry. Recall that the basic trig functions are circular functions: the coordinates $(x, y) = (\cos(\theta), \sin(\theta))$ for $\theta \in [0, 2\pi]$ trace the points on a unit circle. There is another way to trace this circle: for any number $t \in (-\infty,\infty)$, draw the point $(1, 2t)$ on the line $x = 1$, which is tangent to the circle at $(1, 0)$. Now the line L from $(-1, 0)$ to $(1, 2t)$ cuts the circle in exactly one other point $(x, y)$. As t increases, the point $(x, y)$ goes around the circle.

Let us find the coordinates (x, y) corresponding to a given t. The slope of line L can be determined from the two similar triangles cut from the angle between L and the x-axis:

$$\text{slope} = \frac{y}{x+1} = \frac{2t}{2} \implies y = t(x+1).$$

The point (x, y) also satisfies the circle equation $x^2 + y^2 = 1$. Substituting for y gives:

$$x^2 + (t(x+1))^2 = 1 \implies x^2 + t^2x^2 + 2t^2x + t^2 - 1 = 0 .$$

Now, for any t, this equation is always satisfied by the fixed point of L at $x = -1$, so $x+1$ must be a factor of the above polynomial. Long division gives:

$$x^2 + t^2x^2 + 2t^2x + t^2 - 1 = (x+1)(t^2x + x + t^2 - 1) = 0 .$$

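Solving the second factor for x, and plugging back to get y, we have:

$$(x, y) = \left(\frac{1-t^2}{1+t^2},\ \frac{2t}{1+t^2}\right).$$

Since the point (x, y) can be obtained either as $(\cos(\theta), \sin(\theta))$ or $\bigl(\frac{1-t^2}{1+t^2}, \frac{2t}{1+t^2}\bigr)$, we set:

$$\cos(\theta) = \frac{1-t^2}{1+t^2}, \qquad \sin(\theta) = \frac{2t}{1+t^2}, \qquad \tan(\theta) = \frac{2t}{1-t^2}.$$

We can think of this as a backward substitution: $\theta = \arcsin\bigl(\frac{2t}{1+t^2}\bigr)$, and we can compute:

$$\begin{aligned}
\sin(\theta) = \frac{2t}{1+t^2} &\implies \cos(\theta)\,d\theta = 2\,\frac{1-t^2}{(1+t^2)^2}\,dt \\
&\implies d\theta = 2\,\frac{1}{\cos(\theta)}\,\frac{1-t^2}{(1+t^2)^2}\,dt = 2\,\frac{1+t^2}{1-t^2}\,\frac{1-t^2}{(1+t^2)^2}\,dt = \frac{2}{1+t^2}\,dt .
\end{aligned}$$

To restore the original variable at the end, we just recall:

$$\text{slope} = t = \frac{y}{x+1} = \frac{\sin(\theta)}{\cos(\theta)+1}.$$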

Math 133 Partial Fractions Stewart §7.4

Integrating basic rational functions. For a function f(x), we have examined several algebraic methods∗ for finding its indefinite integral (antiderivative) $F(x) = \int f(x)\,dx$, which allows us to compute definite integrals $\int_a^b f(x)\,dx = F(b) - F(a)$ by the Second Fundamental Theorem.

In this section, we will learn a special technique to integrate any rational function, meaning a quotient of two polynomials:

$$f(x) = \frac{g(x)}{h(x)} = \frac{a_m x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0}{b_n x^n + b_{n-1}x^{n-1} + \cdots + b_1 x + b_0},$$

where $a_i, b_j$ are constant coefficients. We call the largest powers m and n the degrees of the polynomials g(x) and h(x), assuming that the highest coefficients $a_m, b_n \ne 0$.

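We have several basic rational functions whose integrals we already know:

(i) $\int a_m x^m + \cdots + a_1 x + a_0\,dx = \frac{a_m}{m+1}x^{m+1} + \cdots + \frac{a_1}{2}x^2 + a_0 x + C.$

(ii) $\int \frac{1}{x-a}\,dx = \ln|x-a| + C.$

(iii) $\int \frac{1}{(x-a)^n}\,dx = -\frac{1}{(n-1)(x-a)^{n-1}} + C$ for $n \ge 2$.

(iv) $\int \frac{x}{x^2+a}\,dx = \frac12\int \frac{1}{x^2+a}\cdot 2x\,dx = \frac12\ln|x^2+a| + C.$

(v) $\int \frac{1}{x^2+a}\,dx = \frac{1}{\sqrt a}\int \frac{1}{(\frac{x}{\sqrt a})^2+1}\cdot\frac{1}{\sqrt a}\,dx = \frac{1}{\sqrt a}\arctan\bigl(\frac{x}{\sqrt a}\bigr) + C$, for $a > 0$.

(vi) $\int \frac{1}{(x^2+1)^2}\,dx$. Letting $x = \tan(\theta)$, $x^2+1 = \sec^2(\theta)$, $dx = \sec^2(\theta)\,d\theta$:

$$\begin{aligned}
\int \frac{1}{(x^2+1)^2}\,dx &= \int \frac{1}{\sec^4(\theta)}\sec^2(\theta)\,d\theta = \int\cos^2(\theta)\,d\theta \\
&= \tfrac12\bigl(\theta + \sin(\theta)\cos(\theta)\bigr) + C = \tfrac12\left(\arctan(x) + \frac{x}{x^2+1}\right) + C .
\end{aligned}$$

We used: $\int\cos^2(\theta)\,d\theta = \int \tfrac12 + \tfrac12\cos(2\theta)\,d\theta = \tfrac12\theta + \tfrac14\sin(2\theta) = \tfrac12\theta + \tfrac12\sin(\theta)\cos(\theta)$, and (as in §6.6) $\sin(\theta) = \frac{x}{\sqrt{x^2+1}}$, $\cos(\theta) = \frac{1}{\sqrt{x^2+1}}$.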

Quadratic denominator. With the above basic integrals, we can integrate any rational function with numerator of degree at most 1 and denominator of degree at most 2:

$$\int \frac{px+q}{ax^2+bx+c}\,dx .$$

There are two different cases, depending on the sign of the discriminant $d = b^2 - 4ac$.

∗Substitution §4.5, Integration by Parts §7.1, Products of Trig Functions §7.2, Reverse Trig Substitution §7.3

example: Here is how to handle the case where $d = b^2 - 4ac > 0$, such as:

$$\int \frac{x+1}{x^2+x-2}\,dx,$$

where $d = 1^2 - 4(1)(-2) = 9$. By the Quadratic Formula, the denominator has two real roots $x = \frac{-b\pm\sqrt{b^2-4ac}}{2a} = 1, -2$, which are the vertical asymptotes of our function.

We split our function into a sum of simple parts, each having just one vertical asymptote:

$$\frac{x+1}{x^2+x-2} = \frac{x+1}{(x-1)(x+2)} = \frac{A}{x-1} + \frac{B}{x+2}.$$

This is called the partial fraction expansion of our rational function. For any constants A, B, the graph of the right-hand function will have the same asymptotes as our original function, but we can actually find constants which make the two exactly equal. Clearing denominators, we want A, B such that:

$$x+1 = A(x+2) + B(x-1) \quad\text{for all } x.$$

Setting $x = 1$ gives $1+1 = A(1+2) + B(0)$, so $A = \tfrac23$; and setting $x = -2$ gives $-2+1 = A(0) + B(-2-1)$, so $B = \tfrac13$. Now we can use the basic integral (ii) above:

$$\int \frac{x+1}{x^2+x-2}\,dx = \int \frac{\tfrac23}{x-1} + \frac{\tfrac13}{x+2}\,dx = \tfrac23\ln|x-1| + \tfrac13\ln|x+2| + C .$$
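The "set x equal to each root" step can be mechanized. The sketch below is one way to do it with exact rational arithmetic; the helper name `cover_up` is hypothetical, and it assumes a monic denominator with distinct real roots, as in the example above.

```python
from fractions import Fraction

def cover_up(numerator, roots):
    """Partial fraction coefficients of numerator(x) / prod(x - r).

    Assumes the roots are distinct and the denominator is monic.
    Returns a dict mapping each root r to the coefficient of 1/(x - r).
    """
    coefficients = {}
    for r in roots:
        # Evaluate the denominator at x = r with the factor (x - r) removed.
        rest = Fraction(1)
        for s in roots:
            if s != r:
                rest *= Fraction(r - s)
        coefficients[r] = Fraction(numerator(r)) / rest
    return coefficients

coeffs = cover_up(lambda x: x + 1, [1, -2])
print(coeffs)  # coefficient of 1/(x-1) and of 1/(x+2)
```

For the example this reproduces A = 2/3 at the root 1 and B = 1/3 at the root −2.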

example: The other case is when $d = b^2 - 4ac < 0$, such as:

$$\int \frac{x+1}{x^2+x+1}\,dx,$$

for which $d = 1^2 - 4(1)(1) = -3$. In this case, the denominator has no real-number zeroes: $x^2+x+1 > 0$, and it cannot be factored; hence the graph of $\frac{x+1}{x^2+x+1}$ has no vertical asymptotes. Our strategy is to reduce the integral to the basic integrals (iv) and (v) above.

The first step is to complete the square in the denominator and force it into the form $(x+p)^2 + q$, the same process that produces the Quadratic Formula:

$$x^2+x+1 = x^2 + 2(\tfrac12)x + (\tfrac12)^2 - (\tfrac12)^2 + 1 = (x+\tfrac12)^2 + \tfrac34 .$$

Thus, letting $u = x+\tfrac12$, $du = dx$:

$$\begin{aligned}
\int \frac{x+1}{x^2+x+1}\,dx &= \int \frac{(x+\tfrac12) - \tfrac12 + 1}{(x+\tfrac12)^2 + \tfrac34}\,dx \\
&= \int \frac{u}{u^2+\tfrac34}\,du + \tfrac12\int \frac{1}{u^2+\tfrac34}\,du \\
&= \tfrac12\ln\bigl|u^2+\tfrac34\bigr| + \frac{1}{2\sqrt{3/4}}\arctan\Bigl(\frac{u}{\sqrt{3/4}}\Bigr) + C \\
&= \tfrac12\ln\bigl|x^2+x+1\bigr| + \frac{1}{\sqrt3}\arctan\Bigl(\frac{2x+1}{\sqrt3}\Bigr) + C.
\end{aligned}$$

In fact, the two terms in our answer correspond to splitting the original function (blue) into a graph with reflection symmetry across the line $x = -\tfrac12$ (green), and a graph with 180° rotation symmetry around the point $(-\tfrac12, 0)$ (red).

example: One more case: if the numerator has degree greater than or equal to the denominator, for example:

$$\int \frac{x^4+2x+3}{x^2+x-2}\,dx .$$

Then $y = 0$ is no longer a horizontal asymptote. Instead, the behavior of the function as $x \to \pm\infty$ is controlled by a polynomial curve obtained by polynomial long division.

$$\begin{array}{r}
x^2 - x + 3 \quad \text{rem } {-3x+9} \\
x^2+x-2\ \big)\ \overline{\;x^4 + 2x + 3\;} \\
-(x^4 + x^3 - 2x^2) \\
\overline{\;-x^3 + 2x^2 + 2x + 3\;} \\
-(-x^3 - x^2 + 2x) \\
\overline{\;3x^2 + 3\;} \\
-(3x^2 + 3x - 6) \\
\overline{\;-3x + 9\;}
\end{array}$$

Thus $x^4+2x+3 = (x^2-x+3)\cdot(x^2+x-2) + (-3x+9)$, and:

$$\frac{x^4+2x+3}{x^2+x-2} = (x^2-x+3) + \frac{-3x+9}{x^2+x-2} = (x^2-x+3) + \frac{2}{x-1} - \frac{5}{x+2} .$$

The last equality is a partial fraction expansion similar to our first example above. Now:

$$\int \frac{x^4+2x+3}{x^2+x-2}\,dx = \tfrac13x^3 - \tfrac12x^2 + 3x + 2\ln|x-1| - 5\ln|x+2| + C .$$
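Long division is mechanical enough to hand to a short program. Here is a minimal sketch, assuming polynomials are stored as coefficient lists with the highest power first; the function name `poly_divmod` is an illustrative choice, not a library routine.

```python
from fractions import Fraction

def poly_divmod(numerator, denominator):
    """Divide polynomials given as coefficient lists, highest power first.

    Returns (quotient, remainder) as lists of Fractions.
    Assumes the leading coefficient of the denominator is nonzero.
    """
    remainder = [Fraction(c) for c in numerator]
    divisor = [Fraction(c) for c in denominator]
    quotient = []
    while len(remainder) >= len(divisor):
        factor = remainder[0] / divisor[0]   # next quotient coefficient
        quotient.append(factor)
        for i, d in enumerate(divisor):      # subtract factor * divisor
            remainder[i] -= factor * d
        remainder.pop(0)                     # leading term is now zero
    return quotient, remainder

# (x^4 + 2x + 3) / (x^2 + x - 2)
q, r = poly_divmod([1, 0, 0, 2, 3], [1, 1, -2])
print(q, r)
```

For the example above this returns the quotient coefficients 1, −1, 3 (that is, x² − x + 3) and the remainder coefficients −3, 9 (that is, −3x + 9).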

General case. The above techniques suffice to integrate any rational function $f(x) = g(x)/h(x)$, provided we can factor the denominator. First, we perform a partial fraction decomposition of f(x) into a sum of terms of the following forms:

• A polynomial q(x), which is the quotient in the long division $g(x) \div h(x) = q(x)$ with remainder r(x).

• For each linear factor $x-r$ of the denominator h(x), suppose $(x-r)^n$ is the highest power which divides h(x). Then we add a sum of n terms:

$$\frac{A_1}{x-r} + \frac{A_2}{(x-r)^2} + \cdots + \frac{A_n}{(x-r)^n}.$$

• For each irreducible quadratic factor $ax^2+bx+c$ of h(x), suppose $(ax^2+bx+c)^n$ is the highest power which divides h(x). Then we add a sum of n terms:

$$\frac{B_1x+C_1}{ax^2+bx+c} + \frac{B_2x+C_2}{(ax^2+bx+c)^2} + \cdots + \frac{B_nx+C_n}{(ax^2+bx+c)^n}.$$

Setting $f(x) = g(x)/h(x)$ equal to the sum of all the terms above, we clear the denominators and solve for all the unknown constants in the numerators as we did for A, B in our first example above. Once this is done, we can integrate using (i)–(vi) and the above examples.†

example: We find the partial fraction expansion of:

$$f(x) = \frac{1}{x^2(x^2+1)^2} = \frac{A_1}{x} + \frac{A_2}{x^2} + \frac{B_1x+C_1}{x^2+1} + \frac{B_2x+C_2}{(x^2+1)^2}.$$

We need to find the six constants $A_1, A_2, B_1, B_2, C_1, C_2$ which make the above equation valid. Clearing denominators gives:

$$\begin{aligned}
1 &= A_1x(x^2+1)^2 + A_2(x^2+1)^2 + (B_1x+C_1)x^2(x^2+1) + (B_2x+C_2)x^2 \\
&= (A_1+B_1)x^5 + (A_2+C_1)x^4 + (2A_1+B_1+B_2)x^3 + (2A_2+C_1+C_2)x^2 + A_1x + A_2 .
\end{aligned}$$

Since this is an equality of polynomial functions, the coefficients of $x^k$ must be the same for all k:

$$\begin{aligned}
A_1 + B_1 &= 0, & A_2 + C_1 &= 0, \\
2A_1 + B_1 + B_2 &= 0, & 2A_2 + C_1 + C_2 &= 0, \\
A_1 &= 0, & A_2 &= 1 .
\end{aligned}$$

We solve this as:

$$A_1 = 0,\quad A_2 = 1,\quad B_1 = -A_1 = 0,\quad C_1 = -A_2 = -1,\quad B_2 = -2A_1 - B_1 = 0,\quad C_2 = -2A_2 - C_1 = -1 .$$

Hence, according to (i)–(vi):

$$\begin{aligned}
\int \frac{1}{x^2(x^2+1)^2}\,dx &= \int \frac{1}{x^2} - \frac{1}{x^2+1} - \frac{1}{(x^2+1)^2}\,dx \\
&= -\frac1x - \arctan(x) - \tfrac12\left(\arctan(x) + \frac{x}{x^2+1}\right).
\end{aligned}$$
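Solving six equations by hand invites slips, so it is reasonable to spot-check the expansion at a few points before integrating. The sketch below uses exact fractions (the sample points are arbitrary nonzero rationals, and the function names are illustrative); two rational functions that agree at more points than their degree allows must be identical, so a handful of exact matches is strong evidence.

```python
from fractions import Fraction

def original(x):
    """The rational function 1 / (x^2 (x^2 + 1)^2)."""
    return 1 / (x**2 * (x**2 + 1) ** 2)

def expanded(x):
    """Its claimed expansion 1/x^2 - 1/(x^2+1) - 1/(x^2+1)^2."""
    return 1 / x**2 - 1 / (x**2 + 1) - 1 / (x**2 + 1) ** 2

samples = [Fraction(1, 2), Fraction(3), Fraction(-7, 5), Fraction(10, 3)]
for x in samples:
    assert original(x) == expanded(x)  # exact equality, no rounding

print("expansion agrees exactly at", len(samples), "sample points")
```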

†For $\int \frac{1}{(x^2+1)^j}\,dx$ we need even more elaborate contortions. The truth is this only becomes manageable if we use imaginary number factorizations like $x^2+1 = (x-\sqrt{-1})(x+\sqrt{-1})$, avoiding quadratic denominators entirely.

Trig integrals again. In §7.2–7.3, we reduced trig integrals by Substitution to rational function integrals, which we can now find by Partial Fractions. For example:

$$\begin{aligned}
\int\sec(x)\,dx &= \int \frac{1}{\cos^2(x)}\cdot\cos(x)\,dx = \int \frac{1}{1-\sin^2(x)}\cdot\cos(x)\,dx \\
&= \int \frac{1}{1-u^2}\,du = \int \frac{\tfrac12}{1+u} + \frac{\tfrac12}{1-u}\,du \\
&= \tfrac12\ln(1+u) - \tfrac12\ln(1-u) = \ln\sqrt{\frac{1+u}{1-u}} = \ln\sqrt{\frac{1+\sin(x)}{1-\sin(x)}} .
\end{aligned}$$

challenge problem: Show by identities that this is equal to our previous answer $\int\sec(x)\,dx = \ln|\tan(x)+\sec(x)|$ given in §7.2. Also: try this method on $\int\sec^3(x)\,dx$.

Math 133 Improper Integrals Stewart §7.8

Integrals near a vertical asymptote. What happens if we take the integral of a function over an interval containing a vertical asymptote, such as:

$$I = \int_0^2 \frac1x\,dx = \ ??$$

Algebraically, we would get $I = \ln|2| - \ln|0|$, but $\ln(0)$ is undefined. Numerically, the Riemann sum for I does not converge, because of the very large values of f(x) near $x = 0$. Geometrically, I measures a region (the positive area in the graph on the next page) which stretches infinitely along the asymptote $x = 0$, and the meaning of such an infinitely extended area is not clear.

Our previous definitions fail to give meaning to this integral, so we give a new definition:

$$\int_0^2 \frac1x\,dx = \lim_{r\to0^+}\int_r^2 \frac1x\,dx.$$

That is, we take the integral over the interval $x \in [r, 2]$ where the function is continuous, then take the limit as r squeezes up against the asymptote $x = 0$ from the right. Now, $\int_r^2 \frac1x\,dx = \ln|2| - \ln|r|$, and $\lim_{r\to0^+}\ln(r) = -\infty$, meaning $\ln(r)$ becomes a larger and larger negative number, so the improper integral is:∗

$$\int_0^2 \frac1x\,dx = \lim_{r\to0^+}\bigl(\ln(2) - \ln(r)\bigr) = \infty.$$

This says that the total area under the graph $y = \frac1x$ and above $[0, 2]$ is infinite: no matter how many square units of paint are put on this region, there will still be unpainted area high up next to the asymptote.

General definition: If the function f(x) has a vertical asymptote at $x = q$, we define the improper integral of vertical type:

• on an interval $[a, q]$ as $\int_a^q f(x)\,dx = \lim_{r\to q^-}\int_a^r f(x)\,dx$;

• on an interval $[q, b]$ as $\int_q^b f(x)\,dx = \lim_{r\to q^+}\int_r^b f(x)\,dx$;

• on an interval with $q \in (a, b)$ as $\int_a^b f(x)\,dx = \int_a^q f(x)\,dx + \int_q^b f(x)\,dx$.

If such an integral has a finite value, we say it converges; if it is infinite or undefined, we say it diverges.

∗We take $\ln(2)$ minus a larger and larger negative number; and this equals a larger and larger positive number, denoted by ∞.

example: Evaluate $\int_{-1}^2 \frac1x\,dx$. This attempts to measure two infinite regions: one above $[0, 2]$ along the positive y-axis, and another below $[-1, 0]$ along the negative y-axis.

The improper integral avoids the asymptote from both sides:

$$\int_{-1}^2 \frac1x\,dx = \int_{-1}^0 \frac1x\,dx + \int_0^2 \frac1x\,dx = \lim_{r\to0^-}\int_{-1}^r \frac1x\,dx + \lim_{r\to0^+}\int_r^2 \frac1x\,dx.$$

But when we try to calculate this, we get:

$$\int_{-1}^2 \frac1x\,dx = \Bigl(\lim_{r\to0^-}\ln|r| - \ln|-1|\Bigr) + \lim_{r\to0^+}\bigl(\ln|2| - \ln|r|\bigr) = -\infty + \infty ,$$

which is an indeterminate form: the integral is truly undefined. We have no good meaning for an infinite positive area canceled by an infinite negative area. In particular, the naive answer is wrong:

$$\int_{-1}^2 \frac1x\,dx = \text{undefined} \ne \ln|2| - \ln|-1|.$$

example: Evaluate $\int_1^2 \frac{1}{\sqrt{x-1}}\,dx$. Since the vertical asymptote is $x = 1$, we have:

$$\begin{aligned}
\int_1^2 \frac{1}{\sqrt{x-1}}\,dx &= \lim_{r\to1^+}\int_r^2 \frac{1}{\sqrt{x-1}}\,dx = \lim_{r\to1^+} 2\sqrt{x-1}\,\Big|_{x=r}^{x=2} \\
&= \lim_{r\to1^+}\bigl(2\sqrt{2-1} - 2\sqrt{r-1}\bigr) = 2 - 0 = 2.
\end{aligned}$$

In this case, the region has a finite area of 2, even though it stretches infinitely high along the vertical asymptote. Thus, if we start with a bucket of paint for 2 square units, we use less and less as we paint the higher parts of the region, and never run out of paint.

Integrals near a horizontal asymptote. If $y = f(x)$ has $y = 0$ as a horizontal asymptote, we can define improper integrals of horizontal type.

• If $\lim_{x\to\infty} f(x) = 0$, we define the integral on an interval $[a,\infty)$ as:

$$\int_a^\infty f(x)\,dx = \lim_{r\to\infty}\int_a^r f(x)\,dx.$$

• If $\lim_{x\to-\infty} f(x) = 0$, we define the integral over an interval $(-\infty, a]$ as:

$$\int_{-\infty}^a f(x)\,dx = \lim_{r\to-\infty}\int_r^a f(x)\,dx.$$

• If $\lim_{x\to\pm\infty} f(x) = 0$ and the integral is over the whole real line $(-\infty,\infty)$, we define it by splitting at any finite value $x = a$:

$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^a f(x)\,dx + \int_a^\infty f(x)\,dx \quad\text{for any } a.$$

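example: $\int_1^\infty \frac{1}{x^2}\,dx = \lim_{r\to\infty}\int_1^r \frac{1}{x^2}\,dx = \lim_{r\to\infty}\bigl(-\frac1x\bigr)\Big|_{x=1}^{x=r} = \lim_{r\to\infty}\bigl(-\frac1r + \frac11\bigr) = 1 .$

This integral measures a region which stretches infinitely along the x-axis above $[1,\infty)$, but which has a finite total area of 1.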

On the other hand, $\int_1^\infty \frac{1}{\sqrt x}\,dx = \lim_{r\to\infty} 2\sqrt{x}\,\big|_{x=1}^{x=r} = \infty$. In fact, $\int_1^\infty \frac{1}{x^p}\,dx$ is finite if $p > 1$, but is infinite if $p \le 1$. Informally, the faster f(x) shrinks as $x \to \infty$, the easier it is for the integral to converge to a finite value.
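To see the dividing line at p = 1 numerically, we can evaluate the truncated integral over [1, r] for larger and larger r. The sketch below uses the exact antiderivative rather than a Riemann sum; the function name and the chosen values of p and r are illustrative.

```python
import math

def truncated_p_integral(p, r):
    """Exact value of the integral of x**(-p) over [1, r]."""
    if p == 1:
        return math.log(r)
    return (r ** (1 - p) - 1) / (1 - p)

for p in (2, 1, 0.5):
    values = [truncated_p_integral(p, 10.0 ** k) for k in (1, 3, 6)]
    print(p, values)
```

For p = 2 the values settle toward 1, while for p = 1 and p = 0.5 they keep growing as r increases, matching the convergence rule stated above.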

example: $\int_0^\infty e^{-x}\,dx = \lim_{r\to\infty}\int_0^r e^{-x}\,dx = \lim_{r\to\infty}\bigl(-e^{-x}\bigr)\Big|_{x=0}^{x=r} = -0 - (-1) = 1.$

It is not surprising that this converges, because $e^{-x}$ shrinks faster than $\frac{1}{x^p}$ for any p.

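example:

$$\begin{aligned}
\int_{-\infty}^\infty \frac{1}{1+x^2}\,dx &= \lim_{r\to-\infty}\int_r^0 \frac{1}{1+x^2}\,dx + \lim_{r\to\infty}\int_0^r \frac{1}{1+x^2}\,dx \\
&= \lim_{r\to-\infty}\tan^{-1}(x)\Big|_{x=r}^{x=0} + \lim_{r\to\infty}\tan^{-1}(x)\Big|_{x=0}^{x=r} = \bigl(0 - (-\tfrac{\pi}{2})\bigr) + \bigl(\tfrac{\pi}{2} - 0\bigr) = \pi.
\end{aligned}$$

Remarkably, the total area under $y = \frac{1}{1+x^2}$ turns out to be π, same as a unit circle!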

Comparison tests for convergence. Sometimes an improper integral is too complicated to find an algebraic antiderivative, but we can still be sure it converges because the infinite region measured fits inside a larger region of known finite area.

For example, the Gaussian bell-curve integral $\int_1^\infty e^{-x^2}\,dx$ cannot be evaluated by the Second Fundamental Theorem, because $e^{-x^2}$ has no antiderivative given by an elementary formula. However, for $x \ge 1$, we have $x^2 \ge x$, so $e^{-x^2} \le e^{-x}$: that is, the curve $y = e^{-x^2}$ lies below $y = e^{-x}$.

We can easily evaluate the area below the upper curve, which shows that the smaller area under the lower curve is finite, i.e. the improper integral converges:

$$\int_1^\infty e^{-x^2}\,dx < \int_1^\infty e^{-x}\,dx = 0 - (-e^{-1}) = \tfrac1e \approx 0.37 .$$

Direct Comparison Test: Consider an improper integral $\int_a^b f(x)\,dx$, with a or b infinite.

• If $|f(x)| \le g(x)$ for $x \in [a, b]$, and $\int_a^b g(x)\,dx$ converges, then $\int_a^b f(x)\,dx$ converges.

• If $f(x) \ge g(x) \ge 0$ for $x \in [a, b]$ and $\int_a^b g(x)\,dx$ diverges, then $\int_a^b f(x)\,dx$ diverges.

The proof uses the Domination Rule for ordinary integrals (§4.2), plus some complications with limits.

example: Does $\int_0^\infty \frac{4\sin(x)+1}{e^{2x}+x^2}\,dx$ converge? This function shrinks rapidly, since the top does not grow, and the bottom grows exponentially; thus we guess that the integral converges. To prove this using the first part of the Test, we should bound $f(x) = \frac{4\sin(x)+1}{e^{2x}+x^2}$ inside the graph of a fairly simple comparison function $g(x) = \frac{g_1(x)}{g_2(x)}$ with $|f(x)| \le g(x)$.

Now, increasing the numerator of f(x) and decreasing its denominator gives a larger fraction, so let us take:

$$\left|\frac{4\sin(x)+1}{e^{2x}+x^2}\right| \le \frac{5}{e^{2x}} = 5e^{-2x} .$$

Now the comparison integral converges: $\int_0^\infty 5e^{-2x}\,dx = \lim_{r\to\infty}\bigl(-\tfrac52 e^{-2x}\bigr)\Big|_{x=0}^{x=r} = \tfrac52$; hence so does the original integral:

$$\left|\int_0^\infty \frac{4\sin(x)+1}{e^{2x}+x^2}\,dx\right| \le \tfrac52 .$$

By contrast, to prove divergence of a fractional f(x), we would bound f(x) above a comparison function g(x) with a smaller numerator and larger denominator.

Limit Comparison Test or Ratio Comparison Test: Suppose f(x), g(x) are functions with $\lim_{x\to\infty}\frac{f(x)}{g(x)} = L$, where L is finite and nonzero. Then $\int_a^\infty f(x)\,dx$ converges if and only if $\int_a^\infty g(x)\,dx$ converges.

In the case that $g(x) \ge 0$ and $L > 0$, this is simply because, given $\lim_{x\to\infty}\frac{f(x)}{g(x)} = L$, we can take x large enough so that $\tfrac12 L\,g(x) \le f(x) \le \tfrac32 L\,g(x)$, and we can apply the Direct Comparison Test.

To apply this Test to $\int_a^\infty f(x)\,dx$ for a fraction $f(x) = \frac{f_1(x)}{f_2(x)}$, we generally choose the comparison function $g(x) = \frac{g_1(x)}{g_2(x)}$ where $g_1(x)$ is the largest term in $f_1(x)$, and likewise with $g_2(x)$ and $f_2(x)$. For example, for:

$$f(x) = \frac{x^2 - e^{-x} + \sin(x)}{\sqrt{x^5+7}} \quad\text{take}\quad g(x) = \frac{x^2}{\sqrt{x^5}} = x^{-1/2} .$$

We previously showed $\int_a^\infty x^{-1/2}\,dx$ diverges, so the original integral $\int_a^\infty f(x)\,dx$ also diverges.
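Before relying on a comparison function, it helps to confirm numerically that the ratio f(x)/g(x) really settles to a finite nonzero limit. A minimal sketch for the example above follows; the sample values of x are arbitrary, and this is evidence for the limit, not a proof.

```python
import math

def f(x):
    """The integrand from the example above."""
    return (x**2 - math.exp(-x) + math.sin(x)) / math.sqrt(x**5 + 7)

def g(x):
    """Comparison function built from the largest terms: x^2 / sqrt(x^5)."""
    return x ** -0.5

def ratio(x):
    return f(x) / g(x)

for x in (1e1, 1e2, 1e4, 1e6):
    print(x, ratio(x))
```

The printed ratios approach 1, so L = 1 here and the two integrals converge or diverge together.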

Math 133 Method for Integration §7.1–7.8

Given a function f(x), we wish to find the indefinite integral $\int f(x)\,dx = F(x) + C$, i.e. an antiderivative function with $F'(x) = f(x)$. For brevity, we omit the constant +C.

1. Basic integrals which directly reverse basic derivatives:

$$\begin{array}{lll}
\int x^p\,dx = \frac{1}{p+1}x^{p+1}\ (p \ne -1) & \int \frac1x\,dx = \ln|x| & \int e^x\,dx = e^x \\
\int \sin(x)\,dx = -\cos(x) & \int \cos(x)\,dx = \sin(x) & \\
\int \sec^2(x)\,dx = \tan(x) & \int \tan(x)\sec(x)\,dx = \sec(x) & \\
\int \frac{1}{\sqrt{1-x^2}}\,dx = \arcsin(x) & \int \frac{1}{1+x^2}\,dx = \arctan(x) &
\end{array}$$

2. Substitution: Factor the integrand so that $\int f(x)\,dx = \int h(g(x))\cdot g'(x)\,dx$. That is, find a factor $g'(x)$ which is a known derivative of some $g(x)$ appearing inside the other factor. To get $g'(x)$ exactly, perhaps multiply and divide by a constant. To find the outside $h(u)$, you may need to solve $u = g(x)$ as $x = g^{-1}(u)$.

Take $u = g(x)$, $du = g'(x)\,dx$, so that $\int h(g(x))\cdot g'(x)\,dx = \int h(u)\,du = H(u)$.

Restore the original variable: $\int f(x)\,dx = H(g(x))$.

3. Integration by Parts. Factor the integrand so that one factor is a known derivative $g'(x)$. Then:

$$\int f(x)\,dx = \int h(x)\cdot g'(x)\,dx = h(x)\cdot g(x) - \int g(x)\cdot h'(x)\,dx.$$

In Leibniz notation, $\int u\,dv = uv - \int v\,du$.

Do the remaining integral $\int g(x)\cdot h'(x)\,dx$ by another method. Here $g(x)$ should be no more complicated than $g'(x)$, and $h'(x)$ should be simpler than $h(x)$.

4. Products of Trig Functions. Substitute by factoring out a derivative $g'(x) = \cos(x)$, $\sin(x)$, $\sec^2(x)$ or $\tan(x)\sec(x)$; and writing the remaining factor in terms of $u = g(x)$ using $\cos^2(x)+\sin^2(x) = 1$, $\tan^2(x)+1 = \sec^2(x)$, $\tan(x) = \frac{\sin(x)}{\cos(x)}$.

Otherwise, use identities $\sin^2(x) = \tfrac12 - \tfrac12\cos(2x)$, $\cos^2(x) = \tfrac12 + \tfrac12\cos(2x)$.

A hard case: $\int\sec(x)\,dx = \ln|\tan(x)+\sec(x)|$. Also, any trig integral converts into a rational function integral by the Tangent Half-Angle Substitution (§7.3).

5. Reverse Trig Substitution. If $\sqrt{a^2-x^2}$ appears in $\int f(x)\,dx$, complicate the integral by substituting $x = a\sin(\theta)$, $dx = a\cos(\theta)\,d\theta$; simplify using $\sqrt{a^2-(a\sin(\theta))^2} = a\cos(\theta)$. Do the resulting trig integral, then restore x using $\theta = \arcsin(\frac{x}{a})$.

Do the same for $\sqrt{x^2-a^2}$ using $x = a\sec(\theta)$; and for $\sqrt{x^2+a^2}$ using $x = a\tan(\theta)$.

6. Partial Fractions for integrating rational functions $f(x) = \frac{g(x)}{h(x)}$, where g(x), h(x) are polynomials. If g(x) has degree greater than or equal to h(x), perform long division to get $f(x) = q(x) + \frac{r(x)}{h(x)}$, where r(x) has degree less than h(x).

If the denominator factors as $h(x) = (x-a)(x-b)\cdots$ with a, b, … all different, split f(x) into the form: $f(x) = \frac{g(x)}{(x-a)(x-b)\cdots} = \frac{A}{x-a} + \frac{B}{x-b} + \cdots$. Solve for the constant A after clearing denominators and substituting $x = a$; and similarly for the other constants B, …. Finally, integrate using $\int \frac{A}{x-a}\,dx = A\ln|x-a|$.

If h(x) has factors like $(x-a)^k$ or $ax^2+bx+c$ with no real roots, see §7.4.


Math 133 Arclength Stewart §8.1

Increments of length. In this section, we give an integral formula to compute the length of a curve, by the same Method of Slice Analysis we used in §5.2 to compute volume, and in §5.3 to compute work (see end §5.2).

We want the arclength L of a graph curve $y = f(x)$ for $x \in [a, b]$. We cut the curve into n bits determined by ∆x-increments of $x \in [a, b]$. (In the picture, n = 5.)

Because the bit at the sample point $x_i$ is so short, it is well approximated by a straight segment, and we can use the Pythagorean Theorem to compute its length:

$$\Delta L_i \approx \sqrt{(\Delta x)^2 + (\Delta y)^2} .$$

We want to write this as a term in a Riemann sum, so we must write it in the form $g(x_i)\,\Delta x$ for some function g(x). We simply factor out ∆x:

$$\Delta L_i \approx \sqrt{\Bigl(1 + \frac{(\Delta y)^2}{(\Delta x)^2}\Bigr)(\Delta x)^2} = \sqrt{1 + \Bigl(\frac{\Delta y}{\Delta x}\Bigr)^2}\,\Delta x .$$

In the limit as $n \to \infty$, we get $\Delta x \to 0$ and $\frac{\Delta y}{\Delta x} \to \frac{dy}{dx} = f'(x_i)$, and the Riemann sum total of the $\Delta L_i$’s becomes an integral:

$$L = \lim_{n\to\infty}\sum_{i=1}^n \Delta L_i = \lim_{n\to\infty}\sum_{i=1}^n \sqrt{1 + \Bigl(\frac{\Delta y}{\Delta x}\Bigr)^2}\,\Delta x = \int_a^b \sqrt{1 + \Bigl(\frac{dy}{dx}\Bigr)^2}\,dx .$$

In Newton notation:

$$L = \int_a^b \sqrt{1 + y'(x)^2}\,dx .$$

example: Compute the arclength of the curve $y = x\sqrt{x}$ over the interval x ∈ [0, 4]. We have $\frac{dy}{dx} = (x^{3/2})' = \frac{3}{2}x^{1/2}$, so:

$$L = \int_0^4 \sqrt{1 + \left(\tfrac{dy}{dx}\right)^2}\;dx = \int_0^4 \sqrt{1 + \tfrac{9}{4}x}\;dx = \tfrac{8}{27}\left(1+\tfrac{9}{4}x\right)^{3/2}\Big|_{x=0}^{x=4} = \tfrac{8}{27}\left(10\sqrt{10}-1\right) \approx 9.07\,.$$

To check this, we compare with the straight-line distance between the endpoints (0, 0) and (4, 8): this is $\sqrt{4^2 + 8^2} \approx 8.9$, and indeed the length of the curve is slightly larger.


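example: Compute the circumference of the unit circle, which is twice the arclength of the graph $y = \sqrt{1-x^2}$ for x ∈ [−1, 1]:

$$C = 2L = 2\int_{-1}^{1} \sqrt{1 + \left(\tfrac{d}{dx}\sqrt{1-x^2}\right)^2}\;dx = 2\int_{-1}^{1} \sqrt{1 + \left(\frac{-2x}{2\sqrt{1-x^2}}\right)^2}\;dx$$

$$= 2\int_{-1}^{1} \sqrt{\frac{1 - x^2 + x^2}{1 - x^2}}\;dx = 2\int_{-1}^{1} \frac{1}{\sqrt{1-x^2}}\;dx = 2\arcsin(x)\Big|_{x=-1}^{x=1} = 2\pi\,.$$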

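example: Compute the arclength of the parabola $y = x^2$ over any interval x ∈ [0, b].

$$L = \int_0^b \sqrt{1 + \left(\tfrac{d}{dx}(x^2)\right)^2}\;dx = \int_0^b \sqrt{1 + 4x^2}\;dx\,.$$

To find the indefinite integral, we use the reverse trig substitution (§7.3): $x = \frac12\tan(\theta)$, $\sqrt{1 + 4x^2} = \sec(\theta)$, $dx = \frac12\sec^2(\theta)\,d\theta$:

$$\int \sqrt{1 + 4x^2}\;dx = \int \tfrac12\sec^3(\theta)\,d\theta = \tfrac14\ln\left|\tan(\theta)+\sec(\theta)\right| + \tfrac14\tan(\theta)\sec(\theta)\,,$$

where we use $\int\sec^3(\theta)\,d\theta$ from §7.2. Restoring the original variable, $\tan(\theta) = 2x$, $\sec(\theta) = \sqrt{1+4x^2}$, and taking the definite integral:

$$L = \left[\tfrac14\ln\left|2x+\sqrt{1+4x^2}\right| + \tfrac12 x\sqrt{1+4x^2}\right]_{x=0}^{x=b} = \tfrac14\ln\left|2b+\sqrt{1+4b^2}\right| + \tfrac12 b\sqrt{1+4b^2}\,.$$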

Arclength tends to get quite complicated even for quite simple curves!
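As a sanity check on the parabola formula, here is a minimal Python sketch that compares the closed form with a midpoint Riemann sum of $\int_0^b\sqrt{1+4x^2}\,dx$. The function names and the choice of midpoint sample points are our own assumptions for illustration, not part of the notes.

```python
import math

def parabola_length_closed_form(b):
    """Closed form (1/4) ln(2b + sqrt(1+4b^2)) + (1/2) b sqrt(1+4b^2)."""
    root = math.sqrt(1.0 + 4.0 * b * b)
    return 0.25 * math.log(2.0 * b + root) + 0.5 * b * root

def parabola_length_numeric(b, n=20000):
    """Midpoint Riemann sum for the arclength integral of y = x^2 on [0, b]."""
    dx = b / n
    total = 0.0
    for i in range(n):
        x_mid = (i + 0.5) * dx            # sample point in the i-th increment
        total += math.sqrt(1.0 + 4.0 * x_mid ** 2) * dx
    return total

for b in (0.5, 1.0, 2.0):
    print(b, parabola_length_closed_form(b), parabola_length_numeric(b))
```

The two columns should agree to many decimal places, which is good evidence that the substitution and the back-substitution were done correctly.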

example: Compute the arclength of the curve $y = x^3$ over x ∈ [0, 1].

$$L = \int_0^1 \sqrt{1 + \left(\tfrac{d}{dx}(x^3)\right)^2}\;dx = \int_0^1 \sqrt{1 + 9x^4}\;dx\,.$$

This is already complicated enough that it has no algebraic antiderivative.∗

Does this mean the arclength formula is useless? Not at all! We cannot get an answer on the algebraic level, but we can still get a numerical answer as accurate as we like. This means going from the integral formula for L back to the Riemann sums from which we deduced the integral. For example, taking n = 1000, the increment is $\Delta x = \frac{1}{1000} = 0.001$, and the sample points are $x_i = i\,\Delta x = (0.001)i$. The computer gives:

$$L \approx \sum_{i=1}^{n} \sqrt{1 + 9x_i^4}\;\Delta x = \sum_{i=1}^{1000} \sqrt{1 + 9(10^{-12})\,i^4}\;(0.001) \approx 1.548\,.$$

To gauge the accuracy of this, we re-do it with n = 10,000, getting L ≈ 1.547, so we can be confident that L ≈ 1.54 is accurate to 2 decimal places.
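The Riemann sum above is easy to reproduce. The following Python sketch uses the same right-endpoint sample points $x_i = i\,\Delta x$; the function names are our own, chosen for illustration.

```python
import math

def arclength_riemann(f_prime, a, b, n):
    """Right-endpoint Riemann sum for the arclength of y = f(x) on [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x_i = a + i * dx                  # sample point x_i = a + i*dx
        total += math.sqrt(1.0 + f_prime(x_i) ** 2) * dx
    return total

def cube_slope(x):
    """Derivative of y = x^3."""
    return 3.0 * x ** 2

L_1000 = arclength_riemann(cube_slope, 0.0, 1.0, 1000)
L_10000 = arclength_riemann(cube_slope, 0.0, 1.0, 10000)
print(L_1000, L_10000)
```

Comparing the two printed values shows how many digits have stabilized; increasing n further should change only the later decimal places.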

∗The integral can be expressed in terms of an “elliptic function”, but this is circular reasoning since elliptic functions themselves are defined as integrals!

Math 133 Parametric Curves Stewart §10.1

Back to pictures! We have emphasized four conceptual levels, or points of view on mathematics: physical, geometric, numerical, algebraic. The physical viewpoint is that of Applied Mathematics, including engineering and the hard sciences: useful, powerful, revealing. The numerical point of view, officially called Analysis, is concerned with approximations, error-control, and convergence of limits. It prevents our reasoning from falling into chaos when we deal with infinite shapes or processes, but a liking for Analysis is a special taste even among mathematicians. Algebra, my favorite, is concerned with formulas to concisely construct and transform complicated quantities by means of symbolic operations, often giving amazingly simple answers.

But deep down, what we really love in math is Geometry: pictures! In this section, we will learn to handle the simplest geometric objects: curves. So far, we have dealt with curves as graphs of functions y = f(x), in which we imagine the independent variable x moving along its axis while f(x) controls the height.

Parametric lines. A more general model for a curve is to consider it as the path of a particle moving in the plane in any fashion. We specify its coordinates as functions of time: that is, at time t, the particle is at the position (x(t), y(t)). We call the variable t the parameter, and the trajectory traced out is a parametric curve.

Any graph y = f(x) can immediately be written parametrically as (x(t), y(t)) = (t, f(t)), meaning that the particle moves so that at time t it is above the point x = t with height f(t).

example: Suppose a particle starts at time t = 0 at the point (x(0), y(0)) = (1, 2), and moves with constant velocity until time t = 1 to the point (x(1), y(1)) = (4, 6). The horizontal velocity is $\frac{4-1}{1-0} = 3$, the vertical velocity is $\frac{6-2}{1-0} = 4$,∗ and the position at time t will be:

(x(t), y(t)) = (1 + 3t, 2 + 4t) for t ∈ [0, 1] ,

shown by the thick line segment below.

If we keep the same velocity for all real values of t, we get the thin infinite line.

∗The overall speed is $\sqrt{3^2 + 4^2} = 5$.

Given any parametric curve, writing it in terms of an equation satisfied by x and y is called deparametrizing: in this case, we want the graph of a linear function y = mx+b. A general method is to solve for t in terms of x, then plug in to the equation for y:

$$\begin{cases} x = 1 + 3t \\ y = 2 + 4t \end{cases} \implies \begin{cases} t = \frac13(x-1) \\ y = 2 + 4\left(\frac13(x-1)\right) \end{cases} \implies y = \tfrac43 x + \tfrac23\,.$$

Indeed, we could have immediately seen that the slope is the vertical velocity over the horizontal velocity: $m = \frac43$.

Parametric circles. Given the unit circle defined by the equation $x^2 + y^2 = 1$, we would like to parametrize it: to trace the curve by a particle moving according to (x(t), y(t)). One way is to let the particle make an angle of t radians at time t, meaning:

(x(t), y(t)) = (cos(t), sin(t)) for t ∈ [0, 2π] .

If we keep the same motion for all t, the particle travels around and around the circle. We can check that this formula does trace the circle, because the coordinates do satisfy the known equation:

$$x(t)^2 + y(t)^2 = \cos^2(t) + \sin^2(t) = 1\,.$$

Our standard circular motion has center (0, 0), radius r = 1, starting at (1, 0) for t = 0, with one counterclockwise rotation during t ∈ [0, 2π]. We can modify each part of this:

• Move the center of the circle to (6, 7): (x(t), y(t)) = (6 + cos(t), 7 + sin(t)).

• Stretch the radius to r = 5: (x(t), y(t)) = (5 cos(t), 5 sin(t)).

• Make the rotation clockwise:† (x(t), y(t)) = (cos(−t), sin(−t)).

• Make the particle start at (0, −1) at t = 0: (x(t), y(t)) = $(\cos(t-\frac{\pi}{2}), \sin(t-\frac{\pi}{2}))$.

• Do 10 rotations over t ∈ [0, 1]: (x(t), y(t)) = (cos(10·2πt), sin(10·2πt)).

†In general, to reverse the motion of (x(t), y(t)), take (x(−t), y(−t)) to make time go backwards.

To combine all the above, we replace the angle t by $-10\cdot 2\pi t - \frac{\pi}{2}$, then stretch and shift:

$$(x(t), y(t)) = \left(6 + 5\cos\left(-10\cdot 2\pi t - \tfrac{\pi}{2}\right),\; 7 + 5\sin\left(-10\cdot 2\pi t - \tfrac{\pi}{2}\right)\right).$$

example: We parametrize an ellipse, which is a circle stretched horizontally and/or vertically. For example, here is a parametric equation for the ellipse centered at (0, 0), reaching 2 units above and below the center, and 3 units to the left and right:

(x(t), y(t)) = (3 cos(t), 2 sin(t))

Because of the uneven stretching, the particle will not travel at constant speed, and the central angle at time t will not be proportional to t.

example: A tricky way to parametrize the unit circle is the rational parametrization:

$$(x(t), y(t)) = \left(\frac{1 - t^2}{1 + t^2},\; \frac{2t}{1 + t^2}\right).$$

We can tell that each point (x(t), y(t)) lies on the circle because:

$$x(t)^2 + y(t)^2 = \left(\frac{1 - t^2}{1 + t^2}\right)^2 + \left(\frac{2t}{1 + t^2}\right)^2 = \frac{1 - 2t^2 + t^4}{(1 + t^2)^2} + \frac{4t^2}{(1 + t^2)^2} = \frac{1 + 2t^2 + t^4}{(1 + t^2)^2} = \frac{(1 + t^2)^2}{(1 + t^2)^2} = 1\,.$$

It is not easy to find any points on the circle with both coordinates rational numbers: if we start with rational x, we expect irrational $y = \sqrt{1-x^2}$. Amazingly, the rational parametrization produces infinitely many rational points: just plug in any fraction for t, for example $(x(\frac12), y(\frac12)) = (\frac35, \frac45)$ and $(x(\frac23), y(\frac23)) = (\frac{5}{13}, \frac{12}{13})$. For each point $(x, y) = (\frac{a}{c}, \frac{b}{c})$, clearing denominators in the equation $x^2 + y^2 = 1$ gives $a^2 + b^2 = c^2$.

This defines a Pythagorean triple: a right triangle in which all three sides a, b, c are whole numbers! For example a = 5, b = 12, c = 13 satisfy $5^2 + 12^2 = 13^2$. This is a sample of the mathematical field of Algebraic Geometry.
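To see the rational parametrization produce Pythagorean triples, here is a short Python sketch using exact fraction arithmetic. The helper names `rational_point` and `pythagorean_triple` are our own, introduced only for this illustration.

```python
from fractions import Fraction
from math import gcd

def rational_point(t):
    """Exact point (x, y) on the unit circle for a rational parameter t."""
    t = Fraction(t)
    denom = 1 + t * t
    return (1 - t * t) / denom, (2 * t) / denom

def pythagorean_triple(t):
    """Clear denominators in (x, y) = (a/c, b/c) to get whole numbers (a, b, c)."""
    x, y = rational_point(t)
    # least common denominator of x and y
    c = x.denominator * y.denominator // gcd(x.denominator, y.denominator)
    return int(x * c), int(y * c), c

for t in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    a, b, c = pythagorean_triple(t)
    print(t, (a, b, c), a * a + b * b == c * c)
```

Using `Fraction` rather than floating point keeps every coordinate exact, so the check $a^2 + b^2 = c^2$ is a true equality of whole numbers and not an approximation.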

Cycloid curve. This famous curve traces the path of a particle on the rim of a rolling wheel (a unit circle rolling over the x-axis).

As the wheel rolls, its circumference traces an equal distance along the x-axis, so that at time t, the wheel’s center is at (t, 1). If we hold the center fixed at the origin, the particle on the rim starts at (0, −1) and turns clockwise once over t ∈ [0, 2π], so its position is:

$$\left(\cos(-t-\tfrac{\pi}{2}),\; \sin(-t-\tfrac{\pi}{2})\right) = (-\sin(t),\, -\cos(t))$$

by standard trig identities.

Combining the linear motion of the center with the circular motion around the center gives the parametric equation of the cycloid curve:

(x(t), y(t)) = (t − sin(t), 1 − cos(t)) .

These equations allow a computer to easily plot the cycloid. (It’s actually not so hard even by hand.)

Let us deparametrize this to get an xy-equation for the cycloid. We solve for t in terms of one variable (in this case y), and plug into the other variable (in this case x):

$$\begin{cases} x = t - \sin(t) \\ y = 1 - \cos(t) \end{cases} \implies \begin{cases} \cos(t) = 1 - y \\ \sin(t) = \sqrt{1-(1-y)^2} = \sqrt{2y-y^2} \\ t = \arccos(1-y) \end{cases} \implies x = \arccos(1-y) - \sqrt{2y-y^2}\,.$$

Simplifying:

$$\cos\left(x + \sqrt{2y-y^2}\right) + y = 1\,.$$

This is pretty weird, but it allows us to immediately decide if a given point (x, y) lies on the cycloid: just check if it satisfies the equation! The parametric form, on the other hand, allows us to produce points on the curve.
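The two roles can be tried out in a few lines of Python: the parametric form produces points, and the xy-equation tests them. This is only a sketch with our own function names; note that the step $\sin(t) = \sqrt{2y-y^2}$ takes the positive square root, so we sample only the first half-arch t ∈ [0, π], where sin(t) ≥ 0.

```python
import math

def cycloid_point(t):
    """Point on the cycloid at time t, from the parametric form."""
    return t - math.sin(t), 1.0 - math.cos(t)

def cycloid_residual(x, y):
    """Left side minus right side of cos(x + sqrt(2y - y^2)) + y = 1."""
    root = math.sqrt(max(0.0, y * (2.0 - y)))   # 2y - y^2, guarded against rounding
    return math.cos(x + root) + y - 1.0

samples = [k * math.pi / 8 for k in range(9)]    # t = 0, pi/8, ..., pi
residuals = [cycloid_residual(*cycloid_point(t)) for t in samples]
print(max(abs(r) for r in residuals))            # essentially zero on the curve
print(cycloid_residual(1.0, 1.0))                # clearly nonzero: (1, 1) is off the curve
```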

Epicycloids. One variant of the cycloid is the epicycloid, in which the wheel rolls around a fixed circle. The curve varies depending on the relative size of the two circles. From the perspective of a fixed central Earth, the trajectories of the other planets are very close to epicycloids, and the classical astronomers in the tradition of Ptolemy attempted to find an exact model for planetary motion by adding further epicycles, wheels rolling on wheels like a gigantic clockwork. But starting with Copernicus, we interpret the apparent epicycloid as an illusion based on combining the separate orbits of Earth and the other planet around a fixed central Sun.

Here is a compound epicycloid with a central circle of radius 1, a wheel of radius $\frac16$ rolling around it, and a wheel of radius $\frac16$ rolling around that (assuming the circles can pass through each other):

Math 133 Parametric Calculus Stewart §10.2

Tangents of a parametric curve. We have learned how to write a curve parametrically, as the path of a particle whose position at time t is given by two coordinate functions (x(t), y(t)) over a time interval t ∈ [a, b].

Considering the curve as a track on which the particle runs, the tangent line at a point (x(c), y(c)) is the path the particle would take if it were suddenly released from the track at time t = c, keeping a constant velocity from that moment. The velocity at t = c has horizontal and vertical components (x′(c), y′(c)), giving the parametric line:

(x(c) + x′(c) t, y(c) + y′(c) t) .

The components are the linear approximations of x(t) and y(t) near t = c, which is appropriate since the tangent is the line which best approximates the curve near the point.

We can convert this parametric line into an xy-equation as in §10.1. The slope is the vertical over the horizontal velocity: $m = \frac{y'(c)}{x'(c)}$, and we know the line passes through (x(c), y(c)), so we have the point-slope equation:

$$y = \frac{y'(c)}{x'(c)}\,(x - x(c)) + y(c)\,.$$

Here (x, y) is a general point of the line, but x(c), y(c), x′(c), y′(c) are constants computed from the coordinate functions of the original curve.

To further explain this, we imagine the original curve as the graph of a function y = f(x), meaning y(t) = f(x(t)) for all t. The Chain Rule gives:

$$y'(t) = f'(x(t)) \cdot x'(t) \iff \frac{dy}{dt} = \frac{dy}{dx} \cdot \frac{dx}{dt}\,.$$

At time t = c and x = x(c), this gives our previous slope formula:

$$f'(x(c)) = \frac{y'(c)}{x'(c)} \iff \frac{dy}{dx} = \frac{\;\frac{dy}{dt}\;}{\frac{dx}{dt}}\,.$$

Tangents of a circle. We find the tangent line to (x(t), y(t)) = (2 sin(πt), 2 cos(πt)) at the point $(\sqrt2, \sqrt2)$. First, to picture the curve, we note:

• Since the components are 2 sin and 2 cos of the same quantity, the curve is a circle of radius 2.

• The full circle is traced by πt ∈ [0, 2π], i.e. t ∈ [0, 2].

• The curve starts at (x(0), y(0)) = (0, 2) on the y-axis; it moves clockwise, since the x-coordinate 2 sin(πt) increases for small t ≥ 0.


To apply our formulas, we need to know the value t = c at which the curve passes through the given point: $(x(t), y(t)) = (\sqrt2, \sqrt2)$. That is, we must solve the system of equations:

$$\begin{cases} 2\sin(\pi t) = \sqrt2 \\ 2\cos(\pi t) = \sqrt2 \end{cases} \iff t = \tfrac14\,.$$

We can find a simultaneous solution to both equations precisely because the point lies on the curve. We have $(x'(c), y'(c)) = (2\pi\cos(\frac{\pi}{4}), -2\pi\sin(\frac{\pi}{4})) = (\sqrt2\pi, -\sqrt2\pi)$, so the tangent line is:

$$(x(c) + x'(c)t,\; y(c) + y'(c)t) = (\sqrt2 + \sqrt2\pi t,\; \sqrt2 - \sqrt2\pi t)$$

$$y = \frac{y'(c)}{x'(c)}(x - x(c)) + y(c) = \frac{-\sqrt2\pi}{\sqrt2\pi}(x - \sqrt2) + \sqrt2 \iff y = -x + 2\sqrt2\,.$$

Note that each tangent to the circle is perpendicular to the corresponding radius.

Tangents of a polynomial curve. Find the tangent to $(x(t), y(t)) = (t^2, t^3 - 3t)$ at the point (3, 0). This is not a familiar curve, so to picture it, we must plot points by plugging in various values of t:

We see that the curve passes twice through the given point (3, 0). Algebraically:

$$\begin{cases} t^2 = 3 \\ t^3 - 3t = 0 \end{cases} \iff t = \sqrt3 \ \text{ or } \ t = -\sqrt3\,.$$

Note that $t^3 - 3t = 0$ by itself has the solutions $t = 0, \pm\sqrt3$, but t = 0 does not satisfy the first equation $t^2 = 3$: for time t = 0, the curve is at (0, 0), not (3, 0).

Now we can easily find the two tangent lines. The velocity is $(x'(t), y'(t)) = (2t, 3t^2 - 3)$, which equals $(\pm 2\sqrt3, 6)$ at $t = \pm\sqrt3$, so the lines are $(3 + 2\sqrt3\,t,\; 6t)$ and $(3 - 2\sqrt3\,t,\; 6t)$.

example: Which points of this curve have horizontal tangents? The tangent is horizontal when the vertical velocity is zero: $(t^3 - 3t)' = 3t^2 - 3 = 0 \iff t = \pm1$, corresponding to the points (1, −2) and (1, 2).
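The tangent computations for this curve can be checked with a small Python sketch. The function names are our own; the velocity is entered by hand as the derivative of each coordinate, $(x'(t), y'(t)) = (2t, 3t^2 - 3)$.

```python
import math

def position(t):
    """Point (x(t), y(t)) on the curve (t^2, t^3 - 3t)."""
    return t ** 2, t ** 3 - 3.0 * t

def velocity(t):
    """Velocity (x'(t), y'(t)) = (2t, 3t^2 - 3)."""
    return 2.0 * t, 3.0 * t ** 2 - 3.0

def tangent_line(c):
    """Return the parametric tangent line s -> (x(c) + x'(c)s, y(c) + y'(c)s)."""
    x0, y0 = position(c)
    vx, vy = velocity(c)
    return lambda s: (x0 + vx * s, y0 + vy * s)

# Both times at which the curve passes through (3, 0):
for c in (math.sqrt(3.0), -math.sqrt(3.0)):
    vx, vy = velocity(c)
    print(position(c), (vx, vy), "slope:", vy / vx)

# Horizontal tangents: the vertical velocity vanishes at t = 1 and t = -1.
print(velocity(1.0), velocity(-1.0))
```

Calling `tangent_line(math.sqrt(3.0))(s)` for a few values of s produces points along one of the two tangent lines through (3, 0).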

Arclength. After applying derivatives to parametric curves, we now apply integrals, which compute the size or bulk of geometric objects. The most natural measure of the size of a curve is its arclength. We already computed this for graph curves y = f(x) in §8.1, and now we do the more general parametric case.

We follow the general scheme for computing any measure of size of a geometric object from §5.2. We want the arclength L of a parametric curve (x(t), y(t)) for t ∈ [a, b]. We cut the curve into n bits determined by ∆t-increments of t ∈ [a, b].

Because the bit at the sample point $t_i$ is so short, it is well approximated by a straight segment, and we can use the Pythagorean Theorem to compute its length:

$$\Delta L_i \approx \sqrt{(\Delta x)^2 + (\Delta y)^2} = \sqrt{\frac{(\Delta x)^2 + (\Delta y)^2}{(\Delta t)^2}}\;\Delta t = \sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2 + \left(\frac{\Delta y}{\Delta t}\right)^2}\;\Delta t\,.$$

In the limit as n → ∞, we get ∆t → 0 and $\frac{\Delta x}{\Delta t} \to \frac{dx}{dt} = x'(t_i)$; similarly for $\frac{\Delta y}{\Delta t}$:

$$L = \lim_{n\to\infty}\sum_{i=1}^{n} \Delta L_i = \lim_{n\to\infty}\sum_{i=1}^{n} \sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2 + \left(\frac{\Delta y}{\Delta t}\right)^2}\;\Delta t = \int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\;dt\,.$$

In Newton notation:

$$L = \int_a^b \sqrt{x'(t)^2 + y'(t)^2}\;dt\,.$$

In fact, the integrand is just the total speed of the particle at time t, combining the horizontal and vertical speeds.

example: Compute the circumference length of a circle of radius r. The standard parametrization is (x(t), y(t)) = (r cos(t), r sin(t)) for t ∈ [0, 2π], with derivative (x′(t), y′(t)) = (−r sin(t), r cos(t)), and length:

$$L = \int_0^{2\pi} \sqrt{(-r\sin(t))^2 + (r\cos(t))^2}\;dt = \int_0^{2\pi} r\sqrt{\sin^2(t) + \cos^2(t)}\;dt = \int_0^{2\pi} r\;dt = rt\,\Big|_{t=0}^{t=2\pi} = 2\pi r\,.$$

The integral is so easy because the particle travels at constant speed r. This was much harder in §8.1, using our previous formula $L = \int_{-r}^{r}\sqrt{1+f'(x)^2}\;dx$, where $f(x) = \sqrt{r^2 - x^2}$.

example: Find the length of one arch of the cycloid from §10.1: (x(t), y(t)) = (t − sin(t), 1 − cos(t)) for t ∈ [0, 2π]. We have (x′(t), y′(t)) = (1 − cos(t), sin(t)), so:

$$L = \int_0^{2\pi} \sqrt{(1 - \cos(t))^2 + (\sin(t))^2}\;dt = \int_0^{2\pi} \sqrt{1 - 2\cos(t) + \cos^2(t) + \sin^2(t)}\;dt$$

$$= \int_0^{2\pi} \sqrt{2(1 - \cos(t))}\;dt = \int_0^{2\pi} 2\sin(\tfrac{t}{2})\;dt = 8\,.$$

Here we used the identity $\sin(\frac{t}{2}) = \sqrt{\frac{1-\cos(t)}{2}}$, valid for t ∈ [0, 2π].
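Since the integrand of the parametric arclength formula is just the speed, the integral is easy to approximate numerically. The Python sketch below sums speed × ∆t at midpoint sample times; the function name and the midpoint choice are our own assumptions, and the two velocity functions are entered by hand.

```python
import math

def parametric_arclength(x_prime, y_prime, a, b, n=20000):
    """Midpoint Riemann sum of the speed sqrt(x'(t)^2 + y'(t)^2) over [a, b]."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t_i = a + (i + 0.5) * dt                      # sample time in the i-th increment
        total += math.hypot(x_prime(t_i), y_prime(t_i)) * dt
    return total

# One arch of the cycloid: velocity (1 - cos t, sin t) for t in [0, 2*pi].
arch = parametric_arclength(lambda t: 1.0 - math.cos(t), math.sin,
                            0.0, 2.0 * math.pi)

# Circle of radius 3: velocity (-3 sin t, 3 cos t) for t in [0, 2*pi].
circle = parametric_arclength(lambda t: -3.0 * math.sin(t),
                              lambda t: 3.0 * math.cos(t),
                              0.0, 2.0 * math.pi)

print(arch, circle, 2.0 * math.pi * 3.0)
```

The first value should be very close to 8, and the second to 2πr with r = 3, matching the two examples worked out by hand.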

Math 133 Polar Coordinates Stewart §10.3/I

Points in polar coordinates. The first and greatest achievement of modern mathematics was Descartes’ description of geometric objects by numbers, using a system of coordinates. In the simplest example, Cartesian or rectangular coordinates on the plane locate a point P in terms of two coordinate measurements x and y: how far over and how far up the point is, moving parallel to the marked axes. We loosely say that P “is” the pair (x, y), because the coordinates tell how to get there from the origin. The name P is like identifying a house as “the Jones place”, whereas the coordinates are like saying “the third house to the right on the second street down”.

In this section, we learn how to locate the point P using a different pair of measurements, the polar coordinates (r, θ). The radius r is the distance from the origin; and the angle θ is measured in radians counterclockwise from the positive x-axis ray. This is like pointing to “the house 500 yards in that direction”.

Unlike rectangular coordinates, the polar coordinates of a point are multivalent, having many equivalent versions because of the ambiguity of angles. For example, the point (x, y) = (0, 1) on the positive y-axis corresponds to $(r, \theta) = (1, \frac{\pi}{2})$, where $\theta = \frac{\pi}{2}$ means a $\frac14$ turn counterclockwise from the positive x-axis. However, we could equally well get to this point by a $\frac34$ turn clockwise, giving $(r, \theta) = (1, -\frac{3\pi}{2})$. In fact, we could get to the point by $1\frac14$ turns counterclockwise, $1\frac34$ clockwise, etc. In general, we must consider angles that differ by a multiple of a full turn 2π as the same, meaning they define the same point:

(r, θ) = (r, θ+2nπ) for any integer n.

It is also useful to allow negative radius: (−r, θ) means to move out along the line at angle θ, but in the opposite direction from the positive ray, along the ray θ ± π; thus:

(−r, θ) = (r, θ ± π) .

There is even more ambiguity for the origin (x, y) = (0, 0), which can be written as (r, θ) = (0, θ) for any angle at all.

Both types of coordinates completely locate a point, so given either (x, y) or (r, θ), we can find the other by simple trigonometric formulas:

$$\text{Given } (r, \theta) \implies \text{find } (x, y) \text{ with } \begin{cases} x = r\cos(\theta) \\ y = r\sin(\theta)\,. \end{cases}$$

$$\text{Given } (x, y) \implies \text{find } (r, \theta) \text{ with } \begin{cases} r = \sqrt{x^2 + y^2} \\ \theta = \arctan(\frac{y}{x})\,. \end{cases}$$

Here we get θ from the defining formula $\tan(\theta) = \frac{y}{x}$, and we could equally well use $\sin(\theta) = \frac{y}{\sqrt{x^2+y^2}}$, etc., always remembering we can change θ to θ+2nπ. Also, since $-\frac{\pi}{2} < \arctan(\frac{y}{x}) < \frac{\pi}{2}$, we must define $\arctan(\infty) = \frac{\pi}{2}$, $\arctan(-\infty) = -\frac{\pi}{2}$; and we must adjust the angle by ±π if the point lies left of the y-axis.∗
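The conversion formulas translate directly into code. In the Python sketch below (function names are our own), the standard library function `math.atan2(y, x)` takes care of the ±π adjustment and of points on the y-axis, returning an angle with −π < θ ≤ π; other equivalent angles differ from it by multiples of 2π.

```python
import math

def to_rectangular(r, theta):
    """Polar (r, theta) -> rectangular (x, y); negative r is allowed."""
    return r * math.cos(theta), r * math.sin(theta)

def to_polar(x, y):
    """Rectangular (x, y) -> one polar version (r, theta), r >= 0, -pi < theta <= pi."""
    return math.hypot(x, y), math.atan2(y, x)

print(to_polar(0.0, 1.0))        # the point on the positive y-axis
print(to_polar(-1.0, -1.0))      # a point left of the y-axis
print(to_rectangular(-1.0, 0.0)) # negative radius along the ray theta = 0
print(to_rectangular(1.0, math.pi))
```

The last two lines illustrate the rule (−r, θ) = (r, θ ± π): both describe the point (−1, 0), up to rounding error.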

Curves in polar coordinates. Any geometric object in the plane is a set (collection) of points, so we can describe it by a set of coordinate pairs. For example, the unit circle C is the set of all points at distance 1 from the origin;† the coordinates of these points form the set of all pairs (x, y) which satisfy the Pythagorean equation $x^2 + y^2 = 1$:

$$C = \{(x, y) \text{ such that } x^2 + y^2 = 1\}\,.$$

Again, the equality of these sets is meant loosely: a pair of numbers like $(x, y) = (\frac35, \frac45)$ is not literally a geometric point on the circle, but it identifies a point by means of the rectangular coordinate system. Now, polar coordinates are specially adapted to dealing with round, turny shapes, and they make the equation of the circle as simple as possible:

$$C = \{(r, \theta) \text{ such that } r = 1\}\,.$$

example: The line x + y = 1 is not at all circular, and its equation becomes complicated in polar coordinates:

$$x + y = 1 \implies r\cos(\theta) + r\sin(\theta) = 1 \implies r = \frac{1}{\cos(\theta)+\sin(\theta)} = \tfrac{1}{\sqrt2}\sec(\theta-\tfrac{\pi}{4})\,.$$

example: Consider the Archimedean spiral, the shape of the groove on an old vinyl record (solid blue line).

This is defined by a point moving steadily outward as it turns around the origin: in parametric polar coordinates, (r(t), θ(t)) = (t, t) for t ≥ 0, meaning at time t the radius and angle are both t. Converting into rectangular coordinates:

(x(t), y(t)) = (r(t) cos θ(t), r(t) sin θ(t)) = (t cos(t), t sin(t)) .

Deparametrizing gives the rθ and xy-equations:

$$r = \theta + 2n\pi \text{ for integer } n \implies \sqrt{x^2 + y^2} = \arctan\left(\tfrac{y}{x}\right) + 2\pi n \implies y = x\tan\sqrt{x^2 + y^2}\,.$$

For example, we can tell the points (x, y) = (2nπ, 0) are on the spiral, because $0 = 2n\pi\tan\sqrt{(2n\pi)^2 + 0^2}$. Actually, the last equation defines the spiral together with its natural continuation back past its center point, namely the $\frac12$ turn rotation of the original spiral (dashed red line).

∗
$$\theta = \arcsin\left(\frac{y}{\sqrt{x^2+y^2}}\right) = \arccos\left(\frac{x}{\sqrt{x^2+y^2}}\right) = \begin{cases} \arctan\left(\frac{y}{x}\right) & \text{if } x \geq 0 \\ \arctan\left(\frac{y}{x}\right) + \mathrm{sgn}(y)\,\pi & \text{if } x < 0 \end{cases}$$
The last formula is expressed in computer languages as atan2(y,x).

†There is no separate curve “connecting” the points: the curve is just all the points.

Sketching polar curves. We can sketch the curve defined by r = f(θ) by plotting points, just as for a rectangular graph.

example: Sketch the curve r = sin(θ). We imagine the plane as a field, with us standing at the origin. We look along the positive x-axis and draw a point at radius 0, namely the origin itself. As we increase θ > 0, turning slowly to the left, we increase the radius as sin(θ) increases. The radius tops out at 1 when $\theta = \frac{\pi}{2}$ along the positive y-axis; and as we continue to turn the point comes back in to the origin when θ = π. After that, as we turn toward negative y directions the radius becomes negative, so we trace points behind us, in fact retracing the original curve.

From the sketch, we may guess this curve is a circle, which we verify by converting to an xy-equation, and simplifying by completing the square:

$$r = \sin(\theta) \implies \sqrt{x^2 + y^2} = \frac{y}{\sqrt{x^2 + y^2}} \implies x^2 + y^2 - y = 0$$

$$\implies x^2 + y^2 - 2(\tfrac12)y + (\tfrac12)^2 = (\tfrac12)^2 \implies x^2 + (y-\tfrac12)^2 = (\tfrac12)^2\,.$$

Indeed, this is a circle of radius $\frac12$ centered at $(x, y) = (0, \frac12)$.

example: Sketch the curve r = sin(2θ). Repeating the above procedure, we get four lobes, traced in the order indicated as we turn from θ = 0 to θ = 2π, with lobes 2 and 4 traced with r < 0.

Math 133 Polar Coordinates Stewart §10.3/I,II

Points in polar coordinates. The first and greatest achievement of modern mathematics was Descartes’ description of geometric objects by numbers, using a system of coordinates. In the simplest example, Cartesian or rectangular coordinates on the plane locate a point P in terms of two coordinate measurements x and y: how far over and how far up the point is, moving parallel to the marked axes. We loosely say that P “is” the pair (x, y), because the coordinates tell how to get there from the origin. The name P is like identifying a house as “the Jones place”, whereas the coordinates are like saying “the third house to the right on the second street down”.

In this section, we learn how to locate the point P using a different pair of measurements, the polar coordinates (r, θ). The radius r is the distance from the origin. The angle θ is measured counterclockwise in radians, from the positive x-axis ray to the ray from the origin through the point P. This is like pointing to “the house 500 yards in that direction”.

Unlike rectangular coordinates, the polar coordinates of a point are multivalent, having many equivalent versions because of the ambiguity of angles. For example, the point (x, y) = (0, 1) on the positive y-axis corresponds to $(r, \theta) = (1, \frac{\pi}{2})$, where $\theta = \frac{\pi}{2}$ means a $\frac14$ turn counterclockwise from the positive x-axis. However, we could equally well get to this point by a $\frac34$ turn clockwise, giving $(r, \theta) = (1, -\frac{3\pi}{2})$. In fact, we could get to the point by $1\frac14$ turns counterclockwise, $1\frac34$ clockwise, etc. In general, we must consider all angles that differ by a multiple of a full turn 2π as the same, meaning they define the same point:

(r, θ) = (r, θ+2nπ) for any integer n.

It is also useful to allow negative radius: (−r, θ) means to move out along the line at angle θ, but in the opposite direction from the positive ray, along the ray θ ± π; thus:

(−r, θ) = (r, θ ± π) .

There is even more ambiguity for the origin (x, y) = (0, 0), which can be written as (r, θ) = (0, θ) for any angle at all.

Both types of coordinates completely locate a point, so given either (x, y) or (r, θ), we can find the other by simple trigonometric formulas:

$$\text{Given } (r, \theta) \implies \text{find } (x, y) \text{ with } \begin{cases} x = r\cos(\theta) \\ y = r\sin(\theta)\,. \end{cases}$$

$$\text{Given } (x, y) \implies \text{find } (r, \theta) \text{ with } \begin{cases} r = \sqrt{x^2 + y^2} \\ \theta = \arctan(\frac{y}{x})\,. \end{cases}$$

Here we get θ from the defining formula $\tan(\theta) = \frac{y}{x}$, and we could equally well use $\sin(\theta) = \frac{y}{\sqrt{x^2+y^2}}$, etc., always remembering we can change θ to θ+2nπ. Also, since $-\frac{\pi}{2} < \arctan(\frac{y}{x}) < \frac{\pi}{2}$, we must define $\arctan(\infty) = \frac{\pi}{2}$, $\arctan(-\infty) = -\frac{\pi}{2}$; and we must adjust the angle by ±π if the point lies left of the y-axis.∗

Curves in polar coordinates. Any geometric object in the plane is a set (collection) of points, so we can describe it by a set of coordinate pairs. For example, the unit circle C is the set of all points at distance 1 from the origin;† the coordinates of these points form the set of all pairs (x, y) which satisfy the Pythagorean equation $x^2 + y^2 = 1$:

$$C = \{(x, y) \text{ such that } x^2 + y^2 = 1\}\,.$$

Again, the equality of these sets is meant loosely: a pair of numbers like $(x, y) = (\frac35, \frac45)$ is not literally a geometric point on the circle, but it identifies a point by means of the rectangular coordinate system. Now, polar coordinates are specially adapted to describe round, turny shapes centered at the origin, and they make the equation of the circle as simple as possible:

$$C = \{(r, \theta) \text{ such that } r = 1\}\,.$$

example: The line x + y = 1 is not at all circular or centered at the origin, and its equation becomes complicated in polar coordinates:

$$x + y = 1 \implies r\cos(\theta) + r\sin(\theta) = 1 \implies r = \frac{1}{\cos(\theta)+\sin(\theta)} = \tfrac{1}{\sqrt2}\sec(\theta-\tfrac{\pi}{4})\,.$$

The last equality follows from the identity $\cos(\theta-\frac{\pi}{4}) = \cos(\theta)\cos(\frac{\pi}{4}) + \sin(\theta)\sin(\frac{\pi}{4}) = \frac{1}{\sqrt2}(\cos(\theta) + \sin(\theta))$.

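Similar reasoning gives the polar form of a general linear equation. For ax + by = 0, we get $\tan(\theta) = -\frac{a}{b}$, so θ is the constant angle $\arctan(-\frac{a}{b})$. For c ≠ 0, we get:

$$ax + by = c \implies r = \frac{c}{a\cos(\theta) + b\sin(\theta)} = \frac{c}{\sqrt{a^2 + b^2}}\sec(\theta-\alpha)\,,$$

where $\alpha = \arctan(\frac{b}{a})$, assuming a > 0.

∗Summarizing:
$$\theta = \arcsin\left(\frac{y}{\sqrt{x^2+y^2}}\right) = \arccos\left(\frac{x}{\sqrt{x^2+y^2}}\right) = \begin{cases} \arctan\left(\frac{y}{x}\right) & \text{if } x \geq 0 \\ \arctan\left(\frac{y}{x}\right) + \mathrm{sgn}(y)\,\pi & \text{if } x < 0 \end{cases}$$
The last formula is expressed in computer languages as atan2(y,x).

†There is no separate curve “connecting” the points: the curve is just all the points.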

example: Consider the Archimedean spiral, the shape of the groove on an old vinyl record (solid blue line).

This is defined by a point moving steadily outward as it turns around the origin: in parametric polar coordinates, (r(t), θ(t)) = (t, t) for t ≥ 0, meaning at time t the radius and angle are both t. Converting into rectangular coordinates:

(x(t), y(t)) = (r(t) cos θ(t), r(t) sin θ(t)) = (t cos(t), t sin(t)) .

Deparametrizing gives the rθ and xy-equations:

$$r = \theta + 2n\pi \text{ for integer } n \implies \sqrt{x^2 + y^2} = \arctan\left(\tfrac{y}{x}\right) + 2\pi n \implies y = x\tan\sqrt{x^2 + y^2}\,.$$

For example, we can tell the points (x, y) = (2nπ, 0) are on the spiral, because $0 = 2n\pi\tan\sqrt{(2n\pi)^2 + 0^2}$. Actually, the equation $y = x\tan\sqrt{x^2+y^2}$ defines the spiral together with its natural continuation back past its center point, namely the $\frac12$ turn rotation of the original spiral (dashed red line).

Sketching polar graphs. Remember that a function f is just a rule taking input numbers to output numbers. It does not care what letters we use for inputs and outputs, or how we interpret those letters geometrically. We usually illustrate the function by drawing its rectangular graph y = f(x), in which f controls the height y above each point on the x-axis. But another way to illustrate this function is the polar graph r = f(θ), in which f controls the radius r along each ray θ.

We can sketch the polar graph r = f(θ) by plotting points, just as for a rectangular graph. For example, consider the polar curve:

r = sin(θ) .

We imagine the plane as a field, with us standing at the origin. We look along the positive x-axis and draw a point at radius 0, namely the origin itself. As we increase θ > 0, turning slowly to the left, we increase the radius as sin(θ) increases. The radius tops out at 1 when $\theta = \frac{\pi}{2}$ along the positive y-axis; and as we continue to turn the point comes back in to the origin when θ = π. After that, as we turn toward negative y directions the radius becomes negative, so we draw points behind us, in fact retracing the original curve.

Actually, this is a computer plot to turn the qualitative story above into an accurate graph. But we really could do this by hand, by plotting r for some standard θ:

deg   0°    30°    45°    60°    90°    120°    135°    150°    180°
θ     0     π/6    π/4    π/3    π/2    2π/3    3π/4    5π/6    π
r     0     0.5    0.7    0.9    1      0.9     0.7     0.5     0

As we said, the angles π ≤ θ ≤ 2π give negative radius and re-plot the same points.

From the sketch, we may guess this curve is a circle, which we verify by converting to an xy-equation, and simplifying by completing the square:

$$r = \sin(\theta) \implies \sqrt{x^2 + y^2} = \frac{y}{\sqrt{x^2 + y^2}} \implies x^2 + y^2 - y = 0$$

$$\implies x^2 + y^2 - 2(\tfrac12)y + (\tfrac12)^2 = (\tfrac12)^2 \implies x^2 + (y-\tfrac12)^2 = (\tfrac12)^2\,.$$

Indeed, this is a circle of radius $\frac12$ centered at $(x, y) = (0, \frac12)$.

More sketching. We sketch the curve:

r = 1 + sin(2θ) .

This is more complicated, so instead of computing a table of θ and r values, we start by drawing the function r = f(θ) = 1 + sin(2θ) in our usual way as a rectangular graph, labeling the horizontal and vertical axes by θ and r because that is how we intend to draw them later in the polar graph.

Even without precise values, we can sketch the polar graph by adjusting the radius according to the heights of the rectangular graph (dotted lines).‡ The blue lobe is traced by $\theta \in [-\frac{\pi}{4}, \frac{3\pi}{4}]$; then the green lobe is for $\theta \in [\frac{3\pi}{4}, \frac{7\pi}{4}]$.

‡Graphically, we crush the entire horizontal θ-axis in the rectangular graph to the origin in the polar graph, spreading out the radial lines like a fan.

Math 133 Polar Areas and Lengths Stewart §10.4

Slope in polar coordinates. We have seen that round, turny shapes are more simply described by polar rθ-equations than rectangular xy-equations. In this section, we use polar equations to compute geometric information.

Thus, we consider a polar curve r = f(θ) for θ ∈ [a, b]. We split the interval θ ∈ [a, b] into a large number n of increments, each of length $\Delta\theta = \frac{b-a}{n}$, with sample points $\theta_1, \ldots, \theta_n$. Here is a typical increment of the curve over $\theta \in [\theta_i, \theta_{i+1}]$, showing the corresponding increments in the coordinates:

Our first problem is to find the slope of this curve at a given θ. It is not the derivative $f'(\theta) = \frac{dr}{d\theta}$, which is the rate of change of the radius with respect to the angle. Rather, the slope is the rate of change of y = r sin(θ) = f(θ) sin(θ) with respect to x = r cos(θ) = f(θ) cos(θ). That is:

$$(\text{slope at } \theta) = \frac{dy}{dx} = \frac{\;\frac{dy}{d\theta}\;}{\frac{dx}{d\theta}} = \frac{(f(\theta)\sin(\theta))'}{(f(\theta)\cos(\theta))'} = \frac{f'(\theta)\sin(\theta) + f(\theta)\cos(\theta)}{f'(\theta)\cos(\theta) - f(\theta)\sin(\theta)}\,.$$

Area in polar coordinates. Next, we assume r = f(θ) ≥ 0 for θ ∈ [a, b] to avoid complications with negative radius, and we consider the region inside the curve, defined by 0 ≤ r ≤ f(θ) for θ ∈ [a, b]. Again we apply Slice Analysis (§5.2), splitting the area A of this region into n thin wedges $\Delta A_i$ corresponding to $[\theta_i, \theta_{i+1}]$:

We must compute the wedge area $\Delta A_i$. Since ∆θ is tiny, the small curve segments are very close to straight lines, and $\Delta A_i$ is a very thin triangle. Neglecting the small piece with radius larger than $r_i$, the slice $\Delta A_i$ is approximately an isosceles triangle with height $r_i$ and base $r_i\Delta\theta$.∗ Thus:

$$\Delta A_i \approx \tfrac12(\text{base})\times(\text{height}) \approx \tfrac12(r_i\Delta\theta)\,r_i = \tfrac12 r_i^2\,\Delta\theta\,.$$

Therefore the total area is:

$$A = \lim_{n\to\infty}\sum_{i=1}^{n} \Delta A_i = \lim_{n\to\infty}\sum_{i=1}^{n} \tfrac12 r_i^2\,\Delta\theta = \lim_{n\to\infty}\sum_{i=1}^{n} \tfrac12 f(\theta_i)^2\,\Delta\theta = \int_a^b \tfrac12 f(\theta)^2\,d\theta\,.$$

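Arclength in polar coordinates. Finally, we compute the length of the curve r = f(θ) for θ ∈ [a, b]. The length L is a sum of n increments $\Delta L_i$:

Each increment $\Delta L_i$ is approximately a straight line segment. Next to it is the radial segment ∆r and the tiny circular arc with length $r_i\,\Delta\theta$, which is also approximately a straight line. We get an approximate right triangle with hypotenuse $\Delta L_i$ and legs $r_i\,\Delta\theta$ and ∆r, so the Pythagorean Theorem gives:

$$\Delta L_i \approx \sqrt{(r_i\Delta\theta)^2 + (\Delta r)^2} = \sqrt{\frac{(r_i\Delta\theta)^2+(\Delta r)^2}{(\Delta\theta)^2}}\;\Delta\theta = \sqrt{r_i^2 + \left(\frac{\Delta r}{\Delta\theta}\right)^2}\;\Delta\theta\,.$$

Therefore the total arclength is:

$$L = \lim_{n\to\infty}\sum_{i=1}^{n} \Delta L_i = \lim_{n\to\infty}\sum_{i=1}^{n} \sqrt{r_i^2 + \left(\frac{\Delta r}{\Delta\theta}\right)^2}\;\Delta\theta = \lim_{n\to\infty}\sum_{i=1}^{n} \sqrt{f(\theta_i)^2 + \left(\frac{\Delta f(\theta_i)}{\Delta\theta}\right)^2}\;\Delta\theta = \int_a^b \sqrt{f(\theta)^2 + f'(\theta)^2}\;d\theta\,.$$

∗On a unit circle, an arc of θ radians has length θ, which is the definition of radian measure. On a circle of radius r, an arc of θ radians has length rθ.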

Example: exponential spiral. Consider the polar curve:

$$r = f(\theta) = e^{\theta/2\pi},$$

called an exponential spiral, logarithmic spiral, or snail-shell:

It winds infinitely toward the center with each turn having a radius $e^{-1}$ times the previous one.

What is the length of this curve, from the point $(r, \theta) = (1, 0)$ all the way to the center, that is, for $\theta \in (-\infty, 0]$? The arclength formula gives:

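Since $f'(\theta) = \tfrac{1}{2\pi} e^{\theta/2\pi}$, we have $\sqrt{f(\theta)^2 + f'(\theta)^2} = \sqrt{1 + \tfrac{1}{4\pi^2}}\; e^{\theta/2\pi}$, so:

$$L = \int_{-\infty}^{0} \sqrt{f(\theta)^2 + f'(\theta)^2}\;d\theta = \int_{-\infty}^{0} \sqrt{1 + \tfrac{1}{4\pi^2}}\; e^{\theta/2\pi}\,d\theta$$

$$= \sqrt{4\pi^2 + 1}\; e^{\theta/2\pi}\,\Big|_{\theta=-\infty}^{\theta=0} = \sqrt{4\pi^2+1}\; e^{0} - \lim_{N\to\infty} \sqrt{4\pi^2+1}\; e^{-N/2\pi} = \sqrt{4\pi^2+1} \approx 6.4.$$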

Next, consider the shaded region enclosed by the same section of the curve, along with the dotted segment $\theta = 0$, $e^{-1} \le r \le 1$. The outermost turn of the curve, $\theta \in [-2\pi, 0]$, sweeps out wedges which fill this whole region, so this interval defines the correct bounds for integration:

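$$A = \int_{-2\pi}^{0} \tfrac12 f(\theta)^2\,d\theta = \int_{-2\pi}^{0} \tfrac12 e^{\theta/\pi}\,d\theta = \tfrac{\pi}{2}\, e^{\theta/\pi}\,\Big|_{\theta=-2\pi}^{\theta=0} = \tfrac{\pi}{2}\,(1 - e^{-2}) \approx 1.4.$$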
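As a numerical sanity check, the polar arclength and area integrals for this spiral can be approximated by midpoint Riemann sums, in the spirit of Slice Analysis. The sketch below is only illustrative: the helper name `midpoint_sum`, the truncation of the infinite interval at θ = −200, and the number of slices are assumptions chosen for this example, not part of the notes. It compares the sums with the closed forms $\sqrt{4\pi^2+1}$ (which follows from $f'(\theta) = f(\theta)/2\pi$) and $\tfrac{\pi}{2}(1-e^{-2})$.

```python
import math

def f(theta):
    """Exponential spiral r = e^(theta / 2pi)."""
    return math.exp(theta / (2 * math.pi))

def df(theta):
    """Derivative f'(theta) = f(theta) / (2 pi)."""
    return f(theta) / (2 * math.pi)

def midpoint_sum(g, a, b, n=200_000):
    """Midpoint Riemann sum of g over [a, b] with n slices."""
    dtheta = (b - a) / n
    return sum(g(a + (i + 0.5) * dtheta) for i in range(n)) * dtheta

# Arclength: integrand sqrt(f^2 + f'^2); the tail below theta = -200 is negligible.
length = midpoint_sum(lambda t: math.hypot(f(t), df(t)), -200.0, 0.0)

# Area swept by the outermost turn: integrand (1/2) f^2 on [-2pi, 0].
area = midpoint_sum(lambda t: 0.5 * f(t) ** 2, -2 * math.pi, 0.0)

print(length, math.sqrt(4 * math.pi ** 2 + 1))
print(area, (math.pi / 2) * (1 - math.exp(-2)))
```

Each printed pair should agree to several decimal places, which is a quick way to catch an algebra slip in the integrand.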

Areas of intersections. Consider the polar curve $r = f(\theta) = 1 - \cos(\theta)$. To picture the function $f$, we draw its rectangular graph (end of §10.3):

The polar graph is a cardioid (heart-shape), which we draw along with the circle $r = \tfrac12$.

problem: Find the area of the crescent-shaped region which is inside the cardioid and outside the circle.

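We must first determine the intersection points of the two curves, where:

$$r = 1 - \cos(\theta) = \tfrac12 \implies \cos(\theta) = \tfrac12 \implies \theta = \pm\tfrac{\pi}{3} + 2n\pi,$$

where $n$ is any integer. Since the whole cardioid is traced by $\theta \in [0, 2\pi]$, we can take all intersection points in this range: $\theta = \tfrac{\pi}{3}$ and $\theta = -\tfrac{\pi}{3} + 2\pi = \tfrac{5\pi}{3}$. Between these angles the cardioid lies outside the circle, so we take the area inside the cardioid $r = f(\theta) = 1 - \cos(\theta)$, minus the area inside the circle $r = g(\theta) = \tfrac12$: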

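$$A = \int_a^b \left(\tfrac12 f(\theta)^2 - \tfrac12 g(\theta)^2\right) d\theta = \int_{\pi/3}^{5\pi/3} \left(\tfrac12 (1 - \cos(\theta))^2 - \tfrac12 \left(\tfrac12\right)^2\right) d\theta$$

$$= \left[\tfrac58\theta - \sin(\theta) + \tfrac18 \sin(2\theta)\right]_{\theta=\pi/3}^{\theta=5\pi/3} = \tfrac78\sqrt{3} + \tfrac{5\pi}{6} \approx 4.1.$$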
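The crescent area can be checked the same way, by summing the thin wedge differences $\tfrac12 f(\theta)^2\,\Delta\theta - \tfrac12 g(\theta)^2\,\Delta\theta$ directly. This is a minimal sketch; the function names and the slice count are assumptions made for illustration.

```python
import math

def crescent_integrand(theta):
    """Wedge area density: (1/2) f^2 - (1/2) g^2 for cardioid f and circle g."""
    f = 1 - math.cos(theta)   # cardioid r = 1 - cos(theta)
    g = 0.5                   # circle r = 1/2
    return 0.5 * f ** 2 - 0.5 * g ** 2

def midpoint_sum(func, a, b, n=100_000):
    """Midpoint Riemann sum of func over [a, b] with n slices."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

numeric_area = midpoint_sum(crescent_integrand, math.pi / 3, 5 * math.pi / 3)
exact_area = 7 * math.sqrt(3) / 8 + 5 * math.pi / 6

print(numeric_area, exact_area)
```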

Math 133 Sequences Stewart §11.1

Real functions and sequences. So far, our main objects of study have been functions $f : \mathbb{R} \to \mathbb{R}$, where the inputs and outputs are in the set of real numbers $\mathbb{R} = (-\infty, \infty)$. In this chapter, we introduce a new type of function called a sequence:

$$a : \{1, 2, 3, \ldots\} \to \mathbb{R},$$

in which the inputs are whole numbers $n = 1, 2, 3, \ldots$, and the outputs are again real numbers, usually written as $a_n$ instead of $a(n)$. The index $n$ can be replaced arbitrarily: $\{a_i\}_{i=1}^{\infty}$ is the same sequence as $\{a_n\}_{n=1}^{\infty}$. Also, we define some sequences to begin with $a_0$, so we write $\{a_n\}_{n=0}^{\infty}$.

We can write a sequence either as a formula or as a list of outputs; for example:

$$a_n = \tfrac1n \iff \{a_n\}_{n=1}^{\infty} = 1, \tfrac12, \tfrac13, \tfrac14, \ldots$$

Here $\{a_n\}_{n=1}^{\infty}$ denotes the entire sequence, thought of as an infinite list, and we write the first few values $a_1 = 1$, $a_2 = \tfrac12$, etc., to make the pattern clear. We can picture this by plotting the points $(n, a_n)$ in the plane, sometimes with a bar-graph as at left; or by marking only the output values $a_1, a_2, a_3, \ldots$ on a number line as at right:

examples:

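• $\{a_n\} = 1, -1, 1, -1, \ldots \iff a_n = \begin{cases} 1 & \text{for } n \text{ odd} \\ -1 & \text{for } n \text{ even} \end{cases} \iff a_n = (-1)^{n-1}$

• $a_n = \sin\left(\tfrac{n\pi}{2}\right) \iff a_n = \begin{cases} 0 & \text{for } n \text{ even} \\ 1 & \text{for } n = 4k+1 \text{ with integer } k \\ -1 & \text{for } n = 4k+3 \text{ with integer } k \end{cases}$, with values:

n     1   2   3   4   5   6   7   8   · · ·
a_n   1   0  −1   0   1   0  −1   0   · · ·

• $a_n = 2^n \iff \{a_n\} = 2, 4, 8, 16, \ldots \iff a_1 = 2,\ a_n = 2a_{n-1}$ for $n \ge 2$

The last definition is recursive, meaning that each value $a_n$ is defined in terms of the previous value $a_{n-1}$, starting with an initial value $a_1 = 2$.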


• The Fibonacci sequence is the most famous recursive sequence: each entry is the sum of the previous two.

$$F_1 = F_2 = 1, \qquad F_n = F_{n-1} + F_{n-2} \text{ for } n \ge 3.$$

n     1   2   3   4   5   6   7   8   · · ·
F_n   1   1   2   3   5   8  13  21   · · ·

There is no obvious formula for $F_n$ in terms of $n$, but look up Binet’s formula.
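A recursive definition translates directly into a loop. The sketch below computes $F_n$ from the recursion and compares it with Binet’s formula $F_n = (\varphi^n - \psi^n)/\sqrt5$, where $\varphi = \tfrac{1+\sqrt5}{2}$ and $\psi = \tfrac{1-\sqrt5}{2}$. Binet’s formula is not derived in these notes; it is quoted here as a standard fact, and the function names are illustrative. In floating point the comparison is only reliable for moderate $n$.

```python
import math

def fib(n):
    """F_n from the recursion F_1 = F_2 = 1, F_n = F_(n-1) + F_(n-2)."""
    a, b = 1, 1              # a = F_1, b = F_2
    for _ in range(n - 1):
        a, b = b, a + b      # slide the window one step forward
    return a

def binet(n):
    """F_n from Binet's closed formula, rounded to the nearest integer."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    psi = (1 - sqrt5) / 2
    return round((phi ** n - psi ** n) / sqrt5)

print([fib(n) for n in range(1, 9)])   # first eight Fibonacci numbers
print(all(fib(n) == binet(n) for n in range(1, 41)))
```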

Convergence. It would be meaningless to take the limit of a sequence $a_n$ as $n \to c$, since a whole number $n$ cannot gradually approach a finite value $c$. However, we can take the limit as $n \to \infty$.

Definition: We say the sequence $\{a_n\}_{n=1}^{\infty}$ converges to the number $L$, denoted $\lim_{n\to\infty} a_n = L$, whenever $a_n$ gets as close as desired to $L$, provided $n$ is large enough. Specifically, for any error tolerance $\varepsilon > 0$, there is some lower bound $N$ such that $n > N$ forces $L - \varepsilon < a_n < L + \varepsilon$; or equivalently:

$$n > N \implies |a_n - L| < \varepsilon.$$

If the limit does not exist, we say the sequence diverges.

This just repeats the error-control definition for $\lim_{x\to\infty} f(x) = L$ from §1.7, and we have a similar definition for divergence to infinity, $\lim_{n\to\infty} a_n = \infty$ or $-\infty$. In the pictures above, we can see the convergence of $a_n = \tfrac1n$ to $L = 0$: in the graph, we see the points $(n, a_n)$ approach the horizontal asymptote $y = L$; on the number line, we see the $a_n$ points march closer and closer to the limit value $L$.

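example: Prove that $\lim_{n\to\infty} \left(2 + \frac{(-1)^n}{2n}\right) = 2$. Given the acceptable error tolerance $\varepsilon > 0$, we work backward from the desired inequality:

$$2 - \varepsilon < 2 + \frac{(-1)^n}{2n} < 2 + \varepsilon \iff -\varepsilon < \frac{(-1)^n}{2n} < \varepsilon \iff \left|\frac{(-1)^n}{2n}\right| < \varepsilon \iff n > \frac{1}{2\varepsilon}.$$

For example, if we want $|a_n - 2| < \varepsilon = \tfrac{1}{100}$, we take $n > \frac{1}{2\varepsilon} = \frac{1}{2/100} = 50$.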
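The error-control argument can be tested directly: for a given tolerance, every term past the bound $N = \frac{1}{2\varepsilon}$ should land within $\varepsilon$ of 2. The sketch below uses exact rational arithmetic so that the boundary case $n = 50$ is not blurred by rounding; the function names are assumptions for illustration only.

```python
from fractions import Fraction

def a(n):
    """The sequence a_n = 2 + (-1)^n / (2n), as an exact fraction."""
    return 2 + Fraction((-1) ** n, 2 * n)

def lower_bound(eps):
    """The bound N = 1 / (2 eps) found by working backward from |a_n - 2| < eps."""
    return 1 / (2 * eps)

eps = Fraction(1, 100)
N = lower_bound(eps)                      # N = 50

# Every n > N satisfies the tolerance (checked on a finite sample of n).
within = all(abs(a(n) - 2) < eps for n in range(int(N) + 1, int(N) + 5001))

print(N, within)
print(abs(a(50) - 2) < eps)   # at n = N itself the strict inequality fails
```

The last line illustrates why the definition asks for $n > N$ rather than $n \ge N$ in this example.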

Limit Laws. We do not usually perform error-control analysis to work with limits of sequences $a_n$, but rather rely on our previous knowledge of limits of functions $f(x)$:

Sequence Comparison Theorem: If $f(x)$ is a function with $a_n = f(n)$ for all $n$, then $\lim_{n\to\infty} a_n = \lim_{x\to\infty} f(x)$, provided the right-hand limit exists or is $\pm\infty$.

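example: Compute $\lim_{n\to\infty} \frac{n^2+n}{2n^2-3}$. Here $a_n = \frac{n^2+n}{2n^2-3}$ for $n = 1, 2, 3, \ldots$ is the sequence version of $f(x) = \frac{x^2+x}{2x^2-3}$ for real numbers $x$, and we have techniques to deal with limits of $f(x)$. Here, we can use L’Hopital’s Rule:

$$\lim_{n\to\infty} \frac{n^2+n}{2n^2-3} = \lim_{x\to\infty} \frac{x^2+x}{2x^2-3} \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac{2x+1}{4x} \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac{2}{4} = \frac12.$$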

We cannot use L’Hopital’s Rule directly on $a_n$ because we cannot take the derivative of a sequence: it is not a curve with a slope at each point.

An alternative way of handling limits of sequences is to repeat the kind of analysis we did with functions: combine Basic Limits using Limit Laws (§1.6). We have:

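• Basic Limits: $\lim_{n\to\infty} c = c$ and $\lim_{n\to\infty} n = \infty$.

• Sum Law: $\lim_{n\to\infty} (a_n + b_n) = \lim_{n\to\infty} a_n + \lim_{n\to\infty} b_n$.

• Product Law: $\lim_{n\to\infty} a_n b_n = \lim_{n\to\infty} a_n \cdot \lim_{n\to\infty} b_n$.

• Quotient Law: $\lim_{n\to\infty} \dfrac{a_n}{b_n} = \dfrac{\lim_{n\to\infty} a_n}{\lim_{n\to\infty} b_n}$.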

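The above Laws are valid provided the right-side expressions make sense: for example, in the Quotient Law we must assume that $a_n$ and $b_n$ converge, and $\lim_{n\to\infty} b_n \ne 0$. Furthermore, the Laws are valid when the right-side limits are infinite, provided we use the Infinity Rules:

$$\infty + \infty = \infty, \qquad \infty \cdot \infty = \infty, \qquad c \cdot \infty = \begin{cases} \infty & \text{if } c > 0 \\ -\infty & \text{if } c < 0 \end{cases}, \qquad \frac{1}{\pm\infty} = 0.$$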

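example: We can re-do the sequence in the previous example as follows:

$$\lim_{n\to\infty} \frac{n^2+n}{2n^2-3} = \lim_{n\to\infty} \frac{n^2+n}{2n^2-3} \cdot \frac{1/n^2}{1/n^2} = \lim_{n\to\infty} \frac{1 + \frac1n}{2 - \frac{3}{n^2}}.$$

Applying the Limit Laws and Infinity Rules, this becomes:

$$\frac{1 + \lim_{n\to\infty} \frac1n}{2 - 3\left(\lim_{n\to\infty} \frac1n\right)^2} = \frac{1 + \frac{1}{\infty}}{2 - 3\left(\frac{1}{\infty}\right)^2} = \frac{1 + 0}{2 - 3(0^2)} = \frac12.$$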

Limit Theorems. We have two more results which parallel those for limits of $f(x)$:

Squeeze Theorem: If $a_n \le b_n \le c_n$ for all $n$, and $\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n = L$, then $\lim_{n\to\infty} b_n = L$.

example: Rigorously evaluate the limit of $b_n = \frac{2+\sin(n^2)}{n}$. Note that the sequence $q_n = \sin(n^2)$ diverges, oscillating unpredictably within the range $-1 \le \sin(n^2) \le 1$. However, we have bounds:

$$a_n = \frac1n = \frac{2-1}{n} \le \frac{2+\sin(n^2)}{n} \le \frac{2+1}{n} = \frac3n = c_n.$$

Since the upper and lower bounds both approach the limit $L = 0$, so does the middle sequence: $\lim_{n\to\infty} \frac{2+\sin(n^2)}{n} = 0$.

Continuity Theorem: If $g(x)$ is continuous, meaning $\lim_{x\to c} g(x) = g(c)$ for all $c$, and the sequence $a_n$ converges, then:

$$\lim_{n\to\infty} g(a_n) = g\left(\lim_{n\to\infty} a_n\right).$$

example: Find $\lim_{n\to\infty} n^{1/n}$: that is, does the sequence $1, \sqrt{2}, \sqrt[3]{3}, \sqrt[4]{4}, \ldots$ approach a finite value? As always with exponentiation, we rewrite in terms of the natural exponential $\exp(x) = e^x$, which is a continuous function:

$$\lim_{n\to\infty} n^{1/n} = \lim_{n\to\infty} e^{\ln(n)/n} = \lim_{n\to\infty} \exp\left(\frac{\ln(n)}{n}\right) = \exp\left(\lim_{n\to\infty} \frac{\ln(n)}{n}\right).$$

Now we can evaluate the inside limit by L’Hopital:

$$\lim_{n\to\infty} \frac{\ln(n)}{n} = \lim_{x\to\infty} \frac{\ln(x)}{x} = \lim_{x\to\infty} \frac{1/x}{1} = 0.$$

Hence $\lim_{n\to\infty} n^{1/n} = e^0 = 1$. Check this by computing values of $n^{1/n}$ on your calculator.
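The suggested calculator check takes only a few lines. This is a small sketch (the function name `nth_root_of_n` and the sample values of $n$ are arbitrary choices) that tabulates $n^{1/n}$ together with the exponent $\ln(n)/n$ from the computation above.

```python
import math

def nth_root_of_n(n):
    """The sequence term n^(1/n), computed as exp(ln(n) / n)."""
    return math.exp(math.log(n) / n)

for n in (1, 2, 3, 4, 10, 100, 10_000, 1_000_000):
    # The exponent ln(n)/n shrinks toward 0, so the term approaches e^0 = 1.
    print(n, math.log(n) / n, nth_root_of_n(n))
```

The terms rise briefly (the largest is at $n = 3$) and then decrease slowly toward 1, so convergence here is real but not fast.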

Continuous compounding. Here is a surprising example from financial theory. Suppose a bank account pays an annual interest rate of $r$: for example, $r = 0.04 = 4\%$ means that after a year, each dollar becomes $1 + r = 1.04$ dollars.

Now suppose half the interest is paid after half a year, giving $1 + \tfrac r2$ dollars, and in the second half-year, the previous interest also earns interest (i.e. compound interest). At the end of the year, each dollar becomes $(1 + \tfrac r2)(1 + \tfrac r2) = (1 + \tfrac r2)^2$ dollars. If the interest is paid three times a year, compound interest gives $(1 + \tfrac r3)^3$ dollars; and if interest is paid $n$ times a year, it gives $(1 + \tfrac rn)^n$ dollars.

Now imagine if interest were paid every hour, or every second, etc., approaching a system of compounding continuously at every instant. Would this produce an unbounded amount of money, or tend to a limit? Let’s see!

$$\left(1 + \frac rn\right)^n = \exp\left(\ln\left(1 + \frac rn\right)^n\right)$$

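Now L’Hopital gives:

$$\lim_{x\to\infty} \ln\left(1 + \frac rx\right)^x = \lim_{x\to\infty} \frac{\ln(1 + rx^{-1})}{x^{-1}} \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac{\frac{1}{1 + rx^{-1}}\,(-rx^{-2})}{-x^{-2}} = \lim_{x\to\infty} \frac{rx}{x + r} \overset{\text{Hop}}{=} \lim_{x\to\infty} \frac r1 = r.$$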

Therefore:

$$\lim_{n\to\infty} \left(1 + \frac rn\right)^n = \exp(r) = e^r.$$

Thus, under continuous compounding at an interest rate of $r$, each dollar becomes $e^r$ dollars after a year. No intervals of compounding will produce more than this. Once again, the natural exponential intrudes even though the original question had nothing to do with it.
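To see how quickly the compounded amount levels off, one can tabulate $(1 + \tfrac rn)^n$ for increasingly frequent compounding and compare with $e^r$. The following is a hedged sketch with the example rate $r = 0.04$; the function name and the chosen compounding frequencies are illustrative assumptions.

```python
import math

def compounded(r, n):
    """Value of one dollar after a year at rate r, compounded n times."""
    return (1 + r / n) ** n

r = 0.04
limit = math.exp(r)   # continuous compounding

# yearly, half-yearly, monthly, daily, hourly, and a million times a year
frequencies = (1, 2, 12, 365, 8760, 1_000_000)
values = [compounded(r, n) for n in frequencies]

for n, v in zip(frequencies, values):
    print(n, v, limit - v)   # the gap to e^r shrinks as n grows
```

Even daily compounding at 4% differs from the continuous limit by only a few millionths of a dollar per dollar invested.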

Math 133 Series Stewart §11.2

Sequences and series. To any sequence $\{a_n\}_{n=1}^{\infty}$ we associate another sequence $\{s_n\}_{n=1}^{\infty}$, called the series of sums of $\{a_n\}$, defined by:

$$s_n = a_1 + a_2 + \cdots + a_n = \sum_{i=1}^{n} a_i.$$

That is, $s_1 = a_1$, $s_2 = a_1 + a_2$, $s_3 = a_1 + a_2 + a_3$, and in general $s_n$ is the sum of the first $n$ entries of $\{a_n\}$. The sigma notation $\sum_{i=1}^{n} a_i$ is a convenient shorthand for taking each integer value $i = 1, 2, \ldots, n$, substituting it into the expression $a_i$, and adding all the resulting quantities (see §4.1 Pt 2). We can change the index letters arbitrarily: $s_N = \sum_{n=1}^{N} a_n$ defines the same series as $s_n = \sum_{i=1}^{n} a_i$.

example: If $a_n = \tfrac1n$, then $s_1 = 1$, $s_2 = 1 + \tfrac12 = \tfrac32$, $s_3 = 1 + \tfrac12 + \tfrac13 = \tfrac{11}{6}$, etc. The general term $s_n = \sum_{i=1}^{n} \tfrac1i = \tfrac11 + \tfrac12 + \tfrac13 + \cdots + \tfrac1n$ has no elementary formula, which is typical for series, even when $a_n$ is quite simple. Geometrically, $s_n$ is the area under the bar graph of $\{a_n\}$ above the interval $[0, n]$.

We can also take the infinite sum, which is defined as a limit of finite sums:

$$\sum_{i=1}^{\infty} a_i = \lim_{n\to\infty} \sum_{i=1}^{n} a_i = \lim_{n\to\infty} s_n.$$

This limit, converging or diverging, is the total area under the bar graph of $\{a_n\}_{n=1}^{\infty}$.

example: The general purpose of a series is to express a complicated quantity as an infinite sum of simple quantities, $s = \sum_{i=1}^{\infty} a_i$, so that the finite sums $s_n = \sum_{i=1}^{n} a_i$ are approximations. A familiar example of this is decimal notation: an irrational real number is a complicated quantity with an infinite amount of detail in its digits, which are equivalent to the sum of a certain series.

Given a sequence of digits $\{d_n\}_{n=1}^{\infty}$ with $d_n \in \{0, 1, \ldots, 9\}$, we have the number:

$$s = 0.d_1d_2d_3\cdots = \frac{d_1}{10^1} + \frac{d_2}{10^2} + \frac{d_3}{10^3} + \cdots = \sum_{n=1}^{\infty} \frac{d_n}{10^n}.$$

By definition, an infinite decimal is the limit of its finite decimal approximations, the number approached as we add more digits. A trivial example is the repeating decimal $s = 0.999\cdots$, which clearly gets as close as desired to 1 as we take more digits, so $s = 1$. For a more complicated pattern of digits, the series converges to some complicated real number (see §11.4).

example: We will eventually (§11.9, 11.10) develop powerful methods to write familiar numbers and functions as infinite series. Two outstanding formulas are:

$$\pi = 4\left(1 - \tfrac13 + \tfrac15 - \tfrac17 + \cdots\right), \qquad \sin(x) = x - \tfrac{1}{3!}x^3 + \tfrac{1}{5!}x^5 - \tfrac{1}{7!}x^7 + \cdots.$$

It is formulas like these that allow machines to compute complicated transcendental quantities using only the four arithmetic operations (which are all that can easily be built into a logic circuit).
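To make the last remark concrete, here is a minimal sketch that evaluates partial sums of the two series using only addition, subtraction, multiplication, and division (the library values of π and sine appear only for comparison). The series for π is written in its standard alternating odd-denominator form, $4(1 - \tfrac13 + \tfrac15 - \cdots)$; the function names and term counts are assumptions for illustration, and real numerical libraries use more refined methods.

```python
import math

def pi_partial_sum(terms):
    """4 * (1 - 1/3 + 1/5 - 1/7 + ...), using the given number of terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

def sin_partial_sum(x, terms):
    """x - x^3/3! + x^5/5! - ..., using the given number of terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

for terms in (10, 1000, 100_000):
    print(terms, pi_partial_sum(terms), math.pi)    # slow convergence

for terms in (1, 2, 5, 10):
    print(terms, sin_partial_sum(1.0, terms), math.sin(1.0))   # fast convergence
```

The contrast is instructive: the series for π needs about 100,000 terms for four or five correct decimals, while ten terms of the sine series are accurate to machine precision at $x = 1$.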

Wheat kernels on a chessboard. Consider the following classic puzzle: if we put one kernel of wheat on the first square of a chessboard, then two kernels on the second square, then four on the third square, and we keep doubling until the 64th square, how many kernels are on the whole board?

We start with the sequence $a_n$ = number of kernels on the $n$th square, defined recursively by $a_1 = 1$, $a_n = 2a_{n-1}$, which leads to the explicit formula $a_n = 2^{n-1}$. This is called an exponential sequence or geometric sequence.∗ The associated geometric series is $s_n$ = total number of kernels on the first $n$ squares:

$$s_n = \sum_{i=1}^{n} 2^{i-1} = 1 + 2^1 + 2^2 + \cdots + 2^{n-2} + 2^{n-1}.$$

Surprisingly, we can find a simple formula for $s_n$ as follows:

$$\begin{array}{rcl} 2s_n &=& \phantom{-1 - {}} 2^1 + 2^2 + \cdots + 2^{n-1} + 2^n \\ -s_n &=& -1 - 2^1 - 2^2 - \cdots - 2^{n-1} \end{array}$$

Adding these, the two sides become:

$$(2-1)s_n = 2a_n - a_1 = 2^n - 1 \implies s_n = \frac{2^n - 1}{2 - 1} = 2^n - 1.$$

Therefore, the answer to our puzzle is $s_{64} = 2^{64} - 1$ kernels, which is enough wheat to fill a football stadium.

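Now we change the problem: we put one ounce of gold on the first square, half an ounce on the second square, a quarter ounce on the third square, and so on until the 64th. What is the total weight of gold on the board?

The weight on the $n$th square is given by another geometric (i.e. exponential) sequence $a_n = \left(\tfrac12\right)^{n-1}$, and the total weight on the first $n$ squares is the geometric series $s_n = \sum_{i=1}^{n} \left(\tfrac12\right)^{i-1}$. We use the same trick as before:

$$\begin{array}{rcl} s_n &=& 1 + \tfrac12 + \left(\tfrac12\right)^2 + \cdots + \left(\tfrac12\right)^{n-1} \\ -\tfrac12 s_n &=& \phantom{1} - \tfrac12 - \left(\tfrac12\right)^2 - \cdots - \left(\tfrac12\right)^{n-1} - \left(\tfrac12\right)^n \end{array}$$

$$\left(1 - \tfrac12\right)s_n = a_1 - \tfrac12 a_n = 1 - \left(\tfrac12\right)^n \implies s_n = \frac{1 - \left(\tfrac12\right)^n}{1 - \tfrac12}.$$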

∗This terminology is obscure, but very standard.

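Thus, the total weight is $s_{64} = \dfrac{1 - \left(\tfrac12\right)^{64}}{1 - \tfrac12}$. Since $\left(\tfrac12\right)^{64}$ is a tiny, negligible quantity, this is very close to $\dfrac{1}{1 - \tfrac12} = 2$, meaning the first square has just about the same total weight as the other 63 squares. In fact, adding more squares would barely change the total, since the limit is:

$$\sum_{i=1}^{\infty} \left(\tfrac12\right)^{i-1} = \lim_{n\to\infty} s_n = \lim_{n\to\infty} \frac{1 - \left(\tfrac12\right)^n}{1 - \tfrac12} = \frac{1 - 0}{1 - \tfrac12} = 2.$$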

Geometric sequences and series. A general geometric sequence starts with an initial value $a_1 = c$, and subsequent terms are multiplied by the ratio $r$, so that $a_n = ra_{n-1}$; explicitly, $a_n = cr^{n-1}$. The same trick as above gives a formula for the corresponding geometric series. We have $s_n - rs_n = c - cr^n$, so for $r \ne 1$:

$$s_n = \sum_{i=1}^{n} cr^{i-1} = c + cr + cr^2 + \cdots + cr^{n-1} = c\,\frac{1 - r^n}{1 - r}.$$

(Notice that the power $r^n$ is one larger than in the last term $cr^{n-1}$.) This ingenious formula is known as the sum of a finite geometric series; the limit is the sum of an infinite geometric series:

$$\lim_{n\to\infty} s_n = \sum_{i=1}^{\infty} cr^{i-1} = c\,\frac{1}{1 - r}, \quad \text{provided } |r| < 1.$$

Of course, the infinite series diverges if $|r| \ge 1$ (and $c \ne 0$). These formulas are needed again and again in practical problems, especially those involving finance and lending.
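The finite geometric series formula is easy to confirm with exact arithmetic. The sketch below compares $c\,\frac{1-r^n}{1-r}$ against term-by-term addition for the wheat ($c = 1$, $r = 2$) and gold ($c = 1$, $r = \tfrac12$) problems; the function names are illustrative assumptions.

```python
from fractions import Fraction

def geometric_sum(c, r, n):
    """Closed formula c (1 - r^n) / (1 - r) for c + cr + ... + cr^(n-1), r != 1."""
    c, r = Fraction(c), Fraction(r)
    return c * (1 - r ** n) / (1 - r)

def direct_sum(c, r, n):
    """The same sum, added term by term."""
    c, r = Fraction(c), Fraction(r)
    return sum(c * r ** (i - 1) for i in range(1, n + 1))

wheat = geometric_sum(1, 2, 64)               # kernels on the whole board
gold = geometric_sum(1, Fraction(1, 2), 64)   # ounces on the whole board

print(wheat, wheat == direct_sum(1, 2, 64))
print(float(gold), gold == direct_sum(1, Fraction(1, 2), 64))
print(2 - gold)    # how far the gold total falls short of the infinite sum 2
```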

Manipulating sigma notation. The geometric series formulas allow us to evaluate any series whose terms involve only exponential functions like $2^n$ or $2^{-n}$, but not power functions like $n$ or $n^2$. For this, we rearrange and manipulate the terms into the form of the geometric sequences which we know.

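example: Evaluate the finite sum $\sum_{i=1}^{n} \dfrac{2^i - 1}{3^{i+1}}$, and the infinite sum $\sum_{i=1}^{\infty} \dfrac{2^i - 1}{3^{i+1}}$.

First, we work this out in dot-dot-dot notation:

$$\frac{2^1 - 1}{3^{1+1}} + \cdots + \frac{2^n - 1}{3^{n+1}} = \left(\frac{2^1}{3^{1+1}} + \cdots + \frac{2^n}{3^{n+1}}\right) - \left(\frac{1}{3^{1+1}} + \cdots + \frac{1}{3^{n+1}}\right)$$

$$= \left(\frac{2}{3^2}\,\frac{2^{1-1}}{3^{1-1}} + \cdots + \frac{2}{3^2}\,\frac{2^{n-1}}{3^{n-1}}\right) - \left(\frac{1}{3^2}\,\frac{1}{3^{1-1}} + \cdots + \frac{1}{3^2}\,\frac{1}{3^{n-1}}\right)$$

$$= \left(\frac{2}{3^2}\left(\frac23\right)^{1-1} + \cdots + \frac{2}{3^2}\left(\frac23\right)^{n-1}\right) - \left(\frac{1}{3^2}\left(\frac13\right)^{1-1} + \cdots + \frac{1}{3^2}\left(\frac13\right)^{n-1}\right)$$

$$= \frac{2}{3^2}\,\frac{1 - \left(\frac23\right)^n}{1 - \frac23} - \frac{1}{3^2}\,\frac{1 - \left(\frac13\right)^n}{1 - \frac13}.$$

Here we factored out $\frac{2}{3^2}$ and $\frac{1}{3^2}$ so the remaining factor would be $r^{i-1}$ for some $r$, which we can then evaluate using the geometric series formula.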

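This computation can be written more compactly in sigma notation:

$$\sum_{i=1}^{n} \frac{2^i - 1}{3^{i+1}} = \sum_{i=1}^{n} \left(\frac{2^i}{3^{i+1}} - \frac{1}{3^{i+1}}\right) = \sum_{i=1}^{n} \frac{2^i}{3^{i+1}} - \sum_{i=1}^{n} \frac{1}{3^{i+1}}$$

$$= \sum_{i=1}^{n} \frac{2}{3^2}\,\frac{2^{i-1}}{3^{i-1}} - \sum_{i=1}^{n} \frac{1}{3^2}\,\frac{1}{3^{i-1}}$$

$$= \sum_{i=1}^{n} \frac{2}{3^2}\left(\frac23\right)^{i-1} - \sum_{i=1}^{n} \frac{1}{3^2}\left(\frac13\right)^{i-1} = \frac{2}{3^2}\,\frac{1 - \left(\frac23\right)^n}{1 - \frac23} - \frac{1}{3^2}\,\frac{1 - \left(\frac13\right)^n}{1 - \frac13}.$$

To get the infinite sum, we just remove the terms $\left(\frac23\right)^n$ and $\left(\frac13\right)^n$, since these go to zero as $n \to \infty$. This leaves $\frac{2}{9}\cdot 3 - \frac{1}{9}\cdot\frac32 = \frac23 - \frac16 = \frac12$.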
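A manipulation like this is easy to get wrong by one power of 3, so it is worth confirming against a brute-force sum. The following sketch (function names are illustrative) checks the closed form exactly for small $n$ and shows the partial sums approaching the infinite-sum value $\tfrac29\cdot 3 - \tfrac19\cdot\tfrac32 = \tfrac12$ obtained by dropping the vanishing powers.

```python
from fractions import Fraction

def direct(n):
    """Sum of (2^i - 1) / 3^(i+1) for i = 1..n, added term by term."""
    return sum(Fraction(2 ** i - 1, 3 ** (i + 1)) for i in range(1, n + 1))

def closed_form(n):
    """The result of splitting the sum into two geometric series."""
    first = Fraction(2, 9) * (1 - Fraction(2, 3) ** n) / (1 - Fraction(2, 3))
    second = Fraction(1, 9) * (1 - Fraction(1, 3) ** n) / (1 - Fraction(1, 3))
    return first - second

print(all(direct(n) == closed_form(n) for n in range(1, 31)))
for n in (1, 5, 20, 100):
    print(n, float(closed_form(n)))   # approaches 0.5
```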

Repeating decimals. An important application of geometric series is to write repeating infinite decimals as fractions. For example, consider:

$$s = 0.0626262\cdots = 0.0\overline{62}$$

We can apply the ratio-multiplication trick to cancel the infinite tail of digits:

$$s = 0.0626262\cdots, \qquad \tfrac{1}{100}s = 0.0006262\cdots$$

$$\left(1 - \tfrac{1}{100}\right)s = 0.062 \implies \tfrac{99}{100}s = \tfrac{62}{1000} \implies s = \tfrac{62}{990}.$$

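It is no coincidence that we can use the same trick as before: in fact, this infinite decimal is the sum of a geometric series:

$$s = \frac{6}{10^2} + \frac{2}{10^3} + \frac{6}{10^4} + \frac{2}{10^5} + \frac{6}{10^6} + \frac{2}{10^7} + \cdots = \frac{62}{10^3} + \frac{62}{10^5} + \frac{62}{10^7} + \cdots$$

$$= \sum_{i=1}^{\infty} \frac{62}{10^{2i+1}} = \sum_{i=1}^{\infty} \frac{62}{10^1\,(10^2)^i} = \sum_{i=1}^{\infty} \frac{62}{10^3}\left(\frac{1}{100}\right)^{i-1} = \frac{62}{10^3}\,\frac{1}{1 - \frac{1}{100}} = \frac{62}{1000}\cdot\frac{100}{99} = \frac{62}{990}.$$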

Another example:

$$1.5626262\cdots = 1.5 + 0.0626262\cdots = \frac{15}{10} + \frac{62}{990} = \frac{1547}{990}.$$

We can carry out such reasoning for any infinite decimal which starts with arbitrary digits, then becomes repeating. Thus, any infinite decimal represents a real number, but the repeating decimals represent precisely the rational numbers (fractions)! For example, since we know $\sqrt{2} = 1.4142135623730950488\cdots$ is an irrational number, not equal to any fraction, its decimal digits will not repeat.
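The geometric series argument gives a general recipe: a decimal with non-repeating digits of length $p$ after the point, followed by a repeating block $B$ of length $k$, has a tail equal to $\frac{B}{10^p(10^k - 1)}$. The sketch below implements this recipe with exact fractions. The function `repeating_decimal` and its string-based interface are hypothetical conveniences invented for this example.

```python
from fractions import Fraction

def repeating_decimal(integer_part, prefix, block):
    """Exact value of  integer_part . prefix block block block ...

    prefix: string of non-repeating digits after the decimal point (may be "")
    block:  string of digits that repeats forever (must be non-empty)
    """
    scale = 10 ** len(prefix)
    head = Fraction(int(prefix), scale) if prefix else Fraction(0)
    # Geometric series: first term block / (scale * 10^k), ratio 1 / 10^k.
    tail = Fraction(int(block), scale * (10 ** len(block) - 1))
    return integer_part + head + tail

print(repeating_decimal(0, "0", "62"))   # 0.0626262...  (prints in lowest terms)
print(repeating_decimal(1, "5", "62"))   # 1.5626262...
print(repeating_decimal(0, "", "9"))     # 0.999...
```

Note that `Fraction` always reduces to lowest terms, so $\tfrac{62}{990}$ is displayed as the equal fraction $\tfrac{31}{495}$.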

Geometric power series. We can take the ratio in a geometric series to be a variable, obtaining a function called a power series:

$$g(x) = \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + \cdots$$

(Traditionally, the index starts at $n = 0$, so the first term is $x^0 = 1$.) By definition, this means that for the input $x = r$, the output is:

$$g(r) = \sum_{n=0}^{\infty} r^n = \lim_{N\to\infty} \sum_{n=0}^{N} r^n,$$

whenever this limit exists. Our formula for the sum of a geometric series can be rewritten: $\sum_{n=0}^{\infty} cr^n = c\,\frac{1}{1-r}$ for $|r| < 1$. This means:

$$g(x) = \sum_{n=0}^{\infty} x^n = \begin{cases} \dfrac{1}{1-x} & \text{if } |x| < 1 \\ \text{undefined} & \text{if } |x| \ge 1. \end{cases}$$

The set of $x$ for which the series converges is called the interval of convergence: in this case it is $-1 < x < 1$, i.e. $x \in (-1, 1)$.

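Another example:

$$g(x) = \sum_{n=0}^{\infty} 3^{n-1}x^{n+1} = \sum_{n=0}^{\infty} \frac{x}{3}\,(3x)^n = \frac{x}{3}\cdot\frac{1}{1 - 3x} = \frac{x}{3 - 9x},$$

provided $|3x| < 1$. The interval of convergence is thus $x \in \left(-\tfrac13, \tfrac13\right)$.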

Testing convergence. We usually cannot find a neat formula for the sum of an infinite series $\sum_{i=1}^{\infty} a_i = \lim_{n\to\infty} \sum_{i=1}^{n} a_i$, but we still wish to know whether the series converges to some finite value. The most obvious way to analyze this is:

Nth Term Non-vanishing Test: If $\lim_{n\to\infty} a_n \ne 0$, then the series $\sum_{i=1}^{\infty} a_i$ diverges. If $\lim_{n\to\infty} a_n = 0$, then the series might converge or diverge.

examples: Use the Non-vanishing Test to detect divergence.

• $\sum_{n=0}^{\infty} 3^{n-1}x^{n+1}$ from the previous example. For the Non-vanishing Test, we want to know if the limit of the terms vanishes, $\lim_{n\to\infty} 3^{n-1}x^{n+1} \overset{?}{=} 0$. Here $3^{n-1}x^{n+1} = (3x)^{n-1}x^2$, and $x$ is fixed while $n$ gets bigger. We see the limit is non-zero provided $|3x| \ge 1$, or $|x| \ge \tfrac13$: in this case the series diverges. For $|x| < \tfrac13$, the terms do approach zero, but in this case the Non-vanishing Test cannot determine convergence or divergence (though we know from our analysis of geometric series that it really does converge).

• $\sum_{n=1}^{\infty} \tfrac1n = 1 + \tfrac12 + \tfrac13 + \tfrac14 + \cdots$. Here $\lim_{n\to\infty} a_n = \lim_{n\to\infty} \tfrac1n = 0$, so the Non-vanishing Test cannot determine whether the series converges. (We will see in §11.3 that it diverges by the Integral Test.)

Math 133 Integral Test Stewart §11.3

Series and integrals. Our goal for infinite series is to express complicated quantities as infinite series of simple terms, so that finite partial sums approximate the original quantity as accurately as we like. We will not have significant tools to achieve this until §11.9.

For now (§11.3–7), we concentrate on a more elementary question: when does a given series converge to some finite value? For example, we have seen the Nth Term Non-vanishing Test: the series must diverge if the terms do not approach zero, $\lim_{n\to\infty} a_n \ne 0$. A more subtle and powerful convergence test comes from comparing the sum of a series to the area under a curve $y = f(x)$ passing through each point $(n, a_n)$.

Integral Test: Suppose the function $f(x)$ is continuous, positive, and decreasing on the interval $x \in [1, \infty)$, and that $a_n = f(n)$. We compare the improper integral $\int_1^{\infty} f(x)\,dx$ with the infinite series $\sum_{n=1}^{\infty} a_n$.

• If $\int_1^{\infty} f(x)\,dx$ diverges, then $\sum_{n=1}^{\infty} a_n$ also diverges.

• If $\int_1^{\infty} f(x)\,dx$ converges, then $\sum_{n=1}^{\infty} a_n$ also converges.

Divergent case. Consider $\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} \tfrac1n$. Then the function $f(x) = \tfrac1x$ has $a_n = f(n)$, and for $x \in [1, \infty)$, this function is:

• continuous, since its only vertical asymptote is $x = 0$, outside $x \in [1, \infty)$;

• positive, since $x \ge 1$ implies $\tfrac1x > 0$;

• decreasing, since its derivative is negative,∗ $f'(x) = -\tfrac{1}{x^2} < 0$.

Thus, we can apply the Integral Test to compare the infinite series with the improper integral $\int_1^{\infty} f(x)\,dx$. We compute:

$$\int_1^{\infty} \frac1x\,dx = \lim_{n\to\infty} \int_1^{n} \frac1x\,dx = \lim_{n\to\infty} \ln(x)\,\Big|_{x=1}^{x=n} = \lim_{n\to\infty} \big(\ln(n) - \ln(1)\big) = \infty.$$

Since the integral diverges, the series $\sum_{n=1}^{\infty} a_n$ must also diverge.

The Integral Test is best understood geometrically. The value of the series is the total area under the bar-graph of $\{a_n\}$, where we draw the bar at height $a_n$ above the interval $x \in [n, n+1]$.

∗See §3.3 Derivatives and Graphs, in the Math 132 Lecture Notes.

The integral is the area of the region under $y = f(x)$ and above $x \in [1, \infty)$. But the bar graph completely contains the integral region, so:

$$\sum_{n=1}^{\infty} a_n \ge \int_1^{\infty} f(x)\,dx.$$

Since the integral diverges to infinity, getting larger and larger with no bound as we add area on the right, the series must also diverge as we add more terms.

Convergent case. Now consider $\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} \tfrac{1}{n^2}$, which has $a_n = f(n)$ for $f(x) = \tfrac{1}{x^2}$, which is again a continuous, positive, decreasing function for $x \in [1, \infty)$. Thus we can apply the Integral Test to compare the infinite series with the improper integral $\int_1^{\infty} f(x)\,dx$:

$$\int_1^{\infty} \frac{1}{x^2}\,dx = \lim_{n\to\infty} \int_1^{n} \frac{1}{x^2}\,dx = \lim_{n\to\infty} -\frac1x\,\Big|_{x=1}^{x=n} = \lim_{n\to\infty} \left(-\frac1n - \left(-\frac11\right)\right) = 1.$$

Since the integral converges, the series $\sum_{n=1}^{\infty} a_n$ must also converge.

Again, we can understand this geometrically. The value of the series is the total area under the bar-graph of $\{a_n\}$, but this time we draw the bar at height $a_n$ above the interval $x \in [n-1, n]$, shifted left from our previous method. Notice that the heights of the bars are $a_n = \tfrac{1}{n^2}$, much lower than the previous example with $a_n = \tfrac1n$, so this area has a better chance of converging to a finite value.

Clearly, the part of the bar-graph after $a_2$ is contained in the integral region under $y = f(x)$ and above $x \in [1, \infty)$. Thus:

$$\sum_{n=2}^{\infty} a_n \le \int_1^{\infty} f(x)\,dx, \qquad \sum_{n=1}^{\infty} a_n = a_1 + \sum_{n=2}^{\infty} a_n \le a_1 + \int_1^{\infty} f(x)\,dx = 2.$$

We conclude not only that $\sum_{n=1}^{\infty} \tfrac{1}{n^2}$ is finite, but that it is at most 2.†
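Both bar-graph comparisons can be observed numerically. For the harmonic series, bars on $[n, n+1]$ cover the region under $y = \tfrac1x$, so the partial sum $s_N$ should be at least $\int_1^{N+1} \tfrac1x\,dx = \ln(N+1)$; for $\sum \tfrac{1}{n^2}$, every partial sum should stay below the ceiling 2. The sketch below checks this on a few sample values of $N$ (the function names and sample sizes are illustrative assumptions).

```python
import math

def harmonic_partial(N):
    """s_N = 1 + 1/2 + ... + 1/N."""
    return sum(1 / n for n in range(1, N + 1))

def inverse_square_partial(N):
    """s_N = 1 + 1/4 + ... + 1/N^2."""
    return sum(1 / n ** 2 for n in range(1, N + 1))

for N in (10, 1000, 100_000):
    # Divergent case: partial sums dominate the integral from 1 to N+1.
    print(N, harmonic_partial(N), math.log(N + 1))

for N in (10, 1000, 100_000):
    # Convergent case: partial sums level off under the ceiling 2.
    print(N, inverse_square_partial(N))
```

The harmonic sums grow without bound but only logarithmically, which is why a calculator experiment alone can misleadingly suggest convergence.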

Examples.

• Standard p-series. The above reasoning is easily generalized to show:

$$\sum_{n=1}^{\infty} \frac{1}{n^p} \text{ converges if } p > 1, \text{ diverges if } p \le 1.$$

• Determine the convergence of:

$$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} \frac{1}{(2n-5)(2n-7)}.$$

In this case, the function $f(x) = \frac{1}{(2x-5)(2x-7)}$ has vertical asymptotes at $x = \tfrac52, \tfrac72$: it is not continuous, or positive, or decreasing on $x \in [1, \infty)$, so the Integral Test does not immediately apply.

However, to the right of the asymptotes, for $x \in [4, \infty)$, the function is continuous; it is positive, since $2x - 5 > 0$ and $2x - 7 > 0$ for $x \ge 4$; and it is decreasing, since the derivative $f'(x) = -\frac{8(x-3)}{(5-2x)^2(7-2x)^2} < 0$ for $x \ge 4$.

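Further, we have:

$$\int_4^{\infty} \frac{1}{(2x-5)(2x-7)}\,dx = \int_4^{\infty} \left(-\frac{1/2}{2x-5} + \frac{1/2}{2x-7}\right) dx = \lim_{n\to\infty} \tfrac14 \ln\left(\frac{2x-7}{2x-5}\right)\Big|_{x=4}^{x=n} = 0 - \tfrac14\ln\left(\tfrac13\right) = \tfrac14\ln(3) < \infty,$$

since $\lim_{n\to\infty} \ln\left(\frac{2n-7}{2n-5}\right) = \ln(1) = 0$.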

Slightly generalizing the reasoning of the convergent case of the Integral Test above, we find that $\sum_{n=5}^{\infty} a_n \le \int_4^{\infty} f(x)\,dx$. Thus, we have:

$$\sum_{n=1}^{\infty} a_n = a_1 + a_2 + a_3 + a_4 + \sum_{n=5}^{\infty} a_n \le a_1 + a_2 + a_3 + a_4 + \int_4^{\infty} f(x)\,dx < \infty.$$

†In fact, Euler computed this limit as $\frac{\pi^2}{6} \approx 1.64$. Look up the Basel Problem.
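For the series $\sum \frac{1}{(2n-5)(2n-7)}$ one can go a step beyond the Integral Test, because the partial fractions above make the sum telescope: $a_n = \tfrac12\left(\frac{1}{2n-7} - \frac{1}{2n-5}\right)$, so $s_N = \tfrac12\left(\frac{1}{-5} - \frac{1}{2N-5}\right)$. This observation is an addition to the notes, offered as a check rather than as the intended method. The sketch below verifies it with exact fractions and confirms that the tail $\sum_{n \ge 5} a_n$ stays below the integral bound $\tfrac14\ln 3$.

```python
import math
from fractions import Fraction

def a(n):
    """Term a_n = 1 / ((2n - 5)(2n - 7)); the denominator is never zero for integer n."""
    return Fraction(1, (2 * n - 5) * (2 * n - 7))

def partial_sum(N):
    """s_N = a_1 + ... + a_N, added term by term."""
    return sum(a(n) for n in range(1, N + 1))

def telescoped(N):
    """Closed form of s_N obtained from the partial-fraction decomposition."""
    return Fraction(1, 2) * (Fraction(1, -5) - Fraction(1, 2 * N - 5))

print(all(partial_sum(N) == telescoped(N) for N in range(1, 60)))

tail = float(partial_sum(2000) - partial_sum(4))   # approximates sum over n >= 5
print(tail, 0.25 * math.log(3))                    # tail is below the integral bound
print(float(partial_sum(2000)))                    # partial sums approach -1/10
```

The first four terms include a negative one ($a_3 = -1$), which is why the full sum is negative even though the tail is positive.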

Math 133 Comparison Tests Stewart §11.4

Convergence and divergence. We continue to discuss convergence tests: ways to tell if a given series $\sum_{n=1}^{\infty} a_n = \lim_{N\to\infty} \sum_{n=1}^{N} a_n$ converges (to a finite value), or diverges (to infinity or by oscillating).∗ So far, we know convergence for two kinds of standard series:

• Geometric series: $\sum_{n=1}^{\infty} cr^{n-1}$ converges to $\frac{c}{1-r}$ if $|r| < 1$, diverges if $|r| \ge 1$.

• Standard p-series: $\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges if $p > 1$, and diverges if $p \le 1$.

In this section, we test convergence of a complicated series $\sum a_n$ by comparing it to a simpler one (such as the above): a convergent ceiling $\sum c_n$, or a divergent floor $\sum d_n$.

Direct Comparison Test: Let $M$ be a positive integer starting point.

• If $0 \le a_n \le c_n$ for $n \ge M$, and $\sum_{n=1}^{\infty} c_n$ converges, then $\sum_{n=1}^{\infty} a_n$ converges.

• If $a_n \ge d_n \ge 0$ for $n \ge M$, and $\sum_{n=1}^{\infty} d_n$ diverges, then $\sum_{n=1}^{\infty} a_n$ diverges.

These results are clear, since the series $\sum_{n=1}^{\infty} a_n$ is term-by-term smaller or larger than its comparison series, except possibly the first $M-1$ terms.†

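Example: Determine convergence of: $\sum_{n=1}^{\infty} \dfrac{n-1}{n^2\sqrt{n} + 1}$. We have:

$$a_n = \frac{n-1}{n^2\sqrt{n} + 1} \le c_n = \frac{n}{n^2\sqrt{n}} = \frac{1}{n^{3/2}} \quad \text{for } n \ge 1,$$

since on the left the numerator is smaller and the denominator is larger than on the right. The comparison series $\sum_{n=1}^{\infty} c_n = \sum_{n=1}^{\infty} \frac{1}{n^{3/2}}$ is a standard p-series which converges, so $\sum_{n=1}^{\infty} a_n$ also converges.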

Example: Determine the convergence of: $\sum_{n=1}^{\infty} \dfrac{2^{3n+\sin(n)}}{3^n + 4n^2}$.

As a rough guess, we ignore the lower-order terms in numerator and denominator to compare with $\frac{2^{3n}}{3^n} = \left(\frac83\right)^n$, which makes a divergent geometric series, so our series $\sum a_n$ should also diverge. However, it is not clear that $a_n$ is really larger than this comparison series, so we cannot use $d_n = \left(\frac83\right)^n$ as a divergent floor for $a_n$ in the second part of the Comparison Test.

We want to produce a fractional $d_n$ from our $a_n$ by making the numerator smaller and the denominator larger. To bound the numerator: $2^{3n+\sin(n)} = 2^{3n}\,2^{\sin(n)} \ge 2^{3n}\,2^{-1}$. To bound the denominator, we take an exponential function with a slightly larger base: we can check that $4^n \ge 3^n + 4n^2$ for all $n \ge 3$. Thus:

$$a_n = \frac{2^{3n+\sin(n)}}{3^n + 4n^2} \ge d_n = \frac{2^{3n}\,2^{-1}}{4^n} = \tfrac12\,2^n \quad \text{for } n \ge 3.$$

∗A general divergent series might oscillate up and down forever, but a positive series (with $a_n \ge 0$) either levels off to a finite value, or diverges to infinity.

†Here we use the completeness axiom of real analysis, which states that if a series of partial sums has an upper bound, $s_N = \sum_{n=1}^{N} a_n < B$ for all $N$, then the least upper bound $L = \lim_{N\to\infty} s_N$ exists.

Note that we only need the inequality for all large $n$: the first couple of terms $a_1, a_2$ make no difference to the convergence or divergence. Since $\sum_{n=1}^{\infty} d_n = \sum_{n=1}^{\infty} \tfrac12\,2^n$ is a divergent geometric series, the original $\sum_{n=1}^{\infty} a_n$ also diverges.
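The phrase "we can check" invites a quick computation. The sketch below tests the denominator bound $4^n \ge 3^n + 4n^2$ and the resulting floor $a_n \ge d_n = \tfrac12\,2^n$ over a finite range of $n$; a finite check is of course not a proof, only evidence that the starting point $n = 3$ is right. The function names and the tested ranges are illustrative assumptions.

```python
import math

def a(n):
    """Term a_n = 2^(3n + sin n) / (3^n + 4 n^2)."""
    return 2 ** (3 * n + math.sin(n)) / (3 ** n + 4 * n ** 2)

def d(n):
    """Divergent floor d_n = (1/2) 2^n."""
    return 0.5 * 2 ** n

def denominator_bound_holds(n):
    """Exact integer check of 4^n >= 3^n + 4 n^2."""
    return 4 ** n >= 3 ** n + 4 * n ** 2

print([n for n in range(1, 10) if not denominator_bound_holds(n)])  # where the bound fails
print(all(denominator_bound_holds(n) for n in range(3, 200)))
print(all(a(n) >= d(n) for n in range(3, 60)))
```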

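Example: Determine convergence of: $\sum_{n=1}^{\infty} \dfrac{n+1}{n^3 - 20}$.

Again, we estimate this sequence by its leading terms: $\sum_{n=1}^{\infty} \frac{n}{n^3} = \sum_{n=1}^{\infty} \frac{1}{n^2}$, which is a convergent standard p-series. However, $a_n = \frac{n+1}{n^3-20} > \frac{n}{n^3}$ (for $n \ge 3$), so we cannot use $c_n = \frac{n}{n^3}$ as a convergent ceiling for $a_n$ in the first part of the Test.

However, we should have:

$$a_n = \frac{n+1}{n^3-20} \le c_n = 2\,\frac{n}{n^3} \quad \text{for } n \text{ large enough}.$$

How large does $n$ need to be to make this inequality valid? Let us check:

$$\frac{n+1}{n^3-20} \le \frac{2}{n^2} \impliedby 0 < n^2(n+1) \le 2(n^3-20) \iff 40 \le n^2(n-1) \impliedby n \ge 4.$$

Thus, we have:

$$a_n = \frac{n+1}{n^3-20} \le c_n = \frac{2}{n^2} \quad \text{for } n \ge 4,$$

where $\sum_{n=1}^{\infty} \frac{2}{n^2} = 2\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges, so the original $\sum_{n=1}^{\infty} a_n$ also converges.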

example: Consider any infinite decimal:

$$s = 0.d_1d_2d_3\cdots = \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{d_3}{10^3} + \cdots = \sum_{n=1}^{\infty} \frac{d_n}{10^n},$$

where $0 \le d_n \le 9$ are any decimal digits. Does this series always converge, so that the infinite decimal represents a real number, or could a bad choice of digits define a meaningless decimal?

In fact, we can compare $0 \le \frac{d_n}{10^n} \le \frac{9}{10^n}$, since each digit is at most 9. The ceiling is a convergent geometric series: $\sum_{n=1}^{\infty} \frac{9}{10^n} = \sum_{n=1}^{\infty} \frac{9}{10}\left(\frac{1}{10}\right)^{n-1} = \frac{9}{10}\cdot\frac{1}{1 - \frac{1}{10}} = 1$, so the original decimal series also converges. Any infinite decimal represents a number.

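Limit Comparison Test. Suppose $a_n, b_n > 0$ and $\lim_{n\to\infty} \frac{a_n}{b_n} = L$ with $0 < L < \infty$.

• If $\sum_{n=1}^{\infty} b_n$ converges, then $\sum_{n=1}^{\infty} a_n$ converges.

• If $\sum_{n=1}^{\infty} b_n$ diverges, then $\sum_{n=1}^{\infty} a_n$ diverges.

Proof: $\lim_{n\to\infty} \frac{a_n}{b_n} = L$ means that, for any small $\varepsilon > 0$, we can take a starting point $N$ so that for all $n \ge N$, we have:

$$L - \varepsilon \le \frac{a_n}{b_n} \le L + \varepsilon \quad \text{and} \quad (L-\varepsilon)\,b_n \le a_n \le (L+\varepsilon)\,b_n.$$

Taking $\varepsilon$ small enough that $L \pm \varepsilon > 0$, we can prove convergence or divergence by taking $c_n = (L+\varepsilon)\,b_n$ or $d_n = (L-\varepsilon)\,b_n$ in the Direct Comparison Test.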

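Example: We redo $\sum_{n=1}^{\infty} \dfrac{n+1}{n^3-20}$. Now we can immediately compare with $b_n = \dfrac{n}{n^3}$:

$$\frac{a_n}{b_n} = \frac{n+1}{n^3-20} \Big/ \frac{n}{n^3} = \frac{n+1}{n} \Big/ \frac{n^3-20}{n^3} = \frac{1 + \frac1n}{1 - \frac{20}{n^3}}.$$

Taking $n \to \infty$ gives $L = 1$. Since this satisfies $0 < L < \infty$, and $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} \frac{1}{n^2}$ is a convergent standard p-series, the original series $\sum_{n=1}^{\infty} a_n$ also converges.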

Extended Limit Comparison Test. In the case where $\lim_{n\to\infty} \frac{a_n}{b_n} = L = 0$, we have $a_n$ much smaller than $b_n$, so if $\sum_{n=1}^{\infty} b_n$ converges, then so does $\sum_{n=1}^{\infty} a_n$. Similarly, in the case where $\lim_{n\to\infty} \frac{a_n}{b_n} = L = \infty$, we have $a_n$ much larger than $b_n$, so if $\sum_{n=1}^{\infty} b_n$ diverges, then so does $\sum_{n=1}^{\infty} a_n$.

example: Determine the convergence of: $\sum_{n=1}^{\infty} \dfrac{n^2}{2^n}$.

Since $n^2$ is negligible compared to the exponential growth of $2^n$, we could roughly estimate this by $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} \frac{1}{2^n} = \sum_{n=1}^{\infty} \left(\frac12\right)^n$, a convergent geometric series, so the original series should converge.

However, taking the Limit Comparison Test with this $b_n = \frac{1}{2^n}$ gives $L = \infty$, since $a_n = \frac{n^2}{2^n}$ is much larger than $b_n$. Thus this comparison fails: $b_n$ is a convergent floor for $a_n$, and we can’t tell whether $\sum a_n$ converges or diverges.

Let us instead take a slightly larger, but still convergent, comparison: $b_n = \left(\frac34\right)^n$:

$$\lim_{n\to\infty} \frac{a_n}{b_n} = \lim_{n\to\infty} \frac{n^2\left(\frac12\right)^n}{\left(\frac34\right)^n} = \lim_{n\to\infty} n^2\left(\frac23\right)^n = 0,$$

as we could prove by L’Hopital’s Rule. Thus $a_n = \frac{n^2}{2^n}$ becomes much smaller than $b_n$, and $\sum_{n=1}^{\infty} b_n = \sum_{n=1}^{\infty} \left(\frac34\right)^n$ is a convergent ceiling for $\sum_{n=1}^{\infty} a_n$, which therefore must also converge.

Math 133 Ratio Test Stewart §11.6/I

We have one more important test for convergence of an infinite series $\sum_{n=1}^{\infty} a_n$. This test does not require us to choose a comparison series: instead, we test the ratio of each term $a_n$ compared to the next term $a_{n+1}$.

Ratio Test: Suppose $\lim_{n\to\infty} \left|\dfrac{a_{n+1}}{a_n}\right| = L$.

• If $L < 1$, then $\sum_{n=1}^{\infty} a_n$ converges.

• If $L > 1$, then $\sum_{n=1}^{\infty} a_n$ diverges.

• If $L = 1$, then this test fails to determine convergence.

Proof: Assuming $a_n > 0$, the limit $\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = L$ means that, for any small number $\varepsilon > 0$, we can take a starting point $N$ so that for all $n \ge N$, we have:

$$L - \varepsilon \le \frac{a_{n+1}}{a_n} \le L + \varepsilon, \qquad a_n(L-\varepsilon) \le a_{n+1} \le a_n(L+\varepsilon).$$

Iterating this inequality gives: $c_1(L-\varepsilon)^n \le a_n \le c_2(L+\varepsilon)^n$ for some constants $c_1, c_2$.∗

If $L < 1$, we take $\varepsilon$ small enough that $L + \varepsilon < 1$, and we compare $\sum a_n$ to the convergent ceiling series $\sum c_2(L+\varepsilon)^n$. If $L > 1$, we take $\varepsilon$ small enough that $L - \varepsilon > 1$, and we compare $\sum a_n$ to the divergent floor series $\sum c_1(L-\varepsilon)^n$. If $L = 1$, adding any $\varepsilon$ produces a divergent ceiling, and subtracting any $\varepsilon$ produces a convergent floor, neither of which would constrain the original series. Finally, for the general case where the $a_n$’s may be positive or negative, the above argument shows $\sum |a_n|$ converges, which implies $\sum a_n$ converges by §11.6 Part II. Q.E.D.

The Ratio Test is most useful when $a_n$ is a product of a growing number of factors, which will mostly cancel out in $\frac{a_{n+1}}{a_n}$.

example: Determine the convergence of $\sum_{n=1}^{\infty} \dfrac{n^2}{2^n}$.

We did this one in §11.4 by finding a tricky comparison series. The Ratio Test naturally applies here, because $a_n = \frac{n^2}{2^n} = (n)(n)(\tfrac12)\cdots(\tfrac12)$ has more and more factors as $n$ gets larger. We have:

$$L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \frac{(n+1)^2}{2^{n+1}} \Big/ \frac{n^2}{2^n} = \lim_{n\to\infty} \frac{(n+1)^2}{n^2} \Big/ \frac{2^{n+1}}{2^n} = \frac12.$$

Since $L = \tfrac12 < 1$, the Test shows $\sum a_n$ converges.
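The ratio limit can be watched numerically, along with the partial sums it controls. The sketch below tabulates $a_{n+1}/a_n$ for $a_n = \frac{n^2}{2^n}$ and adds up the first 200 terms. The value 6 used for comparison is the known closed form of this particular sum (from $\sum n^2x^n = \frac{x(1+x)}{(1-x)^3}$ at $x = \tfrac12$), which is not derived in these notes; the function names are illustrative.

```python
def a(n):
    """Term a_n = n^2 / 2^n."""
    return n ** 2 / 2 ** n

def ratio(n):
    """Consecutive ratio a_(n+1) / a_n, which equals (n+1)^2 / (2 n^2)."""
    return a(n + 1) / a(n)

for n in (1, 10, 100, 1000):
    print(n, ratio(n))          # approaches L = 1/2

partial = sum(a(n) for n in range(1, 201))
print(partial)                  # the partial sums level off near 6
```

The early ratios are larger than 1 (for example $a_2/a_1 = 2$), which is harmless: the Ratio Test only concerns the limiting ratio.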

example: Determine the convergence of $\sum_{n=1}^{\infty} \dfrac{(-1)^n x^{2n}}{n!}$, where $x$ is a given number and we use the factorial notation $n! = (n)(n-1)(n-2)\cdots(2)(1)$. Again, the terms have a large number of factors, so we use the Ratio Test:

$$L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \frac{x^{2(n+1)}}{(n+1)!} \Big/ \frac{x^{2n}}{n!} = \lim_{n\to\infty} \frac{x^2}{n+1} = 0.$$

Since $L = 0 < 1$, the Test shows $\sum a_n$ converges.

∗Specifically: $a_n \le a_{n-1}(L+\varepsilon) \le a_{n-2}(L+\varepsilon)^2 \le \cdots \le a_N(L+\varepsilon)^{n-N} = \dfrac{a_N}{(L+\varepsilon)^N}\,(L+\varepsilon)^n$.

Math 133 Absolute Convergence Stewart §11.6/II

Series with positive terms. So far, we have mostly considered positive series $\sum_{n=1}^{\infty} a_n$ with $a_n \ge 0$, whose partial sums $s_N = \sum_{n=1}^{N} a_n = a_1 + a_2 + \cdots + a_N$ can only increase as we add more positive terms. As $N \to \infty$, these can behave in one of two ways:

• Convergence: partial sums level off beneath a ceiling value:∗ $\lim_{N\to\infty} s_N = \sum_{n=1}^{\infty} a_n = L$.

• Divergence to infinity: partial sums increase without bound: $\lim_{N\to\infty} s_N = \sum_{n=1}^{\infty} a_n = \infty$.

We can picture the sequence $\{s_n\}_{n=1}^{\infty}$ as a line graph connecting the points $(n, s_n)$:

Series with positive and negative terms. In the more general case where $a_n$ can be positive or negative, the partial sums can oscillate up and down depending on the sign of each term added.

• Convergence (oscillating): partial sums wiggle above and below the horizontal asymptote which is their limiting value: $\lim_{N\to\infty} s_N = \sum_{n=1}^{\infty} a_n = L$.

• Divergence to infinity (oscillating): partial sums have more ups than downs, making an overall increase without bound: $\lim_{N\to\infty} s_N = \sum_{n=1}^{\infty} a_n = \infty$; or more downs than ups, so the limit is $-\infty$.

• Divergence (indecisive oscillation): partial sums do not consistently go up or down or approach a horizontal asymptote, so $\lim_{N\to\infty} s_N = \sum_{n=1}^{\infty} a_n$ does not exist at all.

∗In fact if the increasing partial sums have an upper bound, $s_n \le B$ for all n, then the completeness axiom of real analysis states that the least upper bound $\lim_{n\to\infty} s_n$ exists.

An example of indecisive oscillation is $a_n = (-1)^{n-1}$, for which:

$$s_n = 1 - 1 + 1 - 1 + \cdots \pm 1 = \begin{cases} 1 & \text{for } n \text{ odd} \\ 0 & \text{for } n \text{ even.} \end{cases}$$

Absolute convergence. We say that a series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent whenever the series of absolute values is convergent: $\sum_{n=1}^{\infty} |a_n| = M$. A series is conditionally convergent if it is convergent, $\sum_{n=1}^{\infty} a_n = L$, but $\sum_{n=1}^{\infty} |a_n| = \infty$.

In terms of the graph of $s_N = \sum_{n=1}^{N} a_n$, absolute convergence means the total length of ups and downs is a finite number M. Equivalently, if we change all down steps $a_n < 0$ to up steps $|a_n| > 0$, we obtain the graph of a convergent positive series $t_N = \sum_{n=1}^{N} |a_n|$ converging to the ceiling M:

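Absolute Convergence Theorem: If a series is absolutely convergent with $\sum_{n=1}^{\infty} |a_n| = M$, then it is convergent with $\sum_{n=1}^{\infty} a_n = L$.

Proof: Let $b_n = |a_n|$, and $p(n) = \pm 1$ be the sign of $a_n$, so that $a_n = p(n)\, b_n$. By hypothesis, $\sum |a_n| = \sum b_n$ is convergent, hence so are the sums of only the positive $a_n$ and only the negative $a_n$ (each is a positive series with partial sums bounded above by M):

$$\sum_{\substack{n=1 \\ p(n)=+1}}^{\infty} b_n = L_1 \qquad\text{and}\qquad \sum_{\substack{n=1 \\ p(n)=-1}}^{\infty} b_n = L_2 .$$

Now:

$$\sum_{n=1}^{\infty} a_n = \lim_{N\to\infty} \sum_{n=1}^{N} a_n \overset{(*)}{=} \lim_{N\to\infty} \Bigg( \sum_{\substack{n=1 \\ p(n)=+1}}^{N} b_n - \sum_{\substack{n=1 \\ p(n)=-1}}^{N} b_n \Bigg) \overset{(**)}{=} \lim_{N\to\infty} \sum_{\substack{n=1 \\ p(n)=+1}}^{N} b_n - \lim_{N\to\infty} \sum_{\substack{n=1 \\ p(n)=-1}}^{N} b_n = L_1 - L_2 .$$

Here the equality (∗) follows from rearranging a finite sum of terms, and (∗∗) follows from the Limit Sum Law from Calculus I §1.6.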

Series with alternating signs. We say that a series is alternating when successive terms $a_n$ are of opposite sign; i.e. $a_n = (-1)^n b_n$ or $a_n = (-1)^{n-1} b_n$ with $b_n \ge 0$.

Alternating Series Test: If $\sum a_n$ is an alternating series with $b_n = |a_n|$ decreasing, meaning $b_n \ge b_{n+1}$ for all n, and $\lim_{n\to\infty} b_n = 0$, then $\sum_{n=1}^{\infty} a_n$ converges to some L. Also, the error of a partial sum is bounded by the next term:

$$\left| L - \sum_{n=1}^{N} a_n \right| \le |a_{N+1}| .$$

Proof: Assuming $a_n = (-1)^{n-1} b_n$ where $b_1 \ge b_2 \ge b_3 \ge \cdots \ge 0$, and setting $s_N = \sum_{n=1}^{N} a_n = b_1 - b_2 + b_3 - b_4 + \cdots \pm b_N$, we see that:

$$s_3 = b_1 - b_2 + b_3 = b_1 - (b_2 - b_3) \le b_1 = s_1 ,$$

and similarly:

$$s_2 \le s_4 \le s_6 \le \cdots \le s_5 \le s_3 \le s_1 ,$$

so the even values of $s_N$ form an increasing subsequence, and the odd values form a decreasing subsequence. Furthermore, we have $\lim_{n\to\infty} |s_{n+1} - s_n| = \lim_{n\to\infty} b_n = 0$, so the even and odd subsequences become arbitrarily close, zeroing in on a finite limit L. Error estimate: for N even, $s_N \le L \le s_{N+1} = s_N + b_{N+1}$; similarly for N odd. Q.E.D.
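To see the error bound in action, the following Python sketch checks it on the alternating harmonic series $1 - \tfrac12 + \tfrac13 - \cdots$. It assumes the value $L = \ln 2$ for the sum, a fact these notes only quote in a footnote; the function name `partial_sum` is ours.

```python
import math

def partial_sum(N):
    """s_N = 1 - 1/2 + 1/3 - ... +/- 1/N for the alternating harmonic series."""
    return sum((-1) ** (n - 1) / n for n in range(1, N + 1))

L = math.log(2)  # assumed limit of the series (quoted, not proved, in the notes)
for N in (10, 100, 1000):
    error = abs(L - partial_sum(N))
    bound = 1 / (N + 1)          # |a_{N+1}| = b_{N+1}
    print(N, error, bound, error <= bound)
```

In each row the actual error stays below the next term, as the Test predicts.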

Absolutely convergent series have several nice properties which conditionally convergent series lack. For example, if we rearrange the order of terms in an absolutely convergent series, the limit does not change, but this is not true for a conditionally convergent series.

example: Consider $\sum_{n=1}^{\infty} (-1)^{n-1} \frac1n = 1 - \frac12 + \frac13 - \frac14 + \cdots$, which is convergent by the Alternating Series Test.† We easily see that the series of positive terms $\sum_{n=1}^{\infty} \frac{1}{2n-1} = \infty$ and the series of negative terms $\sum_{n=1}^{\infty} \left(-\frac{1}{2n}\right) = -\infty$ are both divergent, so the conditionally convergent sum of the alternating series involves competing infinities. If we rearrange to give the positive terms a head start, so that a large number of positive terms outrun each negative term, then the positive infinity will win. In a sum like:

$$1 + \tfrac13 + \tfrac15 - \tfrac12 + \tfrac17 + \tfrac19 + \cdots + \tfrac1{21} - \tfrac14 + \tfrac1{23} + \tfrac1{25} + \cdots + \tfrac1{101} - \tfrac16 + \cdots ,$$

all the terms $a_n = (-1)^{n-1}\frac1n$ eventually appear, but the partial sums tend to ∞, not to the finite value of the original alternating series.
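The effect of such a rearrangement can be watched numerically. The sketch below uses a hypothetical grouping rule of our own (block k takes $2^k$ unused positive terms, then one unused negative term), which is not the exact grouping printed above but follows the same idea of giving the positive terms a head start.

```python
def rearranged_partial_sum(blocks):
    """Sum a rearrangement of 1 - 1/2 + 1/3 - ...: block k uses 2**k new
    positive terms 1/(2m-1), followed by one new negative term -1/(2k)."""
    total = 0.0
    m = 1                         # index of the next unused positive term
    for k in range(1, blocks + 1):
        for _ in range(2 ** k):
            total += 1.0 / (2 * m - 1)
            m += 1
        total -= 1.0 / (2 * k)    # the k-th negative term
    return total

# Every term of the original series is eventually used exactly once,
# yet the partial sums keep climbing instead of settling down.
for blocks in (5, 10, 15):
    print(blocks, rearranged_partial_sum(blocks))
```

The printed values grow steadily with the number of blocks, in contrast to the original ordering, whose partial sums stay between 1/2 and 1.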

†In fact, we will see later that $1 - \frac12 + \frac13 - \frac14 + \cdots = \ln(2)$.

Math 133 Method for Convergence Testing Stewart §11.7

For a series $\sum_{n=1}^{\infty} a_n = a_1 + a_2 + a_3 + \cdots$, determine if it converges toward a limit as we add more terms, or diverges (often to ∞).

0. If $\lim_{n\to\infty} a_n \ne 0$, then the series diverges by the n-th Term Test (Vanishing Test).

1. Try to manipulate the series into a Standard Series:

• Geometric series: $\sum_{n=1}^{\infty} c\,r^{n-1} = c + cr + cr^2 + cr^3 + \cdots = \frac{c}{1-r}$ for $|r| < 1$, and diverges for $|r| \ge 1$.

• Standard p-series: $\sum_{n=1}^{\infty} \frac{1}{n^p} = 1 + \frac{1}{2^p} + \frac{1}{3^p} + \cdots$ converges for $p > 1$ and diverges for $p \le 1$.

2. Estimate the fraction $a_n$ by taking only the largest terms in the numerator and denominator, obtaining a simple $b_n$ which is often a standard series. Convergence of $\sum a_n$ is likely to be the same as convergence of $\sum b_n$. Justify with a Test:

• Direct Comparison Test (positive $a_n$)

◦ Ceiling: $0 \le a_n \le c_n$ where $\sum c_n$ converges $\Longrightarrow$ $\sum a_n$ also converges.

◦ Floor: $0 \le d_n \le a_n$ where $\sum d_n$ diverges $\Longrightarrow$ $\sum a_n$ also diverges.

The ceiling $c_n$ or floor $d_n$ will usually be closely related to the estimate $b_n$.

• Limit Comparison Test (positive $a_n$): Determine $L = \lim_{n\to\infty} a_n / b_n$.

◦ $L < \infty$ and $\sum b_n$ converges $\Longrightarrow$ $\sum a_n$ also converges [$a_n < (L+\varepsilon)\, b_n$].∗

◦ $L > 0$ and $\sum b_n$ diverges $\Longrightarrow$ $\sum a_n$ also diverges [$a_n > (L-\varepsilon)\, b_n$].

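3. Try the Integral Test if $a_n$ is positive and fairly simple, but not comparable to a standard series: e.g. $\frac{1}{n \ln(n)}$. For positive, decreasing, continuous $f(x)$ with $a_n = f(n)$, compute the improper integral $\int_1^{\infty} f(x)\,dx = \lim_{N\to\infty} \int_1^N f(x)\,dx = \lim_{N\to\infty} F(N) - F(1)$.

◦ $\int_1^{\infty} f(x)\,dx$ converges $\Longrightarrow$ $\sum a_n$ also converges [$\sum_{n=1}^{\infty} a_n \le a_1 + \int_1^{\infty} f(x)\,dx$].

◦ $\int_1^{\infty} f(x)\,dx$ diverges $\Longrightarrow$ $\sum a_n$ also diverges [$\sum_{n=1}^{\infty} a_n \ge \int_1^{\infty} f(x)\,dx$].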

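4. Try the Ratio Test if $a_n$ has a growing number of factors, for example if it contains $r^n$ or $n!$. Determine $\lim_{n\to\infty} |a_{n+1}/a_n| = L$.

◦ $L < 1$ $\Longrightarrow$ $\sum a_n$ converges [$|a_n| \le c(L+\varepsilon)^n$].

◦ $L > 1$ $\Longrightarrow$ $\sum a_n$ diverges [$|a_n| \ge c(L-\varepsilon)^n$].

◦ $L = 1$ $\Longrightarrow$ no conclusion.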

5. If $\sum a_n$ has positive and negative terms, try:

• Absolute Convergence: $\sum |a_n|$ converges $\Longrightarrow$ $\sum a_n$ also converges.

• Alternating Series: $a_n = (-1)^n b_n$ with $b_n \ge 0$: $\lim_{n\to\infty} b_n = 0$, $b_n$ decreasing $\Longrightarrow$ $\sum a_n$ converges.

Error estimates: If $s_N = \sum_{n=1}^{N} a_n$ and $L = \sum_{n=1}^{\infty} a_n$, then for $N \ge 1$: $s_{2N-1} \le L \le s_{2N-1} + b_{2N}$ and $s_{2N} - b_{2N+1} \le L \le s_{2N}$.

∗Most later tests are proved by reducing to a Direct Comparison, specified in [brackets].

Math 133 Power Series Stewart §11.8

Series of functions. The main purpose of series is to write an interesting, complicated quantity as an infinite sum of simple quantities, so that finite partial sums approximate the original quantity. For example, it is a fact (explained in §11.9) that:

$$\tfrac14 \pi = 1 - \tfrac13 + \tfrac15 - \tfrac17 + \tfrac19 - \cdots ,$$

and we can approximate π by taking enough terms of this series (and multiplying by 4).∗

So far, we have no tools to find or prove such formulas. Our Tests determine whether a series converges, but they tell us nothing about what it converges to, nor how to write a given interesting quantity as a series.

We have looked only at series of numbers manipulated with basic algebra and limits. In the next few sections, we will learn about series of functions, and use calculus to write interesting, complicated functions as infinite sums of simple functions.

Definition: A power series is a function of x whose output is the sum of an infinite series:

$$f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n = c_0 + c_1(x-a) + c_2(x-a)^2 + c_3(x-a)^3 + \cdots$$

Here the numbers $c_0, c_1, c_2, \ldots$ are called the coefficients, and a is called the center. The domain of the function (the set of acceptable inputs) contains those values of x for which the series converges.

The simplest power series is:

$$f(x) = \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + \cdots .$$

This is a function which takes any particular input $x = r$ to the output $f(r) = \sum_{n=0}^{\infty} r^n = 1 + r + r^2 + r^3 + \cdots$, the sum of the corresponding geometric series. As we know, this converges exactly when $|r| < 1$, so the domain of $f(x)$ is $|x| < 1$, the interval $x \in (-1, 1)$. We also know a simple formula for this sum: $f(x) = \frac{1}{1-x}$, which means we can write this simple rational function as a kind of infinite polynomial function:

$$f(x) = \frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots \quad\text{for } x \in (-1, 1).$$

A general power series might not have a simple formula for the sum, but any standard function can be written as a power series, as we shall see in §11.10.

∗An irrational number like π can never be written exactly as a fraction or a finite decimal, so simple ways to approximate are the best we can hope for.

Taking a finite partial sum like $1 + x + x^2$ gives an approximation to $f(x) = \frac{1}{1-x}$ for each $x \in (-1, 1)$, so their graphs are close to each other above this interval:

The series converges fastest, and the approximation is closest, when $|x|$ is very small; that is, close to the center $x = 0$. The graph of $1 + x + x^2$ is just a parabola, but it is already a good approximation to $y = f(x) = \frac{1}{1-x}$ for $x \in (-\tfrac12, \tfrac12)$, and taking more terms, such as $1 + x + x^2 + x^3 + x^4$, improves the approximation for $x \in (-1, 1)$. The approximation is least accurate close to the vertical asymptote $x = 1$, since polynomial functions are always continuous. The approximations are useless outside the interval of convergence.
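The following Python sketch tabulates how well a few partial sums approximate $\frac{1}{1-x}$ at sample points inside the interval. The function name `geometric_partial_sum` and the chosen sample points are ours, picked only for illustration.

```python
def geometric_partial_sum(x, N):
    """Partial sum 1 + x + x^2 + ... + x^N."""
    return sum(x ** n for n in range(N + 1))

# Near the center x = 0 even N = 2 is close; near x = 1 many terms are needed.
for x in (0.1, 0.5, 0.9):
    exact = 1 / (1 - x)
    for N in (2, 4, 20):
        approx = geometric_partial_sum(x, N)
        print(f"x={x}  N={N}  approx={approx:.5f}  exact={exact:.5f}")
```

The rows for x = 0.9 show the slow convergence close to the vertical asymptote.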

We can modify $f(x)$ to get power series for related functions. For example, if we substitute $1-x$ into the series for $f(x)$, we get a power series with center $x = 1$:

$$\frac1x = \frac{1}{1-(1-x)} = 1 + (1-x) + (1-x)^2 + (1-x)^3 + \cdots = 1 - (x-1) + (x-1)^2 - (x-1)^3 + \cdots .$$

Since the previous series $\sum_{n=0}^{\infty} x^n$ converges when $|x| < 1$, the new series converges when $|1-x| < 1$, the interval $x \in (0, 2)$ centered at $x = 1$. The partial sum approximations to $\frac1x$ are most accurate when $|1-x|$ is very small, close to the center.

Power Series Convergence Theorem. Any power series $f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n$ has one of three types of convergence:

• The series converges for all x.

• The series converges for $|x-a| < R$, the interval $x \in (a-R, a+R)$, and it diverges for $|x-a| > R$, where $R > 0$ is a value called the radius of convergence.†

• The series converges only at the center $x = a$ and diverges otherwise.

Given a power series, we apply one of our Convergence Tests, usually the Ratio Test, to show which values of x make the series converge.

†The convergence at the endpoints $x = a-R,\ a+R$ must be determined separately.

example: Determine the domain of convergence of $\sum_{n=0}^{\infty} \frac{x^n}{n!}$.

Since the factorial on the bottom grows much faster than the exponential function on top, we guess that this always converges. To prove this, the Ratio Test is appropriate because the terms of the series have a growing number of factors, most of which cancel:

$$\frac{a_{n+1}}{a_n} = \frac{x^{n+1}}{(n+1)!} \Big/ \frac{x^n}{n!} = \frac{x^{n+1}}{x^n} \cdot \frac{n!}{(n+1)!} = x \cdot \frac{n\,(n-1)(n-2)\cdots(2)(1)}{(n+1)\,n\,(n-1)(n-2)\cdots(2)(1)} = \frac{x}{n+1} .$$

Taking any particular value of x, even a large value like $x = 1000$, gives:

$$L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \frac{|x|}{n+1} = 0 .$$

Since $L < 1$, the series converges for all x.

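example: Determine the domain of convergence of $\sum_{n=2}^{\infty} \frac{(-1)^n (x-2)^{2n+1}}{n^2 \ln(n)}$.

This is a power series with coefficients $c_n = \frac{(-1)^n}{n^2 \ln(n)}$ and center $a = 2$. (The index must start at $n = 2$, since the denominator $n^2 \ln(n)$ is zero or undefined for $n = 0, 1$.) Ratio Test:

$$L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \left| \frac{(x-2)^{2n+3}}{(n+1)^2 \ln(n+1)} \Big/ \frac{(x-2)^{2n+1}}{n^2 \ln(n)} \right| = |x-2|^2 \cdot \lim_{n\to\infty} \frac{n^2}{(n+1)^2} \cdot \lim_{n\to\infty} \frac{\ln(n)}{\ln(n+1)} = |x-2|^2 \cdot 1 \cdot 1 = |x-2|^2 ,$$

where the last two limits are determined by L'Hopital's Rule. Thus, the series converges when $L = |x-2|^2 < 1$, i.e. when $|x-2| < 1$, and diverges when $|x-2| > 1$. The radius of convergence is $R = 1$, and the open interval of convergence is $x \in (1, 3)$. We do not worry about convergence at the endpoints.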

example: Determine the domain of convergence of $\sum_{n=0}^{\infty} \frac{(2x+4)^{3n}}{3^n}$.

Factoring $2x+4 = 2(x+2)$, we can rewrite this as a power series:

$$\sum_{n=0}^{\infty} \frac{(2x+4)^{3n}}{3^n} = \sum_{n=0}^{\infty} \frac{2^{3n}}{3^n} (x+2)^{3n} = \sum_{n=0}^{\infty} \left(\frac{2^3}{3}\right)^{n} (x+2)^{3n} .$$

Not every power $(x+2)^n$ appears, only multiple-of-3 powers $(x+2)^{3n}$, but this still fits into the definition of power series since we can think of the missing terms as having zero coefficients. The Ratio Test gives $L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \frac83 |x+2|^3$, so the series converges when $\frac83 |x+2|^3 < 1$, i.e. when $|x+2| < \sqrt[3]{3/8} = \frac{\sqrt[3]{3}}{2} \approx 0.721$. The radius of convergence is $R = \frac{\sqrt[3]{3}}{2}$, and the center is the point where $x+2 = 0$, namely $x = -2$; so the open interval of convergence is: $x \in \left(-2 - \frac{\sqrt[3]{3}}{2},\ -2 + \frac{\sqrt[3]{3}}{2}\right) \approx (-2.721,\ -1.279)$.

example: Find a power series $\sum_{n=0}^{\infty} c_n x^n$ which converges only at $x = 0$. We must choose coefficients $c_n$ which grow fast enough to overwhelm the decrease of $x^n$, no matter how small we take $x \ne 0$. The factorial $c_n = n!$ does it: applying the Ratio Test to $\sum_{n=0}^{\infty} n!\, x^n$ gives $L = \lim_{n\to\infty} (n+1)|x| = \infty$ for any value of $x \ne 0$, showing divergence. Of course, for $x = 0$, all the higher terms vanish, and the series converges.

Math 133 More Power Series Stewart §11.9

Calculus on power series. In this section, we will finally be able to give series for some interesting numbers and functions. The cornerstone we build on is the geometric series $\frac{1}{1-r} = \sum_{n=0}^{\infty} r^n$, which we manipulate into much more interesting series formulas. Now, series of numbers can only be manipulated by algebra; but we have introduced series of functions (power series), where we can apply the calculus operations of differentiation and integration.

Theorem: Given a power series convergent for $|x-a| < R$, for some $R > 0$:

$$f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n = c_0 + c_1(x-a) + c_2(x-a)^2 + c_3(x-a)^3 + \cdots .$$

Then we have:

• For $|x-a| < R$, the derivative is:

$$f'(x) = \sum_{n=0}^{\infty} n\, c_n (x-a)^{n-1} = c_1 + 2c_2(x-a) + 3c_3(x-a)^2 + \cdots .$$

• For $|x-a| < R$, the antiderivative is:

$$\int f(x)\,dx = C + \sum_{n=0}^{\infty} c_n \frac{(x-a)^{n+1}}{n+1} = C + c_0(x-a) + c_1\frac{(x-a)^2}{2} + c_2\frac{(x-a)^3}{3} + \cdots .$$

That is, we can differentiate and integrate a power series term by term.

Derivatives of geometric series. We think of the geometric series as a function:

$$f(x) = \frac{1}{1-x} = \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + x^4 + \cdots .$$

Its first two derivatives are, for $|x| < 1$:

$$f'(x) = \frac{1}{(1-x)^2} = \sum_{n=0}^{\infty} n\, x^{n-1} = 1 + 2x + 3x^2 + 4x^3 + \cdots ,$$

$$f''(x) = \frac{2}{(1-x)^3} = \sum_{n=0}^{\infty} n(n-1)\, x^{n-2} = 2 + 3 \cdot 2\,x + 4 \cdot 3\,x^2 + \cdots .$$

This lets us find the sums of many series similar to geometric series.

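example: Find the sum $\sum_{n=1}^{\infty} \frac{n^2}{3^n} = \frac13 + \frac49 + \frac{9}{27} + \frac{16}{81} + \cdots$. The first 10 terms sum to about 1.499, so we can guess the answer is $\frac32$, but how to be sure? Consider the series as one output of a function; it is $g(\tfrac13)$ for:

$$g(x) = \sum_{n=0}^{\infty} n^2 x^n = x + 4x^2 + 9x^3 + 16x^4 + \cdots .$$

We can write this in terms of our known series because $n^2 = n(n-1) + n$, so that:

$$\begin{aligned}
g(x) = \sum_{n=0}^{\infty} n^2 x^n &= \sum_{n=0}^{\infty} n(n-1)\,x^n + \sum_{n=0}^{\infty} n\,x^n \\
&= x^2 \sum_{n=0}^{\infty} n(n-1)\,x^{n-2} + x \sum_{n=0}^{\infty} n\,x^{n-1} \\
&= x^2\,\frac{2}{(1-x)^3} + x\,\frac{1}{(1-x)^2} = \frac{x(x+1)}{(1-x)^3} .
\end{aligned}$$

Hence our series sums to $g(\tfrac13) = \frac{\frac13\left(\frac13+1\right)}{\left(1-\frac13\right)^3} = \frac32$. Fun!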
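Both numbers in this example are easy to check by machine. The sketch below (helper names `partial_sum` and `g` are ours) adds the first 10 terms exactly and evaluates the closed form $\frac{x(x+1)}{(1-x)^3}$ at $x = \tfrac13$.

```python
from fractions import Fraction

def partial_sum(N):
    """Exact partial sum of n^2 / 3^n for n = 1..N."""
    return sum(Fraction(n * n, 3 ** n) for n in range(1, N + 1))

def g(x):
    """Closed form x(x+1)/(1-x)^3 derived in the text for |x| < 1."""
    return x * (x + 1) / (1 - x) ** 3

print(float(partial_sum(10)))   # first 10 terms, as a decimal
print(g(Fraction(1, 3)))        # exact closed-form value
```

Using `Fraction` keeps the closed-form evaluation exact, so the comparison with 3/2 involves no rounding.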

Integrals of geometric series. Using the second part of the Theorem, we integrate the geometric series formula $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$:

$$\int \frac{1}{1-x}\,dx = -\ln(1-x) = C + \sum_{n=0}^{\infty} \frac{1}{n+1}\,x^{n+1} = C + x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \cdots .$$

To determine the correct constant C, we use the initial value at $x = 0$: the left side becomes $-\ln(1-0) = 0$, the right side $C + 0 + \frac{0^2}{2} + \cdots = C$, so $C = 0$. We can manipulate this into an expression for $\ln(x)$ itself:

$$\begin{aligned}
\ln(1+x) &= -\big(-\ln(1-(-x))\big) = -\left( (-x) + \frac{(-x)^2}{2} + \frac{(-x)^3}{3} + \frac{(-x)^4}{4} + \cdots \right) \\
&= x - \tfrac12 x^2 + \tfrac13 x^3 - \tfrac14 x^4 + \cdots \\
\ln(x) &= \ln(1+(x-1)) = (x-1) - \tfrac12 (x-1)^2 + \tfrac13 (x-1)^3 - \tfrac14 (x-1)^4 + \cdots
\end{aligned}$$

The first series converges for $|x| < 1$, so plugging $x-1$ in place of x, we find that the last series converges for $|x-1| < 1$. We conclude:

$$\ln(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\,(x-1)^n \quad\text{for } |x-1| < 1.$$

For example, taking $x = \tfrac12$, we get $x-1 = -\tfrac12$, and:

$$\ln(\tfrac12) = (-\tfrac12) - \tfrac12 (-\tfrac12)^2 + \tfrac13 (-\tfrac12)^3 - \tfrac14 (-\tfrac12)^4 + \cdots .$$

Since $\ln(\tfrac12) = -\ln(2)$, we conclude:

$$\ln(2) = \sum_{n=1}^{\infty} \frac{1}{n\,2^n} = \frac12 + \frac{1}{2 \cdot 2^2} + \frac{1}{3 \cdot 2^3} + \frac{1}{4 \cdot 2^4} + \cdots .$$

Since the denominator grows quickly, the series converges rapidly and gives a good approximation to $\ln(2) = 0.69314\cdots$ after only a few terms. For example, the first 10 terms give: $\sum_{n=1}^{10} \frac{1}{n\,2^n} = 0.69306\cdots$, accurate to 3 or 4 decimal places. When a calculator or Wolfram Alpha computes logarithms, it is using some method similar to this, taking enough terms to obtain the desired number of decimal places.
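A short Python sketch makes the rapid convergence visible. The function name `ln2_partial_sum` is ours, and `math.log(2)` is used only as a reference value to compare against.

```python
import math

def ln2_partial_sum(N):
    """Partial sum of 1/(n * 2^n) for n = 1..N."""
    return sum(1 / (n * 2 ** n) for n in range(1, N + 1))

# Each extra term roughly halves the remaining error.
for N in (5, 10, 20):
    print(N, ln2_partial_sum(N), math.log(2))
```

With 20 terms the partial sum already agrees with the library value to about seven decimal places.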

Series for π. We obtained the series for logarithm because it is an inverse function whose derivative is a rational function (see the Inverse Derivative Theorem in §6.1). We can do the same for the arctangent:

$$\frac{1}{1+x^2} = \frac{1}{1-(-x^2)} = \sum_{n=0}^{\infty} (-x^2)^n = \sum_{n=0}^{\infty} (-1)^n x^{2n} = 1 - x^2 + x^4 - x^6 + \cdots$$

$$\tan^{-1}(x) = \int \frac{1}{1+x^2}\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{2n+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots .$$

We know there is no constant shift because both sides are 0 at $x = 0$. The series converges for $|x| < 1$. Since $\tan(\tfrac{\pi}{4}) = 1$, we have:

$$\frac{\pi}{4} = \tan^{-1}(1) = 1 - \frac13 + \frac15 - \frac17 + \cdots .$$

Here we must be careful, since $x = 1$ is all the way at the edge of the interval of convergence: the series is not absolutely convergent, but we can show conditional convergence by the Alternating Series Test (§11.6/II). (Also, Abel's Theorem∗ shows that the convergent series gives the expected value $\tan^{-1}(1)$.)

This is known as the Leibniz formula. It is astonishing because the series is so simple and seemingly has no relation to circles or angles. It can be used to compute π (multiplying both sides by 4), but it is inefficient because of the slow convergence. It takes about 200 terms to get 2 decimal places of accuracy, π ≈ 3.14. Further tricks are needed to get an efficient series.
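The slow convergence is easy to observe. In this Python sketch (the name `leibniz_pi` is ours; `math.pi` serves only as a reference value), the error after N terms shrinks roughly like 1/N.

```python
import math

def leibniz_pi(N):
    """4 * (1 - 1/3 + 1/5 - ...) using the first N terms."""
    return 4 * sum((-1) ** n / (2 * n + 1) for n in range(N))

# Ten times more terms buys only about one more decimal place.
for N in (10, 200, 10000):
    approx = leibniz_pi(N)
    print(N, approx, abs(math.pi - approx))
```

Compare this with the series for ln(2) above, where 10 terms already give 3 or 4 decimal places.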

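∗Theorem: Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ converge for $|x| < R$, and suppose $f(R) = \sum_{n=0}^{\infty} a_n R^n$ exists. Then $\lim_{x\to R^-} f(x) = f(R)$, so the power series defines a left-continuous function at $x = R$.

Proof: Replacing $f(x)$ with $f(Rx)$, we can assume $R = 1$. Replacing $a_0$ with $a_0 - \sum_{n=0}^{\infty} a_n$, we can assume $f(1) = \sum_{n=0}^{\infty} a_n = 0$. We must show $\lim_{x\to 1^-} \sum_{n=0}^{\infty} a_n x^n = 0$.

Abel's summation-by-parts formula states that for sequences $\{a_n\}, \{b_n\}$, with $s_n = \sum_{k=0}^{n} a_k$, we have $\sum_{n=0}^{N} a_n b_n = s_N b_N - \sum_{n=0}^{N-1} s_n (b_{n+1} - b_n)$. Applying this for $b_n = x^n$ with $b_{n+1} - b_n = (x-1)x^n$, we have $\sum_{n=0}^{N} a_n x^n = s_N x^N + (1-x) \sum_{n=0}^{N-1} s_n x^n$, and taking $N \to \infty$ for a fixed $0 < x < 1$ gives $f(x) = (1-x) \sum_{n=0}^{\infty} s_n x^n$, since $s_N \to 0$ and $x^N \to 0$.

Given any $\varepsilon > 0$, we must choose x close enough to 1 to force $|f(x)| = (1-x) \left| \sum_{n=0}^{\infty} s_n x^n \right| < \varepsilon$. First choose N large enough that $|s_n| = \left| \sum_{k=0}^{n} a_k \right| < \tfrac12 \varepsilon$ for $n \ge N$. Then choose $(1-x)$ small enough that $(1-x) \left| \sum_{n=0}^{N-1} s_n x^n \right| < \tfrac12 \varepsilon$. Finally:

$$|f(x)| \le (1-x) \left| \sum_{n=0}^{N-1} s_n x^n \right| + (1-x) \left| \sum_{n=N}^{\infty} s_n x^n \right| < \tfrac12 \varepsilon + (1-x) \sum_{n=N}^{\infty} \tfrac12 \varepsilon\, x^n = \tfrac12 \varepsilon + \tfrac12 \varepsilon\,(1-x)\,\frac{x^N}{1-x} \le \varepsilon . \qquad \blacksquare$$

Now, for $f(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots$, we know that $f(x) = \tan^{-1}(x)$ for $x \in (-1, 1)$, and Abel's Theorem gives $f(1) = \lim_{x\to 1^-} f(x) = \lim_{x\to 1^-} \tan^{-1}(x) = \tan^{-1}(1) = \frac{\pi}{4}$.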

Math 133 More Taylor Series Stewart §11.11

Review. In §11.10, we saw how Taylor series compute any reasonable function $f(x)$ as a kind of “infinite polynomial” near a center point $x = a$:

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots ,$$

where $n! = n(n-1)\cdots(2)(1)$ with $0! = 1$. The constant coefficients $c_n = \frac{f^{(n)}(a)}{n!}$ involve the nth derivatives $f^{(n)}(x)$, but use their values only at the center point $x = a$: if the formula to compute $f(x)$ required that we know $f(x)$, it would be useless.

The first two terms $f(x) \approx f(a) + f'(a)(x-a)$ make the linear approximation, while the degree N Taylor polynomial $T_N(x) = \sum_{n=0}^{N} \frac{f^{(n)}(a)}{n!}(x-a)^n$ gives a better and better approximation of $f(x)$ as we take more terms, provided x is in the interval of convergence.∗ This is how calculators can accurately compute complicated functions using only the four arithmetic operations.

In this section, we consider only Maclaurin series $f(x) = \sum_{n=0}^{\infty} c_n x^n$, centered at $x = 0$.

Binomial series. We have seen several functions which have simple series because their nth derivatives are easy to compute, at least at $x = 0$. One of the most useful of these is the binomial series, the Maclaurin series for the function $f(x) = (1+x)^p$, the pth power of a binomial (an expression with two terms). The coefficients of the series $(1+x)^p = \sum_{n=0}^{\infty} c_n x^n$ are called binomial coefficients, and they have a special symbol $c_n = \binom{p}{n}$,† so that by definition:

$$f(x) = (1+x)^p = \sum_{n=0}^{\infty} \binom{p}{n} x^n = \binom{p}{0} + \binom{p}{1} x + \binom{p}{2} x^2 + \cdots .$$

We compute these by the usual formula: $\binom{p}{n} = c_n = \frac{f^{(n)}(0)}{n!}$. The nth derivative is:

$$f^{(n)}(x) = p(p-1)\cdots(p-n+1)\,(1+x)^{p-n} ,$$

so plugging in $x = 0$ gives:

$$\binom{p}{n} = \frac{\overbrace{p(p-1)\cdots(p-n+1)}^{n \text{ factors}}}{n!} .$$

The Ratio Test shows that any binomial series converges for $|x| < 1$ (radius of convergence $R = 1$).
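The product formula for $\binom{p}{n}$ translates directly into code. The Python sketch below (the function name `binom` is ours) evaluates it exactly for any rational p, including fractional and negative powers.

```python
from fractions import Fraction

def binom(p, n):
    """Generalized binomial coefficient p(p-1)...(p-n+1) / n! for any rational p."""
    result = Fraction(1)
    for k in range(n):
        result *= Fraction(p) - k   # factor (p - k)
        result /= k + 1             # builds up the n! in the denominator
    return result

print([binom(5, n) for n in range(7)])               # whole-number power
print([binom(Fraction(1, 2), n) for n in range(5)])  # square-root power
```

For a whole number p the list ends in zeros once n exceeds p, because the factor $(p - p) = 0$ appears in the product; for other p the coefficients never vanish.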

∗The Lagrange Remainder Formula bounds the error in the approximation $f(x) \approx T_N(x)$. By the Ratio Test, the series $f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n$ will converge if $|x-a| < R$, where the radius of convergence is: $R = \lim_{n\to\infty} |c_n / c_{n+1}|$. If R is finite, the open interval of convergence is $x \in (a-R, a+R)$.

†The symbol $\binom{p}{n}$ is usually read “p choose n” because if p is a whole number, it turns out that $\binom{p}{n}$ is the number of ways, given a set of p objects, to choose a subset of n of them. For example, $\binom{4}{2} = 6$ corresponds to the 6 ways to choose 2 numbers from {1, 2, 3, 4}, namely {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}.

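example: For $p = \tfrac12$ and $f(x) = (1+x)^{1/2} = \sqrt{1+x}$, we get a series very much like that for $\sqrt{x}$ in §11.10:

$$\begin{aligned}
(1+x)^{1/2} &= 1 + \tfrac12 x + \frac{\frac12\left(-\frac12\right)}{2!} x^2 + \frac{\frac12\left(-\frac12\right)\left(-\frac32\right)}{3!} x^3 + \frac{\frac12\left(-\frac12\right)\left(-\frac32\right)\left(-\frac52\right)}{4!} x^4 + \cdots \\
&= 1 + \tfrac12 x - \tfrac12 \cdot \tfrac12 \cdot \tfrac{1}{2!}\, x^2 + \tfrac12 \cdot \tfrac12 \cdot \tfrac32 \cdot \tfrac{1}{3!}\, x^3 - \tfrac12 \cdot \tfrac12 \cdot \tfrac32 \cdot \tfrac52 \cdot \tfrac{1}{4!}\, x^4 + \cdots \\
&= 1 + \tfrac12 x + \sum_{n=2}^{\infty} \frac{(-1)^{n-1}(2n-3)!!}{2^n\, n!}\, x^n \quad\text{for } |x| < 1,
\end{aligned}$$

where we use the odd-factorial notation: $(2n-3)!! = 1 \cdot 3 \cdot 5 \cdot 7 \cdots (2n-3)$.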

example: For $p = -1$ we get a geometric series with ratio $r = -x$:

$$(1+x)^{-1} = 1 + (-1)x + \frac{(-1)(-2)}{2!} x^2 + \frac{(-1)(-2)(-3)}{3!} x^3 + \cdots = 1 - x + x^2 - x^3 + \cdots .$$

For $p = -2$, or any negative integer, the binomial series also simplifies:

$$(1+x)^{-2} = 1 + (-2)x + \frac{(-2)(-3)}{2!} x^2 + \frac{(-2)(-3)(-4)}{3!} x^3 + \cdots = 1 - 2x + 3x^2 - 4x^3 + \cdots ,$$

which agrees with the series for $\frac{1}{(1-x)^2}$ obtained in §11.9 by differentiation, after replacing x with $-x$.

Whole number powers. If p is a positive integer, the function $(1+x)^p$ multiplies out to a polynomial, so it has a finite series with highest non-zero term $x^p$ (since all higher terms have coefficient zero, and can be dropped). For example, taking $p = 5$:

$$\begin{aligned}
(1+x)^5 = \sum_{n=0}^{\infty} \binom{5}{n} x^n &= 1 + \frac{5}{1!} x + \frac{5 \cdot 4}{2!} x^2 + \frac{5 \cdot 4 \cdot 3}{3!} x^3 + \frac{5 \cdot 4 \cdot 3 \cdot 2}{4!} x^4 + \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{5!} x^5 + 0x^6 + \cdots \\
&= 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5 .
\end{aligned}$$

Indeed, taking $x = \frac{b}{a}$ and clearing denominators gives a general algebraic formula analogous to $(a+b)^2 = a^2 + 2ab + b^2$:

$$(a+b)^5 = a^5 + 5a^4 b + 10a^3 b^2 + 10a^2 b^3 + 5ab^4 + b^5 .$$

Of course, we could also obtain this by successively multiplying out powers of $(a+b)$:

$$\begin{aligned}
(a+b)^0 &= 1 \\
(a+b)^1 &= a + b \\
(a+b)^2 &= a^2 + 2ab + b^2 \\
(a+b)^3 &= a^3 + 3a^2 b + 3ab^2 + b^3 \\
(a+b)^4 &= a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4 \\
(a+b)^5 &= a^5 + 5a^4 b + 10a^3 b^2 + 10a^2 b^3 + 5ab^4 + b^5
\end{aligned}$$

In the pth row, the coefficients are:

$$\binom{p}{0} = 1, \qquad \binom{p}{1} = p, \qquad \binom{p}{2} = \tfrac12 p(p-1), \qquad \cdots \qquad \binom{p}{p-1} = p, \qquad \binom{p}{p} = 1 .$$

Because each row is obtained by multiplying the previous by $(a+b)$, each coefficient is the sum of the two immediately above it to the left and right, for example 10 = 4 + 6. The array of whole-number coefficients, continuing downward infinitely, is called Pascal's Triangle; it occurs in many problems in algebra and probability.

Modifications of series. Once we know a series formula

$$f(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \cdots = \sum_{n=0}^{\infty} c_n x^n$$

for some explicit coefficients $c_n$, we can manipulate it to get new series formulas for similar functions. Let k be a fixed positive integer, and q a constant.

• $q x^k f(x) = q c_0 x^k + q c_1 x^{k+1} + q c_2 x^{k+2} + q c_3 x^{k+3} + \cdots = \sum_{n=k}^{\infty} q\, c_{n-k}\, x^n$.

• $f(q x^k) = c_0 + c_1 q x^k + c_2 q^2 x^{2k} + c_3 q^3 x^{3k} + \cdots = \sum_{n=0}^{\infty} c_n q^n x^{kn}$.

• $\int f(x)\,dx = C + c_0 x + c_1 \frac{x^2}{2} + c_2 \frac{x^3}{3} + c_3 \frac{x^4}{4} + \cdots = C + \sum_{n=0}^{\infty} c_n \frac{x^{n+1}}{n+1}$.

(We already saw the last modification in §11.9.)
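These three modifications act very simply on the list of coefficients $[c_0, c_1, c_2, \ldots]$. The Python sketch below works on truncated coefficient lists; the function names and the list representation are our own illustrative choices, and the antiderivative is taken with constant term $C = 0$.

```python
from fractions import Fraction

def times_qxk(c, q, k):
    """Coefficients of q*x^k*f(x): shift by k places and scale by q."""
    return [Fraction(0)] * k + [q * cn for cn in c]

def substitute_qxk(c, q, k):
    """Coefficients of f(q*x^k): c_n * q^n lands on the power x^(k*n)."""
    out = [Fraction(0)] * (k * (len(c) - 1) + 1)
    for n, cn in enumerate(c):
        out[k * n] = cn * q ** n
    return out

def integrate(c):
    """Coefficients of the antiderivative with constant term C = 0."""
    return [Fraction(0)] + [Fraction(cn) / (n + 1) for n, cn in enumerate(c)]

geometric = [1, 1, 1, 1]                    # 1/(1-x), truncated after x^3
step1 = substitute_qxk(geometric, -1, 2)    # 1/(1+x^2), truncated after x^6
step2 = integrate(step1)                    # arctangent series, up to x^7
print(step1)
print(step2)
```

Starting from the geometric series, substituting $-x^2$ and then integrating reproduces the first terms of the arctangent series from §11.9.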

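example: Find the explicit Maclaurin series of $f(x) = \frac{x^2+1}{\sqrt[3]{x^2-1}}$. We manipulate this function to write it in terms of the known binomial series $(1+x)^{-1/3} = \sum_{n=0}^{\infty} \binom{-1/3}{n} x^n$. Since $\sqrt[3]{x^2-1} = -(1-x^2)^{1/3}$, we have:

$$\begin{aligned}
\frac{x^2+1}{\sqrt[3]{x^2-1}} &= -\frac{x^2}{(1-x^2)^{1/3}} - \frac{1}{(1-x^2)^{1/3}} \\
&= -x^2 \sum_{n=0}^{\infty} \binom{-1/3}{n} (-x^2)^n - \sum_{n=0}^{\infty} \binom{-1/3}{n} (-x^2)^n \\
&= \sum_{n=0}^{\infty} (-1)^{n+1} \binom{-1/3}{n} x^{2n+2} - \sum_{n=0}^{\infty} (-1)^n \binom{-1/3}{n} x^{2n} \\
&= \sum_{n=1}^{\infty} (-1)^n \binom{-1/3}{n-1} x^{2n} - \sum_{n=0}^{\infty} (-1)^n \binom{-1/3}{n} x^{2n} \\
&= -1 + \sum_{n=1}^{\infty} (-1)^n \left[ \binom{-1/3}{n-1} - \binom{-1/3}{n} \right] x^{2n} .
\end{aligned}$$

The leading $-1$ is the $n = 0$ term of the second sum, which has no partner in the first sum; it agrees with $f(0) = -1$.

A tricky point is the index shift from $n = 0$ to $n = 1$: the terms remain the same, as you can see from writing out the series in dot-dot-dot notation:

$$\sum_{n=0}^{\infty} (-1)^{n+1} \binom{-1/3}{n} x^{2n+2} = (-1)^1 \binom{-1/3}{0} x^2 + (-1)^2 \binom{-1/3}{1} x^4 + (-1)^3 \binom{-1/3}{2} x^6 + \cdots = \sum_{n=1}^{\infty} (-1)^n \binom{-1/3}{n-1} x^{2n} .$$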

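example: Find the explicit Maclaurin series of $f(x) = x \sin(x^2) - x - x^3$. We write this in terms of the known trig series $\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1}$.

$$\begin{aligned}
x \sin(x^2) - x - x^3 &= -x - x^3 + x \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} (x^2)^{2n+1} \\
&= -x - x^3 + \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{4n+3} \\
&= -x + \sum_{n=1}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{4n+3} .
\end{aligned}$$

Here the $-x^3$ term canceled the $n = 0$ term $x^3$ of the series.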

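example: Find the explicit Maclaurin series of the indefinite integral $\int x \sin(x^2)\,dx$. This particular integral has a closed form, $-\tfrac12 \cos(x^2) + C$ by the substitution $u = x^2$, and we could also numerically approximate $\int_0^x t \sin(t^2)\,dt$ for a given x. An alternative, which works equally well for integrands with no elementary antiderivative such as $\sin(x^2)$, is the Taylor series, whose finite sums give approximations to the integral function for all x:

$$\int x \sin(x^2)\,dx = \int x \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} (x^2)^{2n+1}\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \int x^{4n+3}\,dx = C + \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \cdot \frac{x^{4n+4}}{4n+4} .$$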

Math 133 Taylor Series Stewart §11.10

Series representation of a function. The main purpose of series is to write a given complicated quantity as an infinite sum of simple terms; and since the terms get smaller and smaller, we can approximate the original quantity by taking only the first few terms of the series. In this section, we finally develop the tool that lets us do this in most cases: a way to write any reasonable function as an explicit power series. This will allow us to compute outputs of the function by plugging into the series.

Our functions must behave decently near the center point of the desired power series. We say $f(x)$ is analytic at $x = a$ if it is possible to write $f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n$ for some coefficients $c_n$, with positive radius of convergence. In practice, any formula involving standard functions and operations defines an analytic function, provided the formula gives real number values in a small interval around $x = a$. For example $\frac{1}{x-a}$ is not analytic at $x = a$, because it gives ±∞ at $x = a$; and $\sqrt{x-a}$ is not analytic at $x = a$ because for x slightly smaller than a, it gives the square root of a negative number.∗

Taylor Series Theorem: Let $f(x)$ be a function which is analytic at $x = a$. Then we can write $f(x)$ as the following power series, called the Taylor series of $f(x)$ at $x = a$:

$$f(x) = f(a) + f'(a)\,(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \frac{f''''(a)}{4!}(x-a)^4 + \cdots ,$$

valid for x within a radius of convergence $|x-a| < R$ with $R > 0$, or convergent for all x.

If we write the nth derivative of $f(x)$ as $f^{(n)}(x)$, this becomes:

$$f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n \quad\text{with coefficients}\quad c_n = \frac{f^{(n)}(a)}{n!} .$$

warning: The coefficients are constants with no x, so $c_1 = f'(a)$, not $f'(x)$.

Proof. By hypothesis $f(x)$ is analytic, so $f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n$ for some $c_n$; we will derive the desired formula for these coefficients. Since $f(a) = \sum_{n=0}^{\infty} c_n (a-a)^n = c_0 + c_1(0) + c_2(0^2) + \cdots$, we get $c_0 = f(a)$. Next, by the Theorem in §11.9, we have $f'(x) = \sum_{n=0}^{\infty} n\,c_n (x-a)^{n-1}$, so $f'(a) = c_1 + 2c_2(0) + 3c_3(0^2) + \cdots$, and $c_1 = f'(a)$. Next, $f''(x) = \sum_{n=0}^{\infty} n(n-1)\,c_n (x-a)^{n-2}$, so $f''(a) = (2)(1)\,c_2$ and $c_2 = \tfrac12 f''(a)$. Continuing, we get:

$$f^{(N)}(x) = \sum_{n=0}^{\infty} n(n-1)\cdots(n-N+1)\; c_n\, (x-a)^{n-N} .$$

The terms for $n = 0, 1, \ldots, N-1$ are all zero because of the factors $n(n-1)\cdots(n-N+1)$, so the first non-zero term is for $n = N$. Plugging in $x = a$ gives: $f^{(N)}(a) = N(N-1)\cdots(1)\,c_N$, and $c_N = \frac{1}{N!} f^{(N)}(a)$ as desired. Q.E.D.

∗The function $\sqrt[3]{x-a}$ is also not analytic near $x = a$, even though it gives real number values. The problem is that it has a vertical tangent at $x = a$, so it is not differentiable.

Once we have a power series for $f(x)$ with known coefficients $c_n = \frac{f^{(n)}(a)}{n!}$, we can approximate $f(x)$ by taking a finite partial sum of the series up to some cutoff term N. This partial sum is called a Taylor polynomial, denoted $T_N(x)$:

$$f(x) \approx T_N(x) = \sum_{n=0}^{N} c_n (x-a)^n = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(N)}(a)}{N!}(x-a)^N .$$

Note that $T_1(x) = f(a) + f'(a)(x-a)$ is just the linear approximation near $x = a$, whose graph is the tangent line (Calculus I §2.9). We can improve this approximation of $f(x)$ in two ways:

• Take more terms, increasing N.

• Take the center a close to x, giving small $(x-a)$ and tiny $(x-a)^n$.

A Taylor series centered at $a = 0$ is specially named a Maclaurin series.

Example: sine function. To find Taylor series for a function $f(x)$, we must determine $f^{(n)}(a)$. This is easiest for a function which satisfies a simple differential equation relating the derivatives to the original function. For example, $f(x) = \sin(x)$ satisfies $f''(x) = -f(x)$, so coefficients of the Maclaurin series (center $a = 0$) are:

$$\begin{array}{c|cccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
f^{(n)}(x) & \sin(x) & \cos(x) & -\sin(x) & -\cos(x) & \sin(x) & \cos(x) & -\sin(x) & -\cos(x) \\
f^{(n)}(0) & 0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 \\
c_n = \frac{f^{(n)}(0)}{n!} & 0 & 1 & 0 & -\frac{1}{3!} & 0 & \frac{1}{5!} & 0 & -\frac{1}{7!}
\end{array}$$

That is:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} .$$

To find the domain of convergence, we apply the Ratio Test (§11.6/I):

$$L = \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n\to\infty} \left| \frac{x^{2(n+1)+1}}{(2(n+1)+1)!} \Big/ \frac{x^{2n+1}}{(2n+1)!} \right| = \lim_{n\to\infty} \frac{|x|^{2n+3}}{|x|^{2n+1}} \cdot \frac{(2n+1)!}{(2n+3)!} = \lim_{n\to\infty} \frac{|x|^2}{(2n+2)(2n+3)} = 0$$

for any fixed $x \ne 0$. Since $L = 0 < 1$ regardless of x, the series converges for all x.

This formula for $\sin(x)$ astonishes because the right side is a simple algebraic series having no apparent relation to trigonometry. We can try to understand and check the series by graphically comparing $\sin(x)$ with its first few Taylor polynomial approximations:

• The Taylor polynomial $T_1(x) = x$ (in red) is just the linear approximation or tangent line of $y = \sin(x)$ at the center point $x = 0$. The curve and line are close (to within a couple of decimal places) near the point of tangency and up to about $|x| \le 0.5$. Once they veer apart, the approximation is useless.

• The next Taylor polynomial $T_3(x) = x - \frac{x^3}{3!} = x - \tfrac16 x^3$ (in green) matches $y = \sin(x)$ in its first three derivatives at $x = 0$, and stays close to the original curve up to about $|x| \le 1.5$.

• The next $T_5(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} = x - \tfrac16 x^3 + \tfrac{1}{120} x^5$ is even closer to $f(x)$ for even larger x. Taking enough terms in the Taylor series will give a good approximation for any x, since the series converges everywhere.

problem: Compute $\sin(10^\circ)$. A geometric method would be to construct a right triangle with a 10° angle, and measure the opposite side divided by the hypotenuse; but this would only work for a couple of decimal places of accuracy. Of course, a calculator can produce many decimal places, but how does it know? By Taylor series!

As always when doing calculus on trig functions, we must first convert to radians (see end of §2.5): $10^\circ = \frac{2\pi}{360}(10) = \frac{\pi}{18}$. Here $|x| = \frac{\pi}{18} \approx \frac16$ is small, so the Maclaurin series centered at 0 should converge quickly, giving very accurate approximations:

$$\sin(\tfrac{\pi}{18}) \approx T_3(\tfrac{\pi}{18}) = \tfrac{\pi}{18} - \tfrac16 (\tfrac{\pi}{18})^3 \approx 0.1736468 .$$

It turns out this is correct to 5 decimal places (0.17364), using only two non-zero terms of the Taylor series and a good estimate for π. We could verify this by taking more terms and seeing that these 5 digits do not change; or by the Remainder Estimates below.
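The computation can be reproduced with a few lines of Python. In this sketch the function name `sin_taylor` is ours, and the library value `math.sin` is used only as a reference for comparison.

```python
import math

def sin_taylor(x, N):
    """Maclaurin polynomial T_N(x) of sin: odd powers up to degree N."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range((N + 1) // 2))

x = math.pi / 18                  # 10 degrees in radians
for N in (1, 3, 5):
    print(N, sin_taylor(x, N), math.sin(x))
```

Going from $T_3$ to $T_5$ leaves the first 5 decimal places unchanged, which is the informal check suggested above.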

Example: square roots. We compute $\sqrt{2}$ to 5 decimal places.† First, we must consider $\sqrt{2}$ to be an output of the function $f(x) = \sqrt{x}$ at $x = 2$. Next, we must choose the center $a$ for its Taylor series.

• $a = 0$ does not work because $\sqrt{x}$ is not analytic at $x = 0$. Indeed, if there were a convergent Taylor series $\sqrt{x} = c_0 + c_1 x + c_2 x^2 + \cdots$, we could plug in $x = -0.1$ to get: $\sqrt{-0.1} = c_0 + c_1(-0.1) + c_2(-0.1)^2 + \cdots$, a real value for the square root of a negative number!

• $a = 1$ is too far from $x = 2$: it turns out $|x-a| = |2-1| = 1$ is beyond the radius of convergence of the Taylor series.

• $a = 2$ is useless, since writing the Taylor series requires us to know $f^{(n)}(2)$, including $f(2) = \sqrt{2}$, the same number we are trying to compute.

• A useful choice of $a$ requires: $a > 0$ so that the Taylor series exists; $a$ is close to $x = 2$, making $|x-a|$ small so the series converges quickly; and $f(a) = \sqrt{a}$ is easy to compute so we can find the coefficients. A value satisfying all three conditions is: $a = \frac{9}{4}$.

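Now we have:

$$\begin{array}{c|ccccc}
n & 0 & 1 & 2 & 3 & 4 \\ \hline
f^{(n)}(x) & x^{1/2} & \tfrac{1}{2}x^{-1/2} & -\tfrac{1\cdot 1}{2\cdot 2}x^{-3/2} & \tfrac{1\cdot 1\cdot 3}{2\cdot 2\cdot 2}x^{-5/2} & -\tfrac{1\cdot 1\cdot 3\cdot 5}{2\cdot 2\cdot 2\cdot 2}x^{-7/2} \\[4pt]
f^{(n)}(\tfrac{9}{4}) & \tfrac{3}{2} & \tfrac{1}{3} & -\tfrac{2}{27} & \tfrac{4}{81} & -\tfrac{40}{729} \\[4pt]
c_n = \dfrac{f^{(n)}(\frac{9}{4})}{n!} & \tfrac{3}{2} & \tfrac{1}{3} & -\tfrac{1}{27} & \tfrac{2}{243} & -\tfrac{5}{2187}
\end{array}$$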

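Hence:

$$\sqrt{x} = \tfrac{3}{2} + \tfrac{1}{3}(x-\tfrac{9}{4}) - \tfrac{1}{27}(x-\tfrac{9}{4})^2 + \tfrac{2}{243}(x-\tfrac{9}{4})^3 - \tfrac{5}{2187}(x-\tfrac{9}{4})^4 + \cdots$$

$$= \tfrac{3}{2} + \tfrac{1}{3}(x-\tfrac{9}{4}) + \sum_{n=2}^{\infty} (-1)^{n-1}\, \frac{(2n-3)!!}{n!}\, \frac{2^{n-1}}{3^{2n-1}}\, (x-\tfrac{9}{4})^n ,$$

where we use the odd factorial notation $(2n-3)!! = (1)(3)(5)\cdots(2n-3)$. For $x = 2$, we have $x - \frac{9}{4} = -\frac{1}{4}$, so:

$$\sqrt{2} = \tfrac{3}{2} + \tfrac{1}{3}(-\tfrac{1}{4}) - \tfrac{1}{27}(-\tfrac{1}{4})^2 + \tfrac{2}{243}(-\tfrac{1}{4})^3 - \tfrac{5}{2187}(-\tfrac{1}{4})^4 + \cdots$$

$$\approx \tfrac{3}{2} - \tfrac{1}{3}\cdot\tfrac{1}{4} - \tfrac{1}{27}\cdot\tfrac{1}{4^2} - \tfrac{2}{243}\cdot\tfrac{1}{4^3} - \tfrac{5}{2187}\cdot\tfrac{1}{4^4} \approx \underline{1.41421}43 ,$$

which is correct to 5 decimal places (underlined).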
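The coefficient table and the final sum can be checked with exact rational arithmetic. This is a minimal sketch: it builds the coefficients from the recurrence $c_{n+1} = c_n \cdot \frac{1/2 - n}{(n+1)\,a}$, which follows from $f^{(n+1)}(x) = (\frac{1}{2} - n)\, f^{(n)}(x)/x$ for $f(x) = \sqrt{x}$. The variable names are illustrative assumptions, and `math.sqrt` is used only as a reference value.

```python
from fractions import Fraction
import math

a = Fraction(9, 4)                       # center of the Taylor series

# c_0 = f(a) = 3/2; each later coefficient comes from the recurrence
# c_{n+1} = c_n * (1/2 - n) / ((n + 1) * a)  for f(x) = sqrt(x).
coeffs = [Fraction(3, 2)]
for n in range(4):
    coeffs.append(coeffs[-1] * (Fraction(1, 2) - n) / ((n + 1) * a))

h = Fraction(2) - a                      # x - a = -1/4 at x = 2
t4 = sum(c * h ** n for n, c in enumerate(coeffs))   # exact value of T4(2)

print([str(c) for c in coeffs])
print(f"T4(2) = {float(t4):.7f}   math.sqrt(2) = {math.sqrt(2):.7f}")
```

Using `Fraction` keeps every coefficient exact, so the printed list can be compared directly with the last row of the table.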

†We saw another very good algorithm for this in Calculus I §3.8: Newton's Method, in which we found approximate solutions to equations like $x^2 - 2 = 0$ by repeatedly taking a linear approximation to $f(x) = x^2 - 2$. However, Newton's Method does not help to compute values of $\sin(x)$.

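Common Taylor series

• $\dfrac{1}{1-x} = \displaystyle\sum_{n=0}^{\infty} x^n$ for $|x| < 1$ (Geometric Series).

• $\ln(1+x) = \displaystyle\sum_{n=1}^{\infty} \frac{(-1)^{n-1} x^n}{n}$ for $|x| < 1$.

• $(1+x)^p = \displaystyle\sum_{n=0}^{\infty} \frac{p(p-1)\cdots(p-n+1)}{n!}\, x^n$ for $|x| < 1$ (Binomial Series).

• $\exp(x) = \displaystyle\sum_{n=0}^{\infty} \frac{x^n}{n!}$ for all $x$.

• $\sin(x) = \displaystyle\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}$ for all $x$.

• $\cos(x) = \displaystyle\sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}$ for all $x$.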

Bounding the remainder to determine accuracy. For a function with Taylor series $f(x) = \sum_{n=0}^{\infty} c_n (x-a)^n$, we define the remainder term as the difference between a function and its Taylor polynomial approximation:

$$R_N(x) = f(x) - T_N(x) = \sum_{n=N+1}^{\infty} c_n (x-a)^n .$$

That is, $f(x) = T_N(x) + R_N(x)$, so that $R_N(x)$ is the error in the approximation $f(x) \approx T_N(x)$.

Lagrange Remainder Formula: For any Taylor polynomial approximation $f(x) = T_N(x) + R_N(x)$, the remainder term is equal to:

$$R_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!}\,(x-a)^{N+1}$$

for some point $c$ between $a$ and $x$.

This allows an a priori estimate of the error, provided we can find an upper bound for the derivative: if $|f^{(N+1)}(t)| \le M$ for all $t \in [a, x]$ or $[x, a]$, then:

$$|R_N(x)| \le \max_{t \in [a,x]} \left| \frac{f^{(N+1)}(t)}{(N+1)!}\,(x-a)^{N+1} \right| \le \frac{M}{(N+1)!}\,|x-a|^{N+1} ,$$

since we can apply the bound $|f^{(N+1)}(t)| \le M$ to $t = c$ in the Lagrange Remainder Formula. This generalizes the error estimate for the linear approximation (Calculus I §2.9 end and §3.2 end). Note the similarity of the error expression $\frac{1}{(N+1)!} f^{(N+1)}(t)\,(x-a)^{N+1}$ to the next term in the Taylor series, $\frac{1}{(N+1)!} f^{(N+1)}(a)\,(x-a)^{N+1}$. We give proofs below.

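example: We previously computed $\sin(\frac{\pi}{18}) = T_3(\frac{\pi}{18}) + R_3(\frac{\pi}{18})$, centered at $a = 0$. We have the upper bound:

$$|f^{(N+1)}(t)| = |\sin^{(4)}(t)| = |\sin(t)| \le M = 1 \quad \text{for } t \in [0, \tfrac{\pi}{18}] .$$

Thus, the error term is at most:

$$\left| R_3(\tfrac{\pi}{18}) \right| \le \frac{M}{(N+1)!}\,|x-a|^{N+1} = \frac{1}{4!}\left(\tfrac{\pi}{18}\right)^4 \approx 4 \times 10^{-5} .$$

Approximation to $n$ decimal places means with error smaller than $0.5 \times 10^{-n}$, so our approximation is accurate to at least 4 places (though it is actually 5 places).

example: We previously computed $\sqrt{2} = T_4(2) + R_4(2)$, centered at $a = \frac{9}{4}$. We have the upper bound:

$$|f^{(N+1)}(t)| = \left| \frac{d^5}{dt^5}\left(t^{1/2}\right) \right| = \frac{1\cdot 1\cdot 3\cdot 5\cdot 7}{2\cdot 2\cdot 2\cdot 2\cdot 2}\, t^{-9/2} \le \frac{1\cdot 1\cdot 3\cdot 5\cdot 7}{2\cdot 2\cdot 2\cdot 2\cdot 2}\, 2^{-9/2} \le M = \frac{1}{5}$$

for $t \in [2, \frac{9}{4}]$: we plug in the left endpoint $t = 2$ since $t^{-9/2}$ is a decreasing function.

Thus, the error term is at most:

$$|R_4(2)| \le \frac{M}{(N+1)!}\,|x-a|^{N+1} = \frac{1/5}{5!}\left| 2 - \tfrac{9}{4} \right|^5 \approx 2 \times 10^{-6} .$$

Our approximation is accurate to at least 5 decimal places.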
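Both remainder bounds can be compared against the actual errors. The sketch below assumes the values of $M$, $N$, $x$, $a$ chosen in the two examples; the function name `lagrange_bound` is a hypothetical helper, and the "actual" errors rely on the standard library's `math.sin` and `math.sqrt` as reference values.

```python
import math

def lagrange_bound(M, N, x, a):
    """Upper bound M/(N+1)! * |x-a|^(N+1) on the Taylor remainder |R_N(x)|."""
    return M / math.factorial(N + 1) * abs(x - a) ** (N + 1)

# sin(pi/18) approximated by T3 centered at a = 0, with M = 1.
x_sin = math.pi / 18
sin_bound = lagrange_bound(1.0, 3, x_sin, 0.0)
sin_error = abs(math.sin(x_sin) - (x_sin - x_sin ** 3 / 6))

# sqrt(2) approximated by T4 centered at a = 9/4, with M = 1/5.
coeffs = [3 / 2, 1 / 3, -1 / 27, 2 / 243, -5 / 2187]
t4 = sum(c * (2 - 9 / 4) ** n for n, c in enumerate(coeffs))
sqrt_bound = lagrange_bound(1 / 5, 4, 2.0, 9 / 4)
sqrt_error = abs(math.sqrt(2) - t4)

print(f"sin:  actual error {sin_error:.2e} <= bound {sin_bound:.2e}")
print(f"sqrt: actual error {sqrt_error:.2e} <= bound {sqrt_bound:.2e}")
```

In both cases the bound is guaranteed but pessimistic: the true error is noticeably smaller than the a priori estimate.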

example: The function $f(x) = e^{-1/x^2}$ is not analytic at $x = 0$, since $1/x^2$ is undefined at that point. However, we can easily check that $\lim_{x\to 0} f(x) = 0$, so $x = 0$ is a removable discontinuity (§1.8); and in fact $\lim_{x\to 0} f^{(n)}(x) = 0$. Thus the Taylor Series Theorem would give $c_n = \frac{1}{n!} f^{(n)}(0) = 0$, but this would give the trivial Taylor series $f(x) \overset{??}{=} 0 + 0x + 0x^2 + \cdots$, which is clearly nonsense. This is because no matter how small $|x| \neq 0$, the remainder $R_N(x)$ does not go to zero as $N \to \infty$: the numerator $f^{(N+1)}(c)$ overwhelms $\frac{1}{(N+1)!}|x|^{N+1}$.

Proof of Remainder Bound. The Second Fundamental Theorem (§4.3) gives $f(x) = f(a) + \int_a^x f'(t)\, dt$. Integrating by parts, $\int_a^x u\, dv = uv\big|_{t=a}^{t=x} - \int_a^x v\, du$ with $u = f'(t)$, $du = f''(t)\, dt$, $v = x - t$, $dv = -dt$:

$$\begin{aligned}
f(x) &= f(a) - \int_a^x f'(t)\,(x-t)'\, dt \\
&= f(a) - \big( f'(x)(x-x) - f'(a)(x-a) \big) + \int_a^x f''(t)(x-t)\, dt \\
&= f(a) + f'(a)(x-a) + \int_a^x f''(t)(x-t)\, dt,
\end{aligned}$$

which means $R_1(x) = \int_a^x f''(t)(x-t)\, dt$. Repeating with $u = f''(t)$ and $v = \frac{1}{2}(x-t)^2$:

$$f(x) = f(a) + f'(a)(x-a) + \tfrac{1}{2} f''(a)(x-a)^2 + \tfrac{1}{2}\int_a^x f'''(t)(x-t)^2\, dt,$$

so that $R_2(x) = \frac{1}{2}\int_a^x f'''(t)(x-t)^2\, dt$. Continuing in this way gives:

$$R_N(x) = \frac{1}{N!} \int_a^x f^{(N+1)}(t)\,(x-t)^N\, dt.$$

Thus $|f^{(N+1)}(t)| < M$ implies the weak bound $|R_N(x)| \le \frac{M}{N!}\,|x-a|^{N+1}$, omitting a desired factor of $\frac{1}{N+1}$.

To get the full Lagrange remainder formula and the consequent remainder bound, hold $x$ constant and define the function $r(t) = \frac{1}{N!} f^{(N+1)}(t)\,(x-t)^N$, so that $R_N(x) = \int_a^x r(t)\, dt$ by the above computations. Applying the integral form of the Cauchy Mean Value Theorem (see §3.2 & §4.4) to the functions $r(t)$ and $g(t) = (x-t)^N$, we find that there exists $c \in (a, x)$ such that $r(c)/g(c) = \left(\int_a^x r(t)\, dt\right) / \left(\int_a^x g(t)\, dt\right)$, i.e.

$$\frac{\frac{1}{N!} f^{(N+1)}(c)\,(x-c)^N}{(x-c)^N} = \frac{R_N(x)}{-\frac{1}{N+1}(x-x)^{N+1} + \frac{1}{N+1}(x-a)^{N+1}} .$$

Simplifying gives $R_N(x) = \frac{1}{(N+1)!} f^{(N+1)}(c)\,(x-a)^{N+1}$ as desired.

Extra Topic: Irrationality of e. In §11.2, we saw that repeating decimals represent rational numbers (fractions), and every fraction can be written by long division as a repeating decimal. Thus, the non-repeating infinite decimals are the real numbers which cannot be written as fractions: they are irrational. However, it is difficult to prove that any given number (such as $\pi$ or $\sqrt{2}$) is irrational.

We can use our series to prove the irrationality of the constant $e = 2.7182818284590\cdots$. To prove the negative proposition that $e$ is not equal to any possible fraction $\frac{a}{b}$, we use the method of contradiction: that is, we assume that there were some fraction with $e = \frac{a}{b}$, and use this to deduce an impossible statement, which will show that the original assumption $e = \frac{a}{b}$ is also impossible.

Thus, using the Taylor series definition for $e$, we assume the possibility:

$$1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots \overset{\text{def}}{=} e = \frac{a}{b} .$$

It is easy to show $2 < e < 3$, so $e$ is not a whole number, and would have denominator $b > 1$. Consider the $b$th order Taylor approximation:

$$1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{b!} + R_b = e, \qquad R_b = \sum_{n=b+1}^{\infty} \frac{1}{n!} .$$

We multiply by $b!$ to clear denominators up to the $\frac{1}{b!}$ term:

$$b! + \frac{b!}{1!} + \frac{b!}{2!} + \cdots + \frac{b!}{b!} + b!\,R_b = b!\, e = b!\,\frac{a}{b} = (b-1)!\, a .$$

The terms $b!, \frac{b!}{1!}, \frac{b!}{2!}, \ldots, \frac{b!}{b!}$ on the left are whole numbers, and $(b-1)!\, a$ on the right is a whole number, so the remainder $b!\,R_b$ must also be a whole number. But it must also be very small, as we can see from a simple geometric series comparison:‡

$$\begin{aligned}
b!\,R_b = \sum_{n=b+1}^{\infty} \frac{b!}{n!} &= \frac{1}{b+1} + \frac{1}{(b+1)(b+2)} + \frac{1}{(b+1)(b+2)(b+3)} + \cdots \\
&< \frac{1}{b+1} + \frac{1}{(b+1)^2} + \frac{1}{(b+1)^3} + \cdots \\
&= \frac{1}{b+1} \cdot \frac{1}{1 - \frac{1}{b+1}} = \frac{1}{b+1} \cdot \frac{b+1}{b+1-1} = \frac{1}{b} < 1.
\end{aligned}$$

Thus, the same positive number $b!\,R_b$ would be both a whole number and less than 1, which is impossible. Thus the original assumption $e = \frac{a}{b}$ is also impossible.
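The key inequality $0 < b!\,R_b < \frac{1}{b}$ can be illustrated numerically for small $b$. This is only a sanity check, not part of the proof: the sketch sums finitely many terms of the tail exactly, so it slightly underestimates $b!\,R_b$ (the omitted terms are far smaller than the printed precision). The function name `scaled_tail` and the cutoff of 40 terms are illustrative assumptions.

```python
from fractions import Fraction

def scaled_tail(b, terms=40):
    """Exact partial sum of b! * R_b = sum over n > b of b!/n!, using `terms` terms."""
    total, term = Fraction(0), Fraction(1)
    for n in range(b + 1, b + 1 + terms):
        term /= n            # term now equals b!/n! = 1/((b+1)(b+2)...n)
        total += term
    return total

for b in range(2, 8):
    t = scaled_tail(b)
    print(f"b = {b}:  b!*R_b ~ {float(t):.6f}  <  1/b = {1 / b:.6f}")
```

For every $b$ tried, the value lies strictly between 0 and $\frac{1}{b}$, so it cannot be a whole number.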

‡We do not need the powerful Lagrange remainder formula here.

Math 133 Taylor Series Stewart §11.10–11

Binomial Theorem. $(a+b)^2 = a^2 + 2ab + b^2$

$(a+b)^3 = (a+b)(a+b)^2 = (a+b)(a^2 + 2ab + b^2) =$

What about $(a+b)^6 = a^6 + (?)a^5 b + (?)a^4 b^2 + \cdots + (?)ab^5 + b^6$.

Take $a = 1$, $b = x$, so $(1+x)^6 = c_0 + c_1 x + c_2 x^2 + \cdots + x^6$.

Find $c_n = \dfrac{6\cdot 5 \cdots (6-n+1)}{n!}$

So $(a+b)^6 =$

Binomial Theorem: $(a+b)^N$ and $(1+x)^\alpha$.

Modify known Taylor series to get new ones: $x\sin(x) + \frac{1}{1+x}$

Taylor series for integrals of known functions: $\int \sqrt{1+x^3}\, dx$

Taylor series for limits where L'Hopital does not work

Sum given series by recognizing it as a Taylor series

To bound error of Taylor poly approx, see §11.10


Synthesis: Differential Equations

Differential equations. In science and engineering, the solution of differential equations is the main application of calculus, allowing precise quantitative predictions of complicated dynamical systems. Here we continue the analysis of exponential growth from §6.5, introducing some interesting differential equations and solving them by combining an array of techniques from the course.

Logistic equation. Consider a process of self-reproduction constrained by an environmental ceiling. That is, the rate of growth of the population $P(t)$ is proportional to the population level, but also to the shortfall between the environmental capacity $E$ (a constant) and the population level:

$$\frac{dP}{dt} = kP(E-P).$$

At first, when $P(t)$ stays small and $E - P(t)$ stays near $E$, we expect $P(t)$ to grow about exponentially; but as $P(t)$ approaches the capacity $E$, the shrinking value of $E - P(t)$ will slow the growth rate, making $P(t)$ asymptotically approach $E$.

We solve by separation of variables (§6.5):

$$\int \frac{dP}{P(E-P)} = \int k\, dt = kt + C.$$

We integrate the left side by a partial fraction expansion (§7.4): $\frac{1}{P(E-P)} = \frac{A}{P} + \frac{B}{E-P}$. Clearing denominators gives $1 = A(E-P) + BP$, and substituting $P = 0$ and $P = E$ allows us to solve for the coefficients: $A = 1/E$ and $B = 1/E$. Thus:

$$\int \frac{dP}{P(E-P)} = \frac{1}{E}\int \left( \frac{1}{P} + \frac{1}{E-P} \right) dP = \frac{1}{E}\big( \ln(P) - \ln(E-P) \big) = \frac{1}{E}\ln\!\left( \frac{P}{E-P} \right).$$

Hence $\frac{1}{E}\ln\!\left( \frac{P}{E-P} \right) = kt + C$, which we can solve for $P$ to get a logistic function:

$$P = \frac{E M e^{Ekt}}{1 + M e^{Ekt}},$$

where $M = e^{EC} = \frac{P(0)}{E - P(0)}$. For $E = 1$, $k = 1$, $P(0) = 0.1$, this looks as expected:
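As a numerical check of this formula, the sketch below evaluates the logistic function for the parameters $E = 1$, $k = 1$, $P(0) = 0.1$ and compares a finite-difference estimate of $dP/dt$ with the right side $kP(E-P)$. The function names `logistic` and `rhs` and the step size `h` are illustrative assumptions.

```python
import math

def logistic(t, E=1.0, k=1.0, P0=0.1):
    """Logistic solution P(t) = E*M*e^(Ekt) / (1 + M*e^(Ekt)), with M = P0/(E - P0)."""
    M = P0 / (E - P0)
    growth = M * math.exp(E * k * t)
    return E * growth / (1 + growth)

def rhs(P, E=1.0, k=1.0):
    """Right side of the logistic equation dP/dt = k*P*(E - P)."""
    return k * P * (E - P)

h = 1e-5   # step for the central-difference derivative (an arbitrary small value)
for t in (0.0, 1.0, 3.0, 6.0):
    dPdt = (logistic(t + h) - logistic(t - h)) / (2 * h)
    print(f"t = {t}:  P = {logistic(t):.6f}  dP/dt = {dPdt:.6f}  kP(E-P) = {rhs(logistic(t)):.6f}")
```

The printed values start at $P(0) = 0.1$, grow roughly exponentially at first, and then level off toward the capacity $E = 1$.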


Catenary curve. A flexible chain hanging by its own weight forms a familiar curve:

To determine the equation $y = f(x)$ of this curve, consider a segment stretching from the lowest point $(0, 0)$ to an arbitrary point $(x, y)$. The forces on this segment are: a constant horizontal tension $-T_0$ pulling on the left end; gravity pulling downward with force $-gL(x)$, proportional to the arclength $L(x)$; and the tangential tension vector $T(x)$ at the right end, decomposing into components $T_1(x)$ and $T_2(x)$.

With the chain in the equilibrium position, the above forces must cancel: $T_1(x) = T_0$ and $T_2(x) = gL(x)$. Thus the tangent slope of the curve at $(x, y)$ is: $\frac{dy}{dx} = \frac{T_2(x)}{T_1(x)} = kL(x)$, where $k = g/T_0$. Applying the formula for arclength of a graph (§8.1), we get:

$$\frac{dy}{dx} = k \int_0^x \sqrt{1 + \left( \frac{dy(t)}{dt} \right)^2}\; dt.$$

Differentiating both sides by $\frac{d}{dx}$ and using the First Fundamental Theorem (§4.3): $\frac{d^2y}{dx^2} = k\sqrt{1 + (\frac{dy}{dx})^2}$, a second-order differential equation for $y = f(x)$. Since only the derivatives of $y$ appear in the equation, not $y$ itself, we can rewrite it in terms of a new variable $z = \frac{dy}{dx} = f'(x)$, as: $\frac{dz}{dx} = k\sqrt{1+z^2}$. This is a separable equation (§6.5):

$$\int \frac{dz}{\sqrt{1+z^2}} = \int k\, dx = kx + C .$$

The left-hand integral can be determined using the trigonometric substitution (§7.3) $z = \tan\theta$, $\sqrt{1+z^2} = \sec\theta$, $dz = \sec^2\theta\, d\theta$, then $\int \sec\theta\, d\theta$ from §7.2:

$$\int \frac{dz}{\sqrt{1+z^2}} = \int \frac{\sec^2\theta}{\sec\theta}\, d\theta = \int \sec\theta\, d\theta = \ln(\tan\theta + \sec\theta) = \ln\!\left(z + \sqrt{1+z^2}\right).$$

Solving $\ln(z + \sqrt{1+z^2}) = kx + C$ gives: $\frac{dy}{dx} = z = \frac{1}{2}\left(e^{kx+C} - e^{-kx-C}\right)$. Integrating:

$$y = \frac{1}{2}\int \left( e^{kx+C} - e^{-kx-C} \right) dx = \frac{e^{kx+C} + e^{-kx-C}}{2k} + B = \frac{1}{k}\cosh(kx+C) + B,$$

where we use the hyperbolic cosine $\cosh(x) = \frac{1}{2}(e^x + e^{-x})$ from §6.7. We can choose $k, C, B$ to adjust the curve to any given endpoints and length of chain.
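The catenary solution can be checked against the differential equation $y'' = k\sqrt{1 + (y')^2}$ by finite differences. This is a minimal sketch; the parameter values for $k$, $C$, $B$, the step size `h`, and the function names are arbitrary illustrative assumptions, not values from the notes.

```python
import math

K, C, B = 2.0, 0.3, -1.0     # arbitrary illustrative constants (assumptions)

def y(x):
    """Catenary y = (1/k) cosh(kx + C) + B."""
    return math.cosh(K * x + C) / K + B

def residual(x, h=1e-3):
    """y'' - k*sqrt(1 + y'^2), with both derivatives estimated by central differences."""
    y1 = (y(x + h) - y(x - h)) / (2 * h)
    y2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return y2 - K * math.sqrt(1 + y1 * y1)

for x in (-1.0, 0.0, 0.5, 1.0):
    print(f"x = {x}:  residual = {residual(x):.2e}")
```

The residuals are zero up to the finite-difference error, as expected for a solution of the equation.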

exercise: Imitate the above arguments to work out the shape of a suspension bridge, in which the chain has negligible weight, but the gravitational force on a segment is proportional to the length of the roadway suspended beneath: i.e. proportional to $x$.

Terminal velocity. In the standard model of air resistance for a falling object or a speeding car, the drag force is proportional to the square of the velocity $v$:

$$F_d = \tfrac{1}{2} C_d A \rho v^2,$$

where $C_d$ is the coefficient of drag, $A$ is cross-sectional area, and $\rho$ is air density. In a mass $m$, this force produces the deceleration:

$$a_d = \frac{C_d A \rho}{2m}\, v^2 = c v^2.$$

If we drop an object as in §2.7, we let $s(t)$ denote its height and $v = s'(t)$ its velocity. Its total acceleration $a(t) = s''(t)$ combines the constant acceleration $-g$ due to gravity with the above deceleration: $a = -g + cv^2$. This gives the differential equation:

$$s''(t) = -g + c\,(s'(t))^2.$$

If we write $s'(t) = v(t)$ and $s''(t) = v'(t) = \frac{dv}{dt}$ we get the simpler equation:

$$\frac{dv}{dt} = -g + cv^2.$$

As before, this can be solved by separation of variables (§6.5):

$$\int \frac{dv}{g - cv^2} = -\int dt = -t + C.$$

The substitution $u = \sqrt{c/g}\; v$ leads to an inverse hyperbolic integral (§6.7):

$$\int \frac{dv}{g - cv^2} = \frac{1}{g}\int \frac{dv}{1 - \left(\sqrt{c/g}\; v\right)^2} = \frac{1}{\sqrt{cg}} \int \frac{du}{1-u^2} = \frac{1}{\sqrt{cg}} \tanh^{-1}(u) = \frac{1}{\sqrt{cg}} \tanh^{-1}\!\left(\sqrt{c/g}\; v\right),$$

where $\tanh^{-1}(u) = \ln\sqrt{\frac{1+u}{1-u}}$. Equating this with $-t + C$ and solving for $v$ gives:

$$v(t) = -\sqrt{g/c}\; \tanh\!\left(\sqrt{gc}\; t + K\right),$$

where $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$. Integrating $v = \frac{ds}{dt}$ gives (§6.7):

$$\begin{aligned}
s(t) &= -\sqrt{g/c} \int \tanh\!\left(\sqrt{gc}\; t + K\right) dt \\
&= -\tfrac{1}{c} \ln\cosh\!\left(\sqrt{gc}\; t + K\right) + L \\
&= -\tfrac{1}{c} \ln\!\left( e^{\sqrt{gc}\, t + K} + e^{-\sqrt{gc}\, t - K} \right) + M,
\end{aligned}$$

where $K, M$ are arbitrary constants (with $M = L + \frac{1}{c}\ln 2$), and $\cosh(x) = \frac{e^x + e^{-x}}{2}$. If we assume the initial conditions $s(0) = 0$, $s'(0) = 0$, this gives $K = 0$, $L = 0$, $M = \frac{1}{c}\ln(2)$.

Near $t = 0$, this has Taylor series (§11.10) $s(t) = -\frac{1}{2}gt^2 + \frac{1}{12}cg^2t^4 + \cdots$. Thus at first, for small $t > 0$, the falling object closely follows the expected ballistic trajectory without air resistance, $s \approx -\frac{1}{2}gt^2$.

But eventually, for large $t \to \infty$, we get:

$$s(t) \approx -\tfrac{1}{c} \ln\!\left( e^{\sqrt{cg}\; t} \right) = -\sqrt{g/c}\; t .$$

This means the terminal velocity is $v_\infty = -\sqrt{g/c}$. Terminal velocity would be doubled by a g-force 4 times as strong, or by cutting the drag coefficient by a factor of 4. In the equivalent situation of a car with steady accelerating force countered by air resistance, this would mean 4 times the horsepower to double the speed, 9 times to triple.
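The closed-form velocity and height can be checked numerically against the differential equation, the small-$t$ Taylor series, and the terminal velocity. This is a minimal sketch with $K = 0$ (object dropped from rest at height 0); the numerical values of `g` and `c` are illustrative assumptions, not data from the notes.

```python
import math

g, c = 9.8, 0.05     # illustrative values (assumptions): gravity and drag constant

def v(t):
    """Velocity v(t) = -sqrt(g/c) * tanh(sqrt(g*c) * t), dropped from rest."""
    return -math.sqrt(g / c) * math.tanh(math.sqrt(g * c) * t)

def s(t):
    """Height s(t) = -(1/c) * ln cosh(sqrt(g*c) * t), with s(0) = 0."""
    return -math.log(math.cosh(math.sqrt(g * c) * t)) / c

h = 1e-5             # step for the central-difference derivative
for t in (0.5, 2.0, 5.0):
    dvdt = (v(t + h) - v(t - h)) / (2 * h)
    print(f"t = {t}:  dv/dt = {dvdt:.6f}   -g + c v^2 = {-g + c * v(t) ** 2:.6f}")

taylor = -0.5 * g * 0.1 ** 2 + c * g ** 2 * 0.1 ** 4 / 12   # two-term series at t = 0.1
print(f"s(0.1) = {s(0.1):.8f}   Taylor = {taylor:.8f}")
print(f"v(20) = {v(20.0):.6f}   -sqrt(g/c) = {-math.sqrt(g / c):.6f}")
```

With these values the velocity settles at about $-14$ (in the units of `g` and `c`), matching $v_\infty = -\sqrt{g/c}$.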

Math 132 &133 Collection of Methods and Theorems MSU Math

Methods (Theorems at end)

Man vs. machine. We list detailed methods for standard problems from Calculus I & II. Some require connecting conceptual levels (physical, geometric, numerical, algebraic), and cannot be automated. However, it is important to know even those methods that are best done by computer (such as curve sketching and integration), so as to check at least the general shape of the answer for yourself. If you let the computer do the thinking, not just the calculating, you are ready to blindly accept any bizarre wrong answer, and one typo error can escalate into disaster. You must check the computer's answer against your own reasonable expectations.

§2.8. Method for related rates problems

1. Draw a picture labeled with:

• numerical constant values
• letter variables and their known current values (at time $t = 0$)
• arrows showing known current rates of change (derivatives at $t = 0$)
• an arrow for the unknown rate of change which is desired (the target rate)

2. Write an equation relating the variables according to the geometry of the picture.

3. Assuming each variable is a function of time $t$, take the derivative $\frac{d}{dt}$ of both sides of the equation, with the Chain Rule producing derivatives of the variables. If necessary, solve the derivative equation for the derivative which is desired.

4. Plug in the current values of the variables and rates to compute the target rate.

§3.7. Method for optimization. We aim to find the maximum or minimum possible value of a target quantity within the constraints of a (usually geometric) situation.

1. Draw a picture labeled with numerical constant values and with letters for varying quantities, including: controlling variables to determine the shape; constrained variables required to have a fixed value; and the target variable to be optimized.

2. Write equations relating variables according to the geometry of the picture.

3. Choose one of the controlling variables (say, $x$) as the independent variable, and write all other variables as functions of it by solving the above equations. Also determine the relevant domain $x \in [a, b]$, which is often restricted by requiring all variables to be positive.

4. Find the absolute maximum/minimum of the target variable over its domain, say $T = T(x)$ over $x \in [a, b]$. That is, solve $T'(x) = 0$ or undef, to find the critical points $x = c_1, c_2, \ldots$, as well as the endpoints $x = a, b$. Take the output values $T(x)$ at these candidate points: the largest/smallest output is the desired max/min.

5. If needed, find values of the other variables at the optimum $x$. Make sure the answer is physically plausible to check for mistakes.

§3.8 Newton's Method. For an equation $f(x) = 0$, find a numerical solution $x$ with a specified accuracy, starting with a rough approximate solution $x \approx x_1$.

1. Numerically compute $x_2, x_3, \ldots$ according to the formula: $x_{n+1} = x_n - \dfrac{f(x_n)}{f'(x_n)}$, with at least the specified accuracy (number of decimal places).

2. Stop once the approximations no longer change: $x_n \approx x_{n+1}$ up to the given accuracy. The final approximate solution is $x \approx x_n$.
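The two steps above can be sketched as a short routine. This is a minimal illustration under stated assumptions: the function name `newton`, the parameter `places`, and the iteration cap `max_iter` are hypothetical choices, and "no longer change" is interpreted as two successive approximations differing by less than $0.5 \times 10^{-\text{places}}$.

```python
def newton(f, fprime, x1, places=6, max_iter=50):
    """Newton's Method: iterate x_{n+1} = x_n - f(x_n)/f'(x_n) from the guess x1.

    Stops when successive approximations agree to `places` decimal places.
    """
    tol = 0.5 * 10 ** (-places)
    x = x1
    for _ in range(max_iter):
        x_next = x - f(x) / fprime(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("approximations did not settle within max_iter steps")

# Example: solve x^2 - 2 = 0 starting from the rough guess x1 = 1.5.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5)
print(f"{root:.6f}")
```

The iteration cap guards against starting points for which the approximations never settle.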


§3.5. Method for Graphing. Given a function y = f(x).

1. Determine the derivatives $f'(x)$ and $f''(x)$ with Derivative Rules. Determine the domain of $f(x)$: for what $x$ the formula makes sense.

2. Solve $f'(x) = 0$ and $f'(x) = $ undef to find the critical points.

3. Make a sign table for $f'(x)$ to classify each critical point $x = a$:

                             x<a      x=a      x>a
   local max ∩      f'(x)     +        0        −
                    f(x)      ↗       f(a)      ↘
   local min ∪      f'(x)     −        0        +
                    f(x)      ↘       f(a)      ↗
   local max ∧      f'(x)     +      undef      −
                    f(x)      ↗       f(a)      ↘
   local min ∨      f'(x)     −      undef      +
                    f(x)      ↘       f(a)      ↗
   vert asymp ↗|↖   f'(x)     +       1/0       −
                    f(x)      +       1/0       +
   vert asymp ↗|↙   f'(x)     +       1/0       +
                    f(x)      +       1/0       −
   vert asymp ↘|↖   f'(x)     −       1/0       −
                    f(x)      −       1/0       +
   vert asymp ↘|↙   f'(x)     −       1/0       +
                    f(x)      −       1/0       −

Here $f(a)$ means the output value is defined; and 1/0 means a zero denominator at $x = a$ produces $\pm\infty$ values. There are other possibilities if $x = a$ is a discontinuity (see §1.8).

4. Solve $f''(x) = 0$ or undef to find inflection points $x = a$; we also require that $f'(a)$ exists and is a local max/min of $f'(x)$. Make a sign table for $f''(x)$ if concavity is needed: $f''(x) > 0$ means concave up (smiling), $f''(x) < 0$ means concave down (frowning).

5. Solve $f(x) = 0$ to find the $x$-intercepts; and compute the $y$-intercept $(0, f(0))$.

6. Find the behavior as $x \to \pm\infty$.

• Approximate by highest terms on top and bottom to get $f(x) \approx cx^p$.
• For a better approximation of a rational function $f(x) = \frac{g(x)}{h(x)}$, use polynomial long division to get $f(x) = q(x) + \frac{r(x)}{h(x)}$. If $f(x) = mx + b + \frac{r(x)}{h(x)}$, then $y = mx + b$ is a slant asymptote. In general, $y = f(x)$ asymptotically approaches $y = q(x)$ as $x \to \pm\infty$.

7. Check for symmetries: ways to move the graph onto itself.

• Side-to-side reflection symmetry for even function $f(-x) = f(x)$. examples: $x^2+3$, $x^4$, $\cos(x)$
• $180^\circ$ rotation symmetry for odd function $f(-x) = -f(x)$. examples: $2x$, $x^3$, $\sin(x)$
• Shift-sideways translation symmetry for periodic $f(x+c) = f(x)$. examples: $\cos(x+2\pi) = \cos(x)$, $\tan(x+\pi) = \tan(x)$.

8. Draw all the above features on the graph.

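§3.9, 4.5, 6.6, 6.7, 7.1-7.4. Method for integration. For a function $f(x)$, find the indefinite integral $\int f(x)\, dx = F(x) + C$, i.e. an antiderivative function with $F'(x) = f(x)$. For brevity, we omit the constant $+C$.

1. Basic integrals which directly reverse basic derivatives:

• $\int x^p\, dx = \frac{1}{p+1}x^{p+1}$ $(p \neq -1)$
• $\int \frac{1}{x}\, dx = \ln|x|$
• $\int e^x\, dx = e^x$
• $\int \sin(x)\, dx = -\cos(x)$
• $\int \cos(x)\, dx = \sin(x)$
• $\int \sec^2(x)\, dx = \tan(x)$
• $\int \tan(x)\sec(x)\, dx = \sec(x)$
• $\int \frac{1}{\sqrt{1-x^2}}\, dx = \sin^{-1}(x)$
• $\int \frac{1}{1+x^2}\, dx = \tan^{-1}(x)$
• $\int \frac{1}{x\sqrt{x^2-1}}\, dx = \sec^{-1}(x)$
• $\int \frac{1}{\sqrt{x^2-1}}\, dx = \cosh^{-1}(x) = \ln\!\left(x + \sqrt{x^2-1}\right)$
• $\int \frac{1}{\sqrt{1+x^2}}\, dx = \sinh^{-1}(x) = \ln\!\left(x + \sqrt{1+x^2}\right)$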

2. Substitution: Factor the integrand so that $\int f(x)\, dx = \int h(g(x)) \cdot g'(x)\, dx$. That is, find a factor $g'(x)$ which is a known derivative of some $g(x)$ appearing inside the other factor. To get $g'(x)$ exactly, perhaps multiply and divide by a constant. To find the outside $h(u)$, you may need to solve $u = g(x)$ as $x = g^{-1}(u)$.

Take $u = g(x)$, $du = g'(x)\, dx$, so that $\int h(g(x)) \cdot g'(x)\, dx = \int h(u)\, du = H(u)$. Restore the original variable: $\int f(x)\, dx = H(g(x))$.

3. Integration by Parts. Factor the integrand so that one factor is a known derivative $g'(x)$. Then:

$$\int f(x)\, dx = \int h(x)\cdot g'(x)\, dx = h(x)\cdot g(x) - \int g(x)\cdot h'(x)\, dx.$$

In Leibnitz notation, $\int u\, dv = uv - \int v\, du$.

Do the remaining integral $\int g(x)\cdot h'(x)\, dx$ by another method. Here $g(x)$ should be no more complicated than $g'(x)$, and $h'(x)$ should be simpler than $h(x)$.

4. Products of Trig Functions. Substitute by factoring out a derivative $g'(x) = \cos(x)$, $\sin(x)$, $\sec^2(x)$ or $\tan(x)\sec(x)$; and writing the remaining factor in terms of $u = g(x)$ using $\cos^2(x) + \sin^2(x) = 1$, $\tan^2(x) + 1 = \sec^2(x)$, $\tan(x) = \frac{\sin(x)}{\cos(x)}$.

Otherwise, use identities $\sin^2(x) = \frac{1}{2} - \frac{1}{2}\cos(2x)$, $\cos^2(x) = \frac{1}{2} + \frac{1}{2}\cos(2x)$.

A hard case: $\int \sec(x)\, dx = \ln|\tan(x) + \sec(x)|$. Also, any trig integral converts into a rational function integral by the Tangent Half-Angle Substitution (§7.3).

5. Reverse Trig Substitution. If $\sqrt{a^2 - x^2}$ appears in $\int f(x)\, dx$, complicate the integral by substituting $x = a\sin(\theta)$, $dx = a\cos(\theta)\, d\theta$; simplify using $\sqrt{a^2 - (a\sin(\theta))^2} = a\cos(\theta)$. Do the resulting trig integral, then restore $x$ using $\theta = \arcsin(\frac{x}{a})$.

Do the same for $\sqrt{x^2 - a^2}$ using $x = a\sec(\theta)$; and for $\sqrt{x^2 + a^2}$ using $x = a\tan(\theta)$.

6. Partial Fractions for integrating rational functions $f(x) = \frac{g(x)}{h(x)}$, where $g(x), h(x)$ are polynomials. If $g(x)$ has degree greater than or equal to $h(x)$, perform long division to get $f(x) = q(x) + \frac{r(x)}{h(x)}$, where $r(x)$ has degree less than $h(x)$.

If the denominator factors as $h(x) = (x-a)(x-b)\cdots$ with $a, b, \ldots$ all different, split $f(x)$ into the form: $f(x) = \frac{g(x)}{(x-a)(x-b)\cdots} = \frac{A}{x-a} + \frac{B}{x-b} + \cdots$. Solve for the constant $A$ after clearing denominators and substituting $x = a$; and similarly for the other constants $B, \ldots$. Finally, integrate using $\int \frac{A}{x-a}\, dx = A\ln|x-a|$.

If $h(x)$ has factors like $(x-a)^k$ or $ax^2 + bx + c$ with no real roots, see §7.4.

§11.7. Method for Convergence Testing. For a series $\sum_{n=1}^{\infty} a_n = a_1 + a_2 + a_3 + \cdots$, determine if it converges toward a limit as we add more terms, or diverges (often to $\infty$).

0. If $\lim_{n\to\infty} a_n \neq 0$, then the series diverges by the n-th Term Test (Vanishing Test).

1. Try to manipulate the series into a Standard Series:

• Geometric series: $\sum_{n=1}^{\infty} cr^{n-1} = c + cr + cr^2 + cr^3 + \cdots = \begin{cases} \frac{c}{1-r} & \text{for } |r| < 1 \\ \text{diverges} & \text{for } |r| \ge 1. \end{cases}$

• Standard p-series: $\sum_{n=1}^{\infty} \frac{1}{n^p} = 1 + \frac{1}{2^p} + \frac{1}{3^p} + \cdots = \begin{cases} \text{converges} & \text{for } p > 1 \\ \text{diverges} & \text{for } p \le 1. \end{cases}$

2. Estimate the fraction $a_n$ by taking only the largest terms in the numerator and denominator, obtaining a simple $b_n$ which is often a standard series. Convergence of $\sum a_n$ is likely to be the same as convergence of $\sum b_n$. Justify with a Test:

• Direct Comparison Test (positive $a_n$)

◦ Ceiling $0 \le a_n \le c_n$ where $\sum c_n$ converges $\Longrightarrow$ $\sum a_n$ also converges.

◦ Floor $0 \le d_n \le a_n$ where $\sum d_n$ diverges $\Longrightarrow$ $\sum a_n$ also diverges.

The ceiling $c_n$ or floor $d_n$ will usually be closely related to the estimate $b_n$.

• Limit Comparison Test (positive $a_n$): Determine $L = \lim_{n\to\infty} a_n / b_n$.

◦ $L < \infty$ and $\sum b_n$ converges $\Longrightarrow$ $\sum a_n$ also converges [$a_n < (L+\varepsilon) b_n$].∗

◦ $L > 0$ and $\sum b_n$ diverges $\Longrightarrow$ $\sum a_n$ also diverges [$a_n > (L-\varepsilon) b_n$].

3. Try the Integral Test if $a_n$ is positive and fairly simple, but not comparable to a standard series: e.g. $\frac{1}{n\ln(n)}$. For positive, decreasing, continuous $f(x)$ with $a_n = f(n)$, compute improper integral $\int_1^\infty f(x)\, dx = \lim_{N\to\infty} \int_1^N f(x)\, dx = \lim_{N\to\infty} F(N) - F(1)$.

◦ $\int_1^\infty f(x)\, dx$ converges $\Longrightarrow$ $\sum a_n$ also converges [$\sum_{n=1}^{\infty} a_n \le a_1 + \int_1^\infty f(x)\, dx$].

◦ $\int_1^\infty f(x)\, dx$ diverges $\Longrightarrow$ $\sum a_n$ also diverges [$\sum_{n=1}^{\infty} a_n \ge \int_1^\infty f(x)\, dx$].

4. Try the Ratio Test if $a_n$ has a growing number of factors, for example if it contains $r^n$ or $n!$. Determine $\lim_{n\to\infty} |a_{n+1}/a_n| = L$.

◦ $L < 1 \Longrightarrow \sum a_n$ converges [$a_n \le c(L+\varepsilon)^n$].

◦ $L > 1 \Longrightarrow \sum a_n$ diverges [$a_n \ge c(L-\varepsilon)^n$].

◦ $L = 1 \Longrightarrow$ no conclusion.

5. If $\sum a_n$ has positive and negative terms, try:

• Absolute Convergence: $\sum |a_n|$ converges $\Longrightarrow$ $\sum a_n$ also converges.

• Alternating Series: For $a_n = (-1)^n b_n$ with $b_n \ge 0$: $\lim_{n\to\infty} b_n = 0$, $b_n$ decreasing $\Longrightarrow$ $\sum a_n$ converges.

Error estimate: For $L = \sum_{n=1}^{\infty} a_n$, get $\left| L - \sum_{n=1}^{N} a_n \right| \le b_{N+1}$ for $N \ge 1$.

∗Most later tests are proved by reducing to a Direct Comparison, specified in [brackets].
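The alternating series error estimate from step 5 can be illustrated with a concrete series. The sketch below takes $a_n = (-1)^n/n$, so $b_n = 1/n$ decreases to 0; the choice of this series is an illustrative assumption, and its known sum $L = -\ln 2$ (from the series for $\ln(1+x)$ at $x = 1$) is used as the reference value.

```python
import math

L = -math.log(2)     # known sum of the series with a_n = (-1)^n / n

def partial_sum(N):
    """Sum of a_n = (-1)^n / n for n = 1..N."""
    return sum((-1) ** n / n for n in range(1, N + 1))

for N in (1, 5, 10, 100, 1000):
    error = abs(L - partial_sum(N))
    bound = 1 / (N + 1)          # b_{N+1}
    print(f"N = {N:5d}:  |L - S_N| = {error:.6f}  <=  b_(N+1) = {bound:.6f}")
```

The actual error stays below $b_{N+1}$ for every $N$, and shrinks only about as fast as $1/N$, which shows how slowly this particular series converges.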

§5.1, 5.2, 5.4, 10.2, 10.4. Method of slice analysis to compute size. Let $S$ be any measure of the size or bulk of a geometric object: length, area, volume, mass, etc. We want an integral formula to compute it.

1. Cut the object into slices whose position is determined by some variable $x \in [a, b]$.

2. Mark off the interval $[a, b]$ into $n$ increments of width $\Delta x = \frac{b-a}{n}$, each with a sample point $x_i$. This splits the object into $n$ slices, and summing up their sizes gives the total size: $S = \sum_{i=1}^{n} \Delta S_i$.

3. Because the slice at $x_i$ is so thin, we can find a good approximation of its size by some simple formula of the form $\Delta S_i \approx f(x_i)\, \Delta x$.

4. Taking $n \to \infty$ and $\Delta x \to 0$, the approximations become exact:

$$S = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i)\, \Delta x = \int_a^b f(x)\, dx .$$

5. Having expressed $S = \int_a^b f(x)\, dx$, we evaluate this integral by algebraic or numerical techniques.
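Steps 2 to 4 can be sketched numerically. The example below is an illustration chosen here, not one from the notes: it slices a ball of radius $r$ into thin disks perpendicular to the $x$-axis, so the slice at $x_i$ has volume about $\pi(r^2 - x_i^2)\,\Delta x$, and compares the Riemann sum with the known volume $\frac{4}{3}\pi r^3$. The function name `riemann_sum` and the use of midpoints as sample points are assumptions.

```python
import math

def riemann_sum(f, a, b, n):
    """Riemann sum of f over [a, b] with n increments, sampled at midpoints."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

r = 1.0

def disk_area(x):
    """Cross-sectional area of the ball at position x: a disk of radius sqrt(r^2 - x^2)."""
    return math.pi * (r * r - x * x)

for n in (10, 100, 1000):
    print(f"n = {n:5d}:  sum = {riemann_sum(disk_area, -r, r, n):.6f}")
print(f"exact:      4/3 pi r^3 = {4 / 3 * math.pi * r ** 3:.6f}")
```

As $n$ grows, the sums approach the exact volume, which is the limit in step 4.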

Theorems. Key theoretical results from Calculus I & II.

§1.5 Limit definitions.

• $\lim_{x\to a} f(x) = L$ means that $f(x)$ can be forced arbitrarily close to $L$ by making $x$ sufficiently close to (but unequal to) $a$.

• $\lim_{x\to a} f(x) = \infty$ means that $f(x)$ can be forced to be arbitrarily large by making $x$ sufficiently close to (but unequal to) $a$.

• $\lim_{x\to a^+} f(x) = L$ means that $f(x)$ can be forced arbitrarily close to $L$ by making $x$ sufficiently close to (but larger than) $a$.

§1.6 Squeeze Theorem: Suppose $f(x) \le g(x) \le h(x)$ for all $x$ near $a$ (except possibly $x = a$), and $\lim_{x\to a} f(x) = \lim_{x\to a} h(x) = L$. Then $\lim_{x\to a} g(x) = L$.

§1.8 Continuity definition. A function $f(x)$ is continuous at $x = a$ whenever $\lim_{x\to a} f(x) = f(a)$. Graphically, a function is continuous whenever the graph $y = f(x)$ proceeds through the point $(a, f(a))$ without jumps or holes.

Types of discontinuity.

• Removable discontinuity: $f(a)$ is undefined, but $\lim_{x\to a} f(x)$ exists.

• Removable discontinuity: $f(a)$ and $\lim_{x\to a} f(x)$ exist, but are unequal.

• Jump discontinuity: the left and right limits are unequal, $\lim_{x\to a^+} f(x) \neq \lim_{x\to a^-} f(x)$.

• Vertical asymptote: $\lim_{x\to a^+} f(x)$ and/or $\lim_{x\to a^-} f(x)$ are $\pm\infty$.

• Essential discontinuity: $\lim_{x\to a^+} f(x)$ and/or $\lim_{x\to a^-} f(x)$ do not exist.

§1.8 Intermediate Value Theorem (IVT): If $f(x)$ is continuous for $x \in [a, b]$, and $r$ is between $f(a)$ and $f(b)$, then there is a value $c \in (a, b)$ such that $f(c) = r$; that is, $f(x)$ must pass through every value $r$ between $f(a)$ and $f(b)$.

§2.1 Derivative definition: The derivative of $f(x)$ at $x = a$ means

$$f'(a) = \lim_{h\to 0} \frac{f(a+h) - f(a)}{h} = \lim_{x\to a} \frac{f(x) - f(a)}{x - a}.$$

The function is differentiable at $x = a$ if $f'(a)$ exists.

§2.2 Continuity Theorem. If $f(x)$ is differentiable at $x = a$, then $f(x)$ is also continuous at $x = a$.

§3.1 Extremal Value Theorem (EVT): If $f(x)$ is continuous on the closed, finite interval $x \in [a, b]$, then $f(x)$ possesses at least one absolute maximum point and one absolute minimum point.

§3.1 First Derivative Theorem: if $f(x)$ has a local maximum or minimum over $x \in [a, b]$ at $x = c \in (a, b)$, and $f'(c)$ exists, then $f'(c) = 0$.

§3.2 Mean Value Theorem (MVT): If $f(x)$ is continuous on the closed interval $x \in [a, b]$ and differentiable on the open interval $x \in (a, b)$, then there is some point $c \in (a, b)$ with

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

That is, the tangent line to the graph $y = f(x)$ at some point $(c, f(c))$ must be parallel to the secant line from $(a, f(a))$ to $(b, f(b))$.

§3.2 Uniqueness Theorem: If $f(x), g(x)$ have the same derivative $f'(x) = g'(x)$ for all $x \in (a, b)$, and the same initial value $f(c) = g(c)$ for some $c \in [a, b]$, then $f(x) = g(x)$ for all $x \in [a, b]$. That is, there can be only one function with a given derivative and a given initial value.

§3.3 Increasing/Decreasing Theorem: Let $f(x)$ be continuous for $x \in [a, b]$.

• If $f'(x) > 0$ for all $x \in (a, b)$, then $f(x)$ is strictly increasing: $f(p) < f(q)$ for $p < q$.

• If $f'(x) \ge 0$ for all $x \in (a, b)$, then $f(x)$ is increasing: $f(p) \le f(q)$ for $p < q$.

• Similarly for $f'(x) < 0$ and $f(x)$ decreasing.

§4.2 Integral Definition: Given a function $f(x)$ on an interval $x \in [a, b]$.

• Divide $[a, b]$ into $n$ increments of width $\Delta x = \frac{b-a}{n}$, and choose sample points $x_1, \ldots, x_n$ with one in each increment: $x_i \in [a + (i-1)\Delta x,\ a + i\Delta x]$. Define the integral as a limit of Riemann sums:

$$\int_a^b f(x)\, dx = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i)\Delta x = \lim_{n\to\infty} \big( f(x_1)\Delta x + \cdots + f(x_n)\Delta x \big).$$

• In an upper Riemann sum, choose the sample points so that $f(x_i)$ is maximal in its increment $x \in [a + (i-1)\Delta x,\ a + i\Delta x]$; then the sum gives an overestimate of the integral. Similarly for a lower Riemann sum giving an underestimate.

• The function $f(x)$ is integrable over $[a, b]$ whenever the above limit exists for every possible choice of sample points $x_i$.

• Theorem: Any continuous function, or even a function with a finite list of removable or jump discontinuities, is integrable over $[a, b]$.

§4.3 First Fundamental Theorem of Calculus (FTC1): Let $f(x)$ be continuous on $x \in [a, b]$ and define $I(x) = \int_a^x f(t)\, dt$. Then:

$$I'(x) = \frac{d}{dx}\left( \int_a^x f(t)\, dt \right) = f(x) .$$

That is, the rate of change of the cumulative effect of $f(t)$ over $t \in [a, x]$ is the strength of the effect $f(x)$ at the endpoint $t = x$.

§4.3 Second Fundamental Theorem. If $f(x)$ has a known anti-derivative $F(x)$ with $F'(x) = f(x)$, then:

$$\int_a^b f(x)\, dx = F(b) - F(a) .$$

That is, the cumulative effect of the rate of change $f(x) = F'(x)$ is the total change $F(b) - F(a)$.

§4.4 Average definition: The average of $f(x)$ over $x \in [a, b]$ is $f_{\text{ave}} = \frac{1}{b-a}\int_a^b f(x)\, dx$.

§6.1 Inverse functions: Consider a function $f : A \to B$ with inputs in the set $A$ and outputs covering the set $B$.

• Suppose $f$ is one-to-one, meaning if $x_1 \neq x_2$, then $f(x_1) \neq f(x_2)$, that is, the graph $y = f(x)$ satisfies the horizontal line test.

• Define the inverse function $f^{-1} : B \to A$ as $f^{-1}(b) = a$, where $a \in A$ is the unique value with $f(a) = b$.

• The function $f$ and its inverse $f^{-1}$ undo each other: $f^{-1}(f(a)) = a$ and $f(f^{-1}(b)) = b$ for all $a \in A$, $b \in B$.

• If $f(x)$ is differentiable at $x = a$, and $b = f(a)$, then $f^{-1}(y)$ is differentiable at $y = b$, and

$$(f^{-1})'(b) = \frac{1}{f'(a)} = \frac{1}{f'(f^{-1}(b))}.$$

In Leibnitz notation with $y = f(x)$ and $x = f^{-1}(y)$: $\left.\dfrac{dx}{dy}\right|_{y=b} = 1 \Big/ \left( \left.\dfrac{dy}{dx}\right|_{x=a} \right)$.

§6.8 L'Hopital's Rule: For functions $f(x), g(x)$, suppose $f'(x), g'(x)$ exist and $g'(x) \neq 0$, on some interval $x \in (a-\delta, a+\delta)$. Suppose that either:

$$\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0 \quad\text{or}\quad \lim_{x\to a} |f(x)| = \lim_{x\to a} |g(x)| = \infty.$$

Then:

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)},$$

provided the right side limit exists, or equals $\infty$ or $-\infty$.

Let $f(x), g(x)$ be functions which are differentiable and $g'(x) \neq 0$, on a semi-infinite interval $x \in (c, \infty)$. Suppose that either:

$$\lim_{x\to\infty} f(x) = \lim_{x\to\infty} g(x) = 0 \quad\text{or}\quad \lim_{x\to\infty} |f(x)| = \lim_{x\to\infty} |g(x)| = \infty.$$

Then:

$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \frac{f'(x)}{g'(x)},$$

provided the right side limit exists, or equals $\infty$ or $-\infty$. All this also holds with $x \to \infty$ replaced with $x \to -\infty$.
