GENERAL RELATIVITY

COSMOLOGY

- A Quick Guide -

Huan Q. Bui

Colby College

PHYSICS & MATHEMATICS, Statistics

Class of 2021

July 31, 2019

Preface

Greetings,

General Relativity & Cosmology - A Quick Guide is compiled from my PH335: General Relativity and Cosmology with professor Robert Bluhm at Colby College. The course is based on A Short Course in General Relativity, 3rd Edition, by James Foster and J. David Nightingale.

One of my favorite things to do is to transfer my handwritten lecture notes into nice LaTeX documents. I started working on these documents last year with Special Relativity - A Quick Guide, which was based on (part of) my PH241: Modern Physics I with professor Charles Conover. I am still working on it, along with another one on Linear Algebra (based on MA253 with Otto Bretscher), and of course General Relativity, the one you are reading now. All of these .pdf files can be found on my website, under the “A Quick Guide to...” tab. For the “raw” content, i.e., my handwritten lecture notes, feel free to visit the “Lecture Notes” tab.

Today is Dec 22, 2018. In the spring I will be doing an independent study on Classical Field Theory with prof. Bluhm. I will also be taking Matrix Analysis with prof. Leo Livshits. I’m excited to be taking these advanced physics and mathematics courses, and I look forward to sharing my journey with you through my lecture notes.

Enjoy!

Contents

Preface

1 Overview and Review
  1.1 Review of Special Relativity
  1.2 The Equivalence Principle
  1.3 Versions of the Equivalence Principle
    1.3.1 The Strong Equivalence Principle
    1.3.2 The Weak Equivalence Principle

2 Review of Vector Calculus
  2.1 Operations & Theorems
  2.2 Coordinate Systems

3 Flat 3-dimensional space
  3.1 Curvilinear coordinates
  3.2 Basis vectors
  3.3 Natural basis
  3.4 Dual basis
  3.5 Contravariant and covariant vectors
    3.5.1 The suffix notation (vector calculus)
    3.5.2 The Einstein summation notation
    3.5.3 Contravariant and covariant vectors
  3.6 Metric tensor
    3.6.1 Summations as matrix products
  3.7 Coordinate transformation
    3.7.1 Vectors
    3.7.2 Tensors
  3.8 Scalars

4 Flat spacetime
  4.1 Special Relativity
    4.1.1 The Lorentz Transformations, revisited
    4.1.2 A curiosity about the Lorentz boosts
    4.1.3 The Poincare transformations
    4.1.4 Velocity, momentum, and force
  4.2 Relativistic Electrodynamics

5 Curved spaces
  5.1 Geodesics
  5.2 2-dimensional curved spaces
  5.3 Manifolds
  5.4 Tensors on manifolds
    5.4.1 Combining tensors
    5.4.2 Special tensors

6 Gravitation and Curvature
  6.1 Curvature
  6.2 Geodesics and Affine connections Γσµν
    6.2.1 Flat 3D space
  6.3 Geodesics in curved space
  6.4 Parallel transport
  6.5 Curved Spacetime
  6.6 Principle of covariance
  6.7 Absolute and Covariant differentiation
    6.7.1 Absolute differentiation
    6.7.2 Covariant differentiation
  6.8 Newtonian limit
    6.8.1 Overview
    6.8.2 Weak limit of General Relativity

7 Einstein’s field equations
  7.1 The stress-energy tensor Tµν
  7.2 Riemann curvature tensor Rλµνσ
  7.3 The Einstein equations
  7.4 Schwarzschild solution

8 Predictions and tests of general relativity
    8.0.1 Lengths
  8.1 Times
  8.2 Gravitational redshift
  8.3 Radar time-delay experiments
    8.3.1 Experiments of Shapiro (1968 - 1971)
  8.4 Particle Motion in Schwarzschild geometry
    8.4.1 Planar motion θ = π/2
    8.4.2 Light motion with θ = π/2
    8.4.3 Can light have circular orbit?
  8.5 Other tests of GR
  8.6 Black Holes
    8.6.1 Radial trajectories of massive objects
    8.6.2 Light signal

9 Cosmology
  9.1 Large-scale geometry of the universe
    9.1.1 Cosmological principle
    9.1.2 Robertson-Walker (flat, open, closed) geometries
    9.1.3 Expansion of the universe
    9.1.4 Distances and speeds
    9.1.5 Redshifts
  9.2 Dynamical evolution of the universe
    9.2.1 The Friedmann equations
    9.2.2 The cosmological constant Λ
    9.2.3 Equations of state
    9.2.4 A matter-dominated universe (Λ = 0) [Friedmann models]
    9.2.5 A flat, matter-dominated universe (Λ = 0) [old favorite model]
  9.3 Observational cosmology
    9.3.1 Hubble law
    9.3.2 Acceleration of the universe
    9.3.3 Matter densities and dark matter
    9.3.4 The flatness and horizon problems
    9.3.5 Cosmic Microwave Background (CMB) anisotropy
  9.4 Modern cosmology
    9.4.1 Inflation
    9.4.2 Dark energy (The cosmological constant problem)
    9.4.3 Concordance model [new favorite model]
    9.4.4 Open questions

Chapter 1

Overview and Review

General relativity is a theory of gravity that replaces Newton’s law of gravitation and gives more precise predictions, especially for heavy masses. However, we should keep in mind that general relativity is not yet compatible with quantum mechanics. There are numerous open problems in physics related to reconciling gravity and quantum mechanics.

1.1 Review of Special Relativity

Special relativity studies kinematics and dynamics in inertial reference frames moving relative to one another. Most of special relativity and its consequences are encoded in the Lorentz transformations. A classic example of a Lorentz transformation, often the focus of introductory special relativity, is the “Lorentz boost” in the x-direction. Note that there is nothing special about x, y, or z. The x-Lorentz boost is essentially a coordinate transformation: given the coordinates of an event A in frame (S), we can calculate the coordinates of A in frame (S′):

\[
\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix}
\gamma & -\gamma\beta & 0 & 0 \\
-\gamma\beta & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix},
\]

where β = v/c and the Lorentz factor is

\[
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}.
\]

Since we are studying relativity, it is important to look at “invariants”: quantities that do not change under transformations. In special relativity, one such invariant is the spacetime invariant:

\[
\begin{aligned}
(\Delta S)^2 &= (c\Delta\tau)^2 \\
&= (c\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2 \\
&= (c\Delta t')^2 - (\Delta x')^2 - (\Delta y')^2 - (\Delta z')^2 \\
&= (\Delta S')^2 = (c\Delta\tau')^2.
\end{aligned}
\]

This can be readily shown. In fact, we have verified this in Special Relativity: A Quick Guide. This quantity, roughly speaking, is a measure of proper distances and times. If we go to a rest frame of event A, such that ∆x′ = ∆y′ = ∆z′ = 0, then we get ∆t′ = ∆τ, where τ denotes the proper time, which is the time measured in the rest frame. This gives (∆S)² = (c∆τ)².
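The invariance of the interval can also be checked numerically. The following is a minimal sketch, assuming an arbitrary pair of sample events and β = 0.6; the function names `boost_x` and `interval_squared` are illustrative and not part of the original notes.

```python
import math

def boost_x(event, beta):
    """Apply the x-direction Lorentz boost to event = (ct, x, y, z)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    ct, x, y, z = event
    return (gamma * (ct - beta * x), gamma * (x - beta * ct), y, z)

def interval_squared(a, b):
    """(Delta S)^2 = (c dt)^2 - dx^2 - dy^2 - dz^2 between events a and b."""
    d = [q - p for p, q in zip(a, b)]
    return d[0]**2 - d[1]**2 - d[2]**2 - d[3]**2

# Sample events (assumed values), with ct measured in the same length unit as x, y, z.
event_a = (0.0, 0.0, 0.0, 0.0)
event_b = (5.0, 3.0, 1.0, 2.0)
beta = 0.6

s2 = interval_squared(event_a, event_b)
s2_prime = interval_squared(boost_x(event_a, beta), boost_x(event_b, beta))
print(s2, s2_prime)  # the two values agree up to floating-point rounding
```

The coordinates of event B change under the boost, but the combination (c∆t)² − (∆x)² − (∆y)² − (∆z)² does not.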

In relativity in general, and in Minkowski spacetime in particular, we are interested in two types of objects: scalars and vectors. Scalars are invariant under general coordinate transformations. An example of a scalar is the spacetime invariant. Another scalar, which we probably will not see again in this text, is the metric signature, sgn([η_{µν}]) = −2. Vectors, on the other hand, are a different type of object: they all transform in the same way under coordinate transformations, but their components are not invariant, i.e., the components are not necessarily the same in different coordinate systems. In 4-dimensional spacetime, we work with 4-vectors, which simply means 4-component vectors.

In special relativity, we often talk about position vectors $x = (ct, x, y, z)^\top = (ct, \vec{x})^\top$ and energy-momentum vectors $p = (E/c, p_x, p_y, p_z)^\top = (E/c, \vec{p})^\top$. These vectors transform under the Lorentz transformation.

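Example 1.1.1. The x-Lorentz boost applied to p gives

\[
\begin{pmatrix} E'/c \\ p'_x \\ p'_y \\ p'_z \end{pmatrix}
=
\begin{pmatrix}
\gamma & -\gamma\beta & 0 & 0 \\
-\gamma\beta & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} E/c \\ p_x \\ p_y \\ p_z \end{pmatrix}.
\]

Notice that E/c transforms like ct (time) and $\vec{p}$ transforms like $\vec{x}$ (space). We also have the following invariant:

\[
\frac{E^2}{c^2} - p_x^2 - p_y^2 - p_z^2 = \frac{E^2}{c^2} - \vec{p}\cdot\vec{p}.
\]

But recall that $(mc)^2 = E^2/c^2 - \vec{p}\cdot\vec{p}$, so

\[
\frac{E^2}{c^2} - p_x^2 - p_y^2 - p_z^2 = (mc)^2.
\]

If we go to a rest frame, such that $\vec{p} = \vec{0}$, then we obtain the famous rest mass-energy equivalence: E = mc².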
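As a rough numerical illustration of this example, the sketch below builds the energy-momentum vector of a particle moving along x, checks that E²/c² − p·p equals (mc)², and boosts into the rest frame. The particle mass and speed are assumed sample values, and the helper names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_momentum(mass, ux):
    """Return (E/c, px, py, pz) for a particle of the given mass moving along x at speed ux."""
    gamma_u = 1.0 / math.sqrt(1.0 - (ux / C)**2)
    return (gamma_u * mass * C, gamma_u * mass * ux, 0.0, 0.0)

def boost_x(p, beta):
    """Apply the x-direction Lorentz boost to a 4-vector p = (p0, p1, p2, p3)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return (gamma * (p[0] - beta * p[1]), gamma * (p[1] - beta * p[0]), p[2], p[3])

def minkowski_norm_squared(p):
    """p . p = p0^2 - p1^2 - p2^2 - p3^2."""
    return p[0]**2 - p[1]**2 - p[2]**2 - p[3]**2

mass = 9.109e-31          # kg, roughly an electron mass (assumed sample value)
p_lab = four_momentum(mass, 0.6 * C)
p_rest = boost_x(p_lab, 0.6)   # boosting with the particle's own beta gives its rest frame

invariant_lab = minkowski_norm_squared(p_lab)
invariant_rest = minkowski_norm_squared(p_rest)
rest_energy = p_rest[0] * C    # E = c * (E/c) in the rest frame
```

In the rest frame the spatial momentum vanishes (up to rounding) and `rest_energy` matches mc².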


Some quantities associated with a vector, such as its norm, can be invariant (scalar). In fact, the spacetime invariant in Minkowski space is nothing but a dot product of a vector with itself:

\[
a \cdot a = a^0 a^0 - a^1 a^1 - a^2 a^2 - a^3 a^3,
\]

where 0, 1, 2, 3 are indices, and a denotes a 4-vector, which should be distinguished from the 3-vector $\vec{a} = (a^1, a^2, a^3)^\top$. Notice that the dot product in Minkowski spacetime is defined differently from that in the flat 3-dimensional Cartesian coordinate system. In the following chapters, we will explore how dot products are defined in general. Hint: the metric tensor plays an important role.

1.2 The Equivalence Principle

In 1907, Albert Einstein had the “happiest thought of his life” when he realized that in a freely falling frame (non-rotating and not otherwise accelerated), the effects of gravity go away, i.e., there is an equivalence between gravity and acceleration such that they can “undo” each other. For example, the following two situations are equivalent in terms of the acceleration experienced by the observer: (i) a person standing on Earth, and (ii) a person inside an elevator accelerating upwards at rate g in free space (no gravitational field). Likewise, the following two situations are also equivalent: (iii) a person floating in free space, and (iv) a person inside an airplane free falling towards Earth (this is commercially known as a “zero-G flight”). We arrive at the statement of the Equivalence Principle:

“A small, non-rotating, freely falling frame in a $\vec{g}$ field is an inertial frame.”

The above statement is a direct result of Galileo’s discovery that all objects have the same acceleration due to gravity. This result may seem a little bit circular, because it is actually a coincidence that the two roles of mass, (i) to cause gravitational force, like charge in an electric field:

\[
\vec{F} = \frac{G M m_G}{r^3}\,\vec{r} = m_G\,\vec{g}
\]

and (ii) to measure inertia:

\[
\vec{F} = m_I\,\vec{a}
\]

are the same, i.e., $m_I = m_G$. It could have been that “gravitational mass” $m_G$ and “inertial mass” $m_I$ are not the same, in which case $\vec{a} \neq \vec{g}$, and the equivalence principle does not hold. However, the famous Eötvös experiment, which compared inertial mass and gravitational mass, has shown that

\[
\frac{|m_G - m_I|}{m_I} \leq 10^{-10}.
\]

While the observations we have made so far can seem self-evident, one striking consequence of the equivalence principle is the bending of light around massive objects. Consider a light ray traveling horizontally (left to right) past an upward-accelerating elevator with an observer inside. Since the observer is accelerating upwards, he sees the light beam as bent, entering at the top-left of the elevator and exiting at the bottom-right. Now, according to the equivalence principle, the upward-accelerating scenario is equivalent to the presence of a massive object (such that the observer experiences the same acceleration). This means that the observer, now no longer in the elevator, “sees” the light as bent around this massive object. Fig. 1.1 illustrates this argument.

Figure 1.1: (a) A person in an upward-accelerating elevator (hence “Ground” is downward-accelerating at rate g) sees light as bent. (b) A person on a massive planet also sees light as bent.

General relativity predicts that light going past Earth’s surface will “fall” by approximately 1 Å, which is not observable. However, for a much more massive object like the Sun, general relativity predicts a bending of 1.75′′ (arcseconds). This prediction was verified by the glorious experiment of Arthur Eddington.

Note that we could argue for the bending of light using Newtonian physics. However, in order to get the correct predictions for the bending of light, we need general relativity. We will explore the reason behind this discrepancy in the following chapter. But roughly speaking, spacetime is assumed to be flat in Newtonian physics, while spacetime is curved by massive objects, according to general relativity. One might ask: “How do we view falling objects on Earth as due to the curvature of spacetime?” The answer requires bringing back Minkowski spacetime diagrams.
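The 1.75′′ figure for the Sun can be reproduced with a short calculation. This sketch assumes the standard weak-field deflection formula δ = 4GM/(c²b) for light grazing a mass M at impact parameter b, which is not derived in this chapter, and it takes the Newtonian-style estimate to be half of that value. The solar constants are approximate reference values, and the function names are illustrative.

```python
import math

C = 299_792_458.0            # speed of light, m/s
GM_SUN = 1.32712440018e20    # gravitational parameter G*M of the Sun, m^3/s^2 (approximate)
R_SUN = 6.957e8              # solar radius, m (approximate)

def deflection_gr(gm, impact_parameter):
    """Assumed weak-field GR deflection angle in radians: 4GM / (c^2 b)."""
    return 4.0 * gm / (C**2 * impact_parameter)

def deflection_newtonian(gm, impact_parameter):
    """Newtonian-style estimate, assumed here to be half the GR value: 2GM / (c^2 b)."""
    return 2.0 * gm / (C**2 * impact_parameter)

def to_arcsec(radians):
    return math.degrees(radians) * 3600.0

gr_arcsec = to_arcsec(deflection_gr(GM_SUN, R_SUN))
newton_arcsec = to_arcsec(deflection_newtonian(GM_SUN, R_SUN))
print(f"GR: {gr_arcsec:.2f} arcsec, Newtonian-style: {newton_arcsec:.2f} arcsec")
```

Under these assumptions the Newtonian-style estimate falls short of the observed bending by a factor of two, which is the discrepancy referred to above.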

1.3 Versions of the Equivalence Principle

There are two versions of the equivalence principle, referred to as the strong equivalence principle, SEP, and the weak equivalence principle, WEP.

Figure 1.2: World-lines of an object traveling at constant velocity v and of an object that is falling. Note that the “curvature” due to g here is extremely exaggerated.

1.3.1 The Strong Equivalence Principle

The strong equivalence principle states that all of physics reduces to special relativity in a freely falling frame.

1.3.2 The Weak Equivalence Principle

The weak equivalence principle states that all point particles fall at the same rate in a gravitational field ($m_G = m_I$). We notice that WEP only applies to gravity, which makes it sufficient to develop general relativity, but not quantum mechanics. For the remainder of these notes, we will rely on the WEP.

Chapter 2

Review of Vector Calculus

2.1 Operations & Theorems

Recall that a path in 3-dimensional space can be parameterized by a single variable, which we will call t, as

\[
\vec{r}(t) = x(t)\,\hat{\imath} + y(t)\,\hat{\jmath} + z(t)\,\hat{k},
\]

where the tangent to this path is given by

\[
\dot{\vec{r}}(t) = \frac{d\vec{r}}{dt}.
\]

The length of the underlying curve of this path is the integral over the line element $ds = \|d\vec{r}\| = \|\dot{\vec{r}}\|\,dt$:

\[
L = \int \|\dot{\vec{r}}\|\,dt.
\]

Example 2.1.1. Consider a path in R² defined as $\vec{r}(t) = (t, \sin t)$, with t ∈ [−2π, 2π]. We instantly recognize that the underlying curve of this path is nothing but a sine curve in the xy-plane. The length of this curve is

\[
L = \int_{-2\pi}^{2\pi} \|(1, \cos t)\|\,dt = \int_{-2\pi}^{2\pi} \sqrt{1 + \cos^2 t}\,dt \approx 15.28.
\]
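The integral in this example has no elementary closed form, so the value 15.28 comes from numerical integration. A minimal sketch using composite Simpson's rule (one possible choice of method; the helper `simpson` is written here for illustration) is:

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def speed(t):
    """||dr/dt|| for the path r(t) = (t, sin t)."""
    return math.sqrt(1.0 + math.cos(t)**2)

length = simpson(speed, -2.0 * math.pi, 2.0 * math.pi)
print(round(length, 2))
```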

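Next, we consider a vector-valued function $\vec{F} : \mathbb{R}^3 \to \mathbb{R}^3$ defined as $\vec{F}(\vec{r}) = (F_x(x, y, z), F_y(x, y, z), F_z(x, y, z))$. Recall the two “del operators” from vector calculus: the curl and the div, whose definitions are

\[
\operatorname{curl}(\vec{F}) = \vec{\nabla} \times \vec{F} = \det
\begin{pmatrix}
\hat{\imath} & \hat{\jmath} & \hat{k} \\
\frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
F_x & F_y & F_z
\end{pmatrix}
\]

\[
\operatorname{div}(\vec{F}) = \vec{\nabla} \cdot \vec{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}
\]

where the “del” symbol represents the operator

\[
\vec{\nabla} = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right).
\]

We have also seen the gradient of a scalar field (or potential) $f : \mathbb{R}^3 \to \mathbb{R}$. The gradient is defined as

\[
\vec{\nabla} f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right).
\]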

There are two kinds of potential functions in vector calculus: scalar potentials and vector potentials. While we are more familiar with the scalar potential (e.g. the gravitational potential or the electric potential in electromagnetism), vector potentials do not have as direct a physical interpretation. However, we should still look at some examples.

Example 2.1.2. In electromagnetism, an electric field $\vec{E}$ is defined as the negative gradient of the electric potential φ, measured in volts:

\[
\vec{E} = -\vec{\nabla}\phi.
\]

On the other hand, a magnetic field $\vec{B}$ is defined as the curl of the magnetic potential $\vec{A}$:

\[
\vec{B} = \vec{\nabla} \times \vec{A}.
\]

In this case, $\vec{A}$ is a vector potential. Note that we don’t often talk about a vector potential of an electric field. This is because for a point charge, which produces an inverse-square field, the vector potential simply does not exist. The proof is straightforward. For curious readers, this proof can be an interesting exercise (the solution should be a one-liner). Or else, please refer to my vector calculus notes on my website. Hint: we need one of the following two fundamental theorems of vector calculus: Stokes’ and Gauss’ (or Ostrogradsky’s).

Stokes’ theorem states that the circulation of a vector field $\vec{F}$ around a closed curve C in R³ is the flux of $\vec{\nabla} \times \vec{F}$ through any surface S whose boundary is that curve, C = ∂S:

\[
\oint_{C=\partial S} \vec{F} \cdot d\vec{s} = \int_{S} (\vec{\nabla} \times \vec{F}) \cdot d\vec{S}.
\]

Gauss’ theorem states that, if S is a closed surface, oriented outward, and if $\vec{F}$ is defined throughout the solid region W enclosed by S, then the flux of $\vec{F}$ through S is the total divergence of $\vec{F}$ through the region W:

\[
\oint_{S=\partial W} \vec{F} \cdot d\vec{S} = \int_{W} \vec{\nabla} \cdot \vec{F}\,dV.
\]
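Gauss' theorem can be sanity-checked numerically. The sketch below uses a hypothetical sample field F = (x³, y³, z³), whose divergence is 3(x² + y² + z²), on the unit ball, and compares the outward flux through the sphere with the volume integral of the divergence using a simple midpoint rule. The field and the helper names are assumptions chosen for illustration; the exact value of both sides for this field is 12π/5.

```python
import math

def field(x, y, z):
    """Hypothetical sample vector field F = (x^3, y^3, z^3)."""
    return (x**3, y**3, z**3)

def divergence(x, y, z):
    """Divergence of the sample field: 3(x^2 + y^2 + z^2)."""
    return 3.0 * (x**2 + y**2 + z**2)

def to_cartesian(r, theta, phi):
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def flux_through_sphere(radius, n=200):
    """Midpoint-rule estimate of the outward flux of F through the sphere of given radius."""
    dtheta, dphi = math.pi / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        for j in range(n):
            phi = (j + 0.5) * dphi
            x, y, z = to_cartesian(radius, theta, phi)
            fx, fy, fz = field(x, y, z)
            # The outward unit normal on a sphere centered at the origin is (x, y, z)/radius.
            f_dot_n = (fx * x + fy * y + fz * z) / radius
            total += f_dot_n * radius**2 * math.sin(theta) * dtheta * dphi
    return total

def divergence_integral(radius, n=40):
    """Midpoint-rule estimate of the volume integral of div F over the ball."""
    dr, dtheta, dphi = radius / n, math.pi / n, 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            for k in range(n):
                phi = (k + 0.5) * dphi
                x, y, z = to_cartesian(r, theta, phi)
                total += divergence(x, y, z) * r**2 * math.sin(theta) * dr * dtheta * dphi
    return total

flux = flux_through_sphere(1.0)
volume_side = divergence_integral(1.0)
print(flux, volume_side, 12.0 * math.pi / 5.0)
```

The two estimates agree with each other and with 12π/5 to within the discretization error of the grids.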

Line integrals are one of a few important operations in vector calculus. In the context of undergraduate or high school physics, we often use the line integral to calculate the work done by a force $\vec{F}$ over some path $\vec{r}$. The integral in general is simply an accumulation of the component of the vector field along the path:

\[
\int_{C} \vec{F} \cdot d\vec{r}.
\]

If $\vec{F}$ is a conservative field, then the line integral of $\vec{F}$ between two points A and B only depends on where A and B are, i.e., the line integral is path-independent. A good illustration of this fact is the way we compute the change in electric potential:

\[
W = -\int_{A}^{B} \vec{E} \cdot d\vec{r} = \Delta\phi.
\]

To actually compute a line integral, we often look for symmetry first (or check whether $\vec{F}$ is conservative), then parameterize $\vec{r} = \vec{r}(s)$ if we must. Then, the line integral becomes

\[
\int_{C} \vec{F} \cdot d\vec{r} = \int \vec{F}(\vec{r}(s)) \cdot \frac{d\vec{r}}{ds}\,ds.
\]
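The parameterization recipe and the path-independence claim can be illustrated together. In this sketch the field F = (2xy, x², 1) is a hypothetical conservative example (it is the gradient of f = x²y + z), and the two paths from (0, 0, 0) to (1, 1, 1) are arbitrary choices; none of these appear in the original notes.

```python
def field(x, y, z):
    """Hypothetical conservative field F = grad(x^2 y + z) = (2xy, x^2, 1)."""
    return (2.0 * x * y, x**2, 1.0)

def line_integral(field, path, dpath, n=2000):
    """Midpoint-rule estimate of the integral of F(r(s)) . dr/ds over s in [0, 1]."""
    ds = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * ds
        f = field(*path(s))
        v = dpath(s)
        total += sum(fi * vi for fi, vi in zip(f, v)) * ds
    return total

# Path 1: straight line r(s) = (s, s, s), with dr/ds = (1, 1, 1).
straight = line_integral(field, lambda s: (s, s, s), lambda s: (1.0, 1.0, 1.0))

# Path 2: curved path r(s) = (s, s^2, s^3), with dr/ds = (1, 2s, 3s^2).
curved = line_integral(field, lambda s: (s, s**2, s**3),
                       lambda s: (1.0, 2.0 * s, 3.0 * s**2))

print(straight, curved)  # both approximate f(1,1,1) - f(0,0,0) = 2
```

Both paths give the same value because the field is conservative; for a non-conservative field the two results would generally differ.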

Surface integrals give the flux of a vector field through a surface:

\[
\int \vec{F} \cdot d\vec{S} = \int \vec{F} \cdot \vec{n}\,dS = \pm \int \vec{F} \cdot \vec{N}\,d\phi\,d\theta,
\]

where $\vec{F}$ is the vector field, and $\vec{n}$ is the unit normal vector to the surface. The last expression denotes the surface integral in terms of a parameterization $\vec{S}(\phi, \theta) = (x(\phi, \theta), y(\phi, \theta), z(\phi, \theta))$. $\vec{N}$ is the standard normal to the surface, which is not necessarily a unit vector (with norm 1) and not necessarily in the same direction as $\vec{n}$, which is intrinsic to the surface.

Example 2.1.3. Consider the electric flux through a sphere:

\[
\oint_{S} \vec{E} \cdot d\vec{a} = \frac{q}{\varepsilon_0}.
\]

The last expression comes from Gauss’ (or Ostrogradsky’s) theorem, equivalently the divergence theorem, where q is the charge enclosed.

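We should introduce Maxwell’s equations, as they “bring together” the concepts we have touched on so far. To see how, we consider Maxwell’s equations in integral form:

Gauss’ law:

\[
\oint_{S} \vec{E} \cdot d\vec{a} = \frac{q}{\varepsilon_0}
\]

No magnetic monopole:

\[
\oint_{S} \vec{B} \cdot d\vec{a} = 0
\]

Faraday’s law:

\[
\oint_{C} \vec{E} \cdot d\vec{s} = -\frac{d}{dt}\Phi_B = -\frac{d}{dt} \int_{S} \vec{B} \cdot d\vec{a}
\]

Ampere-Maxwell’s law:

\[
\oint_{C} \vec{B} \cdot d\vec{s} = \mu_0 I + \mu_0 \varepsilon_0 \frac{d}{dt} \int_{S} \vec{E} \cdot d\vec{a}
\]

Here the first two integrals are taken over a closed surface S, while the last two relate the circulation around a closed curve C to the flux through a surface S bounded by C.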

Using the theorems and facts we have discussed, we can convert the above equations into differential form. First, we can use Gauss’ theorem on Gauss’ law, with

\[
q = \int_{W} \rho\,d^3r,
\]

where ρ is the volume charge density. So, Gauss’ law becomes

\[
\oint_{S} \vec{E} \cdot d\vec{a} = \int_{W} \vec{\nabla} \cdot \vec{E}\,d^3r = \frac{1}{\varepsilon_0} \int_{W} \rho\,d^3r,
\]

which implies that the divergence of any electric field $\vec{E}$ is proportional to the local charge density ρ:

\[
\vec{\nabla} \cdot \vec{E} = \frac{\rho}{\varepsilon_0}.
\]

Applying Gauss’ theorem to the second equation,

\[
\oint_{S} \vec{B} \cdot d\vec{a} = \int_{W} \vec{\nabla} \cdot \vec{B}\,d^3r = 0,
\]

we immediately see that the divergence of any magnetic field $\vec{B}$ is 0, i.e., there is no such thing as a “magnetic monopole”:

\[
\vec{\nabla} \cdot \vec{B} = 0.
\]

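We can apply Stokes’ theorem to the other two equations to turn them into differential form. Converting the third equation is as simple as bookkeeping:

\[
\oint_{C} \vec{E} \cdot d\vec{s} = \int_{S} (\vec{\nabla} \times \vec{E}) \cdot d\vec{a} = -\frac{\partial}{\partial t} \int_{S} \vec{B} \cdot d\vec{a},
\]

which says that the curl of any electric field $\vec{E}$ equals the negative time rate of change of the magnetic field:

\[
\vec{\nabla} \times \vec{E} = -\frac{\partial \vec{B}}{\partial t}.
\]

To convert the fourth equation into differential form, we define a new quantity, $\vec{J}$, the current density (current per unit area), such that

\[
I = \int_{S} \vec{J} \cdot d\vec{a}.
\]

Again, applying Stokes’ theorem to the last equation,

\[
\oint_{C} \vec{B} \cdot d\vec{s} = \int_{S} (\vec{\nabla} \times \vec{B}) \cdot d\vec{a} = \mu_0 I + \mu_0 \varepsilon_0 \frac{\partial}{\partial t} \int_{S} \vec{E} \cdot d\vec{a} = \mu_0 \int_{S} \left( \vec{J} + \varepsilon_0 \frac{\partial \vec{E}}{\partial t} \right) \cdot d\vec{a}.
\]

So, in differential form:

\[
\vec{\nabla} \times \vec{B} = \mu_0 \vec{J} + \mu_0 \varepsilon_0 \frac{\partial \vec{E}}{\partial t}.
\]

In a later chapter, we will see how to make these equations fully relativistic. For the curious reader wanting more information regarding this subsection, please refer to a standard textbook on vector calculus, or feel free to use my Vector Calculus lecture notes for a more formal treatment of this subject.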

2.2 Coordinate Systems

There are numerous coordinate systems to describe 3-dimensional space. However, we are most familiar with the Cartesian coordinates (x, y, z), the spherical coordinates (r, θ, φ), and the cylindrical coordinates (ρ, φ, z). The Cartesian coordinate system is the simplest and most elegant of the three, since the (x, y, z) coordinates are also the components of the vector pointing from the origin to the point in question. We will see that this is not the case in general coordinate systems. One such coordinate system is the spherical coordinate system (illustrated in Fig. 2.1), where

x = r sin θ cosφ

y = r sin θ sinφ

z = r cos θ

and the inverse relations are

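\[
r = \sqrt{x^2 + y^2 + z^2}, \qquad
\theta = \cos^{-1}\left(\frac{z}{r}\right) = \cos^{-1}\left(\frac{z}{\sqrt{x^2 + y^2 + z^2}}\right), \qquad
\phi = \tan^{-1}\left(\frac{y}{x}\right),
\]

where 0 ≤ r, 0 ≤ θ ≤ π, and 0 ≤ φ ≤ 2π. The volume element is given by

\[
dV = dx\,dy\,dz = r^2 \sin\theta\,dr\,d\theta\,d\phi.
\]

Consider a sphere of radius R. The area element is given by

\[
dA = R^2 \sin\theta\,d\theta\,d\phi.
\]

Figure 2.1: (a) Spherical coordinates, (b) Cylindrical coordinates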

In cylindrical coordinates,

x = ρ cosφ

y = ρ sinφ

z = z

and the inverse relations are

\[
\rho = \sqrt{x^2 + y^2}, \qquad \phi = \tan^{-1}\left(\frac{y}{x}\right).
\]

The volume element is given by

\[
dV = \rho\,d\rho\,d\phi\,dz.
\]

For a cylinder of radius R, the area element is given by

\[
dA = R\,dz\,d\phi.
\]

The purpose of having the area and volume elements is so that we can integrate over regions parameterized in spherical or cylindrical coordinates. While it seems like these quantities appear out of the blue, there is a general procedure to find them, given any arbitrary coordinate system. This procedure uses the Jacobian, which essentially gives the scaling factor as we move from one coordinate system into another. We know that the area element in polar coordinates is dA = r dr dθ. We can see that this is true using the Jacobian. First, the Jacobian matrix, which is characteristic of a transformation from coordinate system $u^i$ into $u^{j'}$, is defined as

\[
[U^{j'}_{i}] = \left[ \frac{\partial u^{j'}}{\partial u^{i}} \right] =
\begin{pmatrix}
\frac{\partial u^{1'}}{\partial u^{1}} & \cdots & \frac{\partial u^{1'}}{\partial u^{n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial u^{n'}}{\partial u^{1}} & \cdots & \frac{\partial u^{n'}}{\partial u^{n}}
\end{pmatrix}.
\]

Note that i and j′ are indices, not powers. Now, the general scaling factor N is defined by the following equality:

\[
du^{1'}\,du^{2'} \cdots du^{n'} = N\,du^{1}\,du^{2} \cdots du^{n}, \qquad N = \left| \det\left( [U^{j'}_{i}] \right) \right|.
\]

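Example 2.2.1. The scaling factor as we move from 2-dimensional Cartesian coordinates into polar coordinates is

\[
N = \det
\begin{pmatrix}
\frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\
\frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta}
\end{pmatrix}
= \det
\begin{pmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{pmatrix}
= r.
\]

So, the area element is dA = dx dy = r dr dθ, as expected.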

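Example 2.2.2. We can try repeating the same procedure to get the volume element for spherical coordinates, which we have said before to be dV = r² sin θ dr dθ dφ. First, define the Jacobian matrix:

\[
[U^{j'}_{i}] =
\begin{pmatrix}
\frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} & \frac{\partial x}{\partial \phi} \\
\frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} & \frac{\partial y}{\partial \phi} \\
\frac{\partial z}{\partial r} & \frac{\partial z}{\partial \theta} & \frac{\partial z}{\partial \phi}
\end{pmatrix}
=
\begin{pmatrix}
\sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\
\sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\
\cos\theta & -r\sin\theta & 0
\end{pmatrix}.
\]

After some bookkeeping-type rearrangements and computation, we find that

\[
N = \left| \det [U^{j'}_{i}] \right| = r^2 \sin\theta,
\]

as expected. As a check, we can integrate this volume element over the region defined by x² + y² + z² ≤ R² to find its volume:

\[
\int_{0}^{2\pi} \int_{0}^{\pi} \int_{0}^{R} r^2 \sin\theta\,dr\,d\theta\,d\phi = \frac{4}{3}\pi R^3,
\]

which is, not surprisingly, the volume of the sphere of radius R.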
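The Jacobian determinant for spherical coordinates can also be obtained numerically, without any algebra. The following sketch approximates each partial derivative by a central finite difference and compares the determinant with r² sin θ at an arbitrary sample point; the helper names `jacobian` and `det3` and the sample point are illustrative assumptions.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def jacobian(f, point, h=1e-6):
    """Finite-difference Jacobian: entry [j][i] approximates d f_j / d u_i at the given point."""
    n = len(point)
    columns = []
    for i in range(n):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        f_plus, f_minus = f(*plus), f(*minus)
        columns.append([(a - b) / (2.0 * h) for a, b in zip(f_plus, f_minus)])
    # columns[i][j] holds d f_j / d u_i; transpose so that rows index the output component.
    return [[columns[i][j] for i in range(n)] for j in range(n)]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

r, theta, phi = 2.0, 0.7, 1.3   # arbitrary sample point
scale = abs(det3(jacobian(spherical_to_cartesian, (r, theta, phi))))
print(scale, r**2 * math.sin(theta))
```

The same `jacobian` helper works for any smooth map between three coordinates, which is the sense in which the procedure is general.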

Chapter 3

Flat 3-dimensional space

Flat, or “Euclidean,” spaces are spaces with no curvature. For instance, a flat sheet of paper in R³ is considered “flat,” while the surface of a sphere embedded in R³ is considered a curved space. There should also be a distinction between a sphere in R³ and a sphere embedded into R³. The former case is simply some subset of flat 3-dimensional space, so it is considered flat. But in the latter case, the sphere is a curved 2-dimensional space, although it lives in flat 3-dimensional space.

Our goal in this section is to see how we can use any arbitrary coordinate system, which specifies points in 3 dimensions as intersections of 3 surfaces. For instance, in Cartesian coordinates, a point (a1, a2, a3) is specified by 3 planes: x = a1, y = a2, and z = a3. In spherical coordinates, a point is specified by intersecting a sphere of radius r, a cone of constant polar angle θ, and a plane of constant φ. In cylindrical coordinates, a point is specified by intersecting a cylinder of radius ρ, a vertical plane at constant angle φ to the x-axis, and the horizontal plane of constant z.

3.1 Curvilinear coordinates

Curvilinear coordinates are an arbitrary coordinate system in Euclidean space where coordinate lines can be curved (hence curvilinear). Note that although coordinate lines can be curvy, the space is still flat. We can call (u, v, w) our arbitrary coordinates, which specify a point (a1, a2, a3) by intersecting three surfaces: u = a1, v = a2, w = a3. Let the relations between (u, v, w) and (x, y, z) be established as

u = u(x, y, z); x = x(u, v, w)

v = v(x, y, z); y = y(u, v, w)

w = w(x, y, z); z = z(u, v, w)

Figure 3.1: (a) Three intersecting (mutually distinct) planes define a point in Cartesian coordinates, (b) Three intersecting surfaces (a cone, a sphere, and a plane) defining a point in spherical coordinates, and (c) Three intersecting surfaces (two planes and a cylinder) defining a point in cylindrical coordinates.

3.2 Basis vectors

3.3 Natural basis

Next, since we want to be able to describe vectors using arbitrary curvilinear coordinates, we need to know a basis set of vectors that span the space. In Cartesian coordinates, such a basis set is $\{\hat{\imath}, \hat{\jmath}, \hat{k}\}$. Similarly, we want to find a set $\{\vec{e}_u, \vec{e}_v, \vec{e}_w\}$ that gives a basis for an arbitrary curvilinear coordinate system. The way we obtain this set is the same as how we obtain $\{\hat{\imath}, \hat{\jmath}, \hat{k}\}$ from Cartesian coordinates. We notice that $\hat{\imath}$ is a vector that follows the change in the x-direction with y and z fixed, i.e., $\hat{\imath}$ is the tangent vector to the position vector $\vec{r}$ along the change in x. Hence

\[
\hat{\imath} = \frac{\partial \vec{r}}{\partial x}.
\]

Figure 3.2: In curvilinear coordinates, a point is defined by the intersection of level surfaces.

Likewise, $\hat{\jmath} = \partial\vec{r}/\partial y$ and $\hat{k} = \partial\vec{r}/\partial z$. By analogy, we can define the set $\{\vec{e}_u, \vec{e}_v, \vec{e}_w\}$ with

\[
\vec{e}_u = \frac{\partial \vec{r}}{\partial u}, \qquad \vec{e}_v = \frac{\partial \vec{r}}{\partial v}, \qquad \vec{e}_w = \frac{\partial \vec{r}}{\partial w}.
\]

In general, given a coordinate system with $(u^1, u^2, \ldots, u^n)$, we can define a natural basis set by letting

\[
\vec{e}_{u^i} = \frac{\partial \vec{r}}{\partial u^i},
\]

where i = 1, 2, . . . , n are indices. Note that the set $\{\vec{e}_{u^i}\}$ need not be orthogonal. The only requirement is that the vectors be linearly independent and span the space. We also note that they need not be unit vectors. It is possible to make a basis set of only unit vectors by letting

\[
\hat{e}_u = \frac{\vec{e}_u}{\|\vec{e}_u\|},
\]

but we will see that this definition is not so useful in general relativity. A question one might ask is: “What is natural about this basis set?” The answer, unfortunately, will only be provided in the following sections, where we will discuss how these vectors give rise to the metric tensor.

We will often use $\{\hat{\imath}, \hat{\jmath}, \hat{k}\}$ as a reference basis, i.e., we can express $\{\vec{e}_u, \vec{e}_v, \vec{e}_w\}$ in terms of these:

\[
\vec{e}_u = (e_u)_x\,\hat{\imath} + (e_u)_y\,\hat{\jmath} + (e_u)_z\,\hat{k}.
\]

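Example 3.3.1. We can look at the basis set for the spherical coordinate system, $\{\vec{e}_r, \vec{e}_\theta, \vec{e}_\phi\}$. First, let the coordinates (u, v, w) be (r, θ, φ), the spherical coordinates. A vector $\vec{r}$, expressed in spherical coordinates, has the form:

\[
\vec{r} =
\begin{pmatrix}
r\sin\theta\cos\phi \\
r\sin\theta\sin\phi \\
r\cos\theta
\end{pmatrix},
\]

where r ≥ 0, 0 ≤ θ ≤ π, and 0 ≤ φ ≤ 2π. To obtain the basis vectors, we simply take $\partial\vec{r}/\partial r$, $\partial\vec{r}/\partial\theta$, and $\partial\vec{r}/\partial\phi$. We expect that our resulting vectors should form the column vectors of the Jacobian matrix $[U^{j'}_{i}]$, which, as we have discussed previously, has the following form:

\[
[U^{j'}_{i}] =
\begin{pmatrix}
\frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} & \frac{\partial x}{\partial \phi} \\
\frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} & \frac{\partial y}{\partial \phi} \\
\frac{\partial z}{\partial r} & \frac{\partial z}{\partial \theta} & \frac{\partial z}{\partial \phi}
\end{pmatrix}.
\]

What we’re doing now, in terms of linear algebra, is essentially constructing $[U^{j'}_{i}]$, the change-of-basis matrix. Let’s see if our described procedure works:

\[
\vec{e}_r = \frac{\partial \vec{r}}{\partial r} =
\begin{pmatrix} \sin\theta\cos\phi \\ \sin\theta\sin\phi \\ \cos\theta \end{pmatrix},
\qquad
\vec{e}_\theta = \frac{\partial \vec{r}}{\partial \theta} =
\begin{pmatrix} r\cos\theta\cos\phi \\ r\cos\theta\sin\phi \\ -r\sin\theta \end{pmatrix},
\qquad
\vec{e}_\phi = \frac{\partial \vec{r}}{\partial \phi} =
\begin{pmatrix} -r\sin\theta\sin\phi \\ r\sin\theta\cos\phi \\ 0 \end{pmatrix}.
\]

Not surprisingly,

\[
[U^{j'}_{i}] = \left( \vec{e}_r \,|\, \vec{e}_\theta \,|\, \vec{e}_\phi \right).
\]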

Next, we can check whether $\{\vec{e}_r, \vec{e}_\theta, \vec{e}_\phi\}$ is a mutually perpendicular set. (Note that I avoid the word “orthogonal” here, since it carries different meanings in physics and in linear algebra.) We can also check whether the three vectors are unit vectors. It suffices to take the dot products of $\vec{e}_r, \vec{e}_\theta, \vec{e}_\phi$ among themselves:

\[
\begin{aligned}
\vec{e}_r \cdot \vec{e}_r &= \sin^2\theta(\cos^2\phi + \sin^2\phi) + \cos^2\theta = 1 \\
\vec{e}_r \cdot \vec{e}_\theta &= r\cos^2\phi\sin\theta\cos\theta + r\sin^2\phi\sin\theta\cos\theta - r\sin\theta\cos\theta = 0 \\
\vec{e}_r \cdot \vec{e}_\phi &= -r\sin^2\theta\sin\phi\cos\phi + r\sin^2\theta\sin\phi\cos\phi = 0 \\
\vec{e}_\theta \cdot \vec{e}_\theta &= r^2\cos^2\theta\cos^2\phi + r^2\cos^2\theta\sin^2\phi + r^2\sin^2\theta = r^2 \\
\vec{e}_\theta \cdot \vec{e}_\phi &= 0 \\
\vec{e}_\phi \cdot \vec{e}_\phi &= r^2\sin^2\phi\sin^2\theta + r^2\sin^2\theta\cos^2\phi = r^2\sin^2\theta.
\end{aligned}
\]

We notice that all dot products between distinct basis vectors are 0, which means the basis vectors we found are mutually perpendicular (we can also see this fact geometrically, based on how the spherical coordinates are defined). But we also find that these basis vectors are not unit vectors, since their norms are not necessarily 1:

||~er|| = 1

||~eθ|| = r

||~eφ|| = r sin θ.
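These dot products can be reproduced numerically. The sketch below builds the natural basis vectors as finite-difference partial derivatives of the position vector and tabulates all pairwise dot products at an arbitrary sample point; the function names and the sample point are assumptions for illustration.

```python
import math

def position(r, theta, phi):
    """Position vector in Cartesian components, as a function of spherical coordinates."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def natural_basis(r, theta, phi, h=1e-6):
    """Return [e_r, e_theta, e_phi], each approximated by a central difference of position."""
    point = (r, theta, phi)
    basis = []
    for i in range(3):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        p, m = position(*plus), position(*minus)
        basis.append(tuple((a - b) / (2.0 * h) for a, b in zip(p, m)))
    return basis

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

r, theta, phi = 2.0, 0.7, 1.3   # arbitrary sample point
e = natural_basis(r, theta, phi)

# Table of all pairwise dot products e_i . e_j.
gram = [[dot(e[i], e[j]) for j in range(3)] for i in range(3)]
for row in gram:
    print([round(value, 6) for value in row])
```

The off-diagonal entries vanish and the diagonal entries are 1, r², and r² sin² θ, matching the norms listed above.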

3.4 Dual basis

The procedure we have described in the previous subsection allows us to find anatural basis set of vectors for a coordinate system. However, basis sets arenot unique. In this section, we look at an alternative basis set of vectors, calledthe dual basis set, denoted {~eu, ~e v, ~ew, . . . }. Instead of deriving these from thefinding the tangent vectors, we find the normal of surfaces of constant u, v, w, . . .

Recall that the gradient operator $\vec{\nabla}$, when applied to a scalar field $f$, gives the normal vectors to surfaces of constant $f$. Since the coordinate surfaces of a curvilinear system are the surfaces on which $u$, $v$, or $w$ is constant, we can let

\[
\vec{e}^{\,u} = \vec{\nabla} u, \qquad \vec{e}^{\,v} = \vec{\nabla} v, \qquad \vec{e}^{\,w} = \vec{\nabla} w,
\]

and so on, where $\vec{e}^{\,u}$ is a normal vector to the surface of constant $u$. Or more generally,

\[
\vec{e}^{\,i} = \vec{\nabla} u^i.
\]

Now, we might wonder what the dual basis in Cartesian coordinates is. It turns out the dual and natural basis sets in Cartesian coordinates are the same:

\[
\begin{aligned}
\vec{e}^{\,x} &= \vec{\nabla} x = (1, 0, 0)^\top = \hat{\imath} = \vec{e}_x \\
\vec{e}^{\,y} &= \vec{\nabla} y = (0, 1, 0)^\top = \hat{\jmath} = \vec{e}_y \\
\vec{e}^{\,z} &= \vec{\nabla} z = (0, 0, 1)^\top = \hat{k} = \vec{e}_z.
\end{aligned}
\]

There’s actually no magic behind this “coincidence.” The reason the basis sets are the same is simply that the direction of increasing/decreasing $x$ is the same as the normal vector to the surface of constant $x$, i.e., a plane parallel to the $yz$-plane. However, we suspect that this is not the case in general curvilinear coordinates, since the direction of increasing $u$, for instance, need not be the same as the normal vector to the surface of constant $u$.

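Example 3.4.1. We can look at the dual basis set for the spherical coordinate system. Based on what we have established so far, we can compute the dual basis set, using the inverse relations $(x, y, z) \to (r, \theta, \phi)$.

\[
\vec{e}^{\,r} = \vec{\nabla} r = \vec{\nabla}\,(x^2 + y^2 + z^2)^{1/2} = (x^2 + y^2 + z^2)^{-1/2} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \sin\theta\cos\phi \\ \sin\theta\sin\phi \\ \cos\theta \end{pmatrix}
\]

Tedious and painstaking computations give us the other two vectors:

\[
\vec{e}^{\,\theta} = \frac{1}{r} \begin{pmatrix} \cos\theta\cos\phi \\ \cos\theta\sin\phi \\ -\sin\theta \end{pmatrix}, \qquad
\vec{e}^{\,\phi} = \frac{1}{r} \begin{pmatrix} -\dfrac{\sin\phi}{\sin\theta} \\ \dfrac{\cos\phi}{\sin\theta} \\ 0 \end{pmatrix}.
\]

We can readily compare $\{\vec{e}^{\,r}, \vec{e}^{\,\theta}, \vec{e}^{\,\phi}\}$ to $\{\vec{e}_r, \vec{e}_\theta, \vec{e}_\phi\}$ and find that while $\vec{e}^{\,r} = \vec{e}_r$, $\vec{e}^{\,\theta} \neq \vec{e}_\theta$ and $\vec{e}^{\,\phi} \neq \vec{e}_\phi$.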

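Example 3.4.2. Let’s consider a less familiar case: paraboloidal coordinates $(u, v, w)$, a non-orthogonal system, where

\[
x = u + v, \qquad y = u - v, \qquad z = 2uv + w,
\]

which gives the inverse relations

\[
u = \tfrac{1}{2}(x + y), \qquad v = \tfrac{1}{2}(x - y), \qquad w = z - \tfrac{1}{2}(x^2 - y^2).
\]

Let $\vec{r} = (x, y, z)^\top = (u + v,\, u - v,\, 2uv + w)^\top$. We first obtain the natural basis set:

\[
\vec{e}_u = \frac{\partial \vec{r}}{\partial u} = (1, 1, 2v)^\top, \qquad
\vec{e}_v = \frac{\partial \vec{r}}{\partial v} = (1, -1, 2u)^\top, \qquad
\vec{e}_w = \frac{\partial \vec{r}}{\partial w} = (0, 0, 1)^\top,
\]

which is not a mutually perpendicular set in general: $\vec{e}_u \cdot \vec{e}_v = 4uv$, $\vec{e}_u \cdot \vec{e}_w = 2v$, and $\vec{e}_v \cdot \vec{e}_w = 2u$. Next, we obtain the dual basis set:

\[
\begin{aligned}
\vec{e}^{\,u} &= \vec{\nabla} u = \tfrac{1}{2}\vec{\nabla}(x + y) = \left(\tfrac{1}{2}, \tfrac{1}{2}, 0\right)^\top \\
\vec{e}^{\,v} &= \vec{\nabla} v = \tfrac{1}{2}\vec{\nabla}(x - y) = \left(\tfrac{1}{2}, -\tfrac{1}{2}, 0\right)^\top \\
\vec{e}^{\,w} &= \vec{\nabla} w = \vec{\nabla}\left(z - \tfrac{1}{2}(x^2 - y^2)\right) = (-x, y, 1)^\top = (-u - v,\, u - v,\, 1)^\top.
\end{aligned}
\]

Here, too, we get non-zero cross-dot product terms:

\[
\vec{e}^{\,u} \cdot \vec{e}^{\,w} = -v, \qquad \vec{e}^{\,u} \cdot \vec{e}^{\,v} = 0, \qquad \vec{e}^{\,v} \cdot \vec{e}^{\,w} = -u.
\]

The reason for keeping track of the dot products between basis vectors will become clear when we talk about the metric tensor in the next subsections.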

3.5 Contravariant and covariant vectors

3.5.1 The suffix notation (vector calculus)

In general coordinate systems, there can be many more than three coordinates, and it can be burdensome to keep track of different letters representing these coordinates, especially if they don’t follow a convention. It is convenient to change our notation from “letter-based” to “index-based.” For example, instead of using $(u, v, w)$ as coordinates, we will now use $(u^1, u^2, u^3)$, i.e., $u^i$ for $i = 1, 2, 3$. Note that we are using upper indices here. As we will discuss soon, upper and lower indices have different meanings.

Similarly, our notation for basis vectors also changes. For the natural basis, we “code” $\{\vec{e}_u, \vec{e}_v, \vec{e}_w\}$ as $\{\vec{e}_i\}$, where $i = 1, 2, 3$. For the dual basis, we “code” $\{\vec{e}^{\,u}, \vec{e}^{\,v}, \vec{e}^{\,w}\}$ as $\{\vec{e}^{\,i}\}$ for $i = 1, 2, 3$. Our vector notation also changes as a result, but there are additional rules. Let a vector $\vec{\lambda}$ be given. In the natural basis, the components of $\vec{\lambda}$ are written with upper indices,

\[
\vec{\lambda} = \lambda^1 \vec{e}_1 + \lambda^2 \vec{e}_2 + \lambda^3 \vec{e}_3 = \sum_{i=1}^{3} \lambda^i \vec{e}_i,
\]

and in the dual basis, we use lower indices for the components:

\[
\vec{\lambda} = \lambda_1 \vec{e}^{\,1} + \lambda_2 \vec{e}^{\,2} + \lambda_3 \vec{e}^{\,3} = \sum_{i=1}^{3} \lambda_i \vec{e}^{\,i}.
\]

3.5.2 The Einstein summation notation

We shall make another “leap in notation” as we introduce the Einstein summation convention:

Any index that appears once up and once down is automatically summed.

For example, from now on, we will write $\vec{\lambda}$ as:

\[
\sum_{i=1}^{3} \lambda^i \vec{e}_i = \lambda^i \vec{e}_i.
\]

Note that this summation convention frees us from indicating the minimum and maximum values for our indices, which allows us to keep the mathematics as applicable to any $n$-dimensional system as possible. We should also notice that summed (dummy) indices can be renamed freely. For example,

\[
a^i b_i = a^j b_j = \cdots = \sum_{n=1}^{N} a^n b_n.
\]

Keep in mind that $a_i b_i$ does not make sense in the Einstein convention. If we want to denote a sum, we have to specify the range of $i$ as follows:

\[
\sum_{i=1}^{N} a_i b_i.
\]

Likewise, $a^i b_i c_i$ is not defined either. Remember that only “1 up index, 1 down index” is allowed. Finally, it is widely agreed upon that certain letters are reserved for some specific cases. For instance, $i, j, k, l, \dots = 1, 2, 3$ are reserved for 3-dimensional space. We use Greek characters $\mu, \nu, \alpha, \beta, \dots = 0, 1, 2, 3$ for 4-dimensional spacetime, $A, B, C, \dots = 1, 2$ for 2-dimensional spaces, and $a, b, c, \dots = 1, 2, \dots, N$ for $N$-dimensional manifolds.
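
As a small illustration of the convention (a sketch with made-up component values, not something from the original notes), the contraction $a^i b_i$ is just a loop over the repeated index, and the name of that index does not matter. The helper name `contract` is hypothetical.

```python
def contract(upper, lower):
    """Return a^i b_i, i.e. the sum over the repeated index i."""
    if len(upper) != len(lower):
        raise ValueError("both index ranges must have the same length")
    return sum(a * b for a, b in zip(upper, lower))

a_up = [1.0, 2.0, 3.0]    # components a^i (illustrative values)
b_down = [4.0, 5.0, 6.0]  # components b_i (illustrative values)

# Renaming the dummy index changes nothing: a^i b_i = a^j b_j.
total_i = sum(a_up[i] * b_down[i] for i in range(3))
total_j = sum(a_up[j] * b_down[j] for j in range(3))

print(contract(a_up, b_down), total_i, total_j)
```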

3.5.3 Contravariant and covariant vectors


Earlier, we have established that any vector $\vec{\lambda}$ can be written in two “upper-lower index” ways, depending on whether we are using the natural or the dual basis set:

\[
\vec{\lambda} = \lambda^i \vec{e}_i = \lambda_i \vec{e}^{\,i}.
\]

We shall now call $\lambda^i$ the contravariant components and $\lambda_i$ the covariant components. (Remember: “co” is “low”.) Contravariant vectors are simply vectors with contravariant components. Likewise, covariant vectors are vectors with covariant components. Later in this text, we will denote $\vec{\lambda}$ simply as $\lambda^i$ (contravariant vector) or $\lambda_i$ (covariant vector).

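Putting together what we have established so far about index notation and the Einstein summation convention, we can now look at the last interesting dot product: between a vector in the dual basis and one in the natural basis, $\vec{e}^{\,i} \cdot \vec{e}_j$. Before we proceed, keep in mind that this single expression represents 9 different objects because there is no summation here ($i$ is not necessarily equal to $j$, and $i, j = 1, 2, 3$). Now, recall the definitions of these vectors:

\[
\begin{aligned}
\vec{e}^{\,i} &= \vec{\nabla} u^i = \frac{\partial u^i}{\partial x}\hat{\imath} + \frac{\partial u^i}{\partial y}\hat{\jmath} + \frac{\partial u^i}{\partial z}\hat{k} \\
\vec{e}_j &= \frac{\partial \vec{r}}{\partial u^j} = \frac{\partial x}{\partial u^j}\hat{\imath} + \frac{\partial y}{\partial u^j}\hat{\jmath} + \frac{\partial z}{\partial u^j}\hat{k}.
\end{aligned}
\]

Therefore,

\[
\vec{e}^{\,i} \cdot \vec{e}_j = \frac{\partial u^i}{\partial x}\frac{\partial x}{\partial u^j} + \frac{\partial u^i}{\partial y}\frac{\partial y}{\partial u^j} + \frac{\partial u^i}{\partial z}\frac{\partial z}{\partial u^j}.
\]

But notice that this is nothing but the chain rule, if we think about $u^i$ as $u^i(x, y, z)$. So,

\[
\vec{e}^{\,i} \cdot \vec{e}_j = \frac{\partial u^i}{\partial u^j}.
\]

But also notice that the $u^i$’s are independent variables, i.e.,

\[
\frac{\partial u^i}{\partial u^j} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j. \end{cases}
\]

We can introduce a two-index object, the Kronecker delta, defined as

\[
\delta^i_j = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j. \end{cases}
\]

We obtain an elegant relation:

\[
\vec{e}^{\,i} \cdot \vec{e}_j = \delta^i_j.
\]

Again, keep in mind that this is not a single equation but rather 9, six of which read $0 = 0$ and three $1 = 1$. We observe that for $u \neq v$, $\vec{e}^{\,u} \cdot \vec{e}_v = 0$, i.e., $\vec{e}^{\,u} \perp \vec{e}_v$, which makes sense by definition. As a little aside, every time we see a two-index object, we should immediately think of a matrix. In matrix form, the Kronecker delta is our beloved identity matrix $I$:

\[
[\delta^i_j] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]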
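
The relation between the dual and natural bases can be spot-checked numerically in a non-orthogonal system. The sketch below is an illustration under stated assumptions, not the author's method: it uses the paraboloidal map $x = u + v$, $y = u - v$, $z = 2uv + w$ and its inverse from Example 3.4.2, approximates both Jacobians by central differences, and contracts them. The helper names (`forward`, `inverse`, `jacobian`) and the sample point are hypothetical.

```python
def forward(u, v, w):
    """Paraboloidal -> Cartesian: (u, v, w) -> (x, y, z)."""
    return (u + v, u - v, 2 * u * v + w)

def inverse(x, y, z):
    """Cartesian -> paraboloidal: (x, y, z) -> (u, v, w)."""
    return ((x + y) / 2, (x - y) / 2, z - (x * x - y * y) / 2)

def jacobian(f, point, h=1e-5):
    """J[a][b] = d f_a / d point_b, approximated by central differences."""
    n = len(point)
    cols = []
    for b in range(n):
        plus = list(point)
        minus = list(point)
        plus[b] += h
        minus[b] -= h
        fp, fm = f(*plus), f(*minus)
        cols.append([(p - m) / (2 * h) for p, m in zip(fp, fm)])
    return [[cols[b][a] for b in range(n)] for a in range(n)]

uvw = (0.5, -1.2, 2.0)           # arbitrary sample point (assumption)
xyz = forward(*uvw)

J_fwd = jacobian(forward, uvw)   # column j holds the natural basis vector e_j
J_inv = jacobian(inverse, xyz)   # row i holds the dual basis vector e^i = grad u^i

# delta[i][j] = e^i . e_j = sum_a (du^i/dx^a)(dx^a/du^j)
delta = [[sum(J_inv[i][a] * J_fwd[a][j] for a in range(3)) for j in range(3)]
         for i in range(3)]
for row in delta:
    print([round(x, 6) for x in row])
```

Even though neither basis is mutually perpendicular here, the mixed dot products still come out as the identity matrix, up to rounding.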

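What about the dot (or inner) products of $\{\vec{e}^{\,i}\}$ and $\{\vec{e}_j\}$ among themselves? We have calculated the terms before, when we introduced spherical coordinates, but we have not thought about their significance.

Figure 3.3: The natural and dual basis vectors are defined such that $\vec{e}^{\,u} \cdot \vec{e}_v = \delta^u_v$. This is not so difficult to show mathematically (it all simply follows from the definitions), but a figure can help visualize “what’s going on.”

Let us define another two-index object, called the metric tensor, by

\[
g_{ij} = \vec{e}_i \cdot \vec{e}_j,
\]

which in matrix form is:

\[
[g_{ij}] = \begin{pmatrix}
\vec{e}_1 \cdot \vec{e}_1 & \vec{e}_1 \cdot \vec{e}_2 & \vec{e}_1 \cdot \vec{e}_3 \\
\vec{e}_2 \cdot \vec{e}_1 & \vec{e}_2 \cdot \vec{e}_2 & \vec{e}_2 \cdot \vec{e}_3 \\
\vec{e}_3 \cdot \vec{e}_1 & \vec{e}_3 \cdot \vec{e}_2 & \vec{e}_3 \cdot \vec{e}_3
\end{pmatrix}.
\]

And likewise, we define the inverse metric tensor as

\[
g^{ij} = \vec{e}^{\,i} \cdot \vec{e}^{\,j},
\]

which in matrix form is:

\[
[g^{ij}] = \begin{pmatrix}
\vec{e}^{\,1} \cdot \vec{e}^{\,1} & \vec{e}^{\,1} \cdot \vec{e}^{\,2} & \vec{e}^{\,1} \cdot \vec{e}^{\,3} \\
\vec{e}^{\,2} \cdot \vec{e}^{\,1} & \vec{e}^{\,2} \cdot \vec{e}^{\,2} & \vec{e}^{\,2} \cdot \vec{e}^{\,3} \\
\vec{e}^{\,3} \cdot \vec{e}^{\,1} & \vec{e}^{\,3} \cdot \vec{e}^{\,2} & \vec{e}^{\,3} \cdot \vec{e}^{\,3}
\end{pmatrix}.
\]

Since dot products are commutative, i.e., $\vec{e}^{\,i} \cdot \vec{e}^{\,j} = \vec{e}^{\,j} \cdot \vec{e}^{\,i}$ and $\vec{e}_i \cdot \vec{e}_j = \vec{e}_j \cdot \vec{e}_i$, we get an interesting property:

\[
g_{ij} = g_{ji} \quad \text{and} \quad g^{ij} = g^{ji}.
\]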

In other words, the matrices $[g_{ij}]$ and $[g^{ij}]$ are symmetric. Just as a sanity check, we can look at $[g_{ij}]$ for Cartesian coordinates, which (not surprisingly) turns out to be the identity matrix, i.e., $[g_{ij}] = [g^{ij}] = [\delta_{ij}] = I$, which is symmetric. However, $[g_{ij}]$ is not diagonal in general. A counter-example is $[g_{ij}]$ in paraboloidal coordinates.

We are of course not limited to looking only at dot products of the basis vectors among themselves. Consider the dot product of two vectors $\vec{\mu} = \mu^i \vec{e}_i = \mu_i \vec{e}^{\,i}$ and $\vec{\lambda} = \lambda^i \vec{e}_i = \lambda_i \vec{e}^{\,i}$. We immediately see that there is more than one way to express $\vec{\mu} \cdot \vec{\lambda}$, all of which are equivalent. Now, we have to be careful with writing the dot product with indices, because $\lambda_i \vec{e}^{\,i} \cdot \mu_i \vec{e}^{\,i}$ is not defined in the Einstein summation convention. We have to realize that the indices of $\vec{\mu}$ and $\vec{\lambda}$ should be independent, so it suffices to replace $i$ with $j$ as the index for one of the vectors. We are now safe to proceed.

\[
\vec{\lambda} \cdot \vec{\mu} = \lambda^i \vec{e}_i \cdot \mu^j \vec{e}_j = \lambda_i \vec{e}^{\,i} \cdot \mu_j \vec{e}^{\,j} = \lambda_i \vec{e}^{\,i} \cdot \mu^j \vec{e}_j = \lambda^i \vec{e}_i \cdot \mu_j \vec{e}^{\,j}
\]

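Since we have shown that $\vec{e}^{\,i} \cdot \vec{e}_j = \delta^i_j$, $\vec{e}^{\,i} \cdot \vec{e}^{\,j} = g^{ij}$, and $\vec{e}_i \cdot \vec{e}_j = g_{ij}$, we obtain a number of relations:

\[
\begin{aligned}
\vec{\lambda} \cdot \vec{\mu} &= \lambda^i \vec{e}_i \cdot \mu^j \vec{e}_j = g_{ij} \lambda^i \mu^j \\
&= \lambda_i \vec{e}^{\,i} \cdot \mu_j \vec{e}^{\,j} = g^{ij} \lambda_i \mu_j \\
&= \lambda_i \vec{e}^{\,i} \cdot \mu^j \vec{e}_j = \delta^i_j \lambda_i \mu^j = \lambda_i \mu^i = \lambda_j \mu^j \\
&= \lambda^i \vec{e}_i \cdot \mu_j \vec{e}^{\,j} = \delta^j_i \lambda^i \mu_j = \lambda^i \mu_i = \lambda^j \mu_j.
\end{aligned}
\]

We can summarize the statements above into five equivalent expressions:

\[
\vec{\lambda} \cdot \vec{\mu} = g_{ij} \lambda^i \mu^j = g^{ij} \lambda_i \mu_j = \lambda_i \mu^i = \lambda^i \mu_i.
\]

From the second and fifth expressions, which must agree for every $\vec{\lambda}$, we get

\[
g_{ij} \mu^j = \mu_i.
\]

Likewise, if we look at the third and fifth expressions, which must agree for every $\vec{\mu}$:

\[
g^{ij} \lambda_i = \lambda^j.
\]

This means we can use the metric tensor $g_{ij}$ and the inverse metric tensor $g^{ij}$ to go back and forth between contravariant and covariant vector components. Or, in a more “bookkeeping” sense, $g^{ij}$ raises an index, while $g_{ij}$ lowers an index. These operations are useful to remember, since they will become especially handy when we have objects with more than one upper and lower index.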

But we can extract even more information from the above two equalities by combining them:

\[
\mu^i = g^{ij} \mu_j = g^{ij} (g_{jk} \mu^k).
\]

Note that we need an extra index $k$ here since the indices of the expansion of $\mu_j$ are independent of the indices of $g^{ij}$, except for $j$. Now, since it is also true that $\mu^i = \delta^i_k \mu^k$, the following has to be true as well:

\[
g^{ij} g_{jk} = \delta^i_k.
\]

We obtain a similar equality if we start with the covariant component $\mu_i$:

\[
g_{ij} g^{jk} = \delta_i^k.
\]

The above two equalities imply that the matrices $[g_{ij}]$ and $[g^{ij}]$ are inverses of each other. Hence, we call $g_{ij}$ the metric tensor and $g^{ij}$ the inverse metric tensor.

Example 3.5.1. Consider $\vec{a} = \vec{e}_\theta$ in spherical coordinates. Express $\vec{a}$ with contravariant components, covariant components, and find the norm of $\vec{a}$, with and without the metric $g_{ij}$.

Since $\vec{a} = \vec{e}_\theta$, a single natural basis vector, we can easily write $\vec{a}$ as a contravariant vector as

\[
a^i = (a^1, a^2, a^3)^\top = (0, 1, 0)^\top.
\]

We can calculate the covariant components of $\vec{a}$ using the metric tensor $g_{ij}$:

\[
\begin{aligned}
a_1 &= g_{1j} a^j = g_{11} a^1 = 0 \\
a_2 &= g_{2j} a^j = g_{22} a^2 = r^2 \\
a_3 &= g_{3j} a^j = g_{33} a^3 = 0.
\end{aligned}
\]

So, $\vec{a}$, with covariant components, is

\[
a_i = (0, r^2, 0)^\top.
\]

Next, we can calculate $||\vec{a}||$ in two ways, and both will give the same answer:

\[
\begin{aligned}
||\vec{a}||^2 &= a^i a_i = a^1 a_1 + a^2 a_2 + a^3 a_3 = r^2 \\
||\vec{a}||^2 &= g_{ij} a^i a^j = g_{11} a^1 a^1 + g_{22} a^2 a^2 + g_{33} a^3 a^3 = r^2.
\end{aligned}
\]

So, $||\vec{a}|| = r$.
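
The bookkeeping in this example is easy to mechanize. The following is a minimal sketch, assuming the diagonal spherical metric $\mathrm{diag}(1, r^2, r^2\sin^2\theta)$ and an arbitrary sample point; the names `spherical_metric` and `lower` are hypothetical helpers, not notation from the notes.

```python
import math

def spherical_metric(r, theta):
    """Metric g_ij of the natural spherical basis: diag(1, r^2, r^2 sin^2(theta))."""
    return [[1.0, 0.0, 0.0],
            [0.0, r ** 2, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def lower(g, v_up):
    """Lower an index: a_i = g_ij a^j."""
    n = len(v_up)
    return [sum(g[i][j] * v_up[j] for j in range(n)) for i in range(n)]

r, theta = 3.0, 0.9            # arbitrary sample point (assumption)
g = spherical_metric(r, theta)

a_up = [0.0, 1.0, 0.0]         # a = e_theta, contravariant components
a_down = lower(g, a_up)        # covariant components, expected (0, r^2, 0)

# Two equivalent ways to get the squared norm.
norm_sq_mixed = sum(a_up[i] * a_down[i] for i in range(3))
norm_sq_metric = sum(g[i][j] * a_up[i] * a_up[j]
                     for i in range(3) for j in range(3))

print(a_down, math.sqrt(norm_sq_mixed), math.sqrt(norm_sq_metric))
```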

3.6 Metric tensor

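In the previous subsection we have defined the metric tensor and the inverse metric tensor. In this subsection, we will discuss their physical significance and related properties.

To start, the metric tensor contains information about physical lengths and the geometry of the space. To illustrate this point, let us consider a path $\vec{r} = \vec{r}(t)$ in flat 3-dimensional (Euclidean) space. From vector calculus, we know that the length of the underlying curve, denoted as $L$, is given by

\[
L = \int ||\dot{\vec{r}}|| \, dt.
\]

Let this curve be expressed in a general curvilinear coordinate system with coordinates $(u, v, w)$, i.e., $\vec{r}(t) = (u(t), v(t), w(t))$. It follows that

\[
\dot{\vec{r}} = \frac{\partial \vec{r}}{\partial u}\frac{du}{dt} + \frac{\partial \vec{r}}{\partial v}\frac{dv}{dt} + \frac{\partial \vec{r}}{\partial w}\frac{dw}{dt}
= \vec{e}_u \frac{du}{dt} + \vec{e}_v \frac{dv}{dt} + \vec{e}_w \frac{dw}{dt}
= \vec{e}_i \frac{du^i}{dt}
\]

in Einstein’s notation. So,

\[
||\dot{\vec{r}}|| = \sqrt{\dot{\vec{r}} \cdot \dot{\vec{r}}} = \sqrt{\vec{e}_i \frac{du^i}{dt} \cdot \vec{e}_j \frac{du^j}{dt}} = \sqrt{g_{ij} \frac{du^i}{dt} \frac{du^j}{dt}}.
\]

Putting everything (so far) together, we get

\[
L = \int \left|\left| \frac{d\vec{r}}{dt} \right|\right| dt = \int \sqrt{g_{ij} \frac{du^i}{dt} \frac{du^j}{dt}} \, dt.
\]

Letting $ds$ be the infinitesimal distance, we get

\[
ds = \sqrt{g_{ij} \frac{du^i}{dt} \frac{du^j}{dt}} \, dt.
\]

Bringing the $dt$ into the square root, and squaring both sides, we get

\[
ds^2 = g_{ij} \, du^i \, du^j.
\]

Even though $ds^2$ has units of length squared, we shall still refer to this quantity as the line element. Let us look at a few examples to see how the above equality manifests in geometries we are familiar with.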

Example 3.6.1. In Cartesian coordinates, the natural basis vectors are $\{\vec{e}_i\} = \{\hat{\imath}, \hat{\jmath}, \hat{k}\}$. Observe that $g_{ij} = \vec{e}_i \cdot \vec{e}_j = \delta_{ij}$, since the $\vec{e}_i$’s are both unit vectors and mutually perpendicular. So, as a matrix:

\[
[g_{ij}] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

The line element, $ds^2 = g_{ij} \, du^i \, du^j$, is a sum of nine terms. However, we notice that $g_{ij} = 0$ if $i \neq j$ and 1 otherwise. Therefore the line element is:

\[
ds^2 = g_{ij} \, du^i \, du^j = dx^2 + dy^2 + dz^2,
\]

which we are very familiar with. Indeed, the moral of this example is to show that while the form of the line element can be derived geometrically, a rigorous way to obtain the line element is actually through the metric. For Cartesian coordinates, deriving the line element is more or less trivial, because we realize that the above equality is simply the Pythagorean theorem in three dimensions. However, as we will see in the next two examples, it is more difficult to obtain the line element by “inspection,” and obtaining the line element from the metric is more reliable.

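Example 3.6.2. We have shown that, in spherical coordinates:

\[
\begin{aligned}
\vec{e}_r \cdot \vec{e}_r &= 1 \\
\vec{e}_\theta \cdot \vec{e}_\theta &= r^2 \\
\vec{e}_\phi \cdot \vec{e}_\phi &= r^2 \sin^2\theta \\
\vec{e}_r \cdot \vec{e}_\theta &= \vec{e}_r \cdot \vec{e}_\phi = \vec{e}_\theta \cdot \vec{e}_\phi = 0,
\end{aligned}
\]

which give the metric, in matrix form, as:

\[
[g_{ij}] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix}.
\]

Therefore, the line element in spherical coordinates is:

\[
ds^2 = g_{ij} \, du^i \, du^j = dr^2 + r^2 \, d\theta^2 + r^2 \sin^2\theta \, d\phi^2.
\]

While one can certainly derive this expression geometrically, it is no doubt more challenging to do than in Cartesian coordinates.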

Example 3.6.3. Find the length of a curve in spherical coordinates with the parameterization $\vec{r}(t) = (r(t), \theta(t), \phi(t)) = (t, \pi/4, 4t)$, where $0 \le t \le \pi$.

It is possible to proceed without knowing what the underlying curve of this path is. However, it certainly does no harm to know what length we are trying to find here. First, notice that $\theta$ is fixed, so we are limited to a cone, oriented vertically along the $z$-axis. Now, since $0 \le t \le \pi$ and $\phi = 4t$, we know that the curve makes two revolutions around the $z$-axis. So we can imagine the curve looking something like Figure 3.4.

Figure 3.4: The underlying curve of path $\vec{r}(t)$, for $0 \le t \le \pi$, side view and top view.

Finding the length of a curve is only “mechanical” once we already know the line element $ds^2$, which we have derived in the previous example. Plus, we know that $d\theta = 0$ since $\theta$ is constant. So,

\[
\begin{aligned}
L &= \int_0^\pi \sqrt{\frac{dr^2}{dt^2} + r^2 \frac{d\theta^2}{dt^2} + r^2 \sin^2\theta \, \frac{d\phi^2}{dt^2}} \, dt \\
&= \int_0^\pi \sqrt{1 + 0 + \left(4t \sin\frac{\pi}{4}\right)^2} \, dt \\
&= \int_0^\pi \sqrt{1 + 8t^2} \, dt \\
&\approx 14.55.
\end{aligned}
\]
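
The integral can be checked numerically. The sketch below is an illustration, not part of the original notes: it integrates the speed $\sqrt{g_{ij}\,\dot{u}^i \dot{u}^j}$ along the path with a composite Simpson rule and compares against the closed form of $\int_0^\pi \sqrt{1 + 8t^2}\,dt$. The function names and the number of subintervals are assumptions.

```python
import math

def speed(t):
    """sqrt(g_ij du^i/dt du^j/dt) along the path (r, theta, phi) = (t, pi/4, 4t)."""
    r, theta = t, math.pi / 4
    dr, dtheta, dphi = 1.0, 0.0, 4.0
    return math.sqrt(dr ** 2
                     + r ** 2 * dtheta ** 2
                     + (r * math.sin(theta)) ** 2 * dphi ** 2)

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

length = simpson(speed, 0.0, math.pi)

# Closed form: (t/2) sqrt(1 + 8 t^2) + asinh(2 sqrt(2) t) / (4 sqrt(2)), evaluated at t = pi.
exact = (math.pi / 2) * math.sqrt(1 + 8 * math.pi ** 2) \
    + math.asinh(2 * math.sqrt(2) * math.pi) / (4 * math.sqrt(2))

print(length, exact)
```

Both values agree with the quoted result to the two decimal places given.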

As we have seen in the earlier sections, the metric tensor does not just encode physical lengths. The $g_{ij}$ are also ubiquitous in calculating norms of vectors and inner products (or dot products). As a reminder:

\[
\begin{aligned}
\text{Norm:} \quad & ||\vec{\lambda}||^2 = \vec{\lambda} \cdot \vec{\lambda} = g_{ij} \lambda^i \lambda^j \\
\text{Inner product:} \quad & \vec{\lambda} \cdot \vec{\mu} = g_{ij} \lambda^i \mu^j.
\end{aligned}
\]

They can also be used to raise/lower indices, as we have seen.

3.6.1 Summations as matrix products

As a little “aside,” writing vectors and two-index tensors (such as $g_{ij}$) as matrices can be more convenient for us. But note that more general tensors (with three or more indices) can no longer be written as matrices.

Suppose that we have two square matrices $A = [a_{ij}]$ and $B = [b_{ij}]$. Let matrix $C = AB$. In matrix multiplication,

\[
\begin{pmatrix} & \vdots & \\ \cdots & c_{ij} & \cdots \\ & \vdots & \end{pmatrix}
=
\begin{pmatrix} \vdots & & \vdots \\ a_{i1} & \cdots & a_{in} \\ \vdots & & \vdots \end{pmatrix}
\begin{pmatrix} \cdots & b_{1j} & \cdots \\ & \vdots & \\ \cdots & b_{nj} & \cdots \end{pmatrix},
\]

i.e., the $ij$th element of $C$ is

\[
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},
\]

where $k$ on the $a$ term indicates a column, while $k$ on the $b$ term indicates a row. Notice that the summed index ($k$) goes as “column-row,” whereas $i$ and $j$ go as row-column. Note that I do not have a restriction on $n$, because the above relations are true in any $n$-dimensional space.

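Similarly, we can multiply vectors in matrix notation. Let two vectors $\vec{F}, \vec{G} \in \mathbb{R}^n$ be given. Assuming that the metric is the identity, their inner product is given by:

\[
\vec{F} \cdot \vec{G} = \begin{pmatrix} F_1 & \cdots & F_n \end{pmatrix} \begin{pmatrix} G_1 \\ \vdots \\ G_n \end{pmatrix} = \vec{F}^\top \vec{G} = \sum_{k=1}^{n} F_k G_k.
\]

With a general metric $g_{ij}$, we want to express $\vec{\lambda} \cdot \vec{\mu} = g_{ij} \lambda^i \mu^j$ as a matrix multiplication. We have to make sure that the matrix multiplication is defined, i.e., the number of rows of the right-multiplying element is equal to the number of columns of the left-multiplying element. There are two things we can do: ordering the $ij$ indices, and transposing. Let $\lambda$ denote $\vec{\lambda} = [\lambda^i]$, and $\mu$ denote $\vec{\mu} = [\mu^i]$.

For contravariant vectors:

Let $G = [g_{ij}]$, where $i$ is the row indicator and $j$ is the column indicator. This means that the index of $\vec{\lambda}$ has to be a column indicator, so we transpose $\vec{\lambda}$. The index of $\mu$ is a row indicator, which matches with $j$, the column indicator of $G$, so we are good to proceed. Again, note that $n$ is not necessarily 3.

\[
\vec{\lambda} \cdot \vec{\mu} = \lambda^i g_{ij} \mu^j =
\begin{pmatrix} \lambda^1 & \cdots & \lambda^n \end{pmatrix}
\begin{pmatrix} g_{11} & \cdots & g_{1n} \\ \vdots & \ddots & \vdots \\ g_{n1} & \cdots & g_{nn} \end{pmatrix}
\begin{pmatrix} \mu^1 \\ \vdots \\ \mu^n \end{pmatrix}
= \lambda^\top G \mu
\]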

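For covariant vectors:

We can express the inner product of covariant vectors as a matrix multiplication, using the form we have found for contravariant vectors. Consider two covariant vectors $\lambda^* = [\lambda_i]$ and $\mu^* = [\mu_i]$ and the inverse metric tensor $G^{-1} = [g_{ij}]^{-1} = [g^{ij}]$ (recall that $g_{ij} g^{jk} = \delta_i^k$). Similar to the contravariant vector case, but now with $\vec{\lambda} \cdot \vec{\mu} = g^{ij} \lambda_i \mu_j$:

\[
\lambda^* \cdot \mu^* =
\begin{pmatrix} \lambda_1 & \cdots & \lambda_n \end{pmatrix}
\begin{pmatrix} g^{11} & \cdots & g^{1n} \\ \vdots & \ddots & \vdots \\ g^{n1} & \cdots & g^{nn} \end{pmatrix}
\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}
= \lambda^{*\top} G^{-1} \mu^*
\]

As a little aside, to write $\lambda^*$ in terms of the metric tensor matrix and the contravariant vector, we simply remember that $\lambda_i = g_{ij} \lambda^j$. So,

\[
\lambda^* = G \lambda, \qquad \lambda = G^{-1} \lambda^*.
\]

In matrix form, the above two equalities can be written as:

\[
\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} =
\begin{pmatrix} g_{11} & \cdots & g_{1n} \\ \vdots & \ddots & \vdots \\ g_{n1} & \cdots & g_{nn} \end{pmatrix}
\begin{pmatrix} \lambda^1 \\ \vdots \\ \lambda^n \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} \lambda^1 \\ \vdots \\ \lambda^n \end{pmatrix} =
\begin{pmatrix} g^{11} & \cdots & g^{1n} \\ \vdots & \ddots & \vdots \\ g^{n1} & \cdots & g^{nn} \end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix},
\]

respectively.

Metric tensor and inverse metric tensor:

As mentioned, $[g^{ij}] = [g_{ij}]^{-1}$. In matrix form:

\[
G^{-1} G =
\begin{pmatrix} g^{11} & \cdots & g^{1n} \\ \vdots & \ddots & \vdots \\ g^{n1} & \cdots & g^{nn} \end{pmatrix}
\begin{pmatrix} g_{11} & \cdots & g_{1n} \\ \vdots & \ddots & \vdots \\ g_{n1} & \cdots & g_{nn} \end{pmatrix}
= \mathrm{diag}(1, \dots, 1) = I
\]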

Example 3.6.4. Find the inverse metric tensor $g^{ij}$ in spherical coordinates, given the metric tensor $g_{ij}$ is:

\[
[g_{ij}] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix}.
\]

By definition, $[g^{ij}] = [g_{ij}]^{-1}$. So taking the inverse of $[g_{ij}]$ will give us $[g^{ij}]$. But notice that $[g_{ij}]$ is diagonal, so $g^{ij}$ is simply the reciprocal of $g_{ij}$ if $g_{ij} \neq 0$, and 0 otherwise. This gives

\[
[g^{ij}] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^{-2} & 0 \\ 0 & 0 & r^{-2} \sin^{-2}\theta \end{pmatrix}.
\]
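
To connect this example with the matrix forms above, here is a minimal sketch (illustrative values and hypothetical helper names `matmul` and `matvec`) that builds $G$ and $G^{-1}$ for spherical coordinates, confirms $G^{-1}G = I$, and checks that $\lambda^\top G \mu$ computed from contravariant components equals $\lambda^{*\top} G^{-1} \mu^*$ computed from covariant components.

```python
import math

def matmul(A, B):
    """Matrix product: C[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    """Matrix-vector product: (A v)[i] = sum_j A[i][j] * v[j]."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

r, theta = 2.0, 0.6  # arbitrary sample point (assumption)
diag = [1.0, r ** 2, (r * math.sin(theta)) ** 2]
G = [[diag[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
G_inv = [[1.0 / diag[i] if i == j else 0.0 for j in range(3)] for i in range(3)]

identity = matmul(G_inv, G)

lam_up = [1.0, 0.5, -2.0]   # contravariant components (illustrative)
mu_up = [0.3, -1.0, 4.0]    # contravariant components (illustrative)
lam_down = matvec(G, lam_up)  # lambda* = G lambda
mu_down = matvec(G, mu_up)    # mu* = G mu

inner_contra = sum(l * x for l, x in zip(lam_up, matvec(G, mu_up)))
inner_co = sum(l * x for l, x in zip(lam_down, matvec(G_inv, mu_down)))

print(identity)
print(inner_contra, inner_co)
```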

3.7 Coordinate transformation

We have already seen the usage of multiple (more or less conventional) coordinate systems in three-dimensional space. In general $n$-dimensional spaces, the options for coordinate systems are almost unlimited. It is crucial for us, as relativists, to develop a rigorous procedure to transform vector and tensor components as we move from one coordinate system to another. Note that vector and tensor components change under general coordinate transformations. It is common in physics to associate a vector or a tensor with a physical quantity, and a coordinate system, or reference frame, etc., with a perspective. It makes sense that what we see might differ from different perspectives, but the physical quantity itself is independent of how we view it. In other words, vectors and tensors do not change under general coordinate transformations; only their components do.

In this subsection, we will learn how to transform between arbitrary coordinate systems. We will also learn how vector and tensor components transform, as well as what vectors and tensors are.

3.7.1 Vectors

We as physicists often think of a vector as an object with magnitude and direction. But this definition is incomplete, as it disregards the reference frame in which the vector is viewed. As we have mentioned in the introduction to this section, under general coordinate transformations, vectors do not change, but their components do, since the basis set of the coordinate system changes. For example, suppose one frame is rotated by $\theta$ from another. In frames $(S)$ and $(S')$, $\vec{\lambda}$ will have different components, but the “physical object” $\vec{\lambda}$ does not change. For instance, we can calculate the norm of $\vec{\lambda}$, $||\vec{\lambda}||$, and find that it is the same in both frames. In fact, scalars are invariant under general coordinate transformations.

Next, let’s talk notation. Let’s assume, without loss of generality, that we are only dealing with contravariant vectors here (the rules we will establish shortly will apply to covariant vectors as well). Consider $\vec{\lambda}$ in frame $(S)$. In index notation, we can write $\vec{\lambda} = \{\lambda^i\}$. Consider a different frame (a “primed” frame), denoted $(S')$. We will denote the components of $\vec{\lambda}$ in $(S')$ as $\lambda^{i'}$. Just to get us familiar with the new notation, let’s go through an example:

Example 3.7.1. Suppose that $\vec{\lambda}$ is described in two coordinate systems: spherical and cylindrical. Let the “unprimed” coordinate system be the spherical coordinate system. Then,

\[
\{u^i\} = (r, \theta, \phi), \qquad \{u^{i'}\} = (\rho, \phi, z).
\]

As simple as that!

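Back to $(S)$ and $(S')$. Since these are two different coordinate systems, the basis sets and metric tensors are also different:

\[
\begin{aligned}
(S): \quad & \vec{e}_i = \frac{\partial \vec{r}}{\partial u^i}, \quad \vec{e}^{\,i} = \vec{\nabla} u^i, \quad g_{ij} = \vec{e}_i \cdot \vec{e}_j \\
(S'): \quad & \vec{e}_{i'} = \frac{\partial \vec{r}}{\partial u^{i'}}, \quad \vec{e}^{\,i'} = \vec{\nabla} u^{i'}, \quad g_{i'j'} = \vec{e}_{i'} \cdot \vec{e}_{j'}.
\end{aligned}
\]

Now, since $\vec{\lambda}$ is the same in both $(S)$ and $(S')$, it must be true that

\[
\vec{\lambda} = \lambda^i \vec{e}_i = \lambda^{i'} \vec{e}_{i'},
\]

i.e., $\lambda^i$ and $\vec{e}_i$ must transform in a way that leaves $\vec{\lambda}$ unchanged. Consider $\vec{r} = \vec{r}(u^{i'}) = \vec{r}(u^{i'}(u^j))$. By the chain rule:

\[
\vec{e}_j = \frac{\partial \vec{r}}{\partial u^j} = \frac{\partial \vec{r}}{\partial u^{i'}} \frac{\partial u^{i'}}{\partial u^j} = \vec{e}_{i'} \frac{\partial u^{i'}}{\partial u^j}.
\]

Let us define, for simplicity’s sake,

\[
U^{i'}_{j} = \frac{\partial u^{i'}}{\partial u^j}.
\]

Note that there is nothing new here. $U^{i'}_{j}$ simply collects 9 partial derivatives (because we are in 3-dimensional space). Together as a matrix, $[U^{i'}_{j}]$ is nothing but the Jacobian matrix “representing” the transformation $(S) \to (S')$:

\[
[U^{i'}_{j}] = \begin{pmatrix}
\dfrac{\partial u^{1'}}{\partial u^1} & \dfrac{\partial u^{1'}}{\partial u^2} & \dfrac{\partial u^{1'}}{\partial u^3} \\
\dfrac{\partial u^{2'}}{\partial u^1} & \dfrac{\partial u^{2'}}{\partial u^2} & \dfrac{\partial u^{2'}}{\partial u^3} \\
\dfrac{\partial u^{3'}}{\partial u^1} & \dfrac{\partial u^{3'}}{\partial u^2} & \dfrac{\partial u^{3'}}{\partial u^3}
\end{pmatrix}.
\]

So, the chain rule for $\vec{e}_j$ becomes the general coordinate transformation rule for the natural basis sets $\vec{e}_{i'} \to \vec{e}_i$:

\[
\vec{e}_j = U^{i'}_{j} \vec{e}_{i'}.
\]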

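Next, since $\vec{\lambda} = \lambda^{i'} \vec{e}_{i'} = \lambda^j \vec{e}_j = \lambda^j U^{i'}_{j} \vec{e}_{i'}$, we obtain the general coordinate transformation rule for contravariant vector components $\lambda^i \to \lambda^{i'}$:

\[
\lambda^{i'} = U^{i'}_{j} \lambda^j.
\]

We can, of course, conjecture that the transformation rule for contravariant vector components $\lambda^{i'} \to \lambda^i$ is:

\[
\lambda^{j} = U^{j}_{i'} \lambda^{i'},
\]

and we would be correct. The “proof” follows here. First, we look at the Jacobian matrix representing the transformation $(S') \to (S)$:

\[
[U^{j}_{i'}] = \left[ \frac{\partial u^j}{\partial u^{i'}} \right],
\]

which is simply the inverse of $[U^{i'}_{j}]$. So, it follows that:

\[
\begin{aligned}
[U^{j'}_{k}][U^{k}_{i'}] &= \mathrm{diag}(1, \dots, 1) = [\delta^{j'}_{i'}] = [\delta^{j}_{i}] = I, \\
[U^{k}_{i'}][U^{i'}_{j}] &= \mathrm{diag}(1, \dots, 1) = [\delta^{k}_{j}] = I,
\end{aligned}
\]

because a matrix commutes with its inverse. Element-wise:

\[
U^{k}_{i'} U^{i'}_{j} = \delta^k_j, \qquad U^{k'}_{i} U^{i}_{j'} = \delta^{k'}_{j'}.
\]

So we can now verify our little “conjecture:”

\[
\begin{aligned}
\lambda^{i'} &= U^{i'}_{j} \lambda^j \\
U^{k}_{i'} \lambda^{i'} &= U^{k}_{i'} U^{i'}_{j} \lambda^j \\
U^{k}_{i'} \lambda^{i'} &= \delta^k_j \lambda^j \\
U^{k}_{i'} \lambda^{i'} &= \lambda^k.
\end{aligned}
\]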

We notice that nothing makes $U^{i'}_{j}$ more special than $U^{i}_{j'}$. Also, while we use contravariant vectors more often than covariant vectors, nothing makes contravariant vectors more special than covariant vectors. So if transformation rules apply to contravariant vectors, we should also expect them to work for covariant vectors:

\[
\lambda_{i'} = U^{j}_{i'} \lambda_j, \qquad \lambda_i = U^{j'}_{i} \lambda_{j'}.
\]

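At this point it looks like we are inventing new mathematics by playing a little game of swapping indices, and it is reasonable for us to grow a little bit suspicious of what we are doing here. But we can show that our “index-swapping intuition” is correct. Let us consider (again) a vector $\vec{\lambda} = \lambda_j \vec{e}^{\,j} = \{\lambda_j\}$, where $\vec{e}^{\,j}$ is a vector in the dual basis:

\[
\vec{e}^{\,j} = \vec{\nabla} u^j = \frac{\partial u^j}{\partial x}\hat{\imath} + \frac{\partial u^j}{\partial y}\hat{\jmath} + \frac{\partial u^j}{\partial z}\hat{k}.
\]

Let us apply the chain rule (with a sum over $i'$ implied):

\[
\frac{\partial u^j}{\partial x} = \frac{\partial u^j}{\partial u^{i'}} \frac{\partial u^{i'}}{\partial x}.
\]

Therefore,

\[
\begin{aligned}
\vec{e}^{\,j} &= \vec{\nabla} u^j \\
&= \frac{\partial u^j}{\partial x}\hat{\imath} + \frac{\partial u^j}{\partial y}\hat{\jmath} + \frac{\partial u^j}{\partial z}\hat{k} \\
&= \sum_{i'=1}^{3} \left( \frac{\partial u^j}{\partial u^{i'}} \frac{\partial u^{i'}}{\partial x}\hat{\imath} + \frac{\partial u^j}{\partial u^{i'}} \frac{\partial u^{i'}}{\partial y}\hat{\jmath} + \frac{\partial u^j}{\partial u^{i'}} \frac{\partial u^{i'}}{\partial z}\hat{k} \right) \\
&= \sum_{i'=1}^{3} \frac{\partial u^j}{\partial u^{i'}} \left( \frac{\partial u^{i'}}{\partial x}\hat{\imath} + \frac{\partial u^{i'}}{\partial y}\hat{\jmath} + \frac{\partial u^{i'}}{\partial z}\hat{k} \right) \\
&= \sum_{i'=1}^{3} \frac{\partial u^j}{\partial u^{i'}} \vec{\nabla} u^{i'} \\
&= \sum_{i'=1}^{3} U^{j}_{i'} \vec{e}^{\,i'}.
\end{aligned}
\]

Or, in Einstein’s summation notation, the transformation rule for the dual basis vectors is:

\[
\vec{e}^{\,j} = U^{j}_{i'} \vec{e}^{\,i'}.
\]

And the inverse transformation follows:

\[
\vec{e}^{\,j'} = U^{j'}_{i} \vec{e}^{\,i}.
\]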

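Note that the above relation is in fact true for any $n$-dimensional space. Our “proof” here is restricted to 3-dimensional space only. However, it is not so difficult to fix this slight issue. If we redefine the $\vec{\nabla}$ operator for $n$-dimensional space, then we can easily see that the new proof is identical to the proof above, except for the upper limit of the sum.

We easily obtain the transformation rules for the covariant vector components using the results above:

\[
\vec{\lambda} = \lambda_{i'} \vec{e}^{\,i'} = \lambda_j \vec{e}^{\,j} = \lambda_j U^{j}_{i'} \vec{e}^{\,i'} = \lambda_{i'} U^{i'}_{j} \vec{e}^{\,j}.
\]

Therefore,

\[
\lambda_{i'} = U^{j}_{i'} \lambda_j, \qquad \lambda_j = U^{i'}_{j} \lambda_{i'}.
\]

Below is a summary of what we have learned so far about general coordinate transformations:

\[
\begin{aligned}
\text{Natural basis:} \quad & \vec{e}_j = U^{i'}_{j} \vec{e}_{i'} & \vec{e}_{j'} &= U^{i}_{j'} \vec{e}_i \\
\text{Dual basis:} \quad & \vec{e}^{\,j} = U^{j}_{i'} \vec{e}^{\,i'} & \vec{e}^{\,j'} &= U^{j'}_{i} \vec{e}^{\,i} \\
\text{Contravariant components:} \quad & \lambda^{j'} = U^{j'}_{i} \lambda^i & \lambda^j &= U^{j}_{i'} \lambda^{i'} \\
\text{Covariant components:} \quad & \lambda_j = U^{i'}_{j} \lambda_{i'} & \lambda_{j'} &= U^{i}_{j'} \lambda_i
\end{aligned}
\]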

Since components of a vector must transform according to the rules above under general coordinate transformations, we can turn these rules into a more general definition of a vector.

Definition 3.7.1. A vector is a quantity whose components transform contravariantly as $\lambda^{i'} = U^{i'}_{j} \lambda^j$ under a general coordinate transformation $u^i \to u^{i'}(u^j)$.

Remark 3.7.1. We are often interested in vector fields, a collection of vectors at different points in space whose components depend on the coordinates: $\lambda^i = \lambda^i(u^j)$. Since vectors have to transform correctly, we require that $\lambda^{i'} = U^{i'}_{j} \lambda^j$ hold at every point $P$ for a vector field (with components $\lambda^i$) to be defined.

Remark 3.7.2. Not all 3-tuples of functions are vectors. Consider a 3-tuple of coordinates. Let $\lambda^i = u^i$ and $\lambda^{j'} = u^{j'}$, where $u^{j'} = u^{j'}(u^i)$. For $\lambda^i$ to make a vector field, it is required that $\lambda^{i'} = U^{i'}_{j} \lambda^j$. This implies that $u^{i'} = U^{i'}_{j} u^j$, with $U^{i'}_{j} = \partial u^{i'} / \partial u^j$. However, we immediately see that this is not true in general:

\[
u^{i'} \neq \frac{\partial u^{i'}}{\partial u^j} u^j.
\]

But rather, $u^{i'} = u^{i'}(u^j) = f(u^j)$. So, coordinates themselves do not make vectors in general, because they, as components, do not transform correctly under general coordinate transformations. This is why we never use $g_{ij}$ or $g^{ij}$ to lower or raise the index of $u^i$ or $u_i$, i.e., $u_i \neq g_{ij} u^j$, and $u^j \neq g^{ij} u_i$, in general.

There are exceptions to the “general rule,” of course. For instance, consider linear transformations, where $u^{j'} = u^{j'}(u^i) = C^{j'}_{i} u^i$, where the $C^{j'}_{i}$ are constants. What this means is that the new coordinates are simply linear combinations of the old coordinates. It follows that

\[
\frac{\partial u^{j'}}{\partial u^k} = C^{j'}_{i} \delta^i_k = C^{j'}_{k}.
\]

If we let $k = i$ (we’re just swapping dummy indices here), then $C^{j'}_{i} = \partial u^{j'} / \partial u^i = U^{j'}_{i}$, i.e., $u^{j'} = U^{j'}_{i} u^i$ under linear transformations. Therefore, the $u^i$ transform correctly, and in this case the coordinates $u^i$ can form a vector, $\{u^i\}$.

Remark 3.7.3. Properly speaking, we define vectors with respect to a particular class of transformations. It is possible for something to be a vector with respect to one class of transformations, but not a vector with respect to another. The default class is the class of general coordinate transformations.

Example 3.7.2. Can differentials of coordinates ($du^i$) form a vector?

The solution is actually quite simple. Without loss of generality, let us consider a hypothetical vector $\{du^i\} = \{du^1, du^2, du^3\}$ in 3-dimensional space. Applying the chain rule:

\[
du^{i'} = \frac{\partial u^{i'}}{\partial u^j} du^j = U^{i'}_{j} \, du^j.
\]

We see that the $du^i$ transform correctly under general coordinate transformations. Therefore, the $du^i$ can make a vector.

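Example 3.7.3. Find $[U^{j'}_{i}]$ for a coordinate transformation from Cartesian to spherical in flat 3-dimensional space. Let $\{u^j\} = \{x, y, z\}$ and $\{u^{j'}\} = \{r, \theta, \phi\}$. We find the Jacobian matrix:

\[
[U^{j'}_{i}] = \begin{pmatrix}
\dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y} & \dfrac{\partial r}{\partial z} \\
\dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} & \dfrac{\partial \theta}{\partial z} \\
\dfrac{\partial \phi}{\partial x} & \dfrac{\partial \phi}{\partial y} & \dfrac{\partial \phi}{\partial z}
\end{pmatrix}
= \begin{pmatrix}
\sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \\
\dfrac{1}{r}\cos\theta\cos\phi & \dfrac{1}{r}\cos\theta\sin\phi & -\dfrac{1}{r}\sin\theta \\
-\dfrac{\sin\phi}{r\sin\theta} & \dfrac{\cos\phi}{r\sin\theta} & 0
\end{pmatrix}.
\]

The rows of this matrix are the dual basis vectors $\vec{e}^{\,r}, \vec{e}^{\,\theta}, \vec{e}^{\,\phi}$ found in Example 3.4.1.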
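
A Jacobian like this is easy to get wrong by hand, so a numerical cross-check is useful. The sketch below is an illustration, not part of the original notes: it differentiates the map $(x, y, z) \mapsto (r, \theta, \phi)$ by central differences and compares with the standard closed-form partial derivatives $\partial r/\partial x = \sin\theta\cos\phi$, $\partial\theta/\partial x = \cos\theta\cos\phi / r$, $\partial\phi/\partial x = -\sin\phi/(r\sin\theta)$, and so on. Function names and the sample point are assumptions.

```python
import math

def to_spherical(x, y, z):
    """Cartesian -> spherical: (x, y, z) -> (r, theta, phi)."""
    r = math.sqrt(x * x + y * y + z * z)
    return (r, math.acos(z / r), math.atan2(y, x))

def numeric_jacobian(f, point, h=1e-6):
    """J[i][j] = d f_i / d point_j, approximated by central differences."""
    n = len(point)
    cols = []
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += h
        minus[j] -= h
        fp, fm = f(*plus), f(*minus)
        cols.append([(p - m) / (2 * h) for p, m in zip(fp, fm)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def analytic_jacobian(r, theta, phi):
    """Closed-form d(r, theta, phi)/d(x, y, z), written in spherical variables."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, st * sp, ct],
            [ct * cp / r, ct * sp / r, -st / r],
            [-sp / (r * st), cp / (r * st), 0.0]]

xyz = (1.0, 2.0, 1.5)  # arbitrary sample point away from the z-axis (assumption)
J_num = numeric_jacobian(to_spherical, xyz)
J_exact = analytic_jacobian(*to_spherical(*xyz))

max_err = max(abs(J_num[i][j] - J_exact[i][j]) for i in range(3) for j in range(3))
print(max_err)
```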

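Example 3.7.4. Consider $\vec{\lambda} = (1, 0, 0)^\top = \{\lambda^i\}$ in Cartesian coordinates. What are the components of $\vec{\lambda}$ in spherical coordinates?

With our developed procedure, we should be able to obtain the answer with little thinking:

\[
\{\lambda^{i'}\} = \begin{pmatrix} \lambda^{1'} \\ \lambda^{2'} \\ \lambda^{3'} \end{pmatrix} = [U^{i'}_{j} \lambda^j] =
\begin{pmatrix}
\sin\theta\cos\phi & \sin\theta\sin\phi & \cos\theta \\
\dfrac{1}{r}\cos\theta\cos\phi & \dfrac{1}{r}\cos\theta\sin\phi & -\dfrac{1}{r}\sin\theta \\
-\dfrac{\sin\phi}{r\sin\theta} & \dfrac{\cos\phi}{r\sin\theta} & 0
\end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},
\]

which gives

\[
\begin{pmatrix} \lambda^{1'} \\ \lambda^{2'} \\ \lambda^{3'} \end{pmatrix} =
\begin{pmatrix} \sin\theta\cos\phi \\ \dfrac{1}{r}\cos\theta\cos\phi \\ -\dfrac{\sin\phi}{r\sin\theta} \end{pmatrix}.
\]

In the natural basis of spherical coordinates,

\[
\vec{\lambda} = \sin\theta\cos\phi \, \vec{e}_r + \frac{1}{r}\cos\theta\cos\phi \, \vec{e}_\theta - \frac{\sin\phi}{r\sin\theta} \, \vec{e}_\phi.
\]

We have said before that the “physical quantity” $\vec{\lambda}$ does not change. We can test this by finding $||\vec{\lambda}||$ in Cartesian and spherical coordinates and comparing the results. In Cartesian coordinates, the metric tensor is $g_{ij} = \delta_{ij}$, so

\[
||\vec{\lambda}||_{\text{Cartesian}} = \sqrt{g_{ij} \lambda^i \lambda^j} = \sqrt{1^2 + 0^2 + 0^2} = 1,
\]

as expected. The metric tensor $g_{i'j'}$ of spherical coordinates, in matrix form, is:

\[
[g_{i'j'}] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2 \sin^2\theta \end{pmatrix}.
\]

It follows that

\[
\begin{aligned}
||\vec{\lambda}||_{\text{Spherical}} &= \sqrt{g_{i'j'} \lambda^{i'} \lambda^{j'}} = \sqrt{g_{1'1'} \left(\lambda^{1'}\right)^2 + g_{2'2'} \left(\lambda^{2'}\right)^2 + g_{3'3'} \left(\lambda^{3'}\right)^2} \\
&= \sqrt{(\sin\theta\cos\phi)^2 + r^2 \left(\frac{1}{r}\cos\theta\cos\phi\right)^2 + r^2 \sin^2\theta \left(-\frac{\sin\phi}{r\sin\theta}\right)^2} \\
&= \sqrt{(\sin\theta\cos\phi)^2 + (\cos\theta\cos\phi)^2 + \sin^2\phi} \\
&= \sqrt{\cos^2\phi + \sin^2\phi} \\
&= 1.
\end{aligned}
\]

Not surprisingly, $||\vec{\lambda}||_{\text{Cartesian}} = ||\vec{\lambda}||_{\text{Spherical}} = 1$.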
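
The same invariance check can be run at a concrete point. This is a minimal sketch under stated assumptions: the Jacobian entries are the standard partial derivatives $\partial(r, \theta, \phi)/\partial(x, y, z)$ expressed in spherical variables, the spherical metric is $\mathrm{diag}(1, r^2, r^2\sin^2\theta)$, and the sample point and helper name are arbitrary.

```python
import math

def jacobian_cart_to_sph(r, theta, phi):
    """d(r, theta, phi)/d(x, y, z) evaluated at the point (r, theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, st * sp, ct],
            [ct * cp / r, ct * sp / r, -st / r],
            [-sp / (r * st), cp / (r * st), 0.0]]

r, theta, phi = 2.5, 1.1, 0.4   # arbitrary sample point (assumption)
U = jacobian_cart_to_sph(r, theta, phi)

lam_cart = [1.0, 0.0, 0.0]      # Cartesian components lambda^j

# Contravariant transformation rule: lambda^{i'} = U^{i'}_j lambda^j.
lam_sph = [sum(U[i][j] * lam_cart[j] for j in range(3)) for i in range(3)]

# Diagonal spherical metric g_{i'i'}.
g_sph = [1.0, r ** 2, (r * math.sin(theta)) ** 2]

norm_cart = math.sqrt(sum(c * c for c in lam_cart))
norm_sph = math.sqrt(sum(g_sph[i] * lam_sph[i] ** 2 for i in range(3)))

print(lam_sph)
print(norm_cart, norm_sph)
```

The components change with the point chosen, but the two norms agree, as the example argues.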

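Example 3.7.5. Find $U^{j'}_{i}$ for a rotation by an angle $\phi$ around the $z$-axis in Cartesian coordinates. Suppose $\vec{a} = \{a^i\} = (1, 1, 0)^\top$. What is $\vec{a} = \{a^{i'}\}$ after a rotation by $\phi$?

From linear algebra, we know that a rotation by $\phi$ around the $z$-axis is given by:

\[
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} =
\begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= [U^{j'}_{i}] \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]

Given that $\vec{a} = \{a^i\} = (1, 1, 0)^\top$,

\[
\begin{pmatrix} a^{1'} \\ a^{2'} \\ a^{3'} \end{pmatrix} =
\begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} \cos\phi - \sin\phi \\ \sin\phi + \cos\phi \\ 0 \end{pmatrix}.
\]

Notice that because the coefficients of $[U^{j'}_{i}]$ are constants, the coordinates $\{x, y, z\}$ can make a vector. But note (again) that this is not true in general.

Next, we can find $||\vec{a}||$ in both coordinate systems again and show that they are equal. In Cartesian coordinates, $\{a^i\} = (1, 1, 0)^\top$. So, $||\vec{a}|| = \sqrt{2}$. In the rotated frame:

\[
||\vec{a}|| = \sqrt{g_{i'j'} a^{i'} a^{j'}}.
\]

We seem to be stuck here, since we have not yet found the form of the new metric $g_{i'j'}$. However, we observe that $g_{i'j'} = \vec{e}_{i'} \cdot \vec{e}_{j'}$, which is just either 1 (if $i' = j'$) or 0 (if $i' \neq j'$), because the new coordinate system is just the “tilted” version of the old one. So we expect that the new metric tensor is the same as the old metric tensor, which is nothing but $g_{i'j'} = \mathrm{diag}(1, 1, 1)$. We can proceed to calculate the norm:

\[
||\vec{a}|| = \sqrt{(\cos\phi - \sin\phi)^2 + (\sin\phi + \cos\phi)^2} = \sqrt{2(\sin^2\phi + \cos^2\phi)} = \sqrt{2},
\]

which is of course what we found earlier.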

The above example motivates what we will discuss next. We have seen how vector components transform under general coordinate transformations, and how their norms (scalars) remain invariant. However, we have not yet asked how the metric tensor transforms in general. In the above example, the only reason we were able to calculate the norm of the vector was that we could guess the form of the metric tensor, using some cleverness. In general, though, we will not be able to solve this problem without knowing exactly the form of the metric. Our next task, as apparent as it is now, is to find out how metric tensors in particular and tensors in general transform under general coordinate transformations.

3.7.2 Tensors

We should first ask ourselves what tensors are. While a rigorous definition of a tensor will come shortly in this section (hint: it will resemble that of a vector, i.e., based on how their components transform under general coordinate transformations), we can first think of tensors as generalizations of vectors, i.e., with more than one index and hence multi-dimensional. Consider a physical object like a balloon which stretches and compresses multi-dimensionally when some force is applied to it. We call this “response” the stress tensor, which has 9 components in 3-dimensional space: $F_{xx}, F_{xy}, F_{xz}, F_{yx}, F_{yy}, F_{yz}, F_{zx}, F_{zy}, F_{zz}$.

Definition 3.7.2. A tensor is a multi-component quantity whose components transform as contravariant or covariant vector components.

Example 3.7.6. Find the condition for $\tau^{ij}{}_{k}{}^{l}$ to be a tensor.

Well, a tensor has to transform correctly, so we can play a little game of index-replacing:

\[
\tau^{i'j'}{}_{k'}{}^{l'} = U^{i'}_{m} \, U^{j'}_{n} \, U^{p}_{k'} \, U^{l'}_{q} \, \tau^{mn}{}_{p}{}^{q}
\]

under a general coordinate transformation $u^{i} \to u^{i'}(u^j)$.

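Example 3.7.7. Show that $g_{ij}$ is a tensor.

$g_{ij}$ is a tensor, in fact, by definition:

\[ g_{i'j'} = U^{k}_{i'}\vec{e}_{k} \cdot U^{l}_{j'}\vec{e}_{l} = U^{k}_{i'} U^{l}_{j'}\, \vec{e}_{k} \cdot \vec{e}_{l} = U^{k}_{i'} U^{l}_{j'}\, g_{kl}. \]

Similarly, the inverse $g^{ij}$ is also a tensor:

\[ g^{k'l'} = U^{k'}_{i}\vec{e}^{\,i} \cdot U^{l'}_{j}\vec{e}^{\,j} = U^{k'}_{i} U^{l'}_{j}\, \vec{e}^{\,i} \cdot \vec{e}^{\,j} = U^{k'}_{i} U^{l'}_{j}\, g^{ij}. \]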

Definition 3.7.3. A tensor $\tau^{abc\dots}{}_{def\dots}$ is of type $(r, s)$ when it has $r$ contravariant (upper) indices and $s$ covariant (lower) indices.

Example 3.7.8. $g_{ij}$ is a type $(0, 2)$ tensor. $g^{ij}$ is a type $(2, 0)$ tensor.

Vectors are just a special type of tensor. Contravariant vectors $\lambda^{i}$ are type $(1, 0)$ tensors, while covariant vectors $\lambda_{i}$ are type $(0, 1)$ tensors.

Remark 3.7.4. The transformation $U^{i'}_{j}$ is NOT a tensor. Indeed, it does not make sense to talk about how a transformation transforms.

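Example 3.7.9. Write $g_{i'j'} = U^{k}_{i'} U^{l}_{j'} g_{kl}$ as a matrix equation.

We let $G = [g_{ij}]$, $G' = [g_{i'j'}]$, and $\bar{U} = U^{-1} = [\partial u^{k}/\partial u^{i'}]$, with $k$ labeling rows and $i'$ labeling columns. To create a matrix equation, we have to be aware of when an index indicates a row or a column. The correct “row-column-sensitive” expression is:

\[ [g_{i'j'}] = [U^{k}_{i'}]^{\top}\, [g_{kl}]\, [U^{l}_{j'}], \]

or equivalently,

\[ G' = \bar{U}^{\top} G\, \bar{U}. \]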

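Example 3.7.10. Using the transformation rules for tensors, obtain $g_{i'j'}$ for a rotation by $\phi$ around the $z$-axis in 3-dimensional space.

Recall that

\[ [U^{j'}_{i}] = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

and that $g_{ij} = \mathrm{diag}(1, 1, 1)$ in the 3-dimensional Cartesian coordinate system. To find $G'$, we first have to find $U^{k}_{i'}$, the inverse transformation. We expect that $U^{k}_{i'}$ is simply a rotation by $-\phi$ (since this intuitively “reverses” what $U^{k'}_{i}$ does). But we, as always, find $U^{k}_{i'}$ mechanically:

\[ \left[\frac{\partial u^{k}}{\partial u^{i'}}\right] = \bar{U} = U^{-1} = \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]

as anticipated. Using the result of the previous example,

\[ G' = \bar{U}^{\top} G\, \bar{U} = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]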


just as guessed. So our intuition was correct after all. But we realize an important fact: the metric tensor can remain the same after a non-trivial transformation (in this case, a rotation around the $z$-axis). We also notice that the Cartesian metric tensor is unchanged under a transformation if the transformation matrix is orthogonal, i.e., its inverse is equal to its transpose:

\[ U^{-1} = U^{\top}. \]
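The matrix rule $G' = U^{-\top} G\, U^{-1}$ can be checked numerically. The following is a minimal sketch, not part of the original notes; the helper names (`matmul`, `transpose`, `inverse_rotation_z`, `transform_metric`) and the test angle 0.7 are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def inverse_rotation_z(phi):
    """Matrix [du^k/du^i'] for a rotation by phi about the z-axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def transform_metric(g, u_inv):
    """Apply G' = U_inv^T G U_inv."""
    return matmul(matmul(transpose(u_inv), g), u_inv)

# Cartesian metric in 3-dimensional space.
G = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
G_prime = transform_metric(G, inverse_rotation_z(0.7))

# Largest entry-wise deviation between G' and G (should be round-off only).
max_dev = max(abs(G_prime[i][j] - G[i][j]) for i in range(3) for j in range(3))
print(max_dev)
```

Replacing `G` with a non-Cartesian metric would generally give a different `G_prime`, which is why the invariance is tied to the Cartesian case.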

Remark 3.7.5. Only tensors of type $(r, s)$ with $r + s \le 2$ can be displayed as matrices and/or matrix multiplications. It is not possible to write $\tau^{ij}{}_{kl}$ as a matrix.

3.8 Scalars

As we have seen from time to time, scalars are invariant under general coordinate transformations. As tensors, scalars are of type $(0, 0)$, i.e., they have no open indices.

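Example 3.8.1. Show that the magnitude of a vector is a scalar.

Consider $\vec{a}$ with components $a^{i}$ and $a^{i'}$ in two different coordinate systems. The magnitude $\|\vec{a}\|$ is a scalar if:

\[ \vec{a} \cdot \vec{a} = a^{i} a_{i} = a^{i'} a_{i'}. \]

This can be readily shown, since we already know the transformation rules for vector components:

\[ \begin{aligned} a^{i'} a_{i'} &= \left(U^{i'}_{j} a^{j}\right)\left(U^{k}_{i'} a_{k}\right) \\ &= U^{i'}_{j} U^{k}_{i'}\, a^{j} a_{k} \\ &= \delta^{k}_{j}\, a^{j} a_{k} \\ &= a^{k} a_{k} \\ &= a^{i} a_{i}. \end{aligned} \]

Therefore, $\|\vec{a}\|$ is a scalar.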

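Example 3.8.2. Show that the line element $ds^{2} = g_{ij}\, du^{i}\, du^{j}$ is a scalar.

Similar to the previous example, we have to show that:

\[ g_{i'j'}\, du^{i'} du^{j'} = g_{ij}\, du^{i} du^{j}. \]

We simply apply the transformation rule to each factor and play a little game of index-arranging:

\[ \begin{aligned} g_{i'j'}\, du^{i'} du^{j'} &= \left(U^{k}_{i'} U^{l}_{j'}\, g_{kl}\right)\left(U^{i'}_{m}\, du^{m}\right)\left(U^{j'}_{n}\, du^{n}\right) \\ &= U^{k}_{i'} U^{i'}_{m}\, U^{l}_{j'} U^{j'}_{n}\, g_{kl}\, du^{m} du^{n} \\ &= \delta^{k}_{m} \delta^{l}_{n}\, g_{kl}\, du^{m} du^{n} \\ &= g_{kl}\, du^{k} du^{l} \\ &= g_{ij}\, du^{i} du^{j}. \end{aligned} \]

Therefore, $ds^{2} = g_{ij}\, du^{i} du^{j}$ is a scalar.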

Chapter 4

Flat spacetime

Recall that in space, we require 3 coordinates. It makes sense that in order to describe spacetime, we need an extra coordinate to describe time, in which case we switch from Roman-character indices ($i, j, k, \dots = 1, 2, 3$) to Greek-character indices ($\mu, \nu, \sigma, \dots = 0, 1, 2, 3$). We also tend to (by an unwritten and of course not mandatory convention) write 4-vectors with capital letters to distinguish them from 3-vectors. For example, below are a number of ways we express a 4-vector:

\[ X^{\mu} = \{X^{0}, X^{1}, X^{2}, X^{3}\} = (X^{0}, \vec{X}) = (X^{0}, X^{i}). \]

Notice that the arrow (“hat”) notation is reserved for 3-vectors.

4.1 Special Relativity

Coordinate transformations in special relativity are Lorentz transformations, under which the spacetime interval (in Cartesian coordinates)

\[ ds^{2} = c^{2}\, dt^{2} - dx^{2} - dy^{2} - dz^{2} = (dx^{0})^{2} - (dx^{1})^{2} - (dx^{2})^{2} - (dx^{3})^{2}, \]

which gives “physical distances” in spacetime, is invariant. Recalling that

\[ ds^{2} = g_{ij}\, du^{i}\, du^{j}, \]

we can “read off” the metric from the spacetime interval as $\mathrm{diag}(1, -1, -1, -1)$. This is referred to as the Minkowski metric (flat 4-dimensional spacetime in Cartesian coordinates), denoted as $\eta_{\mu\nu}$:

\[ [\eta_{\mu\nu}] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]


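Because any frame in special relativity can be connected to the original frame by a Lorentz transformation, under which

\[ ds^{2} = ds'^{2} = (dx^{0'})^{2} - (dx^{1'})^{2} - (dx^{2'})^{2} - (dx^{3'})^{2}, \]

the Minkowski metric (in Cartesian coordinates) is the same for any reference frame:

\[ [\eta_{\mu'\nu'}] = [\eta_{\mu\nu}] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]

This says that:

\[ ds^{2} = \eta_{\mu\nu}\, dX^{\mu} dX^{\nu} = \eta_{\mu'\nu'}\, dX^{\mu'} dX^{\nu'}. \]

For general coordinate systems in three dimensions, the spatial metric $g_{ij} \neq \mathrm{diag}(1, 1, 1)$. In these cases, the Minkowski metric has the following block form, where $-[g_{ij}]$ is the $3 \times 3$ spatial block:

\[ [\eta_{\mu\nu}] = \begin{pmatrix} 1 & 0 \\ 0 & -[g_{ij}] \end{pmatrix}. \]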

More generally, however, spacetime can be curved, and the metric tensor of the three-dimensional space can be non-Cartesian. In these cases, we simply replace $\eta_{\mu\nu}$ by $g_{\mu\nu}$. It follows that:

\[ ds^{2} = g_{\mu\nu}\, du^{\mu} du^{\nu} \]

in general. The “moral” here is that $\eta_{\mu\nu}$ is reserved for flat Minkowski spacetime.

Same as before, the metric tensor encodes all “physical properties” of the Minkowski spacetime and allows us to transform contravariant components into covariant components and vice versa. For instance, consider a contravariant vector $\lambda^{\mu} = (\lambda^{0}, \lambda^{1}, \lambda^{2}, \lambda^{3}) = (\lambda^{0}, \vec{\lambda})$. The covariant vector is given by simply “lowering” the index of the contravariant vector using the Minkowski metric tensor:

\[ \lambda_{\mu} = \eta_{\mu\nu}\lambda^{\nu} = (\lambda_{0}, \lambda_{1}, \lambda_{2}, \lambda_{3}). \]

The inverse transformation has the same form:

\[ \lambda^{\mu} = \eta^{\mu\nu}\lambda_{\nu}. \]

Note that if we use the Minkowski metric tensor in Cartesian coordinates, then

\[ \lambda_{\mu} = \eta_{\mu\nu}\lambda^{\nu} = (\lambda_{0}, \lambda_{1}, \lambda_{2}, \lambda_{3}) = (\lambda^{0}, -\lambda^{1}, -\lambda^{2}, -\lambda^{3}). \]


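We observe that

\[ \lambda_{0} = \lambda^{0}, \qquad \lambda_{i} = -\lambda^{i}. \]

We also observe that the inverse Minkowski metric tensor is the Minkowski metric tensor itself:

\[ [\eta^{\mu\nu}] = [\eta_{\mu\nu}]^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]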

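Inner products of vectors follow the same rule as before:

\[ a \cdot b = a^{\mu} b_{\mu} = a_{\mu} b^{\mu} = \eta_{\mu\nu}\, a^{\mu} b^{\nu} = \eta^{\mu\nu}\, a_{\mu} b_{\nu}. \]

On a more practical side of things, we have to be careful with the signs that go with the terms when we solve problems:

\[ \begin{aligned} a \cdot b &= a^{0} b_{0} + a^{1} b_{1} + a^{2} b_{2} + a^{3} b_{3} \\ &= a^{0} b^{0} - a^{1} b^{1} - a^{2} b^{2} - a^{3} b^{3} \\ &= a_{0} b_{0} - a_{1} b_{1} - a_{2} b_{2} - a_{3} b_{3}. \end{aligned} \]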
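The sign bookkeeping above is easy to get wrong by hand, so a small numerical sketch may help. It assumes the Cartesian Minkowski metric $\mathrm{diag}(1, -1, -1, -1)$; the function names `lower` and `minkowski_dot` and the sample components are illustrative, not from the notes.

```python
# Diagonal of the Cartesian Minkowski metric, signature (+, -, -, -).
ETA = (1.0, -1.0, -1.0, -1.0)

def lower(v):
    """Lower the index of a contravariant 4-vector: v_mu = eta_{mu nu} v^nu."""
    return [e * x for e, x in zip(ETA, v)]

def minkowski_dot(a, b):
    """Inner product a^mu b_mu for contravariant components a and b."""
    return sum(x * y for x, y in zip(a, lower(b)))

a = [2.0, 1.0, 0.0, 3.0]    # sample contravariant components (a^0, a^1, a^2, a^3)
b = [1.0, 4.0, 5.0, -1.0]   # sample contravariant components (b^0, b^1, b^2, b^3)

# a.b = a^0 b^0 - a^1 b^1 - a^2 b^2 - a^3 b^3 = 2 - 4 - 0 + 3
ab = minkowski_dot(a, b)
print(ab)
```

Because the metric is diagonal, lowering an index only flips the sign of the spatial components, which is all `lower` does.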

At this point, we might wonder what the basis vectors $\vec{e}_{\mu}$ in the flat, 4-dimensional Minkowski spacetime in Cartesian coordinates are. We can, of course, find them from the metric tensor $\eta_{\mu\nu}$. However, one thing is for sure: they are not going to be the basis vectors $\hat{i}, \hat{j}, \hat{k}$. In fact, the basis vectors that make up $\eta_{\mu\nu}$ will likely contain imaginary parts, since $\vec{e}_{i} \cdot \vec{e}_{j} = -1$ for $i = j$. The point is: we don’t really need the basis vectors once we already have the metric, because after all, almost all of our computations involve the metric tensor.

4.1.1 The Lorentz Transformations, revisited

Recall that Lorentz transformations are (not necessarily strictly linear) transformations from one inertial frame (S) to another inertial frame (S′).

The most general Lorentz transformations, also referred to collectively as“Poincare transformations” include:

• Lorentz boost: inertial frames with relative velocity v.

• Translation: origins don’t coincide at t′ = t = 0.

• Spatial rotation

• Spatial inversion: or parity transformations (x′ = −x)

• Time reversal: t′ = −t.

We can also make further distinctions among the listed transformations:

• Inhomogeneous Lorentz transformations: have non-trivial translation.

• Homogeneous Lorentz transformations: no translations.

• Improper Lorentz transformations: with time reversal or parity.

• Proper Lorentz transformation: no parity or time reversal.

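We can first look at homogeneous, proper Lorentz transformations with no rotations, i.e., Lorentz boosts. These are probably the simplest Lorentz transformations, and the ones we encountered in introductory special relativity. A classic example is the Lorentz boost along the $x$-axis:

\[ [X^{\mu'}_{\nu}] = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]

where the transformation matrix element $U^{i'}_{j}$ in 3-dimensional space is now generalized to 4-dimensional spacetime:

\[ X^{\mu'}_{\nu} = \frac{\partial a^{\mu'}}{\partial a^{\nu}}. \]

Specifically for Lorentz boosts, we let the general $X^{\mu'}_{\nu}$ be $\Lambda^{\mu'}_{\nu}$. This slight change in notation gives:

\[ [\Lambda^{\mu'}_{\nu}] = \left[\frac{\partial a^{\mu'}}{\partial a^{\nu}}\right] = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]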

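We notice immediately that $\Lambda^{\mu'}_{\nu}$ is constant for all $\mu, \nu$. This means (i) Lorentz transformations are linear transformations, and (ii) Cartesian coordinates $X^{\mu}$ can form the components of a vector under Lorentz transformations because

\[ X^{\mu'} = \Lambda^{\mu'}_{\nu} X^{\nu} \]

is obeyed. While this concept might seem foreign, we realize we have seen this many times before in special relativity, only in a slightly different form:

\[ \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix}, \]

which gives us the very first Lorentz-Einstein equations that we are familiar with:

\[ \begin{aligned} ct' &= \gamma(ct - \beta x) \\ x' &= \gamma(x - c\beta t) \\ y' &= y \\ z' &= z. \end{aligned} \]

Keep in mind that the equalities above are only true because $\Lambda^{\mu'}_{\nu} \Lambda^{\nu}_{\sigma'} = \delta^{\mu'}_{\sigma'}$, and are not true in general, because coordinates don’t form vectors in general.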

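We can find the inverse of $\Lambda^{\mu'}_{\nu}$ in two ways: the intuitive way, and the mechanical way. Intuitively, $\Lambda^{\mu}_{\nu'}$, the inverse transformation, undoes what $\Lambda^{\mu'}_{\nu}$ does, so it has to be a boost in the opposite direction ($v \to -v$, therefore $\beta \to -\beta$). So if $\Lambda^{\mu'}_{\nu}$ is an $x$-boost, then its inverse is simply:

\[ [\Lambda^{\mu}_{\nu'}] = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]

which not surprisingly gives us the inverse Lorentz-Einstein transformation equations:

\[ \begin{aligned} ct &= \gamma(ct' + \beta x') \\ x &= \gamma(x' + c\beta t') \\ y &= y' \\ z &= z'. \end{aligned} \]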
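As a quick numerical sanity check of the boost and its inverse, here is a minimal sketch. The function names (`gamma_factor`, `boost_x`, `apply_matrix`), the value $\beta = 0.6$, and the sample event are illustrative assumptions.

```python
import math

def gamma_factor(beta):
    """Lorentz factor for speed beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def boost_x(beta):
    """Matrix of a Lorentz boost along x acting on (ct, x, y, z)."""
    g = gamma_factor(beta)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply_matrix(m, v):
    """Matrix-vector product for a 4x4 matrix and a 4-component list."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

beta = 0.6                      # sample relative speed (gamma = 1.25)
event = [5.0, 3.0, 0.0, 0.0]    # sample event (ct, x, y, z)

primed = apply_matrix(boost_x(beta), event)      # components in (S')
recovered = apply_matrix(boost_x(-beta), primed)  # inverse boost: beta -> -beta

print(primed)
print(recovered)
```

The sample event lies on the worldline $x = \beta ct$, so its primed spatial coordinate comes out (numerically) zero, and boosting back with $-\beta$ recovers the original components.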

A reasonable next step is to look at proper homogeneous Lorentz transformations with rotation(s). Recall from linear algebra that such a transformation is simply a composition of multiple, more elementary transformations (pure boosts and/or pure rotations). For example, a pure rotation by an angle $\phi$ about the $z$-axis has the following matrix:

\[ [\Lambda^{\mu'}_{\nu}] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & \sin\phi & 0 \\ 0 & -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]

To think about how to construct a transformation matrix that includes both a boost and a rotation, we can first do a gentle example.

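Example 4.1.1. Find the transformation matrix for a Lorentz boost along the $y$-axis in two ways: by intuition, and by composing a rotation by $\pi/2$ with a boost along the $x$-direction.

We can easily predict that the matrix for a boost in $y$ should be almost the same as that for a boost in $x$, except for the locations of the terms in the matrix. We know this because the $x$-direction is no more special than the $y$-direction. So,

\[ [\Lambda^{\mu'}_{\nu}] = \begin{pmatrix} \gamma & 0 & -\beta\gamma & 0 \\ 0 & 1 & 0 & 0 \\ -\beta\gamma & 0 & \gamma & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]

We should get the same matrix by the “composition method.” We can simply think of a boost in the $y$-direction as a rotation by $\pi/2$, followed by a boost along the $x$-direction, followed by a rotation by $-\pi/2$. The final transformation matrix is the product of the listed matrices, in right-to-left order:

\[ [\Lambda^{\mu'}_{\nu}] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \gamma & 0 & -\beta\gamma & 0 \\ 0 & 1 & 0 & 0 \\ -\beta\gamma & 0 & \gamma & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]

as expected.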
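The composition in the example above (rotate by $\pi/2$, boost along $x$, rotate back by $-\pi/2$) can be multiplied out numerically. This is a minimal sketch under the rotation convention used for the $z$-axis rotation matrix in this section; all function names and the value $\beta = 0.6$ are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rotation_z(phi):
    """Rotation of the spatial axes by phi about z, acting on (ct, x, y, z)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, s, 0.0],
            [0.0, -s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def boost_x(beta):
    """Lorentz boost along x."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def boost_y(beta):
    """Lorentz boost along y, written down directly."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, 0.0, -beta * g, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-beta * g, 0.0, g, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

beta = 0.6
# Right-to-left: rotate by pi/2, boost along x, rotate back by -pi/2.
composed = matmul(rotation_z(-math.pi / 2),
                  matmul(boost_x(beta), rotation_z(math.pi / 2)))
direct = boost_y(beta)

max_dev = max(abs(composed[i][j] - direct[i][j])
              for i in range(4) for j in range(4))
print(max_dev)
```

Replacing $\pi/2$ by a general angle in the same three-factor product gives a boost along another direction in the $xy$-plane.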

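The moral of the above example is that a Lorentz boost along an arbitrary direction can be found as a combination of a boost along the $x$-direction and spatial rotations. For a direction in the $xy$-plane at an angle $\phi$ from the $x$-axis:

\[ [\Lambda^{\mu'}_{\nu}] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi & 0 \\ 0 & \sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & \sin\phi & 0 \\ 0 & -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]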

4.1.2 A curiosity about the Lorentz boosts

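We can actually make Lorentz boosts look like rotations using hyperbolic functions:

\[ \begin{aligned} \sinh(\alpha) &= \frac{e^{\alpha} - e^{-\alpha}}{2}, & \cosh(\alpha) &= \frac{e^{\alpha} + e^{-\alpha}}{2}, \\ \operatorname{sech}(\alpha) &= \frac{1}{\cosh(\alpha)}, & \operatorname{csch}(\alpha) &= \frac{1}{\sinh(\alpha)}, \\ \tanh(\alpha) &= \frac{\sinh(\alpha)}{\cosh(\alpha)}, & \coth(\alpha) &= \frac{1}{\tanh(\alpha)}, \end{aligned} \]

which have the following identities:

\[ \cosh^{2}(\alpha) - \sinh^{2}(\alpha) = 1, \qquad 1 - \tanh^{2}(\alpha) = \operatorname{sech}^{2}(\alpha). \]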

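If we let $\beta = \tanh(\phi)$, where the angle $\phi$ is called the “rapidity,” then it follows that:

\[ \gamma = \frac{1}{\sqrt{1 - \beta^{2}}} = \frac{1}{\sqrt{1 - \tanh^{2}(\phi)}} = (\operatorname{sech}(\phi))^{-1} = \cosh(\phi), \qquad \gamma\beta = \sinh(\phi). \]

So the Lorentz boost $[\Lambda^{\mu'}_{\nu}]$ becomes:

\[ [\Lambda^{\mu'}_{\nu}] = \begin{pmatrix} \cosh(\phi) & -\sinh(\phi) & 0 & 0 \\ -\sinh(\phi) & \cosh(\phi) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]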
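The identification $\gamma = \cosh\phi$, $\gamma\beta = \sinh\phi$ can be checked numerically. The sketch below also illustrates a standard property that is not derived in these notes: for boosts along the same axis, rapidities add, which reproduces the relativistic velocity-addition formula. The function name `rapidity` and the sample speeds are illustrative assumptions.

```python
import math

def rapidity(beta):
    """Rapidity phi defined by beta = tanh(phi)."""
    return math.atanh(beta)

beta = 0.6
phi = rapidity(beta)
gamma = 1.0 / math.sqrt(1.0 - beta * beta)

# gamma = cosh(phi) and gamma * beta = sinh(phi)
cosh_dev = abs(math.cosh(phi) - gamma)
sinh_dev = abs(math.sinh(phi) - gamma * beta)

# Two successive boosts of 0.5c along the same axis: rapidities add
# (standard result, assumed here), so the combined speed is tanh(phi1 + phi2).
combined_beta = math.tanh(rapidity(0.5) + rapidity(0.5))
addition_formula = (0.5 + 0.5) / (1.0 + 0.5 * 0.5)

print(cosh_dev, sinh_dev, combined_beta, addition_formula)
```

The combined speed stays below $c$ no matter how many boosts are stacked, since $\tanh$ never reaches 1.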

4.1.3 The Poincare transformations

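The general form of Poincare transformations is given by:

\[ a^{\mu'} = \Lambda^{\mu'}_{\nu}\, a^{\nu} + b^{\mu'}, \]

where $b^{\mu'}$ is a constant, i.e., $\partial b^{\mu'}/\partial x^{\nu} = 0$. We notice that, by the form of the equation, these are “affine” transformations, or equivalently a linear transformation (or linear map) plus a shift/translation. What we mean by “affine” is that, if we take $\partial a^{\mu'}/\partial a^{\nu}$, we get

\[ \frac{\partial a^{\mu'}}{\partial a^{\nu}} = \Lambda^{\mu'}_{\nu}, \]

which is constant, and “looks like a Lorentz transformation.” With the chain rule:

\[ \Lambda^{\mu'}_{\nu} \Lambda^{\nu}_{\sigma'} = \frac{\partial a^{\mu'}}{\partial a^{\nu}} \frac{\partial a^{\nu}}{\partial a^{\sigma'}} = \delta^{\mu'}_{\sigma'}, \]

which is what we have seen in Lorentz transformations.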

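A key feature of the Lorentz transformation is that it is a linear map, and that

\[ ds^{2} = \eta_{\mu\nu}\, dx^{\mu} dx^{\nu} = \eta_{\mu'\nu'}\, dx^{\mu'} dx^{\nu'}, \qquad [\eta_{\mu\nu}] = [\eta_{\mu'\nu'}] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \]

i.e., the Lorentz transformations preserve the Minkowski metric in Cartesian coordinates. Now, if we take the differentials of the Poincare transformation equation:

\[ da^{\mu'} = \Lambda^{\mu'}_{\sigma}\, da^{\sigma}, \]

and plug this into the $ds^{2}$ equation above, then

\[ \begin{aligned} \eta_{\mu\nu}\, dx^{\mu} dx^{\nu} &= \eta_{\mu'\nu'}\, dx^{\mu'} dx^{\nu'} \\ &= \eta_{\mu'\nu'} \left(\Lambda^{\mu'}_{\sigma}\, dx^{\sigma}\right)\left(\Lambda^{\nu'}_{\rho}\, dx^{\rho}\right) \\ &= \Lambda^{\mu'}_{\sigma} \Lambda^{\nu'}_{\rho}\, \eta_{\mu'\nu'}\, dx^{\sigma} dx^{\rho}. \end{aligned} \]

Letting $\sigma \to \mu$, $\rho \to \nu$, $\mu' \to \alpha'$, and $\nu' \to \beta'$ on the right-hand side (since these are just dummy indices) and comparing coefficients of $dx^{\mu} dx^{\nu}$, we get:

\[ \eta_{\mu\nu} = \Lambda^{\alpha'}_{\mu} \Lambda^{\beta'}_{\nu}\, \eta_{\alpha'\beta'}. \]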

So, we conclude that in Poincare transformations, the metric tensor transforms according to the above rule. We make two observations, as a consequence of the above fact and from the subsection on Lorentz transformations:

• $\eta_{\mu\nu}$ is a tensor, since it transforms correctly under general coordinate transformations (Poincare).

• $\eta_{\mu\nu}$ is unchanged (preserved) under Lorentz transformations.

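As a summary, we have this little table:

Contravariant vectors: $\lambda^{\mu'} = \Lambda^{\mu'}_{\nu}\, \lambda^{\nu}$

Covariant vectors: $\lambda_{\mu'} = \Lambda^{\nu}_{\mu'}\, \lambda_{\nu}$

Tensors: $\tau^{\mu'\nu'}{}_{\sigma'} = \Lambda^{\mu'}_{\alpha} \Lambda^{\nu'}_{\beta} \Lambda^{\gamma}_{\sigma'}\, \tau^{\alpha\beta}{}_{\gamma}$

Scalars: Invariant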

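Example 4.1.2. Show that inner products of vectors are scalars.

We actually have done an identical problem before. Consider two vectors $a^{\mu'}$ and $b_{\mu'}$. Their inner product is given by:

\[ \begin{aligned} a^{\mu'} b_{\mu'} &= \Lambda^{\mu'}_{\nu}\, a^{\nu}\, \Lambda^{\sigma}_{\mu'}\, b_{\sigma} \\ &= \Lambda^{\mu'}_{\nu} \Lambda^{\sigma}_{\mu'}\, a^{\nu} b_{\sigma} \\ &= \delta^{\sigma}_{\nu}\, a^{\nu} b_{\sigma} \\ &= a^{\nu} b_{\nu}. \end{aligned} \]

This shows that the norm of every 4-vector is invariant:

\[ \lambda \cdot \lambda = \lambda^{\mu} \lambda_{\mu} = \lambda^{\mu'} \lambda_{\mu'}. \]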

Notice that the sign of a vector norm squared is also invariant. Among other quantities, this fact has some profound physical consequences. Recall that, in Minkowski spacetime, the norm squared $\lambda^{2} = \lambda^{\mu}\lambda_{\mu}$ of a vector can be negative, positive, or zero. We recall and rethink the following facts from introductory special relativity:

• If $\lambda^{2} > 0$, then $\lambda^{\mu}$ is time-like. This means there exists a reference frame where $\lambda^{\mu'} = (\lambda^{0}, 0, 0, 0)$, i.e., there exists a reference frame where the “event” described is at the spatial origin. Recall that if $\lambda^{\mu}$ is a vector between two events on the Minkowski spacetime diagram, then time-like-ness implies the existence of a frame where the events occur at the same location.

• If $\lambda^{2} < 0$, then $\lambda^{\mu}$ is space-like. This means there exists a reference frame where $\lambda^{\mu'} = (0, \lambda^{1}, 0, 0)$, i.e., there exists a reference frame where the “event” described is at the time origin. Recall that if $\lambda^{\mu}$ is a vector between two events on the Minkowski spacetime diagram, then space-like-ness implies the existence of a frame where the events are simultaneous.

• If $\lambda^{2} = 0$, then $\lambda^{\mu}$ is light-like. We say that $\lambda^{\mu}$ is a null vector, i.e., we can always find a frame where $\lambda^{\mu} = (\lambda^{0}, \lambda^{0}, 0, 0)$, or $(\lambda^{0}, 0, \lambda^{0}, 0)$, etc.

More generally, $\lambda^{\mu} = (\lambda^{0}, \vec{\lambda})$ with $\|\vec{\lambda}\| = \lambda^{0}$.
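The three cases above depend only on the sign of $\lambda^{\mu}\lambda_{\mu}$, which a few lines of code can make concrete. This is a minimal sketch assuming the signature $(+, -, -, -)$; the function names and the tolerance `tol` are illustrative choices.

```python
def norm_squared(v):
    """lambda^mu lambda_mu with metric diag(1, -1, -1, -1)."""
    return v[0] ** 2 - v[1] ** 2 - v[2] ** 2 - v[3] ** 2

def classify(v, tol=1e-12):
    """Label a 4-vector as time-like, space-like, or light-like."""
    s = norm_squared(v)
    if s > tol:
        return "time-like"
    if s < -tol:
        return "space-like"
    return "light-like"

labels = [classify((2.0, 1.0, 0.0, 0.0)),   # norm squared = 3
          classify((1.0, 2.0, 0.0, 0.0)),   # norm squared = -3
          classify((1.0, 0.0, 1.0, 0.0))]   # norm squared = 0
print(labels)
```

Since the norm squared is invariant, the label returned for a vector is the same in every inertial frame.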

Example 4.1.3. Is $X^{\mu} = (ct, x, y, z)$ a contravariant vector under Poincare transformations?

In order for $X^{\mu} = (ct, x, y, z)$ to be a contravariant vector under Poincare transformations, the following needs to hold:

\[ X^{\mu'} = \Lambda^{\mu'}_{\nu} X^{\nu}. \]

Now, the Poincare transformation is given by:

\[ X^{\mu'} = \Lambda^{\mu'}_{\nu} X^{\nu} + a^{\mu'} \]

for any constant $a^{\mu'}$. We immediately realize that $X^{\mu}$ is not a vector if $a^{\mu'} \neq 0$, because for both equalities to hold, it is required that $a^{\mu'} = 0$, in which case the transformation is reduced to a very special case (a Lorentz transformation), rather than a general Poincare transformation where $a^{\mu'}$ can be non-zero.

Example 4.1.4. Is $dX^{\mu} = (c\, dt, dx, dy, dz)$ a contravariant vector under Poincare transformations?

Recall the form of Poincare transformations:

\[ X^{\mu'} = \Lambda^{\mu'}_{\nu} X^{\nu} + a^{\mu'}. \]

Computing the differentials, we obtain:

\[ dX^{\mu'} = \Lambda^{\mu'}_{\nu}\, dX^{\nu} + 0 = \Lambda^{\mu'}_{\nu}\, dX^{\nu}. \]

Since $dX^{\mu}$ transforms correctly, $dX^{\mu}$ is a vector under Poincare transformations.

Example 4.1.5. Here is an interesting follow-up to the previous example. Suppose that we take $\partial/\partial X^{\mu}$ of a scalar field $\phi$. Will the end result be a vector? In other words, is $\partial\phi/\partial X^{\mu}$ a vector? If so, what type of vector is this?

Without much thought, we apply the chain rule on $\phi = \phi(X^{\nu}(X^{\mu'}))$ to obtain the following:

\[ \frac{\partial \phi}{\partial X^{\mu'}} = \frac{\partial X^{\nu}}{\partial X^{\mu'}} \frac{\partial \phi}{\partial X^{\nu}} = \Lambda^{\nu}_{\mu'} \frac{\partial \phi}{\partial X^{\nu}}. \]

We observe that $\partial\phi/\partial X^{\mu}$ transforms correctly under Poincare transformations. Therefore, $\partial\phi/\partial X^{\mu}$ is a vector. Specifically, $\partial\phi/\partial X^{\mu}$ is a covariant vector. We can tell in two ways: (i) because the upper indices cancel out, and (ii) by the $\Lambda$ term in the transformation rule. For covariant transformations from “unprimed” to “primed,” the $\Lambda$ term has an unprimed upper index and a primed lower index.

The previous example is interesting enough that we should make some definitions, for the sake of simplicity and bookkeeping. For the partial derivative operator with respect to a coordinate, we define

\[ \partial_{\mu} = \frac{\partial}{\partial X^{\mu}}. \]

This simplifies our notation for a partial derivative of a scalar field:

\[ \partial_{\mu}\phi = \frac{\partial \phi}{\partial X^{\mu}}. \]

With this definition, we can construct a vector whose components are the partial derivatives of a scalar field, the gradient:

\[ \vec{\nabla} = \partial_{i} = (\partial_{1}, \partial_{2}, \partial_{3}). \]

Of course, we can define a “generalized gradient” operator as:

\[ \partial_{\mu} = (\partial_{0}, \partial_{i}) = (\partial_{0}, \vec{\nabla}). \]

These definitions will come in handy when we revisit Maxwell’s equations.

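Now that we have a “covariant” operator, it makes sense to define a “contravariant” operator $\partial^{\mu}$. In Minkowski spacetime with Cartesian coordinates, we define a lowered coordinate:

\[ X_{\mu} = \eta_{\mu\nu} X^{\nu}, \]

and with it the operator

\[ \partial^{\mu} = \frac{\partial}{\partial X_{\mu}}. \]

From $X_{\mu} = \eta_{\mu\nu} X^{\nu}$, we get

\[ \frac{\partial X_{\mu}}{\partial X^{\nu}} = \eta_{\mu\nu}, \]

since $\eta_{\mu\nu}$ is constant. This implies

\[ \partial_{\mu} = \frac{\partial}{\partial X^{\mu}} = \frac{\partial X_{\nu}}{\partial X^{\mu}} \frac{\partial}{\partial X_{\nu}} = \eta_{\mu\nu}\, \partial^{\nu}. \]

We quickly realize that we can raise the index of $\partial_{\mu}$ to get $\partial^{\mu}$:

\[ \partial^{\mu} = \eta^{\mu\nu}\, \partial_{\nu}. \]

So, for a scalar field $\phi$:

\[ \partial^{\mu}\phi = \eta^{\mu\nu}\, \partial_{\nu}\phi. \]

But note that $\partial^{i} \neq \vec{\nabla}$. Rather, due to the nature of $\eta_{\mu\nu}$ (or equivalently $\eta^{\mu\nu}$), we get

\[ \partial^{i} = -\partial_{i} = -\vec{\nabla}. \]

So, we can write

\[ \partial^{\mu} = (\partial^{0}, \partial^{i}) = (\partial_{0}, -\vec{\nabla}). \]

4.1.4 Velocity, momentum, and force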

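In this section we want to look at what velocity, momentum, and force look like as 4-vectors and how they transform. To be precise, we want to know in which form these vectors transform correctly under general Poincare transformations, or specifically under Lorentz transformations. Consider a position vector $X^{\mu} = (ct, \vec{X})$ in Minkowski spacetime, with metric tensor $\eta_{\mu\nu}$.

Let us conjecture that “velocity” is given by $dX^{\mu}/dt$. Under a Poincare transformation $X^{\mu'} = \Lambda^{\mu'}_{\nu} X^{\nu} + a^{\mu'}$,

\[ \frac{dX^{\mu'}}{dt} = \frac{d}{dt}\left(\Lambda^{\mu'}_{\nu} X^{\nu}\right) + \frac{d}{dt} a^{\mu'} = \Lambda^{\mu'}_{\nu} \frac{dX^{\nu}}{dt}. \]

We want to see whether $V^{\mu}$ is a vector (specifically, a 4-vector). Calling $dX^{\mu}/dt = V^{\mu}$ and $d\vec{x}/dt = \vec{v}$, we obtain:

\[ V^{\mu} = (c, \vec{v}). \]

We give the same definitions for the “primed” frame:

\[ V^{\mu'} = \frac{dX^{\mu'}}{dt'} = (c, \vec{v}\,'). \]

Since $t \neq t'$ in general,

\[ \Lambda^{\mu'}_{\nu} V^{\nu} = \frac{dX^{\mu'}}{dt} \neq \frac{dX^{\mu'}}{dt'} = V^{\mu'}, \]

which means that

\[ V^{\mu'} \neq \Lambda^{\mu'}_{\nu} V^{\nu}. \]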

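So, no: $V^{\mu}$ is not a vector in Minkowski spacetime. Therefore, this definition of “velocity” is not “good.” However, we can actually find a similar 4-vector that works as velocity, using the proper time, $\tau$. Consider objects with non-zero mass and speed $v < c$ (so not photons). In this case:

\[ ds^{2} = c^{2}\, d\tau^{2} = \eta_{\mu\nu}\, dX^{\mu} dX^{\nu} > 0. \]

If we divide both sides by $d\tau^{2}$, we get the speed of light squared, which is an invariant:

\[ c^{2} = \eta_{\mu\nu} \frac{dX^{\mu}}{d\tau} \frac{dX^{\nu}}{d\tau}. \]

Let us define the world velocity:

\[ u^{\mu} = \frac{dX^{\mu}}{d\tau}. \]

We see immediately that $u^{\mu}$ transforms correctly. Applying the chain rule:

\[ u^{\mu'} = \frac{dX^{\mu'}}{d\tau} = \frac{\partial X^{\mu'}}{\partial X^{\nu}} \frac{dX^{\nu}}{d\tau} = \Lambda^{\mu'}_{\nu}\, u^{\nu}. \]

This shows that $u^{\mu}$, as expected from the index placement, is a contravariant 4-vector under general Poincare transformations. We also see that, by definition,

\[ u^{\mu} u_{\mu} = \eta_{\mu\nu} \frac{dX^{\mu}}{d\tau} \frac{dX^{\nu}}{d\tau} = c^{2}, \]

which is invariant. Next, we can relate $u^{\mu}$ and $V^{\mu}$ using $c^{2}\, d\tau^{2} = c^{2}\, dt^{2} - d\vec{x}^{\,2}$:

\[ \frac{d\tau^{2}}{dt^{2}} = 1 - \frac{1}{c^{2}} \frac{d\vec{x}^{\,2}}{dt^{2}} = 1 - \frac{v^{2}}{c^{2}}. \]

We have just derived time dilation (for massive objects, of course):

\[ \frac{dt}{d\tau} = \gamma. \]

This implies:

\[ u^{\mu} = \gamma V^{\mu} = \gamma(c, \vec{v}) = (\gamma c, \gamma\vec{v}). \]

In the object’s rest frame, $\vec{v} = 0$ and $\gamma = 1$. This gives $u^{\mu} = (c, 0, 0, 0)$. We can interpret this as “the object travels at speed $c$ in the time direction when in its rest frame.”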

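Moving on to momentum. We simply define 4-momentum based on 4-velocity:

\[ p^{\mu} = m u^{\mu}. \]

We can quickly see that

\[ p^{\mu} = \left(\frac{\gamma m c^{2}}{c}, \gamma m \vec{v}\right) = \left(\frac{E}{c}, \vec{p}\right). \]

Next, we shall look at what the norm squared (an invariant) of the 4-momentum is. It is easy to show that:

\[ p^{\mu} p_{\mu} = m^{2} u^{\mu} u_{\mu} = m^{2} c^{2}. \]

But also, by definition:

\[ p^{\mu} p_{\mu} = \eta_{\mu\nu}\, p^{\mu} p^{\nu} = \frac{E^{2}}{c^{2}} - \vec{p} \cdot \vec{p} = \frac{E^{2}}{c^{2}} - p^{2}. \]

The above two equalities give us the full energy-momentum relation

\[ E^{2} = c^{2} p^{2} + m^{2} c^{4}. \]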
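The two invariants derived above, $u^{\mu}u_{\mu} = c^{2}$ and $p^{\mu}p_{\mu} = m^{2}c^{2}$, can be verified numerically for a sample particle. This is a minimal sketch; working in units with $c = 1$, the mass value, and the velocity are illustrative assumptions.

```python
import math

C = 1.0  # assumption: units in which c = 1

def four_velocity(v):
    """World velocity u^mu = (gamma c, gamma v) for a 3-velocity v."""
    speed_sq = sum(vi * vi for vi in v)
    g = 1.0 / math.sqrt(1.0 - speed_sq / C ** 2)
    return [g * C] + [g * vi for vi in v]

def minkowski_norm_sq(a):
    """a^mu a_mu with metric diag(1, -1, -1, -1)."""
    return a[0] ** 2 - sum(x * x for x in a[1:])

m = 2.0                                # sample rest mass
u = four_velocity([0.6, 0.0, 0.0])     # sample velocity, 0.6c along x
p = [m * x for x in u]                 # 4-momentum p^mu = m u^mu

E = p[0] * C                           # energy from p^0 = E / c
p3_sq = sum(x * x for x in p[1:])      # squared 3-momentum

u_norm_sq = minkowski_norm_sq(u)       # expected: c^2
lhs = E ** 2                           # E^2
rhs = C ** 2 * p3_sq + m ** 2 * C ** 4 # c^2 p^2 + m^2 c^4
print(u_norm_sq, lhs, rhs)
```

Changing the sample velocity changes `E` and `p3_sq` individually but leaves `lhs - rhs` at round-off level.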

But what about massless particles (like photons)? For massless particles in Minkowski spacetime, $v = c$, and hence the proper time $\tau$ is not defined. This also means the definition $u^{\mu} = dX^{\mu}/d\tau$ is no longer valid. However, the line element $ds^{2}$ is still defined:

\[ ds^{2} = c^{2}\, dt^{2} - d\vec{x}^{\,2} = c^{2}\, dt^{2}\left(1 - \frac{1}{c^{2}} \frac{d\vec{x}^{\,2}}{dt^{2}}\right) = c^{2}\, dt^{2}\left(1 - \frac{c^{2}}{c^{2}}\right) = 0. \]

It is indeed true that for light

\[ ds^{2} = 0 \]

by the construction of Minkowski spacetime. Recall that photon vectors are “light-like,” as we often said in special relativity. Here, we will use a somewhat different terminology: we will refer to photon trajectories as null in flat Minkowski spacetime.

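Now, just because the proper time $\tau$ is not defined does not mean we cannot describe the trajectory of light. Consider a parameterization $X^{\mu} = X^{\mu}(\sigma)$, where $\sigma$ is simply some parameter. Let us define $u^{\mu} = dX^{\mu}/d\sigma$; then, since $ds^{2} = 0$,

\[ u^{\mu} u_{\mu} = \eta_{\mu\nu} \frac{dX^{\mu}}{d\sigma} \frac{dX^{\nu}}{d\sigma} = \frac{ds^{2}}{d\sigma^{2}} = 0, \]

i.e., $u^{\mu}$ is light-like. We can also talk about the energy and momentum of light. Let $p^{\mu}$ be the 4-momentum of light; then

\[ p^{\mu} = \left(\frac{E}{c}, \vec{p}\right) = (p^{0}, \vec{p}). \]

But recall that for light, $E = h\nu$ and $\|\vec{p}\| = h/\lambda$, i.e., $E = c\|\vec{p}\|$. This gives

\[ p^{\mu} p_{\mu} = \eta_{\mu\nu}\, p^{\mu} p^{\nu} = \frac{E^{2}}{c^{2}} - \|\vec{p}\|^{2} = \frac{E^{2}}{c^{2}} - \frac{E^{2}}{c^{2}} = 0, \]

which should make sense, because the 4-momentum of light is a light-like vector. We shall also see that the wave 4-vector, $k^{\mu} = (k^{0}, \vec{k})$, defined by $p^{\mu} = \hbar k^{\mu}$, is also a light-like vector. We first notice that

\[ k^{0} = \frac{p^{0}}{\hbar} = \frac{E}{\hbar c} = \frac{2\pi}{\lambda} = \|\vec{k}\|. \]

This implies

\[ k^{\mu} k_{\mu} = (k^{0})^{2} - \|\vec{k}\|^{2} = \|\vec{k}\|^{2} - \|\vec{k}\|^{2} = 0. \]

This is quite obvious, because $k^{\mu} \propto p^{\mu}$.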

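Example 4.1.6. Find the wavelength $\lambda$ for light emitted from a source (with emitted wavelength $\lambda_{0}$) that is receding at speed $v$.

Let the wave vector in the moving frame (S′) be $k^{\mu'} = (k^{0'}, \vec{k}\,')$. Since the source is in (S′),

\[ k^{\mu'} = (k^{0'}, \vec{k}\,') = \left(\frac{2\pi}{\lambda_{0}}, -\frac{2\pi}{\lambda_{0}}, 0, 0\right). \]

In the stationary frame (S),

\[ k^{\mu} = (k^{0}, \vec{k}) = \left(\frac{2\pi}{\lambda}, -\frac{2\pi}{\lambda}, 0, 0\right). \]

Now, since we want to find a quantity in the stationary frame, we apply the inverse transformation $\Lambda^{\mu}_{\nu'}$ to $k^{\nu'}$ in (S′) to get back $k^{\mu}$ in (S):

\[ k^{\mu} = \Lambda^{\mu}_{\nu'}\, k^{\nu'}. \]

As a matrix equation (after dividing out the common factor $2\pi$):

\[ \begin{pmatrix} \lambda^{-1} \\ -\lambda^{-1} \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \lambda_{0}^{-1} \\ -\lambda_{0}^{-1} \\ 0 \\ 0 \end{pmatrix}. \]

It is not difficult to see that

\[ \frac{2\pi}{\lambda} = \Lambda^{0}_{\nu'}\, k^{\nu'} = \gamma \frac{2\pi}{\lambda_{0}} - \gamma\beta \frac{2\pi}{\lambda_{0}} = \frac{2\pi}{\lambda_{0}}\, \gamma(1 - \beta). \]

Therefore,

\[ \frac{1}{\lambda} = \frac{1}{\lambda_{0}}\, \gamma(1 - \beta) = \frac{1}{\lambda_{0}} \sqrt{\frac{1 - \beta}{1 + \beta}}. \]

We obtain nothing but the relativistic Doppler shift formula for a receding light source:

\[ \lambda = \lambda_{0} \sqrt{\frac{1 + \beta}{1 - \beta}}. \]

We notice that because $\lambda > \lambda_{0}$, the observed light is red-shifted. Of course, if we have an approaching source, then $v \to -v$, which gives blue-shifted observed light:

\[ \lambda = \lambda_{0} \sqrt{\frac{1 - \beta}{1 + \beta}}. \]

Keep in mind that these are Doppler shifts due to relative motion. Later on in this text, we will discuss gravitational spectral shifts and cosmological red-shift. The former is due to the curvature of spacetime, whereas the latter is due to the expansion of space (the universe).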
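The Doppler result can be cross-checked by computing the observed wavelength in two ways: from the closed-form formula, and by explicitly transforming the time component of the wave 4-vector with the inverse boost. This is a minimal sketch; the function names, $\beta = 0.6$, and the emitted wavelength 500 (in arbitrary length units) are illustrative assumptions.

```python
import math

def doppler_receding(lambda0, beta):
    """Observed wavelength for a source receding at speed beta = v/c."""
    return lambda0 * math.sqrt((1.0 + beta) / (1.0 - beta))

def doppler_via_wave_vector(lambda0, beta):
    """Same quantity, obtained from k^0 = gamma (k^0' + beta k^1')."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    k0_primed = 2.0 * math.pi / lambda0    # time component in the source frame
    k1_primed = -2.0 * math.pi / lambda0   # light travels toward the observer
    k0 = g * (k0_primed + beta * k1_primed)
    return 2.0 * math.pi / k0

lambda0 = 500.0   # emitted wavelength, arbitrary units
beta = 0.6

from_formula = doppler_receding(lambda0, beta)
from_transform = doppler_via_wave_vector(lambda0, beta)
blue_shifted = doppler_receding(lambda0, -beta)   # approaching source: v -> -v
print(from_formula, from_transform, blue_shifted)
```

For $\beta = 0.6$ the square-root factor is exactly 2, so the receding source doubles the wavelength and the approaching source halves it.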

Just as the 4-momentum is defined as a scalar multiple of the 4-velocity (the world velocity), the 4-force, $f^{\mu}$, for massive objects, is defined from the 4-momentum as

\[ f^{\mu} = \frac{dp^{\mu}}{d\tau}. \]

With $p^{\mu} = m u^{\mu}$, we get the relativistic version of Newton’s 2nd law:

\[ f^{\mu} = m \frac{d^{2}X^{\mu}}{d\tau^{2}}. \]

Next, we want to relate the 4-force to our much-familiar 3-force, $\vec{F}$. We can start with $p^{\mu} = (E/c, \vec{p})$. Applying the chain rule to the 4-momentum, together with the fact that $dt/d\tau = \gamma$,

\[ f^{\mu} = \frac{dp^{\mu}}{d\tau} = \frac{dt}{d\tau} \frac{dp^{\mu}}{dt} = \gamma\left(\frac{1}{c} \frac{dE}{dt}, \frac{d\vec{p}}{dt}\right). \]

Noticing that $dE/dt = \vec{F} \cdot \vec{v}$ is the definition of “power,” and that $d\vec{p}/dt = \vec{F}$, we get:

\[ f^{\mu} = \gamma\left(\frac{1}{c} \vec{F} \cdot \vec{v}, \vec{F}\right). \]

Example 4.1.7. Show that $u^{\mu} f_{\mu} = 0$.

Without loss of generality (recall: scalars are invariant), we can assume a 1-dimensional case where

\[ f^{\mu} = \left(\gamma \frac{F v}{c}, \gamma F, 0, 0\right), \qquad u^{\mu} = (\gamma c, \gamma v, 0, 0). \]

We can first show that $u^{\mu} f_{\mu} = 0$ by a mechanical computation:

\[ u^{\mu} f_{\mu} = \eta_{\mu\nu}\, u^{\mu} f^{\nu} = \gamma^{2} v F - \gamma^{2} v F = 0. \]

We can also solve this problem geometrically. In Minkowski’s spacetime diagram, $u^{\mu}$ and $f^{\mu}$ are “orthogonal,” not in the sense that they are perpendicular (note that this is true only in Euclidean geometry), but rather in the sense that there exists a frame (S′) such that $u^{\mu'}$ is “parallel” to the $ct'$ axis and $f^{\mu'}$ is parallel to the $x'$ axis, i.e., they are orthogonal in (S′). But because their dot product is invariant under Lorentz transformations, $u^{\mu} f_{\mu} = u^{\mu'} f_{\mu'} = 0$.

4.2 Relativistic Electrodynamics

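Previously we have obtained Maxwell’s equations in differential form, from the integral form, with $\vec{E} = (E^{1}, E^{2}, E^{3}) = E^{i}$ and $\vec{B} = (B^{1}, B^{2}, B^{3}) = B^{j}$. $\rho$ is the charge density and $\vec{J} = (J^{1}, J^{2}, J^{3}) = J^{k}$ is the current density.

Gauss’ law:

\[ \oint \vec{E} \cdot d\vec{a} = \frac{Q}{\varepsilon_{0}} \]

No magnetic monopole:

\[ \oint \vec{B} \cdot d\vec{a} = 0 \]

Faraday’s law:

\[ \oint \vec{E} \cdot d\vec{s} = -\frac{d}{dt}\Phi_{B} = -\frac{d}{dt} \int \vec{B} \cdot d\vec{a} \]

Ampere-Maxwell’s law:

\[ \oint \vec{B} \cdot d\vec{s} = \mu_{0} I + \mu_{0}\varepsilon_{0} \frac{d}{dt} \int \vec{E} \cdot d\vec{a} \]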

Note that $\vec{E}$ and $\vec{B}$ are 3-vectors. Together, their six components mix under Lorentz transformations, i.e., $\vec{E}$ and $\vec{B}$ join relativistically. We have been working in 4-dimensional flat Minkowski spacetime, and we would want to rewrite Maxwell’s equations in this space. How are we going to accomplish this? Unfortunately, the full answer is beyond the scope of this text. However, let us take as given a tensor that combines $\vec{E}$ and $\vec{B}$ in spacetime. We have the following definition:

Definition 4.2.1. The electromagnetic field strength tensor $F^{\mu\nu}$ is given by:

\[ [F^{\mu\nu}] = \begin{pmatrix} 0 & E^{1}/c & E^{2}/c & E^{3}/c \\ -E^{1}/c & 0 & B^{3} & -B^{2} \\ -E^{2}/c & -B^{3} & 0 & B^{1} \\ -E^{3}/c & B^{2} & -B^{1} & 0 \end{pmatrix}. \]

Now, applying index-lowering operations twice gives

\[ F_{\mu\nu} = \eta_{\mu\alpha}\eta_{\nu\beta} F^{\alpha\beta}. \]

As matrices,

\[ [F_{\mu\nu}] = [\eta_{\mu\alpha}][F^{\alpha\beta}][\eta_{\nu\beta}]^{\top}. \]

This gives

\[ [F_{\mu\nu}] = \begin{pmatrix} 0 & -E^{1}/c & -E^{2}/c & -E^{3}/c \\ E^{1}/c & 0 & B^{3} & -B^{2} \\ E^{2}/c & -B^{3} & 0 & B^{1} \\ E^{3}/c & B^{2} & -B^{1} & 0 \end{pmatrix}. \]
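The double index-lowering above amounts to flipping the sign of every entry with exactly one time index. The following minimal sketch builds the upper-index matrix from sample field values and lowers both indices; the function names, the choice $c = 1$, and the sample components of $\vec{E}$ and $\vec{B}$ are illustrative assumptions.

```python
# Diagonal of the Cartesian Minkowski metric.
ETA = (1.0, -1.0, -1.0, -1.0)

def field_tensor_upper(E, B, c=1.0):
    """Matrix [F^{mu nu}] built from 3-vectors E and B."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, E1 / c, E2 / c, E3 / c],
            [-E1 / c, 0.0, B3, -B2],
            [-E2 / c, -B3, 0.0, B1],
            [-E3 / c, B2, -B1, 0.0]]

def lower_both(F):
    """F_{mu nu} = eta_{mu alpha} eta_{nu beta} F^{alpha beta} (diagonal eta)."""
    return [[ETA[m] * F[m][n] * ETA[n] for n in range(4)] for m in range(4)]

E = (1.0, 2.0, 3.0)   # sample electric field components
B = (4.0, 5.0, 6.0)   # sample magnetic field components

F_upper = field_tensor_upper(E, B)
F_lower = lower_both(F_upper)

# The lowered tensor should still be antisymmetric.
antisymmetric = all(F_lower[m][n] == -F_lower[n][m]
                    for m in range(4) for n in range(4))
for row in F_lower:
    print(row)
```

Only the electric-field entries change sign; the magnetic block, which carries two spatial indices, picks up two minus signs and is unchanged.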

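Next, we can form a 4-vector out of $\rho$ and $\vec{J}$, the 4-vector current density:

\[ j^{\mu} = (\rho c, \vec{J}). \]

The Maxwell equations in index notation become:

\[ \partial_{\nu} F^{\mu\nu} = \mu_{0}\, j^{\mu}, \]

\[ \partial_{\sigma} F_{\mu\nu} + \partial_{\mu} F_{\nu\sigma} + \partial_{\nu} F_{\sigma\mu} = 0. \]

We shall see readily how the equations above manifest in differential form. First, we look at

\[ \partial_{\nu} F^{\mu\nu} = \mu_{0}\, j^{\mu}. \]

For $\mu = 0$,

\[ \begin{aligned} \mu_{0}\, j^{0} = \mu_{0}\rho c &= \partial_{\nu} F^{0\nu} \\ &= \partial_{0} F^{00} + \partial_{1} F^{01} + \partial_{2} F^{02} + \partial_{3} F^{03} \\ &= 0 + \frac{1}{c}\, \partial_{i} E^{i} \\ &= \frac{1}{c}\, \vec{\nabla} \cdot \vec{E}. \end{aligned} \]

But because $\mu_{0}\varepsilon_{0} = c^{-2}$,

\[ \vec{\nabla} \cdot \vec{E} = \rho c^{2} \mu_{0} = \frac{\rho}{\varepsilon_{0}}, \]

which is Gauss’ law in differential form. Next, we let $\mu = k \in \{1, 2, 3\}$. So,

\[ \begin{aligned} \mu_{0}\, j^{k} = \mu_{0} J^{k} &= \partial_{\nu} F^{k\nu} \\ &= \partial_{0} F^{k0} + \partial_{i} F^{ki} \\ &= \frac{1}{c} \frac{\partial}{\partial t}\left(-\frac{E^{k}}{c}\right) + \partial_{i} F^{ki} \\ &= -\frac{1}{c^{2}} \frac{\partial E^{k}}{\partial t} + \partial_{i} F^{ki}. \end{aligned} \]

To find what $\partial_{i} F^{ki}$ is, we can look at the case where $k = 1$:

\[ \begin{aligned} \partial_{i} F^{1i} &= \partial_{1} F^{11} + \partial_{2} F^{12} + \partial_{3} F^{13} \\ &= 0 + \partial_{2} B^{3} + \partial_{3}\left(-B^{2}\right) \\ &= \left(\vec{\nabla} \times \vec{B}\right)^{1}. \end{aligned} \]

So, $\partial_{i} F^{ki}$ is the curl!

\[ \partial_{i} F^{ki} = \left(\vec{\nabla} \times \vec{B}\right)^{k}. \]

Combining this with what we have found earlier:

\[ -\frac{1}{c^{2}} \frac{\partial E^{k}}{\partial t} + \left(\vec{\nabla} \times \vec{B}\right)^{k} = \mu_{0} J^{k}. \]

This is nothing but Ampere-Maxwell’s law:

\[ \vec{\nabla} \times \vec{B} = \mu_{0} \vec{J} + \mu_{0}\varepsilon_{0} \frac{\partial \vec{E}}{\partial t}. \]

Similarly, we can look at the other equation:

\[ \partial_{\sigma} F_{\mu\nu} + \partial_{\mu} F_{\nu\sigma} + \partial_{\nu} F_{\sigma\mu} = 0. \]

We can show that for various values of $\sigma, \nu, \mu$, we get Faraday’s law and the “law of no magnetic monopole.”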

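Example 4.2.1. Show that, for $\mu = 0$, $\nu = 1$, $\sigma = 2$, we get

\[ \left(\vec{\nabla} \times \vec{E}\right)^{3} = -\left(\frac{\partial \vec{B}}{\partial t}\right)^{3}. \]

This is just a simple plug-and-chug problem:

\[ \begin{aligned} \partial_{2} F_{01} + \partial_{0} F_{12} + \partial_{1} F_{20} &= \frac{\partial}{\partial y}\left(-\frac{E^{1}}{c}\right) + \frac{1}{c} \frac{\partial B^{3}}{\partial t} + \frac{\partial}{\partial x}\left(\frac{E^{2}}{c}\right) \\ &= \frac{1}{c}\left[\left(\vec{\nabla} \times \vec{E}\right)^{3} + \left(\frac{\partial \vec{B}}{\partial t}\right)^{3}\right] = 0, \end{aligned} \]

so that

\[ \left(\vec{\nabla} \times \vec{E}\right)^{3} = -\left(\frac{\partial \vec{B}}{\partial t}\right)^{3}. \]

This result can of course be generalized to give

\[ \left(\vec{\nabla} \times \vec{E}\right)^{j} = -\left(\frac{\partial \vec{B}}{\partial t}\right)^{j}, \]

or equivalently, Faraday’s law:

\[ \vec{\nabla} \times \vec{E} = -\frac{\partial \vec{B}}{\partial t}. \]

To show the “no magnetic monopole” law, we can limit our use to only the non-zero (spatial) indices, so that what comes out of the cyclic identity is the divergence of $\vec{B}$, or $\partial_{i} B^{i} = 0$. This is left as an exercise for the reader.

So, to summarize, in special relativity, all physical quantities are some sort of tensors. The (invariant) scalars include $m$, $\tau$, $ds^{2}$, and $c$. We also have the special vectors (which are a type of tensor): the world velocity $u^{\mu}$, the 4-momentum $p^{\mu}$, and the 4-force $f^{\mu}$. Notable tensors include the Minkowski metric $\eta_{\mu\nu}$ (numerically equal to $\eta^{\mu\nu}$) and the electromagnetic field strength tensor $F^{\mu\nu}$, all of which transform correctly under Lorentz transformations.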

Chapter 5

Curved spaces

Recall that the equivalence principle leads us towards the idea of curved spacetime. In general relativity, gravity is not a force. Instead, massive objects curve or warp spacetime around them. Light travels as a free particle along a “geodesic” through curved spacetime. One of the questions we will eventually answer in this section is: “How do we find the equation for geodesics?” To answer this question, we will have to learn how to describe curved spaces and spacetime directly. We will derive the geodesic equation:

\[ \frac{d^{2}X^{\mu}}{d\tau^{2}} + \Gamma^{\mu}_{\nu\sigma} \frac{dX^{\nu}}{d\tau} \frac{dX^{\sigma}}{d\tau} = 0. \]

We will also see how the metric tensor $g_{\mu\nu}$, the Christoffel symbol $\Gamma^{\mu}_{\nu\sigma}$, and the Riemann curvature tensor $R^{\lambda}{}_{\mu\nu\sigma}$ are related. We, of course, will explore what these mathematical objects are. Then (finally), we will look at the Einstein equations, which will let us solve for the metric tensor $g_{\mu\nu}$ given a distribution of matter (mass and energy).

5.1 Geodesics

In 3-dimensional flat space, we can think of a geodesic as the shortest distance between two points, i.e., a straight line, or the path of a free particle. It is very important to remember that free particles follow geodesics, as this is true in all spaces, while geodesics can be curves.

For general n-dimensional spaces, geodesics are no longer shortest distances, as we will see in this section. Rather, geodesics are paths of free particles.

Example 5.1.1. Show that a geodesic in Minkowski spacetime is not the shortest distance.

Consider a free particle, $f^{\mu} = 0$. This implies

\[ \frac{d^{2}X^{\mu}}{d\tau^{2}} = 0, \]

which means there exists a geodesic $X^{\mu} = X^{\mu}(\tau)$ that is a straight line in spacetime (the equation above, in terms of vector calculus, means the curve has no curvature). However, we can show now that this $X^{\mu}$ does not give the shortest distance in spacetime. Consider

\[ ds^{2} = \eta_{\mu\nu}\, dX^{\mu} dX^{\nu}. \]

For moving, massive particles, the path is time-like:

\[ ds^{2} = c^{2}\, d\tau^{2} > 0. \]

Now, consider two points A and B on a Minkowski spacetime diagram and two possible paths from A to B. The first path (vertical) indicates that the particle is at rest (so $f^{\mu} = 0$). The second path is that of a moving particle, made of two straight legs that each cover a distance $\Delta x$ in a time $\Delta t/2$. Both paths depart from A and arrive at B, so they share the same coordinate time $\Delta t$, and for the particle at rest $\Delta\tau = \Delta t$. Next, we calculate the “length” of the first path as $c\Delta\tau$, as shown on the spacetime diagram. What about the length of the second path?

\[ c\Delta\tau' = 2\sqrt{\left(\tfrac{1}{2} c\Delta t\right)^{2} - (\Delta x)^{2}} < c\Delta\tau. \]

This implies

\[ \Delta\tau' < \Delta\tau, \]

i.e., the geodesic maximizes the proper time $\tau$, or equivalently, the “length” of the geodesic in this case is the largest of all possible “lengths.”
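A small numerical sketch may make the inequality concrete. It assumes the second path consists of two straight legs of equal duration (out a distance `dx` and back), works in units with $c = 1$, and uses illustrative function names and sample values.

```python
import math

def proper_time_at_rest(dt):
    """Proper time along the vertical (at rest) path: equal to coordinate time."""
    return dt

def proper_time_two_legs(dt, dx, c=1.0):
    """Proper time along a path that moves out by dx and back in total time dt."""
    return 2.0 * math.sqrt((dt / 2.0) ** 2 - (dx / c) ** 2)

dt = 10.0   # coordinate time between events A and B
dx = 3.0    # maximum displacement of the moving particle

tau_rest = proper_time_at_rest(dt)
tau_moving = proper_time_two_legs(dt, dx)
print(tau_rest, tau_moving)
```

Increasing `dx` toward `dt / 2` (legs approaching light speed) drives the moving particle's proper time toward zero, while the straight path keeps the maximum value `dt`.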


5.2 2-dimensional curved spaces

According to general relativity, we live in a 4-dimensional spacetime. However, this is hard to visualize. To start off more simply, we can look at 2-dimensional spaces that we can embed in 3-dimensional spaces.

For example, the $xy$-plane can be embedded into 3-space simply by letting $z = 0$ in Cartesian coordinates. Just as 1-dimensional curves in 3 dimensions can be described by a parameterization $\vec{r} = \vec{r}(t) = (x(t), y(t), z(t))$, 2-dimensional surfaces can be parameterized with 2 parameters, say $u$ and $v$. For example, a surface $\vec{S}$ in $\mathbb{R}^{3}$ can have the following form:

\[ \vec{S} = \vec{S}(u, v) = (x(u, v), y(u, v), z(u, v)). \]

Example 5.2.1. Parameterize the surface of a sphere of radius $R$.

We know from spherical coordinates the conversion formulas:

\[ \begin{aligned} x &= R \sin\theta \cos\phi \\ y &= R \sin\theta \sin\phi \\ z &= R \cos\theta. \end{aligned} \]

So the parameterization is simply:

\[ \vec{S}(\theta, \phi) = (R \sin\theta \cos\phi, R \sin\theta \sin\phi, R \cos\theta), \]

where $0 \le \phi \le 2\pi$ and $0 \le \theta \le \pi$. We can give this parameterization to Mathematica, under “ParametricPlot3D,” to get a sphere. Let $R = 2$.

Figure 5.1: Surface of a sphere of radius R = 2

But we in fact don't need an extra dimension to describe a curved space. An obvious example is that we don't need to leave the surface of Earth to know that it is curved. With just two coordinates $u$ and $v$ in a 2-dimensional space, we can mathematically show whether the space is curved or not. The first step is to generate tangent vectors:

\[ \vec{e}_i = \frac{\partial \vec{r}}{\partial u^i}. \]

For the case of a 2-dimensional surface parameterized by $u$ and $v$, we obtain two tangent vectors:

\[ \vec{e}_u = \frac{\partial \vec{S}}{\partial u}, \quad \text{and} \quad \vec{e}_v = \frac{\partial \vec{S}}{\partial v}. \]

Note that these tangent vectors are not in the surface (this is also true in general n-dimensional cases). Rather, they lie in a tangent space $T_P$ at each point $P$ in the space. Tangent vectors are useful because they give directions along the curve or surface.

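Next, we want to look at an infinitesimal displacement $d\vec{S}$. Because $\vec{S} = \vec{S}(u, v)$, we get from the chain rule:

\[ d\vec{S} = \frac{\partial \vec{S}}{\partial u}\,du + \frac{\partial \vec{S}}{\partial v}\,dv = \vec{e}_u\,du + \vec{e}_v\,dv. \]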

For the sake of convenience (and further generalizations in the future), we shall switch to index notation. Recall that capital Roman indices are reserved for 2-dimensional manifolds. Let $u^A = (u^1, u^2) = (u, v)$, with $A = 1, 2$. This gives

\[ d\vec{S} = \vec{e}_A\,du^A. \]

It follows that the line element $ds^2$ is

\[ ds^2 = d\vec{S} \cdot d\vec{S} = \left(\vec{e}_A\,du^A\right) \cdot \left(\vec{e}_B\,du^B\right) = \vec{e}_A \cdot \vec{e}_B\,du^A\,du^B. \]

But this is nothing but the all-familiar:

\[ ds^2 = g_{AB}\,du^A\,du^B, \qquad g_{AB} = \vec{e}_A \cdot \vec{e}_B. \]

Note that we have just shown the above equation holds not only for flat but also for general curved spaces. With this, we can go on to calculate lengths of curves in general curved 2-dimensional spaces. Let a curve in this 2-dimensional space be $\vec{l}(\sigma) = (u(\sigma), v(\sigma))$. The length $L$ of this curve for $a \le \sigma \le b$ is given by

\[ L = \int ds = \int \sqrt{g_{AB}\,du^A\,du^B} = \int_a^b \sqrt{g_{AB}\frac{du^A}{d\sigma}\frac{du^B}{d\sigma}}\,d\sigma = \int_a^b \sqrt{g_{AB}\,\dot{u}^A(\sigma)\,\dot{u}^B(\sigma)}\,d\sigma. \]

We realize that this has the same form as the length formula we have found earlier. The only difference is that we are in curved spaces. This suggests that we can generalize this formula for any n-dimensional space.
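The length formula lends itself to a direct numerical check. Below is a minimal Python sketch (standard library only) that approximates $L$ with a midpoint rule and central differences. It assumes the standard sphere metric $g_{AB} = \mathrm{diag}(R^2, R^2\sin^2\theta)$ in the coordinates $(\theta, \phi)$; the names `sphere_metric` and `curve_length` are illustrative only.

```python
import math

def sphere_metric(theta, phi, R=1.0):
    """Metric g_AB of a sphere of radius R in (theta, phi) coordinates."""
    return [[R * R, 0.0], [0.0, (R * math.sin(theta)) ** 2]]

def curve_length(curve, metric, a, b, n=2000, h=1e-6):
    """Midpoint-rule estimate of L = int_a^b sqrt(g_AB u'^A u'^B) d(sigma).

    curve(sigma) returns (u^1, u^2); metric(u^1, u^2) returns a 2x2 list.
    """
    d_sigma = (b - a) / n
    total = 0.0
    for k in range(n):
        sigma = a + (k + 0.5) * d_sigma
        u = curve(sigma)
        # Central-difference estimate of du^A / d(sigma).
        up, um = curve(sigma + h), curve(sigma - h)
        du = [(p - m) / (2.0 * h) for p, m in zip(up, um)]
        g = metric(*u)
        integrand = sum(g[A][B] * du[A] * du[B]
                        for A in range(2) for B in range(2))
        total += math.sqrt(integrand) * d_sigma
    return total

if __name__ == "__main__":
    R = 2.0
    metric = lambda th, ph: sphere_metric(th, ph, R)
    # Circle of constant latitude theta = pi/3: expected 2 pi R sin(pi/3).
    latitude = curve_length(lambda s: (math.pi / 3, s), metric, 0.0, 2 * math.pi)
    # Half a meridian, theta from 0 to pi at phi = 0: expected pi R.
    meridian = curve_length(lambda s: (s, 0.0), metric, 0.0, math.pi)
    print(latitude, meridian)
```

Because the function only needs a metric and a parameterized curve, the same sketch works for any other 2-dimensional metric supplied in place of `sphere_metric`.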

We have defined the natural basis vectors as $\vec{e}_A = \partial\vec{S}/\partial u^A$. What about the dual basis vectors? If we try to define the dual basis vectors as $\vec{e}^A = \vec{\nabla} u^A$, we run into trouble. Recall that we previously had three coordinates, and that the gradient of $u$ is a normal vector to the surface of constant $u$. However, in 2-dimensional spaces the sets of constant $u$ are lines, which have infinitely many normal vectors. This simply means we cannot use the gradient operator to find $\vec{e}^A$. Rather, we will use a more powerful tool, the inverse metric tensor $g^{AB}$, to raise the index of the natural basis vectors $\vec{e}_A$:

\[ \vec{e}^A = g^{AB}\,\vec{e}_B. \]

So, the general procedure is: (i) find the tangent vectors, i.e., the natural basis vectors $\vec{e}_A$, (ii) generate the metric tensor $g_{AB}$, (iii) compute the inverse metric tensor $g^{AB}$, and (iv) find the dual basis vectors $\vec{e}^A$ by raising the index of $\vec{e}_A$.
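The four steps can be sketched numerically. The Python snippet below (standard library only) is a rough illustration rather than the method used in the course: it estimates the tangent vectors by central differences, so all results are approximate, and it uses the sphere of radius 2 from Example 5.2.1 as the test surface. All function names are hypothetical.

```python
import math

def sphere(theta, phi, R=2.0):
    """Embedding S(theta, phi) of a sphere of radius R in flat 3-space."""
    return (R * math.sin(theta) * math.cos(phi),
            R * math.sin(theta) * math.sin(phi),
            R * math.cos(theta))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tangent_vectors(surface, u, v, h=1e-6):
    """Step (i): natural basis e_u, e_v by central differences."""
    e_u = [(p - m) / (2 * h) for p, m in zip(surface(u + h, v), surface(u - h, v))]
    e_v = [(p - m) / (2 * h) for p, m in zip(surface(u, v + h), surface(u, v - h))]
    return [e_u, e_v]

def metric_tensor(basis):
    """Step (ii): g_AB = e_A . e_B."""
    return [[dot(basis[A], basis[B]) for B in range(2)] for A in range(2)]

def inverse_2x2(g):
    """Step (iii): inverse metric g^AB of a 2x2 metric."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def dual_basis(basis, g_inv):
    """Step (iv): e^A = g^AB e_B, each a vector in the embedding 3-space."""
    return [[sum(g_inv[A][B] * basis[B][i] for B in range(2)) for i in range(3)]
            for A in range(2)]

if __name__ == "__main__":
    basis = tangent_vectors(sphere, 1.0, 0.5)
    g = metric_tensor(basis)
    duals = dual_basis(basis, inverse_2x2(g))
    print(g)  # close to diag(R^2, R^2 sin^2 theta) with R = 2, theta = 1
    # Duality check: e^A . e_B should be the Kronecker delta.
    print([[round(dot(duals[A], basis[B]), 9) for B in range(2)] for A in range(2)])
```

The duality check $\vec{e}^A \cdot \vec{e}_B = \delta^A_B$ is a convenient way to catch mistakes in steps (iii) and (iv).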

Example 5.2.2. Consider a saddle in 3-dimensional flat space. Using paraboloidal coordinates with constant $w$, find the metric tensor and inverse metric tensor of this saddle surface.

Recall the conversion formulas for paraboloidal coordinates:

\[ x = u + v, \qquad y = u - v, \qquad z = 2uv. \]

Let the saddle be

\[ \vec{r} = (u + v,\; u - v,\; 2uv). \]

We first find the tangent vectors (or the natural basis vectors):

\[ \vec{e}_1 = \frac{\partial \vec{r}}{\partial u} = (1, 1, 2v)^\top, \qquad \vec{e}_2 = \frac{\partial \vec{r}}{\partial v} = (1, -1, 2u)^\top. \]

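Finding the metric tensor is then purely mechanical:

\[ [g_{AB}] = [\vec{e}_A \cdot \vec{e}_B] = \begin{pmatrix} 2 + 4v^2 & 4uv \\ 4uv & 2 + 4u^2 \end{pmatrix}, \]

as is finding the inverse metric tensor:

\[ [g^{AB}] = [g_{AB}]^{-1} = \frac{1}{2(1 + 2u^2 + 2v^2)} \begin{pmatrix} 1 + 2u^2 & -2uv \\ -2uv & 1 + 2v^2 \end{pmatrix}. \]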

With the natural basis vectors and the inverse metric tensor, we can easily find the dual basis vectors from $\vec{e}^A = g^{AB}\vec{e}_B$:

\[ \vec{e}^1 = g^{11}\vec{e}_1 + g^{12}\vec{e}_2 = \frac{1}{2(1 + 2u^2 + 2v^2)} \begin{pmatrix} 1 + 2u^2 - 2uv \\ 1 + 2u^2 + 2uv \\ 2v \end{pmatrix}, \]

\[ \vec{e}^2 = g^{21}\vec{e}_1 + g^{22}\vec{e}_2 = \frac{1}{2(1 + 2u^2 + 2v^2)} \begin{pmatrix} 1 + 2v^2 - 2uv \\ -1 - 2v^2 - 2uv \\ 2u \end{pmatrix}. \]

One can check directly that $\vec{e}^A \cdot \vec{e}_B = \delta^A_B$.

Ultimately, though, we will find that basis vectors are not so useful once we already have the metric tensor. So we will not be worrying about them so much moving forward. In the next sections, we will explore the Einstein equations, whose purpose is to find the metric tensor describing a space with a given matter distribution. Once we obtain the metric tensor, we can just mechanically work out other characteristics of the space.

5.3 Manifolds

An arbitrary, curved n-dimensional space is called a manifold. Assuming that we know the metric tensor, we can write the n coordinates associated with the space as

\[ x^a = (x^1, x^2, \dots, x^n), \]

for any coordinate system. Next, assume that we have two coordinate systems whose conversion formulae are differentiable and invertible functions:

\[ x^{a'} = x^{a'}(x^b), \qquad x^b = x^b(x^{a'}). \]

Next, just like before, we define the Jacobian matrix for the transformation:

\[ [X^{a'}_b] = \left[\frac{\partial x^{a'}}{\partial x^b}\right]. \]

And of course the inverse transformation:

\[ [X^a_{b'}] = \left[\frac{\partial x^a}{\partial x^{b'}}\right]. \]

These have the usual properties we have seen before:

\[ X^{a'}_b X^b_{c'} = \delta^{a'}_{c'}, \qquad X^a_{b'} X^{b'}_c = \delta^a_c. \]

Again, just as before, we define vectors, tensors, and scalars by how they transform:

Contravariant vectors: $\lambda^{a'} = X^{a'}_b \lambda^b$

Covariant vectors: $\mu_{a'} = X^b_{a'} \mu_b$

Tensors: $\tau^{a'b'}_{c'd'} = X^{a'}_e X^{b'}_f X^g_{c'} X^h_{d'} \tau^{ef}_{gh}$

Raising/lowering index: $\tau^a{}_b = g_{bd}\tau^{ad}$

Identity: $g^{ab} g_{bc} = \delta^a_c$

In general, the metric tensor need not be positive definite, which means that $ds^2$ can be positive, negative, or zero. However, the signature (the number of positives minus the number of negatives) of $g_{ab}$ is a scalar and is $-2$ for all metrics in general relativity. For instance, $\eta_{\mu\nu}$ has signature $-2$.

There are two classes of manifolds: Riemannian manifolds, which have positive definite metric tensors, and pseudo-Riemannian manifolds, where metric tensors do not have to be positive definite. Note that spacetime is a pseudo-Riemannian manifold.

To define lengths and distances as real numbers, we need to add absolute values to our previous definitions:

Infinitesimal distance: $ds = \sqrt{|g_{ab}\,dx^a\,dx^b|}$

Length of a curve: $L = \int ds = \int_a^b \sqrt{|g_{ab}\,dx^a\,dx^b|}$

Vector norm: $\|\vec{\lambda}\| = \sqrt{|\lambda_a \lambda^a|}$

Note that vectors can be null or non-null. For non-null vectors in general n-dimensional manifolds, we can also define an “angle” $\theta$ between two vectors:

\[ \cos\theta = \frac{\vec{\lambda} \cdot \vec{\mu}}{\|\vec{\lambda}\|\,\|\vec{\mu}\|} = \frac{g_{ab}\lambda^a\mu^b}{\|\vec{\lambda}\|\,\|\vec{\mu}\|}. \]

Beware that this definition works well (intuitively) with positive definite metric tensors. However, for spacetime (whose metric tensor is not positive definite), things can get a little weird.

Example 5.3.1. Consider a space-like vector $\lambda^\mu$ in flat 4-dimensional Minkowski spacetime. Find $\theta$ between it and itself.

Without changing the inner product (due to invariance), consider a frame where $\lambda^\mu = (0, \vec{\lambda})$. According to our definition:

\[ \cos\theta = \frac{\eta_{\mu\nu}\lambda^\mu\lambda^\nu}{\|\lambda^\mu\|^2} = \frac{-\|\vec{\lambda}\|^2}{\|\vec{\lambda}\|^2} = -1. \]

This implies $\theta = \pi$, so the vector makes an angle of $\pi$ with itself.
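The result of this example is easy to reproduce numerically. The short Python sketch below assumes the $(+,-,-,-)$ form of $\eta_{\mu\nu}$, which is consistent with the signature of $-2$ quoted above; the helper names `inner`, `norm` and `cos_angle` are illustrative only.

```python
import math

# Minkowski metric eta_{mu nu} with signature (+, -, -, -), i.e. signature -2.
ETA = [[1, 0, 0, 0],
       [0, -1, 0, 0],
       [0, 0, -1, 0],
       [0, 0, 0, -1]]

def inner(a, b):
    """Inner product eta_{mu nu} a^mu b^nu."""
    return sum(ETA[m][n] * a[m] * b[n] for m in range(4) for n in range(4))

def norm(a):
    """Norm with the absolute value, sqrt(|a_mu a^mu|)."""
    return math.sqrt(abs(inner(a, a)))

def cos_angle(a, b):
    """cos(theta) between two non-null vectors."""
    if norm(a) == 0.0 or norm(b) == 0.0:
        raise ValueError("the angle is only defined for non-null vectors")
    return inner(a, b) / (norm(a) * norm(b))

if __name__ == "__main__":
    spacelike = (0.0, 1.0, 2.0, 2.0)
    timelike = (1.0, 0.0, 0.0, 0.0)
    print(cos_angle(spacelike, spacelike))  # space-like vector with itself
    print(cos_angle(timelike, timelike))    # time-like vector with itself
```

With this sign convention a space-like vector gives $\cos\theta = -1$ with itself, while a time-like vector gives $+1$, which illustrates why the “angle” has to be interpreted with care in spacetime.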

We call the vectors $\lambda^\mu$ and $\pi^\mu$ orthogonal if $\lambda^\mu \pi_\mu = 0$, i.e., there exists a frame where they are orthogonal.

5.4 Tensors on manifolds

5.4.1 Combining tensors

In this section, we will learn the mathematical properties of tensors, including special identities and tensor algebra. Familiarizing ourselves with tensor algebra will save us a lot of time and effort when we cover the later topics, where we will encounter a few more mathematical objects that are defined on tensors.

Proposition 5.4.1. Adding tensors of the same type gives tensors.

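Proof. Let two general tensors $\tau^{ab}_c$ and $\Theta^{ab}_c$ be given, and let $\zeta^{ab}_c = \tau^{ab}_c + \Theta^{ab}_c$ in every coordinate system. We will show that $\zeta^{ab}_c$ is a tensor, i.e.:

\[ \zeta^{a'b'}_{c'} = X^{a'}_d X^{b'}_e X^f_{c'} \zeta^{de}_f. \]

The proof is a straightforward application of the transformation rules for tensors:

\[
\begin{aligned}
\zeta^{a'b'}_{c'} &= \tau^{a'b'}_{c'} + \Theta^{a'b'}_{c'} \\
&= X^{a'}_d X^{b'}_e X^f_{c'} \tau^{de}_f + X^{a'}_d X^{b'}_e X^f_{c'} \Theta^{de}_f \\
&= X^{a'}_d X^{b'}_e X^f_{c'} \left(\tau^{de}_f + \Theta^{de}_f\right) \\
&= X^{a'}_d X^{b'}_e X^f_{c'} \zeta^{de}_f.
\end{aligned}
\]

Since $\zeta^{de}_f$ transforms correctly, it is a tensor.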

Proposition 5.4.2. Multiplying a tensor by a scalar gives a tensor.

Proof. The proof of this proposition is trivial. Suppose $\sigma^a_b = \alpha\tau^a_b$, where $\tau^a_b$ is a tensor and $\alpha$ is a scalar. We want to show that $\sigma^a_b$ is also a tensor, i.e.,

\[ \sigma^{a'}_{b'} = X^{a'}_c X^d_{b'} \sigma^c_d. \]

Once again, a straightforward application of the transformation rule applies:

\[ \sigma^{a'}_{b'} = \alpha\tau^{a'}_{b'} = \alpha X^{a'}_c X^d_{b'} \tau^c_d = X^{a'}_c X^d_{b'} \sigma^c_d. \]

So far, we have seen that properties of tensors are similar to those of vectors. Specifically, we see that tensor spaces, similar to vector spaces, are also closed under addition and scalar multiplication. But there are more properties that tensor spaces have but vector spaces do not.

Proposition 5.4.3. Products of tensors are tensors.

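Proof. Suppose that $\sigma^{ab}_c = \lambda^a \tau^b_c$. We want to show:

\[ \sigma^{a'b'}_{c'} = X^{a'}_d X^{b'}_e X^f_{c'} \sigma^{de}_f. \]

Once again...

\[
\begin{aligned}
\sigma^{a'b'}_{c'} &= \lambda^{a'} \tau^{b'}_{c'} \\
&= \left(X^{a'}_d \lambda^d\right)\left(X^{b'}_e X^f_{c'} \tau^e_f\right) \\
&= X^{a'}_d X^{b'}_e X^f_{c'} \lambda^d \tau^e_f \\
&= X^{a'}_d X^{b'}_e X^f_{c'} \sigma^{de}_f.
\end{aligned}
\]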

Proposition 5.4.4. Contracting a tensor of type $(r, s)$ gives a tensor of type $(r - 1, s - 1)$.

Proof. Suppose $\tau^{ab}_{cd}$ is a $(2, 2)$ tensor. Let us call its contraction $\sigma^a_b = \tau^{ac}_{cb}$, which is a $(1, 1)$ object. We want to show that the contraction $\sigma^a_b$ is also a tensor:

\[ \sigma^{a'}_{b'} = X^{a'}_d X^g_{b'} \sigma^d_g. \]

Again, applying the transformation rules:

\[
\begin{aligned}
\sigma^{a'}_{b'} &= \tau^{a'c'}_{c'b'} \\
&= X^{a'}_d X^{c'}_e X^f_{c'} X^g_{b'} \tau^{de}_{fg} \\
&= X^{a'}_d X^g_{b'} \delta^f_e \tau^{de}_{fg} \\
&= X^{a'}_d X^g_{b'} \tau^{de}_{eg} \\
&= X^{a'}_d X^g_{b'} \sigma^d_g.
\end{aligned}
\]

Note that we have already seen this property in the index lowering/raising operations. For example:

\[ \lambda_a = g_{ab}\lambda^b. \]

As a consequence of the above proposition, we can let $\sigma^{ab}_c$ be

\[ \sigma^{ab}_c = \tau^{ab}_e \mu^e g_{cf} \lambda^f, \]

and know immediately that it is a tensor (since the indices “add up to” or “contract to” upper $ab$ and lower $c$).
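The contraction property can also be checked numerically in a simple setting. The Python sketch below is only an illustration under stated assumptions: it works in 2 dimensions, uses a constant (linear) coordinate change so that the Jacobian $X^{a'}_b$ is just a fixed matrix, and takes a $(2,1)$ tensor $\tau^{ab}_c$ with made-up components. All names (`transform_21`, `contract`, `X`, `TAU`) are hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix, used for the inverse Jacobian X^a_{b'}."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def transform_21(tau, X, X_inv):
    """tau'^{ab}_c = X^{a'}_d X^{b'}_e X^f_{c'} tau^{de}_f for a (2,1) tensor."""
    rng = range(2)
    return [[[sum(X[a][d] * X[b][e] * X_inv[f][c] * tau[d][e][f]
                  for d in rng for e in rng for f in rng)
              for c in rng] for b in rng] for a in rng]

def contract(tau):
    """sigma^a = tau^{ab}_b: contract the second upper index with the lower one."""
    return [sum(tau[a][b][b] for b in range(2)) for a in range(2)]

def transform_vector(v, X):
    """lambda^{a'} = X^{a'}_d lambda^d for a contravariant vector."""
    return [sum(X[a][d] * v[d] for d in range(2)) for a in range(2)]

# Illustrative Jacobian of a linear coordinate change (determinant 1).
X = [[2.0, 1.0], [1.0, 1.0]]
X_INV = inverse_2x2(X)
# Illustrative components tau^{ab}_c, indexed as TAU[a][b][c].
TAU = [[[1.0, 2.0], [3.0, 4.0]],
       [[5.0, 6.0], [7.0, 8.0]]]

if __name__ == "__main__":
    contract_then_transform = transform_vector(contract(TAU), X)
    transform_then_contract = contract(transform_21(TAU, X, X_INV))
    # Both orders should agree if the contraction is a (1,0) tensor.
    print(contract_then_transform, transform_then_contract)
```

Contracting first and transforming second gives the same components as transforming first and contracting second, which is exactly the statement that the contraction transforms as a tensor of type $(1, 0)$.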

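Proposition 5.4.5 (Dividing: Quotient Theorem). Suppose $\tau^a_{bc}\lambda^c$ transforms as a tensor for all contravariant vectors $\lambda^c$. The quotient theorem says $\tau^a_{bc}$ is a tensor.

Proof. We want to show that

\[ \tau^{a'}_{b'g'} = X^{a'}_d X^e_{b'} X^f_{g'} \tau^d_{ef}. \]

We first know that the given quantity transforms correctly as a tensor, i.e., $\tau^{a'}_{b'c'}\lambda^{c'} = X^{a'}_d X^e_{b'} \tau^d_{ef}\lambda^f$. We also know that $\lambda^{c'} = X^{c'}_f \lambda^f$. It follows that

\[
\begin{aligned}
\tau^{a'}_{b'c'}\left(X^{c'}_f \lambda^f\right) - X^{a'}_d X^e_{b'} \tau^d_{ef}\lambda^f &= 0 \\
\left(\tau^{a'}_{b'c'} X^{c'}_f - X^{a'}_d X^e_{b'} \tau^d_{ef}\right)\lambda^f &= 0.
\end{aligned}
\]

Since this holds for all $\lambda^f$, the bracket must vanish. Multiplying by $X^f_{g'}$ then gives

\[
\begin{aligned}
\tau^{a'}_{b'c'} X^{c'}_f &= X^{a'}_d X^e_{b'} \tau^d_{ef} \\
\tau^{a'}_{b'c'}\,\delta^{c'}_{g'} &= X^{a'}_d X^e_{b'} X^f_{g'} \tau^d_{ef} \\
\tau^{a'}_{b'g'} &= X^{a'}_d X^e_{b'} X^f_{g'} \tau^d_{ef}.
\end{aligned}
\]

Therefore, $\tau^a_{bc}$ is a tensor.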


5.4.2 Special tensors

Symmetric tensors. Let a two-indexed tensor $\tau^{ab}$ be given. It is symmetric if

\[ \tau^{ab} = \tau^{ba}. \]

Note that a straightforward application of the transformation rule can show that if a tensor is symmetric in one frame ($S$), then it is also symmetric in a different frame ($S'$).

Anti-symmetric tensors. Let a two-indexed tensor $\tau^{ab}$ be given. We say that $\tau^{ab}$ is anti-symmetric if

\[ \tau^{ab} = -\tau^{ba}. \]

Again, an application of the transformation rule can show that if a tensor is anti-symmetric in one frame, it is anti-symmetric in all frames.

Kronecker delta. Recall that the Kronecker delta (a type $(1, 1)$ tensor) is defined as

\[ \delta^a_b = \begin{cases} 1, & a = b \\ 0, & a \neq b. \end{cases} \]

As we might have seen before, the Kronecker delta is frame-independent:

\[ X^{a'}_e X^f_{b'} \delta^e_f = X^{a'}_e X^e_{b'} = \delta^{a'}_{b'}. \]

Note: For most tensors (with any arbitrary number of indices), the ordering of indices matters, i.e., $\tau^{abe} \neq \tau^{aeb}$ in general. Therefore, if we are not sure whether the ordering of the indices matters, we should not stack upper and lower indices directly above each other, because the stacked symbol could mean, say, $\tau^{a}{}_{b}{}^{c}$ or $\tau^{ac}{}_{b}$.

Chapter 6

Gravitation and Curvature

As we have said many times before, in general relativity, gravity is not a force. Rather, matter (mass and energy) causes spacetime to curve. And because gravity is not a force, we define free particles - objects that are not acted upon by any force - as objects moving with no forces other than gravity. The trajectories of free particles are called geodesics.

In this section, we will learn about:

• Curvature: How can we tell if a space is curved?

• Geodesics: What is an equation to solve for a geodesic, and how can we solve for a geodesic?

• Motion in curved spaces: How do vectors behave as we move them along a geodesic? What if we move them not along a geodesic?

• The laws of physics: How do we describe the laws of physics in tensor form? As space can be curved and vectors can change due to the curvature of space, how do we define the “derivative?”

• Newtonian limit: Will we be able to get back Newtonian physics in the limit?

This section will provide us with some powerful mathematical machinery for working with and understanding the Einstein equations and cosmology in the future.

6.1 Curvature

As tiny humans living on the surface of a giant ball, how could we tell that the space we are living in, which seems locally flat, is actually curved? And what do we mean by “going straight” once we have found out the Earth is round?

One way to test if a 2-dimensional space (the Earth, for instance) is curved is simply to walk, but with equal left and right step lengths. Have two people start walking parallel to each other. If at some point their paths cross, then the space is curved.

Another way to see if the Earth is curved or flat is doing the “triangle test.” Simply generate a “spherical triangle” on the surface of Earth by intersecting three great circles (so the equator and two longitude lines that are perpendicular at the pole would suffice). We would observe that the sum of the angles in this triangle is 270°, not 180°. This says that the surface of the Earth is curved.
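The triangle test can be reproduced with a little vector arithmetic. The Python sketch below (standard library only) is an illustration under the assumption of a unit sphere: the angle at each vertex is measured between the tangent directions of the two great-circle arcs leaving that vertex. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tangent_at(v, p):
    """Unit tangent at vertex v of the great-circle arc from v towards p."""
    proj = dot(p, v)
    t = [pi - proj * vi for pi, vi in zip(p, v)]   # remove the radial part
    length = math.sqrt(dot(t, t))
    return [x / length for x in t]

def vertex_angle(v, p, q):
    """Interior angle at v of the spherical triangle (v, p, q)."""
    c = dot(tangent_at(v, p), tangent_at(v, q))
    return math.acos(max(-1.0, min(1.0, c)))

def angle_sum_degrees(a, b, c):
    """Sum of the three interior angles, in degrees."""
    total = vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)
    return math.degrees(total)

if __name__ == "__main__":
    north_pole = (0.0, 0.0, 1.0)
    equator_a = (1.0, 0.0, 0.0)   # longitude 0 on the equator
    equator_b = (0.0, 1.0, 0.0)   # longitude 90 degrees on the equator
    print(angle_sum_degrees(north_pole, equator_a, equator_b))
```

For the triangle bounded by the equator and two perpendicular longitude lines, each interior angle is a right angle, so the sum comes out as 270° up to rounding.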

Now, before we go on and attempt to make a rigorous procedure to distinguish curved and flat spaces, we should make a few key observations from the above example:

• If parallel lines cross in a space, then the space is non-Euclidean and is therefore curved.

• Lines formed when walking “straight” (walking with left step size equal to right step size, as in our first method) are geodesics.

• On a sphere, great circles are circles whose center is the center of the sphere, for example: the equator and longitude lines. Beware that latitude lines (other than the equator) are not great circles. On a sphere, great circles are geodesics.

6.2 Geodesics and Affine connections Γσµν

In this section, we will learn a procedure to generate a geodesic, given that we are in a space or spacetime where we know the metric of the space. This procedure will follow something along the lines of “moving straight” in the space, but will be much more mathematically rigorous.

6.2.1 Flat 3D space

Let us consider the simplest possible case, flat 3D space, where geodesics are, as we guessed, literally straight lines. In Cartesian coordinates, a straight line obeys the “no-curvature” equation:

\[ \frac{d^2\vec{r}}{dt^2} = \vec{0}. \]

Now, suppose that we are in an arbitrary curvilinear coordinate system. What would be the equation of a straight line here? It makes sense to think about distance and displacement. If we are moving such that the distance covered is the same as the displacement at every instant, then we are moving in a straight line. So, let our position be $\vec{r}$, with arc length parameterization $\vec{r} = \vec{r}(s)$, where $s$ is essentially the distance traveled. This parameterization gives:

\[ \left\|\dot{\vec{r}}(s)\right\| = \left\|\frac{d\vec{r}}{ds}\right\| = 1. \]

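We should recognize that $d\vec{r}/ds$ is nothing but the tangent vector to the path. Let's call this vector $\vec{\lambda}$. Using the chain rule, it follows that

\[ \vec{\lambda} = \frac{d\vec{r}}{ds} = \frac{\partial\vec{r}}{\partial u^i}\frac{\partial u^i}{\partial s} = \frac{\partial u^i}{\partial s}\,\vec{e}_i = \lambda^i\,\vec{e}_i, \]

where $u^i$ is simply a coordinate in the curvilinear coordinate system. The above equality gives:

\[ \lambda^i = \frac{\partial u^i}{\partial s} = \dot{u}^i(s). \]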

Now, because $\vec{\lambda}$ is a tangent vector, its direction does not change along a straight line. Since we also know that $\|\dot{\vec{r}}(s)\| = 1$, $\vec{\lambda}$ has both fixed direction and fixed magnitude along a straight line. Therefore, “straightness,” at least in this limited case, implies that the derivative of the tangent vector with respect to arc length is zero. We call this implication the condition of straightness:

\[ \boxed{\frac{d\vec{\lambda}}{ds} = \vec{0}} \]

Let's take a closer look at the boxed expression but in a different coordinate system. If we think about the above equality, it should also make sense for any general 3-dimensional coordinate system in flat space, in which case:

\[ \frac{d}{ds}\left(\lambda^i\,\vec{e}_i\right) = \dot{\lambda}^i\,\vec{e}_i + \lambda^i\,\dot{\vec{e}}_i = \vec{0}. \]

In Cartesian coordinates, the basis vectors $\vec{e}_i$ happen to be constant, so $\dot{\vec{e}}_i = \vec{0}$. This gives $\dot{\lambda}^i = 0$, and since $\lambda^i = \dot{u}^i = \dot{x}^i$, this implies $d^2x^i/ds^2 = 0$. Now imagine that we parameterize $\vec{r}$ with a different parameter, say $t$. If $t \propto s$, then

\[ \frac{d^2x^i}{ds^2} = \frac{d^2x^i}{dt^2} = 0 \]

holds. This is equivalent to the case where we have an object moving with constant velocity (so that $s = vt$). If $s$ is not proportional to $t$, then the above equality does not hold.

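We have established the “condition of straightness” in Cartesian coordinates. However, the equation $d^2x^i/ds^2 = 0$ does not hold for general coordinate systems. Rather, the more fundamental equation applies:

\[ \frac{d\lambda^i}{ds}\,\vec{e}_i + \lambda^i\,\frac{d\vec{e}_i}{ds} = \vec{0}, \qquad \text{where} \qquad \frac{d\vec{e}_i}{ds} = \frac{\partial\vec{e}_i}{\partial u^j}\frac{\partial u^j}{\partial s} = \frac{\partial\vec{e}_i}{\partial u^j}\,\dot{u}^j \neq \vec{0} \]

in general. Now, to simplify our notation, let us use (as we have introduced before) $\partial_j = \partial/\partial u^j$. This gives:

\[ \frac{d\lambda^i}{ds}\,\vec{e}_i + \lambda^i\,\frac{d\vec{e}_i}{ds} = \dot{\lambda}^i\,\vec{e}_i + \lambda^i\left(\partial_j\vec{e}_i\right)\dot{u}^j = \vec{0}. \]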

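Note that the $\partial_j\vec{e}_i$ are vectors, which we can expand in terms of basis vectors, but we will not do that explicitly. Rather, let us define the Christoffel symbol, or the affine connection, $\Gamma^k_{ij}$ in terms of $\partial_j\vec{e}_i$ as:

\[ \partial_j\vec{e}_i = \Gamma^k_{ij}\,\vec{e}_k. \]

We will interpret $\Gamma^k_{ij}$ as “the $k$th component of the derivative of $\vec{e}_i$ with respect to $u^j$.” It is very important that we keep in mind $\Gamma^k_{ij}$ is not a tensor. Rather, the Christoffel symbols are connections - they are just vector components.

With the definition of $\Gamma^k_{ij}$, we can express the derivative of $\vec{e}_i$ as:

\[ \dot{\vec{e}}_i = \left(\partial_j\vec{e}_i\right)\dot{u}^j = \Gamma^k_{ij}\,\vec{e}_k\,\dot{u}^j. \]

It follows that the condition of straightness becomes:

\[
\begin{aligned}
\frac{d\vec{\lambda}}{ds} &= \dot{\lambda}^i\,\vec{e}_i + \lambda^i\,\Gamma^k_{ij}\,\vec{e}_k\,\dot{u}^j \\
&= \dot{\lambda}^i\,\vec{e}_i + \lambda^j\,\Gamma^i_{jk}\,\vec{e}_i\,\dot{u}^k, \quad \text{letting } i \to j,\ j \to k,\ k \to i \\
&= \left(\ddot{u}^i + \Gamma^i_{jk}\,\dot{u}^j\dot{u}^k\right)\vec{e}_i = \vec{0},
\end{aligned}
\]

where we used $\lambda^i = \dot{u}^i$ in the last step. Since the basis vectors are linearly independent,

\[ \ddot{u}^i + \Gamma^i_{jk}\,\dot{u}^j\dot{u}^k = 0. \]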

Let us write the above equality more completely with the arc length parameter $s$, and call it the equation of a straight line in flat 3-dimensional space:

\[ \frac{d^2u^i}{ds^2} + \Gamma^i_{jk}\frac{du^j}{ds}\frac{du^k}{ds} = 0. \]

For $i = 1, 2, 3$, the above expression encodes 3 equations whose solution is a geodesic in flat space. But notice how the number of dimensions does not really play a role anywhere in our derivation. This hints to us that the equation also holds for n-dimensional manifolds, but we will discuss this topic later on.
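As a quick sanity check of this equation, one can verify numerically that an ordinary straight line satisfies it in a curvilinear coordinate system. The Python sketch below is an illustration only: it restricts to plane polar coordinates $(r, \phi)$ (a 2-dimensional slice of flat space) and uses the standard polar Christoffel symbols $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$, which are quoted here rather than derived in the text above. The function names are hypothetical, and derivatives are estimated by finite differences.

```python
import math

def line_in_polar(s):
    """The straight line x = 1, y = s (s is arc length), written as (r, phi)."""
    return (math.hypot(1.0, s), math.atan2(s, 1.0))

def unit_circle(s):
    """The circle r = 1, phi = s (s is arc length): not a straight line."""
    return (1.0, s)

def derivatives(curve, s, h=1e-4):
    """Value, first and second derivative of the curve by central differences."""
    up, u0, um = curve(s + h), curve(s), curve(s - h)
    first = [(p - m) / (2 * h) for p, m in zip(up, um)]
    second = [(p - 2 * c + m) / (h * h) for p, c, m in zip(up, u0, um)]
    return u0, first, second

def geodesic_residual(curve, s):
    """Left-hand side of u''^i + Gamma^i_jk u'^j u'^k in polar coordinates."""
    (r, _phi), (dr, dphi), (ddr, ddphi) = derivatives(curve, s)
    res_r = ddr - r * dphi * dphi              # Gamma^r_{phi phi} = -r
    res_phi = ddphi + (2.0 / r) * dr * dphi    # Gamma^phi_{r phi} = 1/r, twice
    return res_r, res_phi

if __name__ == "__main__":
    print(geodesic_residual(line_in_polar, 0.7))  # both entries close to zero
    print(geodesic_residual(unit_circle, 0.7))    # radial entry close to -1
```

The straight line gives residuals that vanish up to finite-difference error, while the circle does not, as expected for a curve that is not straight.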

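Now, let us “worry” a little bit about the Christoffel symbol $\Gamma^i_{jk}$. It seems that because $\Gamma^i_{jk}$ has three indices $i, j, k$, there will be 27 coefficients associated with different combinations of $i, j, k$. However, we can show that this is not the case. Recall that

\[ \partial_j\vec{e}_i = \Gamma^k_{ij}\,\vec{e}_k. \]

Dotting the entire equation with $\vec{e}^{\,l}$ gives

\[ \partial_j\vec{e}_i \cdot \vec{e}^{\,l} = \Gamma^k_{ij}\,\vec{e}_k \cdot \vec{e}^{\,l} = \Gamma^k_{ij}\,\delta^l_k = \Gamma^l_{ij}. \]

Also, observe that

\[ \partial_j\vec{e}_i = \frac{\partial}{\partial u^j}\frac{\partial\vec{r}}{\partial u^i} = \frac{\partial}{\partial u^i}\frac{\partial\vec{r}}{\partial u^j} = \partial_i\vec{e}_j. \]

This simply implies a symmetric relation:

\[ \Gamma^l_{ij} = \Gamma^l_{ji}. \]

So, it turns out that we only have 18 independent terms, instead of 27.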

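Next, we want to find the relationship between the Christoffel symbols and the metric tensor. Specifically, we want to find out how to compute the Christoffel symbols using the metric tensor. Consider:

\[
\begin{aligned}
\partial_k g_{ij} &= \partial_k\left(\vec{e}_i \cdot \vec{e}_j\right) = \vec{e}_j \cdot \partial_k\vec{e}_i + \vec{e}_i \cdot \partial_k\vec{e}_j \\
&= \vec{e}_j \cdot \Gamma^m_{ik}\,\vec{e}_m + \vec{e}_i \cdot \Gamma^m_{jk}\,\vec{e}_m.
\end{aligned}
\]

This gives

\[ \partial_k g_{ij} = \Gamma^m_{ik}\,g_{jm} + \Gamma^m_{jk}\,g_{im}. \]

By cyclically permuting the indices, we also get

\[ \partial_i g_{jk} = \Gamma^m_{ji}\,g_{km} + \Gamma^m_{ki}\,g_{jm}, \]

\[ \partial_j g_{ik} = \Gamma^m_{kj}\,g_{im} + \Gamma^m_{ij}\,g_{km}. \]

Adding the first two equations and subtracting the third, and using the symmetry $\Gamma^m_{ij} = \Gamma^m_{ji}$, we get

\[ \partial_k g_{ij} + \partial_i g_{jk} - \partial_j g_{ik} = 2\,\Gamma^m_{ik}\,g_{jm}. \]

Next, multiplying this entire equation by $g^{jl}$, so that $g_{jm}g^{jl} = \delta^l_m$, we get

\[ \Gamma^l_{ik} = \frac{1}{2}\,g^{jl}\left(\partial_k g_{ij} + \partial_i g_{jk} - \partial_j g_{ik}\right). \]

Finally, letting $l \to k$, $k \to i$, $i \to j$, $j \to l$, we obtain the formula to compute the Christoffel symbol from the metric:

\[ \boxed{\Gamma^k_{ij} = \frac{1}{2}\,g^{kl}\left(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}\right)} \]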

Now, in Cartesian coordinates, the metric is simply the identity matrix, which is constant. This simply means that all Christoffel symbols $\Gamma^k_{ij}$ in Cartesian coordinates are zero. However, non-zero Christoffel symbols do not imply that the space is curved. In fact, we get non-zero Christoffel symbols in curvilinear coordinates in flat space whenever the basis vectors $\vec{e}_i$ are not constant. An example would be spherical coordinates, as we will soon find out.

We might ask whether there is a nice, short way to calculate the Christoffel symbols. Unfortunately, the answer is no - brute force is the only way to go. Luckily, though, Mathematica can handle Christoffel symbol computations, given a metric tensor.
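The brute-force computation is also easy to automate outside of Mathematica. The Python sketch below (standard library only) is an illustrative, approximate implementation of $\Gamma^k_{ij} = \frac{1}{2}g^{kl}(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})$: the metric derivatives are estimated with central differences, so the results carry a small numerical error. It assumes the standard flat-space metric in spherical coordinates $(r, \theta, \phi)$, $g = \mathrm{diag}(1, r^2, r^2\sin^2\theta)$; the function names are hypothetical.

```python
import math

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def christoffel(metric, u, h=1e-5):
    """Return gamma[k][i][j] ~ Gamma^k_ij at the point u for metric(u) -> g_ij."""
    n = len(u)
    g_inv = invert(metric(u))
    dg = []  # dg[l][i][j] approximates d g_ij / d u^l
    for l in range(n):
        up, um = list(u), list(u)
        up[l] += h
        um[l] -= h
        gp, gm = metric(up), metric(um)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    return [[[0.5 * sum(g_inv[k][l] * (dg[i][j][l] + dg[j][i][l] - dg[l][i][j])
                        for l in range(n))
              for j in range(n)] for i in range(n)] for k in range(n)]

def spherical_metric(u):
    """Flat 3D metric in spherical coordinates u = (r, theta, phi)."""
    r, theta, _phi = u
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def cartesian_metric(u):
    """Flat 3D metric in Cartesian coordinates: the identity matrix."""
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    gamma = christoffel(spherical_metric, [2.0, 1.0, 0.3])
    print(gamma[0][1][1])  # Gamma^r_{theta theta}, close to -r = -2
    print(gamma[2][1][2])  # Gamma^phi_{theta phi}, close to cot(theta)
```

With the Cartesian metric every entry comes out as zero, while the spherical metric gives non-zero entries even though the space is flat, in line with the remark above.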

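Example 6.2.1. Find $\Gamma^1_{23} = \Gamma^1_{32}$.

Again, brute force is the way to go:

\[
\begin{aligned}
\Gamma^1_{23} = \Gamma^1_{32} &= \frac{1}{2}\,g^{11}\left(\partial_2 g_{31} + \partial_3 g_{21} - \partial_1 g_{23}\right) \\
&+ \frac{1}{2}\,g^{12}\left(\partial_2 g_{32} + \partial_3 g_{22} - \partial_2 g_{23}\right) \\
&+ \frac{1}{2}\,g^{13}\left(\partial_2 g_{33} + \partial_3 g_{23} - \partial_3 g_{23}\right).
\end{aligned}
\]

We can of course repeat this process to find the other 25 terms, but we will not do that!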

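We have used arc length as a parameter in finding the geodesic equation:

\[ \frac{d^2u^i}{ds^2} + \Gamma^i_{jk}\frac{du^j}{ds}\frac{du^k}{ds} = 0. \]

Now, what if we use a different parameter $t = f(s)$? How will the form of the equation above change? We can show (but we won't) that with $t = f(s)$, the above equation has an extra term:

\[ \frac{d^2u^i}{dt^2} + \Gamma^i_{jk}\frac{du^j}{dt}\frac{du^k}{dt} = -\frac{d^2t}{ds^2}\left(\frac{dt}{ds}\right)^{-2}\frac{du^i}{dt}. \]

But if we have a linear relationship between $t$ and $s$, i.e.,

\[ t = As + B, \]

where $A \neq 0$ and $B$ are constants, then we get

\[ \frac{d^2u^i}{dt^2} + \Gamma^i_{jk}\frac{du^j}{dt}\frac{du^k}{dt} = -\frac{d^2t}{ds^2}\left(\frac{dt}{ds}\right)^{-2}\frac{du^i}{dt} = 0, \]

simply because

\[ \frac{d^2t}{ds^2} = 0. \]

A parameter $t$ of the form $t = As + B$ is called an affine parameter.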
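The extra term quoted above can be recovered with the chain rule. The following is a short sketch of that step, assuming $t = f(s)$ is twice differentiable with $dt/ds \neq 0$:

```latex
\begin{aligned}
\frac{du^i}{ds} &= \frac{dt}{ds}\,\frac{du^i}{dt},
\qquad
\frac{d^2u^i}{ds^2} = \left(\frac{dt}{ds}\right)^{2}\frac{d^2u^i}{dt^2}
                    + \frac{d^2t}{ds^2}\,\frac{du^i}{dt}, \\[4pt]
0 &= \frac{d^2u^i}{ds^2} + \Gamma^i_{jk}\frac{du^j}{ds}\frac{du^k}{ds}
   = \left(\frac{dt}{ds}\right)^{2}
     \left[\frac{d^2u^i}{dt^2} + \Gamma^i_{jk}\frac{du^j}{dt}\frac{du^k}{dt}\right]
     + \frac{d^2t}{ds^2}\,\frac{du^i}{dt}.
\end{aligned}
```

Dividing by $(dt/ds)^2$ and moving the last term to the right-hand side gives the equation with the extra term, which vanishes exactly when $d^2t/ds^2 = 0$, i.e., when $t$ is an affine parameter.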

6.3 Geodesics in curved space

We have seen multiple correspondences between flat 3-dimensional space in curvilinear coordinates and curved general N-dimensional manifolds:

\[
\begin{aligned}
u^j &\leftrightarrow x^a \\
g_{ij} &\leftrightarrow g_{ab} \\
\lambda^{i'} = U^{i'}_j \lambda^j &\leftrightarrow \lambda^{a'} = X^{a'}_b \lambda^b \\
ds^2 = g_{ij}\,du^i\,du^j &\leftrightarrow ds^2 = g_{ab}\,dx^a\,dx^b,
\end{aligned}
\]

which is not surprising because flat 3-dimensional space is simply a special case of general curved N-dimensional spaces. We expect the form of the geodesic equation in flat 3-dimensional space to remain the same in general curved N-dimensional space. And we should be correct, because nothing in our derivation of the geodesic equation is dimension-specific, i.e., the derived equation does not apply exclusively to the flat three-dimensional case. So, the geodesic equation in general curved N-dimensional space is simply

\[ \boxed{\frac{d^2x^a}{d\sigma^2} + \Gamma^a_{bc}\frac{dx^b}{d\sigma}\frac{dx^c}{d\sigma} = 0} \]

where $\sigma$ is an affine parameter. The Christoffel symbol $\Gamma^a_{bc}$ is defined in exactly the same fashion as before:

\[ \Gamma^a_{bc} = \frac{1}{2}\,g^{ad}\left(\partial_b g_{cd} + \partial_c g_{bd} - \partial_d g_{bc}\right). \]

Now, let us delve into a nice example that illustrates how the geodesic equation gives a geodesic.

Example 6.3.1. Determine if lines of constant latitude on a 2-sphere of radius $a$ are geodesics.

We have said before that only great circles on a sphere are geodesics. Now, we can verify this fact using the geodesic equation. First, consider any circle on the 2-sphere of radius $a$. Let us orient the sphere so that this circle is a line of constant latitude. Next, let us gather all we know about the geometry of a 2-sphere. The metric tensor is given by

\[ [g_{AB}] = \begin{pmatrix} a^2 & 0 \\ 0 & a^2\sin^2\theta \end{pmatrix} \]

and the inverse metric tensor is given by

\[ [g^{AB}] = \begin{pmatrix} a^{-2} & 0 \\ 0 & a^{-2}\sin^{-2}\theta \end{pmatrix}, \]

where $\theta$ is the angle formed by the position vector $\vec{r}$ with the z-axis. The line of constant latitude is given by the parameters:

\[ u^A = (u^1, u^2) = (\theta, \phi), \]

where $\theta$ is fixed and $0 \le \phi \le 2\pi$. The question we should be asking ourselves is whether

\[ \frac{d^2u^A}{ds^2} + \Gamma^A_{BC}\frac{du^B}{ds}\frac{du^C}{ds} = 0, \]

where $s$ is the arc length parameter, holds for certain values of $\theta$. Because lines of constant latitude are circles, the arc length $s$ is easy to find: $s(\phi) = r'\phi = a\phi\sin\theta$. This gives:

\[ \phi = \frac{s}{a\sin\theta}. \]

Therefore, we can parameterize the path with the arc length $s$:

\[ u^A(s) = (u^1, u^2)(s) = \left(\theta,\ \frac{s}{a\sin\theta}\right). \]

But since $\theta$ is fixed,

\[ \frac{d^2u^A}{ds^2} = (0, 0). \]

So it seems that we only need to check for which $\theta$

\[ \Gamma^A_{BC}\frac{du^B}{ds}\frac{du^C}{ds} = 0. \]

So the next reasonable step to take is computing the Christoffel symbols. There are 8 of these, but it turns out (by symmetry and the fact that $g_{AB}$ is diagonal) that we only have to do a few computations:

\[
\begin{aligned}
\Gamma^1_{22} &= -\sin\theta\cos\theta \\
\Gamma^2_{12} &= \Gamma^2_{21} = \cot\theta \\
\Gamma^1_{11} &= \Gamma^1_{12} = \Gamma^1_{21} = \Gamma^2_{11} = \Gamma^2_{22} = 0.
\end{aligned}
\]

For $A = 1$,

\[ \Gamma^1_{BC}\frac{du^B}{ds}\frac{du^C}{ds} = -\sin\theta\cos\theta\left(\frac{du^2}{ds}\right)^2 = -\sin\theta\cos\theta\,(a\sin\theta)^{-2} = 0 \]

can only be true if $\theta = \pi/2$. This suggests that the equator is the only viable candidate to be a geodesic. However, we should check if the equality can hold for $A = 2$, because if the equality cannot hold then we have to conclude that lines of constant latitude cannot be geodesics. Let us check:

\[ \Gamma^2_{BC}\frac{du^B}{ds}\frac{du^C}{ds} = \cot\theta\left(\frac{du^1}{ds}\frac{du^2}{ds} + \frac{du^2}{ds}\frac{du^1}{ds}\right) = (0 + 0)\cot\theta = 0. \]

Ah! So equality holds for any $\theta$ in this case. Combining the above two facts, we conclude that only the circle of latitude $\theta = \pi/2$ works as a geodesic. Equivalently, for a sphere, only great circles are geodesics.
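The two checks above can be packaged into a few lines of code. The Python sketch below evaluates the left-hand side of the geodesic equation along a circle of constant latitude, using the Christoffel symbols listed in this example; the function name `latitude_residual` is an illustrative choice.

```python
import math

def latitude_residual(theta0, a=1.0):
    """Left-hand side of the geodesic equation along the latitude theta = theta0.

    The curve is u^A(s) = (theta0, s / (a sin theta0)), with s the arc length.
    Returns the pair (A = 1 component, A = 2 component).
    """
    dtheta, dphi = 0.0, 1.0 / (a * math.sin(theta0))   # du^A / ds
    ddtheta, ddphi = 0.0, 0.0                          # d^2 u^A / ds^2
    gamma_1_22 = -math.sin(theta0) * math.cos(theta0)  # Gamma^1_22
    gamma_2_12 = math.cos(theta0) / math.sin(theta0)   # Gamma^2_12 = Gamma^2_21
    res_1 = ddtheta + gamma_1_22 * dphi * dphi
    res_2 = ddphi + 2.0 * gamma_2_12 * dtheta * dphi
    return res_1, res_2

if __name__ == "__main__":
    for theta0 in (math.pi / 6, math.pi / 3, math.pi / 2):
        print(theta0, latitude_residual(theta0))
```

Only $\theta_0 = \pi/2$ makes both components vanish (up to rounding); for any other latitude the $A = 1$ component equals $-\cot\theta_0 / a^2$, so the circle is not a geodesic.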

6.4 Parallel transport

Our condition for constructing a geodesic was that the tangent vector $\vec{\lambda} = \lambda^i\vec{e}_i = \dot{u}^i\vec{e}_i$ does not change as we move along the curve, i.e., the condition of straightness:

\[ \frac{d\vec{\lambda}}{ds} = \vec{0}, \]

which led us to the geodesic equation:

\[ \ddot{u}^i + \Gamma^i_{jk}\,\dot{u}^j\dot{u}^k = 0. \]

Now, what about any arbitrary vector $\vec{\lambda}$ that is not a tangent to the geodesic? To ask how $\vec{\lambda}$ changes as it moves along a geodesic is a legitimate question. However, we have to be more specific. There are infinitely many ways for $\vec{\lambda}$ to travel along the geodesic because the “tangent condition” is no longer required. So, let us require that we move $\vec{\lambda}$ along the geodesic in such a way that the only changes happening to $\vec{\lambda}$ are due to the geometry of the space. For instance, imagine pushing a pencil along the equator without rotating it. Obviously the direction of the pencil changes over time, not because we are exerting a torque on it but rather because the surface of the globe is a curved space. We call this action of “moving a vector in a space without changing it” parallel transport. We will define this term more rigorously as we move on.

Now, because $\vec{\lambda}$ does not change, letting $t$ be an affine parameter, we get the condition of parallel transport:

\[ \frac{d\vec{\lambda}}{dt} = \vec{0}. \]

In flat space, $\vec{\lambda}$ does not change its direction due to parallel transport. But in curved spaces, as we have discussed in the example of the pencil and the globe, $\vec{\lambda}$ can change directions due to curvature. Now, we can expand the condition of parallel transport to get further insights:

\[ \dot{\lambda}^i\,\vec{e}_i + \lambda^i\,\dot{\vec{e}}_i = \vec{0}. \]

Since we also know that

\[ \dot{\vec{e}}_i = \dot{u}^j\,\partial_j\vec{e}_i = \Gamma^k_{ij}\,\dot{u}^j\,\vec{e}_k, \]

letting $k \to i$, $i \to j$, $j \to k$, we get

\[ \boxed{\dot{\lambda}^i + \Gamma^i_{jk}\,\lambda^j\dot{u}^k = 0} \]

This equation tells us how the components $\lambda^i$ change when $\vec{\lambda}$ is parallel transported along any curve parameterized by the affine parameter $t$. Note (again) that our parallel transport equations are not dimension-specific, i.e., we can just change the indices for the equation to work in any N-dimensional manifold:

\[ \dot{\lambda}^a + \Gamma^a_{bc}\,\lambda^b\dot{u}^c = 0. \]

Example 6.4.1. Show that we get the geodesic equation from the above equality in the case where $\vec{\lambda}$ is a tangent vector.

If $\lambda^i = \dot{u}^i$, then the above equality simply becomes:

\[ \ddot{u}^i + \Gamma^i_{jk}\,\dot{u}^j\dot{u}^k = 0, \]

which is exactly the geodesic equation. Note that this also says that to parallel transport tangent vectors, the curve must be a geodesic (so that $\vec{\lambda}$ remains a tangent vector).

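Example 6.4.2. Consider a unit vector $\vec{\lambda}$ on the surface of a sphere of radius $a$ which makes an angle $\alpha$ with respect to a longitude line. Show that as $\vec{\lambda}$ is parallel transported once around the line of constant latitude $\theta_0$, the direction of $\vec{\lambda}$ changes by an angle $\chi = 2\pi\omega$, where $\omega = \cos\theta_0$.

First, we parameterize the curve as $u^A = (u^1, u^2) = (\theta_0, \phi)$, where $\theta_0$ is fixed and $0 \le \phi \le 2\pi$. The arc length is $s = \phi\,a\sin\theta_0$. So this gives, as before, $\phi = s\,(a\sin\theta_0)^{-1}$. Next, let $\vec{\lambda}(0)$ be the vector before parallel transport and $\vec{\lambda}(2\pi)$ be the vector after parallel transport. We know that $\chi$ is the angle formed between these two vectors. Now, because $\vec{\lambda}(0)$ makes an angle $\alpha$ with a longitude line, we have

\[ \lambda^A(0) = \left(\lambda^1(0), \lambda^2(0)\right) = \left(a^{-1}\cos\alpha,\ (a\sin\theta_0)^{-1}\sin\alpha\right). \]

We can readily verify that $\vec{\lambda}(0)$ is indeed a unit vector:

\[
g_{AB}\lambda^A(0)\lambda^B(0)
= \begin{pmatrix} a^{-1}\cos\alpha & (a\sin\theta_0)^{-1}\sin\alpha \end{pmatrix}
\begin{pmatrix} a^2 & 0 \\ 0 & (a\sin\theta_0)^2 \end{pmatrix}
\begin{pmatrix} a^{-1}\cos\alpha \\ (a\sin\theta_0)^{-1}\sin\alpha \end{pmatrix}
= \cos^2\alpha + \sin^2\alpha = 1.
\]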

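We can also readily verify that $\lambda^A(0)$ makes an angle $\alpha$ with the longitude line. Let $\mu^A$ be a vector that points along a longitude line. In terms of the natural basis vectors,

\[ \vec{\mu} = 1\,\vec{e}_\theta + 0\,\vec{e}_\phi. \]

So, in the spherical basis, $\mu^A$ simply becomes $(1, 0)^\top$, with norm $\|\mu^A\| = \sqrt{g_{11}} = a$. The angle between $\lambda^A$ and $\mu^A$ can come out of their inner product:

\[ \cos\beta = \frac{g_{AB}\,\mu^A\lambda^B}{\|\mu^A\|\,\|\lambda^B\|} = \frac{a^2\,a^{-1}\cos\alpha}{a} = \cos\alpha. \]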

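So, the angle between $\mu^A$ and $\lambda^A$ is indeed $\alpha$. Next, we parallel transport $\lambda^A$ along the line of constant latitude $\theta_0$. We need to solve the parallel transport equations (there will be two of them) to find the new components of $\vec{\lambda}$ after parallel transporting:

\[ \dot{\lambda}^A + \Gamma^A_{BC}\,\lambda^B\dot{u}^C = 0. \]

With initial values

\[ \vec{\lambda}(0) = \begin{pmatrix} a^{-1}\cos\alpha \\ (a\sin\theta_0)^{-1}\sin\alpha \end{pmatrix}, \]

the problem boils down to an initial value problem for ordinary differential equations. Using the Christoffel symbols we have found before, we have a coupled system of equations:

\[
\begin{cases}
\dot{\lambda}^1 + \Gamma^1_{22}\,\lambda^2\dot{u}^2 = 0 \\
\dot{\lambda}^2 + \Gamma^2_{12}\,\lambda^1\dot{u}^2 + \Gamma^2_{21}\,\lambda^2\dot{u}^1 = 0.
\end{cases}
\]

We are not going over the details of how to solve this system, but we can certainly verify that, taking the affine parameter to be $t = \phi$ (so that $\dot{u}^1 = 0$ and $\dot{u}^2 = 1$),

\[ \lambda^A(t) = \left(\lambda^1(t), \lambda^2(t)\right) = \left(a^{-1}\cos(\alpha - \omega t),\ (a\sin\theta_0)^{-1}\sin(\alpha - \omega t)\right), \]

where $\omega = \cos\theta_0$, solves the initial value problem. Now, letting $t = 2\pi$, we get

\[ \lambda^A(2\pi) = \left(a^{-1}\cos(\alpha - 2\pi\omega),\ (a\sin\theta_0)^{-1}\sin(\alpha - 2\pi\omega)\right). \]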

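The final step is to find the angle between $\lambda^A(0)$ and $\lambda^A(2\pi)$:

\[
\begin{aligned}
\cos\chi &= \frac{g_{AB}\lambda^A(0)\lambda^B(2\pi)}{\|\lambda^A(0)\|\,\|\lambda^B(2\pi)\|} = g_{AB}\lambda^A(0)\lambda^B(2\pi) \\
&= a^2\,a^{-1}\cos\alpha\;a^{-1}\cos(\alpha - 2\pi\omega) + (a\sin\theta_0)^2 (a\sin\theta_0)^{-2}\sin\alpha\sin(\alpha - 2\pi\omega) \\
&= \cos\alpha\cos(\alpha - 2\pi\omega) + \sin\alpha\sin(\alpha - 2\pi\omega) \\
&= \cos(\alpha - \alpha + 2\pi\omega) \\
&= \cos 2\pi\omega.
\end{aligned}
\]

So, $\chi = 2\pi\omega = 2\pi\cos\theta_0$. What happens if we look at $\theta_0 = \pi/2$? Not surprisingly, $\chi = 2\pi\cos(\pi/2) = 0$, i.e., the direction of $\lambda^A(2\pi)$ is the same as that of $\lambda^A(0)$ if we move it along a geodesic.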
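Since the details of solving the system were skipped, a numerical integration offers an independent check of the result. The Python sketch below (standard library only) integrates the two parallel transport equations once around the latitude circle with a hand-written fourth-order Runge-Kutta step, taking $\phi$ as the affine parameter so that $\dot{u}^1 = 0$ and $\dot{u}^2 = 1$. The function names and the step count are illustrative choices, not part of the notes.

```python
import math

def transport_rhs(lam, theta0):
    """Right-hand side of the transport equations with phi as the parameter.

    d(lam^1)/d(phi) = -Gamma^1_22 lam^2 =  sin(theta0) cos(theta0) lam^2
    d(lam^2)/d(phi) = -Gamma^2_12 lam^1 = -cot(theta0) lam^1
    """
    lam1, lam2 = lam
    return (math.sin(theta0) * math.cos(theta0) * lam2,
            -math.cos(theta0) / math.sin(theta0) * lam1)

def rk4_step(lam, h, theta0):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = transport_rhs(lam, theta0)
    k2 = transport_rhs((lam[0] + 0.5 * h * k1[0], lam[1] + 0.5 * h * k1[1]), theta0)
    k3 = transport_rhs((lam[0] + 0.5 * h * k2[0], lam[1] + 0.5 * h * k2[1]), theta0)
    k4 = transport_rhs((lam[0] + h * k3[0], lam[1] + h * k3[1]), theta0)
    return (lam[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            lam[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def initial_vector(theta0, alpha, a=1.0):
    """Unit vector making an angle alpha with the longitude line."""
    return (math.cos(alpha) / a, math.sin(alpha) / (a * math.sin(theta0)))

def parallel_transport(theta0, alpha, a=1.0, steps=2000):
    """Transport the vector once around the latitude circle (phi: 0 -> 2 pi)."""
    lam = initial_vector(theta0, alpha, a)
    h = 2.0 * math.pi / steps
    for _ in range(steps):
        lam = rk4_step(lam, h, theta0)
    return lam

def rotation_cosine(theta0, alpha, a=1.0):
    """cos(chi) = g_AB lam^A(0) lam^B(2 pi) for the transported unit vector."""
    lam0 = initial_vector(theta0, alpha, a)
    lam1 = parallel_transport(theta0, alpha, a)
    g11, g22 = a * a, (a * math.sin(theta0)) ** 2
    return g11 * lam0[0] * lam1[0] + g22 * lam0[1] * lam1[1]

if __name__ == "__main__":
    theta0, alpha = math.pi / 3, 0.4
    print(rotation_cosine(theta0, alpha), math.cos(2 * math.pi * math.cos(theta0)))
```

The two printed numbers should agree closely, and on the equator ($\theta_0 = \pi/2$) the computed $\cos\chi$ is 1, i.e., the vector returns unchanged.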

6.5 Curved Spacetime

As we have established before, the geodesic equations are not dimension-specific. So we expect that the form of the geodesic equations holds in curved spacetime, and we would be right:

\[ \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\sigma}\frac{dx^\nu}{d\tau}\frac{dx^\sigma}{d\tau} = 0, \]

with the Christoffel symbols defined in the same fashion as before:

\[ \Gamma^\mu_{\nu\sigma} = \frac{1}{2}\,g^{\mu\rho}\left(\partial_\nu g_{\sigma\rho} + \partial_\sigma g_{\nu\rho} - \partial_\rho g_{\nu\sigma}\right). \]

Instead of giving us the trajectory of a free particle in space, the geodesic equations in curved spacetime give the trajectories of free particles in spacetime, $x^\mu(\tau)$. For example, we can solve for the trajectory of a particle in a gravitational field (recall that in general relativity, gravity is not a force). Also recall that for massive particles, we can use the proper time $\tau$ as an affine parameter due to the relation $ds^2 = c^2 d\tau^2$.

Likewise, any vector $\lambda^\mu$ can be parallel transported along a curve $x^\mu(\tau)$, where the change in its components obeys the 4-dimensional parallel transport equations:

\[ \dot{\lambda}^\mu + \Gamma^\mu_{\nu\sigma}\,\lambda^\nu\dot{x}^\sigma = 0. \]

6.6 Principle of covariance

The next question we want to ask ourselves is: given the geometry of spacetime and the trajectories of free particles, can we formulate the laws of physics for curved spacetime in such a way that the laws of physics in flat spacetime and flat space are just special cases? To answer this question we require the principle of covariance.

Recall that one of the postulates of special relativity is that the laws of physics are the same in all inertial frames, i.e., equations of physics are invariant under general Lorentz transformations.

Example 6.6.1.

\[
\begin{aligned}
f^\mu &= \frac{dp^\mu}{d\tau} \\
\Lambda^{\nu'}_\mu f^\mu &= \Lambda^{\nu'}_\mu \frac{dp^\mu}{d\tau} \\
f^{\nu'} &= \frac{d}{d\tau}\left(\Lambda^{\nu'}_\mu p^\mu\right) \\
f^{\nu'} &= \frac{dp^{\nu'}}{d\tau}.
\end{aligned}
\]

The first equality is equivalent to the last equality. In fact, they differ only in the dummy indices. Noting that the metric tensors are also the same in both frames, $\eta_{\mu\nu} = \eta_{\mu'\nu'}$, we can indeed confirm that the two mentioned equalities are the same. This example is one of many verifications of the fact that equations of physics in special relativity are invariant under Lorentz transformations.

In general relativity, equations have to maintain the same form under general coordinate transformations. We have been aware that the metric tensor and Christoffel symbols can differ in different coordinate systems, so we are not guaranteed the invariance condition. Therefore, in general relativity, while the laws of physics do not have to be invariant under transformations, they have to be covariant. Note that invariance implies covariance. But the converse is not true.


In trying to figure out how equations hold in curved spacetime, Einstein introduced the principle of covariance:

An equation is true in general relativity in all coordinate systems if:

• It is true in special relativity.

• It is a tensor equation that preserves its form under general coordinate transformations (covariance).

We should make a few remarks on this principle. The first condition stems from the equivalence principle, which states that there is always a freely falling frame where the laws of special relativity hold locally. The second condition is motivated by the fact that tensors of the same type all transform the same way under general coordinate transformations. For example, for tensors $A^\mu = B^\mu$,

\[ A^{\mu'} = X^{\mu'}_\nu A^\nu = X^{\mu'}_\nu B^\nu = B^{\mu'}. \]

What the principle of covariance is saying is that as long as the laws of special relativity are written in tensor form, the same equations will be true in the presence of gravity. This gives us a powerful prescription for finding the laws of physics in general relativity. Let us solidify our understanding so far by looking at an example where things go wrong.

Example 6.6.2. We know that

\[ f^\mu = \frac{dp^\mu}{d\tau} \]

holds in special relativity (flat spacetime). Does this equation also hold in a general curved spacetime?

We shall check whether the equation above is a tensor equation. If it is, then yes; if not, then no. It is not hard to see that $dp^\mu/d\tau$ is not a tensor, because it does not transform correctly under general coordinate transformations. We can show this simply by applying the product rule:

\[ \frac{dp^{\mu'}}{d\tau} = \frac{d}{d\tau}\left(X^{\mu'}_{\nu} p^\nu\right) = X^{\mu'}_{\nu}\frac{dp^\nu}{d\tau} + p^\nu\frac{dX^{\mu'}_{\nu}}{d\tau} \neq X^{\mu'}_{\nu}\frac{dp^\nu}{d\tau} \]

in general, because the Jacobian matrix element $X^{\mu'}_{\nu}$ is not necessarily independent of the proper time $\tau$. Therefore,

\[ f^\mu = \frac{dp^\mu}{d\tau} \]

is not covariant, and so it does not apply in a general coordinate system.

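Example 6.6.3. Given the metric tensor $g_{\mu\nu}$, show that $\partial_\lambda g_{\mu\nu}$ is not a tensor.

We simply apply the transformation rule:

\[ \partial_{\lambda'} g_{\mu'\nu'} = \partial_{\lambda'}\left(X^{\alpha}_{\mu'} X^{\beta}_{\nu'} g_{\alpha\beta}\right) \neq X^{\gamma}_{\lambda'} X^{\alpha}_{\mu'} X^{\beta}_{\nu'}\,\partial_\gamma g_{\alpha\beta}, \]

because the derivative also acts on the Jacobian factors. So, $\partial_\lambda g_{\mu\nu}$ is not a tensor. For this reason,

\[ \Gamma^{\mu}_{\lambda\nu} = \frac{1}{2} g^{\mu\rho}\left(\partial_\nu g_{\rho\lambda} + \partial_\lambda g_{\nu\rho} - \partial_\rho g_{\nu\lambda}\right) \]

is also not a tensor, even though it is covariant, i.e., it retains the same form in a primed frame:

\[ \Gamma^{\mu'}_{\lambda'\nu'} = \frac{1}{2} g^{\mu'\rho'}\left(\partial_{\nu'} g_{\rho'\lambda'} + \partial_{\lambda'} g_{\nu'\rho'} - \partial_{\rho'} g_{\nu'\lambda'}\right). \]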

The above examples suggest that “things go wrong” because of the derivatives ($d/d\tau$ and $\partial_\lambda$). In fact, we have just shown that the derivative of a tensor is not necessarily a tensor under general coordinate transformations. So, it is reasonable to suspect that in order to derive a procedure for finding the laws of physics in a different coordinate system, we first need to generalize our current definition of the derivative so that derivatives of tensors are tensors.

Let us revisit our current definition of the derivative and see where we run into trouble. Consider a vector $\lambda^a$ in a general curved space, and let $\lambda^a$ at the points P and Q describe how $\lambda^a$ evolves over some interval $\Delta t$:

\[ \frac{d\lambda^a}{dt} = \lim_{\Delta t \to 0} \frac{\lambda^a(t + \Delta t) - \lambda^a(t)}{\Delta t}. \]

Applying the transformation rule $X^{b'}_{a}(t)$ to $\lambda^a$ at point P and $X^{b'}_{a}(t + \Delta t)$ to $\lambda^a$ at point Q, we realize that the Jacobian matrix element $X^{b'}_{a}$ can depend on $t$, i.e., the space is different at P and Q. So it does not make sense to compare $\lambda^a$ at P to $\lambda^a$ at Q unless we can account for the change in the space. A better approach is to compare the two vectors at the same point, and to do this we need to parallel transport one of them between P and Q. We shall explore this “new” definition of the derivative in the next section.

6.7 Absolute and Covariant differentiation

6.7.1 Absolute differentiation

Consider a manifold and a contravariant vector $\lambda^a$ parameterized by $t$. It follows from the definition of the derivative with respect to $t$ that

\[ \frac{d\lambda^a}{dt} = \lim_{\Delta t \to 0} \frac{\lambda^a(t + \Delta t) - \lambda^a(t)}{\Delta t}. \]

We have stated that we run into trouble with this definition because the space at P and the space at Q are “different” due to curvature, i.e.,

\[ X^{a}_{b'}(P) \neq X^{a}_{b'}(Q). \]

Specifically, as $\Delta t \to 0$, we also get extra terms that are derivatives of $X^{a}_{b'}$. To fix this issue, we introduce the absolute derivative:

\[ \frac{D\lambda^a}{dt} = \lim_{\Delta t \to 0} \frac{\lambda^a(t + \Delta t) - \bar{\lambda}^a}{\Delta t}, \]

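where $\bar{\lambda}^a$ is $\lambda^a$ at point P, parallel transported to point Q. Now we want an explicit expression for the absolute derivative. To get one, we Taylor expand the first term in the definition,

\[ \lambda^a(t + \Delta t) \approx \lambda^a(t) + \frac{d\lambda^a}{dt}\Delta t = \lambda^a(P) + \frac{d\lambda^a}{dt}\Delta t, \]

and apply the parallel transport equation to the second term:

\[ \dot{\lambda}^a + \Gamma^{a}_{bc}\lambda^b\dot{x}^c = 0. \]

For small finite intervals,

\[ \dot{\lambda}^a \approx \frac{\Delta\lambda^a}{\Delta t}, \qquad \dot{x}^c \approx \frac{\Delta x^c}{\Delta t}, \]

so the parallel transport equation becomes

\[ \Delta\lambda^a + \Gamma^{a}_{bc}\lambda^b\Delta x^c = 0, \]

where $\Delta\lambda^a = \bar{\lambda}^a(Q) - \lambda^a(P)$. This gives

\[ \bar{\lambda}^a(Q) \approx \lambda^a(P) - \Gamma^{a}_{bc}\lambda^b\Delta x^c. \]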

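Plugging this into the definition of the absolute derivative, we get

\[ \frac{D\lambda^a}{dt} = \lim_{\Delta t \to 0} \frac{(d\lambda^a/dt)\Delta t + \Gamma^{a}_{bc}\lambda^b\Delta x^c}{\Delta t} = \lim_{\Delta t \to 0}\left(\frac{d\lambda^a}{dt} + \Gamma^{a}_{bc}\lambda^b\frac{\Delta x^c}{\Delta t}\right) = \frac{d\lambda^a}{dt} + \Gamma^{a}_{bc}\lambda^b\dot{x}^c. \]

So, let us define the absolute derivative of a contravariant vector as

\[ \frac{D\lambda^a}{dt} = \frac{d\lambda^a}{dt} + \Gamma^{a}_{bc}\lambda^b\dot{x}^c. \]

Note that this definition is covariant, as expected:

\[ \frac{D\lambda^a}{dt} = X^{a}_{b'}\frac{D\lambda^{b'}}{dt}. \]

We also notice that the right-hand side of the definition is quite similar to the parallel transport equation. In fact,

\[ \frac{d\lambda^a}{dt} + \Gamma^{a}_{bc}\lambda^b\dot{x}^c = 0 \]

if we parallel transport a contravariant vector $\lambda^a$. This says that $\lambda^a$ is constant under absolute differentiation. We get a nice property for a parallel transported $\lambda^a$:

\[ \frac{D\lambda^a}{dt} = 0. \]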

A reasonable question to ask now is what the absolute derivatives of scalars, covariant vectors, and general tensors look like.

Recall that scalars are invariant under general coordinate transformations, and that invariance implies covariance. This implies that the absolute derivative of a scalar is just its ordinary derivative:

\[ \frac{D\phi}{dt} = \frac{d\phi}{dt}. \]

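To find the form of the absolute derivative of a covariant vector, we first consider the scalar $\lambda^a\mu_a$. By the definition of the absolute derivative of a scalar, together with the product rule, we get:

\[
\begin{aligned}
\frac{D(\lambda^a\mu_a)}{dt} &= \frac{d(\lambda^a\mu_a)}{dt} \\
\frac{D\lambda^a}{dt}\mu_a + \lambda^a\frac{D\mu_a}{dt} &= \frac{d\lambda^a}{dt}\mu_a + \lambda^a\frac{d\mu_a}{dt} \\
\left(\frac{d\lambda^a}{dt} + \Gamma^{a}_{bc}\lambda^b\dot{x}^c\right)\mu_a + \lambda^a\frac{D\mu_a}{dt} &= \frac{d\lambda^a}{dt}\mu_a + \lambda^a\frac{d\mu_a}{dt} \\
\lambda^a\frac{D\mu_a}{dt} &= \lambda^a\frac{d\mu_a}{dt} - \mu_a\Gamma^{a}_{bc}\lambda^b\dot{x}^c \\
\lambda^a\frac{D\mu_a}{dt} &= \lambda^a\left(\frac{d\mu_a}{dt} - \mu_d\Gamma^{d}_{ac}\dot{x}^c\right) \quad \text{(relabeling dummy indices)} \\
\frac{D\mu_a}{dt} &= \frac{d\mu_a}{dt} - \Gamma^{d}_{ac}\mu_d\dot{x}^c \qquad (6.1)
\end{aligned}
\]

where the last step uses the fact that $\lambda^a$ is arbitrary. We notice that the form of the absolute derivative of covariant vectors is similar to that of contravariant vectors, except for the sign of the Christoffel term.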

Lastly, combining what we know about absolute derivatives of contravariant and covariant objects, we can look at the absolute derivative of a general tensor. Consider $\tau^{ab}{}_{c} = \lambda^a\sigma^b\mu_c$ (recall that multiplying vector components can give tensors). We can show that

\[ \frac{D\tau^{ab}{}_{c}}{dt} = \frac{d\tau^{ab}{}_{c}}{dt} + \Gamma^{a}_{de}\tau^{db}{}_{c}\dot{x}^e + \Gamma^{b}_{de}\tau^{ad}{}_{c}\dot{x}^e - \Gamma^{d}_{ce}\tau^{ab}{}_{d}\dot{x}^e. \]

Because absolute derivatives are tensors (by construction), we can apply the transformation rules as usual:

\[ \frac{D\tau^{a'b'}{}_{c'}}{dt} = X^{a'}_{d} X^{b'}_{e} X^{f}_{c'}\frac{D\tau^{de}{}_{f}}{dt}. \]
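As a minimal numerical sketch of the parallel transport equation of Section 6.7.1, the following Python snippet transports a vector once around a circle of constant colatitude on the unit 2-sphere. The choice of the sphere, its Christoffel symbols ($\Gamma^{\theta}_{\phi\phi} = -\sin\theta\cos\theta$, $\Gamma^{\phi}_{\theta\phi} = \cot\theta$), the RK4 integrator, and all function names are illustrative assumptions, not part of the course notes.

```python
import math

def transport_rhs(theta0, lam):
    """d(lambda^a)/d(phi) = -Gamma^a_{bc} lambda^b dx^c/dphi along theta = theta0.

    Assumes the unit 2-sphere, whose nonzero Christoffel symbols are
    Gamma^theta_{phi phi} = -sin(theta) cos(theta) and
    Gamma^phi_{theta phi} = cot(theta). The curve parameter is phi itself,
    so dx^theta/dphi = 0 and dx^phi/dphi = 1.
    """
    lam_theta, lam_phi = lam
    s, c = math.sin(theta0), math.cos(theta0)
    return (s * c * lam_phi, -(c / s) * lam_theta)

def parallel_transport(theta0, lam0, steps=2000):
    """Transport lam0 once around the circle theta = theta0 using RK4 steps."""
    h = 2.0 * math.pi / steps
    lam = lam0
    for _ in range(steps):
        k1 = transport_rhs(theta0, lam)
        k2 = transport_rhs(theta0, tuple(l + 0.5 * h * k for l, k in zip(lam, k1)))
        k3 = transport_rhs(theta0, tuple(l + 0.5 * h * k for l, k in zip(lam, k2)))
        k4 = transport_rhs(theta0, tuple(l + h * k for l, k in zip(lam, k3)))
        lam = tuple(l + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                    for l, a, b, c, d in zip(lam, k1, k2, k3, k4))
    return lam

def norm_squared(theta0, lam):
    """g_ab lambda^a lambda^b for the unit-sphere metric diag(1, sin^2 theta)."""
    return lam[0] ** 2 + (math.sin(theta0) * lam[1]) ** 2

theta0 = math.pi / 3            # colatitude of the circle (assumed example value)
start = (1.0, 0.0)              # initial components (lambda^theta, lambda^phi)
end = parallel_transport(theta0, start)
print(norm_squared(theta0, start), norm_squared(theta0, end))
print(end[0], math.cos(2.0 * math.pi * math.cos(theta0)))
```

In this sketch the norm of the vector is preserved, while its components come back rotated by the angle $2\pi\cos\theta_0$ rather than returning to their starting values, which is one way to see that the sphere is curved.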

6.7.2 Covariant differentiation

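The absolute derivative is taken with respect to a parameter, but we also need to take derivatives with respect to coordinates. In the latter case, we define the covariant derivative based on the “old” notion of the derivative with respect to coordinates, which is

\[ \partial_a = \frac{\partial}{\partial x^a}. \]

Let us define the covariant derivative by applying the chain rule to the absolute derivative, letting $\lambda^a = \lambda^a(x(t))$ along the curve:

\[ \frac{D\lambda^a}{dt} = \frac{d\lambda^a}{dt} + \Gamma^{a}_{bc}\lambda^b\dot{x}^c = \frac{D\lambda^a}{dx^c}\frac{dx^c}{dt} = \frac{D\lambda^a}{dx^c}\dot{x}^c. \]

On the other hand,

\[ \frac{D\lambda^a}{dx^c}\dot{x}^c = \frac{d\lambda^a}{dt} + \Gamma^{a}_{bc}\lambda^b\dot{x}^c = \frac{\partial\lambda^a}{\partial x^c}\frac{dx^c}{dt} + \Gamma^{a}_{bc}\lambda^b\dot{x}^c, \]

which gives the covariant derivative of $\lambda^a$ with respect to $x^c$:

\[ \frac{D\lambda^a}{dx^c} = \frac{\partial\lambda^a}{\partial x^c} + \Gamma^{a}_{bc}\lambda^b = \partial_c\lambda^a + \Gamma^{a}_{bc}\lambda^b. \]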

For notational simplicity, we define the “comma” and “semicolon” derivatives of contravariant vectors. The semicolon is reserved for the covariant derivative, while the comma is reserved for the ordinary partial derivative. So:

\[ \lambda^{a}{}_{;c} = \frac{D\lambda^a}{dx^c} = \frac{\partial\lambda^a}{\partial x^c} + \Gamma^{a}_{bc}\lambda^b = \partial_c\lambda^a + \Gamma^{a}_{bc}\lambda^b, \]

\[ \lambda^{a}{}_{,c} = \frac{\partial\lambda^a}{\partial x^c} = \partial_c\lambda^a. \]

Why do we do this? Because by writing the covariant derivative in terms of indices, it is easier to work with it as a tensor. Recall that $\lambda^a$ is a type (1, 0) tensor. The covariant derivative of a contravariant vector is a type (1, 1) tensor, and writing it as $\lambda^{a}{}_{;c}$ makes this more obvious.

With this notation, we can write the covariant derivative of $\tau^{a}{}_{b}$ as:

\[ \tau^{a}{}_{b;c} = \partial_c\tau^{a}{}_{b} + \Gamma^{a}_{dc}\tau^{d}{}_{b} - \Gamma^{d}_{bc}\tau^{a}{}_{d}. \]

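Example 6.7.1. Show that the metric tensor is covariantly constant, which means that $g_{ab;c} = 0$.

We can start with

\[ \Gamma^{a}_{bc} = \frac{1}{2}g^{ad}\left(\partial_b g_{cd} + \partial_c g_{bd} - \partial_d g_{bc}\right). \]

Let us introduce the following quantity, obtained by lowering the upper index of $\Gamma^{e}_{bc}$:

\[ \Gamma_{abc} = g_{ae}\Gamma^{e}_{bc} = \frac{1}{2}g_{ae}g^{ed}\left(\partial_b g_{cd} + \partial_c g_{bd} - \partial_d g_{bc}\right) = \frac{1}{2}\left(\partial_b g_{ac} + \partial_c g_{ab} - \partial_a g_{bc}\right). \]

Swapping $a$ and $b$, by similar reasoning we get

\[ \Gamma_{bac} = \frac{1}{2}\left(\partial_a g_{bc} + \partial_c g_{ba} - \partial_b g_{ac}\right). \]

Combining the previous two results we get

\[ \Gamma_{abc} + \Gamma_{bac} = \frac{1}{2}\left(\partial_c g_{ab} + \partial_c g_{ba}\right) = \partial_c g_{ab}. \]

So, by definition:

\[
\begin{aligned}
g_{ab;c} &= \partial_c g_{ab} - \Gamma^{d}_{ac}g_{db} - \Gamma^{d}_{bc}g_{ad} \\
&= \Gamma_{abc} + \Gamma_{bac} - \Gamma^{d}_{ac}g_{bd} - \Gamma^{d}_{bc}g_{ad} \\
&= \Gamma_{abc} + \Gamma_{bac} - \Gamma_{bac} - \Gamma_{abc} \\
&= 0.
\end{aligned}
\]

Note that we might have predicted $g_{ab;c} = 0$ ahead of time by going to a local Lorentz frame in 4-dimensional spacetime. There, the metric tensor is simply the Minkowski metric, whose derivatives are zero, resulting in vanishing Christoffel symbols. Therefore, $g_{\mu\nu;\lambda} = 0$ in that frame. Because tensor equations are covariant, the same result then holds in every frame: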

Proposition 6.7.1. If a tensor is zero in one frame, then it is zero in all frames.

Let us follow the previous result up with another nice example:

Example 6.7.2. Show that the Kronecker delta is covariantly constant: $\delta^{a}{}_{b;c} = 0$. Then show that $g_{ab;c} = 0$ and $g^{ab}{}_{;c} = 0$.

This will be left as an exercise for the reader for now.
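The result of Example 6.7.1 can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: it uses the unit 2-sphere with metric diag(1, sin²θ) as a sample curved space, estimates $\partial_c g_{ab}$ by central differences, builds $\Gamma^{a}_{bc}$ from the formula above, and then evaluates $g_{ab;c}$. The function names and the sample point are hypothetical.

```python
import math

DIM = 2  # coordinates x = (theta, phi) on the unit 2-sphere (an assumed example)

def metric(x):
    """Return g_ab at x for the unit 2-sphere: diag(1, sin^2 theta)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_metric(x):
    """Return g^ab at x (the metric is diagonal, so invert entrywise)."""
    g = metric(x)
    return [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]

def d_metric(x, h=1e-5):
    """Return dg[c][a][b], a central-difference estimate of partial_c g_ab."""
    dg = []
    for c in range(DIM):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2.0 * h) for b in range(DIM)]
                   for a in range(DIM)])
    return dg

def christoffel(x):
    """G[a][b][c] = Gamma^a_{bc} = (1/2) g^{ad} (d_b g_cd + d_c g_bd - d_d g_bc)."""
    ginv, dg = inverse_metric(x), d_metric(x)
    return [[[sum(0.5 * ginv[a][d] * (dg[b][c][d] + dg[c][b][d] - dg[d][b][c])
                  for d in range(DIM))
              for c in range(DIM)] for b in range(DIM)] for a in range(DIM)]

def covariant_derivative_of_metric(x):
    """D[a][b][c] = g_{ab;c} = d_c g_ab - Gamma^d_{ac} g_db - Gamma^d_{bc} g_ad."""
    g, dg, G = metric(x), d_metric(x), christoffel(x)
    return [[[dg[c][a][b]
              - sum(G[d][a][c] * g[d][b] for d in range(DIM))
              - sum(G[d][b][c] * g[a][d] for d in range(DIM))
              for c in range(DIM)] for b in range(DIM)] for a in range(DIM)]

point = (0.7, 0.3)  # an arbitrary sample point away from the poles
largest = max(abs(v) for plane in covariant_derivative_of_metric(point)
              for row in plane for v in row)
print(largest)  # expected to be zero up to rounding error
```

Because the Christoffel symbols are built from the same derivative estimates, the cancellation in $g_{ab;c}$ is algebraic, which mirrors the structure of the proof in Example 6.7.1.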

With the principle of covariance, we now have a prescription for finding physics equations in general relativity:

• Step 1: Write down the physics equations in special relativity.

• Step 2: Change all derivatives to absolute or covariant derivatives, so that the equations turn into tensor equations.

• Step 3: Transform to arbitrary frames, where the equations keep the same form.

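Example 6.7.3. Write Newton’s 2nd law in free space as a tensor equation.

In special relativity:

\[ f^\mu = \frac{dp^\mu}{d\tau}. \]

To make this a tensor equation, we replace the derivative with respect to proper time with an absolute derivative with respect to proper time:

\[ f^\mu = \frac{Dp^\mu}{d\tau}. \]

Now let us suppose that the particle under consideration is a free particle (subject to nothing but the curvature of spacetime due to gravity). We get

\[ f^\mu = \frac{Dp^\mu}{d\tau} = m\frac{Du^\mu}{d\tau} = 0. \]

Therefore,

\[ \frac{du^\mu}{d\tau} + \Gamma^{\mu}_{\lambda\nu}u^\lambda\dot{x}^\nu = 0. \]

But since, by definition,

\[ u^\mu = \frac{dX^\mu}{d\tau}, \]

we get

\[ \frac{d^2X^\mu}{d\tau^2} + \Gamma^{\mu}_{\nu\lambda}\frac{dX^\nu}{d\tau}\frac{dX^\lambda}{d\tau} = 0, \]

which is the geodesic equation, as expected.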

6.8 Newtonian limit

6.8.1 Overview

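In Newtonian physics, gravity is a force:

\[ \vec{F} = -\frac{GMm}{r^2}\hat{r}, \]

and the equation of motion is:

\[ \vec{F} = m\frac{d^2\vec{X}}{dt^2}. \]

In general relativity, however, the equation of motion is the geodesic equation:

\[ \frac{d^2X^\mu}{d\tau^2} + \Gamma^{\mu}_{\nu\lambda}\frac{dX^\nu}{d\tau}\frac{dX^\lambda}{d\tau} = 0. \]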

Now, in some limit, these equations have to match up. Indeed, there is a quantity in the metric tensor $g_{\mu\nu}$ that links up with a quantity in Newtonian physics: the gravitational potential $V$. We can think of $V$ by analogy to the electric potential of a point charge (written here for an attractive pair of charges of magnitudes $Q$ and $q$, to mirror gravity):

\[ V = \frac{U}{q} = -\frac{1}{4\pi\varepsilon_0}\frac{Q}{r}. \]

We can define a similar quantity for gravity, called the gravitational potential:

\[ V = \frac{U}{m} = -\frac{1}{m}\frac{GMm}{r} = -\frac{GM}{r}. \]

The relationship between gravitational potential and force is

\[ \vec{F} = -m\vec{\nabla}V. \]

So Newton’s second law of motion becomes:

\[ \vec{F} = m\vec{a} = m\frac{d^2\vec{X}}{dt^2} = -m\vec{\nabla}V. \]

Therefore,

\[ \frac{d^2\vec{X}}{dt^2} = -\vec{\nabla}V. \]

Let us write the above equation in terms of indices:

\[ \frac{d^2X^i}{dt^2} = -\partial_i V. \]

But in order for this equation to work with relativistic index notation (where the placement of upper and lower indices matters), we “fix” it with a Kronecker delta $\delta^{ij}$:

\[ \frac{d^2X^i}{dt^2} = -\delta^{ij}\partial_j V. \]

We might wonder how this equation is compatible with the geodesic equation in the non-relativistic limit. We shall explore the answer in the following subsections.

6.8.2 Weak limit of General Relativity

The effects of gravity near the Earth or the Sun are weak. Therefore only a slight curvature is expected, which means we can approximate the metric tensor as the Minkowski metric plus a small correction term:

\[ g_{\mu\nu} \approx \eta_{\mu\nu} + h_{\mu\nu}, \]

where $|h_{\mu\nu}| \ll 1$. In the Newtonian limit, spacetime is almost flat (Minkowski-like). Keeping only first-order terms in $h_{\mu\nu}$, we can show (see the following example) that

\[ g^{\mu\nu} \approx \eta^{\mu\nu} - h^{\mu\nu}, \qquad h^{\mu\nu} = \eta^{\mu\alpha}\eta^{\nu\beta}h_{\alpha\beta}. \]

Example 6.8.1. Show that $g^{\mu\nu} \approx \eta^{\mu\nu} - h^{\mu\nu}$, where $h^{\mu\nu} = \eta^{\mu\alpha}\eta^{\nu\beta}h_{\alpha\beta}$. Hint: verify that $g^{\mu\nu}g_{\nu\sigma} \approx \delta^{\mu}_{\sigma}$ to first order, assuming that products of $h$ terms are small enough to be neglected.

At this point this example is left to the reader.
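The hint in Example 6.8.1 can be illustrated numerically. The following sketch, which assumes a hypothetical symmetric perturbation $h_{\mu\nu}$ chosen only for illustration, builds $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$ and the first-order inverse $\eta^{\mu\nu} - h^{\mu\nu}$, and measures how far their product is from the identity. If the approximation is correct to first order, the deviation should shrink like the square of the size of $h$.

```python
ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, -1.0, 0.0, 0.0],
       [0.0, 0.0, -1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def first_order_inverse(h_lower):
    """Return eta^{mu nu} - h^{mu nu}, with h^{mu nu} = eta^{mu a} eta^{nu b} h_{ab}.

    The components of eta^{mu nu} equal those of eta_{mu nu}, so ETA is reused.
    """
    h_upper = matmul(matmul(ETA, h_lower), ETA)
    return [[ETA[m][n] - h_upper[m][n] for n in range(4)] for m in range(4)]

def max_deviation_from_identity(eps):
    """Largest |g^{mu nu} g_{nu sigma} - delta^mu_sigma| for a sample h of size eps."""
    # A hypothetical symmetric perturbation whose largest entry equals eps.
    h_lower = [[eps * (m + 1) * (n + 1) / 16.0 for n in range(4)] for m in range(4)]
    g_lower = [[ETA[m][n] + h_lower[m][n] for n in range(4)] for m in range(4)]
    product = matmul(first_order_inverse(h_lower), g_lower)
    return max(abs(product[m][n] - (1.0 if m == n else 0.0))
               for m in range(4) for n in range(4))

for eps in (1e-2, 1e-3, 1e-4):
    print(eps, max_deviation_from_identity(eps))
```

Reducing the perturbation by a factor of 10 reduces the deviation by roughly a factor of 100, which is the behavior expected when only terms of order $h^2$ have been dropped.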

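Once we have these results, we can find the Christoffel symbols $\Gamma^{\mu}_{\nu\sigma}$ in terms of the correction $h_{\mu\nu}$ to first order:

\[ \Gamma^{\mu}_{\nu\sigma} = \frac{1}{2}\eta^{\mu\rho}\left(\partial_\nu h_{\sigma\rho} + \partial_\sigma h_{\nu\rho} - \partial_\rho h_{\nu\sigma}\right). \]

Let us use this Christoffel symbol in the geodesic equation, but before we start, we should make a few approximations. First, we want a result in the non-relativistic (slow) limit, so

\[ \frac{dX^0}{d\tau} \gg \frac{dX^i}{d\tau}. \]

This follows since

\[ \frac{dX^0}{d\tau} = \frac{d}{d\tau}(ct) \approx c, \]

whereas

\[ \frac{dX^i}{d\tau} \approx \frac{dX^i}{dt} \ll c. \]

With this result, we can neglect the terms containing $dX^i/d\tau$ in the sum compared to those containing $dX^0/d\tau$. Now we can start:

\[ \frac{d^2X^\mu}{d\tau^2} + \Gamma^{\mu}_{\nu\sigma}\frac{dX^\nu}{d\tau}\frac{dX^\sigma}{d\tau} \approx \frac{d^2X^\mu}{d\tau^2} + \Gamma^{\mu}_{00}\left(\frac{dX^0}{d\tau}\right)^2 \approx 0. \]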

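Second, we assume a static gravitational field (not changing in time, i.e., a stationary Earth or Sun), so that $\partial_0 h_{\sigma 0} = 0$:

\[ \Gamma^{\mu}_{00} = \frac{1}{2}\eta^{\mu\sigma}\left(\partial_0 h_{\sigma 0} + \partial_0 h_{\sigma 0} - \partial_\sigma h_{00}\right) \approx -\frac{1}{2}\eta^{\mu\sigma}\partial_\sigma h_{00}. \]

So, the geodesic equation becomes:

\[ \frac{d^2X^\mu}{d\tau^2} \approx \frac{1}{2}\eta^{\mu\sigma}\partial_\sigma h_{00}\left(\frac{dX^0}{d\tau}\right)^2 = \frac{1}{2}\eta^{\mu\sigma}\partial_\sigma h_{00}\,c^2\left(\frac{dt}{d\tau}\right)^2. \]

Let $\mu = 0$:

\[ \frac{d^2X^0}{d\tau^2} = c\frac{d^2t}{d\tau^2} \approx \left(\frac{1}{2}\eta^{0\sigma}\partial_\sigma h_{00}\right)c^2\left(\frac{dt}{d\tau}\right)^2. \]

The only value of $\sigma$ for which $\eta^{0\sigma}$ is nonzero is $\sigma = 0$. But because $\partial_0 h_{00} = 0$ in the static limit,

\[ \frac{1}{c}\frac{d^2X^0}{d\tau^2} = \frac{d^2t}{d\tau^2} = 0. \]

Therefore, $dt/d\tau$ has no $\tau$ dependence.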

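Let $\mu = i$:

\[ \frac{d^2X^i}{d\tau^2} \approx \frac{1}{2}\eta^{i\sigma}\left(\partial_\sigma h_{00}\right)c^2\left(\frac{dt}{d\tau}\right)^2. \]

Applying the chain rule, and using $d^2t/d\tau^2 = 0$:

\[ \frac{d^2X^i}{d\tau^2} = \frac{d}{d\tau}\left(\frac{dX^i}{dt}\frac{dt}{d\tau}\right) = \frac{d^2X^i}{dt^2}\left(\frac{dt}{d\tau}\right)^2 + \frac{dX^i}{dt}\frac{d^2t}{d\tau^2} = \left(\frac{dt}{d\tau}\right)^2\frac{d^2X^i}{dt^2}, \]

so that

\[ \frac{d^2X^i}{dt^2} = \frac{c^2}{2}\eta^{i\sigma}\left(\partial_\sigma h_{00}\right). \]

Now note that this equation involves a sum over $\sigma$. If $\sigma = 0$, then $\eta^{i\sigma} = 0$, while if $\sigma = j$, then $\eta^{ij} = -\delta^{ij}$. So,

\[ \frac{d^2X^i}{dt^2} \approx -\frac{c^2}{2}\delta^{ij}\left(\partial_j h_{00}\right). \]

Now, we compare this with the Newtonian equation:

\[ \frac{d^2X^i}{dt^2} = -\delta^{ij}\partial_j V. \]

In order for general relativity to reduce to Newtonian physics, there must be the following correspondence in the limit:

\[ V \approx \frac{c^2}{2}h_{00} + \text{constant}. \]

Now if we “turn off” gravity, then $h_{\mu\nu} = 0$, since spacetime is completely Minkowski, which means $V = 0$. So the constant is zero, and we get:

\[ h_{00} = \frac{2V}{c^2}. \]

Now, since

\[ g_{00} \approx \eta_{00} + h_{00} = 1 + h_{00}, \]

we finally get the correspondence between general relativity and Newtonian physics:

\[ g_{00} = 1 + \frac{2V}{c^2}. \]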
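To make the correspondence concrete, here is a small sketch that evaluates $h_{00} = 2V/c^2$ at the Earth's surface and recovers the familiar free-fall acceleration from the weak-field equation of motion $d^2X^i/dt^2 \approx -(c^2/2)\,\delta^{ij}\partial_j h_{00}$. The numerical constants and the function names are assumptions chosen for illustration, and the radial derivative is estimated by a finite difference.

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_EARTH = 5.972e24   # kg (assumed value)
R_EARTH = 6.371e6    # m (assumed value)

def h00(r, mass):
    """Weak-field metric perturbation h_00 = 2V/c^2 with V = -GM/r."""
    return -2.0 * G * mass / (C ** 2 * r)

def radial_acceleration(r, mass, step=100.0):
    """d^2 r/dt^2 = -(c^2/2) d(h_00)/dr, using a central difference of width 2*step."""
    dh_dr = (h00(r + step, mass) - h00(r - step, mass)) / (2.0 * step)
    return -0.5 * C ** 2 * dh_dr

print(h00(R_EARTH, M_EARTH))                  # tiny and negative: gravity is weak
print(radial_acceleration(R_EARTH, M_EARTH))  # close to -GM/r^2, directed inward
```

The perturbation comes out of order $10^{-9}$, which supports treating $h_{\mu\nu}$ as small, while the resulting acceleration matches the Newtonian value $-GM/r^2$.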

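A more careful analysis shows

\[ \frac{d\tau}{dt} = \sqrt{1 + h_{00}}, \]

which follows from the line element

\[ c^2\,d\tau^2 = g_{\mu\nu}\,dX^\mu\,dX^\nu \]

for a clock at rest, where $g_{00}$ is independent of time in the static, weak-field limit. We notice that this results in a new type of time dilation, which we will explore later when we are close to discussing cosmology. The exact solution for a spherical mass $M$ in general relativity is the Schwarzschild solution:

\[ [g_{\mu\nu}] = \begin{pmatrix} 1 - \frac{2GM}{c^2 r} & 0 & 0 & 0 \\ 0 & -\left(1 - \frac{2GM}{c^2 r}\right)^{-1} & 0 & 0 \\ 0 & 0 & -r^2 & 0 \\ 0 & 0 & 0 & -r^2\sin^2\theta \end{pmatrix}. \]

Note that, with $V = -GM/r$,

\[ g_{00} = 1 + \frac{2V}{c^2}, \]

as predicted by our work so far.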

Chapter 7

Einstein’s field equations

We have been assuming that we know the metric tensor and have looked at physics in curved spaces. But we have yet to develop a method to obtain the metric from a given distribution of mass and energy. In this chapter, we will look at the Einstein equations, our prescription for finding the metric tensor given a distribution of mass and energy. The Einstein equations took Einstein 8 years to develop. They are a set of coupled, non-linear partial differential equations that we can solve for the metric tensor $g_{\mu\nu}$.

As a little teaser, let us look at the form of the Einstein equations:

\[ R^{\mu\nu} - \frac{1}{2}Rg^{\mu\nu} = -\frac{8\pi G}{c^4}T^{\mu\nu} \]

• $T^{\mu\nu}$ is the energy-momentum stress tensor, which represents the density of energy, mass, and momentum, which together are the source of gravity, i.e., of the curvature of spacetime.

• $R^{\mu\nu}$ is the Ricci tensor. This is a contraction of the Riemann curvature tensor $R^{\mu}{}_{\nu\lambda\sigma}$:

\[ R_{\mu\nu} = R^{\sigma}{}_{\mu\nu\sigma} \]

• $R$ is the curvature scalar, which is the contraction of the Ricci tensor $R_{\mu\nu}$:

\[ R = g^{\mu\nu}R_{\mu\nu} = R^{\mu}{}_{\mu}. \]

We will eventually show that the Riemann curvature tensor $R^{\mu}{}_{\nu\sigma\lambda}$ is a function of the metric tensor $g_{\mu\nu}$ and its derivatives.

As a little aside: when Einstein looked at the solutions for a gas of cosmic matter, he found an evolving solution which suggested an expanding or contracting universe. However, because Einstein thought the universe had to be static (he was doing this before Hubble’s discovery in 1929 that the universe is expanding), he added an extra term $\Lambda$ to his equations:

\[ R^{\mu\nu} - \frac{1}{2}Rg^{\mu\nu} + \Lambda g^{\mu\nu} = -\frac{8\pi G}{c^4}T^{\mu\nu}. \]

$\Lambda$ is the cosmological constant. This extra term accounts for a cosmic source of energy density associated with the vacuum (which today we refer to as dark energy). After Hubble’s discovery, though, Einstein set $\Lambda$ to 0, calling it the “greatest blunder” of his life, and for decades cosmology used $\Lambda = 0$, until the 1990s, when it was discovered that the universe is not only expanding but that the expansion is accelerating. This discovery brought back $\Lambda$. Today, almost all cosmological models include the cosmological constant or some form of “dark energy.”

In the later sections, we will study cosmology with and without the cosmological constant $\Lambda$. Our plan is to look at $T^{\mu\nu}$, $R^{\mu}{}_{\nu\lambda\sigma}$, $R_{\mu\nu}$, $R$, etc., and retrace some of the steps by which Einstein came up with his equations.

We should also note that the Einstein equations are very difficult to solve, partly because they are non-linear: gravitational fields carry energy, which in turn acts as a source of gravity. This is unlike electromagnetism, where we get linear equations whose solutions obey the principle of superposition, because electromagnetic waves don’t carry charge and don’t interact with each other. We are not going to attempt to solve the Einstein equations (that would be way beyond the scope of this Quick Guide). Instead, we will study two well-known solutions:

• The Schwarzschild solution, which gives the metric tensor outside a static spherical mass $M$ such as the Sun, the Earth, or a black hole.

• The Friedmann-Robertson-Walker (FRW) solution, which gives the metric tensor for a spatially homogeneous and isotropic universe with $\Lambda = 0$ or $\Lambda \neq 0$. Note that the FRW solution with $\Lambda \neq 0$ is the current best cosmological model.

Now, let us find out how Einstein was guided to what we now call the Einstein equations. In the Newtonian limit, for a point particle, we know that

\[ \vec{F} = -m\vec{\nabla}V, \qquad V = -\frac{GM}{r}. \]

What about a mass density $\rho$? In that case the potential is related to the mass density by:

\[ \nabla^2 V = 4\pi G\rho. \]

This is called the Poisson equation. How do we show that it is true? We can do this by analogy (again) to electromagnetism. In electromagnetism, the charge density is found through the first of Maxwell’s four equations:

\[ \rho = \varepsilon_0\vec{\nabla}\cdot\vec{E}, \]

where $\vec{E}$ can be expressed in terms of the electric potential:

\[ \vec{E} = -\vec{\nabla}V. \]

This simply gives:

\[ \nabla^2 V = -\frac{\rho}{\varepsilon_0}. \]

Einstein indeed used this analogy as a guide.

7.1 The stress-energy tensor T µν

As we have said before, $T^{\mu\nu}$ is the energy-momentum stress tensor. For a given distribution of matter,

\[ \rho = \frac{dM}{dV} \]

is the mass density, while $\rho c^2$ gives the mass-energy density. We also know that energy and momentum couple relativistically, so we might wonder what the momentum-type density (that goes with mass density) is. The answer turns out to be the pressure $p$ (force per area). So, in general relativity, the pressure $p$ acts as a source of energy-momentum density. But note that $p$ is not a vector, so we expect $p$ and $\rho c^2$ to be matrix elements of the tensor $T^{\mu\nu}$. Also, since $g^{\mu\nu} = g^{\nu\mu}$ (symmetry), we expect a similar symmetry $T^{\mu\nu} = T^{\nu\mu}$. Therefore, for a simple gas of particles in its rest frame in flat spacetime, we expect $T^{\mu\nu}$ to have the following form:

\[ [T^{\mu\nu}] = \begin{pmatrix} \rho c^2 & 0 & 0 & 0 \\ 0 & p & 0 & 0 \\ 0 & 0 & p & 0 \\ 0 & 0 & 0 & p \end{pmatrix}. \]

But to put $T^{\mu\nu}$ into a covariant form that allows for moving matter, we need to take into account the world velocity $u^\mu$. We can show that $T^{\mu\nu}$ has the following form:

\[ T^{\mu\nu} = \left(\rho + \frac{p}{c^2}\right)u^\mu u^\nu - p\,\eta^{\mu\nu}, \]

where $u^\mu = (\gamma c, \gamma\vec{v})$ for moving matter, and $g^{\mu\nu} = \eta^{\mu\nu}$ in flat spacetime. Einstein knew this was the quantity to use because it obeys the conservation law:

\[ T^{\mu\nu}{}_{,\mu} = 0, \quad \text{i.e.,} \quad \partial_\mu T^{\mu\nu} = 0. \]

This gives two well-known equations of fluid dynamics, the continuity equation:

\[ \frac{\partial\rho}{\partial t} + \vec{\nabla}\cdot(\rho\vec{v}) = 0, \]

and Euler’s equation:

\[ \rho\left(\frac{\partial}{\partial t} + \vec{v}\cdot\vec{\nabla}\right)\vec{v} = -\vec{\nabla}p. \]
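The perfect-fluid form $T^{\mu\nu} = (\rho + p/c^2)u^\mu u^\nu - p\,\eta^{\mu\nu}$ can be checked against the rest-frame matrix with a few lines of Python. This is only a sketch: the density, pressure, and boost speed below are hypothetical values, and the helper names are not from the notes. It also evaluates the trace $\eta_{\mu\nu}T^{\mu\nu}$, which should not depend on the velocity of the fluid.

```python
import math

C = 2.998e8  # speed of light, m/s

ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, -1.0, 0.0, 0.0],
       [0.0, 0.0, -1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]

def world_velocity(vx):
    """u^mu = (gamma c, gamma v) for motion along x with speed vx."""
    gamma = 1.0 / math.sqrt(1.0 - (vx / C) ** 2)
    return [gamma * C, gamma * vx, 0.0, 0.0]

def perfect_fluid_tensor(rho, p, u):
    """T^{mu nu} = (rho + p/c^2) u^mu u^nu - p eta^{mu nu} in flat spacetime."""
    return [[(rho + p / C ** 2) * (u[m] * u[n]) - p * ETA[m][n]
             for n in range(4)] for m in range(4)]

def trace(T):
    """Invariant trace eta_{mu nu} T^{mu nu}, which equals rho c^2 - 3p."""
    return sum(ETA[m][m] * T[m][m] for m in range(4))

rho, p = 1000.0, 1.0e5   # hypothetical density (kg/m^3) and pressure (Pa)
T_rest = perfect_fluid_tensor(rho, p, world_velocity(0.0))
T_moving = perfect_fluid_tensor(rho, p, world_velocity(0.5 * C))
print([T_rest[m][m] for m in range(4)])   # diagonal: rho c^2, p, p, p
print(trace(T_rest), trace(T_moving))     # the two traces agree
```

For the fluid at rest, $u^\mu = (c, 0, 0, 0)$ and the tensor reduces to diag($\rho c^2$, $p$, $p$, $p$); for the moving fluid, off-diagonal momentum-density terms appear but the trace is unchanged.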

Now, we can also have energy density from electromagnetism, because electric and magnetic fields carry energy and momentum. So, we define a stress tensor for them as:

\[ T^{\mu\nu}_{\text{EM}} = F^{\mu}{}_{\lambda}F^{\nu\lambda} + \frac{1}{4}\eta^{\mu\nu}F_{\alpha\beta}F^{\alpha\beta}, \]

where $F^{\mu\nu}$ is the electromagnetic field tensor. (For now, we will not show how this definition is motivated.) With this definition, we can define the total energy-momentum stress tensor as the element-wise sum of the stress tensors:

\[ T^{\mu\nu} = T^{\mu\nu}_{\text{matter}} + T^{\mu\nu}_{\text{EM}} + \dots \]

Lastly, to make the equations covariant, we simply replace $\eta$ with $g$ and comma derivatives with semicolon derivatives. This gives the energy-momentum stress tensor of matter in general relativity as

\[ T^{\mu\nu}_{\text{matter}} = \left(\rho + \frac{p}{c^2}\right)u^\mu u^\nu - p\,g^{\mu\nu}, \]

and the conservation law:

\[ T^{\mu\nu}{}_{;\mu} = 0. \]

Next, how do we find an equation that lets us solve for the metric tensor $g^{\mu\nu}$ given a distribution of matter $T^{\mu\nu}$? An obvious first guess is:

\[ g^{\mu\nu} = kT^{\mu\nu}. \]

If this were the case, then $g^{\mu\nu}$ would inherit all the properties of $T^{\mu\nu}$. This is good, but it doesn’t give the Poisson equation. So, we want to look for a relationship that involves the Christoffel symbols. But remember that $\Gamma^{\lambda}_{\mu\nu}$ is NOT a tensor. Also, recall that $\Gamma^{\mu}_{\lambda\nu} \neq 0$ does not mean spacetime is curved. These two reminders motivate us (or rather, Riemann) to find a quantity that describes the curvature of spacetime. This quantity is called the Riemann curvature tensor:

\[ R^{\mu}{}_{\nu\lambda\sigma} = \partial_\lambda\Gamma^{\mu}_{\nu\sigma} - \partial_\sigma\Gamma^{\mu}_{\nu\lambda} + \Gamma^{\rho}_{\nu\sigma}\Gamma^{\mu}_{\rho\lambda} - \Gamma^{\rho}_{\nu\lambda}\Gamma^{\mu}_{\rho\sigma}. \]

Spacetime is flat if the Riemann curvature tensor is zero at all points. If the Riemann curvature tensor is nonzero at any point in spacetime, then spacetime is curved.
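The remark that nonzero Christoffel symbols do not by themselves signal curvature can be illustrated with two 2-dimensional examples. The sketch below is an assumption-laden illustration rather than part of the notes: it hard-codes the Christoffel symbols of the unit sphere and of the flat plane in polar coordinates, differentiates them by central differences, and evaluates the definition of $R^{\mu}{}_{\nu\lambda\sigma}$ given above.

```python
import math

def zeros3(n):
    """Return an n x n x n nested list of zeros."""
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]

def gamma_sphere(x):
    """G[a][b][c] = Gamma^a_{bc} for the unit 2-sphere, x = (theta, phi)."""
    theta = x[0]
    G = zeros3(2)
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def gamma_polar_plane(x):
    """G[a][b][c] = Gamma^a_{bc} for the flat plane in polar coordinates, x = (r, phi)."""
    r = x[0]
    G = zeros3(2)
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def riemann(gamma, x, h=1e-5):
    """R[a][b][c][d] = d_c Gamma^a_{bd} - d_d Gamma^a_{bc}
                       + Gamma^e_{bd} Gamma^a_{ec} - Gamma^e_{bc} Gamma^a_{ed}."""
    n = len(x)
    G = gamma(x)
    dG = []  # dG[c][a][b][d] approximates partial_c Gamma^a_{bd}
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        Gp, Gm = gamma(xp), gamma(xm)
        dG.append([[[(Gp[a][b][d] - Gm[a][b][d]) / (2.0 * h) for d in range(n)]
                    for b in range(n)] for a in range(n)])
    return [[[[dG[c][a][b][d] - dG[d][a][b][c]
               + sum(G[e][b][d] * G[a][e][c] - G[e][b][c] * G[a][e][d]
                     for e in range(n))
               for d in range(n)] for c in range(n)] for b in range(n)]
            for a in range(n)]

def max_component(R):
    """Largest absolute value among all components of a 4-index nested list."""
    return max(abs(v) for a in R for b in a for c in b for v in c)

R_sphere = riemann(gamma_sphere, (0.9, 0.2))
R_plane = riemann(gamma_polar_plane, (2.0, 0.4))
print(R_sphere[0][1][0][1], math.sin(0.9) ** 2)  # nonzero: the sphere is curved
print(max_component(R_plane))                    # zero up to rounding: the plane is flat
```

Both spaces have nonvanishing Christoffel symbols, but only the sphere has a nonvanishing Riemann tensor, which is why the curvature tensor rather than $\Gamma$ is the right diagnostic.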

7.2 Riemann curvature tensor Rλµνσ

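The Riemann curvature tensor is obtained by repeated covariant differentiation. Notice, though, that we are concerned with covariant derivatives rather than ordinary partial derivatives, which obey “the equality of mixed partials”:

\[ \frac{\partial^2}{\partial X^\mu\,\partial X^\nu} = \frac{\partial^2}{\partial X^\nu\,\partial X^\mu}. \]

The analogous equality does not hold for covariant derivatives when there is curvature. Suppose that $\lambda_\mu$ is a covariant vector:

\[ \lambda_{\nu;\sigma} = \partial_\sigma\lambda_\nu - \Gamma^{\mu}_{\nu\sigma}\lambda_\mu, \]

\[ \lambda_{\nu;\sigma\lambda} = (\lambda_{\nu;\sigma})_{;\lambda} = \left(\partial_\sigma\lambda_\nu - \Gamma^{\mu}_{\nu\sigma}\lambda_\mu\right)_{;\lambda}. \]

We can show that the order of covariant differentiation matters in general:

\[ \lambda_{\nu;\lambda\sigma} \neq \lambda_{\nu;\sigma\lambda}. \]

However, it is even more interesting that

\[ \lambda_{\nu;\lambda\sigma} - \lambda_{\nu;\sigma\lambda} = R^{\mu}{}_{\nu\lambda\sigma}\lambda_\mu, \]

so if the Riemann curvature tensor is nonzero, then the second covariant derivatives of $\lambda_\mu$ taken in different orders are not equal.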

Example 7.2.1. Prove the following cyclic identity: $R^{\mu}{}_{\nu\lambda\sigma} + R^{\mu}{}_{\lambda\sigma\nu} + R^{\mu}{}_{\sigma\nu\lambda} = 0$.

For now the exercise is left to the reader.

We observe that the Riemann curvature tensor has four indices, each taking four values, 0 to 3. We might therefore expect $4^4 = 256$ components, but we will see that they are not all independent. Apart from the cyclic identity above, the Riemann curvature tensor has a number of interesting properties. We shall look at some important ones.

If we lower an index,

\[ R_{\mu\nu\lambda\sigma} = g_{\mu\rho}R^{\rho}{}_{\nu\lambda\sigma}, \]

we can show that:

\[ R_{\mu\nu\lambda\sigma} = -R_{\nu\mu\lambda\sigma}, \qquad R_{\mu\nu\lambda\sigma} = -R_{\mu\nu\sigma\lambda}, \qquad R_{\mu\nu\lambda\sigma} = R_{\lambda\sigma\mu\nu}. \]

All of these properties follow from the definition in terms of the Christoffel symbols $\Gamma^{\mu}_{\nu\sigma}$ and the metric tensor $g_{\mu\nu}$. With all these relations, we can show that there are only 20 independent components in the Riemann curvature tensor $R_{\mu\nu\lambda\sigma}$.

Next, what if we contract the first two indices of the Riemann curvature tensor?

\[ R^{\mu}{}_{\mu\lambda\sigma} = g^{\mu\rho}R_{\rho\mu\lambda\sigma} = -g^{\mu\rho}R_{\mu\rho\lambda\sigma} = -R^{\rho}{}_{\rho\lambda\sigma} = -R^{\mu}{}_{\mu\lambda\sigma}. \]

This simply says that the contraction vanishes:

\[ R^{\mu}{}_{\mu\lambda\sigma} = 0. \]

But there are contractions that don’t vanish. For example, the Ricci tensor:

\[ R_{\mu\nu} = R^{\sigma}{}_{\mu\nu\sigma}. \]

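Example 7.2.2. Show that the Ricci tensor is symmetric: $R_{\mu\nu} = R_{\nu\mu}$.

We start with the cyclic identity:

\[ R^{\mu}{}_{\nu\lambda\sigma} + R^{\mu}{}_{\lambda\sigma\nu} + R^{\mu}{}_{\sigma\nu\lambda} = 0. \]

We then contract $\sigma$ with $\mu$ (i.e., set $\sigma = \mu$ and sum), use $R^{\mu}{}_{\mu\nu\lambda} = 0$, and use the antisymmetry in the last two indices:

\[
\begin{aligned}
R^{\mu}{}_{\nu\lambda\mu} + R^{\mu}{}_{\lambda\mu\nu} + R^{\mu}{}_{\mu\nu\lambda} &= 0 \\
R^{\mu}{}_{\nu\lambda\mu} + R^{\mu}{}_{\lambda\mu\nu} &= 0 \\
R^{\mu}{}_{\nu\lambda\mu} - R^{\mu}{}_{\lambda\nu\mu} &= 0 \\
R_{\nu\lambda} &= R_{\lambda\nu}.
\end{aligned}
\]

The significance of this example is that it shows $R_{\mu\nu}$ has only 10 independent components, just like $g_{\mu\nu}$.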

7.3 The Einstein equations

Before we get into the Einstein equations, we define the curvature scalar:

\[ R = R^{\mu}{}_{\mu} = g^{\mu\nu}R_{\mu\nu}. \]

In developing his equations, Einstein looked at combining $g^{\mu\nu}$, $T^{\mu\nu}$, $R^{\mu\nu}$, and $R$ in various ways. In 1915, he tried the combination $R^{\mu\nu} = kT^{\mu\nu}$, where $k$ is a coupling constant. But this doesn’t work, since $T^{\mu\nu}{}_{;\mu} = 0$ for energy-momentum conservation, but $R^{\mu\nu}{}_{;\mu} \neq 0$ in general. Ultimately, however, Einstein found the combination

\[ G^{\mu\nu} = R^{\mu\nu} - \frac{R}{2}g^{\mu\nu} = kT^{\mu\nu}, \]

where $G^{\mu\nu}$ has zero covariant divergence:

\[ G^{\mu\nu}{}_{;\mu} = 0. \]

As a result, Einstein settled on $G^{\mu\nu} = kT^{\mu\nu}$ because it is consistent with $T^{\mu\nu}{}_{;\mu} = 0$. The constant $k$ is fixed by the Newtonian limit:

\[ k = -\frac{8\pi G}{c^4}. \]

So, this gives the Einstein equations

\[ R^{\mu\nu} - \frac{R}{2}g^{\mu\nu} = -\frac{8\pi G}{c^4}T^{\mu\nu}. \]

Now it seems like there has been a lot of guessing, but it has been shown that the possibilities are very limited. One can show mathematically that a tensor $t^{\mu\nu}$ that is a function of $g_{\mu\nu}$ and at most its second derivatives, and that obeys $t^{\mu\nu}{}_{;\mu} = 0$, can be written as

\[ t^{\mu\nu} = AR^{\mu\nu} + BRg^{\mu\nu} + Cg^{\mu\nu}, \]

where $A$, $B$, $C$ are constants. So, the only generalization of Einstein’s equations is the cosmological constant term $\Lambda$:

\[ R^{\mu\nu} - \frac{R}{2}g^{\mu\nu} + \Lambda g^{\mu\nu} = -\frac{8\pi G}{c^4}T^{\mu\nu}. \]

We will look at the Einstein equations with and without $\Lambda$, and we will see the significance of $\Lambda$ in cosmology. On the scale of our solar system, however, $\Lambda$ plays a very insignificant role.

7.4 Schwarzschild solution

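During World War I, in 1916, in the trenches, Schwarzschild found an exact solution to the Einstein equations (with $\Lambda = 0$) which describes the metric tensor for spacetime outside of a static distribution of mass with boundary radius $r_B$. Also, note that because the solution describes spacetime in empty space (beyond the distribution of mass), the energy-momentum stress tensor $T^{\mu\nu}$ is simply zero there.

Example 7.4.1. Show that

\[ R = \frac{8\pi G}{c^4}T, \qquad \text{where } R = R^{\mu}{}_{\mu} = g_{\mu\nu}R^{\mu\nu} \text{ and } T = T^{\mu}{}_{\mu} = g_{\mu\nu}T^{\mu\nu}. \]

This can be accomplished by simply contracting the Einstein equations with $g_{\mu\nu}$:

\[
\begin{aligned}
g_{\mu\nu}R^{\mu\nu} - \frac{R}{2}g_{\mu\nu}g^{\mu\nu} &= -\frac{8\pi G}{c^4}g_{\mu\nu}T^{\mu\nu} \\
R^{\mu}{}_{\mu} - \frac{R}{2}\delta^{\mu}{}_{\mu} &= -\frac{8\pi G}{c^4}T^{\mu}{}_{\mu} \\
R - 2R &= -\frac{8\pi G}{c^4}T \\
R &= \frac{8\pi G}{c^4}T.
\end{aligned}
\]

From the previous example, $T^{\mu\nu} = 0$ implies $R = 0$, so the Einstein equations in empty space reduce to

\[ R^{\mu\nu} = 0. \]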

Schwarzschild wrote down the general form of $g_{\mu\nu}$ for a static, spherically symmetric, massive object, requiring that

\[ g_{\mu\nu} \to \eta_{\mu\nu} \quad \text{as } r \to \infty. \]

What this says is that as we move infinitely far away from the given distribution of mass, the metric turns into the Minkowski metric (flat spacetime). Next, in the Newtonian limit,

\[ g_{00} = 1 + \frac{2V}{c^2}, \qquad V = -\frac{GM}{r}. \]

Now for the big reveal... The Schwarzschild metric is:

\[ [g_{\mu\nu}] = \begin{pmatrix} 1 - \frac{2GM}{c^2 r} & 0 & 0 & 0 \\ 0 & -\left(1 - \frac{2GM}{c^2 r}\right)^{-1} & 0 & 0 \\ 0 & 0 & -r^2 & 0 \\ 0 & 0 & 0 & -r^2\sin^2\theta \end{pmatrix}. \]

The Schwarzschild metric satisfies all the conditions we have been discussing. As $M \to 0$ or $r \to \infty$:

\[ [g_{\mu\nu}] \to \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -r^2 & 0 \\ 0 & 0 & 0 & -r^2\sin^2\theta \end{pmatrix}, \]

which is exactly $\eta_{\mu\nu}$ in spherical coordinates. Next, we will study this metric by applying it to the Earth, the Sun, and black holes.

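For $r \geq r_B$, the line element is

\[ ds^2 = \left(1 - \frac{2GM}{c^2 r}\right)c^2\,dt^2 - \left(1 - \frac{2GM}{c^2 r}\right)^{-1}dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2. \]

For notational simplicity, we define a quantity with dimensions of length:

\[ m = \frac{GM}{c^2}. \]

We can then write

\[ ds^2 = \left(1 - \frac{2m}{r}\right)c^2\,dt^2 - \left(1 - \frac{2m}{r}\right)^{-1}dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2. \]

Observe that

\[ \left(1 - \frac{2GM}{c^2 r}\right)^{-1} \to \infty \quad \text{at } r = r_S = \frac{2GM}{c^2} = 2m, \]

where $r_S$ is the Schwarzschild radius. As a result, we need to distinguish two types of object:

• $r_B > r_S$: if the boundary radius is larger than the Schwarzschild radius, there is no problem, since the solution describes spacetime outside of the object, while $r_S$ is inside the object.

• $r_B < r_S$: black hole.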
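The two cases above can be made concrete with a short calculation of $r_S = 2GM/c^2$. The masses and boundary radii below are approximate reference values assumed for illustration, and the helper names are hypothetical.

```python
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8     # speed of light, m/s

# Approximate masses (kg) and boundary radii (m); assumed reference values.
BODIES = {
    "Sun": (1.989e30, 6.96e8),
    "Earth": (5.972e24, 6.371e6),
}

def schwarzschild_radius(mass):
    """r_S = 2GM/c^2 = 2m."""
    return 2.0 * G * mass / C ** 2

def is_black_hole(mass, r_boundary):
    """True when the body lies entirely inside its own Schwarzschild radius."""
    return r_boundary < schwarzschild_radius(mass)

for name, (mass, r_b) in BODIES.items():
    r_s = schwarzschild_radius(mass)
    print(name, r_s, r_s / r_b, is_black_hole(mass, r_b))
```

With these inputs the Schwarzschild radius of the Sun comes out at roughly 3 km and that of the Earth at roughly 9 mm, both far inside the actual bodies, so neither is a black hole.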

Chapter 8

Predictions and tests of general relativity

In this chapter we will investigate the curvature of spacetime near a planet or star. Using the Schwarzschild metric, let us first look at the differences between general relativity and special relativity.

In special relativity, coordinates $(x, t)$ represent physical lengths and times in a frame $S$, and $(x', t')$ are physical lengths and times in a frame $S'$. Both are measured by rulers and clocks and are related by Lorentz transformations.

The key difference between coordinates in special and general relativity is that coordinates in general relativity do not give physical lengths and times. In general relativity, we should keep in mind a clear distinction between coordinates and physical lengths and times. The Schwarzschild metric is written in terms of coordinates. $(ct, r, \theta, \phi)$ are quantities that uniquely label points in spacetime, but they are not physical lengths and times.

So, a reasonable question to ask is: “How do we compute lengths and times using the metric and coordinates?” We shall use the following prescription:

• Look at spatial and time separations using the line element.

• Consider motion in spacetime. This brings the geodesic equations into consideration. However, we have to be careful about whether the particle under consideration is massive or massless.

There is, however, a special case where the coordinates $ct$ and $r$ do correspond to physical times and lengths: flat Minkowski spacetime (when we are infinitely far away from the source of gravity). Why? Because in flat spacetime the coefficients of $c^2\,dt^2$ and $dr^2$ in the line element are 1 and −1.

8.0.1 Lengths

We observe that the Schwarzschild metric has no time dependence:

\[ [g_{\mu\nu}] = \begin{pmatrix} 1 - \frac{2GM}{c^2 r} & 0 & 0 & 0 \\ 0 & -\left(1 - \frac{2GM}{c^2 r}\right)^{-1} & 0 & 0 \\ 0 & 0 & -r^2 & 0 \\ 0 & 0 & 0 & -r^2\sin^2\theta \end{pmatrix}. \]

So we can “separate” space and time, in the sense that we can take a $t$ = constant slice of spacetime and look at the geometry of space. Let us consider $dt = 0$ and look only at the spatial metric tensor. For convenience, we shall also switch the signs of the spatial components of the original metric tensor from negative to positive:

\[ [\tilde{g}_{ij}] = \begin{pmatrix} \left(1 - \frac{2GM}{c^2 r}\right)^{-1} & 0 & 0 \\ 0 & r^2 & 0 \\ 0 & 0 & r^2\sin^2\theta \end{pmatrix}. \]

And so the “new” spatial line element is:

\[ ds^2 = \left(1 - \frac{2GM}{c^2 r}\right)^{-1}dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2, \]

where $[\tilde{g}_{ij}] = [-g_{ij}]$ and $x^i = (r, \theta, \phi)$.

What is the geometry of this space? Consider $\theta = \pi/2$, which gives us the equatorial plane. In fact, any slice through the center will be the same. Because $\theta$ is fixed, $d\theta = 0$, so we are reduced to a 2-dimensional surface. The line element becomes

\[ ds^2 = \left(1 - \frac{2GM}{c^2 r}\right)^{-1}dr^2 + r^2\,d\phi^2. \]

This describes a 2-dimensional sheet through the equator. Consider two concentric circles with coordinate radii $r_1$ and $r_2$. For each circle we find its physical circumference. With $dr = 0$, the line element becomes $ds^2 = r^2\,d\phi^2$. So,

\[ s = \int_0^{2\pi} r\,d\phi = 2\pi r. \]

Thus $r = s/2\pi$, but is $r$ the radius? That would assume that $r$ is the physical distance to the center, which it isn’t; $r$ here is just a coordinate. To find radial distances, we have to integrate along $r$ with $\phi$ fixed ($d\phi = 0$), which leaves the line element as

\[ ds^2 = \left(1 - \frac{2GM}{c^2 r}\right)^{-1}dr^2, \]

and so

\[ R = \int\left(1 - \frac{2GM}{c^2 r}\right)^{-1/2}dr, \]

where $R$ is the physical radial distance between the limits of integration. Notice that $R \geq \Delta r$, because $0 < 2GM/c^2 r < 1$ makes the integrand larger than 1. So, how do we actually make measurements when our ruler has different lengths at different points in space? The solution is to open a ruler factory at $r \to \infty$, build 1 m sticks, then distribute them everywhere. Measurements count how many 1 m sticks are needed to get from one point to another.

It turns out that for a circle of coordinate radius $r_0$, the physical circumference is indeed $2\pi r_0$. But going radially inwards, we need more than $r_0$ sticks. To visualize this, we can use a higher-dimensional space as an embedding space. The result looks something like this:

[Figure: embedding diagram of the equatorial plane]

What we see is a 2-dimensional sheet in the shape of a funnel in 3-dimensional space. A similar thing happens on the surface of a sphere, which is a curved 2-dimensional space. Here the circumferences are $2\pi r_1$ and $2\pi r_2$ respectively, but the radial distance $R_{12}$ isn’t the same as the difference in radial coordinate $\Delta r$.

By the form of the metric, we see that the space near a static mass $M$ is curved, but for the Earth or the Sun the effects are quite small:

\[ \text{Earth: } \frac{2GM}{c^2 r_B} \approx 10^{-9}, \qquad \text{Sun: } \frac{2GM}{c^2 r_B} \approx 10^{-6}, \]

or in other words,

\[ R_{12} \approx r_1 - r_2. \]

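Example 8.0.1. Find $\Delta R$ and $\Delta r$ between the surface of the Sun and the Earth. We will use $r_2 = 7 \times 10^8$ m and $r_1 = 1.5 \times 10^{11}$ m. For the Sun,

\[ m = \frac{GM}{c^2} \approx 1.48 \times 10^3 \text{ m}. \]

So we see that

\[ \frac{2m}{r_1} \ll 1, \quad \text{and likewise} \quad \frac{2m}{r_2} \ll 1. \]

So we can approximate

\[ \left(1 - \frac{2m}{r}\right)^{-1/2} \approx 1 + \frac{m}{r}, \]

which gives

\[ \Delta R = \int_{r_2}^{r_1}\left(1 - \frac{2m}{r}\right)^{-1/2}dr \approx \int_{r_2}^{r_1}\left(1 + \frac{m}{r}\right)dr = \Delta r + m\ln\frac{r_1}{r_2}. \]

With $\Delta r \approx 1.5 \times 10^{11}$ m, the correction is $\Delta R - \Delta r \approx m\ln(r_1/r_2) \approx 7.9 \times 10^3$ m, which is much smaller than $\Delta r$. We have

\[ \Delta R \approx \Delta r \approx 1.5 \times 10^{11} \text{ m}. \]

We can look at the relative difference:

\[ \frac{\Delta R - \Delta r}{\Delta R} \approx 5.3 \times 10^{-8}. \]

As we said, the curvature is negligible for the Sun-Earth system. It turns out that even if we don’t approximate, the exact solution

\[ \Delta R = \sqrt{r_1(r_1 - 2m)} - \sqrt{r_2(r_2 - 2m)} + 2m\ln\left(\frac{\sqrt{r_1} + \sqrt{r_1 - 2m}}{\sqrt{r_2} + \sqrt{r_2 - 2m}}\right) \]

gives the same answer.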
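The numbers in Example 8.0.1 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the values of $G$, $c$, and the solar mass are approximate inputs chosen for illustration, and the function names are hypothetical. It compares the weak-field result with the closed-form integral quoted above.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # kg (assumed value)

def proper_distance_exact(r_inner, r_outer, m):
    """Closed-form integral of (1 - 2m/r)^(-1/2) dr from r_inner to r_outer."""
    def antiderivative(r):
        return (math.sqrt(r * (r - 2.0 * m))
                + 2.0 * m * math.log(math.sqrt(r) + math.sqrt(r - 2.0 * m)))
    return antiderivative(r_outer) - antiderivative(r_inner)

def proper_distance_weak_field(r_inner, r_outer, m):
    """First-order result: Delta r + m ln(r_outer / r_inner)."""
    return (r_outer - r_inner) + m * math.log(r_outer / r_inner)

m_sun = G * M_SUN / C ** 2
r2, r1 = 7.0e8, 1.5e11     # Sun's surface and Earth's orbit, as in the example
delta_r = r1 - r2
excess_exact = proper_distance_exact(r2, r1, m_sun) - delta_r
excess_weak = proper_distance_weak_field(r2, r1, m_sun) - delta_r
print(m_sun, excess_exact, excess_weak, excess_weak / delta_r)
```

Both versions give an excess radial distance of about 7.9 km over a coordinate separation of about $1.5 \times 10^{11}$ m, so the weak-field approximation is more than adequate here.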

8.1 Times

Now that we have looked at lengths, let us shift our focus to time intervals. Consider a clock at rest in a gravitational field. Then $r$, $\theta$, $\phi$ are fixed, i.e., $dr = d\phi = d\theta = 0$, and $s = c\tau$, where $\tau$ is the proper time and $t \neq \tau$. This gives

\[ ds^2 = c^2\,d\tau^2 = \left(1 - \frac{2GM}{c^2 r}\right)c^2\,dt^2. \]

Thus, we have time dilation:

\[ d\tau = \sqrt{1 - \frac{2GM}{c^2 r}}\,dt, \]

where $\tau$ is the physical time on the clock, and $t$ is the coordinate time, i.e., the time on a very distant clock. Next, suppose we have two clocks at rest at two different locations. Then we have two time dilations:

\[ r_1: \quad \Delta\tau_1 = \sqrt{1 - \frac{2GM}{c^2 r_1}}\,\Delta t, \]

\[ r_2: \quad \Delta\tau_2 = \sqrt{1 - \frac{2GM}{c^2 r_2}}\,\Delta t. \]

So we have

\[ \frac{\Delta\tau_1}{\Delta\tau_2} = \sqrt{\frac{1 - 2GM/c^2 r_1}{1 - 2GM/c^2 r_2}}. \]

This is called gravitational time dilation. We see that if $r_2 < r_1$, then $\Delta\tau_2 < \Delta\tau_1$. This means time goes slower in stronger gravitational fields. But note that everything slows down together, so we don’t notice anything locally: we measure slowed-down events with slowed-down clocks. To see any difference, we need two clocks at two different locations. For example, we can compare clocks on the ground with clocks on an airplane or a satellite. It turns out that experiments agree with general relativity. In fact, GPS relies on these general-relativistic corrections.
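To get a feeling for the size of gravitational time dilation, the following sketch compares a clock on the ground with a clock at a GPS-like orbital radius, using the ratio $\Delta\tau_1/\Delta\tau_2$ derived above. The Earth's mass and radius and the orbital radius are approximate assumed values, the function names are hypothetical, and only the gravitational effect is modeled: the satellite's orbital speed produces an additional special-relativistic effect of the opposite sign that is not included here.

```python
import math

G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8         # speed of light, m/s
M_EARTH = 5.972e24  # kg (assumed value)
R_GROUND = 6.371e6  # m, Earth's mean radius (assumed value)
R_ORBIT = 2.656e7   # m, roughly a GPS orbital radius (assumed value)

def clock_rate(r, mass):
    """d(tau)/dt = sqrt(1 - 2GM/(c^2 r)) for a clock at rest at radius r."""
    return math.sqrt(1.0 - 2.0 * G * mass / (C ** 2 * r))

def daily_offset(r_low, r_high, mass, seconds=86400.0):
    """Proper time gained by the higher clock over `seconds` of lower-clock time."""
    ratio = clock_rate(r_high, mass) / clock_rate(r_low, mass)
    return (ratio - 1.0) * seconds

print(daily_offset(R_GROUND, R_ORBIT, M_EARTH))  # seconds gained per day
```

Under these assumptions the higher clock gains a few tens of microseconds per day, which is small locally but large compared with the timing precision that satellite navigation requires.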

8.2 Gravitational redshift

As with special relativity, when we have time dilation, we have spectral shifts. Consider light emitted and received at two locations, $r_E$ and $r_R$. Clocks at $r_E$ and $r_R$ tick at different rates. The frequency perceived at each location for the same number of cycles $n$ is

$$\nu_E = \frac{n}{\Delta\tau_E}, \qquad \nu_R = \frac{n}{\Delta\tau_R}.$$

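Each proper time is related to a coordinate time as

$$\Delta\tau_R = \sqrt{1-\frac{2GM}{c^2 r_R}}\,\Delta t_R, \qquad \Delta\tau_E = \sqrt{1-\frac{2GM}{c^2 r_E}}\,\Delta t_E,$$

where

$$\Delta t_E = t_E^{(n)} - t_E^{(0)}$$

is the coordinate time for emission of $n$ cycles. Now, because light follows the null trajectory, the line element for light is zero, i.e.,

$$0 = ds^2 = \left(1-\frac{2m}{r}\right)c^2\,dt^2 + g_{ij}\,dx^i\,dx^j,$$

with $m = GM/c^2$. So this says

$$dt = \frac{1}{c}\sqrt{\left(1-\frac{2GM}{c^2 r}\right)^{-1}\left(-g_{ij}\,dx^i\,dx^j\right)}.$$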


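With this, we get

$$t_R^{(0)} - t_E^{(0)} = \frac{1}{c}\int_{r_E}^{r_R}\sqrt{\left(1-\frac{2GM}{c^2 r}\right)^{-1}\left(-g_{ij}\frac{dx^i}{dr}\frac{dx^j}{dr}\right)}\,dr.$$

Notice that there is no time dependence here. In fact, it is easy to see that we get the same right hand side for the end of the nth wave. So,

$$\Delta t_R = \Delta t_E.$$

Therefore,

$$\frac{\Delta\tau_R}{\sqrt{1-\frac{2GM}{c^2 r_R}}} = \frac{\Delta\tau_E}{\sqrt{1-\frac{2GM}{c^2 r_E}}},$$

or equivalently,

$$\frac{\nu_R}{\nu_E} = \frac{n/\Delta\tau_R}{n/\Delta\tau_E} = \frac{\Delta\tau_E}{\Delta\tau_R} = \sqrt{\frac{1-2GM/c^2 r_E}{1-2GM/c^2 r_R}}.$$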

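For $2m/r \ll 1$, we get the approximation

$$\frac{\nu_R}{\nu_E} \approx \frac{1-m/r_E}{1-m/r_R},$$

or equivalently,

$$\frac{\Delta\nu}{\nu_E} = \frac{\nu_R-\nu_E}{\nu_E} \approx \frac{GM}{c^2}\left(\frac{1}{r_R}-\frac{1}{r_E}\right).$$

In summary, for $r_R > r_E$ (light travelling away from the gravitational source), we have a redshift, and vice versa.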
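To get a feel for the numbers, here is a small sketch comparing the exact frequency ratio with the first-order shift for light emitted at the Sun's surface and received at the Earth's orbital radius. It assumes the same values as Example 8.0.1 (m = 1482 m, r_E = 7 × 10^8 m, r_R = 1.5 × 10^11 m) and ignores the Earth's own field and any relative motion.

```python
import math

m = 1482.0      # GM/c^2 for the Sun in metres (value from Example 8.0.1)
r_E = 7.0e8     # emitter at the Sun's surface, in metres
r_R = 1.5e11    # receiver at the Earth's orbital radius, in metres

# Exact ratio nu_R / nu_E from the square-root formula.
exact_ratio = math.sqrt((1 - 2 * m / r_E) / (1 - 2 * m / r_R))
shift_exact = exact_ratio - 1.0

# First-order result: (nu_R - nu_E) / nu_E ~ m (1/r_R - 1/r_E).
shift_approx = m * (1 / r_R - 1 / r_E)

print("exact  fractional shift:", shift_exact)
print("approx fractional shift:", shift_approx)
```

Both numbers are negative (a redshift, since r_R > r_E) and of order 10^-6, and they differ only at the 10^-12 level.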

In reality, how do we look for these effects? Here’s the setup of the experiment done by Pound and Rebka at Harvard. Suppose we have the following two scenarios. In the first scenario, both observers see $\nu_E$, because dr = 0 for each.

But in the second scenario, even though $\nu_E$ is the same for both, one observer sees blueshifted light while the other sees redshifted light. The experiment was able to confirm a difference in received wavelengths.

8.3 Radar time-delay experiments

Radar time-delay experiments provide one of the best tests of general relativity. Consider the following setup. The goal here is to bounce a radar signal from Earth off of Venus with the Sun behind, then measure the duration of the round trip using a clock at rest on Earth.

Naively, we might expect that

$$\Delta\tau = \frac{2\,|r_2-r_1|}{c}.$$

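But there is actually a time delay. We know that for light, ds = 0. If we let θ, φ be constant, as we send light radially, we get that

$$ds^2 = 0 = \left(1-\frac{2GM}{c^2 r}\right)c^2\,dt^2 - \left(1-\frac{2GM}{c^2 r}\right)^{-1}dr^2.$$

This means

$$\frac{dr}{dt} = \pm c\left(1-\frac{2GM}{c^2 r}\right).$$

This is the coordinate speed of light. We see that

$$\left|\frac{dr}{dt}\right| < c,$$

so the coordinate speed of light is less than c. So how long does a round trip take, as measured with a clock at rest on Earth? Well, for r, t we have that

$$\frac{dt}{dr} = \pm\frac{1}{c}\left(1-\frac{2GM}{c^2 r}\right)^{-1} \approx \pm\frac{1}{c}\left(1+\frac{2GM}{c^2 r}\right).$$

This gives the coordinate time of

$$\Delta t = \frac{2}{c}\left[\Delta r + \frac{2GM}{c^2}\ln\left|\frac{r_1}{r_2}\right|\right].$$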


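For clocks on Earth,

$$\Delta\tau = \left(1-\frac{2m}{r_1}\right)^{1/2}\Delta t \tag{8.1}$$

where $m = GM/c^2$ and M is the mass of the Sun. There is a gravitational effect due to the Earth’s mass but it is significantly smaller than the Sun’s:

$$\left(\frac{GM}{c^2 r}\right)_{\text{Sun}} \gg \left(\frac{GM}{c^2 r}\right)_{\text{Earth}}. \tag{8.2}$$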

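Next, we Taylor expand the delay to get

$$\Delta\tau = \left(1-\frac{2m}{r_1}\right)^{1/2}\Delta t \approx \left(1-\frac{GM}{c^2 r_1}\right)\frac{2}{c}\left[\Delta r + 2m\ln\left|\frac{r_1}{r_2}\right|\right]. \tag{8.3}$$

Keeping only terms of first order in m,

$$\Delta\tau_{GR} \approx \frac{2}{c}\Delta r - \frac{2m}{r_1 c}\Delta r + \frac{4m}{c}\ln\left|\frac{r_1}{r_2}\right|. \tag{8.4}$$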

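Compare this with the expected result $2\Delta R = c\,\Delta\tau$, which says that

$$\Delta\tau = \frac{2}{c}\Delta R \tag{8.5}$$

$$= \frac{2}{c}\int_{r_2}^{r_1}\left(1-\frac{2GM}{c^2 r}\right)^{-1/2}dr \tag{8.6}$$

$$\approx \frac{2}{c}\int_{r_2}^{r_1}\left(1+\frac{GM}{c^2 r}\right)dr \tag{8.7}$$

$$= \frac{2}{c}\left[\Delta r + m\ln\left|\frac{r_1}{r_2}\right|\right], \tag{8.8}$$

and so the expected time interval is

$$\Delta\tau_{\text{expected}} \approx \frac{2}{c}\Delta r + \frac{2m}{c}\ln\left|\frac{r_1}{r_2}\right|. \tag{8.9}$$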

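We see that

$$\Delta\tau_{GR} \neq \Delta\tau_{\text{expected}}. \tag{8.10}$$

We also note that

$$\Delta\tau_{GR} - \Delta\tau_{\text{expected}} \approx \frac{2GM}{c^3}\left(\ln\left|\frac{r_1}{r_2}\right| - \frac{\Delta r}{r_1}\right) > 0. \tag{8.11}$$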
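The following sketch plugs numbers into (8.4), (8.9) and (8.11) for the purely radial geometry used in this derivation, with the Earth at r1 and Venus at r2 < r1. The Venus orbital radius (about 1.08 × 10^11 m) is an assumed value, and the radial, same-side configuration is a simplification: a signal that actually grazes the Sun accumulates a much larger delay than the number printed here.

```python
import math

m = 1482.0      # GM/c^2 for the Sun in metres
c = 2.998e8     # speed of light in m/s
r1 = 1.5e11     # Earth's orbital radius in metres
r2 = 1.08e11    # Venus's orbital radius in metres (assumed value)

dr = r1 - r2
log_term = math.log(r1 / r2)

# Round-trip time predicted by GR, first order in m, as in (8.4).
tau_gr = (2 / c) * (dr - (m / r1) * dr + 2 * m * log_term)

# "Expected" round-trip time 2*Delta_R/c, as in (8.9).
tau_expected = (2 / c) * (dr + m * log_term)

# Extra delay, directly and via the closed form (8.11).
delay = tau_gr - tau_expected
delay_formula = (2 * m / c) * (log_term - dr / r1)

print("round trip [s]:", tau_expected)
print("extra delay [s]:", delay, delay_formula)
```

The extra delay is positive, in line with (8.11), and tiny compared with the round-trip time of a few hundred seconds.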

This essentially says that GR predicts a time delay. What does this mean? Well, this is up to interpretation. The first interpretation says that the speed of light is slowed down in GR. This interpretation is a little misleading, because while it is true that $\left|dr/dt\right| < c$, this is not the physical speed of light. A different interpretation says that you can’t use a clock on Earth and ∆R for a t = Constant slice for light moving through a non-constant gravitational field. Essentially, clocks run differently all along the way. We’re also using ∆R, which assumes t = Constant. However, light is moving through time. So, the predicted GR result takes all of this into account and gives a different answer.

Here’s a question: What speed does light have in GR? We must first know that $\dot X^\mu = dX^\mu/d\tau$ is NOT defined because light has no proper time. However, we can work around this by going to a freely falling frame where $g_{\mu\nu} \to \eta_{\mu\nu}$, $ds^2 = \eta_{\mu\nu}\,dX^\mu\,dX^\nu = 0$. This gives

$$c^2\,dt^2 - (dX^i)^2 = 0 \implies \left|\frac{dX^i}{dt}\right| = c. \tag{8.12}$$

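But what about in non-inertial frames? In this case, we will need to measure the speed of light locally, which means we will use local clocks (dτ) and dR. We know that

$$dR = \left(1-\frac{2m}{r}\right)^{-1/2}dr, \qquad d\tau = \left(1-\frac{2m}{r}\right)^{1/2}dt, \tag{8.13}$$

which says

$$\underbrace{\frac{dR}{d\tau}}_{\text{physical}} = \left(1-\frac{2m}{r}\right)^{-1}\underbrace{\frac{dr}{dt}}_{\text{coordinate}}. \tag{8.14}$$

But, because

$$\frac{dr}{dt} = \pm c\left(1-\frac{2m}{r}\right), \tag{8.15}$$

we get

$$\frac{dR}{d\tau} = \pm c. \tag{8.16}$$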

So the speed of light is still c. However, keep in mind that we can’t conclude that going a distance 2∆R takes τ = 2∆R/c, because GR instead predicts an extra delay.

8.3.1 Experiments of Shapiro (1968 - 1971)

These radar delay experiments measured delays of radar signals bouncing off Venus as they pass behind the Sun. Here’s a rough sketch of the “setup” of the experiment:

Since accurate time delays could not be computed, Shapiro


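looked at changes in delays and fitted data to a metric of the form

$$g_{\mu\nu} = \begin{pmatrix} 1-\dfrac{2m}{r} & & & \\ & -\left(1-\gamma\dfrac{2m}{r}\right)^{-1} & & \\ & & -r^2 & \\ & & & -r^2\sin^2\theta \end{pmatrix} \tag{8.17}$$

where γ is a parameter. They fitted γ to the data to find the best value. Shapiro et al. found that

$$\gamma \approx 1.03 \pm 0.01, \tag{8.18}$$

which is consistent with Schwarzschild geometry.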

8.4 Particle Motion in Schwarzschild geometry

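Massive particles have defined proper time:

$$c^2\,d\tau^2 = ds^2 = c^2\left(1-\frac{2m}{r}\right)dt^2 - \left(1-\frac{2m}{r}\right)^{-1}dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2. \tag{8.19}$$

This has 5 variables. But recall that we also have the 4 geodesic equations:

$$\frac{d^2X^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dX^\nu}{d\tau}\frac{dX^\lambda}{d\tau} = 0. \tag{8.20}$$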

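With these there are enough equations to solve for the variables. First, we need the connections. By definition,

$$\Gamma^\mu_{\rho\sigma} = \frac{1}{2}g^{\mu\lambda}\left(\partial_\rho g_{\sigma\lambda} + \partial_\sigma g_{\rho\lambda} - \partial_\lambda g_{\rho\sigma}\right) \tag{8.21}$$

whose values are dependent on the metric (which can be found in various places). In “dot” notation (where “dot” means derivative w.r.t. τ), we can write the geodesic equations as

$$\ddot X^\mu + \Gamma^\mu_{\nu\sigma}\dot X^\nu\dot X^\sigma = 0. \tag{8.22}$$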

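For µ = 0, $X^0 = ct$, we have

$$c\,\ddot t + 2\Gamma^0_{01}\,c\,\dot t\,\dot r = 0 \tag{8.23}$$

because the only non-zero connections with upper index 0 are $\Gamma^0_{01} = \Gamma^0_{10}$. This gives

$$\ddot t + \frac{2m}{r^2}\left(1-\frac{2m}{r}\right)^{-1}\dot t\,\dot r = 0. \tag{8.24}$$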

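Repeating this for µ = 1, 2, 3, we get three more equations:

$$\mu = 1:\quad \ddot r + \frac{mc^2}{r^2}\left(1-\frac{2m}{r}\right)\dot t^2 - \frac{m}{r^2}\left(1-\frac{2m}{r}\right)^{-1}\dot r^2 + (2m-r)\,\dot\theta^2 - r\sin^2\theta\left(1-\frac{2m}{r}\right)\dot\phi^2 = 0 \tag{8.25}$$

$$\mu = 2:\quad \ddot\theta + \frac{2}{r}\,\dot r\,\dot\theta - \sin\theta\cos\theta\,\dot\phi^2 = 0 \tag{8.26}$$

$$\mu = 3:\quad \ddot\phi + \frac{2}{r}\,\dot r\,\dot\phi + \frac{2\cos\theta}{\sin\theta}\,\dot\theta\,\dot\phi = 0. \tag{8.27}$$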

8.4.1 Planar motion θ = π/2

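Consider the planar motion θ = π/2. Then the µ = 2 equation goes away, and sin θ = 1, cos θ = 0. In this case we have

$$\mu = 0:\quad \ddot t + \frac{2m}{r^2}\left(1-\frac{2m}{r}\right)^{-1}\dot t\,\dot r = 0 \tag{8.28}$$

$$\mu = 1:\quad \ddot r + \frac{mc^2}{r^2}\left(1-\frac{2m}{r}\right)\dot t^2 - \frac{m}{r^2}\left(1-\frac{2m}{r}\right)^{-1}\dot r^2 - r\left(1-\frac{2m}{r}\right)\dot\phi^2 = 0 \tag{8.29}$$

$$\mu = 3:\quad \ddot\phi + \frac{2}{r}\,\dot r\,\dot\phi = 0. \tag{8.30}$$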

For the first equation, we divide both sides by $\dot t$ and integrate to get

$$\int\frac{\ddot t}{\dot t}\,d\tau = \int\frac{-2m/r^2}{1-2m/r}\,dr \tag{8.31}$$

$$\ln\dot t = -\ln\left(1-\frac{2m}{r}\right) + C. \tag{8.32}$$


So this says

$$\left(1-\frac{2m}{r}\right)\dot t = k, \tag{8.33}$$

where $k = e^{C}$ is a constant.

For the third equation, we can write it as $\frac{d}{d\tau}\left(r^2\dot\phi\right) = 0$ to get

$$r^2\dot\phi = h = \text{Constant}. \tag{8.34}$$

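Thus, putting everything together (with the radial equation divided through by $1-2m/r$),

$$\left(1-\frac{2m}{r}\right)^{-1}\ddot r + \frac{mc^2}{r^2}\,\dot t^2 - \left(1-\frac{2m}{r}\right)^{-2}\frac{m}{r^2}\,\dot r^2 - r\dot\phi^2 = 0 \tag{8.35}$$

$$\left(1-\frac{2m}{r}\right)\dot t = k \tag{8.36}$$

$$r^2\dot\phi = h. \tag{8.37}$$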

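The line element at θ = π/2, divided through by $d\tau^2$, is

$$c^2 = c^2\left(1-\frac{2m}{r}\right)\dot t^2 - \left(1-\frac{2m}{r}\right)^{-1}\dot r^2 - r^2\dot\phi^2. \tag{8.38}$$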

Now, we have 4 equations with 4 unknowns with k, h as constants. We can solve these for r, t, φ, τ in terms of k, h.
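One way to convince ourselves that k and h really are constants of the motion is to integrate the planar geodesic equations numerically and watch them. The sketch below does this with a hand-written fourth-order Runge-Kutta step. It works in units where m = 1 and c = 1 and starts from an arbitrarily chosen bound orbit (r = 20, with a hypothetical initial angular rate); these choices and all function names are ours, not part of the notes.

```python
import math

M = 1.0   # m = GM/c^2, in units where m = 1 (assumption)
C = 1.0   # units where c = 1 (assumption)

def f(r):
    return 1.0 - 2.0 * M / r

def derivs(state):
    """Planar geodesic equations (mu = 0, 1, 3) as a first-order system in tau."""
    t, r, phi, t_dot, r_dot, phi_dot = state
    t_ddot = -(2.0 * M / r**2) / f(r) * t_dot * r_dot
    r_ddot = (-(M * C**2 / r**2) * f(r) * t_dot**2
              + (M / r**2) / f(r) * r_dot**2
              + r * f(r) * phi_dot**2)
    phi_ddot = -(2.0 / r) * r_dot * phi_dot
    return (t_dot, r_dot, phi_dot, t_ddot, r_ddot, phi_ddot)

def rk4_step(state, step):
    """One classical Runge-Kutta step of size `step` in proper time."""
    k1 = derivs(state)
    k2 = derivs(tuple(s + 0.5 * step * k for s, k in zip(state, k1)))
    k3 = derivs(tuple(s + 0.5 * step * k for s, k in zip(state, k2)))
    k4 = derivs(tuple(s + step * k for s, k in zip(state, k3)))
    return tuple(s + step / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def constants(state):
    """Return k, h and the right-hand side of the normalised line element."""
    _, r, _, t_dot, r_dot, phi_dot = state
    k = f(r) * t_dot
    h = r**2 * phi_dot
    norm = C**2 * f(r) * t_dot**2 - r_dot**2 / f(r) - r**2 * phi_dot**2
    return k, h, norm

# Initial data: start at r0 with no radial velocity; t_dot is fixed by the line element.
r0, phi_dot0 = 20.0, 0.012
t_dot0 = math.sqrt((C**2 + r0**2 * phi_dot0**2) / (C**2 * f(r0)))
state = (0.0, r0, 0.0, t_dot0, 0.0, phi_dot0)

k_start, h_start, norm_start = constants(state)
for _ in range(20000):
    state = rk4_step(state, 0.05)
k_end, h_end, norm_end = constants(state)

print("k:", k_start, "->", k_end)
print("h:", h_start, "->", h_end)
print("line element / c^2:", norm_end / C**2)
```

Within numerical error k and h do not change along the orbit, and the normalised line element stays equal to c², which is a useful sanity check on the signs in the radial equation.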

For example, using these equations Einstein calculated the precession of Mercury’s perihelion (point of closest approach). In Newtonian physics, there is a precession rate of 532′′ per century caused by other planets. However, there was always an extra 43′′ per century that could not be explained. Einstein was able to do the calculations and find an extra 43′′ per century. This is quite a remarkable feat.
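The 43′′ figure can be checked with a few lines of arithmetic. The sketch below uses the standard first-order GR result for the perihelion advance per orbit, 6πm / (a(1 − e²)), which is not derived in these notes and is quoted here as an assumption. Mercury's semi-major axis, eccentricity and period are likewise assumed standard values.

```python
import math

m = 1482.0            # GM/c^2 for the Sun in metres (value used earlier in these notes)
a = 5.79e10           # Mercury's semi-major axis in metres (assumed)
e = 0.2056            # Mercury's orbital eccentricity (assumed)
period_days = 87.97   # Mercury's orbital period in days (assumed)

# Standard first-order perihelion advance per orbit, in radians (quoted, not derived here).
advance_per_orbit = 6 * math.pi * m / (a * (1 - e**2))

orbits_per_century = 36525.0 / period_days
arcsec_per_century = advance_per_orbit * orbits_per_century * (180 / math.pi) * 3600

print("perihelion advance [arcsec/century]:", arcsec_per_century)
```

The output lands close to the 43′′ per century mentioned above.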

8.4.2 Light motion with θ = π/2

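For light, we must use the null line element because proper time is no longer defined. Again, let us consider the planar motion θ = π/2. We have in this case

$$ds^2 = 0 = \left(1-\frac{2m}{r}\right)c^2\,dt^2 - \left(1-\frac{2m}{r}\right)^{-1}dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2 \tag{8.39}$$

$$= \left(1-\frac{2m}{r}\right)c^2\,dt^2 - \left(1-\frac{2m}{r}\right)^{-1}dr^2 - r^2\,d\phi^2. \tag{8.40}$$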

Next, let us parameterize the null trajectory with ω (instead of t or τ). With this, we have $\dot X^\mu = dX^\mu/d\omega$ and so on. Any ω is good as long as it gives a light-like trajectory: $0 = g_{\mu\nu}\,dX^\mu\,dX^\nu$. We can use ω in the geodesic equations to describe the free motion of light. Once again we have

$$\ddot X^\mu + \Gamma^\mu_{\nu\sigma}\dot X^\nu\dot X^\sigma = 0. \tag{8.41}$$

Letting $\dot t = dt/d\omega$ and so on, we get the same equations with µ = 1, 2, 3 in a similar fashion as before. Using these equations, Einstein was able to calculate the deflection of light passing close by the Sun: Einstein predicted that ∆α = 1.75′′ for starlight grazing the Sun. This was later confirmed by Sir Arthur Eddington in 1919.
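The 1.75′′ value can be reproduced from the standard first-order deflection formula ∆α ≈ 4m/b, where b is the distance of closest approach. That formula is not derived in these notes and is used here as an assumption; the numbers are the solar values from Example 8.0.1, with the ray taken to graze the Sun's surface.

```python
import math

m = 1482.0    # GM/c^2 for the Sun in metres (value from Example 8.0.1)
b = 7.0e8     # closest approach = radius of the Sun in metres

# Standard first-order light deflection angle (quoted, not derived here).
deflection_rad = 4 * m / b
deflection_arcsec = deflection_rad * (180 / math.pi) * 3600

print("deflection [arcsec]:", deflection_arcsec)
```

This gives about 1.75′′, matching the prediction quoted above.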

8.4.3 Can light have circular orbit?

The answer turns out to be YES, but only for r = 3m. We need solutions for $r_B < 3m$. This means we need either black holes with $r_B < 2m$ or very close... Let us look at the plane θ = π/2 again with line element:

$$0 = c^2\left(1-\frac{2m}{r}\right)\dot t^2 - \left(1-\frac{2m}{r}\right)^{-1}\dot r^2 - r^2\dot\phi^2. \tag{8.42}$$

But for a circular orbit, we also have that $\dot r = \ddot r = 0$. This simplifies the line element to

$$0 = c^2\left(1-\frac{2m}{r}\right)\dot t^2 - r^2\dot\phi^2. \tag{8.43}$$

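The r-geodesic equation says that

$$\left(1-\frac{2m}{r}\right)^{-1}\ddot r + \frac{mc^2}{r^2}\,\dot t^2 - \left(1-\frac{2m}{r}\right)^{-2}\frac{m}{r^2}\,\dot r^2 - r\dot\phi^2 = 0. \tag{8.44}$$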


But with $\dot r = \ddot r = 0$, this simplifies to

$$\frac{mc^2}{r^2}\,\dot t^2 - r\dot\phi^2 = 0 \tag{8.45}$$

or equivalently

$$\frac{mc^2}{r}\,\dot t^2 - r^2\dot\phi^2 = 0. \tag{8.46}$$

With this and the simplified line element, we can solve for r and get r = 3m. This is the radius of a circular orbit for light.
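For completeness, here is the short algebra behind r = 3m, written out as a sketch. It only combines the two circular-orbit conditions stated above, namely the simplified line element and the simplified r-geodesic equation, both solved for $r^2\dot\phi^2$.

```latex
% Circular-orbit line element:   c^2 (1 - 2m/r) \dot t^2 = r^2 \dot\phi^2
% Circular-orbit r-equation:     (m c^2 / r) \dot t^2    = r^2 \dot\phi^2
\begin{align*}
c^2\left(1-\frac{2m}{r}\right)\dot t^{2} &= \frac{mc^{2}}{r}\,\dot t^{2}
  && \text{(equate the two expressions for } r^2\dot\phi^2\text{)}\\
1-\frac{2m}{r} &= \frac{m}{r}
  && \text{(divide by } c^2\dot t^{2} \neq 0\text{)}\\
r &= 3m .
\end{align*}
```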

8.5 Other tests of GR

Other tests of GR include gravitational lensing and detecting gravitational waves. The first gravitational waves were detected at LIGO in 2015-2016.

8.6 Black Holes

For $r \geq r_B$, we use the Schwarzschild solution where the line element is given by

$$ds^2 = \left(1-\frac{2m}{r}\right)c^2\,dt^2 - \left(1-\frac{2m}{r}\right)^{-1}dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2. \tag{8.47}$$

We see that at the event horizon r = 2m, $g_{11}$ diverges while $g_{00} \to 0$. Also, there is another singularity as r → 0, where $g_{00} \to -\infty$. For the Sun, the Earth, etc., $r_B > 2m$ so there is no problem. However, for some objects where $r_B < r_s = 2m$, the singularities matter. Such objects are called black holes. In this section, we will look at the geometry at r > 2m (outside), r < 2m (inside), and r = 2m (on the event horizon) of a black hole.

8.6.1 Radial trajectories of massive objects

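Suppose we are falling radially from rest from $r = r_0$ into a black hole. Let us start with the line element. In terms of proper time τ:

$$ds^2 = c^2\,d\tau^2 = \left(1-\frac{2m}{r}\right)c^2\,dt^2 - \left(1-\frac{2m}{r}\right)^{-1}dr^2. \tag{8.48}$$

Dividing everything by $d\tau^2$, we get

$$c^2 = \left(1-\frac{2m}{r}\right)c^2\,\dot t^2 - \left(1-\frac{2m}{r}\right)^{-1}\dot r^2. \tag{8.49}$$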

Next, applying $\dot r = 0$ at $r = r_0$, we get

$$\dot t = \left(1-\frac{2m}{r_0}\right)^{-1/2} \tag{8.50}$$

or (initially)

$$dt = \left(1-\frac{2m}{r_0}\right)^{-1/2}d\tau. \tag{8.51}$$

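What about t? Let us look at the geodesic equation:

$$\left(1-\frac{2m}{r}\right)\dot t - k = 0, \tag{8.52}$$

where k is a constant. So, at $r = r_0$, to find k:

$$k = \left(1-\frac{2m}{r_0}\right)^{-1/2}\left(1-\frac{2m}{r_0}\right) = \left(1-\frac{2m}{r_0}\right)^{1/2}. \tag{8.53}$$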

What is this constant? Can we interpret it? The answer is YES. For $m/r_0 \ll 1$, we can expand k to get

$$k \approx 1 - \frac{m}{r_0} = 1 - \frac{GM}{c^2 r_0}. \tag{8.54}$$

Suppose that our object has mass $M_0$. Then its rest energy and potential energy at $r = r_0$ relate through

$$E = M_0c^2 - \frac{GMM_0}{r_0}. \tag{8.55}$$

This says

$$\frac{E}{M_0c^2} = 1 - \frac{GM}{c^2 r_0} \equiv k. \tag{8.56}$$


And so we can see that k is a ratio between the total energy and rest energy of the object.

Next, let us make another approximation. Let $r_0 \to \infty$, then k ≈ 1. We can use k ≈ 1 for falling from rest from far away where $r_0 \to \infty$. But we also don’t want $r_0 = \infty$ exactly because then there are no interesting effects. We assume that $r_0$ is big enough that we can use k = 1. In this case

$$1 = k = \left(1-\frac{2m}{r}\right)\dot t \tag{8.57}$$

holds for any r, t. This then becomes

$$\frac{dt}{d\tau} = \left(1-\frac{2m}{r}\right)^{-1} \implies d\tau = \left(1-\frac{2m}{r}\right)dt \tag{8.58}$$

where t is the coordinate time, and τ is the proper time.

Note that this result is different from the time dilation formula $d\tau = \left(1-\frac{2m}{r}\right)^{1/2}dt$ for clocks at rest that we found earlier. Is there a contradiction here? The answer is a solid NO. Clocks at rest don’t follow geodesics! A clock at rest in a gravitational field has to be held in place by a force, which makes it the very opposite of free-falling.

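Back to the problem. Let us look at the line element again and plug in $d\tau^2 = \left(1-\frac{2m}{r}\right)^2dt^2$. This gives

$$c^2\left(1-\frac{2m}{r}\right)^2dt^2 = c^2\left(1-\frac{2m}{r}\right)dt^2 - \left(1-\frac{2m}{r}\right)^{-1}dr^2 \tag{8.59}$$

$$\left(1-\frac{2m}{r}\right)dt^2 = dt^2 - \frac{1}{c^2}\left(1-\frac{2m}{r}\right)^{-2}dr^2 \tag{8.60}$$

$$\left(1-\frac{2m}{r}\right) = 1 - \frac{1}{c^2}\left(1-\frac{2m}{r}\right)^{-2}\left(\frac{dr}{dt}\right)^2 \tag{8.61}$$

$$\frac{dr}{dt} = -\sqrt{\frac{2mc^2}{r}}\left(1-\frac{2m}{r}\right) \qquad \text{(“−” because falling “into”)} \tag{8.62}$$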

So, the coordinate velocity w.r.t. clocks from far away is

$$\frac{dr}{dt} = -c\sqrt{\frac{2m}{r}}\left(1-\frac{2m}{r}\right). \tag{8.63}$$

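With this, we can integrate to find t in terms of r:

$$t = \int_{\infty}^{r}\frac{dt}{dr}\,dr = -\frac{1}{c}\int_{\infty}^{r}\sqrt{\frac{r}{2m}}\left(1-\frac{2m}{r}\right)^{-1}dr = \infty. \tag{8.64}$$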

This means we must cut off the integral at some larger r. We can numerically evaluate and plot t versus r. From the plots, we see that viewers at ∞ see the falling object slowing as r → 2m, and it never reaches the horizon (|dr/dt| → 0 as r → 2m).
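The numerical evaluation mentioned above can be sketched as follows. We work in units where m = 1 and c = 1 (so the horizon is at r = 2), cut the integral off at a starting radius of 20, and stop a small distance eps outside the horizon. The substitution r = 2 + e^u is our own choice to tame the 1/(r − 2) spike in the integrand; it is not something taken from the notes.

```python
import math

def coordinate_time(r_start, r_end, steps=20000):
    """Coordinate time (units m = c = 1) to fall from r_start to r_end > 2.

    Uses |dt/dr| = sqrt(r/2) / (1 - 2/r) for a fall from rest far away (k = 1).
    With r = 2 + exp(u) the integrand becomes sqrt(r/2) * r du, which is smooth,
    so a plain midpoint rule in u is adequate.
    """
    u_lo = math.log(r_end - 2.0)
    u_hi = math.log(r_start - 2.0)
    du = (u_hi - u_lo) / steps
    total = 0.0
    for i in range(steps):
        r = 2.0 + math.exp(u_lo + (i + 0.5) * du)
        total += math.sqrt(r / 2.0) * r * du
    return total

# Stop closer and closer to the horizon and watch t grow without bound.
offsets = (1e-2, 1e-4, 1e-6)
times = [coordinate_time(20.0, 2.0 + eps) for eps in offsets]

for eps, t in zip(offsets, times):
    print("r_end - 2m =", eps, " t =", t)
```

Each time the stopping point moves a factor of 100 closer to the horizon, t grows by roughly the same amount (about 2 ln 100 in these units). This is the logarithmic divergence that makes the far-away observer conclude the object never reaches r = 2m.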

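What about for the falling observer with τ being their proper time? We can look at dr/dτ and τ vs. r. We use k = 1 and

$$\frac{dt}{d\tau} = \left(1-\frac{2m}{r}\right)^{-1}. \tag{8.65}$$

By the chain rule:

$$\frac{dr}{d\tau} = \frac{dr}{dt}\frac{dt}{d\tau} = -c\sqrt{\frac{2m}{r}}\left(1-\frac{2m}{r}\right)\left(1-\frac{2m}{r}\right)^{-1} = -c\sqrt{\frac{2m}{r}}. \tag{8.66}$$

So we have

$$\frac{dr}{d\tau} = -c\sqrt{\frac{2m}{r}}. \tag{8.67}$$

We can integrate this (again cutting off at some large starting radius) to get

$$\tau = -\frac{1}{c}\int_{\infty}^{r}\sqrt{\frac{r'}{2m}}\,dr'. \tag{8.68}$$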

We can plot τ versus r and |dr/dτ| versus r.

A falling observer reaches the event horizon in finite time, at a faster and faster rate. In fact, since nothing blows up, the observer passes right through the event horizon. So, we conclude that the falling observer reaches r = 0 in finite time.


Example 8.6.1. Calculate the proper time to go from r = 2m to r = 0:

$$\tau = -\frac{1}{c}\int_{2m}^{0}\sqrt{\frac{r}{2m}}\,dr = \frac{4m}{3c} = \frac{4GM}{3c^3}. \tag{8.69}$$

For $M = M_{\text{Sun}}$, τ ≈ 6.6 µs.
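As a quick check of Example 8.6.1, the sketch below evaluates the closed form 4GM/(3c³) for one solar mass and compares it with a direct midpoint-rule integration of dτ = (1/c)√(r/2m) dr from r = 0 to r = 2m. The values of G, c and the solar mass are assumed standard figures.

```python
import math

G = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2 (assumed)
C = 2.998e8        # speed of light in m/s
M_SUN = 1.989e30   # solar mass in kg (assumed)

m = G * M_SUN / C**2                       # GM/c^2 in metres
tau_closed = 4 * G * M_SUN / (3 * C**3)    # closed form from Example 8.6.1

# Midpoint rule for (1/c) * integral_0^{2m} sqrt(r / 2m) dr.
steps = 100000
dr = 2 * m / steps
tau_numeric = sum(math.sqrt((i + 0.5) * dr / (2 * m)) * dr
                  for i in range(steps)) / C

print("closed form [microseconds]:", tau_closed * 1e6)
print("numerical   [microseconds]:", tau_numeric * 1e6)
```

Both values come out at a few microseconds and agree with each other to well within a part in ten thousand.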

So, just to summarize what we have discovered, we see 2 different views of what happens. Far away observers say you never reach the event horizon. But as you fall into a black hole, you find that you cross the event horizon and head into the center of the black hole in finite time.

8.6.2 Light signal

To better understand the different perspectives, let us look at light signals. Suppose that the falling observer sends light signals outward. We know that light rays follow null trajectories $ds^2 = 0$.


Chapter 9

Cosmology

9.1 Large-scale geometry of the universe

9.1.1 Cosmological principle

9.1.2 Robertson-Walker (flat, open, closed) geometries

9.1.3 Expansion of the universe

9.1.4 Distances and speeds

9.1.5 Redshifts

9.2 Dynamical evolution of the universe

9.2.1 The Friedmann equations

9.2.2 The cosmological constant Λ

9.2.3 Equations of state

9.2.4 A matter-dominated universe (Λ = 0) [Friedmann models]

9.2.5 A flat, matter-dominated universe (Λ = 0) [old favorite model]

9.3 Observational cosmology

9.3.1 Hubble law

9.3.2 Acceleration of the universe

9.3.3 Matter densities and dark matter

9.3.4 The flatness and horizon problems

9.3.5 Cosmic Microwave Background (CMB) anisotropy

9.4 Modern cosmology

9.4.1 Inflation

9.4.2 Dark energy (The cosmological constant problem)

9.4.3 Concordance model [new favorite model]

9.4.4 Open questions
