
Probability Review

Michael Bar¹

July 9, 2012

¹San Francisco State University, Department of Economics.


Contents

1 Appendix 1: Probability Review

1.1 Probability Spaces

1.2 Random Variables

1.3 Random Vectors

1.3.1 Marginal pdf

1.3.2 Conditional pdf

1.3.3 Independence of random variables

1.4 Moments

1.4.1 Expected value

1.4.2 Variance and standard deviation

1.4.3 Covariance and correlation


Chapter 1

Appendix 1: Probability Review

In this chapter we review some basic concepts in probability and statistics. This chapter is

not a substitute for an introductory course in statistics, such as ECON 311. My goal here is

to review some key concepts for those who are already familiar with them, but might need

a speedy refresher.

1.1 Probability Spaces

The most fundamental concept in probability theory is a random experiment.

Definition 1 A random experiment is an experiment whose outcome cannot be predicted

with certainty, before the experiment is run.

For example: a coin flip, a toss of a die, getting married, or getting a college degree. Although we cannot predict with certainty the exact outcome of a coin flip, we know that it can be either "heads" or "tails". Similarly, in a toss of a die we know that the outcome is one of the following: 1, 2, 3, 4, 5, 6.

Definition 2 A sample space, Ω, is the set of all possible (and distinct) outcomes of a

random experiment.

Thus, the sample space for the random experiment of a coin flip is Ω = {"heads", "tails"}, and the sample space of the random experiment of a die toss is Ω = {1, 2, 3, 4, 5, 6}. It is less clear what the sample space is for the random experiments of getting married or getting a college degree. In the first, one can look at the number of kids as an outcome, or the duration the marriage will last. In the second, one can look at the types of jobs the person gets, his wage, etc.

As another example, what is the sample space of flipping two coins? Let heads and tails be denoted by H and T. Then the sample space is Ω = {HH, HT, TH, TT}, that is, the sample space consists of 4 outcomes.
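This enumeration can also be done mechanically. The following minimal Python sketch (an illustration added to these notes, not part of the original text) builds the same sample space with itertools.product; changing the `repeat` argument handles any number of coins.

```python
# Enumerate the sample space of flipping two coins.
from itertools import product

outcomes = ["H", "T"]                  # results of a single coin flip
sample_space = ["".join(pair) for pair in product(outcomes, repeat=2)]
print(sample_space)       # ['HH', 'HT', 'TH', 'TT']
print(len(sample_space))  # 4 outcomes
```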

Exercise 1 Write the sample space of flipping 3 coins.

Exercise 2 Write the sample space of tossing two dice.


Sometimes we don’t care about all the particular outcomes of an experiment, but only

about groups of outcomes. For example, consider the random experiment of taking a course.

The set of all possible numerical grades is Ω = {0, 1, ..., 100}. However, if a letter grade of "A" is assigned to all grades of 92 and above, then we might be particularly interested in the subset of Ω which gives an "A": {92, 93, ..., 100}.

Definition 3 An event A is a subset of the sample space, denoted A ⊆ Ω.

In the previous example of a grade in a course, we can define an event A = {92, 93, ..., 100}. If one of the outcomes in A occurs, we say that event A occurred. Notice that A ⊆ Ω, which reads "A is a subset of Ω". As another example, suppose that a student fails a course if his grade is below 60. We can define an event B = {0, 1, ..., 59}. Again, if one of the outcomes in B occurs, we say that event B occurred.

Exercise 3 Let the random experiment be flipping two coins, and A be the event in which the two coins have identical outcomes. List the outcomes in A.

Exercise 4 Let the random experiment be a toss of a die, and A be the event in which the outcome is odd. List the outcomes in A.

Finally, we would like to calculate the probability of an event. Intuitively, the probability of an event is the "chance" that the event will happen.

Definition 4 The probability of a random event denotes the relative frequency of occurrence of an experiment's outcomes in that event, when repeating the experiment many times.

For example, in a toss of a balanced coin, if we repeat the experiment many times, we expect to get "heads" half of the time, and "tails" half of the time. We then say that "the probability of heads is 1/2 and the probability of tails is also 1/2". Similarly, in tossing a balanced die, we expect each of the outcomes to occur with probability 1/6. Let the event A denote an even outcome in a toss of a die, so A = {2, 4, 6}. What is the probability of A? Intuitively, the probability of this event is the sum of the probabilities of each outcome, i.e. P(A) = 1/6 + 1/6 + 1/6 = 1/2. In general, the probability of event A is calculated by adding up the probabilities of the outcomes in A.

It should be obvious that P(Ω) = 1 from the definition of the sample space. Since the sample space contains all possible outcomes of a random experiment, the probability that some outcome will occur is 1. It should also be obvious that the probability of any event must be between 0 and 1, and your instructors will be very alarmed if you calculate negative probabilities or probabilities that are greater than 1.

In the examples of a coin flip and a toss of a die, each distinct outcome occurs with equal probability. This need not always be the case. Consider the experiment of picking a person at random and recording his education. The sample space can be, for example, {"no school", "less-than-high school", "high school", "college degree", "MA", "Ph.D."}. Obviously, P("high school") is much higher than P("MA").


1.2 Random Variables

Notice that some random experiments have numerical outcomes (e.g. a toss of a die) while others have outcomes that we describe verbally (e.g. "heads" and "tails" in a flip of a coin, or the education level of a person: "high school", "college degree", ...). Calculating probabilities and performing statistical analysis becomes much simpler if we can describe all outcomes in terms of numbers.

Definition 5 A random variable is a function which assigns a real number to each outcome in the sample space. Formally, X : Ω → R is the notation of a function (named X), which maps the sample space Ω into the real numbers R.

From the above definition, note that if X is a random variable and g(·) is some real-valued function, then g(X) is another random variable. It is conventional to use capital letters to denote the random variable's name, and lower-case letters to denote particular realizations of the random variable. For example, we have seen that the sample space of flipping two coins is Ω = {HH, HT, TH, TT}. One can define a random variable X, which counts the number of times that "heads" is observed. The possible values of X (called the support of X) are x ∈ {0, 1, 2}. We can then calculate the probabilities P(X = 0), P(X = 1) and P(X = 2); in other words, we can find the distribution of the random variable X.

Exercise 5 Calculate the probabilities for the random variable X, which counts the number of "heads" in a two-coin flip. That is, find P(X = 0), P(X = 1) and P(X = 2).

As another example, consider the random experiment to be a course with 20 students, with the relevant outcomes being success or failure. Suppose that I want to calculate the probability that all students pass the course, or that 1 student fails and 19 pass, or that 2 students fail and 18 pass, etc. I can define a random variable X to be the number of students who fail the course. Thus, the possible values of X are {0, 1, 2, ..., 20}. If I know from previous experience that with probability p a student fails my course, we can calculate the distribution of X. Roughly speaking, we can calculate P(X = 0), P(X = 1), ..., P(X = 20).

Definition 6 A discrete random variable is one that has a finite or countable number

of possible values.

Definition 7 A continuous random variable is one that has a continuum of possible values.

The distribution of a random variable is usually described by its probability density function (pdf). The probability density function for a discrete random variable assigns probabilities to each value of the random variable:

f(x) = P(X = x)

Here X is the name of the random variable, and x is one possible realization. The probability density function of a continuous random variable can be used to calculate the probability


that the random variable gets values in a given interval [a, b], by integrating the pdf over that interval:

∫_a^b f(x) dx = P(a ≤ X ≤ b)

For example, the random variable that counts the number of failing students, among 20 who take the course, is an example of a discrete random variable. It has 21 possible values (the support is x ∈ {0, 1, 2, ..., 20}) and a binomial distribution. We write X ∼ Binomial(n, p) to indicate that there are n students, each of whom fails with probability p. If there are 20 students and p = 5%, we write X ∼ Binomial(20, 0.05). In case you are curious, the pdf of X is

f(x) = P(X = x) = C(n, x) p^x (1 − p)^(n−x)

where

C(n, x) = n! / (x!(n − x)!)

The next figure illustrates the pdf of Binomial(20, 0.05). Notice that a high number of failures is very unlikely, which is good news.
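As a sanity check on the formula above, here is a short Python sketch (added for illustration; it assumes Python 3.8+ for math.comb) that evaluates the Binomial(20, 0.05) pdf of the running example and verifies that it sums to one.

```python
# Binomial pdf: f(x) = C(n, x) * p**x * (1 - p)**(n - x)
from math import comb

def binom_pdf(x, n=20, p=0.05):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

pdf = [binom_pdf(x) for x in range(21)]
print(round(sum(pdf), 10))     # 1.0 -- a pdf must sum to one
print(round(binom_pdf(0), 4))  # P(no student fails) = 0.95**20, about 0.3585
```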

Exercise 6 Suppose that 20 students enroll in a course, and their instructor knows from

experience that each of them fails with probability 0.05. Calculate the probability that 7

students fail the course.

An example of a continuous random variable is one with the continuous uniform distribution. If X ∼ U[a, b], we say that X has a uniform distribution on the interval [a, b]. The probability density function of X is f(x) = 1/(b − a) for all x ∈ [a, b], and f(x) = 0 otherwise. Thus, the support of this random variable is x ∈ [a, b]. The next figure plots the pdf of a continuous uniform


distribution.

Suppose that we have two numbers, c and d, such that a ≤ c ≤ d ≤ b. The probability that X ∈ [c, d] is

P(c ≤ X ≤ d) = ∫_c^d 1/(b − a) dx = (d − c)/(b − a)

In other words, the probability that a continuous uniform random variable falls within some interval [c, d] is equal to the relative length of that interval to the support.
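A quick Monte Carlo check of this claim (an added sketch; the numbers a, b, c, d below are arbitrary choices) can be run with the standard library:

```python
# For X ~ U[a, b], the relative frequency of draws landing in [c, d]
# should be close to (d - c) / (b - a).
import random

random.seed(0)
a, b = 2.0, 10.0           # support of X
c, d = 3.0, 5.0            # a <= c <= d <= b
n = 100_000
draws = [random.uniform(a, b) for _ in range(n)]
freq = sum(c <= x <= d for x in draws) / n
print(freq)                # close to 0.25
print((d - c) / (b - a))   # exactly 0.25
```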

In general, when the pdf of a random variable is given, the support of that random variable is the set of all the values for which the pdf is positive. Written mathematically,

support(X) = {x | f(x) > 0}

which reads "the support of X is the set of values x such that the pdf is positive". This is the same as saying that the support of X is the set of all possible values of X, because values for which the pdf is zero are not possible.

It should be obvious that the sum (in the discrete case) or integral¹ (in the continuous case) of all the values of a probability density function is 1. This has to be true, because a random variable assigns a number to any outcome in the sample space, and since P(Ω) = 1, the probability of the random variable getting some number is also 1. Formally,

[X is discrete]: Σ_x f(x) = 1

[X is continuous]: ∫_{−∞}^{∞} f(x) dx = 1

In fact, any function which integrates (or sums) to 1 and is nonnegative (never attains negative values) is a probability density function of some random variable.

Exercise 7 Suppose that X is a continuous uniform random variable, with support [a, b]. Find the critical value c such that P(X ≥ c) = 0.05.

¹Mathematicians often don't even bother using the word sum, because an integral is also a sum. The integral symbol, ∫, comes from an elongated letter S, standing for summa (Latin for "sum" or "total").


Exercise 8 Suppose that X is a continuous uniform random variable, with support [a, b]. Find the critical value c such that P(X ≤ c) = 0.05.

Exercise 9 Suppose that X has an exponential distribution, denoted X ∼ exp(λ), λ > 0, i.e. the pdf is:

f(x) = λe^(−λx) if x ≥ 0, and f(x) = 0 if x < 0

(a). Prove that the above function is indeed a probability density function.

(b). Calculate the critical value x* such that P(X ≥ x*) = 0.05.

1.3 Random Vectors

Sometimes we want to explore several random variables at the same time, i.e. jointly. For example, suppose that we choose people at random and look at how their education and wages are distributed together. Letting X denote the years of education and Y denote the hourly wage, we can describe the joint distribution of X and Y with a joint probability density function (joint pdf). If X and Y are discrete, their joint pdf is

f(x, y) = P(X = x, Y = y)

If X and Y are continuous, the probability that X ∈ [a, b] and Y ∈ [c, d] is

P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx

Obviously this discussion can be generalized to any number of random variables, say X₁, ..., Xₙ, with joint pdf f(x₁, ..., xₙ). For example, when the unit of observation is a randomly chosen household, we might be interested in the joint distribution of the number of people in the household, their genders, their incomes, their education levels, their work experience, their age, their race, etc.

The joint pdfs must also integrate (or sum) to 1, just as in the single random variable case, because we know with certainty that X and Y will attain some values. Formally,

[X and Y are discrete]: Σ_x Σ_y f(x, y) = 1

[X and Y are continuous]: ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dy dx = 1

Again, the above can be generalized to the joint pdf of any number of random variables.

1.3.1 Marginal pdf

In relation to the joint pdf f(x, y), the functions f(x) and f(y) are called marginal (or individual) probability density functions. The marginal pdfs are obtained by

[X and Y are discrete]: f(x) = Σ_y f(x, y), f(y) = Σ_x f(x, y)   (1.1)

[X and Y are continuous]: f(x) = ∫_{−∞}^{∞} f(x, y) dy, f(y) = ∫_{−∞}^{∞} f(x, y) dx   (1.2)


Intuitively, X can attain a given value x when Y attains the value y₁ or y₂ or any other possible value. Thus, to get P(X = x) we need to sum (integrate) over all these cases of Y getting any of its possible values.

A point of caution is needed here. The marginal densities of X and Y need not be the same function. Some texts use the notations f_X(x) and f_Y(y) to make it clear in equations (1.1) and (1.2) that the marginal pdfs are in general not the same function; X has pdf f_X and Y has pdf f_Y. These notes follow other texts, which give up some mathematical precision in favor of ease of notation. Our textbook does not discuss joint and marginal densities, and therefore does not have this problem.²

For example, suppose that I am looking at the joint distribution of gender, X, attaining the value of 1 for female and 0 for male, and GPA, Y, attaining values y₁, ..., yₙ, of students in this course. Suppose I want to obtain the marginal distribution of the gender X. Then, using equation (1.1), we have

f(1) = P(X = 1) = f(1, y₁) + ... + f(1, yₙ) = Σ_y f(1, y)

f(0) = P(X = 0) = f(0, y₁) + ... + f(0, yₙ) = Σ_y f(0, y)

Written in another way, a female or a male student can have any of the GPA values y₁, ..., yₙ, and therefore

P(female) = P(female and Y = y₁) + ... + P(female and Y = yₙ)

P(male) = P(male and Y = y₁) + ... + P(male and Y = yₙ)
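A compact numerical version of this calculation (an added sketch; the joint probabilities below are made up for illustration) stores the joint pdf as a dictionary and recovers the marginals with equation (1.1):

```python
# Hypothetical joint pdf of gender X (1 = female, 0 = male) and GPA category Y.
joint = {(1, 2.0): 0.05, (1, 3.0): 0.25, (1, 4.0): 0.20,
         (0, 2.0): 0.10, (0, 3.0): 0.25, (0, 4.0): 0.15}

assert abs(sum(joint.values()) - 1.0) < 1e-9    # a joint pdf must sum to 1

def marginal_x(x):
    """f(x) = sum over y of f(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """f(y) = sum over x of f(x, y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

print(round(marginal_x(1), 10))    # P(female) = 0.5
print(round(marginal_y(3.0), 10))  # P(Y = 3.0) = 0.5
```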

Exercise 10 Consider the function:

f(x, y) = 2 − x − y for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, and f(x, y) = 0 otherwise

(a) Show that

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dy dx = 1

In other words, prove that f(x, y) is a probability density function.

(b) Find the marginal pdfs f(x) and f(y).

1.3.2 Conditional pdf

In econometrics, we are often interested in the behavior of one variable, conditional on

given values of another variable. For example, I might want to study how well female

students are doing in statistics courses, compared with male students. This is the study of the pdf of grades, given that the student is female, and the pdf of grades, given that the student is male. As another example, one might want to study the distribution of wages, given that the level of education is "college", vs. the distribution of wages given that the education level is "MBA". Again, this is the study of the pdf of wages, conditional on education level.

²The reason why I bother you with a detailed discussion of joint densities will become clear when we talk about independence of random variables - a key concept in sampling theory and regression analysis.


Definition 8 The conditional probability density function of Y given X is given by

f(y|x) = f(x, y) / f(x)   (1.3)

In cases when f(x) = 0, we define f(y|x) = 0.

The left hand side of equation (1.3) is sometimes written as f(y|X = x), to emphasize that this is the conditional density of Y given that X is fixed at a given value x. The vertical bar with x following it, |x, means that we are conditioning on x (or it is given that X = x). For example, the conditional pdf of wages (Y), given that education is at level 3 (X = 3), is written

f(y|X = 3) = f(3, y) / f(X = 3)

or in short

f(y|3) = f(3, y) / f(3)

One can think of whatever is written after the bar as information. If two variables X and Y are related, then information on one of them should restrict the possible values of the other. For example, consider the random experiment of tossing a die, and let X be the result of the toss. Thus, the possible values of X (the support of X) are {1, 2, 3, 4, 5, 6}. The pdf of X is f(x) = 1/6 for x = 1, ..., 6. Now suppose that we provide the information that the result of the toss is an even number. We can provide this information by defining a random variable Y = 1 if X is even and Y = 0 if X is odd. What is the conditional pdf of X given that Y = 1? We know that conditional on Y = 1, X can only attain the values {2, 4, 6}. Intuitively, these values can be attained with equal probabilities, so P(X = 2|Y = 1) = P(X = 4|Y = 1) = P(X = 6|Y = 1) = 1/3. This pdf can be obtained using the definition of conditional density in equation (1.3), and the fact that even numbers in a toss of a die occur with probability 1/2:

f(2|Y = 1) = f(1, 2) / P(Y = 1) = (1/6) / (1/2) = 1/3

f(4|Y = 1) = f(1, 4) / P(Y = 1) = (1/6) / (1/2) = 1/3

f(6|Y = 1) = f(1, 6) / P(Y = 1) = (1/6) / (1/2) = 1/3
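The same computation can be scripted. The sketch below (an addition, using exact fractions) builds the joint pdf of (Y, X) for the die example and recovers the conditional pdf from equation (1.3):

```python
# Joint pdf of (Y, X), where X is the die result and Y = 1 if X is even.
from fractions import Fraction

joint = {(1 if x % 2 == 0 else 0, x): Fraction(1, 6) for x in range(1, 7)}

def f_y(y):
    """Marginal pdf of Y."""
    return sum(p for (yi, _), p in joint.items() if yi == y)

def f_x_given_y(x, y):
    """f(x | y) = f(y, x) / f(y)."""
    return joint.get((y, x), Fraction(0)) / f_y(y)

print(f_y(1))             # 1/2: the toss is even half of the time
print(f_x_given_y(2, 1))  # 1/3
print(f_x_given_y(3, 1))  # 0: an odd result is impossible given Y = 1
```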

1.3.3 Independence of random variables

What if I provide information which is not useful? Intuitively, the distribution of a random variable with no information at all should be the same as the conditional distribution given some useless information. For example, suppose that I toss a die, and let X be the random variable which is equal to the result. Suppose that at the same time, my friend flips a coin, and let Y = 1 if heads and Y = 0 if tails. What is the conditional pdf of X given that Y = 1? The information about the result of the coin flip does not reveal anything about the likely outcomes of the die toss. In this case we say that X and Y are independent. Intuitively, when information is useless, we expect that f(x|y) = f(x). Thus, from the definition (1.3) it follows that when X and Y are independent, we have:

f(x|y) = f(x, y) / f(y) = f(x) ⇒ f(x, y) = f(x) f(y)

Definition 9 Two random variables are statistically independent if and only if

f(x, y) = f(x) f(y)

that is, the joint pdf can be written as the product of the marginal pdfs.

In general though, if X and Y are dependent, the definition in (1.3) says that the joint pdf can be factored in two ways:

f(y|x) = f(x, y) / f(x) ⇒ f(x, y) = f(x) f(y|x)

f(x|y) = f(x, y) / f(y) ⇒ f(x, y) = f(y) f(x|y)

In other words, the joint pdf is the product of a conditional pdf and a marginal pdf. Comparing the two gives the relationship between the two conditional densities, known as the Bayes rule for pdfs:

f(y|x) = f(y) f(x|y) / f(x)

The concept of independence is crucial in econometrics. For starters, a random sample

must be such that the units of observation are independent of each other.
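The factorization in Definition 9 is easy to check numerically. The sketch below (an added example) uses two independent fair coin flips, for which the joint pdf equals the product of the marginals at every point:

```python
# Independence check: does f(x, y) = f(x) f(y) hold for all (x, y)?
from itertools import product
from fractions import Fraction

support = [0, 1]                                   # 0 = tails, 1 = heads
joint = {(x, y): Fraction(1, 4) for x, y in product(support, repeat=2)}

f_x = {x: sum(joint[(x, y)] for y in support) for x in support}
f_y = {y: sum(joint[(x, y)] for x in support) for y in support}

independent = all(joint[(x, y)] == f_x[x] * f_y[y]
                  for x, y in product(support, repeat=2))
print(independent)   # True
```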

Exercise 11 Consider the pdf

f(x, y) = 2 − x − y for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, and f(x, y) = 0 otherwise

Check whether X and Y are statistically independent.

1.4 Moments

Often the distribution of a random variable can be summarized in terms of a few of its characteristics, known as the moments of the distribution. The most widely used moments are the mean (or expected value) and the variance. In characterizing a random vector, in addition to the mean and variance, we also look at the covariance and correlation.

Page 14: Probability Review - online.sfsu.eduonline.sfsu.edu/mbar/ECON851_files/Probability Review.pdf · probability of heads is 1 2 and the probability of tails is also 1 2". Similarly,

10 CHAPTER 1. APPENDIX 1: PROBABILITY REVIEW

1.4.1 Expected value

Intuitively, the expected value (or mean) of a random variable is an average of all the values that the random variable can attain. In particular, the values of the random variable should be weighted by their probabilities.

Definition 10 The expected value (or mean) of a random variable X is

[If X is discrete]: E(X) = Σ_x x f(x)

[If X is continuous]: E(X) = ∫_{−∞}^{∞} x f(x) dx

In the discrete case, the sum is over all the possible values of X, and each x is weighted by the discrete pdf. No differently, in the continuous case, the integral is over all the possible values of X, weighted by the continuous pdf.

As an example of the discrete case, consider X to be the result of a die toss, with pdf f(x) = 1/6 for all x ∈ {1, 2, 3, 4, 5, 6}. This is called the discrete uniform distribution. The expected value of X is:

E(X) = Σ_x x f(x) = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5

As an example of the continuous case, consider X ∼ U[a, b] with pdf f(x) = 1/(b − a) (and f(x) = 0 outside of the interval [a, b]). This is the continuous uniform distribution. The expected value of X is:

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_a^b x · 1/(b − a) dx = 1/(b − a) · (x²/2) |_a^b = 1/(b − a) · (b² − a²)/2 = 1/(b − a) · (b − a)(b + a)/2 = (a + b)/2

This result is not surprising, when you look at the graph of the continuous uniform density and notice that the mean is simply the middle of the support [a, b].

It is important to realize that the expectation of a random variable is always a number. While the toss of a die is a random variable which attains 6 different values, its expectation (mean) is a single number, 3.5. Next, we list some general properties of expected values, i.e. properties that apply to discrete and continuous random variables.

Rules of Expected Values

1. The expected value of a constant is the constant itself. Thus, if b is a constant, then

E(b) = b

2. Constants factor out. If a is a constant number, then

E(aX) = a E(X)

3. The expected value of a sum is the sum of expected values. That is, for any random variables X and Y,

E(X + Y) = E(X) + E(Y)

Together with rule 2, this generalizes to any linear combination of random variables. Let X₁, ..., Xₙ be random variables, and let a₁, ..., aₙ be numbers. Then,

E(a₁X₁ + ... + aₙXₙ) = a₁E(X₁) + ... + aₙE(Xₙ)

4. If X and Y are independent, then

E(XY) = E(X) E(Y)

We will prove the 4th rule, and leave the 2nd and 3rd as an exercise.

Proof. (4th rule of expected values). The product XY is another random variable, since any function of random variables is another random variable. Thus, if X and Y are continuous, the expected value of the product is

E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f(x, y) dy dx

By Definition 9,

E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f(x) f(y) dy dx
= ∫_{−∞}^{∞} x f(x) [∫_{−∞}^{∞} y f(y) dy] dx
= ∫_{−∞}^{∞} x f(x) E(Y) dx
= E(Y) ∫_{−∞}^{∞} x f(x) dx
= E(X) E(Y)

The proof is exactly the same for discrete X and Y, with summation instead of integration.

Exercise 12 (Proof of rules 2 and 3 of expected values). Let X and Y be random variables, and let a and b be numbers.

(a) Prove that E(aX) = a E(X) when X is discrete.

(b) Prove that E(aX) = a E(X) when X is continuous.

(c) Prove that E(X + Y) = E(X) + E(Y) when X and Y are discrete.

(d) Prove that E(X + Y) = E(X) + E(Y) when X and Y are continuous.
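Rules 3 and 4 can also be illustrated by simulation (an added sketch, not a proof): sample averages approximate expected values, so for a die toss X and an independent Y ∼ U[0, 1] the sample mean of X + Y should be near 4.0 and that of XY near 1.75.

```python
# Simulation check of E(X + Y) = E(X) + E(Y) and E(XY) = E(X)E(Y).
import random
from statistics import mean

random.seed(1)
n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]    # X: die toss, E(X) = 3.5
ys = [random.uniform(0, 1) for _ in range(n)]    # Y ~ U[0, 1], E(Y) = 0.5

print(round(mean(x + y for x, y in zip(xs, ys)), 2))   # about 4.0
print(round(mean(x * y for x, y in zip(xs, ys)), 2))   # about 1.75
```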


1.4.2 Variance and standard deviation

While the expected value tells us about some average of the distribution, the variance and standard deviation measure the dispersion of the values around the mean. Imagine one uniform random variable with values {9, 10, 11} and another with {0, 10, 20}. They both have the same mean of 10, but in the first the values are concentrated closer to the mean (smaller variance).

Definition 11 Let X be a random variable with mean E(X) = μ_X. The variance of X is

var(X) = σ²_X = E[(X − μ_X)²]

Definition 12 The positive square root of the variance, sd(X) = σ_X = √var(X), is the standard deviation of X.

Thus, the variance is the expected value of the squared deviation of the random variable from its mean. We already know the definitions of the expected value of discrete and continuous random variables, and therefore we can write the definition of variance as follows:

[If X is discrete]: var(X) = σ²_X = Σ_x (x − μ_X)² f(x)

[If X is continuous]: var(X) = σ²_X = ∫_{−∞}^{∞} (x − μ_X)² f(x) dx

As an example of the discrete case, let's compute the variance of a toss of a die, X, with pdf f(x) = 1/6 for all x ∈ {1, 2, 3, 4, 5, 6}. We already computed the mean μ_X = 3.5. Thus, the variance is:

var(X) = (1/6)[(1 − 3.5)² + (2 − 3.5)² + (3 − 3.5)² + (4 − 3.5)² + (5 − 3.5)² + (6 − 3.5)²] = 2.9167

As an example of the continuous case, consider X ∼ U[a, b] with pdf f(x) = 1/(b − a) (and f(x) = 0 outside of the interval [a, b]). This is the continuous uniform distribution. We already found that the mean is μ_X = (a + b)/2, and therefore the variance of X is:

var(X) = ∫_a^b (x − (a + b)/2)² · 1/(b − a) dx
= 1/(b − a) · ∫_a^b (x² − (a + b)x + ((a + b)/2)²) dx
= 1/(b − a) · [x³/3 − (a + b)x²/2 + ((a + b)/2)² x] |_a^b
= 1/(b − a) · [(b³ − a³)/3 − (a + b)(b² − a²)/2 + ((a + b)/2)²(b − a)]

Using the rule b³ − a³ = (b − a)(a² + ab + b²), we get

var(X) = (a² + ab + b²)/3 − (a + b)²/2 + ((a + b)/2)²
= (a² + ab + b²)/3 − (a + b)²/4
= [4(a² + ab + b²) − 3(a² + 2ab + b²)]/12
= (a² − 2ab + b²)/12
= (b − a)²/12

The derivation was pretty messy, but the result makes intuitive sense. Notice that the variance of X ∼ U[a, b] depends positively on the length of the interval, b − a.
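A quick simulation (an added check, with arbitrary choices of a and b) confirms the formula:

```python
# The sample variance of U[a, b] draws should be close to (b - a)**2 / 12.
import random
from statistics import pvariance

random.seed(2)
a, b = 2.0, 10.0
draws = [random.uniform(a, b) for _ in range(200_000)]
print(round(pvariance(draws), 2))      # close to 5.33
print(round((b - a) ** 2 / 12, 2))     # 5.33
```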

Exercise 13 Consider the continuous uniform random variable X ∼ U[a, b].

(a) If a = 0 and b = 1, find the pdf of X, its expected value and variance.

(b) If a = 0 and b = 2, find the pdf of X, its expected value and variance.

For computational convenience, the definition of variance can be manipulated to create an equivalent, but easier to use, formula.

var(X) = E[(X − μ_X)²]
= E[X² − 2μ_X X + μ²_X]
= E(X²) − 2μ_X E(X) + μ²_X
= E(X²) − 2μ²_X + μ²_X
= E(X²) − μ²_X

Thus, you will often use the formula:

var(X) = E(X²) − [E(X)]²   (1.4)

Having found the expected value, the last term is just its square. So we only need to compute the mean of X². As an example of the discrete case, consider again the toss of a die, X, with pdf f(x) = 1/6 for all x ∈ {1, 2, 3, 4, 5, 6}. We found that E(X) = 3.5, so [E(X)]² = 3.5² = 12.25. The first term is

E(X²) = Σ_x x² f(x) = (1/6)(1² + 2² + 3² + 4² + 5² + 6²) = 15.1667

Thus,

var(X) = E(X²) − [E(X)]² = 15.1667 − 12.25 = 2.9167
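Both routes can be verified exactly with a few lines of Python (an added sketch using exact fractions):

```python
# var(X) for a die toss: definition vs. formula (1.4), in exact arithmetic.
from fractions import Fraction

support = range(1, 7)
f = Fraction(1, 6)                                  # f(x) = 1/6 for each x

mu = sum(x * f for x in support)                    # E(X) = 7/2
var_def = sum((x - mu) ** 2 * f for x in support)   # E[(X - mu)**2]
ex2 = sum(x ** 2 * f for x in support)              # E(X**2) = 91/6
var_short = ex2 - mu ** 2                           # E(X**2) - E(X)**2

print(mu, var_def, var_short)   # 7/2 35/12 35/12 (i.e. 3.5 and 2.9167)
```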


As an example of the continuous case, consider X ∼ U[a, b] with pdf f(x) = 1/(b − a) (and f(x) = 0 outside of the interval [a, b]). We already found that E(X) = (a + b)/2, so [E(X)]² = ((a + b)/2)² = (a + b)²/4. The first term is:

E(X²) = ∫_a^b x² · 1/(b − a) dx = 1/(b − a) · (x³/3) |_a^b = 1/(b − a) · (b³ − a³)/3 = (a² + ab + b²)/3

Thus, the variance is

var(X) = E(X²) − [E(X)]² = (a² + ab + b²)/3 − (a + b)²/4 = (b − a)²/12

Notice that the expressions become less messy with the variance formula in (1.4).

Rules of Variance

1. The variance of a constant is zero. Thus, if b is a constant, then

var(b) = 0

2. If a is a constant number, then

var(aX) = a² var(X)

Thus, for example, if you multiply a random variable by 2, its variance increases by a factor of 4.

3. Adding a constant to a random variable does not change its variance. Thus, if b is a constant number, then

var(X + b) = var(X)

Adding a constant to a random variable just shifts all the values and the mean by that number, but does not change their dispersion around the mean.

4. The variance of the sum (difference) is NOT the sum (difference) of the variances. For random variables X and Y, we have

var(X + Y) = var(X) + var(Y) + 2cov(X, Y)

var(X − Y) = var(X) + var(Y) − 2cov(X, Y)

where cov(X, Y) is the covariance between X and Y, to be defined in the next section.


The first 3 rules can be easily proved directly from the definition of variance, var(X) = E[(X − μ_X)²]. We'll get back to the 4th rule in the next section, after we discuss covariance.

Proof. (Rules 1, 2, 3 of variance).

(1) If X = b, then from the rules of expected values E(X) = b. Thus

var(b) = E[(b − b)²] = 0

(2) From the rules of expected values, E(aX) = a E(X). Thus,

var(aX) = E[(aX − E(aX))²] = E[(aX − aE(X))²] = E[a²(X − μ_X)²] = a² E[(X − μ_X)²] = a² var(X)

(3) From the rules of expected values, E(X + b) = E(X) + b = μ_X + b. Thus,

var(X + b) = E[(X + b − (μ_X + b))²] = E[(X − μ_X)²] = var(X)

Rules of Standard Deviation

Usually, textbooks do not even bother presenting the rules of standard deviations, because they follow directly from the rules of variances. For example, taking square roots of the first 3 rules of variances gives the following rules of standard deviations:

[1]: √var(b) = √0 ⇒ sd(b) = 0

[2]: √var(aX) = √(a² var(X)) ⇒ sd(aX) = |a| · sd(X)

[3]: √var(X + b) = √var(X) ⇒ sd(X + b) = sd(X)

Notice that rule 2 implies that if we double a random variable, we also double its standard deviation.

Besides the importance of variance and standard deviation in statistics (we will be using them everywhere in this course), they are extremely important in finance. Any financial asset (or a portfolio of assets) is a random variable, which has an expected return (mean return) and risk (standard deviation of returns). A great deal of modern portfolio theory deals with finding portfolios which maximize expected return for any given level of risk, or alternatively, minimize the level of risk for any given expected return. These are called efficient portfolios. If you take the Finance 350 course (which is highly recommended), these notes will be useful as well.


Exercise 14 Let X be a random variable with mean μ_X and variance σ²_X. Let Z be the following transformation of X:

Z = (X − μ_X) / σ_X

Thus Z is obtained by subtracting the mean and dividing by the standard deviation (this operation is called standardization). Show that Z has mean zero and standard deviation 1 (i.e. μ_Z = 0 and σ_Z = 1). Such a random variable is called standard.

Exercise 15 Can the variance of any random variable be negative?

Exercise 16 Can the standard deviation of any random variable be negative?

Exercise 17 Let X be a random variable with mean μ_X and variance σ²_X. What is the standard deviation of Y = 2X?

Exercise 18 Let X be a random variable with mean μ_X and variance σ²_X. What is the standard deviation of Y = −2X?

1.4.3 Covariance and correlation

When analyzing the joint behavior of two random variables, we often start by asking if they

are correlated. For example, we would like to know if education level and wages are positively

correlated (otherwise, what are we doing here?). As another example, in macroeconomics we

often want to know if some variable (say unemployment) is correlated with real GDP or not. What this means in plain language is that we want to investigate whether two variables are moving around their means in the same direction, in opposite directions, or whether there is no connection between their movements.

Definition 13 Let X and Y be two random variables with means μ_X and μ_Y respectively. The covariance between them is

cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]

Notice that if X and Y are mostly moving together around their means, then when X is above μ_X, Y also tends to be above μ_Y, and when X is below μ_X, Y also tends to be below μ_Y. In this case the terms (X − μ_X) and (Y − μ_Y) are either both positive or both negative, and cov(X, Y) > 0. We then say that X and Y are positively correlated. On the other hand, suppose that X and Y are for the most part moving in opposite directions. Then when X is above μ_X, Y tends to be below μ_Y, and when X is below μ_X, Y tends to be above μ_Y. In such a case the terms (X − μ_X) and (Y − μ_Y) will have opposite signs and cov(X, Y) < 0. We then say that X and Y are negatively correlated. We will see later that when X and Y are independent, then cov(X, Y) = 0, and we say that X and Y are uncorrelated.

Notice that, just as with the definition of variance, the definition of covariance involves computing an expected value. In order to compute these expected values, we once again have two formulas, for discrete and for continuous random variables.

[If X and Y are discrete]: cov(X, Y) = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x, y)

[If X and Y are continuous]: cov(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μ_X)(y − μ_Y) f(x, y) dy dx

Also, as we did with variance, we can manipulate the definition of covariance to derive a more convenient formula for computation.

cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
= E[XY − μ_Y X − μ_X Y + μ_X μ_Y]
= E(XY) − μ_Y E(X) − μ_X E(Y) + μ_X μ_Y
= E(XY) − μ_Y μ_X − μ_X μ_Y + μ_X μ_Y
= E(XY) − μ_X μ_Y

Thus, you will often use the formula

cov(X, Y) = E(XY) − E(X) E(Y)   (1.5)
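The two routes to the covariance can be compared on a small made-up discrete joint pdf (an added sketch; the probabilities are invented):

```python
# Covariance via the definition and via formula (1.5).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}   # hypothetical

mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())

cov_def = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
cov_short = sum(x * y * p for (x, y), p in joint.items()) - mu_x * mu_y

print(round(cov_def, 4), round(cov_short, 4))   # 0.1 0.1 (positive comovement)
```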

Rules of Covariance

1. The covariance of a random variable X with itself is the variance of X:

cov(X, X) = var(X)

Notice that we could have discussed covariance before variance, and then defined variance as the covariance of a variable with itself. In fact, this way we can prove the rules of variances in a more elegant and easy way. Our textbook does exactly that, so in these notes I present moments in the usual order (mean → variance → covariance).

2. The covariance between a random variable and a constant is zero. Thus, if X is a random variable, and b is a constant, then

cov(X, b) = 0

3. Constants factor out as a product. Let X and Y be random variables and a and b be constant numbers. Then,

cov(aX, bY) = ab · cov(X, Y)

4. Distributive property. Let X, Y and Z be random variables. Then,

cov(X, Y + Z) = cov(X, Y) + cov(X, Z)

5. Adding constants to random variables does not change the covariance between them. Let X and Y be random variables and a and b be constant numbers. Then,

cov(X + a, Y + b) = cov(X, Y)


6. The covariance between two independent variables is 0. That is, suppose that the random variables X and Y are independent. Then,

cov(X, Y) = 0

or, equivalently,

E(XY) = E(X) E(Y)

Recall that variables with zero covariance are called uncorrelated. This rule says that independence implies lack of correlation.

Proof. For all the proofs of the covariance rules it is easier to use the formula in equation (1.5), that is, cov(X, Y) = E(XY) − E(X) E(Y).

(Rule 1). This just follows from comparing the definition of covariance (1.5), when Y = X, with the definition of variance (1.4).

(Rule 2). It should be intuitive that the covariance of any random variable with a constant is zero, because covariance measures the comovement of the two around their means, and a constant simply does not move. Formally,

cov(X, b) = E(bX) − E(X) E(b) = bE(X) − bE(X) = 0

(Rule 3). Constants factor out because they do so in expected values. Formally,

cov(aX, bY) = E(aX · bY) − E(aX) E(bY)
= ab E(XY) − ab E(X) E(Y)
= ab [E(XY) − E(X) E(Y)]
= ab · cov(X, Y)

(Rule 4). Distributive property.

cov(X, Y + Z) = E(X · (Y + Z)) − E(X) E(Y + Z)
= E(XY + XZ) − E(X)[E(Y) + E(Z)]
= E(XY) + E(XZ) − E(X) E(Y) − E(X) E(Z)
= [E(XY) − E(X) E(Y)] + [E(XZ) − E(X) E(Z)]
= cov(X, Y) + cov(X, Z)

(Rule 5). Adding constants does not affect the covariance.

cov(X + a, Y + b) = E[(X + a)(Y + b)] − E(X + a) E(Y + b)
= E[XY + bX + aY + ab] − [E(X) + a][E(Y) + b]
= [E(XY) + bE(X) + aE(Y) + ab] − [E(X) E(Y) + bE(X) + aE(Y) + ab]
= E(XY) − E(X) E(Y)
= cov(X, Y)


(Rule 6). Covariance between two independent variables is 0. This is exactly rule 4 of

expected values, which we already proved and will not repeat here.

Now we are in a position to prove rule 4 of variance, which is a formula for the variance of sums:

var(X + Y) = var(X) + var(Y) + 2cov(X, Y)

Proof. (Rule 4 of variance). Recall that variance is the covariance between the variable and itself. Therefore,

var(X + Y) = cov(X + Y, X + Y)

By the distributive property of covariance (rule 4), we have

cov(X + Y, X + Y) = cov(X, X + Y) + cov(Y, X + Y)
= cov(X, X) + cov(X, Y) + cov(Y, X) + cov(Y, Y)
= var(X) + var(Y) + 2cov(X, Y)

Exercise 19 Prove that for any two random variables X and Y, we have

var(X − Y) = var(X) + var(Y) − 2cov(X, Y)

Exercise 20 Prove that for two independent random variables X and Y, the variance of the difference is the sum of the variances:

var(X − Y) = var(X) + var(Y)

Recall that covariance measures the degree of comovement of two random variables around their means. The main disadvantage of this measure is that it is NOT unit-free. Suppose researcher A studies the comovement of years of schooling, S, and annual wages measured in thousands of dollars, W. He finds a positive covariance:

cov(S, W) = 7

Researcher B has the same goal and the same data on wages and schooling, but he chooses to measure wages in dollars (instead of thousands of dollars). Thus, he finds

cov(S, 1000 · W) = 1000 cov(S, W) = 7000

Should researcher B conclude that wages and years of schooling have a stronger comovement than what was previously found by researcher A? NO; if researchers A and B use the same data, we want them to reach the same conclusion about the degree of comovement between wages and years of schooling. In order to make the results comparable, researchers report the correlation instead of the covariance.


Definition 14 The correlation between random variables X and Y is

corr(X, Y) = ρ_XY = cov(X, Y) / (σ_X σ_Y)

We define corr(X, Y) = 0 whenever σ_X = 0, or σ_Y = 0, or both. In other words, whenever the covariance between two random variables is zero, we define the correlation to be zero as well, and say that the two variables are uncorrelated.

Thus, the correlation is the covariance divided by the product of the standard deviations. How does this resolve the problem of covariance depending on the units of the random variables? Let X and Y be random variables, and let a and b be positive numbers. We know that cov(aX, bY) = ab · cov(X, Y), so if a and b represent different units used by different researchers, we will have different results. Correlation, however, is unit free, and does not change when we scale the random variables by positive numbers.
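The following sketch (added here; the schooling/wage data are simulated, and statistics.covariance / statistics.correlation require Python 3.10+) illustrates the point: rescaling wages from thousands of dollars to dollars multiplies the covariance by 1000 but leaves the correlation unchanged.

```python
# Covariance depends on units; correlation does not.
import random
from statistics import correlation, covariance

random.seed(3)
schooling = [random.randint(8, 20) for _ in range(1_000)]
wages_thousands = [2 * s + random.gauss(0, 5) for s in schooling]
wages_dollars = [1000 * w for w in wages_thousands]

print(round(covariance(schooling, wages_thousands), 1))
print(round(covariance(schooling, wages_dollars), 1))     # about 1000x larger
print(round(correlation(schooling, wages_thousands), 3))
print(round(correlation(schooling, wages_dollars), 3))    # identical
```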

Properties of Correlation

1. For any two random variables X and Y, the sign of the correlation between them is the same as the sign of the covariance between them:

sign(corr(X, Y)) = sign(cov(X, Y))

In other words, correlation gives us the same information as covariance about the qualitative nature of the comovement between X and Y around their means. If cov(X, Y) > 0, meaning that X and Y tend to be above and below their respective means at the same time, then we will also have corr(X, Y) > 0.

2. Correlation is unit free. Let X and Y be random variables, and let a and b be numbers, both of the same sign. Then,

corr(aX, bY) = corr(X, Y)

3. The correlation between any two random variables can only be between −1 and 1. Let X and Y be random variables. Then,

−1 ≤ corr(X, Y) ≤ 1

Proof. (Property 1). Since the standard deviation of any random variable is always positive (the case of a zero standard deviation is covered by the convention in Definition 14), dividing by the standard deviations does not change the sign.

(Property 2). Let X and Y be random variables, and let a and b be numbers with the same sign.

corr(aX, bY) = cov(aX, bY) / (sd(aX) · sd(bY))
= ab · cov(X, Y) / (|a| · sd(X) · |b| · sd(Y))
= (ab / (|a| · |b|)) · corr(X, Y)
= corr(X, Y)

where the last step uses the fact that a and b have the same sign, so ab / (|a| · |b|) = 1.


We do not provide the proof of property 3, since it involves knowledge of inner products and the Cauchy–Schwarz inequality. Instead, we discuss this property at an intuitive level. The highest possible correlation that can be achieved is when Y = X. Then the two random variables X and Y are always guaranteed to be above or below their means at the same time. The lowest possible correlation is achieved when Y = −X. In this case, the two variables always deviate from their means in opposite directions. Thus, we will prove that corr(X, X) = 1 and corr(X, −X) = −1, and these will be the maximal and minimal correlations, so that in general, for any X and Y we have −1 ≤ corr(X, Y) ≤ 1.

corr(X, X) = cov(X, X) / (σ_X σ_X) = var(X) / var(X) = 1

corr(X, −X) = cov(X, −X) / (σ_X σ_{−X}) = −var(X) / var(X) = −1

In the last equation we used rule 3 of covariance, that of constants factoring out, with the constant in this case being −1 (together with the fact that σ_{−X} = σ_X).

Exercise 21 Prove that if two random variables, X and Y, are independent, then they are uncorrelated, i.e. corr(X, Y) = 0.

The opposite statement is not true. Two random variables can be uncorrelated, but not

independent. The next exercise gives such an example.

Exercise 22 Let X be a random variable with pdf f(x) = 1/4 for x ∈ {1, 2, −1, −2}. Let Y = X².

(a) Show that X and Y are not correlated, i.e. show that corr(X, Y) = 0.

(b) Show that X and Y are not independent. (Hint: compare the conditional pdf f(y|x) with f(y). If X and Y are independent, then we must have f(y|x) = f(y).)

Exercise 23 Let X be a random variable, and let a, b be some numbers. Let Y = a + bX.

(a) Prove that if b > 0, then corr(X, Y) = 1.

(b) Prove that if b < 0, then corr(X, Y) = −1.

(c) Prove that if b = 0, then corr(X, Y) = 0.

