Module 5: Random vectors
Ruben Zamar, Department of Statistics, UBC
February 9, 2016
RANDOM VECTORS
USED TO DESCRIBE QUANTITATIVE FEATURES OF A RANDOM OUTCOME

INTEREST CENTRES ON THE RELATIONSHIP AMONG THE FEATURES (JOINT BEHAVIOR)
EXAMPLE: ROLLING TWO DICE
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} \text{SUM OF POINTS} \\ \text{DIFFERENCE OF POINTS} \end{pmatrix}$$
NOTATION
RV’S ARE DENOTED BY BOLD UPPERCASE LETTERS SUCH AS
X,Y,Z,U,V
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}$$

EACH ENTRY, $X_i$, IN $\mathbf{X}$ IS A RANDOM VARIABLE
TYPES OF RANDOM VECTORS
WE CONSIDER THE “DISCRETE” AND “CONTINUOUS” CASES
DISCRETE: ALL THE ENTRIES ARE DISCRETE RANDOM VARIABLES

CONTINUOUS: ALL THE ENTRIES ARE CONTINUOUS RANDOM VARIABLES
JOINT PROBABILITY MASS FUNCTION (JOINT pmf)
$$f(x_1, x_2, \ldots, x_m) = P(X_1 = x_1, X_2 = x_2, \ldots, X_m = x_m)$$
EXAMPLE: ROLLING TWO DICE
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} \text{SUM OF POINTS} \\ \text{ABSOLUTE DIFFERENCE OF POINTS} \end{pmatrix}$$
X2 \ X1 |   2     3     4     5     6     7     8     9    10    11    12
   0    | 1/36        1/36        1/36        1/36        1/36        1/36
   1    |       1/18        1/18        1/18        1/18        1/18
   2    |             1/18        1/18        1/18        1/18
   3    |                   1/18        1/18        1/18
   4    |                         1/18        1/18
   5    |                               1/18

(blank cells are zero probabilities)
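A quick enumeration of the 36 equally likely outcomes reproduces this table; a minimal Python sketch (the names are our own, not from the slides):

```python
# Enumerate two fair dice and tabulate the joint pmf of (sum, |difference|).
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)  # (x1, x2) = (sum, absolute difference) -> probability
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

print(joint[(7, 1)])  # 1/18, as in the table
print(joint[(2, 0)])  # 1/36
```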
MARGINAL DENSITIES (pmf’s)
Take m = 2 for simplicity.
Let f(x1, x2) be a discrete joint pmf.
$$f_1(x_1) = \sum_{x_2} f(x_1, x_2)$$

$$f_2(x_2) = \sum_{x_1} f(x_1, x_2)$$
The other variable is summed out (the other variables when m > 2).
EXAMPLE: ROLLING TWO DICE (continued)
X2 \ X1 |   2     3     4     5     6     7     8     9    10    11    12   |  f2
   0    | 1/36        1/36        1/36        1/36        1/36        1/36  |  6/36
   1    |       1/18        1/18        1/18        1/18        1/18        | 10/36
   2    |             1/18        1/18        1/18        1/18              |  8/36
   3    |                   1/18        1/18        1/18                    |  6/36
   4    |                         1/18        1/18                          |  4/36
   5    |                               1/18                                |  2/36
   f1   | 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36 | 36/36
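The marginal row and column of this table can be checked by summing the joint pmf over the other variable; a self-contained sketch:

```python
# Marginal pmfs of the sum (f1) and the absolute difference (f2).
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

f1 = defaultdict(Fraction)  # pmf of X1, the sum
f2 = defaultdict(Fraction)  # pmf of X2, the absolute difference
for (x1, x2), p in joint.items():
    f1[x1] += p
    f2[x2] += p

print(f1[7], f2[1])  # 1/6 and 5/18, i.e. 6/36 and 10/36
```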
MEAN OF A RANDOM VECTOR
$$E(\mathbf{X}) = E\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_m) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix} = \boldsymbol{\mu}$$
COVARIANCE MATRIX OF A RANDOM VECTOR
$$\mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})'\right] = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2m} \\ \vdots & & & \ddots & \vdots \\ \sigma_{m1} & \sigma_{m2} & \sigma_{m3} & \cdots & \sigma_{mm} \end{pmatrix} = \Sigma$$
COVARIANCE MATRIX (continued)
$$\sigma_{ii} = E\{(X_i - \mu_i)^2\} = \mathrm{Var}(X_i)$$

$$\sigma_{ij} = E\{(X_i - \mu_i)(X_j - \mu_j)\} = \mathrm{cov}(X_i, X_j)$$
COVARIANCE MATRIX
To fix ideas take m = 2.
$$\mathrm{Cov}(\mathbf{X}) = E\begin{pmatrix} (X_1-\mu_1)^2 & (X_1-\mu_1)(X_2-\mu_2) \\ (X_2-\mu_2)(X_1-\mu_1) & (X_2-\mu_2)^2 \end{pmatrix}$$

$$= \begin{pmatrix} E[(X_1-\mu_1)^2] & E[(X_1-\mu_1)(X_2-\mu_2)] \\ E[(X_2-\mu_2)(X_1-\mu_1)] & E[(X_2-\mu_2)^2] \end{pmatrix} = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \Sigma$$
CORRELATION COEFFICIENT
$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} = \frac{\mathrm{cov}(X_i, X_j)}{\mathrm{SD}(X_i)\,\mathrm{SD}(X_j)}$$

IT CAN BE SHOWN THAT

$$-1 \le \rho_{ij} \le 1$$
DISCUSSION
A LINEAR INCREASING RELATION:
X1 AND X2 ARE LIKELY TO BE ABOVE AND BELOW THEIR MEANS TOGETHER

CROSS PRODUCTS (X1 − µ1)(X2 − µ2) ARE LIKELY TO BE POSITIVE

σ12 IS “LARGE” AND POSITIVE

ρ12 IS CLOSE TO 1
INCREASING LINEAR RELATION
[Figure: scatterplot titled “Increasing Linear Relation”, y plotted against x]
SIGN OF THE CROSS PRODUCTS
[Figure: the scatterplot split into quadrants at the means; the cross product (x1 − µ1)(x2 − µ2) is positive in the upper-right and lower-left quadrants and negative in the other two]
INCREASING LINEAR RELATION
ρ close to 1
[Figure: scatterplot titled “Increasing Linear Relation”, data points marked as +]
DECREASING LINEAR RELATION
ρ close to -1
[Figure: scatterplot titled “Decreasing Linear Relation”, data points marked as +]
NO LINEAR RELATION
[Figure: scatterplot titled “No Linear Relation”, y plotted against x]
NO LINEAR RELATION
ρ close to 0
[Figure: scatterplot titled “No Linear Relation”, data points marked as +]
COMPUTING COVARIANCES
$$\sigma_{12} = E\{(X_1-\mu_1)(X_2-\mu_2)\}$$
$$= E\{X_1X_2\} + \mu_1\mu_2 - E\{X_1\}\mu_2 - E\{X_2\}\mu_1$$
$$= E\{X_1X_2\} + \mu_1\mu_2 - \mu_1\mu_2 - \mu_1\mu_2$$
$$= E\{X_1X_2\} - \mu_1\mu_2$$
EXAMPLE: ROLLING TWO DICE (continued)
x1 | f1(x1) | x1·f1(x1)
 2 |  1/36  |   2/36
 3 |  2/36  |   6/36
 4 |  3/36  |  12/36
 5 |  4/36  |  20/36
 6 |  5/36  |  30/36
 7 |  6/36  |  42/36
 8 |  5/36  |  40/36
 9 |  4/36  |  36/36
10 |  3/36  |  30/36
11 |  2/36  |  22/36
12 |  1/36  |  12/36
sum          252/36

HENCE µ1 = 252/36 = 7
EXAMPLE: ROLLING TWO DICE (continued)
x2 | f2(x2) | x2·f2(x2)
 0 |  6/36  |    0
 1 | 10/36  |  10/36
 2 |  8/36  |  16/36
 3 |  6/36  |  18/36
 4 |  4/36  |  16/36
 5 |  2/36  |  10/36
sum            70/36

HENCE µ2 = 70/36 = 1.9444
EXAMPLE: ROLLING TWO DICE (continued)
X2 \ X1 |   2     3     4     5     6     7     8     9    10    11    12   | row sum of x1·x2·f(x1, x2)
   0    | 1/36        1/36        1/36        1/36        1/36        1/36  |    0
   1    |       1/18        1/18        1/18        1/18        1/18        |  35/18
   2    |             1/18        1/18        1/18        1/18              |  56/18
   3    |                   1/18        1/18        1/18                    |  63/18
   4    |                         1/18        1/18                          |  56/18
   5    |                               1/18                                |  35/18

HENCE

$$E\{X_1X_2\} = \frac{35 + 56 + 63 + 56 + 35}{18} = \frac{245}{18} = 13.611$$
EXAMPLE: ROLLING TWO DICE (continued)
FINALLY

$$\sigma_{12} = E\{X_1X_2\} - \mu_1\mu_2 = 13.611 - 7 \times 1.9444 = 0.0002 \approx 0$$

(The exact value is 245/18 − 7 × 35/18 = 0; the 0.0002 is rounding error.)
EXAMPLE: ROLLING TWO DICE (continued)
IN THIS CASE WE HAVE

$$\sigma_{11} = 5.833, \qquad \sigma_{22} = 2.0525$$

HENCE

$$\rho_{12} = \frac{0.0002}{\sqrt{5.833 \times 2.0525}} = 0.00006$$

CONCLUSION: NO LINEAR ASSOCIATION BETWEEN THESE VARIABLES
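A sketch (our own code) that recomputes these moments with exact rational arithmetic shows the covariance, and hence the correlation, is exactly zero; the 0.0002 above is purely rounding:

```python
# Exact covariance of (sum, |difference|) for two fair dice.
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

mu1 = sum(x1 * p for (x1, x2), p in joint.items())       # 7
mu2 = sum(x2 * p for (x1, x2), p in joint.items())       # 35/18
e12 = sum(x1 * x2 * p for (x1, x2), p in joint.items())  # 245/18
print(e12 - mu1 * mu2)  # 0: the sum and the absolute difference are uncorrelated
```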
INDEPENDENT RANDOM VARIABLES
The random variables
$X_1, X_2, \ldots, X_m$

are independent if and only if

$$f(x_1, x_2, \ldots, x_m) = f_1(x_1)\, f_2(x_2) \cdots f_m(x_m)$$
EXAMPLE
x2 \ x1 |  1     2     3   | f(x2)
   1    | 0.12  0.20  0.08 | 0.40
   2    | 0.18  0.30  0.12 | 0.60
 f(x1)  | 0.30  0.50  0.20 | 1.00

Each cell equals the product of its row and column marginals, so X1 and X2 are independent.
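A small sketch (names are our own) that verifies the factorization cell by cell:

```python
# Check f(x1, x2) == f1(x1) * f2(x2) for every cell of the table.
f = {(1, 1): 0.12, (2, 1): 0.20, (3, 1): 0.08,
     (1, 2): 0.18, (2, 2): 0.30, (3, 2): 0.12}  # keys are (x1, x2)

f1 = {x1: sum(p for (a, _), p in f.items() if a == x1) for x1 in (1, 2, 3)}
f2 = {x2: sum(p for (_, b), p in f.items() if b == x2) for x2 in (1, 2)}

print(all(abs(f[(x1, x2)] - f1[x1] * f2[x2]) < 1e-12 for (x1, x2) in f))  # True
```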
DISCUSSION
RESULT: If X and Y are independent then
E (XY ) = E (X )E (Y )
PROOF:
$$E(XY) = \sum_x \sum_y xy\, f(x, y) = \sum_x \sum_y xy\, f_X(x) f_Y(y) = \underbrace{\sum_x x f_X(x)}_{E(X)} \underbrace{\sum_y y f_Y(y)}_{E(Y)} = E(X)E(Y).$$
COVARIANCE AND INDEPENDENCE
RESULT: If X, Y are independent then Cov(X, Y) = σXY = 0.

PROOF:

$$\sigma_{XY} = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0$$
DISCUSSION
In general σXY = 0 does not imply that X ,Y are independent.
For example, let X be such that
P (X = x) = 1/21, for x = −10,−9, ...,−1, 0, 1, ..., 9, 10
Let
$$Y = X^2 + V,$$

where V is independent of X and V ≈ 0, that is,

$$P(V = -1) = P(V = 1) = 0.01 \quad\text{and}\quad P(V = 0) = 0.98.$$
Clearly Y ≈ X² (V is a small perturbation). So, Y and X are highly dependent.
However,

$$\sigma_{XY} = E(XY) - \underbrace{E(X)E(Y)}_{0} = E\left[X\left(X^2 + V\right)\right] = \underbrace{E(X^3)}_{0} + \underbrace{E(X)E(V)}_{0} = 0$$
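The same conclusion can be verified by exact enumeration of the joint distribution of (X, Y); a sketch with all names our own:

```python
# X uniform on {-10, ..., 10}; Y = X^2 + V with V a small independent perturbation.
from fractions import Fraction

vs = {-1: Fraction(1, 100), 0: Fraction(98, 100), 1: Fraction(1, 100)}

exy = ex = ey = Fraction(0)
for x in range(-10, 11):
    for v, pv in vs.items():
        p = Fraction(1, 21) * pv  # X and V are independent
        y = x * x + v
        ex += x * p
        ey += y * p
        exy += x * y * p

print(exy - ex * ey)  # 0: zero covariance, even though Y is (almost) a function of X
```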
CONDITIONAL pmf’s
$$f(x_2|x_1) = P(X_2 = x_2 \mid X_1 = x_1) = \frac{P(X_1 = x_1, X_2 = x_2)}{P(X_1 = x_1)} = \frac{f(x_1, x_2)}{f_1(x_1)}$$
Note that
$$\frac{f(x_1, x_2)}{f_1(x_1)} = f(x_2|x_1) \quad\text{implies that}\quad f(x_1, x_2) = f_1(x_1)\, f(x_2|x_1)$$

Similarly

$$\frac{f(x_1, x_2)}{f_2(x_2)} = f(x_1|x_2) \quad\text{implies that}\quad f(x_1, x_2) = f_2(x_2)\, f(x_1|x_2)$$
If X1 and X2 are independent then
$$f(x_1, x_2) = f_1(x_1)\, f_2(x_2)$$

In this case:

$$f(x_2|x_1) = \frac{f(x_1, x_2)}{f_1(x_1)} = \frac{f_1(x_1) f_2(x_2)}{f_1(x_1)} = f_2(x_2)$$

Similarly

$$f(x_1|x_2) = f_1(x_1)$$
CONDITIONAL MEAN AND VARIANCE
$$\mu_{y|x} = E(Y|X=x) = \sum_y y\, f(y|x) \quad \text{(cond. mean)}$$

$$\sigma^2_{y|x} = \mathrm{Var}(Y|X=x) = \sum_y \left(y - \mu_{y|x}\right)^2 f(y|x) \quad \text{(cond. variance)}$$

$$= \sum_y y^2 f(y|x) - \mu_{y|x}^2 = E\left(Y^2|X=x\right) - \mu_{y|x}^2$$
EXAMPLE: ROLLING TWO DICE (continued)
Recall:
X2 \ X1 |   2     3     4     5     6     7     8     9    10    11    12   |  f2
   0    | 1/36        1/36        1/36        1/36        1/36        1/36  |  6/36
   1    |       1/18        1/18        1/18        1/18        1/18        | 10/36
   2    |             1/18        1/18        1/18        1/18              |  8/36
   3    |                   1/18        1/18        1/18                    |  6/36
   4    |                         1/18        1/18                          |  4/36
   5    |                               1/18                                |  2/36
   f1   | 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36 | 36/36
f ( x2 | x1 )
x2 | f(x2|2) | f(x2|3) | f(x2|4) | f(x2|5) | f(x2|6) | f(x2|7)
 0 |    1    |         |   1/3   |         |   1/5   |
 1 |         |    1    |         |   1/2   |         |   1/3
 2 |         |         |   2/3   |         |   2/5   |
 3 |         |         |         |   1/2   |         |   1/3
 4 |         |         |         |         |   2/5   |    0
 5 |         |         |         |         |         |   1/3

x2 | f(x2|8) | f(x2|9) | f(x2|10) | f(x2|11) | f(x2|12)
 0 |   1/5   |         |   1/3    |          |    1
 1 |         |   1/2   |          |    1     |
 2 |   2/5   |         |   2/3    |          |
 3 |         |   1/2   |          |          |
 4 |   2/5   |         |          |          |
 5 |         |         |          |          |
E( X2 | x1 ) and Var( X2 | x1 )
x1 | E(X2|X1=x1) | Var(X2|X1=x1)
 2 |    0.00     |     0.000
 3 |    1.00     |     0.000
 4 |    1.33     |     0.889
 5 |    2.00     |     1.000
 6 |    2.40     |     2.240
 7 |    3.00     |     2.667
 8 |    2.40     |     2.240
 9 |    2.00     |     1.000
10 |    1.33     |     0.889
11 |    1.00     |     0.000
12 |    0.00     |     0.000
f (x1 | x2 )
         |  2    3    4    5    6    7    8    9   10   11   12
f(x1|0)  | 1/6       1/6       1/6       1/6       1/6       1/6
f(x1|1)  |      1/5       1/5       1/5       1/5       1/5
f(x1|2)  |           1/4       1/4       1/4       1/4
f(x1|3)  |                1/3       1/3       1/3
f(x1|4)  |                     1/2       1/2
f(x1|5)  |                          1
E( X1 | x2 ) and Var( X1 | x2 )
x2 | E(X1|X2=x2) | Var(X1|X2=x2)
 0 |    7.00     |    11.667
 1 |    7.00     |     8.000
 2 |    7.00     |     5.000
 3 |    7.00     |     2.667
 4 |    7.00     |     1.000
 5 |    7.00     |     0.000

For instance

$$\mathrm{Var}(X_1|X_2=0) = \frac{4 + 16 + 36 + 64 + 100 + 144}{6} - 7^2 = 11.667$$
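Both conditional tables can be reproduced mechanically from the joint pmf; a sketch for the E(X1|X2 = x2) and Var(X1|X2 = x2) columns:

```python
# Conditional mean and variance of the sum X1 given the absolute difference X2.
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

f2 = defaultdict(Fraction)
for (_, x2), p in joint.items():
    f2[x2] += p

for x2 in range(6):
    cond = {x1: p / f2[x2] for (x1, b), p in joint.items() if b == x2}
    m = sum(x1 * q for x1, q in cond.items())
    v = sum(x1 ** 2 * q for x1, q in cond.items()) - m ** 2
    print(x2, float(m), float(v))  # reproduces the table: mean 7.00 in every row
```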
SPECIAL CASE: X AND Y ARE INDEPENDENT
If X and Y are independent we have:
a) conditional pmf = marginal pmf
f (y |x) = fY (y) and f (x |y) = fX (x)
b) conditional means and variances = marginal means and variances:
E (Y |X = x) = E (Y ) , Var (Y |X = x) = Var (Y )
E (X |Y = y) = E (X ) , Var (X |Y = y) = Var (X )
SPECIAL CASE: X AND Y ARE INDEPENDENT
c) If X and Y are independent we also have:
$$E(g(X,Y)\,|\,X=x) \overset{\text{always}}{=} E(g(x,Y)\,|\,X=x) \overset{\text{independence}}{=} E(g(x,Y))$$

More precisely:

$$E(g(X,Y)\,|\,X=x) = \sum_y g(x,y)\, f(y|x) = \sum_y g(x,y)\, f_Y(y) = E(g(x,Y))$$

Example: $E(e^{X+Y}\,|\,X=x) = e^x\, E(e^Y)$
CONDITIONAL MEAN AS A FUNCTION
In general E(Y|X = x) is a function of x:

$$E(Y|X=x) = h(x)$$

We can consider the random function h(X). This is denoted as

$$h(X) = E(Y|X)$$
EXAMPLE
[Figure: the conditional mean E(Y|X=x) plotted as a function of x]
TWO-STEP AVERAGE
RESULT: $E\{E(Y|X)\} = E(Y)$

More generally,

$$E\{E(g(X,Y)|X)\} = E(g(X,Y)) = \sum_x \left[\sum_y g(x,y)\, f(y|x)\right] f_X(x)$$
EXPECTED DIFFERENCE IN TWO STEPS
E (X2) computed in two steps:
x1 | E(X2|X1=x1) | f1(x1) | E(X2|X1=x1)·f1(x1)
 2 |    0.00     |  1/36  |    0
 3 |    1.00     |  2/36  |  2/36
 4 |    1.33     |  3/36  |  4/36
 5 |    2.00     |  4/36  |  8/36
 6 |    2.40     |  5/36  | 12/36
 7 |    3.00     |  6/36  | 18/36
 8 |    2.40     |  5/36  | 12/36
 9 |    2.00     |  4/36  |  8/36
10 |    1.33     |  3/36  |  4/36
11 |    1.00     |  2/36  |  2/36
12 |    0.00     |  1/36  |    0
SUM                         1.944

⇒ E(X2) = 1.944
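A sketch verifying the two-step average numerically (the code is ours):

```python
# Tower property check: E{E(X2|X1)} equals E(X2).
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

f1 = defaultdict(Fraction)
for (x1, _), p in joint.items():
    f1[x1] += p

two_step = Fraction(0)
for x1, p1 in f1.items():
    cond_mean = sum(x2 * p / p1 for (a, x2), p in joint.items() if a == x1)
    two_step += cond_mean * p1

direct = sum(x2 * p for (_, x2), p in joint.items())
print(two_step, direct)  # both 35/18 = 1.944...
```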
TWO-STEP AVERAGE PROOF
$$E\{E(Y|X)\} = E(h(X)) = \sum_x h(x)\, f_X(x) = \sum_x E(Y|X=x)\, f_X(x) = \sum_x \left[\sum_y y\, f(y|x)\right] f_X(x)$$

$$= \sum_y y \sum_x f(y|x)\, f_X(x) = \sum_y y \sum_x f(x, y) = \sum_y y\, f_Y(y) = E(Y)$$
TOTAL VARIANCE
Red circles represent the conditional means E(Y|X=x); red-ended whiskers represent the conditional variances Var(Y|X=x).

[Figure: scatterplot of y against x with conditional means and variances marked at each x]

The blue circle represents the overall mean E(Y); the blue whisker represents the overall variance Var(Y).
TOTAL VARIANCE FORMULA
RESULT:
$$\mathrm{Var}(Y) = E\{\mathrm{Var}(Y|X)\} + \mathrm{Var}\{E(Y|X)\}$$

$$= \text{Unexplained Variance} + \text{Explained Variance}$$

Unexpl Var = E{Var(Y|X)}   Expl Var = Var{E(Y|X)}
EXPLAINED VARIANCE
Total Var = Var(Y) = Expl Var + Unexpl Var

$$\text{Percentage of Explained Variance} = \frac{\text{Expl Var}}{\text{Total Var}}\,100\% = \frac{\mathrm{Var}\{E(Y|X)\}}{\mathrm{Var}(Y)}\,100\%$$
PREDICTING THE SUM OF DICE
x2 | E(X1|X2=x2) | Var(X1|X2=x2) | f2(x2)
 0 |    7.00     |    11.667     |  6/36
 1 |    7.00     |     8.000     | 10/36
 2 |    7.00     |     5.000     |  8/36
 3 |    7.00     |     2.667     |  6/36
 4 |    7.00     |     1.000     |  4/36
 5 |    7.00     |     0.000     |  2/36

Expl Var = Var(E(X1|X2)) = 0 (the conditional mean is constant)

Unexpl Var = E(Var(X1|X2)) = 11.667 × 6/36 + ⋯ + 0 × 2/36 = 5.833

Total Var = 0 + 5.833 = 5.833

Percentage of Explained Variance = 0%
PREDICTING THE DIFFERENCE
x1 | E(X2|X1=x1) | Var(X2|X1=x1) | f1(x1)
 2 |    0.00     |     0.000     |  1/36
 3 |    1.00     |     0.000     |  2/36
 4 |    1.33     |     0.889     |  3/36
 5 |    2.00     |     1.000     |  4/36
 6 |    2.40     |     2.240     |  5/36
 7 |    3.00     |     2.667     |  6/36
 8 |    2.40     |     2.240     |  5/36
 9 |    2.00     |     1.000     |  4/36
10 |    1.33     |     0.889     |  3/36
11 |    1.00     |     0.000     |  2/36
12 |    0.00     |     0.000     |  1/36
From the previous table we get

$$\text{Expl Var} = \mathrm{Var}(E(X_2|X_1)) = E\left[E(X_2|X_1)^2\right] - \left(E[E(X_2|X_1)]\right)^2 = 0.6154$$

$$\text{Unexpl Var} = E(\mathrm{Var}(X_2|X_1)) = 1.4370$$

$$\mathrm{Var}(X_2) = 0.6154 + 1.4370 = 2.0525$$

$$\text{Percentage of Explained Variance} = \frac{0.6154}{2.0525} \times 100 \approx 30.0\%$$
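A sketch (our own code) recomputing the decomposition exactly from the joint pmf:

```python
# Var(X2) = E{Var(X2|X1)} + Var{E(X2|X1)}, computed exactly.
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

f1 = defaultdict(Fraction)
for (x1, _), p in joint.items():
    f1[x1] += p

overall_mean = sum(x2 * p for (_, x2), p in joint.items())
expl = unexpl = Fraction(0)
for x1, p1 in f1.items():
    cond = {x2: p / p1 for (a, x2), p in joint.items() if a == x1}
    m = sum(x2 * q for x2, q in cond.items())
    v = sum(x2 ** 2 * q for x2, q in cond.items()) - m ** 2
    expl += (m - overall_mean) ** 2 * p1
    unexpl += v * p1

print(float(expl), float(unexpl), float(expl + unexpl))  # ~0.6154, ~1.4370, ~2.0525
```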
PROOF OF THE TOTAL VAR FORMULA
Proof:

$$\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = E\{E(Y^2|X)\} - [E\{E(Y|X)\}]^2$$

$$= E\{\mathrm{Var}(Y|X) + [E(Y|X)]^2\} - [E\{E(Y|X)\}]^2$$

$$= E\{\mathrm{Var}(Y|X)\} + \underbrace{E\{[E(Y|X)]^2\} - [E\{E(Y|X)\}]^2}_{\mathrm{Var}(E(Y|X))}$$
EXAMPLE
Example: $Y \mid N = n \sim \mathrm{Bin}(n+1, p)$ and $N \sim \mathrm{Poisson}(5)$.

Calculate:

(a) E(Y)

(b) Var(Y). What fraction of this variance is explained by N?
EXAMPLE (Solution part (a))

$$E(Y) = E\{E[Y|N]\} = E\{(N+1)p\} = p\,E\{N+1\} = 6p$$
EXAMPLE (Solution part (b))

Solution:

$$\mathrm{Var}(Y) = E\{\mathrm{Var}(Y|N)\} + \mathrm{Var}\{E(Y|N)\}$$

$$= E\{(N+1)p(1-p)\} + \mathrm{Var}\{(N+1)p\}$$

$$= p(1-p)\,E\{N+1\} + p^2\,\mathrm{Var}(N+1)$$

$$= 6p(1-p) + 5p^2 = p[6 - 6p + 5p] = p(6-p)$$

$$\text{Explained Variance Fraction} = \frac{\mathrm{Var}\{E(Y|N)\}}{\mathrm{Var}(Y)} = \frac{5p^2}{p(6-p)} = \frac{5p}{6-p}$$
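A Monte Carlo sanity check of E(Y) = 6p and Var(Y) = p(6 − p) under the stated model; the value of p, the sample size, and the seed are arbitrary choices of ours:

```python
# Simulate Y | N = n ~ Binomial(n + 1, p) with N ~ Poisson(5).
import numpy as np

rng = np.random.default_rng(0)
p, n_sim = 0.3, 1_000_000
N = rng.poisson(5, size=n_sim)
Y = rng.binomial(N + 1, p)

print(Y.mean(), 6 * p)       # ~1.80
print(Y.var(), p * (6 - p))  # ~1.71
```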
EXAMPLE (Solution part (b), continued)
Explained Variance Fraction As a Function of p
[Figure: the explained variance fraction 5p/(6 − p) plotted against p on [0, 1]; it rises from 0 at p = 0 to 1 at p = 1]
EXAMPLE (Solution part (b), continued)
[Figure: scatterplots of Y against N for p = 0.05 and for p = 0.95]
CONDITIONAL MEAN IS THE “BEST” PREDICTOR

Suppose we know X = x and we wish to predict the corresponding value Y.

Our prediction will be a function g(x).

The mean squared prediction error will be

$$E\left[(g(x) - Y)^2 \mid X = x\right]$$
Since

$$E\left[(Y - t)^2\right] \ge E\left[(Y - E(Y))^2\right]$$

we have

$$E\Big[\big(\underbrace{g(x)}_{\text{constant}} - Y\big)^2 \,\Big|\, X = x\Big] \ge E\left[(E(Y|X=x) - Y)^2 \mid X = x\right]$$

Hence, g(x) = E(Y|X = x) is our “best prediction”.
CONTINUOUS RANDOM VECTORS
All the entries are continuous random variables.

Joint behavior is determined by the continuous joint density f(x).

The continuous joint density f(x) is a function $f : \mathbb{R}^m \to \mathbb{R}$ satisfying

1. $f(x_1, x_2, \ldots, x_m) \ge 0$ for all $\mathbf{x} = (x_1, x_2, \ldots, x_m) \in \mathbb{R}^m$, and

2. $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_m)\, dx_1 \cdots dx_m = 1$.

We will take m = 2 for simplicity.
CONTINUOUS RANDOM VECTORS
The corresponding continuous joint distribution function is:
$$F(x_1, x_2) = P(X_1 \le x_1, X_2 \le x_2) = \int_{-\infty}^{x_2}\int_{-\infty}^{x_1} f(t_1, t_2)\, dt_1 dt_2$$

By the Fundamental Theorem of Calculus

$$f(x_1, x_2) = \frac{\partial^2}{\partial x_1 \partial x_2} F(x_1, x_2)$$
EXAMPLES
Uniform on the unit square

$$f(x_1, x_2) = 1, \qquad 0 \le x_1 \le 1,\ 0 \le x_2 \le 1$$

Uniform on the circle

$$f(x_1, x_2) = \frac{1}{\pi}, \qquad x_1^2 + x_2^2 \le 1$$

Nameless

$$f(x_1, x_2) = \frac{1}{x_1^2\, x_2^2}, \qquad x_1 > 1,\ x_2 > 1$$
BIVARIATE NORMAL
$$f(x_1, x_2) = \frac{\left(1-\rho^2\right)^{-1/2}}{2\pi\sigma_1\sigma_2}\exp\left\{-\frac{\dfrac{(x_1-\mu_1)^2}{\sigma_1^2} + \dfrac{(x_2-\mu_2)^2}{\sigma_2^2} - \dfrac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}}{2\left(1-\rho^2\right)}\right\}$$

We show in Module 6 that

$$E(\mathbf{X}) = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} = \boldsymbol{\mu}, \quad \mathrm{Var}(X_1) = \sigma_1^2, \quad \mathrm{Var}(X_2) = \sigma_2^2,$$

and

$$\mathrm{Cov}(X_1, X_2) = \sigma_{12} = \rho\sigma_1\sigma_2$$

Therefore

$$\mathrm{Cor}(X_1, X_2) = \frac{\sigma_{12}}{\sigma_1\sigma_2} = \rho.$$
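A simulation sketch checking Cov(X1, X2) = ρσ1σ2 empirically; the parameter values below are illustrative choices of ours, not from the slides:

```python
# Sample a bivariate normal and compare sample covariance/correlation to Sigma and rho.
import numpy as np

mu = np.array([1.0, -2.0])
sigma1, sigma2, rho = 2.0, 0.5, 0.7
Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])

rng = np.random.default_rng(1)
X = rng.multivariate_normal(mu, Sigma, size=500_000)
print(np.cov(X.T))             # ~ Sigma
print(np.corrcoef(X.T)[0, 1])  # ~ 0.7
```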
EXERCISE
Using that

$$\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{pmatrix} = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}$$

and that

$$\Sigma^{-1} = \begin{pmatrix}\sigma^{11} & \sigma^{12}\\ \sigma^{21} & \sigma^{22}\end{pmatrix} = \frac{1}{1-\rho^2}\begin{pmatrix}1/\sigma_1^2 & -\rho/(\sigma_1\sigma_2)\\ -\rho/(\sigma_1\sigma_2) & 1/\sigma_2^2\end{pmatrix}$$

show that

$$f(x_1, x_2) = \frac{\det(\Sigma)^{-1/2}}{2\pi}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right\}$$
MARGINAL DENSITIES
Let f (x1, x2) be the joint density for (X1,X2).
The marginal densities for X1 and X2 are

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2$$

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1,$$

respectively.

The other variable is integrated out.
CONDITIONAL DENSITIES
As in the discrete case
$$f(x_2|x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}, \qquad f(x_1|x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}$$

Moreover,

$$f(x_1, x_2) = f_1(x_1)\, f(x_2|x_1) = f_2(x_2)\, f(x_1|x_2)$$
INDEPENDENT RANDOM VARIABLES
As in the discrete case, the continuous random variables X1, X2, ..., Xm are independent if and only if

$$f(x_1, x_2, \ldots, x_m) = f_1(x_1)\, f_2(x_2) \cdots f_m(x_m)$$

RESULT: If X1, X2 are independent then σ12 = 0.

PROOF:

$$\sigma_{12} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_1-\mu_1)(x_2-\mu_2)\, f_1(x_1) f_2(x_2)\, dx_1 dx_2 = \left(\int_{-\infty}^{\infty}(x_1-\mu_1) f_1(x_1)\, dx_1\right)\left(\int_{-\infty}^{\infty}(x_2-\mu_2) f_2(x_2)\, dx_2\right) = 0$$
CONDITIONAL MEAN
$$\mu_{y|x} = E(Y|X=x) = \int_{-\infty}^{\infty} y\, f(y|x)\, dy \quad \text{(continuous case)}$$

$$\mu_{y|x} = E(Y|X=x) = \sum_y y\, f(y|x) \quad \text{(discrete case)}$$
CONDITIONAL VARIANCE
$$\sigma^2_{y|x} = \mathrm{Var}(Y|X=x) = \int_{-\infty}^{\infty}\left(y - \mu_{y|x}\right)^2 f(y|x)\, dy \quad \text{(continuous case)}$$

$$\sigma^2_{y|x} = \mathrm{Var}(Y|X=x) = \sum_y \left(y - \mu_{y|x}\right)^2 f(y|x) \quad \text{(discrete case)}$$
PRACTICE
Example: Suppose that X ∼ Unif(0, 10) and that Y|X = x ∼ Exp(1/x).
(a) Calculate the mean and variance of Y .
(b) What fraction of the total variance is explained by X ?
PRACTICE
Solution: (a)
Recall that if Y ∼ Exp(λ) then E(Y) = 1/λ and Var(Y) = 1/λ².

Since Y|X = x ∼ Exp(1/x) we have:

$$E(Y|X) = X$$

Recall that, if X ∼ Unif(a, b), then E(X) = (a + b)/2 and Var(X) = (b − a)²/12.

Since X ∼ Unif(0, 10) we have E(X) = 5 and Var(X) = 100/12 = 8.3333.
PRACTICE (continued)
Hence

$$E(Y) = E\{E(Y|X)\} = E\{X\} = 5$$

and

$$\mathrm{Var}(Y) = E\{\mathrm{Var}(Y|X)\} + \mathrm{Var}\{E(Y|X)\} = E\{X^2\} + \mathrm{Var}\{X\}$$

$$= \mathrm{Var}(X) + [E(X)]^2 + \mathrm{Var}(X) = \frac{100}{12} + 5^2 + \frac{100}{12} = 41.667$$
PRACTICE (continued)
(b)

$$\text{Percentage of Explained Variance} = \frac{\mathrm{Var}\{E(Y|X)\}}{\mathrm{Var}(Y)}\,100\% = \frac{100/12}{41.667}\,100\% = 20\%$$
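A Monte Carlo check of parts (a) and (b). Note that numpy's exponential sampler is parametrized by the mean (`scale`), so `scale=X` matches the rate-1/x convention used here; seed and sample size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=1_000_000)
Y = rng.exponential(scale=X)  # Y | X = x has mean x, i.e. Exp(1/x)

print(Y.mean())           # ~5
print(Y.var())            # ~41.7
print(X.var() / Y.var())  # ~0.20: the explained fraction, since Var{E(Y|X)} = Var(X)
```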
FUNCTIONS OF CONTINUOUS RANDOM VECTORS
Suppose the random vector X has joint density
$$f_X(x_1, x_2, \ldots, x_m)$$

Consider the 1-1 function

$$\mathbf{y} = h(\mathbf{x}), \qquad \mathbf{x} = h^{-1}(\mathbf{y})$$

Then

$$f_Y(\mathbf{y}) = f_X\left(h^{-1}(\mathbf{y})\right)\left|\det\left(\frac{\partial x_i}{\partial y_j}\right)\right| = f_X\left[h^{-1}(\mathbf{y})\right] J(\mathbf{y})$$
m = 2
Suppose the random vector
$$\mathbf{X} = \begin{pmatrix}X_1\\ X_2\end{pmatrix}$$

has joint density

$$f_X(x_1, x_2)$$
Consider the 1-1 function from $\mathbb{R}^2 \to \mathbb{R}^2$:

$$\mathbf{y} = \begin{pmatrix}y_1\\ y_2\end{pmatrix} = \begin{pmatrix}h_1(x_1, x_2)\\ h_2(x_1, x_2)\end{pmatrix} = h(\mathbf{x})$$

$$\mathbf{x} = \begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}h_1^{-1}(y_1, y_2)\\ h_2^{-1}(y_1, y_2)\end{pmatrix} = h^{-1}(\mathbf{y})$$
The matrix of partial derivatives:
$$\left(\frac{\partial x_i}{\partial y_j}\right) = \begin{pmatrix}\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2}\\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}\end{pmatrix} = \begin{pmatrix}\dfrac{\partial h_1^{-1}(y_1, y_2)}{\partial y_1} & \dfrac{\partial h_1^{-1}(y_1, y_2)}{\partial y_2}\\[2mm] \dfrac{\partial h_2^{-1}(y_1, y_2)}{\partial y_1} & \dfrac{\partial h_2^{-1}(y_1, y_2)}{\partial y_2}\end{pmatrix}$$
The determinant:
$$\det\left(\frac{\partial x_i}{\partial y_j}\right) = \det\begin{pmatrix}\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2}\\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}\end{pmatrix} = \frac{\partial x_1}{\partial y_1}\frac{\partial x_2}{\partial y_2} - \frac{\partial x_1}{\partial y_2}\frac{\partial x_2}{\partial y_1}$$
EXAMPLE 1
Example 1: Suppose that X1 and X2 are independent standard normal random variables. Their joint density is

$$f(x_1, x_2) = \frac{1}{2\pi}e^{-(x_1^2 + x_2^2)/2}, \qquad -\infty < x_1 < \infty,\ -\infty < x_2 < \infty.$$

Find the density function for

$$R = \sqrt{X_1^2 + X_2^2}, \qquad \theta = \arctan\left(\frac{X_2}{X_1}\right)$$

Note: this is the representation of the point (X1, X2) in the “polar coordinate system”.
RANGE AND INVERSE TRANSFORMATION
The range for (R, θ) is

$$R \ge 0, \qquad 0 \le \theta < 2\pi$$

and the inverse transformation is

$$X_1 = R\cos(\theta), \qquad X_2 = R\sin(\theta)$$
THE JACOBIAN
The Jacobian for this transformation is:

$$J = \left|\det\begin{pmatrix}\dfrac{\partial x_1}{\partial R} & \dfrac{\partial x_1}{\partial \theta}\\[2mm] \dfrac{\partial x_2}{\partial R} & \dfrac{\partial x_2}{\partial \theta}\end{pmatrix}\right| = \left|\det\begin{pmatrix}\cos(\theta) & -R\sin(\theta)\\ \sin(\theta) & R\cos(\theta)\end{pmatrix}\right| = R\cos^2(\theta) + R\sin^2(\theta) = R$$
THE DENSITY OF (R, θ)
The density for (R, θ) is:

$$f(r, \theta) = \frac{1}{2\pi}e^{-r^2/2}\, r$$

It is easy to check that:

R and θ are independent,

θ is uniform on [0, 2π), and

R has the Rayleigh density

$$f(r) = r\, e^{-r^2/2}, \qquad r \ge 0.$$
EXERCISE: calculate F(r) and E(R).
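A simulation sketch of the polar-coordinate result; the comparison with √(π/2) uses the Rayleigh mean, which the exercise asks you to derive:

```python
# Transform two independent standard normals to polar coordinates.
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
r = np.hypot(x1, x2)
theta = np.arctan2(x2, x1) % (2 * np.pi)  # fold into [0, 2*pi)

print(r.mean(), np.sqrt(np.pi / 2))  # Rayleigh mean: ~1.2533
print(theta.min(), theta.max())      # spread over [0, 2*pi)
print(np.corrcoef(r, theta)[0, 1])   # ~0, consistent with independence
```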
EXAMPLE 2
Example 2: Suppose that (X1, X2) have uniform density on the unit square:

$$f(x_1, x_2) = 1, \qquad 0 \le x_1 \le 1,\ 0 \le x_2 \le 1$$

Find the density function for

$$Y_1 = X_1 + X_2$$
Solution: Complete an invertible 1-1 function

$$\begin{pmatrix}y_1\\ y_2\end{pmatrix} = \begin{pmatrix}x_1 + x_2\\ x_2\end{pmatrix} \quad \begin{matrix}\leftarrow \text{function of interest}\\ \leftarrow \text{auxiliary function}\end{matrix}$$

with inverse function

$$\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}y_1 - y_2\\ y_2\end{pmatrix} = \begin{pmatrix}h_1^{-1}(y_1, y_2)\\ h_2^{-1}(y_1, y_2)\end{pmatrix}$$

NOTE: the auxiliary function is not unique. It is chosen for convenience.

Students can pursue other choices for practice. For example, y2 = x1 − x2.
The (Joint) Range of Y1 and Y2
$$0 < x_1 < 1 \Rightarrow 0 < y_1 - y_2 < 1$$

$$0 < x_2 < 1 \Rightarrow 0 < y_2 < 1$$

Therefore,

$$0 < y_2 < 1, \qquad y_2 < y_1 < 1 + y_2.$$
$$0 < y_2 < 1, \qquad y_2 < y_1 < 1 + y_2$$

[Figure: the region 0 < y2 < 1, y2 < y1 < 1 + y2 in the (y2, y1) plane, bounded by the lines y1 = y2 and y1 = y2 + 1]
The Jacobian
$$x_1 = y_1 - y_2 = h_1^{-1}(y_1, y_2)$$

$$x_2 = y_2 = h_2^{-1}(y_1, y_2)$$

$$J(\mathbf{y}) = \left|\det\begin{pmatrix}\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2}\\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}\end{pmatrix}\right| = \left|\det\begin{pmatrix}1 & -1\\ 0 & 1\end{pmatrix}\right| = 1$$

$$f_Y(y_1, y_2) = f_X(y_1 - y_2,\, y_2) = 1, \qquad 0 < y_2 < 1,\ y_2 < y_1 < 1 + y_2$$
To obtain the marginal density for Y1 we must integrate out y2.

From the picture of the domain it follows that for 0 ≤ y1 ≤ 1, we have 0 < y2 < y1. Hence,

$$f_{Y_1}(y_1) = \int_0^{y_1} \underbrace{f_Y(y_1, y_2)}_{=1}\, dy_2 = \int_0^{y_1} dy_2 = y_1$$

Similarly, for 1 ≤ y1 ≤ 2, we have y1 − 1 < y2 < 1. Hence,

$$f_{Y_1}(y_1) = \int_{y_1 - 1}^{1} dy_2 = 2 - y_1$$
In summary, Y1 has the triangular density
$$f_{Y_1}(y_1) = \begin{cases} y_1 & 0 \le y_1 \le 1 \\ 2 - y_1 & 1 \le y_1 \le 2 \end{cases}$$

[Figure: the triangular density on [0, 2], rising as y1 up to 1 and falling as 2 − y1 thereafter]
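A quick simulation comparing a histogram of X1 + X2 with the triangular density (sample size and bin count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
y1 = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)

hist, edges = np.histogram(y1, bins=20, range=(0, 2), density=True)
mids = (edges[:-1] + edges[1:]) / 2
triangle = np.where(mids <= 1, mids, 2 - mids)  # the density derived above
print(np.abs(hist - triangle).max())  # small: the histogram matches the density
```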
COVARIANCE
In the continuous case

$$\sigma_{12} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_1-\mu_1)(x_2-\mu_2)\, f(x_1, x_2)\, dx_1 dx_2$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, f(x_1, x_2)\, dx_1 dx_2 - \mu_1\mu_2 = E(X_1X_2) - \mu_1\mu_2$$
EXAMPLE
(X1, X2) have joint uniform density on the unit square.

Means

$$\mu_1 = \mu_2 = \int_0^1 x\, dx = \frac{1}{2}$$

$$E(X_1X_2) = \int_0^1\int_0^1 x_1 x_2\, dx_1 dx_2 = \frac{1}{4}$$

Covariance

$$\sigma_{12} = E(X_1X_2) - \mu_1\mu_2 = 0$$
LINEAR TRANSFORMATIONS
Let X be a p-dimensional random vector with mean vector µX and covariance matrix ΣX.

Define

$$\mathbf{Y} = A\mathbf{X} + \mathbf{b}$$

where A is a constant q × p matrix and b is a constant q-dimensional vector. Then

$$\boldsymbol{\mu}_Y = E(\mathbf{Y}) = A\boldsymbol{\mu}_X + \mathbf{b}$$

and

$$\Sigma_Y = \mathrm{Cov}(\mathbf{Y}) = A\, \Sigma_X\, A'$$
LINEAR TRANSFORMATIONS (proof)
Proof: The proof for the mean formula is immediate:

$$E(\mathbf{Y}) = E(A\mathbf{X} + \mathbf{b}) = A\,E(\mathbf{X}) + \mathbf{b} = A\boldsymbol{\mu}_X + \mathbf{b}$$
LINEAR TRANSFORMATIONS (proof)
Proof: To prove the covariance formula we write:

$$\Sigma_Y = E\left\{(\mathbf{Y}-\boldsymbol{\mu}_Y)(\mathbf{Y}-\boldsymbol{\mu}_Y)'\right\} = E\left\{\left[(A\mathbf{X}+\mathbf{b}) - (A\boldsymbol{\mu}_X+\mathbf{b})\right]\left[(A\mathbf{X}+\mathbf{b}) - (A\boldsymbol{\mu}_X+\mathbf{b})\right]'\right\}$$

$$= E\left\{(A\mathbf{X} - A\boldsymbol{\mu}_X)(A\mathbf{X} - A\boldsymbol{\mu}_X)'\right\} = E\left\{A(\mathbf{X}-\boldsymbol{\mu}_X)(\mathbf{X}-\boldsymbol{\mu}_X)'A'\right\}$$

$$= A\, E\left\{(\mathbf{X}-\boldsymbol{\mu}_X)(\mathbf{X}-\boldsymbol{\mu}_X)'\right\} A' = A\, \Sigma_X\, A'$$
LINEAR TRANSFORMATIONS (Example)
$$\mathbf{X} = \begin{pmatrix}X_1\\ X_2\\ X_3\end{pmatrix}, \quad E(\mathbf{X}) = \begin{pmatrix}3\\ 2\\ 5\end{pmatrix}, \quad \mathrm{Cov}(\mathbf{X}) = \begin{pmatrix}3 & 2 & 1\\ 2 & 6 & 1\\ 1 & 1 & 4\end{pmatrix}$$

$$A = \begin{pmatrix}1 & 1 & 1\end{pmatrix}, \qquad b = 1$$

$$Y = A\mathbf{X} + b = \begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}X_1\\ X_2\\ X_3\end{pmatrix} + 1 = X_1 + X_2 + X_3 + 1$$
LINEAR TRANSFORMATIONS (Example)
$$E(Y) = \begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}3\\ 2\\ 5\end{pmatrix} + 1 = 11$$

$$\mathrm{Cov}(Y) = \mathrm{Var}(Y) = \begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}3 & 2 & 1\\ 2 & 6 & 1\\ 1 & 1 & 4\end{pmatrix}\begin{pmatrix}1\\ 1\\ 1\end{pmatrix} = \begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}6\\ 9\\ 6\end{pmatrix} = 21$$
LINEAR COMBINATIONS OF RV’S
$$\mathbf{a} = \begin{pmatrix}a_1\\ a_2\\ \vdots\\ a_m\end{pmatrix}, \text{ a vector of given constants}$$

$$Y = \mathbf{a}'\mathbf{X} = \sum_{i=1}^m a_i X_i \quad (\text{a linear combination of the } X_i\text{'s})$$

THEN:

$$E(Y) = E\left(\sum_{i=1}^m a_i X_i\right) = \mathbf{a}'\boldsymbol{\mu} = \sum_{i=1}^m a_i\mu_i$$

$$\mathrm{Var}(Y) = \mathrm{Var}\left(\sum_{i=1}^m a_i X_i\right) = \mathbf{a}'\,\Sigma\,\mathbf{a} = \sum_{i=1}^m\sum_{j=1}^m a_i a_j \sigma_{ij}$$
SUM OF RV’S
$$X_1 + X_2 + \cdots + X_m = \mathbf{1}'\mathbf{X}$$

Then

$$\mathrm{Var}(X_1 + X_2 + \cdots + X_m) = \mathrm{Var}(\mathbf{1}'\mathbf{X}) = \mathbf{1}'\,\Sigma\,\mathbf{1} = \sum_{i=1}^m\sum_{j=1}^m \sigma_{ij} = \sum_{i=1}^m \sigma_{ii} + 2\sum_{i<j}\sigma_{ij}$$
SUM OF INDEPENDENT RV’S
If $X_1, X_2, \ldots, X_m$ are independent, then $\sigma_{ij} = 0$ for $i \ne j$, and so:

$$\mathrm{Var}(X_1 + X_2 + \cdots + X_m) = \sum_{i=1}^m \sigma_{ii} = \sum_{i=1}^m \mathrm{Var}(X_i)$$