Module 5: Random vectors
Ruben Zamar, Department of Statistics, UBC
February 9, 2016
RANDOM VECTORS
USED TO DESCRIBE QUANTITATIVE FEATURES OF A RANDOM OUTCOME

INTEREST CENTRES ON THE RELATIONSHIP AMONG THE FEATURES (JOINT BEHAVIOR)
EXAMPLE: ROLLING TWO DICE
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} \text{SUM OF POINTS} \\ \text{DIFFERENCE OF POINTS} \end{pmatrix}$$
NOTATION
RV’S ARE DENOTED BY BOLD UPPERCASE LETTERS SUCH AS
X,Y,Z,U,V
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}$$

EACH ENTRY, $X_i$, IN $\mathbf{X}$ IS A RANDOM VARIABLE
TYPES OF RANDOM VECTORS
WE CONSIDER THE “DISCRETE” AND “CONTINUOUS” CASES
DISCRETE: ALL THE ENTRIES ARE DISCRETE RANDOM VARIABLES

CONTINUOUS: ALL THE ENTRIES ARE CONTINUOUS RANDOM VARIABLES
JOINT PROBABILITY MASS FUNCTION (JOINT pmf)
$$f(x_1, x_2, \ldots, x_m) = P(X_1 = x_1, X_2 = x_2, \ldots, X_m = x_m)$$
EXAMPLE: ROLLING TWO DICE
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} \text{SUM OF POINTS} \\ \text{ABSOLUTE DIFFERENCE OF POINTS} \end{pmatrix}$$
X2 \ X1 |   2     3     4     5     6     7     8     9    10    11    12
   0    | 1/36        1/36        1/36        1/36        1/36        1/36
   1    |       1/18        1/18        1/18        1/18        1/18
   2    |             1/18        1/18        1/18        1/18
   3    |                   1/18        1/18        1/18
   4    |                         1/18        1/18
   5    |                               1/18

(blank cells are zero probabilities)
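A quick enumeration of the 36 equally likely outcomes reproduces this table; a minimal Python sketch (the names are our own, not from the slides):

```python
# Enumerate two fair dice and tabulate the joint pmf of (sum, |difference|).
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)  # (x1, x2) = (sum, absolute difference) -> probability
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

print(joint[(7, 1)])  # 1/18, as in the table
print(joint[(2, 0)])  # 1/36
```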
MARGINAL DENSITIES (pmf’s)
Take m = 2 for simplicity.
Let f(x1, x2) be a discrete joint pmf.
$$f_1(x_1) = \sum_{x_2} f(x_1, x_2)$$

$$f_2(x_2) = \sum_{x_1} f(x_1, x_2)$$
The other variable is summed out (the other variables when m > 2).
EXAMPLE: ROLLING TWO DICE (continued)
X2 \ X1 |   2     3     4     5     6     7     8     9    10    11    12   |  f2
   0    | 1/36        1/36        1/36        1/36        1/36        1/36  |  6/36
   1    |       1/18        1/18        1/18        1/18        1/18        | 10/36
   2    |             1/18        1/18        1/18        1/18              |  8/36
   3    |                   1/18        1/18        1/18                    |  6/36
   4    |                         1/18        1/18                          |  4/36
   5    |                               1/18                                |  2/36
   f1   | 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36 | 36/36
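The marginal row and column of this table can be checked by summing the joint pmf over the other variable; a self-contained sketch:

```python
# Marginal pmfs of the sum (f1) and the absolute difference (f2).
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

f1 = defaultdict(Fraction)  # pmf of X1, the sum
f2 = defaultdict(Fraction)  # pmf of X2, the absolute difference
for (x1, x2), p in joint.items():
    f1[x1] += p
    f2[x2] += p

print(f1[7], f2[1])  # 1/6 and 5/18, i.e. 6/36 and 10/36
```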
MEAN OF A RANDOM VECTOR
$$E(\mathbf{X}) = E\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_m) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix} = \boldsymbol{\mu}$$
COVARIANCE MATRIX OF A RANDOM VECTOR
$$\mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})'\right] = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2m} \\ \vdots & & & \ddots & \vdots \\ \sigma_{m1} & \sigma_{m2} & \sigma_{m3} & \cdots & \sigma_{mm} \end{pmatrix} = \Sigma$$
COVARIANCE MATRIX (continued)
$$\sigma_{ii} = E\{(X_i - \mu_i)^2\} = \mathrm{Var}(X_i)$$

$$\sigma_{ij} = E\{(X_i - \mu_i)(X_j - \mu_j)\} = \mathrm{cov}(X_i, X_j)$$
COVARIANCE MATRIX
To fix ideas take m = 2.
$$\mathrm{Cov}(\mathbf{X}) = E\begin{pmatrix} (X_1-\mu_1)^2 & (X_1-\mu_1)(X_2-\mu_2) \\ (X_2-\mu_2)(X_1-\mu_1) & (X_2-\mu_2)^2 \end{pmatrix}$$

$$= \begin{pmatrix} E[(X_1-\mu_1)^2] & E[(X_1-\mu_1)(X_2-\mu_2)] \\ E[(X_2-\mu_2)(X_1-\mu_1)] & E[(X_2-\mu_2)^2] \end{pmatrix} = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \Sigma$$
CORRELATION COEFFICIENT
$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} = \frac{\mathrm{cov}(X_i, X_j)}{\mathrm{SD}(X_i)\,\mathrm{SD}(X_j)}$$

IT CAN BE SHOWN THAT

$$-1 \le \rho_{ij} \le 1$$
DISCUSSION
A LINEAR INCREASING RELATION:
X1 AND X2 ARE LIKELY TO BE ABOVE AND BELOW THEIR MEANS TOGETHER

CROSS PRODUCTS (X1 − µ1)(X2 − µ2) ARE LIKELY TO BE POSITIVE

σ12 IS “LARGE” AND POSITIVE

ρ12 IS CLOSE TO 1
INCREASING LINEAR RELATION
[Figure: scatterplot titled “Increasing Linear Relation”, y plotted against x]
SIGN OF THE CROSS PRODUCTS
[Figure: the scatterplot split into quadrants at the means; the cross product (x1 − µ1)(x2 − µ2) is positive in the upper-right and lower-left quadrants and negative in the other two]
INCREASING LINEAR RELATION
ρ close to 1
[Figure: scatterplot titled “Increasing Linear Relation”, data points marked as +]
DECREASING LINEAR RELATION
ρ close to -1
[Figure: scatterplot titled “Decreasing Linear Relation”, data points marked as +]
NO LINEAR RELATION
[Figure: scatterplot titled “No Linear Relation”, y plotted against x]
NO LINEAR RELATION
ρ close to 0
[Figure: scatterplot titled “No Linear Relation”, data points marked as +]
COMPUTING COVARIANCES
$$\sigma_{12} = E\{(X_1-\mu_1)(X_2-\mu_2)\}$$
$$= E\{X_1X_2\} + \mu_1\mu_2 - E\{X_1\}\mu_2 - E\{X_2\}\mu_1$$
$$= E\{X_1X_2\} + \mu_1\mu_2 - \mu_1\mu_2 - \mu_1\mu_2$$
$$= E\{X_1X_2\} - \mu_1\mu_2$$
EXAMPLE: ROLLING TWO DICE (continued)
x1 | f1(x1) | x1·f1(x1)
 2 |  1/36  |   2/36
 3 |  2/36  |   6/36
 4 |  3/36  |  12/36
 5 |  4/36  |  20/36
 6 |  5/36  |  30/36
 7 |  6/36  |  42/36
 8 |  5/36  |  40/36
 9 |  4/36  |  36/36
10 |  3/36  |  30/36
11 |  2/36  |  22/36
12 |  1/36  |  12/36
sum          252/36

HENCE µ1 = 252/36 = 7
EXAMPLE: ROLLING TWO DICE (continued)
x2 | f2(x2) | x2·f2(x2)
 0 |  6/36  |    0
 1 | 10/36  |  10/36
 2 |  8/36  |  16/36
 3 |  6/36  |  18/36
 4 |  4/36  |  16/36
 5 |  2/36  |  10/36
sum            70/36

HENCE µ2 = 70/36 = 1.9444
EXAMPLE: ROLLING TWO DICE (continued)
X2 \ X1 |   2     3     4     5     6     7     8     9    10    11    12   | row sum of x1·x2·f(x1, x2)
   0    | 1/36        1/36        1/36        1/36        1/36        1/36  |    0
   1    |       1/18        1/18        1/18        1/18        1/18        |  35/18
   2    |             1/18        1/18        1/18        1/18              |  56/18
   3    |                   1/18        1/18        1/18                    |  63/18
   4    |                         1/18        1/18                          |  56/18
   5    |                               1/18                                |  35/18

HENCE

$$E\{X_1X_2\} = \frac{35 + 56 + 63 + 56 + 35}{18} = \frac{245}{18} = 13.611$$
EXAMPLE: ROLLING TWO DICE (continued)
FINALLY

$$\sigma_{12} = E\{X_1X_2\} - \mu_1\mu_2 = 13.611 - 7 \times 1.9444 = 0.0002 \approx 0$$

(The exact value is 245/18 − 7 × 35/18 = 0; the 0.0002 is rounding error.)
EXAMPLE: ROLLING TWO DICE (continued)
IN THIS CASE WE HAVE

$$\sigma_{11} = 5.833, \qquad \sigma_{22} = 2.0525$$

HENCE

$$\rho_{12} = \frac{0.0002}{\sqrt{5.833 \times 2.0525}} = 0.00006$$

CONCLUSION: NO LINEAR ASSOCIATION BETWEEN THESE VARIABLES
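A sketch (our own code) that recomputes these moments with exact rational arithmetic shows the covariance, and hence the correlation, is exactly zero; the 0.0002 above is purely rounding:

```python
# Exact covariance of (sum, |difference|) for two fair dice.
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

mu1 = sum(x1 * p for (x1, x2), p in joint.items())       # 7
mu2 = sum(x2 * p for (x1, x2), p in joint.items())       # 35/18
e12 = sum(x1 * x2 * p for (x1, x2), p in joint.items())  # 245/18
print(e12 - mu1 * mu2)  # 0: the sum and the absolute difference are uncorrelated
```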
INDEPENDENT RANDOM VARIABLES
The random variables
$X_1, X_2, \ldots, X_m$

are independent if and only if

$$f(x_1, x_2, \ldots, x_m) = f_1(x_1)\, f_2(x_2) \cdots f_m(x_m)$$
EXAMPLE
x2 \ x1 |  1     2     3   | f(x2)
   1    | 0.12  0.20  0.08 | 0.40
   2    | 0.18  0.30  0.12 | 0.60
 f(x1)  | 0.30  0.50  0.20 | 1.00

Each cell equals the product of its row and column marginals, so X1 and X2 are independent.
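A small sketch (names are our own) that verifies the factorization cell by cell:

```python
# Check f(x1, x2) == f1(x1) * f2(x2) for every cell of the table.
f = {(1, 1): 0.12, (2, 1): 0.20, (3, 1): 0.08,
     (1, 2): 0.18, (2, 2): 0.30, (3, 2): 0.12}  # keys are (x1, x2)

f1 = {x1: sum(p for (a, _), p in f.items() if a == x1) for x1 in (1, 2, 3)}
f2 = {x2: sum(p for (_, b), p in f.items() if b == x2) for x2 in (1, 2)}

print(all(abs(f[(x1, x2)] - f1[x1] * f2[x2]) < 1e-12 for (x1, x2) in f))  # True
```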
DISCUSSION
RESULT: If X and Y are independent then
E (XY ) = E (X )E (Y )
PROOF:
$$E(XY) = \sum_x \sum_y xy\, f(x, y) = \sum_x \sum_y xy\, f_X(x) f_Y(y) = \underbrace{\sum_x x f_X(x)}_{E(X)} \underbrace{\sum_y y f_Y(y)}_{E(Y)} = E(X)E(Y).$$
COVARIANCE AND INDEPENDENCE
RESULT: If X, Y are independent then Cov(X, Y) = σXY = 0.

PROOF:

$$\sigma_{XY} = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0$$
DISCUSSION
In general σXY = 0 does not imply that X ,Y are independent.
For example, let X be such that
P (X = x) = 1/21, for x = −10,−9, ...,−1, 0, 1, ..., 9, 10
Let
$$Y = X^2 + V,$$

where V is independent of X and V ≈ 0, that is,

$$P(V = -1) = P(V = 1) = 0.01 \quad\text{and}\quad P(V = 0) = 0.98.$$
Clearly Y ≈ X² (V is a small perturbation). So, Y and X are highly dependent.
However,

$$\sigma_{XY} = E(XY) - \underbrace{E(X)E(Y)}_{0} = E\left[X\left(X^2 + V\right)\right] = \underbrace{E(X^3)}_{0} + \underbrace{E(X)E(V)}_{0} = 0$$
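The same conclusion can be verified by exact enumeration of the joint distribution of (X, Y); a sketch with all names our own:

```python
# X uniform on {-10, ..., 10}; Y = X^2 + V with V a small independent perturbation.
from fractions import Fraction

vs = {-1: Fraction(1, 100), 0: Fraction(98, 100), 1: Fraction(1, 100)}

exy = ex = ey = Fraction(0)
for x in range(-10, 11):
    for v, pv in vs.items():
        p = Fraction(1, 21) * pv  # X and V are independent
        y = x * x + v
        ex += x * p
        ey += y * p
        exy += x * y * p

print(exy - ex * ey)  # 0: zero covariance, even though Y is (almost) a function of X
```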
CONDITIONAL pmf’s
$$f(x_2|x_1) = P(X_2 = x_2 \mid X_1 = x_1) = \frac{P(X_1 = x_1, X_2 = x_2)}{P(X_1 = x_1)} = \frac{f(x_1, x_2)}{f_1(x_1)}$$
Note that
$$\frac{f(x_1, x_2)}{f_1(x_1)} = f(x_2|x_1) \quad\text{implies that}\quad f(x_1, x_2) = f_1(x_1)\, f(x_2|x_1)$$

Similarly

$$\frac{f(x_1, x_2)}{f_2(x_2)} = f(x_1|x_2) \quad\text{implies that}\quad f(x_1, x_2) = f_2(x_2)\, f(x_1|x_2)$$
If X1 and X2 are independent then
$$f(x_1, x_2) = f_1(x_1)\, f_2(x_2)$$

In this case:

$$f(x_2|x_1) = \frac{f(x_1, x_2)}{f_1(x_1)} = \frac{f_1(x_1) f_2(x_2)}{f_1(x_1)} = f_2(x_2)$$

Similarly

$$f(x_1|x_2) = f_1(x_1)$$
CONDITIONAL MEAN AND VARIANCE
$$\mu_{y|x} = E(Y|X=x) = \sum_y y\, f(y|x) \quad \text{(cond. mean)}$$

$$\sigma^2_{y|x} = \mathrm{Var}(Y|X=x) = \sum_y \left(y - \mu_{y|x}\right)^2 f(y|x) \quad \text{(cond. variance)}$$

$$= \sum_y y^2 f(y|x) - \mu_{y|x}^2 = E\left(Y^2|X=x\right) - \mu_{y|x}^2$$
EXAMPLE: ROLLING TWO DICE (continued)
Recall:
X2 \ X1 |   2     3     4     5     6     7     8     9    10    11    12   |  f2
   0    | 1/36        1/36        1/36        1/36        1/36        1/36  |  6/36
   1    |       1/18        1/18        1/18        1/18        1/18        | 10/36
   2    |             1/18        1/18        1/18        1/18              |  8/36
   3    |                   1/18        1/18        1/18                    |  6/36
   4    |                         1/18        1/18                          |  4/36
   5    |                               1/18                                |  2/36
   f1   | 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36 | 36/36
f ( x2 | x1 )
x2 | f(x2|2) | f(x2|3) | f(x2|4) | f(x2|5) | f(x2|6) | f(x2|7)
 0 |    1    |         |   1/3   |         |   1/5   |
 1 |         |    1    |         |   1/2   |         |   1/3
 2 |         |         |   2/3   |         |   2/5   |
 3 |         |         |         |   1/2   |         |   1/3
 4 |         |         |         |         |   2/5   |    0
 5 |         |         |         |         |         |   1/3

x2 | f(x2|8) | f(x2|9) | f(x2|10) | f(x2|11) | f(x2|12)
 0 |   1/5   |         |   1/3    |          |    1
 1 |         |   1/2   |          |    1     |
 2 |   2/5   |         |   2/3    |          |
 3 |         |   1/2   |          |          |
 4 |   2/5   |         |          |          |
 5 |         |         |          |          |
E( X2 | x1 ) and Var( X2 | x1 )
x1 | E(X2|X1=x1) | Var(X2|X1=x1)
 2 |    0.00     |     0.000
 3 |    1.00     |     0.000
 4 |    1.33     |     0.889
 5 |    2.00     |     1.000
 6 |    2.40     |     2.240
 7 |    3.00     |     2.667
 8 |    2.40     |     2.240
 9 |    2.00     |     1.000
10 |    1.33     |     0.889
11 |    1.00     |     0.000
12 |    0.00     |     0.000
f (x1 | x2 )
         |  2    3    4    5    6    7    8    9   10   11   12
f(x1|0)  | 1/6       1/6       1/6       1/6       1/6       1/6
f(x1|1)  |      1/5       1/5       1/5       1/5       1/5
f(x1|2)  |           1/4       1/4       1/4       1/4
f(x1|3)  |                1/3       1/3       1/3
f(x1|4)  |                     1/2       1/2
f(x1|5)  |                          1
E( X1 | x2 ) and Var( X1 | x2 )
x2 | E(X1|X2=x2) | Var(X1|X2=x2)
 0 |    7.00     |    11.667
 1 |    7.00     |     8.000
 2 |    7.00     |     5.000
 3 |    7.00     |     2.667
 4 |    7.00     |     1.000
 5 |    7.00     |     0.000

For instance

$$\mathrm{Var}(X_1|X_2=0) = \frac{4 + 16 + 36 + 64 + 100 + 144}{6} - 7^2 = 11.667$$
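Both conditional tables can be reproduced mechanically from the joint pmf; a sketch for the E(X1|X2 = x2) and Var(X1|X2 = x2) columns:

```python
# Conditional mean and variance of the sum X1 given the absolute difference X2.
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

f2 = defaultdict(Fraction)
for (_, x2), p in joint.items():
    f2[x2] += p

for x2 in range(6):
    cond = {x1: p / f2[x2] for (x1, b), p in joint.items() if b == x2}
    m = sum(x1 * q for x1, q in cond.items())
    v = sum(x1 ** 2 * q for x1, q in cond.items()) - m ** 2
    print(x2, float(m), float(v))  # reproduces the table: mean 7.00 in every row
```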
SPECIAL CASE: X AND Y ARE INDEPENDENT
If X and Y are independent we have:
a) conditional pmf = marginal pmf
f (y |x) = fY (y) and f (x |y) = fX (x)
b) conditional means and variances = marginal means and variances:
E (Y |X = x) = E (Y ) , Var (Y |X = x) = Var (Y )
E (X |Y = y) = E (X ) , Var (X |Y = y) = Var (X )
SPECIAL CASE: X AND Y ARE INDEPENDENT
c) If X and Y are independent we also have:
$$E(g(X,Y)\,|\,X=x) \overset{\text{always}}{=} E(g(x,Y)\,|\,X=x) \overset{\text{independence}}{=} E(g(x,Y))$$

More precisely:

$$E(g(X,Y)\,|\,X=x) = \sum_y g(x,y)\, f(y|x) = \sum_y g(x,y)\, f_Y(y) = E(g(x,Y))$$

Example: $E(e^{X+Y}\,|\,X=x) = e^x\, E(e^Y)$
CONDITIONAL MEAN AS A FUNCTION
In general E(Y|X = x) is a function of x:

$$E(Y|X=x) = h(x)$$

We can consider the random function h(X). This is denoted as

$$h(X) = E(Y|X)$$
EXAMPLE
[Figure: the conditional mean E(Y|X=x) plotted as a function of x]
TWO-STEP AVERAGE
RESULT: $E\{E(Y|X)\} = E(Y)$

More generally,

$$E\{E(g(X,Y)|X)\} = E(g(X,Y)) = \sum_x \left[\sum_y g(x,y)\, f(y|x)\right] f_X(x)$$
EXPECTED DIFFERENCE IN TWO STEPS
E (X2) computed in two steps:
x1 | E(X2|X1=x1) | f1(x1) | E(X2|X1=x1)·f1(x1)
 2 |    0.00     |  1/36  |    0
 3 |    1.00     |  2/36  |  2/36
 4 |    1.33     |  3/36  |  4/36
 5 |    2.00     |  4/36  |  8/36
 6 |    2.40     |  5/36  | 12/36
 7 |    3.00     |  6/36  | 18/36
 8 |    2.40     |  5/36  | 12/36
 9 |    2.00     |  4/36  |  8/36
10 |    1.33     |  3/36  |  4/36
11 |    1.00     |  2/36  |  2/36
12 |    0.00     |  1/36  |    0
SUM                         1.944

⇒ E(X2) = 1.944
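A sketch verifying the two-step average numerically (the code is ours):

```python
# Tower property check: E{E(X2|X1)} equals E(X2).
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

f1 = defaultdict(Fraction)
for (x1, _), p in joint.items():
    f1[x1] += p

two_step = Fraction(0)
for x1, p1 in f1.items():
    cond_mean = sum(x2 * p / p1 for (a, x2), p in joint.items() if a == x1)
    two_step += cond_mean * p1

direct = sum(x2 * p for (_, x2), p in joint.items())
print(two_step, direct)  # both 35/18 = 1.944...
```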
TWO-STEP AVERAGE PROOF
$$E\{E(Y|X)\} = E(h(X)) = \sum_x h(x)\, f_X(x) = \sum_x E(Y|X=x)\, f_X(x) = \sum_x \left[\sum_y y\, f(y|x)\right] f_X(x)$$

$$= \sum_y y \sum_x f(y|x)\, f_X(x) = \sum_y y \sum_x f(x, y) = \sum_y y\, f_Y(y) = E(Y)$$
TOTAL VARIANCE
Red circles represent the conditional means E(Y|X=x); red-ended whiskers represent the conditional variances Var(Y|X=x).

[Figure: scatterplot of y against x with conditional means and variances marked at each x]

The blue circle represents the overall mean E(Y); the blue whisker represents the overall variance Var(Y).
TOTAL VARIANCE FORMULA
RESULT:
$$\mathrm{Var}(Y) = E\{\mathrm{Var}(Y|X)\} + \mathrm{Var}\{E(Y|X)\}$$

$$= \text{Unexplained Variance} + \text{Explained Variance}$$

Unexpl Var = E{Var(Y|X)}   Expl Var = Var{E(Y|X)}
EXPLAINED VARIANCE
Total Var = Var(Y) = Expl Var + Unexpl Var

$$\text{Percentage of Explained Variance} = \frac{\text{Expl Var}}{\text{Total Var}}\,100\% = \frac{\mathrm{Var}\{E(Y|X)\}}{\mathrm{Var}(Y)}\,100\%$$
PREDICTING THE SUM OF DICE
x2 | E(X1|X2=x2) | Var(X1|X2=x2) | f2(x2)
 0 |    7.00     |    11.667     |  6/36
 1 |    7.00     |     8.000     | 10/36
 2 |    7.00     |     5.000     |  8/36
 3 |    7.00     |     2.667     |  6/36
 4 |    7.00     |     1.000     |  4/36
 5 |    7.00     |     0.000     |  2/36

Expl Var = Var(E(X1|X2)) = 0 (the conditional mean is constant)

Unexpl Var = E(Var(X1|X2)) = 11.667 × 6/36 + ⋯ + 0 × 2/36 = 5.833

Total Var = 0 + 5.833 = 5.833

Percentage of Explained Variance = 0%
PREDICTING THE DIFFERENCE
x1 | E(X2|X1=x1) | Var(X2|X1=x1) | f1(x1)
 2 |    0.00     |     0.000     |  1/36
 3 |    1.00     |     0.000     |  2/36
 4 |    1.33     |     0.889     |  3/36
 5 |    2.00     |     1.000     |  4/36
 6 |    2.40     |     2.240     |  5/36
 7 |    3.00     |     2.667     |  6/36
 8 |    2.40     |     2.240     |  5/36
 9 |    2.00     |     1.000     |  4/36
10 |    1.33     |     0.889     |  3/36
11 |    1.00     |     0.000     |  2/36
12 |    0.00     |     0.000     |  1/36
From the previous table we get

$$\text{Expl Var} = \mathrm{Var}(E(X_2|X_1)) = E\left[E(X_2|X_1)^2\right] - \left(E[E(X_2|X_1)]\right)^2 = 0.6154$$

$$\text{Unexpl Var} = E(\mathrm{Var}(X_2|X_1)) = 1.4370$$

$$\mathrm{Var}(X_2) = 0.6154 + 1.4370 = 2.0525$$

$$\text{Percentage of Explained Variance} = \frac{0.6154}{2.0525} \times 100 \approx 30.0\%$$
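A sketch (our own code) recomputing the decomposition exactly from the joint pmf:

```python
# Var(X2) = E{Var(X2|X1)} + Var{E(X2|X1)}, computed exactly.
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

f1 = defaultdict(Fraction)
for (x1, _), p in joint.items():
    f1[x1] += p

overall_mean = sum(x2 * p for (_, x2), p in joint.items())
expl = unexpl = Fraction(0)
for x1, p1 in f1.items():
    cond = {x2: p / p1 for (a, x2), p in joint.items() if a == x1}
    m = sum(x2 * q for x2, q in cond.items())
    v = sum(x2 ** 2 * q for x2, q in cond.items()) - m ** 2
    expl += (m - overall_mean) ** 2 * p1
    unexpl += v * p1

print(float(expl), float(unexpl), float(expl + unexpl))  # ~0.6154, ~1.4370, ~2.0525
```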
PROOF OF THE TOTAL VAR FORMULA
Proof:

$$\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = E\{E(Y^2|X)\} - [E\{E(Y|X)\}]^2$$

$$= E\{\mathrm{Var}(Y|X) + [E(Y|X)]^2\} - [E\{E(Y|X)\}]^2$$

$$= E\{\mathrm{Var}(Y|X)\} + \underbrace{E\{[E(Y|X)]^2\} - [E\{E(Y|X)\}]^2}_{\mathrm{Var}(E(Y|X))}$$
EXAMPLE
Example: $Y \mid N = n \sim \mathrm{Bin}(n+1, p)$ and $N \sim \mathrm{Poisson}(5)$.

Calculate:

(a) E(Y)

(b) Var(Y). What fraction of this variance is explained by N?
EXAMPLE (Solution part (a))

$$E(Y) = E\{E[Y|N]\} = E\{(N+1)p\} = p\,E\{N+1\} = 6p$$
EXAMPLE (Solution part (b))

Solution:

$$\mathrm{Var}(Y) = E\{\mathrm{Var}(Y|N)\} + \mathrm{Var}\{E(Y|N)\}$$

$$= E\{(N+1)p(1-p)\} + \mathrm{Var}\{(N+1)p\}$$

$$= p(1-p)\,E\{N+1\} + p^2\,\mathrm{Var}(N+1)$$

$$= 6p(1-p) + 5p^2 = p[6 - 6p + 5p] = p(6-p)$$

$$\text{Explained Variance Fraction} = \frac{\mathrm{Var}\{E(Y|N)\}}{\mathrm{Var}(Y)} = \frac{5p^2}{p(6-p)} = \frac{5p}{6-p}$$
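A Monte Carlo sanity check of E(Y) = 6p and Var(Y) = p(6 − p) under the stated model; the value of p, the sample size, and the seed are arbitrary choices of ours:

```python
# Simulate Y | N = n ~ Binomial(n + 1, p) with N ~ Poisson(5).
import numpy as np

rng = np.random.default_rng(0)
p, n_sim = 0.3, 1_000_000
N = rng.poisson(5, size=n_sim)
Y = rng.binomial(N + 1, p)

print(Y.mean(), 6 * p)       # ~1.80
print(Y.var(), p * (6 - p))  # ~1.71
```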
EXAMPLE (Solution part (b), continued)
Explained Variance Fraction As a Function of p
[Figure: the explained variance fraction 5p/(6 − p) plotted against p on [0, 1]; it rises from 0 at p = 0 to 1 at p = 1]
EXAMPLE (Solution part (b), continued)
[Figure: scatterplots of Y against N for p = 0.05 and for p = 0.95]
CONDITIONAL MEAN IS THE “BEST” PREDICTOR

Suppose we know X = x and we wish to predict the corresponding value Y.

Our prediction will be a function g(x).

The mean squared prediction error will be

$$E\left[(g(x) - Y)^2 \mid X = x\right]$$
Since

$$E\left[(Y - t)^2\right] \ge E\left[(Y - E(Y))^2\right]$$

we have

$$E\Big[\big(\underbrace{g(x)}_{\text{constant}} - Y\big)^2 \,\Big|\, X = x\Big] \ge E\left[(E(Y|X=x) - Y)^2 \mid X = x\right]$$

Hence, g(x) = E(Y|X = x) is our “best prediction”.
CONTINUOUS RANDOM VECTORS
All the entries are continuous random variables.

Joint behavior is determined by the continuous joint density f(x).

The continuous joint density f(x) is a function $f : \mathbb{R}^m \to \mathbb{R}$ satisfying

1. $f(x_1, x_2, \ldots, x_m) \ge 0$ for all $\mathbf{x} = (x_1, x_2, \ldots, x_m) \in \mathbb{R}^m$, and

2. $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_m)\, dx_1 \cdots dx_m = 1$.

We will take m = 2 for simplicity.
CONTINUOUS RANDOM VECTORS
The corresponding continuous joint distribution function is:
$$F(x_1, x_2) = P(X_1 \le x_1, X_2 \le x_2) = \int_{-\infty}^{x_2}\int_{-\infty}^{x_1} f(t_1, t_2)\, dt_1 dt_2$$

By the Fundamental Theorem of Calculus

$$f(x_1, x_2) = \frac{\partial^2}{\partial x_1 \partial x_2} F(x_1, x_2)$$
EXAMPLES
Uniform on the unit square

$$f(x_1, x_2) = 1, \qquad 0 \le x_1 \le 1,\ 0 \le x_2 \le 1$$

Uniform on the circle

$$f(x_1, x_2) = \frac{1}{\pi}, \qquad x_1^2 + x_2^2 \le 1$$

Nameless

$$f(x_1, x_2) = \frac{1}{x_1^2\, x_2^2}, \qquad x_1 > 1,\ x_2 > 1$$
BIVARIATE NORMAL
$$f(x_1, x_2) = \frac{\left(1-\rho^2\right)^{-1/2}}{2\pi\sigma_1\sigma_2}\exp\left\{-\frac{\dfrac{(x_1-\mu_1)^2}{\sigma_1^2} + \dfrac{(x_2-\mu_2)^2}{\sigma_2^2} - \dfrac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}}{2\left(1-\rho^2\right)}\right\}$$

We show in Module 6 that

$$E(\mathbf{X}) = \begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix} = \boldsymbol{\mu}, \quad \mathrm{Var}(X_1) = \sigma_1^2, \quad \mathrm{Var}(X_2) = \sigma_2^2,$$

and

$$\mathrm{Cov}(X_1, X_2) = \sigma_{12} = \rho\sigma_1\sigma_2$$

Therefore

$$\mathrm{Cor}(X_1, X_2) = \frac{\sigma_{12}}{\sigma_1\sigma_2} = \rho.$$
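A simulation sketch checking Cov(X1, X2) = ρσ1σ2 empirically; the parameter values below are illustrative choices of ours, not from the slides:

```python
# Sample a bivariate normal and compare sample covariance/correlation to Sigma and rho.
import numpy as np

mu = np.array([1.0, -2.0])
sigma1, sigma2, rho = 2.0, 0.5, 0.7
Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])

rng = np.random.default_rng(1)
X = rng.multivariate_normal(mu, Sigma, size=500_000)
print(np.cov(X.T))             # ~ Sigma
print(np.corrcoef(X.T)[0, 1])  # ~ 0.7
```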
EXERCISE
Using that

$$\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{pmatrix} = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}$$

and that

$$\Sigma^{-1} = \begin{pmatrix}\sigma^{11} & \sigma^{12}\\ \sigma^{21} & \sigma^{22}\end{pmatrix} = \frac{1}{1-\rho^2}\begin{pmatrix}1/\sigma_1^2 & -\rho/(\sigma_1\sigma_2)\\ -\rho/(\sigma_1\sigma_2) & 1/\sigma_2^2\end{pmatrix}$$

show that

$$f(x_1, x_2) = \frac{\det(\Sigma)^{-1/2}}{2\pi}\exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right\}$$
MARGINAL DENSITIES
Let f (x1, x2) be the joint density for (X1,X2).
The marginal densities for X1 and X2 are

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2$$

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1,$$

respectively.

The other variable is integrated out.
CONDITIONAL DENSITIES
As in the discrete case
$$f(x_2|x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}, \qquad f(x_1|x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}$$

Moreover,

$$f(x_1, x_2) = f_1(x_1)\, f(x_2|x_1) = f_2(x_2)\, f(x_1|x_2)$$
INDEPENDENT RANDOM VARIABLES
As in the discrete case, the continuous random variables X1, X2, ..., Xm are independent if and only if

$$f(x_1, x_2, \ldots, x_m) = f_1(x_1)\, f_2(x_2) \cdots f_m(x_m)$$

RESULT: If X1, X2 are independent then σ12 = 0.

PROOF:

$$\sigma_{12} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_1-\mu_1)(x_2-\mu_2)\, f_1(x_1) f_2(x_2)\, dx_1 dx_2 = \left(\int_{-\infty}^{\infty}(x_1-\mu_1) f_1(x_1)\, dx_1\right)\left(\int_{-\infty}^{\infty}(x_2-\mu_2) f_2(x_2)\, dx_2\right) = 0$$
CONDITIONAL MEAN
$$\mu_{y|x} = E(Y|X=x) = \int_{-\infty}^{\infty} y\, f(y|x)\, dy \quad \text{(continuous case)}$$

$$\mu_{y|x} = E(Y|X=x) = \sum_y y\, f(y|x) \quad \text{(discrete case)}$$
CONDITIONAL VARIANCE
$$\sigma^2_{y|x} = \mathrm{Var}(Y|X=x) = \int_{-\infty}^{\infty}\left(y - \mu_{y|x}\right)^2 f(y|x)\, dy \quad \text{(continuous case)}$$

$$\sigma^2_{y|x} = \mathrm{Var}(Y|X=x) = \sum_y \left(y - \mu_{y|x}\right)^2 f(y|x) \quad \text{(discrete case)}$$
PRACTICE
Example: Suppose that X ∼ Unif(0, 10) and that Y|X = x ∼ Exp(1/x).
(a) Calculate the mean and variance of Y .
(b) What fraction of the total variance is explained by X ?
PRACTICE
Solution: (a)
Recall that if Y ∼ Exp(λ) then E(Y) = 1/λ and Var(Y) = 1/λ².

Since Y|X = x ∼ Exp(1/x) we have:

$$E(Y|X) = X$$

Recall that, if X ∼ Unif(a, b), then E(X) = (a + b)/2 and Var(X) = (b − a)²/12.

Since X ∼ Unif(0, 10) we have E(X) = 5 and Var(X) = 100/12 = 8.3333.
PRACTICE (continued)
Hence

$$E(Y) = E\{E(Y|X)\} = E\{X\} = 5$$

and

$$\mathrm{Var}(Y) = E\{\mathrm{Var}(Y|X)\} + \mathrm{Var}\{E(Y|X)\} = E\{X^2\} + \mathrm{Var}\{X\}$$

$$= \mathrm{Var}(X) + [E(X)]^2 + \mathrm{Var}(X) = \frac{100}{12} + 5^2 + \frac{100}{12} = 41.667$$
PRACTICE (continued)
(b)

$$\text{Percentage of Explained Variance} = \frac{\mathrm{Var}\{E(Y|X)\}}{\mathrm{Var}(Y)}\,100\% = \frac{100/12}{41.667}\,100\% = 20\%$$
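A Monte Carlo check of parts (a) and (b). Note that numpy's exponential sampler is parametrized by the mean (`scale`), so `scale=X` matches the rate-1/x convention used here; seed and sample size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=1_000_000)
Y = rng.exponential(scale=X)  # Y | X = x has mean x, i.e. Exp(1/x)

print(Y.mean())           # ~5
print(Y.var())            # ~41.7
print(X.var() / Y.var())  # ~0.20: the explained fraction, since Var{E(Y|X)} = Var(X)
```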
FUNCTIONS OF CONTINUOUS RANDOM VECTORS
Suppose the random vector X has joint density
$$f_X(x_1, x_2, \ldots, x_m)$$

Consider the 1-1 function

$$\mathbf{y} = h(\mathbf{x}), \qquad \mathbf{x} = h^{-1}(\mathbf{y})$$

Then

$$f_Y(\mathbf{y}) = f_X\left(h^{-1}(\mathbf{y})\right)\left|\det\left(\frac{\partial x_i}{\partial y_j}\right)\right| = f_X\left[h^{-1}(\mathbf{y})\right] J(\mathbf{y})$$
m = 2
Suppose the random vector
$$\mathbf{X} = \begin{pmatrix}X_1\\ X_2\end{pmatrix}$$

has joint density

$$f_X(x_1, x_2)$$
Consider the 1-1 function from $\mathbb{R}^2 \to \mathbb{R}^2$:

$$\mathbf{y} = \begin{pmatrix}y_1\\ y_2\end{pmatrix} = \begin{pmatrix}h_1(x_1, x_2)\\ h_2(x_1, x_2)\end{pmatrix} = h(\mathbf{x})$$

$$\mathbf{x} = \begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}h_1^{-1}(y_1, y_2)\\ h_2^{-1}(y_1, y_2)\end{pmatrix} = h^{-1}(\mathbf{y})$$
The matrix of partial derivatives:
$$\left(\frac{\partial x_i}{\partial y_j}\right) = \begin{pmatrix}\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2}\\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}\end{pmatrix} = \begin{pmatrix}\dfrac{\partial h_1^{-1}(y_1, y_2)}{\partial y_1} & \dfrac{\partial h_1^{-1}(y_1, y_2)}{\partial y_2}\\[2mm] \dfrac{\partial h_2^{-1}(y_1, y_2)}{\partial y_1} & \dfrac{\partial h_2^{-1}(y_1, y_2)}{\partial y_2}\end{pmatrix}$$
The determinant:
$$\det\left(\frac{\partial x_i}{\partial y_j}\right) = \det\begin{pmatrix}\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2}\\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}\end{pmatrix} = \frac{\partial x_1}{\partial y_1}\frac{\partial x_2}{\partial y_2} - \frac{\partial x_1}{\partial y_2}\frac{\partial x_2}{\partial y_1}$$
EXAMPLE 1
Example 1: Suppose that X1 and X2 are independent standard normal random variables. Their joint density is

$$f(x_1, x_2) = \frac{1}{2\pi}e^{-(x_1^2 + x_2^2)/2}, \qquad -\infty < x_1 < \infty,\ -\infty < x_2 < \infty.$$

Find the density function for

$$R = \sqrt{X_1^2 + X_2^2}, \qquad \theta = \arctan\left(\frac{X_2}{X_1}\right)$$

Note: this is the representation of the point (X1, X2) in the “polar coordinate system”.
RANGE AND INVERSE TRANSFORMATION
The range for (R, θ) is

$$R \ge 0, \qquad 0 \le \theta < 2\pi$$

and the inverse transformation is

$$X_1 = R\cos(\theta), \qquad X_2 = R\sin(\theta)$$
THE JACOBIAN
The Jacobian for this transformation is:

$$J = \left|\det\begin{pmatrix}\dfrac{\partial x_1}{\partial R} & \dfrac{\partial x_1}{\partial \theta}\\[2mm] \dfrac{\partial x_2}{\partial R} & \dfrac{\partial x_2}{\partial \theta}\end{pmatrix}\right| = \left|\det\begin{pmatrix}\cos(\theta) & -R\sin(\theta)\\ \sin(\theta) & R\cos(\theta)\end{pmatrix}\right| = R\cos^2(\theta) + R\sin^2(\theta) = R$$
THE DENSITY OF (R, θ)
The density for (R, θ) is:

$$f(r, \theta) = \frac{1}{2\pi}e^{-r^2/2}\, r$$

It is easy to check that:

R and θ are independent,

θ is uniform on [0, 2π), and

R has the Rayleigh density

$$f(r) = r\, e^{-r^2/2}, \qquad r \ge 0.$$
EXERCISE: calculate F(r) and E(R).
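A simulation sketch of the polar-coordinate result; the comparison with √(π/2) uses the Rayleigh mean, which the exercise asks you to derive:

```python
# Transform two independent standard normals to polar coordinates.
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
r = np.hypot(x1, x2)
theta = np.arctan2(x2, x1) % (2 * np.pi)  # fold into [0, 2*pi)

print(r.mean(), np.sqrt(np.pi / 2))  # Rayleigh mean: ~1.2533
print(theta.min(), theta.max())      # spread over [0, 2*pi)
print(np.corrcoef(r, theta)[0, 1])   # ~0, consistent with independence
```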
EXAMPLE 2
Example 2: Suppose that (X1, X2) have uniform density on the unit square:

$$f(x_1, x_2) = 1, \qquad 0 \le x_1 \le 1,\ 0 \le x_2 \le 1$$

Find the density function for

$$Y_1 = X_1 + X_2$$
Solution: Complete an invertible 1-1 function

$$\begin{pmatrix}y_1\\ y_2\end{pmatrix} = \begin{pmatrix}x_1 + x_2\\ x_2\end{pmatrix} \quad \begin{matrix}\leftarrow \text{function of interest}\\ \leftarrow \text{auxiliary function}\end{matrix}$$

with inverse function

$$\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}y_1 - y_2\\ y_2\end{pmatrix} = \begin{pmatrix}h_1^{-1}(y_1, y_2)\\ h_2^{-1}(y_1, y_2)\end{pmatrix}$$

NOTE: the auxiliary function is not unique. It is chosen for convenience.

Students can pursue other choices for practice. For example, y2 = x1 − x2.
The (Joint) Range of Y1 and Y2
$$0 < x_1 < 1 \Rightarrow 0 < y_1 - y_2 < 1$$

$$0 < x_2 < 1 \Rightarrow 0 < y_2 < 1$$

Therefore,

$$0 < y_2 < 1, \qquad y_2 < y_1 < 1 + y_2.$$
$$0 < y_2 < 1, \qquad y_2 < y_1 < 1 + y_2$$

[Figure: the region 0 < y2 < 1, y2 < y1 < 1 + y2 in the (y2, y1) plane, bounded by the lines y1 = y2 and y1 = y2 + 1]
The Jacobian
$$x_1 = y_1 - y_2 = h_1^{-1}(y_1, y_2)$$

$$x_2 = y_2 = h_2^{-1}(y_1, y_2)$$

$$J(\mathbf{y}) = \left|\det\begin{pmatrix}\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2}\\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}\end{pmatrix}\right| = \left|\det\begin{pmatrix}1 & -1\\ 0 & 1\end{pmatrix}\right| = 1$$

$$f_Y(y_1, y_2) = f_X(y_1 - y_2,\, y_2) = 1, \qquad 0 < y_2 < 1,\ y_2 < y_1 < 1 + y_2$$
To obtain the marginal density for Y1 we must integrate out y2.

From the picture of the domain it follows that for 0 ≤ y1 ≤ 1, we have 0 < y2 < y1. Hence,

$$f_{Y_1}(y_1) = \int_0^{y_1} \underbrace{f_Y(y_1, y_2)}_{=1}\, dy_2 = \int_0^{y_1} dy_2 = y_1$$

Similarly, for 1 ≤ y1 ≤ 2, we have y1 − 1 < y2 < 1. Hence,

$$f_{Y_1}(y_1) = \int_{y_1 - 1}^{1} dy_2 = 2 - y_1$$
In summary, Y1 has the triangular density
$$f_{Y_1}(y_1) = \begin{cases} y_1 & 0 \le y_1 \le 1 \\ 2 - y_1 & 1 \le y_1 \le 2 \end{cases}$$

[Figure: the triangular density on [0, 2], rising as y1 up to 1 and falling as 2 − y1 thereafter]
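A quick simulation comparing a histogram of X1 + X2 with the triangular density (sample size and bin count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
y1 = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)

hist, edges = np.histogram(y1, bins=20, range=(0, 2), density=True)
mids = (edges[:-1] + edges[1:]) / 2
triangle = np.where(mids <= 1, mids, 2 - mids)  # the density derived above
print(np.abs(hist - triangle).max())  # small: the histogram matches the density
```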
COVARIANCE
In the continuous case

$$\sigma_{12} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_1-\mu_1)(x_2-\mu_2)\, f(x_1, x_2)\, dx_1 dx_2$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, f(x_1, x_2)\, dx_1 dx_2 - \mu_1\mu_2 = E(X_1X_2) - \mu_1\mu_2$$
EXAMPLE
(X1, X2) have joint uniform density on the unit square.

Means

$$\mu_1 = \mu_2 = \int_0^1 x\, dx = \frac{1}{2}$$

$$E(X_1X_2) = \int_0^1\int_0^1 x_1 x_2\, dx_1 dx_2 = \frac{1}{4}$$

Covariance

$$\sigma_{12} = E(X_1X_2) - \mu_1\mu_2 = 0$$
LINEAR TRANSFORMATIONS
Let X be a p-dimensional random vector with mean vector µX and covariance matrix ΣX.

Define

$$\mathbf{Y} = A\mathbf{X} + \mathbf{b}$$

where A is a constant q × p matrix and b is a constant q-dimensional vector. Then

$$\boldsymbol{\mu}_Y = E(\mathbf{Y}) = A\boldsymbol{\mu}_X + \mathbf{b}$$

and

$$\Sigma_Y = \mathrm{Cov}(\mathbf{Y}) = A\, \Sigma_X\, A'$$
LINEAR TRANSFORMATIONS (proof)
Proof: The proof for the mean formula is immediate:

$$E(\mathbf{Y}) = E(A\mathbf{X} + \mathbf{b}) = A\,E(\mathbf{X}) + \mathbf{b} = A\boldsymbol{\mu}_X + \mathbf{b}$$
LINEAR TRANSFORMATIONS (proof)
Proof: To prove the covariance formula we write:

$$\Sigma_Y = E\left\{(\mathbf{Y}-\boldsymbol{\mu}_Y)(\mathbf{Y}-\boldsymbol{\mu}_Y)'\right\} = E\left\{\left[(A\mathbf{X}+\mathbf{b}) - (A\boldsymbol{\mu}_X+\mathbf{b})\right]\left[(A\mathbf{X}+\mathbf{b}) - (A\boldsymbol{\mu}_X+\mathbf{b})\right]'\right\}$$

$$= E\left\{(A\mathbf{X} - A\boldsymbol{\mu}_X)(A\mathbf{X} - A\boldsymbol{\mu}_X)'\right\} = E\left\{A(\mathbf{X}-\boldsymbol{\mu}_X)(\mathbf{X}-\boldsymbol{\mu}_X)'A'\right\}$$

$$= A\, E\left\{(\mathbf{X}-\boldsymbol{\mu}_X)(\mathbf{X}-\boldsymbol{\mu}_X)'\right\} A' = A\, \Sigma_X\, A'$$
LINEAR TRANSFORMATIONS (Example)
$$\mathbf{X} = \begin{pmatrix}X_1\\ X_2\\ X_3\end{pmatrix}, \quad E(\mathbf{X}) = \begin{pmatrix}3\\ 2\\ 5\end{pmatrix}, \quad \mathrm{Cov}(\mathbf{X}) = \begin{pmatrix}3 & 2 & 1\\ 2 & 6 & 1\\ 1 & 1 & 4\end{pmatrix}$$

$$A = \begin{pmatrix}1 & 1 & 1\end{pmatrix}, \qquad b = 1$$

$$Y = A\mathbf{X} + b = \begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}X_1\\ X_2\\ X_3\end{pmatrix} + 1 = X_1 + X_2 + X_3 + 1$$
LINEAR TRANSFORMATIONS (Example)
$$E(Y) = \begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}3\\ 2\\ 5\end{pmatrix} + 1 = 11$$

$$\mathrm{Cov}(Y) = \mathrm{Var}(Y) = \begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}3 & 2 & 1\\ 2 & 6 & 1\\ 1 & 1 & 4\end{pmatrix}\begin{pmatrix}1\\ 1\\ 1\end{pmatrix} = \begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}6\\ 9\\ 6\end{pmatrix} = 21$$
LINEAR COMBINATIONS OF RV’S
$$\mathbf{a} = \begin{pmatrix}a_1\\ a_2\\ \vdots\\ a_m\end{pmatrix}, \text{ a vector of given constants}$$

$$Y = \mathbf{a}'\mathbf{X} = \sum_{i=1}^m a_i X_i \quad (\text{a linear combination of the } X_i\text{'s})$$

THEN:

$$E(Y) = E\left(\sum_{i=1}^m a_i X_i\right) = \mathbf{a}'\boldsymbol{\mu} = \sum_{i=1}^m a_i\mu_i$$

$$\mathrm{Var}(Y) = \mathrm{Var}\left(\sum_{i=1}^m a_i X_i\right) = \mathbf{a}'\,\Sigma\,\mathbf{a} = \sum_{i=1}^m\sum_{j=1}^m a_i a_j \sigma_{ij}$$
SUM OF RV’S
$$X_1 + X_2 + \cdots + X_m = \mathbf{1}'\mathbf{X}$$

Then

$$\mathrm{Var}(X_1 + X_2 + \cdots + X_m) = \mathrm{Var}(\mathbf{1}'\mathbf{X}) = \mathbf{1}'\,\Sigma\,\mathbf{1} = \sum_{i=1}^m\sum_{j=1}^m \sigma_{ij} = \sum_{i=1}^m \sigma_{ii} + 2\sum_{i<j}\sigma_{ij}$$
SUM OF INDEPENDENT RV’S
If $X_1, X_2, \ldots, X_m$ are independent, then $\sigma_{ij} = 0$ for $i \ne j$, and so:

$$\mathrm{Var}(X_1 + X_2 + \cdots + X_m) = \sum_{i=1}^m \sigma_{ii} = \sum_{i=1}^m \mathrm{Var}(X_i)$$