Generalized Variance Multivariate Normal Distribution


Lecture #4 - 9/14/2005 Slide 1 of 57

Generalized Variance
Multivariate Normal Distribution

Lecture 4
September 14, 2005
Multivariate Analysis


Overview

Last Time

Today’s Lecture

Generalized Variance

MVN

Distributions

Assessing Uni Normality

Assessing MV Normality

Outliers

Transformations

Wrapping Up


Last Time

■ Matrices and vectors.

◆ Eigenvalues.

◆ Eigenvectors.

◆ Determinants.

■ Basic descriptive statistics using matrices:

◆ Mean vectors.

◆ Covariance Matrices.

◆ Correlation Matrices.


Today’s Lecture

■ Generalized Variance (Chapter 3, Section 4).

■ Multivariate Normal Distribution (Chapter 4).


Sample Covariance Matrix

■ Recall the sample covariance matrix:

$$S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})' = \frac{1}{n-1}(X - 1\bar{x}')'(X - 1\bar{x}').$$

■ The overall sample covariance matrix gives a picture of the covariation between each pair of variables in the sample.

■ A single-number summary of this matrix can be provided by the generalized variance.

■ Although the generalized variance is not used frequently, you will see in later slides that this value is part of the multivariate normal distribution.


Generalized Sample Variance

■ The generalized sample variance is |S|, the determinant of the sample covariance matrix.

■ To begin, imagine a multidimensional shape (an ellipsoid) that represents the end points of all the column vectors in the sample matrix X.

◆ The covariance matrix describes the overall spread of that shape in each direction (if we could plot it).

◆ The equation $(x - \bar{x})' S^{-1} (x - \bar{x}) = a^2$ describes the set of points that are the same statistical distance from the mean $\bar{x}$ (i.e., it forms an ellipsoid).

◆ That ellipsoid has axes proportional to the square roots of the eigenvalues of S.

◆ The volume of that ellipsoid is proportional to $|S|^{1/2}$ (notice what happens if we have a zero eigenvalue).
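As a rough illustration of the determinant described above, the following sketch computes |S| for two variables using only the Python standard library. The data values and the helper names (`sample_covariance_2x2`, `det_2x2`) are made up for this example; they are not part of the lecture.

```python
# Minimal sketch: generalized sample variance |S| for two variables.
# All data values below are hypothetical.

def sample_covariance_2x2(xs, ys):
    """Return the 2x2 sample covariance matrix (divisor n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s11 = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    s22 = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    s12 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return [[s11, s12], [s12, s22]]

def det_2x2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
S = sample_covariance_2x2(x1, x2)
gen_var = det_2x2(S)

# A perfectly collinear second variable gives a zero eigenvalue,
# so the ellipsoid collapses and |S| is zero.
x3 = [2.0 * v for v in x1]
gen_var_collinear = det_2x2(sample_covariance_2x2(x1, x3))

print(gen_var, gen_var_collinear)
```

The collinear case is one way to see the point about a zero eigenvalue: the ellipsoid is flat in one direction, so its volume vanishes.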


Generalized Sample Variance With R

■ The generalized sample variance is dependent upon the scale of the variables in the sample.

■ Because of the scale of these variables, the generalized sample variance can be hard to interpret (much like variances and covariances).

◆ The scale of a single variable may have a disproportionate impact on the generalized variance (e.g., your sample has the variables of GPA and income in US dollars).


Generalized Sample Variance With R

■ Rather than using |S|, the determinant of the sample correlation matrix, |R|, can be used.

■ The interpretation of the GSV is the same: the volume of the ellipsoid that is formed by the standardized variables in the sample.

■ The two differ by a factor equal to the product of the variances of the variables in the sample: $|S| = |R| \, s_{11} s_{22} \cdots s_{pp}$.

■ SAS Example #1...
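The relationship between |S| and |R| can be checked numerically. The sketch below is not the SAS example from the lecture; it uses a hypothetical 2x2 covariance matrix and shows that rescaling one variable changes |S| but leaves |R| alone.

```python
import math

def det_2x2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def correlation_from_covariance(S):
    """Convert a 2x2 covariance matrix into a correlation matrix."""
    sd = [math.sqrt(S[i][i]) for i in range(2)]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(2)] for i in range(2)]

# Hypothetical covariance matrix (not from the lecture).
S = [[2.5, 2.0],
     [2.0, 2.5]]
det_S = det_2x2(S)
det_R = det_2x2(correlation_from_covariance(S))

# Multiply the second variable by 1000 (e.g., dollars instead of thousands).
S_rescaled = [[2.5, 2000.0],
              [2000.0, 2.5e6]]
det_S_rescaled = det_2x2(S_rescaled)
det_R_rescaled = det_2x2(correlation_from_covariance(S_rescaled))

print(det_S, det_S_rescaled)   # |S| changes with the scale
print(det_R, det_R_rescaled)   # |R| does not
```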


Total Sample Variance

■ Another way to characterize the sample variance is with the total variance.

■ Total variance equals $\mathrm{tr}(S) = s_{11} + s_{22} + \ldots + s_{pp}$.

■ It describes the variability of the data without taking into account the covariances.

■ Like generalized variance, the total sample variance reflects the overall spread of the data.

■ Many multivariate techniques use total sample variance in the computation of variance accounted for.


Multivariate Normal Distribution

■ The generalization of the well-known normal distribution to multiple variables is called the multivariate normal distribution (MVN).

■ Many multivariate techniques rely on this distribution in some manner.

■ Although real data may never come from a true MVN, the MVN provides a robust approximation, and has many nice mathematical properties.

■ Furthermore, because of the central limit theorem, many multivariate statistics converge to the MVN distribution as the sample size increases.


Univariate Normal Distribution

■ The univariate normal density function is:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-[(x-\mu)/\sigma]^2/2}$$

■ The mean is $\mu$.

■ The variance is $\sigma^2$.

■ The standard deviation is $\sigma$.

■ Standard notation for normal distributions is $N(\mu, \sigma^2)$, which will be extended for the MVN distribution.


Univariate Normal Distribution

[Figure: density f(x) of N(0, 1), plotted for x from −6 to 6 with f(x) from 0.0 to 0.4.]


Univariate Normal Distribution

[Figure: density f(x) of N(0, 2), plotted for x from −6 to 6 with f(x) from 0.0 to 0.4.]


Univariate Normal Distribution

[Figure: density f(x) of N(3, 1), plotted for x from −6 to 6 with f(x) from 0.0 to 0.4.]


UVN - Notes

■ Recall that the area under the curve for the univariate normal distribution is a function of the variance/standard deviation.

■ In particular:

$$P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$$

$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$$

■ Also note the term in the exponent:

$$\left(\frac{x - \mu}{\sigma}\right)^2 = (x - \mu)(\sigma^2)^{-1}(x - \mu)$$

■ This is the square of the distance from x to µ in standard deviation units, and will be generalized for the MVN.
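The two probabilities and the squared standardized distance can be reproduced with `statistics.NormalDist` from the Python standard library. This is only a sketch; the function names are chosen for this example.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

def squared_standardized_distance(x, mu, sigma):
    """(x - mu)(sigma^2)^{-1}(x - mu), the term in the exponent."""
    return (x - mu) * (1.0 / sigma ** 2) * (x - mu)

p1 = prob_within(1)   # roughly 0.683
p2 = prob_within(2)   # roughly 0.954
d2 = squared_standardized_distance(5.0, 3.0, 2.0)  # one standard deviation away

print(round(p1, 3), round(p2, 3), d2)
```

The probabilities do not depend on the particular µ and σ passed in, which is why they can be stated once for every normal distribution.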


MVN

■ The multivariate normal density function is:

$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-(x-\mu)' \Sigma^{-1} (x-\mu)/2}$$

■ The mean vector is $\mu$.

■ The covariance matrix is $\Sigma$.

■ Standard notation for multivariate normal distributions is $N_p(\mu, \Sigma)$.

■ Visualizing the MVN is difficult for more than two dimensions, so I will demonstrate some plots with two variables: the bivariate normal distribution.
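For p = 2 the density above can be evaluated by hand, since a 2x2 matrix has a simple inverse and determinant. The sketch below assumes that special case; `bivariate_normal_pdf` is a name invented for this example.

```python
import math

def bivariate_normal_pdf(x, mu, sigma):
    """Evaluate the N_2(mu, sigma) density at the point x (p = 2 only)."""
    dx = [x[0] - mu[0], x[1] - mu[1]]
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    # Inverse of a 2x2 matrix.
    inv = [[sigma[1][1] / det, -sigma[0][1] / det],
           [-sigma[1][0] / det, sigma[0][0] / det]]
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu).
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    p = 2
    return math.exp(-quad / 2.0) / ((2.0 * math.pi) ** (p / 2) * math.sqrt(det))

mu = [0.0, 0.0]
peak_identity = bivariate_normal_pdf(mu, mu, [[1.0, 0.0], [0.0, 1.0]])
peak_correlated = bivariate_normal_pdf(mu, mu, [[1.0, 0.5], [0.5, 1.0]])

print(round(peak_identity, 4), round(peak_correlated, 4))
```

The peak height depends on |Σ|^{1/2}, which is where the generalized variance enters the density: a smaller determinant concentrates the same total probability into a smaller region.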


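Bivariate Normal Plot #1

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

[Figure: surface plot of the bivariate normal density; both variables range from −4 to 4 and the density axis runs from 0 to 0.16.]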


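Bivariate Normal Plot #1a

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

[Figure: contour plot of the bivariate normal density; both axes range from −4 to 4.]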


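Bivariate Normal Plot #2

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

[Figure: surface plot of the bivariate normal density; both variables range from −4 to 4 and the density axis runs from 0 to 0.2.]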


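Bivariate Normal Plot #2

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

[Figure: contour plot of the bivariate normal density; both axes range from −4 to 4.]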


MVN Contours

■ The lines of the contour plots denote places of equal probability density for the MVN distribution.

■ These contours can be constructed from the eigenvalues and eigenvectors of the covariance matrix.

◆ The directions of the ellipse axes are the directions of the eigenvectors.

◆ The lengths of the ellipse axes are proportional to the constant times the square roots of the eigenvalues.

■ Specifically:

$$(x - \mu)' \Sigma^{-1} (x - \mu) = c^2$$

defines an ellipsoid centered at $\mu$ with axes $\pm c \sqrt{\lambda_i} \, e_i$, where $\lambda_i$ and $e_i$ are the eigenvalues and eigenvectors of $\Sigma$.


MVN Contours, Continued

■ Contours are useful because they provide confidence regions for data points from the MVN distribution.

■ The multivariate analog of a confidence interval is given by an ellipsoid, where $c^2$ is taken from the Chi-Squared distribution with p degrees of freedom.

■ Specifically, the ellipsoid

$$(x - \mu)' \Sigma^{-1} (x - \mu) \le \chi^2_p(\alpha)$$

provides the confidence region containing $1 - \alpha$ of the probability mass of the MVN distribution.


MVN Contour Example

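■ Imagine we had a bivariate normal distribution with:

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

■ The covariance matrix has eigenvalues and eigenvectors:

$$\lambda = \begin{bmatrix} 1.5 \\ 0.5 \end{bmatrix}, \quad E = \begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix}$$

■ We want to find a contour where 95% of the probability will fall, corresponding to $\chi^2_2(0.05) = 5.99$.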


MVN Contour Example

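■ This contour will be centered at $\mu$.

■ Axis 1:

$$\mu \pm \sqrt{5.99 \times 1.5} \begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix} = \begin{bmatrix} 2.12 \\ 2.12 \end{bmatrix}, \begin{bmatrix} -2.12 \\ -2.12 \end{bmatrix}$$

■ Axis 2:

$$\mu \pm \sqrt{5.99 \times 0.5} \begin{bmatrix} -0.707 \\ 0.707 \end{bmatrix} = \begin{bmatrix} -1.22 \\ 1.22 \end{bmatrix}, \begin{bmatrix} 1.22 \\ -1.22 \end{bmatrix}$$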
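The axis end points in this example can be reproduced in a few lines. The sketch below relies on two facts specific to this case: for p = 2 the upper-α chi-square quantile is −2 ln(α), and a 2x2 covariance matrix with unit variances and covariance ρ has eigenvalues 1 + ρ and 1 − ρ with eigenvectors along (1, 1) and (−1, 1). The function name `axis_endpoints` is made up for this illustration.

```python
import math

alpha = 0.05
# Chi-square with 2 degrees of freedom: P(X > q) = exp(-q / 2).
c2 = -2.0 * math.log(alpha)          # about 5.99

rho = 0.5
mu = (0.0, 0.0)
root_half = 1.0 / math.sqrt(2.0)     # about 0.707
eigenvalues = [1.0 + rho, 1.0 - rho]
eigenvectors = [(root_half, root_half), (-root_half, root_half)]

def axis_endpoints(mu, eigenvalue, eigenvector, c2):
    """Return the two end points mu +/- sqrt(c2 * lambda) * e."""
    half_length = math.sqrt(c2 * eigenvalue)
    plus = tuple(m + half_length * e for m, e in zip(mu, eigenvector))
    minus = tuple(m - half_length * e for m, e in zip(mu, eigenvector))
    return plus, minus

axis1 = axis_endpoints(mu, eigenvalues[0], eigenvectors[0], c2)
axis2 = axis_endpoints(mu, eigenvalues[1], eigenvectors[1], c2)

print([round(v, 2) for v in axis1[0]], [round(v, 2) for v in axis1[1]])
print([round(v, 2) for v in axis2[0]], [round(v, 2) for v in axis2[1]])
```

For more than two variables the closed-form quantile no longer applies, and the quantile would normally come from a table or a statistics library.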


MVN Properties

■ The MVN distribution has some convenient properties.

■ If X has a multivariate normal distribution, then:

1. Linear combinations of X are normally distributed.

2. All subsets of the components of X have a MVN distribution.

3. Zero covariance implies that the corresponding components are independently distributed.

4. The conditional distributions of the components are MVN.


Distribution of x̄ and S

Recall that in univariate statistics you discussed the Central Limit Theorem (CLT).

It stated that, whether or not the n observations $x_1, x_2, \ldots, x_n$ are normal:

■ For large n, the distribution of $\bar{x}$ is approximately normal with mean equal to $\mu$ and variance $\sigma^2/n$.

■ We were also told that $(n - 1)s^2/\sigma^2$ has a Chi-Square distribution with $n - 1$ degrees of freedom (this result requires normally distributed observations).

■ Note: We ended up using these pieces of information for hypothesis testing such as the t-test and ANOVA.


Distribution of x̄ and S

We also have a Multivariate Central Limit Theorem (CLT).

It states that, whether or not the n observations $x_1, x_2, \ldots, x_n$ are multivariate normal:

■ For large n, the distribution of $\bar{x}$ is approximately normal with mean equal to $\mu$ and variance/covariance matrix $\Sigma/n$.

■ We are also told that $(n - 1)S$ has a Wishart distribution, $W_p(n - 1, \Sigma)$, with $n - 1$ degrees of freedom (this result requires multivariate normal observations).

◆ This is the multivariate analogue to a Chi-Square distribution.

■ Note: We will end up using some of this information for multivariate hypothesis testing.


Distribution of x̄ and S

■ Therefore, let $x_1, x_2, \ldots, x_n$ be independent observations from a population with mean $\mu$ and covariance $\Sigma$.

■ The following are true:

◆ $\sqrt{n}(\bar{X} - \mu)$ is approximately $N_p(0, \Sigma)$.

◆ $n(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu)$ is approximately $\chi^2_p$.


Assessing Normality

■ Recall from earlier that IF the data have a multivariate normal distribution then all of the previously discussed properties will hold.

■ So we will want to have at least a set of tests/methods to assess multivariate normality.

◆ We said that if any marginal distribution is NOT normal then the joint distribution cannot be MVN. So we will first talk about assessing univariate normality.

◆ We also said that even if all marginals are normally distributed the joint is not necessarily MVN. So we will then assess multivariate normality.


Assessing Normality

■ We will find that there are two ways to assess normality/MVN.

1. By comparing the distribution of your observations (or some transformation of your observations) to some known distribution. (These are commonly called Q-Q plots.)

2. By computing some set of statistics and obtaining a p-value (i.e., compute a statistic with a known distribution and determine how extreme the statistic is compared to a null hypothesis).


Assessing Univariate Normality

We begin with assessing univariate normality using a Q-Q plot.

■ A Q-Q plot is a plot that matches the quantiles of the observed data with the quantiles of a specific distribution.

■ A quantile (commonly called a percentile) is the value such that a specific proportion p of the population will score at or below it.

◆ For example, the .5 quantile of a N(0, 1) is 0.

■ In our case the specific distribution will be a normal, N(0, 1).

◆ It could be a $N(\bar{x}, s_x^2)$, if preferred.

■ There should be a linear relationship between the quantiles of the observed data and their theoretical quantiles (under the assumed distribution) if they follow the same distribution.


Constructing a Q-Q plot

Let's assume that we have n observations $x_1, x_2, \ldots, x_n$. To construct a Q-Q plot we:

1. Order the observations from smallest to largest (i.e., $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$).

2. Next we define the ith point, $x_{(i)}$, as the $(i - .5)/n$ quantile.

■ We could use $i/n$, but this can cause problems (the largest observation would be assigned the 1.0 quantile, which is infinite for a normal distribution).

3. Based on a N(0, 1) distribution we compute the quantile values $q_1, q_2, \ldots, q_n$ (this is typically done using a table or computer).

4. Finally plot $(x_{(i)}, q_i)$; if the data follow the same distribution (normal), the points should form a line.


Example Q-Q plot

Let's assume that we have 5 observations: 3, 6, 4, 5, 2:

First we order them

y(i)   (i − .5)/n   qi
2
3
4
5
6


Example Q-Q plot

Let's assume that we have 5 observations: 3, 6, 4, 5, 2:

Next compute quantiles

y(i)   (i − .5)/n          qi
2      (1 − .5)/5 = .1
3      (2 − .5)/5 = .3
4      (3 − .5)/5 = .5
5      (4 − .5)/5 = .7
6      (5 − .5)/5 = .9


Example Q-Q plot

Let's assume that we have 5 observations: 3, 6, 4, 5, 2:

Finally compute quantile values assuming N(0, 1) (i.e., this is a z-score)

y(i)   (i − .5)/n          qi
2      (1 − .5)/5 = .1     -1.28
3      (2 − .5)/5 = .3     -0.52
4      (3 − .5)/5 = .5      0.00
5      (4 − .5)/5 = .7      0.52
6      (5 − .5)/5 = .9      1.28

and plot
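The table for this example can be generated with the standard library's `statistics.NormalDist`, whose `inv_cdf` method returns normal quantiles. This is a sketch of the four construction steps, with a function name (`normal_qq_points`) chosen for the illustration; the plotting itself is left out.

```python
from statistics import NormalDist

def normal_qq_points(observations):
    """Return (y_(i), (i - .5)/n, q_i) for each ordered observation."""
    ordered = sorted(observations)            # step 1: order the data
    n = len(ordered)
    std_normal = NormalDist()                 # N(0, 1)
    points = []
    for i, y in enumerate(ordered, start=1):
        prob = (i - 0.5) / n                  # step 2: probability level
        q = std_normal.inv_cdf(prob)          # step 3: N(0, 1) quantile
        points.append((y, prob, q))
    return points                             # step 4 would plot (y, q)

points = normal_qq_points([3, 6, 4, 5, 2])
for y, prob, q in points:
    print(y, round(prob, 1), round(q, 2))
```

Using `i / n` instead of `(i - 0.5) / n` would ask for the quantile at probability 1.0 for the largest observation, which `inv_cdf` rejects because that quantile is infinite.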


Example Q-Q plot

Notice how it follows nearly a straight line

Figure 1: Q-Q plot

[Figure: ordered observations Y (axis from 1 to 7) plotted against the N(0, 1) quantiles Q (axis from −1.5 to 1.5).]


Three Tests for Normality

■ The remaining methods are tests for normality.

■ Each one computes a statistic and checks it for significance.

■ I mention these because some journals report them.

■ However, because we will be mostly interested in MVN, one could quickly check for univariate normality and then, if satisfied, test for MVN.


Test 1

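Begin by computing skewness and kurtosis.

Skewness, $\sqrt{b_1}$:

$$\sqrt{b_1} = \frac{\sqrt{n} \sum_{i=1}^{n} (y_i - \bar{y})^3}{\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]^{3/2}}$$

Kurtosis, $b_2$:

$$b_2 = \frac{n \sum_{i=1}^{n} (y_i - \bar{y})^4}{\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]^2}$$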
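A direct translation of the two formulas is sketched below; it only computes the statistics and does not carry out the significance test. For a normal population the skewness is 0 and the kurtosis is 3, so values far from those suggest non-normality. The function name and data are illustrative.

```python
import math

def skewness_kurtosis(ys):
    """Return (sqrt_b1, b2) computed from sums of powers of deviations."""
    n = len(ys)
    mean = sum(ys) / n
    sum2 = sum((y - mean) ** 2 for y in ys)
    sum3 = sum((y - mean) ** 3 for y in ys)
    sum4 = sum((y - mean) ** 4 for y in ys)
    sqrt_b1 = math.sqrt(n) * sum3 / sum2 ** 1.5
    b2 = n * sum4 / sum2 ** 2
    return sqrt_b1, b2

# The five observations from the Q-Q plot example.
sqrt_b1, b2 = skewness_kurtosis([2, 3, 4, 5, 6])
print(sqrt_b1, b2)
```

With only five evenly spaced values the kurtosis is well below 3, which says more about the tiny sample than about the population.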


Other Tests

Other tests can be gathered in SAS (note the null hypothesis is always that the data are normally distributed).

    proc univariate data=mydata normal plot;
      var x1-x5;
    run;


Assessing MVN

■ Notice that many of the procedures that we discussed (including the Q-Q plot) required that we rank order the observations.

■ If instead of single observations we have a set of n observations of p variables $(x_1, x_2, \ldots, x_n)$, ordering all observations is much more difficult.

■ In fact, the book mentions that any test for MVN is far more difficult and often has little power because of the number of observations in p-space, but some check should be done.


Scatter Plots

Here we will discuss two graphical methods.

■ Possibly the easiest method to assess MVN is to use scatter plots.

■ If there are not a large number of variables, consider looking at scatter plots of all pairs of variables.

◆ Recall that one property of a MVN is that all subsets of variables should also be multivariate (in this case bivariate) normal.

◆ This means any relationship between variables should be linear, and the points should otherwise show a random pattern.

■ We could also look at all sets of three variables, which should have tri-variate normal distributions.


Q-Q Plots

As an alternative (for example, if there are too many variables), we could use a Q-Q plot.

■ Our Q-Q plot will be based on $D^2 = (x - \bar{x})' S^{-1} (x - \bar{x})$. Here x is the observation vector for each entity.

◆ Recall that $D^2 = (x - \mu)' \Sigma^{-1} (x - \mu)$ has a $\chi^2_p$ distribution with p degrees of freedom.

◆ We could use that to compute a Q-Q plot (i.e., use the $\chi^2_p$ distribution to get the values of $q_i$).

◆ However, because the mean vector and variance-covariance matrix are estimated from the data, this plot could be misleading.

■ Instead, some suggest using a function of $D^2$.


Q-Q Plots

We will define a new variable $u_i$, where

$$u_i = D_i^2.$$

1. Order the observations from smallest to largest (i.e., $u_{(1)} \le u_{(2)} \le \ldots \le u_{(n)}$).

2. Next we define the ith point, $u_{(i)}$, as the $(i - .5)/n$ quantile.

3. Based on a $\chi^2_p$ distribution, we compute the quantile values $q_1, q_2, \ldots, q_n$ (this is typically done using a table or computer) where:
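A sketch of these steps for the special case p = 2 is given here, assuming the chi-square quantiles are used directly. With two degrees of freedom the chi-square distribution function is 1 − exp(−q/2), so the quantile at probability level a is −2 ln(1 − a); for other p a table or statistics library would be needed. The data and function names are hypothetical.

```python
import math

def mahalanobis_sq_2d(data):
    """Return D_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar) for bivariate data."""
    n = len(data)
    m1 = sum(row[0] for row in data) / n
    m2 = sum(row[1] for row in data) / n
    s11 = sum((row[0] - m1) ** 2 for row in data) / (n - 1)
    s22 = sum((row[1] - m2) ** 2 for row in data) / (n - 1)
    s12 = sum((row[0] - m1) * (row[1] - m2) for row in data) / (n - 1)
    det = s11 * s22 - s12 ** 2
    result = []
    for a, b in data:
        d1, d2 = a - m1, b - m2
        # Quadratic form using the closed-form inverse of a 2x2 matrix.
        result.append((s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det)
    return result

def chi2_qq_points_2d(data):
    """Pair each ordered D^2 value with its chi-square (2 df) quantile."""
    u = sorted(mahalanobis_sq_2d(data))
    n = len(u)
    return [(u_i, -2.0 * math.log(1.0 - (i - 0.5) / n))
            for i, u_i in enumerate(u, start=1)]

# Hypothetical bivariate sample.
sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
d_squared = mahalanobis_sq_2d(sample)
qq_points = chi2_qq_points_2d(sample)
for u_i, q_i in qq_points:
    print(round(u_i, 3), round(q_i, 3))
```

A handy check on any such computation is that the D² values always sum to (n − 1)p when S is the sample covariance matrix.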


Tests

■ Tests based on skewness and kurtosis also exist.

■ These are generalizations of the univariate tests.

■ You will not be tested on this, but know where it is just in case you ever need a test.


Outliers

■ Detecting outliers is difficult in Multiple dimensions.

◆ Can’t simply order them and look for extreme values.

◆ May not be able to see in only 2-D plots.

◆ Could be different degrees of recording errors.

■ Here I discuss one method for detecting Outliers


Outliers Using D2

Wilks' statistic was designed to detect a single outlier:

■ Compute w where

$$w = \max_i \frac{|(n - 2) S_{-i}|}{|(n - 1) S|}$$

■ To simplify, it can be shown that w also equals:

$$w = 1 - \frac{n D^2_{(n)}}{(n - 1)^2}$$

■ The distribution for w is the F distribution; however, the only unknown is $D^2_{(n)}$, and so we can just make decisions based on it.

■ Therefore a test can be made that is based on the $D^2$, which is also computed to assess MVN.


Example

Here we consider three variables: time it takes people to get to class, number of years in school, and overall happiness.

■ Just as a generic procedure I would

1. Evaluate each variable individually.

2. Evaluate each pair (if this is reasonable).

3. Evaluate MVN using a Q-Q plot.

4. Evaluate the reasonableness of any outliers (this can also be done in the previous steps).


Example Q-Q plots

First we use a Q-Q plot for each variable.

Figure 2: Q-Q plot of Time


Example Q-Q plots

Figure 3: Q-Q plot of Year


Example Q-Q plots

Figure 4: Q-Q plot of Happy


Example Bivariate plots

■ So if we decide that everything looks OK then we move on to the bivariate plots.

■ If they do not look OK we can consider using some kind of robust analyses or consider a transformation.


Example Bivariate plots

Figure 5: Time versus Year


Example Bivariate plots

Figure 6: Time versus Happy


Example Bivariate plots

Figure 7: Year versus Happy


MV Q-Q Plot

Finally, the Multivariate Q-Q plot.

Figure 8: Year versus Happy

■ We can also look for an outlier.

■ Largest $D^2 = 13.4$.

■ Compare it to Table A.6 for 30 observations and 3 variables.

◆ The critical value for the maximum $D^2$ is 12.24 at the .05 level and 14.14 at the .01 level.
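The comparison on this slide, together with the Wilks formula w = 1 − nD²₍ₙ₎/(n − 1)² from the outlier slide, can be written out as a short sketch. The critical values are the ones quoted above from Table A.6; the variable names are chosen for this example.

```python
def wilks_w(n, d2_max):
    """Wilks' statistic expressed through the largest D^2."""
    return 1.0 - n * d2_max / (n - 1) ** 2

n = 30              # observations
d2_max = 13.4       # largest D^2 in the example
crit_05 = 12.24     # critical value at the .05 level (as quoted above)
crit_01 = 14.14     # critical value at the .01 level (as quoted above)

w = wilks_w(n, d2_max)
flag_at_05 = d2_max > crit_05
flag_at_01 = d2_max > crit_01

print(round(w, 3), flag_at_05, flag_at_01)
```

So the most extreme observation would be flagged as an outlier at the .05 level but not at the .01 level.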


Transformations to Near Normality

A few types of data must be transformed prior to doing analyses that assume data comes from a MVN distribution:

■ Counts, y, are transformed with $\sqrt{y}$.

■ Proportions, p, are transformed with $\mathrm{logit}(p) = \frac{1}{2} \log\left(\frac{p}{1-p}\right)$.

■ Correlations, r, are transformed with (Fisher's) $z(r) = \frac{1}{2} \log\left(\frac{1+r}{1-r}\right)$.

Variations of transformations exist, as do other techniques.
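The three transformations listed above are easy to write as small functions. This sketch follows the formulas exactly as given on the slide (including the factor of 1/2 in the logit); the function names are made up for the illustration and no input validation is attempted.

```python
import math

def sqrt_count(y):
    """Square-root transformation for counts."""
    return math.sqrt(y)

def half_logit(p):
    """logit(p) = (1/2) log(p / (1 - p)) for a proportion 0 < p < 1."""
    return 0.5 * math.log(p / (1.0 - p))

def fisher_z(r):
    """Fisher's z(r) = (1/2) log((1 + r) / (1 - r)) for -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

print(sqrt_count(9))
print(half_logit(0.5))
print(round(fisher_z(0.5), 4))
```

Fisher's z is the inverse hyperbolic tangent, so `math.atanh(r)` gives the same value.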


Final Thought

■ The multivariate normal distribution is an analog to the univariate normal distribution.

■ The MVN distribution will play a large role in the upcoming weeks.

■ We can finally put the background material to rest, and begin learning some practical statistics.


Next Time

■ Statistical Analyses - Mean Vector Inference.

