
Data Analysis: TD multivariate normal correction

Date post: 29-Oct-2021

Christophe Ambroise


Section 1

Multivariate normal distribution (Exercises)


IQ

Knowing that IQ is normally distributed with mean 100 and standard deviation 15, what is the probability of having an IQ

- more than 120?
- less than 100?


IQ (Solution) I

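library(ggplot2)

# Density of N(100, 15^2) restricted to x > 120 (NA elsewhere), used to shade the upper tail
QI.sup.120 <- function(x) {
  ifelse(x > 120, dnorm(x, mean = 100, sd = 15), NA)
}

ggplot(data.frame(x = c(20, 180)), aes(x)) +
  stat_function(fun = dnorm, args = list(mean = 100, sd = 15)) +
  stat_function(fun = QI.sup.120, geom = "area", fill = "coral", alpha = 0.3) +
  # annotate() draws the label once; geom_text() would draw it once per data row
  annotate("text", x = 127, y = 0.003, size = 4, fontface = "bold",
           label = paste0(round(pnorm(120, mean = 100, sd = 15, lower.tail = FALSE), 3) * 100, "%")) +
  scale_x_continuous(breaks = c(80, 100, 120, 130)) +
  geom_vline(xintercept = 120, colour = "coral")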
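The plot only displays the upper-tail probability. As a minimal sketch, both answers to the exercise can be computed directly with `pnorm()`; the variable names `p.sup.120`, `p.inf.100` and `p.sup.120.z` are chosen here for illustration and are not part of the original slides.

```r
# P(IQ > 120) and P(IQ < 100) for IQ ~ N(100, 15^2)
p.sup.120 <- pnorm(120, mean = 100, sd = 15, lower.tail = FALSE)
p.inf.100 <- pnorm(100, mean = 100, sd = 15)

# Same upper-tail probability after standardising: z = (120 - 100) / 15
p.sup.120.z <- 1 - pnorm((120 - 100) / 15)

round(c(p.sup.120, p.inf.100), 3)
```

The first value is the 9.1% shown on the plot; the second is 50%, since 100 is the mean of a symmetric distribution.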


IQ (Solution) II

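[Figure: density of N(100, 15²) with the area above x = 120 shaded and labelled 9.1%; axes x and y]

curve(dnorm(x, mean = 100, sd = 15), 20, 180)
abline(v = 120, col = "red")
# The shaded tail is P(IQ > 120), about 9.1%
text(x = 130, y = 0.01, "9.1%")

[Figure: base R plot of dnorm(x, mean = 100, sd = 15) for x from 20 to 180, with a red vertical line at x = 120 and the tail probability as a text label]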


Bias of the maximum likelihood estimator of the variance

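Show that the maximum likelihood estimator of the variance is biased and propose an unbiased estimator.

Solution

Since $E[x_i^2] = \sigma^2 + \mu^2$ and $E[\bar{x}^2] = \frac{\sigma^2}{n} + \mu^2$,

$$E[\hat{\sigma}^2_{ml}] = E\left[\frac{1}{n}\sum_i x_i^2 - \bar{x}^2\right] = \sigma^2 + \mu^2 - \frac{\sigma^2}{n} - \mu^2 = \frac{n-1}{n}\,\sigma^2$$

The estimator is therefore biased, and $\frac{n}{n-1}\,\hat{\sigma}^2_{ml} = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2$ is an unbiased estimator.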
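The bias can also be checked numerically. The following is a small simulation sketch, not part of the original correction; the sample size, mean, true variance, number of replications and seed are all arbitrary values assumed for the illustration.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
n <- 5             # a small sample size makes the bias visible
sigma2 <- 4        # true variance (assumed value)
n.rep <- 20000     # number of simulated samples (assumed value)

# One simulated sample of size n per row
samples <- matrix(rnorm(n * n.rep, mean = 10, sd = sqrt(sigma2)), nrow = n.rep)

# Maximum likelihood estimator: divides by n
var.ml <- apply(samples, 1, function(x) mean(x^2) - mean(x)^2)
# Unbiased estimator: divides by n - 1 (this is what var() computes)
var.unbiased <- apply(samples, 1, var)

c(ml = mean(var.ml), expected.ml = (n - 1) / n * sigma2,
  unbiased = mean(var.unbiased), expected.unbiased = sigma2)
```

The average of the maximum likelihood estimates should be close to (n − 1)/n · σ², while the average of `var()` should be close to σ².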


Extreme values

Consider the Fisher irises. Find the flowers whose measured widths and lengths are exceptionally large or small.


Solution I

library(dplyr)
library(tidyr)
library(ggplot2)

data(iris)

# Mean, standard deviation and the interval mean ± 2 sd of each measurement
parameters <- as_tibble(iris) %>%
  select(-Species) %>%
  gather(factor_key = TRUE) %>%   # factor_key = TRUE keeps the column order of iris
  group_by(key) %>%
  summarise(mean = mean(value), sd = sd(value)) %>%
  mutate(min = mean - 2 * sd, max = mean + 2 * sd)

# A flower is flagged when at least one of its four measurements lies outside mean ± 2 sd
flower.outliers <- (apply(t((t(iris[, 1:4]) < parameters$min) +
                            (t(iris[, 1:4]) > parameters$max)), 1, sum) > 0)

# Flagged flowers are drawn with larger points
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(colour = as.numeric(iris$Species), size = flower.outliers * 2 + 1)
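The same rule (a measurement outside mean ± 2 standard deviations) can be written more compactly in base R with standardised scores. This is an equivalent formulation offered as a sketch, not the method used on the slide; `X`, `z` and `flower.outliers.z` are names introduced here for illustration.

```r
data(iris)
X <- as.matrix(iris[, 1:4])

# scale() centres each column and divides it by its standard deviation,
# so |z| > 2 is the same test as x < mean - 2*sd or x > mean + 2*sd
z <- scale(X)
flower.outliers.z <- rowSums(abs(z) > 2) > 0

sum(flower.outliers.z)       # number of flagged flowers
iris[flower.outliers.z, ]    # the flagged flowers themselves
```

Note that the rule treats each variable separately and pools the three species, so it flags marginal extremes rather than multivariate ones.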


Solution II

[Figure 1: Fisher's irises. Sepal.Width plotted against Sepal.Length]


Equiprobability Ellipses I

Generate 1000 observations of a two-dimensional normal distribution $\mathcal{N}(\mu, \Sigma)$ with

$$\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 0.75 \end{pmatrix}, \qquad \mu^t = (0, 0)$$

Draw the equiprobability ellipses for the multiples of 5%.


Solution I

Let $x_1, \dots, x_p$ be i.i.d. variables following $\mathcal{N}(0, 1)$; then $x = (x_1, \dots, x_p)^t \sim \mathcal{N}_p(0, I_p)$.

Find a matrix $A$ of size $(p, p)$ such that $Ax$ has variance $\Sigma$, i.e. $AA' = \Sigma$. Several solutions are possible:
- Cholesky: $\Sigma = T'T$ where $T$ is triangular ($A = T'$)
- SVD: $\Sigma = UDU'$ where $D$ is a diagonal matrix of eigenvalues and $U$ an orthogonal matrix of eigenvectors ($A = UD^{1/2}$)

Then $y = Ax + \mu \sim \mathcal{N}_p(\mu, \Sigma)$.

If $x \sim \mathcal{N}_p(\mu, \Sigma)$ then $y = \Sigma^{-1/2}(x - \mu) \sim \mathcal{N}_p(0, I_p)$ and

$$Q = y^t y \sim \chi^2_p.$$

The equation

$$P(Q \le q) = \alpha$$

with $q = \chi^2_{p,\alpha}$ defines an $\alpha$-level equiprobability ellipsoid.
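As a quick check of the two factorisations, the sketch below builds $A$ both ways for the $\Sigma$ of the exercise and verifies $AA' = \Sigma$. The names `A.chol` and `A.eig`, the seed and the use of `eigen()` for the decomposition $\Sigma = UDU'$ are choices made here for illustration.

```r
sigma <- matrix(c(2, 1, 1, 0.75), 2, 2)

# Cholesky: chol() returns the upper triangular T with sigma = T'T, so A = T'
A.chol <- t(chol(sigma))

# Spectral decomposition: sigma = U D U', so A = U D^(1/2)
e <- eigen(sigma, symmetric = TRUE)
A.eig <- e$vectors %*% diag(sqrt(e$values))

# Both factors satisfy A A' = sigma (up to rounding error)
all.equal(A.chol %*% t(A.chol), sigma)
all.equal(A.eig %*% t(A.eig), sigma)

# Simulate y = A x + mu with x ~ N_p(0, I_p); observations are stored as rows,
# so each row is multiplied by A' on the right
set.seed(42)                        # arbitrary seed
mu <- c(0, 0)
x <- matrix(rnorm(2000), 1000, 2)
y <- sweep(x %*% t(A.chol), 2, mu, "+")
cov(y)                              # should be close to sigma
```

The two matrices `A.chol` and `A.eig` differ, which illustrates that the solution of $AA' = \Sigma$ is not unique.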


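Solution II

par(mfrow = c(1, 3))   # split the display into 3 panels

# Chi-square quantiles (2 degrees of freedom) for the levels 5%, 15%, ..., 95%
Q <- qchisq(p = seq(0.05, 0.95, by = 0.1), df = 2)
sigma <- matrix(c(2, 1, 1, 0.75), 2, 2)

# chol(sigma) is the triangular T with sigma = T'T; the rows of Y follow N(0, sigma)
Y <- matrix(rnorm(2000), 1000, 2) %*% chol(sigma)
plot(Y, xlab = "x", ylab = "y", pch = '.')

x <- seq(-4, 4, length = 100)
y <- seq(-4, 4, length = 100)
sigmainv <- solve(sigma)
a <- sigmainv[1, 1]
b <- sigmainv[2, 2]
c <- sigmainv[1, 2]

# Quadratic form (x, y) sigma^(-1) (x, y)' evaluated on the grid
z <- outer(x, y, function(x, y) (a * x^2 + b * y^2 + 2 * c * x * y))
image(x, y, z)
contour(x, y, z, col = "blue4", levels = Q,
        labels = seq(from = 0.05, to = 0.95, by = 0.1), add = TRUE)

# Density: (2 pi)^(-1) det(sigma)^(-1/2) exp(-z / 2)
persp(x, y, 1 / (2 * pi) * det(sigma)^(-1/2) * exp(-0.5 * z),
      col = "cornflowerblue", theta = 5, phi = 10, zlab = "f(x)")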
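One way to check that the contour levels are the intended ones is to count the simulated points falling inside each ellipse. The sketch below does this with `mahalanobis()` from base R; the larger sample size, the seed and the names `alpha`, `q` and `coverage` are assumptions made for the illustration.

```r
set.seed(123)                                  # arbitrary seed
sigma <- matrix(c(2, 1, 1, 0.75), 2, 2)
alpha <- seq(0.05, 0.95, by = 0.1)             # same levels as the contour plot
q <- qchisq(alpha, df = 2)

# Larger sample than in the exercise, to reduce Monte Carlo noise
Y <- matrix(rnorm(20000), 10000, 2) %*% chol(sigma)

# Squared Mahalanobis distance Q = y' sigma^(-1) y of each point to the origin
Q <- mahalanobis(Y, center = c(0, 0), cov = sigma)

# Proportion of points inside each ellipse, next to the nominal level
coverage <- sapply(q, function(qi) mean(Q <= qi))
round(cbind(level = alpha, coverage = coverage), 3)
```

Each empirical proportion should be close to its nominal level, up to sampling error.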


Solution III

[Figure 2: Equiprobability ellipsoids in the plane. Three panels: scatter plot of the simulated sample (axes x, y); image of the quadratic form with contour lines labelled 0.05 to 0.95 (axes x, y); perspective plot of the density f(x)]


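Limit between two bidimensional Gaussians

Simulate two multivariate Gaussian densities in 2D with respective mean vectors $\mu_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\mu_2 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$:

a. With the same covariance matrix $\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 0.75 \end{pmatrix}$

b. With different covariance matrices $\Sigma_1 = \begin{pmatrix} 2 & 1 \\ 1 & 0.75 \end{pmatrix}$ and $\Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

Consider a mixture of the two densities in proportions $\pi$, $1 - \pi$ and draw the boundary between the two posterior probabilities (the set of points where the probabilities of being drawn from each component are equal) for different values of $\pi$.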


Correction

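The distribution is a mixture

$$f(x) = \pi f_1(x) + (1 - \pi) f_2(x).$$

The posterior probability of the first class is

$$p(k = 1 \mid x) = \frac{\pi f_1(x)}{f(x)}$$

and likewise $p(k = 2 \mid x) = \frac{(1 - \pi) f_2(x)}{f(x)}$. The equation to use for the contour line is

$$\log p(k = 1 \mid x) = \log p(k = 2 \mid x),$$

that is, $\log \pi + \log f_1(x) = \log(1 - \pi) + \log f_2(x)$.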
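The contour equation can be drawn by evaluating the log posterior ratio on a grid and asking `contour()` for the level 0. The code below is one possible sketch in base R, not the author's correction; the helper functions `log.dmvnorm` and `log.ratio`, the grid limits and the values of π in `props` are hypothetical choices made for the illustration.

```r
# Log-density of N_2(mu, sigma) at each row of the matrix x (base R only)
log.dmvnorm <- function(x, mu, sigma) {
  p <- length(mu)
  -0.5 * (p * log(2 * pi) + log(det(sigma)) + mahalanobis(x, mu, sigma))
}

# Log posterior ratio of class 1 against class 2; the boundary is where it equals 0
log.ratio <- function(x, prop, mu1, sigma1, mu2, sigma2) {
  (log(prop) + log.dmvnorm(x, mu1, sigma1)) -
    (log(1 - prop) + log.dmvnorm(x, mu2, sigma2))
}

mu1 <- c(0, 0)
mu2 <- c(2, 2)
sigma1 <- matrix(c(2, 1, 1, 0.75), 2, 2)
sigma2 <- diag(2)

gx <- seq(-4, 6, length.out = 200)
gy <- seq(-4, 6, length.out = 200)
grid <- as.matrix(expand.grid(x = gx, y = gy))   # x varies fastest

par(mfrow = c(1, 2))
props <- c(0.2, 0.5, 0.8)   # assumed values of pi, for illustration
for (case in c("a", "b")) {
  # Case a: both components share sigma1; case b: second component uses sigma2
  s2 <- if (case == "a") sigma1 else sigma2
  plot(NA, xlim = range(gx), ylim = range(gy), xlab = "x", ylab = "y",
       main = paste("Case", case))
  points(rbind(mu1, mu2), pch = 3)               # the two mean vectors
  for (i in seq_along(props)) {
    z <- matrix(log.ratio(grid, props[i], mu1, sigma1, mu2, s2), length(gx))
    contour(gx, gy, z, levels = 0, labels = paste0("pi = ", props[i]),
            col = i, add = TRUE)
  }
}
```

With a common covariance matrix the quadratic terms cancel and the boundary is a straight line that shifts with π; with different covariance matrices the boundary is a conic.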
