Algebraic Statistics Seminar, November 14th 2018 Stéphanie...

Statistics Primer

Algebraic Statistics Seminar, November 14th 2018

Stéphanie van der Pas

Overview

- Statistical models
- Estimation
- Hypothesis testing
- Bayesian statistics

Statistical inference

Goal: infer properties of a population, based on a sample.

Statistical models

Def: A statistical model is a collection of probability distributions or density functions on a given outcome space.

Def: A parametric statistical model is a collection of probability distributions or density functions that can be described with a finite number of parameters. Notation: $\{p_\theta : \theta \in \Theta\}$, where $\Theta$ is the parameter space.

Example of a parametric model

$X_{mi}$ = duration in minutes of telephone call between employee $m$ and customer $i$.

Assumption: for every $m$, $X_{m1}, X_{m2}, \ldots, X_{m n_m}$ i.i.d. $\mathrm{Exp}(\lambda_m)$

Model: $\{p_\lambda : p_\lambda(x) = \lambda e^{-\lambda x},\ x \geq 0,\ \lambda > 0\}$

Outcome space: $[0, \infty)$

Parameter space: $(0, \infty)$

[Figure: histograms of call duration (minutes against count) for employee 1 and employee 2]

An i.i.d. sample

Def: If $X_1, X_2, \ldots, X_n$ each have the same probability distribution, say with distribution or density $p_\theta$, and are mutually independent, then $D = X_1, X_2, \ldots, X_n$ is an independent and identically distributed sample, often abbreviated as $X_1, X_2, \ldots, X_n$ i.i.d.

Then the joint distribution of $X_1, X_2, \ldots, X_n$ is determined by the marginal distribution:

$p_\theta(x_1, \ldots, x_n) = \prod_{i=1}^n p_\theta(x_i)$
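The factorization of the joint density for an i.i.d. sample can be illustrated with a short Python sketch. It assumes the exponential model from the telephone example; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math

def exp_density(x, lam):
    """Marginal density of Exp(lam) at x (zero outside the outcome space)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def joint_density(xs, lam):
    """Joint density of an i.i.d. Exp(lam) sample: the product of the marginals."""
    return math.prod(exp_density(x, lam) for x in xs)

sample = [0.5, 1.2, 3.0]  # hypothetical call durations in minutes
print(joint_density(sample, lam=0.8))
```

For the exponential model the product collapses to $\lambda^n e^{-\lambda \sum_i x_i}$, so the joint density depends on the data only through the sum of the observations.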

Nonparametric and semi-parametric statistical models

Def: A statistical model is nonparametric if it cannot be parametrized by a finite number of parameters.

Ex: The collection of all distributions with mean equal to zero.

Def: A statistical model is semi-parametric if it has both a parametric and a nonparametric component.

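Ex: The Cox-model, which contains all densities with hazard functions of the form

$\lambda(t \mid X = x) = \lambda(t) e^{\beta^T x} = \lambda(t) e^{\sum_{j=1}^p \beta_j x_j}$.

Here $\beta$ is the parametric component and the baseline hazard $\lambda(t)$ is the nonparametric component.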

Estimation

Parameter estimation

[Figure: three example density plots, annotated by hand as a normal, an exponential and a $\mathrm{Gamma}(\alpha, \beta)$ density]

Estimator / statistic

Model $\{p_\theta : \theta \in \Theta\}$, data $X$.

Def: An estimator / statistic is a stochastic vector $\hat\theta(X)$ that only depends on the data and known quantities. The corresponding estimate after observing $x$ is $\hat\theta(x)$.

E.g. $X_1, \ldots, X_n$ i.i.d. $\mathrm{Exp}(\lambda)$:

- $T(X_1, \ldots, X_n) = \overline{X}$ is an estimator.
- $T(X_1, \ldots, X_n) = \lambda$ is not an estimator, because it depends on the unknown parameter.

Estimation - method of moments

If $X_1, X_2, \ldots$ are i.i.d. with finite mean $\mu$, then:

$\mathbb{P}\left( \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i = \mu \right) = 1.$
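A small simulation can make the law of large numbers concrete. The sketch below is only an illustration: it assumes exponentially distributed draws with mean `mu = 2.0` and a fixed seed, neither of which comes from the slides.

```python
import random

def sample_mean(n, mu=2.0, seed=0):
    """Sample mean of n i.i.d. exponential draws with mean mu."""
    rng = random.Random(seed)
    # expovariate takes the rate parameter, which is 1 / mean
    return sum(rng.expovariate(1.0 / mu) for _ in range(n)) / n

# The sample mean should settle near mu as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```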

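Moments

Def: The $j$th moment of a random variable $X$, with distribution dependent on $\theta$, is $\mathbb{E}_\theta[X^j]$, if it exists.

Ex, for $X$ with density $p_\theta$:

- first moment: $\mathbb{E}_\theta[X] = \int x \, p_\theta(x) \, dx$
- second moment: $\mathbb{E}_\theta[X^2] = \int x^2 \, p_\theta(x) \, dx$
- third moment: $\mathbb{E}_\theta[X^3] = \int x^3 \, p_\theta(x) \, dx$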

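Empirical moments

Def: The $j$th empirical moment of i.i.d. random variables $X_1, X_2, \ldots, X_n$ is

$\overline{X^j} = \frac{1}{n} \sum_{i=1}^n (X_i)^j$

Ex:

- first empirical moment: $\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i$
- second empirical moment: $\overline{X^2} = \frac{1}{n} \sum_{i=1}^n (X_i)^2$
- third empirical moment: $\overline{X^3} = \frac{1}{n} \sum_{i=1}^n (X_i)^3$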
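The empirical moments are straightforward to compute. A minimal sketch, with a hypothetical function name and made-up data:

```python
def empirical_moment(xs, j):
    """j-th empirical moment: the average of the j-th powers of the observations."""
    return sum(x ** j for x in xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical observations
for j in (1, 2, 3):
    print(j, empirical_moment(data, j))
```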

Method of moments

Let $\theta \in \mathbb{R}^k$, $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$.

1. Compute the moments, starting from the first, until you have $k$ moments depending on $\theta$.

2. Replace the moments by their empirical counterparts and replace the parameters by their estimators, and solve the system.

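Example 1: exponential distribution

$X_1, X_2, \ldots, X_n$ i.i.d. $\mathrm{Exp}(\lambda)$, $p_\lambda(x) = \lambda e^{-\lambda x}$, $\mathbb{E}[X_i] = \frac{1}{\lambda}$

[Figure: plot of a density on the range 0 to 8]

1. $\mathbb{E}_\lambda[X] = \frac{1}{\lambda}$

2. $\overline{X} = \frac{1}{\hat\lambda}$, so $\hat\lambda = \frac{1}{\overline{X}}$, i.e. $\hat\lambda(X_1, \ldots, X_n) = \frac{1}{\overline{X}}$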
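The two steps of the method of moments for the exponential model can be checked on simulated data. This is a sketch under assumptions: the true rate `true_lam = 0.5`, the sample size and the seed are arbitrary illustrative choices.

```python
import random

def mom_exponential(xs):
    """Method-of-moments estimate of lambda: solve sample mean = 1 / lambda."""
    return len(xs) / sum(xs)

rng = random.Random(1)
true_lam = 0.5  # hypothetical true rate, used only to generate data
xs = [rng.expovariate(true_lam) for _ in range(50_000)]
print(mom_exponential(xs))
```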

Example 2: uniform distribution

$X_1, X_2, \ldots, X_n$ i.i.d. $\mathrm{Unif}[0, \theta]$, $p_\theta(x) = \frac{1}{\theta} \mathbb{1}\{0 \leq x \leq \theta\}$
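For the uniform model the first moment is $\mathbb{E}_\theta[X] = \theta / 2$, so matching it with the sample mean suggests the estimator $\hat\theta = 2\overline{X}$. The sketch below works this out on simulated data; the value `theta = 3.0`, the seed and the function name are assumptions made for illustration.

```python
import random

def mom_uniform(xs):
    """Method-of-moments estimate of theta for Unif[0, theta]: twice the sample mean."""
    return 2 * sum(xs) / len(xs)

rng = random.Random(2)
theta = 3.0  # hypothetical true parameter, used only to generate data
xs = [rng.uniform(0, theta) for _ in range(50_000)]
print(mom_uniform(xs))

# The estimate can fall below the largest observation, which theta cannot.
small = [1.0, 1.0, 7.0]
print(mom_uniform(small), max(small))
```

The second printout shows a known weakness of this estimator: nothing forces $2\overline{X}$ to be at least as large as every observed value.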

Estimation - maximum likelihood

Def: Let $X$ be a stochastic vector with probability mass function or density $p_\theta$, which depends on a parameter $\theta \in \Theta$. The likelihood function is the function $L : \Theta \to \mathbb{R}$ given by

$L(\theta \mid x) = p_\theta(x)$.

Def: The log-likelihood function is the logarithm of the likelihood function, denoted by

$\ell(\theta \mid x) = \log L(\theta \mid x)$.

Def: The score function is the gradient of the log-likelihood function.

Example: binomial probability mass function

$X \sim \mathrm{Bin}(n, p)$, $p_p(x) = \mathbb{P}_p(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$

As a pmf, we view $p_p(x)$ as a function of $x$, for given $p$. E.g. with $n = 5$:

[Figure: $\mathbb{P}_{0.5}(X = x)$ and $\mathbb{P}_{0.7}(X = x)$ plotted against $x$]

Binomial likelihood

$X \sim \mathrm{Bin}(n, p)$, $p_p(x) = \mathbb{P}_p(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$

We view $p_p(x)$ as a function of $p$, for given $x$.

[Figure: $L(p; x = 1)$ and $L(p; x = 3)$ plotted against $p$]

The MLE

Def: The maximum likelihood estimator (MLE) for $\theta$ is the maximizer of the likelihood function.

[Figure: $L(p; x = 1)$ and $L(p; x = 3)$ plotted against $p$, with the maximizers marked]
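To see the maximizer numerically, one can evaluate the binomial likelihood on a grid of values of $p$ and pick the largest. This is a rough sketch, not an efficient method: the grid search, the grid resolution and the function names are illustrative assumptions, with $n = 5$ as in the binomial example.

```python
import math

def binom_likelihood(p, x, n):
    """Binomial likelihood L(p; x) for x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def grid_mle(x, n, steps=1000):
    """Approximate maximizer of the likelihood over an evenly spaced grid on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: binom_likelihood(p, x, n))

for x in (1, 3):
    print(x, grid_mle(x, 5), x / 5)
```

The grid maximizer should agree with the closed-form binomial MLE $\hat p = x / n$ up to the grid resolution.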

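Example MLE

$X_1, X_2, \ldots, X_n$ i.i.d. $\mathrm{Exp}(\lambda)$

$L(\lambda) = \prod_{i=1}^n p_\lambda(x_i) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^n x_i}$

$\ell(\lambda) = n \log \lambda - \lambda \sum_{i=1}^n x_i$

$\ell'(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^n x_i$, $\ell''(\lambda) = -\frac{n}{\lambda^2} < 0$

$\ell'(\hat\lambda) = 0$ yields $\hat\lambda = \frac{n}{\sum_{i=1}^n x_i} = \frac{1}{\overline{x}}$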
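The closed-form MLE for the exponential model can be sanity-checked against the log-likelihood itself. In this sketch the data values and function names are hypothetical; the log-likelihood used is $\ell(\lambda) = n \log \lambda - \lambda \sum_i x_i$.

```python
import math

def log_likelihood(lam, xs):
    """Log-likelihood of an i.i.d. Exp(lam) sample."""
    return len(xs) * math.log(lam) - lam * sum(xs)

def mle_exponential(xs):
    """Closed-form maximizer: n divided by the sum of the observations."""
    return len(xs) / sum(xs)

xs = [0.4, 1.7, 2.3, 0.9]  # hypothetical observations
lam_hat = mle_exponential(xs)

# The log-likelihood should be lower on either side of lam_hat.
for lam in (0.5 * lam_hat, lam_hat, 1.5 * lam_hat):
    print(round(lam, 4), log_likelihood(lam, xs))
```

For this model the MLE coincides with the method-of-moments estimator $1 / \overline{X}$.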

Hypothesis testing

Van 't Veer et al. (2002), Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer, Nature 415, 530-536.

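Model and hypotheses

Model: $\{p_\theta : \theta \in \Theta\}$, with $\Theta_0 \subset \Theta$, $\Theta_1 = \Theta \setminus \Theta_0$

$H_0 : \theta \in \Theta_0$
$H_1 : \theta \in \Theta_1$

Common hypotheses if $\theta \in \mathbb{R}$:

- $H_0 : \theta = \theta_0$, $H_1 : \theta \neq \theta_0$
- $H_0 : \theta \leq \theta_0$, $H_1 : \theta > \theta_0$
- $H_0 : \theta \geq \theta_0$, $H_1 : \theta < \theta_0$

Typically, the aim is to reject $H_0$.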

Conclusions and errors

Possible conclusions:
- reject $H_0$
- do not reject $H_0$

Errors:

                     reject H0                  do not reject H0
H0 correct           type I error ($\alpha$)    ✓
H0 not correct       ✓                          type II error ($\beta$)

Ex: $H_0$: patient is not pregnant

Hypothesis testing - concept

If a patient has Klinefelter syndrome, then the patient has XXY chromosomes. ($H_0$)

If a genetic test indicates that a tall man with weak muscles does not have an extra X chromosome, (observation)

then the patient does not have Klinefelter syndrome. (we reject $H_0$)

Hypothesis testing - concept

If a patient has Klinefelter syndrome, then it is very unlikely that the patient is fertile.

If a tall man with weak muscles is fertile, then…?

If a tall man with weak muscles is fertile, then the patient probably does not have Klinefelter syndrome.

Statistical test and rejection region

A statistical test is given by a rejection region $K$.

$H_0$ is rejected if and only if the observation is in $K$.

Example

$X_1, X_2, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$, with $\sigma^2$ known

$H_0 : \mu \leq \mu_0$
$H_1 : \mu > \mu_0$

Test: reject $H_0$ if $\overline{X} \geq c$ for some $c \in \mathbb{R}$, i.e.

$K = \{(x_1, \ldots, x_n) \in \mathbb{R}^n : \overline{x} \geq c\}$

How to choose $c$?

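Size and power

A test is of size $\alpha$ if

$\sup_{\theta \in \Theta_0} \mathbb{P}_\theta(X \in K) = \alpha$

For $\theta \in \Theta_0$, the event $X \in K$ means that $H_0$ is rejected although it is correct: a type I error.

The power of a test is the probability of rejecting $H_0$ as a function of $\theta$:

$\theta \mapsto \mathbb{P}_\theta(X \in K)$

For $\theta \in \Theta_1$: $\mathbb{P}_\theta(X \in K) = 1 - \mathbb{P}_\theta(X \notin K) = 1 - \beta$, where $\beta$ is the type II error probability.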
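For the normal example with known variance, the size requirement pins down the threshold $c$. The sketch below assumes the standard fact that $\overline{X} \sim N(\mu, \sigma^2 / n)$, so that the rejection probability increases with $\mu$ and the supremum over $\mu \leq \mu_0$ is attained at $\mu_0$. The function names and the numbers ($\mu_0 = 0$, $\sigma = 1$, $n = 25$, $\alpha = 0.05$) are illustrative assumptions.

```python
import math
from statistics import NormalDist

def critical_value(mu0, sigma, n, alpha):
    """Threshold c with P_mu0(sample mean >= c) = alpha, giving a test of size alpha."""
    return mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / math.sqrt(n)

def power(mu, c, sigma, n):
    """P_mu(sample mean >= c), using that the sample mean is N(mu, sigma^2 / n)."""
    return 1 - NormalDist(mu, sigma / math.sqrt(n)).cdf(c)

mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05  # hypothetical settings
c = critical_value(mu0, sigma, n, alpha)
for mu in (-0.5, 0.0, 0.25, 0.5, 1.0):
    print(mu, round(power(mu, c, sigma, n), 4))
```

The printed values trace the power function: at most $\alpha$ under the null hypothesis and increasing towards 1 as $\mu$ moves away from $\mu_0$.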

Power

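[Figure: power function $\theta \mapsto \mathbb{P}_\theta(X \in K)$ plotted against $\theta$, with the ideal power curve and the level $\alpha$ marked]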

“p-Value Wars”

p-value

Intuition: the p-value is the probability of the observations, or even more unexpected ones, if $H_0$ is true.

General definition, relative to a collection of tests indexed by their size $\alpha$: smallest value of $\alpha$ for which $H_0$ would be rejected.

[Figure: rejection region $K$ and observation $x$]
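As an illustration of the general definition, consider the one-sided test for a normal mean with known variance, which rejects when the sample mean is large. Assuming $\overline{X} \sim N(\mu, \sigma^2 / n)$, the smallest size at which an observed mean leads to rejection is the upper tail probability under $\mu_0$. The function name and all numbers below are hypothetical.

```python
import math
from statistics import NormalDist

def p_value(xbar, mu0, sigma, n):
    """Upper tail probability of the observed mean under mu = mu0 (one-sided test)."""
    z = (xbar - mu0) * math.sqrt(n) / sigma
    return 1 - NormalDist().cdf(z)

mu0, sigma, n = 0.0, 1.0, 25  # hypothetical settings
for xbar in (0.0, 0.2, 0.4, 0.6):
    print(xbar, round(p_value(xbar, mu0, sigma, n), 4))
```

An observed mean exactly on the boundary of the size-$\alpha$ rejection region has p-value $\alpha$, which matches the "smallest $\alpha$" definition.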

p-value: special cases

Bayesian statistics

McGrayne (2012). The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy.

Bayesian statistics

The prior distribution of $\theta$ and the likelihood are combined through Bayes' rule into the posterior distribution of $\theta$.

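Bayes’ rule - discrete version

For events $A$ and $B$, such that $\mathbb{P}(B) \neq 0$:

$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(B \mid A) \mathbb{P}(A)}{\mathbb{P}(B)} = \frac{\mathbb{P}(B \mid A) \mathbb{P}(A)}{\mathbb{P}(B \mid A) \mathbb{P}(A) + \mathbb{P}(B \mid A^c) \mathbb{P}(A^c)}$

Here $\mathbb{P}(A)$ plays the role of the prior.

Example: $A$ = person has HIV, $B$ = HIV-test of a person is positive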
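The discrete version of Bayes' rule is easy to evaluate once three numbers are fixed. The values below (prevalence, sensitivity and false positive rate) are entirely hypothetical and serve only to show the mechanics; they are not taken from the slides or from real test data.

```python
def posterior_probability(prior, p_b_given_a, p_b_given_not_a):
    """P(A | B) via Bayes' rule with the law of total probability in the denominator."""
    numerator = p_b_given_a * prior
    denominator = numerator + p_b_given_not_a * (1 - prior)
    return numerator / denominator

prior = 0.001           # hypothetical P(A): person has the condition
sensitivity = 0.99      # hypothetical P(B | A): test positive given condition
false_positive = 0.01   # hypothetical P(B | A^c): test positive without condition

print(posterior_probability(prior, sensitivity, false_positive))
```

With a small prior, the posterior probability can stay well below one half even for an accurate test, because most positive results then come from the much larger group without the condition.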

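Bayes’ rule - continuous version

For a parametric model $\{p_\theta : \theta \in \Theta\}$ with prior density $\pi$ on $\Theta$:

$\pi(\theta \mid X = x) = \frac{p_\theta(x) \pi(\theta)}{\int_\Theta p_\theta(x) \pi(\theta) \, d\theta}$

Here $\pi(\theta \mid X = x)$ is the posterior distribution, $p_\theta(x)$ the likelihood, $\pi(\theta)$ the prior, and the denominator the marginal.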

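Example

Coin with probability of heads equal to $\theta$. We observe $x$ heads in $n$ coin tosses.

Prior on $\theta$: $\mathrm{Unif}[0, 1]$, so $\pi(\theta) = 1 \cdot \mathbb{1}\{0 \leq \theta \leq 1\}$

$p_\theta(x) = \binom{n}{x} \theta^x (1 - \theta)^{n - x}$ (binomial)

$\pi(\theta \mid X = x) = \frac{\binom{n}{x} \theta^x (1 - \theta)^{n - x} \mathbb{1}\{0 \leq \theta \leq 1\}}{\int \binom{n}{x} \theta^x (1 - \theta)^{n - x} \mathbb{1}\{0 \leq \theta \leq 1\} \, d\theta} = \frac{\theta^x (1 - \theta)^{n - x} \mathbb{1}\{0 \leq \theta \leq 1\}}{\int_0^1 \theta^x (1 - \theta)^{n - x} \, d\theta}$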

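Example

$\mathrm{Beta}(\alpha, \beta)$-density: $p_{\alpha, \beta}(x) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)} \mathbb{1}\{0 \leq x \leq 1\}$

$\pi(\theta \mid X = x) = \frac{\theta^x (1 - \theta)^{n - x}}{\int_0^1 \theta^x (1 - \theta)^{n - x} \, d\theta} \mathbb{1}\{0 \leq \theta \leq 1\}$

So the posterior is a $\mathrm{Beta}(x + 1, n - x + 1)$ distribution.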
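The conclusion that a uniform prior leads to a Beta posterior can be checked numerically. The sketch below computes the posterior density directly from Bayes' rule, approximating the marginal with a midpoint rule, and compares it with the $\mathrm{Beta}(x + 1, n - x + 1)$ density. The data ($x = 7$ heads in $n = 10$ tosses), the number of integration steps and the function names are assumptions made for illustration.

```python
import math

def beta_density(t, a, b):
    """Density of Beta(a, b) at t in (0, 1), normalized via the gamma function."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * t ** (a - 1) * (1 - t) ** (b - 1)

def posterior_numeric(t, x, n, steps=20_000):
    """Posterior density at t under a Unif[0, 1] prior, with a midpoint-rule marginal."""
    h = 1.0 / steps
    # The binomial coefficient cancels between numerator and denominator.
    marginal = h * sum(
        ((i + 0.5) * h) ** x * (1 - (i + 0.5) * h) ** (n - x) for i in range(steps)
    )
    return t ** x * (1 - t) ** (n - x) / marginal

x, n = 7, 10  # hypothetical data: 7 heads in 10 tosses
for t in (0.3, 0.6, 0.9):
    print(t, posterior_numeric(t, x, n), beta_density(t, x + 1, n - x + 1))
```

The two columns should agree up to the numerical integration error.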

What do we do with the posterior distribution?
