
Statistics - Lecture 04
Nicodème Paul, Faculté de médecine, Université de Strasbourg


Correlation

In many situations the objective in studying the joint behavior of two variables is to see whether they are related.

- Given n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, it is natural to speak of $X$ and $Y$ having a positive relationship if large x's are paired with large y's and small x's with small y's. Similarly, if large x's are paired with small y's and small x's with large y's, then a negative relationship between the variables is implied.
- Examples:
  - X: temperature and Y: body length in Drosophila
  - X: height and Y: weight
  - X: parent height and Y: child height
  - X: eye color and Y: hair color
  - X: color and Y: perceived taste


Correlation

From a population, we collect a random sample of 38 individuals. For each individual, we measure his or her height (mm) and his or her weight. The first ten observations are:

     Height   Weight
 1     1629     71.0
 2     1569     56.5
 3     1561     56.0
 4     1619     61.0
 5     1566     65.0
 6     1639     62.0
 7     1494     53.0
 8     1568     53.0
 9     1540     65.0
10     1530     57.0


Independence

Is the variable Height independent of the variable Weight?

No

Yes

I cannot tell



Correlation

- A bivariate data set consists of measurements or observations on two variables, X and Y.


Sample correlation

- The Pearson sample correlation coefficient r of a bivariate data set $(x_i, y_i)$, $i = 1, \ldots, n$, is given by:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
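As a quick illustration, the coefficient can be computed directly from this definition in R. A minimal sketch, using the first five height/weight pairs from the table above (any paired numeric vectors would do) and checking the result against the built-in cor() function:

    # Sketch: Pearson's r from the definition, on the first five pairs of the table above
    x <- c(1629, 1569, 1561, 1619, 1566)   # heights (mm)
    y <- c(71.0, 56.5, 56.0, 61.0, 65.0)   # weights

    r_manual <- sum((x - mean(x)) * (y - mean(y))) /
      sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

    r_builtin <- cor(x, y)   # base R computes the same quantity
    c(r_manual, r_builtin)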


Sample correlation

Some properties of the Pearson correlation coefficient r are the following:

- The value of r does not depend on the unit of measurement for either variable.
- The value of r does not depend on which of the two variables is labelled X.
- The value of r is between -1 and 1.
- A correlation coefficient of r = 1 occurs only when all the points in a scatterplot of the data lie exactly on a straight line that slopes upward. Similarly, r = -1 occurs only when all the points lie exactly on a downward-sloping line.


Examples of scatterplots

- Possible values: -1, -0.8, -0.4, 0, 0.4, 0.8, 1.


Examples of scatterplots with correlation coefficients


Hypothesis testing on ρ

- Let $X_i$ and $Y_i$ be jointly normal random variables.
- The sample correlation

$$R(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \, \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$

  is an estimator for $\rho$.
- Hypotheses: $H_0: \rho = 0$ versus $H_1: \rho \neq 0$.
- Test statistic:

$$T = \frac{R\sqrt{n - 2}}{\sqrt{1 - R^2}}$$

  Under $H_0$, $T$ follows a t distribution with $n - 2$ degrees of freedom.
- The decision is based on the critical value $t_{n-2,\,1-\alpha/2}$ or on the p-value $P_{H_0}(|T_{n-2}| > |t|)$.
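In R, this test is carried out by cor.test(), which reports the t statistic, its n - 2 degrees of freedom and the p-value. A minimal sketch, assuming the sample is stored in a data frame named scatdat with columns Height and Weight (the names that appear in the Shapiro-Wilk output later in the lecture):

    # Test H0: rho = 0 against H1: rho != 0 (Pearson correlation, two-sided by default)
    cor.test(scatdat$Height, scatdat$Weight)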


Example - linear relationship

- As $r = 0.62$ and $n = 38$, the value of the t statistic is:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = 4.74$$

- For $\alpha = 0.05$, $t_{36,\,0.975} = 2.028$. As $4.74 > 2.028$, we reject the null hypothesis, meaning that there is strong evidence that $\rho$, the population parameter, is significantly different from $0$.
- Note that you cannot find this critical value, nor calculate the p-value, using the t table provided: that table has no entry for the t distribution with 36 degrees of freedom.
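The critical value and the exact p-value are easy to obtain in R; a sketch of the computation on this slide:

    r <- 0.62; n <- 38
    t_obs <- r * sqrt(n - 2) / sqrt(1 - r^2)   # about 4.74

    qt(0.975, df = n - 2)              # critical value t_{36, 0.975}, about 2.028
    2 * pt(-abs(t_obs), df = n - 2)    # two-sided p-value, far below 0.05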


If a significant sample correlation coefficient between two variables X and Y is observed, what could be its meaning?

X causes Y

Y causes X

Some third factor, either directly or indirectly, causes both X and Y

An unlikely event has occurred and a large sample correlation coe�cient has beengenerated by chance from a population in which X and Y are, in fact, not correlated

The correlation is purely nonsensical, a situation that may arise when measurements of X and Y are not taken on a common unit of association



Goodness of fit - checking for normality

- A normal probability plot is a scatterplot of the (normal score $z_i$, observed value $x_i$) pairs for $i = 1, 2, \ldots, n$, where $x_1 \leq x_2 \leq \cdots \leq x_n$ are the ordered observations. The empirical cumulative distribution is defined as:

$$P(X \leq x_i) = \begin{cases} 1 - 0.5^{1/n} & \text{if } i = 1 \\ 0.5^{1/n} & \text{if } i = n \\ \dfrac{i - 0.3175}{n + 0.365} & \text{otherwise} \end{cases}$$

- The scores $z_i$ are found such that $P(Z \leq z_i) = P(X \leq x_i)$, where $Z$ is a standard normal variable.
- In software packages, the graphical representation can be obtained using a quantile-quantile plot.
- A strong linear pattern in a normal probability plot suggests that population normality is plausible. On the other hand, systematic departure from a straight-line pattern (such as curvature in the plot) indicates that it is not reasonable to assume that the population distribution is normal.
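A sketch of this construction in R, assuming the heights are available as scatdat$Height; the last line shows the usual shortcut via qqnorm()/qqline(), which produces an essentially equivalent plot.

    x <- sort(scatdat$Height)                  # ordered observations
    n <- length(x)

    p <- (seq_len(n) - 0.3175) / (n + 0.365)   # empirical CDF values defined above
    p[1] <- 1 - 0.5^(1/n)
    p[n] <- 0.5^(1/n)
    z <- qnorm(p)                              # normal scores: P(Z <= z_i) = p_i

    plot(z, x, xlab = "Normal score", ylab = "Observed value")

    qqnorm(scatdat$Height); qqline(scatdat$Height)   # standard quantile-quantile plot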


Checking for normality


Assess normality with the Shapiro-Wilk test

- Hypotheses: $H_0: L(X) = N$ versus $H_1: L(X) \neq N$.

Shapiro-Wilk normality test
data:  rnorm(50, mean = 1, sd = 3)
W = 0.97627, p-value = 0.4073

Shapiro-Wilk normality test
data:  rf(50, df1 = 3, df2 = 2)
W = 0.67196, p-value = 2.687e-09
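The two results above can be reproduced (up to random sampling, since no seed is fixed) with:

    shapiro.test(rnorm(50, mean = 1, sd = 3))   # normal sample: large p-value, H0 not rejected
    shapiro.test(rf(50, df1 = 3, df2 = 2))      # F-distributed sample: normality clearly rejected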


Assess normality with the Shapiro-Wilk test

- Hypotheses: $H_0: L(X) = N$ versus $H_1: L(X) \neq N$.

Shapiro-Wilk normality test
data:  scatdat$Height
W = 0.93336, p-value = 0.02562

Shapiro-Wilk normality test
data:  scatdat$Weight
W = 0.95284, p-value = 0.1105


Dealing with non-normal data

- Transform the data with the log or the square root function.
- The Wilcoxon signed rank test can be used in place of a 1-sample or paired t-test.
- The Wilcoxon rank sum test substitutes for the 2-sample t-test.
- Bootstrapping (testing for a population mean); see the sketch after this list:
  1. Draw a new sample of n observations, with replacement, from the original set of n.
  2. Calculate the mean of this new sample of n.
  3. Repeat steps (1) and (2) an arbitrarily large number of times, say 5000 times.
  4. Use the estimated distribution of the sample mean for inference.
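A minimal bootstrap sketch in R for the mean, assuming the observations are in a numeric vector; here the Weight column is used purely for illustration.

    x <- scatdat$Weight
    B <- 5000
    boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

    quantile(boot_means, c(0.025, 0.975))   # percentile 95% confidence interval for the mean
    hist(boot_means)                        # estimated sampling distribution of the sample mean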


Goodness of fit

Suppose data is generated from an experiment, giving the following frequencies for the values of $X$:

X              1      2     ...    k
Frequencies   n_1    n_2    ...   n_k

We want to test the hypotheses:

$$H_0: L(X) = (p_1, p_2, \ldots, p_k) \quad \text{versus} \quad H_1: L(X) \neq (p_1, p_2, \ldots, p_k)$$

with a significance level $\alpha$.


Goodness of fit - Genetics

Consider two different characteristics of tomatoes: leaf shape and plant size. The leaf shape may be potato-leafed or cut-leafed, and the plant may be tall or dwarf.


Goodness of fit - Genetics

Under Mendelian inheritance with independent assortment, a dihybrid cross gives the classical 9:3:3:1 ratio of phenotypes:

- Tall cut-leaf (TTCC, TTCc, TtCc, TtCC): 9/16; Tall potato-leaf (TTcc, Ttcc): 3/16
- Dwarf cut-leaf (ttCC, ttCc): 3/16; Dwarf potato-leaf (ttcc): 1/16


Notation

- $X = \{1, 2, 3, 4\}$, with tall cut-leaf: 1, tall potato-leaf: 2, dwarf cut-leaf: 3, dwarf potato-leaf: 4.
- Given a categorical random variable $X$ with k possible values (k different levels, categories or cells):

$$L(X) = P_0 = (p_1, p_2, \ldots, p_k)$$

- $p_i$ is the true proportion of category $i$, and we have $p_1 + p_2 + \cdots + p_k = 1$.


Goodness of fit

Under the null hypothesis, given a sample size n, the expected count for category i is $n \times p_i$. We can define the goodness-of-fit statistic KH as:

$$KH = \sum_{i=1}^{k} \frac{(n_i - n \times p_i)^2}{n \times p_i}$$

Expressed differently, referring to each category as a cell, we write:

$$KH = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$$

Under $H_0$, with additional conditions such as $n \geq 50$ and $n p_i \geq 5$ for $i = 1, \ldots, k$, we have:

$$L(KH) \approx \chi^2_{k-1-d}$$

where $d$ is the number of estimated parameters in the null hypothesis.


Goodness of fit - Example

Suppose we perform an experiment with tomato plants, and we observe the following data (n = 1611):

              Tall cut-leaf   Tall potato-leaf   Dwarf cut-leaf   Dwarf potato-leaf
Frequencies             926                288              293                 104

We want to test:

$$H_0: L(X) = \left(\tfrac{9}{16}, \tfrac{3}{16}, \tfrac{3}{16}, \tfrac{1}{16}\right) \quad \text{versus} \quad H_1: L(X) \neq \left(\tfrac{9}{16}, \tfrac{3}{16}, \tfrac{3}{16}, \tfrac{1}{16}\right)$$

with a level of significance $\alpha = 0.01$.


Goodness of fit

The value of the test statistic is:

$$kh = \frac{(926 - 906.1875)^2}{906.1875} + \frac{(288 - 302.0625)^2}{302.0625} + \frac{(293 - 302.0625)^2}{302.0625} + \frac{(104 - 100.6875)^2}{100.6875} \approx 1.47$$

We verify that the conditions hold: $n = 1611 \geq 50$ and every expected count $n \times p_i$ is at least 5.


Goodness of fit

For a significance level $\alpha$, we calculate the critical value $c_\alpha$ such that $P(\chi^2_{k-1-d} > c_\alpha) = \alpha$. We fail to reject $H_0$ if $kh < c_\alpha$.

In the example, $kh \approx 1.47$, $d = 0$, $\alpha = 0.01$, and the number of degrees of freedom is $k - 1 - d = 3$.

The critical value is 11.345. Therefore, we fail to reject the null hypothesis. The p-value is approximately 0.69.
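The whole computation can be reproduced with chisq.test(); a sketch:

    obs <- c(926, 288, 293, 104)            # tall cut, tall potato, dwarf cut, dwarf potato
    chisq.test(obs, p = c(9, 3, 3, 1) / 16)
    # X-squared is about 1.47 with df = 3 and p-value about 0.69:
    # do not reject H0 at alpha = 0.01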


Contingency table

- A rectangular table used to summarize a categorical data set; two-way tables are used to compare several populations on the basis of a categorical variable or to determine if an association exists between two categorical variables.
- Example of a contingency table (the Family/Alone versus Present/Absent data analysed below):

            Present   Absent   Total
  Family         40       60     100
  Alone         100       60     160
  Total         140      120     260


Review

Given two discrete random variables X and Y, the joint distribution is defined by $p_{ij} = P(X = x_i, Y = y_j)$:

X \ Y    y_1    y_2   ...   y_m
x_1     p_11   p_12   ...   p_1m
x_2     p_21   p_22   ...   p_2m
...      ...    ...   ...    ...
x_l     p_l1   p_l2   ...   p_lm

- The marginal distribution of X: $P(X = x_i) = p_{i.} = \sum_{j=1}^{m} p_{ij}$
- The marginal distribution of Y: $P(Y = y_j) = p_{.j} = \sum_{i=1}^{l} p_{ij}$
- If X and Y are independent: $p_{ij} = p_{i.} \times p_{.j}$


Testing independence

- The sample data observed from $(X, Y)$ is a contingency table:

X \ Y    y_1    y_2   ...   y_m
x_1     n_11   n_12   ...   n_1m
x_2     n_21   n_22   ...   n_2m
...      ...    ...   ...    ...
x_l     n_l1   n_l2   ...   n_lm

- The marginal frequencies for X: $n_{i.} = \sum_{j=1}^{m} n_{ij}$
- The marginal frequencies for Y: $n_{.j} = \sum_{i=1}^{l} n_{ij}$
- If X and Y are independent, the expected count in cell (i, j) is $\dfrac{n_{i.} \, n_{.j}}{n}$


Testing independence

- $H_0$: X and Y are independent
- $H_1$: X and Y are not independent
- Under $H_0$: $p_{ij} = p_{i.} \times p_{.j}$, with $i = 1, 2, \ldots, l$ and $j = 1, 2, \ldots, m$.
- The number of parameters to estimate is $(l - 1) + (m - 1)$.
- Test statistic:

$$KH = \sum_{i=1}^{l} \sum_{j=1}^{m} \frac{\left(n_{ij} - \dfrac{n_{i.} n_{.j}}{n}\right)^2}{\dfrac{n_{i.} n_{.j}}{n}}$$

- Under $H_0$, when $n \geq 50$ and $\dfrac{n_{i.} n_{.j}}{n} \geq 5$ for all i, j:

$$L(KH) \approx \chi^2_{(l-1)(m-1)}$$

  (indeed, $k - 1 - d = lm - 1 - [(l - 1) + (m - 1)] = (l - 1)(m - 1)$).


Example

Let X be a random variable with values F (Family) and S (Alone), and Y another random variable with values P (Present) and A (Absent). From the sample of n = 260 observations, we can estimate:

$$P(F, P) \approx \tfrac{40}{260}, \quad P(F, A) \approx \tfrac{60}{260}, \quad P(S, P) \approx \tfrac{100}{260}, \quad P(S, A) \approx \tfrac{60}{260}$$

$$P(F) \approx \tfrac{100}{260}, \quad P(S) \approx \tfrac{160}{260}, \quad P(P) \approx \tfrac{140}{260}, \quad P(A) \approx \tfrac{120}{260}$$


Example

X and Y are independent if:

$$P(X = x, Y = y) = P(X = x) \times P(Y = y)$$

- $H_0$: X and Y are independent
- $H_1$: X and Y are not independent


Example

Under independence, the expected joint probabilities are the products of the marginals:

$$P(F) \times P(P) \approx \tfrac{100}{260} \times \tfrac{140}{260}, \qquad P(F) \times P(A) \approx \tfrac{100}{260} \times \tfrac{120}{260}$$

$$P(S) \times P(P) \approx \tfrac{160}{260} \times \tfrac{140}{260}, \qquad P(S) \times P(A) \approx \tfrac{160}{260} \times \tfrac{120}{260}$$


Example

$$\text{expected cell count} = \frac{(\text{row marginal total}) \times (\text{column marginal total})}{\text{grand total}}$$

$$df = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$

$$kh = \frac{(40 - 53.85)^2}{53.85} + \frac{(60 - 46.15)^2}{46.15} + \frac{(100 - 86.15)^2}{86.15} + \frac{(60 - 73.85)^2}{73.85} \approx 12.54$$

The $\chi^2$ distribution has $(2 - 1) \times (2 - 1) = 1$ degree of freedom. For $\alpha = 0.05$, the critical value is $c_\alpha = 3.841$. Since $12.54 > 3.841$, we reject the null hypothesis.
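A sketch of the same test in R; the continuity correction must be switched off for a 2 x 2 table in order to reproduce the value of KH above.

    tab <- matrix(c(40, 100,    # first column: Present
                    60,  60),   # second column: Absent
                  nrow = 2,
                  dimnames = list(X = c("Family", "Alone"),
                                  Y = c("Present", "Absent")))

    chisq.test(tab, correct = FALSE)
    # X-squared is about 12.54 with df = 1 and p-value about 4e-04: reject H0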


Can she taste the difference?

Tea-tasting experiment: a woman claimed to be able to judge whether tea or milk was poured in a cup first. Fisher designed an experiment to test her ability.

- What should be done about chance variations in the temperature, sweetness, and so on?
- How many cups should be used in the test? Should they be paired? In what order should the cups be presented? Fisher suggests: if discrimination of the kind under test is absent, the result of the experiment will be wholly governed by the laws of chance.
- What conclusion could be drawn from a perfect score or from one with one or more errors?


Fisher's exact test

Tea-tasting experiment: a woman claimed to be able to judge whether tea or milk was poured in a cup first. The woman was given eight cups of tea, in four of which milk was poured first, and was told to guess which four had milk poured first.

- The contingency table for this design is a 2 x 2 table whose rows give the true order (milk first / tea first) and whose columns give her guess, with all marginal totals fixed at 4.
- $p_1$: probability of selecting a cup with milk poured first
- $p_2$: probability of selecting a cup with tea poured first
- The hypotheses to test: $H_0: p_1 = p_2$ versus $H_1: p_1 > p_2$


Fisher's exact test

Tea-tasting experiment: the woman was given eight cups of tea, in four of which milk was poured first, and was told to guess which four had milk poured first.

- Let X be the number of correctly identified "milk first" cups, a random variable with values x = 0, 1, 2, 3, 4.
- Under $H_0$, X follows the hypergeometric distribution:

$$P_{H_0}(X = x) = \frac{\binom{4}{x}\binom{4}{4-x}}{\binom{8}{4}}, \quad x = 0, 1, 2, 3, 4$$

- There are $\binom{8}{4} = 70$ possible ways to classify 4 of the 8 cups as "milk first".
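The null distribution can be written out with choose() and checked against R's built-in hypergeometric density dhyper(); a sketch:

    x <- 0:4
    p_manual <- choose(4, x) * choose(4, 4 - x) / choose(8, 4)
    p_hyper  <- dhyper(x, m = 4, n = 4, k = 4)   # same distribution

    rbind(x, p_manual, p_hyper)   # probabilities 1/70, 16/70, 36/70, 16/70, 1/70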


Check yourself

Tea-tasting experiment: a woman claimed to be able to judge whether tea or milk was poured in a cup first. The woman was given eight cups of tea, in four of which milk was poured first, and was told to guess which four had milk poured first. Suppose she identifies 3 cups milk first correctly. The p-value is:

1/70

0.04

16/70

17/70



Fisher's exact test

- For a 2 x 2 table with cell counts a, b, c, d, the marginal counts a + b, c + d, a + c, and b + d are fixed.
- The hypotheses to test: $H_0: p_1 = p_2$ versus $H_1: p_1 > p_2$, $p_1 < p_2$, or $p_1 \neq p_2$.
- The test statistic T is the number of observations in cell (1, 1), and under $H_0$:

$$P_{H_0}(T = x) = \frac{\binom{a+c}{x}\binom{b+d}{a+b-x}}{\binom{n}{a+b}}, \quad x = 0, 1, 2, \ldots, \min\{a + b, a + c\}$$
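In R, fisher.test() performs this test directly on the observed 2 x 2 table. A sketch for the tea-tasting outcome in which 3 of the 4 "milk first" cups are identified correctly:

    tea <- matrix(c(3, 1,
                    1, 3),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Truth = c("milk first", "tea first"),
                                  Guess = c("milk first", "tea first")))

    fisher.test(tea, alternative = "greater")
    # one-sided p-value = P(T >= 3) = (16 + 1)/70, about 0.243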


See you next time
