STAT 111 Recitation 8

Linjun Zhang

March 17, 2017

Misc

- Midterm grades will be posted next Tuesday or Wednesday.
- The slides can be found at http://stat.wharton.upenn.edu/~linjunz/
- Send me an email at linjunz@wharton.upenn.edu if you have any feedback (e.g., less review, more practice problems?).


Confidence intervals

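A general formula. For a parameter $\theta$, suppose we estimate it by $\hat{\theta}$. Then an approximate 95% confidence interval for $\theta$ is

$$\hat{\theta} - 2 \cdot \widehat{\mathrm{s.d.}}(\hat{\theta}) \quad \text{to} \quad \hat{\theta} + 2 \cdot \widehat{\mathrm{s.d.}}(\hat{\theta}),$$

where $\mathrm{s.d.}(\hat{\theta})$ is the standard deviation of $\hat{\theta}$, and $\widehat{\mathrm{s.d.}}(\hat{\theta})$ is the estimate of $\mathrm{s.d.}(\hat{\theta})$.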
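The factor 2 can be read as a rounded normal quantile. The sketch below assumes that the estimator is approximately normal and centered at the true parameter, which the slide leaves implicit:

```latex
\[
P\bigl(|Z| \le 1.96\bigr) \approx 0.95 \quad \text{for } Z \sim N(0, 1),
\]
\[
P\Bigl(\hat{\theta} - 2\,\mathrm{s.d.}(\hat{\theta}) \le \theta \le \hat{\theta} + 2\,\mathrm{s.d.}(\hat{\theta})\Bigr)
= P\left(\left|\frac{\hat{\theta} - \theta}{\mathrm{s.d.}(\hat{\theta})}\right| \le 2\right)
\approx 0.95 .
\]
```

In practice the unknown standard deviation is replaced by its estimate, which is one reason the interval is only approximate.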


Confidence intervals

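- If $X$ has a binomial distribution $\mathrm{Binomial}(n, \theta)$ and we observe $X = x$, a (conservative) approximate 95% confidence interval for $\theta$ is

  $$\frac{x}{n} - \frac{1}{\sqrt{n}} \quad \text{to} \quad \frac{x}{n} + \frac{1}{\sqrt{n}}.$$

- If $X_1, X_2, \dots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, and we observe $x_1, x_2, \dots, x_n$, an approximate 95% confidence interval for $\mu$ is

  $$\bar{x} - \frac{2s}{\sqrt{n}} \quad \text{to} \quad \bar{x} + \frac{2s}{\sqrt{n}},$$

  where $\bar{x} = \dfrac{x_1 + \dots + x_n}{n}$ and $s^2 = \dfrac{x_1^2 + \dots + x_n^2 - n\bar{x}^2}{n - 1}$.

- If $X_1$ has a binomial distribution $\mathrm{Binomial}(n_1, \theta_1)$ and $X_2$ has a binomial distribution $\mathrm{Binomial}(n_2, \theta_2)$, and we observe $X_1 = x_1$, $X_2 = x_2$, a (conservative) approximate 95% confidence interval for $\theta_1 - \theta_2$ is

  $$p_1 - p_2 - \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad \text{to} \quad p_1 - p_2 + \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

  where $p_i = \dfrac{x_i}{n_i}$, for $i = 1, 2$.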
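The three intervals above can be sketched in a few lines of Python. The function names and the sample inputs are hypothetical, chosen only for illustration. The binomial intervals are conservative because $\theta(1-\theta) \le 1/4$, so $2\sqrt{\theta(1-\theta)/n}$ is at most $1/\sqrt{n}$.

```python
import math


def binomial_ci(x, n):
    """Conservative approximate 95% CI for theta: x/n -/+ 1/sqrt(n)."""
    p_hat = x / n
    margin = 1 / math.sqrt(n)
    return p_hat - margin, p_hat + margin


def mean_ci(xs):
    """Approximate 95% CI for mu: xbar -/+ 2s/sqrt(n)."""
    n = len(xs)
    xbar = sum(xs) / n
    # Shortcut form of the sample variance, as on the slide.
    s2 = (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)
    margin = 2 * math.sqrt(s2 / n)
    return xbar - margin, xbar + margin


def two_proportion_ci(x1, n1, x2, n2):
    """Conservative approximate 95% CI for theta1 - theta2."""
    diff = x1 / n1 - x2 / n2
    margin = math.sqrt(1 / n1 + 1 / n2)
    return diff - margin, diff + margin


# Hypothetical inputs, for illustration only.
print(binomial_ci(40, 100))
print(mean_ci([2.0, 4.0, 6.0, 8.0]))
print(two_proportion_ci(40, 100, 30, 100))
```

The shortcut variance formula can lose precision when the data are large relative to their spread, so a two-pass computation (subtracting the mean first) may be preferable outside hand calculation.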


Estimating the difference between two means

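If $X_{11}, X_{12}, \dots, X_{1n}$ are i.i.d. with mean $\mu_1$ and variance $\sigma_1^2$, $X_{21}, X_{22}, \dots, X_{2m}$ are i.i.d. with mean $\mu_2$ and variance $\sigma_2^2$, and we observe $x_{11}, \dots, x_{1n}$, $x_{21}, \dots, x_{2m}$, what can we say about $\mu_1 - \mu_2$?

- We estimate $\mu_1 - \mu_2$ by $\bar{x}_1 - \bar{x}_2$, where $\bar{x}_1 = \dfrac{x_{11} + \dots + x_{1n}}{n}$ and $\bar{x}_2 = \dfrac{x_{21} + \dots + x_{2m}}{m}$.

- The variance of $\bar{X}_1 - \bar{X}_2$ is $\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}$, and we estimate $\sigma_1^2$ and $\sigma_2^2$ by

  $$s_1^2 = \frac{x_{11}^2 + \dots + x_{1n}^2 - n\bar{x}_1^2}{n - 1}, \qquad s_2^2 = \frac{x_{21}^2 + \dots + x_{2m}^2 - m\bar{x}_2^2}{m - 1}.$$

- An approximate 95% confidence interval for $\mu_1 - \mu_2$ is

  $$\bar{x}_1 - \bar{x}_2 - 2\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}} \quad \text{to} \quad \bar{x}_1 - \bar{x}_2 + 2\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}.$$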
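The variance formula used here follows in two steps. The sketch assumes that the two samples are independent of each other, which the slide does not state explicitly:

```latex
\begin{align*}
\operatorname{Var}(\bar{X}_1)
  &= \frac{1}{n^2} \sum_{j=1}^{n} \operatorname{Var}(X_{1j})
   = \frac{\sigma_1^2}{n},
  \qquad
  \operatorname{Var}(\bar{X}_2) = \frac{\sigma_2^2}{m}, \\
\operatorname{Var}(\bar{X}_1 - \bar{X}_2)
  &= \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2)
   = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}.
\end{align*}
```

The variances add, rather than subtract, because $\operatorname{Var}(-\bar{X}_2) = \operatorname{Var}(\bar{X}_2)$.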


Practice problem

Question

We are interested in investigating any potential difference between the mean blood

sugar level of diabetics (µ1) and that of non-diabetics (µ2). To do this we took a

sample of six diabetics and found the following blood sugar levels: 127, 144, 140, 136,

118, 138. We also took a sample of eight non-diabetics and found the following blood

sugar levels: 125, 128, 133, 141, 109, 125, 126, 122. (a) Estimate µ1 − µ2. (b) Find

two numbers between which we are about 95% certain that µ1 − µ2 lies.

Solution

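$\hat{\mu}_1 = \bar{x}_1 = \frac{1}{6}(127 + 144 + 140 + 136 + 118 + 138) = 133.83$.

$\hat{\sigma}_1^2 = s_1^2 = \frac{1}{6-1}(127^2 + 144^2 + 140^2 + 136^2 + 118^2 + 138^2 - 6 \times 133.83^2) = 93.24$.

$\hat{\mu}_2 = \bar{x}_2 = \frac{1}{8}(125 + 128 + 133 + 141 + 109 + 125 + 126 + 122) = 126.13$.

$\hat{\sigma}_2^2 = s_2^2 = \frac{1}{8-1}(125^2 + 128^2 + 133^2 + 141^2 + 109^2 + 125^2 + 126^2 + 122^2 - 8 \times 126.125^2) = 83.55$.

(a) We estimate $\mu_1 - \mu_2$ by $\bar{x}_1 - \bar{x}_2 \approx 7.71$.

(b) The 95% confidence interval is given as $\bar{x}_1 - \bar{x}_2 \pm 2\sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{m}}$, which is $-2.49$ to $17.90$.

Note that $s_1^2 = 93.24$ is computed with the rounded mean 133.83. Carrying the unrounded mean $803/6$ gives $s_1^2 = 92.17$ and an interval of about $-2.45$ to $17.87$.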
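The calculation can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative. Because it carries full precision, the printed values can differ slightly from hand calculations that round intermediate results.

```python
import math
import statistics

diabetics = [127, 144, 140, 136, 118, 138]
non_diabetics = [125, 128, 133, 141, 109, 125, 126, 122]

n, m = len(diabetics), len(non_diabetics)
xbar1 = statistics.mean(diabetics)
xbar2 = statistics.mean(non_diabetics)
# statistics.variance is the sample variance (divides by n - 1).
s1_sq = statistics.variance(diabetics)
s2_sq = statistics.variance(non_diabetics)

# (a) Point estimate of mu1 - mu2.
diff = xbar1 - xbar2

# (b) Approximate 95% confidence interval.
margin = 2 * math.sqrt(s1_sq / n + s2_sq / m)
lower, upper = diff - margin, diff + margin

print(f"xbar1 = {xbar1:.4f}, s1^2 = {s1_sq:.4f}")
print(f"xbar2 = {xbar2:.4f}, s2^2 = {s2_sq:.4f}")
print(f"estimate = {diff:.4f}, interval = ({lower:.2f}, {upper:.2f})")
```

The interval contains 0, so on this evidence alone the data do not rule out equal mean blood sugar levels in the two groups.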


Regression

Suppose we observe $n$ data points $(x_i, y_i)$, $i = 1, 2, \dots, n$, and there appears to be a linear relationship between the random variables $X_i$ and $Y_i$, $i = 1, 2, \dots, n$, i.e.

$$Y_i = \alpha + \beta X_i + \varepsilon_i,$$

where $\varepsilon_i$ denotes the noise term (we assume that each $y_i$ is observed with noise $\varepsilon_i$ that has mean 0 and variance $\sigma^2$).


Regression

- We can view $Y$ as some random non-controllable quantity, and $X$ as some non-random controllable quantity.

Examples:

- $Y$ is the growth height of a tree, and $X$ is the amount of water.
- In a basketball game analysis, $Y$ is the points a player scores and $X$ is the minutes that player plays.

Regression: before/after the experiment

- Before the experiment
    - Think about the random variables $Y_1, Y_2, \dots, Y_n$.
    - $Y_1$ corresponds to $x_1$, $Y_2$ corresponds to $x_2$, and so on.
    - The mean of $Y_i$ is $\alpha + \beta x_i$ and the variance of $Y_i$ is $\sigma^2$.
    - The various $Y_i$ are independent but not identically distributed.
- After the experiment
    - Obtain observed values $y_1, y_2, \dots, y_n$.
    - Plot the points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ in the x-y plane.
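The before/after distinction can be illustrated with a small simulation. The parameter values and design points below are hypothetical, and the normal noise is an assumption; the slides only fix the noise mean at 0 and its variance at $\sigma^2$.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical parameter values and design points, for illustration only.
alpha, beta, sigma = 1.0, 2.0, 0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Before the experiment: each Y_i has mean alpha + beta * x_i and
# variance sigma^2, so the Y_i are not identically distributed.
means = [alpha + beta * x for x in xs]

# After the experiment: one observed value y_i per design point x_i.
# Normal noise is an assumption made for this sketch.
ys = [mu + rng.gauss(0.0, sigma) for mu in means]

for x, mu, y in zip(xs, means, ys):
    print(f"x = {x:.1f}  mean of Y = {mu:.1f}  observed y = {y:.3f}")
```

The $x_i$ are fixed by the experimenter, while rerunning with a different seed gives different observed $y_i$ around the same means.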


Regression: auxiliary quantities

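$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

$$s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$

$$s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$$

$$s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\,\bar{y}$$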
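The second form of each quantity comes from expanding the product and using $\sum_{i} x_i = n\bar{x}$. A short derivation for $s_{xx}$, with $s_{xy}$ following the same pattern:

```latex
\begin{align*}
s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2, \\
s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  &= \sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i - \bar{x} \sum_{i=1}^{n} y_i + n\bar{x}\,\bar{y}
   = \sum_{i=1}^{n} x_i y_i - n\bar{x}\,\bar{y}.
\end{align*}
```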


Regression: estimating α, β, σ2

Unbiased estimates:

- Estimate $\beta$ by $b = \dfrac{s_{xy}}{s_{xx}}$.
- Estimate $\alpha$ by $a = \bar{y} - b\bar{x}$.
- Estimate $\sigma^2$ by $s_r^2 = \dfrac{s_{yy} - b^2 s_{xx}}{n - 2}$.
- Estimate the regression line by $y = a + bx$.
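The estimates above translate directly into code. This is a minimal sketch: the function name `fit_line` and the data are hypothetical, and the sums of squares use the centered definitions rather than the shortcut forms.

```python
def fit_line(xs, ys):
    """Return (a, b, s2_r) for the fitted line y = a + b * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                          # estimate of beta
    a = ybar - b * xbar                    # estimate of alpha
    s2_r = (syy - b * b * sxx) / (n - 2)   # estimate of sigma^2
    return a, b, s2_r


# Hypothetical data, for illustration only.
a, b, s2_r = fit_line([0.0, 1.0, 2.0], [0.0, 1.0, 3.0])
print(f"a = {a:.4f}, b = {b:.4f}, s_r^2 = {s2_r:.4f}")
```

The divisor $n - 2$ means that at least three data points are needed for the variance estimate to be defined.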


Regression: practice problem

Practice Problem

Suppose we have observations of average income and total pizza sales for a

1-month period for eight different towns:

Estimate the mean pizza sales of a town with income x via the formula

“estimated mean = a+bx”. (That is, calculate a and b.)

Solution

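- $\bar{x} = 10$, $\bar{y} = 43.625$, $s_{xx} = 210$, $s_{yy} = 1829.875$, $s_{xy} = 610$.
- $b = \dfrac{s_{xy}}{s_{xx}} = \dfrac{610}{210} = 2.905$; $a = \bar{y} - b\bar{x} = 43.625 - 2.904762 \times 10 = 14.57738$.
- The estimated mean pizza sales for a town with income $x$ is therefore $14.577 + 2.905x$.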
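The arithmetic can be reproduced from the summary statistics alone. This is a sketch: `estimated_mean_sales` is a hypothetical helper name, and the income argument is in whatever units the original data table used.

```python
# Summary statistics quoted in the solution.
xbar, ybar = 10.0, 43.625
sxx, sxy = 210.0, 610.0

b = sxy / sxx          # slope estimate
a = ybar - b * xbar    # intercept estimate


def estimated_mean_sales(income):
    """Estimated mean pizza sales a + b * income."""
    return a + b * income


print(f"b = {b:.6f}, a = {a:.5f}")
print(f"estimated mean at income 10: {estimated_mean_sales(10.0):.3f}")
```

At the mean income $\bar{x} = 10$ the fitted line returns $\bar{y} = 43.625$, since the least-squares line always passes through $(\bar{x}, \bar{y})$.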
