STAT 111 Recitation 8 (stat.wharton.upenn.edu/~linjunz/rec8.pdf)

Date post: 31-Mar-2018
STAT 111 Recitation 8 Linjun Zhang March 17, 2017
STAT 111 Recitation 8

Linjun Zhang

March 17, 2017

Misc

- Midterm grades will be posted next Tuesday or Wednesday.
- The slides can be found at http://stat.wharton.upenn.edu/~linjunz/
- Send me an email at [email protected] if you have any feedback (e.g., less review, more practice problems?).

Confidence intervals

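A general formula. For a parameter $\theta$, suppose we estimate it by $\hat{\theta}$. Then an approximate 95% confidence interval for $\theta$ is

$$\hat{\theta} - 2 \cdot \widehat{\mathrm{s.d.}}(\hat{\theta}) \quad \text{to} \quad \hat{\theta} + 2 \cdot \widehat{\mathrm{s.d.}}(\hat{\theta}),$$

where $\mathrm{s.d.}(\hat{\theta})$ is the standard deviation of $\hat{\theta}$, and $\widehat{\mathrm{s.d.}}(\hat{\theta})$ is the estimate of $\mathrm{s.d.}(\hat{\theta})$.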
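The general rule can be sketched in a few lines of Python. The function name `approx_ci_95` and the example numbers are illustrative assumptions, not part of the slides; the only thing taken from the slide is the "estimate plus or minus two estimated standard deviations" recipe.

```python
def approx_ci_95(estimate, sd_estimate):
    """Approximate 95% CI: estimate -/+ 2 * (estimated s.d. of the estimator)."""
    margin = 2 * sd_estimate
    return estimate - margin, estimate + margin


# Hypothetical numbers: an estimate of 0.40 whose estimated s.d. is 0.05.
lower, upper = approx_ci_95(0.40, 0.05)
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

Every interval in the rest of the recitation is this rule with a specific choice of estimator and of its estimated standard deviation.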

Confidence intervals

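- If $X$ has a binomial distribution $\mathrm{Binomial}(n, \theta)$ and we observe $X = x$, a (conservative) approximate 95% confidence interval for $\theta$ is

$$\frac{x}{n} - \sqrt{\frac{1}{n}} \quad \text{to} \quad \frac{x}{n} + \sqrt{\frac{1}{n}}.$$

- If $X_1, X_2, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, and we observe $x_1, x_2, \ldots, x_n$, an approximate 95% confidence interval for $\mu$ is

$$\bar{x} - \frac{2s}{\sqrt{n}} \quad \text{to} \quad \bar{x} + \frac{2s}{\sqrt{n}},$$

where $\bar{x} = \frac{x_1 + \cdots + x_n}{n}$ and $s^2 = \frac{x_1^2 + \cdots + x_n^2 - n\bar{x}^2}{n - 1}$.

- If $X_1$ has a binomial distribution $\mathrm{Binomial}(n_1, \theta_1)$, $X_2$ has a binomial distribution $\mathrm{Binomial}(n_2, \theta_2)$, and we observe $X_1 = x_1$, $X_2 = x_2$, a (conservative) approximate 95% confidence interval for $\theta_1 - \theta_2$ is

$$p_1 - p_2 - \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad \text{to} \quad p_1 - p_2 + \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where $p_i = \frac{x_i}{n_i}$ for $i = 1, 2$.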
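As a rough sketch, the three intervals above could be coded as follows. The function names and the sample inputs are assumptions made for illustration. The binomial margins are called conservative because $\theta(1-\theta) \le 1/4$, so $2\sqrt{\theta(1-\theta)/n}$ can never exceed $\sqrt{1/n}$.

```python
import math
from statistics import mean, variance


def binomial_ci(x, n):
    """Conservative approximate 95% CI for theta: x/n -/+ sqrt(1/n)."""
    p_hat = x / n
    margin = math.sqrt(1 / n)
    return p_hat - margin, p_hat + margin


def mean_ci(data):
    """Approximate 95% CI for mu: x_bar -/+ 2 s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    s = math.sqrt(variance(data))  # statistics.variance divides by n - 1
    margin = 2 * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def proportion_difference_ci(x1, n1, x2, n2):
    """Conservative approximate 95% CI for theta1 - theta2."""
    diff = x1 / n1 - x2 / n2
    margin = math.sqrt(1 / n1 + 1 / n2)
    return diff - margin, diff + margin


# Hypothetical inputs, chosen only to exercise the functions.
print(binomial_ci(40, 100))
print(mean_ci([1, 2, 3, 4, 5]))
print(proportion_difference_ci(30, 100, 20, 100))
```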

Estimating the difference between two means

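If $X_{11}, X_{12}, \ldots, X_{1n}$ are i.i.d. with mean $\mu_1$ and variance $\sigma_1^2$, $X_{21}, X_{22}, \ldots, X_{2m}$ are i.i.d. with mean $\mu_2$ and variance $\sigma_2^2$, and we observe $x_{11}, \ldots, x_{1n}$ and $x_{21}, \ldots, x_{2m}$, what can we say about $\mu_1 - \mu_2$?

- We estimate $\mu_1 - \mu_2$ by $\bar{x}_1 - \bar{x}_2$, where $\bar{x}_1 = \frac{x_{11} + \cdots + x_{1n}}{n}$ and $\bar{x}_2 = \frac{x_{21} + \cdots + x_{2m}}{m}$.

- The variance of $\bar{X}_1 - \bar{X}_2$ is $\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}$, and we estimate $\sigma_1^2$ and $\sigma_2^2$ by

$$s_1^2 = \frac{x_{11}^2 + \cdots + x_{1n}^2 - n\bar{x}_1^2}{n - 1}, \qquad s_2^2 = \frac{x_{21}^2 + \cdots + x_{2m}^2 - m\bar{x}_2^2}{m - 1}.$$

- An approximate 95% confidence interval for $\mu_1 - \mu_2$ is

$$\bar{x}_1 - \bar{x}_2 - 2\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}} \quad \text{to} \quad \bar{x}_1 - \bar{x}_2 + 2\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}.$$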
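The variance formula used above follows in one line, assuming the two samples are independent of each other (an assumption the slide leaves implicit):

```latex
\operatorname{Var}(\bar{X}_1 - \bar{X}_2)
  = \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2)
  = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}
```

Here $\operatorname{Var}(\bar{X}_1) = \sigma_1^2 / n$ because $\bar{X}_1$ averages $n$ i.i.d. terms that each have variance $\sigma_1^2$, and likewise for $\bar{X}_2$ with $m$ terms.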

Practice problem

Question

We are interested in investigating any potential difference between the mean blood

sugar level of diabetics (µ1) and that of non-diabetics (µ2). To do this we took a

sample of six diabetics and found the following blood sugar levels: 127, 144, 140, 136,

118, 138. We also took a sample of eight non-diabetics and found the following blood

sugar levels: 125, 128, 133, 141, 109, 125, 126, 122. (a) Estimate µ1 − µ2. (b) Find

two numbers between which we are about 95% certain that µ1 − µ2 lies.

Solution

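$$\hat{\mu}_1 = \bar{x}_1 = \frac{1}{6}(127 + 144 + 140 + 136 + 118 + 138) = 133.83.$$

$$\hat{\sigma}_1^2 = s_1^2 = \frac{1}{6 - 1}\left(127^2 + 144^2 + 140^2 + 136^2 + 118^2 + 138^2 - 6 \times 133.8333^2\right) = 92.17.$$

$$\hat{\mu}_2 = \bar{x}_2 = \frac{1}{8}(125 + 128 + 133 + 141 + 109 + 125 + 126 + 122) = 126.13.$$

$$\hat{\sigma}_2^2 = s_2^2 = \frac{1}{8 - 1}\left(125^2 + 128^2 + 133^2 + 141^2 + 109^2 + 125^2 + 126^2 + 122^2 - 8 \times 126.125^2\right) = 83.55.$$

(a) The estimate of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2 \approx 7.71$.

(b) The 95% confidence interval is given as $\bar{x}_1 - \bar{x}_2 \pm 2\sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{m}}$, which is $-2.45$ to $17.87$.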
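The arithmetic in this solution can be checked with a short script. This is a sketch that uses Python's standard `statistics` module; the variable names are illustrative. It carries the unrounded means throughout, which matters here: the shortcut formula for $s^2$ is sensitive to rounding, and plugging in the rounded mean 133.83 instead of 133.8333 would change $s_1^2$ from about 92.17 to about 93.24.

```python
import math
from statistics import mean, variance

diabetics = [127, 144, 140, 136, 118, 138]
non_diabetics = [125, 128, 133, 141, 109, 125, 126, 122]

x1_bar, x2_bar = mean(diabetics), mean(non_diabetics)
# statistics.variance uses the n - 1 divisor, matching s^2 on the slides.
s1_sq, s2_sq = variance(diabetics), variance(non_diabetics)

estimate = x1_bar - x2_bar
margin = 2 * math.sqrt(s1_sq / len(diabetics) + s2_sq / len(non_diabetics))
lower, upper = estimate - margin, estimate + margin

print(f"x1_bar = {x1_bar:.2f}, s1^2 = {s1_sq:.2f}")
print(f"x2_bar = {x2_bar:.2f}, s2^2 = {s2_sq:.2f}")
print(f"estimate of mu1 - mu2 = {estimate:.2f}")
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

Since the interval contains 0, these data alone do not rule out equal mean blood sugar levels in the two groups.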

Regression

Suppose we observe $n$ data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$. It seems like there is some kind of linear relationship between the random variables $X_i$ and $Y_i$, $i = 1, 2, \ldots, n$, i.e.

$$Y_i = \alpha + \beta X_i + \varepsilon_i,$$

where $\varepsilon_i$ denotes the noise term (we assume that each $y_i$ is observed with noise $\varepsilon_i$ that has mean 0 and variance $\sigma^2$).
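To make the model concrete, here is a minimal simulation sketch. The slide only requires the noise to have mean 0 and variance $\sigma^2$; drawing it from a normal distribution is an extra assumption made here for convenience, and the function name and parameter values are hypothetical.

```python
import random


def simulate_responses(xs, alpha, beta, sigma, seed=0):
    """Draw y_i = alpha + beta * x_i + eps_i, with eps_i ~ Normal(0, sigma^2).

    Normal noise is an assumption of this sketch, not of the slides.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


xs = [1, 2, 3, 4, 5]
ys = simulate_responses(xs, alpha=2.0, beta=0.5, sigma=1.0)
for x, y in zip(xs, ys):
    print(f"x = {x}, y = {y:.3f}")
```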

Regression

- We can view Y as a random, non-controllable quantity and X as a non-random, controllable quantity.

Example:

- Y is the growth in height of a tree, and X is the amount of water it receives.

- In a basketball game analysis, Y is the points scored by a player and X is the minutes that player played.

Regression: before/after the experiment

- Before the experiment
  - Conceptualize about $Y_1, Y_2, \ldots, Y_n$.
  - $Y_1$ corresponds to $x_1$, $Y_2$ corresponds to $x_2$, and so on.
  - The mean of $Y_i$ is $\alpha + \beta x_i$ and the variance of $Y_i$ is $\sigma^2$.
  - The various $Y_i$ are independent but not identically distributed.
- After the experiment
  - Obtain observed values $y_1, y_2, \ldots, y_n$.
  - Plot the $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ values in the x-y plane.

Regression: auxiliary quantities

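$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$$

$$s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$$

$$s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\,\bar{y}$$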
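The second form of each quantity follows by expanding the product and using $\sum_{i=1}^{n} x_i = n\bar{x}$. For $s_{xx}$, for instance:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2
  = \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
  = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
```

The same expansion, with one factor replaced by $(y_i - \bar{y})$, gives the shortcut for $s_{xy}$.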

Regression: estimating $\alpha$, $\beta$, $\sigma^2$

Unbiased estimates:

- Estimate $\beta$ by $b = \frac{s_{xy}}{s_{xx}}$.
- Estimate $\alpha$ by $a = \bar{y} - b\bar{x}$.
- Estimate $\sigma^2$ by $s_r^2 = \frac{s_{yy} - b^2 s_{xx}}{n - 2}$.
- Estimate the regression line by $y = a + bx$.
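These estimates can be sketched as a single function that first computes the auxiliary quantities $s_{xx}$, $s_{yy}$, $s_{xy}$ and then applies the formulas above. The function name `fit_line` and the tiny data set are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b, s2_r): intercept, slope, and the estimate of sigma^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                            # estimate of beta
    a = y_bar - b * x_bar                    # estimate of alpha
    s2_r = (syy - b ** 2 * sxx) / (n - 2)    # estimate of sigma^2
    return a, b, s2_r


# Hypothetical data lying exactly on y = 1 + 2x, so the noise estimate is 0.
a, b, s2_r = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, s2_r)
```

The divisor $n - 2$ means the function needs at least three data points to return a usable variance estimate.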

Regression: practice problem

Practice Problem

Suppose we have observations of average income and total pizza sales for a

1-month period for eight different towns:

Estimate the mean pizza sales of a town with income x via the formula

“estimated mean = a+bx”. (That is, calculate a and b.)

Solution

- $\bar{x} = 10$, $\bar{y} = 43.625$, $s_{xx} = 210$, $s_{yy} = 1829.875$, $s_{xy} = 610$.
- $b = \frac{s_{xy}}{s_{xx}} = \frac{610}{210} = 2.905$; $a = \bar{y} - b\bar{x} = 43.625 - 2.904762 \times 10 = 14.57738$.
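Because the raw town data are not reproduced in this transcript, the check below starts from the summary statistics quoted in the solution. The helper `estimated_mean_sales` is a hypothetical name for the formula "estimated mean = a + bx".

```python
# Summary statistics quoted in the solution.
x_bar, y_bar = 10, 43.625
sxx, sxy = 210, 610

b = sxy / sxx          # slope estimate
a = y_bar - b * x_bar  # intercept estimate


def estimated_mean_sales(income):
    """Estimated mean pizza sales for a town with the given average income."""
    return a + b * income


print(f"b = {b:.3f}, a = {a:.5f}")
print(f"estimated mean at x = 10: {estimated_mean_sales(10):.3f}")
```

Evaluating the fitted line at $\bar{x} = 10$ returns $\bar{y} = 43.625$, which is a quick sanity check: the estimated regression line always passes through $(\bar{x}, \bar{y})$.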
