WeekOne - sciences.ucf.edu
Transcript
Page 1: WeekOne - sciences.ucf.edu

Week One

•  Slides posted; new work item posted; videos for STA 6236 online on YouTube

•  Pep talk for regression, especially in light of predictive analytics

•  Reality check (informative quiz) reveals that many of you need to review reference distributions (normal, t(n), F(ν1, ν2), χ²(n)), matrix analysis, and likely hypothesis testing (p-value low means null must go) and confidence/prediction intervals. Take it upon yourself to proceed accordingly.

1  

Page 2: WeekOne - sciences.ucf.edu

Regression Models

Supply answers to the question: ‘What is the relationship between the variables?’

Answer in the form of equations involving:
•  A numerical response (dependent) variable
•  One or more numerical or categorical independent (explanatory) variables

Emphasis on prediction & estimation rather than inference (hypothesis testing)

2  

Page 3: WeekOne - sciences.ucf.edu

Another Reason: Demonstrate No Relationship!

3  

Page 4: WeekOne - sciences.ucf.edu

Brief Recap: Pep Talk for Regression

•  Syllabus/Text/JMP Pro 11 tied together
•  Regression analysis used extensively by statisticians as well as non-statisticians
•  Exploratory mode (Y versus X or X’s)
•  Estimation
•  Prediction
•  Understanding

4  

Page 5: WeekOne - sciences.ucf.edu

Regression to help with the placement of incisions for robotic surgery. Practice 6-7 times and good to go. Help the less experienced benefit from the experts. Take results from many successful surgeries and save these measurements; given a patient’s height, weight, and other factors, produce the distance from the ASIS and TIC to the camera incision.

5  

Page 6: WeekOne - sciences.ucf.edu

Getting familiar with JMP platforms (or their equivalents)

•  Get ahold of JMP Pro 11 (various options)
•  Take tutorials, practice loading data sets
  –  Fit Y by X
  –  Fit Model
  –  Analyze distribution
  –  Formula
  –  Table manipulations
  –  Note linkages
•  Read ahead in text as noted in FAQ for STA 6236
•  Review matrix analysis if rusty
•  One demo on YouTube using SLOPE.jmp file

6  

Page 7: WeekOne - sciences.ucf.edu

Data sets: ALSM Ch1ta1

7  

Page 8: WeekOne - sciences.ucf.edu

Practice with “Genuine” Fake Made-up Data

•  Consider a model of the form y = 10 + 5x + error, where x: 1, 2, 3, 4, 5, replicated

•  Check how you did
•  Fit it (true, with error)
•  Play with parameters, assumptions
•  Try it with 1000 replicated data sets
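The simulation exercise above can be sketched in plain Python (the course uses JMP; this stand-in, with an assumed error standard deviation of 2 and 200 replications of the x grid, just checks that least squares recovers the true coefficients 10 and 5):

```python
# Sketch of the slide's exercise: simulate y = 10 + 5x + error with
# x = 1..5 replicated, then fit by ordinary least squares.
import random

random.seed(1)

def simulate(n_reps=200, beta0=10.0, beta1=5.0, sigma=2.0):
    xs, ys = [], []
    for _ in range(n_reps):
        for x in (1, 2, 3, 4, 5):            # the x grid from the slide, replicated
            xs.append(x)
            ys.append(beta0 + beta1 * x + random.gauss(0.0, sigma))
    return xs, ys

def ols(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

xs, ys = simulate()
b0, b1 = ols(xs, ys)
print(b0, b1)  # estimates land near the true values 10 and 5
```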

8  

Page 9: WeekOne - sciences.ucf.edu

YouTube Video #2, part 2: Practice with “Genuine” Fake Made-up Data

•  Consider a model of the form y = 10 + 5x + error, where x: 1, 2, 3, 4, 5, replicated

•  Check how you did
•  Fit it (true, with error)
•  15-minute limit!
•  Need to create the groups and then fiddle with assumptions

•  Play with parameters, assumptions
•  Try it with 1000 replicated data sets

9  

Page 10: WeekOne - sciences.ucf.edu

Homework assignment

•  Simulated data problem; do this one as well

10  

Page 11: WeekOne - sciences.ucf.edu

Zillow 2011 data

•  Fit line
•  Fit orthogonal
•  Fit X on Y
•  Fit Y on X; Fit Model

11  

Page 12: WeekOne - sciences.ucf.edu

12  

Page 13: WeekOne - sciences.ucf.edu

Fit  Y  by  X  

13  

Page 14: WeekOne - sciences.ucf.edu

Functional v. Statistical Relations

Functional Relation: One Y-value for each X-value

14  

Page 15: WeekOne - sciences.ucf.edu

Statistical Relation

Example 1: Y = year-end employee evaluation; X = mid-year evaluation

15  

Page 16: WeekOne - sciences.ucf.edu

Curvilinear Statistical Relation

Example 2: X = age; Y = steroid level in blood

Regression Objectives: Characterize the statistical relation and/or predict new values

16  

Page 17: WeekOne - sciences.ucf.edu

Statistical Relations

•  Have a distribution of Y-values for each X
•  The mean changes systematically with X

17  

Page 18: WeekOne - sciences.ucf.edu

Simple Linear Regression Model: The Assumptions

18  

Page 19: WeekOne - sciences.ucf.edu

Notes on Simple Linear Regression Model (Regression “lite”)

1.  Model is simple, because only one predictor (X)

2.  Model is linear because parameters enter linearly

3.  Since X = X¹ (and X², X³, etc. not present), the model is first-order

19  

Page 20: WeekOne - sciences.ucf.edu

Homework problems. (Best to run many data sets. Examine output. Interpret. Where does it come from?)

•  What I have done in the past: do as many problems as you wish. The student solution manual is floating around out in the cloud, evidently easy to find. Do as you wish, to get comfortable with the material.

•  Not to be graded, but I may go through them in an extended class period following the usual lecture-format class (to be determined). You should be comfortable doing these types of problems. For a quiz situation, you should also be comfortable extracting relevant material from JMP output.

•  1.5, 1.6 (draw plot by hand), 1.7, 1.10, 1.11, 1.13, 1.16, 1.18, 1.19 (needs software and the data disc that comes with the book), 1.20 through 1.28, 1.29, 1.32, 1.33, 1.34, 1.35, 1.36, 1.39, 1.43, 1.44, 1.45, 1.46

20  

Page 21: WeekOne - sciences.ucf.edu

Features of Model

1.  εi is a random variable, so Yi is also a random variable
2.  Mean of Yi is the regression function: E{Yi} = β0 + β1 Xi
3.  εi is the vertical deviation of Yi from the mean at Xi

21  

Page 22: WeekOne - sciences.ucf.edu

Features of Model (continued) 4.  Variance is constant:

Var(Yi) = Var(β0 + β1 Xi + εi) = Var(εi) = σ2

5.  Yi is uncorrelated with Yj for i ≠ j

6.  In summary, regression model (1.1) implies that responses Yi come from probability distributions whose means are E(Yi) = β0 + β1 Xi and whose variances are σ², the same for all levels of X. Any two responses Yi and Yj are uncorrelated.
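A quick numerical check of point 6, under illustrative parameter values (β0 = 2, β1 = 3, σ = 1.5, X = 4 are assumptions, not from the slides):

```python
# At a fixed X, responses Y = beta0 + beta1*X + eps should have
# mean beta0 + beta1*X and variance sigma^2.
import random

random.seed(0)
beta0, beta1, sigma, x = 2.0, 3.0, 1.5, 4.0

ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for _ in range(100_000)]
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / (len(ys) - 1)
print(mean_y, var_y)  # close to 14.0 and 2.25
```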

22  

Page 23: WeekOne - sciences.ucf.edu

Illustration of Simple Linear Regression

Error terms NOT assumed to be normally distributed—no distributional assumptions made other than on moments.

23  

Page 24: WeekOne - sciences.ucf.edu

Section 1.4

•  Observational Data
  –  Lung cancer impacts from smoking/smoking cessation; suggests causation but not provable
•  Experimental Data
  –  Feasible to control assignment of subjects to treatments; STA 5205 designed experiment
•  Completely Randomized Design
  –  Free of “bias”…possibly not efficient

24  

Page 25: WeekOne - sciences.ucf.edu

25  

Page 26: WeekOne - sciences.ucf.edu

Least Squares Estimators: Properties

1.  Linear: We will show they are each linear combinations of the Yi’s
2.  Unbiased: E{b0} = β0 and E{b1} = β1
3.  Best: Minimum variance (maximum precision) among all linear, unbiased estimators of these parameters
4.  Estimators: b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²,  b0 = Ȳ − b1 X̄

Gauss-Markov Theorem: If the assumptions hold, the LS estimators are “BLUE”: Best Linear Unbiased Estimators.

26  

Page 27: WeekOne - sciences.ucf.edu

27  

Page 28: WeekOne - sciences.ucf.edu

Relationship to Features of Model

•  Y = f(X) does not appear linear
•  εi and εj uncorrelated?
•  Presumption of declining values of y
•  Large drop in y suggests long duration until next y observed
•  Is X fixed?
•  Other than that, … no problem!

28  

Page 29: WeekOne - sciences.ucf.edu

Alternative Version of Model: Use Centered Predictor(s)

Yi = β0* + β1 (Xi − X̄) + εi,  where β0* = β0 + β1 X̄

Same slope, different intercept!

29

Page 30: WeekOne - sciences.ucf.edu

Estimating the Regression Function

Example: Persistence Study. Each of 3 subjects given a difficult task. Yi is the number of attempts before quitting.

Subject i:               1    2    3
Age Xi:                 20   55   30
Number of attempts Yi:   5   12   10
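The least-squares fit of this tiny data set can be done by hand or in a few lines (the slides do this in JMP; the Python sketch below applies the standard closed-form formulas):

```python
# Fit the persistence data by the standard least-squares formulas.
ages = [20, 55, 30]
attempts = [5, 12, 10]

n = len(ages)
xbar = sum(ages) / n            # 35.0
ybar = sum(attempts) / n        # 9.0
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(ages, attempts))
      / sum((x - xbar) ** 2 for x in ages))
b0 = ybar - b1 * xbar
print(round(b1, 4), round(b0, 4))  # slope ≈ 0.1769, intercept ≈ 2.8077
```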

30  

Page 31: WeekOne - sciences.ucf.edu

Estimating the Regression Function

Scatter plot: Attempts (Y, 0–15) versus Age (X, 20–50).

Hypothesis: E{Y} = β0 + β1 X How do we estimate β0 and β1?

31  

Page 32: WeekOne - sciences.ucf.edu

Criteria for choice of β0 and β1

•  Sum of perpendicular distances ┴

•  Sum of vertical distances (absolute values) ↕

•  Sum of vertical distances squared (↕)2

•  Sum of horizontal distances ↔

32  

Page 33: WeekOne - sciences.ucf.edu

Least Squares Criterion

Find the values of β0 and β1 that minimize the least squares objective function Q, given the sample, (X1, Y1), …, (Xn,Yn). Call those minimizing values: b0 and b1.
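The criterion Q can be written out directly. As a sketch (reusing the small persistence data from an earlier slide as the sample; any (X, Y) data would do), the closed-form estimates do minimize it:

```python
# The least-squares criterion Q(b0, b1) = sum (Yi - b0 - b1*Xi)^2.
X = [20, 55, 30]
Y = [5, 12, 10]

def Q(b0, b1):
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))

# Closed-form minimizers for comparison
xbar = sum(X) / len(X)
ybar = sum(Y) / len(Y)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
      / sum((x - xbar) ** 2 for x in X))
b0 = ybar - b1 * xbar

# Q is no smaller at any other parameter values
assert Q(b0, b1) <= Q(b0 + 0.1, b1)
assert Q(b0, b1) <= Q(b0, b1 + 0.01)
print(Q(b0, b1))
```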

33  

Page 34: WeekOne - sciences.ucf.edu

Persistence Study

Who wins? Which fit is better?

34  

Page 35: WeekOne - sciences.ucf.edu

How do we find b0 and b1? Calculus:
1.  Take partial derivatives with respect to β0 and β1, and set them equal to zero
2.  Get two equations (the normal equations) in two unknowns; solve.

Denote the solutions by b0 and b1:
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²,  b0 = Ȳ − b1 X̄

35  

Page 36: WeekOne - sciences.ucf.edu

36  

Page 37: WeekOne - sciences.ucf.edu

37  

Page 38: WeekOne - sciences.ucf.edu

Least Squares Estimators: Properties

1.  Best: Minimum variance (maximum precision)
2.  Linear: We will show they are each linear combinations of the Yi’s
3.  Unbiased: E{b0} = β0 and E{b1} = β1
4.  Estimators: b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²,  b0 = Ȳ − b1 X̄

Gauss-Markov Theorem: If the assumptions hold, the LS estimators are “BLUE”: Best Linear Unbiased Estimators.

38  

Page 39: WeekOne - sciences.ucf.edu

Example 1: Toluca Company Data

39  

Page 40: WeekOne - sciences.ucf.edu

Toluca Company Fit

Bivariate Fit of Work Hours by Lot Size

40  

Page 41: WeekOne - sciences.ucf.edu

JMP Pro 11 Output

Output from Fit Y by X Platform

41  

Page 42: WeekOne - sciences.ucf.edu

Estimating the mean response at X

Regression Function: E{Y} = β0 + β1 X

Estimator: Ŷ = b0 + b1 X

Using centered-X model: Ŷ = Ȳ + b1 (X − X̄)

42  

Page 43: WeekOne - sciences.ucf.edu

Residuals!

Estimated residuals are key to assessing fit and the validity of assumptions.

True residuals (always unknown!): εi = Yi − (β0 + β1 Xi)

Estimated residuals: ei = Yi − Ŷi = Yi − (b0 + b1 Xi)

43  

Page 44: WeekOne - sciences.ucf.edu

More Properties related to b0 and b1

1.  The residuals sum to zero: Σ ei = 0
2.  The fitted values have the same sum as the observations: Σ Ŷi = Σ Yi
3.  Σ Xi ei = 0
4.  Σ Ŷi ei = 0
5.  The regression line passes through the point (X̄, Ȳ)
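These properties are easy to verify numerically. A sketch with made-up data (the X and Y values below are illustrative, not from the slides):

```python
# Check standard residual properties of a least-squares fit:
# residuals sum to zero, are orthogonal to X, and the fitted line
# passes through (X-bar, Y-bar).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [11.2, 21.9, 24.0, 31.5, 34.1]

n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
      / sum((x - xbar) ** 2 for x in X))
b0 = ybar - b1 * xbar

e = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
print(sum(e))                                # ≈ 0
print(sum(x * ei for x, ei in zip(X, e)))    # ≈ 0
print(b0 + b1 * xbar - ybar)                 # ≈ 0: line passes through (X̄, Ȳ)
```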

44  

Page 45: WeekOne - sciences.ucf.edu

Estimating the variance, σ²

Single population (no X’s for now, just Y’s):

s² = Σ (Yi − Ȳ)² / (n − 1) = SSE / (n − 1)

Degrees of freedom is n − 1 here because the mean was estimated using one statistic, namely Ȳ. For regression, the mean is estimated by Ŷ, which uses two statistics, b0 and b1 (so the degrees of freedom become n − 2).

45  

Page 46: WeekOne - sciences.ucf.edu

Toluca data, ch01ta1

•  Easy to load via best guess (then delete a column, probably)
•  Can copy and paste from Excel as well

46  

Page 47: WeekOne - sciences.ucf.edu

s² is the “mean square for error” in Analysis of Variance (MSE = SSE/DF)

47  

SSE = 54825.46. MSE = 2384. 48.82 is the square root of 2384, the estimate of the standard deviation, σ.
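The arithmetic here checks out; the Toluca data have n = 25 lot-size observations, so the error degrees of freedom are n − 2 = 23:

```python
# Sanity check of the quoted Toluca numbers: MSE = SSE/(n - 2),
# and root MSE estimates sigma.
import math

SSE = 54825.46
n = 25
MSE = SSE / (n - 2)
print(round(MSE))                 # 2384
print(round(math.sqrt(MSE), 2))   # 48.82
```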

Page 48: WeekOne - sciences.ucf.edu

More Properties related to b0 and b1

1. 2.

5.  The regression line passes through the point

3.

4.

48  

Page 49: WeekOne - sciences.ucf.edu

49  

Page 50: WeekOne - sciences.ucf.edu

50  

Page 51: WeekOne - sciences.ucf.edu

51  

Page 52: WeekOne - sciences.ucf.edu

52  

Page 53: WeekOne - sciences.ucf.edu

53  

Page 54: WeekOne - sciences.ucf.edu

August 27, 2014

•  No class on Labor Day…in case you forgot

•  Next class after tonight is Sept. 3

54  

Page 55: WeekOne - sciences.ucf.edu

Study Guide
•  Understand terminology
•  Assumptions for simple linear regression
•  Least squares criterion; normal equations
•  Derive estimators b0, b1 for β0, β1, resp.
•  Sense in which the LS estimators are BLUE
•  Be able to extract relevant numbers from JMP Pro 11 output (e.g., estimates; fitted model)
•  Properties of b0, b1

55  

Page 56: WeekOne - sciences.ucf.edu

Study Guide (continued)
•  Assumptions for normal error regression
•  LS versus MLE estimators
•  Yi is normal; distribution of linear combinations of the Yi’s
•  Properties of the ki’s
•  Distribution, mean, and variance of b1
•  Difference between a confidence interval on the regression function and a prediction interval for future observations at xh
•  SSTO = SSR + SSE and why we care about ANOVA
•  General linear test; full and reduced models
•  Definition and interpretation of R²

56  

Page 57: WeekOne - sciences.ucf.edu

Normal Error Regression Model

Add to the “assumptions” one more item: εi ~ N(0, σ²), independent, for i = 1, …, n.

Notes:
1.  N(0, σ²) means normally distributed with mean zero and variance σ².
2.  “Uncorrelated” implies independence for normal errors.
3.  Normality is a strong assumption; it might not be true!

57  

Page 58: WeekOne - sciences.ucf.edu

One “Rationale” for Normality

Suppose the true model involves 21 predictors, X and Z1, …, Z20, each weak, so that:

Yi = β0 + β1 Xi + β2 Z1,i + β3 Z2,i + … + β21 Z20,i + εi

But we use:

Yi = β0 + β1 Xi + εi*

so that

εi* = β2 Z1,i + β3 Z2,i + … + β21 Z20,i + εi

The Central Limit Theorem suggests approximate normality of εi*. (It is not implausible that the error terms are normal.)
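A sketch of this rationale: if the error is really a sum of many small omitted effects, its distribution looks approximately normal. The uniform Z’s and their coefficients below are illustrative assumptions; only the count of 20 omitted predictors comes from the slide:

```python
# Sum 20 small uniform "omitted effects" and check a rough normality
# property: about 68% of draws fall within one sd of the mean.
import random

random.seed(42)

def omitted_error(k=20):
    return sum(random.uniform(-1.0, 1.0) for _ in range(k))

errs = [omitted_error() for _ in range(50_000)]
mean = sum(errs) / len(errs)
sd = (sum((e - mean) ** 2 for e in errs) / len(errs)) ** 0.5
within = sum(1 for e in errs if abs(e - mean) <= sd) / len(errs)
print(round(within, 2))  # close to the normal value 0.68
```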

58  

Page 59: WeekOne - sciences.ucf.edu

Maximum Likelihood Estimation

Rationale: Use as estimates those values of the parameters that maximize the likelihood of the observed data.

Case 1: Single sample; estimate µ. Assume σ² = 100. Data: n = 3; Y1 = 250, Y2 = 265, Y3 = 259. Which is more likely, µ = 230 or µ = 259?
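The Case 1 comparison can be computed directly from the normal density:

```python
# Likelihood of the sample 250, 265, 259 under N(mu, 100):
# which mu makes the data more likely?
import math

data = [250, 265, 259]
sigma2 = 100.0

def likelihood(mu):
    return math.prod(
        math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        for y in data
    )

print(likelihood(230), likelihood(259))
# mu = 259 (near the sample mean 258) is far more likely than mu = 230
```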

59  

Page 60: WeekOne - sciences.ucf.edu

Maximum Likelihood Estimators

The values of β0, β1, and σ² that maximize the likelihood function, namely β̂0, β̂1, and σ̂², are called the maximum likelihood estimators. Some results: β̂0 = b0 and β̂1 = b1 (the same as the LS estimates), while σ̂² = SSE/n (a biased estimator; compare MSE = SSE/(n − 2)).

60  

Page 61: WeekOne - sciences.ucf.edu

Some KEY Points from Appendix 1

Note: Review Appendix 1 with special attention to:
A.1: Summation and product notation
A.3: Random variables
A.4: Normal and related distributions
A.6: Inferences about population mean
A.7: Comparisons of population means
A.8: Inferences about population variance

61  

Page 62: WeekOne - sciences.ucf.edu

Linear Combinations of Random Variables

Let Y1, . . ., Yn be random variables, and let a1, . . ., an be constants. Then:

Z = a1Y1 + . . . + anYn is a linear combination of the random variables Y1, . . . ,Yn

62  

Page 63: WeekOne - sciences.ucf.edu

Examples of Linear Combinations

1.  Example 1: Difference of two random variables

2.  Example 2: The sample mean

63  

Page 64: WeekOne - sciences.ucf.edu

Examples of Linear Combinations

1.  Example 1: Difference of two random variables: X − Y

2.  Example 2: The sample mean: X̄ = X1/n + X2/n + … + Xn/n

64  

Page 65: WeekOne - sciences.ucf.edu

Expectation and Variance of Linear Combinations

1. Expectation (A.29a). Let E{Yi} = µi, for i = 1, 2, . . ., n, and let Z = a1Y1 + . . . + anYn. Then:

E{Z} = Σ ai µi

2. Variance (A.31): In addition to the above, assume that the {Yi} are mutually independent and σ²{Yi} = σi², i = 1, 2, . . ., n. Then:

σ²{Z} = a1²σ1² + . . . + an²σn²

65  

Page 66: WeekOne - sciences.ucf.edu

Examples of Linear Combinations

1.  Example 1: Difference of two random variables
E(X − Y) = E(X) − E(Y);  Var(X − Y) = Var(X) + Var(Y)  (X, Y independent)

2.  Example 2: The sample mean
E(X̄) = (1/n)[E(X1) + … + E(Xn)] = (1/n)(nµ) = µ
Var(X̄) = (1/n)²[σ² + … + σ²] = σ²/n
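Example 2 can also be checked by simulation; the values of µ, σ, and n below are illustrative choices, not from the slides:

```python
# X-bar is a linear combination with ai = 1/n, so
# E(X-bar) = mu and Var(X-bar) = sigma^2 / n.
import random

random.seed(7)
mu, sigma, n = 50.0, 4.0, 25

means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n
         for _ in range(40_000)]

m = sum(means) / len(means)
v = sum((x - m) ** 2 for x in means) / (len(means) - 1)
print(round(m, 1), round(v, 2))  # near mu = 50.0 and sigma^2/n = 0.64
```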

66  

Page 67: WeekOne - sciences.ucf.edu

Expectation and Variance of Linear Combinations: Examples

Example 4: Let Z = Ȳ. Find E{Z} and σ²{Z}.

67  

Page 68: WeekOne - sciences.ucf.edu

68  

Page 69: WeekOne - sciences.ucf.edu

t Distribution Examples

Example 5: Find the t statistic corresponding to the sample average in a sample of size n. Assume E{Yi} = µ0, for i = 1,2, . . ., n
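Example 5, written out as code (the sample below reuses the Case 1 data from an earlier slide purely as an illustration): the t statistic for the sample mean is t = (Ȳ − µ0) / (s / √n), with n − 1 degrees of freedom.

```python
# t statistic for the sample mean against a hypothesized mu0.
import math

def t_statistic(ys, mu0):
    n = len(ys)
    ybar = sum(ys) / n
    s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)   # sample variance
    return (ybar - mu0) / math.sqrt(s2 / n)

print(round(t_statistic([250, 265, 259], 250.0), 3))  # 1.835
```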

69  

Page 70: WeekOne - sciences.ucf.edu

Linear Combinations of Independent Normal RVs (A.40)

70  

Page 71: WeekOne - sciences.ucf.edu

Chapter 2

71  

Page 72: WeekOne - sciences.ucf.edu

72  

