
Unit 6: Simple Linear Regression
Lecture: Introduction to SLR

Statistics 101

Thomas Leininger

June 17, 2013

Outline

1. Recap: Chi-square test of independence
   Ball throwing
   Expected counts in two-way tables
2. Modeling numerical variables
3. Correlation
4. Fitting a line by least squares regression
   Residuals
   Best line
   The least squares line
   Prediction & extrapolation
   Conditions for the least squares line
   R²
   Categorical explanatory variables


Does ball-throwing ability vary by major?

Going back to our carnival game, should I be worried if a bus-load of public policy majors show up at my booth?

The hypotheses are:

H0: Ball-throwing ability and major are independent. Ball-throwing skills do not vary by major.

HA: Ball-throwing ability and major are dependent. Ball-throwing skills vary by major.

Image: https://commons.wikimedia.org/wiki/File:Archery_Target_80cm.svg

                Public Policy   Undeclared   Other   Total
Hit target           40             10         10      60
Missed target        20             30         30      80
Total                60             40         40     140

Note: I multiplied the numbers by 10 to meet our expected cell count conditions.

Statistics 101 (Thomas Leininger), U6 - L1: Introduction to SLR, June 17, 2013



Chi-square test of independence

The test statistic is calculated as

χ²_df = Σ_{i=1}^{k} (O − E)² / E,  where df = (R − 1) × (C − 1),

where k is the number of cells, R is the number of rows, and C is the number of columns.

Note: We calculate df differently for one-way and two-way tables.

Expected counts in two-way tables:

Expected count = (row total) × (column total) / (table total)
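The expected-count and df formulas above can be sketched in a few lines of Python. This uses the observed counts from the ball-throwing table; it is an illustrative sketch, not code from the course.

```python
# Observed counts (rows: hit/missed target; columns: Public Policy, Undeclared, Other).
observed = [[40, 10, 10],
            [20, 30, 30]]

row_totals = [sum(row) for row in observed]        # [60, 80]
col_totals = [sum(col) for col in zip(*observed)]  # [60, 40, 40]
table_total = sum(row_totals)                      # 140

# Expected count = (row total) x (column total) / (table total)
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# df = (R - 1) x (C - 1) for a two-way table
R, C = len(observed), len(observed[0])
df = (R - 1) * (C - 1)

print(df)                        # 2
print(round(expected[0][0], 3))  # 25.714  (expected "Hit target / Public Policy" count)
```

Note that the expected counts preserve the row and column totals of the observed table, which is why the same margins appear in both.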



Expected counts in two-way tables

                Public Policy   Undeclared   Other   Total
Hit target           40             10         10      60
Missed target        20             30         30      80
Total                60             40         40     140

df = (R − 1) × (C − 1) = (2 − 1) × (3 − 1) = 2

χ²_df = Σ_{i=1}^{k} (O − E)² / E = (40 − 25.7)² / 25.7 + · · · + (30 − 22.857)² / 22.857 = 24.306

p-value: smaller than 0.001

Upper tail    0.3    0.2    0.1   0.05   0.02   0.01   0.005   0.001
df = 1       1.07   1.64   2.71   3.84   5.41   6.63    7.88   10.83
df = 2       2.41   3.22   4.61   5.99   7.82   9.21   10.60   13.82
df = 3       3.66   4.64   6.25   7.81   9.84  11.34   12.84   16.27
df = 4       4.88   5.99   7.78   9.49  11.67  13.28   14.86   18.47
df = 5       6.06   7.29   9.24  11.07  13.39  15.09   16.75   20.52
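The full χ² = 24.306 computation above can be checked directly. This sketch sums (O − E)²/E over all six cells; for df = 2 the chi-square upper-tail probability happens to have the closed form P(X > x) = exp(−x/2), which lets us confirm "p-value smaller than 0.001" without a table.

```python
import math

observed = [[40, 10, 10],
            [20, 30, 30]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Chi-square statistic: sum of (O - E)^2 / E over all k cells.
chi2 = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / table_total
        chi2 += (observed[i][j] - e) ** 2 / e

print(round(chi2, 3))  # 24.306

# df = 2 special case: chi-square upper tail is exp(-x/2).
p_value = math.exp(-chi2 / 2)
print(p_value < 0.001)  # True
```

This agrees with the table lookup: 24.306 is far beyond the df = 2, 0.001 cutoff of 13.82.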



Modeling numerical variables

So far we have worked with:

1 numerical variable (Z, T)
1 categorical variable (χ²)
1 numerical and 1 categorical variable (2-sample Z/T, ANOVA)
2 categorical variables (χ² test for independence)

Next up: relationships between two numerical variables, as well as modeling numerical response variables using a numerical or categorical explanatory variable.

Wed–Friday: modeling numerical variables using many explanatory variables at once.



Poverty vs. HS graduate rate

The scatterplot below shows the relationship between HS graduate rate in all 50 US states and DC and the % of residents who live below the poverty line (income below $23,050 for a family of 4 in 2012).

[Scatterplot: % HS grad (x-axis, roughly 80–90) vs. % in poverty (y-axis, roughly 6–18)]

Response? % in poverty

Explanatory? % HS grad

Relationship? linear, negative, moderately strong



Quantifying the relationship

Correlation describes the strength of the linear association between two variables.

It takes values between −1 (perfect negative) and +1 (perfect positive).

A value of 0 indicates no linear association.
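The correlation coefficient can be computed directly from its definition. A minimal sketch, using small made-up numbers (not the poverty data from the slides) chosen to lie exactly on a negatively sloped line, so the result is −1:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: the sum of products of deviations from the means,
    divided by the product of the deviation norms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up data falling exactly on y = 98 - x: a perfect negative linear association.
x = [80, 82, 85, 88, 90]
y = [18, 16, 13, 10, 8]
print(round(pearson_r(x, y), 3))  # -1.0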

Page 33: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Correlation

Quantifying the relationship

Correlation describes the strength of the linear associationbetween two variables.

It takes values between -1 (perfect negative) and +1 (perfectpositive).

A value of 0 indicates no linear association.

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 7 / 35

Page 34: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Correlation

Quantifying the relationship

Correlation describes the strength of the linear associationbetween two variables.

It takes values between -1 (perfect negative) and +1 (perfectpositive).

A value of 0 indicates no linear association.

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 7 / 35

Page 35: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Correlation

Guessing the correlation

Question

Which of the following is the best guess for the correlation between %in poverty and % HS grad?

(a) 0.6

(b) -0.75

(c) -0.1

(d) 0.02

(e) -1.5

●●

● ●

●●

80 85 90

6

8

10

12

14

16

18

% HS grad

% in

pov

erty

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 8 / 35

Page 36: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Correlation

Guessing the correlation

Question

Which of the following is the best guess for the correlation between %in poverty and % HS grad?

(a) 0.6

(b) -0.75

(c) -0.1

(d) 0.02

(e) -1.5

●●

● ●

●●

80 85 90

6

8

10

12

14

16

18

% HS grad

% in

pov

erty

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 8 / 35

Page 37: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Correlation

Guessing the correlation

Question

Which of the following is the best guess for the correlation between %in poverty and % HS female householder?

(a) 0.1

(b) -0.6

(c) -0.4

(d) 0.9

(e) 0.5

●●

● ●

●●

8 10 12 14 16 18

6

8

10

12

14

16

18

% female householder, no husband present

% in

pov

erty

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 9 / 35

Page 38: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Correlation

Guessing the correlation

Question

Which of the following is the best guess for the correlation between %in poverty and % HS female householder?

(a) 0.1

(b) -0.6

(c) -0.4

(d) 0.9

(e) 0.5

●●

● ●

●●

8 10 12 14 16 18

6

8

10

12

14

16

18

% female householder, no husband present

% in

pov

erty

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 9 / 35

Page 39: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Correlation

Assessing the correlation

Question

Which of the following is has the strongest correlation, i.e. correlationcoefficient closest to +1 or -1?

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●

●●●●●●

●●●●●●

●●●●

●●●●●●

●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●

●●●●

●●●●

(a)

●●●

●●●

●●●●●

●●

●●

●●

●●●●●

●●●●●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●●

●●

●●●●●

●●

●●

●●●

●●●

●●

(b)

●●

●●●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

(c)

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

(d)

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 10 / 35

Page 40: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Correlation

Assessing the correlation

Question

Which of the following is has the strongest correlation, i.e. correlationcoefficient closest to +1 or -1?

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●

●●●●●●

●●●●●●

●●●●

●●●●●●

●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●

●●●●

●●●●

(a)

●●●

●●●

●●●●●

●●

●●

●●

●●●●●

●●●●●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●●

●●

●●●●●

●●

●●

●●●

●●●

●●

(b)

●●

●●●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

(c)

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

(d)

(b)→correlationmeans linearassociation

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 10 / 35

Page 41: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Fitting a line by least squares regression

1 Recap: Chi-square test of independenceBall throwingExpected counts in two-way tables

2 Modeling numerical variables

3 Correlation

4 Fitting a line by least squares regressionResidualsBest lineThe least squares linePrediction & extrapolationConditions for the least squares lineR2

Categorical explanatory variables

Statistics 101

U6 - L1: Introduction to SLR Thomas Leininger

Page 42: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Fitting a line by least squares regression Residuals

1 Recap: Chi-square test of independenceBall throwingExpected counts in two-way tables

2 Modeling numerical variables

3 Correlation

4 Fitting a line by least squares regressionResidualsBest lineThe least squares linePrediction & extrapolationConditions for the least squares lineR2

Categorical explanatory variables

Statistics 101

U6 - L1: Introduction to SLR Thomas Leininger

Page 43: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Fitting a line by least squares regression Residuals

Residuals

Residuals are the leftovers from the model fit: Data = Fit + Residual

●●

● ●

●●

80 85 90

6

8

10

12

14

16

18

% HS grad

% in

pov

erty

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 11 / 35

Page 44: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.

Fitting a line by least squares regression Residuals

Residuals (cont.)

ResidualResidual is the difference between the observed and predicted y.

ei = yi − yi

●●

● ●

●●

80 85 90

6

8

10

12

14

16

18

% HS grad

% in

pov

erty

y

5.44

yy

−4.16

y

DC

RI

% living in poverty inDC is 5.44% morethan predicted.

% living in poverty inRI is 4.16% less thanpredicted.

Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 12 / 35



Fitting a line by least squares regression: Best line

A measure for the best line

We want a line that has small residuals:

1 Option 1: Minimize the sum of magnitudes (absolute values) of residuals:
|e_1| + |e_2| + · · · + |e_n|

2 Option 2: Minimize the sum of squared residuals (least squares):
e_1² + e_2² + · · · + e_n²

Why least squares?
1 Most commonly used
2 Easier to compute by hand and using software
3 In many applications, a residual twice as large as another is more than twice as bad
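The difference between the two criteria is easy to see on toy data. The points below are hypothetical (not the state data); the last point is an outlier, which illustrates point 3: squaring penalizes one large residual far more than several small ones.

```python
# Score a candidate line (b0, b1) under the two criteria from the slide.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 12.0)]  # hypothetical; last point is an outlier

def residuals(b0, b1):
    return [y - (b0 + b1 * x) for x, y in data]

def sum_abs(b0, b1):  # Option 1: sum of |e_i|
    return sum(abs(e) for e in residuals(b0, b1))

def sum_sq(b0, b1):   # Option 2: sum of e_i^2 (least squares)
    return sum(e * e for e in residuals(b0, b1))

# The outlier's residual (4.0) dominates the squared criterion (16.0 of it)
# much more than the absolute criterion (4.0 of it).
print(sum_abs(0, 2), sum_sq(0, 2))
```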


The least squares line

ŷ = β_0 + β_1 x

where ŷ is the predicted y, β_0 the intercept, β_1 the slope, and x the explanatory variable.

Notation:
Intercept: parameter β_0, point estimate b_0
Slope: parameter β_1, point estimate b_1


Fitting a line by least squares regression: The least squares line

Given...

[Scatterplot: % HS grad vs. % in poverty]

             % HS grad (x)   % in poverty (y)
mean         x̄ = 86.01       ȳ = 11.35
sd           s_x = 3.73      s_y = 3.1
correlation  R = −0.75

Slope

The slope of the regression can be calculated as

b_1 = (s_y / s_x) R

In context:

b_1 = (3.1 / 3.73) × (−0.75) = −0.62

Interpretation: for each additional percentage point in the HS graduation rate, we would expect the % living in poverty to decrease on average by 0.62 percentage points.
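A quick numeric check of the slope formula, using the summary statistics from the table above:

```python
# b1 = (s_y / s_x) * R, with the summary statistics from the slides
s_x, s_y, R = 3.73, 3.1, -0.75

b1 = (s_y / s_x) * R
print(round(b1, 2))  # -0.62
```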


Intercept

The intercept is where the regression line intersects the y-axis. The calculation of the intercept uses the fact that a regression line always passes through (x̄, ȳ):

b_0 = ȳ − b_1 x̄

[Scatterplot with extended axes (% HS grad 0 to 100, % in poverty 0 to 70) showing the intercept]

b_0 = 11.35 − (−0.62) × 86.01 = 64.68
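The same calculation as a sketch, continuing from the slope computed above:

```python
# b0 = ybar - b1 * xbar; the least squares line passes through (xbar, ybar)
x_bar, y_bar = 86.01, 11.35
b1 = -0.62  # slope from the previous slide

b0 = y_bar - b1 * x_bar
print(round(b0, 2))  # 64.68
```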


Interpret b_0

Question: How do we interpret the intercept? (b_0 = 64.68)

[Scatterplot with extended axes; the intercept is where the line meets x = 0]

States with no HS graduates are expected on average to have 64.68% of residents living below the poverty line.


Recap: Interpretation of slope and intercept

Intercept: when x = 0, y is expected to equal the value of the intercept.

Slope: for each unit increase in x, y is expected to increase/decrease on average by the value of the slope.

Regression line

predicted % in poverty = 64.68 − 0.62 × % HS grad

[Scatterplot: % HS grad vs. % in poverty with the fitted line]


Fitting a line by least squares regression: Prediction & extrapolation

Prediction

Using the linear model to predict the value of the response variable for a given value of the explanatory variable is called prediction: simply plug the value of x into the linear model equation.

There will be some uncertainty associated with the predicted value; we'll talk about this next time.

[Scatterplot with the fitted line]

Extrapolation

Applying a model estimate to values outside of the realm of the original data is called extrapolation.

Sometimes the intercept might be an extrapolation.

[Scatterplot with extended axes; the intercept at x = 0 lies far outside the observed data]
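A sketch of prediction with an extrapolation warning. The observed range of 80 to 92 is an assumption read off the scatterplots in these slides; the fitted coefficients are the ones from this lecture.

```python
# Predict % in poverty from % HS grad, flagging extrapolation.
X_MIN, X_MAX = 80.0, 92.0  # approximate observed range of % HS grad (assumption)

def predict_poverty(hs_grad):
    yhat = 64.68 - 0.62 * hs_grad
    extrapolated = not (X_MIN <= hs_grad <= X_MAX)
    return yhat, extrapolated

print(predict_poverty(85.0))  # inside the data: trustworthy
print(predict_poverty(20.0))  # far outside the data: extrapolation
```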

Examples of extrapolation

1 http://www.colbertnation.com/the-colbert-report-videos/269929

2 Sprinting: [figure]


Fitting a line by least squares regression: Conditions for the least squares line

Conditions for the least squares line

1 Linearity

2 Nearly normal residuals

3 Constant variability


Conditions: (1) Linearity

The relationship between the explanatory and the response variable should be linear.

Methods for fitting a model to non-linear relationships exist, but are beyond the scope of this class.

Check using a scatterplot of the data, or a residuals plot.

[Panels: y vs. x scatterplots with their residual plots (summary(g)$residuals vs. x)]


Anatomy of a residuals plot

[Top: % HS grad vs. % in poverty with the regression line; bottom: residuals vs. % HS grad]

∗ RI: % HS grad = 81, % in poverty = 10.3
predicted % in poverty = 64.68 − 0.62 × 81 = 14.46
e = % in poverty − predicted % in poverty = 10.3 − 14.46 = −4.16

□ DC: % HS grad = 86, % in poverty = 16.8
predicted % in poverty = 64.68 − 0.62 × 86 = 11.36
e = % in poverty − predicted % in poverty = 16.8 − 11.36 = 5.44


Conditions: (2) Nearly normal residuals

The residuals should be nearly normal.

This condition may not be satisfied when there are unusual observations that don't follow the trend of the rest of the data.

Check using a histogram or normal probability plot of residuals.

[Histogram of residuals and normal Q-Q plot (theoretical vs. sample quantiles)]


Conditions: (3) Constant variability

[Scatterplot with the fitted line, and the residual plot beneath it]

The variability of points around the least squares line should be roughly constant.

This implies that the variability of residuals around the 0 line should be roughly constant as well.

Also called homoscedasticity.

Check using a residuals plot.
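These checks can be sketched numerically as well as graphically. The (x, y) pairs below are hypothetical values shaped like the state scatterplot, and splitting at x = 85 is an arbitrary choice for comparing residual spread across the range of x:

```python
import statistics

# Hypothetical (x, y) pairs resembling the % HS grad / % in poverty plot
xs = [80, 82, 83, 85, 86, 88, 90, 91]
ys = [15.0, 13.9, 13.5, 12.2, 11.4, 10.1, 8.9, 8.2]

def residuals(b0, b1):
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

res = residuals(64.68, -0.62)  # fitted line from this lecture

# Constant variability: residual spread should look similar across x.
lower = [e for x, e in zip(xs, res) if x <= 85]
upper = [e for x, e in zip(xs, res) if x > 85]
print(statistics.stdev(lower), statistics.stdev(upper))
```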


Checking conditions

Question: What condition is this linear model obviously violating?

(a) Constant variability
(b) Linear relationship
(c) Non-normal residuals
(d) No extreme outliers

[Scatterplot of y vs. x with the fitted line, and the residual plot (g$residuals)]


Checking conditions

Question: What condition is this linear model obviously violating?

(a) Constant variability
(b) Linear relationship
(c) Non-normal residuals
(d) No extreme outliers

[A different scatterplot and residual plot from the previous question]


Page 95: Unit 6: Simple Linear Regression Lecture : Introduction to SLRtjl13/s101/slides/unit6lec1.pdf · Statistics 101 (Thomas Leininger) U6 - L1: Introduction to SLR June 17, 2013 3 / 35.


Fitting a line by least squares regression R2

R2

The strength of the fit of a linear model is most commonly evaluated using R2.

R2 is calculated as the square of the correlation coefficient.

It tells us what percent of the variability in the response variable is explained by the model.

The remainder of the variability is explained by variables not included in the model.

For the model we’ve been working with, R2 = (−0.62)² = 0.38.
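As an illustrative sketch (Python, with invented data showing a negative association like the one on the slides), r and R2 can be computed directly from their definitions:

```python
# Illustrative sketch: correlation coefficient r and R^2 from scratch.
# The data are hypothetical, not the 51-state dataset from the slides.
hs_grad = [82.0, 84.5, 86.0, 88.0, 90.5]   # hypothetical % HS graduates
poverty = [14.0, 13.2, 11.5, 10.1, 9.0]    # hypothetical % in poverty

n = len(hs_grad)
mean_x = sum(hs_grad) / n
mean_y = sum(poverty) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hs_grad, poverty))
sxx = sum((x - mean_x) ** 2 for x in hs_grad)
syy = sum((y - mean_y) ** 2 for y in poverty)

r = sxy / (sxx * syy) ** 0.5   # correlation coefficient (negative here)
r_squared = r ** 2             # fraction of variability explained

print(round(r, 3), round(r_squared, 3))
```

Note that squaring discards the sign: R2 says how much variability the model explains, while the sign of r is what tells you the direction of the association.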


Fitting a line by least squares regression R2

Interpretation of R2

Question

Which of the following is the correct interpretation of R = −0.62, R2 = 0.38?

(a) 38% of the variability in the % of HS graduates among the 51 states is explained by the model.

(b) 38% of the variability in the % of residents living in poverty among the 51 states is explained by the model.

(c) 38% of the time the % of HS graduates predicts the % living in poverty correctly.

(d) 62% of the variability in the % of residents living in poverty among the 51 states is explained by the model.

[Figure: scatterplot of % in poverty (6–18) vs. % HS grad (80–90) for the 51 states]


Fitting a line by least squares regression Categorical explanatory variables

Poverty vs. region (east, west)

poverty = 11.17 + 0.38 × west

Explanatory variable: region, reference level: east

Intercept: The estimated average poverty percentage in eastern states is 11.17%.

This is the value we get if we plug in 0 for the explanatory variable.

Slope: The estimated average poverty percentage in western states is 0.38% higher than in eastern states.

Then, the estimated average poverty percentage in western states is 11.17 + 0.38 = 11.55%.

This is the value we get if we plug in 1 for the explanatory variable.

This is called using a dummy variable.
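A minimal sketch of how the dummy variable works in this model (Python; the function name is ours, but the coefficients are the 11.17 and 0.38 from the slide):

```python
# Dummy-variable prediction for poverty-hat = 11.17 + 0.38 * west,
# where west = 0 for the reference level (east) and 1 for west.
intercept, slope = 11.17, 0.38

def predicted_poverty(region):
    """Estimated average poverty % for 'east' or 'west'."""
    west = 1 if region == "west" else 0  # dummy encoding of region
    return round(intercept + slope * west, 2)

print(predicted_poverty("east"))   # plug in 0: intercept alone, 11.17
print(predicted_poverty("west"))   # plug in 1: 11.17 + 0.38 = 11.55
```

Plugging in 0 recovers the reference level's mean, and plugging in 1 adds the slope, which is exactly the east/west difference in estimated average poverty.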
