SADC Course in Statistics Simple Linear Regression (Session 02)

Date post: 28-Mar-2015
Upload: abigail-hickey
Page 1: SADC Course in Statistics Simple Linear Regression (Session 02)

SADC Course in Statistics

Simple Linear Regression

(Session 02)

Page 2: SADC Course in Statistics Simple Linear Regression (Session 02)

Learning Objectives

At the end of this session, you will be able to

• understand the meaning of a simple linear regression model, its aims and terminology

• determine the best fitting line describing the relationship between a quantitative response (y) and a quantitative explanatory variable (x)

• interpret the unknown parameters of the regression line

Page 3: SADC Course in Statistics Simple Linear Regression (Session 02)

An illustrative example

The data on the next slide show the average number of cigarettes smoked per adult in 1930 and the death rate per million in 1952 for sixteen countries.

The question of interest is whether there is a relationship between the death rate (y) and level of smoking (x). Here both y and x are quantitative measurements.

Page 4: SADC Course in Statistics Simple Linear Regression (Session 02)

The Data

Country              Cig. smoked (x)   Death rate (y)
England and Wales    1378              461
Finland              1662              433
Austria               960              380
Netherlands           632              276
Belgium              1066              254
Switzerland           706              236
New Zealand           478              216
U.S.A.               1296              202
Denmark               465              179
Australia             504              177
Canada                760              176
France                585              140
Italy                 455              110
Sweden                388               89
Norway                359               77
Japan                 723               40

Page 5: SADC Course in Statistics Simple Linear Regression (Session 02)

Start by plotting - shows pattern

- A straight line relationship seems plausible here.

[Scatter plot: Death rate (y), axis 0 to 500, against Cigarettes smoked (x), axis 0 to 2000]
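A plot like this one could be redrawn along the following lines. This is only a sketch: the slides do not say which software produced the original figure, so the use of matplotlib, the function name `plot_scatter`, and the output file name are all assumptions. The data values are transcribed from the table in this session.

```python
# Data transcribed from the table in this session (16 countries).
CIGARETTES = [1378, 1662, 960, 632, 1066, 706, 478, 1296,
              465, 504, 760, 585, 455, 388, 359, 723]
DEATH_RATE = [461, 433, 380, 276, 254, 236, 216, 202,
              179, 177, 176, 140, 110, 89, 77, 40]


def plot_scatter(path="smoking_scatter.png"):
    """Save a scatter plot of death rate (y) against cigarettes smoked (x).

    The function name and file name are hypothetical, chosen for illustration.
    """
    # matplotlib is imported inside the function so that the data lists
    # above can be used even where matplotlib is not installed.
    import matplotlib
    matplotlib.use("Agg")  # draw to a file; no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(CIGARETTES, DEATH_RATE)
    ax.set_xlabel("Cigarettes smoked (x)")
    ax.set_ylabel("Death rate (y)")
    ax.set_xlim(0, 2000)  # same axis ranges as the slide
    ax.set_ylim(0, 500)
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_scatter()` writes the image to the given path and returns that path.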

Page 6: SADC Course in Statistics Simple Linear Regression (Session 02)

Recall reasons for modelling

• To determine which of (often) several factors explain variability in the key response of interest;

• To summarise the relationship(s);

• For predictive purposes, e.g. predicting y for given x’s, or identifying x’s that optimise y in some way.

Note: Presence of an association between variables does not necessarily imply causation.

Page 7: SADC Course in Statistics Simple Linear Regression (Session 02)

Describing the Regression Model

Describe variation in response (here death rate) in terms of its relationship with the explanatory variable (here cig. numbers).

Model: data = pattern + residual

– can describe pattern as: a + bx, if a straight line relationship seems reasonable

– residual is unexplained variation, assumed to be random.

Page 8: SADC Course in Statistics Simple Linear Regression (Session 02)

Simple Linear Regression Model

If there is only one explanatory variable, we have a Simple Linear Regression Model.

Here data = pattern + residual becomes:

$y = \alpha + \beta x + \varepsilon$

where $\alpha + \beta x$ = pattern and $\varepsilon$ = residual.

• $\alpha$ is called the intercept
• $\beta$ is called the slope
• the $\varepsilon$’s represent the departures of the observed values from the true line.

Page 9: SADC Course in Statistics Simple Linear Regression (Session 02)

A Diagrammatic Representation

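[Diagram: observed points (×) scattered about the line $y = \alpha + \beta x$, drawn on x and y axes; braces mark the vertical departure $\varepsilon_i$ of an observed value $y_i$ from the line at $x_i$]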

Page 10: SADC Course in Statistics Simple Linear Regression (Session 02)

Parameters of Model & Assumptions

$\alpha$ and $\beta$ are the unknown parameters in the model. They are estimated from the data.

• The random error, $\varepsilon$, is assumed to have
– a normal distribution
– with constant variance (whatever the value of x)

We shall return to these assumptions later.
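To make "data = pattern + residual" concrete, the sketch below simulates a few observations from a straight-line model with normal errors of constant spread. Everything numeric here is hypothetical (the intercept 30, slope 0.25, standard deviation 80, and the x values are chosen only for illustration), and the errors are also given mean zero, which is the usual convention but is not stated on the slide.

```python
import random


def simulate(xs, alpha, beta, sigma, seed=0):
    """Simulate y = alpha + beta * x + error for each x in xs.

    Errors are drawn from a normal distribution with mean 0 (an assumed
    convention) and the same standard deviation sigma whatever the value of x.
    Returns a list of (x, pattern, residual, y) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rows = []
    for x in xs:
        pattern = alpha + beta * x        # the straight-line part
        residual = rng.gauss(0.0, sigma)  # unexplained random variation
        rows.append((x, pattern, residual, pattern + residual))
    return rows


# Hypothetical parameter values, for illustration only.
rows = simulate([400, 800, 1200, 1600], alpha=30.0, beta=0.25, sigma=80.0)
for x, pattern, residual, y in rows:
    print(f"x={x:5d}  pattern={pattern:7.1f}  residual={residual:7.1f}  y={y:7.1f}")
```

Each simulated y differs from its point on the line only by the residual, and the spread of the residuals does not depend on x.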

Page 11: SADC Course in Statistics Simple Linear Regression (Session 02)

Results of model fitting

------------------------------------------------------
deathrate|  Coef.   Std.Err.     t    P>|t|   [95% Conf.Int.]
---------+--------------------------------------------
Cigars   |  .2410    .0544     4.43   0.001    .1245    .3577
Const.   |  28.31    46.92     0.60   0.556   -72.34   128.95
------------------------------------------------------

These are estimates of the coefficients of the regression equation, since the data are a sample; their precision is quantified by the standard errors.

Estimated equation is: y = 28.31 + 0.241 * x

Note: The t and P>|t| columns will be discussed in the next session.
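The Coef. and Std.Err. columns can be checked with a short calculation. The sketch below is not the software that produced the output above; it is a plain least-squares computation using the data from this session. The standard-error formulas are the usual textbook ones, based on the residual variance with n - 2 degrees of freedom, and are an assumption here because the slides do not state them. The function name `fit_line` is hypothetical.

```python
import math

# Data transcribed from the table in this session (16 countries).
x = [1378, 1662, 960, 632, 1066, 706, 478, 1296,
     465, 504, 760, 585, 455, 388, 359, 723]
y = [461, 433, 380, 276, 254, 236, 216, 202,
     179, 177, 176, 140, 110, 89, 77, 40]


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x with standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and residual variance on n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    # Textbook standard errors (assumed; not given on the slides).
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return {"slope": slope, "intercept": intercept,
            "se_slope": se_slope, "se_intercept": se_intercept}


fit = fit_line(x, y)
print(f"Cigars  coef={fit['slope']:.4f}  se={fit['se_slope']:.4f}")
print(f"Const.  coef={fit['intercept']:.2f}  se={fit['se_intercept']:.2f}")
```

The printed values should agree with the table to the precision it shows.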

Page 12: SADC Course in Statistics Simple Linear Regression (Session 02)

The fitted line

[Scatter plot: Death rate (y), axis 0 to 500, against Cigarettes smoked (x), axis 0 to 2000, with the fitted values shown as a line]

Page 13: SADC Course in Statistics Simple Linear Regression (Session 02)

Interpreting model parameters

• Slope (regression coefficient): If cigarettes smoked increases by 1 unit per year, the estimated death rate increases by 0.24 units (deaths per million). Equivalently, if cigarettes smoked increases by 100 units, the estimated death rate increases by 24 units.

• Intercept of 28.31 only has meaning if the range of x values (cigarettes smoked) under study includes the value of zero. Taken at face value, zero cigarettes smoked still gives an estimated death rate of 28.3, but x = 0 lies outside the observed range here (359 to 1662).

Page 14: SADC Course in Statistics Simple Linear Regression (Session 02)

Predictions from the line

The model equation can also be used to predict y at a given value of x.

Thus from y = 28.31 + 0.241 x, the predicted death rate ($\hat{y}$) in a country where the number of cigarettes smoked is x = 1000 is given by

$\hat{y} = 28.31 + 0.241(1000) = 269.3$

Note: Predictions will be discussed in greater detail in Session 9.
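The prediction and the slope interpretation can both be reproduced from the estimated equation. In this sketch the function names and the range check are illustrative additions; the coefficients are the rounded estimates quoted in this session, and the range limits are the smallest and largest x values in the data table.

```python
# Rounded estimates from the fitted equation y = 28.31 + 0.241 * x.
INTERCEPT = 28.31
SLOPE = 0.241

# Smallest and largest x values in the data table.
X_MIN, X_MAX = 359, 1662


def predict_death_rate(cigarettes):
    """Predicted death rate (per million) for a given level of smoking."""
    return INTERCEPT + SLOPE * cigarettes


def in_observed_range(cigarettes):
    """True if the x value lies within the range of the observed data."""
    return X_MIN <= cigarettes <= X_MAX


print(round(predict_death_rate(1000), 1))  # 269.3
# Difference between two predictions 100 cigarettes apart: about 24 units.
print(round(predict_death_rate(1100) - predict_death_rate(1000), 1))  # 24.1
# x = 0 is outside the observed range, so the intercept is an extrapolation.
print(in_observed_range(0))  # False
```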

Page 15: SADC Course in Statistics Simple Linear Regression (Session 02)

Computation of model estimates (for reference only)

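$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} = \frac{\sum y_i - \hat{\beta}\sum x_i}{n}$$

$$\hat{\beta} = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{S_{xy}}{S_{xx}}$$

Note: Can also write

$$\frac{S_{xy}}{S_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$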
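As a worked illustration, the computational formulas can be applied to the sixteen countries in this session. The sums below were calculated from the data table rather than taken from the slides, so they are worth re-checking; $\hat{\alpha}$ and $\hat{\beta}$ denote the estimated intercept and slope.

```latex
\begin{align*}
n &= 16, \quad \sum x_i = 12417, \quad \sum y_i = 3446, \quad
  \sum x_i^2 = 11\,924\,289, \quad \sum x_i y_i = 3\,225\,804 \\[4pt]
S_{xy} &= 3\,225\,804 - \frac{12417 \times 3446}{16} = 551\,492.625 \\[4pt]
S_{xx} &= 11\,924\,289 - \frac{12417^2}{16} = 2\,287\,920.9375 \\[4pt]
\hat{\beta} &= \frac{S_{xy}}{S_{xx}} = \frac{551\,492.625}{2\,287\,920.9375} \approx 0.241045 \\[4pt]
\hat{\alpha} &= \bar{y} - \hat{\beta}\,\bar{x}
  = 215.375 - 0.241045 \times 776.0625 \approx 28.31
\end{align*}
```

These agree with the estimates reported earlier (slope 0.2410, constant 28.31). Note that the slope should be carried to more decimal places than 0.241 when computing the intercept; otherwise rounding error shifts the result.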

Page 16: SADC Course in Statistics Simple Linear Regression (Session 02)

Practical work follows to ensure learning objectives are achieved…

