Video Conference 1 AS 2013/2012 Chapters 10 – Correlation and Regression 15 December 2013 10 am...

Post on 20-Jan-2018


Video Conference 1

AS 2013/2012
Chapter 10 – Correlation and Regression

15 December 2013, 10 am – 11 am

Puan Hasmawati Binti Hassan
hasma@usm.my
04-6532285

Chapter 10 Overview

Introduction
10-1 Scatter Plots and Correlation
10-2 Regression
10-3 Coefficient of Determination and Standard Error of the Estimate
10-4 Multiple Regression (Optional)

Objectives

JIM 212

After going through this lesson, you should be able to:

• Draw a scatter plot for a set of ordered pairs
• Compute the correlation coefficient, r
• Test the hypothesis H0: ρ = 0 (test the significance of the correlation coefficient)

Objectives

1. Draw a scatter plot for a set of ordered pairs.
2. Compute the correlation coefficient.
3. Test the hypothesis H0: ρ = 0.
4. Compute the equation of the regression line.
5. Compute the standard error of the estimate.
6. Find a prediction interval.
7. Be familiar with the concept of multiple regression: determining whether a relationship between two or more numerical or quantitative variables exists.

Terminology

1. Correlation
2. Independent variable
3. Dependent variable
4. Relationship
5. Simple relationship
6. Multiple relationship
7. Positive relationship
8. Negative relationship
9. Linear relationship
10. Correlation coefficient
11. Prediction

Introduction

In addition to hypothesis testing and confidence intervals, inferential statistics involves determining whether a relationship between two or more numerical or quantitative variables exists.

• Correlation is a statistical method used to determine whether a linear relationship between variables exists.

Introduction (cont…)

• The purpose of this chapter is to answer these questions statistically:

1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?

Introduction (cont…)

1. Are two or more variables related?
2. If so, what is the strength of the relationship?

To answer these two questions, statisticians use the correlation coefficient, a numerical measure that indicates whether two or more variables are related and how strong the relationship between or among the variables is.

Introduction (cont…)

3. What type of relationship exists?

There are two types of relationships: simple and multiple.

In a simple relationship, there are two variables: an independent variable (predictor variable) and a dependent variable (response variable).

In a multiple relationship, there are two or more independent variables that are used to predict one dependent variable.

Introduction (cont…)

4. What kind of predictions can be made from the relationship?

Predictions are made in all areas and daily. Examples include weather forecasting, stock market analyses, sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate than others because of the strength of the relationship: the stronger the relationship between the variables, the more accurate the prediction.

Correlation & Regression

• Both are STATISTICAL METHODS
• Correlation - to determine whether a relationship between variables exists
• Regression - to describe the nature of the relationship between variables (+ or -, linear or nonlinear)

The purpose of this chapter is to answer these questions statistically:

1. Are two or more variables related?
2. If so, what is the strength of the relationship?
   Answer to 1 and 2: correlation coefficient
3. What type of relationship exists?
   Answer: simple & multiple
4. What kind of predictions can be made from the relationship?
   Answer: all areas and daily

Scatter Plots

• A scatter plot is a graph of ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.

• Independent variable?
• Dependent variable?

Q1(i) Forest Fires and Acres Burned
a) Page 549 Ex. 10 – 1 No. 14

[Scatter plot: number of fires vs. number of acres burned]

Correlation

Correlation is a statistical method used to determine whether a linear relationship between variables exists.

Correlation (cont…)

• The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.

• There are several types of correlation coefficients. The one explained in this section is called the Pearson product moment correlation coefficient (PPMC).

• The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ.

Correlation (cont…)

• The range of the correlation coefficient is from -1 to +1.

• If there is a strong positive linear relationship between the variables, the value of r will be close to +1.

• If there is a strong negative linear relationship between the variables, the value of r will be close to -1.

Correlation Coefficient

o Numerical measure to determine whether two or more variables are linearly related, and
o to determine the strength of the relationship between or among the variables.

Correlation Coefficient (cont…)

The correlation coefficient measures the strength (strong, weak) and direction (+, -) of a linear relationship between two variables.

r : sample correlation coefficient
ρ : population correlation coefficient
Range: -1 ≤ r ≤ 1

**Look at page 540 Figure 10-6

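Formula for Correlation Coefficient

One of the formulas for r:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where n is the number of data pairs.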

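1(i) b) Page 549 Ex. 10 – 1 No. 14

$$\sum x = 494,\quad \sum y = 260,\quad \sum x^2 = 31{,}692,\quad \sum y^2 = 10{,}596,\quad \sum xy = 17{,}285,\quad n = 8$$

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{8(17{,}285) - (494)(260)}{\sqrt{\left[8(31{,}692) - 494^2\right]\left[8(10{,}596) - 260^2\right]}} = 0.771$$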
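As a check on the arithmetic in this exercise, the formula for r can be evaluated directly from the summary totals. The snippet below is only an illustrative sketch; the function name `pearson_r_from_sums` is an assumption made here and does not come from the course text.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r computed from summary totals (hypothetical helper name)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Totals quoted for the forest fires exercise (n = 8 data pairs).
r = pearson_r_from_sums(n=8, sum_x=494, sum_y=260,
                        sum_x2=31_692, sum_y2=10_596, sum_xy=17_285)
print(round(r, 3))
```

The printed value should agree with the hand calculation once rounded to three decimal places.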

The Significance of the Correlation Coefficient

Use a hypothesis-testing procedure to decide whether the value of r is significant.

3 ways:
1. Traditional method
2. P-value method
3. Using Table I in Appendix C

Hypothesis Testing

• In hypothesis testing, one of the following is true:

H0: ρ = 0 This null hypothesis means that there is no correlation between the x and y variables in the population.

H1: ρ ≠ 0 This alternative hypothesis means that there is a significant correlation between the variables in the population.

1(i) (c, d, e) Page 549 Ex. 10 – 1 No. 14 cont...

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

$$t = r\sqrt{\frac{n-2}{1-r^2}} = 0.771\sqrt{\frac{8-2}{1-0.771^2}} = 2.966$$

$$\text{c.v.} = \pm 2.447 \quad (\text{d.f.} = 8 - 2 = 6,\ \alpha = 0.05)$$

Decision: Reject the null hypothesis, since the test value falls in the critical region. There is a significant linear relationship between the number of forest fires and the number of acres burned.
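The traditional-method test for this exercise can be sketched in a few lines. This is an illustrative sketch, not part of the original exercise: the name `t_statistic` is an assumption, and the critical value 2.447 is copied from the worked example rather than looked up by the code.

```python
import math

def t_statistic(r, n):
    """Test value for H0: rho = 0, using n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.771, 8
critical_value = 2.447  # two-tailed value taken from the worked example

t = t_statistic(r, n)
reject_h0 = abs(t) > critical_value  # True means the test value is in the critical region
print(round(t, 3), reject_h0)
```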

Now try using the other two procedures.

10.2 Regression

If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data's line of best fit.

Regression

Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
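The "best fit" condition can be written out as a short derivation sketch. This is standard least-squares reasoning added for illustration and is not taken from the slides; the symbol S is introduced here only to name the quantity being minimized. A line y' = a + bx is chosen so that the sum of squared vertical distances

$$S(a, b) = \sum \left(y - a - bx\right)^2$$

is as small as possible. Setting the partial derivatives of S with respect to a and b equal to zero gives the two normal equations

$$\sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2 ,$$

and solving this pair of equations gives the intercept a and the slope b of the line of best fit.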

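Regression Line

$$y' = a + bx$$

$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2}$$

$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$

where a = the y' intercept and b = the slope of the line.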

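Q1(ii) Forest Fires and Acres Burned
Page 559 Ex. 10 – 2 No. 14

$$\sum x = 494,\quad \sum y = 260,\quad \sum xy = 17{,}285,\quad \sum x^2 = 31{,}692,\quad \sum y^2 = 10{,}596,\quad n = 8$$

$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(260)(31{,}692) - (494)(17{,}285)}{8(31{,}692) - 494^2} = \frac{-298{,}870}{9500} = -31.46$$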

(Q.1(ii)) Page 559 Ex. 10 – 2 No. 14 cont...

$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{8(17{,}285) - (494)(260)}{8(31{,}692) - 494^2} = \frac{9840}{9500} = 1.036$$

$$y' = -31.46 + 1.036x$$
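The intercept and slope for this exercise can be reproduced from the same totals. The following is a minimal sketch; `regression_coefficients` is a hypothetical helper name chosen for this illustration.

```python
def regression_coefficients(n, sum_x, sum_y, sum_x2, sum_xy):
    """Intercept a and slope b of y' = a + b*x from summary totals."""
    denominator = n * sum_x2 - sum_x ** 2   # shared by both formulas
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# Totals quoted for the forest fires exercise.
a, b = regression_coefficients(n=8, sum_x=494, sum_y=260,
                               sum_x2=31_692, sum_xy=17_285)
print(round(a, 2), round(b, 3))
```

Both formulas share the denominator nΣx² - (Σx)², so it is computed once and reused.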

(Q.1(ii)) Page 559 Ex. 10 – 2 No. 14 cont...

$$y' = -31.46 + 1.036x$$

[Scatter plot with the regression line: number of fires vs. number of acres burned]

(Q.1(ii)) Page 559 Ex. 10 – 2 No. 14 cont...

$$y' = -31.46 + 1.036x$$

When x = 60:

$$y' = -31.46 + 1.036(60) = 30.7 \text{ acres}$$
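Once the line is known, a prediction is a single substitution. A small sketch follows, using the rounded coefficients from the worked example; the function name `predict` is an assumption made for illustration.

```python
def predict(a, b, x):
    """Predicted value y' = a + b*x on the fitted regression line."""
    return a + b * x

# Rounded coefficients from the worked example, evaluated at x = 60.
y_pred = predict(a=-31.46, b=1.036, x=60)
print(round(y_pred, 1))
```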

Q1(iii) Page 574 Ex. 10 – 3 No. 16 (Forest Fires and Acres Burned)

Regression line: $y' = -31.46 + 1.036x$

$$\sum y^2 = 10{,}596,\quad \sum y = 260,\quad \sum xy = 17{,}285,\quad n = 8$$

$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}} = \sqrt{\frac{10{,}596 - (-31.46)(260) - (1.036)(17{,}285)}{8-2}} = 12.03$$
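The standard error of the estimate for this exercise can be checked the same way. This is a sketch under the assumption that the rounded coefficients a = -31.46 and b = 1.036 are used, as in the hand calculation; `standard_error_of_estimate` is a hypothetical name.

```python
import math

def standard_error_of_estimate(n, sum_y, sum_y2, sum_xy, a, b):
    """s_est from the computational formula, with n - 2 in the denominator."""
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Totals and rounded coefficients from the worked example.
s_est = standard_error_of_estimate(n=8, sum_y=260, sum_y2=10_596,
                                   sum_xy=17_285, a=-31.46, b=1.036)
print(round(s_est, 2))
```

The result is sensitive to rounding: the numerator is a small difference between large totals, so keeping more decimal places in b (9840/9500) gives a slightly larger value, about 12.06.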

Q1(iv) Page 574 Ex. 10 – 3 No. 20 (Forest Fires and Acres Burned)

$$y' \pm t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

$$\sum x = 494,\quad \sum x^2 = 31{,}692$$

When x = 60, y' = 30.7

$$\bar{X} = \frac{494}{8} = 61.75,\quad s_{est} = 12.03,\quad t_{\alpha/2} = 2.447$$

(Q1(iv)) Page 574 Ex. 10 – 3 No. 20 cont... (Forest Fires and Acres Burned)

$$y' \pm t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} = 30.7 \pm (2.447)(12.03)\sqrt{1 + \frac{1}{8} + \frac{8(60 - 61.75)^2}{8(31{,}692) - 494^2}} = 30.7 \pm 31.259$$

$$-0.559 < y < 61.959$$
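The interval for this exercise can be reproduced numerically. The sketch below assumes the rounded inputs from the worked example (y' = 30.7, s_est = 12.03, t = 2.447); `prediction_interval` is a hypothetical helper name.

```python
import math

def prediction_interval(x, y_pred, t_crit, s_est, n, sum_x, sum_x2):
    """Lower and upper limits of the prediction interval for y at a given x."""
    x_bar = sum_x / n
    margin = t_crit * s_est * math.sqrt(
        1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    return y_pred - margin, y_pred + margin

# Rounded inputs copied from the worked example.
low, high = prediction_interval(x=60, y_pred=30.7, t_crit=2.447,
                                s_est=12.03, n=8, sum_x=494, sum_x2=31_692)
print(round(low, 3), round(high, 3))
```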

Q2(i) State Debt and Per Capita Tax
a) Page 549 Ex. 10 – 1 No. 16

[Scatter plot: x and y axes each running from 500 to 1900]

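2(i) b) Page 549 Ex. 10 – 1 No. 16

$$\sum x = 6545,\quad \sum y = 8416,\quad \sum xy = 11{,}247{,}109,\quad \sum x^2 = 9{,}635{,}035,\quad \sum y^2 = 14{,}351{,}678$$

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{5(11{,}247{,}109) - (6545)(8416)}{\sqrt{\left[5(9{,}635{,}035) - 6545^2\right]\left[5(14{,}351{,}678) - 8416^2\right]}} = 0.518$$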

2(i) (c, d, e) Page 549 Ex. 10 – 1 No. 16 cont...

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

$$\text{d.f.} = 5 - 2 = 3,\quad \alpha = 0.05,\quad \text{c.v.} = \pm 0.878$$

$$r = 0.518$$

$$-0.878 < 0.518 < 0.878$$

Decision: Do not reject the null hypothesis. There is no significant linear relationship between per capita debt and tax.
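The decision for this exercise compares r directly with the tabled critical value instead of computing a t value. A minimal sketch of that comparison follows; the variable names are assumptions, and the critical value 0.878 is copied from the worked example rather than looked up by the code.

```python
import math

# Totals quoted for the state debt exercise (n = 5 data pairs).
n = 5
sum_x, sum_y = 6545, 8416
sum_x2, sum_y2, sum_xy = 9_635_035, 14_351_678, 11_247_109

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

critical_r = 0.878                 # d.f. = 3, alpha = 0.05, from the worked example
significant = abs(r) > critical_r  # False means: do not reject H0
print(round(r, 3), significant)
```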

Q2(ii) State Debt and Per Capita Tax
Page 549 Ex. 10 – 2 No. 16

$$r = 0.518$$

From the hypothesis testing done, the null hypothesis is not rejected (r is not significant).

Therefore, there is no significant linear relationship between state debt and per capita tax.

Therefore, no regression should be done.

Q2(ii) State Debt and Per Capita Tax
Page 549 Ex. 10 – 2 No. 16 (cont...)

No regression line, no prediction??? When r is not significant, ......?........ is the best predictor of y.

Standard Error of the Estimate

The standard error of the estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y' values. The formula for the standard error of the estimate is:

$$s_{est} = \sqrt{\frac{\sum \left(y - y'\right)^2}{n-2}}$$

$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}}$$

Q2(iii) Page 574 Ex. 10 – 3 No. 18 (State Debt and Per Capita Tax)

Since r is not significant, the standard error should not be calculated.

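Prediction Interval

$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$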

Q2(iv) Page 574 Ex. 10 – 3 No. 22 (State Debt and Per Capita Tax)

Since r is not significant, the prediction interval should not be calculated.

Multiple Regression

In multiple regression, there are several independent variables and one dependent variable, and the equation is

$$y' = a + b_1x_1 + b_2x_2 + \cdots + b_kx_k$$

where $x_1, x_2, \ldots, x_k$ = independent variables.

Assumptions for Multiple Regression

1. normality assumption - for any specific value of the independent variable, the values of the y variable are normally distributed.

2. equal-variance assumption - the variances (or standard deviations) for the y variables are the same for each value of the independent variable.

3. linearity assumption - there is a linear relationship between the dependent variable and the independent variables.

4. nonmulticollinearity assumption - the independent variables are not correlated.

5. independence assumption - the values for the y variables are independent.

Q3. Special Occasion Cakes
Page 581 Ex. 10 – 4 No. 8

$$y' = -26.279 + 14.855x_1 + 3.1035x_2 + 0.73079x_3$$

x1 = number of layers desired
x2 = number of servings needed
x3 = amount of filling mix used
y' = price of a cake

$$y' = -26.279 + 14.855(3) + 3.1035(48) + 0.73079(40) = \$196.49$$
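The substitution for this exercise can be checked with a short sketch. Note the assumption: the signs of the coefficients are not legible in the transcript, so they are inferred here from the stated answer of $196.49 (a negative constant and three positive coefficients is the only combination that reproduces it). The function name `predict_price` is hypothetical.

```python
def predict_price(layers, servings, filling):
    """Predicted cake price from the multiple regression equation.

    Coefficient signs are inferred from the stated result, not read
    directly from the source.
    """
    return -26.279 + 14.855 * layers + 3.1035 * servings + 0.73079 * filling

# Values substituted in the worked example: x1 = 3, x2 = 48, x3 = 40.
price = predict_price(layers=3, servings=48, filling=40)
print(f"${price:.2f}")
```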

Thank You