
Regression and correlation Dependence of two quantitative variables.

Date post: 05-Jan-2016
Upload: amberly-dorsey
Regression and correlation Dependence of two quantitative variables
Transcript
Page 1: Regression and correlation Dependence of two quantitative variables.

Regression and correlation

Dependence of two quantitative variables

Page 2: Regression and correlation Dependence of two quantitative variables.

Regression – we know which variable is dependent and which is independent.

[Figure: scatterplot of length on age with the fitted line length = 0.713 + 0.2702 · age; x-axis: age [years], 2–18; y-axis: length [cm], 1.0–5.5]

Page 3: Regression and correlation Dependence of two quantitative variables.

Similar dependencies:

• Height of a plant on the nutrient content of the soil

• Intensity of photosynthesis on the amount of light

• Species diversity on latitude

• Rate of an enzymatic reaction on temperature

• and not vice versa

Page 4: Regression and correlation Dependence of two quantitative variables.

Correlation – both variables are “equal”

[Figure: scatterplot of tree height HIGH [m], 0–60, against diameter DBH [cm], 0–80]

Page 5: Regression and correlation Dependence of two quantitative variables.

Similarly, we can be interested in correlations of

• Pb and Cd contents in water

• Numbers of points from tests in maths and in chemistry

• Cover of Cirsium and Agropyron in squares in a meadow

• anywhere it is hard to say what depends on what

Page 6: Regression and correlation Dependence of two quantitative variables.

Even with “equal” variables

• we can use one of them as a predictor.

• Regression is then used even in cases where there is no clear causality. For example, I can predict the height of a tree on the basis of DBH (the easier measurement).

Page 7: Regression and correlation Dependence of two quantitative variables.

Model of simple linear regression

Y = \alpha + \beta X + \varepsilon

Y – dependent variable, response

X – independent variable, predictor

\alpha – intercept

\beta – slope, coefficient of regression

\varepsilon – error variability, \varepsilon \sim N(0, \sigma^2)
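The model above can be sketched numerically. A minimal simulation, assuming hypothetical values of α, β and σ (these numbers are illustrative, not the lecture's data):

```python
import numpy as np

# Hypothetical parameter values, for illustration only
alpha, beta, sigma = 0.7, 0.27, 0.2

rng = np.random.default_rng(0)
x = np.linspace(2, 18, 50)                  # X: independent variable, measured exactly
eps = rng.normal(0.0, sigma, size=x.size)   # epsilon: error variability, N(0, sigma^2)
y = alpha + beta * x + eps                  # Y: dependent variable (response)
```

Each Y value is the line value at X plus a normally distributed error, which is exactly what the model equation states.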

Page 8: Regression and correlation Dependence of two quantitative variables.

The coefficient of regression β is the slope of the line: how much Y changes if X changes by one unit. So it is a value dependent on the units in which X and Y are measured, and it can range from −∞ to +∞.

\alpha = value of Y when X = 0

\beta = tangent of the line's angle of slope

Page 9: Regression and correlation Dependence of two quantitative variables.

So, we presume:

• X is measured exactly

• the Y measurement is subject to error

• the mean value of Y depends linearly on X

• the variance “around the line” is always the same (homogeneity of variances)

Page 10: Regression and correlation Dependence of two quantitative variables.

Which line is the best one?

[Figure: scatterplot of Y, 0–12, against X, 0–10, with a candidate line]

Page 11: Regression and correlation Dependence of two quantitative variables.

Which line is the best one?

[Figure: the same scatterplot with a different candidate line]

Page 12: Regression and correlation Dependence of two quantitative variables.

Which line is the best one?

[Figure: the same scatterplot with a clearly poor candidate line]

This one probably not – but how can I distinguish it?

Page 13: Regression and correlation Dependence of two quantitative variables.

The best line is the one with the best fit

• Criterion of least squares (LS)

• i.e. the least sum of squared deviations between the predicted and the real value of the dependent variable:

SS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
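The criterion can be tried out directly. A sketch comparing two hypothetical candidate lines on made-up data by their residual sums of squares:

```python
import numpy as np

def ss_residuals(a, b, x, y):
    """Sum of squared vertical deviations between the data and the line a + b*x."""
    return float(np.sum((y - (a + b * x)) ** 2))

# Toy data lying roughly along y = 1 + x (illustrative, not the lecture's data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 5.0, 5.8])

ss_good = ss_residuals(1.0, 1.0, x, y)   # a line close to the data
ss_bad  = ss_residuals(5.0, -1.0, x, y)  # a line far from the data
```

The least-squares criterion prefers the line with the smaller sum of squares.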

Page 14: Regression and correlation Dependence of two quantitative variables.

I.e. the best line is the one having the least sum of squares of residuals:

SS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

[Figure: scatterplot of Y, 0–12, against X, 0–10, with the fitted line and vertical residual segments]

Vertical, not horizontal, distance to the line!!!

Page 15: Regression and correlation Dependence of two quantitative variables.

Can the parameters of the line be computed from this condition?

I replace the fitted value with a + bX:

\hat{Y} = a + bX

SS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2

X and Y are the measured values; we count them as fixed. So I am searching for the local minimum of a function of two variables, a and b.

We calculate the derivatives with respect to a and b, then set

dSS/da = 0 and dSS/db = 0.

By solving those equations, I get the parameters.

Page 16: Regression and correlation Dependence of two quantitative variables.

We get

b = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2

a = \bar{Y} - b\bar{X}

The line always goes through the point of averages of both variables, (\bar{X}, \bar{Y}).

α and β are the real values; a and b are their estimates.
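The closed-form estimates can be checked on a small made-up data set; the line-through-the-means property follows from a = Ȳ − bX̄:

```python
import numpy as np

# Small made-up data set, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 5.0, 5.8])

xbar, ybar = x.mean(), y.mean()
b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)  # slope estimate
a = ybar - b * xbar                                            # intercept estimate
```

For these numbers b = 0.95 and a = 1.15, and the fitted line passes exactly through (X̄, Ȳ).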

Page 17: Regression and correlation Dependence of two quantitative variables.

b is the (sample) estimate of the real value β.

[Figure: a hypothetical population with a regression coefficient equal to zero; the ringed points show one possible sample of five observations.]

Every estimate is subject to error – from the data variability, Statistica computes the standard error of the estimate b.
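The sampling variability of b can be sketched by simulation, assuming a hypothetical population where β = 0 (X and Y independent) and repeatedly drawing samples of five observations, as on the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
slopes = []
for _ in range(2000):
    # Population with true beta = 0: Y drawn independently of X
    x = rng.uniform(0, 10, size=5)
    y = rng.normal(0, 1, size=5)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    slopes.append(b)
slopes = np.array(slopes)
```

The sample slopes scatter around zero: any single sample can give a nonzero b purely by chance, which is why b needs a standard error.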

Page 18: Regression and correlation Dependence of two quantitative variables.

In case of independence, β = 0.

The P-value for the test of

H0: β = 0

is the probability of getting such a good dependence by chance if the variables are independent.

Page 19: Regression and correlation Dependence of two quantitative variables.

For the test of H0: β = 0:

t = (parameter estimate − hypothetical value of the parameter) / standard error of the parameter estimate

t = (b - \beta_0) / s.e.(b)

One-tailed tests can be used. A similar test can be used for the parameter a; we then test whether the line goes through zero, which in most cases is uninteresting.

The number of degrees of freedom is n − 2.
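A sketch of the slope test on made-up data (the data and numbers are illustrative, not the lecture's):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.1, 4.9, 6.2])

n = x.size
sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
a = y.mean() - b * x.mean()

resid = y - (a + b * x)
df = n - 2                       # degrees of freedom for the test
mse = np.sum(resid ** 2) / df    # estimate of the error variance
se_b = np.sqrt(mse / sxx)        # standard error of the slope estimate b
t = (b - 0.0) / se_b             # test statistic for H0: beta = 0
```

The P-value would then come from comparing t against the t-distribution with n − 2 degrees of freedom.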

Page 20: Regression and correlation Dependence of two quantitative variables.

Test using the ANALYSIS OF VARIANCE of the regression model

We test the null hypothesis that our model explains nothing (the variables are independent); then β = 0 holds. [So the test should agree with the previous one; it just does not enable one-tailed hypotheses.]

Again, as in classic ANOVA, the principle is the analysis of sums of squares.

Page 21: Regression and correlation Dependence of two quantitative variables.

Grand variability = squares of deviations of the observations from the grand mean:

SS_{TOT} = \sum_i (Y_i - \bar{Y})^2

Variability explained by the model = squares of deviations of the predicted values from the grand mean:

SS_{REG} = \sum_i (\hat{Y}_i - \bar{Y})^2

[Figure: wing length, Y, in centimeters against age, X, in days]

Page 22: Regression and correlation Dependence of two quantitative variables.

Error variability = squares of deviations of the observed values from the predicted values:

SS_e = \sum_i (Y_i - \hat{Y}_i)^2

It holds that:

SS_{TOT} = SS_{REG} + SS_e

[Figure: wing length, Y, in centimeters against age, X, in days]

Page 23: Regression and correlation Dependence of two quantitative variables.

As in classic ANOVA, it holds that MS = SS/DF; this is an estimate of the variance of the population if the null hypothesis is true. Here too we make the test using the ratio of the estimates of the grand variation based on the variance explained and unexplained by the model:

F = MS_{REG} / MS_e
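The whole decomposition and the F ratio can be verified on made-up data (illustrative numbers, not the lecture's bird data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.1, 4.9, 6.2])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
yhat = a + b * x                         # predicted values

ss_tot = np.sum((y - y.mean()) ** 2)     # grand variability
ss_reg = np.sum((yhat - y.mean()) ** 2)  # variability explained by the model
ss_e = np.sum((y - yhat) ** 2)           # error variability

ms_reg = ss_reg / 1                      # regression df = 1 (one predictor)
ms_e = ss_e / (x.size - 2)               # error df = n - 2
F = ms_reg / ms_e
```

The identity SS_TOT = SS_REG + SS_e holds exactly (up to floating-point error), and a large F indicates the model explains far more than the residual noise.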

Page 24: Regression and correlation Dependence of two quantitative variables.

This beta is something different from the one used so far.

Test of the null hypothesis that at hatching time the birds are wingless (at day zero, length is zero).

ANOVA model

Analysis of Variance; DV: delka (suspavelikonoce1)

Effect    | Sums of Squares | df | Mean Squares | F        | p-level
Regress.  | 19.13221        | 1  | 19.13221     | 401.0875 | 0.000000
Residual  | 0.52471         | 11 | 0.04770      |          |
Total     | 19.65692        |    |              |          |

Page 25: Regression and correlation Dependence of two quantitative variables.

Coefficient of determination - percent of variability explained

R^2 = SS_{REG} / SS_{TOT} = 1 - SS_e / SS_{TOT}
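Both forms of the definition give the same number, which is easy to confirm numerically (same illustrative data as above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.1, 4.9, 6.2])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
yhat = a + b * x

ss_tot = np.sum((y - y.mean()) ** 2)
ss_reg = np.sum((yhat - y.mean()) ** 2)
ss_e = np.sum((y - yhat) ** 2)

r2_a = ss_reg / ss_tot        # explained / total
r2_b = 1 - ss_e / ss_tot      # 1 - unexplained / total
```

R² is a proportion, so it always lies between 0 and 1; here the linear fit explains most of the variability.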

Page 26: Regression and correlation Dependence of two quantitative variables.

Confidence belt – where, with a given probability [95% here], the mean value of Y lies for a given X.

[Figure: scatterplot with the fitted line length = 0.7131 + 0.2702 · age and its confidence belt; x-axis: age [days], 2–18; y-axis: length [cm], 1.0–5.5]

Basically – where the line is.

Page 27: Regression and correlation Dependence of two quantitative variables.

Prediction or toleration belt

[Figure: scatterplot with the fitted line length = 0.7131 + 0.2702 · age and its prediction belt; x-axis: age [days], 2–18; y-axis: length [cm], 1.0–5.5]

Where the next observation will be
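The difference between the two belts comes down to their standard errors. A sketch with data simulated around the slide's fitted line (σ = 0.2 is an assumption, not a value from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(2, 18, 30)
y = 0.7131 + 0.2702 * x + rng.normal(0, 0.2, size=x.size)

n = x.size
sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
a = y.mean() - b * x.mean()
s2 = np.sum((y - (a + b * x)) ** 2) / (n - 2)   # residual variance

def se_mean(x0):
    """Std. error of the estimated MEAN of Y at x0 (confidence belt)."""
    return float(np.sqrt(s2 * (1 / n + (x0 - x.mean()) ** 2 / sxx)))

def se_pred(x0):
    """Std. error for a single NEW observation at x0 (prediction belt)."""
    return float(np.sqrt(s2 * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)))
```

The prediction belt is always wider (the extra “1 +” term for the new observation's own error), and both belts are narrowest at X̄ – reliability is the best around the mean.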

Page 28: Regression and correlation Dependence of two quantitative variables.

Reliability is the best around the mean

[Figure: the same fit, length = 0.7131 + 0.2702 · age; both belts are narrowest near the mean of X]

Page 29: Regression and correlation Dependence of two quantitative variables.

Regression going through zero – it is possible, but

[Figure: number of species against succession age with the fitted line NoSpecies = −9.5714 + 1.1714 · x; x-axis: succession [years], 0–22; y-axis: NoSpecies, −10 to 16]

How was it in reality?

Page 30: Regression and correlation Dependence of two quantitative variables.

My regression has proved with high certainty that at the time of the volcanic island's birth there was a negative number of species.

Regression Summary for Dependent Variable: pocetdruhu (suspavelikonoce1)
R = .98008746, R2 = .96057143, Adjusted R2 = .95071429
F(1,4) = 97.449, p < .00059, Std. Error of estimate: .99283

N=6           | Beta     | Std.Err. of Beta | B        | Std.Err. of B | t(4)     | p-level
Intercept     |          |                  | -9.57143 | 1.825556      | -5.24302 | 0.006327
sukcese[roky] | 0.980087 | 0.099283         | 1.17143  | 0.118666      | 9.87164  | 0.000591

Page 31: Regression and correlation Dependence of two quantitative variables.

Regression going through zero – it is possible, but

How was it in reality?

[Figure: the same succession data and fitted line NoSpecies = −9.5714 + 1.1714 · x; only a regression forced through zero would do such a thing]

Page 32: Regression and correlation Dependence of two quantitative variables.

We don't use linear regression because

• we believe that the dependence is linear over its whole range; rather, we often (and legitimately) believe that we can rationally approximate it by a linear function within the range of the values used.

• Be careful with extrapolations (especially dangerous are extrapolations to zero).

Page 33: Regression and correlation Dependence of two quantitative variables.

Use of regression doesn't imply causal dependence

• Significant are:

• the dependence of the number of murders on the number of frost days per year in US states

• the dependence of the number of divorces on the number of fridges across years

• the dependence of the number of inhabitants of India on the concentration of CO2 across years

• Causal dependence can be proved only by a manipulative experiment.

Page 34: Regression and correlation Dependence of two quantitative variables.

Dependence of the number of murders (Murders) on the number of frost days (Frost) in individual states of the USA

Results of the regression analysis of the number of murders per 100 000 inhabitants in 1976 (Murders) in individual states of the USA in dependence on the number of frost days in the capital of the given state in 1931–1960 (Frost). P < 0.01.

Page 35: Regression and correlation Dependence of two quantitative variables.

Power of the test

• depends on the number of observations and on the strength of the relation (so, on R2 in the whole population)

• In experimental studies we can increase R2 by increasing the range of the independent variable (keep in mind that this usually makes the linearity of the relation worse).

Page 36: Regression and correlation Dependence of two quantitative variables.

In interpretations

• distinguish when we are more interested in the strength of the relation (and thus the R2 value), and when we are happy that “it is significant”.

• How well does the new cheap analytical method track the real concentration? (If I did not believe that H0 “the method is completely independent of concentration” is false, I would not use it – I am interested in R2 or in the error of estimation.)

Page 37: Regression and correlation Dependence of two quantitative variables.

Declaration

• “The method is excellent; the dependence on real concentrations is highly significant (p < 0.001)” says only one thing – we are very sure that the method is better than a random number generator. We are interested mainly in R2 [and a value of 0.8 can be low for us] (and here especially in the error of estimation).

Page 38: Regression and correlation Dependence of two quantitative variables.

On the other side

• The declaration “Number of species is positively dependent on soil pH (F1,33 = 12.3, p < 0.01)” is interesting, as it is not clear a priori that the null hypothesis is false. But I am interested in R2 too (though I might be satisfied even with very low values, e.g. 0.2).

Page 39: Regression and correlation Dependence of two quantitative variables.

[Figure, left: HIGH [m] = −11.5952 + 0.6905 · DBH [cm]; x-axis: DBH [cm], 25–75; y-axis: HIGH [m], 5–45]

[Figure, right: DBH [cm] = 24.1035 + 1.1175 · HIGH [m]; same axes]

Swapping X for Y, I logically get different results (as the regression formulas are not inverse functions). But R2, F, and P are the same.

In one regression I estimate DBH with the help of height and minimise the squared deviations in DBH; in the other I estimate height with the help of DBH and minimise the squared deviations in height.
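Both directions can be fitted on simulated data (a hypothetical tree stand loosely mimicking the slide's equation) to confirm that the slopes differ while R² does not:

```python
import numpy as np

rng = np.random.default_rng(3)
dbh = rng.uniform(25, 75, size=40)                        # DBH [cm], hypothetical stand
height = -11.6 + 0.69 * dbh + rng.normal(0, 3, size=40)   # HIGH [m], slide-like relation

def fit(xv, yv):
    """Return (slope, R^2) of the least-squares regression of yv on xv."""
    sxy = np.sum((xv - xv.mean()) * (yv - yv.mean()))
    sxx = np.sum((xv - xv.mean()) ** 2)
    syy = np.sum((yv - yv.mean()) ** 2)
    return sxy / sxx, sxy ** 2 / (sxx * syy)

b_h_on_d, r2_1 = fit(dbh, height)    # estimate height from DBH
b_d_on_h, r2_2 = fit(height, dbh)    # estimate DBH from height
```

The product of the two slopes equals R², which is below 1 whenever the fit is imperfect, so the two lines are never inverse functions of each other unless the points lie exactly on a line.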

Page 40: Regression and correlation Dependence of two quantitative variables.

Even simple regression

• is computed in Statistica with the help of “Multiple regression”. In my results, I write that I have used simple regression!!!

Page 41: Regression and correlation Dependence of two quantitative variables.

Data transformation in regression

• Attention – the variables aren't equal

• The independent variable is considered exact

• The dependent variable contains the error (and it is on this variable that I minimise the error sum of squares)

Page 42: Regression and correlation Dependence of two quantitative variables.

Make the difference:

• with a transformation of the independent variable, I change the shape of the dependence but not the residual distribution

• with a transformation of the dependent variable, I change both – the shape and the residual distribution

Page 43: Regression and correlation Dependence of two quantitative variables.

Linearized regression

The most common transformation is the logarithmic one. Taking the logarithm of the independent variable, I get

Y = a + b \log(X)

[Figure: Scatterplot (suspavelikonoce1 10v*13c), SPEC = 3.1904 + 12.5165 · log10(x); x-axis: AREA, 0–26; y-axis: SPEC, 2–24. The first title line is usually deleted; for publication, usually the second one too.]

Presumption: the residuals were not dependent on the mean – the transformation has not done anything to them.

S = a + b \log(A)

Page 44: Regression and correlation Dependence of two quantitative variables.

The relationship is exponential; the residuals are linearly dependent on the mean:

Y = e^{a + bX} = e^a e^{bX}

\ln Y = a + bX
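The linearization can be sketched on simulated exponential data with multiplicative error (the parameter values 0.5 and 0.3 are assumptions for the sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
a_true, b_true = 0.5, 0.3            # assumed parameters, for illustration only
x = np.linspace(0, 10, 40)
# Multiplicative error: on the original scale, residuals grow with the mean
y = np.exp(a_true + b_true * x) * np.exp(rng.normal(0, 0.1, size=x.size))

ly = np.log(y)                       # ln Y = a + b X is linear in X
b = np.sum((x - x.mean()) * (ly - ly.mean())) / np.sum((x - x.mean()) ** 2)
a = ly.mean() - b * x.mean()
```

Regressing ln Y on X recovers a and b, and on the log scale the residuals no longer grow with the mean.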

Page 45: Regression and correlation Dependence of two quantitative variables.

It doesn't matter whether I use ln or log.

But if I want to estimate the growth rate, then use ln!

N_t = N_0 e^{rt}

\ln(N_t) = \ln(N_0) + rt

I take the logarithm of just the dependent variable – and I “homogenize” the residuals.

Page 46: Regression and correlation Dependence of two quantitative variables.

Popular is the power relationship

[Figure: several curves of the formula y = a x^b]

It always goes through zero – allometric relationships, species–area curves.

Page 47: Regression and correlation Dependence of two quantitative variables.

Y = aX^b \;\Rightarrow\; \ln Y = \ln a + b \ln X

Use either ln or log.

It linearizes most monotonic relationships without an inflexion point that go through zero [S = cA^z].

Log transformation of both variables; the residuals are assumed to be positively dependent on the mean.

Attention: using the logarithm, positive deviations from the prediction are “shrunk” more than the negative ones.
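A log–log fit of a power law can be sketched on simulated species–area data (c = 3 and z = 0.25 are assumed values, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(5)
c_true, z_true = 3.0, 0.25           # assumed species-area parameters
area = np.geomspace(1, 100, 50)
species = c_true * area ** z_true * np.exp(rng.normal(0, 0.05, size=area.size))

la, ls = np.log(area), np.log(species)   # ln S = ln c + z ln A
z = np.sum((la - la.mean()) * (ls - ls.mean())) / np.sum((la - la.mean()) ** 2)
c = float(np.exp(ls.mean() - z * la.mean()))
```

The slope of the log–log regression is the exponent z, and exponentiating the intercept recovers the constant c.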

