Correlation by Neeraj Bhandari ( Surkhet.Nepal )

Post on 21-Jan-2015


CORRELATION

Correlation is a statistical measure of the relationship between two variables. When a change in one variable is accompanied by a change in the other, the variables are said to be correlated.

Correlation analysis is thus a mathematical tool used to measure the degree to which one variable is linearly related to another.

DIRECT OR POSITIVE CORRELATION

If an increase (or decrease) in one variable results in a corresponding increase (or decrease) in the other, the correlation is said to be direct or positive.

INVERSE OR NEGATIVE CORRELATION

If an increase (or decrease) in one variable results in a corresponding decrease (or increase) in the other, the correlation is said to be inverse or negative.

For example, the correlation between income and expenditure is positive, and the correlation between the volume and pressure of a perfect gas (at constant temperature) is negative.

LINEAR CORRELATION

A relation in which a change in one variable bears a constant ratio to the corresponding change in the other is called linear correlation (or perfect correlation).

NON LINEAR CORRELATION

A relation in which a change in one variable does not bear a constant ratio to the corresponding change in the other is called non linear correlation.
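As an illustration of the distinction, the following Python sketch computes Karl Pearson's coefficient (defined in the next section) for one linear and one non linear relation. The function name pearson_r and the sample values are chosen here for illustration and do not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]                 # illustrative values only
linear = [3 * x + 1 for x in xs]       # every unit change in x changes y by 3
curved = [x * x for x in xs]           # y depends on x, but not linearly

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```

For these values the linear relation gives a coefficient of 1, while the curved one gives 0 even though y is completely determined by x. The coefficient only picks up the linear part of a relationship.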

Karl Pearson’s Coefficient of Correlation

The correlation coefficient between two variables x and y is denoted by r(x,y). It is a numerical measure of the linear relationship between them.

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x\,\sigma_y} $$

where r = correlation coefficient between x and y, σx = standard deviation of x, σy = standard deviation of y, n = number of observations, and x̄, ȳ are the means of x and y.

Properties of the coefficient of correlation

(i) It measures the degree of correlation.
(ii) The value of r(x,y) lies between -1 and 1.
(iii) If r = 1, the correlation is perfect positive.
(iv) If r = -1, the correlation is perfect negative.
(v) If r = 0, the variables are uncorrelated, i.e. there is no linear correlation. Independent variables have r = 0, but r = 0 alone does not imply independence.

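(vi) The correlation coefficient is independent of change of origin and scale. If X and Y are random variables and a, b, c, d are any numbers such that a ≠ 0, c ≠ 0, and a and c have the same sign, then

r(aX+b, cY+d) = r(X,Y)

If a and c have opposite signs, the magnitude is unchanged but the sign reverses: r(aX+b, cY+d) = -r(X,Y).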
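The invariance can be checked directly from the definition. The following is a sketch of the derivation, writing Cov for the covariance and σ for the standard deviation (notation assumed here, not taken from the slides):

$$
\begin{aligned}
\operatorname{Cov}(aX+b,\,cY+d) &= ac\,\operatorname{Cov}(X,Y), \\
\sigma_{aX+b} &= |a|\,\sigma_X, \qquad \sigma_{cY+d} = |c|\,\sigma_Y, \\
r(aX+b,\,cY+d) &= \frac{ac\,\operatorname{Cov}(X,Y)}{|a|\,|c|\,\sigma_X\,\sigma_Y} = \frac{ac}{|ac|}\,r(X,Y).
\end{aligned}
$$

The added constants b and d drop out entirely, the magnitude of r never changes, and the sign is preserved whenever a and c have the same sign.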

Example: Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y):

X : 65 66 67 67 68 69 70 72

Y : 67 68 65 68 72 72 69 71

          X     Y      X²      Y²      XY
         65    67    4225    4489    4355
         66    68    4356    4624    4488
         67    65    4489    4225    4355
         67    68    4489    4624    4556
         68    72    4624    5184    4896
         69    72    4761    5184    4968
         70    69    4900    4761    4830
         72    71    5184    5041    5112
Total   544   552   37028   38132   37560

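$$ \bar{X} = \frac{\sum X}{n} = \frac{544}{8} = 68, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{552}{8} = 69 $$

$$ r(X,Y) = \frac{\frac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\left(\frac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\frac{1}{n}\sum Y^2 - \bar{Y}^2\right)}} = \frac{4695 - 4692}{\sqrt{(4628.5 - 4624)(4766.5 - 4761)}} = \frac{3}{\sqrt{4.5 \times 5.5}} $$

On putting in all the values, we get r = 0.603.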
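The calculation above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The variable names are chosen here for illustration.

```python
import math

fathers = [65, 66, 67, 67, 68, 69, 70, 72]   # X, heights in inches
sons = [67, 68, 65, 68, 72, 72, 69, 71]      # Y, heights in inches

n = len(fathers)
sum_x = sum(fathers)
sum_y = sum(sons)
sum_x2 = sum(x * x for x in fathers)
sum_y2 = sum(y * y for y in sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))

mean_x = sum_x / n
mean_y = sum_y / n

# Covariance and variances computed with divisor n, as in the formula for r.
cov_xy = sum_xy / n - mean_x * mean_y
var_x = sum_x2 / n - mean_x ** 2
var_y = sum_y2 / n - mean_y ** 2

r = cov_xy / math.sqrt(var_x * var_y)
print(round(r, 3))   # prints 0.603
```

The column totals (544, 552, 37028, 38132, 37560) are recomputed from the raw data rather than typed in, which also serves as a check on the table.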

SOLUTION: SHORT-CUT METHOD

Take U = X - 68 and V = Y - 69.

          X     Y     U     V    U²    V²    UV
         65    67    -3    -2     9     4     6
         66    68    -2    -1     4     1     2
         67    65    -1    -4     1    16     4
         67    68    -1    -1     1     1     1
         68    72     0     3     0     9     0
         69    72     1     3     1     9     3
         70    69     2     0     4     0     0
         72    71     4     2    16     4     8
Total                 0     0    36    44    24

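$$ \bar{U} = \frac{\sum U}{n} = 0, \qquad \bar{V} = \frac{\sum V}{n} = 0 $$

$$ r(U,V) = \frac{\frac{1}{n}\sum UV - \bar{U}\,\bar{V}}{\sqrt{\left(\frac{1}{n}\sum U^2 - \bar{U}^2\right)\left(\frac{1}{n}\sum V^2 - \bar{V}^2\right)}} = \frac{\sum UV}{\sqrt{\sum U^2 \sum V^2}} = \frac{24}{\sqrt{36 \times 44}} $$

On putting in all the values, we get r(U,V) = 0.603, the same as r(X,Y), because r is independent of change of origin.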
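The short-cut method works because shifting the origin leaves r unchanged. A small Python sketch of that check follows. The helper pearson_r and the second origin (60) are illustrative choices, not part of the slides.

```python
import math

fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

def pearson_r(xs, ys):
    """General formula: valid for any origin, not only the means."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Deviations from the means 68 and 69, so sum(u) == sum(v) == 0.
u = [x - 68 for x in fathers]
v = [y - 69 for y in sons]

# With zero sums the general formula reduces to sum(UV) / sqrt(sum(U^2) * sum(V^2)).
r_short = sum(a * b for a, b in zip(u, v)) / math.sqrt(
    sum(a * a for a in u) * sum(b * b for b in v)
)

r_direct = pearson_r(fathers, sons)
r_other_origin = pearson_r([x - 60 for x in fathers], [y - 60 for y in sons])
print(round(r_short, 3), round(r_direct, 3), round(r_other_origin, 3))
```

All three values agree. Choosing the means as the origin is simply the choice that makes the arithmetic smallest by hand.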

RANK CORRELATION

Let (x_i, y_i), i = 1, 2, 3, ..., n, be the ranks of n individuals in a group for the characteristics A and B respectively. The co-efficient of correlation between these ranks is called the rank correlation co-efficient between the characteristics A and B for that group of individuals.

$$ r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} $$

where d_i denotes the difference in ranks of the i-th individual.

EXAMPLE: Compute the rank correlation co-efficient for the following data:

Person          :  A   B   C   D   E   F   G   H   I   J
Rank in Maths   :  9  10   6   5   7   2   4   8   1   3
Rank in Physics :  1   2   3   4   5   6   7   8   9  10

Person    R1    R2    d = R1 - R2    d²
A          9     1         8         64
B         10     2         8         64
C          6     3         3          9
D          5     4         1          1
E          7     5         2          4
F          2     6        -4         16
G          4     7        -3          9
H          8     8         0          0
I          1     9        -8         64
J          3    10        -7         49
TOTAL                                280

$$ r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 280}{10(100 - 1)} = 1 - 1.697 = -0.697 $$
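The same computation can be sketched in Python. The function name spearman_from_ranks is an illustrative choice. It assumes the inputs are already ranks with no ties.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Rank correlation 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

maths = [9, 10, 6, 5, 7, 2, 4, 8, 1, 3]      # persons A to J
physics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

r = spearman_from_ranks(maths, physics)
print(round(r, 3))   # prints -0.697
```

The negative value says that, in this group, a high rank in Maths tends to go with a low rank in Physics.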

Repeated Ranks

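When two or more items have the same value, each of them is given the average of the ranks they would otherwise occupy, and the formula is adjusted as follows:

$$ r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}m_1(m_1^2 - 1) + \frac{1}{12}m_2(m_2^2 - 1) + \dots + \frac{1}{12}m_k(m_k^2 - 1)\right]}{n(n^2 - 1)} $$

where m_1, m_2, ..., m_k are the numbers of items sharing each repeated rank.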

Example: Obtain the rank correlation co-efficient for the following data:

X             68    64    75    50    64    80    75    40    55    64
Y             62    58    68    45    81    60    68    48    50    70
Ranks in X     4     6   2.5     9     6     1   2.5    10     8     6
Ranks in Y     5     7   3.5    10     1     6   3.5     9     8     2
d = x - y     -1    -1    -1    -1     5    -5    -1     1     0     4    Total 0
d²             1     1     1     1    25    25     1     1     0    16    Total 72

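In X the value 75 occurs twice (m_1 = 2) and 64 occurs three times (m_2 = 3). In Y the value 68 occurs twice (m_3 = 2).

$$ r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}m_1(m_1^2 - 1) + \frac{1}{12}m_2(m_2^2 - 1) + \frac{1}{12}m_3(m_3^2 - 1)\right]}{n(n^2 - 1)} $$

$$ r = 1 - \frac{6\left[72 + \frac{1}{12}\,2(2^2 - 1) + \frac{1}{12}\,3(3^2 - 1) + \frac{1}{12}\,2(2^2 - 1)\right]}{10(10^2 - 1)} $$

$$ r = 1 - \frac{6 \times 75}{990} = \frac{6}{11} = 0.545 $$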
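The repeated-ranks procedure can be sketched in Python as below. The function names are illustrative. Ranks are assigned from the largest value (rank 1) downward, which is the convention the example appears to use.

```python
from collections import Counter

def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m equal values (m > 1)."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def rank_correlation_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))

x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

ranks_x = average_ranks(x)
ranks_y = average_ranks(y)
r = rank_correlation_with_ties(x, y)
print(ranks_x, ranks_y, round(r, 3))
```

For this data the correction terms add up to 3 (0.5 + 2 + 0.5), so the bracket becomes 72 + 3 = 75 and r comes out as 0.545.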

Regression Analysis

The term regression means some sort of functional relationship between two or more variables.

Regression measures the nature and extent of correlation. Regression is the estimation or prediction of unknown values of one variable from known values of another variable.

CURVE OF REGRESSION AND REGRESSION EQUATION

If two variates x and y are correlated, then the points of the scatter diagram will be more or less concentrated around a curve. This curve is called the curve of regression. The mathematical equation of the regression curve is called the regression equation.

LINEAR REGRESSION

When the points of the scatter diagram concentrate round a straight line, the regression is called linear and this straight line is known as the line of regression.

LINES OF REGRESSION

In the case of n pairs (x, y), we can take either x or y as the independent variable and the other as the dependent variable. Either of the two may be estimated for given values of the other. Thus, if we want to estimate y for given values of x, we have a regression equation of the form y = a + bx, called the regression line of y on x. And if we wish to estimate x from given values of y, we have a regression line of the form x = A + By, called the regression line of x on y. Thus, in general, we always have two lines of regression.

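LINE OF REGRESSION OF Y ON X:

$$ y - \bar{y} = b_{yx}(x - \bar{x}) $$

where b_{yx} is the regression co-efficient of y on x:

$$ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$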

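LINE OF REGRESSION OF X ON Y:

$$ x - \bar{x} = b_{xy}(y - \bar{y}) $$

where b_{xy} is the regression co-efficient of x on y:

$$ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} $$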

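Theorem: The correlation co-efficient is the geometric mean between the regression co-efficients.

The co-efficients of regression are

$$ b_{yx} = \frac{r\,\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = \frac{r\,\sigma_x}{\sigma_y} $$

Then geometric mean =

$$ \sqrt{b_{yx}\,b_{xy}} = \sqrt{\frac{r\,\sigma_y}{\sigma_x} \cdot \frac{r\,\sigma_x}{\sigma_y}} = \sqrt{r^2} = r $$

= co-efficient of correlation, where r is given the common sign of the two regression co-efficients.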
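As a numerical check of the theorem, the sketch below reuses the fathers and sons heights from the earlier example. The variable names are illustrative, and math.copysign is used to attach the sign of the regression co-efficients to the square root.

```python
import math

fathers = [65, 66, 67, 67, 68, 69, 70, 72]   # x
sons = [67, 68, 65, 68, 72, 72, 69, 71]      # y

n = len(fathers)
sum_x, sum_y = sum(fathers), sum(sons)
sxy = n * sum(x * y for x, y in zip(fathers, sons)) - sum_x * sum_y
sxx = n * sum(x * x for x in fathers) - sum_x ** 2
syy = n * sum(y * y for y in sons) - sum_y ** 2

b_yx = sxy / sxx                    # regression co-efficient of y on x
b_xy = sxy / syy                    # regression co-efficient of x on y
r = sxy / math.sqrt(sxx * syy)      # correlation co-efficient

# Geometric mean of the two co-efficients, carrying their common sign.
geometric_mean = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(round(b_yx, 4), round(b_xy, 4), round(geometric_mean, 3), round(r, 3))
```

Both regression co-efficients share the numerator sxy, which is why they always have the same sign and why their product equals r squared.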

EXAMPLE-

Find the line of regression of y on x for the data given below:

X: 1.53 1.78 2.60 2.95 3.42

Y: 33.50 36.30 40 45.80 53.50

Solution:

   x        y         x²         xy
  1.53    33.50     2.3409     51.255
  1.78    36.30     3.1684     64.614
  2.60    40.00     6.7600    104.000
  2.95    45.80     8.7025    135.110
  3.42    53.50    11.6964    182.970

$$ \sum x = 12.28, \quad \sum y = 209.1, \quad \sum x^2 = 32.67, \quad \sum xy = 537.95 $$

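Here n = 5, so

$$ b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = 9.726 $$

Then the line of regression of y on x,

$$ y - \bar{y} = b_{yx}(x - \bar{x}), $$

with x̄ = 12.28/5 = 2.456 and ȳ = 209.1/5 = 41.82, becomes

y = 17.932 + 9.726x

which is the required line of regression of y on x.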
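The fit can be reproduced with the short Python sketch below. The function name regression_y_on_x is illustrative, and the last x value is taken as 3.42, the value used in the solution table.

```python
def regression_y_on_x(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n   # from y - mean_y = slope * (x - mean_x)
    return intercept, slope

xs = [1.53, 1.78, 2.60, 2.95, 3.42]
ys = [33.50, 36.30, 40.00, 45.80, 53.50]

intercept, slope = regression_y_on_x(xs, ys)
print(round(intercept, 2), round(slope, 2))
```

Working with unrounded sums, the slope and intercept agree with the values 9.726 and 17.932 to about two decimal places. Small differences in the third decimal come from rounding.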

Question: For 10 observations on price (x) and supply (y), the following data were obtained:

$$ \sum x = 130, \quad \sum y = 220, \quad \sum x^2 = 2288, \quad \sum y^2 = 5506, \quad \sum xy = 3467 $$

Obtain the two lines of regression and estimate the supply when the price is 16 units.

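Solution:

$$ n = 10, \quad \bar{x} = \frac{\sum x}{n} = 13, \quad \bar{y} = \frac{\sum y}{n} = 22 $$

Regression coefficient of y on x:

$$ b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(3467) - (130)(220)}{10(2288) - (130)^2} = \frac{6070}{5980} = 1.015 $$

Regression line of y on x is

$$ y - \bar{y} = b_{yx}(x - \bar{x}) $$

y = 1.015x + 8.805

Since we are to estimate supply (y) when price (x) is given, we use the regression line of y on x.

When x = 16 units, y = 1.015(16) + 8.805 = 25.045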
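When only the summary totals are known, both regression lines follow from the same few quantities. The Python sketch below works from the totals given in the question. The line of x on y, which the solution above does not write out, is computed by the same pattern and should be read as an illustration rather than as part of the original answer.

```python
n = 10
sum_x, sum_y = 130, 220
sum_x2, sum_y2, sum_xy = 2288, 5506, 3467

mean_x = sum_x / n          # 13.0
mean_y = sum_y / n          # 22.0

sxy = n * sum_xy - sum_x * sum_y            # shared numerator
b_yx = sxy / (n * sum_x2 - sum_x ** 2)      # slope of y on x
b_xy = sxy / (n * sum_y2 - sum_y ** 2)      # slope of x on y

# Intercepts from y - mean_y = b_yx (x - mean_x) and x - mean_x = b_xy (y - mean_y).
a_yx = mean_y - b_yx * mean_x
a_xy = mean_x - b_xy * mean_y

def supply_at(price):
    """Estimate supply (y) from price (x) using the line of y on x."""
    return a_yx + b_yx * price

print(round(b_yx, 3), round(a_yx, 3))   # y on x
print(round(b_xy, 3), round(a_xy, 3))   # x on y
print(round(supply_at(16), 3))
```

The estimate at a price of 16 units matches the hand calculation, 25.045, to three decimals.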

Ques:- From the following data, find the most likely value of y when x=24:

Mean (x) = 18.1, Mean (y) = 985.8, S.D. (x) = 2, S.D. (y) = 36.4, r = 0.58
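One way to approach this question is to build the line of y on x from b_yx = r σy / σx and evaluate it at x = 24. The sketch below is a possible working under that approach. The variable names are illustrative.

```python
mean_x, mean_y = 18.1, 985.8
sd_x, sd_y = 2.0, 36.4
r = 0.58

b_yx = r * sd_y / sd_x                      # regression co-efficient of y on x

def most_likely_y(x):
    """Value of y on the regression line of y on x."""
    return mean_y + b_yx * (x - mean_x)

y_at_24 = most_likely_y(24)
print(round(b_yx, 3), round(y_at_24, 2))
```

Under these assumptions b_yx is 10.556 and the most likely value of y at x = 24 is about 1048.08.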

Ex. In a partially destroyed laboratory record of an analysis of correlation data, only the following results are legible:

Variance of x = 9

Regression equations: 8x - 10y + 66 = 0, 40x - 18y = 214

What were (a) the mean values of x and y, (b) the standard deviation of x and y and the coefficient of correlation between x and y?

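(i) Since both the lines of regression pass through the point (x̄, ȳ), therefore

$$ 8\bar{x} - 10\bar{y} + 66 = 0 $$

$$ 40\bar{x} - 18\bar{y} - 214 = 0 $$

Solving these equations, x̄ = 13 and ȳ = 17.

(ii) Variance of x = σ_x² = 9, so σ_x = 3.

The equations of the lines of regression can be written as

y = 0.8x + 6.6 and x = 0.45y + 5.35

so that b_{yx} = 0.8 and b_{xy} = 0.45.

$$ r^2 = b_{yx}\,b_{xy} = 0.8 \times 0.45 = 0.36, \qquad r = 0.6 $$

$$ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \;\Rightarrow\; \sigma_y = \frac{b_{yx}\,\sigma_x}{r} = \frac{0.8 \times 3}{0.6} = 4 $$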
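The steps of this solution can be sketched in Python as follows. The representation of a line as a tuple (p, q, c) meaning p·x + q·y + c = 0 is an assumption made here for illustration, as is the choice of which equation is the line of y on x. That choice can be checked by confirming that the product of the two regression co-efficients does not exceed 1.

```python
import math

# Each line is stored as (p, q, c), meaning p*x + q*y + c = 0.
y_on_x = (8.0, -10.0, 66.0)       # 8x - 10y + 66 = 0, assumed to be y on x
x_on_y = (40.0, -18.0, -214.0)    # 40x - 18y - 214 = 0, assumed to be x on y
var_x = 9.0

# (a) Both lines pass through (mean_x, mean_y): solve the 2x2 system by Cramer's rule.
p1, q1, c1 = y_on_x
p2, q2, c2 = x_on_y
det = p1 * q2 - p2 * q1
mean_x = (-c1 * q2 + c2 * q1) / det
mean_y = (-p1 * c2 + p2 * c1) / det

# (b) Slopes: y = -(p1/q1) x + ... and x = -(q2/p2) y + ...
b_yx = -p1 / q1
b_xy = -q2 / p2
assert b_yx * b_xy <= 1, "swap the two lines: r squared cannot exceed 1"

r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)   # r takes the sign of the slopes
sd_x = math.sqrt(var_x)
sd_y = b_yx * sd_x / r                            # from b_yx = r * sd_y / sd_x

print(mean_x, mean_y, round(r, 3), sd_x, round(sd_y, 3))
```

The same sketch applies to any pair of regression lines. If the product check fails, the two lines have been assigned the wrong way round.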

Ques.: If the regression co-efficients are 0.8 and 0.2, what would be the value of the co-efficient of correlation?

Ques.: The equations of two lines of regression obtained in a correlation analysis of 60 observations are 5x = 6y + 24 and 1000y = 768x - 3608.

What is the co-efficient of correlation? What are the mean values of x and y? What is the ratio of the variances of x and y?