Lecture 16: Correlation and Coefficient of Correlation
By Aziza Munir
Slide 2
Learning Objectives
What is correlation?
What does it indicate?
What is the purpose of correlation if regression is already available?
What does the coefficient of correlation indicate?
Linear, multiple and partial correlation.
Slide 3
Introduction
Correlation is a LINEAR association between two random variables.
Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables.
Correlation is applied when the variables are dependent on time.
Correlation lies between -1 and +1.
Slide 4
A zero correlation indicates that there is no linear relationship between the variables.
A correlation of -1 indicates a perfect negative correlation.
A correlation of +1 indicates a perfect positive correlation.
Slide 5
Types of Correlation
There are three types of correlation: Type 1, Type 2 and Type 3.
Slide 6
Type 1: Positive, Negative, No, Perfect
Positive: two related variables are such that when one increases (decreases), the other also increases (decreases).
Negative: two variables are such that when one increases (decreases), the other decreases (increases).
No correlation: the two variables are independent.
Slide 7
Type 2: Linear, Non-linear
Linear: when plotted on a graph, the points tend to fall on a straight line.
Non-linear: when plotted on a graph, the points do not fall on a straight line.
Slide 8
Slide 9
Type 3: Simple, Multiple, Partial
Simple: only two variables, one independent and one dependent.
Multiple: one dependent variable and more than one independent variable.
Partial: one dependent variable and more than one independent variable, but only one independent variable is considered while the other independent variables are held constant.
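The partial case can be sketched in a few lines. This assumes the standard first-order partial correlation formula, which the slide does not state; the function name partial_r and the input values are hypothetical and chosen only for illustration.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant.

    Assumes the standard textbook formula (not given on the slide):
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator

# Hypothetical pairwise correlations, for illustration only.
print(partial_r(0.6, 0.5, 0.5))
```

When z is unrelated to both x and y (r_xz = r_yz = 0), the partial correlation reduces to the simple correlation r_xy.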
Slide 10
Methods of Studying Correlation
Scatter Diagram Method
Karl Pearson's Coefficient of Correlation Method
Spearman's Rank Correlation Method
Slide 11
Correlation: Linear Relationships
[Figure: two scatter plots, labelled "Very good fit" and "Moderate fit"]
Strong relationship = good linear fit.
Points clustered closely around a line show a strong correlation. The line is a good predictor (good fit) of the data.
The more spread out the points, the weaker the correlation and the poorer the fit.
The line is a REGRESSION line (Y = bX + a).
Slide 12
Coefficient of Correlation
A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.
Represented by r.
r lies between -1 and +1.
It conveys both magnitude and direction.
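As a minimal sketch of this definition, the function below divides the sample covariance by the product of the sample standard deviations. The name pearson_r and the data are made up for illustration; they are not from the lecture.

```python
import math

def pearson_r(x, y):
    """Sample Pearson r: sample covariance over the product of sample SDs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample variances all use the n - 1 divisor.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    return cov_xy / math.sqrt(var_x * var_y)

# Made-up data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, scores), 3))
```

The sign of the result gives the direction of the relationship and its absolute value gives the magnitude.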
Slide 13
-1 ≤ r ≤ +1
The + and - signs are used for positive linear correlations and negative linear correlations, respectively.
Slide 14
In the formula for r:
Shared variability of the X and Y variables is on the top (numerator).
Individual variability of the X and Y variables is on the bottom (denominator).
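The formula itself does not survive in these notes. The expression below is the standard textbook form of Pearson's r, given as a likely match rather than a copy of the slide; the bars denote sample means and n is the number of paired observations.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how X and Y vary together; the denominator measures how each varies on its own.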
Slide 15
Interpreting the Correlation Coefficient r
Strong correlation: r > .70 or r < -.70
Moderate correlation: r is between .30 and .70, or between -.30 and -.70
Weak correlation: r is between 0 and .30, or between 0 and -.30
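These cut-offs can be written as a small helper. This is only a sketch: the function name is hypothetical, and because the slide does not say which band the boundary values .30 and .70 belong to, placing both in "moderate" is an assumption.

```python
def describe_strength(r):
    """Label r as strong, moderate or weak using the .30 and .70 cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign gives direction
    if size > 0.70:
        return "strong"
    if size >= 0.30:  # assumption: boundary values count as moderate
        return "moderate"
    return "weak"

for value in (-0.85, 0.50, 0.10):
    print(value, describe_strength(value))
```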
Slide 16
Coefficient of Determination
The coefficient of determination lies between 0 and 1.
Represented by r².
It is a measure of how well the regression line represents the data.
If the regression line passed exactly through every point on the scatter plot, it would explain all of the variation.
The further the line is from the points, the less it is able to explain.
Slide 17
r² is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.
It is a measure that allows us to determine how certain one can be in making predictions from a certain model or graph.
The coefficient of determination is the ratio of the explained variation to the total variation.
The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength of the linear association between x and y.
Slide 18
The coefficient of determination, expressed as a percentage, represents the share of the variation in y that is accounted for by the line of best fit.
For example, if r = 0.922, then r² = 0.850.
This means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation).
The other 15% of the total variation in y remains unexplained.
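The "explained over total variation" reading of r² can be checked numerically. The sketch below fits a least-squares line and forms that ratio directly; the helper names and the data set are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares slope b and intercept a for y = b*x + a."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    b = sxy / sxx
    return b, mean_y - b * mean_x

def r_squared(x, y):
    """Explained variation divided by total variation in y."""
    b, a = fit_line(x, y)
    mean_y = sum(y) / len(y)
    explained = sum((b * p + a - mean_y) ** 2 for p in x)
    total = sum((q - mean_y) ** 2 for q in y)
    return explained / total

# The slide's arithmetic: r = 0.922 gives r squared of about 0.850.
print(round(0.922 ** 2, 3))

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(r_squared(x, y), 3))
```

For a single predictor, this ratio equals the square of Pearson's r.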
Slide 19
Spearman's rank coefficient
A method for determining correlation when the data are not available in numerical form; the method of rank correlation is used as an alternative.
When the values of the two variables are converted to their ranks and the correlation is obtained from those ranks, the correlation is known as rank correlation.
Slide 20
Computation of Rank Correlation
Spearman's rank correlation coefficient can be calculated when:
Actual ranks are given.
Ranks are not given, but grades are given and not repeated.
Ranks are not given, and grades are given and repeated.
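The three cases can be sketched as follows. For given or unrepeated ranks the sketch uses the familiar formula 1 - 6Σd²/(n(n² - 1)). For repeated grades it assigns average ranks and then computes Pearson's r on those ranks, which is a common approach but may differ from the correction-factor method used in the lecture. All function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_from_ranks(rank_x, rank_y):
    """Cases 1 and 2 (no repeated ranks): 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def spearman(x, y):
    """Case 3 (repeats allowed): Pearson's r computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Made-up grades with one repeated value, for illustration only.
print(average_ranks([10, 20, 20, 30]))
print(round(spearman([1, 2, 3, 4], [10, 20, 20, 30]), 3))
```

When there are no repeated values, both functions give the same result.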
Slide 21
Testing significance of correlation
Test for the significance of relationships between two CONTINUOUS variables.
We introduced Pearson correlation as a measure of the STRENGTH of a relationship between two variables.
But any relationship should be assessed for its SIGNIFICANCE as well as its strength.
What follows is a general discussion of significance tests for relationships between two continuous variables.
Factors in relationships between two variables:
The strength of the relationship is indicated by the correlation coefficient, r, but is actually measured by the coefficient of determination, r².
The significance of the relationship is expressed in probability levels, p (e.g., significant at p = .05).
This tells how unlikely it is that a given correlation coefficient, r, would occur if there were no relationship in the population.
NOTE: the smaller the p-level, the more significant the relationship.
BUT: the larger the correlation, the stronger the relationship.
Slide 22
Consider the classical model for testing significance.
It assumes that you have a sample of cases from a population.
The question is whether your observed statistic for the sample is likely to be observed given some assumption about the corresponding population parameter.
If your observed statistic does not exactly match the population parameter, perhaps the difference is due to sampling error.
The fundamental question: is the difference between what you observe and what you expect, given the assumption about the population, large enough to be significant, that is, large enough to reject the assumption?
The greater the difference (the more the sample statistic deviates from the assumed population parameter), the more significant it is.
That is, the less likely (small probability values) such a result would be if the population assumption were true.
The classical model makes some assumptions about the population parameter.
Population parameters are expressed as Greek letters, while corresponding sample statistics are expressed in lower-case Roman letters:
r = correlation between two variables in the sample
ρ (rho) = correlation between the same two variables in the population
A common assumption is that there is NO relationship between X and Y in the population: ρ = 0.0
Under this common null hypothesis in correlational analysis: ρ = 0.0
Slide 23
Testing for the significance of the correlation coefficient, r
When the test is against the null hypothesis ρ_xy = 0.0: what is the likelihood of drawing a sample with r_xy ≠ 0.0?
The sampling distribution of r is approximately normal (but bounded at -1.0 and +1.0) when N is large, and follows the t distribution when N is small.
The simplest formula for computing the appropriate t value to test the significance of a correlation coefficient employs the t distribution:
t = r * sqrt((N - 2) / (1 - r²))
The degrees of freedom for entering the t distribution are N - 2.
Example: suppose you observe that r = .50 between literacy rate and political stability in 10 nations.
Is this relationship "strong"?
Coefficient of determination = r² = .25
This means that 25% of the variance in political stability is "explained" by literacy rate.
Is the relationship "significant"?
That remains to be determined using the formula above, with r = .50 and N = 10.
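The calculation for this example can be sketched as below. It assumes the standard statistic t = r * sqrt((N - 2) / (1 - r²)) with N - 2 degrees of freedom, which reproduces the t = 1.63 quoted for this example; the function name t_for_r is hypothetical.

```python
import math

def t_for_r(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 cases")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# The literacy and political stability example: r = .50, N = 10.
t = t_for_r(0.50, 10)
print(round(t, 2))  # 1.63
```

The result is then compared with the critical value of t for N - 2 = 8 degrees of freedom.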
Slide 24
Set the level of significance (assume .05).
Determine a one- or two-tailed test (aim for one-tailed).
For 8 df and a one-tailed test, the critical value of t = 1.86.
We observe only t = 1.63.
It lies below the critical t of 1.86.
So the null hypothesis of no relationship in the population (ρ = 0) cannot be rejected.
Comments
Note that a relationship can be strong and yet not significant.
Conversely, a relationship can be weak but significant.
The key factor is the size of the sample.
For small samples, it is easy to produce a strong correlation by chance, and one must pay attention to significance to keep from jumping to conclusions, i.e., rejecting a true null hypothesis, which means making a Type I error.
For large samples, it is easy to achieve significance, and one must pay attention to the strength of the correlation to determine whether the relationship explains very much.
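The role of sample size can be illustrated with two cases. The sketch assumes the usual statistic t = r * sqrt((N - 2) / (1 - r²)); the second case (r = .10, N = 1000) is invented for illustration, and its critical value of 1.645 is the one-tailed normal approximation for large df, not a figure from the lecture.

```python
import math

def t_for_r(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

CRITICAL_T_DF8 = 1.86        # one-tailed, .05 level, 8 df (from the slide)
CRITICAL_T_LARGE_DF = 1.645  # assumption: normal approximation for large df

# Moderately strong r in a small sample: not significant.
small_sample_t = t_for_r(0.50, 10)
print(round(small_sample_t, 2), small_sample_t > CRITICAL_T_DF8)

# Weak r in a large (hypothetical) sample: significant.
large_sample_t = t_for_r(0.10, 1000)
print(round(large_sample_t, 2), large_sample_t > CRITICAL_T_LARGE_DF)
```

In the second case r² is only .01, so the relationship is significant yet explains about 1% of the variance.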
Slide 25
Correlation summary
Pearson: the most common form, used with two continuous variables in a linear association.
Spearman: used with ranked data and monotonic (including curvilinear) associations.
Point-biserial: used whenever an independent-samples t-test can be used.
Phi: used when a chi-square test of independence (with just 2 levels per variable) can be used.
Can vary between -1 and +1.
Does not tell anything about causation.
Slide 26
Difference between Correlation and Regression
The correlation coefficient, r, measures the strength of bivariate association.
The regression line is a prediction equation that estimates the value of y for any given x.
Slide 27
Back to the idea of prediction
With correlation, you can predict the value of one variable based on the value of another variable.
If you know someone's marital problems, you can predict that person's level of satisfaction.
But if you knew more about that person, you could do an even better job of predicting satisfaction.
Regression: used to predict one quantitative variable from a whole mess of quantitative variables.
Slide 28
Building up to regression
First, the equation for a line: Y = bX + a (also known as Y = mX + b).
Both forms have an intercept and a slope.
Intercept = predicted value of Y when X is zero.
Slope = how much Y is predicted to change when X increases by one unit.
Goal of the regression line: minimize the discrepancy between predicted and actual values of Y.
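A minimal sketch of fitting such a line is shown below. It assumes the usual least-squares criterion, which minimizes the sum of squared differences between predicted and actual Y; the function name and data are made up for illustration.

```python
def least_squares_line(x, y):
    """Return (b, a) for the line Y = b*X + a with least squared error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    b = sxy / sxx            # slope: predicted change in Y per unit of X
    a = mean_y - b * mean_x  # intercept: predicted Y when X is zero
    return b, a

# Made-up data lying exactly on Y = 2X + 1, for illustration only.
b, a = least_squares_line([0, 1, 2], [1, 3, 5])
print(b, a)
print(b * 4 + a)  # predicted Y for X = 4
```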
Slide 29
Linking this to correlation
Correlation = slope of the regression line, if the scores are expressed as z-scores.
Predicted z-score for the Y variable = correlation value * z-score for the X variable.
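This link can be verified numerically. The sketch below standardizes both variables, fits a least-squares slope to the z-scores, and compares it with Pearson's r; the helper names and data are made up for illustration.

```python
import math

def z_scores(values):
    """Standardize values using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    return sxy / sum((p - mean_x) ** 2 for p in x)

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
zx, zy = z_scores(x), z_scores(y)
print(round(slope(zx, zy), 6), round(pearson_r(x, y), 6))
```

Both printed values agree, so a case one standard deviation above the mean on X is predicted to be r standard deviations above the mean on Y.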
Slide 30
Difference between regression and correlation
Correlation is a special case of regression, with just one predictor variable.
Regression lets you add in more predictor variables to:
Figure out how much of the Y variable is explained by a whole mess of predictor variables.
Figure out how much each predictor variable uniquely tells about the Y variable.
There are two tests for significance: one for the whole model, and one for each individual variable.
Slide 31
Limitations of the correlation coefficient
Though r measures how closely the two variables approximate a straight line, it does not validly measure the strength of a nonlinear relationship.
When the sample size, n, is small, we also have to be careful about the reliability of the correlation.
Outliers can have a marked effect on r.
r does not show that a linear relationship is causal.
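Two of these limitations are easy to demonstrate. In the sketch below, all data are made up for illustration: the first data set follows an exact curve (y = x²) yet has r = 0, and the second shows a single outlier turning a weak correlation into a strong one.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

# Nonlinear: y is fully determined by x, but the relationship is not linear.
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)
print(round(r_curve, 3))

# Outlier: five weakly related points, then the same points plus (20, 20).
x_base, y_base = [1, 2, 3, 4, 5], [3, 1, 4, 2, 3]
r_without = pearson_r(x_base, y_base)
r_with = pearson_r(x_base + [20], y_base + [20])
print(round(r_without, 3), round(r_with, 3))
```

A scatter plot of the data usually reveals both problems before r is computed.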
Slide 32
Conclusion
Correlation and regression
Types of correlation
Coefficient of correlation and its interpretation
Difference between regression and correlation