Post on 25-Dec-2015
Amsterdam Rehabilitation Research Center | Reade
Correlation and linear regression analysis
Martin van der Esch, PhD
Content
Correlation and linear regression analysis
Association research
However, also used in experimental studies
Correlation and regression
- Interested in relationship/association/correlation
- Direction and magnitude of relationship
- Dependent or independent variables
- Association does not imply a ‘cause and effect’ relationship
Correlation
Expressed as the Pearson product-moment correlation coefficient (r) when the data are not skewed,
or as the Spearman rank-order correlation coefficient (rs) when the data are ordinal, skewed, or contain outliers.
Dimensionless
Range between +1 and -1 (0 = no correlation)
Magnitude indicates how close the points are to a straight line (the strength of the association)
+1 or -1: perfect correlation, all points lying on the line
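The choice between r and rs can be illustrated with a minimal sketch. It assumes that Spearman's rs is computed as Pearson's r applied to the ranks of the data (tied values share their average rank); the data are hypothetical and chosen to contain one outlier.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank-order correlation: Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y rises with x, but the last y value is an extreme outlier
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 6, 8, 100]

r = pearson(x, y)    # pulled well below 1 by the outlier
rs = spearman(x, y)  # the ordering is perfectly monotone, so rs is 1
```

Because the ranks ignore how far the outlier lies from the other points, rs stays at 1 here while r drops noticeably.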
r lies between -1 and +1.
[Figure: scatter plot of systolic blood pressure (y-axis, 100 to 220) against age (x-axis, 10 to 70) with fitted line; R Sq Linear = 0.432]
[Figure: scatter plot of systolic blood pressure (y-axis, 100 to 220) against age (x-axis, 10 to 70) with fitted line; R Sq Linear = 0.712]
Correlation coefficient
Range: -1 ≤ r ≤ 1.
In SPSS:

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       0.844(a)   0.712      0.702               9.563
a. Predictors: (Constant), Leeftijd (age)

Coefficients(a)
Model 1          Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)       97.077             5.528                            17.562   0.000
Leeftijd (age)   0.949              0.116        0.844               8.174    0.000
a. Dependent variable: Syst. bloeddruk (systolic blood pressure)
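The numbers in this SPSS output hang together in a way that can be checked by hand. The sketch below only re-uses the rounded values printed in the tables, so small rounding differences are expected; the variable names are illustrative.

```python
# Values copied from the SPSS output (rounded as printed)
r = 0.844                 # R in the Model Summary
r_squared = 0.712         # R Square
b, se_b = 0.949, 0.116    # slope for age and its standard error
t_reported = 8.174        # t for age
beta_standardized = 0.844

# With a single predictor, R Square is the square of the correlation coefficient
r2_from_r = r ** 2

# The t statistic is the coefficient divided by its standard error
t_from_table = b / se_b

# With a single predictor, the standardized Beta equals the correlation r
beta_equals_r = (beta_standardized == r)
```

The recomputed t differs from the printed 8.174 only in the second decimal, because SPSS works with unrounded values.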
Formula correlation
\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
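A minimal sketch of the correlation formula, written term by term so that each sum is visible. The five data points are hypothetical.

```python
import math

def correlation(x, y):
    """Pearson r: sum of cross-products over the root of the product of sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: cross-products sum to 6, sums of squares are 10 and 6
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)  # 6 / sqrt(10 * 6)
```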
Regression analysis
Statistical analysis
Data were analyzed with SPSS for Windows 16.0 (SPSS Inc). According to their distribution, the various parameters are expressed as mean (± standard deviation) or median (interquartile range). Data with a non-Gaussian distribution were log-transformed for analysis if possible. To compare the groups, Student's t-test or the Mann-Whitney U test was used when appropriate. Furthermore, correlations between variables were analyzed by using Pearson correlation or Spearman's rho tests. Univariate linear regression analyses were performed on log-transformed data to investigate the influence of possible confounders (i.e. sex, smoking status, systolic blood pressure and body mass index (BMI)) on the results. The Wilcoxon signed-rank test was used to investigate the differences in values at baseline and at 8 weeks in the prospectively followed subgroup of patients (n=9). P-values less than 0.05 were considered statistically significant.
I C van Eijk, M E Tushuizen, A Sturk, B A C Dijkmans, M Boers, A E Voskuyl, M Diamant, G J Wolbink, R Nieuwland and M T Nurmohamed. Circulating microparticles remain associated with complement activation despite intensive anti-inflammatory therapy in early rheumatoid arthritis. Ann Rheum Dis, published online 16 Nov 2009.
Typical association question
Research question: is there an association between age and pain in patients with …?
Hypothesis: pain increases in older patients
Y = a + bX + e
[Figure: scatter plot of pain (y-axis) against age (x-axis)]
Simple (uni) linear regression analysis
Difference from correlation analysis: regression gives a prediction line that best describes the scatter plot.
The best-fitting line is difficult to draw by hand, so the problem is solved with a mathematical equation.
Simple (uni) linear regression analysis
We use the ‘Method of Least Squares’ to fit the best line
The best line minimises the sum of the squared vertical distances between the data points and the fitted line
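A minimal sketch of the method of least squares for one predictor, assuming the standard closed-form solution (slope = sum of cross-products divided by the sum of squares of x, intercept = mean of y minus slope times mean of x). The age and pain values are hypothetical.

```python
def least_squares(x, y):
    """Return intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances between the points and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data
age = [30, 40, 50, 60, 70]
pain = [20, 26, 29, 36, 40]

a, b = least_squares(age, pain)

# Any other line has a larger sum of squared residuals than the fitted one
best = sum_squared_residuals(age, pain, a, b)
worse_slope = sum_squared_residuals(age, pain, a, b + 0.05)
worse_intercept = sum_squared_residuals(age, pain, a + 1.0, b)
```

Shifting the intercept or tilting the slope away from the fitted values always increases the sum of squared residuals, which is what "best fitting" means here.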
[Figure: plot of pain (y-axis) against age (x-axis)]
Simple regression analysis
Pain = β0 + β1 * age
What is β0? β0 = pain at age 0
What is β1? β1 = Beta = b
β1 = difference in pain between age 0 and age 1
   = difference in pain between age 1 and age 2
   = ...
   = difference in pain between age 30 and age 31
Mathematical equation to describe the relationship
y = a + b*x
y is called the dependent (outcome) variable
x is called the independent (predictor, explanatory) variable
a is the intercept: value of y when x=0
b (unstandardized beta) is the 'slope': it represents the amount by which y increases on average if we increase x by one unit
a and b are called regression coefficients
Simple regression analysis
The regression coefficient is equal to the difference in the outcome variable when the determinant changes by one unit
[Figure: plot of pain (y-axis) against age (x-axis) illustrating the regression coefficient]
Simple regression analysis
pain = -20 + 0.5 * age
What is -20? What is 0.5?
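One way to read this equation is to evaluate it. In the sketch below the function name is illustrative: -20 is the intercept (the predicted pain at age 0) and 0.5 is the slope (the predicted increase in pain per extra year of age).

```python
def predicted_pain(age, intercept=-20.0, slope=0.5):
    """Predicted pain from the regression line pain = -20 + 0.5 * age."""
    return intercept + slope * age

at_age_zero = predicted_pain(0)                        # the intercept
per_year = predicted_pain(51) - predicted_pain(50)     # the slope
per_decade = predicted_pain(60) - predicted_pain(50)   # ten times the slope
```

When age 0 lies outside the observed age range, the intercept is mainly a mathematical anchor for the line rather than a realistic prediction.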
You can also analyse difference between two groups with simple regression analysis.
Back to 2 groups and analysis of pain
group pre after
medication 75.8 (6.8) 65.8 (10.1)
placebo 75.4 (7.1) 68.2 (9.0)
Group Statistics: VERSCHIL (difference)
group            N     Mean       Std. Deviation
placebo          100   -7.2000    3.75513
new medication   100   -10.0000   6.81650

Independent Samples Test: t-test for Equality of Means, VERSCHIL (difference)
t       df    Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference (Lower to Upper)
3.598   198   0.000             2.8000            1.26530 to 4.33470
Now analysed by simple regression analysis
[Figure: pain (continuous outcome) plotted for the placebo and medication 1 groups]
Simple regression analysis
Regression coefficient is equal to the difference in mean between two comparable groups
Simple (uni) linear regression analysis
Pain = β0 + β1 * group
placebo = 0; medication = 1
β0 = mean in the control group
β1 = mean difference between placebo and medication
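That a regression on a 0/1 group variable returns the control-group mean and the difference in means can be checked on a small example. The sketch assumes the usual closed-form least-squares fit; the eight change scores are hypothetical.

```python
def least_squares(x, y):
    """Return intercept and slope of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical change in pain per patient
placebo = [-6, -8, -7, -8]
medication = [-9, -11, -10, -12]

# Group coded as placebo = 0, medication = 1
group = [0] * len(placebo) + [1] * len(medication)
change = placebo + medication

b0, b1 = least_squares(group, change)

mean_placebo = sum(placebo) / len(placebo)
mean_medication = sum(medication) / len(medication)
```

Here b0 reproduces the placebo mean and b1 reproduces the medication mean minus the placebo mean.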
Coefficients(a)
Model 1               Unstandardized B   Std. Error   t         Sig.
(Constant)      β0    -7.200             0.550        -13.084   0.000
groep (group)   β1    -2.800             0.778        -3.598    0.000
a. Dependent variable: VERSCHIL (difference)
Hypothesis test for β
\[
t = \frac{\beta}{SE(\beta)}
\]
N-2 degrees of freedom
P value?
\[
t = \frac{-2.800}{0.778} = -3.598
\]
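A rough sketch of how this t value leads to a P value. The coefficient and standard error are the rounded values from the Coefficients table, and N = 200 comes from the two groups of 100. The standard library has no t distribution, so the sketch substitutes the normal distribution, which is an approximation that is reasonable only because the degrees of freedom are large.

```python
from statistics import NormalDist

b1 = -2.800      # regression coefficient for group
se_b1 = 0.778    # its standard error
n = 200          # two groups of 100 patients

t = b1 / se_b1   # differs from -3.598 only through rounding of the inputs
df = n - 2       # N - 2 degrees of freedom

# Two-sided P value, ASSUMING the normal approximation to the t distribution
p_approx = 2 * NormalDist().cdf(-abs(t))
```

The approximate P value is well below 0.001, in line with the Sig. value of 0.000 that SPSS prints.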
Back to the example
Experimental design
Including another medicine
Three comparable groups
Pain
group T0 T1
medication1 75.8 (6.8) 65.8 (10.1)
medication2 76.8 (7.5) 61.9 (11.7)
placebo 75.4 (7.1) 68.2 (9.0)
[Figure: continuous outcome plotted for the placebo, medication1 and medication2 groups]
Coefficients(a)
Model 1         Unstandardized B   Std. Error   t         Sig.
(Constant)      -6.850             0.586        -11.681   0.000
groep (group)   -3.850             0.454        -8.475    0.000
a. Dependent variable: VERSCHIL (difference)
Group analysed as a continuous variable
But group isn't a continuous variable; it is a categorical variable. Therefore it needs to be analysed with dummy variables.
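Why treating group as continuous is a problem can be seen by fitting a straight line through the three group means. The sketch assumes the coding placebo = 0, medication1 = 1, medication2 = 2 and equal group sizes (only two groups of 100 are shown in the output, so the size of the third group is an assumption). The mean changes are T1 minus T0 from the pain table.

```python
def least_squares(x, y):
    """Return intercept and slope of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

codes = [0, 1, 2]                    # assumed coding of the three groups
mean_change = [-7.2, -10.0, -14.9]   # placebo, medication1, medication2

# With equal group sizes, a line through the group means gives the same
# coefficients as a regression on all individual patients
b0, b1 = least_squares(codes, mean_change)

# The line forces the same step between consecutive groups
fitted = [b0 + b1 * c for c in codes]
```

The result reproduces the -6.850 and -3.850 of the SPSS table, but the fitted values (-6.85, -10.7, -14.55) match none of the actual group means, because a single slope imposes equal steps between groups.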
Dummy variables
Categorical Variables Codings: GROEP (group)
                         Parameter coding
                         (1)     (2)
placebo                  0.000   0.000
new medication           1.000   0.000
alternative medication   0.000   1.000
Dummy 1: new medication - placebo
Dummy 2: alt. medication - placebo
Placebo: control group
Simple regression analysis
Pain = β0 + β1 * medicationgroup1 + β2 * medicationgroup2
What is β0? β0 = mean of the placebo group
What is β1? β1 = difference between placebo and medication1
What is β2? β2 = difference between placebo and medication2
[Figure: continuous outcome plotted for the placebo, medication1 and medication2 groups]
Coefficients(a)
Model 1      Unstandardized B   Std. Error   t         Sig.
(Constant)   -7.200             0.642        -11.222   0.000
DUMMIE1      -2.800             0.907        -3.086    0.002
DUMMIE2      -7.700             0.907        -8.487    0.000
a. Dependent variable: VERSCHIL
Pain = β0 + β1 * medicationgroup1 + β2 * medicationgroup2
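The dummy coefficients can be recovered by hand from the mean pain scores at T0 and T1 in the three-group table. This is only an arithmetic check on the reported means, not a regression fit; the dictionary keys are illustrative.

```python
# Mean pain at T0 and T1 per group, copied from the three-group table
pain = {
    "placebo": (75.4, 68.2),
    "medication1": (75.8, 65.8),
    "medication2": (76.8, 61.9),
}

# Mean change per group (T1 minus T0)
change = {group: t1 - t0 for group, (t0, t1) in pain.items()}

b0 = change["placebo"]                           # constant: mean change in placebo
b1 = change["medication1"] - change["placebo"]   # dummy 1: medication1 vs placebo
b2 = change["medication2"] - change["placebo"]   # dummy 2: medication2 vs placebo
```

Up to floating-point rounding, these equal the -7.200, -2.800 and -7.700 in the Coefficients table.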
Intermezzo
Little exercise:
•Is there a relationship between your height (cm) and shoe size (European size)?
•Estimate the correlation coefficient.
•What does that mean?
•Estimate the formula: Height = ? + ? * shoe size
•Group: men/women.
•Groups: occupational therapy, physiotherapy, other.
Assumptions of linear regression analysis
Linear relationship between x and y
•Scatter diagram
•Otherwise logarithmic transformation (next week)
For each value of x, there is a distribution of values of y in the population; this distribution is Normal
•Analysis of the residuals
The variability of the distribution of y values in the population is the same for all values of x, i.e. the variance is constant (s2 / sd)
•Analysis of the residuals
Checking for linearity
Scatterplot
Adding a quadratic term
Splitting exposure variable into groups (4-5)
Adding a quadratic term
[Figure: plot of pain (y-axis) against age (x-axis)]
Checking for linearity
Splitting exposure variable into groups
Splitting exposure variable into groups
[Figure: plot of pain (y-axis) against age (x-axis), with age split into four groups (1 to 4)]
Example in SPSS
Examine the association between age and pain score at baseline.
Scatterplot
Linear regression analysis
Checking for linearity
•Adding a quadratic term
•Splitting exposure variable into groups
Scatter plot
[Figure: scatter plot of pain (y-axis, 65 to 80) against age (x-axis, 40 to 90)]
Linear regression analysis
Pain (at baseline) = 56.2 + 0.23 * age
Coefficients(a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   56.239             2.131                            26.394   0.000
age          0.234              0.033        0.523               7.005    0.000
a. Dependent variable: pain
Adding a quadratic term
Coefficients(a)
Model            Unstandardized B   Std. Error   Standardized Beta   t        Sig.
1   (Constant)   56.239             2.131                            26.394   0.000
    age          0.234              0.033        0.523               7.005    0.000
2   (Constant)   49.128             13.116                           3.746    0.000
    age          0.456              0.405        1.020               1.124    0.263
    age2         -0.002             0.003        -0.499              -0.549   0.584
a. Dependent variable: pain
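What the quadratic term picks up can be sketched on hypothetical data. The code below fits y = b0 + b1*x + b2*x² by solving the normal equations with a small Gaussian elimination. Centring age at 65 is a choice made here to keep the arithmetic well conditioned and is not taken from the slides. The slides judge the age2 term by its P value (0.584); this sketch only shows how the coefficient behaves.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(columns, y):
    """Least-squares coefficients for y = b0 + b1*col1 + b2*col2 + ..."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical ages, centred at 65 (an assumption made for this sketch)
age_c = [a - 65 for a in (40, 50, 60, 70, 80, 90)]
age_c2 = [a ** 2 for a in age_c]

straight = [72.25 + 0.25 * a for a in age_c]                # exactly linear
curved = [72.25 + 0.25 * a + 0.01 * a * a for a in age_c]   # truly curved

fit_straight = fit([age_c, age_c2], straight)  # quadratic coefficient near 0
fit_curved = fit([age_c, age_c2], curved)      # quadratic coefficient near 0.01
```

When the relationship is linear, the coefficient of the squared term is (close to) zero; a clearly non-zero, significant coefficient would point to curvature.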
Splitting exposure variable into groups
Produce categorical age variable
Recode to dummy variables
Perform linear regression analysis with dummies
Are the B's increasing in a linear order with comparable distance between the dummies?
Splitting exposure variable into groups
Coefficients(a)
Model 1      Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)   68.382             0.535                            127.936   0.000
dummy1       1.924              0.774        0.223               2.486     0.014
dummy2       3.346              0.750        0.404               4.459     0.000
dummy3       5.446              0.768        0.638               7.094     0.000
a. Dependent variable: pain
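Whether the B's rise in roughly equal steps can be checked with simple arithmetic on the coefficients in the table. The sketch assumes the reference age group has coefficient 0, which is how dummy coding works.

```python
# Coefficients per age group; the reference group is 0 by construction
b = [0.0, 1.924, 3.346, 5.446]

# Step in mean pain from each age group to the next
steps = [round(hi - lo, 3) for lo, hi in zip(b, b[1:])]

# Linearity would show as steps that are all in the same direction
# and of comparable size
increasing = all(step > 0 for step in steps)
```

The steps are 1.924, 1.422 and 2.1: all positive and of the same order of magnitude. Whether they count as comparable is a judgement call that should also take the standard errors (about 0.75 each) into account.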
Questions?