CORRELATION AND REGRESSION
Vanja Radišić Biljak
Department of medical laboratory diagnostics, University Hospital „Sveti Duh”, Zagreb, Croatia
MedCalc
Example 1
Is there an association between White Blood Cell (WBC) count and
concentration of C-reactive protein (CRP) in a group of University Hospital
„Sveti Duh” Emergency Department patients with a suspicion of acute
appendicitis?
What is the question about the data?
• Are these groups different? → Tests for statistical differences
• Are these groups associated? → Correlation
• Can I predict one variable by knowing the other? → Regression
Correlation
• Statistical procedure applied to investigate association between two
variables
• Numerically expressed as coefficient of correlation (r)
• Level of significance (P)
Positive correlation
• Increase of x → Increase of y
• Decrease of x → Decrease of y
0 < r ≤ 1
[Scatter plot: y versus x]
Negative correlation
• Increase of x → Decrease of y
• Decrease of x → Increase of y
-1 ≤ r < 0
[Scatter plot: y versus x]
No correlation
• Increase of x → ? of y
• Decrease of x → ? of y
[Scatter plot: y versus x]
Association but no correlation
• Correlation is meaningful only if there is LINEAR association; a strong but curved (non-linear) association can give r close to 0
[Two scatter plots: y versus x]
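The point about linearity can be checked numerically. The following is a minimal sketch, assuming the standard Pearson formula and made-up illustrative values (not data from the lecture): a perfectly U-shaped relationship gives r = 0 even though y is fully determined by x.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson coefficient of correlation for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Illustrative values only (hypothetical, not patient data)
x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * xi + 1 for xi in x]  # perfectly linear association
y_curved = [xi ** 2 for xi in x]     # perfect, but U-shaped, association

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(r_linear, r_curved)
```

For this reason it helps to look at the scatter plot before interpreting r.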
Types of correlation
• Pearson (rp) and Spearman correlation (rs)
1. Sample size: < 30 → Spearman correlation
2. Type of data: at least one ordinal data → Spearman correlation
3. Normality: both variables do not follow normal distribution → Spearman correlation
Otherwise → Pearson correlation
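The three criteria can be written as a small decision function. This is only a sketch of the rule as stated on the slide; the function and parameter names are hypothetical, and the normality decision is assumed to come from a separate normality test.

```python
def choose_correlation(sample_size, any_ordinal, normality_rejected):
    """Pick the correlation type using the three criteria from the slide.

    sample_size        -- number of paired observations
    any_ordinal        -- True if at least one variable is ordinal
    normality_rejected -- True if the normality test rejected normality
    """
    if sample_size < 30:        # 1. small sample
        return "Spearman"
    if any_ordinal:             # 2. at least one ordinal variable
        return "Spearman"
    if normality_rejected:      # 3. data do not follow normal distribution
        return "Spearman"
    return "Pearson"            # none of the criteria applies

# n = 120, numerical data, normality rejected
print(choose_correlation(120, False, True))
# n = 120, numerical data, normality accepted
print(choose_correlation(120, False, False))
```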
Example 1
Is there an association between White Blood Cell (WBC) count and
concentration of C-reactive protein (CRP) in a group of University Hospital
„Sveti Duh” Emergency Department patients with a suspicion of acute
appendicitis?
Example 1
• Pearson (rp) or Spearman correlation (rs)?
1. Sample size: n = 120, not < 30
2. Type of data: both variables are quantitative (numerical), none is ordinal
3. Normality: rejected, the variables do not follow normal distribution
→ Spearman correlation
MedCalc
Coefficient of correlation (r)
Coefficient of correlation (r)    Interpretation
0-0.24                            No association
0.25-0.49                         Poor association
0.50-0.74                         Moderate to good association
0.75-1.00                         Very good to excellent association
r can be interpreted only if P < level of significance (0.05)
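The interpretation table can be expressed as a lookup. This is a sketch under two assumptions that the table does not spell out: the ranges are applied to the absolute value of r (so that negative coefficients are graded the same way), and values falling between two printed ranges (for example 0.245) go to the lower category. The function name and the input values are hypothetical.

```python
def interpret_r(r, p, alpha=0.05):
    """Return (direction, strength) for a coefficient of correlation.

    r is interpreted only if p is below the level of significance.
    """
    if p >= alpha:
        return (None, "not interpretable (P not significant)")
    strength = abs(r)  # assumption: ranges refer to the absolute value
    if strength < 0.25:
        label = "no association"
    elif strength < 0.50:
        label = "poor association"
    elif strength < 0.75:
        label = "moderate to good association"
    else:
        label = "very good to excellent association"
    direction = "positive" if r > 0 else "negative"
    return (direction, label)

# Hypothetical inputs, for illustration only
print(interpret_r(0.465, 0.001))
print(interpret_r(-0.498, 0.001))
print(interpret_r(0.90, 0.20))
```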
There is a statistically significant, positive but poor
association between WBC and CRP in a group of
University Hospital „Sveti Duh” Emergency Department
patients with a suspicion of acute appendicitis.
Coefficient of determination (D)
• D = r²
• Indicates how well data fit a statistical model
• r = 0.85; D = 0.7225
• About 72% of the variability in one variable is explained by the other
• r = 0.25; D = 0.0625
• About 6.25% of the variability in one variable is explained by the other
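The two worked values above follow directly from squaring r. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def coefficient_of_determination(r):
    """Return D = r squared for a coefficient of correlation r."""
    return r ** 2

# The two values of r used on the slide
for r in (0.85, 0.25):
    d = coefficient_of_determination(r)
    print(f"r = {r:.2f}  D = {d:.4f}  ({d:.2%})")
```

Because r is squared, D drops quickly as r gets smaller: a coefficient of 0.25 corresponds to a D of only about 6%.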
r² = 0.2162
About 22% of the variability in one variable is explained by the other
Example 2
Is there an association between White Blood Cell (WBC) count and Mean
Platelet Volume (MPV) in a group of University Hospital „Sveti Duh”
Emergency Department patients with a suspicion of acute appendicitis?
Example 2
• Pearson (rp) or Spearman correlation (rs)?
1. Sample size: n = 120, not < 30
2. Type of data: both variables are quantitative (numerical), none is ordinal
3. Normality: accepted
→ Pearson correlation
MedCalc
We cannot interpret the association between WBC and
MPV in a group of University Hospital „Sveti Duh”
Emergency Department patients with a suspicion of
acute appendicitis, because the P value is not significant.
Example 3
Is there an association between platelet (PLT) count and Mean Platelet
Volume (MPV) in a group of University Hospital „Sveti Duh” Emergency
Department patients with a suspicion of acute appendicitis?
Example 3
• Pearson (rp) or Spearman correlation (rs)?
1. Sample size: n = 120, not < 30
2. Type of data: both variables are quantitative (numerical), none is ordinal
3. Normality: accepted
→ Pearson correlation
There is a statistically significant, negative but poor
association between PLT and MPV in a group of
University Hospital „Sveti Duh” Emergency Department
patients with a suspicion of acute appendicitis.
r² = 0.248
About 25% of the variability in one variable is explained by the other
Example 4
Can we predict MPV values if we know PLT count in a group of University
Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute
appendicitis?
Regression
Linear
• Dependent variable is numeric
• Calculating value of dependent variable
Logistic
• Dependent variable is categorical (binomial)
• Calculating the odds of an event
• Presence/Existence of disease
• Cut-off value
Example 4
• Dependent variable (MPV) is numeric: quantitative (numerical) data
→ Linear regression
Linear regression
• Linear regression can be calculated ONLY if there is correlation between
variables
• Independent variable (x)
• Dependent variable (y)
• Dependent variable (y) is calculated from the independent variable (x) using
a mathematical equation
Linear regression equation
y = a + bx
• a = intercept (value of y when x = 0)
• b = slope (change in y when x increases by 1)
• x = independent variable
• y = dependent variable
Source: www.mathisfun.com
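As an illustration of how a and b can be obtained from data, here is a minimal sketch that assumes ordinary least squares, the usual way of fitting this equation. The lecture itself uses MedCalc and does not show the computation; the function names and the sample values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates of intercept a and slope b in y = a + bx."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    return a, b

def predict(a, b, x_new):
    """Calculate the dependent variable from the independent variable."""
    return a + b * x_new

# Made-up values lying exactly on the line y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(x, y)
print(a, b, predict(a, b, 6))
```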
MedCalc
Example 4
• Equation: y = a + bx
• Confidence limits: 95% confidence interval for slope and intercept
Example 4
Only about 25% of the variability in the data is explained by the calculated equation
Residuals
• Difference between
measured and calculated
value
• If equation describes the
data well, residuals are
low
[Residual plot: y – f(x) against x; points above 0 are positive residuals, points below 0 are negative residuals]
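Residuals are simple to compute once the equation is known. A minimal sketch, using a hypothetical equation y = 1 + 2x and made-up measured values:

```python
def residuals(x, y, a, b):
    """Measured value minus value calculated from y = a + bx, per point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical equation and measurements, for illustration only
a, b = 1.0, 2.0
x = [1, 2, 3, 4]
y = [3.5, 4.5, 7.5, 8.5]

res = residuals(x, y, a, b)
print(res)  # positive: point lies above the line, negative: below
```

Plotting these values against x gives the residual plot; a good fit shows small residuals scattered around 0 without a pattern.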
Example 4
The equation does not describe the data well:
the coefficient of determination is low and the residuals are large.
Example 5
• Can we predict CRP concentrations if we know WBC count and MPV
values in a group of University Hospital „Sveti Duh” Emergency Department
patients with a suspicion of acute appendicitis?
Multiple regression
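With two independent variables the equation becomes y = b0 + b1·x1 + b2·x2. The sketch below assumes an ordinary least-squares fit solved through the normal equations; this is one common approach and not necessarily what MedCalc does internally. All names and values are hypothetical.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]  # design matrix with intercept
    k = 3
    # Augmented matrix [X'X | X'y]
    m = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(k):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [mi - factor * mc for mi, mc in zip(m[i], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

# Made-up values generated exactly from y = 1 + 2*x1 + 3*x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(b0, b1, b2)
```

The fit only works if the two independent variables are not perfectly collinear; otherwise the normal equations have no unique solution.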
Example 5
There is no association between MPV and CRP in a group of
University Hospital „Sveti Duh” Emergency Department patients
with a suspicion of acute appendicitis.
MedCalc
Example 5
Independent variables: WBC count and MPV
Dependent variable: CRP
Example 6
We wanted to predict an OUTCOME
Example 6
• Dependent variable (OUTCOME) is categorical (binomial)
→ Logistic regression
Logistic regression
• We can analyze more than two groups of data
• Dependent variable is categorical and binomial (Y)
• Independent variables can be both numerical and categorical (x1, x2, x3...)
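To make the link between the model and the odds of an event concrete, here is a minimal sketch of the standard logistic model, in which the probability of the event is p = 1 / (1 + e^-(b0 + b1·x1 + ...)). The intercept and coefficient below are hypothetical, not results from the lecture.

```python
from math import exp

def predicted_probability(intercept, coefficients, values):
    """Probability of the event for one patient under a logistic model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + exp(-z))

def odds(p):
    """Odds of the event: probability of event / probability of no event."""
    return p / (1.0 - p)

def odds_ratio(coefficient):
    """Change in odds for a one-unit increase of one independent variable."""
    return exp(coefficient)

# Hypothetical model with a single numerical predictor
b0, b1 = -2.0, 0.5
p_low = predicted_probability(b0, [b1], [3.0])
p_high = predicted_probability(b0, [b1], [4.0])   # x increased by 1

print(p_low, p_high)
print(odds(p_high) / odds(p_low), odds_ratio(b1))  # both equal e^0.5
```

A probability cut-off (for example 0.5) then turns the predicted probability into a yes/no classification.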
Example 6
Dependent variable: OUTCOME (APPENDICITIS YES/NO)
Independent variables:
CLINICAL (categorical)    LABORATORY (numerical)    LABORATORY (categorical)
Appetite                  CRP                       Urine test strip
Vomiting                  WBC
Diarrhea                  RBC
Dysuria                   RDW
Rebound tenderness        PLT
Pain migration            MPV
Stepwise analysis
1. Univariate analysis
• Analyze all variables separately to identify which are significantly associated with the outcome
2. Multivariate analysis
• Include significantly associated variables
• Include other variables for adjustment
MedCalc
Example 6
Logistic regression
The goal of logistic regression is to find the best fitting (yet biologically
reasonable) model to describe the relationship between the dependent variable
and the set of independent (predictor or explanatory) variables.
https://www.biochemia-medica.com