
Department of Epidemiology and Public Health Unit of Biostatistics and Computational Sciences...

Date post: 28-Mar-2015
Page 1: Department of Epidemiology and Public Health Unit of Biostatistics and Computational Sciences Ordinary linear regression PD Dr. C. Schindler Swiss Tropical.

Department of Epidemiology and Public Health
Unit of Biostatistics and Computational Sciences

Ordinary linear regression

PD Dr. C. Schindler
Swiss Tropical and Public Health Institute
University of Basel

Annual meeting of the Swiss Societies of Clinical Neurophysiology and of Neurology, Lugano, May 3rd 2012

Page 2:

Example:

Association between blood volume and body weight in women

Page 3:
Page 4:

Question:

How does the mean of blood volume depend on body weight in women?

Page 5:

The regression line

y = 893 + 45.7 · x (y = blood volume in ml, x = weight in kg)

For a weight of x = 70 kg: y = 893 + 45.7 · 70 = 4092 ml
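
The prediction on this slide can be reproduced with a few lines of Python. This is only a sketch: the two coefficients are the rounded values shown on the slide, and the function name `predicted_blood_volume` is an illustrative choice, not something from the talk.

```python
# Rounded coefficients from the slide: intercept in ml, slope in ml per kg.
INTERCEPT_ML = 893.0
SLOPE_ML_PER_KG = 45.7


def predicted_blood_volume(weight_kg: float) -> float:
    """Mean blood volume (ml) that the fitted line predicts for a given weight (kg)."""
    return INTERCEPT_ML + SLOPE_ML_PER_KG * weight_kg


volume_70 = predicted_blood_volume(70)
print(round(volume_70))  # 4092
```

As noted later in the talk, such predictions are only meaningful inside the observed weight range (50 to 80 kg).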

Page 6:

In this example, the regression line describes the mean of blood volume of women as a function of weight.

In general:

The regression line describes the mean of the dependent variable Y* as a function of the independent variable X**.

* syn. outcome variable

** syn. explanatory or predictor variable

Page 7:

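Regression equation and regression parameters

$y = \alpha + \beta \cdot x$

Regression parameters:

$\alpha$ = intercept = y-value of the line at x = 0

$\beta$ = slope of the line = change in y, if x increases by one unit

$\beta = \Delta y / \Delta x$

[Figure: regression line in the x-y plane with the slope triangle ($\Delta x$, $\Delta y$)]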

Page 8:

The values of the parameters must be determined from empirical data.

They are estimates of the respective true parameter values at the population level.

Therefore, they are referred to as parameter estimates.

Page 9:

Interpretation of parameter estimates

$\hat{\alpha}$ = estimated intercept = 893 ml: for a weight of 0 kg, a blood volume of 893 ml would be predicted. Of course, this interpretation does not make sense, since valid predictions can only be made for values of weight between 50 and 80 kg (the range of observed values).

$\hat{\beta}$ = estimated slope = 45.7 ml/kg: according to this model, the mean blood volume in women increases by 45.7 ml with each additional kg of weight.

Note: $\alpha$ and $\beta$ denote the parameters of the true regression line at the population level.

Page 10:

Residuals and predicted values

Residual plot

Residual = deviation of the observed value of the dependent variable (here: blood volume) from the value which the model predicts for the respective value of the independent variable (here: weight) (-> predicted value).

Page 11:

Definition and properties of the regression line

1. Among all possible lines, the regression line stands out as the one with the smallest possible sum of squared residuals (and hence the smallest possible variance of the residuals).

2. The regression line always runs through the point (mean of X, mean of Y), i.e., for the mean of the independent variable, the regression line always predicts the mean of the dependent variable.
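
Both properties, and the definition of a residual from the previous slide, can be checked numerically. The following is a minimal sketch on made-up numbers: the weight/volume pairs are hypothetical (they are not the data behind the slides), and the helper names `fit_line` and `sum_sq_residuals` are illustrative.

```python
# Hypothetical (weight in kg, blood volume in ml) pairs, NOT the study data.
weights = [52.0, 58.0, 61.0, 66.0, 70.0, 75.0, 79.0]
volumes = [3300.0, 3500.0, 3750.0, 3850.0, 4200.0, 4250.0, 4500.0]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def sum_sq_residuals(intercept, slope, x, y):
    """Sum of squared deviations of the observed y from the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


a, b = fit_line(weights, volumes)

# Residual = observed value minus predicted value.
residuals = [yi - (a + b * xi) for xi, yi in zip(weights, volumes)]

# Property 2: at the mean of X the line predicts the mean of Y.
prediction_at_mean = a + b * mean(weights)

# Property 1: perturbing intercept and/or slope can only increase the
# sum of squared residuals.
best = sum_sq_residuals(a, b, weights, volumes)
perturbed = [
    sum_sq_residuals(a + da, b + db, weights, volumes)
    for da, db in [(50.0, 0.0), (0.0, 1.0), (-50.0, -1.0)]
]

print(abs(prediction_at_mean - mean(volumes)) < 1e-6)  # True
print(all(value > best for value in perturbed))        # True
```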

Page 12:

Regression output of a statistics program (SPSS)

Coefficients (a)

                   Unstandardized Coefficients   Standardized Coefficients
Model 1            B (1)       Std. Error        Beta                        t       Sig.
(Constant) (2)     893.253     369.827                                       2.415   .020
Gewicht (3)        45.682      5.846             .777                        7.815   .000

(1) parameter estimate, (2) intercept, (3) slope. Gewicht = weight, blutvol = blood volume.

The rightmost column (Sig.) contains the p-values of the two parameter estimates. They refer to the deviation of these estimates from 0. The t-value (4th column) equals the ratio between the parameter estimate (B) and its standard error (Std. Error). In a model with a single predictor, the standardized coefficient equals Pearson's correlation coefficient.

a. Dependent Variable: blutvol
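
The relation between the B, Std. Error and t columns can be verified directly. A small sketch, using the rounded values printed in the SPSS table (so the last digit may differ slightly from SPSS, which divides the unrounded numbers); the dictionary layout is an illustrative choice:

```python
# (estimate B, standard error) as printed in the SPSS table.
estimates = {
    "(Constant)": (893.253, 369.827),
    "Gewicht": (45.682, 5.846),
}

# t-value = parameter estimate divided by its standard error.
t_values = {name: b / se for name, (b, se) in estimates.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```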

Page 13:

Slope

With a p-value < 0.0001, the deviation of the estimated slope from 0 is highly significant.

The hypothesis that the slope of the true regression line is 0* can therefore be rejected at the usual significance level of 0.05 (in fact even at a significance level of 0.0001).

*If the true slope is 0, then two situations are possible: a) the mean of Y does not depend on X at all, or b) the mean of Y depends on X in a specific non-linear way (see next slide).

Page 14:

Here $\beta = 0$. The mirror symmetry of the curve with respect to the vertical axis at x = 0 forces the regression line to run horizontally.

[Figure: curve $y = 0.1 \cdot x^2$ with the horizontal regression line $y = 0 \cdot x + 8$; x-axis: x, y-axis: y]
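
The effect of the symmetry can be reproduced with a short sketch. The x-grid used here (integers from -15 to 15) is an assumption: the slide does not state the x-values, and this grid is chosen only because it is symmetric around 0 and yields the intercept of 8 shown on the slide. `fit_line` is an illustrative helper.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed grid: symmetric around 0 (not taken from the slide).
xs = [float(v) for v in range(-15, 16)]
ys = [0.1 * x ** 2 for x in xs]  # perfectly quadratic relation

intercept, slope = fit_line(xs, ys)
print(f"slope = {abs(slope):.6f}, intercept = {intercept:.3f}")
```

Although Y depends perfectly on X, the fitted slope is zero, so a slope of 0 does not by itself mean "no relation".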

Page 15:

Intercept

With a p-value < 0.05, the deviation of the estimated intercept from 0 is also statistically significant at the usual level of 0.05.

Therefore, the hypothesis that the true regression line passes through the origin of the coordinate system can also be rejected at the usual level of 0.05.

Page 16:

Approximate 95%-confidence interval of the slope (parameter estimate ± 2 · standard error)

45.7 ± 2 · 5.8 = (34.1, 57.3)

We can be 95% confident that the slope of the regression line at the population level lies between 34.1 and 57.3 ml/kg.

It is thus quite certain that the true regression slope is higher than 30 and lower than 60 ml/kg.
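
The interval arithmetic on this slide, as a minimal sketch. The multiplier 2 is the slide's approximation (an exact interval would use a t-quantile, which needs the sample size); variable names are illustrative.

```python
estimate = 45.7   # estimated slope in ml/kg (rounded, from the slide)
std_error = 5.8   # its standard error (rounded, from the slide)

# Approximate 95% confidence interval: estimate +/- 2 standard errors.
lower = estimate - 2 * std_error
upper = estimate + 2 * std_error

print(f"({lower:.1f}, {upper:.1f})")  # (34.1, 57.3)
```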

Page 17:

Other important parameters of a regression model (SPSS)

Coefficients (a)

                   Unstandardized Coefficients   Standardized Coefficients
Model 1            B           Std. Error        Beta                        t       Sig.
(Constant)         893.253     369.827                                       2.415   .020
Gewicht            45.682      5.846             .777                        7.815   .000

a. Dependent Variable: blutvol

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .777 (a)   .604       .594                308.45008

a. Predictors: (Constant), Gewicht

R Square = proportion of variance of Y which is explained by the model

Std. Error of the Estimate = standard deviation of residuals

Page 18:

Decomposition of total variance

Total variance = variance of predicted values (explained variance) + variance of residuals (unexplained variance)

Total variance = sum of squared deviations of the individual values of Y from their mean value.

Variance of predicted values = sum of squared deviations of the predicted values of Y from the sample mean of Y.

Variance of residuals = sum of squared residuals (“residual sum of squares”).

(Each "variance" here stands for the corresponding sum of squares; dividing all three by the same constant leaves the identity unchanged.)

Page 19:

R²-value (or coefficient of determination) of the model

$R^2 = \dfrac{\text{explained variance*}}{\text{total variance*}} = \dfrac{\text{total variance*} - \text{unexplained variance*}}{\text{total variance*}}$

Note:

R² = 1: The data are completely explained by the model, i.e., all the points lie on the regression line.

R² = 0: slope of the regression line = 0.

* of Y
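
The variance decomposition and the two equivalent forms of R² can be checked on a toy data set. This is a sketch on hypothetical numbers (the x and y values below are invented for illustration), with sums of squares playing the role of the "variances".

```python
# Hypothetical data, used only to illustrate the identities.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares line.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# Sums of squares corresponding to the three "variances".
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_explained = sum((pi - mean_y) ** 2 for pi in predicted)
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

# Two equivalent ways of computing R-squared.
r_squared = ss_explained / ss_total
r_squared_alt = (ss_total - ss_residual) / ss_total

print(abs(ss_total - (ss_explained + ss_residual)) < 1e-9)  # True
```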

Page 20:

Regression line with 95%-confidence intervals

Confidence intervals of predicted values become wider with increasing distance from the center.

Page 21:

Power considerations

$SE(b) = \dfrac{s_{\text{residuals}}}{s_X \cdot \sqrt{n-1}}$ (standard error of the slope)

SE(b) is proportional to the standard deviation of residuals -> the residuals should be as small as possible

SE(b) is inversely proportional to the square root of n − 1 -> n should be sufficiently large

SE(b) is inversely proportional to the standard deviation of X -> the range of X should be as large as possible
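
The three proportionality statements follow directly from the formula for the standard error of the slope. A small sketch with hypothetical inputs (the numbers 300, 8 and 41 are invented; the function name is illustrative):

```python
import math


def slope_standard_error(s_residuals: float, s_x: float, n: int) -> float:
    """SE(b) = s_residuals / (s_X * sqrt(n - 1))."""
    return s_residuals / (s_x * math.sqrt(n - 1))


# Hypothetical baseline: residual SD 300 ml, SD of weight 8 kg, n = 41.
base = slope_standard_error(300.0, 8.0, 41)

half_residual_sd = slope_standard_error(150.0, 8.0, 41)  # residual SD halved
four_times_n = slope_standard_error(300.0, 8.0, 161)     # n - 1 quadrupled
double_spread = slope_standard_error(300.0, 16.0, 41)    # SD of X doubled

# Each of the three changes cuts the standard error in half.
print(base, half_residual_sd, four_times_n, double_spread)
```

Note that halving SE(b) through sample size alone requires about four times as many observations, whereas doubling the spread of X achieves the same with the original n.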

Page 22:

Conditions for the validity of a regression model

Page 23:

a) The residual plot should display a horizontal point cloud (no banana or wave shape).

-> validity of parameter estimates, confidence intervals and p-values

b) The (vertical) variability of the residuals should be more or less constant across the whole range of the independent variable (condition of homoscedasticity).

-> validity of confidence intervals and p-values

Page 24:

c) The distribution of residuals should be approximately normal (visual assessment by normal probability plot).

-> validity of confidence intervals and p-values

[Figure: normal probability plot; x-axis: residuals (ml), y-axis: quantiles of the standard normal distribution]

d) 1. Each observational unit should only occupy one row of the data table (i.e., each subject should contribute one observation to the analysis).

   2. If the individual observational units can be grouped into clusters (families, hospitals, etc.), then the cluster means of residuals must not vary systematically between the clusters (i.e., cluster means of residuals should differ from 0 only by chance*).

-> validity of confidence intervals and p-values

*If they don’t, one should introduce the cluster variable as an additional fixed or random factor into the regression model.

Page 25:

Beware: Not all relations can be well described by a regression line. Very often, relation(s) between dependent and independent variable(s) are non-linear.

Linear association: $y = -22.6 + 2.3 \cdot x$

Non-linear association: $y = -1.6 + 4.26 \cdot x - 0.039 \cdot x^2$

Page 26:

Multiple regression models (illustration based on concrete example)

Association between systolic blood pressure, gender, age and overweight

Page 27:

Different purposes of regression models

1. Prediction models
ex. Prediction of blood volume based on weight. Prediction of clinical outcome after t years.

2. Reference models
ex. Growth curves, reference values for functional parameters as a function of sex, age, etc.

3. Explanatory models*
describe the parallel influences of different predictor variables on a given outcome variable.
ex. Influence of sex, age and obesity on systolic blood pressure.

* also serve to “protect” effect estimates against confounding.

Page 28:

Aim 1: Reference model for adult systolic blood pressure (SBP) in Lugano as a function of sex and age.

Sample used: SAPALDIA subjects from Lugano with normal weight (i.e., BMI < 25 kg/m²)

Page 29:

SAPALDIA study

(Swiss Cohort Study on Air Pollution and Lung and Heart Diseases in Adults)

1st survey (1991): n = 9651; lung health (symptoms/lung function) + allergies

2nd survey (2002): n ≈ 6500; lung health + allergies + cardiovascular health (blood pressure, 24-hour ECG)

8 study areas (Basel, Geneva, Lugano, Aarau, Wald, Payerne, Davos, Montana)

Study subjects were between 18 and 60 years old in 1991 and had to be resident in the respective area for at least 3 years.

Page 30:

SAPALDIA: Study areas

Page 31:

Statistical method

Ordinary linear regression (quantitative outcome)

Simple model:

E[SBP | sex, age] = b0 + b1 · female + b2 · age_50

female = binary variable with 1 in women and 0 in men.
age_50 = age - 50 (age centered at 50 yrs)

E[SBP | sex, age] = mean of SBP (as a function of sex and age)
                  = predicted value of Y (as a function of sex and age)
                  = expected value of Y (as a function of sex and age)

Page 32:

Result of regression model (program STATA)

      Source |       SS       df       MS              Number of obs =     480
-------------+------------------------------           F(  2,   477) =   69.24
       Model |  37845.7847     2  18922.8923           Prob > F      =  0.0000
    Residual |  130361.613   477  273.294787           R-squared     =  0.2250
-------------+------------------------------           Adj R-squared =  0.2217
       Total |  168207.398   479   351.16367           Root MSE      =  16.532

------------------------------------------------------------------------------
       bpsys |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -13.41972   1.593982    -8.42   0.000    -16.55181   -10.28762
      age_50 |   .5503499   .0650948     8.45   0.000     .4224418     .678258
       _cons |   128.5214   1.296175    99.15   0.000     125.9745    131.0683
------------------------------------------------------------------------------

1. The age-adjusted mean of systolic blood pressure was significantly lower among women (i.e., by 13.4 mm Hg).

2. The gender-adjusted mean of SBP showed a mean increase of 0.55 mm Hg per year of age.

3. The value of the intercept parameter, 128.5 mm Hg, is the estimated mean of SBP in 50 year old men (they have female = 0 and age_50 = 0).
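
The summary statistics in the upper part of the Stata output follow from the sums of squares and degrees of freedom, and the coefficients give predicted means for any combination of sex and age. A sketch using the numbers printed in the output; the function name `predicted_sbp` and the example person (a 60-year-old woman) are illustrative assumptions.

```python
import math

# Sums of squares and degrees of freedom from the Stata output.
ss_model, df_model = 37845.7847, 2
ss_resid, df_resid = 130361.613, 477
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

r_squared = ss_model / ss_total
f_statistic = (ss_model / df_model) / (ss_resid / df_resid)
root_mse = math.sqrt(ss_resid / df_resid)  # standard deviation of residuals
adj_r_squared = 1 - (ss_resid / df_resid) / (ss_total / df_total)


def predicted_sbp(female: int, age: float) -> float:
    """Predicted mean SBP (mm Hg) from the fitted model; female is 0 or 1."""
    return 128.5214 - 13.41972 * female + 0.5503499 * (age - 50)


print(f"R2 = {r_squared:.4f}, F = {f_statistic:.2f}, Root MSE = {root_mse:.3f}")
print(f"50-year-old man: {predicted_sbp(0, 50):.1f} mm Hg")
print(f"60-year-old woman: {predicted_sbp(1, 60):.1f} mm Hg")
```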

Page 33:

Normal probability plot (QQ-plot)

[Figure: QQ-plot of the residuals; x-axis: inverse normal, y-axis: residuals]

Point line is slightly curved -> distribution of residuals is slightly skewed

Page 34:

Residual plot

[Figure: x-axis: predicted values (fitted values), y-axis: residuals]

(vertical) variability of residuals increases from left to right

Page 35:

If the distribution of residuals is right skewed (long tail towards large values) and their (vertical) variability gets larger with increasing predicted values, then a logarithmic transformation of the outcome variable often helps.

We will thus consider the new outcome variable

Y = ln(SBP)

Page 36:

Statistical method

Ordinary linear regression (quantitative outcome)

Alternative model:

E[ln(SBP) | sex, age] = b0 + b1 · female + b2 · age_50

E[ln(Y) | sex, age] = mean of ln(Y) as a function of sex and age

exp{E[ln(Y) | sex, age]} = $e^{E[\ln(Y) \mid \text{sex, age}]}$ = geometric mean of Y as a function of sex and age
≈ median of Y as a function of sex and age (if residuals are symmetrically distributed)
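
That exponentiating the mean of the logarithms gives the geometric mean can be seen on a handful of values. A minimal sketch; the four SBP values are hypothetical and only serve as an illustration.

```python
import math

# Hypothetical SBP values in mm Hg (not study data).
values = [110.0, 120.0, 135.0, 160.0]

# Mean on the log scale, then back-transformed with exp.
mean_log = sum(math.log(v) for v in values) / len(values)
geometric_mean = math.exp(mean_log)

# Direct definition of the geometric mean: n-th root of the product.
nth_root_of_product = math.prod(values) ** (1 / len(values))

arithmetic_mean = sum(values) / len(values)

print(f"geometric mean  = {geometric_mean:.2f}")
print(f"arithmetic mean = {arithmetic_mean:.2f}")
```

The geometric mean never exceeds the arithmetic mean, which is why back-transformed predictions from a log model are interpreted as geometric means (or approximate medians) rather than ordinary means.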

Page 37:

Normal probability plot (QQ-plot)

[Figure: QQ-plot of the residuals of the log model; x-axis: inverse normal, y-axis: residuals]

Point line is almost linear -> distribution of residuals close to normal

Page 38:

Residual plot

[Figure: x-axis: predicted values (fitted values), y-axis: residuals]

(vertical) variability of residuals increases less strongly from left to right

Page 39:

      Source |       SS       df       MS              Number of obs =     480
-------------+------------------------------           F(  2,   477) =   68.25
       Model |  2.49178871     2  1.24589436           Prob > F      =  0.0000
    Residual |  8.70727941   477  .018254255           R-squared     =  0.2225
-------------+------------------------------           Adj R-squared =  0.2192
       Total |  11.1990681   479    .0233801           Root MSE      =  .13511

------------------------------------------------------------------------------
     lnbpsys |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.1109775   .0130272    -8.52   0.000    -.1365753   -.0853798
      age_50 |   .0043791    .000532     8.23   0.000     .0033337    .0054244
       _cons |   4.846471   .0105933   457.50   0.000     4.825656    4.867287
------------------------------------------------------------------------------

1. The age-adjusted mean of ln(SBP) was lower by 0.11 in women. The geometric mean ratio of SBP between women and men was exp(-0.11) = 0.90. The geometric mean of SBP was lower in women by 10%.

2. On average, the geometric mean of SBP increased by a factor of exp(0.0044) = 1.0044, i.e., by 0.44% per year of age.

3. The estimated geometric mean of SBP in 50 year old men is exp(4.846) = 127.2.
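
The back-transformations in the three statements above can be recomputed from the coefficients. A sketch using the unrounded coefficients from the Stata output, so the results can differ in the last digit from values obtained with rounded coefficients; variable names are illustrative.

```python
import math

# Coefficients of the model for ln(SBP), from the Stata output.
coef_female = -0.1109775
coef_age_50 = 0.0043791
constant = 4.846471

# exp(coefficient) is a ratio of geometric means.
ratio_women_vs_men = math.exp(coef_female)
factor_per_year = math.exp(coef_age_50)

# exp(constant) is the geometric mean in the reference group
# (men aged 50, i.e. female = 0 and age_50 = 0).
geometric_mean_men_50 = math.exp(constant)

print(f"women vs men: ratio {ratio_women_vs_men:.3f}")
print(f"per year of age: +{100 * (factor_per_year - 1):.2f}%")
print(f"50-year-old men: {geometric_mean_men_50:.1f} mm Hg")
```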

Page 40:

Is the relation between ln(SBP) and age linear?

This may be assessed by adding the square of age_50: age_50squared = age_50²

      Source |       SS       df       MS              Number of obs =     480
-------------+------------------------------           F(  3,   476) =   45.77
       Model |   2.5071076     3  .835702532           Prob > F      =  0.0000
    Residual |  8.69196052   476  .018260421           R-squared     =  0.2239
-------------+------------------------------           Adj R-squared =  0.2190
       Total |  11.1990681   479    .0233801           Root MSE      =  .13513

-------------------------------------------------------------------------------
      lnbpsys |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       female |  -.1103769   .0130459    -8.46   0.000    -.1360115   -.0847423
       age_50 |   .0043318   .0005346     8.10   0.000     .0032814    .0053823
age_50squared |   .0000422    .000046     0.92   0.360    -.0000483    .0001326
        _cons |   4.840393   .0125019   387.17   0.000     4.815827    4.864959
-------------------------------------------------------------------------------

The square term is clearly not significant with a p-value of 0.36.

Page 41:

Is the relation between ln(SBP) and age independent of gender?

This may be assessed by adding the interaction term: female_age_50 = female*age_50

      Source |       SS       df       MS              Number of obs =     480
-------------+------------------------------           F(  3,   476) =   47.74
       Model |  2.59024374     3   .86341458           Prob > F      =  0.0000
    Residual |  8.60882438   476  .018085766           R-squared     =  0.2313
-------------+------------------------------           Adj R-squared =  0.2264
       Total |  11.1990681   479    .0233801           Root MSE      =  .13448

-------------------------------------------------------------------------------
      lnbpsys |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       female |  -.1139236   .0130282    -8.74   0.000    -.1395236   -.0883236
       age_50 |   .0027492   .0008766     3.14   0.002     .0010267    .0044716
female_age_50 |   .0025665      .0011     2.33   0.020      .000405    .0047279
        _cons |   4.847934   .0105629   458.96   0.000     4.827179     4.86869
-------------------------------------------------------------------------------

The interaction term is statistically significant with a p-value of 0.02. The slope between ln(SBP) and age is higher in women (i.e., 0.0027 + 0.0026 = 0.0053) than in men (i.e., 0.0027).
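
How the interaction term produces sex-specific slopes can be written out explicitly. A sketch using the coefficients from the Stata output; the function name and the example ages are illustrative assumptions.

```python
import math

# Coefficients of the interaction model for ln(SBP), from the Stata output.
B_FEMALE = -0.1139236
B_AGE_50 = 0.0027492
B_FEMALE_AGE_50 = 0.0025665
CONSTANT = 4.847934

# Slope of ln(SBP) on age: men get the main effect only,
# women get main effect plus interaction.
slope_men = B_AGE_50
slope_women = B_AGE_50 + B_FEMALE_AGE_50


def predicted_geometric_mean_sbp(female: int, age: float) -> float:
    """Back-transformed prediction (mm Hg); female is 0 or 1."""
    age_50 = age - 50
    log_sbp = (CONSTANT + B_FEMALE * female + B_AGE_50 * age_50
               + B_FEMALE_AGE_50 * female * age_50)
    return math.exp(log_sbp)


print(f"slope in men:   {slope_men:.4f} per year")
print(f"slope in women: {slope_women:.4f} per year")
for age in (30, 50, 70):
    men = predicted_geometric_mean_sbp(0, age)
    women = predicted_geometric_mean_sbp(1, age)
    print(f"age {age}: men {men:.1f}, women {women:.1f} mm Hg")
```

Because the slope is steeper in women, the predicted gap between men and women narrows with increasing age.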

Page 42:

Graphical representation of the model on the log-scale

[Figure: x-axis: age, y-axis: ln(SBP); separate lines for men and women]

Page 43:

Graphical representation of the model on the original scale

[Figure: x-axis: age, y-axis: SBP; separate curves for men and women]

Page 44:

Variable selection strategies in prediction / reference models

1. Between two models select the one which is more significant.

2. Between two models select the one with the lower AIC-value (AIC = Akaike information criterion).

3. Between two models select the one with the lower BIC-value (BIC = Bayesian information criterion).

2) and 3) are better than 1), because they estimate performance of the model in new data. They are strongly linked to cross-validation.

3) is stricter than 2) and is preferable if parsimony of the model is an important criterion.
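
As an illustration of why BIC is the stricter criterion, AIC and BIC can be computed for two of the ln(SBP) models fitted to the same 480 subjects: the model with female and age_50, and the model that adds the interaction term. This is a sketch under stated assumptions, not a computation shown in the talk: it uses the standard formulas AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, takes the residual sums of squares from the Stata outputs, assumes normally distributed errors, and counts k as the number of regression coefficients including the intercept (other software may count the error variance as an extra parameter, which shifts both models equally).

```python
import math


def gaussian_log_likelihood(rss: float, n: int) -> float:
    """Maximised log-likelihood of a linear model with normal errors."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(log_likelihood: float, k: int) -> float:
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    return k * math.log(n) - 2 * log_likelihood


n = 480
# Residual sums of squares from the two Stata outputs for lnbpsys.
ll_main = gaussian_log_likelihood(8.70727941, n)         # female, age_50
ll_interaction = gaussian_log_likelihood(8.60882438, n)  # + female_age_50

aic_main, aic_interaction = aic(ll_main, 3), aic(ll_interaction, 4)
bic_main, bic_interaction = bic(ll_main, 3, n), bic(ll_interaction, 4, n)

print(f"AIC: main {aic_main:.2f}, with interaction {aic_interaction:.2f}")
print(f"BIC: main {bic_main:.2f}, with interaction {bic_interaction:.2f}")
```

Under these assumptions AIC favours the model with the interaction term, while BIC, with its larger penalty of ln(480) ≈ 6.2 per parameter, favours the simpler model.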

Page 45:

Aim 2: Assessment of the association between adult systolic blood pressure (SBP) in Lugano and overweight.

We consider the variable "overweight" with values:

0 in persons with BMI ≤ 25 kg/m²

1 in persons with BMI > 25 kg/m²

Page 46:

Regression model: ln(SBP) = b0 + b1 · overweight

      Source |       SS       df       MS              Number of obs =     924
-------------+------------------------------           F(  1,   922) =   98.63
       Model |  2.14124857     1  2.14124857           Prob > F      =  0.0000
    Residual |  20.0169737   922  .021710384           R-squared     =  0.0966
-------------+------------------------------           Adj R-squared =  0.0957
       Total |  22.1582222   923  .024006741           Root MSE      =  .14734

------------------------------------------------------------------------------
     lnbpsys |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  overweight |   .0963513   .0097019     9.93   0.000     .0773109    .1153917
       _cons |   4.779094   .0067253   710.61   0.000     4.765896    4.792293
------------------------------------------------------------------------------

1. The mean of ln(SBP) was higher by 0.096 in overweight persons compared to persons of normal weight. The geometric mean ratio of SBP between overweight and normal weight persons was exp(0.096) = 1.10. The geometric mean of SBP was higher by 10% in overweight persons.

2. The estimated geometric mean of SBP in normal weight persons is exp(4.779) = 119.0.

Page 47:

Adjustment for gender and age:

      Source |       SS       df       MS              Number of obs =     924
-------------+------------------------------           F(  4,   919) =  103.79
       Model |  6.89527577     4  1.72381894           Prob > F      =  0.0000
    Residual |  15.2629465   919  .016608212           R-squared     =  0.3112
-------------+------------------------------           Adj R-squared =  0.3082
       Total |  22.1582222   923  .024006741           Root MSE      =  .12887

-------------------------------------------------------------------------------
      lnbpsys |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       female |   -.103432   .0091201   -11.34   0.000    -.1213307   -.0855334
       age_50 |   .0036905   .0005486     6.73   0.000     .0026138    .0047672
female_age_50 |   .0022885    .000742     3.08   0.002     .0008323    .0037447
   overweight |    .054128   .0088583     6.11   0.000     .0367431     .071513
        _cons |   4.840025   .0083731   578.05   0.000     4.823592    4.856457
-------------------------------------------------------------------------------

The gender- and age-adjusted mean of ln(SBP) was higher by 0.054 in overweight persons compared to persons of normal weight. The adjusted geometric mean ratio of SBP between overweight and normal weight persons was exp(0.054) = 1.055. The adjusted geometric mean of SBP was higher by 5.5% in overweight persons.

Page 48:

Arithmetic of confounding

(OW = overweight, F = female)

[Diagram: age is positively associated (+) with both OW and SBP]

association between OW and age = +

association between SBP and age = +

Confounding of the association between SBP and OW by age = (+) · (+) = +.

[Diagram: female is negatively associated (-) with both OW and SBP]

association between OW and F = -

association between SBP and F = -

Confounding of the association between SBP and OW by sex = (-) · (-) = +.

Both age and sex are positive confounders of the association between SBP and OW.
=> If age and sex are included in the model, the slope between SBP and OW decreases.
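
The size of this decrease can be read off the two overweight coefficients reported in the Stata outputs (unadjusted: 0.0963513; adjusted for gender and age: 0.054128). A short sketch that back-transforms both to percentage differences in geometric mean SBP; the helper name is illustrative, and the results may differ in the last digit from values based on rounded coefficients.

```python
import math

crude_coef = 0.0963513     # overweight only
adjusted_coef = 0.054128   # overweight + female, age_50, female_age_50


def percent_difference(log_coef: float) -> float:
    """Percentage difference in geometric mean implied by a log-scale coefficient."""
    return 100 * (math.exp(log_coef) - 1)


crude_pct = percent_difference(crude_coef)
adjusted_pct = percent_difference(adjusted_coef)

print(f"crude:    +{crude_pct:.1f}%")
print(f"adjusted: +{adjusted_pct:.1f}%")
```

With positive confounding by age and sex, the crude estimate overstates the association, so the adjusted estimate is the smaller of the two.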

Page 49:

Adjustment for clustering of data

Example: multi-center studies

If clustering is ignored, then this may lead to

a) a loss of power (RCTs with randomisation stratified by center)

b) confounding (observational studies with different study areas)

Remedy: Introduce study center as a fixed factor into the regression model or use mixed linear model with random effects for the different centers.

Page 50:

SAPALDIA-example (fixed area effects)

      Source |       SS       df       MS              Number of obs =    3243
-------------+------------------------------           F( 11,  3231) =  140.38
       Model |  23.5304364    11  2.13913058           Prob > F      =  0.0000
    Residual |  49.2344884  3231  .015238158           R-squared     =  0.3234
-------------+------------------------------           Adj R-squared =  0.3211
       Total |  72.7649248  3242  .022444456           Root MSE      =  .12344

-------------------------------------------------------------------------------
      lnbpsys |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       female |  -.0887425   .0045394   -19.55   0.000    -.0976429    -.079842
       age_50 |   .0031324   .0002716    11.53   0.000     .0025999    .0036649
female_age_50 |   .0031068    .000379     8.20   0.000     .0023637    .0038499
   overweight |   .0600691   .0045947    13.07   0.000     .0510602    .0690779
   _Iarea_161 |  -.0076996   .0078966    -0.98   0.330    -.0231825    .0077832
   _Iarea_162 |   .0201633   .0098355     2.05   0.040     .0008788    .0394477
   _Iarea_163 |  -.0084698   .0084272    -1.01   0.315    -.0249929    .0080534
   _Iarea_164 |  -.0076526   .0093321    -0.82   0.412      -.02595    .0106449
   _Iarea_165 |  -.0411928   .0086567    -4.76   0.000     -.058166   -.0242196
   _Iarea_166 |  -.0126127   .0083109    -1.52   0.129    -.0289078    .0036825
   _Iarea_167 |  -.0260132   .0095668    -2.72   0.007    -.0447708   -.0072556
        _cons |   4.841831   .0071065   681.32   0.000     4.827897    4.855765
-------------------------------------------------------------------------------

Each study area but one gets a parameter estimate expressing its difference from the one area that serves as the reference (here: area 160).

Page 51:

SAPALDIA-example (mixed linear model with random area effects)

Random-effects ML regression                    Number of obs      =      3243
Group variable: area                            Number of groups   =         8

Random effects u_i ~ Gaussian                   Obs per group: min =       259
                                                               avg =     405.4
                                                               max =       624

                                                LR chi2(4)         =   1227.18
Log likelihood  =  2176.8602                    Prob > chi2        =    0.0000

-------------------------------------------------------------------------------
      lnbpsys |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       female |  -.0889006   .0045361   -19.60   0.000    -.0977913     -.08001
       age_50 |   .0031431   .0002713    11.58   0.000     .0026113     .003675
female_age_50 |      .0031   .0003787     8.19   0.000     .0023578    .0038422
   overweight |   .0597827   .0045912    13.02   0.000     .0507841    .0687812
        _cons |   4.831426   .0068285   707.54   0.000     4.818043     4.84481
--------------+----------------------------------------------------------------
     /sigma_u |   .0150779   .0045517                      .0083441    .0272459
     /sigma_e |   .1233706   .0015339                      .1204006    .1264139
          rho |    .014717   .0087657                      .0041602    .0430383
-------------------------------------------------------------------------------

Random area effects u are viewed as independent outcomes of a normal distribution with mean $\mu_u = 0$ and standard deviation $\sigma_u = 0.015$ (residual standard deviation within areas $\sigma_e = 0.123$).
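
The value labelled rho in the Stata output is the share of the total residual variance that is due to differences between areas. It can be recomputed from the two standard deviations sigma_u and sigma_e; the sketch below uses the values printed in the output, and the variable names are illustrative.

```python
sigma_u = 0.0150779  # standard deviation of the random area effects
sigma_e = 0.1233706  # residual standard deviation within areas

# Proportion of variance attributable to the area level
# (intraclass correlation).
rho = sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

print(f"rho = {rho:.6f}")  # rho = 0.014717
```

Only about 1.5% of the variance of ln(SBP) not explained by the covariates lies between the eight study areas.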

Page 52:

Thank you for your attention!

