Date post: 21-Dec-2015
04/19/23 H.S. 1

Short overview of statistical methods

Hein Stigum

Presentation, data and programs at:
http://folk.uio.no/heins/ courses


Agenda
• Concepts
• Bivariate analysis
  – Continuous symmetrical
  – Continuous skewed
  – Categorical
• Multivariable analysis
  – Linear regression
  – Logistic regression

Outcome variable decides analysis


CONCEPTS


Precision and bias

• Measures of populations
  – precision - random error - statistics
  – bias - systematic error - epidemiology

[Figure: true value and estimate, with precision and bias marked]


Precision: Estimation

[Figure: population with its true value; sample with an estimate and its interval, drawn as ( | )]

Estimate with confidence interval

95% confidence interval: in repeated sampling, 95% of such intervals will contain the true value
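
The repeated-sampling reading of the confidence interval can be checked by simulation. The sketch below is an illustration only: the true mean, standard deviation, sample size and seed are made-up assumptions, and the interval uses the normal multiplier 1.96.

```python
import random
import statistics

def ci_coverage(true_mean=3500.0, sd=500.0, n=50, repeats=2000, seed=1):
    """Fraction of repeated 95% intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(repeats):
        # Draw a new sample from the (hypothetical) population
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        lower, upper = mean - 1.96 * se, mean + 1.96 * se
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repeats

coverage = ci_coverage()
print(f"share of intervals containing the true value: {coverage:.3f}")
```

The printed share should land close to 0.95; it will not be exactly 0.95 because the number of repeats is finite and the sample standard deviation is itself estimated.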


Precision: Testing

[Figure: population with the true values for group 1 and group 2; sample with estimate 1 and estimate 2]

p-value = P(observing this difference or a larger one, when the true difference is zero)
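
One way to make the p-value definition concrete is a permutation test: if the true difference is zero, the group labels are arbitrary, so we can reshuffle them and count how often the reshuffled difference is at least as large as the observed one. This is a sketch, not the t-test used later in the deck, and the birth weights are hypothetical values made up for illustration.

```python
import random
import statistics

def permutation_p_value(group1, group2, n_perm=5000, seed=1):
    """Two-sided p-value for a difference in means, by random relabelling."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group1) - statistics.fmean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel under "true difference is zero"
        diff = abs(statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:]))
        if diff >= observed - 1e-12:  # small tolerance for rounding
            as_extreme += 1
    # Add 1 to numerator and denominator so the observed labelling is counted
    return (as_extreme + 1) / (n_perm + 1)

# Hypothetical birth weights in gram
boys = [3650, 3720, 3400, 3900, 3550, 3810, 3600, 3750]
girls = [3500, 3450, 3300, 3700, 3380, 3620, 3410, 3560]
p = permutation_p_value(boys, girls)
print(f"p-value: {p:.3f}")
```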


Precision: Significance level

Birth weight, 500 newborns, observed difference between boys and girls

H0: boys = girls
Ha: boys ≠ girls

Difference   p-value
10 gr        0.90
50 gr        0.40
100 gr       0.10
130 gr       0.04
150 gr       0.02

Significance level: p<0.05


Precision: Test situations

• 1 sample test
  – Weight = 10
• 2 independent samples
  – Weight by sex
• K independent samples
  – Weight by age groups
• 2 dependent samples
  – Weight last year = Weight today


Bias: DAGs

[DAG: E = gest age, D = birth weight, C1 = sex, C2 = parity]

Associations: Bivariate (unadjusted)
Causal effects: Multivariable (adjusted)

Draw your assumptions before your conclusions


WHY USE GRAPHS?


Problem example

• Lunch meals per week
  – Table of means (around 5 per week)
  – Linear regression

[Figure: histogram; x-axis: Lunch meals per week (1 to 7), y-axis: Percent (0 to 50)]


Problem example 2

• Iron level by sex
  – Both linear and logistic regression
  – Opposite results

[Figure: distributions of iron level in blood for girls and boys, with mean girls and mean boys marked; x-axis: Iron level in blood]


Datatypes

• Categorical data
  – Nominal: married/ single/ divorced
  – Ordinal: small/ medium/ large
• Numerical data
  – Discrete: number of children
  – Continuous: weight


Data type

• Categorical
  – Freq table
  – Cross, Chi-square
  – Logistic regression
• Numerical: Normal data?
  – Yes: Means, T-test, Linear regression
  – No: Medians, Non-par tests

Outcome data type dictates type of analysis


BIVARIATE ANALYSIS 1
Continuous symmetric outcome: Birth weight


Distribution

kdensity weight
drop if weight<2000
kdensity weight

[Figures: kernel density of weight before (x-axis 0 to 6000) and after (x-axis 2000 to 6000) dropping weight<2000; y-axis: Density]


Central tendency and dispersion

Mean and standard deviation:

Mean with confidence interval:
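
The output tables for this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of the three quantities, using the deck's rule of thumb mean ± 2*se(mean) for the 95% confidence interval. The weights are hypothetical values chosen so the arithmetic is easy to follow.

```python
import statistics

def mean_with_ci(values):
    """Return mean, standard deviation and the interval mean ± 2*se(mean)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)   # sample standard deviation (divides by n - 1)
    se = sd / n ** 0.5              # standard error of the mean
    return mean, sd, (mean - 2 * se, mean + 2 * se)

# Hypothetical birth weights in gram
weights = [3200, 3400, 3500, 3600, 3800]
mean, sd, (low, high) = mean_with_ci(weights)
print(f"mean={mean:.0f} sd={sd:.1f} CI=({low:.0f}, {high:.0f})")
```

The standard deviation describes the spread of individual weights, while the confidence interval describes the uncertainty of the mean, so the interval narrows as n grows and the standard deviation does not.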


Compare groups, equal variance?

• Equal
• Not equal

[Figures: two groups with equal variance; two groups with unequal variance]


2 independent samples

Are birth weights the same for boys and girls?

[Figures: scatterplot of birth weight by sex (Boys, Girls); density plot of birth weight by sex]


2 independent samples test

ttest weight, by(sex) unequal    (unequal variances)
ttest var1==var2                 (paired test)
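
To show what an unequal-variance t-test computes, the sketch below calculates the t statistic and Satterthwaite's approximate degrees of freedom by hand. It is an illustration in Python, not a reproduction of Stata's output; the function name and data are made up, and the p-value is left out because the standard library has no t distribution.

```python
import statistics

def unequal_variance_t(group1, group2):
    """t statistic and Satterthwaite degrees of freedom for two independent samples."""
    n1, n2 = len(group1), len(group2)
    v1 = statistics.variance(group1) / n1   # squared standard error, group 1
    v2 = statistics.variance(group2) / n2   # squared standard error, group 2
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical data with clearly different spread in the two groups
t, df = unequal_variance_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t={t:.3f} df={df:.2f}")
```

With equal variances and equal group sizes the degrees of freedom approach n1 + n2 - 2; the more the variances differ, the fewer degrees of freedom remain.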


K independent samples

• Is birth weight the same over parity?

[Figures: scatterplot of birth weight by parity (0, 1, 2-7); density plot of birth weight by parity]


K independent samples test

equal means?

Equal variances?


Continuous by continuous
• Does birth weight depend on gestational age?

[Figures: scatterplot of birth weight by gestational age (200 to 700); the same scatterplot with the outlier dropped (gestational age 200 to 300)]


Continuous by continuous tests

• Cut gestational age up in groups, then use T-test or ANOVA

or

• Use linear regression with 1 covariate
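
A regression with one covariate fits the line y = b0 + b1*x by least squares. The sketch below does this by hand so the two coefficients are visible; the gestational ages and weights are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x + e."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # change in y per unit of x
    b0 = mean_y - b1 * mean_x      # expected y at x = 0
    return b0, b1

# Hypothetical data: gestational age in days, birth weight in gram
gest = [260, 270, 280, 290, 300]
weight = [3100, 3350, 3500, 3700, 3850]
b0, b1 = fit_line(gest, weight)
print(f"b0={b0:.0f} b1={b1:.1f}")
```

Unlike cutting gestational age into groups, the regression keeps the covariate continuous and summarises the association in a single slope.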


Test situations

• 1 sample test
  – ttest weight == 10
• 2 independent samples
  – ttest weight, by(sex)
• K independent samples
  – oneway weight parity
• 2 dependent samples (paired)
  – ttest weight_last_year == weight_today


BIVARIATE ANALYSIS 2
Continuous skewed outcome: Number of sexual partners


Distribution

kdensity partners if partners<=50

[Figure: distribution of number of lifetime partners, N=394; x-axis: Partners, with ticks at 1, 4, 9, 20 and 50 and the 25%, 50%, 75% and 95% percentiles marked]


Central tendency and dispersion

Median and percentiles:
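
For skewed data the median and percentiles describe the centre and spread better than the mean, which is pulled towards the long tail. A small sketch with hypothetical counts (not the survey data) shows the difference; the "inclusive" percentile method is one of several conventions and is an assumption here.

```python
import statistics

# Hypothetical, skewed counts made up for illustration
partners = [0, 1, 1, 2, 3, 4, 6, 9, 50]

median = statistics.median(partners)
# Quartiles: 25%, 50% and 75% percentiles
q1, q2, q3 = statistics.quantiles(partners, n=4, method="inclusive")
mean = statistics.fmean(partners)

print(f"median={median} quartiles=({q1}, {q2}, {q3}) mean={mean:.1f}")
```

A single large value moves the mean well above the median while the quartiles stay put.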


2 independent samples

Do males and females have the same number of partners?

[Figures: scatterplot of partners by gender (Males, Females); density plot of partners by gender]


2 independent samples test

equal medians?
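
The table of tests in this deck lists the Mann-Whitney U test for two independent samples of skewed data. As a rough sketch of the idea, the U statistic counts, over all pairs, how often a value from one group exceeds a value from the other. The data are hypothetical and the p-value step is omitted.

```python
def mann_whitney_u(group1, group2):
    """U statistic for group1: pairs where group1 is larger; ties count 0.5."""
    u = 0.0
    for a in group1:
        for b in group2:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical numbers of partners
males = [2, 5, 9, 20, 40]
females = [1, 3, 4, 6, 10]
u_males = mann_whitney_u(males, females)
u_females = mann_whitney_u(females, males)
print(u_males, u_females)
```

The two U values always sum to the number of pairs (here 5 × 5 = 25); when the groups have the same distribution each is expected to be about half of that.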


K independent samples

Do partners vary with age?

[Figures: scatterplot of partners by age group (18-29, 30-44, 45-60); density plot of partners by age group]


K independent samples test

equal medians?


Table of descriptives

                            Numerical data
                            Normal               Skewed      Proportions
Descriptives
  Center                    Mean                 Median      p
  Dispersion                Standard deviation   Fractiles
Confidence intervals for center estimates
  Standard error            se(mean)                         se(p)
  95% confidence interval   mean ± 2*se(mean)                p ± 2*se(p)


Table of tests

                        Numerical data
                        Normal                      Skewed                      Proportions
1 sample                One sample T-test           Kolmogorov-Smirnov          Binomial
2 independent samples   Independent sample T-test   Mann-Whitney U              Chi-square
K independent samples   ANOVA                       Kruskal-Wallis              Chi-square
2 dependent samples     Paired sample T-test        Wilcoxon signed rank test   McNemar (2x2)

Remarks:
• Categorical ordered: use nonparametric tests
• If N is large: may use parametric tests
• If unequal variance in ANOVA: use linear regression with robust variance estimation


BIVARIATE ANALYSIS 3
Categorical outcome: Being bullied


Frequency and proportion

Frequency:

Proportion with CI:


Proportion, confidence interval

proportion: $p = \frac{x}{n}$, where x = "disease" and n = total number

standard error: $se(p) = \sqrt{\frac{p(1-p)}{n}}$

confidence interval: $CI(p) = p \pm 2 \cdot se(p)$
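
The proportion, its standard error sqrt(p(1-p)/n) and the interval p ± 2*se(p) take only a few lines to compute. The counts below are hypothetical and chosen to give round numbers.

```python
import math

def proportion_ci(x, n):
    """Proportion, standard error and the interval p ± 2*se(p)."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - 2 * se, p + 2 * se)

# Hypothetical: 20 of 100 children report being bullied
p, se, (low, high) = proportion_ci(x=20, n=100)
print(f"p={p:.2f} se={se:.2f} CI=({low:.2f}, {high:.2f})")
```

This simple interval works best when n is large and p is not close to 0 or 1; near the boundaries it can extend below 0 or above 1.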


Crosstables

equal proportions?

Are boys bullied as much as girls?


Ordered categories, trend

Trend?

equal proportions?



MULTIVARIABLE ANALYSIS 1
Continuous outcome: Linear regression, Birth weight


Regression idea

model: $y = b_0 + b_1 x + e$
  y = outcome
  x = covariate
  $b_1$ = coefficient, effect of x
  e = error, residual

model with many cofactors: $y = b_0 + b_1 x_1 + b_2 x_2 + e$
  $x_1, x_2$ = covariates

[Figure: scatterplot of birth weight (gram) by gestational age (days)]


Model and assumptions

• Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
• Association measure: $\beta_1$ = increase in y for one unit increase in $x_1$
• Assumptions
  – Independent errors
  – Linear effects
  – Constant error variance
• Robustness
  – influence


Workflow

• DAG
• Scatterplots
• Bivariate analysis
• Regression
  – Model estimation
  – Test of assumptions
    • Independent errors
    • Linear effects
    • Constant error variance
  – Robustness
    • Influence

[DAG: E = gest age, D = birth weight, C1 = sex, C2 = parity]

[Figure: scatterplot of birth weight (gram) by gestational age (days), with observation 539 labelled]


Categorical covariates

• 2 categories
  – OK
• 3+ categories
  – Use "dummies"
• "Dummies" are 0/1 variables used to create contrasts
• Want 3 categories for parity: 0, 1 and 2-7 children
• Choose 0 as reference
• Make dummies for the two other categories


generate Parity1 = (parity==1) if parity<.

generate Parity2_7 = (parity>=2) if parity<.
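
The two Stata commands above create 0/1 variables and leave them missing when parity is missing. For readers who do not use Stata, the same logic could be sketched in Python as follows; the function name is made up, and `None` stands in for a missing value.

```python
def parity_dummies(parity):
    """Return (Parity1, Parity2_7) as 0/1, or (None, None) if parity is missing."""
    if parity is None:
        return None, None
    return int(parity == 1), int(parity >= 2)

# Hypothetical parity values, including one missing
rows = [0, 1, 2, 5, None]
dummies = [parity_dummies(p) for p in rows]
print(dummies)
# → [(0, 0), (1, 0), (0, 1), (0, 1), (None, None)]
```

Parity 0 gets (0, 0) and so acts as the reference category: each dummy's coefficient is a contrast against it.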


Create meaningful constant

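Expected birth weight:

$E(y) = \beta_0 + \beta_1 \, gest + \beta_2 \, sex + \beta_3 \, Parity1 + \beta_4 \, Parity2\_7$

Expected birth weight at:

gest=0, sex=0, parity=0: $\beta_0 = 1925$ gr, not meaningful

gest=280, sex=1, parity=0: $\beta_0 + \beta_1 \cdot 280 + \beta_2 \cdot 1 = 3524$ gr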


Model estimation


                       coeff     95% conf. int.
Birth weight at ref    3524.3
Gestational age
  per day              6.0       (3.9 , 8.2)
Sex
  Boy                  0
  Girl                 -139.2    (-228.9 , -49.5)
Parity
  0                    0
  1                    232.0     (130.6 , 333.5)
  2-7                  226.0     (106.9 , 345)
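
The estimated coefficients can be combined into an expected birth weight for any child. The sketch below copies the coefficients from the table on this slide and assumes, following the preceding slide, that the reference is a boy with parity 0 at 280 days of gestation, so gestational age enters as days above or below 280. The function and constant names are made up for illustration.

```python
# Coefficients copied from the model estimation table on this slide
CONSTANT = 3524.3      # expected birth weight at the reference
PER_DAY = 6.0          # gestational age, per day
GIRL = -139.2          # girl versus boy
PARITY = {"0": 0.0, "1": 232.0, "2-7": 226.0}

REFERENCE_GEST = 280   # assumed centring point (reference gestational age in days)

def expected_birth_weight(gest_days, girl, parity):
    """Expected birth weight in gram under the fitted linear model."""
    return (CONSTANT
            + PER_DAY * (gest_days - REFERENCE_GEST)
            + (GIRL if girl else 0.0)
            + PARITY[parity])

reference = expected_birth_weight(280, girl=False, parity="0")
example = expected_birth_weight(290, girl=True, parity="1")
print(f"reference: {reference:.1f} gr, girl at 290 days with parity 1: {example:.1f} gr")
```

The second call works out as 3524.3 + 6.0·10 − 139.2 + 232.0 = 3677.1 gr.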


Test of assumptions

• Plot residuals versus predicted y
  – Independent residuals?
  – Linear effects?
  – Constant variance?

[Figure: residuals versus linear prediction; outlier not included]


Violations of assumptions
• Dependent residuals
  – Use mixed models or GEE
• Non linear effects
  – Add square term
• Non-constant variance
  – Use robust variance estimation

[Figures: two diagnostic plots, one with x-axis gest (200 to 300), one of res against p (3400 to 3800)]


Influence

[Figure: scatterplot of birth weight by gestational age with one outlier, showing the regression with the outlier and the regression without the outlier]


Measures of influence

• Measure change in:
  – Predicted outcome
  – Deviance
  – Coefficients (beta)
    • Delta beta

Remove obs 1, see change
Remove obs 2, see change

[Figure: influence plotted by observation Id]


Delta beta for gestational age


[Figure: Dfbeta for gestC280 plotted against weight, with observation 539 standing out]

beta for gestational age= 6.04

If obs nr 539 is removed, beta will change from 6 to 16
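
The "remove one observation, see the change" idea can be sketched directly: refit the regression without each observation in turn and record how much the slope moves. This is a plain, unstandardised leave-one-out version, which is an assumption on my part; regression software usually reports a standardised variant. The data are hypothetical, with five points on a line and one outlier.

```python
import statistics

def slope(x, y):
    """Least-squares slope of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def delta_betas(x, y):
    """Slope with all data minus slope with each observation left out in turn."""
    full = slope(x, y)
    changes = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        changes.append(full - slope(x_rest, y_rest))
    return changes

# Hypothetical data: gestational age (days) and birth weight (gram)
gest = [260, 270, 280, 290, 300, 600]
weight = [3100, 3300, 3500, 3700, 3900, 3500]
changes = delta_betas(gest, weight)
most_influential = max(range(len(changes)), key=lambda i: abs(changes[i]))
print(f"most influential observation: index {most_influential}")
```

Here the last observation dominates: with it the slope is close to zero, without it the slope is 20 gram per day, the same pattern as the single outlier in the deck.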


Removing outlier


                       Full model                 Outlier removed
                       coeff   95% conf. int.     coeff   95% conf. int.
Birth weight at ref    3524                       3531
Gestational age
  per day              6       (4 , 8)            17      (13 , 20)
Sex
  Boy                  0                          0
  Girl                 -139    (-229 , -49)       -166    (-252 , -80)
Parity
  0                    0                          0
  1                    232     (131 , 333)        229     (132 , 326)
  2-7                  226     (107 , 345)        225     (112 , 339)

One outlier affected two estimates

Final model


MULTIVARIABLE ANALYSIS 2
Binary outcome: Logistic regression, Being bullied


Ordered categories and model


Categories Regression model

2 Logistic

3-7 Ordinal logistic

>7 Linear (treat as interval)

Interval versus ordered scale:
  Interval scale: 1, 2, 3
  Ordered scale: low, medium, high


Logistic model and assumptions

• Association measure: odds ratio in y for 1 unit increase in $x_1$: $OR_1 = e^{\beta_1}$
• Assumptions
  – Independent errors
  – Linear effects on the log odds scale
• Robustness
  – influence


Being bullied

• We want the total effect of country on being bullied.
  – The risk of being bullied depends on age and sex.
  – The age and sex distribution may differ between countries.
• Should we adjust for age and sex?

[DAG: E = country, D = bullied, C1 = age, C2 = sex]

No, age and sex are mediating variables


Logistic: being bullied

             N     %      p-value   OR    95% conf. int.
Country                   <0.001
  Sweden     407   8.7              1
  Iceland    448   10.9             1.3   (0.8 , 2)
  Norway     379   16.2             2.0   (1.3 , 3.2)
  Finland    409   25.9             3.7   (2.4 , 5.6)
  Denmark    436   23.4             3.2   (2.1 , 4.9)

OR ≈ RR if the outcome is rare
OR > RR (further from 1) if the outcome is common

Prevalence of being bullied = 17%

Roughly:
• Same risk of being bullied in Iceland as in Sweden.
• 2 times the risk in Norway as in Sweden.
• 3 times the risk in Finland as in Sweden.
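
The gap between odds ratio and risk ratio can be seen by recomputing both from the percentages in the table on this slide, with Sweden as the reference. The sketch is unadjusted and uses only the published percentages; "Iceland" is the slide's "Island".

```python
# Percent bullied per country, copied from the table on this slide
percent_bullied = {
    "Sweden": 8.7,    # reference
    "Iceland": 10.9,  # "Island" in the original slide
    "Norway": 16.2,
    "Finland": 25.9,
    "Denmark": 23.4,
}

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

def compare_with_reference(percent, reference="Sweden"):
    """Odds ratio and risk ratio of each country against the reference."""
    p_ref = percent[reference] / 100
    result = {}
    for country, pct in percent.items():
        p = pct / 100
        result[country] = {"OR": odds(p) / odds(p_ref), "RR": p / p_ref}
    return result

measures = compare_with_reference(percent_bullied)
for country, m in measures.items():
    print(f"{country:8s} OR={m['OR']:.1f} RR={m['RR']:.1f}")
```

The recomputed odds ratios round to the values in the table (for example 3.7 for Finland), while the risk ratios are closer to 1 (about 3.0 for Finland), which is why the slide reads the odds ratio of 3.7 as roughly 3 times the risk.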


Summing up
• DAGs
  – State prior knowledge. Guide analysis
• Plots
  – Linearity, variance, outliers
• Bivariate analysis
  – Continuous symmetrical: Mean, T-test, ANOVA
  – Continuous skewed: Median, nonparametric
  – Categorical: Freq, cross, chi-square
• Multivariable analysis
  – Continuous: Linear regression
  – Binary: Logistic regression

